On the plane home from Canberra, I worked on my various writing projects.
In the time around sessions, and on the conference hike (with kangaroos), I had many conversations with Marcus Frean (Wellington) about a generalization of his source-finding system. He finds sources in data by looking for anomalies in the "pixel histogram"; his model has no knowledge of astronomy (or anything else) on the inside. We discussed several generalizations in which the model would learn—as it saw more and more data—what kinds of properties astronomical data have. The idea is that the system should learn that the anomalies (sources) fall into different categories, learn the process that generates each of those categories, and instantiate new categories as required by the data. A system like this would be like a simulation of a hypothesis-generating astronomer! It would also be extremely useful running on any operating astronomical survey; "eyes on the data" are always valuable and usually very expensive. As my reader knows, I think that eyes on the data are the most valuable contribution of massive citizen-science projects like the Zooniverse; it would be awesome to add some robots into the mix!
At the end of the day (Australian time), during the MaxEnt2013 conference dinner, Gaia launched! It looks right now like the launch was successful. This is potentially the beginning of a new era in observational astrophysics. Congratulations to everyone involved.
I spoke at MaxEnt2013 today, in a short astronomy session that also included Ensslin (MPA), Frean (Wellington), and Brewer. Brewer spoke about our project to fully marginalize out catalogs, and Frean showed some exceedingly general methods for source discovery in data streams, applied (among other things) to astronomical data. I pitched a project to him at lunch about predicting or automatically inspecting survey data as it comes off of telescopes, which would be a beautiful extension of his work. Ensslin showed awesome reconstructions of astrophysical fields (especially the magnetic field in the Galaxy) from sparse data samples (rotation measures, in this case). He uses ideas from field theory to go beyond the Gaussian Process.

There were many valuable talks; too many to mention. Stand-outs for me included a talk by Hutter (ANU) about things that overlap my crazy paper. He was arguing for a message-length approach to selecting theories, especially huge theories of everything. He made the good point that the message must include both the initial conditions and a description of the position of the observer. Hutter describes himself as a mathematical philosopher. Lineweaver (ANU) argued passionately that the Universe is not a fluctuation away from a high-entropy state (I agree), and Goyal (Albany) argued that exchangeability can be used to prove that the universe can only contain fermions and bosons (nothing else). On the latter, I would like to understand it better; I certainly grew up learning the opposite: I learned that this was an additional postulate. Wood (ANU) gave a nice overview of probabilistic topic models and their value and limitations.
After lunch, there were break-out sessions, and we guided the (very well attended) astronomical one to things where Brewer, Murray, and I overlap. We talked about combining information from images taken at different times, through different bandpasses, and with very different calibration properties. The issues are very different if you have the multiple images or if you just have catalogs. Many good ideas came up, including many that I had (nearly) forgotten from my Gaia paper. In the end, we didn't resolve anything but we specified a very sensible project, which is to figure out how one might construct catalog outputs such that the catalogs can be combined to produce inferences that are almost as good as the inferences you get from working with the images directly. Very sensible! And very related to abortive projects I have started with Marshall.
At the end of a long day, Huppenkothen (Amsterdam) was showing Murray and me bursts from Fermi observations of a magnetar, and discussing ways we might fit the data with some kind of process (Gaussian or dictionary). We accreted Brewer and Frean and then challenged ourselves to produce a result by midnight. After a monster hack session we succeeded; we hope to be able to use what we have to constrain rise times (or make some new discovery) in these kinds of bursts.
A little bit of hooky: I was kidnapped by Aaron Dotter (ANU) and taken up to Mt Stromlo Observatory, to chat with the locals about data and calibration (my favorite subjects these days). Ken Freeman (ANU) came by, and we discussed the just-starting HERMES project, which is on sky and taking data. The project is performing a survey of a million stars at high s/n (like 100) and high-ish resolution (like 30,000 or 50,000). The idea is to do "chemical tagging" and dissect the Milky Way into its accretion-historical parts. There are two challenges we discussed. The first is calibration and extraction of the spectra, which must deal with cross-talk between fibers (yes, it is fiber-fed like APOGEE) or their traces on the CCD, modeling and removal of sky in wavelength regions where there are no sky lines, and determination of the point-spread and line-spread functions as a function of position in the device. The second is the chemical tagging problem in the limit that models are good but not perfect. I have many ideas about that; they fall into the category we talked about with Sontag (NYU) last week: Simultaneously we want to use the models to understand the data and the data to update the models.
In the meeting today I missed some of the talks, of course, which I regret. There were talks in the morning about the possibility that not only does nature (in equilibrium) maximize entropy, but perhaps when it is out of equilibrium it also maximizes entropy production. I actually worked on this as an undergraduate back around 1991 with Michel Baranger (MIT emeritus); back then we were suspicious that the principle could even work. I think now it is known that for some systems, in some circumstances, they do seem to choose the path of maximum dissipation, but the general value of the principle is not clear; it might even be more misleading than useful.
Iain Murray (Edinburgh) gave a great talk (clear, surprising, useful) about inferring density functions (probability distributions) given points. He showed some amazing results for "autoregressive" models, which are so much more general than what I thought they were this summer when we were working on Kepler. He gave me a lot of new ideas for MJ Vakili's project.
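The idea behind autoregressive density models is the chain rule of probability: factor the joint density into a product of one-dimensional conditionals, p(x) = Π_d p(x_d | x_<d>), and fit each conditional. Here is a toy sketch (function names are mine, and the linear-Gaussian conditionals are a drastic simplification of the flexible conditionals Murray actually uses):

```python
import numpy as np

def fit_linear_gaussian_ar(X):
    """Toy autoregressive density estimator: model each dimension d as
    Gaussian given the preceding dimensions, via least squares.
    p(x) = prod_d N(x_d | w_d . [1, x_<d], s_d^2)."""
    n, D = X.shape
    params = []
    for d in range(D):
        A = np.hstack([np.ones((n, 1)), X[:, :d]])   # [1, x_0, ..., x_{d-1}]
        w, *_ = np.linalg.lstsq(A, X[:, d], rcond=None)
        resid = X[:, d] - A @ w
        params.append((w, max(resid.var(), 1e-12)))  # weights, noise variance
    return params

def ar_logpdf(x, params):
    """Evaluate the chain-rule log density at a single point x."""
    ll, prefix = 0.0, [1.0]
    for d, (w, s2) in enumerate(params):
        mu = np.dot(w, prefix)
        ll += -0.5 * (np.log(2 * np.pi * s2) + (x[d] - mu) ** 2 / s2)
        prefix.append(x[d])
    return ll
```

With Gaussian conditionals this can only capture linear correlations; the power of the real models comes from making each conditional a flexible (e.g., neural or mixture) density.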
Today was the first day of MaxEnt 2013 in Canberra, Australia. There were many great talks, including about exponential-family probability distributions and their generalizations, image reconstruction from geophysical and medical imaging, model selection via marginalized likelihood, and inference and decision making for networks and flow. There were also amusing discussions of the "surprise test paradox" and the "black raven paradox" and other simple probability arguments that are very confusing. These conversations carried into dinner, at which Iain Murray (Edinburgh) and Brewer and I argued about their relevance to our understanding of inference.
The most productive part of the day for me was at lunch (and a bit beyond), during which Murray, Brewer, and I argued with various attendees about the topics I brought to Australia to discuss with Murray. Among the various things that came up are GPLVM (by Neil Lawrence) as a possible tool for Vakili and me on galaxy image priors, Gaussian Processes for not just function values but also derivatives (including higher derivatives) and the potential this has for "data summary", and MCMC methods to find and explore badly multi-modal posterior pdfs. We spent some significant time discussing how to make likelihood calculation more efficient or adaptive. In particular, Murray pointed out that if you are using it in a Metropolis–Hastings accept/reject step, how precisely you need to know it depends on the value of the random number draw; in principle this should be passed into the likelihood function! We also spent some time talking about how large-scale structure is measured. Murray had some creative ideas about how to make better use of the existing cosmological simulations in the data analysis.
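Murray's point about the accept/reject step can be sketched in a few lines. The idea: draw the uniform random number first, then refine a noisy likelihood estimate only until the accept/reject decision is unambiguous. This is a toy (the names and the fake estimator `refine_loglike` are mine, standing in for any expensive-but-refinable likelihood):

```python
import numpy as np

rng = np.random.default_rng(42)

def refine_loglike(theta, n):
    """Hypothetical noisy log-likelihood estimator: more effort (larger n)
    gives a tighter estimate. Faked here as the exact value for a unit
    Gaussian plus Monte Carlo noise that shrinks like 1/sqrt(n)."""
    true_ll = -0.5 * np.sum(theta ** 2)
    return true_ll + rng.normal(0.0, 1.0 / np.sqrt(n)), 1.0 / np.sqrt(n)

def mh_step(theta, ll_theta, propose_scale=0.5):
    """One Metropolis-Hastings step with a symmetric Gaussian proposal,
    drawing the uniform random number BEFORE evaluating the likelihood."""
    theta_new = theta + propose_scale * rng.normal(size=theta.shape)
    log_u = np.log(rng.uniform())   # the draw the likelihood "needs to know"
    n = 16
    while True:
        ll_new, err = refine_loglike(theta_new, n)
        # the decision depends on the sign of this gap; we only need the
        # estimate to be precise enough to determine that sign
        gap = (ll_new - ll_theta) - log_u
        if abs(gap) > 3.0 * err or n > 4096:
            break
        n *= 2  # ambiguous: spend more effort on the estimate
    if gap > 0:
        return theta_new, ll_new
    return theta, ll_theta
```

When log_u is very negative (an easy accept) or the proposal is obviously bad, a crude estimate suffices; the expensive refinement only happens near the decision boundary.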
Part of CampHogg had lunch with David Sontag (NYU) today; Sontag works on tractable approximations to intractable Bayesian inferences (among other things). In particular, he is interested in making scientific discoveries in data using non-trivial Bayesian models. We spent much of lunch discussing the gap between supervised and unsupervised methods; most important scientific tasks don't fall clearly into one category or the other, and progress in the huge region between them could be immensely useful. I pitched several April-Fools scale projects at Sontag; none have quite stuck yet.
In the afternoon, Christian Ott (Caltech) gave a nice talk about numerical modeling of exploding stars, and the persistent problem that supernova explosions do not reliably happen on the computer. That deep problem has been around for all the time I have been an astrophysicist!
Over lunch we discussed the state of Foreman-Mackey's sampling of all Kepler Objects of Interest, and the state of Fadely's and my project to infer the spectra of objects using photometry alone. With the term coming to an end, it was barely a research day.
In a secret "overwhelming force" project, Foreman-Mackey is resampling all the Kepler Objects of Interest, to produce a full probabilistic catalog. In many low signal-to-noise systems, there are fitting degeneracies (really near-degeneracies). In many of these, the posterior pdf does not accord with our intuitive views about what ought to be going on. We realized that this is because our "flat priors" didn't accord with our real, prior beliefs. That is, there was a disparity between our actual prior beliefs and our coded-up prior function. We knew this would be the case—we are using wrong but simple interim priors that we plan to replace with a hierarchical inference—but it was amusing to be reminded of the simple point that ugly priors lead to ugly inferences. We made some small changes to the priors, resampled, and our inferences look much better.
Say you have many noisy inferences of some quantity (planet radius, say, for hundreds of planets), and you want to know the true distribution of that quantity (the planet-radius distribution you would observe with very high signal-to-noise data). How should you estimate the distribution? One option: Histogram your maximum-likelihood estimates. Another: Co-add (sum) your individual-object likelihood functions. Another: Co-add your individual-object posterior pdfs. These are all wrong, of course, but the odd thing is that the latter two—which seem so sensible, since they "make use of" uncertainty information in the inferences—are actually wronger than just histogramming your ML estimates. Why? Because your ML-estimate histogram is something like the truth convolved with your uncertainties, but a co-add of your likelihoods or posteriors is pretty much the same as that but convolved again. The Right Thing To Do (tm) is hierarchical inference, which is like a deconvolution (by forward modeling, of course). I feel like a skipping record. Fadely, Foreman-Mackey, and I discussed all this over lunch, in the context of recent work on (wait for it) planet radii.
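A quick simulation (numbers and variable names mine) makes the double-convolution point concrete: the variance of the ML-estimate histogram is the true variance plus one noise variance, while the co-add of the per-object likelihoods picks up the noise variance twice.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 100_000, 1.0

x_true = rng.normal(0.0, 1.0, n)         # true values: spread 1
y = x_true + rng.normal(0.0, sigma, n)   # noisy ML estimates

# Histogramming the ML estimates: truth convolved ONCE with the noise.
var_ml = np.var(y)                       # expect about 1 + sigma^2 = 2

# Co-adding the per-object likelihoods: a mixture of N(y_i, sigma^2)
# components, i.e., the SAME thing convolved with the noise a second time.
# (Sampling the mixture rather than summing pdfs, for simplicity.)
coadd_samples = y[:, None] + rng.normal(0.0, sigma, (n, 10))
var_coadd = np.var(coadd_samples)        # expect about 1 + 2 * sigma^2 = 3
```

Neither 2 nor 3 is the true variance of 1; only a hierarchical (forward-modeling, deconvolution-like) inference recovers that.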
In MCMC meeting today, Goodman brought up the difference between the fully marginalized posterior model probability and what you learn by cross-validation, for model selection. As my loyal reader knows, I have many thoughts about this and also a nascent paper with Vanderplas. However, Goodman has a different take from me: He sees cross-validation as producing the most predictive model (preventing over-fitting), but posterior probability as delivering the most probable model, given the universe of models. I think this is deeply (and obviously) correct. However, we haven't settled on words for Hou's paper yet, because he is still willing to use the T-word ("truth"), and I am not! (I also think, in the end, this is all related to the point that we want to deliver the most useful result for our purposes, and this necessarily involves utility.)
Ramirez-Ruiz (Santa Cruz) gave a morning talk on tidal disruption flares. He finds that for every fully disrupted star (by a central black hole in a galaxy) there should be many partially disrupted stars, and we should be able to find the remnants. The remnants should look odd in various ways, and be on odd orbits. Worth looking for!
In the afternoon, Johnson (Harvard) talked about exoplanets around cool stars, with some diversions into interesting false positives (a white dwarf eclipsing an M star, providing a gravitational-lensing measurement of the white dwarf mass) and new hardware (the MINERVA project is building an exoplanet search out of cheap hardware). Johnson gave various motivations for his work, but not the least was the idea that someday we might go to one of these planets! A great pair of talks. Late in the afternoon, Johnson and I collected some suggestions for combining our research groups and projects.
John Asher Johnson (Harvard) and Ben Montet (Caltech) are secretly visiting NYU today and tomorrow. We spent a long session today talking about overlapping interests. Montet showed results on the population of long-period planets, where they have radial velocity trends (not period signals but trends) and adaptive-optics imaging upper limits (that is, no detections). What can you conclude when you have no period and no direct detection? A lot, it turns out, because the trend sets a relationship between mass, inclination, and period, and the adaptive optics rules out a large class of binary-star models.
In related news, Johnson was excited about the methods Fadely and I are using to infer the HST pixel-convolved point-spread function. It is very related to methods he wants to use to infer the line-spread function in the HIRES data he has on exoplanets. He was particularly impressed by our smoothness priors that regularize very flexible model fits without breaking convexity.
In a low-research day, Fadely and I discussed the issue that any HST PSF model must be able to track or model the dependence of the PSF on focus, which changes during the mission, both in a secular way and through the orbit (depending on pointing). That's a problem, because with only millions of stars, we do not have lots of excess data for our goals.
Fadely and Foreman-Mackey are both having fitting issues that are hard to comprehend, both in extremely ambitious comprehensive data analysis programs. Fadely has a model where the update steps (the hand-built optimization steps) are guaranteed (by math) to improve the objective function and yet, and yet! I asked him for a document, so we can compare code to document. His model is a beautiful one, which simultaneously finds the position and flux of every star in the HST data for the WFC3 IR channel, the point-spread function, and the pixel-level flat-field!
Foreman-Mackey is finding that his automatic re-fits (samplings using emcee and interim priors) to all Kepler Objects of Interest are favoring high impact parameters. This is a generic problem with exoplanet transit fits; the KOI best-fit values have these biases too; that doesn't make it trivial to understand. Even our hierarchical Bayesian inference of the impact-parameter distribution is not okay. It has something to do with the prior volume or else with the freedom to fit at larger impact parameter; or perhaps a (wrong) lack of penalty for large planets. Not sure yet. We have some hypotheses we are going to test (by looking at the samplings and prior-dependences) tomorrow.
Vakili and I discussed the issue that you can run kernel PCA on galaxy images, or on the shapelet transforms of galaxy images, and you should get the same answer. PCA is invariant to rotations of the coordinate system. However, really we are using the shapelets for denoising: We truncate the high-order terms that we think are noise-dominated. We discussed less heuristic approaches to this.
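The invariance is easy to demonstrate numerically (toy data standing in for vectorized galaxy images; names mine): the PCA eigenvalue spectrum is unchanged by any orthogonal change of basis, such as an orthonormal shapelet transform, so any difference between the two analyses must come entirely from the truncation step.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 8))  # stand-in for vectorized galaxy images

# A random orthogonal matrix, playing the role of an orthonormal
# shapelet transform (a rotation of the coordinate basis).
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))

def pca_eigvals(data):
    """Eigenvalues of the sample covariance, sorted descending."""
    centered = data - data.mean(axis=0)
    cov = centered.T @ centered / (len(centered) - 1)
    return np.sort(np.linalg.eigvalsh(cov))[::-1]

ev_pixels = pca_eigvals(X)       # PCA in the original (pixel) basis
ev_rotated = pca_eigvals(X @ Q)  # PCA after the orthogonal transform
# the spectra agree: rot. of basis leaves the PCA eigenvalues unchanged
```

Truncating the high-order shapelet coefficients before running PCA is a projection, not a rotation, which is exactly why the denoised analysis differs, and why it deserves a less heuristic justification.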
At MCMC meeting, Hou showed his impressive results on marginalized likelihood computations. He gets answers that are provably (if the code is correct) unbiased and come with uncertainty estimates. He gets some discrepancies with numbers in the literature, even when he uses the same data and the same prior pdfs, so we are confused, but we don't know how to diagnose the differences. Goodman explained to us the magical Bernstein–von Mises theorem, which guarantees that the posterior pdf approaches a Gaussian as the data grows very large. Of course the theorem depends on assumptions that cannot possibly be true, like that the model space includes the process that generated the data in the first place!
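A toy illustration of the Bernstein–von Mises behavior (my numbers; flat-prior coin flipping, where everything is analytic): the posterior for a coin's heads rate is a Beta distribution, and its maximum deviation from the matched Gaussian shrinks as the number of flips grows.

```python
import math
import numpy as np

def beta_logpdf(x, a, b):
    """Log pdf of the Beta(a, b) distribution."""
    lognorm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return lognorm + (a - 1) * np.log(x) + (b - 1) * np.log(1 - x)

def max_gap(n):
    """Flat-prior posterior for a coin's heads rate after n flips with
    60 percent heads is Beta(1 + k, 1 + n - k); return the maximum
    deviation from the matched Gaussian, in standard-deviation units."""
    k = int(0.6 * n)
    a, b = 1 + k, 1 + n - k
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    sd = math.sqrt(var)
    x = np.linspace(max(1e-6, mean - 4 * sd),
                    min(1 - 1e-6, mean + 4 * sd), 2001)
    post = np.exp(beta_logpdf(x, a, b))
    gauss = np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
    return float(np.max(np.abs(post - gauss)) * sd)  # dimensionless gap

# the gap shrinks as the data grow: the posterior is becoming Gaussian
```

Of course, this toy satisfies the theorem's assumptions by construction; real problems, as Goodman noted, generally do not.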
On the phone with the exoSAMSI crew, we de-scoped our first papers on search to the minimum (and set Spring targets for completion). At lunch, Mark Wyman (NYU) talked about modifications to inflation that would make the gravitational wave signatures both more prominent and more informative.