In MCMC meeting today, Goodman brought up the difference between the fully marginalized posterior model probability and what you learn by cross-validation, for model selection. As my loyal reader knows, I have many thoughts about this and also a nascent paper with Vanderplas. However, Goodman has a different take from me: He sees cross-validation as producing the most predictive model (preventing over-fitting), but posterior probability as delivering the most probable model, given the universe of models. I think this is deeply (and obviously) correct. However, we haven't settled on words for Hou's paper yet, because he is still willing to use the T-word ("truth"), and I am not! (I also think, in the end, this is all related to the point that we want to deliver the most useful result for our purposes, and this necessarily involves utility.)
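Since the contrast comes up often, here is a toy sketch (entirely my own; the data, the model family, and the prior precision are made up) of the two criteria applied to a polynomial-degree choice: the fully marginalized likelihood rewards the most probable model given the prior, while leave-one-out cross-validation rewards the most predictive one.

```python
# Toy comparison (hypothetical setup) of marginalized likelihood vs leave-one-out
# cross-validation for choosing a polynomial degree.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(42)
n, sigma, alpha = 30, 0.3, 1.0                     # data size, noise rms, prior precision
x = np.sort(rng.uniform(-1, 1, n))
y = np.sin(2.5 * x) + sigma * rng.normal(size=n)   # "true" process not in the model set

def design(x, degree):
    return np.vander(x, degree + 1, increasing=True)

def log_evidence(x, y, degree):
    # y ~ N(0, sigma^2 I + alpha^-1 X X^T) for a Gaussian prior on the coefficients
    X = design(x, degree)
    cov = sigma**2 * np.eye(len(x)) + X @ X.T / alpha
    return multivariate_normal(mean=np.zeros(len(x)), cov=cov).logpdf(y)

def loo_score(x, y, degree):
    # mean squared leave-one-out prediction error (lower means "more predictive")
    err = []
    for i in range(len(x)):
        keep = np.arange(len(x)) != i
        X = design(x[keep], degree)
        w, *_ = np.linalg.lstsq(X, y[keep], rcond=None)
        err.append((design(x[i:i + 1], degree) @ w - y[i])**2)
    return float(np.mean(err))

for d in range(1, 8):
    print(d, round(log_evidence(x, y, d), 2), round(loo_score(x, y, d), 4))
```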
Ramirez-Ruiz (Santa Cruz) gave a morning talk on tidal disruption flares. He finds that for every fully disrupted star (by a central black hole in a galaxy) there should be many partially disrupted stars, and we should be able to find the remnants. The remnants should look odd in various ways, and be on odd orbits. Worth looking for!
In the afternoon, Johnson (Harvard) talked about exoplanets around cool stars, with some diversions into interesting false positives (a white dwarf eclipsing an M star, providing a gravitational-lensing measurement of the white dwarf mass) and new hardware (the MINERVA project is building an exoplanet search out of cheap hardware). Johnson gave various motivations for his work, not the least of which was the idea that someday we might go to one of these planets! A great pair of talks. Late in the afternoon, Johnson and I collected some suggestions for combining our research groups and projects.
John Asher Johnson (Harvard) and Ben Montet (Caltech) are secretly visiting NYU today and tomorrow. We spent a long session today talking about overlapping interests. Montet showed results on the population of long-period planets, where they have radial velocity trends (not periodic signals but trends) and adaptive-optics imaging upper limits (that is, no detections). What can you conclude when you have no period and no direct detection? A lot, it turns out, because the trend sets a relationship between mass, inclination, and period, and the adaptive optics rules out a large class of binary-star models.
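To see why this is constraining, here is an order-of-magnitude sketch (my numbers, not Montet's analysis): a companion of mass m at separation a can induce a line-of-sight acceleration of at most about G m / a^2, so a measured trend sets a lower limit on m at every assumed separation, while the AO imaging caps the brightness (and hence mass) of anything at wide separations.

```python
# Order-of-magnitude sketch (assumed trend value, not Montet's data): minimum
# companion mass implied by an RV trend, as a function of assumed separation.
import numpy as np

G = 6.674e-11                      # m^3 kg^-1 s^-2
M_jup = 1.898e27                   # kg
au = 1.496e11                      # m
trend = 5.0 / (365.25 * 86400)     # assumed RV trend: 5 m/s per year, in m/s^2

for a_au in [1, 3, 10, 30, 100]:
    m_min = trend * (a_au * au)**2 / G      # minimum companion mass at that separation
    print(f"a = {a_au:>3d} au  ->  m sin(i) >~ {m_min / M_jup:8.2f} M_Jup")
```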
In related news, Johnson was excited about the methods Fadely and I are using to infer the HST pixel-convolved point-spread function. It is closely related to methods he wants to use to infer the line-spread function in the HIRES data he has on exoplanets. He was particularly impressed by our smoothness priors, which regularize very flexible model fits without breaking convexity.
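For concreteness, here is a minimal sketch (assumptions mine, not the Fadely–Hogg code) of the kind of smoothness prior in question: a quadratic penalty on second differences of a flexible model keeps the whole objective a convex least-squares problem with a closed-form solution.

```python
# Minimal sketch of a convex smoothness prior: quadratic penalty on second differences.
import numpy as np

def smooth_fit(y, lam):
    """Fit a flexible vector f to noisy samples y, minimizing
    ||f - y||^2 + lam * ||D2 f||^2, which is convex in f."""
    n = len(y)
    D2 = np.diff(np.eye(n), n=2, axis=0)      # (n-2, n) second-difference operator
    A = np.eye(n) + lam * D2.T @ D2
    return np.linalg.solve(A, y)

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 200)
y = np.exp(-0.5 * ((x - 0.5) / 0.1)**2) + 0.05 * rng.normal(size=x.size)
f_hat = smooth_fit(y, lam=50.0)               # smoother as lam grows, still convex
```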
In a low-research day, Fadely and I discussed the issue that any HST PSF model must be able to track or model the dependence of the PSF on focus, which changes during the mission, both in a secular way and through the orbit (depending on pointing). That's a problem, because with only millions of stars, we do not have lots of excess data for our goals.
Fadely and Foreman-Mackey are both having fitting issues that are hard to comprehend, each in an extremely ambitious, comprehensive data-analysis program. Fadely has a model in which the update steps (the hand-built optimization steps) are guaranteed (by math) to improve the objective function and yet, and yet! I asked him for a document, so we can compare code to document. His model is a beautiful one, which simultaneously finds the position and flux of every star in the HST data for the WFC3 IR channel, the point-spread function, and the pixel-level flat-field!
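A simple debugging pattern for this situation (a sketch, not Fadely's code): wrap the hand-built steps, log the objective after each one, and fail loudly at the first step that makes things worse, so the code and the write-up can be compared step by step.

```python
# Sketch of a monotonicity check for hand-built optimization steps.
def run_monotone(params, steps, objective, tol=1e-10):
    """Apply update steps in order, asserting the objective never gets worse."""
    best = objective(params)
    for k, step in enumerate(steps):
        params = step(params)
        value = objective(params)
        assert value <= best + tol, f"step {k} ({step.__name__}) worsened the objective"
        best = value
    return params
```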
Foreman-Mackey is finding that his automatic re-fits (samplings using emcee and interim priors) of all Kepler Objects of Interest are favoring high impact parameters. This is a generic problem with exoplanet transit fits; the KOI best-fit values show the same biases, but that doesn't make the problem trivial to understand. Even our hierarchical Bayesian inference of the impact-parameter distribution is affected. It has something to do with the prior volume, or else with the freedom to fit at larger impact parameter, or perhaps a (wrong) lack of penalty for large planets. Not sure yet. We have some hypotheses we are going to test (by looking at the samplings and prior dependences) tomorrow.
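For reference, the interim-prior trick we are using looks schematically like this (my notation, not Foreman-Mackey's code): given posterior samples of each KOI's impact parameter b drawn under a known interim prior, the population log-likelihood re-weights those samples by the ratio of the population pdf to the interim pdf, averaged per object.

```python
# Schematic interim-prior importance-weighting for hierarchical inference.
import numpy as np

def population_log_likelihood(samples, interim_logpdf, population_logpdf, theta):
    """samples: list of 1-d arrays, one posterior sampling of b per KOI."""
    logL = 0.0
    for b in samples:
        log_w = population_logpdf(b, theta) - interim_logpdf(b)
        # log of the per-object average importance weight
        logL += np.logaddexp.reduce(log_w) - np.log(len(b))
    return logL
```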
Vakili and I discussed the issue that you can run kernel PCA on galaxy images, or on the shapelet transforms of galaxy images, and you should get the same answer. PCA is invariant to rotations of the coordinate system. However, really we are using the shapelets for denoising: We truncate the high-order terms that we think are noise-dominated. We discussed less heuristic approaches to this.
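One possible less-heuristic variant (an illustration, not what we settled on) is to keep only the basis coefficients that are individually significant relative to their propagated noise, rather than truncating at a fixed order:

```python
# Sketch: significance-based truncation of basis (e.g. shapelet) coefficients.
import numpy as np

def denoise_coefficients(coeffs, coeff_sigma, nsigma=3.0):
    """Zero out coefficients consistent with noise.

    coeffs:      1-d array of basis coefficients
    coeff_sigma: 1-d array of their 1-sigma uncertainties
    """
    keep = np.abs(coeffs) > nsigma * coeff_sigma
    return np.where(keep, coeffs, 0.0)
```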
At MCMC meeting, Hou showed his impressive results on marginalized likelihood computations. He gets answers that are provably (if the code is correct) unbiased and come with uncertainty estimates. He gets some discrepancies with numbers in the literature, even when he uses the same data and the same prior pdfs, so we are confused; we don't yet know how to diagnose the differences. Goodman explained to us the magical Bernstein–von Mises theorem, which guarantees that the posterior pdf approaches a Gaussian as the amount of data grows very large. Of course the theorem depends on assumptions that cannot possibly be true, such as the assumption that the model space includes the process that generated the data in the first place!
On the phone with the exoSAMSI crew, we de-scoped our first papers on search to the minimum (and set Spring targets for completion). At lunch, Mark Wyman (NYU) talked about modifications to inflation that would make the gravitational wave signatures both more prominent and more informative.
Or Graur (JHU), Yuqian Liu (NYU), Maryam Modjaz (NYU), and Gabe Perez-Giz (NYU) came by today to pick my brain and Fadely's brain about interpreting spectral data. Their problem is that they want to analyze supernova spectral data for which they don't know the SN spectral type, don't know the velocity broadening of the lines, don't know the true spectral resolution, don't know the variance of the observational noise, and expect the noise variance to depend on wavelength. We discussed proper probabilistic approaches, and also simple filtering techniques, to separate the signal from the noise. Obviously strong priors on supernova spectra help enormously, but the SN people want to stay as assumption-free as possible. In the end, a pragmatic filtering approach won out; we discussed ways to make the filtering sensible and not mix (too badly) the signal output with the noise output.
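One illustrative filtering recipe of the kind we discussed (my choices of filters and window sizes, not the group's final pipeline): a running median to suppress narrow spikes, a Savitzky–Golay smooth for the signal estimate, and the residual kept as an empirical, wavelength-dependent noise proxy.

```python
# Illustrative spectrum-filtering sketch (window sizes are arbitrary choices).
import numpy as np
from scipy.signal import medfilt, savgol_filter

def smooth_spectrum(flux, spike_window=5, smooth_window=51, polyorder=3):
    despiked = medfilt(flux, kernel_size=spike_window)      # kill narrow noise spikes
    smooth = savgol_filter(despiked, window_length=smooth_window, polyorder=polyorder)
    noise = flux - smooth                                    # crude wavelength-dependent noise proxy
    return smooth, noise
```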
Aside from a blackboard talk by Maryam Modjaz (NYU) about supernova types and classification, it was a day of all talk. I spoke with Goodman and CampHogg about Hou's paper on marginalized likelihood calculation using the geometric path. I spoke with Vakili about how you go, in kernel PCA, from the (high-dimensional) feature space back to the original data space. (It's complicated.) I spoke with the exoSAMSI crew about exoplanet populations inference; Megan Shabram (PSU) is close to having a hierarchical inference of the exoplanet eccentricity distribution (as a function of period). Finally, I spoke with Foreman-Mackey about his new evil plan (why is there a new evil plan every four days?) to build an interim-prior-based sampling of the posterior density of exoplanet parameters for every KOI in the Kepler Catalog.
In the astro seminar, Carlos Badenes (Pitt) talked about white-dwarf–white-dwarf binaries and an inferred rate of inspiral, based on SDSS spectra split up exposure by exposure: The orbits of the soon-to-merge white dwarfs are so fast and short-period that even the twenty-minute intervals between spectral exposures in SDSS are long enough to show velocity changes! He finds a merger event rate for the binaries large enough to explain the type-Ia supernova rate, but only if he permits sub-Chandrasekhar total masses to make the SNe. That is, he gets enough events, but they tend to be low-mass.
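A back-of-the-envelope sketch (my assumed numbers, not Badenes's) of why the roughly twenty-minute gaps are enough: for a circular orbit with period P and RV semi-amplitude K, the velocity change between two exposures separated by a time dt is K times the difference of sines of the orbital phase, which is a large fraction of K whenever dt is a sizeable fraction of P.

```python
# Back-of-the-envelope RV change between exposures for an assumed circular binary.
import numpy as np

K = 300.0          # assumed RV semi-amplitude in km/s
P = 1.0            # assumed orbital period in hours
dt = 20.0 / 60.0   # exposure spacing in hours
phase = np.linspace(0, 2 * np.pi, 1000)
dv = K * np.abs(np.sin(phase + 2 * np.pi * dt / P) - np.sin(phase))
print(f"typical |dv| ~ {np.median(dv):.0f} km/s, max ~ {dv.max():.0f} km/s")
```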
Tim Morton (Princeton) spent the day at NYU to talk exoplanets, sampling, selection functions, marginalized likelihoods, and so on. We had a productive talk about making high-performance importance-sampling code to compute the marginalized likelihoods.
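Schematically, the estimator we have in mind is the standard importance-sampling one (the names and interfaces below are mine): draw from a tractable proposal q, average likelihood times prior over proposal, and that average is an unbiased estimate of the marginalized likelihood Z = ∫ L(x) p(x) dx.

```python
# Sketch of an importance-sampling estimate of the marginalized likelihood; assumes
# the log_like, log_prior, and proposal_logpdf callables accept batched inputs.
import numpy as np

def log_marginal_likelihood(log_like, log_prior, proposal_sample, proposal_logpdf, n=100_000):
    x = proposal_sample(n)                                   # (n, ndim) draws from q
    log_w = log_like(x) + log_prior(x) - proposal_logpdf(x)  # log importance weights
    return np.logaddexp.reduce(log_w) - np.log(n)            # log of the weight average
```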
I spent too much time today trying to understand kernel PCA, inspired by Vakili's use of it to build a probabilistic model of galaxy images. Schölkopf would be disappointed with me! I don't see how it can give useful results. But then on further reflection, I realized that all my problems with kPCA are really just re-statements of my problems with PCA, detailed in my HMF paper: PCA delivers results that are not affine invariant. If you change the metric of your space, or the units of your quantities, or shear or scale things, you get different PCA components. That problem becomes even more severe, harder to control, and harder to comprehend when you generalize with the kernel trick.
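A tiny demonstration of the complaint (toy data, mine): rescaling one coordinate, which is just a change of units, changes the leading principal direction, even though PCA is invariant under rotations.

```python
# Toy demonstration that PCA is not affine invariant: rescale one coordinate
# (change its units) and the leading principal direction changes.
import numpy as np

rng = np.random.default_rng(1)
X = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=500)

def leading_direction(X):
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

print(leading_direction(X))                          # original units
print(leading_direction(X * np.array([1.0, 10.0])))  # second coordinate rescaled
```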
I also don't understand how you go from the results of kPCA back to reconstructions in the original data space. But that is a separate problem, and just represents my weakness.
In a low-research day, I discussed spectral plotting with Jeffrey Mei (NYUAD). This is serious wheel-reinvention: Every student who works on spectra pretty much has to build her or his own plotting tools.
In a blast from the past, James Long (TAMU) called me today to discuss a re-start of what I like to call the "insane robot" project, in which we are fitting photometric data censored by an unknown (but assumed stationary) probabilistic process. This project was started with Joey Richards (wise.io), who wrote much of the code with Long's help, but it has been dormant for some time now. One astonishing thing: after a couple of years of disuse, the code was still comprehensible and ran successfully. Let's hear it for well-documented, well-structured code!
Late in the day, Foreman-Mackey proposed a very simple approach to inferring exoplanet population parameters, based only on the content of the Kepler "Object of Interest" catalog. That is, a way to build a probabilistic model of this catalog that would be responsible and rigorous (though involving many simplifying assumptions, of course). It relates to projects by Subo Dong and others, who have been doing approximations to hierarchical inference; one goal would be to test those conclusions. The common theme between the exoplanet project and the insane-robot project is that both require a parameterized model of the completeness or data censoring; in neither case do we know with any reliability the conditions under which an observation makes it into the catalog.
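That shared ingredient might look schematically like this (an illustrative functional form, not an adopted one): a parameterized completeness function, say a logistic in signal-to-noise, multiplying the underlying population rate, so that the catalog can be modeled as a censored draw from that population.

```python
# Sketch of a parameterized completeness (data-censoring) model.
import numpy as np

def completeness(snr, snr50, width):
    """Probability that an object with the given signal-to-noise enters the catalog."""
    return 1.0 / (1.0 + np.exp(-(snr - snr50) / width))

def expected_catalog_rate(true_rate, snr, snr50, width):
    """Observed (censored) rate density = true rate times detection probability."""
    return true_rate * completeness(snr, snr50, width)
```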
I spoke with MJ Vakili today about how to turn his prior over galaxy images into a probabilistic weak-lensing measurement system. Any time we write down a probabilistic model, we need to be able to evaluate the probability of some set of parameters given data, or some set of data given parameters, and we also need to be able to sample from it: We need to be able to generate fair samples of artificial data given parameters, and generate fair samples of parameters given data. Vakili is tasked with making both kinds of operations first correct and second fast; the weak-lensing community won't care that we are more righteous if what we deliver isn't practical.
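In interface terms (names hypothetical, just a sketch of the contract), the requirement is four operations: evaluate a log-probability in each direction and draw fair samples in each direction.

```python
# Sketch of the required interface for a generative model of galaxy images.
class GenerativeImageModel:
    def log_prob_data(self, image, params):
        """log p(image | params): evaluate the probability of data given parameters."""
        raise NotImplementedError

    def log_prob_params(self, params, image):
        """log p(params | image), up to a constant: parameters given data."""
        raise NotImplementedError

    def sample_data(self, params, rng):
        """Generate a fair artificial image given parameters."""
        raise NotImplementedError

    def sample_params(self, image, rng):
        """Generate fair posterior samples of parameters given an image (e.g. by MCMC)."""
        raise NotImplementedError
```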
Masao Sako (Penn) gave the astro seminar today, talking about supernova cosmology, now and in the near future. Afterwards we discussed the possibility that precise cosmological measurements may be reaching their maximum possible precisions, some limited by cosmic variance and some by complicated, effectively random systematic issues (unknown unknowns, as it were).
Before and at lunch, CampHogg discussed the chapters and title for Hou's PhD thesis, which is about probabilistic inference in the exoplanet domain. This subject of discussion was inspired by Hou's extremely rapid write-up of his new MCMC method (which he is calling multi-canonical, but which we now think is probably a misnomer).
After the Spitzer Oversight Committee meeting came to a close, I got lunch with Heather Knutson (Caltech), during which I picked her brain about things exoplanet. She more-or-less agreed with my position that if any eta-Earth-like calculation is going to be precise, it will have to find new, smaller planets, beyond what was found by Petigura and company (in their pay-walled article, and recent press storm). That said, she was skeptical that CampHogg could detect smaller planets than anyone else has.
Knutson described to me a beautiful project in which she is searching the hot Jupiters for evidence of more massive, outer planets, and she says she does find them. That is, she is building up evidence that migration is caused by interactions with heavier bodies. She even finds that more massive hot Jupiters tend to have even more massive long-period siblings. That's pretty convincing.