In a day of talks—I had to leave early a talk by Ruth Angus (Oxford) on stellar ages from Kepler to see a talk in Computer Science by Alekh Agarwal (Microsoft) on distributed and clever machine-learning algorithms and engineering—Bekki Dawson (Berkeley) showed us results on the statistics of exoplanet populations and tests of planetary migration scenarios. She showed that the continuity of tidal circularization models (conservative exoplanet flow, in some sense) makes a prediction for the distribution of planets in the period–eccentricity plane, and that the prediction is falsified strongly by Kepler. There is not yet any good model for the formation and migration of exoplanets that explains the main features of the data, but there are many possible effects, and it is possible that all of them are acting at some level. Her talk suggested scores of other projects that could and should be done. On a side note, she showed convincingly that you can measure eccentricities with Kepler data alone, and that there are strong asymmetries that make it much more likely that you will see a faster-than-circular transit than a slower-than-circular transit when the transiting-planet orbit is eccentric. She also showed some transit timing work by our own Foreman-Mackey.
In the morning, Andrea Maccio (MPIA) gave a nice talk about making star-formation and AGN feedback in numerical simulations of galaxy formation much more realistic. He is very negative about the possibility that AGN can stop star formation: AGN emit jets, which punch through the central part of the ISM and don't really heat all of the necessary volume. He also showed that the dark-energy equation of state can affect galaxy evolution, not because the dark energy has any direct effect on how structures form, but because it changes the timing of gravitational collapse and star-formation episodes. That got the audience all in a tizzy: Can we infer the dark-energy equation of state from the radial distribution of stars in a galaxy? The answer is no: It looks like this effect is strongly degenerate with feedback parameters, but it is super-intriguing.
Late in the day, Foreman-Mackey and I checked in with Dawson about the in-transit noise. She has some systems that show a very strong effect of higher noise in transit than out. We suggested improvements to the statistical tests, and Dawson will try to move to smaller and smaller planets tomorrow.
Bekki Dawson (Berkeley) showed up for a few days, in which she will work with Foreman-Mackey and Angus and me and also give our astro seminar. We discussed her observation (not new) that the photometric noise in a star's light curve is often higher in transit than out of transit. This is explained by there being strong surface features on the star that get selectively blocked by the planet but affect the out-of-transit light curve only in an integrated way. Dawson's concern is that this excess noise is not being incorporated in completeness and sensitivity tests with the Kepler data; that is, we might be being over-optimistic about our small-planet samples. We made a plan to test this, and, if necessary, make more realistic artificial planet injections for better completeness and sensitivity studies.
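The test we have in mind could be sketched roughly as follows; the robust-scatter statistic and the bootstrap against out-of-transit points are my own illustrative choices here, not necessarily what Dawson's actual procedure will be.

```python
import numpy as np

def excess_in_transit_noise(flux, in_transit, n_boot=1000, seed=42):
    """Compare point-to-point scatter in versus out of transit.

    flux: detrended light curve (1D array); in_transit: boolean mask.
    Returns the ratio of in-transit to out-of-transit robust scatter,
    and a bootstrap p-value for seeing a ratio that large by chance.
    """
    rng = np.random.default_rng(seed)

    def robust_sigma(x):
        # 1.4826 * MAD approximates the Gaussian standard deviation
        return 1.4826 * np.median(np.abs(x - np.median(x)))

    sig_in = robust_sigma(flux[in_transit])
    sig_out = robust_sigma(flux[~in_transit])
    ratio = sig_in / sig_out

    # Null hypothesis: in-transit points are statistically just like
    # out-of-transit points; resample them from the out-of-transit set.
    n_in = int(in_transit.sum())
    null_ratios = np.empty(n_boot)
    for i in range(n_boot):
        sample = rng.choice(flux[~in_transit], size=n_in, replace=True)
        null_ratios[i] = robust_sigma(sample) / sig_out
    p_value = np.mean(null_ratios >= ratio)
    return ratio, p_value
```

The median-based scatter estimate is there so that the transits themselves (and any remaining outliers) don't masquerade as excess noise.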
This morning was a "hack session" in the Josh-Peek-organized reading group on astrostatistics up at Columbia called #NYCastroML. I helped Kelle Cruz (CUNY) and others build a mixture-of-Gaussians model of the WISE point-source catalog. Well, we just worked on single-Gaussian models, but we are getting ready to do multiple Gaussians. We got a long way, although, as usual, most of the session was really "data munging" and not "data analysis". That's not uncommon: Getting the data into a consistent, useful, checked state is often the hardest part of the project. And, as far as I know, there is no "theory" of this part of data analysis. It just is.
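The single-Gaussian step we got to is only a few lines of numpy; here is a minimal sketch, with synthetic stand-in data in place of the real WISE columns (which is where all the munging went):

```python
import numpy as np

def fit_gaussian(x):
    """Maximum-likelihood single-Gaussian fit to (N, D) data."""
    mu = x.mean(axis=0)
    resid = x - mu
    cov = resid.T @ resid / len(x)  # ML estimate (divides by N, not N-1)
    _, logdet = np.linalg.slogdet(cov)
    mahal = np.einsum("ni,ij,nj->n", resid, np.linalg.inv(cov), resid)
    loglike = -0.5 * np.sum(mahal + logdet + x.shape[1] * np.log(2 * np.pi))
    return mu, cov, loglike

# Synthetic stand-in for two WISE colors; the real work starts from the
# point-source catalog after the munging.
rng = np.random.default_rng(42)
colors = rng.normal([0.2, 0.1], [0.1, 0.05], size=(1000, 2))
mu, cov, loglike = fit_gaussian(colors)
```

Going from this to multiple Gaussians means wrapping an expectation–maximization loop (per-point responsibilities, then weighted versions of these same mean and covariance updates) around this fit.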
I discussed with Perez-Giz and with Foreman-Mackey the creation of quasi-periodic oscillator Gaussian Process models for stars. We want to start by fitting with a damped simple harmonic oscillator kicked by a white-noise source (this has an exact solution as a Gaussian Process, worked out by Goodman and I am sure many others before him). We then want to evolve to non-harmonic oscillators that are better at modeling pulsating stars, but still with tunable incoherence. Applications include: Making the study of quasi-periodic oscillations in compact objects more probabilistic, and more faithful and complete searches for RR Lyrae stars. One problem is that you can't arbitrarily modify your covariance function (kernel function) and still obey the rule that it must construct only positive-definite covariance matrices. I don't really see how to deal with that problem in a simple way, since there is no simple test one can apply to a kernel function that tells you whether or not it is permitted.
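There is no simple symbolic test, but you can at least check a candidate kernel numerically by building the covariance matrix it implies on a grid of times and looking at the eigenvalues. A minimal sketch: the damped cosine below has the right qualitative shape for the driven, damped oscillator (and is a legal kernel, being a product of the Ornstein–Uhlenbeck kernel and a cosine kernel), while the truncated parabola is a known-invalid modification (its Fourier transform goes negative), so it fails the check.

```python
import numpy as np

def min_eigenvalue(kernel, t):
    """Smallest eigenvalue of the covariance matrix that `kernel`
    builds on the sample times t; a significantly negative value
    means the kernel is not positive definite and is not permitted."""
    tau = np.abs(t[:, None] - t[None, :])
    return np.linalg.eigvalsh(kernel(tau)).min()

# Qualitatively SHO-like: exponential damping times a cosine.
def damped_cosine(tau, amp=1.0, ell=3.0, omega=2.0):
    return amp * np.exp(-tau / ell) * np.cos(omega * tau)

# A tempting but illegal "modification": a compactly supported parabola.
def truncated_parabola(tau):
    return np.maximum(0.0, 1.0 - tau ** 2)

t = np.linspace(0.0, 10.0, 200)
```

This only tests the kernel on one sampling, of course; it can catch a broken kernel but can never prove one is valid.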
I spent the morning at Yale with Geha and Bonaca. Bonaca is finishing a great paper that shows (duh) that fitting tidal-stream data generated in a clumpy, time-dependent potential with a smooth, time-independent potential is biased. She shows, however, that it is not more biased than expected for other kinds of data (that is, non-stream data). One interesting thing about her work is that the closest smooth potential to the realistic cosmological simulation she is using is something triaxial, which is not integrable, which pleases the anti-action-angle devil inside of me.
I ate lunch with Debra Fischer's (Yale) exoplanet group (thanks!), discussing data analysis. Fischer is a big believer (as am I) that when she builds new hardware, she should do so in partnership with data analysis and software teams, so that hardware and software choices can inform one another. There is no separation between hardware and software any more. We discussed some simple examples, mainly on the experimental design side rather than strictly hardware, but the point applies there too.
I am not sure it counts as "research" but I spent part of the morning touring the future space of the NYU Center for Data Science, currently occupied by Forbes Magazine. The space is excellent, and can be renovated to meet our needs beautifully. The real question is whether we can understand our needs faster than the design schedule.
In the afternoon, Foreman-Mackey and I discussed the difference between frequentist and Bayesian estimates of parameter uncertainty. There are regimes in which they agree, and we couldn't quite agree on what those are. Certainly in the super-restrictive case of Gaussian-shaped likelihood function (Gaussian-shaped in parameter space), and (relatively) uninformative priors, the uncertainty estimates converge. But I think the convergence is more general than this.
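The super-restrictive case is easy to demonstrate numerically. In the toy problem below (my own illustration, not anything Foreman-Mackey and I wrote down), the data are Gaussian with known variance and unknown mean: the frequentist standard error of the maximum-likelihood estimator and the standard deviation of the flat-prior posterior come out identical.

```python
import numpy as np

# Data: N Gaussian draws with known sigma and unknown mean mu.
rng = np.random.default_rng(3)
sigma, N = 2.0, 100
x = rng.normal(5.0, sigma, N)

# Frequentist: the ML estimator is the sample mean, with standard
# error sigma / sqrt(N).
mu_hat = x.mean()
freq_err = sigma / np.sqrt(N)

# Bayesian: with a flat prior on mu, evaluate the posterior by brute
# force on a grid; its standard deviation should match freq_err.
mu_grid = np.linspace(mu_hat - 2.0, mu_hat + 2.0, 4001)
dmu = mu_grid[1] - mu_grid[0]
logpost = -0.5 * np.sum((x[:, None] - mu_grid[None, :]) ** 2, axis=0) / sigma ** 2
post = np.exp(logpost - logpost.max())
post /= post.sum() * dmu                      # normalize numerically
post_mean = np.sum(mu_grid * post) * dmu
post_std = np.sqrt(np.sum((mu_grid - post_mean) ** 2 * post) * dmu)
```

The agreement here is exact because the likelihood is exactly Gaussian in the parameter; the interesting question is how far the agreement survives as the likelihood departs from that shape.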
Research returned to my life briefly today when I got a chance to catch up with Price-Whelan. He modified his stream-fitting likelihood function to have the stars in the tidal streams depart their progenitor near the classical Lagrange points, instead of just anywhere near the tidal radius. This change was not complicated to implement (here's to good code), makes his model more realistic, and (it turns out) improves the constraints he gets on the gravitational potential, both in precision and accuracy. So it is all-win.
I spent the day assembling my zeroth-draft material for my Atlas into one file, including plates, captions, and some half-written text. It is a mess, but it is in one file. All the galaxies are shown at the same plate scale and same exposure, calibration, and stretch. One of the hardest problems to solve (and I solved it boringly) is how to split up the page area into multiple panels (all at same plate scale) to show the full extents of all the galaxies without too much waste. Another hard problem was going through the data for the millionth time, looking at outliers and understanding what's wrong in each case. It is a mess, but as I am writing this I am uploading to the web to deliver it to my editor (Gnerlich at Princeton University Press).
I worked all day trying to get a zeroth draft of all the plates for my Atlas together for delivery to my editor; I have a deadline today. I got a set of plates together, but I couldn't get it assembled with captions and the partially written text I have into one big document. That is, I failed. I will have to finish on Monday.
I had a full day hiding at home and working; I spent it on my Atlas. I got multi-galaxy plates close to fully working and worked on the automatic caption generation. On the multi-galaxy plate issue, one problem is deciding how big to make each image: Galaxies scaled to the same half-light or 90-percent radius look very different when presented at the same exposure time, brightness, and contrast (stretch). One of the points of my Atlas is to present everything in a quantitatively comparable way, so this is a highly relevant issue.
I spent some quality time with Ekta Patel tracking down a bug in our visualization of output from The Tractor. In the end it turned out to be a think-o (as many hard-to-find bugs are) in which I had put in some calibration information as if it calibrated flux, when in fact it calibrates intensity. The flux vs intensity issues have got me many times previously, so I might learn it some day. As my loyal reader knows (from this and this, for example) I feel very strongly that an astronomical image is a measure of intensity not flux! If you don't know what I mean by that, probably it doesn't matter, but the key idea is that intensity is the thing that is preserved by transparent optics; it is the fundamental quantity.
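A toy example of why the distinction bites (the numbers here are made up): intensity is per unit solid angle, so it is the same for the same sky regardless of pixel scale, whereas the flux landing in a pixel depends on the pixel's solid angle. A calibration vector that converts counts to intensity, applied as if it converted counts to flux, introduces an error that scales with pixel size.

```python
# Same patch of sky seen by two cameras with different pixel scales.
intensity = 3.0              # e.g. nanomaggies per square arcsec (made up)
pixel_scale_a = 0.396        # arcsec / pixel (SDSS-like)
pixel_scale_b = 0.262        # arcsec / pixel (DECam-like)

# Per-pixel flux is intensity times the pixel solid angle.
flux_a = intensity * pixel_scale_a ** 2
flux_b = intensity * pixel_scale_b ** 2

# Dividing the per-pixel flux by the solid angle recovers the same
# intensity from both cameras; comparing the fluxes directly does not.
wrong_ratio = flux_a / flux_b   # != 1, even though it's the same sky
```

Transparent optics preserve intensity, so the intensity is what the two cameras agree on; the per-pixel fluxes disagree by the ratio of pixel solid angles.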
I spent the morning up at Columbia, in part to participate in the reading group set up by Josh Peek (Columbia) to work through the astroML book. We covered probability distributions and how to compute and sample from them, along with some frequentist correlation tests (which are not all that useful, in my opinion).
The other reason to be up at Columbia was to discuss the streams projects with Price-Whelan. I encouraged him strongly to write the abstract of our paper; I think the earlier you draft an abstract the better; it scopes the project and makes sure everything important gets said. The abstract is the most important part of the paper, so it makes sense to spend a lot of time working on it. We agreed to follow the (annoying but useful) Astronomy & Astrophysics template of Context, Aims, Method, Results. This guidance is great (and, in the end, you don't have to include the headings explicitly, at least if you aren't publishing in A&A).