I discussed with Perez-Giz and with Foreman-Mackey the creation of quasi-periodic oscillator Gaussian Process models for stars. We want to start by fitting with a damped simple harmonic oscillator kicked by a white-noise source (this has an exact solution as a Gaussian Process, worked out by Goodman and, I am sure, many others before him). We then want to evolve to non-harmonic oscillators that are better at modeling pulsating stars, but still with tunable incoherence. Applications include making the study of quasi-periodic oscillations in compact objects more probabilistic, and making searches for RR Lyrae stars more faithful and complete. One problem is that you can't arbitrarily modify your covariance function (kernel function): it must generate only positive-definite covariance matrices. I don't see a simple way to deal with that, since there is no simple test one can apply to a kernel function that tells you whether or not it is permitted.
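To make the positive-definiteness constraint concrete, here is a small numpy sketch (my own illustration, with made-up parameters, not anyone's production code): an exponentially damped cosine, which is a valid stationary kernel because its power spectrum is a pair of Lorentzians and hence nonnegative, versus an innocent-looking boxcar "kernel" that fails the test.

```python
import numpy as np

def damped_cosine_kernel(dt, period=5.0, tau=20.0):
    # Exponentially damped cosine: a valid stationary kernel, qualitatively
    # like a stochastically driven, damped harmonic oscillator.
    return np.exp(-np.abs(dt) / tau) * np.cos(2.0 * np.pi * dt / period)

t = np.linspace(0.0, 50.0, 200)
dt = t[:, None] - t[None, :]

K_good = damped_cosine_kernel(dt)
eig_good = np.linalg.eigvalsh(K_good).min()  # nonnegative (up to roundoff)

# An innocent-looking modification that is NOT a valid kernel: a boxcar in
# time lag ("fully correlated within a lag of 1, uncorrelated beyond").
K_bad = (np.abs(dt) < 1.0).astype(float)
eig_bad = np.linalg.eigvalsh(K_bad).min()    # negative: not positive definite
```

The point is that you cannot tell the two apart by staring at the functional forms; you have to know (or check numerically, on your particular time grid) that the covariance matrices come out positive definite.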
I spent the morning at Yale with Geha and Bonaca. Bonaca is finishing a great paper that shows (duh) that fitting a smooth, time-independent potential to tidal-stream data generated in a clumpy, time-dependent potential gives biased results. She shows, however, that it is not more biased than expected for other kinds of data (that is, non-stream data). One interesting thing about her work is that the smooth potential closest to the realistic cosmological simulation she is using is triaxial, which is not integrable, which pleases the anti-action-angle devil inside of me.
I ate lunch with Debra Fischer's (Yale) exoplanet group (thanks!), discussing data analysis. Fischer is a big believer (as am I) in building new hardware in partnership with data-analysis and software teams, so that hardware and software choices can inform one another. There is no separation between hardware and software any more. We discussed some simple examples, mainly on the experimental-design side rather than strictly hardware, but the point applies there too.
I am not sure it counts as "research" but I spent part of the morning touring the future space of the NYU Center for Data Science, currently occupied by Forbes Magazine. The space is excellent, and can be renovated to meet our needs beautifully. The real question is whether we can understand our needs faster than the design schedule.
In the afternoon, Foreman-Mackey and I discussed the difference between frequentist and Bayesian estimates of parameter uncertainty. There are regimes in which they agree, and we couldn't quite agree on what those are. Certainly in the super-restrictive case of a Gaussian-shaped likelihood function (Gaussian-shaped in parameter space) and (relatively) uninformative priors, the uncertainty estimates converge. But I think the convergence is more general than this.
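A minimal numerical sketch of the restrictive case (all numbers and variable names are mine, invented for illustration): for Gaussian data with known sigma and a flat prior on the mean, the Bayesian posterior width and the frequentist standard error coincide.

```python
import numpy as np

rng = np.random.default_rng(42)
sigma, n = 2.0, 50
y = rng.normal(3.0, sigma, size=n)

# Frequentist: maximum-likelihood estimate of the mean and its standard error.
mu_hat = y.mean()
se_freq = sigma / np.sqrt(n)

# Bayesian: flat (uninformative) prior, so the posterior is just the
# normalized likelihood, evaluated here on a fine grid.
mu_grid = np.linspace(mu_hat - 5 * se_freq, mu_hat + 5 * se_freq, 2001)
loglike = -0.5 * np.sum((y[:, None] - mu_grid[None, :]) ** 2, axis=0) / sigma**2
post = np.exp(loglike - loglike.max())
post /= post.sum()
mu_post = (mu_grid * post).sum()
sd_post = np.sqrt(((mu_grid - mu_post) ** 2 * post).sum())
# In this Gaussian-likelihood, flat-prior case, sd_post matches se_freq.
```

Of course this is the easy case; the interesting question is how far outside it the agreement survives.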
Research returned to my life briefly today when I got a chance to catch up with Price-Whelan. He modified his stream-fitting likelihood function to have the stars in the tidal streams depart their progenitor near the classical Lagrange points, instead of just anywhere near the tidal radius. This change was not complicated to implement (here's to good code), makes his model more realistic, and (it turns out) improves the constraints he gets on the gravitational potential, both in precision and accuracy. So it is all-win.
I spent the day assembling my zeroth-draft material for my Atlas into one file, including plates, captions, and some half-written text. It is a mess, but it is in one file. All the galaxies are shown at the same plate scale and the same exposure, calibration, and stretch. One of the hardest problems to solve (and I solved it boringly) is how to split up the page area into multiple panels (all at the same plate scale) to show the full extents of all the galaxies without too much waste. Another hard problem was going through the data for the millionth time, looking at outliers and understanding what's wrong in each case. It is a mess, but as I am writing this I am uploading it to the web to deliver it to my editor (Gnerlich at Princeton University Press).
I worked all day trying to get a zeroth draft of all the plates for my Atlas together for delivery to my editor; I have a deadline today. I got a set of plates together, but I couldn't get it assembled with captions and the partially written text I have into one big document. That is, I failed. I will have to finish on Monday.
I had a full day hiding at home and working; I spent it on my Atlas. I got multi-galaxy plates close to fully working and worked on the automatic caption generation. On the multi-galaxy plate issue, one problem is deciding how big to make each image: Galaxies scaled to the same half-light or 90-percent radius look very different when presented at the same exposure time, brightness, and contrast (stretch). One of the points of my Atlas is to present everything in a quantitatively comparable way, so this is a highly relevant issue.
I spent some quality time with Ekta Patel tracking down a bug in our visualization of output from The Tractor. In the end it turned out to be a think-o (as many hard-to-find bugs are): I had put in some calibration information as if it calibrated flux, when in fact it calibrates intensity. The flux-vs-intensity issue has gotten me many times before, so maybe I will learn it some day. As my loyal reader knows (from this and this, for example) I feel very strongly that an astronomical image is a measure of intensity, not flux! If you don't know what I mean by that, probably it doesn't matter, but the key idea is that intensity is the thing that is preserved by transparent optics; it is the fundamental quantity.
I spent the morning up at Columbia, in part to participate in the reading group set up by Josh Peek (Columbia) to work through the astroML book. We covered probability distributions and how to compute and sample from them, along with some frequentist correlation tests (which are not all that useful, in my opinion).
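As an example of the sampling material we covered, here is the standard inverse-CDF (inverse-transform) trick, sketched for an exponential distribution (the particular distribution and parameters are my choice, not taken from the book):

```python
import numpy as np

# Inverse-CDF sampling: if u ~ Uniform(0, 1) and F is an invertible CDF,
# then F^{-1}(u) is distributed with CDF F.
# Example: exponential distribution with rate lam, F(x) = 1 - exp(-lam * x),
# so F^{-1}(u) = -log(1 - u) / lam.
rng = np.random.default_rng(0)
lam = 1.5
u = rng.uniform(size=100_000)
x = -np.log(1.0 - u) / lam

# Sanity checks: sample mean -> 1/lam, sample variance -> 1/lam**2.
```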
The other reason to be up at Columbia was to discuss the streams projects with Price-Whelan. I encouraged him strongly to write the abstract of our paper; I think the earlier you draft an abstract the better; it scopes the project and makes sure everything important gets said. The abstract is the most important part of the paper, so it makes sense to spend a lot of time working on it. We agreed to follow the (annoying but useful) Astronomy & Astrophysics template of Context, Aims, Method, Results. This guidance is great (and, in the end, you don't have to include the headings explicitly, at least if you aren't publishing in A&A).
In a low-research day, Hou gave a lunch-time talk on Monte-Carlo methods for doing difficult integrals in probabilistic inference, and Sven Kreiss (NYU) continued a short course he is doing on the Higgs discovery. Hou's methods are all forms of adaptive importance sampling, even his nested-sampling method; there really are no new ideas there. His main advantages over other equally simple methods are, for one, that his method works well on our specific problem and, for another, that he produces an unbiased estimate of the integral along with an uncertainty.
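For reference, here is the plain (non-adaptive) importance-sampling idea that underlies all of this, in a minimal sketch with a toy integrand and proposal of my own choosing (this is not Hou's code): the sample mean of the weights is an unbiased estimate of the integral, and the sample variance gives the uncertainty.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # Toy unnormalized "likelihood times prior": a unit Gaussian bump,
    # whose integral is sqrt(2 * pi).
    return np.exp(-0.5 * x**2)

# Proposal density q: a wider Gaussian, N(0, s^2), easy to sample from.
s = 2.0
x = rng.normal(0.0, s, size=200_000)
q = np.exp(-0.5 * (x / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

# Importance weights and the unbiased integral estimate with uncertainty.
w = f(x) / q
Z_hat = w.mean()                          # estimate of the integral of f
Z_err = w.std(ddof=1) / np.sqrt(len(w))   # Monte-Carlo standard error
```

The adaptive versions differ in how they tune the proposal q, but the estimate-plus-error structure is the same.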
In the astrophysics seminar today, Castorina (SISSA) gave a nice talk about the influence of neutrinos on the evolution of large-scale structure. The nice thing is that even the known neutrino species have to have an observable effect (and do; see Planck); every dark-matter halo (concentration) should have another neutrino halo around it. Immediately after the talk we argued about the detectability of this neutrino halo; it could in principle be detected with weak lensing, but it is hard; detecting the neutrinos more directly is even harder. I predicted (and Castorina didn't have an immediate answer) that there should be large variance (especially at low masses) in the fraction of the mass of each condensed dark-matter halo that is in neutrinos. My prediction—which is qualitative and ill-thought-out—is based on intuitions about dynamics with multiple fluids with different initial velocity distributions. Maybe there are some collapsed objects that are neutrino dominated! They would be rare, but maybe exist somewhere?
Nick Konidaris (Caltech) appeared out of nowhere to join us for lunch. He is working on inexpensive spectroscopic follow-up systems for surveys like LSST and SKA. We discussed various things, including the fact that extended emission is detected just as well by small telescopes as by large ones, that there are software-vs-hardware trade-offs, and that many of the gut decisions we make in designing experiments could be made objectively. We tentatively agreed to try to write a short, pedagogical note about the first of these.
In a small amount of research time today, I read various bits of writing from students. Leslie Rogers's (Caltech) student Ellen Price (Caltech) has written a very nice paper about Fisher Information in exoplanet transit observations. I am going to recommend to them that they make the paper more useful for experimental design by specifically accounting for the differences between photon noise and read noise, the latter of which penalizes you for taking shorter exposures. I say this because their main conclusion is (naively) "take shorter exposures".
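The read-noise point can be made in a few lines; here is a toy version (all numbers hypothetical, invented by me, and not from the Price et al. paper): split a fixed total exposure time into n sub-exposures. Photon (Poisson) noise is indifferent to the split, but a per-readout read noise penalizes many short exposures.

```python
import numpy as np

def total_snr(n_exposures, total_time=1000.0, rate=50.0, read_noise=10.0):
    # Hypothetical source: `rate` photons/s over a total time `total_time`,
    # split into n_exposures reads, each adding `read_noise` (in counts).
    t = total_time / n_exposures
    counts = rate * t                        # photons per sub-exposure
    var = counts + read_noise**2             # Poisson + read-noise variance
    snr_one = counts / np.sqrt(var)
    return np.sqrt(n_exposures) * snr_one    # combine independent exposures

# With read noise, fewer/longer exposures win; with read_noise=0, the split
# doesn't matter at all.
```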
My own student MJ Vakili has written some nice short papers on noise bias and PSF bias in weak-lensing surveys. We are trying to understand these biases to strengthen our arguments that probabilistic reasoning (read: Bayes) will solve all problems. What is hard to understand specifically is under what conditions the current plans (coming from the point-estimation proponents) to measure and then correct for biases will work, and under what conditions they will fail. If we can understand that, we can make a very strong argument for probabilistic reasoning. Of course, all methods are destined to fail if they make bad assumptions, so the key thing for any weak-lensing program is to test the assumptions as thoroughly as possible.
My trip to Penn State got cancelled for weather, which is bad, not least because Eric Ford (PSU) and I have lots to discuss about getting our exoplanet characterization and discovery software programs funded, and not least because there are many people there with whom I was looking forward to valuable conversations. That said, I just got two days added into my schedule in which I had cancelled all regular meetings. I spent my time today working on my Sloan Atlas of Galaxies project, making mock-ups of some of the plates to check that my sizing, spacing, borders, and so on all make sense.
Fired up by yesterday's discussion of Kepler photometry, Foreman-Mackey forced me to pair-code some PSF (pixel-convolved, of course) inference and photometry for the Kepler pixel-level data. The idea is to make a slightly more flexible model for the PSF than what the Kepler team has created, permit variation in stellar centroid (from exposure to exposure) and detector sensitivity (from pixel to pixel) and see if that—plus some super-fast optimization with ceres—can crush this. Yes, yet another distraction from the distractions that are distracting us from the research that is stopping us from writing papers. Also, yet another project that is turning into The Tractor. Soon all my projects will be clean-rooms of that one project.
Because Hou has finished and submitted his paper on the (apparently aptly named) FML, today's MCMC meeting devolved into a discussion of how Kepler does its simple aperture photometry (or SAP, if you like the acronym circus). By the end of the meeting, we had formulated a question in fundamental astronomy which I am not sure has been answered in the literature: If you know your point-spread function (PSF or pixel-response function or PRF or pixel-convolved point-spread function or PCPSF) perfectly but you have a pointing jitter, such that each exposure is not at precisely the same pointing, with what method or weights should you "co-add" your pixel values to get the most consistent possible photometry from exposure to exposure? The rules are that you aren't permitted to re-centroid the star in every frame; you have to do the same operation on every frame. There are only five people on this Earth who have ever wanted to know the answer to this question, but it now turns out that I am one of them.
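For what it's worth, here is a toy 1-D numerical setup for playing with the question (entirely my own construction, with made-up numbers; it illustrates the question, it does not answer it): a unit-flux star is observed many times with small pointing jitter, and we apply the same linear weights to every exposure, as the rules require.

```python
import numpy as np

rng = np.random.default_rng(3)
pix = np.arange(-8, 9)        # pixel centers (1-D toy detector)
sigma_psf = 1.2               # pixel-convolved PSF width, in pixels (made up)

def star(dx):
    # Noiseless pixelized unit-flux star centered at sub-pixel offset dx.
    g = np.exp(-0.5 * ((pix - dx) / sigma_psf) ** 2)
    return g / (sigma_psf * np.sqrt(2.0 * np.pi))

jitter = rng.normal(0.0, 0.1, size=500)      # sub-pixel pointing errors
data = np.array([star(dx) for dx in jitter])

def frac_scatter(w):
    phot = data @ w          # the SAME linear operation on every frame
    return phot.std() / phot.mean()

w_aperture = (np.abs(pix) <= 3).astype(float)    # top-hat aperture sum
w_matched = star(0.0) / (star(0.0) ** 2).sum()   # mean-pointing matched filter

scatter_ap = frac_scatter(w_aperture)
scatter_mf = frac_scatter(w_matched)
# In this particular noiseless toy, the wide aperture is more stable under
# jitter than the matched filter, which is optimal for S/N but not for
# exposure-to-exposure consistency.
```

The real trade-off presumably depends on the jitter amplitude, the pixel noise, and the PSF sampling, which is exactly why the question seems non-trivial.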
In FML news, we asked Hou to see if we can compute the FML for exoplanet models in the Kepler data, even for target stars that don't have any high-significance planet transits. That, we anticipate, will be a very hard problem to solve.