#astrohackny, week N+1

The day started at #astrohackny with Foreman-Mackey and me arguing about convolutions of Gaussians. The question is: Consider a model (probability of the data given parameters) with two (linear) parameters of importance and 150 (linear) nuisance parameters. There is a very weak Gaussian prior on the nuisance parameters. How do you write down the marginalized likelihood such that you only have to do a 2x2 least squares, not a 152x152 least squares? I had a very strong intuition about the answer but no solid argument. Very late at night I demonstrated that my intuition is correct, by the method of experimental coding. Not very satisfying, but my abilities to complete squares with high-dimensional linear operators are not strong!
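That experimental-coding demonstration is easy to reproduce. The claim is that marginalizing out linear nuisance parameters with a Gaussian prior is equivalent to folding that prior into the noise covariance, after which only a 2x2 least squares remains. Here is a minimal numpy sketch under made-up design matrices and scales (hypothetical numbers, not the actual problem):

```python
import numpy as np

rng = np.random.default_rng(42)
N, P = 300, 150                    # data points, nuisance parameters
A = rng.normal(size=(N, 2))       # design matrix for the 2 parameters of interest
M = rng.normal(size=(N, P))       # design matrix for the nuisance parameters
sigma, lam = 1.0, 10.0            # noise rms, (weak, i.e. broad) prior rms on nuisances
b_true = np.array([1.5, -0.7])
y = (A @ b_true + M @ rng.normal(scale=lam, size=P)
     + rng.normal(scale=sigma, size=N))

# Route 1: the big (152x152) regularized least squares over all parameters.
X = np.hstack([A, M])
Lam_inv = np.diag(np.r_[np.zeros(2), np.full(P, 1.0 / lam**2)])  # flat prior on b
theta = np.linalg.solve(X.T @ X / sigma**2 + Lam_inv, X.T @ y / sigma**2)
b_big = theta[:2]

# Route 2: fold the nuisance prior into the noise kernel; only a 2x2 solve remains.
K = sigma**2 * np.eye(N) + lam**2 * (M @ M.T)
Kinv_A = np.linalg.solve(K, A)
b_small = np.linalg.solve(A.T @ Kinv_A, Kinv_A.T @ y)

assert np.allclose(b_big, b_small)
```

Route 2 never builds the 152-parameter system; in practice the N-by-N kernel solve would itself be accelerated with the Woodbury identity, but the numerical agreement of the two routes is the point here.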

Taisiya Kopytova (MPIA) is visiting NYU for a couple of months, to work on characterizing directly imaged extra-solar planets. We discussed the simultaneous fitting of photometry and spectroscopy, one of my favorite subjects! I, of course, recommended modeling the calibration (or, equivalently, continuum-normalization) issues simultaneously with the parameter estimation. We also discussed interpolation (of the model grid) and MCMC sampling and the likelihood function.

At Pizza Lunch at Columbia, Chiara Mingarelli (Caltech) talked about the Pulsar Timing Array and its project to detect the stochastic background of gravitational waves. The beautiful thing about the experiment is that it detects the motion of the Earth relative to the pulsars, not the individual motions of the pulsars, and it does so using time correlations in timing residuals as a function of angle between the pulsars. The assumption is that the ball of pulsars is far larger than the relevant wavelengths, and that different pulsars are causally unconnected in time. Interesting to think about the "multiple hypotheses" aspects of this with finite data.


vary all the exposure times!

Ruth Angus showed up for a few days, and we talked through the first steps of making an argument for taking time-series data with variable exposure times. We all know that non-uniform spacing of data helps with frequency recovery in time series; our new intuition is that non-uniform exposure time will help as well, especially for very high frequencies (short periods). We are setting up tests now with Kepler data, but with an eye to challenging the TESS mission to bite a big, scary bullet.

After complaining for the millionth time about PCA (and my loyal reader—who turns out to be Todd Small at The Climate Corporation—knows I love to hate on the PCA), Foreman-Mackey and I finally decided to fire up the robust PCA or PCP method from compressed sensing (not the badly-re-named "robust PCA" in the astronomy literature). The fundamental paper is Candès et al.; the method has no free parameters, and the paper includes ridiculously simple pseudo-code. It looks like it absolutely rocks, and obviates all masking or interpolation of missing or bad data!
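The pseudo-code really is simple. Here is a hedged numpy sketch of Principal Component Pursuit by the alternating-directions method, following the fixed-penalty recipe in the Candès et al. paper; the function names, the toy data, and the convergence settings are my own choices, not anything from our actual light-curve experiments:

```python
import numpy as np

def shrink(X, tau):
    """Entrywise soft-thresholding (the L1 prox)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_shrink(X, tau):
    """Singular-value soft-thresholding (the nuclear-norm prox)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(shrink(s, tau)) @ Vt

def pcp(M, tol=1e-7, max_iter=1000):
    """Principal Component Pursuit: split M into low-rank L plus sparse S.

    Uses the parameter-free choices lam = 1/sqrt(max(m, n)) and
    mu = m*n / (4 * ||M||_1) advocated in the Candes et al. paper."""
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n))
    mu = m * n / (4.0 * np.abs(M).sum())
    L = np.zeros_like(M); S = np.zeros_like(M); Y = np.zeros_like(M)
    for _ in range(max_iter):
        L = svd_shrink(M - S + Y / mu, 1.0 / mu)
        S = shrink(M - L + Y / mu, lam / mu)
        resid = M - L - S
        Y += mu * resid
        if np.linalg.norm(resid) <= tol * np.linalg.norm(M):
            break
    return L, S

# Toy problem: low-rank "light curves" plus a sparse handful of bad data.
rng = np.random.default_rng(1)
truth = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 80))  # rank 3
corrupt = np.zeros_like(truth)
mask = rng.random(truth.shape) < 0.05                          # 5% bad pixels
corrupt[mask] = rng.normal(scale=10.0, size=mask.sum())
L, S = pcp(truth + corrupt)
```

On this toy input, `L` recovers the rank-3 component and `S` soaks up the corrupted entries, with no masking or interpolation step anywhere.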

At lunch, Gabriele Veneziano (Paris, NYU) spoke about graviton–graviton interactions and causality constraints. Question that came up in the talk: If a particle suffers a negative time delay (like the opposite of a gravitational time delay), can you necessarily therefore build a time machine? That's something to dine out on.


The Climate Corporation

I spent a bit of today at The Climate Corporation, hosted by former astronomer Todd Small. He told me about things they work on, which include field-level predictions for farmers about rainfall and sunlight, precise prediction and instructions for irrigation and fertilization, and advice about crop density (distances between seeds). He said they get data feeds from Earth observing but also from soil testing and even the farm machinery itself (who knew: combine harvesters produce a data feed!). A modern piece of large-scale farm equipment is precisely located on its path by GPS, makes real-time decisions about planting or whatever it is doing, and produces valuable data. They even have a tractor simulator in the house for testing systems.

We talked for a bit about the challenging task of communicating with clients (farmers, in this case) about probabilistic information, such as measurement uncertainties and distributions over predictions. This is another motivation for developing an undergraduate data-science educational program: It doesn't matter what industry you are in, you will need to be able to understand and communicate about likelihoods and posterior probabilities. I gave an informal seminar to a part of the climate modeling team about our uses of Gaussian Processes in exoplanet discovery.


Caltech for a day

[On a research break this week; hence the lack of posts.]

I spent the day at Caltech Astronomy, with a dense schedule! Judy Cohen and I talked about RR Lyrae she is finding in the Milky Way halo and the challenge of getting a completeness function for statistical inferences. Pete Mao described to me the fiber positioners they are building for the Prime Focus Spectrograph multi-object spectrograph for Subaru. They have 50 seconds to get 2500 actuators in place (to a tolerance of 10-ish microns). Leslie Rogers and Ellen Price told me about Price's senior-thesis project to work out the composition constraints on short-period planets from tidal stress and the requirement that they not break up or mass-transfer. I was surprised to learn that there are some planets so close in that the tidal constraints rule out a range of compositions.

Nick Konidaris showed me new spectrographs being built in the high bay, and we talked about choices for spectral resolution. Adi Zitrin showed me amazing images of massive galaxy clusters he is using as telescopes to magnify very high-redshift galaxies; he has stunning examples, and also some nice techniques for building mass models. He does a great job of explaining arc multiplicity and morphology.

At lunch, students Rebecca Jensen-Clem and Michael Bottom told me about projects in high-contrast imaging, and Allison Strom and Sirio Belli told me about measuring the physical properties of galaxies in the redshift range 2 to 3. I tried to pitch my data-driven approaches at them: In the former area you might think about learning the actuator commands given the wavefront data directly from an optimization (with, presumably, the quality of the science image encapsulated in the objective function). In the latter, you might think about making a graph of galaxy spectra, where galaxies are joined by edges in the graph if their spectra are similar under any (reasonable) assumption about dust extinction. The students were (rightly) suspicious about both options!

Adam Miller and I discussed correct uses of machine learning in astronomy (since he is a great practitioner), and I once again pitched at him the possibility that we could try to replace their random-forest classifier in the time domain with a generative model of all stellar variable types. It would be hard, but exceedingly satisfying to do that. We discussed training-set imbalance and some clever ideas he has about combating it.

I asked Heather Knutson about how to get our (new) single transits we are finding in Kepler followed up with radial-velocity spectrographs. She made a great point about our future population inferences: Anything we infer from single transits makes a prediction for radial-velocity surveys and we should try to encourage the radial-velocity groups to confirm or deny in parallel. I discussed cadence and exposure-time issues for the Zwicky Transient Facility with Eric Bellm and visualization and interface for the LSST data with George Helou. I gave the Astrophysics Colloquium there about The Cannon and data-driven models (by my definition) and the day ended with a great dinner with Tom Soifer, Chuck Steidel, George Djorgovski, and Judy Cohen. It was a great trip!


cosmology, now with less approximation; RNADE

In the morning, Mike O'Neil (NYU), Iain Murray (Edinburgh), Malz, Vakili, and I met for coffee to talk cosmology and cosmological inference. We discussed the linear algebra of making full (non-approximate) likelihood calls for cosmic background inference, which includes determinant and solve calls for enormous (dense) matrices. A few days ago, Wandelt made the (perhaps obscure) comment to me that he did his Gibbs sampling in the CMB to avoid making determinant calls. I finally understood this: The determinant is big and expensive in part because it is like an integral or marginalization over the nuisance parameters (which are the initial conditions or phases). If you can compute the determinant, you get something you can think of as being a marginalized likelihood function. The Gibbs sampling is part of a system that does posterior sampling, so it does the marginalization but doesn't return the amplitude of the likelihood function. If you need a marginalized likelihood function, you can't do it with the Gibbs sampling. Murray once again made the simple point that the posterior (for the cosmological parameters) will in general be much easier to compute and represent than the likelihood function (the probability of the data given the cosmological parameters) because the former is a much smaller-dimensional thing. That's a deep point!
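The expensive object in that discussion—a full, non-approximate Gaussian likelihood call—can be made concrete. In the dense-matrix case, a single Cholesky factorization supplies both the solve and the log-determinant; this is the generic formula, not the actual CMB machinery, and the cost is what makes enormous matrices painful:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def log_marginal_likelihood(y, K):
    """ln p(y) for y ~ N(0, K).

    Needs both a solve (K^-1 y) and a log-determinant; the determinant
    is the piece that acts like the marginalization over the Gaussian
    latent variables (the initial conditions or phases), and it is the
    piece a Gibbs sampler never has to compute."""
    c, low = cho_factor(K, lower=True)          # one O(N^3) factorization ...
    alpha = cho_solve((c, low), y)              # ... reused for the solve
    logdet = 2.0 * np.sum(np.log(np.diag(c)))   # ... and for the determinant
    return -0.5 * (y @ alpha + logdet + len(y) * np.log(2.0 * np.pi))
```

A posterior sampler can sidestep the `logdet` term entirely (it cancels in parameter-independent normalizations of the conditionals), which is exactly why Gibbs sampling avoids determinant calls but cannot hand back the amplitude of the marginalized likelihood.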

That deep point also played into his talk in the afternoon about his RNADE method (and code) to use deep learning to represent very complex densities, or estimate a density given a sampling. One of his application areas is a project to obtain posterior constraints on the mass of the Milky Way given (as data) the properties of the Magellanic Clouds. The theory in this case is represented by a large number of numerical simulations, but Murray wants a continuous density to do inference. RNADE is really impressive technology.


challenging inferences

Group meeting featured discussion from our two distinguished visitors this week, Brendon Brewer (Auckland) and Iain Murray (Edinburgh). Brewer described how he and Foreman-Mackey intend to re-do some exoplanet populations inference with something that he calls "joint-space sampling" and in the context of what you might call "likelihood-free inference" (although Brewer objects to even that label) or "approximate Bayesian computation" (a label I despise, because aren't all Bayesian computations approximate?).

The idea is that we have the counts of transiting systems with 0, 1, 2, or more transiting planets. What is the true distribution of planetary system multiplicities implied by those counts? Brewer calls it joint-space sampling because to answer this question requires sampling in the population parameters, and all the parameters of all the individual systems. The result of the posterior inference, of course, depends on everything we assume about the systems (radius and period distributions, detectability, and so on). One point we discussed is what is lost or gained by restricting the data: In principle we should always use all the data, as opposed to just the summary statistics (the counts of systems). That said, the approach of Brewer and Foreman-Mackey is going to be fully principled, subject to the (strange) constraint that all you get (as data) are the counts.

Murray followed this up by suggesting a change to the ABC or LFI methods we usually use. Usually you do adaptive sampling from the prior, and reject samples that don't reproduce the data (accurately enough). But since you did lots of data simulations, you could just learn the conditional probability of the parameters given the data, and evaluate it at the value of the data you have. In general (his point is), you can learn these conditional probabilities with deep learning (as his RNADE code and method does routinely).
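Murray's suggestion can be illustrated on a toy problem where the true posterior is known in closed form. In this sketch (my own toy numbers, and nothing to do with RNADE itself) the conditional happens to be Gaussian, so a linear fit of parameter on data stands in for the deep-learning density estimator; the contrast with rejection ABC is that every simulation gets used, not just the tiny fraction that happen to match the data:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma, x_obs = 0.5, 0.8

# Simulate from the joint: theta ~ N(0, 1) prior, x ~ N(theta, sigma^2).
theta = rng.normal(size=200_000)
x = theta + sigma * rng.normal(size=theta.size)

# Rejection ABC: keep only simulations whose data nearly match x_obs.
keep = np.abs(x - x_obs) < 0.02
abc_mean = theta[keep].mean()

# Murray's alternative: learn p(theta | x) from *all* the simulations,
# then evaluate at x_obs.  Here the conditional is Gaussian, so a linear
# fit plus a residual scatter is the whole conditional density.
slope, intercept = np.polyfit(x, theta, 1)
cond_mean = slope * x_obs + intercept
cond_std = (theta - (slope * x + intercept)).std()

# Analytic posterior for this toy model, for comparison.
true_mean = x_obs / (1.0 + sigma**2)
true_std = (sigma**2 / (1.0 + sigma**2)) ** 0.5
```

In a real LFI problem the conditional is not Gaussian and the linear fit would be replaced by a flexible conditional density estimator; that is exactly where RNADE comes in.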

Murray also told us about a project with Huppenkothen and Brewer to make a hierarchical generalization of our Magnetron project published here. In this, they hope to hierarchically infer the properties of all bursts (and the mixture components or words that make them up). The challenge is to take the individual-burst inferences and combine them subsequently. That's a common problem here at CampHogg; the art is deciding where to "split" the problem into separate inferences, and how to preserve enough density of samples (or whatever) to pass on to the data-combination stage.

This was followed by Malz telling us about his just-started project to infer the redshift density of galaxies given noisy photometric-redshift measurements. We realized in the conversation that we should think about multiple representations for surveys to output probabilistic redshift information, including quantile lists, which are fixed-length representations of pdfs. As I was saying that we need likelihoods but most surveys produce posteriors, Murray pointed out that in general posteriors are much easier to produce and represent (very true) so we should really think about how we can work with them. I agree completely.
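The quantile-list idea is easy to sketch: store a fixed set of interior quantiles of the posterior samples, and interpolate between them to recover an approximate CDF. A minimal version (the function names and the ten-quantile choice are mine, purely for illustration):

```python
import numpy as np

def to_quantile_list(samples, n_quantiles=10):
    """Compress posterior samples into a fixed-length quantile list."""
    qs = np.linspace(0.0, 1.0, n_quantiles + 2)[1:-1]  # interior quantiles only
    return qs, np.quantile(samples, qs)

def quantile_cdf(z, qs, zq):
    """Reconstruct an approximate CDF by interpolating the quantile list."""
    return np.interp(z, zq, qs)

# A toy photometric-redshift posterior, compressed to 10 numbers:
rng = np.random.default_rng(0)
samples = rng.normal(1.2, 0.1, size=50_000)
qs, zq = to_quantile_list(samples)
```

The appeal for a survey is that every object gets the same-size representation regardless of how complicated its posterior is, and the reconstruction error lives in the tails rather than in the body of the pdf.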


#astrohackny, day N

At #astrohackny, Ben Weaver (NYU) showed a huge number of binary-star fits to the APOGEE individual-exposure heliocentric radial velocity measurements. He made his code fast, but not yet sensible, in that it treats all possible radial-velocity curves as equally likely, when some are much more easily realized physically than others. In the end, we hope that he can adjust the APOGEE shift-and-add methodology and make better combined spectra.

Glenn Jones (Columbia) and Malz showed some preliminary results building a linear Planck foreground model, using things that look a lot like PCA or HMF. We argued out next steps towards making it a probabilistic model with more realism (the beam and the noise model) and more flexibility (more components or nonlinear functions). Also, the model has massive degeneracies; we talked about breaking those.


light echoes

I hung out in the office while Tom Loredo (Cornell), Brendon Brewer (Auckland), Iain Murray (Edinburgh), and Huppenkothen all argued about using dictionary-like methods to model a variable-rate Poisson process or density. Quite an assemblage of talent in the room! At lunch-time, Fed Bianco talked about light echoes. It is such a beautiful subject: If the light echo is a linear response to the illumination, I have this intuition that we could (in principle) infer the full three-dimensional distribution of all the dust in the Galaxy and the time-variable illumination from all the sources. In principle!


radical fake TESS data

This past week, the visit by Zach Berta-Thompson (MIT) got me thinking about possible imaging surveys with non-uniform exposure times. In principle, at fixed bandwidth, there might be far more information in a survey with jittered exposure times than in a survey with uniform exposure times. In the context of LSST I have been thinking about this in terms of saturation, calibration, systematics monitoring, dynamic range, and point-spread function. However, in the context of TESS the question is all about frequency content in the data: Can we do asteroseismology at frequencies way higher than the inverse mean exposure time if the exposure times are varied properly? This weekend I started writing some code to play in this sandbox, that is, to simulate TESS data but with randomized exposure times (though identical total data output).
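One way to see why jitter buys frequency content: under strictly uniform sampling, frequencies separated by the inverse cadence are exactly degenerate (aliases), and jittering the sample times breaks that degeneracy. This toy sketch shows the sampling-time version of the effect; the exposure-time version that matters for TESS additionally sinc-suppresses high frequencies by averaging over the integration window, which this deliberately ignores:

```python
import numpy as np

rng = np.random.default_rng(7)
n, dt = 200, 1.0              # mean cadence of one time unit
f_true = 1.3                  # well above the 0.5 Nyquist of the uniform grid
f_alias = f_true - 1.0 / dt   # = 0.3, an exact alias under uniform cadence

def model_rms_misfit(t, f):
    """Residual rms of the best-fit sinusoid at frequency f to a noiseless
    signal generated at f_true and sampled at times t."""
    y = np.sin(2 * np.pi * f_true * t)
    X = np.column_stack([np.sin(2 * np.pi * f * t),
                         np.cos(2 * np.pi * f * t)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sqrt(np.mean((y - X @ coef) ** 2))

t_uniform = dt * np.arange(n)
t_jitter = t_uniform + rng.uniform(-0.4 * dt, 0.4 * dt, size=n)

# Uniform cadence: the alias fits the data perfectly (total degeneracy).
# Jittered cadence: only the true frequency fits; the alias is ruled out.
```

Under the uniform grid, `model_rms_misfit(t_uniform, f_alias)` is zero to machine precision, while under the jittered times only `f_true` fits; that is the intuition for the randomized-exposure-time simulations in one plot-free calculation.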


probabilistic density inference, TESS cosmics

Boris Leistedt (UCL) showed up for the day; we discussed projects for the future when he is a Simons Postdoctoral Fellow at NYU. He even has a shared Google Doc with his plans, which is a very good idea (I should do that). In particular, we talked about small steps we can take towards fully probabilistic cosmology projects. One is performing local inference of large-scale structure to hierarchically infer (or shrink) posterior information about the redshift-space positions of objects with no redshift measurement (or imprecise ones).

Zach Berta-Thompson (MIT) reported on his efforts to optimize the hyper-parameters of my online robust statistics method for cosmic-ray mitigation in the TESS spacecraft. He found values for the two hyper-parameters such that, for some magnitude ranges, my method beats his simple and brilliant middle-eight-of-ten method. However, because my method is more complicated, and because its success seems to depend (possibly non-trivially) on his (somewhat naive) TESS simulation, he is inclined to stick with middle-eight-of-ten. I asked him for a full and complete search of the hyper-parameter space but agreed with his judgement in general.


online, on-board robust statistics

Zach Berta-Thompson (MIT) showed up at NYU today to discuss the on-board data analysis performed by the TESS spacecraft. His primary concern is cosmic rays: With the thick detectors in the cameras, cosmic rays will affect a large fraction of pixels in a 30-minute exposure. Fundamentally, the spacecraft takes 2-second exposures and co-adds them on-board, so there are lots of options for cosmic-ray mitigation. The catch is that the computation all has to be done on board with limited access to RAM and CPU.

Berta-Thompson showed that a "middle-eight-of-ten" strategy (every 10 sub-exposures average all but the highest and the lowest) does a pretty good job. I proposed something that looks like the standard "iteratively reweighted least squares" algorithm, but operating in an "online" mode where it can only see the last few elements of the past history. Berta-Thompson, Foreman-Mackey, and I tri-coded it in the Center for Data Science studio space. The default algorithm I wrote down didn't work great (right out of the box) but there are two hyper-parameters to tune. We put Berta-Thompson onto tuning.
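The middle-eight-of-ten strategy is essentially a one-liner per pixel. A sketch (my own function name; it assumes the number of sub-exposures divides evenly by ten):

```python
import numpy as np

def middle_eight_of_ten(subexposures):
    """Co-add blocks of 10 sub-exposures for one pixel, dropping the
    highest and lowest value in each block: cheap on-board cosmic-ray
    rejection with fixed, tiny RAM and CPU cost."""
    x = np.asarray(subexposures, dtype=float)
    blocks = x.reshape(-1, 10)       # assumes length is a multiple of 10
    s = np.sort(blocks, axis=1)
    return s[:, 1:-1].mean(axis=1)   # average the middle eight

# One pixel's 2-second sub-exposures: constant flux plus one cosmic-ray hit.
flux = np.full(20, 100.0)
flux[3] += 5000.0                    # the cosmic ray
coadds = middle_eight_of_ten(flux)   # the hit is rejected entirely
```

The appeal for flight software is clear: a sort over ten numbers and a mean, no iteration, no tunable hyper-parameters, which is exactly what the online iteratively-reweighted alternative has to beat.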


dissertation transits

Schölkopf, Foreman-Mackey, and I discussed the single-transit project, in which we are using standard machine learning and a lot of signal injections into real data to find single transits in the Kepler light curves. This is the third chapter of Foreman-Mackey's thesis, so the scope of the project is limited by the time available! Foreman-Mackey had a breakthrough on how to split the data (for each star) into train, validate, and test such that he could just do three independent trainings for each star and still capture the full variability. False positives remain dominated by rare events in individual light curves.

With Dun Wang, we discussed the GALEX photon project; his job is to see what photon-level information is available at MAST, if any, especially anything about the focal-plane coordinates at which the photons were detected (as opposed to celestial-sphere coordinates). This was followed by lunch at Facebook with Yann LeCun.


Simons Center for Data Analysis

Bernhard Schölkopf arrived for a couple of days of work. We spent the morning discussing radio interferometry, Kepler light-curve modeling, and various things philosophical. We headed up to the Simons Foundation to the Simons Center for Data Analysis for lunch. We had lunch with Marina Spivak (Simons) and Jim Simons (Simons). With the latter I discussed the issues of finding exoplanet rings, moons, and Trojans.

After lunch we ran into Leslie Greengard (Simons) and Alex Barnett (Dartmouth), with whom we had a long conversation about the linear algebra of non-compact kernel matrices on the sphere. This all relates to tractable non-approximate likelihood functions for the cosmic microwave background. The conversation ranged from cautiously optimistic (that we could do this for Planck-like data sets) to totally pessimistic, ending on an optimistic note. The day ended with a talk by Laura Haas (IBM) about infrastructure (and social science) she has been building (at IBM and in academic projects) around data-driven science and discovery. She showed a great example of drug discovery (for cancer) by automated "reading" of the literature.



I took a physical-health day today, which means I stayed at home and worked on my students' projects, including commenting on drafts, manuscripts, or plots from Malz, Vakili, and Wang.


robust fitting, intelligence, and stellar systems

In the morning I talked to Ben Weaver (NYU) about performing robust (as in "robust statistics") fitting of binary-star radial-velocity functions to the radial velocity measurements of the individual exposures from the APOGEE spectroscopy. The goal is to identify radial-velocity outliers and improve APOGEE data analysis, but we might make a few discoveries along the way, à la what's implied by this paper.

At lunch-time I met up with Bruce Knuteson (Kn-X) who is starting a company (see here) that uses a clever but simple economic model to obtain true information from untrusted and anonymous sources. He asked me about possible uses in astrophysics. He also asked me if I know anyone in US intelligence. I don't!

In the afternoon, Tim Morton (Princeton) came up to discuss things related to multiple-star and exoplanet systems. One of the things we discussed is how to parameterize or build pdfs over planetary systems, which can have very different numbers of elements and parameters. One option is to classify systems into classes, and build a model of each (implicitly qualitatively different) class and then model the full distribution as a mixture of classes. Another is to model the "biggest" or "most important" planet first; in this case we build a model of the pdf over the "most important planet" and then deal with the rest of the planets later. Another is to say that every single star has a huge number of planets (like thousands or infinity) and just most of them are unobservable. Then the model is over an (effectively) infinite-dimensional vector for every system (most elements of which describe planets that are unobservable or will not be observed any time soon).

This infinite-planet descriptor sounds insane, but there are lots of tractable models like this in the world of non-parametrics. And the Solar System certainly suggests that most stars probably do have many thousands of planets (at least). You can guess from this discussion where we are leaning. Everything we figure out about planet systems applies to stellar systems too.