I continued working on "hot Jupiters as clocks", building a system to find clock drifts. I find that the precision with which a drift is detected is slightly less than I found for the precision of the clock phase or offset; this makes sense, because any drift model has multiple parameters. It took me a while to get good sampling to work (with emcee, and a sensible burn-in schedule) but it now seems to. I can now start searching for signals. Heather Knutson (Caltech) gave me a couple of suggestions.
It is back to the non-research grind now, but that didn't stop me from stealing a couple hours to work on Foreman-Mackey and my "stellar clocks" project. My question was: How precisely can you phase up or measure the time using a transiting hot Jupiter. That is, how good is a hot Jupiter as a clock? If good, we could use it for all kinds of fun things.
I started with fake data. As my loyal reader knows, I am against using fake data; all it provides is a functional test of your code! But over the years I have come around: After all, it provides a functional test of your code! It also permits you to check your results with absolutely known input assumptions; that is, it provides sanity checking on your estimates of (in this case) the precision you expect.
I find, with extremely optimistic assumptions (about the period, depth, duration, ingress time, gravitational dynamics, and, of course, photon noise), that a high-quality transiting hot Jupiter in Kepler normal-cadence data could, over the lifetime of the now-over Kepler mission, provide a time standard that is good to a few tenths of a second. This agrees with the order-of-magnitude calculation I had made myself earlier and suggests that we might be able to do some timing experiments.
While in transit back to NYC, I worked on writing more about Ness and my quadratic-in-tags model for stellar spectra in our manuscript, and she got the quadratic model to work from end to end. It looks extremely promising.
Today was my last day in Heidelberg for 2014. Sad day! I spent most of my research time working with Ness on our data-driven models of stars. We got the code to go from first-order (linear) linear fitting to second-order (quadratic) linear fitting, and vowed that we would never go to third order! We will use a Gaussian Process before we do that. We discussed outliers—both outlier stars and outlier pixels—and how to deal with them while still retaining computational tractability, good probability of convergence to a good optimum (or good sampling prospects), and the likelihood principle. We don't have a good solution yet, although I have ideas in my old document on fitting a line. We also discussed next steps on the path to paper 1, including especially making figures that demonstrate the properties and qualities of the individual-wavelength least-square fits we are doing.
At the end of the day, I was noticing that this project is a perfect example of the new discipline of "Data Science", about which I am supposed to have opinions: It was an idea we both had, but I didn't know enough about the spectroscopy, and Ness didn't know enough about the methods available. Neither of us could have done this project without the collaboration (nor without significant amounts of computation).
MPIA Milky-Way group meeting was excellent today. Ross Fadely talked about our star–galaxy separation projects and showed some very encouraging signs that a supervised regression system to predict stellarity using the object colors might work. We are training not on the labels (star or galaxy) but on the morphological inputs that are used to make those labels. Nicholas Martin (Strasbourg) showed incredible maps of the outskirts of M31 split by metallicity. He is building the maps with a proper forward model of the data. They are gorgeous. He is also experimenting with showing the results of the MCMC sampling as a noisy movie. Cool. Greg Stinson (MPIA) showed both some incredible movies of simulations, and also some interesting results on the delay time distribution for Type Ia SNe. It appears that the direct measurements of the delay time distribution from stellar populations and the inferences of the delay times from chemical abundances might be in conflict. We argued about possible resolutions.
Late in the day, Ness and I closed the loop on the data-driven APOGEE spectral modeling: We used our data-driven model to "learn" the tags (temperature, logg, and metallicity) for a new star. We cheated—we tagged a star that was used in the training—but (a) we rocked it with accurate tag recovery, and (b) we can do cross-validation as a more conservative test. It was an exciting moment.
Malte Kuhlmann (MPI-IS), Rix, and I had a great, long meeting with MPIA radio astronomers Simon Bihr, Carl Ferkinhoff, and Laura Zschaechner, to discuss how radio astronomy works and what issues could benefit from some new technology coming from probabilistic inference or machine learning. We located a few points of great interest. One is the problem of flagging data affected by RFI, where it is a time-consuming task but nonetheless there is now a huge quantity of well flagged data to "learn" from. Great place for supervised classification. A second area is calibration of phase delays: The calibrator stars are used to get calibration estimates, which are interpolated to the science data. This interpolation is neither probabilistic nor uses calibration information latent in the science data nor knows about spatial nor time coherence. I am thinking Gaussian Processes, as my loyal reader can imagine. A third area is, of course, scene reconstruction, which is Kuhlmann's current project. Everyone recognizes the great value of CLEAN but also its limitations: It doesn't propagate noise, it contains many heuristically set parameters, and it doesn't optimize a well-specified scalar objective.
In other news, Rix, Ness and I figured out, with our data-driven model of APOGEE spectra, how to label new spectra, and how to identify the most relevant spectral regions to use as metallicity "indices". We outlined paper one and put the code and outline up on github (tm).
Malte Kuhlmann (MPI-IS) arrived in Heidelberg today. We looked at his results on reconstructing astronomical scenes from radio interferometry data. He has done both optimization with a regularized likelihood, and full sampling in image space using hybrid Monte Carlo. He is using an "L1" regularization, which is equivalent to a biexponential prior probability in pixel amplitudes. He finds that he can do a good job of reproducing the input scenes for fake data, when he samples, but that the optimization takes you to a strange place. This is a consequence of the point that in a very high dimensional space, with non-Gaussian priors, the posterior mode an be very far from the posterior mean. We discussed noise propagation for interferometry and also the advantages that a sampling system has over the conventional heuristic methods (such as CLEAN).
Melissa Ness (MPIA) and I worked on adding intrinsic scatter into her model of APOGEE spectra. We need the intrinsic scatter to be understood when we turn her model around and use it to label new spectra for which we don't know the temperature and metallicity. The results we got looked pretty bad, so we have work to do tomorrow.
Foreman-Mackey re-started exoplanet search (in Kepler) last night, and on the train home from Tübingen we looked at some of the early false positives. We realized that the way we are constructing the search leads to some non-trivial issues: For performance reasons, we slide a single "box" model across the lightcurve once, and then we do search by linking together likelihood increments from the one-d box slide in periodic groups. The issue is: When we slide the box, we permit the box to take any transit depth, so the system can link together in a periodic group a set of transit-like excursions that all have different depths. A true exoplanet transit signal has a (nearly) fixed depth.
The solution (and this brings back memories of GALEX) is to combine the periodic box results both with and without a constraint that the depths must be identical. The identical model must beat the non-identical model for something to make the search cut. The one complexity is that we have to penalize model complexity: The likelihood of the non-uniform depths model is always better at maximum, because it has much more freedom.
After talking to Muandet (MPI-IS) and Lopez-Paz (MPI-IS) about false-positive classification using supervised methods yesterday, Foreman-Mackey sent them some lightcurve information from injected exoplanets and from some random locations, just to give them an idea of what to expect when we start a search for exoplanets in earnest. They classified like the wind. Meanwhile, Schölkopf, Foreman-Mackey, and I discussed the problem of exoplanet search.
The point of our de-trending or modeling of stellar + s/c variability is to make search more effective, and (later) characterization more accurate. Since search is the object, we should optimize with search in mind. We encouraged Foreman-Mackey to start up the search in earnest. One insane thing we realized as we got this started was that the Gaussian-Process-based likelihood function we are using (which uses a Gaussian Process in time or with only one feature, time, as input) can be generalized (in the way described yesterday) to incorporate thousands of other stars as features, with almost no performance hit. That is, we can do the stellar and exoplanet transit modeling simultaneously, even when the stellar model uses information from thousands of other stars! This realization set Foreman-Mackey coding.
With Wang we investigated the possibility that we could do our pixel-level modeling of the imaging data not one Kepler quarter at a time but instead one Kepler month at a time (the data break naturally into months at the DSN data download interruptions). It appears that we can: The full quarter doesn't have that much information in it relevant to short predictions than the month. This potentially speeds up the code by a large factor, because most aspect of our inferences scale up with the size of the data. This is super-true for non-parametric inferences (as with GPs) but also with parametric inferences, both because of internal dot products and also because the bigger the data set the bigger (usually) cross-validation says your feature list should be.
In re-reading yesterday's post, I found it strange to hear myself say that the model was over-fitting stellar variability and then we decided to make the model far more flexible! Today we decided that we don't yet have the technology (well, perhaps not the patience, since we want to detect exoplanets asap) to fully separate stellar variability from spacecraft-induced issues, or at least we would have to do something that pooled much more data to do it—we wouldn't be able to work on one light-curve at a time. So we de-scoped to exoplanet science and decided that we would try to fit out everything except the exoplanet transits. This is not unlike what others are doing, except that we are trying to be extremely principled about not letting information in the data about any exoplanet transits "leak" into our modeling of the variability of the light-curve. We are doing this with a censoring or a train-and-test framework.
Because we decided to eradicate all variability—spacecraft and stellar—we had Wang work on auto-regressive models, in which the past and future of the star is used to predict the present of the star. The first results are promising. We also had Foreman-Mackey put all the other stars into the Gaussian Process predictions we are making. This means we are are doing Gaussian Process regression and prediction with thousands of ambient dimensions (features). That seems insane to me, but Schölkopf insists that it will work—being non-parametric, GPs scale in complexity with the number of data points, not the number or size of the features. I will believe it when I see it. The curse of dimensionality and all that!
In the afternoon, we had discussions with Krik Muandet (MPI-IS) and David Lopez-Paz (MPI-IS) about false-positive classification for exoplanet search using supervised methods and a discussion with Michael Hirsch (MPI-IS) about non-parametric models for the imaging PSF. More on the former tomorrow, I very much hope!
Today Foreman-Mackey and I arrived in Tübingen to work with Schölkopf. On arrival, we got Dun Wang on the phone, because our trip to MPI-IS is designed to make huge progress on Wang's recalibration of the Kepler satellite detector pixels, using the variations that are found in common across stars. The way Schölkopf likes to say it is that we are capitalizing on the causal structure of the problem: If stars (or, really, pixels illuminated by stars) co-vary it must be because of the telescope, since the stars are causally disconnected. The goal of our work on this is to increase the sensitivity of the satellite to exoplanet transits.
We opened the day with two questions: The first was about why, despite this causal argument, we seem to be able to over-fit or fit out stellar variability. We are being careful with the data (using a train-and-test framework) to ensure that no information about the short-term variability of the star near any putative transit is leaking into the training of the predictive model. My position is that it is because our set of prediction stars might span the full basis of anything a star can do. We are using thousands of stars as features!
The second question was about why, in our residuals, there seems to be some trace of the spacecraft variability. We don't know that for sure, but just at an intuitive visual level it looks like the fitting process is not only not removing the spacecraft, but actually increasing the calibration "noise". We started Wang on tests of various hypotheses, and put Foreman-Mackey on trying models that are far more flexible than Wang's purely linear model.
On a walk up the mountain to MPIA, Rix and I discussed quantitatively the question of whether stellar oscillations could be used as clocks, to measure kinematics, pulsar timing, stellar companions, and so on. We got results that make me optimistic. I started to write some words about doing this in practice using the Kepler data. It all hinges on the quality factor—the coherence of the oscillation modes—and the signal-to-noise ratio at which they are detected in the full data set.
Later in the day, Fadely showed early results from his machine learning applied to SDSS colors (to predict imaging morphology, in service of star–galaxy separation). He is basically going to throw everything he has at that. Mykytyn finished making more robust estimates of quasar lightcurve covariance properties and we started to discuss a hierarchical inference on the hyperparameters of those.
Yesterday and today, Mykytyn and I worked out the generalization of iteratively reweighted least squares (a method for down-weighting outliers that is slightly, slightly better than sigma-clipping) to the case of a Gaussian Process noise model. We take the standard algorithm and replace the residual scaled by the noise with the deviation from the conditional mean prediction scaled by the sum in quadrature of the observational noise and the GP conditional variance. We implemented this simple algorithm on Mykytyn's quasar time-series data and it looks like it works beautifully. Now he is going to see if our inferences look better with the re-weighted uncertainties.
At Galaxy Coffee today, Cristina Garcia (MPIA) spoke about the quasar–galaxy cross-correlation at redshift four. The quasars have huge clustering amplitude, so the cross-correlation is expected to have large amplitude too. She uses a clever galaxy selection technique and finds consistency between the data and expectations. Also at Galaxy Coffee, Željko Ivezić (UW) showed that there are simple situations in which the mean of the data is not the best estimator of the location of a distribution function. In one example, not only was there a better estimator, but it improved with more data as 1/N (rather than 1/sqrt(N)). He strongly advocated using likelihood functions to generate estimators.
Fadely showed up in Heidelberg today, and we discussed star–galaxy classification improvements that could help PanSTARRS and LSST. As our loyal reader will recall, one issue with supervised methods for the problem of star–galaxy clasification is that we don't have any good sets of labels, even when we have HST data, or spectroscopy; there is a lot of label noise at the faint end, and good labels only exist for very small slices of the total population. We realized today that we could try to predict not the labels, but the things that go into making the labels, like "psf minus model" in SDSS or roundness and sharpness and so on. We vowed to give it a try. Key idea: Predict data, not latent variables!
We decided to simplify Ben Johnson's calibration code by switching the spectroscopic likelihood function to log space, where the calibration vector is additive. This makes the noise non-Gaussian, of course, but at very high signal-to-noise (where he is working), the Gaussian approximation is not severe. At Milky Way group meeting, Wilma Trick (MPIA) talked about observational signatures of spiral structure in the kinematics of gas and stars, and Alexia Lewis (UW) talked about the star-formation histories of patches of M31 from PHAT data. Her method is very simple (fitting the distribution of main-sequence stars); it could be applied to the Hipparcos data on the local neighborhood, which would be super-cool.
Three of my conversations today were the following: Zhitai Zhang (MPIA) is working out, for a given stellar position in three-space and radial velocity, what kinds of orbits might that star be on, given a Milky Way gravitational potential model and the unknown proper motion. Wilma Trick (MPIA) is generating toy stellar position and velocity data on a toy Milky Way disk and censoring it with a toy selection function and then trying to infer the toy model parameters from the simulated data. Ben Johnson (UCSC) is burning in his simultaneous fit of star-cluster model and spectrophotometric calibration vector. In all three cases, the code is doing something unexpected, despite passing lots of local sanity checks. This is like the difference between unit testing and functional testing: Sometimes the whole system is hard to understand. Especially when it is a complex combination of physics, statistics, and code approximations. Is the puzzling behavior a problem with the code, or is it a problem with the customer (me), who just can't handle the truth.
Zhang sometimes finds that the true stellar orbit is low probability given the data, even when there are no observational errors on the four observed phase-space coordinates. We think this has something to do with the orbital phase being such that the (unobserved) transverse velocity is unusually large (or small). Trick is finding biased disk structure parameters, even when she is generating and fitting with the same model family and the data are perfect; we suspect either the toy-data generation or else some treatment of the (trivial) censoring. Johnson is finding that the cluster amplitude or mass is unconstrained by the spectral data, even when the calibration uncertainty is set to something finite. In each case, we can't quite tell whether the behavior is evidence for problems with the code or problems with our concepts (that is, is it a bug or a think-o?). All I know how to do in these cases is come up with sensible sanity checks on the code. We suggested sampling a larger range in transverse velocity for Zhang, making even simpler toy data for Trick, and looking at every stage in the likelihood calculation for Johnson. The latter lasted well into the night and I believe in the end it was a think-o not a bug.
Ben Johnson (UCSC) showed up in Heidelberg today and we discussed his project to obviate spectrophotometric calibration. He said, however, that we are not permitted to give the paper the same title as this blog post. However, I think it is apt, because he has a working system that suggests that inferences on spectra can proceed with a tiny bit of photometric data and no serious spectrophotometric calibration just as well as they can proceed when fully calibrated. Really the no-calibration method is better, in fact, because it models the spectrograph simultaneously with the spectral information; that is, it is makes less rigid assumptions about the hardware in general. The idea is to fit the calibration "vector" along with the parameters of interest, and take up the fiddly bits with a Gaussian Process. It seems to work. We spent part of the day moving a bit of furniture, to make the GP contribute multiplicatively (rather than additively) to the signal.
My very busy week ended with a seminar on Gaussian Processes, that I gave on the blackboard, supported with some projected slides showing Foreman-Mackey's demo plots. I tried to go slowly, but there is a lot to do in a one-hour lecture on something that can fill an entire semester's course. I got great questions from the crowd.
In the morning, Željko Ivezić (UW) and I discussed LSST cadence and exposure-time issues. He showed me his "conservation laws" slide, which shows that the exposure time flows down to all sorts of constraints on the survey. We talked about the plan to split the individual (probably 30-sec) exposures into two parts. What's the difference between 15+15 and 5+25 (and so on)? The expectations about the noise and the point-spread function are both related to the exposure time, and there are cosmic rays and moving objects. We tried to spec out some simple projects to analyze the relevant problems. Simplest: What are the implications for point-source position and flux measurements, in the limit that the sky is static? Even this question is not trivial.