I spent the day refactoring my diffraction-imaging code. I turned a set of disjoint functions into a callable class, making the functions easy to maintain, reducing and caching repeat calculations, and making it easier to parallelize. Also making it more “Pythonic”. The funny thing about refactoring is that it always takes less time than you think it will. At the same time, you end up (after a day of work) with a piece of code that does exactly what the old one did, only better (in some nebulous sense). By the end of the day, I had success.
I spent spare moments during the break writing code to do diffraction microscopy with few-photon and single-photon images. It works! The estimator has the properties I expect. You could, in principle, do diffraction imaging with many images taken at unknown angles, each of which contains only a single photon! Sounds crazy? Just the asymptotic property of maximum-likelihood estimators. Of course, as the number of photons per image drops, the sensitivity of the result to assumptions (the necessarily wrong assumptions) becomes very strong.
The discussion of cryo-EM on Monday ranged into the properties of maximum-likelihood estimators. This got me thinking about the possibility of doing something that Leslie Greengard had challenged me with a month or two ago: Could we use diffraction-imaging data in which we have many exposures, each at random (that is, unknown) Euler angles, but in each of which we only have a few photons? Do we need enough photons in every image to align it confidently? My theory (born last night and started in earnest today) is the following: As long as we have a correct probabilistic model for both the orientation distribution (isotropic, perhaps) and as long as our model or representation for the three-dimensional (squared norm of the) Fourier transform contains the true function (a tall order), we should be able to use images of any quality, even single-photon images.
I spent the day writing this up in a document and planning how I would code it. A Christmas project, perhaps. Of course it is almost never that you could have either a correct model of the orientation distribution or a complete representation of the function space. But that just reminds us of the obvious point that all of physical science is impossible; that's not a show-stopper.
Today was a day of phone calls. The first was with Dan Foreman-Mackey, in which we discussed his results on single transits in the Kepler data (that is, planet transits where the orbital period is comparable to or longer than the Kepler mission lifetime). His approach is what you might call a probabilistic expert system: He detects anomalies in the light-curve and then competes various generative models for each anomaly. These models include exoplanet transits, but also many kinds of data defects and stellar events. In every case, he is using a flexible Gaussian-process noise model and permitting the hyper-parameters of that to vary, so it is computationally non-trivial. He has found many new single transits, including some in systems with known shorter-period planets.
This was followed by a call with Andy Casey, Melissa Ness, and Hans-Walter Rix, on the subject of regularizing The Cannon with prior information. Casey and I are looking at L1, which says that sparser models are more likely. Ness is looking at hand-tuning which wavelengths can be affected by which labels, using priors from physical models of stars (or, really, atomic physics). In both cases, we see that the priors have big effects, but we don't yet know what works best, and we expect it to be different for different stars and different chemical abundances.
Marina Spivak (SCDA) led a discussion of a paper about the RELION code for the reconstruction of molecular structures from cryo-EM imaging data. We discussed spatial smoothness prior pdfs (pdfs over three-dimensional spatial models, that is), which they construct in the Fourier domain. We also discussed the optimality of estimators built from marginalized likelihoods. I'm a bit unsure about the theory after Kyle Cranmer (NYU) sowed doubt in me last week. But in general we were disappointed with the RELION papers in justifying the improvement brought by their prior: They made no argument that their resulting structures are better in the domain of the data! I don't know any other way to check, test, or compare models.
Andy Casey and I got our continuum-fitting working on all the individual sub-exposures that make up the APOGEE data today. We also built our own stacking code that combines these into a high signal-to-noise spectrum for each star (not that that is ever necessary for any scientific experiment—Don't Stack Your Data tm). We were able to show that our precision improves with signal-to-noise exactly as expected, which is good and expected (and not true for many physics-model-based pipelines). We launched our large suite (a grid search) of validations to set the hyper-parameters.
Over the last few days we have realized that our model has 1.5 million parameters and more than twice that many hyper-parameters. Uh oh. So we set many of them equal and got the whole thing down to just 2. Hyper-parameters, that is. All this just shows that we ought to be able to do way better with more cleverness.
Unfortunately Casey leaves NYC this weekend, and we don't have a finished product! But we got close. We are just days away from having a full set of figures, and we have a paper half-written.
In the afternoon, a few of the SCDA members recapped things they learned at NIPS 2015 this past week. The most relevant to me right now was work by Chen and Candès on solving many quadratic equations. This is equivalent to—or contains as a special case—phase retrieval. What they have done sounds like magic, so I will have to read the paper.
The day started with a journal-club talk by Eftychios Pnevmatikakis (SCDA) about spike sorting and neuron deblending in calcium imaging. This is a kind of video-rate imaging in which you see the activity in the neurons of a live brain by fluorescence. Every technique he uses in these contexts is a technique used in astronomical image analysis, from sparse and non-negative matrix factorizations to MCMC with reversible jump. This was yet another reminder that astronomy and microscopy are very similar fields, technically.
Andy Casey and I did get our factor of 150 speed-up by including the analytic derivatives! We still have concerns about convergence, but you can do a lot more experiments along those lines when your code is 150 times faster. We switched gears today to continuum normalization. For this purpose, we built a simple linear least-square fitter with light ridge regularization (L2) to keep the matrix well conditioned. We started downloading all the individual-exposure files, because the whole point of our regularized version of The Cannon is that it will work well at much lower signal-to-noise than traditional methods.
Andy Casey and I discovered bad convergence issues and bad optimization problems in our multi-abundance Cannon model. This got me discouraged: It is a convex optimization, how hard can it be? At the end of the day, we quickly coded up analytic derivatives for the model and are hoping that these will fix our optimization problems (and vastly speed up optimization). In theory, analytic derivatives in our 17-label model should speed up our code by a factor of around 170, since the quadratic model has a lot of parameters!
At group meeting, Adrian Price-Whelan talked about chaotic models for the Ophiuchus Stream, Chang Hoon Hahn (NYU) talked about fiber collisions (our top-favorite subject at group meeting!), and Robyn Sanderson talked about the mutual information between chemical abundances and dynamical actions. In the fiber-collision discussion, we veered into the usual talk about solving all of astronomy: If we have a generative model for everything, we can deal with fiber collisions. That's not practical, but it got us talking about what a forward-modeling approach to large-scale structure might look like. I amused Blanton with the point that not any function you write down can consistently be thought of as a spatial correlation function: There are positive-definiteness requirements.
In the afternoon at Simons, there was a talk by Patrick Combettes (Paris 6) about convex optimization problems from a math perspective. He made connections—which we have been making here—between set intersection and optimization, which were interesting. He was very negative on non-convex optimization, in the sense that he doesn't think that there is much that can be said or proven about such problems, even fairly simple ones like bilinear.
In the morning, Alex Malz successfully passed his qualifying exam. He presented work on inferring the true redshift distribution given a set of noisy photometric redshifts, and proposed a thesis that extends this in various ways, including, ambitiously, to two-point functions.
In the afternoon, Andy Casey got The Cannon working with a full set of 17 labels (two stellar parameters and 15 chemical abundances). And by working, I mean training (with high signal-to-noise, well-behaved stars from APOGEE) and validation. Since this doesn't yet include our compressed-sensing regularization, Casey didn't think of this as a huge milestone, but I think that it is the highest dimensional space in which we have ever operated The Cannon. (Although I admit that we knew it would work from experiments that Melissa Ness has done.) The delivery of 15-dimensional chemical abundances for all the APOGEE stars on the red-giant branch looks feasible to me. And the labels look plausible.
John Moustakas (Siena) came in for a day of pair-coding, trying to resurrect a project we worked on way back in 2007 and 2008 with Sam Roweis (deceased) for the PRIMUS survey. The idea is to find the best subset of a galaxy spectral template list to explain a set of spectral data (and get redshifts). We cast this “best subset” problem as an integer programming problem—minimize the number of templates such that every data point is well represented by at least one template—and solved it using IBM's amazing (and closed-source, proprietary) CPLEX optimizer for non-convex problems. Today, Moustakas and I not only got our code from 2008 working again (let's hear it for readable code!), we replaced the old IDL code (that writes input files for CPLEX) with direct calls from Python to the new CPLEX Python API. It worked!
I don't want to sound like an advertisement for IBM, but CPLEX is hundreds of times faster than anything we were able (even with Roweis's help) to write ourselves, and also found consistently better solutions. These problems are simple to state but NP-Hard to solve. CPLEX is unbelievably expensive (and subject to export restrictions) but it is available for free to US academics through IBM's academic initiative.
First thing in the morning, Andy Casey and I discussed some ideas for immediately following the first Gaia data release, which may be expanding in scope, apparently. We had the idea of gathering multi-band point-source photometry for as much of the sky as possible, and then using the data release to learn the relationship between the photometry and absolute magnitude. Then we can deliver all distances for every single point source we can! With error analysis. There would be a lot of users for such a thing.
By the end of the day, Casey had the validation working to set the regularization hyperparameter at each wavelength of the compressed-sensing version of The Cannon. Everything is looking as we predicted, so we will try to reproduce all of the results from the original Cannon paper, but now with a model with more sensible model freedom. I am extremely optimistic about where this project is going.
Mid-day I had a phone call to talk about photometry in K2 Campaign 9, which is in the bulge (and hence crowded). We discussed forward modeling and more data-driven detrending methods. But most importantly, we decided (more or less; final decision tomorrow) to do a tiny, half-pixel offset (dither) at the mid-campaign break. That might not sound like much, but I think it will significantly improve the calibration we can do and substantially increase the number of things we can learn.
I am still not well, but well enough for the first time in ages to do my group meeting. It was great! Boris Leistedt (NYU) talked to us about his project to do template-based photometric redshifts but where he learns the templates too. I love this project; it is the only method I like for getting the redshift distribution in LSST past the (effective) spectroscopic magnitude limit. He gave us the philosophy and showed us some preliminary successes on simple fake data.
Andy Casey (Cambridge) talked to us about the different stellar parameter and abundance codes working in the Gaia-ESO spectroscopic survey collaboration. It is complicated! Getting reliable and valuable stellar quantities out of mutually inconsistent codes is not an easy problem. His methodologies are filled with good ideas. Brian McFee (NYU) suggested that he or we look at the crowd-sourcing literature for ideas here, and that turns out to be a very good idea. I proposed to Casey that we do some reading this coming week.
Right after group meeting, Casey diagnosed our bugs from yesterday and got the L1-regularized fitting working! We celebrated with ramen.
Foreman-Mackey called with ideas for The Cannon: He thinks we might be able to do some simple things to propagate errors or uncertainties in our labels in the training set into uncertainties in our internal parameters, and from there to uncertainties in the labels we derive for new stars. His idea is based on the “uncertainties in both directions” model in my old fitting-a-line document. He also wants to play around with a fully Bayesian Cannon. We think this will be hard, but conceivably possible. He is thinking forward to some kind of sophisticated TESS input catalog in these projects.
As for Casey and my work on the compressed-sensing version of The Cannon: We turned on optimization with our L1-regularized model and nothing seems to be working right. Diagnosis progresses.
Andy Casey (Cambridge) arrived in NYC for two weeks to work on a compressed-sensing version of The Cannon. We talked about the design matrix, which is built from functions of the stellar labels. We also talked about how to do the cross-validation to both set the hyper-parameters (which are regularization strengths) and also check the prediction accuracy, without double-using the data. We are adopting ideas from Foreman-Mackey, who also encouraged us to think about propagating label uncertainties. We keep deciding this is computationally impossible, but Foreman-Mackey had some cheap-and-dirty suggestions (which come at the cost of severe approximation).
In the afternoon, Tim Geithner (formerly US Secretary of the Treasury) came by the Simons Foundation to talk about his career and involvement in the 2007–2009 financial crisis. This isn't really bloggable (according to The Rules at right), except that he actually used the word “Bayesian” to describe the attitude that he and his colleagues at the Federal Reserve (where he was in 2007 and 2008) took towards global financial markets, where there is no firmly established causal model. They had to act (or not act) in the face of great uncertainty. I am not sure I agree with what they did, but he quasi-espoused a quasi-principle that I hadn't heard before: Instead of optimizing expected utility, they tried to optimize for ability to correct their mistakes after the fact. It is like expected repair-ability! Not sure if that is really a useful guiding principle, but it occurred to me that it might have implications for policies around global climate change. And TESS and LSST?
Iain Murray commented yesterday that I should look at this paper and this paper, both by Salakhutdinov, Roweis, and Ghahramani, about optimizing marginalized likelihoods. Standard practice is expectation-maximization, but the upshot of these two papers (if I understand them correctly) is that gradient descent can be better if the latent variables (the ones being marginalized out) are badly determined. That's relevant to Dalya Baron and me, deprojecting galaxies, and to cryo-EM.
In a very short research day, I thought about whether it is better to run the E-M algorithm or to take derivatives and optimize with an industrial optimizer, when the optimization is of a marginalized likelihood. The standard practice in the business is E-M, but it isn't clear to me that this would beat taking the derivatives. It is expensive to take derivatives, but E-M is usually slow. I realized I need to know more about these matters.
In conversation with Ness about abundance space, we realized that we should do some exploring to see if the abundances returned by The Cannon show more structure than those returned by other pipelines acting on the APOGEE data. I suggested t-sne for exploratory data analysis, and also looking at random projections. Although we don't yet know whether we have novel or informative structure in abundance space, the open clusters look like they do stand out as islands!
I had a phone conversation with Foreman-Mackey, in which we discussed measurement and inference. The question at hand is whether to hand-tune the coefficients of The Cannon to not permit it to use lines from element A in assessing the abundance of element B. Left to its own devices, the code will do such things, if element A and element B are covariant in the training set of stars! If you permit the code to use A lines to get B abundances, you will probably get more precise answers (because there really is information there) but it will be harder for a user of the code output to use the output responsibly (if, say, he or she is fitting models of abundance patterns). This is yet another example of the (obvious) point that your data analysis decisions depend on your (or your user's) utility.
In other news, in reading about microscopy data-analysis methods, I finally understood the point of the E-M algorithm: It optimizes a marginalized likelihood, without requiring the user to explicitly compute the marginalized likelihood itself or its derivatives. It is funny that it took me so long to understand this, when I have about four or five papers in which we developed, proved, and used E-M algorithms!