I spent the day refactoring my diffraction-imaging code. I turned a set of disjoint functions into a callable class, making the functions easier to maintain, reducing and caching repeated calculations, making the whole thing easier to parallelize, and making it more “Pythonic”. The funny thing about refactoring is that it always takes more time than you think it will. At the same time, you end up (after a day of work) with a piece of code that does exactly what the old one did, only better (in some nebulous sense). By the end of the day, I had success.
I spent spare moments during the break writing code to do diffraction microscopy with few-photon and single-photon images. It works! The estimator has the properties I expect. You could, in principle, do diffraction imaging with many images taken at unknown angles, each of which contains only a single photon! Sounds crazy? It is just the asymptotic behavior of maximum-likelihood estimators at work. Of course, as the number of photons per image drops, the result becomes very sensitive to the (necessarily wrong) model assumptions.
The discussion of cryo-EM on Monday ranged into the properties of maximum-likelihood estimators. This got me thinking about the possibility of doing something that Leslie Greengard had challenged me with a month or two ago: Could we use diffraction-imaging data in which we have many exposures, each at random (that is, unknown) Euler angles, but in each of which we only have a few photons? Do we need enough photons in every image to align it confidently? My theory (born last night and started in earnest today) is the following: As long as we have a correct probabilistic model for the orientation distribution (isotropic, perhaps), and as long as our model or representation for the three-dimensional (squared norm of the) Fourier transform contains the true function (a tall order), we should be able to use images of any quality, even single-photon images.
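The single-photon claim can be illustrated with a toy I made up for this post (none of this is the real project code): each exposure is one photon landing on an 8×8 detector, the unknown orientation is a 90-degree rotation marginalized with an isotropic prior, and the marginalized log-likelihood still prefers the true intensity model over a flat one.

```python
import numpy as np

# Toy sketch (my own illustration): each "image" is a single photon in one of
# 64 pixels; the unknown orientation is a 90-degree rotation, marginalized
# with an isotropic (uniform) prior.
rng = np.random.default_rng(0)
N = 8
y, x = np.mgrid[0:N, 0:N]
true_img = np.exp(-0.5 * ((x - 2.0) ** 2 + (y - 3.0) ** 2) / 1.5)
true_img /= true_img.sum()

def marginal(img):
    """Photon-arrival pdf after marginalizing the unknown rotation."""
    img = img / img.sum()
    return np.mean([np.rot90(img, k) for k in range(4)], axis=0)

def log_likelihood(img, photons):
    p = marginal(img).ravel()
    return np.sum(np.log(p[photons]))

# each exposure: a random rotation of the truth, then exactly one photon
photons = []
for _ in range(3000):
    view = np.rot90(true_img, rng.integers(4))
    photons.append(rng.choice(N * N, p=view.ravel()))
photons = np.array(photons)

# with enough single-photon exposures, the true model beats a flat model
flat = np.ones((N, N))
assert log_likelihood(true_img, photons) > log_likelihood(flat, photons)
```

Note what the marginalization costs: only the rotation-averaged structure is constrained by any one photon, which is the toy version of the sensitivity-to-assumptions worry above.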
I spent the day writing this up in a document and planning how I would code it. A Christmas project, perhaps. Of course you almost never have either a correct model of the orientation distribution or a complete representation of the function space. But that just reminds us of the obvious point that all of physical science is impossible; that's not a show-stopper.
Today was a day of phone calls. The first was with Dan Foreman-Mackey, in which we discussed his results on single transits in the Kepler data (that is, planet transits where the orbital period is comparable to or longer than the Kepler mission lifetime). His approach is what you might call a probabilistic expert system: He detects anomalies in the light-curve and then competes various generative models for each anomaly. These models include exoplanet transits, but also many kinds of data defects and stellar events. In every case, he is using a flexible Gaussian-process noise model and permitting the hyper-parameters of that to vary, so it is computationally non-trivial. He has found many new single transits, including some in systems with known shorter-period planets.
This was followed by a call with Andy Casey, Melissa Ness, and Hans-Walter Rix, on the subject of regularizing The Cannon with prior information. Casey and I are looking at L1, which says that sparser models are more likely. Ness is looking at hand-tuning which wavelengths can be affected by which labels, using priors from physical models of stars (or, really, atomic physics). In both cases, we see that the priors have big effects, but we don't yet know what works best, and we expect it to be different for different stars and different chemical abundances.
Marina Spivak (SCDA) led a discussion of a paper about the RELION code for the reconstruction of molecular structures from cryo-EM imaging data. We discussed spatial smoothness prior pdfs (pdfs over three-dimensional spatial models, that is), which they construct in the Fourier domain. We also discussed the optimality of estimators built from marginalized likelihoods. I'm a bit unsure about the theory after Kyle Cranmer (NYU) sowed doubt in me last week. But in general we were disappointed with how the RELION papers justify the improvement brought by their prior: They make no argument that the resulting structures are better in the domain of the data! And I don't know any other way to check, test, or compare models.
Andy Casey and I got our continuum-fitting working on all the individual sub-exposures that make up the APOGEE data today. We also built our own stacking code that combines these into a high signal-to-noise spectrum for each star (not that that is ever necessary for any scientific experiment: Don't Stack Your Data™). We were able to show that our precision improves with signal-to-noise exactly as expected, which is reassuring (and not true of many physics-model-based pipelines). We launched our large suite (a grid search) of validations to set the hyper-parameters.
Over the last few days we have realized that our model has 1.5 million parameters and more than twice that many hyper-parameters. Uh oh. So we set many of them equal and got the whole thing down to just 2. Hyper-parameters, that is. All this just shows that we ought to be able to do way better with more cleverness.
Unfortunately Casey leaves NYC this weekend, and we don't have a finished product! But we got close. We are just days away from having a full set of figures, and we have a paper half-written.
In the afternoon, a few of the SCDA members recapped things they learned at NIPS 2015 this past week. The most relevant to me right now was work by Chen and Candès on solving many quadratic equations. This is equivalent to—or contains as a special case—phase retrieval. What they have done sounds like magic, so I will have to read the paper.
The day started with a journal-club talk by Eftychios Pnevmatikakis (SCDA) about spike sorting and neuron deblending in calcium imaging. This is a kind of video-rate imaging in which you see the activity in the neurons of a live brain by fluorescence. Every technique he uses in these contexts is a technique used in astronomical image analysis, from sparse and non-negative matrix factorizations to MCMC with reversible jump. This was yet another reminder that astronomy and microscopy are very similar fields, technically.
Andy Casey and I did get our factor of 150 speed-up by including the analytic derivatives! We still have concerns about convergence, but you can do a lot more experiments along those lines when your code is 150 times faster. We switched gears today to continuum normalization. For this purpose, we built a simple linear least-squares fitter with light ridge regularization (L2) to keep the matrix well conditioned. We started downloading all the individual-exposure files, because the whole point of our regularized version of The Cannon is that it will work well at much lower signal-to-noise than traditional methods.
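The continuum fitter is a few lines of linear algebra. Here is a minimal sketch (my illustration, not the project code; the sinusoidal design matrix, wavelength grid, and the tiny regularization strength are all assumptions): ridge regularization just adds a small multiple of the identity to the normal matrix before solving.

```python
import numpy as np

# Minimal ridge-regularized linear least squares, as described above.
rng = np.random.default_rng(1)
wave = np.linspace(15100.0, 15800.0, 500)   # hypothetical wavelength grid
flux = (1.0 + 0.05 * np.sin(2 * np.pi * wave / 400.0)
        + 0.01 * rng.normal(size=wave.size))

def design_matrix(wave, period=400.0):
    phase = 2 * np.pi * wave / period
    return np.vstack([np.ones_like(wave), np.sin(phase), np.cos(phase)]).T

def ridge_fit(A, y, lam=1e-6):
    # solve (A^T A + lam I) theta = A^T y; lam keeps the matrix
    # well conditioned without visibly biasing the fit
    ATA = A.T @ A + lam * np.eye(A.shape[1])
    return np.linalg.solve(ATA, A.T @ y)

A = design_matrix(wave)
theta = ridge_fit(A, flux)
continuum = A @ theta
assert np.allclose(theta[0], 1.0, atol=0.01)   # recovers the continuum level
```

The point of the light L2 term is numerical, not statistical: with `lam` this small, the answer is indistinguishable from plain least squares whenever the design matrix is well behaved.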
Andy Casey and I discovered bad convergence issues and bad optimization problems in our multi-abundance Cannon model. This got me discouraged: It is a convex optimization, how hard can it be? At the end of the day, we quickly coded up analytic derivatives for the model and are hoping that these will fix our optimization problems (and vastly speed up optimization). In theory, analytic derivatives in our 17-label model should speed up our code by a factor of around 170, since the quadratic model has a lot of parameters!
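The expected factor is easy to see in a toy chi-squared (again my illustration, not the Cannon code): finite differencing a P-parameter objective costs P+1 function evaluations per gradient, while the analytic gradient is a single matrix product. A quadratic model in 17 labels has about 171 coefficients, hence the factor of roughly 170.

```python
import numpy as np

# Analytic vs finite-difference gradients of a linear-model chi-squared.
rng = np.random.default_rng(2)
n, P = 300, 171            # ~171 parameters for a quadratic 17-label model
A = rng.normal(size=(n, P))
y = rng.normal(size=n)
theta = rng.normal(size=P)

def chi2(theta):
    r = A @ theta - y
    return 0.5 * r @ r

def grad_analytic(theta):
    # one matrix product, independent of P function calls
    return A.T @ (A @ theta - y)

def grad_fd(theta, eps=1e-6):
    # P + 1 = 172 evaluations of chi2 per gradient
    g = np.empty(P)
    f0 = chi2(theta)
    for j in range(P):
        tp = theta.copy()
        tp[j] += eps
        g[j] = (chi2(tp) - f0) / eps
    return g

assert np.allclose(grad_analytic(theta), grad_fd(theta), atol=1e-2)
```

The two gradients agree to finite-difference precision; the analytic one is what makes the optimizer both faster and better converged.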
At group meeting, Adrian Price-Whelan talked about chaotic models for the Ophiuchus Stream, Chang Hoon Hahn (NYU) talked about fiber collisions (our top-favorite subject at group meeting!), and Robyn Sanderson talked about the mutual information between chemical abundances and dynamical actions. In the fiber-collision discussion, we veered into the usual talk about solving all of astronomy: If we have a generative model for everything, we can deal with fiber collisions. That's not practical, but it got us talking about what a forward-modeling approach to large-scale structure might look like. I amused Blanton with the point that not any function you write down can consistently be thought of as a spatial correlation function: There are positive-definiteness requirements.
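The positive-definiteness point can be checked numerically (a one-dimensional stationary sketch of my own, not anything from the meeting): by Bochner's theorem, a valid correlation function must have a non-negative Fourier transform, i.e. a non-negative power spectrum. A Gaussian passes; a top-hat fails because its spectrum is a sinc with negative lobes.

```python
import numpy as np

# Bochner check: valid correlation functions have non-negative power spectra.
dr = 0.025
n = 4096
r = (np.arange(n) - n // 2) * dr          # symmetric separation grid

def power_spectrum(xi):
    # FFT with the zero-lag point moved to index 0; real part is the spectrum
    return np.real(np.fft.fft(np.fft.ifftshift(xi))) * dr

gaussian_xi = np.exp(-0.5 * r ** 2 / 4.0)       # valid correlation function
tophat_xi = (np.abs(r) < 5.0).astype(float)     # NOT valid: sinc spectrum

assert power_spectrum(gaussian_xi).min() > -1e-8   # non-negative everywhere
assert power_spectrum(tophat_xi).min() < -1e-3     # negative lobes: invalid
```

So you cannot just write down any decreasing function of separation and call it a correlation function; the spectrum has to stay non-negative.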
In the afternoon at Simons, there was a talk by Patrick Combettes (Paris 6) about convex optimization problems from a math perspective. He made connections—which we have been making here—between set intersection and optimization, which were interesting. He was very negative on non-convex optimization, in the sense that he doesn't think that there is much that can be said or proven about such problems, even fairly simple ones, like bilinear problems.
In the morning, Alex Malz successfully passed his qualifying exam. He presented work on inferring the true redshift distribution given a set of noisy photometric redshifts, and proposed a thesis that extends this in various ways, including, ambitiously, to two-point functions.
In the afternoon, Andy Casey got The Cannon working with a full set of 17 labels (two stellar parameters and 15 chemical abundances). And by working, I mean training (with high signal-to-noise, well-behaved stars from APOGEE) and validation. Since this doesn't yet include our compressed-sensing regularization, Casey didn't think of this as a huge milestone, but I think that it is the highest dimensional space in which we have ever operated The Cannon. (Although I admit that we knew it would work from experiments that Melissa Ness has done.) The delivery of 15-dimensional chemical abundances for all the APOGEE stars on the red-giant branch looks feasible to me. And the labels look plausible.
John Moustakas (Siena) came in for a day of pair-coding, trying to resurrect a project we worked on way back in 2007 and 2008 with Sam Roweis (deceased) for the PRIMUS survey. The idea is to find the best subset of a galaxy spectral template list to explain a set of spectral data (and get redshifts). We cast this “best subset” problem as an integer programming problem—minimize the number of templates such that every data point is well represented by at least one template—and solved it using IBM's amazing (and closed-source, proprietary) CPLEX optimizer for non-convex problems. Today, Moustakas and I not only got our code from 2008 working again (let's hear it for readable code!), we replaced the old IDL code (that writes input files for CPLEX) with direct calls from Python to the new CPLEX Python API. It worked!
I don't want to sound like an advertisement for IBM, but CPLEX is hundreds of times faster than anything we were able (even with Roweis's help) to write ourselves, and also found consistently better solutions. These problems are simple to state but NP-hard to solve. CPLEX is unbelievably expensive (and subject to export restrictions) but it is available for free to US academics through IBM's academic initiative.
First thing in the morning, Andy Casey and I discussed some ideas for immediately following the first Gaia data release, which, apparently, may be expanding in scope. We had the idea of gathering multi-band point-source photometry for as much of the sky as possible, and then using the data release to learn the relationship between the photometry and absolute magnitude. Then we could deliver distances, with error analysis, for every single point source! There would be a lot of users for such a thing.
By the end of the day, Casey had the validation working to set the regularization hyperparameter at each wavelength of the compressed-sensing version of The Cannon. Everything is looking as we predicted, so we will try to reproduce all of the results from the original Cannon paper, but now with a model with more sensible model freedom. I am extremely optimistic about where this project is going.
Mid-day I had a phone call to talk about photometry in K2 Campaign 9, which is in the bulge (and hence crowded). We discussed forward modeling and more data-driven detrending methods. But most importantly, we decided (more or less; final decision tomorrow) to do a tiny, half-pixel offset (dither) at the mid-campaign break. That might not sound like much, but I think it will significantly improve the calibration we can do and substantially increase the number of things we can learn.
I am still not well, but well enough for the first time in ages to do my group meeting. It was great! Boris Leistedt (NYU) talked to us about his project to do template-based photometric redshifts but where he learns the templates too. I love this project; it is the only method I like for getting the redshift distribution in LSST past the (effective) spectroscopic magnitude limit. He gave us the philosophy and showed us some preliminary successes on simple fake data.
Andy Casey (Cambridge) talked to us about the different stellar parameter and abundance codes working in the Gaia-ESO spectroscopic survey collaboration. It is complicated! Getting reliable and valuable stellar quantities out of mutually inconsistent codes is not an easy problem. His methodologies are filled with good ideas. Brian McFee (NYU) suggested that he or we look at the crowd-sourcing literature for ideas here, and that turns out to be a very good idea. I proposed to Casey that we do some reading this coming week.
Right after group meeting, Casey diagnosed our bugs from yesterday and got the L1-regularized fitting working! We celebrated with ramen.
Foreman-Mackey called with ideas for The Cannon: He thinks we might be able to do some simple things to propagate errors or uncertainties in our labels in the training set into uncertainties in our internal parameters, and from there to uncertainties in the labels we derive for new stars. His idea is based on the “uncertainties in both directions” model in my old fitting-a-line document. He also wants to play around with a fully Bayesian Cannon. We think this will be hard, but conceivably possible. He is thinking forward to some kind of sophisticated TESS input catalog in these projects.
As for Casey's and my work on the compressed-sensing version of The Cannon: We turned on optimization with our L1-regularized model and nothing seems to be working right. Diagnosis progresses.
Andy Casey (Cambridge) arrived in NYC for two weeks to work on a compressed-sensing version of The Cannon. We talked about the design matrix, which is built from functions of the stellar labels. We also talked about how to do the cross-validation to both set the hyper-parameters (which are regularization strengths) and also check the prediction accuracy, without double-using the data. We are adopting ideas from Foreman-Mackey, who also encouraged us to think about propagating label uncertainties. We keep deciding this is computationally impossible, but Foreman-Mackey had some cheap-and-dirty suggestions (which come at the cost of severe approximation).
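For concreteness, here is a toy version of the L1-regularized fit we are building (my sketch, not the real Cannon code; the two-label quadratic design matrix, the ISTA solver, and all the numbers are illustrative assumptions): the design matrix is built from functions of the labels, and the L1 penalty drives coefficients the data do not demand to exactly zero.

```python
import numpy as np

# Toy L1-regularized (lasso) fit via ISTA: iterative soft-thresholding.
rng = np.random.default_rng(3)

def design_matrix(labels):
    # quadratic functions of two labels (Teff-like and logg-like)
    t, g = labels.T
    return np.vstack([np.ones_like(t), t, g, t * g, t ** 2, g ** 2]).T

labels = rng.normal(size=(200, 2))
A = design_matrix(labels)
theta_true = np.array([1.0, 0.5, 0.0, 0.0, 0.0, -0.3])   # sparse truth
flux = A @ theta_true + 0.01 * rng.normal(size=200)

def lasso_ista(A, y, lam, n_iter=5000):
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the gradient
    theta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ theta - y)
        z = theta - grad / L
        # soft-threshold: this is what produces exact zeros
        theta = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return theta

theta = lasso_ista(A, flux, lam=1.0)
assert np.all(np.abs(theta[[2, 3, 4]]) < 0.05)   # unused terms zeroed out
assert abs(theta[1] - 0.5) < 0.1                 # active term recovered
```

The hyper-parameter `lam` is exactly the kind of regularization strength that the cross-validation scheme above is meant to set, without double-using the data.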
In the afternoon, Tim Geithner (formerly US Secretary of the Treasury) came by the Simons Foundation to talk about his career and involvement in the 2007–2009 financial crisis. This isn't really bloggable (according to The Rules at right), except that he actually used the word “Bayesian” to describe the attitude that he and his colleagues at the Federal Reserve (where he was in 2007 and 2008) took towards global financial markets, where there is no firmly established causal model. They had to act (or not act) in the face of great uncertainty. I am not sure I agree with what they did, but he quasi-espoused a quasi-principle that I hadn't heard before: Instead of optimizing expected utility, they tried to optimize for ability to correct their mistakes after the fact. It is like expected repair-ability! Not sure if that is really a useful guiding principle, but it occurred to me that it might have implications for policies around global climate change. And TESS and LSST?
Iain Murray commented yesterday that I should look at this paper and this paper, both by Salakhutdinov, Roweis, and Ghahramani, about optimizing marginalized likelihoods. Standard practice is expectation-maximization, but the upshot of these two papers (if I understand them correctly) is that gradient descent can be better if the latent variables (the ones being marginalized out) are badly determined. That's relevant to Dalya Baron and me, deprojecting galaxies, and to cryo-EM.
In a very short research day, I thought about whether it is better to run the E-M algorithm or to take derivatives and optimize with an industrial optimizer, when the optimization is of a marginalized likelihood. The standard practice in the business is E-M, but it isn't clear to me that this would beat taking the derivatives. It is expensive to take derivatives, but E-M is usually slow. I realized I need to know more about these matters.
In conversation with Ness about abundance space, we realized that we should do some exploring to see if the abundances returned by The Cannon show more structure than those returned by other pipelines acting on the APOGEE data. I suggested t-SNE for exploratory data analysis, and also looking at random projections. Although we don't yet know whether we have novel or informative structure in abundance space, the open clusters look like they do stand out as islands!
I had a phone conversation with Foreman-Mackey, in which we discussed measurement and inference. The question at hand is whether to hand-tune the coefficients of The Cannon to not permit it to use lines from element A in assessing the abundance of element B. Left to its own devices, the code will do such things, if element A and element B are covariant in the training set of stars! If you permit the code to use A lines to get B abundances, you will probably get more precise answers (because there really is information there) but it will be harder for a user of the code output to use the output responsibly (if, say, he or she is fitting models of abundance patterns). This is yet another example of the (obvious) point that your data analysis decisions depend on your (or your user's) utility.
In other news, in reading about microscopy data-analysis methods, I finally understood the point of the E-M algorithm: It optimizes a marginalized likelihood, without requiring the user to explicitly compute the marginalized likelihood itself or its derivatives. It is funny that it took me so long to understand this, when I have about four or five papers in which we developed, proved, and used E-M algorithms!
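The point is easy to demonstrate in a few lines (my illustration: a two-component, unit-variance, equal-weight Gaussian mixture in one dimension, fitting only the means): each E-M step increases the marginalized likelihood, but the update rules never touch that likelihood or its derivatives.

```python
import numpy as np

# E-M on a toy Gaussian mixture: monotone ascent of the marginalized likelihood.
rng = np.random.default_rng(7)
x = np.concatenate([rng.normal(-3.0, 1.0, 300), rng.normal(2.0, 1.0, 300)])

def marg_loglike(x, mus):
    # evaluated only to CHECK E-M; the algorithm itself never needs it
    comp = np.exp(-0.5 * (x[:, None] - mus[None, :]) ** 2) / np.sqrt(2 * np.pi)
    return np.sum(np.log(comp.mean(axis=1)))

mus = np.array([-1.0, 1.0])
history = [marg_loglike(x, mus)]
for _ in range(20):
    # E step: responsibilities (posterior over the latent component label)
    comp = np.exp(-0.5 * (x[:, None] - mus[None, :]) ** 2)
    resp = comp / comp.sum(axis=1, keepdims=True)
    # M step: responsibility-weighted means
    mus = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)
    history.append(marg_loglike(x, mus))

assert np.all(np.diff(history) > -1e-9)              # monotone ascent
assert np.allclose(sorted(mus), [-3.0, 2.0], atol=0.3)
```

The latent variables (component memberships here; projection angles in cryo-EM) get marginalized implicitly by the E step, which is the whole trick.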
In a day crushed by health issues, I worked from my bed! I wrote up a new baby problem for my cryo-EM and deprojecting-galaxies projects: Can you infer the variance tensor for a 3-d Gaussian density blob if you only get to observe noisy 2-d projections, and you don't know any of the projection angles or offsets? Obviously the answer is yes, but the real question is how close to fully principled inference can you get, tractably, and for realistic data sets? I wrote the problem statement and half the solution (in text form); if I am confined to my bed for the rest of the week, maybe I will get to write the code too.
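A warm-up observation for this baby problem (my own sketch, not the written-up solution): if you simply average the second moments of the noisy projections over unknown isotropic orientations, you recover only the rotation-invariant part of the variance tensor, tr(V)/3 times the identity. That is exactly why the anisotropy has to come from a proper marginalized likelihood rather than from naive averaging.

```python
import numpy as np

# Averaging projected second moments of a 3-d Gaussian blob over random
# (unknown) orientations recovers only tr(V)/3 * I.
rng = np.random.default_rng(11)
V = np.diag([4.0, 1.0, 0.25])            # true 3-d variance tensor

def random_rotation(rng):
    # QR of a Gaussian matrix gives a (roughly Haar) random orthogonal matrix
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    return q * np.sign(np.diag(r))

covs = []
for _ in range(4000):
    R = random_rotation(rng)
    Vp = (R @ V @ R.T)[:2, :2]           # projected (marginal) 2-d covariance
    pts = rng.multivariate_normal(np.zeros(2), Vp, size=50)
    pts += 0.1 * rng.normal(size=pts.shape)      # measurement noise
    covs.append(pts.T @ pts / len(pts))

mean_cov = np.mean(covs, axis=0)
assert np.allclose(mean_cov, (np.trace(V) / 3.0) * np.eye(2), atol=0.1)
```

So the eigenvalue structure of V is recoverable only from the distribution of projected moments (or the full likelihood), not from their mean; that is the "how principled can you get, tractably" question in miniature.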
I also had a conversation with Marina Spivak (SCDA) about likelihoods, posteriors, marginalization, and optimization for cryo-EM, and a conversation with Ness about chemical tagging with The Cannon.
[I am on vacation this week; that didn't stop me from doing a tiny bit of research.]
I did a bit of writing for the project of taking The Cannon into compressed-sensing territory, while Andy Casey (Cambridge) structures the code so we are ready to work on the problem when he is here in NYC in a couple of weeks. I tried to work out the most conservative possible train–validate–test framework for training and validation, consistent with some ideas from Foreman-Mackey. I also tried to understand what figures we will make to demonstrate that we are getting better or more informative abundances than other approaches.
Hans-Walter called to discuss the behavior of The Cannon when we try to do large numbers of chemical abundance labels. The code finds that its best model for one element will make use of lines from other elements. Why? He pointed out (correctly) that The Cannon does its best to predict abundances. It is not strictly just measuring the abundances. It is doing its best to predict, and the best prediction will both measure the element directly, and also include useful indirect information. So we have to decide what our goals are, and whether to restrict the model.
In the morning, Joakim Andén (Princeton) spoke at the SCDA about classifying noisy cryo-EM projected molecules into different categories coming from different (but similar) conformations of the same basic structure. These are subtle differences, and each data point (which is a tiny, noisy image) is only a randomly oriented projection of the molecule in question. He develops the best linear discriminant by finding the eigenvalues of the data matrix in the three-dimensional space, which is clever because he only has the data in (many) two-dimensional projections. His method works well and is fast. This is very relevant to the galaxy deprojection project I have going with Baron. The only significant issue with the method is that it assumes that the angles for the projections are known. They aren't really; it is interesting to think about the generalization to the case of unknown projection angles.
In the afternoon, Kapala (Cape Town), Lang, and I assigned short-term tasks for our Herschel dust-mapping project: Kapala will get the data in order and comment and understand the (legacy) code. Dustin will get the code working and delivering results, and I will write the abstract and outline the paper. We worked a bit on the title late in the day.
I coded up a toy problem from the (paywalled) Sigworth paper and solved it by optimizing a marginalized likelihood function. The toy problem is an 8×8 image subject to all possible 64 integer cyclic shifts and 4 integer rotations (256 possible views) and then noisified. The shifts and rotations are not known in advance; we only have the data. The marginalized likelihood optimization rocks! Below find 16 data examples (from a set of 512 total) and then some iterations (starting from a random guess) of the likelihood optimization. Rocking it!
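A compact E-M re-implementation of this toy (my sketch written for this post, not the exact code I ran): responsibilities over the 256 views are the E step, and the responsibility-weighted average of the un-transformed data images is the M step. E-M guarantees the marginalized likelihood goes up.

```python
import numpy as np

# Toy Sigworth-style reconstruction: 8x8 truth, 256 views (64 cyclic shifts
# times 4 rotations), Gaussian pixel noise, marginalized-likelihood E-M.
rng = np.random.default_rng(5)
N, sigma = 8, 0.5
truth = np.zeros((N, N))
truth[1:4, 2:7] = 1.0
truth[5, 1] = 2.0                        # something asymmetric

def transform(img, sx, sy, k):
    return np.rot90(np.roll(np.roll(img, sx, axis=0), sy, axis=1), k)

views = [(sx, sy, k) for sx in range(N) for sy in range(N) for k in range(4)]

# 512 noisy data images, each at a random unknown view
data = np.array([transform(truth, *views[rng.integers(len(views))])
                 + sigma * rng.normal(size=(N, N)) for _ in range(512)])

def log_resp(model):
    stack = np.array([transform(model, *v) for v in views])
    sq = ((data[:, None] - stack[None]) ** 2).sum(axis=(2, 3))
    return -0.5 * sq / sigma ** 2

def marg_loglike(model):
    lr = log_resp(model)
    m = lr.max(axis=1)
    return np.sum(m + np.log(np.exp(lr - m[:, None]).mean(axis=1)))

model = rng.normal(size=(N, N))          # random starting guess
start = marg_loglike(model)
for _ in range(30):
    lr = log_resp(model)                               # E step
    r = np.exp(lr - lr.max(axis=1, keepdims=True))
    r /= r.sum(axis=1, keepdims=True)
    num = np.zeros((N, N))                             # M step
    for j, (sx, sy, k) in enumerate(views):
        undone = np.roll(np.roll(np.rot90(data, -k, axes=(1, 2)),
                                 -sx, axis=1), -sy, axis=2)
        num += (r[:, j, None, None] * undone).sum(axis=0)
    model = num / len(data)

assert marg_loglike(model) > start       # E-M improved the marginal likelihood
```

Any reconstruction is of course only defined up to one global shift-and-rotation, so comparisons against the truth have to be made after aligning over the view group.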
I started reading this classic (and pay-walled) paper by Sigworth on cryo-EM imaging reconstruction. It contains an error in elementary probability (one I warn against in my probability calculus tutorial): It marginalizes out the angle variables using the likelihood or posterior on angles where it should use the prior. However, the paper is filled with good ideas and is strongly related to what I hope to do in this area. Indeed, all the opportunity is in incremental improvements, I suspect, not (probably) fundamentals. The paper also contains some nice toy problems!
In the afternoon and into the evening, I participated in the Spitzer Oversight Committee meeting (from my sick bed and by telecon). The big issue is the upcoming Senior Review; the project needs to figure out how many more years to ask for, at what budget, and for what reason. These are hard questions, even though the project has been incredibly productive in its latter years (and more productive per dollar every year).
The morning began with a talk by Oguz Semerci (Schlumberger-Doll) about optimization and inverse problems in imaging where the data are non-trivially related to the image of interest and the problem is under-determined (requiring regularization). He showed examples from medical imaging, airport security, and oil exploration. Unlike many talks I see in these areas, he wasn't restricting to convex problems but still had very good performance. My only complaint would be that he was custom-building optimizers for each problem; I would be surprised if he beats the best industrial optimizers out there. Of course Lang and I are guilty of the same thing in The Tractor! One beautiful idea in his talk is the use of level sets to define the shapes of complex, opaque objects (or image segments).
Mid-day, Maria Kapala (Cape Town), Dustin Lang, and I had a chat about how to proceed on modeling Herschel imaging. Lang agreed to help Kapala get our old code from 2012 working again, and consult on optimization. I promised to start writing the paper (my favorite) and Kapala agreed to gather the relevant data for paper 1. (Somehow the two paragraphs of this research-blog post are more similar than I expected.)
Maria Kapala (Cape Town) showed up today, to discuss analyses of multi-wavelength imaging. Our idea, which was born years ago in work with Lang, is to build a dust map that constrains dust density and temperature and emission properties using all the Herschel bands, but works at the angular resolution of the best band. The usual practice is to smooth everything to the worst band!
Also had a long conversation with Boris Leistedt (NYU) about learning the templates simultaneously with the redshifts in a template-based photometric-redshift system. This is the right thing to do: It captures the causal model that is inherent in the template-based systems, but also captures the data-driven advantages of a machine-learning method. I am interested to know how accurately and at what resolution we could recover templates in realistic fake-data tests. We scoped a first paper on the subject.
Today was the second day of the Astronomy and Astrophysics Advisory Committee meeting. The most interesting material today was a report on proposal success rates and proposal pressures, particularly focusing on the NSF programs. The beautiful result is that none of the standard myths about proposal over-subscription are correct: It is not coming from an increase in the size of our community, it is not coming from faculty at smaller or non-traditional research institutions, it is not coming from any simple demographic changes, it is not coming from any increase in typical proposal budgets, and it is definitely not that people are writing proposals more hastily and less well. And these myth-bustings are empirical findings from data gathered by a subcommittee of the AAAC.
It appears that the vastly increased proposal pressure is coming from the resubmission of proposals rated Very Good to Excellent, which are getting rejected more and more because the funding rate is now so low. That is, there is a runaway process in which when the funding rate gets low enough, very good proposals are getting rejected with good comments, and the proposers resubmit, thereby increasing further the proposal pressure and reducing further the funding rate. This effect is expected to increase further when LSST comes on-line, because the NSF AST budget has to absorb operations.
Today was the first day of the Astronomy and Astrophysics Advisory Committee meeting. The AAAC oversees the points of mutual interest and overlap between NSF, NASA, and DOE. For me the highlight of the meeting was consideration of the National Academy of Sciences “OIR Report”. It explicitly calls out the community and the funding agencies to make sure we are properly preparing students to work in data analysis, observing, and instrumentation.
These things were all standard in the astronomy of past ages. Then, instrumentation became more professionalized as instruments got more complicated (and more expensive), and fewer students built instruments. Then observing started to become professionalized with big surveys like SDSS and the like. And data analysis is headed the same way, as our precision goals get more challenging and our data sets get larger. What to do about all this is unclear. However it is abundantly clear that if we start to lose track of where our data come from or what has happened to them in processing, we are doomed.
For health and civic-holiday reasons, it was a very short day today. I took a look at some of the classic or leading papers on cryo-EM data reduction, provided to me by Marina Spivak (SCDA). Many of these talk about optimizing a marginalized likelihood, which was just about all the cleverness I had come up with myself, so I am not sure I have much to contribute to this literature! The idea is that you really do have a strong prior on the distribution of projection angles (isotropic!) and you really do think each molecular shadow is independent, so there really is a case for marginalizing the likelihood function. Magland showed me correlations (time correlations) among neuron firings in his spike-sorting output. There is lots of phenomenology to understand there!
In another day out of commission, I spoke to Foreman-Mackey at length about various matters statistical, and wrote some text for a Tinker, Blanton, & Hogg NSF proposal. The statistics discussion ranged all around, but perhaps the most important outcome is that Foreman-Mackey clarified for me some things about the cryo-EM and galaxy deprojection projects I have been thinking about. The question is: Can averaging (apparently) similar projected images help with inferring angles and reconstruction? Foreman-Mackey noted that if we condition on the three-dimensional model, the projections are independent. Therefore there can be no help from neighboring images in image space. They might decrease the variance of any estimator, but they would do so at the cost of bias. You can't decrease variance without increasing bias if you aren't bringing in new information. At first I objected to this line of argument and then all of a sudden I had that “duh” moment. Now I have to read the literature to see if mistakes have been made along these lines.
The text I wrote for the proposal was about CMB anomalies and the corresponding large-scale-structure searches that might find anomalies in three-space. Statistical homogeneity and isotropy are such fundamental predictions of the standard cosmological model that they are very much worth testing. Any anomalies found would be very productive.
In a day wrecked by health issues—I was supposed to be spending the day with visiting researchers from Northrop Grumman—I did manage to write a few equations and words into the method section of the document describing our plans for a compressed-sensing upgrade of The Cannon. It is so simple! And I realize we have to correct our argument (in the first paper) that we need exponentially large training sets to go to large numbers of labels. That argument was just plain wrong; it was based on an assumption that The Cannon is effectively a density estimator. It is not; it is essentially an interpolator.
In another day with limited mobility, one small victory was drafting an abstract for upcoming work on The Cannon with Andy Casey (Cambridge). I like to draft an abstract, introduction, and method section before I start a project, to check the scope and (effectively) set the milestones. We plan to obtain benefit from both great model freedom and parsimony by using methods from compressed sensing.
I also had a few conversations; I spoke with Dun Wang and Schiminovich about Wang's work on inferring the GALEX flat-field. We made a plan for next steps, which include inferring the stellar flat and the sky flat separately (which is unusual for a spacecraft calibration). I spoke with Magland about colors, layout, and layering in his human interface to neuroscience data. This interface has some sophisticated inference under the hood, but needs also to have excellent look and feel, because he wants customers.
In a day shortened by health issues, I did get in a good conversation with David Schlegel (LBL), Aaron Meisner (LBL), and Dustin Lang on asteroid detection below the “plate limit”. That is, if we have multi-epoch imaging spread out over time, and we want to find asteroids, do we have to detect objects in each individual exposure or frame and then line up the detections into orbits, or can we search without individual-image detections? Of course the answer is we don't have to detect first, and we can find things below the individual-image detection limits. Meisner has even shown this to be true for the WISE data. We discussed how to efficiently search for faint, Earth-crossing (or impacting) asteroids.
I had lunch with luminary David Donoho (Stanford); we discussed a very clever idea of his regarding significance thresholds (like five-sigma or ten-sigma): The idea is that a five-sigma threshold is only interesting if it is unlikely that the investigator would have come to this threshold by chance. As computers grow and data-science techniques evolve, it is easier to test more and more hypotheses, and therefore accidentally find (say) five-sigma results. Really the question should be: How expensive would it be to find this result by chance? That is, how much computation would I have to do on a “null” data set to accidentally discover a result of this significance? If the answer is “$5000 of Amazon EC2 time” then the result isn't really all that significant, even if it is many sigma! If the answer is “a billion dollars” it is, probably, significant. We expanded on this idea in a number of directions, including what it would take to keep such calculations (translations of significance into dollars) up-to-date, and how to get this project funded!
I had to cancel a trip to Rutgers today for health reasons. My only research was some conversation with Magland about the relationship between the standard data-analysis practice of executing sets of sequential operations on data, and the concept of optimizing a scalar objective function. The context is spike sorting for neuroscience and catalog generation for astronomical imaging surveys.
When the data are complex and there is no simple, compact parametric model, it is hard to just optimize a likelihood or penalized likelihood or utility (though that doesn't stop us with The Tractor). However, sequential heuristic procedures can be designed to be some kind of locally greedy optimization of a scalar. That is, even if the code isn't explicitly an optimization, it can implicitly use the concept of optimization to set the values of otherwise arbitrary parameters (like detection thresholds, window sizes, and decision boundaries).
Today was almost entirely lost to letter writing and some somewhat severe health issues. But I did get in some good conversations with Tim Morton (Princeton) about exoplanet populations and stellar system populations, and with Ness and Rix about chemical abundances, asteroseismology, and our next steps with The Cannon.
In the morning, arguments continued with Magland about the neuroscience problem of spike sorting. He showed some beautiful visualizations of real spike-train data from live rat brains. We talked about the connection between spike sorting and decision making: Just like in astronomical catalog generation, unless you have a way to deliver a sampling or pdf in catalog space, a spike-sorting algorithm is a decision-making algorithm. As such, it must include a utility!
In the afternoon, I gave the Applied Mathematics Seminar at the Courant Institute at NYU. I spoke about exoplanet search, where we have used lots of applied linear algebra ideas.
On a totally different topic, I worked for a bit on the neuroscience spike-sorting problem with Magland. I had suggested a trivial E-M-like algorithm for the problem, and he was testing it out with some toy fake data. He found some anomalies in the scaling of the noise in the answer, which we spent time trying to unwind.
In the morning Keith Hawkins (Cambridge) showed up for my group meeting. He made the strong argument that if we could measure stellar ages, we could see structure in the Milky Way that won't be revealed by (blunt) chemical tracers alone. I agree, and he came to the right place!
In the afternoon at the SCDA, Charlie Epstein (Penn) gave Magland and me some very good arguments or intuitions about the chaotic map we have been playing with in the phase-retrieval problem. He argued that the projection operators can be designed to greatly increase the effective overlap of sets for which the map is looking. Recall that the goal is to find the overlap of two sets. In the conversation, he effectively generalized the concept of “reflection” (which might be, for example, across a line) to the equivalent for any kind of set (that is, for things other than lines and points). Awesome!
In the afternoon, Tarmo Aijo (SCDA) and the Rich Bonneau (SCDA) group talked with Greengard and me about their model for the time evolution of the human (gut) biome. They are using a set of Gaussian processes, manipulated into multinomials, to model the relative abundances of various components. It is an extremely sophisticated model, fully sampled by Stan, apparently. They asked us about speeding things up; we opined that it is unlikely (at the scale of their data) that the Gaussian processes are dominating the compute time.
In my tiny bit of research time today, I had a call with Rix, who wanted to talk about various projects related to APOGEE and The Cannon. In particular we discussed results from Bovy that show that individual star-forming clusters seem to truly form single-abundance populations, with a unique chemical signature. This result is remarkable because he achieved it without any use of physical stellar models! We discussed next steps for The Cannon, including a modification in which we control the model complexity differently at every wavelength. I have great hopes for this. My first-try technology: The Lasso.
Also had a nice call with Dalya Baron and Dovi Poznanski (TAU) about deprojecting galaxies. We figured out that early-type galaxies in the SDSS would be a good place to start, since I know there is a result lurking there.
I spent my research time today in Magland's office, watching him explore the magic of the iterated maps I discussed yesterday. To recap: These are chaotic maps that do not optimize any scalar objective function (that we know) but which are attracted to fixed points that are related to (project to) solutions of the equations we want (phase retrieval with arbitrarily good data). We wondered how the author of these maps created them; we tried experiments in which we parameterized various choices and saw which maps work and which don't. We wondered about possible connections to MCMC, which is a stochastic iterated map (these are deterministic). The math is magical.
My mind was blown today by the most remarkable solution to the phase-retrieval problem. (This is the problem of inferring the phase of a Fourier Transform given only the squared amplitude, as comes up in diffraction microscopy.) The solution is a chaotic iterated map, designed such that its fixed points are (related to) solutions to the equation, and such that its dynamics is attracted to those fixed points. The paper we found it in doesn't explain how they found the map or how they constructed it or how they tested it, but it just straight-up rocks on our test problems. And these problems are supposed to be officially NP-Hard. That is, the chaotic map takes us to the correct solution (we have a certificate, as it were) with very high probability, in a problem that is supposed to be combinatorically impossible. How this relates to optimization, I don't understand: The map is not justified as an optimizer of any specific objective function. How this relates to MCMC is possibly interesting: An MCMC is like a stochastic iterated map. Magland ended the day very excited about the breakthrough, even though it means that all the code he has been building and testing is now (probably) obviated.
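For readers who want a concrete feel for this family of algorithms, here is a minimal numpy sketch of one standard iterated map of this type, Fienup's hybrid input-output. I am not claiming this is the map from the paper we read; it is the classic example of an iteration whose fixed points project to images satisfying both the Fourier-magnitude and support/positivity constraints:

```python
import numpy as np

def proj_fourier(x, mag):
    """Impose the measured Fourier amplitudes, keeping the current phases."""
    X = np.fft.fft2(x)
    return np.real(np.fft.ifft2(mag * np.exp(1j * np.angle(X))))

def hio(mag, support, beta=0.9, n_iter=300, seed=0):
    """Fienup-style hybrid input-output iteration (a standard member of
    this family of maps, not necessarily the one from the paper)."""
    x = np.random.default_rng(seed).uniform(size=mag.shape)
    for _ in range(n_iter):
        y = proj_fourier(x, mag)
        good = support & (y >= 0)          # constraint-satisfying pixels
        x = np.where(good, y, x - beta * y)
    return np.where(support, np.clip(x, 0, None), 0.0)

# toy positive scene on a known support
rng = np.random.default_rng(1)
support = np.zeros((16, 16), dtype=bool)
support[4:12, 4:12] = True
truth = np.where(support, rng.uniform(size=(16, 16)), 0.0)
mag = np.abs(np.fft.fft2(truth))

recon = hio(mag, support)
print(recon.shape, recon.min() >= 0.0)
```

Note that nothing in the update is a gradient step on any obvious scalar, which is exactly the mystery.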
On the last day of the workshop, various people were asked to talk about where they would like to be (or the field to be) in ten years. Rita Tojeiro (St Andrews) gave a nice argument that large LSS surveys are going to be critical and valuable for understanding galaxy evolution. Aaron Dotter (ANU) noted that, among other things, we are going to know a huge amount about stellar binaries. This is important for understanding stars (dynamics and eclipses to get masses and radii), exoplanets (inferences depend on multiplicity), and star formation.
It was a great meeting and thanks go to Charlie Conroy (CfA) for organizing (and for paying for all the food!).
The day began with Pieter van Dokkum (Yale) and me arguing about what's important and how to achieve it. In my view, the biggest issue with everything is that we don't believe any of the models, and yet we have to do science with them. How does that work? I don't think anyone has a clear answer. You can say that you are only getting answers subject to very strong assumptions—that's very true—but that doesn't tell you what to believe and what not to believe. Like, given that the line lists going into 1-D models are wrong in such-and-such ways, what conclusions about stars are in danger of being wrong? In some sense every model-based result we have is mixed with some probability of some kind of ill-specified null model that things are very wrong and anything (within reason, whatever that means) could be happening.
In my research, this really comes up in the question of whether we have true parameters for stars. That is, are we correct about log-g and T-eff and various chemical abundances? At some level it is not just that we can't be right fundamentally (stars don't even have steady-state values for these!) and not just that we can't be right in practice (our models give different answers depending on the data at hand, etc.), it is that we can't know that we are right even when we are right as we can be. All we can really know is whether we do a good job of predicting new data. Compare models in the space of the data! I emphasized that today.
In response to all these issues, I said one thing that made people uneasy: I said we should focus our attention on problems that can be solved with the tools at hand. We should try to re-cast our projects in terms of things for which our models produce stable predictions (like relative measurements of various kinds). I don't think we should choose our scientific questions based on an abstract concept of “what's interesting”. I think we should choose on the concrete concept of what's possible. I not only think this is true, I think it how it has always been in the history of science.
Today was the first day of a workshop put together by Charlie Conroy (Harvard) to get people who model stars together with people who model stellar populations and people who model galaxies. I came because I want to understand better the “customers” for any further work we do modeling stars, either with The Cannon or else if we jump in to the 1-D modeling problem.
Everyone at the workshop said something today, which was amazing (and valuable) and way too much to report here. One highlight was Phil Cargile (Harvard) showing us how he can update atomic line parameters using observations of the Sun. We discussed how this might be done to jointly improve the predictions for many stars. Another highlight was Alexa Villaume (UCSC) showing us the provenance of the stellar parameters in some of the calibrator-level star sets. It was horrifying (one thing was a weighted average of literature values, weighted by publication date).
A number of people in the room are computing spectra (of stars or star clusters or galaxies) on a grid and then interpolating the grid at likelihood-evaluation time. This started a discussion of whether you should interpolate the spectra themselves or just the log-likelihood value. I argued very strongly for the latter: The likelihood is lower dimensionality and smoother than the spectrum in its variations. Not everyone agreed. Time for a short paper on this?
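As a toy version of the choice (made-up one-parameter spectra; this doesn't settle the argument, it just shows the two options side by side): interpolate the full spectrum and then evaluate the likelihood, or interpolate the scalar log-likelihood values directly on the grid.

```python
import numpy as np
from scipy.interpolate import interp1d

# toy "spectra": an absorption line whose depth varies with one parameter p
wave = np.linspace(0, 1, 200)
def model_spectrum(p):
    return 1.0 - p * np.exp(-0.5 * ((wave - 0.5) / 0.02) ** 2)

grid = np.linspace(0.1, 0.9, 9)        # coarse model grid
data = model_spectrum(0.47)            # noiseless data at an off-grid truth
ivar = np.full_like(wave, 1e4)

def lnlike(model):
    return -0.5 * np.sum(ivar * (data - model) ** 2)

# option A: interpolate the spectra, then evaluate the likelihood
spec_interp = interp1d(grid, np.array([model_spectrum(p) for p in grid]),
                       axis=0, kind="cubic")
# option B: interpolate the scalar log-likelihood values directly
lnl_interp = interp1d(grid, [lnlike(model_spectrum(p)) for p in grid],
                      kind="cubic")

p_fine = np.linspace(0.2, 0.8, 601)
lnl_A = np.array([lnlike(spec_interp(p)) for p in p_fine])
lnl_B = lnl_interp(p_fine)
print(p_fine[np.argmax(lnl_A)], p_fine[np.argmax(lnl_B)])  # both near 0.47
```

On this (deliberately easy) toy both options recover the truth; the real question is which degrades more gracefully when the grid is coarse relative to the structure, which is what the hypothetical short paper would have to test.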
The day started with Dun Wang, Steven Mohammed (Columbia), David Schiminovich, and me discussing the short-term plans for our work with GALEX. My top priority is to get the flat-field right, because if we can do that, I think we will be able to do everything else (pointing model, focal-plane distortion model, etc.).
Over lunch, Greengard and Jeremy Magland (SCDA) “reminded me” how the FFT works in the case of irregularly sampled data. This in the context of using Gaussian-process kernels built not in real space but in Fourier space. And then Greengard and Magland more-or-less simultaneously suggested that maybe we can turn all our Gaussian process problems into convolution problems! The basic idea is that the matrix product of a kernel matrix and a vector looks very close to a convolution, and the product with the inverse matrix looks like a deconvolution. And we know how to do this fast in Fourier space. This could be huge for asteroseismology. The log-determinant may also be simple when we think about it all in Fourier space. We will reconvene this conversation late next week.
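On a regular grid with a stationary kernel the idea is easy to check (for irregular sampling you need the non-uniform FFT machinery that Greengard builds). Here is a sketch of mine comparing the dense kernel matrix-vector product with an FFT convolution via circulant embedding:

```python
import numpy as np

n = 256
tau = 10.0
lags = np.arange(n, dtype=float)
k = np.exp(-0.5 * (lags / tau) ** 2)       # stationary kernel profile k(|t_i - t_j|)

# dense O(n^2) matrix-vector product
K = k[np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])]
v = np.random.default_rng(1).normal(size=n)
dense = K @ v

# FFT version: embed K in a 2n x 2n circulant so that K @ v is a convolution
m = 2 * n
c = np.zeros(m)
c[:n] = k
c[m - n + 1:] = k[1:][::-1]                # symmetric circulant embedding
fft_prod = np.fft.irfft(np.fft.rfft(c) * np.fft.rfft(v, m), m)[:n]

print(np.max(np.abs(dense - fft_prod)))    # agreement at machine precision
```

The same diagonalization is why the inverse (the deconvolution) and, plausibly, the log-determinant get cheap in Fourier space, at least for the circulant; whether that carries over cleanly to our irregularly sampled asteroseismology data is exactly what we need to work out.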
There is much magical math these days around L1, LASSO, compressed sensing, and so on. These methods are having huge impacts across many fields, especially where data-driven models reign. There were two talks at SCDA today about these matters. In the morning, Christian Müller (SCDA) spoke about TREX, which is his set of methods for identifying predictors in very sparse problems. He showed incredible performance on a set of toy problems, and value in real problems. In the afternoon, Eftychios Pnevmatikakis (SCDA) reviewed a paper that proves some results related to the conditions under which an optimization problem (minimize blah subject to foo) will return the true or correct answer. There was a lot of geometry and there were some crazy sets. Definitions of “descent cone” and “statistical dimension” were introductions, for me, to some real math.
In the phase-retrieval problem, there are many objective functions one can write down for the problem, and many optimizers one can use, and many initializations, and many schedules for switching among objectives and optimizers. I spent my research time today playing in this playground. I got nothing awesome—everything gets stuck in local optima (not surprisingly).
One of the optimization methods I figured out turns the problem into a quadratic program with quadratic constraints (QCQP). This is convex if the constraints themselves are properly signed. They aren't! When they aren't, QCQP is apparently NP-Hard. So either this is going to be a tough optimization or else I am going to solve P = NP! Have I mentioned that I hate optimization?
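For concreteness, here is the shape of the program as I understand it (my notation: $x$ is the real image, each $A_k$ is built from a Fourier design vector, so it is rank-one and positive semi-definite, and $y_k$ is a measured squared amplitude):

```latex
\begin{aligned}
\min_{x \in \mathbb{R}^{n}} \quad & x^{\mathsf T} A_{0}\, x + b_{0}^{\mathsf T} x \\
\text{subject to} \quad & x^{\mathsf T} A_{k}\, x \le y_{k}, \qquad k = 1, \dots, m, \\
 & x^{\mathsf T} A_{k}\, x \ge y_{k}, \qquad k = 1, \dots, m .
\end{aligned}
```

The upper bounds are convex because the $A_k$ are positive semi-definite; the lower bounds are the improperly signed (concave) half of each equality constraint, and that is exactly where the NP-hardness comes in.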
I started a github repository with code for a probabilistic-modeling-based approach to phase retrieval. I started with optimization of a likelihood-like object. My current likelihood is clearly not convex, as expected: This is a quadratic (not a linear) model (the data is the squared norm of the Fourier transform of the image).
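The likelihood-like object is, roughly, the following (a minimal numpy sketch under a Gaussian-noise assumption; the real code in the repository differs in details):

```python
import numpy as np

def lnlike(x, data, ivar):
    """Gaussian ln-likelihood of squared Fourier amplitudes.

    x: real image; data: observed |FT|^2 values; ivar: inverse variances.
    The model is quadratic in x inside the residual, hence non-convex in x.
    """
    model = np.abs(np.fft.fft2(x)) ** 2
    return -0.5 * np.sum(ivar * (data - model) ** 2)

# sanity check on noiseless fake data: the truth scores zero, anything else worse
rng = np.random.default_rng(3)
truth = rng.uniform(size=(8, 8))
data = np.abs(np.fft.fft2(truth)) ** 2
ivar = np.ones_like(data)
print(lnlike(truth, data, ivar))        # 0.0 exactly at the truth
print(lnlike(truth + 0.1, data, ivar))  # negative for a perturbed image
```

Even on this tiny problem you can feel the non-convexity: flipping global signs or translating the image leaves the likelihood unchanged, so the landscape is riddled with equivalent optima.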
I came back from #AstroHackWeek and #DSEsummit all fired up to work on Cryo-EM, but then Leslie Greengard, Charlie Epstein (Penn) and Jeremy Magland (SCDA) distracted me on the phase retrieval problem: In one-dimensional problems, if you know the squared amplitude of the Fourier transform of an all-positive function on a bounded domain, you do not know the function: There are true degeneracies. In higher dimensions, there are degeneracies possible, but generic two-d and three-d all-positive scenes in bounded domains are uniquely specified by the norm of the Fourier transform the vast majority of the time. Or so we think. There are arguments, not proofs, I believe. Anyway, Magland has coded up (an improved version of) one of the standard algorithms for retrieving the phases of the Fourier transform and spent the last week or two testing and adjusting it. He seems to find that the solutions are unique, but that they are very hard to find. We spent the day arguing around this. I am trying not to get sucked in!
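Here is a tiny numpy demonstration of a genuine one-dimensional degeneracy: writing the scene as a convolution of two factors and flipping one factor (equivalently, reflecting zeros of the z-transform across the unit circle) changes the scene but not the squared Fourier amplitude.

```python
import numpy as np

a = np.array([1.0, 2.0, 0.5])
b = np.array([1.0, 0.3, 1.5, 0.2])
x1 = np.convolve(a, b)            # one scene
x2 = np.convolve(a[::-1], b)      # flip one convolutional factor only

N = 16                            # zero-padded DFT length
p1 = np.abs(np.fft.fft(x1, N)) ** 2
p2 = np.abs(np.fft.fft(x2, N)) ** 2
print(np.allclose(p1, p2))                              # same power spectrum
print(np.allclose(x1, x2), np.allclose(x1, x2[::-1]))   # genuinely different scenes
```

The trivial degeneracies (translation, full reflection) exist in every dimension; it is this factor-flipping freedom that makes 1-D special, because in higher dimensions generic Fourier polynomials don't factor.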
The day started with Josh Tucker (NYU) talking about the SMaPP lab at NYU, where they are doing observational work in Politics and Economics using data science methods and in a lab-like structure. The science is new, but so is the detailed structure of the lab, which is not a standard way of doing Political Science! He pointed out that some PIs in the audience have larger budgets for their individual labs than the entire NSF budget for Political Science! He showed very nice results of turning political-science theories into hypotheses about twitter data, and then performing statistical falsifications. Beautiful stuff, and radical. He showed that different players (protesters, opposition, and oppressive regimes) are all using diverse strategies on social media; the simple stories about twitter being democratizing are not (or no longer) correct.
In the afternoon, we returned from Cle Elum to UW, where I discussed problems of inference in exoplanet science with Foreman-Mackey, Elaine Angelino (UCB), and Eric Agol (UW). After we discussed some likelihood-free inference (ABC) ideas, Angelino pointed us to the literature on probabilistic programs, which seems highly relevant. In that same conversation, Foreman-Mackey pointed out the retrospectively obvious point that you can parameterize a positive-definite matrix using its Cholesky decomposition (unconstrained entries in the triangular factor) and then never have to put checks on the eigenvalues. Duh! And awesome.
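A minimal sketch of the trick (my notation, not Foreman-Mackey's code): fill a lower-triangular factor with unconstrained real numbers, exponentiate the diagonal so it is strictly positive, and the product L L^T is positive definite by construction.

```python
import numpy as np

def pd_from_params(params, n):
    """Build a symmetric positive-definite n x n matrix from n(n+1)/2
    unconstrained parameters via a Cholesky factor L (C = L @ L.T)."""
    L = np.zeros((n, n))
    L[np.tril_indices(n)] = params
    # exponentiate the diagonal so every diagonal entry is strictly positive
    L[np.diag_indices(n)] = np.exp(np.diag(L))
    return L @ L.T

theta = np.array([-0.3, 1.2, 0.7, -2.0, 0.1, 0.5])   # any reals whatsoever
C = pd_from_params(theta, 3)
print(np.allclose(C, C.T), np.all(np.linalg.eigvalsh(C) > 0))  # True True
```

Any optimizer or sampler can now wander freely in theta-space without ever producing an invalid covariance matrix.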
In the morning, Katy Huff (UCB) gave an energizing talk about The Hacker Within, which is a program to have peers teach peers about their data-science (or programming or engineering) skills to improve everyone's scientific capabilities. The model is completely ground-up and self-organized, and she is trying to make it easy for other institutions to “get infected” by the virus. She had some case studies and insights about the conditions under which a self-organized peer-educational activity can be born and flourish. UW and NYU are now both going to launch something; I was very much reminded of #AstroHackNY, which is currently dormant.
Karthik Ram (UCB) talked about a really deep project on reproducibility: They have interviewed about a dozen scientists in great detail about their “full stack” workflow, from raw data to scientific results, identifying how reproducibility and openness is or could be introduced and maintained. But the coolest thing is that they are writing up the case studies in a book. This will be a great read; both a comparative look at different disciplines, but also a snapshot of science in 2015 and a gift to people thinking about making their stack open and reproducible.
I had a great conversation with Stefan Karpinski (NYU) and Fernando Perez (UCB) about file formats (of all things). They want to destroy CSV once and for all (or not, if that doesn't turn out to be a good idea). Karpinski explained to me the magic of UTF8 encoding for text. My god is it awesome. Perez asked me to comment on the new STScI-supported ASDF format to replace FITS, and compare to HDF5. I am torn. I think ASDF might be slightly better suited to astronomers than HDF5, but HDF5 is a standard for a very wide community, who maintain and support it. This might be a case of the better is the enemy of the good (a phrase I learned from my mentor Gerry Neugebauer, who died this year). Must do more analysis and thinking.
In the afternoon, in the unconference, I participated in a discussion of imaging and image processing as a cross-cutting data-science methodology and toolkit. Lei Tian (UCB) described forward-modeling for super-resolution microscopy, and mentioned a whole bunch of astronomy-like issues, such as spatially variable point-spread function, image priors, and the likelihood function. It is very clear that we have to get the microscopists and astronomers into the same room for a couple days; I am certain we have things to learn from one another. If you are reading this and would be interested, drop me a line.
Today was the first day of the Moore-Sloan Data Science Environments annual summit, held this year in Cle Elum, Washington. We had talks about activities going on at UW; many of the most interesting to me were around reproducibility and open science. For example, there were discussions of reproducibility badges, where projects can be rated on a range of criteria and given a score. The idea is to make reproducibility a competitive challenge among researchers. A theme of this is that it isn't cheap to run fully reproducible. That said, there are also huge advantages, not just to science, but also to the individual, as I have commented in this space before. It is easy to forget that when CampHogg first went fully open, we did so because it made it easier for us to find our own code. That sounds stupid, but it's really true that it is much easier to find your three-year-old code on the web than on your legacy computer.
Ethics came up multiple times at the meeting. Ethical training and a foregrounding of ethical issues in data science is a shared goal in this group. I wonder, however, if we got really specific and technical, whether we would agree what it means to be ethical with data. Sometimes the most informative and efficient data-science methods to (say) improve the fairness in distribution of services could easily conflict with concerns about privacy, for example. That said, this is all the more reason that we should encourage ethical discussions in the data science community, and also encourage those discussions to be specific and technical.
In the morning, tutorials were again from Brewer and Marshall, this time on MCMC sampling. Baron got our wrap-up presentation document ready while I started writing a title, abstract, and outline for a possible first paper on what we are doing. This has been an amazingly productive week for me; Baron and I both learned a huge amount but also accomplished a huge amount. Now, to sustain it.
The hack-week wrap-up session blew my mind. Rather than summarize what I liked most, I will just point you to the hackpad where there are short descriptions and links out to notebooks and figures. Congratulations to everyone who participated, and to Huppenkothen, who did everything, and absolutely rocked it.
The day started with tutorials on Bayesian reasoning and inference by Brewer and Marshall. Brewer started with a very elementary example, which led to lots of productive and valuable discussion, and tons of learning! Marshall followed with an example in a Jupyter notebook, built so that every person's copy of the notebook was solving a slightly different problem!
Baron and I got the full optimization of our galaxy deprojection project working! We can optimize both for the projections (Euler angles and shifts etc) of the galaxies into the images and for the parameters of the mixture of Gaussians that makes up the three-dimensional model of the galaxy. It worked on some toy examples. This is incredibly encouraging, since although our example is a toy, we also haven't tried anything non-trivial for the optimizer. I am very excited; it is time to set the scope of what would be the first paper on this.
In the evening, there was a debate about model comparison, with me on one side, and the tag team of Brewer and Marshall on the other. They argued for the evidence integral, and I argued against, in all cases except model mixing. Faithful readers of this blog know my reasons, but they include, first, an appeal to utility (because decisions require utilities) and then, second, a host of practical issues about computing expected utility (and evidence) integrals, that make precise comparisons impossible. And for imprecise comparisons, heuristics work great. Unfortunately it isn't a simple argument; it is detailed engineering argument about real decision-making.
Andy Mueller (NYU) started the day with an introduction to machine learning and the scikit-learn package (of which he is a developer). His tutorial was beautiful, as it followed a fully interactive and operational Jupyter notebook. There was an audible gasp in the room when the astronomers saw the magic of dimensionality reduction in the form of the t-SNE algorithm. It is truly magic! There was much debate in the room about whether it was useful for anything other than blowing your mind (i.e., visualization).
In the afternoon, Baron and I refactored our galaxy projection/deprojection code in preparation for inferring the galaxy itself. This refactor was non-trivial (and we learned a lot about the Jupyter notebook!), but it was done by the end of the day. We discussed next steps for the galaxy inference, which we hope to start tomorrow!
The second day of #AstroHackWeek started with Juliana Freire (NYU) talking to us about databases and data management. She also talked about map-reduce and its various cousins. Freire is the Executive Director of the Moore-Sloan Data Science Environment at NYU, and so also our boss in this endeavor (AstroHackWeek is supported by the M-S DSE, and also Github and the LSST project). Many good questions came up in Freire's discussion about the special needs of astronomers when it comes to database choice and customization. Freire opined that one of the reasons that the SDSS databases were so successful is that we had Jim Gray (Microsoft; deceased) and a team working full time on making them awesome. I agree!
In the afternoon, Dalya Baron (TAU) and I made fake data for our galaxy-deprojection project. These data were images with finite point-spread function and Gaussian noise. We then showed that by likelihood optimization we can (very easily, I am surprised to say) infer the Euler angles of the projection (and other nuisance parameters, like shifts, re-scalings, and background level). We also showed that if we have two kinds of three-dimensional galaxies making the two-dimensional images, we can fairly confidently decide which three-d galaxy made which two-d image. This is important for deprojecting heterogeneous collections. I ended the day very stoked about all this!
We kicked off AstroHackWeek 2015 today, with a huge crowd (some 60-ish people) from all over the world and in all different astrophysics fields (and a range of career stages!). Kyle Barbary (UCB) started the day with an introduction to Python for data analysts and Daniela Huppenkothen (NYU, and the principal organizer of the event) followed with some ideas for exploratory data analysis. They set up Jupyter notebooks for interactive tutorials and the crowd followed along. At some point in the morning, Huppenkothen shocked the crowd by letting them know that admissions to AstroHackWeek had been done (in part) with a random number generator!
In the afternoon, the hacking started: People split up into groups to work on problems brought to the meeting by the participants. Dalya Baron (TAU) and I teamed up to write code to build and project mixtures of Gaussians in preparation for an experiment in classifying and determining the projections (Euler angles) of galaxies. This is the project that Leslie Greengard and I have been discussing at the interface of molecular microscopy and astrophysics; that is, anything we figure out here could be used in many other contexts. By the end of the day we had working mixture-of-Gaussian code and could generate fake images.
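The key fact that makes this tractable is that a rotated, projected Gaussian is still a Gaussian, so the model image is itself a two-dimensional mixture. A hedged sketch (my toy, not our actual hack-week code):

```python
import numpy as np

def rotmat(a, b, c):
    """Z-Y-Z Euler-angle rotation matrix."""
    Rz = lambda t: np.array([[np.cos(t), -np.sin(t), 0.0],
                             [np.sin(t),  np.cos(t), 0.0],
                             [0.0, 0.0, 1.0]])
    Ry = lambda t: np.array([[ np.cos(t), 0.0, np.sin(t)],
                             [0.0, 1.0, 0.0],
                             [-np.sin(t), 0.0, np.cos(t)]])
    return Rz(a) @ Ry(b) @ Rz(c)

def project_mog(means, covs, weights, euler, shift=(0.0, 0.0)):
    """Rotate a 3-D mixture of Gaussians and marginalize over the line of
    sight (z); the projection of each Gaussian is again a Gaussian."""
    R = rotmat(*euler)
    means2 = [(R @ mu)[:2] + np.asarray(shift) for mu in means]
    covs2 = [(R @ C @ R.T)[:2, :2] for C in covs]   # drop the z row and column
    return means2, covs2, list(weights)

def render(means2, covs2, weights, grid):
    """Evaluate the projected mixture on a pixel grid of shape (ny, nx, 2)."""
    img = np.zeros(grid.shape[:2])
    for mu, C, w in zip(means2, covs2, weights):
        d = grid - mu
        q = np.einsum("...i,ij,...j->...", d, np.linalg.inv(C), d)
        img += w * np.exp(-0.5 * q) / (2.0 * np.pi * np.sqrt(np.linalg.det(C)))
    return img

# tiny two-component "galaxy", viewed at some Euler angles
means = [np.zeros(3), np.array([1.0, 0.0, 0.0])]
covs = [np.eye(3), np.diag([0.5, 0.5, 2.0])]
m2, c2, w = project_mog(means, covs, [1.0, 0.5], euler=(0.3, 1.1, -0.4))
ys, xs = np.mgrid[-4:4:33j, -4:4:33j]
img = render(m2, c2, w, np.stack([xs, ys], axis=-1))
print(img.shape, round(img.sum() * 0.25 ** 2, 2))   # flux near the total weight 1.5
```

Because everything stays analytic, no numerical integration along the line of sight is ever needed, which is what makes the likelihood optimizations fast.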
Who walked in to the SCDA today but Amit Singer (Princeton), a mathematician who works on Cryo-EM and related problems! That was good for Greengard and me; we bombarded him with questions about how the system works, physically. Of course he knew, and showed us a beautiful set of notes from Fred Sigworth (Yale) explaining the physics of the situation. It is beautifully situated at the interface of geometric and physical optics, which will make the whole thing fun.
After I started to become convinced that we might be able to make progress on the Cryo-EM problem, Greengard described to me an easier (in some ways) problem: cryo-electron tomography. This is similar to Cryo-EM, but the experimenter controls the angles! In principle this should make it easier (and it does), but according to mathematical standards the problem is still ill-posed: The device can't be turned to all possible angles, or not even enough to fill out the tomographic information needed for general reconstructions. Of course this doesn't phase me!
Useful conversations with Foreman-Mackey included two interesting subjects. One is that even if you have K different MCMC moves, each of which satisfies detailed balance, it is not guaranteed that a deterministic sequence of them will satisfy detailed balance! That blew me away for a few minutes but then started to make sense. Ish. According to Foreman-Mackey, there is a connection between this issue and the point that a product of symmetric matrices will not necessarily be symmetric!
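The matrix analogy is easy to check numerically: each symmetric matrix is like a reversible (detailed-balance) kernel, and the product of two of them need not be symmetric, even though it is still a perfectly good operator.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])   # symmetric, like a reversible kernel
B = np.array([[1.0, 0.5], [0.5, 2.0]])   # symmetric, like another reversible kernel
P = A @ B                                 # the fixed-order composition

print(np.allclose(A, A.T), np.allclose(B, B.T), np.allclose(P, P.T))
```

The reassuring part of the story (which is why fixed-order move schedules still work in practice) is that each move can still preserve the target distribution even when the composition fails detailed balance.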
The other interesting subject with Foreman-Mackey was on exoplanet system modeling. We want to explore some non-parametrics: That is, instead of considering the exoplanet population as a mixture of one-planet, two-planet, three-planet (and so on) systems, model it just with K-planet systems, where K is very large (or infinite). This model would require having a significant amount of the planet-mass pdf at very low masses (or sizes). Not only might this have many technical advantages, it also accords with our intuitions: After all, I think we (almost) all think that if you have one or two major planets, you probably have many, many minor ones. Like way many.
At group meeting, Ana Bonaca (Yale) told us about inferring the potential and mass distribution in the Milky Way using pairs of cold stellar streams. She seems to find—even in the analysis of fully simulated data sets—somewhat inconsistent inferences from different streams. They aren't truly inconsistent, but they look inconsistent when you view only two parameters at a time (because there are many other nuisance parameters marginalized out). She shows (unsurprisingly) that radial velocity information is extremely valuable.
Brian McFee (NYU) talked about measuring rhythm in recorded music. Not tempo but rhythm. The idea is to look at ratios of time lags between spectrogram features (automatically, of course). He showed some nice demonstrations with things that are like "scale transforms": Like Fourier transforms but in the logarithm of frequency.
In the afternoon, Bonaca, Foreman-Mackey, and I discussed the relationships between dynamics and geometry and statistics. I gave a very powerful argument about why sampling is hard in high dimensions, and then immediately forgot what I said before writing it down. We discussed new MCMC methods, including Foreman-Mackey's proposals for Hamiltonian MCMC in an ensemble.
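I can't reconstruct whatever I said, but one standard version of the argument is numerical: samples from a d-dimensional Gaussian concentrate in a thin shell at radius about sqrt(d), with relative width shrinking like 1/sqrt(2d), so naive local proposals almost never land in the typical set.

```python
import numpy as np

rng = np.random.default_rng(0)
summary = {}
for d in (2, 20, 200, 2000):
    # radii of 10,000 draws from a d-dimensional standard Gaussian
    r = np.linalg.norm(rng.normal(size=(10000, d)), axis=1)
    summary[d] = (r.mean() / np.sqrt(d), r.std() / r.mean())
    print(d, summary[d])   # mean radius -> sqrt(d); relative scatter -> tiny
```

By d = 2000 the radius is pinned to sqrt(d) at the percent level, which is the geometric picture behind why random-walk MCMC struggles in high dimensions.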
In the morning at the Simons Center for Data Analysis, Brian Weitzner (JHU) gave a nice talk about determining structures of antibodies, and predicting structures of new or novel antibodies for drug design and so on. I learned many things in the talk, one of which is that there are only 100,000 known protein structures. That might sound like a big number, but compare it to the number of known protein sequences! Another thing I learned is that it is often possible to get a good guess at a structure for a (part of a) protein by looking at the (known) structures of similar sequences. The cool thing is that they (the large community or communities) have developed very good “energy functions” or scores for ranking (and thus optimizing) conformations; these are obviously approximations to the real physics, but seem to work extremely well.
In the rest of my research day, I batted around issues with diffraction microscopy, ptychography, and cryo-electron microscopy imaging with Leslie Greengard. We talked about a very non-trivial issue for the first two, which is that there can be very different functions with identical squared-moduli of their Fourier transforms, or at least we think there must be. There sure are in one dimension. What about in three dimensions? We talked about an exciting prospect for the latter, which is that the cryo-EM problem is very similar to an old idea I had of deprojecting all galaxy images ever taken. “Cryo-EM for galaxies!”
I spent my research time today planning for my sabbatical. I have too many things to write, so I have to cut down the list. How to cut it down? I have never been good at anything other than “Look: squirrel!” (Meaning: ultra short-term decision making!)
I participated today in a panel at the Simons Foundation about the present and future of computational astrophysics. We talked about challenges in simulating the Universe, the Milky Way, stars, and exoplanets (formation and atmospheres). We also talked about challenges in comparing theory to data, where the theory is computational and the data sets are large. We also talked about the point that sometimes computation can be used to solve fundamental physics and applied math questions. We all agreed that an Advanced-LIGO detection of a gravitational radiation source could have a huge impact on what we all think about. It was a great day, with Bryan (Columbia), Hernquist (Harvard), Ho (CMU), MacFadyen (NYU), Spergel (Princeton), Stone (Princeton), and Quataert (Berkeley) all participating, plus people from the Simons Foundation.
I spent the day writing code to create mixtures of Gaussians and (importantly) their Fourier transforms. I can't count the number of times I have written mixtures-of-Gaussians code! But each use case is at least slightly different. Today the application is diffraction microscopy. I want to explore bases other than the standard grid-of-pixels basis.
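The nice thing about a mixture-of-Gaussians basis is that its Fourier transform is analytic; no FFT required. Here is a minimal sketch of the idea (my illustration, not the actual code; the amplitudes and widths are made up), using the convention F(k) = ∫ f(x) exp(−i k x) dx:

```python
import numpy as np

def mog_ft(k, amps, means, sigmas):
    """Analytic Fourier transform of a 1-D mixture of Gaussians,
    convention F(k) = integral f(x) exp(-i k x) dx."""
    amps, means, sigmas = map(np.atleast_1d, (amps, means, sigmas))
    k = np.atleast_1d(k)[:, None]
    return np.sum(amps * np.exp(-1j * k * means - 0.5 * (sigmas * k) ** 2),
                  axis=1)

# squared modulus: what a diffraction experiment (ideally) measures
k = np.linspace(-10, 10, 201)
F = mog_ft(k, amps=[1.0, 0.5], means=[-1.0, 2.0], sigmas=[0.3, 0.7])
intensity = np.abs(F) ** 2

# sanity check: F(0) equals the total integral of the mixture
print(F[100])   # (1.5+0j)
```

The same closed form goes through in three dimensions with vector means and covariance matrices, which is what makes this basis attractive for the diffraction application.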
The funny thing about the diffraction-microscopy problem is that it is simultaneously trivial and impossible: It is to infer all the phases of the Fourier transform given only a noisy, censored measurement of its square modulus. All the approaches that work apply very informative priors or regularization. My biggest concern with them is that they often put the most informative part of the prior on the space outside the object. Hence my goal of using a basis that is compact to begin with.
As a teaser and demo, here is an unlabeled set of figures that “test” my code:
Group meeting today was a pleasure. Laura Norén (NYU) talked about ethnography efforts across the Moore–Sloan Data Science Environments, including some analysis of space. This is relevant to my group and also the NYU Center for Data Science. She talked also about the graph of co-authorship that she and a team are compiling, to look at the state of data-science collaborations (especially interdisciplinary ones) before, during, and after the M-S DSE in the three member universities and also comparison universities. There was some excitement about looking at that graph.
Nitya Mandyam-Doddamane (NYU) showed us results on the star-formation rates in galaxies of different optical and ultraviolet colors. She is finding that infrared data from WISE is very informative about hidden star formation, and this changes some conclusions about star formation and environment (local density).
Dun Wang talked about how he is figuring out the pointing of the GALEX satellite by cross-correlating the positions of photons with the positions of stars. This fails at bright magnitudes, probably because of pile-up or saturation. He also showed preliminary results on the sensitivity of the detector, some of which appear to be different from the laboratory calibration values. The long-term goal is a full self-calibration of the satellite.
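The core of the pointing trick can be stated in a few lines of numpy. This is a toy version of my own making (the numbers and names are illustrative, nothing here is GALEX-specific): histogram all photon-minus-star offset vectors, and the peak of the histogram is the pointing offset.

```python
import numpy as np

# Toy pointing solution: photons scatter around star positions shifted by an
# unknown offset; the 2-D histogram of photon-minus-star offsets peaks there.
rng = np.random.default_rng(0)
stars = rng.uniform(0, 100, size=(50, 2))
true_offset = np.array([0.62, -0.38])
photons = (stars[rng.integers(0, 50, size=2000)] + true_offset
           + rng.normal(scale=0.02, size=(2000, 2)))

# all pairwise photon-minus-star offsets, histogrammed on a fine grid
d = (photons[:, None, :] - stars[None, :, :]).reshape(-1, 2)
H, xe, ye = np.histogram2d(d[:, 0], d[:, 1], bins=200,
                           range=[[-1, 1], [-1, 1]])
i, j = np.unravel_index(np.argmax(H), H.shape)
est = np.array([(xe[i] + xe[i + 1]) / 2, (ye[j] + ye[j + 1]) / 2])
print(est)   # close to true_offset
```

At bright magnitudes the photon positions stop tracking the stars faithfully (pile-up, saturation), which is presumably why the real version fails there.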
Dan Cervone (NYU) spoke about statistics problems he has worked on, in oceans and in sports. We talked about sports, of course! He has been working on spatial statistics that explain how basketball players play. We talked about the difference between normative and descriptive approaches. Apparently we are not about to start a betting company!
Daniela Huppenkothen spoke about the outburst this summer of V404 Cygni, a black hole that hasn't had an outburst since 1989. There are many observatories that observed the outburst, and the question (she is interested in) is whether it shows any oscillation frequencies or quasi-periodic oscillations. There are many spurious signals caused by the hardware and observing strategies, but apparently there are some potential signatures that she will show us in the coming weeks.
After I asked for more, Greengard sent me two classic papers (here and here) on diffraction imaging (microscopy). These are beautifully written and very clear. So I now understand the problem well, and the standard solution (which is “oversample by a factor of two!”). One interesting issue is that in real experiments a beam stop kills your low-k modes, so you don't get to see the zero (or near-zero) part of the Fourier transform. Most of the heavy lifting in standard approaches is done by setting a zero or null region around the object and requiring that the function go to zero there. That strikes me as odd, and only applicable in some circumstances. So I became inspired to try some complementary approaches.
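For concreteness, here is a minimal error-reduction (Fienup-style) phase-retrieval loop, my own toy sketch of the “null region around the object” idea: alternate between imposing the measured Fourier modulus and imposing support plus non-negativity.

```python
import numpy as np

# Minimal error-reduction phase retrieval on a toy 2-D object with a known
# support (the object lives in one corner of a 2x-oversampled array).
rng = np.random.default_rng(1)
n = 32                                    # object size...
N = 2 * n                                 # ...inside an oversampled array
truth = np.zeros((N, N))
truth[:n, :n] = rng.uniform(size=(n, n))  # non-negative object

modulus = np.abs(np.fft.fft2(truth))      # the (noiseless) "data"
support = np.zeros((N, N), bool)
support[:n, :n] = True

def ft_error(g):
    return np.sqrt(np.mean((np.abs(np.fft.fft2(g)) - modulus) ** 2))

g = rng.uniform(size=(N, N)) * support    # random non-negative start
err0 = ft_error(g)
for _ in range(200):
    G = np.fft.fft2(g)
    G = modulus * np.exp(1j * np.angle(G))    # impose measured modulus
    g = np.fft.ifft2(G).real
    g = np.where(support & (g > 0), g, 0.0)   # impose support, non-negativity
print(ft_error(g) < err0)   # error-reduction is monotone in this metric
```

A real experiment would also have to mask the beam-stopped low-k modes out of the modulus constraint; this sketch ignores that.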
The day also included a conversation with So Hattori, who is going to re-start our search for very long-period planets in the Kepler data. This is an important project: It is Jupiter, not Earth, that is the driver of the present-day Solar System configuration.
Leslie Greengard put me onto thinking about diffraction imaging of non-periodic (non-crystal-lattice) objects. I did some reading and slowly started to get some understanding of how people think about this problem, which is (mathematically) that you observe (something like) the squared-modulus of the Fourier transform of the thing you care about. You have to reconstruct all the (missing) phase information. This requires very strong prior information. Late in the day I had a long conversation with Foreman-Mackey about this and many other things.
I went off the grid for the weekend, but that didn't stop me from working out a probabilistic approach to understanding the population of long-period planets given low signal-to-noise radial-velocity data. This problem was given to me by Heather Knutson (Caltech), who has a great data set, in which many stars show no companion at high detection confidence. The problem is hairy if you permit any star to have any number of long-period planets. It becomes much more tractable if you assume (by fiat) that every star has exactly one or zero. However, this assumption is clearly unphysical!
I spent most of the day at Columbia's Data Science Institute, participating in a workshop on data science in the natural sciences. I learned a huge amount! There were way too many cool things to mention them all here, but here are some personal highlights:
Andrew Gelman (Columbia) talked about the trade-off between spatial resolution and what he called “statistical resolution”; he compared this trade-off to that in political science between conceptual resolution (the number of questions we are trying to ask) and statistical resolution (the confidence with which you can answer those questions). He also talked about distribution (or expectation) propagation algorithms that permit you to analyze your data in parts and combine the inferences, without too much over-working.
Joaquim Goes (Columbia) talked about ocean observing. He pointed out that although the total biomass in the oceans is far smaller than that on land, it cycles faster, so it is extremely important to the total carbon budget (and the natural sequestration of anthropogenic carbon). He talked about the Argo network of ocean float data (I think it is all public!) and using it to model the ocean.
John Wright (Columbia) pointed out that bilinear problems (like those that come up in blind deconvolution and matrix factorization and dictionary methods) are non-convex in principle, but we usually find good solutions in practice. What gives? He has results that in the limit of large data sets, all solutions become transformations of one another; that is, all solutions are good solutions. I am not sure what the conditions are etc., but it is clearly very relevant theory for some of our projects.
There was a student panel moderated by Carly Strasser (Moore Foundation). The students brought up many important issues in data science, one of which is that there are translation issues when you work across disciplinary boundaries. That's something we have been discussing at NYU recently.
Blanton–Hogg group meeting re-started today. We began the introduction process that will take at least two meetings! I talked about inferring stellar ages from spectra.
Boris Leistedt spoke about large-scale structure projects he did this summer, including one in which they are finding a good basis for doing power spectrum estimation with weak lensing. He also talked about large-scale structure work he is doing in DES.
Kopytova spoke about her project with me to infer cool-star parameters, marginalizing out issues with calibration or continuum normalization. Blanton asked her to interpret the calibration inferences; that's a good idea!
Chang Hoon Hahn spoke about comparisons of two-point and three-point functions measured in SDSS-IV data with different fiber-collision methods, and the same in simulations. This led to our monthly argument about fiber collisions, which is a problem that sounds so simple and is yet so hard! The problem is that when two galaxies are very nearby on the sky, there is a reduced probability for getting the redshift of one of them. If left uncorrected, these missing data screw up two-point statistics badly (and one-point statistics less badly). Dan Cervone suggested that we look at the missing data literature for guidance. The big issue is that we don't have a good model for the spatial distribution of galaxies, and the best corrections for missing redshifts depend in detail on that spatial distribution! I suggested some data-driven approaches, but there might not be enough data to test or implement them.
Vakili spoke about using likelihood-free inference to do cosmological large-scale structure projects, which currently use a wrong (and difficult to compute) likelihood function. He and Hahn are collaborating on a test-bed to get this kind of inference working; we discussed their first results.
Late in the day, I spoke with Matt Kleban (NYU) about a statistical theory he has for the distribution (number per class) of “classified objects”. He seems to have a model with almost no parameters that does a good job of explaining a wide range of data. I discussed with him some methods for testing goodness of fit and also model comparisons with other kinds of models.
Today was my first day at the Simons Center for Data Analysis. I spent a good chunk of the morning discussing projects with Leslie Greengard (SCDA), the director of the place. We discussed cryogenic EM imaging of molecules, in which you get many shadows of the same molecule at different positions and orientations. The inference of the three-dimensional structure involves inferring also all the projection angles and offsets—or does it? We discussed the problem of neuron spike sorting, where potential events observed from an array of electrodes (inserted in a brain) are assigned to neurons. How do you know if you got it right? Many algorithms are stable, but which are correct?
Later in the day, Mike O'Neil joined us and we discussed (among other things) non-negativity as a constraint on problem solving. This is an amazingly informative constraint. For example, if you are doing image reconstruction and you have an array of a thousand by a thousand pixels, the non-negativity constraint removes all but one part in two to the millionth power of your parameter space! That is more informative than any data set you could possibly get. Greengard didn't like this argument: It is a counting argument! And I was reminded of Phillip Stark's (Berkeley) objection to it: He said something to the effect of “No, the constraint is much stronger than that, because it applies everywhere in space, even where you haven't sampled, so it is really an infinite number of constraints”. Greengard showed us some magic results in diffractive imaging where it appears that the non-negativity constraint is doing a huge part of the heavy lifting. This all relates to things I have discussed previously with Bernhard Schölkopf and his team.
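Just to put the exponent of the counting argument in decimal terms (a one-liner of my own, not part of the discussion):

```python
import math

# Non-negativity on a 1000 x 1000 image keeps one sign-orthant out of
# 2^(10^6) of parameter space; express that fraction as a power of ten.
npix = 1000 * 1000
log10_fraction = -npix * math.log10(2)   # log10 of the surviving fraction
print(round(log10_fraction))             # -301030
```

That is, the surviving fraction is about 10^(-301030), which is the sense in which the constraint dwarfs any plausible data set, at least by this counting measure.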
In yet another low-research day, I talked to Taisiya Kopytova about her projects and her upcoming job search. I also talked to Ness about our APOGEE-3 white paper with Jonathan Bird. I just want to have spectra of all the red giants in the entire disk. Is that too much to ask?
In a low-research day, the highlight was a long discussion of current projects with Foreman-Mackey, who is now settled in Seattle. We discussed exoplanet search and populations inference with K2 data. One test of the false-positive rate (the part not generated by easy-to-model astrophysical sources) is to invert the lightcurve and search again. And when you do, sure enough, you find small, long-period exoplanets! That is the beginning of an empirical approach to understanding stellar-variability-induced and instrumental false positives. We have also talked about reordering the time-series data, but for long-period planets, this doesn't cover all bases. There are many improvements to Foreman-Mackey's search since his first K2 catalog paper: New basis for the systematic effects, built with the PCP (the robust PCA); new prior on principal component amplitudes for more informative marginalization; automated vetting that makes use of probabilistic models for model comparison. It all looks like it is working.
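The inversion test is almost a one-liner; here is a schematic version (my sketch with made-up numbers, not Foreman-Mackey's actual search code): reflect the lightcurve about its median and re-run the search, so that any real transit dip becomes a bump the search cannot match.

```python
import numpy as np

def invert_lightcurve(flux):
    """Reflect a lightcurve about its median flux level."""
    med = np.median(flux)
    return 2.0 * med - flux

flux = np.array([1.00, 0.99, 1.00, 0.96, 1.00, 1.01])   # toy transit dip
inv = invert_lightcurve(flux)
print(inv)   # the dip at index 3 (0.96) becomes a bump (1.04)
```

Anything the search “finds” in the inverted data is, by construction, not an astrophysical transit, which is what makes the detections there a calibration of the systematics-induced false-positive rate.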
We also talked about the next generation of the emcee code, dubbed emcee3. The idea is that it is an ensemble sampler, but the stretch move (the basis of emcee now) becomes just one of a huge menu of update steps. We discussed schedules of update methods in mixed schemes. We discussed the problem, after a run with a non-trivial schedule, of figuring out which updates worked best. We discussed what statistics you want to keep for an ensemble sampler to assess performance in general. Homework was assigned.
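A stripped-down toy of the idea (my sketch, not the emcee3 API): one sampler, a menu of update moves, a random schedule over them, and per-move acceptance statistics kept so you can ask afterwards which updates worked best.

```python
import numpy as np

# Toy Metropolis sampler with a schedule over two proposal "moves" and
# per-move bookkeeping. All names and numbers are illustrative.
rng = np.random.default_rng(3)
log_prob = lambda x: -0.5 * x ** 2       # standard-normal target

moves = {"small-step": 0.2, "big-step": 5.0}    # proposal scales
names, weights = ["small-step", "big-step"], [0.7, 0.3]

stats = {n: [0, 0] for n in moves}       # [accepted, proposed]
x = 0.0
for _ in range(5000):
    name = names[rng.choice(len(names), p=weights)]
    xp = x + moves[name] * rng.normal()
    stats[name][1] += 1
    if np.log(rng.uniform()) < log_prob(xp) - log_prob(x):
        x, stats[name][0] = xp, stats[name][0] + 1

for name, (acc, tot) in stats.items():
    print(name, acc / tot)               # small steps accept far more often
```

Raw acceptance fraction is of course not the right performance metric (the big steps decorrelate the chain faster per accepted move), which is exactly the “what statistics do you keep?” question we discussed.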
Today was my last full day at MPIA for 2015. Ness and I worked on her age paper, and made the list of final items that need to be completed before the paper can be submitted, first to the SDSS-IV Collaboration, and then to the journal. I also figured out a bunch of baby steps I can do on my own paper with the age catalog.
At MPIA Galaxy Coffee, K. G. Lee (MPIA) and Jose Oñorbe (MPIA) gave talks about the intergalactic medium. Lee spoke about reconstruction of the density field, and Oñorbe spoke about reionization. The conversations continued into lunch, where I spoke with the research group of Joe Hennawi (MPIA) about various problems in inferring things about the intergalactic medium and quasar spectra in situations where (a) it is easy to simulate the data but (b) there is no explicit likelihood function. I advocated likelihood-free inference or ABC (as it is often called), plus adaptive sampling. We also discussed model selection, and I advocated cross-validation.
In the afternoon, Ness and I continued code review and made decisions for final runs of The Cannon for our red-giant masses and ages paper.
At Milky Way group meeting, Eddie Schlafly (MPIA) showed beautiful results (combining PanSTARRS, APOGEE, 2MASS, and WISE data) on the dust extinction law in the Milky Way. He can see that some of the nearby dust structures have anomalous RV values (dust extinction law shapes). Some of these are previously unknown features; they only appear when you have maps of density and RV at the same time. Maybe he gets to name these new structures!
Late in the day, Ness and I audited her code that infers red-giant masses from APOGEE spectra. We found some issues with sigmas and variances and inverse variances. It gets challenging! One consideration is that you don't ever want to have infinities, so you want to use inverse variances (which become zero when data are missing). But on the other hand, you want to avoid singular or near-singular matrices (which happen when you have lots of vanishing inverse variances). So we settled on a consistent large value for sigma (and correspondingly small value for the inverse variance) that satisfies both issues for our problem.
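The compromise, in code (a schematic of my own; the value 100 is illustrative, not the value we chose): work in inverse variances so missing data get zero weight without infinities, but floor them at one consistent large sigma so downstream matrices stay non-singular.

```python
import numpy as np

LARGE_SIGMA = 100.0                      # one consistent "uninformative" sigma

sigma = np.array([0.01, 0.02, np.inf, 0.05, np.inf])   # inf marks missing data
ivar = np.where(np.isfinite(sigma), 1.0 / sigma ** 2, 0.0)
ivar = np.maximum(ivar, 1.0 / LARGE_SIGMA ** 2)        # never exactly zero

print(ivar)   # finite everywhere, strictly positive
```

The point of using one consistent value is that the same floor satisfies both requirements everywhere in the pipeline, rather than ad-hoc patches at each matrix inversion.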
At breakfast I told Morgan Fouesneau (MPIA) my desiderata for a set of matplotlib color maps: I want a map that indicates intensity (dark to bright, say), a map that indicates a sequential value (mass or metallicity or age, say), a map that indicates residuals away from zero that de-emphasizes the near-zero values, and a map that is the same but that emphasizes the near-zero values. I want the diverging maps to never hit pure white or pure black (indeed none of these maps should) because we always want to distinguish values from “no data”. And I want them to be good for people with standard color-blindness. But here's the hard part: I want all four of these colormaps to be drawn from the same general palette, so that a scientific paper that uses them will have a consistent visual feel.
Before lunch, Ness and I met with Marie Martig (MPIA) and Fouesneau to go through our stellar age results. Martig and Fouesneau are writing up a method to use carbon and nitrogen features to infer red-giant ages and masses. Ness and I are writing up our use of The Cannon to get red-giant ages and masses. It turns out that The Cannon (being a brain-dead data-driven model) has also chosen, internally, to use carbon and nitrogen indicators. This is a great endorsement of the Martig and Fouesneau method and project. Because they are using their prior beliefs about stellar spectroscopy better than we are, they ought to get more accurate results, but we haven't compared in detail yet.
Late in the day, Foreman-Mackey and I discussed K2 and Kepler projects. We discussed at length the relationship between stellar multiplicity and binary and trinary (and so on) populations inference. Has all this been done for stars, just like we are doing it for exoplanets? We also discussed candidate (transiting candidate) vetting, and the fact that you can't remove the non-astrophysics (systematics-induced) false positives unless you have a model for all the things that can happen.
Late in the day, Rix, Ness, and I showed Ben Weiner (Arizona) the figures we have made for our paper on inferring red-giant masses and ages from APOGEE spectroscopy. He helped us think about changes we might make to the figures to bolster and make more clear the arguments.
I spent some of the day manipulating delta functions and mixtures of delta functions for my attempt to infer the star-formation history of the Milky Way. I learned (for the Nth time) that it is better to manipulate Gaussians than delta functions; delta functions are way too freaky! And, once again, thinking about things dimensionally (that is, in terms of units) is extremely valuable.
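Here is the sense in which the Gaussians are safer, in a toy version of the problem (my illustration; the burst ages and weights are made up): against any smooth test function, a mixture of narrow Gaussians reproduces the delta-mixture answer in the small-width limit, but it stays finite and differentiable at every finite width.

```python
import numpy as np

# A toy "star-formation history": bursts at fixed ages with fixed weights.
ages = np.array([1.0, 5.0, 10.0])        # Gyr (illustrative)
weights = np.array([0.2, 0.5, 0.3])
phi = lambda t: np.exp(-0.1 * t)         # a smooth test function

exact = np.sum(weights * phi(ages))      # integral against the delta mixture

# the same integral against Gaussians of shrinking width, done on a grid
t = np.linspace(-10.0, 30.0, 40001)
dt = t[1] - t[0]
errs = []
for sigma in [1.0, 0.1, 0.01]:
    mix = sum(w * np.exp(-0.5 * ((t - a) / sigma) ** 2)
              / (sigma * np.sqrt(2.0 * np.pi))
              for w, a in zip(weights, ages))
    errs.append(abs(np.sum(mix * phi(t)) * dt - exact))
print(errs)   # error shrinks as sigma shrinks
```

And the dimensional point shows up here too: the delta function (and the Gaussian normalization 1/sigma) carries units of one-over-age, which you have to keep track of or the integrals come out wrong.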
In the morning, Rix and I wrote to Andy Casey (Cambridge) regarding a proposal he made to use The Cannon and things we know about weak (that is, linearly responding) spectral lines to create a more interpretable or physically motivated version of our data-driven model, and maybe get detailed chemical abundances. Some of his ideas overlap what we are doing with Yuan-Sen Ting (Harvard). Unfortunately, there is no real way to benefit enormously from the flexibility of the data-driven model without also losing interpretability. The problem is that the training set can have arbitrary issues within it, and these become part of the model; if you can understand the training set so well that you can rule out these issues, then you don't need a data-driven model!
We looked at numerical derivatives of models of stellar spectroscopy, taken by Yuan-Sen Ting (Harvard). When you numerically differentiate someone else's computer code, you have to be careful! If you take a step size that is too large, the behavior won't be linear. If you take a step size that is too small, the differences won't be trustable (because the code has unknown convergence precision). So there should be a “sweet spot”. Ting's analysis doesn't clearly show such a sweet spot for his derivatives of the Kurucz stellar models. That's concerning.
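The sweet spot is easy to exhibit on a function whose derivative we know exactly (my toy, not Ting's models): the central-difference truncation error falls like h², the roundoff error grows like machine-epsilon over h, and the minimum sits in between.

```python
import numpy as np

# Central-difference error on sin(x) at x0 = 1 for three step sizes:
# too big (truncation), about right, too small (roundoff).
f, dfdx = np.sin, np.cos
x0 = 1.0

errs = {}
for h in [1e-1, 1e-5, 1e-13]:
    deriv = (f(x0 + h) - f(x0 - h)) / (2 * h)
    errs[h] = abs(deriv - dfdx(x0))
print(errs)   # the intermediate step size wins by many orders of magnitude
```

If a code has unknown convergence precision, its effective “machine epsilon” is much larger than 1e-16, which pushes the sweet spot to larger step sizes; no visible sweet spot at all is what's concerning.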
Mid-afternoon I had an idea for a new project, related to the above, and also to The Cannon: If we have a pair of matched pairs (yes, four stars), all of which have similar Teff and logg, but two of which are from one open cluster and two of which are from another, we should be able to look at the precision of any possible abundance estimate, even without any model. That is, we can see at what signal-to-noise we can differentiate the pairs, or we can see how much more difference there is across clusters as compared to within clusters. This could be a baby step towards a fully data-driven, spectral-space chemical tagging method, something I have dreamed about before.
[I am supposed to still be on vacation but we came back a day early.]
At the end of the day, Rix and I met with Yuan-Sen Ting (Harvard) and Ness to discuss the state of projects. Ting is modeling stellar spectra with linear combinations of model derivatives and has some extremely promising results (and some small bugs). We worked on diagnosis. One great thing about his project is that many of the results (on fake data, anyway) can be predicted exactly by methods of information theory. That is, he is close to having a testing framework for his scientific project! This relates to things CampHogg has discussed previously.
With Ness we went through all of the figures and discussed their relevance and necessity for her paper on measuring stellar masses and ages with The Cannon. Her current figures follow a very nice set of conventions, in which a progressive “grey” color map is used to indicate density (of, say, points in a scatter plot) but a sequential colored color map is used to indicate any other quantity (for example, mean stellar age in that location). The paper shows clearly that we can measure stellar masses and ages, and explains how. We made many small change requests.
At Galaxy Coffee, Trevor Mendel (MPE) discussed resolved stellar populations as observed with the MUSE integral-field spectroscopy instrument. They extract spectra using spatial priors based on HST imaging, which is not unlike the forced photometry we did with WISE. This is obviously a good idea in confused fields. That led to a bus conversation with Mendel about the confusion limit as applied to spectroscopic (or multi-band imaging) data. The confusion limit is a function (in this case) of the spectral diversity, as well as spatial resolution; indeed a band with low angular resolution but in which there is strong spectral diversity could help enormously with confusion. This is a fundamental astronomy point which (to my limited knowledge of the literature) may be unexplored. We also discussed what the data-analysis problems would look like if we had models of stellar spectra that we believed.
In the afternoon, Price-Whelan and I pair-coded a search algorithm for stellar structures that hew close to one orbit, given noisy phase-space measurements. We chased the problem for a while and then decided we have to suck it up and do some real engineering. Brute-force search is slow! Our usual mode is play first, profile later, but this problem is slow enough that we need to profile before playing. In particular, we have to apply some engineering know-how to the discretization of our model (computed) orbits and to our comparison of data to models (where we probably need a kd-tree or the like).
[Off the grid for the last few days.]
In Milky Way group meeting, Xue (MPIA) talked about our Kinematic Consensus project, which is starting to show promise: We look for sets of stars with observable properties consistent with having the same orbital integrals of motion. It appears to group stars even when the trial potential (we use for orbit construction) is significantly different from the true potential.
At that same meeting, I described my hopes for modeling or measuring the Milky Way star-formation history, using stellar ages derived from stellar masses on the red-giant branch. This got Dalcanton (UW) asking about the point that the age distribution of red-giant stars is not the age distribution of the disk as a whole, since different stars get onto the red-giant branch at different times and stay on for different times. This led to the question: Why I don't model the mass distribution and then transform it into a star-formation history “in post” as it were? I like it! So I will re-code tomorrow. It is very good to talk out a project early rather than late!
In the middle of the day, Price-Whelan and I pair-coded some of his search for substructure, in a project called King Kong, which is a direct competitor to Kinematic Consensus (and inspired by work by Ibata at Strasbourg). Here we look for orbits that “catch” a large number of stars (within their uncertainties). We struggled with some elementary geometry, but prevailed!
Late in the day, Kopytova swung by and we looked at marginalizing out the calibration (or continuum) vectors in her inference of stellar parameters for substellar companions. It turns out that the linear algebra is trivial, along the lines of Foreman-Mackey and my re-formulation of our K2 project. We will pair-code the matrix inversion lemma tomorrow.
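The identity that makes the algebra trivial is worth writing down (a generic sketch with illustrative dimensions, not Kopytova's actual matrices): if the calibration vectors enter linearly with Gaussian priors, marginalizing them out just adds a low-rank piece U Λ Uᵀ to the noise covariance C, and the matrix inversion (Woodbury) lemma turns the n×n solve into a k×k solve.

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 500, 4                            # data points, calibration vectors

C_diag = rng.uniform(1.0, 2.0, n)        # diagonal data-noise variances
U = rng.normal(size=(n, k))              # calibration basis vectors
lam = rng.uniform(0.5, 1.5, k)           # prior variances on their amplitudes
r = rng.normal(size=n)                   # a residual vector to be weighted

# direct: invert the full n x n covariance K = C + U diag(lam) U^T
K = np.diag(C_diag) + (U * lam) @ U.T
direct = np.linalg.solve(K, r)

# Woodbury: only a k x k solve plus diagonal operations
Cinv_r = r / C_diag
Cinv_U = U / C_diag[:, None]
small = np.diag(1.0 / lam) + U.T @ Cinv_U            # k x k
woodbury = Cinv_r - Cinv_U @ np.linalg.solve(small, U.T @ Cinv_r)

print(np.allclose(direct, woodbury))   # same answer, much cheaper
```

The cost drops from O(n³) to O(n k²), which is what makes marginalizing out calibration vectors essentially free when k is small.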
Bird and I spent our last hours together in Heidelberg working out a plan and to-do list for his paper on the age–velocity relation in the Milky-Way disk. We planned tests and extensions, the scope of the minimal paper, and a visualization of the model. We worked out a way to express the model in the space of the raw-ish data, which is important: You can't assess what your model is doing in the space of the latent parameters; the data are the only things that exist (imho). That is, you have to project your model into the data space to assess where it is working and not working. And to decide whether you need to do more. That's old-school Roweis.
At MPIA Galaxy Coffee, Rix talked about mono-abundance populations in the Milky Way, and Dan Weisz talked about combining photometric and spectroscopic observations to constrain the properties of distant, compact star clusters. In the first talk, Rix showed that at low abundances, the stellar populations want negative scale lengths: This is because they are like “donuts” or “rings” around the Milky Way: Low-abundance stars in the disk appear to form far out. In the discussion, we argued about how we could use his results to test models for the thickness of the disk and its dependence on stellar age and orbital radius.
In the second talk, Weisz showed that inclusion (and marginalization out) of a non-trivial calibration model makes the inferences from badly-calibrated spectroscopy much more accurate. They also permit learning about calibration (he showed that you learn correct things about calibration) and they permit principled combination of different data sets of different calibration veracity or trustability. All calibration data are wrong, so you really want to leave it free in everything you do; that's a nice challenge for the next generations of The Cannon.
At MPIA Milky Way group meeting, Rix talked about the recent paper by Sanders & Binney about extended distribution functions using a series of very simple, analytic prescriptions of various inputs. I objected to the paper: I don't see any reason to work with simulation-output-inspired simple functional forms when you have actual simulation outputs! There is no difference, intellectually or conceptually, between a piece of computer code executing tanh() and a similar piece of code executing output_of_simulation_h277() except that the latter, tied as it is to physical models, would be interpretable physically. But anyway. Rix pointed out that their prescription for radial migration could be useful for quantitative analysis of our age results. I agree!
Also at group meeting, and related to radial migration, there were discussions of results from Hernitschek, Sesar, and Inno on variable stars in PanSTARRS. It looks likely that Inno will be able to extract a catalog of Cepheid stars in the disk, and (if we can convince someone to take data) we can get metallicities and radial velocities for them. We also discussed (momentarily) the point that we should be doing visualization, discovery, and astronomy with Hernitschek's immense catalog of variable stars in PanSTARRS: She has colors, magnitudes, and variability timescale and amplitude for every point source!
Today Yuan-Sen Ting (Harvard) gave Ness and me derivatives of Kurucz-like models of APOGEE spectral data with respect to individual element abundances and effective temperature and gravity. Ness and I plotted these theoretical derivatives against “beliefs” obtained by The Cannon about these same dependencies. They agree remarkably well! But in detail they disagree; it remains to figure out what parts of the disagreement are noise or real results. We also showed (based on the derivative vectors) that the precisions with which APOGEE can detect or measure individual abundances are very, very high.
In other news, I found and fixed bugs in my star-formation history code, but none of the bug fixes led to reasonable results. Time to write some tests.