The Cannon and detailed abundances

[I am on vacation this week; that didn't stop me from doing a tiny bit of research.]

I did a bit of writing for the project of taking The Cannon into compressed-sensing territory, while Andy Casey (Cambridge) structures the code so we are ready to work on the problem when he is here in NYC in a couple of weeks. I tried to work out the most conservative possible train–validate–test framework, consistent with some ideas from Foreman-Mackey. I also tried to understand what figures we will make to demonstrate that we are getting better or more informative abundances than other approaches.

Hans-Walter called to discuss the behavior of The Cannon when we try to do large numbers of chemical abundance labels. The code finds that its best model for one element makes use of lines from other elements. Why? He pointed out (correctly) that The Cannon does its best to predict abundances. In no sense is it directly measuring the abundances. It is doing its best to predict, and the best prediction will measure the element directly, and also include useful indirect information. So we have to decide what our goals are, and whether to restrict the model.


linear classifier for projections, Herschel

In the morning, Joakim Andén (Princeton) spoke at the SCDA about classifying noisy cryo-EM projected molecules into different categories coming from different (but similar) conformations of the same basic structure. These are subtle differences, and each data point (which is a tiny, noisy image) is only a randomly oriented projection of the molecule in question. He develops the best linear discriminant by finding the eigenvalues of the data matrix in the three-dimensional space, which is clever because he only has the data in (many) two-dimensional projections. His method works well and is fast. This is very relevant to the galaxy deprojection project I have going with Baron. The only significant issue with the method is that it assumes that the angles for the projections are known. They aren't really; it is interesting to think about the generalization to the case of unknown projection angles.

In the afternoon, Kapala (Cape Town), Lang, and I assigned short-term tasks for our Herschel dust-mapping project: Kapala will get the data in order and will read, understand, and add comments to the (legacy) code. Dustin will get the code working and delivering results, and I will write the abstract and outline the paper. We worked a bit on the title late in the day.


toy deprojection problem

I coded up a toy problem from the (paywalled) Sigworth paper and solved it by optimizing a marginalized likelihood function. The toy problem is an 8×8 image subject to all 64 possible integer cyclic shifts and all 4 rotations by multiples of 90 degrees (256 possible views), and then noisified. The shifts and rotations are not known in advance; we only have the data. The marginalized likelihood optimization rocks! Below find 16 data examples (from a set of 512 total) and then some iterations (starting from a random guess) of the likelihood optimization. Rocking it!
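A minimal sketch of the toy problem in Python with NumPy. This is a reconstruction for illustration, not the code behind the figures: it assumes Gaussian pixel noise of known σ and a uniform prior over the 256 views, and it climbs the marginalized likelihood with expectation-maximization (EM), which may differ from the optimizer that produced the iterations above. The helper names (`all_views`, `undo_views`, `em_step`) are hypothetical.

```python
import numpy as np

def all_views(img):
    """Every view of a square image: 4 rotations by multiples of 90 degrees, each at every cyclic shift."""
    n = img.shape[0]
    return np.array([np.roll(np.rot90(img, k), (dy, dx), axis=(0, 1))
                     for k in range(4) for dy in range(n) for dx in range(n)])

def undo_views(stack):
    """Apply the inverse of view i to stack[i], with views ordered as in all_views."""
    n = stack.shape[1]
    out = np.empty_like(stack)
    i = 0
    for k in range(4):
        for dy in range(n):
            for dx in range(n):
                # undo the shift first, then the rotation
                out[i] = np.rot90(np.roll(stack[i], (-dy, -dx), axis=(0, 1)), -k)
                i += 1
    return out

def em_step(model, data, sigma):
    """One EM update; also returns the log marginalized likelihood of the input model (up to a constant)."""
    views = all_views(model)                                       # (V, n, n)
    chi2 = ((data[:, None] - views[None]) ** 2).sum(axis=(2, 3))   # (N, V)
    loglike = -0.5 * chi2 / sigma**2
    # log-sum-exp over views: the likelihood marginalized over a uniform prior, up to -log V
    peak = loglike.max(axis=1, keepdims=True)
    log_marg = peak[:, 0] + np.log(np.exp(loglike - peak).sum(axis=1))
    resp = np.exp(loglike - log_marg[:, None])                     # posterior over views; rows sum to 1
    weighted = np.tensordot(resp, data, axes=(0, 0))               # (V, n, n): sum over data of r_nv * d_n
    new_model = undo_views(weighted).sum(axis=0) / len(data)
    return new_model, log_marg.sum()
```

Because every view is a pixel permutation, the M-step is just a responsibility-weighted average of un-shifted, un-rotated data. The answer is only defined up to one global shift and rotation, since the likelihood cannot distinguish those.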


cryo-EM calculus, NASA

I started reading this classic (and paywalled) paper by Sigworth on cryo-EM image reconstruction. It contains an error in elementary probability (one I warn against in my probability calculus tutorial): It marginalizes out the angle variables using the likelihood or posterior on angles where it should use the prior. However, the paper is filled with good ideas and is strongly related to what I hope to do in this area. Indeed, I suspect that all the opportunity is in incremental improvements, not fundamentals. The paper also contains some nice toy problems!
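To make the point concrete, in notation chosen here for illustration (not Sigworth's): let y_n be the n-th image, φ_n its unknown projection angles, and m the three-dimensional model. The marginalization weights each angle by the prior p(φ_n), and, because the images are independent given m, the per-image likelihoods multiply:

```latex
p(y_n \mid m) = \int p(y_n \mid \phi_n, m)\, p(\phi_n)\, \mathrm{d}\phi_n ,
\qquad
p(\{y_n\}_{n=1}^{N} \mid m) = \prod_{n=1}^{N} p(y_n \mid m) .
```

Putting p(φ_n | y_n, m) in place of p(φ_n) inside the integral gives a different quantity, which is not the marginal likelihood of m.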

In the afternoon and into the evening, I participated in the Spitzer Oversight Committee meeting (from my sick bed and by telecon). The big issue is the upcoming Senior Review; the project needs to figure out how many more years to ask for, at what budget, and for what reason. These are hard questions, even though the project has been incredibly productive in its latter years (and more productive per dollar every year).


image modeling and optimization

The morning began with a talk by Oguz Semerci (Schlumberger-Doll) about optimization and inverse problems in imaging where the data are non-trivially related to the image of interest and the problem is under-determined (requiring regularization). He showed examples from medical imaging, airport security, and oil exploration. Unlike many talks I see in these areas, he wasn't restricting himself to convex problems but still had very good performance. My only complaint would be that he was custom-building optimizers for each problem; I would be surprised if he beats the best industrial optimizers out there. Of course Lang and I are guilty of the same thing in The Tractor! One beautiful idea in his talk is the use of level sets to define the shapes of complex, opaque objects (or image segments).
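A tiny illustration of the level-set idea (a toy of my own, not from the talk): a shape is the region where a smooth function φ is non-positive, so changing φ can merge or split pieces of the shape with no explicit boundary bookkeeping. The function name and the two-disk φ are hypothetical choices.

```python
import numpy as np

def two_disk_level_set(x, y, centers, radius):
    """phi = distance to the nearest center minus radius; the shape is the region phi <= 0."""
    distances = [np.hypot(x - cx, y - cy) for cx, cy in centers]
    return np.min(distances, axis=0) - radius

# Evaluate on a grid: at radius 1 the shape is two separate disks, at radius 2 one merged blob.
xx, yy = np.meshgrid(np.linspace(-4, 4, 81), np.linspace(-4, 4, 81))
centers = [(-1.5, 0.0), (1.5, 0.0)]
small = two_disk_level_set(xx, yy, centers, 1.0) <= 0
large = two_disk_level_set(xx, yy, centers, 2.0) <= 0
```

The topology change happens just by shifting φ by a constant; an optimizer working on φ gets that flexibility for free.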

Mid-day, Maria Kapala (Cape Town), Dustin Lang, and I had a chat about how to proceed on modeling Herschel imaging. Lang agreed to help Kapala get our old code from 2012 working again, and consult on optimization. I promised to start writing the paper (my favorite) and Kapala agreed to gather the relevant data for paper 1. (Somehow the two paragraphs of this research-blog post are more similar than I expected.)


high-resolution dust maps, data-driven SED templates

Maria Kapala (Cape Town) showed up today, to discuss analyses of multi-wavelength imaging. Our idea, which was born years ago in work with Lang, is to build a dust map that constrains dust density and temperature and emission properties using all the Herschel bands, but works at the angular resolution of the best band. The usual practice is to smooth everything to the worst band!
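A one-dimensional, linear sketch of why this can work, under simplifying assumptions that are mine rather than the project's: known Gaussian PSFs, a single map whose brightness in each band is a known scale factor times the map (no temperature or emissivity fitting), and Gaussian noise. Every band is forward-modeled from one map on the native pixel grid, so the sharpest band constrains the small scales while the blurrier bands still contribute. `blur_matrix` and `joint_map` are hypothetical names.

```python
import numpy as np

def blur_matrix(n, width):
    """Dense 1D Gaussian convolution matrix with unit-sum rows, standing in for one band's PSF."""
    x = np.arange(n)
    K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / width) ** 2)
    return K / K.sum(axis=1, keepdims=True)

def joint_map(data, widths, scales, sigmas, ridge=1e-6):
    """Noise-weighted least-squares map at native resolution, fitting every band through its own PSF."""
    n = len(data[0])
    # Stack all bands into one linear system: (scale * PSF / sigma) @ map = data / sigma
    A = np.vstack([s * blur_matrix(n, w) / sig for w, s, sig in zip(widths, scales, sigmas)])
    b = np.concatenate([np.asarray(y) / sig for y, sig in zip(data, sigmas)])
    # A tiny ridge term keeps the normal equations solvable for modes no band constrains
    return np.linalg.solve(A.T @ A + ridge * np.eye(n), A.T @ b)
```

The real problem is nonlinear in dust temperature, so it needs a nonlinear optimizer, but the structure (one high-resolution model, each band predicted through its own PSF) is the same.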

Also had a long conversation with Boris Leistedt (NYU) about learning the templates simultaneously with the redshifts in a template-based photometric-redshift system. This is the right thing to do: It captures the causal model that is inherent in the template-based systems, but also captures the data-driven advantages of a machine-learning method. I am interested to know how accurately and at what resolution we could recover templates in realistic fake-data tests. We scoped a first paper on the subject.


AAAC, day 2

Today was the second day of the Astronomy and Astrophysics Advisory Committee meeting. The most interesting material today was a report on proposal success rates and proposal pressures, particularly focusing on the NSF programs. The beautiful result is that none of the standard myths about proposal over-subscription are correct: It is not coming from an increase in the size of our community, it is not coming from faculty at smaller or non-traditional research institutions, it is not coming from any simple demographic changes, it is not coming from any increase in typical proposal budgets, and it is definitely not that people are writing proposals more hastily and less well. And these myth-bustings are empirical findings from data gathered by a subcommittee of the AAAC.

It appears that the vastly increased proposal pressure is coming from the resubmission of proposals rated Very Good to Excellent, which are getting rejected more and more because the funding rate is now so low. That is, there is a runaway process: When the funding rate gets low enough, very good proposals are rejected with good comments, the proposers resubmit, and the resubmissions further increase the proposal pressure and further reduce the funding rate. This effect is expected to grow when LSST comes on-line, because the NSF AST budget has to absorb LSST operations.
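A toy steady-state model of this feedback loop, entirely my own and not the subcommittee's analysis: each cycle brings N new proposals, the budget funds F, and a fraction r of the rejected proposals come back next cycle. In steady state P = N + r(P − F), so P = (N − rF)/(1 − r). The function name is hypothetical.

```python
def steady_state_pressure(new_per_cycle, funded_per_cycle, resubmit_fraction):
    """Steady-state proposals per cycle P and funding rate F / P in a toy resubmission model.

    Solves P = N + r * (P - F): N new proposals, F funded, a fraction r of rejections resubmitted.
    Assumes 0 <= r < 1 and F <= N.
    """
    N, F, r = new_per_cycle, funded_per_cycle, resubmit_fraction
    if not 0.0 <= r < 1.0 or F > N:
        raise ValueError("need 0 <= r < 1 and F <= N")
    pressure = (N - r * F) / (1.0 - r)
    return pressure, F / pressure
```

With N = 100 and F = 20, no resubmission gives a 20 percent funding rate; if half of the rejected proposals return, the pressure rises to 180 per cycle and the rate falls to about 11 percent, with no change in the community or the budget.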


AAAC, day 1

Today was the first day of the Astronomy and Astrophysics Advisory Committee meeting. The AAAC oversees the points of mutual interest and overlap between NSF, NASA, and DOE. For me the highlight of the meeting was consideration of the National Academy of Sciences “OIR Report”. It explicitly calls out the community and the funding agencies to make sure we are properly preparing students to work in data analysis, observing, and instrumentation.

These things were all standard in the astronomy of past ages. Then instrumentation became more professionalized as instruments got more complicated (and more expensive), and fewer students built instruments. Then observing started to become professionalized with big surveys like SDSS. And data analysis is headed the same way, as our precision goals get more challenging and our data sets get larger. What to do about all this is unclear. However, it is abundantly clear that if we start to lose track of where our data come from or what has happened to them in processing, we are doomed.


marginalized likelihoods for cryo-EM

For health and civic-holiday reasons, it was a very short day today. I took a look at some of the classic or leading papers on cryo-EM data reduction, provided to me by Marina Spivak (SCDA). Many of these talk about optimizing a marginalized likelihood, which was just about all the cleverness I had come up with myself, so I am not sure I have much to contribute to this literature! The idea is that you really do have a strong prior on the distribution of projection angles (isotropic!) and you really do think each molecular shadow is independent, so there really is a case for marginalizing the likelihood function. Magland showed me correlations (time correlations) among neuron firings in his spike-sorting output. There is lots of phenomenology to understand there!


bias-variance trade-off, anomalies

In another day out of commission, I spoke to Foreman-Mackey at length about various matters statistical, and wrote some text for a Tinker, Blanton, & Hogg NSF proposal. The statistics discussion ranged all around, but perhaps the most important outcome is that Foreman-Mackey clarified for me some things about the cryo-EM and galaxy deprojection projects I have been thinking about. The question is: Can averaging (apparently) similar projected images help with inferring angles and reconstruction? Foreman-Mackey noted that if we condition on the three-dimensional model, the projections are independent. Therefore there can be no help from neighboring images in image space. They might decrease the variance of any estimator, but they would do so at the cost of bias. You can't decrease variance without increasing bias if you aren't bringing in new information. At first I objected to this line of argument and then all of a sudden I had that “duh” moment. Now I have to read the literature to see if mistakes have been made along these lines.
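A deterministic toy of the trade-off (my own, not Foreman-Mackey's argument in code): estimate a smoothly varying quantity θ_i from noisy measurements y_i = θ_i + noise, either from y_i alone or from the mean of its 2k + 1 neighbors. Averaging divides the variance by 2k + 1, but because the neighbors have different true values, it adds bias. The function name and the quadratic θ are assumptions for illustration.

```python
import numpy as np

def window_average_bias_variance(theta, sigma, half_width):
    """Per-point bias and variance of estimating theta[i] by the mean of 2*half_width + 1 neighboring y's.

    Assumes y[j] = theta[j] + independent noise with standard deviation sigma.
    Only interior points, where the whole window fits, are returned.
    """
    theta = np.asarray(theta, dtype=float)
    width = 2 * half_width + 1
    centers = np.arange(half_width, len(theta) - half_width)
    # The expected window mean is the window mean of the true values.
    expected = np.array([theta[i - half_width:i + half_width + 1].mean() for i in centers])
    bias = expected - theta[centers]
    variance = np.full(len(centers), sigma**2 / width)
    return bias, variance
```

For a convex θ such as 10x², any window wider than one point overshoots everywhere, so the variance reduction is always paid for in bias.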

The text I wrote for the proposal was about CMB anomalies and the corresponding large-scale-structure searches that might find anomalies in three dimensions. Statistical homogeneity and isotropy are such fundamental predictions of the standard cosmological model that they are very worth testing. Any anomalies found would be very productive.


The Cannon is an interpolator

In a day wrecked by health issues (I was supposed to be spending the day with visiting researchers from Northrop Grumman), I did manage to write a few equations and words into the method section of the document describing our plans for a compressed-sensing upgrade of The Cannon. It is so simple! And I realize we have to correct our argument (in the first paper) that we need exponentially large training sets to go to large numbers of labels. That argument was just plain wrong; it was based on an assumption that The Cannon is effectively a density estimator. It is not; it is essentially an interpolator.
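A minimal sketch of a Cannon-style model, to show why it behaves like an interpolator: each pixel's flux is a linear function of a quadratic "label vector", fit by least squares to the training stars. It omits the per-pixel scatter term and the noise weighting of the real code, and the function names are hypothetical. For K labels there are 1 + K + K(K + 1)/2 coefficients per pixel, which grows quadratically, not exponentially, with K.

```python
import numpy as np
from itertools import combinations_with_replacement

def label_vector(labels):
    """Rows [1, l_1..l_K, l_j * l_k for j <= k] for an (n_stars, K) label array."""
    labels = np.atleast_2d(labels)
    K = labels.shape[1]
    cols = [np.ones(len(labels))]
    cols += [labels[:, k] for k in range(K)]
    cols += [labels[:, j] * labels[:, k] for j, k in combinations_with_replacement(range(K), 2)]
    return np.column_stack(cols)

def train(labels, fluxes):
    """Unweighted least-squares coefficients, one column per pixel."""
    coeffs, *_ = np.linalg.lstsq(label_vector(labels), fluxes, rcond=None)
    return coeffs

def predict(coeffs, labels):
    """Model fluxes at new labels: a smooth interpolation between the training stars."""
    return label_vector(labels) @ coeffs
```

With 20 labels that is 231 coefficients per pixel; the training set has to exceed that comfortably, but it does not have to fill a 20-dimensional volume.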


writing and talking

In another day with limited mobility, one small victory was drafting an abstract for upcoming work on The Cannon with Andy Casey (Cambridge). I like to draft an abstract, introduction, and method section before I start a project, to check the scope and (effectively) set the milestones. We plan to get the benefits of both great model freedom and parsimony by using methods from compressed sensing.
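One standard compressed-sensing tool is an L1 (lasso) penalty, which drives most coefficients exactly to zero and so buys parsimony inside a very flexible model. Assuming that is the kind of method meant here, this is a minimal sketch solved with the iterative shrinkage-thresholding algorithm (ISTA); the names are hypothetical. In a Cannon-style model, A would be the label-vector design matrix and y one pixel's fluxes across the training set.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1: shrink toward zero, clipping small entries to exactly zero."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, y, lam, n_iter=500):
    """Minimize 0.5 * ||y - A x||^2 + lam * ||x||_1 by proximal gradient descent (ISTA)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)             # gradient of the squared-error term
        x = soft_threshold(x - step * grad, step * lam)
    return x
```

Larger `lam` gives sparser coefficient vectors; in practice it would be set by validation rather than by hand.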

I also had a few conversations; I spoke with Dun Wang and Schiminovich about Wang's work on inferring the GALEX flat-field. We made a plan for next steps, which include inferring the stellar flat and the sky flat separately (which is unusual for a spacecraft calibration). I spoke with Magland about colors, layout, and layering in his human interface to neuroscience data. This interface has some sophisticated inference under the hood, but needs also to have excellent look and feel, because he wants customers.


faint asteroids; statistical significance in dollar units

In a day shortened by health issues, I did get in a good conversation with David Schlegel (LBL), Aaron Meisner (LBL), and Dustin Lang on asteroid detection below the “plate limit”. That is, if we have multi-epoch imaging spread out over time, and we want to find asteroids, do we have to detect objects in each individual exposure or frame and then line up the detections into orbits, or can we search without individual-image detections? Of course the answer is we don't have to detect first, and we can find things below the individual-image detection limits. Meisner has even shown this to be true for the WISE data. We discussed how to efficiently search for faint, Earth-crossing (or impacting) asteroids.
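A minimal shift-and-stack sketch of searching below the single-frame limit, under assumptions made for illustration (integer-pixel motion along straight lines, known epochs, white noise); it is not Meisner's WISE pipeline, and `shift_and_stack` is a hypothetical name. For each trial velocity the frames are shifted back along that motion and summed, so a source moving at that velocity adds coherently while the noise grows only as the square root of the number of frames.

```python
import numpy as np

def shift_and_stack(frames, times, velocities):
    """Sum frames after undoing each trial constant motion.

    frames: array (T, ny, nx); times: integer epochs; velocities: (vy, vx) in pixels per epoch.
    Returns a dict mapping each trial velocity to its stacked image.
    """
    stacks = {}
    for vy, vx in velocities:
        stack = np.zeros(frames.shape[1:])
        for frame, t in zip(frames, times):
            # move a source traveling at (vy, vx) back to its position at t = 0
            stack += np.roll(frame, (-vy * t, -vx * t), axis=(0, 1))
        stacks[(vy, vx)] = stack
    return stacks
```

With 16 frames, a source at 2σ per frame stacks to about 8σ at the right velocity. Real searches need sub-pixel shifts and curved orbits, and the number of trial trajectories grows quickly, which is where the efficiency question bites.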

I had lunch with luminary David Donoho (Stanford); we discussed a very clever idea of his regarding significance thresholds (like five-sigma or ten-sigma): The idea is that a five-sigma result is only interesting if it is unlikely that the investigator would have reached this threshold by chance. As computers grow and data-science techniques evolve, it is easier to test more and more hypotheses, and therefore accidentally find (say) five-sigma results. Really the question should be: How expensive would it be to find this result by chance? That is, how much computation would I have to do on a “null” data set to accidentally discover a result of this significance? If the answer is “$5000 of Amazon EC2 time” then the result isn't really all that significant, even if it is many sigma! If the answer is “a billion dollars” it is, probably, significant. We expanded on this idea in a number of directions, including what it would take to keep such calculations (translations of significance into dollars) up-to-date, and how to get this project funded!
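A back-of-the-envelope version of the idea, under assumptions of my own: null tests are independent, each costs a fixed (hypothetical) number of dollars, and "n sigma" means a one-sided Gaussian tail probability p. The number of null tests until the first chance exceedance is geometric with mean 1/p, so the expected cost of finding the result by chance is the cost per test divided by p. The function names are hypothetical.

```python
import math

def tail_probability(n_sigma):
    """One-sided Gaussian tail probability of an n-sigma fluctuation."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))

def dollars_to_find_by_chance(n_sigma, dollars_per_null_test):
    """Expected spend on independent null tests before one reaches n_sigma by chance."""
    return dollars_per_null_test / tail_probability(n_sigma)
```

At a hypothetical tenth of a cent per null test, a five-sigma result costs only about $3,500 to find by chance, while ten sigma costs of order 10^20 dollars. Correlated tests would lower these numbers, so this is an optimistic bound on significance.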


catalog generation as greedy optimization

I had to cancel a trip to Rutgers today for health reasons. My only research was some conversation with Magland about the relationship between the standard data-analysis practice of executing sets of sequential operations on data, and the concept of optimizing a scalar objective function. The context is spike sorting for neuroscience and catalog generation for astronomical imaging surveys.

When the data are complex and there is no simple, compact parametric model, it is hard to just optimize a likelihood or penalized likelihood or utility (though that doesn't stop us with The Tractor). However, sequential heuristic procedures can be designed to be some kind of locally greedy optimization of a scalar. That is, even if the code isn't explicitly an optimization, it can implicitly use the concept of optimization to set the values of otherwise arbitrary parameters (like detection thresholds, window sizes, and decision boundaries).
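A minimal sketch of that view (an illustration, not Magland's spike sorter or The Tractor): a one-dimensional greedy catalog maker that adds point sources one at a time, each at the position and flux that most lower chi-squared, and stops when the drop no longer exceeds a fixed per-source penalty. This is essentially matching pursuit (or CLEAN) with a stopping rule. The penalty plays the role of the detection threshold: for one source the chi-squared drop equals the square of its matched-filter signal-to-noise ratio, so a penalty of 25 is a five-sigma cut. The names, the Gaussian PSF, and the parameter values are hypothetical.

```python
import numpy as np

def gaussian_psf(n, center, width=2.0):
    """Unnormalized Gaussian profile on an n-pixel grid."""
    x = np.arange(n)
    return np.exp(-0.5 * ((x - center) / width) ** 2)

def greedy_catalog(data, sigma, penalty, width=2.0, max_sources=50):
    """Greedily add sources while each lowers chi^2 + penalty * (number of sources)."""
    n = len(data)
    psfs = np.array([gaussian_psf(n, c, width) for c in range(n)])  # one candidate per pixel
    norms = np.sum(psfs**2, axis=1)
    residual = np.array(data, dtype=float)
    catalog = []
    for _ in range(max_sources):
        proj = psfs @ residual                 # matched-filter response at every candidate position
        drops = proj**2 / norms / sigma**2     # chi^2 decrease from one more source at its best flux
        best = int(np.argmax(drops))
        if drops[best] <= penalty:             # the next source would not pay its per-source penalty
            break
        flux = proj[best] / norms[best]
        catalog.append((best, flux))
        residual -= flux * psfs[best]
    return catalog, residual
```

Nothing here is labeled "optimization", yet the threshold is not arbitrary: it is the price per source in a penalized likelihood, and the loop is a locally greedy descent on that scalar.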