SVM, ML, hierarchical, and torques

Fadely dropped in on Brewer, Foreman-Mackey, and me for the day. He has very nice ROC curves for the performance of (1) a support vector machine (with the kernel trick), (2) our maximum-likelihood template method, and (3) our hierarchical Bayesian template method, for separating stars from galaxies using photometric data. His curves show various things, not limited to the following: Hierarchical Bayesian inference beats maximum likelihood always (as we expect; it is almost provable). An SVM trained on a random subsample of the data beats everything (as we should expect; Bernhard Schölkopf at Tübingen teaches us this), probably because it is data-driven. An SVM trained much more realistically on the better part (a high signal-to-noise subsample) of the data does much worse than the hierarchical Bayesian method. This latter point is important: SVMs rock and are awesome, but if your training data differ in S/N from your test data, they can be very bad. I would argue that that is the generic situation in astrophysics (because your well-labeled data are your best data).
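None of Fadely's data or code is here, of course, but the ROC bookkeeping itself is simple; here is a minimal numpy sketch (toy labels and scores invented for the demo) of how a curve like his is traced by sweeping a threshold over classifier scores:

```python
import numpy as np

def roc_points(labels, scores):
    """Trace an ROC curve: sort by descending classifier score and
    accumulate true- and false-positive rates at each threshold."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    labels = np.asarray(labels)[order]
    tps = np.cumsum(labels)        # true positives above each threshold
    fps = np.cumsum(1 - labels)    # false positives above each threshold
    tpr = tps / labels.sum()
    fpr = fps / (len(labels) - labels.sum())
    return fpr, tpr

# toy example: 1 = star, 0 = galaxy; higher score = more star-like
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
fpr, tpr = roc_points(labels, scores)
```

Comparing classifiers then amounts to comparing these curves: the better method has a higher true-positive rate at every false-positive rate.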

In the afternoon, David Merritt (RIT) gave a great talk about stellar orbits near the central black hole in the Galactic Center. He showed that relativistic precession has a significant effect on the statistics of the random torques to which the orbits are subject (from stars on other orbits). This has big implications for the scattering of stars onto gravitational-radiation-relevant inspirals. He showed some extremely accurate N-body simulations in this hard regime (short-period orbits, intermediate-period precession, but very long-timescale random-walking and scattering).


measuring the undetectable

Brewer, Foreman-Mackey, and I have been working hard on sampling all week, so we took a break today to discuss the astrophysics of the problem. We want to measure the number–flux relation (or luminosity function) of stars in a cluster, below where confusion becomes a problem for identifying or photometering stars. There is an old literature from radio astronomy in the sixties and seventies about inferring properties of faint, overlapping sources from looking at statistics of the resulting confusion noise. But nowadays with awesome probabilistic techniques like Brewer's, we might be able to directly generatively model the confused background, and thereby produce probabilistic information about the astrophysical systems that generate it. Could be huge, in these days of confused Herschel data, the PHAT data on M31, the Galactic Center, and so on.
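To make the idea concrete, here is a toy, entirely invented, generative sketch of confusion noise: draw faint sources from a power-law number–flux relation, drop them into random pixels, and note that the pixel statistics remember the slope even though no individual source is detected (every number and name below is made up for the demo, not anything from the real problem):

```python
import numpy as np

rng = np.random.default_rng(42)

def confusion_pixels(alpha, n_sources=20000, n_pix=1000):
    """Scatter faint sources with power-law fluxes dN/dF ~ F^(-alpha)
    (truncated to [0.01, 1]) into random pixels; return the image."""
    u = rng.uniform(size=n_sources)
    fmin, fmax = 0.01, 1.0
    # inverse-CDF sampling of the truncated power law (alpha != 1)
    fluxes = (fmin ** (1.0 - alpha)
              + u * (fmax ** (1.0 - alpha) - fmin ** (1.0 - alpha))
              ) ** (1.0 / (1.0 - alpha))
    pix = rng.integers(0, n_pix, size=n_sources)
    return np.bincount(pix, weights=fluxes, minlength=n_pix)

# a steeper number-flux relation puts more of the counts in fainter
# sources, so the per-pixel flux distribution shifts and re-shapes
shallow = confusion_pixels(alpha=1.5)
steep = confusion_pixels(alpha=2.5)
```

The old radio-astronomy P(D) trick is essentially to run this forward model inside a likelihood for the observed pixel histogram; the modern move is to sample over the individual sources themselves.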

One of the things we discussed is how to make the problem as challenging as possible. We don't want to do any cheating, where the brighter (resolved) stars are effectively giving us the information we seek about the fainter (unresolved) stars.


sampling and degeneracies

Brendon Brewer (UCSB), Foreman-Mackey, and I worked on sampling for most of the day, with some calls to Lang. I want to start a challenge called MCMC High Society, which is a bake-off for samplers that can handle ten thousand parameters. However, in the benchmark problem we have, Brewer's DNest sampler is crushing emcee! That might not be surprising, because we have chosen a problem with massive combinatoric degeneracies (with large low-probability valleys separating them) and emcee is not awesome in that situation. Insights from the day include the following: You should measure your sampler's autocorrelation time in the data space (prediction space) not the parameter space (at least if you aren't a realist, and I am not). Ensemble samplers (like emcee) that work with pairs of walkers should keep track of the acceptance fraction as a function of the walker pair involved in the proposal. Multi-modality in the posterior is the norm, not the exception, and it is hard to deal with for any sampler; that's the whole deal. Samplers that estimate the Bayes integral as they sample might be slow for easy problems, but they might be much better for hard problems, and it is hard problems that are game changers.
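Here is a toy numpy illustration of the first insight (not our benchmark problem; the chain, the hop rate, and the "prediction" are all invented): a chain with a label-switching degeneracy has a long autocorrelation time in parameter space but a short one in prediction space, because the prediction is invariant under the swap:

```python
import numpy as np

def autocorr_time(x, window):
    """Crude integrated autocorrelation time of a 1-D chain (FFT-based)."""
    x = np.asarray(x) - np.mean(x)
    n = len(x)
    f = np.fft.fft(x, 2 * n)                    # zero-padded transform
    acf = np.fft.ifft(f * np.conj(f)).real[:n]  # autocovariance by lag
    acf /= acf[0]
    return 1.0 + 2.0 * np.sum(acf[1:window])

# a two-mode chain: parameters (a, b) sit near (0, 5) or (5, 0) and
# rarely swap; the "prediction" a + b is invariant under the swap
rng = np.random.default_rng(1)
n = 20000
a = np.zeros(n)
b = np.zeros(n)
ca, cb = 0.0, 5.0
for i in range(1, n):
    if rng.uniform() < 0.005:   # rare hop between the degenerate modes
        ca, cb = cb, ca
    a[i] = ca + 0.9 * (a[i - 1] - ca) + rng.normal()
    b[i] = cb + 0.9 * (b[i - 1] - cb) + rng.normal()
prediction = a + b

tau_param = autocorr_time(a, window=600)
tau_pred = autocorr_time(prediction, window=600)
```

The parameter-space time is dominated by the rare mode hops; the prediction-space time only sees the within-mode wander, which is the thing you actually care about if you are not a realist.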


not much

Can't say I did all that much today, which is a problem since it is research-only-Tuesday. I spoke at length with Lang about Tractor paper 0, which is still just a gleam in our eyes. I spoke briefly with Brewer and Foreman-Mackey, who are working on a bake-off of samplers on a very hard problem (see yesterday's post). I worked a bit more on a note about photometric-redshift ideas for Alexandra Abate (Arizona), but it isn't moving forward very fast!


Brewer benchmark

Today Brendon Brewer (UCSB) showed up for a week of probabilistic reasoning. He sat in on the weekly meeting of Goodman, Hou, Foreman-Mackey, and me, where we discuss all things sampling and Bayes and stellar oscillations (these days). In the afternoon, we tried to get serious about what we would try to accomplish by the end of the week. One thing Brewer and we are both interested in is the performance of samplers in situations where the number of parameters gets large. Another is adapting Foreman-Mackey's emcee code to make it capable of measuring the evidence integral (fully marginalized likelihood).

In the end, we formulated a problem that is amusing to all of us and also a good test of samplers in the large-numbers-of-continuous-parameters regime: Modeling an astronomical image of a crowded field. We figured out three potential publications: The first is a benchmark for samplers working in the astronomical context. The second is a python wrap and release of Brewer's simulated-tempering-like sampler for dealing with multimodal distributions. The third is a short paper that shows that you can figure out the brightness function of stars below the brightness levels at which you can reliably resolve them in a crowded field. That last project fits into the Camp Hogg brand of measuring the undetectable.


reverberation mapping

Aaron Barth (UCI) gave our seminar today, about reverberation mapping to get black-hole masses in galaxies with active nuclei. He showed results on attempts to truly invert the problem and infer the distribution of gas; Brendon Brewer (UCSB), who will be visiting me next week, has been doing some great stuff there with his collaborators at UCSB. Barth emphasized that reverberation mapping is not a precise tool, but it seems to me that with serious modeling of a serious spectroscopic observing program, it perhaps could be.


after SDSS-III

On the train to and from Baltimore, I worked on typing up everything Fergus has done so far in modeling data from the P1640 spectroscopic coronagraph. It is all words, and one heck of a lot of them, but I somehow feel like equations will be even more confusing.

I spent the full work day at JHU in a meeting about the projects that are being assembled to make use of the SDSS 2.5-m Telescope and its spectrographs in 2014 and beyond. It was a great meeting, chaired by AS3 (yes, "After SDSS-III") Director Mike Blanton (congratulations!). So much was conveyed and discussed in one intense day, I could never properly summarize it. But here are a few highlights for me:

Blanton reviewed the status and budget; with the current imagined scope the budget really can't be less than 40M USD; at full scope and a southern-hemisphere companion instrument it gets close to 60M USD. That's not cheap, so we have to create great things. Bundy overviewed the MaNGA project to perform full IFU spectroscopy on 10,000 nearby galaxies to do lots of science and create a massive legacy data set. Drory convinced me that fiber bundles—at least mass-produced fiber bundles—are not a solved problem and represent considerable project risk. During Wake's talk about sample selection a brief battle (started by Drory) broke out about selecting on purely "observational" properties relative to inferred "physical" properties. It was a great discussion and hit many philosophical issues of the responsibility for hypothesis testing between observers and theorists and so on. Of course I scare-quoted the above words because the distinction is not totally sound.

Kneib talked about the eBOSS project to continue measuring the baryon acoustic feature at many redshifts with many probes. His thinking about all this has led to a number of enormous proposals (my name on some of them) going in to large telescopes to do enormous imaging projects for selection and weak lensing. I have found a great audience for Holmes, Rix, and my paper on self-calibration. He also showed that he can easily find emission-line galaxies at a wide range of redshifts in SDSS and measure their redshifts spectroscopically with the BOSS spectrograph. He showed a galaxy at redshift 1.6 measured with BOSS. Newman showed that WISE does an amazing job of selecting redshift-unity luminous red galaxies. It can select them confidently at magnitudes fainter than we can confidently get spectroscopic redshifts! Crazy! Green overviewed the ambitious TDSS, which plans—I love this—to take a spectrum of every (in some well-defined sense) variable object in the PanSTARRS imaging footprint. Awesome!

I got some reactions out of the crowd on three points: I argued that spectrophotometry for the IFU survey is fundamentally different than it was for the previous three SDSS surveys. The arguments for this are subtle and I should write them down carefully. I argued that every MaNGA cartridge should be different by design to maximize efficiency. That was shouted down because of complexity; I don't in fact disagree. I argued that even if you can't get a redshift for a lot of the faint LRGs, they still might be useful for constraining the baryon acoustic feature. That just got laughs!


why stop at a few eigenvectors?

My loyal reader knows that I hate PCA. That said, Fergus and I are using it to model Oppenheimer's P1640 imaging spectrographic coronagraph. We find that the right number of eigenvectors to use from the PCA is in the hundreds, not the few to dozen that astronomers are used to. The reason? Fergus and I need to represent the data at exceedingly high accuracy to find faint companions (think exoplanets) among the speckly noise. That said, with hundreds of components, a PCA can properly model any exoplanet and all the speckles, so we use a train-and-test framework, in which the pixels of interest for finding the exoplanet are not used in building the eigenvectors (which are then subsequently used to model the pixels of interest). That permits us to go to immense model complexity without over-fitting. I love it because it is so crazy; we are barely even compressing the signal with the PCA; we really are just using the PCA to figure out if the pixels of interest are outliers relative to the pixels not of interest. Of course, because all pixels are of interest, we are cycling through all choices of the pixels of interest (and their complementary training set). My job is to write this all up. By Friday! Luckily I am on a train to Baltimore tomorrow. If you have recently sent me email: Expect high latency.
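The held-out-pixel trick can be sketched in a few lines of numpy (fake data, made-up sizes, and a least-squares regression standing in for the real pipeline): build the eigenbasis without the pixels of interest, then ask how well the per-image amplitudes predict them:

```python
import numpy as np

rng = np.random.default_rng(3)

# fake image stack: a few shared "speckle" patterns with per-image
# amplitudes, plus a little noise; all sizes here are made up
n_images, n_pixels, n_comp = 200, 400, 5
speckles = rng.normal(size=(n_comp, n_pixels))
amps = rng.normal(size=(n_images, n_comp))
data = amps @ speckles + 0.01 * rng.normal(size=(n_images, n_pixels))

# hold out the "pixels of interest" when building the eigenvectors
interest = np.arange(10)
train = np.setdiff1d(np.arange(n_pixels), interest)

U, S, Vt = np.linalg.svd(data[:, train] - data[:, train].mean(axis=0),
                         full_matrices=False)
coeffs = U[:, :n_comp] * S[:n_comp]   # per-image amplitudes

# regress those amplitudes against the held-out pixels: pure speckle
# is reconstructed well, so a planet would stand out in the residual
target = data[:, interest] - data[:, interest].mean(axis=0)
fit, *_ = np.linalg.lstsq(coeffs, target, rcond=None)
residual = target - coeffs @ fit
```

Because the held-out pixels never enter the eigenbasis, even hundreds of components cannot absorb a signal that lives only there; that is the whole point of the train-and-test framing.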


NASA-Sloan Atlas

Blanton has made the beautiful and useful NASA-Sloan Atlas of galaxies. This is very relevant to my plan for making a human-viewable atlas of the very largest (angularly) galaxies on the sky. Because Blanton was not concerned with precise measurements of the absolutely hugest galaxies, I think we will probably have to use Mykytyn's re-measurements made with the Tractor, but the NASA-Sloan Atlas gives me something pretty good to work with while Mykytyn gets a robust pipeline working. I wrote code today to read, manipulate, and plot the tabular data in Blanton's atlas.



Zolotov (Hebrew) returned to NYU for a day for catching up. She is working on—among other things—the effect of dramatic baryonic evolution (star formation and supernovae) on the dark-matter halos of substructure within galaxies. She finds bigger effects than you might imagine, because although the baryons are a small fraction of each subhalo, they can undergo rapid evolution at very small scales and high densities. I pitched to her the idea of testing in her super-realistic simulations the strong dependence of scale height and scale length on metallicity and alpha-abundance that Bovy, Rix, and others (including me) find in the Milky Way disk (at least in the Solar Neighborhood). She ought to be able to see exactly the same effect in her simulations.

Procrastination on a large number of pressing items has led me to write a long document about probability calculus. I spent a lot of the weekend working on it, with a short diversion to Lang's place, where we talked about decision theory, among many other things. I am not sure it counts as research, but look for it on the arXiv sometime this semester.


uncertainties are parameters too

The day started with a call with Rory Holmes about our nearly-finished paper on self-calibration of imaging surveys. One of the things we discussed was journal choice. Not an easy one. We also agreed to move to git and cloud-based code hosting.

Later in the day, Fouesneau (UW), Weisz (UW), and I worked on issues of completeness, cluster membership posterior probabilities, and radial-profile fitting. On the latter point, Fouesneau pointed out that the errors in the measurements of the radial profiles (which are photometric) are probably under-estimated because they are generated by shot noise not in the number of photons, but in the number of stars, which have a huge dynamic range in brightness. He doesn't trust the uncertainties that are reported to him. Not having the ability to re-do the error analysis, we discussed the various things that the uncertainties could depend on, among the data outputs and model inputs we do have. Once we had that written down, we realized that we could just parameterize the dependence of the uncertainties on the measured and model quantities, and fit for them. We pair-coded that in Fouesneau's sandbox and it worked! So we might be treating uncertainties in the way they deserve. It reminded me of conversations I have had in the past with Brendon Brewer (UCSB).
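The move we made can be sketched generically (fake data; the linear sigma model and all the numbers are invented, not Fouesneau's actual dependence): put parameters in the noise model and maximize the likelihood over them along with everything else. A crude numpy-only version:

```python
import numpy as np

rng = np.random.default_rng(4)

# fake "radial profile" residuals whose true scatter grows with a
# measured quantity mu; the linear form sigma = a + b*mu is invented
n = 2000
mu = rng.uniform(0.0, 10.0, size=n)
a_true, b_true = 0.5, 0.3
y = rng.normal(0.0, a_true + b_true * mu)

def neg_log_like(a, b):
    """Gaussian negative log-likelihood with a parameterized sigma."""
    sigma = a + b * mu
    return np.sum(np.log(sigma) + 0.5 * (y / sigma) ** 2)

# crude grid search: the uncertainty parameters are parameters too
agrid = np.linspace(0.1, 1.0, 46)
bgrid = np.linspace(0.0, 0.6, 61)
nll = np.array([[neg_log_like(a, b) for b in bgrid] for a in agrid])
ia, ib = np.unravel_index(np.argmin(nll), nll.shape)
a_fit, b_fit = agrid[ia], bgrid[ib]
```

The key line is the np.log(sigma) term: it penalizes inflating the error bars to fit anything, which is what lets the data themselves pin down the noise model.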


young clusters

I continued working on various aspects of the young clusters in M31 with Fouesneau (UW), Gordon (STScI), and Weisz (UW). We discussed at length how the mixture model works, with a mixing that is position-dependent. We figured out, conceptually, how the cluster properties in the mixture model are constrained strongly by stars that fit the cluster well, and not by stars that don't. We agreed that we need more pedagogical stuff on mixture models, for the paper, for talks, and for the world. I worked with Fouesneau to debug some code that made it seem like emcee was not working; in the end emcee was not the problem. We had a great Korean lunch on 32nd Street before sending Gordon off at Penn Station, and then spent the walk home re-discussing completeness. We decided, tentatively, to do a naive completeness estimation and just note the limitations. Doing the Right Thing (tm) is beyond our current scope, and probably won't change our results very much (for reasons we can quantitatively argue).
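For the pedagogical pile: here is a toy membership-posterior calculation with position-dependent mixing (every distribution and number below is invented for illustration; this is not the PHAT model):

```python
import numpy as np

def membership_posterior(x, r, mu_c=0.0, sig_c=0.3, sig_f=3.0, r0=1.0):
    """Posterior probability that a star with observable x at radius r
    is a cluster member, given a position-dependent prior membership
    fraction f(r) = exp(-r / r0). All distributions are made up."""
    f = np.exp(-r / r0)
    # narrow cluster component, broad field component (both Gaussian)
    p_cluster = (np.exp(-0.5 * ((x - mu_c) / sig_c) ** 2)
                 / (np.sqrt(2.0 * np.pi) * sig_c))
    p_field = (np.exp(-0.5 * (x / sig_f) ** 2)
               / (np.sqrt(2.0 * np.pi) * sig_f))
    return f * p_cluster / (f * p_cluster + (1.0 - f) * p_field)

# a star that fits the cluster well is a likely member near the center,
# but not far out, where the prior membership fraction is tiny
p_near = membership_posterior(x=0.1, r=0.1)
p_far = membership_posterior(x=0.1, r=5.0)
```

This also shows, in miniature, why the cluster parameters are constrained by stars that fit the cluster well: stars with posterior membership near zero contribute almost nothing to the cluster-component likelihood.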



Lang arrived for the day, and after a re-cap of what we are trying to do with the PHAT data, the issues of completeness came up. We had a lively discussion with Fouesneau (UW), Gordon (STScI), and Weisz (UW) of how we should measure the completeness and how we should use those measurements. We realized that there are so many subtleties, there is a paper that could be written on this alone. It is hard to measure and easy to use wrongly! Coding continued, and in the evening, Lang and I re-discussed our first papers from the Tractor. I promised to email Lang minutes of that chat.


young, PHAT clusters

A chunk of the PHAT team—Dalcanton (UW), Fouesneau (UW), Gordon (STScI), Weisz (UW)—arrived today to talk about stellar SED fitting and propagation of uncertainties therein to quantities and studies of interest. (For those of you who do not read and remember absolutely everything I write here, the PHAT project is a Dalcanton-PI six-band imaging survey over a large fraction of the M31 disk to create a catalog of tens of millions of stars and do a lot of the science you can do with that.) Today we made plans for the week-long sprint, which appear to be to use the outputs of SED fitting for every star on a big parameter grid and inputs from a model of the stellar population in a cluster, plus some crazy integration, to build a marginalized likelihood for the cluster parameters. Now everyone is coding like crazy. I think we surprised ourselves when we decided we would work in IDL and not Python. Crazy, but pragmatic.


informal scientific writing

On the plane home, I answered a detailed question from Jannuzi (NOAO), in the form of a few-page typeset document, started work on an answer to a detailed question from Abate (NOAO), and wrote a few pages about probabilistic reasoning for the next installment in the Data Analysis Recipes series. My conversation yesterday with Jannuzi convinced me that it would be useful to review the basic operations available in measure theory (probability calculus). Most of the things we do here at Camp Hogg in data analysis boil down to simple applications of simple operations in measure theory; it isn't hard, but it is powerful, in the sense that there are a few rules that—once you learn them—make every novel probabilistic inference straightforwardly discoverable. I also want to advertise the dimensional-analysis way of thinking about probability theory, which has been very useful to me (and Rix too, I think).
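The dimensional-analysis idea, in its simplest form (my phrasing here, not a quote from the document-in-progress): a pdf carries units of one over its argument, so every correct probabilistic expression must balance dimensionally.

```latex
% p(x)\,dx is dimensionless, so [p(x)] = 1/[x]; Bayes's theorem balances:
p(\theta \mid x) = \frac{p(x \mid \theta)\,p(\theta)}{p(x)}
\quad\Longrightarrow\quad
\frac{1}{[\theta]} = \frac{(1/[x])\,(1/[\theta])}{1/[x]}
% and marginalization does too, because the integral supplies the [\theta]:
p(x) = \int p(x \mid \theta)\,p(\theta)\,\mathrm{d}\theta
\quad\Longrightarrow\quad
\frac{1}{[x]} = \frac{1}{[x]}\,\frac{1}{[\theta]}\,[\theta]
```

Expressions that fail this check (adding a likelihood to a prior, say, or marginalizing without the measure) are wrong before you compute anything.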


photometric redshifts

I spent the morning in two conversations about photometric redshifts, one with Buell Jannuzi (NOAO), who is using them to measure evolution of clustering of massive galaxies, and one with Alexandra Abate (Arizona), who is using them (and anything else she can find) to estimate the redshift distribution of weak-lensing galaxies in future LSST data. Abate and I figured out that extremely low signal-to-noise spectroscopy could in principle be very decisive, because for objects that have a wide range of redshifts permitted by photometric redshifts, there is usually a strong dependence of inferred redshift on inferred SED (and therefore predicted emission lines). I promised both Jannuzi and Abate that I would write notes on the airplane home.

In the afternoon, Mario Juric (NOAO) and I talked about all things LSST, including how we can go beyond catalogs to data products that contain more probabilistic and noise-model information. As my loyal reader knows, I think that if all LSST produces is a catalog and some images, it will not achieve many of its most valuable goals.


Tucson firehose

After my Colloquium here at Steward Observatory plus NOAO, Buell Jannuzi (NOAO) told me that I had, in a one hour talk, perfectly simulated an AAS session: My talk consisted of five ten-minute talks, each of which was intriguing and depressing in a different way! I took that as a compliment.

I also had nice conversations with Todd Lauer (NOAO) about image processing and data modeling in general. We discussed, among many other things, what the relationship might be between a model of a full set of overlapping images and a co-add of deconvolved images (with suitable priors, I presume). These two things might look very similar, if the deconvolution is light and the variations among the images are not too large. We also discussed when it makes sense to do the Right Thing (tm) when the Simplest Thing (tm) is much easier to understand and use.

At dinner I spoke with many of the graduate students, which was a pleasure. I learned that Ken Wong (Steward, of PRIMUS fame, among other things) has executed a pie-in-the-sky idea of Ann Zabludoff's (Steward): Find lines of sight in SDSS that are highly likely to have high magnification from lensing by superimposed galaxy groups and clusters. This is not trivial because the relationship (at group and cluster scale) between dark matter and galaxies is not trivial.


how many stars have planets?

Foreman-Mackey and I interacted with Eric Ford (Florida) about some of the issues in inferring the total planet population from the tiny, tiny fraction that produce observable transits. Ford sent us two great, long, detailed emails filled with ideas and advice. It is very interesting to me how email communication is such an important part of the scientific process; Ford's emails to me are always so good they should really count as publications on his CV!

We are interested in these problems in part because they are very good examples for using probabilistic graphical models in astrophysics, but also in part because there are some versions of these questions that might be easy to answer right away, with data straight from the literature. It gets hard if we really have to build a model of the Kepler selection function, but we are thinking about problems we can do without doing that (at least at first). We also have the GALEX eclipses I have been blogging about; these also lead to interesting problems, although because we have almost no period information, the inferences we can do are limited.
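One of the might-be-easy-right-away calculations is purely geometric; here is the standard back-of-the-envelope (the detection counts at the end are invented for illustration): a planet on a circular orbit transits with probability of about R_star / a, so a naive occurrence rate divides detections by that tiny number:

```python
# geometric transit probability for a circular orbit, plus a naive
# occurrence-rate correction; detection numbers below are made up
R_sun = 6.957e8   # meters
au = 1.496e11     # meters

def transit_probability(a_au, r_star_rsun=1.0):
    """Probability that a randomly oriented circular orbit transits."""
    return r_star_rsun * R_sun / (a_au * au)

p_hot = transit_probability(0.05)   # hot-Jupiter-like orbit
p_earth = transit_probability(1.0)  # Earth-like orbit, about half a percent

# naive occurrence rate: detections / (stars surveyed * transit probability)
n_detected, n_stars = 10, 100000
rate = n_detected / (n_stars * p_earth)
```

The real problem is hard precisely because this correction is only the first factor; the detection efficiency as a function of period, depth, and noise (the Kepler selection function) multiplies it.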

One interesting thing Ford pointed us to is emerging hints that multiple-planet systems violate the null hypothesis of being built by multiple independent draws from the one-planet systems. Obviously that null hypothesis must be wrong for dynamical reasons, but apparently it is already strongly violated in the data in hand. My intuition says that is something worth checking out.


the electromagnetic fields in cameras

I spent a very enjoyable hour in a coffee shop with Fergus and applied mathematicians Leslie Greengard, Charlie Epstein, and Mike O'Neil. We discussed the possibility that Fergus and I might do better on our coronagraph problems by doing real-live modeling of the scalar or vector wave equations in a physically realizable device. I have an intuition that we will, but I also have a pretty good sense that we can't really model the full electromagnetic field on every surface; that's insane. Some great ideas came up in the discussion. Greengard pointed out that the convolutions (think Green functions) I am doing numerically can be much, much faster with FFT-like techniques (and Greengard should know). Epstein started out by saying "five-meter mirror and micron wavelengths; geometric optics isn't good enough for you?", which is a pretty fair comment, and then followed that by saying that there is a next order to the geometric-optics approximation. It is a well-defined limit, after all, so what we think of as being geometric optics (that beautiful theory) is really just the first term in an expansion. That's cool, although the implication is that the next term in the expansion is ugly ("not in the textbooks", as Greengard said). Epstein also noted that inference of the phase from an intensity image is probably not a good idea. The conversation was great, but unfortunately it didn't convince me to give up on this crazy idea. Tonight I have to polish up my code and send it to Greengard for re-factoring to non-stupidity.
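Greengard's FFT point in miniature (a 1-D periodic toy, nothing like the real optics): the same convolution computed directly at O(N^2) cost and via the convolution theorem at O(N log N):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 256
signal = rng.normal(size=n)
kernel = rng.normal(size=n)

# direct periodic (circular) convolution: n multiply-adds per output
direct = np.array([
    np.sum(signal * np.roll(kernel[::-1], shift + 1)) for shift in range(n)
])

# same answer via the convolution theorem: multiply in Fourier space
fast = np.real(np.fft.ifft(np.fft.fft(signal) * np.fft.fft(kernel)))
```

At the sizes a real optical model needs, the difference between the two costs is the difference between feasible and not.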

One paradox still remains in my mind after it all, and it is this: Heuristically (yes, very heuristically) there are trillions of wavelength-squared cells on the entrance aperture (or primary mirror), but heuristically (yes, yes) there are only hundreds or thousands of speckles on the focal plane that get significant illumination. So doesn't this mean that we don't have to model the entrance aperture at full wavelength resolution to precisely model any real speckle pattern? And isn't that somehow odd?
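The "trillions" is just aperture area over wavelength squared:

```python
import math

# back-of-the-envelope for the paradox: how many wavelength-squared
# cells fit on the entrance aperture?
D = 5.0        # meters, primary mirror diameter
lam = 1.0e-6   # meters, roughly micron wavelengths
n_cells = math.pi / 4.0 * (D / lam) ** 2   # pi (D/2)^2 / lam^2, about 2e13
```

About 2 x 10^13 cells, versus hundreds of bright speckles: that mismatch is the whole puzzle.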


the Higgs, stellar modeling

Neil Weiner gave a great brown-bag (all chalk) talk about the Higgs at lunch today. He started from scratch: He started at the Lagrangian in field theory and ended up saying what the implications are for new physics (especially supersymmetry-like theories) of various different Higgs masses! And all in 50 minutes. It was a masterpiece, and hilarious to boot. Ask him for his joke about string theorists.

Before and after that, Hou showed us the results of fitting a stochastic stellar oscillation model to K-giant radial velocity data. We are hoping to show that including a physical model for surface variations will improve the results of exoplanet fitting. We can show this for fake data; now onto real data.


probabilistic modeling of eclipses

Here is a GALEX eclipse with 64 posterior samples from a probabilistic model superimposed.

The star looks slightly variable out of eclipse because there is a varying background superimposed on the stellar light; this is part of the model too if you are doing proper Poisson likelihoods (as we are). Thanks to Foreman-Mackey for the awesome emcee sampler and Dave Spiegel (IAS) for some great (because it is so simple) transit-modeling code.
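Here is roughly what "proper Poisson likelihoods" means in this context, in toy form (made-up rates, a box-shaped eclipse, and the background held fixed at its true value; the real model is richer and is sampled, not grid-searched):

```python
import numpy as np

rng = np.random.default_rng(7)

# invented numbers throughout: a varying background plus a star that
# loses 40 percent of its flux during a box-shaped eclipse
t = np.linspace(0.0, 10.0, 200)
background = 20.0 + 5.0 * np.sin(0.5 * t)   # the out-of-eclipse "variability"
in_eclipse = (t > 4.0) & (t < 6.0)
star = np.where(in_eclipse, 30.0 * (1.0 - 0.4), 30.0)
counts = rng.poisson(background + star)     # photon counts, not fluxes

def poisson_log_like(depth):
    """Poisson log-likelihood (up to a constant) of the counts for a
    trial eclipse depth, with background and stellar flux held fixed."""
    rate = background + np.where(in_eclipse, 30.0 * (1.0 - depth), 30.0)
    return np.sum(counts * np.log(rate) - rate)

depths = np.linspace(0.0, 0.8, 81)
best = depths[np.argmax([poisson_log_like(d) for d in depths])]
```

Because the background adds into the Poisson rate rather than subtracting from a flux, the apparent out-of-eclipse variability comes out of the model for free.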

Not surprisingly, there is some degeneracy between the inferred impact parameter and the inferred ratio of radii of the star and companion. But it looks like we will be able to say things about the companion radii.


measuring bright galaxies, foundations of math

A few weeks ago I reported on my safari to Philosophy. Today, in the Astro seminar at NYU, Tim Maudlin (NYU Philosophy) went on safari to Physics. He works on the question of why mathematics is useful in describing the physical world, and this has led him to the basis of mathematics, or really the basis of the mathematics that is used in physics. He finds (remarkably to me) that if he builds topology (which is really the continuity structure of space or spacetime) on the properties of one-dimensional fundamental objects rather than open sets or neighborhoods, he gets some aspects of the causal structure of spacetime for free. We think (often, informally) of the causal structure as coming from the metric, but Maudlin finds that it can come in far earlier than that if we replace or transpose (in some sense) the foundations of topology. Crazy stuff, and a very lively seminar. My kind of Friday afternoon. Lunch was pretty hilarious too; Maudlin has at his fingertips many paradoxes that get at controversies about probability and information, related to the anthropic principle and the like.

In the morning, Mykytyn showed me that he can fit the intensity images of big galaxies, even in the presence of bright stars, even when those galaxies span different fields taken on different nights, and subsets of the images come from different photometric bandpasses. We are very close to having a system that can re-measure all the (very, very) bright galaxies in SDSS. That could have big impact.


talk at UMass

I spent the day at UMass Amherst where I chatted with various people and gave a seminar on finding the dark matter with snapshots of tracers in phase space. I talked a lot about Gaia data but at dinner afterwards someone pointed out that I never said what Gaia actually is! After my recent trips and interactions I am starting to forget that here in the US no one knows about this great mission.