optimized photometry, etc.

I finished the zeroth draft of my philosophical paper on Dawkins. I sent it to a few friendlies for comments, and some of them responded fast, in ways that make me think I need to do massive revision! So it goes!

I had a long chat with Dan Foreman-Mackey about current projects and ideas. We discussed the sensibility (or not) of publishing the results (with Dun Wang and Bernhard Schölkopf) on using ICA to find, or perform photometry on, variable stars in crowded fields. I want to publish this, because it is so simple (and maybe contains some insight), but we can't see how it generalizes, or why it wouldn't be demolished by even a simple forward model. We discussed a generalization of The Cannon that simultaneously fits a physical model and a data-driven model for just the residuals away from that physical model. If we could get away with a linear model for the residuals, we could explicitly marginalize out a bunch of stuff. Foreman-Mackey reported on work by Benjamin Pope (Oxford) on photometry that connects to the OWL, my unpublished (and, really, dormant-to-dead) project on optimal photometry in bad imaging. I resolved to drop Pope a line.


Stream Team projects

In the morning, I worked on my philosophical writing, and in the afternoon, I met up with the Stream Team (the joint groups of Kathryn Johnston, Marla Geha, and me) at Columbia. Some of the discussion was about what we can learn from streams: Ana Bonaca is working on this by considering new kinds of data we could (in principle) have and comparing the resulting inferences with those we can make now, and Adrian Price-Whelan by fitting increasingly complex (or free) models for the Milky Way force field. Some of the discussion was about new data: Adrian Meyers (Yale) told us about a putative new stellar stream (he calls it a “feature”) he may have found by matched-filtering imaging from DECam, and David Hendel (Columbia) showed us that the very precise RR Lyrae distances they are getting are not leading to a simple story about the Orphan Stream. Chervin Laporte (Columbia) discussed the possibility that thin disks could be used as antennae that respond sensitively to interactions and accretions; we discussed how we might look for this in the Milky Way with APOGEE and Gaia.




I worked on writing in my research time today, in one of my more philosophical pieces. Writing is a slow process, at least for me!


reading and writing

On the plane home I wrote an abstract for a possible paper with Jan Rybizki about the diversity of possible nucleosynthetic pathways for stars. It seems like there might be a paper to write! I also looked up (inspired by conversations with statistician Sarah Michalak of LANL) writing I did many years ago on how pundit Richard Dawkins mis-characterizes science and the work of scientists. Maybe I should brush that into shape?


#cosgal16, day 2

I kicked off day 2 of the Cosmic Dawn meeting with a discussion of how Bayesian inference and marginalization of nuisances can be used to improve astronomical inferences with spectroscopy. It turned out I was preaching to the converted, because the next five or six talks all fully endorsed Bayesian approaches. This is a big change in our community. Along those lines, Brinchmann emphasized that Bayesian approaches are a good idea because they implicitly encourage investigators to make their decisions and assumptions explicit. I agree, although there is really nothing about frequentism that prevents this! Stenning showed a very nice, very simple, very good hierarchical inference of star-formation relationships among noisily observed galaxies; that warmed my heart.

Leja showed (reinforcing much vaguer content from my talk) that different models give different answers about galaxies even given the same data. He also showed that in general, inferred stellar masses and star-formation histories are inconsistent, which is potentially bad. He showed nice results that they can use photometry (plus serious, complex modeling) to predict spectroscopic indices (like line strengths). This is a great way to validate the models, because it makes comparisons in the space of observables, not latents. I talked to him and Ben Johnson afterwards about generalizing this idea for experimental design.

There was a lot of discussion (from Heckman, Erb, Henry) of the problem of understanding the Lyman-alpha luminosity density of the Universe, with all of them to some extent wondering whether we could predict Lyman-alpha emission from other (easier to observe) emission lines. That seems like an interesting project for data-driven approaches.


#cosgal16, day 1

Today was the first day of the Cosmic Dawn of Galaxy Formation meeting in Paris, organized by Charlot and a cast of many. The meeting is not in any area I work these days, so I learned a huge amount. In general I find it very rewarding to go to meetings outside the areas in which I work. So much happened I can't possibly mention it all, but here are a few highlights:

Oesch showed redshift 10 and 11 galaxy population estimates. Along the way, he showed full 2-d modeling of the 3DHST grism spectroscopy images. I am impressed by how good this looks, and this was shown again by other speakers. McLure pointed out that much of the disagreement about luminosity functions at redshift 10-ish is really disagreement about how the data should be presented; there isn't nearly as much disagreement at the data level. This is a pet peeve of mine: Often debates happen at punchline level, or in the latent space, when the only points to argue about live in the space of observables.

Treu and Smit both spoke about using lenses to find very high-redshift sources. In answer to a question from me about reliability of galaxy population statistics given observations of highly magnified sources, Treu emphasized that once you have highly magnified sources, the lens model is very strongly constrained. That said, he emphasized that the volume as a function of magnification is not well constrained at high magnifications. Smit showed that at finite angular resolution, naive interpretations about galaxy emission processes depend strongly on resolution. She could make this point securely because (with lensing) she has multiple views of the same galaxy, at different angular magnifications.

Ocvirk and Wise and Beckmann all spoke about insanely large computational programs to simulate high-redshift galaxies. The first two each had tens of millions of CPU+GPU hours on national facilities! Ocvirk talked about tracking the radiation along with the matter, which is a hard problem (and, indeed, even with 7 million hours per run they are only doing the fluid approximation). Beckmann talked about tracking the accretion onto black holes, by adaptively refining the hell out of the simulations around seed black holes. She finds (numerically) that the BHs are thrown from the halos at early times and has to add drag to keep them in. Apparently there is a history of doing this in simulations, but it sounds a tiny bit suspicious to this (untrained) outsider.

In side conversation, Chuck Steidel (Caltech) asked me a good question: Do any of the APOGEE stars have such high alpha-to-iron that they could be understood with pure core-collapse nucleosynthetic origins? That's worth checking out.


betterizing radial velocities

Megan Bedell showed me results this week that the velocity measured for a star by HARPS appears to be correlated with the pipeline-determined wavelength-solution parameters. That's bad! But also good! Because it may give us a way to improve the end-to-end calibration of radial velocities. We started down the path of looking at train-and-test regression, in which we fit the relationship on all but a held-out exposure and then apply it to the held-out exposure in a (gasp!) data-correction step. Or something along those lines to avoid over-fitting. Of course the right thing to do might be to fit everything at once, including the radial-velocity curve generated by exoplanets, but I don't think the exoplaneteers would like that so much.
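The train-and-test scheme is simple enough to sketch. Here is a minimal version, with entirely made-up data standing in for the HARPS velocities and wavelength-solution parameters (the true setup has far more exposures and parameters, of course):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: per-exposure RVs and pipeline wavelength-solution parameters.
n_exp, n_par = 50, 3
wl_params = rng.normal(size=(n_exp, n_par))
rv = wl_params @ np.array([5.0, -2.0, 1.0]) + rng.normal(scale=0.5, size=n_exp)

rv_corrected = np.empty(n_exp)
for i in range(n_exp):
    train = np.arange(n_exp) != i                    # hold out exposure i
    A = np.column_stack([np.ones(train.sum()), wl_params[train]])
    coef, *_ = np.linalg.lstsq(A, rv[train], rcond=None)
    pred = np.concatenate([[1.0], wl_params[i]]) @ coef
    rv_corrected[i] = rv[i] - pred                   # the (gasp!) data-correction step
```

Because each exposure is corrected with a relationship fit only to the other exposures, the correction can't trivially over-fit the exposure it is applied to.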


chemical abundances in APOGEE and GALAH

In a full day at MPIA in Heidelberg, I spoke at length with Jan Rybizki (MPIA) and Sven Buder (MPIA), along with Hans-Walter Rix and Melissa Ness. Jan's project is to work on nucleosynthetic models; he has upgraded his code to generate what I call “shot noise”—the noise arising from the fact that in small star-formation regions, the number of supernovae can be small enough such that chemical enrichment does not approach the expected mean. We discussed first papers, observing that many of our plans could be executed with just the Sun and Arcturus and a few other standards; we need accuracy more than we need numbers for many first projects. My first project with Rybizki's code is to look at chemical-abundance diversity. I'd like to pose a well-posed question that makes use of the power of APOGEE.
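The "shot noise" idea is easy to illustrate in a toy version (all numbers here are invented, and a single fixed yield per supernova is much simpler than what Rybizki's code actually does):

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented numbers: expected supernovae per star-forming region,
# and a fixed iron yield per supernova (solar masses).
expected_sne = 5.0
fe_yield_per_sn = 0.07

n_regions = 10000
n_sne = rng.poisson(expected_sne, size=n_regions)   # small-N "shot noise"
fe_mass = n_sne * fe_yield_per_sn

# The fractional scatter about the mean enrichment goes like 1/sqrt(N),
# so small regions do not approach the expected mean.
print(fe_mass.mean(), fe_mass.std() / fe_mass.mean())
```

With only five expected supernovae per region, the region-to-region enrichment scatter is tens of percent, which is the point.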

Buder's project is to produce chemical-abundance labels for GALAH spectra using The Cannon. The problem, as per usual, is to obtain a good training set. We discussed the SME code that he is using and how we could wrap it or repurpose it. He showed us a beautiful set of results that show that when a star in GALAH gets low likelihood under the trained model of The Cannon, it is almost always interesting: A binary, a fast rotator, or maybe even a white dwarf.


#ISBA2016, day three

Today I saw some good talks on approximate Bayesian computation (ABC) or likelihood-free inference (as it were). One highlight was Ewan Cameron's talk on epidemiology. He gave me two ideas that I could translate immediately into my own work. The first is a definition of “Indirect Inference”, which (if I understand correctly) is exactly what is being done in cosmology that I have been railing against in my Inference-of-Variances project. The second is the (very simple: meaning great!) idea that one can trade off long-range-ness (as it were) of a Gaussian Process by complexifying the mean function: Put in a non-trivial, flexible mean function and the variance has less to do. That could be valuable in many of our contexts. One has to be careful not to use the data twice; I believe Cameron handled this by splitting the data into two parts, one of which constrained the mean function, and one of which constrained the GP.
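Here is roughly what I took away, as a toy (my own made-up data, mean function, and kernel, not Cameron's setup): fit a flexible mean on one half of the data, then let a GP, conditioned only on the other half, soak up the residuals.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 10, 80))
y = 0.5 * x + np.sin(x) + rng.normal(scale=0.1, size=80)

# Split the data in two: one half constrains the mean function, the other the GP.
a, b = x[::2], x[1::2]
ya, yb = y[::2], y[1::2]

# Flexible mean: a cubic polynomial, fit on half A only.
mean_coef = np.polyfit(a, ya, 3)
resid_b = yb - np.polyval(mean_coef, b)

# Squared-exponential GP on the residuals, conditioned on half B only.
def kernel(u, v, amp=1.0, ell=1.0):
    return amp**2 * np.exp(-0.5 * (u[:, None] - v[None, :])**2 / ell**2)

K = kernel(b, b) + 0.1**2 * np.eye(b.size)
x_star = np.linspace(0, 10, 200)
pred = np.polyval(mean_coef, x_star) + kernel(x_star, b) @ np.linalg.solve(K, resid_b)
```

Because the mean function does the long-range work, the GP variance has less to do; and because the two pieces see disjoint halves of the data, no data point is used twice.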

Other highlights included Wentao Li showing that he could adjust ABC to give precise results in finite time when the data get large (unadulterated ABC gets impossible as the data get large, because the distance metric thresholds generally have to get smaller and acceptance ratios go to zero). Edward Meeds (in a move similar to things mentioned to me by Brendon Brewer) separated the parameters of the problem from the random-number draws (simulations usually have random numbers in their initial conditions, etc); conditioned on the random-number draw, the code becomes deterministic, and you can auto-differentiate. Then: optimization, Hamiltonian, whatever! That's a good idea.
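Meeds's trick, as I understood it, in a toy numpy sketch (the simulator and summary statistics are invented, and finite differences stand in for the auto-differentiation):

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate(theta, u):
    # Toy simulator; u plays the role of the random-number draws.
    # Conditioned on a fixed u, this is a deterministic function of theta.
    return theta[0] + theta[1] * u

u = rng.normal(size=2000)                       # draw the randomness ONCE, then freeze it
data = simulate(np.array([1.0, 2.0]), rng.normal(size=2000))

def loss(theta):
    # distance between summary statistics of simulation and data
    sim = simulate(theta, u)
    return (sim.mean() - data.mean())**2 + (sim.std() - data.std())**2

def grad(theta, eps=1e-5):
    # finite differences, standing in for auto-differentiation of the
    # now-deterministic code
    g = np.zeros_like(theta)
    for i in range(theta.size):
        d = np.zeros_like(theta); d[i] = eps
        g[i] = (loss(theta + d) - loss(theta - d)) / (2 * eps)
    return g

theta = np.array([0.0, 0.5])
for _ in range(500):
    theta -= 0.2 * grad(theta)                  # then: optimization, Hamiltonian, whatever!
```

The key move is that freezing u makes the loss a smooth deterministic function of theta, so gradients are well defined and optimization is routine.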


#ISBA2016, day two

Today was day two of ISBA, the big international Bayesian conference. In my session—which was on statistics in astronomy—I spoke about exoplanet search and population inferences (work with Foreman-Mackey, Wang, and Schölkopf), Xiao-Li Meng (Harvard) spoke about instrument calibration, and David van Dyk (ICL) spoke about Bayesian alternatives to p-values for physics and astronomy. Meng had very valuable things to say about taking an inference problem from the linear domain (where calibration multiplies linear flux) to the log domain (where log calibration adds to log flux). I learned a lot that is of relevance to things like self-calibration, where I think maybe we have been going to the log domain (very slightly) incorrectly! There is a half-sigma correction floating around!

van Dyk made the center of his presentation the five-sigma discovery of the Higgs Boson; he pointed out that five-sigma is very conservative in principle, but the fact that it has been applied to as many hypotheses as there are resolvably different Higgs-mass options makes it less conservative. This isn't trivial to deal with if the only question is whether or not there is a Higgs. He solved the problem in some Bayesian contexts and compared to frequentist multiple-hypothesis solutions. Interestingly, he isn't against using p-values in the discovery context (especially when billions of dollars of public money are at stake); 5-sigma p-values are conservative, and (more importantly) perceived to be conservative!


travel to #ISBA2016

Today was a travel day, from New York to Sardinia for #ISBA2016. I spent my research time on the trip planning how to present our exoplanet research to Bayesian statisticians.


likelihood functions for imaging

Mario Juric (UW) showed up for the day and we spoke for hours about many things. One category was image likelihood functions (for things like The Tractor or weak lensing). We came up with a very dumb (read: good!) idea for testing out ideas around likelihood functions: Take two image data sets from different telescopes that overlap on the sky. Build a catalog of sources (with positions and colors and so on) from a joint analysis of both data sets. Then do the same, but in a world in which your only interface to each data set is a callable API to a likelihood function! That is, something that takes as input a parameterized high-resolution image model and returns a likelihood value, given what it knows about its data and calibration, PSF, and so on. This would force us to figure out what would be needed in such an API. I think we would learn a lot, and it would help us think about how to construct next-next-generation data products. We also talked about image differencing, Dun Wang's Causal Pixel Model, and other matters of mutual interest.
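To make the API idea concrete, here is one way the interface might look (a sketch with my own invented class name and a trivially simple noise model; the real thing would need astrometric calibration, masks, and much more behind the curtain):

```python
import numpy as np
from scipy.signal import fftconvolve

class LikelihoodAPI:
    """Sketch of a per-dataset likelihood service: the caller never touches
    the pixels, only a callable from a high-resolution scene model to a
    log-likelihood.  PSF, calibration, and noise model stay inside."""

    def __init__(self, data, psf, gain, noise_sigma):
        self.data, self.psf = data, psf
        self.gain, self.noise_sigma = gain, noise_sigma

    def __call__(self, scene):
        # forward-model the caller's scene through this telescope's PSF and
        # photometric calibration, then evaluate a Gaussian pixel-noise model
        model = self.gain * fftconvolve(scene, self.psf, mode="same")
        return -0.5 * np.sum((self.data - model) ** 2) / self.noise_sigma ** 2

# Joint analysis of two overlapping datasets is then just a sum of calls:
#   joint_lnlike = lambda scene: api_telescope_a(scene) + api_telescope_b(scene)
```

The exercise would be to see what else the interface needs (WCS, PSF uncertainty, bandpasses) before the joint catalog matches the one built with full pixel access.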


code seminar

The research highlight for the day was a seminar led by Jeremy Magland (SCDA), about MountainView, his deployed code that sorts and presents data on neural recordings (cuts the data into spikes and clusters the spikes into neurons and presents the results visually). The great thing about the seminar is that it wasn't really about neuroscience, it was about code: How to structure and build and maintain a project of this scale, which works on both local and remote data. The audience was asked to weigh in on design issues and react to design choices. It was a very productive discussion and is a model for events we should do next year jointly between the SCDA and the new SCCA; both institutions will be building and supporting non-trivial software projects.


Gaussian processes, black-hole dark matter

At group meeting, Dan Cervone (NYU), who works on spatial statistics for sports and climate, went over the basics of Gaussian Processes for interpolation. This is all related to Boris Leistedt's project to determine galaxy photometric redshifts with a flexible spectral energy distribution model.

I had lunch with Kat Deck (Caltech) and Yacine Ali-Haimoud (JHU) and Kyle Cranmer (NYU). We talked about various things, but especially whether the dark matter could be massive (LIGO-detected!) black holes. One idea that would be easy to look at, that Scott Tremaine (IAS) mentioned to me a few years ago, is whether the dark matter granularity could be limited by looking at the dispersal of cold tidal streams of stars.

Late in the day, I worked on my few-photon image-reconstruction toy problem, tuning the stochastic gradient and going to smaller numbers of photons per image. It works well and I am excited about the implications for diffraction microscopy.


stochastic gradient FTW!

So I have been working on this toy problem: Imagine you have a scene, and each "image" you have of the scene contains only a few photons. Each image, furthermore, is taken at a different, unknown angle. That is, you know neither the scene nor any image's angle, and each image contains only a few photons. This is a toy problem because (a) it is technically trivial, and (b) to my knowledge, no-one has this problem to solve! (Correct me if I am wrong!) I am using this problem as a technology development platform for the diffraction microscopy projects I want to do.

One piece of infrastructure I built today is a stochastic-gradient optimizer, that reads in just one image at a time (just a few photons at a time) and takes a gradient step based only on those new photons. This is standard practice these days in machine learning, but I was skeptical. Well it rocked. The image below (believe it or not) is my reconstruction of my scene. I also show four of the “images” that were used as data. The images are just scatterings of photons, of course. (As my loyal reader knows, the objective of the optimizer is a marginalized likelihood.)
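Stripped of the angles and the marginalization, the structure of such an optimizer is just this (a one-dimensional stand-in with a trivial quadratic objective, nothing like the real photon likelihood):

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy stand-in: recover a "scene" (here just a mean vector) from a stream of
# noisy per-image draws, taking one gradient step per image.
scene_true = np.array([3.0, 1.0, 2.0])

def stream_images(n):
    for _ in range(n):
        yield scene_true + rng.normal(scale=0.5, size=3)

scene = np.zeros(3)
step = 0.05
for img in stream_images(2000):
    grad = scene - img          # gradient of 0.5 * ||scene - img||^2
    scene -= step * grad        # one stochastic-gradient step per image
```

Each step uses only the newly read image, so the data never have to fit in memory at once, which is exactly why the trick matters for many-exposure problems.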

I also had a great lunch with Kathryn Johnston, in which we discussed our past and future projects in Milky Way astrophysics and why we are doing them.


AAAC meeting, hoggsumexp

In big marginalized likelihood calculations, there are usually some "logsumexp" calculations floating around. Today I wrote a custom logsumexp that returns the answer (in a numerically stable way) and also derivatives with respect to parameters, for use in my diffraction microscopy (and later galaxy-pose) projects.
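For the record, here is one way such a function might look (my sketch, not necessarily the code I wrote; `q` holds the arguments of the exponentials and `dq_dp` their derivatives with respect to the parameters):

```python
import numpy as np

def hoggsumexp(q, dq_dp):
    """Numerically stable logsumexp and its parameter derivatives.

    q     : shape (K,) arguments of the exponentials
    dq_dp : shape (K, P) derivatives dq_k / dp_j
    returns logsumexp(q) and dL/dp (shape (P,))
    """
    qmax = np.max(q)                       # subtract the max for stability
    w = np.exp(q - qmax)
    L = qmax + np.log(np.sum(w))
    # chain rule: dL/dp = sum_k softmax(q)_k * dq_k/dp
    dL_dp = (w / np.sum(w)) @ dq_dp
    return L, dL_dp
```

The derivative is just the softmax of `q` pushed through the chain rule, so it comes nearly for free once the stable forward pass is written.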

I spent most of my research day on the Astronomy and Astrophysics Advisory Committee meeting (that advises NSF, NASA, and DOE on areas of mutual overlap). The most interesting thing I learned on the call, from a research perspective, is that NASA now has very long-duration balloon technology that can fly around the globe many times, with no natural time limit! The balloons can carry 5000 lbs. This may create a lot of new low-cost opportunities for astronomy outside the visible.


toy problem

Friday got messed up, and only a tiny bit of research got done, but I made up for it on the weekend, building a toy version of the hard problem I presented to the mathematicians on Thursday. The toy problem is to reconstruct an image with few-photon examples, each taken at a different rotation angle. The form of the likelihood is similar, but the computational cost is much lower. It should be an intuition-building example.


talking stats at the mathematicians

I spent a very large part of the day today at the whiteboard in front of Charlie Epstein (Penn), Leslie Greengard, Jeremy Magland (SCDA), and Marina Spivak (SCDA). I presented my proposed solution to the problem of diffraction imaging of molecules in the limit of very few photons per exposure. We had a brief discussion of the physics, a very long discussion of the idea of solving this problem by optimizing a marginalized likelihood, and a brief discussion of its derivatives with respect to parameters in a representation. It was an incredibly useful session: The crowd found some mistakes on my part, and it forced me to clearly articulate how I think probabilistic inference works in these cases.

I think my proposed solution is palatable to both Bayesians and frequentists: In principle the frequentists should object to my marginalization over angles, but this is close to unassailable, because when the angles are generated by an isotropic process, they really do have a well-defined distribution that does not have to be described as a prior. That is, this integral is frequentist-safe. In principle the Bayesians should object to the fact that I am going to optimize the (marginalized) likelihood rather than fully sample the posterior, but even hard-core Bayesians recognize that sometimes you can't afford to do more than get the peak and its width!
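For concreteness (my notation, not anything on the board): with theta the scene parameters, x_n the photons of exposure n, and R the unknown orientation, the frequentist-safe marginalized likelihood is

```latex
L(\theta) = \prod_n p(x_n \mid \theta)
          = \prod_n \int p(x_n \mid \theta, R)\, p(R)\, \mathrm{d}R ,
```

where p(R) is the isotropic (uniform) distribution over orientations, a real sampling distribution rather than a prior.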

Amusing notes from the chat: My definition of “principled” is very different from Leslie Greengard's! And the crowd was not as confident as I am that I can solve a problem where the intermediate steps to the answer require as much disk storage space as exists in all of facebook (note the reference to “10^11 to 10^17 numbers” on the board).


Dr Ana Bonaca!

Today I had the great pleasure of sitting in on the PhD defense of Ana Bonaca at Yale. She defended her thesis work on measuring the mass and gravitational acceleration field of the Milky Way using cold tidal streams. Her thesis is comprehensive: She develops a new method for performing the inference (a good likelihood function!); she compares the results obtained for (artificial) streams made in simple potentials and in realistic, time-dependent potentials; she finds (in real data) a new, cold stream (incorrectly named the Triangulum Stream) that is of great value for this work; and she performs the first-ever measurement of the Milky Way using two streams simultaneously (GD-1 and Palomar 5). A great set of projects and a great seminar and discussion.

Bonaca, Marla Geha (Bonaca's advisor), and I spent lunch afterwards fighting with cosmologists Nikhil Padmanabhan and Frank van den Bosch about why this work is important, with the cosmologists taking the (exaggerated, I think) position that the Milky Way doesn't matter to cosmology! That was fun, too!