purity and completeness

Today Kate Storey-Fisher (NYU) and I discussed how to estimate the stellar contamination of her Gaia and WISE quasar catalog. Because there are few large, complete samples of anything, it’s hard to do this by comparison with any kind of Ground Truth™. What we realized on the call is that it’s easier to estimate how the contamination *changed* as we went from the Gaia quasar candidate table to our final sample. We discussed how to use what external data we have to estimate this.


CMB component separation with linear fitting

Today I sat down with Fiona McCarthy (Flatiron) to look at data-driven methods for separating cosmic microwave background data into different components. We implemented a simple polynomial regression to fit foregrounds, using (observed) difference maps as inputs (features) that are designed to contain foregrounds only. We obtained some preliminary results that looked exciting but we’ve only just started. Part of the motivation is that CNNs are hard to train, but linear combinations of image monomials are easy! I realized in all this that there are connections to the group-equivariant stuff I’ve done with Villar’s group, because we use invariants, and also to the causal inference things that Schölkopf’s group does, because we’re trying to impose some causal structure on our functions.
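
Here is a minimal sketch of what "linear combinations of image monomials" could look like in code. It is not the code we wrote; the function names, the array shapes, and the choice of degree 2 are all assumptions made for illustration. It takes difference maps (assumed to contain foregrounds only), builds per-pixel monomials from them, and fits those to a target map by ordinary least squares.

```python
from itertools import combinations_with_replacement

import numpy as np


def monomial_features(diff_maps, degree=2):
    """Per-pixel monomials of the difference maps, up to `degree`.

    diff_maps: shape (n_maps, n_pix), assumed to contain foregrounds only.
    Returns shape (n_pix, n_features); the first column is a constant.
    """
    diff_maps = np.asarray(diff_maps, dtype=float)
    n_maps, n_pix = diff_maps.shape
    columns = [np.ones(n_pix)]
    for d in range(1, degree + 1):
        # every product of d difference maps, with repeats allowed
        for idx in combinations_with_replacement(range(n_maps), d):
            columns.append(np.prod(diff_maps[list(idx)], axis=0))
    return np.stack(columns, axis=1)


def fit_foregrounds(target_map, diff_maps, degree=2):
    """Least-squares fit of the monomials to one observed map.

    Returns (coefficients, fitted foreground, residual map).
    """
    X = monomial_features(diff_maps, degree)
    coeffs, *_ = np.linalg.lstsq(X, target_map, rcond=None)
    foreground = X @ coeffs
    return coeffs, foreground, target_map - foreground
```

The residual map is then the candidate foreground-cleaned map. Note that the constant column absorbs any monopole in the target, which may or may not be what you want.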



Imagine that you want to read all the text in an arbitrary image of the world. That text will lie at different locations, rotations, shears, and even reflections (think signage painted on windows! or mirrors!) in an image. As training for a baby problem in this area I made this training set today.
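
For concreteness, here is a hedged sketch of how one might draw the random transformations for such a training set. This is not how the set above was made; the function name, the parameter ranges, and the decomposition into rotation, shear, and reflection are assumptions for illustration.

```python
import numpy as np


def random_affine(rng, max_shear=0.5, max_shift=10.0):
    """Draw a random 2-d affine map: x -> linear @ x + shift.

    The linear part is rotation @ shear @ reflection, so its determinant
    is +1 (no reflection) or -1 (reflected); there is no rescaling.
    """
    theta = rng.uniform(0.0, 2.0 * np.pi)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta), np.cos(theta)]])
    shear = np.array([[1.0, rng.uniform(-max_shear, max_shear)],
                      [0.0, 1.0]])
    # flip the x axis half of the time (signage painted on windows!)
    reflect = np.diag([rng.choice([-1.0, 1.0]), 1.0])
    linear = rot @ shear @ reflect
    shift = rng.uniform(-max_shift, max_shift, size=2)
    return linear, shift
```

Applying `linear` and `shift` to the pixel coordinates of a rendered piece of text gives one training example; the sign of the determinant of `linear` records whether that example is mirrored.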


catalogs rant

Should I write this paper?

Abstract: Observational astronomy projects often produce catalogs (of stars, galaxies, quasars, planet hosts, and so on) for use in other projects. How can we use these catalogs responsibly? The answer to this turns out to be complex; it depends sensitively on how the catalogs were made. In particular, if the catalog entries were obtained by operations on a set of (nearly) independent or separable likelihood functions, the catalog can be used in a much wider set of circumstances than if the catalog entries were obtained by operations on a posterior pdf or on likelihood functions involving important shared parameters or shared data or shared prior information. This is true no matter whether the subsequent analyses of the catalog are Bayesian or frequentist. Importantly, at the present day, many important catalogs are being made from the outputs of MCMC runs or discriminative machine-learning methods (classifications or regressions). These catalogs are very hard or even impossible to use for population studies. I demonstrate these points mathematically, and also with toy examples from cosmology, stars, and exoplanets. I recommend that catalogs be designed and made with the feasibility of particular end-user investigations as explicit requirements.
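
To make the likelihood-versus-posterior point concrete, here is a small toy of my own invention (it is not one of the toy examples the paper would contain, and every name and number in it is an assumption). Each object has a true value drawn from a Gaussian population, observed with Gaussian noise. One catalog reports maximum-likelihood values; the other reports posterior means under a zero-mean Gaussian prior chosen by the catalog maker.

```python
import numpy as np


def make_catalogs(n=20000, mu_true=2.0, tau=1.0, sigma=1.0,
                  prior_sigma=1.0, seed=0):
    """Return (likelihood-based catalog, posterior-based catalog).

    True values x_i ~ N(mu_true, tau^2); data y_i ~ N(x_i, sigma^2).
    The posterior catalog assumes a catalog-maker prior x_i ~ N(0, prior_sigma^2).
    """
    rng = np.random.default_rng(seed)
    x_true = rng.normal(mu_true, tau, size=n)
    y = x_true + rng.normal(0.0, sigma, size=n)
    # maximum-likelihood estimate of each x_i is just the datum y_i
    likelihood_catalog = y
    # Gaussian prior times Gaussian likelihood: posterior mean shrinks y toward 0
    shrink = prior_sigma**2 / (prior_sigma**2 + sigma**2)
    posterior_catalog = shrink * y
    return likelihood_catalog, posterior_catalog
```

With these defaults the mean of the likelihood-based catalog lands near the true population mean of 2, while the mean of the posterior-based catalog lands near 1, because every entry has been pulled halfway toward the catalog maker's prior. An end user who doesn't know that prior can't undo the shrinkage.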


abundance moments are the new actions

I had fun today talking to Neige Frankel (CITA) about all things Snail-y. We discussed how to verify the stellar parameters we are using for our Snail studies. One issue is that we want to check that things are (fairly) well mixed along orbits, but we need a theory of the orbits to check this. I recommended that, instead of computed actions (integrals of motion), we use statistics of the abundance distribution. After all, if the stars are well mixed, moments of the abundance distribution ought to be constant on orbits. If you just need the actions to label the orbits, abundance moments serve as replacements. Actions are theoretical and unobservable. Abundance moments are observable and measurable (noisily maybe!).
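
A hedged sketch of the measurement this implies: bin the stars in vertical phase space and compute the first two abundance moments in each bin. The function name, the choice of (z, v_z) as the plane, and the binning are assumptions for illustration, not what we actually do in the Snail project.

```python
import numpy as np


def abundance_moments(z, vz, abundance, bins=32, range_=None):
    """Mean and variance of an abundance in bins of vertical phase space.

    z, vz, abundance: 1-d arrays, one entry per star.
    Returns (mean, variance, counts), each of shape (n_z_bins, n_vz_bins);
    empty bins come back as NaN in mean and variance.
    """
    abundance = np.asarray(abundance, dtype=float)
    counts, zedges, vedges = np.histogram2d(z, vz, bins=bins, range=range_)
    # weighted histograms give the per-bin sums of abundance and abundance^2
    s1, _, _ = np.histogram2d(z, vz, bins=[zedges, vedges], weights=abundance)
    s2, _, _ = np.histogram2d(z, vz, bins=[zedges, vedges], weights=abundance**2)
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = s1 / counts
        variance = s2 / counts - mean**2
    return mean, variance, counts
```

If the population is well mixed, contours of constant mean abundance in this plane should trace the orbits, with no dynamical model required.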


reproducibility; reionization

Today featured a blackboard talk by Sultan Hassan (NYU), about a semi-analytic model to explain the various bits of data we have about the reionization of the universe at redshifts around 7. The model is baroque, but there are no options when it comes to problems that are deep in gastrophysics.

After lunch I spent an hour on a panel organized by NYU Libraries about reproducibility in the natural sciences. That was fun; so many ideas! One interesting idea is that it is transparency, more than reproducibility, that is important. Another was a technical suggestion: If you want your students to be good at making reproducible code, they shouldn't bring you plots of their results, they should bring you code that you can run to make those plots! Haha, genius.


nulling the CMB

I had a fun conversation today with Fiona McCarthy (Flatiron) and Colin Hill (Columbia) about combining CMB maps that have been contaminated (by God) with foregrounds. The issue is that any machine-learning method for finding combination weights will deliver weights that are covariant with the true CMB signal, and thus bias the results importantly. We figured out that there are linear combinations of the maps that will have (by design) zero CMB signal in them! If we train our machine-learning method using those, it can't be sensitive to the CMB signal itself? Will it work? We'll see.
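
Here is a sketch of how the zero-CMB combinations could be built. It assumes the frequency maps are in units in which the CMB has the same amplitude in every channel (a response vector of ones) unless a different response is supplied; the function names and array shapes are hypothetical.

```python
import numpy as np


def cmb_null_basis(response):
    """Orthonormal weight vectors w satisfying w @ response == 0."""
    a = np.asarray(response, dtype=float)
    a = a / np.linalg.norm(a)
    # Full SVD of the 1 x n matrix: the first right singular vector is
    # parallel to the response; the other n - 1 span its null space.
    _, _, vt = np.linalg.svd(a[None, :], full_matrices=True)
    return vt[1:]


def null_maps(freq_maps, response=None):
    """Linear combinations of the frequency maps that contain no CMB.

    freq_maps: shape (n_freq, n_pix). Returns shape (n_freq - 1, n_pix).
    """
    freq_maps = np.asarray(freq_maps, dtype=float)
    if response is None:
        response = np.ones(freq_maps.shape[0])
    return cmb_null_basis(response) @ freq_maps
```

With n frequency maps there are n - 1 independent null maps. They contain foregrounds and noise but, by construction, nothing that scales like the CMB.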


guarantees on diffusion

Today at JHU I went to the joint group meeting of Jeremias Sulam (JHU) and Soledad Villar (JHU) in which Jacopo Teneggi (JHU) showed guarantees on correctness for some diffusion-based de-noising schemes for images. The guarantees are a bit weak, because it is hard to put a hard boundary on coverage in image space! But they are essential for medical applications (the illustrated domain). I have to say, it was nice to see a rigorous approach to errors from machine-learning methods. I think this is necessary for our uses in cosmology, where we are expecting machine-learning emulators to be as accurate as simulations!?!


nerve-wracking talk

I spent a ride down to Baltimore preparing a talk for mathematicians. That's outside my comfort zone. I gave the talk at the MINDS institute at Johns Hopkins at lunchtime. It was about passive symmetries, active symmetries, classical physics, and machine learning. There was no math. They only asked me, in the end, a few questions I couldn't answer. I hypothesized that the difference between passive and active symmetries is that the latter are statements about interventions.


thermodynamics of cosmic gas

The day ended today at Flatiron with a great Colloquium by Eiichiro Komatsu (MPA) about the temperature of cosmic gas. Gravitational collapse heats the gas, and that takes it up to something like 2 million degrees. This was computed ages ago by Peebles and others, but is now measured. There was a lot of discussion during and after about other heating mechanisms, and what things constitute gravitational heating. I'm interested in whether this result meaningfully constrains scattering interactions between the dark matter and baryons; if they scatter, and the dark matter is heavier, the baryons will (eventually) get exceedingly hot.