data analysis consulting, mixture models

Julianne Dalcanton (UW) gave a great Galaxy Coffee talk about mapping the dust in M31 with a very clever mixture-model approach: fitting the extinctions towards red-giant stars in the PHAT data. She showed amazing angular resolution and incredible relationships with emission measures from the infrared and millimeter. That came after a great Galaxy Coffee talk from Aaron Dutton (MPIA) summarizing the meeting "The Physical Link between Galaxies and their Halos". He did the very clever / sensible thing of choosing the three things about the meeting that most impressed him, and talking only about those.

I got back into my "data analysis guru" mode today, with long conversations with a group (including Smolcic and Groves and many others) that is trying to detect very faint sources in very deep JVLA imaging of blank fields: How do you know that the sources you are seeing are real, and how do you measure their properties? Much of the non-triviality comes from the fact that the raw data are interferometric visibilities and the maps are made with (relatively speaking) black boxes. I advocated very strongly for the jackknife (or for full likelihood modeling, which is a great plan but outrageously hard given where everyone is right now).
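
To make the jackknife suggestion concrete, here is a minimal sketch. It assumes (my assumption, not the group's actual pipeline) that the visibilities can be split into roughly independent subsets, and that the black-box mapper and photometry have already been run on each subset to give one flux measurement per subset for a candidate source. The names `jackknife` and `subset_fluxes` are hypothetical. The useful thing about this approach is that it needs no model of the map noise.

```python
import numpy as np

def jackknife(samples, estimator):
    """Leave-one-out jackknife estimate and standard error.

    samples   : array of n roughly independent measurements
    estimator : function mapping an array of samples to a number
    """
    samples = np.asarray(samples, dtype=float)
    n = len(samples)
    # Re-run the estimator n times, each time dropping one sample.
    leave_one_out = np.array(
        [estimator(np.delete(samples, i, axis=0)) for i in range(n)]
    )
    center = leave_one_out.mean(axis=0)
    # Jackknife variance: (n - 1) / n times the summed squared deviations.
    variance = (n - 1) / n * np.sum((leave_one_out - center) ** 2, axis=0)
    return center, np.sqrt(variance)

# Hypothetical fluxes (arbitrary units) of one candidate source, measured
# in maps made from eight disjoint subsets of the visibilities.
subset_fluxes = np.array([11.2, 9.7, 12.5, 8.9, 10.4, 11.8, 9.1, 10.6])
flux, flux_err = jackknife(subset_fluxes, np.mean)
significance = flux / flux_err
```

A stricter version would remake the map for every leave-one-out set of visibilities and measure the source in each map, rather than averaging per-subset fluxes; the variance formula stays the same.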

I also spoke a bit with Watkins (MPIA) and van de Ven (MPIA) about modeling dynamical systems in the presence of a large-amplitude background, such that the model must include not just the dynamical system of interest (a stellar cluster, in this case) but also the foreground or background (the non-cluster stars of the galaxy or Galaxy, in this case). I worked through my understanding of the mixture models that the Loyal Reader (tm) knows so much about. It gets confusing when the mixture amplitudes are conditioned on observables or data that are not explicitly being modeled in the likelihood; for example in the Watkins case, the velocity distribution is a mixture of components, but the mixture amplitudes depend on position. I originally recommended modeling position and velocity simultaneously, but given the crazy selection of stars they face, it is better to model velocity conditioned on position. This makes the mixture less trivial.
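
Here is a minimal sketch of what "velocity conditioned on position" might look like as a likelihood. Everything specific in it is my assumption for illustration, not the Watkins and van de Ven model: one-dimensional line-of-sight velocities, Gaussian velocity distributions for both the cluster and the background, and a made-up cluster fraction that falls off with projected radius `r`. The point is only that the positions enter through the mixture amplitude and never get a probability density of their own.

```python
import numpy as np

def gaussian_pdf(v, mean, sigma):
    """Normalized one-dimensional Gaussian density."""
    return np.exp(-0.5 * ((v - mean) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def cluster_fraction(r, f0, r_scale):
    """Hypothetical position-dependent mixture amplitude a(r), in [0, f0]."""
    return f0 / (1.0 + (r / r_scale) ** 2)

def log_likelihood(v, r, f0, r_scale, mu_c, sig_c, mu_b, sig_b):
    """ln p(v | r): a mixture in velocity, amplitudes set by position."""
    a = cluster_fraction(r, f0, r_scale)
    p = a * gaussian_pdf(v, mu_c, sig_c) + (1.0 - a) * gaussian_pdf(v, mu_b, sig_b)
    return np.sum(np.log(p))

def membership_probability(v, r, f0, r_scale, mu_c, sig_c, mu_b, sig_b):
    """Posterior probability that each star belongs to the cluster."""
    a = cluster_fraction(r, f0, r_scale)
    p_c = a * gaussian_pdf(v, mu_c, sig_c)
    p_b = (1.0 - a) * gaussian_pdf(v, mu_b, sig_b)
    return p_c / (p_c + p_b)

# Made-up stars: velocities in km/s, projected radii in arcmin.
v = np.array([101.0, 98.5, 140.0, 60.0])
r = np.array([0.2, 0.5, 3.0, 5.0])
params = dict(f0=0.9, r_scale=1.0, mu_c=100.0, sig_c=3.0, mu_b=100.0, sig_b=40.0)
lnL = log_likelihood(v, r, **params)
p_member = membership_probability(v, r, **params)
```

Because the positions are only conditioned on, the crazy selection in position drops out, provided the selection does not also depend on velocity. The membership probabilities are soft: no star is ever hard-classified as cluster or background.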

In general we would all be better off if we understood mixture models much better. They obviate hard classification and capture a lot of our ideas about how our data are generated.
