modeling phase space

There has been a long email conversation between Bovy, Lang, Tremaine, and me about modeling mixed-phase (ergodic) dynamical systems to determine potentials from instantaneous phase-space observations. This has been endless, for all the reasons mentioned in previous posts. I brought in Marshall before the Thanksgiving weekend, and traffic continued.

I think we understand something better than we did: We have to include parameters that explicitly model the distribution function in actions (recall that this is the action-angle formalism for integrable potentials at present). We can then marginalize over the parameters of the distribution function when we determine the parameters of the integrable potential.
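In symbols (notation mine, just to fix ideas, not anything we have formally written down): with potential parameters Omega, distribution-function parameters alpha, and actions J computed from each observed (x, v) given Omega, the phase-mixed assumption and the marginalization look like

```latex
% Assumed notation: \Omega = potential parameters, \alpha = DF parameters,
% d = degrees of freedom.  The map (x, v) -> (J, \theta) is canonical, so
% its Jacobian is unity, and a flat-in-angle (phase-mixed) DF gives
p(x_i, v_i \mid \Omega, \alpha)
  = \frac{f\!\left(J(x_i, v_i; \Omega) \mid \alpha\right)}{(2\pi)^d} ,
\qquad
p(\Omega \mid D) \propto p(\Omega)
  \int \mathrm{d}\alpha \; p(\alpha)
  \prod_i \frac{f\!\left(J_i(\Omega) \mid \alpha\right)}{(2\pi)^d} .
```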

But there is still the original problem that Tremaine started us on. We understand it better but we don't have a solution: How do you explicitly find the potential that is most consistent with mixed phase using straight-up Bayesian (that is, proper probabilistic) inference? It seems that all you are allowed to do is determine the posterior probability, and that posterior probability is not penalized (very much) if the particles are clustered strongly in phase. There is no way to penalize such clustering strongly without also strongly violating the independent-and-identically-distributed picture in which we would like to work (for now).

Partly this all goes to show that mixed-phase systems are hard to work with. Fortunately, they also don't exist.


optimization, deconvolved distributions

I started working through Bovy's opus on our technique for estimating the error-deconvolved distribution function that generates a d-dimensional data set. There are lots of algorithms for describing the distribution function that generates your data. But in general each data point has noise contributions, and in general those noise contributions are drawn from different distributions for each data point. Bovy's framework and algorithm models the distribution that, when convolved with each data point's noise distribution function, maximizes the likelihood of the data. This is the fundamental object of interest in science: The distribution you would have found if your data were perfect.
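A minimal numerical sketch of that objective (my toy, not Bovy's code; names and the Gaussian-mixture choice are mine): if the underlying distribution is a mixture of Gaussians and each data point's noise is Gaussian, the convolution just adds the per-point noise covariance to each component covariance before evaluating the likelihood.

```python
import numpy as np
from scipy.stats import multivariate_normal

def xd_loglike(x, S, amps, means, covs):
    """Log-likelihood of noisy data under a Gaussian-mixture model of the
    noise-free distribution.  x is (n, d); S is (n, d, d), the per-point
    noise covariance; (amps, means, covs) describe the mixture.  Convolving
    a Gaussian component with Gaussian noise just adds the covariances."""
    ll = 0.0
    for xi, Si in zip(x, S):
        p = sum(a * multivariate_normal.pdf(xi, mean=m, cov=C + Si)
                for a, m, C in zip(amps, means, covs))
        ll += np.log(p)
    return ll
```

Optimizing (rather than merely evaluating) this likelihood over the mixture parameters is where the algorithm itself comes in; the sketch above is only the objective.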

Conversations continued about image modeling, with Bolton, Lang, Marshall, and Bovy. We have ideas about parameterization of the model, and we have ideas about the objective function, but we are totally at sea when it comes to optimization. Everything we have in mind is expected to be incredibly slow. Incredibly. So I started looking into that.


mixture of delta functions, roulette

On the weekend, Adam Bolton (Hawaii) and I had a long discussion of modeling images, possibly as mixtures of delta functions (which, when convolved with the PSF, are a reasonable and complete basis for modeling any image). My interest is understanding multiwavelength data at the angular resolution of the highest resolution image (or better). This involves not just modeling the pixels, but also modeling spectral-energy-distribution space. We discussed using a delta-function sampling of this too.
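As a cartoon of the pixel-space part (the Gaussian PSF and all the names here are stand-ins of mine; a real PSF would be measured or tabulated), the model image is just a sum of PSF-shaped bumps, one per delta function:

```python
import numpy as np

def render(shape, psf_sigma, xs, ys, fluxes):
    """Model image: a mixture of delta functions convolved with the PSF,
    i.e., each source j drops flux_j times a unit-normalized (here
    Gaussian) PSF centered at (x_j, y_j) onto the pixel grid."""
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    img = np.zeros(shape)
    norm = 2.0 * np.pi * psf_sigma**2
    for x0, y0, f in zip(xs, ys, fluxes):
        r2 = (xx - x0)**2 + (yy - y0)**2
        img += f * np.exp(-0.5 * r2 / psf_sigma**2) / norm
    return img
```

The multiwavelength generalization would carry a flux per band for each source, which is the delta-function sampling of SED space mentioned above.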

Today, Bovy and I continued this discussion, with thoughts about how to generalize the pixel-space and SED-space models to other kinds of mixtures or linear subspaces. This is a non-trivial issue, because choices made here probably affect issues related to optimization, error analysis, and sampling, all of which will come much later.

A long email conversation among Bovy, Lang, Scott Tremaine (IAS), and me continued over our Bayesian formulation of the orbital roulette problem. This is all getting very philosophical, but Bovy and I are taking the position that the statement of roulette is the statement that the posterior probability distribution of phase, for each object taken individually, is flat. This becomes a constraint on the permitted priors. Tremaine had hoped that roulette could be derived from Bayesian considerations, not that it would be an assumption modifying the Bayesian inputs.


clustering, shapelets

Alison Coil (UCSD) gave a nice talk about clustering and the things that can be learned about galaxy evolution from a halo occupation picture of galaxy and quasar clustering and cross-correlations. She has some tantalizing results on green valley galaxies.

Along those lines, Phil Marshall and I specified a well-posed problem for doing galaxy morphologies and learning things from them using shapelets, or, better, Kuijken's sechlets. We wrote down some early specs, although we didn't in fact get started.



Phil Marshall (UCSB) came into town to talk about various things, including image modeling or deconvolution. We are both trying to reach the information limit in ground-based data by image modeling. Marshall's problem is to find or analyze arcsecond-scale gravitational lenses in large ground-based surveys where the seeing is on the order of one arcsecond. My problem is to create the best possible astrometric catalog from multi-wavelength data from NASA surveys. In both cases, we need to fit point sources to our images; this is deconvolution (even though most wouldn't call it that).
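One helpful fact for the point-source fitting in both problems (a standard fact, not specific to either project; the names below are invented): with source positions held fixed, the fluxes enter the model linearly, so that part of the fit is an exact linear least-squares solve.

```python
import numpy as np

def fit_fluxes(image, psf_images):
    """Least-squares point-source fluxes.  psf_images is a list of images,
    each the PSF rendered at one candidate source position; these form the
    columns of the design matrix, and the flux vector is the linear
    least-squares solution (deconvolution in all but name)."""
    A = np.stack([p.ravel() for p in psf_images], axis=1)
    fluxes, *_ = np.linalg.lstsq(A, image.ravel(), rcond=None)
    return fluxes
```

The hard, nonlinear part is of course the positions (and the number of sources), which is why the optimization question above is the scary one.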


S5 visualizations, phase space

Schminovich, Wu, and I discussed our S5 project, a statistical project with Spitzer spectroscopy of SDSS galaxies. Wu is trying to automate all of the data analysis, which is a laudable and ambitious goal. We worked out a set of data visualizations that will help us vet the output of the automated pipelines.

Kathryn Johnston (Columbia) and I discussed the determination of the Galaxy potential from the kinematics of stars when we can't assume that the potential is integrable or even time-independent, and when we can't assume the stars are phase-mixed. We discussed approaches that—instead of assuming phase-mixed—assume maximum clumpiness or minimum entropy in phase space. There are no well-posed solutions to the general problem that either of us know about; everything has to assume something we know not to be true. But which is the more damaging assumption, the assumption that the potential has time-independence or other (known to be broken) symmetries, or the assumption that the phases of the particle orbits are mixed? And how do we find out empirically?



Spent my research time today editing and commenting on manuscripts from Wu (infrared properties of ultra-low luminosity galaxies) and Bovy (Bayesian orbital roulette).


digitized plates

I spent some time learning about the horrendous photometric and astrometric issues with photographic plates that have been digitized. The photometric and astrometric solutions are non-trivial functions of stellar brightness, because as stars saturate, only the halo (outer part of the PSF) can be measured. The detectable size of the halo of each star grows slowly with magnitude (this is good, but it needs to be calibrated), and it shifts radially relative to the star (the outer part is not concentric with the inner parts). The former effect means that magnitudes are a function of "flux" in the scan and apparent angular size of the star. The latter effect means that the plate scale is a function of stellar magnitude! If we are going to get all this right, we need a generative model of the optics and the photographic process. Uh-oh.


uninformative priors

[In a violation of the rules (at right), I haven't posted for a couple of days. Not much research got done, between finishing a funding proposal, giving a talk at Michigan State University (thanks everyone!), and the related travel.]

Bovy, Lang, and I had the realization that, in the one-dimensional simple-harmonic-oscillator formulation of the Bayesian orbital roulette problem, the priors matter. Some apparently natural choices of prior on amplitude and frequency make it such that the marginalized posterior probability distribution function for the phase is not the same as the prior distribution, that is, flat in phase. These choices of prior must be wrong choices in the absence of truly informative prior information, because the phases ought to be uniformly distributed no matter what.

Once we figured out which priors are truly uninformative, we were able to make the whole thing work. Time to write it up and extend it to more complicated one-dimensional cases (such as Oort's method for determination of the vertical mass distribution in the disk).
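To illustrate the kind of machinery involved (this is my toy version with an assumed scale-free amplitude prior, not necessarily the prior we settled on): for x = A cos(theta), v = -A omega sin(theta), the flat-in-phase assumption plus a prior p(A) turns into a likelihood for omega via the (A, theta) to (x, v) Jacobian, which is A omega. Maximizing that likelihood on simulated data recovers the true frequency.

```python
import numpy as np

rng = np.random.default_rng(42)
omega_true, n = 2.0, 500
A = rng.uniform(0.5, 2.0, n)             # true amplitudes (unknown to the fit)
theta = rng.uniform(0.0, 2.0 * np.pi, n) # phases: flat, the roulette assumption
x = A * np.cos(theta)
v = -A * omega_true * np.sin(theta)

def loglike(omega):
    # p(x, v | omega) = p(A) (1 / 2 pi) / (A omega): the flat-phase DF
    # transformed to the observables, with Jacobian |d(x,v)/d(A,theta)| = A omega.
    # p(A) propto 1/A is an *assumed* scale-free prior; omega-independent
    # constants (including normalizations) are dropped.
    amp2 = x**2 + (v / omega)**2         # inferred A^2 at this trial omega
    return np.sum(-np.log(amp2)) - n * np.log(omega)

omegas = np.linspace(1.0, 4.0, 601)
best = omegas[np.argmax([loglike(w) for w in omegas])]
```

With a few hundred stars, `best` lands close to `omega_true`; whether the answer is unbiased depends on the choice of p(A), which is exactly the sense in which the priors matter.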


harmonic oscillator roulette

Bovy, Lang, and I worked on the weekend and today on an even simpler problem than the orbital roulette problem: The equivalent for a one-dimensional harmonic oscillator. If we get this squared away, we can update the Oort methods for determining the mass distribution (vertically) in the Milky Way disk to a justified Bayesian framework (and improve the precision and build in the capability of accepting finite errors to boot). It turns out to be a little bit non-trivial, though we got something working by the end of the day.



Beautiful astro seminar by Leslie Greengard (NYU), the author of the fast multipole method and of various machine-precision methods for electromagnetism and gravity. He spoke about solving equations related to scattering in velocity space in plasma-physics experiments. His method gets close to machine precision by converting some of the equations into integral equations and then solving the hard part numerically with expansions. Afterwards, we ended up having a discussion of the solution of linear differential equations under non-trivial boundary conditions, which took me back to undergrad.


bayesian orbital roulette

While in Princeton, Tremaine assigned Bovy and me the problem of making a Bayesian formulation of the orbital roulette problem. The existing formulation is disturbingly frequentist (choose a statistic and optimize it!), but Tremaine thought there might be problems in principle with any Bayesian formulation. Conversations between Bovy, Lang, and myself got us to a solution that works beautifully! Unfortunately, we don't quite understand the justification of our own method, which is an odd situation, but maybe not all that uncommon when inference gets hard. This is a baby problem for the Gaia problem that I have been talking about, so it is nice to already have something solid to talk about.

The idea in the Bayesian formulation is that the fact that you expect the phases of the orbits to be randomly distributed (that is, you expect a flat distribution of phases) turns into a distribution of inferred potentials (because, for each observed position and velocity, the phase is a function of what potential you choose). The problem with our formulation is that we used this thinking to jump to the solution, without going through all the steps that ought to be there. More when we have it.


huge data

Today was the second Big Data Lunch at NYU, hosted by me and Heather Stewart of NYU Information Technology Services. The speaker was Kyle Cranmer (NYU), who talked about the LHC, which is truly big data. His talk was great, and started a tiny bit of a discussion about what would constitute truly usable shared, high-performance compute infrastructure for those of us who analyze data. For most of us, common facilities are not that useful in their current form, because we are very constrained in OS, file systems, data transfer, software installation, and other issues. Indeed, Cranmer is not allowed to publish results if he does not use particular versions of operating systems and software! Cranmer advocated an elastic system like Amazon's EC2, with virtualization that would permit arbitrary OS and software installs; I am agnostic, but I agree that we need to do something other than just build huge multi-core systems if we are going to facilitate computational science that goes beyond simulations.


the disk, AEGIS

Spent the day with Bovy at the IAS speaking with Tremaine (and others) about inferring dynamics from kinematics. We decided that the best place to start is the disk, because (a) there are lots of data available right now, (b) we already have results on the velocity distribution and the significance and origins of structure therein, (c) there are other straightforward projects to work through there (most of them much easier than what Bovy has already made happen), and (d) we need to build up some experience before we start thinking about the Gaia problem, which is huge.

Sandy Faber (UCSC) gave the IAS talk today; she spoke about the stellar mass and star-formation evolution results from DEEP2, but especially the multi-wavelength AEGIS data. She argued for a very simple picture in which halo growth is simple, and galaxy growth within those haloes is also simple (depending only on halo mass, with a slight adjustment with redshift). She showed that even this very simple picture explained most of the observations adequately. I don't disagree.


steady-state Galaxy?

I spent a chunk of the weekend and this morning working through the literature on modeling the Galaxy as a steady-state system, using phase-space data from Gaia or the projects that precede it. This didn't take long, because there isn't much written on the subject. Several good papers are by Binney, who has a torus programme. However, this program and all the others (going back to Schwarzschild) make the assumption that the potential is time-independent and has no orbiting substructure (a kind of time-dependence). Binney and others suggest that the time-dependence could be seen in residuals of any fit, but that is not clear to me, and it is also not clear that such a discovery of time dependence would permit good analysis of the time dependence.

Interestingly, tidal streams (disrupting satellites and clusters) both measure the potential directly (in some sense, because the streams highlight an orbit or a small family of similar orbits), and show that the Milky Way is not in steady state. I would love to discover the analysis program that makes use of the near-steadiness and the substructure that is not steady to infer the potential and its time dependence all at once.