sampling in hard problems

In the morning, I had a call with Foreman-Mackey. We talked about various things. One was the possibility that we could fully sample the galaxy-deprojection or cryo-EM problems. My optimism comes from the fact that there are many samplings of low-level latent parameters that can be done independently at fixed high-level parameters. My pessimism comes from the fact that there are so many parameters. Foreman-Mackey was optimistic. We also talked about building a physical model for the Kepler focal plane (PSF, flat-field, and so on) for K2 data. We were a bit pessimistic about our options here, but we are contractually obliged to deliver something. We discussed ways we might combine data-driven and physics-driven approaches.

In the afternoon, Tarmo Aijo (SCDA) and the Rich Bonneau (SCDA) group talked with Greengard and me about their model for the time evolution of the human (gut) biome. They are using a set of Gaussian processes, manipulated into multinomials, to model the relative abundances of various components. It is an extremely sophisticated model, fully sampled by Stan, apparently. They asked us about speeding things up; we opined that it is unlikely (at the scale of their data) that the Gaussian processes are dominating the compute time.
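To make "Gaussian processes, manipulated into multinomials" concrete, here is a minimal generative sketch. It is not their model: the squared-exponential kernel, the softmax link, the number of components, the time grid, and the read depth of 1000 are all assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(t, amplitude=1.0, length_scale=2.0):
    """Squared-exponential covariance on a 1-D time grid (assumed kernel)."""
    dt = t[:, None] - t[None, :]
    return amplitude**2 * np.exp(-0.5 * (dt / length_scale) ** 2)

def softmax(f):
    """Map latent functions (rows) to proportions that sum to one per time."""
    z = f - f.max(axis=0, keepdims=True)  # subtract the max for stability
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)

rng = np.random.default_rng(42)
t = np.linspace(0.0, 10.0, 20)  # hypothetical sampling times
n_components = 4                # hypothetical number of components
depth = 1000                    # hypothetical counts per time point

# One independent GP draw per component, via a Cholesky factor of the kernel.
K = rbf_kernel(t) + 1e-6 * np.eye(t.size)  # jitter keeps K positive definite
L = np.linalg.cholesky(K)
f = (L @ rng.standard_normal((t.size, n_components))).T  # (components, times)

# Relative abundances, then multinomial counts at each time.
p = softmax(f)
counts = np.stack(
    [rng.multinomial(depth, p[:, j]) for j in range(t.size)], axis=1
)
```

With only 20 time points the Cholesky factorization here is trivially cheap, which fits our guess that the Gaussian processes themselves are not what dominates the compute time at this scale.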


  1. Could there be a speed-up from going to an (approximate) Multinomial-Poisson likelihood?
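For reference, the identity behind the Multinomial-Poisson suggestion can be checked numerically. With independent Poisson counts at rates Lambda * p_k, the joint log-likelihood equals the multinomial log-likelihood plus log Poisson(N | Lambda), a term that does not depend on the proportions p. The sketch below uses made-up counts and proportions, and says nothing about whether the substitution would actually speed up the Stan model.

```python
import math
import numpy as np

def multinomial_logpmf(n, p):
    """log Multinomial(n | N = sum(n), p)."""
    total = int(np.sum(n))
    log_factorials = sum(math.lgamma(int(k) + 1) for k in n)
    return math.lgamma(total + 1) - log_factorials + float(np.sum(n * np.log(p)))

def poisson_logpmf_sum(n, rates):
    """Sum over components of log Poisson(n_k | rate_k)."""
    log_factorials = sum(math.lgamma(int(k) + 1) for k in n)
    return float(np.sum(n * np.log(rates) - rates)) - log_factorials

n = np.array([12, 30, 5, 53])            # made-up counts
total_rate = 80.0                        # made-up overall Poisson rate Lambda
p_a = np.array([0.10, 0.30, 0.05, 0.55]) # two made-up sets of proportions
p_b = np.array([0.25, 0.25, 0.25, 0.25])

# Poisson minus multinomial log-likelihood, for each set of proportions.
offset_a = poisson_logpmf_sum(n, total_rate * p_a) - multinomial_logpmf(n, p_a)
offset_b = poisson_logpmf_sum(n, total_rate * p_b) - multinomial_logpmf(n, p_b)

# The offset should be log Poisson(N | Lambda), whatever the proportions are.
N = int(n.sum())
expected_offset = N * math.log(total_rate) - total_rate - math.lgamma(N + 1)
```

Because the offset is the same for any proportions, the two likelihoods give the same inferences about the proportions; the appeal of the Poisson form is that the counts factorize over components.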