flexible rigid models

I worked on Gaussian Processes with Foreman-Mackey today. In our formulation of the "search Kepler data for transiting planet candidates" problem, what is usually called "the model" of the transit sets the mean of a Gaussian Process, and the photon noise, stellar variability, and spacecraft-induced variability are all subsumed into a covariance function that sets the variance tensor of that Process. The nice thing about this formulation is that the estimation or inference is just what you would do in linear chi-squared inference (optimization or sampling or whatever you like), but with a much more complicated covariance matrix. In particular, the covariance matrix is now nothing like diagonal.
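To make that concrete, here is a minimal sketch (my own notation and a toy kernel, not anything from our actual code) of the "chi-squared with a dense covariance matrix" likelihood: the physical model enters only through the mean, and the linear-algebra work goes through a Cholesky factorization rather than an explicit matrix inverse.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def gp_log_likelihood(y, mean, K):
    """ln N(y | mean, K) for a dense, non-diagonal covariance K.

    y    : observed fluxes, shape (n,)
    mean : physical model prediction (e.g. a transit light curve), shape (n,)
    K    : covariance matrix built from the kernel, shape (n, n)
    """
    r = y - mean                    # residuals away from the rigid mean model
    factor = cho_factor(K, lower=True)
    alpha = cho_solve(factor, r)    # K^{-1} r, without ever forming K^{-1}
    log_det = 2.0 * np.sum(np.log(np.diag(factor[0])))
    return -0.5 * (r @ alpha + log_det + len(y) * np.log(2.0 * np.pi))

# toy setup: squared-exponential kernel plus a small diagonal jitter
t = np.linspace(0.0, 1.0, 50)
K = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / 0.1**2) + 1e-6 * np.eye(50)
y = np.random.default_rng(42).multivariate_normal(np.zeros(50), K)
ll = gp_log_likelihood(y, np.zeros(50), K)
```

Optimization or sampling then just wraps a function like this in a loop over the transit (mean) parameters and any kernel parameters.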

We are off the usual path of Gaussian Processes, but only because we have a non-trivial function for the mean of the Process. That isn't much of a change, but it is rarely mentioned in the usual texts and descriptions of GPs. I think the fundamental reason for this is that GPs are usually used as incredibly flexible models for barely understood data. In typical astronomical (and physical) problems, there are parts of the problem that are barely understood and require flexible models, but other parts that are very well understood and require pretty rigid, physically motivated models. In our use of GPs, we get both at once: the mean function is set by our physical model of the transiting exoplanet; the variance function is set to capture as well as possible all the variability that we don't (officially) care about.
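As a toy illustration of what a "rigid" mean function looks like, here is a hypothetical box-shaped transit model (real analyses use properly limb-darkened light-curve shapes; the parameterization below is mine, purely for illustration). The point is that a handful of physical parameters fully determine the mean, in contrast to the nonparametric flexibility living in the kernel.

```python
import numpy as np

def box_transit_mean(t, t0, duration, depth):
    """Toy 'rigid' mean model: a box-shaped transit dip.

    t0       : mid-transit time
    duration : full transit duration (same units as t)
    depth    : fractional flux decrement during transit
    """
    in_transit = np.abs(t - t0) < 0.5 * duration
    return 1.0 - depth * in_transit  # flux of 1 outside, 1 - depth inside

t = np.linspace(0.0, 1.0, 100)
mu = box_transit_mean(t, t0=0.5, duration=0.1, depth=0.01)
```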

More soon.


  1. Cool. The other thing I've noticed about the Gaussian-process literature is that they often use the GP distribution as a prior over some unknown function, but it sounds like in your case the GP is the sampling distribution, with a lot of effects integrated out.

  2. There can be numerical problems when integrating out a mean function to build one do-everything covariance. The simplest example is a GP with non-zero but unknown constant mean offset:

    c ~ N(0, σ²) # c is a scalar
    f ~ N(c1, Σ) # "1" is a vector of ones

    which is marginally equivalent to a zero-mean process:

    f ~ N(0, Σ + σ²11ᵀ) # i.e., σ² added to every element of Σ

    The (Σ + σ²11ᵀ) covariance is poorly conditioned for large σ². I believe there's some discussion in Rasmussen and Williams on how to write well-behaved code with this sort of covariance.
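The conditioning problem in comment 2 is easy to demonstrate numerically, and the Sherman-Morrison (matrix inversion) lemma gives one well-behaved workaround: solve against the marginalized covariance using only the well-conditioned Σ, without ever forming the rank-one-corrupted matrix. The kernel and numbers below are my own illustrative choices, not from the comment.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

# Toy squared-exponential kernel plus photon-noise variance on the diagonal.
n = 50
t = np.linspace(0.0, 1.0, n)
Sigma = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / 0.1**2) + 0.01 * np.eye(n)
ones = np.ones(n)
sigma2 = 1e8  # huge prior variance on the constant offset c

# Marginalized covariance: sigma2 added to every element of Sigma.
K = Sigma + sigma2 * np.outer(ones, ones)
print(np.linalg.cond(Sigma))  # well conditioned
print(np.linalg.cond(K))      # many orders of magnitude worse

# Sherman-Morrison: x = K^{-1} y, computed with solves against Sigma only.
y = np.sin(10.0 * t)
factor = cho_factor(Sigma, lower=True)
a = cho_solve(factor, y)     # Sigma^{-1} y
b = cho_solve(factor, ones)  # Sigma^{-1} 1
x = a - b * (sigma2 * (ones @ a)) / (1.0 + sigma2 * (ones @ b))

x_naive = np.linalg.solve(K, y)  # still works here, but degrades as sigma2 grows
```

The naive solve loses accuracy roughly in proportion to the condition number of K, while the Sherman-Morrison route only ever touches Σ, whose conditioning is independent of σ².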