pulsar timing Gaussian process

van Haasteren described in more detail today how he and his collaborators can recognize that covariances in pulsar timing residuals among multiple pulsars can be the result of a stochastic background of gravitational radiation. The method they use is a Gaussian process, a technology much loved by Bovy, myself, and our collaborators (and at least one of my regular readers). The nice thing is that van Haasteren's project is precisely a Gaussian process, and he knows that for good physical reasons. Furthermore, the covariance matrix he constructs is highly constrained by the physics of the problem. At lunch, he described to a subset of Camp Hogg some of the issues he faces in the real world. Two of the most interesting are linear algebra (he needs to invert and take the determinant of some very large, very non-sparse matrices) and visualization (if you detect a signal at low signal-to-noise ratio, how do you convince the skeptics?). Camp Hogg didn't have much useful to give him on either issue, though I think we can help with sampling issues.
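To make the linear-algebra point concrete: the expensive Gaussian-process quantities are C⁻¹r and log|C|, and both fall out of a single Cholesky factorization without ever forming an explicit inverse. This is a toy sketch with a made-up kernel and tiny matrix, not van Haasteren's actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

def gp_log_likelihood(r, C):
    """log N(r | 0, C), computed by factoring C rather than inverting it."""
    n = len(r)
    L = np.linalg.cholesky(C)  # C = L L^T; O(n^3) but numerically stable
    # Two triangular solves give C^{-1} r with no explicit inverse:
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, r))
    # log|C| comes straight off the factor's diagonal:
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (r @ alpha + log_det + n * np.log(2.0 * np.pi))

# Illustrative covariance: squared-exponential kernel plus white noise
# (the real problem's covariance is set by the pulsar-timing physics).
n = 50
t = np.linspace(0.0, 1.0, n)
C = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / 0.1 ** 2) + 1e-6 * np.eye(n)
r = np.linalg.cholesky(C) @ rng.standard_normal(n)  # a draw from N(0, C)
print(gp_log_likelihood(r, C))
```

The same factor can be reused for every solve against C, which is one reason "factor, don't invert" is the standard advice for dense GP computations.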


  1. This is a wonderfully difficult topic. We wrestled a lot back in the day with how to do the statistics "right" in a computationally reasonable way, using some of the data analysis insights being developed at the time for WMAP, and some pretty sophisticated theorists (including a future TED prize winner...) got nowhere in the end.

  2. A question when someone wants to "invert" (usually we really want to factor) a really large matrix is "do you really want to do that?". Maybe one can come up with a lower-rank model that sidesteps the difficult computation in the first place.

    For GPs there are some possible solutions from numerical methods (early work is in Mark Gibbs' PhD thesis with David MacKay, although I wouldn't stop looking there). But I would be critical and careful in adopting fancy methods: there are a whole bunch of papers that claim amazing results but actually do nothing useful (I have a slightly ranty video on videolectures.net about this).

    As for visualization, I really like a (frequentist!) approach to this taught to me by Dianne Cook (one of the authors of GGobi). Come up with a way of visualizing your data that you claim highlights the signal. Then generate 19 synthetic datasets from your null hypothesis and subject them to the same visualization technique. Print all 20 figures in a random order and give them to a naive subject. Ask them to tell you which is the odd one out. They then apply the sophisticated image processing and reasoning algorithms implemented in their brain to do a non-parametric test. If they pick out the real data, you have significance at the 0.05 level for your signal. If you want more significance, use multiple subjects.
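The lineup protocol above is easy to sketch in code. Everything here is illustrative (the "signal," the null model, and all names are made up); the point is just the bookkeeping: 19 null draws plus the real data, shuffled so the viewer is blind, with the true position recorded secretly.

```python
import numpy as np

rng = np.random.default_rng(42)

n = 200
# Hypothetical "real" data: noise plus a weak sinusoidal signal.
real_data = rng.standard_normal(n) + 0.4 * np.sin(np.linspace(0, 4 * np.pi, n))
# 19 synthetic datasets drawn from the null hypothesis (pure noise here).
nulls = [rng.standard_normal(n) for _ in range(19)]

panels = nulls + [real_data]
order = rng.permutation(20)               # shuffle so the viewer is blind
lineup = [panels[i] for i in order]
true_position = int(np.where(order == 19)[0][0])  # kept secret from the viewer

# Each entry of `lineup` would now be rendered with the SAME plotting
# recipe. A viewer who picks out `true_position` by eye has performed a
# test with a 1/20 = 0.05 chance of success under the null.
print(true_position)
```

Independent viewers multiply: if k naive subjects all pick the real panel, the chance under the null is (1/20)^k, which is the "use multiple subjects" remark made quantitative.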