Foreman-Mackey and I looked at the variability of Kepler sources, trying to understand the variability introduced by the instrument, detector, or full-system sensitivity. There are a lot of effects, and they are oddly repeatable from season to season and from star to star, but with massive exceptions. So we don't really understand it. We briefly got to the point that we thought the variations might be additive, but by the end of the day we were feeling like the dominant effects are multiplicative. We got some very nice and useful feedback about it all from Tom Barclay (Ames) via Twitter (tm) and from Eric Ford (Florida) via email. There are many effects, including stellar aberration, stellar proper motions relative to the detector (both for target stars and for nearby stars), photometric mask issues, and spacecraft thermal and reorientation issues. We need to learn a lot!
At the Astronomy Meets Computer Vision meeting this morning, Fadely, Foreman-Mackey, and I spent a long time discussing with Schölkopf how we might try out some causal inference on astrophysics data sets. We didn't figure out the killer app, but we revisited some old things I have done in my life in galaxy evolution, where there are lots of correlations but little is known when it comes to causality.
Schölkopf gave the second of his NYU Courant Lectures today, this time on inferring causality from observed data (data in hand, not intervention-based experiments). All of the ideas are built around concepts of conditional independence. In some cases these are the kinds of independences that are represented by graphical models; in some cases these are new kinds of independence ideas that compare distributions to features of the functions that transform those distributions. He showed some simple cases where he can infer the direction of causality, validated on problems where the true direction is known independently. After his talk I proposed that we find some places in astrophysics where causality is debated and test them. I encourage suggestions from the peanut gallery.
One thing that Schölkopf has convinced me of in the last year or so is that the reason XDQSO beats SVM or simpler likelihood approaches to separating quasars from stars is that it contains a causal model of the noise process affecting the data. I kept thinking about it as being better because it is generative, but I was wrong: There are lots of ways to be generative that do not help in dealing with changing noise properties. It is really the causal prior information that has been injected into XD that makes it so good.
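The causal move in XD is easy to state in code. Here is a minimal one-dimensional sketch (my own toy notation, not the actual XDQSO implementation): the model predicts each noisy datum by convolving the underlying Gaussian mixture with that datum's own noise variance, so heterogeneous noise is part of the generative story rather than something the classifier must learn around.

```python
import numpy as np

def xd_loglike(x, s2, amps, means, vars_):
    """Log-likelihood of 1-D data x with per-point noise variances s2
    under a Gaussian mixture (amps, means, vars_), convolved with the
    noise.  The XD-style move: each point's own noise variance is added
    to the component variances, so the model predicts the *noisy* data."""
    x = np.asarray(x, dtype=float)[:, None]      # shape (N, 1)
    s2 = np.asarray(s2, dtype=float)[:, None]    # shape (N, 1)
    var = np.asarray(vars_)[None, :] + s2        # (N, K) noise-convolved variances
    norm = np.asarray(amps) / np.sqrt(2.0 * np.pi * var)
    resid2 = (x - np.asarray(means)[None, :]) ** 2
    p = np.sum(norm * np.exp(-0.5 * resid2 / var), axis=1)
    return np.sum(np.log(p))
```

A point with large `s2` is automatically compared against a broader predicted distribution; that is the causal prior information doing the work.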
In a talk-filled day, McWilliams (Princeton) talked about super-massive black-hole merger events and their detectability (through gravitational waves) with pulsar timing, Zrake (NYU) defended his PhD on relativistic turbulence, and Schölkopf gave the first of his Courant Lectures at NYU on causal inference and machine learning. Zrake is particularly deserving of congratulations: He has demonstrated that (warm) relativistic turbulence has very similar statistics to non-relativistic turbulence, which is very, very new. He did this by writing and operating some pretty high-end open-source code.
The day began with a discussion with Tinker (which led to a set of email trails with SDSS-IV eBOSS) about how to make a uniform sample of quasars on the sky, when quasars look very like stars (morphologically and in color) and the photometric errors and stellar density are varying significantly. The answer is: You can't make a uniform sample, for extremely deep reasons: There really is far less information when the errors get worse, and there really are different prior expectations in regions of different stellar density. However, we talked about various approaches to mitigating quasar density variations in the final observed sample. Nothing is easy here. The problem is that quasars and stars look very similar in the SDSS ugriz filter set. (The crazy thing is that LSST has now set in stone that they will use the same filters! That seems like a big mistake, given everything we now know about stars, galaxies, and quasars in the ugriz bands.)
In a small stint of research time today I worked on a document expressing ideas about weak lensing with Marshall. We keep writing documents, but never a paper! I think we have a good idea now, which is to fully implement Yike Tang's work on lensing inference in a toy universe, showing that good galaxy shape estimators don't necessarily lead to good shear estimators, and good shear estimators don't necessarily lead to good cosmological parameter estimators. Solution: Use probability theory and a full hierarchical model to do it all in one shot. Working in a toy universe makes it all tractable for now; in the real Universe with real data this will be somewhat computationally expensive.
It was old-school-applied-math day at Camp Hogg today, with Itay Yavin (McMaster, Perimeter), Foreman-Mackey, and I talking about how to very quickly find periodic but anharmonic signals in time-series data. We are thinking about Kepler of course, and we are taking brute-force approaches. Our key realization this week, however, has been that if you can make a Fourier Series approximation to the signal you are looking for, then "dot products" or overlap integrals of the data with sinusoids become sufficient statistics for signal detection. This brings down computational complexity by one factor of the size of the data from whatever you had before. Of course we are only thinking about algorithms that work on irregularly sampled data with heterogeneous noise properties; things get even easier if you have uniformly sampled data.
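A minimal sketch of the sufficient-statistics point (illustrative code, not our actual implementation): a weighted least-squares fit of a truncated Fourier series at a trial period touches the data only through the dot products A^T C^-1 y and A^T C^-1 A, where the columns of A are sinusoids evaluated at the (irregular) sample times.

```python
import numpy as np

def fourier_fit(t, y, ivar, period, nharm=3):
    """Weighted least-squares Fourier-series fit at one trial period,
    for irregularly sampled data with heterogeneous inverse variances.
    The data enter only through dot products with sinusoids -- the
    sufficient statistics for detection at this period."""
    omega = 2.0 * np.pi / period
    cols = [np.ones_like(t)]
    for m in range(1, nharm + 1):
        cols += [np.cos(m * omega * t), np.sin(m * omega * t)]
    A = np.vstack(cols).T                       # (N, 2*nharm+1) design matrix
    ATCinv = A.T * ivar                         # A^T C^-1 (diagonal noise)
    coeffs = np.linalg.solve(ATCinv @ A, ATCinv @ y)
    chi2 = np.sum(ivar * (y - A @ coeffs) ** 2) # goodness of fit at this period
    return coeffs, chi2
```

Scanning periods then means re-evaluating these dot products, not re-touching the raw data in any more expensive way; that is where the factor-of-data-size saving comes from.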
By the way, in a strict technical sense, even if the Kepler satellite takes data at a regular rate according to its on-board clock, the data are not regularly sampled from the point of view of a barycentric or inertial observer for any star. So there are no regularly sampled data sets! This is similar to my oft-stated point that there are no data sets that have no missing data. We have to suck it up. I, for one, am ready to suck.
Chris Fassnacht (UC Davis) came into town and we discussed various things lensing. He showed us some very beautiful Keck AO images of strong gravitational lenses. He also discussed with us the generalization of blind deconvolution that includes a lens model in the middle of the generative model. This is a great idea, and no-one really has the whole package. He and his collaborators have great evidence for massive substructure in lensing galaxies.
In an informal seminar in the morning, Rashid Sunyaev (MPA) finally explained to me clearly why the CMB does not show Lyman, Balmer, Paschen, and so on recombination lines: The answer is that the photon-to-baryon ratio is so enormous (something like a billion photons per baryon) that recombination contributes only a minute fraction of all photons. He showed nonetheless that the recombination lines might be visible in the future with the proposed PIXIE mission.
Prior to that talk I gave the last of our stats school lectures, this one on machine learning. I described some of the popular goals and methods in machine learning and connected them to physical science problems. I gave strong advice, which was mainly to avoid large classes of algorithms. In the afternoon, Shirley Ho (CMU) told us about a use of machine learning to take a few good simulations and lots of bad ones and build lots of (pretty) good ones at low cost.
With Schölkopf, Hirsch, Fadely, and Foreman-Mackey, we had a long session in my office about blind deconvolution, image priors, and so on. Schölkopf had an intuition—at variance with mine—that locality-sensitive hashing will beat a kd-tree for fast lookup of image patches in any massive data-driven prior on image contents. Fadely was volunteered to find out if this is true. Of course the kd-tree can be used to get exact results and the hashing may only give approximate results, so there is a difference, but I had the impression that the kd-tree is hard to beat. It might also depend on scale. Time to call in Lang and Mierle!
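For the record, the exact-lookup side of that bet is a few lines of scipy (a toy version of the problem, with small random "patches"; real patch libraries are far larger, and kd-trees degrade in very high dimensions, which is exactly where hashing might win):

```python
import numpy as np
from scipy.spatial import cKDTree

# Toy patch-lookup problem: for each query patch, find its nearest
# neighbor in a library of flattened patches.  The kd-tree gives exact
# answers; locality-sensitive hashing would trade exactness for speed.
rng = np.random.default_rng(42)
library = rng.normal(size=(10000, 25))          # 10k flattened 5x5 patches
tree = cKDTree(library)

queries = library[:5] + 0.01 * rng.normal(size=(5, 25))  # perturbed copies
dist, idx = tree.query(queries, k=1)            # exact nearest neighbors
```

Fadely's job is to find out where, in patch dimension and library size, the approximate methods start to pay.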
On blind deconvolution, we argued about the representation of the high-resolution scene and all agreed on two points: The first is that the model one builds of the imaging data should represent one's beliefs about the causal processes that create that imaging data. The second is that the parameterization of the high-resolution scene should be connected to the specific questions one wants to ask about that scene.
Schiminovich and I met at an undisclosed location to discuss ongoing projects, and he showed me materials from the first-light run of a prototype imaging spectrograph his team has built and deployed at MDM. We discussed a bit how one ought to calibrate such a spectrograph, and reduce the data. My loyal reader will not be surprised to learn that I advocate a causally and probabilistically justified generative (or forward) model that goes from the intensity field incident on the telescope down to the imager pixels in the focal plane. We discussed a bit about how to build that model, which ought to be informed by instrument modeling, calibration data, and science data. We also discussed how to take the science data to make them most useful for calibration. I volunteered (yes, I know) to consult on the data analysis. Weren't we supposed to be working on something related to GALEX?
Oh yeah, we are, and on that front Schiminovich showed me his "bespoke" data reductions for GALEX imaging of the Magellanic Clouds. The coolest thing about them is that you can see the difference—even in a single band—between the very young and slightly older star-forming regions: In the youngest regions, the light is "noisier" because the shot-noise is higher (there is more disparity in stellar luminosities, or fewer effective stars per unit brightness). This suggests a "surface-brightness-fluctuations" approach to age dating! That would be fun.
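The "fewer effective stars" idea has a standard quantitative form, which I'll sketch here (my gloss, not Schiminovich's pipeline): the effective number of stars contributing to a region's light is N_eff = (sum L)^2 / (sum L^2), and the fractional surface-brightness fluctuation scales like 1/sqrt(N_eff). A few very luminous young stars drive N_eff down and the "noisiness" up.

```python
import numpy as np

def n_eff(lums):
    """Effective number of stars contributing to the light:
    N_eff = (sum L)^2 / sum L^2.  Small N_eff means large
    surface-brightness fluctuations."""
    lums = np.asarray(lums, dtype=float)
    return lums.sum() ** 2 / np.sum(lums ** 2)

# Two toy populations with the same total luminosity (illustrative numbers):
young = [100.0] + [1.0] * 900   # one luminous star dominates the light
old = [1.0] * 1000              # many comparable stars
```

The young population here has N_eff of order a hundred, the old one a full thousand, despite identical total brightness; that contrast is what an age-dating method could exploit.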
Once again, talk at Camp Hogg came around to empirical priors for images. Schölkopf and Hirsch (MPI-IS) were inspired to talk about it because of Krishnan's PhD defense yesterday. Fadely, Fergus, and I have been using empirical priors—priors based on the sum total of all previously observed data—to build our science-data-based image calibration system (which itself is a project suggested to us by Schölkopf last year). We discussed nearest neighbors and alternatives, and also how to build this prior in the "calibrated image" space but use it to predict in the "raw image" space. That is a detail that I think has to be important; I plan to work more on it tonight.
Dilip Krishnan (NYU) defended his PhD today, successfully, of course. His work has been on image priors and deconvolution (blind and non-blind), as well as preconditioners for linear algebra problems. His defense talk and the discussion afterwards got some ideas going around about priors on natural and astronomical images; conversations about those with Schölkopf and Michael Hirsch (UCL and MPI-IS, visiting this week) continued well into the afternoon.
I gave the stats-school seminar this morning, on PCA and its limitations. I have so much to say about this. I should write something for the arXiv. I have ranted in this space before about PCA. But at the same time, it is super-effective. That's why making small changes to PCA which make it more responsible, more robust, more probabilistically righteous, and better justified will just pay us all dividends. We bid goodbye to Brewer after lunch; it has been a fun week, and we have a chance of doing something super-cool with the Kuiper Belt.
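One of my standard complaints, in executable form (a toy example, not from the lecture): PCA has no noise model and is not even invariant to the units of the inputs, so rescaling one feature rotates the leading component even though the underlying data are unchanged.

```python
import numpy as np

def pca_first_component(X):
    """Leading principal component via SVD of the mean-subtracted data."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

rng = np.random.default_rng(1)
# Two features with intrinsic standard deviations 2 and 1:
X = rng.normal(size=(500, 2)) @ np.diag([2.0, 1.0])
v1 = pca_first_component(X)                          # dominated by feature 0
v2 = pca_first_component(X * np.array([1.0, 10.0]))  # rescale feature 1
```

After the rescaling, the leading component swings to the second feature; any of the "more probabilistically righteous" variants would instead model the variances explicitly.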
Today's research highlight was our weekly MCMC meeting, which included Brewer and Schölkopf in addition to the regulars Goodman, Hou, Foreman-Mackey, and Fadely. We discussed many matters, including but not limited to: how to make use of rejected samples (that is, how to not waste those likelihood calls which resulted in samples that are not in the output chain), how to replace MCMC with a method that returns something better than a mixture-of-delta-functions approximation for the posterior, how to combine MCMC with classification or other machine-learning methods to turn a multi-modal posterior into a mixture of uni-modal distributions, how to propagate errors in diffusive nested sampling, and how to make that method more adaptive. Hou has some results on exoplanets in which some stars (radial velocity data here) have roughly similar marginalized likelihood (Bayesian evidence) for two-planet and three-planet models. We discussed how to diagnose these situations. Brewer predicted that if the evidence is good, then the parameters should also be well constrained. That also jibes with my intuitions. We encouraged Hou to make visualizations and predictions; some of these cases might turn into discoveries.
Wednesdays are supposed to be sacred research days. That didn't really work out today; life is full! The best thing about the day was an absolutely great talk by Itay Yavin (McMaster) about dark matter with (very weak) electromagnetic interactions (magnetic dipole, to be specific). He (with Chang, Weiner, and others) has a model that could potentially explain the putative / suspicious Fermi 130 GeV line, open up direct detection options, and leave signatures at the LHC. That would be the trifecta, and he would get the Nobel Prize! One thing I learned in the talk is that the Fermi team agrees that there might be a line at 130 GeV. That's big, and another endorsement for the intuition and judgement of my old collaborator Finkbeiner (Harvard). (Well, Finkbeiner isn't old, but my collaborations with him are.)
The second-best thing about the day was lunch with Schölkopf, David Sontag (NYU), and Amir Globerson (Hebrew). We talked about probabilistic models; Sontag is trying to understand the medical states of patients based on medical records, and make predictions for future medical needs. We briefly discussed realism, and my rejection of it. The computer scientists have trouble believing that a physicist would reject realism! But we all agreed that models have to be compared with one another in the space of the data. If you think about it, that point alone is worrying for realism.
Fergus, Schölkopf (MPI-IS, visiting us for three months, now counting as a Camp Hogg regular), Brewer, Fadely, Foreman-Mackey, and I went to lunch together. What a pleasure! Long (unfortunately expensive) lunches are an important part of how we get things done here at Camp Hogg. We discussed further the Kuiper Belt problem, and many other things. One beautiful idea that came up is that outer Solar System objects always have (nearly) zero proper motion, whereas extra-Solar Galactic sources always have proper motions that are as large as (or larger than) the parallax amplitude. Late in the day we discussed (over drinks I am afraid) with Brewer possible engineering improvements to nested sampling and the future of complex, high performance samplers that adaptively take advantage of everything that is now known about the huge class of sampling problems. It was a great day.
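A back-of-envelope check of that beautiful idea (illustrative velocities, not measurements): a star's annual proper motion is mu ("/yr) = v_t / (4.74 d) for transverse velocity v_t in km/s and distance d in pc, while its parallax amplitude is 1/d arcsec. The ratio is v_t / 4.74, independent of distance, so any Galactic star moving faster than about 5 km/s transverse has proper motion exceeding its parallax; a Kuiper Belt object, by contrast, has enormous apparent parallactic motion and nearly zero proper motion.

```python
def proper_motion_arcsec_per_yr(v_t_kms, d_pc):
    """Annual proper motion in arcsec/yr: mu = v_t / (4.74 d)."""
    return v_t_kms / (4.74 * d_pc)

def parallax_arcsec(d_pc):
    """Parallax amplitude in arcsec: 1/d for d in pc."""
    return 1.0 / d_pc

d = 100.0    # pc: a nearby star (illustrative)
v_t = 30.0   # km/s: a typical disk-star transverse velocity (illustrative)
mu = proper_motion_arcsec_per_yr(v_t, d)
plx = parallax_arcsec(d)
ratio = mu / plx   # = v_t / 4.74, independent of distance
```

So the motion pattern alone separates the two populations, which is what makes the moving-source catalog problem tractable.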
Last year, Brendon Brewer (Auckland) visited us and solved a huge problem in astronomy: How do you get the number counts (or flux distribution) of sources too faint to detect in your image? You might think you can't, but you can if they contribute significantly to the pixel noise statistics. Brewer's solution is the full Bayesian solution: Sample all possible catalogs! Today (because he is in town), I pitched to him the same problem but for moving sources, in particular sources moving like Kuiper Belt objects move (in an apparent sense, from the Earth). If we can generalize what we have done to a mixture of stars and Kuiper Belt objects, we might be able to determine things about the size and semi-major axis distribution for objects we can't detect significantly even in the full stack of imaging.
Mike Cushing (Toledo) and I discussed some pet peeves (of his, and mine) about data analysis and statistics in astrophysics. We both agreed that things would be much better if the standard data analysis or modeling paper started out by saying what the Right Thing To Do (tm) is (it pretty-much always involves probabilistic modeling), and then how what is actually being done is an approximation to the RTTD(tm). That would keep us clamped to justifiable methods, guide approximation, but not put undue methodological burden on already burdened, pragmatic astrophysicists. We also talked about a few specific data-analysis challenges.
Michelle Deady and Lawrence Anderson-Huang (both Toledo) and I discussed the inversion of (or, better, the solution of equations involving) very large matrices that are sparse. I pointed them to ARPACK, which is what scipy.sparse.linalg wraps for large eigenvalue problems (its sparse direct solves go through SuperLU). Sparse-matrix methods have made a huge impact on my group; along with MCMC methods, they are among the most valuable potential contributions to astrophysics from applied math, but they are not yet well known. If you have a huge-ass matrix and it is sparse (many elements are zero or near zero in the relevant senses), then you probably aren't working with it efficiently if you aren't using sparse methods!
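To make the point concrete, here is a minimal sketch (a toy 1-D Laplacian, not their actual problem): solve A x = b and get the smallest eigenvalues without ever forming a dense matrix, let alone an explicit inverse.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Build a sparse tridiagonal (1-D Laplacian) matrix: storing only the
# nonzero diagonals, not the full n-by-n array.
n = 2000
main = 2.0 * np.ones(n)
off = -1.0 * np.ones(n - 1)
A = sp.diags([off, main, off], offsets=[-1, 0, 1], format="csc")

b = np.ones(n)
x = spla.spsolve(A, b)   # sparse direct solve (SuperLU under the hood)

# Smallest eigenvalues via ARPACK in shift-invert mode:
vals = spla.eigsh(A, k=3, sigma=0, return_eigenvectors=False)
```

At this size the dense equivalent would already be storing four million numbers to represent about six thousand; the gap only widens from there.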
I spent part of the day preparing (and an hour giving) a public lecture on dark matter at the University of Toledo. I pitched it at undergrads, and got in some philosophical comments about how we know things. After the talk I had a great discussion with a bunch of physics majors at Toledo, who seemed extremely engaged with research. I also chatted with the locals about Herschel data, star formation, and galaxy evolution.
I spent the day attending a site visit by the Sloan and Moore foundations relating to NYU's efforts in data science. Of course no-one yet knows what data science is; that was part of the whole conversation. The day included a lot of discussions about university structures to support interdisciplinary research, cultural changes required in the scientific community to properly support and promote people working on methods, and the context and value of new educational programs. I learned a huge amount, and enjoyed it. Whatever happens with these foundations, NYU is committed to hiring nine new people in data science over the next five-ish years. Watch this space for pointers.
Ian Dobbs-Dixon (UW) gave a very nice informal talk at the blackboard today about planet migration and planetary atmospheres, prompted by questions out of the blue from various CCPP members (including myself). He placed exoplanet science in the realm of applied physics, which is a reasonable place, although I have been thinking that the subject also has some possible connections to very fundamental physics questions: In some sense it could eventually lead to understandings about the origin of life, which remains one of the great gaps in human knowledge. Dobbs-Dixon described asymmetries in hot Jupiters that could be created by the anisotropic insolation plus rotation; in principle the secondary eclipses measured by Spitzer (or something next-generation) might get asymmetric, especially at wavelengths where there is strong absorption by atmospheric elements. It's a small effect, but exoplanet science is the science of small effects; indeed that's why I am getting interested.
Today we prepared for a site visit by the Sloan and Moore Foundations, who are doing research related to data science and ways in which they could support it in the universities. People from all over the university—many of whom I know well because of our overlapping interests in extracting science from data—got together to figure out what things we want to discuss with the visiting team. One thing I was reminded of in the discussions is that MCMC is a truly cross-cutting tool for data science; it finds a use in every discipline. That makes me even more excited about our various MCMC ideas. Late in the day I worked on my Sloan Atlas of Galaxies. The Sloan Foundation has been pretty important in my scientific life!