George Lewis (NYU) gave an absolutely wonderful defense of his thesis today, on top-quark physics with ATLAS at the LHC. He ruled out a range of new physics in the top mass range, and measured the top pair-production cross section more accurately than it can even be predicted in the standard model. A very nice talk and a very well-deserved PhD.
In a low-research day, Tom Barclay (Ames) gave a very nice talk about exoplanets. He made many interesting and novel points. The first was that big planets are still very interesting, because their large impact on the system means that many things can be measured precisely. In particular, he showed examples where you can measure the Doppler beaming of the stellar light resulting from the reflex velocity of the star induced by the planet! Another point was that it is possible to find very tiny planets; he showed some of the smallest planets discovered with Kepler; several are much smaller than Earth. He is personally responsible for the smallest found so far. Another point was that there are a few planets now that are debatably and reasonably "habitable". The striking thing is that no Earth-sized planets have yet been found in year-ish orbits. All known planets are either on shorter orbits or else larger. Time to fix that!
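The beaming signal is tiny; as a rough plausibility check, the relative photometric amplitude is of order 4K/c for a stellar reflex velocity K, ignoring bandpass-dependent corrections. A minimal sketch, with illustrative numbers not taken from the talk:

```python
# Sketch: order-of-magnitude Doppler-beaming amplitude.
# The 200 m/s reflex velocity below is an illustrative hot-Jupiter-ish
# value, not a number from Barclay's talk.

C = 2.998e8  # speed of light, m/s

def beaming_amplitude(k_ms):
    """Approximate relative flux amplitude ~ 4 * K / c for reflex
    velocity K in m/s (bolometric; bandpass corrections ignored)."""
    return 4.0 * k_ms / C

amp = beaming_amplitude(200.0)  # a few parts per million
```

Even for a massive, short-period planet the effect is parts-per-million, which is why it takes Kepler-quality photometry to see it.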
Foreman-Mackey and I finished our NASA ADAP proposal. In the afternoon, we hatched a plan with Barclay (Ames) to search the Kepler photometry for very long-period planets, because the Kepler-team searches are weakest there.
Tom Barclay (Ames), Foreman-Mackey, and I made our plan for hacking this week: We are going to take a multi-tranet (a "tranet" is a transiting planet) system from the Kepler data and infer the host-star limb-darkening profile. Multi-tranet is important, because for any individual tranet (especially at low signal-to-noise), the limb-darkening has substantial covariance with the transit geometry. Indeed, our goal for the week is to find out how the constraints on limb-darkening improve as the number of tranets in the system increases; I predict that they will improve faster than the total signal-to-noise in the transits, because the different tranets will place (at least slightly) different constraints on the appearance of the star. But we will see. By the end of the day, some progress was evident, although not through any useful action of mine!
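For concreteness, the limb-darkening profile one usually fits for in a transit model is something like the standard quadratic law; a minimal sketch (the parameter values below are illustrative, not results):

```python
import numpy as np

# Sketch: the quadratic limb-darkening law, the usual two-parameter
# profile one would infer from transit shapes. Illustrative only.

def quadratic_ld(mu, u1, u2):
    """I(mu)/I(1) = 1 - u1*(1 - mu) - u2*(1 - mu)**2,
    where mu is the cosine of the angle from disk center."""
    return 1.0 - u1 * (1.0 - mu) - u2 * (1.0 - mu) ** 2

mu = np.linspace(0.0, 1.0, 5)
profile = quadratic_ld(mu, 0.4, 0.2)  # darker toward the limb (mu -> 0)
```

The covariance problem is visible here already: a transit at different impact parameter samples a different range of mu, which is why multiple tranets crossing the same star at different geometries should pin down (u1, u2) better than one tranet of the same total signal-to-noise.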
Today was Kepler Day at Camp Hogg, kicking off Kepler Week. Tom Barclay (Ames) came into town for a week to help Foreman-Mackey and me understand the Kepler data in more detail. We spent a lot of the day discussing the various physical effects contributing to the instrument-induced variations we see in the Kepler lightcurves. There are some crazy things, including stellar-aberration variations, temperature and point-spread-function variations, CCD electronics cross-talk, cosmic rays and bad cosmic-ray removal, and thruster firings. For many of these things we might be able to build a model or help with modeling. The goal for tomorrow is to decide on week-scale goals and execute. At lunch-time, Foreman-Mackey gave a very nice blackboard talk on Kepler systematics and population modeling, which was pretty relevant to everything we did today.
Foreman-Mackey and I are putting in a NASA proposal to support our Kepler work. I spent a lot of today hacking on it. It has got me excited about what we are doing, which is the best thing a funding proposal can do.
I spent the day at Radcliffe, in a small meeting arranged by Alyssa Goodman (Harvard) and Xiao-Li Meng (Harvard) on how to curate and keep data for analysis and re-analysis. Most of the discussion in this (free-form, informal, small) workshop was around the idea of meta-analysis and re-use of the data by other users. Some of the interesting ideas that came up were the following: Different people coming from different backgrounds have very different meanings for the word "model" and also many other words, including "data" and "provenance". The goals of data preservation, meta-analysis, re-use, and scientific reproducibility are all related and overlapping. Archivists and curators do best when they get involved with the data as early as possible in the "life cycle", preferably right at the original taking of the data. The concerns that arise with reproducibility and the concerns that arise with privacy (think: health data and the like) are strongly at odds.
Meta-analysis can be described in terms of hierarchical modeling (duh) and we should probably think about it that way. Meng showed some nice results on the idea of sufficient statistics in hierarchical models; specifically, he is thinking about statistics that are sufficient for sub-branches of the full model: When are they also sufficient statistics for the whole model? The range of expertise in the room—from statistics to particle physics to library science—made for a lively conversation, and many (small) disagreements. The goal for tomorrow is to write a document summarizing various things learned.
Schölkopf and I worked over coffee to come up with a method for the Lang–Schölkopf idea of combining Web-scraped images using pixel rank information. The idea is that human-viewable images can be very strangely transformed, but if they have been transformed in a way that doesn't re-order pixel brightnesses (at least locally), there ought to be ways to combine them. We came up with several simple methods. It was an interesting conversation, because I like to think about problems as having a causal, generative, probabilistic model underlying them and justify all procedures as being approximations to the Right Thing To Do (tm) within that model framework. Schölkopf likes to think about fast, tractable, scalable procedures with good properties, and only then see if there is an understanding of that procedure in terms of inference. Fortunately, I think we have it all; more soon after we try it out. My job (as usual) is to start the document. While we were talking, Lang was scraping the Web and calibrating images with Astrometry.net.
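The core of the rank idea can be sketched in a few lines: replace each registered image's pixel values by their brightness ranks, which are invariant under any monotonic transformation (an unknown JPEG tone curve, say), and then combine the rank maps. A toy sketch under strong assumptions (perfectly registered images, no bad data), not our actual procedure:

```python
import numpy as np

# Sketch: combine registered images of the same scene via pixel-brightness
# ranks instead of co-adding raw values. Ranks are unchanged by any
# monotonic (order-preserving) transform of an image, e.g. an unknown
# nonlinear tone curve. Illustrative toy code only.

np.random.seed(0)

def to_ranks(image):
    """Replace each pixel by its brightness rank, scaled to [0, 1)."""
    flat = image.ravel()
    ranks = np.empty(flat.size, dtype=float)
    ranks[np.argsort(flat, kind="stable")] = np.arange(flat.size)
    return (ranks / flat.size).reshape(image.shape)

def rank_combine(images):
    """Average the rank maps of registered images of the same scene."""
    return np.mean([to_ranks(im) for im in images], axis=0)

# Two "images" of the same scene under different monotonic tone curves:
truth = np.random.rand(8, 8)
combined = rank_combine([truth, np.sqrt(truth)])
```

Because the square root preserves pixel ordering, both inputs have identical rank maps, and the combination is exactly the rank map of the underlying scene; a co-add of the raw values would not have that property.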
Lang and Schölkopf blew me away today by suggesting that we combine heterogeneous images not by co-adding them but rather by inferring a consistent brightness ranking for the pixels. There are lots of real-world issues (think registration and pixellization and bad data), but there are also lots of reasons that a brightness-ranking analysis might be far more robust than a co-adding procedure for finding very faint structure. We set the scope of a fast project to kick this off and Lang started on the dirty work, which involves scraping the web for images (recall our Comet Holmes project?) and running everything through Astrometry.net.
At Computer-Vision-meets-Astronomy group meeting this morning, several extremely good ideas were hatched. One idea, due originally (in part) to Lang, is to build a model of heterogeneous JPEG images of the sky grabbed from the Web, using not true brightness on a linear or magnitude scale but only brightness ranking. This would get us much of the information we seek about the sky without placing nearly such stringent requirements on our PSF and photometric calibration.
Another idea, hatched by Schölkopf after an amazing image-recognition demonstration by Fergus, was to start a company (non-profit) that provides a browser plug-in or skin that delivers image-labeling content to a public database, rather than letting it just get sucked into the black hole that is Google Corporation. The idea is that whenever you do a Google image search, Google learns image labels by looking at which images you subsequently click, or so we hypothesize; that's valuable content, since all high-performing image recognition systems (except Astrometry.net) are data-driven. (A related idea was to start a class-action lawsuit to get everyone's image-labeling data back from Google!)
In a very low-research day, I had a short conversation with Mike Kesden (NYU) about how to distinguish models of black-hole–black-hole binary formation using an Advanced LIGO or eLISA data set. I gave the usual gospel of hierarchical probabilistic modeling with your causal knowledge baked in. The non-research part of the day was spent handicapping the Kentucky Derby.
[Note added later: The Kentucky Derby thing worked out well.]
I spent an hour or so on the phone with Lang discussing his ambitious project to build a model of all the WISE imaging using as a very strong prior the SDSS imaging catalog. The results are beautiful, and will help enormously in SDSS-IV eBOSS targeting. We talked about various issues with doing enormous least-squares fits like these: The idea is to believe exactly everything we know about the SDSS catalog, the SDSS PSF, and the WISE PSF, and then just find the set of amplitudes (brightnesses), one per catalog entry, that best explains all the WISE pixels. This method makes use of all the WISE epochs but without co-adding them. It also deals as correctly as possible with overlapping sources, since the fit to all amplitudes is simultaneous. It is very beautiful and is—under brutal assumptions—the Right Thing To Do (tm).
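The structure of such a fit is plain linear least squares: with the catalog positions and the PSF taken as exactly correct, each catalog entry contributes one column to a design matrix, and the per-source amplitudes are solved for simultaneously over all pixels of all epochs. A one-dimensional toy sketch (all names and numbers are illustrative, not Lang's code):

```python
import numpy as np

# Sketch: simultaneous linear fit of per-source amplitudes to pixel data,
# with source positions and PSF held fixed. Toy 1-d version on a single
# "epoch"; in the real problem the rows would stack pixels from all
# WISE epochs. Illustrative only.

np.random.seed(42)

def psf(x, x0, sigma=1.5):
    """A fixed, assumed-known Gaussian PSF profile."""
    return np.exp(-0.5 * ((x - x0) / sigma) ** 2)

x = np.arange(32, dtype=float)           # pixel grid
positions = [8.0, 12.0, 25.0]            # fixed "catalog" positions
A = np.stack([psf(x, p) for p in positions], axis=1)  # design matrix

true_fluxes = np.array([100.0, 40.0, 10.0])
data = A @ true_fluxes + 0.01 * np.random.randn(x.size)

fluxes, *_ = np.linalg.lstsq(A, data, rcond=None)
# Overlapping sources (positions 8 and 12) are handled correctly
# because all amplitudes are fit at once, not one source at a time.
```

The "brutal assumptions" are visible in the sketch: the positions and PSF enter the design matrix with zero uncertainty, and only the amplitudes are free.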
Schiminovich and I got to our undisclosed location and then decided to write. We each wrote various things, but I mainly worked on my upcoming proposal to NASA to support my Kepler projects with Foreman-Mackey.
Foreman-Mackey and I looked at the variability of Kepler sources, trying to understand the variability introduced by the instrument, the detector, or the full-system sensitivity. There are a lot of effects, and they are oddly repeatable from season to season and from star to star, but with massive exceptions. So we don't really understand it. We briefly got to the point that we thought the variations might be additive, but by the end of the day we were feeling like the dominant effects are multiplicative. We got some very nice and useful feedback about it all from Tom Barclay (Ames) via Twitter (tm) and from Eric Ford (Florida) via email. There are many effects, including stellar aberration, stellar proper motions relative to the detector, both for target stars and for nearby stars, photometric mask issues, and spacecraft thermal and reorientation issues. We need to learn a lot!
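The additive-versus-multiplicative distinction matters for how you would correct the data: a trend that scales the stellar flux should be divided out, while one that adds to it should be subtracted. A toy illustration with entirely made-up signals:

```python
import numpy as np

# Sketch: why additive vs multiplicative matters for correction.
# The trend and stellar signal below are invented for illustration,
# not real Kepler systematics.

n = np.arange(200, dtype=float)
trend = 1.0 + 0.05 * np.sin(2 * np.pi * n / 97.0)   # common-mode systematic
star = 1.0 + 0.002 * np.sin(2 * np.pi * n / 13.0)   # true stellar signal

multiplicative = star * trend       # trend scales the flux
additive = star + (trend - 1.0)     # trend adds to the flux

# The correct correction for a multiplicative effect is division:
recovered = multiplicative / trend
```

Subtracting the trend from a multiplicatively corrupted lightcurve leaves a residual proportional to the stellar signal times the trend, which is one way to test empirically which model the Kepler systematics prefer.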
At Astronomy meets Computer Vision meeting this morning, Fadely, Foreman-Mackey, and I spent a long time discussing with Schölkopf how we might try out some causal inference on astrophysics data sets. We didn't figure out the killer app, but we re-discussed some old things I have done in galaxy evolution, where there are lots of correlations but little is known about causality.