Tri-State Astronomy

I spent the research part of my day at the Tri-State Astronomy meeting, beautifully organized by Geha (Yale) and Maller (CUNY) and others. I won't do it justice; there were five great review talks and lots of great posters (and one-minute poster summaries). Alyson Brooks (Rutgers), in her review talk, beautifully summarized the issues with CDM at small scales, and discussed the idea that baryonic processes might explain all of these. My take on her bottom line (that baryons might explain everything) is that it would be a disappointment if we can't see the predicted properties of the dark matter at small scales! But she did point towards isolated dwarf galaxies (that is, very low luminosity galaxies that are not satellites of any more luminous galaxy) as critical tests of the ideas; isolated dwarfs should differ from satellite dwarfs if CDM is going to stay okay at small scales. She also implicitly made many predictions about outflows from small galaxies at intermediate redshifts. And she gave a few shout-outs to Zolotov, who has been deeply involved in these baryon issues since leaving NYU.


inferring spectra, optimizing search

Vy Tran (TAMU) showed up and we discussed inference of spectral energy distributions of high-redshift galaxies given only photometry (no spectroscopy). She showed us some nice results, but Fadely and I think we could actually infer spectral properties at wavelength resolution higher than that implied by the broad-band (or medium-band) photometry. So we more-or-less launched a collaboration.

On the way to lunch, Fadely, Foreman-Mackey, and I had an epiphany: Foreman-Mackey has been trying to set the hyperparameters of a Gaussian Process to optimize our ability to search for and find exoplanets. This has been befuddling because it is so frequentist (or not even frequentist you might say). The issue is that what you want to optimize (hyperparameter-wise) depends on the details of how you are going to decide what is an exoplanet (that is, on your threshold choices). We realized on the way to lunch that we probably should be choosing the hyperparameters to maximize the area under the ROC curve. Foreman-Mackey launched on this in the afternoon. Cool if it works; novel definitely.
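A toy version of the ROC idea can be sketched in a few lines. Everything here is an assumption made for illustration and is not the CampHogg code: the squared-exponential kernel, the box-shaped transit, the matched-filter detection statistic, and the small grid of candidate timescales `taus`. Light curves with and without an injected transit are scored under each candidate hyperparameter, and the candidate with the largest area under the ROC curve is kept.

```python
import numpy as np

def sq_exp_cov(t, amp, tau, white):
    """Squared-exponential GP covariance plus white noise on the diagonal."""
    dt = t[:, None] - t[None, :]
    return amp**2 * np.exp(-0.5 * (dt / tau) ** 2) + white**2 * np.eye(len(t))

def auc(scores_pos, scores_neg):
    """Area under the ROC curve: the probability that a random positive
    outscores a random negative (ties count one half)."""
    diff = scores_pos[:, None] - scores_neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def matched_filter_scores(curves, template, cov):
    """Detection statistic m^T C^-1 y / sqrt(m^T C^-1 m) for each light curve."""
    cinv_m = np.linalg.solve(cov, template)
    return curves @ cinv_m / np.sqrt(template @ cinv_m)

rng = np.random.default_rng(42)
t = np.linspace(0.0, 10.0, 200)
template = -1.0 * ((t > 4.8) & (t < 5.2))  # unit-depth box transit (toy shape)

# Simulate stellar variability with a "true" timescale of 1.0 (an assumption).
chol = np.linalg.cholesky(sq_exp_cov(t, amp=1.0, tau=1.0, white=0.3))
n_curves = 300
nulls = (chol @ rng.standard_normal((len(t), n_curves))).T
injected = (chol @ rng.standard_normal((len(t), n_curves))).T + 0.5 * template

# Score both sets under each candidate hyperparameter and compare AUCs.
taus = [0.1, 0.3, 1.0, 3.0]
aucs = {}
for tau in taus:
    cov = sq_exp_cov(t, amp=1.0, tau=tau, white=0.3)
    aucs[tau] = auc(matched_filter_scores(injected, template, cov),
                    matched_filter_scores(nulls, template, cov))
best_tau = max(aucs, key=aucs.get)
print(aucs, best_tau)
```

The attraction of the AUC here is that it integrates over all possible detection thresholds, so the hyperparameter choice does not depend on any single threshold decision.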


search and hyperparameters

In a last-minute clutch play on a low-research day, Foreman-Mackey showed me that our ability to detect exoplanet transits in the lightcurves of stochastically varying stars depends in a non-trivial way on the hyperparameters of the Gaussian Process we use to model the unocculted star. We are hoping we can tune to the most-sensitive-for-search hyperparameters before searching, and only switch back to being agnostic about the hyperparameters at the post-search characterization stage.


marginalized likelihood, goose egg

Having written some stuff that I wasn't happy with a few months ago, thought about it, forgot about it, remembered it, mentioned it on the blog here and there, and then dusted it off today, I got close to being ready to make an argument about when you should and shouldn't compute the marginalized likelihood, even if you are a committed probabilistic reasoner. The fundamental idea is that you shouldn't do model selection based on marginalized likelihoods; these are very challenging integrals but only approximations (and often bad ones) to the integrals you would really want to do to inform a model choice. I guess one way to put it is: Don't spend a huge amount of computer time computing something that is a worse approximation to what you want than something else that might be much easier to compute! I sent my argument for review to Brewer, my guru in all things Bayes. I want to say things that are useful and uncontroversial, so I need to be careful, because on these matters I tend to be like a bull in a china shop.

Late in the day I talked to a NYU Data Science class about possible research projects they could do with the Kepler data. As I was talking about how we search for exoplanets in the data (and how likely it is that the data contain many undiscovered planets), one of the faculty in charge of the class (Mike O'Neil) asked me how many exoplanets we (meaning CampHogg) have found in the data so far. I had to admit that the answer is zero. That's just embarrassing. I need to ride the team hard tomorrow.


exoplanets, exoplanets, exoplanets

I gave the CCPP Brown Bag talk today, about Kepler data and exoplanets. I was going to talk about calibration, flat-field, and our white paper, but I ended up talking about very flexible models, for intrinsic stellar variability and spacecraft-induced variability in the lightcurves. People were shocked that some of our models have hundreds of thousands of parameters. I didn't let them in on the secret that, in some sense, the Gaussian Processes we use have infinite numbers of parameters!

Tim Morton (Caltech) dropped by to talk about various things exoplanet. He has a very nice system for computing and propagating probabilities for various exoplanet and non-exoplanet (false-positive) scenarios, given Kepler data. He produces most of what you need in order to do efficient follow-up and population studies, given uncertain or noisy exoplanet identifications. In other news, today was Angus's last day here at NYU. She has been visiting for a month from Oxford, and started some projects with us on search, using Gaussian Processes under the hood. They told us it couldn't be done (too slow) but we are doing it.


astronomy engineering at TAMU

I talked to Lucas Macri (TAMU) and Lifan Wang (TAMU) about optical observations being made in Antarctica at Dome A. This is a Chinese site that is horrifying to get to and use, but which has amazing observational properties, like excellent seeing, transparency, and sky spectrum. They have taken lots of valuable data with amazing 24/7 time coverage and short cadence. They even have some potential exoplanet discoveries. Wang has lots of ideas for next-generation projects, including some with a hefty component of high-grade robotics.

In the morning, Darren DePoy (TAMU) and Jennifer Marshall (TAMU) showed me the hardware being built for the HETDEX experiment. This involves 150 identical (or near-identical) fiber-fed spectrographs, plus a robot positioner. We spent a lot of time talking about mechanical engineering, because if you want to mass assemble a whole lot of instrumentation, you need it to be simultaneously easy to adjust mechanically, and unnecessary to adjust. This may be the future of astrophysics.


Texas A&M

I spent today at Texas A&M University, where I spoke to the Physics Department. I took a risk and talked entirely about modeling astrophysics data, including even inferring the Kepler flat-field! Nick Suntzeff (TAMU) introduced me with a discussion of astro-statistics and its importance in the future of astrophysics, which made me feel a bit better about choosing such a technical topic. I particularly emphasized that making measurements in astrophysics problems—where we can't do controlled experiments—usually requires building a hybrid model that includes both data-driven components (for the parts of the problem that are complicated but we don't particularly need to understand), and causal-physical components (for the parts where we hope to gain some understanding). My examples were XDQSO, Kepler, Comet Holmes and the kitten, and the Solar System force law. On the first example, all I really said is that a whole lot of bad data can be as good as a small amount of good data, when you have a good noise model. On the last point, all I really said was that we have no idea how to scale up for Gaia.
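The bad-data-versus-good-data point can be made concrete with inverse-variance weighting. This is a minimal sketch assuming independent Gaussian noise with correctly known per-point uncertainties; the numbers (4 good points, 400 bad ones) are made up for illustration.

```python
import numpy as np

def combined_sigma(sigmas):
    """Uncertainty on the inverse-variance-weighted mean of independent
    measurements with known Gaussian uncertainties `sigmas`."""
    sigmas = np.asarray(sigmas, dtype=float)
    return 1.0 / np.sqrt(np.sum(1.0 / sigmas**2))

# A few good measurements versus many measurements that are ten times noisier.
good = combined_sigma(np.full(4, 1.0))     # 4 points with sigma = 1
bad = combined_sigma(np.full(400, 10.0))   # 400 points with sigma = 10
print(good, bad)  # both are 0.5
```

The equivalence only holds if the noise model is right: if the bad data have unmodeled outliers or correlated errors, the weights are wrong and the combined estimate is worse than this calculation suggests.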


dotastro, day 3

The third and last day of dotastronomy 5 started with reports of the outcome of the Hack Day. Various extremely impressive hacks happened, way too many to mention, but including a very impressive video about planet naming, by Deacon and Angus and others, an automated astronomer-career mapping app by Foreman-Mackey and others, an Xbox-Kinect Doppler-shift app by Lynn that got everyone in the room dancing and spinning more than once, and (near and dear to my heart) improved functionality for the Zoonibot by Barentsen and Simmons and others. That latter hack is an extension of the bot that got started by Beaumont and Price-Whelan (at, I am proud to say, my suggestion) at dotastronomy 4.

Among the talks, one of the highlights for me was Trouille (Adler) talking about the Galaxy Zoo Quench project, in which Zooites are taking the project from soup to nuts, including writing the paper. She spent some time in her talk on the problem of getting the participants to boldly play with the data as professional scientists might. It is a rich and deep piece of public outreach; it takes self-selected people through the full scientific process. Another highlight was Microsoft's Tony Hey talking about open access, open data, open science, libraries, and the fourth paradigm. Very inspiring stuff.

Related to that, there was great unconference action in a session on open or low-page-charge publishing models, led by Lynn (Adler) and Lintott (Oxford), in which Simpson (Oxford; and our fearless dotastronomy leader) got emotional (in all the right ways) about how crazy it is that the professional societies and individual scientists have signed away their right to their own work that they researched, wrote, reviewed, and edited for the literature. Testify!

I ran a short unconference session on combining noisy information coming from Zoo participants (or equivalent) in citizen-science and crowd-sourcing situations. We had a good discussion of many issues, including the graphical model that represents our assumptions about what is going on in the projects, active learning and adaptive methods, and exposing the internal data in real time so that external (third-party) systems can participate in the adaptive decision-making. I also advocated for boosting-like methods, based on the idea that there might be classifiers (people) with non-trivial and covariant residual (error) properties.
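As a baseline for that discussion, here is a minimal sketch of the simplest possible combination rule: a naive-Bayes posterior for a yes/no classification. It assumes each participant has known true-positive and false-positive rates and that participants err independently. That independence is exactly the assumption the covariant-error point calls into question, so treat this as a starting point only. The function name and all rates are hypothetical.

```python
import numpy as np

def posterior_positive(votes, true_pos_rate, false_pos_rate, prior=0.5):
    """Posterior probability that an object is truly 'positive', given binary
    votes from classifiers with known, independent error rates."""
    votes = np.asarray(votes, dtype=bool)
    tpr = np.asarray(true_pos_rate, dtype=float)
    fpr = np.asarray(false_pos_rate, dtype=float)
    # Log-likelihood of the observed votes under each hypothesis.
    log_like_pos = np.sum(np.where(votes, np.log(tpr), np.log1p(-tpr)))
    log_like_neg = np.sum(np.where(votes, np.log(fpr), np.log1p(-fpr)))
    log_odds = np.log(prior) - np.log1p(-prior) + log_like_pos - log_like_neg
    return 1.0 / (1.0 + np.exp(-log_odds))

# Two reliable participants say yes; one near-random participant says no.
p = posterior_positive(votes=[True, True, False],
                       true_pos_rate=[0.9, 0.8, 0.55],
                       false_pos_rate=[0.1, 0.2, 0.45])
print(p)
```

In this scheme an unreliable participant is automatically down-weighted rather than discarded, because their vote shifts the log-odds by very little.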

It has been a great meeting; Rob Simpson (Oxford) and Gus Muench (Harvard) deserve huge thanks for organizing and running it.


taking over the world

For reasons outside my control, I had to play hookey on the second day of dotastro. That is very sad, because it is the Hack Day, and the whole raison d'etre of the meeting, as far as I am concerned. The only (research) compensation for this was a fast meeting with Kathryn Johnston (Columbia) and Robyn Sanderson in which we planned various ways to take over the world. The general idea is that we might be able to build a team, connected by commuter rail, that covers the observational, theoretical, and data analytical sides of measuring the Milky Way potential with cold phase-space structures.


dotastro, day 1

Today was the first day of dotastronomy, the meeting for astronomy and web and outreach and so on, this time in Cambridge, MA. Stand-out talks included those by Stuart Lynn (Adler) on the Zooniverse and Elisabeth Newton (Harvard) about astronomy blogging in general (she mentioned this blog) and Astrobites in particular. Astrobites has been an incredible resource for astronomy, and it is carefully cultivated, edited, and managed. What a project!

In the afternoon we switched to unconference, some of which I skipped to attend a phonecon about Kepler data with the exoSAMSI crew, organized by Bekki Dawson (Harvard), who is effectively our leader. On that call, we discussed what everyone has been doing since exoSAMSI, which is quite a bit. Barclay (Ames) has been working on inferring the limb-darkening laws using transits as measuring tools. Quarles (Texas) has been searching the real-stars-with-injected-planets that we (read: Foreman-Mackey) made back at exoSAMSI, with some success. Foreman-Mackey and Angus have been searching for long-period systems with a fast Gaussian Process inside the search loop. We also spent some time talking about modeling the pixel-level data, since we at CampHogg have become evangelists about this. The SAMSI program, organized mainly by Eric Ford (PSU) has been incredibly productive and is effectively the basis for a lot of my research these days.

In my dotastro talk this morning, I mentioned the point that in "citizen science" you have to model the behavior of your citizens, and then generalized to "scientist science": If you are using data or results over which you have almost no control, you probably have to build a model of the behavior and interests and decision-making of the human actors involved in the data-generating process. In the afternoon, Lintott (Oxford) suggested that we find a simple example of this and write a short paper about it, maybe in an area where it is obviously true that your model of the scientists impacts your conclusions. That's a good idea; suggestions about how to do this from my loyal reader (you know who you are) are welcome.



In a low-research day, I spoke to the new graduate students about research opportunities among the astrophysicists in the Department. I ended up making a pitch for discovery and for origins: We work in a field in which discoveries are daily occurrences. And we have some hope of understanding our origins, of our Universe, of our Galaxy, of our Solar System, and maybe even of our tree of life. I love my job!



The highlight of a low-research day was a visit from Roger Blandford (KIPAC), who gave the Physics Colloquium on particle acceleration, especially as regards ultra high-energy particles. He pointed out that the (cosmic) accelerators are almost the opposite of thermal systems: They put all the energy very efficiently into the most energetic particles, with a steep power-law distribution. He made the argument that the highest energy particles are probably accelerated by shocks in the intergalactic media of the largest galaxy clusters and groups. This model makes predictions, one of which is that the cosmic rays pretty-much must be iron nuclei. In conversations over coffee and dinner we touched on many other subjects, including gravitational lensing and (separately) stellar spectroscopy.


WFC3, exoplanet searching

At Computer-vision-meets-astronomy today, Fadely showed us all some example HST WFC3 images, some models of the PSF, and some comparisons between model and observed stars. I had never put two-and-two together, but the PHAT project (on the periphery of which I lie) has taken some absolutely awesome WFC3 images for the purposes of array calibration: The PHAT images (being of M31) are absolutely teeming with stars. Indeed, it is impressive that the PHAT team can photometer them at all. We discussed strategies for flat-field determination given that we have a good but not perfect PSF model and a lot of heterogeneous data.

After that but before lunch, we more-or-less decided that while Foreman-Mackey works on a Kepler light-curve likelihood function paper, Angus (Oxford) should start work on a Kepler light-curve exoplanet search paper, making use of the same machinery. This is a great division of labor (I hope) and might eventually bring us close to the goal of everything we have been doing with Kepler, to wit, finding Earth-like planets on year-ish orbits around Sun-like stars. Pleased.


the sub-pixel flat field

I took a risk up at Columbia's Pizza Lunch forum by talking about the Kepler flat-field. I also was exceedingly rude and talked through Price-Whelan's spot (he was supposed to follow me). I apologize! Well, you can't say I didn't try to bore the pants off of everyone: I talked about the (novel, and exciting to almost no-one other than me) result, published in our white paper, that it is possible to infer the properties of the flat field at higher than pixel resolution.

That is, the team (meaning, in this case, Lang) made simulated data with drifting stars, a PSF that varies slowly with position (and is well understood), and no prior knowledge about how the stars in the field are drifting. We find (meaning Lang finds) that he can simultaneously figure out the pointing of the satellite and the flat-field, even when the flat-field is both created and fit with models that have multiple sub-pixels per pixel. The reason it works is that as the star moves, it illuminates each pixel differently, and is therefore differently sensitive to the different parts of each pixel. It is not clear yet whether we can do this accurately enough to recover the Kepler sub-pixel flat-field, but damn I want to try. Unfortunately, we need lots of data taken in the two-wheel mode, and (as far as I know) they aren't yet taking any new data. Kepler: Please?
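The reason-it-works argument can be checked with a one-dimensional toy. This is a sketch under much stronger assumptions than the white-paper simulations, and it is not Lang's code: the star positions and flux are taken as known (the real problem infers the pointing too), the data are noise-free, the PSF is a fixed Gaussian, and the PSF is evaluated at sub-pixel centers rather than integrated. Under those assumptions each pixel value is linear in the sub-pixel sensitivities, so ordinary least squares recovers them.

```python
import numpy as np

n_pix, n_sub = 6, 2                       # 6 pixels, 2 sub-pixels per pixel
sub_width = 1.0 / n_sub
sub_centers = (np.arange(n_pix * n_sub) + 0.5) * sub_width

rng = np.random.default_rng(0)
true_flat = 1.0 + 0.05 * rng.standard_normal(n_pix * n_sub)  # toy sensitivities

def psf_on_subpixels(star_position, psf_sigma=1.0):
    """Light from a unit-flux star landing on each sub-pixel
    (Gaussian PSF, midpoint approximation)."""
    profile = np.exp(-0.5 * ((sub_centers - star_position) / psf_sigma) ** 2)
    return profile * sub_width / (psf_sigma * np.sqrt(2.0 * np.pi))

# The star drifts across the array; each exposure yields one value per pixel.
star_positions = np.linspace(0.0, n_pix, 40)
rows, data = [], []
for x in star_positions:
    light = psf_on_subpixels(x)
    for p in range(n_pix):
        row = np.zeros(n_pix * n_sub)
        own = slice(p * n_sub, (p + 1) * n_sub)
        row[own] = light[own]             # a pixel only sees its own sub-pixels
        rows.append(row)
        data.append(row @ true_flat)      # observed pixel value (noise-free)

design = np.array(rows)
recovered, *_ = np.linalg.lstsq(design, np.array(data), rcond=None)
print(np.max(np.abs(recovered - true_flat)))
```

With a single exposure the system would be underdetermined (one number per pixel, two unknowns per pixel); it is the drift that supplies the extra, differently weighted equations.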


talking about exoplanets

Discussed MCMC convergence with Jeffrey Gertler (NYU), Bayesian evidence (fully marginalized likelihood) with Hou and Goodman, and data-science projects with Mike O'Neil (NYU). O'Neil is co-teaching a course at NYU for the new Data Science program, where the idea is that Masters students will do research projects on real research topics. Foreman-Mackey and I are happy to provide; we discussed several ideas, most of which involve the Kepler data, which we have on the brain right now. One idea is to find all the single transits and see if you can use them to place limits on (or measure!) the frequency (suitably defined) of Jupiter analogs (suitably defined). That's a great problem to post on my Ideas Blog. Hou is computing the marginalized likelihoods of various qualitatively different explanations of radial velocity data, including stellar oscillation models and multi-planet scenarios. Gertler is preparing to find exoplanets in the Galex photon (time-resolved) data.


KIPAC@10, day 4

The morning session was about dark energy, with Sawicki (Heidelberg) taking the theory side and various observers taking the data side. Highlights for me were the weak lensing talks, with von der Linden (DARK) talking about measuring cluster masses and Bard (KIPAC) talking about cosmic shear. During Bard's talk I came up with three possible short papers about weak lensing methodologies, which Marshall, Bard, Meyers (KIPAC) and various others refined over lunch:

The first paper obviates all the problems with transferring stellar PSFs to galaxies by measuring the PSF using the galaxies. LSST can do this because it takes many images under different PSFs. The second paper uses the configurational separations of features (think peaks) in galaxy images to measure shear, independently of, or in concert with, ellipticity measurements. In principle this might be useful, because point separations depend only on astrometric calibration, not PSF determination. The third is to use image-to-image variations in astrometric distortions to infer image-to-image changes in the PSF. I think these two things have to be related, no? This latter project has probably already been done; it requires a literature search.


KIPAC@10, day 3

Although there were very amusing and useful talks this morning from Bloom (Berkeley), Boutigny (CNRS), Marshall, and Wechsler (KIPAC), the highlight for me was a talk by Stuart Lynn (Adler) about the Zooniverse family of projects. He spent a lot of time talking about the care they take of their users; he not only demonstrated that they are doing great science in their new suite of projects, but also that they are treating their participants very ethically. He also emphasized my main point about the Zoo, which is that the rich communication and interaction on the forums of the site is in many ways what's most interesting about the projects.

In the afternoon, we had the "unconference" session. Marshall and I led a session on weak lensing. We spent the entire afternoon tweaking and re-tweaking and arguing about a single graphical model! It was useful and fun, though maybe a bit less pragmatic than we wanted.


KIPAC@10, day 2

Today was the second day (I missed the first) of the KIPAC@10 meeting at KIPAC at SLAC. There was a whirlwind of talks on compact objects and galaxy evolution, too many to summarize, but some highlights for me were the following:

Steiner (UW) showed neutron-star mass and radius measurements and discussed their implications for the properties of matter at extreme density. He showed some very noisy likelihood functions (ish) in mass–radius space, one per measured neutron star (and there are 8-ish with measurements) and tried to draw a curve through them. I have opinions about how to do that and he seems to be doing it right; each time we tried to discuss this over coffee something interrupted us.

Perna (Colorado) talked about magnetars; I hadn't appreciated how extremely short-lived these stars must be; their lifetimes are measured in kyr, which is not a unit you see every day. Romani (Stanford) made a pitch that Fermi-discovered gamma-ray pulsars are the bee's knees. He didn't show folded light-curves but apparently there are now hundreds where you can see the periodicity in the (sparse) Fermi data. Tomsick (Berkeley) showed some outrageously awesome NuSTAR data, making me want to hear much more about that mission. Its PI is my old friend from graduate school, Fiona Harrison (Caltech), to drop a name.

Cordes (Cornell) talked about pulsar timing and gravitational radiation, a subject on which I have opinions (from a precision measurement perspective). As is common in that business, he concentrated on the stochastic gravitational wave background; I would like to hear or think more about coherent source detection. It is usually easier! Along those lines, at one point Blandford (KIPAC) asked Aarons (Berkeley) if physical models of pulsar emission were likely to help in measurements of pulsar timing. Aarons didn't commit either way, but I think the answer has to be yes. Indeed, I have suggested previously that modeling the emission almost has to improve the measurements.

Stark (Arizona) showed very nice new data on galaxies at extremely high redshifts. He noted that almost every result at redshifts beyond six depends entirely on photometric redshifts. That's true, but is it a concern? I guess it is because there could be interloping lower-redshift objects (or stars) having a big effect on the conclusions. Kriek (Berkeley) and Lu (KIPAC), in separate talks, showed that it is difficult to explain the evolution of galaxies in sizes and stellar populations with simple models of star formation and merging. Also, Kriek called into question the star-formation-rate estimates people have been using, which is interesting; she finds a factor-of-two-ish range in the mistakes that could be being made, and this is the same order of magnitude as the amplitude of the variation in specific star-formation rate with galaxy mass. She didn't claim that there is an error there.

In the discussions at lunch, Stuart Lynn (Adler) pitched an idea from David Harris (NAS) that we start a journal of short contributions. Marshall was all over that; it might get launched tomorrow in the unconference session.



First thing in the morning, I tweaked up our Kepler two-wheel white paper and submitted it. I am very proud, because it contains at least three novel results in astronomical image modeling and calibration. I very much hope I can keep momentum and get my team to publish these results. If you want to read the white paper (see if you can spot the three novel results), it is here.

At a very leisurely lunch, Foreman-Mackey, Fadely, Ruth Angus (Oxford), and I discussed possible projects for this month, in which Angus is visiting CampHogg. We more-or-less settled on long-period exoplanet search, with heavy use of Gaussian Processes.


image modeling for Kepler

I spent the whole long holiday weekend (it is Labor Day here in the US) working on our Kepler white paper, in response to the two-wheel call (PDF). Foreman-Mackey came through with some auto-regressive-like data-driven models for extant Kepler data, Michael Hirsch (UCL, MPI-IS) came through with some PSF-photometry tests for insane (drifted) PSFs, and Lang came through with simulations of toy Kepler data in the two-wheel era, along with models thereof. Lang was able to show that we can (in principle) infer a flat-field that is higher resolution than the data! That is, we can infer heterogeneous intra-pixel sensitivity variations, pixel by pixel. That is awesome, and we ought to apply it to every data set out there! We will post the white paper on arXiv after we submit it.