single transits, new physics, K2

In my small amount of research time, I worked on the text for Hattori's paper on single transits in the Kepler data, including how we can search for them and what can be inferred from them. At lunch, Josh Ruderman (NYU) gave a nice talk on finding beyond-the-standard-model physics in the ATLAS experiment at the LHC. He made a nice argument at the beginning of his talk that there must be new physics for three reasons: baryogenesis, dark matter, and the hierarchy problem. The last is a naturalness argument, but the other two are pretty strong arguments! In the afternoon, while I ripped out furniture, Ben Montet (Harvard) and Foreman-Mackey worked on centroiding stars in the K2 data.


three talks

Three great talks happened today. Two by Jason Kalirai (STScI) on WFIRST and the connection between white dwarf stars and their progenitors. One by Foreman-Mackey on the new paper on M-dwarf planetary system abundances by Ballard & Johnson. Kalirai did a good job of justifying the science case for WFIRST; it will do a huge survey at good angular resolution and great depth. He distinguished it nicely from Euclid. It also has a Guest Observer program. On the white-dwarf stuff he showed some mind-blowing color-magnitude diagrams; it is incredible how well calibrated HST is and how well Kalirai and his team can do crowded-field photometry, both at the bright end and at the faint end. Foreman-Mackey's journal-club talk convinced us that there is a huge amount to do in exoplanetary system population inference going forward; papers like Ballard & Johnson only barely scratch the surface of what we might be doing.


regression of continuum-normalized spectra

I had a short phone call this morning with Jeffrey Mei (NYUAD) about his project to find the absorption lines associated with high-latitude, low-amplitude extinction. The plan is to do regression of A- and F-star spectra against labels (in this case, H-delta EW as a temperature indicator and SFD extinction), just like the project with Melissa Ness (MPIA) (where the labels are stellar parameters instead). Mei and I got waylaid by the SDSS calibration system, but now we are working on the raw data, and continuum-normalizing before we regress. This gets rid of almost all our calibration issues. The remaining problem (which I don't know how to solve) is the redshift or rest-frame problem: We want to work on the spectra in the rest frame of the ISM, which we don't know!
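
As a rough illustration of what regressing spectra against labels can look like (a minimal sketch on synthetic data, not the actual project code), the example below fits every pixel of a set of continuum-normalized spectra as a linear function of two labels. The function name `regress_spectra`, the array shapes, and all of the numbers are assumptions made up for the illustration.

```python
import numpy as np

def regress_spectra(flux, labels):
    """Least-squares fit of each pixel as an offset plus a linear function of the labels.

    flux   : (n_stars, n_pixels) continuum-normalized spectra
    labels : (n_stars, n_labels), e.g. columns of H-delta EW and SFD extinction
    returns: (n_labels + 1, n_pixels) coefficients; row 0 is the offset
    """
    n_stars = flux.shape[0]
    # Centre the labels so that the offset row is the mean spectrum.
    centred = labels - labels.mean(axis=0)
    design = np.hstack([np.ones((n_stars, 1)), centred])
    # One linear solve handles all pixels at once (one column per pixel).
    theta, *_ = np.linalg.lstsq(design, flux, rcond=None)
    return theta

# Synthetic demonstration with made-up numbers (hypothetical, not real data).
rng = np.random.default_rng(42)
n_stars, n_pixels = 500, 50
hdelta_ew = rng.uniform(2.0, 10.0, n_stars)    # stand-in temperature indicator
extinction = rng.uniform(0.0, 0.2, n_stars)    # stand-in SFD extinction
labels = np.column_stack([hdelta_ew, extinction])

true_response = np.zeros(n_pixels)
true_response[20] = -0.5                       # fake absorption feature at pixel 20
noise = 0.01 * rng.standard_normal((n_stars, n_pixels))
flux = 1.0 + np.outer(extinction, true_response) + noise

theta = regress_spectra(flux, labels)
extinction_response = theta[2]                 # d(flux) / d(extinction) per pixel
strongest_pixel = int(np.argmin(extinction_response))
print(strongest_pixel, extinction_response[strongest_pixel])
```

The row of `theta` that multiplies the extinction label is the thing of interest here: pixels where it is significantly negative are candidate absorption features that grow with extinction. This sketch ignores the rest-frame problem entirely.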


measuring the positions of stars

At group meeting, Vakili showed his results on star positional measurements. We have several super-fast, approximate schemes that come close to saturating the Cramér–Rao bound, without requiring a good model of the point-spread function.

One of these methods is the (insane) method used in the SDSS pipelines, which was communicated to us in the form of code (since it isn't fully written up anywhere). This method (due to Lupton) is genius, fast, runs on minimal hardware with almost no overhead, and comes close to saturating the bound. Another of these is the method made up on the spot by Price-Whelan and me when we wrote this paper on digitization bandwidth, with a small modification (involving smoothing (gasp!) the image); the APW method is simpler and faster than the SDSS method on modern compute machinery.
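
To make the flavor of a fast, PSF-model-free centroid concrete, here is a minimal sketch of one generic scheme: smooth the image, find the brightest pixel, and fit a two-dimensional quadratic to the 3x3 patch around it. This is an assumption-laden illustration, not the SDSS code and not necessarily the exact method described above; the function names, the smoothing width, and the test star are all made up.

```python
import numpy as np

def gaussian_smooth(image, sigma=1.0):
    """Separable Gaussian smoothing using only NumPy."""
    half = int(np.ceil(3 * sigma))
    offsets = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()
    smooth = np.apply_along_axis(np.convolve, 0, image, kernel, mode="same")
    return np.apply_along_axis(np.convolve, 1, smooth, kernel, mode="same")

def centroid(image, sigma=1.0):
    """Return (x, y) of the peak of a quadratic fit to the smoothed image."""
    smooth = gaussian_smooth(image, sigma)
    # Brightest pixel, kept one pixel away from the edge so a 3x3 patch fits.
    inner = smooth[1:-1, 1:-1]
    iy, ix = np.unravel_index(np.argmax(inner), inner.shape)
    iy, ix = iy + 1, ix + 1
    patch = smooth[iy - 1:iy + 2, ix - 1:ix + 2]
    # Fit f = a + b*dx + c*dy + d*dx^2 + e*dx*dy + g*dy^2 to the nine pixels.
    dy, dx = np.mgrid[-1:2, -1:2]
    dx, dy = dx.ravel(), dy.ravel()
    design = np.column_stack([np.ones(9), dx, dy, dx**2, dx * dy, dy**2])
    a, b, c, d, e, g = np.linalg.lstsq(design, patch.ravel(), rcond=None)[0]
    # The peak is where the gradient of the quadratic vanishes.
    hessian = np.array([[2 * d, e], [e, 2 * g]])
    off_x, off_y = np.linalg.solve(hessian, [-b, -c])
    return ix + off_x, iy + off_y

# Hypothetical test star: a Gaussian blob at a known sub-pixel position.
rng = np.random.default_rng(1)
yy, xx = np.mgrid[0:21, 0:21]
true_x, true_y = 10.3, 9.6
star = np.exp(-0.5 * ((xx - true_x) ** 2 + (yy - true_y) ** 2) / 1.5 ** 2)
star += 0.001 * rng.standard_normal(star.shape)

x_hat, y_hat = centroid(star)
print(x_hat, y_hat)
```

A quadratic is only an approximation to the peak of a real star, so a scheme like this carries a small sub-pixel bias; how close it comes to the Cramér–Rao bound in practice is exactly the kind of question the comparison above is about.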

Full-up PSF modeling should beat (very slightly) both of these methods, but it degrades in an unknown way as the PSF model goes wrong, and who is confident that he or she has a perfect PSF model? Vakili is going to have a nice paper on all this; we started writing it just as an aside to other things we are doing, but we realized that much of what we are learning is not really in the literature. Let's hear it for the analysis of astronomical engineering infrastructure!


software and literature; convex problems

Fernando Perez (Berkeley), Karthik Ram (Berkeley), and Jake Vanderplas (UW) all descended on CampHogg today, and we were joined by Brian McFee (NYU) and Jennifer Hill (NYU) to discuss an idea hatched by Hill at Asilomar to build a system to scrape the literature—both refereed and informal—for software use. The idea is to build a network and a recommendation system and alt metrics and a search system for software in use in scientific projects. There are many different use cases if we can understand how papers made use of software. There was a lot of discussion of issues with scraping the literature, and then some hacking. This has only just begun.

At lunch, I visited the Simons Center for Data Analysis. I ended up having a long conversation with Christian Mueller (Simons) about the intersection of statistics with convex optimization. Among other things, he is working on principled methods for setting the hyperparameters in regularized optimizations. He told me many things I didn't know about convex problems in data analysis. In particular, he indicated that there might be some very clever and provably optimal (or non-sub-optimal) ways to reduce the feature space for the "Causal Pixel Model" for Kepler pixels that Wang is working on.


Kepler occurrence rate review, day 2

Today the review committee wrote up and presented recommendations to the Kepler team on its close-out planet occurrence rate inference plans. We noted that the big issues in occurrence rate (especially near Earth-like planets) are factor-of-two and larger, and recommended that the team focus on the big things and not spend time tracking down percent-level effects. After the review I had long talks with Jon Jenkins (Ames) and Tom Barclay (Ames) about Kepler projects and tools.


Kepler occurrence rate review, day 1

Today I got up at dawn's crack and drove to Mountain View for a review of the NASA Kepler team's planet occurrence rate inferences. It was an incredible day of talks and conversations about the data products and experiments needed to turn Kepler's planet (or object-of-interest) catalog into a rate density for exoplanets, and especially the probabilities that stars host Earth-like planets. We spent time talking about high-level priorities, but also low-level methodologies, including MCMC for uncertainty propagation, adaptive experimental design for completeness (efficiency) estimation, and the relative merits of forward modeling and counting planets in bins. On the latter, the Kepler team is creating (and will release publicly) everything needed for either approach.

One thing that pleased me immensely is that Foreman-Mackey's paper on the abundance of Earth analogs got a lot of play in the meeting as an exemplar of good methodology, and also an exemplar of how uncertain we are about the planet occurrence rate! The Kepler team (and increasingly the whole astronomical community) is coming around to the view that forward modeling methods (as in hierarchical probabilistic modeling or approximate Bayesian computation) are preferable to counting dots in bins.


DSE Summit, day 3

On the last day of the Summit, we spent the full meeting talking about the collaboration and deliverables for the funding agencies. That does not qualify as research. Late in the day I had a revelation about the relationship between ethnography and science. They are related, but not really the same. Some of the conclusions of ethnography have a factual or hypothesis-generating character, but ethnographic results do not really live in the same domain as scientific results. That is no knock on ethnography! Ethnographers can ask questions that we don't even know how to start to ask quantitatively.


DSE Summit, day 2

On the second day of the Moore–Sloan Data Science Summit, we did some awesome community building exercises involving team problem-solving. We then discussed the exercises and tried to understand how they relate to our ideas about collaboration and creativity. That was pretty fun!

At lunch I had a great conversation with Philip Stark (Berkeley) about finding signals in time series below the Nyquist (sampling) limit; in principle it is possible if you have a good idea what you are looking for or what's hidden there. We also talked about geometric descriptions of statistics: The world is infinite dimensional (there is a set of fields at every position in phase space) but observations are finite (noisy measurements of certain kinds of projections). This has lots of implications for the impact of priors (such as non-negativity), when they apply to the infinite-dimensional object (the latent variables) rather than the finite observations.

After lunch, it was probabilistic generalizations of periodograms with Jake Vanderplas (UW) and some frisbee, and then a discussion about the open spaces for Data Science that we are building at Berkeley, UW, and NYU. In all three, there are issues of setting the rules and culture of the space. I think the three institutions can make progress together that no one institution could make on its own.


DSE Summit, day 1

Today was the first day of the community-building meeting of the Moore-Sloan Data Science Environments, held at Asilomar (near Monterey, CA). The project is a collaboration between Berkeley, UW Seattle, and NYU; the meeting has about 100 attendees from across the three institutions. The day started with an unconference in the morning, in which I attended a discussion session on text and text analysis. After that, we got into small inter-institutional groups and worked out our commonalities (and then presented them as lightning talks), as a way to get to know one another and also introduce ourselves to the community. Much of the community building happened on the beach!


the Sun is normal; how did Jupiter form?

At group meeting, Wang reviewed Basri, Walkowicz, & Reiners (2013) on the variability of the Sun in terms of Kepler stars. It shows that (despite rumors to the contrary) the Sun is very typically variable for G-type dwarf stars. It is a very nice piece of work; it just shows summary statistics, but they are nicely robust and insensitive to satellite systematics.

Also in group meeting, Vakili showed first results from a dictionary-learning approach to the point-spread function in LSST simulated imaging. He is using stochastic gradient descent, which I learned (in the meeting) is useful for starting off an optimization, even in cases where the full likelihood (or objective) function can be computed just fine.

After lunch, Roman Rafikov (Princeton) gave a nice talk about the formation of giant planets. He argued that distant planets (like in HR 8799) might have a different formation mechanism than close planets (like Jupiter and hot Jupiters). One very interesting thing about planets (unlike stars) is that the structure is not just set by the composition; it is also set by the formation history.


what is the interstellar-medium rest frame?

I spoke with Jeffrey Mei (NYUAD) early in the morning (my time) to discuss his continuum-normalized SDSS spectra of standard stars. We are trying to look for absorption in the spectra that is associated with interstellar medium by regressing the spectra against the Galactic reddening. This is a great project, but has many complicated issues. Not the least is that it is easy to shift the spectra to the stellar rest frame, or even the Solar System barycentric rest frame, but it is hard to shift them to the mean (line-of-sight) interstellar-medium rest frame. I have some ideas, or we could look for blurry features, blurred by the interstellar velocity differences. Maybe Na D will save us?


Jupiter analogs, and exoplanet music

As with every Wednesday, the highlight was group meeting, which we held (as we do every Wednesday) in the Center for Data Science studio space. We discussed Hattori's search for Jupiter analogs in the Kepler data: The plan is to search with a top-hat function, and then, for the good candidates, do a hypothesis test of top-hat vs saw-tooth vs realistic transit shape. Then do parameter estimation on the ones that prefer the latter. This is a nice structure and highly achievable.
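
The first step of that plan, the top-hat search, can be sketched in a few lines. What follows is a minimal illustration on a synthetic light curve, assuming evenly spaced samples, a flat baseline, and known uniform noise; the function name `tophat_search` and all of the numbers are hypothetical, and the hypothesis test against saw-tooth and realistic transit shapes is not included.

```python
import numpy as np

def tophat_search(flux, sigma, width):
    """Slide a top-hat of `width` samples along a light curve.

    flux  : evenly sampled fluxes with a flat baseline
    sigma : per-sample noise standard deviation (assumed known and uniform)
    width : top-hat duration in samples
    returns: (depth, snr), one entry per possible start index of the box
    """
    # Remove the baseline with a median, which a short dip barely affects.
    resid = flux - np.median(flux)
    kernel = np.ones(width) / width
    # Entry i is the mean residual over samples i .. i + width - 1.
    mean_in_box = np.convolve(resid, kernel, mode="valid")
    depth = -mean_in_box
    # The mean of `width` independent samples has noise sigma / sqrt(width).
    snr = depth * np.sqrt(width) / sigma
    return depth, snr

# Synthetic light curve with a single box-shaped dip (made-up numbers).
rng = np.random.default_rng(0)
n_samples, sigma, width = 2000, 1e-3, 30
flux = 1.0 + sigma * rng.standard_normal(n_samples)
flux[700:730] -= 3e-3

depth, snr = tophat_search(flux, sigma, width)
best_start = int(np.argmax(snr))
print(best_start, depth[best_start], snr[best_start])
```

In a real search the duration is unknown, so one would repeat this over a grid of widths and keep the candidates above some threshold for the shape comparison.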

After that, we discussed sonification of the Kepler data with Brian McFee (NYU) and also his tempogram method for looking at beat tracks in music (yes, music). We have some ideas about how these things might be related! At the end of group meeting, we worked on Foreman-Mackey's and Wang's AAS abstracts, both about calibrating out stochastic variability in Kepler light-curves to improve exoplanet search.


comparing data-driven and theory-driven models

I gave the brown-bag talk in the Center for Cosmology and Particle Physics at lunch-time today. I talked about The Cannon, Ness and Rix and my data-driven model of stellar spectra. I also used the talk as an opportunity to talk about machine learning and data science in the CCPP. Various good ideas came up from the audience. One is that we ought to be able to synthesize, with our data-driven model, the theory-driven model spectra that the APOGEE team uses to do stellar parameter estimation. That would be a great idea; it would help identify where our models and the theory diverge; it might even point to improvements both for The Cannon and for the APOGEE pipelines.


Gerry Neugebauer

I learned late on Friday that Gerry Neugebauer (Caltech) has died. Gerry was one of the most important scientists in my research life, and in my personal life. He co-advised my PhD thesis (along with Roger Blandford and Judy Cohen); we spent many nights together at the Keck and Palomar Observatories, and many lunches together with Tom Soifer and Keith Matthews at the Athenaeum (the Caltech faculty club).

In my potted history (apologies in advance for errors), Gerry was one of the first people (with Bob Leighton) to point an infrared telescope at the sky; he found far more sources bright in the infrared than anyone seriously expected. This started infrared astronomy. In time, he became the PI of the NASA IRAS mission, which has been one of the highest-impact (and incredibly high in impact-per-dollar) astronomical missions in NASA history. The IRAS data are still the primary basis for many important results and tools in astronomy, including galaxy clustering, infrared background, ultra-luminous galaxies, young stars, and the dust maps.

To a new graduate student at Caltech, Gerry was intimidating: He was gruff, opinionated, and never wrong (as far as I could tell). But if you broke through that very thin veneer of scary, he was the most loving, caring, thoughtful advisor a student could want. He patiently taught me why I should love (not hate) magnitudes and relative measurements. He showed me how a telescope worked by having me observe at Palomar at his side. He showed me how to test our imaging-data uncertainties, both theoretically and observationally, to make sure we weren't making mistakes. (He taught me to call them "uncertainties" not "errors"!) He helped me develop observing strategies and data-analysis strategies that minimize the effects of detector "memory" and non-linearities. He enjoyed data analysis so much that on one of our projects he insisted that he do the data analysis, so long as I (the graduate student) would be willing to write the paper! Uncharacteristically for then or now, he could run his group so efficiently that many of his students designed, built, and operated an astronomical instrument, from soup to nuts, in a few years of PhD! He had strong opinions about how to run a scientific project, how to write up the results, and even about how to typeset numbers. I obey these positions strictly now in all my projects.

Reading this back, it doesn't capture what I really want to say, which is that Gerry spent a huge fraction of his immense intellectual capability on students, postdocs, and others new to science. He cared immensely about mentoring. From working with Gerry I realized that if you want to propagate great ideas into astronomy, you do it not just by writing papers and giving seminars: You do it by mentoring well new generations of scientists who will, in turn, pass it on in their own work and their own students. Many of the world's best infrared astronomers are directly or indirectly a product of Gerry's wonderful mentoring. I was immensely privileged to get some of that!

[I am also the author of Gerry's only erratum ever in the scientific literature. Gerry was a bit scary the day we figured out that error!]