Today the review committee wrote up and presented recommendations to the Kepler team on its close-out planet occurrence rate inference plans. We argued that the big issues in occurrence rate (especially near Earth-like planets) are at the factor-of-two level and larger, so the team ought to focus on the big things and not spend time tracking down percent-level effects. After the review I had long talks with Jon Jenkins (Ames) and Tom Barclay (Ames) about Kepler projects and tools.
2014-10-10
2014-10-09
Kepler occurrence rate review, day 1
Today I got up at dawn's crack and drove to Mountain View for a review of the NASA Kepler team's planet occurrence rate inferences. It was an incredible day of talks and conversations about the data products and experiments needed to turn Kepler's planet (or object-of-interest) catalog into a rate density for exoplanets, and especially the probabilities that stars host Earth-like planets. We spent time talking about high-level priorities, but also low-level methodologies, including MCMC for uncertainty propagation, adaptive experimental design for completeness (efficiency) estimation, and the relative merits of forward modeling and counting planets in bins. On the latter, the Kepler team is creating (and will release publicly) everything needed for either approach.
One thing that pleased me immensely is that Foreman-Mackey's paper on the abundance of Earth analogs got a lot of play in the meeting as an exemplar of good methodology, and also an exemplar of how uncertain we are about the planet occurrence rate! The Kepler team, and increasingly the whole astronomical community, is coming around to the view that forward modeling methods (as in hierarchical probabilistic modeling or approximate Bayesian computation) are preferable to counting dots in bins.
2014-10-08
DSE Summit, day 3
On the last day of the Summit, we spent the full meeting talking about the collaboration and deliverables for the funding agencies. That does not qualify as research. Late in the day I had a revelation about the relationship between ethnography and science. They are related, but not really the same. Some of the conclusions of ethnography have a factual or hypothesis-generating character, but ethnographic results do not really live in the same domain as scientific results. That is no knock on ethnography! Ethnographers can ask questions that we don't even know how to start to ask quantitatively.
2014-10-07
DSE Summit, day 2
On the second day of the Moore–Sloan Data Science Summit, we did some awesome community building exercises involving team problem-solving. We then discussed the exercises and tried to understand how they relate to our ideas about collaboration and creativity. That was pretty fun!
At lunch I had a great conversation with Philip Stark (Berkeley) about finding signals in time series below the Nyquist (sampling) limit; in principle it is possible if you have a good idea what you are looking for or what's hidden there. We also talked about geometric descriptions of statistics: The world is infinite dimensional (there is a set of fields at every position in phase space) but observations are finite (noisy measurements of certain kinds of projections). This has lots of implications for the impact of priors (such as non-negativity), when they apply to the infinite-dimensional object (the latent variables, rather than the finite observations).
After lunch, it was probabilistic generalizations of periodograms with Jake Vanderplas (UW) and some frisbee, and then a discussion about the open spaces for Data Science that we are building at Berkeley, UW, and NYU. In all three, there are issues of setting the rules and culture of the space. I think the three institutions can make progress together that no one institution could make on its own.
2014-10-06
DSE Summit, day 1
Today was the first day of the community-building meeting of the Moore–Sloan Data Science Environments, held at Asilomar (near Monterey, CA). The project is a collaboration between Berkeley, UW Seattle, and NYU; the meeting has about 100 attendees from across the three institutions. The day started with an unconference in the morning, in which I attended a discussion session on text and text analysis. After that, we got into small inter-institutional groups and worked out our commonalities (and then presented them as lightning talks), as a way to get to know one another and also introduce ourselves to the community. Much of the community building happened on the beach!
2014-10-03
the Sun is normal; how did Jupiter form?
At group meeting, Wang reviewed Basri, Walkowicz, & Reiners (2013) on the variability of the Sun in terms of Kepler stars. It shows that (despite rumors to the contrary) the Sun is very typically variable for G-type dwarf stars. It is a very nice piece of work; it just shows summary statistics, but they are nicely robust and insensitive to satellite systematics.
Also in group meeting, Vakili showed first results from a dictionary-learning approach to the point-spread function in LSST simulated imaging. He is using stochastic gradient descent, which I learned (in the meeting) is useful for starting off an optimization, even in cases where the full likelihood (or objective) function can be computed just fine.
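To make the point about stochastic gradient descent concrete, here is a minimal sketch on a toy linear least-squares problem, where the full objective is cheap to compute and has an exact solution. Everything in it (the function name `sgd_least_squares`, the step size, the batch size, the fake data) is an assumption for illustration; it is not Vakili's dictionary-learning code.

```python
import numpy as np

def sgd_least_squares(X, y, n_epochs=20, batch_size=32, step=0.01, seed=0):
    """Minimize the mean squared error of X @ w - y with minibatch SGD."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_epochs):
        order = rng.permutation(n)  # visit the data in a new random order each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            resid = X[idx] @ w - y[idx]
            grad = 2.0 * X[idx].T @ resid / len(idx)  # gradient on this minibatch only
            w -= step * grad
    return w

# Fake data (hypothetical): 2000 noisy linear measurements of 5 parameters.
rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 5))
w_true = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
y = X @ w_true + 0.1 * rng.normal(size=2000)

w_start = sgd_least_squares(X, y)                 # cheap, noisy starting point
w_exact = np.linalg.lstsq(X, y, rcond=None)[0]    # full least-squares solution
print(np.abs(w_start - w_exact).max())
```

In this toy case the SGD result lands close to the exact answer, so it would serve as a good initialization for a full-objective optimizer in a problem that has no closed-form solution.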
After lunch, Roman Rafikov (Princeton) gave a nice talk about the formation of giant planets. He argued that distant planets (like in HR 8799) might have a different formation mechanism than close planets (like Jupiter and hot Jupiters). One very interesting thing about planets, unlike stars, is that the structure is not just set by the composition; it is also set by the formation history.
2014-10-02
what is the interstellar-medium rest frame?
I spoke with Jeffrey Mei (NYUAD) early in the morning (my time) to discuss his continuum-normalized SDSS spectra of standard stars. We are trying to look for absorption in the spectra that is associated with interstellar medium by regressing the spectra against the Galactic reddening. This is a great project, but has many complicated issues. Not the least is that it is easy to shift the spectra to the stellar rest frame, or even the Solar System barycentric rest frame, but it is hard to shift them to the mean (line-of-sight) interstellar-medium rest frame. I have some ideas, or we could look for blurry features, blurred by the interstellar velocity differences. Maybe Na D will save us?
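The regression idea can be sketched as an independent linear fit at every wavelength pixel: flux as a constant plus a slope times the reddening. This is only a toy under strong assumptions: the spectra are fake, all share one rest frame (which sidesteps exactly the hard problem described above), and the names `regress_on_reddening`, `flux`, and `reddening` are made up for illustration.

```python
import numpy as np

def regress_on_reddening(flux, reddening):
    """Fit flux[:, j] = a[j] + b[j] * reddening independently at each pixel j."""
    A = np.vstack([np.ones_like(reddening), reddening]).T   # design matrix, (n_stars, 2)
    coeffs, *_ = np.linalg.lstsq(A, flux, rcond=None)        # coefficients, (2, n_pix)
    return coeffs[0], coeffs[1]

# Fake continuum-normalized spectra (hypothetical): 500 stars, 200 pixels.
rng = np.random.default_rng(0)
n_stars, n_pix = 500, 200
reddening = rng.uniform(0.0, 1.0, size=n_stars)
flux = 1.0 + 0.01 * rng.normal(size=(n_stars, n_pix))
# Inject a fake absorption feature at pixel 120 whose depth scales with reddening.
flux[:, 120] -= 0.05 * reddening

intercept, slope = regress_on_reddening(flux, reddening)
feature_pixel = int(np.argmin(slope))   # most negative slope = strongest absorption
print(feature_pixel, slope[feature_pixel])
```

The slope spectrum is the interesting output: pixels where it is significantly negative are candidates for absorption associated with the interstellar medium. With real data, velocity offsets between stars and the medium would blur such features.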
2014-10-01
Jupiter analogs, and exoplanet music
As with every Wednesday, the highlight was group meeting, which we held (as we do every Wednesday) in the Center for Data Science studio space. We discussed Hattori's search for Jupiter analogs in the Kepler data: The plan is to search with a top-hat function, and then, for the good candidates, do a hypothesis test of top-hat vs saw-tooth vs realistic transit shape. Then do parameter estimation on the ones that prefer the latter. This is a nice structure and highly achievable.
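The first step of that plan, the top-hat search, can be sketched as a sliding-window mean compared against a baseline. This is a minimal illustration on a fake light curve, assuming a single transit of known duration and white noise; the function name `top_hat_search` and all the numbers are hypothetical, and it is not Hattori's code. The later steps (comparing top-hat, saw-tooth, and realistic transit shapes) are not shown.

```python
import numpy as np

def top_hat_search(flux, width):
    """Slide a top-hat of `width` samples along the light curve.

    Returns the start index of the deepest window and its depth
    relative to the median flux.
    """
    baseline = np.median(flux)
    kernel = np.ones(width) / width
    # Element i is the mean of flux[i : i + width].
    in_window = np.convolve(flux, kernel, mode="valid")
    depth = baseline - in_window
    best = int(np.argmax(depth))
    return best, depth[best]

# Fake light curve (hypothetical): white noise plus one box-shaped dip.
rng = np.random.default_rng(7)
flux = 1.0 + 0.001 * rng.normal(size=2000)
flux[800:850] -= 0.01

start, depth = top_hat_search(flux, width=50)
print(start, depth)
```

In a real search the window width would also be scanned, and the candidates that pass would go on to the hypothesis test and parameter estimation described above.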
After that, we discussed sonification of the Kepler data with Brian McFee (NYU) and also his tempogram method for looking at beat tracks in music (yes, music). We have some ideas about how these things might be related! At the end of group meeting, we worked on Foreman-Mackey's and Wang's AAS abstracts, both about calibrating out stochastic variability in Kepler light-curves to improve exoplanet search.
2014-09-29
comparing data-driven and theory-driven models
I gave the brown-bag talk in the Center for Cosmology and Particle Physics at lunch-time today. I talked about The Cannon, the data-driven model of stellar spectra that Ness, Rix, and I have built. I also used the talk as an opportunity to talk about machine learning and data science in the CCPP. Various good ideas came up from the audience. One is that we ought to be able to synthesize, with our data-driven model, the theory-driven model spectra that the APOGEE team uses to do stellar parameter estimation. That would be a great idea; it would help identify where our models and the theory diverge; it might even point to improvements both for The Cannon and for the APOGEE pipelines.
2014-09-28
Gerry Neugebauer
I learned late on Friday that Gerry Neugebauer (Caltech) has died. Gerry was one of the most important scientists in my research life, and in my personal life. He co-advised my PhD thesis (along with Roger Blandford and Judy Cohen); we spent many nights together at the Keck and Palomar Observatories, and many lunches together with Tom Soifer and Keith Matthews at the Athenaeum (the Caltech faculty club).
In my potted history (apologies in advance for errors), Gerry was one of the first people (with Bob Leighton) to point an infrared telescope at the sky; he found far more sources bright in the infrared than anyone seriously expected. This started infrared astronomy. In time, he became the PI of the NASA IRAS mission, which has been one of the highest-impact (and incredibly high in impact-per-dollar) astronomical missions in NASA history. The IRAS data are still the primary basis for many important results and tools in astronomy, including galaxy clustering, infrared background, ultra-luminous galaxies, young stars, and the dust maps.
To a new graduate student at Caltech, Gerry was intimidating: He was gruff, opinionated, and never wrong (as far as I could tell). But if you broke through that very thin veneer of scary, he was the most loving, caring, thoughtful advisor a student could want. He patiently taught me why I should love (not hate) magnitudes and relative measurements. He showed me how a telescope worked by having me observe at Palomar at his side. He showed me how to test our imaging-data uncertainties, both theoretically and observationally, to make sure we weren't making mistakes. (He taught me to call them "uncertainties" not "errors"!) He helped me develop observing strategies and data-analysis strategies that minimize the effects of detector "memory" and non-linearities. He enjoyed data analysis so much that, on one of our projects, he insisted that he do the data analysis, so long as I (the graduate student) would be willing to write the paper! Uncharacteristically for then or now, he could run his group so efficiently that many of his students designed, built, and operated an astronomical instrument, from soup to nuts, in a few years of PhD! He had strong opinions about how to run a scientific project, how to write up the results, and even about how to typeset numbers. I obey these positions strictly now in all my projects.
Reading this back, it doesn't capture what I really want to say, which is that Gerry spent a huge fraction of his immense intellectual capability on students, postdocs, and others new to science. He cared immensely about mentoring. From working with Gerry I realized that if you want to propagate great ideas into astronomy, you do it not just by writing papers and giving seminars: You do it by mentoring well new generations of scientists who will, in turn, pass it on in their own work and their own students. Many of the world's best infrared astronomers are directly or indirectly a product of Gerry's wonderful mentoring. I was immensely privileged to get some of that!
[I am also the author of Gerry's only erratum ever in the scientific literature. Gerry was a bit scary the day we figured out that error!]
2014-09-26
interstellar bands; PSF dictionaries
Gail Zasowski (JHU) gave an absolutely great talk today, about diffuse interstellar bands in the APOGEE spectra and their possible use as tools for mapping the interstellar medium and measuring the kinematics of the Milky Way. Her talk also made it very clear what a huge advance APOGEE is over previous surveys: There are APOGEE stars in the mid-plane of the disk on the other side of the bulge! She showed lots of beautiful data and some results that just scratch the surface of what can be learned about the interstellar medium with stellar spectra.
In CampHogg group meeting in the morning, we realized we can reformulate Vakili's work on the point-spread function in SDSS and LSST so that he never has to interpolate the data (to, for example, centroid the stars properly). We can always shift the models, never the data. We also realized that we don't need to build a PCA or KL basis for the PSF representation; we can use a dictionary and learn the dictionary elements along with the PSF. This is an exciting realization; it almost ensures that we have to beat the existing methods for accuracy and flexibility. Also interesting: The linear algebra we wrote down permits us to make use of "convolutional methods" and also permits us to represent the PSF at pixel resolutions higher than the data (super-resolution).
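The "shift the models, never the data" idea can be sketched by evaluating a PSF model directly on the pixel grid at a trial sub-pixel centre and comparing it to the untouched image. This is a toy under stated assumptions: a Gaussian stands in for the PSF (the real work uses a learned dictionary, not shown here), the centre is found by brute-force grid search, and `gaussian_psf` and `fit_centre` are hypothetical names.

```python
import numpy as np

def gaussian_psf(shape, xc, yc, sigma=1.5):
    """Evaluate a stand-in Gaussian PSF centred at (xc, yc) on the pixel grid."""
    y, x = np.mgrid[:shape[0], :shape[1]]
    model = np.exp(-0.5 * ((x - xc) ** 2 + (y - yc) ** 2) / sigma ** 2)
    return model / model.sum()

def fit_centre(image, grid):
    """Grid-search the sub-pixel centre by shifting the model.

    The image pixels are never interpolated or resampled.
    """
    best = (np.inf, None, None)
    for yc in grid:
        for xc in grid:
            m = gaussian_psf(image.shape, xc, yc)
            amp = (m * image).sum() / (m * m).sum()   # linear least-squares amplitude
            chi2 = ((image - amp * m) ** 2).sum()
            if chi2 < best[0]:
                best = (chi2, xc, yc)
    return best[1], best[2]

# Fake star (hypothetical): true centre at x = 7.3, y = 6.8 in a 15 x 15 cutout.
rng = np.random.default_rng(1)
image = 1000.0 * gaussian_psf((15, 15), 7.3, 6.8) + rng.normal(scale=0.5, size=(15, 15))

grid = np.arange(6.0, 8.01, 0.05)
xc_hat, yc_hat = fit_centre(image, grid)
print(xc_hat, yc_hat)
```

Because only the model moves, the noise properties of the data stay exactly as measured, which is the main attraction of the reformulation.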
2014-09-25
overlapping stars, stellar training sets
On the phone, Schölkopf, Wang, Foreman-Mackey, and I tried to understand how it is that we can fit some insanely variable stars in the Kepler data using other stars, when the variability seems so specific to each star. In one case we investigated, it turned out that the crazy variability of one star (below) was perfectly matched by the variability of another, brighter star. What gives? It turns out that the two stars overlap on the detector, so their footprints actually share pixels! The shared variability arises because they are being photometered through overlapping apertures. We also learned that some stars in Kepler have been assigned non-contiguous apertures.

Late in the day, Gail Zasowski (JHU) showed up. I explained in detail The Cannon—Ness, Rix, and my label-transfer code for stellar parameter estimation. She had many questions about our training set, both because it is too large (it contains some obviously wrong entries) and too small (it doesn't nearly cover all kinds of stars at all metallicities).
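Returning to the overlapping-stars puzzle earlier in this post, the basic idea of fitting one star's light curve with other stars' light curves can be sketched as a linear regression. This is a toy on fake data, assuming a target that picks up 30 percent of a bright variable neighbour's signal; the numbers and variable names are hypothetical, and the real calibration code is considerably more involved.

```python
import numpy as np

rng = np.random.default_rng(3)
n_times, n_others = 1000, 20
t = np.linspace(0.0, 30.0, n_times)

# Light curves of 20 other stars (hypothetical): flat plus white noise.
others = 1.0 + 0.001 * rng.normal(size=(n_times, n_others))
# Star 0 is a bright, strongly variable neighbour.
others[:, 0] += 0.05 * np.sin(2 * np.pi * t / 3.7)

# The target picks up some of star 0's signal, as if their apertures overlapped.
target = 1.0 + 0.3 * (others[:, 0] - 1.0) + 0.001 * rng.normal(size=n_times)

# Linear model: target = constant + weighted sum of the other stars.
A = np.hstack([np.ones((n_times, 1)), others])
weights, *_ = np.linalg.lstsq(A, target, rcond=None)
prediction = A @ weights

residual_rms = np.std(target - prediction)
dominant = int(np.argmax(np.abs(weights[1:])))   # which star carries the fit
print(dominant, weights[1], residual_rms)
```

When one regressor dominates the weights like this, it is a hint that the two stars are not independent measurements, which is what the shared pixels turned out to explain.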
2014-09-24
deep learning and exoplanet transits
At group meeting, Foreman-Mackey and Wang showed recent results on calibration of K2 and Kepler data, respectively, and Malz showed some SDSS spectra of the night sky. After group meeting, Elizabeth Lamm (NYU) came to ask about possible Data Science capstone projects. We pitched a project on finding exoplanets with Gaia data and another on finding exoplanet transits with deep learning! The latter project was based on Foreman-Mackey's realization that everything that makes convolutional networks great for finding kittens in video also makes them great for finding transits in variable-star light-curves. Bring it on!
2014-09-23
half full or half empty?
Interestingly (to me, anyway), as I have been raving in this space about how awesome it is that Ness and I can transfer stellar parameter labels from a small set of "standard stars" to a huge set of APOGEE stars using a data-driven model, Rix (who is one of the authors of the method) has been seeing our results as requiring some spin or adjustment in order to be impressive to the stellar parameter community. I see his point: What impresses me is that we get good structure in the label (stellar parameter) space and we do very well where the data overlap the training sample. What concerns Rix is that many of our labels are clearly wrong or distorted, especially where we don't have good coverage in the training sample. We discussed ways to modify our method or our display of the output to make both points in a responsible way.
Late in the day, Foreman-Mackey and I discussed NYU's high-performance computing hardware and environment with Stratos Efstathiadis (NYU), who said he would look into increasing our disk-usage limits. Operating on the entire Kepler data set inside the compute center turns out to be hard, not because the data set is large, but rather because it is composed of so many tiny files. This is a problem, apparently, for distributed storage systems. We discussed also the future of high-performance computing in the era of Data Science.
2014-09-22
making black holes from gravitons!
I am paying for a week of hacking in Seattle with some days of not research back here in New York City. The one research highlight of the day was Gia Dvali (NYU) telling us at lunch about his work on black holes as information processing machines. Along the way, he described the thought experiment of constructing a black hole by concentrating enormous numbers of gravitons in a small volume. Apparently this thought experiment, as simple as it sounds, justifies the famous black-hole entropy result. I was surprised! Now I am wondering what it would take, physically, to make this experiment happen. Like could you do this with a real phased array of gravitational radiation sources?