2014-10-29

single transits, redshift likelihoods

A high research day today, for the first time in what feels like months! In group meeting in the finally-gutted NYU CDS studio space, So Hattori told us about some single transits in Kepler and his putative ability to find them. We unleashed some project management on him and now he has a great to-do list. No success in CampHogg goes unpunished! Along the way, he re-discovered a ridiculously odd Kepler target that has three transits from at least two different kinds of planets, neither of which seems periodic. Or maybe it is one planet around a binary host, or maybe worse? That launched an email thread with some Kepler people.

Also at group meeting, Dun Wang showed some near-final tests of the hyper-parameter choices in his data-driven model of the Kepler pixels. It is getting down to details, but details matter. We came up with one final possible simplification of his hyper-parameter choices for him to test this week.

In the afternoon, Alex Malz came by to discuss Spring courses and we ended up working through a menu of possible thesis projects. One that I pitched is so sweet: It is just to write down, very carefully, what we would do if, instead of a redshift catalog, we had a set of low-precision redshift likelihood functions (with SED or spectral nuisance parameters). Could we then get the luminosity function and spatial clustering of galaxies? Of course we could, but we would have to go hierarchical. Is this practical at LSST scale? Not sure yet.
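To make the hierarchical idea concrete, here is a minimal toy sketch: each galaxy carries a Gaussian redshift likelihood rather than a point estimate, and we infer a one-parameter population model by marginalizing each galaxy's redshift under the population density. All numbers and distributions here are invented for illustration; a real version would need the full machinery (interim priors, nuisance parameters, clustering, and so on).

```python
import math
import random

random.seed(42)

# Fake data: each galaxy has a Gaussian redshift *likelihood*, not a point estimate.
truths = [random.gauss(0.6, 0.2) for _ in range(200)]
likelihoods = [(z + random.gauss(0.0, 0.1), 0.1) for z in truths]  # (mean, sigma)

def gaussian(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def ln_marginal(mu_pop, sigma_pop, n_samples=64):
    # For each galaxy, marginalize over its unknown redshift by Monte Carlo:
    # draw from its likelihood and average the population density over the draws.
    total = 0.0
    for mean, sigma in likelihoods:
        acc = 0.0
        for _ in range(n_samples):
            z = random.gauss(mean, sigma)
            acc += gaussian(z, mu_pop, sigma_pop)
        total += math.log(acc / n_samples + 1e-300)
    return total

# Grid over the population mean redshift; the truth (0.6) should be near the peak.
grid = [0.3 + 0.05 * i for i in range(13)]
best = max(grid, key=lambda mu: ln_marginal(mu, 0.25))
```

The point of the exercise: the population parameter is recoverable even though no individual galaxy has a precise redshift, which is exactly why "go hierarchical" is the answer.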

2014-10-28

text, R, and politics

Today at lunch Michael Blanton organized a Data Science event in which Ken Benoit (LSE) told us about quanteda, his package for manipulating text in R. This package does lots of the data massaging and munging that used to be manual work, and gets the text data into "rectangular" form for data analysis. It does lots of data-analysis tasks too, but the munging was very interesting: Part of Benoit's motivation is to make text analyses reproducible from beginning to end. Benoit's example texts were amusing because he works on political speeches. He had examples from US and Irish politics. Some discussion in the room was about Python vs R; the key motivation for working in R is that it is by far the dominant language at the intersection of statistics and political science.
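For concreteness, the "rectangular" form is essentially a document-term matrix: one row per document, one column per vocabulary word. quanteda itself is R; this is just a toy Python sketch of the shape of the idea, with made-up speech snippets (the real package adds tokenization options, stemming, stopwords, n-grams, and much more).

```python
from collections import Counter

docs = {
    "speech_a": "we will cut taxes and we will create jobs",
    "speech_b": "jobs and growth require investment not cuts",
}

# Tokenize crudely and build the vocabulary.
tokens = {name: text.split() for name, text in docs.items()}
vocab = sorted(set(word for words in tokens.values() for word in words))

# The rectangular document-term matrix: rows are documents, columns are words.
dtm = [[Counter(tokens[name])[word] for word in vocab] for name in sorted(docs)]
```

Once the data are in this form, the downstream analysis (scaling, clustering, classification) is ordinary rectangular-data statistics, which is the reproducibility win.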

2014-10-27

nuclear composition of UHECRs

Today Michael Unger (Karlsruhe) told us over lunch about ultra-high energy cosmic rays from Auger. There are many mysteries, but it does look like the composition moves to higher-Z nuclei as you go to higher energies, or at least that's my read. He also told us about a very intriguing extension to Auger which would make it possible to distinguish protons from iron in the ground detectors; if that became possible, it might be possible to do cosmic-ray imaging: It is thought that the cosmic magnetic fields are small enough that protons near the GZK cutoff should point back to their sources. So far this hasn't been possible, presumably because the iron (and other heavy elements) have charge-to-momentum ratios too large; they get heavily deflected by the magnetic fields they encounter.

2014-10-26

Math-Astrophysics collaboration proposal

I spent a big chunk of the day today trying to write a draft of a collaboration proposal (really a letter of intent) for the Simons Foundation. That is only barely research.

2014-10-24

exoplanet compositions

Today Angie Wolfgang (UCSC) gave a short morning seminar about hierarchical inference of exoplanet compositions (like are they ice or gas or rock?). She showed that the super-Earth (1 to 4 Earth-radius) planet radius distribution fairly simply translates into a composition distribution, if you are willing to make the (pretty justified, actually) assumption that the planets are rocky cores with a hydrogen/helium envelope. She inferred the distribution of gas fractions for these presumed rocky planets and got some reasonable numbers. Nice! There is much more to do, of course, since she cut to a very clean sample, and hasn't yet looked at the interdependence of composition, period, and host-star properties. There is a lot to do in exoplanet populations still!

2014-10-22

training convolutional nets to find exoplanets

In group meeting today, a good discussion arose about training a supervised method to find exoplanet transits. Data-Science Masters student Elizabeth Lamm (NYU) is working with us to use a convolutional net (think: deep learning) to find exoplanet transits in the Kepler data. Our rough plan is to train this net using real Kepler lightcurves into which we have injected artificial planets. This will give "true positive" training examples, but we also need "true negative" examples. Since transits are rare, most of the lightcurves would make good negative training data; even if we used all of the non-injected lightcurves indiscriminately as negatives, only a tiny fraction of a percent of them (like a hundredth of a percent) would be mislabeled.
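The injection step is the simple part of the plan. A toy sketch, using a box-shaped dip in place of a proper transit model (all names, periods, and depths here are illustrative; a real injection would use a physical transit shape and the Kepler cadence):

```python
import random

random.seed(0)

def inject_box_transit(flux, period, duration, depth, phase=0):
    # Inject a toy box-shaped transit into a lightcurve (a list of fluxes,
    # one per cadence). Real injections would use a proper transit model.
    out = list(flux)
    for i in range(len(out)):
        if (i - phase) % period < duration:
            out[i] *= (1.0 - depth)
    return out

# A quiet lightcurve with noise becomes a "true positive" training example;
# the un-injected lightcurves are (almost always correctly) labeled negative.
quiet = [1.0 + random.gauss(0.0, 1e-4) for _ in range(1000)]
positive = inject_box_transit(quiet, period=300, duration=5, depth=0.01)
```

The contentious part of the discussion, by contrast, was about which negatives to show the net, not how to make the positives.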

That said, there were various intuitions (about training) represented in the discussion. One intuition is that even this low rate of false negatives might lead to some kinds of over-fitting. Another is that perhaps we should up-weight in the training data true negatives that are "threshold crossing events" or, in other words, places where simple software systems think there is a transit but close inspection says there isn't. We finished the discussion in disagreement, but realized that Lamm's project is pretty rich!

2014-10-21

K2 pointing model

Imagine a strange "game": A crazy telescope designer put thousands of tiny pixelized detectors in the focal plane of an otherwise stable telescope and put it in space. Each detector has an arbitrary position in the focal plane, an arbitrary orientation and pixel scale, and possibly non-square (affine) pixels. But given the stability, the telescope's properties are set only by three Euler angles. How can you build a model of this? Ben Montet (Harvard CfA), Foreman-Mackey, and I worked on this problem today. Our approach is to construct a three-dimensional "latent-variable" space in which the telescope "lives" and then an affine transformation for each detector patch. It worked like crazy on the K2 data, which are the data from the two-wheel era of the NASA Kepler satellite. Montet is very optimistic about our abilities to improve both K2 and Kepler photometry.
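In the real problem both the latent pointing variables and the per-detector affine maps must be inferred jointly (for instance by alternating least squares); the sketch below shows just the per-detector step, fitting an affine map by linear least squares when the three latent numbers per exposure are known. All numbers are simulated and illustrative.

```python
import numpy as np

rng = np.random.default_rng(17)

# Latent pointing state per exposure: three numbers (think: Euler angles).
n_exposures = 500
latent = rng.normal(0.0, 1e-3, size=(n_exposures, 3))

# Each detector patch sees the pointing through its own hidden affine map.
A_true = rng.normal(size=(2, 3))
b_true = rng.normal(size=2)
centroids = latent @ A_true.T + b_true + rng.normal(0.0, 1e-5, size=(n_exposures, 2))

# Recover the affine map by linear least squares on [latent, 1].
X = np.hstack([latent, np.ones((n_exposures, 1))])
coeffs, *_ = np.linalg.lstsq(X, centroids, rcond=None)
A_fit, b_fit = coeffs[:3].T, coeffs[3]
```

Because each step (given the other) is linear, the whole model stays fast even with thousands of detector patches, which is presumably part of why it "worked like crazy".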

2014-10-20

single transits, new physics, K2

In my small amount of research time, I worked on the text for Hattori's paper on single transits in the Kepler data, including how we can search for them and what can be inferred from them. At lunch, Josh Ruderman (NYU) gave a nice talk on finding beyond-the-standard-model physics in the Atlas experiment at LHC. He made a nice argument at the beginning of his talk that there must be new physics for three reasons: baryogenesis, dark matter, and the hierarchy. The last is a naturalness argument, but the other two are pretty strong arguments! In the afternoon, while I ripped out furniture, Ben Montet (Harvard) and Foreman-Mackey worked on centroiding stars in the K2 data.

2014-10-17

three talks

Three great talks happened today. Two by Jason Kalirai (STScI) on WFIRST and the connection between white dwarf stars and their progenitors. One by Foreman-Mackey on the new paper on M-dwarf planetary system abundances by Ballard & Johnson. Kalirai did a good job of justifying the science case for WFIRST; it will do a huge survey at good angular resolution and great depth. He distinguished it nicely from Euclid. It also has a Guest Observer program. On the white-dwarf stuff he showed some mind-blowing color-magnitude diagrams; it is incredible how well calibrated HST is and how well Kalirai and his team can do crowded-field photometry, both at the bright end and at the faint end. Foreman-Mackey's journal-club talk convinced us that there is a huge amount to do in exoplanetary system population inference going forward; papers like Ballard & Johnson only barely scratch the surface of what we might be doing.

2014-10-16

regression of continuum-normalized spectra

I had a short phone call this morning with Jeffrey Mei (NYUAD) about his project to find the absorption lines associated with high-latitude, low-amplitude extinction. The plan is to do regression of A and F-star spectra against labels (in this case, H-delta EW as a temperature indicator and SFD extinction), just like the project with Melissa Ness (MPIA) (where the labels are stellar parameters instead). Mei and I got waylaid by the SDSS calibration system, but now we are working on the raw data, and continuum-normalizing before we regress. This gets rid of almost all our calibration issues. The remaining problem (which I don't know how to solve) is the redshift or rest-frame problem: We want to work on the spectra in the rest frame of the ISM, which we don't know!
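The regression itself is simple: fit each spectral pixel as a linear function of the labels, and read off the pixels that respond to extinction. A toy sketch with fake data (the label names and all numbers are stand-ins; the real project uses SDSS spectra and measured H-delta EWs and SFD extinctions):

```python
import numpy as np

rng = np.random.default_rng(3)

# Fake continuum-normalized spectra: n_stars x n_pixels, driven linearly by labels.
n_stars, n_pixels = 300, 50
labels = np.column_stack([
    rng.normal(5.0, 1.0, n_stars),    # stand-in for H-delta EW (temperature proxy)
    rng.normal(0.1, 0.05, n_stars),   # stand-in for SFD extinction
])
coeff_true = rng.normal(0.0, 0.02, size=(2, n_pixels))
spectra = 1.0 + labels @ coeff_true + rng.normal(0.0, 0.005, (n_stars, n_pixels))

# Pixel-by-pixel linear regression against the labels (plus an offset).
X = np.column_stack([np.ones(n_stars), labels])
coeffs, *_ = np.linalg.lstsq(X, spectra, rcond=None)
extinction_response = coeffs[2]  # how each pixel responds to extinction
```

Pixels with large `extinction_response` would be the candidate absorption features; the rest-frame problem is that this regression implicitly assumes the features line up in wavelength across stars.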

2014-10-15

measuring the positions of stars

At group meeting, Vakili showed his results on star positional measurements. We have several super-fast, approximate schemes that come close to saturating the Cramér–Rao bound, without requiring a good model of the point-spread function.

One of these methods is the (insane) method used in the SDSS pipelines, which was communicated to us in the form of code (since it isn't fully written up anywhere). This method (due to Lupton) is genius, fast, runs on minimal hardware with almost no overhead, and comes close to saturating the bound. Another of these is the method made up on the spot by Price-Whelan and me when we wrote this paper on digitization bandwidth, with a small modification (involving smoothing (gasp!) the image); the APW method is simpler and faster than the SDSS method on modern compute machinery.
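A toy version of the smooth-then-first-moment idea, to show why it can work without any PSF model: smoothing suppresses the noise, and the first moment of the (background-subtracted) smoothed image is a fast centroid estimate. This is not the actual APW implementation; the function, the background handling, and all the numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(8)

def smoothed_centroid(image, sigma=1.0):
    # Smooth with a small Gaussian, then take the first moment.
    ny, nx = image.shape
    y, x = np.mgrid[0:ny, 0:nx]
    sm = np.zeros_like(image)
    # Brute-force smoothing for clarity (scipy.ndimage would do this faster).
    for j in range(ny):
        for i in range(nx):
            w = np.exp(-0.5 * ((y - j) ** 2 + (x - i) ** 2) / sigma ** 2)
            sm[j, i] = np.sum(w * image) / np.sum(w)
    sm = np.clip(sm - np.median(sm), 0.0, None)  # crude background subtraction
    return np.sum(x * sm) / np.sum(sm), np.sum(y * sm) / np.sum(sm)

# A fake star at (6.3, 4.7) on a 12x12 postage stamp, plus noise.
ny, nx = 12, 12
y, x = np.mgrid[0:ny, 0:nx]
star = np.exp(-0.5 * ((x - 6.3) ** 2 + (y - 4.7) ** 2) / 1.5 ** 2)
image = star + rng.normal(0.0, 0.01, star.shape)

cx, cy = smoothed_centroid(image)
```

Note that nothing in this estimator knows the true PSF, which is the appeal relative to full-up PSF fitting.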

Full-up PSF modeling should beat (very slightly) both of these methods, but it degrades in an unknown way as the PSF model becomes wrong, and who is confident that he or she has a perfect PSF model? Vakili is going to have a nice paper on all this; we started writing it just as an aside to other things we are doing, but we realized that much of what we are learning is not really in the literature. Let's hear it for the analysis of astronomical engineering infrastructure!

2014-10-14

software and literature; convex problems

Fernando Perez (Berkeley), Karthik Ram (Berkeley), and Jake Vanderplas (UW) all descended on CampHogg today, and we were joined by Brian McFee (NYU) and Jennifer Hill (NYU) to discuss an idea hatched by Hill at Asilomar to build a system to scrape the literature—both refereed and informal—for software use. The idea is to build a network and a recommendation system and alt metrics and a search system for software in use in scientific projects. There are many different use cases if we can understand how papers made use of software. There was a lot of discussion of issues with scraping the literature, and then some hacking. This has only just begun.

At lunch, I visited the Simons Center for Data Analysis. I ended up having a long conversation with Christian Mueller (Simons) about the intersection of statistics with convex optimization. Among other things, he is working on principled methods for setting the hyperparameters in regularized optimizations. He told me many things I didn't know about convex problems in data analysis. In particular, he indicated that there might be some very clever and provably optimal (or non-sub-optimal) ways to reduce the feature space for the "Causal Pixel Model" for Kepler pixels that Wang is working on.

2014-10-10

Kepler occurrence rate review, day 2

Today the review committee wrote up and presented recommendations to the Kepler team on its close-out planet occurrence rate inference plans. We noted that the big uncertainties in the occurrence rate, especially near Earth-like planets, are at the factor-of-two level and larger, so we recommended that the team focus on the big things and not spend time tracking down percent-level effects. After the review I had long talks with Jon Jenkins (Ames) and Tom Barclay (Ames) about Kepler projects and tools.

2014-10-09

Kepler occurrence rate review, day 1

Today I got up at dawn's crack and drove to Mountain View for a review of the NASA Kepler team's planet occurrence rate inferences. It was an incredible day of talks and conversations about the data products and experiments needed to turn Kepler's planet (or object-of-interest) catalog into a rate density for exoplanets, and especially the probabilities that stars host Earth-like planets. We spent time talking about high-level priorities, but also low-level methodologies, including MCMC for uncertainty propagation, adaptive experimental design for completeness (efficiency) estimation, and the relative merits of forward modeling and counting planets in bins. On the latter, the Kepler team is creating (and will release publicly) everything needed for either approach.

One thing that pleased me immensely is that Foreman-Mackey's paper on the abundance of Earth analogs got a lot of play in the meeting as an exemplar of good methodology, and also an exemplar of how uncertain we are about the planet occurrence rate! The Kepler team—and increasingly the whole astronomical community—is coming around to the view that forward modeling methods (as in hierarchical probabilistic modeling or approximate Bayesian computation) are preferable to counting dots in bins.
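A toy single-bin version makes the comparison concrete: with known completeness, counting-in-bins and a Poisson forward model give the same point estimate, but the forward model carries a likelihood that generalizes to rate densities and honest uncertainty propagation. All numbers below are invented for illustration.

```python
import math

# Toy occurrence-rate inference for a single bin in radius--period space.
# A survey of n_stars with detection completeness comp in this bin
# yields k detected planets. (All numbers are made up.)
n_stars, comp, k = 100000, 0.05, 40

# Counting-in-bins estimate: divide the counts by the completeness.
rate_binned = k / (comp * n_stars)

# Forward-model (Poisson likelihood) estimate: maximize
#   ln L(rate) = k ln(rate * comp * n_stars) - rate * comp * n_stars
# over a grid. For a single Poisson bin this peaks at the same value,
# but the likelihood is what generalizes to rate *densities* and to
# propagating detection uncertainties hierarchically.
def lnlike(rate):
    lam = rate * comp * n_stars
    return k * math.log(lam) - lam

grid = [i * 1e-4 for i in range(1, 200)]
rate_ml = max(grid, key=lnlike)
```

The disagreement between the approaches only shows up once the completeness is uncertain, the "planets" are really candidates with false-positive probabilities, and the bins get small, which is exactly the Earth-analog regime the meeting was about.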

2014-10-08

DSE Summit, day 3

On the last day of the Summit, we spent the full meeting talking about the collaboration and deliverables for the funding agencies. That does not qualify as research. Late in the day I had a revelation about the relationship between ethnography and science. They are related, but not really the same. Some of the conclusions of ethnography have a factual or hypothesis-generating character, but ethnographic results do not really live in the same domain as scientific results. That is no knock on ethnography! Ethnographers can ask questions that we don't even know how to start to ask quantitatively.