cosmic neutrino background

Ben Safdi (Princeton) gave the high-energy physics seminar today, about the possibility of detecting the cosmic neutrino background. The neutrinos are cold (like the CMB) and really, really hard to detect. Amazingly, there is a plan and a project underway. The idea is to find the neutrino-capture events that take a neutron to a proton; these have no energy threshold but in the electron output are kinematically unlike neutrino decays. The experiment requires a huge amount of tritium (in a crazy low-density foam-like setup), incredible energy resolution, and some luck. But it looks conceivably successful. In the afternoon, Chang Hoon Hahn (NYU) gave a nice candidacy presentation in which he showed a (preliminary) three-point function (bispectrum) from SDSS-III BOSS. It looks incredible.


data textification?

Arfon Smith (Github) was in town for the day, as was Josh Bloom (Berkeley). We spent a good deal of the morning talking about matters of mutual interest, also with Foreman-Mackey. One idea we batted around was topic modeling for code repositories on Github. It would be so cool to find other codebases that are about similar subjects and not just in the same language. We split for a bit into pairs, with Bloom and me discussing probabilistic astrometric calibration. He has a plan to fit the focal-plane distortions in a telescope image with a Gaussian Process, which is very much aligned with current ideas at CampHogg. After Bloom left, Smith, Foreman-Mackey, and I discussed things we would like to do with or see in Github. In particular we discussed the great value of the (parody, non-serious) open-source report card that Foreman-Mackey built last year. It is valuable because it is an information-rich text-based description of Github activity; it is like a textification (as opposed to a visualization or a sonification) and it provides heterogeneous detail (in a humorous way). What else could be done like that? Somehow this all relates to evaluation and metrics. Imagine a system that could "summarize" an astronomer's full publication history, or even better full publication, hardware-building, and code-writing history.


Moore-Sloan Data Science Environment at NYU

Today was the launch event for the Moore-Sloan Data Science Environment at NYU, of which I am the Executive Director. This would not be research (see The Rules at right) except that the event featured six extremely good, short science talks.

  • Cranmer (NYU Physics) talked about the discovery and measurement of the Higgs. He showed that the complicated "Data Science" part of the problem was combining the code and scientific analyses of dozens of disparate groups doing disparate things with the data.
  • Bonneau (NYU Biology and CS) talked about relationships between genes and microbes. Much of what they do is infer networks of interactions among genes. But he showed some beautiful stuff on inferring differential equations from noisy data, which is relevant across many disciplines.
  • Pesaran (NYU Neural Science) spoke about massive (relatively) non-invasive recording efforts in monkey and human brains. His lab can record from thousands of sites simultaneously, but at the brain's surface. He gets large data volume at some loss of resolution, and can record in normal situations, while also capturing movement and behavior with large networks of cameras and other devices. He is working on the brain–behavior relationships, but also has a good chance of being able to make very high capability brain–computer interfaces!
  • Freire (NYU Engineering) showed what can be done with data from cities; focusing on amazing things they have learned by visualizing the huge amounts of data NYC keeps about taxicabs. They can see many interesting events in the data, and are looking at automatically flagging and identifying events both in historical data and in real time.
  • Tucker (NYU Politics) spoke about political information available on the web from Twitter and similar sources. He showed results on events and subjects where political polarization is high, others where it is low, and others where it changes quickly in time (one example: Sandy Hook shooting).
  • Fergus (NYU CS) showed our work on Oppenheimer's (AMNH) P1640 data and how it is possible to extract faint signals from data with a generative model. Both Fergus and Cranmer showed non-trivial probabilistic graphical models, which was super.
I loved the event and learned a huge amount. There were also abundant posters in a poster session, and participants from all over the University and from all over the city. Also from UW and UCB, our partners.


UnDisLo, day 5

We checked out of the UnDisLo in the morning and drove up to UCSC to spend the day with Conroy, Johnson, Weisz, and Prochaska. At lunch I put on my cray-cray and talked about my theory that any solar-power device will have non-trivial absorption and reflection spectra, because to be efficient it has to stay cold while absorbing lots of energy. If I am right (and I probably am not), this might lead to reliable spectral signatures of life. After that we spent quality time planning paper zero on spectroscopic calibration, tentatively titled "Combining information from photometric and spectroscopic data: Don’t waste your time in spectrophotometric calibration!". We opened a github repo and started writing.


UnDisLo, day 4

Today was exoplanet day at our undisclosed location. Quintana and Barclay (both Ames) were in the house; we worked with Barclay on some of the new K2 data. Foreman-Mackey worked on building a physical model along the lines of our white paper. I worked on making optimized aperture photometry. The latter worked well, at least relative to anything the Kepler team has done so far. The K2 pointing and precision are good, but not nearly as good as in the Kepler era. That said, I think we can recover most of the precision through Good Data Analysis (tm). (Just like we said in the white paper.)

Wolfgang (UCSC) told us about her model for the period distribution of exoplanets, which she is implementing in JAGS. We gave her suggestions for enfastenating it. The key idea is that, in her framework, it might be better to model the observed distribution of transiting exoplanets, rather than the true distribution of (either transiting or not) exoplanets. In the latter case you do a lot of modeling of planets you will never see. Or at least not in Kepler.

Because both the exoplanet people and the star-formation people (Conroy, Weisz, Johnson of UCSC) are using Gaussian processes, Foreman-Mackey was pressed into an impromptu tutorial in the afternoon. He rocked it, and Conroy encouraged us in no uncertain terms to write it up in the Data Analysis Recipes series.


UnDisLo, day 3

Another impressively productive day just happened in our work bunker on the Central Coast. Conroy (UCSC) came in again, with his student Jieun Choi (UCSC). We discussed ways to generalize "stacking" of spectra in "bins"; it is often (usually?) better to regress. This can be expressed as a wavelength-by-wavelength weighted linear least-squares fit. We worked through the linear-regression math in the morning and Choi implemented it in the afternoon. The results look sweet. Choi used them to look for consistent residuals in rest-frame (redshifted) and observed-frame (spectrograph) space and it looks like there are effects in both places. At the end of the day, we left Choi with the project of going to quadratic regression.
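
Here is a minimal sketch of what "regress instead of stack" can look like. It is not Choi's code. It assumes all spectra are already on a common rest-frame wavelength grid, that each pixel has an inverse variance, and that there is a single galaxy property `x` (say, log luminosity) to regress against; the function name `regress_spectra` and the toy data are made up for illustration.

```python
import numpy as np

def regress_spectra(flux, ivar, x):
    """Fit flux[i, k] ~ a[k] + b[k] * (x[i] - mean(x)) independently at each wavelength k.

    flux, ivar : arrays of shape (n_spectra, n_wavelengths)
    x          : array of shape (n_spectra,), one property per spectrum
    Returns (a, b), each of shape (n_wavelengths,).
    """
    dx = x - np.mean(x)
    A = np.vstack([np.ones_like(dx), dx]).T        # design matrix, shape (n_spectra, 2)
    n_wave = flux.shape[1]
    coeffs = np.zeros((2, n_wave))
    for k in range(n_wave):
        w = ivar[:, k]                             # inverse-variance weights at this wavelength
        ATA = A.T @ (w[:, None] * A)
        ATy = A.T @ (w * flux[:, k])
        coeffs[:, k] = np.linalg.solve(ATA, ATy)   # weighted least-squares solution
    return coeffs[0], coeffs[1]

# Toy data (entirely synthetic): 200 spectra, 5 wavelength pixels.
rng = np.random.default_rng(0)
n, m = 200, 5
x = rng.uniform(-1.0, 1.0, n)
true_a = np.linspace(1.0, 2.0, m)                  # mean spectrum
true_b = np.linspace(-0.5, 0.5, m)                 # dependence of the spectrum on x
sigma = 0.05
flux = true_a + np.outer(x - x.mean(), true_b) + sigma * rng.normal(size=(n, m))
ivar = np.full((n, m), 1.0 / sigma**2)

a, b = regress_spectra(flux, ivar, x)
```

Stacking in a bin is the special case in which only the intercept `a` is fit to the spectra in that bin. Going to quadratic regression amounts to adding a `dx**2` column to the design matrix.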

Marshall (KIPAC) and I worked on extending The Tractor to fit variable point sources, as part of our long-term goal of finding strong gravitational lenses in ground-based data by comparing lens and non-lens explanations of the data. We spent a lot of time understanding inheritance and Lang's pythonisms.

Johnson (UCSC) showed us the results of his Gaussian-Process data-driven calibrations of spectra. They are extremely precise, produce sensible-looking posterior information on spectral properties, and don't seem to erase or interfere with narrow-line issues with the models. We plan to write the method paper this summer.

At the very end of the day, Barclay, Quintana (Ames), and Wolfgang (UCSC) showed up, so I guess we will be talking exoplanets tomorrow! Quintana and Barclay just found the first Earth-radius planet in the habitable zone of a star, and Wolfgang is using physics-based models to fit the radius–period distribution of exoplanets.


UnDisLo, day 2

There is nothing like being isolated in the middle of nowhere with bad internet! Johnson (UCSC) and Foreman-Mackey worked on Johnson's problem of simultaneously fitting photometry and spectroscopy, with an extremely flexible model for spectrograph calibration (or calibration residuals). The key idea that moved us forward is that you can fit the large-scale calibration "vector" with a polynomial and then pick up the small remaining residuals with a Gaussian Process. The latter is analytic to marginalize out, so it doesn't increase the number of parameters in the MCMC sampling (and hence doesn't hurt much CPU-wise). The results are beautiful, even on completely uncalibrated spectra: People: Don't waste your time on calibration! I think we will be able to write a very strong paper making this point, for a large set of spectroscopy use-cases.
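
To make the "analytic to marginalize" point concrete, here is a hedged sketch of a marginalized likelihood with a polynomial mean and Gaussian Process residuals. It is not Johnson's model: the squared-exponential kernel, the function names, and the toy calibration vector are all assumptions made for illustration. The only sampled parameters are the polynomial coefficients and the two kernel hyperparameters; the residual function itself never appears as parameters.

```python
import numpy as np

def sq_exp_kernel(x1, x2, amp, scale):
    """Squared-exponential covariance (an assumed kernel choice)."""
    d = x1[:, None] - x2[None, :]
    return amp**2 * np.exp(-0.5 * (d / scale)**2)

def ln_marginal_likelihood(y, x, yerr, poly_coeffs, amp, scale):
    """ln p(y | polynomial, kernel), with the GP residual function integrated out."""
    r = y - np.polyval(poly_coeffs, x)                  # residual away from the polynomial
    K = sq_exp_kernel(x, x, amp, scale) + np.diag(yerr**2)
    L = np.linalg.cholesky(K)                           # K = L L^T
    alpha = np.linalg.solve(L, r)                       # L^{-1} r, so alpha.alpha = r^T K^{-1} r
    ln_det = 2.0 * np.sum(np.log(np.diag(L)))           # ln det K
    return -0.5 * (alpha @ alpha) - 0.5 * ln_det - 0.5 * len(x) * np.log(2.0 * np.pi)

# Toy calibration "vector": smooth polynomial plus small wiggles plus noise.
rng = np.random.default_rng(42)
x = np.linspace(0.0, 1.0, 50)                           # scaled wavelength
yerr = np.full_like(x, 0.01)
true_coeffs = np.array([0.3, -0.2, 1.0])
y = np.polyval(true_coeffs, x) + 0.02 * np.sin(40.0 * x) + yerr * rng.normal(size=x.size)

ln_true = ln_marginal_likelihood(y, x, yerr, true_coeffs, amp=0.02, scale=0.05)
ln_wrong = ln_marginal_likelihood(y, x, yerr, np.zeros(3), amp=0.02, scale=0.05)
```

In an MCMC run, `ln_marginal_likelihood` would be called once per proposal; the cost is one Cholesky factorization of a matrix whose size is the number of pixels.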

Weisz (UCSC) and Foreman-Mackey worked on hierarchical inference of the initial mass function of stars, given stellar-population fits to a large number of resolved clusters in M31. Each cluster gets a very different IMF in a point-estimate sense (maximum-likelihood or whatever), but is this variance intrinsic or just from observational noise? We did our usual (should be patented!) importance-sampling thing to infer the intrinsic distribution and find that the data are at least marginally consistent with a delta-function (narrow) distribution. But we are only looking at a tiny fraction of the available data.
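
A sketch of the importance-sampling trick, under loudly stated assumptions: each cluster comes with posterior samples of its IMF slope drawn under a flat interim prior, and the intrinsic distribution is taken to be a Gaussian with mean `mu` and width `s`. The function name and the toy numbers are hypothetical, and none of this is the M31 analysis itself.

```python
import numpy as np

def ln_hyper_likelihood(samples, mu, s, prior_width):
    """Likelihood of population parameters (mu, s) from per-object posterior samples.

    samples     : array (n_objects, n_samples), drawn under a flat interim prior
    prior_width : width of that flat interim prior (so its density is 1 / prior_width)
    """
    ln_pop = -0.5 * ((samples - mu) / s)**2 - np.log(s) - 0.5 * np.log(2.0 * np.pi)
    ln_interim = -np.log(prior_width)
    ln_w = ln_pop - ln_interim                     # importance weights: population / interim prior
    m = ln_w.max(axis=1, keepdims=True)            # stabilize the log-mean-exp
    ln_mean = m[:, 0] + np.log(np.mean(np.exp(ln_w - m), axis=1))
    return np.sum(ln_mean)                         # product over objects, in the log

# Toy problem: every cluster has the same true slope, but noisy measurements.
rng = np.random.default_rng(1)
n_obj, n_samp = 30, 500
true_slope, noise = 2.3, 0.3
obs = true_slope + noise * rng.normal(size=n_obj)
samples = obs[:, None] + noise * rng.normal(size=(n_obj, n_samp))

ln_narrow = ln_hyper_likelihood(samples, mu=2.3, s=0.1, prior_width=10.0)
ln_broad = ln_hyper_likelihood(samples, mu=2.3, s=1.0, prior_width=10.0)
ln_offset = ln_hyper_likelihood(samples, mu=1.0, s=0.1, prior_width=10.0)
```

In this toy case the narrow population is preferred, even though the point estimates `obs` scatter by the full measurement noise. The estimate gets noisy when `s` is much smaller than the per-object posterior width, because few samples land where the population has support.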

Late in the day, Marshall showed up! Tomorrow we will do some gravitational lensing.


UnDisLo, day 1

Foreman-Mackey and I drove down to an undisclosed location on Monterey Bay to work with Charlie Conroy, Dan Weisz, and Ben Johnson (all UCSC). On the way down we discussed my new optimized photometry program. Once we arrived, we got to planning the week of hacking. We decided to focus on a few areas of mutual interest involving non-trivial data analysis. One area is combining spectroscopic and photometric information on galaxies and stars, where the spectroscopy is less reliable but far larger in total bytes. We have ideas about this. Another is learning a population distribution from noisy measurements. We have done this for exoplanets and photometric quasars and so on; Foreman-Mackey and I want to build general tools. Another area is learning the dependence of average (mean) spectra of galaxies on intrinsic properties like luminosity, redshift, metallicity, and velocity dispersion; Conroy has done great work in this area with blunt tools. We can help sharpen those. Should be a fun week!



I was in CA this week working on the Moore–Sloan Data Science Environment. This doesn't exactly count as research, so I haven't been posting. But today I crashed a meeting in Napa Valley hosted by Wechsler (KIPAC), Conroy (UCSC), and others. I saw just a few talks, but they were excellent: Jeremiah Murphy (UFl) on supernova explosions, Conroy on abundance anomalies in globular clusters, Blanton (NYU) on photometry, Finkbeiner (CfA) on photometric calibration, and Sarah Tuttle (UT) on the HETDEX spectrograph hardware. Great stuff.

Murphy showed us that there are crazy neutrino dynamics in the first fraction of a second in a supernova explosion; in particular there should be stellar oscillations imprinted on the neutrino signal! Conroy showed that there are light-element vs heavy-element abundance anti-correlations in essentially all globular clusters, and indications that some stars are very over-rich in helium. There is no good explanation. Blanton went carefully through the properties of astronomical imaging and photometry, for two hours. I loved it, and at the end, Kollmeier (OCIW) said she wanted more! Finkbeiner showed that PanSTARRS and SDSS have great, precise, consistent photometry, and the calibration is all, entirely, self-calibration. This justifies strongly things I said at AAS this year. Tuttle talked about trade-offs in hardware design. The mass production of spectrographs for HETDEX is a huge engineering challenge.


red giants as clocks

Lars Bildsten (KITP) was in town and gave two talks today. In the first, he talked about super-luminous supernovae, and how they might be powered by the spin-down of the degenerate remnant, when spin-down times and diffusion times become comparable. In the second, he talked about making precise inferences about giant stars from Kepler and COROT photometry. The photometry shows normal modes and mode splittings, which are sensitive to the run of density in the giants; this in turn constrains what fraction of the star has burned to helium. There is a lot of interesting unexplained phenomenology related to the spin of the stellar core, which remains a puzzle. There was much more in the talk as well, but one thing that caught my interest is that some of the modes are exceedingly high in quality factor or coherence. That is, giants look like very good clocks. A discussion broke out at the end about whether or not we could use these clocks to constrain, detect, or measure gravitational radiation. Each star is much worse than a radio pulsar, but there are far, far more of them available for use. Airplane project!


probabilistic halo mass inference

In a low-research day, at lunch, Kilian Walsh pitched to Fadely and me a project to infer galaxy host halo masses from galaxy positions and redshifts. We discussed some of the issues and previous work. I am out of the loop, so I don't know the current literature. But I am sure there is interesting work that can be done, and it would be fun to combine galaxy kinematic information with weak lensing, strong lensing, x-ray, and SZ effect data.


permitted kernel functions, cosmology therewith

I spent a while at the group meeting of applied mathematician Leslie Greengard (NYU, Simons Foundation), telling the group how cosmology is done, and then how it might be done if we had some awesome math foo. In part we got on to how you could make a non-parametric kernel function for a Gaussian Process for the matter density field at late times, given that you need to stay non-negative definite. Oh wait, I mean positive semi-definite. Oh the things you learn! Anyway, it turns out that this is not really a solved problem and possibly a project was born. Hope so! I would love to recreate our discovery of the baryon acoustic feature with proper inference. At the group meeting, Foreman-Mackey and I had an "aha moment" about Ambikasaran et al's method for solving and taking the determinants of kernel matrices (Siva Ambikasaran (NYU) was in attendance), and then spent the post-group-meeting lunch in part quizzing Mike O'Neil (NYU) about how to structure our code to work fast in the three-dimensional case (the cosmology case).
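
One standard route to a flexible but permitted kernel (not necessarily the one this project would take) is Bochner's theorem: a stationary kernel is positive semi-definite when its power spectrum is non-negative. The sketch below is a one-dimensional toy, with made-up frequencies and powers; it builds a kernel from non-negative spectral bins and contrasts it with a top-hat "kernel" that looks harmless but is not permitted.

```python
import numpy as np

def kernel_from_spectrum(x, omegas, powers):
    """Stationary kernel k(r) = sum_j powers[j] * cos(omegas[j] * r), evaluated on points x.

    Positive semi-definite whenever every power is non-negative, because
    cos(w (x - x')) = cos(w x) cos(w x') + sin(w x) sin(w x').
    """
    powers = np.asarray(powers, dtype=float)
    omegas = np.asarray(omegas, dtype=float)
    if np.any(powers < 0):
        raise ValueError("spectral powers must be non-negative")
    d = x[:, None] - x[None, :]
    return np.sum(powers[:, None, None] * np.cos(omegas[:, None, None] * d[None, :, :]), axis=0)

def min_eigenvalue(K):
    """Smallest eigenvalue of a symmetric matrix; negative means not a valid covariance."""
    return np.linalg.eigvalsh(K).min()

# A flexible kernel built from eight non-negative spectral bins (arbitrary toy values).
x = np.linspace(0.0, 10.0, 40)
omegas = np.linspace(0.1, 5.0, 8)
powers = np.array([1.0, 0.5, 0.0, 0.2, 0.8, 0.1, 0.0, 0.3])
K_good = kernel_from_spectrum(x, omegas, powers)

# A top-hat in separation is NOT a permitted kernel function.
x3 = np.array([0.0, 1.0, 2.0])
K_bad = (np.abs(x3[:, None] - x3[None, :]) <= 1.0).astype(float)

eig_good = min_eigenvalue(K_good)   # zero up to round-off
eig_bad = min_eigenvalue(K_bad)     # 1 - sqrt(2), which is negative
```

The non-parametric freedom lives in `powers`; the constraint is just that each one stays non-negative. The three-dimensional cosmology case would need the analogous isotropic transform, which is not attempted here.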


fit all your streams, gamma-Earth

I spoke with Kathryn Johnston's group by phone for a long time at midday, about the meeting last week at Oxford. I opined that "the competition" is going to stick with integrable orbits for a while, so we can occupy the niche of more general potentials and orbit families. We discussed at some length the disagreement between Sanders (Oxford) and Bovy about how and why streams are different from orbits. Towards the end of that meeting, we discussed Price-Whelan's PhD projects, in which he wants to balance theory and real-data inference. I argued strongly that Price-Whelan should follow the Branimir Sesar (MPIA) "plan" which is to fit all the known streams and use those fits to figure out what observations are most crucial to do next. Plus maybe some theory.

In the afternoon, Foreman-Mackey and I discussed figures and content for his "gamma-Earth" paper (not "eta-Earth" but "gamma-Earth"). We decided to choose a fiducial model, work that through completely, and show all the other things we know as adjustments to that fiducial model. We also discussed how to show everything on one big figure (which would be great, for talks and the paper). Foreman-Mackey told me that the Tremaine papers on planet occurrence get the likelihood function for the variable-rate Poisson problem correct (including overall normalization); our only "advances" relative to the Tremaine papers are that we have a more flexible functional form for the rate function and its prior, and we fully account for the observational uncertainties (which basically no-one knows how to do at this point).
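
For reference, the variable-rate (inhomogeneous) Poisson likelihood has the log form: sum over detected objects of the log rate at each object, minus the integral of the rate over the observed domain. The sketch below assumes a power-law rate in a single variable `x` on a finite range; that functional form, the function name, and the toy data are illustrative assumptions, not the model in the paper. It also leaves out observational uncertainties and detection efficiency entirely.

```python
import numpy as np

def ln_poisson_process_likelihood(x_obs, amp, alpha, lo, hi):
    """ln L for a rate function rate(x) = amp * x**(-alpha) on the interval [lo, hi].

    ln L = sum_i ln rate(x_i) - integral_lo^hi rate(x) dx
    The integral term is the expected number of objects; it carries the normalization.
    """
    x_obs = np.asarray(x_obs, dtype=float)
    ln_rate_sum = np.sum(np.log(amp) - alpha * np.log(x_obs))
    if np.isclose(alpha, 1.0):
        expected = amp * np.log(hi / lo)
    else:
        expected = amp * (hi**(1.0 - alpha) - lo**(1.0 - alpha)) / (1.0 - alpha)
    return ln_rate_sum - expected

# Toy data: five objects between lo = 1 and hi = 10.
x_obs = [1.5, 2.0, 3.0, 4.5, 7.0]
lo, hi = 1.0, 10.0

# With a flat rate (alpha = 0), the best amplitude is N / (hi - lo).
amp_best = len(x_obs) / (hi - lo)
ln_best = ln_poisson_process_likelihood(x_obs, amp_best, 0.0, lo, hi)
ln_low = ln_poisson_process_likelihood(x_obs, 0.3, 0.0, lo, hi)
ln_high = ln_poisson_process_likelihood(x_obs, 0.9, 0.0, lo, hi)
```

Dropping the integral term is the classic way to get the normalization wrong: without it the likelihood increases without bound as the amplitude grows.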


probabilistic grammar, massive graviton

In a low-research day, I saw two absolutely excellent seminars. The first was Alexander Rush (MIT, Columbia) talking about methods for finding the optimal parsing or syntactical structure for a natural-language sentence using Lagrangian relaxation. The point is that the number of parsings is combinatorially large, so you have to do clever things to find good ones. He also looked at machine translation, which is a closely related problem. At the end of his talk he discussed extraction of structured information from unstructured text, which might be applicable to the scientific literature.

Over lunch, Sergei Dubovsky (NYU) spoke about massive graviton theories and the recent BICEP2 results. He started by explaining that there are non-pathological gravity modifications in which the graviton is massive in its tensor effects, but doesn't get messed up in its scalar and vector effects. This means you have no change to the "force law" as it were (nor the black-hole solutions nor the cosmological world model) but you modify gravitational radiation. He then said two amazing things: The first is that the BICEP2 result, if it holds up, will put the strongest ever bound on the graviton mass, because it means that gravitational radiation propagated a significant fraction of a Hubble length. The second is that the BICEP2 data are better fit by a model with a tiny but nonzero graviton mass than by the standard massless theory. That's insane! But of course early days and much skepticism about the data, let alone the theory. Great talks today!



In the morning, Juna Kollmeier (OCIW) gave a great talk on the intergalactic radiation fields (called "metagalactic" for reasons I don't understand). She has found a serious conflict between what is computed by any reasonable sum of sources, what is inferred from the outskirts of galaxies, and what is needed for local IGM studies. One possible resolution, which she was not particularly endorsing, is heating from dark-matter decay or annihilation. Neal Weiner (NYU) loved that idea, for obvious reasons. During the talk, several good project ideas came up, some of them related to the kinds of things Schiminovich has been thinking about, and some related to SDSS-IV MANGA data. Kollmeier convinced us that a next-generation experiment will just see the IGM!

After lunch, Bob Kirshner (CfA) gave a nice talk about how much more precise supernova cosmology might become if we could switch to (or include) rest-frame near-infrared imaging. He endorsed WFIRST pretty strongly! He also agreed explicitly that getting more SNe is not valuable unless there are associated precision or redshift-distribution improvements. That is, the SNe are systematics-limited; hence his concentration on infrared data, where precision is improved.

Late in the afternoon, Vakili sketched out a fully probabilistic approach to interpolating the point-spread function in imaging between observed stars (to, for example, galaxies being used in a weak-lensing study). Again with the Gaussian Processes. They are so damned useful!


smoothness priors

Foreman-Mackey and I had a long discussion about how to normalize smoothness priors. That is, if you just "regularize" a fit using differences between bin heights (think: making a smooth histogram), it is hard to compute analytically the resulting implicit prior. In the end we decided to use a proper Gaussian Process prior on our histogram bin heights, because then at least the normalization is a determinant, and we can now compute those super fast. In general: If you can solve a problem with a mature technology or else invent something yourself, you should use the mature technology! In this case, that's Gaussian Processes.
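
A small sketch of the distinction, with assumed details: a squared-exponential kernel on bin centers (plus a little jitter) for the Gaussian Process prior, and a first-difference penalty for the "regularizer". The function names and numbers are hypothetical. The difference penalty corresponds to a Gaussian with a singular precision matrix, so it has no finite normalization; the Gaussian Process prior is normalized by a determinant.

```python
import numpy as np

def ln_gp_prior(h, centers, mean, amp, scale, jitter=1e-6):
    """Properly normalized ln p(h) for bin heights h under a GP prior."""
    d = centers[:, None] - centers[None, :]
    K = amp**2 * np.exp(-0.5 * (d / scale)**2) + jitter * np.eye(len(centers))
    r = h - mean
    sign, ln_det = np.linalg.slogdet(K)            # the normalization is this determinant
    return -0.5 * (r @ np.linalg.solve(K, r)) - 0.5 * ln_det - 0.5 * len(h) * np.log(2.0 * np.pi)

def difference_penalty_precision(n, lam):
    """Precision matrix implied by the penalty lam * sum_i (h[i+1] - h[i])**2."""
    D = np.diff(np.eye(n), axis=0)                 # first-difference operator, shape (n-1, n)
    return lam * D.T @ D

# The penalty cannot see a constant offset, so its precision matrix is singular.
Q = difference_penalty_precision(10, lam=4.0)
smallest = np.linalg.eigvalsh(Q).min()             # zero up to round-off

# The GP prior is proper, and it still prefers smooth histograms to jagged ones.
centers = np.arange(10.0)
smooth = np.sin(centers / 3.0)
jagged = np.where(np.arange(10) % 2 == 0, 1.0, -1.0)
ln_smooth = ln_gp_prior(smooth, centers, 0.0, amp=1.0, scale=1.5)
ln_jagged = ln_gp_prior(jagged, centers, 0.0, amp=1.0, scale=1.5)
```

Because `ln_gp_prior` is normalized, its value can be compared across different `amp` and `scale`, which is what you need if you want to sample or optimize the smoothness hyperparameters.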


NRFG, day three

We did informal discussion and wrap-up at the workshop today. In that discussion, we tried to focus on next steps for dynamics and inference, in the context of Gaia's upcoming early data release. In some ways the clearest conclusion from this discussion came from Sesar (MPIA), who said that we should analyze all the data we have on each Milky Way stream, to find out both what it tells us about the MW potential, and also what new data (better distances, more radial velocities, and so on) would bring us. That would permit us to plan the next round of observing proposals and surveys.

The last agenda item was Bovy showing us all galpy. Binney (Oxford) and Rix both agreed that we should be building public code bases in this style. Bovy's code is beautifully documented and decorated with tutorials.

In the car to the airport, Rix, Schlafly (MPIA), and I discussed the three-dimensional dust map. I opined that we might be able to apply a spatial prior to the map in a post-processing step, and Schlafly agreed in principle. I promised to look at the question on the flight home. If it works, it is great for my interim-sampling, importance-sampling brand!


NRFG, day two

Today was streams day at the Heidelberg–Oxford meetup. Sanders (Oxford) and Bovy showed their tidal-stream-modeling machineries, with emphases on the relationship between action-space and angle-space structure, or really frequency-space and angle-space structure. They both work only in integrable potentials, which creates one of the opportunities that Price-Whelan and I might exploit. That said, Sanders and Bovy have both developed some great computational simplicities that make their methods far faster than ours. For example, they can compute actions in any potential fast, angles pretty fast, and use affine approximations to local transformations to speed up integration over "true" phase-space positions. Bovy argued that Sanders's result from last year on streams not following orbits needs adjustment, when you consider the full frequency distribution hiding in every section of the stream. Sesar showed beautiful data on the Orphan Stream and advertised new work on RR Lyrae stars. He showed pretty convincingly that the Orphan Stream just ends abruptly at a location in which we could easily still observe it. Odd!

Schlafly (MPIA) and Sale (Oxford) showed work on three-dimensional dust mapping, which is essential both for understanding the stars and also for providing dust as a new tracer of the potential. Sale is working on non-parametrics with Gaussian Processes, like Bailer-Jones (MPIA) and Hanson (MPIA), while Schlafly is more old-school with independent angular pixels. That said, Schlafly has a complete map! We all emphasized the value for Schlafly of publishing not just the three-dimensional map, but also an easy-to-use tool for querying it.

Sormani (Oxford) made novel use of the "earth-mover distance" to compare features in images (models of the (l, v) distribution of gas in the Galactic Center and also data). The day ended with Martig (MPIA) showing beautiful galaxy simulations to investigate out-of-plane disk structure. It looks like the Monoceros-type stuff seen at the outskirts of the Milky Way might be quite typical.
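
For intuition about why the earth-mover distance is attractive for comparing features, here is a one-dimensional sketch. It is a deliberate simplification and not Sormani's method: the real application compares two-dimensional (l, v) images, which requires solving an optimal-transport problem. In one dimension, for two histograms on the same bins, the distance reduces to the summed absolute difference of the cumulative distributions. The function name is made up.

```python
import numpy as np

def earth_mover_distance_1d(a, b, bin_width=1.0):
    """Earth-mover distance between two 1D histograms on identical bins.

    Both histograms are normalized to unit total mass first.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("histograms must share the same bins")
    a = a / a.sum()
    b = b / b.sum()
    return bin_width * np.sum(np.abs(np.cumsum(a) - np.cumsum(b)))

# A feature shifted by one bin versus by three bins.
feature = [1.0, 0.0, 0.0, 0.0]
shift_1 = [0.0, 1.0, 0.0, 0.0]
shift_3 = [0.0, 0.0, 0.0, 1.0]

emd_1 = earth_mover_distance_1d(feature, shift_1)
emd_3 = earth_mover_distance_1d(feature, shift_3)

# A pixel-by-pixel squared difference cannot tell these two cases apart.
l2_1 = np.sum((np.array(feature) - np.array(shift_1))**2)
l2_3 = np.sum((np.array(feature) - np.array(shift_3))**2)
```

The distance grows with how far the feature has to move, whereas the pixel-wise comparison saturates as soon as the features stop overlapping.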