Oregon, falsifiability, and the LIGO project

Today was my first day of a two-day visit to Ben Farr (Oregon) and the University of Oregon. I got lots of work done during the travel phases of the day, because I have a NASA proposal due while I'm here in Oregon! Nothing like a deadline.

I had a great day. Highlights included a discussion with James Schombert (Oregon) about various philosophical matters related to falsification. He explicitly brought up my paper about plausibility and science, which I had nearly forgotten! It's nice to know that people are finding it useful still. I really wrote it to get some things off my chest, things that had been troubling me since graduate school in the 1990s. In that paper I argue that we prefer theories that are both observationally reasonable and also theoretically reasonable; there isn't really such a thing as purely empirical falsification. At least not in the observational sciences.

But of course the main theme of my visit was LIGO. The lure of discussing this project with Farr is what brought me here. We postponed our ideas for new projects until tomorrow and, maybe surprisingly, spent our time talking about university-based project management! Because although LIGO is well funded to build hardware and deliver strain measurements, what is done with those to detect and characterize systems and populations is left to the science community, which is a looser collaboration, and which must raise most of its money externally. And, as with the SDSS family of projects, that community relies on essentially volunteer efforts from many ornery faculty. That's an interesting set of problems in organizational management, psychology, and political science!


radial velocities from slit spectra

Marla Geha (Yale) made a surprise visit to Flatiron today, and bombed the weekly Stars and Exoplanets Meeting with a discussion of the challenges of measuring velocity dispersions (and hence masses, and hence dark-matter-annihilation limits) in ultra-faint dwarf galaxies in the halo of the Milky Way. As my loyal reader knows, this problem is very similar to problems we are working on at Flatiron around extreme-precision radial-velocity (EPRV) spectroscopy. Geha's problem is both harder and easier. It is easier because she only needs km/s (not cm/s) precision. It is harder because she has to use a slit spectrograph and point it at very faint stars! It is easier because she has both sky emission lines and telluric absorption lines to help calibrate. It is harder because differences in slit illumination mean that the sky lines and the telluric absorption don't agree for the wavelength calibration!

After stars meeting, the conversation continued among Geha, Bedell (Flatiron), and me. We discussed many things, including the point that the offset between tellurics and sky lines is a wavelength offset, not a radial-velocity offset. Or it is even something more sophisticated, related to the spectrograph optics. We discussed the point that her problems are fundamentally hierarchical, because some parameters are associated with a star, some with an exposure, some with a slit-mask, some with a time and so on. We also discussed how the wobble framework that Bedell and I have built could be extended to capture these effects. It's certainly possible. Oh I nearly forgot: We also discussed masking and apodization of sky lines and telluric lines in the science spectra, and how to do that without biasing downstream measurements. Spergel (Flatiron) pointed us to some literature that he was pleased to say is older than any of us (Spergel included).
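To make the wavelength-versus-velocity distinction concrete, here is a minimal sketch. The offset and the two wavelengths are made-up illustrative numbers, not values from Geha's data. A constant wavelength offset, if misread as a Doppler shift, implies a different velocity at every wavelength, so lines in different parts of the spectrum cannot be reconciled with a single velocity correction.

```python
C_KM_S = 299_792.458  # speed of light in km/s

def apparent_velocity(delta_lambda, wavelength):
    """Velocity (km/s) inferred if a wavelength shift is read as a Doppler shift."""
    return C_KM_S * delta_lambda / wavelength

# Hypothetical numbers, chosen only for illustration
delta_lambda = 0.05          # constant calibration offset, in Angstroms
blue, red = 6500.0, 9000.0   # two reference wavelengths, in Angstroms

v_blue = apparent_velocity(delta_lambda, blue)
v_red = apparent_velocity(delta_lambda, red)
print(f"{v_blue:.2f} km/s at {blue:.0f} A, {v_red:.2f} km/s at {red:.0f} A")
```

The same offset looks like a larger velocity in the blue than in the red, which is one reason the offset probably belongs in the wavelength solution rather than in a per-star velocity.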

I should say that Geha's admirable goal is to re-reduce all of the nearly 10^5 stellar spectra in the DEIMOS archive! Now that's my kind of project.


imaging asteroseismic modes on the stellar surface

Many threads of conversation over the past weeks came together today in a set of coincidences. Conversations with Bedell (Flatiron), Pope (NYU), Luger (Flatiron), and Farr (Flatiron) ranging around stochastic processes and inferring stellar surface features from doppler imaging all overlap at stellar asteroseismic p modes: In principle, with high-resolution, high-signal-to-noise stellar spectral time series (and we have these, in hand!) we should be able not only to see p modes but also see their footprint on the stellar surface. That is, directly read ell and em off the spectral data. In addition, we ought to be able to see the associated temperature variations. This is all possible because the stars are slowly rotating, and each mode projects onto the rotating surface differently. Even cooler than all this: Because the modes are coherent for days in the stars we care about, we can build very precise matched filters to combine the data coherently from many exposures. There are many things to do here.
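Here is a minimal sketch of the coherent-combination idea. It assumes a single mode that is a pure sinusoid of known frequency across the whole time series, with white noise on radial-velocity measurements. The frequency, amplitude, noise level, and cadence are all hypothetical, and a real analysis would work with the spectra themselves and would have to model finite mode lifetimes.

```python
import numpy as np

def matched_filter_amplitude(times, rvs, sigma, freq):
    """Weighted least-squares amplitude of a coherent sinusoid at a known frequency."""
    # Templates: cosine and sine at the mode frequency
    A = np.stack([np.cos(2 * np.pi * freq * times),
                  np.sin(2 * np.pi * freq * times)], axis=1)
    w = 1.0 / sigma**2                      # inverse-variance weights
    ATA = A.T @ (A * w[:, None])
    ATy = A.T @ (rvs * w)
    coeffs = np.linalg.solve(ATA, ATy)      # best-fit cosine and sine coefficients
    cov = np.linalg.inv(ATA)                # covariance of those coefficients
    return np.hypot(coeffs[0], coeffs[1]), cov

rng = np.random.default_rng(42)
times = np.sort(rng.uniform(0.0, 3.0, 2000))   # exposure times in days (hypothetical)
freq = 270.0                                   # cycles per day, roughly a 5-minute period
true_amp = 0.2                                 # m/s, well below the per-exposure noise
sigma = np.full(times.size, 1.0)               # m/s per exposure
rvs = true_amp * np.cos(2 * np.pi * freq * times + 0.7) + rng.normal(0.0, sigma)

amp, cov = matched_filter_amplitude(times, rvs, sigma, freq)
print(f"recovered amplitude {amp:.2f} m/s, coefficient uncertainty {np.sqrt(cov[0, 0]):.3f} m/s")
```

The point of the toy is only that the uncertainty on the template coefficients shrinks like one over the square root of the number of exposures, as long as the mode stays coherent over the baseline.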


predicting one population of transients from another

Tyler Pritchard (NYU) convenes a meeting on Mondays at NYU to discuss time-domain astrophysics. Today we had a discussion of a very simple idea: Use the rates of short GRBs that are observed and measured (using physical models from the MacFadyen group at NYU) to have certain jet–observer offset angles to infer rates for all the way-off-axis events that won't be GRB triggers but might be seen in LSST or other ground-based optical or radio surveys. Seems easy, right? It turns out it isn't trivial at all, because the extrapolation of a few well-understood events in gamma-rays, subject to gamma-ray selection effects, to a full population of optical and radio sources (and then assessing those selection effects) requires quite a few additional or auxiliary assumptions. This is even more true for the bursts where we don't know redshifts. I was surprised to hear myself use the astronomy word "V-max"! But we still (as a group) feel like there must be low-hanging fruit. And this is a great application for the MacFadyen-group models, which predict brightness as a function of wavelength, time, and jet–observer angle.


hierarchical probabilistic calibration

Today Lily Zhao (Yale) visualized for me some of the calibration data they have for the EXPRES spectrograph at Yale. What she showed is that the calibration does vary at very high signal-to-noise, and that the variations are systematic or smooth. That is, the instrument varies only a tiny tiny bit, but it does so very smoothly and the smooth variations are measured incredibly precisely. This suggests that it should be possible to pool data from many calibration exposures to build a better calibration model for every exposure than we could get if we treated the data all independently.
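A toy version of the pooling argument, as a sketch under loud assumptions: a single hypothetical calibration parameter drifts smoothly in time, each calibration exposure measures it with independent noise, and a cubic polynomial stands in for whatever smooth model one would really use. None of the numbers come from EXPRES.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)                  # exposure times (arbitrary units)
truth = 0.5 * t - 0.3 * t**2                    # hypothetical smooth instrument drift
noise = 0.05                                    # per-exposure measurement noise
measured = truth + rng.normal(0.0, noise, t.size)   # treating each exposure independently

# Pool: fit one smooth (cubic) model to all calibration exposures at once
coeffs = np.polynomial.polynomial.polyfit(t, measured, deg=3)
pooled = np.polynomial.polynomial.polyval(t, coeffs)

rms_independent = np.sqrt(np.mean((measured - truth) ** 2))
rms_pooled = np.sqrt(np.mean((pooled - truth) ** 2))
print(f"independent rms {rms_independent:.4f}, pooled rms {rms_pooled:.4f}")
```

Because 200 exposures constrain only four coefficients, the pooled estimate at every epoch is much better than any single exposure's own measurement. That is the sense in which smooth variation makes pooling pay off.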

Late in the day, we drew a graphical model for the calibration, and worked through a possible structure. As my loyal reader knows, I want to go to full two-dimensional modeling of spectrographs! But we are going to start with measurements made on one-dimensional extractions. That's easier for the community to accept right now, anyways!


forecasting tools; beautiful spectrograph calibration

Our five-person (Bedell, Hogg, Queloz, Winn, Zhao) exoplanet meeting continued today, with Winn (Princeton) working out the elements needed to produce a simulator for a long-term EPRV monitoring program with simple observing rules. He is interested in working out under what circumstances such a program can be informative about exoplanets in regimes that neither Kepler nor existing EPRV programs have strongly constrained, like near-Earth-masses on near-Earth-orbits around near-Sun stars. And indeed we must choose a metric or metrics for success. His list of what's needed, software-wise, is non-trivial, but we worked out that every part of it would be a publishable contribution to the literature, so it could be a great set of projects. And a very useful set of tools.

Zhao (Yale) showed me two-dimensional calibration data from the EXPRES instrument illuminated by their laser-frequency comb. It is astounding. The images are beautiful, and every single line in each image is at a perfectly known (from physics!) absolute wavelength. This might be the beginning of a very new world. The instrument is also beautifully designed so that all the slit (fiber, really, but it is a rectangular fiber) images are almost perfectly aligned with one of the CCD directions, even in all four corners of the image. Not like the spectrographs I'm used to!


do we need to include the committee in our model?

Josh Winn (Princeton) and Lily Zhao (Yale) both came in to Flatiron for a couple of days today to work with Megan Bedell (Flatiron), Didier Queloz (Cambridge), and me. So we had a bit of a themed Stars and Exoplanets Meeting today at Flatiron. Winn talked about various ways to measure stellar obliquities (that is, angles between stellar-rotation angular momentum vectors and planetary system angular-momentum vectors). He has some six ways to do it! He talked about statistical differences between vsini measurements for stars with and without transiting systems.

Zhao and Queloz talked about their respective big EPRV programs to find Earth analogs in radial-velocity data. Both projects need to get much more precise measurements, and observe fewer stars (yes fewer) for longer times. That's the direction the field is going, at least where it concerns discovery space. Queloz argued that these are going to be big projects that require patience and commitment, and that it is important for new projects to control facilities, not just to apply for observing time each semester! And that's what he has with the Terra Hunting Experiment, in which Bedell, Winn, and I are also partners.

Related to all that, Zhao talked about how to make an observing program adaptive (to increase efficiency) without making it hard to understand (for statistical inferences at the end). I'm very interested in this problem! And it relates to the Queloz point, because if a time allocation committee is involved every semester, any statistical inferences about what was discovered would have to model not just the exoplanet population but also the behavior of the various TACs!


normalizing flows; information theory

At lunchtime I had a great conversation with Iain Murray (Edinburgh) about two things today. One was new ideas in probabilistic machine learning, and the other was this exoplanet transit spectroscopy challenge. On the former, he got me excited about normalizing flows, which use machine learning methods (like deep learning) and a good likelihood function to build probabilistic generative models for high dimensional data. These could be useful for astronomical applications; we discussed. On the latter, we discussed how transits work and how sunspots cause trouble for them. And how the effects might be low dimensional. And thus how a good machine-learning method should be able to deal with it or capture it.

In the afternoon I spent a short session with Rodrigo Luger (Flatiron) talking about the information about a stellar surface or about an exoplanet surface encoded in a photometric light curve. The information can come from rotation, or from transits, or both, and it is different (there is more information), oddly, if there is limb darkening! We talked about the main points such a paper should make, and some details of information theory. The problem is nice in part because if you transform the stellar surface map to spherical harmonics, a bunch of the calculations lead to beautiful trigonometric forms, and the degeneracy or eigenvector structure of the information tensor becomes very clear.


eclipsing binaries

I had a good conversation with Laura Chang (Princeton) today, who is interested in doing some work in the area of binary stars. We discussed the point that many of the very challenging things people have done with the Kepler data in the study of exoplanets (exoplanet detection, completeness modeling, population inferences) are very much easier in the study of eclipsing binary stars. And the numbers are very large: The total number of eclipsing binary systems found in the Kepler data is comparable to the total number of exoplanets found. And there are also K2 and TESS binaries! So there are a lot of neat projects to think about for constraining the short-period binary population with these data. We decided to start by figuring out what's been done already.


Pheno 2019, day 3

I spent the day at Pheno 2019, where I gave a plenary about Gaia and dark matter. It was a fun day, and I learned a lot. For example, I learned that when you have a dark photon, you naturally get tiny couplings between the dark matter and the photon, as if the dark matter has a tiny charge. And there are good experiments looking for milli-charged particles. I learned that deep learning methods applied to LHC events are starting to approach information-theoretic bounds for classifying jets. That's interesting, because in the absence of a likelihood function, how do you saturate bounds? I learned that the Swampland (tm) is the set of effective field theories that can't be represented in any string theory. That's interesting: If we could show that there are many EFTs that are incompatible with string theory, then string theory has strong phenomenological content!

In the last talk of the day, Mangano (CERN) talked about the future of accelerators. He made a very interesting point, which I have kind-of known for a long time, but haven't seen articulated explicitly before: If you are doing a huge project to accomplish a huge goal (like build the LHC to find the Higgs), you need to design it such that you know you will produce lots and lots of interesting science along the way. That's an important idea, and it is a great design principle for scientific research.



I spent a bit of research time today writing up my ideas about what we might do with The Snail (the local phase spiral in the vertical dynamics discovered in Gaia data) to infer the gravitational potential (or force law, or density) in the Milky Way disk. The idea is to model it as an out-of-equilibrium disturbance winding up towards equilibrium. My strong intuition (that could be wrong) is that this is going to be amazingly constraining on the gravitational dynamics. I'm hoping it will be better (both in accuracy and precision) than equilibrium methods, like virial theorem and Jeans models. I sent my hand-written notes to Hans-Walter Rix (MPIA) for comments.


not much

My only research events today were conversations with Eilers, Leistedt, and Pope about short-term strategies.


Dr Alex Malz!

Today it was my great pleasure to participate in the PhD defense of my student Alex Malz (NYU). His dissertation is about probabilistic models for next-generation cosmology surveys (think LSST but also Euclid and so on). He showed that it is not trivial to store, vet, or use probabilistic information coming from these surveys, using photometric-redshift outputs as a proxy: The surveys expect to produce probabilistic information about redshift for the galaxies they observe. What do you need to know about these probabilistic outputs in order to use them? It turns out that the requirements are strong and hard. A few random comments:

On the vetting point: Malz showed with an adversarial attack that the ways cosmologists were comparing photometric-redshift probability outputs across different codes were very limited: His fake code that just always returned the prior pdf did as well on almost all metrics as the best codes.
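To see why a prior-returning code can look good, here is a minimal sketch. It assumes the metric is built on the probability integral transform (PIT), which is one common choice for vetting pdfs; I am not claiming it is the specific metric Malz attacked. The Rayleigh prior and its width are hypothetical. If the true redshifts are drawn from the prior and the fake code returns the prior for every galaxy, the PIT values are uniform, which is exactly what such a test rewards.

```python
import numpy as np

rng = np.random.default_rng(1)
scale = 0.6                                   # hypothetical width of the redshift prior
z_true = rng.rayleigh(scale, size=5000)       # true redshifts drawn from that prior

def prior_cdf(z):
    """CDF of the Rayleigh prior; the fake code returns this same pdf for every galaxy."""
    return 1.0 - np.exp(-z**2 / (2.0 * scale**2))

# PIT value per galaxy: the returned CDF evaluated at the true redshift
pit = np.sort(prior_cdf(z_true))

# Kolmogorov-Smirnov distance between the PIT values and a uniform distribution
n = pit.size
ks = max(np.max(np.arange(1, n + 1) / n - pit),
         np.max(pit - np.arange(0, n) / n))
print(f"KS distance from uniform: {ks:.3f}")
```

The fake code passes an ensemble-calibration test like this one while carrying no information at all about any individual galaxy, which is why such a metric alone cannot vet a photometric-redshift code.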

On the requirements point: Malz showed that you need to know all the input assumptions and priors on any method in order to be able to use its output, especially if its output consists of posterior information. That is, you really want likelihood information, but no methods currently output that (and many couldn't even generate it because they aren't in the form of traditional inferences).

On the storage point: Malz showed that quantiles are far better than samples for storing a pdf! The results are very strong. But the hilarious thing is that the LSST database permits up to 200 floating-point numbers for storage of the pdf, when in fact the photometric redshifts will be based on only six photometric measurements! So, just like in many other surveys that I care about, the LSST Catalog will represent a data expansion, not a data reduction. Hahaha!
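A rough sketch of the quantiles-versus-samples comparison, not a reproduction of Malz's analysis or his metrics. The bimodal pdf, the budget of 20 stored numbers, and the maximum-CDF-error score are all my own illustrative choices. Both representations are turned back into a CDF in the same way, so the comparison is at least fair.

```python
import numpy as np

# A hypothetical bimodal redshift pdf on a fine grid (stand-in for a photo-z pdf)
z = np.linspace(0.0, 3.0, 3001)
pdf = (0.6 * np.exp(-0.5 * ((z - 0.5) / 0.1) ** 2)
       + 0.4 * np.exp(-0.5 * ((z - 1.5) / 0.3) ** 2))
cdf = np.cumsum(pdf)
cdf /= cdf[-1]

n_stored = 20                                      # storage budget: 20 floats per galaxy
levels = (np.arange(n_stored) + 0.5) / n_stored    # CDF levels assigned to the stored points

def max_error(points):
    """Largest CDF error when the pdf is rebuilt from sorted stored points."""
    rebuilt = np.interp(z, points, levels, left=0.0, right=1.0)
    return np.max(np.abs(rebuilt - cdf))

# Option 1: store quantiles (redshifts at evenly spaced CDF levels)
quantiles = np.interp(levels, cdf, z)
err_quantiles = max_error(quantiles)

# Option 2: store random samples (inverse-CDF draws); average over many realizations
rng = np.random.default_rng(2)
err_samples = np.mean([
    max_error(np.sort(np.interp(rng.uniform(size=n_stored), cdf, z)))
    for _ in range(200)
])
print(f"max CDF error: quantiles {err_quantiles:.3f}, samples {err_samples:.3f}")
```

Samples spend the storage budget on shot noise, whereas quantiles place the stored numbers deterministically, so for the same number of floats the quantile version tracks the CDF more closely in this toy.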

It was a great talk, and in support of a great dissertation. And a great day.


Dr Mandyam

Today I had the pleasure of serving on the PhD committee for Nitya Mandyam Doddamane, who defended her thesis on the measurement of star-formation rates and stellar masses in spectroscopic surveys of galaxies. She compared different stellar populations models, based on different parts of the galaxy spectral energy distributions, and galaxy environments, to make inferences about which galaxies are and aren't forming stars. She has some nice examples that use environment to break some degeneracies in interpretation. In that sense, some of what she did was a causal inference. She also looked at aperture biases, comparing fiber spectroscopy to integral-field spectroscopy from various SDSS surveys. Her results are nice, and were beautifully presented, both in the talk and in the thesis. Congratulations Dr Mandyam!


Galactic archaeology

It's a long story, but we have been experimenting continuously with the rules and principles underlying the weekly Stars and Exoplanets Meeting that we run at Flatiron for the NYC astrophysics community. One of the things I say about it is that if you want a meeting to be open, supportive, easy, and community-building, it has to have a strong set of draconian rules! In our most recent set of discussions, we have been talking about theming the meetings around specific science themes. Today was our first experiment with that! Joss Bland-Hawthorn (Sydney) is in town, so we themed the meeting around Galactic Archaeology. We had five short discussions; here are some highlights:

Megan Bedell (Flatiron) showed her incredibly precise 35-element (?) abundance measurements vs stellar age for her Solar twin sample. The abundances are very closely related to the age (for this sample that is selected to have Solar [Fe/H]). Suroor Gandhi (NYU) showed her results on the dependence of dynamical (or really kinematic) actions on [Fe/H] and age for low-alpha and high-alpha stars in the local Milky Way disk. These show that the two different sequences (high and low alpha) have different origins. And Rocio Kiman (CUNY) showed her M dwarf kinematics as a function of magnetic activity that could be used to constrain a disk heating model. All three of these presentations could benefit (for interpretation) from a forward model of star formation and radial migration in the Milky Way disk, along with heating! This is related to things I have done with Neige Frankel (MPIA) but would require extensions. Simple extensions, though.

Adam Wheeler (Columbia) showed us abundances he has measured all over the Milky Way from LAMOST spectroscopy, training a version of The Cannon with GALAH abundances. It's an amazing data set, and he asked us to brainstorm ideas about what we could do with it. He seems to have features in his catalog that look similar to the midplane issues that were causing me existential angst this past August. Bland-Hawthorn said that he sees similar things in the GALAH data too.

And Bland-Hawthorn himself talked about the possibility that some future instrument could measure stellar accelerations and get the Milky Way acceleration field directly! He started by commenting on the conclusions of the Bonaca et al work on a possible dark-matter perturber acting on the GD-1 stellar stream. His remarks played very well with things Bonaca and I have been discussing around making a non-parametric acceleration map of the Milky Way.

In summary: A great experiment!