testing tacit knowledge about measurements

In astrometry, there is folk knowledge (tacit knowledge?) that the (best possible) uncertainty you can obtain on any measurement of the centroid of a star in an image is proportional to the size (radius or diameter or FWHM) of the point-spread function, and inversely proportional to the signal-to-noise ratio with which the star is detected in the imaging. This makes sense: The sharper a star is, the more precisely you can measure it (provided you are well sampled and so on), and the more data you have, the better you do. These are (as my loyal reader knows) Cramér–Rao bounds, and they relate directly to Fisher information.

Oddly, in spectroscopy, there is folk knowledge that the best possible uncertainty you can obtain on the radial velocity of a star is proportional to the square root of the width (FWHM) of the spectral lines in the spectrum. I was suspicious, but Bedell (Flatiron) demonstrated this today with simulated data. It's true! I was about to resign my job and give up, when we realized that the difference is that the spectroscopists don't keep the signal-to-noise ratio fixed when they vary the line widths! They keep the contrast fixed, where the contrast is the maximum depth of the line (or lines) in a continuum-normalized spectrum.
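Both scalings drop straight out of the Fisher information. Here is a minimal numerical sketch (a made-up Gaussian line on a pixel grid; all numbers are illustrative): at fixed total signal-to-noise the Cramér–Rao bound on the centroid scales linearly with the line width, while at fixed contrast (line depth) it scales as the square root of the width.

```python
import numpy as np

def crlb_centroid(width, amp, noise=1.0):
    # Cramer-Rao bound on the centroid of a Gaussian line on a pixel grid:
    # 1 / sqrt(Fisher information), with independent Gaussian pixel noise
    x = np.arange(-200.0, 200.0)
    model = amp * np.exp(-0.5 * x ** 2 / width ** 2)
    dmodel_dmu = model * x / width ** 2   # derivative w.r.t. the centroid
    fisher = np.sum(dmodel_dmu ** 2) / noise ** 2
    return 1.0 / np.sqrt(fisher)

def snr(width, amp, noise=1.0):
    # total detection signal-to-noise of the line
    x = np.arange(-200.0, 200.0)
    model = amp * np.exp(-0.5 * x ** 2 / width ** 2)
    return np.sqrt(np.sum(model ** 2)) / noise

# fixed total SNR: lower the amplitude as the line gets wider
amp_wide = snr(4.0, 1.0) / snr(8.0, 1.0)
ratio_fixed_snr = crlb_centroid(8.0, amp_wide) / crlb_centroid(4.0, 1.0)

# fixed contrast: same amplitude at both widths
ratio_fixed_amp = crlb_centroid(8.0, 1.0) / crlb_centroid(4.0, 1.0)

print(ratio_fixed_snr)  # ~2.0: uncertainty linear in width at fixed SNR
print(ratio_fixed_amp)  # ~1.41: the spectroscopists' square-root scaling
```

So both folk statements are correct; they just hold different things fixed.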

This all makes sense and is consistent, but my main research event today was to be hella confused.


optimization is hard

Megan Bedell (Flatiron) and I worked on optimization for our radial-velocity measurement pipeline. We did some experimental coding with scipy optimization routines (which are not documented quite as well as I would like), and we played with our own home-built gradient descent. It was a roller-coaster, and we still get some unexpected behaviors. Bugs clearly remain, which is good, actually, because it means that we can only do better than we are doing now, which is pretty good.
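For concreteness, the kind of experiment we were doing looks roughly like this (a toy chi-squared objective, not our actual pipeline; all the numbers here are made up): scipy's BFGS with an analytic gradient, against a naive fixed-step gradient descent.

```python
import numpy as np
from scipy.optimize import minimize

# toy objective: least-squares fit of a Gaussian line's amplitude and center
# (a made-up stand-in for the real pipeline objective)
rng = np.random.default_rng(42)
x = np.linspace(-5.0, 5.0, 101)
a_true, mu_true = 1.3, 0.7
y = a_true * np.exp(-0.5 * (x - mu_true) ** 2) + 0.05 * rng.normal(size=x.size)

def chi2(p):
    a, mu = p
    r = y - a * np.exp(-0.5 * (x - mu) ** 2)
    return np.sum(r ** 2)

def grad(p):
    a, mu = p
    m = np.exp(-0.5 * (x - mu) ** 2)
    r = y - a * m
    return np.array([-2.0 * np.sum(r * m),
                     -2.0 * np.sum(r * a * m * (x - mu))])

# scipy's BFGS, given the analytic gradient
res = minimize(chi2, x0=[1.0, 0.0], jac=grad, method="BFGS")

# home-built fixed-step gradient descent
p = np.array([1.0, 0.0])
for _ in range(2000):
    p = p - 1e-3 * grad(p)

print(res.x, p)  # both should land near (1.3, 0.7)
```

On a convex-enough toy like this, both land in the same place; the real pipeline objective is where the unexpected behaviors live.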


if your sample isn't contaminated, you aren't trying hard enough

At the Gaia DR2 parallel-working meeting, Adrian Price-Whelan (Princeton) and I discussed co-moving stars with Jeff Andrews (Crete). Our discussion was inspired by the fact that Andrews has written some pretty strongly worded critical things about our own work with Semyeong Oh (Princeton). We clarified that there are three (or maybe four) different things you might want to be looking for: stars that have the same velocities (co-moving stars), stars that are dynamically bound (binaries), and stars that were born together (co-eval) or share the same birthplace, abundances, or ages. In the end we agreed that different catalogs might be made with different goals in mind, and different tolerances for completeness and purity. But one thing I insisted on (perhaps pretty strongly) is that you can't have high completeness without taking on low purity. That is, you have to take on contamination if you want to sample the full distribution.

This is related to a much larger point: If you want a pure and complete sample, you have to cut your data extremely hard. Anyone who has a sample of anything that is both pure and complete is either missing large fractions of the population they care about, or else is spending way too much telescope time per object. Any real, sensible sample of anything in astronomy that is complete is going to have contamination, and any models or interpretations we make of the sample must take that contamination into account. Astronomers who are unwilling to live with contamination are deciding not to use our resources to their fullest, and that's irresponsible, given how precious and expensive those resources are.


latent variable models: What's the point?

The only research time today was a call with Rix (MPIA) and Eilers (MPIA) about data-driven models of stars. The Eilers project is to determine stellar luminosities from stellar spectra, and to do so accurately enough that we can do Milky-Way mapping. And no, Gaia won't be precise enough for what we need. Right now Eilers is comparing three data-driven methods. The first is a straw man: nearest neighbor! Always a crowd-pleaser, and easy. The second is The Cannon, which is a regression, but fitting the data as a function of the labels; that is, it involves optimizing a likelihood. The third is the GPLVM (or a modification thereof), in which both the data and the labels are nonlinear functions of some uninterpretable latent variables.

We spent some of our time talking about exactly what are the benefits of going to a latent-variable model over the straight regression. We need benefits, because the latent-variable model is far more computationally challenging. Here are some benefits:

The regression requires that you have a complete set of labels, complete in two senses. The first is that the label set is sufficient to explain the spectral variability; if it isn't, the regression won't be precise. It also needs to be complete in the sense that every star in the training set has every label known; that is, you can't live with missing labels. Both of these problems are solved simply in the latent-variable model. The regression also requires that you not have an over-complete set of labels. Imagine that you have label A and label B and a label that is effectively A+B. This will lead to singularities in the regression, but it is no problem for a latent-variable model. In the latent-variable model, all data and all known labels are generated as functions (nonlinear functions drawn from a Gaussian process, in our case) of the latent variables, and those functions can generate any and all data and labels we throw at them. Another (and not unrelated) advantage of the latent-variable formulation is that we can have a function space for the spectra of higher (or lower) dimensionality than the label space, which can cover variance that isn't label-related.
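The over-complete case is easy to demonstrate: if one label is an exact linear combination of two others, the design matrix of any linear regression on the labels is rank-deficient, and the normal equations are singular. A tiny made-up example:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=50)
B = rng.normal(size=50)

# design matrix for a linear regression on labels (A, B, A+B)
design = np.column_stack([A, B, A + B])

rank = np.linalg.matrix_rank(design)
cond = np.linalg.cond(design.T @ design)   # normal-equations matrix
print(rank)  # 2, not 3: the regression has no unique solution
print(cond)  # effectively infinite
```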

Finally, the latent-variable model has the causal structure that most represents how stars really are: That is, we think star properties are set by some unobserved physical properties (relating to mass, age, composition, angular momentum, dynamo, convection, and so on) and the emerging spectrum and other properties are set by those intrinsic physical properties!

One interesting thing about all this (and brought up to me by Foreman-Mackey last week) is that the latent-variable aspect of the model and the Gaussian-process aspect of the model are completely independent. We can get all of the (above) advantages of being latent-variable without the heavy-weight Gaussian process under the hood. That's interesting.


playing with stellar spectra; dimensionality

In another low-research day, I did get in a tiny bit of work time with Bedell (Flatiron). We did two things: In the first, we fit each of her Solar twins as a linear combination of other Solar twins. Then we looked for spectral deviations. It looks like we find stellar activity in the residuals. What else will we find?
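A minimal sketch of that first exercise (with fake spectra standing in for Bedell's Solar twins; the noise level and line shape are made up): fit one star as a linear combination of the rest by least squares, and look at what's left over.

```python
import numpy as np

rng = np.random.default_rng(7)
n_stars, n_pix = 20, 300

# near-identical fake "twins": a common line profile plus photon noise
base = 1.0 - 0.3 * np.exp(-0.5 * ((np.arange(n_pix) - 150.0) / 5.0) ** 2)
spectra = base + 0.01 * rng.normal(size=(n_stars, n_pix))

# fit star 0 as a linear combination of the other 19, by least squares
target, others = spectra[0], spectra[1:]
coeffs, *_ = np.linalg.lstsq(others.T, target, rcond=None)
residual = target - others.T @ coeffs

print(np.std(residual))  # ~the noise level: the twin is "explained" by the others
```

Any residual structure above the noise level is then a candidate astrophysical signal, like activity.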

In the second thing we did, we worked through all our open threads, and figured out what are the next steps, and assigned tasks. Some of these are writing tasks, some of these are coding tasks, and some are thinking tasks. The biggest task I am assigned—and this is also something Rix (MPIA) is asking me to do—is to write down a well-posed procedure for deciding what the dimensionality is of a low-dimensional data set in a high-dimensional space. I don't like the existing solutions in the literature, but as Rix likes to remind me: I have to put up or shut up!


batman and technetium

Today was a low-research day (letters of recommendation), but Elisabeth Andersson (NYU) and I got an optimization working, comparing a batman periodic transit model to a Kepler light curve. I left her with the problem of characterizing the six planets in the Trappist 1 system.

At lunch, Foreman-Mackey (Flatiron) proposed a model for stellar spectra that is intermediate in sophistication and computational complexity between The Cannon and the Eilers (MPIA) GPLVM. He also has a fast implementation in TensorFlow. Most of the TensorFlow speed-up comes from its clever use of GPUs. Late in the day, Bedell proposed that we look for short-lived radioactive isotopes in her Solar twins. That’s a great idea!


Fisher matrix manipulations

Not much happened today, research-wise. But one great thing was a short call with Ana Bonaca, in which we reviewed what we are doing with our Fisher matrices. We are doing the right things! There are two operations you want to do: Change variables, and marginalize out nuisances. These look pretty different. That is, if you just naively change variables to a single variable, and don't marginalize out anything, the operation is an outer product, with the Fisher matrix as metric, but it is equivalent to assuming that all else is fixed. That is, it slices your likelihood. This is not usually the conservative move, which is to either marginalize (if you are Bayesian) or profile (if you are frequentist). These operations involve inverses of Fisher matrices. Some of the relevant details are in section 2 of this useful paper.
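In code, with a made-up 3x3 Fisher matrix, the two operations and the inequality between them look like this:

```python
import numpy as np

# a made-up Fisher matrix for three correlated parameters
F = np.array([[4.0, 1.5, 0.5],
              [1.5, 3.0, 1.0],
              [0.5, 1.0, 2.0]])

# slicing: hold everything else fixed (conditional uncertainty on parameter 0)
sigma_conditional = 1.0 / np.sqrt(F[0, 0])

# marginalizing: invert the full matrix, then read off the (0, 0) element
sigma_marginal = np.sqrt(np.linalg.inv(F)[0, 0])

# same story for a derived scalar parameter w . theta:
w = np.array([1.0, -1.0, 0.0])
sigma_slice = 1.0 / np.sqrt(w @ F @ w)           # quadratic form with F as the metric
sigma_marg = np.sqrt(w @ np.linalg.inv(F) @ w)   # the conservative, marginalized version

print(sigma_conditional, sigma_marginal)  # the marginal is always >= the conditional
print(sigma_slice, sigma_marg)
```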


quality time with the iPTA

The research highlight of the day was a couple of hours spent with the iPTA data analysis collaboration. Justin Ellis (WVU) led an overview and extremely interactive discussion of their likelihood function, which they use to detect the gravitational radiation stochastic background from pulsar timing, in the presence of systematic nuisances. These include time-variable dispersion measure, red noise, accelerations and spin-down, receiver and backend calibrations, ephemeris issues, and more! The great cleverness is that they linearize and apply Gaussian priors, so they can make use of all the beautiful linear algebra that my loyal reader hears so much about. The likelihood function is a thing of beauty, and computationally tractable. They asked me for advice, but frankly, I’m not worthy.


objective Bayes; really old data

Today was day two with the Galactic Center Group at UCLA. Again, a huge argument about priors broke out. As my loyal reader knows, I am a subjective Bayesian, not an objective Bayesian. Or more correctly “I don't always adopt Bayes, but when I do, I adopt subjective Bayes!” But the argument was about the best way to set objective-Bayes priors. My position is that you can't set them in the space of your parameters, because your parameterization itself is subjective. So you have to set them in the space of your data. That's exactly what the Galactic Center Group at UCLA is doing, and they can show that it gives them much better results (in terms of bias and coverage) than setting the priors in dumber “flat” ways (which is standard in the relevant literature).

One incredible thing about the work of this group is that they are still using, and still re-reducing, imaging data taken in the 1990s! That makes them an amazing example of data curation, preservation, reproducibility, and workflow. For this reason, there were information scientists at the meeting this week. It is an interesting consideration when thinking about how a telescope facility is going to be used: Will your data still be interesting 22 years from now? In the case of the Galactic Center, the answer turns out to be a resounding yes.


Galactic Center review

I spent the day at UCLA, reviewing the data-analysis work of the Galactic Center Group there, for reporting to the Keck Foundation. It was a great day on a great project. They have collected large amounts of data (for more than 20 years!), both imaging and spectroscopy, to tie down the orbits of the stars near the Galactic Center black hole, and also to tie down the Newtonian reference frame. The approach is to process imaging and spectroscopy into astrometric and kinematic measurements, and then fit those measurements with a physical model. Among the highlights of the day were arguments about priors on orbital parameters, and descriptions of post-Newtonian terms that matter if you want to test General Relativity. Or test for the presence of dark matter concentrated at the center of the Galaxy.


the assumptions underlying EPRV

The conversation on Friday with Cisewski and Bedell got me thinking all weekend. It appears that the problem of precise RV difference measurement becomes ill-posed once we permit the stellar spectrum to vary with time. I felt like I nearly had a breakthrough on this today. Let me start by backing up.

It is impossible to obtain exceedingly precise absolute radial velocities (RVs) of stars, because to get an absolute RV, you need a spectral model that puts the centroids of the absorption lines in precisely the correct locations. Right now physical models of convecting photospheres have imperfections that lead to small systematic differences in line shapes, depths, and locations between the models of stars and the observations of stars. Opinions vary, but most astronomers would agree that this limits absolute RV accuracy at the 0.3-ish km/s level (not m/s level, km/s level).

How is it, then, that we measure at the m/s level with extreme-precision RV (EPRV) projects? The answer is that as long as the stellar spectrum doesn't change with time, we can measure relative velocity changes to arbitrary accuracy! That has been an incredibly productive realization, leading as it did to the discovery, confirmation, or characterization of many hundreds of planets around other stars!
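The constant-spectrum assumption is the whole trick. A minimal made-up sketch (fake Gaussian absorption lines, invented numbers throughout): in log-wavelength a Doppler shift is a pure translation, so the relative RV is just the translation that best matches the observed spectrum to a template built from the star itself, with no physical model anywhere.

```python
import numpy as np

c = 299792.458  # speed of light, km/s

# log-wavelength grid: a velocity v moves every line by v/c in log(lambda)
loglam = np.linspace(np.log(5000.0), np.log(5010.0), 4000)
line_centers = [np.log(5002.0), np.log(5005.5), np.log(5008.0)]

def spectrum(shift):
    # continuum-normalized spectrum with made-up Gaussian absorption lines
    f = np.ones_like(loglam)
    for l0 in line_centers:
        f -= 0.4 * np.exp(-0.5 * ((loglam - l0 - shift) / 2e-5) ** 2)
    return f

rv_true = 0.1  # km/s: a planet-sized signal
rng = np.random.default_rng(1)
observed = spectrum(rv_true / c) + 0.001 * rng.normal(size=loglam.size)

# relative RV: the shift that minimizes chi-squared against the rest-frame
# template (spectrum(s) is that template, translated by s)
shifts = np.linspace(-1.0, 1.0, 2001) / c
chi2 = np.array([np.sum((observed - spectrum(s)) ** 2) for s in shifts])
rv_measured = shifts[np.argmin(chi2)] * c
print(rv_measured)  # recovers ~0.1 km/s, far below the km/s absolute-RV floor
```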

The issue is: Stellar spectra do change with time! There is activity, and also turbulent convection, and also rotation. This throws a wrench into long-term EPRV plans. It might even partially explain why current EPRV projects never beat m/s accuracy, even when the data (on the face of it) seem good enough to do better. Now the question is: Do the time variations of stellar spectra put an absolute floor on relative-RV measurement? That is, do they limit the ultimate precision?

I think the answer is no. But the Right Thing To Do (tm) might be hard. It will involve making some new assumptions. No longer will we assume that the stellar spectrum is constant with time. But we will have to assume that spectral variations are somehow uncorrelated (in the long run) with exoplanet phase. We might also have to assume that the exoplanet-induced RV variations are dynamically predictable. Time to work out exactly what we need to assume and how.


all about radial velocities

The day started with a conversation among Stuermer (Chicago), Montet (Chicago), Bedell (Flatiron), and me about the problem of deriving radial velocities from two-d spectroscopic images rather than going through one-d extractions. We tried to find scope for a minimal paper on the subject.

The day ended with a great talk by Jessi Cisewski (Yale) about topological data analysis. She finally convinced me that there is some there there. I asked about using automation to find the best statistics, and she agreed that it must be possible. Afterwards, Ben Wandelt (Paris) told me he has a nearly finished project on this very subject. Before Cisewski's talk, she spoke to Bedell and me about our EPRV plans. That conversation got me concerned about the non-identifiability of the radial velocity if you let the stellar spectrum vary with time. Hmm.


what's the circular acceleration?

Ana Bonaca (Harvard) and I started the day with a discussion that was in part about how to present the enormous, combinatoric range of results we have created with our information-theory project. One tiny point there: How do you define the equivalent of the circular velocity in a non-axisymmetric potential? There is no clear answer. One option is to do something like averaging the acceleration around a circular ring. Another is to use v^2/R locally. Another is to use that locally, but on the radial component of the acceleration.

While I was proctoring an exam, Megan Bedell (Flatiron) wrote me to say that our one-d, data-driven spectroscopic RV extraction code is now performing almost as well as the HARPS pipeline, on real data. That's exciting. We had a short conversation about extending our analysis to more stars to make the point better. We believe that our special sauce is our treatment of the tellurics, but we are not yet certain of this.


Gaia-based training data, GANs, and optical interferometry

In today's Gaia DR2 working meeting, I worked with Christina Eilers (MPIA) to build the APOGEE+TGAS training set we could use to train her post-Cannon model of stellar spectra. The important idea behind the new model is that we are no longer trying to specify the latent parameters that control the spectral generation; we are using uninterpreted latents. For this reason, we don't need complete labels (or any labels!) for the training set. That means we can train on, and predict, any labels or label subset we like. We are going to use absolute magnitude, and thereby put distances onto all APOGEE giants. And thereby map the Milky Way!

In stars group meeting, Richard Galvez (NYU) started a lively discussion by showing how generative adversarial networks work and giving some impressive examples on astronomical imaging data. This led into some good discussion about uses and abuses of complex machine-learning methods in astrophysics.

Also in stars meeting, Oliver Pfuhl (MPA) described to us how the VLT four-telescope interferometric imager GRAVITY works. It is a tremendously difficult technical problem to perform interferometric imaging in the optical: You have to keep everything aligned in real time to a tiny fraction of a micron, and you have little carts with mirrors zipping down tunnels at substantial speeds! The instrument is incredibly impressive: It is performing milli-arcsecond astrometry of the Galactic Center, and it can see star S2 move on a weekly basis!


purely geometric spectroscopic parallaxes

Today was a low research day; it got cut short. But Eilers made progress on the semi-supervised GPLVM model we have been working on. One thing we have been batting around is scope for this paper. Scope is challenging, because the GPLVM is not going to be high performance for big problems. Today we conceived a scope that is a purely geometric spectroscopic parallax method. That is, a spectroscopic parallax method (inferring distances from spectra) that makes no use of stellar physical models whatsoever, not even in training!


Spitzer death; nearest neighbors

Today was spent at the Spitzer Science Center for the 39th meeting of the Oversight Committee, on which I have served since 2008. This meeting was just like every other: I learned a huge amount! This time about how the mission comes to a final end, with the exercise of various un-exercised mechanisms, and then the expenditure of all propellants and batteries. We discussed also the plans for the final proposal call, and the fitness of the observatory to observe way beyond its final day. On that latter note: We learned that NASA will transfer operations of Spitzer to a third party, for about a million USD per month. That's an interesting opportunity for someone. Or some consortium.

In unrelated news, Christina Eilers (MPIA) executed a very simple (but unprecedented) idea today: She asked what would happen with a data-driven model of stellar spectra (APOGEE data) if the model is simply nearest neighbor: That is, if each test-set object is given the labels of its nearest (in a chi-squared sense) training-set object. The answer is impressive: the nearest-neighbor method is only slightly worse than the quadratic data-driven model known as The Cannon. This all relates to the point that most machine-learning methods are—in some sense—nearest-neighbor methods!
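For concreteness, the entire method fits in a few lines (with fake linear spectra and a made-up label; the real version runs on APOGEE spectra with per-pixel inverse variances):

```python
import numpy as np

rng = np.random.default_rng(3)
n_train, n_pix = 200, 50

# fake training set: each "spectrum" depends linearly on one label (say, Teff)
labels_train = rng.uniform(4000.0, 5500.0, size=n_train)
basis = np.linspace(0.0, 1.0, n_pix)
spectra_train = np.outer(labels_train / 5000.0, basis)
spectra_train += 0.01 * rng.normal(size=spectra_train.shape)
ivar = np.full(n_pix, 1.0 / 0.01 ** 2)   # per-pixel inverse variances

def nearest_neighbor_label(spectrum):
    # give the test object the label of its chi-squared-nearest training object
    chi2 = np.sum(ivar * (spectra_train - spectrum) ** 2, axis=1)
    return labels_train[np.argmin(chi2)]

test_spectrum = (4700.0 / 5000.0) * basis + 0.01 * rng.normal(size=n_pix)
label = nearest_neighbor_label(test_spectrum)
print(label)  # close to 4700
```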


seeing giants shrink in real time? the dark matter

At a parallel-working session in my office at NYU, I worked with Lauren Blackburn (TransPerfect) to specify a project on clustering and classification of red-giant asteroseismic spectra. The idea (from Tim Bedding's group at Sydney) is to distinguish the stars that are going up the red-giant branch from the ones coming down. Blackburn asked if we could just see the spectra change with time for the stars coming down. I said “hell no” and then we wondered: Maybe? That's not the plan, but we certainly should check that!

In the NYU Astro Seminar, Vera Gluscevic (IAS) gave a great talk on inferring the physical properties of the dark matter (that is, not just the mass and cross-section, but real interaction parameters in natural models). She showed that combinations of different direct-detection targets, being differently sensitive to spin-dependent interactions, could be very discriminatory. But she did have to assume large cross sections, so her results are technically optimistic. She then blew us away with strong limits on dark-matter models using the CMB (and the dragging of nuclei by dark-matter particles in the early universe). Great, and ruling out some locally popular models!

Late in the day, Bedell and I did a writing workshop on our EPRV paper. We got a tiny bit done, which should be called not “tiny” but really a significant achievement. Writing is hard.