Christina Eilers (MIT) and I discussed our ESA *Gaia* EDR3 projects today. Our top priority is to re-do our machine-learning (linear regression, really) spectrophotometric distance estimates for very luminous red-giant stars, and then re-map the Milky Way disk in abundances and kinematics. We think that even a small improvement in the parallaxes (which we expect to get on Thursday) might make a big difference to our inferred spectroscopic distances. We discussed the point that in our DR2 work we only used stars on the “low-alpha sequence”; we want to generalize if we are going to make complete abundance maps. But stars with different abundance trends might also need very different distance-estimation parameters. That suggests doing the EDR3 regression in a more “abundance-aware” way.
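Here is one hypothetical way to make such a regression “abundance-aware”. It is a sketch under assumptions of our own, not our actual pipeline. Parallax is modeled as exp(θᵀx). The feature vector x includes products of each (made-up, standardized) feature with [α/Fe], so the feature-to-distance mapping can differ between low-alpha and high-alpha stars. The fit is done in parallax space, so noisy or negative parallaxes never have to be inverted into distances.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(42)

# Hypothetical training set: three standardized spectral/photometric features
# per star plus an [alpha/Fe] abundance in dex. All values are synthetic.
n = 2000
feats = rng.normal(size=(n, 3))
alpha_fe = rng.uniform(-0.05, 0.35, size=n)

def design(feats, alpha_fe):
    """Intercept, features, [alpha/Fe], and feature x [alpha/Fe] interaction terms."""
    ones = np.ones((len(feats), 1))
    a = alpha_fe[:, None]
    return np.hstack([ones, feats, a, feats * a])

X = design(feats, alpha_fe)
theta_true = np.array([-0.7, 0.3, -0.2, 0.1, 0.4, 0.5, -0.3, 0.2])
sigma_plx = 0.02  # mas; assumed identical parallax uncertainty for every star
plx_obs = np.exp(X @ theta_true) + sigma_plx * rng.normal(size=n)

def residuals(theta):
    # Chi residuals in parallax space: noisy parallaxes enter the fit as-is.
    return (plx_obs - np.exp(X @ theta)) / sigma_plx

fit = least_squares(residuals, x0=np.zeros(X.shape[1]))
theta_hat = fit.x
print(np.round(theta_hat, 3))
```

If the interaction coefficients come out consistent with zero, a single set of parameters serves both abundance sequences. If not, the data are asking for the abundance-aware version.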

## 2020-11-30

### re-doing spectroscopic distances in EDR3 with high-alpha too

## 2020-11-27

### writing about regression

I spent my research time today (hiding from the world; it's closed here in the US because of Thanksgiving plus pandemic) writing in my paper with Soledad Villar (JHU) about how to perform regression with very flexible models (like polynomials, wavelets, and Fourier modes).
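The basic phenomenon in the very-flexible regime fits in a few lines. In this minimal sketch, the plain Fourier features, the period of 2, and the synthetic data are our own illustrative choices. When features outnumber data points, the least-squares solver returns the minimum-norm coefficient vector, and that vector interpolates the training data exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D data: 20 noisy samples of a smooth function.
n = 20
t = np.sort(rng.uniform(0.0, 1.0, size=n))
y = np.sin(2 * np.pi * t) + 0.1 * rng.normal(size=n)

def fourier_features(t, n_modes, period=2.0):
    """Constant plus cos/sin pairs: 2 * n_modes + 1 features in total."""
    k = np.arange(1, n_modes + 1)
    arg = 2 * np.pi * np.outer(t, k) / period
    return np.hstack([np.ones((len(t), 1)), np.cos(arg), np.sin(arg)])

# 201 features but only 20 data points: infinitely many exact fits exist.
A = fourier_features(t, n_modes=100)
# np.linalg.lstsq picks the minimum-norm solution among them.
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

train_resid = A @ beta - y
print(np.max(np.abs(train_resid)))  # tiny: the fit passes through every point

# Predictions between the data points depend entirely on the feature choice.
t_grid = np.linspace(0.0, 1.0, 500)
y_grid = fourier_features(t_grid, n_modes=100) @ beta
```

How the predictions behave between the data points depends strongly on how the features are weighted. That weighting is where the interesting modeling choices live.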

## 2020-11-25

### getting ready for EDR3: cold streams

ESA *Gaia* EDR3 is next week! I have been trying to get ready in various ways. Today Ana Bonaca (Harvard) showed me remarkable evidence that the cold stellar streams in the Milky Way halo are clustered in kinematic space, maybe also with the halo globular clusters! We discussed things to do in this area. It's perfect for EDR3 because if the clustering is real, it should improve with the improved precision of the EDR3 data.

## 2020-11-24

### how many black holes?

I spoke with Katie Breivik (Flatiron) today about a project to paint toy binary stars (from Breivik's model of how binaries form and evolve) onto toy spectroscopic targets (from Neige Frankel's model of how the Milky Way disk formed) to see how many binary stars and how many black-hole (or compact-object) binaries Adrian Price-Whelan (Flatiron) and I should be finding in the *APOGEE* survey. The project is simple in principle, but the matching up of differently simulated catalogs is a conceptual and administrative challenge! The hope for this project is that we can constrain something about the formation of black-hole binaries by the observation that *we don't find any* (or don't find very many) in *APOGEE*!
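A Poisson argument makes “we don't find any” concrete. This sketch makes a simplifying assumption of our own: the expected number of detectable compact-object binaries is the number of targets times a per-star fraction f times a detection probability. In the real project that expectation would come out of the painted-together simulations. The target count and detection probability below are made up.

```python
import math

def poisson_upper_limit_zero(confidence=0.95):
    """Upper limit on a Poisson mean when zero events are observed:
    P(0 | mu) = exp(-mu) = 1 - confidence, so mu = -ln(1 - confidence)."""
    return -math.log(1.0 - confidence)

def max_binary_fraction(n_targets, p_detect, confidence=0.95):
    """Largest fraction f still consistent with zero detections, assuming
    the expected count is n_targets * f * p_detect (a toy assumption)."""
    return poisson_upper_limit_zero(confidence) / (n_targets * p_detect)

# Hypothetical numbers, for illustration only.
mu_95 = poisson_upper_limit_zero()  # about 3 expected events
f_max = max_binary_fraction(n_targets=100_000, p_detect=0.3)
print(round(mu_95, 3), f_max)
```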

## 2020-11-23

### a discriminative model for EPRV

Based on things I have been learning about linear regression this year, I suggested this morning to Lily Zhao (Yale) that she replace our generative model for stellar spectral variability with a discriminative model, which tries to predict the radial-velocity offset of a star from changes in the stellar spectral shape. It's not a trivial model, since the number of parameters (features) is *immense* (more than 200,000) and the number of training-set examples is small (45-ish). But by the afternoon *she did it* and *it works*. Indeed, it looks like it works better than the generative models we have, even when tested by cross-validation.
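With far more features than training examples, one standard way to set up such a discriminative model is ridge regression in its dual (kernel) form. Only an n × n system ever gets solved, and exact leave-one-out residuals come almost for free. Ridge is our assumption here: the text doesn't say how the model is regularized. The sizes, the regularization strength `lam`, and the data are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical shapes: 45 training spectra with 20,000 features each
# (the real problem has more than 200,000; fewer here to keep it quick).
n, p = 45, 20_000
X = rng.normal(size=(n, p))                 # spectral-shape features per epoch
w_true = rng.normal(size=p) / np.sqrt(p)
y = X @ w_true + 0.1 * rng.normal(size=n)   # synthetic RV offsets

lam = 1e3  # assumed ridge strength; in practice chosen by cross-validation

# Dual form: solve an n x n system instead of a p x p one.
K = X @ X.T
alpha = np.linalg.solve(K + lam * np.eye(n), y)
w_hat = X.T @ alpha                         # equivalent primal weights, length p

# Exact leave-one-out residuals for ridge via the hat matrix H = K (K + lam I)^-1.
H = K @ np.linalg.inv(K + lam * np.eye(n))
loo_fast = (y - H @ y) / (1.0 - np.diag(H))

# Brute-force check: refit with each epoch held out in turn.
loo_brute = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    a_i = np.linalg.solve(K[np.ix_(keep, keep)] + lam * np.eye(n - 1), y[keep])
    loo_brute[i] = y[i] - K[i, keep] @ a_i

print(np.sqrt(np.mean(loo_fast ** 2)))      # leave-one-out RMS error
```

The hat-matrix shortcut is exact for ridge without an unpenalized intercept. Cross-validating 45 examples therefore costs about the same as a single fit.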

## 2020-11-21

### selection function and white dwarfs

Hans-Walter Rix (MPIA) and I worked on our project to explain, elucidate, and determine the selection function (for ESA *Gaia* and other surveys) today. We decided that a good toy problem is the luminosity function of white dwarf stars. It is a good toy problem because the selection function is necessary, but white dwarfs are faint enough that any sample of them is nearby, so the full three-dimensional dust map isn't required. I wrote words about this problem in a LaTeX document to get us started.
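The classic 1/V_max estimator shows why the selection function matters for a luminosity function. We swap it in here as a standard illustration; it is not necessarily what this project will use. The toy below assumes a sharp apparent-magnitude limit, uniform space density, no dust, and two made-up white-dwarf absolute magnitudes. Counting selected stars naively badly undercounts the faint ones, and weighting each star by the inverse of its accessible volume recovers the true density.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy universe: uniform density of two white-dwarf "types" (absolute
# magnitudes 10 and 14) inside a 1 kpc sphere, with no extinction.
r_max_pc = 1000.0
sphere_volume = 4.0 / 3.0 * np.pi * r_max_pc ** 3
n_per_type = 200_000
abs_mag = np.repeat([10.0, 14.0], n_per_type)
dist_pc = r_max_pc * rng.uniform(size=abs_mag.size) ** (1 / 3)  # uniform in volume
app_mag = abs_mag + 5 * np.log10(dist_pc / 10.0)

# Assumed selection function: a sharp apparent-magnitude limit.
m_lim = 20.0
sel = app_mag < m_lim

def v_max(M, m_lim, r_cap):
    """Volume (pc^3) in which a star of absolute magnitude M passes the cut,
    capped at the toy universe's edge r_cap."""
    d_lim = 10.0 ** ((m_lim - M + 5.0) / 5.0)
    return 4.0 / 3.0 * np.pi * np.minimum(d_lim, r_cap) ** 3

true_density = n_per_type / sphere_volume
results = {}
for M in (10.0, 14.0):
    in_bin = sel & (abs_mag == M)
    naive = in_bin.sum() / sphere_volume
    vmax_est = np.sum(1.0 / v_max(abs_mag[in_bin], m_lim, r_max_pc))
    results[M] = (naive / true_density, vmax_est / true_density)
    print(M, results[M])  # (naive / truth, 1/V_max / truth)
```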

## 2020-11-19

### a selection function for Gaia

This morning we had a call to begin the *GaiaUnlimited* project, which is a multi-institution collaboration to make a useful selection function for the ESA *Gaia* Catalog and Mission. The idea is: *Gaia* produces catalogs, but it has no deliverable mask or selection probability, so it is not possible to use the catalogs for certain (maybe most?) statistical purposes without additional information. We are going to try to construct that information for the community. Today we kicked off this project, and discussed the scope. We decided that all observational selections are in, but the three-dimensional dust map in the Milky Way is out!

After the call, Rix (MPIA) and I decided that we have to find some good example projects that make use of the selection function but don't need the dust map, because we want to be customers for the project as well as owners.

By the way, this project was started at a #GaiaSprint!

## 2020-11-18

### the physics of “do we live in a simulation?”

I had a good call with Paula Seraphim (NYU) today, who is doing a literature search on the physics relevant to the question “Do we live in a simulation?”. She found the classic papers by Dyson and by Frautschi in the 1980s on information processing in the expanding universe, and by Feynman on whether one quantum system can exactly simulate another. Meanwhile (really yesterday), I have got ready a class to teach for Blakesley Burkhart (Rutgers) about this subject: What does a physicist (as opposed to a philosopher, say) have to say about this question?

## 2020-11-17

### BOSS bright limit and Gaia parallax quality

Today I finished my open-fiber proposal, with Adrian Price-Whelan (Flatiron). We figured out that one of the oddities we found yesterday (that there are no spectra of very luminous red giants) comes from an interaction between any sensible parallax signal-to-noise cut on the ESA *Gaia* data and the bright limit on the *SDSS* visible spectrographs. A very luminous giant that is far enough away to be fainter than the bright limit is too distant to pass the parallax cut. Brutal! We have to select in some way that doesn't impose the signal-to-noise cut (since the bright limit is unavoidable). I have ideas (one of which is in this paper), but I didn't have time to implement them before we had to submit the proposal. Oh well! We will get opportunities to update our target lists later.
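A back-of-the-envelope sketch shows the squeeze. Everything here is a hypothetical stand-in: the parallax-uncertainty curve, the bright limit `G_bright`, and the S/N threshold are not the official *Gaia* error model or the real *SDSS* limit, and extinction is ignored. For each absolute magnitude, we scan distance and keep only distances where the star is both fainter than the bright limit and above the parallax S/N cut.

```python
import numpy as np

def parallax_error_mas(G):
    """Hypothetical, simplified parallax-uncertainty curve: flat for G < 15,
    then growing by a factor of 10 per 5 magnitudes. Illustrative only."""
    return 0.04 * 10.0 ** (0.2 * np.clip(G - 15.0, 0.0, None))

def usable_distances(abs_G, G_bright=13.0, snr_min=5.0):
    """Distances (pc) at which a star is fainter than an assumed bright limit
    AND passes a parallax signal-to-noise cut. No extinction."""
    d = np.logspace(1, 5, 4000)                 # 10 pc to 100 kpc
    G = abs_G + 5.0 * np.log10(d / 10.0)        # apparent magnitude
    snr = (1000.0 / d) / parallax_error_mas(G)  # parallax in mas is 1000 / d[pc]
    return d[(G >= G_bright) & (snr >= snr_min)]

for abs_G in (1.0, -1.0, -3.0):
    d_ok = usable_distances(abs_G)
    print(abs_G, d_ok.size, d_ok.max() if d_ok.size else None)
```

Under these made-up numbers, a red-clump-like star (absolute G of 1) has a usable distance window. The more luminous giants have none, because they only pass the parallax cut when they are brighter than the spectrograph can take.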

## 2020-11-16

### more open fiber proposal

My research time today was spent writing in my *SDSS-V* open-fiber proposal. Adrian Price-Whelan (Flatiron) got and selected the relevant ESA *Gaia* data, and we did experiments with boxelization of the color-magnitude diagram. We are finding that there are (to my surprise) *many* *SDSS-II*, *SDSS-III*, and *SDSS-IV* spectra of the stars in between the main sequence and the white dwarf sequence (home of CVs, stripped stars, and low-metallicity stars). Thousands! But to my equally large surprise, there are almost no optical spectra of the most luminous giant stars. What gives?
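Boxelization of the color-magnitude diagram amounts to two 2-D histograms: one of all stars and one of stars that already have a spectrum. This is a minimal sketch with synthetic inputs. The boxel sizes (0.1 mag in color, 0.25 mag in absolute G) and the column names are our assumptions, not the ones used in the proposal.

```python
import numpy as np

def boxel_counts(bp_rp, abs_g, has_spectrum, color_edges, mag_edges):
    """Count all stars, and stars that already have a spectrum, in each
    color-magnitude boxel. Both outputs have shape (n_color_bins, n_mag_bins)."""
    n_all, _, _ = np.histogram2d(bp_rp, abs_g, bins=[color_edges, mag_edges])
    n_spec, _, _ = np.histogram2d(bp_rp[has_spectrum], abs_g[has_spectrum],
                                  bins=[color_edges, mag_edges])
    return n_all, n_spec

# Hypothetical synthetic inputs: Gaia-like BP-RP colors, absolute G magnitudes,
# and a flag for whether each star already has an SDSS spectrum.
rng = np.random.default_rng(3)
n = 10_000
bp_rp = rng.uniform(-0.5, 3.5, size=n)
abs_g = rng.uniform(-4.0, 16.0, size=n)
has_spectrum = rng.uniform(size=n) < 0.1

color_edges = np.linspace(-0.5, 3.5, 41)   # 0.1 mag boxels in color
mag_edges = np.linspace(-4.0, 16.0, 81)    # 0.25 mag boxels in absolute G
n_all, n_spec = boxel_counts(bp_rp, abs_g, has_spectrum, color_edges, mag_edges)

# Boxels with stars but no spectra mark the unobserved parts of the CMD.
unobserved = (n_all > 0) & (n_spec == 0)
print(int(unobserved.sum()), "boxels contain stars but no spectra")
```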

## 2020-11-13

### open-fiber proposal

The *SDSS-V* project uses robot fiber positioners to take millions of short (15-ish minutes per visit) spectra in the visible and infrared. Because of the geometric constraints of the fiber positioners, and the targeting, there will be many, many unused fibers, meaning many opportunities to add additional spectroscopic targets! The project issued an internal call for proposals for the open fibers. Today I spent time writing one, which is very simple: It is to fill out the unobserved parts of the ESA *Gaia* color-magnitude diagram by targeting stars for spectroscopy that do not already have a nearby star with a spectrum. The word “nearby” implies a resolution (how nearby?). The proposals are due in a few days and we still don't know exactly what our resolution should be! Also, do we treat variable stars differently from non-variable stars? We have work to do!
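The “no nearby star with a spectrum” rule can be phrased as a nearest-neighbor query in the color-magnitude diagram with a distance bound equal to the resolution. In this sketch the per-axis scaling (0.05 mag in color counts the same as 0.25 mag in absolute magnitude) and the resolution value are hypothetical, since choosing them is exactly the open question. Variable stars are not treated specially.

```python
import numpy as np
from scipy.spatial import cKDTree

def select_open_fiber_targets(cmd_candidates, cmd_with_spectra, resolution):
    """Boolean mask over candidates with NO star that has a spectrum within
    `resolution` (Euclidean distance in the scaled CMD)."""
    tree = cKDTree(cmd_with_spectra)
    # With distance_upper_bound set, queries that find no neighbor inside
    # the bound return an infinite distance.
    dist, _ = tree.query(cmd_candidates, k=1, distance_upper_bound=resolution)
    return np.isinf(dist)

# Hypothetical scaling so that one unit means "comparably close" on both axes.
scale = np.array([0.05, 0.25])  # (BP-RP color, absolute G) in mag per unit
with_spectra = np.array([[1.0, 4.0], [2.0, 0.0]]) / scale
candidates = np.array([[1.02, 4.1], [0.5, 10.0], [2.5, -1.0]]) / scale
mask = select_open_fiber_targets(candidates, with_spectra, resolution=1.0)
print(mask)  # expected values: False, True, True
```

The first candidate sits within one scaled unit of an observed star, so it is skipped. The other two fall in empty regions and become targets.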

## 2020-11-12

### complexifying a residuals model for EPRV

Lily Zhao (Yale) has built (what I call) a generative model for the residuals (in the spectral domain) of *EXPRES* spectra of a magnetically active star, relative to the star's mean spectrum. This generative model takes the pipeline-generated radial velocity as the label that generates the residuals. Then we do inference to find out whether spectral-shape variations predict or can correct pipeline-generated radial velocities. This model is conceptually very like *The Cannon*. Today we generalized this model to take two or more pieces of meta-data or labels that can be used to generate the residuals, and then still do inference to correct the radial velocities. We'll see if it helps. My intuition says it will.
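The two-step structure of such a Cannon-like model fits in a short sketch. Training is generative: each pixel's residual is a function of the labels. The test step is inference: find the labels that best explain a new residual spectrum. In this illustration the model is linear in the labels; *The Cannon* itself uses a quadratic model. The sizes, the two labels, and the data are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical sizes: 40 training epochs, 500 pixels, 2 labels per epoch
# (e.g. pipeline RV plus one other piece of meta-data; our choice).
n_epochs, n_pix, n_labels = 40, 500, 2
labels = rng.normal(size=(n_epochs, n_labels))
coeffs_true = rng.normal(size=(n_labels, n_pix))
resid = labels @ coeffs_true + 0.05 * rng.normal(size=(n_epochs, n_pix))

# Training (generative direction): residual flux at every pixel is linear in
# the labels; one least-squares call fits all pixels at once.
coeffs_hat, *_ = np.linalg.lstsq(labels, resid, rcond=None)  # (n_labels, n_pix)

# Test (inference direction): labels that best explain a new residual spectrum.
new_labels = np.array([0.7, -1.2])
new_resid = new_labels @ coeffs_true + 0.05 * rng.normal(size=n_pix)
inferred, *_ = np.linalg.lstsq(coeffs_hat.T, new_resid, rcond=None)
print(inferred)  # close to the true labels [0.7, -1.2]
```

Adding more labels just adds rows to `coeffs_hat`. The inference step stays a small least-squares problem over all pixels.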

## 2020-11-11

### preparing a NeurIPS tutorial

In December (the 7th, to be precise), Kate Storey-Fisher (NYU) and I will be giving a tutorial at the *NeurIPS* Conference. Our tutorial will be on machine learning and astronomy. Ordinarily we'd get slides ready and be ready to present, but two things are making our preparation harder: One is that we want to deliver not just content, but also working Jupyter notebooks that show the participants how to get and use astronomical data of different kinds. The other is that the online format means that we need to pre-record parts of our tutorial. That's daunting, somehow! We spent a good part of today discussing content, scheduling, and making slides.

## 2020-11-10

### Oort constants and the first-order MySpace

Today we realized or re-realized that the linear terms in our *MySpace* project (the project to find the coordinate transformation that maximizes the informative-ness of the velocity sub-structure in the Milky Way disk) are just the Oort constants, or a generalization thereof. But since they maximize the velocity structure, they don't necessarily mean, for us, what Oort expected them to mean. We also got the method working to first order. Or I should say that Adrian Price-Whelan (Flatiron) did. He made use of *jax*, the auto-differentiation package for *Python*. It's impressive.
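For comparison, here is a sketch of the classical version. A linear mean velocity field v = v0 + G x is fit to a local sample, and the 2×2 gradient tensor G is split into its symmetric, antisymmetric, and trace parts. With suitable axis conventions these parts are Oort's A, B, C, K, but the signs depend on how x and y are oriented. The data are synthetic, and we use plain numpy rather than *jax*. *MySpace* differs: its linear terms are chosen to sharpen velocity substructure, not to fit the mean flow.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical local sample: positions (kpc) and in-plane velocities (km/s)
# drawn from a linear mean field plus a 20 km/s dispersion.
n = 5000
x = rng.uniform(-0.5, 0.5, size=(n, 2))
v0 = np.array([-10.0, 220.0])
G_true = np.array([[-3.0, 12.0],    # G[i, j] = d v_i / d x_j, in km/s/kpc
                   [17.0, 2.0]])
v = v0 + x @ G_true.T + 20.0 * rng.normal(size=(n, 2))

# First-order fit: linear least squares on the design [1, x, y].
design = np.hstack([np.ones((n, 1)), x])
beta, *_ = np.linalg.lstsq(design, v, rcond=None)  # shape (3, 2)
G_hat = beta[1:].T                                  # G_hat[i, j] = d v_i / d x_j

def oort_like(G):
    """Symmetric (A, C), antisymmetric (B), and trace (K) parts of a 2x2
    velocity-gradient tensor; signs depend on the axis convention."""
    A = 0.5 * (G[0, 1] + G[1, 0])
    B = 0.5 * (G[1, 0] - G[0, 1])
    C = 0.5 * (G[0, 0] - G[1, 1])
    K = 0.5 * (G[0, 0] + G[1, 1])
    return A, B, C, K

print(np.round(oort_like(G_hat), 1))
```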

## 2020-11-09

### writing down the linear regression GP relationship

My loyal reader knows that I have been working on the fundamentals of linear regression for a bit now. Today I did some writing on this topic. Last week, Soledad Villar (JHU) and I got to the point where we could write down a specific case where the limit of infinite features in a particular, carefully designed linear regression becomes exactly a Gaussian Process with a particular, carefully chosen kernel. I understand how to generalize this result partially, but not completely: Apparently this will work in an infinity of different bases, with an infinity of different weighting functions or kernels. My goal is to write something pedagogical and useful for practitioners.
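Here is one version of the correspondence written out. The assumptions are ours: cosine and sine features on a regular frequency grid $\omega_k = k\,\Delta\omega$, and independent zero-mean Gaussian priors on the coefficients, with variances set by a weighting function $S(\omega)$. It is a sketch of the standard weight-space argument, not necessarily the construction in the paper.

```latex
f(x) = \sum_{k=1}^{K} \left[ a_k \cos(\omega_k x) + b_k \sin(\omega_k x) \right],
\qquad a_k,\, b_k \sim \mathcal{N}\!\left(0,\; S(\omega_k)\,\Delta\omega\right)

\operatorname{Cov}\!\left[f(x), f(x')\right]
  = \sum_{k=1}^{K} S(\omega_k)\,\Delta\omega
    \left[\cos(\omega_k x)\cos(\omega_k x') + \sin(\omega_k x)\sin(\omega_k x')\right]
  = \sum_{k=1}^{K} S(\omega_k)\,\Delta\omega\,\cos\!\left(\omega_k (x - x')\right)

\lim_{K \to \infty,\; \Delta\omega \to 0} \operatorname{Cov}\!\left[f(x), f(x')\right]
  = \int_0^{\infty} S(\omega)\,\cos\!\left(\omega (x - x')\right) d\omega \;\equiv\; k(x - x')
```

A Gaussian prior on the weights makes f a Gaussian process with exactly this covariance. The posterior mean of the correspondingly regularized linear fit is therefore the GP posterior mean with the stationary kernel k, which is the cosine transform of S.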

## 2020-11-06

### human aspects of data analysis

We had a fun data group meeting today, in which we discussed many human aspects of data analysis (like asking questions in talks and seminars, and sharing work when it is in pre-publication status). I spoke about the connection between Gaussian processes and linear fitting with enormous numbers of basis functions; there is a limit in which they become identical, which is awesome. Group meeting was followed by a conversation with Storey-Fisher about what we are going to work on next: Pulsar timing? Looking for anomalies in large-scale structure? Intensity mapping?

## 2020-11-05

### talking about writing

It was a low-research day today. But I did have a great conversation with Viviana Acquaviva (CUNY) about the textbook she is writing on machine learning for advanced undergraduates in the natural sciences.

## 2020-11-04

### a Gaussian process is a limit of linear regression

Today Soledad Villar (JHU) and I completed a problem I've had open for literally years (I think I first worked on it in AstroHackWeek 2017): Does a linear fit become a Gaussian process when the number of components (parameters) goes to infinity? The answer is *yes*! But you have to choose your features very carefully, and take the limit (to infinite features) sensibly. If you meet those conditions it works, and the kernel function for the GP becomes a Fourier transform of the squares of the amplitudes of the features. That is, the kernel function in real space is the Fourier transform of the power spectrum in Fourier space. There are many details I don't yet understand, but we got it working both theoretically (on paper) and numerically (on the computer).
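Here is a numerical check of the claim, using one instance chosen by us rather than taken from the paper. The target is the squared-exponential kernel exp(−r²/2ℓ²), whose power spectrum is Gaussian. Ridge regression uses many cosine/sine features whose squared amplitudes sample that spectrum on a frequency grid, and its predictions are compared with a GP using the squared-exponential kernel directly.

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical training data on [0, 10].
n = 15
x = np.sort(rng.uniform(0.0, 10.0, size=n))
y = np.sin(x) + 0.1 * rng.normal(size=n)
x_test = np.linspace(0.0, 10.0, 200)
noise_var = 0.1 ** 2
ell = 1.0  # length scale of the target kernel (our choice)

def power_spectrum(w):
    """S(w) with int_0^inf S(w) cos(w r) dw = exp(-r^2 / (2 ell^2))."""
    return np.sqrt(2.0 / np.pi) * ell * np.exp(-0.5 * (w * ell) ** 2)

# Finite version of the limit: squared feature amplitudes equal S(w_k) dw,
# with a half (trapezoid) weight at w = 0.
w = np.linspace(0.0, 10.0 / ell, 500)
dw = w[1] - w[0]
amp = np.sqrt(power_spectrum(w) * dw)
amp[0] *= np.sqrt(0.5)

def features(t):
    arg = np.outer(t, w)
    return np.hstack([amp * np.cos(arg), amp * np.sin(arg)])

# Ridge regression = unit-variance Gaussian prior on every coefficient.
Phi, Phi_test = features(x), features(x_test)
coef = np.linalg.solve(Phi.T @ Phi + noise_var * np.eye(Phi.shape[1]), Phi.T @ y)
pred_regression = Phi_test @ coef

# Gaussian process with the squared-exponential kernel.
def k_se(a, b):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

pred_gp = k_se(x_test, x) @ np.linalg.solve(k_se(x, x) + noise_var * np.eye(n), y)
print(np.max(np.abs(pred_regression - pred_gp)))  # agreement to near machine precision
```

The agreement is this good because the Gaussian spectrum is smooth and decays fast, so the frequency-grid sum converges rapidly to the integral. Cruder feature sets converge more slowly.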

## 2020-11-03

### writing philosophy about EPRV

Lily Zhao (Yale), Megan Bedell (Flatiron), and I are working on a project to look at stellar spectral variations in extreme-precision radial-velocity spectroscopy, with *EXPRES* data. Do these stellar spectral variations tell you anything about stellar noise that distorts radial-velocity measurements? This project is very specific and technical, but it connects to some deep ideas in velocity measurement:

In principle you can only precisely measure radial-velocity *changes* in a star, never the precise absolute or systemic radial velocity. But this precision argument depends on having a constant spectrum. If the spectrum varies, there is no rock to stand on. So this project requires some philosophical backing, I think. I tried to write some of that down this morning. I love stuff like this!
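Here is a toy illustration of “no rock to stand on”, built on assumptions of our own: a single Gaussian absorption line and a linearized template-matching RV estimator. A true Doppler shift is recovered correctly. A pure line-shape change, where the Gaussian's center parameter stays fixed, also produces a nonzero “velocity”.

```python
import numpy as np

C_KMS = 299_792.458  # speed of light, km/s

# Hypothetical single absorption line on a log-wavelength grid.
ln_lam = np.linspace(np.log(5000.0), np.log(5001.0), 2001)
center = np.log(5000.5)
width = 2.0e-5  # in ln(lambda), about 6 km/s

def line(ln_lam, shift_kms=0.0, skew=0.0):
    """Gaussian absorption line Doppler-shifted by shift_kms. `skew` adds an
    odd-in-u distortion of the profile; the center parameter is unchanged."""
    u = (ln_lam - center - shift_kms / C_KMS) / width
    return 1.0 - 0.5 * np.exp(-0.5 * u ** 2) * (1.0 + skew * u)

template = line(ln_lam)
deriv = np.gradient(template, ln_lam)  # d flux / d ln(lambda)

def rv_from_template(flux):
    """First-order RV relative to the fixed template, from
    flux ~ template - (v / c) * d template / d ln(lambda)."""
    return -C_KMS * np.sum(deriv * (flux - template)) / np.sum(deriv ** 2)

print(rv_from_template(line(ln_lam, shift_kms=0.05)))  # true shift: ~0.05 km/s
print(rv_from_template(line(ln_lam, skew=0.01)))       # shape change: ~0.06 km/s
```

The skewed line has not moved, yet it “measures” at tens of m/s. When the spectrum itself changes, the velocity depends on what you decide the reference shape is.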

## 2020-11-02

### rebooting MySpace

Some people of a certain age will know what it means when you say that MySpace is dead. But the statement is wrong! Today Jason Hunt (Flatiron) rebooted an old project of Price-Whelan's and mine called MySpace. The motivation for our reboot: The upcoming ESA *Gaia* EDR3.

The idea is to figure out how the velocity-space structure (the moving groups, as it were) in the local disk varies with position, with a data-driven model, and then interpret the variations with position in terms of dynamical properties of the Milky Way. Hunt's innovation is to apply this same procedure to simulations as well as data, and use the output to classify the velocity substructure (classify as in: Does it come from resonances or disrupting clusters or what?). We discussed some of the math and optimization involved, since we phrase the problem as the fitting of an expansion.
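Here is a toy version of “fit an expansion so that velocity substructure is as sharp as possible”. The assumptions are entirely ours:
- only a first-order (linear) expansion in position;
- a leave-one-out kernel-density log-likelihood of the corrected velocities as the measure of sharpness;
- scipy's BFGS with numerical gradients standing in for *jax* autodiff.

It shows the shape of the optimization, not the actual *MySpace* objective.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

rng = np.random.default_rng(7)

# Hypothetical data: two cold moving groups whose mean velocity drifts
# linearly with position through the same unknown 2x2 matrix M_true.
n_per_group = 100
x = rng.uniform(-0.5, 0.5, size=(2 * n_per_group, 2))              # kpc
centers = np.repeat(np.array([[0.0, 0.0], [30.0, 0.0]]), n_per_group, axis=0)
M_true = np.array([[5.0, -8.0], [12.0, 3.0]])                       # km/s/kpc
v = centers + x @ M_true.T + 1.0 * rng.normal(size=x.shape)        # km/s

def neg_sharpness(m_flat, h):
    """Negative leave-one-out log kernel density of u_i = v_i - M x_i.
    Sharper substructure means higher density, so the minimum is at the M
    that best undoes the position-dependent drift."""
    M = m_flat.reshape(2, 2)
    u = v - x @ M.T
    d2 = np.sum((u[:, None, :] - u[None, :, :]) ** 2, axis=-1)
    log_k = -0.5 * d2 / h ** 2
    np.fill_diagonal(log_k, -np.inf)  # exclude each star from its own sum
    return -np.sum(logsumexp(log_k, axis=1))

# Coarse-to-fine: a wide kernel first for a smooth landscape, then a narrow one.
m = np.zeros(4)
for h in (5.0, 1.5):
    m = minimize(neg_sharpness, m, args=(h,), method="BFGS").x
M_hat = m.reshape(2, 2)
print(np.round(M_hat, 1))
```

The coarse-to-fine schedule is a design choice. A narrow kernel alone can leave the optimizer stuck far from the answer.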