Hans-Walter Rix (MPIA) and I worked today on our project to explain, elucidate, and determine the selection function (for ESA Gaia and other surveys). We decided that a good toy problem is the luminosity function of white dwarf stars. This is a good toy problem because the selection function is necessary, but the white dwarfs aren't so far away that the full three-dimensional dust map is required. I wrote words about this problem in a LaTeX document to get us started.
I had a good call with Paula Seraphim (NYU) today, who is doing a literature search on the physics relevant to the question “Do we live in a simulation?”. She found the classic papers by Dyson and by Frautschi in the 1980s on information processing in the expanding universe, and by Feynman on whether one quantum system can exactly simulate another. Meanwhile (really yesterday), I got a class ready to teach for Blakesley Burkhart (Rutgers) about this subject: What does a physicist (as opposed to a philosopher, say) have to say about this question?
The SDSS-V project uses robot fiber positioners to take millions of short (15-ish minutes per visit) spectra in the visible and infrared. Because of the geometric constraints of the fiber positioners, and the targeting, there will be many, many unused fibers, meaning many opportunities to add additional spectroscopic targets! The project issued an internal call for proposals for the open fibers. Today I spent time writing one, which is very simple: It is to fill out the unobserved parts of the ESA Gaia color-magnitude diagram, by targeting stars for spectroscopy that do not already have a nearby star with a spectrum. The word “nearby” implies a resolution (how nearby?). The proposals are due in a few days and we still don't know exactly what our resolution should be! Also, do we treat variable stars differently from non-variable stars? We have work to do!
Lily Zhao (Yale) has built (what I call) a generative model for the residuals (in the spectral domain), away from the mean spectrum, of the spectra taken by EXPRES for a magnetically active star. This generative model takes the pipeline-generated radial velocity as the label that generates the residuals. Then we do inference to find out whether spectral-shape variations predict or can correct pipeline-generated radial velocities. This model is conceptually very much like The Cannon. Today we generalized this model to take two or more pieces of meta-data or labels that can be used to generate the residuals, and then still do inference to correct the radial velocities. We'll see if it helps. My intuition says it does help.
In December (the 7th, to be precise), Kate Storey-Fisher (NYU) and I will be giving a tutorial at the NeurIPS Conference. Our tutorial will be on machine learning and astronomy. Ordinarily we'd get slides ready and be ready to present, but two things are making our preparation harder: One is that we want to deliver not just content, but also working Jupyter notebooks that show the participants how to get and use astronomical data of different kinds. The other is that the online format means that we need to pre-record parts of our tutorial. That's daunting, somehow! We spent a good part of today discussing content, scheduling, and making slides.
Today we realized or re-realized that the linear terms in our MySpace project (the project to find the coordinate transformation that maximizes the informative-ness of the velocity sub-structure in the Milky Way disk) are just the Oort constants, or a generalization thereof. But since they maximize the velocity structure, they don't necessarily mean, for us, what Oort expected them to mean. We also got the method working to first order. Or I should say that Adrian Price-Whelan (Flatiron) did. He made use of jax, the auto-differentiation package for Python. It's impressive.
My loyal reader knows that I have been working on the fundamentals of linear regression for a bit now. Today I did some writing on this topic. Last week, Soledad Villar (JHU) and I got to the point that we could write down a specific case where the limit of infinite features in a particular, carefully designed linear regression becomes exactly a Gaussian Process with a particular, carefully chosen kernel. I understand how to generalize this result partially, but not completely: Apparently this will work in an infinity of different bases, with an infinity of different weighting functions or kernels. My goal is to write something pedagogical and useful for practitioners.
We had a fun data group meeting today, in which we discussed many human aspects of data analysis (like asking questions in talks and seminars, and sharing work when it is in pre-publication status). I spoke about the connection between Gaussian processes and linear fitting with enormous numbers of basis functions; there is a limit in which they become identical, which is awesome. Group meeting was followed by a conversation with Storey-Fisher about what we are going to work on next: Pulsar timing? Looking for anomalies in large-scale structure? Intensity mapping?
It was a low-research day today. But I did have a great conversation with Viviana Acquaviva (CUNY) about the textbook she is writing on machine learning for advanced undergraduates in the natural sciences.
Today Soledad Villar (JHU) and I completed a problem I've had open for literally years (I think I first worked on it at AstroHackWeek 2017): Does a linear fit become a Gaussian process when the number of components (parameters) goes to infinity? The answer is yes! But you have to choose your features very carefully, and take the limit (to infinite features) sensibly. If you meet those conditions it works, and the kernel function for the GP becomes a Fourier transform of the squares of the amplitudes of the features. That is, the kernel function in real space is the Fourier transform of the power spectrum in Fourier space. There are many details I don't yet understand, but we got it working both theoretically (on paper) and numerically (on the computer).
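Here is a minimal numerical sketch of that limit, not the actual construction used in the project. It assumes cosine and sine features on a regular frequency grid, independent unit-variance Gaussian weights, and squared amplitudes that follow a Gaussian power spectrum (all of these choices, and the names `fourier_feature_kernel`, `d_omega`, and `length_scale`, are illustrative assumptions). Under those assumptions the covariance implied by the linear model should approach a squared-exponential kernel as the number of features grows.

```python
import numpy as np

def fourier_feature_kernel(tau, n_features, d_omega=0.05, length_scale=1.0):
    """Covariance implied by a linear model with cosine and sine features.

    Assumed setup (not from the original text): features at frequencies
    omega_j = (j + 1/2) * d_omega, independent unit-variance Gaussian weights,
    and squared amplitudes a_j**2 that follow a Gaussian power spectrum.
    The implied covariance at lag tau is sum_j a_j**2 * cos(omega_j * tau).
    """
    omega = (np.arange(n_features) + 0.5) * d_omega
    # Squared amplitudes: one-sided Gaussian power spectrum times the bin width.
    power = (2.0 * d_omega * length_scale / np.sqrt(2.0 * np.pi)
             * np.exp(-0.5 * (length_scale * omega) ** 2))
    return np.cos(np.outer(tau, omega)) @ power

def squared_exponential_kernel(tau, length_scale=1.0):
    """Fourier transform of the Gaussian power spectrum assumed above."""
    return np.exp(-0.5 * (tau / length_scale) ** 2)

tau = np.linspace(-5.0, 5.0, 201)
target = squared_exponential_kernel(tau)
for n_features in (10, 40, 160, 640):
    error = np.max(np.abs(fourier_feature_kernel(tau, n_features) - target))
    print(f"{2 * n_features:5d} features: max kernel error = {error:.2e}")
```

In this sketch the frequency spacing is held fixed while the maximum frequency grows. A sensible limit also needs the spacing to be fine compared to the inverse of the largest lag of interest.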
Lily Zhao (Yale), Megan Bedell (Flatiron), and I are working on a project to look at stellar spectral variations in extreme-precision radial-velocity spectroscopy, with EXPRES data. Do these stellar spectral variations tell you anything about the stellar noise that distorts radial-velocity measurements? This project is very specific and technical, but it connects to some deep ideas in velocity measurement:
In principle you can only precisely measure radial-velocity changes in a star, never the precise absolute or systemic radial velocity. But this precision argument depends on having a constant spectrum. If the spectrum varies, there is no rock to stand on. So this project requires some philosophical backing, I think. I tried to write some of that down this morning. I love stuff like this!
Some people of a certain age will know what it means when you say that MySpace is dead. But the statement is wrong! Today Jason Hunt (Flatiron) rebooted an old project of Price-Whelan's and mine called MySpace. The motivation for our reboot: The upcoming ESA Gaia EDR3.
The idea is to figure out how the velocity-space structure (the moving groups, as it were) in the local disk varies with position, with a data-driven model, and then interpret the variations with position in terms of dynamical properties of the Milky Way. Hunt's innovation is to apply this same procedure to simulations as well as data, and use the output to classify the velocity substructure (classify as in: Does it come from resonances or disrupting clusters or what?). We discussed some of the math and optimization involved, because we phrase this as the fitting of an expansion.
We submitted Kate Storey-Fisher's (NYU) paper on estimating the correlation function to the AAS Journals (probably ApJ, but they decide now, not us). I am so excited. It's been a great project and it has beautiful results and—if we can get this method adopted—we will save future missions and projects a lot of compute time. (And therefore reduce their carbon footprints!)
[Note added later: Here is the manuscript.]
I spoke (remotely) at CCA today about linear regression (fitting linear models for the purposes of prediction), when the linear regressions have huge numbers of parameters. Yes huge: More than the number of data points! It turns out that even though you can thread the data perfectly—your chi-squared will be exactly zero—you can still make good predictions for held-out data. That surprised the crowd, which, in turn, surprised me: Many in this crowd use Gaussian processes and deep learning, both of which have these properties: More parameters than data, can fit any training data perfectly, and yet still make good, non-trivial predictions on held-out data.
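A minimal sketch of this behavior, under assumptions of my own choosing (it is not the example from the talk): 20 noise-free training points, 200 cosine and sine features with amplitudes that fall off with frequency, and the minimum-norm solution that `numpy.linalg.lstsq` returns for an under-determined system. The names `design_matrix`, `n_freq`, and `d_omega` are illustrative.

```python
import numpy as np

def design_matrix(x, n_freq=100, d_omega=0.1):
    """Weighted cosine and sine features (an assumed, illustrative basis).

    The amplitudes 1 / (1 + omega**2) down-weight high frequencies, so the
    minimum-norm fit prefers smooth functions.
    """
    omega = d_omega * np.arange(1, n_freq + 1)
    amplitude = 1.0 / (1.0 + omega ** 2)
    phase = np.outer(x, omega)
    return np.hstack([amplitude * np.cos(phase), amplitude * np.sin(phase)])

# Training data: 20 points, far fewer than the 200 parameters.
x_train = np.linspace(0.0, 10.0, 20)
y_train = np.sin(x_train)

# Held-out data: the midpoints between the training points.
x_test = 0.5 * (x_train[:-1] + x_train[1:])
y_test = np.sin(x_test)

X_train = design_matrix(x_train)
# For an under-determined system, lstsq returns the minimum-norm solution.
beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

chi_squared = np.sum((X_train @ beta - y_train) ** 2)
test_rms = np.sqrt(np.mean((design_matrix(x_test) @ beta - y_test) ** 2))
print(f"parameters: {beta.size}, training points: {x_train.size}")
print(f"training chi-squared: {chi_squared:.2e}")
print(f"held-out rms error:   {test_rms:.2e}")
```

The feature weighting matters here: with equal amplitudes at all frequencies the fit would still thread the training data, but there would be no reason to expect good held-out predictions.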
My slides are here. Should I write something about all this?
When we think about finding extra-solar planets from the reflex motions they imprint into stellar radial-velocity data, we think about the problem of noise: There is shot noise, there are spectrograph-calibration offsets, there are imprints of the atmosphere, there is surface convection on the star and asteroseismic modes, there is magnetic activity, flaring, and so on! It's a mess. But there's also noise from other, unmodeled and undiscovered planets. That is, the other things orbiting the star, other than the planet of interest.
Today, Winston Harris (MTSU), Megan Bedell (Flatiron) and I came up with a plan for inserting this planetary-system noise into Harris's simulations of radial-velocity data. The question arose: What periods to use for the planets? And Bedell suggested that we adapt the Titius–Bode law! Hilarious. This gives us an extra-solar system architecture that makes sense, and simultaneously trolls anyone reading our paper.
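For concreteness, here is one way the period question could be sketched. It assumes the classical form of the Titius–Bode relation for semi-major axes (0.4 au, then 0.4 + 0.3 × 2^m au for m = 0, 1, 2, ...) and Kepler's third law for a star of given mass. How we will actually adapt the law is not settled, and the function names below are illustrative.

```python
import numpy as np

def titius_bode_semimajor_axes(n_planets):
    """Semi-major axes in au from the classical Titius-Bode relation.

    The innermost planet sits at 0.4 au; the rest follow 0.4 + 0.3 * 2**m
    for m = 0, 1, 2, ... (assumed here; an adapted law could differ).
    """
    k = np.arange(n_planets)
    doubling = np.where(k == 0, 0.0, 2.0 ** (k - 1.0))
    return 0.4 + 0.3 * doubling

def kepler_periods_years(semimajor_axes_au, stellar_mass_msun=1.0):
    """Orbital periods in years from Kepler's third law, P**2 = a**3 / M."""
    a = np.asarray(semimajor_axes_au, dtype=float)
    return np.sqrt(a ** 3 / stellar_mass_msun)

axes_au = titius_bode_semimajor_axes(6)
periods_yr = kepler_periods_years(axes_au)
for a, p in zip(axes_au, periods_yr):
    print(f"a = {a:5.2f} au, P = {p:6.2f} yr")
```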
[Insert here obligatory objection to naming things after people.]