I had a quick call today with Hans-Walter Rix in which we had the millionth conversation about the selection function and how it is used in astronomy problems. And how it ought to be used. We resolved our differences, once again. I think it's important to take any project I'm working on and discuss it over and over again. I learned this from various mathematics colleagues, including Goodman (NYU), Greengard (Flatiron), and Villar (JHU). You don't really understand a project until you can rigorously describe it. And you can only rigorously describe it after quite a few trials and errors.
I had some work conversations today. One was with Kate Storey-Fisher (NYU) about our paper on a continuous-function estimator for the two-point correlation function of galaxies. We were discussing this related paper and how we might give a better or more intuitive mathematical derivation of our estimator. It is correct, but our argument is somewhat obtuse: It is that the estimator becomes Landy–Szalay for the top-hat-bin case, but is affine invariant. That is an odd argument!
I also had a long chat with Lehman Garrison (Flatiron) about N-body simulations, initial conditions, entropy, symmetries, reconstruction, halo occupation, and other matters related to the computational theory of cosmology. I'm hoping there's a project for us to do here.
I dug up my old project (from early in the pandemic) to write a note about how to estimate uncertainties on your measurements. I didn't write much today, but I did do some re-reading and made some notes to my future self.
Soledad Villar (JHU) and I spent time today discussing symmetries and machine-learning methods. Like many astronomers, I'm interested in imposing deep physical symmetries on methods, figuring that if the methods respect the symmetries, they will learn much more relevant and useful things from their training data (and not spend their training information on learning the symmetries). An easy symmetry is the convolutional symmetry, but physics has far more: The laws of physics are permutation-equivariant (which in the ML business is graph symmetry), they are unitary (which in the ML business is (sometimes) normalizing), and they have rotation and translation and boost symmetries. We looked at papers on gauge-invariant networks, graph networks, and Hamiltonian networks. All extremely relevant! It seems like all the tools are in place to do something interesting in cosmology.
I spent some time today discussing a possible NSF Center on real-time data analysis with Ashley Villar (Columbia) and Tyler Pritchard (NYU), based on the wide-ranging, grass-roots interest in time-domain astrophysics we have discovered in NYC during this pandemic. NSF Centers are big projects!
Because we found some apparent anomalies in the DR16 data, Anna-Christina Eilers (MIT) and I are looking at the noise properties of some of the SDSS-IV APOGEE stellar spectra. It's pretty inside baseball but we are comparing the MAD (my favorite statistic—the median absolute difference in brightness between neighboring spectral pixels) to the reported spectral signal-to-noise ratio and stellar parameters. We find that, as expected, MAD (on a normalized spectrum) decreases with SNR, but increases with metallicity and decreases with temperature (because more metal-rich and cooler stars have stronger lines). But there are outliers, where the SNR measurements appear completely wrong. We're following up now, but first: How to sensitively identify outliers in this space? It won't surprise my loyal reader that we tentatively decided to use nearest neighbors with a kd-tree.
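A minimal sketch of that kind of nearest-neighbor outlier search (the four-dimensional feature space here is fake data standing in for the SNR / metallicity / temperature / MAD space, with a few gross outliers planted by hand):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(42)

# Fake four-dimensional feature space standing in for (SNR, metallicity,
# temperature, MAD); the real features and units are different.
X = rng.normal(size=(1000, 4))
X[:3] += 8.0  # plant three gross outliers

# Standardize, then score each point by its distance to its k-th nearest
# neighbor: points far from all their neighbors are candidate outliers.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
tree = cKDTree(Z)
dists, _ = tree.query(Z, k=6)        # self plus 5 neighbors
score = dists[:, -1]                 # distance to the 5th-nearest neighbor

outliers = np.sort(np.argsort(score)[-3:])  # indices of the planted points
```

The kd-tree makes the all-pairs neighbor query fast, which matters when the catalog has hundreds of thousands of spectra.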
As my loyal reader knows, Lily Zhao (Yale) has had great success with improving extreme precision radial-velocity measurements. In her latest work, she shows that she can use regression to predict radial velocity measurements from shape changes in the spectrum, independent of (or even orthogonal to) pure Doppler shifts. Under cross-validation, the regression can improve the scatter in the measured radial velocities. Today with Megan Bedell (Flatiron) we scoped a paper for these results. We want to show that the regression is a baby step towards something like doppler imaging spectroscopy, in which we would build a model of the full, rotating stellar surface in order to correct for it and see just the center-of-mass motion of the star. If we want to make that argument, we need a source for realistic simulated stellar spectra from a real, spotty, rotating star.
My day started with a conversation with Gaby Contardo (Flatiron) about modeling light curves of stars. We have projects in which we try to predict forwards and backwards in time, and compare the results. We're trying to make a good scope for a paper, which could involve classification, regression, or causal inference. Or all three. As usual, we decided to write an abstract to help us picture the full scope of the first paper.
Today Adrian Price-Whelan (Flatiron) showed me a great idea for looking at stellar dynamics in the Milky Way, that I think is unique and new: Look at the statistics of close, comoving (but unbound) pairs of stars as a function of orbit. The idea is that orbits can be classified (continuously maybe) by how unbound pairs close in phase space separate over time, when their center of mass (say) is on that orbit. And, especially, there are very big differences between chaotic and regular orbits in this respect (it's almost the definition of chaotic). So maybe the distribution of comoving pair separations is a strong function of orbit? That would be awesome. And it relates to things we have thought about with cold stellar streams.
Today Kate Storey-Fisher (NYU) and I spoke about next projects to do in large-scale structure, now that her first paper on clustering is out. One project is to use the flexibility of her clustering estimator to look at large-scale gradients or variations in the large-scale structure (analogous to power asymmetries in the CMB, for example). We have Abby Williams (NYU) working on this now. Another is to use it to make real-space correlation-function components that are the (appropriate angular averages of) Fourier transforms of power-spectrum (harmonic) modes. If we do that, we can perfectly unify real-space and k-space approaches to large-scale structure, without making the (frankly brutal) approximations that they make in k-space projects. We will have a big carbon footprint, though!
Today I finished the zeroth draft of a (first-author; gasp!) paper about linear regression with large numbers of parameters. My co-author is Soledad Villar (JHU). The paper shows how—when you are fitting a flexible model like a polynomial or a Fourier series—you can have more parameters than data with no problem, and in fact you often do better in that regime, even in predictive accuracy for held-out data. It also shows that as the number of parameters goes to infinity, your linear regression becomes a Gaussian process if you choose your regularization correctly. It is designed to be like a textbook chapter so we are faced with the question: Where to publish (other than arXiv, which is a given).
Today Rachael Roettenbacher (Yale) gave the CCA Colloquium, on Doppler imaging and interferometry and other methods for imaging stellar surfaces. In conversations with Roettenbacher and others (including Lily Zhao at Yale and Megan Bedell and Rodrigo Luger at Flatiron), I'm coming around to the position that extremely precise radial-velocity measurements of stars will require accurate models of rotating, spotty, time-evolving stellar surfaces. At the end of her talk, there were some discussions about this point: Should EPRV surveys be associated with stellar surface imaging campaigns? Probably, if we want to characterize true Earth analogs!
Today David Spergel (Flatiron) came by to discuss the following question: How much do we know—empirically—about fluctuations or clustering in the dark matter distribution in the Milky Way halo? Spergel's idea is maybe to use old or metal-poor halo stars: Since stars form in the centers of their DM halos (we think), the clustering or fluctuations in phase space of the old stars should always be larger than the clustering or fluctuations in the dark matter. I bet that's true! And it's easy to test in standard simulations, I think.
I have spent a lot of time in my life advocating cross-validation for model selection. That's sad, maybe? But for many reasons, I think it is much better than computing Bayesian evidences or fully-marginalized likelihoods (FMLs on this site!). Today, for the paper Soledad Villar (JHU) and I are writing, I made this figure, which demonstrates leave-one-out cross-validation. Each curve is a different leave-one-out fit, decorated with that fit's prediction for the left-out point. Instructive? I hope so.
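For concreteness, here is a minimal leave-one-out cross-validation loop for polynomial-degree selection (toy data, not the figure from the paper): each point is held out in turn, the model is fit to the rest, and the held-out prediction error is accumulated.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 2.0, size=12))
y = np.sin(3.0 * x) + 0.1 * rng.normal(size=12)

def loo_cv_error(x, y, degree):
    # Leave each point out in turn, fit to the rest, predict the left-out point.
    errs = []
    for i in range(len(x)):
        mask = np.arange(len(x)) != i
        coeffs = np.polyfit(x[mask], y[mask], degree)
        errs.append((np.polyval(coeffs, x[i]) - y[i]) ** 2)
    return np.mean(errs)

# Model selection: choose the polynomial degree with the lowest LOO error.
errors = {d: loo_cv_error(x, y, d) for d in range(1, 8)}
best_degree = min(errors, key=errors.get)
```

No evidence integrals, no priors over models; just held-out predictive accuracy.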
Today Kate Storey-Fisher (NYU) and I led a tutorial at the beginning of the big NeurIPS machine-learning meeting. Our title was Machine learning for Astrophysics and Astrophysics Problems for Machine Learning. We used our forum to advertise astronomy problems to machine-learning practitioners: We astronomers have interesting, hard problems, and our data sets are free! We made public Jupyter notebooks that download data samples to give the crowd a taste of what's available, and how easy it is to get and munge into form.
Our slides are here and our four notebooks (for four different kinds of data) are here, here, here, and here. We got good audience interactions, including especially a group of great astronomers who helped answer questions in the chats!
I spent weekend research time elaborating my code notebooks and my slides for the #NeurIPS2020 tutorial that Kate Storey-Fisher (NYU) and I are doing on Monday. I'm a bit stressed with the last-minute prep, but how else would I ever do this??
Today at the Astronomical Data Group meeting (led by Dan Foreman-Mackey) we did our quasi-monthly thing of getting a quick update from everyone who shows up. And 18 people showed up! Everyone gave an update; it was great to see the breadth of activity in the group. One contribution that got me excited was Christina Hedges (Ames, but still part of the Group!), who is looking at CNES CoRoT data. The mission was designed to have some tiny bit of color sensitivity, which makes it possible to look at colored light-curve variations and distinguish causal effects. This builds on work by Hedges to look at tiny point-spread-function changes in NASA Kepler and TESS data to get a tiny bit of color information in those white-light missions. Colored light curves are the future.
Today ESA Gaia EDR3 dropped! It was a fun day; the data are more precise and less noisy! I'm involved in a few different projects with the data. With Hunt (Flatiron) and Price-Whelan (Flatiron) I am looking at the local velocity-space structure in the disk, and seeing if we can classify features by looking at how they vary spatially around the Solar position. With Eilers (MIT) I am going to update our spectrophotometric distance estimates to APOGEE luminous red giants. With Bonaca (Harvard) I will find out if we can improve the kinematics and orbit identification of stellar streams. None of these projects got very far today, but we did make this visualization!
I learned a lot about linear algebra today! I learned that if you exponentiate a matrix (yes, matrix exponentiation; if this makes you uncomfortable, think about the Taylor series for the exponential, and apply it to a matrix), the determinant of the resulting matrix is the exponential of the trace of the exponent matrix. So if you need unit-determinant matrices, you can make them by multiplying together exponentials of traceless matrices.
Why do I care about all this? Because Jason Hunt (Flatiron), Adrian Price-Whelan (Flatiron), and I realized yesterday that we need to make some of our transformation matrices volume-preserving. This is in our MySpace project for ESA Gaia EDR3 that finds a data-driven transformation of phase space to emphasize velocity structure. And I learned that Jax (the simple auto-differentiation tool for numpy and scipy) knows about matrix exponentiation.
I give thanks to Soledad Villar (JHU) for these insights about linear algebra.
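The identity is easy to check numerically (a sketch using scipy; the matrices are random, for illustration only):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))  # any square matrix

# The identity: det(expm(A)) = exp(trace(A)).
assert np.isclose(np.linalg.det(expm(A)), np.exp(np.trace(A)))

# So exponentiating a traceless matrix yields a unit-determinant matrix,
# and products of such matrices stay unit-determinant.
A0 = A - (np.trace(A) / 4.0) * np.eye(4)   # remove the trace
B0 = rng.normal(size=(4, 4))
B0 -= (np.trace(B0) / 4.0) * np.eye(4)
M = expm(A0) @ expm(B0)
assert np.isclose(np.linalg.det(M), 1.0)
```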
Christina Eilers (MIT) and I discussed our ESA Gaia EDR3 projects today. Our top priority is to re-do our machine-learning (linear regression, really) spectrophotometric distance estimates for very luminous red-giant stars, and then re-map the Milky Way disk in abundances and kinematics. We think that even a small improvement in the parallaxes (as we expect to get on Thursday) might make a big difference to our inferred spectroscopic distances. We discussed the point that in our DR2 work we only used stars on the “low-alpha sequence”; we want to generalize if we are going to make complete abundance maps. But also the stars with different abundance trends might want very different distance estimation parameters. That suggests doing the EDR3 regression in a more “abundance-aware” way.
I spent my research time today (hiding from the world; it's closed here in the US because of Thanksgiving plus pandemic) writing in my paper with Soledad Villar (JHU) about how to perform regression with very flexible models (like polynomials, wavelets, and Fourier modes).
ESA Gaia EDR3 is next week! I have been trying to get ready in various ways. Today Ana Bonaca (Harvard) showed me remarkable evidence that the cold stellar streams in the Milky Way halo are clustered in kinematic space, maybe also with the halo globular clusters! We discussed things to do in this area. It's perfect for EDR3 because if the clustering is real, it should improve with the improved precision of the EDR3 data.
I spoke with Katie Breivik (Flatiron) today about a project to paint toy binary stars (from Breivik's model of how binaries form and evolve) onto toy spectroscopic targets (from Neige Frankel's model of how the Milky Way disk formed) to see how many binary stars and how many black-hole (or compact-object) binaries Adrian Price-Whelan (Flatiron) and I should be finding in the APOGEE survey. The project is simple in principle, but the matching up of differently simulated catalogs is a conceptual and administrative challenge! The hope for this project is that we can constrain something about the formation of black-hole binaries by the observation that we don't find any (or don't find very many) in APOGEE!
Based on things I have been learning about linear regression this year, I suggested this morning to Lily Zhao (Yale) that she replace our generative model for stellar spectral variability with a discriminative model, which tries to predict the radial-velocity offset of a star from changes in the stellar spectral shape. It's not a trivial model, since the number of parameters (features) is immense (more than 200,000) and the number of training-set examples is small (45-ish). But by the afternoon she did it and it works. Indeed, it looks like it works better than the generative models we have, even when tested by cross-validation.
Hans-Walter Rix (MPIA) and I worked on our project to explain, elucidate, and determine the selection function (for ESA Gaia and other surveys) today. We decided that a good toy problem is the luminosity function of white dwarf stars. This is a good toy problem because the selection function is necessary, but white dwarfs aren't so far away that the full three-dimensional dust map is required. I wrote words about this problem in a LaTeX document to get us started.
This morning we had a call to begin the GaiaUnlimited project, which is a multi-institution collaboration to make a useful selection function for the ESA Gaia Catalog and Mission. The idea is: Gaia produces catalogs, but it has no deliverable mask or selection probability, so it is not possible to use the catalogs for certain (maybe most?) statistical purposes without additional information. We are going to try to construct that information for the community. Today we kicked off this project, and discussed the scope. We decided that all observational selections are in, but the three-dimensional dust map in the Milky Way is out!
After the call, Rix (MPIA) and I decided that we have to find some good example projects that make use of the selection function but don't need the dust map, because we want to be customers for the project as well as owners.
By the way, this project was started at a #GaiaSprint!
I had a good call with Paula Seraphim (NYU) today, who is doing a literature search on the physics relevant to the question “Do we live in a simulation?”. She found the classic papers by Dyson and by Frautschi in the 1980s on information processing in the expanding universe, and by Feynman on whether one quantum system can exactly simulate another. Meanwhile (really yesterday), I prepared a class to teach for Blakesley Burkhart (Rutgers) about this subject: What does a physicist (as opposed to a philosopher, say) have to say about this question?
Today I finished my open-fiber proposal, with Adrian Price-Whelan (Flatiron). We found that one of the oddities we discovered yesterday—to wit, that there are no spectra of very luminous red giants—comes from an interaction between any sensible parallax signal-to-noise cut on the ESA Gaia data, and the bright limit on the SDSS visible spectrographs. Brutal! We have to select in some way that doesn't make the signal-to-noise cut (since the bright limit is unavoidable). I have ideas (one of which is in this paper), but I didn't have time to implement them before we had to submit the proposal. Oh well! We will get opportunities to update our target lists later.
My research time today was spent writing in my SDSS-V open-fiber proposal. Adrian Price-Whelan (Flatiron) got and selected the relevant ESA Gaia data, and we did experiments with boxelization of the color-magnitude diagram. We are finding that there are (to my surprise) many SDSS-II, SDSS-III, and SDSS-IV spectra of the stars in between the main sequence and the white dwarf sequence (home of CVs, stripped stars, and low-metallicity stars). Thousands! But to my equally large surprise, there are almost no optical spectra of the most luminous giant stars. What gives?
The SDSS-V project uses robot fiber positioners to take millions of short (15-ish minutes per visit) spectra in the visible and infrared. Because of the geometric constraints of the fiber positioners, and the targeting, there will be many, many unused fibers—meaning, many opportunities to add additional spectroscopic targets! The project issued an internal call for proposals for the open fibers. Today I spent time writing one, which is very simple: It is to fill out the unobserved parts of the ESA Gaia color-magnitude diagram, but targeting stars for spectroscopy that do not already have a nearby star with a spectrum. The word “nearby” implies a resolution (how nearby?). The proposals are due in a few days and we still don't know exactly what our resolution should be! Also, do we treat variable stars differently from non-variable stars? We have work to do!
Lily Zhao (Yale) has built (what I call) a generative model for the residuals (in the spectral domain) of the EXPRES spectra of a magnetically active star, relative to the mean spectrum. This generative model takes the pipeline-generated radial velocity as the label that generates the residuals. Then we do inference to find out whether spectral-shape variations predict or can correct pipeline-generated radial velocities. This model is very conceptually like The Cannon. Today we generalized this model to take two or more pieces of meta-data or labels that can be used to generate the residuals, and then still do inference to correct the radial velocities. We'll see if it helps. My intuition says it does help.
In December (the 7th, to be precise), Kate Storey-Fisher (NYU) and I will be giving a tutorial at the NeurIPS Conference. Our tutorial will be on machine learning and astronomy. Ordinarily we'd just get slides ready and present, but two things are making our preparation harder: One is that we want to deliver not just content, but also working Jupyter notebooks that show the participants how to get and use astronomical data of different kinds. The other is that the online format means that we need to pre-record parts of our tutorial. That's daunting, somehow! We spent a good part of today discussing content, scheduling, and making slides.
Today we realized or re-realized that the linear terms in our MySpace project (the project to find the coordinate transformation that maximizes the informative-ness of the velocity sub-structure in the Milky Way disk) are just the Oort constants, or a generalization thereof. But since they maximize the velocity structure, they don't necessarily mean, for us, what Oort expected them to mean. We also got the method working to first order. Or I should say that Adrian Price-Whelan (Flatiron) did. He made use of jax, the auto-differentiation package for Python. It's impressive.
My loyal reader knows that I have been working on the fundamentals of linear regression for a bit now. Today I did some writing on this topic. Last week, Soledad Villar (JHU) and I got to the point where we could write down a specific case where the limit of infinite features in a particular, carefully designed linear regression becomes exactly a Gaussian Process with a particular, carefully chosen kernel. I understand how to generalize this result partially, but not completely: Apparently this will work in an infinity of different bases, with an infinity of different weighting functions or kernels. My goal is to write something pedagogical and useful for practitioners.
We had a fun data group meeting today, in which we discussed many human aspects of data analysis (like asking questions in talks and seminars, and sharing work when it is in pre-publication status). I spoke about the connection between Gaussian processes and linear fitting with enormous numbers of basis functions; there is a limit in which they become identical, which is awesome. Group meeting was followed by a conversation with Storey-Fisher about what we are going to work on next: Pulsar timing? Looking for anomalies in large-scale structure? Intensity mapping?
It was a low-research day today. But I did have a great conversation with Viviana Acquaviva (CUNY) about the textbook she is writing on machine learning for advanced undergraduates in the natural sciences.
Today Soledad Villar (JHU) and I completed a problem I've had open for literally years (I think I first worked on it in AstroHackWeek 2017): Does a linear fit become a Gaussian process when the number of components (parameters) goes to infinity? The answer is yes! But you have to choose your features very carefully, and take the limit (to infinite features) sensibly. But if you meet those conditions it works, and the kernel function for the GP becomes a Fourier transform of the squares of the amplitudes of the features. That is, the kernel function in real space is the Fourier transform of the power spectrum in Fourier space. There are many details I don't yet understand, but we got it working both theoretically (on paper) and numerically (on the computer).
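A numerical sketch of the kernel-equals-Fourier-transform-of-the-power-spectrum claim (the frequency grid, the Gaussian power spectrum, and the feature weighting below are all illustrative choices, not the specific construction in our paper):

```python
import numpy as np

# Fourier features cos(k x), sin(k x) on a frequency grid, with per-frequency
# prior variances p(k) — the "power spectrum". The implied kernel is
#   k(x, x') = integral of p(k) cos(k (x - x')) dk,
# i.e. the (cosine) Fourier transform of the power spectrum.
dk = 0.05
ks = (np.arange(400) + 0.5) * dk          # midpoint frequency grid, 0 to 20
p = np.exp(-0.5 * ks**2)                  # Gaussian power spectrum

x = np.linspace(-3.0, 3.0, 50)
Phi = np.concatenate([np.cos(np.outer(x, ks)),
                      np.sin(np.outer(x, ks))], axis=1)
S = np.concatenate([p, p]) * dk           # prior variance per feature
K_features = (Phi * S) @ Phi.T            # Gram matrix of the linear model

# A Gaussian power spectrum implies a squared-exponential kernel:
# integral_0^inf exp(-k^2/2) cos(k dx) dk = sqrt(pi/2) exp(-dx^2/2).
dx = x[:, None] - x[None, :]
K_analytic = np.sqrt(np.pi / 2.0) * np.exp(-0.5 * dx**2)
assert np.allclose(K_features, K_analytic, atol=1e-4)
```

As the frequency grid gets dense, the Gram matrix of the weighted linear model converges to the stationary GP kernel.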
Lily Zhao (Yale), Megan Bedell (Flatiron), and I are working on a project to look at stellar spectral variations in extreme-precision radial-velocity spectroscopy, with EXPRES data. Do these stellar spectral variations tell you anything about stellar noise that distorts radial-velocity measurements? This project is very specific and technical, but it connects to some deep ideas in velocity measurement:
In principle you can only precisely measure radial-velocity changes in a star, never the precise absolute or systemic radial velocity. But this precision argument depends on having a constant spectrum. If the spectrum varies, there is no rock to stand on. So this project requires some philosophical backing, I think. I tried to write some of that down this morning. I love stuff like this!
Some people of a certain age will know what it means when you say that MySpace is dead. But the statement is wrong! Today Jason Hunt (Flatiron) rebooted an old project of Price-Whelan's and mine called MySpace. The motivation for our reboot: The upcoming ESA Gaia EDR3.
The idea is to figure out how the velocity-space structure (the moving groups, as it were) in the local disk varies with position, with a data-driven model, and then interpret the variations with position in terms of dynamical properties of the Milky Way. Hunt's innovation is to apply this same procedure to simulations as well as data, and use the output to classify the velocity substructure (classify as in: Does it come from resonances or disrupting clusters or what?). We discussed some of the math and optimization involved, because we phrase this as the fitting of an expansion.
We submitted Kate Storey-Fisher's (NYU) paper on estimating the correlation function to the AAS Journals (probably ApJ, but they decide now, not us). I am so excited. It's been a great project and it has beautiful results and—if we can get this method adopted—we will save future missions and projects a lot of compute time. (And therefore reduce their carbon footprints!)
[Note added later: Here is the manuscript.]
I spoke (remotely) at CCA today about linear regression (fitting linear models for the purposes of prediction), when the linear regressions have huge numbers of parameters. Yes huge: More than the number of data points! It turns out that even though you can thread the data perfectly—your chi-squared will be exactly zero—you can still make good predictions for held-out data. That surprised the crowd, which, in turn, surprised me: Many in this crowd use Gaussian processes and deep learning, both of which have these properties: More parameters than data, can fit any training data perfectly, and yet still make good, non-trivial predictions on held-out data.
My slides are here. Should I write something about all this?
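Roughly the demonstration I showed, as a toy (the data, the Fourier basis, and the 1/k feature weighting are all assumptions for illustration): with more parameters than data, the minimum-norm least-squares solution threads the training data exactly, and its held-out predictions remain finite and sensible.

```python
import numpy as np

rng = np.random.default_rng(3)

# 20 data points, 201 features: far more parameters than data.
n, T = 20, 3.0
x = np.sort(rng.uniform(0.0, T, n))
y = np.sin(4.0 * np.pi * x / T) + 0.05 * rng.normal(size=n)

def fourier_features(x, K):
    ks = np.arange(1, K + 1)
    w = 1.0 / ks  # down-weight high frequencies (one choice of regularization)
    return np.concatenate([np.ones((len(x), 1)),
                           np.cos(2.0 * np.pi * np.outer(x, ks) / T) * w,
                           np.sin(2.0 * np.pi * np.outer(x, ks) / T) * w], axis=1)

A = fourier_features(x, 100)
beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # the minimum-norm solution

# The training data are threaded essentially exactly (chi-squared of zero)...
assert np.allclose(A @ beta, y)

# ...and yet the predictions at held-out points stay finite and sensible.
x_test = np.linspace(0.2, T - 0.2, 7)
pred = fourier_features(x_test, 100) @ beta
```

The feature weighting is doing the work here: it is what makes the minimum-norm interpolator smooth rather than wild.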
When we think about finding extra-solar planets from the reflex motions they imprint into stellar radial-velocity data, we think about the problem of noise: There is shot noise, there are spectrograph-calibration offsets, there are imprints of the atmosphere, there is surface convection on the star and asteroseismic modes, there is magnetic activity, flaring, and so on! It's a mess. But there's also noise from other, unmodeled and undiscovered planets. That is, the other things orbiting the star, other than the planet of interest.
Today, Winston Harris (MTSU), Megan Bedell (Flatiron) and I came up with a plan for inserting this planetary-system noise into Harris's simulations of radial-velocity data. The question arose: What periods to use for the planets? And Bedell suggested that we adapt the Titius–Bode law! Hilarious. This gives us an extra-solar system architecture that makes sense, and simultaneously trolls anyone reading our paper.
[Insert here obligatory objection to naming things after people.]
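For what it's worth, generating Titius–Bode periods is a near-two-liner (semimajor axes in AU, Kepler's third law in solar units):

```python
import numpy as np

# Titius–Bode semimajor axes in AU: a = 0.4 + 0.3 * 2**n for n = 0, 1, 2, ...,
# with the innermost a = 0.4 tacked on in front. Periods follow from Kepler's
# third law in solar units: P^2 = a^3.
a_au = np.array([0.4] + [0.4 + 0.3 * 2.0**n for n in range(6)])
P_years = a_au ** 1.5
P_days = 365.25 * P_years
```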
I spent research time today with Kate Storey-Fisher on the final details in her new paper on a continuous-function estimator for the 2-point correlation function. It removes binning from the estimation (all current estimators bin), and makes the results far less computationally costly to interpret.
This afternoon, David Spergel (Flatiron) and the CCA group leaders (and some friends) had a retreat to discuss long-term mission and plans. Given the global sitch, this retreat was only partially in-person. We discussed the plans of all the groups, and whether there is a coherent, cross-cutting mission statement that could be adopted by the full CCA. I think there is! We also discussed the structure of the organization, and how we want to be organized in the future. Not research, maybe? But in support of research, in the long run.
As my loyal reader knows, I have been interested in finding out how radial-velocity measurements of stars (for, say, exoplanet discovery) are affected by shape changes in the stellar spectrum. Today Lily Zhao (Yale) had a breakthrough: She did regression of residuals away from a constant-spectrum fit to radial-velocity data, and showed that the residuals can be used to predict the radial velocity! That is, she can show that stellar spectrum shape predicts measured radial velocity, over and above the expected Doppler shift. And she did this with proper cross-validation, so the result looks solid. I'm stoked! I need to write down some theory this weekend.
Jonah Goldfine (NYU) is looking, with Adrian Price-Whelan (Flatiron) and me, at short-period binary star systems in the NASA TESS data. We find that most of the short-period binaries that Price-Whelan finds in the APOGEE radial-velocity data have interesting variability in their photometry in the TESS data. Today we compared light curves folded on the Price-Whelan period found by The Joker with light curves folded on the period found with the Lomb-Scargle periodogram. There are lots of stars where The Joker gets the period better than the light curve, which is surprising, since we are validating with the light curve! There are so many kinds of variability to consider. Goldfine is going to start with ellipsoidal variations, I think.
Gaby Contardo (Flatiron) and I are asking whether you can predict the next data point in a stellar light curve from the last N data points or whether you can post-dict (is that a word?) the previous data point in a light curve from the next N. The results are surprisingly rich. She showed some of them in Stars and Exoplanets Meeting today.
In that same meeting I showed my attempt to measure the astrometry (celestial position and proper motion) of a star from the radial-velocity variations you see on an Earth-bound observatory. It is surprisingly precise! But not as precise as direct astrometric measurements.
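The predict-forwards / post-dict-backwards game can be sketched with a plain linear autoregression (toy light curve; our real models are richer than this):

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy "light curve": a smooth periodic signal plus noise.
t = np.arange(500)
flux = np.sin(2.0 * np.pi * t / 37.0) + 0.1 * rng.normal(size=t.size)

def lagged(y, N):
    # Rows are windows of N consecutive points; targets are the points that follow.
    X = np.stack([y[i:i + N] for i in range(len(y) - N)])
    return X, y[N:]

N = 10
X, target = lagged(flux, N)
coef, *_ = np.linalg.lstsq(X, target, rcond=None)   # linear predict-forwards model
rms = np.sqrt(np.mean((X @ coef - target) ** 2))

# The same machinery post-dicts: just reverse the time ordering.
Xb, target_b = lagged(flux[::-1], N)
coef_b, *_ = np.linalg.lstsq(Xb, target_b, rcond=None)
```

Comparing the forwards and backwards residuals is where the interesting (time-asymmetry) questions live.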
Dustin Lang (Perimeter) called me today and alerted me to this NASA funding call related to open-source projects. He argued that we need to take Astrometry.net to the next level. I agree! So we kicked around project and development ideas and vowed to take a stab at a letter of intent.
Today was a board meeting for the Terra Hunting collaboration. It was an important meeting, because in it, we more-or-less approved adding a laser-frequency comb to the required hardware of the project. This increases our budget, and brings in new partners. But I'm a huge fan (or I should say “And I'm a huge fan”), because our new partners will be awesome, and, just as important, as Lily Zhao, Megan Bedell, and I have shown, the LFC is great for constructing a fully data-driven, non-parametric calibration method for the spectrograph.
Calibration paper to appear on arXiv soon!
In our paper on factoring products of Gaussians, we recommend using the matrix inversion lemma (and we should have recommended the matrix determinant lemma too) to speed calculations. This weekend I wrote the most efficient numpy code I could to implement the log-Gaussian formulae in that paper. Here were my rules: Never construct a zero matrix or a sparse matrix using diag() or anything else that makes a container of mainly zeros. Never use inv(), only solve(). Always store and re-use elements of the calculation, especially those shared between the inversion and determinant lemmas. In the end I got a factor of thousands in speed over naive implementations, which themselves were hundreds of times faster than just constructing the matrices and operating on them.
That was fun, and a rare moment of pure coding for me.
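In the same spirit, here is the solve-don't-invert pattern for a Gaussian log-density, with one Cholesky factorization re-used for both the linear solve and the log-determinant (a generic sketch, not the code from the paper):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(5)
n = 300
G = rng.normal(size=(n, n))
C = G @ G.T + n * np.eye(n)   # a positive-definite covariance matrix
b = rng.normal(size=n)

# One Cholesky factorization, re-used for both the solve and the
# log-determinant; no inv(), no dense containers of zeros.
cf = cho_factor(C)
x = cho_solve(cf, b)                            # x = C^{-1} b
logdet = 2.0 * np.sum(np.log(np.diag(cf[0])))   # log det C from the factor

# Check against the slower, naive route.
assert np.allclose(C @ x, b)
assert np.isclose(logdet, np.linalg.slogdet(C)[1])
```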
As my loyal reader knows, our radial-velocity companion-search code called The Joker is a brute-force least-squares fitting (plus some Bayes) of an elliptical orbit at every possible period, orientation, and phase. In astrophysics, the Lomb–Scargle periodogram is a workhorse tool that, under the hood, is a brute-force least-squares fitting of a sinusoid at every possible period and phase. So these two ideas are fundamentally incredibly similar. And indeed, today Winston Harris (MTSU) demonstrated quantitatively that these are the same, and will become identical as eccentricities go to zero. That's interesting, because if we are doing companion search and we don't mind making the approximation that eccentricities are zero, the fitting of sinusoids (even at arbitrary phase, because: trig identities) is way, way faster than the fitting of Kepler functions.
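Here is the e = 0 limit made concrete: a brute-force least-squares sinusoid fit at every trial period, which is in essence a Lomb–Scargle periodogram (simulated data; the period, amplitude, and noise level are made up):

```python
import numpy as np

rng = np.random.default_rng(11)

# Simulated radial velocities: a circular (e = 0) orbit imprints a pure sinusoid.
t = np.sort(rng.uniform(0.0, 100.0, 80))
true_period = 7.3
v = 2.0 * np.sin(2.0 * np.pi * t / true_period + 0.5) + 0.3 * rng.normal(size=t.size)

# Brute-force least-squares sinusoid fit at every trial period — in essence,
# a Lomb–Scargle periodogram (with a floating mean).
periods = np.linspace(2.0, 20.0, 2000)
chi2 = np.empty_like(periods)
for i, P in enumerate(periods):
    A = np.stack([np.sin(2.0 * np.pi * t / P),
                  np.cos(2.0 * np.pi * t / P),
                  np.ones_like(t)], axis=1)
    coeffs, *_ = np.linalg.lstsq(A, v, rcond=None)
    chi2[i] = np.sum((A @ coeffs - v) ** 2)

best_period = periods[np.argmin(chi2)]
```

The sin-plus-cos parameterization is the trig identity at work: an arbitrary phase becomes a linear fit.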
Today Juna Kollmeier (Carnegie) convened a tiny meeting with Dalal, Percival, and me to discuss the next generation of cosmology missions and projects. We wandered around metrics, data-analysis methods (forward modeling?), and new hardware, but didn't come up with much specific to say, yet. But Kollmeier is right to be thinking forward, because the landscape is changing and the most interesting objects of cosmological research are evolving on time-scales shorter than project execution. Of course that's always the case for interesting disciplines!
We're in the process of finishing a paper for submission to AISTATS. I spent time on the weekend and Monday working on figures, and I basically failed. But luckily Teresa Huang (JHU) picked it up and made really nice figures that show the double-descent phenomenon in regression, the susceptibility of methods to “data poisoning” attacks, and the good effects of well-chosen regularizations.
Adam Wheeler (Columbia) and I went through points in his (very constructive) referee report on his paper using data-driven methods to find stars with abundance anomalies. We also discussed what to do next in this line of research. We are both leaning towards even more data-driven things, like finding parts of the spectrum that do and don't predict other parts of the spectrum. Information-theory-like things.
This weekend, I did something that Teresa Huang (JHU) has been suggesting I do for a while in our project on linear regression: Plot the null prediction. That is, how much better do our regression methods do than a trivial prediction of zero, no matter what, for every data point? It turns out that her intuitions were correct: There are many settings for our regression problems in which the null prediction does better than any data-driven regression! What gives? If the training data are noisy, it is possible for any training to move your predictions away from the null prediction in the wrong direction. Hahaha, that's something we have to think about.
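Here's a tiny simulation of the effect (all numbers invented): noisy labels, a weak true signal, and nearly as many parameters as data points, in which plain least squares loses to predicting zero out of sample.

```python
import numpy as np

rng = np.random.default_rng(42)
n, p = 30, 25
w_true = 0.01 * rng.normal(size=p)               # very weak true signal
X_train = rng.normal(size=(n, p))
y_train = X_train @ w_true + rng.normal(size=n)  # label noise dominates
X_test = rng.normal(size=(1000, p))
y_test = X_test @ w_true + rng.normal(size=1000)

w_hat, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
mse_fit = np.mean((y_test - X_test @ w_hat) ** 2)
mse_null = np.mean(y_test ** 2)                  # predict zero, always
```

With p close to n, the variance of the least-squares estimator blows up, and mse_fit comes out several times larger than mse_null.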
Today I hid in an undisclosed location and worked on a submission for AISTATS, which is due next week. The paper is with Teresa Huang (JHU) and Soledad Villar (JHU) and it is about regression and robustness thereof, as measured by prediction accuracy and sensitivity to adversarial attacks.
My loyal reader knows that Adrian Price-Whelan (Flatiron) and I are trying to finish a paper in which we image the orbital tori in the phase-space of the Milky Way using chemical abundances of stars. We are using a very simplistic, toy model of the galaxy mass distribution, which then generates orbits, which trace the tori. In the process of writing this up, we realized that there must be a fully data-driven, flexible model for the orbital tori that we could use. In this formulation we would find the best closed-3-torus foliation of 6-dimensional phase space, and then only after finding these tori, interpret them in terms of the force law and hence the mass model. It's an interesting question whether there are useful math tools that do this.
Winston Harris (MTSU) and I looked at the output of The Joker and the Lomb-Scargle periodogram on his fake exoplanet radial-velocity data. They maybe look similar. It makes sense, since they are both doing a simple likelihood-based linear fit of periodic functions.
I spent some time putting together a notebook to test the idea that Josh Winn (Princeton) had that the Solar System barycentric radial-velocity correction could be indicative of the stellar astrometry for the very brightest stars, where precise astrometry is (strangely) hard. It should be possible at some level, since the barycentric correction does depend on the astrometry! I love the idea of doing astrometry by taking radial-velocity measurements.
I joined Hans-Walter Rix's group meeting at MPIA today by telecon so I could see Anthony Brown (Leiden) present the plans for the ESA Gaia eDR3 data release. It's going to be great, with proper motions improving by a factor of two (in standard error; a factor of four in information content). But full DR3 isn't until 2022, so we don't get low-res spectra, high-res spectra, and other such goodies for a while. Now, how to get ready for eDR3? What projects to concentrate on?
I talked to Jason Hunt (Flatiron) this afternoon about things we might do with the ESA Gaia early-DR3 data (December 2020), which will be similar to DR2 but with improved parallaxes and proper motions. In 2022 we'll get the complete DR3, which contains lots of qualitatively new data and measurements. But in the meantime, eDR3 is great for projects that we did—or tried to do—in DR2 but were limited by proper-motion precisions (especially since the proper motions improve by a lot). One such project is work Hunt did with Bovy on velocity substructure in the disk. So I pitched to him the MySpace project that Price-Whelan and I formulated a while back. I have an intuition that it should work well with the new data.
I got some weekend research time in, which was fun: When I work on the weekends, I try to make it things that I want to do, rather than things I have to do. I wrote code to predict a point in a NASA Kepler light curve from the preceding points and then from the following points. And it appears that the two predictions disagree, as expected, at gaps in the light curve (where there is a stretch of missing data) and at discontinuities in the light curve (like those created by small stellar flares). So it is interesting! And just linear regression of course (that's my brand). Now: Can we do science with it?
This was originally inspired by a comment by Bernhard Schölkopf (MPI-IS), years ago, about whether we can predict the past from the future better than we can predict the future from the past, which might have relationships to causal inference. I'm enjoying thinking about the philosophical aspects.
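A minimal sketch of the two-sided prediction experiment, on fake data (the real thing ran on Kepler light curves, of course):

```python
import numpy as np

def two_sided_predictions(y, k):
    # fit one linear predictor from the k preceding points, and one
    # from the k following points, over the whole series
    n = len(y)
    # predict y[i] from the k preceding points (i = k .. n-1)
    X_past = np.stack([y[i - k:i] for i in range(k, n)])
    w_past, *_ = np.linalg.lstsq(X_past, y[k:], rcond=None)
    # predict y[i] from the k following points (i = 0 .. n-k-1)
    X_futr = np.stack([y[i + 1:i + k + 1] for i in range(n - k)])
    w_futr, *_ = np.linalg.lstsq(X_futr, y[:n - k], rcond=None)
    return X_past @ w_past, X_futr @ w_futr
```

On a smooth stretch of data, the two predictions agree; it's at gaps and discontinuities that they pull apart, which is the signal we're after.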
Today was a busy day. The first half of the day was the Terra Hunting Science Meeting, day 3. We substantially narrowed down our target-selection criteria, simplifying in the process. Tim Naylor (Exeter) ran a great meeting.
The second half of the day was me giving a physics colloquium at UC Merced. I spoke about Chemical Torus Imaging, which is our new name for the project code-named Chemical Tangents, with Price-Whelan. I started at the beginning, with Kepler and Newton, to motivate the idea that we can do better inferences if we can see the orbits.
Today Gaby Contardo (Flatiron) and I discussed the following problem, which (I find) is hard to even formulate: The red-giant branch (above the red clump) is supposed to be a place where most stars are “going up” (getting brighter and bigger) and a minority of stars are “going down”, heading back towards the red clump. And yet, all the stars on the red-giant branch look very similar spectrally (though not asteroseismically!). Can we see in the spectra (I'm thinking APOGEE here), in a purely data-driven way (that is, without a supervisory training set), that the red-giant branch is a folded locus that merely projects to a simple line, rather than being a simple line? This paper by Ting, Hawkins, & Rix gives me hope that we can.
I'm interested in all this because it gets at what can be just observed in the data, as compared to what we project onto the data from theory. This project is a toy, in some sense, because we do know about the asteroseismic differences, and we can use them, without being theoretically supervised.
I spent my research time today working for Adrian Price-Whelan (Flatiron). I took our huge list of bullet points for our paper's discussion section (paper is on inferring the Milky Way mass model using stellar element abundances and kinematics), rearranged them around four themes, and deleted duplicate points. I then turned a few of the bullet points into a few paragraphs. Writing is a fun part of my job. But I don't find that I can write many paragraphs in a day. Or at least not many good ones.
Today Lily Zhao (Yale) crashed the Terra Hunting science meeting to talk about the EXPRES experience with its laser frequency comb calibration source. She showed the results of our hierarchical, non-parametric wavelength solution and how it improves radial-velocity measurements end-to-end. We then spent lots of time talking about the operational aspects of having an LFC in the system. It looks likely (to me) that Terra Hunting will budget, acquire, and install an LFC. I think it's a good idea, after our EXPRES experiences.
Today was the first day of the Terra Hunting science meeting. We made great progress on target selection, which was our primary goal. We decided that we should only consider stars that have sufficient visibility from the observatory and sufficient brightness to deliver enough photons for sufficiently precise radial-velocity measurements over a decade to meet our planet-detection goals. That is, we should make the parent sample of our final selection to be stars where we at least have enough photons to detect an Earth. That is an obvious and simple point, but this was the first meeting where we really clearly identified it, and started to figure out how that flows down to a target list. It turns out that there aren't huge numbers of stars that even meet this strict requirement. And of course we spent lots of time talking about all the reasons that we won't get photon-limited radial-velocities, so our final target list must be much more restricted, probably.
After conversations with Christina Eilers (MIT) this week, I spent a bit of weekend research time re-scoping our paper. Now it is a paper about the element-abundance gradients in the Milky Way disk, made with an abundance-calibration model. It is no longer a paper about abundance calibration, with gradients shown as a demonstration. What does it mean to change scope? It means deleting the old file and writing a brand-new title and abstract. As my loyal reader knows: Title and abstract come first!
In the weekly call between Lily Zhao (Yale), Megan Bedell (Flatiron), and myself, Zhao showed some really nice results: She has run our RV code wobble on all the data on a particularly active, spotty star. By design, wobble makes an empirical average stellar spectrum, which permits us to look at the residuals, colored by various things. Zhao has made these plots and there are some hints of spectral signatures of the spots! Too early to tell, but Zhao may be about to discover spectrum-space spot removers. This might be the realization of a dream I've had ever since we started looking at EPRV data.
Today between zoom (tm) calls, Adrian Price-Whelan (Flatiron) and I discussed the following problem with the k nearest neighbors algorithm: If your point is near an edge or gradient in your sample, and the quantity you are predicting (your label) also has a gradient locally, then the “center of mass” of your neighbors will be offset from your target point, so the mean label of your neighbors will be inappropriate for your target point.
No problem: Instead of taking a mean of your nearest neighbors, fit a linear plane to your neighbors (label as a function of position) and interpolate or evaluate that fit at the point. Awesome! This is (a simplification of) what Adam Wheeler (Columbia) and I did in our X-enhanced stars paper. The only problem is: We are doing nearest neighbors inside an optimization loop, and we need it to be very fast.
Price-Whelan's and my solution: Compute the k-neighbor center of mass position, project all target–neighbor offset vectors onto the target–center-of-mass vector, and then do a linear regression in one dimension, interpolated to the target point in one dimension. This works! It's fast, and (when you flow through all the linear algebra) it looks like a really strange kind of weighted mean. Price-Whelan implemented this and it improved our results immediately. Yay!
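Here's my own reconstruction of the projected one-dimensional trick (a sketch, not Price-Whelan's actual implementation):

```python
import numpy as np

def knn_projected_predict(x0, X_nbr, z_nbr):
    # x0: target point (d,); X_nbr: k nearest neighbors (k, d);
    # z_nbr: their labels (k,)
    u = X_nbr.mean(axis=0) - x0          # target -> center-of-mass vector
    s = (X_nbr - x0) @ u                 # project offsets onto that vector
    A = np.stack([np.ones_like(s), s], axis=1)
    coef, *_ = np.linalg.lstsq(A, z_nbr, rcond=None)
    return coef[0]                        # the 1-D linear fit at s = 0
```

The target sits at s = 0 by construction, so the intercept of the one-dimensional fit is the prediction, and the whole thing is one tiny least-squares solve per query point.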
Today Teresa Huang (JHU), Soledad Villar (JHU), and I met for a sprint on a possible paper for the AISTATS conference submission deadline. We are thinking about pulling together results we have on double descent (the phenomenon that good, predictive models can have many more data than parameters, or many more parameters than data, but in-between, unless you are careful, they can suck) and on adversarial attacks against regressions.
I have a fantasy that we can unify a bunch of things in the literature, but I'm not sure. I ended up drawing a figure on my board that might be relevant: The out-of-sample prediction error, the (equivalent of the) condition number of the data SVD, and the susceptibility to “data-poisoning attacks” all depend on the number of data points n and the number of parameters p in related ways. Our main conclusion is that regularization (or dimensionality reduction, which amounts to the same thing) is critical.
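A toy version of the double-descent part of that board figure (all sizes and seeds invented): the out-of-sample error of minimum-norm least squares, fit with the first p of 60 features, peaks near p = n and falls again on both sides.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, trials = 20, 60, 100
w = np.ones(d) / np.sqrt(d)          # weak signal spread over all features

def oos_error(p):
    # average out-of-sample error of the minimum-norm least-squares fit
    # that uses only the first p of the d features
    errs = []
    for _ in range(trials):
        Xtr = rng.normal(size=(n, d))
        ytr = Xtr @ w + rng.normal(size=n)
        Xte = rng.normal(size=(500, d))
        yte = Xte @ w + rng.normal(size=500)
        w_hat = np.linalg.pinv(Xtr[:, :p]) @ ytr   # minimum-norm solution
        errs.append(np.mean((yte - Xte[:, :p] @ w_hat) ** 2))
    return np.mean(errs)

err_under = oos_error(5)    # p << n: classical regime
err_thresh = oos_error(n)   # p = n: the interpolation threshold
err_over = oos_error(d)     # p >> n: over-parameterized regime
```

The spike at p = n is the condition-number story in the same figure: the design matrix is square and nearly singular there, which is exactly where regularization rescues you.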
Today I spent more time writing linear algebra and likelihood-function math and discussion into the paper with Bonaca (Harvard). I like to write first, and code later, so that the paper and the code use the same terminology and embody the same concepts.
Later in the day I had the first of regular meetings with new arrival Katie Breivik (Flatiron). We discussed the overlap between her interests in binary-star evolution and our binary-star-relevant data sets.
I spent my research time this morning typing math into a document co-authored by Ana Bonaca (Harvard) about a likelihood function for asteroseismology. It is tedious and repetitive to type the math out in LaTeX, and I feel like I have done the nearly-same typing over and over again. And, adding to the feeling of futility: The linear-algebra objects we define in the paper are more conceptual than operational: In many cases we don't actually construct these objects; we avoid constructing things that will make the execution either slow or imprecise. So in addition to typing these, I find myself also typing lots of non-trivial implementation advice.
I spent part of the weekend looking at issues with chemical abundances as a function of position and velocity in the Solar neighborhood. In the data, it looks like somehow the main-sequence abundances in the APOGEE and GALAH data sets are different for stars in different positions along the same orbit. That's bad for my Chemical Torus Imaging (tm) project with Adrian Price-Whelan (Flatiron)! But slowly we realized that the issue is that the abundances depend on stellar effective temperatures, and different temperatures are differently represented in different parts of the orbit. Phew. But there is a problem with the data. (Okay actually this can be a real, physical effect or a problem with the data; either way, we have to deal.) Time to call Christina Eilers (MIT), who is thinking about exactly this kind of abundance-calibration problem.
Today Bonaca (Harvard) and I settled on a full scope for a first paper on asteroseismology of giant stars in the NASA TESS survey. We are going to find that our marginalized-likelihood formalism confirms beautifully many of the classical asteroseismology results; we are going to find that some get adjusted by us; we are going to find that some are totally invisible to us. And we will have reasons or discussion for all three kinds of cases. And then we will run on “everything” (subject to some cuts). That's a good paper! If we can do it. This weekend I need to do some writing to get our likelihood properly recorded in math.
Today, Tyler Pritchard (NYU) and I assembled a group of time-domain-interested astrophysicists from around NYC (and a few who are part of the NYC community but more far-flung). In a two-hour meeting, all we did was introduce ourselves and our interests in time domain, multi-messenger, and Vera Rubin Observatory LSST, and then discuss what we might do, collectively, as a group. Time-domain expertise spanned an amazing range of scales, from asteroid search to exoplanet characterization to stellar rotation to classical novae to white-dwarf mergers with neutron stars, supernovae, light echoes, AGN variability, tidal-disruption events, and black-hole mergers. As we had predicted in advance, the group recognized a clear opportunity to create some kind of externally funded “Gotham” (the terminology we often use for NYC-area efforts these days) center for time-domain astrophysics.
Also, as we predicted, there was more confusion about whether we should be thinking about a real-time event broker for LSST. But we identified some themes in the group that might make for a good project: We have very good theorists working, who could help on physics-driven multi-messenger triggers. We have very good machine-learners working, who could help on data-driven triggers. And we have lots of non-supernovae (and weird-supernova) science cases among us. Could we make something that serves our collective science interests but is also extremely useful to global astrophysics? I think we could.
In Fourier transforms, or periodograms, or signal processing in general, when you look at a time stream that is generated by a single frequency f (plus noise, say) and has a total time length T, you expect the power-spectrum peak (or the likelihood function for f) to have a width that is no narrower than 1/T in the frequency direction. This is for extremely deep reasons, that relate to—among other things—the uncertainty principle. You can't localize a signal in both frequency and time simultaneously.
Ana Bonaca (Harvard) and I are fitting combs of frequencies to light curves. That is, we are fitting a model with K frequencies, equally spaced. We are finding that the likelihood function has a peak that is a factor of K narrower than 1/T, in both the central-frequency direction, and the frequency-spacing direction. Is this interesting or surprising? I have ways to justify this point, heuristically. But is there a fundamental result here, and where is it in the literature?
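Here's the heuristic I lean on for the 1/T statement, a Fisher-information back-of-the-envelope (my sketch, hedged): take a single-sinusoid model m_i = A cos(2π f t_i + φ), observed with noise variance σ² at n times t_i spanning a baseline T. Then

```latex
\[
\frac{\partial m_i}{\partial f} = -2\pi A\, t_i \sin(2\pi f t_i + \phi),
\qquad
I(f) = \frac{1}{\sigma^2}\sum_i \left(\frac{\partial m_i}{\partial f}\right)^2
\approx \frac{(2\pi A)^2}{\sigma^2}\,\frac{n\,T^2}{6},
\]
\[
\sigma_f \approx I(f)^{-1/2} \propto \frac{1}{T}\,\frac{\sigma}{A\sqrt{n}},
\]
```

using the averages t² ~ T²/3 and sin² ~ 1/2 for uniform sampling. The 1/T is just the time lever arm. Note that naively adding the information of K independent comb teeth would only narrow this by a factor of sqrt(K), so the factor-K narrowing we are seeing is exactly the thing that needs explaining.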
Adrian Price-Whelan (Flatiron) and I have been trying to use the chemical abundances of stars to constrain the mass model (or gravitational potential, or orbit structure) of the Milky Way. One thing we have noticed is that the abundances are very sensitive to the coordinate system: If you have the velocity or position of the disk wrong, it is clearly visible in the abundances! That's fun, and motivating. But then we have noticed—and we noticed this two summers ago now—that different elements want to put the disk midplane in different places!
What gives? We have various theories. We started with systematics in the data, but the effects are seen in both APOGEE and GALAH. So it seems like it is real. Is it because relaxation times are very long at small vertical heights? (The disk is like a harmonic oscillator at small vertical amplitudes.) Is it because the thinner disk and thicker disk have inconsistent midplanes? Whatever it is, it seems like it is super interesting. We can't solve this problem in our current paper, but we want to comment on it.
This week is a sprint week for Ana Bonaca (Harvard) and me to work on our asteroseismology likelihood and search, for (possibly even ground-based) irregularly spaced, heterogeneous data. We figured out what we need for inputs to this likelihood function: We need empirical relationships between the nu-max parameter and other aspects of the power spectrum, like the mean heights of the mode peaks, and the width of the mode forest or comb. It seemed daunting at first, but awesomely a lot of what we need is in the Basu & Chaplin book. Most of our current technical struggles relate to the problem that the likelihood function is amazingly, amazingly featured. It's almost adversarial!
Late in the day, Tyler Pritchard (NYU) and I met socially distanced in a park to make final plans for a working meeting on time-domain astrophysics for NYC. The plan is to start to build a community around Vera Rubin LSST (which I just learned got backronymed) that is centered here in New York, and possibly build and operate a real-time event broker. But in this first meeting we want people to really discuss ideas and get something started: How to design online meetings to involve discussions and idea generation? We are learning a lot from our friends at TESS dot science online and AstroHackWeek 2020, both of which worked out new ways to have scientists who aren't physically in the same space—and maybe don't know one another all that well yet—do novel work together.
The bugbear of extreme precision radial velocity measurements is often called “stellar activity”. I don't love that terminology, because stellar activity has a magnetic-field-reconnection feel to it, when the thing being referenced really covers all sorts of stellar variability that maps onto radial-velocity measurements. Today, Bedell (Flatiron), Zhao (Yale), and I discussed how we are going to approach a stellar activity challenge coming from the EXPRES group at Yale: They have released spectra for many epochs of observing from one star, and the team that delivers the best RV precision wins. Wins? Well, has some bragging rights.
Our plan is to look for spectral changes that predict velocity changes. That is, can we see different spectra on different parts of the stellar surface and relate those to measured and true velocities? We discussed the stellar rotation period, photometric variations, and classical activity indicators, all of which might help us in our goals.
The first order of business? Get wobble to run on the spectra, and make it deliver residuals away from the best constant-star fit.
My research day started with a conversation with Teresa Huang (JHU) and Soledad Villar (JHU) about the regressions that are used to determine stellar parameters. Huang has shown that different machine-learning methods (which are generally over-parameterized) obtain very different gradients in the explicit or implicit function that connects labels (stellar parameters) to features (stellar spectra), and very different from those indicated by the physical models we have of stellar atmospheres. These differences can be exploited for attack.
Later, Megan Bedell (Flatiron) and I spent time designing projects that are aimed at maximizing the efficiency of radial-velocity exoplanet searches. The idea is: You have a finite amount of telescope time allocated over a fixed (long) interval. How do you assign observation slots to stars to maximize expected yield (or any other statistic you care about)? The answer is going to depend strongly on what we assume about the noise properties of the measurements we make.
I spent the last few days hiding in various undisclosed locations. In my small amount of research time, I wrote words in the method and discussion sections of my nascent paper with Adrian Price-Whelan (Flatiron) about imaging orbital 3-tori in phase space using element abundances.
In my student-research meeting, Kate Storey-Fisher (NYU), Abby Williams (NYU), and I discussed how we could make a flexible, parameterized model of a correlation function (for large-scale structure) with a continuous spatial gradient in it. The idea is that we are going to use Storey-Fisher's continuous-function estimator for the correlation function to look for departures from statistical homogeneity. Our inspiration is the power asymmetry in the cosmic microwave background.
Our approach to building the flexible model is to take a model for the homogeneous correlation function and analytically Taylor-expand it around a fiducial point. In this formalism, the local spatial gradient at the fiducial point turns into a linear dependence on position, the local second derivative leads to a quadratic term, and so on. Of course an alternative approach would be to just measure the correlation function in spatial patches, and compare the differences to the measurement variances. But binning is sinning, and I also believe that the continuous estimation will be higher precision.
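To be concrete about the formalism, here's a toy with an exponential stand-in for the fiducial correlation function and only the first-order (gradient) term; the real model will use a proper xi(r) and a fuller expansion:

```python
import numpy as np

def xi_with_gradient(r, x, theta, x_fid=0.0):
    # theta = (amplitude, scale, gradient); x is position along the
    # gradient direction; exponential is a stand-in fiducial xi_0(r)
    amp, scale, grad = theta
    xi0 = amp * np.exp(-r / scale)
    # first-order Taylor term: linear dependence on position
    return xi0 * (1.0 + grad * (x - x_fid))
```

At x = x_fid this reduces to the homogeneous model; a nonzero fitted grad is a continuous measurement of the departure from statistical homogeneity, with no patches and no bins.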
Late in the day, Tyler Pritchard and I met to discuss a workshop we are putting together to see if there is critical mass in the NYC area to start a coherent, multi-institutional effort on time-domain astrophysics and LSST in particular. We want the workshop to be brief, but we want everyone to be able to say something. But we don't want the workshop to be all talking: We want to have the participants do design work and make decisions, jointly.
Today was a very low-research day. But I did get in a good phone call with Lily Zhao (Yale) and Megan Bedell (Flatiron) about what's next, now that Zhao has submitted her paper on Excalibur, which is our hierarchical, non-parametric model for spectrograph calibration. In the submitted version, she finds that running Excalibur improves calibration of the EXPRES spectrograph over the standard pipeline by an amount that corresponds to removing an additive source of noise with a rms of 62 cm/s! So we really are improving the output of the hardware with sensible, well thought-out software.
Now that this calibration paper is done, we have decided to swing our focus over onto an activity challenge, in which teams compete to model and remove stellar-activity noise in radial-velocity signals. The challenge will come with raw spectra, so we can work at the pixel level, which, as my loyal reader knows, is where I think all the innovation is possible.
In group meeting today, I had the group comment on my nascent mission statement for the Astronomical Data Group.
Today Soledad Villar (JHU) and I went back to the idea of adversarial attacks against linear regressions. But this time we looked at the training-set side, meaning: Can you attack the method with adversarial training data? One idea: What is the worst single data point you can add to the training set to distort the results? It looks to me like ordinary least squares is incredibly (arbitrarily) sensitive to this kind of attack if the dimension of the data (the number of parameters) exceeds the size of the training set. This kind of sensitivity to attack somehow militates against the idea that more heavily parameterized models are better (which is the thinking in machine learning these days).
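Here's a small numerical demo of the sensitivity (sizes and the poison label are invented): with more parameters than training points, the minimum-norm least-squares solution interpolates its training data exactly, so one added point with a wild label drags the fit arbitrarily far.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 10, 50                       # more parameters than training points
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p)          # clean, noiseless training labels
x_test = rng.normal(size=p)

def min_norm_fit(X, y):
    # minimum-norm interpolating least-squares solution
    return np.linalg.pinv(X) @ y

w_clean = min_norm_fit(X, y)
# one poisoned training point: a random direction with an absurd label
x_poison = rng.normal(size=p)
w_pois = min_norm_fit(np.vstack([X, x_poison]), np.append(y, 1.0e4))
```

Because the poison label can be made as large as the attacker likes, the shift in the fitted weights is unbounded: this is the "arbitrarily sensitive" part.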
Most of my research time today was spent working on student papers. I print them out, mark them up, and then send photographs of the marked-up pages to the first authors. That's ridiculous, but I don't know how else to do it that meets all my requirements (which are legion). In-between pages of papers, I discussed the red-giant branch with Gaby Contardo (Flatiron): We are trying to spectrally separate the upward-going stars from the downward-going stars, above the red clump. We formulated an unsupervised method that involves regression.
My committee on Equity and Inclusion at NYU is in the final throes of getting a report out for our summer of work towards recommendations for the Department. So nothing that meets The Rules (tm) at the right. That said, what we are doing is definitely and absolutely physics.
Today was the final wrap-up presentations from the new partnership between the Simons Foundation (and especially Simons Observatory) and the National Society of Black Physicists that made the S-NSBP Scholars program. Scholars were paid to spend the summer interning in various physics projects. The wrap-up lightning talks were incredibly broad and impressive. The organizer of the program, Kasey Wagoner (Princeton), asked the Scholars to say something in their lightning talks about what they learned or what the program meant to them, and some of the responses were pretty remarkable. My student was Winston Harris (MTSU), who (my loyal reader knows) did work on making exoplanet detection more efficient. The Astronomical Data Group here at Flatiron hosted four Scholars, and the whole program included dozens.
I got up at 04:55 NYC time so that I could be at MPIA Galaxy Coffee (at 11:00 European time) to talk about the project I have with Adrian Price-Whelan (Flatiron) code-named Chemical Tangents. The MPIA crew has done a great job of moderating Galaxy Coffee and making it interactive; I was interrupted many times during my talk with questions and comments (which I love). I sure miss being at MPIA! This is my first summer in 15 not living in Heidelberg. Among the many great questions I got during my talk was one from Joe Hennawi (UCSB): He asked whether the Chemical Tangents idea—that stellar element abundance ratios can depend on actions but not angles—could be turned around and used to calibrate or adjust noisy or systematically biased abundance-ratio measurements. That's a great question, because that is precisely the project I am doing with Christina Eilers (MIT) right now. It's cool that the Tangents and calibration projects are two sides of the same coin (bad metaphor?).
This summer I have been working with Winston Harris (MTSU) on the efficiency with which we can detect planets using an RV survey or with RV data. He has some nice results, some of which he presented at Stars & Exoplanets meeting today. After this, Megan Bedell (Flatiron) and I discussed the question: What are the simplest questions we can ask about detecting planets by RV observations? We have questions about the distribution / cadence of the observations, given a finite, pre-defined survey window. But we also have questions about the cadence in relation to the coherence times of the various noise processes: It should be different to take many observations within one coherence time vs taking observations separated by many coherence times. My intuition is not as rich as I'd like it to be here.
Adrian Price-Whelan (Flatiron) and I have been working on our project code-named Chemical Tangents this week. We have been working on a better name! But we have also been producing results and trying to understand the best scope for our first paper on the subject. It's hard, because there are so many things the method can do (in principle). We discussed how we would present the material in a short talk (because I am giving one in Germany—okay fine, remote to Germany) on Thursday. Then we realized that the scope of the first paper should be the same as the scope of a short talk! And indeed, it can be very useful to think about giving a talk when one thinks about how to assemble content into a paper: A paper is like a more detailed version of a technical seminar, in some sense. But anyway, that helped us reduce (even further) our scope. And it is making me think that I have a new theory of how to design a scientific paper...
Gaby Contardo (Flatiron) and I got in some research time at the end of the day today, visualizing data from SDSS-IV APOGEE data on stars on the upper red-giant branch. The question we are trying to ask is: Are there spectral differences between the stars that are going up the red-giant branch and stars that are going down? The problem we face is that we don't have a good training set. Is there some way we can find this, but unsupervised? I have an intuition that we can. Right now we are visualizing the data in photometric space, spectroscopic-parameter space, and PCA-amplitude space (where the PCA was performed on the spectral data).
I took a week of vacation this past week. My only research was a bit of thinking and writing in my project code-named Chemical Tangents.
As my loyal reader knows, I have four separate projects with Bonaca, Feeney, Casey, and Bedell to forward-model asteroseismic modes, with Bonaca concentrating on ground-based data, Feeney on principled Bayes, Casey on ESA Gaia, and Bedell on removing the modes as nuisances when we want to find planets. Today Bedell and I discussed where we are at, and came up with some things to try. In Sun-like stars, the modes have days-ish coherence times (apparently) and minutes-ish periods. So there are different regimes as your exposure cadence ranges from minutes to days to weeks. We have some qualitative predictions, and we are trying to make quantitative results that will influence survey design in the near future (especially for Terra Hunting Experiment and NASA NEID).
One funny thing about quasi-periodic (as in: finite coherence-time) oscillation processes is that they can be generated as a subset of Gaussian Processes, if you have good kernel machinery. We do! Another funny thing is that a GP can fit anything it is asked to fit! It literally has infinity free parameters (yes, literally). But the more appropriate kernels will do better (we hope) at predicting held-out data.
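Concretely, here is a minimal numpy sketch (not our actual kernel machinery; the kernel form and all parameter values are illustrative) of one common quasi-periodic choice: an exp-sine-squared periodic term multiplied by an exponential decay whose timescale sets the finite coherence time.

```python
import numpy as np

def quasi_periodic_kernel(t1, t2, amp=1.0, period=1.0, tau=5.0, gamma=2.0):
    """Quasi-periodic covariance: a strictly periodic exp-sine-squared
    term times an exponential decay; the decay time tau sets the
    (finite) coherence time of the oscillation."""
    dt = t1[:, None] - t2[None, :]
    periodic = np.exp(-gamma * np.sin(np.pi * dt / period) ** 2)
    decay = np.exp(-np.abs(dt) / tau)
    return amp ** 2 * periodic * decay

# covariance matrix on a grid of times, plus one sample path
t = np.linspace(0.0, 20.0, 200)
K = quasi_periodic_kernel(t, t)
rng = np.random.default_rng(17)
y = rng.multivariate_normal(np.zeros_like(t), K + 1e-8 * np.eye(len(t)))
```

The sample path oscillates with the given period for a few coherence times and then drifts in phase and amplitude, which is qualitatively the behavior of the stellar modes.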
It was great to talk to Winston Harris (MTSU) about his results today running The Joker on exoplanet extreme-precision radial-velocity data. Instead of using The Joker to rejection-sample a huge prior sampling of models, he used it just to importance-sample, that is, to weight each prior sample by its likelihood. In the highly-informative-data regime, rejection sampling only leaves samples in the dominant posterior mode. But the importance-sampled weights can be visualized to show all the other modes. His visualization confirms something we (Price-Whelan, Foreman-Mackey, and I) have been saying for years: When you have very informative time-domain data, you don't get a uni-modal likelihood (or posterior): You just get a likelihood (or posterior) with one very dominant mode. Locally, and at very low likelihood, all those other modes still exist. Period-finding is always an extremely non-convex problem.
Kate Storey-Fisher (NYU) and I worked through some figures for her paper on her new correlation-function estimator. Her estimator isn't tied to bins in separation: It infers the parameters of a smooth correlation-function model. Thus it is more expressive than the standard estimator with fewer parameters (bins). And her figures show that, using it, she can measure the baryon acoustic feature peak more precisely! Which corresponds to a reduction in the cost of a project at fixed precision, or an increase in precision at fixed cost. Awesome! Now how to present this all so it seems natural and sensible to the cosmology community: It is hard to teach an old dog new tricks.
A discriminative model is a method for finding a function of your features (like your spectrum) that delivers a prediction or expectation for your labels (like the temperature and metallicity of the star). A generative model is a method for finding a function of your labels that delivers a prediction or expectation for your features. In the discriminative case, you can take a derivative of labels wrt data. In the generative case you can take a derivative of the data wrt the labels. How are these related?
In the one-dimensional calculus context, these things are just inverses of each other. But in the multivariate labels and multivariate features case, it isn't so simple: Some of the features don't depend at all on the labels, so the generative derivative is zero; that doesn't make the discriminative derivative infinity!
The answer is the pseudo-inverse: You can convert a generative model into a discriminative model (locally, or to linear order) using the Taylor series and then linear least squares inference. That delivers a discriminative model (locally) and thus the other derivative you might want. The pseudo-inverse is the thing that appears in linear least squares. In numpy, the pseudo-inverse of X (when X has full column rank) is solve(X.T @ X, X.T), or the operator embodied in lstsq().
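Here is a toy numpy sketch of that conversion, with a made-up generative Jacobian (the shapes and numbers are illustrative, not from any real problem). Note that the features that don't depend on the labels get zero weight in the discriminative direction, not infinite weight.

```python
import numpy as np

rng = np.random.default_rng(42)

# hypothetical generative Jacobian d(features)/d(labels): 10 features,
# 3 labels, and the last 4 features do not depend on the labels at all
J = np.zeros((10, 3))
J[:6] = rng.normal(size=(6, 3))

# the discriminative derivative d(labels)/d(features), to linear order,
# is the pseudo-inverse, i.e. the linear least-squares operator
J_pinv = np.linalg.pinv(J)

# when J has full column rank this equals solve(J.T @ J, J.T)
J_pinv2 = np.linalg.solve(J.T @ J, J.T)
```

The columns of J_pinv corresponding to the label-independent features vanish: the zero generative derivative maps to a zero (not infinite) discriminative derivative.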
As my loyal reader knows, Christina Eilers (MIT) and I have been looking at surface-gravity systematics in surface-abundance measurements in red-giant stars. It appears that stars with different gravities say different things about the abundance trends with (say) Galactocentric radius and perpendicular distance from the midplane. And none of these things agrees with the trends published in the literature by various groups. We now think that the published trends are caused by selection differences as a function of Galactic azimuth (probably primarily because of crowding), plus these surface-gravity effects. Today we discussed with Hans-Walter Rix (MPIA) the scope of such projects, and we were able to establish a limited scope that we think will work. But one step on the path is to argue all this out with the APOGEE team, because we need to understand whether our interpretations of all these things are correct.
I hid in an undisclosed location today and worked on various pieces of writing, but especially the nearly-complete paper by Adam Wheeler (Columbia) on finding abundance outliers with an unsupervised method.
In a long conversation, Soledad Villar (NYU) and I worked out the expectation values for various kinds of what statisticians call “risk” for ordinary least-squares regression. The risk is a kind of expected mean-square error, and unfortunately much of the literature is ambiguous about what the expectation is taken over. That is, an expectation is an integral. Integral over what? So we worked some of that out, and then re-derived a known result, which is that ordinary least squares is unbiased (under assumptions, and under definitions of bias in terms of expectations) when the number of data points is larger than the number of free parameters, and it is biased when the number of data points is smaller than the number of free parameters.
If you are shocked that we are considering such cases (fewer data points than parameters! Blasphemy!) then you haven't been paying attention: In linear regression, the mean-squared error (for out-of-sample test data) for OLS generically gets smaller when the number of parameters far exceeds the number of data points. Everything we were taught in school is wrong! Of course in order to find this result, you have to define least squares so that it has a well-defined solution; that solution is the min-norm solution: The solution that minimizes the total sum of squares (or something related to this) of the parameters. That breaks the degeneracies you might be fearing.
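Here is a minimal numpy sketch of the min-norm solution in the overparametrized regime (entirely toy data; the point is only the linear algebra). Of all the interpolating solutions, lstsq returns the one that minimizes the norm of the parameter vector, which is what breaks the degeneracies.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 100                      # fewer data points than parameters!
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# of all the (infinitely many) interpolating solutions X @ beta = y,
# lstsq returns the one with the smallest norm ||beta||
beta = np.linalg.lstsq(X, y, rcond=None)[0]

# any other interpolating solution is min-norm plus a null-space vector,
# and has strictly larger norm
null_proj = np.eye(p) - np.linalg.pinv(X) @ X
other = beta + null_proj @ rng.normal(size=p)
```

Both beta and other fit the training data exactly; the min-norm choice is the one with the smaller parameter norm, because the null-space component is orthogonal to it.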
When you measure the asteroseismic spectrum of a star, there are many observables: Mode frequencies, mode amplitudes, a frequency difference between modes, an envelope in frequency space where the modes have large amplitudes, and some more detailed things like even-odd-ell mode splittings and rotational splittings. Some of these observables are used in predicting or inferring stellar properties (like mass, age, and composition) and some are not. Why not? My feeling (both from looking at the data and from thinking about the theory) is that the amplitudes of the modes (or mean amplitude over a set of modes) and the width of the envelope at nu-max are both going to be informative about stellar structure. Ana Bonaca (Harvard) and I discussed these things today, and how we might both measure and exploit this information with new data.
An absolutely great PhD defense today by Emily Sandford (Columbia), who has worked on planet transits and what can be learned therefrom. And so wide-ranging: She worked on what you learn about a star from a single transit, what you learn about an orbit from a single transit, what you learn about the shape of the transiter, and even what you learn about the population of planetary systems from the statistics of the transits. The discussion was excellent and enlightening. One thing I loved was a little discussion about what it would mean to think of zero-planet systems as planetary systems. And lots about the representation of multi-planet systems (where Sandford has a grammar or natural-language-like approach).
I loved the defense so much, in part because Sandford crushed it and in part because I generally love PhD defenses: They remind me of all the reasons that I love my job. I was reflecting afterwards that the PhD is a kind of model of education: It is student-centered, it is customized and personalized for every student, it is self-directed, it is constructive, and it is success-oriented. And it produces some amazing scientists, including the brand-new Dr Sandford.
I have had a good day finishing up my reading of the PhD dissertation of Emily Sandford (Columbia), who has a great collection of results on transit measurements of planets and stars. She makes use of Bayesian methods and also classic optimization or signal-processing methods to make challenging inferences, and she shows the limits and capabilities of current and future data. One thing I liked was that she works on what are called “single transits” where only one transit is found in the whole survey duration; what can you infer from that? A lot! (I have also worked on this problem long ago.) In the dissertation she busts a myth that the multiplicity distribution requires that there be a mixture of two qualitatively different kinds of planetary systems. I enjoyed that, and it leads to a lot of other science. Plus some creative work on understanding the detailed properties of multi-planet systems, treating them like sequence data. It's a great thesis and I am very much looking forward to tomorrow's defense.
In a long conversation, Lily Zhao (Yale), Megan Bedell (Flatiron), and I looked at a possible re-scope of our new paper on spectrograph calibration. Zhao is finishing one paper that builds a hierarchical, non-parametric wavelength solution. It does better than fitting polynomials independently for different exposure groups, even in the full end-to-end test of measuring and fitting stellar radial velocities. But the paper we were discussing today is about the fact that the same distortions to the spectrograph that affect the wavelength solution also (to the same order) affect the positions of the spectroscopic traces on the device. That is, the relationship between (say) x position on the detector and wavelength can be inferred (given good training data) from the y position on the detector of the spectral traces. We have been struggling with how to implement and use this, but then we realized that we could instead write a paper showing that it is true, and defer implementations to the pipeline teams. Implementation isn't trivial! But the result is nice. And kind-of obvious in retrospect.
I have my student-project office hours on Tuesdays and Wednesdays. It was a pretty fun session this week, in part because two of my students (Abby Shaum and Avery Simon) got jobs (congratulations!) and in part because we had a wide-ranging conversation that went way beyond everyone's particular projects. In one part of it, we talked about looking for pairs of planets (around a common host star) that have a period ratio that is a known irrational number (like e or pi or tau). Anu Raghunathan and I are using this kind of model as a null test when we search for planets in resonances (that is, we should find small-integer rational period ratios, not irrational ratios (or rational numbers with very large denominators; don't at-me)). But then of course we realized that one way that alien civilizations might signal us is by creating “transit beacons” that are in a ratio like e or pi. Hahaha new project!
Not much astrophysics research today. It turns out that racism is a much harder problem than, say, uncertainty estimation on a measurement! But I did do a tiny bit of research:
Adrian Price-Whelan (Flatiron) came by my office (yes, actually in person) and we worked a tiny bit on our reboot of the Chemical Tangents idea: The (birth) abundances, the birthday (like age), and (approximately) the actions of a star are invariants; the orbital angles and evolutionary phase are not. In an integrable, steady-state problem, the angle distributions should be both uniform and separable (from everything else). These ideas unify a lot of work over the last few years. We're trying to write a simple paper that's primarily pedagogical, but we don't have a set scope yet.
With Winston Harris (MTSU) I am working on automating some aspects of discovery and decision-making in exoplanet science. Today we discussed the Bayesian and frequentist equivalents or analogs for a 5-sigma detection or measurement. It turns out that there are quite a few different ways you can define this. Some of the different ideas can be made to collapse to the same thing when the likelihood is Gaussian (in parameter space), but of course the likelihood is rarely precisely Gaussian! So we have choices to make. And of course when I say "5-sigma" I mean whatever threshold you want; it doesn't have to be precisely 5-sigma! A key idea that will guide our choices is that Bayesian priors are measures with which we do integrals. That said, we will also make principled frequentist methods too. It's not obvious that you want to be Bayesian when you are claiming a discovery!
Lily Zhao (Yale) and I looked at the (very strange; don't try this at home) problem of locating spectrograph traces in the extreme-precision spectrograph EXPRES (Fischer, PI). As my loyal reader knows, we are looking at this problem because the location of the traces in the cross-dispersion direction can be an indicator of the physical state (and hence wavelength-calibration state) of the instrument, without harming any wavelength standards. We discussed the creation and use of a matched filter for this purpose. This is amusing, because we would be locating the traces in the cross-dispersion direction the same way that the pipelines locate the stellar spectrum in the dispersion direction! If our matched filter works, we believe that we will be able to use the trace positions as an ersatz simultaneous reference for wavelength-calibration tracking.
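A stripped-down one-dimensional sketch of the matched-filter idea (a hypothetical Gaussian trace profile and made-up numbers, not the EXPRES pipeline): slide a template across the noisy cross-dispersion profile, and refine the correlation peak to sub-pixel precision with a parabola.

```python
import numpy as np

def matched_filter_center(x, profile, template_center, width, max_shift=10):
    """Slide a Gaussian template across a cross-dispersion profile; the
    correlation peak, parabolically refined, estimates the trace center."""
    shifts = np.arange(-max_shift, max_shift + 1)
    corr = np.array([
        np.sum(profile * np.exp(-0.5 * ((x - template_center - s) / width) ** 2))
        for s in shifts
    ])
    i = int(np.argmax(corr))
    y0, y1, y2 = corr[i - 1], corr[i], corr[i + 1]
    frac = 0.5 * (y0 - y2) / (y0 - 2.0 * y1 + y2)  # sub-pixel refinement
    return template_center + shifts[i] + frac

# a fake trace: Gaussian profile at a non-integer pixel center, plus noise
rng = np.random.default_rng(3)
x = np.arange(64, dtype=float)
true_center = 30.7
profile = np.exp(-0.5 * ((x - true_center) / 2.0) ** 2) \
    + 0.02 * rng.normal(size=x.size)
center = matched_filter_center(x, profile, template_center=32.0, width=2.0)
```

The recovered center lands within a small fraction of a pixel of the truth, which is the kind of precision you want if the trace positions are going to track the instrument state.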
As my loyal reader knows, I love putting machine learning inside a physical model. That is, not just using machine learning, but re-purposing machine learning to play a role in modeling a nuisance we don't care about inside our physical model. It's similar to how the best methods for causal inference use machine learning to capture the possibly complex and adversarial effects of confounders. Today I had the pleasure of reading closely a new manuscript by Francois Lanusse (Paris) that describes a use of machine learning to model galaxy images, but putting that model inside a causal structure (think: directed acyclic graph) that includes telescope optics and photon noise. The method seems to work really well.
In a great conversation with Soledad Villar (NYU) today, we realized that we have (more than) 10 methods for linear regression! Hahaha. Meaning: more than 10 differently conceived methods for finding a linear relationship between features X and labels Y using some kind of optimization or inference. Some involve regularization, some involve dimensionality reduction, some involve latent variables, and so on. None of them align with my usual practice, because we have put ourselves in a sandbox where we don't know the data-generating process: That is, we only have the rectangular X array and the Y array; we don't have any useful meta-data.
Things we are investigating are: Performance on predicting held-out data, and susceptibility to the double descent phenomenon. It turns out that both of these are amazingly strongly dependent on how we (truly) generate the data. In our sandbox, we are gods.
Ana Bonaca (Harvard) and I discussed two important successes today (both of them due to Bonaca): We found the issue in our code that caused there to be frequency structure on frequency scales smaller than what you might call the “uncertainty-principle” limit of the inverse of the duration of the survey. It was literally a bug, not a think-o. Which is reassuring. And we do have likelihoods that seem to peak (at least locally) at sensible values of delta-nu, the large frequency difference, in the parlance of asteroseismology. So now there are questions of how to improve performance, and how Bayesian to be, as it were. Frequentism is easier in some respects, harder in others. Lots of engineering to think about!
I did not get much done today. I had useful but frustrating conversations with Teresa Huang (NYU) and Soledad Villar (NYU) about the APOGEE data: We are using the synthetic APOGEE spectra (which are part of the data model; released with the observed spectra) to infer (using local linear regression) the temperature derivative of the theoretical spectral expectation. It isn't working! That is, the derivative we get depends very strongly on the number of neighbors we use and how we infer it. Which makes no sense to me. But anyway, there is something I don't understand. Unfortunately, this makes me the blocker on Huang's paper! Argh. I need to compose complicated technical questions for the collaboration.
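For the record, here is what the local-linear-regression derivative estimator should do, in a toy one-dimensional, noiseless setting (the real problem is multivariate and noisy; the function and numbers here are purely illustrative):

```python
import numpy as np

def local_linear_derivative(x, y, x0, k):
    """Estimate dy/dx at x0 as the slope of a straight-line fit to the
    k nearest neighbors of x0."""
    idx = np.argsort(np.abs(x - x0))[:k]
    A = np.vander(x[idx] - x0, 2)      # columns: (x - x0), 1
    slope, _ = np.linalg.lstsq(A, y[idx], rcond=None)[0]
    return slope

# a smooth toy "spectrum" depending on one "label"
x = np.linspace(-1.0, 1.0, 201)
y = np.sin(2.0 * x)
d_est = local_linear_derivative(x, y, x0=0.1, k=15)   # truth: 2 cos(0.2)
```

In this benign setting the answer is stable against reasonable choices of k; the fact that it isn't for the synthetic spectra is exactly the puzzle.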
Christina Eilers (MIT) got some really nice results today, in which she used a causal argument to self-calibrate element abundances in red-giant stars to a fiducial surface-gravity. Abundances—for red-giant stars—can depend on surface gravity for (at least) two reasons: There can be evolutionary effects, in which surface abundances change as elements are dredged up from the stellar interiors, and there can be systematic errors in abundance measurements as a function of surface gravity (because of model wrongness, convective shifts, and so on). Since we have many stars at different positions in the Galaxy and at different surface gravities, we can take a self-calibration approach. Today, Eilers showed amazing results: The inconsistencies in Galactic abundance gradients along the red-giant branch get resolved (with some exceptions) and precision is greatly improved. I mentioned “causal” above: Self-calibration methods involve assumptions that would be described as causal by statisticians.
I spent some time this weekend working on the write-up of my project with Adam Wheeler (Columbia) that uses data-driven methods (and no training labels—it is unsupervised, that is) to find stars with anomalous abundances. It's a beautifully simple method, and has many many applications. We are just using it to find Lithium-enriched stars, but there are many other things to try with it.
Lily Zhao (Yale), Megan Bedell (Flatiron), and I spoke about Zhao's attempt to understand the variations in the data in the EXPRES precision spectrograph. We have the genius (to us, anyway) idea of using the trace positions in the cross-dispersion direction to calibrate the dispersion direction. Sound wrong-headed? It isn't, because the small number of instrument degrees of freedom that affect the wavelength solution will also affect the trace positions. So they will co-vary. We can even show that they do! The big issue we are having—and we are having great trouble diagnosing this—is that the calibration data (laser-frequency comb, in this case) are offset from the science data in strange ways. Which makes no sense, given the incredible stability of the spectrograph. In our call today, we came up with ways to further investigate and visualize what's going on.
It's going to be a summer of student projects for me! I spoke with Winston Harris (Middle Tennessee State) about our summer project to automate the detection of planets in RV data; he is installing software and data now. I spoke with Abby Shaum (NYU) about phase modulation methods for finding planets; we may have found a signal in a star thought to have no signal! I spoke with Abby Williams (NYU) who, with Kate Storey-Fisher (NYU), is looking at measuring not just large-scale structure but in fact spatial gradients in the large-scale structure, with Storey-Fisher's cool new tools. With Jonah Goldfine (NYU) I looked at his solutions to the exercises in Fitting a Model to Data, done in preparation for some attempts on NASA TESS light curves. And I spoke with Anu Raghunathan (NYU) about possibly speeding up her search for planet transits using box least squares. Our project is theoretical: What are the statistical properties of the BLS algorithm? But we still need things to be much much faster.
It was a very low-research day! But I did get in an hour with Bedell (Flatiron), discussing our plan to automate some aspects of exoplanet detection and discovery. The idea is: If we can operationalize and automate discovery, we can use that to perform experimental design on surveys. Surveys that we want to optimize for exoplanet yield. We discussed frequentist vs Bayesian approaches.
I'm writing a paper on uncertainty estimation—how you put an error bar on your measurement—and at the same time, Kate Storey-Fisher (NYU) and I are working on a new method for estimating the correlation function (of galaxies, say) that improves the precision (the error bar) on measurements.
In the latter project (large-scale structure), we are encountering some interesting conceptual things. For instance, if you make many independent measurements of something (a vector of quantities say) and you want to plot your mean measurement and an uncertainty, what do you plot? Standard practice is to use the square root of the diagonal entries of the covariance matrix. But you could instead plot the inverse square roots of the diagonal entries of the inverse covariance matrix. These two quantities are (in general) very different! Indeed we are taught that the former is conservative and the latter is not, but it really depends what you are trying to show, and what you are trying to learn. The standard practice tells you how well you know the correlation function in one bin, marginalizing out (or profiling) any inferences you have about other bins.
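A tiny numpy illustration of the difference between the two choices, for a made-up two-bin covariance with strong correlation:

```python
import numpy as np

# a two-bin measurement with strongly correlated uncertainties
C = np.array([[1.0, 0.9],
              [0.9, 1.0]])

# the standard error bars: marginal uncertainty per bin
marginal_std = np.sqrt(np.diag(C))

# the other choice: conditional uncertainty per bin, as if the other
# bin were known exactly
conditional_std = 1.0 / np.sqrt(np.diag(np.linalg.inv(C)))
```

The marginal number answers "how well do I know this bin, ignoring what I learn about the others"; the conditional number answers "how well would I know it if the others were fixed". With strong correlations they differ by a lot, and neither is wrong: They answer different questions.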
In the end, we don't care about what we measure at any particular radius in the correlation function, we care about the cosmological parameters we constrain! So Storey-Fisher and I discussed today how we might propagate our uncertainties to there, and compare methods there. I hope we find that our method does far better than the standard methods in that context!
Apologies for this rambling, inside-baseball post, but this is my research blog, and it isn't particularly intended to be fun, useful, or artistic!
I came in to the office today! for the first time in almost four months. I met with Jason Hunt (Flatiron) and we (at a distance) discussed the made-to-measure approach to fitting a steady-state dynamical system to observed kinematic (position and velocity) data. We are trying to move the method in the observational direction; that is, we are trying to make it more responsibly sensitive to the realistic uncertainties in the data. Long-term goal: Apply the method to the entire ESA Gaia data set! We aren't very far along in that, yet. But the first thing to do is to make sure the method can work with stars for which the uncertainties are large (and highly anisotropic in phase-space).
I spent time today writing a new paper stub (as is my wont). I re-framed the project Bedell (Flatiron) and I are working on in planet detection into a project on experimental (survey or project) design. After all, if you want to perform experimental design trades, you need a system that can tell you what you could or would observe or detect. So these questions: How do you discover an exoplanet? and How do I design my survey? are very intimately related. It's just a stub; we'll see if Bedell likes it.
This morning Ana Bonaca and I discussed an apparently remarkable sensitivity of a likelihood function we have (for a photometric light curve) to changes in frequency. There are sampling theorems, which say how smooth your likelihood function can be in the frequency direction. But do these apply when you are fitting multiple frequencies simultaneously? I would have thought yes for sure, but either we have a bug, a numerical instability, or a think-o. I don't think the problem is numerical because our condition numbers on all our matrices are reasonable.
I worked a little bit today on the question of whether we could possibly measure the local Milky Way gravitational tidal tensor using nearly-comoving stars in ESA Gaia. The idea is that each pair of stars, in some range of separations, should be a tiny two-star tidal stream. The problem looks hard, but there are a lot of pairs!
After carefully writing down a likelihood function for velocity differences between stellar pairs, Price-Whelan (Flatiron) and I had the Duh realization that the maximum-likelihood velocity difference is just the precise velocity difference you would have computed by plugging the ESA Gaia DR2 catalog entries into the most straightforward formula. Oh well! But of course the likelihood function is still useful, because it can be evaluated at alternative hypotheses (such as: velocity difference vanishes) and it can be integrated in Bayes to make evidences (for what those are worth).
The short-term consequence of all this is that it is extremely easy to make a good comoving-star catalog from Gaia DR2. So easy it is barely a publication! Should we do it anyway?
Yesterday Adrian Price-Whelan told me about an extremely interesting pair of comoving stars that Christina Hedges (NASA) found in a planet search. That had the effect of reminding him that we never ran a comoving-pair search in ESA Gaia DR2 data; we only ever did DR1 (where it was harder!). Add that to the point that Price-Whelan and I have been looking for straightforward, fun inference projects to do during this pandemic. Today I took some time (I am in an undisclosed location for a few days) to write up one (of many) possible methodologies for this project. One thought: Switch from a model-comparison framework (that we used in Oh et al.) to a parameter-estimation framework. This brings lots of advantages in terms of critical assumption-making, and computation. Another thought: Switch from Bayes to frequentism. Before you get too shocked: The DR2 RVS sample is so informative and precise that frequentist and Bayesian parameter estimations will be almost identical.
If yesterday was day 3, then the one-day working meeting on Monday regarding SDSS-IV was day zero, and today—a working meeting for SDSS-V—was day 3+1. The highlight for me today was a discussion led by Hans-Walter Rix (MPIA) and Kevin Covey (WWU) of what we might do with extra fiber–visits.
SDSS-V is a robot-positioned multi-object fiber-fed spectroscopic survey, with both optical and infrared spectrographs. It works in a few modes, but most of them involve jumping around the sky, taking (relatively) short spectroscopic observations of hundreds of stars at a time. The operations are complex: There are multiple target categories with different cadence requirements, and there are positional constraints on what the fiber robots can do. All this means that there are many, many (like millions of) unassigned fiber–visits.
The range of projects proposed was breathtaking, from Cepheids to microlensing events to nearby galaxy redshifts to quasar catalogs. And all of them so clever and thought-out that they were all compelling. And this wasn't even an official call for proposals: It was just a brainstorming session. Towards the end, there were some ideas (that I loved) about taking a union of the star-oriented suggestions and making a project with excellent data volume and legacy value. I love this Collaboration.
Want a piece of this? Consider buying in. Send me email if you want to discuss that, for yourself or for your institution.
I participated in day 3 of #sdss2020 today, and even started to pitch a project that could make use of the (literally) millions of unassigned fiber–visits in SDSS-V. Yes, the SDSS-V machines are so high throughput that, even doing multiple, huge surveys, there will be millions of unassigned fiber–visits. My pitch is with Adrian Price-Whelan; it is our project to get a spectrum of every possible “type” of star, where we have a completely algorithmic definition of “type”. More on this tomorrow, I hope.
In the afternoon, I spent time with Soledad Villar (NYU) reading this paper (Hastie et al 2019) on regression. It contains some remarkable results about what they call “risk” (and I call mean squared error) in regression. This paper is one of the key papers analyzing the double descent phenomenon I described earlier. The idea is that when the number of free parameters of a regression becomes very nearly equal to the number of data points in the training set, the mean squared error goes completely to heck. This is interesting in its own right—I am learning about the eigenvalue properties of random matrices—but it is also avoidable with regularization. The paper explains both why and how. Villar and I are interested in avoiding it with dimensionality reduction, which is another kind of regularization, in a sense.
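The spike at the interpolation threshold is easy to see in a toy numpy experiment (this is not the Hastie et al. setup, just isotropic Gaussian features, a random true linear relation, and min-norm least squares, with all numbers made up):

```python
import numpy as np

def heldout_mse(n, p, n_test=500, n_trials=20, sigma=0.5, seed=0):
    """Out-of-sample mean squared error of min-norm least squares,
    averaged over random draws of the toy Gaussian data."""
    rng = np.random.default_rng(seed)
    errs = []
    for _ in range(n_trials):
        w = rng.normal(size=p) / np.sqrt(p)          # true linear relation
        X = rng.normal(size=(n, p))
        y = X @ w + sigma * rng.normal(size=n)
        beta = np.linalg.lstsq(X, y, rcond=None)[0]  # min-norm solution
        X_test = rng.normal(size=(n_test, p))
        y_test = X_test @ w + sigma * rng.normal(size=n_test)
        errs.append(np.mean((X_test @ beta - y_test) ** 2))
    return float(np.mean(errs))

n = 20
risk_at_threshold = heldout_mse(n, p=n)     # p = n: the scary spike
risk_overparam = heldout_mse(n, p=10 * n)   # far past the threshold
```

The held-out error is far worse at p = n than at p = 10 n: The design matrix is nearly singular at the threshold, so the min-norm solution amplifies the noise enormously there.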
Related somehow to all this, I have been reading a new (new to me, anyway) book on writing, aimed at mathematicians. The Hastie et al paper is written by math-y people, and it has some great properties, like giving a clear summary of all of its findings up-front, a section giving the reader intuitions for each of them, and clear and timely reminders of key findings along the way. It's written almost like a white paper. It's refreshing, especially for a non-mathematician reader like me. As you may know, I can't read a paper that begins with the word Let!
Yesterday and today I had the first face-to-face meetings I have had since early March. Both were in Washington Square Park, and both were with sweaty bike-riding astrophysicists who live in Brooklyn. You can guess who they were! It was a pleasure though. I have missed face-to-face astronomy and teaching. Really. And I don't know how anyone gets anything done during this time. If you are having a hard time: You are not alone.
It was a low-research day as I had Departmental responsibilities. But I did get one single (long) paragraph written in a new plan of research for my work with Eilers (MIT), based on the discussions yesterday of self-calibration of element abundances in red-giant stars, and the inference of birth (as opposed to present-day surface) abundances. One paragraph of writing isn't much. But since a typical scientific paper is only 25 to 26 paragraphs, it is a relevant unit. Good luck yall.