I hid in an undisclosed location today and worked on various pieces of writing, but especially the nearly-complete paper by Adam Wheeler (Columbia) on finding abundance outliers with an unsupervised method.
In a long conversation, Soledad Villar (NYU) and I worked out the expectation values for various kinds of what statisticians call “risk” for ordinary least-squares regression. The risk is a kind of expected mean-square error, and unfortunately much of the literature is ambiguous about what the expectation is taken over. That is, an expectation is an integral. Integral over what? So we worked some of that out, and then re-derived a known result, which is that ordinary least squares is unbiased (under assumptions, and under definitions of bias in terms of expectations) when the number of data points is larger than the number of free parameters, and it is biased when the number of data points is smaller than the number of free parameters.
If you are shocked that we are considering such cases (fewer data points than parameters! Blasphemy!) then you haven't been paying attention: In linear regression, the mean-squared error (for out-of-sample test data) for OLS generically gets smaller when the number of parameters far exceeds the number of data points. Everything we were taught in school is wrong! Of course in order to find this result, you have to define least squares so that it has a well-defined solution; that solution is the min-norm solution: The solution that minimizes the total sum of squares (or something related to this) of the parameters. That breaks the degeneracies you might be fearing.
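To make the min-norm idea concrete, here is a small numpy sketch (my illustration, not our actual code): `np.linalg.pinv` implements the Moore–Penrose pseudo-inverse, which, in the under-determined regime, picks out the exact-fitting parameter vector with the smallest L2 norm.

```python
import numpy as np

rng = np.random.default_rng(17)
n, p = 10, 50  # fewer data points than parameters!
X = rng.normal(size=(n, p))
beta_true = rng.normal(size=p)
Y = X @ beta_true  # noise-free, for illustration

# The min-norm least-squares solution: among all parameter vectors
# that fit the data exactly, the one with the smallest L2 norm.
beta_hat = np.linalg.pinv(X) @ Y

# In the under-determined regime the fit is exact...
assert np.allclose(X @ beta_hat, Y)

# ...and any other exact solution has a larger norm: add a null-space
# perturbation and the training fit is unchanged but the norm grows.
step = rng.normal(size=p)
step -= np.linalg.pinv(X) @ (X @ step)  # project out the row space
other = beta_hat + step
assert np.allclose(X @ other, Y)
assert np.linalg.norm(other) >= np.linalg.norm(beta_hat)
```

That null-space degeneracy is exactly the degeneracy the min-norm convention breaks.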
When you measure the asteroseismic spectrum of a star, there are many observables: Mode frequencies, mode amplitudes, a frequency difference between modes, an envelope in frequency space where the modes have large amplitudes, and some more detailed things like even-odd-ell mode splittings and rotational splittings. Some of these observables are used in predicting or inferring stellar properties (like mass, age, and composition) and some are not. Why not? My feeling (both from looking at the data and from thinking about the theory) is that the amplitudes of the modes (or mean amplitude over a set of modes) and the width of the envelope at nu-max are both going to be informative about stellar structure. Ana Bonaca (Harvard) and I discussed these things today, and how we might both measure and exploit this information with new data.
An absolutely great PhD defense today by Emily Sandford (Columbia), who has worked on planet transits and what can be learned therefrom. And so wide-ranging: She worked on what you learn about a star from a single transit, what you learn about an orbit from a single transit, what you learn about the shape of the transiter, and even what you learn about the population of planetary systems from the statistics of the transits. The discussion was excellent and enlightening. One thing I loved was a little discussion about what it would mean to think of zero-planet systems as planetary systems. And lots about the representation of multi-planet systems (where Sandford has a grammar or natural-language-like approach).
I loved the defense so much, in part because Sandford crushed it and in part because I generally love PhD defenses: They remind me of all the reasons that I love my job. I was reflecting afterwards that the PhD is a kind of model of education: It is student-centered, it is customized and personalized for every student, it is self-directed, it is constructive, and it is success-oriented. And it produces some amazing scientists, including the brand-new Dr Sandford.
I have had a good day finishing up my reading of the PhD dissertation of Emily Sandford (Columbia), who has a great collection of results on transit measurements of planets and stars. She makes use of Bayesian methods and also classic optimization or signal-processing methods to make challenging inferences, and she shows the limits and capabilities of current and future data. One thing I liked was that she works on what are called “single transits” where only one transit is found in the whole survey duration; what can you infer from that? A lot! (I have also worked on this problem long ago.) In the dissertation she busts a myth that the multiplicity distribution requires that there be a mixture of two qualitatively different kinds of planetary systems. I enjoyed that, and it leads to a lot of other science. Plus some creative work on understanding the detailed properties of multi-planet systems, treating them like sequence data. It's a great thesis and I am very much looking forward to tomorrow's defense.
In a long conversation, Lily Zhao (Yale), Megan Bedell (Flatiron), and I looked at a possible re-scope of our new paper on spectrograph calibration. Zhao is finishing one paper that builds a hierarchical, non-parametric wavelength solution. It does better than fitting polynomials independently for different exposure groups, even in the full end-to-end test of measuring and fitting stellar radial velocities. But the paper we were discussing today is about the fact that the same distortions to the spectrograph that affect the wavelength solution also (to the same order) affect the positions of the spectroscopic traces on the device. That is, the relationship between (say) x position on the detector and wavelength can be inferred (given good training data) from the y position on the detector of the spectral traces. We have been struggling with how to implement and use this, but then we realized that we could instead write a paper showing that it is true, and defer implementations to the pipeline teams. Implementation isn't trivial! But the result is nice. And kind-of obvious in retrospect.
I have my student-project office hours on Tuesdays and Wednesdays. It was a pretty fun session this week, in part because two of my students (Abby Shaum and Avery Simon) got jobs (congratulations!) and in part because we had a wide-ranging conversation that went way beyond everyone's particular projects. In one part of it, we talked about looking for pairs of planets (around a common host star) that have a period ratio that is a known irrational number (like e or pi or tau). Anu Raghunathan and I are using this kind of model as a null test when we search for planets in resonances (that is, we should find small-integer rational period ratios, not irrational ratios (or rational numbers with very large denominators; don't at-me)). But then of course we realized that one way that alien civilizations might signal us is by creating “transit beacons” that are in a ratio like e or pi. Hahaha new project!
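A toy version of that null test (my sketch, not Raghunathan's actual code): the stdlib `Fraction.limit_denominator` finds the best rational approximation to a period ratio with a bounded denominator, which separates small-integer resonances from irrational (or large-denominator) ratios.

```python
from fractions import Fraction
import math

def nearest_small_rational(ratio, max_den=10):
    """Best rational approximation to `ratio` with denominator <= max_den,
    and the absolute error of that approximation."""
    frac = Fraction(ratio).limit_denominator(max_den)
    return frac, abs(ratio - frac.numerator / frac.denominator)

# A 3:2 resonance is hit exactly by a small-denominator rational...
frac, err = nearest_small_rational(1.5)
assert (frac.numerator, frac.denominator) == (3, 2)
assert err == 0.0

# ...whereas pi is missed by every denominator up to 10 (22/7 is the
# best, and it is off by about 1.3e-3).
frac_pi, err_pi = nearest_small_rational(math.pi)
assert err_pi > 1e-3
```

In real data you'd compare the approximation error to the measurement uncertainty on the period ratio, of course.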
Not much astrophysics research today. It turns out that racism is a much harder problem than, say, uncertainty estimation on a measurement! But I did do a tiny bit of research:
Adrian Price-Whelan (Flatiron) came by my office (yes, actually in person) and we worked a tiny bit on our reboot of the Chemical Tangents idea: The (birth) abundances, the birthday (like age), and (approximately) the actions of a star are invariants; the orbital angles and evolutionary phase are not. In an integrable, steady-state problem, the angle distributions should be both uniform and separable (from everything else). These ideas unify a lot of work over the last few years. We're trying to write a simple paper that's primarily pedagogical, but we haven't set the scope yet.
With Winston Harris (MTSU) I am working on automating some aspects of discovery and decision-making in exoplanet science. Today we discussed the Bayesian and frequentist equivalents or analogs for a 5-sigma detection or measurement. It turns out that there are quite a few different ways you can define this. Some of the different ideas can be made to collapse to the same thing when the likelihood is Gaussian (in parameter space), but of course the likelihood is rarely precisely Gaussian! So we have choices to make. And of course when I say "5-sigma" I mean whatever threshold you want; it doesn't have to be precisely 5-sigma! A key idea that will guide our choices is that Bayesian priors are measures with which we do integrals. That said, we will also develop principled frequentist methods. It's not obvious that you want to be Bayesian when you are claiming a discovery!
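Here is the Gaussian collapse I mean, as a stdlib-only sketch: when the likelihood is Gaussian, an n-sigma threshold maps onto both a frequentist tail probability and a log-likelihood-ratio threshold, and the two are interchangeable.

```python
import math

def two_sided_p_value(n_sigma):
    # Probability of a fluctuation at least n_sigma from zero, in either
    # direction, under a Gaussian; erfc(x) = 1 - erf(x).
    return math.erfc(n_sigma / math.sqrt(2.0))

def log_likelihood_ratio(n_sigma):
    # For a Gaussian likelihood, an n-sigma displacement of the best-fit
    # parameter corresponds to delta-chi-squared = n^2, that is, a
    # log-likelihood ratio of n^2 / 2.
    return n_sigma ** 2 / 2.0

p5 = two_sided_p_value(5.0)
assert 5e-7 < p5 < 6e-7  # the familiar "5-sigma" p-value, about 5.7e-7
assert log_likelihood_ratio(5.0) == 12.5
```

When the likelihood isn't Gaussian, these definitions (tail probability, likelihood ratio, posterior odds) pull apart, and that's exactly where the choices live.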
Lily Zhao (Yale) and I looked at the (very strange; don't try this at home) problem of locating spectrograph traces in the extreme-precision spectrograph EXPRES (Fischer, PI). As my loyal reader knows, we are looking at this problem because the location of the traces in the cross-dispersion direction can be an indicator of the physical state (and hence wavelength-calibration state) of the instrument, without harming any wavelength standards. We discussed the creation and use of a matched filter for this purpose. This is amusing, because we would be locating the traces in the cross-dispersion direction the same way that the pipelines locate the stellar spectrum in the dispersion direction! If our matched filter works, we believe that we will be able to use the trace positions as an ersatz simultaneous reference for wavelength-calibration tracking.
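The matched-filter idea, in cartoon form (a toy sketch with a fake Gaussian trace profile, not the EXPRES pipeline): correlate a cross-dispersion cut of the detector with the assumed trace profile, and the peak of the filter output locates the trace center.

```python
import numpy as np

# Fake cross-dispersion cut: a Gaussian trace profile plus noise.
rng = np.random.default_rng(42)
y = np.arange(64, dtype=float)
true_center, width = 30.6, 2.0
data = np.exp(-0.5 * ((y - true_center) / width) ** 2)
data += 0.01 * rng.normal(size=y.size)

def matched_filter_score(center):
    # Template: the assumed trace profile at a trial center, with its
    # mean removed so that a flat background scores zero.
    t = np.exp(-0.5 * ((y - center) / width) ** 2)
    t -= t.mean()
    return float(np.dot(data, t))

# Scan trial centers; the peak of the score locates the trace.
centers = np.arange(5.0, 60.0, 0.1)
scores = np.array([matched_filter_score(c) for c in centers])
best = centers[int(np.argmax(scores))]
assert abs(best - true_center) < 0.2
```

In practice the template would come from the measured trace profile, and you'd refine the grid peak to sub-pixel precision, but the principle is the same as locating spectral lines in the dispersion direction.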
As my loyal reader knows, I love putting machine learning inside a physical model. That is, not just using machine learning, but re-purposing machine learning to play a role in modeling a nuisance we don't care about inside our physical model. It's similar to how the best methods for causal inference use machine learning to capture the possibly complex and adversarial effects of confounders. Today I had the pleasure of reading closely a new manuscript by Francois Lanusse (Paris) that describes a use of machine learning to model galaxy images, but putting that model inside a causal structure (think: directed acyclic graph) that includes telescope optics and photon noise. The method seems to work really well.
In a great conversation with Soledad Villar (NYU) today, we realized that we have (more than) 10 methods for linear regression! Hahaha. Meaning: more than 10 differently conceived methods for finding a linear relationship between features X and labels Y using some kind of optimization or inference. Some involve regularization, some involve dimensionality reduction, some involve latent variables, and so on. None of them align with my usual practice, because we have put ourselves in a sandbox where we don't know the data-generating process: That is, we only have the rectangular X array and the Y array; we don't have any useful meta-data.
Things we are investigating are: Performance on predicting held-out data, and susceptibility to the double descent phenomenon. It turns out that both of these are amazingly strongly dependent on how we (truly) generate the data. In our sandbox, we are gods.
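A minimal version of the sandbox (my illustration; the true data-generating process here is an assumption, which is the whole point — in the sandbox, we are gods): generate labels from a known linear process, then compare two of the many methods on held-out data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, p = 20, 200, 100  # over-parametrized regime
beta = rng.normal(size=p) / np.sqrt(p)  # the true (god-mode) parameters

def make_data(n):
    X = rng.normal(size=(n, p))
    return X, X @ beta + 0.1 * rng.normal(size=n)

X, Y = make_data(n_train)
X_test, Y_test = make_data(n_test)

# Two of the "(more than) 10 methods": min-norm OLS and ridge regression.
beta_ols = np.linalg.pinv(X) @ Y
lam = 1.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)

# Min-norm OLS interpolates the training data exactly when p > n...
assert np.allclose(X @ beta_ols, Y)

# ...and the held-out mean-squared errors are what we compare.
mse_ols = float(np.mean((X_test @ beta_ols - Y_test) ** 2))
mse_ridge = float(np.mean((X_test @ beta_ridge - Y_test) ** 2))
```

Vary `n_train` through `p` with everything else fixed and the test error of the min-norm interpolator traces out the double-descent curve; how dramatic it is depends strongly on how `beta` and the noise are (truly) generated.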
Ana Bonaca (Harvard) and I discussed two important successes today (both of them due to Bonaca): We found the issue in our code that caused there to be frequency structure on frequency scales smaller than what you might call the “uncertainty-principle” limit of the inverse of the duration of the survey. It was literally a bug, not a think-o. Which is reassuring. And we do have likelihoods that seem to peak (at least locally) at sensible values of delta-nu, the large frequency difference, in the parlance of asteroseismology. So now there are questions of how to improve performance, and how Bayesian to be, as it were. Frequentism is easier in some respects, harder in others. Lots of engineering to think about!
I did not get much done today. I had useful but frustrating conversations with Teresa Huang (NYU) and Soledad Villar (NYU) about the APOGEE data: We are using the synthetic APOGEE spectra (which are part of the data model; released with the observed spectra) to infer (using local linear regression) the temperature derivative of the theoretical spectral expectation. It isn't working! That is, the derivative we get depends very strongly on the number of neighbors we use and how we infer it. Which makes no sense to me. But anyway, there is something I don't understand. Unfortunately, this makes me the blocker on Huang's paper! Argh. I need to compose complicated technical questions for the collaboration.
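For context, here is the kind of local-linear-regression derivative estimate I mean (an illustrative sketch, not our actual pipeline): fit a line to the k nearest synthetic spectra in temperature and read off the slope. On spectra that are exactly linear in temperature, the answer should not depend on k at all — which is the sanity check against which the real data misbehave.

```python
import numpy as np

def local_linear_derivative(temps, spectra, t0, k):
    """Slope d(spectrum)/d(temp) at t0, from a linear fit to the
    k models nearest to t0 in temperature."""
    idx = np.argsort(np.abs(temps - t0))[:k]
    # Design matrix [1, (T - t0)]; lstsq returns intercept and slope
    # for every wavelength pixel at once.
    A = np.stack([np.ones(k), temps[idx] - t0], axis=1)
    coeffs, *_ = np.linalg.lstsq(A, spectra[idx], rcond=None)
    return coeffs[1]

# Synthetic check: spectra exactly linear in temperature, so the
# inferred derivative matches the true slope for any neighbor count k.
rng = np.random.default_rng(3)
temps = rng.uniform(4000.0, 6000.0, size=200)
true_slope = rng.normal(size=8)  # 8 wavelength pixels, for illustration
spectra = 1.0 + np.outer(temps - 5000.0, true_slope)

for k in (5, 20, 80):
    slope = local_linear_derivative(temps, spectra, 5000.0, k)
    assert np.allclose(slope, true_slope)
```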
Christina Eilers (MIT) got some really nice results today, in which she used a causal argument to self-calibrate element abundances in red-giant stars to a fiducial surface-gravity. Abundances—for red-giant stars—can depend on surface gravity for (at least) two reasons: There can be evolutionary effects, in which surface abundances change as elements are dredged up from the stellar interiors, and there can be systematic errors in abundance measurements as a function of surface gravity (because of model wrongness, convective shifts, and so on). Since we have many stars at different positions in the Galaxy and at different surface gravities, we can take a self-calibration approach. Today, Eilers showed amazing results: The inconsistencies in Galactic abundance gradients along the red-giant branch get resolved (with some exceptions) and precision is greatly improved. I mentioned “causal” above: Self-calibration methods involve assumptions that would be described as causal by statisticians.
I spent some time this weekend working on the write-up of my project with Adam Wheeler (Columbia) that uses data-driven methods (and no training labels—it is unsupervised, that is) to find stars with anomalous abundances. It's a beautifully simple method, and has many many applications. We are just using it to find Lithium-enriched stars, but there are many other things to try with it.
Lily Zhao (Yale), Megan Bedell (Flatiron), and I spoke about Zhao's attempt to understand the variations in the data in the EXPRES precision spectrograph. We have the genius (to us, anyway) idea of using the trace positions in the cross-dispersion direction to calibrate the dispersion direction. Sound wrong-headed? It isn't, because the small number of instrument degrees of freedom that affect the wavelength solution will also affect the trace positions. So they will co-vary. We can even show that they do! The big issue we are having—and we are having great trouble diagnosing this—is that the calibration data (laser-frequency comb, in this case) are offset from the science data in strange ways. Which makes no sense, given the incredible stability of the spectrograph. In our call today, we came up with ways to further investigate and visualize what's going on.
It's going to be a summer of student projects for me! I spoke with Winston Harris (Middle Tennessee State) about our summer project to automate the detection of planets in RV data; he is installing software and data now. I spoke with Abby Shaum (NYU) about phase modulation methods for finding planets; we may have found a signal in a star thought to have no signal! I spoke with Abby Williams (NYU) who, with Kate Storey-Fisher (NYU), is looking at measuring not just large-scale structure but in fact spatial gradients in the large-scale structure, with Storey-Fisher's cool new tools. With Jonah Goldfine (NYU) I looked at his solutions to the exercises in Fitting a Model to Data, done in preparation for some attempts on NASA TESS light curves. And I spoke with Anu Raghunathan (NYU) about possibly speeding up her search for planet transits using box least squares. Our project is theoretical: What are the statistical properties of the BLS algorithm? But we still need things to be much much faster.
It was a very low-research day! But I did get in an hour with Bedell (Flatiron), discussing our plan to automate some aspects of exoplanet detection and discovery. The idea is: If we can operationalize and automate discovery, we can use that to perform experimental design on surveys — surveys that we want to optimize for exoplanet yield. We discussed frequentist vs Bayesian approaches.
I'm writing a paper on uncertainty estimation—how you put an error bar on your measurement—and at the same time, Kate Storey-Fisher (NYU) and I are working on a new method for estimating the correlation function (of galaxies, say) that improves the precision (the error bar) on measurements.
In the latter project (large-scale structure), we are encountering some interesting conceptual things. For instance, if you make many independent measurements of something (a vector of quantities say) and you want to plot your mean measurement and an uncertainty, what do you plot? Standard practice is to use the square root of the diagonal entries of the covariance matrix. But you could instead plot the inverse square roots of the diagonal entries of the inverse covariance matrix. These two quantities are (in general) very different! Indeed we are taught that the former is conservative and the latter is not, but it really depends what you are trying to show, and what you are trying to learn. The standard practice tells you how well you know the correlation function in one bin, marginalizing out (or profiling) any inferences you have about other bins.
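A two-bin numpy illustration of the difference (my toy numbers): with strong correlation between bins, the marginal error bar (square root of the covariance diagonal) is much larger than the conditional one (inverse square root of the inverse-covariance diagonal).

```python
import numpy as np

# A two-bin covariance with strong positive correlation between bins.
C = np.array([[1.0, 0.9],
              [0.9, 1.0]])
C_inv = np.linalg.inv(C)

marginal = np.sqrt(np.diag(C))               # the "standard practice" bar
conditional = 1.0 / np.sqrt(np.diag(C_inv))  # from the inverse covariance

# The marginal uncertainty of a bin (other bins unknown) is always at
# least as large as the conditional one (other bins held fixed). Here
# marginal = 1.0 per bin, conditional is about 0.44.
assert np.all(marginal >= conditional)
assert conditional[0] < 0.5 < marginal[0]
```

Which one you should plot depends on the question: the marginal bar answers "how well do I know this bin, full stop?" and the conditional bar answers "how well do I know this bin given the others?"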
In the end, we don't care about what we measure at any particular radius in the correlation function, we care about the cosmological parameters we constrain! So Storey-Fisher and I discussed today how we might propagate our uncertainties to there, and compare methods there. I hope we find that our method does far better than the standard methods in that context!
Apologies for this rambling, inside-baseball post, but this is my research blog, and it isn't particularly intended to be fun, useful, or artistic!
I came in to the office today! for the first time in almost four months. I met with Jason Hunt (Flatiron) and we (at a distance) discussed the made-to-measure approach to fitting a steady-state dynamical system to observed kinematic (position and velocity) data. We are trying to move the method in the observational direction; that is, we are trying to make it more responsibly sensitive to the realistic uncertainties in the data. Long-term goal: Apply the method to the entire ESA Gaia data set! We aren't very far along in that, yet. But the first thing to do is to make sure the method can work with stars for which the uncertainties are large (and highly anisotropic in phase-space).
I spent time today writing a new paper stub (as is my wont). I re-framed the project Bedell (Flatiron) and I are working on in planet detection into a project on experimental (survey or project) design. After all, if you want to perform experimental design trades, you need a system that can tell you what you could or would observe or detect. So these questions: How do you discover an exoplanet? and How do I design my survey? are very intimately related. It's just a stub; we'll see if Bedell likes it.
This morning Ana Bonaca and I discussed an apparently remarkable sensitivity of a likelihood function we have (for a photometric light curve) to changes in frequency. There are sampling theorems, which say how smooth your likelihood function can be in the frequency direction. But do these apply when you are fitting multiple frequencies simultaneously? I would have thought yes for sure, but either we have a bug, a numerical instability, or a think-o. I don't think the problem is numerical because our condition numbers on all our matrices are reasonable.