As my loyal reader knows, Christina Eilers (MIT) and I have been looking at surface-gravity systematics in surface-abundance measurements in red-giant stars. It appears that stars with different gravities say different things about the abundance trends with (say) Galactocentric radius and perpendicular distance from the midplane. And none of these things agrees with the trends published in the literature by various groups. We now think that the published trends are caused by selection differences as a function of Galactic azimuth (probably primarily because of crowding), plus these surface-gravity effects. Today we discussed with Hans-Walter Rix (MPIA) the scope of such projects, and we were able to establish a limited scope that we think will work. But one step on the path is to argue all this out with the APOGEE team, because we need to understand whether our interpretations of all these things are correct.
I hid in an undisclosed location today and worked on various pieces of writing, but especially the nearly-complete paper by Adam Wheeler (Columbia) on finding abundance outliers with an unsupervised method.
In a long conversation, Soledad Villar (NYU) and I worked out the expectation values for various kinds of what statisticians call “risk” for ordinary least-squares regression. The risk is a kind of expected mean-square error, and unfortunately much of the literature is ambiguous about what the expectation is taken over. That is, an expectation is an integral. Integral over what? So we worked some of that out, and then re-derived a known result, which is that ordinary least squares is unbiased (under assumptions, and under definitions of bias in terms of expectations) when the number of data points is larger than the number of free parameters, and it is biased when the number of data points is smaller than the number of free parameters.
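One version of this can be made exact in a few lines. Assume a fixed design matrix X, data y = X β + noise with zero-mean noise, and take the expectation over the noise only (one particular answer to "integral over what?"). The pseudoinverse estimate β̂ = X⁺ y is ordinary least squares when n > p and the min-norm solution when n < p, and its expectation is E[β̂] = X⁺ X β. A minimal NumPy sketch under those assumptions:

```python
import numpy as np

def expected_estimate(X, beta):
    """E[beta_hat] over zero-mean noise, with X held fixed, for the
    pseudoinverse least-squares estimate beta_hat = pinv(X) @ y."""
    return np.linalg.pinv(X) @ X @ beta

rng = np.random.default_rng(0)
p = 10
beta = rng.normal(size=p)

# n > p: pinv(X) @ X is the identity (for full-rank X), so the bias vanishes.
X_tall = rng.normal(size=(50, p))
bias_tall = expected_estimate(X_tall, beta) - beta

# n < p: pinv(X) @ X projects onto the row space of X, so the part of beta
# orthogonal to that space is lost and the estimate is biased.
X_wide = rng.normal(size=(5, p))
bias_wide = expected_estimate(X_wide, beta) - beta

print(np.abs(bias_tall).max())  # round-off level
print(np.abs(bias_wide).max())  # generically order unity
```

Taking the expectation over X as well (random designs) changes the formulas, which is exactly the ambiguity in the literature.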
If you are shocked that we are considering such cases (fewer data points than parameters! Blasphemy!) then you haven't been paying attention: In linear regression, the mean-squared error (for out-of-sample test data) for OLS generically gets smaller when the number of parameters far exceeds the number of data points. Everything we were taught in school is wrong! Of course in order to find this result, you have to define least squares so that it has a well-defined solution; that solution is the min-norm solution: The solution that minimizes the total sum of squares (or something related to this) of the parameters. That breaks the degeneracies you might be fearing.
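A minimal sketch of both points, using NumPy's `lstsq` (which returns the minimum-norm solution when the system is underdetermined) and an assumed, purely illustrative data-generating process in which the model only sees the first p of many relevant features:

```python
import numpy as np

def min_norm_fit(X, y):
    # lstsq returns the minimum-2-norm solution among all minimizers, so this
    # is ordinary least squares when n > p and the min-norm solution when n < p.
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def heldout_mse_curve(n_train=20, n_test=500, p_max=200, noise=0.5, seed=1):
    """Held-out MSE of min-norm least squares versus the number of features p.

    Assumed (hypothetical) data-generating process: y depends on all p_max
    standard-normal features with coefficients 1/k, plus Gaussian noise;
    the model is given only the first p features.
    """
    rng = np.random.default_rng(seed)
    beta = 1.0 / np.arange(1, p_max + 1)
    X_train = rng.normal(size=(n_train, p_max))
    X_test = rng.normal(size=(n_test, p_max))
    y_train = X_train @ beta + noise * rng.normal(size=n_train)
    y_test = X_test @ beta + noise * rng.normal(size=n_test)
    ps = np.arange(1, p_max + 1)
    mse = np.array([
        np.mean((X_test[:, :p] @ min_norm_fit(X_train[:, :p], y_train) - y_test) ** 2)
        for p in ps
    ])
    return ps, mse

ps, mse = heldout_mse_curve()
```

Plotting `mse` against `ps` typically shows a spike near p = n_train (the interpolation threshold) and a second descent beyond it; how far the second descent goes depends strongly on the assumed data-generating process.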
When you measure the asteroseismic spectrum of a star, there are many observables: Mode frequencies, mode amplitudes, a frequency difference between modes, an envelope in frequency space where the modes have large amplitudes, and some more detailed things like even-odd-ell mode splittings and rotational splittings. Some of these observables are used in predicting or inferring stellar properties (like mass, age, and composition) and some are not. Why not? My feeling (both from looking at the data and from thinking about the theory) is that the amplitudes of the modes (or mean amplitude over a set of modes) and the width of the envelope at nu-max are both going to be informative about stellar structure. Ana Bonaca (Harvard) and I discussed these things today, and how we might both measure and exploit this information with new data.
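As an illustration of the measurement side only, here is a minimal sketch that fits the location, width, and height of the envelope, assuming a flat background and a Gaussian envelope shape (both simplifying assumptions, not a plan we have settled on), using SciPy's `curve_fit` on a smoothed power spectrum. The fitted height is only a crude handle on the mode amplitudes.

```python
import numpy as np
from scipy.optimize import curve_fit

def envelope_model(nu, background, height, nu_max, width):
    # Flat background plus a Gaussian envelope of oscillation power at nu_max.
    return background + height * np.exp(-0.5 * ((nu - nu_max) / width) ** 2)

def fit_envelope(nu, power, smooth=51):
    """Fit envelope_model to a boxcar-smoothed power spectrum on a uniform grid.
    Returns (background, height, nu_max, width)."""
    kernel = np.ones(smooth) / smooth
    smoothed = np.convolve(power, kernel, mode="valid")  # avoids edge effects
    offset = (smooth - 1) // 2
    nu_s = nu[offset:offset + len(smoothed)]
    i = np.argmax(smoothed)
    bg = np.median(smoothed)
    p0 = [bg, smoothed[i] - bg, nu_s[i], 0.1 * nu_s[i]]
    params, _ = curve_fit(envelope_model, nu_s, smoothed, p0=p0)
    params[3] = abs(params[3])  # the model is symmetric in the sign of width
    return params

# Synthetic spectrum with chi-squared (2 degrees of freedom) noise.
rng = np.random.default_rng(0)
nu = np.linspace(20.0, 200.0, 3000)
truth = envelope_model(nu, 1.0, 3.0, 100.0, 12.0)
power = truth * rng.exponential(size=nu.size)
background, height, nu_max, width = fit_envelope(nu, power)
```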
An absolutely great PhD defense today by Emily Sandford (Columbia), who has worked on planet transits and what can be learned therefrom. And so wide-ranging: She worked on what you learn about a star from a single transit, what you learn about an orbit from a single transit, what you learn about the shape of the transiter, and even what you learn about the population of planetary systems from the statistics of the transits. The discussion was excellent and enlightening. One thing I loved was a little discussion about what it would mean to think of zero-planet systems as planetary systems. And lots about the representation of multi-planet systems (where Sandford has a grammar or natural-language-like approach).
I loved the defense so much, in part because Sandford crushed it and in part because I generally love PhD defenses: They remind me of all the reasons that I love my job. I was reflecting afterwards that the PhD is a kind of model of education: It is student-centered, it is customized and personalized for every student, it is self-directed, it is constructive, and it is success-oriented. And it produces some amazing scientists, including the brand-new Dr Sandford.
I have had a good day finishing up my reading of the PhD dissertation of Emily Sandford (Columbia), who has a great collection of results on transit measurements of planets and stars. She makes use of Bayesian methods and also classic optimization or signal-processing methods to make challenging inferences, and she shows the limits and capabilities of current and future data. One thing I liked was that she works on what are called “single transits” where only one transit is found in the whole survey duration; what can you infer from that? A lot! (I have also worked on this problem long ago.) In the dissertation she busts a myth that the multiplicity distribution requires that there be a mixture of two qualitatively different kinds of planetary systems. I enjoyed that, and it leads to a lot of other science. Plus some creative work on understanding the detailed properties of multi-planet systems, treating them like sequence data. It's a great thesis and I am very much looking forward to tomorrow's defense.
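Part of why a single transit is informative: under textbook simplifying assumptions (a circular orbit, a central transit, a planet much smaller than the star, and a ≫ R_star), the transit duration T and the stellar mean density ρ fix the period, via T ≈ (P/π)(R_star/a) and (a/R_star)³ = GρP²/(3π), which combine to P = π²GρT³/3. A minimal sketch of that back-of-the-envelope relation (not the method in the dissertation):

```python
import math

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2

def period_from_single_transit(duration_s, stellar_density_kg_m3):
    """Rough orbital period (s) from one transit duration (s) and the stellar
    mean density, assuming a circular orbit, zero impact parameter, a small
    planet, and a >> R_star: P = pi^2 G rho T^3 / 3."""
    return math.pi ** 2 * G * stellar_density_kg_m3 * duration_s ** 3 / 3.0

# A ~13-hour transit across a star with roughly the Sun's mean density.
period_days = period_from_single_transit(13 * 3600, 1410.0) / 86400
print(round(period_days))  # roughly one year
```

Because P scales as T³, modest errors in the duration or density (or a nonzero impact parameter) translate into large period uncertainties, which is why the full inference is more involved.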
In a long conversation, Lily Zhao (Yale), Megan Bedell (Flatiron), and I looked at a possible re-scope of our new paper on spectrograph calibration. Zhao is finishing one paper that builds a hierarchical, non-parametric wavelength solution. It does better than fitting polynomials independently for different exposure groups, even in the full end-to-end test of measuring and fitting stellar radial velocities. But the paper we were discussing today is about the fact that the same distortions to the spectrograph that affect the wavelength solution also (to the same order) affect the positions of the spectroscopic traces on the device. That is, the relationship between (say) x position on the detector and wavelength can be inferred (given good training data) from the y position on the detector of the spectral traces. We have been struggling with how to implement and use this, but then we realized that we could instead write a paper showing that it is true, and defer implementations to the pipeline teams. Implementation isn't trivial! But the result is nice. And kind-of obvious in retrospect.
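The claim is testable with a toy regression. A minimal sketch, assuming (hypothetically) that both the trace y-positions and the wavelength-solution coefficients respond linearly to a small number of hidden instrument parameters; real distortions need not be linear or low-dimensional, and the sizes and noise levels here are made up:

```python
import numpy as np

rng = np.random.default_rng(42)
n_state, n_traces, n_coeffs = 2, 10, 3

# Hypothetical linear responses of trace y-positions and wavelength-solution
# coefficients to a small number of hidden instrument parameters.
A = rng.normal(size=(n_traces, n_state))
B = rng.normal(size=(n_coeffs, n_state))

def simulate_exposures(n):
    state = rng.normal(size=(n, n_state))
    trace_y = state @ A.T + 0.01 * rng.normal(size=(n, n_traces))
    coeffs = state @ B.T + 0.01 * rng.normal(size=(n, n_coeffs))
    return trace_y, coeffs

def design(trace_y):
    # Constant column plus the measured trace positions.
    return np.hstack([np.ones((len(trace_y), 1)), trace_y])

# "Good training data": exposures with both traces and a full wavelength solution.
Y_train, C_train = simulate_exposures(200)
W, *_ = np.linalg.lstsq(design(Y_train), C_train, rcond=None)

# New exposures: predict the wavelength solution from trace positions alone.
Y_new, C_new = simulate_exposures(20)
C_pred = design(Y_new) @ W
rms_error = np.sqrt(np.mean((C_pred - C_new) ** 2))
```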
I have my student-project office hours on Tuesdays and Wednesdays. It was a pretty fun session this week, in part because two of my students (Abby Shaum and Avery Simon) got jobs (congratulations!) and in part because we had a wide-ranging conversation that went way beyond everyone's particular projects. In one part of it, we talked about looking for pairs of planets (around a common host star) that have a period ratio that is a known irrational number (like e or pi or tau). Anu Raghunathan and I are using this kind of model as a null test when we search for planets in resonances (that is, we should find small-integer rational period ratios, not irrational ratios (or rational numbers with very large denominators; don't at-me)). But then of course we realized that one way that alien civilizations might signal us is by creating “transit beacons” that are in a ratio like e or pi. Hahaha new project!
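A minimal sketch of the bookkeeping behind such a null test, assuming (for illustration only) that "small-integer" means denominators of at most 5; the standard library's `Fraction.limit_denominator` finds the closest such fraction:

```python
import math
from fractions import Fraction

def nearest_simple_ratio(period_ratio, max_denominator=5):
    """Closest fraction p/q to period_ratio with q <= max_denominator, and the
    fractional offset of the observed ratio from it."""
    best = Fraction(period_ratio).limit_denominator(max_denominator)
    return best, period_ratio / float(best) - 1.0

# Two near-resonant pairs versus two irrational "null" ratios.
for ratio in (1.5003, 2.0008, math.pi, math.e):
    frac, offset = nearest_simple_ratio(ratio)
    print(f"{ratio:.5f} -> {frac} (offset {offset:+.2%})")
```

The irrational ratios still land near some fraction (pi near 16/5, e near 8/3), just with much larger offsets, which is what makes them a useful comparison population.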
Not much astrophysics research today. It turns out that racism is a much harder problem than, say, uncertainty estimation on a measurement! But I did do a tiny bit of research:
Adrian Price-Whelan (Flatiron) came by my office (yes, actually in person) and we worked a tiny bit on our reboot of the Chemical Tangents idea: The (birth) abundances, the birthday (like age), and (approximately) the actions of a star are invariants; the orbital angles and evolutionary phase are not. In an integrable, steady-state problem, the angle distributions should be both uniform and separable (from everything else). These ideas unify a lot of work over the last few years. We're trying to write a simple paper that's primarily pedagogical, but we haven't set the scope yet.
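A toy version of the angle diagnostic, assuming a 1D harmonic oscillator (chosen only because its action-angle variables are closed-form; a galaxy needs a potential model and numerically computed actions). By construction this population is in steady state, so the angles should come out uniform and uncorrelated with the action; with real data, a failure of such checks would signal that some assumption (integrability, the potential, or the steady state) is wrong.

```python
import numpy as np

def action_angle_harmonic(x, v, omega):
    """Action-angle variables of a 1D harmonic oscillator with frequency omega,
    using x = sqrt(2J/omega) sin(theta), v = sqrt(2 J omega) cos(theta)."""
    J = (v ** 2 + (omega * x) ** 2) / (2.0 * omega)
    theta = np.arctan2(omega * x, v) % (2.0 * np.pi)
    return J, theta

# Build a steady-state population: any action distribution, uniform angles.
rng = np.random.default_rng(3)
n, omega = 100_000, 1.3
J_true = rng.gamma(2.0, 1.0, size=n)
theta_true = rng.uniform(0.0, 2.0 * np.pi, size=n)
x = np.sqrt(2.0 * J_true / omega) * np.sin(theta_true)
v = np.sqrt(2.0 * J_true * omega) * np.cos(theta_true)

J, theta = action_angle_harmonic(x, v, omega)
# Uniformity: the mean resultant length is ~1/sqrt(n) for uniform angles.
R = np.abs(np.mean(np.exp(1j * theta)))
# A weak separability check: the angle should carry no information about J.
corr = np.corrcoef(J, np.cos(theta))[0, 1]
```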
With Winston Harris (MTSU) I am working on automating some aspects of discovery and decision-making in exoplanet science. Today we discussed the Bayesian and frequentist equivalents or analogs for a 5-sigma detection or measurement. It turns out that there are quite a few different ways you can define this. Some of the different ideas can be made to collapse to the same thing when the likelihood is Gaussian (in parameter space), but of course the likelihood is rarely precisely Gaussian! So we have choices to make. And of course when I say “5-sigma” I mean whatever threshold you want; it doesn't have to be precisely 5-sigma! A key idea that will guide our choices is that Bayesian priors are measures with which we do integrals. That said, we will also develop principled frequentist methods. It's not obvious that you want to be Bayesian when you are claiming a discovery!
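To make the collapse concrete: if the likelihood for a single parameter is exactly Gaussian, an n-sigma threshold, a tail probability, a chi-squared improvement, and a maximum-likelihood ratio are the same criterion in different units. A minimal sketch of the conversions (the one-sided tail is an assumption here; a two-sided p-value is twice as large):

```python
import math

def gaussian_threshold_equivalents(n_sigma=5.0):
    """Equivalent forms of an n-sigma threshold for a Gaussian likelihood
    in one parameter."""
    p_one_sided = 0.5 * math.erfc(n_sigma / math.sqrt(2.0))  # tail probability
    delta_chi2 = n_sigma ** 2                                 # chi^2 improvement
    likelihood_ratio = math.exp(0.5 * delta_chi2)             # L(best) / L(null)
    return p_one_sided, delta_chi2, likelihood_ratio

p, dchi2, lr = gaussian_threshold_equivalents(5.0)
# p ≈ 2.87e-7, dchi2 = 25, lr = exp(12.5) ≈ 2.68e5
```

A Bayes factor, by contrast, also depends on the prior width on the parameter (through the Occam factor), which is one reason the Bayesian options do not reduce to these numbers even in the Gaussian case.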
Lily Zhao (Yale) and I looked at the (very strange; don't try this at home) problem of locating spectrograph traces in the extreme-precision spectrograph EXPRES (Fischer, PI). As my loyal reader knows, we are looking at this problem because the location of the traces in the cross-dispersion direction can be an indicator of the physical state (and hence wavelength-calibration state) of the instrument, without harming any wavelength standards. We discussed the creation and use of a matched filter for this purpose. This is amusing, because we would be locating the traces in the cross-dispersion direction the same way that the pipelines locate the stellar spectrum in the dispersion direction! If our matched filter works, we believe that we will be able to use the trace positions as an ersatz simultaneous reference for wavelength-calibration tracking.
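A minimal sketch of a matched filter for trace finding in a single detector column, assuming a Gaussian cross-dispersion profile of known width and a locally flat background (real EXPRES trace profiles are surely more complicated). The zero-mean template and the three-point parabolic refinement are generic choices, not necessarily what we will use:

```python
import numpy as np

def locate_traces(column, sigma=2.0, threshold=5.0):
    """Locate trace centers along one detector column (cross-dispersion
    direction) with a Gaussian matched filter, refined to sub-pixel precision
    by a parabola through the three filter values around each peak."""
    half = max(1, int(4 * sigma))
    offsets = np.arange(-half, half + 1)
    template = np.exp(-0.5 * (offsets / sigma) ** 2)
    template -= template.mean()  # zero-mean, so a flat background gives no response
    response = np.correlate(column, template, mode="same")
    # Robust noise scale of the filter output (MAD, scaled for Gaussian noise).
    noise = 1.4826 * np.median(np.abs(response - np.median(response)))
    centers = []
    # Skip the edges, where the zero-padded filter sees only a partial window.
    for i in range(half, len(response) - half):
        a, b, c = response[i - 1], response[i], response[i + 1]
        if b > a and b >= c and b > threshold * noise:
            centers.append(i + 0.5 * (a - c) / (a - 2.0 * b + c))
    return np.array(centers)

# Synthetic column: three traces on a flat background with Gaussian noise.
rng = np.random.default_rng(7)
pix = np.arange(200)
true_centers = [40.3, 95.7, 150.0]
column = 10.0 + rng.normal(0.0, 1.0, pix.size)
for c0 in true_centers:
    column += 100.0 * np.exp(-0.5 * ((pix - c0) / 2.0) ** 2)
found = locate_traces(column)
```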
As my loyal reader knows, I love putting machine learning inside a physical model. That is, not just using machine learning, but re-purposing machine learning to play a role in modeling a nuisance we don't care about inside our physical model. It's similar to how the best methods for causal inference use machine learning to capture the possibly complex and adversarial effects of confounders. Today I had the pleasure of reading closely a new manuscript by Francois Lanusse (Paris) that describes a use of machine learning to model galaxy images, but putting that model inside a causal structure (think: directed acyclic graph) that includes telescope optics and photon noise. The method seems to work really well.
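The causal structure described here can be sketched as a chain of functions. In this minimal sketch the "learned" galaxy model is a hypothetical stand-in (a parametric elliptical Gaussian), not a neural network and not the model in the manuscript; the point is only where such a model sits: latent variables, then galaxy model, then PSF convolution (optics), then Poisson photon noise, with the likelihood evaluated at the end.

```python
import numpy as np

def toy_generator(z, size):
    """Hypothetical stand-in for a learned galaxy model: maps a latent vector
    z = (log_flux, log_width, ellipticity) to a noiseless size x size image."""
    log_flux, log_width, e = z
    y, x = np.mgrid[:size, :size] - (size - 1) / 2.0
    r2 = (x / (1.0 + e)) ** 2 + (y / (1.0 - e)) ** 2
    img = np.exp(-0.5 * r2 / np.exp(log_width) ** 2)
    return np.exp(log_flux) * img / img.sum()

def psf_convolve(img, psf):
    # Circular FFT convolution; psf is image-sized and centered at (size//2, size//2).
    return np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(np.fft.ifftshift(psf))))

def expected_counts(z, psf, sky):
    # Latent variables -> galaxy model -> telescope optics (PSF) -> plus sky.
    return psf_convolve(toy_generator(z, psf.shape[0]), psf) + sky

def poisson_log_likelihood(counts, expected):
    # Photon-noise likelihood, dropping the z-independent log(counts!) term.
    return np.sum(counts * np.log(expected) - expected)

size = 32
py, px = np.mgrid[:size, :size] - size // 2
psf = np.exp(-0.5 * (px ** 2 + py ** 2) / 1.5 ** 2)
psf /= psf.sum()

rng = np.random.default_rng(0)
z_true = (np.log(5000.0), np.log(3.0), 0.2)
counts = rng.poisson(expected_counts(z_true, psf, sky=10.0))
```

Swapping `toy_generator` for a trained generative network would leave the optics and noise stages, and the likelihood, unchanged.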
In a great conversation with Soledad Villar (NYU) today, we realized that we have (more than) 10 methods for linear regression! Hahaha. Meaning: more than 10 differently conceived methods for finding a linear relationship between features X and labels Y using some kind of optimization or inference. Some involve regularization, some involve dimensionality reduction, some involve latent variables, and so on. None of them align with my usual practice, because we have put ourselves in a sandbox where we don't know the data-generating process: That is, we only have the rectangular X array and the Y array; we don't have any useful meta-data.
Things we are investigating are: Performance on predicting held-out data, and susceptibility to the double descent phenomenon. It turns out that both of these are amazingly strongly dependent on how we (truly) generate the data. In our sandbox, we are gods.
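A minimal sketch of such a sandbox with three of the many possible methods (min-norm least squares, ridge, and regression on the top principal directions), scored on held-out data. The data-generating process, noise level, and hyperparameters below are illustrative assumptions, and which method wins depends on them:

```python
import numpy as np

def ols_min_norm(X, Y):
    # Pseudoinverse: OLS when n > p, the min-norm interpolator when n < p.
    return np.linalg.pinv(X) @ Y

def ridge(X, Y, lam=1.0):
    # L2-regularized least squares: solve (X^T X + lam I) W = X^T Y.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

def pca_regression(X, Y, k=5):
    # Dimensionality reduction: regress Y on the top-k right singular
    # vectors of X (PCA without centering).
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    Vk = Vt[:k].T
    return Vk @ (np.linalg.pinv(X @ Vk) @ Y)

def heldout_mse(fit, X_train, Y_train, X_test, Y_test, **kwargs):
    W = fit(X_train, Y_train, **kwargs)
    return np.mean((X_test @ W - Y_test) ** 2)

# Hypothetical data-generating process: in the sandbox, this is what we control.
rng = np.random.default_rng(0)
n, p = 50, 30
beta = rng.normal(size=(p, 1))
X = rng.normal(size=(2 * n, p))
Y = X @ beta + 0.5 * rng.normal(size=(2 * n, 1))
X_train, X_test, Y_train, Y_test = X[:n], X[n:], Y[:n], Y[n:]

results = {
    "ols/min-norm": heldout_mse(ols_min_norm, X_train, Y_train, X_test, Y_test),
    "ridge": heldout_mse(ridge, X_train, Y_train, X_test, Y_test, lam=1.0),
    "pca (k=5)": heldout_mse(pca_regression, X_train, Y_train, X_test, Y_test, k=5),
}
```

Changing how X and Y are generated (for example, making Y depend only on a few principal directions of X) can reverse the ranking, which is the sense in which the results depend on the true data-generating process.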
Ana Bonaca (Harvard) and I discussed two important successes today (both of them due to Bonaca): We found the issue in our code that caused there to be frequency structure on frequency scales smaller than what you might call the “uncertainty-principle” limit of the inverse of the duration of the survey. It was literally a bug, not a think-o. Which is reassuring. And we do have likelihoods that seem to peak (at least locally) at sensible values of delta-nu, the large frequency difference, in the parlance of asteroseismology. So now there are questions of how to improve performance, and how Bayesian to be, as it were. Frequentism is easier in some respects, harder in others. Lots of engineering to think about!
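For intuition about where a delta-nu signal lives in the data, here is a minimal sketch of the classic autocorrelation estimate of the large frequency separation: regularly spaced modes make the power spectrum correlate with itself at a lag of delta-nu. This is a simpler, swapped-in technique, not our likelihood; it assumes a uniform frequency grid and a search range that excludes zero lag and multiples of delta-nu, and the synthetic numbers are hypothetical.

```python
import numpy as np

def delta_nu_from_acf(freq, power, search_range):
    """Large-separation estimate: the lag within search_range at which the
    mean-subtracted power spectrum best correlates with itself.
    Assumes a uniform frequency grid."""
    df = freq[1] - freq[0]
    p = power - power.mean()
    acf = np.correlate(p, p, mode="full")[len(p) - 1:]  # lags 0, df, 2 df, ...
    lags = np.arange(len(acf)) * df
    mask = (lags >= search_range[0]) & (lags <= search_range[1])
    return lags[mask][np.argmax(acf[mask])]

# Synthetic spectrum: evenly spaced Lorentzian modes under a Gaussian
# envelope, on a flat background with additive noise.
rng = np.random.default_rng(0)
freq = np.arange(0.0, 600.0, 0.1)
dnu_true = 13.7
power = 1.0 + rng.normal(0.0, 1.0, freq.size)
for k in range(20):
    nu_k = 200.0 + k * dnu_true
    height = 50.0 * np.exp(-0.5 * ((nu_k - 330.0) / 60.0) ** 2)
    power += height / (1.0 + ((freq - nu_k) / 0.15) ** 2)

estimate = delta_nu_from_acf(freq, power, (5.0, 25.0))
```

Note that the frequency grid can be made arbitrarily fine, but a spectrum from a survey of duration T carries no independent information on frequency scales below roughly 1/T, so apparent structure finer than that is a warning sign.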
I did not get much done today. I had useful but frustrating conversations with Teresa Huang (NYU) and Soledad Villar (NYU) about the APOGEE data: We are using the synthetic APOGEE spectra (which are part of the data model; released with the observed spectra) to infer (using local linear regression) the temperature derivative of the theoretical spectral expectation. It isn't working! That is, the derivative we get depends very strongly on the number of neighbors we use and how we infer it. Which makes no sense to me. But anyway, there is something I don't understand. Unfortunately, this makes me the blocker on Huang's paper! Argh. I need to compose complicated technical questions for the collaboration.
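For concreteness, here is a minimal sketch of one way to do local linear regression for a temperature derivative: take the k nearest spectra in standardized label space and fit flux linearly in the label offsets. The function name, the choice of labels, and the standardization are assumptions for illustration, not our actual setup, and the synthetic spectra are exactly linear so the answer is the same for every k.

```python
import numpy as np

def local_temperature_derivative(teff, logg, feh, flux, target, k=50):
    """Estimate d(flux)/d(Teff) at a target (Teff, logg, [Fe/H]) by a linear
    least-squares fit to the k nearest spectra in standardized label space.
    flux has shape (n_spectra, n_pixels); returns n_pixels slopes."""
    labels = np.column_stack([teff, logg, feh])
    offsets = labels - np.asarray(target, dtype=float)
    dist = np.sqrt(((offsets / labels.std(axis=0)) ** 2).sum(axis=1))
    idx = np.argsort(dist)[:k]
    # Design matrix: intercept plus label offsets from the target point.
    A = np.column_stack([np.ones(len(idx)), offsets[idx]])
    coef, *_ = np.linalg.lstsq(A, flux[idx], rcond=None)
    return coef[1]  # row 1 holds the Teff slopes

# Hypothetical synthetic "spectra" (3 pixels) that are exactly linear in labels.
rng = np.random.default_rng(5)
n = 500
teff = rng.uniform(4000.0, 5000.0, n)
logg = rng.uniform(1.0, 3.0, n)
feh = rng.uniform(-1.0, 0.3, n)
slopes = np.array([1e-4, -2e-4, 5e-5])
flux = 1.0 + np.outer(teff - 4500.0, slopes) + 0.05 * np.outer(logg, np.ones(3))

for k in (10, 50, 200):
    print(k, local_temperature_derivative(teff, logg, feh, flux, (4500, 2.0, -0.3), k=k))
```

Adding curvature in Teff (or strong label correlations in the grid) to this toy makes the estimate drift with k, which is one generic way such k-dependence can arise; whether that explains what we see in the APOGEE synthetic spectra is not established by this sketch.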