Today we delved into even more detail about how the HARPS3 instrument works, looking at engineering drawings and discussing how charge-coupled devices (CCDs) read out. We discussed the time stability of various parts of the instrument and electronics. We are all very excited about assembly, verification, and testing in Cambridge this summer.
Today was a delight! In a working session, Clark Baker (Cambridge) gave a beautiful, conceptual-yet-concrete description of how an echelle spectrograph works, covering the blaze function, the resolution, and more. My favorite moment was the aha! moment I had when he described the Littrow condition. This was followed by Alicia Anderson (Cambridge) explaining how the data reduction proceeds. Then she and Federica Rescigno (Exeter) helped us install the data-reduction software for the ESO instruments (ESPRESSO, HARPS-N, etc) and we started reducing raw echelle data.
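For my own memory, the Littrow condition is the configuration in which the diffracted ray retraces the incident ray, so the grating equation reduces to m lambda = 2 d sin(theta_B) at blaze. Here is a tiny numerical sketch; the groove density and blaze angle are illustrative echelle-like values I made up, not the HARPS3 design numbers:

```python
import numpy as np

# Littrow condition: incidence angle = diffraction angle = blaze angle,
# so the grating equation becomes m * lam = 2 * d * sin(theta_B).
# Illustrative numbers (NOT the HARPS3 design values).
groove_density = 31.6e3          # grooves per meter (31.6 gr/mm)
d = 1.0 / groove_density         # groove spacing (m)
theta_blaze = np.deg2rad(76.0)   # blaze angle for an R4-like echelle

for m in range(90, 93):  # a few high echelle orders
    lam_blaze = 2.0 * d * np.sin(theta_blaze) / m  # central (blaze) wavelength
    print(f"order {m}: blaze wavelength = {lam_blaze * 1e9:.1f} nm")
```

Note how, at these high orders, adjacent orders are separated by only a few nm at blaze, which is why the echellogram needs cross-dispersion.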
Before all this there was a wide-ranging discussion of measuring 3-point functions of radial-velocity time series data. This was inspired by the question: Is a Gaussian process a good model for these data? I hope this turns into a project or set of projects.
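The relevant fact here is that a zero-mean Gaussian process has vanishing 3-point functions, so a significantly nonzero estimate is evidence against the GP model. A minimal sketch on synthetic (regularly sampled) data, with an exponential draw standing in for a skewed, non-Gaussian signal:

```python
import numpy as np

rng = np.random.default_rng(17)

def three_point(x, lag1, lag2):
    """Estimate the 3-point correlation <x(t) x(t+lag1) x(t+lag2)>
    for a regularly sampled, zero-mean series x (lags in samples)."""
    n = len(x) - max(lag1, lag2)
    return np.mean(x[:n] * x[lag1:lag1 + n] * x[lag2:lag2 + n])

# A zero-mean Gaussian series has vanishing 3-point functions...
gauss = rng.normal(size=200_000)
# ...while a skewed (non-Gaussian) series does not.
skewed = rng.exponential(size=200_000) - 1.0

print(three_point(gauss, 0, 0))   # consistent with zero
print(three_point(skewed, 0, 0))  # significantly nonzero
```

Real RV data are irregularly sampled, of course, so the lags would have to become time bins rather than sample offsets, but the idea is the same.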
So many good things happened in the meeting today! Highlights were presentations by Niamh O'Sullivan (Oxford) and Ben Lakeland (Exeter), who showed amazing results running models of stellar variability on data from the Sun. O'Sullivan can see that the Sun goes through many different phases of spots, granulation, and super-granulation. She finds these by fitting Gaussian processes of certain forms. Related: Suzanne Aigrain (Oxford) showed that even in very gappy data, the GP fits are unbiased, whereas naive use of periodograms is biased!
Lakeland showed that super-granulation can in principle be modeled in the Solar time series, and there is maybe the tiniest hint that, when he corrects for super-granulation well, the residual RV variability is even lower than it is at times when super-granulation is not in play at all. Does super-granulation suppress other kinds of variability?
I'm very optimistic—between Liang yesterday, Zhou's work at Flatiron, and these presentations—that we will be able to mitigate many difficult sources of stellar variability. I was inspired to outline a conceptual paper on why or how this is all going to work.
Today was the first day of the Terra Hunting annual science meeting. One highlight of the day was a presentation by Yan Liang (Princeton), who is modeling stellar spectral variability (the tiny variability) that affects extremely precise radial-velocity measurements. Her method involves a neural network, which is trained to distinguish RV variations and spectral shape variations through a self-supervised approach (with a data augmentation). Then it separates true stellar RV variations from spectral-variability-induced wrong RV variations by requiring (essentially) that the RV variations be uncorrelated with the (latent) description of the stellar spectral shape. This connects to various themes I am interested in, including wobble by Bedell, a spectral variability project by Zhao, and causal structure in machine learning.
Cole Johnston (Leuven) is in New York this week. We discussed the problem of finding oscillation modes in the photometry of stars in the presence of a large, binary-induced periodicity. What he kind-of wants is a simultaneous fit of a flexible periodic function plus a periodogram. We did some experiments (very promising!) and discussed the elements that will come together to make this all happen. The final method will look like a double Fourier transform, in which one frequency grid gets the periodic part, and the other grid gets the rest of the modes and noise.
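The core move can be sketched with ordinary linear least squares: put harmonics of the known binary frequency in the design matrix alongside sine and cosine terms at each candidate mode frequency, and fit everything simultaneously so the binary signal can't leak into the mode search. All numbers below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

# Fake photometry: a strong binary signal (freq f_bin, with a harmonic)
# plus a weak oscillation mode at f_mode, plus noise. Illustrative values.
t = np.sort(rng.uniform(0.0, 30.0, size=2000))  # days
f_bin, f_mode = 0.31, 3.7                       # cycles per day
y = (0.8 * np.sin(2 * np.pi * f_bin * t)
     + 0.2 * np.sin(4 * np.pi * f_bin * t)      # first harmonic
     + 0.02 * np.sin(2 * np.pi * f_mode * t)    # the weak mode
     + 0.05 * rng.normal(size=t.size))

def design_matrix(t, freqs):
    """A constant, plus sine and cosine columns for each frequency."""
    cols = [np.ones_like(t)]
    for f in freqs:
        cols += [np.sin(2 * np.pi * f * t), np.cos(2 * np.pi * f * t)]
    return np.stack(cols, axis=-1)

# Simultaneous fit: harmonics of the binary frequency on one "grid",
# a scan over candidate mode frequencies on the other.
harmonics = [k * f_bin for k in (1, 2, 3)]
grid = np.linspace(3.0, 4.5, 301)
power = []
for f in grid:
    A = design_matrix(t, harmonics + [f])
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    power.append(np.hypot(coeffs[-2], coeffs[-1]))  # amplitude at candidate f

print(f"best mode frequency: {grid[np.argmax(power)]:.3f} per day")
```

The real method would presumably use a more flexible periodic component than a few harmonics, but the structure (one jointly fit periodic block, one scanned frequency) is the same.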
There is a non-wrong view of academic science that it is all about applying for funding, and evaluating the proposals of others for funding. That's all I did today (evaluated proposals for a foreign funding program; I submitted my own proposal to the NSF yesterday).
On Monday of this week, Shirley Ho (Flatiron) gave a talk at NYU in which she mentioned the unreasonable effectiveness of pre-training a neural network: If, before you train your network on your real (expensive, small) training data, you train it on a lot of (cheap, approximate) pre-training data, you get better overall performance. Why? Ho discussed this in the context of PDE emulation: She pre-trains with cheap PDEs and then trains on expensive PDEs, and she gets way better performance than she does if she just trains on the expensive stuff.
Why does this work? One interesting observation is that even pre-training on cat videos helps with the final training! Ho's belief is that the pre-training gets the network understanding time continuity and other kinds of smoothness. My conjecture is that the pre-training teaches the network about (approximate) diffeomorphism invariance (coordinate freedom). The cool thing is that these conjectures could be tested with interventions!
I have to finish my NSF proposal with Mike Blanton (NYU), so naturally I am in procrastination mode. Here are three papers I wish I would write. Maybe I should post them on my ideas blog:
Occam's Razor is wrong: This paper, co-authored with Jennifer Hill (NYU), would be about the fact that, in the real, observed world, the simplest explanation is always wrong or at least incomplete.
Causation is just causality: This paper, maybe co-authored with David Blei (Columbia) or Bernhard Schölkopf (MPI-IS) or Hill, shows that you don't need to have free will in order to have cogent causal explanations of data. That is, you don't need to phrase causality in terms of predictions for counter-factual experiments that you might have chosen to do.
You don't ever want evidence: This paper shows that any time you are computing the Bayesian evidence—what I call the fully marginalized likelihood (fml)—you are doing the wrong integral and solving the wrong problem, for both practical and theoretical (principled) reasons.
A highlight of my day was a colloquium by Renée Hložek (Toronto) about cosmology and event detection with the LSST/Rubin. Importantly (from my perspective), she has run a set of challenges for classifying transients, based on simulations of the output of the very very loud LSST event-detection systems. The results are a bit depressing, I think (sorry Renée!), because (as she emphasized), all the successful methods (and none were exceedingly successful) made heavy use of data augmentation: They noisified things, artificially redshifted things, dropped data points from things, and so on. That's a good idea, but it shows that machine-learning methods at the present day can't easily (or ever?) be told what to expect as an event redshifts or gets fainter or happens on a different night. I'd love to fix those problems. You can almost think of all of these things as group operations. They are groups acting in a latent space though, not in the data space. Hard problems! But worthwhile.
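To make the augmentation point concrete, here is a toy sketch of the kinds of operations the successful methods applied to light curves. The transient and the specific transformations are mine, purely for illustration; note that the "redshift" operation here is just time dilation, which is exactly the kind of thing you'd rather tell the model about than simulate:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical light-curve augmentations of the kind described:
def noisify(flux, sigma):
    """Add Gaussian noise to simulate a fainter/noisier observation."""
    return flux + rng.normal(0.0, sigma, flux.shape)

def drop_points(t, flux, frac):
    """Randomly drop a fraction of epochs to simulate gappy cadence."""
    keep = rng.random(t.shape) > frac
    return t[keep], flux[keep]

def redshift_times(t, z):
    """Cosmological time dilation stretches the observed light curve."""
    return t * (1.0 + z)

t = np.linspace(0.0, 50.0, 200)                  # days
flux = np.exp(-0.5 * ((t - 20.0) / 5.0) ** 2)    # a toy transient

t_aug, f_aug = drop_points(redshift_times(t, 0.3),
                           noisify(flux, 0.05), 0.2)
```

Each of these maps is (approximately) a group action on light curves; the hard part Hložek's challenge exposes is that the interesting ones act simply only in some latent space, not on the raw fluxes.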
Valentina Tardugno (NYU) and I are looking at the NASA TESS housekeeping data: What parts of it are relevant to understanding the light curves? The weird thing is: We are asking this by asking: What housekeeping data can be reliably predicted using the light curves? Why this way? Because the light curves are higher in signal-to-noise (in general) than most channels of the housekeeping data. Today we went through all the relevant linear algebra for big linear models (which is where we are starting, of course!).
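The starting-point linear algebra looks roughly like this (a toy version with synthetic data, not the real TESS pipeline): stack the light curves as columns of a design matrix and solve a ridge-regularized least-squares problem for the weights that predict a housekeeping channel.

```python
import numpy as np

rng = np.random.default_rng(8675309)

# Toy setup: predict one housekeeping channel h (length T) from N light
# curves stacked in a T x N matrix L. All data here are synthetic.
T, N = 5000, 300
L = rng.normal(size=(T, N))
true_w = np.zeros(N)
true_w[:5] = [0.9, -0.4, 0.3, 0.2, -0.1]   # only a few curves matter
h = L @ true_w + 0.1 * rng.normal(size=T)  # channel + measurement noise

# Ridge-regularized least squares via the normal equations:
#   w = (L^T L + lam I)^{-1} L^T h
# For truly big problems you'd switch to an SVD or an iterative solver.
lam = 1.0
w = np.linalg.solve(L.T @ L + lam * np.eye(N), L.T @ h)

resid = h - L @ w
print("fraction of variance explained:", 1.0 - resid.var() / h.var())
```

A channel that is well predicted this way (high variance explained) is, by this criterion, one that carries information shared with the light curves, and therefore worth worrying about.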
It is traditional to plot things like the mean iron abundances of stars (or ratios of magnesium to iron, or other ratios) as a function of position in the Galaxy. However, stars change their positions over time, so the gradients (the features in any abundance–position plots) will be smeared out over cosmic time by their motions.
At the same time, stars have approximately invariant actions or integrals of motion, which don't change (much) as they orbit. These invariants are only approximate, both because the Galaxy isn't exactly integrable, and also because we don't know or measure everything we need to compute them precisely for any observed star.
Putting these two ideas together, the abundance–action features, or really abundance–invariant features, should be much clearer and more informative than the abundance–position features. Awesome, let's go! The only problem is: Selection effects are often simple in position space, but are almost never simple in the dynamical-invariant space. So any such plots are generally harder to interpret.
These are issues that I have discussed over many years with Hans-Walter Rix (MPIA). Today I discussed them with Danny Horta (Flatiron) and Adrian Price-Whelan (Flatiron), in preparation for an exploratory study by Horta.
Saakshi More (NYUAD) came into my office during office hours today to ask about possible data science projects in physics. I pitched to her predicting ESA Gaia RVS spectra from Gaia XP spectra, and vice versa. Has anyone done that? In one direction, you have to predict high-resolution detail from low-resolution input; in the other direction, you have to predict a wide wavelength range from narrow input. It seems like a perfect fit for something like a linear auto-encoder (at least for a small patch of the color–magnitude diagram; non-linear for a large patch). Later in the day I talked to Gaby Contardo and she said: If you want to go simple, how about nearest neighbor? Good idea!
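The nearest-neighbor version is almost trivially short, which is its charm. Here is a sketch on entirely fake spectra (the array sizes and the spectrum model are invented for illustration; nothing here is real Gaia data): to predict the high-resolution spectrum of a new star, find the training star whose low-resolution spectrum is closest and return its high-resolution spectrum.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy cross-prediction in the spirit of Contardo's suggestion.
# Shapes and the spectrum model are illustrative, not Gaia's.
n_train, n_lo, n_hi = 1000, 55, 2400
temps = rng.uniform(4000.0, 7000.0, size=n_train)

def fake_spectra(temps, n_pix):
    """Smooth, temperature-dependent fake spectra (pure illustration)."""
    x = np.linspace(0.0, 1.0, n_pix)
    clean = np.exp(-np.outer(1.0 / temps, 5000.0 * x))
    return clean + 0.001 * rng.normal(size=(len(temps), n_pix))

lo_train = fake_spectra(temps, n_lo)   # "XP-like" low-res training set
hi_train = fake_spectra(temps, n_hi)   # "RVS-like" high-res training set

def predict_hi(lo_new):
    """Return the high-res spectrum of the training star whose
    low-res spectrum is closest (Euclidean) to lo_new."""
    i = np.argmin(np.sum((lo_train - lo_new) ** 2, axis=1))
    return hi_train[i]

# Usage: a new star, never seen in training.
lo_new = fake_spectra(np.array([5500.0]), n_lo)[0]
hi_pred = predict_hi(lo_new)
print(hi_pred.shape)
```

The same function with the roles of `lo_train` and `hi_train` swapped does the wide-from-narrow direction, and replacing `argmin` with an average over the k nearest neighbors would be the obvious next step.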
I spent the day with Juna Kollmeier (CITA) talking about epistemology, physical cosmology, and project management (especially academic management). I found myself making the following argument (which I have not seen written down anywhere): Imagine that our Universe is Hamiltonian (or Lagrangian; it doesn't matter for these purposes). And imagine that our Universe is a simulation being run inside some bigger universe, which is also Hamiltonian.
If our Universe is being observed in any sense by any system in that bigger universe, then there ought to be a loss of unitarity in our Universe. That is, there should be a violation of Liouville's theorem, or a violation of key conservation laws, or an information sink. And there is! At black hole horizons, there is an information paradox: Information that goes in never comes back (an evaporating black hole evaporates thermally, or so we think). Thoughts?