In group meeting, and in other conversations today, I asked about how to optimize a very large parameter vector when my problem is convex but has an L1 term in the objective. Both Gaby Contardo (Flatiron) and Soledad Villar (JHU) said: Use the standard Lasso optimizer. At first I thought “but my problem doesn't have exactly the Lasso form!”. But then I realized that it is possible to manipulate the operators I have so that it has exactly the Lasso form, and then I can just use a standard Lasso optimizer! So I'm good and I can proceed.
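The post doesn't say which operators are involved, so here is only a minimal sketch of one common manipulation, under a loud assumption: the L1 penalty acts on D x for a square, invertible operator D. Substituting z = D x turns the problem into a standard Lasso with design matrix A D^{-1}. The names `A`, `D`, `lasso_ista`, and `l1_operator_problem` are hypothetical, and the plain ISTA loop stands in for whatever standard Lasso optimizer one would really use.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise shrinkage toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(B, y, lam, n_iter=2000):
    """Minimize 0.5 * ||y - B z||^2 + lam * ||z||_1 by proximal gradient (ISTA)."""
    step = 1.0 / np.linalg.norm(B, 2) ** 2  # 1 / Lipschitz constant of the gradient
    z = np.zeros(B.shape[1])
    for _ in range(n_iter):
        z = soft_threshold(z - step * (B.T @ (B @ z - y)), step * lam)
    return z

def l1_operator_problem(A, D, y, lam):
    """Minimize 0.5 * ||y - A x||^2 + lam * ||D x||_1, ASSUMING D is square and invertible."""
    B = np.linalg.solve(D.T, A.T).T  # B = A @ inv(D), without forming inv(D)
    z = lasso_ista(B, y, lam)        # standard Lasso in the variable z = D x
    return np.linalg.solve(D, z)     # map back: x = inv(D) @ z
```

If D is not invertible (a finite-difference operator, say), this substitution does not apply directly and a different reformulation would be needed.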
I talked to Kate Storey-Fisher (NYU) about a beautiful rainbow quartz rock that she has: It is filled with colors, in a beautiful geological palette. Could we turn this into a colormap for making plots, or a set of colormaps? We discussed considerations.
After having many conversations with Gaby Contardo (Flatiron) and Christina Hedges (Ames) about finding events of various kinds in stellar light curves (from NASA Kepler and TESS), I was reminded of dictionary methods, or sparse-coding methods. So I spent some time writing down a possible sparse-coding approach for Kepler light curves, and even a bit of time writing some code. But I think we probably want something more general than the kind of bilinear problem I find it easy to write down: I am imagining a set of words, and a set of occurrences (and amplitudes) of those words in the time domain. But real events will have other parameters (shape and duration parameters), which suggests using more nonlinear methods.
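The bilinear picture above (words, plus occurrences and amplitudes of those words in time) can be sketched as a forward model. This is an illustration only, not the code mentioned in the post; the array layout and the name `synthesize` are assumptions. The model is linear in the activations for fixed words and linear in the words for fixed activations, which is what makes it bilinear.

```python
import numpy as np

def synthesize(words, activations):
    """Forward model for a convolutional dictionary (hypothetical layout).

    words:       (K, L) array, one short template ("word") per row
    activations: (K, T) array, mostly zeros; a nonzero entry at (k, t)
                 places word k starting at time t with that amplitude
    Returns the length-T model light curve.
    """
    n_times = activations.shape[1]
    model = np.zeros(n_times)
    for word, act in zip(words, activations):
        # full convolution has length T + L - 1; keep the first T samples
        model += np.convolve(act, word)[:n_times]
    return model
```

A sparse-coding fit would alternate between solving for sparse activations at fixed words and updating the words at fixed activations. Events with their own shape or duration parameters don't fit in this form, which is the limitation noted above.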
Kate Storey-Fisher (NYU) and I are advising an undergraduate research project for Abby Williams (NYU) in cosmology. Williams is looking at the question: How precisely can we say that the large-scale structure in the Universe is homogeneous? Are there gradients in the amplitude of galaxy clustering (or other measures)? Her plan is to use Storey-Fisher's new clustering tools, which can look at variations in clustering without binning or patchifying the space. In the short term, however, we are starting in patches, just to establish a baseline. Today things came together and Williams can show that if we simulate a toy universe with a clustering gradient, she can discover and accurately measure that gradient, using analyses in patches. The first stage of this is to do some forecasts or information theory.
Today Soledad Villar (JHU) and I discussed different ways to structure a machine-learning method for a cosmological problem: The idea is to use the machine-learning method to replace or emulate a cosmological simulation. This is just a toy problem; of course I'm interested in data analysis, not theory, in the long run. But we realized today that we have a huge number of choices about how to structure this. Since the underlying data come from an ordinary differential equation, we can structure our ML method like an ordinary differential equation, and see what it finds! Or we can give it less structure (and more freedom) and see if it does better or worse. That is, you can build a neural network that is, on the inside, a differential equation. That's crazy. Obvious in retrospect but I've never thought this way before.
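To make the two structural choices concrete, here is a minimal untrained sketch (not the model discussed above). It assumes a tiny tanh network and a fixed-step RK4 integrator, and all names are hypothetical. In the ODE-structured version the network outputs a time derivative and the forward pass is an integration; in the unstructured version the same network maps the initial state straight to the final state.

```python
import numpy as np

def init_params(rng, dim, hidden):
    """Small random weights for a one-hidden-layer network (hypothetical sizes)."""
    return {
        "W1": 0.1 * rng.standard_normal((hidden, dim)),
        "b1": np.zeros(hidden),
        "W2": 0.1 * rng.standard_normal((dim, hidden)),
        "b2": np.zeros(dim),
    }

def mlp(params, x):
    """One-hidden-layer tanh network mapping a state vector to a vector of the same size."""
    h = np.tanh(params["W1"] @ x + params["b1"])
    return params["W2"] @ h + params["b2"]

def ode_structured_model(params, x0, t_final, n_steps=100):
    """Interpret the network as dx/dt and integrate it with fixed-step RK4."""
    x = np.asarray(x0, dtype=float)
    dt = t_final / n_steps
    for _ in range(n_steps):
        k1 = mlp(params, x)
        k2 = mlp(params, x + 0.5 * dt * k1)
        k3 = mlp(params, x + 0.5 * dt * k2)
        k4 = mlp(params, x + dt * k3)
        x = x + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return x

def unstructured_model(params, x0):
    """Less structure: the same network maps the initial state directly to the final state."""
    return mlp(params, np.asarray(x0, dtype=float))
```

Training either version means adjusting `params` so the output matches the simulation output. For the structured version the gradient has to flow through the integrator, which is easiest with an automatic-differentiation library.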
Lily Zhao (Yale) and Megan Bedell (Flatiron) and I are working on measuring very precise radial velocities for very small data sets, where (although there are hundreds of thousands of pixels per spectrum) there are only a few epochs of observations. In these cases, it is hard for our data-driven method to separate the stellar spectrum model from the telluric spectrum model: Our wobble method makes use of the independent covariances of stellar and telluric features to separate the star from the sky. So we discussed the point that really we should use all stars to learn the (maybe flexible) telluric model. That's been a dream since the beginning (it is even mentioned in the original wobble paper), but execution requires some design thinking: We want the optimizations to be tractable, and we want the interface to be sensible. Time to go to the whiteboard. Oh wait, it's a pandemic.
In my weekly meeting with Teresa Huang (JHU) and Soledad Villar (JHU), we went through our methods for putting labels on stellar spectra (labels like effective temperature, surface gravity, and metallicity). We have all the machinery together now to do this with physical models, with The Cannon (a data-driven generative model), and with neural networks (deep learning, or other data-driven discriminative models). The idea is to see how well these different kinds of models respect our beliefs about stars and spectroscopic observations, and how they fit or over-fit, as a function of training and model choices. We are using the concept of adversarial attacks to guide us. All our pieces are in place now to do this full set of comparisons.
Gaby Contardo (Flatiron) and I have been working on time asymmetry in NASA Kepler light curves. Our first attempts on this have been about prediction: Is it easier to predict a point in a light curve using its past or its future? It turns out that, for very deep mathematical reasons, there is a lot of symmetry here, even when the light curve is obviously time asymmetric in seemingly relevant ways. So deep, I think we might have some kind of definition of “stationary”. So we are re-tooling around just observable asymmetries. We discussed many things, including dictionary methods. It also occurred to us that in addition to time-reversal questions, there are also flux-reversal questions (like if you flip a light-curve event upside down).
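One piece of this symmetry is easy to check numerically (it may or may not be the deep reason meant above). For any second-order-stationary series, the autocovariance depends only on the absolute lag, so the best linear predictor from the p preceding points has the same mean-squared error as the best linear predictor from the p following points. The sketch below uses a made-up flare-like series (sharp rises, slow decays) and ordinary least squares; the post doesn't say which predictors were actually used, and every name here is hypothetical.

```python
import numpy as np

def window_matrix(x, p):
    """All length-(p+1) windows of x, one window per row."""
    n = len(x) - p
    return np.stack([x[i:i + n] for i in range(p + 1)], axis=1)

def linear_prediction_mse(features, target):
    """Mean-squared residual of a least-squares linear fit with an intercept."""
    X = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return np.mean((target - X @ coef) ** 2)

def past_future_mse(x, p=5):
    """MSE when predicting a point from its p past points, and from its p future points."""
    W = window_matrix(np.asarray(x, dtype=float), p)
    mse_past = linear_prediction_mse(W[:, :p], W[:, p])    # last sample from the p before it
    mse_future = linear_prediction_mse(W[:, 1:], W[:, 0])  # first sample from the p after it
    return mse_past, mse_future

# Toy, visibly time-asymmetric series: random impulses with exponential decay.
rng = np.random.default_rng(17)
n = 20000
impulses = rng.poisson(0.02, size=n) * rng.exponential(1.0, size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.9 * x[t - 1] + impulses[t]

mse_past, mse_future = past_future_mse(x)
print(mse_past, mse_future)  # nearly equal, despite the obvious asymmetry of x
```

Nonlinear predictors are not bound by this argument, since they can use more than second-order statistics.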
One research highlight today was working on the writing and organization of a paper on the definition and use of the selection function in population studies (with, say, a catalog of observed sources). The paper is led by Hans-Walter Rix (MPIA), is aimed at the ESA Gaia community, and uses the white-dwarf luminosity and temperature distribution as its test case.
My loyal reader knows that Hans-Walter Rix (MPIA) and I have been looking at the population of white dwarfs as observed by ESA Gaia. This is a demonstration project; it is somewhat adjacent to our usual science. However, today he ran our white-dwarf code for the much redder stars at the bottom of the main sequence (late M and brown dwarfs) and what did he find? It looks like the main sequence splits into two branches at the lowest-mass (coldest) end. Is that a discovery or known? And who could tell us?
I had an interesting conversation with Soledad Villar (JHU) about the difference between frequentist and Bayesian descriptions or analysis of the expected wrongness (out-of-sample prediction error) for a regression or interpolation. The different statistical philosophies lead to different kinds of operations you naturally do (frequentists naturally integrate over all possible data sets; Bayesians naturally also integrate over all possible (latent) parameter values consistent with the data). These differences in turn lead to different meanings for the eventual estimates of prediction error. I'm not sure I have it all right yet, but I'd like to figure it out and write something about all this. I'm generally a pragmatist, but statistical philosophy matters sometimes!
I had a great call today with So Hattori (NYUAD) and Dan Foreman-Mackey (Flatiron), about Hattori's reboot of the causal pixel model by Dun Wang (that we used in NASA Kepler data) for new use on NASA TESS data. Importantly, Hattori has generalized the model so it can be used in way more science cases than we have looked at previously, including supernovae and tidal disruption events. And his paper is super-pedagogical, so it will invite and support (we hope) new users. Very excited to help finish this up!
I worked with Adrian Price-Whelan (Flatiron) this morning on an empirical noise model for SDSS-IV APOGEE radial-velocity data. We fit a mixture of quiet and noisy stars plus additive Gaussian noise to empirical radial-velocity data, and started to figure out how the noise must depend on temperature, metallicity, and signal-to-noise. It looks like we can learn the noise model! And thus be less dependent on the assumptions in the pipelines.
I brought up the following issue at group meeting: When Lily Zhao (Yale) looks at how well spectral shape changes predict radial-velocity offsets (in simulated spectroscopic data from a rotating star with time-dependent star spots), she finds that there are small segments of data that predict the radial velocity offsets better than the whole data set does. That is, if you start with a good, small segment, and add data, your predictions get worse. Add data, do worse! This shouldn't be.
Of course whenever this happens it means there is something wrong with the model. But what to do to diagnose this and fix it? Most of the crowd was in support of what I might call “feature engineering”, in which we identify the best spectral regions and just use those. I don't like that solution, but it's easier to implement than a full shake-down of the model assumptions.
Gaby Contardo (Flatiron) and I have been working on predicting light-curve data points from their pasts and their futures, to see if there is a time asymmetry. And we have been finding one! But today we discussed results in which Contardo was much more aggressive in removing data at or near spacecraft issues (this is NASA Kepler data). And most of our results go away! So we have to decide where we go from here. Obviously we should publish our results even if they are negative! But how to spin it all...?
One of the things I say over and over in my group is: We build software, but every piece of software itself is not that valuable: Our software is valuable because it encodes good ideas and good practices for data analysis. In that spirit, I re-wrote The Cannon (Ness et al 2015) in an hour today in a Google (tm) Colab notebook. It's only ten-ish lines of active code! And ten more of comments. The Cannon is not a software package; it is a set of ideas. And my reimplementation has way more stable linear algebra than any previous version I've seen (because I've learned so much about this in the last few years, with help from Soledad Villar). I did the Cannon reimplementation for Teresa Huang (JHU), who is finding adversarial attacks against it.
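To make "a set of ideas" concrete, here is a sketch of the core ideas as I understand them from Ness et al 2015. It is not the notebook described above. At training time, the flux in every pixel is modeled as a quadratic polynomial in the labels and fit by linear least squares. At test time, the labels of a new star are found by nonlinear least squares against the trained model. The sketch makes loud simplifying assumptions: it omits the per-pixel inverse-variance weights and scatter terms, it assumes the labels have been shifted and scaled to be of order unity, and the function names are hypothetical.

```python
import numpy as np

def design_matrix(labels):
    """Features [1, labels, all pairwise products] for each star; labels is (n_stars, k) or (k,)."""
    labels = np.atleast_2d(labels)
    n, k = labels.shape
    i, j = np.triu_indices(k)
    quad = (labels[:, :, None] * labels[:, None, :])[:, i, j]
    return np.hstack([np.ones((n, 1)), labels, quad])

def train(labels, fluxes):
    """Training step: fit all pixels at once; fluxes is (n_stars, n_pixels)."""
    X = design_matrix(labels)
    theta, *_ = np.linalg.lstsq(X, fluxes, rcond=None)  # avoids forming X.T @ X
    return theta  # shape (n_features, n_pixels)

def infer_labels(theta, flux, start, n_iter=20, eps=1e-6):
    """Test step: Gauss-Newton on the labels of one star, with a finite-difference Jacobian."""
    labels = np.array(start, dtype=float)
    for _ in range(n_iter):
        model = design_matrix(labels)[0] @ theta
        # d(model spectrum) / d(label), one column per label: shape (n_pixels, k)
        J = np.stack(
            [(design_matrix(labels + eps * e)[0] @ theta - model) / eps
             for e in np.eye(len(labels))],
            axis=1,
        )
        step, *_ = np.linalg.lstsq(J, flux - model, rcond=None)
        labels = labels + step
    return labels
```

Using `np.linalg.lstsq` rather than solving the normal equations is one simple way to keep the linear algebra better conditioned; whether that matches the stability improvements mentioned above is an assumption.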
I had a nice conversation today with Renbin Yan (Kentucky) and Xihan Ji (Kentucky) about work they have been doing with emission-line ratios. Typically these ratios are plotted on a “BPT” diagram (yes, named after people, unfortunately). Ji has been looking for more informative two-dimensional diagrams, by considering linear combinations of a larger set of ratios. He has beautiful visualizations! And he can also clearly show how the models of the line ratios depend on assumptions and parameters, which helps develop intuition about what the ratios tell us, physically. We briefly discussed the possibility that we might actually be able to constrain nucleosynthesis parameters using emission-line spectra of nebulae!
Today was a low-ish research day. In my research time, I discussed improvements to radial-velocity measurements with Adrian Price-Whelan (Flatiron) and gauge-invariant machine learning with Soledad Villar.