Late in the day, Tyler Pritchard and I met to discuss a workshop we are putting together to see if there is critical mass in the NYC area to start a coherent, multi-institutional effort on time-domain astrophysics, and LSST in particular. We want the workshop to be brief, but with everyone getting a chance to speak. At the same time, we don't want it to be all talking: We want the participants to do design work and make decisions, jointly.
Today was a very low-research day. But I did get in a good phone call with Lily Zhao (Yale) and Megan Bedell (Flatiron) about what's next, now that Zhao has submitted her paper on Excalibur, which is our hierarchical, non-parametric model for spectrograph calibration. In the submitted version, she finds that running Excalibur improves calibration of the EXPRES spectrograph over the standard pipeline by an amount that corresponds to removing an additive noise source with an rms of 62 cm/s! So we really are improving the output of the hardware with sensible, well thought-out software.
Now that this calibration paper is done, we have decided to swing our focus over onto an activity challenge, in which teams compete to model and remove stellar-activity noise in radial-velocity signals. The challenge will come with raw spectra, so we can work at the pixel level, which, as my loyal reader knows, is where I think all the innovation is possible.
In group meeting today, I had the group comment on my nascent mission statement for the Astronomical Data Group.
Today Soledad Villar (JHU) and I went back to the idea of adversarial attacks against linear regressions. But this time we looked at the training-set side, meaning: Can you attack the method with adversarial training data? One idea: What is the worst single data point you can add to the training set to distort the results? It looks to me like ordinary least squares is incredibly (arbitrarily) sensitive to this kind of attack if the dimensionality of the data (the number of parameters) exceeds the size of the training set. This kind of sensitivity to attack somehow militates against the idea that more heavily parameterized models are better (which is the prevailing thinking in machine learning these days).
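To make that sensitivity concrete, here is a minimal sketch (sizes and the attack label are invented for illustration) of the over-parameterized regime: the minimum-norm least-squares fit still interpolates the original training data, but a single crafted point moves the coefficients by an arbitrary amount.

```python
import numpy as np

rng = np.random.default_rng(17)
n, p = 5, 10  # fewer data points than parameters
X = rng.normal(size=(n, p))
beta_true = rng.normal(size=p)
y = X @ beta_true  # noise-free for clarity

# Minimum-norm least-squares (interpolating) solution
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Craft one adversarial training point: a direction orthogonal
# to the row space of X, with an arbitrarily large label.
Q, _ = np.linalg.qr(X.T, mode="complete")
x_adv = Q[:, n]   # unit vector orthogonal to all existing rows
y_adv = 1e6       # the attacker picks this freely

X2 = np.vstack([X, x_adv])
y2 = np.append(y, y_adv)
beta_adv, *_ = np.linalg.lstsq(X2, y2, rcond=None)

# The fit still interpolates the original data...
print(np.allclose(X @ beta_adv, y))          # True
# ...but the coefficients have moved by ~ y_adv:
print(np.linalg.norm(beta_adv - beta_hat))
```

Because the attack direction is orthogonal to the existing rows, the attacker can move the solution by any amount without changing the fit to the clean data at all.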
Most of my research time today was spent working on student papers. I print them out, mark them up, and then send photographs of the marked-up pages to the first authors. That's ridiculous, but I don't know how else to do it that meets all my requirements (which are legion). In-between pages of papers, I discussed the red-giant branch with Gaby Contardo (Flatiron): We are trying to spectrally separate the upward-going stars from the downward-going stars, above the red clump. We formulated an unsupervised method that involves regression.
My committee on Equity and Inclusion at NYU is in the final throes of getting out a report on our summer of work toward recommendations for the Department. So: nothing that meets The Rules (tm) at the right. That said, what we are doing is definitely and absolutely physics.
Today was the final wrap-up presentations from the new partnership between the Simons Foundation (and especially Simons Observatory) and the National Society of Black Physicists that made the S-NSBP Scholars program. Scholars were paid to spend the summer interning in various physics projects. The wrap-up lightning talks were incredibly broad and impressive. But the organizer of the program, Kasey Wagoner (Princeton), asked the Scholars to say something in their lightning talks about what they learned or what the program meant to them, and some of the responses were pretty memorable. My student was Winston Harris (MTSU), who (my loyal reader knows) did work on making exoplanet detection more efficient. The Astronomical Data Group here at Flatiron hosted four Scholars, and the whole program included dozens.
I got up at 04:55 NYC time so that I could be at MPIA Galaxy Coffee (at 11:00 European time) to talk about the project I have with Adrian Price-Whelan (Flatiron) code-named Chemical Tangents. The MPIA crew has done a great job of moderating Galaxy Coffee and making it interactive; I was interrupted many times during my talk with questions and comments (which I love). I sure miss being at MPIA! This is my first summer in 15 not living in Heidelberg. Among the many great questions I got during my talk was one from Joe Hennawi (UCSB): He asked whether the Chemical Tangents idea—that stellar element abundance ratios can depend on actions but not angles—could be turned around and used to calibrate or adjust noisy or systematically biased abundance-ratio measurements. That's a great question, because that is precisely the project I am doing with Christina Eilers (MIT) right now. It's cool that the Tangents and calibration projects are two sides of the same coin (bad metaphor?).
This summer I have been working with Winston Harris (MTSU) on the efficiency with which we can detect planets in an RV survey, or with RV data generally. He has some nice results, some of which he presented at Stars & Exoplanets meeting today. After this, Megan Bedell (Flatiron) and I discussed the question: What are the simplest questions we can ask about detecting planets by RV observations? We have questions about the distribution / cadence of the observations, given a finite, pre-defined survey window. But we also have questions about the cadence in relation to the coherence times of the various noise processes: It should be different to take many observations within one coherence time vs taking observations separated by many coherence times. My intuition is not as rich as I'd like it to be here.
Adrian Price-Whelan (Flatiron) and I have been working on our project code-named Chemical Tangents this week. We have been working on a better name! But we have also been producing results and trying to understand the best scope for our first paper on the subject. It's hard, because there are so many things the method can do (in principle). We discussed how we would present the material in a short talk (because I am giving one in Germany—okay fine, remote to Germany) on Thursday. Then we realized that the scope of the first paper should be the same as the scope of a short talk! And indeed, it can be very useful to think about giving a talk when one thinks about how to assemble content into a paper: A paper is like a more detailed version of a technical seminar, in some sense. But anyway, that helped us reduce (even further) our scope. And it is making me think that I have a new theory of how to design a scientific paper...
Gaby Contardo (Flatiron) and I got in some research time at the end of the day today, visualizing data from SDSS-IV APOGEE data on stars on the upper red-giant branch. The question we are trying to ask is: Are there spectral differences between the stars that are going up the red-giant branch and stars that are going down? The problem we face is that we don't have a good training set. Is there some way we can find this, but unsupervised? I have an intuition that we can. Right now we are visualizing the data in photometric space, spectroscopic-parameter space, and PCA-amplitude space (where the PCA was performed on the spectral data).
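For what it's worth, projecting spectra into PCA-amplitude space takes only a few lines; this sketch uses random numbers as a stand-in for the APOGEE spectra (all sizes invented):

```python
import numpy as np

rng = np.random.default_rng(8)
n_stars, n_pix = 200, 500                    # invented sizes
spectra = rng.normal(size=(n_stars, n_pix))  # stand-in for real spectra

# PCA via SVD of the mean-subtracted spectra
mean_spec = spectra.mean(axis=0)
resid = spectra - mean_spec
U, S, Vt = np.linalg.svd(resid, full_matrices=False)

K = 5                              # keep the top K components
amplitudes = resid @ Vt[:K].T      # (n_stars, K): PCA-amplitude space

print(amplitudes.shape)            # (200, 5)
```

Each star then lives at a point in a K-dimensional amplitude space, which is what gets visualized alongside the photometric and spectroscopic-parameter spaces.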
I took a week of vacation this past week. My only research was a bit of thinking and writing in my project code-named Chemical Tangents.
As my loyal reader knows, I have four separate projects with Bonaca, Feeney, Casey, and Bedell to forward-model asteroseismic modes, with Bonaca concentrating on ground-based data, Feeney on principled Bayes, Casey on ESA Gaia, and Bedell on removing the modes as nuisances when we want to find planets. Today Bedell and I discussed where we are at, and came up with some things to try. In Sun-like stars, the modes have days-ish coherence times (apparently) and the modes have minutes-ish periods. So there are different regimes as your exposure cadence ranges from minutes to days to weeks. We have some qualitative predictions, and we are trying to make quantitative results that will influence survey design in the near future (especially for Terra Hunting Experiment and NASA NEID).
One funny thing about quasi-periodic (as in: finite coherence-time) oscillation processes is that they can be generated as a subset of Gaussian Processes, if you have good kernel machinery. We do! Another funny thing is that a GP can fit anything it is asked to fit! It literally has infinity free parameters (yes, literally). But the more appropriate kernels will do better (we hope) at predicting held-out data.
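A minimal sketch of the quasi-periodic point: a damped-cosine kernel (the period and coherence time below are invented) is a valid GP kernel, being the product of an exponential kernel and a cosine kernel, and draws from it oscillate but lose phase memory beyond the coherence time.

```python
import numpy as np

def quasi_periodic_kernel(t1, t2, period=5.0, coherence=20.0):
    # Damped-cosine kernel: a periodic oscillation with a finite
    # coherence time (values here are made up for illustration).
    tau = np.abs(t1[:, None] - t2[None, :])
    return np.exp(-tau / coherence) * np.cos(2 * np.pi * tau / period)

t = np.linspace(0, 100, 300)
K = quasi_periodic_kernel(t, t)

# Draw one realization of the process (small jitter on the diagonal
# for numerical stability of the covariance).
sample = np.random.default_rng(4).multivariate_normal(
    np.zeros_like(t), K + 1e-8 * np.eye(len(t)))
print(sample.shape)  # (300,)
```

Dedicated kernel machinery (e.g. celerite-style solvers) makes this scale far better than the dense draw above, but the statistical content is the same.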
It was great to talk to Winston Harris (MTSU) about his results today running The Joker on exoplanet extreme-precision radial-velocity data. Instead of using The Joker to rejection-sample a huge prior sampling of models, he used it just to importance sample, weighting each prior sample with the likelihood. In the highly-informative-data regime, rejection sampling only leaves samples in the dominant posterior mode. But the importance-sampled weights can be visualized to show all the other modes. His visualization confirms something we (Price-Whelan, Foreman-Mackey, and I) have been saying for years: When you have very informative time-domain data, you don't get a uni-modal likelihood (or posterior): You just get a likelihood (or posterior) with one very dominant mode. Locally, and at very low likelihood, all those other modes still exist. Period-finding is always an extremely non-convex problem.
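Here is a toy sketch of the rejection-vs-importance point (the multi-modal likelihood below is entirely invented, standing in for period aliasing): rejection sampling leaves essentially no samples in the subdominant modes, while importance weights still record them.

```python
import numpy as np

rng = np.random.default_rng(3)

# Invented multi-modal likelihood over period, with one dominant mode.
def log_likelihood(period):
    modes = np.array([3.0, 5.0, 8.0])
    heights = np.array([0.0, -8.0, -12.0])  # relative log-heights
    return np.logaddexp.reduce(
        heights[None, :]
        - 0.5 * ((period[:, None] - modes[None, :]) / 0.05) ** 2,
        axis=1)

periods = rng.uniform(1.0, 10.0, size=200_000)  # prior samples
logw = log_likelihood(periods)
w = np.exp(logw - logw.max())                   # importance weights

# Rejection sampling: keep each sample with probability w.
kept = periods[rng.uniform(size=w.size) < w]

# Compare: total importance weight vs rejection-kept count per mode.
for m in [3.0, 5.0, 8.0]:
    near = np.abs(periods - m) < 0.2
    print(m, w[near].sum(), (np.abs(kept - m) < 0.2).sum())
```

The weighted prior samples near the subdominant modes carry tiny but nonzero total weight, which is exactly what makes them visible in a weighted visualization and invisible after rejection.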
Kate Storey-Fisher (NYU) and I worked through some figures for her paper on her new correlation-function estimator. Her estimator isn't tied to bins in separation: It infers the parameters of a smooth correlation-function model. Thus it is more expressive than the standard estimator with fewer parameters (bins). And her figures show that, using it, she can measure the baryon acoustic feature peak more precisely! Which corresponds to a reduction in the cost of a project at fixed precision, or an increase in precision at fixed cost. Awesome! Now how to present this all so it seems natural and sensible to the cosmology community: It is hard to teach an old dog new tricks.
A discriminative model is a method for finding a function of your features (like your spectrum) that delivers a prediction or expectation for your labels (like the temperature and metallicity of the star). A generative model is a method for finding a function of your labels that delivers a prediction or expectation for your features. In the discriminative case, you can take a derivative of labels wrt data. In the generative case you can take a derivative of the data wrt the labels. How are these related?
In the one-dimensional calculus context, these things are just inverses of each other. But in the multivariate labels and multivariate features case, it isn't so simple: Some of the features don't depend at all on the labels, so the generative derivative is zero; that doesn't make the discriminative derivative infinity!
The answer is the pseudo-inverse: You can convert a generative model into a discriminative model (locally, or to linear order) using the Taylor series and then linear least squares inference. That delivers a discriminative model (locally) and thus the other derivative you might want. The pseudo-inverse is the thing that appears in linear least squares. In numpy, the pseudo-inverse of X is solve(X.T @ X, X.T) or the operator embodied in lstsq().
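A tiny numerical check of this, with an invented Jacobian: the feature row that ignores the labels maps to a zero column of the pseudo-inverse, not an infinity.

```python
import numpy as np

# Linear generative model: features = J @ labels, where one feature
# row is zero (that feature doesn't depend on the labels at all).
J = np.array([[2.0, 0.0],
              [0.0, 3.0],
              [1.0, 1.0],
              [0.0, 0.0]])   # this feature ignores the labels

# Discriminative derivative (labels wrt features) via the
# pseudo-inverse, which is what linear least squares delivers:
J_pinv = np.linalg.solve(J.T @ J, J.T)
print(np.allclose(J_pinv, np.linalg.pinv(J)))  # True

# The zero generative row becomes a zero discriminative column:
print(J_pinv[:, 3])  # [0. 0.]

# Sanity check: recover labels from noiseless features.
labels = np.array([0.7, -1.2])
print(np.allclose(J_pinv @ (J @ labels), labels))  # True
```

The `solve(X.T @ X, X.T)` form requires the Jacobian to have full column rank (more features than labels, as here); `np.linalg.pinv` or `lstsq` handle the degenerate cases too.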
As my loyal reader knows, Christina Eilers (MIT) and I have been looking at surface-gravity systematics in surface-abundance measurements in red-giant stars. It appears that stars with different gravities say different things about the abundance trends with (say) Galactocentric radius and perpendicular distance from the midplane. And none of these things agrees with the trends published in the literature by various groups. We now think that the published trends are caused by selection differences as a function of Galactic azimuth (probably primarily because of crowding), plus these surface-gravity effects. Today we discussed with Hans-Walter Rix (MPIA) the scope of such projects, and we were able to establish a limited scope that we think will work. But one step on the path is to argue all this out with the APOGEE team, because we need to understand whether our interpretations of all these things are correct.