I spent my research time today working for Adrian Price-Whelan (Flatiron). I took our huge list of bullet points for our paper's discussion section (paper is on inferring the Milky Way mass model using stellar element abundances and kinematics), rearranged them around four themes, and deleted duplicate points. I then turned a few of the bullet points into a few paragraphs. Writing is a fun part of my job. But I don't find that I can write many paragraphs in a day. Or at least not many good ones.
Today Lily Zhao (Yale) crashed the Terra Hunting science meeting to talk about the EXPRES experience with its laser frequency comb calibration source. She showed the results of our hierarchical, non-parametric wavelength solution and how it improves radial-velocity measurements end-to-end. We then spent lots of time talking about the operational aspects of having a LFC in the system. It looks likely (to me) that Terra Hunting will budget, acquire, and install an LFC. I think it's a good idea, after our EXPRES experiences.
Today was the first day of the Terra Hunting science meeting. We made great progress on target selection, which was our primary goal. We decided that we should only consider stars that are visible enough from the observatory, and bright enough, to deliver the photons we need for radial-velocity measurements precise enough, over a decade, to meet our planet-detection goals. That is, we should make the parent sample of our final selection be stars where we at least have enough photons to detect an Earth. That is an obvious and simple point, but this was the first meeting where we really clearly identified it, and started to figure out how that flows down to a target list. It turns out that there aren't huge numbers of stars that even meet this strict requirement. And of course we spent lots of time talking about all the reasons that we won't get photon-limited radial velocities, so our final target list must be much more restricted, probably.
After conversations with Christina Eilers (MIT) this week, I spent a bit of weekend research time re-scoping our paper. Now it is a paper about the element-abundance gradients in the Milky Way disk, made with an abundance-calibration model. It is no longer a paper about abundance calibration, with gradients shown as a demonstration. What does it mean to change scope? It means deleting the old file and writing a brand-new title and abstract. As my loyal reader knows: Title and abstract come first!
In the weekly call between Lily Zhao (Yale), Megan Bedell (Flatiron), and myself, Zhao showed some really nice results: She has run our RV code wobble on all the data on a particularly active, spotty star. By design, wobble makes an empirical average stellar spectrum, which permits us to look at the residuals, colored by various things. Zhao has made these plots and there are some hints of spectral signatures of the spots! Too early to tell, but Zhao may be about to discover spectrum-space spot removers. This might be the realization of a dream I've had ever since we started looking at EPRV data.
Today between zoom (tm) calls, Adrian Price-Whelan (Flatiron) and I discussed the following problem with the k nearest neighbors algorithm: If your point is near an edge or gradient in your sample, and the quantity you are predicting (your label) also has a gradient locally, then the “center of mass” of your neighbors will be offset from your target point, and therefore the average label of those neighbors will be inappropriate to your target point.
No problem: Instead of taking a mean of your nearest neighbors, fit a linear plane to your neighbors (label as a function of position) and interpolate or evaluate that fit at the point. Awesome! This is (a simplification of) what Adam Wheeler (Columbia) and I did in our X-enhanced stars paper. The only problem is: We are doing nearest neighbors inside an optimization loop, and we need it to be very fast.
Price-Whelan and my solution: Compute the k-neighbor center of mass position, project all target–neighbor offset vectors onto the target–center-of-mass vector, and then do a linear regression in one dimension, interpolated to the target point in one dimension. This works! It's fast, and (when you flow through all the linear algebra) it looks like a really strange kind of weighted mean. Price-Whelan implemented this and it improved our results immediately. Yay!
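Here is a minimal numpy sketch of that projected one-dimensional trick, on invented data (the half-plane sample, the label function, the query point, and the neighbor count are all made up for the demo; this is my paraphrase, not Price-Whelan's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(17)

# toy data: points fill the half-plane x >= 0, and the label has a local
# gradient, so a query near the x = 0 edge has a neighbor center of mass
# biased to larger x (all numbers here are invented for the demo)
X = rng.uniform([0.0, -1.0], [1.0, 1.0], size=(4000, 2))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1]           # label with a known gradient

def knn_mean(x0, k=32):
    """plain k-nearest-neighbor mean of the labels"""
    idx = np.argsort(((X - x0) ** 2).sum(axis=1))[:k]
    return y[idx].mean()

def knn_projected_fit(x0, k=32):
    """project neighbor offsets onto the target-to-center-of-mass
    direction, fit a line in that one coordinate, evaluate at the target
    (which sits at projected distance zero); assumes the center of mass
    is not exactly at the target, as happens near an edge"""
    idx = np.argsort(((X - x0) ** 2).sum(axis=1))[:k]
    offsets = X[idx] - x0
    com = offsets.mean(axis=0)               # neighbor center of mass
    u = com / np.linalg.norm(com)            # unit vector toward the c.o.m.
    s = offsets @ u                          # 1-D projected coordinates
    slope, intercept = np.polyfit(s, y[idx], 1)  # label = intercept + slope*s
    return intercept                         # the fit evaluated at s = 0

x0 = np.array([0.01, 0.0])                   # query point near the edge
truth = 3.0 * x0[0] + 0.5 * x0[1]
err_mean = abs(knn_mean(x0) - truth)
err_fit = abs(knn_projected_fit(x0) - truth)
print(err_mean, err_fit)                     # the projected fit is far less biased
```

Near the edge, the plain mean inherits a bias of order the gradient times the center-of-mass offset, while the one-dimensional fit removes the linear part of that bias at the cost of a single extra regression.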
Today Teresa Huang (JHU), Soledad Villar (JHU), and I met for a sprint on a possible paper for the AISTATS conference submission deadline. We are thinking about pulling together results we have on double descent (the phenomenon that good, predictive models can have many more data than parameters, or many more parameters than data, but in-between, unless you are careful, they can suck) and on adversarial attacks against regressions.
I have a fantasy that we can unify a bunch of things in the literature, but I'm not sure. I ended up drawing a figure on my board that might be relevant: The out-of-sample prediction error, the (equivalent of the) condition number of the data SVD, and the susceptibility to “data-poisoning attacks” all depend on number of data n and the number of parameters p in related ways. Our main conclusion is that regularization (or dimensionality reduction, which amounts to the same thing) is critical.
Today I spent more time writing linear algebra and likelihood-function math and discussion into the paper with Bonaca (Harvard). I like to write first, and code later, so that the paper and the code use the same terminology and embody the same concepts.
Later in the day I had the first of regular meetings with new arrival Katie Breivik (Flatiron). We discussed the overlap between her interests in binary-star evolution and our binary-star-relevant data sets.
I spent my research time this morning typing math into a document co-authored by Ana Bonaca (Harvard) about a likelihood function for asteroseismology. It is tedious and repetitive to type the math out in LaTeX, and I feel like I have done the nearly-same typing over and over again. And, adding to the feeling of futility: The linear-algebra objects we define in the paper are more conceptual than operational: In many cases we don't actually construct these objects; we avoid constructing things that will make the execution either slow or imprecise. So in addition to typing these, I find myself also typing lots of non-trivial implementation advice.
I spent part of the weekend looking at issues with chemical abundances as a function of position and velocity in the Solar neighborhood. In the data, it looks like somehow the main-sequence abundances in the APOGEE and GALAH data sets are different for stars in different positions along the same orbit. That's bad for my Chemical Torus Imaging (tm) project with Adrian Price-Whelan (Flatiron)! But slowly we realized that the issue is that the abundances depend on stellar effective temperatures, and different temperatures are differently represented in different parts of the orbit. Phew. But there is a problem with the data. (Okay actually this can be a real, physical effect or a problem with the data; either way, we have to deal.) Time to call Christina Eilers (MIT), who is thinking about exactly this kind of abundance-calibration problem.
Today Bonaca (Harvard) and I settled on a full scope for a first paper on asteroseismology of giant stars in the NASA TESS survey. We are going to find that our marginalized-likelihood formalism confirms beautifully many of the classical asteroseismology results; we are going to find that some get adjusted by us; we are going to find that some are totally invisible to us. And we will have reasons or discussion for all three kinds of cases. And then we will run on “everything” (subject to some cuts). That's a good paper! If we can do it. This weekend I need to do some writing to get our likelihood properly recorded in math.
Today, Tyler Pritchard (NYU) and I assembled a group of time-domain-interested astrophysicists from around NYC (and a few who are part of the NYC community but more far-flung). In a two-hour meeting, all we did was introduce ourselves and our interests in time domain, multi-messenger, and Vera Rubin Observatory LSST, and then discuss what we might do, collectively, as a group. Time-domain expertise spanned an amazing range of scales, from asteroid search to exoplanet characterization to stellar rotation to classical novae to white-dwarf mergers with neutron stars, supernovae, light echoes, AGN variability, tidal-disruption events, and black-hole mergers. As we had predicted in advance, the group recognized a clear opportunity to create some kind of externally funded “Gotham” (the terminology we often use for NYC-area efforts these days) center for time-domain astrophysics.
Also, as we predicted, there was more confusion about whether we should be thinking about a real-time event broker for LSST. But we identified some themes in the group that might make for a good project: We have very good theorists working, who could help on physics-driven multi-messenger triggers. We have very good machine-learners working, who could help on data-driven triggers. And we have lots of non-supernovae (and weird-supernova) science cases among us. Could we make something that serves our collective science interests but is also extremely useful to global astrophysics? I think we could.
In Fourier transforms, or periodograms, or signal processing in general, when you look at a time stream that is generated by a single frequency f (plus noise, say) and has a total time length T, you expect the power-spectrum peak (or the likelihood function for f) to have a width that is no narrower than 1/T in the frequency direction. This is true for extremely deep reasons that relate to—among other things—the uncertainty principle: You can't localize a signal in both frequency and time simultaneously.
Ana Bonaca (Harvard) and I are fitting combs of frequencies to light curves. That is, we are fitting a model with K frequencies, equally spaced. We are finding that the likelihood function has a peak that is a factor of K narrower than 1/T, in both the central-frequency direction, and the frequency-spacing direction. Is this interesting or surprising? I have ways to justify this point, heuristically. But is there a fundamental result here, and where is it in the literature?
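Here is a toy numpy experiment in the spirit of this question (the comb parameters, time baseline, and the 50%-of-variance threshold are all invented for the demo): fit a K-tooth comb to noiseless comb-generated data and measure how wide the good-fit valley is in the frequency-spacing direction.

```python
import numpy as np

def comb_model_r2(K, spacing_trial, f0=1.0, spacing_true=0.5, T=100.0, dt=0.05):
    """Fraction of the signal variance captured by a least-squares fit of
    a K-tooth frequency comb with a trial spacing, against data generated
    with the true spacing. Noiseless toy; all numbers are made up."""
    t = np.arange(0.0, T, dt)
    freqs_true = f0 + spacing_true * np.arange(1, K + 1)
    data = np.cos(2 * np.pi * freqs_true[:, None] * t).sum(axis=0)
    freqs = f0 + spacing_trial * np.arange(1, K + 1)
    # linear basis: a cosine and a sine at each trial frequency
    A = np.concatenate([np.cos(2 * np.pi * freqs[:, None] * t),
                        np.sin(2 * np.pi * freqs[:, None] * t)], axis=0).T
    coeffs, *_ = np.linalg.lstsq(A, data, rcond=None)
    resid = data - A @ coeffs
    return 1.0 - resid.var() / data.var()

def valley_width(K, grid, true_spacing=0.5):
    """Width (in spacing) of the contiguous region around the true
    spacing where the fit still captures over half the variance."""
    r2 = np.array([comb_model_r2(K, s) for s in grid])
    good = r2 > 0.5
    i = int(np.argmin(np.abs(grid - true_spacing)))
    lo, hi = i, i
    while lo > 0 and good[lo - 1]:
        lo -= 1
    while hi < len(grid) - 1 and good[hi + 1]:
        hi += 1
    return grid[hi] - grid[lo]

grid = np.linspace(0.48, 0.52, 801)
width_2 = valley_width(2, grid)
width_8 = valley_width(8, grid)
print(width_2, width_8)   # the K = 8 valley is markedly narrower
```

The heuristic behind the narrowing: a spacing error delta shifts the k-th tooth by k times delta, so the highest tooth decorrelates at delta of order 1/(KT), which squeezes the valley well below the single-frequency 1/T.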
Adrian Price-Whelan (Flatiron) and I have been trying to use the chemical abundances of stars to constrain the mass model (or gravitational potential, or orbit structure) of the Milky Way. One thing we have noticed is that the abundances are very sensitive to the coordinate system: If you have the velocity or position of the disk wrong, it is clearly visible in the abundances! That's fun, and motivating. But then we have noticed—and we noticed this two summers ago now—that different elements want to put the disk midplane in different places!
What gives? We have various theories. We started with systematics in the data, but the effects are seen in both APOGEE and GALAH. So it seems like it is real. Is it because relaxation times are very long at small vertical heights? (The disk is like a harmonic oscillator at small vertical amplitudes.) Is it because the thinner disk and thicker disk have inconsistent midplanes? Whatever it is, it seems like it is super interesting. We can't solve this problem in our current paper, but we want to comment on it.
This week is a sprint week for Ana Bonaca (Harvard) and me to work on our asteroseismology likelihood and search, for (possibly even ground-based) irregularly spaced, heterogeneous data. We figured out what we need for inputs to this likelihood function: We need empirical relationships between the nu-max parameter and other aspects of the power spectrum, like the mean heights of the mode peaks, and the width of the mode forest or comb. It seemed daunting at first, but awesomely a lot of what we need is in the Basu & Chaplin book. Most of our current technical struggles relate to the problem that the likelihood function is amazingly, amazingly featured. It's almost adversarial!
Late in the day, Tyler Pritchard (NYU) and I met socially distanced in a park to make final plans for a working meeting on time-domain astrophysics for NYC. The plan is to start to build a community around Vera Rubin LSST (which I just learned got backronymed) that is centered here in New York, and possibly build and operate a real-time event broker. But in this first meeting we want people to really discuss ideas and get something started: How to design online meetings to involve discussions and idea generation? We are learning a lot from our friends at TESS dot science online and AstroHackWeek 2020, both of which worked out new ways to have scientists who aren't physically in the same space—and maybe don't know one another all that well yet—do novel work together.
The bugbear of extreme precision radial velocity measurements is often called “stellar activity”. I don't love that terminology, because stellar activity has a magnetic-field-reconnection feel to it, when the thing being referenced really covers all sorts of stellar variability that maps onto radial-velocity measurements. Today, Bedell (Flatiron), Zhao (Yale), and I discussed how we are going to approach a stellar activity challenge coming from the EXPRES group at Yale: They have released spectra for many epochs of observing from one star, and the team that delivers the best RV precision wins. Wins? Well, has some bragging rights.
Our plan is to look for spectral changes that predict velocity changes. That is, can we see different spectra on different parts of the stellar surface and relate those to measured and true velocities? We discussed the stellar rotation period, photometric variations, and classical activity indicators, all of which might help us in our goals.
The first order of business? Get wobble to run on the spectra, and make it deliver residuals away from the best constant-star fit.
My research day started with a conversation with Teresa Huang (JHU) and Soledad Villar (JHU) about the regressions that are used to determine stellar parameters. Huang has shown that different machine-learning methods (which are generally over-parameterized) obtain very different gradients in the explicit or implicit function that connects labels (stellar parameters) to features (stellar spectra), and gradients very different from those indicated by the physical models we have of stellar atmospheres. These differences can be exploited for attack.
Later, Megan Bedell (Flatiron) and I spent time designing projects that are aimed at maximizing the efficiency of radial-velocity exoplanet searches. The idea is: You have a finite amount of telescope time allocated over a fixed (long) interval. How do you assign observation slots to stars to maximize expected yield (or any other statistic you care about)? The answer is going to depend strongly on what we assume about the noise properties of the measurements we make.
I spent the last few days hiding in various undisclosed locations. In my small amount of research time, I wrote words in the method and discussion sections of my nascent paper with Adrian Price-Whelan (Flatiron) about imaging orbital 3-tori in phase space using element abundances.
In my student-research meeting, Kate Storey-Fisher (NYU), Abby Williams (NYU), and I discussed how we could make a flexible, parameterized model of a correlation function (for large-scale structure) with a continuous spatial gradient in it. The idea is that we are going to use Storey-Fisher's continuous-function estimator for the correlation function to look for departures from statistical homogeneity. Our inspiration is the power asymmetry in the cosmic microwave background.
Our approach to building the flexible model is to take a model for the homogeneous correlation function and analytically Taylor-expand it around a fiducial point. In this formalism, the local spatial gradient at the fiducial point turns into a linear dependence on position, the local second derivative leads to a quadratic term, and so on. Of course an alternative approach would be to just measure the correlation function in spatial patches, and compare the differences to the measurement variances. But binning is sinning, and I also believe that the continuous estimation will be higher precision.
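Schematically, the expansion we have in mind looks like this (my notation, not anything settled in the project):

```latex
\xi(r;\,\mathbf{x}) \;\approx\; \xi_0(r)
  \;+\; (\mathbf{x}-\mathbf{x}_0)^{\mathsf T}\,\mathbf{g}(r)
  \;+\; \tfrac{1}{2}\,(\mathbf{x}-\mathbf{x}_0)^{\mathsf T}\,\mathbf{H}(r)\,(\mathbf{x}-\mathbf{x}_0)
  \;+\; \cdots
```

where xi_0(r) is the homogeneous correlation function at the fiducial point x_0, g(r) is its local spatial gradient (the linear-in-position term we want to detect), and H(r) is the local second-derivative (Hessian) term responsible for the quadratic dependence.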