so much Gaussian processes

The day was all GPs. Markus Bonse (Darmstadt) showed various of us very promising GPLVM results for spectra, where he is constraining part of the (usually unobserved) latent space to look like the label space (like stellar parameters). This fits into the set of things we are doing to enrich the causal structure of existing machine-learning methods, to make them more generalizable and interpretable. In the afternoon, Dan Foreman-Mackey (Flatiron) found substantial issues with GP code written by me and Christina Eilers (MPIA), causing Eilers and me to have to re-derive and re-write some analytic derivatives. That hurt, especially since the derivatives involve some hand-coded sparse linear algebra. But right at the end of the day (like with 90 seconds to spare), we got the new derivatives working in the fixed code. Feelings were triumphant.


what's our special sauce? and Schwarzschild modeling

My day started with Dan Foreman-Mackey (Flatiron) smacking me down about my position that it is causal structure that makes our data analyses and inferences good. The context is: Why don't we just turn on the machine learning (like convnets and GANs and etc)? My position is: We need to make models that have correct causal structure (like noise sources and commonality of nuisances and so on). But his position is that, fundamentally, our analyses are good because we control model complexity well (which is hard to do with extreme machine-learning methods) and we have a likelihood function: We can compute a probability in the space of the data. This gets back to old philosophical arguments that have circulated around my group for years. Frankly, I am confused.

In our Gaia DR2 prep meeting, I had a long conversation with Wyn Evans (Cambridge) about detecting and characterizing halo substructure with a Schwarzschild model. I laid out a possible plan (pictured below). It involves some huge numbers, so I need some clever data structures to trim the tree before we compute 10^20 data–model comparisons!

Late in the day, I worked with Christina Eilers (MPIA) to speed up her numpy code. We got a factor of 40! (Primarily by capitalizing on sparseness of some operators to make the math faster.)
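A minimal sketch of the kind of speed-up that sparseness can buy, assuming a banded (tridiagonal) operator as a stand-in. The actual operators in that code are not described here, so the matrix, its size, and all variable names below are hypothetical.

```python
import numpy as np
from scipy import sparse

n = 2000

# Hypothetical tridiagonal operator; only 3n - 2 of its n^2 entries are non-zero.
diagonals = [np.full(n - 1, -1.0), np.full(n, 2.0), np.full(n - 1, -1.0)]
A_sparse = sparse.diags(diagonals, offsets=[-1, 0, 1], format="csr")

# The same operator stored densely, zeros and all.
A_dense = A_sparse.toarray()

x = np.random.default_rng(1).normal(size=n)

# The dense product touches all n^2 entries; the sparse product touches only the non-zeros.
y_dense = A_dense @ x
y_sparse = A_sparse @ x
```

Both products give the same answer; the sparse one just skips the multiplications by zero, which is where a large constant-factor gain can come from.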


empirical yields; galaxy alignments; linear algebra foo.

Early in the day, Kathryn Johnston (Columbia) convened the local Local Group group (yes, I wrote that right) at Columbia. We had presentations from various directions (and I could only be at half of the day). Subjective highlights for me included the following: Andrew Emerick (Columbia) showed that there is a strong prediction that in dwarf galaxies, AGB-star yields will be differently distributed than supernovae yields. That should be observable, and might be an important input to my life-long goal of deriving nucleosynthetic yields from the data (rather than theory). Wyn Evans (Cambridge) showed that you can measure some statistics of the alignments of dwarf satellite galaxies with respect to their primary-galaxy hosts, using the comparison of the Milky Way and M31. M31 is more constraining, because we aren't sitting near the center of it! These alignments appear to have the right sign (but maybe the wrong amplitude?) to match theoretical predictions.

Late in the day, Christina Eilers (MPIA) showed up and we discussed with Dan Foreman-Mackey (Flatiron) our code issues. He noted that we are doing the same linear-algebra operations (or very similar ones) over and over again. We should not call solve each time, but rather call cho_factor once and then cho_solve, which re-uses the pre-existing Cholesky factorization and is therefore fast. He also pointed out that in the places where we have missing data, the factorization can be updated in a fast way rather than fully re-computed. Those are good ideas! As I often like to say, many of my super-powers boil down to just knowing who to ask about linear algebra.
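A minimal sketch of the factor-once, solve-many pattern, using scipy.linalg.cho_factor and scipy.linalg.cho_solve. The matrix K here is a random symmetric positive-definite stand-in for a GP kernel matrix, and the variable names are made up for illustration.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve, solve

rng = np.random.default_rng(42)
n = 200

# Stand-in for a kernel matrix: symmetric and positive definite by construction.
A = rng.normal(size=(n, n))
K = A @ A.T + n * np.eye(n)
y1 = rng.normal(size=n)
y2 = rng.normal(size=n)

# Slow pattern: every call to solve() factorizes K again from scratch.
x1_slow = solve(K, y1)
x2_slow = solve(K, y2)

# Faster pattern: factorize once, then re-use the factorization for each right-hand side.
factor = cho_factor(K, lower=True)
x1 = cho_solve(factor, y1)
x2 = cho_solve(factor, y2)

# The log-determinant (needed for a Gaussian likelihood) comes from the same factorization.
logdet = 2.0 * np.sum(np.log(np.diag(factor[0])))
```

The fast update for missing data mentioned above is a separate trick and is not shown in this sketch.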


self-calibration for EPRV; visualizations of the halo

The morning started with Bedell and Foreman-Mackey and me devising a self-calibration approach to combining the individual-order radial velocities we are getting for the different orders at the different epochs for a particular star in the HARPS archive. We need inverse variances for weighting in the fit, so we got those too. The velocity-combination model is just like the uber-calibration of the SDSS imaging we did so many years ago. We discussed optimization vs marginalization of nuisances, and decided that the data are going to be good enough that probably it doesn't matter which we do. I have to think about whether we have a think-o there.
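One way such a velocity-combination model could look, as a sketch only: assume each measured velocity is the sum of a per-epoch stellar velocity and a per-order offset, and fit both by inverse-variance-weighted least squares. The additive form, the zero-mean convention for the offsets, and the function name self_calibrate are assumptions made for illustration, not the model that was actually adopted.

```python
import numpy as np

def self_calibrate(v, ivar):
    """Fit v[n, m] ~ epoch_rv[n] + order_offset[m] by weighted least squares.

    v and ivar have shape (n_epochs, n_orders); ivar holds inverse variances.
    """
    n_epochs, n_orders = v.shape
    rows = np.arange(n_epochs * n_orders)
    epoch_idx, order_idx = np.divmod(rows, n_orders)

    # Design matrix: one indicator column per epoch, then one per order.
    A = np.zeros((rows.size, n_epochs + n_orders))
    A[rows, epoch_idx] = 1.0
    A[rows, n_epochs + order_idx] = 1.0

    # Weight each row by the square root of its inverse variance.
    w = np.sqrt(ivar.ravel())
    params, *_ = np.linalg.lstsq(A * w[:, None], v.ravel() * w, rcond=None)
    epoch_rv = params[:n_epochs]
    order_offset = params[n_epochs:]

    # The model is degenerate to a constant shift; fix it with zero-mean offsets.
    shift = order_offset.mean()
    return epoch_rv + shift, order_offset - shift

# Fake data: 30 epochs, 8 orders, heteroskedastic noise.
rng = np.random.default_rng(0)
true_rv = rng.normal(0.0, 10.0, size=30)
true_off = rng.normal(0.0, 50.0, size=8)
true_off -= true_off.mean()
sigma = rng.uniform(1.0, 5.0, size=(30, 8))
v = true_rv[:, None] + true_off[None, :] + sigma * rng.normal(size=(30, 8))

epoch_rv, order_offset = self_calibrate(v, 1.0 / sigma**2)
```

This optimizes the offsets rather than marginalizing them; with good data the two choices should give similar combined velocities, which is the point made above.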

After that, I worked with Anderson and Belokurov on finding kinematic (phase-space) halo substructure in fake data, in SDSS, and in Gaia DR2. We have been looking at proper motions, because for halo stars, these are better measured than parallaxes! Anderson made some great visualizations of the proper-motion distribution in sky (celestial-coordinate) pixels. Today she made some visualizations of celestial-coordinate distribution in proper-motion pixels. I am betting this latter approach will be more productive. However, Belokurov and I switched roles today, with me arguing for “visualize first, think later” and him arguing for making sensible metrics or models for measuring overdensity significances.

Andy Casey (Monash) is in town! I had a speedy conversation with him about calibration, classification, asteroseismology, and The Cannon.


finding and characterizing halo streams in Gaia

Our weekly Gaia DR2 prep meeting once again got us into long arguments about substructure in the Milky Way halo, how to find it and how to characterize it. Wyn Evans (Cambridge) showed that when he looks at halo substructures he has found in terms of actions, they show larger spreads in action in some potentials and smaller in others. Will this lead to constraints on dynamics? Robyn Sanderson (Caltech) thinks so, and so did everyone in the room. Kathryn Johnston (Columbia) and I worked through some ideas for empirical or quasi-empirical stream finding in the data space, some of them inspired by the Schwarzschild-style modeling suggested by Sanderson in my office last Friday. And Lauren Anderson showed plots of Gaia expectations for substructure from simulations, visualized in the data space. We discussed many other things!


gradients in cosmological power, and EPRV

In the morning, Kate Storey-Fisher (NYU) dropped by to discuss our projects on finding anomalies in the large-scale structure. We discussed the use of mocks to build code that will serve as a pre-registration of hypotheses before we test them. We also looked at a few different kinds of anomalies for which we could easily search. One thing we came up with is a generalization of the real-space two-point function estimators currently used in large-scale structure into estimators not just of the correlation function, but also its gradient with respect to spatial position. That is, we could detect arbitrary generalizations of the hemispheric asymmetry seen in Planck but in a large-scale structure survey, and with any scale-dependence (or different gradients at different scales). Our estimator is related to the concept of marked correlation functions, I think.

Late in the day, Bedell (Flatiron), Montet (Chicago), and Foreman-Mackey (Flatiron) showed great progress on measuring RVs for stars in high-resolution spectroscopy. Their innovation is to simultaneously fit all velocities, a stellar spectrum, and a telluric spectrum, all data-driven. The method scales well (linearly with data size) and seems to suggest that we might beat the m/s barrier in measuring RVs. This hasn't been demonstrated, but the day ended with great hopes. We have been working on this model for weeks or months (depending on how you count) but today all the pieces came together. And it easily generalizes to include various kinds of variability.


things are looking good

I had early-morning chats with Ana Bonaca (Harvard), who has very nice sanity checks showing that our Fisher analysis (Cramér–Rao analysis) is delivering sensible constraints on the Milky-Way potential, and Christina Eilers (MPIA), who is getting sensible results out of her novel modification of the GPLVM for stellar spectra. After that, I took the rest of the day off for my health.


refactoring, seminar technique, search by modeling

In parallel working session this morning (where collaborators gather in my office to work together), Montet, Bedell, and I worked out a re-factor of the RV code they have been working on, in order to make it more efficient and easier to maintain. That looked briefly like a big headache and challenge, but in the end the re-factor got completely done today. Somehow it is brutal to consider a refactor, but in the end it is almost always a good idea (and much easier than expected). I'm one to talk: I don't write much code directly myself these days.

Sarah Pearson (Columbia) gave the NYU Astro Seminar today. It was an excellent talk on what we learn about the Milky Way from stellar streams. She did exactly the right thing of spending more than half of the talk on necessary context, before describing her own results. She got the level of this context just right for the audience, so by the time she was talking about what she has done (which involves chaos on the one hand, and perturbations from the bar on the other), it was comprehensible and relevant for everyone. I wish I could boil down “good talk structure” to some simple points, but I feel like it is very context-dependent. Of course one thing that's great about the NYU Astro Seminar is that we are an interactive audience, so the speaker knows where the audience is.

After lunch I had a great and too-short discussion with Robyn Sanderson (Caltech), continuing ideas that came up on Wednesday about search for halo substructure. We discussed the point that when you transform the data to something like action space (or indeed do any non-linear transformation of the data), the measurement uncertainties become crazy and almost impossible to marginalize or even visualize. Let alone account for properly in a scientific analysis. So then we discussed whether we could search for substructure by transforming orbits into the data space and associating data with orbits, in the space where the data uncertainties are simple. As Sanderson pointed out, that's Schwarzschild modeling. Might be a great idea for substructure search.


theory of anomalies

Today was a low-research day, because [reality]. However, Kate Storey-Fisher (NYU) and I had a great discussion with Josh Ruderman (NYU) about anomalies in the LSS. As my loyal reader knows, we are looking at constructing a statistically valid, safe search for deviations from the cosmological model in the large-scale structure. That search is going to focus towards the overlap (if there is any overlap) between anomalies that are safe to systematic problems with the data (that is, anomalies that can't be mocked by reasonable adjustments to our beliefs about our selection function) and anomalies that live in spaces suggested or predicted by theoretical ideas about non-standard cosmological theories. In particular, we are imagining theories that have the dark sector do interesting things at late times. We didn't make concrete plans in this meeting, except to read down literatures about late decays of the dark matter, dark radiation, and other kinds of dark–dark interactions that could be happening in the current era.


actions or observables? forbidden planet radii

The highlight today of our Gaia DR2 prep meeting was a plenary argument (recall that this meeting is supposed to be parallel working, not plenary discussion, at least not mainly) about how to find halo substructure in the data. Belokurov (Cambridge) and Evans (Cambridge) showed some nice results of searching for substructure in something close to the raw data. We argued about the value of transforming to a space of invariants. The invariants are awesome, because clustering is long-lived and stark there. But clustering is terrible because (a) it introduces unnecessarily wrong assumptions into the problem and (b) normal uncertainties in the data space become arbitrarily ugly noodles in the action space. We discussed whether there are intermediate approaches, that get the good things about working in observables, without too many of the bad things of working in the actions. We didn't make specific plans, but many good ideas hit the board.

Stars group meeting contained too many results to describe them all! It was great, and busy. But the stand-out result for me (and this is just me!) was a beautiful result by Vincent Van Eylen (Leiden) on exoplanet radii. As my loyal reader knows, the most common kinds of planets are not Earths or Neptunes, but something in-between, variously called super-Earths and mini-Neptunes. Now it turns out that even this class bifurcates, with a bimodal distribution: there really is a difference between super-Earths and mini-Neptunes, and little in between. Now Van Eylen shows that this gap really looks like it goes exactly to zero: There is a range of planet radii that really don't exist in the world. Note to reader: This effect probably depends on host star and many other things, but it is incredibly clear in this particular sample. Cool thing: The forbidden radii are a function of orbital period, and the forbidden zone was (loosely) predicted before it was observed. Just incredible. Van Eylen's super-power: Revision of asteroseismic stellar radii to get much more precision on stars and therefore on the transiting planets they host. What a result.


you never really understand a model until you implement it

Eilers (MPIA) and I discussed puzzling results she was getting in which she could fit just about any data (including insanely random data) with the Gaussian Process latent variable model (GPLVM) but with no predictive power on new data. We realized that we were missing a term in the model: We need to constrain the latent variables with a prior (or regularization), otherwise the latent variables can go off to crazy corners of space and the data points have (effectively) nothing to do with one another. Whew! This all justifies a point we have been making for a while, which is that you never really understand a model until you implement it.
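A toy sketch of the missing term, assuming a squared-exponential kernel and a zero-mean, unit-variance Gaussian prior on the latents. The function names, the kernel choice, and the hyperparameter values are all hypothetical; this is not the modified GPLVM under development, just the standard objective with the prior made explicit.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def rbf_kernel(Z, amp=1.0, scale=1.0, noise_var=0.1):
    """Squared-exponential kernel on latents Z (N, Q), plus white noise."""
    sq = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    return amp * np.exp(-0.5 * sq / scale**2) + noise_var * np.eye(len(Z))

def gplvm_objective(Z, Y, prior_weight=1.0):
    """Negative log posterior (up to constants) of latents Z given data Y (N, D)."""
    N, D = Y.shape
    factor = cho_factor(rbf_kernel(Z), lower=True)
    alpha = cho_solve(factor, Y)
    half_logdet = np.sum(np.log(np.diag(factor[0])))

    # GP likelihood term, summed over the D independent output dimensions.
    neg_log_like = 0.5 * np.sum(Y * alpha) + D * half_logdet

    # The term that was missing: a Gaussian prior (regularization) on the latents.
    neg_log_prior = 0.5 * prior_weight * np.sum(Z**2)
    return neg_log_like + neg_log_prior

rng = np.random.default_rng(3)
Z = rng.normal(size=(20, 2))   # latent positions
Y = rng.normal(size=(20, 5))   # "insanely random" data

with_prior = gplvm_objective(Z, Y, prior_weight=1.0)
without_prior = gplvm_objective(Z, Y, prior_weight=0.0)
```

With prior_weight=0.0 nothing stops an optimizer from pushing the latents arbitrarily far apart, at which point the kernel matrix is nearly diagonal and the data points are effectively unrelated. The prior term charges for that, growing as the square of the distance from the origin.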


modeling the heck out of the atmosphere

The day started with planning between Bedell (Flatiron), Foreman-Mackey (Flatiron), and me about a possible tri-linear model for stellar spectra. The model is that the star has a spectrum, which is drawn from a subspace in spectral space, and Doppler shifted, and the star is subject to telluric absorption, which is drawn from a subspace in spectral space, and Doppler shifted. The idea is to learn the telluric subspace using all the data ever taken from a spectrograph (HARPS, in this case). But of course the idea behind that is to account for the tellurics by simultaneously fitting them and thereby getting better radial velocities. This was all planning for the arrival of Ben Montet (Chicago), who arrived later in the day for a two-week visit.

At lunch time, Mike Blanton (NYU) gave the CCPP brown-bag talk about SDSS-V. He did a nice job of explaining how you measure the composition of ionized gas by looking at thermal state. And etc!


detailed abundances of pairs; coherent red-giant modes

In the morning I sat in on a meeting of the GALAH team, who are preparing for a data release to precede Gaia DR2. In that meeting, Jeffrey Simpson (USyd) showed me GALAH results on the Oh et al comoving pairs of stars. He finds that pairs from the Oh sample that are confirmed to have the same radial velocity (and are therefore likely to be truly comoving) have similar detailed element abundances, and the ones that aren't, don't. So awesome! But interestingly he doesn't find that the non-confirmed pairs are as different as randomly chosen stars from the sample. That's interesting, and suggests that we should make (or should have made) a carefully constructed null sample for A/B testing etc. Definitely for Gaia DR2!

In the afternoon, I joined the USyd asteroseismology group meeting. We discussed classification of seismic spectra using neural networks (I advised against) or kernel SVM (I advised in favor). We also discussed using very narrow (think: coherent) modes in red-giant stars to find binaries. This is like what my host Simon Murphy (USyd) does for delta-Scuti stars, but we would not have enough data to phase up little chunks of spectrum: We would have to do one huge simultaneous fit. I love that idea, infinitely! I asked them to give me a KIC number.

I gave two talks today, making it six talks (every one very different) in five days! I spoke about the pros and cons of machine learning (or what is portrayed as machine learning on TV) as my final Hunstead Lecture at the University of Sydney. I ended up being very negative on neural networks in comparison to Gaussian processes, at least for astrophysics applications. In my second talk, I spoke about de-noising Gaia data at Macquarie University. I got great crowds and good feedback at both places. It's been an exhausting but absolutely excellent week.


mixture of factor analyzers; centroiding stars

On this, day four of my Hunstead Lectures, Andy Casey (Monash) came into town, which was absolutely great. We talked about many things, including the mixture-of-factor-analyzers model, which is a good and under-used model in astrophysics. I think (if I remember correctly) that it can be generalized to heteroskedastic and missing data too. We also talked about using machine learning to interpolate models, and future projects with The Cannon.

At lunch I sat with Peter Tuthill (Sydney) and Kieran Larkin (Sydney) who are working on a project design that would permit measurement of the separation between two (nearby) stars to better than one millionth of a pixel. It's a great project; the designs they are thinking about involve making a very large, but very finely featured point-spread function, so that hundreds or thousands of pixels are importantly involved in the positional measurements. We discussed various directions of optimization.

My talk today was about The Cannon and the relationships between methods that are thought of as “machine learning” and the kinds of data analyses that I think will win in the long run.


MCMC, asteroseismology, delta-Scutis

Today I am on my third of five talks in five days, as part of my Hunstead Lectures at Sydney. I spoke about MCMC sampling. A lot of what I said was a subset of things we write in our recent manual on MCMC. At the end of the talk there was some nice discussion of detailed balance, with contributions from Tuthill (USyd) and Sharma (USyd).
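As a small illustration of what detailed balance means, here is a sketch that builds the full transition matrix of a Metropolis sampler on a four-state toy target and checks the balance condition directly. The discrete target, the uniform proposal, and the function name are assumptions chosen to make the check exact; none of this is taken from the lecture.

```python
import numpy as np

def metropolis_transition_matrix(p):
    """Transition matrix for Metropolis with a uniform proposal over the other states."""
    n = len(p)
    T = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                # Propose j with probability 1/(n-1); accept with min(1, p_j / p_i).
                T[i, j] = min(1.0, p[j] / p[i]) / (n - 1)
        # Rejected proposals leave the chain where it is.
        T[i, i] = 1.0 - T[i].sum()
    return T

p = np.array([0.1, 0.2, 0.3, 0.4])   # toy target distribution
T = metropolis_transition_matrix(p)

# Detailed balance: the probability flux from i to j equals the flux from j to i.
flux = p[:, None] * T
balanced = np.allclose(flux, flux.T)

# Detailed balance implies that p is a stationary distribution of the chain.
stationary = np.allclose(p @ T, p)
```

Detailed balance is sufficient for the target to be stationary, not necessary, which is part of what makes it a good topic for discussion.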

At lunch I grilled asteroseismology guru Tim Bedding (USyd) about measuring the large frequency difference delta-nu in a stellar light curve. My position is that you ought to be able to do this without explicitly taking a Fourier Transform, but rather as some kind of mathematical operation on the data. That is, I am guessing that there is a very good and clever frequentist estimator for it. Bedding expressed the view that there already is such a thing, in that there are methods for automatically generating delta-nu values. They do take a Fourier Transform under the hood, but they are nonetheless good frequentist estimators. But I want to work on sparser data, like Gaia and LSST light curves. I need to understand this all better. We also talked about how it is possible for a gastrophysics-y star to have oscillations with quality factors better than 10^5. Many stars do!

That's all highly relevant to the work of Simon Murphy (USyd), who finds binary stars by looking at phase drifts in highly coherent delta-Scuti star oscillations. He and I spent an afternoon hacking on models for one of his delta-Scuti stars, with the hopes of measuring the quality factor Q and also maybe exploring new and more information-preserving methods for finding the binary companions. This method of finding binaries has similar sensitivity to astrometric methods, which makes it very relevant to the binaries that Gaia will discover.


noise, calibration, and GALAH

Today I gave my second of five Hunstead Lectures at University of Sydney. It was about finding planets in the Kepler and K2 data, using our non-stationary Gaussian Process or linear model as a noise model. This is the model we wrote up in our Research Note of the AAS. In the question period, the question of confirmation or validation of planets came up. It is very real that the only way to validate most tiny planets is to make predictions for other data. But when will we have data more sensitive than Kepler? This is a significant problem for much of bleeding-edge astronomy.

Early in the morning I had a long call with Jason Wright (PSU) and Bedell (Flatiron) about the assessment of the calibration programs for extreme-precision RV surveys. My position is that it is possible to assess the end-to-end error budget in a data-driven way. That is, we can use ideas from causal inference to figure out what parts of the RV noise are coming from telescope plus instrument plus software. Wright didn't agree: He believes that large parts of the error budget can't be seen or calibrated. I guess we better start writing some kind of paper here.

In the afternoon I had a great discussion with Buder (MPIA), Sharma (USyd), and Bland-Hawthorn (USyd) about the current status of detailed elemental abundance measurements in GALAH. The element–element plots look fantastic, and clear trends and high precision are evident, just looking at the data. To extract these abundances, Buder has made a clever variant of The Cannon which makes use of the residuals away from a low-dimensional model to measure the detailed abundances. They are planning on doing a large data release in April.


five talks in five days

On the plane to Sydney, I started an outline for a paper with Bedell (Flatiron) on detailed elemental abundances, and the dimensionality or interpretability of the elemental subspace. I also started to plan the five talks I am going to give in five days as the Hunstead Lecturer. On arrival I went straight to University of Sydney and started lecturing. My first talk was on fitting a line to data, with a concentration on the assumptions and their role in setting procedures. That is, I emphasized that you shouldn't choose a procedure by which you fit your data: You should choose a set of assumptions you are willing to make about your data. Once you do that, the procedure will flow from the assumptions. After my talk I had a great lunch with graduate students at Sydney. The range of research around the table was remarkable. I plan to spend some of the week learning about asteroseismology.
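To make "the procedure flows from the assumptions" concrete, here is a sketch for one particular set of assumptions: the x values are known exactly, the y values have independent Gaussian noise with known standard deviations, and the relationship is a straight line. Under those assumptions the maximum-likelihood procedure is weighted least squares. The function name and the fake data are hypothetical.

```python
import numpy as np

def fit_line(x, y, sigma_y):
    """Maximum-likelihood slope and intercept under Gaussian noise in y only."""
    A = np.vander(x, 2)            # columns are x and 1
    ivar = 1.0 / sigma_y**2        # inverse variances act as the weights
    ATA = A.T @ (A * ivar[:, None])
    ATy = A.T @ (y * ivar)
    params = np.linalg.solve(ATA, ATy)   # [slope, intercept]
    cov = np.linalg.inv(ATA)             # parameter covariance under the same assumptions
    return params, cov

# Fake data drawn to satisfy the assumptions exactly.
rng = np.random.default_rng(17)
x = np.linspace(0.0, 10.0, 25)
sigma_y = rng.uniform(0.5, 2.0, size=x.size)
y = 2.0 * x + 1.0 + sigma_y * rng.normal(size=x.size)

(slope, intercept), cov = fit_line(x, y, sigma_y)
```

Change the assumptions (outliers, uncertainties in x, unknown noise variances) and the procedure changes with them; this block covers only the simplest case.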


best-ever detailed abundances

In Friday parallel-working session, Bedell (Flatiron) showed me all 900-ish plots of every element against every element for her sample of 80 Solar twins. Incredible. Outrageous precision, and outrageous structure. And it is a beautiful case where you can just see the precision directly in the figures: There are clearly real features at very small scales. And hugely informative structures. This is the ideal data set for addressing something that has been interesting me for a while: What is the dimensionality of the chemical-abundance space? And can we see different nucleosynthetic processes directly in the data?

Late in the day, Jim Peebles (Princeton) gave the Astro Seminar. He spoke about three related issues in numerical simulations of galaxies: They make bulges that are too large and round; they make halos that have too many stars; and they don't create a strong enough bimodality between disks and spheroids. There were many galaxy-simulators in the audience, so it was a lively talk, and a very lively dinner afterwards.


combinatoric options for a paper

I had my weekly call with Bonaca (Harvard), about information theory and cold stellar streams. We discussed which streams we should be considering in our paper. We have combinatoric choices, because there are N streams and K Milky-Way parameters; we could constrain any combination of parameters with any combination of streams! And it is even worse than that, because we are talking about basis-function expansions for the Milky-Way potential, which means that K is tending to infinity! We tentatively decided to do something fairly comprehensive and live with the fact that we won't be able to fully interpret it with finite page charges.
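A quick sketch of how fast those choices grow, assuming every analysis uses a non-empty subset of the N streams to constrain a non-empty subset of the K parameters. The stream and parameter names are placeholders, and the counting rule is an assumption about what "any combination" means here.

```python
from itertools import combinations

def nonempty_subsets(items):
    """All non-empty subsets of items, as tuples."""
    items = list(items)
    return [c for r in range(1, len(items) + 1) for c in combinations(items, r)]

def n_analyses(n_streams, n_params):
    """Number of (stream subset, parameter subset) pairs, both non-empty."""
    return (2**n_streams - 1) * (2**n_params - 1)

# Tiny explicit enumeration with placeholder names.
streams = ["stream_a", "stream_b", "stream_c"]
params = ["param_1", "param_2"]
pairs = [(s, p) for s in nonempty_subsets(streams) for p in nonempty_subsets(params)]

# The closed form agrees with the enumeration, and it explodes quickly with N and K.
counts = {(n, k): n_analyses(n, k) for n, k in [(3, 2), (5, 5), (10, 10)]}
```

The count doubles (roughly) with every added stream or parameter, which is why a basis-function expansion with K tending to infinity makes an exhaustive treatment impossible.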


circumbinary planets, next-gen EPRV

The Gaia DR2 workshop and Stars Group meeting were both very well attended! At the former, Price-Whelan (Princeton) showed us PyGaia, a tool from Anthony Brown's group in Leiden to simulate the measurement properties of the Gaia Mission. It is really a noise model. And incredibly useful, and easy to use.

In the Stars meeting, so many things! Andrew Mann (Columbia) spoke about the reality or controversies around Planet 9, which got us arguing also about claims of extra-solar asteroids. Kopytova (ASU) described her project to sensitively find chemical abundance anomalies among stars with companions, and asked the audience to help find ways that true effects could be spoofed. Her method is very safe, so it takes a near-conspiracy, I think, but Brewer (Yale) disagreed. Veselin Kostov (Goddard) talked about searching for circumbinary planets. This is a good idea! He has found a few in Kepler but believes there are more hidden. It is interesting for TESS for a number of reasons, one of which is that you can sometimes infer the period of the exoplanet with only a short stretch of transit data (much shorter than the period), by capitalizing on a double-transit across the binary.

Didier Queloz (Cambridge) was in town for the day. Bedell (Flatiron) and I discussed with him next-generation projects for HARPS and new HARPS-like instruments. He is pushing for extended campaigns on limited sets of bright stars. I like this idea for its statistical and experimental-design simplicity! But (as he notes) it is hard to get the heterogeneous community behind such big projects. He has a project to pitch, however, if people are looking to buy in to new data sources. He, Bedell, and I discussed what we know about limits to precision in this kind of work. We aren't far apart, in that we all agree that HARPS (and its competitors) are extremely well calibrated machines, much better calibrated than the end-to-end precision obtained.