jackknife, radial migration, and chaos

Yesterday, in conversation with Andrew Mann (Columbia), Jessica Birky (UCSD) and I decided that she should do a full set of jackknife tests on her Cannon model of APOGEE M-dwarf stars. She did that overnight (I love working with such great people!) and the results indeed show that we don't have much good metallicity information about the M-dwarf stars in the training set we have. This inspired Mann to look for more training-set objects; he found a few dozen more, with a bit more metallicity span. Excellent.
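
For readers who have not run one, here is a minimal sketch of a leave-one-out jackknife test. It is not Birky's Cannon code: a plain linear least-squares model stands in for The Cannon, the data are synthetic, and the function name `jackknife_predictions` is made up for illustration.

```python
import numpy as np

def jackknife_predictions(X, y):
    """Leave-one-out predictions from a linear least-squares model.

    X : (n_stars, n_features) array, a hypothetical stand-in for spectra
    y : (n_stars,) array of labels (for example, metallicity)
    """
    n = len(y)
    A = np.hstack([np.ones((n, 1)), X])  # constant term plus features
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i          # drop star i from the training set
        coeffs, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        preds[i] = A[i] @ coeffs          # predict the held-out star
    return preds

# Synthetic training set (assumption: labels are linear in the features).
rng = np.random.default_rng(42)
n_stars, n_features = 40, 5
X = rng.normal(size=(n_stars, n_features))
true_w = rng.normal(size=n_features)
y = X @ true_w + 0.1 * rng.normal(size=n_stars)

preds = jackknife_predictions(X, y)
rms = np.sqrt(np.mean((preds - y) ** 2))
print(f"jackknife RMS error: {rms:.3f}")
```

If the held-out predictions scatter about the true labels by much more than the label uncertainties, the training set does not carry much information about that label.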

In the afternoon, Kathryn Johnston (Columbia) organized a meeting of the Local Local-Group Group. As it were. There were many interesting things discussed. Megan Bedell (Flatiron) showed her Solar-twin abundances and this got a lot of interesting discussion going about their use to constrain Galactic chemical evolution and radial migration in the Milky Way disk. In particular, they could be very constraining if stellar birth composition is a nearly-unique function of time and Galactocentric radius. There were also questions about whether she can constrain nucleosynthetic yields, which I think she can!

Also in that session Tomer Yavetz spoke about chaos and chaotic orbits, and the properties of stellar streams thereon. He had a nice explanation for why chaos shows up so quickly and clearly in stellar streams: The relevant timescale is not the Lyapunov time, but the time it takes for orbits to wander around their local neighborhood in frequency space, which can be a much shorter time (short reason: because that frequency neighborhood can be small). I hope this is correct, because it has been a puzzle!


ages of M stars

At Stars group meeting, Rocio Kiman (CUNY) showed some beautiful results comparing activity indicators for late-type dwarf stars with kinematic measurements. The stars that are older by activity indicators (less activity) are clearly also older kinematically (higher vertical velocity dispersion). The data are the clearest I have ever seen in this age world. Her goal is to build a hierarchical model of the different age indicators to cross-calibrate them and deliver the highest possible precision in age measurements. She is ready for Gaia DR2!

In Gaia DR2 prep meeting, we went back through our Gaia projects. David Spergel (Flatiron) pointed out to us that the Gaia Archive is fair game for NASA ADAP proposals, which reminded me that I have some serious proposal-writing to do, asap! The discussion of projects got me very excited for April 25, which I think will be a fun celebration of everything astrometric.


the very local neighborhood

Today Jackie Faherty (AMNH) gave the astro seminar at NYU. She got us fired up about Gaia even before her talk, at lunch, where she said that on April 25 the curtains would finally open and we would get to see the Milky Way for the first time! Her seminar didn't disappoint: She pointed out that of the five closest stars to the Sun, three were discovered in 2014! And it appears that the Solar Neighborhood still has lots of secrets for us to discover. She also showed us a star that passed within 60,000 AU of the Sun some 70,000 years ago. That's interesting! If it disturbed comets onto elliptical orbits, we won't see their infall for a few million years! (Just a free-fall argument there.) That observation, combined with things people have found in Gaia DR1, suggests that we have a close encounter like that about once per million years.
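
The free-fall argument can be checked in a few lines. This is a sketch under simple assumptions that are mine, not Faherty's: the comet starts at rest at 60,000 AU and falls toward a point-mass Sun with nothing else acting on it.

```python
import math

def free_fall_time_years(r_au, mass_msun=1.0):
    """Time to fall from rest at r_au to the center, in years.

    Works in units of AU, years, and Solar masses, in which
    G * M_sun = 4 * pi^2. The fall is half the period of a degenerate
    Kepler orbit with semi-major axis r_au / 2.
    """
    gm = 4.0 * math.pi ** 2 * mass_msun
    return (math.pi / 2.0) * math.sqrt(r_au ** 3 / (2.0 * gm))

t_ff = free_fall_time_years(60_000.0)
print(f"free-fall time from 60,000 AU: {t_ff / 1e6:.1f} Myr")
```

Under these assumptions the fall takes roughly 2.6 million years, so an encounter 70,000 years ago has barely started the clock.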


black holes and quantum neural networks

It seems like a low-research month! But at lunch time, Gia Dvali (NYU) gave us a very surprising blackboard talk in which he compared a black-hole horizon (which contains an enormous number of microstates, implied by the black-hole entropy argument) to a quantum neural network with a particular kind of Hamiltonian term on each edge. In the network, there is an occupation number for the states in which there is an exponential increase in the number of microstates, which he was arguing is similar to the huge increase in entropy when a black hole forms. That's interesting! But there was plenty of skepticism in the room about its significance. Discussion was heated, especially afterwards.


model of everything

This morning was the usual parallel-working session at NYU. We discussed various things: Boris Leistedt (NYU) has a draft of a paper that fits galaxy photometric data in large-scale structure surveys with a model that includes galaxy types, flexible SEDs for every type, filter bandpasses, calibration issues or offsets, and luminosity distributions as a function of redshift. That is: A model of everything! Well not quite, but close. This could permit the dream of maximizing information extraction from photometric cosmology surveys, or surveys with mixed photometric and spectroscopic targets. The cool thing is that when the model is causally structured like his, you don't need a representative training set for your photometric redshift estimation.

At the same meeting, Elisabeth Andersson (NYU) showed us a matched filter run over some Kepler data to find a (known) exoplanet, and we discussed how to generalize this to find any other planets that are in a resonant orbit with any of the known planets. Her current plan is to fold and filter, at resonant periods.
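
A rough sketch of the fold-and-filter idea, not Andersson's code: fold the light curve at each period in resonance with a known planet, slide a box-shaped transit template over phase, and keep the best signal-to-noise. The light curve is synthetic, the known planet is assumed to be already removed, and every name and number below (`box_matched_filter`, the 10-day known period, the 3:2 hidden planet) is made up for illustration.

```python
import numpy as np

def box_matched_filter(t, flux, ivar, period, duration, n_phase=200):
    """Best box-transit signal-to-noise at one trial period.

    flux is mean-subtracted relative flux and ivar its inverse variance.
    For each trial transit time t0 the data are folded at `period`, the
    inverse-variance-weighted depth inside a box of width `duration` is
    estimated, and the depth is divided by its uncertainty.
    Returns (best_snr, best_t0).
    """
    best_snr, best_t0 = -np.inf, None
    for t0 in np.linspace(0.0, period, n_phase, endpoint=False):
        # time since the nearest transit center, in [-period/2, period/2)
        dt = (t - t0 + 0.5 * period) % period - 0.5 * period
        in_transit = np.abs(dt) < 0.5 * duration
        w = ivar[in_transit].sum()
        if w == 0.0:
            continue
        depth = -(ivar[in_transit] * flux[in_transit]).sum() / w
        snr = depth * np.sqrt(w)
        if snr > best_snr:
            best_snr, best_t0 = snr, t0
    return best_snr, best_t0

# Synthetic light curve: white noise plus one hidden transiting planet.
rng = np.random.default_rng(0)
t = np.arange(0.0, 90.0, 0.02)                 # days
sigma = 1e-3
flux = sigma * rng.normal(size=t.size)
ivar = np.full(t.size, 1.0 / sigma ** 2)

known_period = 10.0                            # hypothetical known planet
true_period, true_t0, depth, duration = 15.0, 4.5, 3e-3, 0.2
dt = (t - true_t0 + 0.5 * true_period) % true_period - 0.5 * true_period
flux[np.abs(dt) < 0.5 * duration] -= depth     # inject the hidden planet

# Search only at periods in low-order resonance with the known planet.
ratios = {"1:2": 0.5, "2:3": 2.0 / 3.0, "3:2": 1.5, "2:1": 2.0}
results = {name: box_matched_filter(t, flux, ivar, known_period * r, duration)
           for name, r in ratios.items()}
best = max(results, key=lambda name: results[name][0])
print(best, results[best])
```

Note that harmonics of the true period (here 1:2 and 2:1) also pick up some signal, so in practice the resonances need to be compared against each other and not just against a threshold.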

Late in the day, Kelle Cruz (CUNY) gave a talk at Flatiron about how to make astronomy better, from an inclusion perspective. Lots of good ideas there; hopefully we can implement them at Flatiron and NYU.


RV information

My only research today was going through the linear algebra of spectro-perfectionism to see if it is possible that s-p preserves radial-velocity information. I think it doesn't, in the sense that there is more RV information in the two-dimensional spectral image than in the one-dimensional spectrum extracted by s-p. But I don't have a full answer yet. Information theory is hard for me!


snow day

Today was a snow day! So everything got cancelled, and I met up anyway with a small team, in an undisclosed location and did my busy work that I have been avoiding. Most of this is not research, but I did comment on some parts of various student and postdoc papers.


tracing dark matter with old stars

The research highlight of a low-research Tuesday was an absolutely great talk by Lina Necib (Caltech), who is looking at empirical methods for understanding the velocity distribution of dark matter in the local volume. Why? Because dark-matter detection experiments depend on it. How? By using an old (but rarely executed) idea that low-metallicity stars, being very old, ought to trace (at least some component of) the dark matter. This method of finding the dark matter is assumption-laden, but so are all the theoretical approaches. It is exciting (for me) to see an empirical approach. Indeed, she finds a lower velocity dispersion than the standard value used in the business; this weakens some of the current experimental limits. It's also a great use of the Gaia and RAVE data.


observing ourselves

In a (nearly) no-research day, we realized that Gaia DR2 will tell us more about ourselves than it will about the Milky Way. And it will tell us a lot about the Milky Way! Can't wait.


radial-velocity information

It was Fisher-information time today with Bedell (Flatiron), where we looked at whether or not spectro-perfectionism can deliver full radial-velocity precision from the 2-d spectrograph data to its 1-d extraction. The answer is unclear. There are two steps to s-p: The first is a least-squares fit, which is definitely information-preserving, but the second is a smoothing back to natural resolution, which might be lossy. We are still working on it. My linear algebra is pushed to its limits.
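
The claim about the first step can be checked numerically. This is a toy sketch, assuming a perfectly known linear projection `A` from the 1-d spectrum to the 2-d pixels and independent Gaussian pixel noise; the matrix, the noise, and the derivative `df_dv` are all random stand-ins. It says nothing about the second (smoothing) step.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pix, n_wave = 300, 40   # flattened 2-d pixels and 1-d spectral bins

# Hypothetical, perfectly known projection from 1-d spectrum to 2-d pixels.
A = rng.uniform(size=(n_pix, n_wave))
# Independent Gaussian pixel noise with known inverse variances.
ivar = 1.0 / rng.uniform(0.5, 2.0, size=n_pix)

# Derivative of the true 1-d spectrum with respect to RV (arbitrary here).
df_dv = rng.normal(size=n_wave)

# Fisher information on v in the 2-d pixels: (A df/dv)^T C^-1 (A df/dv).
dmu_dv = A @ df_dv
info_2d = dmu_dv @ (ivar * dmu_dv)

# The least-squares extraction has inverse covariance A^T C^-1 A, so the
# information on v in the extracted spectrum is df/dv^T (A^T C^-1 A) df/dv.
inv_cov_1d = A.T @ (ivar[:, None] * A)
info_1d = df_dv @ inv_cov_1d @ df_dv

print(info_2d, info_1d)
```

The two numbers agree to rounding, which is just the statement that the least-squares estimate is a sufficient statistic for anything that enters the image only through the 1-d spectrum.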

Ruth Murray-Clay (UCSC) was in town, and we discussed exoplanets at lunch, and she gave a great talk late in the afternoon. At lunch, a highlight was discussing how we might update the expectations for exoplanet discoveries in the Gaia Mission; the papers on this are now way out of date (I think?). In her seminar, a highlight was a very simple, high-level picture of what current theory says about exoplanet formation, and some very simple ideas about critically testing this high-level model.


RV in 2D?

Today Megan Bedell (Flatiron) and I called Julian Stuermer (Chicago) and Ben Montet (Chicago) to talk a bit about spectrographic measurements of radial velocity. We are looking at different extraction methods and how much information they sacrifice: Is it better to be measuring radial-velocity in the two-d image plane of the spectrograph rather than in the one-d extracted spectrum? This is not yet clear, but we have formulated the question in an information-theoretic framework.

This all relates to ancient conversations I had with Sam Roweis about spectroscopy and spectro-perfectionism (s-p). There are so many questions! In s-p, the assumption is that your spectrograph is perfectly calibrated in every way; in this case, how do you extract all the information? But the real world isn't so great, so there might be lots of experiments to do in different regimes of realism, either about the spectrograph, or about the noise, or about calibration imperfections. I promised the crew that I would figure out the Cramér–Rao bound on radial-velocity in the two-d spectrograph image and in the s-p extraction under perfect conditions.
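
For concreteness, here is a sketch of the Cramér-Rao calculation in the simplest 1-d case. It is a toy, not the promised 2-d result: the spectrum is a single made-up Gaussian absorption line, the noise is independent and Gaussian with a fixed per-pixel signal-to-noise, and the spectral model is assumed to be known exactly.

```python
import numpy as np

C_LIGHT = 299_792_458.0  # m/s

def toy_spectrum(wave, v):
    """Continuum-normalized spectrum with one Gaussian absorption line,
    Doppler shifted by velocity v (m/s, non-relativistic)."""
    line_center = 5000.0 * (1.0 + v / C_LIGHT)   # Angstroms
    line_sigma, line_depth = 0.05, 0.5           # made-up line parameters
    return 1.0 - line_depth * np.exp(
        -0.5 * ((wave - line_center) / line_sigma) ** 2)

def rv_cramer_rao(wave, ivar, dv=1.0):
    """Cramér-Rao lower bound on the RV uncertainty (m/s) for independent
    Gaussian noise with inverse variances ivar."""
    # centered finite difference for the derivative of the model w.r.t. v
    dmu_dv = (toy_spectrum(wave, dv) - toy_spectrum(wave, -dv)) / (2.0 * dv)
    fisher = np.sum(ivar * dmu_dv ** 2)
    return 1.0 / np.sqrt(fisher)

wave = np.arange(4999.0, 5001.0, 0.01)   # 0.01 Angstrom pixels
snr = 100.0                              # per-pixel signal-to-noise
ivar = np.full(wave.size, snr ** 2)
sigma_v = rv_cramer_rao(wave, ivar)
print(f"CRB on RV from one line: {sigma_v:.1f} m/s")
```

The same function works for a flattened 2-d image if `toy_spectrum` is replaced by a model for the expected pixel values, which is where the comparison with the s-p extraction would come in.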


plots, WDs, and chemistry at group meetings

Group meetings were fun today. In Gaia DR2 prep meeting, I worked with Megan Bedell (Flatiron) to get some plots ready for Gaia DR2. That is, we planned what we will plot the moment that the data release happens. The goal is to look at physical and kinematic properties of exoplanet host stars.

In Stars meeting, JJ Hermes (UNC) showed some incredible WD lightcurves, which appear to come from white dwarfs that have quadrupolar temperature distortions on their surfaces, rotating. There appears to be a common sub-type of white dwarfs that show evidence of magnetism and strong surface temperature variations. We discussed things to do with Gaia and other data sources.

John Brewer (Yale) showed hot-off-the-presses results on chemical-abundance variations within Praesepe. This cluster has some really strange properties, like an amazingly low velocity dispersion. But he finds a chemical-abundance variation in iron but also elements ratioed to iron. This is in qualitative disagreement with work I have done with Melissa Ness (Columbia), so there is something to work out there. We discussed critical tests of his methods and results.


basis-function expansions for the MW

Today Kathryn Johnston (Columbia) organized a few-hour meeting at Flatiron to discuss kinematic or dynamical models of the Milky Way that would have far more flexibility than the models we have used up to now. That is, models that employ basis-function expansions or highly parameterized perturbations away from the toy models currently used in Galaxy dynamics. Part of the discussion was about expansions that help with making simulations more accurate, but some (and the part I cared about) was about making data analyses better.

Many good ideas came up for near-term projects. One was a refinement of an idea with Chervin Laporte (UVic) to use his disk simulations to make empirical basis functions from simulation snapshots that would permit us to make flexible but interpretable models of the disk in the Gaia data. Connected to this was the idea of making such basis functions not in 3-d density or potential space but in 6-d phase-space-density space. That could be valuable both for data analysis with Gaia and for theory. Indeed, the things that Martin Weinberg (Amherst) has been thinking about in basis functions might be expandable to 6-d.

There was much discussion about how such basis-function expansions might make data or theory descriptions of the Milky Way (or simulations thereof) compact. This is a dimensionality-reduction issue. There was more-or-less consensus that we should only be thinking about linear dimensionality reduction (which is good, because it can often be made into a convex optimization problem), but that non-linear generalizations could be worth thinking about.

In some ways, the most impressive aspect of the day was the community-building activity. Johnston got together groups of people that have not usually collaborated and set up the conditions under which they might actually collaborate. She is not just an extremely insightful and accomplished physicist: She is really thinking about improving the long-term health of the fields in which she works.


machine learning for astronomers

There is a Monday seminar at Princeton run by the astrophysics graduate students that focuses on useful skills and knowledge around research, rather than research results. That's a good idea!

I gave the seminar today; I spoke about machine learning in astronomy. I started with my ML taxonomy and my recommendation to understand five beautiful, simple, and instructive examples: SVM, linear regression, PCA, k-means, and GMM with the EM algorithm. How's that for acronyms! I think each of these five methods is so beautiful, everyone should know how each of them works and generalizes.

Each of these methods is in a different taxonomic category (in order: classification, regression, dimensionality reduction, clustering, and density estimation), and each is beautiful. The first three are linear and convex, and each (for related reasons) can be generalized with the kernel trick. In the second half of my talk I discussed this, but my explanation went off the rails. I think I left everyone confused. Time to do more homework.
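
Here is a compact sketch of the kernel trick for the regression case. It uses ridge (regularized linear) regression, which is my choice for the illustration: the predictions can be written entirely in terms of inner products between data points, so the primal and dual forms agree for a linear kernel, and swapping in another positive-definite kernel gives a non-linear method for free. The data and function names are made up.

```python
import numpy as np

def ridge_primal(X, y, lam):
    """Ridge weights from the (n_features x n_features) linear system."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def ridge_dual(K, y, lam):
    """Dual coefficients from the (n_data x n_data) kernel matrix K."""
    n = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(n), y)

def rbf_kernel(Xa, Xb, scale=1.0):
    """Squared-exponential kernel between two sets of points."""
    d2 = ((Xa[:, None, :] - Xb[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / scale ** 2)

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)
X_new = rng.normal(size=(5, 3))
lam = 0.1

# Linear kernel: the dual prediction matches the primal one.
pred_primal = X_new @ ridge_primal(X, y, lam)
pred_dual = (X_new @ X.T) @ ridge_dual(X @ X.T, y, lam)

# Kernel trick: replace every inner product with a kernel evaluation.
pred_rbf = rbf_kernel(X_new, X) @ ridge_dual(rbf_kernel(X, X), y, lam)
print(np.allclose(pred_primal, pred_dual))
```

The cost is that the dual problem scales with the number of data points, not the number of features.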


#TESSninja, day 5

The day started with a discussion or break-out about making a latent-variable structure for the incredible result by Guy Davies (Birmingham) that the power spectra of red giants in an open cluster lie on a one-dimensional locus. Details include: He is only looking at the overall envelope of the power spectrum, parameterized by 8-ish parameters. His 8-ish parameters follow a one-dimensional locus of power laws with respect to each other, except one. That one is the white-noise level, and it makes sense that it is different. So he has a two-dimensional model that seems to fit extremely well the power spectrum of every single star in an open cluster observed by Kepler!

This discussion merged into a longer discussion, code-named Light-Curve Cannon, with contributions from many people, looking at how the time-domain behavior of stars on different time scales can be used to predict or infer stellar parameters. It is extremely promising that TESS-like time-domain data will be able to tell you stellar parameters at comparable precision to contemporary spectroscopic modeling! Ruth Angus (Columbia) did a great job of bringing together the threads in these discussions: There are many papers to write.

The day ended with a wrap-up in which everyone contributed one slide and spoke for less than two minutes. Here are the wrap-up slides. They only give you the tiniest hint at all the things that happened this week!

Thank you to Dan Foreman-Mackey (Flatiron) and the Flatiron CCA staff and the Simons Foundation events staff for an absolutely great meeting. In particular, Foreman-Mackey's vision, leadership, technical abilities, and good nature got everyone participating and working together. That's community building.


#TESSninja, day 4

Today was a short day at #TESSninja for me, because I had [life events]. But in the morning, I spent some time working with [unnamed participants] and I managed, through my efforts, to fully bork their code. I guess I really, really don't understand Python packages. I felt bad about that. You are supposed to move fast and break things and fail fast but I often participate in projects in such a way that I feel like I make them worse!

I also spoke with Ellie Schwab Abrahams (AMNH) and Ben Montet (Chicago) about linear regression to calibrate a Kepler light curve. You can think of calibration as a kind of regression (predicting data using housekeeping data); we worked out what that would look like and got Schwab Abrahams on to gathering the housekeeping data.
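
A minimal sketch of calibration-as-regression, with synthetic data standing in for a real Kepler light curve: the housekeeping channels, their coefficients, and the noise level are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
n_times = 500

# Hypothetical housekeeping time series (e.g. temperatures, pointing offsets).
housekeeping = rng.normal(size=(n_times, 3))
true_coeffs = np.array([0.02, -0.01, 0.005])
flux = 1.0 + housekeeping @ true_coeffs + 1e-3 * rng.normal(size=n_times)

# Design matrix: a constant plus the housekeeping channels.
A = np.hstack([np.ones((n_times, 1)), housekeeping])
coeffs, *_ = np.linalg.lstsq(A, flux, rcond=None)

# Subtract only the part of the flux predicted by the housekeeping data.
systematics = A[:, 1:] @ coeffs[1:]
calibrated = flux - systematics
print(coeffs)
```

One caveat of this approach is that any stellar variability that happens to correlate with the housekeeping data will get regressed out too.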


#TESSninja, day 3

My plan for #TESSninja is to work on automated approaches to radial-velocity follow-up of TESS discoveries. I am bringing some new things to this question. The first is that I am not going to ask “when should I next observe this planet candidate?”, I am going to ask “I have telescope time right now, which of my follow-up objects should I observe next?”. The second new thing is that I think that it is insufficient to make this decision only on the basis of information obtained in this observation. It should be made based on the future discounted information that it unlocks or makes available, under assumptions about observing into the future.

This second point was a breakthrough for me. It comes from this point: Imagine that you are using RV measurements to measure precise periods, and you want period information. The first observation you make gives you no period information whatsoever: It only constrains the overall system velocity! So you would never make that first observation if you cared only about the immediate information gain on period. You have to think about the future information-gain potential that your observation unlocks, discounted by your discount rate. Or even more complex objectives (yes, cash flow ought to be involved).
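
The point about the first observation can be made concrete with the Fisher matrix. This is a sketch assuming a circular-orbit model v(t) = v0 + K sin(2 pi t / P + phase) with known Gaussian uncertainties; the parameter values and observation times are made up.

```python
import numpy as np

def rv_gradient(t, v0, K, period, phase):
    """Gradient of v(t) = v0 + K sin(2 pi t / period + phase) with respect
    to (v0, K, period, phase); shape (n_obs, 4)."""
    arg = 2.0 * np.pi * t / period + phase
    return np.stack([
        np.ones_like(t),
        np.sin(arg),
        -K * np.cos(arg) * 2.0 * np.pi * t / period ** 2,
        K * np.cos(arg),
    ], axis=-1)

params = (0.0, 10.0, 3.7, 0.4)   # hypothetical v0, K (m/s), period (d), phase
sigma = 1.0                       # m/s per observation
times = np.array([0.3, 1.1, 2.6, 4.0, 5.9, 7.2])

# The Fisher matrix is F = G^T G / sigma^2, which has the same rank as G:
# the number of parameter combinations the data constrain at all.
ranks = [int(np.linalg.matrix_rank(rv_gradient(times[:n], *params)))
         for n in range(1, len(times) + 1)]
print(ranks)

# With all six observations F is invertible and the bound on P is finite.
G = rv_gradient(times, *params)
fisher = G.T @ G / sigma ** 2
period_sigma = np.sqrt(np.linalg.inv(fisher)[2, 2])
print(period_sigma)
```

With one observation the Fisher matrix has rank one, so the marginal bound on the period is infinite; it only becomes finite at the fourth observation. A greedy rule based on immediate period information would therefore never start observing.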

In other news, Guy Davies (Birmingham) made a nice point in discussion of the time-domain behavior of stars in an open cluster observed by Kepler: Because these stars ought to be the same age, and the same composition, and (on the red-giant branch) nearly the same mass, the asteroseismological (and jitter) signals ought to—in some sense—lie along a one-dimensional sequence in the relevant space. That's a great idea; I want to test that.


#TESSninja, day 2

The highlight today of #TESSninja was Ashley Villar (Harvard) showing the lightcurve of a supernova discovered in the K2 mission, with models over-plotted. It appears that the supernova is a type Ia, but the early-time light curve (and K2 was observing it well before the start) is not consistent with any null type Ia models. The early time requires an interaction of the explosion with some nearby material, probably a companion star! This is an important discovery and (I think) a first!

Earlier in the day I worked with Ellie Schwab (AMNH) and Ben Montet (Chicago) on detrending a particular low-mass star that Schwab is interested in. We discussed how to combine the full-frame image information (where we know more about calibration and integrated photometry) with the long-cadence data (where we have a limited aperture and know less).


#TESSninja, day 1

Today was the first day of Preparing for TESS, organized by Dan Foreman-Mackey (Flatiron) and others. It is organized like the #GaiaSprint in that it is a hack week, starting with pitches and dedicated to getting stuff done. The crew pitched some great ideas on day one and then hacked. I am trying to work on algorithmic approaches to efficient radial-velocity follow-up.

Melissa Ness (Columbia) and Megan Bedell (Flatiron) started an interesting project to follow up anomalous stars in an open cluster: Do the stars with element-abundance anomalies also show anomalies in the time domain or in asteroseismology? Many other projects are working towards obtaining cleaned or calibrated light curves, although my heart sang when various people (notably Rodrigo Luger at UW) pointed out that we don't want to de-trend, we want to have a model that explains every light curve as a combination of spacecraft and stellar variability (and planets).


#siRTDM18, day 5

Armin Rest (STScI) gave a nice talk about time-domain astronomy, with stuff about finding Earth-impactors and also light echoes. After his talk, I told him the insane project conceived by Rix, Schölkopf, and me about modeling the whole Milky Way as a set of flickering light sources and a three-dimensional map of dust, using time-domain imaging at very low brightness. That's probably not possible! Rest is part of a big new sky survey for near-earth asteroids, which will also do a lot of variable-star science.

After that, Sarah Richardson (Microbyre) talked about automating various aspects of phylogeny for various kinds of microbes. I was impressed by the robotics setups available to biologists! Her talk also contained a lot of biology-101 content for the physicists and engineers; I learned a lot (and felt, once again, my regret that I didn't take more biology in college!).

Late in the day, Josh Bloom (Berkeley) and I did some real-time decision-making at the Emeryville card room.


#siRTDM18, day 4

Today was road-traffic day at Real-time Decision Making at Berkeley. Jane MacFarlane (LBNL) and Alexandre Bayen (Berkeley) gave great talks about road dynamics. In MacFarlane's talk I learned that the mobile-phone location information delivered by providers is posterior information, not likelihood information. And the priors are outrageously informative (like that every phone is on the midline of a known road!). That is good for the user (the mobile-phone owner), who wants navigation information, but not good for anyone trying to do hierarchical inference over phones or people! This is closely related to the issues that Alex Malz (NYU) is working on in cosmology.

Bayen focused on the influence of mobile phones on traffic, which has been immense! As mobile phones have gained traction with drivers, they have driven traffic patterns to a non-optimal Nash equilibrium, where all paths from point A to B take the same amount of time. But these same phones also create crazy new nonlinear dynamics, because all drivers get re-routed simultaneously to a small number of alternate routes when something goes wrong. And it is like a repeating multiplayer game, because each routing company is constantly learning the dynamics induced by all the other companies! But this game is played out in the parameters of a set of differential equations, so it is crazy.
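
The textbook illustration of a non-optimal equilibrium of this kind is Pigou's two-route example, sketched here; it is not from Bayen's talk. One route is congestible (travel time equal to the fraction of drivers using it) and the other always takes time 1.

```python
def average_travel_time(x):
    """Mean travel time when a fraction x of drivers take the congestible
    route (time = x) and the rest take the fixed route (time = 1)."""
    return x * x + (1.0 - x) * 1.0

# Nash (Wardrop) equilibrium: nobody can gain by switching, which here
# means everyone uses the congestible route and both routes take time 1.
x_nash = 1.0

# Social optimum: minimize the average travel time over a grid of splits.
grid = [i / 1000.0 for i in range(1001)]
x_opt = min(grid, key=average_travel_time)

print(average_travel_time(x_nash), x_opt, average_travel_time(x_opt))
```

At equilibrium every path takes the same time and the average is 1; if half the drivers were sent down the fixed route, the average would drop to 0.75, but no individual driver has an incentive to be one of them.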

Things would be better if we could find a way to cooperate; this led to great lunch discussions with Josh Bloom (Berkeley). We discussed ways to capitalize on the fact that different drivers have different objectives. No existing apps capture this at all: They all optimize for the triviality of minimum expected travel time!