#LennartFest day 2

Many great things happened at the meeting today; way too many to mention. Steinmetz showed how good the RAVE-on results are, and also nicely described their limitations. Korn showed an example of an extremely underluminous star, and discussed possible explanations (most of them boring data issues). Brown explained that with a Gaia mission extension, the parameter inference for exoplanet orbit parameters can improve as a huge power (like 4.5?) of mission lifetime. That deserves more thought. Gerhard explained that the MW bar is a large fraction of the mass of the entire disk! Helmi showed plausible halo substructure and got me really excited about getting ready for Gaia DR2. In the questions after her talk, Binney claimed that galaxy halos don't grow primarily by mergers, not even in theory! Hobbs talked about a mission concept for a post-Gaia NIR mission (which would be incredible). He pointed out that the reference frame and stellar positions require constant maintenance; the precision of Gaia doesn't last.

One slightly (and embarrassingly) frustrating thing about the talks today was that multiple speakers discussed open clusters without noting that we found a lot ourselves. And several discussed the local standard of rest without mentioning our value. Now of course I (officially) don't mind; neither of these is a top scientific objective for me (I don't even think the LSR exists). But it is a Zen-like reminder not to be attached to material things (like citations)!


#LennartFest day 1

I broke my own rules and left #AstroHackWeek to catch up with #LennartFest. The reason for the rule infraction is that the latter meeting is the retirement celebration of Lennart Lindegren (Lund) who is one of the true pioneers in astrometry, and especially astrometry in space and at scale. My loyal reader knows his influence on me!

Talks today were somewhat obscured by my travel exhaustion. But I learned some things! François Mignard (Côte d'Azur) gave a nice talk on the reference frame. He started with an argument that we need a frame. I agree that we want inertial proper motions, but I don't agree that they have to be on a coordinate grid. If there is one thing that contemporary physics teaches us it is that you don't need a coordinate system. But the work being done to validate the inertial-ness of the frame is heroic, and important.

Floor van Leeuwen (Cambridge) spoke about star clusters. He hypothesized—and then showed—that proper motions can be as informative about distance as parallaxes, especially for nearby clusters. This meshes with things Boris Leistedt (NYU) and I have been talking about, and I think we can lay down a solid probabilistic method for combining these kinds of information responsibly.

Letizia Capitanio (Paris) reminded us (I guess, but it was new to me) that the Gaia RVS instrument captures a diffuse interstellar band line. This opens up the possibility that we could do kinematic dust mapping with Gaia! She also showed some competitive dust maps based on Gaussian Process inferences.


#AstroHackWeek day 2

Today Jake VanderPlas (UW) and Boris Leistedt (NYU) gave morning tutorials on Bayesian inference. One of Leistedt's live exercises (done in Jupyter notebooks in real time) involved increasing model complexity in a linear fit and over-fitting. I got sucked into this example:

I made a model where y(x) is a (large) sum of sines and cosines (like a Fourier series). I used way more terms than there are data points, so over-fitting is guaranteed, in the case of maximum likelihood. I then did Bayesian inference with this model, but putting a prior on the coefficients that is more restrictive (more informative) as the wave number increases (or the wavelength decreases). This model is a well-behaved Gaussian process! It was nice to see the continuous evolution from fitting a rigid function to fitting a Gaussian process, all in just a few lines of Python.


#AstroHackWeek day 1

AstroHackWeek 2017 kicked off today, with Adrian Price-Whelan (Princeton) and I doing a tutorial on machine learning. We introduced a couple of ideas and simple methods, and then we set ten (randomly assigned) groups working on five methods on two different data sets. We didn't get very far! But we tried to get the discussion started.

In the afternoon hacking sessions, Price-Whelan and I looked at some suspicious equations in the famous Binney & Tremaine book, 2ed. In Chapter 4, there are lots of integrals of phase-space densities and we developed an argument that some of these equations must have wrong units. We can't be right—Binney & Tremaine can't be wrong—because all of Chapter 4 follows from these equations. But we don't see what we are doing wrong.

[Note added later: We were wrong and B&T is okay; but they don't define their distribution functions very clearly!]



I just returned from an off-grid eclipse trip; hence no posting. Also, it was vacation! The total eclipse (which I saw in central Oregon) was extremely dramatic: The air got cold, the sky got so dark we could see the stars, and there was a quasi-sunset around the whole horizon. The corona of the Sun could be seen, during totality, out to more than a solar diameter off the limb of the Sun. During late stages of partiality, the shadows created an amazing demonstration of the sense in which a shadow is a convolution of a mask with the illumination, and how that convolution depends on distance (between the object and its shadow). I have much more to say about all this (and photographs), but I will save it for one of my other blogs!

The only research I did during the week was discussion of near-term projects with Bernhard Schölkopf (MPI-IS) and Hans-Walter Rix (MPIA), which will involve building latent-variable models (and related things) on stellar spectra. In some down time, Schölkopf also helped me with some non-trivial linear algebra!


a mistake in an E-M algorithm

[I am on quasi-vacation this week, so only posting irregularly.]

I (finally—or really for the N-th time, because I keep forgetting) understood the basis of E-M algorithms for optimizing (what I call) marginalized likelihoods in latent-variable models. I then worked out the equations for the E-M step for factor analysis, and a generalization of factor analysis that I hope to use in my project with Christina Eilers (MPIA).
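For concreteness, here is a minimal numpy sketch of the E-M iteration for plain factor analysis, written from the update equations as I derived them (the variable names, iteration count, and zero-mean assumption are mine):

```python
import numpy as np

def factor_analysis_em(X, k, n_iter=200):
    """E-M for the factor-analysis model x = Lambda z + noise, with
    z ~ N(0, I_k) and noise ~ N(0, diag(Psi)). X has shape (N, D) and is
    assumed zero-mean. Returns (Lambda, Psi)."""
    rng = np.random.default_rng(0)
    N, D = X.shape
    Lam = rng.normal(size=(D, k))
    Psi = np.ones(D)
    S = X.T @ X / N                   # empirical second-moment matrix
    for _ in range(n_iter):
        # E-step: Gaussian posterior over the latent z for each datum
        G = np.linalg.inv(np.eye(k) + (Lam.T / Psi) @ Lam)  # Cov[z | x]
        Ez = X @ (Lam / Psi[:, None]) @ G                   # posterior means
        Ezz = N * G + Ez.T @ Ez                             # sum_n E[z z^T]
        # M-step: re-estimate loadings, then the diagonal noise variances
        Lam = (X.T @ Ez) @ np.linalg.inv(Ezz)
        Psi = np.diag(S - Lam @ (Ez.T @ X) / N).copy()
    return Lam, Psi
```

A synthetic-data test (generate X from known parameters, then check that Lambda Lambda^T + diag(Psi) reproduces the sample covariance) is a good way to adjudicate between conflicting write-ups of the update step.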

Imagine my concern when I got a different update step than I find in the writings of my friend and mentor Sam Roweis (deceased), who is the source of all knowledge, as far as I am concerned! I spent a lot of time looking up stuff on the web, and most things agree with Roweis. But finally I found this note by Andrew Ng (Stanford / Coursera), which agrees with me (and disagrees with Roweis).

If you care about the weeds, the conflict is between equation (8) in those Ng notes and page 3 of these Roweis notes. It is a subtle difference, and it takes some work to translate notation. I wonder if the many documents that match Roweis derive from (possibly unconscious) propagation from Roweis, or whether the flow is in another direction, or whether it is just that the mistake is an easy one to make? Oddly, Ng decorates his equation (8) with a warning about an error you can easily make, but it isn't the error that Roweis made.

So much of importance in computer science and machine learning is buried in lecture notes and poorly indexed documents in user home pages. This is not a good state of affairs!


serious bugs; dimensionality reduction

Megan Bedell (Chicago) and I had a scare today: Although we can show that in very realistically simulated fake data (with unmodeled tellurics, wrong continuum, and so on) a synthetic spectrum (data-driven) beats a binary mask for measuring radial velocities, we found that in real data from the HARPS instrument that the mask was doing better. Why? We went through a period of doubting everything we know. I was on the point of resigning. And then we realized it was a bug in the code! Whew.

Adrian Price-Whelan (Princeton) also found a serious bug in our binary-star fitting. The thing we were calculating as the pericenter distance is the distance of the primary-star center-of-mass to the system barycenter. That's not the minimum separation of the two stars! Duh. That had us rolling on the floor laughing, as the kids say, especially since we might have gotten all the way to submission without noticing that absolutely critical bug.

At the end of the day, I gave the Königstuhl Colloquium, on the blackboard, about dimensionality reduction. I started with a long discussion about what is good and bad about machine learning, and then went (too fast!) through PCA, ICA, kernel-PCA, PPCA, factor analyzers, HMF, E-M algorithms, latent-variable models, and the GPLVM, drawing connections between them. The idea was to give the audience context and jumping-off points for their projects.



Today, in an attempt to make our simulated extreme-precision radial-velocity fake data as conservative as possible, Megan Bedell (Chicago) and I built a ridiculously pessimistic model for un-modeled (and unknown) telluric lines that could be hiding in the spectra, at amplitudes too low to be clearly seen in any individual spectrum, but with the full wavelength range bristling with lines. Sure enough, these “micro-tellurics” (as you might call them) do indeed mess up radial-velocity measurements. The nice thing (from our perspective) is that they mess up the measurements in a way that is covariant with barycentric velocity, and they mess up synthetic-spectrum-based RV measurements less than binary-mask-based RV measurements.
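A cartoon version of the setup is easy to build (all the numbers below, line density, depths, and widths, are invented for illustration; they are not the values from our simulation):

```python
import numpy as np

rng = np.random.default_rng(42)

# wavelength grid (angstroms) and a toy stellar spectrum with a few lines
wave = np.linspace(5000., 5100., 4096)

def absorption_line(wave, center, depth, width=0.05):
    return 1. - depth * np.exp(-0.5 * ((wave - center) / width) ** 2)

star = np.ones_like(wave)
for c in rng.uniform(5000., 5100., 20):
    star *= absorption_line(wave, c, rng.uniform(0.2, 0.8))

# "micro-tellurics": the full range bristles with lines that are
# individually far too weak to see in any single spectrum
telluric = np.ones_like(wave)
for c in rng.uniform(5000., 5100., 500):
    telluric *= absorption_line(wave, c, rng.uniform(0.001, 0.01))

# tellurics sit in the observatory frame while the stellar lines shift
# with barycentric velocity, so the contamination co-varies with it
c_kms = 299792.458
def observed(v_bary_kms):
    shifted = np.interp(wave, wave * (1. + v_bary_kms / c_kms), star)
    return shifted * telluric
```

Because the telluric factor is fixed while the stellar spectrum Doppler-shifts from epoch to epoch, the residuals correlate with barycentric velocity, which is the signature described above.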

At MPIA Galaxy Coffee, Irina Smirnova-Pinchukova (MPIA) gave a great talk about her trip on a SOFIA flight.


machine learning, twins, excitation temperature

After our usual start at the Coffee Nerd, it was MW Group Meeting, where we discussed (separately) Cepheids and Solar-neighborhood nucleosynthesis. On the latter, Oliver Philcox (St Andrews) has taken the one-zone models of Jan Rybizki (MPIA) and made them 500 times faster using a neural-network emulator. This emulator is tuned to interpolate a set of (slowly computed) models very quickly and accurately. That's a good use of machine learning! Also, because of backpropagation, it is possible to take derivatives of the emulator with respect to its inputs (I think), which delivers gradients for optimization and sampling.
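To illustrate the backpropagation point (this is a toy with random, untrained weights, not Philcox's emulator; all shapes and names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)

# a tiny one-hidden-layer "emulator", standing in for a network trained to
# interpolate slowly computed chemical-evolution models
W1 = rng.normal(size=(3, 32))    # 3 input model parameters
b1 = rng.normal(size=32)
W2 = rng.normal(size=(32, 5))    # 5 predicted outputs (say, abundances)
b2 = rng.normal(size=5)

def emulate(theta):
    """Forward pass: the fast surrogate for the slow model."""
    h = np.tanh(theta @ W1 + b1)
    return h @ W2 + b2

def emulator_jacobian(theta):
    """Backpropagation through the tanh layer: d(output)/d(input),
    shape (5, 3), at essentially the cost of one forward pass."""
    h = np.tanh(theta @ W1 + b1)
    return (W2.T * (1. - h ** 2)) @ W1.T
```

The Jacobian costs about the same as a forward pass, which is why emulator gradients come essentially for free, ready for optimizers or samplers.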

The afternoon's PSF Coffee meeting had presentations by Meg Bedell (Chicago) about Solar Twin abundances, and by Richard Teague (Michigan) about the protoplanetary disk TW Hya. On the former, Bedell showed that she can make extremely precise measurements, because a lot of theoretical uncertainties cancel out. She finds rock-abundance anomalies (that is, abundance anomalies that are stronger in high-condensation-temperature lines) all over the place, which is context for results from Semyeong Oh (Princeton). On TW Hya, Teague showed that it is possible to get pretty consistent temperature information out of line ratios. I would like to see two-dimensional maps of those: Are there embedded temperature anomalies in the disk?


latent-variable model; bound-saturating EPRV

Today, Christina Eilers (MPIA) and I switched her project over to a latent variable model. In this model, stellar spectra (every pixel of every spectrum) and stellar labels (Teff, logg, and so on for every star) are treated on an equal footing as “data”. Then we fit an underlying low-dimensional model to all these data (spectra and labels together). By the end of the day, cross-validation tests were pushing us to higher and higher dimensionality for our latent space, and the quality of our predictions was improving. This seems to work, and is a fully probabilistic generalization of The Cannon. Extremely optimistic about this!
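A toy version of the setup, with a plain SVD standing in for the fully probabilistic model (all sizes, names, and noise levels below are invented):

```python
import numpy as np

rng = np.random.default_rng(3)

# fake data: N stars, P spectral pixels, Q labels, all generated from a
# K-dimensional latent vector
N, P, Q, K = 300, 50, 3, 4
Z_true = rng.normal(size=(N, K))
spectra = Z_true @ rng.normal(size=(K, P)) + 0.05 * rng.normal(size=(N, P))
labels = Z_true @ rng.normal(size=(K, Q)) + 0.05 * rng.normal(size=(N, Q))

# spectra and labels on an equal footing: one joint data matrix
Y = np.concatenate([spectra, labels], axis=1)
mu = Y.mean(axis=0)

# low-dimensional linear latent-variable fit via SVD (plain PCA here)
U, s, Vt = np.linalg.svd(Y - mu, full_matrices=False)
V_spec, V_lab = Vt[:K, :P], Vt[:K, P:]   # joint loadings, split into blocks

def predict_labels(new_spectrum):
    """Infer the latent vector from the spectral block alone (least
    squares), then project it onto the label block."""
    z, *_ = np.linalg.lstsq(V_spec.T, new_spectrum - mu[:P], rcond=None)
    return z @ V_lab + mu[P:]
```

Cross-validating predictions like these against held-out labels is the kind of test that is pushing us toward larger latent dimensionality.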

Also today, Megan Bedell (Chicago) built a realistic-data simulation for our EPRV project, and also got pipelines working that measure radial velocities precisely. We have realistic, achievable methods that saturate the Cramér–Rao bound! That is what we had planned to accomplish over the whole week, not in one day! However, we have a serious puzzle: We can show that a data-driven synthetic spectral template saturates the bound for radial-velocity measurement, and that a binary mask template does not. But we find that the binary mask is so bad, we can't understand how the HARPS pipeline is doing such a great job. My hypothesis: We are wrong that HARPS is using a binary mask.


linear models for stars

My loyal reader knows that my projects with Christina Eilers (MPIA) failed during the #GaiaSprint, and we promised to re-group. Today we decided to take one last attempt, using either heteroskedastic matrix factorization (or another factor-analysis-like method) or probabilistic principal components analysis (or a heteroskedastic generalization thereof). The problem with these models is that they are linear in the data space. The benefit is that they are simple, fast, and interpretable. We start tomorrow.

I made a plausible paper plan with Megan Bedell (Chicago) for our extreme-precision radial-velocity project, in which we assess the information loss from various methods for treating the data. We want to make very realistic experiments and give very pragmatic advice.

I also watched as Adrian Price-Whelan (Princeton) used The Joker to find some very strange binary-star systems with red-clump-star primaries: Since a RC star has gone up the giant branch and come back down, it really can't have a companion with a small periastron distance! And yet...



Various hacking sessions happened in undisclosed locations in Heidelberg this weekend. The most productive moment was that in which—in debugging a think-o about how we combine independent samplings in The Joker—Adrian Price-Whelan (Princeton) and I found a very efficient way to make our samplings adapt to the information in the data (likelihood). That is, we used a predictive adaptation to iteratively expand the number of prior samples we use to an appropriate size for our desired posterior output. (Reminder: The Joker is a rejection sampler.) This ended up speeding up our big parallel set of samplings by a factor of 8-ish!
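The flavor of the adaptation can be sketched in a few lines (this is a toy, not the actual Joker implementation; the toy likelihood and the factor-of-two head-room are my choices):

```python
import numpy as np

rng = np.random.default_rng(8)

def log_likelihood(theta):
    # toy stand-in for The Joker's marginalized likelihood per prior sample
    return -0.5 * ((theta - 1.7) / 0.05) ** 2

def adaptive_rejection_sample(n_wanted=64, n_start=1024, n_max=2 ** 22):
    """Grow the prior-sample pool until enough samples survive rejection."""
    n = n_start
    while True:
        theta = rng.uniform(0., 5., n)    # fresh prior samples
        logl = log_likelihood(theta)
        # standard rejection step: accept with probability L / L_max
        accept = rng.uniform(size=n) < np.exp(logl - logl.max())
        if accept.sum() >= n_wanted or n >= n_max:
            return theta[accept]
        # predictive adaptation: size the next pool from the observed
        # acceptance rate, with a factor of two of head-room
        rate = max(accept.sum(), 1) / n
        n = min(n_max, int(2 * n_wanted / rate))
```

The point is that a cheap first pass predicts the acceptance rate, so the next pass can be sized to deliver the requested number of posterior samples instead of wasting likelihood calls.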


M-dwarf spectral types; reionization

Jessica Birky (UCSD) and I met with Derek Homeier (Heidelberg) and Matthias Samland (MPIA) to update them on the status of the various things Birky has been doing, and discuss next steps. One consequence of this meeting is that we were able to figure out a few well-defined goals for Birky's project by the end of the summer:

Because of a combination of a too-small training set and optimization issues in The Cannon, we don't have a great model for M-dwarf stars (yet) as a function of temperature, gravity, and metallicity. That's too bad! But on the other hand, we do seem to have a good (one-dimensional) model of M-dwarf stellar spectra as a function of spectral type. So my proposal is the following: We use the type model to paint types onto all M-dwarf stars in the APOGEE data set; these types will probably correlate very well with temperature across a range of metallicities. We then use those results to make recommendations about what spectral modeling would lead to a good model in the more physical parameters.

Late in the day, José Oñorbe (MPIA) gave a great talk about the empirical study of reionization. He began with a long and much-needed review of all the ways you can measure reionization, using radio imaging, the Lyman-alpha forest, damping wings, cosmic microwave background polarization, and so on. This brought together a lot of threads I have been hearing about over the last few years. He then showed his own work on the Lyman-alpha forest, where they exploit the thermodynamic memory the low-density gas has about its thermal history. They get good results even with fairly toy models, which is very promising. All indicators, by the way, suggest a very late reionization (redshifts 7 to 9 for the mid-point of the process). That's good for observability.


planning; marginalization

I had phone calls with Megan Bedell (Chicago) and Lauren Anderson (Flatiron) to discuss near-term research plans. Anderson and I discussed whether the precise MW mapping we were doing could be used to measure the length, strength, and amplitude of the Milky Way bar. It looks promising, although (by bad luck), the 2MASS sensitivity to red-clump stars falls off right around the Galactic Center (even above the plane and out of the dust). There are much better surveys for looking at the Galactic center region.

Bedell and I contrasted our plans to build a data-driven extreme-precision radial-velocity (EPRV) pipeline with our plans to write something more information-theoretic and pragmatic about how to maximize RV precision. Because our data-driven pipeline requires some strong applied math, we might postpone that to the Fall, when we are co-spatial with math expertise in New York City.

I was pleased by a visit from Joe Hennawi (UCSB) and Fred Davies (MPIA / UCSB) in which they informed me that some comments I made about sampling approximations to marginalizations changed their strategy in analyzing very high redshift quasars (think z>7) for IGM damping wing (and hence reionization). We discussed details of how you can use a set of prior-drawn simulations to do a nuisance-parameter marginalization (in this case, over the phases of the simulation).
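The basic trick, in a toy one-dimensional version (the model, the numbers, and the name `phi` for the nuisance are all stand-ins for the simulation phases):

```python
import numpy as np

rng = np.random.default_rng(5)

# toy stand-in: the data depend on a parameter of interest x and on a
# nuisance "phase" phi that we can only draw from its prior, the way one
# can only draw phases by running prior simulations
def model(x, phi):
    return x + 0.3 * np.sin(phi)

data, sigma = 2.1, 0.1
phis = rng.uniform(0., 2. * np.pi, 4096)    # prior-drawn "simulations"

def marginal_log_like(x):
    """Approximate p(data | x) = int p(data | x, phi) p(phi) dphi by a
    Monte Carlo average of the likelihood over the prior draws."""
    resid = data - model(x, phis)
    logl = -0.5 * (resid / sigma) ** 2 - np.log(sigma * np.sqrt(2. * np.pi))
    m = logl.max()
    return m + np.log(np.mean(np.exp(logl - m)))    # stable log-mean-exp
```

The log-mean-exp form keeps the average numerically stable even when the individual per-simulation likelihoods are tiny.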


graphical models; bugs

At MPIA MW group meeting, Semyeong Oh (Princeton) described her projects to find—and follow up—co-moving stellar pairs and groups in the Gaia TGAS data. She presented the hypothesis test (or model comparison) by showing the two graphical models, which was an extremely informative and compact way to describe the problem. This led to a post-meeting discussion of graphical models and how to learn about them. There is no really good resource for astronomers. We should write one!

I spent the afternoon with Matthias Samland (MPIA) and Jessica Birky (UCSD), debugging code! Samland is adding a new kind of systematics model to his VLT-SPHERE data analysis. Birky is hitting the limitations of some of our code that implements The Cannon. I got a bit discouraged about the latter: The Cannon is a set of ideas, not a software package! That's good, but it means that I don't have a perfectly reliable and extensible software package.


Simpson's paradox

I spent part of the day working through Moe & Di Stefano (2017), which is an immense and comprehensive paper on binary-star populations. The reason for my attention: Adrian Price-Whelan (Princeton) and I need a parameterization for the binary-star population work we are doing in APOGEE. We are not going to make the same choices as those made by Moe, but there is tons of relevant content in that paper. What a tour de force!

I spent part of the afternoon crashing the RAVE Collaboration meeting at the University of Heidelberg. I learned many things, though my main goal was to catch up with Matijevic, Minchev, Steinmetz, and Freeman! Ivan Minchev (AIP), in his talk, discussed relationships between age, metallicity, and Galactocentric radius for stars in the disk. He has a beautiful example of Simpson's paradox, in which, for small population slices (age slices), the lower metallicity stars have higher tangential velocities, but overall the opposite is true, causing measured gradients to depend very strongly on the measurement uncertainties (because: Can you slice the populations finely enough in age?). We discussed paths to resolving this with a proper generative model of the data.
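The paradox is easy to reproduce with synthetic data (all numbers below are invented, chosen only to make the sign flip obvious):

```python
import numpy as np

rng = np.random.default_rng(11)

# two age slices: within each, velocity decreases with metallicity, but
# the slice means run the other way, so the pooled trend flips sign
n = 2000
slope_within = -5.0
feh_young = rng.normal(0.0, 0.1, n)
v_young = 10. + slope_within * (feh_young - 0.0) + rng.normal(0., 2., n)
feh_old = rng.normal(-0.5, 0.1, n)
v_old = -20. + slope_within * (feh_old + 0.5) + rng.normal(0., 2., n)

def fitted_slope(x, y):
    return np.polyfit(x, y, 1)[0]    # straight-line slope

within = [fitted_slope(feh_young, v_young), fitted_slope(feh_old, v_old)]
pooled = fitted_slope(np.concatenate([feh_young, feh_old]),
                      np.concatenate([v_young, v_old]))
# within each slice the slope is negative; pooled, it is strongly positive
```

Blurring the age slices (as age uncertainties do) mixes the populations and drags the measured slope from the within-slice value toward the pooled one, which is exactly the dependence on measurement uncertainty at issue.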