the Universe has been expanding since before 1924

I had a great conversation with Markus Pössel (MPIA) and Johannes Fröschle (MPIA) about Fröschle's work re-analyzing data on the expansion of the Universe. He is looking at when the expansion of the Universe was clearly discovered, and (subsequently) when the acceleration was clearly discovered. His approach is to reanalyze historical data sets with clear, simple hypotheses, and perform Bayesian evidence tests. He finds that even in 1924 the expansion of the Universe was clearly and firmly established, by such a large factor that in fact it was probably known much earlier.

This conversation got me thinking about a more general question, which is simple: Imagine you have measured a set of galaxy redshifts but know nothing about distances. How much data do you need to infer that the Universe is expanding? The two hypotheses are these: Galaxies have random velocities around a well-defined rest frame, with respect to which we are moving; or the same, but with expansion on top. You don't know any distances, nor any expansion parameter. Go! I bet that once you have good sky coverage, you are done, even without any distance information at all.
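Not a real cosmological analysis, but a toy version of the test, in which the unknown distances are crushed into a single positive "monopole" velocity and all numbers are invented. The question then reduces to whether the data prefer a dipole-plus-monopole model over a pure dipole:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy sky: N galaxies at random positions (unit vectors n).
N = 200
n = rng.normal(size=(N, 3))
n /= np.linalg.norm(n, axis=1, keepdims=True)

# Truth: solar motion u plus an expansion "monopole" v0, plus random
# peculiar velocities; all in km/s.  These numbers are made up.
u_true = np.array([300.0, -100.0, 50.0])
v0_true = 600.0
sigma = 250.0
v = v0_true + n @ u_true + rng.normal(0.0, sigma, N)

def max_loglike(X, v):
    """Max log-likelihood of a linear model v = X b + Gaussian noise,
    with the noise variance also fit by maximum likelihood."""
    b, *_ = np.linalg.lstsq(X, v, rcond=None)
    r = v - X @ b
    s2 = np.mean(r ** 2)
    return -0.5 * len(v) * (np.log(2 * np.pi * s2) + 1.0), X.shape[1]

# Hypothesis 1: dipole only (we move w.r.t. a non-expanding rest frame).
ll1, k1 = max_loglike(n, v)
# Hypothesis 2: dipole plus a monopole (expansion) term.
ll2, k2 = max_loglike(np.hstack([np.ones((N, 1)), n]), v)

# BIC difference as a cheap stand-in for the Bayesian evidence ratio.
dbic = (k1 * np.log(N) - 2 * ll1) - (k2 * np.log(N) - 2 * ll2)
print(f"delta BIC in favor of expansion: {dbic:.1f}")
```

With these (made-up) numbers the monopole is detected overwhelmingly; the interesting experiment would be to shrink the sample and degrade the sky coverage until the two hypotheses become indistinguishable.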

In the afternoon, Melissa Ness and I worked on fiber-number and LSF issues in the APOGEE data. There are clear trends of abundance measurements with fiber number (presumably mainly because of the variation in spectrograph resolution). We worked on testing methods to remove them, which involve correcting the training set going into The Cannon and also giving The Cannon the information it needs to simultaneously fit the relevant (nuisance) trends.
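The actual Cannon-side correction is more involved; as a sketch of the "correct the training set" half, here is a polynomial detrend against fiber number on fake abundances (trend amplitude and noise levels invented):

```python
import numpy as np

rng = np.random.default_rng(0)

# Fake data: 300 stars with APOGEE-like fiber numbers 1-300, a smooth
# instrumental trend of [Fe/H] with fiber number, and real star-to-star
# scatter underneath.  All amplitudes here are invented.
fiber = rng.integers(1, 301, size=300)
true_feh = rng.normal(-0.1, 0.2, size=300)
trend = 0.0005 * (fiber - 150)            # made-up LSF-driven trend, dex
feh_obs = true_feh + trend + rng.normal(0, 0.02, size=300)

# Fit and remove a low-order polynomial trend in fiber number,
# preserving the mean abundance of the sample.
coeffs = np.polyfit(fiber, feh_obs - np.mean(feh_obs), deg=2)
feh_corr = feh_obs - np.polyval(coeffs, fiber)

# The corrected abundances should no longer correlate with fiber number.
before = np.corrcoef(fiber, feh_obs)[0, 1]
after = np.corrcoef(fiber, feh_corr)[0, 1]
print(f"correlation with fiber number: {before:.3f} -> {after:.3f}")
```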

At the end of the day, I gave my talk at MPIA on probabilistic graphical models.


preparing talk

My only real research today was to prepare my talk on probabilistic graphical models. I do have lots to say, and maybe a Data Analysis Recipes to write on the subject.


the Milky Way bar

At MW group meeting, Gail Zasowski (JHU) talked about work with Melissa Ness showing that the Milky Way has a kinematic bar. They see the bar very clearly in the distribution of stellar velocities projected onto the sky as a function of sky coordinates. It appears as a skewness or a second component in the velocity distribution. I wondered after the talk about the spiral structure: If it exists, shouldn't it also show up this way? There are hints of spiralness in the literature on stellar velocities, but nothing nearly as clear as what Zasowski and Ness see in the bulge for the bar.
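A toy of how a bar appears as skewness: simulated line-of-sight velocities for one bulge field, with the bar stream's mean velocity, width, and fraction all invented:

```python
import numpy as np

rng = np.random.default_rng(1)

def skewness(x):
    """Sample skewness: third central moment over cubed standard deviation."""
    d = x - np.mean(x)
    return np.mean(d ** 3) / np.std(x) ** 3

# Toy line-of-sight velocity distributions in one bulge field (km/s).
# Without a bar: a single, roughly Gaussian component.
no_bar = rng.normal(0.0, 100.0, 5000)

# With a bar: the same, plus a colder component of stars streaming along
# the bar at high velocity (numbers illustrative, not fits to real data).
with_bar = np.concatenate([rng.normal(0.0, 100.0, 4250),
                           rng.normal(300.0, 40.0, 750)])

print(f"skewness without bar: {skewness(no_bar):+.2f}")
print(f"skewness with bar:    {skewness(with_bar):+.2f}")
```

Mapping this skewness (or a second mixture component) as a function of sky position is, in essence, the signal in the Zasowski and Ness work, and the same statistic could in principle be used to hunt for spiral structure.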

Hans-Walter Rix suggested that I give my annual Königstuhl Colloquium here at MPIA on probabilistic graphical models, in part because he himself has found them so useful for clarifying our ideas about inference of stellar parameters, given data from Gaia, RAVE, Kepler, and so on. Late in the day, Dan Foreman-Mackey said just the opposite: His opinion is that PGMs merit either 10 minutes or else a semester course; nothing in-between makes sense! Might be too late.


low-resolution abundances; proposal writing

Today Gail Zasowski (JHU) showed up at MPIA for a week or so. Hans-Walter Rix brought up the possibility that we should consider—for the next generation of SDSS projects—using the BOSS spectrograph in concert with the APOGEE spectrograph to study the detailed chemical abundances of millions of stars. He pointed out that both observationally (Anna Ho's work) and theoretically (Yuan-Sen Ting's work) we believe that we can get many precise chemical abundances out of high signal-to-noise spectra at low (2000-ish) resolution.

Late in the day I worked on text for a large NSF Physics Frontier Center proposal we are putting in at NYU to build new data-science methods for the physical sciences of cosmology, planetary and stellar dynamics, and particle physics. This is being led by Kyle Cranmer (NYU) but includes many other luminaries, including our new hires at (that is, coming to) NYU: Anthony Pullen, Yacine Ali-Haïmoud, and Kat Deck!


fiber number issues

[I was on vacation for a few days.]

Just before I left, Melissa Ness discovered that instrumental fiber number is a good predictor of whether or not two stars will get similar abundances in APOGEE, either with The Cannon or with the standard pipeline! This is perhaps not a surprise: The different fibers have different line-spread functions, and sit on different parts of the detector. We discussed how to mitigate this, and looked at the dependence of the issues on fiber number and line-spread function FWHM separately.

For the nth time, I re-wrote my abstract (that is, the scope of a possible paper) on what you could learn about a star's intrinsic properties from a Gaia-like parallax measurement. I think the focus perhaps should be the subjectivity of it: What you can learn depends on what you know and believe.
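One way to make the subjectivity concrete: the distance inferred from a single noisy parallax depends on the prior. A minimal numerical sketch, with an invented measurement and a volume prior truncated at 10 kpc so that it is proper:

```python
import numpy as np

# A Gaia-like parallax measurement: 1.0 mas with 0.3 mas uncertainty
# (20-30 percent fractional errors are where the prior matters most).
varpi, sigma = 1.0, 0.3                 # mas

d = np.linspace(0.01, 10.0, 4000)       # distance grid, kpc
like = np.exp(-0.5 * ((varpi - 1.0 / d) / sigma) ** 2)

def posterior_mean(prior):
    """Posterior mean distance on the (uniform) grid."""
    post = like * prior
    post = post / post.sum()
    return float((d * post).sum())

# Two reasonable-sounding priors give different "best" distances:
flat = np.ones_like(d)                  # uniform in distance
volume = d ** 2                         # uniform space density (truncated!)

print(f"posterior mean, flat prior:   {posterior_mean(flat):.2f} kpc")
print(f"posterior mean, volume prior: {posterior_mean(volume):.2f} kpc")
```

Neither answer is wrong; what you learn about the star depends on what you are willing to assume about where stars live.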

Hans-Walter Rix decided that my talk at the end of this week should be on the graphical model as a tool for data analysis. I hope he is right!


chemical equilibria and bimodality; etc.

DFM and I worked through issues remaining in our MCMC Data Analysis Recipes paper, which I would like to post on the arXiv this month (or next!). We also worked through some remaining issues in his long-period transiting exoplanet paper, in which he discovers and estimates the population of very long-period planets in the Kepler data.

David Weinberg (OSU) gave a nice talk about how stellar populations come to chemical equilibria, making use of nucleosynthetic models. He looked at how star-formation events might appear in the metallicity distribution. He also showed the beautiful data on the alpha-abundance bimodality in the APOGEE data, but in the end did not give a confident explanation of that bimodality, which really is intriguing.

I also had a substantial chat with Matthias Samland about his project to constrain the directly emitted infrared spectrum of an exoplanet using multiple data sources. He has the usual issues of inconsistent data calibration, correlated noise in extracted spectra, and the simultaneous fitting of photometry and spectroscopy. It looks like he will have lots of good conclusions, though: The spectra are highly informative.


get simpler; chemical diversity

Dan Foreman-Mackey opined that the main thing I was communicating with my ur-complex probabilistic graphical model for stars and Gaia is that it's complicated. So I built a simplified PGM, crushing all the stellar physics into one node, with the intention of blowing up that node into its own model in a separate figure.

I had a great conversation at lunch with David Weinberg (OSU) about stellar chemical abundances, and in particular whether we could show that chemical diversity is larger than can be explained with any model in which supernova ejecta get fully mixed into the ISM. He pointed out that while there does appear to be diversity among stars, in certain directions the diversity of abundance ratios is extremely tiny; in particular: Although the alpha/Fe distribution is bimodal, at any Fe/H, the two modes are both very narrow! That's super-interesting.


Gaia DR-1

Today Coryn Bailer-Jones (MPIA) gave a great talk at MPIA about the upcoming first data release from the Gaia mission. It will be called “Gaia DR-1” and it will contain a primary release of the TGAS sample of about 2 million stars with proper motions and parallaxes, and a secondary release of about a billion stars with just positions. He walked us through why the DR-1 data will be very limited relative to later releases. But (as my loyal reader knows) I am stoked! The talk was clear and complete, and gives us all lots to think about.

In the morning, Hans-Walter Rix and I met with Karin Lind (MPIA), Maria Bergemann (MPIA), Sven Buder (MPIA), Morgan Fouesneau (MPIA), and Joachim Bestenlehner (MPIA) to talk about how we combine Gaia parallaxes with optical and infrared spectroscopy to determine stellar parameters. The meeting was inspired by inconsistencies between how we expected people to use the spectral and astrometric data. We learned in the meeting that there are simple ways forward to improve both the spectral analysis and things that will be done with Gaia. I showed the crowd my probabilistic graphical model (directed acyclic graph) and we used it to guide the discussion. Almost as useful as showing the PGM for the full problem was showing the PGMs for what is being done now. As my loyal reader knows, I think PGMs are great for communicating about problem structure, and planning code.


ABC is hard; The Cannon with missing labels

I spent some time discussing ABC today with Joe Hennawi (MPIA) and Fred Davies (MPIA), with some help from Dan Foreman-Mackey. The context is the transmission of the IGM (the forest) at very high redshift. We discussed the distance metric to use when you are comparing two distributions, and I suggested the K-S statistic. I suggested this not because I love it, but because there is experience in the literature with it. For ABC to work (I think) all you need is that the distance metric go to zero if and only if the data statistics equal the simulation statistics; convexity would help the sampling, but it isn't actually implied by the words “distance metric”, which require only symmetry, the triangle inequality, and that zero distance mean equal statistics. That said, the ease with which you can ABC sample depends strongly on the choice (and details within that choice). There is a lot of art to the ABC method. We don't expect the sampling in the Hennawi–Davies problem to be easy.
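For concreteness, a toy rejection-ABC loop with the K-S statistic as the distance, using an exponential distribution as a stand-in for the real transmission statistics (tolerance, prior range, and sample sizes all invented):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(3)

# "Observed" data: draws from a hidden process with parameter theta_true.
# In the real problem these would be IGM transmission statistics; an
# exponential distribution stands in for them here.
theta_true = 2.0
data = rng.exponential(theta_true, 500)

def simulate(theta):
    """Forward-simulate a data set for a trial parameter value."""
    return rng.exponential(theta, 500)

# ABC rejection sampling: draw theta from the prior, simulate, and keep
# the draw only if the K-S distance to the data is below the tolerance.
eps = 0.08                              # the "art" lives in this choice
posterior = []
for _ in range(2000):
    theta = rng.uniform(0.1, 10.0)      # broad uniform prior
    dist = ks_2samp(simulate(theta), data).statistic
    if dist < eps:
        posterior.append(theta)

posterior = np.array(posterior)
print(f"kept {posterior.size} of 2000 draws; "
      f"theta = {posterior.mean():.2f} +/- {posterior.std():.2f}")
```

Even in this trivial problem the acceptance rate is a few percent, which is a hint of why the sampling in a realistic IGM problem is expected to be hard.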

As part of the above discussion, Foreman-Mackey pointed out that when you do an MCMC sampling, you can be hurt by unimportant nuisance parameters. That is, if you add 100 random numbers to your inference as additional parameters, each of which has no implications for the likelihood at all, your MCMC still may slow way down, because you still have to accept/reject against the prior! Crazy, but true, I think.

In other news, Christina Eilers (MPIA) showed today that she can simultaneously optimize the internal parameters of The Cannon and the labels of training-set objects with missing labels! The context is labeling dwarf stars in the SEGUE data, using labels from Gaia-ESO. This is potentially a big step for data-driven spectral inference, because right now we are restricted (very severely) to training sets with complete labels.
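Eilers' actual model and optimizer are not shown here; a toy one-label, linear "Cannon" makes the idea concrete, alternating between the closed-form least-squares fit for the spectral coefficients and the closed-form fit for the missing labels:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy Cannon: flux at each of 50 pixels is linear in one label,
#   f_nj = a_j + b_j * label_n + noise.   (All numbers invented.)
n_star, n_pix = 40, 50
a_true = rng.normal(1.0, 0.05, n_pix)
b_true = rng.normal(0.0, 0.1, n_pix)
labels = rng.normal(0.0, 1.0, n_star)
flux = a_true + b_true * labels[:, None] + rng.normal(0, 0.01, (n_star, n_pix))

# Pretend the last 10 training stars are missing their label.
known = labels.copy()
missing = np.arange(30, 40)
known[missing] = 0.0                    # crude initialization

# Alternate: (1) fit a_j, b_j by least squares given the current labels;
# (2) re-optimize the missing labels given a_j, b_j.  Both steps are
# linear least squares, so each has a closed-form solution.
for _ in range(20):
    X = np.column_stack([np.ones(n_star), known])
    coef, *_ = np.linalg.lstsq(X, flux, rcond=None)   # shape (2, n_pix)
    a, b = coef
    # Best-fit label for each missing star: project residual flux onto b.
    known[missing] = (flux[missing] - a) @ b / (b @ b)

err = np.abs(known[missing] - labels[missing])
print(f"median |label error| for missing stars: {np.median(err):.3f}")
```

The 30 stars with complete labels anchor the coefficients, which is why the missing labels converge; with too few complete-label stars the problem would go degenerate.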


Gaia, Gaia, Gaia, and dust!

In the late morning, there was an absolutely great Milky Way Group meeting. I love this forum. Ted von Hippel (Embry-Riddle) walked us through the use of white dwarfs as clocks, and the method by which one could constrain (very precisely, in principle) the star-formation history of the Solar Neighborhood, especially post-Gaia. He argued strongly (and correctly) that the right method is not to make a histogram of temperatures! Some of the physics is non-trivial: The less massive WDs are larger, so I would expect them to cool much faster (more surface area, less stuff to store heat), but apparently this isn't so, for detailed reasons about heat capacity.

Also in that meeting, Nicholas Martin (Strasbourg) told us about the possibility that some of the new ultra-faint dwarfs being found in the Local Group might be LMC or SMC satellites, and Haijun Tian (MPIA) showed us the latest on proper motions from PanSTARRS. On the latter, even the first data release from Gaia (that only releases positions for the relevant stars) will be hugely informative.

Mark Allen (CDS Strasbourg) told us about the structure and projects at the CDS, which is one of the most important and long-lived archives and data centers in astronomy. The CDS is composed not just of astronomers, but also data librarians, concerned with long-term preservation and correctness. The CDS is going to be one of the Gaia data-release points, so we pressed him for details!

Over lunch, I discussed with Sara Rezaei (MPIA) and Coryn Bailer-Jones (MPIA) and more of the MPIA Gaia group their non-parametric model for Milky Way dust, which my loyal reader will remember from discussions in summers past. Rezaei has a working system, but (as predicted) the linear algebra is extremely expensive. I showed them a method to collapse their problem to a smaller problem that is instantiated only at the stars, and nowhere else (that is, to live the non-parametric dream!). If we are lucky, this might give them speedups of hundreds or more. Here's hoping.


loving the graphical model; abundances of binaries

I decided to take a shot at writing up a short note around our stellar-parameters-in-the-age-of-Gaia probabilistic graphical model (directed acyclic graph). The first step was taking our hand-drawn model and typesetting it with Daft. In the process I found some “issues” with Daft, if you know what I'm sayin'. I discussed a bit with Hans-Walter Rix what the scope and abstract of this note should be. He came up with a simple demonstration that Sven Buder (MPIA) could do with some stellar models and an MCMC sampler to support the document.

In the afternoon, Taisiya Kopytova (MPIA) pitched a project to Melissa Ness and me: Find the chemical differences between the APOGEE targets with and without low-mass stellar companions. There should be some! Because we can make large, matched samples of stars with and without companions, we ought to be able to perform this test very precisely. This relates to things we talked about with Kevin Schlaufman (OCIW) many months ago; we will bring him into the loop.


The Cannon with noisy and missing labels

Dan Foreman-Mackey showed up in Heidelberg today. Hans-Walter Rix and I interviewed him about his ideas around training The Cannon when the training-set objects have noisy labels. We came up with some simple ideas, and Foreman-Mackey thinks it might even be possible to just sample the whole thing. I'm skeptical, but it is also the case that Jonathan Weare (Chicago) and Charles Matthews (Chicago) both also said the same thing to me when I spoke in Chicago this past Spring.

In related news, Anna-Christina Eilers (MPIA) is working with The Cannon in a context in which some of her training-set objects are completely missing some labels. We discussed how to simultaneously optimize the internal model parameters and the missing label values. It should work, but the conversation really reminded me of the regrettable point that The Cannon is just a maximum-likelihood system!

In a long conversation in Rix's office, Melissa Ness, Rix, and I drew a graphical model for stellar parameter estimation in the age of Gaia. Rix has an intuition that we are going to want to use different parameters when we are combining photometry, spectroscopy, and astrometry. I think he is right. Is our probabilistic graphical model publishable?


small telescopes and low intensities

I spent some research time on the weekend working on the question that Joe Hennawi (MPIA) asked me on Friday: What is the sensitivity of a telescope and detector to very faint features on the sky as a function of aperture diameter and focal ratio? There are various regimes to consider. In one, the object is much smaller than a resolution element on the focal plane (the point source limit). In another, much larger, but still much smaller than the detector as a whole. In another, larger even than the detector array. There are slightly different answers in each case, but the large telescope does better in the first two cases, and even in the third if the faint object has any structure on the scale of the detector. I wrote words about this and wondered if there was something to publish (informally or otherwise). Of course small telescopes often have much better optics and scattered light properties, so this isn't the end of the story!
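Here is a toy photon-counting version of those regimes, at fixed f-ratio and fixed physical pixel size (so that sky counts per pixel are independent of the aperture D), ignoring read noise, optics quality, and scattered light; all fluxes are in arbitrary units and only the scalings with D matter:

```python
import numpy as np

def snr(D, case):
    """Toy SNR vs. aperture D at fixed f-ratio and fixed physical pixel
    size, so the sky counts per pixel do not depend on D.  Fluxes are in
    arbitrary units; only the D-scalings are meaningful."""
    sky_per_pixel = 1.0
    if case in ("point", "extended < detector"):
        # All source photons scale as D^2, spread over ~D^2 pixels
        # (a seeing-limited PSF, or a fixed angular patch, respectively).
        src = 100.0 * D ** 2
        npix = D ** 2
    else:  # "extended > detector": compare per pixel; no further binning.
        src = 1.0          # per-pixel source counts independent of D
        npix = 1.0
    return src / np.sqrt(src + npix * sky_per_pixel)

for case in ("point", "extended < detector", "extended > detector"):
    r = snr(4.0, case) / snr(1.0, case)
    print(f"{case:20s}: SNR(D=4) / SNR(D=1) = {r:.2f}")
```

In this toy the large aperture wins linearly in D in the first two regimes and ties in the third, which matches the verbal argument above; the caveats about optics and scattered light are exactly what this sketch leaves out.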


clusters, telescopes, and pixel models

In the morning, Melissa Ness showed me very strong evidence that not all open clusters are mono-abundance! This is surprising, and violates my expectations based on recent work by Jo Bovy. It doesn't literally conflict with his results, because the clusters in which we find this are not the clusters he looked at, and the elements we find it in are exposed only subtly in the data. Now we have yet another paper to write this summer!

At lunch, Joe Hennawi (MPIA) interviewed me about the argument (often given by the builders of small telescope projects) that the surface-brightness sensitivity of a small telescope is just as good as that of a large telescope, provided that the detectors and f-ratios are similar. I have always been suspicious of this argument, and Hennawi shares my suspicion. We discussed the simplifications in the argument and planned to write out a well-posed data analysis question and answer it. A project for the garden.

Late in the day, Jeroen Bouwman (MPIA) showed me the results of running a CPM-like (modeling data with data) model on some Spitzer spectroscopy of a star through a planet transit, as we had discussed a few days ago. It does almost as well as a full calibration of the data, in precision though not in accuracy: There are some things that the data-driven model erases. This is the problem with data-driven models in general: When you go data-driven, you lose some aspects of interpretability.


binary stars, Nyquist, and TESS

At MPIA Galaxy Coffee, I learned about star-formation rates and dust in galaxy disks from Sarah Leslie (MPIA) and about galaxy nuclear star cluster formation from Nicolas Guillard (MPIA). Afterwards, I had a long conversation with Simon J. Murphy (Sydney) about his work on asteroseismic binary stars. I asked him to try doing some experiments towards building a (less approximate) likelihood for asteroseismic binary signals. We also talked about super-Nyquist frequency determination, and also his (very clever) idea for improving TESS by having the satellite disrupt its periodic sampling of stars each time it points back at Earth to downlink. This breaks the pure periodicity of the time sampling (in the spacecraft frame) and therefore delivers more information, without overly complexifying spacecraft operations or data analyses.
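A toy illustration of why breaking the periodicity helps: the spectral window of a strictly periodic cadence has full power at the alias frequency 1/dt, and re-drawing the grid phase once per "day" (standing in here for Murphy's downlink shifts, which is an assumption of this sketch, not his scheme) suppresses it:

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy sampling: 48 cadences per day for 27 days.  "Regular" is strictly
# periodic; "shifted" re-draws the grid phase once per day, standing in
# for shifting the cadence at each downlink.
dt = 0.5 / 24.0                          # 30 minutes, in days
n_per_day, n_days = 48, 27
t_reg = dt * np.arange(n_per_day * n_days)
t_shift = t_reg + np.repeat(rng.uniform(0, dt, n_days), n_per_day)

def window(f, t):
    """Spectral window power |(1/K) sum_k exp(2 pi i f t_k)|^2."""
    return np.abs(np.exp(2j * np.pi * f * t).mean()) ** 2

# Signals at f and at f + 1/dt are indistinguishable under strictly
# periodic sampling; breaking the periodicity kills that alias.
alias = 1.0 / dt
print(f"window power at the alias, regular: {window(alias, t_reg):.3f}")
print(f"window power at the alias, shifted: {window(alias, t_shift):.3f}")
```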

Late in the day I thought about weighing stars in Gaia using the “lensing” of unbound star trajectories. I am not sure it is possible, but it is interesting to think about.


not ready for Gaia, exoplanet spectroscopy

A constant theme of the past summers in Heidelberg has been our unreadiness for the Gaia data releases. The first release is this September 14, so it really is now down to the wire. Hans-Walter Rix led a very interactive Milky Way Group Meeting today on the subject, where we first (with the help of Coryn Bailer-Jones MPIA) outlined the contents of the first data release, and then brain-stormed projects. Here (below) is the board after the first part. My view is that there are a lot of papers that could be published in the first week after the release, and good papers, too! But it is important to understand up-front what the restricted data release can and cannot do.

In the afternoon, I had a long conversation with Jeroen Bouwman (MPIA) and Juergen Schreiber (MPIA) about spectroscopy of exoplanet transits with infrared spectrographs. There are many systematic effects, some deterministically related to spacecraft state and pointing, some more stochastic, and the idea is to model them as accurately as possible. I suggested an approach very similar to Dun Wang's CPM, in which we model data pixel values at one wavelength using pixels at other wavelengths. We came up with first steps to try.
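The real CPM has train-and-test machinery and regularization that this ignores; a bare least-squares toy, with the shared systematic, gains, and transit depth all invented, shows the idea of modeling one wavelength's pixels with the others:

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy: 300 time steps, 20 wavelength channels sharing a common
# instrumental systematic; channel 0 also contains a transit dip.
T, C = 300, 20
systematic = 1.0 + 0.01 * np.sin(np.linspace(0, 20, T))   # shared drift
gains = rng.uniform(0.8, 1.2, C)
flux = gains[None, :] * systematic[:, None] \
    + rng.normal(0, 0.0005, (T, C))

transit = np.ones(T)
transit[140:160] -= 0.005               # 0.5% dip, in channel 0 only
flux[:, 0] *= transit

# CPM-style fit: predict channel 0 as a linear combination of the other
# channels, training only on out-of-transit times so we don't fit away
# the signal (the real method uses train/test splits to the same end).
train = np.r_[0:140, 160:300]
w, *_ = np.linalg.lstsq(flux[train, 1:], flux[train, 0], rcond=None)
model = flux[:, 1:] @ w

residual = flux[:, 0] / model           # calibrated light curve
depth = 1.0 - residual[140:160].mean()
print(f"recovered transit depth: {depth:.4f} (true 0.005)")
```

Because the other channels carry the systematic but not the transit, dividing by their best linear combination removes the drift and leaves the dip; deterministic spacecraft-state regressors could be appended as extra columns in the same fit.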


omg delta-Scuti stars

In the MPIA stellar physics group meeting today, in what might be the best seminar I have ever heard about the Kepler data, Simon J. Murphy (Sydney) told us about things he has been doing with delta-Scuti variable stars. There were so many results in his 30-minute discussion I can't even name them all here, but here are some highlights: delta-Scuti stars have such high coherence (Qs in the 10^5 or 10^6 range, or maybe even higher) that they are almost as good as pulsars as timing standards. By timing the stars, you can see binaries, and he has found hundreds of binaries. He can measure the binary parameters in the timing of every one of multiple modes! But there are some binaries where you can see the asteroseismic modes of both stars in the binary! Like double-lined spectroscopic binaries, these are double-moded asteroseismic binaries. Insane, and incredible tests of stellar physics. But that's not all: He has some that are even transiting too, to test radius models and more. Among his binaries are brown dwarfs, but also a few planets. One of these is a habitable-zone planet in orbit around an A-type star! That's a first, and very good for habitability, because it is near the triple-point for water but also has lots of blue photons for life-giving free energy! He showed eccentricity and period distributions for binaries, showing evidence for circularization at small radii and also showing posterior inferences about binaries at periods much longer than the Kepler mission lifetime. So. Many. Things.

In addition to this, Mia Lundkvist (LSW Heidelberg) gave a pedagogical introduction to asteroseismology and showed her evidence that super-Earths are getting ablated or destroyed or stripped when they are very close to their host stars. I spent the afternoon reading and editing in Dan Foreman-Mackey's latest draft for his long-period planet paper.


stellar pairs too close for comfort?

Today started my summer at MPIA in Heidelberg. I had many conversations with Hans-Walter Rix about our various projects, and then spent a long time working with Melissa Ness on pairs of stars with near-identical abundances. Ness finds that if the stars are very, very close in 19-dimensional abundance space, they also tend to be close in position (like within 500 pc). That's hard to understand, since red-clump stars (her targets) are a few Gyr old, and in a few Gyr, two stars that are formed together ought to have drifted apart by a few kpc!

Her results are surprising if true. This left us at the end of the day with Ness working to see if there are any issues with the data that could be causing the problems, and Rix and I discussing possible astrophysical explanations. Julianne Dalcanton (UW) and Rix jointly suggested that we are finding disrupted binaries, which are disrupting because of the red-giant mass loss that leads to red-clumpiness! This explanation is insane but works on many levels. Now to test it.