fiber number issues

[I was on vacation for a few days.]

Just before I left, Melissa Ness discovered that instrumental fiber number is a good predictor of whether or not two stars will get similar abundances in APOGEE, either with The Cannon or with the standard pipeline! This is perhaps not a surprise: The different fibers have different line-spread functions, and sit on different parts of the detector. We discussed how to mitigate this, and looked at the dependence of the issues on fiber number and line-spread function FWHM separately.

For the nth time, I re-wrote my abstract (that is, the scope of a possible paper) on what you could learn about a star's intrinsic properties from a Gaia-like parallax measurement. I think the focus perhaps should be the subjectivity of it: What you can learn depends on what you know and believe.

Hans-Walter Rix decided that my talk at the end of this week should be on the graphical model as a tool for data analysis. I hope he is right!


chemical equilibria and bimodality; etc.

DFM and I worked through issues remaining in our MCMC Data Analysis Recipes paper, which I would like to post on the arXiv this month (or next!). We also worked through some remaining issues in his long-period transiting exoplanet paper, in which he discovers and estimates the population of very long-period planets in the Kepler data.

David Weinberg (OSU) gave a nice talk about how stellar populations come to chemical equilibria, making use of nucleosynthetic models. He looked at how star-formation events might appear in the metallicity distribution. He also showed the beautiful data on the alpha-abundance bimodality in the APOGEE data, but in the end did not give a confident explanation of that bimodality, which really is intriguing.

I also had a substantial chat with Matthias Samland about his project to constrain the directly emitted infrared spectrum of an exoplanet using multiple data sources. He has the usual issues of inconsistent data calibration, correlated noise in extracted spectra, and the simultaneous fitting of photometry and spectroscopy. It looks like he will have lots of good conclusions, though: The spectra are highly informative.


get simpler; chemical diversity

Dan Foreman-Mackey opined that the main thing I was communicating with my ur-complex probabilistic graphical model for stars and Gaia is that it's complicated. So I built a simplified PGM, crushing all the stellar physics into one node, with the intention of blowing up that node into its own model in a separate figure.

I had a great conversation at lunch with David Weinberg (OSU) about stellar chemical abundances, and in particular whether we could show that chemical diversity is larger than can be explained with any model in which supernovae ejecta get fully mixed into the ISM. He pointed out that while there does appear to be diversity among stars, in certain directions the diversity of abundance ratios is extremely tiny; in particular: Although the alpha/Fe distribution is bimodal, at any Fe/H, the two modes are both very narrow! That's super-interesting.


Gaia DR-1

Today Coryn Bailer-Jones (MPIA) gave a great talk at MPIA about the upcoming first data release from the Gaia mission. It will be called “Gaia DR-1” and it will contain a primary release of the TGAS sample of about 2 million stars with proper motions and parallaxes, and a secondary release of about a billion stars with just positions. He walked us through why the DR-1 data will be very limited relative to later releases. But (as my loyal reader knows) I am stoked! The talk was clear and complete, and gives us all lots to think about.

In the morning, Hans-Walter Rix and I met with Karin Lind (MPIA), Maria Bergemann (MPIA), Sven Buder (MPIA), Morgan Fouesneau (MPIA), and Joachim Bestenlehner (MPIA) to talk about how we combine Gaia parallaxes with optical and infrared spectroscopy to determine stellar parameters. The meeting was inspired by inconsistencies between how we expected people to use the spectral and astrometric data. We learned in the meeting that there are simple ways forward to improve both the spectral analysis and things that will be done with Gaia. I showed the crowd my probabilistic graphical model (directed acyclic graph) and we used it to guide the discussion. Almost as useful as showing the PGM for the full problem was showing the PGMs for what is being done now. As my loyal reader knows, I think PGMs are great for communicating about problem structure, and planning code.


ABC is hard; The Cannon with missing labels

I spent some time discussing ABC today with Joe Hennawi (MPIA) and Fred Davies (MPIA), with some help from Dan Foreman-Mackey. The context is the transmission of the IGM (the forest) at very high redshift. We discussed the distance metric to use when you are comparing two distributions, and I suggested the K-S statistic. I suggested this not because I love it, but because there is experience in the literature with it. For ABC to work (I think) all you need is that the distance metric go to zero if and only if the data statistics equal the simulation statistics, and that the metric be convex (which perhaps is implied in the word “distance metric”; I'm not sure about that). That said, the ease with which you can ABC sample depends strongly on the choice (and details within that choice). There is a lot of art to the ABC method. We don't expect the sampling in the Hennawi–Davies problem to be easy.
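To make the moving parts concrete, here is a toy ABC rejection sampler using the K-S statistic as the distance. The Gaussian "simulator," the uniform prior, and the tolerance are all made-up stand-ins, nothing like the actual IGM-transmission problem:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# Observed "data": draws from a distribution with unknown mean.
data = rng.normal(loc=1.0, scale=1.0, size=500)

def simulate(mu, size=500):
    # Hypothetical forward model: a stand-in for an expensive simulation.
    return rng.normal(loc=mu, scale=1.0, size=size)

def ks_distance(a, b):
    # The K-S statistic goes to zero iff the two empirical
    # distributions agree -- the key property we need.
    return ks_2samp(a, b).statistic

# ABC rejection sampling: draw from the prior, keep only draws whose
# simulated data land within epsilon of the observed data.
epsilon = 0.08
samples = [mu for mu in rng.uniform(-3.0, 3.0, size=4000)
           if ks_distance(data, simulate(mu)) < epsilon]
posterior = np.array(samples)
```

The art is all in the choice of summary statistic, distance, and epsilon: too tight and you accept nothing, too loose and the "posterior" is just the prior.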

As part of the above discussion, Foreman-Mackey pointed out that when you do an MCMC sampling, you can be hurt by unimportant nuisance parameters. That is, if you add 100 random numbers to your inference as additional parameters, each of which has no implications for the likelihood at all, your MCMC still may slow way down, because you still have to accept/reject against the prior! Crazy, but true, I think.
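A quick toy demonstration of the effect (plain Metropolis with made-up numbers, not any real inference): the likelihood constrains only one parameter, yet padding the state with unit-Gaussian nuisance parameters tanks the acceptance rate, because the proposal still has to survive the prior in every dimension.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_post(x):
    # The likelihood depends only on x[0]; the rest are pure nuisance
    # parameters with standard-normal priors and no data constraint.
    log_like = -0.5 * x[0] ** 2
    log_prior = -0.5 * np.sum(x[1:] ** 2)
    return log_like + log_prior

def acceptance_rate(ndim, nsteps=4000, step=0.5):
    # Plain Metropolis with an isotropic Gaussian proposal.
    x = np.zeros(ndim)
    lp = log_post(x)
    accepts = 0
    for _ in range(nsteps):
        prop = x + step * rng.normal(size=ndim)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop
            accepts += 1
    return accepts / nsteps

rate_1d = acceptance_rate(1)
rate_101d = acceptance_rate(101)  # the same problem plus 100 nuisances
```

With a fixed step size, `rate_101d` comes out far below `rate_1d`, even though the extra parameters carry no information at all.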

In other news, Christina Eilers (MPIA) showed today that she can simultaneously optimize the internal parameters of The Cannon and the labels of training-set objects with missing labels! The context is labeling dwarf stars in the SEGUE data, using labels from Gaia-ESO. This is potentially a big step for data-driven spectral inference, because right now we are restricted (very severely) to training sets with complete labels.


Gaia, Gaia, Gaia, and dust!

In the late morning, there was an absolutely great Milky Way Group meeting. I love this forum. Ted von Hippel (Embry-Riddle) walked us through the use of white dwarfs as clocks, and the method by which one could constrain (very precisely, in principle) the star-formation history of the Solar Neighborhood, especially post-Gaia. He argued strongly (and correctly) that the right method is not to make a histogram of temperatures! Some of the physics is non-trivial: The less massive WDs are larger, so I would expect them to cool much faster (more surface area, less stuff to store heat), but apparently this isn't so, for detailed reasons about heat capacity.

Also in that meeting, Nicholas Martin (Strasbourg) told us about the possibility that some of the new ultra-faint dwarfs being found in the Local Group might be LMC or SMC satellites, and Haijun Tian (MPIA) showed us the latest on proper motions from PanSTARRS. On the latter, even the first data release from Gaia (that only releases positions for the relevant stars) will be hugely informative.

Mark Allen (CDS Strasbourg) told us about the structure and projects at the CDS, which is one of the most important and long-lived archives and data centers in astronomy. The CDS is composed not just of astronomers, but also data librarians, concerned with long-term preservation and correctness. The CDS is going to be one of the Gaia data-release points, so we pressed him for details!

Over lunch, I discussed with Sara Rezaei (MPIA) and Coryn Bailer-Jones (MPIA) and more of the MPIA Gaia group their non-parametric model for Milky Way dust, which my loyal reader will remember from discussions in summers past. Rezaei has a working system, but (as predicted) the linear algebra is extremely expensive. I showed them a method to collapse their problem to a smaller problem that is instantiated only at the stars, and nowhere else (that is, to live the non-parametric dream!). If we are lucky, this might give them speedups of hundreds or more. Here's hoping.
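The trick, in its usual Gaussian-process form (a toy one-dimensional sketch with invented numbers, not Rezaei's actual model): instantiate the kernel only at the N star positions, so the expensive linear algebra is N-by-N no matter how finely you would otherwise have gridded space.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup: noisy measurements of a smooth field at 50 star positions.
x = np.sort(rng.uniform(0.0, 10.0, 50))
y = np.sin(x) + 0.1 * rng.normal(size=50)
sigma2 = 0.01  # measurement variance

def kernel(a, b, length=1.0):
    # Squared-exponential kernel between two sets of 1-d positions.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

# All the heavy linear algebra is N x N, instantiated only at the
# stars; no spatial grid ever appears.
K = kernel(x, x) + sigma2 * np.eye(len(x))
alpha = np.linalg.solve(K, y)

# The field can then be predicted anywhere, after the fact,
# at O(N) cost per prediction point.
x_new = np.array([5.0])
prediction = kernel(x_new, x) @ alpha
```

The real dust problem involves integrals of the field along sightlines, which complicates the kernel but not the basic N-by-N structure.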


loving the graphical model; abundances of binaries

I decided to take a shot at writing up a short note around our stellar-parameters-in-the-age-of-Gaia probabilistic graphical model (directed acyclic graph). The first step was taking our hand-drawn model and typesetting it with Daft. In the process I found some “issues” with Daft, if you know what I'm sayin'. I discussed a bit with Hans-Walter Rix what the scope and abstract of this note should be. He came up with a simple demonstration that Sven Buder (MPIA) could do with some stellar models and an MCMC sampler to support the document.

In the afternoon, Taisiya Kopytova (MPIA) pitched a project to Melissa Ness and me: Find the chemical differences between the APOGEE targets with and without low-mass stellar companions. There should be some! Because we can make large, matched samples of stars with and without companions, we ought to be able to perform this test very precisely. This relates to things we talked about with Kevin Schlaufman (OCIW) many months ago; we will bring him into the loop.


The Cannon with noisy and missing labels

Dan Foreman-Mackey showed up in Heidelberg today. Hans-Walter Rix and I interviewed him about his ideas around training The Cannon when the training-set objects have noisy labels. We came up with some simple ideas, and Foreman-Mackey thinks it might even be possible to just sample the whole thing. I'm skeptical, but Jonathan Weare (Chicago) and Charles Matthews (Chicago) both said the same thing to me when I spoke in Chicago this past spring.

In related news, Anna-Christina Eilers (MPIA) is working with The Cannon in a context in which some of her training-set objects are completely missing some labels. We discussed how to simultaneously optimize the internal model parameters and the missing label values. It should work, but the conversation really reminded me of the regrettable point that The Cannon is just a maximum-likelihood system!
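The alternating optimization we discussed looks something like this toy sketch (a one-label, linear "Cannon" with invented numbers; the real code has quadratic label terms, per-pixel scatters, and much more):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy training set: flux at each pixel is linear in a single label.
n_stars, n_pix = 40, 30
true_labels = rng.uniform(4000.0, 5000.0, n_stars)
true_coeffs = rng.normal(size=(n_pix, 2))  # [offset, slope] per pixel

def design(labels):
    # Design matrix: constant term plus a scaled label term.
    return np.stack([np.ones_like(labels),
                     (labels - 4500.0) / 500.0], axis=-1)

flux = design(true_labels) @ true_coeffs.T \
    + 0.01 * rng.normal(size=(n_stars, n_pix))

# Pretend the last ten stars are missing their label entirely.
missing = np.arange(30, 40)
labels = true_labels.copy()
labels[missing] = 4500.0  # crude initialization at the sample center

for _ in range(20):
    # Step 1: fit the per-pixel coefficients given all current labels.
    coeffs, *_ = np.linalg.lstsq(design(labels), flux, rcond=None)
    offset, slope = coeffs[0], coeffs[1]
    # Step 2: re-fit only the missing labels given those coefficients.
    for i in missing:
        ell = ((flux[i] - offset) @ slope) / (slope @ slope)
        labels[i] = 4500.0 + 500.0 * ell
```

Because the known-label stars pin down the coefficients, the missing labels converge close to their true values here; whether that stays true with realistic noise and model complexity is exactly Eilers's question.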

In a long conversation in Rix's office, Melissa Ness, Rix, and I drew a graphical model for stellar parameter estimation in the age of Gaia. Rix has an intuition that we are going to want to use different parameters when we are combining photometry, spectroscopy, and astrometry. I think he is right. Is our probabilistic graphical model publishable?


small telescopes and low intensities

I spent some research time on the weekend working on the question that Joe Hennawi (MPIA) asked me on Friday: What is the sensitivity of a telescope and detector to very faint features on the sky as a function of aperture diameter and focal ratio? There are various regimes to consider. In one, the object is much smaller than a resolution element on the focal plane (the point source limit). In another, much larger, but still much smaller than the detector as a whole. In another, larger even than the detector array. There are slightly different answers in each case, but the large telescope does better in the first two cases, and even in the third if the faint object has any structure on the scale of the detector. I wrote words about this and wondered if there was something to publish (informally or otherwise). Of course small telescopes often have much better optics and scattered light properties, so this isn't the end of the story!


clusters, telescopes, and pixel models

In the morning, Melissa Ness showed me very strong evidence that not all open clusters are mono-abundance! This is surprising, and violates my expectations based on recent work by Jo Bovy. It doesn't literally conflict with his results, because the clusters in which we find this are not the clusters he looked at, and the elements we find it in are exposed only subtly in the data. Now we have yet another paper to write this summer!

At lunch, Joe Hennawi (MPIA) interviewed me about the argument (often given by the builders of small telescope projects) that the surface-brightness sensitivity of a small telescope is just as good as that of a large telescope, provided that the detectors and f-ratios are similar. I have always been suspicious of this argument, and Hennawi shares my suspicion. We discussed the simplifications in the argument and planned to write out a well-posed data analysis question and answer it. A project for the garden.
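For the record, here is the simplest version of the builders' argument, in code (heavily idealized: all efficiencies set to one, fixed physical pixel size, arbitrary units): counts per pixel from a uniform surface brightness depend only on the f-ratio, while point-source counts scale with aperture.

```python
def point_source_rate(D):
    # Point source: photons collected scale with aperture area, D^2.
    return D ** 2

def extended_rate_per_pixel(D, f_ratio, pixel=1.0):
    # Uniform surface brightness: each pixel subtends a solid angle
    # Omega ~ (pixel / focal_length)^2 with focal_length = f_ratio * D,
    # so counts per pixel ~ D^2 * Omega and the D^2 cancels.
    focal_length = f_ratio * D
    return D ** 2 * (pixel / focal_length) ** 2

small = extended_rate_per_pixel(D=0.1, f_ratio=2.0)
large = extended_rate_per_pixel(D=8.0, f_ratio=2.0)
# small == large: the small telescope matches the big one per pixel.
```

The simplifications are where the suspicion lives: the equality holds only per pixel, only for structure larger than the pixel scale, and only if read noise, optics, and scattered light are ignored.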

Late in the day, Jeroen Bouwman (MPIA) showed me the results of running a CPM-like (modeling data with data) model on some Spitzer spectroscopy of a star through a planet transit, as we had discussed a few days ago. It looks like it does almost as well as a full calibration of the data. In precision, though, not accuracy: There are some things that the data-driven model erases. This is the problem with data-driven models in general: When you go data-driven, you lose some interpretability.


binary stars, Nyquist, and TESS

At MPIA Galaxy Coffee, I learned about star-formation rates and dust in galaxy disks from Sarah Leslie (MPIA) and about galaxy nuclear star cluster formation from Nicolas Guillard (MPIA). Afterwards, I had a long conversation with Simon J. Murphy (Sydney) about his work on asteroseismic binary stars. I asked him to try doing some experiments towards building a (less approximate) likelihood for asteroseismic binary signals. We also talked about super-Nyquist frequency determination, and also his (very clever) idea for improving TESS by having the satellite disrupt its periodic sampling of stars each time it points back at Earth to downlink. This breaks the pure periodicity of the time sampling (in the spacecraft frame) and therefore delivers more information, without overly complexifying spacecraft operations or data analyses.

Late in the day I thought about weighing stars in Gaia using the “lensing” of unbound star trajectories. I am not sure it is possible, but it is interesting to think about.


not ready for Gaia, exoplanet spectroscopy

A constant theme of the past summers in Heidelberg has been our unreadiness for the Gaia data releases. The first release is this September 14, so it really is now down to the wire. Hans-Walter Rix led a very interactive Milky Way Group Meeting today on the subject, where we first (with the help of Coryn Bailer-Jones MPIA) outlined the contents of the first data release, and then brainstormed projects. Here (below) is the board after the first part. My view is that there are a lot of papers that could be published in the first week after the release, and good papers, too! But it is important to understand up-front what the restricted data release can and cannot do.

In the afternoon, I had a long conversation with Jeroen Bouwman (MPIA) and Juergen Schreiber (MPIA) about spectroscopy of exoplanet transits with infrared spectrographs. There are many systematic effects, some deterministically related to spacecraft state and pointing, some more stochastic, and the idea is to model them as accurately as possible. I suggested an approach very similar to Dun Wang's CPM, in which we model data pixel values at one wavelength using pixels at other wavelengths. We came up with first steps to try.
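The flavor of the approach, in a toy sketch (synthetic numbers; Wang's actual CPM also uses regularization and train/test splits, which this skips): predict the target channel as a linear combination of the other channels, which share the systematics but not the transit.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic data: one shared systematic trend hits all wavelength
# channels; the target channel also carries a small transit-like dip.
n_time, n_chan = 200, 12
systematic = 0.01 * np.cumsum(rng.normal(size=n_time))
channels = systematic[:, None] * rng.uniform(0.5, 1.5, n_chan) \
    + 0.002 * rng.normal(size=(n_time, n_chan))
t = np.arange(n_time)
transit = np.where((t > 90) & (t < 110), -0.01, 0.0)
target = channels[:, 0] + transit

# Fit the target as a linear combination of the *other* channels;
# the fit soaks up the shared systematics but not the transit.
predictors = channels[:, 1:]
w, *_ = np.linalg.lstsq(predictors, target, rcond=None)
residual = target - predictors @ w  # transit survives in the residual
```

The deterministic spacecraft-state effects could enter the same framework as additional predictor columns.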


omg delta-Scuti stars

In the MPIA stellar physics group meeting today, in what might be the best seminar I have ever heard about the Kepler data, Simon J. Murphy (Sydney) told us about things he has been doing with delta-Scuti variable stars. There were so many results in his 30-minute discussion I can't even name them all here, but here are some highlights: delta-Scuti stars have such high coherence (Qs in the 10^5 or 10^6 range, or maybe even higher) that they are almost as good as pulsars as timing standards. By timing the stars, you can see binaries, and he has found hundreds of them. He can measure the binary parameters in the timing of every one of multiple modes! In some binaries you can even see the asteroseismic modes of both stars: Like double-lined spectroscopic binaries, these are double-moded asteroseismic binaries. Insane, and incredible tests of stellar physics. But that's not all: He has some that are even transiting, to test radius models and more. Among his binaries are brown dwarfs, but also a few planets. One of these is a habitable-zone planet in orbit around an A-type star! That's a first, and very good for habitability, because it is near the triple-point for water but also has lots of blue photons for life-giving free energy! He showed eccentricity and period distributions for binaries, showing evidence for circularization at small radii and also posterior inferences about binaries at periods much longer than the Kepler mission lifetime. So. Many. Things.

In addition to this, Mia Lundkvist (LSW Heidelberg) gave a pedagogical introduction to asteroseismology and showed her evidence that super-Earths are getting ablated or destroyed or stripped when they are very close to their host stars. I spent the afternoon reading and editing in Dan Foreman-Mackey's latest draft for his long-period planet paper.


stellar pairs too close for comfort?

Today started my summer at MPIA in Heidelberg. I had many conversations with Hans-Walter Rix about our various projects, and then spent a long time working with Melissa Ness on pairs of stars with near-identical abundances. Ness finds that if the stars are very, very close in 19-dimensional abundance space, they also tend to be close in position (like within 500 pc). That's hard to understand, since red-clump stars (her targets) are a few Gyr old, and in a few Gyr, two stars that are formed together ought to have drifted apart by a few kpc!

Her results are surprising if true. This left us at the end of the day with Ness working to see if there are any issues with the data that could be causing the problems, and Rix and me discussing possible astrophysical explanations. Julianne Dalcanton (UW) and Rix jointly suggested that we are finding disrupted binaries, ones that are disrupting because of the red-giant mass loss that leads to red-clumpiness! This explanation is insane but works on many levels. Now to test it.


optimized photometry and etc.

I finished the zeroth draft of my philosophical paper on Dawkins. I sent it to a few friendlies for comments, and some of them gave fast responses which make me think I need to do massive revision! So it goes!

I had a long chat with Dan Foreman-Mackey about current projects and ideas. We discussed the sensibility (or not) of publishing the results (with Dun Wang and Bernhard Schölkopf) on using ICA to find or perform photometry on variable stars in crowded fields. I want to publish this, because it is so simple (and maybe contains some insight) but we can't see how this generalizes or wouldn't be demolished by even a simple forward model. We discussed a generalization of The Cannon that simultaneously fits a physical model and then a data-driven model for just the residuals away from that physical model. If we could get away with a linear model for the residuals, we could explicitly marginalize out a bunch of stuff. Foreman-Mackey reported on work by Benjamin Pope (Oxford) on photometry that connects to the OWL, my unpublished (and, really, dormant-to-dead) project on optimal photometry in bad imaging. I resolved to drop Pope a line.