In a great victory, Jessica Birky (UCSD) has used The Cannon to put internally consistent labels on more than 10,000 M-type dwarf stars observed in (both) the SDSS-IV APOGEE and Gaia surveys. Her labels are based on a data-driven model of spectra, but they correlate beautifully with the Gaia photometry and astrometry, and they do a good job of predicting photometric measures of temperature. The spectral model is also remarkable: Using only two parameters (effective temperature and a mean composition) the model explains a typical 8000-pixel APOGEE spectrum to percent-level accuracy. So I am pretty happy. This has implications for TESS too. We spent time late in the day writing the paper.
Years ago, Jo Bovy (now Toronto) and I wrote this crazy paper, in which we infer the force law in the Solar System from a snapshot of the 8 planets' positions and velocities. Because you can't infer dynamics from initial conditions in general, we had to make additional assumptions; we made the assumptions that the system is old, non-resonant, and being observed at no special time. That led to the conclusion that the distribution function should depend only on actions and not on conjugate angles.
But that's not enough: How to do inference? The frequentist solution is orbital roulette, in which you choose the force law(s) in which the conjugate angles look well mixed or uniformly distributed. That's clever, but what's the Bayesian generalization? (Or, really, specification?)
It turns out that there is no way to generate the data with a likelihood function and also insist that the angles be mixed. In Bayesian inference, all you can do is generate the data, and the data can be generated with functions that don't depend on angles. But beyond the generative model, you can't additionally insist that the angles look mixed. That isn't part of the generative model! So the solution (which was expensive) was to just model the kinematic snapshot with a very general form for the distribution function, which has a lot of flexibility but only depends on actions, generate the angles uniformly, and hope for the best. And it worked.
Why am I saying all of this? Because exactly the same issue came up today (and in the last few weeks) between Rix (MPIA) and me: I have this project to find the potential in which the chemical abundances don't vary with angle. And I can make frequentist methods that are based on minimizing angle dependences. But the only Bayesian methods I can create don't really directly insist that the abundances don't depend on angle: They only insist that the abundance distribution is controlled by the actions alone. I spent the non-discussion part of the day coding up relevant stuff.
I gave my annual blackboard Königstuhl Colloquium today. This year I spoke about fitting models, which was a reprise of my 2010 talk that launched the infamous polemical tome. I spent some time on the point that you make your life hard (or your results wrong, or both) if you cut your data or select your sample on the quantities that your model generates. You should cut or trim or select on housekeeping data that aren't part of your probabilistic model! I also talked about outliers, model selection, and subjectiveness.
In the morning, I spent time talking with Rix (MPIA) and (by email) Jo Bovy (Toronto) about my chemical-tangents method, or the idea that dynamical tori must be tangent to chemical-abundance level surfaces in 6-d phase space. Bovy agreed with my position that this idea is new, though I wrote to him about it because it is so closely connected to things he has done and is doing. And Rix agreed that the method doesn't depend (to first order) on survey selection functions. He also made me a toy model that showed feasibility. So this project is on.
Within one frantic half-hour, Eilers (MPIA) and I completely implemented a new version of The Cannon and ran it on her sample of luminous red giants. We did this so that we can compare the internals of her linear model for parallax estimation to the internal derivatives or label dependencies for The Cannon. This will let us take a step towards interpreting the internals of the spectrophotometric-parallax model. We scanned the comparison but it doesn't look quite as easy to interpret as I had hoped.
As soon as this was done, I said some words in MPIA Milky Way group meeting about my ideas for Chemical Tangents: That is, the idea that orbits must lie in the level surfaces (hyper-surfaces in 6-d phase space) of the chemical abundance distribution. The method puts an enormous number of constraints on the orbit space, so it has the potential to be extremely constraining. Rix (MPIA) is suspicious that it all sounds too good to be true: The method requires no knowledge of the selection function (to zeroth order) and no second-order statistics. It is entirely first-order in the data. Damn I hope I'm not wrong here.
In the morning, Rene Andrae (MPIA) showed me his enormous cross-match of spectroscopic surveys that he is putting together in part to understand the stellar parameter pipelines of Gaia (to which he is a contributor). He has the input data for a combinatoric diversity of projects we could do with The Cannon or stellar-parameter self-calibration.
Rix (MPIA) started the day concerned with substantial issues with the linear parallax model that Eilers (MPIA) and I have built; we spent much of the day following them up. Our precision gets worse with distance—an effect we have noticed all summer but haven't been able to explain—and now we have to explain it! We compared stars in clusters and looked at parallax offsets as a function of various things; we don't yet have an explanation. But we did do some straightforward error propagation and guess what: Our precision really can't be much better than the 9-ish percent that we are seeing. The whole exercise left me more confident in the quality of the model in the end: The model really seems to have learned how to cope with dust, age, and intrinsic luminosity effects, even though we didn't tell it how.
In a call with Bonaca (Harvard) we looked at oddities in her model of the morphology of the GD-1 stream gaps. We had some provable scalings that should be there but the code wasn't reproducing them. We worked out today that the stream perturbation isn't quite in the regime we thought it was. In more detail: An encounter of a massive perturber with a stream is impulsive if GM/(b v^2) is much less than 1, where G is Newton's constant, M is the perturber's mass, b is the impact parameter, and v is the relative velocity of the encounter (or maybe some component thereof). That is, you have to have this dimensionless number much less than unity if you want the impulse approximation to hold. Duh! But now we understand the simulations she is making.
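A minimal sketch of that check, assuming Galactic units (kpc, km/s, solar masses). The function name and the two example encounters are hypothetical and are not taken from the GD-1 model.

```python
# Newton's constant in kpc (km/s)^2 / Msun.
G_KPC_KMS2_PER_MSUN = 4.30091e-6


def impulse_parameter(mass_msun, impact_kpc, speed_kms):
    """Return the dimensionless number G M / (b v^2) from the text."""
    return G_KPC_KMS2_PER_MSUN * mass_msun / (impact_kpc * speed_kms ** 2)


# Hypothetical encounters: same perturber and impact parameter,
# but a fast flyby versus a slow one.
fast = impulse_parameter(mass_msun=1e7, impact_kpc=0.015, speed_kms=200.0)
slow = impulse_parameter(mass_msun=1e7, impact_kpc=0.015, speed_kms=50.0)

print(f"fast flyby: G M / (b v^2) = {fast:.3f}")
print(f"slow flyby: G M / (b v^2) = {slow:.3f}")
```

With these made-up numbers the fast flyby sits well below unity and the slow one does not, so only the first would be a candidate for the impulse approximation.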
The day ended with Birky (UCSD) and me calling Andrew Mann (UNC) and Adam Burgasser (UCSD) to discuss Birky's results modeling M-type dwarf spectra in APOGEE. She has beautiful results, and can show both that her spectral models are accurate (in the space of the spectral data) and that her inferences about latents (temperature and metallicity) are reasonable when compared with proxies and tests of various kinds. So it is time to finish writing it up! We made plans for that. One amusing thing about her project is that it creates a beautiful translation between temperature, metallicity, and spectral type. And it isn't trivial!
Today in MPIA/LSW Stars Meeting Néstor Espinoza (MPIA) gave a nice presentation about how star spots (cool spots) and faculae (hot spots) on stellar surfaces make it difficult to simply extract an exoplanet transit spectrum from differences between in-transit and out-of-transit spectra of the star. Some of the issues are extremely intractable: Even spectral monitoring of the star might not help in certain geometries. But we did agree that space-based spectral monitoring could do a lot towards understanding the issues. He showed that some of the transit-spectrum results in the literature are likely wrong, too. One conclusion: Gaia low-resolution spectrophotometry as a function of transit epoch at Gaia DR4 or thereabouts might have a lot to say here! And I also thought: SPHEREx!
After weeks of writing, today I finished the zeroth draft (yes, it isn't even close to being ready for anything) of the paper about our spectrophotometric parallax model for luminous red giant stars with Eilers (MPIA). I will get it into a state in which I can share it with the APOGEE team this week.
And Eilers made maps of kinematic evidence of non-axi-symmetry in the Milky Way disk and radial abundance gradients, using our luminous red giants. We have lots of issues of interpretation, but there are a lot of things here. In my spare brain cycles I figured out a way that we could use Eilers's results to calibrate the variations of the inferred stellar abundances as a function of effective temperature and surface gravity: We can see that the data have issues.
At the suggestion of Rix (MPIA), Eilers (MPIA), Rix, and I applied Eilers's and my linear model for parallax prediction to the RR Lyrae sample from PanSTARRS and Gaia DR2 today. It worked beautifully, delivering an error-convolved scatter of less than 7 percent, and an error-deconvolved intrinsic scatter of something more like 5 percent in distance. That's exciting! Our features are magnitudes, period, and light-curve shape parameters. Eilers was able to do this all in under an hour, because it was a plug-in replacement for the model we built for upper-red-giant-branch stars. This is another confirmation that on sufficiently small parts of the color–magnitude diagram, linear models can do a great job of predicting stellar properties, especially absolute magnitude or distance. Deep learning be damned!
Aside from this, most of my research time today (and this weekend) was spent writing. Trying to submit the red-giant paper before I depart Germany.
I spent the day hiding from all responsibilities in order to write. I wrote in my spectrophotometric-distances paper, and I wrote in my new chemical-tangents paper. I am trying to get the first of these done and submitted before I leave Heidelberg this month.
I also did a little bit of coding in the chemical-tangents project. I wrote up a general integrator that can take a general vertical density profile in the Milky Way and integrate one-dimensional orbits. It produces position, velocity, and phase for general orbits in the general one-d gravitational problem. Next up: Using this to characterize the GALAH data.
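An integrator of this kind might look roughly like the sketch below. It is not the code from the project: it assumes a density that is symmetric about the midplane, works in units where 4 pi G = 1, uses a leapfrog scheme, and returns only position and velocity (no phase). All function names are hypothetical.

```python
import math


def vertical_acceleration(z, density, n_cells=200):
    """a(z) = -4 pi G * integral_0^z rho(z') dz', in units with 4 pi G = 1.

    The integral uses the midpoint rule; `density` is any callable rho(z)
    that is symmetric about the midplane z = 0.
    """
    dz = z / n_cells
    column = sum(density((i + 0.5) * dz) for i in range(n_cells)) * dz
    return -column


def integrate_orbit(z0, v0, density, dt, n_steps):
    """Leapfrog (kick-drift-kick) integration of a one-dimensional orbit."""
    z, v = z0, v0
    zs, vs = [z], [v]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * vertical_acceleration(z, density)
        z = z + dt * v_half
        v = v_half + 0.5 * dt * vertical_acceleration(z, density)
        zs.append(z)
        vs.append(v)
    return zs, vs


def uniform(z):
    """Constant density: the orbit is a harmonic oscillator with period 2 pi."""
    return 1.0


def sech2(z, scale_height=0.3):
    """Hypothetical sech^2 disk profile with unit midplane density."""
    return 1.0 / math.cosh(z / scale_height) ** 2


# Sanity check: one full period in the uniform slab should return to the start.
period = 2.0 * math.pi
zs, vs = integrate_orbit(1.0, 0.0, uniform, dt=period / 1000, n_steps=1000)

# A more disk-like case, launched from the midplane.
zs2, vs2 = integrate_orbit(0.0, 0.2, sech2, dt=0.01, n_steps=2000)
print(f"maximum height in the sech^2 disk: {max(abs(z) for z in zs2):.3f}")
```

The uniform-density case is there because its answer is known analytically, which makes it a cheap test before moving to a realistic profile.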
After the incredibly valuable Milky Way Group Meeting discussion of the spectrophotometric parallaxes, Eilers (MPIA) and I simplified our model, re-factored the code, and re-ran. And, despite the fact that the new model is provably better than the old model, everything failed. The reason is: Our objective isn't convex. Not only that, but there is an enormously high-dimensional degenerate bad optimum that is hard to avoid. That sent us back to the books: Optimization is hard!
The trick we settled on (and you are allowed to do many, many tricks here) is to take the very highest signal-to-noise stars (in terms of Gaia parallax) to optimize an initialization and then do our final optimization with all stars, but starting off from that initialization. That is, we burn in to the optimum using the best stars first. It's a hack but it worked, and now the better model is performing the way it should be. That's good! Because it is discouraging when you refactor your code and everything gets worse.
At MPIA Galaxy Coffee, Wolfgang Brandner (MPIA) described the new GRAVITY results on the pericenter passage of S2 at the Galactic Center. The pericenter passage shows gravitational and transverse-Doppler redshifts and puts an amazingly strong constraint on the geometry and kinematics of the Galactic Center.
Today was spectrophotometric-parallax day. I did writing in the paper, I presented the method at MPIA Milky Way Group Meeting, and Eilers (MPIA) and I slightly refactored the model. In the presentation I gave, we got lots of feedback about how to present the method, which I tried to record carefully in the to-do list at the top of our LaTeX document. We also realized that without much change, we could move the model from a model for magnitude to a direct model for the parallax, bypassing any physical idea of how the star indicates its parallax (which is through its brightness and its log-g, to leading order). So our model is now truly data-driven. We also realized that we could make changes to how we represent the spectral pixels that might make the parameters more well-behaved.
All these things are great things! But when we made the relevant code changes, everything borked. The reason appears simple: It is because the model has a bad pathology: While it has a very good, sensible, non-trivial optimum, it has an enormous family of degenerate trivial optima in which the exponential underflows, the predicted parallaxes are all zero, and the derivatives all vanish. And at 7400 free parameters, this degenerate set of minima has a huge space (huge entropy) to find and eat our optimizer. So by the end of the day, Eilers and I realized we have to get much more clever about initializing the optimizer.
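The pathology can be seen in a one-star toy version. This sketch assumes the predicted parallax is the exponential of some linear predictor (inferred from the remark that the exponential underflows) and a chi-squared objective; the function name and the numbers are hypothetical.

```python
import math


def chi_squared_and_gradient(log_parallax_pred, parallax_obs, sigma):
    """Chi-squared for one star, and its derivative with respect to the
    predicted log-parallax, for a model parallax = exp(log_parallax_pred)."""
    model = math.exp(log_parallax_pred)
    residual = parallax_obs - model
    chi2 = (residual / sigma) ** 2
    gradient = -2.0 * residual * model / sigma ** 2
    return chi2, gradient


# Hypothetical star: observed parallax 0.5 mas with 0.05 mas uncertainty.
near = chi_squared_and_gradient(math.log(0.4), 0.5, 0.05)
dead = chi_squared_and_gradient(-800.0, 0.5, 0.05)

print("near the sensible optimum: chi2, gradient =", near)
print("deep in the trivial region: chi2, gradient =", dead)
```

Near the sensible optimum the gradient is non-zero and points towards a better fit. Deep in the trivial region the exponential underflows to zero, the chi-squared is large but flat, and a gradient-based optimizer gets no signal to leave.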
Question of the day: Does the method need a name, like The Cygnet? Or is it okay to just call it “linear spectrophotometric parallax”?
I spent the day writing in the spectroscopic-parallax project. I wrote six or seven paragraphs, and that's about it! (Actually, that's a great day: My goal is two paragraphs per day.)
But in addition to the writing, I did have an interesting conversation with Tom Herbst (MPIA), Thomas Bertram (MPIA), and Kalyan Radhakrishnan (MPIA) about adaptive optics. The idea is to think about using the science data (the imaging you care about) to update the adaptive mirrors. What new things might be unlocked by that, especially if used in concert with the wavefront sensors? This reminds me of old conversations I have had with Matthew Kenworthy (Leiden). I also asked what kinds of science you might do with the wavefront sensors. Just as the imaging detector gives wavefront information, the wavefront sensors give imaging information!
I also was present for presentations by Eilers (MPIA) and Birky (UCSD) on their stellar projects in the MPIA Stars Group Meeting.
The excitement of the day is that we looked at velocity-tensor maps (maps of the mean velocity-velocity products) across the disk with Eilers (MPIA): We see lots of structure, including possible evidence of spiral arms or bar resonances on the off-diagonal tensor components. Reminder: If the Galaxy is axisymmetric, there will only be diagonal tensor components in the R, phi, z coordinate system. If we find off-diagonal components: Non-axisymmetry. Could be interesting. Rix (MPIA) encouraged us to stay on target for a Jeans model and leave these hints of complex disk morphology for later investigations.
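For concreteness, here is a minimal sketch of one pixel of such a map: the matrix of mean velocity products for the stars in a single spatial bin. The function name and the velocities (in km/s) are made up for illustration, and no uncertainties are handled.

```python
def velocity_tensor(velocities):
    """Return the 3x3 matrix of mean products <v_i v_j>.

    `velocities` is a list of (v_R, v_phi, v_z) tuples for the stars
    in one spatial bin of the map.
    """
    n = len(velocities)
    return [[sum(v[i] * v[j] for v in velocities) / n for j in range(3)]
            for i in range(3)]


# Made-up bin with no correlation between components.
quiet_bin = [(10.0, 220.0, 5.0), (-10.0, 220.0, 5.0),
             (10.0, 220.0, -5.0), (-10.0, 220.0, -5.0)]

# Made-up bin in which v_R and v_phi move together.
streaming_bin = [(10.0, 230.0, 0.0), (-10.0, 210.0, 0.0)]

quiet = velocity_tensor(quiet_bin)
streaming = velocity_tensor(streaming_bin)
print("quiet bin, <v_R v_phi>:", quiet[0][1])
print("streaming bin, <v_R v_phi>:", streaming[0][1])
```

The off-diagonal entries of this matrix, mapped across the disk, are the components discussed above.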
In addition to this, I had a great chat with Maria Bergemann (MPIA) and Mikhail Kovalev (MPIA) about fitting spectra with spectral models, given that the models are amazingly expensive to compute. They do a (random) grid and then interpolate using The Payne. They are getting some results they aren't happy with, so I walked through basic tests that can be done in these situations.
Basic sanity checks—when you are fitting data using an interpolation of a grid or random assemblage of model predictions—are the following: Find the closest model point in the grid, and then the K next closest, where K is larger than the dimensionality of the model parameter space. Is the best-fit model in the convex hull of the K? Are the K in one group or multiple groups? Do the K look like they hit the edge of the grid? And what are the chi-squared values? And is the interpolated best point also in the convex hull? All these pieces of information go into an analysis of whether you have enough model evaluations and how to interpolate them.
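Some of these checks are easy to sketch. The code below covers only the chi-squared ranking and the edge-of-grid test; the convex-hull and grouping tests are left out because they need a linear-programming or clustering step. The grid, the three-pixel toy "spectrum", and all function names are hypothetical.

```python
def chi_squared(data, model, sigma):
    """Sum of squared, uncertainty-scaled residuals."""
    return sum(((d - m) / s) ** 2 for d, m, s in zip(data, model, sigma))


def nearest_grid_points(data, sigma, grid, k):
    """Rank grid entries by chi-squared and return the k best.

    `grid` is a list of (parameters, model_spectrum) pairs.
    """
    scored = [(chi_squared(data, spectrum, sigma), params)
              for params, spectrum in grid]
    scored.sort(key=lambda pair: pair[0])
    return scored[:k]


def touches_grid_edge(points, grid):
    """True if any point sits at the lowest or highest grid value
    in any parameter dimension."""
    all_params = [params for params, _ in grid]
    n_dim = len(all_params[0])
    lows = [min(p[d] for p in all_params) for d in range(n_dim)]
    highs = [max(p[d] for p in all_params) for d in range(n_dim)]
    return any(p[d] in (lows[d], highs[d])
               for p in points for d in range(n_dim))


def toy_spectrum(teff, feh):
    """Hypothetical three-pixel stand-in for an expensive model spectrum."""
    return [teff / 1000.0, feh, teff / 1000.0 + feh]


grid = [((teff, feh), toy_spectrum(teff, feh))
        for teff in (4000.0, 4500.0, 5000.0, 5500.0)
        for feh in (-1.0, -0.5, 0.0, 0.5)]

data = toy_spectrum(4700.0, -0.2)
sigma = [0.1, 0.1, 0.1]

# K = 3 exceeds the two-dimensional parameter space, as the text requires.
best = nearest_grid_points(data, sigma, grid, k=3)
for chi2, params in best:
    print(f"chi2 = {chi2:6.1f} at {params}")
print("hits grid edge:", touches_grid_edge([p for _, p in best], grid))
```

Looking at the chi-squared values of the K points side by side is already informative: if even the best one is a poor fit, the grid is too coarse there.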
Today Ana Bonaca (Harvard) showed beautifully that the features seen in the GD-1 stellar stream are very well described by an encounter in the past (collision, if you will) with a dark-matter substructure. Her argument is fundamentally qualitative, but so many aspects of the data are matched by the toy model she has made that it is hard to see how to get around the conclusion. This could be huge! We discussed the scope of the paper she could write (or really the content of the abstract).
We spent the day discussing Milky-Way halo and disk structures with Amina Helmi (Kapteyn). It was fun! Along the way, Adrian Price-Whelan (Princeton) and I spent time looking at large halo structures that have been found in the literature. We could find some extremely odd structures when we match the cuts used in the papers we were looking at. And then we found the following:
Say you are cutting at parallax signal-to-noise of 5 (parallax over parallax error greater than 5). And then you look at the configuration-space shape of the stellar distribution you find. Well, guess what? Since parallax errors are a strong function of sky position, the shape of your object will be very strange at large distance. For instance, the parallax errors only go below 0.05 mas in some parts of the sky. So your stellar distribution will only extend out past 4 kpc in some specific directions (and not all directions).
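The arithmetic behind that 4 kpc figure can be sketched as follows, assuming distance is the inverse of the true parallax and ignoring noise scatter across the cut. The function name, the region labels, and the uncertainties other than 0.05 mas are hypothetical.

```python
def max_distance_kpc(parallax_error_mas, min_snr):
    """Largest distance at which a star can satisfy parallax / error > min_snr.

    The cut requires parallax > min_snr * error (in mas), and distance in
    kpc is 1 / parallax in mas, so the limit is 1 / (min_snr * error).
    """
    return 1.0 / (min_snr * parallax_error_mas)


# Hypothetical parallax uncertainties (mas) in different parts of the sky.
regions = [("well-scanned", 0.03), ("typical", 0.05), ("poorly-scanned", 0.10)]
for name, sigma in regions:
    limit = max_distance_kpc(sigma, min_snr=5.0)
    print(f"{name:>15}: sample ends near {limit:.1f} kpc")
```

The outer boundary of the selected sample therefore traces the map of parallax uncertainty on the sky rather than any real structure.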
All this relates to various things I have said repeatedly in this forum: Build your science on measured quantities, not estimated uncertainties on those quantities! Your uncertainties are not really your data, and it is almost impossible to know your uncertainties on your uncertainties. Furthermore, the people who want to cut on parallax signal-to-noise are also using inverse-parallax as distance, and that's dangerous too. Finally, if you cut on parallax signal-to-noise, you will bias any means or averages or regressions you do using those parallaxes.
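The last point is easy to demonstrate with a toy simulation. This sketch assumes Gaussian parallax noise and a population of stars that all share one true parallax; the function name and all numbers are hypothetical.

```python
import random


def mean_parallax_with_and_without_cut(true_parallax, sigma, min_snr,
                                       n_stars, seed=0):
    """Simulate noisy parallaxes and compare the mean of all stars with
    the mean of those passing the signal-to-noise cut."""
    rng = random.Random(seed)
    observed = [rng.gauss(true_parallax, sigma) for _ in range(n_stars)]
    kept = [p for p in observed if p / sigma > min_snr]
    return sum(observed) / len(observed), sum(kept) / len(kept), len(kept)


# Stars with true signal-to-noise of 6, cut at observed signal-to-noise of 5.
all_mean, cut_mean, n_kept = mean_parallax_with_and_without_cut(
    true_parallax=0.6, sigma=0.1, min_snr=5.0, n_stars=20000)

print(f"mean of all stars:      {all_mean:.4f} mas")
print(f"mean after the cut:     {cut_mean:.4f} mas ({n_kept} stars kept)")
```

Even though every star here is nominally above the threshold, the cut removes the stars that scattered low and keeps those that scattered high, so the surviving mean parallax is too large and the implied distances are too small.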
My advice: Find ways to work that don't require these cuts. These issues are a big danger for people using actions to study the stellar distribution: Actions require distances, distances are generally inverse-parallax, and then low signal-to-noise parallaxes must get cut. These arguments apply there too. We have to forward-model the data if we want to understand spatial structures, I am afraid.
At lunch we had a discussion (inspired by Bertrand Goldman, MPIA) about the expected shapes of open clusters. I think they should be elongated along their orbits. There was some back and forth but this made me more confident: Once the clusters start to disperse, they should distort through orbital phase-frequency differences. I proposed a simple test of this. But I'm more interested in the point that this should help us find new kinds of (maybe older) clusters!
In the afternoon, Amina Helmi (Kapteyn) showed up and Bonaca (Harvard), Price-Whelan (Princeton), and I discussed many things with her. We discussed the question of when and how stellar streams in the Milky Way halo constrain purely local properties of the Galaxy. Does this result (from Bonaca) depend on the potential being time-dependent? I think it does. Helmi didn't disagree but is optimistic that we can handle the time dependence.
We also discussed the lack of tidal tails around globular clusters: Is it surprising that only Palomar 5 has these tails? Price-Whelan has looked at a few of the most likely clusters in Gaia, and nada. This led to (or was part of) a longer discussion of the statistics of streams: How many will there be and how many do we expect?