Aspen, day 5

Today was a whirlwind of meetings and sessions. I started with a great conversation with Vasiliev (Cambridge) and Valluri (Michigan) about statistical inference problems related to Schwarzschild modeling a galaxy using an orbit library. I'm not sure I helped much! Right now no-one knows how to realistically marginalize out the effective distribution function, although I think there might be good ideas somewhere in the probabilistic machine-learning world.

We had a plenary discussion of non-steady-state and non-equilibrium aspects of the Milky Way and how we will model or understand them. Fundamentally, we only know how to infer the dynamics of the Milky Way by making strong assumptions: Either (in the case of Jeans or Schwarzschild modeling, say) that there is time symmetry and also cylindrical or spherical symmetries, or (in the case of stream modeling, say) that the stars were put onto orbits in some collectively informative way. Since the real Milky Way violates these assumptions for most stars at some level, we need qualitatively new kinds of assumptions to make. My proposal: That the Milky Way grew from the cosmological initial conditions! That's the right thing, I think, but we don't yet have any tractable way to think about (say) the Gaia data in that context. At least not precisely.

In the afternoon, there was a cross-meeting dark-matter session in which a large set of particle physicists and a large set of astrophysicists interacted over testing dark-matter models. I learned that there is a huge literature we don't know enough about. I am very interested in going down this path, because it connects Gaia and SDSS-V to fundamental properties of the Universe. That's what I would really love to do (and that's in part why my original idea for SDSS-V was called "Disco": Cosmology with the disk).

At some point in the day, I realized that we can test the chemical-tangents method (my baby) on fake data! I discussed this with Loebman (Davis) and Price-Whelan (Princeton). I also realized that we can compare it to Jeans modeling, and show (I hope) that it always wins.

At the very end of the day, I had a conversation with Anderson (Flatiron) about advising and mentoring of postdocs. I feel very lost, I have to say: I want to give my attention to all my projects, my attention and time are limited, I don't spend my attention in the right places, I disappoint many of my people, and I impede their progress. I feel like I am doing it wrong! I don't feel like I understand how to be a mentor, and I am starting to feel stressed about it. One of the strange things is that the postdocs with whom I work are both the best and most fun collaborators I have, and also independent, capable scientists. That would seem to make it all easy and fun, but instead it somehow makes it confusing and existential. I went home unhappy from an absolutely great week in Aspen.


Aspen, day 4

Adrian Price-Whelan (Princeton) resolved some of our code differences today as differences of units or dimensions. That was good! But we still have the problem that different elements (in comparison with kinematics) lead to different inferences about orbits in the Milky Way disk. Don't know what to do about that! Either the data are wrong, or there is a big discovery here.

Ana Bonaca (Harvard), Price-Whelan, and I discussed how to build a pseudo-likelihood for comparing the models that Bonaca has for a stream perturbation to the real data. This is a bit of a hard problem, because we want objectives that improve as the agreement improves, but we don't want to build a fully generative model of the data. Why not? Because we don't have a good generative model, and perturbations away from a bad generative model could lead to very wrong inferences. All we want, after all, is a rough sense of what kinds of events are consistent with the data.

In the afternoon, I gave the Aspen Center for Physics Colloquium. I spoke about Gaia and dark matter, but I also threw in my thinking about the inference of Solar System dynamics in the 17th Century: We would do it very differently now! I have much more to say but I am too tired to write it here.

Aspen, day 3

Side by side, Price-Whelan (Princeton) and I worked through and discussed code inconsistencies between my code and his code to compute the likelihood function for my chemical-tangents (working title) project, which uses chemical invariants to find dynamical invariants. It was a frustrating discussion, because we couldn't figure out either the issues or how to test for them. That's our project for tomorrow!

Bonaca (Harvard) reported in: She can show that the gaps and associated loops we find in the GD-1 stream cannot be caused by interaction with a molecular cloud on a disk orbit! That means that the only explanation remaining is dark-matter substructure. Awesome! I'm stoked!


Aspen, day 2

After I found an insanely huge and existential bug in my code, Price-Whelan (Princeton) and I did a full code review of my project that takes advantage of chemical-abundance invariance to determine dynamical invariants. The big issue is that the action computation is expensive; it involves some kind of integration or quadrature. As we were discussing this, Eugene Vasiliev (Cambridge) joined us and suggested ways to speed up the action calculation using the energy invariant to aid in the integration.

[Insert tire screeching noise] We don't need actions! We only need some kind of invariant for this project. Indeed, I have tried various different invariants and they all work equally well. So we can use the energy invariant, which requires no integration! Woah, and thanks Vasiliev! That sped up the code by factors of many hundreds, some of which we then spent on more complete MCMC samplings. But the development cycle is far improved.
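The point is easiest to see in a one-dimensional toy problem. Here is a minimal sketch (the harmonic potential, frequency, and units are all made up for illustration) of the energy invariant that replaces the expensive action quadrature:

```python
import numpy as np

def vertical_energy(z, vz, potential):
    """Energy invariant for 1-d vertical motion: E = vz^2 / 2 + Phi(z).

    Unlike the vertical action, this requires no orbit integration or
    quadrature, so it is cheap to evaluate inside an MCMC loop.
    """
    return 0.5 * vz ** 2 + potential(z)

# toy harmonic potential, Phi(z) = 0.5 * omega^2 * z^2 (illustrative units)
omega = 0.1
phi = lambda z: 0.5 * omega ** 2 * z ** 2

# the invariant is conserved along an orbit: evaluate it at two
# phase-space points on the same orbit (turning point; midplane crossing)
z_max = 300.0                                      # turning point, vz = 0
E_turn = vertical_energy(z_max, 0.0, phi)
E_mid = vertical_energy(0.0, omega * z_max, phi)   # midplane, vz = omega * z_max
assert np.isclose(E_turn, E_mid)
```

Any function of the invariants would do for the project; the energy is just the cheapest one.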

I also had a great planning session with Bonaca (Harvard) where we worked out coordinate systems and methods for making far more realistic our project to model the GD-1 stream gaps. We are modeling one of the gaps and spurs with a dark-matter (or really dense-object) interaction. We needed to make things far more realistic, because we want to rule out disk-passage events as gap causes. That requires that we have GD-1 orbiting in a Galaxy with a disk and the observer (us) in the right places at the right times. Things are complicated, because everything happens on the kinematic equivalent of the past light cone—the past star cone?


Aspen, day 1

I'm in Aspen for the week, working on Milky Way dynamics and chemical abundances. The day started with introductions, in which many themes arose. One is that detailed abundances are here, are good, are numerous, and are under-utilized. So the work I am doing fits in pretty well! It occurred to me (Duh!), when some of the more particle-oriented people spoke, that imaging the dark matter is no different from testing gravity, at least conceptually. So I can spin off a testing-gravity side project.

The MCMC runs I set off on my laptop on Friday did not disappoint: I got amazingly strong constraints on the dynamics of the disk, including a percent-level measurement of the disk density! That's a precision of course; at percent level none of my assumptions are defensible. But it works really well: The iso-chemical contours really do show you the orbits, and precisely.

But, the most interesting respect in which my assumptions are violated is that the different elements want to put the midplane of the disk in different locations! Huh? And the effect is just clearly visible in the GALAH data. Rix (MPIA) pointed out today that the phase-mixing times can be long near the disk center because in the limit of a harmonic potential, all frequencies are degenerate and mixing doesn't happen. So maybe we have clear evidence for non-phase-mixing, vertically, in the disk. Or of course maybe there are (very adversarial, I might say) issues with the GALAH data. But the nice thing is you can just see it in the element abundances. Look for yourself: The midplane in [Fe/H] looks different from the midplane in [Si/Fe]. (Plots have z-velocity on the horizontal axis and z-position on the vertical axis).


bad development cycle is bad

My day started with a conversation with Christina Eilers (MPIA) about the Milky Way rotation curve. We found some strange kinematic data points that might be messing with us, and realized that they are almost all stars at or past the Bulge, and therefore not affecting our results, which are only for Galactocentric radii greater than 5 kpc, to avoid the craziness of the bar (which violates our dynamical assumptions). Her figures are ready, so I encouraged her to write figure captions and assemble the paper.

I spent my research time getting MCMC running on my Chemical Tangents project. I have a marginalized likelihood, so all I had to do was put priors on the parameters and insert the whole thing into emcee. Oh how I would have benefitted from a testing environment! When I packaged it all up for emcee I messed up the units of almost all the inputs, so I got garbage in every MCMC run. And the runs took a long time, so diagnosis was painful. Unit testing. And for units! Live and fail to learn, that's what I say.
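If I were doing it again, a tiny, testable conversion layer would have caught the bug in seconds instead of hours. This is just a sketch with hypothetical internal units and names, not the real code:

```python
# expected internal units for the likelihood inputs (an assumption for
# illustration): positions in pc, velocities in km/s
CONVERSIONS = {
    "z_kpc_to_pc": 1e3,      # kpc -> pc
    "v_m_s_to_km_s": 1e-3,   # m/s -> km/s
}

def to_internal(z_kpc, v_m_s):
    """Convert user-facing units to the internal units the likelihood
    expects; keeping every conversion in one place makes it testable."""
    return (z_kpc * CONVERSIONS["z_kpc_to_pc"],
            v_m_s * CONVERSIONS["v_m_s_to_km_s"])

# a unit test that runs in milliseconds, instead of diagnosing a
# garbage MCMC run many hours later
z_pc, v_km_s = to_internal(0.5, 20000.0)
assert z_pc == 500.0
assert v_km_s == 20.0
```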

Once everything appeared to be working, I set up some (nasty) multi-processing, set my laptop to stay awake all night, and launched the processes. I should have converged samplings by morning.


best constraints on disk dynamics ever

In my Chemical-Tangents project (working title only), I am modeling the abundance distribution as a function of dynamical actions to determine the shapes of the orbits in phase space, or, equivalently, the force law or the density distribution or the potential. I have parameters for the force law but also for the abundances and their variation. I spent the morning figuring out how to marginalize out those abundance parameters, which are nuisances (for my purposes). I got it working; much of it is even analytic (so I only have to do one numerical integral per element ratio).
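The flavor of the analytic part is the standard Gaussian trick: a nuisance mean integrates out in closed form under a flat prior, leaving a likelihood that only cares about scatter around the sample mean. A toy sketch (not the real abundance model):

```python
import numpy as np

def log_marginal_likelihood(x, sigma):
    """Log-likelihood of Gaussian data with the mean mu marginalized
    analytically under an (improper) flat prior.

    p(x | sigma) = integral over mu of prod_i N(x_i | mu, sigma^2),
    which has a closed form: the data constrain only deviations about
    the sample mean, and the Gaussian integral over mu contributes a
    sqrt(2 pi sigma^2 / n) normalization.
    """
    n = len(x)
    xbar = np.mean(x)
    chi2 = np.sum((x - xbar) ** 2) / sigma ** 2
    return (-0.5 * chi2
            - 0.5 * (n - 1) * np.log(2.0 * np.pi * sigma ** 2)
            - 0.5 * np.log(n))

# the marginal likelihood still peaks near the true scatter, even with
# the mean integrated out (fake data, made-up numbers)
rng = np.random.default_rng(17)
x = rng.normal(3.0, 0.1, size=100)
lls = [log_marginal_likelihood(x, s) for s in (0.05, 0.1, 0.2)]
assert lls[1] == max(lls)
```

Doing this once per element ratio is the "one numerical integral per element ratio" bookkeeping, in spirit.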

I then ran on everything, and I find that I have the best constraints on the Milky Way disk dynamics, ever! That is, on the kinematic location of the midplane, the scale height of the mass distribution, and the central or mean density. I am pretty stoked: Each abundance ratio individually gives good constraints on these parameters, and their combination will be exceedingly constraining. So I am pretty confident that I have a great project. My next job is to put this all into a sampler (like emcee, which is good for low-dimensional problems) and sample it.


detecting stars and galaxies in imaging

Another enjoyable morning today working with Dustin Lang (Toronto) on combining images. We looked at how the Zackay paper I talked about yesterday relates to ideas in speckle imaging. And Lang showed me his very nice argument that a detection map (like an image but really a statement about maximum-likelihood flux estimates for stars) that is sufficient for detecting point sources in the field is also sufficient for detecting galaxies. That is intuitively surprising. But it is also a claim in the Zackay paper. We have generalized the result, I think, to include variations in pixel scale, noise, and bandpass across the collection of images used to make the detection map.
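A detection map of that kind is easy to sketch in one dimension: the per-pixel maximum-likelihood flux of a point source is just a PSF-matched filter, normalized by the PSF self-overlap. This is a toy with made-up numbers, not Lang's code:

```python
import numpy as np

rng = np.random.default_rng(11)

# a tiny 1-d "image": unit-variance sky noise plus one injected source
n = 200
psf = np.exp(-0.5 * (np.arange(-8, 9) / 2.0) ** 2)
psf /= psf.sum()                 # normalized PSF (sigma = 2 pixels)
img = rng.normal(0.0, 1.0, n)
img[100:117] += 50.0 * psf       # source of flux 50 centered at pixel 108

# detection map: per-pixel maximum-likelihood flux of a point source
# centered there = matched filter divided by the PSF self-overlap
mf = np.correlate(img, psf, mode="same")
flux_ml = mf / np.sum(psf ** 2)

# the map peaks at the source position, with an unbiased flux estimate
assert abs(int(np.argmax(flux_ml)) - 108) <= 1
```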


what is a methods paper?

Dustin Lang (Toronto) and I spent time discussing this strongly worded paper about combining images. The paper makes extremely strong claims about what its method for co-adding images can do; it claims that the combined image is “optimal” for any measurement of any time-independent quantity of any kind in the original images. The word “optimal” is one I'm allergic to: For one, once you have written down your assumptions with sufficient precision, there is no optimal method, there is just one method! For two, optimal must be for a specific purpose, or set of assumptions. So while it is probably true (we are looking at it) that this paper delivers a method that is optimal for some purposes, it cannot be for any purpose whatsoever.

I guess I have a few general philosophical points to make here: Methods flow from assumptions! So if you can clearly state a complete set of assumptions, you will fully specify your method; there will be only one method that makes sense or is justifiable under those assumptions. It will therefore be (trivially) optimal for your purposes. That is, any well-specified method is optimal, by construction. And methods are defined not by what they do but by what they don't do. That is, your job when you deliver a method is to explain all the ways that it won't work and will fail and won't be appropriate in real-world situations. Because most people are trying to figure out if their problem is appropriate to your method! This means that much of the discussion of a methodological paper should be about how or why the method will fail or become wrong as the assumptions are violated. And finally, a method that relies on you knowing your PSF perfectly, and having perfectly registered images, and having no rotations of the detector relative to the sky, and having all images through exactly the same atmospheric transmission, and having all images taken with the same filter and detector, and having no photon noise beyond background noise, and having perfect flat-fielding, and having identical noise in every pixel, and having absolutely no time variability is not a method that is optimal for any measurement. That said, the paper contains some extremely good and important ideas, and we will be citing it positively.


writing is slow

I spent research time putting references in to my paper introduction, and outlining the method and discussion sections. I'm not efficient at writing! I enjoy it, but it isn't easy.


dynamics, abundances, and a likelihood function

In an undisclosed location off the grid I worked on my code to leverage element-abundance invariances of stars (in the GALAH Survey) to deliver dynamical invariants. I switched from a non-parametric abundance model that put the stars into abundance bins to a parametric model which makes the abundances a smooth function of actions. I also improved my action-computing code and built an interpolator to speed things up. And suddenly my likelihood function got really smooth, and it peaks at sensible dynamical parameters for the Milky Way disk! So I have a project. Note that the way I am operating is to work on the likelihood function until I can see that it behaves sensibly on slices (lines in parameter space) and only then will I think about the inference more generally.

After that success, I tweaked code, ran stuff, and started making an outline of the hard parts of the paper I need to write.


integrating orbits; calculating actions

I got up early and worked on an integrator to integrate orbits in a simplified one-dimensional dynamics problem designed to mimic the vertical properties of the Milky Way disk. I was integrating the orbits around in circles (closed orbits in 1-d kinematic space, or 2-d phase space) to get the actions and angles, but I realized that I only really need to integrate one quadrant: There are symmetries! So I switched to make use of those symmetries.

I am also working towards code that doesn't require one integral per star; I want to integrate a look-up table and interpolate to the stars. I need everything to be fast, because we have to compute these actions and angles inside the inference (optimization or MCMC) loop.
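Here's a sketch of that plan in the 1-d problem: one quadrature per grid point in energy, then interpolation per star. I use a harmonic potential so the answer (J = E / omega) is checkable; the frequency and grid are illustrative:

```python
import numpy as np

omega = 0.05                              # hypothetical vertical frequency
phi = lambda z: 0.5 * omega ** 2 * z ** 2  # harmonic test potential

def action_by_quadrature(E, n=2048):
    """Vertical action J = (1/2 pi) * closed-loop integral of v dz; by
    symmetry, this is 4x the integral over one quadrant of the loop."""
    z_max = np.sqrt(2.0 * E) / omega      # turning point for the harmonic phi
    z = np.linspace(0.0, z_max, n)
    v = np.sqrt(np.maximum(2.0 * (E - phi(z)), 0.0))
    quadrant = 0.5 * np.sum((v[1:] + v[:-1]) * np.diff(z))  # trapezoid rule
    return (2.0 / np.pi) * quadrant

# build the look-up table once (one quadrature per grid point), then
# interpolate for each star instead of integrating per star
E_grid = np.linspace(0.5, 10.0, 64)
J_grid = np.array([action_by_quadrature(E) for E in E_grid])
J_of_E = lambda E: np.interp(E, E_grid, J_grid)

# harmonic-oscillator check: analytically J = E / omega
assert np.allclose(J_grid, E_grid / omega, rtol=1e-2)
```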

Of course part of me wonders why I am doing this at all since both galpy and gala probably do everything I need right out of the box. I just can't stop myself! (And I'm learning a lot.)


detecting sources in imaging

I spent a large part of the morning going through the figures, exposition, and connection to the literature in the draft by Dustin Lang (Toronto) and me about how to detect a star in a collection of probably multi-band images. There is interesting and interestingly different prior work, which wasn't prior when we started this project! Yes, it's an old paper. One of the points Lang makes is that what you detect—if, say, you use Bayes or posterior ratios—depends both on your priors and your utility. And when there are complex combinations of things creating detections in your images (including asteroids, cosmic rays, red and blue galaxies, quasars, stars, and high-redshift g dropouts, say), you want those priors to be reasonable if you want your catalogs to be reasonable.

I also spoke with Kate Storey-Fisher (NYU) about adversarial systematics in cosmology and other radical projects, and [unnamed collaborators] about our HST proposal, which is nearly ready (and anonymous, which is cool!).


a likelihood function for element abundances

On the flight back to NYC, I worked on building a likelihood function for the dependence of element abundances on dynamical actions. It is a hard problem; what freedom to give the model? I was doing crazy binning experiments, trying to make the model fully non-parametric, but it has annoying pathologies, like that the assignments of stars to bins are strongly dynamical-parameter-dependent. And thus you get a jagged likelihood function. At the end of a long set of numerical experiments on my laptop, I realized the following simple thing (duh):

All the abundance-ratio dependences on vertical action are extremely smooth! So using a parameterized rather than non-parametric form should be absolutely fine. By the time I had this realization, I had done all I could for the day, so: Tomorrow!


last day in HD

It was my last day in Heidelberg today! Very sad to go; I love it here. I spent my day making sure my projects are on target as I disappear to an undisclosed location for the next week and a half. Also Simon J Murphy (Sydney) showed up and we talked delta Scuti stars and also observation planning for time-domain discovery.


so many M dwarfs!

In a great victory, Jessica Birky (UCSD) has used The Cannon to put internally consistent labels on more than 10,000 M-type dwarf stars observed in (both) the SDSS-IV APOGEE and Gaia surveys. Her labels are based on a data-driven model of spectra, but they correlate beautifully with the Gaia photometry and astrometry, and they do a good job of predicting photometric measures of temperature. The spectral model is also remarkable: Using only two parameters (effective temperature and a mean composition) the model explains a typical 8000-pixel APOGEE spectrum to percent-level accuracy. So I am pretty happy. This has implications for TESS too. We spent time late in the day writing the paper.


making the angle distribution uniform

Years ago, Jo Bovy (now Toronto) and I wrote this crazy paper, in which we infer the force law in the Solar System from a snapshot of the 8 planets' positions and velocities. Because you can't infer dynamics from initial conditions in general, we had to make additional assumptions; we made the assumptions that the system is old, non-resonant, and being observed at no special time. That led to the conclusion that the distribution function should depend only on actions and not on conjugate angles.

But that's not enough: How to do inference? The frequentist solution is orbital roulette, in which you choose the force law(s) in which the conjugate angles look well mixed or uniformly distributed. That's clever, but what's the Bayesian generalization? (Or, really, specification?)

It turns out that there is no way to generate the data with a likelihood function and also insist that the angles be mixed. In Bayesian inference, all you can do is generate the data, and the data can be generated with functions that don't depend on angles. But beyond the generative model, you can't additionally insist that the angles look mixed. That isn't part of the generative model! So the solution (which was expensive) was to just model the kinematic snapshot with a very general form for the distribution function, which has a lot of flexibility but only depends on actions, generate the angles uniformly, and hope for the best. And it worked.
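In a 1-d harmonic toy problem the whole generative recipe (draw actions from an actions-only distribution function, draw angles uniformly, map to a phase-space snapshot) is a few lines; the DF and all numbers below are made up:

```python
import numpy as np

rng = np.random.default_rng(8)
omega = 0.07                  # hypothetical vertical frequency

# a distribution function that depends only on the action: an
# exponential (isothermal-like) profile, f(J) proportional to exp(-J / J0)
J0 = 4.0
J = rng.exponential(J0, size=5000)

# the generative model says nothing about angles beyond "no special
# time": draw them uniformly on [0, 2 pi)
theta = rng.uniform(0.0, 2.0 * np.pi, size=J.size)

# map (J, theta) -> (z, v): closed-form in a harmonic potential
z = np.sqrt(2.0 * J / omega) * np.sin(theta)
v = np.sqrt(2.0 * J * omega) * np.cos(theta)

# sanity check: the action recovered from the snapshot (J = E / omega)
# matches the input draws
J_back = (0.5 * v ** 2 + 0.5 * omega ** 2 * z ** 2) / omega
assert np.allclose(J_back, J)
```

The mixed-angle assumption lives entirely in the uniform draw of theta; there is nowhere else in the generative model to put it.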

Why am I saying all of this? Because exactly the same issue came up today (and in the last few weeks) between Rix (MPIA) and me: I have this project to find the potential in which the chemical abundances don't vary with angle. And I can make frequentist methods that are based on minimizing angle dependences. But the only Bayesian methods I can create don't really directly insist that the abundances don't depend on angle: They only insist that the abundance distribution is controlled by the actions alone. I spent the non-discussion part of the day coding up relevant stuff.


talking models; chemically identified tori

I gave my annual blackboard Königstuhl Colloquium today. This year I spoke about fitting models, which was a reprise of my 2010 talk that launched the infamous polemical tome. I spent some time on the point that you make your life hard (or your results wrong, or both) if you cut your data or select your sample on the quantities that your model generates. You should cut or trim or select on housekeeping data that aren't part of your probabilistic model! I also talked about outliers, model selection, and subjectivity.

In the morning, I spent time talking with Rix (MPIA) and (by email) Jo Bovy (Toronto) about my chemical-tangents method, or the idea that dynamical tori must be tangent to chemical-abundance level surfaces in 6-d phase space. Bovy agreed with my position that this idea is new; though I wrote to him about it because it is so closely connected to things he has done and is doing. And Rix agreed that the method doesn't depend (to first order) on survey selection functions. He also made me a toy model that showed feasibility. So this project is on.


The Cannon again, chemical tori

Within one frantic half-hour, Eilers (MPIA) and I completely implemented a new version of The Cannon and ran it on her sample of luminous red giants. We did this so that we can compare the internals of her linear model for parallax estimation to the internal derivatives or label dependencies for The Cannon. This will let us take a step towards interpreting the internals of the spectrophotometric-parallax model. We scanned the comparison but it doesn't look quite as easy to interpret as I had hoped.

As soon as this was done, I said some words in MPIA Milky Way group meeting about my ideas for Chemical Tangents: That is, the idea that orbits must lie in the level surfaces (hyper-surfaces in 6-d phase space) of the chemical abundance distribution. The method puts an enormous number of constraints on the orbit space, so it has the potential to be extremely constraining. Rix (MPIA) is suspicious that it all sounds too good to be true: The method requires no knowledge of the selection function (to zeroth order) and no second-order statistics. It is entirely first-order in the data. Damn I hope I'm not wrong here.

In the morning, Rene Andrae (MPIA) showed me his enormous cross-match of spectroscopic surveys that he is putting together in part to understand the stellar parameter pipelines of Gaia (to which he is a contributor). He has the input data for a combinatoric diversity of projects we could do with The Cannon or stellar-parameter self-calibration.


projects examined

Rix (MPIA) started the day concerned with substantial issues with the linear parallax model that Eilers (MPIA) and I have built; we spent much of the day following them up. Our precision gets worse with distance—an effect we have noticed all summer but haven't been able to explain—and now we have to explain it! We compared stars in clusters and looked at parallax offsets as a function of various things; we don't yet have an explanation. But we did do some straightforward error propagation and guess what: Our precision really can't be much better than the 9-ish percent that we are seeing. The whole exercise left me more confident in the quality of the model in the end: The model really seems to have learned how to cope with dust, age, and intrinsic luminosity effects, even though we didn't tell it how.

In a call with Bonaca (Harvard) we looked at oddities in her model of the morphology of the GD-1 stream gaps. We had some provable scalings that should be there but the code wasn't reproducing them. We worked out today that the stream perturbation isn't quite in the regime we thought it was. In more detail: An encounter of a massive perturber with a stream is impulsive if GM/(b v^2) is much less than 1, where G is Newton's constant, M is the perturber's mass, b is the impact parameter, and v is the relative velocity of the encounter (or maybe some component thereof). That is, you have to have this dimensionless number much less than unity if you want the impulse approximation to hold. Duh! But now we understand the simulations she is making.
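For concreteness, a back-of-the-envelope version of that criterion (the encounter numbers are hypothetical, chosen only to illustrate the units):

```python
# Newton's constant in kpc (km/s)^2 / Msun (approximately 4.301e-6)
G = 4.301e-6

def impulse_parameter(M_sun, b_kpc, v_km_s):
    """Dimensionless number G M / (b v^2); the impulse approximation
    requires this to be much less than 1."""
    return G * M_sun / (b_kpc * v_km_s ** 2)

# hypothetical encounter: a 1e7 Msun perturber at impact parameter
# 0.1 kpc and relative velocity 150 km/s
eps = impulse_parameter(1e7, 0.1, 150.0)
assert eps < 1.0   # comfortably impulsive for these illustrative numbers
```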

The day ended with Birky (UCSD) and me calling Andrew Mann (UNC) and Adam Burgasser (UCSD) to discuss Birky's results modeling M-type dwarf spectra in APOGEE. She has beautiful results, and can show both that her spectral models are accurate (in the space of the spectral data) and that her inferences about latents (temperature and metallicity) are reasonable when compared with proxies and tests of various kinds. So it is time to finish writing it up! We made plans for that. One amusing thing about her project is that it creates a beautiful translation between temperature, metallicity, and spectral type. And it isn't trivial!


star spots and exoplanets; mapping the disk

Today in MPIA/LSW Stars Meeting Néstor Espinoza (MPIA) gave a nice presentation about how star spots (cool spots) and faculae (hot spots) on stellar surfaces make it difficult to simply extract an exoplanet transit spectrum from differences between in-transit and out-of-transit spectra of the star. Some of the issues are extremely intractable: Even spectral monitoring of the star might not help in certain geometries. But we did agree that space-based spectral monitoring could do a lot towards understanding the issues. He showed that some of the transit-spectrum results in the literature are likely wrong, too. One conclusion: Gaia low-resolution spectrophotometry as a function of transit epoch at Gaia DR4 or thereabouts might have a lot to say here! And I also thought: SPHEREx!

After weeks of writing, today I finished the zeroth draft (yes, it isn't even close to being ready for anything) of the paper about our spectrophotometric parallax model for luminous red giant stars with Eilers (MPIA). I will get it into a state that I can share it with the APOGEE team this week.

And Eilers made maps of kinematic evidence of non-axi-symmetry in the Milky Way disk and radial abundance gradients, using our luminous red giants. We have lots of issues of interpretation, but there are a lot of things here. In my spare brain cycles I figured out a way that we could use Eilers's results to calibrate the variations of the inferred stellar abundances as a function of effective temperature and surface gravity: We can see that the data have issues.


RR Lyrae like red giants

At the suggestion of Rix (MPIA), Eilers (MPIA) and I applied our linear model for parallax prediction to the RR Lyrae sample from PanSTARRS and Gaia DR2 today. It worked beautifully, delivering an error-convolved scatter of less than 7 percent, and an error-deconvolved intrinsic scatter of something more like 5 percent in distance. That's exciting! Our features are magnitudes, period, and light-curve shape parameters. Eilers was able to do this all in under an hour, because it was a plug-in replacement for the model we built for upper-red-giant-branch stars. This is another confirmation that on sufficiently small parts of the color–magnitude diagram, linear models can do a great job of predicting stellar properties, especially absolute magnitude or distance. Deep learning be damned!
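The model class really is nothing fancy: ordinary least squares on magnitudes, period, and light-curve shape features. A self-contained sketch on fake data (not the PanSTARRS sample; all coefficients and feature distributions are invented):

```python
import numpy as np

rng = np.random.default_rng(3)

# fake features: an apparent magnitude, a log period, a light-curve
# shape parameter, and an intercept column (all hypothetical)
n = 500
X = np.column_stack([
    rng.normal(15.0, 1.0, n),   # magnitude
    rng.normal(0.0, 0.3, n),    # log10 period
    rng.normal(0.5, 0.1, n),    # light-curve shape
    np.ones(n),                 # intercept
])
true_theta = np.array([0.4, -1.2, 0.3, -4.0])
# the linear model predicts a log-parallax-like quantity, plus noise
y = X @ true_theta + rng.normal(0.0, 0.02, n)

# ordinary least squares: on a small patch of the color-magnitude
# diagram, a linear model is often all you need
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(theta, true_theta, atol=0.1)
```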

Aside from this, most of my research time today (and this weekend) was spent writing. Trying to submit the red-giant paper before I depart Germany.


writing and integrating

I spent the day hiding from all responsibilities in order to write. I wrote in my spectrophotometric-distances paper, and I wrote in my new chemical-tangents paper. I am trying to get the first of these done and submitted before I leave Heidelberg this month.

I also did a little bit of coding in the chemical-tangents project. I wrote an integrator that takes a general vertical density profile in the Milky Way and integrates one-dimensional orbits. It produces position, velocity, and phase for any orbit in the general one-d gravitational problem. Next up: Using this to characterize the GALAH data.


optimization is the worst

After the incredibly valuable Milky Way Group Meeting discussion of the spectrophotometric parallaxes, Eilers (MPIA) and I simplified our model, re-factored the code, and re-ran. And, despite the fact that the new model is provably better than the old model, everything failed. The reason is: Our objective isn't convex. Not only that, but there is an enormously high-dimensional degenerate bad optimum that is hard to avoid. That sent us back to the books: Optimization is hard!

The trick we settled on (and you are allowed to do many, many tricks here) is to take the very highest signal-to-noise stars (in terms of Gaia parallax) to optimize an initialization and then do our final optimization with all stars, but starting off from that initialization. That is, we burn in to the optimum using the best stars first. It's a hack but it worked, and now the better model is performing the way it should be. That's good! Because it is discouraging when you refactor your code and everything gets worse.
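The same burn-in-with-the-best-data trick works on any nonconvex fit. Here is a toy version (exponential model, fake data; everything is invented for illustration): initialize on the high signal-to-noise subset, then refine on everything from that starting point.

```python
import numpy as np

rng = np.random.default_rng(5)

# fake data: y = a * exp(b * x) with heteroskedastic noise; low-x
# points play the role of the "high signal-to-noise stars"
a_true, b_true = 2.0, -0.5
x = rng.uniform(0.0, 4.0, 200)
snr = np.where(x < 1.0, 100.0, 5.0)
y = a_true * np.exp(b_true * x) * (1.0 + rng.normal(0.0, 1.0, x.size) / snr)

# stage 1: initialize on the high-S/N subset only, where a log-linear
# fit is safe (y stays positive there)
hi = snr > 50.0
b0, loga0 = np.polyfit(x[hi], np.log(y[hi]), 1)
theta = np.array([np.exp(loga0), b0])

# stage 2: refine on ALL the data with Gauss-Newton steps, starting
# from the stage-1 initialization so we never visit the bad basin
for _ in range(20):
    a, b = theta
    resid = y - a * np.exp(b * x)
    Jac = np.column_stack([np.exp(b * x), a * x * np.exp(b * x)])
    theta = theta + np.linalg.lstsq(Jac, resid, rcond=None)[0]

assert np.allclose(theta, [a_true, b_true], atol=0.05)
```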

At MPIA Galaxy Coffee, Wolfgang Brandner (MPIA) described the new GRAVITY results on the perihelion passage of S2 at the Galactic Center. The perihelion passage shows gravitational and transverse-Doppler redshifts and puts an amazingly strong constraint on the geometry and kinematics of the Galactic Center.


spectrophotometric parallax; optimization fail

Today was spectrophotometric-parallax day. I did writing in the paper, I presented the method at MPIA Milky Way Group Meeting, and Eilers (MPIA) and I slightly refactored the model. In the presentation I gave, we got lots of feedback about how to present the method, which I tried to record carefully in the to-do list at the top of our LaTeX document. We also realized that without much change, we could move the model from a model for magnitude to a direct model for the parallax, bypassing any physical idea of how the star indicates its parallax (which is through its brightness and its log-g, to leading order). So our model is now truly data-driven. We also realized that we could make changes to how we represent the spectral pixels that might make the parameters more well-behaved.

All these things are great things! But when we made the relevant code changes, everything borked. The reason appears simple: It is because the model has a bad pathology: While it has a very good, sensible, non-trivial optimum, it has an enormous family of degenerate trivial optima in which the exponential underflows, the predicted parallaxes are all zero, and the derivatives all vanish. And at 7400 free parameters, this degenerate set of optima has a huge volume (huge entropy) in which to find and eat our optimizer. So by the end of the day, Eilers and I realized we have to get much more clever about initializing the optimizer.
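The pathology is easy to reproduce: once the exponential underflows, the prediction and its gradient are both exactly zero, so the optimizer has nowhere to go. A minimal illustration (the numbers are arbitrary):

```python
import numpy as np

# the model predicts parallax through an exponential of a linear
# predictor; if that predictor is driven very negative, both the
# prediction AND its gradient underflow to exactly zero
theta_bad = -1000.0
x = 1.0
pred = np.exp(theta_bad * x)       # underflows to 0.0
grad = x * np.exp(theta_bad * x)   # also 0.0: zero gradient, optimizer stuck

assert pred == 0.0 and grad == 0.0
```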

Question of the day: Does the method need a name, like The Cygnet? Or is it okay to just call it “linear spectrophotometric parallax”?