detecting sources in imaging

I spent a large part of the morning going through the figures, exposition, and connection to the literature in the draft by Dustin Lang (Toronto) and me about how to detect a star in a collection of probably multi-band images. There is interesting and interestingly different prior work, which wasn't prior when we started this project! Yes, it's an old paper. One of the points Lang makes is that what you detect—if, say, you use Bayes or posterior ratios—depends both on your priors and your utility. And when there are complex combinations of things creating detections in your images (including asteroids, cosmic rays, red and blue galaxies, quasars, stars, and high-redshift g dropouts, say), you want those priors to be reasonable if you want your catalogs to be reasonable.

I also spoke with Kate Storey-Fisher (NYU) about adversarial systematics in cosmology and other radical projects, and [unnamed collaborators] about our HST proposal, which is nearly ready (and anonymous, which is cool!).


a likelihood function for element abundances

On the flight back to NYC, I worked on building a likelihood function for the dependence of element abundances on dynamical actions. It is a hard problem: What freedom to give the model? I was doing crazy binning experiments, trying to make the model fully non-parametric, but it has annoying pathologies, like that the assignments of stars to bins are strongly dynamical-parameter-dependent. And thus you get a jagged likelihood function. At the end of a long set of numerical experiments on my laptop, I realized the following simple thing (duh):

All the abundance-ratio dependences on vertical action are extremely smooth! So using a parameterized rather than non-parametric form should be absolutely fine. By the time I had this realization, I had done all I could do for the day, so: Tomorrow!


last day in HD

It was my last day in Heidelberg today! Very sad to go; I love it here. I spent my day making sure my projects are on target as I disappear to an undisclosed location for the next week and a half. Also Simon J Murphy (Sydney) showed up and we talked delta Scuti stars and also observation planning for time-domain discovery.


so many M dwarfs!

In a great victory, Jessica Birky (UCSD) has used The Cannon to put internally consistent labels on more than 10,000 M-type dwarf stars observed in (both) the SDSS-IV APOGEE and Gaia surveys. Her labels are based on a data-driven model of spectra, but they correlate beautifully with the Gaia photometry and astrometry, and they do a good job of predicting photometric measures of temperature. The spectral model is also remarkable: Using only two parameters (effective temperature and a mean composition) the model explains a typical 8000-pixel APOGEE spectrum to percent-level accuracy. So I am pretty happy. This has implications for TESS too. We spent time late in the day writing the paper.


making the angle distribution uniform

Years ago, Jo Bovy (now Toronto) and I wrote this crazy paper, in which we infer the force law in the Solar System from a snapshot of the 8 planets' positions and velocities. Because you can't infer dynamics from initial conditions in general, we had to make additional assumptions; we made the assumptions that the system is old, non-resonant, and being observed at no special time. That led to the conclusion that the distribution function should depend only on actions and not on conjugate angles.

But that's not enough: How to do inference? The frequentist solution is orbital roulette, in which you choose the force law(s) in which the conjugate angles look well mixed or uniformly distributed. That's clever, but what's the Bayesian generalization? (Or, really, specification?)

It turns out that there is no way to generate the data with a likelihood function and also insist that the angles be mixed. In Bayesian inference, all you can do is generate the data, and the data can be generated with functions that don't depend on angles. But beyond the generative model, you can't additionally insist that the angles look mixed. That isn't part of the generative model! So the solution (which was expensive) was to just model the kinematic snapshot with a very general form for the distribution function, which has a lot of flexibility but only depends on actions, generate the angles uniformly, and hope for the best. And it worked.

Why am I saying all of this? Because exactly the same issue came up today (and in the last few weeks) between Rix (MPIA) and me: I have this project to find the potential in which the chemical abundances don't vary with angle. And I can make frequentist methods that are based on minimizing angle dependences. But the only Bayesian methods I can create don't really directly insist that the abundances don't depend on angle: They only insist that the abundance distribution is controlled by the actions alone. I spent the non-discussion part of the day coding up relevant stuff.


talking models; chemically identified tori

I gave my annual blackboard Königstuhl Colloquium today. This year I spoke about fitting models, which was a reprise of my 2010 talk that launched the infamous polemical tome. I spent some time on the point that you make your life hard (or your results wrong, or both) if you cut your data or select your sample on the quantities that your model generates. You should cut or trim or select on housekeeping data that aren't part of your probabilistic model! I also talked about outliers, model selection, and subjectiveness.

In the morning, I spent time talking with Rix (MPIA) and (by email) Jo Bovy (Toronto) about my chemical-tangents method, or the idea that dynamical tori must be tangent to chemical-abundance level surfaces in 6-d phase space. Bovy agreed with my position that this idea is new; though I wrote to him about it because it is so closely connected to things he has done and is doing. And Rix agreed that the method doesn't depend (to first order) on survey selection functions. He also made me a toy model that showed feasibility. So this project is on.


The Cannon again, chemical tori

Within one frantic half-hour, Eilers (MPIA) and I completely implemented a new version of The Cannon and ran it on her sample of luminous red giants. We did this so that we can compare the internals of her linear model for parallax estimation to the internal derivatives or label dependencies for The Cannon. This will let us take a step towards interpreting the internals of the spectrophotometric-parallax model. We scanned the comparison but it doesn't look quite as easy to interpret as I had hoped.

As soon as this was done, I said some words in MPIA Milky Way group meeting about my ideas for Chemical Tangents: That is, the idea that orbits must lie in the level surfaces (hyper-surfaces in 6-d phase space) of the chemical abundance distribution. The method puts an enormous number of constraints on the orbit space, so it has the potential to be extremely constraining. Rix (MPIA) is suspicious that it all sounds too good to be true: The method requires no knowledge of the selection function (to zeroth order) and no second-order statistics. It is entirely first-order in the data. Damn I hope I'm not wrong here.

In the morning, Rene Andrae (MPIA) showed me his enormous cross-match of spectroscopic surveys that he is putting together in part to understand the stellar parameter pipelines of Gaia (to which he is a contributor). He has the input data for a combinatoric diversity of projects we could do with The Cannon or stellar-parameter self-calibration.


projects examined

Rix (MPIA) started the day concerned with substantial issues with the linear parallax model that Eilers (MPIA) and I have built; we spent much of the day following them up. Our precision gets worse with distance—an effect we have noticed all summer but haven't been able to explain—and now we have to explain it! We compared stars in clusters and looked at parallax offsets as a function of various things; we don't yet have an explanation. But we did do some straightforward error propagation and guess what: Our precision really can't be much better than the 9-ish percent that we are seeing. The whole exercise left me more confident in the quality of the model in the end: The model really seems to have learned how to cope with dust, age, and intrinsic luminosity effects, even though we didn't tell it how.

In a call with Bonaca (Harvard) we looked at oddities in her model of the morphology of the GD-1 stream gaps. We had some provable scalings that should be there but the code wasn't reproducing them. We worked out today that the stream perturbation isn't quite in the regime we thought it was. In more detail: An encounter of a massive perturber with a stream is impulsive if GM/(b v^2) is much less than 1, where G is Newton's constant, M is the perturber's mass, b is the impact parameter, and v is the relative velocity of the encounter (or maybe some component thereof). That is, you have to have this dimensionless number much less than unity if you want the impulse approximation to hold. Duh! But now we understand the simulations she is making.
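As a rough numerical check of the criterion above, here is a minimal Python sketch. The function name, the unit choices, and the example encounter are illustrative assumptions; none of the numbers come from Bonaca's simulations.

```python
# Newton's constant in kpc (km/s)^2 / Msun
G_KPC_KMS2_PER_MSUN = 4.30091e-6

def impulse_parameter(mass_msun, b_kpc, v_kms):
    """Dimensionless number GM / (b v^2); the impulse approximation wants this << 1."""
    return G_KPC_KMS2_PER_MSUN * mass_msun / (b_kpc * v_kms ** 2)

# Hypothetical encounter: 10^7 Msun perturber, 15 pc impact parameter, 200 km/s
eps = impulse_parameter(1e7, 0.015, 200.0)
print(f"GM/(b v^2) = {eps:.3f}")
```

For this made-up encounter the number is well below unity. Dropping the relative velocity to 50 km/s raises it by a factor of 16, to above unity, which is the kind of regime change that could break an impulsive scaling.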

The day ended with Birky (UCSD) and me calling Andrew Mann (UNC) and Adam Burgasser (UCSD) to discuss Birky's results modeling M-type dwarf spectra in APOGEE. She has beautiful results, and can show both that her spectral models are accurate (in the space of the spectral data) and that her inferences about latents (temperature and metallicity) are reasonable when compared with proxies and tests of various kinds. So it is time to finish writing it up! We made plans for that. One amusing thing about her project is that it creates a beautiful translation between temperature, metallicity, and spectral type. And it isn't trivial!


star spots and exoplanets; mapping the disk

Today in MPIA/LSW Stars Meeting Néstor Espinoza (MPIA) gave a nice presentation about how star spots (cool spots) and faculae (hot spots) on stellar surfaces make it difficult to simply extract an exoplanet transit spectrum from differences between in-transit and out-of-transit spectra of the star. Some of the issues are extremely intractable: Even spectral monitoring of the star might not help in certain geometries. But we did agree that space-based spectral monitoring could do a lot towards understanding the issues. He showed that some of the transit-spectrum results in the literature are likely wrong, too. One conclusion: Gaia low-resolution spectrophotometry as a function of transit epoch at Gaia DR4 or thereabouts might have a lot to say here! And I also thought: SPHEREx!

After weeks of writing, today I finished the zeroth draft (yes, it isn't even close to being ready for anything) of the paper about our spectrophotometric parallax model for luminous red giant stars with Eilers (MPIA). I will get it into a state in which I can share it with the APOGEE team this week.

And Eilers made maps of kinematic evidence of non-axi-symmetry in the Milky Way disk and radial abundance gradients, using our luminous red giants. We have lots of issues of interpretation, but there are a lot of things here. In my spare brain cycles I figured out a way that we could use Eilers's results to calibrate the variations of the inferred stellar abundances as a function of effective temperature and surface gravity: We can see that the data have issues.


RR Lyrae like red giants

At the suggestion of Rix (MPIA), Eilers (MPIA), Rix, and I applied Eilers's and my linear model for parallax prediction to the RR Lyrae sample from PanSTARRS and Gaia DR2 today. It worked beautifully, delivering an error-convolved scatter of less than 7 percent, and an error-deconvolved intrinsic scatter of something more like 5 percent in distance. That's exciting! Our features are magnitudes, period, and light-curve shape parameters. Eilers was able to do this all in under an hour, because it was a plug-in replacement for the model we built for upper-red-giant-branch stars. This is another confirmation that on sufficiently small parts of the color–magnitude diagram, linear models can do a great job of predicting stellar properties, especially absolute magnitude or distance. Deep learning be damned!

Aside from this, most of my research time today (and this weekend) was spent writing. Trying to submit the red-giant paper before I depart Germany.


writing and integrating

I spent the day hiding from all responsibilities in order to write. I wrote in my spectrophotometric-distances paper, and I wrote in my new chemical-tangents paper. I am trying to get the first of these done and submitted before I leave Heidelberg this month.

I also did a little bit of coding in the chemical-tangents project. I wrote up a general integrator that can take a general vertical density profile in the Milky Way and integrate one-dimensional orbits. It produces position, velocity, and phase for general orbits in the general one-d gravitational problem. Next up: Using this to characterize the GALAH data.
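For concreteness, here is a minimal sketch of what such an integrator could look like. It is not the code described above: the function names, the leapfrog scheme, the units (G = 1), and the convention that phase is zero at an upward midplane crossing are all assumptions made for illustration.

```python
import numpy as np

def make_acceleration(density, zmax, G=1.0, n_grid=4096):
    """Vertical acceleration a(z) for a slab that is symmetric about z = 0.

    Uses the 1-d Poisson equation, d^2 Phi / dz^2 = 4 pi G rho(z).
    """
    zgrid = np.linspace(0.0, zmax, n_grid)
    rho = density(zgrid)
    # column density between the midplane and height z (trapezoid rule)
    column = np.concatenate(
        ([0.0], np.cumsum(0.5 * (rho[1:] + rho[:-1]) * np.diff(zgrid)))
    )

    def acceleration(z):
        return -4.0 * np.pi * G * np.sign(z) * np.interp(np.abs(z), zgrid, column)

    return acceleration

def integrate_orbit(acceleration, vz0, dt, n_steps):
    """Leapfrog (kick-drift-kick) integration starting at z = 0 moving upward."""
    z = np.zeros(n_steps + 1)
    vz = np.zeros(n_steps + 1)
    vz[0] = vz0
    for i in range(n_steps):
        v_half = vz[i] + 0.5 * dt * acceleration(z[i])
        z[i + 1] = z[i] + dt * v_half
        vz[i + 1] = v_half + 0.5 * dt * acceleration(z[i + 1])
    t = dt * np.arange(n_steps + 1)
    return t, z, vz

def period_and_phase(t, z):
    """Period from the first return to an upward midplane crossing; phase in [0, 2 pi)."""
    up = np.where((z[:-1] < 0.0) & (z[1:] >= 0.0))[0]
    if len(up) == 0:
        raise ValueError("integrate for longer: no complete period found")
    i = up[0]
    # linear interpolation for the crossing time
    period = t[i] - z[i] * (t[i + 1] - t[i]) / (z[i + 1] - z[i])
    phase = 2.0 * np.pi * np.mod(t, period) / period
    return period, phase

# Toy check: uniform density gives a harmonic oscillator with frequency sqrt(4 pi G rho)
uniform = lambda z: np.full_like(z, 1.0 / (4.0 * np.pi))
acc = make_acceleration(uniform, zmax=5.0)
t, z, vz = integrate_orbit(acc, vz0=1.0, dt=0.01, n_steps=1000)
period, phase = period_and_phase(t, z)
```

Any other symmetric profile (a sech-squared disk, say) can be passed as `density` in place of the uniform slab, provided `zmax` exceeds the maximum height of the orbit.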


optimization is the worst

After the incredibly valuable Milky Way Group Meeting discussion of the spectrophotometric parallaxes, Eilers (MPIA) and I simplified our model, re-factored the code, and re-ran. And, despite the fact that the new model is provably better than the old model, everything failed. The reason is: Our objective isn't convex. Not only that, but there is an enormously high-dimensional degenerate bad optimum that is hard to avoid. That sent us back to the books: Optimization is hard!

The trick we settled on (and you are allowed to do many, many tricks here) is to take the very highest signal-to-noise stars (in terms of Gaia parallax) to optimize an initialization and then do our final optimization with all stars, but starting off from that initialization. That is, we burn in to the optimum using the best stars first. It's a hack but it worked, and now the better model is performing the way it should be. That's good! Because it is discouraging when you refactor your code and everything gets worse.
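The burn-in trick can be sketched on fake data. Everything here is an assumption for illustration: the exponential-of-linear model form, the chi-squared objective, the top-10-percent cut, the hand-rolled Gauss-Newton optimizer, and all the numbers. The real model and optimizer are not shown in this post.

```python
import numpy as np

def chi2(theta, X, plx, sigma):
    return np.sum(((plx - np.exp(X @ theta)) / sigma) ** 2)

def fit(theta, X, plx, sigma, n_iter=50):
    """Gauss-Newton with step halving; a stand-in for whatever local optimizer you use."""
    theta = np.array(theta, dtype=float)
    for _ in range(n_iter):
        model = np.exp(X @ theta)
        resid = (plx - model) / sigma
        jac = (model / sigma)[:, None] * X  # derivative of model / sigma w.r.t. theta
        step = np.linalg.lstsq(jac, resid, rcond=None)[0]
        f0, scale = chi2(theta, X, plx, sigma), 1.0
        while chi2(theta + scale * step, X, plx, sigma) > f0 and scale > 1e-8:
            scale *= 0.5
        theta = theta + scale * step
    return theta

# Fake data (all numbers here are made up)
rng = np.random.default_rng(17)
n_stars, theta_true = 500, np.array([0.5, 0.3, -0.2])
X = np.column_stack([np.ones(n_stars), 0.3 * rng.normal(size=(n_stars, 2))])
sigma = rng.uniform(0.05, 1.0, size=n_stars)
plx = np.exp(X @ theta_true) + sigma * rng.normal(size=n_stars)

# Stage 1: initialize using only the top 10 percent of stars in parallax signal-to-noise
snr = plx / sigma
best = snr >= np.quantile(snr, 0.9)
theta_init = fit(np.zeros(3), X[best], plx[best], sigma[best])

# Stage 2: final optimization with all stars, started from the stage-1 result
theta_fit = fit(theta_init, X, plx, sigma)
```

The signal-to-noise cut is used only to choose a starting point; the final answer comes from the fit to all stars. This toy problem is benign enough that it would converge from zeros anyway, so it shows the mechanics of the two stages rather than a rescue from a bad optimum.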

At MPIA Galaxy Coffee, Wolfgang Brandner (MPIA) described the new GRAVITY results on the pericenter passage of S2 at the Galactic Center. The pericenter passage shows gravitational and transverse-Doppler redshifts and puts an amazingly strong constraint on the geometry and kinematics of the Galactic Center.


spectrophotometric parallax; optimization fail

Today was spectrophotometric-parallax day. I did writing in the paper, I presented the method at MPIA Milky Way Group Meeting, and Eilers (MPIA) and I slightly refactored the model. In the presentation I gave, we got lots of feedback about how to present the method, which I tried to record carefully in the to-do list at the top of our LaTeX document. We also realized that without much change, we could move the model from a model for magnitude to a direct model for the parallax, bypassing any physical idea of how the star indicates its parallax (which is through its brightness and its log-g, to leading order). So our model is now truly data-driven. We also realized that we could make changes to how we represent the spectral pixels that might make the parameters more well-behaved.

All these things are great things! But when we made the relevant code changes, everything borked. The reason appears simple: The model has a bad pathology. While it has a very good, sensible, non-trivial optimum, it has an enormous family of degenerate trivial optima in which the exponential underflows, the predicted parallaxes are all zero, and the derivatives all vanish. And at 7400 free parameters, this degenerate set of minima has a huge space (huge entropy) to find and eat our optimizer. So by the end of the day, Eilers and I realized we have to get much more clever about initializing the optimizer.
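The pathology is easy to reproduce in a toy. The sketch below assumes a model of the form exp(x · theta) with a chi-squared objective; that form, the five-parameter size, and every number are assumptions for illustration, not the real 7400-parameter model.

```python
import numpy as np

def chi2_and_grad(theta, X, plx, sigma):
    model = np.exp(X @ theta)
    resid = (plx - model) / sigma
    grad = -2.0 * X.T @ (resid * model / sigma)
    return np.sum(resid ** 2), grad

rng = np.random.default_rng(42)
n_stars, n_par = 200, 5
X = np.column_stack([np.ones(n_stars), rng.normal(size=(n_stars, n_par - 1))])
sigma = np.full(n_stars, 0.1)
theta_true = np.array([0.5, 0.1, -0.1, 0.05, 0.0])
plx = np.exp(X @ theta_true) + sigma * rng.normal(size=n_stars)

# A point deep in the trivial regime: every exponent is hugely negative,
# so every predicted parallax underflows to exactly zero
theta_bad = np.zeros(n_par)
theta_bad[0] = -1000.0
f_bad, g_bad = chi2_and_grad(theta_bad, X, plx, sigma)

# Moving around within that regime changes nothing: same objective, zero gradient
theta_bad2 = theta_bad + 0.1 * rng.normal(size=n_par)
f_bad2, g_bad2 = chi2_and_grad(theta_bad2, X, plx, sigma)

# For comparison, the objective near the sensible optimum
f_good, g_good = chi2_and_grad(theta_true, X, plx, sigma)
```

Both bad points have an identically zero gradient and the same objective value, so a gradient-based optimizer that wanders in has no signal telling it how to leave. The number of flat directions grows with the number of parameters.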

Question of the day: Does the method need a name, like The Cygnet? Or is it okay to just call it “linear spectrophotometric parallax”?



I spent the day writing in the spectrophotometric-parallax project. I wrote six or seven paragraphs, and that's about it! (Actually, that's a great day: My goal is two paragraphs per day.)

But in addition to the writing, I did have an interesting conversation with Tom Herbst (MPIA), Thomas Bertram (MPIA), and Kalyan Radhakrishnan (MPIA) about adaptive optics. The idea is to think about using the science data (the imaging you care about) to update the adaptive mirrors. What new things might be unlocked by that, especially if used in concert with the wavefront sensors? This reminds me of old conversations I have had with Matthew Kenworthy (Leiden). I also asked what kinds of science you might do with the wavefront sensors. Just as the imaging detector gives wavefront information, the wavefront sensors give imaging information!

I also was present for presentations by Eilers (MPIA) and Birky (UCSD) on their stellar projects in the MPIA Stars Group Meeting.


spiral arms? and model-grid troubleshooting

The excitement of the day is that we looked at velocity-tensor maps (maps of the mean velocity-velocity products) across the disk with Eilers (MPIA): We see lots of structure, including possible evidence of spiral arms or bar resonances on the off-diagonal tensor components. Reminder: If the Galaxy is axisymmetric, there will only be diagonal tensor components in the R, phi, z coordinate system. If we find off-diagonal components: Non-axisymmetry. Could be interesting. Rix (MPIA) encouraged us to stay on target for a Jeans model and leave these hints of complex disk morphology for later investigations.
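To make "velocity tensor" concrete, here is a minimal sketch of the computation. The function names and the choice to bin in Galactocentric radius only (rather than in a two-dimensional map across the disk) are simplifying assumptions.

```python
import numpy as np

def velocity_tensor(v):
    """Mean of the outer products v_i v_j for an (N, 3) array of (v_R, v_phi, v_z)."""
    v = np.asarray(v, dtype=float)
    return v.T @ v / len(v)

def tensor_map(R, v, edges):
    """One 3x3 tensor per radial bin; bins with no stars are filled with NaN."""
    R = np.asarray(R, dtype=float)
    v = np.asarray(v, dtype=float)
    index = np.digitize(R, edges) - 1
    out = np.full((len(edges) - 1, 3, 3), np.nan)
    for k in range(len(edges) - 1):
        members = index == k
        if members.any():
            out[k] = velocity_tensor(v[members])
    return out

# Tiny made-up example: two stars in the first bin, one in the second
R = [8.1, 8.2, 9.5]
v = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
tensors = tensor_map(R, v, edges=[8.0, 9.0, 10.0])
```

In this toy, the first bin has only a diagonal R-R component, while the second bin has a nonzero R-phi component, which is the kind of off-diagonal signal discussed above.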

In addition to this, I had a great chat with Maria Bergemann (MPIA) and Mikhail Kovalev (MPIA) about fitting spectra with spectral models, given that the models are amazingly expensive to compute. They do a (random) grid and then interpolate using The Payne. They are getting some results they aren't happy with, so I walked through basic tests that can be done in these situations.

Basic sanity checks—when you are fitting data using an interpolation of a grid or random assemblage of model predictions—are the following: Find the closest model point in the grid, and then the K next closest, where K is larger than the dimensionality of the model parameter space. Is the best-fit model in the convex hull of the K? Are the K in one group or multiple groups? Do the K look like they hit the edge of the grid? And what are the chi-squared values? And is the interpolated best point also in the convex hull? All these pieces of information go into an analysis of whether you have enough model evaluations and how to interpolate them.
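Several of these checks can be sketched in a few lines. The sketch assumes that "closest" means lowest chi-squared against the data and that hull membership is tested in parameter space; the function names, the argument layout, and the use of a linear program for the hull test are choices made here, not anything from The Payne. The one-group-or-many question is left out because it needs a clustering choice.

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(point, vertices):
    """True if point is a convex combination of the rows of vertices (a feasibility LP)."""
    n = len(vertices)
    A_eq = np.vstack([np.asarray(vertices, dtype=float).T, np.ones(n)])
    b_eq = np.append(np.asarray(point, dtype=float), 1.0)
    result = linprog(np.zeros(n), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return bool(result.success)

def grid_checks(params, fluxes, data, ivar, K, lower, upper, interpolated=None):
    """params: (M, D) grid parameters; fluxes: (M, P) model spectra; data, ivar: (P,)."""
    chi2 = np.sum(ivar * (fluxes - data) ** 2, axis=1)
    order = np.argsort(chi2)[: K + 1]
    best, neighbors = params[order[0]], params[order[1:]]
    near = params[order]
    report = {
        "chi2": chi2[order],
        "best_in_hull": in_convex_hull(best, neighbors),
        "on_edge": bool(np.any(np.isclose(near, lower) | np.isclose(near, upper))),
    }
    if interpolated is not None:
        report["interpolated_in_hull"] = in_convex_hull(interpolated, neighbors)
    return report

# Tiny made-up grid: the center of the unit square plus its four corners
params = np.array([[0.5, 0.5], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
basis = np.array([[1.0, 2.0, 3.0], [-1.0, 0.5, 2.0]])
fluxes = params @ basis
report = grid_checks(params, fluxes, data=fluxes[0], ivar=np.ones(3), K=4,
                     lower=np.zeros(2), upper=np.ones(2), interpolated=[0.4, 0.6])
```

Here the best grid point sits inside the hull of its neighbors, but the neighbors touch the grid edge, which in a real fit would be a warning that the grid may not extend far enough.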