dynamics and inference

Eilers (MPIA), Rix (MPIA), and I have now spent two weeks discussing how to model the kinematics of the Milky Way disk with a forward model, instead of just measuring velocity moments (Jeans style). We have the additional constraint that we don't know the selection function of the APOGEE-Gaia-WISE cross-match that we are using, so we need to build a conditional likelihood: velocity conditioned on position (yes, this is permitted; indeed all likelihoods are conditioned on a lot of different things, usually implicitly!).

At Eilers's insistence, we down-selected to one choice of approach today. Then we converted the (zeroth-order, symmetric) equations in this paper on the disk into a conditional probability for velocity given position. When we use the epicyclic approximations (in that paper) the resulting model is Gaussian in velocity space. That's nice; we completed a square, Eilers coded it up, and it just worked. We have inferences about the dynamics of the (azimuthally averaged) disk, in the space of one work day!
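
To make the idea of a Gaussian conditional likelihood concrete, here is a minimal sketch in Python. It is not the model from the paper or from Eilers's code. It assumes a flat rotation curve with circular speed `v_circ`, no asymmetric drift, and the epicyclic ratio sigma_phi^2 / sigma_R^2 = 1/2 that holds for a flat curve. All function and parameter names are hypothetical.

```python
import math
import random

def ln_gaussian(x, mean, var):
    """Log of a 1D Gaussian density with the given mean and variance."""
    return -0.5 * ((x - mean) ** 2 / var + math.log(2.0 * math.pi * var))

def ln_conditional_likelihood(stars, v_circ, sigma_R):
    """Sum of ln p(v_R, v_phi | R) over stars given as (R, v_R, v_phi) tuples.

    Toy assumptions (not from the post): flat rotation curve, no asymmetric
    drift, and the epicyclic ratio sigma_phi^2 / sigma_R^2 = 1/2.
    """
    var_R = sigma_R ** 2
    var_phi = 0.5 * var_R
    total = 0.0
    for _R, v_R, v_phi in stars:
        # Position enters only through the conditioning; in this flat-curve
        # toy the Gaussian mean and variance happen not to depend on R.
        total += ln_gaussian(v_R, 0.0, var_R)
        total += ln_gaussian(v_phi, v_circ, var_phi)
    return total

def make_fake_stars(n, v_circ=220.0, sigma_R=30.0, seed=42):
    """Draw fake stars; the radii mimic an arbitrary, unknown selection."""
    rng = random.Random(seed)
    stars = []
    for _ in range(n):
        R = rng.uniform(5.0, 12.0)
        v_R = rng.gauss(0.0, sigma_R)
        v_phi = rng.gauss(v_circ, sigma_R / math.sqrt(2.0))
        stars.append((R, v_R, v_phi))
    return stars

def best_v_circ(stars, sigma_R, grid):
    """Grid point that maximizes the conditional likelihood."""
    return max(grid, key=lambda vc: ln_conditional_likelihood(stars, vc, sigma_R))
```

Calling `best_v_circ(make_fake_stars(500), 30.0, grid)` on a grid of trial circular speeds should recover a value near the input 220 km/s, whatever distribution the radii were drawn from. That is the point of conditioning on position.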


adversarial attacks on machine-learning methods

Today, in a surprise visit, Bernhard Schölkopf (MPI-IS) appeared in Heidelberg. We discussed many things, including his beautiful pictures of the total eclipse in Chile last week. But one thing that has been a theme of conversation with Schölkopf since we first met is this: Should we build models that go from latent variables or labels to the data space, or should we build models that go from the data to the label space? I am a big believer—on intuitive grounds, really—in the former: In physics contexts, we think of the data as being generated from the labels. Schölkopf had a great idea for bolstering my intuition today:

A lot has been learned about machine learning by attacking classifiers with adversarial attacks. (And indeed, on a separate thread, Kate Storey-Fisher (NYU) and I are attacking cosmological analyses with adversarial attacks.) These adversarial attacks take advantage of the respects in which deep-learning methods are over-fitting to produce absurdly mis-classified data. Such attacks work when a machine-learning method is used to provide a function that goes from data (which is huge-dimensional) to labels (which are very low-dimensional). When the model goes from labels to data (it is generative) or from latents to data (same), these adversarial attacks cannot be constructed.
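
A toy version of why high-dimensional data-to-label functions are attackable can be sketched as follows. This is not a deep network: it assumes a plain linear classifier with random weights, and it uses a fast-gradient-sign-style perturbation (every feature nudged by the same small amount in the direction that hurts most). All names are hypothetical.

```python
import math
import random

def score(weights, x):
    """Linear classifier score; the predicted label is the sign of the score."""
    return sum(w * xi for w, xi in zip(weights, x))

def sign(value):
    return 1.0 if value >= 0.0 else -1.0

def minimal_sign_attack(weights, x, margin=1.1):
    """Return (x_adv, eps): a uniform per-feature step just big enough to flip the label."""
    s = score(weights, x)
    l1 = sum(abs(w) for w in weights)
    # Moving every feature by eps against the gradient shifts the score by eps * l1.
    eps = margin * abs(s) / l1
    x_adv = [xi - sign(s) * eps * sign(w) for w, xi in zip(weights, x)]
    return x_adv, eps

def demo(n_features=1000, seed=0):
    """Random weights and a random datum with unit-scale features."""
    rng = random.Random(seed)
    weights = [rng.gauss(0.0, 1.0) / math.sqrt(n_features) for _ in range(n_features)]
    x = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
    x_adv, eps = minimal_sign_attack(weights, x)
    return score(weights, x), score(weights, x_adv), eps
```

In `demo()` the features have unit scale, but the per-feature step `eps` needed to flip the label is much smaller than one, because the shift in the score grows with the L1 norm of the weights, which grows with the dimension of the data.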

We should attack some of the astronomical applications of machine learning with such attacks! Will it work? I bet it has to; I certainly hope so! The paper I want to write would show that when you are using ML to transform your data into labels, it is over-fitting (in at least some respects) but when you are using ML to transform labels into your data, you can't over-fit in the same ways. This all connects to the idea (yes, I am like a broken record) that you should match your methods to the structure of your problem.


conditional likelihoods

Today Christina Eilers (MPIA) and I spent time working out different formulations for an inference of the force law in the Milky Way disk, given stellar positions and velocities. We have had various overlapping ideas and we are confused a bit about the relationships between our different options. One of the key ideas we are trying to implement is the following: The selection function of the intersection of Gaia and APOGEE depends almost entirely on position and almost not at all on velocity. So we are looking at likelihood functions that are probabilities for velocity given position or conditioned on position. We have different options, though, and they look very different.

This all relates to the point that data analysis is technically subjective. It is subjective of course, but I mean it is subjective in the strict sense that you cannot obtain objectively correct methods. They don't exist!


p modes and g modes in stars

Today was the first of two 90-minute pedagogical lectures at MPIA by Conny Aerts (Leuven), who is also an external member of the MPIA. I learned a huge amount! She started by carefully defining the modes and their numbers ell, em, and en. She explained the difference between pressure (p) modes and gravity (g) modes, which I have to admit I had never understood. And I asked if this distinction is absolutely clear. I can't quite tell; after all, in the acoustic case, the pressure is still set by the gravity of the star! The g modes have never been detected for the Sun, but they have been detected for many other kinds of stars, and they are very sensitive to the stellar interiors. The relative importance of p and g modes is a strong function of stellar mass (because of the convective and radiative structure in the interior). She also showed that p modes are separated by near-uniform frequency differences, and g modes by near-uniform period differences. And the deviations of these separations from uniformity are amazingly informative about the interiors of the stars, because (I think) the different modes have different radial extents into the interior, so they measure different integrals of the density. Amazing stuff. She also gave a huge amount of credit to the NASA Kepler Mission for changing the game completely.
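
For reference, the near-uniform spacings can be written with the standard first-order asymptotic relations for high-order modes. These are textbook forms, not necessarily the expressions shown in the lecture, and the symbols (large frequency separation \Delta\nu, period-spacing scale \Pi_0, phase offsets \epsilon and \epsilon_g) are my notation:

```latex
% p modes: near-uniform spacing in frequency at fixed degree \ell
\nu_{n\ell} \simeq \Delta\nu \left( n + \frac{\ell}{2} + \epsilon \right)

% g modes: near-uniform spacing in period at fixed degree \ell
P_{n\ell} \simeq \frac{\Pi_0}{\sqrt{\ell(\ell+1)}} \left( |n| + \epsilon_g \right)
```

The informative deviations mentioned above are departures of the observed frequencies and periods from these idealized ladders.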


radial actions and angles

[No posts for a few days because vacation.]

Great day today! I met up with Eilers (MPIA) early to discuss our project to constrain the dynamics of the Milky Way disk using the statistics of the actions and conjugate angles. During our conversation, I finally was able to articulate the point of the project, which I have been working on but not really understanding. Or I should say perhaps that I had an intuition that we were going down a good path, but I couldn't articulate it. Now I think I can:

The radial action of a star in the Milky Way disk is a measure of how much it deviates in velocity from the circular velocity. The radial action is (more or less) the amplitude of that deviation and the radial angle is (more or less) the phase of that deviation. Thus the radial action and angle are functions (mostly though not perfectly) of the stellar velocity. So as long as the selection function of the survey we are working with (APOGEE cross Gaia in this case) is a function only (or primarily) of position and not velocity, the selection function doesn't really come in to the expected distribution of radial actions and angles!
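
The amplitude-and-phase picture can be sketched in the epicyclic approximation. This is a toy, not the action-angle machinery we are actually using: it assumes planar motion, a flat rotation curve with a hypothetical circular speed `v_circ`, and an angle convention in which the radial angle is zero at apocenter.

```python
import math

def epicyclic_action_angle(R, v_R, v_phi, v_circ=220.0):
    """Approximate (J_R, theta_R) for a planar orbit in a flat rotation curve.

    Units are kpc and km/s, so J_R is in kpc km/s. Requires v_phi > 0.
    Convention (an assumption): R - R_g = a cos(theta_R), v_R = -a kappa sin(theta_R).
    """
    L_z = R * v_phi                          # angular momentum
    R_g = L_z / v_circ                       # guiding-center radius
    kappa = math.sqrt(2.0) * v_circ / R_g    # epicyclic frequency for a flat curve
    x = R - R_g                              # radial offset from the guiding center
    E_R = 0.5 * v_R ** 2 + 0.5 * (kappa * x) ** 2
    J_R = E_R / kappa                        # amplitude of the deviation
    theta_R = math.atan2(-v_R / kappa, x) % (2.0 * math.pi)  # phase of the deviation
    return J_R, theta_R
```

A star on a circular orbit (`v_R = 0`, `v_phi = v_circ`) gets zero radial action. Position enters only through R (in the guiding radius and the offset), and the velocities do most of the work.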

That's cool! We talked about how true these assumptions are, and how to structure the inference.


white paper

I spent today working on a Decadal Astro2020 white paper, which I think is probably DOA!


orbital roulette, radial-action edition

I spent time today with Christina Eilers (MPIA), discussing how to constrain the Milky Way disk potential (force law) using the kinematics of stars selected in a strange way (yes, APOGEE selection). She and others have shown in small experiments that the radial angle—the conjugate angle to the radial action—is very informative! The distribution of radial angles should be (close to) uniform if you can observe a large patch of the disk, and she finds that the distribution you observe is a very strong function of potential (force law) parameters. That means that the angle distribution should be very informative! (Hey: Information theory!)
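
Here is a toy illustration of why the angle distribution is sensitive to the force law. It assumes a one-dimensional harmonic oscillator instead of the Milky Way disk, and a simple second-harmonic statistic that I made up for this sketch. It is not Eilers's experiment. A wrong trial frequency squashes the phase-space ellipse, which piles the inferred angles up twice per cycle.

```python
import cmath
import math
import random

def fake_oscillator_stars(n, omega_true, seed=1):
    """Stars on harmonic orbits with uniformly random phases: returns (x, v) pairs."""
    rng = random.Random(seed)
    stars = []
    for _ in range(n):
        amplitude = rng.uniform(0.5, 2.0)
        theta = rng.uniform(0.0, 2.0 * math.pi)
        x = amplitude * math.cos(theta)
        v = -amplitude * omega_true * math.sin(theta)
        stars.append((x, v))
    return stars

def angle_nonuniformity(stars, omega_trial):
    """|<exp(2 i theta)>| for angles computed under a trial frequency.

    Close to zero when the inferred angles are uniform; larger otherwise.
    """
    total = 0j
    for x, v in stars:
        theta = math.atan2(-v / omega_trial, x)
        total += cmath.exp(2j * theta)
    return abs(total) / len(stars)
```

With stars generated at `omega_true = 1.0`, the statistic is near zero at the true frequency and grows as the trial frequency moves away from it in either direction.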

This is an example of orbital roulette. This is a dynamical inference method which was pioneered in its frequentist form by Beloborodov and Levin and turned into a Bayesian form (that looks totally unlike the frequentist form) by Bovy, Murray, and me. I think we should do both forms! But we spent time today talking through the Bayesian form.


why does deep learning work?

There is a paradox about deep learning, which everyone finds either incredibly unconvincing or totally paradoxical. I'm not sure which! But it is this: It is simultaneously the case that deep learning is so flexible it can fit any data, including randomly generated data, and the case that when it is trained on real data, it generalizes well to new examples. I spent some time today discussing this with Soledad Villar (NYU) because I would like us to understand this a bit better in the context of possible astronomical applications of deep learning.

In many applications, people don't need to know why a method works; they just need to know that it does. But in our scientific applications, where we want to use the deep-learning model to de-noise or average over data, we actually need to understand in what contexts it is capturing the structure in the data and not just over-fitting the noise. Villar and I discussed how we might test these things, and what kinds of experiments might be illuminating. As my loyal reader might expect, I am interested in taking an information-theoretic attitude to the problem.

One relevant thing that Villar mentioned is that there is research that suggests that when the data has simpler structure, the models train faster. That's interesting, because it might be that somehow the deep models still have some internal sense of parsimony that is saving them; that could resolve the paradox. Or not!


decadal white paper

The tiny bit of research I did today was work on a Decadal Survey (Astro2020) white paper on changes we might make to the Decadal Survey process itself. The challenge is to write this constructively and with the appropriate tone. I don't want to be sanctimonious!



Nick Pingel (ANU) came by Flatiron and impressed us all with discussions of ASKAP, which is one of the pathfinders to the SKA. The most impressive thing I learned is that the feeds for the telescope array are themselves dipole arrays, so you can synthesize multiple beams at each telescope, and then synthesize an aperture for each beam. That's a great capability for the array, but of course is also an engineering challenge. He said scary things about what the calibration looks like. It really made me wish I had got closer to radio astronomy in my life!


technology-enhanced distributed peer review

At Stars and Exoplanets Meeting today, Wolfgang Kerzendorf spoke about a novel idea for peer review (for telescope-time proposals, but it could be applied to funding proposals or paper refereeing too): When you submit a proposal, you are sent K proposals to review. And the reviews thus obtained are combined in a sensible way to perform the peer review. This approach is scalable, and connects benefit (funding opportunity) to effort (reviewing). That's a good idea, and crystallizes some things I have been trying to articulate for years.
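
One simple way to hand out the K proposals can be sketched as follows. This is my own toy scheme, not Kerzendorf's: shuffle the proposers, then have each one review the next K proposals around the circle. It assumes one proposal per proposer and unique proposer names.

```python
import random

def assign_reviews(proposers, k, seed=0):
    """Map each proposer to the k other proposers whose proposals they review."""
    n = len(proposers)
    if not 0 < k < n:
        raise ValueError("need 0 < k < number of proposers")
    order = list(proposers)
    random.Random(seed).shuffle(order)
    # Each proposer reviews the next k in the shuffled circle, so nobody
    # reviews their own proposal and every proposal gets exactly k reviews.
    return {
        order[i]: [order[(i + step) % n] for step in range(1, k + 1)]
        for i in range(n)
    }
```

The circular structure is what makes the load balanced: reviewing effort and reviews received are both exactly K per proposer.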

Kerzendorf's contribution, however, is to make a technology that makes this whole problem simpler: He wants to use natural-language processing (NLP) to help the organizations match proposals to reviewers. He showed snippets from a paper that shows that a simple NLP implementation, looking for similarity between proposal texts and proposers' scientific literature, does a reasonable job of matching reviewers to proposals that they feel comfortable reviewing. This is a great set of issues, and connects also to the discussions in our community about blind reviewing.
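
To give a flavor of what a simple text-similarity match could look like, here is a sketch that ranks reviewers by the similarity of their text to a proposal. I don't know what the paper actually implements. This version swaps in a bag-of-words TF-IDF representation with cosine similarity, and all names are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (a dict of word -> weight) per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed inverse document frequency down-weights common words.
            word: count * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0)
            for word, count in counts.items()
        })
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b[word] for word, weight in a.items() if word in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_reviewers(proposal_text, reviewer_texts):
    """Return reviewer names sorted from most to least similar to the proposal."""
    names = list(reviewer_texts)
    vectors = tfidf_vectors([proposal_text] + [reviewer_texts[n] for n in names])
    proposal_vec, reviewer_vecs = vectors[0], vectors[1:]
    sims = {n: cosine(proposal_vec, vec) for n, vec in zip(names, reviewer_vecs)}
    return sorted(names, key=lambda n: sims[n], reverse=True)
```

Here `reviewer_texts` would be a dict from reviewer name to some concatenation of that person's abstracts. A real system would want proper tokenization and stop-word handling at the very least.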



I spent my little bit of research time today working on the paper by Matt Buckley (Rutgers) about observing and using as a tool the conservation of phase-space density that is guaranteed by Hamiltonian dynamics.


critical stars; physics easier than math?

Fridays are the good days at Flatiron. We have the Astronomical Data Group internal meeting (which operates by extremely odd and clever rules, not designed nor enforced by me) and the new Dynamics Group internal meeting. In the latter, Robyn Sanderson (Penn) brought her entire group from Penn: students working with the ESA Gaia data. One thing the group is finding is that certain stars have dynamics that are far more sensitive to dynamical (potential) parameters than others. This is something that Bovy and I were arguing long ago: The dynamical model of the Milky Way will not rest equally on all Gaia stars: Some will be critical. That's either obvious or deep. Or both! (I'm loving that phrase these days.)

Late in the day, Rodrigo Luger (Flatiron) and I trapped Leslie Greengard (Flatiron) and Alex Barnett (Flatiron) into a conversation about performing line integrals of spherical harmonics along curves that are themselves solutions of spherical-harmonic equations. In a typical astronomy–math interaction, we spent most of our time describing the problem, and then the answer is either: That's trivial! or That's hard! Unfortunately the answer wasn't That's trivial! But they did give us some good ideas for how to think about the problem.

One funny thing Greengard asked, which resonated with me (no pun intended): He said: Can you convert this math question into a physics question? Because if you can, it probably has a simple answer! You see how odd that is? That if your equation represents a physics problem, it is probably simple to solve. And yet it seems like it is exactly right. That's either deep or wrong or scary. I think maybe the last.



Today I had the great honor of meeting Ingrid Daubechies (Duke), who is a pioneering and accomplished mathematician, known for some of the fundamental work on wavelets and representations that have been incredibly important in data. For example, the JPEG 2000 standard is based on her wavelets! She gave a talk at the end of the day on teeth. Yes, teeth. It turns out that the shapes of tooth surfaces tell you simultaneously about evolution and diet. And she has worked out beautiful ways to first get distances between surfaces. Like metric distances in surface space. And then join those distances up into local manifolds. It could have relevance to things we have been thinking about for a non-parametric version of The Cannon. It was a beautiful talk, with the theme or message that you do better in your science if you use mathematical tools that are matched well to the structure of your problem. That message is either obvious or deep. Or both! What a privilege to be there.