On the weekend and today, Eilers (MPIA), Rix (MPIA), and I started to build a true ansatz for an m=2 spiral in the Milky Way disk, in both density and velocity. The idea is to compute the model as a perturbation away from an equilibrium model, and not self-consistently (because the stars we are using as tracers don't dominate the density of the spiral perturbation). This led us to write down a whole bunch of functions and derivatives and start to plug them into the first-order expansion away from the steady-state equilibrium of an exponential disk (the Schwarzschild distribution, apparently). We don't have an ansatz yet that permits us to solve the equations, but it feels very, very close. The idea behind this project is to use the velocity structure we see in the disk to infer the amplitude (at least) of the spiral density structure, and then compare to what's expected in (say) simulations or theory. Why not just observe the amplitude directly? Because that's harder, given selection effects (like dust).
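For reference, here is my (hedged, conventions may differ from our notes) sketch of the starting point: the Schwarzschild distribution for the equilibrium disk, and the linearized collisionless Boltzmann equation that the first-order perturbation must satisfy, with the perturbing potential taken to be an m=2 rotating pattern.

```latex
% Schwarzschild (anisotropic Gaussian) equilibrium DF for an exponential disk:
f_0(R, v_R, v_\phi) \propto
  \frac{\Sigma(R)}{2\pi\,\sigma_R(R)\,\sigma_\phi(R)}
  \exp\!\left[-\frac{v_R^2}{2\,\sigma_R^2(R)}
              -\frac{(v_\phi - \bar{v}_\phi(R))^2}{2\,\sigma_\phi^2(R)}\right]

% Perturb f = f_0 + f_1 and \Phi = \Phi_0 + \Phi_1, with an m=2 spiral pattern
% \Phi_1 = \mathrm{Re}\left[\Phi_a(R)\,e^{i(m\phi - \omega t)}\right], \quad m = 2 ,
% and keep first order in the linearized collisionless Boltzmann equation:
\frac{\partial f_1}{\partial t}
  + \vec{v}\cdot\frac{\partial f_1}{\partial \vec{x}}
  - \frac{\partial \Phi_0}{\partial \vec{x}}\cdot\frac{\partial f_1}{\partial \vec{v}}
  = \frac{\partial \Phi_1}{\partial \vec{x}}\cdot\frac{\partial f_0}{\partial \vec{v}}
```

The right-hand side is the source term: the known equilibrium DF, acted on by the unknown spiral potential. The ansatz we're hunting for is a functional form for f_1 that makes this equation solvable.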
I gave the Königstuhl Colloquium in Heidelberg today. I spoke about the (incredibly boring) subject of selecting targets for spectroscopic follow-up. The main point of my talk was that you want to select targets so that you can include the selection function in your inferences simply. That is, include it in your likelihood function, tractably. This actually puts extremely strong constraints on what you can and cannot do, and many surveys and projects have made mistakes with this (I think). I certainly have made a lot of mistakes, as I admitted in the talk. Hans-Walter Rix (MPIA) and I are trying to write a paper about this. The talk video is here (warning: I haven't looked at it yet!).
I had an inspiring conversation with Sara Rezaei Kh. (Gothenburg) today, about next-generation dust-mapping projects. As my loyal reader knows, I want to map the dust in 3d, and then 4d (radial velocity too) and then 6d (yeah) and even higher-d (because there will be temperature and size-distribution variations with position and velocity). She has some nice new data, where she has her own 3d dust map results along lines of sight that also have molecular gas emission line measurements. If it is true that dust traces molecular gas (even approximately) and if the 3-d dust map is good, then it should be possible to paint velocity onto dust with this combined data. My proposal is: Find the nonlinear function of radial position that is the mean radial velocity such that both line-of-sight maps are explained by the same dust in 4d. I don't know if it will work, but we were able to come up with some straw-man possible data sets for which it would obviously work. Exciting project.
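Here is a minimal toy sketch of the proposal, with entirely made-up data: a 3d dust density along one sightline, a fake molecular-line spectrum, and a parametric (polynomial) mean-velocity curve fit so that projecting the dust through that curve reproduces the spectrum. All names, numbers, and the polynomial form are my assumptions for illustration, not Rezaei Kh.'s data or method.

```python
import numpy as np
from scipy.optimize import minimize

# Toy sightline: rho[i] is a 3d-dust-map density in radial bins r[i];
# T_obs[j] will be a molecular-line intensity in velocity bins v_grid[j].
r = np.linspace(0.1, 3.0, 60)                 # kpc (made-up grid)
rho = np.exp(-0.5 * ((r - 1.2) / 0.2) ** 2)   # one dust cloud at 1.2 kpc
v_grid = np.linspace(-30.0, 30.0, 40)         # km/s
sigma_v = 3.0                                 # km/s, assumed line width per cell

def predict_spectrum(coeffs):
    """Project dust density into velocity space, given a polynomial
    mean-velocity curve vbar(r) = sum_k coeffs[k] * r**k."""
    vbar = np.polyval(coeffs[::-1], r)
    # each radial cell contributes a Gaussian line centered at vbar(r)
    kernel = np.exp(-0.5 * ((v_grid[None, :] - vbar[:, None]) / sigma_v) ** 2)
    return (rho[:, None] * kernel).sum(axis=0)

# Fake an "observed" spectrum from a known velocity law, to test recovery:
true_coeffs = np.array([5.0, -8.0])           # vbar(r) = 5 - 8 r  km/s
T_obs = predict_spectrum(true_coeffs)

def loss(coeffs):
    return np.sum((predict_spectrum(coeffs) - T_obs) ** 2)

fit = minimize(loss, x0=np.array([1.0, -1.0]), method="Nelder-Mead",
               options={"maxiter": 4000, "xatol": 1e-8, "fatol": 1e-10})
```

Note the obvious degeneracy in the toy: only radii where the dust sits constrain the velocity curve, so the recovered coefficients trade off against each other; the mean velocity *at the cloud* is what's pinned down. Real sightlines with multiple clouds would break this.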
[After I posted this, Josh Peek (STScI) sent me an email to note that these ideas are similar to things he has been doing with Tchernyshyov and Zasowski to put velocities onto dust clouds. Absolutely! And I love that work. That email (from Peek) inspires me to write something here that I thought was obvious, but apparently isn't: This blog is about my research. Mine! It is not intended to be a comprehensive literature review, or a statement of priority, or a proposal for future work. It is about what I am doing and talking about now. If anything that I mention in this blog has been done before, I will be citing that prior work if I ever complete a relevant project! Most ideas on this blog never get done, and when they do get done, they get done in responsible publications (and if you don't think they are responsible, email me, or comment here). This blog itself is not that responsible publication. It contains almost no references and it does not develop the full history of any idea. And, in particular, in this case, the ideas that Rezaei Kh. and I discussed this day (above) were indeed strongly informed by things that Peek and Tchernyshyov and Zasowski have done previously. I didn't cite them because I don't cite everything relevant when I blog. If full citations are required for blogging, I will stop blogging.]
I had a conversation today with Kate Storey-Fisher (NYU) about the software she is writing in our large-scale structure projects. One question is whether to develop on a fork of another project, or to bring code in from that other project and work within our own repository.
My view on this is complicated. I am a big believer in open-source software and building community projects. But I am also a believer that science and scientific projects have to have clear authorship: Authorship is part of giving and getting credit, part of taking responsibility for your work and decisions, and part of the process of criticism that is essential to science (in its current form). So we left this question open; we didn't decide.
But my thoughts about the right thing to do depend on many factors, like: Is this code an important part of your scientific output, or is it a side project? Do you expect to write a paper about this code? Do you expect or want this code to be used by others?
Today Christina Eilers (MPIA) gave a great colloquium talk at MPIA about the intergalactic medium, and how it can be used to understand the lifetime of quasars: Basically the idea is that quasars ionize bubbles around themselves, and the timescales are such that the size of the bubble tells you the age of the quasar. It's a nice and simple argument. Within this context, she finds some very young quasars; too young for their black holes to have grown to their immense masses. What's the explanation? There are ways to get around the simple argument, but they are all a bit uncomfortable. Of course one idea I love (but it sure is speculative) is the idea that maybe these very young quasars are primordial black holes!
In other research today (actually, I think this is not research according to the Rules), I finished a review of a book (a history of science book, no less) for Princeton University Press. I learned that reviewing a book for a publisher is a big job!
Yesterday Eilers (MPIA) and I thought that splitting the stars into many populations would help us: Every stellar population would have its own kinematic distribution, but every population would share the same gravitational potential. We were right on the first part: The velocity dispersion and scale length are both a strong function of chemical abundances (metallicity or alpha enhancement). We even made a bin-free model where we modeled the dependences continuously! But for each stellar population, the degeneracy between circular velocity of the potential and scale-length of the distribution function remains. And it is exact. So splitting by stellar sub-population can't help us! Durn. And Duh.
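My (hedged) sketch of where the degeneracy comes from, via the standard asymmetric-drift relation (signs and higher-order terms may differ from our exact setup):

```latex
% Asymmetric drift for a tracer population with number density \nu(R):
v_c^2 - \bar{v}_\phi^2 \simeq \sigma_R^2
  \left[\frac{\sigma_\phi^2}{\sigma_R^2} - 1
        - \frac{\partial \ln(\nu\,\sigma_R^2)}{\partial \ln R}\right]

% For an exponential tracer profile \nu \propto e^{-R/h_R}:
\frac{\partial \ln \nu}{\partial \ln R} = -\frac{R}{h_R}
\quad\Longrightarrow\quad
v_c^2 - \bar{v}_\phi^2 \simeq \sigma_R^2
  \left[\frac{\sigma_\phi^2}{\sigma_R^2} - 1 + \frac{R}{h_R}
        - \frac{\partial \ln \sigma_R^2}{\partial \ln R}\right]
```

The conditional likelihood of velocity given position measures the left-hand side (through the mean and dispersion of the velocities), so the circular velocity and the tracer scale length enter only in combination, for every sub-population separately. Splitting by abundance changes sigma and h_R, but each split inherits the same trade-off.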
Eilers and I looked at the dependence of the kinematics of disk populations with various element-abundance ratios. And we built a model to capitalize on these differences without binning the data: We parameterized the dependences of kinematics (phase-space distribution function) on element abundances and then re-fit our dynamical model. It didn't work great; we don't yet understand why.
Today Adrian Price-Whelan (Flatiron) and I resurrected an old project from last summer: Code-named Chemical Tangents, the project is to visualize or model the orbits (tori) in the phase-mixed parts of the Milky Way by looking at the element abundance distributions. The gradients in the statistics of the element-abundance distributions (like mean, or quantiles, or variances, or so on) should be perpendicular to the tori. Or the gradients should be in the action directions and never the conjugate-angle directions. Price-Whelan resurrected old code and got it working on new data (APOGEE cross Gaia). And we discussed name and writing and timeline and so on.
Eilers (MPIA), Rix (MPIA), and I have spent two weeks now discussing how to model the kinematics in the Milky Way disk, if we want to build a forward model instead of just measuring velocity moments (Jeans style). And we have the additional constraint that we don't know the selection function of the APOGEE–Gaia–WISE cross-match that we are using, so we need to be building a conditional likelihood, velocity conditioned on position (yes, this is permitted; indeed all likelihoods are conditioned on a lot of different things, usually implicitly!).
At Eilers's insistence, we down-selected to one choice of approach today. Then we converted the (zeroth-order, symmetric) equations in this paper on the disk into a conditional probability for velocity given position. When we use the epicyclic approximations (in that paper) the resulting model is Gaussian in velocity space. That's nice; we completed a square, Eilers coded it up, and it just worked. We have inferences about the dynamics of the (azimuthally averaged) disk, in the space of one work day!
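A minimal sketch of the kind of fit that "just worked", on mock data: the conditional p(v_phi | R) is Gaussian, with a mean that lags the circular velocity by an asymmetric-drift-like term. The functional form, parameter values, and mock-data setup here are my assumptions for illustration, not the actual equations from the paper.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(17)

# Mock data: stars at radii R, azimuthal velocities drawn from the assumed
# Gaussian conditional p(v_phi | R). All numbers are made up.
n = 8000
R = rng.uniform(5.0, 13.0, size=n)                 # kpc
v_c_true, h_R_true, sigma_true = 229.0, 3.0, 35.0  # km/s, kpc, km/s

def mean_vphi(R, v_c, h_R, sigma):
    # mean lags the circular speed by roughly sigma^2 / v_c * R / h_R
    return v_c - sigma**2 / v_c * R / h_R

v_phi = rng.normal(mean_vphi(R, v_c_true, h_R_true, sigma_true), sigma_true)

def neg_log_like(theta):
    """Conditional likelihood: velocity GIVEN position, so the (unknown)
    positional selection function drops out."""
    v_c, h_R, ln_sigma = theta            # log-parametrize sigma to keep it positive
    sigma = np.exp(ln_sigma)
    mu = mean_vphi(R, v_c, h_R, sigma)
    return 0.5 * np.sum(((v_phi - mu) / sigma) ** 2) + R.size * ln_sigma

fit = minimize(neg_log_like, x0=[200.0, 2.0, np.log(30.0)],
               method="Nelder-Mead", options={"maxiter": 5000})
```

With mock data this tight, the maximum-likelihood point lands near the truth in seconds; the real data analysis is the same shape, just with the paper's (epicyclic) expressions for the mean and dispersion.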
Today, in a surprise visit, Bernhard Schölkopf (MPI-IS) appeared in Heidelberg. We discussed many things, including his beautiful pictures of the total eclipse in Chile last week. But one thing that has been a theme of conversation with Schölkopf since we first met is this: Should we build models that go from latent variables or labels to the data space, or should we build models that go from the data to the label space? I am a big believer—on intuitive grounds, really—in the former: In physics contexts, we think of the data as being generated from the labels. Schölkopf had a great idea for bolstering my intuition today:
A lot has been learned about machine learning by attacking classifiers with adversarial attacks. (And indeed, on a separate thread, Kate Storey-Fisher (NYU) and I are attacking cosmological analyses with adversarial attacks.) These adversarial attacks take advantage of the respects in which deep-learning methods are over-fitting to produce absurdly mis-classified data. Such attacks work when a machine-learning method is used to provide a function that goes from data (which is huge-dimensional) to labels (which are very low-dimensional). When the model goes from labels to data (it is generative) or from latents to data (same), these adversarial attacks cannot be constructed.
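A tiny illustration of the asymmetry, using the fast-gradient-sign attack on a plain logistic-regression classifier (standing in for the data-to-labels direction; no deep net needed to see the effect). Everything here is a made-up toy: in high dimensions, a per-feature perturbation that is tiny compared to the data scale moves the decision value by epsilon times the L1 norm of the weights, which is enough to flip the label.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 200, 500

# Two Gaussian classes separated along a random unit-ish direction.
direction = rng.normal(size=d) / np.sqrt(d)
y = rng.integers(0, 2, size=n).astype(float)
X = rng.normal(size=(n, d)) + np.outer(2.0 * y - 1.0, direction)

# Train logistic regression (data -> label) by gradient descent.
w, b = np.zeros(d), 0.0
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 1.0 * (X.T @ (p - y)) / n
    b -= 1.0 * np.mean(p - y)

# Fast-gradient-sign attack on a clean class-1 point: for label y0 = 1 the
# loss gradient w.r.t. x is (p - 1) * w, so the ascent step is -eps * sign(w).
eps = 0.25
x0 = direction.copy()                 # a prototypical class-1 input
x_adv = x0 - eps * np.sign(w)         # tiny per-feature perturbation
logit_clean = x0 @ w + b              # > 0: classified as class 1
logit_adv = x_adv @ w + b             # < 0: flipped by the attack
```

The attack exploits exactly the data-to-label direction: the classifier's decision surface lives in the huge-dimensional data space, so there is always a cheap direction to push. A generative (labels-to-data) model has no such exposed surface to climb.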
We should attack some of the astronomical applications of machine learning with such attacks! Will it work? I bet it has to; I certainly hope so! The paper I want to write would show that when you are using ML to transform your data into labels, it is over-fitting (in at least some respects) but when you are using ML to transform labels into your data, you can't over-fit in the same ways. This all connects to the idea (yes, I am like a broken record) that you should match your methods to the structure of your problem.
Today Christina Eilers (MPIA) and I spent time working out different formulations for an inference of the force law in the Milky Way disk, given stellar positions and velocities. We have had various overlapping ideas and we are confused a bit about the relationships between our different options. One of the key ideas we are trying to implement is the following: The selection function of the intersection of Gaia and APOGEE depends almost entirely on position and almost not at all on velocity. So we are looking at likelihood functions that are probabilities for velocity given position or conditioned on position. We have different options, though, and they look very different.
This all relates to the point that data analysis is technically subjective. Of course it is subjective in the colloquial sense; but I mean that it is subjective in the strict sense that you cannot obtain objectively correct methods. They don't exist!
Today was the first of two 90-minute pedagogical lectures at MPIA by Conny Aerts (Leuven), who is also an external member of the MPIA. I learned a huge amount! She started by carefully defining the modes and their numbers ell, em, and en. She explained the difference between pressure (p) modes and gravity (g) modes, which I have to admit I had never understood. And I asked if this distinction is absolutely clear. I can't quite tell; after all, in the acoustic case, the pressure is still set by the gravity of the star! The g modes have never been detected for the Sun, but they have been detected for many other kinds of stars, and they are very sensitive to the stellar interiors. The relative importance of p and g modes is a strong function of stellar mass (because of the convective and radiative structure in the interior). She also showed that p modes are separated by near-uniform frequency differences, and g modes by near-uniform period differences. And the deviations of these separations from uniformity are amazingly informative about the interiors of the stars, because (I think) the different modes have different radial extents into the interior, so they measure different integrals of the density. Amazing stuff. She also gave a huge amount of credit to the NASA Kepler Mission for changing the game completely.
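For my own notes, the asymptotic relations behind those near-uniform spacings, as I understand them (standard Tassoul-style asymptotics; I may be dropping phase terms):

```latex
% p modes: consecutive radial orders n are separated by a near-uniform
% "large frequency separation", set by the sound-crossing time:
\Delta\nu \simeq \left[\,2\int_0^{R_\star}\frac{\mathrm{d}r}{c(r)}\,\right]^{-1}

% g modes: consecutive orders are instead separated by a near-uniform
% period spacing, set by the buoyancy (Brunt--Väisälä) frequency N(r):
\Delta\Pi_\ell \simeq \frac{2\pi^2}{\sqrt{\ell(\ell+1)}}
  \left[\,\int \frac{N(r)}{r}\,\mathrm{d}r\,\right]^{-1}
```

The integrals run over different cavities (the acoustic cavity for p modes, the radiative region for g modes), which is presumably why the deviations from uniform spacing probe different parts of the interior.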
[No posts for a few days because vacation.]
Great day today! I met up with Eilers (MPIA) early to discuss our project to constrain the dynamics of the Milky Way disk using the statistics of the actions and conjugate angles. During our conversation, I finally was able to articulate the point of the project, which I have been working on but not really understanding. Or I should say perhaps that I had an intuition that we were going down a good path, but I couldn't articulate it. Now I think I can:
The radial action of a star in the Milky Way disk is a measure of how much it deviates in velocity from the circular velocity. The radial action is (more or less) the amplitude of that deviation and the radial angle is (more or less) the phase of that deviation. Thus the radial action and angle are functions (mostly though not perfectly) of the stellar velocity. So as long as the selection function of the survey we are working with (APOGEE cross Gaia in this case) is a function only (or primarily) of position and not velocity, the selection function doesn't really come into the expected distribution of radial actions and angles!
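The "amplitude and phase" picture can be written down explicitly in the epicyclic approximation. This is a hedged sketch with a flat rotation curve assumed (so kappa = sqrt(2) Omega); the sign and zero-point conventions for the angle vary between references, and the numbers are placeholders.

```python
import numpy as np

def epicyclic_action_angle(R, v_R, v_phi, v_c=229.0):
    """Radial action and angle in the epicyclic approximation, assuming a
    flat rotation curve v_c(R) = v_c, so kappa = sqrt(2) * Omega.
    Angle zero-point and sign conventions are NOT standardized here."""
    # guiding-center radius from angular-momentum conservation: R_g v_c = R v_phi
    R_g = R * v_phi / v_c
    Omega = v_c / R_g
    kappa = np.sqrt(2.0) * Omega
    x = R - R_g                          # radial displacement from guiding center
    # action ~ amplitude of the velocity deviation (squared, over kappa):
    J_R = (v_R**2 + (kappa * x)**2) / (2.0 * kappa)
    # angle ~ phase of the deviation:
    theta_R = np.arctan2(-v_R, kappa * x)
    return J_R, theta_R

# A circular orbit has zero radial action; a hot orbit does not:
J_circ, _ = epicyclic_action_angle(8.2, 0.0, 229.0)   # ~ 0
J_hot, th = epicyclic_action_angle(8.2, 20.0, 215.0)  # > 0
```

The point of the project in code form: J_R and theta_R are built (almost entirely) from the velocities, with position entering only weakly through R_g, so a position-only selection function barely touches their distribution.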
That's cool! We talked about how true these assumptions are, and how to structure the inference.
I spent time today with Christina Eilers (MPIA), discussing how to constrain the Milky Way disk potential (force law) using the kinematics of stars selected in a strange way (yes, APOGEE selection). She and others have shown in small experiments that the radial angle—the conjugate angle to the radial action—is very informative! The distribution of radial angles should be (close to) uniform if you can observe a large patch of the disk, and she finds that the distribution you observe is a very strong function of potential (force law) parameters. That means that the angle distribution should be very informative! (Hey: Information theory!)
This is an example of orbital roulette, a dynamical-inference method pioneered in its frequentist form by Beloborodov and Levin, and turned into a Bayesian form (that looks totally unlike the frequentist form) by Bovy, Murray, and me. I think we should do both forms! But we spent time today talking through the Bayesian form.
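A one-dimensional toy of the frequentist form (my own caricature, not Beloborodov and Levin's actual estimator): stars on a phase-mixed orbit have uniform phases under the true frequency; assuming the wrong frequency skews the inferred phases, so you pick the frequency that makes the phase distribution most uniform (here, by minimizing a Kolmogorov–Smirnov distance).

```python
import numpy as np
from scipy.stats import kstest

# Exactly phase-mixed toy orbit: harmonic motion x = A cos(theta),
# v = -A * kappa * sin(theta), with phases laid down uniformly.
N = 2000
theta_true = (np.arange(N) + 0.5) / N * 2.0 * np.pi
kappa_true = 36.0                      # made-up "radial frequency"
A = 1.0
x = A * np.cos(theta_true)             # toy positions
v = -A * kappa_true * np.sin(theta_true)  # toy velocities

def phase_ks(kappa):
    """KS distance between inferred phases and uniformity, for an assumed kappa."""
    theta_hat = np.arctan2(-v / kappa, x) % (2.0 * np.pi)
    return kstest(theta_hat / (2.0 * np.pi), "uniform").statistic

grid = np.linspace(20.0, 60.0, 81)
best = grid[np.argmin([phase_ks(k) for k in grid])]   # recovers kappa_true
```

With the wrong frequency the inferred phases bunch up near the turning points, the KS distance grows, and the roulette "wheel" looks rigged; only the true frequency makes it fair.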
There is a paradox about deep learning, which everyone finds either incredibly unconvincing or totally paradoxical. I'm not sure which! But it is this: It is simultaneously the case that deep learning is so flexible it can fit any data, including randomly generated data, and the case that when it is trained on real data, it generalizes well to new examples. I spent some time today discussing this with Soledad Villar (NYU) because I would like us to understand this a bit better in the context of possible astronomical applications of deep learning.
In many applications, people don't need to know why a method works; they just need to know that it does. But in our scientific applications, where we want to use the deep-learning model to de-noise or average over data, we actually need to understand in what contexts it is capturing the structure in the data and not just over-fitting the noise. Villar and I discussed how we might test these things, and what kinds of experiments might be illuminating. As my loyal reader might expect, I am interested in taking an information-theoretic attitude to the problem.
One relevant thing that Villar mentioned is that there is research that suggests that when the data has simpler structure, the models train faster. That's interesting, because it might be that somehow the deep models still have some internal sense of parsimony that is saving them; that could resolve the paradox. Or not!