empirical template RVs

As my loyal reader knows, I am finishing up an old paper on information theory, radial velocities, and measurement. Today I wrote the code that produces radial-velocity measurements using cross-correlations with an empirical template that is built from the data themselves. I learned a lot doing this! One thing I learned is that edge effects matter: It is important to design your template, or else your method, such that lines that enter or leave the spectral range as you redshift or blueshift the template don't mess up your radial-velocity measurements. In principle this is all handled correctly by correct statistics, but in practice it requires attention!
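Here is a toy sketch of the issue (the line list and all numbers are invented for illustration, not taken from the real code): evaluate the shifted template analytically and restrict the cross-correlation sum to interior pixels, so that lines sliding across the edges of the range cannot bias the peak.

```python
import numpy as np

C = 299792.458  # speed of light, km/s

def make_spectrum(lam, lines, v=0.0):
    """Toy spectrum: unit continuum minus Gaussian absorption lines,
    Doppler-shifted by velocity v (in km/s)."""
    flux = np.ones_like(lam)
    for center, depth, width in lines:
        flux -= depth * np.exp(-0.5 * ((lam - center * (1. + v / C)) / width) ** 2)
    return flux

# made-up line list: (center [A], depth, width [A])
lines = [(5005., 0.5, 0.1), (5010., 0.3, 0.1), (5015., 0.7, 0.1)]
lam = np.linspace(5000., 5020., 2048)
data = make_spectrum(lam, lines, v=3.0)  # "observed" at +3 km/s

# cross-correlate against shifted templates, but only over interior
# pixels, so lines entering or leaving the range can't bias the peak
interior = (lam > 5001.) & (lam < 5019.)
vgrid = np.linspace(-10., 10., 401)
ccf = [np.sum(data[interior] * make_spectrum(lam, lines, v)[interior])
       for v in vgrid]
v_best = vgrid[np.argmax(ccf)]  # lands near +3 km/s
```

Without the interior mask (or some equivalent design choice) the strong line near the red edge would contribute differently at different trial velocities and pull the peak.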


what are the responsibilities of letter writers?

The Astronomical Data Group Meeting at Flatiron is a remarkable weekly meeting. It is always impressive. Contributions are sorted into categories like words, statistics, visualization, feelings, and so on. But many of the most interesting conversations have come from the feelings category! Today was no exception. Alex Gagliano (UIUC) asked about the ethical responsibilities of letter writers (as in: letters of recommendation) to applicants, and the responsibilities the other way. It led to a great conversation about many things that we all think about but that are not written down, taught, or explicitly discussed in our field. What are we doing? The union of the feelings-categorized conversations we have had in group meeting would make an admirable textbook!


linear programming FTW

As my loyal reader knows, I am violating all of my principles and engaging in a linear-regression version of a symbolic regression project to test how ideas in dimensional analysis (or units equivariance) might impact machine learning. I have been struggling to get a sparse regression working, because when problems get large, optimizing combined L1 and L2 losses can be sticky and tricky. But Soledad Villar (JHU) saved me today by pointing out that in the over-parameterized regime (when you have more terms in your linear regression than data), you can do a sparse regression with a cleverly designed linear program! Woah, we coded it up and it Just Worked (tm)! We can get the exactly correct total mechanical energy expression in our toy problem with a very small amount of training data, far less than we needed when we were using L2 as our objective. Far less.
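The trick, as I understand it, can be sketched as basis pursuit: in the over-parameterized (interpolating) regime, minimize the L1 norm of the coefficients subject to fitting the data exactly, which becomes a linear program once you split each coefficient into positive and negative parts. A toy version (the sizes, seed, and sparsity pattern are illustrative, not our actual experiment):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(17)

# over-parameterized toy problem: 12 data points, 20 candidate terms,
# true coefficient vector is 3-sparse
n, p = 12, 20
X = rng.normal(size=(n, p))
w_true = np.zeros(p)
w_true[[2, 7, 11]] = [1.5, -2.0, 0.7]
y = X @ w_true

# basis pursuit as a linear program: write w = u - v with u, v >= 0,
# then minimize sum(u) + sum(v) subject to X (u - v) = y
c = np.ones(2 * p)
A_eq = np.hstack([X, -X])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * p))
w_hat = res.x[:p] - res.x[p:]
```

At the optimum, exactly one of (u_i, v_i) is nonzero for each term, so sum(u) + sum(v) really is the L1 norm of w.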


Gaia DR3 plans: much better distances; unicorns

Katie Breivik (Flatiron) and I convene a weekly meeting at Flatiron for NYC-area astronomers to discuss and work on things we are going to do with the ESA Gaia DR3 and SDSS-V data. Today I showed people what Price-Whelan (Flatiron), Eilers (MIT), and I are planning on doing with the 10^8 (!) very low-resolution Gaia BP/RP spectra, which will be released in DR3 this summer: We plan on making a much-improved version of this method for estimating spectrophotometric distances to stars, but covering the whole HR diagram and with better overall data and precision (we hope). In the meeting today we discussed ways to validate our distances (which is hard!) and we discussed what interesting side things might appear as part of the project. One category, which Dalcanton (Flatiron) liked, is this: Since our method involves comparing stars to other stars in their neighborhoods in spectrophotometric space, we will automatically and unintentionally find unicorn stars that are unlike any others. Those will be interesting for follow-up.


I broke everything

My loyal reader knows that Soledad Villar (JHU) and I are trying to get a paper on units-equivariance (machine learning that exactly obeys the symmetries of dimensional analysis) ready in time for the ICML deadline. Maybe not a good idea! Anyway, right now none of our numerical experiments are working all that well and today I broke one even further. We have several different, equivalent forms for unit-equivariance. My code had been written for one, and then we wrote the paper for a different one. So I updated my code and now nothing works. The old and new codes aren't identical, so it makes sense that they differ. But it used to work. Argh. I can use version control to go backwards, but the problem is: Then the code won't be consistent with the framework we advocate in the paper.


making fake RV data

I spent the day hacking away in notebooks to make fake but realistic radial-velocity data, and to test that maximum-likelihood and cross-correlation are good methods for measuring the radial velocities themselves, given the data. In cross-correlation the issues are subtle: If the templates are not normalized absolutely correctly, things can go badly wrong. But I can now empirically justify my theoretical claim that cross-correlation (with suitably normalized templates) saturates the bounds guaranteed by maximum-likelihood estimation.
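The kind of empirical check I mean can be sketched like this (a toy single-line spectrum, not the notebook code): simulate many noisy spectra at a known velocity, measure each one by maximizing the cross-correlation, and compare the empirical scatter to the Cramér–Rao bound computed from the template derivative.

```python
import numpy as np

rng = np.random.default_rng(42)
C = 299792.458  # km/s

lam = np.linspace(5000., 5010., 1024)

def template(v):
    # single Gaussian absorption line, Doppler-shifted by v (km/s)
    return 1.0 - 0.5 * np.exp(-0.5 * ((lam - 5005. * (1. + v / C)) / 0.05) ** 2)

sigma = 0.02   # per-pixel Gaussian noise, in continuum units
v_true = 1.0

# Cramer-Rao bound from the numerical derivative of the template
dv = 1e-3
dfdv = (template(v_true + dv) - template(v_true - dv)) / (2 * dv)
crlb = 1.0 / np.sqrt(np.sum(dfdv ** 2) / sigma ** 2)

# Monte Carlo: estimate v as the cross-correlation peak on a fine grid
vgrid = np.linspace(0., 2., 801)
T = np.array([template(v) for v in vgrid])
estimates = []
for _ in range(400):
    data = template(v_true) + sigma * rng.normal(size=lam.size)
    estimates.append(vgrid[np.argmax(T @ data)])
scatter = np.std(estimates)
# scatter should match crlb to within the Monte Carlo noise
```

The norm of this template is effectively shift-independent (the line sits deep inside the wavelength range), which is the normalization condition under which the cross-correlation estimator can saturate the bound.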

This, by the way, is not new: I, and many others, have done all this previously. My previous work here was with Megan Bedell (Flatiron), and indeed I am hacking around to resurrect some of our past results.


more data is always better, right?

In Astronomical Data Group Meeting at Flatiron today, Lily Zhao (Flatiron) asked a great question about how to combine radial-velocity data: Imagine you have a very high-resolution echelle-type spectrograph with many spectral orders. You get a stellar radial velocity out of every order (or, in some cases, out of many small spectral patches in every order). Now: How to combine those radial-velocity measurements into one true radial-velocity measurement? Obviously you just do an inverse-variance weighted average, no? Well, no! It doesn't work right. There are bad orders, and the only way to know is to see that they make the measurements worse in some end-to-end variance sense. So what to do? How do you empirically determine how to do the combination of data? This problem is simultaneously trivial and impossible. It's a great subject of discussion, and one I've mentioned here previously.
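Here is a cartoon of the failure mode and one empirical fix (every number here is invented): inverse-variance weighting with the pipeline's claimed uncertainties gets wrecked by a single order whose claimed error bar is wrong, while weights estimated from each order's empirical scatter about the ensemble median do much better.

```python
import numpy as np

rng = np.random.default_rng(3)

# toy setup: per-order RVs (km/s) for many epochs; order 0 is "bad":
# its true scatter is much larger than its pipeline error bar claims
n_orders, n_epochs = 10, 50
claimed_sigma = np.full(n_orders, 0.01)
true_sigma = claimed_sigma.copy()
true_sigma[0] = 0.2                              # the bad order
v_true = rng.normal(0., 0.05, size=n_epochs)     # per-epoch true RV
rv = v_true[None, :] + true_sigma[:, None] * rng.normal(size=(n_orders, n_epochs))

def ivar_combine(rv, sigma):
    w = 1.0 / sigma ** 2
    return (w[:, None] * rv).sum(axis=0) / w.sum()

naive = ivar_combine(rv, claimed_sigma)   # trusts the claimed errors

# empirical weights: each order's scatter about the ensemble median
resid = rv - np.median(rv, axis=0)
emp_sigma = np.std(resid, axis=1)
robust = ivar_combine(rv, np.maximum(emp_sigma, 1e-4))

rms_naive = np.std(naive - v_true)
rms_robust = np.std(robust - v_true)   # much smaller than rms_naive
```

Of course the hard part in real life is that you don't get to see `v_true`, which is exactly why this problem is simultaneously trivial and impossible.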


the bolometric correction of Vega is not zero

Wow I went Inside Baseball (tm) today on magnitudes. Magnitudes were originally designed to be purely observational, purely relative measurements, relative to a standard star. But then once you go off-course and define absolute magnitudes (gasp) and then even farther off and define bolometric magnitudes (double gasp), the system is no longer relative to anything! It depends on (imprecise) distance estimates and (wrong) stellar models. So you get many absurdities, like that Vega is not zero in any Vega-relative absolute or bolometric magnitude system. I think maybe we've lost our way! And not because magnitudes are obtuse; apparent magnitudes and colors make lots of sense! We lost our way when we tried to extend this to cover luminosities of various kinds.


weighing the disk with the Snail

Today we had a presentation from Axel Widmark (NBI), who has traced the spiral structure in the Snail (tm), which is the phase spiral in the vertical dynamics discovered by Antoja and others in the ESA Gaia data. He uses the trace to estimate the density of the Milky Way disk. Widmark's approach reminds me of old-school work on stream tracks (by, for instance, Johnston and others, but also maybe me?): He traces the spiral ridge-line and asks it to represent the dependence of vertical frequency on vertical energy. That's a good idea! And it seems to work. And it motivated me to think more about methods that, rather than just tracing a linear locus, perform full distribution-function fitting in a time-independent potential.


why does anyone need a bolometric correction?

As I mentioned yesterday, and against my better judgement, I am writing a long pedagogical document on magnitudes and how they work. I got to the bolometric-correction part and I realized that I didn't understand exactly why they are defined and how they are used. So I had a long think. The answer is: Theoretical models of stars are better at predicting the total bolometric luminosity output (energy per time) coming from a star than they are at predicting the detailed spectrum. Thus it makes sense to separate the belief about the total luminosity from the belief about the luminosity in any one band.
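In symbols (using the IAU 2015 zero point; the function names are mine, and the sign convention BC_band = M_bol − M_band is one common choice): the model's total luminosity gives M_bol, and the bolometric correction then carries you to any one band.

```python
import math

M_BOL_SUN = 4.74  # IAU 2015 zero point: absolute bolometric magnitude of the Sun

def absolute_bolometric_magnitude(L_over_Lsun):
    """M_bol from a model's total luminosity in solar units."""
    return M_BOL_SUN - 2.5 * math.log10(L_over_Lsun)

def band_absolute_magnitude(L_over_Lsun, bc_band):
    """With the convention BC_band = M_bol - M_band, a model luminosity
    plus a (model-dependent) bolometric correction predicts the
    absolute magnitude in one band: M_band = M_bol - BC_band."""
    return absolute_bolometric_magnitude(L_over_Lsun) - bc_band
```

The separation is the point: the first function encodes the belief about the total luminosity, the second encodes the (shakier) belief about how that luminosity is distributed in wavelength.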


the magnitude system: go big or go home

I spent some time on the weekend and today writing in my various writing projects. Actually, I am obliged to do this: I have a two-paragraph-per-day New Year's resolution. One thing I am (stupidly) writing is a note about how apparent magnitudes work, in order to (ultimately) explain the bolometric correction to physicists. This latter concept is confusing because when you measure through a limited band, you don't see the whole stellar spectrum, and yet the bolometric correction can be positive or negative or zero. Today as I was writing in this document, I realized that I have to do it all: Absolute magnitudes, distance moduli, color excesses, and so on, if I am going to reach my intended audience. I expanded the outline of the paper and I feel good about where it's going. That said, shouldn't I be doing research?


detecting shifts with cross-correlations

Lily Zhao (Flatiron) has been looking at the variability of the SDSS-IV BOSS spectrograph calibration frames, in the thought that if we can understand them well enough, we might be able to substantially reduce the observing overheads for the robotic phase of SDSS-V. To start, we are doing simple cross-correlations between calibration images. After all, the dominant calibration changes from exposure to exposure appear to be shifts. But these cross-correlations have weird features in them that I don't understand: In general if you are cross-correlating two images that are very similar, the cross-correlation function should look very symmetric, I think? But Zhao's cross-correlation functions look asymmetric in weird ways. We ended our session puzzled.
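For what it's worth, here is the symmetry property we were expecting, in a toy 1D version (the line pattern is made up): the circular cross-correlation of a frame with a shifted copy of itself is exactly symmetric about its peak, so any asymmetry means something beyond a pure shift is going on.

```python
import numpy as np

rng = np.random.default_rng(8)

# two toy 1D "calibration frames": same line pattern, second one shifted
n = 512
x = np.arange(n)
frame = np.zeros(n)
for center in rng.uniform(20, n - 20, size=12):
    frame += np.exp(-0.5 * ((x - center) / 1.5) ** 2)
shift_true = 3
frame2 = np.roll(frame, shift_true)

# circular cross-correlation via FFT; the peak location gives the shift
ccf = np.fft.ifft(np.fft.fft(frame2) * np.conj(np.fft.fft(frame))).real
shift_est = np.argmax(ccf)

# for a pure shift, the CCF is the (symmetric) autocorrelation slid over
# by the shift, so it is symmetric about its peak:
d = np.arange(1, 5)
sym_resid = ccf[(shift_est + d) % n] - ccf[(shift_est - d) % n]  # ~ 0
```

Real detector frames have edges, flexure that varies across the image, and changing line profiles, any of which would break this symmetry; that is presumably where the weird features come from, but we don't know yet.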


other people's code

I spent much of the day understanding Other People's Code (tm). One piece of code is legacy code from Aaron Dotter (Harvard) that computes magnitudes from flux densities. I was looking at it to confirm my frantic writing of yesterday. Another piece of code is the code by Weichi Yao (NYU) that implemented our group-equivariant machine learning. I was looking at it to see if we can modify it to impose units equivariance (dimensional scaling symmetries). I think we can, and I confirmed that it Just Works (tm).


magnitudes are (logarithmic) ratios of signals

A couple of weeks ago I had intense arguments with Belokurov (Cambridge) and Farr (Stony Brook) about the definition of a photometric bandpass and a photometric magnitude. And a few months ago I had long conversations with Breivik (Flatiron) about the bolometric correction. For some reason these things took over my mind today and I couldn't stop myself from starting a short pedagogical note about how magnitudes are related to spectral energy distributions. It's not trivial! Indeed, the integral isn't the integral you naively think it might be, because most photometric systems count photons (they don't integrate energy). People often say that a magnitude is negative-2.5 times the log of a flux. But that's not right! It is negative-2.5 times the log of a ratio of signals measured in two experiments.
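The photon-counting point can be made concrete with a toy synthetic-photometry calculation (the bandpass and spectra are invented): the measured signal is a photon-weighted integral of the flux density through the bandpass, and the magnitude is minus 2.5 times the log of a ratio of two such signals.

```python
import numpy as np

lam = np.linspace(4000., 7000., 3000)            # wavelength grid, A
dlam = lam[1] - lam[0]
R = np.exp(-0.5 * ((lam - 5500.) / 400.) ** 2)   # made-up bandpass

def band_signal(f_lambda):
    # photon-counting detector: counts are proportional to
    # integral of f_lambda(lam) * lam * R(lam) d lam,
    # because each photon carries energy hc / lam
    return np.sum(f_lambda * lam * R) * dlam

def magnitude(f_lambda, f_lambda_ref):
    # a magnitude is -2.5 log10 of a RATIO of two measured signals
    return -2.5 * np.log10(band_signal(f_lambda) / band_signal(f_lambda_ref))

f_ref = 1.0 / lam ** 2       # made-up reference-star spectrum
f_star = 10.0 / lam ** 2     # ten times brighter, same spectral shape
```

The extra factor of lambda inside the integral is exactly the difference between the naive energy-integrating expression and what most real photometric systems measure.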


building dimensionless monomials

I got stuck today on a problem that seems trivial but is in fact not at all trivial: Given N inputs, how to make all possible dimensionless monomials of those N inputs, at (or less than) degree d. For our purposes (which are the units-equivariant machine-learning projects I have with Soledad Villar, JHU), a dimensionless monomial of degree d is a product of integer powers of the inputs, in which those powers can be positive or negative, such that the dimensions cancel out completely, and for which the max (or sum or some norm) of the absolute values of the powers is ≤d. We have a complete basis of dimensionless monomials, such that any valid dimensionless monomial can be expressed as a monomial of the basis monomials. Because of this, the dimensionless monomials can be seen as the vertices of a Bravais lattice, technically. The problem is just to traverse the entire lattice within some kind of ball. Why is this hard? Or am I just dull? I feel like there are fill algorithms that should do this correctly.
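The brute-force version of the problem, at least, is easy to state in code (this enumerates the candidate exponent vectors directly rather than traversing the lattice cleverly, so it is exponential in the number of inputs and only sensible for small problems):

```python
import itertools
import numpy as np

def dimensionless_monomials(D, d):
    """All integer exponent vectors p with max |p_i| <= d and D @ p = 0
    (so the dimensions cancel).  D has one row per base unit and one
    column per input.  Brute force: exponential in the number of inputs."""
    n = D.shape[1]
    out = []
    for p in itertools.product(range(-d, d + 1), repeat=n):
        p = np.array(p)
        if np.any(p) and np.all(D @ p == 0):
            out.append(p)
    return out

# toy example: inputs (v, L, t) with dimensions (length/time, length, time);
# rows of D are (length, time)
D = np.array([[1, 1, 0],
              [-1, 0, 1]])
monos = dimensionless_monomials(D, 1)
# the only degree-1 dimensionless monomials here are v t / L and L / (v t)
```

The smarter version would enumerate only the lattice itself (from a basis of the integer null space of D) and walk outward inside the degree ball, which is the part that I can't make trivial.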


refereeing can be very valuable

Christina Eilers (MIT) and I discussed our referee report today, on our paper on re-calibrating abundance ratios as measured by APOGEE to remove log-g-dependent systematics. The referee report came quickly! And it was very useful: The referee found an assumption that we are making that we had not explicitly stated in the paper. And this is important: As my loyal reader knows, I believe that a data-analysis paper is correct only insofar as it is consistent with its explicitly stated (and hopefully tested and justified) assumptions. So if a paper is missing an assumption, it is wrong!


what is academic freedom?

In response to feedback from current and former members of the Center for Cosmology and Particle Physics, Mike Blanton (NYU) organized a career panel on careers outside of physics today. It was great! Three of my former collaborators—Andreas Berlind (NSF), Morad Masjedi (Brevan Howard), and Adi Zolotov (BCG)—were on the panel. They said so many interesting things I can't summarize it all here. But here are a few examples:

Zolotov said that one of the ways in which she used her physics training in her current work (which is consulting for government agencies and companies) is that she was (right from the outset) comfortable preparing and giving talks to smart, skeptical (even hostile) audiences. That's interesting! They all had interesting things to say about preparing to move out of physics; what homework should you do? The panelists were asked about writing, because Masjedi said that one of the proximal reasons for going from physics to finance was that he didn't love writing! Interestingly, Masjedi followed up by saying that he has spent his entire career without ever having to do a lot of writing. That surprises me! Berlind generalized this by commenting that the fundamental reason to leave academia is that you can find other jobs that are a better match to your interests and capabilities.

That's an interesting thought, in the context of academic freedom. As my loyal reader knows, I love and respect academic freedom immensely. But for all my freedom to work on what I want, the modes in which I work (publishing in scientific literatures, writing grant proposals) are very constrained; almost fixed. If you are an academic, you get academic freedom. But in the real world you get the freedom to pursue and find the position and career path that matches your interests. Definitely things to think about.

One take-away though is how damned successful they all are. Zolotov spoke about building a new 20-person group within a consulting company (prior to BCG), and advising people at the highest levels of government. Masjedi spoke of analyzing and buying bonds and bond-like securities for huge clients, including even the Federal Reserve (prior to being at Brevan Howard). Berlind came straight into a Program Director position at NSF, one of the highest ranks in the agency. My loyal reader also knows that, in my view, the most important thing about this job is the people. I'm really proud of them and everything they've done.


optimization is terrible

I had a great conversation this morning with Kiyan Tavangar (Chicago) and Adrian Price-Whelan (Flatiron) about finding a dynamical model for the GD-1 stellar stream and the (purported) compact mass perturber that distorted the stream. They are building a likelihood function and trying to optimize in the kinematic parameters of the perturber. But it's hard! Because every model involves a full forward model for the stream, and how do you compare a particle simulation to a set of observed stars? We've worked on problems like this for years, and we don't have extremely satisfying answers. But the question we talked about today is: Even once you have a good likelihood-function implementation: How to initialize the optimization? My position is that optimization needs to be principled, but initialization of that optimization can be arbitrarily nasty. We discussed nasty options. My experience is that optimization is the hardest part of many of the projects I've done in my life.


maximum-likelihood estimates are often optima of cross-correlations

I've been writing like mad in a paper about measuring radial velocities. The standard practice involves cross-correlations. The information-theoretic results suggest maximum-likelihood estimates. What gives? It turns out that, provided that your models are normalized in a certain way, maximizing a cross-correlation between data and a template can be equivalent to minimizing a chi-squared or maximizing a log likelihood. I wrote words about all this in my de-novo re-write of my paper with Bedell (Flatiron) on how you measure a radial velocity.
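The equivalence is just the expansion of chi-squared: ||y − t(v)||² = ||y||² − 2 y·t(v) + ||t(v)||², so when the template norm ||t(v)||² does not depend on v, minimizing chi-squared is the same as maximizing the cross-correlation y·t(v). A numerical toy check (one made-up line, placed well inside the wavelength range so its norm is shift-independent):

```python
import numpy as np

rng = np.random.default_rng(0)
C = 299792.458  # km/s
lam = np.linspace(5000., 5010., 1024)

def t(v):
    # one absorption line, well inside the range, so ||t(v)||^2 is
    # (very nearly) independent of the shift v
    return 1.0 - 0.4 * np.exp(-0.5 * ((lam - 5005. * (1. + v / C)) / 0.06) ** 2)

y = t(2.0) + 0.02 * rng.normal(size=lam.size)  # fake data at +2 km/s
vgrid = np.linspace(0., 4., 801)

# chi^2(v) = ||y||^2 - 2 y.t(v) + ||t(v)||^2, so with constant ||t(v)||^2
# the chi^2 minimum and the cross-correlation maximum coincide
chi2 = np.array([np.sum((y - t(v)) ** 2) for v in vgrid])
ccf = np.array([np.dot(y, t(v)) for v in vgrid])
v_chi2 = vgrid[np.argmin(chi2)]
v_ccf = vgrid[np.argmax(ccf)]
```

When the template norm does vary with v (edge effects again!), the two estimators disagree, which is exactly where the normalization subtleties bite.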


re-start a paper on EPRV

I promised Megan Bedell (Flatiron) that, in January, I would finish our paper on how you measure a radial velocity. I looked at this today. Once a draft of a paper is a few years (!) old, the only option is to re-start from scratch. There's no salvaging it. So I re-started from scratch today. Here's to finishing this, this month.


a phase diagram for sailboats

Matt Kleban (NYU) and I are finishing up a paper on the theoretical basis for sailing (yes sailing). One of our conclusions is that (large) sailboats are described by three dimensionless ratios: The sail-to-keel ratio, the ratio of the sail working force to the air drag force, and the ratio of the keel working force to the water drag force. We imagined today a phase diagram that shows the space of dimensionless ratios that permit, for example, upwind sailing. And sailing downwind faster than the wind. And sailing cross-wind faster than the wind. And so on.