epistemology and adversarial attacks

I am trying to prepare slides for a new talk (meaning: no recycled slides!) that I will be giving at MIT in the coming week. I am trying to boil down what's so troubling about machine learning in the natural sciences. I realized that what's so troubling (to me) is that the only standard of truth for a machine-learning model is how it performs on a finite set of held-out test data (usually drawn from the same distribution as the training data). That is nothing like the standard of truth for scientific models or theories! So, for example, in industrial machine-learning contexts, a successful adversarial attack against a method is seen as little more than a strange curiosity. But I think that a successful adversarial attack completely invalidates a machine-learning method used in (many contexts in) natural science: It shows that the model isn't doing what we expect it to be doing. I hope to be able to articulate all this before Tuesday.
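To make the point concrete, here is a toy sketch (entirely my own construction, with made-up data): a linear classifier that scores very well on held-out test data drawn from the training distribution, but whose predictions collapse under a small, targeted, gradient-sign-style perturbation of each input.

```python
import numpy as np

rng = np.random.default_rng(42)
n_features = 50
w_true = rng.normal(size=n_features)

# train and test sets from the same distribution; labels are the sign
# of a linear score
X_train = rng.normal(size=(1000, n_features))
y_train = np.sign(X_train @ w_true)
X_test = rng.normal(size=(100, n_features))
y_test = np.sign(X_test @ w_true)

# "learn" the classifier by least squares; held-out accuracy is high
w_hat = np.linalg.lstsq(X_train, y_train, rcond=None)[0]
accuracy = np.mean(np.sign(X_test @ w_hat) == y_test)

# adversarial step: nudge each test point against the classifier's
# gradient; accuracy collapses even though each coordinate moves by
# only a modest amount
eps = 0.3
X_adv = X_test - eps * y_test[:, None] * np.sign(w_hat)[None, :]
adv_accuracy = np.mean(np.sign(X_adv @ w_hat) == y_test)
```

By the held-out-test-data standard this model looks great; by any scientific standard the attack shows it is not measuring what we think it is measuring.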


moar comparison of radial-velocity spectrographs; I was wrong

I got all excited last weekend about how to compare spectrographs, and I got a beautiful result. And then Megan Bedell (Flatiron) pointed out that my result is in conflict with something we worked out a few years previously about the accuracy with which a radial velocity can be measured given only one spectral line. She is right! I know nothing. I'm confused. How can this be hard? I spent the late afternoon trying to get this all resolved.


is an eigenvector a vector?

I spent time today talking to Kate Storey-Fisher about features to use in her cosmological regression projects. The point is that we are only considering features that have well-defined, coordinate-free meanings, because we are trying to do regressions that are invariant to coordinate transformations. These features include scalars, vectors, and tensors, which we contract into scalars. But what can you do with a tensor? At order 2, a tensor has a trace (its self-contraction); it can be contracted with two vectors; it has eigenvalues and eigenvectors. The eigenvalues are classical scalars; good! But are the eigenvectors classical vectors? No, they aren't, because they don't have signs. What can you do with them? I have some theories...
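The sign ambiguity is easy to see numerically; here is a minimal numpy sketch (my own toy tensor, not anything from the project):

```python
import numpy as np

# a symmetric order-2 tensor (think of a toy inertia or tidal tensor)
T = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 0.5],
              [0.0, 0.5, 1.0]])
eigvals, eigvecs = np.linalg.eigh(T)
v, lam = eigvecs[:, 0], eigvals[0]

# both v and -v satisfy the eigenvector equation equally well, so the
# solver's choice of sign is arbitrary
assert np.allclose(T @ v, lam * v)
assert np.allclose(T @ (-v), lam * (-v))

# the outer product v v^T is the same object for either sign choice,
# which is one standard way to extract a sign-free quantity from v
assert np.allclose(np.outer(v, v), np.outer(-v, -v))
```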


latent-variable model for stellar ages

Trevor David (Flatiron) is looking at the estimation of stellar ages, maybe inspired by his beautiful paper from 2020. Today I met with him and Megan Bedell (Flatiron) to look at making age estimates out of observed surface abundances. We discussed the possibility that we could build a latent-variable model in which each star has a latent state, which generates the observed age indicators and abundances. We went through some conceptual math and left it to David to look at implementations. This fits into a long-term goal I have of unifying all age indicators!
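The generative structure we have in mind looks something like the following cartoon (all variable names, functional forms, and noise levels here are pure inventions for illustration): a latent state per star generates every observable, each with its own noise.

```python
import numpy as np

rng = np.random.default_rng(17)
n_stars = 500
age = rng.uniform(1.0, 10.0, size=n_stars)  # the latent state, in Gyr

# each observable is a noisy function of the latent age; the forms and
# noise levels below are made up for the sketch
rotation_period = 10.0 * np.sqrt(age) + rng.normal(0.0, 1.0, n_stars)
y_over_mg = 0.02 * age + rng.normal(0.0, 0.01, n_stars)
```

Inference then inverts this: given all the observed indicators for a star, infer its latent state, weighting each indicator by its noise model.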



how to compare radial-velocity spectrographs?

Yesterday, in Data Group Meeting at Flatiron, Lily Zhao (Flatiron) showed a formula for the effectiveness of a spectrograph for measuring a radial velocity that she was confused about. It appeared in a paper by someone who builds spectrographs. We discussed it and ended up a bit confused. So today I sat down and wrote a short document, which represents my Official Position (tm) on spectrograph radial-velocity precision. My loyal reader will not be surprised to learn that I think the right way to compare is by looking at radial-velocity information per unit time where “information” is inverse variance (Fisher information). In that context, the answer is that you get more information linearly with wavelength coverage and you get more information linearly with spectrograph resolution. It's that simple!

[Note added later: This post is WRONG. See here.]


Friday: Hamlet and Horatio reboot

Lily Zhao (Flatiron), Megan Bedell (Flatiron), and I spoke about our linear regression methods for EPRV today. Our methods are called Hamlet and Horatio for reasons we have trouble articulating. But the idea is that if the confusing spectral variability in EPRV comes from surface features rotating in and out of view on the stellar surface, that should impart structure to the data that we can grab and exploit. We stepped back and asked what are the real, underlying assumptions of these methods. We didn't come to extremely clear positions, I am afraid. The future of this field, IMHO, is Doppler imaging spectroscopy of the entire rotating, evolving stellar surface!


feature engineering for dark-matter halos

Kate Storey-Fisher and I spoke about adding eigenvalues (scalars) and eigenvectors (vectors) to our geometric features of dark-matter halos. For regression! But the problem with this is that the eigenvectors have ambiguous sign; is that an issue? Yes! They are descriptions of an order-2 tensor, not order-1 vectors. Hmmm. We also spoke about whether the feature engineering should be deterministic, or chosen by hand.


infrared excesses for planet-hosting stars

Gaby Contardo (Flatiron) and I went to the Gaia EDR3 Archive to make use of its matched catalogs, matching up Gaia, 2MASS, and WISE. We are looking at very short-period planet hosts, which might show interesting photometric deviations. We took one host star, and then found many other stars with similar photometry in the visible. Do they agree in the infrared? It looks like maybe there is a tiny discrepancy? But the power will come from doing many, not just one.


dimensional scalings improve predictions

As my loyal reader knows, I have been working on the possibility that machine-learning or regression methods could be sensitive to dimensions or units and thus get better generalization and so forth. Today we had a success! Weichi Yao (NYU) has been converting the model in this paper on geometric methods to a model that respects dimensional scalings (or dimensional symmetry, or units equivariance). It hasn't been doing better than non-dimensional-symmetric versions, in part (we think) because the dimensionless invariants we produce have worse conditioning properties than the raw labels. But Soledad Villar (JHU) had a good idea: Let's test on data that are outside the distribution used to generate the training set! That worked beautifully today, and we got far better predictions for the out-of-sample test points than the competing methods. Why? Because the dimensional scalings guarantee certain aspects of generalization.
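A toy version of the phenomenon (my own construction, not the actual model in the paper): predict a pendulum period T = 2π√(L/g). A naive linear regression on the raw, dimensionful features fits the training range but breaks out of distribution; a model built on the dimensionless combination T√(g/L) generalizes by construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def period(L, g):
    return 2.0 * np.pi * np.sqrt(L / g)

# training data in one regime (lab-scale pendulums, Earth-like gravity)
L_tr = rng.uniform(0.5, 2.0, 300)
g_tr = rng.uniform(9.0, 10.0, 300)
T_tr = period(L_tr, g_tr)

# naive model: linear regression on the raw, dimensionful features
A = np.stack([L_tr, g_tr, np.ones_like(L_tr)], axis=1)
coef = np.linalg.lstsq(A, T_tr, rcond=None)[0]

# units-aware model: learn the one dimensionless constant T sqrt(g/L),
# which should come out to 2 pi
c_hat = np.mean(T_tr * np.sqrt(g_tr / L_tr))

# out-of-distribution test point: a much longer pendulum on the Moon
L_te, g_te = 20.0, 1.6
T_true = period(L_te, g_te)
T_naive = coef @ np.array([L_te, g_te, 1.0])
T_units = c_hat * np.sqrt(L_te / g_te)
```

The units-aware prediction is essentially exact far outside the training distribution, because the dimensional scaling is built in rather than learned.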


large-scale-structure sandbox

Kate Storey-Fisher (NYU), Abby Williams (NYU), and I spent a session today working on Williams's project to measure the statistical homogeneity of the large-scale structure in the Universe. We realized that to finish this project it would help very much to have a testbed in which we have large-scale structure data, a selection function, and interfaces that permit measurement of clustering, void statistics, and so on. We don't have that! Does anyone? If not, let's build it. Ideally with either real SDSS BOSS data or else simulated DESI data.


variable-rate Poisson process

At lunch today, Adrian Price-Whelan (Flatiron) challenged me to explain the form of the variable-rate Poisson process likelihood function. I waved my hands! But I think the argument goes like this (don't quote me on this!): You boxelize your space (time or space or whatever you are working in) in small-enough boxels that every boxel contains exactly one or zero data points. Then you take the limit as the boxel sizes go to zero (and become infinitely numerous). The occupied boxels deliver a sum (in the log likelihood) of log rates at the locations of the observed data. The unoccupied boxels deliver minus the integral of the rate over all of space. Something like that?
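The limit can be checked numerically. Here is a sketch (my own construction, with a made-up rate function and made-up data locations); the one bookkeeping subtlety is that each occupied boxel's probability carries a factor of the boxel width, so you subtract N log(width) to convert the product of boxel probabilities into a density at the data locations.

```python
import numpy as np

def lam(x):
    # made-up rate function on the interval [0, 10]
    return 3.0 + 2.0 * np.sin(x)

data = np.array([0.7, 2.2, 2.3, 5.1, 8.8])  # made-up data locations

# the usual form: sum of log rates minus the integral of the rate
integral = 32.0 - 2.0 * np.cos(10.0)  # analytic integral of lam over [0, 10]
logL_direct = np.sum(np.log(lam(data))) - integral

# the boxelized form: many tiny boxels, each holding zero or one point
n_box = 1_000_000
delta = 10.0 / n_box
edges = np.linspace(0.0, 10.0, n_box + 1)
counts, _ = np.histogram(data, bins=edges)
centers = 0.5 * (edges[:-1] + edges[1:])
mu = lam(centers) * delta  # expected count in each boxel

# Poisson log-probability per boxel; log(k!) = 0 for k in {0, 1}
logL_box = np.sum(counts * np.log(mu) - mu)
# subtract N log(delta) to turn boxel probabilities into a density
logL_boxel = logL_box - len(data) * np.log(delta)
```

With a million boxels the two forms agree to high precision, which is the limit in the hand-waving argument.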


fitting, or fitting residuals?

Today Storey-Fisher (NYU) and I decided to modify her regression model for understanding galaxies in dark-matter halos from a regression in which we try to predict the properties of the galaxies directly, to a regression in which we try to predict the residuals away from a standard-practice smooth fit based on halo mass. The issues are complex here! But we want to look at simple models, and simple models can't capture the zeroth-order effects. My expectation (to be tested) is that the residuals are better fit by simple models than the zeroth-order trend. Why do I expect this? In my mind it has something to do with linearization.
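A toy illustration of the expectation (mine, with invented data, not the actual project): the galaxy property is dominated by a nonlinear trend with halo mass plus a small linear dependence on a second feature. A simple linear model explains almost none of the raw property, but explains most of the residuals away from a smooth mass-only fit.

```python
import numpy as np

rng = np.random.default_rng(3)
mass = rng.uniform(1.0, 10.0, 1000)   # stand-in for halo mass
conc = rng.normal(0.0, 1.0, 1000)     # stand-in secondary feature
y = mass ** 1.5 + 0.3 * conc          # nonlinear trend + small effect

def r2(x, target):
    """Fraction of variance in target explained by a linear fit on x."""
    pred = np.polyval(np.polyfit(x, target, 1), x)
    return 1.0 - np.var(target - pred) / np.var(target)

# a simple linear model on the raw property captures almost nothing
r2_direct = r2(conc, y)

# standard-practice smooth fit on mass alone (here a cubic polynomial),
# then a simple linear model on the residuals
trend = np.poly1d(np.polyfit(mass, y, 3))
resid = y - trend(mass)
slope = np.polyfit(conc, resid, 1)[0]  # recovers the 0.3 coefficient
r2_resid = r2(conc, resid)
```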


is it possible to know how wrong your calibration is?

I had a great meeting with Lily Zhao (Flatiron) and Megan Bedell (Flatiron) to discuss whether we could write a paper on the term in the error budget in exoplanet radial-velocity projects that comes from wavelength calibration. We have an approach for understanding the impact on radial-velocity measurements from wavelength issues—both biases in the wavelength solution and variances—as a function of the wavelength scale over which those issues are coherent. That's good! But such analyses are of limited use if it is impossible for a project to determine, empirically, how well it is doing in getting those wavelength solutions correct. That is, how do you measure the bias and variance of your wavelength solution, and the covariances across wavelength? That's a hard problem. We discussed approaches that involve calibrating calibration images.


weak-lensing inconsistency

Today Alexie Leauthaud (UCSC) gave the NYU Astro Seminar, about various things related to cosmological tests with weak lensing. She showed an impressive result, which is that essentially all galaxy–galaxy lensing projects find a weak-lensing signal that is too low by tens of percent relative to what we expect from the Planck cosmological parameters and simple galaxy–halo occupation models. I am interested in looking into this more with Storey-Fisher and her (new, exploratory) models of galaxy occupation in hydro simulations. I have an intuition that the predictions might be overly naive if halo occupation is slightly more complex than expected. I am particularly interested in this issue because I think the galaxy–galaxy weak-lensing signal has been a very fundamental test of our picture of the dark sector.


why do we still use magnitudes?

As my loyal reader knows, I am writing a piece on magnitudes, distance moduli, color indices, color excesses, bolometric corrections, and so on. Today I sent a copy of the draft to my excellent colleague Mike Blanton (NYU) for comments. He came back with lots! One of the main points he made is an elucidation of why (in his opinion) it still makes sense to use magnitudes, even in this day and age: The physical quantities of interest about a (say) star are its bolometric or total luminosity, and its detailed (high-resolution) spectrum. The thing we can observe, photometrically, is neither of these. So we integrate photons in a bandpass. The magnitude system is a way of summarizing the choices we make when we do that, and it reminds the theorist or interpreter that our measurements are complex integrals of the things of interest. That's a good argument. Prior to this feedback, my answer to the “why?” question was about precise relative measurement, which isn't so relevant in many contemporary contexts.


figures for a paper

I'm trying to finalize the figures for my paper with Megan Bedell (Flatiron) about measuring radial velocities precisely. I have so many considerations for figures, and they get a bit challenging: Figures should be readable, unambiguously on a black-and-white printer or display. Figures should have aspect ratios such that they can be cut easily into presentation slides. Figures should have large text such that they are readable at a glance. Ink on figures should be used in proportion to the importance of the point being made (more ink on more important data, for example). Data should always be dark and black-ish, models can be light and colored. Figures should be readable by people with (at least) the most common form of color-blindness. Lines and points should be distinguished not just by hue but also by value. The same quantities on different plots should be plotted in the same point or line style and the same color, with the same label and same range. And so on!
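Some of these rules can be baked into defaults rather than enforced by hand. Here is one way to do that in matplotlib (my own sketch, not the paper's actual plotting setup; the numbers are illustrative choices):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

plt.rcParams.update({
    "font.size": 14.0,             # large text, readable at a glance
    "figure.figsize": (8.0, 4.5),  # wide aspect that cuts into slides
    "lines.linewidth": 2.0,
})

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 200)
y = np.sin(x) + 0.1 * rng.normal(size=x.size)

fig, ax = plt.subplots()
# data get the most ink: dark, black-ish points
ax.plot(x, y, "k.", label="data")
# the model is lighter and colored, so it differs in value, not just hue
ax.plot(x, np.sin(x), color="tab:orange", alpha=0.8, label="model")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
fig.savefig("figure-sketch.png", dpi=150)
```

A shared rcParams block like this also enforces the "same quantities, same styles" rule across every figure in the paper.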


how to make cosmological mocks cheaper

I had one of my two weeklies with Kate Storey-Fisher (NYU) today. As usual, we spent a lot of our time on the big picture. Why are we trying to find translations between hydrodynamical simulations and dark-matter-only simulations? It's because dark-matter-only sims are way cheaper. So we can do lots and lots of those, and then a few hydro sims, and then make translators. Or can we? That's what we're trying to figure out now. We're making use of ideas from our projects on geometric machine learning.


aluminium abundances

A very good conversation broke out in our weekly Gaia DR3 & SDSS-V prep meeting today, about aluminium abundances in stellar photospheres, which are the key tool in a new paper about the Milky Way being drafted by Vasily Belokurov (Cambridge). Keith Hawkins (Texas) is in town, and he also happens to be the discoverer of very interesting relationships among Fe, Al, and Mg in low-metallicity stars. The ratio [Al/Fe] increases with [Fe/H] at low metallicities and decreases at high metallicities. That has something to do with the different timescales for different kinds of supernovae and different rates of star formation. This all might explain why Christina Eilers (MIT) and I are finding weird issues when we try to fit [Al/Fe] as a function of stellar evolutionary state and dynamical actions in the Galaxy.


convert an L1 minimization into a linear program

Okay, linear programs are amazing things! And okay, L1 minimization and related sparse methods are magical. But can these be related? I always thought “no!” After all, the L1 norm involves an absolute value operator, and that isn't linear; linear programs are linear (it's in the name!). But today Soledad Villar (JHU) blew me away with the following (retrospectively simple) observation: If you augment any variable whose absolute value you want to minimize (that variable could be a parameter or a residual) with an auxiliary variable that is constrained to be greater than both the original variable and also the variable multiplied by negative one, then you can set up linear programs such that the auxiliary variables become the absolute values of their associated variables. And the optimization performed by the linear program becomes an L1 optimization. It's beautiful, we coded it up today in a few different contexts, and it worked!
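The trick, coded up as a least-absolute-deviations (L1) regression with scipy's linear-program solver (my re-derivation of the construction on made-up data, not our actual code): for each residual r_i = y_i − x_i·β, introduce an auxiliary t_i with t_i ≥ r_i and t_i ≥ −r_i; minimizing Σ t_i then forces t_i = |r_i| at the optimum.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(8)
n, p = 60, 2
X = np.column_stack([np.ones(n), rng.uniform(-1.0, 1.0, n)])
beta_true = np.array([0.5, 2.0])
y = X @ beta_true
y[::10] += 5.0  # gross outliers, which an L1 fit shrugs off

# variables are [beta (p entries, free), t (n entries, nonnegative)]
eye = np.eye(n)
c = np.concatenate([np.zeros(p), np.ones(n)])  # minimize sum of t
A_ub = np.block([[-X, -eye],                   #   y - X beta <= t
                 [X, -eye]])                   # -(y - X beta) <= t
b_ub = np.concatenate([-y, y])
res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * p + [(0, None)] * n)
beta_hat = res.x[:p]
t_hat = res.x[p:]
```

At the solution, the auxiliary variables equal the absolute residuals, and the fit recovers the true coefficients despite the outliers.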