O-minus-C inanity

In the exoplanet (and, before that, eclipsing-binary) communities, transit-timing variations are described in terms of a quantity called O−C (pronounced “oh minus sea”), which is the difference between the observed transit time and the “computed” transit time. Right now, Abby Shaum (CUNY) and I are using this terminology in our manuscript about phase variations in coherent pulsators with companions, at the behest of Keaton Bell (CUNY). Okay fine! But O−C has this terrible property, which is that the C part depends on the period or frequency you assume. You can completely change the appearance or morphology of an O−C plot just by slightly tweaking the period. And there is no true period of course! There is just whatever estimates you can make. Which are, in turn, affected by what you use to model the O−C. So it is absolutely awful in every way. Not a stable observable, people! Not even identifiable.
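To make the period dependence concrete, here is a minimal sketch. The function name `o_minus_c`, the epoch `t0`, and all the numbers are made up for illustration; the events are perfectly periodic, so any structure in O−C comes purely from the assumed period.

```python
def o_minus_c(observed_times, t0, period):
    """Return O-C for each observed time, given an assumed linear ephemeris.

    The "computed" time is the nearest predicted event t0 + n * period,
    with the cycle number n chosen by rounding.
    """
    residuals = []
    for t in observed_times:
        n = round((t - t0) / period)          # cycle number of the nearest event
        residuals.append(t - (t0 + n * period))
    return residuals


# Hypothetical, perfectly periodic events: no timing variations at all.
true_period = 3.0
observed = [true_period * n for n in range(0, 101, 10)]

flat = o_minus_c(observed, t0=0.0, period=true_period)
tilted = o_minus_c(observed, t0=0.0, period=true_period + 1e-3)

print(flat[-1])    # zero, because the assumed period matches
print(tilted[-1])  # close to -0.1: a linear trend of -n times the period error
```

A period error of one part in 3000 puts a ramp of 0.1 time units across 100 cycles, which could easily swamp a real signal of similar size.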


making linear algebra faster in practice

The key thing to make your code run faster is to avoid building large linear-algebra objects. For example, if you need to get the matrix product A.x, where A is a huge matrix and x is a long vector, and you only ever use the matrix A to do this one multiply by x, there is no reason to actually create A. Just create a function that evaluates A.x for any input x. That should be way faster, because of less memory allocation, and because you don't have to make the parts of the matrix that are all zeros (for example). Matt Daunt (NYU) and I discussed all this at length today, as we profiled code. This comment has some overlap with Section 9 of this paper.
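As a hedged illustration of the idea (not the code we profiled), suppose A happens to be a diagonal matrix plus a rank-one term. That structure, and the names `apply_A`, `d`, `u`, `v`, are assumptions made up for this sketch. The product can then be evaluated without ever allocating the n-by-n array:

```python
import numpy as np


def apply_A(x, d, u, v):
    """Evaluate A @ x for A = diag(d) + outer(u, v) without forming A."""
    return d * x + u * (v @ x)


rng = np.random.default_rng(17)
n = 500
d, u, v, x = (rng.normal(size=n) for _ in range(4))

fast = apply_A(x, d, u, v)                    # O(n) memory and time
slow = (np.diag(d) + np.outer(u, v)) @ x      # O(n^2) memory and time

print(np.allclose(fast, slow))
```

If an iterative solver needs to call such a function, `scipy.sparse.linalg.LinearOperator` can wrap it through its `matvec` argument, so the solver never sees a dense matrix either.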


high-order integration schemes

I was working on a white paper on ocean dynamics today and I threw in a sentence about how emulators (like machine-learning replacements for simulations) might be working because they might be effectively learning a high-order integration method. I then threw in a sentence about how, in many applications, high-order integrators are known to be better than low-order integrators. I then went to find a reference and... well, I am not sure I can back that up with a reference! I thought this was common knowledge, but it looks like almost all simulations and integrations are done with low-order integrators. Am I living in a simulation? (A simulation integrated with wimpy first-order integrators?)
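For what it is worth, the textbook version of the claim is easy to check on a toy problem. The sketch below compares forward Euler (first order) with classical Runge-Kutta (fourth order) on dy/dt = -y, giving both the same budget of 400 right-hand-side evaluations. The test problem and step counts are arbitrary choices, and a smooth scalar ODE says nothing about stiff or discontinuous simulations.

```python
import math


def euler_step(f, t, y, h):
    """One forward-Euler step (first order, one evaluation of f)."""
    return y + h * f(t, y)


def rk4_step(f, t, y, h):
    """One classical Runge-Kutta step (fourth order, four evaluations of f)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def integrate(step, f, y0, t_end, n_steps):
    """Integrate from t = 0 to t_end with a fixed step size."""
    h = t_end / n_steps
    t, y = 0.0, y0
    for _ in range(n_steps):
        y = step(f, t, y, h)
        t += h
    return y


def decay(t, y):
    return -y


exact = math.exp(-1.0)
euler_error = abs(integrate(euler_step, decay, 1.0, 1.0, 400) - exact)
rk4_error = abs(integrate(rk4_step, decay, 1.0, 1.0, 100) - exact)

print(euler_error, rk4_error)
```

At equal cost the fourth-order error comes out many orders of magnitude smaller here, which is the intuition behind the sentence in the white paper, even if it is not a citation.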


an alternative to the L–S periodogram

Following some experiments and rants over the last few days with Nora Eisner (Flatiron), I wrote down today an algorithm for a hacky replacement of the Lomb–Scargle periodogram. This periodogram method has various bad pathologies, the worst of which is that it presumes that there is exactly one frequency that fully generates the data. If there are two, the assumptions are broken and the good properties are lost.

Not that my alternative has any good properties! It is like the radio interferometry method called CLEAN: It involves iteratively identifying frequencies and fitting them out. It's terrible. But it might be better than some of the wacky hacks that people do right now in the asteroseismology community.
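The algorithm itself is not written out here, so the following is only a guess at the general shape of a CLEAN-like scheme: at each trial frequency, fit a sinusoid by least squares; keep the frequency that removes the most variance; subtract that fit; repeat. The function names, the frequency grid, and the fake data are all assumptions for illustration.

```python
import numpy as np


def sinusoid_model(t, y, freq):
    """Least-squares fit of a*cos + b*sin at one frequency; return the fitted model."""
    X = np.column_stack([np.cos(2 * np.pi * freq * t),
                         np.sin(2 * np.pi * freq * t)])
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coeffs


def peel_frequencies(t, y, freq_grid, n_components):
    """Iteratively find the best-fitting frequency and fit it out of the residual."""
    residual = y - np.mean(y)
    found = []
    for _ in range(n_components):
        rss = [np.sum((residual - sinusoid_model(t, residual, f)) ** 2)
               for f in freq_grid]
        best = float(freq_grid[int(np.argmin(rss))])
        residual = residual - sinusoid_model(t, residual, best)
        found.append(best)
    return found, residual


# Fake, irregularly sampled data with two frequencies (0.31 and 0.47) plus noise.
rng = np.random.default_rng(8)
t = np.sort(rng.uniform(0.0, 100.0, size=300))
y = (1.0 * np.sin(2 * np.pi * 0.31 * t)
     + 0.5 * np.cos(2 * np.pi * 0.47 * t)
     + 0.1 * rng.normal(size=t.size))

freq_grid = np.arange(0.01, 1.0, 0.001)
found, residual = peel_frequencies(t, y, freq_grid, n_components=2)
print(found)
```

A less hacky variant would refit all the identified frequencies simultaneously after each new detection, since fitting them one at a time leaves each fit contaminated by the ones not yet removed.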


classification to save labor

I spent part of the day discussing with Valentina Tardugno (NYU) and Nora Eisner (Flatiron) the goals of a machine-learning classification that Tardugno is creating to help the PlanetFinders project. The deal is: Citizen scientists find candidate planets and (currently) a human (Eisner) has to vet them, to remove contamination by various sources of false positives. This turns out to be a hard problem! When problems are hard, it becomes critical to very precisely specify what you are trying to achieve. So we spent time discussing what, exactly, it is that Eisner needs from a classifier. Is it to find good planets? Is it to remove obvious contaminants? Are some contaminants more problematic than others? Is it to save her hours of wall-clock time? Etc.


Phi-M radio

I worked today with Abby Shaum (CUNY) on her paper about her phase-demodulator to find exoplanet and substellar companions to stars by the timing of asteroseismic modes. I suggested that we highlight the incredible simplicity of her project by writing the method as an algorithm of just a few lines.
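The paper's algorithm is not reproduced in this post, so what follows is a generic phase demodulation, not necessarily Shaum's method: multiply the light curve by a complex exponential at the mode frequency, average in chunks as a crude low-pass filter, and read the time delay off the phase. The function name `demodulate_phase`, the chunking, and the synthetic signal are all assumptions.

```python
import numpy as np


def demodulate_phase(t, y, carrier_freq, chunk_size):
    """Estimate a slowly varying time delay of a known carrier, chunk by chunk."""
    # Shift the carrier to zero frequency.
    z = y * np.exp(-2j * np.pi * carrier_freq * t)
    n_chunks = len(t) // chunk_size
    z = z[:n_chunks * chunk_size].reshape(n_chunks, chunk_size)
    tc = t[:n_chunks * chunk_size].reshape(n_chunks, chunk_size)
    # Averaging each chunk suppresses the term at twice the carrier frequency.
    phase = np.angle(z.mean(axis=1))
    delay = -phase / (2 * np.pi * carrier_freq)
    return tc.mean(axis=1), delay


def true_delay(t):
    """Hypothetical light-travel-time delay from a companion (orbital period 200)."""
    return 0.05 * np.sin(2 * np.pi * t / 200.0)


# Synthetic coherent pulsator: carrier frequency 1, evenly sampled.
t = np.arange(40000) * 0.01
y = np.cos(2 * np.pi * 1.0 * (t - true_delay(t)))

centers, delay = demodulate_phase(t, y, carrier_freq=1.0, chunk_size=1000)
print(np.max(np.abs(delay - true_delay(centers))))
```

The chunk length here spans a whole number of carrier cycles, which is what makes the plain average work so cleanly; with real, gappy data a proper filter or a weighted fit would be needed.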


CZS Summer School, day 5: diffusion

Diffusion models are all the rage in machine learning these days. Today Laurence Levasseur (Montréal) gave a beautiful talk at the CZS Summer School about how diffusion works. She started with a long physics introduction, which was great, and also insightful, about how diffusion works in small physical systems. Then she showed how it can be turned into a method for sampling very difficult probability distributions.

I have a history of working on MCMC methods. These permit you to sample a posterior pdf when you only know a function f that is related to your posterior pdf by some unknown normalization constant. Similarly, diffusion lets you sample from a pdf when you only know the gradient of log f (the score). Again, you don't need the normalization, because a constant factor in f drops out of the gradient of the log. That makes me wonder: Should we be using diffusion in places where we currently use MCMC? I bet the answer is yes, for at least some problems.
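A toy side-by-side makes the parallel explicit. The sketch below samples the same unnormalized Gaussian two ways: random-walk Metropolis, which only ever evaluates log f, and unadjusted Langevin dynamics, which only ever evaluates the gradient (strictly, the gradient of log f). Langevin dynamics is a much simpler relative of the diffusion samplers from the talk, not the method presented there, and the target, step sizes, and function names are assumptions.

```python
import math
import random


def metropolis(log_f, x0, n_steps, step, rng):
    """Random-walk Metropolis: needs log f only up to an additive constant."""
    x, samples = x0, []
    for _ in range(n_steps):
        proposal = x + step * rng.gauss(0.0, 1.0)
        delta = log_f(proposal) - log_f(x)
        if rng.random() < math.exp(min(0.0, delta)):
            x = proposal
        samples.append(x)
    return samples


def langevin(grad_log_f, x0, n_steps, eps, rng):
    """Unadjusted Langevin dynamics: needs only the gradient of log f."""
    x, samples = x0, []
    for _ in range(n_steps):
        x = x + eps * grad_log_f(x) + math.sqrt(2.0 * eps) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


def log_f(x):
    """Unnormalized log density: unit-variance Gaussian centered at 2."""
    return -0.5 * (x - 2.0) ** 2


def grad_log_f(x):
    return -(x - 2.0)


rng = random.Random(42)
mcmc_samples = metropolis(log_f, 0.0, 20000, 1.0, rng)
langevin_samples = langevin(grad_log_f, 0.0, 20000, 0.1, rng)

mcmc_mean = sum(mcmc_samples) / len(mcmc_samples)
langevin_mean = sum(langevin_samples) / len(langevin_samples)
print(mcmc_mean, langevin_mean)
```

Without an accept/reject step the Langevin chain is slightly biased at finite step size (its variance comes out a few percent too large here), which is one of the trade-offs to weigh before swapping it in for MCMC.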


CZS Summer School, day 4: GNNs

Today Andreea Deac (Montréal) gave a talk at the CZS Summer School about graph neural networks, and enforcing exact symmetries. It was a great talk, because it was useful to the students and filled with insights even for the experienced machine-learners. She did a great job of connecting GNNs to other methods in use in ML, including convolutional neural networks, and Deep Sets.


CZS Summer School, day 3: Deep Sets

Today was day 3 of the CZS Summer School, in which I am helping mentor a group of students working on equivariant methods on point clouds. In our working session today, Soledad Villar (JHU) (who is the main mentor for this group of students) gave a short, spontaneous explanation of the main Deep Sets result in machine learning: (Almost) any permutation-invariant function of a set of objects x_i can be written in an amazingly simple form: h(Σ_i g(x_i)), where h and g are potentially nonlinear functions. That result is super-strong, and super-useful!
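A tiny hand-built instance shows the form at work. In Deep Sets proper, g and h are learned networks; in this sketch they are chosen by hand (an assumption for illustration) so that the sum-pooled function is the population variance, which is obviously permutation invariant.

```python
import math


def g(x):
    """Per-element feature map: here (1, x, x^2)."""
    return (1.0, x, x * x)


def h(pooled):
    """Function of the summed features: here the population variance."""
    n, s1, s2 = pooled
    return s2 / n - (s1 / n) ** 2


def deep_set(xs):
    """Evaluate h(sum_i g(x_i)); the sum makes the result order independent."""
    pooled = [sum(column) for column in zip(*(g(x) for x in xs))]
    return h(pooled)


xs = [1.0, 2.0, 4.0, 8.0]
shuffled = [8.0, 1.0, 4.0, 2.0]

print(deep_set(xs))
print(math.isclose(deep_set(xs), deep_set(shuffled)))
```

All the dependence on the individual elements goes through g, and all the interaction between elements goes through the sum, which is why the order cannot matter.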


other kinds of machine learning

Astronomy is very focused on machine learning in the sense of regression and classification, but machine learning can do many other things. In addition, machine learning is a sub-field of machine intelligence, which is broader. I started today working on a proposal for the NSF (to be written with Mike Blanton, NYU) in which we propose using other kinds of machine learning and machine intelligence, and apply them earlier in the scientific process (like at operations and calibration) instead of at the end (like at source classification and labeling).


spherical harmonics for tensor fields

I have been kicking around the generalization of spherical harmonics to vector spherical harmonics, and how that might generalize further to tensor spherical harmonics of all tensor orders and all parities. I think I got it today! For every spherical harmonic (ell and em), there are three vector spherical harmonics, obtained by multiplying by the radial vector, taking the transverse gradient, and taking the transverse gradient and crossing it into the radial direction. I think these can be generalized (using, say, the Ricci calculus) to make the 2-tensors and so on. If I am right, this is a new way to represent tensor fields on the sphere. Use cases: Cosmic backgrounds, and ocean dynamics.
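In symbols, the three vector fields described above can be written as follows. This is one common convention; normalizations, the order of the cross product (and hence the sign), and the symbols Ψ and Φ vary between references, so treat the notation as an assumption rather than the notation of this project.

```latex
\begin{align}
\mathbf{Y}_{\ell m}(\hat{r}) &= Y_{\ell m}(\hat{r})\,\hat{r} \\
\boldsymbol{\Psi}_{\ell m}(\hat{r}) &= r\,\nabla Y_{\ell m}(\hat{r}) \\
\boldsymbol{\Phi}_{\ell m}(\hat{r}) &= \mathbf{r} \times \nabla Y_{\ell m}(\hat{r})
\end{align}
```

Because the scalar harmonic depends only on angle, its gradient has no radial component, so the second and third fields are purely transverse. For ell = 0 the harmonic is constant, so those two vanish and only the radial field survives.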


five kinds of emulators

I wrote today in a draft grant proposal about machine-learning emulators, describing five different kinds. Yes, I think there are five qualitatively distinct kinds. Here they are:

Full replacement
The most extreme—and most standard—kind of emulator is one that simply replaces the full input–output relationship of the entire simulation. Thus if the simulation starts with initial conditions and boundary conditions, and ends with a final state (after an integration), the full-replacement emulator would be trained to learn the full relationship between the initial and boundary conditions and the final state. A full-replacement emulator is a complete, plug-in replacement for the simulator.
Integration accelerator
Simulation run times generally scale linearly with the number of time steps required to execute the integration. A set of emulators can be trained on a set of snapshots of the simulation internal state at a set of times that is much smaller than the full set of integration time steps. Each emulator is trained to learn the relationship between the internal state of the simulation at one time t_A and the internal state of the simulation at a later time t_B, such that the emulator can be used to replace the integrator during the time interval from t_A to t_B. A set of such emulators can be used to replace part or all of the integration performed by the simulator.
Resolution translator
Simulation run times generally scale with the number of grid points or basis functions in the representations of the state. Thus the simulator gets faster as resolution is reduced. An emulator can be trained to learn the relationship between a low-resolution simulation and a matched high-resolution simulation. Then a high-resolution simulation can be emulated by running a fast low-resolution simulation and applying the learned translation.
Physics in-painter
In most physical systems, there are coupled physics domains with different levels of computational complexity. For example, in cosmology, the pure gravitational part of the simulation is relatively low in computational cost, but the baryonic part—the atoms, photons, ram pressures, magnetic fields—is very high in computational cost. The simulator gets faster as physics domains, or equations, or interaction terms, are dropped. An emulator can be trained to learn the relationship between a simulation with some physics dropped and a matched full simulation. Then a full-physics simulation can be emulated by running a partial-physics simulation and applying the learned in-painting of the missing physics.
Statistics generator
In many contexts, the goal of the simulation is not to produce the full state of the physical system, but only certain critical statistics, such as the two-point correlation function (in the case of some cosmology problems). In this case, there is no need to emulate the entire simulation state. Instead, it makes sense to train the emulator to learn only the relationship between the initial and boundary conditions of the simulation and the final statistics of particular interest.
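As a cartoon of the kind of emulator that replaces the integrator between a time t_A and a time t_B, here is a sketch in which the "simulation" is a damped oscillator advanced with many small steps and the "emulator" is a linear map fit by least squares. Everything here is a toy assumption: real emulators are typically nonlinear networks, and this one is exact only because the toy dynamics are linear.

```python
import numpy as np


def simulate(state, n_steps=1000, dt=1e-3):
    """Toy 'expensive' simulator: damped oscillator advanced by many small steps."""
    x, v = state
    for _ in range(n_steps):
        a = -x - 0.1 * v
        x, v = x + dt * v, v + dt * a
    return np.array([x, v])


rng = np.random.default_rng(0)
train_in = rng.normal(size=(20, 2))                      # states at time t_A
train_out = np.array([simulate(s) for s in train_in])    # states at time t_B

# "Training": least-squares fit of a linear map from state(t_A) to state(t_B).
W, *_ = np.linalg.lstsq(train_in, train_out, rcond=None)


def emulate(state):
    """One matrix multiply replaces the 1000 integration steps."""
    return state @ W


test_state = np.array([0.3, -1.2])
print(np.allclose(emulate(test_state), simulate(test_state)))
```

The other kinds differ mainly in what goes in and what comes out: a low-resolution run in and a high-resolution run out, a partial-physics run in and a full-physics run out, or initial conditions in and summary statistics out.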