friday: NeurIPS submission

In a heroic final push, Soledad Villar (JHU) finished our paper for NeurIPS submission today. We showed that you can make gauge-invariant neural networks without using the irreducible representations of group theory, or any other heavy computational machinery, at least for large classes of groups. Indeed, this works for all the groups that appear in classical physics (orthogonal group, rotations, Euclidean, Lorentz, Poincaré, permutation). Our contribution is pure math! It is only about machine learning inasmuch as it suggests future methods and simplifications. We will post it to arXiv next week.



My only research today was conversations with Gaby Contardo about the scope and experiments of our paper on methods to automatically discover and characterize gaps in point-cloud data.


what is permutation symmetry?

I spent a lot of time today trying to write down, very specifically, what it means for a function to be invariant with respect to permutation of its input arguments. It turns out that this is hard! Especially when the function is a vector function of vector inputs. This is all related to our nascent NeurIPS submission. This symmetry, by the way, is the symmetry enforced by graph neural networks. But it is also a symmetry of all of classical physics (if, say, the vectors are the properties of particles).
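
A minimal numerical sketch of the two notions in play, invariance and equivariance under permutation of the inputs. The functions here are toy choices made up for illustration (they are not from the paper); the inputs are the rows of an array X holding n vectors in d dimensions.

```python
import numpy as np

def invariant_sum(X):
    """Permutation-invariant vector function: sum a per-row feature over the n inputs."""
    return np.sum(np.tanh(X), axis=0)

def equivariant_update(X):
    """Permutation-equivariant function: each row is updated from itself and the mean row."""
    return X + np.tanh(X.mean(axis=0, keepdims=True) - X)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))   # n = 5 input vectors, each in d = 3
perm = rng.permutation(5)     # a random reordering of the inputs

# Invariance: reordering the inputs leaves the output unchanged.
assert np.allclose(invariant_sum(X[perm]), invariant_sum(X))
# Equivariance: reordering the inputs reorders the outputs in the same way.
assert np.allclose(equivariant_update(X[perm]), equivariant_update(X)[perm])
```

The subtlety for vector functions of vector inputs is visible here: the invariant function returns one vector no matter how the inputs are ordered, while the equivariant function returns one vector per input, and those outputs must follow the inputs around.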


astrology: Yes, it's true

Today Paula Seraphim (NYU) and I extended our off-kilter research on the possibility that we live in a simulation to off-kilter research on whether astrology has some basis in empirical fact. It does! There are birth-season correlations with many things. The issue with astrology, oddly, is not the data! It is with the theory that it is all related to planets and constellations. And if you think about the causes of birth-season effects on personality and capability, most of them (but not all of them) would have been much stronger 2000 years ago than they are today!


linear subspaces in special relativity

Who knew that my love of special relativity would collide with my love of data analysis? In the ongoing conversation between Soledad Villar (JHU), Ben Blum-Smith (NYU), and myself about writing down universally approximating functions that are equivariant with respect to fundamental physics symmetries, a problem came up related to the orientation of sets of vectors: In what groups are there possible actions on d-dimensional vectors such that you can leave all but one of the d vectors unchanged, and change only the dth? It turns out that this is an important question. For example, in 3-space, the orthogonal group O(3) permits this but the rotation group SO(3) does not! This weekend, I showed that the Lorentz group permits this. I showed it constructively.

If you care, my notes are here. It helped me understand some things about the distinction between covariant and contravariant vectors. This project has been fun, because I have used this data-analysis project to learn some new physics, and my physics knowledge to inform a data analysis framework.
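
For concreteness, here is one way to build such a transformation numerically. It is a sketch and not necessarily the construction in the notes: it assumes the d-1 fixed vectors are linearly independent, that the normal to their span (with respect to the metric) is not null, and that "the Lorentz group" means the full group of linear maps that preserve the Minkowski metric. The function name fixing_reflection is made up for this example.

```python
import numpy as np

def fixing_reflection(fixed_vectors, metric):
    """Reflection that fixes every row of fixed_vectors and preserves the metric.

    fixed_vectors: (d-1, d) array of linearly independent vectors.
    metric: (d, d) symmetric matrix G; identity for O(d), diag(-1, 1, 1, 1) for Lorentz.
    """
    # Find n with v^T G n = 0 for every fixed vector v (the G-normal to their span).
    _, _, vt = np.linalg.svd(fixed_vectors @ metric)
    n = vt[-1]
    norm2 = n @ metric @ n
    if np.isclose(norm2, 0.0):
        raise ValueError("normal is null; this construction does not apply")
    return np.eye(len(n)) - 2.0 * np.outer(n, metric @ n) / norm2

rng = np.random.default_rng(1)
for metric in (np.eye(3), np.diag([-1.0, 1.0, 1.0, 1.0])):
    d = metric.shape[0]
    vectors = rng.normal(size=(d, d))                        # d vectors, one per row
    R = fixing_reflection(vectors[:-1], metric)
    assert np.allclose(R.T @ metric @ R, metric)             # R preserves the metric
    assert np.allclose(vectors[:-1] @ R.T, vectors[:-1])     # first d-1 vectors unchanged
    assert not np.allclose(vectors[-1] @ R.T, vectors[-1])   # the dth vector changes
    assert np.isclose(np.linalg.det(R), -1.0)                # a reflection, not a rotation
```

The determinant of -1 is the point: in 3-space the only non-trivial orthogonal transformation that fixes two independent vectors is this reflection, so it lives in O(3) but not in SO(3).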


gaps: it works!

In our early meeting today, Gaby Contardo (Flatiron) showed me results from the various tweaks and adjustments she has been making to her method for finding gaps (valleys, holes, lacunae) in point-cloud data. When she applies it to the local velocity distribution in the Milky Way disk, it finds all the gaps we see there and traces them nicely. We have a paper to write! Her method involves critical points and multiple derivatives and a stately kind of gradient descent. It's sweet! We have to work on figuring out how to generalize to arbitrary numbers of dimensions.


field theories require high-order tensors

I started a blow-up on Twitter about electromagnetism and pseudo-vectors. Why do we need to invoke the pseudo-vector magnetic field when we start with real vectors and end with real vectors? This is all related to my project with Soledad Villar (JHU) and Ben Blum-Smith (NYU) about universally approximating functions (machine learning) for physics. Kate Storey-Fisher (NYU) converted an electromagnetic expression (for that paper/project) that contains cross products and the B field into one that requires no cross products and no B field. So why do we need the B field again?

I figured out the answer today: If we want electromagnetism to be a field theory in which charges create or propagate a field and a test particle obtains an electromagnetic force by interaction with that field, then the field has to be an order-2 tensor or contain a pseudo-vector. That is, you need tensor objects to encode the configuration and motions of the distant charges. If you don't need your theory to be a field theory, you can get away without the high-order or pseudo- objects. This should probably be on my teaching blog, not here!


gaps: clever and non-clever methods

Gaby Contardo (Flatiron) showed me beautiful plots that indicate that we can trace the valleys and gaps in a point set using the geometry and calculus things we've been exploring. But then she pointed out that maybe we could find the same features just by taking a ratio of two density estimates with different frequency bandpasses (bandwidths)! Hahaha I hope that isn't true, because we have spent time on this! Of course it isn't lost time; we have learned a lot.
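
The non-clever idea is easy to sketch in one dimension. This is a toy illustration only, with a made-up bimodal sample, Gaussian kernels, and arbitrary bandwidths; it is not Contardo's method or data.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Gaussian kernel density estimate of 1D samples, evaluated on grid."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    norm = len(samples) * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * z**2).sum(axis=1) / norm

def bandpass_ratio(samples, grid, narrow, wide):
    """Ratio of a narrow-bandwidth density estimate to a wide-bandwidth one."""
    return gaussian_kde_1d(samples, grid, narrow) / gaussian_kde_1d(samples, grid, wide)

# Toy data: two clumps at -2 and +2 with a gap between them.
rng = np.random.default_rng(2)
samples = np.concatenate([rng.normal(-2.0, 0.5, 2000), rng.normal(2.0, 0.5, 2000)])
grid = np.linspace(-4.0, 4.0, 161)   # grid[80] is x = 0, the middle of the gap
ratio = bandpass_ratio(samples, grid, narrow=0.3, wide=1.5)

# The ratio drops well below 1 in the gap and rises above 1 on the clumps.
print(ratio[80], ratio[40], ratio[120])
```

The wide estimate smooths over the gap and the narrow one does not, so the ratio is small exactly where the density dips on small scales. Whether that traces valleys as well as a method built on critical points and derivatives is the open question.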


bad inputs on the RHS of a regression are bad

I discussed new results with Christina Eilers (MIT), who is trying to build a simple, quasi-linear causal model of the abundances of stars in our Galaxy. The idea is that some abundance trends are being set or adjusted by problems with the data, and we want to correct for that by a kind of self-calibration. It's all very clever (in my not-so-humble opinion). Today she showed that her results get much better (in terms of interpretability) if she trims out stars that get assigned very wrong dynamical actions by our action-computing code (thank you to Adrian Price-Whelan!). Distant, noisy stars can get bad actions because noise draws on distance can make them effectively look unbound! And in general, the action-estimation code has to make some wrong assumptions.

It's a teachable moment, however, because when you are doing a discriminative regression (predicting labels using features), you can't (easily) incorporate a non-trivial noise model in your feature space. In this case, it is safer to drop bad or noisy features than to use them. The labels are a different matter: You can (and we do) use noisy labels! This asymmetry is not good of course, but pragmatic data analysis suggests that—for now—we should just drop from the training set the stars with overly noisy features and proceed.
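
A toy demonstration of the asymmetry, using an entirely made-up linear model and noise levels (nothing here comes from the real abundance or action data): noise on the features biases an ordinary least-squares slope toward zero, while noise on the labels does not, and trimming the objects with known-bad features removes most of the bias.

```python
import numpy as np

rng = np.random.default_rng(3)
n_stars = 5000
true_slope = 2.0

x_true = rng.normal(size=n_stars)
# Hypothetical noise model: about 10 percent of stars have very noisy features.
x_sigma = np.where(rng.random(n_stars) < 0.1, 3.0, 0.05)
x_obs = x_true + x_sigma * rng.normal(size=n_stars)
# The labels are noisy for every star; that does not bias the slope.
y_obs = true_slope * x_true + 0.5 * rng.normal(size=n_stars)

def ols_slope(x, y):
    """Ordinary least-squares slope of y on x (fit with an intercept)."""
    return np.polyfit(x, y, 1)[0]

slope_all = ols_slope(x_obs, y_obs)                   # biased low by the bad features
keep = x_sigma < 1.0                                  # trim on the known feature noise
slope_trimmed = ols_slope(x_obs[keep], y_obs[keep])   # close to the true slope
print(slope_all, slope_trimmed)
```

Note that the trimming uses the reported feature uncertainty, not the residuals of the fit; cutting on residuals would be a different (and more dangerous) operation.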


regression as a tool for extreme-precision radial velocity

Lily Zhao (Yale) showed new regression results to Megan Bedell (Flatiron) and me today. She's asking whether shape properties of a stellar spectrum give you any information about the radial velocity of the star, beyond the Doppler shift. The reason there might be some signatures is that (for example) star spots and pulsations can distort radial-velocity measurements (at the m/s level) and they also (very slightly) change the shape of the stellar spectrum (line ratios and line shapes and so on). She has approaches that are discriminative (they try to predict the RV from the spectrum) and approaches that are generative (they try to predict the spectrum from the RV and other housekeeping data). Right now the discriminative approaches seem to be winning, and they seem to be delivering a substantial amount of RV information. If this is successful, it will be the culmination of a lot of hard work.


re-parameterizing Kepler orbits

As many exoplaneteers know, parameterizing eccentric gravitational two-body orbits (ellipses or Kepler orbits) for inferences (MCMC sampling or, alternatively, likelihood optimizations) is not trivial. One non-triviality is that there are combinations of parameters that are very-nearly degenerate for certain kinds of observations. Another is that when the eccentricity gets near zero (as it does for many real systems), some of the orientation parameters become unconstrained (or unidentifiable or really non-existent). Today Adrian Price-Whelan (Flatiron) was hacking on this with the thought that the time or phase of maximum radial velocity (with respect to the observer) and the time or phase of minimum radial velocity could be used as a pair of parameters that give stable, well-defined combinations of phase, eccentricity, and ellipse orientation (when that exists). We spent an inordinate amount of time in the company of trig identities.
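
To give a flavor of the trig involved, here is a sketch of the forward map from the standard parameters (eccentricity e, argument of pericenter omega) to the phases of maximum and minimum radial velocity. It assumes the textbook RV curve, proportional to cos(f + omega) + e cos(omega) with f the true anomaly, and it is not Price-Whelan's actual parameterization. The function names are invented for the example, and phases are measured from pericenter only for convenience.

```python
import numpy as np

def true_to_mean_anomaly(f, e):
    """Mean anomaly (radians) at true anomaly f, for eccentricity e < 1."""
    E = 2.0 * np.arctan2(np.sqrt(1.0 - e) * np.sin(0.5 * f),
                         np.sqrt(1.0 + e) * np.cos(0.5 * f))
    return E - e * np.sin(E)

def extremum_phases(e, omega):
    """Phases (fraction of a period since pericenter) of maximum and minimum RV."""
    # cos(f + omega) is largest at f = -omega and smallest at f = pi - omega.
    m_max = true_to_mean_anomaly(-omega, e)
    m_min = true_to_mean_anomaly(np.pi - omega, e)
    return (m_max / (2.0 * np.pi)) % 1.0, (m_min / (2.0 * np.pi)) % 1.0

def radial_velocity(phase, e, omega, K=1.0, n_newton=50):
    """RV (no systemic offset) at the given phase(s); Newton solve, fine for moderate e."""
    M = 2.0 * np.pi * np.asarray(phase, dtype=float)
    E = np.array(M, dtype=float)
    for _ in range(n_newton):
        E -= (E - e * np.sin(E) - M) / (1.0 - e * np.cos(E))
    f = 2.0 * np.arctan2(np.sqrt(1.0 + e) * np.sin(0.5 * E),
                         np.sqrt(1.0 - e) * np.cos(0.5 * E))
    return K * (np.cos(f + omega) + e * np.cos(omega))

e, omega = 0.3, 1.0
phase_max, phase_min = extremum_phases(e, omega)
phases = np.linspace(0.0, 1.0, 20001)
rv = radial_velocity(phases, e, omega)
# The analytic phases agree with a brute-force search along the RV curve.
assert abs(phases[np.argmax(rv)] - phase_max) < 1e-3
assert abs(phases[np.argmin(rv)] - phase_min) < 1e-3
```

For a circular orbit the two phases are exactly half a period apart, so the departure of their separation from one half carries the information about e and omega. And the times of maximum and minimum RV, measured from a fixed reference epoch, remain well defined as e goes to zero, even though omega and the time of pericenter do not.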


streams in external galaxies

Sarah Pearson (NYU), Adrian Price-Whelan (Flatiron), and I met today to discuss fitting tidal streams (and especially cold stellar streams) discovered around external galaxies. We are starting with the concept of the stream track, and therefore we need to turn imaging we have of external galaxies into some description of the stream track in some coordinate system that makes sense. We spent time discussing that. We're going to start with some hacks. This isn't unrelated to the work I have been discussing on microscopy of robots: We want to make very precise measurements, but in very heterogeneous, complex imaging.


group theory and the laws of physics

I was very fortunate to be part of a meeting today between Soledad Villar (JHU) and Benjamin Blum-Smith (NYU) in which we spoke about the possible forms of universally approximating functions that are equivariant under rotations and reflections (the orthogonal group O(d)). From my perspective, this is a physics question: What are all the possible physical laws in classical physics? From Villar's perspective, this is a machine-learning question: How can we build universal or expressive machine-learning methods for the natural sciences? From Blum-Smith's perspective, this is a group-theory question: What are the properties of the orthogonal group and its neighbors? We discussed the possibility that we might be able to write an interdisciplinary paper on this subject.


azimuthal variations in the velocity distribution

In my weekly with Jason Hunt (Flatiron), we discussed the point that if the gaps in the local velocity distribution in the Milky Way are caused by the bar (or some of them are) and if the gaps are purely phase shifts and not things being thrown out by chaos, then the velocity distribution in a localized patch should vary with azimuth as an m=2 pattern, with maybe some m=4 and m=6 mixed in. So, that made us think we should look at simulations, and see if there are any features in the local velocity distribution that might be interpretable in such a model. For instance, could we measure the angle of the bar?


Dr Marco Stein Muzio

Today Marco Stein Muzio (NYU) defended his PhD dissertation on multi-messenger cosmic-ray astrophysics (and cosmic-ray physics). He gave credible arguments that the combination of hadron, neutrino, muon, and photon data implies the existence of new kinds of sources contributing at the very highest energies. He made predictions for new data. We (the audience) asked him about constraining hadronic physics, and searching for new physics. He argued that the best place to look for new physics is in the muon signals: They don't seem to fit with the rest. But overall, if I had to summarize, he was more optimistic that the data would be all explained by astrophysical sources and hadronic physics, and not require modifications to the particle model. It was an impressive and lively defense. And Dr MSM has had a huge impact on my Department, co-founding a graduate-student political group and helping us work through issues of race, representation, and culture. I can't thank him enough, and I will miss his presence.


Dr Sicheng Lin

Today Sicheng Lin (NYU) defended his PhD dissertation on the connections between galaxies and the dark-matter field in which they live. He worked on elaborations of what's known as “halo occupation”, “abundance matching” and the like. At the end, I asked my standard questions about how the halo occupation fits into ideas we have about gravity, and the symmetries of physical law. After all, “haloes” aren't things that exist in the theory of gravity. And yet, the model is amazingly successful at explaining large-scale structure data, even down to tiny details. That led to a very nice and very illuminating discussion of all the things that could matter to galaxy clustering and dark-matter over-densities, including especially time-scales. An important dissertation in an important area: I learned during the defense that the DESI project has taken more than one million spectra in its “science verification” phase. Hahaha! It makes all my work from 1994 to 2006-ish seem so inefficient!


microscopy of fiber robots

Conor Sayres (UW) and I continued today our discussion of the data-analysis challenges associated with the SDSS-V focal-plane system (the fiber robots). Today we discussed the microscopy of the robots. Sayres has images (like literally RGB JPEG images) from a setup in which each fiber robot is placed into a microscope. From this imaging, we have to locate the three fibers (one for the BOSS spectrograph, one for the APOGEE spectrograph, and one for a back-illumination system used for metrology), all relative to the outer envelope of the robot arm. And do this for 300 or 600 robots. The fibers appear as bright, resolved circles in the imaging, but on a background that has lots of other detail, shading, and variable lighting. This problem is one that comes up a lot in astrophysics: You want to measure something very specific, but in a non-trivial image, filled with other kinds of sources and noise. We discussed options related to matched filtering, but we sure didn't finish.
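
As a reminder of what the simplest matched filter looks like, here is a toy sketch: a bright disk of known radius on a shaded, noisy background, located by cross-correlating the image with a zero-mean disk template using FFTs. The image, the radius, and the function names are all invented for the example; the real microscope images are surely much harder than this.

```python
import numpy as np

def disk_template(shape, radius):
    """Zero-mean disk of the given radius, centered on pixel (0, 0) with wrap-around."""
    ny, nx = shape
    y = np.minimum(np.arange(ny), ny - np.arange(ny))[:, None]
    x = np.minimum(np.arange(nx), nx - np.arange(nx))[None, :]
    disk = (x**2 + y**2 <= radius**2).astype(float)
    return disk - disk.mean()   # zero mean, so a constant background gives no response

def matched_filter_peak(image, radius):
    """Pixel (row, col) at which the disk template best matches the image."""
    template = disk_template(image.shape, radius)
    # Circular cross-correlation of the image with the template, via the FFT.
    response = np.fft.ifft2(np.fft.fft2(image) * np.conj(np.fft.fft2(template))).real
    return np.unravel_index(np.argmax(response), response.shape)

# Toy image: smooth shading, one bright resolved "fiber", and pixel noise.
rng = np.random.default_rng(4)
ny, nx = 128, 128
yy, xx = np.mgrid[:ny, :nx]
image = 0.5 * xx / nx
image = image + ((yy - 40)**2 + (xx - 75)**2 <= 6**2)
image = image + 0.1 * rng.normal(size=(ny, nx))

peak = matched_filter_peak(image, radius=6)
print(peak)   # the disk was placed at row 40, column 75
```

This only finds the best whole-pixel position of one circle with a known radius. Sub-pixel centroids, three fibers per robot, and the outer envelope of the arm would all need more than this.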


Stanford talk; heretics

I spoke today at Stanford, about the ESA Gaia Mission and its promise for mapping (and, eventually, understanding) the dark matter in the Milky Way. I spoke about virial and Jeans methods, and then methods that permit us to image the orbits, like streams, the Snail, and orbital torus imaging. At the end of the talk Roger Blandford (Stanford) asked me about heretical ideas in gravity and dark matter. I said that there hasn't been a huge amount of work yet from the Gaia community testing alternative theories of gravity, but there could be, and the data are public. I also said that it is important to do such work, because gravity is the bedrock theory of astrophysics (and physics, in some sense). ESA Gaia potentially might deliver the best constraints in some large range of scales.


the orthogonal group is very simple

Soledad Villar (JHU) and I have been kicking around ideas for machine learning methods that are tailored to classical (mechanical and electromagnetic) physical systems. The question is: What is the simplest representation of objects in this theory that permits highly expressive machine-learning methods that are nonetheless constrained to obey fundamental symmetries, like translation, rotation, reflection, and boost? Since almost all (maybe exactly all) of classical physics obeys rotation and reflection, one of the relevant groups is the orthogonal group O(3) (or O(d) in general). This group turns out to be extremely simple (and extremely constrained). We might be able to make extremely expressive machines with very simple internals, if we have this group deliver the main symmetry or equivariance. We played around with possible abstracts or scopes for a paper. Yes, a purely theoretical paper for machine learning. That puts me out of my comfort zone! We also read some group theory, which I (hate to admit that I) find very confusing.
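
One illustration of how simple the internals can be: any function that outputs a linear combination of the input vectors, with coefficients that depend only on the scalar products among the inputs, is automatically O(d)-equivariant. The sketch below uses a toy one-layer "network" with random weights as the coefficient function (a made-up choice, not a method from our project), and it checks equivariance only; it says nothing about universality.

```python
import numpy as np

def equivariant_vector_function(V, weights):
    """O(d)-equivariant map from n input vectors (the rows of V) to one output vector.

    The output is sum_i g_i v_i, where the coefficients g_i depend only on the
    invariant scalar products v_i . v_j. Here g is a toy one-layer network.
    """
    gram = V @ V.T                             # (n, n) matrix of invariant scalars
    coeffs = np.tanh(gram.ravel() @ weights)   # (n,) invariant coefficients
    return coeffs @ V                          # (d,) equivariant output vector

rng = np.random.default_rng(5)
n, d = 4, 3
V = rng.normal(size=(n, d))
weights = rng.normal(size=(n * n, n))          # hypothetical "learned" parameters
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))   # random orthogonal matrix

# Rotating or reflecting every input rotates or reflects the output in the same way.
assert np.allclose(equivariant_vector_function(V @ Q.T, weights),
                   equivariant_vector_function(V, weights) @ Q.T)
```

All of the learning happens in a function of scalars, which is unconstrained; the symmetry is enforced entirely by the structure around it.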