how to simulate a spectrum

I had a great conversation today with Matt Daunt (NYU), building on a discussion yesterday that also included Megan Bedell (Flatiron), about how to simulate data from an extreme-precision radial-velocity spectrograph. We decided to simulate the star, the atmosphere, and the (gasp!) gas cell, all at very high resolution, then combine them physically, then reduce to the spectrograph resolution (which is nonetheless very high), and then sample and noisify the resulting data. The idea is: Make the structure of the code match the structure of our physical beliefs, or causal beliefs. We decided to fork this data simulation into its own project.
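A minimal sketch of that causal structure in code. Everything here is a placeholder assumption (the component spectra, the Gaussian line-spread function, the noise model); the point is only that the code multiplies the physical pieces at high resolution, blurs, samples, and noisifies, in that order:

```python
import numpy as np

def simulate_spectrum(star_flux, atm_transmission, cell_transmission,
                      hires_wave, lsf_sigma, data_wave, snr, rng=None):
    """Combine high-resolution components physically, blur to the
    spectrograph resolution, then sample and noisify.
    All inputs are hypothetical placeholder arrays."""
    rng = np.random.default_rng() if rng is None else rng
    # multiply the physical components at high resolution
    flux = star_flux * atm_transmission * cell_transmission
    # convolve with a Gaussian line-spread function (in pixel units)
    kernel_x = np.arange(-4 * lsf_sigma, 4 * lsf_sigma + 1)
    kernel = np.exp(-0.5 * (kernel_x / lsf_sigma) ** 2)
    kernel /= kernel.sum()
    blurred = np.convolve(flux, kernel, mode="same")
    # sample onto the (coarser) data wavelength grid
    sampled = np.interp(data_wave, hires_wave, blurred)
    # add Gaussian noise at the stated signal-to-noise
    return sampled + rng.normal(0., sampled / snr)
```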


coarse-graining a point cloud with a kd-tree?

As my loyal reader knows, I am interested in the fast multipole method (FMM) and whether it could be used to improve or speed up machine-learning methods on graphs or spatial point clouds. Over the last months, I have learned about lots of limitations of FMMs, some of which we have discussed here. I'm still interested! But when I last spoke with Leslie Greengard (Flatiron), he indicated that if you want to scale FMMs up to very clustered data in high dimensions, maybe you have to think about truly adaptive trees (not the fixed tree of an FMM), like perhaps kd-trees. Today Soledad Villar (JHU) and I discussed this idea. The question is: What could be proved about such an approach, or are there such approaches where you could get things like accuracy guarantees? The FMM has the beautiful property that you can compute the precision of your approximation, and dial up the order to get better precision.
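As a toy illustration of what an adaptive tree might look like, here is a sketch that recursively splits a point cloud kd-tree-style and replaces each leaf with its centroid and weight. This is a monopole-only stand-in for a real FMM expansion, with no accuracy guarantee whatsoever (the leaf size is an arbitrary choice):

```python
import numpy as np

def coarse_grain(points, leaf_size=8):
    """Adaptive kd-style coarse-graining: recursively split along the
    widest dimension and replace each leaf by its centroid and weight
    (a monopole-only stand-in for a multipole expansion)."""
    if len(points) <= leaf_size:
        return [(points.mean(axis=0), len(points))]
    # split along the dimension with the largest extent, at the median
    dim = np.argmax(points.max(axis=0) - points.min(axis=0))
    median = np.median(points[:, dim])
    left = points[points[:, dim] <= median]
    right = points[points[:, dim] > median]
    if len(left) == 0 or len(right) == 0:  # degenerate split; stop here
        return [(points.mean(axis=0), len(points))]
    return coarse_grain(left, leaf_size) + coarse_grain(right, leaf_size)
```

By construction the weights sum to the number of points, and the weighted centroid of the nodes equals the centroid of the cloud; anything beyond that (error control, say) is exactly the open question.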


abundance calibration and abundance gradients

Today Christina Eilers (MIT) updated Hans-Walter Rix (MPIA) and me on our project to self-calibrate the element-abundance measurements in APOGEE. We are looking at self-consistency of the abundance distribution as a function of actions; in a well-mixed Galaxy this could be used to calibrate the biases of the abundance measurements with surface gravity (a known effect in the data) and spectral resolution (a possible effect). Eilers has beautiful results: The abundances get better and the abundance gradients in the Galaxy (with radius or azimuthal action, and with vertical height or vertical action) become more clear and more sensible. So we have a paper to write!


machine-learning group meeting

Today Soledad Villar (JHU), Kate Storey-Fisher (NYU), Weichi Yao (NYU), and I crashed the machine-learning group meeting hosted by Shirley Ho (Flatiron) and Gaby Contardo (Flatiron). Villar presented our new paper on gauge-invariant functions and we started the conversation about what to do with it. We vowed to come back to the meeting to discuss that: What are the best applications of machine learning in cosmology and astrophysics right now?


a model for sailing (yes, sailing)

I've had a lifetime of conversations with Hans-Walter Rix (MPIA) about the point that you could in principle sail with a sailboat with flat sails: Nothing about the curvature of the sails is integral or required by sailing. The curvature helps, but isn't necessary. I have had another lifetime of conversations with Matt Kleban (NYU) about the point that sailing depends on the relative velocity between the air and the water, and this leads to some hilarious physics problems involving sailing on rivers in zero wind (it's possible because a flowing river is moving relative to the dead air).

These worlds collided this weekend because—inspired by a twitter conversation—I finally built a proper ram-pressure model of a flat-sail, flat-keel sailboat and got it all working. It's sweet! It sails beautifully. Much more to say, but the question is: Is there a paper to write?


counting repeat spectra in APOGEE

I worked today with Katherine Alsfelder (NYU) to develop statistics on APOGEE spectra: There are two spectrographs (one in the North and one in the South) and there are 300 fibers per spectrograph. How many stars have been observed with each of the 600 different options, and how many of the 600-choose-2 pairs of options have seen the same star? This is all in preparation for empirical cross-calibration of the spectrographs. There is a lot of data! But 600-choose-2 is a huge number.
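The counting itself is simple set bookkeeping. A sketch, assuming observations come as (star, spectrograph, fiber) records (the record format is my invention, not the APOGEE data model):

```python
from collections import defaultdict
from itertools import combinations

def pair_counts(observations):
    """observations: iterable of (star_id, spectrograph, fiber) tuples.
    Returns stars seen per (spectrograph, fiber) option, and how many
    option pairs have seen at least one star in common."""
    stars_per_option = defaultdict(set)
    for star, spec, fiber in observations:
        stars_per_option[(spec, fiber)].add(star)
    # the 600-choose-2 loop: pairs of options sharing any star
    shared = sum(1 for a, b in combinations(stars_per_option, 2)
                 if stars_per_option[a] & stars_per_option[b])
    return {k: len(v) for k, v in stars_per_option.items()}, shared
```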


information theory at Cambridge

Today I gave a colloquium at the University of Cambridge. My slides are here. I spoke about how to make precise measurements, how to design surveys, and how to exploit structure in noise. It's a rich set of things, and most of the writing about information theory in astronomy is only in the cosmology domain. Time to change that, maybe? It is also the case that the best book about information and inference ever written was written in Cambridge! So I was bringing coals to Newcastle, ish!


machine learning at #AAS238

Today I spoke at the “meeting-in-meeting” on machine learning at the summer AAS meeting. My slides are here. I started out a bit negative but I ended up saying very positive things about what machine learning can do for astrophysics. I got as much (maybe more) feedback on the twitters afterwards as I did in real time. Several of the other speakers in my session mentioned or discussed contrastive learning, which looks like it might be an interesting unsupervised technique.


making slides for AAS and Cambridge

I'm giving two talks this week, one at #AAS238 and one at the University of Cambridge. Because I am a masochist (?) I put in titles and abstracts for both talks that are totally unlike those for any talks I have given previously. So I have to make slides entirely from scratch! I spent every bit of time today not in meetings working on slides. I'm not at all ready!


vectors, bras, and kets

One of my PhD advisors—my official advisor—was Roger Blandford (now at Stanford). Blandford, being old-school, responded to a tweet thread I started by sending me email. I am trying to move over to always describing tensors and rotation operators and Lorentz transformations and the like in terms of unit vectors, and I realized that the most enlightened community along these lines is the quantum-mechanics community. Probably because they often work in infinite-dimensional spaces! Anyway, there are deep connections between vectors in a space and functions in a Hilbert space. I'm still learning; I think I will never fully get it.


objective functions and Nyquist sampling

Adrian Price-Whelan and I discussed today some oddities that Matt Daunt (NYU) is finding while trying to measure radial velocities in extremely noisy, fast APOGEE sub-exposures. He finds that the objective function we are using is not obviously smooth on 10-ish km/s velocity scales. Why not? We don't know. But what we do know is that a spectrograph with resolution 22,500 cannot put sharp structures into a likelihood function on scales smaller than about 13 km/s.

There's a nice paradox here, in fact: The spectrograph can't see features on scales smaller than 13 km/s, and yet we can reliably measure radial velocities much better than this! How? The informal answer is that the radial-velocity precision is 13 km/s divided by a certain, particular signal-to-noise. The formal answer involves information theory—the Fisher information, to be precise.
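For concreteness, here is the velocity scale of one resolution element, and the informal precision scaling from the paragraph above, in a few lines (the effective signal-to-noise factor is a made-up number, and the real Fisher-information calculation involves the full spectrum):

```python
c = 299792.458   # speed of light, km/s
R = 22500.       # APOGEE-like resolving power
dv = c / R       # one resolution element in velocity units, ~13.3 km/s

# the informal scaling: resolution element divided by an effective
# signal-to-noise (a hypothetical value here, for illustration only)
snr_eff = 1000.
sigma_v = dv / snr_eff   # roughly 13 m/s in this made-up case
```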


Dr Lily Zhao

I had the great honor to be on the PhD committee of Lily Zhao (Yale), who defended her dissertation today. It was great and remarkable. She has worked on hardware, calibration, software, stellar astrophysics, and planets. Her seminar was wide-ranging, and the number and scope of the questions she fielded was legion. She has already had a big impact on extreme precision radial-velocity projects, and she is poised to have even more impact in the future. One of the underlying ideas of her work is that EPRV projects are integrated hardware–software systems. This idea should inform everything we do, going forward. I asked a million technical questions, but I also asked questions about the search for life, and the astronomical community's management and interoperation of its large supply of diverse spectrographs. In typical Zhao fashion, she had interesting things to say about all these things.


orthogonalization in SR, continued

Soledad Villar (JHU) and I further discussed the problem of orthogonalization of vectors—or finding orthonormal basis vectors that span a subspace—in special (and general) relativity. She proposed a set of hacks that correct the generalization of Gram–Schmidt orthogonalization that I proposed a week or so ago. It's complicated: although the straightforward generalization of GS works with probability one, there are cases you can construct that bork completely. The problem is that the method involves division by an inner product, and if a vector becomes light-like, that inner product vanishes.
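Here is a naive sketch of that failure mode, using a Minkowski inner product in the (-,+,+,+) signature (the metric convention, the tolerance, and the normalization-by-absolute-value choice are all mine, and this is the straightforward generalization, not Villar's corrected version):

```python
import numpy as np

ETA = np.diag([-1., 1., 1., 1.])  # Minkowski metric, signature (-,+,+,+)

def minkowski_dot(u, v):
    return u @ ETA @ v

def gram_schmidt_sr(vectors, eps=1e-12):
    """Naive Gram-Schmidt with the Minkowski inner product.
    Raises when a projected vector becomes (numerically) light-like,
    which is exactly the division-by-zero failure discussed above."""
    basis = []
    for v in vectors:
        w = np.array(v, dtype=float)
        for b in basis:
            # division by minkowski_dot(b, b): vanishes for null b
            w = w - (minkowski_dot(w, b) / minkowski_dot(b, b)) * b
        norm2 = minkowski_dot(w, w)
        if abs(norm2) < eps:
            raise ValueError("light-like vector: cannot normalize")
        basis.append(w / np.sqrt(abs(norm2)))
    return basis
```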



friday: NeurIPS submission

In a heroic final push, Soledad Villar (JHU) finished our paper for NeurIPS submission today. We showed that you can make gauge-invariant neural networks without using the irreducible representations of group theory, or any other heavy computational machinery, at least for large classes of groups. Indeed, for all the groups that appear in classical physics (orthogonal group, rotations, Euclidean, Lorentz, Poincaré, permutation). Our contribution is pure math! It is only about machine learning inasmuch as it suggests future methods and simplifications. We will post it to arXiv next week.



My only research today was conversations with Gaby Contardo about the scope and experiments of our paper on methods to automatically discover and characterize gaps in point-cloud data.


what is permutation symmetry?

I spent a lot of time today trying to write down, very specifically, what it means for a function to be invariant with respect to permutation of its input arguments. It turns out that this is hard! Especially when the function is a vector function of vector inputs. This is all related to our nascent NeurIPS submission. This symmetry, by the way, is the symmetry enforced by graph neural networks. But it is also a symmetry of all of classical physics (if, say, the vectors are the properties of particles).


astrology: Yes, it's true

Today Paula Seraphim (NYU) and I extended our off-kilter research on the possibility that we live in a simulation to off-kilter research on whether astrology has some basis in empirical fact. It does! There are birth-season correlations with many things. The issue with astrology, oddly, is not the data! It is with the theory that it is all related to planets and constellations. And if you think about the causes of birth-season effects on personality and capability, most of them (but not all of them) would have been much stronger 2000 years ago than they are today!


linear subspaces in special relativity

Who knew that my love of special relativity would collide with my love of data analysis? In the ongoing conversation between Soledad Villar (JHU), Ben Blum-Smith (NYU), and myself about writing down universally approximating functions that are equivariant with respect to fundamental physics symmetries, a problem came up related to the orientation of sets of vectors: In what groups are there possible actions on d-dimensional vectors such that you can leave all but one of the d vectors unchanged, and change only the dth? It turns out that this is an important question. For example, in 3-space, the orthogonal group O(3) permits this but the rotation group SO(3) does not! This weekend, I showed that the Lorentz group permits this. I showed it constructively.

If you care, my notes are here. It helped me understand some things about the distinction between covariant and contravariant vectors. This project has been fun, because I have used this data-analysis project to learn some new physics, and my physics knowledge to inform a data analysis framework.


gaps: it works!

In our early meeting today, Gaby Contardo (Flatiron) showed me results from the various tweaks and adjustments she has been making to her method for finding gaps (valleys, holes, lacunae) in point-cloud data. When she applies it to the local velocity distribution in the Milky Way disk, it finds all the gaps we see there and traces them nicely. We have a paper to write! Her method involves critical points and multiple derivatives and a stately kind of gradient descent. It's sweet! We have to work on figuring out how to generalize to arbitrary numbers of dimensions.


field theories require high-order tensors

I started a blow-up on twitter about electromagnetism and pseudo-vectors. Why do we need to invoke the pseudo-vector magnetic field when we start with real vectors and end with real vectors? This is all related to my project with Soledad Villar (JHU) and Ben Blum-Smith (NYU) about universally approximating functions (machine learning) for physics. Kate Storey-Fisher (NYU) converted an electromagnetic expression (for that paper/project) that contains cross products and B field into one that requires no cross products and no B field. So why do we need the B field again?

I figured out the answer today: If we want electromagnetism to be a field theory in which charges create or propagate a field and a test particle obtains an electromagnetic force by interaction with that field, then the field has to be an order-2 tensor or contain a pseudo-vector. That is, you need tensor objects to encode the configuration and motions of the distant charges. If you don't need your theory to be a field theory, you can get away without the high-order or pseudo- objects. This should probably be on my teaching blog, not here!


gaps: clever and non-clever methods

Gaby Contardo (Flatiron) showed me beautiful plots that indicate that we can trace the valleys and gaps in a point set using the geometry and calculus things we've been exploring. But then she pointed out that maybe we could find the same features just by taking a ratio of two density estimates with different frequency bandpasses (bandwidths)! Hahaha I hope that isn't true, because we have spent time on this! Of course it isn't lost time; we have learned a lot.
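The non-clever baseline is indeed easy to sketch (the two bandwidths here are arbitrary guesses, and this is the ratio idea, not Contardo's geometric method):

```python
import numpy as np
from scipy.stats import gaussian_kde

def gap_score(points, query, narrow=0.1, wide=0.5):
    """Ratio of a narrow-bandwidth to a wide-bandwidth kernel density
    estimate. Ratios well below 1 flag locally under-dense (gappy)
    regions. Bandwidth factors are hypothetical choices."""
    kde_narrow = gaussian_kde(points.T, bw_method=narrow)
    kde_wide = gaussian_kde(points.T, bw_method=wide)
    return kde_narrow(query.T) / kde_wide(query.T)
```

On two well-separated clusters, the score at the midpoint comes out much lower than at a cluster center, which is the sense in which this might "just work".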


bad inputs on the RHS of a regression are bad

I discussed new results with Christina Eilers (MIT), who is trying to build a simple, quasi-linear causal model of the abundances of stars in our Galaxy. The idea is that some abundance trends are being set or adjusted by problems with the data, and we want to correct for that by a kind of self-calibration. It's all very clever (in my not-so-humble opinion). Today she showed that her results get much better (in terms of interpretability) if she trims out stars that get assigned very wrong dynamical actions by our action-computing code (thank you to Adrian Price-Whelan!). Distant, noisy stars can get bad actions because noise draws on distance can make them effectively look unbound! And in general, the action-estimation code has to make some wrong assumptions.

It's a teachable moment, however, because when you are doing a discriminative regression (predicting labels using features), you can't (easily) incorporate a non-trivial noise model in your feature space. In this case, it is safer to drop bad or noisy features than to use them. The labels are a different matter: You can (and we do) use noisy labels! This asymmetry is not good of course, but pragmatic data analysis suggests that—for now—we should just drop from the training set the stars with overly noisy features and proceed.


regression as a tool for extreme-precision radial velocity

Lily Zhao (Yale) showed new regression results to Megan Bedell (Flatiron) and me today. She's asking whether shape properties of a stellar spectrum give you any information about the radial velocity of the star, beyond the Doppler shift. The reason there might be some signatures is that (for example) star spots and pulsations can distort radial-velocity measurements (at the m/s level) and they also (very slightly) change the shape of the stellar spectrum (line ratios and line shapes and so on). She has approaches that are \emph{discriminative}—they try to predict the RV from the spectrum—and approaches that are \emph{generative}—they try to predict the spectrum from the RV and other housekeeping data. Right now the discriminative approaches seem to be winning, and they seem to be delivering a substantial amount of RV information. If this is successful, it will be the culmination of a lot of hard work.


re-parameterizing Kepler orbits

As many exoplaneteers know, parameterizing eccentric gravitational two-body orbits (ellipses or Kepler orbits) for inferences (MCMC sampling or, alternatively, likelihood optimizations) is not trivial. One non-triviality is that there are combinations of parameters that are very-nearly degenerate for certain kinds of observations. Another is that when the eccentricity gets near zero (as it does for many real systems), some of the orientation parameters become unconstrained (or unidentifiable or really non-existent). Today Adrian Price-Whelan (Flatiron) was hacking on this with the thought that the time or phase of maximum radial velocity (with respect to the observer) and the time or phase of minimum radial velocity could be used as a pair of parameters that give stable, well-defined combinations of phase, eccentricity, and ellipse orientation (when that exists). We spent an inordinate amount of time in the company of trig identities.
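In true anomaly the maximum of the radial-velocity curve is simple: it sits at nu = -omega, which is part of what makes these phases candidate parameters. Converting true anomaly to time or orbital phase (through Kepler's equation) is where the trig identities pile up. A tiny numeric check, with hypothetical parameter values:

```python
import numpy as np

# radial velocity of a Keplerian orbit as a function of true anomaly nu,
# rv = K * (cos(nu + omega) + e * cos(omega)); parameters are made up
K, e, omega = 1.0, 0.3, 0.7
nu = np.linspace(-np.pi, np.pi, 100001)
rv = K * (np.cos(nu + omega) + e * np.cos(omega))

nu_max = nu[np.argmax(rv)]   # lands at nu = -omega
rv_max = rv.max()            # equals K * (1 + e * cos(omega))
```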


streams in external galaxies

Sarah Pearson (NYU), Adrian Price-Whelan (Flatiron), and I met today to discuss fitting tidal streams (and especially cold stellar streams) discovered around external galaxies. We are starting with the concept of the stream track, and therefore we need to turn imaging we have of external galaxies into some description of the stream track in some coordinate system that makes sense. We spent time discussing that. We're going to start with some hacks. This isn't unrelated to the work I have been discussing on microscopy of robots: We want to make very precise measurements, but in very heterogeneous, complex imaging.


group theory and the laws of physics

I was very fortunate to be part of a meeting today between Soledad Villar (JHU) and Benjamin Blum-Smith (NYU) in which we spoke about the possible forms of universally approximating functions that are equivariant under rotations and reflections (the orthogonal group O(d)). From my perspective, this is a physics question: What are all the possible physical laws in classical physics? From Villar's perspective, this is a machine-learning question: How can we build universal or expressive machine-learning methods for the natural sciences? From Blum-Smith's perspective, this is a group-theory question: What are the properties of the orthogonal group and its neighbors? We discussed the possibility that we might be able to write an interdisciplinary paper on this subject.


azimuthal variations in the velocity distribution

In my weekly with Jason Hunt (Flatiron), we discussed the point that if the gaps in the local velocity distribution in the Milky Way are caused by the bar (or some of them are) and if the gaps are purely phase shifts and not things being thrown out by chaos, then the velocity distribution in a localized patch should vary with azimuth as an m=2 pattern, with maybe some m=4 and m=6 mixed in. So, that made us think we should look at simulations, and see if there are any features in the local velocity distribution that might be interpretable in such a model. For instance, could we measure the angle of the bar?


Dr Marco Stein Muzio

Today Marco Stein Muzio (NYU) defended his PhD dissertation on multi-messenger cosmic-ray astrophysics (and cosmic-ray physics). He gave credible arguments that the combination of hadron, neutrino, muon, and photon data imply the existence of new kinds of sources contributing at the very highest energies. He made predictions for new data. We (the audience) asked him about constraining hadronic physics, and searching for new physics. He argued that the best place to look for new physics is in the muon signals: They don't seem to fit with the rest. But overall, if I had to summarize, he was more optimistic that the data would be all explained by astrophysical sources and hadronic physics, and not require modifications to the particle model. It was an impressive and lively defense. And Dr MSM has had a huge impact on my Department, co-founding a graduate-student political group and helping us work through issues of race, representation, and culture. I can't thank him enough, and I will miss his presence.


Dr Sicheng Lin

Today Sicheng Lin (NYU) defended his PhD dissertation on the connections between galaxies and the dark-matter field in which they live. He worked on elaborations of what's known as “halo occupation”, “abundance matching”, and the like. At the end, I asked my standard questions about how the halo occupation fits into ideas we have about gravity, and the symmetries of physical law. After all, “haloes” aren't things that exist in the theory of gravity. And yet, the model is amazingly successful at explaining large-scale structure data, even down to tiny details. That led to a very nice and very illuminating discussion of all the things that could matter to galaxy clustering and dark-matter over-densities, including especially time-scales. An important dissertation in an important area: I learned during the defense that the DESI project has taken more than one million spectra in its “science verification” phase. Hahaha! It makes all my work from 1994 to 2006-ish seem so inefficient!


microscopy of fiber robots

Conor Sayres (UW) and I continued today our discussion of the data-analysis challenges associated with the SDSS-V focal-plane system (the fiber robots). Today we discussed the microscopy of the robots. Sayres has images (like literally RGB JPEG images) from a setup in which each fiber robot is placed into a microscope. From this imaging, we have to locate the three fibers (one for the BOSS spectrograph, one for the APOGEE spectrograph, and one for a back-illumination system used for metrology), all relative to the outer envelope of the robot arm. And we have to do this for 300 or 600 robots. The fibers appear as bright, resolved circles in the imaging, but on a background that has lots of other detail, shading, and variable lighting. This problem is one that comes up a lot in astrophysics: You want to measure something very specific, but in a non-trivial image, filled with other kinds of sources and noise. We discussed options related to matched filtering, but we sure didn't finish.


Stanford talk; heretics

I spoke today at Stanford, about the ESA Gaia Mission and its promise for mapping (and, eventually, understanding) the dark matter in the Milky Way. I spoke about virial and Jeans methods, and then methods that permit us to image the orbits, like streams, the Snail, and orbital torus imaging. At the end of the talk Roger Blandford (Stanford) asked me about heretical ideas in gravity and dark matter. I said that there hasn't been a huge amount of work yet from the Gaia community testing alternative theories of gravity, but there could be, and the data are public. I also said that it is important to do such work, because gravity is the bedrock theory of astrophysics (and physics, in some sense). ESA Gaia might potentially deliver the best constraints over some large range of scales.


the orthogonal group is very simple

Soledad Villar (JHU) and I have been kicking around ideas for machine-learning methods that are tailored to classical (mechanical and electromagnetic) physical systems. The question is: What is the simplest representation of objects in this theory that permits highly expressive machine-learning methods while constraining them to obey fundamental symmetries, like translation, rotation, reflection, and boost? Since almost all (maybe exactly all) of classical physics obeys rotation and reflection, one of the relevant groups is the orthogonal group O(3) (or O(d) in general). This group turns out to be extremely simple (and extremely constrained). We might be able to make extremely expressive machines with very simple internals, if we have this group deliver the main symmetry or equivariance. We played around with possible abstracts or scopes for a paper. Yes, a purely theoretical paper for machine learning. That puts me out of my comfort zone! We also read some group theory, which I (hate to admit that I) find very confusing.


open research across the University

Scott Collard of NYU Libraries organized an interdisciplinary panel across all of NYU today to discuss open research. I often talk about “open science”, but this discussion was explicitly to cover the humanities as well. We talked about the different cultures in different fields, and the roles of funding agencies, universities, member societies, journals, and so on. One idea that I liked from the conversation was that participants should try to ask what they can do from their position and not try to ask what other people should do from theirs. We had recommendations for making modifications to NYU promotion and tenure, putting open-research considerations into annual merit review, and asking the Departments to think about how, in their field, they could move to the most open edge of what's acceptable and conventional. Another great idea is that open research is directly connected to ideas of inclusion and equity, especially when research is viewed globally. That's important.


adversarial attacks and robustness

Today Teresa Huang (JHU) re-started our conversations about adversarial attacks against popular machine-learning methods in astrophysics. We started this project (ages ago, now) thinking about test-time attacks: You have a trained model, how does it fail you at test time? But since then, we have learned a huge amount about training-time attacks: If you add a tiny change to your training data, can you make a huge change to your model? I think some machine-learning methods popular in astronomy are going to be very susceptible to both kinds of attacks!

When we discussed these ideas in the before times, one of the objections was that adversarial attacks are artificial and meaningless. I don't agree: If a model can be easily attacked, it is not robust. If you get a strange and interesting result in a scientific investigation when you are using such a model, how do you know you didn't just get accidentally pwned by your noise draw? Since—in the natural sciences—we are trying to learn how the world works, we can't be putting in model components or pipeline components that are capable of leading us very seriously astray.


how accurately can we do closed-loop robot control?

Conor Sayres (UW) and I spoke again today about the fiber positioning system (fiber robots) that lives in the two focal planes of the two telescopes that take data for SDSS-V. One of the many things we talked about is how precisely we need to position the fibers, and how accurately we will be able to observe their positions in real time. It's interestingly marginal; the accuracy with which the focal-plane viewing system (literally a camera in the telescope that looks the wrong way) will be able to locate the fiber positions depends on details that we don't yet know about the camera, the optics, the fiducial-fiber illumination system, and so on. There are different kinds of sensible procedures for observatory operations that depend very strongly on the accuracy of the focal-viewing system.


vectors and scalars

If you have a set of vectors, what are all the scalar functions you can make from those vectors? That is a question that Soledad Villar (JHU) and I have been working on for a few days now. Our requirement is that the scalar be rotationally invariant; that is, the scalar function must not change as you rotate the coordinate system. Today Villar proved a conjecture we had: any scalar function of the vectors that is rotationally invariant can only depend on the scalar products (dot products) of the vectors. That is, you can replace the vectors with all their dot products and the result is just as expressive.
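A quick numerical illustration of the easy direction of that statement, that the dot products (the Gram matrix) are unchanged by rotation (the random orthogonal matrix comes from a QR decomposition):

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.normal(size=(5, 3))   # five 3-vectors
gram = vectors @ vectors.T          # all pairwise dot products

# rotate all the vectors by a random orthogonal matrix Q
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
rotated = vectors @ Q.T

# the Gram matrix is invariant, so any function of it is invariant too
assert np.allclose(rotated @ rotated.T, gram)
```

The hard direction, that the dot products are all you can ever need, is the part Villar proved.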

After that proof, we argued about vector functions of a set of vectors. Here it turns out that there are many more options if you want your answer to be equivariant (not invariant but equivariant) to rotations than if you want your answer to be equivariant to rotations and parity swaps. We still don't know what our options are, but because it's so restrictive, I think parity is a good symmetry to include.


the theory is a grid

I had a great conversation with Andy Casey (Monash) at the end of the day. We discussed many things related to APOGEE and SDSS-V. One of the things I need is the code that makes the synthetic (physical model) spectra for the purposes of obtaining parameter estimates in APOGEE and the derivatives of that model with respect to stellar parameters. That is, I want the physical-model derivatives of spectral expectation with respect to parameters (like temperature, surface gravity, and composition). It turns out that, at this point, the model is a set of synthetic spectra generated on a grid in parameter space! So the model is the grid, and the derivatives are the slopes of a cubic-spline interpolation (or something like that). I have various issues with this, but I'll be fine.
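A one-dimensional cartoon of "the model is the grid" (the temperature grid, the fluxes, and the use of a cubic spline are all stand-ins I made up, not the actual APOGEE machinery):

```python
import numpy as np
from scipy.interpolate import CubicSpline

# hypothetical 1-D slice of a synthetic-spectrum grid: the flux at one
# wavelength pixel, tabulated on a grid of effective temperatures
teff_grid = np.linspace(4000., 6000., 9)
flux_grid = 1.0 - 0.3 * np.exp(-0.5 * ((teff_grid - 5000.) / 400.) ** 2)

# the "model" is interpolation on the grid; the "derivative" is the
# slope of the interpolating spline, not a derivative of the physics
spline = CubicSpline(teff_grid, flux_grid)
teff = 5100.
model_flux = spline(teff)        # interpolated model prediction
dflux_dteff = spline(teff, 1)    # first derivative from the spline
```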


Dr Shengqi Yang

I've had the pleasure of serving on the PhD committee of Shengqi Yang (NYU) who defended her PhD today. She worked on a range of topics in cosmological intensity mapping, with a concentration on the aspects of galaxy evolution and galaxy formation that are important to understand in connecting the intensity signal to the cosmological signal. But her thesis was amazingly broad, including theoretical topics and making observational measurements, and also ranging from galaxy evolution to tests of gravity. Great stuff, and a well-earned PhD.


Dr Jason Cao

Today Jason Cao (NYU) defended his PhD on the galaxy–halo connection in cosmology. He has built a version of subhalo abundance matching with a tunable stochastic component, so he can tune the information content in the galaxies about their host halos. This freedom in the halo occupation permits the model to match more observations, and it is sensible. He also explored a bit the properties of the dark-matter halos that might control halo occupation, but he did so observationally, using satellite occupation as a tracer of halo properties. These questions are all still open, but he did a lot of good work towards improving the connection between the dark sector and the observed galaxy populations. Congratulations, Dr Cao; welcome to the community of scholars!


finding the fiber robots in the SDSS-V focal planes

At the end of the day I met with Conor Sayres (UW) to discuss the problem of measuring the position of focal-plane fiber-carrying robots given images from in-telescope cameras (focal viewing cameras) inside the telescopes that are operating the SDSS-V Project. We have not installed the fiber robots yet, but Sayres has a software mock-up of what the focal viewing camera will see and all its optics. We also discussed some of the issues we will encounter in commissioning and operation of this viewing system.

Later, in the night, I worked on data-driven transformations between focal-plane position (in mm) in the telescope focal plane and position in the focal viewing camera detector plane (in pixels). I followed the precepts and terminology described in this paper on interpolation-like problems. My conclusion (which agrees with Sayres's) is that if these simulations are realistic, the fitting will work well, and we will indeed know pretty precisely what all the fiber robots are doing.
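A sketch of the kind of data-driven fit I mean, with a hypothetical low-order polynomial basis mapping focal-plane (x, y) in mm to detector (u, v) in pixels (the real choice of basis follows the precepts in the paper; this is linear least squares on made-up terms):

```python
import numpy as np

def design_matrix(xy):
    """Low-order 2-D polynomial basis for the mm-to-pixel map
    (a hypothetical choice of terms, for illustration)."""
    x, y = xy[:, 0], xy[:, 1]
    return np.stack([np.ones_like(x), x, y, x * y, x**2, y**2], axis=1)

def fit_transform(xy_mm, uv_pix):
    """Linear least squares for the coefficients of each pixel axis."""
    A = design_matrix(xy_mm)
    coeffs, *_ = np.linalg.lstsq(A, uv_pix, rcond=None)
    return coeffs

def apply_transform(xy_mm, coeffs):
    return design_matrix(xy_mm) @ coeffs
```

Because the model is linear in its coefficients, the fit is a single least-squares solve per axis, and held-out fiducial fibers give a direct check of the accuracy.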


problems for gauge-invariant GNNs

Today Kate Storey-Fisher (NYU) and I spent more time working with Weichi Yao (NYU) and Soledad Villar (JHU) on creating a good, compact, but real test problem for gauge-invariant graph neural networks. We discussed a truly placeholder toy example in which we ask the network to figure out the identity of the most-gravitationally-bound point in a patch of a simulation. And we discussed a more real problem of inferring things about the occupation or locations of galaxies within the dark-matter field. Tomorrow Storey-Fisher and I will look at the IllustrisTNG simulations, which she has started to dissect into possible patches for Yao's model.


unwinding a spiral

A lot of conversations in the Dynamics group at Flatiron recently have been about spirals: Spirals in phase space, spirals in the disk, even spirals in the halo. In general, as a perturbed dynamical system (like a galaxy or a star cluster) evolves towards steady-state, it goes through one or more spiral phases. We've (collectively) had an interest in unwinding these spirals, to infer the initial conditions or meta-data about the events that caused the disequilibrium and spiral-winding. Jason Hunt (Flatiron) discussed these problems with Adrian Price-Whelan (Flatiron) and me today, showing some attempts to unwind (what I call) The Snail. That led to a long conversation about what would make a good “loss function” for unwinding. If something was unwinding well, how would we know? That led to some deep conversations.


geometric data analysis

I got some real hacking time in this afternoon with Gaby Contardo (Flatiron). We worked through some of the code issues and some of the conceptual issues behind our methods for finding gaps in point clouds using (what I call) geometric data analysis, in which we find critical points (saddles, minima, maxima) and trace their connections to map out valleys and ridges. We worked out a set of procedures (and tested some of them) to find critical points, join them up with constrained gradient descents, and label the pathways with local meta-data that indicate how “gappy” they are.


extremely precise spectrophotometric distances

Adrian Price-Whelan is building a next-generation spectrophotometric distance estimation method that builds on things that Eilers, Rix, and I did many moons ago. Price-Whelan's method splits the stars up in spectrophotometric space and builds local models for different kinds of stars. But within those local patches, it is very similar to what we've done before, just adding some (very much) improved regularization and a (very much) improved training set. And now it looks like we might be at the few-percent level in terms of distance precision! If we are, then the entire red-giant branch might be just as good for standard-candlyness as the red clump. This could really have a big impact on SDSS-V. We spent part of the day making decisions about spectrophotometric neighborhoods and other methodological hyper-parameters.


how to calibrate fiber robots?

Today we had the second in a series of telecons to discuss how we get, confirm, adjust, and maintain the mapping, in the SDSS-V focal planes (yes there are two!) between the commands we give to the fiber-carrying robots and the positions of the target stellar images. It's a hard problem! As my loyal reader might imagine, I am partial to methods that are fully data-driven, and fully on-sky, but their practicality depends on a lot of prior assumptions we need to make about the variability and flexibility of the system. One thing we sort-of decided is that it would be good to get together a worst-case-scenario plan for the possibility that we install these monsters and we can't find light down the fibers.


re-scoping our gauge-invariant GNN project

I am in a project with Weichi Yao (NYU) and Soledad Villar (JHU) to look at building machine-learning methods that are constrained by the same symmetries as Newtonian mechanics: Rotation, translation, Galilean boost, and particle exchange, for example. Kate Storey-Fisher (NYU) joined our weekly call today, because she has ideas about toy problems we could use to demonstrate the value of encoding these symmetries. She steered us towards things in the area of “halo occupation”, or the question of which dark-matter halos contain what kinds of galaxies. Right now halo occupation is performed with very blunt tools, and maybe a sharp tool could do better? We would have the advantage (over others) that anything we found would, by construction, obey the fundamental symmetries of physical law.


domain adaptation and instrument calibration

At the end of the day I had a wide-ranging conversation with Andy Casey (Monash) about all things spectroscopic. I mentioned to him my new interest in domain adaptation, and whether it could be used to build data-driven models. The SDSS-V project has two spectrographs, at two different telescopes, each of which observes stars down different fibers (which have their own idiosyncrasies). Could we build a data-driven model to see what any star observed down one fiber of one spectrograph would look like if it had been observed down any other fiber, or any fiber of the other spectrograph? That would permit us to see what systematics are spectrograph-specific, and whether we would have got the same answers with the other spectrograph, and other questions like that.

There are some stars observed multiple times and by both observatories, but I'm kind-of interested in whether we could do better using the huge number of stars that haven't been observed twice instead. Indeed, it isn't clear which contains more information about the transformations. Another fun thing: The northern sky and the southern sky are different! We would have to re-build domain adaptation to be sensitive to those differences, which might get into causal-inference territory.


The Practice of Astrophysics (tm)

Over the last few weeks—and the last few decades—I have had many conversations about all the things that are way more important to being a successful astrophysicist than facility with electromagnetism and quantum mechanics: There's writing, and mentoring, and project design, and reading, and visualization, and so on. Today I fantasized about a (very long) book entitled The Practice of Astrophysics that covers all of these things.


best setting of hyper-parameters

Adrian Price-Whelan (Flatiron) and I encountered an interesting conceptual point today in our distance estimation project: When you are doing cross-validation to set your hyper-parameters (a regularization strength in this case), what do you use as your validation scalar? That is, what are you optimizing? We started by naively optimizing the cost function, which is something like a weighted L2 of the residual and an L2 of the parameters. But then we switched from the cost function to just the data part (not the regularization part) of the cost function, and everything changed! The point is duh, actually, when you think about it from a Bayesian perspective: You want to improve the likelihood not the posterior pdf. That's another nice point for my non-existent paper on the difference between a likelihood and a posterior pdf. It also shows that, in general, the data and the regularization will be at odds.
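A toy version of the point, with ridge regression on a made-up linear problem: scoring the validation set with the full cost (data term plus penalty, ~negative log posterior) behaves very differently from scoring with the data term alone (~negative log likelihood), especially at strong regularization:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 60, 10
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(0, 0.5, size=n)
X_tr, y_tr, X_va, y_va = X[:40], y[:40], X[40:], y[40:]

def ridge_fit(X, y, lam):
    # penalized linear least squares: the optimum of the full (regularized) cost
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

scores = {}
for lam in (1e-3, 1e-1, 1e1, 1e3):
    w = ridge_fit(X_tr, y_tr, lam)
    data_term = np.sum((y_va - X_va @ w) ** 2)     # ~ negative log likelihood
    full_cost = data_term + lam * np.sum(w ** 2)   # ~ negative log posterior
    scores[lam] = (data_term, full_cost)
    print(lam, data_term, full_cost)
```

Cross-validating on `data_term` correctly punishes the over-strong regularization; cross-validating on `full_cost` mixes in the penalty and can prefer a different (wrong) hyper-parameter.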


strange binary star system; orbitize!

Sarah Blunt (Caltech) crashed Stars & Exoplanets Meeting today. She told us about her ambitious, community-built orbitize project, and also results on a mysterious binary-star system, HD 104304. This is a directly-imaged binary, but when they took radial-velocity measurements, the mass of the primary came out way too high for its color and luminosity. The beauty of orbitize is that it can take heterogeneous data, and it uses brute-force importance sampling (like my one true love The Joker), so she can deal with very non-trivial likelihood functions and low signal-to-noise, sparse data.

The crowd had many reactions, one of which is that probably the main issue is that ESA Gaia is giving a wrong parallax. That's a boring explanation, but it opens a nice question of using the data to infer or predict a distance, which is old-school fundamental astronomy.


causal-inference issues

I had a nice meeting (in person, gasp!) with Alberto Bolatto (Maryland) about his beautiful results in the EDGE-CALIFA survey of galaxies, and (yes) patches of galaxies. Because they have an IFU, they can look at relationships between gas, dust, composition, temperature, star-formation rate, mean stellar age, and so on, both within and across galaxies. He asked me about some difficult situations in understanding empirical correlations in a high dimensional space, and (even harder) how to derive causal conclusions. As my loyal reader might guess, I wasn't much help! I handed him a copy of Regression and Other Stories and told him that it's going to get harder before it gets easier! But damn what a beautiful data set.


is the simulation hypothesis a physics question?

Against my better judgement, I am writing a paper on the question of whether we live inside a computer simulation. Today I was discussing this with Paula Seraphim (NYU), who has been doing research with me on this subject. We decided to re-scope the paper around the question “Is the simulation hypothesis a physics question?” instead of the direct question “Do we live in a simulation?”, which can't be answered very satisfactorily. But I think when you flow it down, you conclude that this question is, indeed, a physics question! And the simulation hypothesis motivates searches for new physics in much the same way that dark matter and inflation do: The predictions are not specific, but there are general signatures to look for.


stating a transfer-learning problem

I am trying to re-state the problem of putting labels on SDSS-IV APOGEE spectra as a transfer learning problem, since the labels come from (slightly wrong) stellar models. Or maybe domain adaptation. But the form of the problem we face in astronomy is different from that faced in most domain-adaptation contexts. The reasons are: The simulated stars are on a grid, not (usually) drawn from a realistically correct distribution. There are only labels on the simulated data, not on the real data (labels only get to real data through simulated data). And there are selection effects and noise sources that are unique to astronomy.


geometry of gradients and second derivatives

Building on conversations we had yesterday about the geometry and topology of gradients of a scalar field, Gaby Contardo (Flatiron) and I worked out at the end of the day today that valleys of a density field (meaning here a many-times differentiable smooth density model in some d-dimensional space) can be traced by looking for paths along which the density gradient has zero projection onto the principal component (largest-eigenvalue eigenvector) of the second-derivative tensor (the Hessian, to some). We looked at some toy-data examples and this does look promising as a technique for tracing or finding gaps or low-density regions in d-dimensional point clouds.
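The criterion above can be checked numerically on a made-up two-blob density in 2d (all the function names here are mine, and all derivatives are finite differences): on the valley between the blobs, the gradient has essentially zero projection onto the top Hessian eigenvector; off the valley, it doesn't:

```python
import numpy as np

def rho(p):
    # toy smooth density: two Gaussian blobs at (+2, 0) and (-2, 0)
    a = p - np.array([2.0, 0.0])
    b = p + np.array([2.0, 0.0])
    return np.exp(-0.5 * a @ a) + np.exp(-0.5 * b @ b)

def grad(p, h=1e-5):
    # central-difference gradient
    g = np.zeros(2)
    for i in range(2):
        e = np.zeros(2); e[i] = h
        g[i] = (rho(p + e) - rho(p - e)) / (2 * h)
    return g

def hessian(p, h=1e-4):
    # central-difference second-derivative tensor
    H = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            ei = np.zeros(2); ei[i] = h
            ej = np.zeros(2); ej[j] = h
            H[i, j] = (rho(p + ei + ej) - rho(p + ei - ej)
                       - rho(p - ei + ej) + rho(p - ei - ej)) / (4 * h * h)
    return H

def valley_stat(p):
    # |projection of the gradient onto the largest-eigenvalue Hessian eigenvector|
    g = grad(p)
    w, V = np.linalg.eigh(hessian(p))
    return abs(g @ V[:, np.argmax(w)])

print(valley_stat(np.array([0.0, 1.0])))  # on the valley (x = 0): ~0
print(valley_stat(np.array([0.7, 1.0])))  # off the valley: clearly nonzero
```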


massive revision of a manuscript

Teresa Huang's paper with Soledad Villar and me got a very constructive referee report, which led to some discoveries, which led to more discoveries, which led to a massive revision and increase in scope. And all under deadline, as the journal gave us just 5 weeks to respond. It is a really improved paper, thanks to Huang's great work and the referee's inspiration. Today we went through the changes. It's hard to take a paper through a truly major revision: Everything has to change, including the parts that didn't change! Because: Writing!


fast multipole methods meet graph neural networks

Today Soledad Villar (JHU) and I discussed the possibility of building something akin to a graph neural network, but one that takes advantage of the n log(n) scaling of a fast-multipole-method hierarchical summary graph. The idea is to make highly connected or fully connected graph neural networks fast through the same trick that makes the FMM fast: Nearby points in the graph talk precisely, but distant parts talk through summaries in a hierarchical set of summary boxes. We think there is a chance this might work, in the context of the work we are doing with Weichi Yao (NYU) on gauge-invariant graph neural networks. The gauge invariance is such a strict symmetry, it might permit transmitting information from distant parts of the graph through summaries, while still preserving full (or great) generality. We have yet to figure it all out, but we spent a lot of time drawing boxes on the board.


we were wrong about the lower main sequence

I wrote ten days ago about a bimodality in the lower main sequence that Hans-Walter Rix (MPIA) found a few weeks ago. I sent it to some luminaries and my very old friend John Gizis (Delaware) wrote back saying that it might be issues with the ESA Gaia photometry. I argued back at him, saying: Why would you not trust the Gaia photometry, it is the world's premier data on stars? He agreed, and we explored issues of stellar variability, spectroscopy, and kinematics. But then, a few days later, Gizis pointed me at figure 29 in this paper. It looks like we just rediscovered a known data issue. Brutal! But kudos to Gizis for his great intuition.


two kinds of low-mass stars

I showed the Astronomical Data Group meeting the bifurcation in the lower main sequence that Hans-Walter Rix (MPIA) found a few weeks ago. Many of the suggestions from the crew were around looking at photometric variability: Does one population show different rotation or cloud cover or etc than the other?


not much to report

not much! Funny how a day can be busy but not involve any things that I'd call research.


no more randoms

In large-scale-structure projects, when galaxy (or other tracer) clustering is measured in real space, the computation involves spatial positions of the tracers, and spatial positions of a large set of random points, distributed uniformly (within the window function). These latter points can be thought of as a comparison population. However, it is equally true that they can be thought of as performing some simple integrals by Monte Carlo method. If you see them that way—as a tool for integrating—it becomes obvious that there must be far better and far faster ways to do this! After all, non-adaptive Monte Carlo methods are far inferior to even stupidly adaptive schemes. I discussed all this with Kate Storey-Fisher (NYU) yesterday and today.
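To make the integration view concrete: for a smooth integrand, uniform random points converge like N^(-1/2), while even the dumbest deterministic quadrature with the same number of function evaluations does far better. A toy 1d comparison (nothing here is LSS-specific; the integrand is just an invented smooth bump):

```python
import numpy as np

rng = np.random.default_rng(1)

def trap(fx, x):
    # plain trapezoid rule, written out to stay library-agnostic
    return np.sum((fx[1:] + fx[:-1]) * np.diff(x)) / 2.0

# a smooth stand-in for the kind of integral the randoms are doing
f = lambda x: np.exp(-0.5 * (x - 0.3) ** 2 / 0.04)
xx = np.linspace(0.0, 1.0, 200001)
truth = trap(f(xx), xx)  # effectively exact reference value

errs = {}
for n in (1000, 10000, 100000):
    mc = f(rng.uniform(0, 1, n)).mean()  # "random points" (Monte Carlo) estimate
    xg = np.linspace(0.0, 1.0, n)
    quad = trap(f(xg), xg)               # same n evaluations, on a grid
    errs[n] = (abs(mc - truth), abs(quad - truth))
    print(n, errs[n])
```

The Monte Carlo column shrinks like N^(-1/2); the quadrature column shrinks like N^(-2). Real clustering estimators involve window functions in more dimensions, but the asymmetry is the point.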


writing about data-driven spectrophotometric distances

I wrote like mad in the paper that describes what Adrian Price-Whelan (Flatiron) and I are currently doing to estimate stellar distances using SDSS-IV APOGEE spectra (plus photometry). I wrote a long list of assumptions, with names. As my loyal reader knows, my position is that if you get the assumptions written down with enough specificity, the method you are doing becomes the only thing you can do. Or else maybe you should re-think that method?


setting hyper-parameters

Adrian Price-Whelan (Flatiron) and I are working on data-driven distances for stars in the SDSS-IV APOGEE data. There are many hyper-parameters of our method, including the number K of leave one-Kth-out splits of the data, the regularization amplitude we apply to the spectral part of the model (it's a generalized linear model), and the infamous Gaia parallax zero-point. These are just three of many, but they span an interesting range. One is purely operational, one restricts the fit (introduces bias, deliberately), and one has a true value that is unknown. How to optimize for each of these? It will be different in each case, I expect.


a split in the main sequence?

I did some actual, real-live sciencing this weekend, which was a pleasure. I plotted a part of the lower main sequence in ESA Gaia data where Hans-Walter Rix (MPIA) has found a bimodality that isn't previously known (as far as we can tell). I looked at whether the two different kinds of stars (on each side of the bimodality) are kinematically different and it doesn't seem like it. I sent the plots to some experts to ask for advice about interpretation; this is out of scope for both Rix and me!


predicting the future of a periodic variable star

Gaby Contardo (Flatiron) showed me an amazingly periodic star from the NASA Kepler data a few days ago, and today she showed me the results of trying to predict points in the light curve from prior points in the light curve (like in a recurrent method). When the star is very close to periodic, and when the region of the star used to predict a new data point is comparable in length to the period or longer, then even linear regression does a great job! This all relates to auto-regressive processes.
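A minimal version of the experiment, on a fake two-harmonic periodic signal (period, harmonics, and noise all invented): with a window longer than the period, ordinary least squares predicts each point at close to the noise level:

```python
import numpy as np

rng = np.random.default_rng(3)

# fake, nearly periodic "light curve": period 37 samples, two harmonics, small noise
t = np.arange(3000)
period, sigma = 37, 0.02
y = np.sin(2 * np.pi * t / period) + 0.3 * np.sin(4 * np.pi * t / period)
y = y + sigma * rng.normal(size=t.size)

# predict y[i] from the preceding `window` points with plain linear regression
window = 50  # longer than the period
X = np.array([y[i - window:i] for i in range(window, len(y))])
z = y[window:]
n_train = len(z) // 2
coef, *_ = np.linalg.lstsq(X[:n_train], z[:n_train], rcond=None)
rms = np.sqrt(np.mean((z[n_train:] - X[n_train:] @ coef) ** 2))
print(rms, np.std(y))  # prediction error near the noise level, far below the signal scale
```

This is exactly the auto-regressive point: a strictly periodic signal satisfies a linear recurrence, so a linear predictor with enough history is essentially perfect.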


we have new spectrophotometric distances

After a couple of days of hacking and data munging—and looking into the internals of Jax—Adrian Price-Whelan and I produced stellar distance estimates today for a few thousand APOGEE spectra. Our method is based on this paper on linear models for distance estimation with some modifications inspired by this paper on regression. It was gratifying! Now we have hyper-parameters to set and validation to do.


what is a bolometric correction?

Today Katie Breivik (Flatiron) asked me some technical questions about the bolometric correction. It's related to the difference between a relative magnitude in a bandpass and the relative magnitude you would get if you were using a very (infinitely) broad-band bolometer. Relative magnitudes are good things (AB magnitudes, in contrast, are bad things, but that's for another post): They are relative fluxes between the target and a standard (usually Vega). If your target is hotter than Vega, and you choose a very blue bandpass, the bandpass magnitude of the star will be smaller (relatively brighter) than the bolometric magnitude. If you choose a very red bandpass, the bandpass magnitude will be larger (relatively fainter) than the bolometric magnitude. That's all very confusing.

And bolometric is a horrible concept, since most contemporary detectors are photon-counting and not bolometric (and yes, that matters: the infinitely-wide filter on a photon-counting device gives a different relative magnitude than the infinitely-wide filter on a bolometer). I referred Breivik to this horrifying paper for unpleasant details.
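A quick numerical illustration of that last point, using blackbodies for the target and a Vega-like standard (the temperatures and wavelength range here are made up): the energy-weighted ("bolometer") and photon-weighted ("photon-counting") relative magnitudes disagree even for an extremely wide band:

```python
import numpy as np

h, c, k = 6.626e-34, 2.998e8, 1.381e-23  # SI

def planck(lam, T):
    # blackbody flux density per unit wavelength (arbitrary normalization; it cancels)
    return 1.0 / (lam ** 5 * (np.exp(h * c / (lam * k * T)) - 1.0))

def trap(fx, x):
    # plain trapezoid rule
    return np.sum((fx[1:] + fx[:-1]) * np.diff(x)) / 2.0

lam = np.linspace(100e-9, 10e-6, 100000)  # a very (but not infinitely) wide band
f_star = planck(lam, 12000.0)  # hypothetical target, hotter than the standard
f_std = planck(lam, 9600.0)    # Vega-ish standard

# bolometer: weight by energy; photon counter: weight by photon number (~ lam)
m_bolo = -2.5 * np.log10(trap(f_star, lam) / trap(f_std, lam))
m_phot = -2.5 * np.log10(trap(f_star * lam, lam) / trap(f_std * lam, lam))
print(m_bolo, m_phot)  # different, even with this enormous bandpass
```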


low-pass filter for non-uniformly sampled data

Adrian Price-Whelan (Flatiron) and I used the new FINUFFT non-uniform fast Fourier transform code to build a low-pass filter for stellar spectra today. The idea is: There can't be any spectral information in the data at spectral resolutions higher than the spectrograph resolution. So we can low-pass filter in the log-wavelength domain and that should enforce finite spectral resolution. The context is: Making features to use in a regression or other machine-learning method. I don't know, but I think this is a rare thing: A low-pass filter that doesn't require uniformly or equally-spaced sampling in the x direction or time domain.
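For intuition, here is a crude stand-in for what such a filter does—a direct least-squares fit of a truncated Fourier series, not FINUFFT, and not fast—on invented non-uniformly sampled data: it removes structure above a chosen frequency even though the samples are irregular:

```python
import numpy as np

rng = np.random.default_rng(7)

# non-uniformly sampled "spectrum": a smooth signal plus high-frequency junk
x = np.sort(rng.uniform(0.0, 1.0, 300))
smooth = np.sin(2 * np.pi * 2 * x)
junk = 0.5 * np.sin(2 * np.pi * 40 * x)
y = smooth + junk

# low-pass by least-squares fit of a truncated Fourier series (direct, slow)
kmax = 5  # keep modes up to this frequency only
ks = np.arange(kmax + 1)
A = np.hstack([np.cos(2 * np.pi * np.outer(x, ks)),
               np.sin(2 * np.pi * np.outer(x, ks[1:]))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
y_lp = A @ coef

rms = np.sqrt(np.mean((y_lp - smooth) ** 2))
print(rms)  # small: the k = 40 junk is (mostly) gone
```

FINUFFT makes this kind of operation fast at scale; the direct solve above is just the transparent version of the same idea.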


correcting wrong simulations, linear edition

Soledad Villar (JHU) and I spent some time today constructing (on paper) a model to learn simultaneously from real and simulated data, even when the simulations have large systematic problems. The idea is to model the joint distribution of the real data, the simulated data, and the parameters of the simulated data. Then, using that model, infer the parameters that are most appropriate for each real data point. The problem setup has two modes. In one (which applies to, say, the APOGEE stellar spectra), there is a best-fit simulation for each data example. In the other, there is an observed data set (say, a cosmological large-scale structure survey) and many simulations that are relevant, but don't directly correspond one-to-one. We are hoping we have a plan for either case. One nice thing is: If this works, we will have a model not just for APOGEE stellar parameter estimation, but also for the missing physics in the stellar atmosphere simulations!


stellar flares

Gaby Contardo (Flatiron) and I have been trying to construct a project around light curves, time domain, prediction, feature extraction, and the arrow of time, for months now. Today we decided to look closely at a catalog of stellar flares (which are definitely time-asymmetric) prepared by Jim Davenport (UW). Can we make a compact or sparse representation? Do they cluster? Do those properties have relationships with stellar rotation phase or other context?


astronomy in film

One of my jobs at NYU is as an advisor to student screenwriters who are writing movies that involve science and technology. I didn't get much research done today, but I had a really interesting and engaging conversation with film-writers Yuan Yuan (NYU) and Sharon Lee (NYU) who are writing a film that involves the Beijing observatory, the LAMOST project, and the Cultural Revolution. I learned a lot in this call!


when do you ever sum over all the entries in a matrix?

Imagine you have $n$ measurements of a quantity $y$. What is your best estimate of the value of $y$? It turns out that if you have an estimate for the covariance matrix of $y$, the information in (expected inverse variance from) your $n$ data points is given by the sum of the entries of the inverse of that covariance matrix. This fact is obvious in retrospect, but also confused me, since this is such a non-coordinate-free thing to do to a matrix!
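A quick check of the claim: the generalized-least-squares estimate of $y$ has weights $C^{-1}\mathbf{1} / (\mathbf{1}^\top C^{-1}\mathbf{1})$, and its inverse variance $\mathbf{1}^\top C^{-1}\mathbf{1}$ is exactly the sum of the entries of $C^{-1}$. A simulation with a made-up covariance matrix confirms it:

```python
import numpy as np

rng = np.random.default_rng(11)

# a valid (positive-definite) covariance matrix for n correlated measurements
n = 5
L = rng.normal(size=(n, n))
C = L @ L.T + n * np.eye(n)
Cinv = np.linalg.inv(C)

# optimal (generalized-least-squares) weights for estimating the single value y
ones = np.ones(n)
w = Cinv @ ones / (ones @ Cinv @ ones)

# simulate many data sets and check the estimator's variance empirically
y_true, Lc = 3.0, np.linalg.cholesky(C)
ests = [w @ (y_true + Lc @ rng.normal(size=n)) for _ in range(20000)]
print(np.var(ests), 1.0 / Cinv.sum())  # inverse variance = sum of C^{-1} entries
```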


why make a dust map? and Bayesian model elaboration

Lauren Anderson (Carnegie) and I had a wide-ranging conversation today. But part of it was about the dust map: We have a project with statisticians to deliver a three-dimensional dust map, using a very large Gaussian-process model. Right now the interesting parts of the project are around model checking and model elaboration: How do you take a model and decide what's wrong with it, in detail? Meaning: Not compare it to other models (that's a solved problem, in principle), but rather, compare it to the data and see where it would benefit from improvement.

One key idea for model elaboration is to check the parts of the model you care about and see if those aspects are working well. David Blei (Columbia) told us to climb a mountain and think on this matter, so we did, today. We decided that our most important goals are (1) to deliver accurate extinction values to stellar targets, for our users, and (2) to find interesting dust structures (like spiral arms) if they are there in the data.

Now the challenge is to convert these considerations into posterior predictive checks that are informative about model assumptions. The challenge is that, in a real-data Bayesian inference, you don't know the truth! You just have your data and your model.


best RV observing strategies

I really solidly did actual coding today on a real research problem, which I have been working on with Megan Bedell (Flatiron) for a few years now. The context is: extreme precision radial-velocity surveys. The question is: Is there any advantage to taking one observation every night relative to taking K observations every K nights? I succeeded!

I can now show that the correlations induced in adjacent observations by asteroseismic p-modes make it advantageous to do K observations every K nights. Why? Because you can better infer the center-of-mass motion of the star with multiple, coherently p-mode-shifted observations. The argument is a bit subtle, but it will have implications for Terra Hunting and EXPRES and other projects that are looking for long-period planets.
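A cartoon of the argument, with a made-up damped-oscillation kernel standing in for the p-mode noise (every number here is invented): if you know the noise covariance, observations clustered within a night and spaced at half the mode period are anti-correlated, and carry far more information about the center-of-mass velocity (the inverse variance, which is the sum of the entries of the inverse covariance) than the same number of observations on separate nights:

```python
import numpy as np

def pmode_cov(times, a=1.0, P=5.0, tau=30.0, sigma=0.3):
    # toy p-mode noise kernel (times in minutes): damped oscillation plus white noise
    dt = times[:, None] - times[None, :]
    K = a ** 2 * np.cos(2 * np.pi * dt / P) * np.exp(-0.5 * dt ** 2 / tau ** 2)
    return K + sigma ** 2 * np.eye(len(times))

def information(times):
    # inverse variance on the mean (center-of-mass) velocity under this covariance
    ones = np.ones(len(times))
    return ones @ np.linalg.solve(pmode_cov(times), ones)

minutes_per_night = 1440.0
clustered = np.arange(4) * 2.5               # 4 exposures in one night, half-period spacing
spread = np.arange(4) * minutes_per_night    # 1 exposure on each of 4 nights

print(information(clustered), information(spread))
```

The clustered schedule wins because the generalized-least-squares weights can exploit the anti-correlations to cancel the p-mode noise; spread-out observations see independent p-mode realizations and can only average them down.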


EPRV capabilities

I had a conversation with Jacob Bean (Chicago) and Ben Montet (UNSW) about various radial-velocity projects we have going. We spent some time talking about what projects are better for telescopes of different apertures, and whether there is any chance the EPRV community could be induced to work together. I suggested that the creation of a big software effort in EPRV could bring people together, and help all projects. We also talked about data-analysis challenges for different kinds of spectrographs. One project we are going to do is get a gas cell component added in to the wobble model. I volunteered Matt Daunt (NYU) in his absence.


asteroseismic p-mode noise mitigation

I had a call with part of the HARPS3 team today, the sub-part working on observations of the Sun. Yes, Sun. That got us arguing about asteroseismic modes and me claiming that there are better approaches for ameliorating p-mode noise in extreme precision radial-velocity measurements than setting your exposure times carefully to null the modes. The crew asked me to get specific, so I had a call with Bedell (Flatiron) later in the day to work out what we need to assemble. The issues are about correlated noise: Asteroseismic noise is correlated; those correlations can be exploited for good, or ignored for bad. That's the argument I have to clearly make.


the Lasso as an optimizer

In group meeting, and other conversations today, I asked about how to optimize a very large parameter vector, when my problem is convex but has an L1 penalty term. Both Gaby Contardo (Flatiron) and Soledad Villar (JHU) said: Use the standard Lasso optimizer. At first I thought “but my problem doesn't have exactly the Lasso form!”. But then I realized that it is possible to manipulate the operators I have so that it has exactly the Lasso form, and then I can just use a standard Lasso optimizer! So I'm good and I can proceed.
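One common version of the manipulation is just a substitution: if the penalty is an L1 norm of W x with W invertible, set z = W x, so that min_x ||A x − b||² + λ ||W x||₁ becomes a standard Lasso in z. A sketch with scikit-learn's Lasso and random made-up matrices (this is not my actual problem, just the shape of the trick):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(5)

# hypothetical problem: L2 data term plus an L1 penalty on W @ x, W invertible
n, p = 50, 20
A = rng.normal(size=(n, p))
b = rng.normal(size=n)
W = np.eye(p) + 0.1 * rng.normal(size=(p, p))  # some invertible penalty operator

# substitute z = W x: the problem becomes a *standard* Lasso in z
A_tilde = A @ np.linalg.inv(W)
model = Lasso(alpha=0.1, fit_intercept=False, max_iter=50000)
model.fit(A_tilde, b)
x_hat = np.linalg.solve(W, model.coef_)  # map back: x = W^{-1} z

print(np.count_nonzero(model.coef_), p)  # the solution is sparse in z
```

(Note that scikit-learn's Lasso scales the data term by 1/(2n), so λ and `alpha` differ by a constant factor.)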


can we turn an image into a colormap?

I talked to Kate Storey-Fisher (NYU) about a beautiful rainbow quartz rock that she has: It is filled with colors, in a beautiful geological palette. Could we turn this into a colormap for making plots, or a set of colormaps? We discussed considerations.


constructing a bilinear dictionary method for light curves

After having many conversations with Gaby Contardo (Flatiron) and Christina Hedges (Ames) about finding events of various kinds in stellar light curves (from NASA Kepler and TESS), I was reminded of dictionary methods, or sparse-coding methods. So I spent some time writing down a possible sparse-coding approach for Kepler light curves, and even a bit of time writing some code. But I think we probably want something more general than the kind of bilinear problem I find it easy to write down: I am imagining a set of words, and a set of occurrences (and amplitudes) of those words in the time domain. But real events will have other parameters (shape and duration parameters), which suggests using more nonlinear methods.


discovering and measuring horizon-scale gradients in large-scale structure

Kate Storey-Fisher (NYU) and I are advising an undergraduate research project for Abby Williams (NYU) in cosmology. Williams is looking at the question: How precisely can we say that the large-scale structure in the Universe is homogeneous? Are there gradients in the amplitude of galaxy clustering (or other measures)? Her plan is to use Storey-Fisher's new clustering tools, which can look at variations in clustering without binning or patchifying the space. In the short term, however, we are starting in patches, just to establish a baseline. Today things came together and Williams can show that if we simulate a toy universe with a clustering gradient, she can discover and accurately measure that gradient, using analyses in patches. The first stage of this is to do some forecasts or information theory.


machine learning and ODEs

Today Soledad Villar (JHU) and I discussed different ways to structure a machine-learning method for a cosmological problem: The idea is to use the machine-learning method to replace or emulate a cosmological simulation. This is just a toy problem; of course I'm interested in data analysis, not theory, in the long run. But we realized today that we have a huge number of choices about how to structure this. Since the underlying data come from an ordinary differential equation, we can structure our ML method like an ordinary differential equation, and see what it finds! Or we can give it less structure (and more freedom) and see if it does better or worse. That is, you can build a neural network that is, on the inside, a differential equation. That's crazy. Obvious in retrospect but I've never thought this way before.
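A neural ODE in miniature (everything here is an invented toy, and the "network" is as small as possible): the forward pass is an Euler integration of a parameterized vector field, and the parameter is trained by gradient descent straight through the integration:

```python
import numpy as np

# the "network" forward pass: Euler integration of dy/dt = theta * y to t = 1
def forward(theta, y0, n_steps=100, dt=0.01):
    y = y0
    for _ in range(n_steps):
        y = y + dt * (theta * y)  # one Euler step of the parameterized field
    return y

# training data from the "true" ODE dy/dt = -2 y, integrated to t = 1
y0, theta_true = 1.0, -2.0
target = y0 * np.exp(theta_true * 1.0)

# fit theta by gradient descent, with a finite-difference gradient for clarity
theta, lr, eps = 0.0, 0.5, 1e-6
for _ in range(500):
    loss = (forward(theta, y0) - target) ** 2
    grad = ((forward(theta + eps, y0) - target) ** 2 - loss) / eps
    theta -= lr * grad
print(theta)  # close to -2, up to Euler discretization error
```

A real version would use a flexible (neural-network) vector field and automatic differentiation through the integrator, but the structure is the same: the ODE is the model.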


elaborating the causal structure of wobble

Lily Zhao (Yale) and Megan Bedell (Flatiron) and I are working on measuring very precise radial velocities for very small data sets, where (although there are hundreds of thousands of pixels per spectrum) there are only a few epochs of observations. In these cases, it is hard for our data-driven method to separate the stellar spectrum model from the telluric spectrum model—our wobble method makes use of the independent covariances of stellar and telluric features to separate the star from the sky. So we discussed the point that really we should use all stars to learn the (maybe flexible) telluric model. That's been a dream since the beginning (it is even mentioned in the original wobble paper), but execution requires some design thinking: We want the optimizations to be tractable, and we want the interface to be sensible. Time to go to the whiteboard. Oh wait, it's a pandemic.


Cannon, neural network, physical model

In my weekly meeting with Teresa Huang (JHU) and Soledad Villar (JHU), we went through our methods for putting labels on stellar spectra (labels like effective temperature, surface gravity, and metallicity). We have all the machinery together now to do this with physical models, with The Cannon (a data-driven generative model), and with neural networks (deep learning, or other data-driven discriminative models). The idea is to see how well these different kinds of models respect our beliefs about stars and spectroscopic observations, and how they fit or over-fit, as a function of training and model choices. We are using the concept of adversarial attacks to guide us. All our pieces are in place now to do this full set of comparisons.


forwards/backwards project evolution

Gaby Contardo (Flatiron) and I have been working on time asymmetry in NASA Kepler light curves. Our first attempts on this have been about prediction: Is it easier to predict a point in a light curve using its past or its future? It turns out that, for very deep mathematical reasons, there is a lot of symmetry here, even when the light curve is obviously time asymmetric in seemingly relevant ways. So deep, I think we might have some kind of definition of “stationary”. So we are re-tooling around just observable asymmetries. We discussed many things, including dictionary methods. It also occurred to us that in addition to time-reversal questions, there are also flux-reversal questions (like if you flip a light-curve event upside down).


working on a paper on the selection function

One research highlight today was working on the writing and organization of a paper on the definition and use of the selection function in population studies (with, say, a catalog of observed sources). The paper is led by Hans-Walter Rix (MPIA), is aimed at the ESA Gaia community, and uses the white-dwarf luminosity and temperature distribution as its test case.


red dwarf population bifurcation

My loyal reader knows that Hans-Walter Rix (MPIA) and I have been looking at the population of white dwarfs as observed by ESA Gaia. This is a demonstration project; it is somewhat adjacent to our usual science. However, today he ran our white-dwarf code for the much redder stars at the bottom of the main sequence (late M and brown dwarfs) and what did he find? It looks like the main sequence splits into two branches at the lowest-mass (coldest) end. Is that a discovery or known? And who could tell us?


a pedagogical paper on TESS detrending

I spent some weekend time working through the paper on NASA TESS detrending by So Hattori (NYUAD). It's beautiful, and pedagogical. I'm pleased.


comparing Bayesian and frequentist estimates of prediction error

I had an interesting conversation with Soledad Villar (JHU) about the difference between frequentist and Bayesian descriptions or analysis of the expected wrongness (out-of-sample prediction error) for a regression or interpolation. The different statistical philosophies lead to different kinds of operations you naturally do (frequentists naturally integrate over all possible data sets; Bayesians naturally also integrate over all possible (latent) parameter values consistent with the data). These differences in turn lead to different meanings for the eventual estimates of prediction error. I'm not sure I have it all right yet, but I'd like to figure it out and write something about all this. I'm generally a pragmatist, but statistical philosophy matters sometimes!


CPM rises from the ashes

I had a great call today with So Hattori (NYUAD) and Dan Foreman-Mackey (Flatiron), about Hattori's reboot of the causal pixel model by Dun Wang (that we used in NASA Kepler data) for new use on NASA TESS data. Importantly, Hattori has generalized the model so it can be used in way more science cases than we have looked at previously, including supernovae and tidal disruption events. And his paper is super-pedagogical, so it will invite and support (we hope) new users. Very excited to help finish this up!


an error model for APOGEE RV data

I worked with Adrian Price-Whelan (Flatiron) this morning on an empirical noise model for SDSS-IV APOGEE radial-velocity data. We fit a mixture of quiet and noisy stars plus additive Gaussian noise to empirical radial-velocity data, and started to figure out how the noise must depend on temperature, metallicity, and signal-to-noise. It looks like we can learn the noise model! And thus be less dependent on the assumptions in the pipelines.
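
Something like this toy version (simulated residuals and a plain EM fit; the real model will also let the mixture parameters depend on temperature, metallicity, and signal-to-noise):

```python
import numpy as np

def fit_two_component(dv, n_iter=200):
    """EM for a zero-mean two-Gaussian mixture: a 'quiet' and a 'noisy'
    population of per-star radial-velocity residuals dv."""
    f, s_quiet, s_noisy = 0.5, np.std(dv) * 0.3, np.std(dv) * 3.0
    for _ in range(n_iter):
        def gauss(s):
            return np.exp(-0.5 * (dv / s) ** 2) / (np.sqrt(2 * np.pi) * s)
        # responsibility of the noisy component for each star
        r = f * gauss(s_noisy) / (f * gauss(s_noisy) + (1 - f) * gauss(s_quiet))
        f = np.mean(r)
        s_noisy = np.sqrt(np.sum(r * dv**2) / np.sum(r))
        s_quiet = np.sqrt(np.sum((1 - r) * dv**2) / np.sum(1 - r))
    return f, s_quiet, s_noisy

rng = np.random.default_rng(42)
# simulated residuals: 80 percent quiet stars (0.1 km/s), 20 percent noisy (2 km/s)
dv = np.concatenate([0.1 * rng.normal(size=800), 2.0 * rng.normal(size=200)])
f, s_quiet, s_noisy = fit_two_component(dv)
```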


more data, less good answer

I brought up the following issue at group meeting: When Lily Zhao (Yale) looks at how well spectral shape changes predict radial-velocity offsets (in simulated spectroscopic data from a rotating star with time-dependent star spots), she finds that there are small segments of data that predict the radial velocity offsets better than the whole data set does. That is, if you start with a good, small segment, and add data, your predictions get worse. Add data, do worse! This shouldn't be.

Of course whenever this happens it means there is something wrong with the model. But what to do to diagnose this and fix it? Most of the crowd was in support of what I might call “feature engineering”, in which we identify the best spectral regions and just use those. I don't like that solution, but it's easier to implement than a full shake-down of the model assumptions.


our forwards-backwards results are fading

Gaby Contardo (Flatiron) and I have been working on predicting light-curve data points from their pasts and their futures, to see if there is a time asymmetry. And we have been finding one! But today we discussed results in which Contardo was much more aggressive in removing data at or near spacecraft issues (this is NASA Kepler data). And most of our results go away! So we have to decide where we go from here. Obviously we should publish our results even if they are negative! But how to spin it all...?


reimplementing The Cannon

One of the things I say over and over in my group is: We build software, but every piece of software itself is not that valuable: Our software is valuable because it encodes good ideas and good practices for data analysis. In that spirit, I re-wrote The Cannon (Ness et al 2015) in an hour today in a Google (tm) Colab notebook. It's only ten-ish lines of active code! And ten more of comments. The Cannon is not a software package; it is a set of ideas. And my reimplementation has way more stable linear algebra than any previous version I've seen (because I've learned so much about this in the last few years, with help from Soledad Villar). I did the Cannon reimplementation for Teresa Huang (JHU), who is finding adversarial attacks against it.
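
For the record, the core of the training step really is tiny. Here is a from-scratch sketch (fake labels and fluxes; the real thing also carries per-pixel scatter terms and a test step that optimizes labels given a trained model):

```python
import numpy as np

def design_matrix(labels):
    """Quadratic-in-labels design matrix: 1, each label, and all products."""
    n, k = labels.shape
    iu = np.triu_indices(k)
    quad = np.einsum('ni,nj->nij', labels, labels)[:, iu[0], iu[1]]
    return np.hstack([np.ones((n, 1)), labels, quad])

def cannon_train(labels, fluxes):
    """Fit every pixel's flux as a quadratic function of the labels,
    in one linear least-squares solve; this is the heart of The Cannon."""
    coeffs, *_ = np.linalg.lstsq(design_matrix(labels), fluxes, rcond=None)
    return coeffs

rng = np.random.default_rng(8)
labels = rng.normal(size=(50, 2))              # e.g. rescaled (Teff, [Fe/H])
true_coeffs = rng.normal(size=(6, 300))        # 6 polynomial terms, 300 pixels
fluxes = design_matrix(labels) @ true_coeffs + 0.01 * rng.normal(size=(50, 300))
coeffs = cannon_train(labels, fluxes)          # recovers true_coeffs closely
```

(Using `lstsq` rather than forming and inverting normal equations is one of the linear-algebra stability lessons I mean.)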


emission-line ratios

I had a nice conversation today with Renbin Yan (Kentucky) and Xihan Ji (Kentucky) about work they have been doing with emission-line ratios. Typically these ratios are plotted on a “BPT” diagram (yes, named after people, unfortunately). Ji has been looking for more informative two-dimensional diagrams, by considering linear combinations of a larger set of ratios. He has beautiful visualizations! And he can also clearly show how the models of the line ratios depend on assumptions and parameters, which helps develop intuitions about what the ratios tell us, physically. We briefly discussed the possibility that we might actually be able to constrain nucleosynthesis parameters using emission-line spectra of nebulae!


not much

Today was a low-ish research day. In my research time, I discussed improvements to radial-velocity measurements with Adrian Price-Whelan (Flatiron) and gauge-invariant machine learning with Soledad Villar.


diagnosing data-analysis issues

I had a useful meeting with Lily Zhao (Yale), Megan Bedell (Flatiron), and Matt Daunt (NYU) to discuss Zhao and Daunt's various data-analysis projects in precision spectroscopy. In both cases, we spent a lot of time looking at figures (and, in Zhao's case, interactively making figures in the meeting). This is generic: We spend way more time looking at visualization of issues than we do reading the code that generates them. I think it's important too; code has to make sensible figures; reading code can lead to all sorts of confusions. And, besides, debugging follows the scientific method: You hypothesize things the code could be doing wrong, you design figures to make that would demonstrate the bug, you predict what those figures should and shouldn't show, you make the figures, and you conclude and create new hypotheses. It's funny, I currently don't think that Science (tm) follows the scientific method, but I think debugging scientific code does. Hmmm.


scoping papers about technical work

While sitting in a freezing-cold car (not mine!), I pair-coded with (well really watched code being written by) Adrian Price-Whelan (Flatiron) on the SDSS-IV APOGEE visit sub-frames; the idea is to get higher time-resolution radial-velocity information. In the conversation while code was being hacked, we set scope for a couple of possible papers: In one, we could show that we measure short-period binary orbital parameters more precisely (and more accurately?) with finer time-resolution measurements. In another, we could show that we can measure asteroseismic modes across the red-giant branch. We don't have either result yet, so I am just dreaming. But it's related to the point that it is sometimes hard to publish technical contributions to astronomy.


the 16th birthday of this blog

Today is the 16th birthday of this blog! Yes, this blog has been going for 16 years, and if I trust my platform, this post will be post number 3753. I had a great research day today. In Stars & Exoplanets meeting Rodrigo Luger (Flatiron) showed his nice information-theoretic results on what you can learn from stellar light curves about stellar surfaces, and Sam Grunblatt (AMNH) showed some planets that have—or should have—changing orbital periods as they inspiral into their host stars. I asked Grunblatt about the resonances that might be there, like the ones I just learned about in Saturn's rings: Are planet inspirals sensitive to asteroseismic resonances?

Before and after this meeting, Adrian Price-Whelan (Flatiron) and I continued working on measuring radial-velocities in SDSS-IV APOGEE sub-exposures. We find so many weird effects we are confused! We find sub-hour velocity trends but they seem to have the wrong slopes (accelerations) given what we know about the targets. It might have to do with badly masked bad pixels in the spectra...


sub-exposure velocities in APOGEE

Adrian Price-Whelan (Flatiron) opened up the black box of the SDSS-IV APOGEE data, looking at whether we can measure stellar radial velocities in the short (9-min) sub-exposures that make up a full (1-ish hour) APOGEE exposure. We pair-coded for much of the morning and showed that yes, yes we can! This conceivably increases the time resolution of the APOGEE data considerably, and is useful for short-period systems (and, I hope, red-giant asteroseismology). We have to figure out what's our best method for making the measurements now.
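
Schematically, the simplest version of the measurement is template cross-correlation over a grid of trial velocities (this is a toy one-line spectrum with made-up wavelengths, not our actual method, which we still have to settle on):

```python
import numpy as np

C_KMS = 299792.458  # speed of light in km/s

def measure_rv(wave, flux, template_wave, template_flux, v_grid):
    """Cross-correlate an observed spectrum against a Doppler-shifted
    template; return the trial velocity that maximizes the correlation."""
    cc = np.array([
        np.sum(flux * np.interp(wave, template_wave * (1 + v / C_KMS),
                                template_flux))
        for v in v_grid])
    return v_grid[np.argmax(cc)]

# toy spectrum: one Gaussian absorption line, shifted by 7 km/s
template_wave = np.linspace(15100.0, 15200.0, 4000)   # Angstroms, APOGEE-ish
template_flux = 1.0 - 0.5 * np.exp(
    -0.5 * ((template_wave - 15150.0) / 0.3) ** 2)
v_true = 7.0
wave = template_wave
flux = np.interp(wave, template_wave * (1 + v_true / C_KMS), template_flux)

v_grid = np.arange(-50.0, 50.0, 0.5)                  # km/s trial grid
v_meas = measure_rv(wave, flux, template_wave, template_flux, v_grid)
```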


frequencies and resonances

My day had a twitter (tm) component in which I asked about the correspondences between gaps in Saturn's rings and low-integer-denominator resonances with Saturn's moons (and, I learned, planetary seismic modes). This led me to Jason Hunt (Flatiron), whom I asked about resonances in the Milky Way disk: Can the gaps in velocity space in the local disk be associated cleanly with particular resonances with the bar or spiral structure or Sagittarius? He thought yes, for some, and made some nice plots of the three orbital frequencies as a function of velocity in the local neighborhood. These are all steps towards (in my mind) figuring out the frequencies of disk perturbations more-or-less directly from the data.


spectral shapes predict radial-velocity mistakes, simulated-spectra edition

As my loyal reader knows, Lily Zhao (Yale) is looking at whether spectral shape changes predict radial-velocity mistakes in extreme-precision radial-velocity projects. She finds that they do! This time in simulated spectra, in which there are cool, tiny star spots on the surface of a rotating star. This gives us (Zhao, Megan Bedell and me) hope that we can apply this to real data, if we can model the telluric interference accurately enough. My job on this project is to show that what we are doing is an approximation to Doppler spectroscopic imaging.


the subtleties of telluric, gas-cell, and spectral absorption

Today I spoke with Matt Daunt (NYU) about subtleties in modeling stellar spectra as observed through a gas cell or the atmosphere: The effects of these things are multiplicative on the spectrum, but multiplicative at extremely high resolution; they aren't strictly multiplicative at low resolution (because convolution doesn't commute with multiplication!). He is close to being able to reproduce some of the results from our wobble paper.
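
A tiny numerical demonstration of the non-commutation (toy Gaussian lines; all widths and depths invented): multiplying the star and telluric spectra at high resolution and then convolving down gives a measurably different answer than convolving each first and then multiplying.

```python
import numpy as np

def smooth(spectrum, width):
    """Convolve with a Gaussian line-spread function, standing in for the
    spectrograph resolution."""
    x = np.arange(-4 * width, 4 * width + 1)
    kernel = np.exp(-0.5 * (x / width) ** 2)
    kernel /= kernel.sum()
    return np.convolve(spectrum, kernel, mode='same')

# high-resolution star and telluric spectra, each with one narrow, deep line
pix = np.arange(2000)
star = 1.0 - 0.8 * np.exp(-0.5 * ((pix - 1000) / 2.0) ** 2)
telluric = 1.0 - 0.8 * np.exp(-0.5 * ((pix - 1003) / 2.0) ** 2)

right = smooth(star * telluric, 20)          # multiply first, then convolve
wrong = smooth(star, 20) * smooth(telluric, 20)  # convolve, then multiply
mismatch = np.max(np.abs(right - wrong))     # non-zero: the operations differ
```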


a problem statement for graph neural networks

Today Soledad Villar (JHU) and I tried to write down a zeroth-order problem statement for looking at the applicability of graph neural networks for solving cosmological large-scale-structure problems. The idea is that the universe has graph symmetries: The physics is not sensitive to the order in which we label our particles. This, the machine learners call “graph equivariance”. The universe also has many other symmetries like rotation, translation, boost and so on. These we are calling (for now) “gauge equivariances”. The mathematical language is different from the physical language, as usual!


cosmological gradients in the large-scale structure

Binning is sinning. This phrase appears in our recent paper on correlation-function estimation. Our solution to the problem of binning is very deep (imho): Not only do we obviate binning in the radial-separation direction, we also obviate binning in any other quantity on which you think the clustering might depend (like angle wrt the line of sight, galaxy luminosity, and so on).

Abby Williams (NYU), Kate Storey-Fisher (NYU), and I are using the new unbinned estimator to look for variations in galaxy clustering with position within the Hubble volume. Traditionally this might be done by splitting the space into boxels, and measuring the clustering in boxels separately; are there variations? But binning is sinning: Now Storey-Fisher has made an estimator that can estimate the parameters of a clustering model with an explicit gradient or variation with position. And Williams has made simulated cosmological volumes that contain clustering gradients for testing purposes. We're close to making a (toy) measurement!
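
Schematically, the unbinned idea looks like this (my cartoon of the estimator, not the code we actually use; it ignores normalization subtleties and edge corrections): instead of histogramming pair separations into bins, project them onto continuous basis functions and solve a small linear system for the basis amplitudes. Bin indicator functions would recover the usual binned Landy-Szalay-style estimator as a special case.

```python
import numpy as np

def pair_separations(a, b):
    """All pairwise separations between two 3-d point sets."""
    d = a[:, None, :] - b[None, :, :]
    return np.sqrt(np.sum(d**2, axis=-1)).ravel()

def project(r, basis):
    """Replace binned pair counts with projections onto basis functions."""
    return np.array([np.sum(f(r)) for f in basis])

# a few smooth functions of separation (low-order polynomials, for the toy)
basis = [lambda r: np.ones_like(r), lambda r: r, lambda r: r**2]

rng = np.random.default_rng(3)
data = rng.uniform(size=(150, 3))   # toy "galaxies" in the unit box
rand = rng.uniform(size=(300, 3))   # random catalog

r_rr = pair_separations(rand, rand)
# generalized (DD - 2 DR + RR) data vector, projected instead of binned
v = (project(pair_separations(data, data), basis) / 150**2
     - 2 * project(pair_separations(data, rand), basis) / (150 * 300)
     + project(r_rr, basis) / 300**2)
# the RR pairs also define the matrix that replaces dividing by RR per bin
T = np.array([[np.sum(f(r_rr) * g(r_rr)) for g in basis]
              for f in basis]) / 300**2
amps = np.linalg.solve(T, v)        # amplitudes of the basis functions
xi = lambda r: sum(a * f(r) for a, f in zip(amps, basis))  # continuous estimate
```

The gradient measurement then just adds basis functions that depend on position as well as separation.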


referee report

I took a call today with Adrian Price-Whelan in which we discussed the very constructive, useful referee report we received for our orbital torus imaging paper. I don't know if it is just me, but I think refereeing in astrophysics has generally become more constructive over the years. More about improving the literature and less about gatekeeping it. The referee's most challenging comments are about testing our assumptions, and understanding how the anomalies we find in that paper trace back to violations of our assumptions. That's science right there.


writing like mad about Astrometry.net

It's crunch time this weekend on the proposal that Dustin Lang (Perimeter) and I are writing for the NASA Open-Source Tools, Frameworks, and Libraries call. I spent a lot of quality time this weekend cranking out words. After doing some literature review, we find that Astrometry.net is used in a huge number of projects, from NASA missions to cosmic-ray detectors to (of course) amateur astrophotography workflows. That's exciting, and relevant to our proposal. One of the great things about the NASA call is that it requires us to think about project management, community building, and collaboration policies. That is good; it will help our project immensely.


causal inference in EPRV

Lily Zhao (Yale), Megan Bedell (Flatiron), and I looked at Zhao's results trying to learn radial-velocity displacements from spectral shape changes for EPRV data. She finds that it works, which got me extremely excited! However, there is some suggestion in her results that the method might be making use of unmodeled (unmasked) telluric lines of very low amplitude. This is not permitted, because we want to just use changes in the star's intrinsic shape. We came to the realization in the call that this gets into questions of causal inference: When the signals are small or noisy, how do we know that we are learning from signals caused by the star itself, rather than the atmosphere or instrument? We decided to move to simulated spectra for a bit to look at these questions in a playground where we know the right answers.


adversarial attacks and model derivatives

I froze in my mother-in-law's car (NYC alternate-side parking FTW) while I spoke with Teresa Huang (JHU) and Soledad Villar (JHU) about our old project to find adversarial attacks against machine-learning methods used in astronomy. One of the big problems we face is that our methods require good derivatives of output with respect to input (or vice versa) for the methods we are studying. However, it is often hard to get these derivatives precisely. Even when a method comes with analytic Jacobian or derivative operators (like tensorflow and jax deliver), those aren't always directly useful, because sometimes the methods are doing stochastic things like dropout and ensembles when they make predictions. Our conclusion was that maybe we need to be reimplementing all methods ourselves, maybe in straw-person forms. That's bad. But also good?
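
Here is a toy illustration of the stochastic-prediction problem (an invented model, not one of the methods we are attacking): finite differences through a deterministic model recover the input gradient to high precision, while dropout-style randomness at prediction time completely destroys the estimate.

```python
import numpy as np

rng = np.random.default_rng(1)

def deterministic_model(x):
    """A tiny deterministic 'model': sum of tanh of the inputs."""
    return np.tanh(x).sum()

def stochastic_model(x, p_drop=0.2):
    """The same model with dropout-style randomness at prediction time."""
    mask = rng.random(x.size) > p_drop
    return np.tanh(x * mask).sum() / (1 - p_drop)

def finite_diff_grad(f, x, eps=1e-5):
    """Central finite-difference gradient of scalar f at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        g[i] = (f(x + dx) - f(x - dx)) / (2 * eps)
    return g

x = rng.normal(size=8)
g_det = finite_diff_grad(deterministic_model, x)  # accurate
g_sto = finite_diff_grad(stochastic_model, x)     # swamped by dropout noise
exact = 1.0 / np.cosh(x) ** 2                     # analytic gradient
```

Averaging the stochastic model over many masks helps, but then you need enormous numbers of evaluations per derivative, which is part of why reimplementation looks tempting.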


writing proposals is hard!

Today I took a serious shot at getting words down in my upcoming NASA proposal for open-source tools, frameworks, and libraries. This is a new call to support development and maintenance of open-source projects that are aligned with NASA science missions (yay open science and NASA!). Dustin Lang (Perimeter) and I are proposing to support Astrometry.net, which is used in multiple NASA missions, including SOFIA and SPHEREx. It is hard to put together a full proposal; writing a proposal is comparable in intellectual scope to writing a scientific paper! And it must be done on deadline, or not at all.


is leading-order time dependence spirally?

Independently, Kathryn Johnston (Columbia) and David Spergel (Flatiron) have pointed out to me that if you have a Hamiltonian dynamical system that is slightly out of steady-state, you can do a kind of expansion, in which the steady-state equation is just the zeroth order term in an expansion. The first-order term looks like the zeroth-order Hamiltonian acting on the first-order perturbation to the distribution function, plus the first-order perturbation to the Hamiltonian acting on the zeroth-order distribution function (equals a time derivative of the distribution function). That's cool!
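
In symbols (my notation, with curly braces for Poisson brackets; sign conventions vary): write the distribution function and Hamiltonian as f = f_0 + epsilon f_1 and H = H_0 + epsilon H_1 in the collisionless Boltzmann equation and collect orders.

```latex
% collisionless Boltzmann equation:
\frac{\partial f}{\partial t} + \{f, H\} = 0
% zeroth order (the steady-state equation):
\{f_0, H_0\} = 0
% first order (the equation described above):
\frac{\partial f_1}{\partial t} + \{f_1, H_0\} + \{f_0, H_1\} = 0
```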

Now couple that idea with the fact that a steady-state Hamiltonian system is a set of phase-mixed orbits nested in phase space (literally a 3-torus foliation of 6-space). Isn't this first-order equation the equation of a winding-up spiral mode? I think it is! If so, it might unite a bunch of phenomenology, from cold stellar streams to spiral structure in the disk to The Snail. I discussed all this with Adrian Price-Whelan (Flatiron).


integrated hardware–software systems

I had a wide-ranging conversation today with Rob Simcoe (MIT) about connections between my group in New York and his group in Cambridge MA. He does complex hardware. We do principled software. These two things depend on each other, or should! And yet few instruments are designed with software fully in mind, in the sense of making good, non-trivial trades between hardware costs and software costs. And also few software systems are built with deep knowledge of the hardware that produces the input data. So there are synergies possible here.


Gaia Unlimited kick-off

Today was the first meeting of the Gaia Unlimited project (PI: Anthony Brown), in which we attempt to make a selection function (and the tools for making many different kinds of selection functions) for investigators making use of the ESA Gaia data to perform population-level inferences. Among the many things we discussed were the definition of the selection function (which is not trivial, given the historical usage of the term; it appears in 700 refereed publications in 2020, according to NASA ADS), and what's known about the Gaia selection function already. The latter includes amazing work by Boubert and Everall in which they have tried to reverse engineer everything to determine the selection function to very faint levels, given the Gaia scan patterns, telemetry limits, and dropped fields. So far, my role in this project is on the conceptual side, around definitions, terminology, and use cases. Along those lines there was great discussion about what the selection function is, and what it is not. Our position is that it is the probability, given hypothetical properties q, that a counter-factual source with those properties would enter the catalog. Even that definition is not quite complete, because there are details relating to the observability of—and noise in—the properties q. More about this throughout this year!
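
In my notation, our position is something like the following (with the caveat, noted above, that noise in and observability of q complicate the picture):

```latex
% the selection function: probability that a counter-factual source
% with (hypothetical) properties q would enter the catalog
S(q) \equiv p(\mathrm{selected} \mid q)
% so the expected density of catalog entries is the true population
% rate, modulated by the selection function:
\lambda_{\mathrm{cat}}(q) = S(q)\,\lambda_{\mathrm{true}}(q)
```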


structure of code

I spoke today with Matt Daunt (NYU), who is re-writing the wobble concept in Jax. That's a great project! We discussed how to structure the code so it is easy to use now and easy to extend. We have various extensions in mind, both conceptually (like using many stars simultaneously to improve our telluric models) and technically (like adapting to gas-cell spectrographs).


stream finding

Stars and Exoplanets meeting at Flatiron (well, on zoom, really) was all about finding stellar streams. Matt Buckley (Rutgers) talked about repurposing machine-learning anomaly-detection methods from experimental high-energy physics to the astrophysical domain. Sarah Pearson (Flatiron) talked about building methods that evolve from Hough transforms. In both cases we (the audience) argued that the projects should make catalogs of potential streams with low (non-conservative) thresholds: After all, it is better to find low-mass streams plus some junk than it is to miss them: Every stream is potentially uniquely valuable.


finishing a paper

I spent time today working through comments from Kate Storey-Fisher (NYU) on the document that Soledad Villar (JHU) and I have written about fitting flexible models. I made those changes, while Soledad put in some proofs of some of the key math points. We are so close to being done! But I don't mind being slowed down by amazingly constructive and useful comments from my students!


mathematical derivation of our clustering estimator

In a long conversation, Kate Storey-Fisher (NYU) and I worked through her new and nearly complete derivation of our continuous-function estimator for the correlation function (for large-scale structure). We constructed the estimator heuristically, and demonstrated its correctness somewhat indirectly, so we didn't have a good mathematical derivation per se in the paper. Now we do!


selection function disagreements

Hans-Walter Rix (MPIA) and I had a solid conversation today about the scope of our first paper on the selection function. We want to be pedagogical in scope and content. So the argument between us is: How sophisticated to get in our selection-function model? Rix is arguing for a less sophisticated case, keeping the story and main point simple, and I am arguing for something more sophisticated, that more connects to the real decisions that people are making every day. And all this relates to exactly what toy problems we show. We came to something of a compromise position, in which we give an example where the apparent magnitude cut is the main selection, but then show what happens when you expand the sample such that other effects beyond the pure apparent magnitude cut start to affect the sample significantly. One of our points will be that as particular selection effects get fractionally smaller in impact on your sample, you don't have to model them as precisely to meet some global accuracy goals for your model of the whole population.