partial differential equations

I am trying to write a proposal to fund the research I do on machine-learning theory. The proposal is to work on ocean dynamics. It's a great application for the things we have done! But it's hard to write a credible proposal in an area that's new to you. Interdisciplinarity and agility are not rewarded in the funding system at present! At least I am learning a ton as I write this.



I've been working on two philosophical projects this month. The first has been an interaction with Jim Peebles (Princeton) around a paper he has been writing, setting down his philosophy of physics. I am pretty aligned with his position, which I expect to hit the arXiv soon. I'm not a co-author of that. But one of the interesting things about science is how much of our work is in anonymous (or quasi-anonymous) support of others.

The second philosophical project is a paper about machine learning and science: I am trying to set down my thoughts about how ML can and can't help the sciences. This is fundamentally a philosophy-of-science question, not a science question.


try bigger writing

I have been buried in job season and other people's projects. That's good! Hiring and advising are the main things we do in this job. But I decided today that I need to actually start a longer writing project that is my own baby. So I started to turn the set of talks I have been giving about machine learning and astrophysics into a paper. Maybe for the new ICML Position Paper call?


Terra Hunting Fall Science Meeting, day 4

Today we delved into even more detail about how the HARPS3 instrument works, looking at engineering drawings and discussing how charge-coupled devices (CCDs) read out. We discussed the time stability of various parts of the instrument and electronics. We are all very excited about assembly, verification, and testing in Cambridge this summer.


Terra Hunting Fall Science Meeting, day 3

Today was a delight! In a working session, Clark Baker (Cambridge) gave a beautiful, conceptual and concrete description of how an echelle spectrograph works and the blaze and the resolution and etc. My favorite moment was the aha! moment I had when he described the Littrow condition. This was followed by Alicia Anderson (Cambridge) explaining how the data reduction proceeds. Then she and Federica Rescigno (Exeter) helped us install the data-reduction software for the ESO instruments (ESPRESSO, HARPS-N, etc) and we started reducing raw echelle data.

Before all this there was a wide-ranging discussion of measuring 3-point functions of radial-velocity time series data. This was inspired by the question: Is a Gaussian process a good model for these data? I hope this turns into a project or set of projects.


Terra Hunting Fall Science Meeting, day 2

So many good things happened in the meeting today! Highlights were presentations by Niamh O'Sullivan (Oxford) and Ben Lakeland (Exeter), who showed amazing results running models of stellar variability on data from the Sun. O'Sullivan can see that the Sun goes through many different phases of spots, granulation, and super-granulation. She finds these by fitting Gaussian processes of certain forms. Related: Suzanne Aigrain (Oxford) showed that even in very gappy data, the GP fits are unbiased, whereas naive use of periodograms is biased!

Lakeland showed that super-granulation can in principle be modeled in the Solar time series, and maybe the tiniest hint that when he corrects for super-granulation well, the RV variability might be even lower than at times at which there is no super-granulation in play at all. Does super-granulation suppress other kinds of variability?

I'm very optimistic—between Liang yesterday, Zhou's work at Flatiron, and these presentations—that we will be able to mitigate many difficult sources of stellar variability. I was inspired to outline a conceptual paper on why or how this is all going to work.


Terra Hunting Fall Science Meeting, day 1

Today was the first day of the Terra Hunting annual science meeting. One highlight of the day was a presentation by Yan Liang (Princeton), who is modeling stellar spectral variability (the tiny variability) that affects extremely precise radial-velocity measurements. Her method involves a neural network, which is trained to distinguish RV variations and spectral shape variations through a self-supervised approach (with a data augmentation). Then it separates true stellar RV variations from spectral-variability-induced wrong RV variations by requiring (essentially) that the RV variations be uncorrelated with the (latent) description of the stellar spectral shape. This connects to various themes I am interested in, including wobble by Bedell, a spectral variability project by Zhao, and causal structure in machine learning.


double periodogram

Cole Johnston (Leuven) is in New York this week. We discussed the problem of finding oscillation modes in the photometry of stars in the presence of a large, binary-induced periodicity. What he kind-of wants is a simultaneous fitting of a flexible periodic function plus a periodogram. We did some experiments (very promising!) and discussed the elements that will come together to make this all happen. The final method will look like a double Fourier transform, in which one frequency grid gets the periodic part, and the other grid gets the rest of the modes and noise.
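The simultaneous-fit idea can be sketched with ordinary linear least squares: one set of sinusoid columns at the binary frequency and its harmonics, plus a trial-mode column pair scanned over a second grid. Everything here (frequencies, amplitudes, the helper name) is made up for illustration; it is not Johnston's actual pipeline.

```python
import numpy as np

def design_matrix(t, freqs):
    """One cosine and one sine column per frequency, plus a constant."""
    cols = [np.ones_like(t)]
    for f in freqs:
        cols.append(np.cos(2 * np.pi * f * t))
        cols.append(np.sin(2 * np.pi * f * t))
    return np.stack(cols, axis=1)

rng = np.random.default_rng(17)
t = np.sort(rng.uniform(0.0, 30.0, 600))               # observation times (days)
f_orb, f_osc = 0.21, 3.7                               # made-up frequencies (1/day)
y = (0.8 * np.sin(2 * np.pi * f_orb * t)               # large binary signal
     + 0.05 * np.cos(2 * np.pi * f_osc * t)            # small oscillation mode
     + 0.02 * rng.normal(size=t.size))                 # noise

binary_freqs = f_orb * np.arange(1, 4)                 # fundamental plus harmonics
mode_grid = np.linspace(3.0, 4.5, 151)                 # second grid, for the modes

# at each trial mode frequency, fit the binary harmonics AND the mode together,
# so the big periodic signal cannot leak into the mode amplitude
power = []
for f in mode_grid:
    A = design_matrix(t, np.append(binary_freqs, f))
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    power.append(coef[-2]**2 + coef[-1]**2)            # trial-mode amplitude^2
f_best = mode_grid[int(np.argmax(power))]
```

In the full method the two grids would presumably be fit jointly with some regularization; the scan above just shows how absorbing the binary part protects the mode frequencies.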


grant proposals

There is a non-wrong view of academic science that it is all about applying for funding, and evaluating the proposals of others for funding. That's all I did today (evaluated proposals for a foreign funding program; I submitted my own proposal to the NSF yesterday).


postdoc applications

There is a non-wrong view of the academic enterprise that it is entirely about getting hired, evaluating people for hire, and hiring. That's all I did today (okay the latter two, not the first).


conjectures about pre-training

On Monday of this week, Shirley Ho (Flatiron) gave a talk at NYU in which she mentioned the unreasonable effectiveness of pre-training a neural network: If, before you train your network on your real (expensive, small) training data, you train it on a lot of (cheap, approximate) pre-training data, you get better overall performance. Why? Ho discussed this in the context of PDE emulation: She pre-trains with cheap PDEs and then trains on expensive PDEs and she gets way better performance than she does if she just trains on the expensive stuff.

Why does this work? One interesting observation is that even pre-training on cat videos helps with the final training! Ho's belief is that the pre-training gets the network to understand time continuity and other kinds of smoothness. My conjecture is that the pre-training teaches the network about (approximate) diffeomorphism invariance (coordinate freedom). The cool thing is that these conjectures could be tested with interventions!


radical papers I want to write (or will never write)

I have to finish my NSF proposal with Mike Blanton (NYU), so naturally I am in procrastination mode. Here are three papers I wish I would write. Maybe I should post them on my ideas blog:

Occam's Razor is wrong: This paper, co-authored with Jennifer Hill (NYU), would be about the fact that, in the real, observed world, the simplest explanation is always wrong or at least incomplete.

Causation is just causality: This paper, maybe co-authored with David Blei (Columbia) or Bernhard Schölkopf (MPI-IS) or Hill, shows that you don't need to have free will in order to have cogent causal explanations of data. That is, you don't need to phrase causality in terms of predictions for counter-factual experiments that you might have chosen to do.

You don't ever want evidence: This paper shows that any time you are computing the Bayesian evidence—what I call the fully marginalized likelihood (fml)—you are doing the wrong integral and solving the wrong problem. For both practical and theoretical (principled) reasons.


data augmentation

A highlight of my day was a colloquium by Renée Hložek (Toronto) about cosmology and event detection with the LSST/Rubin. Importantly (from my perspective), she has run a set of challenges for classifying transients, based on simulations of the output of the very very loud LSST event-detection systems. The results are a bit depressing, I think (sorry Renée!), because (as she emphasized), all the successful methods (and none were exceedingly successful) made heavy use of data augmentation: They noisified things, artificially redshifted things, dropped data points from things, and so on. That's a good idea, but it shows that machine-learning methods at the present day can't easily (or ever?) be told what to expect as an event redshifts or gets fainter or happens on a different night. I'd love to fix those problems. You can almost think of all of these things as group operations. They are groups acting in a latent space though, not in the data space. Hard problems! But worthwhile.
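The augmentations Hložek described can be caricatured as functions composed on a light curve. This is a toy sketch, not any of the actual challenge pipelines, and the redshifting law here is deliberately crude (time dilation plus dimming, ignoring k-corrections and everything else):

```python
import numpy as np

def noisify(t, flux, extra_sigma, rng):
    """Add extra Gaussian noise."""
    return t, flux + rng.normal(scale=extra_sigma, size=flux.size)

def drop_points(t, flux, keep_fraction, rng):
    """Randomly discard observations, mimicking gaps and weather."""
    keep = rng.random(t.size) < keep_fraction
    return t[keep], flux[keep]

def redshift(t, flux, z):
    """Crude redshifting: time dilation plus inverse-square-like dimming."""
    return t * (1 + z), flux / (1 + z)**2

rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 200)
flux = np.exp(-0.5 * ((t - 5.0) / 0.8)**2)    # a toy transient light curve

# augmentations compose, like (approximate) group operations on the data
t2, f2 = redshift(*drop_points(*noisify(t, flux, 0.01, rng), 0.7, rng), 0.3)
```

The point of the group-operation view is that each of these maps has (approximate) inverses and compositions; the hard part is that the physically meaningful versions act in a latent space, not on the pixels.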


writing proposal

Mike Blanton (NYU) and I are writing an NSF proposal. That took up most of my research time today!


linear regression

Valentina Tardugno (NYU) and I are looking at the NASA TESS housekeeping data: What parts of it are relevant to understanding the light curves? The weird thing is: We are asking this by asking: What housekeeping data can be reliably predicted using the light curves? Why this way? Because the light curves are higher in signal-to-noise (in general) than most channels of the housekeeping data. Today we went through all the relevant linear algebra for big linear models (which is where we are starting, of course!).


abundance gradients wrt positions or actions

It is traditional to plot things like the mean iron abundances of stars (or ratios of magnesium to iron, or other ratios) as a function of position in the Galaxy. However, stars change their positions over time, so the gradients (the features in any abundance–position plots) will be smeared out over cosmic time by their motions.

At the same time, stars have approximately invariant actions or integrals of motion, which don't change (much) as they orbit. These invariants are only approximate, both because the Galaxy isn't exactly integrable, and also because we don't know or measure everything we need to compute them precisely for any observed star.

Putting these two ideas together, the abundance–action features, or really abundance–invariant features should be much clearer and more informative than the abundance–position features. Awesome, let's go! The only problem is: Selection effects are often simple in the position space, but are almost never simple in the dynamical-invariant-space. So any plots are harder to interpret generally.

These are issues that I have discussed over many years with Hans-Walter Rix (MPIA). Today I discussed them with Danny Horta (Flatiron) and Adrian Price-Whelan (Flatiron), in preparation for an exploratory study by Horta.


predicting spectra from spectra

Saakshi More (NYUAD) came into my office during office hours today to ask about possible data science projects in physics. I pitched to her predicting ESA Gaia RVS spectra from Gaia XP spectra, and vice versa. Has anyone done that? In one direction, you have to predict high resolution detail from low-resolution input; in the other direction, you have to predict a wide wavelength range from narrow input. It seems perfect for something like a linear auto-encoder (at least for a small patch of the color–magnitude diagram; non-linear for a large patch). Later in the day I talked to Gaby Contardo and she said: If you want to go simple, how about nearest neighbor? Good idea!
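Contardo's nearest-neighbor baseline is easy to sketch. Everything here (dimensions, the linear toy spectra, the two-parameter latent space) is invented for illustration, not real Gaia data:

```python
import numpy as np

def nn_predict(train_xp, train_rvs, query_xp):
    """Predict each query's RVS spectrum as that of its nearest XP neighbor."""
    d2 = ((train_xp[None, :, :] - query_xp[:, None, :])**2).sum(axis=2)
    return train_rvs[np.argmin(d2, axis=1)]

rng = np.random.default_rng(8)
z = rng.normal(size=(300, 2))                 # toy latent stellar parameters
train_xp = z @ rng.normal(size=(2, 55))       # fake low-resolution "XP" spectra
train_rvs = z @ rng.normal(size=(2, 800))     # fake high-resolution "RVS" spectra

# sanity check: a star identical to a training star recovers its own spectrum
pred = nn_predict(train_xp, train_rvs, train_xp[:5])
```

If both kinds of spectra really are (locally) linear functions of a low-dimensional latent, the nearest XP neighbor should carry over a sensible RVS prediction, which is the same intuition behind the linear auto-encoder.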


unitary evolution of the Universe

I spent the day with Juna Kollmeier (CITA) talking about epistemology, physical cosmology, and project management (especially academic management). I found myself saying to her the following argument (which I have not seen written down anywhere): Imagine that our Universe is hamiltonian (or lagrangian; it doesn't matter for these purposes). And imagine that our Universe is a simulation being run inside some bigger universe, which is also hamiltonian.

If our Universe is being observed in any sense by any system in that bigger universe, then there ought to be a loss of unitarity in our Universe. That is, there should be a violation of Liouville's theorem, or a violation of key conservation laws, or an information sink. And there is! At black hole horizons, there is an information paradox: Information that goes in never comes back (an evaporating black hole evaporates thermally, or so we think). Thoughts?


ARC BOG meeting

I stayed on at Cloudcroft after the SDSS-V Advisory Council meeting for the ARC Board of Governors meeting, which is the meeting of the organization that runs the Apache Point Observatory. I spent a lot of the meeting learning about the 3.5m and the site, which was interesting, and which made me think about how we apportion our resources in astronomy. These are huge facilities, run very lean (money wise), and they produce a lot of science. The SDSS family of projects has had simply immense scientific impact.

One success of the meeting: I have successfully coined and propagated the term SDSS Classic to mean SDSS-I and SDSS-II. Multiple people at the meeting now use this terminology!


SDSS-V AC meeting

Today I chaired the annual Advisory Council (AC) meeting for the SDSS-V project. The AC protects the interests of the partners, who gave money and other resources to the project. We had many presentations from different parts of the project and OMG this project is amazing. I learned a ton and feel very happy that our money is well spent. This activity counts as research because project management is a key part of science.

The AC meeting was followed by touring the observing hardware (I love it; the SDSS Telescope is incredibly important to everything I have done since the late 1990s), followed by actually looking through the 3.5m telescope at Apache Point Observatory.


toning down my language

I spent travel time (at airports and on airplanes) working on the title, abstract, and introduction of the forthcoming paper with Andy Casey (Monash) about combining visit spectra into mean spectra. This was mainly about me changing the tone from “You are all doing it wrong!” to a tone more like “Here's a way to think about it, and the consequences thereof.” After all, no method is the best for all situations and cases. Our method is best for situations where the individual visit spectra are barely sampled or under-sampled.


M dwarfs

I had a great phone call with Madyson Barber (UNC) and Andrew Mann (UNC) today about M dwarf stellar spectroscopy. I love the problem of understanding the spectra of M dwarfs because this is a subject where there is no ground truth: No physical models of M dwarf photospheres work very well! Why not? Probably because they depend on lots of molecular transitions and band heads, the properties of which are not known (and very sensitive to conditions).

I love problems where there is no ground truth! After all, science as a whole has no ground truth! So the M-dwarf spectroscopy problem is a microcosm of all of science. I went off the deep end on this call, and we were all left knowing less than we knew when we started the call. By this post, I apologize to Barber and Mann.



who owns a research project?

My day ended today with a great conversation about the ownership of research projects with a postdoc. When you make the transition from graduate student to postdoc, whose projects are whose? Are they the projects of your supervisors, or are they your own? And should you keep doing them, or should you move to new things? I don't think there are easy answers, and I think that there are many subtle ways in which people have unresolved differences about these things. Since much of my work these days is postdoctoral mentoring, I've thought about this a lot. My only recommendation, which is hard to implement, is that clear communication about expectations is really, really important. And not just the expectations of the supervisors; the expectations of the (former) student are way more important!


area of a triangle?

On Friday and the weekend, I came up with (what I think is) a novel formula for the area A of a triangle! That's weird. I was looking for a formula in the Deep Sets (or map-reduce) format. Here it is. It's ridiculous and useless, but it involves only sums over functions of the individual corners of the triangle. It was hard to find! But it's exact (I believe).


information theory for spectroscopy

I had a meeting this morning with Megan Bedell (Flatiron) about our dormant paper about information theory and extreme-precision radial-velocity measurements. We see the paper a bit differently (is it about methods or is it about concepts?), but we were able to re-state a scope with which we are both happy. We assigned tasks (Bedell writing and me coding, mainly), and promised to make progress before next week. It is very, very, very hard to finish a paper! Especially when all authors are above some seniority, where they spend most of their time with others. I would love to get a lot more personal coding time!


symmetry day: crossing, permutation

Today's brown-bag talk, by Grant Remmen (NYU), was about (in part) crossing symmetry. This is the symmetry that any Feynman diagram can be rotated through 90 degrees (converting time into space and vice versa) and the interaction will have the same scattering amplitude. This symmetry relates electron–positron annihilation to electron–electron scattering. The symmetry has an important role in string theory, because it is a constraint on any possible fundamental theory. This symmetry has always seemed incredible to me, but it is rarely discussed outside very theoretical circles.

After the talk, and in the Blanton–Hogg group meeting, I brought up things about invariant functions that I learned from Soledad Villar (JHU) that are really confusing me: It is possible (in principle, maybe not in practice) to write any permutation-invariant function of N objects as a function of a sum of universal functions of the N objects (that's proven). How does that relate to k-point functions? Most physicists believe that any k-point function estimate will require a sum over all N-choose-k k-tuples. That's a huge sum, way bigger than a sum over N. What gives? I puzzled some of the mathematical physicists with this and I remain confused.
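To set down the tension precisely: the Deep Sets result (as I understand it) says that any permutation-invariant function of N objects can be written in the map-reduce form f(x_1, …, x_N) = ρ(Σ_i φ(x_i)), a single sum over the N objects, whereas a k-point function estimator looks like a sum of a weight function over all N-choose-k k-tuples of objects. One possible resolution (I am not certain of this) is that the theorem permits the latent dimension of φ to be enormous, growing with N, so the O(N) sum hides the combinatorics in the width of the latent space rather than eliminating it.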


Florida, day two

Today was day two of my visit to University of Florida. I had many interesting discussions. One highlight was with Dhruv Zimmerman, who wants to infer big labels (non-parametric functions of time) from small features (a few bands of photometry). That's my kind of problem! We discussed different approaches, and we discussed possible featurizations (or dimensionality reductions) of the labels. I also pitched an information-theoretic analysis. If there's one thing I've learned in the last few years, it is that you shouldn't be afraid to solve problems where there are fewer data than parameters! You just have to structure the problem with eyes wide open.

After many more (equally interesting) discussions, the day ended with Sarah Ballard's group out at a lovely beer garden. We discussed the question: Should students be involved in, and privy to, all the bad things with which we faculty interact as academics, or should we protect students from the bad things? You can imagine my position, since I am all about transparency. But the positions were interesting. Ballard pointed out that in an advisor–student relationship, the student might not feel that they can refuse when the advisor wants to unload their feelings! That power asymmetry is very real. But Ballard's students (Chance, Guerrero, Lam, Seagear) said that they want to understand the bad things too; they aren't in graduate school just to write papers (that comment is for you, Quadry!).


Florida, day one

I spent today with Sarah Ballard's group, plus others, at the University of Florida. I gave a talk, to a large, lively, and delightful audience. At the end of this talk I was very impressed by the following thing: Ballard had everyone in the room discuss with their neighbors (turn and talk) for about 3 minutes, after the seminar but before the question period began! This is a technique I use in class sometimes; it increases participation. After those 3 minutes, audience members had myriad questions, as one might imagine.

I spoke with many people in the Department about their projects. One highlight was Jason Dittman, who showed me gorgeous evidence that a particular warm exoplanet on an eccentric orbit has an atmosphere that undergoes some kind of phase change at some critical insolation, as it moves away from its host star on its orbit. Crazy!

Late in the day I discussed n-point functions and other cosmological statistics with Zach Slepian and Jiamin Hou. We discussed the plausibility of getting tractable likelihoods for any n-point functions. We also discussed the oddity that n-point functions involve sums over n-star configurations among N stars (N choose n), but there are mathematical results that show that any permutation-invariant function of any point cloud can be expressed with only a sum over stars (N). That sounds like a research problem!


biases from machine learning

Today I gave a talk (with these slides) at a meeting in Denver for the NSF initiative Harnessing the Data Revolution. I spoke about the necessity and also the dangers of using machine-learning methods in scientific projects. I brought up two very serious possible biases. The first is that if emulators are used to replace simulations, and they can't be easily checked (because the simulation requirements are too expensive), the emulators will lead to a confirmation-bias problem: We will only carefully check the emulations if they lead to results that we don't like! The second bias I raised is that if we perform joint analyses on objects (stars, say) that have been labeled (with ages, say) by a machine-learning regression, there will in general be strong biases in those joint analyses. For example, the average value of 1000 age labels for stars labeled by a standard ML regression will not be anything like an unbiased estimate of the true average age of those stars. These biases are very strong and bad! That said, I also gave many example locations where using machine learning methods is not just okay but actually intellectually correct, in areas of instrument calibration, foregrounds, and other confounders.

The question period was great! We had 25 minutes of questions and answers, which ranged across a very wide set of topics, including statistics, experimental design, and epistemology.


Bayesian evidence?

Kate Storey-Fisher, Abby Williams, and I spent some time discussing unpublished work that relies heavily on calculations of the Bayesian evidence. Bayesian evidence—what I call the “fully marginalized likelihood”—relates to the volume of the posterior in parameter space. It is generally extremely sensitive to the width of the prior pdf, since if you are comparing two models with different parameterizations, the numbers you get depend on how you normalize or scale out the units of those parameter-space volumes. Indeed, you can get any evidence ratios you want by tuning prior pdf widths. That's bad if you are trying to conclude something, scientifically! Bayesian inference is only principled, imho, when you can quantitatively state the prior pdf that correctly describes your beliefs, prior to seeing the new data. And even then, your evidence is special to you; any other scientist has to recompute from scratch.
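A tiny numerical demonstration of the prior-width sensitivity (toy data and brute-force quadrature; nothing here is from the unpublished work under discussion): for 50 Gaussian data points with a Gaussian prior on the mean, widening the prior from tau = 1 to tau = 100 drops the evidence by roughly a factor of 100, with no change whatsoever in the quality of the fit.

```python
import numpy as np

def log_evidence(y, sigma, tau, mu_grid):
    """log Z = log integral of p(y | mu) p(mu) dmu, by quadrature on a grid."""
    loglike = -0.5 * ((((y[:, None] - mu_grid[None, :]) / sigma)**2)
                      + np.log(2 * np.pi * sigma**2)).sum(axis=0)
    logprior = -0.5 * ((mu_grid / tau)**2 + np.log(2 * np.pi * tau**2))
    logpost = loglike + logprior
    m = logpost.max()                              # for numerical stability
    dmu = mu_grid[1] - mu_grid[0]
    return m + np.log(np.exp(logpost - m).sum() * dmu)

rng = np.random.default_rng(4)
y = rng.normal(loc=0.3, scale=1.0, size=50)        # toy data, known sigma
mu_grid = np.linspace(-20.0, 20.0, 40001)

lz_narrow = log_evidence(y, 1.0, 1.0, mu_grid)     # prior width tau = 1
lz_wide = log_evidence(y, 1.0, 100.0, mu_grid)     # prior width tau = 100
```

The ratio is set almost entirely by the prior density at the best-fit mean, that is, by tau, not by anything the data say.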


representation of flexible functions

Emily Griffith (Colorado) and I met today to look at replacing a spline interpolation function deep inside some of our code with a Fourier series. The idea is that we need a flexible function of one variable, and we were using a spline of a set of control points, but (for many reasons) we wanted to change to a sum of sines and cosines. The code work was a mess! The small change hits a lot of places inside our model, which is our K-process data-driven nucleosynthetic model. This same problem appears in the new version of wobble by Matt Daunt (NYU). I love flexible functions, but it's hard to implement them in a properly abstracted way. That is, it is hard to write a model so that you can just swap in a Fourier series or a Gaussian process where you used to have an interpolation of control points.
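One way to get the abstraction (a sketch, not the actual K-process or wobble code) is to make the flexible function a linear expansion in a swappable basis, so that a Fourier series and a control-point interpolation become interchangeable components:

```python
import numpy as np

def fourier_basis(x, K, period):
    """Design matrix of 1, cos, sin terms up to order K."""
    cols = [np.ones_like(x)]
    for k in range(1, K + 1):
        cols.append(np.cos(2 * np.pi * k * x / period))
        cols.append(np.sin(2 * np.pi * k * x / period))
    return np.stack(cols, axis=1)

def hat_basis(x, knots):
    """Piecewise-linear 'tent' basis: linear interpolation of control points."""
    B = np.zeros((x.size, knots.size))
    for j in range(knots.size):
        B[:, j] = np.interp(x, knots, np.eye(knots.size)[j])
    return B

class FlexibleFunction:
    """A flexible 1-D function = basis(x) @ coefficients; the basis is swappable."""
    def __init__(self, basis, **kwargs):
        self.basis, self.kwargs = basis, kwargs
    def fit(self, x, y):
        B = self.basis(x, **self.kwargs)
        self.coef, *_ = np.linalg.lstsq(B, y, rcond=None)
        return self
    def __call__(self, x):
        return self.basis(x, **self.kwargs) @ self.coef

x = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + 0.3 * np.cos(4 * np.pi * x)
f1 = FlexibleFunction(fourier_basis, K=3, period=1.0).fit(x, y)
f2 = FlexibleFunction(hat_basis, knots=np.linspace(0.0, 1.0, 25)).fit(x, y)
```

The design choice is that the model only ever sees a design matrix, so swapping spline for Fourier touches one constructor call, not every place in the code that evaluates the function. (A Gaussian process would not fit this linear-basis mold directly, which is part of why the abstraction is genuinely hard.)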


precision spectroscopy

One research highlight from the day was a conversation with Madeleine MacKenzie (ANU) about many things, including measuring magnesium isotopes in high-quality (high resolution and high SNR) stellar spectra. This comes just after a conversation (yesterday) with Matt Daunt (NYU) saying that he wants to do something with the extremely high-quality stellar spectra produced by the jax-wobble pipeline we are building. So I think there is a project to do. If we measure Mg isotopes, even for a few stars, we might be able to fit them into the 2d model of disk abundances that Emily Griffith (Colorado) and I are building. The model is so simple that we would only need a few stars to learn something interesting—including learn that the isotope ratio variations (seen by MacKenzie) represent some new kind of variability. Do they?


first-ever Blanton–Hogg group meeting?

Today was not the first-ever Blanton–Hogg group meeting. But it was the first ever for me, since I missed the first two for health and travel reasons. It was great! Tardugno (NYU) showed simulations of gas disks with embedded planets. The planets affect the disk, and the disk causes the planets to interact. Daunt (NYU) showed that his method for inferring (simultaneously) the spectrum, tellurics, and radial velocities in stellar spectra all works. I am stoked! Novara (NYU) showed that he has a bug in his code! But we had a good discussion inspired by that bug about surfaces of section in a real dynamics problem. Gandhi (NYU) showed us a paper with questionable claims about the CMB light passing through galaxy halos?


machine-learning theory and practice

Today I got invited to be on a panel discussion (hosted by Soledad Villar of JHU) with Alberto Bietti (Flatiron) about the theory and practice of machine learning. It was great! We talked about why ML works for scientific applications, and Bietti said something (obvious maybe) that I loved: Maybe ML only works because of properties of the data. That is, maybe when we are analyzing ML methods we are looking in the wrong place, and we should be analyzing the data to which they are successfully applied? I made fun of interpretation in ML, and that led to interesting comments from both Bietti and the audience. Several audience members suggested taking something more like a causal approach to interpretation: How does the method work under interventions or in conditional situations? That's interesting; it isn't what a physicist would consider interpretation, but it might be sufficient in many cases.


is the world lagrangian?

My day started with a long and very fun conversation with Monica Pate (NYU) about conservation laws in classical physics. As we all know, conservation laws are related to symmetries; each symmetry of the laws of physics creates a conservation law. Or does it? Well, it's a theorem (Noether's theorem)! But it's a theorem when the laws of physics are lagrangian (or hamiltonian). That is, every symmetry of a hamiltonian system is associated with a conservation law in that system. So I asked: How do we know if or whether the world is lagrangian or hamiltonian? How could we know that? My best guess is that we know it because of these very conservation laws! The situation is complex.
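For completeness, the simplest form of Noether's theorem is easy to state: if the lagrangian L(q, qdot, t) is unchanged by the infinitesimal transformation q → q + ε K(q), then the quantity Q = (∂L/∂qdot) · K(q) is constant along every solution of the equations of motion. Time translation gives energy, spatial translation gives momentum, rotation gives angular momentum. Note that the statement itself presumes the lagrangian framing, which is exactly what's at issue here.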


planets forming in a disk

At the end of last week I had a great conversation with Valentina Tardugno (NYU) and Phil Armitage (Flatiron) about how planets form. I spent the whole weekend thinking about it: If a few planets are forming in a proto-planetary disk, there are all sorts of interactions between the planets and the disk, and the planets and each other, and the disk with itself. You can think of this (at least) two different ways:

You can think of the planets as interacting not just directly with one another, but also with the disk, and with one another as mediated by that disk. This is the planet-centric view. In this view, the planets are what you are tracking, and the disk is a latent object that makes the planets interact and evolve.

Alternatively, you can think of the disk, with planets in it. In this disk-centric view, the planets are latent objects that modify the disk, creating gaps and spiral waves.

Both views are legitimate, and both have interesting science questions. We will explore and see where to work. I am partial to the planet-centric view: I want to know where planetary systems come from!


planning your science

I had two interactions today that made me think seriously about big-picture and design things. I like design language: How do you design your whole research program, and how do you design individual projects so they fit into it? One interaction was in the Astronomical Data Meeting at Flatiron, where Vivi Acquaviva (CUNY) talked about the intersection between what you are good at, what is important, and what brings you joy. That's a hard intersection to find. Or way too easy; I am not sure. The other interaction was a conversation with Jiayin Dong (Flatiron), who is thinking about faculty job applications and the like. How to talk about your research in terms of the next decade instead of the next year?

One comment that is frequently made by Hans-Walter Rix (MPIA) is that he feels like most early-career (and even mid-career) people spend too much time doing their science and not enough time planning and justifying their science. It is important to be able to answer “why” questions about your research, and in the medium term it helps all your projects.



I got really lost with respect to research today. In almost all of my projects I am supposed to be mentoring postdocs and students. Today various blocks came up that interfered with that mentoring. And then I found that I had nothing sensible to work on! Of course that isn't true: I have literally a dozen projects in a mature state waiting on final work from me. But I couldn't figure out how to work on any of them. Research is hard. At the end of the day, Andy Casey (Monash) helped me out by giving me some very specific jobs to do.


uncertainty estimation for regression outputs

Most methods for performing regressions don't provide natural uncertainties. Some do, of course! But few deliver uncertainties you will believe. I discussed these issues with Contardo (SISSA) today, in the context of our project to (confidently) find infrared excesses around boring old main-sequence stars. One option is to look at the performance on held-out data. But then you have to decide how to aggregate this information in a way that is relevant for each object in your sample: They probably don't all have the same uncertainty! Another option is to look at the variation of prediction across training sets. That's good! But it requires that you have lots of training data. In this case, we do, so that's where we are at right now.
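The training-set-variation option is basically a bootstrap ensemble; here is a minimal sketch with fake data and a plain linear regression (not our infrared-excess pipeline), showing how each object gets its own uncertainty:

```python
import numpy as np

rng = np.random.default_rng(11)
n, p = 400, 5
X = rng.normal(size=(n, p))                          # fake training features
y = X @ rng.normal(size=p) + 0.5 * rng.normal(size=n)  # fake training labels
X_test = rng.normal(size=(20, p))                    # the objects we care about

preds = []
for _ in range(200):                                 # re-fit on bootstrap resamples
    idx = rng.integers(0, n, size=n)
    coef, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    preds.append(X_test @ coef)
preds = np.array(preds)

point = preds.mean(axis=0)                           # per-object prediction
spread = preds.std(axis=0)                           # per-object uncertainty
```

The spread captures the training-set contribution to the uncertainty; it says nothing about model-misspecification error, which is why held-out performance is still worth checking too.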


regressions for point clouds

I spent my research time today writing in a document that proposes (and demonstrates) some methods for performing machine-learning-style regressions, but where the input objects (features) are variable-size point clouds. Contributions also from Villar (JHU) and Gebhard (MPI-IS). I spent way too long working out the terminology and notation, and I am still wrong.


is a periodic signal in a time series statistically significant?

I had conversations with Nora Eisner (Flatiron) and Abby Shaum (CUNY) today about how we report the significance of a signal we find in a time series. In particular a periodic signal. It's an old, unsolved problem, with a lot of literature. And various hacks that are popular in the exoplanet community (and binary-star community!). My position is very simple: Since all methods for determining significance are flawed, and since when you fit a signal you have to estimate also an uncertainty on that signal's parameters, the simplest and most basic test of significance is the significance with which you measure the amplitude of the proposed signal. That is, if the amplitude is well measured, the signal is real. Of course there are adversarial data sets I can make where this isn't true! But that's just a restatement of the point that this is an unsolved problem. For deep reasons!
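Here is a minimal version of that test: fit the amplitude of the proposed sinusoid by linear least squares, propagate the parameter covariance to the amplitude, and report amplitude over uncertainty. The frequency, data, and noise level are all invented for the demo.

```python
import numpy as np

rng = np.random.default_rng(42)

# toy time series: one sinusoid plus white noise of known amplitude
t = np.sort(rng.uniform(0.0, 30.0, size=300))
sigma = 1.0
y = 0.8 * np.sin(2.0 * np.pi * 0.37 * t) + rng.normal(0.0, sigma, size=300)

def amplitude_significance(t, y, sigma, freq):
    """Fit y = A cos(2 pi f t) + B sin(2 pi f t) by linear least squares;
    return the amplitude sqrt(A^2 + B^2) and its 1-sigma uncertainty."""
    X = np.stack([np.cos(2.0 * np.pi * freq * t),
                  np.sin(2.0 * np.pi * freq * t)], axis=-1)
    XtX = X.T @ X
    A, B = np.linalg.solve(XtX, X.T @ y)
    cov = sigma ** 2 * np.linalg.inv(XtX)   # parameter covariance
    amp = np.hypot(A, B)
    grad = np.array([A, B]) / amp           # propagate to the amplitude
    amp_err = np.sqrt(grad @ cov @ grad)
    return amp, amp_err

amp, amp_err = amplitude_significance(t, y, sigma, freq=0.37)
print(amp / amp_err)   # amplitude signal-to-noise: the significance
```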


teeny tiny cosmological simulations

Connor Hainje (NYU) is looking at this paper by Chen et al which uses a machine-learning regression to interpolate between cosmological simulation outputs at different cosmological epochs. To build an end-to-end pipeline for testing ideas, he has been running 32-cubed cosmological simulations. These might be the smallest simulations run since the 1980s! But, interestingly, he is finding that the interpolation isn't working great. Is this because it is harder to train a regression on a small simulation than it is on a large simulation? Is a small simulation less predictable or less interpolate-able? It's expensive to find out!


gradients of unit vectors

When you work in a curvilinear coordinate system, and you need to take gradients or tensor derivatives of scalar, vector, and tensor functions, the gradients of the unit vectors appear in your expressions. The unit vectors have gradients because, in a curvilinear coordinate system, they have orientations that depend on position. I gestured and imagined and guessed these derivatives for a spherical coordinate system by thinking geometrically. I got strange expressions I didn't believe. Then, today, I checked them by painstakingly taking derivatives, and my intuitive derivatives turned out to be exactly correct?
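For the record, here are the standard results for spherical coordinates (r, θ, φ), with θ the polar angle, that my geometric guesses had to reproduce:

```latex
\frac{\partial \hat{r}}{\partial r} = \frac{\partial \hat{\theta}}{\partial r} = \frac{\partial \hat{\phi}}{\partial r} = 0 \\
\frac{\partial \hat{r}}{\partial \theta} = \hat{\theta}, \qquad
\frac{\partial \hat{\theta}}{\partial \theta} = -\hat{r}, \qquad
\frac{\partial \hat{\phi}}{\partial \theta} = 0 \\
\frac{\partial \hat{r}}{\partial \phi} = \sin\theta\,\hat{\phi}, \qquad
\frac{\partial \hat{\theta}}{\partial \phi} = \cos\theta\,\hat{\phi}, \qquad
\frac{\partial \hat{\phi}}{\partial \phi} = -\sin\theta\,\hat{r} - \cos\theta\,\hat{\theta}
```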


kinematic dipole and dust

I had a long conversation with Kate Storey-Fisher (NYU) and Abby Williams (Caltech) about the dipole in the Quaia catalog caused by the kinematic motion of the Solar System barycenter with respect to the cosmic rest frame. Williams has found that the amplitude of the dipole we get depends very strongly on how we account for dust in our sample. There is currently a controversy about the amplitude of the dipole seen in WISE quasars. We now think that it is possible that the measured amplitude is a strong function of how dust is corrected for in the sample? We designed new tests for next week.


O-minus-C inanity

In the exoplanet (and, before that, eclipsing-binary) communities, transit-timing variations are described in terms of a quantity called O−C (pronounced “oh minus sea”), which is the difference between the observed transit time and the “computed” transit time. Right now, Abby Shaum (CUNY) and I are using this terminology in our manuscript about phase variations in coherent pulsators with companions, at the behest of Keaton Bell (CUNY). Okay fine! But O−C has this terrible property, which is that the C part depends on the period or frequency you assume. You can completely change the appearance or morphology of an O−C plot just by slightly tweaking the period. And there is no true period of course! There are just whatever estimates you can make. Which are, in turn, affected by what you use to model the O−C. So it is absolutely awful in every way. Not a stable observable, people! Not even identifiable.
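To see how unstable this is, here's a toy: perfectly periodic transit times, with O−C computed against the true period and against a period that is wrong by about nine seconds. All numbers are invented.

```python
import numpy as np

# perfectly periodic transit times: true period 2.5 days, zero TTVs
P_true = 2.5
n = np.arange(200)
t_obs = 17.0 + P_true * n

def o_minus_c(t_obs, n, t0, period):
    """Observed minus computed, against a linear ephemeris."""
    return t_obs - (t0 + period * n)

flat = o_minus_c(t_obs, n, 17.0, P_true)            # identically zero
tilted = o_minus_c(t_obs, n, 17.0, P_true + 1e-4)   # period off by ~9 s

print(np.ptp(flat), np.ptp(tilted))   # 0 versus a ~0.02-day ramp
```

Same data, same physics, and one O−C plot is flat while the other shows a ramp of tens of minutes.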


making linear algebra faster in practice

The key thing to make your code run faster is to avoid building large linear-algebra objects. For example, if you need to get the matrix product A.x, where A is a huge matrix and x is a long vector, and you only ever use the matrix A to do this one multiply by x, there is no reason to actually create A. Just create a function that evaluates A.x for any input x. That should be way faster, because of less memory allocation, and because you don't have to make the parts of the matrix that are all zeros (for example). Matt Daunt (NYU) and I discussed all this at length today, as we profiled code. This comment has some overlap with Section 9 of this paper.
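A minimal numpy illustration of the idea, with a tridiagonal stencil standing in for the huge matrix (this is not the code Daunt and I were actually profiling):

```python
import numpy as np

def apply_A(x):
    """Apply the tridiagonal operator with 2 on the diagonal and -1 on
    the off-diagonals (a 1-D Laplacian stencil) without ever building A."""
    y = 2.0 * x
    y[:-1] -= x[1:]
    y[1:] -= x[:-1]
    return y

n = 100_000
x = np.linspace(0.0, 1.0, n)
y = apply_A(x)   # O(n) time and memory; the dense A would be n-by-n
```

If you need to hand such a function to an iterative solver, `scipy.sparse.linalg.LinearOperator` wraps exactly this kind of matvec.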


high-order integration schemes

I was working on a white paper on ocean dynamics today and I threw in a sentence about how emulators (like machine-learning replacements for simulations) might be working because they might be effectively learning a high-order integration method. I then threw in a sentence about how, in many applications, high-order integrators are known to be better than low-order integrators. I then went to find a reference and... well, I am not sure I can back that up with a reference! I thought this was common knowledge, but it looks like almost all simulations and integrations are done with low-order integrators. Am I living in a simulation? (A simulation integrated with wimpy first-order integrators?)


an alternative to the L–S periodogram

Following some experiments and rants over the last few days with Nora Eisner (Flatiron), I wrote down today an algorithm for a hacky replacement of the Lomb–Scargle periodogram. This periodogram method has various bad pathologies, the worst of which is that it presumes that there is exactly one frequency that fully generates the data. If there are two, the assumptions are broken and the good properties are lost.

Not that my alternative has any good properties! It is like the radio interferometry method called CLEAN: It involves iteratively identifying frequencies and fitting them out. It's terrible. But it might be better than some of the wacky hacks that people do right now in the asteroseismology community.
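Here is a sketch of the kind of iterative find-fit-subtract loop I mean, on fake two-frequency data (all parameters invented; this is not the exact algorithm I wrote down):

```python
import numpy as np

rng = np.random.default_rng(3)

# two-frequency time series: this breaks the one-frequency assumption
# behind Lomb-Scargle
t = np.sort(rng.uniform(0.0, 50.0, size=400))
y = (1.0 * np.sin(2.0 * np.pi * 0.31 * t)
     + 0.6 * np.sin(2.0 * np.pi * 0.47 * t)
     + rng.normal(0.0, 0.3, size=400))

def fit_one_frequency(t, y, freq):
    """Linear least-squares cos/sin fit at one frequency; return the
    model prediction and a chi-squared-reduction 'power'."""
    X = np.stack([np.cos(2.0 * np.pi * freq * t),
                  np.sin(2.0 * np.pi * freq * t)], axis=-1)
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    model = X @ coeffs
    return model, np.sum(model ** 2)

freq_grid = np.arange(0.01, 1.0, 0.0005)

found = []
resid = y.copy()
for _ in range(2):   # find and subtract the two strongest frequencies
    powers = [fit_one_frequency(t, resid, f)[1] for f in freq_grid]
    f_best = freq_grid[int(np.argmax(powers))]
    model, _ = fit_one_frequency(t, resid, f_best)
    resid = resid - model
    found.append(f_best)

print(sorted(found))   # near 0.31 and 0.47
```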


classification to save labor

I spent part of the day discussing with Valentina Tardugno (NYU) and Nora Eisner (Flatiron) the goals of a machine-learning classification that Tardugno is creating to help the PlanetFinders project. The deal is: Citizen scientists find candidate planets and (currently) a human (Eisner) has to vet them, to remove contamination by various sources of false positives. This turns out to be a hard problem! When problems are hard, it becomes critical to very precisely specify what you are trying to achieve. So we spent time discussing what, exactly, it is that Eisner needs from a classifier. Is it to find good planets? Is it to remove obvious contaminants? Are some contaminants more problematic than others? Is it to save her hours of wall-clock time? Etc.


Phi-M radio

I worked today with Abby Shaum (CUNY) on her paper about her phase-demodulator to find exoplanet and substellar companions to stars by the timing of asteroseismic modes. I suggested that we highlight the incredible simplicity of her project by writing the method as an algorithm of just a few lines.
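In that spirit, here is a hedged sketch of the core idea, not Shaum's actual code: fit the mode with cos/sin terms in short time chunks, and watch the recovered phase wander as the companion delays the signal. All frequencies, amplitudes, and the injected delay are invented.

```python
import numpy as np

rng = np.random.default_rng(8)

nu = 5.0   # pulsation frequency (cycles per day)
t = np.arange(0.0, 100.0, 0.01)
delay = 2e-3 * np.sin(2.0 * np.pi * t / 40.0)   # injected timing signal (days)
y = np.sin(2.0 * np.pi * nu * (t - delay)) + rng.normal(0.0, 0.1, size=len(t))

def chunk_phases(t, y, nu, chunk=5.0):
    """Fit cos/sin terms at the carrier frequency in short time chunks;
    return the mid-times and the recovered mode phase in each chunk."""
    times, phases = [], []
    for lo in np.arange(t[0], t[-1], chunk):
        m = (t >= lo) & (t < lo + chunk)
        X = np.stack([np.cos(2.0 * np.pi * nu * t[m]),
                      np.sin(2.0 * np.pi * nu * t[m])], axis=-1)
        a, b = np.linalg.lstsq(X, y[m], rcond=None)[0]
        times.append(t[m].mean())
        phases.append(np.arctan2(a, b))
    return np.array(times), np.array(phases)

times, phases = chunk_phases(t, y, nu)
# the recovered phases track the delay: phase = -2 pi nu delay
```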


CZS Summer School, day 5: diffusion

Diffusion models are all the rage in machine learning these days. Today Laurence Levasseur (Montréal) gave a beautiful talk at the CZS Summer School about how diffusion works. She started with a long physics introduction, which was great, and also insightful, about how diffusion works in small physical systems. Then she showed how it can be turned into a method for sampling very difficult probability distributions.

I have a history of working on MCMC methods. These permit you to sample a posterior pdf when you only know a function f that is related to your posterior pdf by some unknown normalization constant. Similarly, diffusion lets you sample from a pdf when you only know the gradient of log f. Again, you don't need the normalization. That makes me wonder: Should we be using diffusion in places where we currently use MCMC? I bet the answer is yes, for at least some problems.
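As a tiny concrete example of sampling with only gradient information, here is unadjusted Langevin dynamics on a unit Gaussian (diffusion models are far fancier than this, but the no-normalization point is the same):

```python
import numpy as np

rng = np.random.default_rng(0)

def score(x):
    """Gradient of log p(x) for a unit Gaussian target; no
    normalization constant needed, just as in MCMC."""
    return -x

eps = 0.01
x = 0.0
samples = []
for i in range(100_000):
    # unadjusted Langevin step: drift along the score, plus noise
    x = x + eps * score(x) + np.sqrt(2.0 * eps) * rng.normal()
    if i >= 1000:   # discard burn-in
        samples.append(x)
samples = np.array(samples)
print(samples.mean(), samples.std())   # approximately 0 and 1
```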


CZS Summer School, day 4: GNNs

Today Andreea Deac (Montréal) gave a talk at the CZS Summer School about graph neural networks, and enforcing exact symmetries. It was a great talk, because it was useful to the students and filled with insights even for the experienced machine-learners. She did a great job of connecting GNNs to other methods in use in ML, including convolutional neural networks, and Deep Sets.


CZS Summer School, day 3: Deep Sets

Today was day 3 of the CZS Summer School, in which I am helping mentor a group of students working on equivariant methods on point clouds. In our working session today, Soledad Villar (JHU) (who is the main mentor for this group of students) gave a short, spontaneous explanation of the main Deep Sets result in machine learning: (Almost) any permutation-invariant function of a set of objects xi can be written in an amazingly simple form: h(Σig(xi)), where h and g are potentially nonlinear functions. That result is super-strong, and super-useful!
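The result is easy to demo. Here's a toy with random (untrained) weights for g and h, just to show the permutation invariance of the h(Σ g) form:

```python
import numpy as np

rng = np.random.default_rng(1)

# random (untrained) weights for the two nonlinear functions
W_g = rng.normal(size=(4, 3))
W_h = rng.normal(size=(2, 4))

def g(x):
    return np.tanh(W_g @ x)   # per-element embedding into a latent space

def h(s):
    return np.tanh(W_h @ s)   # nonlinear function of the pooled sum

def deep_set(xs):
    """h(sum_i g(x_i)): permutation-invariant by construction."""
    return h(sum(g(x) for x in xs))

xs = [rng.normal(size=3) for _ in range(7)]
out1 = deep_set(xs)
out2 = deep_set(xs[::-1])        # same set, different order
print(np.allclose(out1, out2))   # True
```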


other kinds of machine learning

Astronomy is very focused on machine learning in the sense of regression and classification, but machine learning can do many other things. In addition, machine learning is a sub-field of machine intelligence, which is broader. I started today working on a proposal for the NSF (to be written with Mike Blanton, NYU) in which we propose using other kinds of machine learning and machine intelligence, and apply them earlier in the scientific process (like at operations and calibration) instead of at the end (like at source classification and labeling).


spherical harmonics for tensor fields

I have been kicking around the generalization of spherical harmonics to vector spherical harmonics, and how that might generate the tensor spherical harmonics to all orders of tensor and all parities. I think I got it today! For every spherical harmonic (ell and em), there are three vector spherical harmonics obtained by multiplying by the radial vector, taking the transverse gradient, and taking the transverse gradient and crossing it into the radial direction. I think these can be generalized (using, say, the Ricci calculus) to make the 2-tensors and so on. If I am right, this is a new way to represent tensor fields on the sphere. Use cases: Cosmic backgrounds, and ocean dynamics.
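In symbols, the three vector harmonics built from each scalar harmonic (this is my working notation, not any standard reference's):

```latex
\mathbf{Y}_{\ell m} = Y_{\ell m}\,\hat{r}, \qquad
\boldsymbol{\Psi}_{\ell m} = r\,\nabla Y_{\ell m}, \qquad
\boldsymbol{\Phi}_{\ell m} = \hat{r} \times r\,\nabla Y_{\ell m}
```

I think the cross product in the third flips the parity relative to the second, which is how both parities get covered at each ell.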


five kinds of emulators

I wrote in a draft grant proposal related to machine-learning emulators today. I wrote about five different kinds of emulators. Yes I think there are five qualitatively distinct kinds. Here they are:

Full replacement
The most extreme—and most standard—kind of emulator is one that simply replaces the full input–output relationship of the entire simulation. Thus if the simulation starts with initial conditions and boundary conditions, and ends with a final state (after an integration), the full-replacement emulator would be trained to learn the full relationship between the initial and boundary conditions and the final state. A full-replacement emulator is a complete, plug-in replacement for the simulator.
Integration accelerator
Simulation run times generally scale linearly with the number of time steps required to execute the integration. A set of emulators can be trained on a set of snapshots of the simulation internal state at a set of times that is much smaller than the full set of integration time steps. Each emulator is trained to learn the relationship between the internal state of the simulation at one time tA and the internal state of the simulation at a later time tB, such that the emulator can be used to replace the integrator during the time interval from tA to tB. A set of such emulators can be used to replace part or all of the integration performed by the simulator.
Resolution translator
Simulation run times generally scale with the number of grid points or basis functions in the representations of the state. Thus the simulator gets faster as resolution is reduced. An emulator can be trained to learn the relationship between a low-resolution simulation and a matched high-resolution simulation. Then a high-resolution simulation can be emulated by running a fast low-resolution simulation and applying the learned translation.
Physics in-painter
In most physical systems, there are coupled physics domains with different levels of computational complexity. For example, in cosmology, the pure gravitational part of the simulation is relatively low in computational cost, but the baryonic part—the atoms, photons, ram pressures, magnetic fields—is very high in computational cost. The simulator gets faster as physics domains, or equations, or interaction terms, are dropped. An emulator can be trained to learn the relationship between a simulation with some physics dropped and a matched full simulation. Then a full-physics simulation can be emulated by running a partial-physics simulation and applying the learned in-painting of the missing physics.
Statistics generator
In many contexts, the goal of the simulation is not to produce the full state of the physical system, but only certain critical statistics, such as the two-point correlation function (in the case of some cosmology problems). In this case, there is no need to emulate the entire simulation state. Instead, it makes sense to train the emulator to learn only the relationship between the initial and boundary conditions of the simulation and the final statistics of particular interest.


raw data from Cassini

One thing we discovered this past academic year is that NASA Cassini took more than 300,000 images of Saturn's rings! Today I met with Maya Nesen (NYU) and Ana Pacheco (NYU) to look at Cassini raw spacecraft data. Nesen is working on the tabulated housekeeping data, giving the position and orientation of the spacecraft and instruments in various coordinate systems (that we are trying to work out). Pacheco is working on the raw imaging data from the imaging module. We discussed how to display the imaging so that an astronomer can confirm the noise level and rough noise properties in the pixels. We discussed adjustments to our plots of the housekeeping data to aid in our interpretation of it. In particular, we looked at some of the camera-related metadata and it looks like the camera might have a few different zoom settings. I guess we have to read some documentation!


building trust in emulators

I started writing in a possible grant proposal (that would be in collaboration with others) about the trustworthiness of machine-learning emulators. Emulators are systems that learn the input–output relationship of a computationally expensive simulation and produce (or speed the computation of) new simulation outputs, reducing total computational requirements for a given number of simulations. These are so important now that the ESA Euclid and Simons Observatory data-analysis plans crucially involve emulation.

The issue is: How do we trust that the emulators are giving good outputs? There is no obvious way to test them, except by comparing to held-out training data. But in large-scale structure contexts, no amount of held-out data can test the enormous input data space. I don't know how we will ever trust such systems (and damn do we need to!), but I have some ideas about how to improve the situation. One involves enforcing physics symmetries on the emulators. Another involves running adversarial attacks on them.


truly zero-metallicity stars

All week I have been discussing with Hans-Walter Rix (MPIA) the possibility that we could find the elusive, truly zero-metallicity stars in the Milky Way halo. This is from an email I wrote today, reacting to the observational point that no-one has ever seen such a star:

How could there be absolutely ZERO zero-metallicity stars in the Milky Way? Here's everything I got:

  1. Maybe there are literally no low-mass stars ever made at zero metallicity. Absolutely none, at high precision. This is possible, given how little we know about star formation and the IMF.
  2. Maybe stellar evolution is so weird at zero metallicity that low-mass stars go dark by 13 Gyr. I very much doubt that this is a possibility. But maybe low-mass stars at primordial abundances never burn and just slowly collapse into white-dwarf-like condensed objects on a very long cooling timescale. They would be super cold by now, like 100 K maybe??
  3. Maybe low-mass stars form slower than high-mass stars in star-forming regions at zero metallicity. This permits a few of the high-mass stars to quickly evolve and explode, polluting the outsides of the just-forming low-mass stars. These low-mass stars would then have low but non-zero surface metallicities, and be very alpha-enhanced. This is possible, although would it really leave NO zero-metallicity low-mass stars behind?
  4. Maybe low-mass stars somehow self-pollute at formation (or later in their lives). Maybe the nuclear fusion kicks in slightly before gravitational steady-state or radiative zones are set up, and the first bit of nuclear burning gets mixed into the stellar envelopes? These stars would appear to have non-zero (but weird, carbon-enhanced maybe?) abundances. Or maybe this happens later in life because of some weird internal mixing. I have literally no idea whether any of this is possible.


elusive quasar dipole

There should be an imprint of the kinematic dipole observed in the cosmic microwave background in any cosmological tracer: The dipole is set by the Solar System barycentric velocity relative to the local Hubble flow, and that same velocity should imprint a dipole on anything cosmological. I have been working on this in part because it is a good measurement to make with Quaia, and in part because the dipole in the quasars is controversial. I have many thoughts, but I will save them for later.

Anyways, Abby Williams (NYU) has been working on making this measurement, and her dipole amplitude and direction depend on what we hold fixed and what we vary (in particular selection-function components), and they also depend on what sky region we use. None of this is surprising; the selection function has a strong dipole in it, and it is not known precisely. But then I don't understand how the studies published previously have such good error bars. Maybe they didn't consider the various different fitting regimes?


finding truly zero-metallicity stars

My conversation with Bergemann (MPIA) yesterday and a conversation with Fouesneau (MPIA) today made me think that it should be possible to do a complete search of the ESA Gaia XP spectra for zero-metallicity (and I mean primordial-abundances) stars. The spectra are easy to compute, and the hypothesis tests against normal stellar models are all set up already in the codes run at MPIA. Let's do this and find the elusive (literally zero membership, apparently) population-III stars in the Milky Way.


convection, granulation, stellar spectra

I had a great and long conversation today with Maria Bergemann (MPIA) about building a model of a full stellar spectrum out of the models they build of small patches of stellar surface, with full 3D convection and full radiative transfer. Their models are sophisticated, and give a full spectrum in every direction from the surface patch. Thus we can integrate a set of patches into a surrogate combined spectrum for one rotating star covered in convecting patches. We discussed how we might do that, technically, and what projects we might then do with the output. What I want to do (yes, you guessed it) is build data-driven models of stellar granulation to improve radial-velocity surveys.


an insight about machine learning

Gaby Contardo (SISSA) completed her visit to Heidelberg today. Over coffee this morning she delivered a very simple, but very nice insight about machine learning outputs. Apologies that this is very Inside Baseball:

As I like to emphasize, you can't really average (or do any populations inferences with) a collection of labels delivered by a discriminative ML method run on a collection of objects. Think: Finding the mean age of a cluster, where each star in the cluster got an age estimate from a discriminative ML method trained on stars with known ages. This is because the discriminative ML methods output something very akin to posterior quantities, and if you average a bunch of posterior estimates, you are multiplying in a prior times itself many times; eventually the prior dominates the inference (in many cases).

Contardo's point: If what you want is a label for a collection of objects, like that mean age, you should train on collections of objects. That is, make a training set where you have sets of N stars, labeled by mean age. Then this model can be applied to a new collection of stars and deliver a mean age estimate! Haha, brilliant. And correct. And consistent with the rules of inference.
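A one-dimensional Gaussian toy makes the first point vivid (all numbers invented): every star has the same true age, the per-star "ML outputs" are posterior means under a training-set prior, and their average lands halfway to the prior mean.

```python
import numpy as np

rng = np.random.default_rng(5)

true_age = 4.0           # every cluster star has this age (Gyr)
sigma = 2.0              # per-star measurement noise
mu0, tau = 8.0, 2.0      # training-set prior: N(8, 2)

# noisy per-star age measurements
x = true_age + rng.normal(0.0, sigma, size=1000)

# per-star posterior means under the Gaussian prior: the kind of
# quantity a discriminative ML regression tends to output
w = tau ** 2 / (tau ** 2 + sigma ** 2)
post_means = w * x + (1.0 - w) * mu0

print(x.mean())           # near 4: the data know the truth
print(post_means.mean())  # near 6: averaging posteriors bakes in the prior
```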


mapping the image plane of a spectrograph

I had a phone conversation about wavelength-calibrating the multi-object APOGEE instrument with Karlo de Leon (NYU) today. He has arc images from each night, and line lists for the arc lamps. But before even using the arc lamps, I recommended that he try to find a model for the 2D images that is an outer product of 1D functions: One is the intensity as a function of wavelength from the arc lamp, and the other is the intensity as a function of slit position from the fibers on the slithead.

The thing we realized in the call is that the coordinate system is right when the image is well described as the outer product of these two functions, warped according to that coordinate system! Okay that's nice, now what will the residuals look like? One issue is that there are cosmic rays, hot pixels, and so on. Another issue is that there will be some vignetting that violates the strict outer-product model. We'll address these issues once we get close.


extreme infrared excesses

Gaby Contardo (SISSA) showed up in Heidelberg today to make progress on our project on infrared excesses in normal, non-young FGK stars. Because we are using NASA WISE data (along with ESA Gaia and NASA 2MASS), we are only sensitive to bright, hot infrared excesses, much hotter and brighter than typical debris disks around old stars. We have some candidates, which range in temperature from 300 to 1500 K and are reprocessing maybe one percent or a fraction of a percent of the stellar light. (Warning: I haven't calculated this; this is just a guesstimate based on looking at plots.) What are those things? Today we figured out that they can't be warm substellar companions, so they have to be dust (I guess??).


a likelihood for our Phi-M radio

There are AM radios and FM radios and (if you are a nerd) PCM radios. But Abby Shaum (CUNY) and I have built a Phi-M radio, which demodulates phase variations in a carrier signal. We (with Keaton Bell, CUNY) are using it to find binary companions and planets around stars that show coherent pulsation modes in their photometry. Today I wrote down a noise model for the output of our demodulator. It isn't completely trivial. But it's good, because we can make a likelihood function for fitting our companions. Our model will end up being a limit of the more general model called Maelstrom by Dan Hey (Hawai'i).


using catalogs responsibly

I had a conversation with Vedant Chandra (Harvard) today about how catalogs are used, and how that relates to how they are built. We started off by arguing about how principled one should be about doing a populations inference. Too abstract! So Chandra moved us in the pragmatic direction: Let's look at a very specific inference and see what matters about it. We decided to look at the distances to distant clusters in the ESA Gaia data: How do your inferences depend on the number of stars you use, the signal-to-noise ratios of those stars, and whether your individual-star measurements are maximum-likelihood or obtained by consideration of a posterior pdf? That should answer questions, and set up some concrete points of discussion.


data-driven information

My day started with a conversation with Wolfgang Brandner (MPIA), who asked me how to figure out the information content of ESA Gaia RVS spectra, but in a data-driven way. He wants to avoid the theoretical models at first; that is, he wants to figure out how precisely the spectra contain temperature and metallicity and age information without having temperatures, metallicities, and ages that we believe. One approach is to compare to other data that are sensitive to temperature, metallicity, and age: If the RVS spectra can predict those data, then (conditioned on assumptions) they must contain information about temperature, metallicity, and age. This is similar to questions of risk (or expected error in prediction) in machine-learning contexts.


wobble, star spots, quasar dipole

[Time to try to re-start this forum.]

I spent this morning on three different small activities. One was giving feedback to Matt Daunt (NYU) who is trying to re-build the wobble concept for stellar radial-velocity measurement in jax. He has annoying optimization issues, which are very hard to diagnose! Optimization is always nasty, in my experience.

Another activity was working on the abstract for Lily Zhao's (Flatiron) upcoming paper on stellar variability in the spectral domain, generated by rotating, spotty stars. She is concerned that the paper is too conceptual. I love conceptual papers! I think science moves forward through concepts and implementations, and no individual paper has to do it all.

My third activity this morning was working through the mathematics on a project of Abby Williams (NYU, Caltech) to measure the kinematic dipole in the all-sky Quaia quasar catalog. There are so many different ways to measure it. I think I have a justifiable likelihood function approach, and one in which we could marginalize out—or profile out—the uncertainties in the selection function we have estimated. It's a controversial subject, so I would like to do things correctly.



My only real work time today came at the end of the day, working with Emily Griffith (Colorado) on our paper about element abundance ratios in the SDSS-V APOGEE data. We find (like others) that the abundances can be explained pretty precisely with only two processes. My position (contrary to others) is that this is not because there are two dominant kinds of supernovae! It is because the disk mixes gas quickly in the azimuthal direction, such that a star's properties are mainly set by the (cosmic) time at which it was born, and the radius at which it was born. If I'm right, then you can't really use disk stars to understand process yields, since they are always an ugly mixture of processes. If I am wrong, there's lots more to do!


Dr Kate Storey-Fisher

Kate Storey-Fisher (NYU) defended her PhD here at NYU today. She killed it! She talked about emulating cosmological simulations (at the level of statistics, not maps), making invariant scalars that encode the shapes and dynamics of dark-matter halos, and her awesome 1.2 million all-sky quasar catalog from ESA Gaia and NASA WISE. It was all things my loyal reader knows lots about, but I loved it. It has been an honor and a privilege to work with KSF these years, and I will miss her very very much.


Dr Irina Espejo

Today it was my honor to serve on the PhD defense committee of Irina Espejo (NYU), who is one of the first (ever in the world, actually!) PhDs in Data Science. Her PhD research involved making real, practical, scalable, reproducible tools for the (late-in-pipeline) analysis of high-energy physics data from the Large Hadron Collider. She built tools to speed up likelihood-free inferences, and she built a tool to find exclusion regions (upper limits) in complex parameter spaces. She used the latter to put constraints on a (real, not toy) proposed modification to the standard model.

On the first project, the tools that she built (and built on) make the LHC more sensitive to new physics, because they find better test statistics for distinguishing models. They make some searches far better, which makes me wonder whether particle physics is using our money efficiently??


how to extract XP spectra from raw Gaia data?

On the plane home from meetings at Cambridge, Warwick, and Paris, I worked on a long document I am writing for Gaia DPAC CU5, which is the organization responsible for calibrating and extracting the Gaia XP spectra. They are doing a beautiful self-calibration to extract all the spectra on the same system, in the sense of resolution, dispersion, and throughput. But their system has some pathologies, which we discussed last week. I think I know how to solve some of them. My document is reporting those thoughts.

Writing like this reminds me of graduate school: One of my advisors (Blandford) often encouraged me to write up thoughts, ideas, projects, and proposals, even when we had no intention of submitting them anywhere. It's good practice, I think, because you can't understand anything if you don't write about it.


how to maximize the yield of planets?

There were discussions this week at University of Warwick about the Terra Hunting Experiment strategy and likely detection capability. Various take-homes include that we need to mitigate lots of stellar noise, and that we care deeply about the covariance (as a function of separation in time) of adjacent measurements. I advocated that we split our ten-year survey into two or three surveys, of varying length. In the first, we learn about the stars, and in the last, we go to town on the very most promising targets. There was general agreement that this is a good idea. But now we need a very specific plan for what this means. As my loyal reader knows, in my view, the decisions must be based on repeatable operations, so that we have some hope of learning statistical things about populations in the end.


predicting RVs from SDO imaging

I'm at the Terra Hunting annual Science Working Group meeting, held this year at University of Warwick. There were many great talks today, some technical and some scientific. My mind was blown by Ben Lakeland (Exeter), who showed Solar Dynamics Observatory data of the Sun, and then showed that, from these images, he can predict the magnetic-activity-generated RV signals in simultaneous EPRV measurements of the Solar RV. That's pretty exciting. He also showed that much of the time, the RV variations are dominated not by magnetic activity per se, but by convective motions on the solar surface. If we are going to beat one meter per second, we are going to have to correct for convective shifts. Somehow!?


distances between point clouds

I spent the last two days working at Apple Paris, which was fun! I worked with the open-source ott-jax package, which can do some amazing things. I worked with Soledad Villar (Apple & JHU) to generalize the k-means algorithm to point clouds! It can cluster point clouds morphologically, even if the different point clouds have different numbers of points, and even if the different point clouds live in spaces of different dimensions! Everything obeys permutation and rotation symmetries.
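To give the flavor (with a crude numpy stand-in, not the actual ott-jax optimal-transport machinery): a permutation-invariant distance between clouds of different sizes, plus k-medoids, already clusters clouds morphologically.

```python
import numpy as np

rng = np.random.default_rng(2)

def chamfer(P, Q):
    """A simple permutation-invariant distance between two point clouds
    with (possibly) different numbers of points. A crude stand-in for
    the optimal-transport distances that ott-jax provides."""
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

# two morphological classes of clouds, with varying point counts
clouds = [rng.normal((0.0, 0.0), 0.5, size=(rng.integers(5, 15), 2))
          for _ in range(10)]
clouds += [rng.normal((10.0, 0.0), 0.5, size=(rng.integers(5, 15), 2))
           for _ in range(10)]

D = np.array([[chamfer(P, Q) for Q in clouds] for P in clouds])

# k-medoids with k = 2, seeded with the farthest-apart pair of clouds
medoids = list(np.unravel_index(np.argmax(D), D.shape))
for _ in range(10):
    labels = np.argmin(D[:, medoids], axis=1)
    for k in range(2):
        idx = np.where(labels == k)[0]
        medoids[k] = idx[np.argmin(D[np.ix_(idx, idx)].sum(axis=0))]
```

This toy distance is not rotation-invariant the way our real construction is; it's only here to show the clustering-on-clouds pattern.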


calibrating Gaia

I was honored today by being invited to a meeting of the group at Cambridge (UK) that extracts and calibrates the low-resolution ESA Gaia XP spectra. The model is bilinear: Each source is represented as a linear sum of basis functions, and each observation of each source has an expectation which is that linear sum multiplied by a convolution kernel, which is also represented as a sum of components. These components are smooth functions of position in the device and wavelength. It's a very nice system! I went through it all with them and said what I would have done differently (which is not much, I have to admit).


extracting DNA

I spent the morning today at a conference for educators and education students at Queens College. It was great! I went to a session on classroom biology, in which we extracted DNA from a strawberry, using only household chemicals (detergent, salt, and alcohol). It was great, and I worked with two really cool lab partners, both Queens College first-year undergraduates.


a well-posed problem in gastrophysics

Magda Siwek (Harvard) gave an excellent NYU Astrophysics Seminar today, about the evolution of binary systems when the binary is accreting from a circumbinary disk. She sets a few (just a few) disk parameters, and then sets the mass ratio and eccentricity of the binary, and seeks steady-state (low disk-mass or low accretion-rate) solutions. By ignoring electromagnetic fields and various bits of microphysics, she can create a setup that is completely scale-free, so it applies (approximately) at all scales, from exoplanets to super-massive black holes. That's brilliant. She finds that the eccentricities are in general driven to non-zero steady-state values, which depend (strongly) on mass ratio and (maybe weakly) on disk parameters. That's a nice problem, and observationally relevant to projects we are doing right now.


how to publish a data set?

Today, working with Soledad Villar (JHU), I was faced with the age-old problem (okay only 20 years old) of how to publish a substantial data set on the web. I tried pushing to github but it was too big. I guess I could host it myself, but we know how that story ends. I could add it to a machine-learning compilation, maybe? I need to figure that last option out.


AB magnitudes from ESA Gaia

I got frustrated by this ESA Gaia documentation today. People: If you put something in a table, name it in the table the same way you name it in the equations in your paper! And if you explain completely two magnitude systems, then make note of which one you used in the catalog. Anyway, I finally figured out how to convert ESA Gaia magnitudes to and from calibrated flux densities (and therefore AB magnitudes) after re-reading the documentation a few times. This is for my quasar homogeneity projects with Abby Williams (NYU) and Kate Storey-Fisher (NYU).
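For the record, the AB part is the easy bit; here is the generic relation (the Gaia-specific zeropoints still have to come from their documentation, so none appear below):

```python
import numpy as np

def ab_mag_to_fnu_jy(m_ab):
    """AB magnitude to flux density in Jansky: m_AB = -2.5 log10(f_nu / 3631 Jy)."""
    return 3631.0 * 10.0 ** (-0.4 * m_ab)

def fnu_jy_to_ab_mag(f_jy):
    """Flux density in Jansky to AB magnitude (inverse of the above)."""
    return -2.5 * np.log10(f_jy / 3631.0)
```

Converting a Gaia instrumental magnitude to AB then just requires the band's photometric zeropoint in the right system, which is exactly the number the documentation makes hard to find.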


water forces on bacteria

Today we had a really great colloquium by Ned Wingreen (Princeton), about water forces on bacteria and how communities of bacteria can be seen as an active material. He showed theory and data for simple experiments in which they can change the osmotic pressure on a wet surface where bacteria are moving. They can tune the water-driven forces on the bacteria and change their behaviors.

After that, at wine and cheese, David Grier (NYU) showed me (and lots of students) a home-built device that levitates (or really traps) tiny objects using acoustic waves. It was awesome.


coordinate-free reading?

The world is O(3) equivariant. Meaning: The laws of physics don't depend on the orientations of things, nor do they depend on the orientation of your coordinate system. But handwriting—and printed words—are not equivariant: Writing systems have a definite orientation and parity. Indeed, it can be hard to read things when they are reversed in a mirror or at an odd angle. Pick up a paper from your desk and read it. Before you start, you have to orient it. How do you do that?

My answer is: Context. I think you try different orientations until one seems to work for the reading. You can't always tell from a single letter (like an M or a W or an O), but you can tell once a string of a few letters or numbers is visible. Inspired by all this, Villar and I are making a data set of this kind (among others) for learning and reasoning tasks.
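A toy version of the orientation ambiguity is easy to generate; this sketch (mine, not the actual data set) produces the eight orientations of a glyph under rotations and reflections:

```python
import numpy as np

def dihedral_orbit(img):
    """All eight rotations and reflections (the dihedral group of the square)
    of a 2D array -- the orientations a scanned glyph could appear in."""
    out = []
    for k in range(4):
        r = np.rot90(img, k)   # rotation by k * 90 degrees
        out.append(r)
        out.append(np.fliplr(r))  # ...and its mirror image
    return out

# a crude, asymmetric glyph (a toy stand-in for a letter)
glyph = np.array([[1, 1, 0],
                  [1, 0, 0],
                  [1, 0, 0]])
orbit = dihedral_orbit(glyph)
```

For an asymmetric glyph all eight orientations are distinct; for an O or an X many of them coincide, which is exactly why a single character underdetermines the orientation.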


something in astronomy is wrong

Today I chatted with my old friend Phil Marshall (SLAC) about various things. Well actually I ranted at him about catalogs and how their use is related to the way they are made. He was sensible in reply. He suggested that we write something, and maybe also develop some guidance for the NSF LSST developer community. His recommendation was to create some examples that are simple but also that connect obviously to research questions in the heads of potential readers. Easier said than done! I said that any such paper needs a good title and abstract. He pointed out that this is true of every paper we ever write! Okay fine.


Are there young, alpha-rich stars?

I asked this question in Data Group meeting: Emily Jo Griffith (Colorado) and I have a data-driven nucleosynthetic story for essentially every red-giant-branch star in the SDSS-IV APOGEE survey. Since the parameters of this model relate to the build-up of elements over time, they might be used to indicate age. We matched to the NASA Kepler asteroseismic sample and indeed, our nucleosynthetic parameters do a very good job of predicting ages.

On the RGB, age is mass, and the asteroseismology gives you masses, not ages. There are some funny outliers: Stars with large masses, which means young ages, but with abundances that strongly indicate old ages. Are they young or old? I am betting that they are old, but that they've undergone mass transfer, accretion, or mergers. If I'm right, what should we look for? The Data Group (plus visitors) suggested looking for binarity, for vertical action (indicating age), for ultraviolet excess (indicating a white-dwarf companion), for abundance anomalies, and for elevated Gaia RUWE. Will do! My money is that all these stars are actually old.



Today I went to the L2G2 (Local Local Group Group) meeting at Columbia. This meeting started with a dozen of us around a table and is now 50 people packed into the Columbia Astronomy Library! A stand-out presentation was by Grace Telford (Rutgers), who showed beautiful spectroscopy of low-metallicity O stars. From their spectral features and (in one case) surrounding H II region, she can calculate their production of ionizing photons. This is all very relevant to the high-redshift universe and reionization. Afterwards, Scott Tremaine (IAS) argued that The Snail could be created by random perturbations, not just one big interaction.


two-dimensional disk?

I had a great set of conversations today with Griffith and Eisner about the relationship between the idea that the abundance patterns in stars in the disk look very two-dimensional and the idea that the formation of stars in the disk might depend only on birth radius and birth time. If these two things are related, it creates all sorts of new projects for measuring stellar ages and for reconstructing the formation history of the disk.


scope of a paper

Emily J Griffith (Colorado) and I have been working on a two-process (or really few-process) model for the creation of the elements, fit to the abundances measured in the APOGEE survey. Our big conversation this week has been about the scope for our first paper: We have so many results and ideas we don’t know how to cut them into papers. Today we made a tentative scope for paper one: we’ll explain the model, deliver a huge catalog of abundance information, and demonstrate the usefulness for practitioners of Galactic archaeology. Then, later, we can actually do that archaeology!


a blackboard talk on orbital torus imaging

I gave the brown bag talk (chalk only) today at the NYU Center for Cosmology and Particle Physics. I spoke about torus imaging—using moments of the abundance distribution to measure or delineate the orbits in the Galaxy. I focused on the theory of dynamics and what it looks like if you can insert new invariants. The questions were great, including hard ones about non-equilibrium structures, radial migration, and chaos. All these things matter! Talking at a board in front of a skeptical, expert audience is absolutely great for learning about one's own projects, communication, and thinking.


causal structure in ML

Today I had the honor of serving on the PhD advising committee of Yan Liang (Princeton), who is designing her PhD project. She is adding causal structure to an autoencoder such that it can separate stellar variability-induced radial-velocity signals from exoplanet-induced signals in extreme precision radial-velocity data. Her method design is novel, and tests suggest that it might work. The committee recommended adding even more causal structure and physics knowledge (more is probably always better, provided that it isn’t incorrect)! As my loyal reader knows, I think this is the frontier for machine learning in the natural sciences: adding causal structure.


stars don't orbit at their guiding radii!

A subject of conversation all week has been about stellar orbits in the Milky Way disk, driven by some visualizations made by Adrian Price-Whelan (Flatiron). We often describe the azimuthal action or the z component of the angular momentum (L_Z or J_Phi) of a disk star in terms of the guiding radius, or the radius of a circular orbit of the same azimuthal action. The idea is: If the star has radial action, it will oscillate around the guiding radius as it orbits. Wrong!! If the vertical action is comparable to or larger than the radial action (and that’s typical), the star will orbit outside the guiding radius, always. The trivial picture is simply incorrect.
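This is easy to check numerically. The following is a toy of my own (a flattened logarithmic potential with made-up parameters, not the model behind Price-Whelan's visualizations): integrate an orbit with substantial vertical motion and look at its cylindrical radius relative to the guiding radius.

```python
import numpy as np

def accel(pos, v0=1.0, q=0.8, rc=0.1):
    """Acceleration in a flattened logarithmic potential (a toy model)."""
    x, y, z = pos
    denom = rc**2 + x**2 + y**2 + (z / q)**2
    return -v0**2 * np.array([x, y, z / q**2]) / denom

# leapfrog (kick-drift-kick) integration of one orbit
pos = np.array([1.0, 0.0, 0.0])
vel = np.array([0.0, 1.0, 0.5])          # substantial vertical velocity
Lz = pos[0] * vel[1] - pos[1] * vel[0]   # z angular momentum; conserved
dt, radii = 2.0e-3, []
for _ in range(30000):
    vel = vel + 0.5 * dt * accel(pos)
    pos = pos + dt * vel
    vel = vel + 0.5 * dt * accel(pos)
    radii.append(np.hypot(pos[0], pos[1]))
radii = np.array(radii)
print("min R:", radii.min(), "mean R:", radii.mean(), "Lz/v0:", Lz)
```

For this potential the circular velocity is close to v0 away from the core, so the guiding radius is roughly Lz / v0; comparing radii.min() and radii.mean() to that number makes the effect of the vertical action directly visible.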


purity and completeness

Today Kate Storey-Fisher (NYU) and I discussed how to estimate the stellar contamination of her Gaia and WISE quasar catalog. Because there are few large, complete samples of anything, it’s hard to do this by comparison with any kind of Ground Truth™. What we realized on the call is that it’s easier to estimate how the contamination *changed* as we went from the Gaia quasar candidate table to our final sample. We discussed how to use what external data we have to estimate this.


CMB component separation with linear fitting

Today I sat down with Fiona McCarthy (Flatiron) to look at data-driven methods for separating cosmic microwave background data into different components. We implemented a simple polynomial regression to fit foregrounds, using (observed) difference maps as inputs (features) that are designed to contain foregrounds only. We obtained some preliminary results that looked exciting but we’ve only just started. Part of the motivation is that CNNs are hard to train, but linear combinations of image monomials are easy! I realized in all this that there are connections to the group-equivariant stuff I’ve done with Villar’s group, because we use invariants, and also to the causal inference things that Schölkopf’s group does, because we’re trying to impose some causal structure on our functions.
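Here is a toy version of what we implemented (with fake maps standing in for the real observed difference maps):

```python
import numpy as np

rng = np.random.default_rng(0)
npix = 5000
cmb = rng.normal(size=npix)                 # the signal we want to keep
fg = 2.0 * rng.normal(size=npix)            # a foreground
obs = cmb + fg                              # observed map
tracer = fg + 0.1 * rng.normal(size=npix)   # difference map: foreground-only, noisy

# features: monomials of the foreground tracer, fit by linear least squares
X = np.stack([tracer, tracer**2, tracer**3], axis=1)
coef, *_ = np.linalg.lstsq(X, obs, rcond=None)
cleaned = obs - X @ coef
```

Because the tracer contains (by construction) essentially no CMB, subtracting the best-fit monomial combination removes foreground power without much biasing the signal.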



Imagine that you want to read all the text in an arbitrary image of the world. That text will lie at different locations, rotations, shears, and even reflections (think signage painted on windows! or mirrors!) in an image. As training for a baby problem in this area I made this training set today.


catalogs rant

Should I write this paper?

Abstract: Observational astronomy projects often produce catalogs—of stars, galaxies, quasars, planet hosts, and so on—for use in other projects. How can we use these catalogs responsibly? The answer to this turns out to be complex; it depends sensitively on how the catalogs were made. In particular, if the catalog entries were obtained by operations on a set of (nearly) independent or separable likelihood functions, the catalog can be used in a much wider set of circumstances than if the catalog entries were obtained by operations on a posterior pdf or on likelihood functions involving important shared parameters or shared data or shared prior information. This is true no matter whether the subsequent analyses of the catalog are Bayesian or frequentist. Importantly, at the present day, many important catalogs are being made from the outputs of MCMC runs or discriminative machine-learning methods (classifications or regressions). These catalogs are very hard or even impossible to use for population studies. I demonstrate these points mathematically, and also with toy examples from cosmology, stars, and exoplanets. I recommend that catalogs be designed and made with the feasibility of particular end-user investigations as explicit requirements.


abundance moments are the new actions

I had fun today talking to Neige Frankel (CITA) about all things Snail-y. We discussed how to verify the stellar parameters we are using for our Snail studies. One issue is that we want to check that things are (fairly) well mixed along orbits, but we need a theory of the orbits to check this. I recommended that, instead of computed actions (integrals of motion), we use statistics of the abundance distribution. After all, if the stars are well mixed, moments of the abundance distribution ought to be constant on orbits. If you just need the actions to label the orbits, abundance moments serve as replacements. Actions are theoretical and unobservable. Abundance moments are observable and measurable (noisily maybe!).


reproducibility; reionization

Today featured a blackboard talk by Sultan Hassan (NYU), about a semi-analytic model to explain the various bits of data we have about the reionization of the universe at redshifts around 7. The model is baroque, but there are no options when it comes to problems that are deep in gastrophysics.

After lunch I spent an hour on a panel organized by NYU Libraries about reproducibility in the natural sciences. That was fun; so many ideas! One interesting idea is that it is transparency, more than reproducibility, that is important. Another was a technical suggestion: If you want your students to be good at making reproducible code, they shouldn't bring you plots of their results, they should bring you code that you can run to make those plots! Haha, genius.


nulling the CMB

I had a fun conversation today with Fiona McCarthy (Flatiron) and Colin Hill (Columbia) about combining CMB maps that have been contaminated (by God) with foregrounds. The issue is that any machine-learning method for finding combination weights will deliver weights that are covariant with the true CMB signal, and thus bias the results importantly. We figured out that there are linear combinations of the maps that will have (by design) zero CMB signal in them! If we train our machine-learning method using those, it can't be sensitive to the CMB signal itself? Will it work? We'll see.
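Here is the idea in toy numerical form (fake maps, and I assume unit CMB response in every channel, as for calibrated maps in thermodynamic units):

```python
import numpy as np

rng = np.random.default_rng(1)
npix, nmap = 4000, 4
cmb = rng.normal(size=npix)
# each observed map = CMB plus its own foreground
maps = np.stack([cmb + 2.0 * rng.normal(size=npix) for _ in range(nmap)])
a = np.ones(nmap)   # CMB response of each map (unit, by assumption)

# null space of a: linear combinations with exactly zero CMB content
_, _, Vt = np.linalg.svd(a[None, :])
null_basis = Vt[1:]             # (nmap - 1) vectors orthogonal to a
null_maps = null_basis @ maps   # each has zero CMB by construction
```

Any combination weights learned from null_maps alone cannot be covariant with the CMB, because the CMB was projected out before training ever starts.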


guarantees on diffusion

Today at JHU I went to the joint group meeting of Jeremias Sulam (JHU) and Soledad Villar (JHU), in which Jacopo Teneggi (JHU) showed guarantees on correctness for some diffusion-based de-noising schemes for images. The guarantees are a bit weak, because it is hard to put a hard boundary on coverage in image space! But such guarantees are essential for medical applications (the illustrated domain). I have to say, it was nice to see a rigorous approach to errors from machine-learning methods. I think this is necessary for our uses in cosmology, where we are expecting machine-learning emulators to be as accurate as simulations!?!


nerve-wracking talk

I spent a ride down to Baltimore preparing a talk for mathematicians. That's outside my comfort zone. I gave the talk at the MINDS Institute at Johns Hopkins at lunchtime. It was about passive symmetries, active symmetries, classical physics, and machine learning. There was no math. In the end, they asked me only a few questions I couldn't answer. I hypothesized that the difference between passive and active symmetries is that the latter are statements about interventions.


thermodynamics of cosmic gas

The day ended today at Flatiron with a great Colloquium by Eiichiro Komatsu (MPA) about the temperature of cosmic gas. Gravitational collapse heats the gas, and that takes it up to something like 2 million degrees. This was computed ages ago by Peebles and others, but is now measured. There was a lot of discussion during and after about other heating mechanisms, and what things constitute gravitational heating. I'm interested in whether this result meaningfully constrains scattering interactions between the dark matter and baryons; if they scatter, and the dark-matter particles are heavier, the baryons will (eventually) get exceedingly hot.


citing things

I spent a big part of today working on finishing up a paper with Megan Bedell (Flatiron). My job was to fill in missing references. I'm still not efficient at this, more than 30 years into my astronomy career.



Once a year (and differently every year), we get together as much of the astronomical community in New York City as we can and have them give fast talks. Today was great! I learned a huge amount, and no highlight reel would do. But here are some examples: Amanda Quirk (Columbia) has great data on M33 stars that maybe we could use to build images of the orbital toruses using technology that Price-Whelan and I developed over the last few years? Marc Huertas-Company (Paris) said (confidently?) that many of the star-forming galaxies found by JWST at very high redshift are likely prolate. Michael Higgins (CUNY) and Keaton Bell (CUNY) have a beautiful system to separate sources of variability out in NASA TESS data using structure in frequency space. Kate Storey-Fisher (NYU) showed results from Giulio Fabbian cross-correlating her ESA Gaia quasar sample with the ESA Planck lensing map, with better error bars than any previous survey! Ben Cassese (Columbia) showed a moving-object pipeline with NASA TESS imaging that detects outer Solar System objects, much like old work by Dustin Lang and myself.


doing cosmology differently

Today Chirag Modi (Flatiron) gave a really great lunchtime talk about new technologies in cosmology and inference or measurement of cosmological parameters. He beautifully summarized how cosmology is done now (or traditionally): Make summary statistics of the observables, make a theory of the summary statistics, make up a surrogate likelihood function for use in inference, measure covariance matrices to use in the latter, and go. He's trying to obviate all of these things by using the simulations directly to make the measurements. He has nice results in forward modeling of the galaxy field, and in simulation-based inferences. Many interesting things came up in his talk, including the idea that I have discussed over the years with Kate Storey-Fisher (NYU) of enumerating all possible cosmological statistics! So much interesting stuff in the future of large-scale structure.


defining passive and active symmetries

What is a passive symmetry, and what is an active symmetry? I think I know: A passive symmetry is a symmetry that emerges because there are choices (like coordinate system, units system, gauge choice) in the representation of the data. An active symmetry is a symmetry that is observed to be there (like energy conservation). The passive symmetries are true by definition or by construction. The active symmetries are subject to empirical test. Today Soledad Villar and I spent time talking about a truly formal definition in terms of commutative diagrams.
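For the record, the target of the formalization is the usual equivariance square (my sketch and my notation, not the paper's final definition):

```latex
% equivariance as a commutative diagram: applying f and then acting with g
% gives the same result as acting with g and then applying f
\[
\begin{array}{ccc}
X & \xrightarrow{\;f\;} & Y \\[2pt]
\big\downarrow{\scriptstyle \rho_X(g)} & & \big\downarrow{\scriptstyle \rho_Y(g)} \\[2pt]
X & \xrightarrow{\;f\;} & Y
\end{array}
\qquad\Longleftrightarrow\qquad
f(\rho_X(g)\,x) = \rho_Y(g)\,f(x) \quad \forall g \in G .
\]
```

The diagram is the same for both kinds of symmetry; the difference we are after is in the interpretation of g: for a passive symmetry, rho_X(g) relabels different representations of the same data, whereas for an active symmetry, g is a physical transformation of the world.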


publication and collaboration policies

I spent some time in travel working on ideas for the Terra Hunting Experiment's publication, collaboration, and data-release policies. Megan Bedell (Flatiron) and I are not doing this in any official capacity; we are just brainstorming things that might be a good idea. One theme of our comments is that we want to make sure that the rules very strongly incentivize participation in the project by postdocs and students, who often don't have long enough time horizons to be at one institution for the full scientific arc of a project in this space. Another theme of our ideas is transparency: The Sloan Digital Sky Survey rules do a lot with transparency, and it works well there. When things are transparent to all, you often need fewer rules, because transparency leads to constructive, inclusive discussions.


is it possible to write a conceptual ML paper?

With Schölkopf (MPI-IS) and Villar (JHU) and others I am trying to write a conceptual paper about the structure of machine-learning methods. Physicists love conceptual papers! But the ML literature is all about performance of implemented methods. That makes it hard to write a conceptual paper. Referees expect to see performance that beats SOTA on some problem (at least a toy problem). I'm struggling.