Today Soledad Villar (JHU) and I discussed the possibility of building something akin to a graph neural network, but one that takes advantage of the n log(n) scaling of a fast-multipole-method hierarchical summary graph. The idea is to make highly connected or fully connected graph neural networks fast through the same trick that makes the FMM work: Nearby points in the graph talk precisely, but distant parts talk through summaries in a hierarchical set of summary boxels. We think there is a chance this might work, in the context of the work we are doing with Weichi Yao (NYU) on gauge-invariant graph neural networks. The gauge invariance is such a strict symmetry that it might permit transmitting information from distant parts of the graph through summaries, while still preserving full (or great) generality. We have yet to figure it all out, but we spent a lot of time drawing boxes on the board.
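The FMM trick described above can be sketched in one dimension: talk precisely to nearby points, and to everything else only through cell-level summaries (total mass at the center of mass). Everything here (the 1/r interaction, the single-level grid, all the numbers) is invented for illustration, not anything we have actually built:

```python
import numpy as np

rng = np.random.default_rng(2)

# n unit-mass points on a line; compare the exact O(n^2) interaction sum
# against a single-level far-field summary (the simplest FMM-flavored trick).
n = 2000
x = np.sort(rng.uniform(0.0, 1.0, n))
m = np.ones(n)

# Exact potentials phi_i = sum_j m_j / |x_i - x_j| (excluding self).
dx = x[:, None] - x[None, :]
np.fill_diagonal(dx, np.inf)
phi_exact = (m / np.abs(dx)).sum(axis=1)

# Approximate: talk precisely to your own cell and its neighbors,
# and to every other cell only through its total mass and center of mass.
ncell = 40
cell = np.minimum((x * ncell).astype(int), ncell - 1)
M = np.bincount(cell, weights=m, minlength=ncell)                          # cell masses
X = np.bincount(cell, weights=m * x, minlength=ncell) / np.maximum(M, 1)   # centers of mass

idx = np.arange(n)
phi_approx = np.zeros(n)
for i in range(n):
    near = (np.abs(cell - cell[i]) <= 1) & (idx != i)    # near points, treated exactly
    phi_approx[i] = (m[near] / np.abs(x[i] - x[near])).sum()
    far = np.abs(np.arange(ncell) - cell[i]) > 1         # far cells, through summaries
    phi_approx[i] += (M[far] / np.abs(x[i] - X[far])).sum()

rel_err = np.abs(phi_approx - phi_exact) / phi_exact
print(rel_err.max())  # small: the far field really is well served by summaries
```

A real FMM makes the summaries hierarchical (coarser cells at larger distances), which is where the n log(n) comes from; this single-level version only shows the near/far split.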
I wrote ten days ago about a bimodality in the lower main sequence that Hans-Walter Rix (MPIA) found a few weeks ago. I sent it to some luminaries, and my very old friend John Gizis (Delaware) wrote back saying that it might be an issue with the ESA Gaia photometry. I argued back at him: Why would you not trust the Gaia photometry, the world's premier data on stars? He agreed, and we explored issues of stellar variability, spectroscopy, and kinematics. But then, a few days later, Gizis pointed me at figure 29 in this paper. It looks like we just rediscovered a known data issue. Brutal! But kudos to Gizis for his great intuition.
I showed the Astronomical Data Group meeting the bifurcation in the lower main sequence that Hans-Walter Rix (MPIA) found a few weeks ago. Many of the suggestions from the crew were around looking at photometric variability: Does one population show different rotation or cloud cover (or the like) than the other?
In large-scale-structure projects, when galaxy (or other tracer) clustering is measured in real space, the computation involves spatial positions of the tracers, and spatial positions of a large set of random points, distributed uniformly (within the window function). These latter points can be thought of as a comparison population. However, it is equally true that they can be thought of as performing some simple integrals by Monte Carlo method. If you see them that way—as a tool for integrating—it becomes obvious that there must be far better and far faster ways to do this! After all, non-adaptive Monte Carlo methods are far inferior to even stupidly adaptive schemes. I discussed all this with Kate Storey-Fisher (NYU) yesterday and today.
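As a toy version of the randoms-as-integrators point: plain uniform Monte Carlo versus the dumbest possible adaptive scheme (one random point per grid cell, i.e. stratification), on a made-up smooth integrand. Nothing here is the actual clustering estimator; it just shows how much non-adaptive Monte Carlo leaves on the table:

```python
import numpy as np

rng = np.random.default_rng(1)

# A smooth integrand over the unit square, standing in for a window-function integral.
f = lambda x, y: np.exp(-5.0 * ((x - 0.5) ** 2 + (y - 0.5) ** 2))

# Reference value from a fine midpoint grid (stands in for the exact answer).
g = (np.arange(2000) + 0.5) / 2000
gx, gy = np.meshgrid(g, g)
exact = f(gx, gy).mean()

m = 100           # stratification grid; N = m*m points either way
N = m * m
ii, jj = np.meshgrid(np.arange(m), np.arange(m))

err_mc, err_strat = [], []
for _ in range(100):
    # Plain uniform Monte Carlo: the "random catalog" approach.
    x, y = rng.uniform(size=(2, N))
    err_mc.append(f(x, y).mean() - exact)
    # Stupidly adaptive alternative: one random point per cell of an m-by-m grid.
    xs = (ii + rng.uniform(size=ii.shape)) / m
    ys = (jj + rng.uniform(size=jj.shape)) / m
    err_strat.append(f(xs, ys).mean() - exact)

rms_mc = np.sqrt(np.mean(np.square(err_mc)))
rms_strat = np.sqrt(np.mean(np.square(err_strat)))
print(rms_mc, rms_strat)  # stratification wins by a large factor at the same N
```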
I wrote like mad in the paper that describes what Adrian Price-Whelan (Flatiron) and I are currently doing to estimate stellar distances using SDSS-IV APOGEE spectra (plus photometry). I wrote a long list of assumptions, with names. As my loyal reader knows, my position is that if you get the assumptions written down with enough specificity, the method you are doing becomes the only thing you can do. Or else maybe you should re-think that method?
Adrian Price-Whelan (Flatiron) and I are working on data-driven distances for stars in the SDSS-IV APOGEE data. There are many hyper-parameters of our method, including the number K of leave-one-Kth-out splits of the data, the regularization amplitude we apply to the spectral part of the model (it's a generalized linear model), and the infamous Gaia parallax zero-point. These are just three of many, but they span an interesting range. One is purely operational, one restricts the fit (introduces bias, deliberately), and one has a true value that is unknown. How to optimize for each of these? It will be different in each case, I expect.
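For the first two of those hyper-parameters, the basic machinery might look like this toy: leave-one-Kth-out splits used to score a grid of regularization amplitudes for a linear model. The data, sizes, and amplitude grid are all invented for illustration; this is not our actual code:

```python
import numpy as np

rng = np.random.default_rng(8)

# Toy stand-in for the problem: linear model y = X b + noise, with many
# more features than are really needed, so regularization can help.
n, p = 300, 100
X = rng.normal(size=(n, p))
b_true = np.zeros(p)
b_true[:5] = 1.0
y = X @ b_true + 0.5 * rng.normal(size=n)

K = 8                                  # leave-one-Kth-out
group = rng.permutation(n) % K         # assign each object to one of K folds

def cv_mse(lam):
    """Mean held-out squared error of ridge regression at regularization lam."""
    err = 0.0
    for k in range(K):
        test = group == k
        train = ~test
        A = X[train].T @ X[train] + lam * np.eye(p)
        b = np.linalg.solve(A, X[train].T @ y[train])
        err += np.sum((y[test] - X[test] @ b) ** 2)
    return err / n

lams = 10.0 ** np.arange(-3, 4)
scores = [cv_mse(lam) for lam in lams]
best = lams[int(np.argmin(scores))]
print(best, min(scores))
```

The operational hyper-parameter K just has to be big enough; the regularization amplitude gets chosen by held-out score; the parallax zero-point is different again, because it has a true (if unknown) value.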
I did some actual, real-live sciencing this weekend, which was a pleasure. I plotted a part of the lower-main sequence in ESA Gaia data where Hans-Walter Rix (MPIA) has found a bimodality that isn't previously known (as far as we can tell). I looked at whether the two different kinds of stars (on each side of the bimodality) are kinematically different and it doesn't seem like it. I sent the plots to some experts to ask for advice about interpretation; this is out of scope for both Rix and me!
Gaby Contardo (Flatiron) showed me an amazingly periodic star from the NASA Kepler data a few days ago, and today she showed me the results of trying to predict points in the light curve from prior points in the light curve (like in a recurrent method). When the star is very close to periodic, and when the stretch of light curve used to predict a new data point is comparable in length to the period or longer, then even linear regression does a great job! This all relates to auto-regressive processes.
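A minimal demonstration of that point, with a made-up strictly periodic signal (not the actual Kepler star): predict each point from the previous p points by plain least squares, with the window p chosen a bit longer than the period:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy periodic "light curve": a fundamental plus a harmonic, plus small noise.
t = np.arange(2000)
period = 50.0
y = np.sin(2 * np.pi * t / period) + 0.3 * np.sin(6 * np.pi * t / period)
y = y + 0.01 * rng.normal(size=t.size)

# Autoregressive design matrix: predict y[i] from the previous p points.
p = 60  # window slightly longer than the period
X = np.stack([y[i - p:i] for i in range(p, y.size)])
target = y[p:]

# Plain linear (least-squares) regression, no fancy machinery.
n_train = 1500
coef, *_ = np.linalg.lstsq(X[:n_train], target[:n_train], rcond=None)
pred = X[n_train:] @ coef

rms = np.sqrt(np.mean((pred - target[n_train:]) ** 2))
print(rms)  # down at the noise level, i.e. near-perfect prediction
```

The reason it works so well is that a sum of sinusoids exactly satisfies a short linear recurrence, which the regression can discover; that is the auto-regressive-process connection.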
After a couple of days of hacking and data munging—and looking into the internals of Jax—Adrian Price-Whelan and I produced stellar distance estimates today for a few thousand APOGEE spectra. Our method is based on this paper on linear models for distance estimation with some modifications inspired by this paper on regression. It was gratifying! Now we have hyper-parameters to set and validation to do.
Today Katie Breivik (Flatiron) asked me some technical questions about the bolometric correction. It's related to the difference between a relative magnitude in a bandpass and the relative magnitude you would get if you were using a very (infinitely) broad-band bolometer. Relative magnitudes are good things (AB magnitudes, in contrast, are bad things, but that's for another post): They are relative fluxes between the target and a standard (usually Vega). If your target is hotter than Vega, and you choose a very blue bandpass, the bandpass magnitude of the star will be smaller (relatively brighter) than the bolometric magnitude. If you choose a very red bandpass, the bandpass magnitude will be larger (relatively fainter) than the bolometric magnitude. That's all very confusing.
And bolometric is a horrible concept, since most contemporary detectors are photon-counting and not bolometric (and yes, that matters: the infinitely-wide filter on a photon-counting device gives a different relative magnitude than the infinitely-wide filter on a bolometer). I referred Breivik to this horrifying paper for unpleasant details.
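The claims in the previous two paragraphs can be checked numerically with blackbodies, under artificial assumptions (equal radii and distances, a Vega-like 9600 K standard, narrow bands at made-up wavelengths); the numbers are purely illustrative:

```python
import numpy as np

h, c, kB = 6.626e-34, 2.998e8, 1.381e-23  # SI
T_star, T_vega = 15000.0, 9600.0          # hot target; Vega-like standard

lam = np.linspace(50e-9, 50e-6, 400_000)  # 50 nm to 50 um
dlam = lam[1] - lam[0]

def B(lam, T):
    """Planck spectral radiance B_lambda."""
    return (2 * h * c**2 / lam**5) / np.expm1(h * c / (lam * kB * T))

def mag(num, den):
    """Relative magnitude of the target with respect to the standard."""
    return -2.5 * np.log10(num / den)

# Bolometric: all the energy. For blackbodies this is -10 log10(T_star/T_vega).
m_bol = mag(B(lam, T_star).sum() * dlam, B(lam, T_vega).sum() * dlam)

# Narrow bands: monochromatic ratios at 200 nm (very blue) and 2 um (very red).
m_blue = mag(B(200e-9, T_star), B(200e-9, T_vega))
m_red = mag(B(2e-6, T_star), B(2e-6, T_vega))

# "Bolometric" on a photon-counting device: weight the energy by lam/(h c),
# i.e. count photons. For blackbodies the photon rate goes as T^3, not T^4.
m_phot = mag((B(lam, T_star) * lam).sum(), (B(lam, T_vega) * lam).sum())

print(m_blue, m_bol, m_phot, m_red)  # from brightest (most negative) to faintest
```

The blue-band magnitude comes out brighter than bolometric, the red-band fainter, and the photon-counting "bolometric" magnitude differs from the bolometer one, exactly as claimed.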
Adrian Price-Whelan (Flatiron) and I used the new FINUFFT non-uniform fast Fourier transform code to build a low-pass filter for stellar spectra today. The idea is: There can't be any spectral information in the data at spectral resolutions higher than the spectrograph resolution. So we can low-pass filter in the log-wavelength domain and that should enforce finite spectral resolution. The context is: Making features to use in a regression or other machine-learning method. I don't know, but I think this is a rare thing: A low-pass filter that doesn't require uniform or equally spaced sampling in the x direction or time domain.
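Here is the idea in miniature, with the NUFFT machinery replaced by a plain least-squares projection onto low-frequency Fourier modes (one way to think about what such a filter does on uneven sampling). The sampling, frequencies, and noise are all invented:

```python
import numpy as np

rng = np.random.default_rng(0)

# Irregularly sampled "spectrum" in x = log(wavelength).
x = np.sort(rng.uniform(0.0, 1.0, 2000))
clean = np.sin(2 * np.pi * 3 * x)                       # the real, low-frequency signal
y = clean + 0.05 * rng.normal(size=x.size) \
    + np.sin(2 * np.pi * 80 * x)                        # junk above the "resolution"

# Low-pass by least-squares projection onto Fourier modes with k <= k_max.
k_max = 10
ks = np.arange(1, k_max + 1)
A = np.hstack([np.ones((x.size, 1)),
               np.cos(2 * np.pi * np.outer(x, ks)),
               np.sin(2 * np.pi * np.outer(x, ks))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
y_smooth = A @ coef   # the filtered spectrum, on the original (uneven) grid

print(np.sqrt(np.mean((y_smooth - clean) ** 2)))  # the high-frequency junk is gone
```

Nothing here requires the x values to be evenly spaced; the NUFFT version is the same projection done fast.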
Soledad Villar (JHU) and I spent some time today constructing (on paper) a model to learn simultaneously from real and simulated data, even when the simulations have large systematic problems. The idea is to model the joint distribution of the real data, the simulated data, and the parameters of the simulated data. Then, using that model, infer the parameters that are most appropriate for each real data point. The problem setup has two modes. In one (which applies to, say, the APOGEE stellar spectra), there is a best-fit simulation for each data example. In the other, there is an observed data set (say, a cosmological large-scale structure survey) and many simulations that are relevant, but don't directly correspond one-to-one. We are hoping we have a plan for either case. One nice thing is: If this works, we will have a model not just for APOGEE stellar parameter estimation, but also for the missing physics in the stellar atmosphere simulations!
Gaby Contardo (Flatiron) and I have been trying to construct a project around light curves, time domain, prediction, feature extraction, and the arrow of time, for months now. Today we decided to look closely at a catalog of stellar flares (which are definitely time-asymmetric) prepared by Jim Davenport (UW). Can we make a compact or sparse representation? Do they cluster? Do those properties have relationships with stellar rotation phase or other context?
One of my jobs at NYU is as an advisor to student screenwriters who are writing movies that involve science and technology. I didn't get much research done today, but I had a really interesting and engaging conversation with film-writers Yuan Yuan (NYU) and Sharon Lee (NYU) who are writing a film that involves the Beijing observatory, the LAMOST project, and the Cultural Revolution. I learned a lot in this call!
Imagine you have $n$ measurements of a quantity $y$. What is your best estimate of the value of $y$? It turns out that if you have an estimate for the covariance matrix of $y$, the information in (expected inverse variance from) your $n$ data points is given by the sum of the entries of the inverse of that covariance matrix. This fact is obvious in retrospect, but also confused me, since this is such a non-coordinate-free thing to do to a matrix!
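A quick numerical check of that fact: the generalized-least-squares estimate of a common mean from $n$ correlated measurements has inverse variance $1^\top C^{-1} 1$, which is exactly the sum of all entries of $C^{-1}$. The covariance here is random, just for the check:

```python
import numpy as np

rng = np.random.default_rng(17)

# A random symmetric positive-definite covariance for n = 4 measurements.
n = 4
A = rng.normal(size=(n, n))
C = A @ A.T + n * np.eye(n)

Cinv = np.linalg.inv(C)
ones = np.ones(n)

# GLS estimator of the common mean mu:
#   mu_hat = (1^T C^-1 y) / (1^T C^-1 1),  Var(mu_hat) = 1 / (1^T C^-1 1).
info = ones @ Cinv @ ones       # = the sum of all entries of C^-1
w = Cinv @ ones / info          # estimator weights; they sum to 1

# Monte Carlo check: empirical variance of mu_hat matches 1 / info.
mu = 3.0
L = np.linalg.cholesky(C)
ys = mu + (L @ rng.normal(size=(n, 100_000))).T   # draws ~ N(mu*1, C)
mu_hats = ys @ w
print(np.sum(Cinv), 1.0 / info, mu_hats.var())
```

And yes, summing all the entries of a matrix is a very basis-dependent operation; the resolution is that the vector of ones (the "same quantity measured n times" direction) picks out that basis.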
Lauren Anderson (Carnegie) and I had a wide-ranging conversation today. But part of it was about the dust map: We have a project with statisticians to deliver a three-dimensional dust map, using a very large Gaussian-process model. Right now the interesting parts of the project are around model checking and model elaboration: How do you take a model and decide what's wrong with it, in detail. Meaning: Not compare it to other models (that's a solved problem, in principle), but rather, compare it to the data and see where it would benefit from improvement.
One key idea for model elaboration is to check the parts of the model you care about and see if those aspects are working well. David Blei (Columbia) told us to climb a mountain and think on this matter, so we did, today. We decided that our most important goals are (1) to deliver accurate extinction values to stellar targets, for our users, and (2) to find interesting dust structures (like spiral arms) if they are there in the data.
Now the challenge is to convert these considerations into posterior predictive checks that are informative about model assumptions. The challenge is that, in a real-data Bayesian inference, you don't know the truth! You just have your data and your model.
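For concreteness, here is the skeleton of a posterior predictive check in a deliberately trivial setting: fake heavy-tailed data, a Gaussian model that is wrong for it, and an extreme-value test statistic. None of this is the dust-map model; it only shows the mechanics of checking a model against its own data:

```python
import numpy as np

rng = np.random.default_rng(5)

# Fake "data" with heavier tails than the model assumes.
y = rng.standard_t(df=2, size=200)

# Model: y_i ~ N(mu, 1) with a flat prior on mu, so posterior mu ~ N(ybar, 1/n).
n = y.size
post_mu = rng.normal(y.mean(), 1.0 / np.sqrt(n), size=4000)

# Posterior predictive replications of the full data set, one per posterior draw.
y_rep = rng.normal(post_mu[:, None], 1.0, size=(4000, n))

# Check a statistic we care about: the most extreme point.
T = np.max(np.abs(y))
T_rep = np.max(np.abs(y_rep), axis=1)
p_value = np.mean(T_rep >= T)   # near 0 or 1 flags a failure of the model
print(T, p_value)
```

The point of choosing the test statistic deliberately (extinctions to targets; coherent dust structures) is exactly the mountain-top decision above: check the aspects of the model you actually care about.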
I really solidly did actual coding today on a real research problem, which I have been working on with Megan Bedell (Flatiron) for a few years now. The context is: extreme precision radial-velocity surveys. The question is: Is there any advantage to taking one observation every night relative to taking K observations every K nights? I succeeded!
I can now show that the correlations induced in adjacent observations by asteroseismic p-modes make it advantageous to do K observations every K nights. Why? Because you can better infer the center-of-mass motion of the star with multiple, coherently p-mode-shifted observations. The argument is a bit subtle, but it will have implications for Terra Hunting and EXPRES and other projects that are looking for long-period planets.
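A toy calculation along those lines (the kernel form, timescales, and amplitudes are invented stand-ins, not Terra Hunting numbers): compare the information about a night's mean velocity from three back-to-back exposures, whose p-mode noise is strongly correlated, against three exposures far enough apart that the p-mode noise is independent:

```python
import numpy as np

# Made-up solar-ish numbers: p-mode period ~5.4 min, mode coherence ~15 min,
# 1 m/s p-mode rms, 0.3 m/s white noise per exposure.
P, tau = 5.4, 15.0           # minutes
sig_p, sig_w = 1.0, 0.3      # m/s

def pmode_cov(times):
    """Covariance of the RV noise: oscillatory p-mode kernel plus white noise."""
    dt = times[:, None] - times[None, :]
    K = sig_p**2 * np.cos(2 * np.pi * dt / P) * np.exp(-dt**2 / (2 * tau**2))
    return K + sig_w**2 * np.eye(len(times))

def info_on_mean(C):
    """Inverse variance of the GLS estimate of the mean velocity: 1^T C^-1 1."""
    ones = np.ones(len(C))
    return ones @ np.linalg.solve(C, ones)

# Three back-to-back exposures, spaced at half the p-mode period, so the
# p-mode shifts are coherent (strongly anti-correlated) between exposures:
t_corr = np.array([0.0, P / 2, P])
info_corr = info_on_mean(pmode_cov(t_corr))

# Three exposures so far apart that the p-mode noise is independent:
info_indep = 3.0 / (sig_p**2 + sig_w**2)

print(info_corr, info_indep)  # the coherent triple carries far more information
```

The correlations are what make the gain possible: a GLS combination of coherently shifted exposures can nearly cancel the p-mode contribution, which independent exposures can never do.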
I had a conversation with Jacob Bean (Chicago) and Ben Montet (UNSW) about various radial-velocity projects we have going. We spent some time talking about which projects are better for telescopes of different apertures, and whether there is any chance the EPRV community could be induced to work together. I suggested that the creation of a big software effort in EPRV could bring people together, and help all projects. We also talked about data-analysis challenges for different kinds of spectrographs. One project we are going to do is get a gas-cell component added into the wobble model. I volunteered Matt Daunt (NYU) in his absence.
I had a call with part of the HARPS3 team today, the sub-part working on observations of the Sun. Yes, Sun. That got us arguing about asteroseismic modes and me claiming that there are better approaches for ameliorating p-mode noise in extreme precision radial-velocity measurements than setting your exposure times carefully to null the modes. The crew asked me to get specific, so I had a call with Bedell (Flatiron) later in the day to work out what we need to assemble. The issues are about correlated noise: Asteroseismic noise is correlated; those correlations can be exploited for good, or ignored for bad. That's the argument I have to clearly make.