Today I helped Abby Shaum (NYU) finish her poster for the #AAS235 Meeting on using signal processing—and in particular phase demodulation—to find binary companions to stars with coherent asteroseismic modes in the NASA Kepler Mission data. Our abstract ended up being different from what we submitted! I hope no-one minds. But she has a beautiful result, and a pretty mechanistic method to search for binaries. Now our next thing is to figure out how to generalize the method to modes that aren't coherent over the full mission lifetime. If we can do that, there is a lot more we can find, I suspect.
My only real research today is that I got a bit more writing done for Adrian Price-Whelan's paper on binary stars in APOGEE DR16. I worked at the top of the paper (introduction) and the bottom (discussion), my two favorite places!
Adrian Price-Whelan (Flatiron) has a draft of a paper I'm pretty excited about: It is running The Joker, our custom sampler for the Kepler two-body problem, on every star (that meets some criteria) in the APOGEE DR16 sample. This method treats binary-companion discovery and binary-companion orbit characterization as the same thing (in the spirit of things I have done in exoplanet discovery and characterization). We confidently find and characterize 20,000 binaries across the H–R diagram, with the binary fraction varying with temperature, logg, and luminosity. And we release samplings for everything, even the things that don't obviously have companions. Well, it doesn't count as research if I'm merely excited about a paper! But I did do a tiny bit of writing in the paper, making it my first research day in a while. Yesterday I tweeted this figure from the manuscript.
My former student Alex Malz (Bochum) has been back in town to finish one of the principal papers from his thesis, the CHIPPR method for combining probabilistic outputs from cosmology surveys. Our first target was redshift probability distributions: Many surveys are producing not just photometric redshift estimates, but full probability distributions over redshift. How to use these? Many ways people naively use them (like stack them, or bin them) are incorrect, probabilistically. If you want a redshift distribution, you need to perform a hierarchical inference using them. This, in turn, requires that you can convert them into likelihoods. That's not trivial for many reasons. Our first paper on this is somewhat depressing, because of the requirements both on the redshift producer, and on the redshift user. However, it's correct! And performs better than any of the unjustified things people do (duh). I signed off on the paper today; Malz will submit during the break.
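The stacking-versus-hierarchical point can be made with a toy model. This is not the CHIPPR implementation; it assumes gridded likelihoods, a flat interim prior, and invented numbers, and it uses a simple EM-style update for the population density:

```python
import numpy as np

# Toy hierarchical inference with per-object redshift PDFs, compared
# to naive stacking. Flat interim prior assumed; numbers invented.
rng = np.random.default_rng(42)
N, sigma = 2000, 0.05
z_true = rng.normal(0.5, 0.03, N)           # narrow true population
z_obs = z_true + rng.normal(0.0, sigma, N)  # noisy "photo-z" estimates

grid = np.linspace(0.0, 1.0, 101)
dz = grid[1] - grid[0]
L = np.exp(-0.5 * ((grid[None, :] - z_obs[:, None]) / sigma) ** 2)

# Naive stacking of the normalized interim posteriors:
post = L / (L.sum(axis=1, keepdims=True) * dz)
stacked = post.mean(axis=0)

# Hierarchical inference: EM updates for the population density f(z).
f = np.full_like(grid, 1.0)
f /= f.sum() * dz
for _ in range(200):
    w = f[None, :] * L
    w /= w.sum(axis=1, keepdims=True)       # per-object responsibilities
    f = w.mean(axis=0) / dz
```

The stacked estimate has roughly the population variance plus the noise variance; the hierarchical estimate deconvolves the noise and comes out sharper.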
Last night or this morning (not sure which) Lily Zhao (Yale) made some nice discoveries about the EXPRES instrument: She found that the housekeeping data (various temperature sensors, chilled water load, etc) correlate really well with the instrument calibration state as defined by our hierarchical (principal-component) model. That's interesting, because it opens up the possibility that we could interpolate the non-parametric calibration parameters in the housekeeping space rather than the time space. That would be cool. Indeed, the time is just a component of the housekeeping data!
In the afternoon, Kathryn Johnston (Columbia) hosted a group of people for the L2G2—the local Local Group group—for talks and discussions. There were many good discussions, led by Jason Hunt (Flatiron). Highlights for me included, first, work on detailed abundances with The Cannon by Adam Wheeler (Columbia), who did a great job of describing (and re-implementing) the method, and extending it to do better things. Another was a talk on SAGA by Marla Geha, who showed that the Local Group satellites might be group satellites rather than galaxy satellites. This is in the sense that they look more like the satellites of a more massive object than satellites of Milky Way or M31 analogs. It was a great day of great discussions.
At Stars & Exoplanets Meeting today, Debra Fischer (Yale) and Ryan Petersburg (Yale) came in to tell us about the EXPRES instrument for extreme-precision radial velocity and the Hundred Earths long-term observing project. Fischer talked about how they do their RV analysis in spectral chunks, and the issue (and maybe the issue of EPRV in general) is how to combine those chunks, such that the most informative chunks get the most weight but such that the outliers get captured. In the discussion we also discussed micro-tellurics, which might need to be modeled or masked. Petersburg discussed the question of how you define signal-to-noise ratio for a spectrum. It's not a well-posed problem! But it matters, because the contemporary instruments expose until they hit some SNR threshold.
I checked in with Megan Bedell (Flatiron) on our projects today. She showed really nice results in which she fits simulated radial-velocity data for a star that is oscillating in finite-coherence asteroseismic modes. My loyal reader knows that we have been working on this! But the cool thing is that she can now fit the oscillations with a Gaussian Process with a kernel that is roughly correct, or exactly correct, even when the observations are integrated over finite exposure times. That's a breakthrough. It depends in large part on the magic of exoplanet.
Now GPs are extremely flexible, so the question is: How to validate results? After all, any GP can thread through any set of points. We came up with two schemes. The first is an N-fold cross-validation, in which we train the GP on all but 1/N-th of the data and then predict that 1/N-th, and cycle to get everything. First experiments along these lines seem to show that the more correct the kernel, the better we predict! The second is that we make fake data that includes the p-modes and a simulated planet companion. We show that our planet-companion inferences become more accurate as our kernel becomes more accurate.
We're hoping to improve on the results of this paper on p-mode mitigation. My conjecture is that when we use an accurate GP kernel, we will get exoplanet inferences at any exposure time that are better than one gets using the quasi-optimal exposure times and a “jitter” term to account for the residual p-mode noise.
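The first validation scheme, N-fold cross-validation with more- and less-correct kernels, can be sketched in plain numpy. This is a toy stand-in (a quasi-periodic kernel for the p-modes, invented parameters), not Bedell's actual exoplanet-based setup:

```python
import numpy as np

# Simulate data from a quasi-periodic GP (rough stand-in for p-mode
# variability; all numbers invented), then compare held-out predictions
# from a kernel matching the truth and a deliberately wrong one.
rng = np.random.default_rng(1)

def k_qp(t1, t2, amp=1.0, per=0.2, ell=1.0):
    dt = t1[:, None] - t2[None, :]
    return amp * np.exp(-0.5 * (dt / ell) ** 2) * np.cos(2.0 * np.pi * dt / per)

def k_rbf(t1, t2, amp=1.0, ell=1.0):
    dt = t1[:, None] - t2[None, :]
    return amp * np.exp(-0.5 * (dt / ell) ** 2)

t = np.sort(rng.uniform(0.0, 10.0, 200))
sig = 0.1
K = k_qp(t, t) + sig ** 2 * np.eye(t.size)
y = rng.multivariate_normal(np.zeros(t.size), K)

def cv_mse(kernel, t, y, nfold=5):
    """Train on all but one fold; predict that fold; cycle through."""
    idx = np.arange(t.size)
    errs = []
    for fold in range(nfold):
        test = idx % nfold == fold
        train = ~test
        Ktt = kernel(t[train], t[train]) + sig ** 2 * np.eye(train.sum())
        Kst = kernel(t[test], t[train])
        mu = Kst @ np.linalg.solve(Ktt, y[train])   # GP predictive mean
        errs.append(np.mean((y[test] - mu) ** 2))
    return np.mean(errs)

mse_right = cv_mse(k_qp, t, y)   # kernel matching the generating process
mse_wrong = cv_mf = cv_mse(k_rbf, t, y)  # smooth kernel, ignores the oscillation
```

The point of the experiment is that `mse_right` comes out smaller than `mse_wrong`: the more correct kernel predicts held-out points better, even though both GPs fit the training points.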
It's finals week, so not much research. But I did get in a useful conversation with Soledad Villar (NYU) about the definition of a tensor (as opposed to a matrix, say). She had a different starting point than me! But we converged to similar things. You can think of it in coordinate-free or coordinate-transform ways, you can think of it in operator terms, and you can think of it as being composed of sums of outer products of vectors. I always thought of a tensor as being a ratio of vectors, in some sense, but that's a very hand-wavey idea.
Today was the final day of Machine Learning Tools for Research in Astronomy. I gave my talk, which was about causal structure. One thing I talked about is the strong differences (stronger than you might think) between generative and discriminative directions for machine learning. Another thing I talked about is the way that machine-learning methods can be used to denoise, deconvolve, and separate signals when they are designed with good causal structure.
Right after me, Timmy Gebhard (MPI-IS and ETH) gave an absolutely excellent talk about half-sibling regression ideas related to instrument calibration (think Kepler, TESS, and direct imaging). He beautifully explained exoplanet direct imaging and showed how his improvements to how they are using the data change the results. He doesn't have the killer app yet, but he is spending the time to think about the problem deeply. And if he switched from one-band direct imaging to imaging spectroscopy (which is the future!) I think his methods will kill other methods. He also spoke about the causal-inference philosophy behind his methods really well.
My talk slides are here and I also led the meeting summary discussion. My summary slides are here. The summary discussion was valuable. In general, the size and style of the meeting—and location in lovely Ringberg Castle—led to a great environment and culture at the meeting. Plus with great social engineering by Ntampaka, Nord, Pillepich, and Peek. The latter made progress on a community-driven set of Ringberg Recommendations, which might end up as a long-term outcome of the meeting.
Today was the fourth day of Machine Learning Tools for Research in Astronomy. Some of my personal highlights were the following:
Tomasz Kacprzak (ETH) showed that he can improve the precision of cosmological inferences by using machine learning to develop new statistics of cosmological surveys to compare to simulations. His technology is nearly translationally invariant, but not guaranteed to be perfect, and not guaranteed to be rotationally symmetric (or rotationally and translationally covariant, I should say). So I wondered if any of the increased precision he showed might be coming from properties of the data that are not consistent with our symmetries. That is, precision might increase even if the features being used are not appropriate, given our assumptions. I'd love to have a playground for thinking about this more. It relates to ideas of over-fitting and adversaries we discussed earlier in the week.
Luisa Lucie-Smith (UCL) showed some related things, but more along the lines of finding interpretable latent variables that bridge the connection between the cosmological initial conditions and the dark-matter halos that are produced. I love that kind of approach! Can we use machine learning to understand the systems better? Her talk led to some controversy about how autoencoders (and the like) could be structured for interpretability. As my loyal reader knows, I don't love the “halo” description of cosmology; this could either elucidate it or injure it.
Doug Finkbeiner (Harvard) showed how he can patch and find outliers in massive spectroscopy data sets using things that aren't even machine learning, according to many in the room! That was fun, and probably very useful. This all connects to a theme of the meeting so far, which is using machine learning to aid in visualization and data discovery.
In between sessions we had a great conversation about student mentoring. This is a great idea at a workshop, where there are both students and mentors, and the participants have gotten to know one another. Related to this, Brian Nord (Fermilab) gave a nice talk about relationships between what we have been thinking about in machine learning and work in the area of science, technology, and society. He's trying to build new scientific communities, but in a research-based way. And I mean social-science-research-based. That's radical, and more likely to succeed than many of the things we physicists do without looking to our colleagues.
Today was the third day of Machine Learning Tools for Research in Astronomy. Some random personal highlights were the following:
Prashin Jethwa (Vienna) showed first results from a great project to model 2-d imaging spectroscopy of galaxies as a linear combination of different star-formation histories on different orbits. People have modeled galaxies as superpositions of orbits. And as superpositions of star-formation histories. But the problem is linear in both, so much more can be done. My only criticism (and to be fair, Jethwa made it himself) is that they collapse the spectroscopy to kinematic maps before starting. In my opinion, the most interesting information will be in the spectral–kinematic joint domain, because different lines (which are sensitive to different star-formation histories and different element abundance ratios) will have different shapes in different parts of each galaxy. Exciting that this is happening now; it has been a dream for years.
Francois Lanusse (Berkeley) and Greg Green (MPIA) both gave talks that were aligned strongly with my interests. Lanusse is taking galaxy generators (like VAEs and GANs) and adding causal structure (like projection onto the sky plane, pixelization, and noisification) so that the generators produce something closer to the true galaxies, and something not exactly the same as the data. That's exciting and a theme I have been talking about in this forum for a while. For Lanusse, galaxies are nuisances, to be marginalized out as we infer the cosmic weak-lensing shear maps.
In a completely different domain, Green is modeling stars as coming from a compact color—magnitude diagram but then being reddened and attenuated by dust. He is interested in the dust not the stars, so for him the CMD is a nuisance, as galaxies are for Lanusse. That makes it a great object to model with something ridiculous, like a neural net. He is living the dream I have of using the heavy learning machinery only for the nuisance parts of the problem, and reserving belief-consistent physical models for the important parts. Green was showing work that is only a few days old! But it looks very very promising.
Today was the second day of Machine Learning Tools for Research in Astronomy. Two personal highlights were the following:
Soledad Villar (NYU) spoke about adversarial attacks against machine-learning methods used in astrophysics. Her talk was almost entirely conceptual; she talked about what constitutes a successful attack, and how you find it. My expectation is that these attacks will be very successful, as my loyal reader knows! The examples she showed were from stellar spectroscopy. Her talk was interrupted and followed by extremely lively discussion, in which the room disagreed about what the attacks mean about a method, and whether they are revealing or important. That was some fun controversy for the meeting.
Tobias Buck (AIP) looked at methods to translate from image to image (like horse to zebra!) but in the context of two-d maps of galaxies. Like translate from photometric images into maps of star formation and kinematics. It's a promising area, although the training data are all simulations at this point. I asked him whether he could translate from a two-d galaxy image into a three-d dark-matter map. He was skeptical, because the galaxy is so much smaller than its dark-matter halo.
At one of the coffee breaks, Josh Peek (STScI) proposed that we craft some kind of manifesto or document that helps practitioners in machine learning in astronomy make good choices, which are pragmatic (because it is important that machine learning be used and tried) but also involve due diligence (to avoid the “just throw machine learning at it” problem in some of the literature). He had the idea that we have the right people at this meeting to make something like this happen. I noted that we tried to do things like that in our tutorial paper on MCMC sampling, where we try to both be pragmatic but also recommend achievable best practices. The challenge is to be encouraging and supportive, but also draw some lines in the sand.
Today was the first day of Machine Learning Tools for Research in Astronomy in Ringberg Castle in Germany. The meeting is supposed to bring together astronomers working with new methods and also applied methodologists to make some progress. There will be a different mix of scientific presentations, participant-organized discussions, and unstructured time. Highly biased, completely subjective highlights from today included the following:
Michelle Ntampaka (Harvard) showed some nice results on using machine-learning discriminative regressions to improve cosmological inferences (the first of a few talks we will have this week along these lines). She emphasized challenges and lessons learned, which was useful to the audience. Among these, she emphasized the value she found in visualizing the weights of her networks. And she gave us some sense of her struggles with learning rate schedule, which I think is probably the bane of almost every machine learner!
Tom Charnock (Paris) made an impassioned argument that the outputs of neural networks are hard to trust if you haven't propagated the uncertainties associated with the finite information you have been provided about the weights in training. That is, the weights are a point estimate, and they are used to make a point estimate. Doubly bad! He argued that variational and Bayesian generalizations of neural networks do not currently meet the criteria of full error propagation. He showed some work that does meet it, but for very small networks, where Hamiltonian Monte Carlo has a shot of sampling. His talk generated some controversy in the room, which was excellent!
Morgan Fouesneau (MPIA) showed how the ESA Gaia project is using ideas in machine learning to speed computation. Even at one minute per object, they heat up a lot of metal for a long time! He showed that when you use your data to learn a density in the data space for different classes, you can make inferences that mitigate or adjust for class-imbalance biases. That's important, and it relates to what Bovy and I did with quasar target selection for SDSS-III.
Wolfgang Kerzendorf (Michigan State) spoke about his TARDIS code, which uses machine learning to emulate a physical model and speed it up. But he's doing proper Bayes under the hood. One thing he mentioned in his talk is the “expanding photosphere method” to get supernova distances. That's a great idea; whatever happened to that?
I arrived in Heidelberg today, for a fast visit before a week at Ringberg Castle for a meeting on machine learning and astrophysics. I had two long conversations about science projects, one with Neige Frankel (MPIA), and one with Greg Green (MPIA). Frankel is trying to make comprehensive maps of the Milky Way with red-clump stars from SDSS-IV APOGEE and ESA Gaia data. She is using a big linear model to calibrate the distance variations with color, magnitude, and dust. But it seems to have problems at high reddening. We found that some of those problems were an artifact of sample cuts based on Gaia uncertainties. My loyal reader knows that such cuts are dangerous! But still some odd problems remain: Bugs or conceptual issues?
Green is trying to infer the dust extinctions to stars as a function of three-dimensional position in the galaxy with reduced or no dependence on models of the stellar color–magnitude diagram. He is using a neural network to model this nuisance. My loyal reader knows that this is my dream: To only use these complex methods on the parts of our problems that are nuisances!
Kate Storey-Fisher (NYU) and I had a lunch-time conversation about her project to replace the standard large-scale-structure correlation-function estimator with a new estimator that doesn't require binning the galaxy pairs into separation bins. It estimates continuous functions. We discussed how to present such a new idea to a community that has been using the same binned estimator since the 1990s (and the estimators used before that were only marginally different). That is, the change Storey-Fisher proposes is the biggest change to correlation-function estimation since it all started, in my (somewhat not humble) opinion.
But this creates a problem: How to convince the cosmologists that they need to learn new tricks? We have many arguments, but which one is strongest? We no longer need to bin, and binning is sinning! Or: We can capture more functional variation with fewer degrees of freedom, so we reduce simulation requirements! Or: We can restrict the function space to smooth functions, so we regularize away unphysical high-frequency components! Or: We get smaller uncertainties on the clustering at every scale! Or: We can make our continuous function components be similar to derivatives of the correlation function with respect to cosmological parameters and therefore create clustering statistics that are close to Fisher-optimal given the data and the model!
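To make the binning-is-sinning argument concrete, here is one toy, 1-d realization of the idea: project the pair counts onto smooth basis functions of separation instead of top-hat bins. With top-hat basis functions this reduces to the usual binned Landy–Szalay amplitudes. Everything here (the fake survey, the polynomial basis, the sizes) is invented for illustration, not Storey-Fisher's actual estimator:

```python
import numpy as np

# Unclustered toy "survey" on a 1-d line, plus a random catalog.
rng = np.random.default_rng(0)
n, nr, smax = 500, 2000, 10.0
data = rng.uniform(0.0, 100.0, n)
rand = rng.uniform(0.0, 100.0, nr)

def pair_seps(a, b=None):
    """All pair separations below smax (auto- or cross-pairs)."""
    if b is None:
        d = np.abs(a[:, None] - a[None, :])
        d = d[np.triu_indices(a.size, k=1)]
    else:
        d = np.abs(a[:, None] - b[None, :]).ravel()
    return d[d < smax]

def basis(s):  # low-order polynomials in s/smax; 3 components
    x = np.atleast_1d(s) / smax
    return np.vstack([x ** k for k in range(3)]).T

ndd, ndr, nrr = n * (n - 1) / 2, n * nr, nr * (nr - 1) / 2
fdd = basis(pair_seps(data)).sum(axis=0) / ndd      # projected DD "counts"
fdr = basis(pair_seps(data, rand)).sum(axis=0) / ndr
B = basis(pair_seps(rand))
frr = B.sum(axis=0) / nrr
Q = B.T @ B / nrr   # RR Gram matrix, replacing per-bin RR counts

amp = np.linalg.solve(Q, fdd - 2.0 * fdr + frr)     # LS-like amplitudes
xi = lambda s: basis(s) @ amp                       # continuous xi(s)
```

Because the data here are unclustered, the recovered continuous xi(s) hovers around zero; the payoff of the construction is that the basis can be anything smooth, including derivatives of a model correlation function with respect to cosmological parameters.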
Writing methodological papers is not easy.
Lily Zhao (Yale) and I got to the point that we can calibrate one laser-frequency-comb exposure on EXPRES with a calibration based on all the LFC exposures. We find that we predict the lines in each exposure to a mean (over all lines) of about 3 cm/s! If this holds up to cross-validation, I am happy!
In Stars & Exoplanets Meeting at Flatiron, many fun things happened. Zhao showed us temperature variations observed spectroscopically in Sun-like stars: Are these due to spots? And Marina Kounkel (WWU) showed us incredible visualizations of young stars, showing that they are very clustered in kinematics and age. She believes that they form in lines, and then the lines break up. Her interactive plots were amazing!
Today Selma de Mink (Harvard) gave a great and energizing Astrophysics Seminar at NYU. She talked about many things related to the extremely massive-star progenitors of the extremely massive black holes being observed merging by LIGO. One assumption of her talk, which is retrospectively obvious but was great, is that the vast majority of LIGO events should be first-generation mergers. A second merger is very unlikely, dynamically. But that wasn't her point: Her point was that the masses that LIGO sees will constrain how very massive stars evolve. In particular, she showed that there is a strong prediction of a mass gap: There can't be black holes formed by stellar evolution in the mass range 45 to 150 solar masses. The physics is all about pair-instability supernovae from very low-metallicity stars. But the details of this black-hole mass gap depend on some nuclear reaction rates, so she concludes that LIGO will make nucleosynthetic measurements! The LIGO data probably already do. It's a new world!
The NYU Center for Data Science has a big masters program, and as a part of that, students do capstone research projects with industry and academic partners. I am one of the latter! So Soledad Villar (NYU) and I have been advising two groups of capstone students in projects. One of these groups (Teresa Ningyuan Huang, Zach Martin, Greg Scanlon, and Eva Shuyu Wang) has been working on adversarial attacks against regressions in astronomy. The work is new in part because it brings the idea of attacks to the natural science domain, and because attacks haven't been really defined for regression contexts. Today we decided that this work is ready (enough) to publish. So we are going to try to finish and submit something for a conference deadline this week!
Based on conversations with Soledad Villar, Teresa Huang, Zach Martin, Greg Scanlon, and Eva Wang (all NYU), I worked today on establishing criteria for a successful adversarial attack against a regression in the natural sciences (like astronomy). The idea is you add a small, irrelevant amount u to your data x and it changes the labels y by an unexpectedly large amount. Or, to be more specific:
- The squared L2 norm (u.u) of the vector u should be equal to a small number Q
- The vector u should be orthogonal to your expectation v of the gradient of the function dy/dx
- The change in the inferred labels at x+u relative to x should be much larger than you would get for the same-length move in the v direction!
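The three criteria can be written down for a toy linear regressor y = w.x (a trained network would replace w); the weight vector, the "expected gradient" v, and the spurious structure are all invented for illustration:

```python
import numpy as np

# Toy setup: the model's true gradient is w, but we "expect" the
# gradient to point along a smooth direction v. All values invented.
d = 50
v = np.ones(d) / np.sqrt(d)       # expected (smooth) gradient direction
spur = np.sin(np.arange(d))       # spurious structure in the trained weights
spur = spur - (spur @ v) * v      # keep only the part orthogonal to v
w = 0.2 * v + spur                # trained weights: mostly spurious

def attack(w, v, Q):
    """Largest-effect perturbation with u.u = Q and u orthogonal to v."""
    vhat = v / np.linalg.norm(v)
    w_perp = w - (w @ vhat) * vhat     # gradient component orthogonal to v
    return np.sqrt(Q) * w_perp / np.linalg.norm(w_perp)

Q = 1e-4
u = attack(w, v, Q)
x = np.zeros(d)   # any input; the model is linear

change_u = abs(w @ (x + u) - w @ x)               # effect of the attack
change_v = abs(w @ (x + np.sqrt(Q) * v) - w @ x)  # same-length expected move
```

Here `change_u` comes out much larger than `change_v`, so the perturbation satisfies all three criteria: small norm, orthogonal to the expected gradient, and an outsized effect on the label.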
Conversations with Soledad Villar (NYU) got me closer to understanding what's different between an adversarial attack against a classification and a regression.
As my loyal reader knows, Abby Shaum (NYU) and I are building a software analog of an FM radio to find binary companions to stars with coherent oscillation modes. We have a good signal! But now we want to optimize the carrier frequency and some other properties of the radio. We discussed how to do that today.
As my loyal reader knows, I love that products of Gaussians are themselves Gaussians! The result is that there are many factorizations of a Gaussian into many different Gaussian products. As my loyal reader also knows, Adrian Price-Whelan (Flatiron) and I found a bug in our code The Joker which fits radial-velocity data with Keplerian orbital models; this bug is related to the fundamental factorization of Gaussians that underlies the method. Today Price-Whelan showed me results from the fixed code, and we discussed them (and the priors we are using in our marginalization), along with the paper we are writing about the factorization. Yes, people, this is my MO: When you have a big bug—or really a big think-o or conceptual error—don't just fix it, write a paper about it! That's the origin of my paper on the K-correction. We are also contemplating writing a note about how you can constrain time-domain signals with periods longer than the interval over which you are observing them!
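The scalar version of the identity underlying all of this (The Joker uses the multivariate analog) is N(x; a, A) N(x; b, B) = N(a; b, A + B) N(x; c, C), with C = 1/(1/A + 1/B) and c = C (a/A + b/B). It is easy to check numerically; the particular numbers here are arbitrary:

```python
import numpy as np

def gauss(x, mu, var):
    """Normalized 1-d Gaussian pdf."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

a, A = 1.0, 0.3                      # first Gaussian: mean, variance
b, B = 2.5, 0.7                      # second Gaussian: mean, variance
C = 1.0 / (1.0 / A + 1.0 / B)        # product variance
c = C * (a / A + b / B)              # product mean

x = np.linspace(-3.0, 6.0, 1001)
lhs = gauss(x, a, A) * gauss(x, b, B)
rhs = gauss(a, b, A + B) * gauss(x, c, C)   # same function of x
```

The prefactor N(a; b, A + B) is the marginal likelihood piece; getting factors like that right (or wrong!) is exactly where bugs of the kind we found like to hide.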
Today was the last day of the visit by Christina Eilers (MIT). We decided that we have a clear scope for our Milky Way disk paper, and we have a full outline and figures, so it was a success. But we don't have a good picture of our black-hole and quasar projects, which include finding a data-driven model that generates quasar properties given a black-hole mass (and possibly many latent parameters). The data are messy! And we don't know what to believe. To be continued.
Today Christina Eilers (MIT) and I got in a call with Hans-Walter Rix (MPIA) to discuss our Milky-Way disk project. He opined that our measurement of the spiral structure is indeed both the clearest picture of it in the $X-Y$ plane of the Milky Way disk, and also the only measurement of its dynamical influence or amplitude. So that was a good boost for morale, and we promised to send him a full outline of our paper, with figures and captions and an abstract, next week.
It being Wednesday, I worked today with many people. Many highlights! One was the following: With Lily Zhao (Yale), my loyal reader knows, I am calibrating the EXPRES spectrograph, which has both laser-frequency comb and thorium-argon lamp calibration data. Zhao and I have figured out that we can go both hierarchical and non-parametric with the calibration: Hierarchical in the sense that we will use all the calibration frames to calibrate every exposure, and non-parametric in the sense that we won't choose the order of a polynomial, we will use interpolation or a process.
Today we improved the order of operations for this project. At first we were interpolating, and then building the hierarchical model. But today (forced by computational cost) we realized that we can build the hierarchical model on the calibration data prior to the interpolation. That's lower dimensional. It sped things up a lot, and simplified the code. We did some robust things to deal with missing data, and Zhao did some clever things to make her code work with the arc lamp just as well as it does with the laser-frequency comb.
Our current plan is to assess our calibration quality by looking at the measured radial velocity (which should be exactly zero, I hope) of the thorium-argon lamps as calibrated by the LFC, and look at the velocity of the LFC as calibrated by the ThAr lamps. That is, a cross-validation between the lamps and the comb.
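The hierarchical-before-interpolation ordering can be sketched as follows: fit a principal-component model to all the calibration exposures at once, then interpolate the low-dimensional amplitudes (not the full, many-line calibration) to the science time. The fake drift pattern and all numbers here are invented, not EXPRES data:

```python
import numpy as np

# Fake calibration data: line positions for n_lines lines in n_exp
# exposures, drifting with a rank-one pattern in time. All invented.
rng = np.random.default_rng(7)
n_exp, n_lines = 40, 300
t_cal = np.linspace(0.0, 10.0, n_exp)
mean_pos = np.linspace(1000.0, 2000.0, n_lines)  # mean line positions [pix]
mode1 = np.linspace(-1.0, 1.0, n_lines)          # a "stretch" pattern
drift = 0.05 * np.sin(0.7 * t_cal)               # its time dependence
X = mean_pos[None, :] + drift[:, None] * mode1[None, :]
X = X + 1e-4 * rng.normal(size=X.shape)          # measurement noise

# Hierarchical step first: PCA across all calibration exposures.
mu = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mu, full_matrices=False)
k = 1                                            # keep one component
coeffs = U[:, :k] * S[:k]                        # (n_exp, k) amplitudes

# Then interpolate the k amplitudes, not the n_lines-dim calibration:
t_sci = 4.321                                    # a science-exposure time
c_sci = np.array([np.interp(t_sci, t_cal, coeffs[:, j]) for j in range(k)])
calib_sci = mu + c_sci @ Vt[:k]                  # predicted line positions
```

Interpolating k amplitudes instead of n_lines positions is what made things faster and lower-dimensional; the reconstruction recovers the full calibration at the science time.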
I had a very brief but useful conversation today with Soledad Villar (NYU) about the strategy and meaning of adversarial attacks against regression methods. We have been working on this all semester, but I am still thinking about the fundamentals. One thing I am confident about, even in the trivial machine-learning methods I have used in astronomy, is that there will be successful single-pixel attacks against standard regressions that we use. That is, you will find that the ML method is very sensitive to particular pixels! But this is a conjecture. We need to make a very clear definition of what constitutes a successful attack against a regression. In the case of classification, it seems like the definition is “The authors of the method are embarrassed”. But that doesn't seem like a good definition! Aren't we scientists? And open scientists, at that.
Christina Eilers (MIT) is in town for the week. We realized that we have four projects: Work on the kinematic signatures of spiral arms in the Milky Way disk; design a self-calibration program for stellar element abundances; create a latent-variable model (like The Cannon) for estimating black-hole masses from quasar spectra; infer simultaneously the large-scale structure towards luminous quasars and the quasar lifetimes using rest-frame ultraviolet spectra.
Because it is the most mature, our highest priority is the disk paper. We discussed the scope of this paper, which is: Good visualization of the velocity structure; a toy model to relate the velocity amplitude with the density amplitude of any dynamically-driven perturbation; rough measurement of the pitch angle; comparison to other claims of spiral structure in the neighborhood. We think we have the clearest view of the spiral structure, and the only truly dynamical measurement.
I spent the last two days at the National Society of Black Physicists meeting in Providence RI. It was a great meeting, with a solid mix of traditional physics, strategizing about the state of the profession, and offline conversations about politics and the many communities of physicists. Many great things happened. Here are some random highlights: I learned from Bryen Irving (Stanford) that the harder neutron-star equations of state lead to larger tidal effects on binary inspiral. After all, harder state means larger radius, larger radius means more tidal distortion to the surface equipotential. Deep! I enjoyed very much a comment by Richard Anantua (Harvard) about “the importance of late-time effects on one's career”. He was talking about the point that there are combinatorially many ways to get from point A to point B in your career, and it is your current state that matters most. Beautiful! There was an excellent talk by Joseph Riboudo (Providence College) that was simultaneously about how to influence the community with a Decadal-survey white paper and about primarily undergraduate institutions and how we should be serving them as a community. He was filled with wisdom! And learning. Eileen Gonzalez (CUNY) showed her nice results understanding incredibly cool (and yes, I mean low-temperature) star binaries. She is finding that data-driven atmospheric retrieval methods plus clouds work better than grids of ab initio models. That's important for the JWST era. And I absolutely loved off-session chatting with Dara Norman (NOAO) and others. Norman is filled with conspiracy theories and I have to tell you something: They are all True. Norman also deserves my thanks for organizing much of the astrophysics content at the meeting. It was a great couple of days.
Stars and Exoplanets Meeting at Flatiron was a delight today. Lachlan Lancaster (Princeton) showed his results on a really interesting object he found in the ESA Gaia data. He was inspired by the idea that star clusters might have central black holes, which might retain a very dense, very luminous nuclear star cluster even after the cluster disrupts. But his search of the Gaia data was so simple: Look for things that are apparently bright but low in parallax (large in distance). Duh! And what he found is a very bright “star” that is variable, shows emission lines, and is above the top of the H–R diagram! The ideas from the room ranged from extremely young star to microquasar to technosignatures (who suggested that?). And the thing is incredibly variable.
But there was lots more! I won't do everything, but I will say that Thankful Cromartie (Virginia) showed data from pulsar monitoring (as part of a pulsar-timing project for gravitational waves). She showed that she can very clearly see the Shapiro time delay in the pulses when they pass by the neutron star that is in orbit around the pulsar. This lets them measure the mass of the neutron star accurately. It is very massive! I think it must be one of the most massive neutron stars known, which, in turn, will put pressure on the equations of state. Beautiful results from beautiful data.
An all-proposal day was interrupted by a three-hour lunch with Jim Peebles (Princeton) and Kate Storey-Fisher (NYU). We discussed many things, but a major theme was curiosity-driven research, which Peebles wants to speak about at the Nobel Prize ceremony next month.
Grace Telford (Rutgers) showed up in NYC today and we discussed the inference of star-formation histories from observations of resolved stellar populations. We discussed the point that the space is high dimensional (because, say, the star-formation history is modeled as a set of 30-ish star-formation rates in bins), which leads to two problems. The first is that a maximum-likelihood or maximum-a-posteriori setting of the SFH will be atypical (in high dimensions, optima are atypical relative to one-sigma-ish parameter settings). The second is that the results are generally extremely prior-dependent, and the priors are usually made up by investigators, not any attempt to represent their actual beliefs. We talked about ways to mitigate these issues.
As my loyal reader knows, I am working with Lily Zhao (Yale) to calibrate the EXPRES spectrograph. Our approach is non-parametric: We can beat any polynomial calibration with an interpolation (we are using splines, but one could also use a Gaussian Process or any other method, I think). The funniest thing happened today, which surprised me, but shouldn't have! When Zhao plotted a histogram of the differences between our predicted line locations (from our interpolation) and the observed line locations (of held-out lines, held out from the interpolation), they were always redshifted! There was a systematic bias everywhere. We did all sorts of experiments but could find no bug. What gives? And then we had a realization which is pretty much Duh:
If you are doing linear interpolation (and we were at this point), and if your function is monotonically varying, and if your function's first derivative is also monotonically varying, the linear interpolator will always be biased to the same side! Hahaha. We switched to a cubic spline and everything went unbiased.
In detail, of course, interpolation will always be biased. After all, it does not represent your beliefs about how the data are generated, and it certainly does not represent the truth about how your data were generated. So it is always biased. It's just that once we go to a cubic spline, that bias is way below our precision and accuracy (under cross-validation). At least for now.
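The effect is easy to reproduce in a few lines. Here is a toy of my own (a made-up convex function, not our EXPRES code) showing that linear interpolation of a monotonic, convex function lands on the same side of the truth in every gap, while a cubic spline does not:

```python
# Toy demo: linear interpolation of a convex function is one-sided biased.
import numpy as np
from scipy.interpolate import interp1d, CubicSpline

x = np.linspace(1.0, 2.0, 30)            # knot positions (think: LFC lines)
y = np.exp(x)                            # monotonic, convex "truth"
x_mid = 0.5 * (x[:-1] + x[1:])           # held-out points between the knots

lin = interp1d(x, y, kind="linear")
spl = CubicSpline(x, y)

lin_resid = lin(x_mid) - np.exp(x_mid)
spl_resid = spl(x_mid) - np.exp(x_mid)

print(bool(np.all(lin_resid > 0)))       # True: biased to the same side everywhere
print(np.abs(spl_resid).max() / np.abs(lin_resid).max())  # tiny ratio
```

The reason is just geometry: between knots, the chord of a convex function always lies above the function, so the sign of the linear-interpolation error never changes.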
I had a meeting with Emily Cunningham (Flatiron) to discuss projects of mutual interest. She has been looking at simulations of the Milky Way (toy simulations) in which the LMC and SMC fall in. In these simulations, the Galaxy gets tidally distorted by the infall, and various observational consequences follow. For example, the disk ends up having a different mean velocity than the halo! And for another, different parts of the halo move relative to one another, in the mean. Cunningham's past work has been on the velocity variance; now it looks like she has a project on the velocity mean! The predictions are coming from toy simulations (from the Arizona group) but I'm interested in the more general question of what can be learned from spatial variations in the mean velocity in the halo. It might put strong constraints on the recent-past time-dependence.
Oh what a great day! Not a lot of research got done; NSF proposals, letters of recommendation, and all that. But in the afternoon, undergraduate researcher Abby Shaum (NYU) and I looked at her project to do frequency demodulation on asteroseismic modes to find orbital companions and we got one. Our target is a hot star that has a few very strong asteroseismic modes (around 14 cycles per day in frequency), and our demodulator is actually a phase demodulator (not frequency) but it's so beautiful:
The idea of the demodulator is that you mix (product) the signal (which, in this case, is bandpass-filtered NASA Kepler photometric data) with a complex sinusoid at (as precisely as you can set it) the asteroseismic carrier frequency. Then you Gaussian smooth the real and imaginary parts of that product over some window timescale (the inverse bandwidth, if you will). The resulting extremely tiny phase variations (yes these stars are coherent over years) have some periodogram or power spectrum, which shows periodicity at around 9 days, which is exactly the binary period we expected to find (from prior work).
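A toy version of that chain looks like this (entirely made-up numbers, and a fake noiseless signal rather than Kepler data):

```python
# Sketch of the phase demodulator on a synthetic phase-modulated mode.
import numpy as np
from scipy.ndimage import gaussian_filter1d

dt = 0.02                                   # days; a Kepler-ish cadence
t = np.arange(0.0, 200.0, dt)
f_carrier = 14.0                            # cycles/day: the asteroseismic mode
p_orbit = 9.0                               # days: the binary period to recover
phase_wobble = 0.01 * np.sin(2.0 * np.pi * t / p_orbit)  # tiny phase modulation
signal = np.cos(2.0 * np.pi * f_carrier * t + phase_wobble)

# mix (multiply) with a complex sinusoid at the carrier frequency...
mixed = signal * np.exp(-2.0j * np.pi * f_carrier * t)
# ...then Gaussian-smooth the real and imaginary parts (1-day window here)
smoothed = gaussian_filter1d(mixed.real, 50) + 1j * gaussian_filter1d(mixed.imag, 50)
phase = np.unwrap(np.angle(smoothed))

# the recovered phase variations are periodic at the orbital period
freqs = np.fft.rfftfreq(t.size, dt)
power = np.abs(np.fft.rfft(phase - phase.mean())) ** 2
p_found = 1.0 / freqs[np.argmax(power[1:]) + 1]
print(p_found)   # close to the 9-day input period
```

The Gaussian window sets the trade-off: a narrower bandwidth gives a cleaner phase estimate but less time resolution on the orbit.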
I'm stoked! The advantages of our method over previous work are: Our method can easily combine information from many modes. Our method can be tuned to any modes that are in any data. We did not have to bin the lightcurve; we only had to choose an effective bandwidth. The disadvantages are: We don't have a probabilistic model! We just have a procedure. But it's so simple and beautiful. I'm feeling like the engineer I was born to be.
It was a great research day today. I worked with Lily Zhao (Yale) on the wavelength calibration of the EXPRES spectrograph, which my loyal reader knows is a project of Debra Fischer (Yale). Lily and I cleaned up and sped up (by a lot) the polynomial fitting that the EXPRES team is doing, and showed (with a kind of cross-validation) that the best polynomial order for the fit is in the range 8 to 9. This is for a high-resolution, laser-frequency-comb-calibrated, temperature-controlled, bench-mounted, dual-fiber spectrograph.
But then we threw out that polynomial fit and just worked on interpolating the laser frequency-comb line positions. These are fixed in true wavelength and dense on the detector (for many orders, anyway). Oh my goodness did it work! When we switched from polynomial fitting to interpolation, the cross-validation tests got much better, and the residuals went from being very structured and repeatable to looking like white noise. When we averaged solutions, we got very good results, and when we did a PCA of the differences away from the mean solution, it looks like the variations are dominated by a single variability dimension! So it looks like we are going to end up with a very very low-dimensional, data-driven, non-parametric calibration system that hierarchically pools information from all the calibration data to calibrate every single exposure. I couldn't be more stoked!
A no-research day (Thursdays are always bad) was ended on a great note with a Colloquium by Ian Dobbs-Dixon (NYUAD), who spoke about the atmospheres of hot-jupiter-like exoplanets. He has a great set of equipment that connects the global climate model built for Earth climate modeling with lots of planet-relevant physics (like strong, anisotropic insolation and internal heat flows) to figure out what must be happening on these planets. He showed some nice predictions and also some nice explanations of the observed property (yes observed property) that these planets do not have their hottest point at the sub-stellar point. It's so exciting when we think forward to what might be possible with NASA JWST.
My main research contribution today was to write some notes for myself and Lily Zhao (Yale) about how we might start to produce a low-dimensional, hierarchical, non-parametric calibration model for the EXPRES spectrograph.
At the end of a long faculty meeting at NYU Physics, my colleague Shura Grosberg came to me to discuss a subject we have been discussing at a low rate for many months: How is it possible that my watch (my wristwatch) is powered purely by stochastic motions of my arm, when thermal ratchets are impossible? He presented to me a very simple model, in which my watch is seen as a set of three coupled systems. One is the winder, which is a low-Q oscillator that works at long periods. The next is the escapement and spring, which is a high-Q oscillator that has a period of 0.2 seconds. The next is the thermal bath of noise to which the watch dissipates energy. If my arm delivers power only on long periods (or mainly on long periods), then it only couples well to the first of these. And then power can flow to the other two systems. Ah, I love physicists!
As my loyal reader knows, I love the Brown-Bag talks at the Center for Cosmology and Particle Physics. Today was a great example! Hongwan Liu (NYU) talked about milli-charged dark matter. Putting a charge in the dark sector is a little risky, because the whole point of dark matter is that it is invisible, electromagnetically! But it turns out that if you include enough particle complexity in the dark sector, you can milli-charge the dark matter and move thermal energy from the light sector into the dark sector and vice versa.
Liu was motivated by some issues with 21-cm intensity mapping, but he has some very general ideas and results in his work. I was impressed by the point that his work involves the heat capacity of the dark sector. That's an observable, in principle! And it depends on the particle mass, because a dark sector with smaller particle mass has more particles and therefore more degrees of freedom and more heat capacity! It's interesting to think about the possible consequences of this. Can we rule out very small masses somehow?
Continuing on stuff I got distracted into yesterday (when I should be working on NSF proposals!) I did some work on phase manipulation to interpolate between images. This was: Fourier transform both images, and interpolate in amplitude and phase independently, rather than just interpolate the complex numbers in a vector sense. It works in some respects and not in others. And it works much better on a localized image patch than in a whole image. I made this tweet to demonstrate. This is related to the idea that people who do this professionally use wavelet-like methods to get local phase information in the image instead of manipulating global phase. So the trivial thing doesn't work; I need to learn more!
Nora Shipp (Chicago) has been in town this week, working with Adrian Price-Whelan to find halo substructures and stellar streams around the Milky Way. The two of them made beautiful animations, paging through distance slices, showing halo stellar density (as measured by a color-magnitude matched filter). There are lots of things visible in those animations! We discussed the point that what makes overdensities appear to the human eye is their coherence through slices.
That made me think of things that Bill Freeman (MIT) and his lab does with amplifying small signals in video: Should we be looking for small overdensities with similar tricks? Freeman's lab uses phase transforms (like Fourier transforms and more localized versions of those) to detect and amplify small motions. Maybe we should use phase transforms here too. That led Price-Whelan and me to hack a little bit on this image pair by Judy Schmidt, which was fun but useless!
Late in the day, Megan Bedell (Flatiron), Lily Zhao (Yale), Debra Fischer (Yale), and I all met to discuss EXPRES data. It turns out that what the EXPRES team has in terms of data, and what they need in terms of technology, is incredibly well aligned with what Bedell and I want to do in the EPRV space. For example, EXPRES has been used to resolve the asteroseismic p-modes in a star. For another, it has made excellent observations of a spotty star. For another, it has a calibration program that wants to go hierarchical. I left work at the end of the day extremely excited about the opportunities here.
Today Josh Ruderman (NYU) gave a great Physics Colloquium, about particle physics phenomenology, from measuring important standard-model parameters with colliders to finding new particles in cosmology experiments. It was very wide-ranging and filled with nice insights about (among other things) thermal-relic dark matter and intuitions about (among other things) observability of different kinds of dark-sector activity. One theme of the dark-matter talks I have seen recently is that most sensible, zeroth-order bounds (like on mass and cross section for a thermal-relic WIMP) can be modified by slightly complexifying the problem (like by adding a dark photon or another dark state). Ruderman navigated a bunch of that for us nicely, and convinced us that there is lots to do in particle theory, even if the LHC remains in a standard-model desert.
Our LSST broker discussions from yesterday continued at the Cosmology X Machine Learning group meeting at Flatiron. The group helped us think a little bit about the supervised and unsupervised options in the time-domain space.
My day ended with a long conversation with Sjoert van Velzen (NYU), Tyler Pritchard (NYU), and Maryam Modjaz (NYU), about possible things we could be doing in the LSST time-domain and broker space. Our general interest is in finding interesting and unusual and outlier events that are interesting either because they are unprecedented, or because they are unusual within some subclass, or because they imply odd physical parameters or strange conditions. But we don't have much beyond that! We need to get serious in the next few months because there will be proposal calls.
As my loyal reader knows, I have opinions about spectroscopic extraction—the inference of the one-dimensional spectrum of an object as a function of wavelength, given the two-dimensional image of the spectrum in the spectrograph detector plane. The EXPRES team (I happen to know) and others have the issue with their spectrographs that the cross-dispersion direction (the direction precisely orthogonal to the wavelength direction) is not always perfectly aligned with the y direction on the detector. This is a problem because the very simple extraction methods on offer assume perfect alignment.
I spent parts of the day writing down not the general solution to this problem (which might possibly be Bolton & Schlegel's SpectroPerfectionism, although I have issues with that too), but rather an expansion around the perfectly-aligned case, which leads to an iterative solution while preserving the solutions that work at perfect alignment. It's so beautiful! As expansions usually are.
What to call this? I am building on Zechmeister et al's “flat-relative optimal extraction”. But I'm allowing tilts. So Froet? Is that a rude word in some language?
Marla Geha (Yale) crashed Flatiron today and we spent some time talking about a nice problem in spectroscopic data analysis: Imagine that you have a pipeline that works on each spectrum (or each exposure or each plate or whatever) separately, but that the same star has been observed multiple times. How do you post-process your individual-exposure results so that you get combined results that are the same as what you would have gotten if you had processed them all simultaneously? You want the calibration to be independent for each exposure, but the stellar template to be the same, for example. This is very related to the questions that Adrian Price-Whelan (Flatiron) and I have been solving in the last few weeks. You have to carry forward enough marginalized likelihood information to combine later. This involves marginalizing out the individual-exposure parameters but not the shared parameters. (And maybe making some additional approximations!)
As is not uncommon on a Friday, Astronomical Data Group meeting was great! So many things. One highlight for me was that Lily Zhao (Yale) has diagnosed—and figured out strategies related to—problems we had in wobble with the learning rate on our gradient descent. I hate optimization! But I love it when very good people diagnose and fix the problems in our optimization code!
Thursdays are low research days. I did almost nothing reportable here according to The Rules. I did have a valuable conversation with Price-Whelan (Flatiron) about marginalized likelihoods, and I started to get an intuition about why our factorization of Gaussian products has the form that it has. It has to do with the fact that the marginalized likelihood (the probability of the data, fully marginalizing out all linear parameters) has a variance for the data that is the quadrature sum of the noise variance and the model variance. Ish!
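For the record, here is the standard Gaussian identity behind that intuition, in my own notation (which need not match what Price-Whelan and I eventually write up): y is the data, x the linear parameters, M the design matrix, C the noise covariance, and Λ the prior covariance on x.

```latex
p(y) = \int \mathcal{N}(y \mid M x,\, C)\, \mathcal{N}(x \mid \mu,\, \Lambda)\, \mathrm{d}x
     = \mathcal{N}\!\left(y \mid M \mu,\; C + M \Lambda M^{\top}\right)
```

The covariance of the marginalized likelihood is literally the noise variance C plus the projected model variance M Λ Mᵀ, which is the quadrature-sum statement above, generalized to covariance matrices.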
I had an amusing email from out of the blue, asking me to dig up the IDL (yes, IDL) code that I (and Blanton and Bovy and Johnston and Roweis and others) wrote to analyze the local velocity field using the ESA Hipparcos data. Being a huge supporter of open science, I had to say yes to this request. I dug through old cvs repositories (not svn, not git, but cvs) and found the code, and moved it to Github (tm) here. I didn't truly convert the cvs repo to git, so I erased history, which is bad. But time is precious, and I could always fix that later. I hereby apologize to my co-authors!
All this illustrates to me that it is very good to put your code out in the open. One reason is that then you don't have to go digging like this; a simple google search would have found it! Another is that when you know your code will be out in the open, you are (at least slightly) more likely to make it readable and useable by others. I dug up and threw to the world this code, but will anyone other than the authors ever be able to make any use of it? Or even understand it? I don't know.
I had my weekly call with Ana Bonaca (Harvard) this morning, where she updated me on our look at systematic effects in the radial-velocity measurements we are getting out of Hectochelle. We see very small velocity shifts in stellar radial velocities across the field of view that seem unlikely to be truly in the observed astrophysical stellar systems we are observing. At this point, Bonaca can show that these velocity shifts do not appear in the sky lines; that is, the calibration (with arc lamps) of the wavelengths on the detector is good.
All I have left at this point is that maybe the stars illuminate the fibers differently from the sky (and arc lamps) and this difference in illumination is transmitted to the spectrograph. I know how to test that, but it requires observing time; we can't do it in the data we have in hand right now. This is an important thing for me to figure out though, because it is related to how we commission and calibrate the fiber robot for SDSS-V. Next question: Will anyone give us observing time to check this?
Today was almost all admin and teaching. But I did get to the Astronomical Data Group meeting at Flatiron, where we had good discussions of representation learning, light curves generated by spotted stars, the population of planets around slightly evolved stars, and accreted stellar systems in the Milky Way halo!
I got in a bit of research in a mostly-teaching day. I saw the CDS Math-and-Data seminar, which was by Peyman Milanfar (Google) about de-noising models. In particular, he was talking about some of the theory and ideas behind the de-noising that Google uses in its Pixel cameras and related technology. They use methods that are adaptive to the image itself but which don't explicitly learn a library of image priors or patch priors or anything like that from data. (But they do train the models on human reactions to the denoising.)
Milanfar's theoretical results were nice. For example: De-noising is like a gradient step in response to a loss function! That's either obvious or deep. I'll go with deep. And good images (non-noisy natural images) should be fixed points of the de-noising projection (which is in general non-linear). Their methods identify similar parts of the images and use commonality of those parts to inform the nonlinear projections. But he explained all this with very simple notation, which was nice.
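My gloss on the gradient-step claim (mine, not necessarily Milanfar's formulation): for Gaussian noise of variance σ², Tweedie's formula says that the minimum mean-squared-error denoiser is

```latex
\hat{x}(y) = y + \sigma^{2}\, \nabla_{y} \log p(y)
```

which is exactly a gradient step on the loss L(y) = −σ² log p(y). And images at local maxima of the density p have vanishing gradient, so they are (near) fixed points of the denoiser, consistent with the fixed-point statement above.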
After the talk I had a quick conversation with Jonathan Niles-Weed (NYU) about the geometry of the space of natural images. Here's a great argument he gave: Imagine you have two arbitrarily different images, like one of the Death Star (tm) and one of the inside of the seminar room. Are these images connected to one another in the natural-image subspace of image space? That is, is there a continuous transformation from one to the other, every point along which is itself a good natural image?
Well, if I can imagine a continuous tracking shot (movie) of me walking out of the seminar room and into a spaceship and then out of the airlock on a space walk to repair the Death Star (tm), and if every frame in that movie is a good natural image, and everything is continuous, then yes! What a crazy argument. The space of all natural images might be one continuously connected blob. Crazy! I love the way mathematicians think.
So many things. I love Wednesdays. Here's one: I spent a lot of the day working with Adrian Price-Whelan (Flatiron) on our issues with The Joker. We found some simple test cases, we made a toy version that has good properties, we compared to the code. Maybe we found a sign error!? But all this is in service of a conceptual data-analysis project I want to think about much more: What can you say about signals with periodicity (or structure) on time scales far, far longer than the baseline of your observations? Think long-period companions in RV surveys or Gaia data. Or the periods of planets that transit only once in your data set. Or month-long asteroseismic modes in a giant star observed for only a week. I think it would be worth getting some results here (and I am thinking information theory), because I think there will be some interesting scalings; for example, some precisions might scale faster than the square root of the time baseline.
In Stars & Exoplanets meeting at Flatiron, many cool things happened! But a highlight for me was a discovery (reported by Saurabh Jha of Rutgers) that the bluest type Ia supernovae are more standardizeable (is that a word?) candles than the redder ones. He asked us how to combine the information from all supernovae with maximum efficiency. I know how to do that! We opened a thread on that. I hope it pays off.
Today Kristina Hayhurst (NYU) came to my office and, with a little documentation-hacking, we figured out how to read and plot ESA Planck data or maps released in the Planck archive! I am excited, because there is so much to look at in these data. Hayhurst's project is to look at the “Van Gogh” plot of the polarization: Can we do this better?
In the CCPP Brown-Bag seminar today, Neal Weiner (NYU) spoke about the possible connections between the dark sector (where dark matter lives) and our sector (where the standard model lives). He discussed the WIMP miracle, and then where we might look in phenomenology space for the particle interactions that put the WIMPs or related particles in equilibrium with the standard-model particles in the early Universe.
In the afternoon, I worked with Abby Shaum (NYU) and Kate Storey-Fisher (NYU) to get our AAS abstracts ready for submission for the AAS Winter Meeting in Honolulu.
Adrian Price-Whelan (Flatiron) and I spent time this past week trying to factorize products of Gaussians into new products of different Gaussians. The context is Bayesian inference, where you can factor the joint probability of the data and your parameters into a likelihood times a prior or else into an evidence (what we here call the FML) times a posterior. The factorization was causing us pain this week, but I finally got it this weekend, in the woods. The trick I used (since I didn't want to expand out enormous quadratics) was to use a determinant theorem to get part of the way, and some particularly informative terms in the quadratic expansion to get the rest of the way. Paper (or note or something) forthcoming...
Megan Bedell (Flatiron) and I continued our work from earlier this week on making a mechanical model of stellar asteroseismic p-modes as damped harmonic oscillators driven by white noise. Because the model is so close to closed-form (it is closed form between kicks, and the kicks are regular and of random amplitude), the code is extremely fast. In a couple minutes we can simulate a realistic, multi-year, dense, space-based observing campaign with a full forest of asteroseismic modes.
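A minimal version of that idea (my own toy parameters, not our actual code): propagate the closed-form solution of the damped oscillator across each kick-free interval, then add a random velocity impulse.

```python
# Kicked, damped harmonic oscillator: exact propagation between regular kicks.
import numpy as np

rng = np.random.default_rng(17)
f_mode = 3.0                           # mode frequency in cycles per day
omega0 = 2.0 * np.pi * f_mode          # ...in radians per day
Q = 500.0                              # quality factor of the mode
gamma = omega0 / (2.0 * Q)             # damping rate
omega = np.sqrt(omega0**2 - gamma**2)  # damped oscillation frequency

dt = 0.01                              # days between kicks (= sample spacing)
n = 20000
# exact propagator of (x, v) across one kick-free interval dt
c, s, e = np.cos(omega * dt), np.sin(omega * dt), np.exp(-gamma * dt)
P = e * np.array([[c + gamma * s / omega, s / omega],
                  [-(omega + gamma**2 / omega) * s, c - gamma * s / omega]])

x, v = 0.0, 0.0
out = np.empty(n)
for i in range(n):
    x, v = P @ (x, v)                  # closed-form evolution between kicks
    v += rng.normal(0.0, 1e-3)         # regular kick with random amplitude
    out[i] = x

freqs = np.fft.rfftfreq(n, dt)
power = np.abs(np.fft.rfft(out)) ** 2
f_peak = freqs[np.argmax(power[1:]) + 1]
print(f_peak)   # close to the 3 cycles-per-day mode frequency
```

The power spectrum comes out as a Lorentzian of width set by Q, which is the standard description of stochastically driven, damped solar-like oscillations; a forest of modes is just a sum of these.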
The first thing we did with our model is check the results of the recent paper on p-mode mitigation by Chaplin et al, which suggest that you can obtain mitigation of p-mode noise in precision radial-velocity observation campaigns by good choice of exposure time. We expected, at the outset, that the results of this paper are too optimistic: We expected that a fixed exposure time would not do a good job all the time, given the stochastic nature of the driving of the modes, and that there are many modes in a frequency window around the strongest modes. But we were wrong and the Chaplin et al paper is correct! Which is good.
However, we believe that we can do better than exposure-time-tuning for p-mode mitigation. We believe that we can fit the p-modes with the (possibly non-stationary) integral of a stationary Gaussian process, tuned to the spectrum. That's our next job.
Our weekly Stars and Exoplanets Meeting at Flatiron was all about stellar rotation somehow this week (no we don't plan this!). Adrian Price-Whelan (Flatiron) showed that stellar rotations can get so large in young clusters that stars move off the main sequence and the main sequence can even look double. We learned (or I learned) that a significant fraction of young stars are born spinning very close to break-up. This I immediately thought was obviously wrong and then very quickly decided was obvious: It is likely if the last stages of stellar growth are from accretion. Funny how an astronomer can turn on a dime.
And in that same meeting, Jason Curtis (Columbia) brought us up to date on his work on stellar rotation and its use as a stellar clock. He showed that the usefulness is great (by comparing clusters of different ages); it looks incredible for at least the first Gyr or so of a star's lifetime. But the usefulness decreases at low masses (cool temperatures). Or maybe not, but the physics looks very different.
In the morning, before the meeting, Megan Bedell (Flatiron) and I built a mechanical model of an asteroseismic mode by literally making a code that produces a damped, driven harmonic oscillator, driven by random delta-function kicks. That was fun! And it seems to work.
The highlight of a low-research day was a great NYU Astro Seminar by Maria Okounkova (Flatiron) about testing or constraining extensions to general relativity using the LIGO detections of black hole binary inspirals. She is interested in terms in a general expansion that adds to Einstein's equations higher powers of curvature tensors and curvature scalars. One example is the Chern–Simons modification, which adds some anisotropy or parity-violation. She discussed many things, but the crowd got interested in the point that the Event Horizon Telescope image of the photon sphere (in principle) constrains the Chern–Simons terms! Because the modification distorts the photon sphere. Okounkova emphasized that the constraints on GR (from both gravitational radiation and imaging) get better as the black holes in question get smaller and closer. So keep going, LIGO!
I had a conversation with Ana Bonaca (Harvard) early today about the sky emission lines in sky fibers in Hectochelle. We are trying to understand if the sky is at a consistent velocity across the device. This is part of calibrating or really self-calibrating the spectrograph. It's confusing though, because the sky illuminates a fiber differently than the way that a star illuminates a fiber. So this test only tests some part of the system.
At the Brown-bag talk, Bob Johnson (Virginia) spoke about exo-moons and in particular exo-Ios. Yes, analogs of Jupiter's moon Io. The reason this is interesting is that Io interacts magnetically and volcanically with Jupiter, producing an extended distribution of volcanically produced ions in Jupiter's magnetic field. It is possible that transmission spectroscopy of hot Jupiters is being polluted by volcanic emissions of very hot moons! That would be so cool! Or hot?
My loyal reader knows that earlier this week I got interested in (read: annoyed with) the standard description of the optimal extraction method of obtaining one-dimensional spectra from two-dimensional spectrograph images, and started writing about it on a trip. On return to New York, Lily Zhao (Yale) listened patiently to my ranting and then pointed out this paper by Zechmeister et al on flat-relative extraction, which (in a much nicer way) makes all my points!
This is a classic example of getting scooped! But my feeling—on learning that I have been scooped—was of happiness, not sadness: I hadn't spent all that much time on it; the time I spent did help me understand things; and I am glad that the community has a better method. Also, it means I can concentrate on extracting, not on writing about extracting! So I found myself happy about learning that I was scooped. (One problem with not reading the literature very carefully is that I need to have people around who do read the literature!)
I had a quick pair-coding session with Anu Raghunathan (NYU) today to discuss the box least squares algorithm that is used so much in finding exoplanets. We are looking at the statistics of this algorithm, with the hope of understanding it in simple cases. It is such a simple algorithm, many of the things we want to know about uncertainty and false-positive rate can be determined in closed form, given a noise model for the data. But I'm interested in things like: How much more sensitive is a search when you know (in advance) the period of the planet? Or that you have a resonant chain of planets? These questions might also have closed-form answers, but I'm not confident of them, so we are making toy data.
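To illustrate the closed-form flavor (my own toy, with invented numbers; not Raghunathan's code): when the period and phase are known, the box fit reduces to a difference of means with an analytic uncertainty.

```python
# Toy box fit with known period and phase: depth and its error in closed form.
import numpy as np

rng = np.random.default_rng(3)
n = 2000
t = np.linspace(0.0, 40.0, n)                 # days of (toy) photometry
period, t0, dur, depth = 3.7, 1.2, 0.15, 5e-4
in_transit = ((t - t0) % period) < dur
sigma = 1e-3                                  # per-point white noise
flux = 1.0 - depth * in_transit + rng.normal(0.0, sigma, n)

# maximum-likelihood box depth: out-of-transit mean minus in-transit mean
d_hat = flux[~in_transit].mean() - flux[in_transit].mean()
# and its standard error, in closed form for white noise
d_err = sigma * np.sqrt(1.0 / in_transit.sum() + 1.0 / (~in_transit).sum())

print(d_hat / d_err)   # the detection significance, period known
```

When the period is unknown, the search maximizes this statistic over a large grid of periods, phases, and durations, which is what inflates the false-alarm threshold relative to the known-period case.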
On the plane home, I wrote words about optimal extraction, the method for spectral analysis used in most extreme precision radial-velocity pipelines. My point is so simple and dumb, it barely needs to be written. But if people got it, it would simplify pipelines. The point is about flat-field and PSF: The way things are done now is very sensitive to these two things, which are not well known for rarely or barely illuminated pixels (think: far from the spectral traces).
Once home, I met up with a crew of data-science students at the Center for Data Science to discuss making adversarial attacks against machine-learning methods in astronomy. We talked about different kinds of machine-learning structures and how they might be sensitive to attack. And how methods might be made robust against attack, and what that would cost in training and predictive accuracy. This is a nice ball of subjects to think about! I have a funny fake-data example that I want to promote, but (to their credit) the students want to work with real data.
I achieved my goals for Terra Hunting Experiment this week! After my work on the plane and the discussion we had yesterday, we (as a group) were able to draft a set of potentially sensible and valuable high-level goals for the survey. These are, roughly, maximizing the number of stars around which we have sensitivity to Earth-like planets, delivering statistically sound occurrence rate estimates, and delivering scientifically valuable products to the community. In that order! More about this soon. But I'm very pleased.
Another theme of the last two days is that most or maybe all EPRV experiments do many things slightly wrong. Like how they do their optimal extraction. Or how they propagate their simultaneous reference to the science data. Or how they correct the tellurics. None of these is a big mistake; they are all small mistakes. But precision requirements are high! Do these small mistakes add up to anything wrong or problematic at the end of the day? Unfortunately, it is expensive to find out.
Related: I discovered today that the fundamental paper on optimal extraction contains some conceptual mistakes. Stretch goal: Write a publishable correction on the plane home!
Today at the Terra Hunting Experiment Science Team meeting (in the beautiful offices of the Royal Astronomical Society in London) we discussed science-driven aspects of the project. There was way too much to report here, but I learned a huge amount in presentations by Annelies Mortier (Cambridge) and by Samantha Thompson (Cambridge) about the sources of astrophysical variability in stars that is (effectively) noise in the RV signals. In particular, they have developed aspects of a taxonomy of noise sources that could be used to organize our thinking about what's important to work on and what approaches to take. I got excited about working on mitigating these, which my loyal reader knows is the subject of my most recent NASA proposal.
Late in the day, I made my presentation about possible high-level goals for the survey and how we might flow decisions down from those goals. There was a very lively discussion of these. What surprised me (given the diversity of possible goals, from “find an Earth twin” to “determine the occurrence rate for rocky planets at one-year periods”) was that there was a kind of consensus: One part of the consensus was along the lines of maximizing our sensitivity where no other survey has ever been sensitive. Another part of the consensus was along the lines of being able to perform statistical analyses of our output.
I flew today to London for a meeting of the Terra Hunting Experiment science team. On the plane, I worked on a presentation that looks at the high-level goals of the survey and what survey-level and operational decisions will flow down from those goals. Like most projects, this one was designed to have a certain observing capacity (a number of observing hours over a certain—long—period of time). But in my view, how you allocate that time should be based on (possibly reverse-engineered) high-level goals. I worked through a few possible goals and what they might mean for us. I'm hoping we will make some progress on this point this week.
Today was the third day of Gotham Fest, three Fridays in September in which all of astronomy in NYC meets all of astronomy in NYC. Today's installment was at NYU, and I learned a lot! But many four-minute talks just leave me wanting much, much more.
Before that, I met up with Adrian Price-Whelan (Flatiron) and Kathryn Johnston (Columbia) to discuss projects in the Milky Way disk with Gaia and chemical abundances (from APOGEE or other sources). We discussed the reality or usefulness of the idea that the vertical dynamics in the disk is separable from the radial and azimuthal dynamics, and how this might impact our projects. We'd like to do some one-dimensional problems, because they are tractable and easy to visualize. But not if they are ill-posed or totally wrong. We came up with some tests of the separability assumption and left it to Price-Whelan to execute.
At lunch, I discussed machine learning with Gabi Contardo (Flatiron). She has some nice results on finding outliers in data. We discussed how to design her project so that it can find outliers that no-one could find by any other method.
With Suroor Gandhi (NYU) and Adrian Price-Whelan (Flatiron) we have been able to formulate (we think) some questions about unseen gravitational matter (dark matter and unmapped stars and gas) in the Milky Way into questions about transformations that map one set of points onto another set of points. How, you might ask? By thinking about dynamical processes that set up point distributions in phase space.
Being physicists, we figured that we can do this all ourselves! And being Bayesians, we reached for probabilistic methods. Like: Build a kernel density estimate on one set of points and maximize the likelihood given the other set of points and the transformation. That's great! But it has high computational complexity and is slow in practice. Since, for our purposes, this objective doesn't need to be a likelihood, we learned (through Soledad Villar, NYU) about optimal transport.
Despite its name, optimal transport is about solving problems of this type (find transformations that match point sets) with fast, good algorithms. The optimal-transport setting brings a clever objective function (that looks like earth-mover distance) and a high-performance tailored algorithm to match (that looks like linear programming). I don't understand any of this yet, but Math may have just saved our day. I hope I have said here recently how valuable it is to talk out problems with applied mathematicians!
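To make the idea concrete, here is a toy sketch (not our research code) of entropy-regularized optimal transport between two point clouds, written in plain NumPy with the standard Sinkhorn iterations. The point sets, the regularization strength, and the squared-Euclidean cost are all invented for illustration; real applications would use a tuned library implementation.

```python
import numpy as np

def sinkhorn_plan(X, Y, reg=0.1, n_iter=500):
    """Entropy-regularized optimal transport between point sets X (n,d), Y (m,d).

    Uses uniform marginal weights and a squared-Euclidean ground cost;
    returns the transport plan P of shape (n, m).
    """
    n, m = len(X), len(Y)
    a = np.full(n, 1.0 / n)                       # uniform weights on X
    b = np.full(m, 1.0 / m)                       # uniform weights on Y
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)  # cost matrix
    K = np.exp(-C / reg)                          # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iter):                       # Sinkhorn fixed-point iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(42)
X = rng.normal(size=(50, 2))
Y = X @ np.array([[0.9, -0.1], [0.1, 0.9]]) + 0.3   # a linearly transformed copy of X
P = sinkhorn_plan(X, Y)
transport_cost = (P * ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)).sum()
```

The plan P says (softly) which point in Y each point in X corresponds to; minimizing the transport cost over a family of transformations is the matching idea, without ever building a kernel density estimate.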
I got in some great research time late today working with Adrian Price-Whelan (Flatiron) to understand the morphology of the distribution of stars in APOGEE–Gaia in elements-energy space. The element abundances we are looking at are [Fe/H] and [alpha/Fe]. The energy we are looking at is vertical energy (as in something like the vertical action in the Milky Way disk). We are trying to execute our project called Chemical Tangents, in which we use the element abundances to find the orbit structure of the Galaxy. We have arguments that this will be more informative than doing Jeans models or other equilibrium models. But we want to demonstrate that this semester.
There are many issues! The issue we worked on today is how to model the abundance space. In principle we can construct a model that uses any statistics we like of the abundances. But we want to choose our form and parameterization with the distribution (and its dependence on energy of course) in mind. We ended our session leaning towards some kind of mixture model, where the dominant information will come from the mixture amplitudes. But going against all this is that we would like to be doing a project that is simple! When Price-Whelan and I get together, things tend to get a little baroque if you know what I mean?
I spent my research time today writing notes on paper and then LaTeX in a document, making more specific plans for the projects we discussed yesterday with Zhao (Yale) and Bedell (Flatiron). Zhao also showed me issues with EXPRES wavelength calibration (at the small-fraction-of-a-pixel level). I opined that it might have to do with pixel-size issues. If this is true, then it should appear in the flat-field. We discussed how we might see it in the data.
Today I had a great conversation with Lily Zhao (Yale) and Megan Bedell (Flatiron) about Zhao's projects for the semester at Flatiron that she is starting this month. We have projects together in spectrograph calibration, radial-velocity measurement, and time-variability of stellar spectra. On that last part, we have various ideas about how to see the various kinds of variability we expect in the joint domain of wavelength and time. And since we have a data-driven model (wobble) for stellar spectra under the assumption that there is no time variability, we can look for the things we seek in the residuals (in the data space) away from that time-independent model. We talked about what might be the lowest hanging fruit and settled on p-mode oscillations, which induce radial-velocity variations but also brightness and temperature variations. I hope this works!
I spoke with Christina Eilers (MPIA) early yesterday about a possible self-calibration project, for stellar element abundance measurements. The idea is: We have noisy element-abundance measurements, and we think they may be contaminated by biases as a function of stellar brightness, temperature, surface gravity, dust extinction, and so on. That is, we don't think the abundance measurements are purely measurements of the relevant abundances. So we have formulated an approach to solve this problem in which we regress the abundances against things we think should predict abundances (like position in the Galaxy) and also against things we think should not predict abundances (like apparent magnitude). This should deliver the most precise maps of the abundance variations in the Galaxy but also deliver improved measurements, since we will know what spurious signals are contaminating the measurements. I wrote words in a LaTeX document about all this today, in preparation for launching a project.
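As a toy illustration of the regression idea (not the actual project code), here is a simulation in which an abundance truly depends on Galactocentric radius and spuriously on apparent magnitude; regressing on both kinds of predictor recovers, and then removes, the spurious trend. All numbers and functional forms are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
R = rng.uniform(4.0, 12.0, n)        # Galactocentric radius (kpc): should predict abundance
mag = rng.uniform(8.0, 14.0, n)      # apparent magnitude: should NOT predict abundance

true_feh = 0.3 - 0.06 * (R - 8.0)    # toy radial abundance gradient
spurious = 0.02 * (mag - 11.0)       # toy magnitude-dependent calibration bias
obs_feh = true_feh + spurious + rng.normal(0.0, 0.05, n)

# regress the observed abundances on both kinds of predictor simultaneously
A = np.column_stack([np.ones(n), R - 8.0, mag - 11.0])
coef, *_ = np.linalg.lstsq(A, obs_feh, rcond=None)

# the magnitude coefficient estimates the spurious trend, which we can then remove
corrected = obs_feh - coef[2] * (mag - 11.0)
```

The corrected abundances are closer to the truth than the raw measurements, and the fitted magnitude coefficient is itself a diagnostic of the contamination.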
Today I got in my first weekly meeting (of the new academic year) with Kate Storey-Fisher (NYU). We went through priorities and then spoke about the problem of performing some kind of comprehensive or complete search of the large-scale structure data for anomalies. One option (popular these days) is to train a machine-learning method to recognize what's ordinary and then ask it to classify non-ordinary structures as anomalies. This is a great idea! But it has the problem that, at the end of the day, you don't know how many hypotheses you have tested. If you find a few-sigma anomaly, that isn't surprising if you have looked in many thousands of possible “places”. It is surprising if you have only looked in a few. So I am looking for comprehensive approaches where we can pre-register an enumerated list of tests we are going to do, but to have that list of tests be exceedingly long (like machine-generated). This is turning out to be a hard problem.
The New York City physics and astronomy departments (and this includes at least Columbia, NYU, CUNY, AMNH, and Flatiron) run a set of three Friday events in which everyone (well a large fraction of everyone) presents a brief talk about who they are and what they do. The first event was today.
I re-derived equation (11) in our paper on The Joker, in order to answer some of the questions I posed yesterday. I find that the paper does have a sign error, although I am pretty sure that the code (based on the paper) does not have a sign error. I also found that I could generalize the equation to apply to a wider range of cases, which makes me think that we should either write an updated paper or at least include the math, re-written, in our next paper (which will be on the SDSS-IV APOGEE2 DR16 data).
This morning, Adrian Price-Whelan proposed that we might have a sign error in equation (11) in our paper on The Joker. I think we do, on very general grounds. But we have to sit down and re-do some math to check it. This all came up in the context that we are surprised about some of the results of the orbit fitting that The Joker does. In a nutshell: Even when a stellar radial-velocity signal is consistent with no radial-velocity trends (no companions), The Joker doesn't permit or admit many solutions that are extremely long-period. We can't tell whether this is expected behavior, and we are just not smart enough to expect it correctly, or whether this is unexpected behavior because our code has a bug. Hilarious! And sad, in a way. Math is hard. And inference is hard.
One of my projects this Fall (with Soledad Villar) is to show that large classes of machine-learning methods used in astronomy are susceptible to adversarial attacks, while others are not. This relates to things like the over-fitting, generalizability, and interpretability of the different kinds of methods. Now what would constitute a good adversarial example for astronomy? One would be classification of galaxy images into elliptical and spiral, say. But I don't actually think that is a very good use of machine learning in astronomy! A better use of machine learning is converting stellar spectra into temperatures, surface gravities, and chemical abundances.
If we work in this domain, we have two challenges. The first is to re-write the concept of an adversarial attack in terms of a regression (most of the literature is about classification). And the second is to define large families of directions in the data space that are not possibly of physical importance, so that we have some kind of algorithmic definition of adversarial. The issue is: Most of these attacks in machine-learning depend on a very heuristic idea of what's what: The authors look at the images and say “yikes”. But we want to find these attacks more-or-less algorithmically. I have ideas (like capitalizing on either the bandwidth of the spectrograph or else the continuum parts of the spectra), but I'd like to have more of a theory for this.
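Here is a minimal sketch of what a "continuum-only" adversarial perturbation might look like for a toy linear spectral regression. The weight vector, the line mask, and the step size are all placeholders; a real attack would use the trained model's input gradient, which for a linear model is just the weight vector itself.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pix = 200
line_mask = np.zeros(n_pix, dtype=bool)
line_mask[::10] = True                # toy: every 10th pixel is an absorption line

# stand-in for a trained regression: predicted temperature = w @ spectrum
w = rng.normal(0.0, 1.0, n_pix)

def attack_continuum(spec, w, mask, eps=0.01):
    """FGSM-style step confined to continuum (non-line) pixels.

    For a linear model the input gradient is just w, so we step along
    sign(w) only where the mask says there is no physical information.
    """
    delta = np.where(mask, 0.0, eps * np.sign(w))
    return spec + delta

spec = np.ones(n_pix) + rng.normal(0.0, 0.001, n_pix)   # toy continuum-ish spectrum
adv = attack_continuum(spec, w, line_mask)
label_shift = w @ adv - w @ spec      # induced change in the predicted label
```

The perturbation is tiny per pixel and never touches a line, yet the predicted label moves substantially; that is the algorithmic notion of "adversarial" we are after.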
The self-calibration idea is extremely powerful. There are many ways to describe it, but one is that you can exploit your beliefs about causal structure to work out which trends in your data are real, and which are spurious from, say, calibration issues. For example, if you know that there is a set of stars that don't vary much over time, the differences you see in their magnitudes on repeat observations probably have more to do with throughput variations in your system than real changes to the stars. And your confidence is even greater if you can see the variation correlate with airmass! This was the basis of the photometric calibration (that I helped design and build) of the Sloan Digital Sky Survey imaging, and similar arguments have underpinned self-calibrations of cosmic microwave background data, radio-telescope atmospheric phase shifts, and Kepler light curves, among many other things.
The idea I worked on today relates to stellar abundance measurements. When we measure stars, we want to determine absolute abundances (or abundances relative to the Sun, say). We want these abundances to be consistent across stars, even when those stars have atmospheres at very different temperatures and surface gravities. Up to now, most calibration has been at the level of checking that clusters (particularly open clusters) show consistent abundances across the color–magnitude diagram. But we know that the abundance distribution in the Galaxy ought to depend strongly on actions, weakly on angles, and essentially not at all (with some interesting exceptions) on stellar temperature, nor surface gravity, nor which instrument or fiber took the spectrum. So we are all set to do a self-calibration! I wrote a few words about that today, in preparation for an attempt.
Mattias Samland (MPIA), as part of his PhD dissertation, adapted the CPM model we built to calibrate (and image-difference) Kepler and TESS imaging to operate on direct imaging of exoplanets. The idea is that the direct imaging is taken over time, and speckles move around. They move around continuously and coherently, so a data-driven model can capture them, and distinguish them from a planet signal. (The word "causal" is the C in CPM, because it is about the differences between how systematics and real signals present themselves in the data.) There is lots of work in this area (including my own), but it tends to make use of the spatial (and wavelength) rather than temporal coherence. The CPM is all about time. It turns out this works extremely well; Samland's adaptation of CPM looks like it outperforms spatial methods, especially at small “working angles” (near the nulled star; this is coronography!).
But of course a model that uses the temporal coherence but ignores the spatial and wavelength coherence of the speckles cannot be the best model! There is coherence in all four directions (time, two angles, and wavelength) and so a really good speckle model must be possible. That's a great thing to work on in the next few years, especially with the growing importance of coronographs at ground-based and space-based observatories, now and in the future. Samland and I discussed all this, and specifics of the paper he is nearly ready to submit.
I'm very proud of the things we have done over the years with our project called The Cannon, in which we learn a generative model of stellar spectra from stellar labels, all data driven, and then use that generative model to label other stellar spectra. This system has been successful, in part because it is robust against certain kinds of over-fitting: it is formulated as a regression from labels to data (and not the other way around). However, The Cannon has some big drawbacks. One is that (in its current form) the function space is hard-coded to be polynomial, which is both too flexible and not flexible enough, depending on context. Another is that the spectral representation is the pixel basis, which is just about the worst possible representation, given spectra of stars filled with known absorption lines at fixed resolution. And another is that the model might need latent freedoms that go beyond the known labels, either because the labels have issues (are noisy) or some are missing or they are incomplete (the full set of labels isn't sufficient to predict the full spectrum).
This summer we have discussed projects to address all three of these issues. Today I worked down one direction of this with Adam Wheeler (Columbia): The idea is to build a purely linear version of The Cannon but where each star is modeled using a generative model built just on its near neighbors. So you get the simplicity and tractability of a linear model but the flexibility of non-parametrics. But we are also thinking about operating in a regime in which we have no labels! Can we measure abundance differences between stars without ever knowing the absolute abundances? I feel like it might be possible if we structure the model correctly. We discussed looking at Eu and Ba lines in APOGEE spectra as a start; outliers in Eu or Ba are potentially very interesting astrophysically.
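A toy version of the neighbor-based linear idea (one label, fake spectra; nothing here is APOGEE-specific or our actual implementation) might look like this: fit a linear generative model on each target's nearest neighbors, then invert it for the target's label.

```python
import numpy as np

def local_linear_label(spec, train_specs, train_labels, k=20):
    """Infer a label for `spec` with a linear generative model fit only
    on its k nearest neighbors in spectrum space (toy sketch).

    Model per neighborhood: spectrum ~ a + b * label (one label here).
    """
    d = np.linalg.norm(train_specs - spec, axis=1)
    idx = np.argsort(d)[:k]
    S, ell = train_specs[idx], train_labels[idx]
    A = np.column_stack([np.ones(k), ell])
    # fit the generative model: each pixel regressed on the label
    coef, *_ = np.linalg.lstsq(A, S, rcond=None)   # shape (2, n_pix)
    a, b = coef
    # invert: best-fit label for the target spectrum
    return float(b @ (spec - a) / (b @ b))

rng = np.random.default_rng(3)
n_train, n_pix = 300, 50
labels = rng.uniform(4000.0, 6000.0, n_train)           # toy "temperature"
basis = rng.normal(0.0, 1.0, n_pix)
specs = 1.0 + np.outer((labels - 5000.0) / 1000.0, basis)  # toy linear spectra
specs += rng.normal(0.0, 0.01, specs.shape)

target_label = 5200.0
target = 1.0 + (target_label - 5000.0) / 1000.0 * basis
est = local_linear_label(target, specs, labels)
```

Because the model is refit per neighborhood, the global relation between labels and spectra can be arbitrarily nonlinear while each fit stays linear and tractable.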
Today (and really over the last few days as well) I had a long discussion with Ana Bonaca (Harvard) about the results of our spectroscopy in the GD-1 stellar-stream fields. As my loyal reader knows, Bonaca, Price-Whelan, and I have a prediction for what the radial velocities should look like in the stream, if it is a cold stream that has been hit by a massive perturber. Our new velocity measurements (with the Hectochelle instrument) are not the biggest and best possible confirmation of that prediction!
However, our velocities are not inconsistent with our predictions either. The question is: What to say in our paper about them? We listed the top conclusions of the spectroscopy, and also discussed the set of figures that would bolster and explain those conclusions. Now to plotting and writing.
Along the way to understanding these conclusions, I think Bonaca has found a systematic issue (at extremely fine radial-velocity precision) in the way that the Hectochelle instrument measures radial velocities. I hope we are right, because if we are, the GD-1 stream might become very cold, and our velocity constraints on any perturbation will become very strong. But we will follow up with the Hectochelle team next week. It's pretty subtle.
Today I was finally back up at MPIA. I spent a good fraction of the day talking with Doug Finkbeiner (Harvard), Josh Speagle (Harvard) and others about probabilistic catalogs. Both Finkbeiner's group and my own have produced probabilistic catalogs. But these are not usually a good idea! The problem is that they communicate (generally) posterior information and not likelihood information. It is related to the point that you can't sample a likelihood! The big idea is that knowledge is transmitted by likelihood, not posterior. A posterior contains your beliefs and your likelihood. If I want to update my beliefs using your catalog, I need your likelihood, and I don't want to take on your prior (your beliefs) too.
This sounds very ethereal, but it isn't: The math just doesn't work out if you get a posterior catalog and want to do science with it. You might think you can save yourself by dividing out the prior but (a) that isn't always easy to do, and (b) it puts amazingly strong constraints on the density of your samplings; unachievable in most real scientific contexts. These are potentially huge problems for LSST and future Gaia data releases. Right now (in DR2, anyway) Gaia is doing exactly the correct thing, in my opinion.
My enforced week off work has been awesome for writing code. I deepened my knowledge and interest in the Google (tm) Colaboratory (tm) by writing a notebook (available here) that constructs fake stars in a fake galaxy and observes them noisily in a fake spectroscopic survey. This is in preparation for measuring the selection function and doing inference to determine the properties of the whole galaxy from observations of the selected, noisily observed stars. This in turn relates to the paper on selection functions and survey design that I am writing with Rix (MPIA); it could be our concrete example.
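The notebook's basic move (simulate stars, observe them noisily, censor them) can be sketched in a few lines; all the numbers here are invented stand-ins for the notebook's choices, but even this toy shows how a magnitude cut biases the selected sample.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20000
M = rng.normal(4.0, 1.0, n)                     # absolute magnitudes of fake stars
d = rng.uniform(0.1, 2.0, n)                    # distances (kpc)
m_true = M + 5.0 * np.log10(d * 1000.0 / 10.0)  # apparent magnitudes
m_obs = m_true + rng.normal(0.0, 0.02, n)       # noisy photometry

selected = m_obs < 14.0         # a hard magnitude cut: the survey selection function

# the censored sample is biased toward intrinsically bright stars
mean_M_all = M.mean()
mean_M_selected = M[selected].mean()
```

Inferring the properties of the whole fake galaxy from only the selected stars is then the inference problem the selection-function paper is about.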
Today Doug Finkbeiner (Harvard), Josh Speagle (Harvard), and Ana Bonaca (Harvard) came to visit me in my undisclosed location in Heidelberg. We discussed many different things, including Finkbeiner's recent work on finding outliers and calibration issues in the LAMOST spectral data using a data-driven model, and Speagle's catalog of millions of stellar properties and distances in PanSTARRS+Gaia+2MASS+WISE.
Bonaca and I took that latter catalog and looked at new ways to visualize it. We both have the intuition that good visualization could and will pay off in these large surveys. Both in terms of finding structures and features, and giving us intuition about how to build automated systems that will then look for structures and features. And besides, excellent visualizations are productive in other senses too, like for use in talks and presentations. I spent much of my day coloring stars by location in phase space or the local density in phase space, or both. And playing with the color maps!
There's a big visualization literature for these kinds of problems. Next step is to try to dig into that.
When I am stuck in a quiet attic room, doing nothing but writing, I tend to go off the rails! This has happened in my paper with Rix about target selection for catalogs and surveys: It is supposed to be about survey design and now it has many pages about the likelihood function. It's a mess. Is it two papers? Or is it a different paper?
I resolved (for now, to my current satisfaction) my issues from a few days ago, about likelihoods for catalogs. I showed that the likelihood that I advocate does not give biased inferences, and does permit inference of the selection function (censoring process) along with the inference of the world. I did this with my first ever use of the Google (tm) Colaboratory (tm). I wanted to see if it works, and it does. My notebook is here (subject to editing and changing, so no promises about its state when you go there). If your model includes the censoring process—that is, if you want to parameterize and learn the catalog censoring along with the model of the world—then (contra Loredo, 2004) you have to use a likelihood function that depends on the selection function at the individual-source level. And I think it is justified, because it is the assumption that the universe plus the censoring is the thing which is generating your catalog. That's a reasonable position to take.
I'm stuck in bed with a bad back. I have been for a few days now. I am using the time to write in my summer writing projects, and talk to students and postdocs by Skype (tm). But it is hard to work when out sick, and it isn't necessarily a good idea. I'm not advocating it!
I worked more on my selection-function paper with Rix. I continued to struggle with understanding the controversy (between Loredo on one hand and various collaborations of my own on the other) about the likelihood function for a catalog. In my view, if you take a variable-rate Poisson process, and then censor it, where the censoring depends only on the individual properties of the individual objects being censored, you get a new variable-rate Poisson process with just a different rate function. If I am right, then there is at least one way of thinking about things such that the likelihood functions in the Bovy et al and Foreman-Mackey et al papers are correct. My day ended with a very valuable phone discussion of this with Foreman-Mackey. He (and I) would like to understand what is the difference in assumptions between us and Loredo.
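The claim can be checked numerically: thin a variable-rate Poisson process with a per-object detection probability S(x) and you should get a Poisson process with rate lam(x) S(x). Here is a toy simulation (the rate and selection functions are invented) that checks the mean count and the Poisson-like dispersion of the censored process.

```python
import numpy as np

rng = np.random.default_rng(11)
lam_max = 200.0
lam = lambda x: lam_max * (0.2 + 0.8 * x)    # variable rate on [0, 1]
S = lambda x: np.exp(-2.0 * x)               # per-object detection probability

n_rep = 500
detected_counts = np.empty(n_rep)
for i in range(n_rep):
    n0 = rng.poisson(lam_max)                            # dominating homogeneous process
    x = rng.uniform(0.0, 1.0, n0)
    x = x[rng.uniform(0.0, 1.0, n0) < lam(x) / lam_max]  # realize the rate lam(x)
    detected = rng.uniform(0.0, 1.0, len(x)) < S(x)      # independent censoring
    detected_counts[i] = detected.sum()

# expected count of the censored process: the integral of lam(x) * S(x)
xg = np.linspace(0.0, 1.0, 100001)
expected = float(np.mean(lam(xg) * S(xg)))               # grid approximation, about 41
```

If the detected counts match the integral of lam times S, with variance equal to the mean, the censored process really is Poisson with the product rate, which is the premise behind the Bovy et al and Foreman-Mackey et al likelihoods.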
I also worked today with Soledad Villar (NYU) to develop capstone projects for the masters program in the Center for Data Science. The Masters students do research projects, and we have lots of ideas about mashing up deep learning and astrophysics.
For a number of projects, my group has been trying to compare point sets to point sets, to determine transformations. Some contexts have been calibration (like photometric and astrometric calibration of images, where stars need to align, either on the sky or in magnitude space) and others have been in dynamics. Right now Suroor Gandhi (NYU), Adrian Price-Whelan (Flatiron), and I have been trying to find transformations that align phase-space structures (and especially the Snail) observed in different tracers: What transformation between tracers matches the phase-space structure? These projects are going by our code name MySpace.
Projects like these tend to have a pathology, however, related to a pathology that Robyn Sanderson (Flatiron) and I found in a different context in phase space: If you write down a naive objective for matching two point clouds, the optimal match often has one point cloud shrunk down to zero size and put on top of the densest location on the other point cloud! Indeed, Gandhi is finding this, so we decided (today) to try symmetrizing the objective function to stop it. That is, don't just compare points A to points B, but also symmetrically compare points B to points A. Then (I hope) neither set can shrink to zero usefully. I hope this works! Now to make a symmetric objective function...
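Here is a toy demonstration of the pathology and of the symmetric fix, using Gaussian kernel density estimates (the clouds, the bandwidth, and the "shrink" transformation are all made up): under the one-way objective a collapsed cloud scores better than an honest one, but the symmetrized objective punishes the collapse.

```python
import numpy as np

def log_kde(points, centers, h=0.3):
    """Mean log-density of `points` under a Gaussian KDE built on `centers`.

    Normalization constants shared by all calls are dropped; only
    comparisons between values computed with the same h are meaningful.
    """
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float(np.mean(np.log(np.exp(-d2 / (2.0 * h * h)).mean(axis=1) + 1e-300)))

rng = np.random.default_rng(5)
B = rng.normal(0.0, 1.0, (200, 2))
A = rng.normal(0.0, 1.0, (200, 2))
A_shrunk = 0.01 * A        # pathological "match": A collapsed onto B's densest spot

one_way_good = log_kde(A, B)          # honest A, evaluated under the KDE of B
one_way_shrunk = log_kde(A_shrunk, B)
sym_good = log_kde(A, B) + log_kde(B, A)
sym_shrunk = log_kde(A_shrunk, B) + log_kde(B, A_shrunk)
```

The second term of the symmetric objective is what saves us: a collapsed cloud assigns terrible density to the spread-out cloud, so shrinking stops being profitable.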
I spent my research time today writing in a paper Rix (MPIA) and I are preparing about selecting sources for a catalog or target selection. The fundamental story is that you need to make a likelihood function at the end of the day. And this, in turn, means that you need a tractable and relatively accurate selection function. This all took me down old paths I have traveled with Bovy (Toronto) and Foreman-Mackey (Flatiron).
In email correspondence, Foreman-Mackey reminded me of past correspondence with Loredo (Cornell), who disagrees with our work on these things for very technical reasons. His (very nice) explanation of his point is around equations (8) through (10) in this paper: It has to do with how to factorize a probability distribution for a collection of objects obtained in a censored, variable-rate Poisson process. But our historical view of this (and my restored view after a day of struggling) is that the form of the likelihood depends on fine details of how you believe the objects of study were selected for the catalog, or censored. If they were censored only by your detector, I think Loredo's form is correct. But if they were censored for physical reasons over which you have no dominion (for example a planet transit obscured by a tiny fluctuation in a star's brightness), the selection can come in to the likelihood function differently. That is, it depends on the causal chain involved in the source censoring.
[I have been on travel of various kinds, mostly non-work, for almost two weeks, hence no posts!]
While on my travels, I wrote in my project about target selection for spectroscopic surveys (with Rix) and my project about information theory and extreme-precision radial-velocity measurement (with Bedell). I also discovered this nice paper on Cepheid stars in the disk, which is a highly relevant position-space complement to what Eilers and I have been doing in velocity space.
On the weekend and today, Eilers (MPIA), Rix (MPIA), and I started to build a true ansatz for an m = 2 spiral in the Milky Way disk, in both density and velocity. The idea is to compute the model as a perturbation away from an equilibrium model, and not self-consistent (because the stars we are using as tracers don't dominate the density of the spiral perturbation). This caused us to write down a whole bunch of functions and derivatives and start to plug them into the first-order expansion away from the steady-state equilibrium of an exponential disk (the Schwarzschild distribution, apparently). We don't have an ansatz yet that permits us to solve the equations, but it feels very very close. The idea behind this project is to use the velocity structure we see in the disk to infer the amplitude (at least) of the spiral density structure, and then compare to what's expected in (say) simulations or theory. Why not just observe the amplitude directly? Because that's harder, given selection effects (like dust).
I gave the Königstuhl Colloquium in Heidelberg today. I spoke about the (incredibly boring) subject of selecting targets for spectroscopic follow-up. The main point of my talk is that you want to select targets so that you can include the selection function in your inferences simply. That is, include it in your likelihood function, tractably. This puts actually extremely strong constraints on what you can and cannot do, and many surveys and projects have made mistakes with this (I think). I certainly have made a lot of mistakes, as I admitted in the talk. Hans-Walter Rix (MPIA) and I are trying to write a paper about this. The talk video is here (warning: I haven't looked at it yet!).
I had an inspiring conversation with Sara Rezaei Kh. (Gothenburg) today, about next-generation dust-mapping projects. As my loyal reader knows, I want to map the dust in 3d, and then 4d (radial velocity too) and then 6d (yeah) and even higher-d (because there will be temperature and size-distribution variations with position and velocity). She has some nice new data, where she has her own 3d dust map results along lines of sight that also have molecular gas emission line measurements. If it is true that dust traces molecular gas (even approximately) and if the 3-d dust map is good, then it should be possible to paint velocity onto dust with this combined data. My proposal is: Find the nonlinear function of radial position that is the mean radial velocity such that both line-of-sight maps are explained by the same dust in 4d. I don't know if it will work, but we were able to come up with some straw-man possible data sets for which it would obviously work. Exciting project.
[After I posted this, Josh Peek (STScI) sent me an email to note that these ideas are similar to things he has been doing with Tchernyshyov and Zasowski to put velocities onto dust clouds. Absolutely! And I love that work. That email (from Peek) inspires me to write something here that I thought was obvious, but apparently isn't: This blog is about my research. Mine! It is not intended to be a comprehensive literature review, or a statement of priority, or a proposal for future work. It is about what I am doing and talking about now. If anything that I mention in this blog has been done before, I will be citing that prior work if I ever complete a relevant project! Most ideas on this blog never get done, and when they do get done, they get done in responsible publications (and if you don't think they are responsible, email me, or comment here). This blog itself is not that responsible publication. It contains almost no references and it does not develop the full history of any idea. And, in particular, in this case, the ideas that Rezaei Kh. and I discussed this day (above) were indeed strongly informed by things that Peek and Tchernyshyov and Zasowski have done previously. I didn't cite them because I don't cite everything relevant when I blog. If full citations are required for blogging, I will stop blogging.]