I spent the last two days at the National Society of Black Physicists meeting in Providence, RI. It was a great meeting, with a solid mix of traditional physics, strategizing about the state of the profession, and offline conversations about politics and the many communities of physicists. Many great things happened. Here are some random highlights: I learned from Bryen Irving (Stanford) that harder neutron-star equations of state lead to larger tidal effects on binary inspiral. After all, a harder equation of state means a larger radius, and a larger radius means more tidal distortion of the surface equipotential. Deep! I very much enjoyed a comment by Richard Anantua (Harvard) about “the importance of late-time effects on one's career”. He was talking about the point that there are combinatorially many ways to get from point A to point B in your career, and it is your current state that matters most. Beautiful! There was an excellent talk by Joseph Riboudo (Providence College) that was simultaneously about how to influence the community with a Decadal-survey white paper and about primarily undergraduate institutions and how we should be serving them as a community. He was filled with wisdom! And learning. Eileen Gonzalez (CUNY) showed her nice results on understanding incredibly cool (and yes, I mean low-temperature) star binaries. She is finding that data-driven atmospheric retrieval methods plus clouds work better than grids of ab initio models. That's important for the JWST era. And I absolutely loved off-session chatting with Dara Norman (NOAO) and others. Norman is filled with conspiracy theories and I have to tell you something: They are all True. Norman also deserves my thanks for organizing much of the astrophysics content at the meeting. It was a great couple of days.
Grace Telford (Rutgers) showed up in NYC today and we discussed the inference of star-formation histories from observations of resolved stellar populations. We discussed the point that the parameter space is high dimensional (because, say, the star-formation history is modeled as a set of 30-ish star-formation rates in bins), which leads to two problems. The first is that a maximum-likelihood or maximum-a-posteriori setting of the SFH will be atypical (in high dimensions, optima are atypical relative to one-sigma-ish parameter settings). The second is that the results are generally extremely prior-dependent, and the priors are usually made up by investigators, not chosen in any attempt to represent their actual beliefs. We talked about ways to mitigate these issues.
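The atypicality point is easy to see numerically. Here is a minimal sketch, assuming (purely for illustration) that the posterior over ~30 SFH bins is an isotropic unit Gaussian, so that the optimum sits at the origin. The function name `typicality` is mine. Typical draws sit at a distance of about the square root of the dimension from the optimum, and in 30 dimensions essentially none land within one sigma of it.

```python
import numpy as np

def typicality(dim, n_samples=100_000, seed=0):
    """Draw from an isotropic unit Gaussian in `dim` dimensions (a stand-in
    posterior whose optimum is the origin) and report how far samples sit from it."""
    rng = np.random.default_rng(seed)
    samples = rng.standard_normal((n_samples, dim))
    radius = np.linalg.norm(samples, axis=1)   # distance from the optimum
    median_radius = np.median(radius)
    frac_near_optimum = np.mean(radius < 1.0)  # fraction within "one sigma" of it
    return median_radius, frac_near_optimum

for dim in (1, 3, 30):
    r, frac = typicality(dim)
    print(f"D={dim:2d}: median distance {r:.2f}, fraction within 1 of optimum {frac:.4f}")
```

The optimum has the highest density of any point, but in 30 dimensions it is not a representative draw from the posterior.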
As my loyal reader knows, I am working with Lily Zhao (Yale) to calibrate the EXPRES spectrograph. Our approach is non-parametric: We can beat any polynomial calibration with an interpolation (we are using splines, but one could also use a Gaussian Process or any other method, I think). The funniest thing happened today, which surprised me, but shouldn't have! When Zhao plotted a histogram of the differences between our predicted line locations (from our interpolation) and the observed line locations (for lines held out of the interpolation), they were always redshifted! There was a systematic bias everywhere. We did all sorts of experiments but could find no bug. What gives? And then we had a realization which is pretty much Duh:
If you are doing linear interpolation (and we were at this point), and if your function is monotonically varying, and if your function's first derivative is also monotonically varying, the linear interpolator will always be biased to the same side (every chord between neighboring knots lies on the same side of a curve that always bends the same way)! Hahaha. We switched to a cubic spline and everything went unbiased.
In detail, of course, interpolation will always be biased. After all, it does not represent your beliefs about how the data are generated, and it certainly does not represent the truth about how your data were generated. So it is always biased. It's just that once we go to a cubic spline, that bias is way below our precision and accuracy (under cross-validation). At least for now.
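Here is a minimal sketch of the effect, using a made-up smooth function (an exponential, which is monotonic with a monotonic first derivative) in place of the real wavelength solution, and `numpy.interp` and SciPy's `CubicSpline` as the two interpolators. Residuals at held-out midpoints are all one sign for linear interpolation; the cubic-spline residuals are still nonzero, just much smaller.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def f(x):
    """A hypothetical calibration-like function: monotonic, with a monotonic first derivative."""
    return np.exp(0.3 * x)

x_knots = np.linspace(0.0, 10.0, 11)   # "comb lines" used to build the interpolator
x_held = x_knots[:-1] + 0.5            # held-out points halfway between knots

lin_err = np.interp(x_held, x_knots, f(x_knots)) - f(x_held)
cub_err = CubicSpline(x_knots, f(x_knots))(x_held) - f(x_held)

print("linear residuals all same sign:", np.all(lin_err > 0))
print("max |linear error|:", np.abs(lin_err).max())
print("max |cubic error| :", np.abs(cub_err).max())
```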
I had a meeting with Emily Cunningham (Flatiron) to discuss any projects of mutual interest. She has been looking at simulations of the Milky Way (toy simulations) in which the LMC and SMC fall in. In these simulations the Milky Way gets tidally distorted by the infall, and various observational consequences follow. For example, the disk ends up having a different mean velocity than the halo! And for another, different parts of the halo move relative to one another, in the mean. Cunningham's past work has been on the velocity variance; now it looks like she has a project on the velocity mean! The predictions are coming from toy simulations (from the Arizona group) but I'm interested in the more general question of what can be learned from spatial variations in the mean velocity in the halo. It might put strong constraints on the recent-past time-dependence.
Oh what a great day! Not a lot of research got done; NSF proposals, letters of recommendation, and all that. But in the afternoon, undergraduate researcher Abby Shaum (NYU) and I looked at her project to do frequency demodulation on asteroseismic modes to find orbital companions, and we found one. Our target is a hot star that has a few very strong asteroseismic modes (around 14 cycles per day in frequency), and our demodulator is actually a phase demodulator (not frequency) but it's so beautiful:
The idea of the demodulator is that you mix (multiply) the signal (which, in this case, is bandpass-filtered NASA Kepler photometric data) with a complex sinusoid at (as precisely as you can set it) the asteroseismic carrier frequency. Then you Gaussian-smooth the real and imaginary parts of that product over some window timescale (the inverse bandwidth, if you will). The resulting, extremely tiny phase variations (yes, these stars are coherent over years) have a periodogram that shows periodicity at around 9 days, which is exactly the binary period we expected to find (from prior work).
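A minimal sketch of this demodulator on synthetic data follows. Everything numerical here is an assumption for illustration: a single mode at exactly 14 cycles per day, a 9-day light-travel-time modulation of about 86 seconds, white noise, a cadence close to Kepler's long cadence, and no bandpass filter (with one mode, the Gaussian smoothing after mixing does the filtering). The function name `phase_demodulate` is hypothetical, not from the actual project code.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def phase_demodulate(t, flux, f0, window_days):
    """Mix the light curve with a complex sinusoid at carrier frequency f0,
    Gaussian-smooth the real and imaginary parts, and return the phase.

    t: evenly spaced times in days; f0: carrier frequency in cycles/day;
    window_days: Gaussian smoothing scale (roughly the inverse bandwidth).
    """
    dt = t[1] - t[0]
    mixed = flux * np.exp(-2j * np.pi * f0 * t)
    sigma = window_days / dt  # smoothing scale in samples
    smoothed = (gaussian_filter1d(mixed.real, sigma)
                + 1j * gaussian_filter1d(mixed.imag, sigma))
    phase = np.unwrap(np.angle(smoothed))
    return phase - phase.mean()

# Synthetic test data (all numbers hypothetical): a 14 c/d mode whose phase
# is modulated by light-travel time in a 9-day orbit.
rng = np.random.default_rng(0)
dt, n = 0.02, 18000                      # ~29-minute cadence, 360 days
t = dt * np.arange(n)
f0, p_orb, tau_amp = 14.0, 9.0, 1e-3     # tau_amp in days (about 86 s)
tau = tau_amp * np.sin(2 * np.pi * t / p_orb)
flux = np.sin(2 * np.pi * f0 * (t - tau)) + 0.1 * rng.standard_normal(n)

phase = phase_demodulate(t, flux, f0, window_days=0.5)
power = np.abs(np.fft.rfft(phase)) ** 2
freqs = np.fft.rfftfreq(n, dt)
best_period = 1.0 / freqs[1:][np.argmax(power[1:])]   # skip the zero-frequency bin
print(f"strongest phase-modulation period: {best_period:.2f} days")
```

With real data the carrier frequency is only known approximately, and a small error shows up as a linear drift in the recovered phase, which is one reason to unwrap it.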
I'm stoked! The advantages of our method over previous work are: Our method can easily combine information from many modes. Our method can be tuned to any modes that are in any data. We did not have to bin the light curve; we only had to choose an effective bandwidth. The disadvantages are: We don't have a probabilistic model! We just have a procedure. But it's so simple and beautiful. I'm feeling like the engineer I was born to be.
It was a great research day today. I worked with Lily Zhao (Yale) on the wavelength calibration of the EXPRES spectrograph, which my loyal reader knows is a project of Debra Fischer (Yale). Lily and I cleaned up and sped up (by a lot) the polynomial fitting that the EXPRES team is doing, and showed (with a kind of cross-validation) that the best polynomial order for the fit is in the range 8 to 9. This is for a high-resolution, laser-frequency-comb-calibrated, temperature-controlled, bench-mounted, dual-fiber spectrograph.
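For readers who want the cross-validation step spelled out, here is a generic K-fold sketch for choosing polynomial order. The data are a hypothetical smooth stand-in for comb-line positions along an order, not EXPRES data, so the degree it picks says nothing about the 8-to-9 result above.

```python
import numpy as np

def cv_polynomial_order(x, y, degrees, n_folds=5, seed=0):
    """K-fold cross-validation of a polynomial fit y(x); returns the degree
    with the smallest mean squared held-out residual, plus all scores."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), n_folds)
    scores = {}
    for deg in degrees:
        sq_err = 0.0
        for held in folds:
            train = np.setdiff1d(np.arange(len(x)), held)
            # Polynomial.fit rescales x to [-1, 1] internally, which keeps
            # high-order fits numerically well conditioned.
            poly = np.polynomial.Polynomial.fit(x[train], y[train], deg)
            sq_err += np.sum((poly(x[held]) - y[held]) ** 2)
        scores[deg] = sq_err / len(x)
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical stand-in for comb-line positions along one echelle order.
rng = np.random.default_rng(1)
pixel = np.linspace(0.0, 4000.0, 300)
line_pos = np.sin(pixel / 700.0) + 1e-3 * rng.standard_normal(pixel.size)
best, scores = cv_polynomial_order(pixel, line_pos, degrees=range(1, 16))
print("best degree:", best)
```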
But then we threw out that polynomial fit and just worked on interpolating the laser frequency-comb line positions. These are fixed in true wavelength and dense on the detector (for many orders, anyway). Oh my goodness did it work! When we switched from polynomial fitting to interpolation, the cross-validation tests got much better, and the residuals went from being very structured and repeatable to looking like white noise. When we averaged solutions, we got very good results, and when we did a PCA of the differences away from the mean solution, it looked like the variations are dominated by a single variability dimension! So it looks like we are going to end up with a very very low-dimensional, data-driven, non-parametric calibration system that hierarchically pools information from all the calibration data to calibrate every single exposure. I couldn't be more stoked!
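The PCA-of-differences step can be sketched in a few lines with an SVD. The calibration solutions here are synthetic and assumed to vary along a single hypothetical pattern plus noise; the sketch just shows how that kind of variability appears as one dominant singular value.

```python
import numpy as np

rng = np.random.default_rng(2)
n_exp, n_pix = 30, 500
pix = np.arange(n_pix)

# Hypothetical calibration solutions: a mean solution, plus one shared
# variability pattern with a different amplitude in each exposure, plus noise.
mean_solution = 5000.0 + 0.05 * pix
mode = np.sin(pix / 80.0)
amplitudes = 1e-3 * rng.standard_normal(n_exp)
solutions = (mean_solution + amplitudes[:, None] * mode
             + 1e-5 * rng.standard_normal((n_exp, n_pix)))

# PCA of the differences away from the mean solution, via SVD.
residuals = solutions - solutions.mean(axis=0)
_, svals, components = np.linalg.svd(residuals, full_matrices=False)
explained = svals**2 / np.sum(svals**2)
print("variance fraction in first three components:", np.round(explained[:3], 4))
```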
A no-research day (Thursdays are always bad) was ended on a great note with a Colloquium by Ian Dobbs-Dixon (NYUAD), who spoke about the atmospheres of hot-jupiter-like exoplanets. He has a great set of equipment that connects the global climate model built for Earth climate modeling with lots of planet-relevant physics (like strong, anisotropic insolation and internal heat flows) to figure out what must be happening on these planets. He showed some nice predictions and also some nice explanations of the observed property (yes observed property) that these planets do not have their hottest point at the sub-stellar point. It's so exciting when we think forward to what might be possible with NASA JWST.
My main research contribution today was to write some notes for myself and Lily Zhao (Yale) about how we might start to produce a low-dimensional, hierarchical, non-parametric calibration model for the EXPRES spectrograph.
At the end of a long faculty meeting at NYU Physics, my colleague Shura Grosberg came to me to discuss a subject we have been discussing at a low rate for many months: How is it possible that my watch (my wristwatch) is powered purely by stochastic motions of my arm, when thermal ratchets are impossible? He presented to me a very simple model, in which my watch is seen as a set of three coupled systems. One is the winder, which is a low-Q oscillator that works at long periods. The next is the escapement and spring, which is a high-Q oscillator that has a period of 0.2 seconds. The next is the thermal bath of noise to which the watch dissipates energy. If my arm delivers power only on long periods (or mainly on long periods), then it only couples well to the first of these. And then power can flow to the other two systems. Ah, I love physicists!
As my loyal reader knows, I love the Brown-Bag talks at the Center for Cosmology and Particle Physics. Today was a great example: Hongwan Liu (NYU) talked about milli-charged dark matter. Putting a charge in the dark sector is a little risky, because the whole point of dark matter is that it is invisible, electromagnetically! But it turns out that if you include enough particle complexity in the dark sector, you can milli-charge the dark matter and move thermal energy from the light sector into the dark sector and vice versa.
Liu was motivated by some issues with 21-cm intensity mapping, but he has some very general ideas and results in his work. I was impressed by the point that his work involves the heat capacity of the dark sector. That's an observable, in principle! And it depends on the particle mass, because (at fixed mass density) a dark sector with smaller particle mass has more particles and therefore more degrees of freedom and more heat capacity! It's interesting to think about the possible consequences of this. Can we rule out very small masses somehow?
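To make the mass dependence explicit, here is the simplest version, assuming (my assumption, not necessarily the setup in Liu's work) a non-relativistic, classical ideal gas of dark particles of mass $m_\chi$ at fixed mass density $\rho_\chi$:

$$
n_\chi = \frac{\rho_\chi}{m_\chi}, \qquad
c_V = \frac{3}{2}\, n_\chi\, k_B = \frac{3\, \rho_\chi\, k_B}{2\, m_\chi}
$$

Halving the particle mass doubles the number density and hence the heat capacity per unit volume.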
Continuing on stuff I got distracted into yesterday (when I should be working on NSF proposals!) I did some work on phase manipulation to interpolate between images. This was: Fourier transform both images, and interpolate in amplitude and phase independently, rather than just interpolate the complex numbers in a vector sense. It works in some respects and not in others. And it works much better on a localized image patch than in a whole image. I made this tweet to demonstrate. This is related to the idea that people who do this professionally use wavelet-like methods to get local phase information in the image instead of manipulating global phase. So the trivial thing doesn't work; I need to learn more!
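A minimal sketch of the trivial global version described here, using NumPy FFTs: Fourier amplitudes are blended linearly, and phases are blended along the shorter arc of the wrapped phase difference. The function names are hypothetical. The comparison function shows why plain vector interpolation of the complex coefficients is uninteresting: by linearity of the FFT it is just a cross-fade in pixel space.

```python
import numpy as np

def fourier_morph(img_a, img_b, alpha):
    """Interpolate two same-shape real images by blending Fourier amplitudes
    linearly and Fourier phases along the shorter arc, then inverting."""
    fa, fb = np.fft.fft2(img_a), np.fft.fft2(img_b)
    amplitude = (1.0 - alpha) * np.abs(fa) + alpha * np.abs(fb)
    dphase = np.angle(fb * np.conj(fa))      # phase difference, wrapped to (-pi, pi]
    phase = np.angle(fa) + alpha * dphase
    # Hermitian symmetry (needed for a real image) can break where the wrapped
    # phase difference sits at exactly +/- pi; taking the real part discards
    # the resulting small imaginary residue.
    return np.real(np.fft.ifft2(amplitude * np.exp(1j * phase)))

def vector_morph(img_a, img_b, alpha):
    """For comparison: interpolating the complex coefficients as vectors is
    just a pixel-space cross-fade, because the FFT is linear."""
    return (1.0 - alpha) * img_a + alpha * img_b

rng = np.random.default_rng(3)
a, b = rng.random((32, 32)), rng.random((32, 32))
halfway = fourier_morph(a, b, 0.5)
```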
Nora Shipp (Chicago) has been in town this week, working with Adrian Price-Whelan to find halo substructures and stellar streams around the Milky Way. The two of them made beautiful animations, paging through distance slices, showing halo stellar density (as measured by a color-magnitude matched filter). There are lots of things visible in those animations! We discussed the point that what makes overdensities appear to the human eye is their coherence through slices.
That made me think of things that Bill Freeman (MIT) and his lab do with amplifying small signals in video: Should we be looking for small overdensities with similar tricks? Freeman's lab uses phase transforms (like Fourier transforms and more localized versions of those) to detect and amplify small motions. Maybe we should use phase transforms here too. That led Price-Whelan and me to hack a little bit on this image pair by Judy Schmidt, which was fun but useless!
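As a toy illustration of the phase idea (a 1-D, global-Fourier-phase version; Freeman's group works with localized, multi-scale phase representations, which this is not), a sub-pixel shift between two frames appears as a linear ramp in Fourier phase, and multiplying that phase difference by a factor amplifies the shift by the same factor. The names and numbers here are hypothetical.

```python
import numpy as np

def amplify_shift(frame0, frame1, alpha):
    """Toy 1-D, global-phase version of phase-based motion magnification:
    multiply the per-frequency Fourier phase difference between two frames
    by alpha and apply it to the first frame."""
    f0, f1 = np.fft.fft(frame0), np.fft.fft(frame1)
    dphase = np.angle(f1 * np.conj(f0))      # a pure shift gives a linear phase ramp
    return np.real(np.fft.ifft(f0 * np.exp(1j * alpha * dphase)))

x = np.arange(256.0)
center, width, shift, alpha = 128.0, 8.0, 0.5, 10.0
frame0 = np.exp(-0.5 * ((x - center) / width) ** 2)
frame1 = np.exp(-0.5 * ((x - center - shift) / width) ** 2)   # sub-pixel motion

amplified = amplify_shift(frame0, frame1, alpha)
centroid = np.sum(x * amplified) / np.sum(amplified)
print(f"centroid moved by {centroid - center:.3f} pixels (input shift {shift})")
```

The global version only works cleanly here because there is a single moving object; with many structures moving differently you need local phase, which is the point made above about localized transforms.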
Late in the day, Megan Bedell (Flatiron), Lily Zhao (Yale), Debra Fischer (Yale), and I all met to discuss EXPRES data. It turns out that what the EXPRES team has in terms of data, and what they need in terms of technology, is incredibly well aligned with what Bedell and I want to do in the EPRV space. For example, EXPRES has been used to resolve the asteroseismic p-modes in a star. For another, it has made excellent observations of a spotty star. For another, it has a calibration program that wants to go hierarchical. I left work at the end of the day extremely excited about the opportunities here.
Today Josh Ruderman (NYU) gave a great Physics Colloquium, about particle physics phenomenology, from measuring important standard-model parameters with colliders to finding new particles in cosmology experiments. It was very wide-ranging and filled with nice insights about (among other things) thermal-relic dark matter and intuitions about (among other things) observability of different kinds of dark-sector activity. One theme of the dark-matter talks I have seen recently is that most sensible, zeroth-order bounds (like on mass and cross section for a thermal-relic WIMP) can be modified by slightly complexifying the problem (like by adding a dark photon or another dark state). Ruderman navigated a bunch of that for us nicely, and convinced us that there is lots to do in particle theory, even if the LHC remains in a standard-model desert.
Our LSST broker discussions from yesterday continued at the Cosmology X Machine Learning group meeting at Flatiron. The group helped us think a little bit about the supervised and unsupervised options in the time-domain space.
My day ended with a long conversation with Sjoert van Velzen (NYU), Tyler Pritchard (NYU), and Maryam Modjaz (NYU), about possible things we could be doing in the LSST time-domain and broker space. Our general interest is in finding interesting and unusual and outlier events that are interesting either because they are unprecedented, or because they are unusual within some subclass, or because they imply odd physical parameters or strange conditions. But we don't have much beyond that! We need to get serious in the next few months because there will be proposal calls.
As my loyal reader knows, I have opinions about spectroscopic extraction—the inference of the one-dimensional spectrum of an object as a function of wavelength, given the two-dimensional image of the spectrum in the spectrograph detector plane. The EXPRES team (I happen to know) and others have the issue with their spectrographs that the cross-dispersion direction (the direction precisely orthogonal to the wavelength direction) is not always perfectly aligned with the y direction on the detector. This is a problem because if it is aligned, there are very simple extraction methods available.
I spent parts of the day writing down not the general solution to this problem (which might possibly be Bolton & Schlegel's Spectro-Perfectionism, although I have issues with that too), but rather an expansion around the perfectly aligned case that leads to an iterative solution while preserving the solutions that work at perfect alignment. It's so beautiful! As expansions usually are.
What to call this? I am building on Zechmeister et al's “flat-relative optimal extraction”. But I'm allowing tilts. So Froet? Is that a rude word in some language?
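For reference, here is a minimal sketch of the perfectly aligned case that the expansion is built around, in the spirit of flat-relative extraction: in each detector column, fit the science data as a scalar multiple of the flat-field data. Assumptions (mine): the trace runs exactly along rows, the flat is noise-free, and per-pixel variances are known. This is not the tilted-case expansion, and not Zechmeister et al's code; the function name is hypothetical.

```python
import numpy as np

def flat_relative_extract(science, flat, variance):
    """Aligned case: trace along rows, wavelength along columns.

    In each column, model the science data as s * flat and fit the scalar s by
    weighted least squares. The result is the spectrum relative to the flat,
    so neither a separate flat-field division nor a separate profile model
    is required.
    """
    w = 1.0 / variance
    num = np.sum(w * flat * science, axis=0)
    den = np.sum(w * flat**2, axis=0)
    return num / den

# Hypothetical aligned spectrum image: rows = cross-dispersion, cols = wavelength.
rng = np.random.default_rng(4)
ny, nx = 15, 100
rows, cols = np.arange(ny)[:, None], np.arange(nx)
profile = np.exp(-0.5 * ((rows - 7.0) / 1.5) ** 2)       # cross-dispersion profile
blaze = 1.0 + 0.5 * np.cos(2 * np.pi * cols / nx)        # shared with the flat
true_ratio = 1.0 - 0.3 * np.exp(-0.5 * ((cols - 50.0) / 3.0) ** 2)  # an absorption line

flat = profile * blaze                                   # assumed noise-free here
sigma = 1e-3
science = flat * true_ratio + sigma * rng.standard_normal((ny, nx))
variance = np.full((ny, nx), sigma**2)

ratio = flat_relative_extract(science, flat, variance)
print("max |error|:", np.abs(ratio - true_ratio).max())
```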
Marla Geha (Yale) crashed Flatiron today and we spent some time talking about a nice problem in spectroscopic data analysis: Imagine that you have a pipeline that works on each spectrum (or each exposure or each plate or whatever) separately, but that the same star has been observed multiple times. How do you post-process your individual-exposure results so that you get combined results that are the same as you would have if you had processed them all simultaneously? You want the calibration to be independent for each exposure, but the stellar template to be the same, for example. This is very related to the questions that Adrian Price-Whelan (Flatiron) and I have been solving in the last few weeks. You have to carry forward enough marginalized likelihood information to combine later. This involves marginalizing out the individual-exposure parameters but not the shared parameters. (And maybe making some additional approximations!)
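Here is a minimal sketch of the idea in a linear-Gaussian toy problem (all names and numbers hypothetical): each exposure has its own calibration offset with a Gaussian prior, plus a shared amplitude on a known template. Marginalizing each exposure's offset leaves a Gaussian likelihood for the shared parameter that is summarized by two numbers, and combining those summaries reproduces the simultaneous fit exactly. Real pipelines with non-linear parameters would need the additional approximations mentioned above.

```python
import numpy as np

rng = np.random.default_rng(5)
n_exp, n_pix = 5, 20
sigma, s_cal, theta_true = 0.1, 1.0, 2.0

template = rng.standard_normal(n_pix)            # known shape; theta scales it (shared)
offsets = s_cal * rng.standard_normal(n_exp)     # per-exposure calibration offsets
data = (theta_true * template + offsets[:, None]
        + sigma * rng.standard_normal((n_exp, n_pix)))

# Per exposure: marginalize the offset under its N(0, s_cal^2) prior. The data
# covariance becomes sigma^2 I + s_cal^2 (1 1^T), and the exposure's marginal
# likelihood for theta is Gaussian, summarized by two numbers.
ones = np.ones(n_pix)
cov = sigma**2 * np.eye(n_pix) + s_cal**2 * np.outer(ones, ones)
cinv_t = np.linalg.solve(cov, template)
info = data @ cinv_t                 # one "information" number per exposure
precision = template @ cinv_t        # the same for every exposure here

# Post-processing step: combine the carried-forward summaries.
theta_combined = info.sum() / (n_exp * precision)

# Check against one simultaneous fit of theta and all offsets, with the
# offset prior included as extra (pseudo-)data rows.
design = np.zeros((n_exp * n_pix + n_exp, 1 + n_exp))
rhs = np.zeros(n_exp * n_pix + n_exp)
for i in range(n_exp):
    rows = slice(i * n_pix, (i + 1) * n_pix)
    design[rows, 0] = template / sigma
    design[rows, 1 + i] = 1.0 / sigma
    rhs[rows] = data[i] / sigma
    design[n_exp * n_pix + i, 1 + i] = 1.0 / s_cal   # prior row: offset ~ N(0, s_cal^2)
params, *_ = np.linalg.lstsq(design, rhs, rcond=None)
print(theta_combined, params[0])
```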
As is not uncommon on a Friday, Astronomical Data Group meeting was great! So many things. One highlight for me was that Lily Zhao (Yale) has diagnosed—and figured out strategies related to—problems we had in wobble with the learning rate on our gradient descent. I hate optimization! But I love it when very good people diagnose and fix the problems in our optimization code!
Thursdays are low research days. I did almost nothing reportable here according to The Rules. I did have a valuable conversation with Price-Whelan (Flatiron) about marginalized likelihoods, and I started to get an intuition about why our factorization of Gaussian products has the form that it has. It has to do with the fact that the marginalized likelihood (the probability of the data, fully marginalizing out all linear parameters) has a variance for the data that is the sum of the noise variance and the model variance (equivalently, the noise and model uncertainties add in quadrature). Ish!
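In the simplest linear case, the statement is the standard result below, assuming data $y = A x + \text{noise}$, noise covariance $C$, and a Gaussian prior $\mathcal{N}(\mu, \Lambda)$ on the linear parameters $x$:

$$
p(y) = \int \mathcal{N}(y \mid A x,\, C)\, \mathcal{N}(x \mid \mu,\, \Lambda)\, \mathrm{d}x
     = \mathcal{N}\!\left(y \mid A\mu,\; C + A \Lambda A^{\mathsf{T}}\right)
$$

Here $C$ is the noise variance and $A \Lambda A^{\mathsf{T}}$ is the variance the model induces in the data; the variances add, so the corresponding uncertainties add in quadrature.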
I had an amusing email from out of the blue, asking me to dig up the IDL (yes, IDL) code that I (and Blanton and Bovy and Johnston and Roweis and others) wrote to analyze the local velocity field using the ESA Hipparcos data. Being a huge supporter of open science, I had to say yes to this request. I dug through old cvs repositories (not svn, not git, but cvs) and found the code, and moved it to GitHub (tm) here. I didn't truly convert the cvs repo to git, so I erased history, which is bad. But time is precious, and I could always fix that later. I hereby apologize to my co-authors!
All this illustrates to me that it is very good to put your code out in the open. One reason is that then you don't have to go digging like this; a simple Google search would have found it! Another is that when you know your code will be out in the open, you are (at least slightly) more likely to make it readable and usable by others. I dug up and threw to the world this code, but will anyone other than the authors ever be able to make any use of it? Or even understand it? I don't know.
I had my weekly call with Ana Bonaca (Harvard) this morning, where she updated me on our look at systematic effects in the radial-velocity measurements we are getting out of Hectochelle. We see very small velocity shifts in stellar radial velocities across the field of view that seem unlikely to be real features of the astrophysical stellar systems we are observing. At this point, Bonaca can show that these velocity shifts do not appear in the sky lines; that is, the calibration (with arc lamps) of the wavelengths on the detector is good.
All I have left at this point is that maybe the stars illuminate the fibers differently from the sky (and arc lamps) and this difference in illumination is transmitted to the spectrograph. I know how to test that, but it requires observing time; we can't do it in the data we have in hand right now. This is an important thing for me to figure out though, because it is related to how we commission and calibrate the fiber robot for SDSS-V. Next question: Will anyone give us observing time to check this?
Today was almost all admin and teaching. But I did get to the Astronomical Data Group meeting at Flatiron, where we had good discussions of representation learning, light curves generated by spotted stars, the population of planets around slightly evolved stars, and accreted stellar systems in the Milky Way halo!
I got in a bit of research in a mostly-teaching day. I saw the CDS Math-and-Data seminar, which was by Peyman Milanfar (Google) about de-noising models. In particular, he was talking about some of the theory and ideas behind the de-noising that Google uses in its Pixel cameras and related technology. They use methods that are adaptive to the image itself but which don't explicitly learn a library of image priors or patch priors or anything like that from data. (But they do train the models on human reactions to the denoising.)
Milanfar's theoretical results were nice. For example: De-noising is like a gradient step in response to a loss function! That's either obvious or deep. I'll go with deep. And good images (non-noisy natural images) should be fixed points of the de-noising projection (which is in general non-linear). Their methods identify similar parts of the images and use commonality of those parts to inform the nonlinear projections. But he explained all this with very simple notation, which was nice.
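A toy linear version of both claims, assuming a quadratic roughness loss E(y) = 0.5 * y^T L y with L a periodic 1-D graph Laplacian (this is not the adaptive, nonlinear filtering Milanfar described): one smoothing step is exactly a gradient step on E, and constant signals, which have zero roughness, are fixed points. The function names are hypothetical.

```python
import numpy as np

def laplacian(y):
    """Periodic 1-D graph Laplacian: (L y)_i = 2 y_i - y_{i-1} - y_{i+1}."""
    return 2.0 * y - np.roll(y, 1) - np.roll(y, -1)

def roughness(y):
    """Quadratic loss E(y) = 0.5 * y^T L y (half the summed squared neighbor differences)."""
    return 0.5 * np.dot(y, laplacian(y))

def denoise(y, step=0.25):
    """One gradient-descent step on E: y - step * grad E(y), with grad E = L y."""
    return y - step * laplacian(y)

rng = np.random.default_rng(6)
noisy = np.sin(np.linspace(0, 2 * np.pi, 200, endpoint=False)) + 0.3 * rng.standard_normal(200)
flat_signal = np.full(200, 3.0)

print("roughness before/after:", roughness(noisy), roughness(denoise(noisy)))
print("constant signal is a fixed point:", np.allclose(denoise(flat_signal), flat_signal))
```

The step size is chosen below 2 divided by the largest eigenvalue of L (which is 4 here), so each step cannot increase the roughness.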
After the talk I had a quick conversation with Jonathan Niles-Weed (NYU) about the geometry of the space of natural images. Here's a great argument he gave: Imagine you have two arbitrarily different images, like one of the Death Star (tm) and one of the inside of the seminar room. Are these images connected to one another in the natural-image subspace of image space? That is, is there a continuous transformation from one to the other, every point along which is itself a good natural image?
Well, if I can imagine a continuous tracking shot (movie) of me walking out of the seminar room and into a spaceship and then out of the airlock on a space walk to repair the Death Star (tm), and if every frame in that movie is a good natural image, and everything is continuous, then yes! What a crazy argument. The space of all natural images might be one continuously connected blob. Crazy! I love the way mathematicians think.
So many things. I love Wednesdays. Here's one: I spent a lot of the day working with Adrian Price-Whelan (Flatiron) on our issues with The Joker. We found some simple test cases, we made a toy version that has good properties, we compared to the code. Maybe we found a sign error!? But all this is in service of a conceptual data-analysis project I want to think about much more: What can you say about signals with periodicity (or structure) on time scales far, far longer than the baseline of your observations? Think long-period companions in RV surveys or Gaia data. Or the periods of planets that transit only once in your data set. Or month-long asteroseismic modes in a giant star observed for only a week. I think it would be worth getting some results here (and I am thinking information theory) because I think there will be some interesting scalings (for example, lots of things might have precisions that improve faster than the square root of the time baseline).
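One familiar example of a precision that improves faster than the square root of the baseline (though in the many-cycles regime, not the long-period regime asked about here): for a sinusoid sampled at fixed cadence, the Cramér-Rao bound on frequency scales as the baseline to the minus three-halves power. A sketch with an assumed white-noise model and hypothetical parameter values:

```python
import numpy as np

def frequency_uncertainty(baseline, cadence=0.01, freq=1.0, amp=1.0, phase=0.3, noise=1.0):
    """Cramér-Rao bound on the frequency of y = amp*sin(2 pi f t + phase) + white
    noise, sampled at fixed cadence over `baseline`, marginalizing amp and phase."""
    t = np.arange(0.0, baseline, cadence)
    psi = 2 * np.pi * freq * t + phase
    # Columns: derivatives of the model with respect to (amp, phase, freq).
    jac = np.column_stack([np.sin(psi),
                           amp * np.cos(psi),
                           amp * 2 * np.pi * t * np.cos(psi)])
    fisher = jac.T @ jac / noise**2
    cov = np.linalg.inv(fisher)
    return np.sqrt(cov[2, 2])

ratio = frequency_uncertainty(10.0) / frequency_uncertainty(20.0)
print(f"doubling the baseline shrinks sigma_f by {ratio:.2f} (sqrt(2) = 1.41, 2**1.5 = 2.83)")
```

One factor of the square root of the baseline comes from having more data points; the extra factor of the baseline comes from the lever arm, since a frequency error accumulates phase linearly in time.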
In Stars & Exoplanets meeting at Flatiron, many cool things happened! But a highlight for me was a discovery (reported by Saurabh Jha of Rutgers) that the bluest type Ia supernovae are more standardizeable (is that a word?) candles than the redder ones. He asked us how to combine the information from all supernovae with maximum efficiency. I know how to do that! We opened a thread on that. I hope it pays off.
Today Kristina Hayhurst (NYU) came to my office and, with a little documentation-hacking, we figured out how to read and plot ESA Planck data or maps released in the Planck archive! I am excited, because there is so much to look at in these data. Hayhurst's project is to look at the “Van Gogh” plot of the polarization: Can we do this better?
In the CCPP Brown-Bag seminar today, Neal Weiner (NYU) spoke about the possible connections between the dark sector (where dark matter lives) and our sector (where the standard model lives). He discussed the WIMP miracle, and then where we might look in phenomenology space for the particle interactions that put the WIMPs or related particles in equilibrium with the standard-model particles in the early Universe.
In the afternoon, I worked with Abby Shaum (NYU) and Kate Storey-Fisher (NYU) to get our AAS abstracts ready for submission for the AAS Winter Meeting in Honolulu.
Adrian Price-Whelan (Flatiron) and I spent time this past week trying to factorize products of Gaussians into new products of different Gaussians. The context is Bayesian inference, where you can factor the joint probability of the data and your parameters into a likelihood times a prior or else into an evidence (what we here call the FML) times a posterior. The factorization was causing us pain this week, but I finally got it this weekend, in the woods. The trick I used (since I didn't want to expand out enormous quadratics) was to use a determinant theorem to get part of the way, and some particularly informative terms in the quadratic expansion to get the rest of the way. Paper (or note or something) forthcoming...
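This is not the determinant-theorem derivation, but the factorization itself is easy to check numerically for a linear model with Gaussian noise and a Gaussian prior. A minimal sketch with random, hypothetical matrices, using `scipy.stats.multivariate_normal`; the identity holds for any x and y:

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

rng = np.random.default_rng(7)
n_data, n_par = 6, 2
A = rng.standard_normal((n_data, n_par))      # linear model y = A x + noise
C = np.diag(rng.uniform(0.5, 2.0, n_data))    # noise covariance
mu = rng.standard_normal(n_par)               # prior mean
Lam = np.diag(rng.uniform(0.5, 2.0, n_par))   # prior covariance
x = rng.standard_normal(n_par)                # any parameter value
y = rng.standard_normal(n_data)               # any data value

# Likelihood times prior ...
lhs = mvn.logpdf(y, A @ x, C) + mvn.logpdf(x, mu, Lam)

# ... equals evidence (FML) times posterior.
Cinv, Laminv = np.linalg.inv(C), np.linalg.inv(Lam)
S = np.linalg.inv(Laminv + A.T @ Cinv @ A)    # posterior covariance
S = 0.5 * (S + S.T)                           # symmetrize against round-off
m = S @ (Laminv @ mu + A.T @ Cinv @ y)        # posterior mean
evidence_cov = C + A @ Lam @ A.T
evidence_cov = 0.5 * (evidence_cov + evidence_cov.T)
rhs = mvn.logpdf(y, A @ mu, evidence_cov) + mvn.logpdf(x, m, S)
print(lhs, rhs)
```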
Megan Bedell (Flatiron) and I continued our work from earlier this week on making a mechanical model of stellar asteroseismic p-modes as damped harmonic oscillators driven by white noise. Because the model is so close to closed-form (it is closed form between kicks, and the kicks are regular and of random amplitude), the code is extremely fast. In a couple minutes we can simulate a realistic, multi-year, dense, space-based observing campaign with a full forest of asteroseismic modes.
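A minimal sketch of this kind of model: a damped oscillator whose free evolution between kicks is written in closed form, kicked in velocity at a regular interval by Gaussian random amounts. Function names and the solar-like numbers are hypothetical, not taken from our code.

```python
import numpy as np

def propagate(x0, v0, dt, omega0, gamma):
    """Exact free evolution over time dt of the underdamped oscillator
    x'' + 2*gamma*x' + omega0**2 * x = 0 (requires gamma < omega0)."""
    wd = np.sqrt(omega0**2 - gamma**2)   # damped angular frequency
    decay = np.exp(-gamma * dt)
    c, s = np.cos(wd * dt), np.sin(wd * dt)
    x = decay * (x0 * c + (v0 + gamma * x0) / wd * s)
    v = decay * (v0 * c - (omega0**2 * x0 + gamma * v0) / wd * s)
    return x, v

def simulate_mode(n_kicks, dt, omega0, gamma, kick_rms, seed=0):
    """Kick the velocity by a random amount every dt, evolve exactly between
    kicks, and record the displacement just before each kick."""
    rng = np.random.default_rng(seed)
    x, v = 0.0, 0.0
    xs = np.empty(n_kicks)
    for k in range(n_kicks):
        v += kick_rms * rng.standard_normal()   # delta-function kick
        x, v = propagate(x, v, dt, omega0, gamma)
        xs[k] = x
    return xs

# Hypothetical solar-like mode: 5-minute period, amplitude damping time of
# 3 days, a kick every 60 seconds (all times in seconds).
omega0 = 2 * np.pi / 300.0
gamma = 1.0 / (3 * 86400.0)
signal = simulate_mode(n_kicks=20000, dt=60.0, omega0=omega0, gamma=gamma, kick_rms=1.0)
```

Because the kicks are regular, the map from one kick to the next is a fixed 2x2 linear transformation, so a faster version would precompute that matrix once; a full forest of modes is a sum of independent copies with different frequencies and damping rates.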
The first thing we did with our model is check the results of the recent paper on p-mode mitigation by Chaplin et al, which suggests that you can obtain mitigation of p-mode noise in precision radial-velocity observation campaigns by a good choice of exposure time. We expected, at the outset, that the results of this paper are too optimistic: We expected that a fixed exposure time would not do a good job all the time, given the stochastic nature of the driving of the modes, and given that there are many modes in a frequency window around the strongest modes. But we were wrong and the Chaplin et al paper is correct! Which is good.
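The basic mechanism behind exposure-time tuning is a sinc filter: averaging a single coherent sinusoid over an exposure of length T multiplies its amplitude by |sin(πfT)/(πfT)|, which vanishes when T is a whole number of periods. A sketch with an assumed solar-like ν_max; note that this single-mode calculation is exactly what does not capture the stochastic, many-mode case, which is why we simulated.

```python
import numpy as np

def exposure_attenuation(freq, t_exp):
    """Amplitude factor for a sinusoid of frequency `freq` averaged over an
    exposure of length `t_exp`: |sin(pi f T) / (pi f T)|. np.sinc includes the pi."""
    return np.abs(np.sinc(freq * t_exp))

nu_max = 3.1e-3                                  # Hz, roughly the solar p-mode peak
t_exp = np.array([60.0, 1.0 / nu_max, 600.0])    # exposure lengths in seconds
print(exposure_attenuation(nu_max, t_exp))
```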
However, we believe that we can do better than exposure-time-tuning for p-mode mitigation. We believe that we can fit the p-modes with the (possibly non-stationary) integral of a stationary Gaussian process, tuned to the spectrum. That's our next job.
Our weekly Stars and Exoplanets Meeting at Flatiron was all about stellar rotation somehow this week (no we don't plan this!). Adrian Price-Whelan (Flatiron) showed that stellar rotations can get so large in young clusters that stars move off the main sequence and the main sequence can even look double. We learned (or I learned) that a significant fraction of young stars are born spinning very close to break-up. This I immediately thought was obviously wrong and then very quickly decided was obvious: It is likely if the last stages of stellar growth are from accretion. Funny how an astronomer can turn on a dime.
And in that same meeting, Jason Curtis (Columbia) brought us up to date on his work on stellar rotation and its use as a stellar clock. He showed that the usefulness is great (by comparing clusters of different ages); it looks incredible for at least the first Gyr or so of a star's lifetime. But the usefulness decreases at low masses (cool temperatures). Or maybe not, but the physics looks very different.
In the morning, before the meeting, Megan Bedell (Flatiron) and I built a mechanical model of an asteroseismic mode by literally making a code that produces a damped, driven harmonic oscillator, driven by random delta-function kicks. That was fun! And it seems to work.
The highlight of a low-research day was a great NYU Astro Seminar by Maria Okounkova (Flatiron) about testing or constraining extensions to general relativity using the LIGO detections of black hole binary inspirals. She is interested in terms in a general expansion that adds to Einstein's equations higher powers of curvature tensors and curvature scalars. One example is the Chern–Simons modification, which adds some anisotropy or parity-violation. She discussed many things, but the crowd got interested in the point that the Event Horizon Telescope image of the photon sphere (in principle) constrains the Chern–Simons terms! Because the modification distorts the photon sphere. Okounkova emphasized that the constraints on GR (from both gravitational radiation and imaging) get better as the black holes in question get smaller and closer. So keep going, LIGO!
I had a conversation with Ana Bonaca (Harvard) early today about the sky emission lines in sky fibers in Hectochelle. We are trying to understand if the sky is at a consistent velocity across the device. This is part of calibrating or really self-calibrating the spectrograph. It's confusing though, because the sky illuminates a fiber differently than the way that a star illuminates a fiber. So this test only tests some part of the system.
At the Brown-bag talk, Bob Johnson (Virginia) spoke about exo-moons and in particular exo-Ios. Yes, analogs of Jupiter's moon Io. The reason this is interesting is that Io interacts magnetically and volcanically with Jupiter, producing an extended distribution of volcanically produced ions in Jupiter's magnetic field. It is possible that transmission spectroscopy of hot Jupiters is being polluted by volcanic emissions of very hot moons! That would be so cool! Or hot?
My loyal reader knows that earlier this week I got interested in (read: annoyed with) the standard description of the optimal extraction method of obtaining one-dimensional spectra from two-dimensional spectrograph images, and started writing about it on a trip. On return to New York, Lily Zhao (Yale) listened patiently to my ranting and then pointed out this paper by Zechmeister et al on flat-relative extraction, which (in a much nicer way) makes all my points!
This is a classic example of getting scooped! But my feeling—on learning that I have been scooped—was of happiness, not sadness: I hadn't spent all that much time on it; the time I spent did help me understand things; and I am glad that the community has a better method. Also, it means I can concentrate on extracting, not on writing about extracting! So I found myself happy about learning that I was scooped. (One problem with not reading the literature very carefully is that I need to have people around who do read the literature!)
I had a quick pair-coding session with Anu Raghunathan (NYU) today to discuss the box least squares algorithm that is used so much in finding exoplanets. We are looking at the statistics of this algorithm, with the hope of understanding it in simple cases. It is such a simple algorithm that many of the things we want to know about uncertainty and false-positive rate can be determined in closed form, given a noise model for the data. But I'm interested in things like: How much more sensitive is a search when you know (in advance) the period of the planet? Or that you have a resonant chain of planets? These questions might also have closed-form answers, but I'm not confident of them, so we are making toy data.
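A minimal box least squares sketch in NumPy, for toy-data experiments like these (the function name, binning scheme, and injected planet are all hypothetical; for real work, astropy ships a `BoxLeastSquares` class). For each trial period it phase-folds, bins, and slides a box over phase, scoring each box by the reduction in squared residuals of a two-level box model relative to a constant.

```python
import numpy as np

def box_search(t, y, periods, duration, n_bins=200):
    """Minimal box least squares: for each trial period, phase-fold, bin, and
    slide a box of the given duration over phase. Returns, for each period,
    the best reduction in summed squared residuals (proportional to delta
    chi-squared for white noise), counting only dips."""
    y = y - y.mean()
    n_total = len(y)
    power = np.zeros(len(periods))
    for i, period in enumerate(periods):
        phase = (t % period) / period
        idx = np.minimum((phase * n_bins).astype(int), n_bins - 1)
        sums = np.bincount(idx, weights=y, minlength=n_bins)
        counts = np.bincount(idx, minlength=n_bins)
        k = max(1, int(round(duration / period * n_bins)))     # box width in bins
        # Window sums over k consecutive bins, wrapping around in phase.
        cs = np.concatenate([[0.0], np.cumsum(np.concatenate([sums, sums[:k]]))])
        cn = np.concatenate([[0], np.cumsum(np.concatenate([counts, counts[:k]]))])
        s_in = cs[k:k + n_bins] - cs[:n_bins]
        n_in = cn[k:k + n_bins] - cn[:n_bins]
        ok = (n_in > 0) & (n_in < n_total) & (s_in < 0)      # dips only
        if np.any(ok):
            delta = s_in[ok]**2 * n_total / (n_in[ok] * (n_total - n_in[ok]))
            power[i] = delta.max()
    return power

# Hypothetical light curve: 100 days, 0.02-day cadence, one injected planet.
rng = np.random.default_rng(8)
t = 0.02 * np.arange(5000)
p_true, dur, depth = 3.3, 0.12, 0.01
in_transit = ((t - 0.5) % p_true) < dur
y = 1.0 - depth * in_transit + 0.002 * rng.standard_normal(t.size)

periods = np.linspace(2.0, 5.0, 3001)
power = box_search(t, y, periods, dur)
best_period = periods[np.argmax(power)]
print(f"best period: {best_period:.3f} days")
```

Knowing the period in advance corresponds to evaluating a single entry of this search instead of taking a maximum over thousands, which is one place the look-elsewhere cost shows up.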
On the plane home, I wrote words about optimal extraction, the method for spectral analysis used in most extreme precision radial-velocity pipelines. My point is so simple and dumb, it barely needs to be written. But if people got it, it would simplify pipelines. The point is about flat-field and PSF: The way things are done now is very sensitive to these two things, which are not well known for rarely or barely illuminated pixels (think: far from the spectral traces).
Once home, I met up with a crew of data-science students at the Center for Data Science to discuss making adversarial attacks against machine-learning methods in astronomy. We talked about different kinds of machine-learning structures and how they might be sensitive to attack. And how methods might be made robust against attack, and what that would cost in training and predictive accuracy. This is a nice ball of subjects to think about! I have a funny fake-data example that I want to promote, but (to their credit) the students want to work with real data.
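For concreteness, here is the simplest attack on the simplest model: the fast gradient sign method (from Goodfellow and collaborators) applied to a logistic-regression classifier. The weights and input are hypothetical, not from any astronomy model; the point is that a small, structured perturbation can flip a prediction.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, label, w, b, eps):
    """Fast gradient sign method on a logistic-regression classifier:
    move x by eps in the sign of the gradient of the cross-entropy loss."""
    p = sigmoid(np.dot(w, x) + b)
    grad_x = (p - label) * w        # d(loss)/dx for the logistic loss
    return x + eps * np.sign(grad_x)

# Hypothetical "trained" classifier and one correctly classified example.
w, b = np.array([1.0, -2.0, 0.5]), 0.0
x = np.array([0.5, -0.2, 0.4])      # score w.x + b = 1.1, so class 1
x_adv = fgsm(x, label=1, w=w, b=b, eps=0.4)
print("clean score:", np.dot(w, x) + b, " adversarial score:", np.dot(w, x_adv) + b)
```

For a linear model the score moves by eps times the L1 norm of the weights, which is why many small coordinated changes can add up to a large effect.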
I achieved my goals for the Terra Hunting Experiment this week! After my work on the plane and the discussion we had yesterday, we (as a group) were able to draft a set of potentially sensible and valuable high-level goals for the survey. These are, roughly, maximizing the number of stars around which we have sensitivity to Earth-like planets, delivering statistically sound occurrence rate estimates, and delivering scientifically valuable products to the community. In that order! More about this soon. But I'm very pleased.
Another theme of the last two days is that most or maybe all EPRV experiments do many things slightly wrong. Like how they do their optimal extraction. Or how they propagate their simultaneous reference to the science data. Or how they correct the tellurics. None of these is a big mistake; they are all small mistakes. But precision requirements are high! Do these small mistakes add up to anything wrong or problematic at the end of the day? Unfortunately, it is expensive to find out.
Related: I discovered today that the fundamental paper on optimal extraction contains some conceptual mistakes. Stretch goal: Write a publishable correction on the plane home!
Today at the Terra Hunting Experiment Science Team meeting (in the beautiful offices of the Royal Astronomical Society in London) we discussed science-driven aspects of the project. There was way too much to report here, but I learned a huge amount in presentations by Annelies Mortier (Cambridge) and by Samantha Thompson (Cambridge) about the sources of astrophysical variability in stars that are (effectively) noise in the RV signals. In particular, they have developed aspects of a taxonomy of noise sources that could be used to organize our thinking about what's important to work on and what approaches to take. I got excited about working on mitigating these, which my loyal reader knows is the subject of my most recent NASA proposal.
Late in the day, I made my presentation about possible high-level goals for the survey and how we might flow decisions down from those goals. There was a very lively discussion of these. What surprised me (given the diversity of possible goals, from “find an Earth twin” to “determine the occurrence rate for rocky planets at one-year periods”) was that there was a kind of consensus: One part of the consensus was along the lines of maximizing our sensitivity where no other survey has ever been sensitive. Another part of the consensus was along the lines of being able to perform statistical analyses of our output.
I flew today to London for a meeting of the Terra Hunting Experiment science team. On the plane, I worked on a presentation that looks at the high-level goals of the survey and what survey-level and operational decisions will flow down from those goals. Like most projects, the project was designed to have a certain observing capacity (number of observing hours over a certain—long—period of time). But in my view, how you allocate that time should be based on (possibly reverse-engineered) high-level goals. I worked through a few possible goals and what they might mean for us. I'm hoping we will make some progress on this point this week.
Today was the third day of Gotham Fest, three Fridays in September in which all of astronomy in NYC meets all of astronomy in NYC. Today's installment was at NYU, and I learned a lot! But many four-minute talks just leave me wanting much, much more.
Before that, I met up with Adrian Price-Whelan (Flatiron) and Kathryn Johnston (Columbia) to discuss projects in the Milky Way disk with Gaia and chemical abundances (from APOGEE or other sources). We discussed the reality or usefulness of the idea that the vertical dynamics in the disk is separable from the radial and azimuthal dynamics, and how this might impact our projects. We'd like to do some one-dimensional problems, because they are tractable and easy to visualize. But not if they are ill-posed or totally wrong. We came up with some tests of the separability assumption and left it to Price-Whelan to execute.
At lunch, I discussed machine learning with Gabi Contardo (Flatiron). She has some nice results on finding outliers in data. We discussed how to make her project such that it could find outliers that no-one else could find by any other method.
With Suroor Gandhi (NYU) and Adrian Price-Whelan (Flatiron) we have been able to formulate (we think) some questions about unseen gravitational matter (dark matter and unmapped stars and gas) in the Milky Way into questions about transformations that map one set of points onto another set of points. How, you might ask? By thinking about dynamical processes that set up point distributions in phase space.
Being physicists, we figured that we can do this all ourselves! And being Bayesians, we reached for probabilistic methods. Like: Build a kernel density estimate on one set of points and maximize the likelihood given the other set of points and the transformation. That's great! But it has high computational complexity (every point in one set has to be compared to every point in the other), so it is slow. However, for our purposes we don't need this to be a likelihood, so we found out (through Soledad Villar, NYU) about optimal transport.
Despite its name, optimal transport is about solving problems of this type (find transformations that match point sets) with fast, good algorithms. The optimal-transport setting brings a clever objective function (that looks like earth-mover distance) and a high-performance tailored algorithm to match (that looks like linear programming). I don't understand any of this yet, but Math may have just saved our day. I hope I have said here recently how valuable it is to talk out problems with applied mathematicians!
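To make the comparison concrete, here is a sketch of both objectives on the simplest possible version of the problem: two hypothetical point clouds related by a pure translation. The kernel-density version assumes an isotropic Gaussian kernel with a fixed, made-up bandwidth; note the pairwise array, which is where the cost comes from. The transport version uses the special case of equal-size, equally weighted point sets, where the linear program is solved exactly by an assignment (SciPy's linear_sum_assignment). General optimal-transport problems (unequal weights, entropic regularization) need dedicated tools, so treat this as an illustration, not as what we actually run.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment, minimize
from scipy.spatial.distance import cdist
from scipy.special import logsumexp

def kde_log_likelihood(points_a, points_b, bandwidth):
    """Log-likelihood of points_b under a Gaussian KDE built on points_a.

    Every (a, b) pair enters, so the cost scales as len(points_a) * len(points_b)."""
    d = points_a.shape[1]
    sq = np.sum((points_b[:, None, :] - points_a[None, :, :]) ** 2, axis=-1) / bandwidth**2
    log_kernel = -0.5 * sq - d * np.log(bandwidth) - 0.5 * d * np.log(2.0 * np.pi)
    return np.sum(logsumexp(log_kernel, axis=1) - np.log(points_a.shape[0]))

def w2_squared(points_a, points_b):
    """Squared 2-Wasserstein distance between equal-size, equally weighted point sets.

    In this special case the transport linear program has a permutation among its
    optimal solutions, so an assignment solver gives the exact answer."""
    cost = cdist(points_a, points_b, metric="sqeuclidean")
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].mean()

rng = np.random.default_rng(1)
points_a = rng.normal(size=(300, 2))
points_b = rng.normal(size=(300, 2)) - np.array([2.0, -1.0])   # same population, offset

# Objective 1: KDE likelihood, maximized over a translation (hypothetical transformation family).
fit = minimize(lambda s: -kde_log_likelihood(points_a, points_b + s, 0.3),
               x0=np.zeros(2), method="Nelder-Mead")

# Objective 2: transport distance. For a translation and squared-Euclidean cost the
# optimal assignment does not depend on the shift, so the best shift is the
# difference of the means and the distance grows quadratically away from it.
mean_shift = points_a.mean(axis=0) - points_b.mean(axis=0)
d_zero = w2_squared(points_a, points_b)
d_best = w2_squared(points_a, points_b + mean_shift)
print(fit.x, mean_shift, d_zero - d_best, np.sum(mean_shift ** 2))
```

Both objectives should put the shift near (2, -1); the transport one gets there with a single assignment solve per evaluation instead of a density over all pairs.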
I got in some great research time late today working with Adrian Price-Whelan (Flatiron) to understand the morphology of the distribution of stars in APOGEE–Gaia in elements-energy space. The element abundances we are looking at are [Fe/H] and [alpha/Fe]. The energy we are looking at is vertical energy (as in something like the vertical action in the Milky Way disk). We are trying to execute our project called Chemical Tangents, in which we use the element abundances to find the orbit structure of the Galaxy. We have arguments that this will be more informative than doing Jeans models or other equilibrium models. But we want to demonstrate that this semester.
There are many issues! The issue we worked on today is how to model the abundance space. In principle we can construct a model that uses any statistics we like of the abundances. But we want to choose our form and parameterization with the distribution (and its dependence on energy of course) in mind. We ended our session leaning towards some kind of mixture model, where the dominant information will come from the mixture amplitudes. But going against all this is that we would like to be doing a project that is simple! When Price-Whelan and I get together, things tend to get a little baroque, if you know what I mean.
I spent my research time today writing notes on paper and then LaTeX in a document, making more specific plans for the projects we discussed yesterday with Zhao (Yale) and Bedell (Flatiron). Zhao also showed me issues with EXPRES wavelength calibration (at the small-fraction-of-a-pixel level). I opined that it might have to do with pixel-size issues. If this is true, then it should appear in the flat-field. We discussed how we might see it in the data.
Today I had a great conversation with Lily Zhao (Yale) and Megan Bedell (Flatiron) about Zhao's projects for the semester at Flatiron that she is starting this month. We have projects together in spectrograph calibration, radial-velocity measurement, and time-variability of stellar spectra. On that last part, we have various ideas about how to see the various kinds of variability we expect in the joint domain of wavelength and time. And since we have a data-driven model (wobble) for stellar spectra under the assumption that there is no time variability, we can look for the things we seek in the residuals (in the data space) away from that time-independent model. We talked about what might be the lowest hanging fruit and settled on p-mode oscillations, which induce radial-velocity variations but also brightness and temperature variations. I hope this works!
I spoke with Christina Eilers (MPIA) early yesterday about a possible self-calibration project, for stellar element abundance measurements. The idea is: We have noisy element-abundance measurements, and we think they may be contaminated by biases as a function of stellar brightness, temperature, surface gravity, dust extinction, and so on. That is, we don't think the abundance measurements are purely measurements of the relevant abundances. So we have formulated an approach to solve this problem in which we regress the abundances against things we think should predict abundances (like position in the Galaxy) and also against things we think should not predict abundances (like apparent magnitude). This should deliver the most precise maps of the abundance variations in the Galaxy but also deliver improved measurements, since we will know what spurious signals are contaminating the measurements. I wrote words in a LaTeX document about all this today, in preparation for launching a project.
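A minimal linear version of that regression, on fake data, looks like the following. Everything here is assumed: a linear radial abundance gradient, a linear magnitude-dependent bias, and Gaussian noise. The real problem will need more flexible functions and many more nuisance variables.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
radius = rng.uniform(4.0, 14.0, n)            # kpc; a variable that should predict abundance
mag = rng.uniform(8.0, 14.0, n)               # apparent magnitude; a variable that should not
true_feh = 0.3 - 0.06 * (radius - 8.0)        # hypothetical radial gradient
bias = 0.02 * (mag - 11.0)                    # hypothetical magnitude-dependent systematic
measured = true_feh + bias + 0.05 * rng.standard_normal(n)

# Regress on both kinds of predictor at once.
design = np.column_stack([np.ones(n), radius - 8.0, mag - 11.0])
(intercept, gradient, mag_trend), *_ = np.linalg.lstsq(design, measured, rcond=None)

# Remove only the part explained by the "should not predict" variable.
calibrated = measured - mag_trend * (mag - 11.0)
print(gradient, mag_trend, np.std(calibrated - true_feh))
```

The fitted magnitude coefficient is the estimate of the spurious signal, and subtracting it leaves the astrophysical gradient and scatter in place.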
Today I got in my first weekly meeting (of the new academic year) with Kate Storey-Fisher (NYU). We went through priorities and then spoke about the problem of performing some kind of comprehensive or complete search of the large-scale structure data for anomalies. One option (popular these days) is to train a machine-learning method to recognize what's ordinary and then ask it to classify non-ordinary structures as anomalies. This is a great idea! But it has the problem that, at the end of the day, you don't know how many hypotheses you have tested. If you find a few-sigma anomaly, that isn't surprising if you have looked in many thousands of possible “places”. It is surprising if you have only looked in a few. So I am looking for comprehensive approaches where we can pre-register an enumerated list of tests we are going to do, but to have that list of tests be exceedingly long (like machine-generated). This is turning out to be a hard problem.
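The bookkeeping behind the few-sigma argument can be made explicit. This sketch assumes independent one-sided Gaussian tests (real tests on the same large-scale-structure data will be correlated, which lowers the effective number of tests). It computes both the chance of at least one 3-sigma excursion and the per-test threshold that keeps the overall false-alarm rate at 3 sigma.

```python
import numpy as np
from scipy.stats import norm

def prob_any_exceeds(n_sigma, n_tests):
    """Chance that at least one of n_tests independent one-sided Gaussian tests exceeds n_sigma."""
    p_single = norm.sf(n_sigma)
    return -np.expm1(n_tests * np.log1p(-p_single))    # 1 - (1 - p)^n, computed stably

def per_test_threshold(global_sigma, n_tests):
    """Per-test threshold (in sigma) that holds the family-wise false-alarm rate at global_sigma."""
    p_global = norm.sf(global_sigma)
    p_single = -np.expm1(np.log1p(-p_global) / n_tests)
    return norm.isf(p_single)

for n in [3, 5000]:
    print(n, prob_any_exceeds(3.0, n), per_test_threshold(3.0, n))
```

With a few tests a 3-sigma excursion is rare; with thousands it is nearly guaranteed, and the per-test bar has to move to roughly 5 sigma. That is why knowing the length of the pre-registered list matters.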
The New York City physics and astronomy departments (and this includes at least Columbia, NYU, CUNY, AMNH, and Flatiron) run a set of three Friday events in which everyone (well, a large fraction of everyone) presents a brief talk about who they are and what they do. The first event was today.
I re-derived equation (11) in our paper on The Joker, in order to answer some of the questions I posed yesterday. I find that the paper does have a sign error, although I am pretty sure that the code (based on the paper) does not have a sign error. I also found that I could generalize the equation to apply to a wider range of cases, which makes me think that we should either write an updated paper or at least include the math, re-written, in our next paper (which will be on the SDSS-IV APOGEE2 DR16 data).
This morning, Adrian Price-Whelan proposed that we might have a sign error in equation (11) in our paper on The Joker. I think we do, on very general grounds. But we have to sit down and re-do some math to check it. This all came up in the context that we are surprised about some of the results of the orbit fitting that The Joker does. In a nutshell: Even when a stellar radial-velocity signal is consistent with no radial-velocity trends (no companions), The Joker doesn't permit or admit many solutions that are extremely long-period. We can't tell whether this is expected behavior, and we are just not smart enough to expect it correctly, or whether this is unexpected behavior because our code has a bug. Hilarious! And sad, in a way. Math is hard. And inference is hard.
One of my projects this Fall (with Soledad Villar) is to show that large classes of machine-learning methods used in astronomy are susceptible to adversarial attacks, while others are not. This relates to things like the over-fitting, generalizability, and interpretability of the different kinds of methods. Now what would constitute a good adversarial example for astronomy? One would be classification of galaxy images into elliptical and spiral, say. But I don't actually think that is a very good use of machine learning in astronomy! A better use of machine learning is converting stellar spectra into temperatures, surface gravities, and chemical abundances.
If we work in this domain, we have two challenges. The first is to re-write the concept of an adversarial attack in terms of a regression (most of the literature is about classification). And the second is to define large families of directions in the data space that cannot possibly be of physical importance, so that we have some kind of algorithmic definition of adversarial. The issue is: Most of these attacks in machine learning depend on a very heuristic idea of what's what: The authors look at the images and say “yikes”. But we want to find these attacks more-or-less algorithmically. I have ideas (like capitalizing on either the bandwidth of the spectrograph or else the continuum parts of the spectra), but I'd like to have more of a theory for this.
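For a linear regression the adversarial question has a closed-form answer, which might be a useful starting point for the regression version. This sketch assumes a ridge-regression stand-in for a trained model and hypothetical fake spectra with one absorption line. It defines the "not physically important" directions as continuum pixels far from the line (one of the ideas above, not a settled definition). For a nonlinear model, the gradient at the spectrum would play the role of the weight vector, locally.

```python
import numpy as np

def minimal_attack(weights, allowed, delta_y):
    """Smallest-norm perturbation, confined to `allowed` pixels, that shifts the
    prediction of a linear model y = weights . x + b by exactly delta_y."""
    w_allowed = np.where(allowed, weights, 0.0)
    norm2 = w_allowed @ w_allowed
    if norm2 == 0.0:
        raise ValueError("the model ignores the allowed pixels, so no such attack exists")
    return delta_y * w_allowed / norm2

rng = np.random.default_rng(4)
n_pix, n_train = 100, 60
wave = np.arange(n_pix)
line = np.exp(-0.5 * ((wave - 50) / 2.0) ** 2)              # one absorption line
labels = rng.uniform(-1.0, 1.0, n_train)                    # hypothetical scaled label
spectra = 1.0 - 0.3 * (1.0 + 0.5 * labels[:, None]) * line  # line depth tracks the label
spectra += 0.01 * rng.standard_normal(spectra.shape)

# Ridge regression from pixels to label: more pixels than training stars, so it over-fits.
mean_spectrum = spectra.mean(axis=0)
X = spectra - mean_spectrum
weights = np.linalg.solve(X.T @ X + 1e-3 * np.eye(n_pix), X.T @ (labels - labels.mean()))

def predict(x):
    return weights @ (x - mean_spectrum) + labels.mean()

continuum = np.abs(wave - 50) > 10                          # pixels far from the line
x0 = spectra[0]
delta = minimal_attack(weights, continuum, delta_y=1.0)
print(predict(x0 + delta) - predict(x0), np.abs(delta).max())
```

Comparing the size of that continuum-only perturbation to the noise level is one algorithmic way to say whether a model is "attackable".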
The self-calibration idea is extremely powerful. There are many ways to describe it, but one is that you can exploit your beliefs about causal structure to work out which trends in your data are real, and which are spurious from, say, calibration issues. For example, if you know that there is a set of stars that don't vary much over time, the differences you see in their magnitudes on repeat observations probably have more to do with throughput variations in your system than real changes to the stars. And your confidence is even greater if you can see the variation correlate with airmass! This was the basis of the photometric calibration (that I helped design and build) of the Sloan Digital Sky Survey imaging, and similar arguments have underpinned self-calibrations of cosmic microwave background data, radio-telescope atmospheric phase shifts, and Kepler light curves, among many other things.
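A toy version of this kind of photometric self-calibration: every observed magnitude is modeled as a star's magnitude plus a per-exposure zero-point, and both are fit together by linear least squares. The degenerate overall offset is fixed by making the zero-points average to zero. The airmasses, noise level, and observing pattern are hypothetical, and the real SDSS calibration model was considerably richer.

```python
import numpy as np

def self_calibrate(star_idx, exp_idx, mags, n_stars, n_exps):
    """Least-squares fit of mags = star_mag[star] + zeropoint[exposure].

    The overall offset is degenerate, so one extra row forces the zero-points to sum to zero."""
    n_obs = mags.size
    A = np.zeros((n_obs + 1, n_stars + n_exps))
    A[np.arange(n_obs), star_idx] = 1.0
    A[np.arange(n_obs), n_stars + exp_idx] = 1.0
    A[n_obs, n_stars:] = 1.0                      # gauge constraint
    b = np.concatenate([mags, [0.0]])
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    return params[:n_stars], params[n_stars:]

rng = np.random.default_rng(5)
n_stars, n_exps = 200, 30
airmass = rng.uniform(1.0, 2.0, n_exps)
true_zp = 0.15 * (airmass - airmass.mean())     # hypothetical extinction-like throughput loss
true_mag = rng.uniform(14.0, 18.0, n_stars)

# Each star is observed in a random subset of exposures.
observed = rng.random((n_stars, n_exps)) < 0.3
star_idx, exp_idx = np.nonzero(observed)
mags = true_mag[star_idx] + true_zp[exp_idx] + 0.01 * rng.standard_normal(star_idx.size)

star_mag, zp = self_calibrate(star_idx, exp_idx, mags, n_stars, n_exps)
print(np.corrcoef(zp, airmass)[0, 1])           # recovered zero-points should track airmass
```

No star's true magnitude is ever supplied; the zero-points come entirely from the assumption that the stars don't vary.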
The idea I worked on today relates to stellar abundance measurements. When we measure stars, we want to determine absolute abundances (or abundances relative to the Sun, say). We want these abundances to be consistent across stars, even when those stars have atmospheres at very different temperatures and surface gravities. Up to now, most calibration has been at the level of checking that clusters (particularly open clusters) show consistent abundances across the color–magnitude diagram. But we know that the abundance distribution in the Galaxy ought to depend strongly on actions, weakly on angles, and essentially not at all (with some interesting exceptions) on stellar temperature, nor surface gravity, nor which instrument or fiber took the spectrum. So we are all set to do a self-calibration! I wrote a few words about that today, in preparation for an attempt.
Mattias Samland (MPIA), as part of his PhD dissertation, adapted the CPM model we built to calibrate (and image-difference) Kepler and TESS imaging to operate on direct imaging of exoplanets. The idea is that the direct imaging is taken over time, and speckles move around. They move around continuously and coherently, so a data-driven model can capture them, and distinguish them from a planet signal. (The word “causal” is the C in CPM, because it is about the differences between how systematics and real signals present themselves in the data.) There is lots of work in this area (including my own), but it tends to make use of the spatial (and wavelength) rather than temporal coherence. The CPM is all about time. It turns out this works extremely well; Samland's adaptation of CPM looks like it outperforms spatial methods, especially at small “working angles” (near the nulled star; this is coronagraphy!).
But of course a model that uses the temporal coherence but ignores the spatial and wavelength coherence of the speckles cannot be the best model! There is coherence in all four directions (time, two angles, and wavelength) and so a really good speckle model must be possible. That's a great thing to work on in the next few years, especially with the growing importance of coronagraphs at ground-based and space-based observatories, now and in the future. Samland and I discussed all this, and specifics of the paper he is nearly ready to submit.
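The temporal core of a CPM-style fit can be sketched in a few lines: regress one pixel's time series on other pixels' time series (which share the systematics but not the signal), with the stretch of time being predicted held out of the training. This is a minimal illustration with ridge regression and simulated drifting systematics, not the actual CPM code or Samland's adaptation of it.

```python
import numpy as np

def cpm_predict(target, predictors, train_mask, lam=1.0):
    """Ridge-regress a target pixel's time series on other pixels' time series.

    Only times in train_mask are used for fitting; the model then predicts all times,
    so a signal confined to held-out times is not absorbed by the fit."""
    design = np.column_stack([np.ones(len(target)), predictors])
    X, y = design[train_mask], target[train_mask]
    coeffs = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    return design @ coeffs

rng = np.random.default_rng(6)
n_times, n_pix, n_modes = 400, 50, 3
t = np.arange(n_times)
modes = np.cumsum(rng.standard_normal((n_times, n_modes)), axis=0)   # drifting systematics
loadings = rng.standard_normal((n_modes, n_pix + 1))
pixels = modes @ loadings + 0.1 * rng.standard_normal((n_times, n_pix + 1))

in_window = np.abs(t - 200) < 10
target = pixels[:, 0] + np.where(in_window, 5.0, 0.0)   # hypothetical event in the target pixel only
predictors = pixels[:, 1:]                              # other pixels: same systematics, no event
train = np.abs(t - 200) > 30                            # hold out the stretch being predicted

residual = target - cpm_predict(target, predictors, train)
print(residual[in_window].mean(), residual[train].std())
```

A fuller speckle model would add the spatial and wavelength structure on top of this purely temporal regression.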
I'm very proud of the things we have done over the years with our project called The Cannon, in which we learn a generative model of stellar spectra from stellar labels, all data driven, and then use that generative model to label other stellar spectra. This system has been successful, and it is also robust against certain kinds of over-fitting, because it is formulated as a regression from labels to data (and not the other way around). However, The Cannon has some big drawbacks. One is that (in its current form) the function space is hard-coded to be polynomial, which is both too flexible and not flexible enough, depending on context. Another is that the spectral representation is the pixel basis, which is just about the worst possible representation, given spectra of stars filled with known absorption lines at fixed resolution. And another is that the model might need latent freedoms that go beyond the known labels, either because the labels have issues (are noisy) or some are missing or they are incomplete (the full set of labels isn't sufficient to predict the full spectrum).
This summer we have discussed projects to address all three of these issues. Today I worked down one direction of this with Adam Wheeler (Columbia): The idea is to build a purely linear version of The Cannon but where each star is modeled using a generative model built just on its near neighbors. So you get the simplicity and tractability of a linear model but the flexibility of non-parametrics. But we also are thinking about operating in a regime in which we have no labels! Can we measure abundance differences between stars without ever knowing the absolute abundances? I feel like it might be possible if we structure the model correctly. We discussed looking at Eu and Ba lines in APOGEE spectra as a start; outliers in Eu or Ba are potentially very interesting astrophysically.
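For concreteness, here is a bare-bones version of the two steps of the standard (global, polynomial) Cannon, under simplifying assumptions: a quadratic-in-labels model per pixel, no per-pixel intrinsic scatter, no per-pixel uncertainties, and fake spectra generated from the same model family. The labels are deliberately generic and hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares

def label_vector(labels):
    """Quadratic 'vectorizer': [1, l_k, l_k * l_m for k <= m], one row per star."""
    labels = np.atleast_2d(labels)
    n, k = labels.shape
    cols = [np.ones(n)] + [labels[:, i] for i in range(k)]
    cols += [labels[:, i] * labels[:, j] for i in range(k) for j in range(i, k)]
    return np.column_stack(cols)

def train_model(train_labels, train_flux):
    """Training step: one linear least-squares problem per pixel, solved all at once."""
    theta, *_ = np.linalg.lstsq(label_vector(train_labels), train_flux, rcond=None)
    return theta                                   # shape (n_terms, n_pixels)

def infer_labels(theta, flux, guess):
    """Test step: labels whose predicted spectrum best matches the observed one."""
    def residuals(lab):
        return (label_vector(lab) @ theta)[0] - flux
    return least_squares(residuals, guess).x

rng = np.random.default_rng(7)
n_train, n_pix = 300, 80
true_theta = rng.normal(scale=0.05, size=(6, n_pix))   # 1 + 2 linear + 3 quadratic terms
true_theta[0] += 1.0                                   # continuum-normalized flux near one
train_labels = rng.uniform(-1.0, 1.0, size=(n_train, 2))
train_flux = label_vector(train_labels) @ true_theta
train_flux += 0.002 * rng.standard_normal(train_flux.shape)

theta = train_model(train_labels, train_flux)
test_labels = np.array([0.3, -0.4])
test_flux = (label_vector(test_labels) @ true_theta)[0] + 0.002 * rng.standard_normal(n_pix)
print(infer_labels(theta, test_flux, guess=np.zeros(2)))
```

The hard-coded quadratic vectorizer is exactly the function-space limitation described above; a local variant would fit the training step only on each star's near neighbors.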
Today (and really over the last few days as well) I had a long discussion with Ana Bonaca (Harvard) about the results of our spectroscopy in the GD-1 stellar-stream fields. As my loyal reader knows, Bonaca, Price-Whelan, and I have a prediction for what the radial velocities should look like in the stream, if it is a cold stream that has been hit by a massive perturber. Our new velocity measurements (with the Hectochelle instrument) are not the biggest and best possible confirmation of that prediction!
However, our velocities are not inconsistent with our predictions either. The question is: What to say in our paper about them? We listed the top conclusions of the spectroscopy, and also discussed the set of figures that would bolster and explain those conclusions. Now to plotting and writing.
Along the way to understanding these conclusions, I think Bonaca has found a systematic issue (at extremely fine radial-velocity precision) in the way that the Hectochelle instrument measures radial velocities. I hope we are right, because if we are, the GD-1 stream might become very cold, and our velocity constraints on any perturbation will become very strong. But we will follow up with the Hectochelle team next week. It's pretty subtle.
Today I was finally back up at MPIA. I spent a good fraction of the day talking with Doug Finkbeiner (Harvard), Josh Speagle (Harvard) and others about probabilistic catalogs. Both Finkbeiner's group and my own have produced probabilistic catalogs. But these are not usually a good idea! The problem is that they communicate (generally) posterior information and not likelihood information. It is related to the point that you can't sample a likelihood! The big idea is that knowledge is transmitted by likelihood, not posterior. A posterior contains your beliefs and your likelihood. If I want to update my beliefs using your catalog, I need your likelihood, and I don't want to take on your prior (your beliefs) too.
This sounds very ethereal, but it isn't: The math just doesn't work out if you get a posterior catalog and want to do science with it. You might think you can save yourself by dividing out the prior but (a) that isn't always easy to do, and (b) it puts amazingly strong constraints on the density of your samplings; unachievable in most real scientific contexts. These problems are potentially huge problems for LSST and future Gaia data releases. Right now (in DR2, anyway) Gaia is doing exactly the correct thing, in my opinion.
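Point (b) can be seen in a few lines of importance sampling. This sketch assumes a hypothetical catalog entry published as posterior samples under a broad Gaussian prior, and reweights them to two different new priors. The effective sample size shows how many of the published samples actually carry information once the prior is swapped.

```python
import numpy as np
from scipy.stats import norm

def reweight(samples, their_prior, my_prior):
    """Importance weights that swap the catalog's prior for mine, and the effective sample size."""
    log_w = my_prior.logpdf(samples) - their_prior.logpdf(samples)
    w = np.exp(log_w - log_w.max())                # subtract the max to avoid overflow
    w /= w.sum()
    return w, 1.0 / np.sum(w ** 2)

rng = np.random.default_rng(8)
their_prior = norm(0.0, 10.0)                      # hypothetical catalog prior
# Hypothetical catalog entry: measurement 0 with unit Gaussian noise, published as
# 1000 posterior samples under their_prior (posterior mean 0, slightly under unit width).
post_sd = 1.0 / np.sqrt(1.0 + 1.0 / 10.0 ** 2)
samples = rng.normal(0.0, post_sd, size=1000)

_, ess_broad = reweight(samples, their_prior, norm(0.0, 2.0))
_, ess_narrow = reweight(samples, their_prior, norm(3.0, 0.1))
print(ess_broad, ess_narrow)
```

A new prior that overlaps the published samples keeps most of them useful; a sharp prior out in the tail leaves only a handful, which is the sampling-density problem in miniature.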
My enforced week off work has been awesome for writing code. I deepened my knowledge and interest in the Google (tm) Colaboratory (tm) by writing a notebook (available here) that constructs fake stars in a fake galaxy and observes them noisily in a fake spectroscopic survey. This is in preparation for measuring the selection function and doing inference to determine the properties of the whole galaxy from observations of the selected, noisily observed stars. This in turn relates to the paper on selection functions and survey design that I am writing with Rix (MPIA); it could be our concrete example.
Today Doug Finkbeiner (Harvard), Josh Speagle (Harvard), and Ana Bonaca (Harvard) came to visit me in my undisclosed location in Heidelberg. We discussed many different things, including Finkbeiner's recent work on finding outliers and calibration issues in the LAMOST spectral data using a data-driven model, and Speagle's catalog of millions of stellar properties and distances in PanSTARRS+Gaia+2MASS+WISE.
Bonaca and I took that latter catalog and looked at new ways to visualize it. We both have the intuition that good visualization could and will pay off in these large surveys. Both in terms of finding structures and features, and giving us intuition about how to build automated systems that will then look for structures and features. And besides, excellent visualizations are productive in other senses too, like for use in talks and presentations. I spent much of my day coloring stars by location in phase space or the local density in phase space, or both. And playing with the color maps!
There's a big visualization literature for these kinds of problems. Next step is to try to dig into that.
When I am stuck in a quiet attic room, doing nothing but writing, I tend to go off the rails! This has happened in my paper with Rix about target selection for catalogs and surveys: It is supposed to be about survey design and now it has many pages about the likelihood function. It's a mess. Is it two papers? Or is it a different paper?
I resolved (for now, to my current satisfaction) my issues from a few days ago, about likelihoods for catalogs. I showed that the likelihood that I advocate does not give biased inferences, and does permit inference of the selection function (censoring process) along with the inference of the world. I did this with my first ever use of the Google (tm) Colaboratory (tm). I wanted to see if it works, and it does. My notebook is here (subject to editing and changing, so no promises about its state when you go there). If your model includes the censoring process—that is, if you want to parameterize and learn the catalog censoring along with the model of the world—then (contra Loredo, 2004) you have to use a likelihood function that depends on the selection function at the individual-source level. And I think it is justified, because it is the assumption that the universe plus the censoring is the thing which is generating your catalog. That's a reasonable position to take.
I'm stuck in bed with a bad back. I have been for a few days now. I am using the time to write in my summer writing projects, and talk to students and postdocs by Skype (tm). But it is hard to work when out sick, and it isn't necessarily a good idea. I'm not advocating it!
I worked more on my selection-function paper with Rix. I continued to struggle with understanding the controversy (between Loredo on one hand and various collaborations of my own on the other) about the likelihood function for a catalog. In my view, if you take a variable-rate Poisson process, and then censor it, where the censoring depends only on the individual properties of the individual objects being censored, you get a new variable-rate Poisson process with just a different rate function. If I am right, then there is at least one way of thinking about things such that the likelihood functions in the Bovy et al and Foreman-Mackey et al papers are correct. My day ended with a very valuable phone discussion of this with Foreman-Mackey. He (and I) would like to understand what is the difference in assumptions between us and Loredo.
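Here is that argument in a one-dimensional toy, assuming an exponential rate function, a known soft selection function, and an independent keep-or-drop decision for each source (so the censored catalog is a thinned Poisson process with rate lambda(x) S(x)). The fit that includes S in the expected-count integral should recover the true scale length; the fit that ignores S should not. Because S has no free parameters here, the per-source log S term is a constant; it would matter if the censoring were being learned too.

```python
import numpy as np
from scipy.optimize import minimize

X_MAX = 10.0                                          # hypothetical survey volume: 0 <= x <= X_MAX
GRID = (np.arange(4000) + 0.5) * (X_MAX / 4000)       # midpoints for the expected-count integral

def log_rate(x, log_amp, log_scale):
    """Hypothetical rate of the underlying (uncensored) process: amp * exp(-x / scale)."""
    return log_amp - x / np.exp(log_scale)

def selection(x):
    """Hypothetical, known probability that a source at x makes it into the catalog."""
    return 1.0 / (1.0 + np.exp((x - 4.0) / 0.3))

def neg_log_like(params, x_obs, use_selection=True):
    """Poisson-process negative log-likelihood for the censored catalog.

    Independent censoring with probability S(x) turns rate lambda(x) into lambda(x) * S(x)."""
    log_amp, log_scale = params
    s_grid = selection(GRID) if use_selection else np.ones_like(GRID)
    log_s_obs = np.log(selection(x_obs)) if use_selection else np.zeros_like(x_obs)
    expected = np.sum(np.exp(log_rate(GRID, log_amp, log_scale)) * s_grid) * (X_MAX / GRID.size)
    return expected - np.sum(log_rate(x_obs, log_amp, log_scale) + log_s_obs)

rng = np.random.default_rng(9)
true_amp, true_scale = 2000.0, 3.0
frac = 1.0 - np.exp(-X_MAX / true_scale)
n_all = rng.poisson(true_amp * true_scale * frac)
x_all = -true_scale * np.log1p(-rng.random(n_all) * frac)   # inverse CDF of truncated exponential
x_obs = x_all[rng.random(n_all) < selection(x_all)]          # censoring = independent thinning

start = np.log([1000.0, 1.0])
fit_sel = minimize(neg_log_like, start, args=(x_obs, True), method="Nelder-Mead")
fit_naive = minimize(neg_log_like, start, args=(x_obs, False), method="Nelder-Mead")
print("scale with selection:", np.exp(fit_sel.x[1]), "ignoring it:", np.exp(fit_naive.x[1]))
```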
I also worked today with Soledad Villar (NYU) to develop capstone projects for the masters program in the Center for Data Science. The Masters students do research projects, and we have lots of ideas about mashing up deep learning and astrophysics.
For a number of projects, my group has been trying to compare point sets to point sets, to determine transformations. Some contexts have been calibration (like photometric and astrometric calibration of images, where stars need to align, either on the sky or in magnitude space) and others have been in dynamics. Right now Suroor Gandhi (NYU), Adrian Price-Whelan (Flatiron), and I have been trying to find transformations that align phase-space structures (and especially the Snail) observed in different tracers: What transformation between tracers matches the phase-space structure? These projects are going by our code name MySpace.
Projects like these tend to have a pathology, however, related to a pathology that Robyn Sanderson (Flatiron) and I found in a different context in phase space: If you write down a naive objective for matching two point clouds, the optimal match often has one point cloud shrunk down to zero size and put on top of the densest location on the other point cloud! Indeed, Gandhi is finding this so we decided (today) to try symmetrizing the objective function to stop it. That is, don't just compare points A to points B, but also symmetrically compare points B to points A. Then (I hope) neither set can shrink to zero usefully. I hope this works! Now to make a symmetric objective function...
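A small numerical check of the shrinking pathology and the symmetrized fix, assuming a Gaussian kernel-density objective (a hypothetical stand-in for whatever objective the project actually uses) and two point clouds drawn from the same distribution. The one-way score prefers collapsing B onto the densest part of A; the two-way score prefers the correct identity match.

```python
import numpy as np
from scipy.special import logsumexp

def mean_log_kde(centers, queries, bandwidth):
    """Mean log density at `queries` of a Gaussian KDE built on `centers`."""
    d = centers.shape[1]
    sq = np.sum((queries[:, None, :] - centers[None, :, :]) ** 2, axis=-1) / bandwidth**2
    log_k = -0.5 * sq - d * np.log(bandwidth) - 0.5 * d * np.log(2.0 * np.pi)
    return np.mean(logsumexp(log_k, axis=1) - np.log(centers.shape[0]))

def one_way(a, b_moved, bw):
    # Only asks: do the moved B points land where A is dense?
    return mean_log_kde(a, b_moved, bw)

def two_way(a, b_moved, bw):
    # Also asks: do the A points land where the moved B points are dense?
    return mean_log_kde(a, b_moved, bw) + mean_log_kde(b_moved, a, bw)

rng = np.random.default_rng(10)
a = rng.normal(size=(300, 2))
b = rng.normal(size=(300, 2))      # same population, so the identity is the right match
matched = b                         # identity transformation
collapsed = 0.01 * b                # B shrunk onto the densest spot of A (here, the origin)
bw = 0.3

print("one-way:", one_way(a, matched, bw), one_way(a, collapsed, bw))
print("two-way:", two_way(a, matched, bw), two_way(a, collapsed, bw))
```

The reverse term punishes the collapse because a shrunken B assigns almost no density to most of A.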
I spent my research time today writing in a paper Rix (MPIA) and I are preparing about selecting sources for a catalog or target selection. The fundamental story is that you need to make a likelihood function at the end of the day. And this, in turn, means that you need a tractable and relatively accurate selection function. This all took me down old paths I have traveled with Bovy (Toronto) and Foreman-Mackey (Flatiron).
In email correspondence, Foreman-Mackey reminded me of past correspondence with Loredo (Cornell), who disagrees with our work on these things for very technical reasons. His (very nice) explanation of his point is around equations (8) through (10) in this paper: It has to do with how to factorize a probability distribution for a collection of objects obtained in a censored, variable-rate Poisson process. But our historical view of this (and my restored view after a day of struggling) is that the form of the likelihood depends on fine details of how you believe the objects of study were selected for the catalog, or censored. If they were censored only by your detector, I think Loredo's form is correct. But if they were censored for physical reasons over which you have no dominion (for example a planet transit obscured by a tiny fluctuation in a star's brightness), the selection can come into the likelihood function differently. That is, it depends on the causal chain involved in the source censoring.
[I have been on travel of various kinds, mostly non-work, for almost two weeks, hence no posts!]
While on my travels, I wrote in my project about target selection for spectroscopic surveys (with Rix) and my project about information theory and extreme-precision radial-velocity measurement (with Bedell). I also discovered this nice paper on Cepheid stars in the disk, which is a highly relevant position-space complement to what Eilers and I have been doing in velocity space.
On the weekend and today, Eilers (MPIA), Rix (MPIA), and I started to build a true ansatz for an m = 2 spiral in the Milky Way disk, in both density and velocity. The idea is to compute the model as a perturbation away from an equilibrium model, and not self-consistent (because the stars we are using as tracers don't dominate the density of the spiral perturbation). This caused us to write down a whole bunch of functions and derivatives and start to plug them into the first-order expansion away from the steady-state equilibrium of an exponential disk (the Schwarzschild distribution, apparently). We don't have an ansatz yet that permits us to solve the equations, but it feels very very close. The idea behind this project is to use the velocity structure we see in the disk to infer the amplitude (at least) of the spiral density structure, and then compare to what's expected in (say) simulations or theory. Why not just observe the amplitude directly? Because that's harder, given selection effects (like dust).
I gave the Königstuhl Colloquium in Heidelberg today. I spoke about the (incredibly boring) subject of selecting targets for spectroscopic follow-up. The main point of my talk is that you want to select targets so that you can include the selection function in your inferences simply. That is, include it in your likelihood function, tractably. This actually puts extremely strong constraints on what you can and cannot do, and many surveys and projects have made mistakes with this (I think). I certainly have made a lot of mistakes, as I admitted in the talk. Hans-Walter Rix (MPIA) and I are trying to write a paper about this. The talk video is here (warning: I haven't looked at it yet!).
I had an inspiring conversation with Sara Rezaei Kh. (Gothenburg) today, about next-generation dust-mapping projects. As my loyal reader knows, I want to map the dust in 3d, and then 4d (radial velocity too) and then 6d (yeah) and even higher-d (because there will be temperature and size-distribution variations with position and velocity). She has some nice new data, where she has her own 3d dust map results along lines of sight that also have molecular gas emission line measurements. If it is true that dust traces molecular gas (even approximately) and if the 3d dust map is good, then it should be possible to paint velocity onto dust with this combined data. My proposal is: Find the nonlinear function of radial position that is the mean radial velocity such that both line-of-sight maps are explained by the same dust in 4d. I don't know if it will work, but we were able to come up with some straw-man possible data sets for which it would obviously work. Exciting project.
[After I posted this, Josh Peek (STScI) sent me an email to note that these ideas are similar to things he has been doing with Tchernyshyov and Zasowski to put velocities onto dust clouds. Absolutely! And I love that work. That email (from Peek) inspires me to write something here that I thought was obvious, but apparently isn't: This blog is about my research. Mine! It is not intended to be a comprehensive literature review, or a statement of priority, or a proposal for future work. It is about what I am doing and talking about now. If anything that I mention in this blog has been done before, I will be citing that prior work if I ever complete a relevant project! Most ideas on this blog never get done, and when they do get done, they get done in responsible publications (and if you don't think they are responsible, email me, or comment here). This blog itself is not that responsible publication. It contains almost no references and it does not develop the full history of any idea. And, in particular, in this case, the ideas that Rezaei Kh. and I discussed this day (above) were indeed strongly informed by things that Peek and Tchernyshyov and Zasowski have done previously. I didn't cite them because I don't cite everything relevant when I blog. If full citations are required for blogging, I will stop blogging.]
I had a conversation today with Kate Storey-Fisher (NYU) about the software she is writing in our large-scale structure projects. One question is whether to develop on a fork of another project, or to bring in code from another project and work in our own project.
My view on this is complicated. I am a big believer in open-source software and building community projects. But I am also a believer that science and scientific projects have to have clear authorship: Authorship is part of giving and getting credit, part of taking responsibility for your work and decisions, and part of the process of criticism that is essential to science (in its current form). So we left this question open; we didn't decide.
But my thoughts about the right thing to do depend on many factors, like: Is this code an important part of your scientific output, or is it a side project? Do you expect to write a paper about this code? Do you expect or want this code to be used by others?
Today Christina Eilers (MPIA) gave a great colloquium talk at MPIA about the intergalactic medium, and how it can be used to understand the lifetime of quasars: Basically the idea is that quasars ionize bubbles around themselves, and the timescales are such that the size of the bubble tells you the age of the quasar. It's a nice and simple argument. Within this context, she finds some very young quasars; too young to have grown to their immense sizes. What explanation? There are ways to get around the simple argument, but they are all a bit uncomfortable. Of course one idea I love (but it sure is speculative) is the idea that maybe these very young quasars are primordial black holes!
In other research today (actually, I think this is not research according to the Rules), I finished a review of a book (a history of science book, no less) for Princeton University Press. I learned that reviewing a book for a publisher is a big job!
Yesterday Eilers (MPIA) and I thought that splitting the stars into many populations would help us: Every stellar population would have its own kinematic distribution, but every population would share the same gravitational potential. We were right on the first part: The velocity dispersion and scale length are both strong functions of chemical abundances (metallicity or alpha enhancement). We even made a bin-free model where we modeled the dependences continuously! But for each stellar population, the degeneracy between circular velocity of the potential and scale-length of the distribution function remains. And it is exact. So splitting by stellar sub-population can't help us! Durn. And Duh.
Eilers and I looked at the dependence of the kinematics of disk populations with various element-abundance ratios. And we built a model to capitalize on these differences without binning the data: We parameterized the dependences of kinematics (phase-space distribution function) on element abundances and then re-fit our dynamical model. It didn't work great; we don't yet understand why.
Today Adrian Price-Whelan (Flatiron) and I resurrected an old project from last summer: Code-named Chemical Tangents, the project is to visualize or model the orbits (tori) in the phase-mixed parts of the Milky Way by looking at the element abundance distributions. The gradients in the statistics of the element-abundance distributions (like mean, or quantiles, or variances, or so on) should be perpendicular to the tori. Or the gradients should be in the action directions and never the conjugate-angle directions. Price-Whelan resurrected old code and got it working on new data (APOGEE cross Gaia). And we discussed name and writing and timeline and so on.
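The geometric claim (abundance gradients point along the actions, never along the angles) can be checked in a toy. This is a minimal sketch under loud assumptions, not the Chemical Tangents code: a phase-mixed population in a hypothetical 1D harmonic potential, with a hypothetical abundance that depends only on the action. A local linear fit of abundance against phase-space position should then return a gradient perpendicular to the flow along the torus.

```python
import numpy as np

# Toy, hypothetical setup: a phase-mixed population in a 1D harmonic potential.
rng = np.random.default_rng(42)
omega = 1.0
n_stars = 200_000

# Actions uniform in [0, 2), angles uniform in [0, 2 pi): uniform density in (x, v).
J = rng.uniform(0.0, 2.0, n_stars)
theta = rng.uniform(0.0, 2.0 * np.pi, n_stars)
x = np.sqrt(2.0 * J / omega) * np.cos(theta)
v = -np.sqrt(2.0 * J * omega) * np.sin(theta)

# Hypothetical abundance that depends only on the action, plus measurement scatter.
abundance = 0.3 * J + rng.normal(0.0, 0.02, n_stars)


def local_gradient(x0, v0, radius=0.3):
    """Fit abundance ~ c + g . (x - x0, v - v0) to the stars within `radius` of (x0, v0)."""
    near = (x - x0) ** 2 + (v - v0) ** 2 < radius ** 2
    design = np.column_stack([np.ones(near.sum()), x[near] - x0, v[near] - v0])
    coeffs, *_ = np.linalg.lstsq(design, abundance[near], rcond=None)
    return coeffs[1:]


x0, v0 = 1.0, 0.0
grad = local_gradient(x0, v0)
# Direction of the phase flow (along the torus) at (x0, v0): (dx/dt, dv/dt).
tangent = np.array([v0, -omega ** 2 * x0])
cos_angle = grad @ tangent / (np.linalg.norm(grad) * np.linalg.norm(tangent))
print(f"gradient = {grad}, |cos(angle to torus)| = {abs(cos_angle):.3f}")
```

In real data the tori are not circles and the abundance-action relation is unknown, so the interesting part is the inverse problem: use many such local gradients to constrain the tori.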
Eilers (MPIA), Rix (MPIA), and I have spent two weeks now discussing how to model the kinematics in the Milky Way disk, if we want to build a forward model instead of just measuring velocity moments (Jeans style). And we have the additional constraint that we don't know the selection function of the APOGEE–Gaia–WISE cross-match that we are using, so we need to be building a conditional likelihood, velocity conditioned on position (yes, this is permitted; indeed all likelihoods are conditioned on a lot of different things, usually implicitly!).
At Eilers's insistence, we down-selected to one choice of approach today. Then we converted the (zeroth-order, symmetric) equations in this paper on the disk into a conditional probability for velocity given position. When we use the epicyclic approximations (in that paper) the resulting model is Gaussian in velocity space. That's nice; we completed a square, Eilers coded it up, and it just worked. We have inferences about the dynamics of the (azimuthally averaged) disk, in the space of one work day!
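Here is a toy version of why the conditional likelihood helps, and of what "completing the square" buys you. It is a sketch under hypothetical assumptions, not the model from that paper: azimuthal velocity given radius is Gaussian with mean v_c (no asymmetric drift) and a made-up dispersion profile. The stars are then selected by a rule that depends on position only. Because the likelihood is p(v | R), that selection drops out, and the maximum-likelihood v_c is an inverse-variance-weighted mean.

```python
import numpy as np

rng = np.random.default_rng(1)
v_c_true = 230.0  # km/s, hypothetical


def sigma_of_R(R):
    """Hypothetical azimuthal-velocity dispersion profile (km/s), R in kpc."""
    return 40.0 * np.exp(-(R - 8.0) / 10.0)


# Toy disk: v_phi | R ~ Normal(v_c, sigma(R)^2). No asymmetric drift; illustration only.
R = rng.uniform(4.0, 14.0, 50_000)
v_phi = rng.normal(v_c_true, sigma_of_R(R))

# A selection that depends on position only (strongly favoring the inner disk).
keep = rng.uniform(size=R.size) < np.exp(-(R - 4.0) / 3.0)
R_s, v_s = R[keep], v_phi[keep]
sig_s = sigma_of_R(R_s)


def log_like(v_c):
    """Conditional log likelihood ln p(v_phi | R, v_c), summed over selected stars."""
    return np.sum(-0.5 * ((v_s - v_c) / sig_s) ** 2 - np.log(np.sqrt(2.0 * np.pi) * sig_s))


# Completing the square: the maximum is the inverse-variance-weighted mean.
w = 1.0 / sig_s ** 2
v_c_hat = np.sum(w * v_s) / np.sum(w)
v_c_err = 1.0 / np.sqrt(np.sum(w))
print(f"v_c = {v_c_hat:.2f} +/- {v_c_err:.2f} km/s from {keep.sum()} selected stars")
```

A selection that also depended on velocity would break this; that is exactly the assumption we have to check for the APOGEE–Gaia–WISE sample.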
Today, in a surprise visit, Bernhard Schölkopf (MPI-IS) appeared in Heidelberg. We discussed many things, including his beautiful pictures of the total eclipse in Chile last week. But one thing that has been a theme of conversation with Schölkopf since we first met is this: Should we build models that go from latent variables or labels to the data space, or should we build models that go from the data to the label space? I am a big believer—on intuitive grounds, really—in the former: In physics contexts, we think of the data as being generated from the labels. Schölkopf had a great idea for bolstering my intuition today:
A lot has been learned about machine learning by attacking classifiers with adversarial attacks. (And indeed, on a separate thread, Kate Storey-Fisher (NYU) and I are attacking cosmological analyses with adversarial attacks.) These adversarial attacks take advantage of the respects in which deep-learning methods are over-fitting to produce absurdly mis-classified data. Such attacks work when a machine-learning method is used to provide a function that goes from data (which is huge-dimensional) to labels (which are very low-dimensional). When the model goes from labels to data (it is generative) or from latents to data (same), these adversarial attacks cannot be constructed.
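The reason these attacks work on data-to-label maps shows up even in the simplest case, a linear classifier; this is a minimal sketch with a hypothetical random weight vector, not a deep network. The gradient of the score with respect to the input is just the weight vector, so nudging every pixel by a tiny amount in the sign of that gradient (the "fast gradient sign" idea) adds up over ten thousand dimensions and flips the label.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pix = 10_000  # dimension of the "data" (e.g. pixels)

# A hypothetical trained linear classifier: label = sign(w . x).
w = rng.normal(size=n_pix)
x = rng.normal(size=n_pix)
score = w @ x
label = np.sign(score)

# Fast-gradient-sign-style attack: move every pixel by eps against the current label.
# d(score)/dx = w, so the step -eps * label * sign(w) lowers label * score by eps * sum|w|.
eps = 1.1 * abs(score) / np.sum(np.abs(w))  # smallest uniform step that flips the sign, plus margin
x_adv = x - eps * label * np.sign(w)

print(f"per-pixel change: {eps:.4f} (pixel rms ~ {x.std():.2f})")
print(f"label before: {label:+.0f}, after: {np.sign(w @ x_adv):+.0f}")
```

The per-pixel change is a tiny fraction of the pixel scatter. For a generative model (labels to data) there is no analogous low-dimensional output to push around, which is the asymmetry the text is pointing at.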
We should attack some of the astronomical applications of machine learning with such attacks! Will it work? I bet it has to; I certainly hope so! The paper I want to write would show that when you are using ML to transform your data into labels, it is over-fitting (in at least some respects) but when you are using ML to transform labels into your data, you can't over-fit in the same ways. This all connects to the idea (yes, I am like a broken record) that you should match your methods to the structure of your problem.
Today Christina Eilers (MPIA) and I spent time working out different formulations for an inference of the force law in the Milky Way disk, given stellar positions and velocities. We have had various overlapping ideas and we are confused a bit about the relationships between our different options. One of the key ideas we are trying to implement is the following: The selection function of the intersection of Gaia and APOGEE depends almost entirely on position and almost not at all on velocity. So we are looking at likelihood functions that are probabilities for velocity given position or conditioned on position. We have different options, though, and they look very different.
This all relates to the point that data analysis is technically subjective. It is subjective of course, but I mean it is subjective in the strict sense that you cannot obtain objectively correct methods. They don't exist!
Today was the first of two 90-minute pedagogical lectures at MPIA by Conny Aerts (Leuven), who is also an external member of the MPIA. I learned a huge amount! She started by carefully defining the modes and their numbers ell, em, and en. She explained the difference between pressure (p) modes and gravity (g) modes, which I have to admit I had never understood. And I asked if this distinction is absolutely clear. I can't quite tell; after all, in the acoustic case, the pressure is still set by the gravity of the star! The g modes have never been detected for the Sun, but they have been detected for many other kinds of stars, and they are very sensitive to the stellar interiors. The relative importance of p and g modes is a strong function of stellar mass (because of the convective and radiative structure in the interior). She also showed that p modes are separated by near-uniform frequency differences, and g modes by near-uniform period differences. And the deviations of these separations from uniformity are amazingly informative about the interiors of the stars, because (I think) the different modes have different radial extents into the interior, so they measure different integrals of the density. Amazing stuff. She also gave a huge amount of credit to the NASA Kepler Mission for changing the game completely.
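The uniform spacings Aerts described come from the standard first-order asymptotic relations: p-mode frequencies go like Δν (n + ℓ/2 + ε) and g-mode periods go like Π₀ (n + ε_g) / sqrt(ℓ(ℓ+1)). This sketch uses illustrative parameter values (roughly solar Δν, a made-up Π₀) and drops all higher-order terms, which are exactly the deviations that carry the interior information.

```python
import numpy as np

# First-order asymptotic relations only; parameter values are illustrative.
delta_nu = 135.0   # large frequency separation, microHz (roughly solar)
eps_p = 1.4        # p-mode phase offset (illustrative)
Pi_0 = 250.0       # g-mode period scale, seconds (illustrative)
eps_g = 0.0        # g-mode phase offset (illustrative)

n = np.arange(10, 20)  # radial orders


def p_mode_frequencies(ell):
    """nu_{n,ell} ~ delta_nu * (n + ell/2 + eps_p): evenly spaced in frequency."""
    return delta_nu * (n + ell / 2 + eps_p)


def g_mode_periods(ell):
    """P_{n,ell} ~ Pi_0 * (n + eps_g) / sqrt(ell (ell + 1)): evenly spaced in period."""
    return Pi_0 * (n + eps_g) / np.sqrt(ell * (ell + 1))


nu_p = p_mode_frequencies(ell=0)
P_g = g_mode_periods(ell=1)
print("p-mode frequency spacings (microHz):", np.diff(nu_p))
print("g-mode period spacings (s):", np.diff(P_g))
print("g-mode frequency spacings (microHz):", np.diff(1e6 / P_g))
```

The last line shows why you have to look in the right space: g modes evenly spaced in period are not evenly spaced in frequency.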
[No posts for a few days because vacation.]
Great day today! I met up with Eilers (MPIA) early to discuss our project to constrain the dynamics of the Milky Way disk using the statistics of the actions and conjugate angles. During our conversation, I finally was able to articulate the point of the project, which I have been working on but not really understanding. Or I should say perhaps that I had an intuition that we were going down a good path, but I couldn't articulate it. Now I think I can:
The radial action of a star in the Milky Way disk is a measure of how much it deviates in velocity from the circular velocity. The radial action is (more or less) the amplitude of that deviation and the radial angle is (more or less) the phase of that deviation. Thus the radial action and angle are functions (mostly though not perfectly) of the stellar velocity. So as long as the selection function of the survey we are working with (APOGEE cross Gaia in this case) is a function only (or primarily) of position and not velocity, the selection function doesn't really come into the expected distribution of radial actions and angles!
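In the epicyclic approximation this is easy to see explicitly. The sketch below assumes a flat rotation curve with a hypothetical v_c; it is not necessarily the action estimator we will end up using. The guiding radius comes from the angular momentum, and the radial action and angle are the amplitude and phase of the epicycle. At fixed position, they are set by v_R and v_phi.

```python
import numpy as np


def epicyclic_action_angle(R, v_R, v_phi, v_c=230.0):
    """Epicyclic radial action (kpc km/s) and angle (rad) for a flat rotation curve.

    Assumptions (hypothetical numbers): circular speed v_c in km/s, R in kpc.
    Guiding radius from angular momentum: R_g = R v_phi / v_c.
    Epicyclic frequency for a flat curve: kappa = sqrt(2) v_c / R_g.
    Radial motion: R - R_g = X cos(theta_R), v_R = -kappa X sin(theta_R), J_R = kappa X^2 / 2.
    """
    R_g = R * v_phi / v_c
    kappa = np.sqrt(2.0) * v_c / R_g
    dR = R - R_g
    J_R = 0.5 * kappa * (dR ** 2 + (v_R / kappa) ** 2)
    theta_R = np.arctan2(-v_R / kappa, dR) % (2.0 * np.pi)
    return J_R, theta_R


# Stars at the same position but with different velocities get different actions and angles.
R = np.full(4, 8.0)
v_R = np.array([0.0, 30.0, -30.0, 0.0])
v_phi = np.array([230.0, 230.0, 230.0, 200.0])
J_R, theta_R = epicyclic_action_angle(R, v_R, v_phi)
print(np.round(J_R, 1), np.round(theta_R, 2))
```

The "not perfectly" in the text is visible here too: R_g, and hence the action, depends on R as well as on the velocity.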
That's cool! We talked about how true these assumptions are, and how to structure the inference.
I spent time today with Christina Eilers (MPIA), discussing how to constrain the Milky Way disk potential (force law) using the kinematics of stars selected in a strange way (yes, APOGEE selection). She and others have shown in small experiments that the radial angle—the conjugate angle to the radial action—is very informative! The distribution of radial angles should be (close to) uniform if you can observe a large patch of the disk, and she finds that the distribution you observe is a very strong function of potential (force law) parameters. That means that the angle distribution should be very informative! (Hey: Information theory!)
This is an example of orbital roulette. This is a dynamical inference method which was pioneered in its frequentist form by Beloborodov and Levin and turned into a Bayesian form (that looks totally unlike the frequentist form) by Bovy, Murray, and me. I think we should do both forms! But we spent time today talking through the Bayesian form.
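Here is a toy of the frequentist flavor, under loud simplifications: a 1D harmonic potential whose only parameter is its frequency, a phase-mixed snapshot, and a Kolmogorov–Smirnov distance standing in for whatever uniformity statistic one prefers. It is not the Beloborodov–Levin statistic or the Bayesian version. Angles computed under the wrong potential pile up non-uniformly, so scanning trial frequencies for the most uniform angle distribution recovers the truth.

```python
import numpy as np

rng = np.random.default_rng(7)
omega_true = 1.0
n_stars = 50_000

# Snapshot of a phase-mixed population in a 1D harmonic potential.
amp = rng.uniform(0.5, 2.0, n_stars)
phase = rng.uniform(0.0, 2.0 * np.pi, n_stars)
x = amp * np.cos(phase)
v = -amp * omega_true * np.sin(phase)


def ks_uniform(angles):
    """Kolmogorov-Smirnov distance between angles / (2 pi) and Uniform(0, 1)."""
    u = np.sort((angles % (2.0 * np.pi)) / (2.0 * np.pi))
    i = np.arange(1, u.size + 1)
    return max(np.max(i / u.size - u), np.max(u - (i - 1) / u.size))


def roulette_statistic(omega_trial):
    """Angles computed under a trial potential are uniform only if the trial is right."""
    angles = np.arctan2(-v / omega_trial, x)
    return ks_uniform(angles)


trial = np.linspace(0.5, 2.0, 151)
stats = np.array([roulette_statistic(w) for w in trial])
omega_best = trial[np.argmin(stats)]
print(f"best trial frequency: {omega_best:.2f} (truth {omega_true})")
```

Note that this only works because the snapshot covers all phases; a survey footprint that cuts in angle would need the selection function modeled.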
There is a paradox about deep learning. Which everyone either finds incredibly unconvincing or totally paradoxical. I'm not sure which! But it is this: It is simultaneously the case that deep learning is so flexible it can fit any data, including randomly generated data, and the case that when it is trained on real data, it generalizes well to new examples. I spent some time today discussing this with Soledad Villar (NYU) because I would like us to understand this a bit better in the context of possible astronomical applications of deep learning.
In many applications, people don't need to know why a method works; they just need to know that it does. But in our scientific applications, where we want to use the deep-learning model to de-noise or average over data, we actually need to understand in what contexts it is capturing the structure in the data and not just over-fitting the noise. Villar and I discussed how we might test these things, and what kinds of experiments might be illuminating. As my loyal reader might expect, I am interested in taking an information-theoretic attitude to the problem.
One relevant thing that Villar mentioned is that there is research that suggests that when the data has simpler structure, the models train faster. That's interesting, because it might be that somehow the deep models still have some internal sense of parsimony that is saving them; that could resolve the paradox. Or not!
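A cheap way to see both halves of the paradox without training a deep network is an over-parameterized random-feature model, which is a crude stand-in (an assumption, not what Villar and I discussed). With 4000 random ReLU features and 100 training points, the minimum-norm least-squares fit interpolates both a smooth function and pure-noise labels perfectly. Only the structured case generalizes.

```python
import numpy as np

rng = np.random.default_rng(3)
n_train, n_test, n_features = 100, 1000, 4000

# Random ReLU features: a crude, hypothetical stand-in for a wide network with a frozen first layer.
w = rng.normal(size=n_features)
b = rng.uniform(-1.0, 1.0, n_features)


def features(x):
    return np.maximum(0.0, np.outer(x, w) + b)


x_train = np.linspace(-1.0, 1.0, n_train)
x_test = rng.uniform(-1.0, 1.0, n_test)
Phi_train, Phi_test = features(x_train), features(x_test)


def fit_and_score(y_train, y_test):
    # With more features than data, lstsq returns the minimum-norm interpolating solution.
    coef, *_ = np.linalg.lstsq(Phi_train, y_train, rcond=None)
    train_mse = np.mean((Phi_train @ coef - y_train) ** 2)
    test_mse = np.mean((Phi_test @ coef - y_test) ** 2)
    return train_mse, test_mse


structured = fit_and_score(np.sin(3 * x_train), np.sin(3 * x_test))
random_labels = fit_and_score(rng.normal(size=n_train), rng.normal(size=n_test))
print("structured:    train MSE %.1e, test MSE %.1e" % structured)
print("random labels: train MSE %.1e, test MSE %.1e" % random_labels)
```

The minimum-norm choice is one candidate for the "internal sense of parsimony"; whether something like it operates in real deep networks is the open question.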
The tiny bit of research I did today was work on a Decadal Survey (Astro2020) white paper on changes we might make to the Decadal Survey process itself. The challenge is to write this constructively and with the appropriate tone. I don't want to be sanctimonious!
Nick Pingel (ANU) came by Flatiron and impressed us all with discussions of ASKAP, which is one of the pathfinders to the SKA. The most impressive thing I learned is that the feeds for the telescope array are themselves dipole arrays, so you can synthesize multiple beams at each telescope, and then synthesize an aperture for each beam. That's a great capability for the array, but of course is also an engineering challenge. He said scary things about what the calibration looks like. It really made me wish I had got closer to radio astronomy in my life!
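The multi-beam trick can be sketched with a toy: a hypothetical 16-element line of antennas at half-wavelength spacing and simple delay-and-sum (conjugate-phase) weights. ASKAP's phased-array feeds are 2D dipole grids with weights derived from calibration, so this shows only the idea. The same element signals, combined with different weight vectors, give several simultaneous beams pointed in different directions.

```python
import numpy as np

# Toy phased array: N identical elements on a line with half-wavelength spacing.
n_elem = 16
d_over_lambda = 0.5
positions = np.arange(n_elem) * d_over_lambda  # element positions in wavelengths


def steering_vector(theta):
    """Relative phases of a plane wave arriving from angle theta (radians from broadside)."""
    return np.exp(2j * np.pi * positions * np.sin(theta))


def beam_weights(theta_pointing):
    """Delay-and-sum weights that point a beam at theta_pointing."""
    return steering_vector(theta_pointing) / n_elem


# Several beams formed simultaneously from the same element signals.
pointings = np.deg2rad([-20.0, 0.0, 15.0])
sky = np.deg2rad(np.linspace(-90.0, 90.0, 3601))
A = np.stack([steering_vector(t) for t in sky])             # (n_sky, n_elem)
W = np.stack([beam_weights(t) for t in pointings], axis=1)  # (n_elem, n_beams)
power = np.abs(A.conj() @ W) ** 2                           # (n_sky, n_beams)

peaks_deg = np.rad2deg(sky[np.argmax(power, axis=0)])
print("beams steered to", np.rad2deg(pointings), "peak at", np.round(peaks_deg, 2))
```

The scary calibration part is everything this toy leaves out: element gains, mutual coupling, and how all of those drift with time and temperature.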
At Stars and Exoplanets Meeting today, Wolfgang Kerzendorf spoke about a novel idea for peer review (for telescope-time proposals, but it could be applied to funding proposals or paper refereeing too): When you submit a proposal, you are sent K proposals to review. And the reviews thus obtained are combined in a sensible way to perform the peer review. This approach is scalable, and connects benefit (funding opportunity) to effort (reviewing). That's a good idea, and crystallizes some things I have been trying to articulate for years.
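One simple way to make the bookkeeping concrete is sketched below. The cyclic assignment and the average of normalized within-reviewer ranks are my own hypothetical choices, not necessarily what Kerzendorf or any observatory uses. A random permutation plus cyclic offsets guarantees that nobody reviews their own proposal and that every proposal gets exactly K reviews.

```python
import numpy as np


def assign_reviews(n_proposals, k, rng):
    """Each submitter reviews k other proposals; each proposal gets exactly k reviews.

    A random permutation plus cyclic offsets 1..k rules out self-review (requires k < n).
    """
    order = rng.permutation(n_proposals)
    return {reviewer: [order[(pos + s) % n_proposals] for s in range(1, k + 1)]
            for pos, reviewer in enumerate(order)}


def combine_rankings(assignments, scores_by_reviewer, n_proposals):
    """Average each proposal's normalized rank (0 = best) over its reviewers."""
    totals = np.zeros(n_proposals)
    counts = np.zeros(n_proposals)
    for reviewer, proposals in assignments.items():
        scores = np.asarray(scores_by_reviewer[reviewer])
        ranks = np.argsort(np.argsort(-scores))  # 0 for the highest score
        totals[proposals] += ranks / max(len(proposals) - 1, 1)
        counts[proposals] += 1
    return totals / counts


# Hypothetical simulation: true merit plus reviewer noise.
rng = np.random.default_rng(5)
n, k = 40, 6
quality = rng.normal(size=n)
assignments = assign_reviews(n, k, rng)
scores = {r: quality[p] + rng.normal(0.0, 0.5, k) for r, p in assignments.items()}
grade = combine_rankings(assignments, scores, n)
rho = np.corrcoef(np.argsort(np.argsort(grade)), np.argsort(np.argsort(-quality)))[0, 1]
print(f"rank correlation between combined grade and true merit: {rho:.2f}")
```

Using ranks within each reviewer's set, rather than raw scores, is one way to remove harsh-versus-generous reviewer offsets.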
Kerzendorf's contribution, however, is to make a technology that makes this whole problem simpler: He wants to use natural-language processing (NLP) to help the organizations match proposals to reviewers. He showed snippets from a paper that shows that a simple NLP implementation, looking for similarity between proposal texts and proposers' scientific literature, does a reasonable job of matching reviewers to proposals that they feel comfortable reviewing. This is a great set of issues, and connects also to the discussions in our community about blind reviewing.
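A minimal version of "similarity between proposal texts and the reviewers' literature" is TF-IDF vectors plus cosine similarity. This is one common baseline, assumed here for illustration; I don't know which method the paper Kerzendorf showed actually uses. The proposals and paper snippets below are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Plain TF-IDF vectors (dict: word -> weight) for a list of strings."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({word: (c / len(tokens)) * math.log(n_docs / doc_freq[word])
                        for word, c in counts.items()})
    return vectors


def cosine(a, b):
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


# Hypothetical proposals and hypothetical snippets of each reviewer's published work.
proposals = [
    "exoplanet radial velocity survey of nearby m dwarfs",
    "weak lensing shear calibration for cosmology",
]
reviewer_papers = {
    "reviewer_a": "precise radial velocity exoplanet detection with stellar activity",
    "reviewer_b": "cosmology from weak lensing and galaxy shear",
}

names = list(reviewer_papers)
vectors = tfidf_vectors(proposals + [reviewer_papers[name] for name in names])
proposal_vecs, reviewer_vecs = vectors[:len(proposals)], vectors[len(proposals):]
best = {i: max(names, key=lambda name: cosine(pv, reviewer_vecs[names.index(name)]))
        for i, pv in enumerate(proposal_vecs)}
print(best)
```

In a distributed-review setting the reviewers are the proposers, so the corpus for each reviewer would be their own publication record.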
I spent my little bit of research time today working on the paper by Matt Buckley (Rutgers) about observing and using as a tool the conservation of phase-space density that is guaranteed by Hamiltonian dynamics.
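For readers who want the underlying fact in runnable form: Hamiltonian flow preserves phase-space volume (Liouville), so the Jacobian determinant of the map from initial to final (q, p) is exactly 1. This sketch checks that for a pendulum integrated with a symplectic leapfrog, and contrasts it with a hypothetical damped (non-Hamiltonian) version whose phase-space volume shrinks. It illustrates the theorem only, not anything from Buckley's paper.

```python
import numpy as np


def leapfrog_pendulum(q, p, dt=0.05, n_steps=400, damping=0.0):
    """Kick-drift-kick integration of H = p^2/2 - cos(q); optional non-Hamiltonian damping of p."""
    for _ in range(n_steps):
        p = p - 0.5 * dt * np.sin(q)
        q = q + dt * p
        p = p - 0.5 * dt * np.sin(q)
        p = p * (1.0 - damping * dt)  # damping = 0 leaves the map symplectic
    return q, p


def phase_volume_factor(flow, q0, p0, h=1e-6):
    """Determinant of the central-difference Jacobian of (q0, p0) -> flow(q0, p0)."""
    jac = np.empty((2, 2))
    for j, (dq, dp) in enumerate([(h, 0.0), (0.0, h)]):
        plus = np.array(flow(q0 + dq, p0 + dp))
        minus = np.array(flow(q0 - dq, p0 - dp))
        jac[:, j] = (plus - minus) / (2 * h)
    return np.linalg.det(jac)


det_hamiltonian = phase_volume_factor(leapfrog_pendulum, 1.0, 0.3)
det_damped = phase_volume_factor(lambda q, p: leapfrog_pendulum(q, p, damping=0.1), 1.0, 0.3)
print(f"Hamiltonian: det J = {det_hamiltonian:.8f}; damped: det J = {det_damped:.4f}")
```

Each kick and each drift is a shear with unit determinant, which is why the leapfrog map conserves volume exactly; the damped map shrinks it by (1 - damping * dt) per step.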
Fridays are the good days at Flatiron. We have the Astronomical Data Group internal meeting (which operates by extremely odd and clever rules, neither designed nor enforced by me) and the new Dynamics Group internal meeting. In the latter Robyn Sanderson (Penn) brought her entire group from Penn, students working with the ESA Gaia data. One thing the group is finding is that certain stars have dynamics that are far more sensitive to dynamical (potential) parameters than others. This is something that Bovy and I were arguing long ago: The dynamical model of the Milky Way will not rest equally on all Gaia stars: Some will be critical. That's either obvious or deep. Or both! (I'm loving that phrase these days.)
Late in the day, Rodrigo Luger (Flatiron) and I trapped Leslie Greengard (Flatiron) and Alex Barnett (Flatiron) into a conversation about performing line integrals of spherical harmonics along curves that are themselves solutions of spherical-harmonic equations. In a typical astronomy–math interaction, we spent most of our time describing the problem, and then the answer is either: That's trivial! or That's hard! Unfortunately the answer wasn't That's trivial! But they did give us some good ideas for how to think about the problem.
One funny thing Greengard asked, which resonated with me (no pun intended): He said: Can you convert this math question into a physics question? Because if you can, it probably has a simple answer! You see how odd that is? That if your equation represents a physics problem, it is probably simple to solve. And yet it seems like it is exactly right. That's either deep or wrong or scary. I think maybe the latter.
Today I had the great honor of meeting Ingrid Daubechies (Duke), who is a pioneering and accomplished mathematician, known for some of the fundamental work on wavelets and representations that have been incredibly important in data analysis and compression. For example, the JPEG 2000 image-compression standard is built on wavelets she co-developed! She gave a talk at the end of the day on teeth. Yes teeth. It turns out that the shapes of tooth surfaces tell you simultaneously about evolution and diet. And she has worked out beautiful ways to first get distances between surfaces. Like metric distances in surface space. And then join those distances up into local manifolds. It could have relevance to things we have been thinking about for a non-parametric version of The Cannon. It was a beautiful talk, with the theme or message that you do better in your science if you use mathematical tools that are matched well to the structure of your problem. That message is either obvious or deep. Or both! What a privilege to be there.
Today was a beautiful and accomplished PhD defense at NYU by Anna-Maria Taki (NYU). Taki is a particle phenomenologist who is looking at signatures of dark matter in the ESA Gaia data. She is concentrating on methods that relate to gravitational lensing: In addition to magnification changes, lensing can induce artificial proper motions and artificial accelerations in the stars. Indeed, Jupiter and Saturn have huge gravitational-lensing signatures at Gaia precision, and they are calibrated out. But if there are dark-matter substructures (say) between us and the SMC or LMC, we could see them in principle as anomalies in the Gaia data. Taki has developed matched filters and statistical techniques for finding the signatures. No detections yet! But there is a hope that an end-of-mission Gaia search could be very interesting.
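To get a feel for the signal sizes, here is the textbook point-lens version: the Einstein radius, and the centroid shift of the unresolved images, u θ_E / (u² + 2), for a lens–source separation u in units of θ_E. A moving lens makes that shift change with time, which shows up as an apparent proper motion. The mass, distances, and relative motion below are hypothetical, and real subhalos are extended rather than point masses, which changes the numbers a lot. This is not Taki's matched filter.

```python
import numpy as np

G_MSUN_OVER_C2 = 1476.6                    # G * M_sun / c^2 in metres
KPC = 3.0857e19                            # metres per kpc
RAD_TO_MAS = np.degrees(1.0) * 3600.0e3    # radians to milliarcseconds


def einstein_radius_mas(mass_msun, d_lens_kpc, d_source_kpc):
    """Point-lens Einstein radius: theta_E^2 = (4 G M / c^2) * D_ls / (D_l * D_s)."""
    d_ls = d_source_kpc - d_lens_kpc
    theta_sq = 4.0 * G_MSUN_OVER_C2 * mass_msun * d_ls / (d_lens_kpc * d_source_kpc * KPC)
    return np.sqrt(theta_sq) * RAD_TO_MAS


def centroid_shift(u):
    """Shift of the unresolved image centroid (units of theta_E) at separation u (units of theta_E)."""
    return u / (u ** 2 + 2.0)


# Hypothetical example: a 1e4 M_sun point mass at 10 kpc in front of a source at 50 kpc,
# passing at impact parameter 3 theta_E with relative proper motion 4 mas/yr.
theta_E = einstein_radius_mas(1e4, 10.0, 50.0)
t = np.linspace(-20.0, 20.0, 4001)                                   # years
sep = np.stack([np.full_like(t, 3.0), 4.0 * t / theta_E], axis=1)    # (source - lens) / theta_E
u = np.linalg.norm(sep, axis=1)
shift = theta_E * centroid_shift(u)[:, None] * sep / u[:, None]     # mas, pointing away from the lens
apparent_pm = np.gradient(shift, t, axis=0)                          # mas/yr
print(f"theta_E = {theta_E:.1f} mas, peak shift = {np.max(np.linalg.norm(shift, axis=1)):.2f} mas, "
      f"peak induced proper motion = {np.max(np.linalg.norm(apparent_pm, axis=1)):.3f} mas/yr")
```

The familiar sanity check holds: a solar-mass lens at 4 kpc in front of a source at 8 kpc has θ_E of about 1 mas.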
In the discussion over champagne, I discussed with various people the idea that Taki's work could inspire a new small-explorer class NASA mission. If you could show that such a mission could definitively rule out the main predictions of lambda-CDM, that would be a competitive proposal, I think. And a beautiful experiment.
The day ended with a great and fun PhD candidacy exam by Paul McNulty (NYU). He is using data science and information theory to understand how neural activity relates to motor function in fruit-fly larvae. We discussed the sense in which such work is physics. It is, of course! But it's interesting how interdisciplinary physics has become.
Today was a day of applied math. In one instance, Rodrigo Luger (Flatiron) reformulated all of Doppler Imaging (and the Rossiter–McLaughlin effect) into a simpler form using a model that is (essentially) the outer product of spherical harmonics on the surface of the star and a Fourier-transform basis in the spectral domain. This permits him to do the operations of Doppler imaging extremely fast and with no explicit numerical integrals on the surface of the star. We'd like to implement this in some mash up of Starry and wobble.
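Part of why a Fourier basis in the spectral direction makes this fast is the shift theorem: on a uniform grid in ln(wavelength), a Doppler shift is a translation, and a translation is a multiplication by a phase in Fourier space. So summing many shifted copies of a spectrum (one per surface patch) becomes one product. This sketch shows only that piece, with hypothetical integer-pixel shifts and weights; in Luger's formulation the weights would come from the spherical-harmonic map, which is not implemented here.

```python
import numpy as np

# Spectrum on a uniform grid in ln(wavelength): a Doppler shift is then a pure translation.
n_pix = 1024
x = np.arange(n_pix)
rest_spectrum = (1.0 - 0.6 * np.exp(-0.5 * ((x - 300) / 3.0) ** 2)
                 - 0.4 * np.exp(-0.5 * ((x - 700) / 4.0) ** 2))

# Hypothetical surface patches: line-of-sight velocity (in pixels) and brightness weight.
shifts = np.array([-12, -6, 0, 5, 11])
weights = np.array([0.1, 0.25, 0.3, 0.25, 0.1])

# Direct approach: shift and sum the patch spectra (periodic boundaries).
direct = sum(w * np.roll(rest_spectrum, s) for w, s in zip(weights, shifts))

# Fourier approach: a shift by s multiplies the transform by exp(-2 pi i f s),
# so the disk-integrated spectrum is a single product in Fourier space.
freqs = np.fft.fftfreq(n_pix)
kernel_ft = np.sum(weights[:, None] * np.exp(-2j * np.pi * freqs[None, :] * shifts[:, None]), axis=0)
fourier = np.fft.ifft(np.fft.fft(rest_spectrum) * kernel_ft).real
print("max difference:", np.max(np.abs(direct - fourier)))
```

With non-integer shifts the Fourier version still works (it is a band-limited shift), while np.roll does not, which is another point in its favor.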
In another instance, Kathryn Johnston (Columbia) realized (or really this kind of thing is obvious to her) that my project with Suroor Gandhi (NYU) to model The Snail (the vertical phase spiral in the local Milky Way disk) can be implemented in a potential expansion, expanding in powers away from a simple harmonic oscillator potential. That is much more general than what we were doing, and contains what we were doing within it as a special case. That led to all sorts of talk about what kinds of expansions we might be able to do. Like can we expand a non-equilibrium galaxy in small terms away from an equilibrium galaxy? That's something that Johnston was talking about on Friday in her weekly Dynamics Group meeting.
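The key physical ingredient of such an expansion fits in a few lines: once you add terms beyond the harmonic oscillator, the vertical frequency depends on amplitude, and amplitude-dependent frequencies are what wind a perturbation up into a phase spiral. This sketch assumes a hypothetical truncated potential Φ(z) = ν² z²/2 + ε z⁴ and computes the period by quadrature; it is not the model Gandhi and I fit.

```python
import numpy as np


def vertical_period(amplitude, nu=1.0, eps=0.0, n_grid=4000):
    """Period of vertical oscillation in Phi(z) = nu^2 z^2 / 2 + eps z^4 (hypothetical truncation).

    T = 4 * integral_0^A dz / sqrt(2 (Phi(A) - Phi(z))); substituting z = A sin(t)
    removes the endpoint singularity, then a midpoint rule in t.
    """
    def phi(z):
        return 0.5 * nu ** 2 * z ** 2 + eps * z ** 4

    dt = 0.5 * np.pi / n_grid
    t = (np.arange(n_grid) + 0.5) * dt
    z = amplitude * np.sin(t)
    integrand = amplitude * np.cos(t) / np.sqrt(2.0 * (phi(amplitude) - phi(z)))
    return 4.0 * np.sum(integrand) * dt


amplitudes = np.array([0.1, 0.5, 1.0, 2.0])
T_harm = np.array([vertical_period(a) for a in amplitudes])
T_anh = np.array([vertical_period(a, eps=0.05) for a in amplitudes])
print("harmonic periods:  ", np.round(T_harm, 4))
print("anharmonic periods:", np.round(T_anh, 4))
```

In the pure harmonic case every star has the same period and nothing winds up; the ε term is the smallest change that produces a snail.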
I sometimes work long days, but I try not to fill my weekend with working. This weekend, however, I had promised myself that I would take the notes from the SDSS-V review I led in Denver and turn them into a draft report for the project. I can't believe it, but I succeeded!
Are you going to be on a review panel? I have advice (unsolicited advice, which I try not to give, but after this weekend, I can't help myself):
Make sure you do lots of writing while in session. If you just listen and talk for the period of the review, you leave the review with nothing written, and then you have to reconstruct it from memory and your notes. Instead, schedule executive-session writing time during the review and come away from the review with everyone's notes and comments compiled into one jointly editable document. I learned this from Mike Hauser (formerly STScI), who chaired the Spitzer Oversight Committee for many years.
Act fast. If you don't write your report immediately, you will never write it. So kill your procrastination and write the hell out of it immediately. And then your panel members will be so shocked at your turnaround time, they will be inspired to act fast themselves. They will take the draft you write fast and turn it into a final version.
Be helpful and constructive. Think carefully about and (more importantly) discuss with the team precisely what they want out of the report and what they can do. Make sure you are answering the questions they want answered, and that the answers you give can be implemented usefully and without huge burden. Reports from reviews are about the future, not the past.
I love the SDSS-V Project and Collaboration and I want both to succeed. I very much hope that what we have written will help them meet their goals.
My day started with Ana Bonaca (Harvard) telling me about an external-galaxy stream: a stellar stream around an external galaxy, found by the Dragonfly telescope. She was able to make a nice model of it! And that fit required that the dark-matter halo be flattened (which is interesting). But what we discussed was (you guessed it) information theory: What can you learn about an external galaxy from seeing a stream in imaging? And what if you get a few radial velocities along that stream? This is a great set of questions, and builds on work we have been doing since the pioneering papers of Johnston and Helmi so many years ago.
Today was day two of the SDSS-V Multi-object spectroscopy review. We heard about the spectrographs (APOGEE and BOSS), the full software stack, observatory staffing, and had an extremely good discussion of project management and systems engineering. On this latter point, we discussed the issue that scientists in academic collaborations tend to see the burdens of documenting requirements and interfaces as interfering with their work. Project management sees these things as helping get the work done, and on time and on budget. We discussed some of the ways we might get more of the project—and more of the scientific community—to see the systems-engineering point of view.
The panel spent much of the day working on our report and giving feedback to the team. I am so honored to be a (peripheral) part of this project. It is an incredible set of sub-projects and sub-systems being put together by a dream team of excellent people. And the excellence of the people cuts across all levels of seniority and all backgrounds. My day ended with conversations about how we can word our toughest recommendations so that they will constructively help the project.
One theme of the day is education: We are educators, working on a big project. Part of what we are doing is helping our people to learn, and helping the whole community to learn. And that learning is not just about astronomy. It is about hardware, engineering, documentation, management, and (gasp) project reviewing. That's an interesting lens through which to see all this stuff. I love my job!
Today was day one of a review of the SDSS-V Multi-object spectroscopy systems. This is not all of SDSS-V but it is a majority part. It includes the Milky Way Mapper and Black-Hole Mapper projects, two spectrographs (APOGEE and BOSS), two observatories (Apache Point and Las Campanas), and a robotic fiber-positioner system. Plus boatloads of software and operations challenges. I agreed to chair the review, so my job is to lead the writing of a report after we hear two days of detailed presentations on project sub-systems.
One of the reasons I love work like this is that I learn so much. And I love engineering. And indeed a lot of the interesting (to me) discussion today was about engineering requirements, documentation, and project design. These are not things we are traditionally taught as part of astronomy, but they are really important to all of the data we get and use. One of the things we discussed is that our telescopes have fixed focal planes and our spectrographs have fixed capacities, so it is important that the science requirements both flow down from important scientific objectives, and flow down to an achievable, schedulable operation, within budget.
There is too much to say in one blog post! But one thing that came up is fundraising: Why would an institution join the SDSS-V project when they know that we are paragons of open science and that, therefore, we will release all of our data and code publicly as we proceed? My answer is influence: The SDSS family of projects has been very good at adapting to the scientific interests of its members and collaborators, and especially at weighting those adaptations in proportion to the amount of work that people are willing to do. And the project has spare fibers and spare target-of-opportunity capacity! So you get a lot by buying into this project.
Related to this: This project is going to solve a set of problems in how we do massively multiplexed heterogeneous spectroscopic follow-up in a set of mixed time-domain and static target categories. These problems have not been solved previously!
I spent time today on an airplane, writing in the papers I am working on with Jessica Birky (UCSD) and Megan Bedell (Flatiron). And I read documents in preparation for the review of the SDSS-V Project that I am leading over the next two days in a Denver airport hotel.
This morning on my weekly call with Eilers (MPIA) we discussed the new scope of a paper about spiral and bar structure in the Milky Way disk. Back at the Gaia Sprint, we thought we had a big result: We thought we would be able to infer the locations of the spiral-arm over-densities from the velocity field. But it turned out that our simple picture was wrong (and in retrospect, it is obvious that it was). But Eilers has made beautiful visualizations of disk simulations by Tobias Buck (AIP), which show very similar velocity structure and for which we know the truth about the density structure. These visualizations say that there are relationships between the velocity structure and the density structure, but that those relationships evolve with time. We tried to write a sensible scope for the paper in this new context. There is still good science to do, because the structure we see is novel and spans much of the disk.
In my small amount of true research time today, I wrote an abstract for the information-theory (or is it data-analysis?) paper that Bedell and I are writing about extreme-precision radial-velocity spectroscopy. The question is: What is the best precision you can achieve, and what data-analysis methods saturate the bound? The answer depends, of course, on the kinds of noise you have in your data! Oh, and what counts as noise.
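For the simplest version of the question (white Gaussian noise, a perfectly known template, nothing else varying) the answer is the standard Fisher-information bound: a shift v maps F(λ) to F(λ(1 − v/c)), so the information is the sum over pixels of ((λ/c) dF/dλ)² / σ². This sketch computes that bound for a hypothetical synthetic spectrum. It is not the bound in our paper, whose whole point is that real noise (and what counts as noise) is more complicated.

```python
import numpy as np

C_KMS = 299_792.458  # speed of light, km/s


def rv_precision_bound(wavelength, flux, flux_err):
    """Cramer-Rao lower bound (km/s) on one radial velocity, white Gaussian noise, known template.

    A shift v maps F(lambda) -> F(lambda (1 - v/c)), so dF/dv = -(lambda / c) dF/dlambda,
    and the Fisher information is sum_i (dF_i/dv)^2 / sigma_i^2.
    """
    dflux_dv = -(wavelength / C_KMS) * np.gradient(flux, wavelength)
    fisher = np.sum((dflux_dv / flux_err) ** 2)
    return 1.0 / np.sqrt(fisher)


# Hypothetical synthetic spectrum: 40 Gaussian absorption lines, continuum S/N = 100 per pixel.
rng = np.random.default_rng(2)
wavelength = np.arange(5000.0, 5100.0, 0.01)  # Angstrom
flux = np.ones_like(wavelength)
for center in rng.uniform(5002.0, 5098.0, 40):
    flux -= 0.5 * np.exp(-0.5 * ((wavelength - center) / 0.05) ** 2)
flux_err = np.full_like(wavelength, 0.01)

sigma_v = rv_precision_bound(wavelength, flux, flux_err)
print(f"photon-noise RV bound: {1e3 * sigma_v:.2f} m/s")
```

A method saturates this bound when its variance equals 1/Fisher; a least-squares fit of a correctly shifted template does so asymptotically under these same assumptions.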
In an absolutely excellent Stars and Exoplanets Meeting, Rodrigo Luger (Flatiron) had everyone in the room (and that's more than 30 people) say what they plan to get done this summer!
Following that, Melissa Ness (Columbia) talked about the different alpha elements and alpha enhancement: Are all alpha elements enhanced the same way? Apparently models of type-Ia supernovae say that different alpha elements should form in different parts of the supernova, so it is worth looking to see if there are abundance differences in different alphas. The generic expectation is that there should be a trend with Z. She has some promising results from APOGEE spectra.
Mike Blanton (NYU) talked about how we figure out how to perform a set of multi-epoch, multi-fiber spectroscopic surveys in SDSS-V. He has a product called Robostrategy which tries to figure out whether a set of targets (with various requirements on signal-to-noise and repeat visits and cadence and so on) is possible to observe with the two observatories we have, in a realistic set of exposures. That's a really non-trivial problem! And yet it appears that Blanton may have working code. I'm impressed, because integer programming is hard.
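To see where the difficulty comes from, it helps to look at a case that is not hard. The sketch below is a drastically simplified, hypothetical single-field version (nothing like Robostrategy itself): E exposures, F fibers per exposure, and each target needing k visits in distinct exposures. Here a short counting argument shows that the plan exists if and only if no target needs more than E visits and the total demand fits in E × F fiber-exposures. A greedy rule (serve the targets owed the most visits first) then always finds a plan. Overlapping fields, cadence windows, and two observatories couple these little problems together, and that coupling is what makes the real thing a genuine integer program.

```python
from collections import Counter


def schedule_visits(required_visits, n_exposures, n_fibers):
    """Toy single-field plan: each exposure observes up to n_fibers distinct targets.

    Greedy rule: each exposure takes the targets with the most visits still owed.
    Returns a list of per-exposure target lists, or None if infeasible.
    """
    remaining = dict(required_visits)
    if (max(remaining.values(), default=0) > n_exposures
            or sum(remaining.values()) > n_exposures * n_fibers):
        return None
    plan = []
    for _ in range(n_exposures):
        owed = sorted((t for t in remaining if remaining[t] > 0), key=lambda t: -remaining[t])
        chosen = owed[:n_fibers]
        for t in chosen:
            remaining[t] -= 1
        plan.append(chosen)
    return plan


# Hypothetical example: four targets needing 3, 2, 2, and 1 visits; 3 exposures of 3 fibers.
plan = schedule_visits({"a": 3, "b": 2, "c": 2, "d": 1}, n_exposures=3, n_fibers=3)
visits = Counter(t for exposure in plan for t in exposure)
print(plan, dict(visits))
```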
And Shuang Liang (Stony Brook) showed us that it is possible to calibrate u-band observations using the main-sequence turn-off, as long as you account for the differences between the disk and the halo. He has developed empirical approaches, and he has good evidence that his calibration based on the MSTO is better than other more traditional methods!
I had conversations today with Megan Bedell (Flatiron) and Kate Storey-Fisher (NYU) about titles for their respective papers. I am slowly developing a whole theory of writing papers, which I wish I had thought about more when I was earlier in my career. I made many mistakes! My view is that the most important thing about a paper is the title. Which is not to say that you should choose a cutesy title. But it is to say that you should make sure the person scanning a listing of papers can estimate very accurately what your paper is about.
I then think the next most important thing is the abstract. Write it early, write it often. Don't wait until the paper is done to write the abstract! The abstract sets the scope. If you have too much to put into one abstract, split your paper in two. If you don't have enough, your paper needs more content. And unless you are very confident that there is a better way, obey the general principles (not necessarily the exact form) underlying the A&A structure of context, aims, methods, results, and conclusions.
Then the next most important thing is (usually) the figures and captions. My model reader looks at the title. If it's interesting, the reader looks at the abstract. If that's interesting, they look at the figures. If all that is interesting, maybe they will read the paper. Since we want our papers to be read, and we want to respect the time of our busy colleagues, we should make sure the title, abstract, and figures-plus-captions are well written, accurate, unambiguous, interesting, and useful.
So I spent time today working on titles.