Today, Lauren Anderson (Flatiron) showed me some awesome visualizations of the color-magnitude distribution (CMD) of stars as a function of position in the Galaxy. There are variations! The question is: What part of the variation is due to dust, what part is due to metallicity (or composition), and what part is due to star-formation history (or age)? We don't know how to answer these questions! But one thing we did was take some derivatives of the CMD with respect to position. Is this a sensible thing to do? Could we treat the CMD as data and try to fit it with a latent-variable model? That is, what is the right approach to quantifying and interpreting the CMD variations around the Galaxy?
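Here is a minimal sketch of what taking derivatives of the CMD with respect to position could look like. It is an illustration, not what we actually ran: the binning, the single position coordinate pos (which could be, say, Galactic longitude), and the simple finite differences from numpy.gradient are all assumptions. Each spatial cell's CMD is normalized to unit sum, so the derivative describes how the shape of the CMD changes with position, not how the number of stars changes.

```python
import numpy as np

def cmd_histograms(color, mag, pos, pos_edges, color_edges, mag_edges):
    """Normalized CMD (fraction of stars per color-magnitude bin) in each position bin."""
    color, mag, pos = (np.asarray(a, dtype=float) for a in (color, mag, pos))
    pos_edges = np.asarray(pos_edges, dtype=float)
    n_pos = len(pos_edges) - 1
    cmds = np.zeros((n_pos, len(color_edges) - 1, len(mag_edges) - 1))
    which = np.digitize(pos, pos_edges) - 1  # position-bin index of each star
    for k in range(n_pos):
        sel = which == k
        h, _, _ = np.histogram2d(color[sel], mag[sel], bins=[color_edges, mag_edges])
        total = h.sum()
        cmds[k] = h / total if total > 0 else h
    return cmds

def cmd_position_derivative(cmds, pos_edges):
    """Finite-difference derivative of the normalized CMDs with respect to position,
    evaluated at the position-bin centers (needs at least two position bins)."""
    pos_edges = np.asarray(pos_edges, dtype=float)
    centers = 0.5 * (pos_edges[:-1] + pos_edges[1:])
    return np.gradient(cmds, centers, axis=0)
```

Because every normalized CMD sums to one, the derivative summed over all color-magnitude bins vanishes in each cell; the pattern of positive and negative bins is what shows which way the populations shift. A latent-variable model would replace these raw differences with a fit, but the differences are a cheap first look.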
2018-05-31
2018-05-30
binary stars, spectroscopic parallaxes, planets
Andy Casey (Monash) is in town to work on Gaia DR2, and he has been looking at using the radial-velocity uncertainty (which, in the database, is really an empirical scatter across measurements) to identify binary stars. This is a great idea. I was pitching various ways to calibrate this quantity to make it more reliable, and then he reminded me that many binaries have semi-amplitudes of tens of km/s! Duh. The signal is super-strong. This is a great #GaiaSprint project!
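Why the signal is so strong: for a circular orbit sampled at random phases, the RV scatter is K/sqrt(2), so a 20 km/s semi-amplitude produces roughly 14 km/s of scatter, far above typical single-star noise. Below is a minimal sketch of a flag built on that, assuming (from my recollection of the Gaia DR2 documentation; check before use) that radial_velocity_error e relates to the per-transit scatter sigma as e^2 = (pi / 2N) sigma^2 + (0.11 km/s)^2, with N the number of transits. The function names and the factor-of-five threshold are hypothetical.

```python
import numpy as np

# Assumed Gaia DR2 calibration floor (km/s) in the radial_velocity_error formula.
RV_FLOOR_KMS = 0.11

def epoch_rv_scatter(rv_error, n_transits):
    """Recover the per-transit RV standard deviation by inverting the assumed
    relation rv_error**2 = (pi / (2 * N)) * sigma**2 + floor**2."""
    rv_error = np.asarray(rv_error, dtype=float)
    n = np.asarray(n_transits, dtype=float)
    excess = np.clip(rv_error**2 - RV_FLOOR_KMS**2, 0.0, None)
    return np.sqrt(excess * 2.0 * n / np.pi)

def flag_rv_variables(rv_error, n_transits, expected_scatter_kms, factor=5.0):
    """Hypothetical rule: flag stars whose recovered scatter exceeds `factor`
    times the scatter expected for a single star of that type."""
    sigma = epoch_rv_scatter(rv_error, n_transits)
    return sigma > factor * np.asarray(expected_scatter_kms, dtype=float)

# Circular orbit with K = 20 km/s, sampled uniformly in phase: scatter is K / sqrt(2).
phases = np.linspace(0.0, 2.0 * np.pi, 1000, endpoint=False)
print(np.std(20.0 * np.sin(phases)))
```

The calibration questions I was pitching (what the single-star expected scatter is, as a function of magnitude and color) only matter near the threshold; the tens-of-km/s binaries clear it easily.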
Christina Eilers (MPIA) and I had success today on spectroscopic parallaxes for stars at the top of the red-giant branch: We are now able to predict absolute luminosities (and therefore parallaxes) with almost 10-percent accuracy! That makes them only slightly worse than red-clump stars, and we think there is more information to exploit in the data. Our method is a bit hacky: We are still using spectroscopic quantities from the APOGEE pipelines and not just the spectra themselves, but it should point the way to a cleaner method soon.
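A minimal sketch of the general shape of such a label-based method, not our actual code: fit absolute magnitude as a (hypothetical) linear function of pipeline labels such as Teff, log g, and [Fe/H] on a training set with good Gaia parallaxes, then convert a predicted absolute magnitude M and an apparent magnitude m into a parallax with the distance modulus, parallax [mas] = 10^((M - m + 10)/5). The linear form and the function names are assumptions, and a real version must deal with parallax noise and extinction.

```python
import numpy as np

def design_matrix(labels):
    """Linear model: intercept plus one column per label (e.g. Teff, logg, [Fe/H])."""
    labels = np.atleast_2d(np.asarray(labels, dtype=float))
    return np.hstack([np.ones((labels.shape[0], 1)), labels])

def fit_absolute_magnitude(labels, abs_mag):
    """Ordinary least squares for absolute magnitude as a linear function of labels.
    This ignores parallax uncertainties and the bias from converting noisy
    parallaxes into magnitudes; a real analysis should not."""
    coeffs, *_ = np.linalg.lstsq(design_matrix(labels), abs_mag, rcond=None)
    return coeffs

def predict_parallax_mas(coeffs, labels, app_mag):
    """Distance modulus m - M = 5 log10(d / 10 pc), with d [pc] = 1000 / parallax [mas]."""
    abs_mag = design_matrix(labels) @ coeffs
    return 10.0 ** ((abs_mag - np.asarray(app_mag, dtype=float) + 10.0) / 5.0)
```

As a check on the conversion: M = m means 10 pc and hence 100 mas, and m - M = 5 means 100 pc and 10 mas.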
Stars Meeting at Flatiron was a great success. In one exciting project in progress, Ruth Angus (Columbia) is finding relationships between exoplanet occurrence and host-star orbital actions! Now for the causal part: Is this because of age, or abundances, or dynamical interactions? In another, Ben Montet (Chicago) proposed that we find non-transiting hot-jupiter exoplanets by looking at surface rotation: There is at least weak evidence that stars with hot jupiters spin faster (or appear to). That's exciting as another possible indirect planet-detection technique.
2018-05-29
Gaia sprinting
Today was the first day of what's looking like it's going to be a very full week! Andy Casey (Monash) arrived in town, to work on binary stars and self-calibration of stellar parameter pipelines. Natalie Hinkel (Vanderbilt) showed up, to also work on self-calibration, in the context of her Hypatia Catalog. Christina Eilers (MPIA) showed up to work on data-driven approaches to estimating parallaxes from spectroscopy, which is a project we are re-booting from last summer. Alex Malz (NYU) dropped by to discuss issues with inferring redshift distributions from biased photometric redshift measurements. Kate Storey-Fisher (NYU) came to talk about mock catalogs for large-scale structure. And I continued conversations with other group members about their #GaiaSprint projects for next week. By the end of the day, Eilers and I were getting parallax predictions at the 20-ish-percent level, but we need to do a factor of two better!
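One way to phrase self-calibration for a compilation like the Hypatia Catalog, offered as a hypothetical sketch rather than Hinkel's procedure: model each measurement of star s by source j as y = x_s + b_j + noise, with unknown true values x_s and per-source offsets b_j, and solve for everything jointly by least squares. The offsets are degenerate with the x_s (add a constant to one, subtract it from the other), so the sketch fixes that with the assumed constraint that the offsets sum to zero.

```python
import numpy as np

def self_calibrate(star_idx, source_idx, values, n_stars, n_sources):
    """Solve values[k] ~ x[star_idx[k]] + b[source_idx[k]] for x and b,
    with the gauge constraint sum(b) = 0 appended as an extra row."""
    star_idx = np.asarray(star_idx)
    source_idx = np.asarray(source_idx)
    n_obs = len(values)
    A = np.zeros((n_obs + 1, n_stars + n_sources))
    A[np.arange(n_obs), star_idx] = 1.0
    A[np.arange(n_obs), n_stars + source_idx] = 1.0
    A[n_obs, n_stars:] = 1.0  # constraint row: offsets sum to zero
    rhs = np.append(np.asarray(values, dtype=float), 0.0)
    params, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return params[:n_stars], params[n_stars:]
```

A real version would weight by the reported uncertainties and would have to cope with sources that overlap in only a few stars, which is where the interesting problems live.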
2018-05-25
gravitational clustering, gravitational interferometry
Today Michael Joyce (LPNHE) gave a great talk about analytic and conceptual directions towards understanding nonlinear gravitational growth of structure in the Universe. He focused on the stable-clustering approximation, which dates back to Peebles, is very predictive over a range of scales, and can be used to test simulations. At lunch afterwards, we discussed the great importance of studying gravity analytically, a point made often and well by Roman Scoccimarro (NYU).
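For concreteness, the classic prediction of stable clustering combined with self-similar growth: for a scale-free initial power spectrum P(k) proportional to k^n, the nonlinear two-point correlation function goes as xi(r) proportional to r^(-gamma) with gamma = 3(3 + n)/(5 + n). The helper below only evaluates that slope; it is an illustration of the kind of analytic prediction that can be held up against simulations, not anything taken from Joyce's talk.

```python
from fractions import Fraction

def stable_clustering_slope(n):
    """Slope gamma in xi(r) ~ r**(-gamma) predicted by stable clustering plus
    self-similar evolution, for an initial power spectrum P(k) ~ k**n."""
    if n <= -3:
        raise ValueError("need n > -3 so that fluctuations decrease with scale")
    return Fraction(3) * (3 + n) / (5 + n)

for n in (-2, -1, 0):
    print(n, stable_clustering_slope(n))
```

So n = -2, -1, and 0 give slopes of 1, 3/2, and 9/5; a simulation of a scale-free model that does not approach these slopes on small scales is either violating stable clustering or not resolving it.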
Prior to the seminar, Ellie Schwab-Abrams (AMNH) and I discussed self-calibration for pulsar timing arrays, which we think and hope could lead to a new era of gravitational interferometry and enormously increase the sensitivity to long-term gravitational-wave signals. We decided to start by solving the equivalent radio-astronomy problem, which has yet to be solved in the literature because no radio telescope has the problem that the relative velocities of its elements are unknown!
2018-05-24
interpreting and combining imaging
My day started with two great conversations about imaging and data science. In the first, Marc Gershow (NYU) and Rui Wu (NYU) came to talk to me about their project with fruit-fly larvae (mentioned in my post of 2018-05-11). They do a multi-step dimensionality reduction on the (immense amounts of) video data they have and then look for discrete behaviors and discrete changes in behavior. We discussed the possibility of re-casting all or part of this problem as a regression, which would be fast and interpretable. It maps onto a lot of problems we are working on with stars, especially The Cannon and its derivatives.
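To make the regression idea concrete: The Cannon models every pixel of a spectrum as a quadratic function of a few labels, and fits each pixel's coefficients by linear least squares on a training set. Below is a minimal sketch of that training step under simplifying assumptions (no per-pixel scatter term, no uncertainties); mapping larval video features onto the "pixels" and behavioral variables onto the "labels" is a hypothetical analogy, not anything Gershow and Wu have done.

```python
import numpy as np

def quadratic_features(labels):
    """Design matrix [1, l_i, l_i * l_j (i <= j)] for each row of labels."""
    labels = np.atleast_2d(np.asarray(labels, dtype=float))
    n, k = labels.shape
    cols = [np.ones(n)] + [labels[:, i] for i in range(k)]
    cols += [labels[:, i] * labels[:, j] for i in range(k) for j in range(i, k)]
    return np.column_stack(cols)

def train(labels, fluxes):
    """Fit coefficients for all pixels at once: fluxes ~ features @ coeffs.
    `fluxes` has shape (n_objects, n_pixels); returns (n_features, n_pixels)."""
    coeffs, *_ = np.linalg.lstsq(quadratic_features(labels), fluxes, rcond=None)
    return coeffs

def predict(coeffs, labels):
    """Generative step: predicted pixel values for given labels."""
    return quadratic_features(labels) @ coeffs
```

The speed comes from the training step being linear; inferring labels for a new object is then a small nonlinear optimization against predict, and interpretability comes from every coefficient being attached to a named label.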
In the second conversation, Mike Blanton (NYU) and Dou Liu (NYU) discussed with me their project to generalize spectro-perfectionism into a method (mentioned in my post of 2018-05-14) for combining badly and irregularly sampled imaging. We discussed a part of the problem that I was discussing many moons ago with Sam Roweis and Adam Bolton: The method involves a renormalization step, and this renormalization is very sensitive to details. Indeed, when Roweis died, he and I were looking at whether we could replace this step with something that makes more sense. I advised Blanton and Liu to take a calibration approach, in which they run the method not on the flatfield-normalized spectral data but separately on the raw data and on the flatfield data, and then deliver the ratio of the results. For now. Until we can figure out the Right Thing To Do (tm).
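Here is where, as I understand the original spectro-perfectionism formulation, the renormalization sits. This is a sketch with a generic linear forward model, not Blanton and Liu's code. With data d = A f + noise and noise inverse variances on the diagonal of N^-1, the deconvolved estimate f_hat has inverse covariance C^-1 = A^T N^-1 A. Take the symmetric square root Q of C^-1, normalize its rows to sum to one to get a resolution matrix R, and report R f_hat, whose covariance is diagonal.

```python
import numpy as np

def reconvolve(A, inv_var, data):
    """Spectro-perfectionism-style extraction for d = A f + noise.

    Returns the reconvolved estimate R @ f_hat, its (diagonal) inverse variance,
    and the resolution matrix R."""
    Ninv = np.diag(inv_var)
    Cinv = A.T @ Ninv @ A                       # inverse covariance of f_hat
    f_hat = np.linalg.solve(Cinv, A.T @ Ninv @ data)
    # Symmetric square root of Cinv via its eigendecomposition.
    w, V = np.linalg.eigh(Cinv)
    Q = V @ np.diag(np.sqrt(w)) @ V.T
    s = Q.sum(axis=1)                           # renormalization: rows of R sum to 1
    R = Q / s[:, None]
    f_tilde = R @ f_hat
    ivar_tilde = s**2                           # cov(R f_hat) = diag(1 / s**2)
    return f_tilde, ivar_tilde, R
```

The covariance works out because Q C Q is the identity, leaving only the diagonal factors from the row sums s. It also shows one concrete way the step is fragile: if any row sum of Q is near zero, the corresponding row of R blows up.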
2018-05-23
Multi-messenger astrophysics
I spent the day at a workshop at UMD on the connections between (what I would call) data science and (what the community calls) multi-messenger astrophysics (MMA). Multi-messenger means that information comes not just from photons, but maybe also from neutrinos, or cosmic rays, or (importantly these days) gravitational waves. The idea is for a group of people to build a white paper that makes the case for joint programs between data-intensive research and MMA research. I spent some of my capital at the meeting arguing that we should focus on the whole scientific process, which includes not just the discovery of transient events but also continuous-wave sources, sub-threshold populations, and all the machinery we need for inference and modeling. The future of data science is putting immensely complicated physical simulations inside the inference loop (which might be an optimization process or a sampling process). I learned a lot, both about MMA and about the NSF, who proposed that we hold the meeting: They use this kind of input in making high-level funding decisions. That is, programmatic decisions. By the end of the day we had 30 pages of text. In one day!
2018-05-22
#GaiaSprint prep
Today was a low-research day! But I did start the organizational processes for the #GaiaSprint, for which I am extremely excited. This may not be research by my rules, but it sure will create new research.
2018-05-21
dimensionless numbers
My morning started with a great call with Ana Bonaca (Harvard): She can make a very simple simulation of a cold stream that creates a gap in the stream and a spur of stars off the stream from the gap, looking extremely similar to the features found by her and Price-Whelan in the GD-1 stream in Gaia. We are looking at doing a little perturbative physics and building an atlas of possible stream features, in different physical regimes. We discussed the first set of figures a bit. Part of the point of the project is to work out all the dimensionless numbers that put stream interactions (with, say, dark-matter substructures) into different calculational regimes.
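As an example of the kind of dimensionless number involved (my choice of example, not necessarily one the project will use): in the impulse approximation, a point mass M passing a stream at impact parameter b with relative speed w gives stream stars at closest approach a velocity kick of about 2GM/(b w). Dividing that by the stream's internal velocity dispersion gives a ratio that separates barely perturbed encounters from strongly disruptive ones. The parameter values below are purely illustrative.

```python
# Gravitational constant in kpc (km/s)^2 / Msun.
G_KPC_KMS2_PER_MSUN = 4.30091e-6

def impulse_kick_kms(mass_msun, impact_kpc, rel_speed_kms):
    """Velocity kick (km/s) to a star at closest-approach distance b from a
    point mass, in the impulse approximation: 2 G M / (b w)."""
    return 2.0 * G_KPC_KMS2_PER_MSUN * mass_msun / (impact_kpc * rel_speed_kms)

def kick_to_dispersion(mass_msun, impact_kpc, rel_speed_kms, sigma_stream_kms):
    """Dimensionless ratio of the kick to the stream's velocity dispersion."""
    return impulse_kick_kms(mass_msun, impact_kpc, rel_speed_kms) / sigma_stream_kms

# Illustrative values only: 1e7 Msun perturber, 0.1 kpc impact, 200 km/s, 1 km/s dispersion.
print(kick_to_dispersion(1e7, 0.1, 200.0, 1.0))
```

The impulse approximation itself brings in another dimensionless number, the encounter time b/w compared with the stream's internal dynamical times, which is exactly the kind of regime boundary an atlas would need to map.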
2018-05-18
group meeting, self-calibration for GR, writing
Today we had the first-ever Astronomical Data Group Meeting. The rules are: You must bring a plot, and you get (1 hr)/N of time for feedback, where N is the number of people in the room. It was fun: All of the plots (even Foreman-Mackey's) related to the Gaia DR2 data. I asked the crew: Are the stars below the main sequence in the Gaia color–magnitude diagram very low in metallicity? And if so, shouldn't we take spectra? Anderson thinks they may just be artifacts of crowded fields. That is, issues in the data. Problems with chasing outliers!
After that I had long sessions with Ellie Schwab-Abrams (CUNY) and Jonathan Bird (Vanderbilt). Schwab-Abrams and I are trying to convert my question about self-calibrating gravitational-wave pulsar-timing arrays into the equivalent question about self-calibrating radio telescopes. It is very similar! But we have to take into account the 6-space (position and velocity) coordinates of the elements, not just their 3-space positions, and we also have to deal with light-travel-time issues that we can't control with delay lines! But the payoff is immense: I naively expect an increase in the sensitivity of the arrays by a factor of more than a billion if we can do it. Yes, I said billion. I hope I'm right.
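For reference, a toy version of the radio problem we are starting from, not our pulsar-timing extension: for an unresolved point source of unit flux at the phase center, the visibility between antennas i and j is V_ij = g_i conj(g_j), with unknown complex antenna gains g. The visibility matrix is then rank one, so its leading eigenvector recovers the gains up to a single unobservable overall phase. This sketch assumes noise-free data and includes the autocorrelations on the diagonal, which real arrays usually exclude.

```python
import numpy as np

def point_source_visibilities(gains):
    """Noise-free visibilities V[i, j] = g_i * conj(g_j) for a unit point source."""
    return np.outer(gains, np.conj(gains))

def solve_gains(vis):
    """Rank-one solution: the leading eigenvector of the Hermitian matrix V,
    scaled by sqrt(eigenvalue), with antenna 0 fixed to zero phase."""
    w, vecs = np.linalg.eigh(vis)               # eigenvalues in ascending order
    g = vecs[:, -1] * np.sqrt(w[-1])
    return g * np.exp(-1j * np.angle(g[0]))     # remove the unobservable global phase
```

In this radio version the element positions and velocities are known and only the gains are unknown; in the pulsar-timing version the relative velocities of the elements are themselves among the unknowns, which is what makes it new.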
Bird is finishing his paper on the age–velocity relationship in the disk. We went over discussion points. I recommended explicitly challenging the assumptions and saying what we think would happen if we relaxed them, both in terms of the results and in terms of model complexity. My problem (as it often is in projects) is that I care about the method much more than the astrophysical results.
2018-05-17
disk heating; cutting bait
Jonathan Bird (Vandy) is in town for two days, to finish a paper on heating in the Milky Way disk. The model is a hierarchical probabilistic model that generates the ages and vertical velocities of all the red-clump stars in a big part of the APOGEE data, where the ages come from Ness's estimates based on C and N abundances, and the velocities come from APOGEE and Gaia. He gets very precise answers! But there are deviations between the data and the model in the space of the data, and we debated how important these are to our conclusions.
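A toy version of the kind of likelihood such a model contains, with a functional form I am assuming purely for illustration (it is not Bird's model): vertical velocities are Gaussian with zero mean and a dispersion that grows with age as a power law, sigma(tau) = sigma_10 (tau / 10 Gyr)^beta, broadened by each star's velocity uncertainty. The real model is hierarchical and also propagates the age uncertainties; this sketch treats ages as known.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(params, ages_gyr, vz_kms, vz_err_kms):
    """Gaussian likelihood for v_z with dispersion sigma_10 * (age / 10 Gyr)**beta,
    broadened by the measurement uncertainty of each star."""
    log_sigma10, beta = params
    sigma = np.exp(log_sigma10) * (ages_gyr / 10.0) ** beta
    var = sigma**2 + vz_err_kms**2
    return 0.5 * np.sum(vz_kms**2 / var + np.log(2.0 * np.pi * var))

def fit_heating_law(ages_gyr, vz_kms, vz_err_kms):
    """Maximum-likelihood (sigma_10, beta), starting from a generic guess."""
    result = minimize(neg_log_likelihood, x0=[np.log(20.0), 0.3],
                      args=(ages_gyr, vz_kms, vz_err_kms), method="Nelder-Mead")
    log_sigma10, beta = result.x
    return np.exp(log_sigma10), beta
```

A toy like this also makes the data-space check concrete: simulate velocities from the best-fit parameters and compare their distribution with the observed one, which is where deviations like the ones we debated show up.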
Lauren Anderson (Flatiron) decided today that she has to down-select from many Gaia DR2 projects to one single Gaia DR2 project. Good idea! And in discussing this with her, I realized that I also needed to do this. We didn't get to final decisions.
2018-05-16
6-volume, myspace, rules, tellurics
Too many things today for one blog post! So just a rapid-fire list. Matt Buckley (Rutgers) and Adrian Price-Whelan (Princeton) and I discussed whether we could, in practice, measure phase-space six-volumes given a point-set in Gaia or a future data set. It isn't clear, so we started by designing some extremely simple simulations to test.
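One extremely simple test of the kind we started designing (my illustration, not the actual simulations): draw points uniformly from a six-dimensional box of known volume and compare that volume with the volume of the points' convex hull, computed with scipy's ConvexHull (Qhull). For points drawn from a convex region the hull volume can only underestimate, and in six dimensions it converges to the truth slowly as the sample grows. The box shape and sample sizes are arbitrary choices.

```python
import numpy as np
from scipy.spatial import ConvexHull

def hull_volume(points):
    """Volume of the convex hull of a point set; for points drawn from a convex
    region this is a lower bound on the region's volume."""
    return ConvexHull(points).volume

rng = np.random.default_rng(42)
# Hypothetical test region: a 6-D box, sides 2 in position and 0.5 in velocity.
sides = np.array([2.0, 2.0, 2.0, 0.5, 0.5, 0.5])
true_volume = np.prod(sides)
for n in (100, 300):
    pts = rng.uniform(0.0, 1.0, size=(n, 6)) * sides
    print(n, hull_volume(pts) / true_volume)
```

Real phase-space structures are not convex boxes, so the convex hull is only a starting point; the test's job is to show how badly even the easy case behaves at realistic sample sizes.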
Price-Whelan and I discussed our myspace project to find the nonlinear transformation of the phase-space data near the Sun to make the phase-space structure as compact or informative as possible. We have a plan for implementation of the data-science side of the project, but we have no idea whether anything we find will be interpretable!
We had our first Stars Meeting under the new rules that we established last week. The objectives are, more-or-less: We want the presenters to be less prepared and we want the audience to be more engaged. We created some rules or guidelines to help achieve these objectives. And the meeting went well! Among other things that happened in this meeting, Price-Whelan showed a forming star cluster he found in the Milky Way halo, possibly connected to the Magellanic gas stream, and John Brewer (Yale) showed micro-tellurics (tiny atmospheric absorption lines) found in some of the very first R=150,000 EXPRES spectra.
On that last point: Brewer found these tellurics by observing a B star, which has no narrow lines (and almost no lines at all), so the narrow absorption lines must be intervening. Megan Bedell (Flatiron) has a data-driven method for finding tellurics even in very featured, narrow-lined spectra, by exploiting the causal structure: Star lines move with the star, atmosphere lines move with the atmosphere! She confirms, at least qualitatively, at least some of Brewer's lines. I expect that we have some nice points to make in the comparison.
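A toy version of that causal idea, far simpler than Bedell's actual method and with made-up line parameters: work in log flux on a log-wavelength grid, where a Doppler shift is a translation and absorption lines add. Shift every epoch into the stellar rest frame and average to get a stellar template; then average the residuals in the observatory frame to get a telluric template; alternate a few times. Anything that stays put in the observatory frame ends up in the telluric template.

```python
import numpy as np

C_KMS = 299792.458

def shift(log_flux, x, v_kms):
    """Doppler-shift a log-flux spectrum sampled at log-wavelengths x by v (km/s),
    using the non-relativistic factor (1 + v/c); positive v is a redshift."""
    dx = np.log1p(v_kms / C_KMS)
    return np.interp(x, x + dx, log_flux)

def separate(x, log_fluxes, star_rv_kms, n_iter=5):
    """Alternately estimate a stellar template (star frame) and a telluric
    template (observatory frame) from multi-epoch log-flux spectra."""
    telluric = np.zeros_like(x)
    for _ in range(n_iter):
        # Remove the current telluric estimate, move to the star frame, average.
        star = np.mean([shift(f - telluric, x, -v)
                        for f, v in zip(log_fluxes, star_rv_kms)], axis=0)
        # Remove the shifted stellar model, average in the observatory frame.
        telluric = np.mean([f - shift(star, x, v)
                            for f, v in zip(log_fluxes, star_rv_kms)], axis=0)
    return star, telluric
```

The separation only works if the stellar velocities (including the barycentric motion) span many line widths; with a B star, as in Brewer's approach, there is almost nothing stellar to separate in the first place.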
Oh, and: Unmodeled telluric absorption might be the limiting systematic in exoplanet RV surveys, right now or in the near future.
2018-05-15
Argh proposals!
My day was made low-research by the realization on waking that NASA ADAP proposals are due on Thursday, and not next week as I had (perhaps self-servingly) believed. That blew most of my day. The only research highlight was giving an informal talk at the NYU Center for Data Science, where I gave the crowd some idea of why and where we do data science in astronomy.
2018-05-14
integral-field spectroscopy
Today Dou Liu (NYU) gave a presentation of his thesis research as part of his candidacy exam. His first project has been to adapt the ideas in spectro-perfectionism from the spectral domain to the spatial domain to combine irregular, dithered imaging. He is applying this to integral-field spectroscopic data in MaNGA, which is part of SDSS-IV. He showed that he can get better angular resolution than the standard data-analysis methods, which are generally radial-basis-function interpolations of the data. One of his goals is to produce generally useful tools. Another is to re-process all of the MaNGA data. A great contribution, betterizing existing data that have already been hugely productive.
2018-05-11
data science and larva behavior
I had the great pleasure of being on the oral qualifying exam for Rui Wu (NYU), who is looking at the behavior and neural computation of fruit-fly larvae. She told us about her research so far (in preparation for her PhD project), in which she has built a fully data-driven model of larval behavior that classifies multiple distinct behaviors without supervision. She can also show that changes in behavior are correlated with changes in the stimuli presented to the larvae. She did all this by dimensionality-reducing video data with a set of clever techniques.
I learned an immense amount in her seminar. One thing is that they can genetically modify the larvae so that their olfactory senses can be stimulated with light! That's crazy, but it makes for better experimental techniques. Another is that they can read from individual neurons while simultaneously monitoring large-scale behavior. The fly is a model neural system that does complex things with very few neurons, so there is hope of reverse-engineering the full computation. A truly out-there idea is that if the computation and behavior are understood, the larvae could be controlled or driven like an engineering system.
2018-05-10
moving things to the right of the bar
At lunch I gave the Flatiron CCA a taste of the science going on with Gaia DR2 in a short lunch talk. And before that I prepared my slides. All that counts as research by my rules.
My research highlight of the day, however, was a conversation with Neige Frankel (MPIA) about her probabilistic model for radial migration in the Milky Way disk. She doesn't have a good quantitative model for the selection function for her data, so she doesn't want her model to generate the three-space positions of the stars in her sample. At the same time, the structural parameters of the Milky Way disk are important. So she wants a model for the stellar properties, conditioned on the stellar positions. There are two ways to do this. The first is to write a graphical model where arrows only come from (and never go to) the stellar positions. We did that, but Frankel doesn't like that option, because the model only has simple analytic form when the arrows go to the positions.
The other option is to use the factorization formula p(a, b) = p(a|b) p(b). The stellar positions can be moved from the left side of the vertical bar to the right side by dividing by a pdf for the positions. We wrote that down, drew the relevant graphical models, and discussed changes to her text. She has beautiful results, TBA.
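Written out with the model parameters theta made explicit (my notation, a sketch of the identity rather than Frankel's actual model), with x the stellar properties and r the stellar positions: the conditional model is the joint model divided by its own marginal in the positions. For the result to be the conditional of the same joint model, the denominator has to be that model's marginal in r, which in general depends on theta and so enters the likelihood.

```latex
p(x \mid r, \theta) = \frac{p(x, r \mid \theta)}{p(r \mid \theta)},
\qquad
p(r \mid \theta) = \int p(x, r \mid \theta)\, \mathrm{d}x,
\qquad
\ln \mathcal{L}(\theta) = \sum_{n} \left[ \ln p(x_n, r_n \mid \theta) - \ln p(r_n \mid \theta) \right]
```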