halo and Gaia

Lauren Anderson (Flatiron) and I looked at some simple selections of M-type giant stars in the Milky Way halo, to see if we see Sagittarius and other halo structures. They didn't jump out as obviously as I expected! But then we looked back at the Majewski et al paper and saw that their color selection was certainly not trivial! At the suggestion of Hans-Walter Rix (MPIA), we also looked at the Gaia CMD paper and the bimodal halo stars that were shown at the DR2 press conference. I am sure that the Milky Way halo will be full of interesting things! Now let's find them.

Also Megan Bedell (Flatiron) and I discussed our BetterTogether project, which has multiple goals. Her work today is to find planet hosts with comoving companions. One issue is that Kepler has such low angular resolution that the matching between Kepler and Gaia might be difficult or ambiguous. Something to think about, given that we can't really re-photometer either dataset.


#GaiaDR2 zero-day workshop, day 3

Today was the third and final day of our Gaia DR2 zero-day workshop. My goodness it was a fun week. Many of the participants told me that they would remember this week for the rest of their lives! Now that's not something I hear every day. And the participants here in NYC were very much focused on learning what is in the data, and exploring the data. There was no sense of trying to rush out publications or results. I loved the atmosphere.

In my own research today, I worked with Dustin Lang (Toronto) to understand the SDSS-III spectra that overlap the white-dwarf parts of the UV color–magnitude diagram that David Schiminovich (Columbia) and Lang showed yesterday. It wasn't obviously simple, but I have ideas about making a latent-variable model for it: Predicting spectra from photometry!

In the lunch-time check-in session there were some really impressive results. One was a big model for stellar physical parameters, and extinctions in a two-component model by Eddie Schlafly (LBNL). He pointed out that since Gaia gives distances and colors, it is sensitive to even fully gray extinction. So it provides a new window into extinction. Since his model involves simultaneously modeling stellar multi-band photometry (combined from many missions) along with the intrinsic properties of every star, it got big fast. I think it was at 800,000 parameters today. Optimized! That's pretty good for day three.

Another beautiful set of results at the check-in were visualizations of tidal features: Sarah Pearson (Columbia) visualized the tidal tails of Palomar 5, hoping to find that they extend farther than ever seen before. Chervin Laporte (UVic) visualized the anti-center stream and made the case that all of its kinematic properties are consistent with it being a tidal arm coming off the Milky Way from an interaction (with Sagittarius, I guess?). The morphology of the anti-center stream really is sharp, like a fold caustic.

In more general data-understanding categories, Sergey Koposov (CMU) scanned through proper-motion space, showing us low-parallax (that is, non-close) stars in different proper-motion bins. That highlighted a lot of streams, clusters, and anisotropies. And Andy Casey (Monash) showed us how good (or bad) astrometric excess variance (and also radial-velocity excess variance) is at detecting binary stars. The answer is: Promising, but not calibrated usefully yet. In an ideal world we would build a self-calibrated model of what causes the variance and then use the residuals to detect binarity.

There were many more impressive things today, about the nearby volume, about comoving stars, about detailed chemical abundances, about the GD-1 stream (and possible progenitor!), and about kinematics of the disk and kinematics of the bar; too many things to mention here. Thank you, Gaia Collaboration.


#GaiaDR2 zero-day workshop, day 2

It was a little harder to get up this morning after yesterday's 13-hour day, but I still made it in early for the second day of the Gaia DR2 zero-day workshop. We had about 70 yesterday and still maybe 50 today; the room was at capacity and we had people all over the 3rd floor of the (very generous) Flatiron Institute.

Dustin Lang (Toronto) coined the name "BetterTogether" for a project that Megan Bedell (Flatiron) and I started to find all the comoving pairs that can be confidently identified in the data. This kind of work isn't new: Semyeong Oh (Princeton) had big impact with her comoving-pair work in Gaia DR1. But what's new is the idea of using the co-moving-ness to betterize the parallaxes of both stars, and in particular the less luminous (and hence noisier) star. So pairs that are WD-MS or MS-RGB are most valuable! This project builds conceptually on work I did with Morgan Fouesneau (MPIA) and Hans-Walter Rix (MPIA) in the TGAS–PanSTARRS overlap.
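To see why the fainter star gains the most, here is a minimal sketch. It assumes that the two stars in a pair sit at the same distance and have independent Gaussian parallax noise, so that the inverse-variance-weighted mean applies. The function name and the numbers are made up for illustration; they are not from the project.

```python
import numpy as np

def combine_parallaxes(varpi, sigma):
    """Inverse-variance-weighted mean parallax for stars assumed to share one distance."""
    varpi = np.asarray(varpi, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    weights = 1.0 / sigma**2
    mean = np.sum(weights * varpi) / np.sum(weights)
    err = np.sqrt(1.0 / np.sum(weights))
    return mean, err

# Hypothetical pair (values in mas): a bright, precise primary and a faint, noisy companion.
mean, err = combine_parallaxes([5.02, 4.60], [0.03, 0.40])
print(mean, err)
```

The combined uncertainty is never worse than the better of the two inputs, so the noisy companion effectively inherits the primary's parallax. A real analysis would also have to account for the finite physical separation of the pair and for correlated systematics.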

The issue is that you can't trivially look at every pair in a 1.3-billion-star catalog. There are 1e18 pairs! And even deciding not to look at a pair takes time. So Lang started to build us a very nice data structure for doing the two-point work while Bedell looked at the restricted sample that matches the Kepler targets.
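The arithmetic, and one standard way around it, can be sketched as follows. This is not Lang's data structure (which I am not describing here); it swaps in a generic k-d tree from SciPy, applied to fake Cartesian positions, to show how a spatial index returns only the close pairs without visiting every pair.

```python
import numpy as np
from scipy.spatial import cKDTree

def n_pairs(n):
    """Number of distinct pairs among n objects."""
    return n * (n - 1) // 2

def close_pairs(xyz, radius):
    """Index pairs (i < j) of points separated by at most `radius`."""
    tree = cKDTree(xyz)
    return tree.query_pairs(r=radius, output_type="ndarray")

# Roughly 1e18 pairs for a 1.3-billion-star catalog.
print(n_pairs(1_300_000_000))

# Fake positions (arbitrary units) standing in for a small patch of the catalog.
rng = np.random.default_rng(42)
xyz = rng.uniform(0.0, 100.0, size=(2000, 3))
pairs = close_pairs(xyz, 2.0)
print(len(pairs), "close pairs out of", n_pairs(len(xyz)), "possible")
```

The tree prunes whole regions of space at once, which is the sense in which it avoids even "deciding not to look" at most pairs individually.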

In the mid-day check-in, some really impressive things were shown. Lang and David Schiminovich (Columbia) showed a set of UV color–magnitude diagrams that literally caused the audience to gasp. Stars look so different in the UV! And there are stars where there “shouldn't be”, because of binarity or chromospheric activity or something. So much structure! Kohei Hattori (Michigan) showed a hyper-velocity star that looks like it was launched from the disk towards the Galactic Center. Tim Morton (Princeton) showed that the Gaia stellar radii are good enough to bring out the radius gap in Kepler exoplanets. Ana Bonaca (Harvard) and Adrian Price-Whelan (Princeton) showed that the gaps in the GD-1 stellar stream are really there, and also had hints of kinematic offsets that might indicate dark-matter substructure!

On a more astrophysical note, Kareem El-Badry (Berkeley) spent yesterday and today becoming an expert on white-dwarf physics and was able to give a reasonable, quantitative explanation of the (exquisite, surprising) morphology of the white-dwarf part of the color-magnitude diagram, including a generative model! He finds that even if the IMF and star-formation history are monotonic, the white-dwarf mass distribution is not, because of wiggly initial-mass–final-mass relations. That gets much (but not all, I'm interested to note) of the multi-modal structure in the diagram.


#GaiaDR2 zero-day workshop, day 1

Today was Gaia DR2. My day started at 05:00 for the press release, and ended at 18:00 with the champagne toast we lifted to the entire Gaia DPAC, who have actually changed the world. Amazing things happened during the day, way too much to report on in this forum. So I will just tell you what I was paying close attention to today.

Ana Bonaca (Harvard) and Adrian Price-Whelan (Princeton) looked to see if they could see the long, cold stellar stream GD-1. They found it, and it is in the data at immense signal-to-noise. It is still subtle though, reminding us that finding brand-new streams in the data will still be a challenging project. Their map of the stream confirms the gaps we thought we saw many years ago, and there might even be hints of kinematic distortions at the edges of those gaps. If any of that turns out to be real, we might be able to directly measure substructure in the stream.

Megan Bedell (Flatiron) did the match between Gaia and Kepler and made basic visualizations. These already revealed something interesting: Although there are very few planets around blue subdwarfs (and no, I have no idea what they are, but blue stars below the main sequence), the fraction of blue subdwarfs that host planets looks like it is way too high. What could this mean? Perhaps even more interesting: The planet orbital periods are too short for the planets to have survived the stellar evolution up the red-giant branch and back down again, so there is an astrophysical mystery there too.

In the Gaia DR2 press conference, the team attributed complexity in the white-dwarf color-magnitude diagram (and check it out, it is beautiful!) to different compositions (or maybe surface compositions) of the white dwarfs in the different stripes or modes in the diagram. Kareem El-Badry (Berkeley) did some digging in the white-dwarf world and finds that this is not a good explanation for the differences, or at least not a complete explanation. He thinks there must be some complexity to the mass distribution of white dwarfs, unless the cooling models have serious issues. And he also thinks that the diagram is not showing lots of white-dwarf–white-dwarf binaries, but also not showing zero of them!

There were some reporters at the event. I thought this story by Lee Billings (Scientific American) captured a lot of the spirit of the day!


not ready for LSST!

Fed Bianco (NYU) gave a great astro seminar today at NYU about the LSST project. She focused on time-domain and transient aspects, but did a good job of discussing the methods for making objective trade-offs in the cadence and survey-strategy space. Arguments broke out about filters and about data access. That was interesting and valuable. The project is amazingly ambitious, especially as regards data analysis and permitting and enabling non-trivial computational work by outsiders.


ready for DR2

Today I spent all my research time on details in preparation for Gaia DR2, which happens on Wednesday. Unfortunately, my preparation wasn't exactly research: I was working on building access, catering details, room arrangement, invitations, and encouragement. We have some forty people (not all of them astronomers) converging on Flatiron to work together on the new data.

One thing has become absolutely clear over the last few weeks, in part because of hard things some people have said to me, about themselves and about others and about perceptions: Our goal this week is to have fun. And learn. It isn't to be first on things. It is to learn things we couldn't have known before. The idea is to cooperate, to share, and to support the global Gaia community. I think we have been doing that for years now (I sure hope we have), but it is worth re-stating daily, especially when there is a lot of anticipation and excitement and, frankly, anxiety, about the upcoming data release.

Here's to 1.6 BILLION stars. In less than 36 hours.


finished a single-author paper!

In parallel-working session this morning, I finished and prepared for submission (to arXiv) my paper on a likelihood function for Bayesian data analyses with the Gaia data.


fiber robots; gravitational waves

At lunch, Mike Blanton (NYU) and I discussed operational matters for SDSS-V. One thing we discussed was how to have different cadences for different types of stars, when we have a huge field of view and finite target densities for each stellar type. His view is that we should re-formulate the question in terms of sky patches, and set cadences for particular sky patches, and then observe the stars inside those patches as makes sense given the patch cadences. We also asked how to formulate this problem in terms of a scalar objective function or cost function, which is essential if we are going to let loose with optimizers.

The other thing we talked about is positioning a dense set of fibers. There are configurational constraints on the path that the fiber robots can take if they are going to avoid collisions and conflicts. Can we resolve these? And what engineering literature do we look to for the best or standard solutions to problems of this type? I am sure there is a huge literature, because it connects to all sorts of things like milling machines and warehousing and things like that. But I need keywords. I promised to deliver some to the SDSS-V Collaboration.

At the end of the day, Vicky Kalogera (Northwestern) gave a great talk about gravitational wave observing. Her group has been essential in converting the theory of gravitational-wave sources into practical schemes for performing principled probabilistic inferences on the data. She said, in her talk, that in the process she has become an observer, but she only observes in the gravitational-wave sector! And it is really true: She referred consistently in her talk to astronomers as “electromagnetic observers”. I love that! But really, the LIGO results are incredible, and Kalogera deserves a lot of credit for them.


a non-parametric model of the MW acceleration field

At Stars group meeting, I spoke about Ana Bonaca and my new paper looking at the information content of cold stellar streams in the Milky-Way halo. It is a huge document, with lots of results, but my absolute favorite is this: As we make the potential model for the Milky Way more flexible, each stream constrains each potential parameter less well. This is the issue with information studies: They depend strongly on the model flexibility! But something cool happens in the limit of a very flexible potential model: Each stream appears to end up constraining the local acceleration field, local to the current position (not past position) of the stream. This has lots of consequences: One is that if this is true, we can just model each stream independently, in a flexible potential, and then interpolate the acceleration constraints they deliver with a flexible or non-parametric model as an interpolator! That would make stream fitting more tractable than it is now, not less (and most other ideas we have are computationally impossible at present).

In the discussion, Vasily Belokurov (Cambridge) suggested that we might get more information—and more global information—if we modeled the density of stars along the stream. He is reacting to the point that the Bonaca stream model is a stream-track model, not a full six-dimensional distribution function. Belokurov might be right; we should add something like this to the paper.

After I spoke, Jackie Faherty (AMNH) got us really excited about what Gaia has done and will do for nearby moving groups of young stars (like open clusters). She believes that several of the “connected components” in the Oh et al paper are new, previously undiscovered young clusters, and that Gaia DR2 might find hundreds of new members, going down the main sequence! That's amazing. I hope it's true.


chemistry in protoplanetary disks

My research highlight today was a great talk by Ilse Cleeves (Harvard) about ALMA observations of the dust and molecular gas in proto-planetary disks. She showed that you can see chemical gradients in the disks, including rings and lines of formation of molecules. Much of what's visible is on the outer surfaces of the disk, which is illuminated by the young, accreting star, because the interior parts of the disk are optically thick. Because the chemical models are heavy and imperfect, I proposed looking at latent-variable models to describe the observed molecular abundances; maybe there are interesting features to be found even without relying on physical or chemical models?

The most remarkable thing she showed is time-domain chemical results: She can see chemical changes in real time as the disk responds to (presumably) stellar flares on day-ish time-scales! We discussed methods for distinguishing different kinds of events that might be triggering the chemical changes. Expect denser time sampling in the future; it's ALMA proposal-writing season!

Technically, my favorite part of Cleeves's work is that she does all of her model-fitting and hypothesis comparing in the visibilities. That is, she doesn't make an image from the interferometric data and then model it: She takes her models to the Fourier domain and compares to the raw data. That's classy.
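Here is a toy version of what fitting in the visibilities means. It is not Cleeves's pipeline: it assumes a circular Gaussian source at the phase center (whose Fourier transform is analytic), made-up baseline coordinates, and uncorrelated Gaussian noise on the real and imaginary parts. All names are hypothetical.

```python
import numpy as np

ARCSEC = np.pi / (180.0 * 3600.0)  # radians per arcsecond

def gaussian_visibility(u, v, flux, sigma):
    """Visibility of a circular Gaussian source at the phase center.

    u, v are baseline coordinates in wavelengths; sigma is in radians.
    """
    return flux * np.exp(-2.0 * np.pi**2 * sigma**2 * (u**2 + v**2))

def chi_squared(vis_obs, vis_model, noise):
    """Chi-squared summed over real and imaginary parts (same noise on each)."""
    return np.sum(np.abs(vis_obs - vis_model) ** 2) / noise**2

# Fake data: random baselines, a 0.5-arcsec source, complex Gaussian noise.
rng = np.random.default_rng(1)
n_vis, noise = 500, 0.01
u = rng.uniform(-2e5, 2e5, n_vis)
v = rng.uniform(-2e5, 2e5, n_vis)
true_sigma = 0.5 * ARCSEC
vis_obs = gaussian_visibility(u, v, 1.0, true_sigma) + noise * (
    rng.normal(size=n_vis) + 1j * rng.normal(size=n_vis)
)

# Compare models to the data in the Fourier domain; no image is ever made.
trial_sigmas = np.linspace(0.3, 0.7, 41) * ARCSEC
chi2 = np.array(
    [chi_squared(vis_obs, gaussian_visibility(u, v, 1.0, s), noise) for s in trial_sigmas]
)
best_sigma = trial_sigmas[np.argmin(chi2)]
print(best_sigma / ARCSEC)
```

The point of working this way is that the noise is simple in the visibilities, whereas an image made from them has correlated noise and depends on deconvolution choices.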


Gaia writing, neutrino masses

In a remarkable turn of events, I finished writing a paper today! More specifically, I finished what I would call the “zeroth draft” of my paper (or really just short note) on the Gaia likelihood function. I checked in with Hans-Walter Rix (MPIA) and he encouraged me to take it through some revisions and submit it to arXiv. We shall see if I make it.

At lunch-time, the NYU CCPP Brown Bag talk was by Derek Inman (NYU) on neutrino masses. He explained what's known from oscillation experiments, from beta-decay experiments, and from cosmology. The laboratory bounds put lower limits on the neutrino masses, and the cosmological bounds put upper limits. The cosmological bounds are very strong, but they are also very dependent on having a very good cosmogonic model. That is, they are not even close to being model-independent. He did a nice job explaining how the flavor eigenstates relate to the propagation eigenstates, and the mass hierarchies. It is a nice set of problems.


finding planets

In the afternoon today, I spoke to the NYU Physics Majors about how we find planets in the Kepler data. I spoke about the probability calculus, linear models, fitting, marginalization, and search. Nothing here is all that hard; our special abilities are all about the engineering details of doing all that linear algebra fast.
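To make "linear models, fitting, and search" concrete, here is a minimal sketch, not our actual pipeline. It assumes a box-shaped transit of known duration, a constant-plus-linear-trend nuisance model, and white noise, and it optimizes the linear amplitudes by least squares at each trial transit time rather than marginalizing them. All names and numbers are made up.

```python
import numpy as np

def box_template(t, t0, duration):
    """Unit-depth box transit centered at t0: -1 in transit, 0 outside."""
    return -(np.abs(t - t0) < 0.5 * duration).astype(float)

def fit_depth(t, flux, t0, duration):
    """Linear least-squares fit of offset, trend, and transit depth at fixed t0."""
    A = np.vstack([np.ones_like(t), t - t.mean(), box_template(t, t0, duration)]).T
    theta, *_ = np.linalg.lstsq(A, flux, rcond=None)
    resid = flux - A @ theta
    return theta[2], np.sum(resid**2)

# Fake light curve with one transit injected at t0 = 6.3.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 1000)
true_t0, duration, depth = 6.3, 0.2, 0.002
flux = (1.0 + 1e-4 * (t - 5.0)
        + depth * box_template(t, true_t0, duration)
        + 2e-4 * rng.normal(size=t.size))

# Search: repeat the same linear fit on a grid of trial transit times.
trial_t0 = np.arange(0.5, 9.5, 0.05)
rss = np.array([fit_depth(t, flux, t0, duration)[1] for t0 in trial_t0])
best_t0 = trial_t0[np.argmin(rss)]
best_depth = fit_depth(t, flux, best_t0, duration)[0]
print(best_t0, best_depth)
```

Every step is a small linear-algebra solve; the engineering problem is doing that solve for every trial time, duration, and period across many stars.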



If there is one thing I am worse at than anything else in writing papers, it is properly reading and citing the relevant literature. I spent all morning working through the literature relevant to my Gaia likelihood-function paper. And I know I am still missing things.


new Hamiltonian sampler; tidal circularization

In our Gaia DR2 prep meeting, I led a discussion of the likelihood function for Gaia data. I made the standard proposal (Gaussian based on Catalog values). And we discussed a bit when you need to use that (that is, be Bayesian) and when you can just treat the data like truth. As my loyal reader might expect, I advised extreme pragmatism.
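A minimal sketch of that standard proposal: treat the Catalog values as the mean and the Catalog covariance as the covariance of a Gaussian, evaluated at whatever your model predicts for the true quantities. The three-component vector (parallax and two proper motions) and all numbers below are made up for illustration; they are not real Catalog entries.

```python
import numpy as np

def ln_likelihood(model, catalog_mean, catalog_cov):
    """Log of a Gaussian likelihood for model predictions, given Catalog values."""
    delta = np.asarray(catalog_mean) - np.asarray(model)
    _, logdet = np.linalg.slogdet(2.0 * np.pi * np.asarray(catalog_cov))
    return -0.5 * (delta @ np.linalg.solve(catalog_cov, delta) + logdet)

# Hypothetical Catalog entry: parallax (mas), pmra and pmdec (mas/yr).
catalog_mean = np.array([2.10, -3.00, 1.50])
catalog_cov = np.array([[0.040, 0.002, 0.000],
                        [0.002, 0.010, 0.001],
                        [0.000, 0.001, 0.010]])

# Model prediction for a star hypothesized to be at 0.5 kpc (parallax = 1 / distance).
distance_kpc = 0.5
model = np.array([1.0 / distance_kpc, -3.10, 1.40])
print(ln_likelihood(model, catalog_mean, catalog_cov))
```

Note that the function takes the true parallax, not the distance, as its argument; negative or noisy Catalog parallaxes cause no trouble because nothing is ever inverted.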

In Stars meeting, there was lots of great stuff: Foreman-Mackey (Flatiron) showed us his brand-new Hamiltonian MCMC sampler, which has a super-simple interface, and contains lots of the hacky goodness of STAN. He showed that it blows emcee out of the water for some problems with dozens of parameters. He and Price-Whelan (Princeton) are using this sampler to look for multi-star systems in APOGEE radial-velocity data.

Price-Whelan showed us APOGEE data that show very clear evidence of tidal circularization for (fainter) stars orbiting red-giant stars, in very good agreement with the theory. The situation is easiest to predict, apparently, for giant stars with large convective envelopes. He did both the data analysis and some nice theory, with MESA and tidal-circularization differential equations, to support his claims.

Late in the day, Julianne Dalcanton (UW) consulted with Foreman-Mackey and me on a hierarchical model for the dust in M33. She wants to simultaneously learn the dust map and the unreddened color-magnitude diagram as a function of position in the galaxy. That's a great but hard optimization.


undergraduate projects

It was a low-research day! But I did get in a short conversation with Shiloh Pitt (NYU) who is using a Jupyter notebook to verify a set of matrix identities that I am planning on posting to arXiv in the near future. The notebook is great for making outward-facing code! And I also got in a short conversation with Elisabeth Andersson (NYU) who figured out how to make a data-driven model template for searching for new planets in a star that already has known planets.


Hawking radiation, Gaussians

At lunch time, Matt Kleban (NYU) gave a nice overview of the simplest arguments that black holes must radiate. It was a memorial, of sorts, for Stephen Hawking. Fundamentally, the argument is that if GR is to be consistent with quantum mechanics, then you must have black-hole radiation. That argument is good and sensible, but it is certainly theoretically prejudiced, since it confidently predicts something that will never be observed, and does so by privileging one part of theory over another. In the discussion afterwards, we learned that GR people tend to think that black holes obviously destroy information, whereas particle physicists tend to think that information will be preserved by some heretofore unknown mechanism. That's interesting, and highlights how socially constructed some aspects of theory might be. But I learned a lot and loved the talk and the discussion. Kleban is a very deep person and a great colleague.

Earlier in the day, I got challenged on a claim that the prior prediction for a snapshot of the amplitude of one mode of a Gaussian-driven, damped harmonic oscillator would be zero-mean and Gaussian. Not the squared amplitude but the straight linear amplitude of the sinusoid with a particular phase. That rattled around in my head all day. Late in the day, I think I have a good argument: Every linear projection of a Gaussian process onto any basis function or anything else (so long as it is a linear function of the Gaussian-process data) will be Gaussian-distributed.
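The argument can be written out in a few lines. The notation is mine, and it assumes a zero-mean process with kernel $k(t, t')$ and a projection weight $w(t)$ (for example, a sinusoid at the mode frequency with a particular phase).

```latex
% Finite-sample version: y is the process sampled at N times, with kernel matrix K.
\begin{align}
  y &\sim \mathcal{N}(0, K) , \qquad a = w^{\top} y \\
  \mathbb{E}[a] &= w^{\top}\,\mathbb{E}[y] = 0 \\
  \mathrm{Var}[a] &= w^{\top}\,\mathbb{E}[y\,y^{\top}]\,w = w^{\top} K\, w \\
  a &\sim \mathcal{N}(0,\; w^{\top} K\, w)
\end{align}
% Continuous version: the sums become integrals.
\begin{align}
  a = \int w(t)\, y(t)\, \mathrm{d}t , \qquad
  \mathrm{Var}[a] = \iint w(t)\, k(t, t')\, w(t')\, \mathrm{d}t\, \mathrm{d}t'
\end{align}
```

The Gaussianity of $a$ follows because any linear combination of jointly Gaussian variables is Gaussian; the squared amplitude $a^2$, by contrast, is not.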


radial migration

Hans-Walter Rix (MPIA) has been kicking around ideas for observationally testing the process of radial migration of orbits in the Milky Way disk. It is a slippery problem, because stars aren't tagged with their birth locations in the disk! His idea has been to assume that stellar surface metallicity is a nearly deterministic function of time and radius, and then look at explaining all of the variance in abundances we see at any radius today as being the result of radial migration. This is like a maximal approach to the problem. Maximal in the sense that it attributes all variance (in the age-metallicity relationship) to migration.

Today, on the plane home from undislo, I read a very nice draft by Neige Frankel (MPIA) that executes these ideas, and beautifully (and probabilistically). She uses stellar ages from the C and N dredge-up analysis of Melissa Ness (MPIA), which are imprecise but do seem to be ages. Frankel finds sensible parameters for the migration, in terms of radius variance as a function of time. It all hangs together, because the Milky Way data (from APOGEE in this case) really do show a broadening of scatter in metallicity as a function of radius as age increases. That is, there does seem to be at least qualitative support for this picture of radial migration.


Stitch Fix

I had the great privilege of visiting Stitch Fix today, hosted by Dave Spiegel. I was interested in the company for many reasons, but the main one is that it has a large number of PhD astrophysicists on its data-science team. I learned a huge amount while visiting. Here are some random things:

If you are doing data science to inform or support the decision-making of an employee of your company, it is worth spending a lot of computation on that: After all, the employee is very valuable and expensive! On the other hand, if you are doing data science to directly execute commands (for, say, warehousing of goods), you better not get it wrong, because if you have a bug, you could literally move lots of stuff where you don't want it!

If you sell clothing, the lead time between buying clothing wholesale and selling it is long! So you can't quickly or in real time feed back customer preferences into your buying choices. That makes prediction of paramount importance! One thing that really surprised me is that Stitch Fix designs and even manufactures some of its own clothing, so they have unique lines of clothing, adapted to their customers' preferences!

And, obviously (but new to me): Clothing is combinatoric! Even in making a standard button-up shirt, there are myriad few-way decisions about collar, buttons, sleeves, relative dimensions, and so on, such that there is no way in the history of all of humankind that you could make every possible (or even every sensible!) version of a standard dress shirt. That puts a data-science-oriented company like Stitch Fix in a very, very interesting position.



In an undisclosed location, I continued to write in my Gaia likelihood function document. I'm not sure why! But I'm more convinced than ever that we can deliver likelihood (rather than posterior) information in future versions of (at least some kinds of) probabilistic catalogs.


a likelihood function for Gaia

I spent time on the long weekend and today working on a short paper or note on the Gaia likelihood function. My point is trivial: It is just to say that we do inference with Gaia by constructing, from the Catalog, a surrogate likelihood function. This is what everyone does but is rarely made fully explicit. I can't tell whether it is worth publishing. But I find myself compelled to write it nonetheless. I guess it makes the additional point that catalogs from projects like Gaia should be based on likelihood information, not posterior information. Why? Because users need to multiply in new likelihoods, not new posteriors, to update their beliefs!
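The last point can be made explicit. The notation is mine: $\theta$ is whatever the user wants to infer, $p_{\mathrm{user}}$ is the user's own prior, $p_{\mathrm{cat}}$ is whatever prior a catalog-maker might have applied, and the two data sets are assumed independent given $\theta$.

```latex
\begin{align}
  % What the user wants: their own prior times both likelihoods.
  p(\theta \mid D_{\mathrm{Gaia}}, D_{\mathrm{new}})
    &\propto p_{\mathrm{user}}(\theta)\,
       p(D_{\mathrm{Gaia}} \mid \theta)\,
       p(D_{\mathrm{new}} \mid \theta) \\
  % What a posterior-based catalog would deliver instead.
  p_{\mathrm{cat}}(\theta \mid D_{\mathrm{Gaia}})
    &\propto p_{\mathrm{cat}}(\theta)\, p(D_{\mathrm{Gaia}} \mid \theta) \\
  % So the user has to divide the catalog prior back out before proceeding.
  p(D_{\mathrm{Gaia}} \mid \theta)
    &\propto \frac{p_{\mathrm{cat}}(\theta \mid D_{\mathrm{Gaia}})}{p_{\mathrm{cat}}(\theta)}
\end{align}
```

If the catalog delivers likelihood information directly, the first line can be used as-is; if it delivers posterior information, the user must know and undo the catalog prior, which fails wherever that prior is zero or undocumented.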