searches for anomalies

Today Kate Storey-Fisher (NYU) and I met with Mike Blanton (NYU) and Zhongxu Zhai (NYU) to discuss possible projects that Storey-Fisher and I have been talking about. We are thinking about trying to systematize (and pre-register) the search for anomalies in cosmological surveys. The idea (which is still vague) is to somehow lexicographically order all anomalies we could search for, and then search, such that we can keep exquisite track of the number of independent hypotheses we have checked.

Blanton and Zhai had some advice for us. One category of advice was around systematics: Anomalies and systematics in the data might appear similar! So we should think about anomalies that are somehow least sensitive to these systematics. One good thing is that we are working at the home of many of the tools that we need to make these assessments. Another category of advice was to think about what anomalies are motivated by questions of theory in the dark sector, in galaxy formation, or in the initial conditions. Theory-inspired (if not predicted) anomalies are more productive, in a scientific-literature sense, than randomly specified anomalies. We are close to being able to specify a project!


detailed abundances and stellar companions

Taisiya Kopytova arrived in NYC for a few days to work on stellar abundances and orbital companions. Her project is very well designed: She has a set of red-giant stars in APOGEE that we know have companions. For each of these stars with companions, she has found a set of comparison stars, matched in stellar parameters, that don't have companions (or at least not detectable ones). She then compares the detailed chemical abundances between these two samples. The approach is extremely conservative and very robust to problems in the data: For a false effect to appear, it has to be an effect that causes a companion to be detected (or not detected)! And she finds signals.

One disturbing thing is that we find signal-to-noise effects, and we get slightly different results when we use APOGEE DR13 or DR14 data. So we might need to match on signal-to-noise as well as stellar parameters.
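
A minimal sketch of what matching on signal-to-noise as well as stellar parameters might look like. Everything here is an assumption for illustration: the four columns (Teff, log g, [Fe/H], S/N), the per-column scales, the function name, and the toy numbers. It is a scaled nearest-neighbor match with replacement, not a description of Kopytova's actual procedure.

```python
import numpy as np

def match_controls(targets, pool, scales):
    """For each target row, return the index of the nearest pool row.

    targets, pool: arrays of shape (n, d) and (m, d) with the same columns
    (assumed here: Teff, logg, [Fe/H], S/N).
    scales: length-d tolerances that make the distance dimensionless.
    Matching is with replacement: one pool star can serve several targets.
    """
    t = np.asarray(targets, dtype=float) / scales
    p = np.asarray(pool, dtype=float) / scales
    # squared scaled distance between every target and every pool star
    d2 = np.sum((t[:, None, :] - p[None, :, :]) ** 2, axis=2)
    return np.argmin(d2, axis=1)

# toy numbers, invented purely for illustration
targets = np.array([[4800.0, 2.5, -0.1, 150.0]])
pool = np.array([[4810.0, 2.5, -0.1, 60.0],    # close in parameters, low S/N
                 [4830.0, 2.6, -0.1, 145.0]])  # slightly farther, similar S/N
scales = np.array([50.0, 0.1, 0.05, 20.0])

idx_with_snr = match_controls(targets, pool, scales)
idx_without_snr = match_controls(targets[:, :3], pool[:, :3], scales[:3])
```

In this toy case the chosen comparison star changes once S/N is included in the distance, which is the kind of sensitivity the DR13 versus DR14 differences hint at.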


paper scopes, voids

In the Friday parallel-working session, Megan Bedell (Flatiron) and I discussed (for the nth time) the scope of the first paper and next papers in our extreme-precision radial-velocity work. We realized that paper 1 is pretty much ready to go! We also realized that the point should not be about what people might be doing wrong, but about what things you can do that are easy and close to correct. In particular, the point is that a data-driven spectral model can come close to saturating the Cramér–Rao bound on radial velocity. This was not obvious at the outset, because some of the information in the data must go into the spectral model (and not the RV measurement). That's a good point!
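
For reference, here is a sketch of the Cramér–Rao bound on a Doppler shift in the idealized case where the spectral template is known perfectly and the pixel noise is independent and Gaussian. Those assumptions, the function name, and the toy absorption line are all illustrative; the bound a data-driven model faces is looser, because some information goes into learning the template.

```python
import numpy as np

C_KMS = 299792.458  # speed of light in km/s

def rv_crb(lnwave, flux, sigma):
    """Cramer-Rao lower bound (km/s) on a small Doppler shift.

    Assumes a perfectly known template `flux` sampled at log-wavelengths
    `lnwave`, with independent Gaussian noise of standard deviation `sigma`.
    """
    # a small velocity v shifts the spectrum by v/c in ln(wavelength),
    # so |d flux / d v| = |d flux / d ln(wavelength)| / c
    dflux_dv = np.gradient(flux, lnwave) / C_KMS
    fisher = np.sum((dflux_dv / sigma) ** 2)
    return 1.0 / np.sqrt(fisher)

# toy spectrum: one Gaussian absorption line on a flat continuum
lnwave = np.linspace(np.log(5000.0), np.log(5010.0), 500)
wave = np.exp(lnwave)
flux = 1.0 - 0.5 * np.exp(-0.5 * ((wave - 5005.0) / 0.1) ** 2)
sigma = np.full_like(flux, 0.01)  # S/N of 100 per pixel in the continuum

bound = rv_crb(lnwave, flux, sigma)
bound_double_snr = rv_crb(lnwave, flux, 0.5 * sigma)
```

The bound scales inversely with signal-to-noise and improves with every additional sharp line, since each contributes its squared derivative to the Fisher information.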

Renée Hlozek (Toronto) gave the Astrophysics Seminar. In part she talked about the negative S–Z effect from voids. In another part, she talked about constraining light scalar dark matter with large-scale structure. Both are problems I am interested in for near-future research. In the afternoon, she and Alex Malz (NYU) and I talked about advising and mentoring. Hlozek is a deep thinker about these things.


geometry bugs

In our weekly meeting, Ana Bonaca (Harvard) and I discovered a super-subtle bug in how the covariance matrices we are making (context: the Cramér–Rao bound on Milky-Way parameters given observations (and a model) of cold stellar streams) are being plotted. Damn geometry is hard! But she fixed the code and all our covariances look really good now. We think we understand the trade-offs between different parameters, given different data. Time to write! And use the framework for planning new observations.
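
As an illustration of the geometry involved (not a reconstruction of the bug, which isn't described here), this is one standard way to go from a Fisher matrix to a plottable error ellipse for a pair of parameters. The function name and the toy covariance are assumptions.

```python
import numpy as np

def ellipse_from_fisher(fisher, i, j):
    """One-sigma ellipse for parameters i and j, marginalized over the rest.

    Returns (semi_major, semi_minor, angle_deg), with the angle of the major
    axis measured counter-clockwise from the parameter-i axis.
    """
    cov = np.linalg.inv(fisher)        # Cramer-Rao bound on the covariance
    # taking the block of the covariance (not of the Fisher matrix)
    # marginalizes over all the other parameters
    sub = cov[np.ix_([i, j], [i, j])]
    vals, vecs = np.linalg.eigh(sub)   # eigenvalues in ascending order
    major = vecs[:, 1]                 # direction of largest variance
    angle = np.degrees(np.arctan2(major[1], major[0]))
    return np.sqrt(vals[1]), np.sqrt(vals[0]), angle

# toy three-parameter covariance with one correlated pair
cov_true = np.array([[4.0, 1.2, 0.0],
                     [1.2, 1.0, 0.0],
                     [0.0, 0.0, 9.0]])
fisher = np.linalg.inv(cov_true)
semi_major, semi_minor, angle = ellipse_from_fisher(fisher, 0, 1)
```

Two easy ways to get this wrong are slicing the Fisher matrix before inverting (which conditions on the other parameters rather than marginalizing) and plotting variances where standard deviations belong.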

I spent the rest of the day not working on my NSF proposal, which is very bad!


Gaia DR2 halo and disk projects

Today Kathryn Johnston (Columbia) came through Flatiron to discuss Gaia DR2 projects in the Milky-Way halo. She made the very nice point that we could use a Gaia simulator like PyGaia to “observe” the Bullock & Johnston all-substructure simulations to see how halo substructure appears in a realistic DR2 data set. We discussed clustering algorithms and the relationships between applying clustering directly to the observed data vs transforming the data to some better space (invariants, say) vs doing some kind of inference or data-driven model that respects the Gaia noise model and so on. We are looking for methods that will be powerful, but simple, since we are looking for fast projects to do in the immediate follow-up period to the data release.

Our conversation veered into chemical-abundance space, where we all realized that Megan Bedell (Flatiron) is sitting on an amazing chemical-tagging data set. She only has 80 stars, but because they are Solar twins, they have exceedingly good chemical measurements. Can we use these to measure scattering processes in the Milky-Way disk?

We also briefly discussed something inspired by Alyssa Goodman (Harvard), who spoke first thing in the morning at the Scientific Visualization conference that is on at Flatiron: Can we measure our position relative to the disk plane, and maybe see fluctuations in that plane? Goodman says that the Sun is 25 pc above the plane, and that is obvious (she says) from the radio observations of HI gas. But Bovy (if I recall correctly) looked at this in Gaia DR1 and finds that our offset from the midplane is less than 10 pc. Is there an offset between stars and gas? If so, why? If not, who is wrong? Great set of questions for DR2.


heteroskedastic GPLVM; search for anomalies

Christina Eilers (MPIA) and I have decided to re-implement the Gaussian Process latent-variable model, with modifications that permit the data to be heteroskedastic (and missing) and the kernel function to be different along different dimensions of the data space. We spent an hour today de-bugging analytic derivatives. We need these, because there is a non-convex optimization as part of that model. We resolved to bring the action to New York and have Foreman-Mackey (Flatiron) help us re-implement everything in george. I was left with homework to write this model down in full generality.

Kate Storey-Fisher (NYU) and I got close to specifying a well-posed problem in our nascent project to find CMB-like anomalies in large-scale structure data. We read this paper by NYU locals about prospects for future surveys, but we want to work with real data if we can. We discussed how a search for anomalies can be cast as a parameter estimation problem. We haven't settled on a methodology, though.


linear models for nuisances

The day started with Rodrigo Luger (UW) and Dan Foreman-Mackey (Flatiron) and me discussing a range of projects. They endorsed my general idea of looking for planets by searching resonances! Which is good. We tentatively decided to try to write one of the new ApJ Research Notes about our systematics models for Kepler and other projects. There are a lot of unifying good ideas there; let's spread the Good News. The idea is that it is possible to simultaneously fit a linear model and marginalize it out with a simple linear-algebra move. Work on that started almost immediately.
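
A sketch of that linear-algebra move, under assumptions that are mine rather than anything stated above: the data are a signal plus a linear nuisance model A w plus Gaussian noise with covariance C, and the amplitudes w have a zero-mean Gaussian prior with covariance Lam. Integrating out w just adds A Lam A^T to the noise covariance, and the same solve also delivers the fitted amplitudes. Function names and toy data are illustrative.

```python
import numpy as np

def marginalized_loglike(resid, A, C, Lam):
    """ln p(resid) with linear nuisance amplitudes w ~ N(0, Lam) integrated out."""
    K = C + A @ Lam @ A.T              # the marginalization is this one line
    _, logdet = np.linalg.slogdet(K)
    chi2 = resid @ np.linalg.solve(K, resid)
    return -0.5 * (chi2 + logdet + len(resid) * np.log(2.0 * np.pi))

def nuisance_posterior_mean(resid, A, C, Lam):
    """Posterior mean of the nuisance amplitudes w given the residuals."""
    K = C + A @ Lam @ A.T
    return Lam @ A.T @ np.linalg.solve(K, resid)

# toy data: a linear trend (offset plus slope) with white noise
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
A = np.vstack([np.ones_like(x), x]).T
y = A @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=x.size)
C = 0.1 ** 2 * np.eye(x.size)
Lam = 1.0e3 * np.eye(2)                # broad prior on the amplitudes

w_hat = nuisance_posterior_mean(y, A, C, Lam)
w_ls = np.linalg.lstsq(A, y, rcond=None)[0]
lnlike = marginalized_loglike(y, A, C, Lam)
```

With a broad prior the posterior mean approaches the ordinary least-squares fit. For large data sets one would apply the matrix inversion lemma rather than build the dense K explicitly.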

Lauren Anderson (Flatiron) visualized the proper motions of stars in the Galaxia model as a function of sky position and distance, to see if proper motions can be used to infer distances by methods that are more clever than reduced proper motion. It looks like they can be! We discussed further improvements to the visuals, with help from Vasily Belokurov (Cambridge).


stellar age–velocity relation

Jonathan Bird (Vandy) and I spent the morning working together on his paper on the age–velocity relationship in the Milky-Way disk. He has absolutely beautiful results, from APOGEE red-clump stars and Gaia DR1 transverse kinematics. The thing that is new is that (thanks Martig and Ness) he has actually useful age estimates for many hundreds of stars. And we will have the same for tens of thousands in the overlap with Gaia DR2. Indeed, we commented in the paper that SDSS-V will make this possible at scale. The great thing about the ages is that even with hundreds of stars, we get a comparable measure of the age–velocity relation to studies that involved orders of magnitude more stars.

We discussed the final presentation in the paper. We worked through the figures and drew a simple graphical model to illustrate the project. We then went, very carefully, through the assumptions of the project, so we can state them explicitly at the outset of our methods section, and then use them to structure the discussion at the end. It's a fun intellectual exercise to go through these assumptions carefully; somehow you only understand a project substantially after it is finished!


self-calibration of stellar abundances

I spent the day at Vanderbilt, where I gave a talk and had many valuable conversations. Some were about data science: Andreas Berlind (Vanderbilt) is chairing a committee to propose a model for data science at Vanderbilt. We discussed the details that have been important at NYU.

One impressive project I learned about today was Hypatia, a compendium of all detailed stellar abundance measurements (and relevant housekeeping data) in the literature. Over dinner, Natalie Hinkel (Vanderbilt) and I discussed the possibility that this catalog could be used for some kind of self-calibration of all abundance measurements. That's an interesting idea, and connects to things I have discussed over the years with Andy Casey (Monash).


self-calibrating pulsar arrays, and much more

I had a great conversation with Chiara Mingarelli (Flatiron) and Ellie Schwab (AMNH) today about pulsar-timing arrays and gravitational-wave sources. We are developing some ideas about self-calibration of the arrays, such that we might be able to simultaneously search for coherent sources (that is: not just stochastic backgrounds) and also precisely determine the distances to the individual pulsars to many digits of accuracy! It is futuristic stuff, and there are lots of ways it might fail badly, but if I am right that the self-calibration of the arrays is possible, it would make the arrays a few to tens of times more sensitive to sources! We started with Mingarelli assigning us some reading homework.

In the Stars group meeting, we had a productive discussion led by Megan Bedell (Flatiron), Andrew Mann (Columbia), and John Brewer (Yale) about things learned at the recent #KnowThyStar conference. There are some new uses of machine learning and data-driven models that I might need to spend some time criticizing! And it appears that there are some serious discrepancies between asteroseismic scaling relations for stellar radii and interferometric measurements. Not bigger than those expected by the stellar experts, apparently, but much bigger than assumed by some of the exoplanet community.

Prior to that, in our weekly Gaia DR2 prep working session, we discussed the use of proper motion as a distance indicator in a post-reduced-proper-motion world. That is: The assumptions underlying reduced proper motion are not great, and will be strongly violated in the DR2 data set. So let's replace it with a much better thing!
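
For context, the classical reduced proper motion is H = m + 5 log10(mu) + 5 with mu in arcsec per year, which equals M + 5 log10(v_t / 4.74047 km/s). It stands in for absolute magnitude M only if every star has the same transverse speed v_t, which is the weak assumption in question. A small sketch, with the function name and toy star as illustrative assumptions:

```python
import numpy as np

def reduced_proper_motion(mag, pm_mas_yr):
    """Classical reduced proper motion H from apparent magnitude and
    total proper motion in milliarcseconds per year."""
    # H = m + 5 log10(mu / [arcsec/yr]) + 5 = m + 5 log10(mu / [mas/yr]) - 10
    return mag + 5.0 * np.log10(pm_mas_yr) - 10.0

# toy star: absolute magnitude 5 at 100 pc, moving at 100 mas/yr
abs_mag, dist_pc, pm_mas_yr = 5.0, 100.0, 100.0
app_mag = abs_mag + 5.0 * np.log10(dist_pc) - 5.0
H = reduced_proper_motion(app_mag, pm_mas_yr)

# the same number via the transverse velocity, v_t = 4.74047 * mu["/yr] * d[pc]
v_t = 4.74047 * (pm_mas_yr / 1000.0) * dist_pc   # km/s
H_from_velocity = abs_mag + 5.0 * np.log10(v_t / 4.74047)
```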

Adrian Price-Whelan (Princeton) showed some incredible properties of (flowing from beautiful design of) the astropy coordinates package. Damn!


writing projects

Coming off my personal success of (finally) getting a paper on the arXiv yesterday (check the footnote on the cover page), I worked through two projects that are close to being writeable or finishable. The first is a paper with Stephen Feeney (Flatiron) on the Lutz-Kelker correction, when to use it (never) and what it is (a correction from ML to MAP). The second is a document I wrote many months ago about finding similar or identical objects in noisy data. After I read through both, I got daunted by the work that needs to happen! So I borked. I love my job! But writing is definitely hard.


discovery! submission!

It was an important day for physics: The LIGO/VIRGO collaboration and a huge group of astronomical observational facilities and teams announced the discovery of a neutron-star–neutron-star binary inspiral. It has all the properties it needs to have to be the source of r-process elements, as the theorists have been telling us it would. Incredible. And a huge win for everyone involved. Lots of questions remain (for me, anyway) about the 2-s delay between GW and EM, and about the confidence with which we can say we are seeing the r process!

It was also an unusual day for me: After working a long session on the weekend, Dan Foreman-Mackey (Flatiron) and I finished our pedagogical document about MCMC sampling. I ended the day by posting it to arXiv and submitting it (although this seems insane) to a special issue of the ApJ. I don't write many first-author publications, so this was a very, very good day.


calibration of ZTF; interpolation

I am loving the Friday-morning parallel working sessions in my office. I am not sure that anyone else is getting anything out of them! Today Anna Ho (Caltech) and I discussed things in my work on calibration and data-driven models (two extremely closely related subjects) that might be of use to the ZTF and SEDM projects going on at Caltech.

Late in the morning, an argument broke out about using deep learning to interpolate model grids. Many projects are doing this, and it is interesting (and odd) to me that you would choose a hard-to-control deep network when you could use an easy-to-control function space (like a Gaussian Process, stationary or non-stationary). But the deep-learning toothpaste is hard to put back into the tube! That said, it does have its uses. One of my medium-term goals is to write something about what those uses are.
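
To make the "easy-to-control function space" concrete, here is a sketch of interpolating a one-dimensional grid with a stationary Gaussian Process, written in plain NumPy. The squared-exponential kernel, its hyperparameters, and the sine-wave stand-in for a model grid are all assumptions chosen for illustration.

```python
import numpy as np

def sqexp_kernel(x1, x2, amp, scale):
    """Squared-exponential (stationary) kernel matrix between two 1-D arrays."""
    return amp ** 2 * np.exp(-0.5 * ((x1[:, None] - x2[None, :]) / scale) ** 2)

def gp_interpolate(x_grid, y_grid, x_new, amp=1.0, scale=1.0, jitter=1e-8):
    """Zero-mean GP prediction (mean and standard deviation) at x_new."""
    K = sqexp_kernel(x_grid, x_grid, amp, scale) + jitter * np.eye(x_grid.size)
    Ks = sqexp_kernel(x_new, x_grid, amp, scale)
    mean = Ks @ np.linalg.solve(K, y_grid)
    cov = sqexp_kernel(x_new, x_new, amp, scale) - Ks @ np.linalg.solve(K, Ks.T)
    # clip tiny negative variances that come from round-off
    return mean, np.sqrt(np.clip(np.diag(cov), 0.0, None))

# stand-in for a model grid: a smooth function evaluated at 11 grid points
x_grid = np.linspace(0.0, 10.0, 11)
y_grid = np.sin(x_grid)

mean_on_grid, _ = gp_interpolate(x_grid, y_grid, x_grid)
mean_between, std_between = gp_interpolate(x_grid, y_grid, np.array([4.5]))
```

The control comes from the kernel: two interpretable numbers (an amplitude and a length scale) set the smoothness, and the prediction comes with an uncertainty that grows away from the grid points.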


age-velocity; finishing

I had a great, long call with Jonathan Bird (Vandy) to discuss his nearly-finished paper on the age–velocity relation of stars in the Gaia DR1 data. We discussed the addition of an old, hot population, in addition to the population that shows the age–velocity relation. That's a good idea, and accords with our beliefs, hence even gooder.

I spent the rest of my research time today working through the text of the MCMC tutorial that Dan Foreman-Mackey (Flatiron) and I are writing. We are trying to finish it this week (after five-ish years)!


WDs in Gaia, M33, M stars, and more

In our weekly parallel-working Gaia DR2 prep meeting, two very good ideas came up. The first is to look for substructure in the white-dwarf sequence and see if it can be interpreted in terms of binarity. This is interesting for two reasons. The first is that unresolved WD binaries should be the progenitors of Type Ia supernovae. The second is that they might be formed by a different evolutionary channel than the single WDs and therefore be odd in interesting ways. The second idea was to focus on giant stars in the halo, and look for substructure in 3+2-dimensional space. The idea is: If we can get giant distances accurately enough (and maybe we can, with a model like this), we ought to see the substructure in the Gaia data alone; that is: No radial velocities necessary. Of course we will have radial velocities (and chemistry) for a lot of the stuff.

In the stars group meeting, many interesting things happened: Anna Ho (Caltech) spoke about time-domain projects just starting at Caltech. They sure do have overwhelming force. But there are interesting calibration issues. She has accidentally found many (very bright!) flaring M stars, which is interesting. Ekta Patel (Arizona) talked about how M33 gets its outer morphology. Her claim is that it is not caused by its interaction with M31. If she's right, she makes predictions about dark-matter substructure around M33! Emily Stanford (Columbia) showed us measurements of stellar densities from exoplanet transits that are comparable to asteroseismology in precision. Not as good, but close! And different.

In the afternoon I worked on GALEX imaging with Dun Wang (NYU), Steven Mohammed (Columbia), and David Schiminovich (Columbia). We discussed how to release our images and sensitivity maps such that they can be responsibly used by the community. And Andrina Nicola (ETH) spoke about combining many cosmological surveys responsibly into coherent cosmological constraints. The problem is non-trivial when the surveys overlap volumetrically.


a day at MIT

I spent the day today at MIT, to give a seminar. I had great conversations all day! Just a few highlights: Rob Simcoe and I discussed spectroscopic data reduction and my EPRV plans. He agreed that, in the long run, the radial-velocity measurements should be made in the space of the two-d pixel array, not extracted spectra. Anna Frebel and I discussed r-process stars, r-process elements, and chemical-abundance substructure in the Galactic halo. We discussed the immense amount of low-hanging fruit coming with Gaia DR2. I had lunch with the students, where I learned a lot about research going on in the Department. In particular Keaton Burns had interesting things to say about the applicability of spectral methods in solving fluid equations in some contexts. On the train up, I worked on the theoretical limits of self-calibration: What is the Cramér–Rao bound for flat-field components given a self-calibration program? This, for Euclid.


Euclid and MCMC

I did some work on the (NYC) long weekend on two projects. In the first, I built some code to make possible observing strategies for the in-flight self-calibration program for ESA Euclid. Stephanie Wachter (MPIA) contacted me to discuss strategies and metrics for self-calibration quality. I wrote code, but realized that I ought to be able to deliver a sensible metric for deciding on dither strategy. This all relates to this old paper.

On Monday I discussed our nearly-finished MCMC paper with Dan Foreman-Mackey (Flatiron) and we decided to finish it for submission to the AAS Journals. I spent time working through the current draft and reformatting it for submission. There is lots to do, but maybe I can complete it this coming week?


dust-hidden supernovae

In my weekly parallel-hacking, I re-learned how to use kplr with Elisabeth Andersson (NYU).

This was followed by a nice talk by Mansi Kasliwal (Caltech) about the overwhelming force that she and others at Caltech are bringing to bear on time-domain astronomy. One of their projects will be imaging more than 3000 square degrees an hour! There isn't enough solid angle on the sky for them. She is finding lots of crazy transients that are intermediate in luminosity between supernovae and novae, and she doesn't know what they are. Also she may be finding the (long expected) fully-obscured supernovae. If she has found them, she may be doubling the observed supernova rates in nearby galaxies. Great stuff.

The day ended with lightning talks at the CCPP, with faculty introducing themselves to the new graduate students.


uncertainty propagation

I started the day with a long discussion with Ana Bonaca (Harvard) about how to propagate uncertainties in Galactic gravitational potential parameters into some visualization about what we (in that context) know about the acceleration field. In principle, the acceleration field is more directly constrained (by our dynamical systems) than the potential. What we want (and it is ill-posed) is some visualization of what we know and don't know. Oddly, this conversation is a conversation about linear algebra above all else. We both admitted to each other on the call that we are both learning a lot of math in this project!
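
The simplest version of that linear algebra is first-order propagation: if J is the Jacobian of the acceleration with respect to the potential parameters, the acceleration covariance is J C J^T. The sketch below assumes a flattened logarithmic potential with parameters (v_c, q) purely as a stand-in; the potential, the numbers, and the function names are illustrative, not the model used in this project.

```python
import numpy as np

def acceleration(theta, pos):
    """Acceleration from Phi = 0.5 v_c^2 ln(x^2 + y^2 + z^2 / q^2).

    theta = (v_c, q); pos = (x, y, z). With v_c in km/s and pos in kpc,
    the result has units of km^2 / s^2 / kpc.
    """
    v_c, q = theta
    x, y, z = pos
    s = x ** 2 + y ** 2 + (z / q) ** 2
    return -v_c ** 2 * np.array([x, y, z / q ** 2]) / s

def jacobian(func, theta, rel_step=1e-6):
    """Central finite-difference Jacobian of func with respect to theta."""
    theta = np.asarray(theta, dtype=float)
    J = np.empty((np.asarray(func(theta)).size, theta.size))
    for k in range(theta.size):
        step = np.zeros_like(theta)
        step[k] = rel_step * max(abs(theta[k]), 1.0)
        J[:, k] = (func(theta + step) - func(theta - step)) / (2.0 * step[k])
    return J

def propagate(cov_theta, func, theta):
    """First-order propagation: cov_out = J cov_theta J^T."""
    J = jacobian(func, theta)
    return J @ cov_theta @ J.T

theta = np.array([220.0, 0.9])
pos = np.array([8.0, 0.0, 1.0])
cov_theta = np.diag([5.0 ** 2, 0.0])   # only v_c uncertain in this toy case
cov_acc = propagate(cov_theta, lambda t: acceleration(t, pos), theta)
sigma_acc = np.sqrt(np.diag(cov_acc))
```

Repeating this at many positions gives a map of where the acceleration field is well or poorly constrained, which is one candidate for the visualization being discussed. It is only as good as the linearization.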

[My day ended early because: NYC Comic Con!]


Gaia and exoplanets

At our weekly Gaia DR2 prep workshop, a bunch of good ideas emerged from Megan Bedell (Flatiron) about exoplanet and star science. Actually, some of the best ideas could be done right now, before DR2! These include looking at our already-known co-moving pairs of stars for examples with short-cadence Kepler data or known planetary systems. There is also lots to do once DR2 does come out. In this same workshop, David Spergel (Flatiron) summarized the work that the Gaia team has done to build a simulated universe in which to test and understand its observations. These simulations are useful for trying out projects in advance of the data release.

In the afternoon, everyone at the Flatiron CCA, at all levels, gave 2-minute, 1-slide lightning talks. It was great! There were many themes across the talks, including inference, fundamental physics, and fluid dynamics. On the first topic: There is no shortage of people at Flatiron who are thinking about how we might do better at learning from the data we have.


systematizing surprise; taking logs

I had a substantial conversation with Kate Storey-Fisher (NYU) about possible anomaly-search projects in cosmology. The idea is to systematize the search for anomalies, and thereby get some control over the many-hypotheses issues. And also spin-off things around generating high-quality statistics (data compressions) for various purposes. We talked about the structure of the problem, and also what are the kinds of limited domains in which we could start. There is also a literature search we need to be doing.

I also made a Jupyter notebook for Megan Bedell (Flatiron), demonstrating that there is a bias when you naively take the log of your data and average the logs, instead of averaging the data. This bias is there even when you aren't averaging; in principle you ought to correct any model you make of the log of data for this effect, or at least when you transform from linear space to log or back again. Oh wait: This is only relevant if you are not also transforming the noise model appropriately! Obviously you should transform everything self-consistently! In this case we have nearly-Gaussian noise in the linear space (because physics) and we want to treat the noise in the log space as also Gaussian (because computational tractability). Fortunately we are working with very high signal-to-noise data, so these biases are small.
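
A toy version of that demonstration (not the notebook itself), assuming Gaussian noise in the linear space and a signal-to-noise of 10, both chosen for illustration. A second-order Taylor expansion of the log predicts a bias of about -sigma^2 / (2 truth^2) in the mean of the logs.

```python
import numpy as np

rng = np.random.default_rng(42)
truth, sigma = 1.0, 0.1                  # assumed S/N of 10
y = truth + sigma * rng.normal(size=200_000)

log_of_mean = np.log(np.mean(y))         # average first, then take the log
mean_of_logs = np.mean(np.log(y))        # take the log first, then average

measured_bias = mean_of_logs - log_of_mean
predicted_bias = -0.5 * (sigma / truth) ** 2   # second-order Taylor expansion
```

The bias falls off as the inverse square of the signal-to-noise, so at a signal-to-noise of 100 it is already at the level of a few parts in 10^5.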


exploration vs exploitation

I met with Lauren Anderson (Flatiron) first-thing to figure out how we can munge our hacky #GaiaSprint projects into real and cutting-edge measurements of the Milky Way. We looked at the VVV infrared survey because it ought to be better than 2MASS for mapping the inner disk and bulge. We looked at using SDSS photometry to map the halo. On the latter, the dust modeling is far simpler, because for distant stars, the dust is just a screen, not an interspersed three-dimensional field. We also discussed the ever-present issue for a postdoc (or any scientist): How much time should you spend exploiting things you already know, and how much exploring new things you want to learn?

In the morning I also discussed the construction of (sparse) interpolation operators and their derivatives with Megan Bedell (Flatiron).

At lunch, Yacine Ali-Haimoud (NYU) gave a great brown-bag talk on the possibility that black holes make up the dark matter. He showed that there are various different bounds, all of which depend on rich astrophysical models. In the end, constraints from small-scale clustering rule it out (he thinks). Matt Kleban (NYU) and I argued that the primordial black holes could easily be formed in some kind of glass that has way sub-Poisson local power. Not sure if that's true!