On my first day in Heidelberg, I attended a colloquium talk by Maryam Modjaz (NYU), about exploding stars. She has very nice results on the metallicities of the environments (galaxy hosts) of supernovae and gamma-ray bursts. She can show that the type Ic broad-lined supernovae occur in different chemical environments from the type Ic normal supernovae, and she can show that the SNe associated with GRBs are Ic broad-lined. So the non-GRB type Ic broad-lined supernovae are very likely the counterparts of off-axis gamma-ray bursts. The gamma-ray bursts without gamma rays! This is exciting, because it will bolster the model for GRBs and constrain the beaming.
It was with the greatest pleasure that I participated in the PhD defense today of Dun Wang (NYU), who has been my student these last five years. He has done a remarkable body of work: He has a very good model for the NASA Kepler data, using pixels to predict other pixels. He has a completely novel method for image differencing, where he doesn't need a reference image (and instead uses a time series of images to build a predictive model). And he has a data-driven model for the pointing (as a function of time) and sensitivity map for the last days of the NASA GALEX mission, where the camera was scanned rapidly back and forth across the Galactic Plane.
I have many things to say about this work, but here are just a few: Wang's work encouraged me to think about extremely big models! I think his model of the Kepler data has more free parameters than any model of anything, ever (literally close to a trillion). Gotta love convexity! He used his image differencing to discover completely new microlensing events in the K2 Campaign 9 data. He has the first ever ultraviolet maps of the Milky Way disk plane at this depth and resolution. It is a very impressive body of work.
Congratulations Dr Wang. And thank you!
My only real research today was a session with Dun Wang (NYU) in preparation for his PhD defense. I encouraged him to talk about less, not more: The defense (at least here) doesn't need to be about everything (that would take hours, anyway); it should be about what you learned, and what was most fun.
Foreman-Mackey (Flatiron) showed me something interesting today: He is doing full sampling of one-planet and two-planet transiting systems in Kepler but using models that have many more than one or two planets. Where many is like four! But he is learning some very interesting things: One is that you can do this, as long as you let planet radii go to zero. Another is that the ease of sampling depends strongly on the eccentricity prior. That isn't surprising in retrospect.
One of the motivations of this, I think, is to get away from computing Bayesian evidence ratios between different multiplicity models: After all, to compute these ratios, you have to sample the large-N model anyway, so why not just sample that one N once and treat the problem as a parameter-estimation problem rather than an evidence problem? That's dear to my heart. Another angle that I'm interested in is the following: We know that our own Solar System in fact has thousands to millions of planets; can we deal with that in a more non-parametric way?
Crazy idea: Send N to infinity and fix the planet periods (say) and then see if you can sample in the other orbital parameters. Right now that doesn't seem feasible, but it might be the truly non-parametric approach.
Today was the hack day associated with #wetton18. It was a great day! I had an incredibly limited goal: As I mentioned, I learned yesterday that some Gaia Bp Rp spectra (the low-resolution spectrophotometry) have been released with the Gaia transient alerts. They are uncalibrated, and possibly heavily affected by systematics, but there are many thousands of them! So my goal was to just plot some of these spectra.
What a success this was! Once I realized (and announced) that the project involves scraping data from web pages, Brigitta Sipocz (Cambridge) immediately volunteered to help. She (incredibly quickly) built a tool that scrapes the raw Gaia data from the alerts pages, refactors it into a correctly formatted astropy table, and writes it out as a fits file, structured so that multiple scrapings can be concatenated into a larger table.
We made the visualization shown in this tweet. That shows a blue star that is fading. Because it is a relatively normal star (that is, not a supernova), maybe we could use it, and others like it, to build some kind of model of the Bp Rp spectra. Our code is here.
The Wetton Workshop opened today with amazing talks by Udalski and Wyrzykowski about the OGLE project and data. It is truly incredible what has been achieved in this survey, which was designed with a very forward-looking goal of detecting microlensing by compact objects in the dark sector. The project detected all kinds of other expected and unexpected time-domain phenomena. These talks were followed by Alexander Scholz (St Andrews) providing some philosophical basis for looking for and at anomalies in data streams. He gave the good advice (and OGLE is a great example of this) to look at timescales or wavelengths or precisions where no-one has looked before. Hear, hear! (He is also the lead of the WETI project, of which I am a big fan.)
There were too many things that I loved today; I can't list them all here! But one personal highlight was an exciting talk by Thomas Wevers about the Gaia alerts system, which is putting Gaia data on-line in real time when stars vary strongly, or when new sources appear on the sky. It produces a few alerts a day, and the data dump includes the epoch photometry and the raw Bp-Rp low-resolution spectra! This got me extremely excited: I haven't seen any Bp-Rp spectra yet, and there are now thousands online. I resolved to look at them asap. Wevers warned us that the spectra are not calibrated in any sense: not in wavelength, and not photometrically.
Today was the first day of the Wetton Workshop at Oxford. There were many interesting talks from all over the map, but with a shared goal of understanding how we make sure that we stay open to unexpected discoveries, even as we make more and more targeted data sets and experiments. One theme that emerged is that of systematics: As you push data harder and harder—in cosmology or exoplanet search or anything else—you become more and more sensitive to the details of your hardware and electronics and selection and so on. This led to a discussion of end-to-end simulation of data sets to understand how hardware issues enter and to see if we understand the hardware.
That's important! But I think there is an equally important aspect to this: If we don't take our data with sufficient heterogeneity, we can't learn certain things. For example, if you take all LSST exposures at 15 seconds, you never test the shutter, never test linearity of the detector, never find out on what time scales the PSF is changing, and so on. For another, if you take all the Euclid imaging survey on a regular grid, you never get cross-calibration information from one part of the detector to another, nor can you find certain kinds of anisotropies in the detector or the point-spread function. If we are going to saturate the bounds, we are going to need to take science data in many, many configurations.
Here are the slides from the public talk I gave at the end of the day. Note my digs at press-release artists' conceptions. I think we should be honest about what we do and don't know!
I traveled to Oxford today for #wetton18. As part of this meeting, I am giving a public talk in Oxford. I spent the time on the plane I should have spent sleeping making slides. I think the hard thing about a public talk is always level; there is always a diverse audience with very different interests and backgrounds. The talk I made—about finding planets—requires some sophisticated reasoning to make sense. I fear that I have bitten off too much.
One thing I will definitely put into the talk is some material about the limitations of our knowledge in astrophysics. It comes from the point that many things can't be independently confirmed, especially when they are at the limits of our observing capabilities. It is a bit hard to present this without sounding like we don't believe anything. That's a challenge.
It was my great pleasure to sit on the PhD defense committee for the successful defense of Sarah Pearson. She wrote a thesis about low-mass galaxies and globular clusters, considering both their interactions with each other, and with the bigger galaxies into which they later fall. She has some nice analyses of the Palomar 5 tidal stream, and what its morphology might tell us about the Milky Way halo and bar. And also nice results on gas bridges and streams around pairs of dwarf galaxies.
I was most interested in her stellar-stream results, including several things I hadn't thought about before: One is that prograde streams are more affected by the bar and spiral arms in the disk than retrograde streams. Another is that we might be able to find globular-cluster streams around other galaxies nearby. That would be incredible! And since (as she showed) you can learn a lot about a galaxy just from the shape of a stream, we might not need to do much more than detect streams around other galaxies to learn a lot. It was a pleasure to serve on the committee, and it is a beautiful body of work.
Today was my last day and fifth lecture at TASI. This lecture was crowd-sourced in content! I spoke about Fisher information, linear algebra tips and tricks, and decision theory and model selection. On the latter I strongly advocated engineering methods like cross-validation!
Over lunch I had a great set of conversations with Zach Berta-Thompson about precise measurement for exoplanets, and also hack weeks like the #GaiaSprint. We went deep into the limits on ultra-precise photometry from the ground. We wondered at the point that the best imaging systems get the best precision (on photometry, of point sources) by de-focusing. That has always struck me as somehow absurd, though it's true that you don't have to understand your system nearly so well when you are out of focus (for many reasons).
We had one very good idea: Instead of de-focusing, put in an objective prism! You could get many of the benefits of de-focus but also get far more information about the atmosphere and speckle and scintillation and so on. In principle, you might beat the best measurements made to date. And it is a cheap experiment to perform.
Today my lecture was crowd-sourced! In response to popular opinions from the students, I spoke about cosmological large-scale structure experiments. I spoke about how the large surveys are collapsed to symmetry-respecting mean, variance, and three-point functions, and how simulations of large-scale structure are used to build surrogate likelihood functions for these summary statistics.
Today was my second day of lecturing at TASI, and I gave one (morning) lecture, on the use of MCMC sampling. In the afternoon, I looked at (for the first time) the GALAH data on detailed element abundances of stars. I looked at the question of whether the chemical abundances could be used to predict the Galactocentric radii. The idea is: If the gas involved in star formation is azimuthally mixed, there ought to be relationships between radius in the disk and chemical abundances. They didn't jump out! I have various ideas about why, but for now this will be back-burner.
Today was the sixth day (but my first day) of the Theory Advanced Study Institute summer school at CU Boulder. I gave two 75-minute lectures on data analysis, my first two of five lectures this week. In the first, I tried to boil down data-analysis to a set of over-arching principles. I got 8 principles. Maybe this is the introduction to the book I will never write! In the second lecture I spoke about fitting a model, from a frequentist perspective, but with a focus on the likelihood function. I am loving the interactive audience. See the wiki for a (constantly updating) description of the lectures I am giving.
After a long week (and some great success), all Christina Eilers (MPIA) and I had in us today was making the short-term to-do list for our spectroscopic parallax project (which, by the way, Hans-Walter Rix thinks we shouldn't name that way!) and our related Milky-Way mapping project. In my wrap-up slide, I used my two minutes to speak of the conceptual things we learned about linear models and their power.
The wrap-up was given in two separate groups, in parallel. We were forced to this by space and the size of the Sprint. There were many complaints! But if you want to look at the incredible set of wrap-up slides, look here! You will see some amazing things in there. I was blown away, and several participants told me that it was a very important meeting for them. Our explicit (not implicit) goal is to increase the scientific productivity of Gaia and the community of astrophysics that it supports; I very much hope we succeed in doing that. Today, I am optimistic that we can.
Because we did so many experiments this year, with selection, with splitting the group, with communication, and so on, we learned a lot. We made many mistakes. I hope we can capitalize on these mistakes to learn for future projects, like the next Sprint, and all the other hacking and sprinting and parallel-working things we do.
It's hard work, this full week of sprinting! Especially following a week of hacking in preparation! I was exhausted today (and I can't entirely blame Andy Casey, though I'd like to). Christina Eilers (MPIA) continued with her map-making work. We started the day by trying to find chemical-abundance neighborhoods (that is, regions of element-abundance-space) where the stars lie on a ring in the Milky-Way disk. There should be such rings if we can measure the abundances well enough! But we failed.
In other news, Andy Casey (Monash) and Adrian Price-Whelan (Princeton) asserted to me that they can take the stars in APOGEE with multi-modal posterior pdfs in orbital-companion space (that we produced here) and rule out some modes just with the Gaia DR2 radial-velocity mean and variance (which is all we get!). I hope this is true.
And in yet other news, David Spergel (Flatiron) and Megan Bedell (Flatiron) not only found co-moving stars in the halo, but find that as the separations get large, the velocity vectors point parallel to (or anti-parallel to) the separation vector between the stars. Duh! Disrupted binaries are two-star streams. I pointed out (to some skepticism) that the velocity differences between very wide pairs of nearly-comoving stars could be used to make local acceleration maps of the Milky-Way halo. Stoked!
Boris Leistedt (NYU) and I have been talking for a while about a set of subjects related to the point that proper motions and parallaxes are both inversely related to distance, so you can use them to inform one another. This is a covariance induced by the geometry! Today he got this all working, along with a hierarchical inference of the velocity distribution in the Milky-Way halo. It is early days, but it looks like he substantially improves the parallax estimates for most stars. And, importantly, he can produce improved parallax likelihoods not just improved parallax posteriors. That is, they have wider use in downstream inference than, say, the Bailer-Jones et al distances. But still they will be hard to use absolutely correctly.
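The geometric idea is simple enough to sketch numerically. Here is a toy version (not Leistedt's actual code; all the measurements and the halo-velocity prior below are invented numbers): because proper motion times distance is a tangential velocity, a prior on halo velocities turns the proper motion into distance information that sharpens the parallax.

```python
import numpy as np

# Toy sketch of the parallax / proper-motion covariance (invented numbers):
# both observables scale as 1/distance, so a velocity prior lets the
# proper motion improve the distance (and hence the parallax) inference.
K = 4.74047  # km/s per (mas/yr * kpc)

# hypothetical measurements for one halo star
plx_obs, plx_err = 0.5, 0.2        # parallax, mas
pm_obs = 10.0                      # total proper motion, mas/yr (error neglected here)
v_mean, v_sig = 200.0, 50.0        # km/s, crude halo tangential-velocity prior

d = np.linspace(0.1, 10.0, 2000)   # distance grid, kpc
# parallax-only likelihood, evaluated on the distance grid
L_plx = np.exp(-0.5 * ((1.0 / d - plx_obs) / plx_err) ** 2)
# proper-motion term: implied tangential velocity vs. the velocity prior
v_t = K * pm_obs * d
L_pm = np.exp(-0.5 * ((v_t - v_mean) / v_sig) ** 2)

post = L_plx * L_pm
post /= post.sum()                 # discrete normalization on the uniform grid
d_map = d[np.argmax(post)]
```

The combined posterior is tighter than the parallax-only posterior exactly because the two observables share the 1/distance geometry; a full version would do this hierarchically, inferring the velocity distribution at the same time.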
Andy Casey (Monash) and I discussed a possible non-parametric model for the radial-velocity scatter delivered in Gaia DR2. This model would compare any star to its neighbors in relevant parameters (like color and apparent magnitude and housekeeping flags) to establish whether it has enough of a RV excess to be considered a likely binary.
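A minimal sketch of that neighbor-comparison idea might look like the following (this is my own toy, not Casey's code; the data, the injected binaries, the neighborhood size, and the excess threshold are all invented):

```python
import numpy as np

# Toy neighbor-based RV-excess model (all numbers invented): compare each
# star's RV scatter to the median scatter of its nearest neighbors in
# (color, magnitude) space, and flag stars far above their neighbors.
rng = np.random.default_rng(42)

n = 500
color = rng.uniform(0.5, 2.0, n)         # stand-in for BP-RP color
mag = rng.uniform(8.0, 14.0, n)          # stand-in for apparent G magnitude
rv_scatter = rng.exponential(1.0, n)     # km/s, single-star baseline scatter
rv_scatter[:25] += 20.0                  # inject 25 "binaries" with big excess

# standardize the neighborhood coordinates so the distance metric is sensible
X = np.column_stack([color, mag])
X = (X - X.mean(axis=0)) / X.std(axis=0)

def rv_excess(i, k=30):
    """RV scatter of star i relative to the median scatter of its k
    nearest neighbors in the standardized (color, mag) plane."""
    d2 = ((X - X[i]) ** 2).sum(axis=1)
    idx = np.argsort(d2)[1:k + 1]        # nearest neighbors, skipping star i
    return rv_scatter[i] / np.median(rv_scatter[idx])

flags = np.array([rv_excess(i) > 10.0 for i in range(n)])
```

The median over neighbors makes the baseline robust to the binaries themselves contaminating the neighborhood; a real version would also condition on the housekeeping flags.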
Ana Bonaca (Harvard) showed me maps of the Jhelum stellar stream which make it look (to my eye) like a fold caustic! Many moons ago, Scott Tremaine (IAS) asked me if we could find various kinds of catastrophes in the stellar density, and I (and friends) responded with this paper on the cusp catastrophe. Maybe Bonaca has found one, but a fold! (And folds should be more common than cusps.)
Christina Eilers (MPIA) and I temporarily paused our methodological developments on our spectroscopic-parallax project and made maps of the Milky-Way disk. We tried plotting velocities, abundances, and vertical distortions (warps). Getting good visualizations is hard because the APOGEE selection function is so featured. That reminds me of why I am such a big fan of SDSS-V!
Many interesting things were shown in the afternoon check-in, but incredibly Sihao Cheng (JHU) and Sergey Koposov (CMU) found that galaxies appear in the Gaia DR2 data as variable stars! Why? Because the asymmetric Gaia point-spread function projects onto the complex galaxy morphology differently at different spacecraft orientations. That rocks! In principle the galaxy morphologies could be inferred from the time-variable data...
Oh yes it worked! Today Christina Eilers (MPIA) clearly got 10-percent parallax precision with her linear, data-driven spectroscopic parallax model, making use of APOGEE data and WISE and Gaia photometry. The model is literally a linear combination of inputs, with a hard regularization and cross-validation to protect against over-fitting. Because our outputs are the cross-validation predictions, every spectroscopic parallax we produce is technically independent of the Gaia training data (although there are some residual correlations etc if you really want to go deep). From Daniel Michalik (ESTEC) we learned a lot about both astrometric and photometric data-quality filtering for the Gaia data, which (we can see in the residuals) will further improve our results!
Because the model is purely linear, we can propagate uncertainties easily, and we can “run it both ways” as it were. We can definitely do better if we go to a nonlinear model, because linearity is such an absurdly difficult constraint. However, it is so beautiful to have a linear relationship, we might stay here for a few papers!
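To illustrate the shape of the method (this is a schematic toy, not the Eilers pipeline; the fake features, the fold count, and the regularization strength are invented): the model is a pure linear combination of features fit with strong L2 regularization, and every star's spectroscopic parallax is the prediction from a fold that excluded that star, so it is out-of-sample with respect to the training parallaxes.

```python
import numpy as np

# Schematic linear spectroscopic-parallax model (invented toy data):
# ridge-regularized linear fit, with out-of-fold predictions so that no
# star's output depends on its own training parallax.
rng = np.random.default_rng(0)

n, p = 400, 50
X = rng.normal(size=(n, p))                  # stand-in for spectra + photometry
w_true = rng.normal(size=p) / np.sqrt(p)
parallax = X @ w_true + 0.05 * rng.normal(size=n)  # "training" parallaxes

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: w = (X^T X + lam I)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cross_val_predict(X, y, lam, k=5):
    """Each fold is predicted by a model trained on the other folds."""
    yhat = np.empty_like(y)
    folds = np.array_split(np.arange(len(y)), k)
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        w = ridge_fit(X[train], y[train], lam)
        yhat[test] = X[test] @ w
    return yhat

pred = cross_val_predict(X, parallax, lam=10.0)
frac_err = np.std(pred - parallax) / np.std(parallax)
```

Because the whole thing is linear, propagating input uncertainties through to the predictions is just matrix algebra, which is part of why linearity is so attractive despite being such a strong constraint.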
Many incredible results appeared today, but one that struck me is the following: Laura Inno (MPIA) looked at the Cepheid variables in the data, where she has ages and photometric distances and kinematics. She clearly sees a substantial warp in the outer disk. The question came up: Is this the same as the warp in the gas disk?
Today was the first day of the 2018 NYC Gaia Sprint, with satellite events in Santa Barbara and Seattle. 90 astrophysicists converged on Flatiron to pitch and start hacking. For those who don't know the event, the idea is that it is a working meeting, where participants are asked to move their scientific projects forward and start new ones, and there is (almost) no formal schedule at all. Everything other than the pitch at the beginning and the wrap-up at the end is crowdsourced.
One of the two scheduled hours of the entire week was the introductory pitches today. The pitches spanned a huge range of topics and interests. The pitch slides are here.
As my loyal reader knows, there are far more projects to do with Gaia DR2 than years left in my life, so I have to choose! I decided, with little consideration, to concentrate on the spectroscopic-parallax project with Eilers (MPIA). This is a tool-building project, with methodological aspects that are interesting, and so it isn't a terrible choice. It also serves the long-term goals of SDSS-V. In service of this project, we matched our sample to the WISE data and removed our Galactic latitude cuts so that the model could automatically capture the dust reddening and extinction. We'll see if that works, tomorrow.
At the evening check-in, some great stuff was shown, especially some extremely odd kinematics of the Milky Way disk, and some population results on binary stars.
Christina Eilers (MPIA) and I went off the reservation today and implemented a L1-regularized linear regression method for our spectroscopic-parallax project. This permitted us to consider using the full spectrum as a feature vector, and not just derived quantities. That is, it obviated a lot of our feature engineering! But we also discovered a massive bug: We had been using the uncertainties where we should have had our inverse variances! That's been done before. But when we made these changes, we got better results; it looks like we might be able to beat 10-percent distances with a little more work.
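For concreteness, here is a toy of the L1-regularized, inverse-variance-weighted fit (not our actual code; the fake spectra, the penalty strength, and the solver choice are all invented, and I use a simple proximal-gradient solver rather than whatever a polished implementation would use):

```python
import numpy as np

# Toy L1-regularized linear regression on "full spectra" (invented data):
# the data are weighted by inverse variances (the bug was using sigmas!),
# and the L1 penalty zeroes out uninformative pixels.
rng = np.random.default_rng(1)

n, p = 300, 200                      # stars, spectral "pixels"
X = rng.normal(size=(n, p))
w_true = np.zeros(p)
w_true[:10] = rng.normal(size=10)    # only a few pixels carry the signal
sigma = rng.uniform(0.05, 0.5, n)    # per-star uncertainties
y = X @ w_true + sigma * rng.normal(size=n)
ivar = 1.0 / sigma**2                # inverse variances, NOT uncertainties

def lasso_ista(X, y, ivar, lam, n_iter=5000):
    """Proximal-gradient (ISTA) solver for
    0.5 * sum_i ivar_i * (y_i - X_i . w)^2 + lam * ||w||_1."""
    w = np.zeros(X.shape[1])
    H = X.T @ (X * ivar[:, None])                 # weighted Hessian
    step = 1.0 / np.linalg.norm(H, 2)             # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = X.T @ (ivar * (X @ w - y))
        w = w - step * grad
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)  # soft threshold
    return w

w_hat = lasso_ista(X, y, ivar, lam=200.0)
```

The soft-thresholding step is what sets uninformative coefficients exactly to zero, which is the sense in which the L1 penalty obviates hand-built feature engineering.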
Andy Casey (Monash) and Natalie Hinkel (Vanderbilt) showed me the self-calibration results they have from the Hypatia Catalog. The results are beautiful! They have affine transformations that translate labels in one survey into labels in another survey, and de-noised labels for all surveys. It is cool! Much more needs to be done. But a great start, and very promising for answering some of my questions about accuracy and precision.
At my data-group meeting, each participant had a short time interval to explain a figure they are working on. The range of subjects shown was amazing! And we learned that having everyone talk for a well-defined pre-set short time is better than having a few people talk for an undefined amount of time. That's a win.