empirical models for APOGEE spectra

I spent a chunk of the day with Melissa Ness (MPIA), fitting empirical models to APOGEE infrared spectra of stars. The idea is to do a simple linear supervised classification or regression, in which we figure out the dependence of the spectra on key stellar parameters, using a "training set" of stars with good stellar parameters. We worked in the pair-coding mode. By the end of the day we could show that we are able to identify regions of the spectrum that might serve as good metallicity indicators, relatively insensitive to temperature and log-g. The hopes for this project range from empirical metallicity index identification to label de-noising to building a full data-driven (supervised) stellar parameter pipeline. We ended our coding day pretty optimistic.


dust priors and likelihoods

Richard Hanson and Coryn Bailer-Jones (both MPIA) and I met today to talk about spatial priors and extinction modeling for Gaia. I showed them what I have on spatial priors, and we talked about the differences between using extinction measurements to predict new extinctions, using extinction measurements to predict dust densities, and so on. A key difference between the way I am thinking about it and the way Hanson and Bailer-Jones are thinking about it is that I don't want to instantiate the dust density (latent parameters) unless I have to. I would rather use the magic of the Gaussian Process to marginalize it out. We developed a set of issues for the document that I am writing on the subject. At Galaxy Coffee, Girish Kulkarni (MPIA) gave a great talk about the physics of the intergalactic medium and observational constraints from the absorption lines in quasar spectra.


quasar continuum blueward of Lyman alpha, Galactic center

If you go to the blue side of Lyman alpha, at reasonable redshifts (say 2), the continuum is not clearly visible, since the forest is dense and has a range of equivalent widths. Any study of IGM physics or radiation or clustering or ionization depends on an accurate continuum determination. What to do? Obviously, you should fit your continuum simultaneously with whatever else you are measuring, and marginalize out the posterior uncertainties on the continuum. Duh!

That said, few have attempted this. Today I had a long conversation with Hennawi, Eilers, Rorai, and KG Lee (all MPIA) about this; they are trying to constrain IGM physics with the transmission pdf, marginalizing out the continuum. We discussed the problem of sampling each quasar's continuum separately but having a universal set of IGM parameters. I advocated a limited case of Foreman-Mackey and my endless applications of importance sampling. Gibbs sampling would work too. We discussed how to deal with the fact that different quasars might disagree mightily about the IGM parameters. Failure of support can ruin your whole day. We came up with a clever hack that extends a histogram of samples to complete support in the parameters space.

In Milky Way group meeting, Ness (MPIA) showed that there appears to be an over-density of metal-poor stars in the inner one degree (projected) at the center of the Milky Way. She is using APOGEE data and her home-built metallicity indicators. We discussed whether the effect could be caused by issues with selection (either because of dust or a different explicit selection program in this center-Galaxy field). If the effect is real, it is extremely interesting. For example, even if the stars were formed there, why would they stay there?


extinction and dust, H-alpha photons

While "off the grid" for a long weekend, I spent time writing documents for Coryn Bailer-Jones (MPIA) and Dennis Zaritsky (Arizona). The former was about using spatial priors for inference of the three-dimensional dust density constrained by Gaia data. If you use a Gaussian Process spatial prior, you can perform the inference in extinction space (not dust space) and transfer extinction predictions to new points given extinction data without ever explicitly instantiating the dust density field. This is not a genius idea; it flows from the fact that any linear projection of a Gaussian pdf is itself a Gaussian pdf. The whole thing might not be computationally tractable, but at least it is philosophically possible. One issue with using a Gaussian Process here is that it puts support onto negative dust densities. I don't think that is a problem, but if it is a problem, the fixes are not cheap.

The latter document—for Zaritsky—is about finding the H-alpha photons that are coming from the outskirts of low-redshift galaxies by doing cross-correlations between SDSS spectroscopy and nearby galaxy centers. This project is designed to test or constrain some of the ideas in a talk at NYU by Juna Kollmeier a few months ago.


coffee, objectives in calibration

At MPIA Galaxy Coffee, Schmidt (UCSB) showed resolved spectroscopy of some highly magnified high-redshift galaxies to look for spatial variations of metallicity. Beautiful data! He also talked about the photon density at high redshift and the implications for reionization. McConnell (Hawaii) spoke about the need for more good black-hole mass determinations. He (inadvertently perhaps) showed that the BH-sigma (velocity dispersion) relation could easily have zero scatter, when you consider the uncertainties in both directions and the possibility that the sigma uncertainties are under-estimated. In general it is very hard to get a sigma on a sigma! Filed away for future thinking.

In the afternoon, I spoke with Wang, Schölkopf, Foreman-Mackey about Kepler calibration. We tentatively decided to make exoplanet discovery our primary objective. An objective is required, because we can set our "hyper-parameters" (our choices about model and model complexity) to optimize anything from exoplanet discovery to exoplanet characterization to stellar rotation determination. I am still worried about the flexibility of our models.


spiral structure in the Milky Way, K2

In Milky Way group meeting, Bovy showed results from a red-clump-star catalog derived from SDSS-III APOGEE and Schlafly showed extinction-corrected stellar maps sliced by distance from PanSTARRS. Both of these data sets plausibly contain good evidence of the spiral arms in stellar density, but neither of the first authors were willing to show me that evidence! I couldn't convince Bovy to help me out, but I think Schlafly might make the plot I want, which is the density as a function of Galactic X and Y for stars at low Z, divided by a smooth Galaxy model.

In a phone call late in the day, Ian Crossfield (Arizona) and Erik Petigura (Berkeley) and I discussed K2 data. I (perhaps unwisely) agreed to see if I can photometer the data, given pixel files.


exoplanet photometry, crossing the streams

Nick Cowan (Northwestern, Amherst) was in town today to give a seminar about exoplanet thermodynamics and climate. He showed nice results inferring the temperature distribution on the surfaces of hot jupiters and the same for degraded data on the Earth. He spent a lot of time talking about thermostats and carbon and water cycles on the Earth and Earth-like planets. In the morning I discussed my OWL photometry with him, and we proposed to try it on his massive amounts of Spitzer IRAC data on transiting exoplanets.

After lunch, a group of the willing (Rix, Bovy, Sesar, Price-Whelan, myself) discussed stream fitting in the Milky Way. We decided to fit multiple streams simultaneously, starting with Orphan, GD-1, and Palomar 5. The first step is to gather the data. Price-Whelan and I also have to modify our method so that it can take the heterogeneous kinds of data in the GD-1 data set.


centroiding and searching

I spoke with Vakili about centroiding stars. We are trying to finally complete a project started by Bovy ages ago to compare the best-possible centroiding of stars with a three-by-three pixel hack related to what is done in the SDSS pipelines. Vakili hit this issue because if you don't have good centroids, you can't get a good point-spread function model. Well, actually I shouldn't say "can't", because you can but then you need to make the centroids a part of the model that you learn along with the point-spread function. That may still happen, but along the way we are going to write up an analysis of the hack and also the Right Thing To Do.

Foreman-Mackey, Fadely, Hattori, and I discussed Hattori's search for exoplanets in the Kepler data. The idea is to build up a simple system based on simple components and then swap in more sophisticated components as we need them. We discussed a bit the question of "search scalar"—that is, what we compute as our objective function in the search. There is a likelihood function involved, but, as we often say in #CampHogg: Probabilistic inference is good at giving you probabilities; it doesn't tell you what to do. Search is a decision problem.


Gaia RVS spectra

As I mentioned a few days ago, Mark Cropper (UCL) brought up the question of how to infer clean, high-resolution spectra from the Gaia RVS data, which are afflicted by distortions caused by charge-transfer inefficiency. I spent my small amount of work time today drafting a document with my position, which is that if you can forward-model the CTE (which they can) then spectral extraction is not hard.


GaiaCal, day four

Tne theme of this short meeting has been that things we thought were real astrophysical parameters just might not be. Soderblom continued along this path when he showed that it is impossible to measure effective temperature to better than something on the order of 100 K. That limits the precision of model comparisons, even in the absence of modeling uncertainties (which are currently large). I argued that perhaps we should just give up on these ersatz "physical" parameters and go to predictions of observables. I was shouted down.

Finkbeiner showed unreal comparisons between SDSS and PanSTARRS photometry, with each survey synthesizing the other. He can show with enormous confidence which survey is responsible for which photometric residuals, just by projecting the residuals onto the sky or into the focal planes of the two instruments. This is an interesting kind of "causal inference". It is incredibly convincing; I must discuss it with Schölkopf next month.


GaiaCal, day three

It was a half-day at the meeting today. Highlights included a talk by Bergemann about stellar models, which (perhaps unwittingly) demonstrated that the effective temperature, spectrum, and color of a star are not unique predictions of a model. That is, all of them are time-dependent; we really need to prediction distributions, not values. In the questioning, Davies pointed out that the variability could be used as a predictor of the depth of the photospheric convection zone or turnover scale. We discussed that later; resolving to think more about it.

Creevey convinced me that asteroseismology might be the One True Path to reliable calibration of astrophysical determinations of stellar densities, internal structure, and ages. An asteroseismology mission with the same scope as Gaia would change everything forever! In the mean time, asteroseismology can be used to calibrate the stars we most care about.

At the end of the sessions, I spoke, emphasizing the need to preserve and transmit likelihood functions, echoing comments by Juric that this is a goal of the planning in the LSST software group.


GaiaCal, day two

A lot of the second day of #GaiaCal was about "benchmark stars". Indeed, my concerns about "truth" from yesterday got more severe: I don't think there is any possibility of knowing the "true" value of Fe/H or anything else for a star, so we should probably figure out ways of doing our science without making reference to such inaccessible things. One approach is to define the results of a certain set of procedures on a certain set of stars to be "truth" and then warp all models and methods to reproduce those truths. That's worth thinking about. During the day's proceedings, Blomme and Worley showed some incredible spectra and spectral results on the benchmarks, including abundance measurements for some 25 elements.

Along different lines, Cropper gave an astounding talk about the Gaia RVS spectral data reduction pipeline. It was astounding because it involves a complicated, empirical, self-calibrated model of charge-transfer efficiency and radiation damage, which is continuously updated as the mission proceeds. It also includes a spectral calibration (wavelength solution, astrometry) that is fit simultaneously with all spectral extractions in a complete self-calibration. Cropper asked the implicit question in his talk: "How do we extract CTE-undamaged spectra from CTE-damaged data?". I think I know how to answer that; resolved to write something about it.

Liu showed amazing empirical calibrations of the LAMOST spectra using SEGUE parameters and support vector machines. He has done a great job of tranferring the one system onto the other data, which gives hope for the benchmark plans that are emerging. Bovy gave a nice talk about ages (birth dates), chemical abundances, and orbital actions serving as stellar "tags" which can then be understood statistically and generatively.


GaiaCal, day one

Today was the first day of the Astrophysical calibration of Gaia and other surveys meeting at Ringberg Castle in Tegernsee, Germany. The meeting is about methods for combining data to produce good astrophyiscal parameters from current and future surveys. There were many great talks today, but highlights were the following:

Andrae showed work on simulated data to demonstrate that the final Gaia Catalog will have good parameters (effective temperature, surface gravities, and so on) for stars. He showed that the differences between models is much larger than any precision of the data, and that therefore no models are expected to be "good fits" in any sense. He showed that they can recalibrate or adjust the input spectral models to force them to agree with the data, and that this seems to work extremely well.

At the end of a talk by Korn, Fouesneau commented that every model incorporated into the Gaia pipelines ought to include (if it has it) predictions for wavelengths outside the Gaia bandpasses. This is because then the results of fitting to Gaia data can be used to make predictions in other surveys. This comment resonated with me, because in the talks today there was a lot of talk about the "true" values of astrophysical parameters, but since these latent values are inaccessible to direct observation, we need other ways to assess credibility.

In between talks, Bovy made a nice point, which is that if Gaia wants end users to be able to compute likelihoods (and not just get point estimates or best-fit values) for astrophysical parameters, they should project all models onto their various systems (the G magnitude system, the BP/RP low-resolution spectral system, and the RVS high-resolution spectral system). This would permit end users to re-compute likelihoods for all data for all models.


web cams and James Bradley

Over lunch, Markus Pössel (MPIA) mentioned that he can measure the sidereal day very accurately, using a fish-eye or wide-field web cam pointed at the sky. This led us to a discussion of whether it would be possible to repeat Bradley's experiments of the 1700s that measured stellar aberration, precession of the Earth's axis, and nutation. Pössel had the very nice realization that you don't have to specifically identify any individual stars in any images to do this experiment; you can just do cross-correlations of multi-pixel time series. That's brilliant! We decided to discuss again later this month along with a possible (high school) student researcher.

Before that, Roberto Decarli (MPIA) and I discussed various projects. The most interesting is whether or how you can "stack data" (combine information from many images or many parts of an image) but in interferometric imaging data. Decarli has shown that you can do this stacking in the fourier space rather than in the image space. That's excellent, because the noise properties of the data are (conceivably) known there, but never understood properly in the image space. I gave him my usual advice, which is to replace the stacking with some kind of linear fit or regression: Stacking in bins is like linear fitting but under hard assumptions about the noise model and properties of the sources. We agreed to test some ideas.


Gaia status, Kepler calibration

Great conversations today with Brewer about sampling and inference, Knut Jahnke and Stefanie Wachter (MPIA) and Doug Finkbeiner (CfA) about Euclid calibration, and Jennifer Hill (NYU) about data science. But that's not all:

In the morning, Coryn Bailer-Jones (MPIA) gave a status update on the Gaia mission, which was launched in December. It is performing to spec in almost all respects. Bailer-Jones gave us straight talk on three issues they now face: The scattered light (from sources not in the field of view) is much higher than expected, reducing the magnitude limits for all accuracies and capabilities. There is something (probably water) affecting the total photometric throughput. It looks like this can be mitigated with occasional thermal cycling of the optics. The "fundamental angle" between the two telescopes seems to be varying with time with a larger amplitude than expected. This can be modeled, so long as it is understood properly. I think I speak for the entire astronomical community when I say that we can't wait for early data releases and wish the Gaia Collaboration the best of luck in addressing these challenges.

In the afternoon, Dun Wang, Foreman-Mackey, Schölkopf, and I worked through Wang's results on Kepler data-driven calibration. I am pretty excited about this project: Wang has shown that when we "train" the model on data not near (in time) to an injected transit, the data recalibrated with the model produces unbiased (on average) inferences about the transit. We assigned Wang next steps.