Most methods for performing regressions don't provide natural uncertainties. Some do, of course! But few deliver uncertainties you will believe. I discussed these issues with Contardo (SISSA) today, in the context of our project to (confidently) find infrared excesses around boring old main-sequence stars. One option is to look at the performance on held-out data. But then you have to decide how to aggregate this information in a way that is relevant for each object in your sample: They probably don't all have the same uncertainty! Another option is to look at the variation of prediction across training sets. That's good! But it requires that you have lots of training data. In this case, we do, so that's where we are at right now.
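Here is a minimal sketch of the second option, variation of prediction across training sets. It assumes a plain linear regression and bootstrap resampling of a single training set; both are my stand-ins for illustration (not what we actually run), and the function name and synthetic data are made up.

```python
import numpy as np

def bootstrap_prediction_scatter(X_train, y_train, X_test, n_boot=200, seed=0):
    """Refit a linear model on bootstrap resamples of the training set and
    return the per-object mean and scatter of the test predictions."""
    rng = np.random.default_rng(seed)
    n = X_train.shape[0]
    # Design matrices with an intercept column.
    A_train = np.hstack([np.ones((n, 1)), X_train])
    A_test = np.hstack([np.ones((X_test.shape[0], 1)), X_test])
    preds = np.empty((n_boot, X_test.shape[0]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample training rows with replacement
        coeffs, *_ = np.linalg.lstsq(A_train[idx], y_train[idx], rcond=None)
        preds[b] = A_test @ coeffs
    return preds.mean(axis=0), preds.std(axis=0, ddof=1)

# Synthetic demonstration: training inputs live in [0, 1].
rng = np.random.default_rng(1)
X_train = rng.uniform(0.0, 1.0, size=(200, 1))
y_train = 2.0 * X_train[:, 0] + 1.0 + rng.normal(scale=0.1, size=200)
# One test object inside the training range, one far outside it.
X_test = np.array([[0.5], [5.0]])
mean, sigma = bootstrap_prediction_scatter(X_train, y_train, X_test)
```

The point is that sigma comes out per object: the test point far outside the training range gets a larger scatter than the one inside it, which is exactly what an aggregate held-out score can't tell you.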
2023-09-20
2023-09-01
kinematic dipole and dust
I had a long conversation with Kate Storey-Fisher (NYU) and Abby Williams (Caltech) about the dipole in the Quaia catalog caused by the motion of the Solar System barycenter with respect to the cosmic rest frame. Williams has found that the amplitude of the dipole we get depends very strongly on how we account for dust in our sample. There is currently a controversy about the amplitude of the dipole seen in WISE quasars. We now think it is possible that the measured amplitude is a strong function of how dust is corrected for in the sample. We designed new tests for next week.
2023-07-26
elusive quasar dipole
There should be an imprint of the kinematic dipole observed in the cosmic microwave background in any cosmological tracer: The dipole is set by the Solar System barycentric velocity relative to the local Hubble flow, and that same velocity should imprint a dipole on anything cosmological. I have been working on this in part because it is a good measurement to make with Quaia, and in part because the dipole in the quasars is controversial. I have many thoughts, but I will save them for later.
Anyways, Abby Williams (NYU) has been working on making this measurement, and her dipole amplitude and direction depend on what we hold fixed and what we vary (in particular selection-function components), and they also depend on what sky region we use. None of this is surprising; the selection function has a strong dipole in it, and it is not known precisely. But then I don't understand how the studies published previously have such good error bars. Maybe they didn't consider the various different fitting regimes?
2023-07-19
extreme infrared excesses
Gaby Contardo (SISSA) showed up in Heidelberg today to make progress on our project on infrared excesses in normal, non-young FGK stars. Because we are using NASA WISE data (along with ESA Gaia and NASA 2MASS), we are only sensitive to bright, hot infrared excesses, much hotter and brighter than typical debris disks around old stars. We have some candidates, which range in temperature from 300 to 1500 K and are reprocessing maybe one percent or a fraction of a percent of the stellar light. (Warning: I haven't calculated this; this is just a guesstimate based on looking at plots.) What are those things? Today we figured out that they can't be warm substellar companions, so they have to be dust (I guess??).
2023-05-31
Dr Kate Storey-Fisher
Kate Storey-Fisher (NYU) defended her PhD here at NYU today. She killed it! She talked about emulating cosmological simulations (at the level of statistics, not maps), making invariant scalars that encode the shapes and dynamics of dark-matter halos, and her awesome 1.2 million all-sky quasar catalog from ESA Gaia and NASA WISE. It was all things my loyal reader knows lots about but I loved it. It has been an honor and a privilege to work with KSF these years, and I will miss her very very much.
2023-02-28
purity and completeness
Today Kate Storey-Fisher (NYU) and I discussed how to estimate the stellar contamination of her Gaia and WISE quasar catalog. Because there are few large, complete samples of anything, it’s hard to do this by comparison with any kind of Ground Truth™. What we realized on the call is that it’s easier to estimate how the contamination *changed* as we went from the Gaia quasar candidate table to our final sample. We discussed how to use what external data we have to estimate this.
2022-10-03
infrared-excess stars
My day started with a call with Gaby Contardo (SISSA) and Trevor David (Flatiron) about Contardo's project to find stars in ESA Gaia and NASA WISE that have infrared excesses. These stars should be young or dust-enveloped or host aliens! We are trying to phrase this problem as a prediction problem: How well can we predict infrared brightness from Gaia (visible) information, and are there stars with significant excess infrared? The answer seems to be yes: The histogram of differences between predicted and observed skews nicely to infrared-excess. Now: Are any of these known objects? And can we rediscover (say) star-forming regions?
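To make the prediction framing concrete, here is a toy sketch on synthetic data. It assumes a linear model, in-sample residuals, and a 5-sigma cut on a robust scatter estimate; all of those choices (and every number) are placeholders, not the method in Contardo's project.

```python
import numpy as np

rng = np.random.default_rng(42)
n_stars = 2000

# Synthetic stand-ins for visible-band features (say, a magnitude and a color).
visible = rng.normal(size=(n_stars, 2))
A = np.hstack([np.ones((n_stars, 1)), visible])
true_coeffs = np.array([12.0, 1.0, 0.3])  # made-up intercept and slopes
infrared_mag = A @ true_coeffs + rng.normal(scale=0.02, size=n_stars)

# Inject a few fake infrared excesses (brighter means a smaller magnitude).
injected = np.array([10, 250, 999, 1500, 1999])
infrared_mag[injected] -= 0.5

# Predict the infrared magnitude from the visible features.
coeffs, *_ = np.linalg.lstsq(A, infrared_mag, rcond=None)
residual = infrared_mag - A @ coeffs  # observed minus predicted

# Robust scatter from the median absolute deviation, then flag bright outliers.
sigma = 1.4826 * np.median(np.abs(residual - np.median(residual)))
flagged = np.flatnonzero(residual < -5.0 * sigma)
```

In real data you would want held-out predictions rather than in-sample ones, so that an excess star can't help to fit itself.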
2022-05-05
making a mock Gaia quasar sample
I had conversations today with both Hans-Walter Rix (MPIA) and Kate Storey-Fisher (NYU) about the upcoming ESA Gaia quasar sample. We are trying to make somewhat realistic mocks to test the size of the sample, the computational complexity of things we want to do, the expected signal-to-noise of various cosmological signals, and the expected amplitude and spatial structure of the Gaia selection function. We have strategies that involve making clean samples with a lognormal mock, and making realistic samples (but which have no clustering) using the Gaia EDR3 photometric sample (matched to NASA WISE).
2022-02-16
infrared excesses for planet-hosting stars
Gaby Contardo (Flatiron) and I went to the Gaia EDR3 Archive to make use of its matched catalogs, matching up Gaia, 2MASS, and WISE. We are looking at very short-period planet hosts, which might show interesting photometric deviations. We took one host star, and then found many other stars with similar photometry in the visible. Do they agree in the infrared? It looks like maybe there is a tiny discrepancy? But the power will come from doing many, not just one.
2021-03-17
we have new spectrophotometric distances
After a couple of days of hacking and data munging (and looking into the internals of JAX), Adrian Price-Whelan and I produced stellar distance estimates today for a few thousand APOGEE spectra. Our method is based on this paper on linear models for distance estimation with some modifications inspired by this paper on regression. It was gratifying! Now we have hyper-parameters to set and validation to do.
2020-11-30
re-doing spectroscopic distances in EDR3 with high-alpha too
Christina Eilers (MIT) and I discussed our ESA Gaia EDR3 projects today. Our top priority is to re-do our machine-learning (linear regression, really) spectrophotometric distance estimates for very luminous red-giant stars, and then re-map the Milky Way disk in abundances and kinematics. We think that even a small improvement in the parallaxes (as we expect to get on Thursday) might make a big difference to our inferred spectroscopic distances. We discussed the point that in our DR2 work we only used stars on the “low-alpha sequence”; we want to generalize if we are going to make complete abundance maps. But also the stars with different abundance trends might want very different distance estimation parameters. That suggests doing the EDR3 regression in a more “abundance-aware” way.
2019-08-20
visualizing substructure in large data
Today Doug Finkbeiner (Harvard), Josh Speagle (Harvard), and Ana Bonaca (Harvard) came to visit me in my undisclosed location in Heidelberg. We discussed many different things, including Finkbeiner's recent work on finding outliers and calibration issues in the LAMOST spectral data using a data-driven model, and Speagle's catalog of millions of stellar properties and distances in PanSTARRS+Gaia+2MASS+WISE.
Bonaca and I took that latter catalog and looked at new ways to visualize it. We both have the intuition that good visualization could and will pay off in these large surveys. Both in terms of finding structures and features, and giving us intuition about how to build automated systems that will then look for structures and features. And besides, excellent visualizations are productive in other senses too, like for use in talks and presentations. I spent much of my day coloring stars by location in phase space or the local density in phase space, or both. And playing with the color maps!
There's a big visualization literature for these kinds of problems. Next step is to try to dig into that.
2019-07-15
dynamics and inference
Eilers (MPIA), Rix (MPIA), and I have spent two weeks now discussing how to model the kinematics in the Milky Way disk, if we want to build a forward model instead of just measuring velocity moments (Jeans style). And we have the additional constraint that we don't know the selection function of the APOGEE–Gaia–WISE cross-match that we are using, so we need to be building a conditional likelihood, velocity conditioned on position (yes, this is permitted; indeed all likelihoods are conditioned on a lot of different things, usually implicitly!).
At Eilers's insistence, we down-selected to one choice of approach today. Then we converted the (zeroth-order, symmetric) equations in this paper on the disk into a conditional probability for velocity given position. When we use the epicyclic approximations (in that paper) the resulting model is Gaussian in velocity space. That's nice; we completed a square, Eilers coded it up, and it just worked. We have inferences about the dynamics of the (azimuthally averaged) disk, in the space of one work day!
2018-10-21
ready to submit!
I worked on the weekend to finish my paper with Eilers (MPIA) and Rix (MPIA). It is ready to submit! And yet I can't push my changes properly to GitHub because they are (in a very rare moment) down! I made some compromises in finishing up this paper; I can only justify them by promising myself I will address the final issues while the referee considers the manuscript.
2018-10-09
finishing papers; galaxy morphology regressions
The morning started with a conversation between Eilers (MPIA) and me in which we decided that we will finish our connected papers (first draft anyway) by Friday. I think she will make it! But will I make it? I am going to be strong. We also went through some ideas about testing the assumptions that underlie our Jeans model for the Milky Way disk, and what to write about the outcomes of those tests.
Mid-day I had good conversations with Storey-Fisher (NYU) about building pseudo-simulations that make point sets with low-amplitude non-trivial power spectra. We spent an unfortunate amount of time figuring out how the numpy fft module organizes and stores Fourier transform data. It isn't trivial!
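For the record, here is a small one-dimensional sketch of the ordering that numpy.fft uses. The array length and the test signal are arbitrary choices for illustration.

```python
import numpy as np

n = 8      # number of samples (arbitrary)
d = 1.0    # sample spacing

# Output ordering: zero frequency first, then the positive frequencies,
# then the negative frequencies (most negative first).
freqs = np.fft.fftfreq(n, d=d)

# A cosine at frequency 0.25 puts power at +0.25 and -0.25.
x = np.arange(n) * d
signal = np.cos(2.0 * np.pi * 0.25 * x)
power = np.abs(np.fft.fft(signal)) ** 2
peaks = np.flatnonzero(power > 1e-6)

# fftshift reorders so that frequency increases monotonically.
shifted_freqs = np.fft.fftshift(freqs)

# For real input, rfft keeps only the non-negative half; note that it
# labels the Nyquist frequency as positive, where fftfreq calls it negative.
rfreqs = np.fft.rfftfreq(n, d=d)
```

The same conventions apply along each axis of the multi-dimensional transforms, which is where the bookkeeping for a power spectrum gets fiddly.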
In the afternoon, Elisa Chisari (Oxford) gave a nice (and pleasantly technical) talk about weak lensing, which evolved into a longer discussion about how we might get more information out of galaxy imaging surveys. I pitched my ideas of thinking about how we might train regression models that can predict dark-matter structure from galaxy morphologies or even better large-scale-structure morphologies. And Chisari has (indirect) evidence that such approaches might be very powerful, because (with simulations) she showed (in the context of intrinsic-alignment contamination of weak-lensing data) that even simple measures of galaxy morphology are expected to be very sensitive to the local gravitational tidal field.
One thing that came up in this discussion is my suspicion that ellipticity is a very blunt tool. I have counter-examples that show that ellipticity is not necessarily the galaxy property most sensitive to the weak-lensing field (in an information-theoretic sense). But we formulated a challenge: Make an adversarial morphology distribution for galaxies such that none of the weak-lensing information in the data is in the galaxy ellipticities. That would be hilarious (or instructive, or both).
2018-09-23
finishing a paper; latents
I dusted off the draft of my paper with Eilers (MPIA) and Rix (MPIA) about spectrophotometric measurements of red-giant distances or parallaxes using Gaia, SDSS APOGEE, 2MASS, and WISE. It is nearly done! But we put it on ice while Eilers finished other things. I worked through more than half of the text, making notes on what small things remain to do.
The biggest to-do item? We have a linear model (for the log distance or log parallax or absolute magnitude). That's sweet, because it is simple, and it is interpretable, at least partially. Now we have to make that true by interpreting. Interpreting a linear model is harder than fitting a linear model!
I also had conversations with Storey-Fisher (NYU) about models for the correlation function and Price-Whelan (Princeton) about Milky Way non-equilibrium dynamical models. On the former, we discussed the difference between the correlation function and any particular estimate of the correlation function. It's a bit complex, because I'm not sure there is even agreement in the community about what would be considered the true latent correlation function in the low-ish redshift Universe.
2018-09-04
luminous red giants in APOGEE
Eilers (MPIA) and I went on the APOGEE science telecon to describe our results. I talked about how we calibrated a (purely linear) spectro-photometric distance estimate for luminous red giants that manages to correct for dust and luminosity, and Eilers talked about how we used those tracers to measure the circular velocity of the Milky Way disk (that is, the potential). We use the Jeans equation in cylindrical symmetry. We got great feedback from the APOGEE team, which we will use to improve our discussion in our papers.
2018-08-24
bad development cycle is bad
My day started with a conversation with Christina Eilers (MPIA) about the Milky Way rotation curve. We found some strange kinematic points that might be messing with us, and realized that they are almost all stars at or past the Bulge, and therefore not affecting our results, which are only for Galactocentric radii greater than 5 kpc, to avoid the craziness of the bar (which violates our dynamical assumptions). Her figures are ready, so I encouraged her to write figure captions and assemble the paper.
I spent my research time getting MCMC running on my Chemical Tangents project. I have a marginalized likelihood, so all I had to do is put on priors and insert into emcee. Oh how I would have benefitted from a testing environment! When I packaged it all up for emcee I messed up the units of almost all the inputs, so I got garbage in every MCMC run. And the runs took a long time, so diagnosis was painful. Unit testing. And for units! Live and fail to learn, that's what I say.
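What would a unit test for units even look like? Here is a minimal sketch, using a generic conversion (proper motion and distance to tangential velocity) rather than anything from the Chemical Tangents code; the function and test names are hypothetical.

```python
import math

# 1 mas/yr at 1 kpc corresponds to about 4.74047 km/s.
KMS_PER_MAS_YR_KPC = 4.74047

def tangential_velocity_kms(proper_motion_mas_yr, distance_kpc):
    """Tangential velocity in km/s from proper motion (mas/yr) and distance (kpc)."""
    return KMS_PER_MAS_YR_KPC * proper_motion_mas_yr * distance_kpc

def test_tangential_velocity_units():
    # Pin the function to a value known independently of the code.
    assert math.isclose(tangential_velocity_kms(1.0, 1.0), 4.74047, rel_tol=1e-6)
    # Sanity range: 6 mas/yr at 8 kpc should be a few hundred km/s.
    # Passing the distance in pc by mistake would miss this by a factor of 1000.
    v = tangential_velocity_kms(6.0, 8.0)
    assert 100.0 < v < 400.0

test_tangential_velocity_units()
```

A test like this runs in microseconds, which beats discovering the same mistake after an overnight MCMC run.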
Once everything appeared to be working, I set up some (nasty) multi-processing, set my laptop to stay awake all night, and blew processes. I should have converged samplings by morning.
2018-08-09
The Cannon again, chemical tori
Within one frantic half-hour, Eilers (MPIA) and I completely implemented a new version of The Cannon and ran it on her sample of luminous red giants. We did this so that we can compare the internals of her linear model for parallax estimation to the internal derivatives or label dependencies for The Cannon. This will let us take a step towards interpreting the internals of the spectrophotometric-parallax model. We scanned the comparison but it doesn't look quite as easy to interpret as I had hoped.
As soon as this was done, I said some words in MPIA Milky Way group meeting about my ideas for Chemical Tangents: That is, the idea that orbits must lie in the level surfaces (hyper-surfaces in 6-d phase space) of the chemical abundance distribution. The method puts an enormous number of constraints on the orbit space, so it has the potential to be extremely constraining. Rix (MPIA) is suspicious that it all sounds too good to be true: The method requires no knowledge of the selection function (to zeroth order) and no second-order statistics. It is entirely first-order in the data. Damn I hope I'm not wrong here.
In the morning, Rene Andrae (MPIA) showed me his enormous cross-match of spectroscopic surveys that he is putting together in part to understand the stellar parameter pipelines of Gaia (to which he is a contributor). He has the input data for a combinatoric diversity of projects we could do with The Cannon or stellar-parameter self-calibration.
2018-08-08
projects examined
Rix (MPIA) started the day concerned with substantial issues with the linear parallax model that Eilers (MPIA) and I have built; we spent much of the day following them up. Our precision gets worse with distance—an effect we have noticed all summer but haven't been able to explain—and now we have to explain it! We compared stars in clusters and looked at parallax offsets as a function of various things; we don't yet have an explanation. But we did do some straightforward error propagation and guess what: Our precision really can't be much better than the 9-ish percent that we are seeing. The whole exercise left me more confident in the quality of the model in the end: The model really seems to have learned how to cope with dust, age, and intrinsic luminosity effects, even though we didn't tell it how.
In a call with Bonaca (Harvard) we looked at oddities in her model of the morphology of the GD-1 stream gaps. We had some provable scalings that should be there but the code wasn't reproducing them. We worked out today that the stream perturbation isn't quite in the regime we thought it was. In more detail: An encounter of a massive perturber with a stream is impulsive if GM/(b v^2) is much less than 1, where G is Newton's constant, M is the perturber's mass, b is the impact parameter, and v is the relative velocity of the encounter (or maybe some component thereof). That is, you have to have this dimensionless number much less than unity if you want the impulse approximation to hold. Duh! But now we understand the simulations she is making.
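The dimensionless number is easy to check numerically. This is a sketch only: the example encounter values and the 0.1 threshold for "much less than 1" are arbitrary assumptions of mine, not numbers from Bonaca's simulations.

```python
# Newton's constant in units of kpc (km/s)^2 / Msun.
G_KPC_KMS2_PER_MSUN = 4.30091e-6

def impulse_parameter(mass_msun, impact_kpc, speed_kms):
    """Dimensionless ratio G M / (b v^2) for a perturber-stream encounter."""
    return G_KPC_KMS2_PER_MSUN * mass_msun / (impact_kpc * speed_kms ** 2)

def is_impulsive(mass_msun, impact_kpc, speed_kms, threshold=0.1):
    """True if G M / (b v^2) is below an (arbitrary) threshold for 'much less than 1'."""
    return impulse_parameter(mass_msun, impact_kpc, speed_kms) < threshold

# Hypothetical encounters: a distant fast flyby, and a close slow one.
distant_fast = impulse_parameter(1e6, 1.0, 200.0)
close_slow = impulse_parameter(1e7, 0.015, 100.0)
```

With these made-up numbers the first encounter is comfortably impulsive and the second is not, which is the kind of regime change that can quietly break a scaling argument.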
The day ended with Birky (UCSD) and me calling Andrew Mann (UNC) and Adam Burgasser (UCSD) to discuss Birky's results modeling M-type dwarf spectra in APOGEE. She has beautiful results, and can show both that her spectral models are accurate (in the space of the spectral data) and that her inferences about latents (temperature and metallicity) are reasonable when compared with proxies and tests of various kinds. So it is time to finish writing it up! We made plans for that. One amusing thing about her project is that it creates a beautiful translation between temperature, metallicity, and spectral type. And it isn't trivial!