2017-09-30
LIGO noise
I spent some weekend science time reading this paper on LIGO noise, which claims that the time delays in the LIGO detections (between the Louisiana and Washington sites) are seen in the noise too; that is, that the time-delay or coincidence aspects of the LIGO detections are suspect. I don't understand the paper completely, but their Figure 3 shows very strong phase–frequency relationships in data that are supposed to be noise-dominated. That's strange; if there are strong phase–frequency relationships, then there are almost always visible structures in real space. (To see this, imagine what happens as you modify the zero of time: The phases wind up!) Indeed, it is the phases that encode real-space structure. I don't have an opinion on the bigger question yet, but I would like to have seen the real-space structures creating the phase–frequency correlations they show.
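To make the phase-winding point concrete, here is a tiny numpy demonstration (all the numbers are arbitrary): give a flat amplitude spectrum a phase that is linear in frequency and inverse-transform it; you get a sharp real-space structure at the time offset set by the slope.

```python
import numpy as np

n = 1024
t0 = 200                             # time offset in samples (arbitrary)
freqs = np.fft.rfftfreq(n)           # frequencies in cycles per sample

# Flat amplitudes, with phases winding linearly in frequency:
spectrum = np.exp(-2j * np.pi * freqs * t0)
signal = np.fft.irfft(spectrum, n)

print(np.argmax(signal))             # -> 200: a spike at t = t0
# A strong phase-frequency relation is a localized real-space structure.
```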
2017-09-29
the life stories of counter-rotating galaxies
Today was the third experiment with Friday-morning parallel working in my office. It is like a hack week spread out over months! The idea is to work in the same place and build community. During the session, I worked through a multi-linear model for stellar spectra and tellurics with Bedell, based on conversations with Foreman-Mackey earlier in the week. I also worked through a method for generating realistic and self-consistent p(z) functions for fake-data experiments with Malz. This is a non-trivial problem: It is hard to generate realistic fake data, and it is even harder to generate the realistic posterior PDFs that might come out of a probabilistic set of analyses of those data.
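For the p(z) problem, the cleanest guarantee of self-consistency is to generate the posteriors from the same prior and noise model that generate the fake data. Here is a minimal sketch, in which the prior, the Gaussian noise model, and the grid are all hypothetical choices:

```python
import numpy as np

rng = np.random.default_rng(42)
zgrid = np.linspace(0.0, 3.0, 512)
prior = np.exp(-0.5 * ((zgrid - 1.0) / 0.5) ** 2)    # hypothetical z prior
prior /= np.trapz(prior, zgrid)

def make_fake_pz(sigma=0.1):
    """Draw a true z from the prior, observe it noisily, and return the
    posterior p(z) implied by that same prior and noise model."""
    ztrue = rng.choice(zgrid, p=prior / prior.sum())
    zobs = ztrue + sigma * rng.normal()              # noisy point estimate
    like = np.exp(-0.5 * ((zobs - zgrid) / sigma) ** 2)
    post = like * prior
    post /= np.trapz(post, zgrid)
    return ztrue, zobs, post
```

The hard part, of course, is making the prior and the noise model realistic; that is where all the difficulty lives.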
Just before lunch, Tjitske Starkenburg (Flatiron) gave the NYU Astro Seminar. She mainly talked about counter-rotating galaxies. She took the unusual approach of following up, in the simulations she has done, some typical examples (where the stars rotate opposite to the gas) and figuring out their individual histories (of accretion and merging and movement in the large-scale structure). Late in the day, she and I returned to these subjects, to figure out whether there might be ways to read a galaxy's individual cosmological-context history off of its present-day observable properties. That's a holy grail of galaxy evolution.
2017-09-28
what's the point of direct-detection experiments?
In the morning I spoke with Ana Bonaca (Harvard) and Chris Ick (NYU) about their projects. Bonaca is looking at multipole expansions of the Milky Way potential from an information-theory (what can we know?) point of view. We are working out how to visualize and test her output. Ick is performing Bayesian inference on a quasi-periodic model for Solar flares. He needs to figure out how to take his output and make a reliable claim about a flare being quasi-periodic (or not).
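For concreteness, one standard way to encode “quasi-periodic” is a covariance kernel that multiplies a periodic term by a decaying envelope; here is a minimal numpy sketch (the kernel form and parameter values are my assumptions, not necessarily Ick's model):

```python
import numpy as np

def quasi_periodic_kernel(t1, t2, amp=1.0, period=5.0, gamma=2.0, tau=20.0):
    """Periodic term times a squared-exponential envelope: correlations
    repeat with the given period but decay over the timescale tau, so
    the oscillation can drift -- that is, quasi-periodicity."""
    dt = np.abs(t1[:, None] - t2[None, :])
    periodic = np.exp(-gamma * np.sin(np.pi * dt / period) ** 2)
    envelope = np.exp(-0.5 * (dt / tau) ** 2)
    return amp ** 2 * periodic * envelope
```

Comparing this model against its tau-goes-to-infinity (strictly periodic) limit is one way to phrase the “quasi-periodic or not” claim.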
Rouven Essig (Stony Brook) gave a nice Physics Colloquium about direct detection of dark matter. He is developing strong limits on dark matter that might interact with leptons. The nice thing is that such a detection would be just as important for the light sector (new physics) as for the dark sector. He gave a good overview of the direct-detection methods. After the talk, we discussed the challenge of deciding what to do as non-detections roll in. This is not unlike the issues facing accelerator physics and cosmology: If the model is just what we currently think, then all we are doing is adding precision. The nice thing about cosmology experiments is that even if we don't find new cosmological physics, we usually discover and measure all sorts of other things. Not so true with direct-detection experiments.
2017-09-27
Gaia, EPRV, photons
In our Gaia DR2 prep workshop, Stephen Feeney (Flatiron) led a discussion on the Lutz–Kelker correction to parallaxes, and when we should and shouldn't use it. He began by re-phrasing the original LK paper in terms of modern language about likelihoods and posteriors. Once you put it in modern language, it becomes clear that you should (almost) never use these kinds of corrections. It is especially wrong to use them in the context of Cepheid (or other distance-ladder) cosmology; this is an error in the literature that Feeney has uncovered.
That discussion devolved into one about the Gaia likelihood function. Nowhere in the Gaia papers does it clearly say how to reconstruct a likelihood function for the stellar parallaxes from the catalog, though there is a suggestion in the nice papers by Astraatmadja, such as this one. Astraatmadja is a Gaia insider, so his suggestion is probably correct, but there isn't an equivalent statement in the official data-release papers (to my knowledge). There is a big set of assumptions underlying this likelihood function (which is the one we use); we unpacked them a bit in the meeting. My position is that this is so important that it might be worth writing a short note for the arXiv.
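For the record, the assumption in question, as I understand it (this is my paraphrase, not an official statement from the data-release papers), is that the catalog values define a Gaussian likelihood for the true parallax:

```python
import numpy as np

def ln_likelihood_parallax(parallax_true, parallax_cat, parallax_error_cat):
    """Gaussian likelihood for the true parallax, centered on the catalog
    value with the catalog uncertainty. Inference then multiplies in a
    distance (or parallax) prior; nothing in the catalog gets corrected."""
    chi2 = ((parallax_true - parallax_cat) / parallax_error_cat) ** 2
    return -0.5 * chi2 - 0.5 * np.log(2.0 * np.pi * parallax_error_cat ** 2)
```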
In Stars group meeting, Megan Bedell (Flatiron) showed the current status of her work on measuring extremely precise radial velocities using data-driven models for the star and the tellurics. It is promising that her methods seem to be doing better than standard pipelines; maybe she can beat the world's best current precision?
Chuck Steidel (Caltech) gave a talk in the afternoon about things he can learn about ionizing photons from galaxies at high redshift by stacking spectra. He had a number of interesting conclusions. One is that high-mass-star binaries are important! Another is that the escape fraction for ionizing photons goes up with the strength of nebular lines, and down with total UV luminosity. He had some physical intuitions for these results.
2017-09-26
machine learning
The day started with a somewhat stressful call with Hans-Walter Rix (MPIA), about applied-math issues: How to make sure that numerical (as opposed to analytic) derivatives are calculated correctly, how to make sure that linear-algebra operations are performed correctly when matrices are badly conditioned, and so on. The context is: Machine-learning methods have all sorts of hard numerical issues under the hood. If you can't follow those things up correctly, you can't do correct operations with machine-learning models. It's stressful, because wrongness here is wrongness everywhere.
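The most basic of the checks we discussed, sketched in numpy (the function here is just a stand-in): compare the analytic derivative against central differences over a sweep of step sizes, and look for the expected pattern of agreement.

```python
import numpy as np

def f(x):
    return np.sin(x ** 2)                  # stand-in function

def dfdx_analytic(x):
    return 2.0 * x * np.cos(x ** 2)

x0 = 1.3
for h in 10.0 ** np.arange(-1.0, -13.0, -1.0):
    dfdx_num = (f(x0 + h) - f(x0 - h)) / (2.0 * h)   # central difference
    print(f"h={h:.0e}  error={abs(dfdx_num - dfdx_analytic(x0)):.3e}")
# The error should fall like h**2, bottom out, and then grow again as
# round-off dominates; if it never gets small, something is wrong.
```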
Later in the morning, Kilian Walsh (NYU) brought to me some ideas about making the connections between dark-matter simulations and observed galaxies more flexible on the theoretical / interpretation side. We discussed a possible framework for immensely complexifying the connections between dark-matter halos and galaxy properties, way beyond the currently ascendant HOD models. What we wrote down is interesting, but it might not be tractable.
2017-09-25
thermal relics
In a low-research day, I discussed probabilistic model results with Axel Widmark (Stockholm), a paper title and abstract with Megan Bedell (Flatiron), and Gaia DR2 Milky Way mapping with Lauren Anderson (Flatiron).
The research highlight of the day was an excellent brown-bag talk by Josh Ruderman (NYU) about thermal-relic models for dark matter. It turns out there is a whole zoo of models beyond the classic WIMP. In particular, the number-changing interactions don't need to involve the visible sector. The models can be protected by dark-sector layers and have very indirect (or no) connections to our sector. We discussed the differences between models that are somehow likely or natural and models that are somehow observable or experimentally interesting. These two sets don't necessarily overlap that much!
2017-09-21
GPLVM Cannon
Today Markus Bonse (Darmstadt) showed me (and our group: Eilers, Rix, Schölkopf) his Gaussian-Process latent-variable model for APOGEE spectra. It looks incredible! With only a few latent variable dimensions, it does a great job of explaining the spectra, and its performance (even under validation) improves as the latent dimensionality increases. This is something we have wanted to do to The Cannon for ages: Switch to GP functions and away from polynomials.
The biggest issue with the vanilla GPy GPLVM implementation being used by Bonse is that it treats the data as homoskedastic: all data points are considered equal. In fact we have lots of knowledge about the noise levels in different pixels, and we have substantial (and known) missing and bad data. So we encouraged him to figure out how to implement heteroskedasticity. We also discussed how to make a subspace of the latent space interpretable by conditioning on known labels for some sources.
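For reference, the vanilla (homoskedastic) usage looks something like the following; the file name and latent dimensionality are placeholders, and note that the noise is a single scalar variance shared by all pixels, which is exactly the problem:

```python
import numpy as np
import GPy

# Y: (n_stars, n_pixels) matrix of continuum-normalized spectra (placeholder file)
Y = np.load("apogee_spectra.npy")
Q = 5                                       # latent dimensionality (placeholder)
kernel = GPy.kern.RBF(Q, ARD=True)          # GP functions instead of polynomials
model = GPy.models.GPLVM(Y, input_dim=Q, kernel=kernel)
model.optimize(messages=True)
latents = model.X                           # learned per-star latent positions
```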
2017-09-20
SDSS+Gaia
At our new weekly Gaia DR2 prep meeting, Vasily Belokurov (Cambridge) showed us a catalog made by Sergei Koposov (CMU) which joins SDSS imaging and Gaia positions to make a quarter-sky, deep proper-motion catalog. His point: Many projects we want to do with Gaia DR2 we can do right now with this new matched catalog!
At the Stars group meeting, Ruth Angus led a discussion of possible TESS proposals. These are due soon!
2017-09-19
unresolved binaries
Today Axel Widmark (Stockholm) showed up in NYC for two weeks of collaboration. We talked through various projects and tentatively decided to look at the unresolved binary stars in the Gaia data. That is, to do some kind of inference about whether stars are single or double, and, if double, what their properties might be. This is for stars that appear single to Gaia (but, if truly double, are brighter than they should be). I suggested we start by asking “what stars in the data can be composed of two other stars in the data?” with appropriate marginalization.
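A brute-force version of that starting question, sketched in numpy (the inputs and the n-sigma tolerance are hypothetical): for each star, ask whether some pair of other stars, with fluxes summed band by band, lands within the errors of the target.

```python
import numpy as np
from itertools import combinations

def composable(fluxes, errs, i, nsigma=2.0):
    """Return index pairs (j, k) whose summed fluxes match star i in
    every band; fluxes and errs are (n_stars, n_bands) arrays."""
    pairs = []
    for j, k in combinations(range(len(fluxes)), 2):
        if i in (j, k):
            continue
        resid = fluxes[i] - (fluxes[j] + fluxes[k])
        sigma = np.sqrt(errs[i] ** 2 + errs[j] ** 2 + errs[k] ** 2)
        if np.all(np.abs(resid) < nsigma * sigma):
            pairs.append((j, k))
    return pairs
```

The real version would replace the hard n-sigma cut with a marginalized likelihood, as the entry says, but the combinatorics are the same.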
2017-09-18
latent-variable models for stars
The day started with various of us (Rix, Eilers, Schölkopf, Bonse) reviewing Bonse's early results on applying a GPLVM to stellar spectra. This looks promising! We encouraged Bonse to visualize the models in the space of the data.
The data-driven latent-variable models continued in the afternoon, with Megan Bedell and me discussing telluric spectral models. We were able to debug a sign error and then make a PCA-like model for telluric variations! The results are promising, but there are continuum-level issues everywhere, and I would like a more principled approach to that. Indeed, I could probably write a whole book about continuum normalization at this point (and still not have a good answer).
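The “PCA-like model” is, at heart, a low-rank factorization of the residuals; a minimal sketch with numpy's SVD, where the rank and the input array are placeholder assumptions:

```python
import numpy as np

def telluric_model(resids, rank=3):
    """Best rank-k approximation to the (n_epochs, n_pixels) array of
    log-flux residuals left after dividing out the stellar model."""
    U, S, VT = np.linalg.svd(resids, full_matrices=False)
    return (U[:, :rank] * S[:rank]) @ VT[:rank]
```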
2017-09-17
regression
Our data-driven model for stars, The Cannon, is a regression. That is, it figures out how the labels generate the spectral pixels, with a model for the possible functional forms of that generation. I spent part of today building a Jupyter notebook to demonstrate that, when the assumptions underlying the regression are correct, the results of the regression are accurate (and precise). That is, the maximum-likelihood regression estimator is a good one. That isn't surprising, since there are very general proofs, but it answers some questions that my collaborators have about cases where the labels (the regressors) are correlated in the training set.
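The notebook's point, compressed into a few lines of numpy (the dimensions, covariance, and noise level are arbitrary): even with strongly correlated regressors, the maximum-likelihood (least-squares) estimator recovers the true coefficients, provided the data really are generated by the assumed model.

```python
import numpy as np

rng = np.random.default_rng(17)
n, p = 10000, 3
truth = np.array([1.0, -2.0, 0.5])

# Strongly correlated labels (regressors):
cov = np.array([[1.0, 0.9, 0.8],
                [0.9, 1.0, 0.9],
                [0.8, 0.9, 1.0]])
X = rng.multivariate_normal(np.zeros(p), cov, size=n)
y = X @ truth + 0.1 * rng.normal(size=n)    # data generated per the model

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)    # close to [1.0, -2.0, 0.5], despite the correlations
```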
2017-09-15
new parallel-play workshop
Today was the first try at a new group-meeting idea for my group. I invited my NYC close collaborators to my (new) NYU office (which is also right across the hall from Huppenkothen and Leistedt) to work on whatever they are working on. The idea is that we will work in parallel (and independently), but we are all there to answer questions, discuss, debug, and pair-code. It was intimate today, but successful. Megan Bedell (Flatiron) and I debugged a part of her code that infers the telluric absorption spectrum (in a data-driven way, of course). And Elisabeth Andersson (NYU) got kplr and batman installed inside the sandbox that runs her Jupyter notebooks.
2017-09-14
latent variable models, weak lensing
The day started with a call with Bernhard Schölkopf (MPI-IS), Hans-Walter Rix (MPIA), and Markus Bonse (Darmstadt) to discuss taking Christina Eilers's (MPIA) problem of modeling spectra with partial labels over to a latent-variable model, probably starting with the GPLVM. We discussed data formats and how we might start. There is a lot of work in astronomy using GANs and deep learning to make data generators. These are great, but we are betting that it will be easier to put the causal structure we care about into a latent-variable model.
At the Cosmology & Data Group Meeting at Flatiron, the whole group discussed the big batch of weak-lensing results released by the Dark Energy Survey last month. A lot of the discussion was about understanding the covariances of the likelihood information coming from the weak lensing. This is a bit hard to understand, because everyone uses highly informative priors (for good reasons, of course) taken from previous data. We also discussed the multiplicative bias and other biases in shape measurement; how might we constrain these independently of the cosmological parameters themselves? Data simulations, of course, but most of us would like to see a measurement to constrain them.
At the end of Cosmology Meeting, Ben Wandelt (Flatiron) and I spent time discussing projects of mutual interest. In particular we discussed dimensionality reduction related to galaxy morphologies and spatially resolved spectroscopy, in part inspired by the weak-lensing discussion, and also the future of Euclid.
2017-09-13
Gaia, asteroseismology, robots
In our panic about the upcoming Gaia DR2, Adrian Price-Whelan and I have established a weekly workshop on Wednesdays, in which we discuss, hack, and parallel-work on Gaia projects in the library at the Flatiron CCA. In our first meeting we just said what we wanted to do, jointly edited a big shared Google doc, and then started working. At each workshop meeting, we will spend some time talking and some time working. My plan is to do data-driven photometric parallaxes, and maybe infer some dust.
At the Stars Group Meeting, Stephen Feeney (Flatiron) talked about asteroseismology, where we are trying to get the seismic parameters without ever taking a Fourier transform. Some of the crowd (Cantiello in particular) suggested that we have started on stars that are too hard; we should choose super-easy, super-bright, super-standard stars to start. Others in the crowd (Hawkins in particular) pointed out that we could be using asteroseismic H-R diagram priors in our inference. Why not be physically motivated? Duh.
At the end of Group Meeting, Kevin Schawinski (ETH) said a few words about auto-encoders. We discussed imposing more causal structure on them, and seeing what happens. He is going down this path. We also veered off into networks-of-autonomous-robots territory for LSST follow-up, keying off remarks from Or Graur (CfA) about time-domain and spectroscopic surveys. Building robots that know about scientific costs and utility is an incredibly promising direction, but hard.
2017-09-12
statistics of power spectra
Daniela Huppenkothen (NYU) came to talk about power spectra and cross-spectra today. The idea of the cross-spectrum is that you multiply one signal's Fourier transform by the complex conjugate of the other's. If the signals are identical, this is the power spectrum. If they differ by phase lags, the answer has an imaginary part, and so on. We then launched into a long conversation about the distribution of cross-spectrum components given distributions for the original signals. In the simplest case, this is about distributions of sums of products of Gaussian-distributed variables, where analytic results are rare. And that's the simplest case!
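A minimal numpy sketch of the definitions (the signals are arbitrary): two identical signals give a real, non-negative cross-spectrum, while a time lag pushes power into the imaginary part through a phase.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4096
t = np.arange(n)
f = 40.0 / n                       # put the sinusoid exactly on a Fourier bin
lag = 25.6                         # samples; 2*pi*f*lag = pi/2
x = np.sin(2 * np.pi * f * t) + 0.1 * rng.normal(size=n)
y = np.sin(2 * np.pi * f * (t - lag)) + 0.1 * rng.normal(size=n)

X, Y = np.fft.rfft(x), np.fft.rfft(y)
cross = X * np.conj(Y)             # X * conj(X) would be the power spectrum
k = np.argmax(np.abs(cross))
print(k, np.angle(cross[k]))       # bin 40, phase close to +pi/2
```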
One paradox or oddity that we discussed is the following: In a long time series, imagine that every time point gets a value (flux value, say) that is drawn from a very skew or very non-Gaussian distribution. Now take the Fourier transform. By central-limit reasoning, all the Fourier amplitudes must be very close to Gaussian-distributed! Where did the non-Gaussianity go? After all, the FT is simply a rotation in data space. I think it probably all went into the correlations of the Fourier amplitudes, but how to see that? These are old ideas that are well understood in signal processing, I am sure, but not by me!
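It is easy to watch this happen numerically (the choice of skewed distribution is arbitrary): draw very skewed time-domain values, transform, and the marginal distribution of any single Fourier coefficient looks beautifully Gaussian; the non-Gaussianity has to be hiding in the joint structure across coefficients.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, trials = 1024, 2000
x = rng.exponential(size=(trials, n)) - 1.0   # zero-mean, skewness = 2
F = np.fft.rfft(x, axis=1)

print(stats.skew(x.ravel()))       # ~ 2: very non-Gaussian input
print(stats.skew(F[:, 100].real))  # ~ 0: one Fourier coefficient, nearly Gaussian
```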