2018-02-13

writing on dimensionality

Because of work Bedell did (on a Sunday!) in support of the Milky Way Mapper meeting, I got renewed excitement about our element-abundance-space dimensionality and diversity work: She was able to show that we can see aspects of the low dimensionality of the space in the spectra themselves, mirroring work done by Price-Jones (Toronto) in APOGEE, but with more specificity about the abundance origins of the dimensionality. That got me writing text in a document. As my loyal reader knows, I am a strong believer in writing text during (not after) the data-analysis phases. I'm also interested in looking at information-theoretic, predictive, or measurement-based approaches to dimensionality.
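
To make the variance-based version concrete, here is a minimal sketch (my own illustration, not Bedell's analysis; the array name and threshold are hypothetical): count the principal components needed to explain some fraction of the variance in the abundance matrix.

```python
import numpy as np

# Minimal sketch (my illustration, not Bedell's analysis): count the number of
# principal components needed to explain a given fraction of the variance in
# the element-abundance matrix.
# abundances: (n_stars, n_elements) array of per-star element abundances
def effective_dimensionality(abundances, threshold=0.99):
    X = abundances - abundances.mean(axis=0)
    _, S, _ = np.linalg.svd(X, full_matrices=False)
    frac = np.cumsum(S**2) / np.sum(S**2)       # cumulative explained variance
    return int(np.searchsorted(frac, threshold)) + 1
```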

2018-02-12

FML, and the Big Bounce

The day started with a realization by Price-Whelan (Princeton) and me that, in our project The Joker, because of how we do our sampling, we have everything we need at the end of the sampling to compute precisely the fully marginalized likelihood (FML) of the input model. That's useful, because we are not just making posteriors, we are also making decisions (about, say, what to put in a table or what to follow up). Of course (and as my loyal reader knows), I don't think it is ever a good idea to compute the FML!
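
For concreteness, here is the estimator as I understand it (a sketch, not the actual Joker code): because The Joker draws its samples from the prior, the FML is approximated by the sample mean of the likelihood over those prior draws.

```python
import numpy as np

# Sketch of the estimator (my reconstruction, not the actual Joker code): the
# FML is the prior-weighted integral of the likelihood, so with samples drawn
# from the prior it is estimated by the sample mean of the likelihood.
# log_likes: log-likelihood evaluated at each of the N prior samples
def log_fml(log_likes):
    return np.logaddexp.reduce(log_likes) - np.log(len(log_likes))
```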

At lunch, Paul Steinhardt (Princeton) gave a great black-board talk about the idea that the Universe might have started in a bounce from a previously collapsing universe. His main point (from my perspective; he also has particle-physics objectives) is that the work that inflation does with a quantum mechanism might be possible to achieve with a classical mechanism, if you could design the bounce right. I like that, of course, because I am skeptical that the original fluctuations are fundamentally quantum in nature. I have many things to say here, but I'll just say a few random thoughts: One is that the strongest argument for inflation is the causality argument, and the requisite causal contact can be achieved with other space-time histories, like a bounce. That is, the causality problem (and related problems) is fundamentally about the geometry of the space and the horizon as a function of time, and there are multiple possible universe-histories that would address the problem. So that's a good idea. Another random thought is that there is no way to make the bounce happen (people think) without violating the null-energy condition. That's bad, but so are various things about inflation! A third thought is that the pre-universe (the collapsing one) probably has to be filled with something very special, like a few scalar fields. That's odd, but so is the inflaton! And those fields could be classical. I walked into this talk full of skepticism, and ended up thinking it's a pretty good program to be pursuing.

2018-02-11

welcome to the Milky Way Mapper

Today was the (unfortunately Sunday) start of the first full meeting of the Milky Way Mapper team; MWM is a sub-part of the proposed project SDSS-V, of which I will be a part. It was very exciting! The challenge is to map a large fraction of the Milky Way in red-giant stars (particularly cool, luminous giants), but also to get a full census of binary stars in different states of evolution, and to follow up exoplanets, among other scientific goals. Rix was in town, and pointed out that the survey needs a description that can be stated in two sentences. Right now it is a mix of projects, and doesn't have a description shorter than two dense slides! But it's really exciting and will support an enormous range of science.

There were many highlights of the meeting for me, most of them about technical issues like selection function, adaptive survey design, and making sensitive statistical tests of exoplanet systems. There was also a lot of good talk about how to do non-trivial inferences about binary-star populations with very few radial-velocity measurements per star. That is where Price-Whelan and I shine! Another subject that I was excited about is how one can design a survey that is simultaneously simple to operate but also adaptive as it goes: Can we algorithmically modify what we observe and when, based on past results, to increase efficiency (on, say, binary stars or exoplanets), but nonetheless produce a survey that is possible to model and understand for population statistics? Another subject was validation of stellar parameter estimates: How do we know that we are getting good answers? As my loyal reader can anticipate, I was arguing that such tests ought to be made in the space of the data. Can they be?

2018-02-09

warps and other disk modes

Adrian Price-Whelan (Princeton) and Chervin Laporte (Victoria) convened a meeting at Flatiron today to discuss the outer disk. It turned into a very pleasurable free-for-all, in part because Kathryn Johnston (Columbia) came down and Sergey Koposov (CMU) was in town for it! We argued about the best tracers for fast or early Gaia DR2 results on the warp and other outer-disk structure, which looks non-trivial and interesting. One thing I proposed, which I would like to think about more, is taking the disk-warping simulations of Laporte and using them to inspire or generate a set of basis functions for disk modes, in which expected warps and wiggles are compactly described. Then we could fit the Gaia data with these modes and have a regularized but non-parametric model of the crazy.
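
Here is a minimal sketch of what I have in mind (all names and the regularization scheme are hypothetical): extract mode shapes from simulation snapshots, then fit the data with a ridge-regularized linear combination of those modes.

```python
import numpy as np

# Sketch of the proposal (names and regularization hypothetical): take
# vertical-displacement maps from disk-warping simulations, extract a compact
# mode basis from them, and fit observed displacements with a ridge-regularized
# linear combination of those modes.
# sims: (n_snapshots, n_pixels) simulated displacement maps on sky pixels
# y: (n_pixels,) observed mean vertical displacement per pixel
def fit_disk_modes(sims, y, n_modes=10, lam=1.0):
    _, _, Vt = np.linalg.svd(sims - sims.mean(axis=0), full_matrices=False)
    B = Vt[:n_modes].T             # (n_pixels, n_modes) simulation-inspired basis
    a = np.linalg.solve(B.T @ B + lam * np.eye(n_modes), B.T @ y)
    return B @ a, a                # model displacement map and mode amplitudes
```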

Late in the day, Ana Bonaca (Harvard) and I walked through our full paper and results on the information in streams with Johnston and Price-Whelan. They gave us lots of good feedback on how to present our results and what to emphasize.

2018-02-08

dust

The highlight of my low-research day was a great seminar by Eddie Schlafly (LBL) about Milky Way dust. He showed that he can build three-dimensional models (and maybe four-dimensional, because radial velocities are available) from PanSTARRS and APOGEE data (modeling stellar spectra and photometry), and he showed that he can even map the extinction curve in three dimensions! That reveals new structures. It is very exciting that in the near future we might be able to really build a dynamical model of the Milky Way with dust as a kinematic tracer. It is also interesting to think about the connection to CMB missions. He showed a ridiculous Planck polarization map that I hadn't seen before: It looks like a painting!

2018-02-07

Gaia helpdesk and optimized photometry and various

We got way too many applications for the #GaiaSprint. This is a great problem to have, although it is giving me an ulcer: Almost every applicant is obviously appropriate for the Sprint and should be there! So the SOC discussed ways we could expand the Sprint but maintain its culture of intimacy and fun.

At the Gaia DR2 prep workshop, we discussed our preparations for joining the Kepler data (and especially the whole KIC) with the data from Gaia DR2. We are hoping to have this done within minutes of the data release, making use of the high-end ESA data systems. This activity resulted in the submission of a trouble ticket to the Gaia helpdesk.
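
The positional core of such a match is simple; here is a hedged sketch using astropy (variable names are mine, and the real job, especially within the ESA systems, also involves proper motions, epochs, and magnitude sanity checks).

```python
import numpy as np
from astropy import units as u
from astropy.coordinates import SkyCoord

# Positional crossmatch sketch (variable names mine): for each KIC source,
# find the nearest Gaia source on the sky and keep matches within a radius.
# kic_ra, kic_dec, gaia_ra, gaia_dec: arrays of coordinates in degrees
def crossmatch(kic_ra, kic_dec, gaia_ra, gaia_dec, radius=1.0 * u.arcsec):
    kic = SkyCoord(ra=kic_ra * u.deg, dec=kic_dec * u.deg)
    gaia = SkyCoord(ra=gaia_ra * u.deg, dec=gaia_dec * u.deg)
    idx, sep2d, _ = kic.match_to_catalog_sky(gaia)  # nearest-neighbor indices
    good = sep2d < radius
    return np.flatnonzero(good), idx[good]          # KIC rows, matched Gaia rows
```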

At stars group meeting, way too much happened to report. But Ben Pope (NYU) showed that his work on using L1 regularization to optimize photometric apertures works extremely well in some cases, but is very brittle, for reasons we don't yet understand. Simon J Murphy (Sydney) started to talk about what he and Foreman-Mackey (Flatiron) have achieved in his week-long visit, but he got side-tracked (by me) onto how awesome delta-Scuti stars are, and why. And Ana Bonaca (Harvard) gave an overview of what we are doing with stellar streams.
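
On Pope's L1 point: here is a generic sketch of the idea (my stand-in objective, not his actual pipeline), in which the L1 penalty drives most of the pixel weights in the aperture to exactly zero.

```python
import numpy as np
from scipy.optimize import minimize

# Generic sketch (my stand-in objective, not Pope's pipeline): choose
# non-negative pixel weights that minimize the fractional scatter of the
# weighted light curve, with an L1 penalty that sparsifies the aperture.
# pix: (n_times, n_pixels) array of pixel-level light curves
def l1_aperture(pix, lam=0.1):
    def objective(w):
        flux = pix @ w
        return np.std(flux / np.median(flux)) + lam * np.sum(np.abs(w))
    n = pix.shape[1]
    res = minimize(objective, np.ones(n) / n, bounds=[(0.0, 1.0)] * n)
    return res.x    # most weights end up at (or very near) zero
```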

2018-02-06

spectral representation; purely geometric spectroscopic parallaxes

Today was a low-research day! Research was pretty much limited to a (great) call with Rix (MPIA) and Eilers (MPIA). We discussed several important successes of Eilers's work on latent-variable models. One is that she finds that she can improve the performance of The Cannon operating on stellar spectra if she reduces the dimensionality of the spectra before she starts! That's crazy; how can you throw away information and do better? I think the answer must have something to do with model wrongness: The model is wrong (as all models are), and it is probably less wrong in the projected space than it was in the original pixel basis. This all relates to data-representation issues that I have worried about (but done nothing about) before.
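
As a cartoon of the preprocessing step (mine, not Eilers's code; the downstream model stands in for The Cannon or the GPLVM):

```python
import numpy as np
from sklearn.decomposition import PCA

# Cartoon of the preprocessing (mine, not Eilers's code): compress the spectra
# onto a modest number of principal components, and train the downstream model
# in that reduced space rather than in the original pixel basis.
# spectra: (n_stars, n_pixels) continuum-normalized fluxes
def reduce_spectra(spectra, n_components=50):
    pca = PCA(n_components=n_components)
    return pca, pca.fit_transform(spectra)    # (n_stars, n_components)
```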

Another important success is that Eilers can run the Gaussian-process latent-variable model (GPLVM) on the dimensionality-reduced space much, much faster than on the original data space, and not only does it do better than it did before, it does better than The Cannon. That's great, but it isn't just performance we are looking for: The GPLVM has better model structure, such that we can infer labels without having training data that have nuisance-parameter labels. That is, we can make a predictive model for the interesting subspace of the label space. This is tremendously important going into Gaia DR2, because we want to train a spectroscopic-parallax method using only geometric inputs: No stellar models, ever!

2018-02-05

information in stellar streams

Ana Bonaca (Harvard) arrived in town for a week of hacking on our stream-information project. She spent today getting more streams into the analysis. The point of the project is not to model each stream in detail, but rather to examine, using the Fisher information, what each stream (or any combination of streams) contributes to the measurement of gravitational-potential parameters. We also worked on paper scope and on our original goal (from way back) of constraining the mass and orbit of the LMC.
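
The computational core is simple; here is a sketch (my reconstruction, with hypothetical names): finite-difference the predicted stream observables with respect to the potential parameters, and combine with the inverse data covariance.

```python
import numpy as np

# Sketch of the computation (my reconstruction, names hypothetical):
# finite-difference the predicted stream observables with respect to the
# potential parameters, then combine with the inverse data covariance.
# model(params) -> (n_data,) predicted observables; Cinv: (n_data, n_data)
def fisher_matrix(model, params, Cinv, eps=1e-5):
    p = np.asarray(params, dtype=float)
    rows = []
    for i in range(p.size):
        dp = np.zeros_like(p)
        dp[i] = eps
        rows.append((model(p + dp) - model(p - dp)) / (2.0 * eps))
    D = np.array(rows)          # (n_params, n_data) derivative matrix
    return D @ Cinv @ D.T       # Fisher information on the parameters
```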

2018-02-02

adversarial approaches to everything

Today's parallel-working session at NYU was a dream. Richard Galvez (NYU) is working with Rob Fergus (NYU) to train a generative adversarial network (GAN) on images of galaxies. One issue with GANs is that the generator can learn to make convincing fake data in only a subspace of the whole data space and still perform well adversarially. So Galvez is using a clustering (k-means) in the data space, and comparing the populations of the clusters in the true data and in the generated data, to check that coverage is good. This is innovative, and important if we are going to use these GANs for science.
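
Here is the test in sketch form (the cluster count and all names are my assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans

# Coverage test sketch (cluster count and names are my assumptions): cluster
# the real data, assign the generated data to the same clusters, and compare
# the per-cluster occupation fractions; empty or under-filled clusters flag a
# generator that covers only a subspace of the data.
def coverage_check(real, fake, k=20):
    km = KMeans(n_clusters=k).fit(real)
    real_frac = np.bincount(km.labels_, minlength=k) / len(real)
    fake_frac = np.bincount(km.predict(fake), minlength=k) / len(fake)
    return real_frac, fake_frac   # compare these per-cluster populations
```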

Kate Storey-Fisher (NYU) is making something like adversarial (there's that word again) mock catalogs for large-scale structure projects: She is going to make the selection function in each patch of the survey a nonlinear function of the housekeeping data (point-spread function, stellar density, transparency, season, and so on) we have for that patch. Then we can see what LSS statistics are robust to the crazy. These mocks will be adversarial in the sense that they will represent a universe that is out to trick us, while GANs are adversarial in the sense that they use an internal competitive game for training.
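
A hypothetical example of such a selection function (the functional forms and coefficients are invented, purely for illustration):

```python
import numpy as np

# Invented example (all functional forms and coefficients are mine, for
# illustration only): the probability that a galaxy makes it into the mock
# depends nonlinearly on its patch's seeing and stellar density.
def selection_prob(seeing_arcsec, log10_stellar_density):
    psf_term = 1.0 / (1.0 + np.exp(4.0 * (seeing_arcsec - 1.2)))
    star_term = np.exp(-0.5 * np.clip(log10_stellar_density, 0.0, None))
    return psf_term * star_term

# apply it per patch, for example:
# keep = np.random.rand(n_gal) < selection_prob(patch_seeing, patch_log10_nstar)
```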

And as I was explaining why I am disappointed with the choices that LSST has made for broad-band filters, Alex Malz (NYU) and I came up with an inexpensive and executable proposal that would satisfy me and improve LSST. It involves inexpensive and easy-to-make stochastically ramped filters. I don't think there is an iceball's chance in hell that the Collaboration would even for a moment consider this plan, but the proposal is a good one. I guess this is adversarial in a third sense!

2018-01-31

SPHEREx workshop, day 2

I got up at 0530 and looked at the participants and schedule for the SPHEREx workshop. I realized that I had prepared precisely the wrong talk yesterday! So I threw away my slides and made completely new slides. It was rushed. I forgot things. But it was still an improvement. I switched from saying things about scientific goals to saying things about technical improvements or extensions that could make the project more capable in respects that would serve the needs of (among other things) stellar science.

I then headed in to the workshop; I could only make it to the second day. I learned so much today. I can't do it justice. Here are some random facts: A lot could be learned about exoplanets if we could get bolometric fluxes for the stars. I knew this already, I guess, but the prospects for SPHEREx here are excellent, if the project can deliver absolutely calibrated flux densities. There is a mass–metallicity relationship inside the Solar System! The Solar System contains Trojan asteroids around Neptune, not just Jupiter! There is no model for the zodiacal light in the Solar System that matches the observations to the level of precision that an infrared survey would need to remove or avoid it. The zodiacal light is consistent with being made up of ground-up asteroids and evaporated comets! ALMA has observed many debris disks around nearby stars; some of these are angularly huge. The poster child is Fomalhaut, which has a thin, elliptical ring. It's a crazy thing. I learned these things from a combination of Dan Stevens (OSU), Jennifer Burt (MIT), Carey Lisse (JHU), and Meredith MacGregor (Harvard), but that's just a tiny sampling.

At the end of the day there was discussion of calibration, led by Doug Finkbeiner (CfA) and me. I very much enjoy the technical challenges for SPHEREx and the enthusiasm of the team taking them on.

2018-01-30

slides prep

It was a very low-research day! But on the train to Boston, I prepared slides for a short talk at a meeting at Harvard about the SPHEREx mission concept. I wrote about how this cosmology mission (line intensity mapping and large-scale structure) might revolutionize our knowledge of stars in the Milky Way.

2018-01-29

asteroseismic binaries; distances between transients

Simon J Murphy (Sydney) is in town for two weeks of hacking with Dan Foreman-Mackey (Flatiron). On arrival last week, the two of them implemented something I have been wanting to do for a long time: use asteroseismic phase shifts to find binary companions (yes, people have been doing this for a while now), but without binning the data or ever explicitly measuring time delays in bins or at discrete times. This week (having solved that) they are looking at radial-velocity predictions from those discoveries, and testing them with HIRES spectra. They teamed up with Megan Bedell (Flatiron) to use her wobble system to make these measurements. All I did was cheer-lead.
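
My guess at the spirit of the forward model (a sketch, not Murphy and Foreman-Mackey's actual code): write the binary-induced light-travel-time delay directly into the pulsation sinusoid, and fit that to the unbinned photometry.

```python
import numpy as np

# Schematic forward model (my guess at the spirit, not the actual code): a
# pulsation at frequency nu arrives with a binary-induced light-travel-time
# delay tau(t); fitting this directly to the photometry avoids binning or
# measuring per-bin time delays.
def model_flux(t, A, nu, phi, a_sini_over_c, P_orb, phi_orb):
    # delay for a circular orbit; a_sini_over_c is in the same units as t
    tau = a_sini_over_c * np.sin(2.0 * np.pi * t / P_orb + phi_orb)
    return A * np.sin(2.0 * np.pi * nu * (t - tau) + phi)
```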

In the afternoon, Alex Malz (NYU) and I discussed what we might do in an upcoming LSST transient-classification challenge. I am interested in the following question: Say you have two sparsely and irregularly sampled light curves of two transient events that are intrinsically similar but possibly at different redshifts, and you want to see that they are similar. How do you construct a relevant, useful, and tractable similarity or distance metric? I have lots of ideas; if we can solve this, we might have something to contribute.

2018-01-26

infall

In a low-research day, Megan Bedell and I went through the assumptions underlying our beliefs about the chemical enrichment of a star by infall of dust or rocky material. This is relevant because she is finishing up a paper on the subject, with her extremely precise measurements of Solar twins.

2018-01-25

EPRV, locked planets, data sharing

At lunch-time today, Megan Bedell (Flatiron) and Ray Pierrehumbert (Oxford) gave talks at Flatiron. During her talk, Bedell nicely laid out the large number of results we have on extreme-precision radial-velocity measurement; we need to start writing papers asap! She even gave a very simple and new description of what we found, a couple of years ago, about the fidelity of the HARPS wavelength calibration. So we need to write that up too.

Pierrehumbert showed fluid-dynamics results on the atmospheres of tidally locked planets (which are interesting, because they sustain huge temperature gradients around their surfaces). He has some cases where he can't find any steady-state solution for the atmosphere; the resulting time dependences might have observable consequences.

Late in the day, I gave a presentation to the AAAC, which oversees astrophysics and inter-agency cooperation in astronomy across NSF, NASA, and DOE. I was asked to speak about the future of data sharing, data re-use, and joint analyses. I drew inspiration from cosmology and went into two of my standard sets of talking points: The first is that we need to be thinking about likelihood functions, and how to share them: Data sets are combined by their (possibly partially marginalized) likelihood functions. The second is that when data get sophisticated or complex, there is no point in releasing them without also releasing the code that made sense of them in real scientific projects. That is, code and data releases can't really be seen as separate things, and we might not be able to have a data release without having a code release (with appropriate licensing for repurposing and re-use). My slides were incomplete, but I put them up here anyway.