gut bacteria and flaring stars

Recovering from the red-eye, this was a low-research day. Tarmo Äijö (SCDA) gave a nice talk in the morning about modeling biome populations in the human gut, using multinomial distributions with a time-dependent set of population fractions, which themselves come from a Gaussian Process. He uses Stan to do the sampling; it is impressive how complex a model Stan can handle, and sample well. They find changes in the biome during and after diseases, and also some species that seem to show periodic population changes, with long periods (many weeks). All of the data are from genome sequencing, by the way: It is impossible (apparently) to tell these bacteria apart on morphological grounds.
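The structure of that model is easy to sketch. Here is a minimal numpy toy (my own sketch, not Äijö's Stan code; the species count, length scale, and read depth are all invented numbers): latent log-abundance trajectories are Gaussian Process draws, a softmax maps them onto the simplex, and sequencing reads are multinomial draws.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy version of a GP-driven multinomial composition model: S species,
# T time points, N_reads sequencing reads per time point (all made up).
S, T, N_reads = 4, 50, 1000
t = np.linspace(0.0, 10.0, T)

# Squared-exponential GP covariance over time (length scale assumed).
ell = 2.0
K = np.exp(-0.5 * (t[:, None] - t[None, :])**2 / ell**2) + 1e-8 * np.eye(T)
L = np.linalg.cholesky(K)

latent = L @ rng.standard_normal((T, S))           # (T, S) GP draws
fractions = np.exp(latent)
fractions /= fractions.sum(axis=1, keepdims=True)  # softmax onto the simplex

# Each time point's read counts are a multinomial draw from the fractions.
counts = np.array([rng.multinomial(N_reads, f) for f in fractions])
```

In the real model the latent fractions would be inferred from the counts (that is what Stan is for); this just shows the generative direction.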

Late in the day I had a call with Kelle Cruz (CUNY) and Ellie Schwab (CUNY) about low-mass flaring stars. They have a nice data set with many observations of many stars, and most of them are non-detections. They have the upper limit data, but not the measurements in the non-detection cases. They want to say things about the distribution of activity, as a function of stellar type. We tried to hash out the simplest possible project, and then decided to defer to a face-to-face on Monday.
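The simplest possible project probably involves a censored-data likelihood, in which detections contribute probability density and non-detections contribute the probability of falling below the reported upper limit. A Tobit-style sketch (the Gaussian population model, the fixed width, and all the numbers here are my own placeholders, not their data):

```python
import numpy as np
from scipy.stats import norm

# Toy censored-data likelihood: each star's activity is drawn from a
# population Gaussian with mean mu and width sigma. Detected stars
# contribute the pdf of their measurement; non-detections contribute
# the cdf evaluated at their upper limit.
def log_likelihood(mu, sigma, detections, upper_limits):
    ll = norm.logpdf(detections, loc=mu, scale=sigma).sum()
    ll += norm.logcdf(upper_limits, loc=mu, scale=sigma).sum()
    return ll

# Fake data: three detections, four upper limits.
detections = np.array([1.2, 0.8, 1.5])
upper_limits = np.array([0.5, 0.7, 0.4, 0.6])

# Crude grid search over the population mean, as a usage example.
grid = np.linspace(-2.0, 2.0, 401)
best_mu = grid[np.argmax([log_likelihood(m, 1.0, detections, upper_limits)
                          for m in grid])]
```

The point of the sketch is that the upper limits pull the inferred population mean below what the detections alone would suggest.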


NG Next SfL, day 2

At Northrop Grumman today, Justin Crepp (Notre Dame) talked about future extreme high precision radial-velocity spectrographs that could capitalize on adaptive optics to make them more precise, more stable, and smaller. He has a demo concept working and this could be the future! Rémi Soummer (STScI) talked about coronagraph concepts for WFIRST and future missions. Soummer's talk had a huge impact on me: He showed that a coronagraph actually brings the light to three successive foci, with a non-trivial mask or stop at each focus. This translates into three two-dimensional functions, each of which needs to be optimized, possibly adaptively (when the incoming wavefronts are not perfectly flat). This is a great set of problems, in optics and control. I fantasized about having a home-built code that does all the relevant electromagnetic propagation calculations. One simple take-home rule of thumb from Soummer's talk: A one-picometer distortion to the wavefront at the telescope entrance aperture will make an Earth-brightness speckle for the mission concepts we care about!

Jon Arenberg (NG) pointed out that, from an engineering perspective, we should be thinking not in terms of a linear design process flowing from objectives to final data products, but an iterative feedback loop involving scientific objectives, engineering capabilities, both scientific and technological research, and flexible redesign, ideally as late into the mission process as possible, with in-mission servicing an additional possible bonus.

After all the talks were done, we talked about wish lists and important ideas, to start a longer conversation between the astronomical community and NG. I said that what I want (in the ideal world) is a platform in space where we can build, test, prototype, and fail fast. Right now (mirroring Arenberg's points), space-mission planning and execution is a very long, slow process, with no feedback. If we could work in space, we could consider all sorts of things in situ, and not have to build things at one gee and then deploy them at zero. Next decadal survey: Can we talk about some out-of-the-box ideas, that might be expensive, but might have many enthusiastic constituents?


NG Next SfL, day 1

I spent the day at Northrop Grumman, where Alberto Conti (NG Aerospace) convened an intimate workshop on the search for life around other stars. It was a remarkably interdisciplinary day, and I learned a huge amount. An incomplete and personal list of highlights follows.

Tom Vice (NG Aerospace Systems President) and Tom Pieroneck (NG) opened the meeting by talking about hard problems, which is a good way of thinking about building an engineering team that will have work to do for many years. I realized that it is hard problems in data analysis that unite a lot of what we do at Camp Hogg; they attract good students, postdocs, and collaborators.

Sara Seager (MIT) kicked off the science talks with a completely eye-opening discussion of how we might identify signs of life through spectroscopy. She emphasized that there are many possible false-positive signals, but it would be exciting if we found oxygen, water, and methane in the same atmosphere. There was discussion of water-based life vs life based on other liquids; she made a strong case for water! Another great idea (maybe from Lee Feinberg in the audience?) was to look for signs of climate change to identify life!

Sanjoy Som (Blue Marble) gave a great and surprising talk about how we might use the geological record of rocks on Earth to look at life signatures and changes to our atmosphere over time. The coolest was his use of fossilized raindrops, plus some first-year mechanics. His point: The different states of Earth through time are proxies for exoplanets with life. Great point!

Leslie Rogers (Berkeley) talked about inferring planet masses and mass–radius relationships. In a direct-detection experiment we will measure neither directly; is that a problem? In the question period, someone brought up the possibility that it would be the moon of a giant planet in the habitable zone that might be the inhabited object.

Chris Stark (STScI) produced mind-blowing simulations of all possible ExoHab or LUVOIR missions to find habitable planets. His simulations include optimizations of target ordering and exposure times for starshade and coronagraphic experiments, all as a function of things like mirror size, mission lifetime, and so on. So much input! He made the comparison to the LHC: We need a mission that produces an interesting answer even if it doesn't detect life signatures. That is a good point. He mentioned that we will be affected by exo-zodiacal light at unknown levels; we need to figure this out before we settle on final design decisions for anything. Fortunately, this may be addressable from the ground or with WFIRST.

There were so many other interesting things, in talks and discussions: Karin Öberg (CfA) talked about the formation of planets and the chemical and materials properties of the disks in which planets formed; Daniel Apai (Arizona) talked about mapping planet surfaces with time-domain lightcurves (something he has done to great effect); Alicia Berger (Colorado) talked about amino acids and their relationship to biosignatures. In that latter talk, Öberg burst one of my bubbles by noting that despite claims in the literature, amino acids have not been discovered in interstellar spectra. A great day!


book review

I spent vacation time on Friday, the weekend, and today working on a book review, to appear in Physics Today in the near future.


punking Gaia

I spent a good part of my research time talking out Gaia projects with Dustin Lang. We are thinking about what it would require to put together the full match of the Gaia First Data Release to all existing all-sky or near-all-sky catalogs. Or do forced photometry? The idea is to get together everything we would need to exploit the data, photometrically (and perhaps spectroscopically). If we did it right, we could be a better source for the Gaia data than ESA itself. I also spent time today writing about data-driven, interpretable, nucleosynthesis models.


writing about LSF and nuclei

I got started on various late writing projects today. I wrote about generalizing The Cannon to deal with spectroscopic data that are taken with a variable-width (or variable-shape) effective line-spread function in the spectrograph. This could be caused by the instrument or by rotation or convection in the star. This is trivial to do at test time: You just convolve the spectral model before comparison to the data. But at training time this is difficult. We either have to run the convolution backwards, or build a model in the deconvolved space. Related to this, I had a conversation at lunch with Alex Barnett and Leslie Greengard about when you can invert a convolution operator. The answer varies from “never” to “over some range of spatial scales” to “it depends on the signal-to-noise and scale”. I am thinking about all this on behalf of Andy Casey.
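That scale-dependent answer has a quick numerical illustration: in Fourier space, convolution multiplies each mode by the kernel's transform, so inversion is only possible where that transform stays above the noise. A toy sketch (my own, not anything of Casey's; the LSF width and noise floor are made-up numbers):

```python
import numpy as np

n = 256
x = np.linspace(-10.0, 10.0, n)

# Gaussian line-spread function, normalized to unit sum.
lsf = np.exp(-0.5 * (x / 0.5)**2)
lsf /= lsf.sum()

# Magnitude of the kernel's transfer function, mode by mode.
kernel_ft = np.abs(np.fft.rfft(np.fft.ifftshift(lsf)))

# Deconvolution divides each Fourier mode by kernel_ft; modes where
# kernel_ft has fallen below an assumed per-mode noise floor are
# effectively destroyed and cannot be recovered.
noise_floor = 1e-3
recoverable = kernel_ft > noise_floor
```

The broader the LSF, the faster `kernel_ft` decays and the fewer modes survive, which is the “over some range of spatial scales” answer in quantitative form.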

I also wrote (well, typeset equations, anyway) for my project to build the minimal, interpretable, data-driven nucleosynthesis model. This is one of my answers to Hans-Walter Rix's question: Imagine you had 100,000 stars with 15 measured chemical abundances (and we have this!); what do you do with them?


cell types in the brain

In the morning, Mariano Gabitto (Columbia) spoke about identifying different cell types in the brain using gene expression. He is using probabilistic methods in a problem with integer parameters and no fixed complexity. That's challenging! The goal is to understand the “circuits” in the brain; that's a long-term goal, I suspect.

In the afternoon, I gave the seminar at AMNH here in NYC. I talked about using The Cannon to get detailed chemical abundances for stars. It is a great group and it was a lively conversation.


next steps for The Cannon

It was a low-research day! The only substantial research value came in a call with Andy Casey about the next steps with his regularized version of The Cannon. We have to track down issues for cool stars (which are hard for the models and underpopulated in our training set). We have to look at whether we can make use of stars with different effective line-spread functions. We have to get some non-trivial continuum ideas working. We have to get ready to help APOGEE with DR14. I suggested that we prioritize things that will help GALAH in the short run: This would be high impact, and anything we do for GALAH is good for all the other customers.


chemical abundances of halo stars

Kathryn Johnston came downtown for the afternoon to discuss matters of chemical abundances and the accretion history of the halo. She has some of the seminal work on the relationships between chemical abundances and kinematics that we might expect in the standard model of galaxy formation. We looked at how we might make a model of the Milky Way halo that is a mixture of stellar populations, each of which has a sensible chemical-enrichment history. That would be a step towards understanding the accretion origin of the halo. We also discussed how we might identify halo stars that clearly came from the disk, on the basis of chemical signatures. All this with our huge sample of stars from APOGEE DR12, with fifteen chemical abundances.


practice of Fools and differentiation

The highlight of today was a long call with Dan Foreman-Mackey, in which we discussed various projects. One idea he had was to make the pigeons-in-holes project an April Fools' project. That's a good idea, and it would permit us to write in a snarkier tone. It also has the right characteristics for an April Fools' paper: It is technically difficult but off our main track (in a humorous direction). We both promised to try to make progress: Me on writing, and him on getting nested sampling to work (as a demo). On the minus side, April Fools' is pretty close at hand!

We also discussed the MCMC tutorial we have been writing (for many, many years). He actually made problem solutions for a bunch of the problems! So it is getting very close to being a post-able paper. We made some notes about what needs to be changed.

After that call I buckled down and wrote derivatives (I hate doing that) for the objective function that I am optimizing in my nucleosynthesis code. Actually, I am in a set of conversations with (on one side, the hipster) astronomers and (on the other side, the stodgy) mathematicians about whether auto-differentiation is a good idea or a bad idea. Guess who is on which side? But I am so old, every time I should learn how to auto-diff, I instead just write my derivatives (and test them, which hurts). As my loyal reader probably knows, auto-diff is having the machine write your derivatives code for you. Not do finite differencing, actually do the chain rule! Anyway, I got some derivatives written and then hit numerical issues with all the dot products of exponentials of things. Argh.


group meeting

In a low-research day, I had a sparsely attended group meeting. It was nonetheless worthwhile because we discussed what I might learn or do with my alpha abundances. Mike Blanton straight-up laughed when I said I wanted to build a simple nucleosynthesis model.


alpha-element nucleosynthesis

In a low-research day, Hans-Walter Rix and I discussed the possible relationships between the alpha-element abundance ratios I have been looking at and current best-in-class models of nucleosynthesis and enrichment. Jan Rybizki (MPIA) has such models and has simulated the alpha-element abundance evolution in a wide range of scenarios. The situation is complex—the models aren't well described by just two types of fixed supernova yields of course—but it also looks like there are variations in the data that aren't captured by the models. So hopefully there are simple things to conclude here.


populations of exoplanets and binary stars

Tim Morton (Princeton) came up to the Simons Foundation for the day and we talked about exoplanets and binary stars. As my loyal reader knows, Morton maintains the best model of binary stars for assessing false positive rates in Kepler data. He is working on vetting true exoplanet candidates (that is, making decisions about individual systems) and also modeling jointly the exoplanet and binary-star populations. We discussed various issues in all of this. Morton's current goal is to understand the exoplanet population—not by excluding the binaries—but by modeling them simultaneously. This is the right move, given that there will always be many seriously ambiguous cases. We also discussed the relationship between technologies that Morton is developing for exoplanets (like population-level ABC inference) and what's needed in other astrophysical domains, like large-scale structure and the populations of gravitational-wave sources.
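As a reminder to myself of what population-level ABC means in the simplest case, here is a toy rejection-ABC sketch (entirely my illustration, not Morton's machinery; the Poisson forward model, prior range, and tolerance are invented): draw population parameters from the prior, simulate a catalog, and keep only draws whose summary statistic lands near the observed one.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy rejection ABC: infer a population-level occurrence rate from an
# observed catalog count, using only the ability to simulate catalogs.
observed_count = 50
n_draws, tolerance = 20_000, 3

prior_rates = rng.uniform(0.0, 100.0, n_draws)   # prior draws of the rate
simulated = rng.poisson(prior_rates)             # forward-modeled counts
kept = prior_rates[np.abs(simulated - observed_count) <= tolerance]
posterior_mean = kept.mean()
```

The kept draws approximate the posterior on the rate without any likelihood evaluation, which is the whole point when the forward model is a simulator.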


the literature as revealed by arXiv, nucleosynthesis

My day started with coffee with Paul Ginsparg (Cornell), who is the originator of the arXiv. He is also a faculty member in both Information Science and Physics. We discussed a wide range of things, but we ended up on experiments we could do inside the arXiv, which is not just a project that transformed all of scientific publishing, but also a huge repository of information about how literature is written and ideas are propagated. We discussed the things that NASA ADS and INSPIRE have that arXiv doesn't, like, for instance, a citation graph and a concordance of different versions of papers. Completely randomly, we ran into Josh Greenberg (Sloan Foundation) on the Ithaca-to-NYC bus, and he agreed that the arXiv is an amazing source of empirical data about how publishing and science work (perhaps not surprisingly!). We tentatively agreed to explore ideas by email and see if anything catches.

On the bus ride home, I built a nucleosynthetic model of the detailed chemical abundances we are getting out of The Cannon. Right now there are various idiotic things about my model: It uses no physics inputs, and it is ridiculously slow. However, it is a skeleton on which we could build an interpretable, physical model of how the stars got their elements. The idea I have in mind is to build data-driven yield “vectors”, but to build them as perturbations on theoretically computed yield vectors, and thereby preserve some aspects of interpretability relative to a truly free model.
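The skeleton is roughly the following (a toy with fake numbers, not my actual bus-ride code): each star's abundance vector is a combination of a few nucleosynthetic yield vectors, and the learned yields are parameterized as theoretical yields plus small data-driven perturbations, which is what keeps them interpretable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: 200 stars, 15 elements, 2 nucleosynthetic processes.
n_stars, n_elements, n_processes = 200, 15, 2

# "Theoretical" yield vectors plus small data-driven perturbations
# (here both are random placeholders standing in for real inputs).
yields_theory = rng.uniform(0.5, 1.5, size=(n_processes, n_elements))
perturbation = 0.05 * rng.standard_normal((n_processes, n_elements))
yields = yields_theory + perturbation

# Fake data: abundances are amplitude-weighted sums of yields plus noise.
amplitudes_true = rng.uniform(0.0, 1.0, size=(n_stars, n_processes))
abundances = (amplitudes_true @ yields
              + 0.01 * rng.standard_normal((n_stars, n_elements)))

# Given the yields, the per-star amplitudes are a linear least-squares fit.
amplitudes_fit, *_ = np.linalg.lstsq(yields.T, abundances.T, rcond=None)
rms = np.sqrt(np.mean((amplitudes_fit.T - amplitudes_true)**2))
```

The real model would alternate between fitting amplitudes and updating the yield perturbations (with regularization toward the theoretical yields); this shows only the inner linear step.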


my heroes at Cornell

I gave the MacConochie lecture at Cornell Astronomy today. I spoke about The Cannon. At the start of my talk, I thanked Tom Loredo (Cornell) for starting the consideration of principled inference in astronomy that has been so ascendant (and so influential in my own work), and Paul Ginsparg (Cornell) for starting the arXiv, which may be the most important development in physics in my lifetime! I had the pleasure of dinner with Loredo and will meet Ginsparg tomorrow.

In my day of meeting with people, the most fun was had with the graduate students, who fed me pizza and talked about their work. The range of activities is extremely wide at Cornell, with a range from the high-redshift universe to near-Earth asteroids. I raised the question: Is radar ranging of asteroids really “astronomy”? The reason I ask is: It is not a passive collection of photons (or cosmic rays or neutrinos or gravitational waves); it is an actively controlled scattering experiment!


Kant was wrong?

Very early in the day, Rix and I talked about the future of stellar spectroscopy. With The Cannon we have shown that detailed abundances can be measured in lower signal-to-noise and lower resolution data than anyone imagined. Now we have to make this case in such a way that we influence future projects!

Late in the day, Juna Kollmeier (OCIW) gave a talk at the Simons Foundation. She gave a wide-ranging talk, about gravity from large scales to black holes. The questions were all about Einstein! She said some provocative things, for example that Immanuel Kant was wrong about physics, which surprised me! I am going to look up the quotation she gave; my guess is that he was talking about materialism, not physics, and therefore was not wrong. But I will find out. I am a huge fan of Kant (in university I painted his face on the back of my leather jacket). She showed cave paintings of the sky, and it made me wonder whether time baseline can trump precision for measuring proper motions. Probably not, but I bet there's a literature. She showed that Slipher was the first astronomer to get good evidence for a black hole. Etc.


not much

I caught up on backlogs of non-research things today. My research time was spent tweaking our Pigeons-in-holes model for testing MCMC and my visualization of the differences in abundance patterns across supposedly-similar alpha elements.


two kinds of alpha elements?

A few days ago I sent around—to the APOGEE Collaboration—plots of large-scale gradients of chemical-abundance ratios in the Milky Way. One of the comments (made by several collaborators) was “that's odd; the alpha elements don't track each other!” I started looking at this today: Can we show that there are multiple kinds of alpha elements? And is that interesting? I made plots of five different [α/Fe] ratios: [O/Fe], [Mg/Fe], [Si/Fe], [S/Fe], and [Ca/Fe], all as a function of various things, like metallicity, Galactic radius, Galactic height, stellar effective temperature and gravity, and so on. It does look like there are (at least) two different kinds of alpha element. I discussed with Rix whether this is interesting and how to make the case that the effects are real.


pigeons in holes

The loyal reader knows that Dan Foreman-Mackey and I are working on building toy problems for inference (that is, priors and likelihood functions) that contain nasty but realistic degeneracies for testing MCMC and fully marginalized likelihood (FML) computations. I got the first version of the code working today, and performed some minimal unit tests.

The code builds a mixture of M-choose-K times K-factorial Gaussians in a (K times D)-dimensional space, such that the model has all the combinatoric degeneracies from (a) not knowing which of the M holes to put the K pigeons into and (b) not knowing which pigeon is which (the labeling degeneracy). This mixture of Gaussians gets huge fast, but as long as the prior is also a mixture of Gaussians, perfect sampling, exact marginalization, and analytic FML calculations are all possible without MCMC; it is therefore a perfect testbed.
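For concreteness, building that mixture is a few lines (a toy sketch matching the construction described above, not our actual code; M, K, D, and the component width are arbitrary choices):

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

# M hole centers in D dimensions, K pigeons; one mixture component for
# every ordered assignment of pigeons to distinct holes, i.e.
# M-choose-K times K-factorial components in K*D dimensions.
M, K, D = 4, 2, 2
holes = rng.standard_normal((M, D))
sigma = 0.1   # assumed isotropic per-component width

# Each ordered K-tuple of distinct holes gives a component mean
# in the (K * D)-dimensional parameter space.
means = np.array([np.concatenate([holes[i] for i in perm])
                  for perm in itertools.permutations(range(M), K)])

def log_density(theta):
    # Equal-weight isotropic Gaussian mixture; exact evaluation, no MCMC.
    d2 = ((theta - means)**2).sum(axis=1)
    return (np.log(np.mean(np.exp(-0.5 * d2 / sigma**2)))
            - 0.5 * K * D * np.log(2.0 * np.pi * sigma**2))
```

With M = 4 and K = 2 there are 4!/(4-2)! = 12 components; the count explodes combinatorially as M and K grow, which is exactly the point.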


not much

It was a low-research day! But I did sell a drawing (long story). Anyway, the dark-matter ideas we talked about just yesterday at Rutgers were investigated and published today on arXiv (not by us).


are the LIGO black holes in fact dark-matter particles?

[This is blog post 2600. You have to be a real nerd to know why I care.]

I gave the annual Robbins Lecture in the Department of Physics at Rutgers University. I spoke about noise modeling and the discoveries of transiting exoplanets. Before my talk, I had many interesting discussions around the department, including extensive discussion of my favorite black-hole dark-matter model: tens- to hundreds-of-Solar-mass black holes. This model is not ruled out by anything, and it is possible to calculate exactly. Now, interesting question: What if the black holes discovered by LIGO are in fact dark-matter particles? Matthew Buckley (Rutgers) and I discussed this idea in some detail: Is the implied event rate consistent with three-body and many-body capture processes? Would the stochastic background be too loud? Etc.


error propagation at #astrohackny, are MCMC runs converged?

At #astrohackny today, Adrian Price-Whelan and I led a discussion of error propagation and reporting. I talked about three basic methods of error propagation: exact (when the model is linear and the noise is Gaussian), linearized (taking derivatives to make a Fisher-matrix approximation), and MCMC-based. I emphasized taking a geometric view of the situation. Price-Whelan talked about methods for reporting values and uncertainties at the end of a data-analysis project. His main punchline is that there is no way to summarize a whole posterior pdf (or likelihood function) with a number and an error bar, so you should just do something sensible and report precisely what you did. Also, you should give the reader a method for obtaining your posterior samples or likelihood-function code.
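The linearized method is worth a concrete sketch (my own toy function and numbers, not anything from the session): take the Jacobian of the map at the best-fit point and sandwich the input covariance between it and its transpose.

```python
import numpy as np

# Toy nonlinear map from parameters x to derived quantities f(x).
def f(x):
    return np.array([x[0]**2, x[0] * x[1]])

x0 = np.array([2.0, 3.0])                 # best-fit point
cov_x = np.diag([0.1**2, 0.2**2])         # Gaussian uncertainty on x

# Numerical Jacobian at x0 via central differences; column i is df/dx_i.
eps = 1e-6
J = np.column_stack([(f(x0 + eps * e) - f(x0 - eps * e)) / (2.0 * eps)
                     for e in np.eye(2)])

# First-order (Fisher-style) propagated covariance of f(x).
cov_f = J @ cov_x @ J.T
```

This is exact only when f is linear over the scale of the uncertainties, which is precisely the caveat that pushes you toward MCMC when it isn't.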

Late in the day I discussed single transits in Kepler with Dan Foreman-Mackey. He is finding that his MCMC runs to characterize the multiple-planet systems he has found are showing very, very long autocorrelation times (like it is taking many CPU days or weeks to sample). If he is right, this throws doubt (in my mind) on any posterior sampling in the parameter space of (say) a 5-planet model. And there are a few claims in the literature of converged samplings.
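The quantity at issue is the integrated autocorrelation time; a small synthetic check of how it is estimated (my own toy, not Foreman-Mackey's code) uses an AR(1) chain, for which the true value is (1 + rho) / (1 - rho), i.e. 19 for rho = 0.9:

```python
import numpy as np

rng = np.random.default_rng(3)

# AR(1) chain: x_i = rho * x_{i-1} + noise_i, with known integrated
# autocorrelation time tau = (1 + rho) / (1 - rho) = 19 for rho = 0.9.
rho, n = 0.9, 100_000
chain = np.empty(n)
chain[0] = 0.0
noise = rng.standard_normal(n)
for i in range(1, n):
    chain[i] = rho * chain[i - 1] + noise[i]

# Empirical autocorrelation function via FFT (zero-padded to avoid
# circular wraparound), normalized so acf[0] = 1.
x = chain - chain.mean()
fx = np.fft.rfft(x, n=2 * n)
acf = np.fft.irfft(fx * np.conjugate(fx))[:n] / np.dot(x, x)

# Integrated autocorrelation time, summed over a fixed window (choosing
# the window is the hard part in practice; 200 is arbitrary here).
tau = 1.0 + 2.0 * acf[1:200].sum()
```

The rule of thumb is that a chain is only useful when its length is many hundreds of times tau, which is exactly why CPU-weeks-long autocorrelation times cast doubt on claimed converged samplings.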