On the plane home from Canberra, I worked on my various writing projects.
In the time around sessions, and on the conference hike (with kangaroos), I had many conversations with Marcus Frean (Wellington) about a generalization of his source finding system. He finds sources in data by looking for anomalies in the "pixel histogram"; his model has no knowledge of astronomy (or anything else) on the inside. We discussed several generalizations in which the model would learn—as it saw more and more data—what kinds of properties astronomical data have. The idea is that the system should learn that the anomalies (sources) fall into different categories, learn the process that generates each of those categories, and instantiate new categories as required by the data. A system like this would be like a simulation of a hypothesis-generating astronomer! It also would be extremely useful running on any operating astronomical survey; "eyes on the data" are always valuable and usually very expensive. As my reader knows, I think that eyes on the data is the most valuable contribution of massive citizen science projects like the Zooniverse; awesome if we could add some robots into the mix!
At the end of the day (Australian time), during the MaxEnt2013 conference dinner, Gaia launched! It looks right now like the launch was successful. This is potentially the beginning of a new era in observational astrophysics. Congratulations to everyone involved.
I spoke at MaxEnt2013 today, in a short astronomy session that also included Ensslin (MPA), Frean (Wellington), and Brewer. Brewer spoke about our project to fully marginalize out catalogs, and Frean showed some exceedingly general methods for source discovery in data streams, applied (among other things) to astronomical data. I pitched a project to him at lunch about predicting or automatically inspecting survey data as it comes off of telescopes, which would be a beautiful extension of his work. Ensslin showed awesome reconstructions of astrophysical fields (especially the magnetic field in the Galaxy) from sparse data samples (rotation measures, in this case). He uses ideas from field theory to go beyond the Gaussian Process.

There were many valuable talks; too many to mention. Stand-outs for me included a talk by Hutter (ANU) about things that overlap my crazy paper. He was arguing for a message-length approach to selecting theories, especially huge theories of everything. He made the good point that the message must include both the initial conditions and a description of the position of the observer. Hutter describes himself as a mathematical philosopher. Lineweaver (ANU) argued passionately that the Universe is not a fluctuation away from a high-entropy state (I agree), and Goyal (Albany) argued that exchangeability can be used to prove that the universe can only contain fermions and bosons (nothing else). On the latter, I would like to understand it better; I certainly grew up learning the opposite: I learned that this was an additional postulate. Wood (ANU) gave a nice overview of probabilistic topic models and their value and limitations.
After lunch, there were break-out sessions, and we guided the (very well attended) astronomical one to things where Brewer, Murray, and I overlap. We talked about combining information from images taken at different times, through different bandpasses, and with very different calibration properties. The issues are very different if you have the multiple images or if you just have catalogs. Many good ideas came up, including many that I had (nearly) forgotten from my Gaia paper. In the end, we didn't resolve anything but we specified a very sensible project, which is to figure out how one might construct catalog outputs such that the catalogs can be combined to produce inferences that are almost as good as the inferences you get from working with the images directly. Very sensible! And very related to abortive projects I have started with Marshall.
At the end of a long day, Huppenkothen (Amsterdam) was showing Murray and me bursts from Fermi observations of a magnetar, and discussing ways we might fit the data with some kind of process (Gaussian or dictionary). We accreted Brewer and Frean and then challenged ourselves to produce a result by midnight. After a monster hack session we succeeded; we hope to be able to use what we have to constrain rise times (or make some new discovery) in these kinds of bursts.
A little bit of hooky: I was kidnapped by Aaron Dotter (ANU) and taken up to Mt Stromlo Observatory, to chat with the locals about data and calibration (my favorite subjects these days). Ken Freeman (ANU) came by, and we discussed the just-starting HERMES project, which is on sky and taking data. The project is performing a survey of a million stars at high s/n (like 100) and high-ish resolution (like 30,000 or 50,000). The idea is to do "chemical tagging" and dissect the Milky Way into its accretion-historical parts. There are two challenges we discussed. The first is calibration and extraction of the spectra, which must deal with cross-talk between fibers (yes, it is fiber-fed like APOGEE) or their traces on the CCD, modeling and removal of sky in wavelength regions where there are no sky lines, and determination of the point-spread and line-spread functions as a function of position in the device. The second is the chemical tagging problem in the limit that models are good but not perfect. I have many ideas about that; they fall into the category we talked about with Sontag (NYU) last week: Simultaneously we want to use the models to understand the data and the data to update the models.
In the meeting today I missed some of the talks, of course, which I regret. There were talks in the morning about the possibility that not only does nature (in equilibrium) maximize entropy, but perhaps when it is out of equilibrium it also maximizes entropy production. I actually worked on this as an undergraduate back around 1991 with Michel Baranger (MIT emeritus); back then we were suspicious that the principle could even work. I think now it is known that for some systems, in some circumstances, they do seem to choose the path of maximum dissipation, but the general value of the principle is not clear; it might even be more misleading than useful.
Iain Murray (Edinburgh) gave a great talk (clear, surprising, useful) about inferring density functions (probability distributions) given points. He showed some amazing results for "autoregressive" models, which are so much more general than what I thought they were this summer when we were working on Kepler. He gave me a lot of new ideas for MJ Vakili's project.
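To remind myself what the factorization is: an autoregressive model writes p(x) as a product of one-dimensional conditionals p(x_d | x_&lt;d). Here is a deliberately minimal sketch (mine, not Murray's; all numbers invented) in which each conditional is just linear-Gaussian; the methods Murray showed use vastly more flexible conditionals:

```python
import numpy as np

rng = np.random.default_rng(9)

# toy correlated data in D dimensions
D, N = 4, 4000
L = np.tril(rng.normal(size=(D, D)), -1) + 2.0 * np.eye(D)
X = rng.normal(size=(N, D)) @ L.T

class LinearGaussianAR:
    """Minimal autoregressive density model: p(x) = prod_d p(x_d | x_<d),
    each conditional being a linear-Gaussian fit to the preceding dims."""

    def fit(self, X):
        self.params = []
        for d in range(X.shape[1]):
            A = np.column_stack([X[:, :d], np.ones(len(X))])
            w, *_ = np.linalg.lstsq(A, X[:, d], rcond=None)
            resid = X[:, d] - A @ w
            self.params.append((w, resid.std()))
        return self

    def logpdf(self, X):
        lp = np.zeros(len(X))
        for d, (w, s) in enumerate(self.params):
            A = np.column_stack([X[:, :d], np.ones(len(X))])
            lp += -0.5 * ((X[:, d] - A @ w) / s) ** 2 \
                  - np.log(s * np.sqrt(2.0 * np.pi))
        return lp

model = LinearGaussianAR().fit(X)
inlier = rng.normal(size=(100, D)) @ L.T
outlier = 10.0 * inlier   # grossly out-of-distribution points
print(model.logpdf(inlier).mean(), model.logpdf(outlier).mean())
```

Even this crippled version correctly assigns much lower density to out-of-distribution points, which is the basic capability we would want for Vakili's project.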
Today was the first day of MaxEnt 2013 in Canberra, Australia. There were many great talks, including about exponential-family probability distributions and their generalizations, image reconstruction from geophysical and medical imaging, model selection via marginalized likelihood, and inference and decision making for networks and flow. There were also amusing discussions of the "surprise test paradox" and the "black raven paradox" and other simple probability arguments that are very confusing. These conversations carried into dinner, at which Iain Murray (Edinburgh) and Brewer and I argued about their relevance to our understanding of inference.
The most productive part of the day for me was at lunch (and a bit beyond), during which Murray, Brewer, and I argued with various attendees about the various topics of discussion I brought to Australia to discuss with Murray. Among the various things that came up are GPLVM (by Neil Lawrence) as a possible tool for Vakili and me on galaxy image priors, Gaussian Processes for not just function values but also derivatives (including higher derivatives) and the potential this has for "data summary", and MCMC methods to find and explore badly multi-modal posterior pdfs. We spent some significant time discussing how to make likelihood calculation more efficient or adaptive. In particular, Murray pointed out that if you are using it in a Metropolis–Hastings accept/reject step, how precisely you need to know it depends on the value of the random number draw; in principle this should be passed into the likelihood function! We also spent some time talking about how large-scale structure is measured. Murray had some creative ideas about how to use better the existing cosmological simulations in the data analysis.
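A cartoon of Murray's point about the accept/reject step (the 1/tol cost pricing and the tolerances here are my inventions, purely illustrative): draw the Metropolis threshold first, and only pay for a precise log-posterior evaluation when the decision is actually close to the threshold:

```python
import numpy as np

rng = np.random.default_rng(0)

def logpost(theta, tol):
    """Stand-in for an expensive log-posterior (here a unit Gaussian) that
    can be evaluated to a requested tolerance `tol`; the returned `work`
    is a fake cost model (cost ~ 1/tol), just for bookkeeping."""
    return -0.5 * theta ** 2, 1.0 / tol

def mh_step(theta, step=1.0):
    prop = theta + step * rng.normal()
    log_u = np.log(rng.uniform())          # draw the threshold FIRST
    # a loose evaluation suffices unless log(alpha) lands near log(u)
    lp_prop, w1 = logpost(prop, tol=0.1)
    lp_cur, w2 = logpost(theta, tol=0.1)
    if abs((lp_prop - lp_cur) - log_u) < 0.2:   # inconclusive: refine
        lp_prop, w1 = logpost(prop, tol=1e-3)
        lp_cur, w2 = logpost(theta, tol=1e-3)
    accept = (lp_prop - lp_cur) > log_u
    return (prop if accept else theta), w1 + w2

theta, total_work, samples = 0.0, 0.0, []
for _ in range(20000):
    theta, work = mh_step(theta)
    total_work += work
    samples.append(theta)
samples = np.array(samples)
print(samples.mean(), samples.std(), total_work)
```

The chain still targets the right distribution, but most steps get away with the cheap evaluation; only the rare near-threshold decisions pay full price.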
Part of CampHogg had lunch with David Sontag (NYU) today; Sontag works on tractable approximations to intractable Bayesian inferences (among other things). In particular, he is interested in making scientific discoveries in data using non-trivial Bayesian models. We spent much of lunch discussing the gap between supervised and unsupervised methods; most important scientific tasks don't fall clearly into one category or the other, and progress in the huge region between them could be immensely useful. I pitched several April-Fools scale projects at Sontag; none have quite stuck yet.
In the afternoon, Christian Ott (Caltech) gave a nice talk about numerical modeling of exploding stars, and the persistent problem that supernova explosions do not reliably happen on the computer. That deep problem has been around for all the time I have been an astrophysicist!
Over lunch we discussed the state of Foreman-Mackey's sampling of all Kepler Objects of Interest, and the state of Fadely's and my project to infer the spectra of objects using photometry alone. With the term coming to an end, it was barely a research day.
In a secret "overwhelming force" project, Foreman-Mackey is resampling all the Kepler Objects of Interest, to produce a full probabilistic catalog. In many low signal-to-noise systems, there are fitting degeneracies (really near-degeneracies). In many of these, the posterior pdf does not accord with our intuitive views about what ought to be going on. We realized that this is because our "flat priors" didn't accord with our real, prior beliefs. That is, there was a disparity between our actual prior beliefs and our coded-up prior function. We knew this would be the case—we are using wrong but simple interim priors that we plan to replace with a hierarchical inference—but it was amusing to be reminded of the simple point that ugly priors lead to ugly inferences. We made some small changes to the priors, resampled, and our inferences look much better.
Say you have many noisy inferences of some quantity (planet radius, say, for hundreds of planets), and you want to know the true distribution of that quantity (the planet-radius distribution you would observe with very high signal-to-noise data). How should you estimate the distribution? One option: Histogram your maximum-likelihood estimates. Another: Co-add (sum) your individual-object likelihood functions. Another: Co-add your individual-object posterior pdfs. These are all wrong, of course, but the odd thing is that the latter two—which seem so sensible, since they "make use of" uncertainty information in the inferences—are actually wronger than just histogramming your ML estimates. Why? Because your ML-estimate histogram is something like the truth convolved with your uncertainties, but a co-add of your likelihoods or posteriors is pretty-much the same as that but convolved again. The Right Thing To Do (tm) is hierarchical inference, which is like a deconvolution (by forward modeling, of course). I feel like a skipping record. Fadely, Foreman-Mackey, and I discussed all this over lunch, in the context of recent work on (wait for it) planet radii.
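The double-convolution point is easy to demonstrate in a one-dimensional Gaussian toy (made-up numbers, nothing to do with real planet radii):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 20000
sigma_true, sigma_noise = 1.0, 1.0

truth = rng.normal(0.0, sigma_true, N)            # true per-object values
obs = truth + rng.normal(0.0, sigma_noise, N)     # ML estimates = data here

var_ml = obs.var()                                # histogram of ML: ~ 1 + 1 = 2

# co-add the per-object posteriors (flat prior, so each posterior is
# N(obs_n, sigma_noise^2)); one draw per object samples the co-added pdf
coadd = obs + rng.normal(0.0, sigma_noise, N)
var_coadd = coadd.var()                           # ~ 1 + 1 + 1 = 3

# the hierarchical answer, which in this Gaussian toy collapses to a
# closed-form deconvolution (in general: forward modeling)
var_hier = var_ml - sigma_noise ** 2              # ~ 1 = truth

print(truth.var(), var_ml, var_coadd, var_hier)
```

The truth has unit variance; the ML histogram has variance near 2 (one convolution); the co-added posteriors have variance near 3 (two convolutions); the hierarchical answer gets back to 1.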
In MCMC meeting today, Goodman brought up the difference between the fully marginalized posterior model probability and what you learn by cross-validation, for model selection. As my loyal reader knows, I have many thoughts about this and also a nascent paper with Vanderplas. However, Goodman has a different take from me: He sees cross-validation as producing the most predictive model (preventing over-fitting), but posterior probability as delivering the most probable model, given the universe of models. I think this is deeply (and obviously) correct. However, we haven't settled on words for Hou's paper yet, because he is still willing to use the T-word ("truth"), and I am not! (I also think, in the end, this is all related to the point that we want to deliver the most useful result for our purposes, and this necessarily involves utility.)
Ramirez-Ruiz (Santa Cruz) gave a morning talk on tidal disruption flares. He finds that for every fully disrupted star (by a central black hole in a galaxy) there should be many partially disrupted stars, and we should be able to find the remnants. The remnants should look odd in various ways, and be on odd orbits. Worth looking for!
In the afternoon, Johnson (Harvard) talked about exoplanets around cool stars, with some diversions into interesting false positives (a white dwarf eclipsing an M star, providing a gravitational-lensing measurement of the white dwarf mass) and new hardware (the MINERVA project is building an exoplanet search out of cheap hardware). Johnson gave various motivations for his work, but not the least was the idea that someday we might go to one of these planets! A great pair of talks. Late in the afternoon, Johnson and I collected some suggestions for combining our research groups and projects.
John Asher Johnson (Harvard) and Ben Montet (Caltech) are secretly visiting NYU today and tomorrow. We spent a long session today talking about overlapping interests. Montet showed results on the population of long-period planets, where they have radial-velocity trends (not periodic signals but trends) and adaptive-optics imaging upper limits (that is, no detections). What can you conclude when you have no period and no direct detection? A lot, it turns out, because the trend sets a relationship between mass, inclination, and period, and the adaptive optics rules out a large class of binary-star models.
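The basic scaling, in case I forget it: a distant companion produces a nearly linear trend with dv/dt of order G m sin(i) / a². Here is a back-of-the-envelope version (the trend and separation numbers below are hypothetical, and a real analysis marginalizes over the full orbit rather than using this point-mass cartoon):

```python
import numpy as np

G = 6.674e-11          # m^3 kg^-1 s^-2
AU = 1.496e11          # m
M_JUP = 1.898e27       # kg
YEAR = 3.156e7         # s

def msini_from_trend(trend_m_s_per_yr, sep_au):
    """Companion m*sin(i) implied by a linear RV trend, treating the
    companion as a distant point mass: dv/dt ~ G m sin(i) / a^2."""
    accel = trend_m_s_per_yr / YEAR               # m / s^2
    return accel * (sep_au * AU) ** 2 / G         # kg

# hypothetical numbers: a 10 m/s/yr trend and a 10 AU separation
m = msini_from_trend(10.0, 10.0)
print(m / M_JUP)       # a few Jupiter masses
```

Turn this around and the adaptive-optics non-detection does the work: it caps the separation, so the same trend cannot be blamed on an arbitrarily massive, arbitrarily distant star.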
In related news, Johnson was excited about the methods Fadely and I are using to infer the HST pixel-convolved point-spread function. It is very related to methods he wants to use to infer the line-spread function in the HIRES data he has on exoplanets. He was particularly impressed by our smoothness priors that regularize very flexible model fits without breaking convexity.
In a low-research day, Fadely and I discussed the issue that any HST PSF model must be able to track or model the dependence of the PSF on focus, which changes during the mission, both in a secular way and through the orbit (depending on pointing). That's a problem, because with only millions of stars, we do not have lots of excess data for our goals.
Fadely and Foreman-Mackey are both having fitting issues that are hard to comprehend, both in extremely ambitious comprehensive data analysis programs. Fadely has a model where the update steps (the hand-built optimization steps) are guaranteed (by math) to improve the objective function and yet, and yet! I asked him for a document, so we can compare code to document. His model is a beautiful one, which simultaneously finds the position and flux of every star in the HST data for the WFC3 IR channel, the point-spread function, and the pixel-level flat-field!
Foreman-Mackey is finding that his automatic re-fits (samplings using emcee and interim priors) to all Kepler Objects of Interest are favoring high impact parameters. This is a generic problem with exoplanet transit fits; the KOI best-fit values have these biases too; that doesn't make it trivial to understand. Even our hierarchical Bayesian inference of the impact-parameter distribution is not okay. It has something to do with the prior volume or else with the freedom to fit at larger impact parameter; or perhaps a (wrong) lack of penalty for large planets. Not sure yet. We have some hypotheses we are going to test (by looking at the samplings and prior-dependences) tomorrow.
Vakili and I discussed the issue that you can run kernel PCA on galaxy images, or on the shapelet transforms of galaxy images, and you should get the same answer. PCA is invariant to rotations of the coordinate system. However, really we are using the shapelets for denoising: We truncate the high-order terms that we think are noise-dominated. We discussed less heuristic approaches to this.
At MCMC meeting, Hou showed his impressive results on marginalized likelihood computations. He gets answers that are provably (if the code is correct) unbiased and come with uncertainty estimates. He gets some discrepancies with numbers in the literature, even when he uses the same data and the same prior pdfs, so we are confused, but we don't know how to diagnose the differences. Goodman explained to us the magical Bernstein–von Mises theorem, which guarantees that the posterior pdf approaches a Gaussian as the data grows very large. Of course the theorem depends on assumptions that cannot possibly be true, like that the model space includes the process that generated the data in the first place!
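A toy version of the Bernstein–von Mises theorem in action, for the simplest possible inference (a coin bias with a flat prior; my made-up numbers): the Beta posterior marches toward its matched Gaussian as the number of flips grows.

```python
import numpy as np
from scipy import stats

def dist_to_gaussian(n, frac=0.3):
    """Crude L1 distance between the Beta posterior for a coin bias after
    n flips (a fraction frac of heads, flat prior) and the Gaussian with
    the same mean and variance."""
    k = int(frac * n)
    post = stats.beta(k + 1, n - k + 1)
    gauss = stats.norm(post.mean(), post.std())
    x = np.linspace(1e-6, 1 - 1e-6, 20001)
    dx = x[1] - x[0]
    return 0.5 * np.abs(post.pdf(x) - gauss.pdf(x)).sum() * dx

print(dist_to_gaussian(20), dist_to_gaussian(2000))   # shrinks with n
```

Of course this toy satisfies the theorem's assumptions by construction; the interesting (and suspicious) cases are the ones where the model class doesn't contain the truth.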
On the phone with the exoSAMSI crew, we de-scoped our first papers on search to the minimum (and set Spring targets for completion). At lunch, Mark Wyman (NYU) talked about modifications to inflation that would make the gravitational wave signatures both more prominent and more informative.
Or Graur (JHU), Yuqian Liu (NYU), Maryam Modjaz (NYU), and Gabe Perez-Giz (NYU) came by today to pick my brain and Fadely's brain about interpreting spectral data. Their problem is that they want to analyze supernova spectral data, but for which they don't know the SN spectral type, don't know the velocity broadening of the lines, don't know the true spectral resolution, don't know the variance of the observational noise, and expect the noise variance to depend on wavelength. We discussed proper probabilistic approaches, and also simple filtering techniques, to separate the signal from the noise. Obviously strong priors on supernova spectra help enormously, but the SN people want to stay as assumption-free as possible. In the end, a pragmatic filtering approach won out; we discussed ways to make the filtering sensible and not mix (too badly) the signal output with the noise output.
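For concreteness, the kind of pragmatic filtering we discussed might look like the following (a made-up toy spectrum and a Savitzky–Golay smooth; the window choice encodes the weak prior that the signal features are broader than the noise scale):

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(3)
wave = np.linspace(4000.0, 7000.0, 3000)           # angstroms, 1 A pixels
# toy "supernova spectrum": broad features only, no sharp lines
signal = (1.0 + 0.3 * np.sin(wave / 300.0)
          + 0.2 * np.exp(-0.5 * ((wave - 6150.0) / 80.0) ** 2))
# wavelength-dependent noise, as in the real data
noise_sigma = 0.05 * (1.0 + (wave - 4000.0) / 3000.0)
spec = signal + noise_sigma * rng.normal(size=wave.size)

# window much narrower than the features, much wider than the pixel scale
smooth = savgol_filter(spec, window_length=101, polyorder=3)
print(np.std(spec - signal), np.std(smooth - signal))   # filtering helps
```

The danger we discussed is visible in the knobs: make the window comparable to a real feature width and the "noise" output starts to contain signal, which is exactly the mixing we want to avoid.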
Aside from a blackboard talk by Maryam Modjaz (NYU) about supernova types and classification, it was a day of all talk. I spoke with Goodman and CampHogg about Hou's paper on marginalized likelihood calculation using the geometric path. I spoke with Vakili about how you go, in kernel PCA, back from the (high dimensional) feature space back to the original data space. (It's complicated.) I spoke with the exoSAMSI crew about exoplanet populations inference; Megan Shabram (PSU) is close to having a hierarchical inference of the exoplanet eccentricity distribution (as a function of period). Finally, I spoke with Foreman-Mackey about his new evil plan (why is there a new evil plan every four days?) to build an interim-prior-based sampling of the posterior density of exoplanet parameters for every KOI in the Kepler Catalog.
In the astro seminar, Carlos Badenes (Pitt) talked about white-dwarf–white-dwarf binaries and an inferred rate of inspiral, based on SDSS spectra split up exposure by exposure: The orbits of the soon-to-merge white dwarfs are so fast and short-period that even the twenty-minute intervals between spectral exposures in SDSS are long enough to show velocity changes! He finds a merger event rate for the binaries large enough to explain the type-Ia supernova rate, but only if he permits sub-Chandrasekhar total masses to make the SNe. That is, he gets enough events, but they tend to be low-mass.
Tim Morton (Princeton) spent the day at NYU to talk exoplanets, sampling, selection functions, marginalized likelihoods, and so on. We had a productive talk about making high-performance importance-sampling code to compute the marginalized likelihoods.
I spent too much time today trying to understand kernel PCA, inspired by Vakili's use of it to build a probabilistic model of galaxy images. Schölkopf would be disappointed with me! I don't see how it can give useful results. But then on further reflection, I realized that all my problems with kPCA are really just re-statements of my problems with PCA, detailed in my HMF paper: PCA delivers results that are not affine invariant. If you change the metric of your space, or the units of your quantities, or shear or scale things, you get different PCA components. That problem is even more severe and hard to control and incomprehensible as you generalize with the kernel trick.
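Here is the affine-invariance complaint in two dimensions (all numbers made up): change the units of one feature and the leading principal component genuinely rotates, even after you map it back to the original units.

```python
import numpy as np

rng = np.random.default_rng(4)
# correlated 2-D data with unit-ish scales
X = rng.normal(size=(5000, 2)) @ np.array([[1.0, 0.6], [0.0, 0.8]])
X -= X.mean(axis=0)

def first_pc(Y):
    """Leading principal component (unit vector) via SVD."""
    _, _, Vt = np.linalg.svd(Y, full_matrices=False)
    return Vt[0] / np.linalg.norm(Vt[0])

v1 = first_pc(X)

# rescale the second feature by 10 (a pure units change), take the
# leading PC there, and map that axis back to the original units
S = np.diag([1.0, 10.0])
v2 = first_pc(X @ S) @ np.linalg.inv(S)
v2 /= np.linalg.norm(v2)

# if PCA were affine invariant these would be parallel; they are not
print(abs(np.dot(v1, v2)))
```

For this covariance the two axes disagree by roughly 14 degrees; with the kernel trick the implicit feature-space metric is even harder to reason about, which is the "more severe and hard to control" part.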
I also don't understand how you go from the results of kPCA back to reconstructions in the original data space. But that is a separate problem, and just represents my weakness.
In a low-research day, I discussed spectral plotting with Jeffrey Mei (NYUAD). This is serious wheel-reinvention: Every student who works on spectra pretty-much has to build her or his own plotting tools.
In a blast from the past, James Long (TAMU) called me today to discuss a re-start of what I like to call the "insane robot" project, in which we are fitting photometric data censored by an unknown (but assumed stationary) probabilistic process. This project was started with Joey Richards (wise.io), who wrote much of the code with Long's help, but it has been dormant for some time now. One astonishing thing: after a couple of years of disuse, the code was comprehensible and ran successfully. Let's hear it for well-documented, well-structured code!
Late in the day, Foreman-Mackey proposed a very simple approach to inferring exoplanet population parameters, based only on the content of the Kepler "Object of Interest" catalog. That is, a way to build a probabilistic model of this catalog that would be responsible and rigorous (though involving many simplifying assumptions, of course). It relates to projects by Subo Dong and others, who have been doing approximations to hierarchical inference; one goal would be to test those conclusions. The common theme between the exoplanet project and the insane robot project is that both require a parameterized model of the completeness or data censoring; we don't know with any reliability in either case the conditions under which an observation makes it into the catalog.
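The interim-prior machinery that makes this kind of catalog-level inference possible: importance-reweight each object's posterior sampling to evaluate the population likelihood. A Gaussian toy (my made-up numbers, not the exoplanet problem):

```python
import numpy as np

rng = np.random.default_rng(5)
N, K = 1000, 64
alpha_true = 1.5                         # population width we want back
truths = rng.normal(0.0, alpha_true, N)
obs = truths + rng.normal(0.0, 1.0, N)   # unit-variance measurement noise

# per-object posterior samplings under a flat interim prior; with unit
# Gaussian noise the posterior is just N(obs_n, 1), so sample directly
samples = obs[:, None] + rng.normal(size=(N, K))

def log_pop_like(alpha):
    """ln L(alpha) = sum_n ln[(1/K) sum_k N(theta_nk; 0, alpha^2)],
    the flat interim prior being absorbed into an overall constant."""
    logp = -0.5 * (samples / alpha) ** 2 - np.log(alpha * np.sqrt(2 * np.pi))
    return np.sum(np.log(np.mean(np.exp(logp), axis=1)))

grid = np.linspace(0.5, 3.0, 251)
alpha_hat = grid[np.argmax([log_pop_like(a) for a in grid])]
print(alpha_hat)                         # near alpha_true
```

The real catalog version has to fold in the completeness model too, which is exactly the part we said we don't know with any reliability.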
I spoke with MJ Vakili today about how to turn his prior over galaxy images into a probabilistic weak lensing measurement system. Any time we write down a probabilistic model, we need to be able to evaluate the probability of some set of parameters given data, or some set of data given parameters, and we also need to be able to sample from it: We need to be able to generate fair samples of artificial data given parameters, and generate fair samples of parameters given data. Vakili is tasked with making both kinds of operations first correct and second fast; the weak-lensing community won't care that we are more righteous if we aren't practical.
Masao Sako (Penn) gave the astro seminar today, talking about supernova cosmology, now and in the near future. Afterwards we discussed the possibility that precise cosmological measurements may be reaching their maximum possible precisions, some from cosmic variance and some from complicated and random systematic issues (unknown unknowns, as it were).
Before and at lunch, CampHogg discussed the chapters and title for Hou's PhD thesis, which is about probabilistic inference in the exoplanet domain. This subject of discussion was inspired by Hou's extremely rapid write-up of his new MCMC method (which he is calling multi-canonical, but which we now think is probably a misnomer).
After the Spitzer Oversight Committee meeting came to a close, I got lunch with Heather Knutson (Caltech), during which I picked her brain about things exoplanet. She more-or-less agreed with my position that if any eta-Earth-like calculation is going to be precise, it will have to find new, smaller planets, beyond what was found by Petigura and company (in their pay-walled article, and recent press storm). That said, she was skeptical that CampHogg could detect smaller-sized planets than anyone else has.
Knutson described to me a beautiful project in which she is searching the hot Jupiters for evidence of more massive, outer planets, and she says she does find them. That is, she is building up evidence that migration is caused by interactions with heavier bodies. She even finds that more massive hot Jupiters tend to have even more massive long-period siblings. That's pretty convincing.
I spent the day at the Spitzer Science Center participating in a review of preparations for Spitzer's proposal to the NASA Senior Review, which is empowered to continue or terminate the ongoing missions. I also wrote text for the NSF proposal being submitted by Geha, Johnston, and me.
Today I turned down an invitation to the White House. That might not be research, but it sure is a first for me! I turned it down to hang out more with Vanderplas (UW). I hope he appreciates that! At the White House Office of Science and Technology Policy (okay, perhaps this is just on the White House grounds), there was an announcement today of the Moore-Sloan Data Science Environment at NYU, UW, and Berkeley. This is the project I was working on all summer; it has come to fruition, and we start hiring this Spring. Look for our job ads, which will be for fellowship postdocs, software engineering and programming positions, quantitative evaluation (statistics) positions, and even tenure-track faculty positions (the latter coming from NYU, not Moore and Sloan, but related).
At lunch, Vanderplas, Foreman-Mackey, Fadely, and I discussed alternative publication models and how they relate to our research. Foreman-Mackey reasserted his goal of having any exoplanet discoveries we make come out on Twitter before we write them up. Vanderplas is wondering if there could be a scientific literature on blogs that would "play well" with the traditional literature.
Earlier in the morning, Vanderplas gave us some good feedback on our data-driven model of the Kepler focal plane. He had lots to say about these "uninterpretable" models. How do you use them as if they provide just a calibration, when what they really do is fit out all the signals without prejudice (or perhaps with extreme prejudice)? Interestingly, the Kepler community is already struggling with this, whether they know it or not: The Kepler PDC photometry is based on the residuals away from a data-driven model fit to the data.
Jake Vanderplas (UW), internet-famous computational data-driven astrophysicist, showed up at NYU for a couple of days today. He showed us some absolutely great results on objective design of photometric systems for future large imaging surveys (like LSST). His method follows exactly my ideas about how this should be done—it is a scoop, from my perspective—he computes the information delivered by the photometric bandpasses about the quantities of interest from the observed objects, as a function of exposure time. Fadely, Vanderplas, and I discussed what things about the bandpasses and the survey observing strategy he should permit to vary. Ideally, it would be everything, at fixed total mission cost! He has many non-trivial results, not the least of which is that the bandpasses you want depend on the signal-to-noise at which you expect to be working.
In the afternoon, Hou, Goodman, Fadely, Vanderplas, and I had a conversation about Hou's recent work on full marginalization of the likelihood function. In the case of exoplanet radial-velocity data, he has been finding that our simple "multi-canonical" method is faster and more accurate than the much more sophisticated "nested sampling" method he has implemented. We don't fully understand all the differences and trade-offs yet, but since the multi-canonical method is novel for astrophysics, we decided to raise its priority in Hou's paper queue.
In a day of proposal and letter writing, Fadely came by for a work meeting. We discussed all his projects and publications and priorities. On the HST WFC3 self-calibration project, he is finding that the TinyTim PSF model is not good enough for our purposes: If we use it we will get a very noisy pixel-level flat. So we decided we have to suck it up and build our own model. Then we realized that in any small patch of the detector, we can probably make a pretty good model just empirically from all the stellar sources we see; the entire HST Archive is quite a bit of data. Other decisions include: We will model the pixel-convolved PSF, not the optical PSF alone. There is almost no reason to ever work with anything other than the pixel-convolved PSF; it is easier to infer (smoother) and also easier to use (you just sample it, you don't have to convolve it). We will work on a fairly fine sub-pixel grid to deal with the fact that the detector is badly sampled. We will only do a regularized maximum likelihood or MAP point estimate, using convex optimization. If all that works, this won't set us back too far.
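A one-dimensional cartoon of the plan (my toy numbers, nothing HST-specific): infer an oversampled, pixel-convolved PSF from many stars at random sub-pixel offsets, with a second-difference smoothness penalty; the whole thing stays linear least squares, hence convex.

```python
import numpy as np

rng = np.random.default_rng(6)

# fine sub-pixel grid for the pixel-convolved PSF (4x oversampled)
oversamp, npix = 4, 17
nfine = oversamp * npix
xfine = (np.arange(nfine) - nfine // 2) / oversamp     # pixel units
psf_true = np.exp(-0.5 * (xfine / 1.3) ** 2)
psf_true /= psf_true.sum()

def star_matrix(dx):
    """Rows that sample the fine grid (by linear interpolation) at the
    pixel centers of one star with sub-pixel offset dx."""
    A = np.zeros((npix, nfine))
    for j in range(npix):
        t = (j - npix // 2 + dx) * oversamp + nfine // 2
        i = int(np.floor(t))
        if 0 <= i < nfine - 1:
            A[j, i], A[j, i + 1] = 1 - (t - i), t - i
    return A

# many unit-flux stars at random sub-pixel offsets, noisy pixels
nstars = 200
A = np.vstack([star_matrix(rng.uniform(-0.5, 0.5)) for _ in range(nstars)])
y = A @ psf_true + 0.002 * rng.normal(size=A.shape[0])

# smoothness prior: penalize second differences; still one linear solve
D = np.diff(np.eye(nfine), n=2, axis=0)
lam = 0.1
psf_hat = np.linalg.solve(A.T @ A + lam * D.T @ D, A.T @ y)

print(np.corrcoef(psf_hat, psf_true)[0, 1])   # close to 1
```

The dithering across sub-pixel offsets is what defeats the bad sampling: no single star constrains the fine grid, but the ensemble does.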
Late in the day I zoomed up to Columbia to discuss streams with Bonaca, Johnston, Küpper, and Price-Whelan. We discussed things related to our upcoming NSF proposal. One idea in the proposal is to look at models of the Milky Way gravitational potential that make use of expansions. In these kinds of problems, issues arise regarding what expansion to use, and what order to go to. On the former, choices include expansions that are orthogonal in something you care about, like the potential or density, or expansions that are orthogonal in the context of the data you have. That is, the data constrain the potential incompletely, so an expansion that is orthogonal in the potential basis will not have coefficients that are independently constrained by the data; there will be data-induced covariances in the uncertainties. On the latter (what order), choices include, at one extreme, just making a heuristic or educated guess, and on the other extreme, going fully non-parametric and inferring an infinity of parameters. You can guess what I want to try! But we will probably put more modest goals in the proposal, somewhere in-between. Amusingly, both of these problems (orthogonal expansions for incomplete observations, and choices about expansion order) come up in cosmology and have been well studied there.
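The point about data-induced covariances is easy to see numerically: Legendre coefficients are nearly independent under complete uniform sampling, but gappy sampling (my made-up window placement below) correlates them strongly.

```python
import numpy as np

rng = np.random.default_rng(7)

# Legendre polynomials are orthogonal over [-1, 1] with uniform weight...
x_full = np.linspace(-1, 1, 2001)
B_full = np.polynomial.legendre.legvander(x_full, 5)
gram = B_full.T @ B_full / len(x_full)
# ...so with complete uniform sampling the Gram matrix is ~diagonal
off_full = np.abs(gram - np.diag(np.diag(gram))).max()

# but with incomplete sampling (data only where the tracers are)
x_gappy = np.concatenate([rng.uniform(-0.9, -0.4, 80),
                          rng.uniform(0.2, 0.5, 80)])
B = np.polynomial.legendre.legvander(x_gappy, 5)
# coefficient covariance for unit-noise data: (B^T B)^{-1}, NOT diagonal
cov = np.linalg.inv(B.T @ B)
corr = cov / np.sqrt(np.outer(np.diag(cov), np.diag(cov)))
off_gappy = np.abs(corr - np.eye(6)).max()
print(off_full, off_gappy)   # strong coefficient correlations when gappy
```

This is the 1-D version of the stream problem: orthogonality in the potential basis buys you nothing once the data only sample part of the domain.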
Jeffrey Mei (NYUAD) came by to discuss his project to infer the dust extinction law from SDSS spectra of F stars. We talked about a "centering" issue: We are regressing out g-band brightness (flux; a brightness or distance proxy), H-delta equivalent width (a temperature proxy), and extinction amplitude from the Schlegel, Finkbeiner, & Davis map. The coefficient of the latter will be interpretable in terms of the extinction law (the dust law). Because we have regressed out the observed g-band brightness, we get that the mean effect of extinction is zero in the g-band; that is, our results about the dust extinction are distorted by what we choose, precisely, to regress out. Another way to put it: The brightness is a function of distance, temperature, and extinction. So if you regress that out, you distort your extinction results. The point is obvious, but it took us a while to figure that simple thing out! We have a fix, and Mei is implementing.
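The centering bug in miniature (all coefficients below invented for illustration): if dust dims the observed g band, then regressing out the observed g-band brightness steals part of the extinction signal.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 20000
dist = rng.normal(size=n)      # distance-modulus-like term
temp = rng.normal(size=n)      # temperature proxy
ext = rng.normal(size=n)       # dust column (think SFD amplitude)

# hypothetical coefficients: extinction dims observed g by 0.5 per unit
# dust, and dims the spectral channel of interest by 1.0 per unit dust
g_obs = dist + temp - 0.5 * ext
y = dist + temp - 1.0 * ext + 0.01 * rng.normal(size=n)

# regress y on [g_obs, temp, ext]: the dust coefficient comes out as
# -1.0 - (-0.5) = -0.5, distorted by the extinction hiding inside g_obs
A = np.column_stack([g_obs, temp, ext, np.ones(n)])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef[2])   # ~ -0.5, not the true -1.0

# regress on an extinction-free brightness instead and the truth returns
A2 = np.column_stack([dist + temp, temp, ext, np.ones(n)])
coef2, *_ = np.linalg.lstsq(A2, y, rcond=None)
print(coef2[2])  # ~ -1.0
```

That is the whole centering issue in four arrays: the recovered extinction law is only the law relative to whatever extinction you silently regressed out with the brightness.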
MJ Vakili delivered to me a draft manuscript on his prior over galaxy images. The introduction notes that the few previous efforts along these lines have aimed only to reduce the dimensionality of the space in which galaxy images are modeled or represented. This is a baby step, of course, towards a prior on images, but only a baby step, because principal component coefficients don't, in themselves, have a probabilistic interpretation or result in a generative model.
On the board in my office, Vakili explained how he would use the prior over images to make the best possible measurement of weak gravitational-lensing shear; it involves marginalizing out the unsheared galaxy image, which requires the prior of which we speak. The cool thing is that this solves—in principle—one of the ideas Marshall and I hatched at KIPAC@10, which was to use the detailed morphological features in galaxies that go beyond just overall ellipticity to measure the shear field. Now that's in principle; will it work in practice? Vakili is going to look at the GREAT3 data.
Dennis Zaritsky (Arizona, on sabbatical at NYU) gave the astro seminar today, about measuring the IMF for the purposes of understanding the mass-to-light ratios of stellar populations. He is using bound stellar clusters, measuring kinematic masses and visible and infrared photometry. He finds that there seem to be two different IMFs for stellar clusters, one for the oldest clusters and another for those less than 10 Gyr in age. But this difference also more-or-less maps onto metallicity (the older clusters are more metal poor) and onto environment (the younger clusters are Magellanic-Cloud disk clusters, the older clusters are Milky-Way bulge and halo clusters). So it is hard to understand the causal relationships in play. Zaritsky is confident that near-future observations will settle the questions.
At lunch, Fadely proposed that we enter the Strong-Lens Time Delay Challenge. We imagined an entry built on multi-band Gaussian Processes (like those worked out by Hernitschek, Mykytyn, Patel, Rix, and me this summer). Time to do some math.
Fergus, Neil Zimmerman (MPIA), and I chatted on the phone a bit today. Zimmerman wants to write a proposal to build a forward model of a coronagraphic spectrograph (think P1640 or GPI); he has the intuition (which I share) that if you built such a model, you could calibrate the data far better. Right now calibration is performed before and after observing; the science frames are expected to be in agreement with the calibration meta-data or some interpolation of it, and the data are "corrected" or transformed from the instrument coordinates (two-dimensional pixels) to some calibrated object (three-dimensional imaging spectroscopic boxels). But since flexure and temperature changes can be non-trivial, and since the science frames contain so many photons, it would be better to learn the calibration of the spectrograph from the appropriate combination of the calibration and science data, and it would be better to perform the comparison between models and data at the pixel level. That's a theme of this blog, of course. We discussed what kinds of small, toy, demonstration systems could show this, convincingly enough for a proposal, relevant to the real thing, but easy to set up and use as a little sandbox.
I spent the day at STScI, giving a talk about hierarchical inference, hosted by Lou Strolger (STScI), and also chatting with various. There is so much going on at STScI and JHU; it was a busy day! One theme of my conversations was calibration (of course); CampHogg and STScI are aligned in wanting to make calibration simultaneously more precise and less time-consuming (as in less consuming of observing time). Another theme was the short life of JWST; as a non-serviceable facility with expendables, it has a finite lifetime. This puts pressure not just on calibration, but also on every possible science program. We have to use this facility efficiently. That's a challenge to the whole community, but especially the many teams at STScI.
Who would have thunk it: I have spent the last 25 years doing astrophysics in some form or another, and now I am preparing to co-write a paper on computing the determinants of matrices. Foreman-Mackey and I met with Mike O'Neil (NYU) and Sivaram Ambikasaran (NYU) (both Applied Math) today about making determinant calculations fast. The crazy thing is that linear algebra packages out there are happy to make matrix inversion fast, but they uniformly discourage, disparage, or express incredulity about the computation of determinants. I understand the issues—determinants have ungodly units and therefore ungodly magnitudes—but we need to compute them if we are going to compute Gaussian probability densities. Our matrices are somewhat sparse, but the key idea behind Ambikasaran's method is that the matrices are smooth (columns are good at predicting other columns), or, equivalently, that the matrices have low-rank sub-matrices inside them. Plus fastness.
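For the dense baseline, the standard trick is to work with the *log*-determinant via a Cholesky factorization, never forming the determinant itself. A minimal numpy sketch (the fast methods under discussion exploit the low-rank sub-matrix structure, which this toy does not; the kernel and numbers here are invented for illustration):

```python
import numpy as np

# Dense illustration of the log-determinant needed in a Gaussian log-likelihood.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 10, 200))

# A smooth (squared-exponential) kernel plus a white-noise jitter term,
# as in a Gaussian Process likelihood.
K = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / 1.5 ** 2)
K[np.diag_indices_from(K)] += 1e-3

L = np.linalg.cholesky(K)                    # K = L L^T
logdet = 2.0 * np.sum(np.log(np.diag(L)))    # log|K| from the diagonal of L

# Working in log space sidesteps the "ungodly magnitudes" problem: the
# determinant itself would overflow or underflow a float.
sign, ref = np.linalg.slogdet(K)
print(np.allclose(logdet, ref))
```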
MJ Vakili (NYU) showed me today what he has been working on to generate a data-driven prior probability distribution over galaxies. It is great work. He finds that he can do a shapelet decomposition, truncate it, and then do a dimensionality reduction (again, as it were), and then fit the resulting distribution of components with a mixture of Gaussians. We have yet to show that the model is good, but when he samples from it, the samples look like actual galaxies. The point is this: If you want to measure a shear map (or anything else, for that matter) from galaxy images, you can't do proper inference if you don't have a prior over galaxy images. So we are playing around with the possibility of making one.
In the brown-bag today, Craig Lage (NYU) showed detailed simulations he is doing of the "Bullet Cluster". He is doing by-hand likelihood optimization, with an hours-long simulation inside the loop! But the results are gorgeous: He can reproduce all the large-scale features, and a lot of the small-scale details. He says it isn't a challenge to CDM, but it is a challenge to theories in which there is no dark matter. One of his goals is to test dark-matter interactions; it looks very promising for that.
On the airplane home from "The AD", I wrote in our paper about a data-driven model for the Kepler focal plane. I wrote about the following issue: This model is a data-driven, flexible model of the pixels telemetered down from the spacecraft. As such, the model doesn't contain anything within it that could be interpreted as the "flat-field" or as the "point-spread function" or a "source", let alone a source "brightness". But it is a good model! The question is: How to extract photometry? We have a plan, but it is debatable. The fundamental issue is that data-driven models are, almost by definition, uninterpretable (or at least not straightforwardly interpretable). Insane.
I spoke to the NYUAD Physics Department and related parties in a research seminar about inference and data-driven models, and then in the early evening I gave a public talk for the NYU Abu Dhabi Institute. In the latter forum I spoke about Dark Matter: What we know about it, how we know it, and what might come next. I got great questions and a lively multi-hour discussion with audience members (from a remarkable range of backgrounds, I might add) followed my talk.
I put lots and lots of (proverbial) red ink onto two papers. One is Hou's paper on diffusive nested sampling (with the stretch move) to compute fully marginalized likelihoods and inform decision-making about exoplanets and follow-up. The method is principled and accurate (but very slow). Hou has implemented, and clearly explained, a very complicated and valuable piece of software.
The other is Lang's paper on building new, better, deeper, and higher-resolution co-adds (combined imaging) from the WISE Satellite data. He included in the paper some of our philosophy about what images are and how they should be modeled and interpreted, which pleased me greatly. He is also delivering a data set of enormous value. Got infrared needs?
Jasper Hasenkamp (NYU) gave the brown-bag, about fixing anomalies between large-scale-structure cosmology results and cosmic-microwave-background cosmology results using mixed dark matter—the standard CDM model plus a small admixture of a (possibly partially thermalized) neutrino-like species. The model seems to work well and will make new predictions, including (in principle) for accelerator experiments. Mark Wyman (NYU) has also worked on similar things.
At the "No More Tears" phone-con (about Kepler planet-searching), we talked about wavelets with Bekki Dawson (Berkeley) and other exoSAMSI participants. In our MCMC meeting, we worked on finishing Hou's nearly finished paper on nested sampling, and we quizzed Goodman about mixing the stretch move (the underlying engine of emcee) with Metropolis-Hastings to capitalize on the observation that in most likelihood functions there are "fast" and "slow" parameters, where the "fast" parameters can be changed and the likelihood call re-made quickly, while the "slow" parameters require some large, expensive re-calculation. This is generic, and we came up with some generic solutions. Some of them are even permitted mathematically. Foreman-Mackey is thinking about these in the context of running n-body simulations within the inference loop.
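A minimal sketch of the fast/slow idea (my own toy, not Goodman's scheme and not emcee's API): cache the expensive slow-parameter computation and re-use it across Metropolis updates that touch only the fast parameter.

```python
import numpy as np

rng = np.random.default_rng(1)
slow_evals = 0

def expensive_model(theta_slow):
    # Stand-in for the "slow" calculation (think: an n-body run); we count calls.
    global slow_evals
    slow_evals += 1
    return theta_slow ** 2

def log_like(theta_slow, theta_fast, cache=None):
    # The "fast" parameter enters cheaply once the slow part is in hand.
    slow_part = expensive_model(theta_slow) if cache is None else cache
    return -0.5 * (slow_part + theta_fast ** 2), slow_part

# Metropolis with unequal update rates: fast-parameter moves re-use the
# cached slow computation; only slow-parameter moves trigger a recompute.
theta_s, theta_f = 1.0, 1.0
lp, cache = log_like(theta_s, theta_f)
n_steps, n_fast_per_slow = 2000, 9
for i in range(n_steps):
    move_slow = (i % (n_fast_per_slow + 1) == 0)
    prop_s = theta_s + 0.5 * rng.normal() if move_slow else theta_s
    prop_f = theta_f + 0.5 * rng.normal()
    lp_new, cache_new = log_like(prop_s, prop_f,
                                 cache=None if move_slow else cache)
    if np.log(rng.uniform()) < lp_new - lp:
        theta_s, theta_f, lp, cache = prop_s, prop_f, lp_new, cache_new

print(slow_evals, "slow evaluations for", n_steps, "steps")
```

With nine fast moves per slow move, the expensive model runs roughly a tenth as often as the likelihood is called.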
In other news, Lang delivered a draft paper about his work on the WISE imaging, and Fadely had some ideas about finding nails for our factor-analysis hammer.
Bonaca (Yale), Geha (Yale), Johnston (Columbia), Kuepper (Columbia), and Price-Whelan all came to NYU to visit CampHogg to discuss stream-fitting. We (like everyone on Earth, apparently) want to use streams to constrain the potential and accretion history of the Milky Way. Kuepper and Bonaca are working on simulation methods to make fake streams (quickly) and compare them to data. Price-Whelan is working out a fully probabilistic approach to generating stream data with every star carrying a latent variable which is the time at which it was released from the progenitor (this is my "Bread and Butter" project started at the end of this summer after sessions with Binney, Bovy, Rix, Sanders, and Sanderson). We have hopes of getting good inferences about the Milky Way potential (or acceleration field or mass density) and its evolution with time.
Price-Whelan and Foreman-Mackey spent some time coming up with very clever Gibbs-like strategies for sampling the per-star latent parameters (release time and orbit for each star) in an inner loop with an outer loop sampling the potential and progenitor parameters that are shared by all stars. In the end, we decided to de-scope and write a paper with brute-force sampling and a small data set. Even at small scope, such a paper (and software) would be state-of-the-art, because what we are doing treats properly missing data and finite observational uncertainties, which will be a first (unless Bovy or Sanders has scooped us?).
At lunch, I asked the team to say what a stream really constrains: Is it the potential, or the acceleration field, or the density? Clarity on this could be useful for guiding methods, expansions, parameterizations, and so on. In the afternoon, Geha, Johnston, and I also talked about joint funding opportunities and outlined a proposal.
I had a short conversation today with NYUAD (Abu Dhabi) undergraduate Jeffrey Mei about some work he has been doing with me to infer the extinction law from the standard stars in the SDSS spectroscopy. This is one of my ideas and seems to be working extremely well. He has built a generative model for the spectroscopy and it produces results that look plausibly like a dust attenuation law with some tantalizing features that could be interstellar bands (but probably aren't). I outlined a possible paper which he is going to start writing.
Late in the day, Paul Chaikin (NYU) gave a talk about the astounding experiments he has been doing with people in physics, chemistry, and biology to make artificial systems that act like life. He has systems that show motility, metabolism, self-reproducibility, and evolution, although he doesn't have it all in one system (yet). The systems make beautiful and very clever use of DNA, specific binding, enzymes, and techniques for preventing non-specific or wrong binding. Absolutely incredible results, and they are close to making extremely life-like nano-scale or microscopic systems.
In our semi-weekly arXiv coffee, Fed Bianco (NYU) showed us some papers about Pluto, including an occultation study (Pluto occults a background star) and the implications for Pluto's atmosphere. But then we got onto occultations and she showed us some amazing Kuiper-Belt-object occultation data she has from fast cameras on Hawaii. The coolest thing (to me) is that the occulters are so tiny, the occultations look different from different observatories, even Haleakala to Mauna Kea! Tycho Brahe would have loved that: The effect could have been used to prove (pretty much) the heliocentric model.
I spent a good chunk of the afternoon at the brand-new Simons Center for Data Analysis with applied mathematicians Leslie Greengard (Simons, NYU) and Mike O'Neil (NYU), talking about big matrices and inverting them and getting their determinants. Their codes are super-good at inverting (or, equivalently, providing operators that multiply by the inverse), even the ten-million by ten-million matrices I am going to need to invert, but not necessarily at computing determinants. We discussed and then left it as a homework problem. The context was cosmology, but this problem comes up everywhere that Gaussian Processes are being used.
[This is my 2^11th research blog post. That's a lot of posts over the last nearly-9 years! I'll be an old man when I post my 2^12th.]
At the brown-bag talk today, Gruzinov (NYU) talked about modeling pulsars using what he calls "Aristotelian Electrodynamics", which is an approximation valid when synchrotron radiation losses are so fast that charged particles essentially move along magnetic field lines. He claims to be able to compute realistic predictions of pulsar light-curves in the Fermi bandpass, which, if true, is a first, I think. He argued that all pulsars should live in a four-dimensional family, parameterized by two angles (viewing and dipole-misalignment), one spin period, and one magnetic dipole moment. If it all bears out, pulsars might be the new standard candles in astronomy!
In the afternoon, Foreman-Mackey and I went on the BayCEP phonecon of the exoSAMSI group, where we discussed hierarchical inference and approximations thereto. There are various projects close to doing a proper hierarchical probabilistic inference of the distribution of planets in various parameters. Eric Ford (PSU) is even implementing some of the ideas in this old paper.
At breakfast, I went through with Barclay and Quintana (Ames) my list of all the effects that lead to variability in Kepler light-curves. These include intrinsic stellar variability, stellar variability from other stars that overlap the target, stellar variability transferred to the target by electronics issues. They include stellar proper motion, parallax, and aberration. They include variations in spacecraft temperature, pointing, and roll angle. And so on. The list is long! I am trying to make sure we understand what our pixel-level model covers and what it doesn't. I am spending a lot of my writing time on our data-driven pixel-level model getting the assumptions, capabilities, and limitations clearly specified.
While Foreman-Mackey and Barclay set off on a tangent to measure (or limit) exoplanet masses using n-body models of exoplanet systems observed by Kepler, I had a great phone call with Schaefer (CMU), Cisewski (CMU), Weller (CMU), and Lang about using Approximate Bayesian Computation (ABC) to ask questions about the universality of the high-mass initial mass function (IMF) in stellar clusters observed in the PHAT survey. The idea behind ABC is to do a kind of rejection sampling from the prior to make an approximation to posterior sampling in problems where it is possible to generate data sets from the model (and priors) but impractical or impossible to write down a likelihood function.
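A minimal ABC rejection sketch, assuming a toy problem in which we pretend the likelihood is unavailable (the summary statistic, tolerance, and prior here are all invented for illustration, not the PHAT/IMF setup):

```python
import numpy as np

rng = np.random.default_rng(3)

# "Observed" data from a process whose likelihood we pretend not to know.
true_mu = 2.0
data = rng.normal(true_mu, 1.0, 500)
obs_summary = np.mean(data)

def simulate(mu):
    # The mechanistic, forward data-generating process: easy to run,
    # no explicit likelihood required anywhere.
    return rng.normal(mu, 1.0, 500)

# ABC rejection: draw from the prior, simulate a data set, keep the draw
# if the simulated summary statistic lands close to the observed one.
prior_draws = rng.uniform(-5.0, 5.0, 20000)
eps = 0.05
kept = [mu for mu in prior_draws
        if abs(np.mean(simulate(mu)) - obs_summary) < eps]

posterior = np.array(kept)
print(len(posterior), "approximate posterior samples; mean =", posterior.mean())
```

The accepted draws approximate the posterior under the chosen summary statistic and tolerance; shrinking eps sharpens the approximation at the cost of more rejections.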
The reason we got this conversation started is that way back when we were writing Weisz et al. on IMF inference, we realized that some of the ideas about how high-mass stars might form in molecular clouds (and thereby affect the formation of other less-massive stars) could be written down as a data-generating process but not as a computable likelihood function. That is, we had a perfect example for ABC. We didn't do anything about it from there, but maybe a project will start up on this. I think there might be quite a few places in astrophysics where we can generate data with a mechanistic model (a simulation or a semi-analytic model) but we don't have an explicit likelihood anywhere.
At the end of the day, Sarah Ballard (UW) gave a great Physics Colloquium on habitable exoplanets and asteroseismology, and how these two fields are related. They are related because you only know the properties of the exoplanet as well as you can understand the properties of the star, and asteroseismology rocks the latter. She mentioned anthropics momentarily, which reminded me that we should be thinking about this: The anthropic argument in exoplanet research is easier to formulate and think about than it is in cosmology, but figuring it out on the easier problem might help with the harder one.
In the morning, to steady our thoughts, Foreman-Mackey, Barclay, and I wrote on the blackboard our model for the Kepler pixel-level data. This is the data-driven model we wrote up in the white paper. The idea is to fit pixels with other pixels, but taking as "other" pixels only those that are far enough away that they can't be being affected by the same star. These other pixels will share in spacecraft issues (like temperature and pointing issues) but not share in stellar variability or exoplanet transit effects, because different stars are independently variable. A key idea of our model, which Foreman-Mackey mocked up for our white paper, is that we avoid over-fitting the pixels by using a train-and-test framework, in which we "learn" the fit coefficients using data not near (in time) to the pixel value we are trying to predict (or de-trend).
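The train-and-test scheme can be sketched in a toy numpy version (the pixel model, chunk size, and injected signal here are all invented for illustration; the real model is the one in the white paper): predictor pixels share the spacecraft systematic but not the target star's signal, and the coefficients for each time chunk are fit only on data outside that chunk.

```python
import numpy as np

rng = np.random.default_rng(7)
T = 1000
t = np.arange(T)

# Shared "spacecraft" systematic, and predictor pixels from other stars that
# see the same systematic (scaled) plus independent noise.
systematic = np.sin(2 * np.pi * t / 300.0) + 0.001 * t
predictors = (rng.uniform(0.5, 2.0, (8, 1)) * systematic
              + 0.01 * rng.normal(size=(8, T)))

# Target pixel: same systematic plus a transit-like dip we hope to preserve.
transit = np.where((t > 480) & (t < 500), -0.1, 0.0)
target = 1.3 * systematic + transit + 0.01 * rng.normal(size=T)

# Train-and-test: to predict the target in a chunk, fit the linear
# coefficients using only data *outside* that chunk, so the fit cannot
# "learn" (and remove) the transit itself.
chunk = 100
prediction = np.empty(T)
X = predictors.T
for start in range(0, T, chunk):
    test = slice(start, start + chunk)
    train = np.ones(T, dtype=bool)
    train[test] = False
    coef, *_ = np.linalg.lstsq(X[train], target[train], rcond=None)
    prediction[test] = X[test] @ coef

residual = target - prediction
in_transit = (t > 480) & (t < 500)
print("residual in transit:", residual[in_transit].mean())
print("residual elsewhere:", residual[~in_transit].mean())
```

The systematic is removed everywhere, but the dip survives in the residual because the coefficients that predict the transit chunk never saw it.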
In the evening, I started writing up this model and our results. We are all ready to write paper zero on this.
In the morning, Barclay and I decided to divide and conquer: He would write up the scope for a limb-darkening paper, and I would write up a plan for a re-calibration of the Kepler data (using Foreman-Mackey's regression model built for the whitepaper). We both then failed to complete our tasks (there's always tomorrow!). In the afternoon, I discussed large-scale structure measurements with Walsh and Tinker, who are looking at extensions to the halo-occupation model. One extension is to reconsider dependences for halo occupation on other halo parameters (other than mass). Another is to look at more large-scale-structure observables.
Tom Barclay (Ames) found himself in New York this week and we discussed our dormant project to measure stellar limb darkening using exoplanet transits. I love this project, because it is a way to image a star without high angular resolution imaging! We discussed applications for such measurements and (for the millionth time) the scope of our possible first paper. We also discussed many other things, including "what's next for Kepler?", and MCMC troubles, and searches for long-period planets. We also had a telecon about search with Bekki Dawson (Berkeley) and other exoSAMSI-ites. Late in the day, Hou reminded me that I owe him comments on his paper on nested sampling!
In an almost zero-research day, Wendy Freedman (OCIW) gave a great and inspiring talk about measuring the Hubble Constant with local measurements of Cepheid stars and supernovae. She demonstrated the great value of moving to infrared observations and argued convincingly that systematic uncertainties in the measurements are now down around the few-percent level. Of course the interesting thing is that these local measurements consistently get Hubble Constants a tiny bit higher (Universe a tiny bit smaller or younger) than the cosmic-microwave-background-based inferences. Freedman argued strongly that this tension should be pursued and tested, because (a) it provides a fundamental, independent test of the cosmological model, and (b) it could conceivably point to new physics. I agree.
The hypothesis-combination trick mentioned yesterday did indeed work, speeding up Foreman-Mackey's exoplanet search code by a factor of about 30 and keeping the same result, which is that we can detect Earth-like exoplanets on year-ish orbits around at least some Sun-like stars in the Kepler data. Now are there any? This speed-up, combined with the four orders of magnitude Foreman-Mackey got this weekend, makes for a total speed up of 10^5.5, all in one crazy week. And then at lunch, Mike O'Neil (NYU) told us that given the form of our Gaussian Process kernel, he could probably get us another substantial speed-up using something called a "fast Gaussian transform". If this turns out to be true (we have to do some math to check), our exoplanet search could get down to less than a minute per star (which would be dandy).
In other overwhelming-force news, Fadely delivered psf-model fits to thousands of stars in HST WFC3 IR-channel data, showing that we can do a very good job of modeling the pixel-level data in preparation for flat-field determination. And Hou delivered a complete manuscript draft on his diffusive nested ensemble sampler. So it was a great day of hard work by my team paying off handsomely.
I worked on large-galaxy photometry with Patel for part of the day. She is dealing with all our problem cases; we have good photometry for almost any large, isolated galaxy, but something of a mess for some of the cases of overlapping or merging galaxies. Not surprising, but challenging. I am also working on how to present the results: What we find is that with simple, (fairly) rigid galaxy models we get excellent photometry. How to explain that, when the models are too rigid to be "good fits" to the data? It has to do with the fact that you don't have to have a good model to make a good photometric measurement, and the fact that simple models are "interpretable".
In the afternoon, we had a breakthrough in which we realized that Foreman-Mackey's exoplanet search (which he sped up by a factor of 10^4 on the weekend with sparse linear algebra and code tricks) can be sped up by another large factor by separating it into single-transit hypothesis tests and then hypothesis tests that link the single transits into periodic sets of transits. He may try to implement that tomorrow.
I spent some time with Kilian Walsh (NYU) discussing halo occupation (putting galaxies into dark-matter halos). With Jeremy Tinker (NYU) he is looking at whether enriching the halo parameters (currently only mass and concentration) with environment information will improve halo-occupation models. As per usual, I asked for plots that show how well they are doing without the environment.
In the afternoon, Foreman-Mackey delivered results of a (very limited) search for Earth-like exoplanets in the light-curve of a bright Kepler G dwarf star, using his Gaussian-process likelihood function. In the limited search, he was able to re-find an injected Earth-like transit with a 300-day period. That's extremely promising. It is not clear whether things will get a lot worse when we do a fuller search or go to fainter stars.
I spent the research part of my day at the Tri-State Astronomy meeting, beautifully organized by Geha (Yale) and Maller (CUNY) and others. I won't do it justice; there were five great review talks and lots of great posters (and one-minute poster summaries). Alyson Brooks (Rutgers), in her review talk, beautifully summarized the issues with CDM at small scales, and discussed the idea that baryonic processes might explain all of these. My take on her bottom line (that baryons might explain everything) is that it would be a disappointment if we can't see the predicted properties of the dark matter at small scales! But she did point towards isolated dwarf galaxies (that is, very low luminosity galaxies that are not satellites of any more luminous galaxy) as critical tests of the ideas; isolated dwarfs should differ from satellite dwarfs if CDM is going to stay okay at small scales. She also implicitly made many predictions about outflows from small galaxies at intermediate redshifts. And she gave a few shout-outs to Zolotov, who has been deeply involved in these baryon issues since leaving NYU.
Vy Tran (TAMU) showed up and we discussed inference of spectral energy distributions of high-redshift galaxies given only photometry (no spectroscopy). She showed us some nice results, but Fadely and I think we could actually infer spectral properties at wavelength resolution higher than that implied by the broad-band (or medium-band) photometry. So we more-or-less launched a collaboration.
On the way to lunch, Fadely, Foreman-Mackey, and I had an epiphany: Foreman-Mackey has been trying to set the hyperparameters of a Gaussian Process to optimize our ability to search for and find exoplanets. This has been befuddling because it is so frequentist (or not even frequentist you might say). The issue is that what you want to optimize (hyperparameter-wise) depends on the details of how you are going to decide what is an exoplanet (that is, on your threshold choices). We realized on the way to lunch that we probably should be choosing the hyperparameters to maximize the area under the ROC curve. Foreman-Mackey launched on this in the afternoon. Cool if it works; novel definitely.
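A minimal sketch of hyperparameter selection by ROC area, assuming a toy matched-filter detection problem (the rank-based AUC formula is standard; the filter widths, injected signals, and everything else here are invented for illustration and are not the Gaussian-Process search itself):

```python
import numpy as np

rng = np.random.default_rng(11)

def roc_auc(scores, labels):
    # Rank-based (Mann-Whitney) AUC: the probability that a randomly chosen
    # positive outscores a randomly chosen negative.
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels.astype(bool)
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Toy detection problem: boxcar signals of width w_true in white noise,
# scored by the peak of a unit-norm boxcar filter of trial width w.
w_true, n = 10, 400
labels = rng.uniform(size=n) < 0.5
data = rng.normal(size=(n, 200))
for i in np.flatnonzero(labels):
    data[i, 50:50 + w_true] += 1.0

def score(data, w):
    kernel = np.ones(w) / np.sqrt(w)
    return np.array([np.convolve(row, kernel, mode="valid").max()
                     for row in data])

# Choose the "hyperparameter" (filter width) that maximizes the ROC area,
# independent of any particular detection threshold.
widths = [2, 5, 10, 20, 40]
aucs = [roc_auc(score(data, w), labels) for w in widths]
best = widths[int(np.argmax(aucs))]
print(dict(zip(widths, np.round(aucs, 3))), "best:", best)
```

The appeal is that the AUC integrates over all possible thresholds, so the hyperparameter choice doesn't commit you to one in advance.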
In a last-minute clutch play on a low-research day, Foreman-Mackey showed me that our ability to detect exoplanet transits in the lightcurves of stochastically varying stars depends in a non-trivial way on the hyperparameters of the Gaussian Process we use to model the unocculted star. We are hoping we can tune to the most-sensitive-for-search hyperparameters before searching, and only switch back to being agnostic about the hyperparameters at the post-search characterization stage.
Having written some stuff that I wasn't happy with a few months ago, thought about it, forgot about it, remembered it, mentioned it on the blog here and there, and then dusted it off today, I got close to being ready to make an argument about when you should and shouldn't compute the marginalized likelihood, even if you are a committed probabilistic reasoner. The fundamental idea is that you shouldn't do model selection based on marginalized likelihoods; these are very challenging integrals but only approximations (and often bad ones) to the integrals you would really want to do to inform a model choice. I guess one way to put it is: Don't spend a huge amount of computer time computing something that is a worse approximation to what you want than something else that might be much easier to compute! I sent my argument for review to Brewer, my guru in all things Bayes. I want to say things that are useful and uncontroversial, so I need to be careful, because on these matters I tend to be like a bull in a china shop.
Late in the day I talked to a NYU Data Science class about possible research projects they could do with the Kepler data. As I was talking about how we search for exoplanets in the data (and how likely it is that the data contain many undiscovered planets), one of the faculty in charge of the class (Mike O'Neil) asked me how many exoplanets we (meaning CampHogg) have found in the data so far. I had to admit that the answer is zero. That's just embarrassing. I need to ride the team hard tomorrow.
I gave the CCPP Brown Bag talk today, about Kepler data and exoplanets. I was going to talk about calibration, flat-field, and our white paper, but I ended up talking about very flexible models, for intrinsic stellar variability and spacecraft-induced variability in the lightcurves. People were shocked that some of our models have hundreds of thousands of parameters. I didn't let them in on the secret that, in some sense, the Gaussian Processes we use have infinite numbers of parameters!
Tim Morton (Caltech) dropped by to talk about various things exoplanet. He has a very nice system for computing and propagating probabilities for various exoplanet and non-exoplanet (false-positive) scenarios, given Kepler data. He produces most of what you need in order to do efficient follow-up and population studies, given uncertain or noisy exoplanet identifications. In other news, today was Angus's last day here at NYU. She has been visiting for a month from Oxford, and started some projects with us on search, using Gaussian Processes under the hood. They told us it couldn't be done (too slow) but we are doing it.
I talked to Lucas Macri (TAMU) and Lifan Wang (TAMU) about optical observations being made in Antarctica at Dome A. This is a Chinese site that is horrifying to get to and use, but which has amazing observational properties, like excellent seeing, transparency, and sky spectrum. They have taken lots of valuable data with amazing 24/7 time coverage and short cadence. They even have some potential exoplanet discoveries. Wang has lots of ideas for next-generation projects, including some with a hefty component of high-grade robotics.
In the morning, Darren DePoy (TAMU) and Jennifer Marshall (TAMU) showed me the hardware being built for the HETDEX experiment. This involves 150 identical (or near-identical) fiber-fed spectrographs, plus a robot positioner. We spent a lot of time talking about mechanical engineering, because if you want to mass assemble a whole lot of instrumentation, you need it to be simultaneously easy to adjust mechanically, and unnecessary to adjust. This may be the future of astrophysics.
I spent today at Texas A&M University, where I spoke to the Physics Department. I took a risk and talked entirely about modeling astrophysics data, including even inferring the Kepler flat-field! Nick Suntzeff (TAMU) introduced me with a discussion of astro-statistics and its importance in the future of astrophysics, which made me feel a bit better about choosing such a technical topic. I particularly emphasized that making measurements in astrophysics problems—where we can't do controlled experiments—usually requires building a hybrid model that includes both data-driven components (for the parts of the problem that are complicated but we don't particularly need to understand), and causal-physical components (for the parts where we hope to gain some understanding). My examples were XDQSO, Kepler, Comet Holmes and the kitten, and the Solar System force law. On the first example, all I really said is that a whole lot of bad data can be as good as a small amount of good data, when you have a good noise model. On the last point, all I really said was that we have no idea how to scale up for Gaia.
The third and last day of dotastronomy 5 started with reports of the outcome of the Hack Day. Various extremely impressive hacks happened, way too many to mention, but including a very impressive video about planet naming, by Deacon and Angus and others, an automated astronomer-career mapping app by Foreman-Mackey and others, an Xbox-Kinect Doppler-shift app by Lynn that got everyone in the room dancing and spinning more than once, and (near and dear to my heart) improved functionality for the Zoonibot by Barentsen and Simmons and others. That latter hack is an extension of the bot that got started by Beaumont and Price-Whelan (at, I am proud to say, my suggestion) at dotastronomy 4.
Among the talks, one of the highlights for me was Trouille (Adler) talking about the Galaxy Zoo Quench project, in which Zooites are taking the project from soup to nuts, including writing the paper. She spent a time in her talk on the problem of getting the participants to boldly play with the data as professional scientists might. It is a rich and deep piece of public outreach; it takes self-selected people through the full scientific process. Another highlight was Microsoft's Tony Hey talking about open access, open data, open science, libraries, and the fourth paradigm. Very inspiring stuff.
Related to that, there was great unconference action in a session on open or low-page-charge publishing models, led by Lynn (Adler) and Lintott (Oxford), in which Simpson (Oxford; and our fearless dotastronomy leader) got emotional (in all the right ways) about how crazy it is that the professional societies and individual scientists have signed away their right to their own work that they researched, wrote, reviewed, and edited for the literature. Testify!
I ran a short unconference session on combining noisy information coming from Zoo participants (or equivalent) in citizen-science and crowd-sourcing situations. A good discussion of many issues came up, including about the graphical model that represents our assumptions about what is going on in the projects, about active learning and adaptive methods, and about exposing the internal data in real time so that external (third-party) systems can participate in the adaptive decision-making. I also advocated for boosting-like methods, based on the idea that there might be classifiers (people) with non-trivial and covariant residual (error) properties.
It has been a great meeting; Rob Simpson (Oxford) and Gus Muench (Harvard) deserve huge thanks for organizing and running it.
For reasons outside my control, I had to play hooky on the second day of dotastro. That is very sad, because it is the Hack Day, and the whole raison d'être of the meeting, as far as I am concerned. The only (research) compensation for this was a fast meeting with Kathryn Johnston (Columbia) and Robyn Sanderson in which we planned various ways to take over the world. The general idea is that we might be able to build a team, connected by commuter rail, that covers the observational, theoretical, and data analytical sides of measuring the Milky Way potential with cold phase-space structures.
Today was the first day of dotastronomy, the meeting for astronomy and web and outreach and so on, this time in Cambridge, MA. Stand-out talks included those by Stuart Lynn (Adler) on the Zooniverse and Elisabeth Newton (Harvard) about astronomy blogging in general (she mentioned this blog) and Astrobites in particular. Astrobites has been an incredible resource for astronomy, and it is carefully cultivated, edited, and managed. What a project!
In the afternoon we switched to unconference, some of which I skipped to attend a phonecon about Kepler data with the exoSAMSI crew, organized by Bekki Dawson (Harvard), who is effectively our leader. On that call, we discussed what everyone has been doing since exoSAMSI, which is quite a bit. Barclay (Ames) has been working on inferring the limb-darkening laws using transits as measuring tools. Quarles (Texas) has been searching the real-stars-with-injected-planets that we (read: Foreman-Mackey) made back at exoSAMSI, with some success. Foreman-Mackey and Angus have been searching for long-period systems with a fast Gaussian Process inside the search loop. We also spent some time talking about modeling the pixel-level data, since we at CampHogg have become evangelists about this. The SAMSI program, organized mainly by Eric Ford (PSU), has been incredibly productive and is effectively the basis for a lot of my research these days.
In my dotastro talk this morning, I mentioned the point that in "citizen science" you have to model the behavior of your citizens, and then generalized to "scientist science": If you are using data or results over which you have almost no control, you probably have to build a model of the behavior and interests and decision-making of the human actors involved in the data-generating process. In the afternoon, Lintott (Oxford) suggested that we find a simple example of this and write a short paper about it, maybe in an area where it is obviously true that your model of the scientists impacts your conclusions. That's a good idea; suggestions about how to do this from my loyal reader (you know who you are) are welcome.
In a low-research day, I spoke to the new graduate students about research opportunities among the astrophysicists in the Department. I ended up making a pitch for discovery and for origins: We work in a field in which discoveries are daily occurrences. And we have some hope of understanding our origins, of our Universe, of our Galaxy, of our Solar System, and maybe even of our tree of life. I love my job!
The highlight of a low-research day was a visit from Roger Blandford (KIPAC), who gave the Physics Colloquium on particle acceleration, especially as regards ultra high-energy particles. He pointed out that the (cosmic) accelerators are almost the opposite of thermal systems: They put all the energy very efficiently into the most energetic particles, with a steep power-law distribution. He made the argument that the highest energy particles are probably accelerated by shocks in the intergalactic media of the largest galaxy clusters and groups. This model makes predictions, one of which is that the cosmic rays pretty-much must be iron nuclei. In conversations over coffee and dinner we touched on many other subjects, including gravitational lensing and (separately) stellar spectroscopy.
At Computer-vision-meets-astronomy today, Fadely showed us all some example HST WFC3 images, some models of the PSF, and some comparisons between model and observed stars. I had never put two-and-two together, but the PHAT project (on the periphery of which I lie) has taken some absolutely awesome WFC3 images for the purposes of array calibration: The PHAT images (being of M31) are absolutely teeming with stars. Indeed, it is impressive that the PHAT team can photometer them at all. We discussed strategies for flat-field determination given that we have a good but not perfect PSF model and a lot of heterogeneous data.
After that but before lunch, we more-or-less decided that while Foreman-Mackey works on a Kepler light-curve likelihood function paper, Angus (Oxford) should start work on a Kepler light-curve exoplanet search paper, making use of the same machinery. This is a great division of labor (I hope) and might eventually bring us close to the goal of everything we have been doing with Kepler, to wit, finding Earth-like planets on year-ish orbits around Sun-like stars. Pleased.
I took a risk up at Columbia's Pizza Lunch forum by talking about the Kepler flat-field. I also was exceedingly rude and talked through Price-Whelan's spot (he was supposed to follow me). I apologize! Well, you can't say I didn't try to bore the pants off of everyone: I talked about the (novel, and exciting to almost no-one other than me) result, published in our white paper, that it is possible to infer the properties of the flat field at higher than pixel resolution.
That is, the team (meaning, in this case, Lang) made simulated data with drifting stars, a PSF that varies slowly with position (and is well understood), and no prior knowledge about how the stars in the field are drifting. We find (meaning Lang finds) that he can simultaneously figure out the pointing of the satellite and the flat-field, even when the flat-field is both created and fit with models that have multiple sub-pixels per pixel. The reason it works is that as the star moves, it illuminates each pixel differently, and is therefore differently sensitive to the different parts of each pixel. It is not clear yet whether we can do this accurately enough to recover the Kepler sub-pixel flat-field, but damn I want to try. Unfortunately, we need lots of data taken in the two-wheel mode, and (as far as I know) they aren't yet taking any new data. Kepler: Please?
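To make the idea concrete, here is a toy 1-D numpy sketch of the effect (my own illustration, not Lang's actual code): the pointings and PSF are assumed perfectly known here, whereas the real demonstration also infers the pointing simultaneously.

```python
import numpy as np

rng = np.random.default_rng(42)

# 1-D toy detector: 20 pixels, each with 2 sub-pixels of unknown
# sensitivity (the sub-pixel "flat-field" we want to infer).
npix, nsub = 20, 2
true_flat = 1.0 + 0.05 * rng.standard_normal(npix * nsub)
centers = (np.arange(npix * nsub) + 0.5) / nsub  # sub-pixel centers, pixel units

def psf(d, sigma=1.5):
    # Gaussian PSF, assumed perfectly known in this sketch.
    return np.exp(-0.5 * (d / sigma) ** 2)

# A star drifts across the detector over 200 exposures; unlike the real
# problem, the pointings are treated as known.
positions = np.linspace(5.0, 15.0, 200)
A = np.zeros((len(positions) * npix, npix * nsub))
for i, pos in enumerate(positions):
    illum = psf(centers - pos)
    for p in range(npix):
        A[i * npix + p, p * nsub:(p + 1) * nsub] = illum[p * nsub:(p + 1) * nsub]
y = A @ true_flat + 1e-4 * rng.standard_normal(A.shape[0])

# Because the drift illuminates the two halves of each pixel in varying
# proportions from exposure to exposure, the linear system constrains
# sensitivities *below* the pixel scale.
flat_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
```

For the pixels the star actually crosses (roughly pixels 6 through 13 here), the recovered sub-pixel sensitivities match the truth very closely; the edge pixels, which are barely illuminated, stay poorly constrained, as you would expect.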
Discussed MCMC convergence with Jeffrey Gertler (NYU), Bayesian evidence (fully marginalized likelihood) with Hou and Goodman, and data-science projects with Mike O'Neil (NYU). O'Neil is co-teaching a course at NYU for the new Data Science program, where the idea is that Masters students will do research projects on real research topics. Foreman-Mackey and I are happy to provide; we discussed several ideas, most of which involve the Kepler data, which we have on the brain right now. One idea is to find all the single transits and see if you can use them to place limits on (or measure!) the frequency (suitably defined) of Jupiter analogs (suitably defined). That's a great problem to post on my Ideas Blog. Hou is computing the marginalized likelihoods of various qualitatively different explanations of radial velocity data, including stellar oscillation models and multi-planet scenarios. Gertler is preparing to find exoplanets in the Galex photon (time-resolved) data.
The morning session was about dark energy, with Sawicki (Heidelberg) taking the theory side and various observers taking the data side. Highlights for me were the weak lensing talks, with von der Linden (DARK) talking about measuring cluster masses and Bard (KIPAC) talking about cosmic shear. During Bard's talk I came up with three possible short papers about weak lensing methodologies, which Marshall, Bard, Meyers (KIPAC) and various others refined over lunch:
The first paper obviates all the problems with transferring stellar PSFs to galaxies by measuring the PSF using the galaxies. LSST can do this because it takes many images under different PSFs. The second paper uses the configurational separations of features (think peaks) in galaxy images to measure shear, independently of, or in concert with, ellipticity measurements. In principle this might be useful, because point separations depend only on astrometric calibration, not PSF determination. The third is to use image-to-image variations in astrometric distortions to infer image-to-image changes in the PSF. I think these two things have to be related, no? This latter project has probably already been done; it requires a literature search.
Although there were very amusing and useful talks this morning from Bloom (Berkeley), Boutigny (CNRS), Marshall, and Wechsler (KIPAC), the highlight for me was a talk by Stuart Lynn (Adler) about the Zooniverse family of projects. He spent a lot of time talking about the care they take of their users; he not only demonstrated that they are doing great science in their new suite of projects, but also that they are treating their participants very ethically. He also emphasized my main point about the Zoo, which is that the rich communication and interaction on the forums of the site is in many ways what's most interesting about the projects.
In the afternoon, we had the "unconference" session. Marshall and I led a session on weak lensing. We spent the entire afternoon tweaking and re-tweaking and arguing about a single graphical model! It was useful and fun, though maybe a bit less pragmatic than we wanted.
Today was the second day (I missed the first) of the KIPAC@10 meeting at KIPAC at SLAC. There was a whirlwind of talks on compact objects and galaxy evolution, too many to summarize, but some highlights for me were the following:
Steiner (UW) showed neutron-star mass and radius measurements and discussed their implications for the properties of matter at extreme density. He showed some very noisy likelihood functions (ish) in mass–radius space, one per measured neutron star (and there are 8-ish with measurements) and tried to draw a curve through them. I have opinions about how to do that and he seems to be doing it right; each time we tried to discuss this over coffee something interrupted us.
Perna (Colorado) talked about magnetars; I hadn't appreciated how extremely short-lived these stars must be; their lifetimes are measured in kyr, which is not a unit you see every day. Romani (Stanford) made a pitch that Fermi-discovered gamma-ray pulsars are the bee's knees. He didn't show folded light-curves but apparently there are now hundreds where you can see the periodicity in the (sparse) Fermi data. Tomsick (Berkeley) showed some outrageously awesome NuSTAR data, making me want to hear much more about that mission. Its PI is my old friend from graduate school, Fiona Harrison (Caltech), to drop a name.
Cordes (Cornell) talked about pulsar timing and gravitational radiation, a subject on which I have opinions (from a precision-measurement perspective). As is common in that business, he concentrated on the stochastic gravitational-wave background; I would like to hear or think more about coherent source detection. It is usually easier! Along those lines, at one point Blandford (KIPAC) asked Arons (Berkeley) whether physical models of pulsar emission were likely to help in measurements of pulsar timing. Arons didn't commit either way, but I think the answer has to be yes. Indeed, I have suggested previously that modeling the emission almost has to improve the measurements.
Stark (Arizona) showed very nice new data on galaxies at extremely high redshifts. He noted that almost every result at redshifts beyond six depends entirely on photometric redshifts. That's true, but is it a concern? I guess it is because there could be interloping lower-redshift objects (or stars) having a big effect on the conclusions. Kriek (Berkeley) and Lu (KIPAC), in separate talks, showed that it is difficult to explain the evolution of galaxies in sizes and stellar populations with simple models of star formation and merging. Also, Kriek called into question the star-formation-rate estimates people have been using, which is interesting; she finds that the estimates could be off at the factor-of-two-ish level, which is the same order of magnitude as the amplitude of the variation in specific star-formation rate with galaxy mass. She didn't claim that there is an error there.
In the discussions at lunch, Stuart Lynn (Adler) pitched an idea from David Harris (NAS) that we start a journal of short contributions. Marshall was all over that; it might get launched tomorrow in the unconference session.
First thing in the morning, I tweaked up our Kepler two-wheel white paper and submitted it. I am very proud, because it contains at least three novel results in astronomical image modeling and calibration. I very much hope I can keep momentum and get my team to publish these results. If you want to read the white paper (see if you can spot the three novel results), it is here.
At a very leisurely lunch, Foreman-Mackey, Fadely, Ruth Angus (Oxford), and I discussed possible projects for this month, in which Angus is visiting CampHogg. We more-or-less settled on long-period exoplanet search, with heavy use of Gaussian Processes.
I spent the whole long holiday weekend (it is Labor Day here in the US) working on our Kepler white paper, in response to the two-wheel call (PDF). Foreman-Mackey came through with some auto-regressive-like data-driven models for extant Kepler data, Michael Hirsch (UCL, MPI-IS) came through with some PSF-photometry tests for insane (drifted) PSFs, and Lang came through with simulations of toy Kepler data in the two-wheel era, along with models thereof. Lang was able to show that we can (in principle) infer a flat-field that is higher resolution than the data! That is, we can infer heterogeneous intra-pixel sensitivity variations, pixel by pixel. That is awesome, and we ought to apply it to every data set out there! We will post the white paper on arXiv after we submit it.
I had a long conversation this morning with Rix and Lisa Kaltenegger (MPIA) about how you would losslessly propagate observational noise in exoplanet observations into probabilistic judgements about habitability. We decided to take a principled (Bayesian, hierarchical) approach in setting up the problem and then to make approximations as necessary to make the calculation possible given current knowledge and technology. We worked through the problem in the form of a probabilistic graphical model, and then discussed how to translate that PGM into code. And then sample it.
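As a caricature of what that PGM-to-code translation might look like, here is a one-planet numpy sketch; all distributions, numbers, and the "habitable" radius cut are invented for illustration, not anything from the actual conversation.

```python
import numpy as np

# Latent variable: true planet radius (Earth radii), on a grid; the prior
# here is an arbitrary lognormal, standing in for a real population model.
r_grid = np.linspace(0.1, 5.0, 2000)
prior = np.exp(-0.5 * (np.log(r_grid) / 0.5) ** 2) / r_grid

def p_habitable(r_obs, sigma_obs=0.3, lo=0.8, hi=1.6):
    # Gaussian likelihood for the noisy radius measurement.
    like = np.exp(-0.5 * ((r_obs - r_grid) / sigma_obs) ** 2)
    post = like * prior
    post /= post.sum()
    # Marginalize the (made-up) habitability indicator over the radius
    # posterior: this is the "probabilistic judgement about habitability".
    habitable = (r_grid > lo) & (r_grid < hi)
    return (post * habitable).sum()
```

The point is that the noise is propagated losslessly: the answer is a probability, not a yes/no, and p_habitable(1.1) comes out large while p_habitable(3.0) comes out tiny. The real problem has many more nodes in the graph (stellar parameters, orbits, atmospheres), but each gets handled the same way.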
In Galaxy Coffee this morning, William Bethune (Univ. J. Fourier) spoke about his project to look at reverberation mapping of a variable AGN at two-ish microns. He and his collaborators can do reverberation mapping on the dusty torus around the AGN, which was novel (to me). They can see temperature changes as well as brightness changes, which opens up new possibilities for physical modeling. The data are noisy and sparse, so the Bayesian methods of Brewer might make for better results.
In the afternoon, I worked on our white paper for the Kepler two-wheel call. All I have so far is an executive summary, no content! Unfortunately it is due in a few days.
At Milky Way group meeting I got all crazy about the fact that there are all these multi-element chemical abundance surveys starting, taking great spectra of giant stars, but there is no evidence (yet) that anyone can actually measure the detailed abundances in any giant stars, even given great data. I exhorted everyone to look at the APOGEE data, which are beautiful and plentiful and ought to be good enough to do this science. Any improvement in their ability to measure chemical abundances will be richly rewarded.
In the afternoon I spoke with Beth Biller (MPIA) and Ian Crossfield (MPIA) about possibly constraining the long-period planet distribution using single-transit objects in the Kepler data (that is, stellar light curves for which a single transit is observed and nothing else). This is an idea from the exoSAMSI meeting a few months ago. We decided that it would probably not be a good idea to do this with a single-transit catalog that doesn't also have a good estimate of completeness and purity and so on. That is, Biller and Crossfield were (rightly) suspicious when I said maybe we could just fit the completeness function hyper-parameters along with the population hyper-parameters! That said, they were both optimistic that this could work. I wrote to Meg Schwamb (Taipei) about her (possibly existing) single-transit catalog from PlanetHunters.
I spent some time discussing with Stephanie Wachter (MPIA) the Euclid calibration strategy and survey requirements. I think that Euclid (and all surveys) should do something like the "strategy C" in the paper on survey strategy by Holmes et al. Indeed, I think that strategy C would not only lead to far better calibration of Euclid than any of their current survey strategy ideas, I think it would make for more uniform solid-angular coverage, better time-domain coverage, and greater robustness to spacecraft faults or issues.
I spent the weekend completely off the grid. I didn't even bring my computer or any device. That was a good idea, it turns out, even for doing research. I got in some thinking (and writing) on various projects: I sharpened up my argument (partially helped by conversations with various MPIA people last week) that you never really want to compute the Bayes evidence (fully marginalized likelihood). If it is a close call between two models, it is very prior-dependent and isn't the right calculation anyway (where's the utility?); if it isn't a close call, then you don't need all that machinery.
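The prior-dependence half of that argument is easy to demonstrate with a one-datum toy problem (my invented example): a single measurement y drawn from N(mu, 1), where model A fixes mu = 0 and model B puts a N(0, tau^2) prior on mu. The evidence for B, and hence the Bayes factor, tracks tau directly, while for broad priors the posterior for mu under B hardly moves at all.

```python
import numpy as np

# Analytic marginal likelihood for model B: y ~ N(0, 1 + tau^2).
def log_evidence_B(y, tau):
    var = 1.0 + tau ** 2
    return -0.5 * (np.log(2 * np.pi * var) + y ** 2 / var)

# Posterior mean for mu under model B (conjugate Gaussian algebra).
def posterior_mean_B(y, tau):
    return y * tau ** 2 / (1.0 + tau ** 2)

y = 1.5
for tau in (3.0, 30.0, 300.0):
    print(tau, log_evidence_B(y, tau), posterior_mean_B(y, tau))
```

Broadening the prior from tau = 3 to tau = 300 shifts the log evidence by several nats, enough to flip any close-call model comparison, while the inference about mu is essentially unchanged. That is the sense in which a close-call evidence calculation is telling you about your prior, not your data.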
I worked out a well-posed form for the question "What fraction of Sun-like stars have Earth-like planets on year-ish orbits?". That question is not well-posed but there are various possible well-posed versions of it, and I think some of them might be answerable with extant Kepler data.
Along the same lines, I wrote up some kind of outline and division of responsibilities for our response to the Kepler call for white papers related to repurpose in the two-wheel era. I decided that our main point is about image modeling, even though we have many thoughts and many valuable things to say about target selection, field selection, cadence, and so on. When I get back to civilization I have to email everyone with marching orders to get this done.
Rix and I have a side project to find streams or kinematic substructures in Milky-Way stellar data of varying quality. It works by building a sampling of the possible integrals of motion for each star given the observations, as realistically as possible, and then finding consensus among different stars' samplings. I worked on scoping that project and adjusting its direction. I am hoping to be able to link up stars in big Halo-star surveys into substructures.
In a very full day, I learned about quasar-absorption-line-based mapping of the density field in large volumes of the Universe from K. G. Lee (MPIA), I discussed non-parametric methods for inferring the three-dimensional dust map in the Milky Way from individual-star measurements with Richard Hanson (MPIA), I was impressed by work by Beth Biller (MPIA) that constrains the exoplanet population by using the fact (datum?) that there are zero detections in a large direct-detection experiment, and I helped Beta Lusso (MPIA) get her discrete optimization working for maximum-likelihood quasar SED fitting. On the latter, we nailed it (Lusso will submit the paper tomorrow of course) but before nailing it we had to do a lot of work choosing the set of models (discrete points) over which fitting occurred. This reminds me of two of my soap-box issues: (a) Construction of a likelihood function is as assumption-laden as any part of model fitting, and (b) we should be deciding which models to include in problems like this using hierarchical methods, not by fitting, judging, and trimming by hand. But I must say that doing the latter does help one develop intuition about the problem! If nothing else, Lusso and I are left with a hell of a lot of intuition.
In a low-research day, Melissa Ness (MPIA) led a discussion in the Milky Way group meeting about the Milky Way Bulge. She showed that it very clearly has an x-shape (peanut-shape or boxy shape) and that the x-shape is more visible in the higher metallicity stars. She also showed evidence from simulations that the higher metallicity stars are more represented in the x-shape because of the phase-space distribution they had when they were excited into the x-shape orbits by the bar; that is, the metallicity distribution in the bulge is set by the excitation mechanism as much as the properties of the star formation. One thing that was interesting to me about this is that the bulge is x-shaped and the orbits are also x-shaped. That means that maybe we could just "read off" the orbits from the three-dimensional distribution of stars. Ish. That's not often true in dynamical systems. Ness's data come from a sparse sampling of the sky, but her results are about big, continuous structures. It would be great to get complete (meaning spatially contiguous) coverage (meaning spectral observations) of giant stars all over the bulge!
Today was Save Kepler Day at Camp Hogg. Through a remarkable set of fortunate events, I had Barclay (Ames), Fergus (NYU), Foreman-Mackey (NYU), Harmeling (MPI-IS), Hirsch (UCL, MPI-IS), Lang (CMU), Montet (Caltech), and Schölkopf (MPI-IS) all working on different questions related to how might we make Kepler more useful in two-wheel mode. We are working towards putting in a white paper to the two-wheel call. The MPI-IS crew got all excited about causal methods, including independent components analysis, autoregressive models, and data-driven discriminative models. By the end of the day, Foreman-Mackey had pretty good evidence that the simplest auto-regressive models are not a good idea. The California crew worked on target selection and repurpose questions. Fergus started to fire up some (gasp) Deep Learning. Lang is driving the Tractor, of course, to generate realistic fake data and ask whether what we said yesterday is right: The loss of pointing precision is a curse (because the system is more variable) but also a blessing (because we get more independent information for system inference).
One thing about which I have been wringing my hands for the last few weeks is the possibility that every pixel is different; not just in sensitivity (duh, that's the flat-field) but in shape or intra-pixel sensitivity map. That idea is scary, because it would mean that instead of having one number per pixel in the flat-field, we would have to have many numbers per pixel. One realization I had today is that there might be a multipole expansion available here: The lowest-order effects might appear as dipole and quadrupole terms; this expansion (if relevant) could make modeling much, much simpler.
The reason all this matters to Kepler is that—when you are working at insane levels of precision (measured in ppm)—these intra-pixel effects could be the difference between success and failure. Very late in the day I asked Foreman-Mackey to think about these things. Not sure he is willing!
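Here is a tiny numpy illustration of the multipole idea (the basis and the test sensitivity map are both invented for illustration): a handful of low-order coefficients can stand in for a full grid of sub-pixel values, provided the intra-pixel map is smooth.

```python
import numpy as np

# Sub-pixel coordinate grid within a single pixel, centered on zero.
n = 8  # sub-pixel samples per side
u = (np.arange(n) + 0.5) / n - 0.5
ux, uy = np.meshgrid(u, u)

# Low-order basis: monopole (the usual flat-field number), dipole terms,
# and quadrupole-like terms, flattened into a design matrix.
basis = np.stack([np.ones_like(ux), ux, uy,
                  ux ** 2 - uy ** 2, ux * uy, ux ** 2 + uy ** 2])
A = basis.reshape(len(basis), -1).T

# A smooth, made-up intra-pixel sensitivity map.
true_map = 1.0 + 0.1 * ux - 0.05 * uy + 0.2 * ux * uy + 0.05 * ux ** 2

# Fit the six multipole coefficients to the 64 sub-pixel values.
coeffs, *_ = np.linalg.lstsq(A, true_map.ravel(), rcond=None)
model = (A @ coeffs).reshape(n, n)
```

For a smooth map like this one, six numbers per pixel reproduce the 64 sub-pixel values essentially exactly, which is the whole hoped-for simplification: a few multipole terms per pixel instead of many sub-pixels per pixel.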
I arrived at the MPI-IS in Tübingen to spend two days talking about image modeling with Schölkopf, Harmeling, and Kuhlmann. A lot of what we are talking about is the possibility of saving Kepler, where our big idea is that we can recover lost precision (from loss of pointing accuracy) by modeling the images, but we also talked about radio interferometry. On the Kepler front, we discussed the past Kepler data, the precision requirements, and the problems we will have in modeling the images. One serious problem for us is that because Kepler got its precision in part by always putting the stars in the exact same places in the CCD every exposure, we don't get the kind of data we want for self-calibration of the detector and the PSF. That's bad. Of course, the precision of the whole system was thereby made very good. In two-wheel mode (the future), the inevitably larger drift of the stars relative to the CCD pixels will be a curse (because the system won't be perfectly stable and stationary) but also a blessing (because we will get the independent information we need to infer the calibration quantities).
On the radio-interferometry front, we discussed priors for image modeling, and also the needs of any possible "customers" for a new radio-interferometry image-construction method. We decided that among the biggest needs are uncertainty propagation and quantification of significance. These needs would be met by propagating noise, either by sampling or by developing approximate covariance-matrix representations. In addition, we need to give investigators ways to explore the sensitivities of results to priors. We came up with some first steps for Kuhlmann.
In the afternoon, I spoke about data analysis in astrophysics to a group of high-school students interested in machine learning.
It was stream day today. We started with Bovy giving an impromptu lecture on a generative model for tidal streams based on actions and angles. It is almost identical to a model that I have been working on with / for Johnston and Price-Whelan, but angle-space makes it possible for Bovy to do one marginalization integral analytically that I have to do numerically. That huge benefit comes at a cost of course; the analytic marginalization requires a particular prior on stream disruption rate, and the action–angle formalism requires integrable potentials. All that said, the marginalization might be valuable computationally. During Bovy's disquisition, Robyn Sanderson pointed out that some of the ideas he was presenting might be in the Helmi & White paper from back in the day.
After this, Sanderson and I worked on our action-space clustering note. Sanderson debugged the use of KL-Divergence as a method to put uncertainties on parameter estimates; I worked on abstract and introduction text. One question I am interested in (and where Sanderson and I disagree) is whether what we are doing will work also for kinematically hot structures (thick disk, very old streams) or only kinematically cold ones. Since the method is based on information theory (or predictive value) I have an intuition that it will work for just about any situation (though obviously it will get less constraining as the structures get hotter).
At MPIA Galaxy Coffee, Bovy talked about his work on pre-reionization cosmology: He has worked out the effect that velocity differences between baryons and dark matter (just after recombination) have on structure formation: On large scales, there are velocity offsets of the order of tens of km/s at z=1000. The offsets are spatially coherent over large scales but they affect most strongly the smallest dark-matter concentrations. Right now this work doesn't have a huge impact on the "substructure problem" but it might as we go to larger samples of even fainter satellite galaxies at larger Galactocentric distances. In question period there was interest in the possible impact on the Lyman-alpha forest. In the rest of the day, Sanderson (Groningen) and I kept working on action space, and Lusso (MPIA) and I continued working on fitting quasar SEDs.
Today Robyn Sanderson (Groningen) arrived at MPIA for three days of sprint on a paper about inferring the Milky Way Halo potential by optimizing the information (negative entropy) in the phase-space distribution function. This project involves the horror of transforming the observables to the actions (something I railed against a few days ago), but the method is promising: It permits identification of potentials given phase-space information without requiring that we identify related stars or specific structures at all. The method only requires that there be structure. And of course we are looking at how it degrades as the observational errors grow.
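A toy version of the method, in a 1-D harmonic potential (everything here is invented for illustration; the real project works with realistic potentials and observational errors): stars belonging to a few "streams" share actions, so at the true potential parameters the inferred action distribution is clumpy (low entropy), while at wrong parameters it smears out.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stars in a 1-D harmonic potential with (unknown) frequency omega_true,
# grouped into three streams that share an action J (with 1% scatter).
omega_true = 2.0
stream_J = np.array([0.5, 1.3, 2.7])
J = np.repeat(stream_J, 300) * (1 + 0.01 * rng.standard_normal(900))
phase = rng.uniform(0, 2 * np.pi, 900)
x = np.sqrt(2 * J / omega_true) * np.cos(phase)
v = -np.sqrt(2 * J * omega_true) * np.sin(phase)

def action_entropy(omega, nbins=40):
    # Actions under a *trial* frequency, from the observed (x, v).
    J_trial = (v ** 2 + omega ** 2 * x ** 2) / (2 * omega)
    # Crude binned differential-entropy estimate of the action distribution.
    p, edges = np.histogram(J_trial, bins=nbins, density=True)
    dx = edges[1] - edges[0]
    p = p[p > 0]
    return -(p * np.log(p)).sum() * dx

# Minimizing the entropy (maximizing the information) recovers omega_true,
# without ever identifying which star belongs to which stream.
omegas = np.linspace(1.0, 3.0, 81)
best = omegas[np.argmin([action_entropy(w) for w in omegas])]
```

The appealing property is exactly the one noted above: nothing in the objective knows about the individual structures; it only rewards there *being* structure.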
In addition to working on this, we attended the Rix-group Milky-Way group meeting, where (among other things), Wyn Evans (Cambridge) told us about using the apocenter information we have about the Sagittarius tidal stream to infer Halo potential parameters. He gets a low mass for the Milky Way (in accord with other stellar kinematic methods, but in conflict with Leo I's high velocity; is it bound?). I had a bit of a problem with how Evans and his collaborators join up the stream to the progenitor, but that may just be a detail. Hoping to learn more later this week.
If you have a population of objects, each of which has a true property x which is observed noisily, how do you infer the distribution of true x values? The answer is to go hierarchical, as we do in this paper. But it surprises some of my friends when I tell them that if you aren't going to go hierarchical, it is better to histogram the maximum-likelihood values than it is to (absolute abomination) add up the likelihood functions. Why? Because a maximum-likelihood value is noisy; the histogram gives you a noise-convolved distribution, but the likelihood function has really broad support; it gives you a doubly-convolved distribution! Which is all to say: Don't ever add up your likelihood functions!
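Here is a quick numerical check of that claim (my invented toy, with unit-variance truth and unit-variance noise): the variances come out exactly as the convolution argument says.

```python
import numpy as np

rng = np.random.default_rng(0)

# True x values and noisy observations, both unit-variance.
truth = rng.standard_normal(20000)
obs = truth + rng.standard_normal(20000)

# For Gaussian noise the maximum-likelihood estimate of each x is just its
# observation, so "histogram the ML values" means: look at the distribution
# of obs. That is the truth convolved *once* with the noise: variance 2.
var_ml_histogram = obs.var()

# "Add up the likelihood functions": each likelihood is N(x; obs_i, 1), so
# the sum is the obs distribution convolved *again* with the noise kernel:
# variance 3, the doubly-convolved abomination.
x_grid = np.linspace(-8.0, 8.0, 1601)
dx = x_grid[1] - x_grid[0]
sub = obs[:2000]  # subset, to keep the grid evaluation cheap
summed = np.exp(-0.5 * (x_grid[:, None] - sub[None, :]) ** 2).sum(axis=1)
summed /= summed.sum() * dx
mean = (x_grid * summed).sum() * dx
var_summed = (x_grid ** 2 * summed).sum() * dx - mean ** 2
print(var_ml_histogram, var_summed)  # close to 2 and 3 respectively
```

The extra convolution is the whole point: the summed-likelihood "distribution" is broader than even the noise-convolved histogram, which is itself broader than the truth.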
When you look under the hood, what hierarchical inference is doing is looking for consensus among the likelihood functions; places where there is lots of consensus are places where the true distribution is likely to be large in amplitude. Rix had a very nice idea this weekend about finding consensus among likelihood functions without firing up the full hierarchical equipment. The use case is stream-finding in noisy Halo-star data sets. I wrote text in Rix's document on the subject this morning.