I worked through a chunk (but not all) of my old email at an undisclosed location (hint: I got a sunburn) and then we played hooky at Muir Woods. That ain't research!
I spent a good chunk of the day at Stanford, chatting with Blandford and Strigari. Blandford had lots of good thoughts to contribute to my general ideas about how one might build empirical (data-driven) and yet physically interpretable models of stars from enormous amounts of high signal-to-noise, high resolution spectral data (like we have in APOGEE). In particular, he pointed out that we don't have to ignore what we know about atomic physics and quantum mechanics when we do it! Strigari is thinking about the Oort cloud and the comets that allegedly fill it: Do they really have to be in a fluffy cloud around every star, or could they instead be in a space-filling population not bound to any star? Or a mixture of the two? Radical! And, he hopes, testable.
I wrote the introduction for my paper with Joey Richards (wise.io). It is all about how to use the "null" values you get in standard multi-epoch imaging surveys when a source is undetected at some epochs. We are working on the (common) case in which you get no detection and you also have no (easy) way to reconstruct the detection threshold or any upper-limit information. If you ever thought such null values would be useless and ought to be discarded, it turns out you are wrong (although using them is not trivial).
I spent the day at wise.io headquarters, hacking with Joey Richards. We worked on our insane robot model paper. Over lunch, we had a very productive discussion about the future of machine learning and statistical methods in astrophysics. Various points that came up include: Fundamentally, astronomy is an unsupervised problem, because the goal is to discover new things and new data are always different from old data. It is exceedingly valuable to have a causal model because then you can incorporate the things that you know about data generation (especially noise models) into your inference. It is important to keep nuisance parts of the model as flexible as possible because you don't want to impose structure that isn't there, nor do you want to prevent the machine from finding hidden or unknown structure. The causal-model objective and the flexible-model objective conflict a tiny bit, because the causal requirement usually limits your freedom. If Richards and I disagree on anything, it is where to set the boundary between freedom and enforced causality. Richards loves my idea of generating a completely data-driven stellar spectroscopy model that is good enough to do chemical tagging. We promised to start that discussion after this week. Great lunch and a great day!
In a low-research day in an undisclosed location in California I wrote a bit in my documents about model selection, to bolster the points that I thought were obvious but the statistics school students found surprising.
Fouesneau made progress this week—and showed some results today—on a challenging project: Take a grid of stellar spectra (a grid in mass, age, reddening, metallicity, and so on), find groups of spectra that are (from our perspective) identical, combine them, and modify the code to deal.
"Identical from our perspective" means: cannot be distinguished with percent-level six-band photometry in the PHAT bands.
"Modify the code to deal" means: do all the index-wrangling so that we can do our raw chi-squared calculations on the minimal set of model spectra, but do all our probabilistic inference on the full set. Well, not quite the minimal set, since finding that is (probably) NP-hard. Anyway, it is a non-trivial search problem (we are using ball trees!) and a non-trivial code change, but Fouesneau is close.
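The grouping step can be sketched in a few lines. This is a toy version under loud assumptions: the "grid" is synthetic random log-fluxes rather than real model spectra, and I snap to a one-percent grid instead of doing a proper ball-tree metric query, which gets the idea across but is cruder than what Fouesneau is actually doing.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in for the model grid: 200 distinct model spectra,
# each reduced to log-fluxes in six PHAT-like bands, replicated five
# times with sub-percent perturbations (so the grid has near-duplicates).
base = rng.uniform(-1.0, 1.0, size=(200, 6))
logf = np.repeat(base, 5, axis=0) + 1e-4 * rng.standard_normal((1000, 6))

# "Identical from our perspective" = log-fluxes agree at the percent
# level in every band.  Crude stand-in for the ball-tree search: snap
# log-fluxes to a one-percent grid and group by the resulting key.
tol = 0.01
keys = np.round(logf / tol).astype(int)
uniq, group_id = np.unique(keys, axis=0, return_inverse=True)
group_id = group_id.reshape(-1)
n_groups = len(uniq)

# chi-squared gets computed once per group (the reduced set), while the
# probabilistic inference still sums over all 1000 models via group_id.
print(n_groups, "groups for", len(logf), "model spectra")
```

The index-wrangling is then just `group_id`: a length-1000 lookup from full-grid index to reduced-set index.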
Over after-work drinks, Foreman-Mackey, Fouesneau, Weisz, and I discussed the (different) fitting project in which we combine spectroscopy and photometry. The question of weighting came up: How do we relatively weight the spectroscopy and the photometry, given that there are thousands of spectral pixels but only a few bands of imaging? The answer is: You don't! Weighting the chi-squared values before combining them is like taking the likelihood to a power, which makes little sense. The concerning issue is that we don't trust the spectroscopy as much as the imaging, so the larger number of pixels is disturbing (spectroscopy always dominates the chi-squared calculation). My answer is that you have to deal with that lack of trust by complexifying the model. The flexible spectrophotometric calibration functions we are fitting along with the spectral properties (see yesterday's post) parameterize our distrust, and also effectively downweight the spectra in their importance in the combined fit: A good chunk of the spectral information is being drawn away from the spectral properties and on to the nuisance parameters.
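The weighting point can be made concrete in one line of algebra: because a Gaussian likelihood is the exponential of minus half the chi-squared, putting weights on the chi-squared terms is exactly the same as raising each likelihood to a power.

```latex
% Weighting the two chi-squared terms before combining is the same
% as tempering (exponentiating) the two likelihoods:
L \propto e^{-\chi^2/2}
\quad\Longrightarrow\quad
e^{-\left(\alpha\,\chi^2_{\mathrm{spec}} + \beta\,\chi^2_{\mathrm{phot}}\right)/2}
\propto L_{\mathrm{spec}}^{\alpha}\, L_{\mathrm{phot}}^{\beta} .
```

For any power other than unity there is no generative model of the data that produces that likelihood, which is why the distrust has to go into extra nuisance parameters rather than into weights.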
Weisz and I pair-coded, with some help from Fouesneau and Foreman-Mackey, an extension to Weisz's spectral fitting program that will simultaneously fit spectroscopic and photometric data on stellar clusters. The idea is that a spectrum contains far more information about the stellar population than a few photometric points, but a typically taken, calibrated, and reduced spectrum has serious systematics in the spectrophotometric calibration. How to use these data responsibly?
Our generative model (or maybe causal model?) is that the spectroscopic data are good, but multiplied by an unknown smooth function of wavelength representing spectrophotometric wrongness. We put in a flexible (cubic spline) model for this function, and fit (with emcee) the cluster spectrum and the spline function (which enters multiplicatively). We got something working and now Weisz is running it overnight. In the end, if this model has the right flexibility, the overall spectral shape information will come from the photometry, while the line information will come from the spectroscopy.
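Here is a minimal sketch of the multiplicative-calibration idea. Everything is synthetic and simplified: I condition on the true model spectrum instead of sampling cluster parameters with emcee, which reduces the calibration step to fitting a cubic spline to the ratio of data to model; the wavelengths, line list, and knot spacing are made up.

```python
import numpy as np
from scipy.interpolate import LSQUnivariateSpline

rng = np.random.default_rng(0)

# Toy "true" cluster spectrum: flat continuum plus absorption lines.
wave = np.linspace(4000.0, 7000.0, 2000)
true = np.ones_like(wave)
for center in (4340.0, 4861.0, 5175.0, 6563.0):
    true = true - 0.4 * np.exp(-0.5 * ((wave - center) / 3.0) ** 2)

# Spectrophotometric "wrongness": unknown smooth multiplicative function.
calib_true = 1.0 + 0.3 * np.sin(2 * np.pi * (wave - 4000.0) / 6000.0)
flux = calib_true * true + 0.01 * rng.standard_normal(wave.size)

# The calibration enters multiplicatively, so (given a trial model
# spectrum) fit a cubic spline to the ratio data / model.  In the real
# fit the spline parameters are sampled alongside the cluster parameters.
knots = np.arange(4300.0, 7000.0, 300.0)  # interior knots every ~300 A
spl = LSQUnivariateSpline(wave, flux / true, knots, k=3)

resid = flux - spl(wave) * true
print("rms residual:", resid.std())
```

If the spline has the right flexibility, it soaks up the smooth (mis)calibration while the lines stay with the stellar-population model, which is the point of the exercise.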
The SOP in this business is to fit and divide out a continuum, from the spectrum and the spectrum models. That's a good idea, but it isn't the Right Thing To Do (tm) if you think your models might have certain kinds of problems. It also performs badly when no part of your spectrum is clearly continuum and you might have small resolution differences between model and data.
Conversations continued with Fouesneau and Weisz; today they were about the evolution of stellar clusters. Fouesneau is working on using observations of clusters of different masses and ages to constrain models of cluster evolution and dissolution. It looks like at early ages (less than 0.1 Gyr), the cluster population is consistent with cluster conservation, but at old ages (greater than 1 Gyr), there must be cluster destruction or dissolution. We wrote down a probabilistic model for this process and a plan for how it could be inferred at the photometric-catalog level (rather than the inferred masses and ages level). Going to the photometric-catalog level permits inclusion of non-approximate completeness functions.
At one point in the conversation I fired up my "don't co-add your posterior pdfs" rant. If you have a bunch of posterior pdfs, one per object (one per cluster, in this case, in the mass–age parameter space), what is your best estimate for the true distribution in the parameter space? It is not the co-addition of the posterior pdfs. Perhaps it is counterintuitive, but it is better to histogram best-fit values than to co-add pdfs. The Right Thing To Do (tm) is to perform a hierarchical analysis (as in this paper), but that's expensive and non-trivial. Fundamentally, adding up pdfs is never a good idea. I think maybe I need to write a Data Analysis Recipes on this.
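A toy numerical version of the rant, under stated assumptions: every object has the same true parameter (a delta-function population), Gaussian noise, and flat priors, so each object's posterior is a Gaussian centered on its measurement. Co-adding those posteriors convolves the truth with the noise twice; histogramming the best-fit values convolves it only once.

```python
import numpy as np

rng = np.random.default_rng(1)

# All objects share the true value 0; each is measured with sigma = 1.
sigma = 1.0
measurements = rng.normal(0.0, sigma, size=100000)

# Co-adding the (flat-prior) posteriors gives a Gaussian mixture whose
# variance is Var(measurements) + sigma^2, i.e. about 2 sigma^2: the
# measurement noise gets counted twice.
coadd_variance = measurements.var() + sigma**2

# Histogramming the maximum-likelihood values convolves the truth with
# the noise only once: variance about sigma^2.
histogram_variance = measurements.var()

print(coadd_variance, histogram_variance)
```

For a population that is truly a delta function, even the histogram is too broad (hence the hierarchical analysis), but the co-add is broader still, by a full extra factor of the noise variance.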
For some reason these days, I keep getting asked about running MCMC in situations where the model is only defined on a discrete grid. Answer: No problem! You can either run MCMC also on that grid (with discrete proposal distributions) or else you can run MCMC in a continuous space, but snap-to-grid for the likelihood calculation (and then snap back off when you are done). Things got a bit hairier when the PHAT team (Weisz and Fouesneau are in town for a code sprint, Gordon was on the phone) were asking about the same but with non-trivial priors and exceedingly non-uniform model grids. So I decided to write down the full answer. I didn't finish by the end of the day.
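The snap-to-grid option looks like this in practice. This is a toy sketch with a made-up one-dimensional uniform grid and flat priors (the easy case; the non-uniform-grid, non-trivial-prior case is the one that needs the fuller write-up): a plain Metropolis sampler walks in the continuous space and only snaps to the grid for the likelihood call.

```python
import numpy as np

rng = np.random.default_rng(2)

# The model is only defined on a grid (spacing 0.1), standing in for a
# tabulated model grid; here the tabulated log-likelihood is Gaussian.
grid = np.arange(-5.0, 5.0, 0.1)

def ln_like(theta_on_grid):
    return -0.5 * theta_on_grid**2  # known only at grid points

def snap(theta):
    return grid[np.argmin(np.abs(grid - theta))]

# Metropolis MCMC in the CONTINUOUS space, snapping to the grid only
# inside the likelihood evaluation (flat prior assumed).
theta, chain = 0.0, []
lnp = ln_like(snap(theta))
for _ in range(20000):
    prop = theta + 0.5 * rng.standard_normal()
    lnp_prop = ln_like(snap(prop))
    if np.log(rng.uniform()) < lnp_prop - lnp:
        theta, lnp = prop, lnp_prop
    chain.append(theta)

chain = np.array(chain)
print(chain.mean(), chain.std())
```

The stationary distribution is the piecewise-constant approximation to the target, which converges to the right answer as the grid gets finer.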
It being spring break, Price-Whelan also spent a "spa day" down at NYU, to re-start our project on co-adding (or, really, on not co-adding) imaging data. We are showing that photometric modeling (or measurement) in unstacked data beats the same in stacked data, even for sources too faint to see at any epoch. That is, you might need to stack the data in order to see the source, but you don't need to stack in order to detect or measure it. Worse than "don't need to": You get more precision by not stacking. Duh!
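A toy demonstration of the precision claim, under loud simplifying assumptions (made-up fluxes and noise levels, and "measuring in the unstacked data" reduced to its simplest ingredient, the inverse-variance weighting that a naive stack throws away):

```python
import numpy as np

rng = np.random.default_rng(3)

# True flux 1, observed at 50 epochs of varying quality; per-epoch S/N
# is well below 1, so the source is invisible at any single epoch.
true_flux = 1.0
sigmas = rng.uniform(2.0, 10.0, size=50)

n_trials = 4000
fluxes = true_flux + sigmas * rng.standard_normal((n_trials, 50))

# "Stacking" = plain average over epochs (every epoch counted equally).
stacked = fluxes.mean(axis=1)

# Measuring in the unstacked data = inverse-variance weighted combination,
# the maximum-likelihood flux given the per-epoch noise model.
w = 1.0 / sigmas**2
unstacked = (fluxes * w).sum(axis=1) / w.sum()

print("stacked scatter:  ", stacked.std())
print("unstacked scatter:", unstacked.std())
```

The unstacked measurement has smaller scatter purely because it honors the per-epoch noise model; in the real project the gains also come from per-epoch PSFs and pointings, which stacking destroys.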
Discussed shapelets with MJ Vakili (NYU) who is looking at fitting large numbers of galaxies in SDSS and CFHT imaging. I encouraged him to look at building a probability density function for galaxy morphologies, which is a project I have wanted to do for years: For one, this is the prior pdf that is needed for weak lensing and other studies. For two, it should have complex non-trivial topological structure because of the viewing-angle dependencies of the observed two-dimensional morphologies.
In the morning, Fergus and I discussed the various ongoing projects. At one point in the discussion we marveled at the value of factor analyzers and the remarkable situation that they have never been used in astrophysics. If anyone knows an exception, I will pay a beer for the first example. Factor analyzers might be great—or mixtures of them—for Vakili's project.
Greg Dobler (UCSB) gave the astro seminar today, on the WMAP, Fermi, and now Planck haze at the center of the Milky Way. My take-away is that, energetically, it is not hard to explain, but specifically it has morphological features that are hard to explain. That's true of so damned much in astrophysics!
Not much other work got done, except I shocked (shocked!) the crowd at stats school by saying that it is almost never the case that you want to select models based on chi-squared per degree of freedom. You can show that your models are bad fits that way, but you can't really choose among models that way. It is related to Gould's intemperate comment about accuracy vs precision: Model selection is a precision question, not an accuracy question.
I prepped for my first class in our stats for grad students series by making up some notation for leave-one-out cross-validation, for which I am going to advocate in situations where you need to make a decision and you don't have any serious thoughts about your utility. It surprises many that I am against using marginalized likelihood (Bayes evidence). But I am against it because that is what you would compute in order to mix models, not to decide between them! Also, it is strongly dependent on your priors, when anything you want to use for deciding should be strongly dependent on your utility. And it is super-hard to compute in most circumstances. And so on.
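To make the advocacy concrete, here is a small leave-one-out cross-validation example with invented data: choosing a polynomial degree for noisy points drawn from a quadratic. The utility here is implicitly squared prediction error, which is exactly the kind of default you fall back on when you have no serious thoughts about your utility.

```python
import numpy as np

rng = np.random.default_rng(4)

# Noisy data drawn from a quadratic (the "true" model).
x = np.linspace(-1.0, 1.0, 30)
y = 1.0 - 2.0 * x + 3.0 * x**2 + 0.3 * rng.standard_normal(x.size)

def loo_score(degree):
    # Sum of squared leave-one-out prediction errors for this degree:
    # refit with point i held out, predict point i, accumulate the error.
    err = 0.0
    for i in range(x.size):
        keep = np.arange(x.size) != i
        coef = np.polyfit(x[keep], y[keep], degree)
        err += (np.polyval(coef, x[i]) - y[i]) ** 2
    return err

scores = {d: loo_score(d) for d in range(6)}
best = min(scores, key=scores.get)
print("LOO-CV picks degree", best)
```

Underfitting degrees (0 and 1) score badly because they predict held-out points poorly; overfitting degrees typically score a bit worse than the truth; no priors and no evidence integrals are needed anywhere.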
David Russell (Canarias) gave a seminar today on the relationships between microquasars and quasars, and what those things might tell us about how jets are powered. He showed some outrageous phenomenology, including beautiful jet–plasma interactions and a "fundamental plane" of accreting (hard-state) black holes.
On the subway up to an undisclosed location in the West 70s, I wrote text for my Atlas of Galaxies. At the location, Schiminovich and I discussed next steps in the GALEX Photon Catalog project. We also discussed a proposal that Schiminovich is writing for a balloon-borne instrument that deals with variable conditions by making non-trivial decisions in real time in response to real-time measurements of transparency, sky brightness, and field-of-view (which can be hard to control on balloons). Obviously, a custom-built on-board lightweight implementation of Astrometry.net would be useful!
Meetings regarding a Sloan Foundation visit to NYU, discussions of graduate advising, first shot at teaching assignments for next year, and a long discussion of NYU's vote of no confidence in its President. None of that counts as research! A few moments of research from Annie Preston (Haverford), who is trying to figure out VVDS spectroscopy in support of our star–galaxy classification projects.
Yike Tang gave Marshall (participating from afar) and me an update on the hierarchical weak lensing projects he has been working on. We had various realizations during the update, one of which is that Yike should be doing maximum-marginalized-likelihood when he requires an estimator and passing forward full likelihood information when he doesn't. We asked him to re-tool to this path, started a template paper manuscript, and, late in the day, I wrote an abstract for the project. We are a long way from having a publishable paper, but Tang's results are really tantalizing: He gets great performance out of the hierarchical machinery.
In somewhat bigger (but less personal) news, Oppenheimer (AMNH) lifted the embargo on our exoplanet project: With the help of Rob Fergus (and cheerleading by me), Oppenheimer's P1640 instrument has successfully taken near-infrared spectra of the four companions (young planets) of HR8799. The spectra show evidence of various temperatures and compositions. They were extracted with data-driven-model magic from Fergus, plus a lot of great hardware and adaptive-optics engineering by the P1640 team. The paper is on arXiv here and got some press, including this piece by Caleb Scharf (Columbia).
In a low-research but high-talking day, the morning stats lecture was again Tinker (NYU) but this time talking about MCMC. He talked about burn-in, tuning, convergence, and use of the chains. He reminded Foreman-Mackey and me that we really have to write up our tutorial on all this. Not much else got done today.
Peter Teuben (Maryland) came in to NYC for a few hours to discuss radio interferometry. He got me to dust off some of the things I was writing about it last summer. We tried to sketch out a road-map from bootstrap tests of radio map-making to likelihood tests to regularized likelihood optimization.
While stuck on an airplane in Chicago, I wrote a short note for Genevieve Graves (Princeton), in answer to a question from her about what I would do instead of stacking data, especially if I wanted to see the intrinsic variance among the data as well as the mean. My proposal is related to a Factor Analyzer, which is like a probabilistic form of Principal Components Analysis (and therefore like HMF). I also wrote a bit in my Atlas of Galaxies, which is becoming a bit of an albatross. I am not good at commitment (when it comes to projects, anyway).
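A stripped-down sketch of the idea in the note, on entirely synthetic "spectra": when you know the per-pixel noise, the intrinsic variance is the sample variance minus the noise variance, which is the simplest factor-analyzer-like decomposition (a full factor analyzer would also fit the factor directions by EM; none of the numbers here are real).

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy data: 500 "spectra" of 40 pixels, each the mean spectrum plus one
# intrinsic factor of variation plus KNOWN Gaussian noise.
n, p = 500, 40
mean_spec = 1.0 + 0.5 * np.sin(np.linspace(0, 3 * np.pi, p))
factor = np.exp(-0.5 * np.linspace(-2, 2, p) ** 2)  # intrinsic variation
amps = rng.standard_normal(n)
sigma = 0.3  # known per-pixel noise
data = mean_spec + np.outer(amps, factor) + sigma * rng.standard_normal((n, p))

# Stacking gives you the mean ...
stack = data.mean(axis=0)

# ... but the intrinsic variance is the sample variance MINUS the known
# noise variance; stacking alone throws this information away.
intrinsic_var = data.var(axis=0) - sigma**2

print("peak intrinsic rms:", np.sqrt(intrinsic_var.max()))
```

Here `intrinsic_var` tracks the square of the factor, so you recover not just the mean but where (and how much) the population genuinely varies.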
I had a great trip to Northwestern University today, where I visited the CIERA, which is an interdisciplinary center for engineering and applied math and computer science and astrophysics. I talked about hierarchical inference.
Many of my conversations were about parameter estimation and inference using MCMC, where the group, including Will Farr, Ben Farr (no relation), and Vicki Kalogera have been doing pretty radical MCMC things for LIGO and other projects. In particular, Ben Farr told me about generalizations of parallel tempering that are extremely relevant to things Goodman and our NYU MCMC group have been discussing; it might make us infinitely powerful (or an approximation to that). Will Farr shares my love of "detecting the undetectable": He showed that you can infer the properties of stellar clusters even when no member is identified at high probability, which is one of the punchlines of the work I did (so very long ago) with Koposov and Rix. The Farrs also failed to dissuade me from (encouraged me towards, actually) signing an MOU with LIGO and doing some noise modeling.
Nick Cowan told me about mapping exoplanets using photometric variations and—in transiting cases—details of the lightcurve at ingress and egress. This is very related to the computer vision projects we were talking about this summer. I gave him this insane paper. He explained to me something I hadn't thought of at all before: The obliquity (angle between the rotation axis and the orbital plane) and the observational inclination (angle of the orbital plane to the line of sight) both enter in what you can see, so in principle you can measure obliquities (poorly) just by looking at phase curves in reflected light!
Late in the day, Meagan Morscher and Laura Trouille told me about their work on computational thinking, which is something I ought to muse about on the teaching blog.
One of the big problems with using Kepler data is that the data have large variations (well, tiny variations, but variations much bigger than a transit depth) caused by stellar variability and spacecraft sensitivity or pipeline issues. Starting on Friday, and continuing today, Foreman-Mackey worked on spline fitting along with our hometown favorite, iteratively re-weighted least squares. IRLS is much better than sigma-clipping, because the outlier data are down-weighted continuously (not rejected), and every data point, no matter how bad, has some influence on the fit. It is also an approximation to a probabilistic fit. (This is all being done prior to the probabilistic transit fitting, however, so it is still a hack.) The model we use for the variability is a cubic spline with knots every three days. In addition, today, we coded up a robust heuristic for identifying break-points where there are sensitivity discontinuities, where we need to add spline knots to follow the action. Our heuristic involves a matched filter applied to the re-weighted residuals (away from the spline fit) output by the IRLS. Anyway, it works really well and is fast (once we put it into Fortran).
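The IRLS core (without the break-point matched filter) fits in a short sketch. Everything here is synthetic and the weight function is a generic Cauchy-like choice, not necessarily the one Foreman-Mackey used; the point is that outliers are continuously down-weighted, never rejected.

```python
import numpy as np
from scipy.interpolate import LSQUnivariateSpline

rng = np.random.default_rng(6)

# Toy lightcurve: smooth stellar variability plus noise plus transit-like
# dips that the trend fit should NOT follow.
t = np.linspace(0.0, 30.0, 3000)  # days
trend = 1.0 + 0.01 * np.sin(2 * np.pi * t / 15.0)
flux = trend + 0.001 * rng.standard_normal(t.size)
in_transit = (t % 5.0) < 0.1
flux[in_transit] -= 0.005  # 5-sigma "transits"

# IRLS: fit a cubic spline with knots every 3 days, continuously
# down-weight outliers based on the residuals, re-fit, repeat.
knots = np.arange(3.0, 30.0, 3.0)
w = np.ones_like(t)
for _ in range(10):
    spl = LSQUnivariateSpline(t, flux, knots, w=w, k=3)
    resid = (flux - spl(t)) / 0.001  # residuals in units of sigma
    w = 1.0 / (1.0 + resid**2)       # Cauchy-like: never exactly zero

detrended = flux - spl(t)
print("transit depth recovered:", -detrended[in_transit].mean())
```

Because the weights shrink smoothly with residual size, the transit points barely pull on the spline, so the detrended lightcurve retains the transits at close to their true depth.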
After statistics school with Jeremy Tinker (NYU) on bootstrap, jackknife, and covariances, and after an arXiv coffee in which Foreman-Mackey summarized the (excellent) recent paper by Dressing & Charbonneau, and after a great astro seminar by Kathryn Johnston (Columbia) in which she showed that we can find shells around the Milky Way, both in position and velocity, and after a long discussion with Johnston and Price-Whelan and Foreman-Mackey about how we should make probabilistic comparisons between theory and observations in kinematic studies of the Milky Way and its substructure, Fadely and I finished and submitted our HST Archival Calibration proposal. I love my job, and I love Fridays at the NYU CCPP! Fadely made a really useful figure for the proposal today, which also involved doing an all-new toy inference. Clutch play!