K-nearest-neighbors is a model

In my lexicon, a model is an object that has a (possibly null) set of parameters and makes predictions, in the sense that it produces a probability distribution function for the data as a function of those parameters. It also needs a few more things, including a prior PDF for the nuisance parameters (it doesn't need a prior PDF for the parameters of greatest interest, in my humble opinion). Given all this, I would have thought that taking, for each data point, the K nearest neighbors in the data space (under some metric) is not a model. But it can be, if you can cleverly convert the properties of the K nearest neighbors into a PDF for the data point. For Fadely's calibration job, and at the suggestion of Fergus, I think we can do this, and I wrote it up on the airplane home.
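
One standard way to convert neighbors into a PDF (the classic textbook version, not necessarily the scheme I wrote up) is the K-nearest-neighbor density estimate, p(x) ≈ K / (N V_K(x)), where V_K(x) is the volume of the smallest ball around x containing the K nearest data points. A minimal one-dimensional sketch:

```python
import numpy as np

def knn_density(x, data, K=100):
    """Classic K-nearest-neighbor density estimate in one dimension:
    p(x) ~ K / (N * V_K), where V_K is the length of the smallest
    interval centered on x that contains the K nearest data points."""
    data = np.asarray(data)
    d = np.sort(np.abs(data - x))   # distances from x to every datum
    r_K = d[K - 1]                  # distance to the K-th nearest neighbor
    return K / (len(data) * 2.0 * r_K)

# sanity check on draws from a standard normal:
rng = np.random.default_rng(42)
samples = rng.normal(0.0, 1.0, size=5000)
p0 = knn_density(0.0, samples)      # true density at 0 is 1/sqrt(2 pi) ~ 0.4
```

The estimate is noisy at the 1/sqrt(K) level, which is part of why the clever conversion matters.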

Because the model has very few user-tuned knobs and because its complexity and coverage grow linearly with the amount of data obtained, the KNN model is truly non-parametric. In Fergus's terminology, it has high capacity: It can model a lot, but at the same time it isn't obstructed by intractable optimization or sampling.


occupying LIGO

I spent the afternoon with the LIGO team at Caltech. I got the invitation from Roy Williams (Caltech) after I tweeted (yes, tweeted) my disappointment with the LIGO data-release policies. So I spent half my talk discussing the costs and benefits of open data sharing. The discussion in the room was very lively, with a range of opinions but none of them very far from my own: Everyone sees data release as hugely beneficial but also very costly. Advanced LIGO is required to do data release, and the team is thinking about how to stage and structure it. Much of the discussion was about the salient differences between astronomy observatories and missions on the one hand and physics experiments on the other; the crucial point is that LIGO has not yet detected any sources. Another big issue for LIGO is lack of agreement, both within LIGO and between LIGO and its partner projects, about what is appropriate.

As for current LIGO data: The team is attempting to build a lightweight MOU structure to permit outside access to data without jeopardizing their core mission. I discussed what that would look like and promised to get back to them with a proposal if anyone on my team wants to fire up some analysis. Perez-Giz: I am thinking of you.

[note added later: PDF slides from my talk available here.]


Spitzer precision

I spent the day at the Spitzer Science Center, where I continue to serve on the Spitzer Oversight Committee. Perhaps it isn't research but I sure do learn a lot at these meetings. Among the many remarkable things I learned is that in the Warm Mission, the Spitzer IRAC instrument continues to get more precise with time. This is unusual for an aging, distant space observatory! The reason is that exoplanet observers are feeding systematic noise information back to the instrument team, who are working to implement hardware, software, and observation-planning changes. Right now the instrument is working at tens-of-micromagnitudes precision (parts in 10^5), which has permitted it to measure transits of Earth-sized exoplanets.

Over dinner, we discussed my long-term fantasy of hijacking a decommissioned satellite (any satellites coming to end-of-life in the next few years?) and setting it up as an autonomous, intelligent observer, saving on operations and telemetry by having it phone home only when it has something interesting to say.


Bayesian point estimates?

Phil Marshall and I spent the day talking about weak and strong lensing. We sanity-checked with Fadely our ideas about finding strong lenses in multi-epoch ground-based imaging like PanSTARRS and LSST. For lunch, we took the advice of 4ln.ch. With Foreman-Mackey we discussed an interesting idea in this Lance Miller paper: If you have N independent measurements of x, a correct likelihood function, and a correct prior PDF for x, then the (expected value of the) mean (over data points) of the N posterior PDFs for x is the prior PDF for x. That is either obvious or surprising. To me: surprising. They use this fact to infer the prior, which is a super-cool idea, but I expect to be able to beat it dramatically with our hierarchical methods. In the airport at the end of the day, I contemplated what I would say to the LIGO team at Caltech.
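
Here is a quick numerical check of the Miller fact (my toy setup, not theirs): with a Gaussian prior and Gaussian noise, the average of the per-object posterior PDFs lands right on top of the prior.

```python
import numpy as np

rng = np.random.default_rng(0)
N, s = 50_000, 0.7                        # number of objects, noise level
x_true = rng.normal(0.0, 1.0, size=N)     # draws from the prior N(0, 1)
y = x_true + s * rng.normal(size=N)       # noisy measurements

# per-object Gaussian posterior: precision = prior plus likelihood precisions
post_var = 1.0 / (1.0 + 1.0 / s**2)
post_mean = (y / s**2) * post_var

# average the N posterior PDFs on a grid of x and compare to the prior
x = np.linspace(-4.0, 4.0, 81)
mean_post = np.mean(np.exp(-0.5 * (x[None, :] - post_mean[:, None])**2 / post_var)
                    / np.sqrt(2.0 * np.pi * post_var), axis=0)
prior = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
max_err = np.max(np.abs(mean_post - prior))
```

The agreement is limited only by the Monte Carlo noise in the N draws, which is the sense in which the statement holds in expectation.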

ps: Though that Miller paper is excellent and filled with good ideas, I have one quibble: They have an estimate of the galaxy ellipticity that they call Bayesian because it is the posterior expectation of the ellipticity (they even use the word Bayesian in the titles of their papers). The Bayesian output is the full likelihood function or posterior PDF, not some integral or statistic thereof. A point estimate you derive from a Bayesian calculation is not itself Bayesian; it is just a regularized estimate. Bayesians produce probability distributions, not estimators! Don't get me wrong: I am not endorsing Bayes here; I am just sayin': Point estimates are never Bayesian, even if Bayes was harmed along the way.


Marshall, hierarchical

Phil Marshall showed up for two days and we caught up on our various projects. The one we are most on about is Yike Tang's (NYU) to bring hierarchical inference to weak lensing. We talked about strategy for that, and the competition. Comparison to other methods is daunting for a variety of reasons: We expect to be better than other methods when the data do not precisely match the model; that is, when the likelihood function or noise model is wrong in important ways. All the current tests of weak lensing technology are on fake data, for which the likelihood function is never wrong (or rarely wrong or not wrong in interesting ways).


blind calibration

In the usual form of astronomical self-calibration—like what we did to get the flats in SDSS, or what I wrote up with Holmes and Rix—you use repeated observations of the same stars at different focal-plane positions to determine the sensitivity as a function of focal-plane position. Today at computer-vision-meets-astronomy group meeting, Fadely showed that he has the potential to calibrate the sensitivity of a device using only the statistical properties of observed image patches. He doesn't even need to identify sources, let alone determine when the same source has been re-observed. It is still early days, so we don't know the final precision or how it depends on the number of images, the number of sources in those images, the PSF sampling, and so on, but it looks very promising. In particular, if we pair it with the more traditional self-calibration, it could be extremely precise. Our goal: Obviate the taking of (always questionable) flat-field images. This is also something I enjoyed discussing two weeks ago with Suntzeff (TAMU) when I was out in Kansas. (He thought I was talking crazy.)


string theory, QCD, and weak lensing

In the brown-bag talk today, Victor Gorbenko (NYU) discussed the connection between string theory and QCD-like Yang–Mills theories. The connection is that in the strong nuclear force (mediated by gluons), the field gets confined to thin flux tubes when the quarks are widely separated; a string theory seems like a natural fit. This is an old program that was abandoned a while ago, but Gorbenko and collaborators have shown that it might work if you add some additional axion-like particles to the string theory.

In what little research time I got today, I organized thoughts and process for Yike Tang's (NYU) project to do weak lensing with hierarchical inference. By inferring the ellipticity distribution along with the local shear, he can potentially increase the angular resolution and signal-to-noise of any shear map by software improvements alone (over standard practice).
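
A cartoon of why the hierarchical approach should win (my toy model, not Tang's actual pipeline): when the ellipticity noise is heteroscedastic, jointly fitting the shear and the intrinsic-ellipticity scatter lets you weight each galaxy by its proper total variance.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 4000
g_true, sig_int = 0.03, 0.25                       # shear, intrinsic scatter
sig_meas = rng.uniform(0.05, 0.6, size=n)          # heteroscedastic noise
e_obs = (g_true + sig_int * rng.normal(size=n)
         + sig_meas * rng.normal(size=n))          # observed ellipticities

def neg_loglike(g, s):
    """Negative marginal log-likelihood: each object is Gaussian around the
    shear g with variance sig_int^2 plus its own measurement variance."""
    var = s**2 + sig_meas**2
    return 0.5 * np.sum((e_obs - g)**2 / var + np.log(var))

# crude grid search keeps the sketch dependency-free
gs = np.linspace(-0.1, 0.1, 201)
ss = np.linspace(0.05, 0.6, 111)
nll = np.array([[neg_loglike(g, s) for s in ss] for g in gs])
i, j = np.unravel_index(nll.argmin(), nll.shape)
g_hier, sig_hier = gs[i], ss[j]
```

The plain mean of e_obs is also unbiased here, but the marginalized fit down-weights the noisy objects and so has smaller variance; the gap grows as the noise distribution gets broader.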


supernovae and stars

Two great talks today, one by David Nidever (Michigan), who is one of the key people in the SDSS-III APOGEE project, and one by Or Graur (AMNH, Tel Aviv), who is a supernova prospector. Nidever has been crucial in making the APOGEE spectral reductions precise. He has them so precise that he can actually discover exoplanet and brown-dwarf companions to main-sequence and giant stars. He talked about various constraints on or understandings of the accretion history of the Milky Way, including quite a bit about the puzzling Magellanic Stream. What's so crazy is that it has no stars in it. Although perhaps not as crazy as I first thought when I started to think about starless HI structures connecting nearby galaxies like M81 and M82.

Graur talked about supernova searches in data sets (like CLASH) that were designed to find supernovae and also in data sets (like SDSS-III BOSS) that were not. In the latter, he has made a very sensitive automated search in the spectra of the luminous red galaxies and found 100-ish type-Ia supernovae. This yield is much lower than you might expect (from the duration and rate of supernovae) but makes sense when you include the finite fiber size and signal-to-noise. He made a very strong point that many astronomical surveys can also be supernova surveys at almost no additional cost. That's good for our long-term future discounted free cash flow.


web programming, MW interactions

Schiminovich and I met at one of our undisclosed locations (hint: good coffee) to work on GALEX calibration using SDSS stars. I got Foreman-Mackey to help with some web programming: We need to query the SDSS-III CAS from within the program if things are going to be efficient. That now works!
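
For the record, the query construction looks something like the following sketch; the endpoint URL, the SQL details, and the column choices here are illustrative assumptions of mine, not necessarily what Foreman-Mackey actually implemented.

```python
from urllib.parse import urlencode

# Illustrative only: this endpoint and its parameter names are assumptions,
# not necessarily the interface we used against the SDSS-III CAS.
CAS_URL = "https://skyserver.sdss.org/SkyServerWS/SearchTools/SqlSearch"

def star_query_url(ra, dec, radius_arcmin=2.0, maxmag=19.0):
    """Build a URL asking the CAS for bright point sources near (ra, dec)."""
    sql = (
        "SELECT p.objid, p.ra, p.dec, p.psfMag_r "
        "FROM PhotoObj AS p "
        "JOIN dbo.fGetNearbyObjEq({ra:.6f}, {dec:.6f}, {rad:.3f}) AS n "
        "ON p.objid = n.objid "
        "WHERE p.type = 6 AND p.psfMag_r < {maxmag:.2f}"  # type 6 = star
    ).format(ra=ra, dec=dec, rad=radius_arcmin, maxmag=maxmag)
    return CAS_URL + "?" + urlencode({"cmd": sql, "format": "csv"})

url = star_query_url(150.0, 2.2)
```

The point of doing it in-program is that the calibration loop can pull down exactly the stars it needs for each GALEX field, with no human in the loop.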

After this we went to the Columbia Astronomy Library to watch Maureen Teyssier (Columbia) speak about comparisons between the Milky Way and its environment and comparable locations in numerical simulations. She finds that she can tentatively identify a few galaxies near (but not in) the Local Group that probably, in their past, punched through the dark-matter halo of the Milky Way! Her findings resolve a puzzle in the study of dwarf galaxies: Most isolated dwarf galaxies—galaxies outside the virial radius of any luminous galaxy—have abundant cold gas. There are a few exceptions near the MW, and many of these turn out in the Teyssier analysis to be punch-through galaxies. That is great and opens up some ideas about measuring the local dynamics on scales larger than the virial radius. This is all related somehow to Geha results I have mentioned previously.


third-order statistics, and more statistics

Bovy came into town for the morning and we talked various things, including skewness of quasar variability. He finds a signal that is slightly hard to explain. Our discussion segued into the applied-math-meets-astronomy meeting, with Goodman giving us hope that it might be possible to treat the density field in the Universe as a Gaussian process. When Bovy and I last talked about this problem (with Iain Murray of Edinburgh), we decided that the matrices were too large to invert. Goodman thinks that things may have evolved. At that same meeting, Hou showed us how he is refining the levels in his nested sampler. That seems to be enormously increasing the precision of his results; he has succeeded in building a precise nested sampler and we can see places where we can make further contributions to improve precision long-term. This is exciting and opens up new opportunities for our exoplanet projects.


conditional vs joint probabilities

At computer-vision-meets-astronomy we discussed Fadely's image-patch modeling system, and how to use it for calibration. We had a long discussion about the following issue: If you have a patch that is a good fit to the model except in one pixel (where, maybe, calibration parameters are wrong), should you look for that badness of fit in the total joint likelihood of all the pixels or in the conditional likelihood of the one pixel given the others? Intuitions diverged; in principle there is one joint PDF that contains all the information, so it doesn't matter. But that ignores the point that the PDF we have fit is not in any sense accurate! So we reduced scope to a very simple functional test of this point.
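
To make the "in principle it doesn't matter" point concrete (a toy check with a Gaussian, not Fadely's actual patch model): when the joint PDF is exact, the drop in joint log-likelihood from corrupting one pixel equals the drop in that pixel's conditional log-likelihood given the others, because the marginal of the remaining pixels is unchanged. The two views can only diverge when the fitted PDF is wrong.

```python
import numpy as np

rng = np.random.default_rng(3)
D = 16                                    # a 4x4 patch, flattened
idx = np.arange(D)
# smooth covariance: nearby pixels are correlated
cov = np.exp(-0.5 * (idx[:, None] - idx[None, :])**2 / 2.0**2) + 1e-4 * np.eye(D)
prec = np.linalg.inv(cov)

patch = rng.multivariate_normal(np.zeros(D), cov)
bad = patch.copy()
bad[7] += 3.0                             # one miscalibrated pixel

def joint_loglike(x):
    """Joint Gaussian log-likelihood, up to a constant."""
    return -0.5 * x @ prec @ x

def cond_loglike(x, i):
    """log p(x_i | all other pixels), up to a constant, via the precision matrix."""
    var = 1.0 / prec[i, i]
    mean = x[i] - (prec @ x)[i] * var     # does not actually depend on x[i]
    return -0.5 * (x[i] - mean)**2 / var

drop_joint = joint_loglike(patch) - joint_loglike(bad)
drop_cond = cond_loglike(patch, 7) - cond_loglike(bad, 7)
```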


baryon acoustic feature

It's a feature, people, not an oscillation. Jeremy Tinker (NYU) gave a great brown-bag today at the blackboard about the measurement of the baryon acoustic feature in the Lyman-alpha forest at redshift 2.3 from SDSS-III BOSS. This is a great piece of work, to which my only significant contribution was quasar target selection with Bovy. The baryon acoustic feature position is consistent with the standard model of cosmology (unfortunately) but the result is beautiful and represents a lot of work by a lot of people. Congratulations, BOSS: A great project making fundamental discoveries (and on budget and on time). I spent the afternoon talking radio with Van Velzen (Amsterdam) and Atlas with Mykytyn and Patel.


the little APS on the prairie

I spent the day at the APS Prairie Section meeting in Lawrence, Kansas. Two (of many) talks that made an impression on me were by Corso (Iowa), who described the calibration procedures for a new generation of CMS photo-multiplier tubes, and by Mohlabeng (Kansas), who showed that there is a statistically significant color–redshift term in the public supernova sample. That latter result got people—including Suntzeff (TAMU) and Rudnick (Kansas)—talking about selection effects. Of course, even if the effect comes from selection, it needs to be modeled; modeling the effect does seem to change inferences about cosmological parameters. In the post-banquet spot, I talked about the challenges of doing inference with big data sets (slides here in PDF). That led to conversations late into the evening at the bar. Thanks, APS Prairie Section!


sampling in catalog space

The only small bit of research I got done today was writing some text for Brewer's masterpiece on sampling in catalog space. Given an image of the sky, he can sample the posterior PDF over catalogs, along with the posterior PDF for the hyperparameters of the number–flux relation of stars. By using nested sampling he doesn't get killed by the combinatoric degeneracies that arise from the catalog re-ordering degeneracy. Even better, he models the stars that are too faint to detect cleanly. Even better, the different catalogs in the sampling have different numbers of entries; that is, this is sampling at variable model complexity. Congratulations, Brewer!


Bayes factors, rapid transit

I am so interdisciplinary! I have computer-vision-meets-astronomy every Tuesday and applied-math-meets-astronomy every Wednesday. At the latter, we continued to discuss non-trivial sampling, including nested sampling with the stretch move and other methods for computing the Bayes integral. One idea, mentioned by VanderPlas (UW) last week, is just straight-up density estimation based on a sampling, followed by a mean over the sampling of the ratios between the estimated density and the product of likelihood times prior. This should (perhaps with high variance) estimate the integral (the normalization). This is brilliant and simple, but we expect it won't work in very high dimensions. Maybe worth some testing, though. I think VanderPlas discusses this in his forthcoming textbook.
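
A sketch of that estimator on a toy problem where the evidence is analytic (my own setup: I use the known Gaussian posterior as a stand-in for MCMC samples, and a plain Gaussian KDE as the density estimate). Since the samples are drawn from the posterior p = L·prior/Z, the mean of q/(L·prior) over samples is 1/Z for any normalized density estimate q:

```python
import numpy as np

rng = np.random.default_rng(7)

# toy problem: prior x ~ N(0, 1); one datum y ~ N(x, s^2)
s, y = 0.5, 1.0
def log_prior(x):
    return -0.5 * x**2 - 0.5 * np.log(2.0 * np.pi)
def log_like(x):
    return -0.5 * (y - x)**2 / s**2 - 0.5 * np.log(2.0 * np.pi * s**2)

# exact evidence: marginally, y ~ N(0, 1 + s^2)
Z_true = np.exp(-0.5 * y**2 / (1.0 + s**2)) / np.sqrt(2.0 * np.pi * (1.0 + s**2))

# the posterior is N(m, v); pretend these draws came out of an MCMC run
v = 1.0 / (1.0 + 1.0 / s**2)
m = (y / s**2) * v
samples = rng.normal(m, np.sqrt(v), size=2000)

# estimate the posterior density at each sample with a Gaussian KDE
h = 1.06 * samples.std() * len(samples) ** (-0.2)   # Silverman's rule
q = np.mean(np.exp(-0.5 * ((samples[:, None] - samples[None, :]) / h) ** 2),
            axis=1) / (h * np.sqrt(2.0 * np.pi))

# the estimator: 1/Z is the mean of q(theta) / (likelihood * prior)
Z_est = 1.0 / np.mean(q / np.exp(log_like(samples) + log_prior(samples)))
```

In one dimension this works nicely; the high-dimensional worry is that the density estimate q itself degrades, which is exactly the variance concern above.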

In parallel, Foreman-Mackey has been preparing for a full assault on the Kepler data by getting a planet-transit code ready. We are trying to be more general (with respect to limb darkening and the like) than the enormously useful and highly cited Mandel and Agol paper (a classic!) but without sacrificing speed. I had a brilliant insight, which is that any case of a disk blocking part of an annulus is just a difference of two cases of a disk blocking part of a disk. That insight, which could have been had by any middle-schooler, may speed up our code by an order of magnitude. Hence rapid transit. Oh, have I mentioned that I crack myself up?
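
The middle-schooler insight in code (a sketch, not our actual transit code): the standard circle-circle "lens" area formula, with the annulus case written as a difference of two disk-disk overlaps.

```python
import numpy as np

def disk_overlap(d, r1, r2):
    """Area of intersection of two disks with radii r1, r2 and center distance d."""
    if d >= r1 + r2:                      # no overlap
        return 0.0
    if d <= abs(r1 - r2):                 # smaller disk fully inside the larger
        return np.pi * min(r1, r2) ** 2
    # general "lens" case: two circular segments
    a1 = r1**2 * np.arccos((d**2 + r1**2 - r2**2) / (2.0 * d * r1))
    a2 = r2**2 * np.arccos((d**2 + r2**2 - r1**2) / (2.0 * d * r2))
    tri = 0.5 * np.sqrt((-d + r1 + r2) * (d + r1 - r2)
                        * (d - r1 + r2) * (d + r1 + r2))
    return a1 + a2 - tri

def annulus_overlap(d, r_in, r_out, r_planet):
    """Area of an annulus (radii r_in < r_out) occulted by a disk of radius
    r_planet at center distance d: just a difference of disk-disk overlaps."""
    return disk_overlap(d, r_out, r_planet) - disk_overlap(d, r_in, r_planet)

# planet fully inside the annulus: blocked area is the whole planet disk
blocked = annulus_overlap(0.7, 0.4, 1.0, 0.1)
```

Decomposing the stellar disk into limb-darkened annuli then reduces the whole light curve to repeated calls of the same disk-disk primitive.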



self-calibration, generative vs discriminative

While America voted, Fergus, Krishnan, Foreman-Mackey, and I discussed with Fadely his self-calibration results. He is trying to infer the calibration parameters (dark and flat for every pixel) in a detector without any calibration data, just using science data. The idea is to build a statistical model of the data and then find the calibration parameters that make that model most likely; when calibration is good, the data will be more informative or more compact in the calibrated data space (we hope). Right now performance is looking bad. I'm still optimistic, of course! Krishnan pointed out some issues, one of which is that generative models are not always highly discriminative, and this project lives somewhere in the intersection of generative and discriminative. That distinction has rarely come up in astronomy. At the same meeting, we discussed Fergus's direct-detection successes: He has spectra of some exoplanets! More soon, I hope.


audio in the city, black hole populations

Mike Kesden (NYU) gave the brown-bag talk today, about black-hole–black-hole binary population synthesis in preparation for gravitational radiation detection with advanced LIGO. The principal pathway for making LIGO-relevant BH–BH binaries is an insane combination of mass transfer, supernova, common-envelope evolution, supernova, and inspiral, but hey! Kesden argued that it really is likely that advanced LIGO will see these.

In the morning I met with Oded Nov (NYU Poly) and Claudio Silva (NYU Poly) to discuss possible funding proposals related to engineering, science, and citizen science. We came up with a kick-ass idea for using smart phones and angry residents to map the audio response of the city to car alarms (really any noises, but car alarms are nice standard sirens for calibration). The project would look a lot like the (non-existent) Open-Source Sky Survey but build a three-dimensional audio-response model of the city. Cool if we did it!


exoplanet significance

Fergus and I got together to discuss how to present the significances of the exoplanet detections he is making with the P1640 coronagraphic spectrograph. I opined that if the model we are using were truly generative—if the model included a quantitative likelihood or probability for the data given the model parameters—then there would be a few straightforward choices. The model is very close to meeting these criteria but not quite there. We discussed relevant hacks. On the side, we started a discussion of imaging using masks, which is a branch of astronomical instrumentation where we might be able to have an impact using non-trivial imaging priors.


cross validation

VanderPlas (UW) and I continued working on the cross-validation and Bayes integral comparison. I am optimistic because both quantities have exactly the same units (or dimensions). I am pessimistic because the Bayes integral involves the prior and the cross-validation doesn't. VanderPlas has some conditions under which he thinks they could be the same, but neither of us is convinced that those conditions could really be met in practice. I am wondering if they could be met in a very successful hierarchical analysis. Anyway, this is all very vague because we don't have a clear example. By the end of the day we nearly had a document and we had a plan for an example problem where we can work everything out explicitly and (more or less) exactly.


hurricane science

The needs of a family living 18th-century-style in the post-apocalypse weighed heavily today, so not so much research got done. However, VanderPlas (UW) and I had a breakthrough on our cross-validation vs Bayes discussion. Now we are desperately trying to finish a Hurricane paper before the electricity comes back on.