In my lexicon, a *model* is an object that has a (possibly null) set of parameters and makes predictions, in the sense that it produces a probability distribution function for the data as a function of those parameters. It also needs a few more things, including a prior PDF for the *nuisance* parameters (it doesn't need a prior PDF for the parameters of greatest interest, in my humble opinion). Given all this, I would have thought that taking, for each data point, the *K* nearest neighbors in the data space (under some metric) is *not* a model. But it *can* be, if you can cleverly convert the properties of the *K* nearest neighbors into a PDF for the data point. For Fadely's calibration job, and at the suggestion of Fergus, I think we *can* do this, and I wrote it up on the airplane home.
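One standard way to convert the *K* nearest neighbors into a PDF is the classic KNN density estimate of Fix & Hodges: the density at a point is approximately *K* divided by (*N* times the volume of the smallest ball that reaches the *K*-th neighbor). A minimal sketch, assuming a Euclidean metric; the function name `knn_density` is mine, not anything from the post:

```python
import numpy as np
from math import gamma, pi

def knn_density(x, data, k=5):
    """Classic KNN density estimate: p(x) ~= k / (N * V_k(x)),
    where V_k(x) is the volume of the d-ball whose radius is the
    distance from x to its k-th nearest neighbor in the data."""
    data = np.atleast_2d(data)  # shape (N, d)
    n, d = data.shape
    # distances from x to every data point, sorted ascending
    dists = np.sort(np.linalg.norm(data - np.atleast_1d(x), axis=1))
    r_k = dists[k - 1]  # radius reaching the k-th neighbor
    # volume of a d-dimensional ball of radius r_k
    v_k = pi ** (d / 2) / gamma(d / 2 + 1) * r_k ** d
    return k / (n * v_k)

# On a dense uniform grid on [0, 1], the estimate should be near 1:
grid = np.linspace(0.0, 1.0, 1001).reshape(-1, 1)
p_hat = knn_density(np.array([0.5]), grid, k=10)
```

This bare estimate is for the data-space density itself; Fadely's calibration application would presumably use something along these lines per data point, under whatever metric is appropriate.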

Because the model has very few user-tuned knobs, and because its complexity and coverage grow linearly with the obtained data, the KNN model is truly non-parametric. In Fergus's terminology, it has high *capacity*: It can model a lot, but at the same time it isn't obstructed by intractable optimization or sampling.

Hi David,

Any mapping from a data set to a posterior PDF (even if it is an ad-hoc invented mapping) can be thought of as the conditional part p(theta|x) of a Bayesian model in the pre-data state: p(theta, x) = p(theta)p(x|theta). To complete the specification you need the marginal p(x), that is, what data you expect to see. If you do that, any ad-hockery can be given a proper Bayesian interpretation.
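A toy discrete sketch of this completion, where both the ad-hoc "posterior" table and the marginal p(x) are invented for illustration: given p(theta|x) and p(x), the joint follows, and Bayes then yields the implied prior p(theta) and likelihood p(x|theta).

```python
import numpy as np

# Hypothetical ad-hoc posterior mapping: row i is p(theta | x=i).
post = np.array([[0.9, 0.1],   # p(theta | x=0)
                 [0.2, 0.8]])  # p(theta | x=1)
px = np.array([0.6, 0.4])      # assumed marginal p(x)

joint = post * px[:, None]     # p(theta, x) = p(theta | x) p(x)
ptheta = joint.sum(axis=0)     # implied prior p(theta)
lik = joint / ptheta           # implied likelihood p(x | theta)
```

Each column of `lik` sums to one, confirming that the ad-hoc mapping plus a marginal really does define a full generative model, whether or not that implied p(x|theta) matches what you would have written down directly.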

It's a neat trick to think of things this way. The main disadvantage is that people often like to think about p(x|theta), and any ad-hoc method that gets turned into a Bayesian model by this trick will probably conflict with what you think p(x|theta) should be.

- Brendon

The original reference (as far as I can tell) for KNN also did density estimation. Here is a reprint of the 1951 paper:

http://www.jstor.org/stable/1403797

with a commentary on the history:

http://www.jstor.org/stable/1403796