K-nearest-neighbors is a model

In my lexicon, a model is an object that has a (possibly null) set of parameters and makes predictions, in the sense that it produces a probability distribution function for the data as a function of those parameters. It also needs a few more things, including a prior PDF for the nuisance parameters (it doesn't need a prior PDF for the parameters of greatest interest, in my humble opinion). Given all this, I would have thought that taking, for each data point, the K nearest neighbors in the data space (under some metric) is not a model. But it can be, if you can cleverly convert the properties of the K nearest neighbors into a PDF for the data point. For Fadely's calibration job, and at the suggestion of Fergus, I think we can do this, and I wrote it up on the airplane home.
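One standard way to make that conversion (a sketch of the classical construction, not necessarily the scheme Fadely and Fergus have in mind) is the KNN density estimate: the PDF at a point is approximately K divided by N times the volume of the smallest ball around the point that contains its K nearest neighbors. A minimal numpy version, with function name and interface of my own invention:

```python
import numpy as np
from math import gamma, pi

def knn_density(x, data, K):
    """Classical K-nearest-neighbor density estimate at point x.

    p_hat(x) = K / (N * V_K(x)), where V_K(x) is the volume of the
    smallest Euclidean ball centered at x containing K of the N points.
    """
    data = np.atleast_2d(data)
    N, d = data.shape
    # Euclidean distance from x to every data point
    dists = np.sqrt(((data - x) ** 2).sum(axis=1))
    # radius of the K-th nearest neighbor
    r = np.sort(dists)[K - 1]
    # volume of a d-dimensional ball of radius r
    V = pi ** (d / 2) / gamma(d / 2 + 1) * r ** d
    return K / (N * V)
```

For uniform data on the unit interval, the estimate at an interior point comes out near 1, as it should. Note that the estimate is only approximately normalized; the choice of K is the usual bias-variance trade-off.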

Because the model has very few user-tuned knobs, and because its complexity and coverage grow linearly with the amount of data obtained, the KNN model is truly non-parametric. In Fergus's terminology, it has high capacity: it can model a lot, but at the same time it isn't obstructed by intractable optimization or sampling.


  1. Hi David,

    Any mapping from a data set to a posterior PDF (even if it is an ad-hoc, invented mapping) can be thought of as the conditional part p(theta|x) of a Bayesian model in the pre-data state: p(theta, x) = p(theta) p(x|theta). To complete the specification you need the marginal p(x), that is, what data you expect to see. If you do that, any adhockery can be given a proper Bayesian interpretation.

    It's a neat trick to think of things this way. The main disadvantage is that people often like to think about p(x|theta), and any ad-hoc method that gets turned into a Bayesian model by this trick will probably conflict with what you think p(x|theta) should be.

    - Brendon

  2. The original reference (as far as I can tell) for KNN also did density estimation. Here is a reprint of the 1951 paper:
    with a commentary on the history:
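
The construction in comment 1 can be spelled out in symbols. Writing q(theta|x) for the ad-hoc posterior map (q is my notation, not the comment's), the chosen marginal p(x) completes the joint, and the implied prior and likelihood follow by marginalization and Bayes's rule:

```latex
% q(theta | x): the ad-hoc posterior map; p(x): the chosen marginal
\begin{align}
  p(\theta, x)     &= q(\theta \mid x)\, p(x) \\
  p(\theta)        &= \int q(\theta \mid x)\, p(x)\, \mathrm{d}x \\
  p(x \mid \theta) &= \frac{q(\theta \mid x)\, p(x)}{p(\theta)}
\end{align}
```

The last line is the p(x|theta) that, as the comment notes, may conflict with the likelihood you would have written down directly.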