2020-09-24

next-order term in k nearest neighbors

Today between zoom (tm) calls, Adrian Price-Whelan (Flatiron) and I discussed the following problem with the k nearest neighbors algorithm: If your target point is near an edge or density gradient in your sample, and the quantity you are predicting (your label) also has a gradient locally, then the “center of mass” of your neighbors will be offset from your target point, and their average or mean label will be inappropriate for your target point.
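
Here's a toy numpy illustration of the failure mode (my own construction; the sample, the label, and k=32 are made up for the sake of the example):

```python
import numpy as np

rng = np.random.default_rng(17)
x = rng.uniform(0.0, 1.0, size=1000)   # sample has a hard edge at x = 0
y = 2.0 * x                            # label has a local gradient
x_star = 0.0                           # target point sits right on the edge

k = 32
idx = np.argsort(np.abs(x - x_star))[:k]   # plain k nearest neighbors
print(y[idx].mean())   # biased high: the neighbors' center of mass is at x > 0
print(2.0 * x_star)    # the true label at the target point is 0
```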

No problem: Instead of taking a mean of your nearest neighbors, fit a linear plane to your neighbors (label as a function of position) and interpolate or evaluate that fit at the target point. Awesome! This is (a simplification of) what Adam Wheeler (Columbia) and I did in our X-enhanced stars paper. The only problem is: We are doing nearest neighbors inside an optimization loop, and we need it to be very fast.
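
A minimal sketch of that plane fit, assuming numpy and scipy (the function name and k are placeholders, not from the paper):

```python
import numpy as np
from scipy.spatial import cKDTree

def local_linear_predict(tree, X, y, x_star, k=32):
    """Fit a plane (linear model) to the k nearest neighbors of x_star
    and evaluate that fit at x_star itself."""
    _, idx = tree.query(x_star, k=k)
    dX = X[idx] - x_star                    # neighbor offsets from the target
    # Design matrix with an intercept column; because the offsets vanish
    # at the target point, the intercept *is* the interpolated prediction.
    A = np.hstack([np.ones((k, 1)), dX])
    beta, *_ = np.linalg.lstsq(A, y[idx], rcond=None)
    return beta[0]

# tree = cKDTree(X) would be built once, outside the optimization loop;
# X: (n, d) positions, y: (n,) labels, x_star: (d,) target point.
```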

Price-Whelan's and my solution: Compute the k-neighbor center-of-mass position, project all target–neighbor offset vectors onto the target–center-of-mass vector, and then do a linear regression in that one projected dimension, evaluating the fit at the target point. This works! It's fast, and (when you work through all the linear algebra) it looks like a really strange kind of weighted mean. Price-Whelan implemented this and it improved our results immediately. Yay!
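
Here's how I'd sketch that one-dimensional version (invented names; Price-Whelan's actual implementation may differ in details):

```python
import numpy as np

def projected_1d_predict(Xk, yk, x_star):
    """Xk: (k, d) neighbor positions, yk: (k,) labels, x_star: (d,) target.
    Project the neighbor offsets onto the direction toward their center of
    mass, fit a straight line in that single coordinate, and read off the
    intercept, which is the prediction at the target (s = 0)."""
    offsets = Xk - x_star                        # target-to-neighbor vectors
    com = offsets.mean(axis=0)                   # target-to-center-of-mass vector
    s = offsets @ (com / np.linalg.norm(com))    # one scalar coordinate per neighbor
    A = np.stack([np.ones_like(s), s], axis=1)   # fit yk = a + b * s
    (a, b), *_ = np.linalg.lstsq(A, yk, rcond=None)
    return a                                     # value of the fit at s = 0
```

Because the fit is linear least squares, the prediction is a fixed linear combination of the neighbor labels yk, with coefficients set only by the projected coordinates s; that's why it ends up looking like a strange weighted mean.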
