My loyal reader knows that I have been working on the fundamentals of linear regression for a bit now. Today I did some writing on this topic. Last week, Soledad Villar (JHU) and I got to the point where we could write down a specific case in which the limit of infinite features in a particular, carefully designed linear regression becomes exactly a Gaussian Process with a particular, carefully chosen kernel. I understand how to generalize this result partially, but not completely: apparently it will work in an infinity of different bases, with an infinity of different weighting functions or kernels. My goal is to write something pedagogical and useful for practitioners.
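As a concrete illustration of the kind of limit at stake, here is one well-known instance (not necessarily the construction Soledad and I wrote down): random Fourier features. With features phi_j(x) = sqrt(2/p) cos(omega_j x + b_j), omega_j drawn from a standard normal and b_j uniform on (0, 2 pi), the implied kernel sum_j phi_j(x) phi_j(x') converges to the squared-exponential kernel exp(-(x - x')^2 / 2) as the number of features p goes to infinity (Rahimi & Recht 2007). A minimal numerical sketch:

```python
# One known instance of the infinite-features limit: random Fourier
# features, whose finite-p kernel converges to the squared-exponential
# (RBF) kernel as p grows (Rahimi & Recht 2007). Illustrative only.
import numpy as np

rng = np.random.default_rng(1)
x, xp = 0.3, 1.1  # two scalar input locations

for p in (10, 1000, 100_000):
    omega = rng.normal(size=p)                   # frequencies ~ N(0, 1)
    b = rng.uniform(0.0, 2.0 * np.pi, size=p)    # phases ~ U(0, 2 pi)

    def phi(t):
        # feature map: phi_j(t) = sqrt(2/p) cos(omega_j t + b_j)
        return np.sqrt(2.0 / p) * np.cos(omega * t + b)

    k_hat = phi(x) @ phi(xp)                     # finite-feature kernel estimate
    k_rbf = np.exp(-0.5 * (x - xp) ** 2)         # infinite-feature limit
    print(p, k_hat, k_rbf)                       # k_hat -> k_rbf as p grows
```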
You don't even need infinite bases if you're doing Bayesian linear regression. Put a Gaussian prior on the regression coefficients and then marginalize out the coefficients. You get the "dot product" covariance function (section 4.2.2 of Rasmussen and Williams) Gaussian process as the prior on your unknown function, and you can update it with the usual GP equations to recover simple linear regression.
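A minimal numerical check of that claim (the prior covariance and data here are illustrative): with a Gaussian prior w ~ N(0, Sigma_p) on the coefficients, marginalizing w out of f(x) = x^T w gives Cov(f(x), f(x')) = x^T Sigma_p x', the dot-product kernel, and the GP posterior mean matches the Bayesian linear regression posterior mean exactly.

```python
# Check: Bayesian linear regression with prior w ~ N(0, Sigma_p) gives the
# same predictions as a GP with the dot-product kernel k(x, x') = x^T Sigma_p x'
# (Rasmussen & Williams, section 4.2.2). Data and prior are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, d, noise = 10, 3, 0.1
X = rng.normal(size=(n, d))                       # training inputs (rows)
y = X @ np.array([1.0, -2.0, 0.5]) + noise * rng.normal(size=n)
Xs = rng.normal(size=(5, d))                      # test inputs
Sigma_p = np.eye(d)                               # prior covariance on weights

# Bayesian linear regression posterior mean (R&W eq. 2.8):
# A = X^T X / sigma^2 + Sigma_p^{-1},  w_mean = A^{-1} X^T y / sigma^2
A = X.T @ X / noise**2 + np.linalg.inv(Sigma_p)
w_mean = np.linalg.solve(A, X.T @ y) / noise**2
blr_mean = Xs @ w_mean

# GP regression with the dot-product kernel:
K = X @ Sigma_p @ X.T                             # train-train covariance
Ks = Xs @ Sigma_p @ X.T                           # test-train covariance
gp_mean = Ks @ np.linalg.solve(K + noise**2 * np.eye(n), y)

print(np.allclose(blr_mean, gp_mean))             # True: the two agree
```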