I spent the day in an undisclosed location, working on linear regression. That is, I was working on the notation and description of linear regression for my opus on machine learning in astronomy. Midway through getting it all together, I started to lose faith that I even know what linear regression is, or that what machine learners call linear regression is what I call linear regression! But in my undisclosed location, I don't have a copy of Bishop!
I do have Wikipedia, however, and I spent some time there, reading different descriptions and learning new applications of the kernel trick. I think of it as some kind of “lifting” of the problem to a (far) larger space, but it can also be seen as a redefinition of “proximity” or “similarity” in the data space. That makes sense, because (at base) the kernel trick is a redefinition of the dot product. Stuff to think about, and relevant to many machine-learning methods. In particular, when you apply it to linear regression, you get (more or less) the Gaussian Process.
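To make the two views concrete, here is a minimal sketch in Python, assuming a toy two-dimensional data space and the homogeneous quadratic kernel (my choice for illustration, not anything specific to the regression problem above). The function names `lift` and `quadratic_kernel` are hypothetical. The point is only that a dot product taken in the lifted space matches a redefined dot product taken in the original space.

```python
import math


def dot(u, v):
    """Ordinary dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def lift(x):
    """Explicit 'lifting' of a 2-D point into a 3-D feature space.

    This particular map is the one that corresponds to the
    homogeneous quadratic kernel k(x, y) = (x . y)**2.
    """
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def quadratic_kernel(x, y):
    """Redefined 'dot product', computed without ever lifting."""
    return dot(x, y) ** 2


# Two arbitrary points in the original 2-D data space.
x = (1.0, 2.0)
y = (3.0, -1.0)

lifted_value = dot(lift(x), lift(y))    # dot product in the larger space
kernel_value = quadratic_kernel(x, y)   # same quantity, original space

# The two values agree up to floating-point rounding.
print(lifted_value, kernel_value)
```

For kernels like the squared exponential, the lifted space is infinite-dimensional, so only the second route is available in practice; that is presumably why the trick is useful.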