2020-07-30

bias in over-parameterized settings

In a long conversation, Soledad Villar (NYU) and I worked out expectation values for various kinds of what statisticians call “risk” for ordinary least-squares (OLS) regression. The risk is a kind of expected mean-squared error, and unfortunately much of the literature is ambiguous about what the expectation is taken over. That is, an expectation is an integral. Integral over what? So we worked some of that out, and then re-derived a known result: Ordinary least squares is unbiased (under assumptions, and under definitions of bias in terms of expectations) when the number of data points is larger than the number of free parameters, and it is biased when the number of data points is smaller than the number of free parameters.
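Here is a quick numerical check of that bias claim (my sketch, not part of our derivation; it assumes a Gaussian random design, Gaussian noise, and the pseudoinverse as the min-norm least-squares solver). Averaged over many draws, the estimate recovers the true parameter when n > p, but is shrunk toward zero when n < p, because the min-norm fit can only see the component of the truth lying in the row space of the design.

```python
# Sketch: bias of (min-norm) OLS in the n > p and n < p regimes.
# Assumed setup: Gaussian design, Gaussian noise, one nonzero true parameter.
import numpy as np

rng = np.random.default_rng(0)

def mean_estimate(n, p, trials=2000, sigma=0.1):
    """Average the min-norm OLS estimate of the first parameter over many draws."""
    theta_true = np.zeros(p)
    theta_true[0] = 1.0                       # one nonzero true parameter
    total = 0.0
    for _ in range(trials):
        X = rng.standard_normal((n, p))
        y = X @ theta_true + sigma * rng.standard_normal(n)
        total += (np.linalg.pinv(X) @ y)[0]   # pinv gives the min-norm solution
    return total / trials

print(mean_estimate(n=50, p=10))   # close to 1.0: unbiased when n > p
print(mean_estimate(n=10, p=50))   # roughly n / p = 0.2: biased when n < p
```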

If you are shocked that we are considering such cases (fewer data points than parameters! Blasphemy!) then you haven't been paying attention: In linear regression, the mean-squared error (for out-of-sample test data) for OLS generically decreases as the number of parameters grows far beyond the number of data points. Everything we were taught in school is wrong! Of course, in order to find this result, you have to define least squares so that it has a well-defined solution; that solution is the min-norm solution: Among all parameter vectors that fit the training data exactly, take the one that minimizes the sum of squares of the parameters (the minimum Euclidean norm). That breaks the degeneracies you might be fearing.
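And here is a small simulation in the same spirit (again my sketch, with assumed ingredients: n = 40 training points, a 400-dimensional latent signal, isotropic Gaussian features, and fits that use only the first p features, via the pseudoinverse). The test error peaks near p = n and then keeps falling as p grows far past n; the exact numbers depend on the setup.

```python
# Sketch: test MSE of min-norm least squares versus number of fitted parameters p,
# at fixed training-set size n. Assumed setup: isotropic Gaussian features, a
# unit-norm latent signal in 400 dimensions, fits using only the first p features.
import numpy as np

rng = np.random.default_rng(1)
n, n_test, D, sigma = 40, 2000, 400, 0.1
theta = rng.standard_normal(D)
theta /= np.linalg.norm(theta)               # unit-norm latent signal

X_train = rng.standard_normal((n, D))
X_test = rng.standard_normal((n_test, D))
y_train = X_train @ theta + sigma * rng.standard_normal(n)
y_test = X_test @ theta + sigma * rng.standard_normal(n_test)

for p in (10, 20, 35, 40, 45, 80, 200, 400):
    # pinv gives ordinary least squares for p < n and the minimum-norm
    # interpolating solution for p > n
    beta = np.linalg.pinv(X_train[:, :p]) @ y_train
    mse = np.mean((X_test[:, :p] @ beta - y_test) ** 2)
    print(f"p={p:4d}  test MSE={mse:.3f}")
```

The error blows up at the interpolation threshold p = n = 40 and then decreases monotonically as p grows, ending (at p = 400) below the best value attainable in the under-parameterized regime for this setup.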
