2021-04-08

best setting of hyper-parameters

Adrian Price-Whelan (Flatiron) and I encountered an interesting conceptual point today in our distance estimation project: When you are doing cross-validation to set your hyper-parameters (a regularization strength in this case), what do you use as your validation scalar? That is, what are you optimizing? We started by naively optimizing the full cost function evaluated on the held-out data, which is something like a weighted L2 norm of the residuals plus an L2 norm of the parameters. But then we switched from the full cost function to just the data part (not the regularization part) of the cost function, and everything changed! The point is obvious (duh!) when you think about it from a Bayesian perspective: The data part is like a log-likelihood and the regularization part is like a log-prior, so the full cost function is like a log-posterior pdf. In validation, you want to improve the likelihood, not the posterior pdf. That's another nice point for my non-existent paper on the difference between a likelihood and a posterior pdf. It also shows that, in general, the data and the regularization will be at odds.
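Here is a minimal sketch of the two validation scalars, not the code from the distance project. It assumes a linear model with an L2 (ridge) penalty, synthetic Gaussian data, and 5-fold cross-validation; every name in it (`fit_ridge`, `data_term`, `lams`, and so on) is hypothetical. It scores each regularization strength two ways on the held-out points: with the data term alone, and with the full cost including the penalty.

```python
import numpy as np

def fit_ridge(X, y, ivar, lam):
    """Minimize sum(ivar * (y - X @ w)**2) + lam * sum(w**2) over w."""
    A = X.T @ (ivar[:, None] * X) + lam * np.eye(X.shape[1])
    b = X.T @ (ivar * y)
    return np.linalg.solve(A, b)

def data_term(X, y, ivar, w):
    """Weighted L2 of the residuals (-2 log-likelihood, up to a constant)."""
    return np.sum(ivar * (y - X @ w) ** 2)

# Synthetic stand-in data (an assumption; not the data from the project).
rng = np.random.default_rng(42)
n, p, sigma = 60, 20, 2.0
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + sigma * rng.normal(size=n)
ivar = np.full(n, 1.0 / sigma**2)  # inverse variances used as weights

lams = np.logspace(-3, 3, 13)                  # trial regularization strengths
folds = np.array_split(rng.permutation(n), 5)  # 5 disjoint held-out sets
data_scores = np.zeros(len(lams))
full_scores = np.zeros(len(lams))

for i, lam in enumerate(lams):
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        w = fit_ridge(X[train], y[train], ivar[train], lam)
        d = data_term(X[held_out], y[held_out], ivar[held_out], w)
        data_scores[i] += d                        # likelihood-like scalar
        full_scores[i] += d + lam * np.sum(w**2)   # posterior-like scalar

best_data = lams[np.argmin(data_scores)]
best_full = lams[np.argmin(full_scores)]
print("best lambda, data term only:", best_data)
print("best lambda, full cost:     ", best_full)
```

The two scalars need not pick the same regularization strength, because the full cost adds a penalty term that itself depends on the strength being tuned. How different the answers are will depend on the data and the model.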
