2018-10-18

convexity in machine learning

Thursdays are low-research! But there was a great NYU Physics Colloquium at the end of the day by Eric Vanden-Eijnden (NYU) about the mathematical properties of neural networks. I would say “deep learning”, but in fact the networks that are most amenable to mathematical analysis are shallow and wide.

I am not sure I fully understood EVE's talk, but if I did, he can show the following: Although the optimization of the network (a shallow but wide, fully connected logistic network, maybe) is in no sense convex in the weights, and although the model is non-identifiable, with certain (or any?) convex loss functions, and with enough data (maybe), the optimization of the loss is convex in the model's approximation to the function it is trying to emulate. That is, it is convex in function space even though it isn't in parameter space.
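To make that distinction concrete, here is a toy numpy sketch of my own (not anything shown in the talk): a tiny shallow tanh network with a squared loss, where the loss is trivially convex in the predicted values but, because of the permutation symmetry of the hidden units, demonstrably non-convex in the weights. The width, weights, and target below are made-up illustration values.

```python
import numpy as np

# A toy, one-dimensional version of the setup: a shallow, wide-ish, fully
# connected net f(x) = sum_k c_k tanh(w_k x), fit with a squared loss.
# The squared loss is trivially convex in the predicted values; the
# permutation symmetry of the hidden units shows it is not convex in the
# weights.

rng = np.random.default_rng(17)
K = 8                                    # number of hidden units
x = np.linspace(-2.0, 2.0, 64)           # toy 1-d inputs

def predict(w, c):
    return np.tanh(np.outer(x, w)) @ c   # one tanh layer, linear readout

w0, c0 = rng.normal(size=K), rng.normal(size=K)
y = predict(w0, c0)                      # target = the function at (w0, c0)
loss = lambda w, c: np.mean((predict(w, c) - y) ** 2)

# A second weight setting that represents the *same* function: permute the
# hidden units.  Both ends of the weight-space chord then have (essentially)
# zero loss, but the midpoint is a different function with strictly positive
# loss, so the loss is not convex along this line in parameter space.
perm = np.roll(np.arange(K), 1)
w1, c1 = w0[perm], c0[perm]
print(loss(w0, c0), loss(w1, c1), loss(0.5 * (w0 + w1), 0.5 * (c0 + c1)))
```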

If anything even close to this is true, it is extremely important: Can an optimization be non-convex in the parameter space of a function but convex in the function space? I am sure there are trivial examples, but are there non-trivial ones? This might relate to things I have wondered about previously regarding bi-linear models and the like.
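One trivial example (my own construction, and essentially the bi-linear case): take a loss that is convex in the model's single output, and parameterize that output as a product of two numbers.

```python
def loss(a, b):
    # convex in the model output m = a * b, since L(m) = (m - 1)^2,
    # but not convex in the parameters (a, b)
    return (a * b - 1.0) ** 2

p0 = (2.0, 0.5)       # a*b = 1, loss = 0
p1 = (-2.0, -0.5)     # a*b = 1, loss = 0
mid = tuple(0.5 * (u + v) for u, v in zip(p0, p1))   # (0, 0), so a*b = 0

# For a convex function the midpoint value is at most the mean of the
# endpoint values; here it is strictly larger, so the loss is not convex
# in (a, b).  In output space the same chord runs from m = 1 to m = 1,
# and the loss is zero all along it.
print(loss(*p0), loss(*p1), loss(*mid))   # -> 0.0 0.0 1.0
```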
