2012-08-29

analytic derivatives

On a packing-and-travel-saturated day, research accomplishment was thin, although on the plane I took the first steps towards putting analytic derivatives into our high-resolution Herschel dust emissivity model. The derivatives should enormously speed optimization, if Fergus and Krishnan are to be believed.

1 comment:

  1. Advice I often give to students (probably not news to you):

    Make sure your gradient computations aren't costing more than they should. Consider any scalar function of D variables that has cost C to evaluate. There is theory that says you can get all D derivatives for a cost of less than about 5C; the cost is not O(CD), as it would be if you numerically perturbed each input in turn. The key is to "backpropagate" the derivatives, as is commonly done in neural networks; this is reverse-mode automatic differentiation (sketched at the end of this comment).

    Gradient-based optimizers are much better than things like Nelder–Mead. You can get a whole tangent hyperplane for about the cost of evaluating the scalar cost, so it would be silly not to use this high-dimensional information.

    Finite differences should be avoided, both for the O(CD) cost and because they can be numerically unstable. However, always test your gradient code against finite differences (also sketched below): optimizers get very confused if your gradients are wrong, and gradient code is easy to mess up.
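
    Here is a minimal sketch of the cheap-gradient point in Python, with a made-up toy cost (a log-sum-exp of a linear map, standing in for whatever the real emissivity likelihood is). The reverse pass reuses the forward-pass intermediates, so the gradient costs about one extra matrix-vector product rather than D extra cost evaluations, and it can be handed straight to a gradient-based optimizer:

        import numpy as np
        from scipy.optimize import minimize

        rng = np.random.RandomState(42)
        D, N = 50, 200                 # number of parameters, number of "data" points
        A = rng.randn(N, D)            # made-up design matrix
        b = rng.randn(N)               # made-up data

        def cost(x):
            # toy scalar cost: f(x) = log sum_i exp((A x - b)_i)
            z = A @ x - b
            return np.logaddexp.reduce(z)

        def cost_and_grad(x):
            # forward pass, keeping the intermediates the reverse pass needs
            z = A @ x - b
            f = np.logaddexp.reduce(z)
            # reverse pass: push df/df = 1 back through the computation
            w = np.exp(z - f)          # df/dz, the softmax weights
            g = A.T @ w                # df/dx: one extra matvec, not D extra cost calls
            return f, g

        # gradient-based optimization; jac=True tells scipy the callable returns (f, grad)
        res = minimize(cost_and_grad, x0=np.zeros(D), jac=True, method="L-BFGS-B")
        print(res.fun, res.nit)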
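
    And the finite-difference check, using central differences against the cost and cost_and_grad defined above; the agreement should be down at the level of floating-point noise:

        # check the analytic gradient against central finite differences
        x0 = rng.randn(D)
        _, g_analytic = cost_and_grad(x0)
        eps = 1e-6
        g_fd = np.array([(cost(x0 + eps * e) - cost(x0 - eps * e)) / (2.0 * eps)
                         for e in np.eye(D)])
        print(np.max(np.abs(g_analytic - g_fd)))   # should be roughly 1e-8 or smaller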
