2015-12-02

optimization, clustering in abundance space

In a very short research day, I thought about whether it is better to run the E-M algorithm or to take derivatives and optimize with an industrial optimizer, when the optimization is of a marginalized likelihood. The standard practice in the business is E-M, but it isn't clear to me that this would beat taking the derivatives. It is expensive to take derivatives, but E-M is usually slow. I realized I need to know more about these matters.
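A minimal sketch of that comparison, assuming a toy one-dimensional, two-component Gaussian mixture as a stand-in for a marginalized likelihood (the latent component assignments are what get marginalized out). The data, starting point, and function names here are all made up for illustration, and iteration counts from a toy like this say little about real problems:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

rng = np.random.default_rng(42)
# Hypothetical toy data drawn from two Gaussians.
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.5, 700)])

def log_joint(w, mu, sigma):
    """log[w_k N(x_n | mu_k, sigma_k)] as an (N, 2) array, plus the z-scores."""
    z = (x[:, None] - mu) / sigma
    logp = (np.log([w, 1.0 - w]) - 0.5 * z**2
            - np.log(sigma) - 0.5 * np.log(2.0 * np.pi))
    return logp, z

def neg_log_like_and_grad(theta):
    """Negative marginalized log-likelihood and its analytic gradient.

    theta = [logit(w), mu_0, mu_1, ln(sigma_0), ln(sigma_1)] is unconstrained.
    """
    w = 1.0 / (1.0 + np.exp(-theta[0]))
    mu, sigma = theta[1:3], np.exp(theta[3:5])
    logp, z = log_joint(w, mu, sigma)
    lnl = logsumexp(logp, axis=1)
    r = np.exp(logp - lnl[:, None])  # responsibilities
    grad = np.concatenate([
        [np.sum(r[:, 0] - w)],               # d lnL / d logit(w)
        np.sum(r * z / sigma, axis=0),       # d lnL / d mu_k
        np.sum(r * (z**2 - 1.0), axis=0),    # d lnL / d ln(sigma_k)
    ])
    return -np.sum(lnl), -grad

def run_em(w, mu, sigma, tol=1e-8, max_iter=1000):
    """Plain E-M; stops when the log-likelihood improves by less than tol."""
    prev = np.inf
    for it in range(1, max_iter + 1):
        logp, _ = log_joint(w, mu, sigma)
        lnl = logsumexp(logp, axis=1)
        nll = -np.sum(lnl)
        if prev - nll < tol:
            break
        prev = nll
        r = np.exp(logp - lnl[:, None])      # E-step
        nk = r.sum(axis=0)                   # M-step below
        w = nk[0] / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return nll, it

# Same (arbitrary) starting point for both methods.
theta0 = np.array([0.0, -1.0, 1.0, 0.0, 0.0])
result = minimize(neg_log_like_and_grad, theta0, jac=True, method="BFGS")
nll_em, n_em = run_em(0.5, np.array([-1.0, 1.0]), np.array([1.0, 1.0]))

print("BFGS:", result.fun, "after", result.nit, "iterations")
print("E-M: ", nll_em, "after", n_em, "iterations")
```

Iteration counts are not directly comparable, since each BFGS iteration can involve several function and gradient evaluations; wall-clock time or total likelihood evaluations would be a fairer measure.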

In conversation with Ness about abundance space, we realized that we should do some exploring to see if the abundances returned by The Cannon show more structure than those returned by other pipelines acting on the APOGEE data. I suggested t-SNE for exploratory data analysis, and also looking at random projections. Although we don't yet know whether we have novel or informative structure in abundance space, the open clusters look like they do stand out as islands!
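A rough sketch of what that exploration could look like, assuming the scikit-learn implementations of t-SNE and Gaussian random projection. The data here are synthetic (a broad "field" population plus three tight "clusters" in a made-up 15-dimensional space), not real APOGEE or Cannon abundances, and the perplexity and number of projections are arbitrary choices:

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.random_projection import GaussianRandomProjection

rng = np.random.default_rng(0)
n_dim = 15  # hypothetical number of abundance dimensions

# Synthetic stand-in data: broad field stars plus three tight clusters.
field = rng.normal(0.0, 0.2, size=(300, n_dim))
centers = rng.normal(0.0, 0.2, size=(3, n_dim))
clusters = np.concatenate(
    [c + rng.normal(0.0, 0.02, size=(30, n_dim)) for c in centers]
)
X = np.concatenate([field, clusters])
# 0 = field, 1..3 = cluster membership (useful for coloring a scatter plot).
labels = np.concatenate([np.zeros(300, dtype=int), np.repeat([1, 2, 3], 30)])

# Nonlinear embedding into two dimensions with t-SNE.
tsne_xy = TSNE(n_components=2, perplexity=30.0, random_state=0).fit_transform(X)

# Several independent linear random projections into two dimensions.
projections = [
    GaussianRandomProjection(n_components=2, random_state=seed).fit_transform(X)
    for seed in range(5)
]
```

Scatter-plotting `tsne_xy` and each entry of `projections`, colored by `labels`, would show whether the clusters separate from the field in each view.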

3 comments:

  1. You may already know about these:
    http://www.cs.nyu.edu/~roweis/papers/emecgicml03.pdf
    http://www.cs.nyu.edu/~roweis/papers/uai2003draft.pdf

  2. I should know these, of course, but didn't. Thank you!

  3. As for t-SNE and random projections, I just stumbled onto this lib, which might be of use: https://github.com/clementfarabet/manifold , although the scikit-learn implementations might be sufficient.
