translation, geometry, deep learning, fast GPs

The Friday-morning parallel-working session was brief today. I talked to Shiloh Pitt (NYU) about verifying matrix identities numerically. Then we went downstairs for a mini-workshop on physics and data science at NYU CCPP, organized by Kyle Cranmer (NYU) and Glennys Farrar (NYU).
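Here is a minimal sketch of one way to verify a matrix identity numerically, assuming the simplest approach: draw random matrices, evaluate both sides, and compare them to a tolerance. The identities checked here, (AB)ᵀ = BᵀAᵀ and tr(AB) = tr(BA), are illustrative choices, not the ones Pitt and I discussed, and the helper names (`check_identity`, `random_matrix`, and so on) are hypothetical.

```python
import random

def matmul(A, B):
    """Multiply an n x k matrix A by a k x m matrix B (both stored as lists of rows)."""
    return [[sum(A[i][p] * B[p][j] for p in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def random_matrix(n, m, rng):
    """n x m matrix with independent standard-normal entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(n)]

def max_abs_diff(X, Y):
    """Largest elementwise discrepancy between two scalars or two same-shaped matrices."""
    if isinstance(X, (int, float)):
        return abs(X - Y)
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

def check_identity(lhs, rhs, make_inputs, trials=20, tol=1e-9, seed=0):
    """Evaluate both sides on random inputs; return (worst discrepancy, passed?)."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        args = make_inputs(rng)
        worst = max(worst, max_abs_diff(lhs(*args), rhs(*args)))
    return worst, worst < tol

if __name__ == "__main__":
    # (AB)^T = B^T A^T for rectangular A (3 x 4) and B (4 x 2): should pass.
    rect = lambda rng: (random_matrix(3, 4, rng), random_matrix(4, 2, rng))
    print(check_identity(lambda A, B: transpose(matmul(A, B)),
                         lambda A, B: matmul(transpose(B), transpose(A)), rect))
    # tr(AB) = tr(BA), even though AB is 3 x 3 and BA is 4 x 4: should pass.
    pair = lambda rng: (random_matrix(3, 4, rng), random_matrix(4, 3, rng))
    print(check_identity(lambda A, B: trace(matmul(A, B)),
                         lambda A, B: trace(matmul(B, A)), pair))
    # A deliberately wrong "identity", (AB)^T = A^T B^T: should fail.
    square = lambda rng: (random_matrix(3, 3, rng), random_matrix(3, 3, rng))
    print(check_identity(lambda A, B: transpose(matmul(A, B)),
                         lambda A, B: matmul(transpose(A), transpose(B)), square))
```

A random check like this can't prove an identity, but it is very good at catching a wrong one, and the tolerance has to be loose enough to absorb floating-point round-off (tr(AB) and tr(BA) are summed in different orders, so they agree only to about machine precision).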

Cranmer led off with an informal discussion of the different languages used in statistics, computer science, applied math, and physics. Lots of words are used differently in these fields, or trigger different associations. He mentioned “bias” and “correlation”, and the uses or meanings of graphs and flowcharts. More such words came up during the talks. One subtle difference is that data scientists think of a data record as a point in data space (so, say, an image is a point in image space). That isn't always natural for physicists.
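To make the "image is a point in image space" idea concrete, here is a small sketch, assuming a toy 3 × 4 grayscale image stored as a list of rows: flattening it gives a vector in a 12-dimensional space, and then ordinary Euclidean distance between images is defined. The function names are hypothetical. It also shows why that naive distance is not always what a physicist wants: the same star moved by one pixel ends up farther from the original than a blank image does.

```python
import math

def flatten(image):
    """Turn an h x w image (list of rows) into a single vector of length h*w."""
    return [pixel for row in image for pixel in row]

def distance(image_a, image_b):
    """Euclidean distance between two same-shaped images, treated as points in R^(h*w)."""
    a, b = flatten(image_a), flatten(image_b)
    if len(a) != len(b):
        raise ValueError("images must have the same shape")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def shift_right(image):
    """Circularly shift every row one pixel to the right."""
    return [row[-1:] + row[:-1] for row in image]

star = [[0, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0]]
blank = [[0] * 4 for _ in range(3)]

print(len(flatten(star)))                 # 12: the dimension of this image space
print(distance(star, shift_right(star)))  # sqrt(2): same star, moved one pixel
print(distance(star, blank))              # 1.0: no star at all
```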

Joan Bruna (NYU) gave a nice talk about the geometric properties of deep learning, keying off the success of convolutional neural networks. He said many interesting and insightful things; here are a few that stuck with me. The convolutional symmetry at small scales in image space helps the network find a distance metric (or something like that) between images that respects symmetries or structure that is really there. And it does so tractably, in reasonable time. He claims that any compact symmetry group can be incorporated: That is, deep-learning models can be made to exactly respect any symmetry with certain properties. That's very exciting for physical applications. Distances between nodes on a graph also define a geometry, and it can be extremely different from the geometry of simple manifolds! But the same ideas apply: If there are symmetries, deep-learning algorithms can respect them.
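Bruna's results cover much more general groups, but the simplest instance of the idea can be sketched in a few lines, assuming a 1D periodic signal and the cyclic translation group: convolution commutes with translation (equivariance), and summing over positions after a pointwise nonlinearity gives a feature that doesn't change at all under translation (invariance). The signal, kernel, and function names below are arbitrary illustrative choices, not anything from the talk.

```python
def circular_conv(signal, kernel):
    """Periodic 1D convolution: out[i] = sum_k kernel[k] * signal[(i - k) mod n]."""
    n = len(signal)
    return [sum(kernel[k] * signal[(i - k) % n] for k in range(len(kernel)))
            for i in range(n)]

def shift(signal, s):
    """Cyclic translation by s samples: out[i] = signal[(i - s) mod n]."""
    n = len(signal)
    return [signal[(i - s) % n] for i in range(n)]

def relu(values):
    return [max(0, v) for v in values]

def pooled_feature(signal, kernel):
    """Convolve, apply a pointwise nonlinearity, then sum over all positions."""
    return sum(relu(circular_conv(signal, kernel)))

x = [0, 1, 3, 2, 0, 0, -1, 0]  # an arbitrary integer "image row"
h = [1, -1]                    # an arbitrary small kernel (a finite difference)

for s in range(len(x)):
    # Equivariance: translating the input translates the feature map by the same amount.
    assert circular_conv(shift(x, s), h) == shift(circular_conv(x, h), s)
    # Invariance: after global pooling, the translation has no effect at all.
    assert pooled_feature(shift(x, s), h) == pooled_feature(x, h)
print("equivariance and invariance hold for every cyclic shift")
```

Integers are used so the equality checks are exact; with floats, the pooled sums would agree only to round-off because the terms are added in a different order.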

Life intervened! But by the end of the day I made it to Flatiron to see a talk by Dan Foreman-Mackey (Flatiron) about data science, interdisciplinarity, open science, and finding planets around other stars. He gave a lot of credit to his interdisciplinary collaborations, and he mentioned the same kinds of translation issues that Cranmer opened with at NYU. On the technical side, he showed his Gaussian-process methods and code, and the linear scaling they deliver. As I like to say: If you are doing linear algebra faster than N-squared (and he is, by far), then you can't even represent your matrices. That is, just writing down the N-by-N matrix is already N-squared work. After his talk, the Flatiron applied mathematicians were in heated arguments about exactly why (in a math sense) his methods are so fast. Foreman-Mackey's code is making things possible in astrophysics that have never been possible before.
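To see how a Gaussian-process likelihood can cost order N without ever building the matrix, here is a sketch of the simplest case, which is an assumption on my part and not Foreman-Mackey's algorithm (his methods cover a broader class of kernels): a single exponential kernel k(τ) = a exp(−c|τ|) in one dimension, with sorted, noise-free observations. That kernel is Markov, so the joint density factors into one-step conditionals, each costing O(1). A dense O(N³) Cholesky version is included only as a check; note that it spends N² work just filling in K. All names are hypothetical.

```python
import math

def ou_loglike_linear(t, y, amp, rate):
    """O(N) log-likelihood of y(t) under a zero-mean GP with kernel amp * exp(-rate * |dt|).

    Assumes t is strictly increasing and there is no observation noise.
    Uses the Markov property of this kernel: y_i depends on the past only through y_{i-1}.
    """
    def log_normal(x, mean, var):
        return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

    total = log_normal(y[0], 0.0, amp)             # marginal density of the first point
    for i in range(1, len(t)):
        phi = math.exp(-rate * (t[i] - t[i - 1]))  # correlation with the previous point
        total += log_normal(y[i], phi * y[i - 1], amp * (1.0 - phi * phi))
    return total

def dense_loglike(t, y, amp, rate):
    """Reference O(N^3) computation: build the full N x N matrix and Cholesky-factor it."""
    n = len(t)
    K = [[amp * math.exp(-rate * abs(t[i] - t[j])) for j in range(n)] for i in range(n)]
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    z = []                                         # forward-substitute L z = y
    for i in range(n):
        z.append((y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (sum(v * v for v in z) + log_det + n * math.log(2.0 * math.pi))

t = [0.0, 0.3, 0.7, 1.5, 2.0, 3.1, 3.15, 5.0]
y = [0.5, -0.2, 0.1, 1.3, 0.9, -0.4, -0.35, 0.0]
print(ou_loglike_linear(t, y, 2.0, 1.0), dense_loglike(t, y, 2.0, 1.0))
```

The two numbers should agree to round-off. Adding white observation noise breaks the simple Markov factorization used here; handling that case still takes only order-N work, but it needs more machinery (for example a Kalman filter) than this sketch shows.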


  1. We also found the linear scaling surprising so I’m glad you were arguing about it.

  2. @hogg: where's that "near" in "near-linear" coming from!?! It's just straight up linear because the overhead is negligible for anything bigger than N~100 :-)

    @gizis: the mathematicians weren't arguing about that (hogg just likes to cause trouble) - they were actually discussing which was more awesome: the fact that we get linear scaling for this class of kernels or that this kernel is a *really good model* of stars!

  3. @dfm: Deleted the "near". But I stand by my mathematicians comment; I was a witness!