Camp Hogg (which includes Muandet these days) had lunch with David Blei (Princeton), who is a computer scientist and machine-learning expert. He told us about projects he is doing to index and provide recommendations for arXiv papers, based (presumably) on his experience with author–topic modeling. Blei is a kindred spirit, because he favors methods that have an underlying graphical model or probabilistic generative model. We agreed that this is beneficial, because it moves the decision making from "what algorithm should we use?" to more scientific questions like "what is causing our noise?" and "what aspects of the problem depend on what other aspects?" These scientific questions lay the assumptions and domain-knowledge input bare.
We talked about the value of having arXiv indexing, how automated paper recommendations might be used, what things could cause users to love or hate it, and what kinds of external information might be useful. We mentioned Twitter. Blei noted that any time that you have a set of user bibliographies (that is, the lists of papers the users care about or use), those bibliographies can help inform a model of what the papers are about. For example, a paper might be in the statistics literature, and have only statistics words in it, but in fact be highly read by physicists. That is an indicator that the paper's subject matter spills into physics, in some very real sense. One of Blei's interests is finding influential interdisciplinary papers by methods like these. And the nice thing is that external forums like Twitter, Facebook (gasp), and user histories at the arXiv effectively provide such bibliographies.
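To make the bibliography idea concrete, here is a toy sketch. It is emphatically not Blei's model (which we did not see in any detail); it is just a counting heuristic. All of the names and data here are made up for illustration: `paper_text_field` stands in for whatever field a text-only model would assign to a paper, and `user_field` and `user_library` stand in for what user histories might provide. The score is the fraction of a paper's readers who come from outside the field its words suggest.

```python
from collections import Counter

# Hypothetical toy data (assumed for illustration, not real arXiv data).
# Field each paper would be assigned from its words alone.
paper_text_field = {
    "paper_a": "statistics",
    "paper_b": "statistics",
    "paper_c": "physics",
}
# Home field of each user, and the papers in that user's bibliography.
user_field = {"u1": "physics", "u2": "physics", "u3": "statistics", "u4": "physics"}
user_library = {
    "u1": ["paper_a", "paper_c"],
    "u2": ["paper_a", "paper_c"],
    "u3": ["paper_a", "paper_b"],
    "u4": ["paper_a"],
}


def readership_by_field(user_library, user_field):
    """Count, for each paper, how many readers come from each field."""
    counts = {}
    for user, papers in user_library.items():
        for paper in set(papers):  # count each user at most once per paper
            counts.setdefault(paper, Counter())[user_field[user]] += 1
    return counts


def outside_fraction(paper, counts, paper_text_field):
    """Fraction of readers whose field differs from the paper's text-based field."""
    by_field = counts.get(paper, Counter())
    total = sum(by_field.values())
    if total == 0:
        return 0.0  # no readers, so no evidence of spillover
    return 1.0 - by_field[paper_text_field[paper]] / total


counts = readership_by_field(user_library, user_field)
spillover = {p: outside_fraction(p, counts, paper_text_field) for p in paper_text_field}
```

In this made-up data, `paper_a` has only statistics words but three of its four readers are physicists, so it gets the highest score. A generative model would fold the bibliographies into the inference of what the paper is about, rather than scoring after the fact as this sketch does.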
Late in the day we met up with Micha Gorelick (bitly) to discuss our plans for the dotastronomy hack day in New York City this weekend (organized by Gus Muench, Harvard). We are wondering if we could hack from idea to submittable paper in one day.