Fernando Perez (Berkeley), Karthik Ram (Berkeley), and Jake Vanderplas (UW) all descended on CampHogg today, and we were joined by Brian McFee (NYU) and Jennifer Hill (NYU) to discuss an idea hatched by Hill at Asilomar: build a system to scrape the literature (both refereed and informal) for software use. The goal is to build a network, a recommendation system, alt-metrics, and a search system for software in use in scientific projects. There are many different use cases if we can understand how papers make use of software. There was a lot of discussion of issues with scraping the literature, and then some hacking. This has only just begun.
At lunch, I visited the Simons Center for Data Analysis. I ended up having a long conversation with Christian Mueller (Simons) about the intersection of statistics with convex optimization. Among other things, he is working on principled methods for setting the hyperparameters in regularized optimization problems. He told me many things I didn't know about convex problems in data analysis. In particular, he indicated that there might be some very clever and provably optimal (or non-sub-optimal) ways to reduce the feature space for the "Causal Pixel Model" for Kepler pixels that Wang is working on.