data textification?

Arfon Smith (GitHub) was in town for the day, as was Josh Bloom (Berkeley). We spent a good deal of the morning talking about matters of mutual interest, joined by Foreman-Mackey. One idea we batted around was topic modeling for code repositories on GitHub. It would be so cool to find other codebases that are about similar subjects and not just in the same language. We split for a bit into pairs, with Bloom and me discussing probabilistic astrometric calibration. He has a plan to fit the focal-plane distortions in a telescope image with a Gaussian Process, which is very much aligned with current ideas at CampHogg. After Bloom left, Smith, Foreman-Mackey, and I discussed things we would like to do with or see in GitHub. In particular we discussed the great value of the (parody, non-serious) open-source report card that Foreman-Mackey built last year. It is valuable because it is an information-rich, text-based description of GitHub activity; it is like a textification (as opposed to a visualization or a sonification) and it provides heterogeneous detail (in a humorous way). What else could be done like that? Somehow this all relates to evaluation and metrics. Imagine a system that could "summarize" an astronomer's full publication history, or even better, their full publication, hardware-building, and code-writing history.

1 comment:

  1. John O'Meara, 01 May, 2014 08:05

    So, in essence, every astronomer, instead of giving you their CV, should be giving you their makefile.