public release of data

Productive and lively discussion at coffee today (Ben Weiner visiting from NOAO, Zabludoff, Moustakas, Blanton, myself) about good policies for the journals and large surveys to adopt regarding the public release of data and code.

My vision (with which no one agreed) is that the journals should require every paper to be released with a tarball that contains all the data and code, such that a reader can unpack the tarball, type make, and produce all of the analyses and figures for that paper! Of course there are implementation issues, but if we want to stay clamped to the manifold of repeatable science (Sam Roweis) we have no other option. I could write about this for hours, but I have an NSF proposal due. I hope to return to this later, because in the course of the discussion I discovered many new reasons to support my view.

We are very far from my vision right now, and it is hard to get there incrementally.
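
As a rough illustration of the unpack-and-make workflow, here is a minimal Python sketch. The function names `pack_paper` and `reproduce_paper` are hypothetical, and so is the assumption that a paper lives in a single directory with a Makefile at its top level; this is not an existing journal tool, just one way the idea might look.

```python
import subprocess
import tarfile
from pathlib import Path


def pack_paper(paper_dir, tarball_path):
    """Bundle a paper's data, code, and Makefile into one gzipped tarball.

    Hypothetical helper: assumes everything the paper needs lives under
    paper_dir, and that tarball_path is outside paper_dir.
    """
    paper_dir = Path(paper_dir).resolve()
    with tarfile.open(tarball_path, "w:gz") as tar:
        # Store everything under one top-level folder named after the paper.
        tar.add(paper_dir, arcname=paper_dir.name)
    return Path(tarball_path)


def reproduce_paper(tarball_path, work_dir, command=("make",)):
    """Unpack the tarball and run the build command in the unpacked folder.

    Hypothetical helper: the default command is plain `make`, which assumes
    a Makefile at the top level whose default target rebuilds the figures.
    """
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tarball_path, "r:gz") as tar:
        # The first archive member is the top-level folder added by pack_paper.
        top_level = tar.getnames()[0].split("/")[0]
        tar.extractall(work_dir)
    unpacked = work_dir / top_level
    # check=True raises an error if the build fails, so a broken
    # reproduction cannot pass silently.
    subprocess.run(list(command), cwd=unpacked, check=True)
    return unpacked
```

A reader would then call something like `reproduce_paper("paper.tar.gz", "scratch")` and look for the figures in the unpacked folder. The `command` argument is there only so that a paper without a Makefile could substitute its own entry point.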


  1. Now that my NSF proposal is in, I started to write a polemic on this subject here.

  2. Hogg - you should set an example and do this for all your papers. Maybe the rest of the community will follow your example.

  3. Agreed! We did this for this paper but we didn't know where to host the tarball.

  4. There is no way to take "the man" out of the equation. Peer review does not stop once the paper comes out; it continues well into the future as citations accrue and follow-up research is done.

    Ultimately, some people's research ends up being trusted more than others', and no tarball can solve that. And, naturally, it is a non-starter because of the technical issues involved.