I spent the day at Radcliffe, at a small meeting organized by Alyssa Goodman (Harvard) and Xiao-Li Meng (Harvard) on how to curate and preserve data for analysis and re-analysis. Most of the discussion in this (free-form, informal, small) workshop centered on meta-analysis and on the re-use of data by other users. Some of the interesting ideas that came up were the following:

- People from different backgrounds mean very different things by the word "model", and by many other words too, including "data" and "provenance".
- The goals of data preservation, meta-analysis, re-use, and scientific reproducibility are all related and overlapping.
- Archivists and curators do best when they get involved with the data as early as possible in its "life cycle", ideally at the moment the data are originally taken.
- The concerns raised by reproducibility and the concerns raised by privacy (think: health data and the like) are strongly at odds.
- Meta-analysis can be described in terms of hierarchical modeling (duh), and we should probably think about it that way.
- Meng showed some nice results on sufficient statistics in hierarchical models. Specifically, he is thinking about statistics that are sufficient for sub-branches of the full model: when are they also sufficient for the whole model?

The range of expertise in the room, from statistics to particle physics to library science, made for a lively conversation and many (small) disagreements. The goal for tomorrow is to write a document summarizing the various things we learned.
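Meng's question about branch-level sufficiency has a simple textbook instance that may help make it concrete. Take a normal-normal hierarchical model with known within-study noise (an illustrative assumption, not the model Meng discussed). Here each study's sample mean is sufficient for its own branch, and the collection of those means is also sufficient for the hyperparameters (mu, tau) of the whole model. The minimal sketch below uses NumPy only, and all function names are hypothetical. It checks the claim numerically: the full-data marginal log-likelihood and the log-likelihood of the group means alone should differ by a constant that does not depend on (mu, tau).

```python
import numpy as np

# Toy normal-normal hierarchical model (an illustrative assumption, not Meng's setup):
#   theta_j ~ N(mu, tau^2)          one latent mean per study (sub-branch) j
#   y_ij    ~ N(theta_j, sigma^2)   observations within study j, sigma known
SIGMA = 1.0


def mvn_logpdf(x, mean, cov):
    """Log density of a multivariate normal, computed with NumPy only."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = d @ np.linalg.solve(cov, d)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + quad)


def full_loglike(groups, mu, tau):
    """log p(all data | mu, tau), with each theta_j integrated out exactly."""
    total = 0.0
    for y in groups:
        n = len(y)
        # Marginally, y_j ~ N(mu * 1, sigma^2 I + tau^2 1 1^T).
        cov = SIGMA**2 * np.eye(n) + tau**2 * np.ones((n, n))
        total += mvn_logpdf(y, mu * np.ones(n), cov)
    return total


def summary_loglike(groups, mu, tau):
    """log p(group means | mu, tau), using ybar_j ~ N(mu, tau^2 + sigma^2 / n_j)."""
    total = 0.0
    for y in groups:
        var = tau**2 + SIGMA**2 / len(y)
        total += -0.5 * (np.log(2 * np.pi * var) + (y.mean() - mu) ** 2 / var)
    return total


rng = np.random.default_rng(0)
# Three simulated "studies" of different sizes, each with its own latent mean.
groups = [rng.normal(rng.normal(2.0, 0.5), SIGMA, size=n) for n in (5, 8, 12)]

# If the group means are sufficient for (mu, tau), the difference between the
# two log-likelihoods is the same at every hyperparameter setting.
settings = [(0.0, 0.1), (2.0, 0.5), (3.5, 2.0)]
diffs = [full_loglike(groups, mu, tau) - summary_loglike(groups, mu, tau)
         for mu, tau in settings]
print(max(diffs) - min(diffs))  # should be zero up to floating-point error
```

The constant difference is the density of the within-study residuals, which in this toy model carry no information about (mu, tau). If sigma were unknown and shared across studies, the means alone would no longer be sufficient for the whole model; each study would also need to report its residual sum of squares.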