#DSEsummit, day 2

In the morning, Katy Huff (UCB) gave an energizing talk about The Hacker Within, which is a program to have peers teach peers about their data-science (or programming or engineering) skills to improve everyone's scientific capabilities. The model is completely ground-up and self-organized, and she is trying to make it easy for other institutions to “get infected” by the virus. She had some case studies and insights about the conditions under which a self-organized peer-educational activity can be born and flourish. UW and NYU are now both going to launch something; I was very much reminded of #AstroHackNY, which is currently dormant.

Karthik Ram (UCB) talked about a really deep project on reproducibility: They have interviewed about a dozen scientists in great detail about their “full stack” workflow, from raw data to scientific results, identifying how reproducibility and openness are or could be introduced and maintained. But the coolest thing is that they are writing up the case studies in a book. This will be a great read: a comparative look at different disciplines, a snapshot of science in 2015, and a gift to people thinking about making their stack open and reproducible.

I had a great conversation with Stefan Karpinski (NYU) and Fernando Perez (UCB) about file formats (of all things). They want to destroy CSV once and for all (or not, if that doesn't turn out to be a good idea). Karpinski explained to me the magic of UTF-8 encoding for text. My god is it awesome. Perez asked me to comment on the new STScI-supported ASDF format to replace FITS, and to compare it to HDF5. I am torn. I think ASDF might be slightly better suited to astronomers than HDF5, but HDF5 is a standard for a very wide community, who maintain and support it. This might be a case of the better is the enemy of the good (a phrase I learned from my mentor Gerry Neugebauer, who died this year). Must do more analysis and thinking.
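For readers who haven't looked at UTF-8 closely, here is a minimal Python sketch of a few of its well-known properties. It is not a record of what Karpinski actually showed me, and the sample string and helper names (`byte_lengths`, `is_continuation`, `count_code_points`) are just made up for illustration.

```python
# Illustrative sketch only: a few standard properties of UTF-8.
# The sample text and helper names are hypothetical.

text = "na\u00efve caf\u00e9 \u2605"  # "naïve café ★"
encoded = text.encode("utf-8")

# Property 1: pure-ASCII text is byte-for-byte identical in ASCII and UTF-8.
assert "cafe".encode("utf-8") == "cafe".encode("ascii")


def byte_lengths(s):
    """Number of UTF-8 bytes used by each character (1 to 4)."""
    return [len(ch.encode("utf-8")) for ch in s]


def is_continuation(b):
    """Continuation bytes always have the bit pattern 10xxxxxx."""
    return (b & 0xC0) == 0x80


def count_code_points(data):
    """Count characters by counting the bytes that START a character.

    This works from any character boundary without decoding, because
    lead bytes and continuation bytes can never be confused.
    """
    return sum(1 for b in data if not is_continuation(b))


# Property 2: the encoding is variable-width.
print(byte_lengths("a\u00e9\u2605\U0001F600"))

# Property 3: it is self-synchronizing, so character counting is a byte scan.
print(count_code_points(encoded), len(text))

# Property 4: sorting the encoded bytes gives the same order as sorting
# by code point (which is how Python compares str objects).
words = ["zebra", "\u00e9clair", "apple", "\u2605star"]
assert sorted(words) == sorted(words, key=lambda w: w.encode("utf-8"))
```

The point of the sketch is that all of this falls out of the byte layout itself: no lookup tables and no state are needed to find character boundaries or to stay compatible with ASCII.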

In the afternoon, in the unconference, I participated in a discussion of imaging and image processing as a cross-cutting data-science methodology and toolkit. Lei Tian (UCB) described forward-modeling for super-resolution microscopy, and mentioned a whole bunch of astronomy-like issues, such as a spatially variable point-spread function, image priors, and the likelihood function. It is very clear that we have to get the microscopists and astronomers into the same room for a couple of days; I am certain we have things to learn from one another. If you are reading this and would be interested, drop me a line.
