huge data

Today was the second Big Data Lunch at NYU, hosted by me and Heather Stewart of NYU Information Technology Services. The speaker was Kyle Cranmer (NYU), who talked about the LHC, whose data are truly big. His talk was great, and it started a small discussion about what would constitute truly usable shared, high-performance compute infrastructure for those of us who analyze data. For most of us, common facilities are not that useful in their current form, because we are heavily constrained in operating systems, file systems, data transfer, software installation, and other issues. Indeed, Cranmer is not allowed to publish results unless he uses particular versions of operating systems and software! Cranmer advocated an elastic system like Amazon's EC2, with virtualization that would permit arbitrary OS and software installs; I am agnostic, but I agree that we need to do something other than just build huge multi-core systems if we are going to facilitate computational science that goes beyond simulations.
