2014-09-23

half full or half empty?

Interestingly (to me, anyway), as I have been raving in this space about how awesome it is that Ness and I can transfer stellar parameter labels from a small set of "standard stars" to a huge set of APOGEE stars using a data-driven model, Rix (who is one of the authors of the method) has been seeing our results as requiring some spin or adjustment to be impressive to the stellar parameter community. I see his point: what impresses me is that we get good structure in the label (stellar parameter) space and that we do very well where the data overlap the training sample. What concerns Rix is that many of our labels are clearly wrong or distorted, especially where we don't have good coverage in the training sample. We discussed ways to modify our method, or our display of the output, to make both points responsibly.
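The basic idea of label transfer can be sketched very crudely (this is my toy illustration, not the actual Ness et al. method, and all data here are synthetic): fit a model from labels to spectra on a small training set of reference stars, then, for each survey star, find the labels whose predicted spectrum best matches the observed one.

```python
# Toy sketch of data-driven label transfer with a purely linear model.
# All numbers and the linear spectral model are invented for illustration.
import numpy as np

rng = np.random.default_rng(42)
n_train, n_survey, n_pixels, n_labels = 50, 1000, 200, 3  # e.g. Teff, logg, [Fe/H]

# Assume (for this toy) that spectra depend linearly on the labels.
true_coeffs = rng.normal(size=(n_labels + 1, n_pixels))

def make_spectra(labels):
    """Generate noisy synthetic spectra from labels via the assumed linear model."""
    design = np.hstack([np.ones((labels.shape[0], 1)), labels])
    return design @ true_coeffs + 0.01 * rng.normal(size=(labels.shape[0], n_pixels))

# "Training step": learn per-pixel coefficients from the reference stars.
train_labels = rng.normal(size=(n_train, n_labels))
train_spectra = make_spectra(train_labels)
design = np.hstack([np.ones((n_train, 1)), train_labels])
coeffs, *_ = np.linalg.lstsq(design, train_spectra, rcond=None)

# "Test step": infer labels for survey stars from their spectra alone.
survey_labels_true = rng.normal(size=(n_survey, n_labels))
survey_spectra = make_spectra(survey_labels_true)
A = coeffs[1:].T                  # (n_pixels, n_labels)
b = survey_spectra - coeffs[0]    # remove the constant term
est, *_ = np.linalg.lstsq(A, b.T, rcond=None)
survey_labels_est = est.T

rms = np.sqrt(np.mean((survey_labels_est - survey_labels_true) ** 2))
print(f"label recovery RMS: {rms:.4f}")
```

In this toy setup recovery is near-perfect because the survey stars are drawn from the same distribution as the training set; the concern above is exactly what happens when they are not.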

Late in the day, Foreman-Mackey and I discussed NYU's high-performance computing hardware and environment with Stratos Efstathiadis (NYU), who said he would look into increasing our disk-usage limits. Operating on the entire Kepler data set inside the compute center turns out to be hard, not because the data set is large, but because it is composed of so many tiny files, which is apparently a problem for distributed storage systems. We also discussed the future of high-performance computing in the era of Data Science.
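The usual workaround for the tiny-files problem (my own illustration, not necessarily what we will do at NYU) is to bundle the small files into one large archive, so the filesystem tracks a single object instead of thousands of metadata entries. A minimal in-memory sketch with fabricated file names and contents:

```python
# Sketch: pack many tiny "light curve" files into one tar archive.
# File names and contents are invented for illustration.
import io
import tarfile

light_curves = {f"kplr{i:09d}_llc.txt": f"time,flux\n0.0,{1.0 + i * 1e-6}\n"
                for i in range(1000)}

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name, text in light_curves.items():
        data = text.encode()
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

# One archive object now holds all 1000 tiny files.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    names = tar.getnames()
print(f"{len(names)} files packed into one {buf.getbuffer().nbytes}-byte archive")
```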
