I worked more on my position on catalogs, with some help from Lang. Here are some key ideas:
- Catalogs originated as a way for astronomers to communicate information about images. For example, Abell spent thousands of hours poring over images of the sky; his catalog communicated the information he found in those images, so that other workers would not have to repeat the effort. This was at a time when you couldn't just send them the data and the code.
- Why did the SDSS produce a catalog rather than just release the images? Because people want to search for sources and measure the fluxes of those sources, and people do this in standard ways; the SDSS made it easier for them by pre-computing all these fluxes and making them searchable. But the SDSS could have produced a piece of fast code and made it easy to run that code on the data instead; that would have been no worse (though harder to implement at the present day).
- One of the reasons people use the SDSS catalogs is not just that they are easy to use, but that they contain all of the Collaboration's knowledge about the data, encoded as proper data analysis procedures. But here it would have been more useful to produce code that knows about these things than a dataset that knows about these things, because the code would be readable (self-documenting), re-usable, and modifiable. Code passes on knowledge, whereas a catalog freezes it.
- The catalogs are ultimately frequentist, in that hard decisions (about, say, deblending) are made based on arithmetic operations on the data, and then the down-stream data analysis proceeds according to those decisions, even when the real situation is one of uncertainty. If, instead of a fixed catalog, there were a piece of code that takes any catalog and returns the likelihood of that catalog given the imaging, we could analyze those decisions probabilistically and do real inference.
And other Important Things like that.
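
To make the last idea concrete, here is a minimal sketch of what "code that takes any catalog and returns a likelihood" might look like. Everything specific in it is an assumption for illustration, not anything the SDSS pipeline does: the sources are point sources, the PSF is a circular Gaussian, the pixel noise is independent Gaussian with known sigma, and the function names `render` and `log_likelihood` are hypothetical. The toy image is noise-free so that the result is reproducible.

```python
import math

def render(catalog, shape, psf_sigma=1.5):
    """Render a model image from a catalog of (x, y, flux) point sources.

    Assumes a circular Gaussian PSF of width psf_sigma (in pixels).
    """
    ny, nx = shape
    model = [[0.0] * nx for _ in range(ny)]
    norm = 1.0 / (2.0 * math.pi * psf_sigma ** 2)
    for x0, y0, flux in catalog:
        for y in range(ny):
            for x in range(nx):
                r2 = (x - x0) ** 2 + (y - y0) ** 2
                model[y][x] += flux * norm * math.exp(-0.5 * r2 / psf_sigma ** 2)
    return model

def log_likelihood(catalog, image, noise_sigma, psf_sigma=1.5):
    """Gaussian log-likelihood of the pixel data given a proposed catalog."""
    ny, nx = len(image), len(image[0])
    model = render(catalog, (ny, nx), psf_sigma)
    chi2 = sum(
        (image[y][x] - model[y][x]) ** 2
        for y in range(ny)
        for x in range(nx)
    ) / noise_sigma ** 2
    log_norm = -ny * nx * math.log(noise_sigma * math.sqrt(2.0 * math.pi))
    return log_norm - 0.5 * chi2

# Toy "imaging": two blended sources (noise-free, for reproducibility).
two_sources = [(6.0, 8.0, 100.0), (10.0, 8.0, 60.0)]
image = render(two_sources, (16, 16))

# A competing catalog that makes the opposite deblending decision.
one_source = [(7.5, 8.0, 160.0)]

ll_two = log_likelihood(two_sources, image, noise_sigma=1.0)
ll_one = log_likelihood(one_source, image, noise_sigma=1.0)
print("log-likelihood ratio (two vs one):", ll_two - ll_one)
```

The point of the sketch is that the deblending decision is no longer frozen in: any two candidate catalogs can be compared against the same pixels, and the difference in log-likelihood says how strongly the imaging prefers one over the other.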