2023-02-25

catalogs rant

Should I write this paper?

Abstract: Observational astronomy projects often produce catalogs (of stars, galaxies, quasars, planet hosts, and so on) for use in other projects. How can we use these catalogs responsibly? The answer turns out to be complex: it depends sensitively on how the catalog was made. In particular, if the catalog entries were obtained by operations on a set of (nearly) independent or separable likelihood functions, the catalog can be used in a much wider set of circumstances than if the entries were obtained by operations on a posterior pdf, or on likelihood functions that involve important shared parameters, shared data, or shared prior information. This is true whether the subsequent analyses of the catalog are Bayesian or frequentist. At present, many important catalogs are being made from the outputs of MCMC runs or discriminative machine-learning methods (classifications or regressions). These catalogs are very hard or even impossible to use for population studies. I demonstrate these points mathematically, and also with toy examples from cosmology, stars, and exoplanets. I recommend that catalogs be designed and made with the feasibility of particular end-user investigations as explicit requirements.
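
To make the likelihood-versus-posterior point concrete, here is a minimal toy sketch; it is my own illustration, not one of the examples from the proposed paper. Everything in it is an assumption: the Gaussian population, the noise level, the prior used by the hypothetical catalog maker, and names like `make_catalogs`. Each object has a true value drawn from a population with mean 1, measured with independent Gaussian noise. One catalog reports maximum-likelihood values; the other reports posterior means under a prior centered at 0.

```python
import random
import statistics

def make_catalogs(n=20000, mu_true=1.0, tau_true=1.0, sigma=2.0,
                  prior_mean=0.0, prior_sd=1.0, seed=42):
    """Return (ml_catalog, posterior_catalog) for a toy Gaussian population.

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Shrinkage factor for a Gaussian prior combined with a Gaussian likelihood.
    shrink = prior_sd**2 / (prior_sd**2 + sigma**2)
    ml_catalog, posterior_catalog = [], []
    for _ in range(n):
        theta = rng.gauss(mu_true, tau_true)   # true value for this object
        y = rng.gauss(theta, sigma)            # noisy measurement
        ml_catalog.append(y)                   # maximum-likelihood entry
        # Posterior-mean entry: pulled toward the catalog maker's prior.
        posterior_catalog.append(prior_mean + shrink * (y - prior_mean))
    return ml_catalog, posterior_catalog

sigma = 2.0
ml_catalog, posterior_catalog = make_catalogs(sigma=sigma)

# Population mean from each catalog by simple averaging.
mu_from_likelihood = statistics.fmean(ml_catalog)
mu_from_posterior = statistics.fmean(posterior_catalog)

# Population variance from the likelihood catalog: subtract the known noise variance.
tau2_from_likelihood = statistics.pvariance(ml_catalog) - sigma**2

print(f"mean from ML catalog:        {mu_from_likelihood:.3f}")
print(f"mean from posterior catalog: {mu_from_posterior:.3f}")
print(f"variance from ML catalog:    {tau2_from_likelihood:.3f}")
```

In this setup the likelihood-based catalog recovers the population mean and variance, because the per-object likelihoods are independent and the noise is known. The posterior-mean catalog averages to roughly one fifth of the true mean, because the catalog maker's prior is baked into every entry. A user could undo that here only by knowing the exact prior and noise model, which is the easy case; with shared parameters or machine-learning outputs there may be nothing so simple to undo.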
