density estimation for classification

Ingyin Zaw (NYU) came into my office to ask about separating non-maser galaxies from maser galaxies using optical and infrared properties. We talked about support vector machines and the like, but she wanted to make proper use of the data-point uncertainties and of the objects with missing data (some quantities of interest not measured). That got me on my soap-box:

If you want to do classification right, you have to have a generative model for your two distributions, preferably one that takes account of the individual data-point errors (which are different for each point) and missing data. This means fitting the noise-free distribution in a generalized way to the ensemble of noisy data points. Then, for any new data point, you can construct the likelihood ratio of non-maser vs maser (or whatever you like) by taking the ratio of the two error-convolved densities evaluated at that datum. And, of course, we have the best code for constructing the noise-free (or uncertainty-deconvolved) distribution functions in our extreme deconvolution code base (and associated paper).
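As a concrete sketch of the classification step: suppose you have already fit a Gaussian mixture to the noise-free distribution of each class (with extreme deconvolution or anything similar), so each class is described by mixture weights, means, and covariances. Convolving with a new datum's Gaussian uncertainty is then just adding that datum's covariance to each component covariance before evaluating. This is a minimal numpy illustration under those assumptions; the function and parameter names are mine, not from the extreme deconvolution code base.

```python
import numpy as np

def convolved_log_density(x, xerr, weights, means, covs):
    """Log density at datum x of a Gaussian mixture (the noise-free model)
    convolved with the datum's own Gaussian error covariance xerr.

    For a Gaussian component N(mu, C), the error-convolved component is
    N(mu, C + xerr); the mixture is the weighted sum of these."""
    log_terms = []
    for w, mu, C in zip(weights, means, covs):
        T = C + xerr                       # error convolution: covariances add
        diff = x - mu
        _, logdet = np.linalg.slogdet(2.0 * np.pi * T)
        maha = diff @ np.linalg.solve(T, diff)
        log_terms.append(np.log(w) - 0.5 * (logdet + maha))
    # log-sum-exp over components, for numerical stability
    m = max(log_terms)
    return m + np.log(sum(np.exp(t - m) for t in log_terms))

def log_likelihood_ratio(x, xerr, model_a, model_b):
    """Log of p(x | class a) / p(x | class b), each error-convolved.
    model_a and model_b are (weights, means, covs) tuples."""
    return (convolved_log_density(x, xerr, *model_a)
            - convolved_log_density(x, xerr, *model_b))
```

A positive log ratio favors class a, negative favors class b; because each datum carries its own `xerr`, two objects at the same position in feature space can legitimately get different classifications if their uncertainties differ. Missing measurements can be handled in the same framework by evaluating only the observed sub-vector and the corresponding sub-blocks of the means and covariances (marginalizing a Gaussian is just dropping dimensions).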

Support vector machines and neural nets and the like are great, but they work only in the never-realized situation that (a) all training data points have (roughly) the same error properties, (b) there are no missing measurements for any data point, and (c) your training data are identical in selection and noise properties to your test or untagged data. Show me a problem like this in the observational sciences (I have yet to find one) and you are good to go! Otherwise, you have to build generative models for your data. (I would say IMHO but there is no H going on.)
