Hogg's Research: totally wrong paper

2009-08-31


I read a paper today that is just about completely wrong. It is "Linear regression in astronomy" by Isobe, Feigelson, Akritas, & Babu (1990). The paper presents five methods for fitting straight lines to data and compares them. I think I have three objections:

First, they present procedures, and do not show that any of those procedures optimize anything that a scientist would care about. That is, they do not show that any procedure gives a best-fit line in any possible sense of the word "best". Now, of course, some of their procedures do produce a best-fit line under some assumptions, but they only give those assumptions for one (or two) of their five methods. In particular, the method they advocate has no best-fit interpretation whatsoever! Scientists do not trade in procedures; they trade in objectives, and choose procedures only when those procedures are demonstrated to optimize their objectives, I hope.

Second, when deciding whether to fit for Y as a function of X or X as a function of Y, they claim that the decision should be based on the physics of X and Y! But the truth is that this decision should be based on the error properties of X and Y. If X has much smaller errors, then you must fit Y as a function of X; if Y has much smaller errors, then you must fit X as a function of Y; and if neither has much smaller errors, then that kind of linear fitting is invalid. This paper propagates a very dangerous misconception; it is remarkable that professional statisticians would say this. It is not a matter of statistical opinion; what is written in this paper is straight-up wrong.
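To make the error-property argument concrete, here is a minimal sketch in Python (NumPy only). Everything in it is an illustrative assumption, not something taken from the paper: the true slope of 2, the unit-variance Gaussian noise, the sample size, and the helper name `ols_slope`. It only covers the two limiting cases in which one of the two variables is noise-free.

```python
import numpy as np

def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x (hypothetical helper)."""
    xc = x - x.mean()
    return float(np.dot(xc, y - y.mean()) / np.dot(xc, xc))

rng = np.random.default_rng(42)
n = 20000
true_slope = 2.0                      # assumed for illustration
x_true = rng.normal(0.0, 1.0, size=n)
y_true = true_slope * x_true          # noise-free straight line through the origin

# Case A: all of the noise is in Y, so fit Y as a function of X.
y_a = y_true + rng.normal(0.0, 1.0, size=n)
slope_a = ols_slope(x_true, y_a)

# Case B: all of the noise is in X.
x_b = x_true + rng.normal(0.0, 1.0, size=n)
slope_b_forward = ols_slope(x_b, y_true)        # Y on X: the wrong direction here
slope_b_inverse = 1.0 / ols_slope(y_true, x_b)  # X on Y, then inverted

print(f"noise in Y, fit Y(X):          {slope_a:.3f}")
print(f"noise in X, fit Y(X):          {slope_b_forward:.3f}")
print(f"noise in X, fit X(Y), inverted: {slope_b_inverse:.3f}")
```

Under these assumptions, fitting Y as a function of X when the noise is all in X pulls the slope toward zero (the expectation here is about 1 rather than 2), while fitting X as a function of Y and inverting lands near the true value. Nothing about the physics of X and Y enters; only the noise does.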

Third, they decide which of their methods performs best by applying all five methods to sets of simulated data. These data are simulated with certain assumptions, so all they have shown is that when you have data generated a certain way, one method does better at getting at the parameters of that generative model. But then, when you have a data set with a known generative model, you should just optimize the likelihood of that generative model. The simulated data tell you nothing in the situation where you don't know the generative model for your data, which is either always or never the case (not sure which). That is, if you know the generative model, then just use it directly to construct a likelihood (don't use the methods of this paper). If you don't, then you can't rely on the conclusions of this paper (and its ilk). Either way, this paper is useless.
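As a sketch of what "use the generative model directly to construct a likelihood" can look like, consider one deliberately simple model, chosen here only for illustration: X is known exactly, and each Y is the line m*X + b plus Gaussian noise with a known per-point standard deviation. The function names, the true parameters (1.5 and 0.3), and the noise levels below are all hypothetical. For this particular model, maximizing the likelihood reduces to weighted least squares, which has a closed-form solution; other generative models would need their own likelihoods and, usually, a numerical optimizer.

```python
import numpy as np

def neg_log_likelihood(params, x, y, sigma_y):
    """Negative log-likelihood for y_i ~ Normal(m * x_i + b, sigma_y_i**2)."""
    m, b = params
    z = (y - (m * x + b)) / sigma_y
    return 0.5 * float(np.sum(z**2 + np.log(2.0 * np.pi * sigma_y**2)))

def max_likelihood_line(x, y, sigma_y):
    """Maximum-likelihood (m, b) for the Gaussian model above.

    For this model the optimum is the weighted least-squares solution.
    """
    A = np.column_stack([x, np.ones_like(x)])  # design matrix for m*x + b
    w = 1.0 / sigma_y**2                       # inverse-variance weights
    ATA = A.T @ (A * w[:, None])
    ATy = A.T @ (w * y)
    return np.linalg.solve(ATA, ATy)

# Simulated data drawn from exactly the model assumed above.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=200)
sigma_y = rng.uniform(0.5, 2.0, size=200)
y = 1.5 * x + 0.3 + sigma_y * rng.normal(size=200)

m_hat, b_hat = max_likelihood_line(x, y, sigma_y)
print(f"m = {m_hat:.3f}, b = {b_hat:.3f}")
```

The point of the design is that the objective (`neg_log_likelihood`) is written down first and the procedure (`max_likelihood_line`) is justified only because it minimizes that objective. If the noise model changes, the objective changes, and the procedure has to change with it.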

Wow, I am disappointed that this is the state of our art. I hope I didn't sugar-coat that critique too much!

2 comments:

Anonymous (01 September, 2009 14:07):
this is a lifetime top 5 post.
Hogg (02 September, 2009 16:22):
Greg Novak (Princeton) writes me by email with the very good point that "you mention error bars as a way of making a choice between fitting as a function of x or as a function of y. One may also be concerned about selection functions. If you select targets based on x and then measure y, you may be happier to fit as a function of x since you understand your selection function along that axis, while you may not understand your selection function along the y axis."
