totally wrong paper

I read a paper today that is just about completely wrong. It is Linear regression in astronomy by Isobe, Feigelson, Akritas, Babu (1990). The paper presents five methods for fitting straight lines to data, and compares them. I think I have three objections:

First, they present procedures, and do not show that any of those procedures optimize anything that a scientist would care about. That is, they do not show that any procedure gives a best-fit line in any possible sense of the word best. Now, of course, some of their procedures do produce a best-fit line under some assumptions, but they only give those assumptions for one (or two) of their five methods. In particular, the method they advocate has no best-fit interpretation whatsoever!. Scientists do not trade in procedures, they trade in objectives, and choose procedures only when they are demonstrated to optimize their objectives, I hope.

Second, when deciding whether to fit for Y as a function of X or X as a function of Y, they claim that the decision should be based on the physics of X and Y! But the truth is that this decision should be based on the error properties of X and Y. If X has much smaller errors, then you must fit Y as a function of X; if the other way then the other way, and if neither has much smaller errors, then that kind of linear fitting is invalid. This paper propagates a very dangerous misconception; it is remarkable that professional statisticians would say this. It is not a matter of statistical opinion, what is written in this paper is straight-up wrong.

Third, they decide which of their methods performs best by applying all five methods to sets of simulated data. These data are simulated with certain assumptions, so all they have shown is that when you have data generated a certain way, one method does better at getting at the parameters of that generative model. But then, when you have a data set with a known generative model, you should just optimize the likelihood of that generative model. The simulated data tell you nothing in the situation that you don't know the generative model for your data, which is either always or never the case (not sure which). That is, if you know the generative model, then just use it directly to construct a likelihood (don't use the methods of this paper). If you don't, then you can't rely on the conclusions of this paper (and its ilk). Either way, this paper is useless.

Wow, I am disappointed that this is the state of our art. I hope I didn't sugar-coat that critique too much!


frequentist epiphany

I woke up today wondering what would a frequentist do?, especially with regards to a fitting problem with nuisance parameters; that is, a problem where there are parameters (m,b) that one cares about, and parameters (P,Y,V) that one doesn't. A strict frequentist has no measure on the parameter space, so he or she can't marginalize or integrate out the nuisance parameters from the answer. Also, the frequentist does not produce a probability distribution for the parameters given the data, but rather a probability for the data given the parameters (a likelihood, which has the wrong units to be marginalized). Marginalization is not an option.

I think a principled frequentist has two choices: Either choose the maximum-likelihood model, and report all five parameters, even though three are uninteresting, and then do some kind of sampling-based error analysis around that peak; or else look at the entire space of models that are above some threshold in likelihood. If the latter, the frequentist gets out a range of possibilities for all five parameters. That range is a strict range, depending only on the likelihood threshold. The fact that some parameters are nuisance parameters is irrelevant. The frequentist is more ad hoc because a threshold must be chosen, but also more conservative because in general the prior and the marginalization (at least slightly) tighten up the parameter bounds. (Note that I am assuming that the Bayesian sets priors objectively, which is often not the case.)



I finished a first-draft set of figures for a putative paper on the bulge-mass vs black-hole-mass relation in galaxies. I find that even if the restrictive assumption is made that the relation is a power-law, the scatter around the relation is dominated by measurement uncertainties, not the intrinsic scatter (the scatter that would be observed given extremely high signal-to-noise measurements). I can also identify which galaxies are the largest outliers, accounting for their measurement uncertainties and marginalizing over all other parameters (including slope and intercept). Those galaxies are either truly anomalous or else have unrealistically small reported uncertainties on their black-hole or bulge masses.


sequential sampling (Gibbs?)

In what little time got spent today on research, I switched over my black-hole against bulge-mass power-law fitting code to perform its movements in parameter space in one dimension at a time, sequentially looping over the available dimensions. This permitted me to set the step sizes objectively, and seems to have improved convergence. We have been learning that although MCMC is a super-useful and super-simple tool, in practice there are a lot of considerations involved in making sure it is running efficiently and converging; there are no extremely straightforward guarantees. This deserves a detailed write-up at some point in the near future.


local standard of rest paper, black holes

[I just returned from some time off.]

I resurrected the black-hole–bulge-mass relation fitting code, made some tweaks, and got it running again after the break. On vacation I sketched out an abstract for a paper on the subject; not sure it is greater than one LPU (least publishable unit).

Bovy sent me a draft manuscript on the local standard of rest—the velocity of a putative circular orbit around the Galaxy, passing through the Solar position. He and I are arguing that this orbit does not exist, on an empirical basis. The data on the local phase-space density are simply not consistent with what is expected if that orbit exists. Of course there are approximations to this orbit; quantitatively we find that there is no such orbit at the few km/s level. There is no point in trying to measure it more accurately than that; and it shouldn't be used at all in any contexts that require precision.


Bovy, materials

On the last day of the IMPRS summer school, Bovy gave a detailed talk about dynamical inference in the Solar System, our April Fools' paper. He gave a very nice description of not just what we did but also why it works and what are the limitations. I am getting steadily more pessimistic about dynamical inference with Gaia, at least with phase-mixed populations. Populations that are not phase mixed, such as tidal streams, are seeming much more promising at present.

All my presentation materials (somewhat disjointed) are available in this tar gz file. For a non-attendant at the summer school, the most comprehensible part will be the problem sets.



Konrad Bernloehr (MPIK) gave a beautiful talk about the HESS experiment at the IMPRS. The data analysis is based on absolute and extreme forward modeling: They produce in their air shower simulations the expected output from the HESS readout electronics. They even model were each photon hits each mirror! Data is compared to model in the space of the raw data. Beautiful!



In my IMPRS lecture today, I admitted in public that when you make a hard decision (that is, when you choose a model, or choose to include or not include a source in your catalog) you are by necessity doing something subjective. After all, Bayes returns only relative probabilities; making a decision on that basis departs from the objective process of inference. You can do better by making your decisions with the help of explicit utilities, but even these don't save you from subjectivity.



After we had all the students at the IMPRS summer school code up a Metropollis-Hastings MCMC algorithm yesterday, Bill Press, in his lecture, explained the algorithm today and—very simply—proved its efficacy. This was a pedagogical experiment: We had the students execute first, understand later.


Press and python

Bill press kicked off the IMPRS summer school with a set of lectures on distributions and Bayesian inference, with a focus on probability theory.

In the afternoon, Bovy, Lang, and I ran a lab session in which we had the students write a MCMC sampling code in Python. Many of the students succeeded, and many are close, despite the fact that most of them had never used Python previously. It was the first of three sessions. If you want to follow along from home, we will post all our materials to the web within days.


black-hole bulge-mass relation

On the weekend, Lang and I re-fit the local black-hole–bulge-mass relation, to assess common wisdom about the slope. We find that if the model is given flexibility to deal with outliers, the slope is not extremely strongly constrained; even a slope of unity is possible. More on this we decide to finish it and write it up. It isn't quite one LPU, if you know what I mean.

We also worked on preparations for the IMPRS summer school.


streams do exist?

Lang and I softened (made more robust) the background model in our brute-force foreground–background modeling system; that is, we made it harder for the foreground model to pick up spurious streams by effectively modeling deficiencies in the background model. This worked like a charm, eliminating almost all streams in our "scramble" sample and leaving a few tantalizing ones in the real data. Now we have to figure out how to observationally confirm our candidates. This is hard, because when you have used all the data to find the candidates, you don't have any data left to confirm them!


didn't find a stream, LSR doesn't exist

Lang and I switched to brute-force foreground/background modeling, after Koposov convinced us it was a good idea and we convinced ourselves that we could do it quickly (see yesterday's post). We found a stream, very near the Sun! Then we scrambled the data (that is, generated fake data that can't have structure) and found even more streams, so I guess we didn't really find a stream. Working on it.

Behind us in the office, Bovy has been working on the Sun's motion relative to the local standard of rest. The assumptions that underly the measurement of the LSR are all violated; is there an LSR at all? We don't know (there doesn't have to be one at all if the Galaxy isn't close to axisymmetric), but we do know that if you assume there is a local standard of rest, what you get for it is a strong function of how you deal with stars that are members of the cold moving groups in the disk. The error bars in the literature, including mine in Hogg et al 2005 are all way underestimates.


finding streams

Koposov, Lang, Rix, and I debated methods for finding streams or substructure in high-dimensional stellar data sets, with Koposov defending brute-force foreground/background modeling, and Lang defending density estimation as a search heuristic. We all agreed that the best way to confirm a stream is with foreground/background modeling, so we were sympathetic with Koposov, but there are issues with speed, as there are in all brute-force methods.