Upon arrival in Heidelberg today, Rix gave me homework regarding inferring the stellar mass function in a cluster given a sample of measured masses. This is a Poisson sampling problem, and there are two somewhat different formulations. In the first, you consider the independent probability for each of the observed masses (given a model of the distribution), and then multiply the product of those by a Poisson probability for the total observed number. In the second, you consider infinitesimal bins in mass, and multiply together the Poisson probabilities for the (binary) occupation of each of the bins. Late in the day I wrote a document showing that these two approaches are mathematically identical, at least as far as the likelihood function is concerned.
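A sketch of why the two formulations agree, in notation that is assumed here rather than taken from the document mentioned above: \lambda(m|\theta) is the expected number of stars per unit mass, \Lambda(\theta) is its integral over the mass range, p(m|\theta) = \lambda/\Lambda is the normalised mass function, and the N observed masses are m_i. The bin width \Delta m is taken small enough that no bin holds more than one star.

```latex
% Formulation 1: Poisson probability for the total number, times iid densities.
\begin{align}
\mathcal{L}_1(\theta)
  &= \frac{\Lambda^N e^{-\Lambda}}{N!} \prod_{i=1}^{N} \frac{\lambda(m_i|\theta)}{\Lambda}
   = \frac{1}{N!}\, e^{-\Lambda} \prod_{i=1}^{N} \lambda(m_i|\theta)
\end{align}

% Formulation 2: independent Poisson probabilities for bins of width \Delta m,
% each either occupied (count 1) or empty (count 0).
\begin{align}
\mathcal{L}_2(\theta)
  &= \prod_{\mathrm{occupied}\ i} \lambda(m_i|\theta)\,\Delta m\; e^{-\lambda(m_i|\theta)\,\Delta m}
     \prod_{\mathrm{empty}\ j} e^{-\lambda(m_j|\theta)\,\Delta m} \\
  &\to (\Delta m)^N\, e^{-\Lambda} \prod_{i=1}^{N} \lambda(m_i|\theta)
     \quad \text{as } \Delta m \to 0
\end{align}
```

The two expressions differ only by the factors 1/N! and (\Delta m)^N, neither of which depends on \theta, so as functions of the model parameters the likelihoods are the same.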
These models are for independent and identically distributed (iid) mass samples. We would like to break the iid assumption, but non-independent samples are more complicated. There aren't good general ways to think about dependent data, and yet there is a vociferous literature about whether stellar masses in clusters are iid. The literature is loud but doesn't contain anything that I consider a "model", which for me is a quantitative specification of the likelihood (the probability of the data given model parameters).
Dear David.
Recently I have used Nested Sampling to obtain samples in order to infer central-tendency and dispersion measures of the posterior distribution. I also wondered about independent samples, and I have used "sampling efficiency" (based on autocorrelation) to check how good the sampling was:
eta = (1 + 2*sum(acf(post.theta, plot = FALSE)$acf[-1]))^(-1) # (R code; [-1] drops the lag-0 term, which acf() always returns as 1)
In http://kmh-lanl.hansonhub.com/talks/maxent00b.pdf it is said that "eta^(-1) iterates are required to achieve one statistically independent sample".
But now I wonder: (i) is there a set of statistical tools to measure sampling efficiency (not only eta)? and (ii) how could we obtain good results by combining samples with medium-high efficiency and a good (robust?) estimator of the parameters?
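For readers who want to try the efficiency estimate outside R, here is a minimal Python sketch. It assumes the usual definition eta = 1 / (1 + 2 * sum of rho(k) over lags k >= 1), so the lag-0 autocorrelation (always 1) is excluded from the sum. The function names, the lag window, the rule of truncating the sum at the first non-positive autocorrelation, and the AR(1) test chain are all illustrative assumptions, not part of the comment above.

```python
import math
import random


def autocorrelation(x, max_lag):
    """Sample autocorrelation rho(k) for k = 0..max_lag; rho(0) is 1."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d) / n
    return [sum(d[i] * d[i + k] for i in range(n - k)) / (n * c0)
            for k in range(max_lag + 1)]


def sampling_efficiency(x, max_lag=None):
    """eta = 1 / (1 + 2 * sum_{k>=1} rho(k)), truncated at the first rho(k) <= 0."""
    n = len(x)
    if max_lag is None:
        # Assumed default lag window; any reasonable choice could be used instead.
        max_lag = min(n - 1, int(10 * math.log10(n)))
    rho = autocorrelation(x, max_lag)
    total = 0.0
    for r in rho[1:]:  # skip lag 0
        if r <= 0.0:   # stop once the estimate is dominated by noise
            break
        total += r
    return 1.0 / (1.0 + 2.0 * total)


def ar1_chain(n, phi, seed=0):
    """Hypothetical test chain: x[t] = phi * x[t-1] + unit Gaussian noise."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


if __name__ == "__main__":
    print("uncorrelated chain:", sampling_efficiency(ar1_chain(20000, 0.0)))
    print("AR(1), phi = 0.9  :", sampling_efficiency(ar1_chain(20000, 0.9)))
```

For an AR(1) chain the exact value is (1 - phi) / (1 + phi), so the phi = 0.9 case should come out near 0.05, that is, roughly 20 iterates per independent sample. The finite lag window biases the estimate slightly upward.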
So, if you have any clue about this, please, show me the way :)
Cheers,
Angel Berihuete.
I believe your description is equivalent to saying that your masses are drawn from a Poisson process. A Poisson process is a "point process": there is a random number of points that have labels, positions, and/or times (in your case, positions on a mass axis). Interacting point processes have been developed, which might be considered instead of the Poisson process. One review is on p31 (p46 of the pdf file) in http://www.cs.toronto.edu/~rpa/adams-phd-thesis.shtml
It can be tricky to tease apart dependencies due to uncertainty, and dependencies due to physics. If I start reading the list of masses {3.61, 3.72, 3.54, ...}, then I will infer the mass distribution is peaked around 3.6 and expect to see more masses around there. However, the masses might still be independent in some sense, conditioned on knowing enough about the physical system.
There are interactions that can't be well modelled by independent draws from an unknown intensity. Interaction processes have been developed to model repulsive effects in other fields. Examples: 1) a neuron has a "refractory period", it can't spike twice quickly; 2) a tree stops another tree growing underneath it.
Dear David
I have been interested in the issue of the stellar mass function for years.
When I began with this, I used to think in terms of Poisson distributions. But currently I think that a multinomial distribution is a better approach. This does not mean that stellar masses are not iid, just that their probabilities are correlated with each other.
Do you think that there is any argument which favors a Poisson approach over a multinomial one?
Thanks in advance for your opinion
Miguel Cerviño
"Do you think that there is any argument which favors a Poisson approach over a multinomial one?"
The counts in bins will be Multinomial given the total number of objects, you are correct. The marginal distribution for the total number of objects is then Poisson given some rate.
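This factorisation can be checked numerically. The sketch below, with hypothetical function names and made-up bin rates and counts, compares the product of independent Poisson probabilities for the bin counts against a Poisson probability for the total count times a Multinomial probability for how that total is split among the bins.

```python
import math


def poisson_pmf(k, rate):
    """Probability of k events for a Poisson distribution with the given rate."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def multinomial_pmf(counts, probs):
    """Probability of the bin counts given their total and the bin probabilities."""
    coef = math.factorial(sum(counts))
    for k in counts:
        coef //= math.factorial(k)  # stays an exact integer at every step
    p = float(coef)
    for k, q in zip(counts, probs):
        p *= q**k
    return p


def independent_poisson_bins(counts, rates):
    """Product of independent Poisson probabilities, one per bin."""
    p = 1.0
    for k, r in zip(counts, rates):
        p *= poisson_pmf(k, r)
    return p


def poisson_total_times_multinomial(counts, rates):
    """Poisson probability for the total, times Multinomial for the split."""
    total_rate = sum(rates)
    probs = [r / total_rate for r in rates]
    return poisson_pmf(sum(counts), total_rate) * multinomial_pmf(counts, probs)


if __name__ == "__main__":
    counts = [3, 0, 2, 1]          # made-up bin counts
    rates = [2.5, 0.4, 1.7, 0.9]   # made-up expected counts per bin
    print(independent_poisson_bins(counts, rates))
    print(poisson_total_times_multinomial(counts, rates))
```

The two printed numbers agree to floating-point precision, so the Poisson and Multinomial descriptions differ only in whether the total number of objects is treated as data or as given.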
I think it would be better to analyse unbinned data (equivalent, as Hogg pointed out, to infinitely thin bins), in which case the number of objects would be Poisson given some rate and then the object positions would be IID draws from the (normalised) mass function.
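A minimal sketch of that unbinned model, assuming for illustration a power-law mass function lambda(m) = amp * m**(-alpha) on a fixed mass range. The functional form, the range, the parameter values, and all function names are assumptions made for the example. It draws a Poisson number of masses, then evaluates the log-likelihood in two ways: as a point-process likelihood, and as a Poisson term for the number plus iid terms for the masses.

```python
import math
import random

M_LO, M_HI = 0.5, 10.0  # hypothetical mass range


def expected_number(amp, alpha):
    """Integral of amp * m**(-alpha) over [M_LO, M_HI]; requires alpha != 1."""
    return amp * (M_HI**(1 - alpha) - M_LO**(1 - alpha)) / (1 - alpha)


def log_like_point_process(masses, amp, alpha):
    """sum_i log lambda(m_i) - Lambda, dropping parameter-independent terms."""
    log_rates = sum(math.log(amp) - alpha * math.log(m) for m in masses)
    return log_rates - expected_number(amp, alpha)


def log_like_number_times_iid(masses, amp, alpha):
    """log Poisson(N | Lambda) + sum_i log p(m_i), with p = lambda / Lambda."""
    n = len(masses)
    lam = expected_number(amp, alpha)
    log_poisson = n * math.log(lam) - lam - math.lgamma(n + 1)
    log_iid = sum(math.log(amp) - alpha * math.log(m) - math.log(lam)
                  for m in masses)
    return log_poisson + log_iid


def draw_sample(amp, alpha, seed=0):
    """Draw N ~ Poisson(Lambda), then N iid masses by inverting the CDF."""
    rng = random.Random(seed)
    lam = expected_number(amp, alpha)
    # Knuth's multiplication method for a Poisson draw (fine for moderate Lambda).
    n, p, limit = 0, rng.random(), math.exp(-lam)
    while p > limit:
        n += 1
        p *= rng.random()
    a, b = M_LO**(1 - alpha), M_HI**(1 - alpha)
    return [(a + rng.random() * (b - a))**(1 / (1 - alpha)) for _ in range(n)]


if __name__ == "__main__":
    masses = draw_sample(30.0, 2.35, seed=1)
    for amp, alpha in [(30.0, 2.35), (12.0, 1.8)]:
        print(amp, alpha,
              log_like_point_process(masses, amp, alpha),
              log_like_number_times_iid(masses, amp, alpha))
```

The two log-likelihoods differ by log(N!) for every choice of amp and alpha, a constant that does not affect inference on the parameters.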