I had a long conversation with Joe Hennawi and Beta Lusso (MPIA) about Lusso's very nice fitting of multi-wavelength photometry of AGN with a combined stars plus disk plus torus plus cold dust model. She has a few to hundreds of templates for each of the four components, so she is fitting tens of millions of qualitatively different models. I think the performance of her fitting could be improved by learning priors on the templates, much as we did for our hierarchical Bayesian star–galaxy classification project. Of course it would be very computationally expensive and it might not help with her core goals, so I wasn't advocating it strongly.
However, I do believe that if you have enormous numbers of templates or archetypes or models for some phenomenon and you have many data points or real examples, you really have to use hierarchical methods to control the complexity of the problem: There is no way that all of your models are equally plausible a priori, and the best way to set their relative plausibilities is to use the data you have.
This also resolves an age-old problem: How do you decide how many templates or archetypes to include? The answer is: Include them all. The hierarchical inference will take care of down-weighting (even zeroing-out, in our experience) the less useful ones and up-weighting the more useful ones.
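To make that concrete, here is a minimal sketch in Python with numpy (the toy templates, the Gaussian noise model, and all of the numbers are my hypothetical stand-ins, not Lusso's actual setup): treat the templates as components of a mixture, give them one shared weight vector, and learn those weights from the whole data set at once by expectation-maximization. The learned weights are the prior; templates the data never favor get driven toward zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: each "template" is a mean flux vector over
# n_band photometric bands; each datum is a noisy draw from one template.
n_band, n_templates, n_data = 10, 50, 2000
templates = rng.normal(size=(n_templates, n_band))
true_weights = np.zeros(n_templates)
true_weights[:3] = [0.6, 0.3, 0.1]   # in truth, only 3 templates ever occur
labels = rng.choice(n_templates, size=n_data, p=true_weights)
sigma = 0.5
data = templates[labels] + sigma * rng.normal(size=(n_data, n_band))

# Gaussian log-likelihood of each datum under each template,
# shape (n_data, n_templates).
loglike = -0.5 * np.sum(
    (data[:, None, :] - templates[None, :, :]) ** 2, axis=2
) / sigma**2

# EM for the hierarchical weights: maximize the marginal likelihood
# sum_n log sum_k w_k p(d_n | template_k) over the weight simplex.
w = np.full(n_templates, 1.0 / n_templates)   # all equally plausible a priori
for _ in range(200):
    log_joint = loglike + np.log(w + 1e-300)
    log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
    resp = np.exp(log_joint - log_norm)       # E-step: responsibilities
    w = resp.mean(axis=0)                     # M-step: updated weights

print("weights learned for the 3 useful templates:", np.round(w[:3], 3))
print("largest weight among the 47 useless ones:", w[3:].max())
```

Starting from equal weights and letting the marginal likelihood do the work is exactly the "include them all" answer: at no point did we have to decide in advance how many templates to keep.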
This whole thing is pretty deep and I can write or talk about it for hours: In real fitting situations with real data, it is always the case that your model is both too flexible and not flexible enough. It is too flexible because your model space permits fitting data that you could never possibly see, for lots of reasons fundamental and incidental. It is not flexible enough because in fact the models are always wrong in subtle and not-so-subtle ways. I have a growing idea that we can somehow solve both of these problems at once with a "flood with archetypes, mop up with hierarchical inference" approach. More on this over the next few years!
Looking forward to hearing about it over the next few years!
Hierarchical models are great. I notice that Loredo emphasised them a lot in his review. At the SCMA meeting last year he summarised the Bayesian section of talks and told us we were all using hierarchical models even if we didn't realise it.
One thing hierarchical models don't do is allow you to "infer the prior from data". That's a silly notion. Instead, what they do is allow you to express non-ridiculous priors in high dimensions. In low dimensions, appeals to ignorance work, with vague priors to represent that ignorance. In high dimensions, if you put vague independent priors on lots of parameters, that's not just ignorance, it's also confidence about diversity! Which you usually don't want.
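A quick Monte Carlo sketch of that point (Python with numpy; the particular priors and the "all within 1 unit" similarity threshold are arbitrary choices of mine): with 100 object parameters under independent vague priors, the prior probability that the objects resemble one another is essentially zero, while a hierarchical prior with a shared, uncertain mean and scale leaves that possibility open.

```python
import numpy as np

rng = np.random.default_rng(1)
n_objects, n_draws = 100, 20_000

# Independent "vague" priors: theta_i ~ Uniform(-10, 10) for each object.
flat = rng.uniform(-10, 10, size=(n_draws, n_objects))

# Hierarchical alternative: a shared population mean and scale, themselves
# uncertain, with the object parameters scattered around them.
mu = rng.uniform(-10, 10, size=(n_draws, 1))
sigma = np.exp(rng.uniform(np.log(0.01), np.log(10.0), size=(n_draws, 1)))
hier = mu + sigma * rng.normal(size=(n_draws, n_objects))

# How often does each prior say "all 100 objects lie within 1 unit"?
spread_flat = flat.max(axis=1) - flat.min(axis=1)
spread_hier = hier.max(axis=1) - hier.min(axis=1)
print("P(spread < 1), independent vague priors:", (spread_flat < 1).mean())  # ~0
print("P(spread < 1), hierarchical prior:      ", (spread_hier < 1).mean())  # ~0.4
```

So the jointly "ignorant" flat priors amount to a near-certain prior claim that the population is wildly diverse; the hierarchical prior is the one that is actually noncommittal about it.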
@Brendon: Agreed, except for one thing: All prior information comes from data, I very much hope! There is no other source of knowledge, really.
But I don't think we really disagree, because by "data" here I think you mean "the data being used in this experiment" and I think you mean to exclude "all the data you have seen in all prior experiments" which is, I hope, providing a very important component of your prior knowledge!
"I think you mean to exclude "all the data you have seen in all prior experiments" which is, I hope, providing a very important component of your prior knowledge!
Yep. Of course the data you have now should be used to inform the prior for analysing the next data set!
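In a toy conjugate version of that chaining (a Beta-Binomial example I made up, in Python): the posterior from one data set simply becomes the prior for the next, and the bookkeeping is additive.

```python
# Beta(a, b) prior on a success probability; Binomial data update it
# conjugately: a += successes, b += failures.
a, b = 1.0, 1.0            # start with a flat Beta(1, 1) prior

a, b = a + 7, b + 3        # first data set: 7 successes, 3 failures
# the Beta(8, 4) posterior is now the prior for the next data set

a, b = a + 30, b + 20      # second data set: 30 successes, 20 failures
print("posterior mean:", a / (a + b))   # Beta(38, 24) -> about 0.613
```

Analysing both data sets together from the original flat prior gives the same Beta(38, 24), which is the consistency that makes the chaining legitimate.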
"All prior information comes from data, I very much hope! There is no other source of knowledge, really."
I'm not 100% convinced of that. Obviously it's extremely important, but I don't really know why we aren't Jaynes's poorly informed robot (who can observe an arbitrarily large amount of data and still not know anything else). Obviously we can learn, but you need a certain kind of prior information in order to learn. Not sure how to solve that, though; I think it might be related to those regresses in philosophy that are hard to terminate.