2019-09-09

enumerating all possible statistical tests

Today I had my first weekly meeting (of the new academic year) with Kate Storey-Fisher (NYU). We went through priorities and then spoke about the problem of performing some kind of comprehensive or complete search of the large-scale structure data for anomalies. One option (popular these days) is to train a machine-learning method to recognize what's ordinary and then ask it to flag non-ordinary structures as anomalies. This is a great idea! But it has the problem that, at the end of the day, you don't know how many hypotheses you have tested. If you find a few-sigma anomaly, that isn't surprising if you have looked in many thousands of possible “places”. It is surprising if you have only looked in a few. So I am looking for comprehensive approaches in which we can pre-register an enumerated list of the tests we are going to do, but in which that list is exceedingly long (machine-generated, say). This is turning out to be a hard problem.
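
To make the counting argument concrete, here is a minimal sketch of the arithmetic, assuming the tests are independent and the significances are two-sided Gaussian (both simplifications). The function names and the little grid of statistics, scales, and subsamples are hypothetical, invented only to show what a machine-generated, enumerable test list might look like; they are not the tests we discussed.

```python
import itertools
import math


def two_sided_p(sigma):
    """Two-sided Gaussian tail probability for a deviation of `sigma`."""
    return math.erfc(sigma / math.sqrt(2.0))


def prob_at_least_one(sigma, n_tests):
    """Chance that at least one of n independent null tests reaches `sigma`."""
    p = two_sided_p(sigma)
    # 1 - (1 - p)^n, written with log1p/expm1 for numerical stability
    return -math.expm1(n_tests * math.log1p(-p))


# Hypothetical pre-registered list: every combination of statistic,
# scale, and subsample. All of these names are illustrative assumptions.
statistics = ["two_point", "three_point", "void_count"]
scales_mpc = [10, 20, 50, 100]
subsamples = ["all", "red", "blue"]
tests = list(itertools.product(statistics, scales_mpc, subsamples))

n = len(tests)  # 3 * 4 * 3 = 36 enumerated tests
print(n, prob_at_least_one(3.0, n))

# The same 3-sigma threshold after looking in thousands of "places"
print(prob_at_least_one(3.0, 5000))
```

The point of enumerating the list up front is that `n` is known before looking at the data, so the trials factor can be computed rather than guessed. Under these assumptions a 3-sigma deviation is close to certain to show up somewhere among thousands of tests, and is still rare among a handful.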
