## 2014-08-05

### understanding the behavior of complex code

Three of my conversations today shared a theme. Zhitai Zhang (MPIA) is working out, for a given stellar position in three-space and radial velocity, what kinds of orbits that star might be on, given a Milky Way gravitational potential model and the unknown proper motion. Wilma Trick (MPIA) is generating toy stellar position and velocity data on a toy Milky Way disk, censoring it with a toy selection function, and then trying to infer the toy model parameters from the simulated data. Ben Johnson (UCSC) is burning in his simultaneous fit of a star-cluster model and a spectrophotometric calibration vector. In all three cases, the code is doing something unexpected, despite passing lots of local sanity checks. This is like the difference between unit testing and functional testing: Sometimes the whole system is hard to understand, especially when it is a complex combination of physics, statistics, and code approximations. Is the puzzling behavior a problem with the code, or is it a problem with the customer (me), who just can't handle the truth?

Zhang sometimes finds that the true stellar orbit has low probability given the data, even when there are no observational errors on the four observed phase-space coordinates. We think this has something to do with the orbital phase being such that the (unobserved) transverse velocity is unusually large (or small). Trick is finding biased disk-structure parameters, even when she is generating and fitting with the same model family and the data are perfect; we suspect either the toy-data generation or else some treatment of the (trivial) censoring. Johnson is finding that the cluster amplitude or mass is unconstrained by the spectral data, even when the calibration uncertainty is set to something finite. In each case, we can't quite tell whether the behavior is evidence for problems with the code or problems with our concepts (that is, is it a bug or a think-o?). All I know how to do in these cases is come up with sensible sanity checks on the code. We suggested sampling a larger range in transverse velocity for Zhang, making even simpler toy data for Trick, and looking at every stage in the likelihood calculation for Johnson. The last of these ran well into the night, and I believe in the end it was a think-o, not a bug.
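
To make the "even simpler toy data" idea concrete, here is a minimal one-dimensional sketch of that kind of end-to-end check. It is not Trick's code or model: the exponential radial profile, the hard cut at `r_max` standing in for the selection function, the grid-search fit, and all function names are assumptions chosen only for illustration. The point is that the same censored data give a biased scale length when the likelihood ignores the censoring, and recover the truth when the likelihood is normalized over the selected region.

```python
import numpy as np


def make_toy_data(scale, r_max, n, rng):
    """Draw radii from an exponential profile, then censor with a hard cut.

    The cut r < r_max is a stand-in (assumed) selection function.
    """
    r = rng.exponential(scale, size=n)
    return r[r < r_max]


def neg_log_like(scale, n, r_sum, r_max=None):
    """Negative log-likelihood of n radii with sum r_sum under an exponential.

    If r_max is given, the density is renormalized over [0, r_max), which is
    the correct treatment of the censoring; if not, the censoring is ignored.
    `scale` may be a scalar or a NumPy array of trial values.
    """
    nll = n * np.log(scale) + r_sum / scale
    if r_max is not None:
        # Truncated density: exp(-r/h) / h / (1 - exp(-r_max/h))
        nll = nll + n * np.log1p(-np.exp(-r_max / scale))
    return nll


def fit_scale(r, r_max=None):
    """Brute-force maximum-likelihood scale length on a fixed grid."""
    grid = np.linspace(0.5, 10.0, 2000)
    nll = neg_log_like(grid, len(r), np.sum(r), r_max)
    return grid[np.argmin(nll)]


rng = np.random.default_rng(42)
true_scale, r_max = 3.0, 6.0
r = make_toy_data(true_scale, r_max, 200_000, rng)

naive = fit_scale(r)            # likelihood ignores the selection function
correct = fit_scale(r, r_max)   # likelihood accounts for the selection function
print(f"true {true_scale:.2f}  naive {naive:.2f}  correct {correct:.2f}")
```

With these settings the naive fit should land well below the true scale length (it is essentially the mean of the truncated sample), while the censoring-aware fit should come back close to the input. If a check this simple fails, it is a bug; if it passes and the full problem still misbehaves, the think-o hypothesis gets more plausible.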