I wrote text in the “Methods and data” section of the mass-and-age paper I am writing with Ness. I particularly emphasized the point that The Cannon is a probabilistic model: It is a likelihood function, which is optimized at training time and then again at test time. The only difference between training and test is which parameters are varied. At training time, it is the spectral expectation and variance parameters, at fixed label values. At test time, it is the label values, at fixed spectral expectation and variance parameters. The cool thing about this likelihood formulation is that it makes it trivial to account for heteroskedastic noise variances and missing data (in both the training data and the test data).
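To make the training/test symmetry concrete, here is a minimal NumPy sketch of a Cannon-style likelihood. It is not the paper's code or any released Cannon implementation. Several assumptions are mine: a quadratic-in-labels model for the spectral expectation, Gaussian per-pixel noise whose variance is the reported noise variance plus a per-pixel scatter `s2`, missing pixels encoded as inverse variance zero, and the hypothetical helper names `design_matrix`, `train`, and `fit_labels`. Training maximizes the likelihood pixel by pixel (weighted least squares for the coefficients, a grid search over `s2`); test time maximizes the same likelihood over the labels with Gauss-Newton.

```python
import numpy as np

def design_matrix(labels):
    """Hypothetical quadratic vectorizer: [1, l_k, l_k * l_m for k <= m]."""
    labels = np.atleast_2d(labels)
    n, K = labels.shape
    cols = [np.ones(n)] + [labels[:, k] for k in range(K)]
    cols += [labels[:, k] * labels[:, m] for k in range(K) for m in range(k, K)]
    return np.column_stack(cols)

def predict(labels, theta):
    # Spectral expectation, shape (n_stars, n_pixels).
    return design_matrix(labels) @ theta.T

def neg_ln_like(resid, w):
    # -ln L for Gaussian noise with variance 1/w; pixels with w == 0 (missing) drop out.
    ok = w > 0
    return 0.5 * np.sum(w[ok] * resid[ok] ** 2 - np.log(w[ok]))

def train(labels, flux, ivar, s2_grid):
    """Training: vary theta and s2 per pixel, at FIXED labels."""
    A = design_matrix(labels)
    flux = np.where(ivar > 0, flux, 0.0)  # missing fluxes get zero weight anyway
    P = flux.shape[1]
    theta, s2 = np.zeros((P, A.shape[1])), np.zeros(P)
    for p in range(P):
        best = np.inf
        for trial in s2_grid:
            # Weight = 1 / (sigma^2 + s2), written so that ivar == 0 gives weight 0.
            w = ivar[:, p] / (1.0 + trial * ivar[:, p])
            sw = np.sqrt(w)
            th, *_ = np.linalg.lstsq(A * sw[:, None], flux[:, p] * sw, rcond=None)
            nll = neg_ln_like(flux[:, p] - A @ th, w)
            if nll < best:
                best, theta[p], s2[p] = nll, th, trial
    return theta, s2

def fit_labels(theta, s2, flux, ivar, init, n_iter=50, h=1e-4):
    """Test: vary the labels of one star, at FIXED theta and s2 (Gauss-Newton)."""
    w = ivar / (1.0 + s2 * ivar)  # does not depend on labels
    flux = np.where(ivar > 0, flux, 0.0)
    labels = np.asarray(init, dtype=float).copy()
    K = labels.size
    for _ in range(n_iter):
        r = flux - predict(labels, theta)[0]
        J = np.empty((flux.size, K))
        for j in range(K):  # central differences (exact for a quadratic model)
            e = np.zeros(K)
            e[j] = h
            J[:, j] = (predict(labels + e, theta)[0] - predict(labels - e, theta)[0]) / (2 * h)
        JtW = J.T * w
        delta = np.linalg.solve(JtW @ J, JtW @ r)
        labels += delta
        if np.max(np.abs(delta)) < 1e-10:
            break
    return labels

# Synthetic demo with heteroskedastic noise and ~10% missing pixels.
rng = np.random.default_rng(42)
N, P, K, true_s2 = 200, 50, 2, 1e-4
true_labels = rng.uniform(-1, 1, size=(N, K))
true_theta = np.column_stack([rng.normal(1.0, 0.1, P),
                              rng.normal(0, 0.3, (P, K)),
                              rng.normal(0, 0.05, (P, 3))])
sigma = rng.uniform(0.005, 0.02, size=(N, P))
flux = predict(true_labels, true_theta) + rng.normal(0, np.sqrt(sigma**2 + true_s2))
ivar = 1.0 / sigma**2
missing = rng.random((N, P)) < 0.1
ivar[missing], flux[missing] = 0.0, np.nan

s2_grid = np.concatenate([[0.0], np.logspace(-6, -2, 21)])
theta, s2 = train(true_labels, flux, ivar, s2_grid)

star_labels = np.array([0.3, -0.5])
star_flux = predict(star_labels, true_theta)[0] + rng.normal(0, np.sqrt(0.01**2 + true_s2), P)
star_ivar = np.full(P, 1.0 / 0.01**2)
star_ivar[:5] = 0.0  # five missing pixels in the test spectrum
fit = fit_labels(theta, s2, star_flux, star_ivar, init=np.zeros(K))
```

Because a missing pixel simply gets zero weight, neither heteroskedastic noise nor missing data needs a special case in either step. Note also that at test time the log-variance term does not depend on the labels, so in this sketch the label fit reduces to weighted nonlinear least squares.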