[I am on vacation this week; that didn't stop me from doing a tiny bit of research.]
I did a bit of writing for the project of taking The Cannon into compressed-sensing territory, while Andy Casey (Cambridge) structures the code so we are ready to work on the problem when he is here in NYC in a couple of weeks. I tried to work out the most conservative possible train–validate–test framework, consistent with some ideas from Foreman-Mackey. I also tried to understand what figures we will make to demonstrate that we are getting better or more informative abundances than other approaches.
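For concreteness, here is a minimal sketch of the kind of three-way split I have in mind. It is not the framework we will actually adopt: the function name `three_way_split`, the 60/20/20 fractions, and the seed are all hypothetical choices made for illustration. The only point is that the three sets of stars are disjoint.

```python
import numpy as np

def three_way_split(n_stars, frac_train=0.6, frac_validate=0.2, seed=0):
    """Return disjoint index arrays (train, validate, test) over n_stars stars.

    The fractions and the seed are illustrative assumptions, not project values.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_stars)  # shuffle once so the split is random
    n_train = int(frac_train * n_stars)
    n_validate = int(frac_validate * n_stars)
    train = order[:n_train]
    validate = order[n_train:n_train + n_validate]
    test = order[n_train + n_validate:]  # whatever is left over
    return train, validate, test

train, validate, test = three_way_split(1000)
```

The conservative part would be in the usage: fit on `train`, set any hyperparameters (a regularization strength, say) using `validate`, and touch `test` only once, at the end, to report performance.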
Hans-Walter called to discuss the behavior of The Cannon when we try to fit large numbers of chemical-abundance labels. The code finds that its best model for one element will make use of lines from other elements. Why? He pointed out (correctly) that The Cannon does its best to predict abundances. In no sense is it directly measuring the abundances. It is doing its best to predict, and the best prediction will measure the element directly, and also include useful indirect information. So we have to decide what our goals are, and whether to restrict the model.
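A toy illustration of the point, with loud caveats: this is not The Cannon (which models the flux as a function of the labels), just an ordinary least-squares regression on made-up data. The two "elements", the two-pixel "spectrum", the correlation strength, and the noise level are all assumptions invented for the sketch. What it shows is that when two abundances are correlated across the training set, the best predictor of element A puts weight on the pixel that responds only to element B.

```python
import numpy as np

rng = np.random.default_rng(42)
n_stars = 2000

# Hypothetical labels: abundances of elements A and B, correlated across stars.
a = rng.normal(size=n_stars)
b = 0.8 * a + 0.6 * rng.normal(size=n_stars)

# Toy two-pixel "spectrum": pixel 0 responds only to A, pixel 1 only to B.
noise = 0.5
flux = np.column_stack([a, b]) + noise * rng.normal(size=(n_stars, 2))

# Unrestricted model: predict A from both pixels plus a constant term.
X_full = np.column_stack([flux, np.ones(n_stars)])
w_full, *_ = np.linalg.lstsq(X_full, a, rcond=None)

# Restricted model: predict A from its own pixel (plus constant) only.
X_restricted = X_full[:, [0, 2]]
w_restricted, *_ = np.linalg.lstsq(X_restricted, a, rcond=None)

# In-sample RMS prediction error for each model.
rms_full = np.sqrt(np.mean((X_full @ w_full - a) ** 2))
rms_restricted = np.sqrt(np.mean((X_restricted @ w_restricted - a) ** 2))

print("weight on B's pixel when predicting A:", w_full[1])
print("RMS error, unrestricted vs restricted:", rms_full, rms_restricted)
```

The unrestricted model predicts A better, but part of its answer comes from B's line. The restricted model is closer to a measurement of A, at the cost of a noisier prediction. That is the trade-off we have to decide about.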