John Moustakas (Siena) came in for a day of pair-coding, trying to resurrect a project we worked on way back in 2007 and 2008 with Sam Roweis (deceased) for the PRIMUS survey. The idea is to find the best subset of a galaxy spectral template list to explain a set of spectral data (and get redshifts). We cast this “best subset” problem as an integer programming problem—minimize the number of templates such that every data point is well represented by at least one template—and solved it using IBM's amazing (and closed-source, proprietary) CPLEX optimizer for non-convex problems. Today, Moustakas and I not only got our code from 2008 working again (let's hear it for readable code!), we replaced the old IDL code (that writes input files for CPLEX) with direct calls from Python to the new CPLEX Python API. It worked!
I don't want to sound like an advertisement for IBM, but CPLEX is hundreds of times faster than anything we were able (even with Roweis's help) to write ourselves, and also found consistently better solutions. These problems are simple to state but NP-Hard to solve. CPLEX is unbelievably expensive (and subject to export restrictions) but it is available for free to US academics through IBM's academic initiative.