While I was on the plane flying home, Rory Holmes (MPIA) implemented some things we had talked about related to internal (self-) calibration of photometric data sets, in a toy context. It looks (as expected) like the quality with which you determine your flat-field is a strong function of how you sampled it. This project is expected to evolve into something about how you should design your survey (WFIRST or LSST or the like) to ensure precise photometric calibration.
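In case the toy context is unclear, the flavor of such a self-calibration experiment can be sketched in a few lines (this is my own illustration, not Rory's code; every number and name in it is made up). Each observation is one star measured through one flat-field pixel, and alternating least squares recovers the flat up to an overall scale, provided the survey strategy cross-links stars and pixels:

```python
import numpy as np

# Toy internal self-calibration: counts = flux_star * flat_pixel + noise.
# With enough cross-linking (each star visiting many pixels), the
# flat-field is recoverable from the science data alone, up to one
# overall scale degeneracy.
rng = np.random.default_rng(42)
n_stars, n_pix, n_obs = 50, 10, 2000
flux = rng.uniform(1.0, 10.0, n_stars)        # true star fluxes
flat = rng.uniform(0.8, 1.2, n_pix)           # true flat-field
star = rng.integers(0, n_stars, n_obs)        # which star is observed
pix = rng.integers(0, n_pix, n_obs)           # which pixel (dithered survey)
counts = flux[star] * flat[pix] + 0.01 * rng.standard_normal(n_obs)

# Alternating least squares: solve for fluxes given the flat,
# then for the flat given the fluxes, and repeat.
f_hat = np.ones(n_stars)
g_hat = np.ones(n_pix)
for _ in range(20):
    for s in range(n_stars):
        m = star == s
        f_hat[s] = np.sum(counts[m] * g_hat[pix[m]]) / np.sum(g_hat[pix[m]] ** 2)
    for p in range(n_pix):
        m = pix == p
        g_hat[p] = np.sum(counts[m] * f_hat[star[m]]) / np.sum(f_hat[star[m]] ** 2)
g_hat *= np.mean(flat) / np.mean(g_hat)       # fix the overall-scale degeneracy
```

A badly designed survey (each star always landing on the same pixel) would make the star-pixel system degenerate and the flat unrecoverable; that design question is the point of the project.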
On my last day at MPIA—what a great summer it has been—Tsalmantza and I raced an iterated gradient descent against a block-diagonal method for solving the matrix factorization problem we have been working on (I have given it many names, but I now realize that Roweis would have called it matrix factorization). We were both sure that gradient descent would crush, because in our (not brilliant) R implementation the gradient-descent steps execute hundreds of times faster than the block-diagonal steps. But it turned out that gradient descent needs more than hundreds of times as many iterations! So the block-diagonal method wins. We could probably beat both with conjugate gradient, but that will wait for another day. I have packing to do!
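The race is easy to reproduce on a toy problem (this is a sketch of the idea, not our R code; the problem sizes and step size are invented). The block method solves exactly for each factor in turn, while gradient descent takes many cheap but small steps:

```python
import numpy as np

# Race: factor X ~ A @ G by (1) block updates that solve exactly for
# A given G and for G given A, versus (2) plain gradient descent.
rng = np.random.default_rng(0)
N, M, K = 40, 30, 3
X = rng.standard_normal((N, K)) @ rng.standard_normal((K, M))  # exact rank K

def loss(A, G):
    return np.sum((X - A @ G) ** 2)

# block-coordinate (alternating least squares): a handful of sweeps suffice
A = rng.standard_normal((N, K))
G = rng.standard_normal((K, M))
for _ in range(10):
    A = X @ G.T @ np.linalg.inv(G @ G.T)       # exact solve for A given G
    G = np.linalg.inv(A.T @ A) @ A.T @ X       # exact solve for G given A
als_loss = loss(A, G)

# gradient descent: each step is cheap, but progress per step is tiny
A = rng.standard_normal((N, K))
G = rng.standard_normal((K, M))
eta = 1e-3                                     # step size, chosen for stability
for _ in range(10):
    R = A @ G - X
    A, G = A - eta * 2.0 * R @ G.T, G - eta * 2.0 * A.T @ R
gd_loss = loss(A, G)
```

After the same number of iterations the exact block solves are many orders of magnitude ahead, which is the behavior described above.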
On the weekend I read—at Lang's suggestion—An Introduction to the Conjugate Gradient Method Without the Agonizing Pain (PDF) by Jonathan Shewchuk (Berkeley). It made me infinitely powerful: I can now do gradient descent and conjugate gradient effortlessly. I wrote the relevant functions for my project with Tsalmantza. Shewchuk's document contains much knowledge, beautifully expressed. You must read it!
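For reference, the basic linear conjugate-gradient iteration from Shewchuk's document looks like this (my own transcription of the standard algorithm, in Python rather than our R, for a symmetric positive-definite system A x = b):

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive-definite A by conjugate gradient."""
    n = len(b)
    x = np.zeros(n)
    r = b - A @ x                  # residual
    d = r.copy()                   # first search direction = steepest descent
    delta = r @ r
    for _ in range(max_iter or n):
        q = A @ d
        alpha = delta / (d @ q)    # exact line search along d
        x += alpha * d
        r -= alpha * q
        delta_new = r @ r
        if delta_new < tol:
            break
        d = r + (delta_new / delta) * d   # next direction, A-conjugate to the rest
        delta = delta_new
    return x

# toy test system (made up): SPD matrix and random right-hand side
rng = np.random.default_rng(1)
M = rng.standard_normal((20, 20))
A = M @ M.T + 20.0 * np.eye(20)
b = rng.standard_normal(20)
x = conjugate_gradient(A, b)
```

In exact arithmetic this converges in at most n iterations, and in practice far fewer when the system is well conditioned; that is the magic Shewchuk explains.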
Tsalmantza and I confronted the degeneracies you face when you use vectors to span a linear space: The components of our spectral basis can be reordered, renormalized, and made into linear combinations of one another, all without changing any of the final results. This means (a) you should never plot the basis vectors themselves, only their fits to real objects; it is not the basis vectors that your optimization determines but the subspace they span. And (b) you should break these degeneracies as you go, so your code outputs something predictable! We broke the degeneracies the same way PCA does, except much better (of course!): We normalize the basis vectors sensibly (all have the same sum of squares), orthogonalize them by diagonalizing the squared matrix of their coefficients when fit to the data, and sort them by eigenvalue, most important first.
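The degeneracy-breaking recipe, in toy form (a sketch of the idea, not our actual code): any invertible K-by-K matrix R maps the fit (A, G) to the equivalent fit (A R^{-1}, R G) without changing the product A G, so we are free to pick the R that makes the coefficients orthogonal and importance-ordered, PCA-style:

```python
import numpy as np

def break_degeneracies(A, G):
    """Rotate and rescale (A, G) so that A @ G is unchanged but the
    coefficients are orthogonal, sorted by importance, and the basis
    vectors all have the same sum of squares."""
    # diagonalize the "squared matrix of coefficients" A^T A
    vals, vecs = np.linalg.eigh(A.T @ A)
    order = np.argsort(vals)[::-1]       # most important first
    vecs = vecs[:, order]
    A_new = A @ vecs                     # rotated coefficients
    G_new = vecs.T @ G                   # rotated basis spans the same subspace
    # normalize basis vectors to equal sum of squares
    norms = np.sqrt(np.sum(G_new ** 2, axis=1))
    return A_new * norms, G_new / norms[:, None]

# toy check on random factors (made-up sizes)
rng = np.random.default_rng(2)
A = rng.standard_normal((100, 3))
G = rng.standard_normal((3, 50))
A2, G2 = break_degeneracies(A, G)
```

The point is that the reconstruction A2 @ G2 is bit-for-bit the same fit as A @ G; only the (previously arbitrary) representation has been pinned down.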
Now the code outputs exactly what PCA outputs, except that it optimizes chi-squared rather than the unscaled sum of squares, and is therefore a much better representation of the data. It is a plug-in replacement for PCA that is better in every respect. Have I mentioned that I am excited about this? Now we must write the method paper and then finish the myriad projects we are doing with it.
I finished my fitting-a-straight-line document; I will submit it to arXiv tomorrow. In the afternoon, Tsalmantza and I found that the non-negative updates are not only very successful but also very fast, much faster than the unconstrained updates. We think this is just because we are doing the unconstrained updates in a silly way (for R, anyway). So speeding up the normal updates goes on the to-do list.
[after the fact: The fitting-a-line document is http://arxiv.org/abs/1008.4686. Thanks to everyone who helped out!]
Rix and I discussed a project to design observing strategies that optimize calibration performance of a survey, with Euclid, WFIRST, and LSST in mind. The idea is that, as with SDSS, the best calibration will be internal calibration, and that is optimized when there is non-degenerate information about sources and pixels. Rory Holmes (MPIA) will write some code here with direct impact on the Euclid planning, but the project is general.
I worked out for Tsalmantza the non-negative updates we would need if we made our data-driven spectral modeling non-negatively constrained. I hope both my readers know that non-negative matrix factorization is a solved problem (and a simple one at that), even for a heteroscedastic (proper chi-squared) objective function. There is no reason ever to use PCA again! I have in mind that we could give good LRG redshift-finding templates to the BOSS.
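For the record, the heteroscedastic multiplicative updates look like this in toy form (a sketch, not our R implementation; the data, weights, and sizes are all made up). They keep both factors non-negative while decreasing chi-squared, which here means the weighted sum of squares with per-datum inverse variances W:

```python
import numpy as np

# Weighted (chi-squared) non-negative matrix factorization:
# minimize  chi2 = sum_ij W_ij (X_ij - [A G]_ij)^2  with A, G >= 0,
# via the multiplicative update rules (everything stays non-negative
# because the updates only multiply by ratios of positive quantities).
rng = np.random.default_rng(3)
N, M, K = 30, 25, 2
X = rng.uniform(0, 1, (N, K)) @ rng.uniform(0, 1, (K, M))  # non-negative data
W = rng.uniform(0.5, 2.0, (N, M))                          # inverse variances

A = rng.uniform(0.1, 1.0, (N, K))
G = rng.uniform(0.1, 1.0, (K, M))

def chi2(A, G):
    return np.sum(W * (X - A @ G) ** 2)

chi2_start = chi2(A, G)
for _ in range(200):
    G *= (A.T @ (W * X)) / (A.T @ (W * (A @ G)))
    A *= ((W * X) @ G.T) / ((W * (A @ G)) @ G.T)
chi2_end = chi2(A, G)
```

With W set to the inverse variances of the data, this is the "proper chi-squared" objective mentioned above rather than the unscaled sum of squares that plain PCA and plain NMF optimize.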
Submitted my paper with Myers and Bovy on inferring the eccentricity distribution. It will appear on arXiv tomorrow. I also had long conversations with Bovy about all the on-going projects, especially those related to quasar target selection.
Tsalmantza and I implemented a Gaussian-processes-like prior on her spectral model, to enforce smoothness at wavelengths where there are very few data. We looked at the results today and they are very nice. It works great, costs almost no time per iteration, and doesn't slow down convergence. This is great for a bunch of projects we are working on.
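The idea in one dimension (a toy sketch with made-up data, not our spectral model): add a squared-second-difference penalty to chi-squared and solve the whole thing as a single linear system. The penalty couples neighboring wavelengths, which is what fills in the regions with few data:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
lam = np.linspace(0.0, 1.0, n)
truth = np.sin(2.0 * np.pi * lam)        # the "spectrum" to be recovered

# sparse, noisy data: only a few wavelengths are observed
obs = rng.choice(n, size=15, replace=False)
y = truth[obs] + 0.05 * rng.standard_normal(15)
ivar = np.full(15, 1.0 / 0.05 ** 2)      # inverse variances

# minimize  chi2 + eps * ||D2 g||^2  in one linear solve
P = np.zeros((15, n))
P[np.arange(15), obs] = 1.0              # picks out the observed pixels
D2 = np.diff(np.eye(n), n=2, axis=0)     # second-difference operator
eps = 10.0                               # smoothness strength (made up)
AtA = P.T @ (ivar[:, None] * P) + eps * (D2.T @ D2)
Atb = P.T @ (ivar * y)
g = np.linalg.solve(AtA, Atb)
```

Without the eps term the system is singular wherever there are no data; with it, the solve is well posed everywhere, and the extra matrix term costs essentially nothing per iteration, consistent with what we saw.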
Late in the day I discussed image modeling with Lang; we have a (somewhat) practical proposal to replace the SDSS imaging catalog with something that is optimal (under some assumptions) in the pixel space. This should improve the catalog at the faint end and in regions where the data have issues.
Tsalmantza and I pair-coded two parts of our spectral modeling software (in R; it has limitations, but it is open-source): (1) We set things up to include a Gaussian-process-like smoothness prior. This ended up being trivial, but it looked bad at first because it makes a block-diagonal matrix non-block-diagonal. (2) We worked through the bugs in the highly informative prior we are trying out on our quasar-redshift project: Do we measure the redshifts of quasars better if we insist that the redshift be measured by a spectrum that looks very much like an already-seen quasar? The prior is what I would call a mixture-of-archetypes. After we finished the latter part, the answer looks like it is going to be yes. If so, this is a win for Bayes; this project directly pits a marginalize-out-the-nuisances method (with a highly informative prior) against an optimize-everything method; the first is very Bayesian (extreme, you might say) and the second is very frequentist.
I led a discussion at the MPIA of the results of the Astronomy and Astrophysics Decadal Survey, which was released last week. I was much happier with the outcome than many of my colleagues here. Of course there is confusion about what it all means for Euclid, in which MPIA has a stake. Does this count as research? If not, I also made the affine-parameter prior PDF proper in Jagannath's and my curve-fitting project, and looked at new results of quasar redshift determination with Tsalmantza.
I wrote a polemical discussion section and thereby finished the zeroth draft of Myers and my eccentricity distribution paper. Just waiting for code to finish running on that, and then I will have to take the salty language down a notch. In the afternoon, Schmidt and I discussed his periodogram code for searching SDSS stripe 82 for periodic variables, and spectral handles on his quasar variability findings.
In the morning I finished the SDSS-III DR8 visualization. In the afternoon, Myers and I pair-coded things related to the SDSS—Pan-STARRS cross-calibration we are working on. For reasons outside of our control, we pair-coded at Jumpinn.
I worked on my full-SDSS-III imaging visualizations today. These visualizations are built from every single detected object in every survey-primary bit of imaging. So they are challenging to build. Some day, it would be fun to go visualization-crazy: Why do the theorists always have better visuals?
I spent the week traveling (hence no posts) and writing. I wrote text about the structure function and Gaussian processes for Bovy and Kasper Schmidt (MPIA). I wrote text about fitting a curve to a set of points in phase space for Jagannath. Next week it is back to Heidelberg and back to work.
I realized today that my eccentricity distribution code can use the methods of Gaussian processes. This pleased me greatly, as I have been wanting to learn about them. I also gave a short talk on image modeling and worked on updating my well-known but out-of-date SDSS zoom (see frame above).
Myers and I worked out a robust likelihood function to use for comparing synthetic Pan-STARRS fluxes to real Pan-STARRS data. The synthetic magnitudes will be constructed from overlapping SDSS imaging data, with a flexible linear model. Eventually the hope is to variability-select objects, with two epochs spanning a six-to-ten-year baseline.
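A robust likelihood of the usual mixture form conveys the idea (this sketch is my generic version; the actual function Myers and I wrote, and its parameter values, may differ): each point is an inlier with Gaussian noise at the quoted uncertainty, or, with some small probability, an outlier drawn from a much broader Gaussian:

```python
import numpy as np

def ln_likelihood(residual, sigma, p_bad=0.05, sigma_bad=10.0):
    """Per-point robust log-likelihood: Gaussian inlier plus broad outlier."""
    # inlier: Gaussian with the quoted uncertainty sigma, weight (1 - p_bad)
    ln_in = (np.log(1.0 - p_bad) - 0.5 * (residual / sigma) ** 2
             - np.log(np.sqrt(2.0 * np.pi) * sigma))
    # outlier: much broader Gaussian, weight p_bad
    ln_out = (np.log(p_bad) - 0.5 * (residual / sigma_bad) ** 2
              - np.log(np.sqrt(2.0 * np.pi) * sigma_bad))
    return np.logaddexp(ln_in, ln_out)   # stable log-sum-exp of the two branches

# a 10-sigma outlier costs far less than it would under a pure Gaussian
pure = -0.5 * 10.0 ** 2 - np.log(np.sqrt(2.0 * np.pi))
robust = ln_likelihood(np.array([10.0]), sigma=1.0)[0]
```

The payoff is that a handful of catastrophically bad cross-matches or variable sources cannot dominate the cross-calibration fit, while well-behaved points are treated almost exactly as in the pure Gaussian case.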
I demonstrated with our fake exoplanet data that my exoplanet deconvolution methodology works. This is even more extreme than Bovy's extreme deconvolution code, because it can use data with arbitrarily complicated uncertainties—not just heteroscedastic but also non-Gaussian, bimodal, or what have you. It ain't fast.
In related news, I figured out (the hard way) that if you are sampling the posterior probability distribution function but keeping track of the likelihoods, the best likelihood you see in the sampling is not necessarily all that close to the true maximum of the likelihood function. This is obvious in retrospect, but I was confused for a while. I think it is true that if your prior is not extremely informative, you ought to eventually see the maximum-likelihood point, but you might have to wait a long time, especially if your likelihood is not very sharply peaked.
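A two-minute numerical check of this point (toy numbers, made up by me): draw posterior samples from a 30-dimensional Gaussian likelihood under a flat prior and compare the best log-likelihood seen in the sampling to the true maximum at the mode:

```python
import numpy as np

rng = np.random.default_rng(5)
d, n_samples = 30, 10000
samples = rng.standard_normal((n_samples, d))   # posterior = N(0, I), flat prior
lnL = -0.5 * np.sum(samples ** 2, axis=1)       # log-likelihood up to a constant
best_seen = lnL.max()                           # best likelihood in the sampling
true_max = 0.0                                  # log-likelihood at the mode
# typical samples live on a shell of radius ~sqrt(d) away from the mode,
# so even the best of 10^4 samples falls well short of the peak value
```

In high dimensions essentially no posterior mass sits at the mode, so the gap between the best sampled likelihood and the true maximum grows with dimension; that is the "obvious in retrospect" part.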
Myers and I sat side-by-side for much of the day getting and checking samplings for the posterior probability distributions for exoplanet parameters for a big chunk of fake data. We decided that the distribution inference code we are writing should run on samplings—since so many astronomers are producing samplings these days—so we need lots of samplings for testing.
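The sampling-based trick can be sketched as follows (this is my toy illustration of the importance-sampling idea with made-up eccentricity data, not the actual code): if each object's sampling was made under a flat interim prior, the marginal likelihood of population parameters is approximated by averaging the population density over each object's samples and multiplying across objects:

```python
import numpy as np
from math import gamma

rng = np.random.default_rng(6)

# fake data: true eccentricities from a Beta(2, 5) population; each object's
# "posterior sampling" is approximated as Gaussian scatter about the truth
# under a flat interim prior on [0, 1]
n_obj, n_samp = 200, 64
e_true = rng.beta(2.0, 5.0, n_obj)
e_samples = np.clip(
    e_true[:, None] + 0.05 * rng.standard_normal((n_obj, n_samp)),
    1e-4, 1.0 - 1e-4)

def beta_pdf(x, a, b):
    """Beta density, written out so the sketch needs only the stdlib gamma."""
    return x ** (a - 1.0) * (1.0 - x) ** (b - 1.0) * gamma(a + b) / (gamma(a) * gamma(b))

def ln_pop_likelihood(a, b):
    # average the population pdf over each object's samples (the flat
    # interim prior divides out trivially), then sum the logs over objects
    per_obj = np.mean(beta_pdf(e_samples, a, b), axis=1)
    return np.sum(np.log(per_obj))

good = ln_pop_likelihood(2.0, 5.0)   # the true population parameters
bad = ln_pop_likelihood(5.0, 1.0)    # a badly wrong alternative
```

This is why having lots of samplings on hand matters: the distribution inference never touches the raw data again, only the per-object samples.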