I have various fantasies about book-length writing projects. I worked a bit on one of them today; no other research, because some toboggans and a snowy slope beckoned.
2010-12-30
2010-12-29
model complexity for HMF
I got up early and got my ya-yas out on model complexity in Tsalmantza and my paper on HMF. I have been thinking about finishing a note on model complexity for my/our Data Analysis Recipes series, and since (a) the issue is ringing around in my otherwise empty head, and (b) the issue comes up in the HMF paper, the HMF paper just got an overly-long polemic on why you shouldn't use AIC, BIC, or (naive) Bayesian evidence integrals. I am sure (I hope, perhaps?) that we will end up removing it before we submit!
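For reference, the information criteria the polemic argues against are simple penalized functions of the maximized likelihood. A minimal sketch of the standard formulae (generic, nothing to do with the HMF code itself):

```python
import math

def aic(max_log_like, k):
    """Akaike information criterion: 2 k - 2 ln L_max,
    where k is the number of free parameters."""
    return 2.0 * k - 2.0 * max_log_like

def bic(max_log_like, k, n):
    """Bayesian information criterion: k ln n - 2 ln L_max,
    where n is the number of data points."""
    return k * math.log(n) - 2.0 * max_log_like
```

Both are to be minimized over models; the complaint (as I read it) is that these scalar penalties stand in for marginalization that the problem may not justify.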
2010-12-28
station keeping
The only research I did today, in between getting relatives to various travel destinations, was to answer email from some of my hard-working students, two of whom are close to being able to criticize the LSST filter choice objectively.
2010-12-27
more HMF
I spent most of my research time today and over the last few days working on the HMF paper. Iain Murray, in the comments to my last post, pointed out that there might be a connection to factor analysis, and I am working that out. The two are certainly related, but factor analysis, ICA, and the like are built for situations where the noise properties are a function only of the row or of the column of the (large, rectangular) data matrix; HMF permits the noise variance matrix to be arbitrarily heterogeneous.
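The heterogeneous-noise point can be made concrete with a toy sketch: fit a low-rank model X ≈ A G by alternating weighted least squares, with an independent inverse-variance weight for every matrix entry. This is just the idea, not necessarily the algorithm in our paper:

```python
import numpy as np

def hmf_als(X, W, rank, n_iter=50, rng=None):
    """Alternating weighted least squares for X ~ A @ G.
    W holds per-entry inverse variances (same shape as X), so the
    noise model can be arbitrarily heterogeneous across the matrix."""
    rng = np.random.default_rng(rng)
    n, m = X.shape
    G = rng.normal(size=(rank, m))
    A = np.zeros((n, rank))
    for _ in range(n_iter):
        # update each row of A by weighted least squares at fixed G
        for i in range(n):
            Gw = G * W[i]                      # (rank, m) weighted basis
            A[i] = np.linalg.solve(Gw @ G.T, Gw @ X[i])
        # update each column of G by weighted least squares at fixed A
        for j in range(m):
            Aw = A * W[:, j, None]             # (n, rank) weighted coeffs
            G[:, j] = np.linalg.solve(Aw.T @ A, Aw.T @ X[:, j])
    return A, G
```

With uniform weights this reduces to something PCA-like; the point of HMF is precisely that the weights need not be uniform.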
2010-12-23
HMF
More matrix factorization today. PCA is bad, HMF is good. Why is PCA bad? Because it does a good job of describing the variance of your data, which can have substantial contributions from—or be dominated by—noise. Why is HMF good? Because it models the noise-deconvolved underlying distribution.
2010-12-22
heteroscedastic matrix factorization
I worked on Tsalmantza and my paper on matrix factorization, which makes a data-driven model for spectra.
2010-12-21
exoplanet sampling
I spent my research time working on Hou's paper on exoplanet sampling with an ensemble sampler. He gets great performance!
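Hou's sampler is, if I understand it correctly, an affine-invariant ensemble sampler in the style of Goodman & Weare; the engine is the "stretch move". A hedged sketch of one sweep (names and structure mine, not from Hou's code):

```python
import numpy as np

def stretch_move_step(walkers, log_prob, a=2.0, rng=None):
    """One serial sweep of Goodman & Weare stretch moves.
    walkers: (K, d) array of K walker positions; modified in place."""
    rng = np.random.default_rng(rng)
    K, d = walkers.shape
    lp = np.array([log_prob(w) for w in walkers])
    for k in range(K):
        # pick a complementary walker j != k
        j = rng.integers(K - 1)
        if j >= k:
            j += 1
        # stretch factor z with density g(z) ~ 1/sqrt(z) on [1/a, a]
        z = ((a - 1.0) * rng.random() + 1.0) ** 2 / a
        proposal = walkers[j] + z * (walkers[k] - walkers[j])
        lp_prop = log_prob(proposal)
        # accept with probability min(1, z^(d-1) p(proposal)/p(current))
        if np.log(rng.random()) < (d - 1) * np.log(z) + lp_prop - lp[k]:
            walkers[k] = proposal
            lp[k] = lp_prop
    return walkers
```

The affine invariance is what makes it so effective on the badly scaled, correlated posteriors that exoplanet radial-velocity fitting produces.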
2010-12-20
tiny black holes
In a day of all talking all the time, I had a nice conversation with Mike Kesden about detecting tiny black holes that might zip through the Sun. There is a mass range where they are not ruled out as the dark-matter candidate but would produce observable signatures.
2010-12-17
Magellanic streams
Today was catch-up day so not much work got done, but I did manage to learn a lot about the Magellanic Clouds and stream from one and a half seminars by Gurtina Besla (Harvard). Her view, which is pretty convincing, is that the clouds fell in late, and the stream is caused by a tidal interaction between the clouds, not between each cloud and the Milky Way.
2010-12-16
Harvard
I spent the day at Harvard CfA, giving a seminar and discussing various things with the locals. Among other things I learned: Doug Finkbeiner and Mario Juric have a re-calibration for PanSTARRS that looks very promising. They also built fast technology to implement it (and other scientific investigations with the large data set). Kaisey Mandel (with Bob Kirshner) has a hierarchical model for supernova light curves that uses all bands simultaneously, including the important H band, and fits simultaneously for dust extinction and light-curve properties. He appears to be able to improve distance indication by tens of percent, which is a big deal in this area. Alyssa Goodman, Gus Muench, and others have been working on crazy new stuff for coordinated literature and sky searching, including unpacking the NASA ADS data and doing things like building higher-end text search but also running the in-line images into Astrometry.net and building a sky index. Dinner (and drinks after) were great too.
The only bad thing—or I should say sad thing—is that one of the great pleasures for me of visiting the CfA has always been hanging out with John Huchra. What a loss for all of us.
2010-12-15
Hui and Murray
After a discussion of the (wrong) Gurzadyan & Penrose result (in the HET seminar time slot), I introduced Iain Murray to Lam Hui, the author of one of the papers inspiring our work on marginalizing out the density field. We discussed the regimes in which making a single-point estimate of the correlation function might be worsening our results (on cosmological parameters) down-stream (in the inference chain). The rest of the day was definitely not research.
2010-12-14
inverting large matrices
Today was an all-talking day, with Murray, Bovy, and me discussing all our projects that involve (or might involve) Gaussian processes. Murray started to describe some methods for making approximations to large variance tensors that make them computationally possible to invert. These might be of great value: when the matrix gets larger than a few thousand by a few thousand, it becomes hard to invert in general, because inversion costs of order n-cubed operations.
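I don't yet know which approximations Murray has in mind, but one standard trick of this flavor is to approximate the variance tensor as low-rank plus diagonal, which the Woodbury identity makes cheap to apply. A sketch, with all names mine:

```python
import numpy as np

def lowrank_solve(d, U, b):
    """Solve (diag(d) + U U^T) x = b via the Woodbury identity.
    Costs O(n r^2) instead of the O(n^3) of a dense n x n solve.
    d: (n,) positive diagonal; U: (n, r) low-rank factor; b: (n,)."""
    Dinv_b = b / d
    Dinv_U = U / d[:, None]
    # small (r x r) capacitance matrix is the only thing we factorize
    cap = np.eye(U.shape[1]) + U.T @ Dinv_U
    return Dinv_b - Dinv_U @ np.linalg.solve(cap, U.T @ Dinv_b)
```

The point is that the big matrix is never formed or inverted; only an r-by-r system is solved.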
2010-12-13
marginalize out the density field!
In studies of the baryon acoustic feature, we like to get all Bayesian about the cosmological parameters, but then we apply all that machinery to the measured two-point functions, which are created with non-Bayesian single-point estimators! I spent a chunk of today discussing that problem with Iain Murray, who is visiting NYU for the week. Murray may have a straightforward solution to this problem, in which we try to write down the probability of the data given a density field times the probability of the density field given a two-point function. Then we can marginalize out the density field and we are left with a probability of the data given the two-point function. That would be exactly the full likelihood function we all need! It might be necessary to either approximate or else use a lot of compute cycles, but even approximate likelihood functions ought to beat single-point estimators.
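In the fully Gaussian approximation the marginalization is even analytic: if the data are the density field plus noise, with the field drawn from a zero-mean Gaussian whose covariance C(xi) is set by the two-point function, then the marginal likelihood is just a zero-mean Gaussian with covariance C(xi) + N. A toy sketch of that limit (the real problem is far larger, and non-Gaussian on small scales):

```python
import numpy as np

def log_marginal_likelihood(data, C_xi, N):
    """ln p(data | xi) after marginalizing out a Gaussian density field
    delta ~ N(0, C(xi)) observed with Gaussian noise covariance N:
    the marginal is data ~ N(0, C(xi) + N).  Assumes zero means."""
    S = C_xi + N
    L = np.linalg.cholesky(S)          # stable; never form an explicit inverse
    alpha = np.linalg.solve(L, data)   # so data^T S^-1 data = |alpha|^2
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    n = data.size
    return -0.5 * (alpha @ alpha + logdet + n * np.log(2.0 * np.pi))
```

Even this toy version is a likelihood for the two-point function given the raw data, with no single-point estimator anywhere in sight.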
I pointed out to Murray that if we are spending tens of millions (or billions, maybe?) of dollars on hardware to measure the baryon acoustic feature, it might be worth spending a few bucks to improve the inference we use to exploit it.
2010-12-12
the IMF
On the plane home from NIPS, Lang and I pair-coded some Python tools for working with and fitting the IMF, in the hopes of weighing in on various high-mass-star issues with Dalcanton's PHAT project. Sitting across the aisle from us was Iain Murray (Edinburgh), who explained stick-breaking and Chinese-restaurant processes, which will be relevant, I very much hope!
2010-12-11
NIPS workshops, day 2
I learned more about sparse matrix representations today. I mainly learned that there is lots to learn, but I am pretty confident that we astronomers are not using all the technology we should be. I also got fired up about making some astronomy data sets for machine learning, in analogy to the MNIST set of handwritten digits. Many conference participants asked me for data, and I would like to be able to distract some of them towards astronomy problems.
In the afternoon, Lang and I pair-coded SDSS meta-data handling. During this meeting, in the break periods (when normals ski), we implemented the SDSS astrometry meta-data (which is not standards-compliant for historical reasons) and the SDSS photometric calibration meta-data. Soon we will be ready to do some damage on SDSS Stripe 82.