I have various fantasies about book-length writing projects. I worked a bit on one of them today; no other research, because some toboggans and a snowy slope beckoned.
I got up early and got my ya-yas out on model complexity in my paper with Tsalmantza on HMF. I have been thinking about finishing a note on model complexity for my/our Data Analysis Recipes series, and since (a) the issue is ringing around in my otherwise empty head, and (b) the issue comes up in the HMF paper, the HMF paper just got an overly-long polemic on why you shouldn't use AIC, BIC, or (naive) Bayesian evidence integrals. I am sure (I hope, perhaps?) that we will end up removing it before we submit!
The only research I did today, in between getting relatives to various travel destinations, was to answer email from some of my hard-working students, two of whom are close to being able to criticize objectively the LSST filter choice.
I spent most of my research time today and over the last few days working on the HMF paper. Iain Murray, in the comments to my last post, pointed out that there might be a connection to factor analysis, and I am working that out. Certainly they are related, but factor analysis and ICA and the like are for situations where the noise properties are a function only of row or of column of the (large, rectangular) data matrix. HMF permits the noise variance matrix to be arbitrarily heterogeneous.
More matrix factorization today. PCA is bad, HMF is good. Why is PCA bad? Because it does a good job at describing the variance of your data, which can have substantial contributions from—or be dominated by—noise. Why is HMF good? Because it models the noise-deconvolved underlying distribution.
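To make the PCA-versus-HMF contrast concrete, here is a minimal sketch of a heteroscedastic low-rank fit by alternating weighted least squares. It is an illustration under stated assumptions, not necessarily the algorithm in the HMF paper: the function names `hmf_als` and `chi2`, the toy data, and the choice of plain alternating least squares are all hypothetical. The point is only that every element of the data matrix carries its own inverse variance `W[i, j]`, so the fit minimizes chi-squared rather than describing the total (noise-contaminated) variance.

```python
import numpy as np

def chi2(X, W, A, G):
    """Chi-squared of the low-rank model A @ G given inverse variances W."""
    return float(np.sum(W * (X - A @ G) ** 2))

def hmf_als(X, W, K, n_iter=50, seed=0):
    """Fit X ~ A @ G by alternating weighted least squares.

    X : (N, M) data matrix
    W : (N, M) inverse noise variances, one per matrix element
    K : number of components
    """
    rng = np.random.default_rng(seed)
    N, M = X.shape
    G = rng.normal(size=(K, M))   # random start for the basis vectors
    A = np.zeros((N, K))
    for _ in range(n_iter):
        # Update coefficients: one weighted least-squares solve per row
        for i in range(N):
            GW = G * W[i]                          # (K, M)
            A[i] = np.linalg.solve(GW @ G.T, GW @ X[i])
        # Update basis: one weighted least-squares solve per column
        for j in range(M):
            AW = A.T * W[:, j]                     # (K, N)
            G[:, j] = np.linalg.solve(AW @ A, AW @ X[:, j])
    return A, G

# Toy data (hypothetical): rank-2 signal plus noise that differs per element
rng = np.random.default_rng(42)
N, M, K = 60, 40, 2
A_true = rng.normal(size=(N, K))
G_true = rng.normal(size=(K, M))
sigma = rng.uniform(0.1, 2.0, size=(N, M))
X = A_true @ G_true + sigma * rng.normal(size=(N, M))
W = 1.0 / sigma**2

A1, G1 = hmf_als(X, W, K, n_iter=1)
A, G = hmf_als(X, W, K, n_iter=30)
print(chi2(X, W, A1, G1), chi2(X, W, A, G))
```

Each half-step is an exact weighted least-squares solve, so chi-squared cannot increase from one iteration to the next. Setting all of `W` to a constant recovers the same subspace that PCA (without mean subtraction) would find.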
In a day of all talking all the time, I had a nice conversation with Mike Kesden about detecting tiny black holes that might zip through the Sun. There is a mass range where they are not ruled out as the dark-matter candidate but would produce observable signatures.
Today was catch-up day so not much work got done, but I did manage to learn a lot about the Magellanic Clouds and stream from one and a half seminars by Gurtina Besla (Harvard). Her view, which is pretty convincing, is that the clouds fell in late, and the stream is caused by a tidal interaction between the clouds, not between each cloud and the Milky Way.
I spent the day at Harvard CfA, giving a seminar and discussing various things with the locals. Among other things I learned: Doug Finkbeiner and Mario Juric have a re-calibration for PanSTARRS that looks very promising. They also built fast technology to implement it (and other scientific investigations with the large data set). Kaisey Mandel (with Bob Kirshner) has a hierarchical model for supernovae light-curves that uses all bands simultaneously, including the important H band, and fits simultaneously for dust extinction and light-curve properties. He appears to be able to improve distance indication by tens of percent, which is a big deal in this area. Alyssa Goodman, Gus Muench, and others have been working on crazy new stuff for coordinated literature and sky searching, including unpacking the NASA ADS data and doing things like making higher-end text search but also running the in-line images into Astrometry.net and building a sky index. Dinner (and drinks after) were great too.
The only bad thing—or I should say sad thing—is that one of the great pleasures for me of visiting the CfA has always been hanging out with John Huchra. What a loss for all of us.
After a discussion of the (wrong) Gurzadyan & Penrose result (in the HET seminar time slot), I introduced Iain Murray to Lam Hui, the author of one of the papers inspiring our work on marginalizing out the density field. We discussed the regimes in which making a single-point estimate of the correlation function might be worsening our results (on cosmological parameters) down-stream (in the inference chain). The rest of the day was definitely not research.
Today was an all-talking day, with Murray, Bovy, and me discussing all our projects that involve (or might involve) Gaussian processes. Murray started to describe some methods for making approximations to large variance tensors that make them computationally possible to invert. These might be of great value; when the matrix gets larger than a few thousand by a few thousand, it becomes hard to invert in general. Inversion is n-cubed.
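As one example of the kind of approximation that makes a large variance matrix tractable, suppose (and this is an assumption for illustration, not necessarily what Murray described) that the matrix can be written as a diagonal plus a low-rank term, C = diag(d) + U Uᵀ with U of shape n by k. Then the Woodbury identity turns the n-cubed solve into one that costs of order n k². The name `woodbury_solve` and the toy sizes are hypothetical.

```python
import numpy as np

def woodbury_solve(d, U, b):
    """Solve (diag(d) + U @ U.T) x = b using the Woodbury identity.

    d : (n,) positive diagonal
    U : (n, k) low-rank factor, with k much smaller than n
    b : (n,) right-hand side
    """
    Dinv_b = b / d
    Dinv_U = U / d[:, None]
    k = U.shape[1]
    small = np.eye(k) + U.T @ Dinv_U      # only a k-by-k system is solved
    return Dinv_b - Dinv_U @ np.linalg.solve(small, U.T @ Dinv_b)

# Check against the brute-force dense solve on a small toy problem
rng = np.random.default_rng(0)
n, k = 500, 5
d = rng.uniform(0.5, 2.0, size=n)
U = rng.normal(size=(n, k))
b = rng.normal(size=n)

x_fast = woodbury_solve(d, U, b)
x_slow = np.linalg.solve(np.diag(d) + U @ U.T, b)
print(np.max(np.abs(x_fast - x_slow)))
```

The dense matrix is never formed in `woodbury_solve`, so memory also scales as n k rather than n².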
In studies of the baryon acoustic feature, we like to get all Bayesian about the cosmological parameters, but then we apply all that machinery to the measured two-point functions, which are created with non-Bayesian single-point estimators! I spent a chunk of today discussing that problem with Iain Murray, who is visiting NYU for the week. Murray may have a straightforward solution to this problem, in which we try to write down the probability of the data given a density field times the probability of the density field given a two-point function. Then we can marginalize out the density field and we are left with a probability of the data given the two-point function. That would be exactly the full likelihood function we all need! It might be necessary to either approximate or else use a lot of compute cycles, but even approximate likelihood functions ought to beat single-point estimators.
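Written out, the marginalization looks like the first line below. The second and third lines are an illustrative special case only, under the assumptions that the data D are the density field δ plus Gaussian noise with covariance N, and that δ is a zero-mean Gaussian field whose covariance C_ξ is set by the two-point function ξ. All of these symbols are notation introduced here, not taken from the discussion.

```latex
\begin{align}
p(D \mid \xi) &= \int p(D \mid \delta)\, p(\delta \mid \xi)\, \mathrm{d}\delta \\
% Gaussian special case (assumed): D = \delta + \text{noise},\ \delta \sim \mathcal{N}(0, C_\xi)
p(D \mid \xi) &= \mathcal{N}\!\left(D \mid 0,\; C_\xi + N\right) \\
\ln p(D \mid \xi) &= -\tfrac{1}{2}\, D^{\top} (C_\xi + N)^{-1} D
  - \tfrac{1}{2} \ln\det\!\left[2\pi\,(C_\xi + N)\right]
\end{align}
```

In this special case the cost is dominated by the inverse and determinant of a matrix as large as the data vector, which is where the approximation or the compute cycles come in.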
I pointed out to Murray that if we are spending tens of millions (or billions, maybe?) of dollars on hardware to measure the baryon acoustic feature, it might be worth spending a few bucks to improve the inference we use to exploit it.
On the plane home from NIPS, Lang and I pair-coded some Python tools for working with and fitting the IMF, in the hopes of weighing in on various high-mass-star issues with Dalcanton's PHAT project. Sitting across the aisle from us was Iain Murray (Edinburgh), who explained stick-breaking and Chinese-restaurant processes, which will be relevant, I very much hope!
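For reference, the stick-breaking construction can be sketched in a few lines: draw fractions from a Beta(1, alpha) distribution and repeatedly break off that fraction of whatever stick remains. This is a generic, truncated version with a hypothetical function name and arbitrary parameter values; it is not the IMF code from the plane.

```python
import numpy as np

def stick_breaking_weights(alpha, n_components, rng):
    """Truncated stick-breaking draw of mixture weights.

    alpha        : concentration parameter (larger means more, smaller pieces)
    n_components : truncation level (number of pieces kept)
    """
    betas = rng.beta(1.0, alpha, size=n_components)
    # Length of stick remaining before each break: 1, (1-b1), (1-b1)(1-b2), ...
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas[:-1])])
    return betas * remaining

rng = np.random.default_rng(3)
weights = stick_breaking_weights(alpha=2.0, n_components=50, rng=rng)
print(weights[:5], weights.sum())
```

The weights sum to slightly less than one because of the truncation; the shortfall is the length of stick left unbroken.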
I learned more about sparse matrix representations today. I mainly learned that there is lots to learn, but I am pretty confident that we astronomers are not using all the technology we should be. I also got fired up about making some astronomy data sets for machine learning, in analogy to the MNIST set of handwritten digits. Many conference participants asked me for data, and I would like to be able to distract some of them towards astronomy problems.
In the afternoon, Lang and I pair-coded SDSS meta-data handling. During this meeting, in the break periods (when normals ski), we implemented the SDSS astrometry meta-data (which is not standards-compliant for historical reasons) and the SDSS photometric calibration meta-data. Soon we will be ready to do some damage on SDSS Stripe 82.
In the morning, I learned about sparse codes. I got pretty stoked. We are finding that k-means (well not exactly k-means, which you should never use, but our generalization of it to be probabilistically correct) is too sparse to get good performance (on, say, explaining galaxy spectra), and we are finding that PCA (well not exactly PCA, which you should never use, but our generalization of it to be probabilistically correct) is too dense. Sparse coding looks like it might interpolate between these cases. That is, we will be able to capture more structure than a prototype approach, but not be as restricted as a linear manifold approach. Excited!
In the afternoon, Lang and I pair-coded and tested the (annoying) SDSS asTrans astrometric transformation meta-data format in Python. Soon all your SDSS interface will belong to us.
Today was the Sam Roweis Symposium at NIPS. I spoke, along with four others among Roweis's close collaborators. I learned a lot, especially how LLE and related methods work. It was a great session. One thing it all reminded me of is that the NIPS crowd is far more statistically and inferentially sophisticated than even the most sophisticated astronomers. It really is a different world.
In the morning before the Roweis symposium, two talks of note were by Martin Banks (Berkeley) and Josh Tenenbaum (MIT). Banks talked about the perceptual basis for photographic rules and concepts. The most impressive part of it, from my point of view, was that he explained the tilt-shift effect: If you limit the depth of field in an image, the objects being photographed appear tiny. The effect is actually quantitatively similar to binocular parallax, in the sense that the governing equation is identical. In binocular parallax you measure distances relative to the separation of your eyes; in depth-of-field you measure distances relative to the size of your entrance pupil!
Tenenbaum talked about very general models, in which even the rules of the model are up for inference. He has beautiful demos in which he can get computers to closely mimic human behavior (on very artificial tasks). But his main point is that the highly structured models of the mind, including language, may be learned deeply; that is, it might not just be fitting parameters of a fixed grammar. He gave good evidence that it is possible that everything is learned, and noted that if the program is to be pursued, it needs to become possible to assign probabilities (or likelihoods) to computer programs. Some work already exists in this area.
Lang and I stopped in to see Phil Gregory at UBC, who has been writing good stuff about Bayesian methods for exoplanet discovery. In the conversation I sharpened up my objections to Bayesian-evidence-based model selection as it is done in practice. It could be done well in principle but that would require properly informed priors. If the priors are uninformative, small changes in the outskirts of the prior-allowed regions can have enormous effects on the evidence integrals.
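A toy calculation makes the prior sensitivity explicit. Assume (purely for illustration) Gaussian data with known noise and a single mean parameter mu with a flat prior on [-half_width, +half_width]. Once the prior is much wider than the likelihood, widening it further changes nothing about the posterior on mu, yet the evidence drops in direct proportion to the width. The function name, data, and prior widths here are all hypothetical.

```python
import math
import numpy as np

def log_evidence(y, sigma, half_width):
    """Log evidence for y_i ~ N(mu, sigma^2) with mu flat on [-half_width, half_width]."""
    n = y.size
    ybar = y.mean()
    s = sigma / math.sqrt(n)                  # width of the likelihood in mu
    S = np.sum((y - ybar) ** 2)
    # Log-likelihood evaluated at its peak, mu = ybar
    log_like_max = -0.5 * n * math.log(2 * math.pi * sigma**2) - 0.5 * S / sigma**2
    # Integral of exp(-(mu - ybar)^2 / (2 s^2)) over the prior support
    z_hi = (half_width - ybar) / (math.sqrt(2) * s)
    z_lo = (-half_width - ybar) / (math.sqrt(2) * s)
    integral = s * math.sqrt(2 * math.pi) * 0.5 * (math.erf(z_hi) - math.erf(z_lo))
    # Flat prior density is 1 / (2 * half_width)
    return log_like_max + math.log(integral) - math.log(2 * half_width)

rng = np.random.default_rng(1)
y = rng.normal(loc=0.3, scale=1.0, size=20)

log_z_10 = log_evidence(y, 1.0, 10.0)
log_z_100 = log_evidence(y, 1.0, 100.0)
delta = log_z_10 - log_z_100
print(delta, math.log(10))
```

The two prior widths differ only far out in regions the data have already ruled out, and the log evidence still shifts by log(10). Any Bayes factor against a model without the parameter shifts by the same amount.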
Wu successfully defended her PhD today. Well, actually, she defended half of it yesterday, but for scheduling reasons we had to finish today. She presented the molecular hydrogen mass function as her principal result, and all the Spitzer data on which it is based. She has the largest uniform sample of Spitzer spectroscopy in existence. She is off to start a postdoc working with Herschel data this month.
Tsalmantza, Hennawi, and I (well, really Tsalmantza) got running a system to simultaneously figure out quasar redshifts and build up model spectra. That is, the system infers a model for the spectra, uses that model to determine the redshifts, and then updates the model. Iterate to convergence. In some sense it is supposed to be a model for how astronomers figure out redshifts! At lunch-time Kyle Cranmer (NYU) talked about marginalizing likelihoods at the LHC (and the improvement that gives in measurements) and before that Wu started her PhD defense process, to be completed tomorrow.
Two great seminars today. The first was by Ross Fadely (Haverford), who had done full brute-force modeling of strong gravitational lensing in a CDM-substructure context. He lays down CDM-compatible substructure in an enormous outer loop of realizations, and then does lens model selection within that loop. Brute force modeling that warms the heart. He finds, unfortunately, that the lenses don't obviously contradict CDM.
The second was by Josh Winn (MIT), who talked about very clever measurements of star-spin vs orbital angular momentum mis-alignment in exoplanet systems. The data show some beautiful regularities that are not implausibly explained by a combination of few-body effects driving inward migration (of hot Jupiters) followed by tidal damping of inclinations and eccentricities on pretty short time-scales. He argued, effectively (though not explicitly), that exoplanets measure the tidal dissipation timescales (or tidal quality factor) of convective stars much better than models can predict them. In the question period, the subject of free-floating planets came up. Mental note to self: Discover these!
Ross Fadely and Beth Willman (Haverford) came in for two days, making my visitor list pretty long! We discussed Fadely's early-days results on star–galaxy separation. He has some strange objects that are obviously stars but are better fit by galaxies. We tasked him with giving more detail on a few of these cases to see if there is something simple wrong (or too general) with the galaxy models we are using, or too restrictive about the star models we are using.
In the afternoon, Willman gave the physics colloquium about the ultra-faint galaxies. What a rich and successful field this has been! And (in my humble view) it all started here at NYU with a pair (here and here) of papers. Now these galaxies are found to be incredibly numerous, are observed to contain dark matter (Willman focused on the velocity measurements of Simon & Geha), are possible sites for observable dark-matter annihilation (though Willman didn't discuss that at all), and promise to increase in observed number dramatically in the next decade.
My MPIA collaborators Hennawi and Tsalmantza arrived in NYC for a week today. We are collaborating on a now exponentially growing number of projects involving data-driven models of galaxy spectra. We are trying to recover the known double-redshift objects in the SDSS, which tend to be either gravitational lenses or binary galaxies or quasars. We are trying to find new double-redshift objects of both kinds. We are trying to make more robust methods for getting precise redshifts of broad-line objects (which don't have any narrow redshift indicators). We are trying to model the quasar continuum blueward of Lyman-alpha for IGM measurements. We made a tiny bit of progress today on each of these.