Lang and I got PyEphem working in the guts of our Comet Holmes project. PyEphem is not exactly what we want, but it outsources all responsibility for orbit computation.
Roweis, Fergus, and I met today to discuss opportunities for interdisciplinary research. We do this often; we all work on super-related fields, with Roweis in machine learning and computational statistics, Fergus in computer vision, and me in astronomy. Currently our only three-way project is one in which we are building a very general model of the SDSS imaging data stream, but it is very exploratory at this point.
Among other things, Bovy and I discussed the first steps towards measuring proper motions in the USNO-B and SDSS raw data streams. This is a big project, but we have decided to get closer to the data.
I spent yesterday and today at the Spitzer Oversight Committee, which is definitely not research. I learned many things about project management and spacecraft engineering, but the most interesting scientific result I heard about was that Saturn has a ginormous outer ring.
I spent my research time today reading Bretthorst's book on Bayesian spectrum analysis [one big PDF file]. It is a beautiful and useful document; I think many of its ideas will be useful for the exoplanet problem. Roweis pointed me to this book; Yavin made me read it.
One small comment on this excellent book, which I am compelled by God and Man to make: Bretthorst, like Jaynes, is a believer in assuming that errors are Gaussian, because that is the most conservative thing you can do if you have a noise variance estimate and nothing else. This is technically correct, and beautiful to see demonstrated. However, it is a very dangerous argument, because it only applies when you somehow, magically, know your noise variance. You never do; at best you know the curvature at the mode of the noise distribution (if you are lucky). The variance is dominated in most real systems by rare outliers, and no finite experiment is likely to provide you a good estimate of it. Furthermore, even if you do know the variance, how would you know that you know? You would have to take fourth moments to confirm it, and I have never seen an experiment in the history of science in which fourth moments are accurately measured. Finally, the conservativeness-of-Gaussians argument is a maximum-entropy argument subject to a strict variance constraint. Jaynes and Bretthorst should know better: You never have absolutely perfect knowledge of anything; the noise should be found through an inferential process, not a constrained exact math problem!
Whew! I had to get that off my chest.
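To make the rare-outlier point concrete, here is a tiny numerical sketch (my own toy, not anything from the book): Student-t noise with three degrees of freedom has a perfectly finite variance but an infinite fourth moment, so sample variances computed from finite data bounce around wildly compared to the Gaussian case.

```python
import numpy as np

rng = np.random.default_rng(42)
n, trials = 1000, 200

# Gaussian noise: sample variances concentrate tightly around the truth.
gauss_vars = np.array([rng.standard_normal(n).var() for _ in range(trials)])

# Student-t noise with 3 degrees of freedom: the variance exists (it is 3),
# but the fourth moment diverges, so the sample variance is dominated by
# rare outliers and never settles down.
t_vars = np.array([rng.standard_t(3, n).var() for _ in range(trials)])

print("Gaussian relative scatter:", gauss_vars.std() / gauss_vars.mean())
print("Student-t relative scatter:", t_vars.std() / t_vars.mean())
```

The point is not the particular numbers; it is that any variance estimate from finite data carries an uncertainty that the strict-constraint maximum-entropy argument simply ignores.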
Brice Ménard (CITA) came by for the day, and Schiminovich came down in the afternoon, and we all discussed weak but measurable correlations in quasar spectra and imaging. For example, Ménard has measured relationships between quasar images and absorbers (to show that there is extended star formation associated with absorbers), between emission lines and absorption lines in absorbers (same conclusion), and between quasar colors and angular separation to nearby galaxies (to show that there is dust correlated with galaxies). These projects are all describable as stacking projects, because they involve measuring very weak signals that are only detectable in large ensembles of objects that can be, in some sense, aligned. However, they are also all describable as correlation function projects, because they are measurements of excess in one signal that is keyed by the presence of a somehow neighboring signal. Either way, they are projects that are only possible with large, uniform data sets; Ménard is the world's expert in finding signals like these. Schiminovich and I committed to giving Ménard some GALEX data for extension of these projects into the ultraviolet.
Lang and I found a significant bug in the code we were running this summer to brute-force identify stream-like substructures in the stellar distribution in the Milky Way. We fixed it, updated our results, and started to follow up the most promising stream (follow up statistically, not observationally). Our plan is to write an observing proposal to confirm it (or deny it) with radial velocity measurements.
I ate lunch with Yavin, during which we discussed how next to proceed on the exoplanets. Yavin has made something that finds the dominant planet incredibly fast by performing Fourier-based arithmetic operations on the data prior to any fitting. The question is: Can we use these methods to enormously speed up a more correct Bayesian analysis? We both have the intuition that we can. I promised to start literature searching this weekend to make sure we are not completely reinventing old wheels (I am sure we are to some extent).
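For the record (and this is not Yavin's method, just the simplest illustration of the idea of doing pure arithmetic on the data before any fitting), here is a brute-force Schuster-style periodogram on irregularly sampled synthetic radial velocities; every number is invented:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic radial-velocity data: one dominant circular "planet" plus noise.
t = np.sort(rng.uniform(0.0, 200.0, 150))   # observation times (days)
true_period = 17.0                           # days (invented)
v = 30.0 * np.sin(2.0 * np.pi * t / true_period) \
    + rng.normal(0.0, 5.0, t.size)           # velocities (m/s, invented)

# Schuster periodogram on the irregular times: arithmetic on the data,
# no per-frequency model fitting at all.
periods = np.linspace(2.0, 50.0, 10000)
omega = 2.0 * np.pi / periods
phase = omega[:, None] * t[None, :]
power = (v * np.cos(phase)).sum(axis=1) ** 2 \
      + (v * np.sin(phase)).sum(axis=1) ** 2

best = periods[np.argmax(power)]
print("recovered period (days):", best)
```

The dominant period pops out of a single vectorized pass over a frequency grid; the open question is whether tricks like this can be made into fast surrogates (or proposal machinery) for the fully Bayesian analysis.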
I spent the day checking in on a number of ongoing projects. Bovy and I discussed the moving groups from Hipparcos and the black hole at the Galactic Center. Price-Whelan and I discussed proper priors for the Bayesian line-fitting problem. Itay Yavin (NYU) and I discussed a nascent project on translating the exoplanet problem into a problem in harmonic or spectral analysis. Lang and I discussed, among other things, observing proposals. And Wu and I discussed her slides (on multi-wavelength measures of interstellar radiation and star-formation rates) for a short talk at the GALEX team meeting at Columbia tomorrow.
Roweis and I spent the morning discussing the possibility that we could build a pipeline to read and analyze the SDSS-III BOSS spectral data, starting by looking for outliers and moving towards full data reduction. This is, in some sense, duplication of effort, but since we would work without domain knowledge, we might learn or confirm some things that might otherwise have to be assumed.
In the morning, Jeremy Tinker (Berkeley) led our group meeting with a discussion of information about galaxy evolution from clustering. In the approximation that we know the dark matter model, the relationship between galaxies and dark matter can be parameterized, and then the observed galaxy-galaxy clustering puts constraints on how the galaxies could possibly form and evolve. He has some counterintuitive results, stemming from the fact that at intermediate redshift the large-scale clustering of red and blue galaxies is very similar.
In the afternoon, Marc Kamionkowski (Caltech) gave the Big Apple Colloquium about the isotropy and homogeneity of large-scale structure, and in particular the cosmic microwave background. He is building non-natural models that permit anisotropy in the power spectrum while preserving isotropy in the temperature and density and all else. There is a small amount of evidence for this statistical anisotropy situation in the current data; it is a long shot but if it holds up it is extremely important.
As my loyal reader knows, I have no fear when it comes to models with huge numbers of parameters; indeed the ubercalibration project is effectively a fit with hundreds of millions of parameters, and we can prove that we got the global optimum (in the sense that we made sure the problem is guaranteed to be convex). Today Roweis pitched a generalization of all this, in which one creates a very flexible linear model space, where parameters are tied to a hierarchy of meta-data, such that some parameters are tied to, say, the date, some to the airmass, some to the seeing, some to the camera column, and so on. Then the model discovers which parameters are necessary for accurate modeling of the data, and thereby discovers important meta-data, dependencies of the data on artificial issues, and bad data. We tentatively agreed to run this on the new BOSS spectra from SDSS-III.
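A toy version of the pitch, with every label and offset invented for the sketch: give each meta-data value its own additive calibration parameter, fit the whole thing linearly, and see which blocks of parameters the data actually need.

```python
import numpy as np

rng = np.random.default_rng(1)

# Each observation carries meta-data: a "night" label and a "camera
# column" label (both invented here).  Each label value gets its own
# additive calibration parameter.
n = 5000
night = rng.integers(0, 20, n)
camcol = rng.integers(0, 6, n)

# Truth for this toy: only the camera column matters; nights are identical.
camcol_offsets = np.array([0.0, 0.3, -0.2, 0.1, 0.5, -0.4])
y = camcol_offsets[camcol] + rng.normal(0.0, 0.05, n)

# Flexible linear model: one-hot blocks, one parameter per meta-data value.
X = np.zeros((n, 20 + 6))
X[np.arange(n), night] = 1.0
X[np.arange(n), 20 + camcol] = 1.0

# lstsq returns the minimum-norm solution for this (rank-deficient) system.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# The camcol block carries real structure; the night block is nearly
# constant, flagging "night" as meta-data the model does not need.
print("night-parameter spread: ", beta[:20].std())
print("camcol-parameter spread:", beta[20:].std())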
Iain Murray gave another nice talk today, this time in the machine learning group, about a high-end sampling method called elliptical slice sampling, optimized for Gaussian-process modeling, where calls to the prior probability distribution function are more expensive than likelihood calls. It was a very nice talk and got me thinking about slice sampling in general, which might be very useful to us.
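The update itself is simple enough to sketch in a few lines. Here is a minimal, illustrative version of the elliptical slice sampling step (following Murray, Adams & MacKay), run on a toy Gaussian target I invented so the answer can be checked analytically:

```python
import numpy as np

rng = np.random.default_rng(3)

def elliptical_slice(f, chol_sigma, log_lik):
    """One elliptical slice sampling update for a zero-mean Gaussian
    prior with covariance L L^T (L = chol_sigma)."""
    nu = chol_sigma @ rng.standard_normal(f.size)   # auxiliary prior draw
    log_y = log_lik(f) + np.log(rng.uniform())      # slice threshold
    theta = rng.uniform(0.0, 2.0 * np.pi)
    lo, hi = theta - 2.0 * np.pi, theta
    while True:                                     # shrink the bracket
        f_new = f * np.cos(theta) + nu * np.sin(theta)
        if log_lik(f_new) > log_y:
            return f_new
        if theta < 0.0:
            lo = theta
        else:
            hi = theta
        theta = rng.uniform(lo, hi)

# Toy target: prior f ~ N(0, I) in 2-D, one noisy observation y = f + noise,
# so the posterior mean is y / (1 + s2) analytically.
y = np.array([1.0, -0.5])
s2 = 0.5
log_lik = lambda f: -0.5 * np.sum((y - f) ** 2) / s2

f = np.zeros(2)
samples = []
for i in range(20000):
    f = elliptical_slice(f, np.eye(2), log_lik)
    if i >= 2000:                                   # discard burn-in
        samples.append(f)
samples = np.array(samples)
print("posterior mean:", samples.mean(axis=0))      # analytic: y / (1 + s2)
```

Note that there is no step-size tuning anywhere, which is much of the appeal; the ellipse through the current state and a fresh prior draw, plus the shrinking bracket, does all the work.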
I assigned Jagannath the following problem: What can you learn about the gravitational potential of a gravitating system in which you yourself lie (think the Milky Way) if all you have are streams of stars that trace out orbits, and all you can measure about those streams are their trajectories in angular coordinates (think RA, Dec)? That is, what can you do if you get a snapshot of a few orbits, but you only get the angular shapes of those orbits? We went back and forth a bit between the answer everything and the answer nothing. I think it is the former, but of course it will depend strongly on the precision of the measurements! In some sense, all the answers are known; we are just on an intuition-building adventure.
I spent an irresponsible morning talking non-stop to (at?) Iain Murray and Jo Bovy about our various inference projects, current and future. Murray and Bovy have spent the last week figuring stuff out about our Solar System project, and are producing a much deeper (and more useful) paper; it shows that different parameterizations of phase-space distribution functions lead to different results, but (a) the differences are not large, when stated in terms of reasonable probability intervals, and (b) you can put all possible parameterizations into the model and marginalize over them, without harming the results. What a pleasure it is that I can count hanging out all morning discussing such niceties among the crucial functions of my job!
At lunch time, Jeff Allen (NYU) gave a status update from the Pierre Auger Observatory for ultra-high-energy cosmic rays. There are now conflicting bits of evidence about their sources and composition, which is tantalizing, although the GZK cutoff did really appear, as it had to.