Stephen Feeney (UCL) gave a nice talk on finding bubble collisions by performing model fits and Bayesian hypothesis tests on maps of the cosmic microwave background. He had to make some approximations to do the Bayes integrals. He concludes that the rate of bubble collisions is low (I guess we would be suspicious if he concluded otherwise), but because of resource limitations and decisions made early in the project, he and his team did not test for large-angle signatures, which are expected to be the most common. The good news is that such tests are going to be easy in the near future.
After a few minutes of conversation, Blanton demonstrated to Willman and me that even though the SDSS data are not designed to deal well with crowded fields, they can be used to measure the proper motions of halo globular clusters. Koposov, Rix, and I demonstrated that such proper motions can be used statistically when we looked at the GD-1 Stream, but then I left this subject behind, even though globular clusters are actually much easier to measure than the GD-1 Stream. This relates to my repetitive rant that there are many great measurements waiting to be made in existing data.
Fergus came by in the morning to discuss modeling speckles as a function of wavelength in coronagraphy, and we spent a while counting parameters. As is usual in these discussions I have with vision people, there are more parameters than data points in the natural models we want to write down. So we either have to apply priors, or else simplify the model; we decided to do the latter. The odd thing (in my mind) is that simplifying the model (that is, reducing the number of free parameters) is actually equivalent to applying extremely strong priors. So the idea that one can "avoid using priors" by choosing a simpler model is straight-up wrong, no? That said, I am very happy with Fergus's beautiful model, which involves an extremely general description of how one might transform an image locally.
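To make that prior point concrete, here is a toy numpy sketch (my own illustration, not Fergus's model, with made-up data): fitting a line y = a + b x under an extremely strong zero-centered Gaussian prior on the slope b gives the same answer as fitting the simpler constant-only model; that is the sense in which dropping a parameter is an infinitely strong prior.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 + rng.normal(scale=0.1, size=x.size)  # data truly consistent with y = a

def map_fit(x, y, lam):
    """MAP estimate for y = a + b*x with Gaussian prior b ~ N(0, 1/lam).

    This is ridge regression penalizing only b; lam -> infinity pins b to zero.
    """
    A = np.column_stack([np.ones_like(x), x])
    P = np.diag([0.0, lam])  # prior precision: flat on a, strong on b
    return np.linalg.solve(A.T @ A + P, A.T @ y)

a_strong, b_strong = map_fit(x, y, lam=1e12)  # "strong prior" fit
a_simple = y.mean()                           # the "simpler model" y = a

print(abs(b_strong))             # effectively zero
print(abs(a_strong - a_simple))  # same answer as the simpler model
```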
Foreman-Mackey and I started work on—and Goodman and I discussed—next-generation ensemble samplers that update a pair of ensembles in a leapfrog fashion. I still haven't completely understood the rules, but it appears that as long as at every step we satisfy detailed balance, we are fine. If this is true, then I have an intuition that we can design a new sampler with incredible performance on non-trivial probability distribution functions.
Other than a few conversations, my only research today was to talk about Bovy, Rix, and my results on the chemical and position-space structure of the Milky-Way disk. We find that if you look at disk structure as a function of chemical abundances, you get very simple and nice results; it is less messy (for reasons we don't entirely understand) than looking at the chemical abundances as a function of position. Manuscript forthcoming soon; watch this space.
Maxim Lyutikov (Purdue) came through to give a nice seminar about fundamental electrodynamics around black holes. The big deal is that near the horizon, E gets a component along B and then plasma is produced easily out of the vacuum to short the parallel component and stabilize the B field. This is very nice!
I had many conversations with team members about various projects in progress; nothing specific to report, except maybe that Jagannath and I realized that we were semi-scooped but that the scooping paper has some serious errors in it. More on this when we get together our thinking and become confident that we are right in our criticisms.
I spent time today talking with Goodman more about his (with Weare) ensemble samplers. He points out that if you have two ensembles, you have a lot of freedom for using one ensemble to inform the proposal distribution for the other ensemble. This could permit non-trivial density modeling of the one ensemble to help sample efficiently the other ensemble. We discussed the possible implications of this for multi-modal probability distribution functions and I am optimistic that we could make a very efficient next-generation sampler. This is all about proposal distribution of course; there are lots of things people want samplers to do; we are concentrating on those things that make the autocorrelation time short or the number of independent samples per likelihood call high.
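For concreteness, a minimal toy version of the two-ensemble idea (my own sketch, not Goodman and Weare's code): each ensemble proposes stretch moves for its walkers using only the complementary ensemble, and the two ensembles are updated in alternation; the target here is a one-dimensional unit Gaussian.

```python
import numpy as np

rng = np.random.default_rng(42)
log_prob = lambda x: -0.5 * x**2  # unit-Gaussian target (up to a constant)

a = 2.0                           # stretch-move scale parameter
n_walkers, n_steps = 100, 2000
A = rng.normal(size=n_walkers)    # ensemble A
B = rng.normal(size=n_walkers)    # ensemble B

def update(movers, helpers):
    """Stretch-move update of `movers` using only the complementary ensemble."""
    z = ((a - 1.0) * rng.random(movers.size) + 1.0) ** 2 / a  # g(z) ~ 1/sqrt(z)
    partners = helpers[rng.integers(helpers.size, size=movers.size)]
    prop = partners + z * (movers - partners)
    # acceptance probability is z**(d-1) * p(prop)/p(mover); here d = 1
    accept = np.log(rng.random(movers.size)) < log_prob(prop) - log_prob(movers)
    return np.where(accept, prop, movers)

samples = []
for _ in range(n_steps):
    A = update(A, B)              # leapfrog: A moves using B...
    B = update(B, A)              # ...then B moves using the updated A
    samples.append(np.concatenate([A, B]))
samples = np.concatenate(samples[n_steps // 2:])  # discard burn-in

print(samples.mean(), samples.std())  # should be near 0 and 1
```

The proposal is affine-invariant, which is part of what makes the Goodman–Weare construction attractive; the pair-of-ensembles bookkeeping is what keeps the complementary-ensemble proposal valid.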
Fadely, Willman, and I realized today that our data-driven, hierarchical Bayesian discrete classification system for faint sources in current and future optical survey data is also a photometric redshift prediction system. We decided (tentatively) that we should write a paper on that. I also had conversations with Foreman-Mackey about Gaussian processes, and Ben Weaver (NYU) about SDSS-III BOSS redshift fitting.
I spent the day in Delaware, hanging out with John Gizis (Delaware) and talking about finding rare objects. He has found some of the coolest and nearest stars in the WISE data, by very straightforward methods. I think he published the first WISE paper after the WISE public data release; this reminds me of 1996 and the Hubble Deep Field! He has also found hotter stars with strange infrared properties. One of the many things we discussed is understanding the spatial distribution of stars with dust disks relative to the population of very young stars in the local neighborhood. I also gave a seminar.
In most of my research time today I talked with our new NSF postdoc Gabe Perez-Giz (NYU) about his plans for his first year here. These include working through a set of challenging and fundamental problems in numerical methods for computing gravitational radiation. Part of this plan is to produce a quantitative description of test-particle phase space (qualitative orbit properties) around Kerr (spinning) black holes. I think this is a great idea, but it involves a huge amount of literature review, synthesis, and computation.
At lunch, the CCPP brown-bag series was kicked off by Kleban (NYU) who told us about natural properties of the cosmological constant in M-theory. The idea is that one natural (or mathematically equivalent) way of thinking about the cosmological constant is as a high-dimensional analog of electromagnetism, with a vacuum field value. This gets all the stringy or M-y properties of the cosmological constant: Huge number of possible vacua, finite probabilities of transitioning to other (much worse) vacua, no non-trivial dynamics in the cosmological constant sector (except for vacuum-changing dynamics).
I spent the morning with Fergus working on fitting a dumb model to Ben Oppenheimer's (AMNH) coronagraphic imaging of possibly-planet-hosting stars. Oppenheimer's instrument is not just a coronagraph but an integral field spectrograph, so there is a huge amount of information in the wavelength-dependence of the residual light from the primary star. Fergus and I worked on building a simple model of it, hoping to increase the sensitivity to faint planets.
At lunch, Beth Willman (Haverford, long-term visitor at NYU) made the case that part of the definition of a galaxy (yes, believe it or not, people are currently arguing about how to define a galaxy; see, eg, Willman 1) ought to involve the chemical abundances of the stars. This is a beautiful, simple, and convincing idea.
I raised with the BOSS galaxy-clustering team my concerns about measuring the baryon acoustic feature by first making a point estimate of the two-point correlation function of galaxies and then, from that two-point function estimate, inferring the baryon acoustic feature length scale. I got no sympathy. Maybe I am wrong, but I feel like if we are going to move the information from the data to the BAF, we need to write down a likelihood. The reason the BOSSes think I am wrong (and maybe I am) is that on large scales the density field is close to a Gaussian random field, for which the two-point function is a sufficient statistic. But the reason I think they might be wrong is (a) the distribution of galaxies is a non-linear sampling of the density field, and (b) the two-point function might be sufficient to describe the density, but a point estimate of the two-point function is not the two-point function. Anyway, I retreated, and resolved to either drop it or do the math.
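A toy numpy illustration of point (b), with made-up numbers: even in the fully Gaussian case where the variance is a sufficient statistic, any point estimate of that variance from a finite sample scatters around the truth, and that scatter is exactly what a written-down likelihood would propagate into the downstream inference.

```python
import numpy as np

rng = np.random.default_rng(1)
true_var = 1.0          # the true "two-point function" of a zero-lag toy field
n_gal, n_real = 200, 500

# point estimates of the variance from independent finite samples
estimates = np.array([
    rng.normal(scale=np.sqrt(true_var), size=n_gal).var(ddof=1)
    for _ in range(n_real)
])

print(estimates.mean())  # unbiased: close to true_var
print(estimates.std())   # but each individual point estimate scatters
```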
Marshall and I spent a phone call talking about how we can model strong gravitational lenses in the PanSTARRS data, given that the data (in the short term, anyway) will be in the form of catalogs. As both my loyal readers know, I am strongly against catalogs. What Marshall and I had independently realized—me for catalog source matching and Marshall for lens modeling—is that the best way to deal with a catalog is to treat it as a lossy compression of the data. That is, use the catalog to synthesize an approximation to the imaging from which it was generated, use the error analysis to infer a reasonable noise model, and then fit better or new models to those synthesized images. I love this idea, and it is very deep; indeed it may solve the combinatoric complexity that makes catalog matching impossible, as well as the lensing problems that Marshall and I are working on.
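A minimal sketch of the synthesize-from-catalog step (all names and numbers hypothetical): render a pixel image from catalog rows of (x, y, flux) with a Gaussian PSF; the catalog's error analysis would then supply a noise model for the synthesized image, and new or better models get fit to that image rather than to the catalog.

```python
import numpy as np

def synthesize(catalog, shape, psf_sigma=1.5, sky_rms=0.0, rng=None):
    """Render an approximate image from catalog rows of (x, y, flux)."""
    yy, xx = np.mgrid[:shape[0], :shape[1]]
    img = np.zeros(shape)
    for x0, y0, flux in catalog:
        r2 = (xx - x0) ** 2 + (yy - y0) ** 2
        # normalized Gaussian PSF so each source contributes its catalog flux
        img += flux * np.exp(-0.5 * r2 / psf_sigma**2) / (2 * np.pi * psf_sigma**2)
    if rng is not None and sky_rms > 0:
        img += rng.normal(scale=sky_rms, size=shape)  # inferred noise model
    return img

catalog = [(12.0, 20.0, 100.0), (40.0, 35.0, 250.0)]  # hypothetical rows
img = synthesize(catalog, shape=(64, 64))
print(img.sum())  # close to the total catalog flux of 350
```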
Foreman-Mackey and I went over to Fergus's office to discuss with Fergus and Andrew Flockhart (NYU) a few possible projects. One is to perform cosmic-ray identification in imaging without a CR-split; that is, to find the cosmic rays by modeling the data probabilistically rather than by comparing overlapping images. This could make HST snapshot surveys and SDSS data more productive at no additional cost, or just as productive at smaller observing intervals. Another project we discussed is to model the speckle patterns in multi-wavelength coronagraph images taken by Oppenheimer's crew. For the latter we talked about priors that could help, given that the space of possible solutions is almost unimaginably large.
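A toy version of the no-CR-split idea (my own sketch, not the project code): given any model of the smooth scene plus a pixel noise model, cosmic rays stand out as sharp positive outliers in a single exposure. Here the "scene model" is just the median of each pixel's eight neighbors and the noise estimate is a robust MAD.

```python
import numpy as np

rng = np.random.default_rng(3)
img = rng.normal(loc=100.0, scale=5.0, size=(64, 64))  # smooth sky + noise
cr_pixels = [(10, 11), (30, 40), (50, 7)]
for y, x in cr_pixels:
    img[y, x] += 500.0                                  # inject cosmic-ray hits

# "scene model": median of each pixel's 8 neighbors (center excluded)
padded = np.pad(img, 1, mode="edge")
stack = np.stack([padded[dy:dy + 64, dx:dx + 64]
                  for dy in range(3) for dx in range(3) if (dy, dx) != (1, 1)])
model = np.median(stack, axis=0)

resid = img - model
sigma = 1.4826 * np.median(np.abs(resid - np.median(resid)))  # robust scatter
mask = resid > 8.0 * sigma                                    # CR candidates

print(sorted(zip(*np.where(mask))))  # recovers the injected hits
```

A real version would use a PSF-convolved scene model and the survey's per-pixel noise model instead of these stand-ins, but the probabilistic-outlier logic is the same.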
Prompted by some comments yesterday from Iain Murray on my incomprehensible post about PCA, I spent some of the morning looking into the different meanings of factor analysis in the machine-learning literature. At least some of those meanings are very close to Tsalmantza and my HMF method; I think this means some changes to the abstract, introduction, and method sections of our paper.
In the morning I worked on Tsalmantza and my paper on matrix factorization. Specifically, I worked on the initialization, which we perform with a PCA. But before we do the PCA, we do two things: The first is that we "isotropize" the space from a measurement-noise point of view; that is, we re-scale the axes (which number in the thousands when running on SDSS spectra) so that the median measurement uncertainty (noise variance) is the same in every direction. The second is that we compute the mean and then project every spectrum into the subspace that is orthogonal to the mean spectrum; that is, we compute the principal variance directions that are orthogonal to the mean-spectrum direction. Then our K-spectrum initialization is based on the mean spectrum and the first K−1 variance eigenvectors orthogonal to the mean.
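In numpy, the initialization reads roughly as follows (variable names and fake data are mine): equalize the median noise variance along every axis, project out the mean-spectrum direction, run the PCA in that orthogonal subspace, and seed with the mean direction plus the top K−1 eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(2)
N, D, K = 500, 100, 4
spectra = rng.normal(size=(N, D)) + 5.0    # stand-in for real spectra
ivar = rng.uniform(0.5, 2.0, size=(N, D))  # per-pixel inverse variances

# 1) "isotropize": rescale each axis by its median noise sigma
scale = np.sqrt(np.median(1.0 / ivar, axis=0))
X = spectra / scale

# 2) project out the mean-spectrum direction before the PCA
mean = X.mean(axis=0)
u = mean / np.linalg.norm(mean)
X_perp = X - np.outer(X @ u, u)            # every row now orthogonal to u

# principal variance directions within that orthogonal subspace
_, _, Vt = np.linalg.svd(X_perp - X_perp.mean(axis=0), full_matrices=False)

# K-spectrum initialization: mean direction plus top K-1 orthogonal eigenvectors
init = np.vstack([u, Vt[:K - 1]])
print(np.round(init @ init.T, 6))          # identity matrix: orthonormal set
```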
Whew! All that work put into the PCA, which is just to initialize our method. As my loyal reader knows, I think PCA is almost never the right thing to do. I really should finish the polemic I started to write about this a few years ago.
[Comments below by Murray make me realize that I am not being very clear here. In the case of our matrix factorization, we are trying to find K orthogonal eigenspectra that can be coadded to explain a large set of data. To initialize, we want to start with K orthogonal eigenspectra that include the mean spectrum. That's why we project and PCA.]
I worked on getting SDSS-III DR8 versions of my RC3 galaxies working. I was still finding problems until late in the day when Ben Weaver (NYU), our awesome SDSS-III data guru, stepped in and saved my day with a bug fix. The great thing about working on SDSS and SDSS-III is working with great people.
Fadely, Willman, and I have a star–galaxy separation method (hierarchical Bayesian discrete classifier) for LSST that can be trained on untagged data; that is, we do not need properly classified objects (any truth table) to learn or optimize the parameters of our model. However, we do need a truth table to test whether our method is working. Not surprisingly, there are not many astronomical sources at faint magnitudes (think 24) for which confident classification (that is, with spectroscopy or very wide-wavelength SED measurements) has been done. So our biggest problem is to find such a sample for paper zero (the method paper). One idea is to run on a bunch of data and just make the prediction, which is gutsy (so I like it), but really, who will believe that our method is good if we haven't run the test to the end?
A combination of illness, travel, and preparation for class prevented much research from getting done over the last few days. One exception is that Fergus and I met to discuss modeling planet-obscuring speckle in coronagraph images, where Fergus thinks he may have a breakthrough method that is simple and physically motivated. It seems promising, so I am about to ask the Oppenheimer team for some example data!