K-nearest-neighbors is all the rage in Camp Hogg these days. Pursuant to our conversations this week, I wrote a short document explaining how we could use it to determine G-dwarf ages with SEGUE spectra. The issue is that you want to see very subtle chromospheric activity on top of coincident absorption lines. You need a good model of each spectrum, better than theoretical models or spectral libraries can provide. My document explains how you could do it with the large extant data set, capitalizing on the neighbors in spectral space. Rix is about to hand me the spectra.
Rix and I spent a big chunk of the day talking about everything we have been doing since the summer. We veered off into my dream project of doing stellar chemical abundance analyses without a good physical model of the stellar atmosphere or emission. I think this is possible. And in discussion with Rix, I realized that it could be directly related (or very similar to) support vector machines with the kernel trick. The kernel can be flexible enough to adapt to star temperature and surface gravity, but not flexible enough to adapt to chemical abundance changes. I just want to figure out how to work with missing data and noise (as my loyal reader knows, this is my biggest problem with most standard machine-learning methods). We also discussed the NSF Portfolio Review, the modeling of tidal streams, age-dating stars, the three-dimensional distribution of dust in the Milky Way, and other stuff.
On the plane to Tucson, I spent a few minutes trying to formalize a few ideas about
limiting cases we might find in exoplanet population statistics, which we can use to build intuition about our full probabilistic inference. The first is that if the mutual inclinations are low (very low), the observed tranet (a transiting planet is a tranet) multiplicity distribution translates very simply (by geometric factors) into the true planet multiplicity distribution. The second is that if the multiplicity is high, the observed tranet multiplicity distribution constrains a ratio of the true planet multiplicity to the mutual inclination distribution width. The third is that if the multiplicity could somehow found to be dynamically maximal (if that is even possible to define), then the tranet multiplicity distribution translates very simply into the true planet mutual inclination distribution. I am not sure that any of this is useful or even correct.
Camp Hogg met with David Blei (Princeton) again today to discuss some more the inference generalities that came up last week. In particular, we were interested in discussing what aspects of our planned exoplanet inferences will be tractable and what aspects will not, along with whether we can co-opt technology from other domains where complex graphical models have been sampled or marginalized. Blei's initial reaction to our model is that it contains way too much data to be sampled and we will have to settle for optimization. He softened when we explained that we can sample a big chunk of it already, but still expects that fully marginalizing out the exoplanets on the inside (we are trying to infer population properties at the top level of the hierarchy, using things like the Kepler data at the bottom level) will be impossible. Along the way, we learned that our confusion about how to treat model parameters that adjust model complexity arises in part because that genuinely is confusing.
We also discussed prior and posterior predictive checks, which Blei says he wants to do at scale and for science. I love that! He has the intuition that posterior predictive checks could revolutionize probabilistic inference with enormous data sets. He gave us homework on this subject.
Today was also Muandet's last day here in NYC. It was great to have him. Like most visitors to Camp Hogg, he goes home with more unanswered questions than he arrived with, and new projects to boot!
Foreman-Mackey gave his PhD Candidacy exam talk today and passed, of course. He talked about hierarchical Bayesian modeling of exoplanet populations, which is his ambitious thesis topic. I don't think I have ever had a student better prepared to do an ambitious project with data! He also has an eleven-month-old arXiv-only paper with 25 citations already. In the morning, Fadely and I looked at results from Fadely's patch-based, nearest-neighbor-based image calibration system. It is fast and seems to be doing the right thing, but we don't understand all of the details or what we expect for the
dynamics of the optimization. Late in the day, Fadely showed me some experiments that suggest that in pathological situations, it can converge to something that looks like the wrong answer.
Krik Muandet (MPI-IS) led a discussion today of support vector machines and extensions thereof. For me the most valuable part of the discussion was an explanation of the kernel trick, which is absolutely brilliant: It projects the data up into a much higher-dimensional space and permits accurate classification without enormous computational load. Indeed, the kernel obviates any creation of the higher dimensional space at all. Muandet then went on to discuss his support measure machine extension of SVM; in this extension the points are replaced by probability distribution functions (yes, the data are now PDFs). I was pleased to see that the SMM contains, on the inside, something that looks very like chi-squareds or likelihood ratios. Jonathan Goodman asked how the SMM classifier differs from what you would get if you just ran SVM on a dense sampling of the input PDFs. Of course it has to differ, because it takes different inputs, but the question, correctly posed, is interesting. We ended the talk with lunch, at which we asked Muandet to do some demos that elucidate the relationship between SVM and nearest neighbors (Fadely's current passion).
I spent a big chunk of the afternoon discussing gravitational radiation detection with Lam Hui (Columbia). I think we have resolved the differences that emerged during his talk here this semester.
It was 1995 again here at Camp Hogg. At dotastronomy NYC hack day, one of the participants (who I am leaving nameless unless he or she wants to self-identify in the comments) identified a SQL-injection vulnerability in the MAST (Hubble Space Telescope) astronomical data archive. I made the mistake of bug-reporting it late last night and then had to deal with the consequences of it today. It was my first experience of the grey-hat world (white hat: report bug without exploit; grey hat: exploit bug trivially and report it; black hat: exploit but don't report); grey-hat is effective but stressful and doesn't get you any love. The upshot is positive though: MAST will be more secure going forward.
I was in two exams today. The first was a candidacy exam for Henrique Moyses (NYU), who is working on the
stochastic vortex, a response of random-walking particles to non-conservative forces. One of the main problems he has is how to measure the drift velocity field of a particle, when the secular drift is a tiny, tiny fraction of the random velocity. He has lots of samples, but fitting the steady part (mean) of the spatially varying stochastic velocity distribution function is not trivial. We vowed to discuss density estimation in January.
The second exam was a PhD defense for Ivane Jorjadze (NYU), who has various results on systems of jammed particles. He has worked out quantitatively a thermodynamic analogy for non-thermal but randomly packed solids with some specific experimental systems. He has also worked out the elasticity and normal modes of jammed soft particles (like cells). Beautiful stuff, with good physics content and also very relevant to biological systems. Congratulations Dr. Jorjadze!
I had a great time at dotastronomy NYC hack day, organized by Muench (Harvard). Foreman-Mackey and I (with help from Schwamb, Cooper, Taub) worked on a probabilistic analysis of exoplanet populations based on the Kepler
objects of interest list. I wrote text, Foreman-Mackey wrote code, Cooper drew a graphical model, and Schwamb and Taub read papers. In parallel, many hacks happened. The most impressive was Schiminovich's 9-year-old son Theo's calculation of exoplanet transit lightcurves (including the secondary eclipse) in scratch. Theo went from not even knowing what an exoplanet is to doing this calculation in hours! Another impressive hack was Beaumont (Harvard), who hacked an OpenGL backend onto matplotlib, making the (notoriously slow but nice) plotting package outrageously fast for million-point plots. It was seriously impressive; I hope he pushes it to matplotlib development. There were many other nice hacks and a good time was had by all. Thanks to Muench, bitly, Seamless Astronomy, and Gorelick (bitly) for making it happen!
Bovy showed up for the day, in part to discuss quasar target selection with Muandet. Late in the day we discussed more general methods for determining the Solar motion relative to the full Milky Way disk, given that there is mounting evidence that the entire Solar Neighborhood is moving faster (relative to the MW Center) than the mean rotation speed.
In the late morning there was a talk by Tanaka (MPA) about binary super-massive black holes and their observability. As my loyal reader knows, I think these are more elusive than they should be: Either there are very few of them or they hide by turning off all AGN activity. Tanaka discussed all options but contributed something very interesting: His accretion models for these objects suggest that they should be UV and X-ray dark relative to typical quasars, at least in the late stages of inspiral. So we could test our ambiguous candidates by checking whether they are UV dark. He proposed several other interesting tests, most of which take new data and serious observing campaigns. At some point I found myself wondering whether ULIRGS (which show IR AGN activity and star formation) could be SMBH binaries. They are UV-poor and clearly the result of mergers.
Camp Hogg (which includes Muandet these days) had lunch with David Blei (Princeton), who is a computer scientist and machine-learning expert. He told us about projects he is doing to index and provide recommendations for arXiv papers, based (presumably) on his experience with author–topic modeling. Blei is a kindred spirit, because he favors methods that have a graphical model or probabilistic generative model underlying. We agreed that this is beneficial, because it moves the decision making from
what algorithm should we use? to more scientific questions like
what is causing our noise? and
what aspects of the problem depend on what other aspects?. These scientific questions lay the assumptions and domain-knowledge input bare.
We talked about the value of having arXiv indexing, how automated paper recommendations might be used, what things could cause users to love or hate it, and what kinds of external information might be useful. We mentioned Twitter. Blei noted that any time that you have a set of user
bibliographies—that is, the list of papers they care about or use—those bibliographies can help inform a model of what the papers are about. For example, a paper might be in the statistics literature, and have only statistics words in it, but in fact be highly read by physicists. That is an indicator that the paper's subject matter spills into physics, in some very real sense. One of Blei's interests is finding influential interdisciplinary papers by methods like these. And the nice thing is that external forums like Twitter, Facebook (gasp), and user histories at the arXiv effectively provide such bibliographies.
Late in the day we met up with Micha Gorelick (bitly) to discuss our plans for the dotastronomy hack day in New York City this weekend (organized by Gus Muench, Harvard). We are wondering if we could hack from idea to submittable paper in one day.
Foreman-Mackey and I had a long conversation about parameterizing exoplanet models for transit lightcurves (and also radial-velocity measurements, direct-detection experiments, and astrometric data). The issue is that there are multiple angles—between the orbital plane and the plane normal to the line of sight, between the line of sight and the perihelion direction, and between the perihelion and both the ascending node and the fiducial time (zero of time). These angles are all differently degenerate for different kinds of data, and a well-measured angle can be
hidden if you parameterize it such that the well-measured angle appears as a difference or sum of parameters. We also pair-coded some improvements to his transit-fitting code. The fits—to Kepler and Spitzer data—are sweet. Because his code is flexible, he can learn the limb darkening along with all the orbit and transit parameters.
At applied-math-meets-astronomy group meeting, we discussed various things, including using classification within a sampler to split the parameter space (in which there is a multi-modal likelihood function or posterior PDF) into a set of unimodal sub-models. On this subject, it is great to have Muandet visiting from MPI-IS; he is an expert on support vector machines and their extensions. Another thing that we discussed is the possibility in the Goodman & Weare stretch move underlying emcee and Hou's code, we might be able to improve the acceptance rate by changing the proposal distribution in the one-dimensional walker-mixing direction. That is super cool and something to work on at sleep time.
At computer-vision-meets-astronomy group meeting, Fadely showed awesome results that we can calibrate astronomical imaging (determine the dark and flat) without taking specific calibration data, or even identifying stars (let alone cross-matching stars in repeat observations). This is unreal! He also showed that the information that comes from high-signal-to-noise image patches is very different from the information that comes from low-signal-to-noise (read:
blank) patches. Afterwards, Krik Muandet (MPI-IS) and I discussed how to evaluate his quasar target selection methods, which are based on support measure machines, a generalization of support vector machines to the space of probability distribution functions.
Two long meetings today with Mike Kesden (NYU) and Gabe Perez-Giz (NYU) about gravitational radiation. In the first we discussed the idea (from Lam Hui) that binary pulsars could be used as resonant detectors of gravitational radiation. We agree with Hui's conclusions (that they can, but that they aren't all that sensitive at present) but we disagree with the interpretation of the result in terms of sources. We are trying to write a clarifying document. Or at least Kesden is.
In the second conversation we talked about doing an external analysis of extant LIGO data. I have various ideas about modeling the strain or strain noise using the (massive, abundant) LIGO housekeeping data, to increase the sensitivity of the experiment to real gravitational signals. We talked about a wide range of things and tentatively agreed that we should explore entering into a memorandum-of-understanding with the LIGO Collaboration. But first we are going to do a tiny bit more homework. That is, Perez-Giz is going to.
Annalisa Pillepich (UCSC) gave a seminar in the morning about the effects of baryons on galaxy formation, comparing dark-matter-only and more realistic simulations. She finds (as does Zolotov and collaborators) that baryons can have a big effect on the structure of dark-matter halos. She finds that baryons have a big enough effect to resolve many of the remaining issues in the comparison of simulations and observations of low-mass galaxies in the Local Group. Really her conclusion (and my reading of her and Zolotov's work) is that—though there are large uncertainties—the magnitudes of the effects of baryons are large enough that they could resolve most current issues, and other things might help resolve them too. So once again the dark matter could be vanilla CDM. That's bad!
In the afternoon, Stephane Courteau (Queens) argued similarly that theories of galaxy evolution have enough uncertainties in them that they don't really predict the properties of the galaxy population well, but from the opposite perspective: He argued that no simulations get all the observed properties of the disk-galaxy population right at the present day. He argued that there needs to be very strong feedback to explain the population in current semi-analytic-style models. He also showed some amazingly good scaling relations for disk galaxies, measured carefully and lovingly by his group.
Late in the day, he argued with me that doing galaxy photometry by fitting simple parameterized models is always wrong. That's what I am doing with my Atlas project with Mykytyn and Patel. I don't agree with Courteau here—my view is that all photometry is in effect model fitting—but his points are good and need to be addressed. Translated into my language, effectively he argues for very flexible photometric models of galaxies; non-parametric models if you will. I agree but only on the condition that those flexible models are regularized so that in effect galaxies observed at different angular resolutions are being photometered equivalently. Well, at least for my purposes I need that (I want to find similar galaxies at different distances).
Tang, Marshall, and I have a problem for which we think Gibbs sampling will beat the ensemble sampler emcee (by Foreman-Mackey). The issue is that you can't really Gibbs sample with emcee: In Gibbs sampling you hold a bunch of parameters fixed while you independently sample the others, and swap back and forth (or cycle through groups of easily sampled subsets of parameters). In the ensemble (of
ensemble sampling) if you have different values for parameters in the non-sampling subset, then the parameters in the sampling subset are drawn from different posterior PDFs. That breaks the whole ensemble concept. So we are confused. Tang's job is to implement simple Metropolis–Hastings Gibbs sampling and see if it beats emcee. I am afraid it will as the number of parameters gets large. This is just more annoying evidence that there is no one-size-fits-all or kitchen-sink sampler that solves all problems best. If I worked in stats I think I would try to make that thing.
In the morning, with the computer vision types at NIPS, it was just us astronomers. Fadely and I worked on making fake data to test our calibration systems, while Foreman-Mackey ran MCMC sampling on some Kepler transit data. In the afternoon, I met up with Josh Bloom (Berkeley) at an undisclosed location to discuss various things in play. We spent a bit of time realizing that we are very closely aligned on automated decision making. Somehow it has to involve your long-term future discounted free cash flow, measured in currency units.
In a low-research day, Jonathan Zrake gave a nice brown-bag talk about how magnetic fields are amplified in turbulent plasma. I also had conversations with Fadely, Willman, and collaborators about star–galaxy separation, and Foreman-Mackey about author–topic modeling.
In my lexicon, a model is an object that has a (possibly null) set of parameters and makes predictions, in the sense that it produces a probability distribution function for the data as a function of those parameters. It also needs a few more things, including a prior PDF for the nuisance parameters (it doesn't need a prior PDF for the parameters of greatest interest, in my humble opinion). Given all this, I would have thought that taking, for each data point, the K nearest neighbors in the data space (under some metric) is not a model. But it can be, if you can cleverly convert the properties of the K nearest neighbors into a PDF for the data point. For Fadely's calibration job, and at the suggestion of Fergus, I think we can do this, and I wrote it up on the airplane home.
Because the model has very few user-tuned knobs and because its complexity and coverage grows linearly with the obtained data, the KNN model is truly non-parametric. In Fergus's terminology, it has high capacity: It can model a lot but at the same time isn't obstructed by intractable optimization or sampling.
I spent the afternoon with the LIGO team at Caltech. I got the invitation from Roy Williams (Caltech) after I tweeted (yes, tweeted) my disappointment with the LIGO data release policies. So I spent half my talk discussing the costs and benefits of open data sharing. The discussion in the room was very lively, with a range of opinions but none of them very far from my own: Everyone sees data release as hugely beneficial but also very costly. Advanced LIGO is required to do data release and the team is thinking about how to stage and structure it. Much of the discussion we had was about the salient differences between astronomy observatories and missions and physics experiments; the crucial point that LIGO has no sources yet is key and relevant. And a big issue for LIGO is lack of agreement both within LIGO and between LIGO and its partner projects about what is appropriate.
As for current LIGO data: The team is attempting to build a lightweight MOU structure to permit outside access to data without jeopardizing their core mission. I discussed what that would look like and promised to get back to them with a proposal if anyone on my team wants to fire up some analysis. Perez-Giz: I am thinking of you.
[note added later: PDF slides from my talk available here.]
I spent the day at the Spitzer Science Center, where I continue to serve on the Spitzer Oversight Committee. Perhaps it isn't research but I sure do learn a lot at these meetings. Among the many remarkable things I learned is that in the Warm Mission, the Spitzer IRAC instrument continues to get more precise with time. This is unusual for an aging, distant space observatory! The reason is that exoplanet observers are feeding systematic noise information back to the instrument team, who are working to implement hardware, software, and observation-planning changes. Right now the instrument is working at tens-of-micromagnitudes precision (parts in 105), which has permitted it to measure transits of Earth-sized exoplanets.
Over dinner, we discussed my long-term fantasy of hijacking a decommissioned satellite (any satellites coming to end-of-life in the next few years?) and setting it up as an autonomous, intelligent observer, saving on operations and telemetry by having it phone home only when it has something interesting to say.
Phil Marshall and I spent the day talking about weak and strong lensing. We sanity checked with Fadely our ideas about finding strong lenses in multi-epoch ground-based imaging like PanSTARRS and LSST. For lunch, we took the advice of 4ln.ch. With Foreman-Mackey we discussed an interesting idea in this Lance Miller paper: If you have N independent measurements of x and you have a correct likelihood function and a correct prior PDF for x, the (expected value of the) mean (over data points) of the N posterior PDFs for x is the prior PDF for x. That is either obvious or surprising. To me: surprising. They use this fact to infer the prior, which is a super-cool idea but I expect to be able to beat it dramatically with our hierarchical methods. In the airport at the end of the day, I contemplated what I would say to the LIGO team at Caltech.
ps: Though that Miller paper is excellent and filled with good ideas, I have one quibble: They have an estimate of the galaxy ellipticity they call
Bayesian because it is the posterior expectation of the ellipticity (they even use the word
Bayesian in the title of their papers). The Bayesian output is the full likelihood function or posterior PDF not some integral or statistic thereof. A point estimate you derive from a Bayesian calculation is not itself Bayesian; it is just a regularized estimate. Bayesians produce probability distributions not estimators! Don't get me wrong: I am not endorsing Bayes here; I am just sayin': Point estimates are never Bayesian, even if Bayes was harmed along the way.
Phil Marshall showed up for two days and we caught up on our various projects. The one we are most on about is Yike Tang's (NYU) to bring hierarchical inference to weak lensing. We talked about strategy for that, and the competition. Comparison to other methods is daunting for a variety of reasons: We expect to be better than other methods when the data do not precisely match the model; that is, when the likelihood function or noise model is wrong in important ways. All the current tests of weak lensing technology are on fake data, for which the likelihood function is never wrong (or rarely wrong or not wrong in interesting ways).
In the usual form of astronomical self-calibration—like what we did to get the flats in SDSS or what I wrote up with Holmes and Rix—you use repeated observations of the same stars at different focal-plane positions to determine the sensitivity as a function of focal-plane position. Today at computer-vision-meets-astronomy group meeting, Fadely showed that he has the potential to calibrate the sensitivity of a device using only the statistical properties of observed image patches. He doesn't need to even identify sources, let alone determine when the same source has been re-observed. It is still early days, so we don't know the final precision or how it depends on the number of images, number of sources in those images, PSF sampling, and so on, but it looks very promising. In particular, if we pair it with the more traditional self-calibration it could be extremely precise. Our goal: Obviate the taking of (always questionable) flat-field images. This is also something I enjoyed discussing two weeks ago with Suntzeff (TAMU) when I was out in Kansas. (He thought I was talking crazy.)
In the brown-bag talk today, Victor Gorbenko (NYU) discussed the connection between string theory and QCD-like Yang–Mills theories. The connection is that strong nuclear forces (mediated by gluons) have field confined to thin tubes when quarks get widely separated; string theory seems like a fit. This is an old program which was abandoned a while ago, but Gorbenko and collaborators have shown that it might work, if you add to the string theory some additional axion-like particles.
In what little research time I got today, I organized thoughts and process for Yike Tang's (NYU) project to do weak lensing with hierarchical inference. By inferring the ellipticity distribution along with the local shear, he can potentially increase the angular resolution and signal-to-noise of any shear map just by software improvements alone (over standard practice).
Two great talks today, one by David Nidever (Michigan), who is one of the key people in the SDSS-III APOGEE project, and one by Or Graur (AMNH, Tel Aviv), who is a supernova prosepector. Nidever has been crucial in making the APOGEE spectral reductions precise. He has them so precise that he can actually discover exoplanet and brown-dwarf companions to main-sequence and giant stars. He talked about various constraints on or understandings of the accretion history of the Milky Way, including quite a bit about the puzzling Magellanic stream. What's so crazy is that it has no stars in it. Although perhaps not as crazy as I first thought when I started to think about starless HI structures connecting nearby galaxies like M81 and M82.
Graur talked about supernova searches in data sets (like CLASH) that were designed to find supernovae and also data sets (like SDSS-III BOSS) that were not. In the latter, he has made a very sensitive automated search in the spectra of the luminous red galaxies and found 100-ish type-Ia supernovae. This yield is much lower than you might expect (from the duration and rate of supernovae) but makes sense when you include the finite fiber size and signal-to-noise. He made a very strong point that many astronomical surveys can also be SNe surveys at almost no additional cost. That's good for our long-term future discounted free cash flow.
Schiminovich and I met at one of our undisclosed locations (hint: good coffee) to work on GALEX calibration using SDSS stars. I got Foreman-Mackey to help with some web programming: We need to query the SDSS-III CAS from within the program if things are going to be efficient. That now works!
After this we went to the Columbia Astronomy Library to watch Maureen Teyssier (Columbia) speak about comparisons between the Milky Way and its environment and comparable locations in numerical simulations. She finds that she can tentatively identify a few galaxies that are near (but not in) the Local Group that probably, in their past, punched through the dark-matter halo of the Milky Way! Her findings resolve a puzzle in the study of dwarf galaxies: Most isolated dwarf galaxies—galaxies that are outside the virial radius of any luminous galaxy—have abundant cold gas. There are a few exceptions near the MW, and many of these end up in the Teyssier analysis to be punch-through galaxies. That is great and opens up some ideas about measuring the local dynamics on scales larger than the virial radius. This is all related somehow to Geha results I have mentioned previously.
Bovy came into town for the morning and we talked various things, including skewness of quasar variability. He finds a signal that is slightly hard to explain. Our discussion segued into the applied-math-meets-astronomy meeting, with Goodman giving us hope that it might be possible to treat the density field in the Universe as a Gaussian process. When Bovy and I last talked about this problem (with Iain Murray of Edinburgh), we decided that the matrices were too large to invert. Goodman thinks that things may have evolved. At that same meeting, Hou showed us how he is
refining the levels in his nested sampler. That seems to be enormously increasing the precision of his results; he has succeeded in building a precise nested sampler and we can see places where we can make further contributions to improve precision long-term. This is exciting and opens up new opportunities for our exoplanet projects.
At computer-vision-meets-astronomy we discussed Fadely's image-patch modeling system, and how to use it for calibration. We had a long discussion about the following issue: If you have a patch that is a good fit to the model except in one pixel (where, maybe, calibration parameters are wrong), should you look for that badness of fit in the total joint likelihood of all the pixels or in the conditional likelihood of the one pixel given the others? Intuitions diverged; in principle there is one joint PDF that contains all the information, so it doesn't matter. But that ignores the point that the PDF we have fit is not in any sense accurate! So we reduced scope to a very simple functional test of this point.
It's a feature, people, not an
oscillation. Jeremy Tinker (NYU) gave a great brown-bag today at the blackboard about the measurement of the baryon acoustic feature in the Lyman-alpha forest at redshift 2.3 from SDSS-III BOSS. This is a great piece of work, to which my only significant contribution was quasar target selection with Bovy. The baryon acoustic feature position is consistent with the standard model of cosmology (unfortunately) but the result is beautiful and represents a lot of work by a lot of people. Congratulations, BOSS: A great project making fundamental discoveries (and on budget and on time). I spent the afternoon talking radio with Van Velzen (Amsterdam) and Atlas with Mykytyn and Patel.
I spent the day at the APS Prairie Section meeting in Lawrence Kansas. Two (of many) talks that made an impression on me were by Corso (Iowa), who described the calibration procedures for a new generation of CMS photo-multiplier tubes, and by Mohlabeng (Kansas), who showed that there is a statistically significant color--redshift term in the public supernovae sample. That latter result got people—including Suntzeff (TAMU) and Rudnick (Kansas)—talking about selection effects. Of course even if the effect is from selection, it needs to be modeled; modeling the effect does seem to change inferences about cosmological parameters. In the post-banquet spot, I talked about the challenges of doing inference with big data sets (slides here in PDF). That led to conversations late into the evening at the bar. Thanks, APS Prairie Section!
The only small bit of research I got done today was writing some text for Brewer's masterpiece on sampling in catalog space. Given an image of the sky, he can sample the posterior PDF over catalogs, along with the posterior PDF for hyperpriors on the number–flux relation of stars. By using nested sampling he doesn't get killed by the combinatoric degeneracies that arise from the catalog re-ordering degeneracy. Even better, he models the stars that are too faint to cleanly detect. Even better, the different catalogs in the sampling have different numbers of entries; that is, this is sampling at variable model complexity. Congratulations Brewer!
I am so interdisciplinary! I have computer-vision-meets-astronomy every Tuesday and applied-math-meets-astronomy every Wednesday. At the latter, we continued to discuss non-trivial sampling, including nested sampling with the stretch move and other methods for computing the Bayes integral. One idea we discussed was one mentioned by VanderPlas (UW) last week: Just straight-up density estimation based on a sampling followed by a mean-over-sampling of ratios between the estimated density and a product of likelihood times prior. This should (perhaps with high variance) estimate the integral (normalization). This is brilliant and simple, but we expect it won't work in very high dimensions. Maybe worth some testing though. I think VanderPlas discusses this in his forthcoming textbook.
In parallel, Foreman-Mackey has been preparing for a full assault on the Kepler data by getting ready a planet transit code. We are trying to be more general (with respect to limb darkening and the like) than the enormously useful and highly cited Mandel and Agol paper (a classic!) but without sacrificing speed. I had a brilliant insight, which is that any case of a disk blocking part of an annulus is just a difference of two cases of a disk blocking part of a disk. That insight, which could have been had by any middle-schooler, may speed our code by an order of magnitude. Hence rapid transit. Oh, have I mentioned that I crack myself up?
While America voted, Fergus, Krishnan, Foreman-Mackey, and I discussed with Fadely his self-calibration results. He is trying to infer the calibration parameters (dark and flat for every pixel) in a detector without any calibration data, just using science data. The idea is to build a statistical model of the data and then find the calibration parameters that make that model most likely; when calibration is good, the data will be more informative or more compact in the calibrated data space (we hope). Right now performance is looking bad. I'm still optimistic, of course! Krishnan pointed out some issues, one of which is that generative models are not always highly discriminative and this project lives somewhere in the intersection of generative and discriminative. The distinction between generative and discriminative has rarely come up in astronomy. At the same meeting, we discussed Fergus's direct detection successes: He has spectra of some exoplanets! More soon, I hope.
Mike Kesden (NYU) gave the brown-bag talk today, about black-hole–black-hole binary population synthesis in preparation for gravitational radiation detection with advanced LIGO. The principal pathway for making LIGO-relevant BH–BH binaries is an insane combination of mass transfer, supernova, common-envelope evolution, supernova, and inspiral, but hey! Kesden argued that it really is likely that advanced LIGO will see these.
In the morning I met with Oded Nov (NYU Poly) and Claudio Silva (NYU Poly) to discuss possible funding proposals related to engineering, science, and citizen science. We came up with a kick-ass idea for using smart phones and angry residents to map the audio response of the city to car alarms (really any noises, but car alarms are nice
standard sirens for calibration). The project would look a lot like the (non-existent) Open-Source Sky Survey but build a three-dimensional audio-response model of the city. Cool if we did it!
Fergus and I got together to discuss how to present the significances of the exoplanet detections he is making with the P1640 coronographic spectrograph. I opined that if the model we are using was truly generative—if the model included a quantitative likelihood or probability for the data given the model parameters—then there would be a few straightforward choices. The model is very close to meeting these criteria but not quite there. We discussed relevant hacks. On the side, we started a discussion of imaging using masks, which is a branch of astronomical instrumentation where we might be able to have an impact using non-trivial imaging priors.
VanderPlas (UW) and I continued working on the cross-validation and Bayes integral comparison. I am optimistic because both quantities have exactly the same units (or dimensions). I am pessimistic because the Bayes integral involves the prior and the cross-validation doesn't. VanderPlas has some conditions under which he thinks they could be the same, but neither of us is convinced that those conditions could really be met in practice. I am wondering if they could be met in a very successful hierarchical analysis. Anyway, this is all very vague because we don't have a clear example. By the end of the day we nearly had a document and we had a plan for an example problem where we can work everything out explicitly and (more or less) exactly.
The needs of a family living 18th-century-style in the post-apocalypse weighed heavily today, so not so much research got done. However, VanderPlas (UW) and I had a breakthrough on our cross-validation vs Bayes discussion. Now we are desperately trying to finish a Hurricane paper before the electricity comes back on.
We managed to get some research done in storm-ravaged lower Manhattan: Jake VanderPlas (UW) and I continued to work on cross-validation and its relationship to the Bayes integral. I think we nearly have some conditions under which they compute essentially the same thing. Rob Fergus came over to our undisclosed location at lunch time and we discussed the calibration of his spectrophotometric measurements of exoplanets. He finds (not surprisingly) that the complexity he uses for the PSF model affects the detailed shape of the planet spectra he extracts; this happens because the planets are a tiny, tiny fraction of the total light in the image. We discussed how to express this issue in the paper. It is a kind of systematic error, in some sense.
Jake VanderPlas (UW) had the misfortune to be staying in a NYU guest apartment when Hurricane Sandy hit on Monday, taking out subway and then power and then water. In between fulfilling basic human needs for ourselves and our neighbors, we worked on the relationship between cross-validation and Bayes integrals. I think we have something to say here. It might not be original, but it is useful in understanding the relationships of methods. We both wrote some equations and then tried to develop a concordance today. We started a document. While we sheltered in the only location in lower Manhattan with power and internet, I also spoke by Skype with Lang about flexible sky models for The Tractor. Today ended with a discussion on long-term future discounted free-cash flow, about which I really must write an essay sometime very soon.
In between seminars by Baldauf (Zurich) and Goldreich (Caltech), I worked on a proposal I intend to send to Andreas Küpper (Bonn) about doing probabilistic fitting of tidal tails. I am trying to get a summary document together and then discuss how to share data and code. With his model for steady-state tidal disruption and my probability foo, I think we can learn more (and more accurately) from the Palomar 5 stream than anyone has before. I hope he agrees!
The Physics Colloquium today was by Jasna Brujic (NYU) who talked about biomimetic packing: This is creation of artificial tissue from artifical
cells, non-living analogs of biological systems. I learned things about random packing and the statistical properties of heterogeneous materials. In particular, I was interested to learn that although you can hold a (frictionless, perfect) sphere with four points of contact, if you have a frustrated random solid packing of (frictionless, perfect) spheres, each sphere will have, on average, six points of contact. As you remove points of contact, the bulk and shear moduli drop to zero.
I discussed fast imaging (lucky imaging) with Federica Bianco (NYU), Foreman-Mackey, and Fadely. Bianco is interested in data analysis methods that lead to unbiased photometry; this is hard when the PSF is a very strong function of time and space, as it is in fast imaging. We have some ideas, and some code that is half-done.
Guangtun Zhu (JHU) dropped in to discuss recent work. He showed beautiful spectral models of quasars, beautiful average absorption spectra, and beautiful galaxy–gas cross-correlation functions from quasar absorption spectra. We argued that he could, on the one hand, analyze detailed issues with SDSS spectroscopic calibration and, on the other hand, determine the fraction of the cosmic matter density in a range of atomic species. Nice diversity there!
At astronomy meets applied math group meeting (Goodman, Hou, Foreman-Mackey, Fadely, myself) we discussed Hou's insertion of Goodman's stretch move (the basis of our popular product emcee) into Brewer's nested sampling. We think we have some improvements for the method, and Hou is meeting our functional tests, so we are about to apply the method to exoplanet systems. After that we discussed an idea we have been kicking around for a year or so: If a MCMC sampler is stuck in a small number of optima and can't easily transition from one optimum to another, then we should split the parameter space up into regions such that there is only one optimum per region. Then we can sample each region independently and recombine the individual-region samplings into one full sampling. We worked out a method that involves k-means (to do the clustering to find optima) and SVM (to do the splitting of the space). In principle we could make a very general, very flexible sampler this way.
I spent the afternoon at Columbia, starting with Pizza Lunch, in which the second-year graduate students described their upcoming projects. Price-Whelan described a project with Johnston finding streams among distance-indicating stars. The three of us followed the lunch with a half-hour discussion of this. Even a single good distance-indicator in a cold stream could substantially improve our inferences about the Milky Way. It relates strongly to the fitting discussion we had with Küpper yesterday; with a good probabilistic inference framework, every new piece of data, no matter how oddly obtained, is useful. It also relates strongly to Foreman-Mackey's project to find fainter and more elusive distance indicators.
I then had a long chat with Schiminovich, which ended up with us making some plots that strongly confirm our suspicions that we can do a better job of modeling the GALEX focal plane than we have been so far. Late in the evening I submitted Lang and my mixture-of-Gaussian note to PASP. It will appear on the arXiv this week.
I had a great conversation today with Kathryn Johnston (Columbia), Joo Yoon (Columbia), and Andreas Küpper (Bonn) about cold streams of tidal origin. Küpper has results that show many things, including the following: (a) It is possible to model the detailed morphologies of tidal streams quickly without even N-body simulations. (b) The wiggly expected morphologies near the parent body are mainly due to epicyclic motions, not variable tidal stripping. (c) Plausibly the non-uniformities in density in the Palomar 5 stream can be due to these epicyclic effects. (d) There is awesome morphology expected in the position—radial-velocity plane, and Marla Geha (Yale) has the data to (possibly) test it. Nice work and the kind of work that generates multiple questions and projects for every issue it resolves. We discussed better ways to do the fitting of models to data.
There were talks today by Jim Stone (Princeton), talking about accretion disk physics from simulations, and Lauren Porter (UCSC), talking about the fundamental plane of elliptical galaxies from semi-analytic modeling. Interestingly, Porter was trying to understand the observations of the dependence of galaxy properties as a function of distance from the plane, not within the plane. It is so intriguing that the FP has been around for decades and never really been explained in terms of galaxy formation. Porter finds that it is natural for the duration of star formation to be related to position off the plane. When not in seminars, I was working on my mixture-of-Gaussians paper.
I actually did real research today, writing and making figures for Lang and my paper on mixture-of-Gaussian approximations to standard two-dimensional galaxy intensity models (think
exponential and de Vaucouleurs). I tweaked the figures so their notation matches the paper, I made figure captions, I adjusted the text, and I got the to-do items down to one day's hard work. I am so close! People: Don't use the de Vaucouleurs profile; use my approximation. It is so much better behaved. Details to hit arXiv very soon, I hope.
Mired in bureaucracy, the only research of note happened in long conversations with Goodman, Hou, and Foreman-Mackey on nested sampling, and with Yike Tang on weak lensing, and with Kilian Walsh on measuring the noise in images, and Fadely and Foreman-Mackey on initializing a mixture of factor analyzers.
Fadely and I continued work with Willman, Preston, and Bochanski today. Preston got Fadely's code working on Haverford computers. Willman found more than ten-thousand possibly relevant spectra in SDSS Stripe 82 to test our star–galaxy classifications. Bochanski found lots of relevant HST imaging and even found that SExtractor has been run on everything we need. We spent lunch planning our next steps. Willman challenged us to make sure we have a very good case that high-quality star–galaxy separation is necessary for real, planned science projects. I think we do, but it was a good reality check on this big effort.
In the afternoon, Schiminovich and Gertler and I talked about obtaining and using SDSS stars for self-calibration of the GALEX photon stream. That's going to be a good idea. We also discussed sources of the sky-to-focal-plane residuals we are seeing; we very much hope they are purely from the focal-plane distortions and not from issues with the spacecraft attitude.
Today is not over but it has already been pretty busy. Beth Willman is in town, with her research assistant Annie Preston (Haverford) and postdoc John Bochanski (Haverford) to discuss next generation ideas on hierarchical Bayesian star–galaxy separation. The idea is to apply our method to SDSS Stripe 82 data to get better star–galaxy separation that what is currently available. We discussed scope of the project and methods for verification of our success (or failure).
In parallel, Rob Fergus and I are in one-hundred-percent-effort mode on a submission for the Google Faculty Research Awards program. It is for small amounts of money but it is a very streamlined (thanks, Google overlords!) process, so I think we can get in a credible proposal. We are proposing to bring self-calibration down to the pixel level and out to the masses.
In the morning Adam Miller (Berkeley) gave a beautiful talk about the data, features, and decision trees of the Bloom-group variable-star classification based on the ASAS data. I know this project well from working with Richards and Long (hey, people, we have a paper to write!), but it was nice to see the full system described by Miller. The technology and results are impressive. The audience (including me) was most interested in whether the automated methods could discover new kinds of variables. Miller had to admit that they didn't have any new classes of variables—indicating that the sky is very well understood down to 13th magnitude—but he did show some examples of individual stars that are very hard to understand physically. So, on follow-up, they might have some great discoveries.
I have criticisms of the Bloom-group approaches (and they know them); they relate to the creation of irreversible
features from the data: The models they learn from the data (in their random forest) are generative in the feature space, but not in the original data space. This limits their usefulness in prediction and subsequent analysis. But their performance is good, so I shouldn't be complaining!
In the afternoon, Fadely and I figured out a degeneracy in factor analysis (and also mixtures of factor analyzers). We discussed but see no serious discussion of it on the web or in the foundational papers. We certainly have useful things to contribute about this method in practice.
In a low-research day the most fun I had was working in pair-code mode to speed up Fadely's mixture of factor analyzers code, now published on PyPI. Most of our speed-up came from realizing that (somewhere inside a pretty complicated expression)
c = numpy.diag(a[:, None] * b[None, :])
b are one-dimensional
numpy arrays of the same length) is exactly equivalent to
Duh! That embarrassment aside, I think we have some pretty useful code now. Time to test it out. Please help!
c = a * b
I had two conversations about Astrometry.net today. The first was with Oded Nov (NYU Poly), with whom I plan to put in an engineering proposal to work on citizen science and distributed, untrusted sensors, using Astrometry.net and something about urban sensing as examples.
The second conversation was with Kilian Walsh (NYU), with whom I am trying to get up and running a system to synthesize every submitted image right after we calibrate it. The idea is that the image synthesis (plus a healthy dose of optimization) will tell us, for each submitted image, the sky, bandpass, zeropoint, large-scale flat, and point-spread function. If we can get all that to work (a) we will be Gods on this Earth, and (b) we will be able to search the data for variable stars, transients, and other anomalies.
At computer-vision group meeting this morning, Fadely, Foreman-Mackey, and I tri-coded Fadely's mixture of factor analyzers code and got it working. This is no mean feat, if you look at the linear algebra involved. And all that before noon!
The MFA model is like a mixture-of-Gaussians model, but each Gaussian has a variance tensor with reduced degrees of freedom: Each variance tensor is a sum of a diagonal (noise) tensor plus a low-rank general tensor. It is like a generalization of PCA to build a probabilistic distribution model for the data, then generalized to a mixture. It is a very powerful tool, because in large numbers of dimensions (for the data space) you get almost all the power of mixture-of-Gaussian modeling without the immense numbers of parameters.
With Hirsch (UCL & MPI-IS) and Foreman-Mackey, I spent the whole morning discussing next-generation systems to estimate the point-spread function in heterogeneous data. Hirsch has developed a beautiful model for spatially varying PSFs, using data we took (well really he and Schölkopf took) in Valchava as the test data. We discussed various possible directions to go: In some, he works towards making a model of the PSF and its variations over collections of images from the same hardware. In others, he works towards setting PSF model complexity parameters using the data themselves. In others he models departures from smooth, parametric forms for the PSF to increase accuracy and precision. We concluded that if we can make some useful software, it almost certainly would get adopted. We also discussed integration with Astrometry.net, where we want to move towards image modeling for final calibration (and anomaly discovery!).
In the afternoon, Matias Zaldarriaga (IAS) gave a talk about measuring and understanding fluctuations in the Universe better than we can at present. In one project, he showed that we can use certain kinds of distortions away from black-body in the CMB to measure the amplitude of fluctuations on extremely small scales—scales too small to observe any other way. In another, he showed that you can do fast, precise numerical simulations by simulating not the full universe, but the departure of the universe away from a linear or second-order prediction for the growth of structure. That made me say "duh" but is really a great idea. It also gave me some ideas for machine learning in support of precise simulations, which perhaps I will post on the ideas blog.
Michael Hirsch (UCL & MPI-IS) arrived for two days at NYU. We talked about self-calibration and blind deconvolution. On the latter, I was arguing that many things people usually do in computer vision might not work for astronomy, because astronomers expect to be able to make measurements (especially flux measurements) on their processed images. Some computer vision methods break that, or make measurements highly biased. On that point, I did my usual disparage of non-negative. Like Schölkopf, Fergus disagreed: If we think the fundamental image-formation mechanism is non-negative, then non-negative is the way to go methodologically. I think there might be a problem if you impose non-negative but not, at the same time, other things that are similarly informative that you know about the imaging. Anyway, we left it that I would make a fake data set that obeys exactly the image formation model but still leads to badly biased results when standard blind deconvolution is applied to it. That would be a service to this endless argument.
We also thought more and argued more about the idea that Fadely's brain-dead model of tiny patches of SDSS imaging data could be used for self-calibration purposes. We have a rough plan, but we are still contemplating whether the calibration and the data model could be learned simultaneously.
The graphical model I posted yesterday for the GALEX Photon Catalog is, in fact, wrong. I worked on it more today, and will post a better version as soon as I start to really understand Schiminovich's comments about the part of the PSF that comes from the electronics: The reason that GALEX has a broad PSF is in part electronic!
In other but conceptually related news, NYU graduate student Yike Tang (who started working with me recently) has been able to show that building a joint hierarchical model of galaxy shapes and the weak-lensing shear map permits much higher precision (or much higher spatial resolution) shear maps than what you get by doing averaging of galaxy shapes. This idea has been kicking around for a while but I think (and, despite my open-science polemics, hope) that we are the first to actually do it. Much debugging remains, and we are very far from having a paper, but we may just have made LSST and Euclid and DES much more powerful at fixed cost!
In the morning at our computer-vision group meeting, Fergus told us about his detection of and spectral extraction for three of four companions of HR8799, the poster child for young planetary systems. This is a huge success for his methods (which are computer-vision based, flexible linear models) and for the P1640 spectrographic coronograph (Oppenheimer PI). He is pretty stoked about the success—his software has been game-changing for the P1640 project—as is Oppenheimer, who wants to publish asap.
In the afternoon, at the insanity that is the Leroy St New York Public Library for children, I made the graphical model (below) using Daft. Based on a two-hour conversation with Schiminovich, it is supposed to represent the data-generation process (or, perhaps, causal model) for the photon stream we have from the GALEX spacecraft. I think it still has some errors.
I spoke in the CCPP Brown Bag today, about computer vision and its connections to cosmology. I realized only during the talk (and therefore way too late to change course) that really the best connection between computer vision and cosmology to emphasize right now is in the area of self-calibration. In computer vision, it is often the case that you have to determine the distortion or noise or blurring kernel of the image using that image alone; that is, you have no external calibrating information. This is also true for precise astronomical data sets: The best information about how the instrument works comes from the science data themselves. Oh well, I will do a better job next time! I can also then connect the stuff I am doing with Fergus and Fadely with the things I am doing with Foreman-Mackey and Holmes.
Inspired by all this, I spent part of the afternoon making a huge ginormous graphical model with Daft for all of astronomy.
Foreman-Mackey and I finished the first releasable version of Daft, simple software for rendering graphical models for publication. And remember: If your code doesn't make you cringe, you haven't released it early enough.
It will be with horror that my reader learns that Lang and I spent part of the morning in a pair-code session binning down SDSS data to larger (less informative) pixels. We had to bin down everything: The data, the error model, the point-spread function, the photometric calibration, the astrometric calibration, etc. Why did we do it, you may ask? Because for the Sloan Atlas project, we are encountering galaxies that are so large on the sky (think
M101) that we can't—without major code changes asap—fit the data and model and derivatives into (our huge amount of) memory, even in our (fairly clever) sparse implementation of the problem. The amazing thing is that by the end of the day we (meaning Lang) got something that works: We can run The Tractor on the original data or the rebinned data and it seems to give very similar results. Testing tomorrow!
In the afternoon, Andrew MacFadyen (NYU) gave the Physics Colloquium, about ultrarelativistic plasma problems, motivated by gamma-ray bursts. The most interesting things to me in this business are about universality: In the non-relativistic limit there is the Sedov–Taylor scale-free expanding explosion model. In the ultra-relativistic limit there is the Blandford–McKee jet model. However, on the latter, the different parts of a collimated jet can't actually communicate with one another laterally (for relativistic causality reasons), so there is no possibility of homogeneity. In other words, the jet must be a heterogeneous mixture of jets, in some sense. The mixture fuses together into one jet continuously over time. That seems like a very interesting physics problem. MacFadyen and his group have been doing fundamental work, with awesome visuals to boot.
Today in our weekly meeting, Hou gave Goodman, Foreman-Mackey, and I a tutorial on nested sampling, explaining how his extension of diffusive nested sapling—using the affine-invariant MCMC sampler—can sample the posterior PDF and compute the marginalized likelihood (the
Bayes factor or
evidence). I still don't completely understand; the method is certainly not trivial. We are hoping the affine-invariant sampler will create performance advantages over other implementations, and we have lots of evidence (generated by and with Brewer) that diffusive nested sampling is awesome for multi-modal posterior PDFs.
Way back in the day, my friend and colleague Sam Roweis worked on making principal components analysis (a method I love to bash but occasionally use) into a probabilistic model. He said (very sensibly in this note):
Finally, the PCA model itself suffers from a critical ﬂaw which is independent of the technique used to compute its parameters: it does not deﬁne a proper probability model in the space of inputs. This is because the density is not normalized within the principal subspace. In other words, if we perform PCA on some data and then ask how well new data are fit by the model, the only criterion used is the squared distance of the new data from their projections into the principal subspace. A datapoint far away from the training data but nonetheless near the principal subspace will be assigned a highHe proposed fixes for these problems and they generally look like (a) putting some non-trivial Gaussian or similar distribution down in the low dimension PCA subspace, and (b) putting some more trivial Gaussian (like perhaps an isotropic one) down in the high dimension orthogonal (complementary) subspace. This converts PCA into a constrained maximum-likelihood or MAP for a (simple) probabilistic model.pseudo-likelihoodor low error. Similarly, it is not possible to generatefantasydata from a PCA model.
Today, Fergus proposed that we do something strongly related in Fadely's project to fit a probabilistic data-driven model to a huge collection of SDSS imaging patches: We will use a mixture of Gaussians to model the distribution, but reduce the dimensionality of the fit (not the dimensionality of the mixture but the dimensionality of the parameter space) by making each Gaussian be composed of a low-dimensional non-trivial structure times a higher dimensional trivial structure. These kinds of models can capture much of what a completely free mixture of Gaussians can capture but with many fewer parameters and much faster optimization and execution. We also figured out symmetry considerations that massively reduce the diversity of the training data. So life looks very good in this sector.
Neal Weiner (NYU) gave the brown-bag about recent exciting developments in Fermi—a line in the Galactic Center region at 130 GeV—and also explained the relationship between this observation and the
WIMP miracle that a weak-scale particle naturally explains the abundance of dark matter in a thermal freeze-out model. It was a great seminar in every way (pedagogical, current, and highly editorial), and got me more excited than I already was about this possibility that the dark matter is illuminated.
After brown-bag, David Levitan (Caltech) gave an informal talk about finding variable stars in the Palomar Transient Factory data. He focused on AM CVn stars, which are crazy high mass-ratio double-degenerate binaries. He has done great work on calibrating and using for discovery (mining?) the PTF data.
Norm Murray (CITA) gave the astrophysics seminar. He showed pretty convincingly that massive star-formation regions are effectively Eddington-limited, meaning that the accretion of material onto stars and the gravitational collapse is halted or controlled by ultraviolet radiation pressure on dust in the interstellar medium. This has lots of implications for present-day and early-universe star formation. He showed a very nice mix of theory and observations, including some nice stuff about giant molecular clouds in the Milky Way disk
The rest of the day was largely spent with Foreman-Mackey, building daft, a package for rendering probabilistic graphical models or directed graphs. It is designed to make figures in which the symbols are exactly the same size and font as those in the paper. Here's an example:
Dilip Krishnan (NYU) came over to the CCPP to school us cosmologists in optimization today. The conversation (in the lounge) was wide-ranging, but the most interesting parts of it (from my perspective) were about stochastic-gradient methods in sparse problems.
A stochastic-gradient method (or online method) is one that permits you to look sequentially at parts of your data (not all the data at once) but nonetheless optimize global parameters that affect all of the data. The idea is: You have so much data, you can't fit it all on disk, so you just sample a bit of it, update your parameters according to what it says, throw away the bit of data, and grab the next bit. Iterate to convergence. Foreman-Mackey and I have been investigating this in the context of The Thresher.
A sparse problem (in my parlance) is one in which most parameters only affect a small fraction of the data, and most data are only affected by a small fraction of the parameters. It is not obvious to me that the stochastic-gradient methods will work (or work well) on sparse problems, at least not without modification.
I promised Krishnan that I would deliver to him an example of a realistic, convex, sparse problem in astronomy. Convexity ensures that we have no disagreements about what constitutes success in optimization. It will provide a sandbox for investigating some of these issues.
In a low-research day, Greg Dobler (UCSB) gave an informal and impromptu journal-club talk about his work on Planck understanding the haze or bubbles at the center of the Galaxy. He confirms confidently the WMAP results and the bubbles seem to have a hard spectral index. After the talk we momentarily discussed next-generation analyses of the three-dimensional galaxy, that would include stars, dust, gas, magnetic fields, and cosmic rays. In the afternoon, Gertler and I discussed the GALEX photons and the difficulty of optimizing non-trivial models. We also discussed the likelihood for Poisson processes with variable rates, which is a nice problem.
In separate conversations, I spoke with Yike Tang and MJ Vakili (both NYU graduate students) about doing hierarchical inference to infer the weak-lensing cosmological shear map at higher resolution than is possible by any shape-averaging method. Tang is working on building priors over shapes, and Vakili is looking at whether we could build more general priors over galaxy images. That, as my loyal reader knows, is one of my long-term goals; it could have many applications beyond lensing. At lunch, Glennys Farrar (NYU) gave a provocative talk about the possible effect of pion–matter effects (like an index of refraction) to resolve issues in understanding ultra-high energy cosmic ray showers in the atmosphere, issues that have been pointing to possible new physics.
My PhD student Tao Jiang graduated today. His thesis involved the cross-correlations of various galaxy populations as a function of color and luminosity. He was able to show that the red-galaxy population has been growing since redshift unity mainly from merger activity, and that the growth is dominated by red-red mergers (that is,
dry mergers). He also showed that the rate of blue-galaxy mergers is consistent with the rate at which blue galaxies transform into red galaxies (pass into the red population), which is pretty striking. Congratulations, Dr Jiang!
On the flight home, I wrote this paragraph (subject to further revision) for my self-calibration paper with Holmes and Rix. For context,
strategy A is the traditional uniform, aligned, square-grid survey strategy with small marginal overlaps;
strategy D is a quasi-random strategy where no two fields are precisely aligned in any respect.
Ultimately, survey design requires trade-offs between calibration requirements and other requirements related to cadence, telescope overheads, and observing constraints (like alt, az, airmass, and field-rotation constraints). One requirement that is often over-emphasized, however, is conceptual or apparent ``uniformity''; these different survey strategies have different uniformities of coverage, each with possibly non-obvious consequences. Many astronomers will see survey strategy A as being ``more uniform'' in its coverage (especially relative to survey D). This is not true, if uniformity is defined by the variance or skewness or any other simple statistic in exposure times; strategy D is extremely uniform (more uniform than Poisson, for example). In any survey, past and future, variations in exposure time have been valuable for checking systematic and random errors, and don't---in themselves---make it difficult to obtain a uniform survey in properties like brightness (since samples can be cut on any well-measured property). In general, in the presence of real astronomical variations in distributions of luminosity, distance, metallicity, and (more importantly) extinction by dust, there is no way to make a survey uniformly sensitive to the objects of interest. As a community we should be prepared to adjust our analysis for the non-uniformity of the survey rather than adjust (cut) our data to match the uniformity of unrealistically simplified analyses. This is already standard practice in the precise cosmological measurement experiments, and will be required practice for the next generation of massively-multiple-epoch imaging surveys.
The meeting got all semantics-y and I think I have been clear in the past about my feelings about semantics, so I won't expand on that here except to say that on the one side, if semantics are to be applied to data or papers or web pages by their authors, the tags are being applied with strong biases and incentives (all meta-data are wrong). On the other side, if semantics are to be generated by machines, then the tags are all probabilistic and we should not call them semantics but rather hyper-parameters and provide probability distribution functions over them, prior and posterior.
Highlights of the day included Alex Gray's talk on fast methods, which was followed by a short but useful discussion of discriminative and generative methods and the relationship to noise models and utility. Also Ashish Mahabel gave a nice overview of his classifiers for CRTS, which includes exactly the elements needed to do utility-based decision-making and experimental design. Between talks, Bob Hanisch and I discussed getting Astrometry.net data into the Virtual Observatory. I think that would be good for both projects. (And I guess, in some sense, it means that I will be publishing semantics. Oh well!)
I arrived at AstroInformatics 2012 today (day two) in the car of Zeljko Ivezic (UW). There were many great talks. Highlights for me included Jojic talking about de-noising images and video using image textures built from image patches. He had very impressive results with an exceedingly simple model (that patches of the image are all distorted views of some
fiducial patch). Longo claimed that he can beat XDQSOz with a new data-driven model. I am looking forward to reading the details. Norris described the SKA Pathfinder project EMU, which is going to be awesome. 70 million galaxies, many of them just star forming, detected in the radio, over 75 percent of the sky. All data will be released within a day of taking. The analysis is going to be done with attention paid to faint source detection. The raw uv data will be thrown away after analysis, which scares the crap out of me but hey, that's the new world of huge projects!
In the breaks and on the drive, I discussed with Ivezic and Jacob VanderPlas (UW) their nascent book on astronomical data analysis. It is a comprehensive work, with many worked out examples. It will ship with the code making all figures released open-source. I gave some comments after skimming a few sections and I was so stoked that I volunteered to read a bit of it critically pre-publication. Nice work, people!
After a low-research morning (punctuated by an excellent talk on the post-Higgs LHC by Andy Haas of NYU), I spent my flight to Seattle madly working on my talk for AstroInformatics 2012. I am talking about why the map–reduce (or Hadoop or whatever) frameworks for data analysis will not be sufficient for the future of astrophysics and why we have to develop new things ourselves. The map–reduce framework is great because it is a general framework for solving problems in log N time. But, to my knowledge, it can't do anything like real probabilistic hierarchical models without egregious approximations. I don't have much specific to propose as an alternative except making brute force a lot less brute.
Lam Hui (Columbia) gave a fun talk about various topics in our astro seminar. One point was about detection of stochastic gravitational radiation using the binary pulsar as a
resonant detector. This is a great idea; the binary pulsar is sensitive at much longer periods than other detectors, and our metrology of the system is excellent. Mike Kesden (NYU) and I were confused about whether the effect he seeks really is stochastic, given the sources and the narrow-band detector. We resolved to work it out later.
Late in the day, Patel and Mykytyn updated me on the Sloan Atlas measurements. The measurements they are making look great; for many of the galaxies we are measuring these are the first reliable measurements ever from the SDSS. We discussed a bit the need and plans for checking that our photometry is correct. This is a hard thing to do; what's the absolute truth for these previously unmeasured objects?
Fadely, Foreman-Mackey, and I worked on some point-spread-function fitting in preparation for making adjustments to how this is done in The Thresher and perhaps also to criticize how it is often done in standard pipelines. There are many issues: What basis to use for the PSF, what weighting to use in the fits, and what regularizations or priors to apply. The point we are coming to is that if you apply the non-negative regularization, you have to also put on other regularizations or else you can get highly biased results.
At lunch, Fergus, Fadely, and I met with Brenner (AMNH) and Nilsson (AMNH) to discuss the P1640 data and analysis next steps. After lunch we all got into an argument about finding the location of the center of the
optical flow—the point from which the speckles expand with wavelength. I argued that this is not defined if you let the point itself be a function of wavelength; then you are in the situation of the expanding Universe, where everyone thinks he or she is at the center. Same for speckles in the focal plane: Every speckle can have a comoving, colocated
observer who thinks he or she is at the center of the focal plane! Of course there really is some center, because the telescope and camera have an axis, so the point is purely amusing.
In the afternoon, Keith Chan (NYU) defended his PhD (supervised by Scoccimarro) on understanding dark-matter simulations in terms of halo clustering and halo statistics. He has created a valuable piece of work and the defense was a pleasure.
Ross Fadely is NYU's newest employee; I spent an otherwise very low-research day chatting with him (and also Fergus) about all the things we could work on together. We went into particular detail on imaging and models of images, on radio astronomy and the issues with CLEAN, and on optimization and other engineering challenges. Fergus pitched a no-knowledge purely data-driven model for the SDSS data. It is insane but could be useful and definitely fun. The use cases we came up with include
patching missing or corrupted data and identification of astrophysically interesting outliers and anomalies. With Fadely at NYU, the future is bright!
One piece of advice I give to thesis-writers is to write the acknowledgements early. If you don't, you will forget people, and (also) precise wording is valuable (so letting it get re-read many times is good). Taking my own advice, I started writing the acknowledgements for the Sloan Atlas of Galaxies today. I also had a long conversation with Fergus about next steps on all our projects.
On a packing-and-travel-saturated day, research accomplishment was thin, although on the plane I took the first steps towards putting analytic derivatives into our high-resolution Herschel dust emissivity model. The derivatives should enormously speed optimization, if Fergus and Krishnan are to be believed.
Today was my last day at MPIA for 2012. I spoke at length with Karin Sandstrom (MPIA) about fitting combined HI, CO, and dust-emission data, and with Sandstrom and Kapala about Kapala's comparison of ultraviolet stars with Herschel dust. I also had a final conversation with Tsalmantza about all the lensing-related work she has been doing; she is still finding more candidate gravitational lenses and has prepared a long document for publication or follow-up, filled with candidates. Expect them to hit my ideas blog any day now! I am not getting any traction on my Atlas, which is what I have to finish in the next two months! Don't distract me, people!
After a long hiatus Lang and I resumed our quasi-regular Skype calls to discuss The Tractor and publications thereof. We sure are taking a long time to deliver! Lang proposed a brilliant de-scope that permits a paper to be written in short order without much additional engineering beyond what we have already done: Let's write a paper in which we synthesize (make model images for) every SDSS field imaged to date, in all five bands, and do investigations on the residuals. Investigations can include goodness-of-fit-like statistics but also overlap of the residuals with derivatives of the model with various parameters (like object positions, fluxes, and shapes). The idea would be to use the official SDSS catalog and calibration parameters (astrometry, PSF, photometric zeropoints) without analysis, so that the synthesis would be a full-survey vet of the full system. The fundamental assumption is that the catalog is a set of model parameters that describe quantitatively the data. That would be fun, and show that we can do something, at least,
at scale. I also had interesting conversations with Tom Robataille (MPIA) about astropy and the future of data analysis, and Kapala about ultraviolet HST data and dust.
So many papers are so close to being done and yet...! Today Holmes and I gave a final push on responding to referee on our self-cal paper. We just have to let all the code run through one more time, update the final numbers and submit (I very much hope). The flip side of all the work is that the paper is very much improved, thanks to a good referee and many comments from the community.
On a low-research day I prepared and gave a short coffee talk at MPIA about our work on the insane-robot censored-data project (with Richards, Long, Foreman-Mackey, and Bloom). After me Ronald Läsker (MPIA) spoke about galaxy bulge-disk fitting of two-dimensional galaxy images and the insanity of it all; what you get for the bulge mass is a very strong function of how many components you include and which ones you think of as being
bulge. He finds that adding more galaxy components can increase or decrease the inferred bulge mass by factors of a few in typical cases. So unless someone has a full kinematic decomposition, don't believe a bulge mass uncertainty that is more precise than a factor of two! In general, the problem with bulge measuring is in the interpretation of fits that are otherwise extremely valuable for photometry and other purposes; if we just think of the amplitudes of the components as uninterpretable latent parameters, they don't cause any trouble at all, and you still get good colors and magnitudes.
I had a long conversation with Joe Hennawi and Beta Lusso (MPIA) about Lusso's very nice fitting of multi-wavelength photometry of AGN with a combined stars plus disk plus torus plus cold dust model. She has a few to hundreds of templates for each of the four components so she is fitting tens of millions of qualitatively different models. I think the performance of her fitting could be improved by learning priors on the templates much as we did for our hierarchical bayesian star–galaxy classification project. Of course it would be very computationally expensive and it might not help with her core goals, so I wasn't advocating it strongly.
However, I do believe that if you have enormous numbers of templates or archetypes or models for some phenomenon and you have many data points or real examples, you really have to use hierarchical methods to control the complexity of the problem: There is no way that all of your models are equally plausible a priori, and the best way to set their relative plausibilities is to use the data you have.
This also resolves an age-old problem: How do you decide how many templates or archetypes to include? The answer is: Include them all. The hierarchical inference will take care of down-weighting (even zeroing-out, in our experience) the less useful ones and up-weighting the more useful ones.
This whole thing is pretty deep and I can write or talk about it for hours: In real fitting situations with real data, it is always the case that your model is both too flexible and not flexible enough. It is too flexible because your model space permits fitting data that you could never possibly see for lots of reasons fundamental and incidental. It is not flexible enough because in fact the models are always wrong in subtle and not-so-subtle ways. I have a growing idea that we can somehow solve both of these problems at once with a
flood with archetypes, mop up with hierarchical inference approach. More on this over the next few years!
I wrote some code to re-compute the GALEX spacecraft orientation given a first guess. My code just looks for the linear combination of attitude components that makes a chosen compact astronomical source compact on the celestial sphere. It seems to work on fake data; the plan for this week is to incorporate it into our code that plots the full photon time history of any GALEX source. Then it will permit that code to output probabilities (or odds), one for each photon, that the photon was generated by the smooth sky background as opposed to the chosen compact source, under assumptions about source variability and other things.
After great conversations in Valchava, visions of sugar-plums and spike-slab priors have been dancing in my head. So I spent today (and much of the past weekend) writing furiously a couple of documents. The first is a white-paper regarding a justified probabilistic approach to radio interferometry data. This captures things my loyal reader has been reading about for the last few weeks, plus many new ideas coming from Valchava.
The second document is a white-paper regarding the possibility of calibrating the pixel-level flat-field map (that is, many millions of parameters) for an imaging camera using the science data alone (that is, no sky or dome flats). This latter project would work by looking at consistency among neighboring pixels given (and this requires assumptions) an observing program that observes without trying to put particular science objects in particular locations. One of the things I love about this project is that sky flats can be very misleading: The smooth sky illuminates the detector differently than a star does, because there are often unbaffled stray light paths and reflection paths. I have been thinking about these issues because they might be applicable to the Euclid mission, which currently plans to rely on internal LED illuminators for calibration information.
I am not sure why I write these white papers. They are a huge amount of work, and although they are great for organizing and recording my ideas, almost none of them ever results in a real publication.
Today was radio-interferometry day. Rix and I explained our issues with CLEAN, and we discussed with the computer vision types. Fergus described CLEAN as a
conservative greedy algorithm, which is very appropriate. In the end we had some good ideas about how to attack the problem, with agreement from everyone. The only point of concern was between those (Harmeling representing) who thought we might be able to proceed without strong priors and those (Fergus and Schuler representing) who thought that strong priors would be necessary to deal with the spatial frequencies at which there is no u-v coverage. At the end of the day I was assigned the task of writing down what we learned and agreed upon. I started on that.
One interesting thing about CLEAN is that it averages visibilities on a grid in u-v so that it can use the fast fourier transform. This is a great idea, but makes forward modeling harder. We are hoping we can stay fast without this trick. In our vision (codenamed NAELC) we never transform in that direction anyway.
Rix arrived in Valchava today, and (after a morning hike) proceeded to talk about making high-resolution models of multi-resolution (or heterogeneous) multi-spectral data, with the example being our dust maps made from Herschel data. One thing I learned in the discussion is that we maybe should be using L-BFGS-B optimization instead of the hacky Levenberg-Marquardt-like thing we are doing now. I was surprised, since Lev-Mar-like methods know that the problem is least-squares-like, whereas L-BFGS-B won't, but the experts (especially Dilip Krishnan) advised strongly.
After another morning hack session where the only real progress was getting spectral classification information for Muandet, Rob Fergus and I talked about astronomy challenges of relevant to the computer vision types. Fergus talked about our project on high dynamic-range imaging, where the problem is to make a data-driven model of an extremely complicated and time-variable point-spread function when we have very few samples. This wouldn't be hard if we didn't care about finding exoplanets that are much much fainter than the speckley dots in the PSF. The advantage we have with the P1640 data we have been using is that the speckles (roughly) expand away from the optical axis with wavelength, whereas faint companions are fixed in position.
I talked about what I see are three themes in astronomy where astronomers need help: (1) In large Bayesian inferences, we need methods for finding, describing, and marginalizing probability distributions. (2) We have many problems where we need to eventually combine supervised and unsupervised methods, because we have good theories but we can also see that they fail in some circumstances or respects (think: stellar spectral models). (3) The training data we use to learn about some kind of objects (think: quasars) is never the same in signal-to-noise, distance, brightness, redshift, spectral coverage, or angular resolution as the test data on which we want to run our classifications or models next. That is, we can't use purely discriminative models; we need generative models. That's what XDQSO did so well.
On the last point, Schölkopf corrected me: He thinks that what I need are causal models, which are often—but not always—also generative. I am not sure I agree but the point is interesting.
In the morning, Schuler, Harmeling, Hirsch, Schölkopf, Hormuth, and I started the necessary work to convert some of Schuler and Hirsch's blind deconvolution code over into a package that could take an arbitrary astronomical image and return a spatially varying point-spread function. We also, on the side, struggled with downloading and compiling Astrometry.net code. Schölkopf was surprised to hear that we don't have funding for that project!
In the afternoon, Dilip Krishnan (NYU) talked about his work with Fergus to remove rain and other kinds of complex, heterogeneous occlusions from photographs. They are using a discriminative (rather than a generative) approach, which in this case means that they train with data as close as possible to the data that they want to fix. That sets up some ideas for my talk tomorrow. There was also discussion of his and Fergus's very clever priors on image gradients in natural images: There are some 0.8 exponents that amuse me.
Felix Hormuth (MPIA) talked about the lucky imaging camera AstraLux, which he built and has been very successful and was incredibly cheap to build. He showed the data and gave the crowd some ideas about how astronomical seeing arises. He also showed some data sets that do not properly reduce under the standard, vanilla lucky-imaging-style pipelines. He distributed data to the interested parties.
Sam Hasinoff (Google) continued the craziness of yesterday, talking about inferring a scene from the shadows and reflections it creates in illuminated objects. Lambertian objects (objects that don't produce specular reflections) are not very informative when they are convex, but become valuable when their shapes get more complex. He showed some very nice results reconstructing Mars from single-pixel photometry data, which are very related to recent exoplanet studies with Spitzer.
Krik Muandet (Tübingen) showed
support measure machines, which are a generalization of SVMs attempting to deal with the idea that the data might in fact be probability distributions rather than samples. He is trying to improve upon XDQSO target selection. Tomorrow I am going to show him how to test his results using the just-released SDSS-III DR9 Data
Today was the start of a computer-vision-meets-astronomy workshop convened by Michael Hirsch and Bernard Schölkopf in Valchava, Switzerland. The talks are informal, discussion ample, and we plan to spend the mornings hacking on problems of mutual interest. I came down from Heidelberg with Felix Hormuth (MPIA), who brought a few hard drives of lucky-imaging data for us to play with.
Bill Freeman (MIT) kicked things off by talking about his project to make an image of the Earth, using Earthshine on the Moon as his data source. The idea is that the details of the shadows on craters should encode (low-resolution) information about the image of the Earth. Crazy! It is like an inverted or messed-up pinhole camera, and he showed some outrageously high-performance prior work using occulters in an otherwise open spaces to do imaging (like video of people walking around in rooms compared with reference images telling you what scene is outside the window of the room!). He also showed some work exaggerating or amplifying tiny differences in images used to extrapolate slow processes in time; I recommended he look at HST imaging of things like V838 Monocerotis to make data-driven predictions of the pre-explosion and future state.
Christian Schuler (Tübingen) showed incredible results on blind and non-blind deconvolution of images with nasty or complicated point-spread functions. Schölkopf and I got interested in applying his methods to astronomical data—particularly his methods for determining the PSF blind—we plan to hack on this tomorrow morning.
Mykytyn continued to work on our measurements of huge galaxies, after a lot of fiddling around with various kinds of galaxy radii. He also figured out that we can measure M51a no problem, and M51b will probably work too, so we can do some pretty big galaxies with The Tractor.
While he was shepherding fits, I was building a data-driven model of the psf from the Fizeau Interferometer called LMIRcam on LBT (thanks to Andy Skemer of Arizona). Here's an example (below). The left is the data, and the right is my model. It looks just like a textbook example of an optical interferometry PSF (which up to now have only appeared in texbooks, never in the readout data from real instruments)!
In between talks by KG Lee (MPIA), Bovy, and Farrar (NYU), Mykytyn and I continued to work on the Atlas measurements, and Kapala and I looked some more into the relationship between Herschel dust maps and PHAT extinction measurements on luminous stars. The latter is quite a mess and we can't figure out what parts of the mess are problems with the star fitting or real variance in the extinction, and what parts of the mess are problems with the star fitting or real correlations between star and dust properties. We discussed with Groves.
The radio world uses CLEAN—and an infinite variety of alternatives and modifications thereto—for building radio interferometry maps. Today I met with Fabian Walter (MPIA), Frank Bigiel (Heidelberg), Tom Herbst (MPIA), and Rix to discuss probabilistic replacements. While we don't have a plan, we were able to come up with the following reasons that CLEAN needs improvement (these all apply to vanilla CLEAN; there are non-vanilla versions that fix some of these issues but not all, to my knowledge): (1) It is some kind of optimizer, but a heuristic optimizer (that is, there are optimizers out there with better properties) and it is not optimizing a well-specified (let alone justified) scalar objective. (2) It requires heuristic decision-making about stopping. (3) The noise in the final map it makes can only be estimated by taking the rms in
empty regions; it has no error or noise model and doesn't propagate uncertainties from the fundamental visibilities (let alone calibration uncertainties and so on). (4) The final map it makes has some flux convolved with the clean beam, and some fainter flux convolved with the dirty beam; this means that there is no well-defined point-spread function and the absolute calibration of the map is ill-defined. (5) It provides no mechanism for producing or quantitatively comparing alternative maps; that is, it produces only a point estimate, no judgement of that estimate or sampling around that estimate. (6) It requires binning of the visibility data in the u-v plane. (7) There is no mechanism by which the map it produces is standardly or by convention projected back into the space of visibilities and compared to the raw data to vet or confirm any fundamental or even heuristic noise model.
In addition to all this I could add things about the scene model and priors on the scene model and so on, but these aren't really the problem of CLEAN itself. In other news, Mykytyn continued working on getting large numbers of RC3 galaxies measured by The Tractor and even started writing up some of his work.