Foreman-Mackey and I finished the first releasable version of Daft, simple software for rendering graphical models for publication. And remember: If your code doesn't make you cringe, you haven't released it early enough.
It will be with horror that my reader learns that Lang and I spent part of the morning in a pair-coding session binning down SDSS data to larger (less informative) pixels. We had to bin down everything: The data, the error model, the point-spread function, the photometric calibration, the astrometric calibration, etc. Why did we do it, you may ask? Because for the Sloan Atlas project, we are encountering galaxies that are so large on the sky (think
M101) that we can't—without major code changes asap—fit the data and model and derivatives into (our huge amount of) memory, even in our (fairly clever) sparse implementation of the problem. The amazing thing is that by the end of the day we (meaning Lang) got something that works: We can run The Tractor on the original data or the rebinned data and it seems to give very similar results. Testing tomorrow!
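For the curious, the basic binning-down operation is simple; here is a minimal numpy sketch (my own illustration, not Lang's Tractor code) of flux-preserving rebinning by an integer factor:

```python
import numpy as np

def bin_down(image, f):
    """Bin an image down by an integer factor f, summing pixel values.

    Summing (rather than averaging) preserves total flux; an
    inverse-variance map would also be summed, while a PSF model
    should be re-normalized after binning.
    """
    ny, nx = image.shape
    assert ny % f == 0 and nx % f == 0, "trim the image to a multiple of f first"
    return image.reshape(ny // f, f, nx // f, f).sum(axis=(1, 3))

img = np.arange(16.0).reshape(4, 4)
small = bin_down(img, 2)  # 2x2 image; total flux conserved by construction
```

The reshape-and-sum trick avoids any Python loop, which matters when the images are large.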
In the afternoon, Andrew MacFadyen (NYU) gave the Physics Colloquium, about ultrarelativistic plasma problems, motivated by gamma-ray bursts. The most interesting things to me in this business are about universality: In the non-relativistic limit there is the Sedov–Taylor scale-free expanding explosion model. In the ultra-relativistic limit there is the Blandford–McKee jet model. However, in the latter case, the different parts of a collimated jet can't actually communicate with one another laterally (for relativistic causality reasons), so there is no possibility of homogeneity. In other words, the jet must be a heterogeneous mixture of jets, in some sense. The mixture fuses together into one jet continuously over time. That seems like a very interesting physics problem. MacFadyen and his group have been doing fundamental work, with awesome visuals to boot.
Today in our weekly meeting, Hou gave Goodman, Foreman-Mackey, and me a tutorial on nested sampling, explaining how his extension of diffusive nested sampling—using the affine-invariant MCMC sampler—can sample the posterior PDF and compute the marginalized likelihood (the Bayes factor or evidence). I still don't completely understand; the method is certainly not trivial. We are hoping the affine-invariant sampler will provide performance advantages over other implementations, and we have lots of evidence (generated by and with Brewer) that diffusive nested sampling is awesome for multi-modal posterior PDFs.
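For reference, the affine-invariant ingredient is the Goodman & Weare (2010) stretch move, which is simple to write down. Here is a tutorial numpy sketch (my own, not Hou's nested-sampling code) of one update sweep over an ensemble of walkers:

```python
import numpy as np

def stretch_move_step(walkers, ln_prob, a=2.0, rng=None):
    """One sweep of the Goodman & Weare stretch move over all walkers.

    walkers: (K, d) array of walker positions; ln_prob: callable
    returning the log of the target density. This is the move
    underlying emcee; a tutorial sketch, not an optimized code.
    """
    rng = np.random.default_rng() if rng is None else rng
    K, d = walkers.shape
    for k in range(K):
        # pick a complementary walker j != k uniformly at random
        j = (k + rng.integers(1, K)) % K
        # draw stretch factor z from g(z) ~ 1/sqrt(z) on [1/a, a]
        z = (1.0 + (a - 1.0) * rng.random()) ** 2 / a
        proposal = walkers[j] + z * (walkers[k] - walkers[j])
        # affine-invariant acceptance probability
        ln_accept = (d - 1) * np.log(z) + ln_prob(proposal) - ln_prob(walkers[k])
        if np.log(rng.random()) < ln_accept:
            walkers[k] = proposal
    return walkers
```

The key property is that the proposal only involves differences of walker positions, so the sampler's performance is unchanged by any affine transformation of the parameter space.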
Way back in the day, my friend and colleague Sam Roweis worked on making principal components analysis (a method I love to bash but occasionally use) into a probabilistic model. He said (very sensibly in this note):
Finally, the PCA model itself suffers from a critical flaw which is independent of the technique used to compute its parameters: it does not define a proper probability model in the space of inputs. This is because the density is not normalized within the principal subspace. In other words, if we perform PCA on some data and then ask how well new data are fit by the model, the only criterion used is the squared distance of the new data from their projections into the principal subspace. A datapoint far away from the training data but nonetheless near the principal subspace will be assigned a high pseudo-likelihood or low error. Similarly, it is not possible to generate fantasy data from a PCA model.

He proposed fixes for these problems, and they generally look like (a) putting some non-trivial Gaussian or similar distribution down in the low-dimension PCA subspace, and (b) putting some more trivial Gaussian (perhaps an isotropic one) down in the high-dimension orthogonal (complementary) subspace. This converts PCA into a constrained maximum-likelihood or MAP fit for a (simple) probabilistic model.
Today, Fergus proposed that we do something strongly related in Fadely's project to fit a probabilistic data-driven model to a huge collection of SDSS imaging patches: We will use a mixture of Gaussians to model the distribution, but reduce the dimensionality of the fit (not the dimensionality of the mixture but the dimensionality of the parameter space) by making each Gaussian be composed of a low-dimensional non-trivial structure times a higher dimensional trivial structure. These kinds of models can capture much of what a completely free mixture of Gaussians can capture but with many fewer parameters and much faster optimization and execution. We also figured out symmetry considerations that massively reduce the diversity of the training data. So life looks very good in this sector.
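To make that concrete, here is a numpy sketch (illustrative names, not Fadely's code) of one such Gaussian component, whose covariance is a low-dimensional non-trivial part plus a trivial isotropic part; the parameter count shows the savings:

```python
import numpy as np

def lowrank_gaussian_cov(F, s2):
    """Covariance of one mixture component: a non-trivial low-rank
    part plus a trivial isotropic part, C = F F^T + s2 * I.

    F is (D, m) with m << D, so the component costs D*m + 1
    parameters instead of D*(D+1)/2 for a free covariance.
    """
    D = F.shape[0]
    return F @ F.T + s2 * np.eye(D)

D, m = 64, 4  # e.g. 8x8 image patches, 4 non-trivial dimensions
rng = np.random.default_rng(0)
F = rng.normal(size=(D, m))
C = lowrank_gaussian_cov(F, 0.1)
n_lowrank = D * m + 1       # 257 parameters per component
n_full = D * (D + 1) // 2   # 2080 parameters per component
```

This is the mixture-of-factor-analyzers-style structure: each component is free to be elongated in a few directions, and is simply round in all the others.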
Neal Weiner (NYU) gave the brown-bag about recent exciting developments in Fermi—a line in the Galactic Center region at 130 GeV—and also explained the relationship between this observation and the
WIMP miracle, the observation that a weak-scale particle naturally explains the abundance of dark matter in a thermal freeze-out model. It was a great seminar in every way (pedagogical, current, and highly editorial), and it got me more excited than I already was about this possibility that the dark matter is illuminated.
After brown-bag, David Levitan (Caltech) gave an informal talk about finding variable stars in the Palomar Transient Factory data. He focused on AM CVn stars, which are crazy high mass-ratio double-degenerate binaries. He has done great work on calibrating the PTF data and using them for discovery (mining?).
Norm Murray (CITA) gave the astrophysics seminar. He showed pretty convincingly that massive star-formation regions are effectively Eddington-limited, meaning that the accretion of material onto stars and the gravitational collapse is halted or controlled by ultraviolet radiation pressure on dust in the interstellar medium. This has lots of implications for present-day and early-universe star formation. He showed a very nice mix of theory and observations, including some nice stuff about giant molecular clouds in the Milky Way disk.
The rest of the day was largely spent with Foreman-Mackey, building daft, a package for rendering probabilistic graphical models or directed graphs. It is designed to make figures in which the symbols are exactly the same size and font as those in the paper. Here's an example:
Dilip Krishnan (NYU) came over to the CCPP to school us cosmologists in optimization today. The conversation (in the lounge) was wide-ranging, but the most interesting parts of it (from my perspective) were about stochastic-gradient methods in sparse problems.
A stochastic-gradient method (or online method) is one that permits you to look sequentially at parts of your data (not all the data at once) but nonetheless optimize global parameters that affect all of the data. The idea is: You have so much data, you can't fit it all on disk, so you just sample a bit of it, update your parameters according to what it says, throw away the bit of data, and grab the next bit. Iterate to convergence. Foreman-Mackey and I have been investigating this in the context of The Thresher.
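In pseudocode terms, the idea is just a few lines; here is a toy numpy sketch (my own, not The Thresher) of stochastic-gradient fitting of global linear parameters from a stream of data chunks:

```python
import numpy as np

def sgd_linear_fit(data_stream, n_params, rate=0.01):
    """Stochastic-gradient fit of global linear parameters theta,
    looking at one (X, y) chunk at a time and then discarding it.

    data_stream yields (X, y) chunks; we never hold all the data
    in memory at once. A sketch of the idea, not production code.
    """
    theta = np.zeros(n_params)
    for X, y in data_stream:
        residual = X @ theta - y
        grad = X.T @ residual / len(y)  # gradient of the chunk's mean squared error
        theta -= rate * grad            # update, then throw the chunk away
    return theta
```

With a constant learning rate the parameters jitter around the optimum at a noise level set by the rate; schedules that shrink the rate over time give true convergence.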
A sparse problem (in my parlance) is one in which most parameters only affect a small fraction of the data, and most data are only affected by a small fraction of the parameters. It is not obvious to me that the stochastic-gradient methods will work (or work well) on sparse problems, at least not without modification.
I promised Krishnan that I would deliver to him an example of a realistic, convex, sparse problem in astronomy. Convexity ensures that we have no disagreements about what constitutes success in optimization. It will provide a sandbox for investigating some of these issues.
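For concreteness, here is the kind of toy I have in mind (a stand-in of my own construction, not the real astronomy problem): a linear least-squares fit in which each measurement touches only a few parameters, so the design matrix is sparse and the objective is convex:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import lsqr

# Each of N measurements depends on only k of the P parameters,
# so the design matrix is sparse; least squares is convex, so the
# optimum is unambiguous.
rng = np.random.default_rng(2)
N, P, k = 2000, 200, 3
rows = np.repeat(np.arange(N), k)
cols = rng.integers(0, P, size=N * k)
vals = rng.normal(size=N * k)
A = sparse.csr_matrix((vals, (rows, cols)), shape=(N, P))
x_true = rng.normal(size=P)
y = A @ x_true + 0.01 * rng.normal(size=N)
x_hat = lsqr(A, y)[0]  # sparse iterative solver; never densifies A
```

The question is whether a stochastic-gradient method, fed rows of A a few at a time, can compete with a sparse direct or Krylov solver on problems like this.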
In a low-research day, Greg Dobler (UCSB) gave an informal and impromptu journal-club talk about his work with Planck data on understanding the haze or bubbles at the center of the Galaxy. He confidently confirms the WMAP results, and the bubbles seem to have a hard spectral index. After the talk we momentarily discussed next-generation analyses of the three-dimensional Galaxy, which would include stars, dust, gas, magnetic fields, and cosmic rays. In the afternoon, Gertler and I discussed the GALEX photons and the difficulty of optimizing non-trivial models. We also discussed the likelihood for Poisson processes with variable rates, which is a nice problem.
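For the record, the log-likelihood for an inhomogeneous Poisson process is the sum of the log rate at the observed events minus the integral of the rate over the observing window. A minimal numpy sketch (illustrative names; the integral done with the trapezoid rule):

```python
import numpy as np

def ln_like_poisson_process(events, rate, t_grid):
    """Log-likelihood for an inhomogeneous Poisson process with rate
    function rate(t), given event times observed over the window
    spanned by t_grid:

        ln L = sum_i ln rate(t_i) - integral of rate(t) dt

    The integral is approximated by the trapezoid rule on t_grid.
    """
    r = rate(t_grid)
    integral = np.sum(0.5 * (r[1:] + r[:-1]) * np.diff(t_grid))
    return np.sum(np.log(rate(events))) - integral
```

A sanity check: for a constant rate lambda over a window of length T with n events, this reduces to the familiar n ln(lambda) − lambda T (up to the irrelevant factorial term).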
In separate conversations, I spoke with Yike Tang and MJ Vakili (both NYU graduate students) about doing hierarchical inference to infer the weak-lensing cosmological shear map at higher resolution than is possible by any shape-averaging method. Tang is working on building priors over shapes, and Vakili is looking at whether we could build more general priors over galaxy images. That, as my loyal reader knows, is one of my long-term goals; it could have many applications beyond lensing. At lunch, Glennys Farrar (NYU) gave a provocative talk about the possible effect of pion–matter effects (like an index of refraction) to resolve issues in understanding ultra-high energy cosmic ray showers in the atmosphere, issues that have been pointing to possible new physics.
My PhD student Tao Jiang graduated today. His thesis involved the cross-correlations of various galaxy populations as a function of color and luminosity. He was able to show that the red-galaxy population has been growing since redshift unity mainly from merger activity, and that the growth is dominated by red-red mergers (that is,
dry mergers). He also showed that the rate of blue-galaxy mergers is consistent with the rate at which blue galaxies transform into red galaxies (pass into the red population), which is pretty striking. Congratulations, Dr Jiang!
On the flight home, I wrote this paragraph (subject to further revision) for my self-calibration paper with Holmes and Rix. For context,
strategy A is the traditional uniform, aligned, square-grid survey strategy with small marginal overlaps;
strategy D is a quasi-random strategy where no two fields are precisely aligned in any respect.
Ultimately, survey design requires trade-offs between calibration requirements and other requirements related to cadence, telescope overheads, and observing constraints (like alt, az, airmass, and field-rotation constraints). One requirement that is often over-emphasized, however, is conceptual or apparent ``uniformity''; these different survey strategies have different uniformities of coverage, each with possibly non-obvious consequences. Many astronomers will see survey strategy A as being ``more uniform'' in its coverage (especially relative to survey D). This is not true if uniformity is defined by the variance or skewness or any other simple statistic of the exposure times; strategy D is extremely uniform (more uniform than Poisson, for example). In any survey, past or future, variations in exposure time are valuable for checking systematic and random errors, and don't---in themselves---make it difficult to obtain a survey uniform in properties like brightness (since samples can be cut on any well-measured property). In general, in the presence of real astronomical variations in distributions of luminosity, distance, metallicity, and (more importantly) extinction by dust, there is no way to make a survey uniformly sensitive to the objects of interest. As a community we should be prepared to adjust our analysis for the non-uniformity of the survey rather than adjust (cut) our data to match the uniformity of unrealistically simplified analyses. This is already standard practice in precise cosmological measurement experiments, and it will be required practice for the next generation of massively-multiple-epoch imaging surveys.
The meeting got all semantics-y and I think I have been clear in the past about my feelings about semantics, so I won't expand on that here except to say that on the one side, if semantics are to be applied to data or papers or web pages by their authors, the tags are being applied with strong biases and incentives (all meta-data are wrong). On the other side, if semantics are to be generated by machines, then the tags are all probabilistic and we should not call them semantics but rather hyper-parameters and provide probability distribution functions over them, prior and posterior.
Highlights of the day included Alex Gray's talk on fast methods, which was followed by a short but useful discussion of discriminative and generative methods and the relationship to noise models and utility. Also Ashish Mahabal gave a nice overview of his classifiers for CRTS, which include exactly the elements needed to do utility-based decision-making and experimental design. Between talks, Bob Hanisch and I discussed getting Astrometry.net data into the Virtual Observatory. I think that would be good for both projects. (And I guess, in some sense, it means that I will be publishing semantics. Oh well!)
I arrived at AstroInformatics 2012 today (day two) in the car of Zeljko Ivezic (UW). There were many great talks. Highlights for me included Jojic talking about de-noising images and video using image textures built from image patches. He had very impressive results with an exceedingly simple model (that patches of the image are all distorted views of some
fiducial patch). Longo claimed that he can beat XDQSOz with a new data-driven model. I am looking forward to reading the details. Norris described the SKA Pathfinder project EMU, which is going to be awesome: 70 million galaxies, many of them just star-forming, detected in the radio, over 75 percent of the sky. All data will be released within a day of being taken. The analysis is going to be done with attention paid to faint-source detection. The raw uv data will be thrown away after analysis, which scares the crap out of me, but hey, that's the new world of huge projects!
In the breaks and on the drive, I discussed with Ivezic and Jacob VanderPlas (UW) their nascent book on astronomical data analysis. It is a comprehensive work, with many worked examples, and it will ship with the code that makes all the figures, released open-source. I gave some comments after skimming a few sections, and I was so stoked that I volunteered to read a bit of it critically pre-publication. Nice work, people!
After a low-research morning (punctuated by an excellent talk on the post-Higgs LHC by Andy Haas of NYU), I spent my flight to Seattle madly working on my talk for AstroInformatics 2012. I am talking about why the map–reduce (or Hadoop or whatever) frameworks for data analysis will not be sufficient for the future of astrophysics and why we have to develop new things ourselves. The map–reduce framework is great because it is a general framework for solving problems in log N time. But, to my knowledge, it can't do anything like real probabilistic hierarchical models without egregious approximations. I don't have much specific to propose as an alternative except making brute force a lot less brute.
Lam Hui (Columbia) gave a fun talk about various topics in our astro seminar. One point was about detection of stochastic gravitational radiation using the binary pulsar as a
resonant detector. This is a great idea; the binary pulsar is sensitive at much longer periods than other detectors, and our metrology of the system is excellent. Mike Kesden (NYU) and I were confused about whether the effect he seeks really is stochastic, given the sources and the narrow-band detector. We resolved to work it out later.
Late in the day, Patel and Mykytyn updated me on the Sloan Atlas measurements. The measurements they are making look great; for many of the galaxies we are measuring these are the first reliable measurements ever from the SDSS. We discussed a bit the need and plans for checking that our photometry is correct. This is a hard thing to do; what's the absolute truth for these previously unmeasured objects?
Fadely, Foreman-Mackey, and I worked on some point-spread-function fitting in preparation for making adjustments to how this is done in The Thresher and perhaps also to criticize how it is often done in standard pipelines. There are many issues: What basis to use for the PSF, what weighting to use in the fits, and what regularizations or priors to apply. The point we are coming to is that if you apply the non-negative regularization, you have to also put on other regularizations or else you can get highly biased results.
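A toy numpy illustration (my own, not our actual PSF code) of the bias: estimate a PSF pixel whose true value is zero from noisy measurements. The unconstrained estimate is unbiased; clipping at zero under the non-negativity constraint, with no other regularization, biases the estimate high:

```python
import numpy as np

# A PSF pixel whose true value is zero, measured many times with
# unit Gaussian noise.
rng = np.random.default_rng(3)
noisy = 0.0 + rng.normal(size=10000)

# Unconstrained maximum-likelihood estimate: the mean, unbiased.
unconstrained = noisy.mean()

# Per-measurement non-negative estimate, max(y, 0), then averaged:
# biased high, toward E[max(N(0,1), 0)] = 1/sqrt(2*pi) ~ 0.40.
nonnegative = np.maximum(noisy, 0.0).mean()
```

In a real PSF fit the clipping happens inside the optimizer rather than per datum, but the moral is the same: non-negativity alone pushes flux into the noisy wings, so it needs to be paired with a smoothness or compactness regularization.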
At lunch, Fergus, Fadely, and I met with Brenner (AMNH) and Nilsson (AMNH) to discuss the P1640 data and analysis next steps. After lunch we all got into an argument about finding the location of the center of the
optical flow—the point from which the speckles expand with wavelength. I argued that this is not defined if you let the point itself be a function of wavelength; then you are in the situation of the expanding Universe, where everyone thinks he or she is at the center. Same for speckles in the focal plane: Every speckle can have a comoving, colocated
observer who thinks he or she is at the center of the focal plane! Of course there really is some center, because the telescope and camera have an axis, so the point is purely amusing.
In the afternoon, Keith Chan (NYU) defended his PhD (supervised by Scoccimarro) on understanding dark-matter simulations in terms of halo clustering and halo statistics. He has created a valuable piece of work and the defense was a pleasure.
Ross Fadely is NYU's newest employee; I spent an otherwise very low-research day chatting with him (and also Fergus) about all the things we could work on together. We went into particular detail on imaging and models of images, on radio astronomy and the issues with CLEAN, and on optimization and other engineering challenges. Fergus pitched a no-knowledge purely data-driven model for the SDSS data. It is insane but could be useful and definitely fun. The use cases we came up with include
patching missing or corrupted data and identification of astrophysically interesting outliers and anomalies. With Fadely at NYU, the future is bright!