Yes, one- and two-sigma GALEX measurements do distinguish usefully between low- and high-redshift quasars.
The BOSS project on SDSS3 is taking spectra of hundreds of thousands of quasars. This requires selection based on existing imaging. Some kinds of quasars in the redshift range of interest (around 2.5) are confused in the imaging we have with blue stars and redshift-one quasars. None of these types of sources is detected in any GALEX catalog, but the redshift-one quasars do have, on average, a bit of ultraviolet flux. Today Schiminovich, Hennawi, and I found that there is hope that even two-sigma GALEX detections might serve as a strong veto of redshift-one contaminants.
The (obvious, in retrospect) punchline is: If you care about improving samples contaminated at the tens-of-percent level, you can use one- or two-sigma measurements; they aren't reliable, but they are as reliable as the sample you started with.
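The punchline can be checked with a toy simulation. Everything here is invented for illustration (the 70/30 contamination split, the assumption that contaminants have a mean UV flux of two sigma and targets have none); the point is only that a two-sigma veto raises purity even though no individual measurement is reliable:

```python
import random

random.seed(42)
SIGMA = 1.0  # per-object GALEX flux uncertainty

# Hypothetical sample: 70% true z~2.5 targets (no UV flux),
# 30% z~1 contaminants with mean UV flux of 2 sigma.
sample = [("target", 0.0)] * 7000 + [("contaminant", 2.0 * SIGMA)] * 3000

kept = []
for label, true_flux in sample:
    measured = random.gauss(true_flux, SIGMA)
    # Veto anything "detected" at more than 2 sigma in GALEX.
    if measured < 2.0 * SIGMA:
        kept.append(label)

purity_before = 7000 / 10000
purity_after = kept.count("target") / len(kept)
print(purity_before, round(purity_after, 3))
```

The veto throws away half the contaminants at the cost of a few percent of the targets, so a sample contaminated at the tens-of-percent level gets cleaner, exactly as the punchline says.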
Price-Whelan and I nearly finished our response to the second referee report. Again, the referee made good suggestions, so no complaints. We hope to resubmit on Friday. No other real research, except a few minutes at the board with Jagannath at the holiday party.
Van Velzen gave a very good Brown-Bag talk today on the stellar tidal disruption event he and Farrar found in the SDSS Stripe 82 data. It is a beautiful candidate and suggests a high rate for such events. There was some suggestion that we might be able to enormously increase efficiency using GALEX. (GALEX has been used by Gezari and others to find a few already.)
Joe Hennawi (MPIA) showed up for a week today and we spent much of the day figuring out what we want to achieve in the areas of quasar target selection. Schiminovich came down and we decided to get some measurements done with GALEX; we are all optimistic that few-sigma measurements in GALEX pixels can break degeneracies in the visible selection of redshift-three quasar candidates from a sea of stars. We'll find out this week.
ps: WISE launched today. Good luck, people!
Jennifer Siegal-Gaskins (OSU) spoke about detection of the dark-matter annihilation signal in the Fermi data stream, by decomposing the diffuse emission into separate components, each of which has a particular energy spectrum and angular power on the sky. After the talk I suggested a generalization of her methods and started writing a document about them. She convinced me that Fermi is extremely promising for the detection of dark matter, much more promising than I had previously thought.
Roweis and I realized today that we should enforce much more than smoothness on the matrix that relates flux (in the observed object) to pixel values (in the CCD); we know that the matrix can be thought of as a narrow spectral trace convolved with an aperture-convolved, pixel-convolved point-spread function. So we should just model these two things (trace and PSF), and enforce smoothness there. That represents much stronger prior information for our inference, but it is sensible, in the sense that if we inferred a matrix that does not look like a trace convolved with a (slowly varying) PSF, we wouldn't believe it. We also worked out a bit about how to proceed.
I spent a great lunch and early afternoon with Maurizio Porfiri (NYU Poly), who is a mechanical engineer and applied mathematician working on dynamical networks of independent agents like fish. He has some sweet experiments in which his lab watches fish schooling, works out the schooling rules, and then inserts a robotic fish and uses the robotic fish to
take control of the school and drive it around! I discussed with him the possible relevance of his work to our ideas about building a collaboration between NYU, NYU Poly, NYU Abu Dhabi, and NYU Global that will create and run a global network of autonomous telescopes.
In the late afternoon, a group of high school students came from NEST high school in Manhattan to NYU to present original research they have done with the Spitzer Space Telescope on the AGN NGC 4051. They got time from the director's discretionary time and they have done a huge amount of work with the data and come to some interesting conclusions about the structure of the dust torus. Nice work!
Bovy and I discussed how we are going to measure proper motions from SDSS-III imaging (and SDSS-I and SDSS-II imaging) in comparison with USNO-B imaging. We want to model the pixels, but pragmatic considerations may force us to execute only a rough approximation to that lofty goal.
Great talks today from Daniel Grin (Caltech) on extremely precise cosmic recombination (it should be called
combination) calculations (he had to go into the hundreds of levels, with all the angular momentum states separately tracked), Paolo Spagnolo (INFN) on the possibilities for new physics in the LHC's first year of operation (when it is operating at partial power), and Keith C. Chan (NYU) on primordial non-Gaussianity measured from galaxy surveys like BOSS.
In the evening I balanced out the grueling talk schedule with a little relaxing work on the theory behind Roweis and my spectroscopic dreams.
On the weekend I spent time reading Bovy's next huge tome about the moving groups, in which he demonstrates that they are far more likely to be dynamically created than to be fossils of past star-formation events, and that within the dynamical options they are more likely to be generated by transient dynamical complexity (from, for example, spiral arms and similar inhomogeneities) than from long-lived dynamical resonances. This supports suggestions by Tremaine. We hope to finish the paper within days.
I also finally worked through and sent my comments on spectro perfectionism; in short I complained about their treatment of exposure time, darks, and biases; their restriction to invertible matrices; the fact that the final output they advocate has no Bayesian generalization and is not a point estimate of any simple statistical property of the data or spectrum; their timidity about huge matrices and huge least-squares problems; and the issues of consistently modeling cosmic rays or data with non-Gaussian noise. I never have this many complaints about a paper unless I love it (with one notable exception); don't get me wrong: I love the paper.
David Law (UCLA) gave our astro seminar today on his halo triaxiality result. The odd thing is that although they permit triaxiality, they find oblateness, with the principal axis in a weird direction. But then maybe it isn't so weird, because it looks like the weird direction might just be the orbital plane of the LMC. So is it just a result of the dynamical influence of an orbiting LMC? If so, that might end up
weighing the LMC!
I spent part of the morning working out the linear-in-eccentricity limit of the low-eccentricity Kepler radial-velocity problem with Price-Whelan and Jagannath. We had a slow start but eventually got expressions for the Fourier expansion to m=2 for small-eccentricity systems. After that, Price-Whelan plotted them up and showed them to be as accurate as we expected (that is, residuals were order e^2).
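As a sanity check on that kind of expansion (this is a sketch, not our actual code, with illustrative values of K, e, and omega), one can compare the exact Kepler radial-velocity curve to the first-order-in-e Fourier form v = K [cos(M + omega) + e cos(2M + omega)], in which the m=1 and m=2 harmonics carry all the first-order information:

```python
import math

def kepler_E(M, e, tol=1e-12):
    # Solve Kepler's equation E - e*sin(E) = M by Newton iteration.
    E = M
    for _ in range(50):
        dE = (E - e * math.sin(E) - M) / (1.0 - e * math.cos(E))
        E -= dE
        if abs(dE) < tol:
            break
    return E

K, e, omega = 1.0, 0.1, 0.7  # illustrative amplitude, eccentricity, argument

max_resid = 0.0
for i in range(1000):
    M = 2.0 * math.pi * i / 1000
    E = kepler_E(M, e)
    # True anomaly from eccentric anomaly.
    nu = 2.0 * math.atan2(math.sqrt(1 + e) * math.sin(E / 2),
                          math.sqrt(1 - e) * math.cos(E / 2))
    v_exact = K * (math.cos(nu + omega) + e * math.cos(omega))
    # First-order-in-e Fourier expansion, harmonics m = 1 and m = 2:
    v_approx = K * (math.cos(M + omega) + e * math.cos(2 * M + omega))
    max_resid = max(max_resid, abs(v_exact - v_approx))

print(max_resid)  # should be of order e^2
```

The worst residual over a full orbit comes out at order e^2 times K, consistent with what Price-Whelan saw in his plots.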
In the afternoon, Roweis and I got fired up about spectroscopy. Here are the first few words of a document I started writing up about our project:
In a modern spectrograph, light from fibers or slits is dispersed onto a two-dimensional CCD or CCD-like detector, with (usually) one direction on the detector corresponding to an angular displacement on the sky and another (usually) close-to-orthogonal direction corresponding to wavelength. There are also slitless spectrographs, where one direction is a mixture of angular position and wavelength. The problem of spectroscopic data reduction is the problem of extracting the one-dimensional spectra—astronomical source flux densities as a function of wavelength—from the two-dimensional images. There is a literature on "optimal extraction" of these one-dimensional spectra; the best extraction methods treat the two-dimensional image pixels as data to be fit by a model that consists of a one-dimensional spectrum laid out geometrically on the device and convolved with a two-dimensional point-spread function (which might be a function of the atmospheric seeing or the device properties or both).
Although the optimal extraction literature solves some important problems, the hardest part of spectroscopic data reduction lies not in the extraction step but in the step of measuring or learning the geometric and point-spread functions themselves. These functions have various names in the literature but they can be expressed with a single rectangular matrix A...
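To make the matrix A concrete, here is a toy sketch (the detector size, trace shape, PSF width, and spectrum are all invented for illustration): A is built as a narrow trace with a Gaussian cross-dispersion profile, and the one-dimensional spectrum is recovered from the two-dimensional pixels by least squares:

```python
import numpy as np

rng = np.random.default_rng(17)

# Toy detector: 40 wavelength columns by 12 cross-dispersion rows.
W, H = 40, 12
npix = W * H

# Hypothetical trace: slowly varying centroid row, Gaussian PSF.
trace_row = 5.0 + 1.5 * np.sin(np.linspace(0, np.pi, W))
psf_sigma = 1.2

# Build the rectangular matrix A: response of every pixel to every wavelength.
A = np.zeros((npix, W))
rows = np.arange(H)
for w in range(W):
    profile = np.exp(-0.5 * ((rows - trace_row[w]) / psf_sigma) ** 2)
    profile /= profile.sum()
    frame = np.zeros((H, W))
    frame[:, w] = profile
    A[:, w] = frame.ravel()

# Fake spectrum (continuum plus an emission line), then a noisy image.
true_spectrum = 1.0 + 4.0 * np.exp(-0.5 * ((np.arange(W) - 22) / 1.5) ** 2)
image = A @ true_spectrum + rng.normal(0.0, 0.05, npix)

# Extraction as least squares against the pixel model.
extracted, *_ = np.linalg.lstsq(A, image, rcond=None)
print(np.abs(extracted - true_spectrum).max())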
I was out sick today, but in my bit of work time, Lang and I worked on fitting orbits for our Comet Holmes project. In general, fitting orbits blindly (that is, without a good first guess) is extremely difficult. We are working without having read the relevant literature; I think we may have to read something. Our MCMC works well, but our first guess is terrible.
In other news, Bovy and Murray and I finished the submittable version of our Solar System paper. Bovy submitted it to The Astrophysical Journal today.
Roweis, Blanton, and I had a lively talk about the recent Bolton & Schlegel paper on spectral data reduction. Roweis suggested a Bayesian generalization of it, the problem being, as Blanton points out, that we have to return to users what they want or expect, which might not be a posterior probability distribution over spectra! Then we moved to discussion of what Bolton & Schlegel call the matrix A. They treat this as known, but of course figuring out this matrix (which is the combined sensitivity of every pixel to every wavelength in every fiber) is really the hard part of spectroscopy.
Roweis, Fergus, and I met today to discuss opportunities for interdisciplinary research. We do this often; we all work on super-related fields, with Roweis in machine learning and computational statistics, Fergus in computer vision, and me in astronomy. Currently our only three-way project is one in which we are building a very general model of the SDSS imaging data stream, but it is very exploratory at this point.
Among other things, Bovy and I discussed the first steps towards measuring proper motions in the USNO-B and SDSS raw data streams. This is a big project, but we have decided to get
closer to the data.
I spent yesterday and today at the Spitzer Oversight Committee, which is definitely not research. I learned many things about project management and spacecraft engineering, but the most interesting scientific result I heard about was that Saturn has a ginormous outer ring.
I spent my research time today reading Bretthorst's book on Bayesian spectrum analysis [one big PDF file]. It is a beautiful and useful document; I think there will be many ideas in this book useful to the exoplanet problem. Roweis pointed me to this book; Yavin made me read it.
One small comment on this excellent book, which I am compelled by God and Man to make: Bretthorst, like Jaynes, is a believer in assuming that errors are Gaussian because that is the most conservative thing you can do if you have a noise variance estimate and nothing else. This is technically correct, and beautiful to see demonstrated. However, this is a very dangerous argument, because it only applies when you somehow, magically, know your noise variance. You never do; at best you know the curvature at the mode of the noise distribution (if you are lucky). The variance in most real systems is dominated by rare outliers; no finite experiment is likely to provide you a good estimate of it. Furthermore, even if you do know the variance, how would you know that you know? You would have to take fourth moments to confirm it, and I have never seen an experiment in the history of science in which fourth moments are accurately measured. Finally, the conservativeness-of-Gaussian argument is a maximum-entropy argument subject to a strict variance constraint. Jaynes and Bretthorst should know better: You never have absolutely perfect knowledge of anything; the noise should be found through an inferential process, not a constrained exact math problem!
Whew! I had to get that off my chest.
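The outlier point can be made numerically. In this sketch (contamination rate and outlier amplitude invented), the true variance is dominated by rare outliers, so repeated finite experiments return wildly different variance estimates:

```python
import random

random.seed(7)

def draw():
    # 99.9% core Gaussian (sigma = 1), 0.1% outliers (sigma = 100).
    return random.gauss(0, 100.0) if random.random() < 0.001 else random.gauss(0, 1.0)

true_var = 0.999 * 1.0 + 0.001 * 100.0 ** 2  # about 11, dominated by outliers

# Many "experiments", each estimating the variance from 1000 samples.
estimates = []
for _ in range(200):
    xs = [draw() for _ in range(1000)]
    m = sum(xs) / len(xs)
    estimates.append(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

print(min(estimates), max(estimates))
```

Experiments that happen to catch no outlier report a variance near 1; experiments that catch one report something an order of magnitude larger. No finite experiment gives a stable estimate of the true variance, which is the point of the rant above.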
Brice Ménard (CITA) came by for the day, and Schiminovich came down in the afternoon and we all discussed weak but measurable correlations in quasar spectra and imaging. For example, Ménard has measured relationships between quasar images and absorbers (to show that there is extended star formation associated with absorbers), between emission lines and absorption lines in absorbers (same conclusion), and between quasar colors and angular separation to nearby galaxies (to show that there is dust correlated with galaxies). These projects are all describable as
stacking projects, because they involve measuring very weak signals that are only detectable in large ensembles of objects that can be, in some sense, aligned. However, they are also all describable as
correlation function projects, because they are measurements of excess in one signal that is keyed by the presence of a somehow neighboring signal. Either way, they are projects that are only possible with large uniform data sets; Ménard is the world's expert in finding signals like these. Schiminovich and I committed to giving Ménard some GALEX data for extension of these projects into the ultraviolet.
Lang and I found a significant bug in the code we were running this summer to brute-force identify stream-like substructures in the stellar distribution in the Milky Way. We fixed it, updated our results, and started to follow up the most promising stream (follow up statistically, not observationally). Our plan is to write an observing proposal to confirm it (or deny it) with radial velocity measurements.
I ate lunch with Yavin, during which we discussed how next to proceed on the exoplanets. Yavin has made something that finds the dominant planet incredibly fast by performing Fourier-based arithmetic operations on the data prior to any fitting. The question is: Can we use these methods to enormously speed up any more correct Bayesian analysis? We both have the intuition that we can. I promised to start literature searching this weekend to make sure we are not completely reinventing old wheels (I am sure we are to some extent).
I spent the day checking in on a number of ongoing projects. Bovy and I discussed the moving groups from Hipparcos and the black hole at the Galactic Center. Price-Whelan and I discussed proper priors for the Bayesian line-fitting problem. Itay Yavin (NYU) and I discussed a nascent project on translating the exoplanet problem into a problem in harmonic or spectral analysis. Lang and I discussed, among other things, observing proposals. And Wu and I discussed her slides (on multi-wavelength measures of interstellar radiation and star-formation rates) for a short talk at the GALEX team meeting at Columbia tomorrow.
Roweis and I spent the morning discussing the possibility that we could build a pipeline to read and analyze the SDSS-III BOSS spectral data, starting by looking for outliers and moving towards full data reduction. This is, in some sense, duplication of effort, but since we would work without
domain knowledge, we might learn or confirm some things that might have to be assumed otherwise.
In the morning, Jeremy Tinker (Berkeley) led our group meeting with a discussion of information about galaxy evolution from clustering. In the approximation that we know the dark matter model, the relationship between galaxies and dark matter can be parameterized, and then the observed galaxy-galaxy clustering puts constraints on how the galaxies could possibly form and evolve. He has some counterintuitive results, stemming from the fact that at intermediate redshift, the large-scale clustering of red and blue galaxies is very similar.
In the afternoon, Marc Kamionkowski (Caltech) gave the Big Apple Colloquium about the isotropy and homogeneity of large-scale structure, and in particular the cosmic microwave background. He is building non-natural models that permit anisotropy in the power spectrum while preserving isotropy in the temperature and density and all else. There is a small amount of evidence for this statistical anisotropy situation in the current data; it is a long shot but if it holds up it is extremely important.
As my loyal reader knows, I have no fear when it comes to models with huge numbers of parameters; indeed the ubercalibration project is effectively a fit with hundreds of millions of parameters, and we can prove that we got the global optimum (in the sense that we made sure the problem is guaranteed to be convex). Today Roweis pitched a generalization of all this, in which one creates a very flexible linear model space, where parameters are tied to or in a hierarchy of meta-data, such that some parameters are tied to, say, the date, some to the airmass, some to the seeing, some to the camera column, and so on. Then the model discovers which parameters are necessary for accurate modeling of the data and thereby discovers important meta-data, dependencies of the data on artificial issues, and bad data. We tentatively agreed to run this on the new BOSS spectra from SDSS-III.
Iain Murray gave another nice talk today, this time in the machine learning group, about a high-end sampling method called
elliptical slice sampling, optimized for gaussian process modeling, where calls to the prior probability distribution function are more expensive than likelihood calls. It was a very nice talk and got me thinking about slice sampling in general, which might be very useful to us.
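Since the talk got me thinking about it, here is a minimal sketch of elliptical slice sampling as I understand it (Murray, Adams & MacKay), for a standard-normal prior and a toy Gaussian likelihood; everything here (the two-dimensional target, the likelihood center) is invented for illustration:

```python
import math
import random

random.seed(3)

def log_like(f):
    # Toy Gaussian likelihood centered at (1, 1) with unit variance.
    return -0.5 * sum((x - 1.0) ** 2 for x in f)

def ess_step(f):
    # One elliptical slice sampling update for a standard-normal prior N(0, I).
    nu = [random.gauss(0, 1) for _ in f]          # auxiliary draw from the prior
    log_y = log_like(f) + math.log(random.random())  # slice level
    theta = random.uniform(0, 2 * math.pi)
    lo, hi = theta - 2 * math.pi, theta           # initial bracket
    while True:
        prop = [fi * math.cos(theta) + ni * math.sin(theta)
                for fi, ni in zip(f, nu)]         # point on the ellipse
        if log_like(prop) > log_y:
            return prop
        # Shrink the bracket toward theta = 0 and try again.
        if theta < 0:
            lo = theta
        else:
            hi = theta
        theta = random.uniform(lo, hi)

f = [0.0, 0.0]
samples = []
for i in range(20000):
    f = ess_step(f)
    if i >= 1000:  # crude burn-in
        samples.append(f)

mean0 = sum(s[0] for s in samples) / len(samples)
print(mean0)  # posterior mean should be near 0.5 in each coordinate
```

The appeal is visible in the code: prior draws appear only in the auxiliary variable nu, every proposal costs one likelihood call, and there are no tuning parameters; for this conjugate toy problem the sampler recovers the known posterior mean of 0.5.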
I assigned Jagannath the following problem: What can you learn about the gravitational potential of a gravitating system in which you yourself lie (think the Milky Way) if all you have are streams of stars that trace out orbits, and all you can measure about those streams are their trajectories in angular coordinates (think RA, Dec). That is, what can you do if you get a snapshot of a few orbits, but you only get the angular shapes of those orbits? We went back and forth a bit between the answer
everything and the answer
nothing. I think it is the former, but of course it will depend strongly on the precision of the measurements! In some sense, all the answers are known; we are just on an intuition-building adventure.
I spent an irresponsible morning talking non-stop to (at?) Iain Murray and Jo Bovy, about our various inference projects, current and future. Murray and Bovy have spent the last week figuring stuff out about our Solar System project, and are producing a much deeper (and more useful) paper; it shows that different parameterizations of phase-space distribution functions lead to different results, but (a) not very different, when stated in terms of reasonable probability intervals, and (b) you can put all possible parameterizations into the model and marginalize over them, without harming the results. What a pleasure it is that hanging out all morning discussing such niceties counts as performing crucial functions of my job!
At lunch time, Jeff Allen (NYU) gave a status update from the Pierre Auger Observatory for ultra-high energy cosmic rays. There are now conflicting bits of evidence about their sources and composition, which is tantalizing, although the GZK cutoff did really appear as it had to.
I fielded email today from various undergrads to whom I pitched (on Wednesday) a big project to build a precise probabilistic ephemeris for the Solar System. I am hoping to execute a huge project with a team of undergrads handling all the important parts.
At group meeting (before lunch), Iain Murray (Toronto, Edinburgh), who is one of our Bayesian consultants, gave a talk on sampling for inference, reviewing a few simple methods and then some hard pitfalls. A bit of an argument broke out about how you deal with well-separated maxima, with Roweis and me more on the side of "you never know if you have found the true maximum so you have to figure out how to do science without knowing that" and Scoccimarro and Blanton more on the side of "you have to keep working until you are sure".
Jim Peebles (Princeton) visited today. Roweis and I spent an hour or so advertising to him our data analysis projects. Late in the day he gave a talk on the history of cosmology, which made a number of remarkable points, like that Hoyle was (almost) the first person to identify the cosmic microwave background (despite the fact that he thought it was impossible). An argument broke out about the importance of Friedman; Peebles takes the (tough) view that to be really important in cosmology you must have not just done theory but connected it directly to experiment, or not just done experiment but had the theorists realize that it was relevant. That's a high bar.
I spoke with Bovy about the possibility of doing an ambitious analysis of what is possible with Gaia. I spoke with Iain Murray (Edinburgh), who is visiting, about sampling, where he is one of the world's experts. I asked him to tell me what we can do if likelihood calls are outrageously expensive (so converged samplings of the posterior PDF are impossible).
Today was devoted to an NSF proposal, which, according to the rules, is not research. The only research activity of the day was a talk by Nadia Zakamska (IAS) about infrared emission lines and what they have to say about the physics of galaxies. She has some beautiful results, especially that the PAH emission and the H2 emission in the interstellar medium are differently affected by dust extinction. This is odd, because they are both strongly correlated with star formation and one another. So they are strongly correlated but not co-spatial. Odd.
Lang and I worked on (that is, pair coded) making figures for our somewhat strange Comet Holmes project. We started by re-running everything from scratch (find images on web, download, source extract, calibrate) and made this, one of several images of the individual-image footprints on the sky. In this image, the brighter footprints are the more constraining footprints; axis units are degrees on the sky from some reference point.
My only research time today was spent fielding email about fractal universes, emails that were inspired by my arXiv submission yesterday. It turns out—contrary to what I wrote in that note—that there have been some serious attempts to compute observables in inhomogeneous models. I think my conclusions are still safe, but my language might have been a bit strong.
I spent all day today at the meeting in honor of Roger Blandford, which I played an embarrassingly microscopic part in organizing. The meeting was incredibly well attended, with many old friends in attendance from all over the world. There were talks across the whole electromagnetic spectrum and covering a large range of astrophysical processes, all of which I enjoyed. In particular, Maxim Lyutikov (Purdue) gave just about the perfect description of Blandford the advisor (Lyutikov and I are coeval). Chris Kochanek (OSU) described a brute-force generative modeling of microlensing light curves that made my Bayesian heart sing. I spoke about model selection in cosmology, in a general way; I put my remarks on the arXiv here.
I spent a long time on the phone with Lang, who is building quantitative models of galaxy images using other galaxy images. We spoke about the free parameters, and the robustness of the fitting (given that there are some random superimposed stars and the like), and so on. He has some nice results already, and this has only been underway for a few days now. At the end of the day I gave a big public talk here in Buffalo.
I spoke with Schiminovich about what we can achieve by taking first moments (means) of low signal-to-noise data. There is quite a bit you can do (I think), though we haven't worked out all the details. This all relates to our hope of constraining properties of quasars and the Universe with low signal-to-noise GALEX observations.
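One detail we are relying on can be sketched in a few lines (all numbers invented): even when every individual object is far below the detection threshold, the first moment of a large ensemble is a high-significance measurement, because the uncertainty on the mean shrinks as the square root of the sample size:

```python
import math
import random

random.seed(11)

sigma = 1.0      # per-object noise in some flux unit
true_flux = 0.1  # each object is at 0.1 sigma: individually undetectable
N = 10000

measurements = [random.gauss(true_flux, sigma) for _ in range(N)]
stack_mean = sum(measurements) / N
stack_err = sigma / math.sqrt(N)
print(stack_mean / stack_err)  # ensemble detection significance
```

With these numbers the ensemble mean is detected at roughly ten sigma, despite every constituent measurement being pure noise to the eye; that is the hope for the low signal-to-noise GALEX work.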
One of the things I have been saying for a few years is that the astronomical images available on the Web, taken together, form some kind of very heterogeneous and odd sky survey. Could this be used to do science? Lang and I have an argument that it can: He Web-searched "comet holmes", took all images found that way, calibrated them with Astrometry.net (many didn't calibrate because, for example, they were images of cats or grandmothers), and fit a gravitational trajectory in the Solar System to the image locations. We started to write this up today.
I spent most of the day chatting, notably with Stumm, who is in NYC for a few days. Late in the day, Lang (attempted to) put the Astrometry.net paper up on the arXiv. This was a failure, and not for the usual reasons: None of the figures got flagged as too big!
The paper compiles without complaint by "pdflatex astrometry-dot-net" on any unix (mac or Linux) platform we have been able to try. And yet on arXiv, the figures can't be understood! (It can't determine size or bounding box or format or even see the file in some cases.)
The arXiv system has a number of issues, not the least of which is that it doesn't just run vanilla pdflatex, ever. It runs in some strange box in which various pdflatex options have been changed from their default values. This would be fine if the arXiv system exposed its pdflatex or latex configuration. But it doesn't. Even better, since all the processing is done by robots, a "sandbox" robot could be established for people to test uploads before submission, greatly reducing the time wasted on this, not to mention the stress; documents and tarballs could be tested as they are being written and not "by fire" right at posting time.
Indeed, the inscrutable arXiv robot will reject a submission on any number of grounds, many of which are mentioned in the arXiv help pages, but few of which are described in enough detail for a user to reliably avoid them. For example, the figure size constraints (which were not our problem today) are never stated explicitly in the help pages; the help pages (like this one) only say that figures "should" be made "small" because that is more "efficient"!
As my loyal reader knows, I love the arXiv; it has transformed astrophysics and all of the sciences. Now let's just make it easier to use! Note that anything that makes it easier to use also makes it easier to maintain and run. (Think of all the emails and blog posts that could be saved!)
Lang and I finished the primary Astrometry.net paper for submission! We will put it on the arXiv tomorrow. I am extremely excited to see it go out. Congratulations to Lang and the team.
At the end of the day the Physics Colloquium was by Halsey from ExxonMobil on carbon sequestration. It is amazing that this is on the table, because it is so expensive: it would cost as much to sequester the carbon as it currently costs to produce the oil; that is, it would roughly double the price of oil. The talk reinforced the point that there are no real technological solutions; if we are going to reverse global warming, carbon emissions, and pollution there have to be social, cultural, and political changes. Technology can only play a small role.
I just failed (though only just) to complete the noise-information paper (with Price-Whelan) for submission today. And then I failed to complete the Astrometry.net paper as well! But both papers are close, so by Friday, if all goes well.
I finished the first draft of my contribution to the Blandford meeting at the end of the day today.
At the beginning of the day, Price-Whelan and I worked on last details for his paper on the information in astronomical images.
I spent the day talking to anyone who would listen about unconverged MCMC (or equivalent) chains. The issue is a big one, and I have many thoughts, all unorganized. But basically, the point is that when likelihood calls take a long time (think weeks), then there is no way in hell we will ever have a converged and dense sampling of the posterior probability distribution for any parameter space, let alone a large one. At the IPMU meeting last week, most practitioners thought it was impossible to work without a converged chain, but I noted that we never have a converged chain in the larger space of all possible models; whenever we have a converged chain it is just in some extremely limited and constrained subspace (for example the 11-dimensional space of CDM or the like; this is a tiny subspace of all the possible cosmological model spaces). The fact that we don't have a converged chain on all the possible models and all the possible parameters conceivable does not prevent us from doing science. This has connections to the multi-armed bandit problem. I also have been thinking about Rob Fergus's 80 million tiny images project, which treats the result of a huge set of Google (tm) searches as a sampling of the space of all natural images. Of course this is not a converged or dense sampling! But nonetheless, science (and engineering) can be done, very successfully.
Today was exoplanet day at the SFoA meeting, which meant I learned the most. Among other things, Winn (MIT) told us that he can measure the alignments of planetary orbits with stellar rotation and that planet transits more-or-less directly tell you the surface gravity on the planet. Turner (Princeton) spoke about strong biases in fitting planets to radial velocity curves, which emerge from the nonlinearity of the fitting. Laredo (Cornell) showed a system for optimizing or guiding future radial velocity measurements given measurements in the past, where he is optimizing for information gained about the orbit. As he points out, you can optimize for many things there; it is a multi-armed bandit kind of problem. One thing that surprised me is that he takes the convergence of his MCMC chains very seriously, which is good in general, but does not seem necessary to me in order to perform these experimental design activities. After all, your utility will always be approximate, why spend millions of hours of CPU time to fill out in enormous detail predictions of future trajectories that will only be used to approximately calculate your utility? But he is certainly thinking about the problems the right way.
Several mentioned that although orbit fitting is now being performed in quasi-optimal ways, the original data reduction (going from spectral pixels to radial velocity measurements with errors) is not. Turner opined that if this were done better, more planets would be discovered, because there are many detections close to the current limits. That's a big—but very important—problem.
Today was cosmology day at the meeting, with (among other interesting contributions) a nice discussion by Marinucci of needlets, a class of compact wavelets on the sphere. These have extremely nice properties for statistical analyses of fields on the sphere. Once again I am amazed at all the great things you can do with linear functions of your data! In off time, I spent some time getting very specific advice from Baines about MCMC improvements.
Today was the second day of the Statistical Frontiers of Astrophysics meeting in Tokyo. Among other very interesting talks, Xiao-Li Meng and Paul Baines worked us through some ideas in the modern use of MCMC. They are particularly interested in problems of data augmentation, where parameters are introduced for every data point (that is, there are more parameters than data points). They had a number of basic ideas for MCMC that are almost always a good idea; I have lots to do to my code when I get home (and lots of new literature to read).
Sam Roweis spoke at the NYU Computer Science Colloquium, and Lam Hui spoke at the NYU Astrophysics Seminar today. Roweis spoke about fitting models to data, where the data are taken by sensors with unknown properties (for example microphones with unknown thresholds, saturation, nonlinearity, and frequency response or sensors on wandering robots of unknown position). His point was that if you have enough sensor readings from differently unreliable or unknown sensors, you can still do very well, if you take a generative modeling approach. The demos were nice. Hui spoke about modifications to gravity and what they might do to dynamics. In particular, he noted that most current modifications to gravity look very much like a scalar–tensor theory, where there is (effectively) a different value of G in high-density regions than in low-density regions. If this is going on in our Universe, there ought to be lots of dynamical signatures; hopefully we can rule out large classes of models of that type.
The most attentive of readers would know that Price-Whelan and I have been working on assessing the bandwidth necessary for transmitting astronomical image information. That is, we have determined the precision to which you have to record pixel values to preserve all the scientific information in an image, where we assess that information for both bright sources and faint sources
below the noise. Today we pulled down a RAW (unprocessed) HST ACS image from the HST Archive and showed that, in fact, the ACS raw data are telemetered within one bit of the minimum bandwidth. That is, of the 16 bits in the integer representation of raw ACS pixel data, at most one bit is wasted. That's pretty close to optimal, and we will say so in the paper we hope to submit within a week or two.
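For readers who want to play along, here is a toy version of the counting argument, with my own illustrative numbers, not the ones in our paper: the quantization step just has to be fine compared to the per-pixel noise, so the bits needed are roughly the base-two log of the dynamic range over the noise.

```python
import numpy as np

def bits_needed(pixel_max, pixel_min, sigma, steps_per_sigma=2.0):
    # Bits per pixel needed to record values spanning [pixel_min, pixel_max]
    # with a quantization step that is fine compared to the Gaussian noise
    # sigma (steps_per_sigma quantization levels per sigma of noise).
    dynamic_range = pixel_max - pixel_min
    levels = dynamic_range * steps_per_sigma / sigma
    return int(np.ceil(np.log2(levels)))

# A 16-bit range (0..65535 ADU) with ~4 ADU of noise needs about 15 bits,
# leaving roughly one bit of headroom, as with the ACS raw data.
```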
Spent time with Johnston today at Columbia and she encouraged me to dust off my manuscript on perturbations of cold streams by compact substructures. We realized that there are already morphological features in the known streams that could be analyzed, at least roughly, in terms of perturbations by substructures. I started to get her to agree that the smoothness and straightness of streams like GD-1 already make them interesting for the dark-matter model. But we both agreed that we can't be quantitative about that until we understand how the disruption of streams by substructure affects their detectability.
I participated in Ronnie Jannson's PhD defense today. He has tested Galactic magnetic field models, and their implications for ultra-high-energy cosmic rays. He showed that none of the GMF models are reasonable, and that there is a substantial probability that a large fraction of the UHECRs come from a small number of sources. Great stuff and congratulations, Dr Jannson.
I spoke about decision making in the CCPP brown-bag lunchtime talk today. The crowd was very suspicious about the point that the number of data points and number of parameters are both irrelevant or nonexistent quantities, except in the case of linear least-squares fitting. No other research to report, as I spent the day preparing for class and a thesis defense.
On Sunday, Lang came into NYC and we discussed his projects involving imaging data pipelines and the photometry and deblending of galaxies. Lang would like to make deblending a probabilistic operation, but this requires (in our outlook) a generative model of a galaxy image—a model that is good at the pixel level! This doesn't exist; exponential and de Vaucouleurs models are not
good fits to the data in any sense of that term. There are shapelets and sechlets, but these, on the other hand, are pixel-level models with far too much freedom. So we came up with the simplest possible model: other galaxies. Can we find a small number of galaxies that act as archetypes to fit other galaxies, and then use those for probabilistic deblending? We shall see.
Schiminovich came to NYU for the day and we talked Fireball and GALEX and the quasar–photon cross-correlation. If we could detect this angular cross-correlation, could we separate it into scattering, reflection, recombination, and star-formation components? Once again we came up with many new projects to do with GALEX; but we should probably just finish the ones we have started!
Sam Roweis has just moved from Google to NYU to start as a tenured faculty member. Congratulations! And, more important, congratulations to me, because this is one of the best things to happen in my work life! We meet every Thursday to talk and work; today we discussed, among other things, some of the questions that came up in Lang's thesis defense about how data-analysis pipelines could adapt to the data they have seen and, effectively, learn. The ideas or settings or choices we put in at the beginning would just be
initial conditions that get replaced with objective knowledge generated by operation.
I realized (duh) that there is a huge literature out there on MCMC engineering, and I tried to read a bit of it. There are lots of ways I ought to be able to speed up my exoplanet MCMC (and all our other MCMCs) with these tricks. Of course they all make the code much less understandable, so there are trade-offs. I discussed MCMC in general with Price-Whelan, who might start doing some research on the subject, and the tricks with Bovy, who is our resident sampler.
In many of the problems we are interested in, we would like to have a chain or sampling of models that are consistent with all data so far and then, as new data come in, we would like to trim the chain of models that the new data
rule out (really disfavor), and then extend the chain with new models that are consistent with everything. I have a strong intuition that we can do this without re-starting from scratch. I discussed this with Lang, and a short-term goal is to convert the exoplanet project over to this mode.
This idea is not unrelated to the fact that as new data come in, you can treat them as an entirely new experiment, for which the prior probability distribution is the posterior probability distribution from the previous data.
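In sample-based terms, that idea can be sketched as importance reweighting: weight each existing posterior sample by the likelihood of the new data, then resample. This is my own toy version with made-up names, not our actual code; it degrades exactly when the new data kill off most of the samples, which is when the fresh extension steps would be needed.

```python
import numpy as np

def reweight_samples(samples, loglike_new, n_keep=None):
    # Treat an existing posterior sampling as the prior for new data:
    # weight each sample by the new-data likelihood and resample.
    # samples: (N, K) array of parameter vectors; loglike_new: function
    # returning the new-data log-likelihood of one parameter vector.
    logw = np.array([loglike_new(s) for s in samples])
    logw -= logw.max()                 # stabilize the exponentials
    w = np.exp(logw)
    w /= w.sum()
    n = len(samples) if n_keep is None else n_keep
    idx = np.random.choice(len(samples), size=n, p=w)
    return samples[idx]
```

If only a handful of unique samples survive the resampling, the old chain has been effectively ruled out and must be extended with new MCMC steps.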
I spent airport time yesterday and morning time today making human-readable output from my exoplanet discovery code. I realized that by far the best way to see that the MCMC has converged is to show that there are no correlations between any parameter and the link number. I confirm this visually, but this could be tracked quantitatively; a great upgrade for the next generation of code. I also thought—and spoke with Roweis—about model selection, which everyone treats as trivial, but certainly is not.
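The no-correlation-with-link-number check could be tracked quantitatively with something as simple as this (my own sketch, not the exoplanet code):

```python
import numpy as np

def drift_check(chain):
    # Correlation of each parameter with link number; values near zero
    # are (necessary, not sufficient) evidence that burn-in has ended
    # and the chain is not still drifting.  chain: (n_links, n_params).
    steps = np.arange(chain.shape[0])
    return np.array([np.corrcoef(steps, chain[:, j])[0, 1]
                     for j in range(chain.shape[1])])
```

Any parameter with |correlation| much larger than a few over the square root of the number of links deserves suspicion.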
It is with tears in my eyes that I report the intellectually exciting and (obviously) successful PhD defense of my friend and student Dustin Lang today at Toronto. It was truly an old-school defense, with a lively debate among the committee (half astronomers and half computer scientists) and the candidate, filled with deep discussion of the problem at issue (Astrometry.net and all its engineering and scientific aspects) and also the problems of the future (roboticising, learning from, and making more principled all of astronomy and the physical sciences). Lang has only scratched the surface of this, of course, but it is a deep scratch he has made. I also learned, once again, even more clearly than ever before, that astronomers and computer scientists have a lot to talk about. The heartiest congratulations are due to Dr Lang.
Later in the day I spoke at CITA, which is just about the liveliest place for theoretical astrophysics in the world.
One of the reasons I am interested in the exoplanet radial velocity stuff is that I am interested in model complexity: How to compare, combine, and decide among models of different complexity. I realized that the correct hierarchy of complexities is not based purely on the number of planets, but also on whether or not they are permitted to travel on orbits of nonzero eccentricity. I modified my code to respect this additional level of complexity and am re-running it all.
I had conversations today with Wu, who is responding to the referee report on her paper on Spitzer observations, and Bovy, who is responding to comments we solicited on his paper on the local standard of rest. In both cases the comments are very constructive; both papers will be much improved by responding to them.
I failed to admit in the previous post that I was fitting the radial velocity data with pure sinusoids, not proper planet-induced radial velocity models. I switched to proper models today, after taking a short refresher on the mean anomaly, eccentric anomaly, and true anomaly. I wrote (not completely) stupid python code to do all these things and got my MCMC running again. So far, it looks like it might work. Of course I am reinventing the wheel here.
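For the record, my refresher amounted to the following bookkeeping; this is a generic sketch in my own notation, not the actual project code. Kepler's equation E - e sin(E) = M is solved by Newton iteration, the eccentric anomaly E is converted to the true anomaly f, and the radial velocity follows.

```python
import numpy as np

def eccentric_anomaly(M, e, tol=1e-10):
    # Solve Kepler's equation E - e*sin(E) = M by Newton iteration.
    E = np.array(M, dtype=float, copy=True)
    for _ in range(100):
        dE = (E - e * np.sin(E) - M) / (1.0 - e * np.cos(E))
        E -= dE
        if np.max(np.abs(dE)) < tol:
            break
    return E

def radial_velocity(t, P, K, e, omega, t_peri, v0):
    # Star's radial velocity: v0 + K*(cos(f+omega) + e*cos(omega)),
    # with f the true anomaly; variable names are my own conventions.
    M = 2.0 * np.pi * (t - t_peri) / P            # mean anomaly
    E = eccentric_anomaly(M, e)                   # eccentric anomaly
    f = 2.0 * np.arctan2(np.sqrt(1 + e) * np.sin(E / 2),
                         np.sqrt(1 - e) * np.cos(E / 2))  # true anomaly
    return v0 + K * (np.cos(f + omega) + e * np.cos(omega))
```

At e = 0 this reduces, as it must, to the pure sinusoid I was wrongly using before.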
Inspired by Schwab (Sternwarte Heidelberg), I spent all of Friday and a chunk of the weekend fitting models to precise radial velocity data on stars measured for the purpose of discovering exoplanets. Schwab came to me because I had expressed confidence that an MCMC approach would be not just useful but necessary. Having said this, I had to make it work.
As always, the issue is with initialization, search, and convergence of the MCMC algorithm. The algorithm is simple and provably correct, but the proofs don't tell you how long it will take to converge to a fair sampling of the posterior distribution function. Furthermore, that convergence is a very strong function of choices you make about stepping (directions and sizes), and there is (at present) no objective way to set that. Indeed, this is a great area of research, and there are probably results there I don't know about.
One thing I did, which worked extremely well, is implement a trick suggested by Phil Marshall: I started the MCMC working on the prior alone, and slowly increased the relative weight of the likelihood, so that only after a long burn-in period was the system sampling the true posterior. More on all this later, because all this recommends working out and writing down some
lessons learned about MCMC in practice.
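As a preview of those lessons: Marshall's trick amounts, as I understand it, to sampling a tempered posterior, with the likelihood raised to a power beta that ramps from zero to one during burn-in. A minimal sketch, with my own names and step rule:

```python
import numpy as np

def tempered_metropolis(logprior, loglike, x0, step, n_burn, n_keep):
    # Metropolis sampling of prior * likelihood**beta, with beta ramped
    # linearly from 0 to 1 over the burn-in, so the chain starts by
    # exploring the prior and only gradually feels the likelihood.
    x = np.array(x0, dtype=float)
    chain = []
    for i in range(n_burn + n_keep):
        beta = min(1.0, i / float(n_burn))   # likelihood weight: 0 -> 1
        xp = x + step * np.random.randn(x.size)
        dlp = (logprior(xp) + beta * loglike(xp)) \
            - (logprior(x) + beta * loglike(x))
        if np.log(np.random.rand()) < dlp:
            x = xp
        if i >= n_burn:                      # keep only post-burn-in links
            chain.append(x.copy())
    return np.array(chain)
```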
[I broke the rules this week by posting only four posts. That's because on Wednesday, I got nothing done. Unfortunately, the rules demand that I admit this.]
Tao Jiang (NYU) brought me back up to date on his project to measure very small-scale galaxy–galaxy cross-correlation functions in the SDSS Main sample (that is, normal, low-redshift galaxies). His goal is to measure the merger rate as a function of mass, color, and mass ratio.
In the rest of my research time today I worked on some crazy celestial mechanics ideas; conversations with Christian Schwab (Heidelberg Sternwarte) got me thinking about how one explores the space of possible model fits to exoplanet radial velocity data.
I started writing something today for the upcoming symposium in honor of Roger Blandford (my PhD advisor). What I am working on is a bit off the rails; I am supposed to be talking about cosmology, but I might use the opportunity to talk about scientific reasoning, since that has always been one of Blandford's hobby interests.
I read a paper today that is just about completely wrong. It is Linear regression in astronomy by Isobe, Feigelson, Akritas, Babu (1990). The paper presents five methods for fitting straight lines to data, and compares them. I think I have three objections:
First, they present procedures, and do not show that any of those procedures optimize anything that a scientist would care about. That is, they do not show that any procedure gives a best-fit line in any possible sense of the word "best". Now, of course, some of their procedures do produce a best-fit line under some assumptions, but they only give those assumptions for one (or two) of their five methods. In particular, the method they advocate has no best-fit interpretation whatsoever! Scientists do not trade in procedures, they trade in objectives, and choose procedures only when they are demonstrated to optimize their objectives, I hope.
Second, when deciding whether to fit for Y as a function of X or X as a function of Y, they claim that the decision should be based on the
physics of X and Y! But the truth is that this decision should be based on the error properties of X and Y. If X has much smaller errors, then you must fit Y as a function of X; if the other way then the other way, and if neither has much smaller errors, then that kind of linear fitting is invalid. This paper propagates a very dangerous misconception; it is remarkable that professional statisticians would say this. It is not a matter of statistical opinion, what is written in this paper is straight-up wrong.
Third, they decide which of their methods performs "best" by applying all five methods to sets of simulated data. These data are simulated with certain assumptions, so all they have shown is that when you have data generated a certain way, one method does better at getting at the parameters of that generative model. But then, when you have a data set with a known generative model, you should just optimize the likelihood of that generative model. The simulated data tell you nothing in the situation where you don't know the generative model for your data, which is either always or never the case (not sure which). That is, if you know the generative model, then just use it directly to construct a likelihood (don't use the methods of this paper). If you don't, then you can't rely on the conclusions of this paper (and its ilk). Either way, this paper is useless.
Wow, I am disappointed that this is the state of our art. I hope I didn't sugar-coat that critique too much!
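To be constructive rather than just grumpy: when the generative model really is y = m x + b plus known Gaussian noise in y only, maximizing the likelihood reduces to weighted least squares, which takes a few lines (a sketch in my own notation):

```python
import numpy as np

def fit_line_maxlike(x, y, sigma_y):
    # Maximum-likelihood (m, b) for y = m*x + b with known Gaussian
    # uncertainties sigma_y on y only; for this generative model the
    # maximum-likelihood fit is exactly weighted least squares.
    A = np.vstack([x, np.ones_like(x)]).T      # design matrix [x, 1]
    Cinv = np.diag(1.0 / sigma_y ** 2)         # inverse data covariance
    cov = np.linalg.inv(A.T @ Cinv @ A)        # parameter covariance
    m, b = cov @ (A.T @ Cinv @ y)
    return m, b, cov
```

No procedure-shopping required: the model dictates the fit.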
I woke up today wondering "what would a frequentist do?", especially with regards to a fitting problem with nuisance parameters; that is, a problem where there are parameters (m,b) that one cares about, and parameters (P,Y,V) that one doesn't. A strict frequentist has no measure on the parameter space, so he or she can't marginalize or integrate out the nuisance parameters from the answer. Also, the frequentist does not produce a probability distribution for the parameters given the data, but rather a probability for the data given the parameters (a likelihood, which has the wrong units to be marginalized). Marginalization is not an option.
I think a principled frequentist has two choices: Either choose the maximum-likelihood model, and report all five parameters, even though three are uninteresting, and then do some kind of sampling-based error analysis around that peak; or else look at the entire space of models that are above some threshold in likelihood. If the latter, the frequentist gets out a range of possibilities for all five parameters. That range is a strict range, depending only on the likelihood threshold. The fact that some parameters are nuisance parameters is irrelevant. The frequentist is more ad hoc because a threshold must be chosen, but also more conservative because in general the prior and the marginalization (at least slightly) tighten up the parameter bounds. (Note that I am assuming that the Bayesian sets priors objectively, which is often not the case.)
I finished a first-draft set of figures for a putative paper on the bulge-mass vs black-hole-mass relation in galaxies. I find that even if the restrictive assumption is made that the relation is a power-law, the scatter around the relation is dominated by measurement uncertainties, not the intrinsic scatter (the scatter that would be observed given extremely high signal-to-noise measurements). I can also identify which galaxies are the largest outliers, accounting for their measurement uncertainties and marginalizing over all other parameters (including slope and intercept). Those galaxies are either truly anomalous or else have unrealistically small reported uncertainties on their black-hole or bulge masses.
In what little time I spent on research today, I switched my black-hole vs bulge-mass power-law fitting code over to performing its movements in parameter space one dimension at a time, sequentially looping over the available dimensions. This permitted me to set the step sizes objectively, and seems to have improved convergence. We have been learning that although MCMC is a super-useful and super-simple tool, in practice there are a lot of considerations involved in making sure it is running efficiently and converging; there are no extremely straightforward guarantees. This deserves a detailed write-up at some point in the near future.
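The one-dimension-at-a-time scheme looks something like the following (my own toy version, not the actual fitting code); the point is that each dimension carries its own step size, which can be tuned to its own scale:

```python
import numpy as np

def componentwise_metropolis(logpost, x0, steps, n_iter):
    # Metropolis in which each iteration loops over the parameters,
    # proposing a move in one dimension at a time; `steps` holds one
    # step size per dimension, so each can be set separately.
    x = np.array(x0, dtype=float)
    lp = logpost(x)
    chain = []
    for _ in range(n_iter):
        for j in range(x.size):
            xp = x.copy()
            xp[j] += steps[j] * np.random.randn()
            lpp = logpost(xp)
            if np.log(np.random.rand()) < lpp - lp:
                x, lp = xp, lpp
        chain.append(x.copy())
    return np.array(chain)
```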
[I just returned from some time off.]
I resurrected the black-hole–bulge-mass relation fitting code, made some tweaks, and got it running again after the break. On vacation I sketched out an abstract for a paper on the subject; not sure it is greater than one LPU (least publishable unit).
Bovy sent me a draft manuscript on the local standard of rest—the velocity of a putative circular orbit around the Galaxy, passing through the Solar position. He and I are arguing that this orbit does not exist, on an empirical basis. The data on the local phase-space density are simply not consistent with what is expected if that orbit exists. Of course there are approximations to this orbit; quantitatively we find that there is no such orbit at the few km/s level. There is no point in trying to measure it more accurately than that; and it shouldn't be used at all in any contexts that require precision.
On the last day of the IMPRS summer school, Bovy gave a detailed talk about dynamical inference in the Solar System, our April Fools' paper. He gave a very nice description of not just what we did but also why it works and what are the limitations. I am getting steadily more pessimistic about dynamical inference with Gaia, at least with phase-mixed populations. Populations that are not phase mixed, such as tidal streams, are seeming much more promising at present.
All my presentation materials (somewhat disjointed) are available in this tar gz file. For a non-attendant at the summer school, the most comprehensible part will be the problem sets.
Konrad Bernloehr (MPIK) gave a beautiful talk about the HESS experiment at the IMPRS. The data analysis is based on absolute and extreme forward modeling: They produce in their air-shower simulations the expected output from the HESS readout electronics. They even model where each photon hits each mirror! Data are compared to model in the space of the raw data. Beautiful!
In my IMPRS lecture today, I admitted in public that when you make a hard decision (that is, when you choose a model, or choose to include or not include a source in your catalog) you are by necessity doing something subjective. After all, Bayes returns only relative probabilities; making a decision on that basis departs from the objective process of inference. You can do better by making your decisions with the help of explicit utilities, but even these don't save you from subjectivity.
After we had all the students at the IMPRS summer school code up a Metropolis–Hastings MCMC algorithm yesterday, Bill Press, in his lecture, explained the algorithm today and—very simply—proved its efficacy. This was a pedagogical experiment: We had the students execute first, understand later.
Bill Press kicked off the IMPRS summer school with a set of lectures on distributions and Bayesian inference, with a focus on probability theory.
In the afternoon, Bovy, Lang, and I ran a
lab session in which we had the students write an MCMC sampling code in Python. Many of the students succeeded, and many are close, despite the fact that most of them had never used Python previously. It was the first of three sessions. If you want to follow along from home, we will post all our materials to the web within days.
On the weekend, Lang and I re-fit the local black-hole–bulge-mass relation, to assess common wisdom about the slope. We find that if the model is given flexibility to deal with outliers, the slope is not extremely strongly constrained; even a slope of unity is possible. More on this if we decide to finish it and write it up. It isn't quite one LPU, if you know what I mean.
We also worked on preparations for the IMPRS summer school.
Lang and I softened (made more robust) the background model in our brute-force foreground–background modeling system; that is, we made it harder for the foreground model to pick up spurious streams by effectively modeling deficiencies in the background model. This worked like a charm, eliminating almost all streams in our "scramble" sample and leaving a few tantalizing ones in the real data. Now we have to figure out how to observationally confirm our candidates. This is hard, because when you have used all the data to find the candidates, you don't have any data left to confirm them!
Lang and I switched to brute-force foreground/background modeling, after Koposov convinced us it was a good idea and we convinced ourselves that we could do it quickly (see yesterday's post). We found a stream, very near the Sun! Then we scrambled the data (that is, generated fake data that can't have structure) and found even more streams, so I guess we didn't really find a stream. Working on it.
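The scrambling operation itself is trivial but worth writing down (a toy sketch, not our actual code): permute each column of the catalog independently, which preserves every one-dimensional distribution while destroying any real multi-dimensional structure, so anything "found" in the scrambled catalog is a false positive.

```python
import numpy as np

def scramble(catalog, rng=None):
    # Null-test catalog: independently permute each column so all the
    # one-point distributions are preserved but any real correlations
    # between dimensions (streams, substructure) are destroyed.
    rng = np.random.default_rng(rng)
    out = catalog.copy()
    for j in range(out.shape[1]):
        out[:, j] = rng.permutation(out[:, j])
    return out
```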
Behind us in the office, Bovy has been working on the Sun's motion relative to the local standard of rest. The assumptions that underlie the measurement of the LSR are all violated; is there an LSR at all? We don't know (there doesn't have to be one at all if the Galaxy isn't close to axisymmetric), but we do know that if you assume there is a local standard of rest, what you get for it is a strong function of how you deal with stars that are members of the cold
moving groups in the disk. The error bars in the literature, including mine in Hogg et al. 2005, are all serious underestimates.
Koposov, Lang, Rix, and I debated methods for finding streams or substructure in high-dimensional stellar data sets, with Koposov defending brute-force foreground/background modeling, and Lang defending density estimation as a search heuristic. We all agreed that the best way to confirm a stream is with foreground/background modeling, so we were sympathetic with Koposov, but there are issues with speed, as there are in all brute-force methods.
Bovy finished, presented at MPIA Galaxy Coffee, and submitted to the ApJ his paper on dynamical inference based on Galactic masers today. He shows that the Reid et al masers can be used to constrain the Galactic potential, provided that sufficient prior information is provided about the position of the Sun in the Galaxy and about symmetries in the maser phase-space distribution function. The issue is that the masers do not represent an angle-mixed population, but nor can they be assumed to travel on circular orbits (they are observed not to). So you have to assume something in order to get the dynamical inference going. They are measured very precisely, so in principle they could have a lot to say about the Milky Way. However, most of the information in the observations is used to infer things about the maser phase-space distribution function, the parameters of which are (for our purposes) nuisance parameters and must be marginalized away. Look for it on the arXiv on Monday.
Lang and I continued to write code related to searching for overdensities in high-dimensional astronomical data. Bovy continued to write his paper on dynamical inference from Milky Way masers—the material he talked about at the Leiden meeting a few weeks ago.
Lang and I got up early and pair-coded the
"arbitrarily covariant and heterogeneous errors in both dimensions" case of fitting a line, with outlier rejection. I then showed this figure in the afternoon in the MPIA Hauskolloquium. My points of greatest emphasis were:
- If you want to say you have the "best-fit" model, then your model parameters better optimize a justified, scalar objective function. I mean "scalar" both in the sense of "single-valued" and in the sense of "respecting relevant symmetries".
- When you can create a generative model for your data, inference proceeds by maximizing the likelihood (or, better, sampling the posterior probability distribution function). You have no freedom about this; fitting does not involve much choice, at least at the conceptual level.
- Markov chain Monte Carlo methods—in particular with the Metropolis algorithm—are very easy to implement and run; they work even on non-linear problems, explore multiple modes (local optima), and automatically provide marginalizations over nuisance parameters.
Lang and I pair-coded some analysis and plots for the fitting-a-line document, about which I am giving an informal talk tomorrow here at MPIA. We re-re-discovered just how easy MCMC is, and how useful it is when you have nonlinear problems with parameters that you need to marginalize out. Rix, Lang, Bovy, and I had many conversations today about what is best communicated in a short seminar on this subject. We have been polling the crowd at MPIA and there are as many issues to address as people we have interviewed.
T. J. Cox (Harvard) gave a nice seminar at the MPIA today about merging disk galaxies in realistic simulations. He does a nice job of creating a huge diversity of spheroidal galaxies, with gas content and/or dissipation one of the dominant parameters. He (indirectly) confirms my methodologies for constraining the galaxy–galaxy merger rate and growth rate of spheroidal galaxies.
Today Lang and I ruled out the streams we have been finding in 5D this week. There is structure in the data that is almost certainly systematic, not real.
Lang and I ran some friends-of-friends in 5D phase space, continuing our search for substructure. We have tons of tantalizing structures, but we feel so confident that they are not real that we are assigning ourselves the job of ruling them out, not the job of confirming them. We failed to rule them out by the end of the day, although we found many bugs in our code.
On the weekend, Wu left Heidelberg and Bovy arrived. Rix made Lang and me a five-dimensional catalog (angular position, radial distance, and transverse angular velocity) of metal-poor halo star candidates. Today Lang and I began searching it for substructure. We found lots of structure; now is it real or useful?
My only real research contribution today was to re-write the abstract for the Astrometry.net paper. Over the two years between the first writing of that abstract and today, we have learned so much about the system and the problem in general that the old abstract was close to useless.
Lang and I discussed what it would take to get his thesis chapter on the design and performance of Astrometry.net ready for submission to the AJ. We decided that we have to add a short section on the false positives (which are extremely rare, and which can all be attributed to bizarre coincidences between flaws in the USNO-B Catalog with flaws in the input images). We also spun our wheels a bit on investigating some recent claims of Milky Way halo substructure.
Wu whipped into shape her Spitzer measurements of molecular H2 in galaxies selected (to have nebular emission) from the SDSS. Her goal is to measure the molecular-hydrogen mass function. This will require many assumptions, but we hope somewhat fewer than other estimates of this mass function. We will see. She has automated all of the IRS spectral data reduction and analysis and has automatically updating web pages for everything, in her usual (excellent) style.
In preparation for Lang's arrival in Heidelberg, I (nearly) finished the second draft of my enormous document about fitting a straight line to data. I also printed out this paper on linear regression in preparation for some
I never had the guts to post anywhere my polemic against principal components analysis (mentioned in passing here previously). In thinking about it for the last year I have come up with various alternatives that are better, more appropriate, and justifiable from a generative modeling perspective. However, the simplest is the bilinear model: Treat each data point as being close to a linear combination of component spectra, and optimize both the coefficients and the component spectra. The optimization can be of chi-squared, so it can properly represent the errors (after all, in any scientific application you want to minimize chi-squared, not the mean squared error, which is what PCA does). This whole idea is re-re-re-discovery, even for me. It is the technique used in Blanton's kcorrect.
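A minimal sketch of the bilinear model by alternating weighted least squares; this captures the spirit of the thing rather than the kcorrect implementation, and all the names are mine:

```python
import numpy as np

def bilinear_fit(data, ivar, n_comp, n_iter=100, rng=0):
    # Fit data[n, d] ~ coeff[n, k] @ comp[k, d] by minimizing
    # chi^2 = sum(ivar * (data - coeff @ comp)**2), alternating
    # weighted least squares on coefficients and on components.
    # Unlike PCA, the inverse variances weight the optimization.
    rng = np.random.default_rng(rng)
    n, d = data.shape
    comp = rng.standard_normal((n_comp, d))
    coeff = np.zeros((n, n_comp))
    for _ in range(n_iter):
        for i in range(n):       # per-object coefficients
            W = np.diag(ivar[i])
            coeff[i] = np.linalg.solve(comp @ W @ comp.T,
                                       comp @ W @ data[i])
        for j in range(d):       # per-wavelength component values
            W = np.diag(ivar[:, j])
            comp[:, j] = np.linalg.solve(coeff.T @ W @ coeff,
                                         coeff.T @ W @ data[:, j])
    return coeff, comp
```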
Wu and I submitted the paper to the ApJ today, and it will appear on the arXiv next week. Here's the abstract:
We present optical and mid-infrared photometry of a statistically complete sample of 29 very faint dwarf galaxies (M_r > -15 mag) selected from the SDSS spectroscopic sample and observed in the mid-infrared with Spitzer IRAC. This sample contains nearby (redshift z<0.005) galaxies three magnitudes fainter than previously studied samples. We compare our sample with other star-forming galaxies that have been observed with both IRAC and SDSS. We examine the relationship of the infrared color, sensitive to PAH abundance, with star-formation rates, gas-phase metallicities and radiation hardness, all estimated from optical emission lines. Consistent with studies of more luminous dwarfs, we find that the very faint dwarf galaxies show much weaker PAH emission than more luminous galaxies with similar specific star-formation rates. Unlike more luminous galaxies, we find that the very faint dwarf galaxies show no significant dependence at all of PAH emission on star-formation rate, metallicity, or radiation hardness, despite the fact that the sample spans a significant range in all of these quantities. When the very faint dwarfs in our sample are compared with more luminous (M_r ~ -18 mag) dwarfs, we find that PAH emission depends on metallicity and radiation hardness. These two parameters are correlated; we look at the PAH-metallicity relation at fixed radiation hardness and the PAH-hardness relation at fixed metallicity. This test shows that the PAH emission in dwarf galaxies depends most directly on metallicity.
Didn't get much work done today, but there were nice seminars by Rudnick (Kansas) on the evolution of the mass density on the red sequence as a function of environment, by Carilli (NRAO) on the status and promise of high-redshift 21-cm experiments (he is involved in the one called PAPER), and by Sheth (Penn) on statistical techniques for cross-correlations and deconvolution. My take-home message from the Carilli talk is that there is some awesome work to be done in the software domain.
Today I gave an informal seminar on decision making (in a Bayesian context) in the "Applications of Machine Learning in Astronomy" course led by Coryn Bailer-Jones (MPIA). It forced me to write down what I think about this subject and why I think it is important. I don't think I conveyed, however, why making decisions controlled by an explicit utility matrix is better than just making uncontrolled decisions.
On a related note, Joe Hennawi (MPIA) and I discussed quasar target selection for SDSS-III at length, yesterday and today. Today our subject was hypothesis testing (star vs quasar) using not just colors but also variability. Choosing targets for spectroscopy ought to be a perfect application for decision theory, although the SDSS-III target selection code does not, at present, make use of utilities.
In between various writing projects, Wu and I inspected and modified some of her fits of emission lines in our Spitzer spectroscopy of SDSS galaxies. We also had a chat with Schiminovich about them. Wu's project is to understand the mid-infrared and optical lines in general, and also assess whether we can use them to measure the abundances of molecular hydrogen from its direct line emission.
Working on the weekend and today, we finished Koposov's paper on the GD-1 cold stellar stream. His constraints on the Galaxy potential are significant; we have the (formally) best current constraint on the Galaxy circular velocity, and we constrain the flattening of the dark-matter halo (though not at awesome precision). Congratulations, Koposov! It will hit the arXiv within days.
Wu and I finished her thesis chapter / paper on the ultra-low-luminosity dwarf galaxies observed with SDSS and Spitzer. Our co-author has one week to respond and then we submit it! Congratulations Ronin.
I worked on abstracts for the Koposov and Wu papers that are being finished now in Heidelberg. Abstracts are the hardest parts of papers and therefore should be written first and constantly tinkered with as the paper is written and revised.
Marshall and I developed a plan to start image modeling, with the goal of finding faint lensing galaxies under the flux of bright, multiply-imaged quasars. This problem is a hard one in ground-based data, but essential if the lenses from PanSTARRS and LSST are going to be properly mined. The project is related to the image modeling projects of Lang and Bovy and myself; Marshall and I are hoping that we can join forces somehow. Among other big issues: Can we believe the surveys' pipeline-output PSF models, or do we have to fit for a set of perturbations or modifications around or away from those outputs? If we do have to fit, what basis functions make most sense?
I added text to the fitting-a-line document about foreground–background mixture modeling for robust fitting. This is a big favorite of mine because it is fast, simple, and very close to the right thing to do. It also resolves some of my issues of a few days ago.
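A minimal sketch of the idea (with made-up data and made-up parameter names; the document itself develops the full formalism): each point is drawn either from the line, with its stated Gaussian uncertainty, or from a single broad Gaussian of outliers, and the per-point good/bad flags are marginalized out analytically, which just turns each point's Gaussian into a two-term sum.

```python
import numpy as np

def mixture_loglike(params, x, y, yerr):
    """Log-likelihood for a line plus broad-Gaussian-background mixture.

    params = (m, b, Pb, Yb, Vb): slope, intercept, outlier fraction,
    background mean, background variance (names invented for this sketch).
    """
    m, b, Pb, Yb, Vb = params
    # foreground: Gaussian around the line with the reported uncertainties
    fg = (1. - Pb) / np.sqrt(2. * np.pi * yerr**2) \
         * np.exp(-0.5 * (y - (m * x + b))**2 / yerr**2)
    # background: one broad Gaussian describing the outliers,
    # convolved with each point's own uncertainty
    bg = Pb / np.sqrt(2. * np.pi * (Vb + yerr**2)) \
         * np.exp(-0.5 * (y - Yb)**2 / (Vb + yerr**2))
    # marginalizing each point's binary flag gives a sum, not a choice
    return np.sum(np.log(fg + bg))

# fake data: a line with two gross outliers
rng = np.random.default_rng(42)
x = np.linspace(0., 10., 20)
yerr = np.full_like(x, 0.5)
y = 2.0 * x + 1.0 + rng.normal(0., yerr)
y[3], y[15] = 30., -10.   # planted outliers

good = mixture_loglike((2.0, 1.0, 0.1, 0., 100.), x, y, yerr)
bad = mixture_loglike((0.0, 12.0, 0.1, 0., 100.), x, y, yerr)
assert good > bad   # the true line wins despite the outliers
```

The nice property is that this is just an ordinary (smooth) likelihood in a few extra nuisance parameters, which can then be marginalized out.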
Phil Marshall showed up in Heidelberg today; he and Kasper Schmidt (MPIA) and I had lunch and discussed (among other things) the confirmation and rejection of hypotheses, which rarely—in the real world—goes according to either the Bayesian or the frequentist methodology. For example, the WMAP-1 paper was taken to be an awesome confirmation of the standard CDM model (and it was!) even though it had a bad chi-squared value (so frequentists were wrong to be excited) and it wasn't being competed against any serious alternative model (so Bayesians have nothing to say at all). I think this all comes down to message length, but I certainly haven't worked it all out yet.
After my binary programming failures of yesterday, Bovy implemented a Gibbs sampler and solved my problem overnight. But then in discussing the issues, we concluded—as we always do—that the only principled way to deal with bad data or outliers is to model them. This means performing density estimation or modeling of the distribution function for the outliers simultaneously with performing the fit on the inliers, so all the data can be generatively modeled as a two-component mixture. One component is the inliers, with model parameters fitting those inliers. The other is the outliers, with model parameters describing the distribution of outliers. I think we may have to switch to that in the robust fitting section of the now-infamous fitting a straight line document.
I worked into the wee hours on robust fitting with arbitrary assignments of the binary classification "good" or "bad" to each data point. Robust fitting methods like this are beautiful, but exponential in the number of data points (this is binary programming, discussed on this blog previously in a different context). I attempted to make use of the magic of sampling but with little success.
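To make the combinatorial explosion concrete, here is a toy version of the problem (hypothetical data, and an arbitrary per-outlier penalty of my own choosing): every one of the 2^N good/bad assignments gets its own weighted least-squares line fit, which is why the exact approach is only feasible at tiny N.

```python
import itertools
import numpy as np

# fake data: a line with one planted outlier (small N on purpose, since
# the enumeration below is exponential in the number of points)
rng = np.random.default_rng(1)
x = np.linspace(0., 9., 10)
yerr = np.full_like(x, 0.5)
y = 2.0 * x + 1.0 + rng.normal(0., yerr)
y[4] = 25.   # planted outlier

def fit_and_score(keep):
    """Weighted least-squares line fit to the 'good' points, scored by
    chi-squared plus a fixed penalty per point declared 'bad' (the
    penalty value is an arbitrary choice for this sketch)."""
    k = np.array(keep, dtype=bool)
    if k.sum() < 2:
        return np.inf
    A = np.vander(x[k], 2)              # columns [x, 1]
    w = 1. / yerr[k]**2
    coef = np.linalg.solve((A.T * w) @ A, (A.T * w) @ y[k])
    chi2 = np.sum(w * (y[k] - A @ coef)**2)
    return chi2 + 9.0 * (~k).sum()      # penalize each rejected point

# exhaustive search over all 2^10 assignments: this is binary programming
scores = {q: fit_and_score(q) for q in itertools.product([0, 1], repeat=len(x))}
best = min(scores, key=scores.get)
assert best[4] == 0   # the planted outlier is rejected
```

At N = 10 this is 1024 fits; at N = 50 it is 10^15, which is why one marginalizes the flags analytically (the mixture trick) or samples them (the Gibbs sampler) instead.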
I spent most of the day reading carefully and commenting on Koposov's paper on modeling the cold GD-1 stream in a (flexible) Milky Way potential. The paper puts some strong constraints (fully marginalized, of course) on Galaxy and gravitational-potential parameters; for some parameters these are the only constraints at this length scale.
I also attended a nice short talk about transit timing by Monika Lendl (MPIA). It turns out that exoplanets that transit their parent stars produce timing sequences that are extremely sensitive to resonant perturbers. A Jupiter-mass transiting exoplanet can show observable timing residuals when perturbed by even an Earth-mass (or smaller) companion, if that companion interacts resonantly. The technique is potentially extremely sensitive, but it has the great problem that, while a given timing-residual pattern may make it easy to detect a perturber, it looks very difficult to characterize that perturber uniquely.
I spent the day commenting on papers by various co-authors. Over coffee, Eric Bell got onto the subject of orthogonal charge transfer, which is one of the methods by which PanSTARRS is going to maintain good image quality. In OCT, the idea is to monitor bright stars on short timescales, and shift the charge on the CCD east, west, north, or south as the point-spread function shifts, to keep the light falling in the most compact PSF possible. This technology has been developed and tested well at Hawaii. I got interested, because it is a perfect problem (if and how to move the charge, when) to cast into the form of Bayesian decision theory, one of my current pet methodologies.
I revived the project with Schiminovich to measure the extragalactic intensity falling on the Milky Way from quasars as a function of redshift. As expected, as you approach Lyman-alpha, it comes from z<3 quasars, and from a broad redshift range within that.
I spent my high-quality time today reading very carefully the first science chapter of Ronin Wu's thesis. Her thesis will be on the star-formation and radiative properties of galaxies, and the first chapter is on the mid-infrared properties of extremely low-luminosity galaxies from the SDSS.
Vivi Tsalmantza and I attempted to re-start our project looking for double-redshifts in the SDSS spectroscopy. We discussed the issues of wavelength coverage—which, with empirical galaxy templates, is a function of the two redshifts—and interpolation, which ought to be done by cubic spline (with SDSS spectra, at least). I started writing up the issue of hypothesis testing in this situation of variable wavelength coverage.
I spent the day reading and commenting on Dustin Lang's dissertation chapter on the kd-tree. Lang and Mierle have created for the Astrometry.net system the fastest kd-tree ever. The speed-up comes from tricks you can play with static data structures. That is, if you have a data set that changes rarely (or never) and you need to do fast lookup in the data set far more frequently than you need to edit the data set, you can build a static block in memory that contains the data, structured as a hierarchical binary tree, but without any pointer dereferencing or complicated (and therefore expensive and slow) structures. The thesis chapter on this is great, and it will make a great paper. The code is available under the GPLv2 license by request to Lang.
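The flat-array layout can be sketched in one dimension (a toy illustration of the general idea, not Lang's code): node i's children live at indices 2i+1 and 2i+2, exactly as in a binary heap, so traversal is pure index arithmetic with no pointer dereferencing.

```python
import numpy as np

def build(points, tree, node=0):
    """Lay a balanced binary search tree of the points into a flat array:
    node i's children are at 2*i+1 and 2*i+2 (a toy 1-d static tree)."""
    if len(points) == 0:
        return
    points = np.sort(points)
    mid = len(points) // 2
    tree[node] = points[mid]
    build(points[:mid], tree, 2 * node + 1)
    build(points[mid + 1:], tree, 2 * node + 2)

def nearest(tree, q, node=0, best=None):
    """Nearest-neighbor lookup by index arithmetic on the flat array."""
    if node >= len(tree) or np.isnan(tree[node]):
        return best
    if best is None or abs(tree[node] - q) < abs(best - q):
        best = tree[node]
    near, far = (2*node+1, 2*node+2) if q < tree[node] else (2*node+2, 2*node+1)
    best = nearest(tree, q, near, best)
    # descend the far subtree only if the splitting value is closer than best
    if abs(tree[node] - q) < abs(best - q):
        best = nearest(tree, q, far, best)
    return best

pts = np.array([2., 5., 9., 4., 8., 1., 7.])
tree = np.full(15, np.nan)   # flat static block, nan marks empty slots
build(pts, tree)
assert nearest(tree, 6.3) == 7.0
```

The real payoff in the static case is that the whole tree is one contiguous, cache-friendly block that can be memory-mapped straight from disk.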
On the plane I wrote in—and at MPIA I discussed (with Chien Peng of Victoria and MPIA)—my document on straight-line fitting, which I hope to finish up in the next week or two. I must finish it now if I am going to get onto the next topic and be ready for the IMPRS Summer School.
With Peng and Eric Bell (MPIA, Michigan) I discussed the possibility that the initial mass function of stars (the frequency distribution for stars of different masses at birth) might vary. It is a bit of an exaggeration to say that Bell believes that such a discovery (variation) would mean the end of galaxy astrophysics, while Peng and I believe that it is almost inevitable that it must vary. Bell noted that it would not be the end of astrophysics if we could measure and model the variations as a function of controlling parameters. That would be good, but it is possible that it depends on things nearly impossible to measure.
Bovy and I discussed two of his dynamical inference problems today. The first is with Reid's masers, where Bovy finds, when he marginalizes over even the most basic unknowns, that the masers are not highly informative about the potential (or rotation curve) of the Milky Way. That is not to say that we don't have much knowledge about the potential, but just that the masers aren't the source of it. We differ from Reid et al in our conclusions because we aren't as confident about the input assumptions.
The second is the Solar System demo paper. Tremaine proposed some methods that treat the eccentricity distribution better and don't depend on the energy distribution. We are not sure that there is any way to write the problem down that is insensitive to the energy distribution that doesn't make other assumptions we are unhappy about. This all gets at the question of how generative your generative model needs to be. Does it really need to generate your observations? And if so, at what level? I think ideally at whatever level your observational uncertainties are simple (understood, uncorrelated, close to Gaussian, and so on).
Research time is getting scarce this week as I prepare for travel to Heidelberg for my annual summer spa there. I have spent what time I can reading and commenting on Dustin Lang's monumental PhD thesis (advisor: Sam Roweis). As my loyal reader knows, Lang is the author of the Astrometry.net automated calibration system, which is composed of clever geometric hashing, exceedingly fast kd-tree-based index lookup, Bayesian decision theory, and lots of web, script, and WCS candy. The thesis therefore contains both nice theory and beautiful engineering. It is a pleasure seeing it come together.
[According to Blogger (tm), this is my thousandth post to this research blog!]
I spent a few research-minutes today looking at the code of Aukosh Jagannath (NYU), who is working on the kinematics of cold streams in a toy galaxy, subject to perturbations by compact substructure, which leave behind coherent traces, in principle. He is making some fun movies; now can we actually detect substructure using streams?
On the plane home I finished working out the Bayesian framework for inferring the potential of the Milky Way and the velocity and mass distributions produced by the launch mechanism for hypervelocity stars. Now, on to the SDSS data, where a few are known. I am not sure there are enough, and with only photometric parallaxes, the observations may not be highly constraining on the potential (which is what I care about). I am *sure* there is enough information already to interestingly constrain the velocity distribution produced by the launch mechanism.
I spent the day brushing up and giving my seminar on dynamical inference at OCIW. During it I was reminded what a good idea Tremaine's "two-star streams" is. We should do that. Any volunteers?
After a morning performing my Spitzer oversight duties, I worked on a simple, highly approximate, but maybe useful model of the fastest stars in the Galaxy, continuing where I left off on hyper-velocity stars.
Jay Anderson (STScI) gave the colloquium at OCIW today. He showed the incredible precision he can get in astrometry, proper motions, and photometry with HST when he models the image pixels directly, as we advocate. His color–magnitude diagrams for globular clusters are simply incredible. As is his astrometric precision (few percent of a pixel) for point sources, despite the fact that the images are only barely sampled. The science is good and the engineering is sweet.
Phil Marshall and I spent the afternoon in (and near) Marshall's UCSB office, trying to extract all the stars we can from the enormous HST dataset he put together for our lens searching. During a break, I met Brendon Brewer (UCSB), who is a real Bayesian.
At morning coffee at OCIW, David Law (UCLA) told us that he has evidence—from the morphology of the Sagittarius stream, mainly—that the Milky Way halo could be triaxial. This is pretty intriguing, because it is the generic prediction for dark-matter halos, as I understand it (substructure notwithstanding). He reminded me of the paradox that the Sagittarius stream must—in any model—be very old, which goes against the idea that galaxies are generically building up their mass through mergers.
Juna Kollmeier (OCIW) and I spent some time today talking about the ways that high-velocity (think nearly unbound or unbound) stars could be used to constrain properties of the Galaxy. Of course it got me thinking all Bayesian; it has some interesting similarities to the Oort problem, but with the added interesting and very useful (for inference) complication that the orbital times can exceed the stellar lifetimes. If you assume that stars are born near the center of the Galaxy, this could probably be made into a very interesting kind of inference.
After a day spent writing (yet more on my fitting paper) and arguing (with various people at OCIW, where I am spending these two weeks), Mario Mateo (Michigan) gave a nice and thought-provoking talk about dwarf galaxies, focusing on dark-matter content and host halos.
Spent the long weekend writing on the line-fitting and reading on Bayesian methods. The defense of Bayesian inference relative to sampling theory or frequentism in the MacKay book is far better than that in the Jaynes book, because MacKay uses much better frequentist examples, cites specific sources for the frequentist solutions he gives, and tries to find some common ground. He points out (and I agree) that frequentist p-values and chi-squared tests are very good for discovering problems.
I wrote words about bootstrap and jackknife in my nascent line-fitting document. Bovy made some fake data for demos.
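The two resampling schemes, sketched on fake data for the humble sample mean (chosen because the jackknife answer is analytic there, so the sketch checks itself):

```python
import numpy as np

# fake data: estimate the uncertainty on a sample mean two ways
rng = np.random.default_rng(0)
x = rng.normal(10., 2., 50)

# bootstrap: re-draw N points with replacement, many times over
boot = np.array([np.mean(rng.choice(x, size=len(x), replace=True))
                 for _ in range(2000)])
boot_err = np.std(boot)

# jackknife: leave out one point at a time
loo = np.array([np.mean(np.delete(x, i)) for i in range(len(x))])
jack_err = np.sqrt((len(x) - 1) * np.mean((loo - loo.mean())**2))

# for the mean, both should match the analytic standard error
analytic = np.std(x, ddof=1) / np.sqrt(len(x))
assert abs(jack_err - analytic) < 1e-8          # exact for the mean
assert abs(boot_err - analytic) < 0.15 * analytic
```

For the mean the jackknife reproduces the textbook standard error exactly; the bootstrap only approaches it as the number of resamplings grows. The interesting cases, of course, are statistics for which no analytic error formula exists.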
I read the Navarro et al paper on Arcturus, the moving group that is in the Solar Neighborhood but lagging the Solar motion (and the Local Standard of Rest) by 100 km/s. Navarro et al base their conclusions—that the stream is a disrupting galaxy that has contributed stars to the disk—on a number of things which are not at all consistent with Bovy and my reconstruction of the local velocity field. That is, most of the stars that Navarro et al associate with the moving group are not part of the moving group found in our analysis, though they certainly are lagging stars, and the group we find has a velocity dispersion way too small to be the remnant of anything massive. Then again, we do find a large population of lagging stars outside the Arcturus group, many of which may be interesting.
Aukosh Jagannath (NYU) and I worked on his code to generate perturbed cold tidal streams, perturbed away from their orbits by passing substructure. I am hoping we can, in some sense, image the substructure with these streams, if they are abundant enough. We worked on generating visualizations to convince us that the signal is there, in principle.
Adrian Price-Whelan (NYU) and I worked on the figures and text for our short paper on the bandwidth with which images should be communicated and saved—or the smallest difference in pixel values that should be permitted by the representation. We find that there should be one or two bits spanning the noise, and that is all. All of the information, on sky level, variance, source positions, and even on sources that lie below the noise level (!) is preserved even when there are only two bits spanning the noise. This has implications for experimental or instrument design, and especially space missions. I don't think our results are really new; many of them exist in the stochastic resonance literature.
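A sketch of the claim with made-up numbers (not the paper's actual experiments): quantize Gaussian-noise-dominated pixels with a step comparable to the noise sigma, and the inferred sky level is essentially unharmed, because the noise itself dithers the quantization.

```python
import numpy as np

rng = np.random.default_rng(3)
sky, sigma = 100.37, 1.0                 # invented sky level and pixel noise
pixels = sky + rng.normal(0., sigma, 100000)

# quantize with a step equal to one noise sigma
step = 1.0 * sigma
quantized = np.round(pixels / step) * step

# the noise dithers the quantization, so the mean (sky level) survives
# at far better precision than the quantization step
assert abs(quantized.mean() - pixels.mean()) < 0.01
```

Without the noise the quantization would bias the mean by up to half a step; with the noise acting as a natural dither, the bias is exponentially suppressed and only a small extra variance (of order step squared over twelve, per pixel) remains.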
In my writing on fitting a straight line to data (!), I confronted today the issue that Bayesian analyses return distributions, not answers, and the investigator is forced to decide how to present those results. I am a strong supporter of returning a sampling from the posterior distribution, but that is expensive when your model has thousands of parameters! The standard practice is to return the maximum a posteriori values, which is the quasi-Bayesian answer to maximum likelihood. Not sure how I feel about that; I think it may only make sense at high signal-to-noise, which, as I have commented before, only holds when it doesn't matter what you do.
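A toy illustration of why the peak can mislead (a deliberately skewed stand-in posterior, not drawn from any real analysis): for a Gamma(k=2, theta=1) density the mode sits at (k-1)*theta = 1 while the mean sits at k*theta = 2, so a MAP report and a posterior-sample summary disagree badly.

```python
import numpy as np

# pretend these are MCMC samples from a skewed, low signal-to-noise posterior
rng = np.random.default_rng(7)
samples = rng.gamma(shape=2.0, scale=1.0, size=200000)

map_value = 1.0               # analytic mode of Gamma(2, 1): (k - 1) * theta
assert abs(samples.mean() - 2.0) < 0.05   # posterior mean is twice the mode
assert samples.mean() > map_value + 0.5   # summaries disagree when skewed
```

A sampling carries all of this (skewness, tails, multimodality) for free; any single-number summary throws some of it away, and the MAP throws away the most at low signal-to-noise.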
Charlie Conroy (Princeton) came and spoke about modeling galaxy spectra as linear combinations of stellar populations. He is working towards marginalizing over unknowns and propagating errors, and then finding observations that reduce the magnitudes of the ranges for the unknowns. He shows very convincingly that stellar masses (the simplest thing you might want to get out of such a model for a galaxy) are not secure at the factor-of-two level, because of things like late stages of stellar evolution (which are brief, random, and luminous in many cases).
[Out sick for a few days; hence no posts.]
Bovy and I spent a lot of today discussing distribution functions, in the context of Rix's (really Reid's) masers and the Solar System. This was partially inspired by Tremaine sending us an alternative method to the inference problem we solved before April Fools' Day. The issue of interest is that of modeling distribution functions when you don't know what model space to use. In Bayesian inference there are ways of giving literally infinite freedom to the distribution function and nonetheless getting a reasonable posterior distribution, fundamentally because the method returns not an answer but a distribution. That said, it is not clear anything in this area is practically relevant. One place it might be is Reid's masers, where the paucity of data makes good inference important, and makes complicated methods tractable.
At group meeting today, Guangtun Zhu (NYU) started us on a discussion about morphological and kinematic selection of a pure sample of non-disk-dominated ellipticals using only SDSS data. This is not a trivial problem, and has been solved in an unsatisfying and relatively ad hoc manner up to now. If he solves it, there is a lot that can be done with his sample.
On the weekend I finished reading Jaynes, Probability Theory, which is an extremely long and detailed polemic about probability, inference, and decision theory. It was great!
I had many problems with the book, like its dismissal of alternate positions, and its unfair treatment of competing methods in each situation. It had too much bashing of other people. It also had some dumb things to say about quantum mechanics (did Jaynes never hear about Bell's inequalities?). But overall the book was extremely useful to me and clarifying on issues like noise and uncertainty, the subjectivity or objectivity of priors, and the locations where Bayesian methods involve ad hoc input and where they are mechanistic and driven by the rules of probability theory.
Jaynes takes a strong position that probabilities are always descriptions of your knowledge: There might be a definite fact of the matter or not, but what you know about it is probabilistic. He then refuses to use the word probability for anything else! This is a bit crazy, but it appeals to my extreme positions on consistency. So almost anything you or I would call a probability distribution, Jaynes would call a frequency distribution. All this reminds me of my careful use of the word error. As my mentor Neugebauer used to say about our errors: They aren't errors! If they were errors, we would correct them. They are uncertainties.
In the morning I started to think about my lectures for the IMPRS. I think my first will be on fitting a straight line to data. It turns out I have hours of material on this! I pity the students.
In the afternoon, Glenn van de Ven (IAS) came by to discuss dynamical modeling with Bovy and me. van de Ven has beautiful results on external galaxies and on the globular cluster omega Cen, modeled by what's known as the Schwarzschild method. In the globular cluster, for which he has similar kinds of data to what we have for the Galactic Center, he has evidence for several different dynamical components in phase space; we wondered whether those might be related to the fact that omega Cen also has several different stellar populations.
Today Bovy took and passed (of course) his candidacy exam. He told us about dynamical inference in the Solar System, the Galactic Center, and the local Galactic neighborhood. Congratulations!
Immediately after Bovy's talk we had a special seminar about early Fermi/GLAST results on the gamma-ray sky. The short summary is that they are very sensitive to point sources, and see many new sources; they have found a population of radio-quiet gamma-ray millisecond pulsars; they don't see any very good evidence of dark-matter annihilation, but they do have one resolved source with no Galactic counterpart. This they are not yet releasing, but it looks intriguing.
I don't write papers that begin with the word "Let". However, today Roweis convinced Bovy and me that our joint paper on the method we use to reconstruct distribution functions from noisy and incomplete data should be closer to that style, if we are going to publish it in the stats literature. We also discussed many other things with Roweis, including operations research algorithms, decision theory, and matrix math. It is always great when we get a visit from the master.
In my infinitesimal research time today, Bovy and I discussed the dynamical measurements of the black hole at the Galactic Center. It appears that received wisdom is that by far the best measurement of its mass comes from the one closest star that has completed a full orbit in the 15-or-so years in which it has been observed, and all other measures are deprecated to that one. It also appears that the central mass estimate has crept upwards throughout that time, so there are unresolved discrepancies among different data sets. Joint analyses of all the data available have not been attempted, to our knowledge.
Lang and I, working towards the Astrometry.net main paper, built an SDSS-based quad index and verification system (using the command-line tools of the Astrometry.net codebase), optimized to calibrate (blind) images as small as those produced by the Hubble Space Telescope. We succeeded in calibrating some images.
We spent the evening—into the wee hours—working on computational photography (treating the output of the Astrometry.net digital camera as scientific data). That's Friday night at Astrometry.net world headquarters! While we hacked, Stumm was being wined and dined by the museum set at the American Association of Museums Annual Meeting for his work on web-2.0 museum interactions with Flickr and Astrometry.net.
Today we inspected, in the context of the deepest imaging we have, the sources we harvested from the Flickr group images yesterday. They look real, so we are emboldened. We realized that between this and the stuff Lang did on Comet Holmes (he can determine the orbit from the footprint of Flickr-submitted images!), we have some good evidence that Flickr and amateur astrophotographers can have a big impact on science.
Stumm, Lang, and I gathered some (heuristic) evidence for sources in the region of M42 (the Orion Nebula) using images submitted to the Flickr pool. The image below shows in transparent blue symbols the detected sources in the Flickr images in a tiny patch in the nebula. We take it from this that the amateur data do have highly redundant (and therefore reliable) information about many sources.
I had a great time at the thesis defense of Marilena Loverde (Columbia) today. Her thesis was on magnification bias effects on precise cosmology observables, and also signatures of non-gaussianity. She took a theory-centric view, which permitted her to investigate many precise effects, all of which will be either confusing or useful to precise surveys in the near future.
Christopher Stumm (Microsoft) and Lang are in town this week, to see if we can do reliable science with unreliable data. Stumm designed, wrote, and operates our Flickr pool in which we calibrate amateur astrophotography just as if it were professional data. Now we want to use those data as if they were professional data, since they certainly contain tons of information. But the challenge is accounting for the fact that data of unknown provenance will, in general, contain unknown artifacts, errors, and noise. We started on project specification this morning.
The plot below (made by Schiminovich and me today) shows the GALEX–SDSS colors of quasars with photometric redshifts between 2 and 3.5. The white points have spectroscopic redshifts in that same range; the cyan points don't. The vertical axis is a GALEX–SDSS color that separates them well. This might have implications for SDSS-3 quasar target selection, which takes place in regions of the sky with good GALEX coverage.
I got all fired up about decision theory this past weekend, and wrote something about it today, in the context of the Astrometry.net verify step. Here is the first paragraph:
All science is really decision theory, not inference! After all, the end result of an inference is a paper, and what one writes in the paper depends on the result of the inference, and not at all trivially. That is, the content of the paper is not simply a statement of the posterior distribution function (obviously). Despite the obviousness of this, most discussions of scientific data analysis end at (or even before) the inference step, and don't give quantitative guidelines for anything that follows, except perhaps for the construction of confidence intervals.
Many assume that I am a religious or dogmatic Bayesian, because I use Bayesian statistics. I am not; I use them because (a) they are not ad hoc, (b) they are justified in the context of probability theory, and (c) for many problems I do they perform better than the alternatives (see, for example, our April Fools' paper). Bovy and I spent some time today doing battle with Kyle Cranmer (NYU) on this point, which was productive inasmuch as we understood the differences between a Bayesian and frequentist approach, but unproductive in that Cranmer thinks he could do much better with a frequentist method (on our Fools' problem). We asked him to put up or shut up, as Gruzinov says. While we were arguing, who should walk in but Yuri Levin (Leiden), the author of the frequentist orbital roulette paper.
Barry Madore (OCIW) gave a very provocative talk about the possibility that there might be dark galaxies and then showed that after some pretty hard looking (using galaxy–galaxy interaction signatures), he can't find any. So they are rare, at the near-L-star level. Nice and fundamental!
Rix, Bovy, and I discussed re-doing the famous Oort problem on the vertical structure of the Milky Way disk; that is, inferring the density of the disk by simultaneously fitting for the position and velocity distributions of stars in the vertical direction. The differences now, relative to the 1930s, are that we have far more stars, far better measurements, and the ability to do the test over a finite fraction of the Milky Way disk. We might be able to see gradients and look for signatures of departure from axisymmetry, which would be qualitatively new.