I worked on astrometric
tweak today—the code that does precise astrometric calibration following rough calibration by our blind system at astrometry.net. Mierle convinced us at the astrometry.net meeting that we should be using the
RANSAC algorithm for tweak, and I have become even more convinced since then. RANSAC is much better than the astronomer's usual tool of iterative sigma-clipping, especially when (as Mierle advocates) the inlier/outlier decision is made by fitting a model to the residuals that consists of a Gaussian core of inliers and a flat distribution of outliers, superimposed.
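The inlier/outlier decision described above can be sketched as a two-component mixture fit. This is a minimal illustration of the idea (not the tweak code): a few EM iterations on one-dimensional residuals, with a Gaussian core for inliers and a flat density for outliers; the function name and parameters are my own.

```python
import numpy as np

def inlier_probabilities(r, width, n_iter=50):
    """Classify residuals r as inliers/outliers by fitting a superposition
    of a Gaussian core (inliers) and a flat density over [-width, width]
    (outliers), via EM.  Returns the posterior inlier probability of each
    residual.  A sketch of the idea, not the actual tweak code."""
    f, sigma = 0.5, np.std(r)              # initial inlier fraction, core width
    for _ in range(n_iter):
        # E-step: posterior probability that each residual is an inlier
        p_in = f * np.exp(-0.5 * (r / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
        p_out = (1.0 - f) / (2.0 * width)  # flat outlier density
        w = p_in / (p_in + p_out)
        # M-step: update the inlier fraction and the Gaussian width
        f = w.mean()
        sigma = np.sqrt(np.sum(w * r ** 2) / np.sum(w))
    return w
```

The appeal over sigma-clipping is that nothing is thrown away: every point gets a weight, and the width of the inlier core is estimated jointly with the classification.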
Jon Barron (UofT, Novartis) is in town and we worked on refining and writing up our work on cleaning the USNO-B1.0 catalog of spurious sources caused by diffraction spikes and reflection halos in the plate imaging. We have a very conservative system with very few free parameters.
I also fixed some bugs I had introduced into the very fast and reliable pipeline that takes an image (any image) and returns the x,y coordinates of all the stars.
Jim Peebles (Princeton) spent the day at NYU to work on a synthesis of galaxy evolution observations and predictions. We are trying to write a document that draws out tensions with the dominant (CDM) paradigm, and advocates new observations and new theoretical work, in the service of understanding galaxy evolution and the dynamics of the dark sector in the context of the standard model of cosmology (which is extremely well tested on large scales—scales much larger than galaxies). We ended up spending a lot of time talking about massive central black holes, whose abundances, masses, and locations in galaxies all are very constraining on the hierarchical picture of structure formation. If galaxies grow by merging, and the pre-merger galaxies contain black holes, then in general there ought to be non-central and ejected massive black holes. None have been observed, to my knowledge.
I spoke at group meeting about using statistical methods to measure proper motions for substructures at amplitudes below the measurement uncertainty for any individual star, and the possible application to the Sagittarius stream. In the afternoon, Zolotov and I worked on Zolotov's AAS poster, which goes up next week.
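The statistical trick in that talk is just the square root of N: averaging the noisy motions of many stars in a substructure beats the per-star uncertainty by a factor ~sqrt(N). A toy illustration with made-up numbers (2 mas/yr per-star error, 0.5 mas/yr bulk motion):

```python
import numpy as np

# Toy demonstration: recover a bulk proper motion well below the per-star
# measurement error by averaging over many member stars.  All numbers are
# invented for illustration.
rng = np.random.default_rng(1)
true_pm, sigma_star, n_stars = 0.5, 2.0, 1000       # mas/yr, mas/yr, count
measured = true_pm + sigma_star * rng.standard_normal(n_stars)
mean_pm = measured.mean()
mean_err = sigma_star / np.sqrt(n_stars)            # shrinks as 1/sqrt(N)
```

With 1000 members the error on the mean is ~0.06 mas/yr, an order of magnitude below what any individual star delivers; the hard part in practice is the correlated systematics, not the statistics.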
I spent the morning facilitating Moustakas's use of astrometry.net in a science project. He is stacking U-band images into a mosaic. We only needed to adjust two things. (1) We needed to insert a cosmic-ray rejection step into his pipeline. We hacked something together but also thought about how we might insert that into astrometry.net as a standard option. (2) We had to split his multi-extension FITS files into individual images. This is clearly not optimal, but the multi-extension files require a lot of technical infrastructure (tying together the tangent points for multiple arrays on the same focal plane, and fitting or solving multiple images simultaneously) that so far has been off the critical path.
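For item (2), the splitting step is mechanical. Here is a sketch of it using astropy.io.fits (an assumption on my part; the function name and output naming are invented, and the actual pipeline code lives elsewhere):

```python
from astropy.io import fits

def split_mef(path, prefix="ext"):
    """Split a multi-extension FITS file into one single-image FITS file
    per image-bearing HDU.  A sketch with astropy; the output naming
    convention here is made up."""
    out = []
    with fits.open(path) as hdus:
        for i, hdu in enumerate(hdus):
            if hdu.data is None:        # skip the bare primary header, etc.
                continue
            name = f"{prefix}{i:02d}.fits"
            # astropy fixes up the structural keywords (XTENSION -> SIMPLE)
            fits.PrimaryHDU(data=hdu.data, header=hdu.header).writeto(
                name, overwrite=True)
            out.append(name)
    return out
```

Note what this throws away: any shared-focal-plane information tying the extensions together, which is exactly the infrastructure described above as being off the critical path.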
In the early morning, I not only fixed all the bugs I created yesterday, but I also worked out some fundamental issues with doing source detection in images with very limited dynamic range, like jpegs off the web and scans of photographic plates. These issues are obvious but non-trivial: stars are subject to strong non-linearities, and at the bright end it is the size of the source that tracks the flux, not the peak value in the image (which has saturated). Of course these issues are known. What is not known is what to do, in general, with data of which you have little or no knowledge, and when extended sources can be as prevalent as stars. These are the conditions under which astrometry.net operates! Blanton helped me come up with some ideas.
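The size-tracks-flux point can be made concrete. A toy sketch (my own, not anything from the pipeline): for each connected region above threshold, report the peak if it is unsaturated, and fall back to the footprint area when the peak has clipped.

```python
import numpy as np
from scipy import ndimage

def brightness_proxy(img, saturation, thresh):
    """For each above-threshold connected region, return ('peak', value)
    if unsaturated, else ('size', area), since the clipped peak of a
    saturated star carries no flux information but its footprint still
    grows with brightness.  Illustrative sketch only."""
    labels, n = ndimage.label(img > thresh)
    out = []
    for i in range(1, n + 1):
        vals = img[labels == i]
        if vals.max() >= saturation:
            out.append(("size", vals.size))   # area stands in for flux
        else:
            out.append(("peak", float(vals.max())))
    return out
```

This only yields a rank ordering, not a calibrated flux, but a rank ordering is much of what blind astrometric matching needs.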
In the late morning and afternoon, Kallivayalil and I agreed to focus on getting a proper motion for Pal 5. This is a project that is hard, but not impossible (we hope), and finite. The first step is to gather all the HST data, survey data, plate data, and random ground-based data that we can find, and to turn those data into precise coordinate lists.
I spent the morning breaking software that has worked for months, by attempting to track down bugs in our simpleXY object-finding and measuring software for astronomical images. I failed to find the bugs, and left the software worse than when I started!
In the afternoon, Nitya Kallivayalil (Harvard) came into town and we discussed the issues involved in measuring statistical proper motions. She is the world's expert, because she has measured the proper motions of the LMC and SMC by comparing stars to quasars in HST images separated by two years. Now the question is: Can we do much more with heterogeneous data (which are worse than HST data) separated by much longer baselines? The issues are severe, especially when we think of the holy grails of the Sagittarius stream and other Milky Way substructure.
Zolotov and I worked on her visualizations of the SDSS data on the Sagittarius stream. As far as we can tell, the current models don't agree with the observations, and the observations in different kinds of stars (which have different systematic issues) don't obviously agree with one another. Zolotov is trying to put together an AAS poster that illustrates all this in a useful way.
Very early in the morning I did surgery on astrometry.net's awesome simpleXY code, which takes as input any astronomical image and returns a reasonable list of sources with x,y positions in the image and approximate fluxes. The code is incredibly simple (hence the name), incredibly fast, has very few free parameters, and essentially always works. It is Blanton's handiwork. It also produces very stable measures of object positions, thanks to astrometry-fu we inherited from the Sloan Digital Sky Survey. What I did was to make the code even simpler. Now I ought to write it up!
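The general shape of a detector in this family (emphatically not the actual simpleXY code, which lives in the astrometry.net codebase) is: lightly smooth, robustly estimate the noise, threshold, and keep local maxima.

```python
import numpy as np
from scipy import ndimage

def simple_xy(img, nsigma=8.0):
    """Sketch of a simpleXY-style detector: smooth, estimate the noise
    with the median absolute deviation, and report above-threshold local
    maxima as (x, y, flux) tuples.  The threshold and smoothing scale
    here are placeholder choices."""
    sm = ndimage.gaussian_filter(img, 1.0)
    sigma = 1.4826 * np.median(np.abs(sm - np.median(sm)))  # robust noise (MAD)
    mask = sm > np.median(sm) + nsigma * sigma
    peaks = (sm == ndimage.maximum_filter(sm, size=3)) & mask
    ys, xs = np.nonzero(peaks)
    return [(x, y, float(img[y, x])) for x, y in zip(xs, ys)]
```

The robust (median-based) noise estimate is what keeps the free-parameter count low: no per-image tuning is needed even when the pixel units are unknown.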
I spent the morning reading and digesting a nice paper by Storkey et al. It contains a lot of content that is very analogous to what we are doing with Stumm and Barron on cleaning USNO-B1.0 of diffraction and reflection artifacts. The main difference is that our stuff is highly specialized to USNO-B1.0, which makes our stuff less generally interesting, but much more sensitive to subtle, small, faint, or lightly populated artificial features.
While Barron soldiered on with enhancements to the automatic procedures for diffraction-spike and reflection-halo spurious-source detection, Stumm and I indulged in some mission creep: We looked at methods for identifying dense, linear features in the sky distribution of USNO-B1.0 sources that are not caused by diffraction spikes. Some of these features are caused by edge-on galaxies; others by incredibly unlikely coincidences of artifacts in multiple bands (inclusion in USNO-B1.0 required detection in multiple bands); and others by incredibly weird plates we don't understand. We didn't have much success with Hough-transform techniques, but we theorized a RANSAC-like approach that is promising.
Barron, Stumm, and I made plots of the colors and magnitudes of
stars identified as spurious in the USNO-B1.0 astrometric catalog on the basis that they form morphological crosses and circles centered on the bright stars. Interestingly, they have magnitudes and colors that are not unreasonable, so they could not be identified as spurious on a photometric basis. I presume this is why they made it into the catalog in the first place!
On Friday, Zaritsky (on sabbatical here at NYU) gave a nice talk on methods for determining the sizes of disks. It turns out, perhaps not surprisingly, that they go out a long way. In discussion at the end, it emerged that the "thin disk" test of CDM merger histories might be made stronger by looking at disks at large radius, since larger radii will be more susceptible to gravitational perturbations, and will also extend further into the substructure-filled dark-matter halo.
Today I did some research on issues with the USNO-B1.0 astrometric catalog. UofT undergrads Stumm and Barron are visiting next week to get a paper drafted on their use of computer vision techniques to improve the catalog. We are finding that a combination of computer vision and astronomical techniques can very reliably clean the catalog of a large subset of the non-real sources (which amount to a couple of percent of the 10⁹ entries in the catalog).
I have been traveling for the last two days, hence the lack of posts. I re-learned how to do SDSS CAS queries, to obtain a complete list of the science-grade images used to construct SDSS DR5. We are running astrometry.net on all of them to measure statistics (and build knowledge of the sky, of course). I also worked on various strategies for tweak. I have a very robust one (though it is still vapor-ware) that iteratively fits not the WCS mapping, but the residuals in the current best-fit mapping, and iterates until the residuals have no power at the scales at which we are fitting.
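The residual-fitting idea can be sketched in one dimension (this is my own toy stand-in for the WCS case, and since the strategy is still vapor-ware the real version will surely differ): repeatedly fit a low-order correction to the current residuals and fold it back into the model.

```python
import numpy as np

def iterative_residual_fit(x, y, order=3, n_iter=10):
    """Sketch of the residual-fitting strategy in 1-D: rather than refit
    the full mapping, fit a low-order polynomial to the current residuals
    and add the correction to the model, iterating until the residuals
    have no power at the scales the fit can remove."""
    model = np.zeros_like(y, dtype=float)
    for _ in range(n_iter):
        resid = y - model
        coeffs = np.polynomial.polynomial.polyfit(x, resid, order)
        model = model + np.polynomial.polynomial.polyval(x, coeffs)
    return model
```

For a purely linear least-squares fit like this one, a single pass already converges; the iteration earns its keep once each pass is robustified (RANSAC-style inlier weighting, say), which is the intended setting.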
One of our functional tests of astrometry.net is a run through an immense amount of SDSS data, looking to see what percentage we solve (>99 percent) and what percentage solve as false positives (we have never had one, so <3×10⁻⁵). I diagnosed a failure today and found this (small cutout shown below). It is an engineering-grade SDSS image, where the PSF is double-peaked, and different stars have different peaks dominant (some left, some right). Now that's what I call bad data. Can we be blamed for failing to solve that?
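That upper limit reads like the standard zero-failure binomial bound (the "rule of three": with no failures in n trials, the 95% upper limit on the rate is about 3/n, which for a <3×10⁻⁵ bound suggests on the order of 10⁵ clean solves; that trial count is my inference, not a stated number).

```python
def zero_failure_upper_limit(n_trials, confidence=0.95):
    """Upper limit on a failure rate after zero failures in n_trials:
    solve (1 - p)**n_trials = 1 - confidence for p.  For large n_trials
    this is approximately 3 / n_trials (the "rule of three")."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n_trials)
```

So the false-positive bound tightens linearly with every clean solve, which is one reason to keep feeding the whole survey through the system.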
In research time stolen from exam preparation and proctoring, I worked on trivial matters related to astrometry.net, including web pages, administration of the alpha test, meeting minutes, filing tickets, etc. We got the following in an email from alpha tester Chris Kochanek (OSU), who has been using the system to great effect:
"When I describe this [astrometry.net], all the observers want the code to install now. Congrats to all involved!" That improved my day.
In other news, UofT undergrads Jon Barron and Christopher Stumm have been looking at cleaning up the USNO-B1.0 catalog. Here is a plot of a HEALPix pixel of the sky, in which they have plotted the USNO-B1.0 catalog entries that have colors that aren't consistent with being (correctly measured) stars.
Woah did we work hard! I didn't have any time to even read email, let alone post. In what follows, recall (or learn) that astrometry.net solves the astrometry for an image blind by the following steps: It uses quads of stars to generate large numbers of hypotheses about pointing, rotation, and scale. It attempts to verify those hypotheses using a likelihood ratio (correct vs random lucky hit). Hopefully one verifies. It then tweaks the verified astrometric WCS to something precise.
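The verify step can be caricatured as a per-star log-likelihood ratio (this is a toy version of the idea only, not astrometry.net's actual verify, and every name and parameter below is mine): for each star position predicted by the hypothesized pointing, compare a true-match model (Gaussian of width sigma about the prediction) against a chance alignment (uniform over the field).

```python
import numpy as np

def log_odds(predicted, observed, sigma, field_area):
    """Toy hypothesis verification: sum, over predicted star positions,
    the log-likelihood ratio of the nearest observed star being a true
    match (2-D Gaussian about the prediction) versus a random lucky hit
    (uniform over the field).  Large positive total supports the
    pointing hypothesis.  A sketch of the idea only."""
    total = 0.0
    for p in predicted:
        d2 = np.sum((observed - p) ** 2, axis=1).min()   # nearest neighbour
        # log Gaussian match density minus log uniform chance density
        total += (-0.5 * d2 / sigma**2
                  - np.log(2 * np.pi * sigma**2)
                  + np.log(field_area))
    return total
```

The key property is that a single wildly discrepant predicted star costs a lot of log-odds, so random hypotheses die quickly while the correct one accumulates support from every matched star.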
On day 2, we worked very hard on tweak; we agreed on a scalar objective and an algorithm, and Mierle started hacking. Tweak is probably the biggest gap between where astrometry.net is and where it needs to be for its alpha testers. We discussed functionality for professional and amateur users, and came up with some ideas for a more modular system to give users more flexibility. We also made some breakthroughs in understanding some false positives (almost none of which are our fault, it turns out) and looked at the awesome assemblage of astrometric
footprints produced by David Warde-Farley. Full-team dinner was delicious.
On day 3, Lang convinced us that he has a much better verify than the current one, and we worked out the math and implementation. Lang implemented. We also talked with Christopher Stumm and Jon Barron about their automatic detection of diffraction spike "false stars" in the USNO-B1.0 catalog and how to evaluate their success using astronomical techniques. A paper about cleaning USNO-B1.0 has begun. Mierle continued to hack. I went home for much-needed sleep!