Roweis, Lang, and I argued loudly for quite a while today about what quads (sets of four stars) we should index in the hypothesis generator for our blind astrometric calibration system. The strange thing about our argument is that we all agreed that we should index all quads (not just quads in some restricted subspace of configurations), but we disagreed mightily on why we should. Mierle seemed to be of the opinion that this was all pretty academic, but then we reminded him that we are all academics.
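To make "indexing a quad" concrete, here is a minimal Python sketch of a similarity-invariant hash code for four stars. It is modeled loosely on the geometric-hash scheme described in the published Astrometry.net papers, which is an assumption here, since this post doesn't spell the scheme out. The function name `quad_code` is hypothetical. The most widely separated pair (A, B) defines a frame with A at (0, 0) and B at (1, 1), and the code is the position of the other two stars in that frame.

```python
import math
from itertools import combinations


def quad_code(stars):
    """Return a 4-number hash code for four distinct (x, y) star positions.

    Sketch only: the most widely separated pair (A, B) defines a frame in
    which A sits at (0, 0) and B at (1, 1); the code is the position of the
    other two stars (C, D) in that frame, so it is unchanged by translation,
    rotation, and uniform scaling of the image.
    """
    if len(stars) != 4:
        raise ValueError("a quad has exactly four stars")
    # Choose the most widely separated pair of stars as A and B.
    i, j = max(combinations(range(4), 2),
               key=lambda ij: math.dist(stars[ij[0]], stars[ij[1]]))
    za, zb = complex(*stars[i]), complex(*stars[j])
    pairs = []
    for k in range(4):
        if k in (i, j):
            continue
        # Similarity transform via complex arithmetic: A -> 0, B -> 1 + 1j.
        w = (complex(*stars[k]) - za) / (zb - za) * (1 + 1j)
        pairs.append((w.real, w.imag))
    # Order C and D by x so the code doesn't depend on their input order.
    # (This sketch does not break the A <-> B swap symmetry.)
    pairs.sort()
    return tuple(v for pair in pairs for v in pair)
```

In a full system, codes like these for catalog quads would go into a spatial index (a kd-tree, say), and each quad found in an image becomes a lookup; which quads to put in that index is exactly what we were arguing about.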
I spent the day in the Google cafeteria telling Lang everything I know about astronomy, in part to help him rewrite our paper for Science and in part to help him plan his PhD dissertation. Both will be mainly about computer science, but they do touch significantly on astronomy.
The whole Astrometry.net core team unwittingly celebrated the third birthday of this blog on Sunday by discussing the possible public beta launch of our web services. The way we launch and the functionality we preserve and support in beta depends strongly on what we decide is important scientifically, that is, in the support of science with the astrometrically calibrated output of the services. So we spent a long time arguing about that.
I spent today working on the tech report describing our progress to date.
Roweis and I spent time going over Barron's work on the blind (that is, with no first guess) measurement or calibration of the date at which an astronomical image was taken. We worked on the question: which stars in the image contain most of the information about the date?
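One way to make "most of the information about the date" precise is Fisher information under a deliberately simple, hypothetical model (not necessarily Barron's): each star sits at a known reference position plus a known proper motion times the elapsed time, and is measured with isotropic Gaussian position errors. Then each star contributes |mu|^2 / sigma^2 of information about the epoch, so fast-moving, well-measured stars dominate. The function names below are illustrative only.

```python
import math


def date_information(stars):
    """Per-star Fisher information about the observation epoch t.

    Hypothetical model: star n sits at x0_n + mu_n * (t - t0), with known
    reference position x0_n and proper motion mu_n, observed with isotropic
    Gaussian position errors sigma_n. Its information is |mu_n|^2 / sigma_n^2.
    stars: iterable of (mu_x, mu_y, sigma), e.g. in mas/yr and mas.
    """
    return [(mx**2 + my**2) / s**2 for mx, my, s in stars]


def most_informative(stars):
    """Indices of stars sorted from most to least informative about the date."""
    info = date_information(stars)
    return sorted(range(len(info)), key=lambda n: info[n], reverse=True)


def date_uncertainty(stars):
    """Cramer-Rao lower bound on the date uncertainty (time unit of mu)."""
    total = sum(date_information(stars))
    return math.inf if total == 0 else 1.0 / math.sqrt(total)
```

Under these assumptions a single star moving 100 mas/yr, measured to 10 mas, pins the date to about 0.1 yr on its own, while a star with no proper motion contributes nothing. The model ignores parallax and catalog proper-motion errors, both of which matter in practice.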
I hacked away at my statistical test of the Astrometry.net automated calibration system, and a bunch of stuff relating to Picasaweb, Astrometry.net, and Google Sky. Ryan Scranton (Google) showed up and we talked SDSS imaging.
I spent most of the day setting up and running a huge test of our automated calibration on a large set of heterogeneous images, most (but not all) of which are pictures of the night sky. This test will inform us about the statistics of our calibration system, test our code, classify a large number of images into "night sky" or "not", and, of course, calibrate them.
Here is a beautiful ill-posed statistics problem in astronomy that comes up over and over again: Image A in bandpass B with point-spread function C overlaps image D in bandpass E with point-spread function F. Which sources in image A match which sources in image D? When the bandpasses and point-spread functions become arbitrarily different (for example, when you compare an HST image to a SCUBA image), the problem becomes hard.
Not just hard, but seriously ill-posed. My position is that all important questions in science are ill-posed questions. Roweis and I spent some time discussing how to convert this problem into a well-posed statistics problem and then how to solve that well-posed problem. This is on the critical path towards building a generative model of every astronomical image ever taken.
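As a baseline for what a well-posed version might look like, here is a minimal sketch of one standard, positions-only approach: a likelihood ratio comparing "same source" (the offset is drawn from a 2D Gaussian set by both images' position errors) against "chance alignment" (an unrelated source at a uniform surface density). This is not the method Roweis and I discussed, and it ignores the differing bandpasses and point-spread functions, which are exactly what makes the real problem hard. Function names and units are illustrative assumptions.

```python
import math


def match_likelihood_ratio(dx, dy, sigma_a, sigma_d, density):
    """Likelihood ratio: same source vs. unrelated background source.

    dx, dy: positional offset between the two detections (arcsec).
    sigma_a, sigma_d: per-axis Gaussian position errors in images A and D (arcsec).
    density: surface density of unrelated sources in image D (per arcsec^2).
    """
    s2 = sigma_a**2 + sigma_d**2
    # Same source: the offset is drawn from a 2D Gaussian with variance s2 per axis.
    p_same = math.exp(-(dx**2 + dy**2) / (2.0 * s2)) / (2.0 * math.pi * s2)
    # Chance alignment: an unrelated source lands there with uniform density.
    return p_same / density


def best_matches(sources_a, sources_d, density, threshold=1.0):
    """Greedy one-way match: each A source gets its best D source if LR > threshold.

    Sources are (x, y, sigma) tuples. Returns {index_in_a: index_in_d}.
    Brute force, O(len(sources_a) * len(sources_d)).
    """
    matches = {}
    for i, (xa, ya, sa) in enumerate(sources_a):
        best_j, best_lr = None, threshold
        for j, (xd, yd, sd) in enumerate(sources_d):
            lr = match_likelihood_ratio(xd - xa, yd - ya, sa, sd, density)
            if lr > best_lr:
                best_j, best_lr = j, lr
        if best_j is not None:
            matches[i] = best_j
    return matches
```

A generative treatment would replace the positions-only likelihood with a model that predicts each image's pixels from a shared set of sources, but that is well beyond a sketch.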
I worked on the Astrometry.net project book introduction, drawing on how my thinking has evolved while I have been here on the West Coast, in SF and at UW.
With Zolotov I began looking at the relationships between the spatial positions (orbits) of halo stars and the progenitor galaxies from which they were accreted—in simulations of Milky-Way-like galaxies.
I am now, officially, a web 2.0 hacker. With some help from Roweis and the beautiful Picasaweb API, I wrote bash scripts to upload thousands of images to a Picasaweb account, and tag those images with relevant meta-data. If you want my scripts for your own use (upload a directory of JPEGs from the command-line!), just email me.
I spent the day at the University of Washington, where, among other things, I pitched Stumm's and my idea of the open-source sky survey (OSSS). There is a lot to be done, but if we can do it, the OSSS could, in principle, operate itself. I also spent time with Dalcanton and Ivezic talking about astrometry, and with Connolly about galaxy evolution.
I learned about the world of geotagging, photo-sharing, and community map-making with photos, especially with respect to the site Panoramio. It is a great site: thanks to moderation, any view of the Earth always shows you the most interesting photos within that view. However, its API doesn't appear to do what we need right now.
We learned today that on Picasaweb, tags are searchable but comments are not, and that while any user can write comments, only the image owner can write tags. This throws a bit of a wrench into the simplest of our strategies for building an astrophotography collaboration site out of Picasaweb. But we were not daunted. Flickr is a bit better: it has a defined format for machine tags, designed for exactly this kind of thing, and image owners can give others permission to tag their images. Roweis and I wrapped our heads around all this and worked on Astrometry.net's API for interacting with web-based photo-sharing sites.
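Flickr machine tags take the form `namespace:predicate=value`. Below is a minimal sketch of writing and reading calibration results in that form. The validity rule encoded in the regular expression (namespace and predicate start with a letter and use only ASCII letters, digits, and underscores) reflects my understanding of Flickr's documented rules and should be checked against them; the `astrometry` namespace and the `ra`, `dec`, and `pixscale` predicates are hypothetical, not an established vocabulary. Quoting of values containing spaces is omitted.

```python
import re

# Machine tags have the form namespace:predicate=value.
_MACHINE_TAG = re.compile(r"^([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)=(.+)$")


def make_machine_tag(namespace, predicate, value):
    """Build a machine tag string, refusing anything that doesn't fit the pattern."""
    tag = f"{namespace}:{predicate}={value}"
    if not _MACHINE_TAG.match(tag):
        raise ValueError(f"not a valid machine tag: {tag!r}")
    return tag


def parse_machine_tag(tag):
    """Return (namespace, predicate, value), or None for an ordinary tag."""
    m = _MACHINE_TAG.match(tag)
    return m.groups() if m else None


def calibration_tags(ra_deg, dec_deg, scale_arcsec_per_pix):
    """Hypothetical tags recording an image's field center and pixel scale."""
    ns = "astrometry"  # hypothetical namespace
    return [
        make_machine_tag(ns, "ra", f"{ra_deg:.5f}"),
        make_machine_tag(ns, "dec", f"{dec_deg:+.5f}"),
        make_machine_tag(ns, "pixscale", f"{scale_arcsec_per_pix:.3f}"),
    ]
```

For example, `calibration_tags(83.82208, -5.39111, 1.5)` gives `astrometry:ra=83.82208`, `astrometry:dec=-5.39111`, and `astrometry:pixscale=1.500`, which stay searchable as tags while remaining machine-readable.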
Roweis and I had a very long discussion about ways to have existing web-based photo-sharing and web-publishing sites handle the data for Astrometry.net. There are huge numbers of issues, not the least of which is that the opportunities for storing meta-data are very limited. However, if we can work something out, we win big because maintaining high-bandwidth servers is not a problem of great intellectual interest to our team.
I actually did some work at the Google booth helping people to operate Sky. The day was filled with discussion about how to make Sky more useful as a scientific tool, by, for example, creating mash-ups with data archives. The nice thing is that Sky has an easy API, so this can be done by third parties (such as myself). One problem with trying to make progress at a meeting is that usually all you can do is plan more meetings.
I learned stuff about APOGEE (a part of SDSS-III) from, among other people, Nick Konidaris (UCSC) and Ricardo Schiavon (UVa). It will obtain high-precision velocities and information about some dozen elements for 100,000 stars with an R = 20,000 infrared spectrograph. This really provides qualitatively new information about the Milky Way.
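For scale, here is a back-of-envelope conversion of the quoted resolving power into the velocity width of one resolution element, using the standard definition of R:

```latex
R = \frac{\lambda}{\Delta\lambda}, \qquad
\Delta v \approx \frac{c}{R}
= \frac{3.0 \times 10^{5}\ \mathrm{km\,s^{-1}}}{2.0 \times 10^{4}}
\approx 15\ \mathrm{km\,s^{-1}}
```

Velocity precisions far finer than one resolution element are routinely achieved by fitting or cross-correlating many spectral lines at once.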
Today was my first full day at the AAS meeting in Austin. I spent the whole day with the Google Sky team at the well-appointed Google room at the meeting. Google Sky launched some new features today, and gave some very well-attended presentations about how it works and how to use it for research and education. With the Sky team and with many AAS members, we discussed Sky and Earth imaging, adding content with tools like Astrometry.net, making PNG images from data, and, inevitably, the OLPC XO, which I brought along.
My ideas about making human-viewable images out of scientific data in a quasi-reversible way started a big discussion on the Astrometry.net internal email lists. It is not possible to make human-viewable images that both retain the full dynamic range of a typical astronomical image and look good. In most cases you can't even do one of the two, and you can essentially never do both. So I have to choose between insane ugliness and bad irreversibility. Or else we just punt on Flickr and Picasaweb and move to archives, like MAST, that can take FITS files.
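To illustrate the trade-off, here is a minimal sketch of one common choice of quasi-reversible mapping, an asinh stretch: roughly linear near the sky level and roughly logarithmic for bright pixels, quantized to integer levels. The parameter names `sky`, `softening`, and `max_value` are hypothetical, and inverting the mapping assumes those parameters are stored somewhere alongside the image. Clipping below the sky and above `max_value` is where information is irreversibly lost, and fewer bits mean coarser quantization. Writing the actual PNG file is omitted (PNG does support 16-bit grayscale).

```python
import math


def asinh_stretch(values, sky, softening, max_value, bits=8):
    """Map data values to integer levels 0 .. 2**bits - 1 with an asinh stretch.

    Near the sky level the mapping is roughly linear; far above it, roughly
    logarithmic, so faint and bright pixels both get usable precision.
    Values below sky or above max_value are clipped, which is irreversible.
    """
    levels = 2**bits - 1
    top = math.asinh((max_value - sky) / softening)
    out = []
    for v in values:
        u = math.asinh((v - sky) / softening) / top  # 0 at sky, 1 at max_value
        out.append(min(levels, max(0, round(u * levels))))
    return out


def asinh_unstretch(pixels, sky, softening, max_value, bits=8):
    """Approximately invert asinh_stretch, given the same stored parameters."""
    levels = 2**bits - 1
    top = math.asinh((max_value - sky) / softening)
    return [sky + softening * math.sinh(p / levels * top) for p in pixels]
```

With 16 bits, values between the sky and `max_value` round-trip to a fractional error of order 1e-4 in this sketch; with 8 bits the same mapping is far coarser, which is the "bad irreversibility" side of the choice.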
On the way to Austin, TX for the AAS meeting I completed my planning for my FITS-to-PNG converter. Now I just have to find all the open-source Python libraries I need.
Fergus and I finished our NSF pre-proposal today.
At Google, Roweis and I discussed the possibility of using Flickr and Picasaweb as front ends for Astrometry.net, thereby offloading image storage expenses to existing (and popular) web services. The main issue is that these services only accept human-viewable image formats like PNG and TIFF, not scientific data analysis formats like FITS. So I worked on methods for conversion that preserve the most scientific information possible.
At the urging of Mierle, I started the Astrometry.net project book on the way to California (I am spending January at Google here in San Francisco). I wrote about the use of amateur and historical data to improve our measurements of stellar proper motions.
It feels like I have spent all year writing proposals (well, I have if you count from 2008-January-1, but I mean the academic year). My conversation yesterday with Fergus was motivated by the possibility that we might write a proposal together. I spent today writing it.
I spent some time this afternoon discussing possibilities for interdisciplinary research with Rob Fergus (NYU) in the Computer Science Department. We decided that there are many points of intellectual overlap, especially that we work on images, that we work on brute-force methods with enormous numbers of images (and large amounts of disk), that we can't trust the sources of our data and we face unreliable meta-data, that we need methods that scale to problems where the number of images is in the billions (that is, we need better than linear scaling), that we are trying to build models of the image universe directly from the data and not from a heavy theoretical or conceptual framework, and that we want to make systems that just work, like the web services of Astrometry.net.
The differences, of course, are that my problems are very specific while Fergus's are very general. His problems are harder, in this sense. On the other hand, the precision and accuracy and false-positive constraints on my problems are much more severe. So there are significant differences in what we are doing. The challenge is to build an interdisciplinary program that benefits from the overlap, but produces useful output in both scientific domains.