data mining

I started working this week with Vivi Tsalmantza (MPIA) on data mining in the SDSS spectra. She is starting with dimensionality reduction and classification. The standard tool is PCA, but it ranks the components in terms of their contribution to the data variance. This has two problems: first, in many data directions the variance is probably dominated by the errors, not by anything of scientific interest; second, astronomers don't necessarily care most about the data variance! But we came up with some ways to apply robust estimation techniques to the dimensionality reduction, and I have an evil plan of eventually performing the dimensionality reduction on the error-deconvolved underlying distribution. But that may not be possible, for all sorts of reasons.
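
A toy sketch of the first problem, under made-up numbers: two features, where the real signal lives in feature 0 and feature 1 is pure noise with a large but known error. Plain PCA ranks the noisy feature first; dividing each feature by its assumed-known noise sigma before PCA is one simple workaround. It is only an illustration, not the robust or error-deconvolved method discussed above. The function name `leading_component` and all the numbers are hypothetical.

```python
import numpy as np

def leading_component(X):
    """Return the unit-norm first principal component of the rows of X."""
    Xc = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]

rng = np.random.default_rng(0)
n = 2000

# Hypothetical toy data: unit-variance signal in feature 0 only.
signal = rng.normal(0.0, 1.0, size=n)
sigma = np.array([0.1, 5.0])          # assumed known per-feature noise sigma
X = np.zeros((n, 2))
X[:, 0] = signal
X += rng.normal(0.0, 1.0, size=(n, 2)) * sigma

# Plain PCA: the noise-dominated feature carries most of the variance.
pc_raw = leading_component(X)
# PCA after scaling each feature to unit noise variance.
pc_scaled = leading_component(X / sigma)

print("raw    PC1:", pc_raw)
print("scaled PC1:", pc_scaled)
```

The sign of each component is arbitrary, so compare absolute values: the raw component points almost entirely along the noisy feature, and the scaled one along the signal feature.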


source association theory

I started writing some theoretical stuff about source association. I guess this would qualify as theoretical data analysis. I don't know what could be more boring than that! I am trying to justify the position that source association across catalogs is an ill-posed problem with no well-justified (even within any reasonable approximation) solution to date. This is a bit hard to argue given that astronomers have been doing it successfully for the last hundred years.


figures done

Lang and I finished all the figures for the first draft of the faint-source proper motion paper today, and I finished a first draft of all of the figure captions, one of which is almost an entire page of text.


stellar streams

Sergey Koposov and I spent time talking about finding streams in the SDSS imaging, using things akin to matched filters. Matched filters are very frequentist; they involve differencing integrals of the data. I prefer similar methods that instead fit distributions to the data, which is more Bayesian. Either way, it is clear that there is a lot of information in the color–magnitude space that is complementary to the information in angle space, and there also appears to be information in the proper-motion space. I would like us to try using it all.


stellar stream EM

Lang and I had discussed with Rix and Sergey Koposov (MPIA) the statistical detection of the proper motion of a cold stellar stream using the proper motions from comparison of the SDSS with the USNO-B imaging. This looks possible because although no star in the stream is measured at high signal-to-noise, and although no star is clearly in the stream or not, there is power in numbers. Unfortunately, just as Lang and I were getting ready to bust out some expectation-maximization, Koposov obtained the proper motion of the stream in question by a completely straightforward frequentist analysis. Nice work!
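
For the record, here is a minimal sketch of the kind of expectation-maximization that "power in numbers" suggests. It is a 1-D toy with assumptions that are not from the real analysis: the stream is perfectly cold with a single true proper motion `m`, the background is a zero-mean Gaussian of fixed, known width `s_bg`, and every star has a known measurement error `sigma`. All names, units, and numbers are hypothetical.

```python
import numpy as np

def gauss(x, mean, var):
    """Gaussian density with the given mean and variance (elementwise)."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def em_stream_motion(mu, sigma, s_bg, n_iter=200):
    """EM for a two-component 1-D mixture of measured proper motions.

    Stream:     mu_i ~ N(m, sigma_i^2)            (cold stream, motion m)
    Background: mu_i ~ N(0, s_bg^2 + sigma_i^2)   (s_bg held fixed)
    Returns the estimated stream motion m and stream fraction f.
    """
    m, f = np.mean(mu), 0.5
    for _ in range(n_iter):
        # E-step: probability that each star belongs to the stream.
        p_s = f * gauss(mu, m, sigma ** 2)
        p_b = (1.0 - f) * gauss(mu, 0.0, s_bg ** 2 + sigma ** 2)
        r = p_s / (p_s + p_b)
        # M-step: responsibility-weighted, inverse-variance-weighted mean.
        w = r / sigma ** 2
        m = np.sum(w * mu) / np.sum(w)
        f = np.mean(r)
    return m, f

# Simulated toy catalog: per-star signal-to-noise of order one.
rng = np.random.default_rng(1)
n, f_true, m_true, s_bg = 5000, 0.3, 3.0, 10.0
sigma = rng.uniform(2.0, 4.0, size=n)
in_stream = rng.random(n) < f_true
truth = np.where(in_stream, m_true, rng.normal(0.0, s_bg, size=n))
mu = truth + rng.normal(0.0, 1.0, size=n) * sigma

m_hat, f_hat = em_stream_motion(mu, sigma, s_bg)
print("stream motion:", m_hat, "stream fraction:", f_hat)
```

No single star is assigned to the stream; each contributes according to its membership probability, which is the point of the method.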


more informative figure

Lang and I worked on figure making for our faint proper-motion project. Here is the current incarnation; it shows the best-fit path for the star, and a sampling of the error distribution as a set of N other paths. The faint disks show the star sizes and positions given the image point-spread functions and assuming the source is traveling on the best-fit path.



Today Lang and I worked on the plot that shows that our proper-motion measurement code comes close to saturating the information in the data.


hypothesis comparisons

After Lang gave the Galaxy Coffee talk at MPIA, there were lingering questions about the differences between modeling a little smudge in co-added multi-epoch data as a moving, unresolved star or as a non-moving, extended galaxy. In the co-add, these hypotheses are hard to distinguish, but in the individual images, these hypotheses are very different, even though the object may not be measured with good signal-to-noise at any epoch. We began the work of explicitly making the non-moving galaxy model, so that we can perform clean hypothesis tests and quell the last of our critics.
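
A 1-D toy sketch of why the individual epochs matter, under assumptions that are purely illustrative: Gaussian profiles, Gaussian pixel noise, and both models evaluated at fixed parameters rather than fit to the data (a real test would fit both). The static "galaxy" is given the width that matches the co-added trail of the moving star, so the two models look alike in the co-add. Every name and number here is hypothetical.

```python
import numpy as np

def profile(x, center, width, flux):
    """Gaussian source profile sampled at pixel centers x."""
    norm = flux / (np.sqrt(2.0 * np.pi) * width)
    return norm * np.exp(-0.5 * ((x - center) / width) ** 2)

def chi2(d, model, sigma):
    """Chi-squared of data d against model with per-pixel noise sigma."""
    return np.sum(((d - model) / sigma) ** 2)

rng = np.random.default_rng(2)
x = np.arange(-12, 13)                 # 1-D pixel grid
centers = np.linspace(-3.0, 3.0, 50)   # star position at each epoch (pixels)
psf, flux, noise = 1.5, 6.0, 1.0       # PSF width, total flux, noise sigma

# Model A: unresolved star moving across the epochs.
star = np.array([profile(x, c, psf, flux) for c in centers])
# Model B: static Gaussian galaxy with the same second moment as the trail.
gal_width = np.sqrt(psf ** 2 + np.var(centers))
galaxy = np.tile(profile(x, 0.0, gal_width, flux), (len(centers), 1))

# Simulated truth: the moving star, observed with noise at every epoch.
data = star + rng.normal(0.0, noise, size=star.shape)

# Comparison using every epoch separately.
delta_epochs = chi2(data, galaxy, noise) - chi2(data, star, noise)

# Comparison using only the co-add (noise adds in quadrature).
coadd_sigma = noise * np.sqrt(len(centers))
delta_coadd = (chi2(data.sum(axis=0), galaxy.sum(axis=0), coadd_sigma)
               - chi2(data.sum(axis=0), star.sum(axis=0), coadd_sigma))

print("delta chi2, per-epoch:", delta_epochs)
print("delta chi2, co-add:   ", delta_coadd)
```

A positive difference favors the moving star. In this setup the per-epoch difference should be much larger than the co-add difference, even though the star is only marginally detected in any single epoch.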


synthetic image modeling

Lang and I discussed once again the issue of matching up datasets at the catalog level, and learning thereby about the positions and motions of stars on the sky. Each of these discussions ends with the conclusion that working at the catalog level is so limiting that we want the images, so that we can work at the image level directly. However, today we realized that we could work with synthetic images, created from the catalogs and our model of the sky. The parameters of the model of the sky could be optimized to create synthetic images that best fit the synthetic images created from the set of catalogs. This got us substantially closer to the scalar objective we seek at catalog level.


saving dollars with astrometry

Lang and I have shown that we could have saved the very-red-objects community quite a bit of telescope time (read: money) by measuring proper motions of faint sources and thereby obviating a bit of spectroscopy. But Rix asked us if we could have saved them not just the spectroscopy but also the infrared imaging. Lang and I spent time on this question today. Not sure that we will be able to be so cocky here.


supernova rates, GAIA

Lang and I sat in on the MPIA GAIA group meeting. We discussed the photometric and spectroscopic identification of stars, binary stars, galaxies, and quasars in the GAIA data stream, and tests on the SDSS data and other related data. The GAIA team is using support vector machines, which also got Schiminovich and me excited last month; the GAIA team may be the main (or only?) users of SVMs in astronomy. It turns out there is work here that is similar to the archetypes project I have been pitching to Bovy.

In the late afternoon, Dani Maoz (Arcetri, Tel Aviv) gave a nice talk on supernova rates, focusing on Type Ia rates. He made a pretty good case that some of the Type Ia supernovae are prompt, and those that aren't prompt occur on the short side of the delay distributions that are discussed in the literature. This makes it interesting that galaxies show such clear alpha-enhancement patterns.


group finding

Lang and I pair-coded some web-based analysis of stars taken from the SDSS imaging sample, looking for groups that are plausibly tidally disrupted structures in the Milky Way halo. We didn't find anything, although we wrote code to automatically name it if we do!


nearby supernova, fast movers

Oliver Krause (MPIA) gave a beautiful talk about the discovery of light echoes from the Cas A supernova and its identification as a type IIb. This identification was performed by taking a spectrum of the original supernova, but delayed by 300 years because it is being observed now in reflection from a nearby dust cloud! The identification is remarkable, because as of now it appears that the Local Group is way over-represented in type IIb SNe.

Lang and I worked on the fast-movers in our faint proper-motion paper.


structure in point sets, brown dwarfs

The MPIA was abuzz today with talk about finding structures in point sets with non-trivial error properties. This, of course, relates to the identification of streams and satellites in the Milky Way halo, which has been an industry for the last few years, since the discovery of Willman 1 at NYU in 2005. Lang was consulted, as our computational statistics expert.

Lang and I also made the list of low-mass star (including brown dwarf) candidates from our proper motion work into a LaTeX table for publication.



Lang and I conceived of and started to work out a project to build an all-sky astrometric catalog out of the original USNO-B imaging catalogs and the 2MASS catalog, all tied to Tycho. This would be the first step towards our first Astrometry.net astrometric catalog. This would also be a first shot at doing source matching at the catalog level, a subject about which we have been talking for half a year. The first order of business, we realized, is to make a conceptual data model and a scalar objective function.