...to do any research today. Today was catch-up day after sacrificing the last two weeks of my life to this amusement. So that's two Rules-breaking posts for two this week. Not a good start.
I spent all day at the Spitzer Science Center, reviewing their (still tentative) plans for a heritage-archive source list: a set of heterogeneous catalogs built from the imaging to support object-based and statistical searches of image contents. If they do this, they have to do it very much hands-free, because the data set is huge. That wouldn't be such a problem (SDSS was done hands-free), but of course they also have to deal with the fact that there are many different observing programs, exposure times, dither patterns, and field characteristics, including diffuse emission. The plans so far are excellent; they really impressed me, as they always do at the SSC. It got me thinking once again about my polemic on probabilistic catalogs and image modeling. (I think this post violates The Rules.)
Iain Murray debugged some of our old, wrong, and rejected methods in our dynamical inference project. This convinced us to re-try, and we now have a method and an answer we like much better. We frantically changed figures and text, added Murray as a co-author, and now we are done.
Schiminovich and I realized that we have a few very simple things we can work out using our GALEX photometry of SDSS photometric quasars, not the least of which is an estimate of the extragalactic ionizing radiation impinging on the outskirts of the Milky Way, broken down by redshift (and quasar luminosity, and other things). Schiminovich was appointed to draft paper 1.
Ben Johnson (Cambridge), Wu, Schiminovich, and I discussed the physical meanings of infrared lines. Johnson has run a large grid of models and can make plots that put Wu's line measurements into some kind of physical context.
While Bovy implemented as fast as he could, I wrote as fast as I could. Our deadline approaches rapidly, but we are still at sea. Useful comments came in from Iain Murray (Toronto), a colleague of Lang's and a super Bayesian.
Bovy and I realized that one way to cast the Bayesian orbital roulette problem is as a maximum entropy problem: If we really think the phases ought to be well mixed, we seek the solution for which there is maximum entropy remaining in the phase distribution. This is tantamount to saying that all phases are equally likely, it turns out.
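That last equivalence can be made explicit. Maximizing the entropy of the phase distribution, subject only to normalization, is (in my notation):

```latex
\max_{p}\; S[p] = -\int_{0}^{2\pi} p(\theta)\,\ln p(\theta)\,d\theta
\qquad \text{subject to} \qquad \int_{0}^{2\pi} p(\theta)\,d\theta = 1 .
```

Varying the Lagrangian $-p\ln p - \lambda\,p$ with respect to $p$ gives $-\ln p - 1 - \lambda = 0$, so $p(\theta)$ is a constant, namely $p(\theta) = 1/(2\pi)$: all phases equally likely.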
For the Astrometry.net project, Lang built an index of quads and stars from the 2MASS survey, to act as a comparison for the indices we built from USNO-B 1.0. He ran the 114 (out of 180,000) SDSS fields that failed on the USNO-B indices through the 2MASS indices and every single one of them calibrated successfully. This means that for SDSS r-band fields (55 sec exposures on a good 2.5-m telescope), the Astrometry.net system has (naively) a better than 99.999 percent success rate with no false positives. Although this is Lang's research and not mine, exactly, I am so stoked I don't know what else I would post about.
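The arithmetic behind that quoted rate, as a sanity check (all numbers are the ones in this post):

```python
total_fields = 180_000
usno_failures = 114      # SDSS fields that failed on the USNO-B indices
recovered = 114          # all of them calibrated on the 2MASS indices

combined_failures = usno_failures - recovered   # zero observed failures
# with zero failures the naive rate is 100 percent; even charging
# ourselves one hypothetical failure still bounds the success rate:
lower_bound = 1.0 - 1.0 / total_fields
print(f"{lower_bound:.5%}")   # 99.99944%, i.e. better than 99.999 percent
```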
It being Spring Break here at NYU, the plan was to get one paper done per day! So far: Four days, zero papers. Damn.
Bovy and I spent the day hacking and talking on the dynamical inference problem: Determine the parameters of the gravitational potential given a snapshot of test particle positions and velocities. As I have noted previously, this problem is ill-posed in general; it all comes down to priors. Today we decided to revive our old idea of the roulette prior, or the prior that leads to a flat posterior distribution for particle phases (in the language you can use for integrable potentials; there are generalizations for non-integrable potentials). We got talked out of that prior by Tremaine, but I now think for the wrong reasons; we certainly didn't understand the problem as well back then.
The Astrometry.net system works by generating likely hypotheses about each input image's astrometric calibration, and then testing those hypotheses with a verification process, which we have cast as a well-posed statistics problem in the formalism of Bayesian hypothesis testing. If we have this right, then the false-positive rate (which we want to have vanishingly small) is an explicit parameter in our system, not implicitly set by ad-hoc cuts or thresholds.
Unfortunately, our false-positive rate is orders of magnitude higher than it should be. This is because our well-posed statistics problem is an approximation to the problem we really need to solve. In detail, to do it right, we would need a generative model of the USNO-B Catalog (our basis for truth), the sky, and all sources of astronomical imaging. We are far from this, so we have to make approximations. Today Lang and I spent all day trying to improve our approximation. We didn't finish, but we learned a lot.
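To give the flavor of the verification step (this is a toy of my own construction, not the Astrometry.net likelihood): compare the hypothesis that detections scatter around catalog star positions with Gaussian errors against the hypothesis that they are scattered uniformly over the image.

```python
import math

def log_bayes_factor(n_matched, n_detections, area, sigma, match_dists):
    """Toy verification: log odds of H1 (image aligned with the catalog;
    each matched detection scatters around its predicted star position
    with Gaussian error sigma, in pixels) versus H0 (all detections
    uniform over the image area, in square pixels). All names and
    parameters here are illustrative."""
    logL1 = 0.0
    for d in match_dists:  # 2-D Gaussian at the predicted star position
        logL1 += -math.log(2.0 * math.pi * sigma**2) - d**2 / (2.0 * sigma**2)
    # unmatched detections fall back to the uniform background under H1 too
    logL1 += (n_detections - n_matched) * -math.log(area)
    logL0 = n_detections * -math.log(area)  # everything uniform under H0
    return logL1 - logL0

# three good matches in a megapixel image already give decisive odds:
print(log_bayes_factor(3, 10, 1024 * 1024, 1.0, [0.5, 0.3, 0.8]))
```

The point of casting it this way is that the tolerable false-positive rate becomes an explicit threshold on this log odds, rather than an implicit consequence of ad-hoc cuts.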
For space missions and any real-time electronics, bandwidth is limited. How do you set your bit depth or the relationship between your (limited) bits per pixel and the pixel intensity measurements to get the most information for your buck? Adrian Price-Whelan (NYU) and I worked on this problem all day today. Price-Whelan has some very nice results, based on simulated data, and the answer is, roughly, that you have transmitted all the information—even the information on sources much fainter than the noise—if your image noise is larger than about two bits.
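A quick simulation of the claim, with made-up numbers: quantize noise-dominated pixels with a step of half the noise sigma (so the noise spans a couple of bits) and check that the mean flux of a source far below the per-pixel noise survives the quantization.

```python
import random, statistics

random.seed(0)
sigma = 1.0
step = sigma / 2.0   # quantization step: the noise is two steps ("two bits")
flux = 0.1 * sigma   # a source well below the per-pixel noise
n = 200_000

raw = [flux + random.gauss(0.0, sigma) for _ in range(n)]
quantized = [round(x / step) * step for x in raw]   # crude uniform quantizer

# the noise dithers the quantizer, so the faint source's mean flux survives:
print(statistics.mean(raw), statistics.mean(quantized))   # both close to 0.1
```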
I spent most of the day with Zolotov, who is turning into a theorist under the tutelage of Willman (Haverford) and Brooks (Caltech). We discussed observational discriminants for halo formation scenarios.
One of the principal selling points of Astrometry.net is that it can calibrate imaging even in the face of artifacts, occlusion, and spurious signal. Lang and I pair-coded some enhancements along these lines today, including an extremely robust method for determining the noise level in an image of unknown provenance (for example that has not been flat-fielded or background-subtracted), and some post-processing of detected source lists to remove bad CCD columns (a common problem in raw data).
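The robust noise estimator can be sketched with the median-absolute-deviation trick (my sketch of the idea, not the actual Astrometry.net code):

```python
import random, statistics

def robust_sigma(pixels):
    """Estimate the Gaussian noise level of an image of unknown provenance
    using the median absolute deviation, which is insensitive to bright
    sources, bad columns, and other outliers."""
    med = statistics.median(pixels)
    mad = statistics.median(abs(p - med) for p in pixels)
    return 1.4826 * mad  # scale factor converts MAD to sigma for a Gaussian

random.seed(1)
pixels = [random.gauss(100.0, 5.0) for _ in range(10_000)]
pixels[:100] = [10_000.0] * 100   # contaminate with "sources" and artifacts
print(robust_sigma(pixels))       # close to 5.0 despite the outliers
```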
Inspired by our conversation with Ménard yesterday, I worked out the signal-to-noise with which we detect quasars of different visible properties in GALEX ultraviolet imaging. I find that we can go to about 26 mag in the ultraviolet with sets of 200 GALEX AIS-depth images. This bodes well for testing cosmic extinction.
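The depth arithmetic for background-limited coadds is simple (the 23.1 mag per-image depth below is a hypothetical number chosen only to illustrate the scaling, not a GALEX specification):

```python
import math

def stacked_depth(single_image_depth, n_images):
    """Depth reached by coadding n background-limited images at known
    source positions: S/N grows as sqrt(n), i.e. the limiting magnitude
    deepens by 2.5 * log10(sqrt(n)) = 1.25 * log10(n)."""
    return single_image_depth + 2.5 * math.log10(math.sqrt(n_images))

# with a hypothetical per-image depth of 23.1 mag, 200 images
# reach about 26 mag:
print(round(stacked_depth(23.1, 200), 1))  # -> 26.0
```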
At the end of a non-research day, Brice Ménard (CITA) swung by and had a long discussion with Moustakas, Bovy, and me about his very nice measurement of the galaxy–dust cross-correlation function. It was illuminating, and made me both more excited about, and perhaps a little less optimistic about, performing related experiments with GALEX. Ménard proposed some joint ventures and I agreed to them.
Bovy explained something to me about the roulette problem I was stressing about earlier in the week: If we don't have any prior information about the system, we certainly can't do inference. For example, we can't use the concept of orbital roulette if we don't think the system is bound. So we have to set up our inference such that the prior distribution function can include the information that the planets (or stars or whatever) are bound. Furthermore, we prefer dynamical solutions—in the context of roulette or mixed-phase—that have shorter dynamical times, because they will be more well mixed in angle.
This breakthrough (for me, at least) makes it clear that we can't have a Bayesian form of the problem that doesn't involve relatively complex priors, complex because the whole concept of roulette or mixed phase involves strong prior information: The information that the system is bound and long-lived (relative to its internal dynamical timescales).
I worked out the details (on paper) of two projects we are thinking about relating to dust attenuation. In the first, we look at stellar spectra in the SDSS as a function of Galactic reddening and determine the dust attenuation law at high resolution. In the second, we look at quasar photometry in the ultraviolet as a function of line-of-sight separation from galaxies of differing properties to get the dust–galaxy cross-correlation function.
Once again I got confused about our probabilistic inference translation of the frequentist orbital roulette technique for inferring the dynamical properties of systems for which you only have a kinematic snapshot. The key issue is that the frequentist gets to choose which statistic he or she wants to test; he or she only needs a statistic which will assign low likelihood to unlikely data sets given the assumptions or model. The Bayesian, on the other hand, must treat the observations as modifying prior probability distributions for parameters (or hypotheses) into posterior probability distributions. The only freedom is in choosing the prior, but that isn't even a freedom if you are a strict Bayesian (that is, if you use your prior probability distribution functions to accurately represent your true prior knowledge).
Now, if I could convince myself of the following I would be happy: When you see a snapshot of a dynamical system about which you know nothing except that it is long-lived, your true (I mean accurate) prior probability distribution over actions, angles (think phase space coordinates for an integrable dynamical system here), and dynamical parameters (think parameters of the gravitational potential, such as the total mass and dependence on radius here) is such that whatever values of position and velocity you observed, your Bayes-theorem-generated posterior probability distribution for the angles is nonetheless perfectly flat.
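Schematically (my notation, not anything from a paper): with observed data $D = (x, v)$, angles $\theta$, actions $J$, and potential parameters $\omega$, the condition I want is that the prior $p(\theta, J, \omega)$ be such that

```latex
p(\theta \mid D) \;=\; \int p(\theta, J, \omega \mid D)\, dJ\, d\omega
\;=\; \frac{1}{2\pi} \qquad \text{for all } \theta ,
```

whatever the data $D$ turn out to be.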
I have called this choice of prior the roulette prior, since if we adopt it, we get a very nice Bayesian formulation of orbital roulette. But I can't quite convince myself that this is really an accurate representation of one's prior knowledge.
Karl Gordon (STScI) came by on Friday to give the astrophysics seminar. He spoke about dust in star-forming regions and evidence for processing of the dust by massive stars. In conversations before his talk, Moustakas and I realized that we could measure the spectral dust law from SDSS spectra of stars; I did a quick-and-dirty job of that this weekend. I find that the dust law is in the data at huge significance. Now we are trying to figure out if it is worth doing right and publishing.
Among other things, Schiminovich and I wrote a recursive program to construct a hierarchical tree of voxels in a high-dimensional space. This is just to make a good binning of the quasars that we are using to study the clustering of extinction and Lyman-alpha opacity. I only mention it here because we found the logic of the recursion so challenging!
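The recursion goes roughly like this (a from-scratch sketch of the idea, not our actual code): split a D-dimensional box into 2^D half-size child voxels and recurse until each leaf is sparsely populated.

```python
import random

def build_tree(points, lo, hi, max_points=8, depth=0, max_depth=12):
    """Recursively split the D-dimensional box [lo, hi) into 2**D half-size
    child voxels until each leaf holds at most max_points points. (In very
    high dimensions 2**D children per node gets expensive fast; real code
    might split one dimension at a time, k-d-tree style.)"""
    if len(points) <= max_points or depth >= max_depth:
        return {"points": points, "children": None}
    D = len(lo)
    mid = [(a + b) / 2.0 for a, b in zip(lo, hi)]
    children = []
    for corner in range(2 ** D):   # one child voxel per orthant of the split
        clo = [mid[d] if (corner >> d) & 1 else lo[d] for d in range(D)]
        chi = [hi[d] if (corner >> d) & 1 else mid[d] for d in range(D)]
        inside = [p for p in points
                  if all(clo[d] <= p[d] < chi[d] for d in range(D))]
        children.append(build_tree(inside, clo, chi,
                                   max_points, depth + 1, max_depth))
    return {"points": [], "children": children}

def count_points(node):
    if node["children"] is None:
        return len(node["points"])
    return sum(count_points(c) for c in node["children"])

random.seed(2)
pts = [[random.random() for _ in range(3)] for _ in range(500)]
tree = build_tree(pts, [0.0] * 3, [1.0] * 3)
print(count_points(tree))  # every point lands in exactly one leaf: 500
```

The half-open voxel boundaries are what guarantee each point lands in exactly one child, which is one of the details that makes the recursion fiddly.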
Wu finished measuring the narrow emission lines in our Spitzer high-resolution mid-infrared spectroscopy of normal SDSS galaxies. It looks like the line phenomenology is going to be rich, and that we can measure molecular masses, star-formation rates, and the ionization spectrum independently of, or at least differently from, what we get from the visible SDSS spectra.
I spent the afternoon at the American Museum of Natural History, where I gave a seminar on our image-modeling and automated calibration stuff. Before and after, I talked to Sebastien Lepine about how he measures parallaxes, Mike Shara about his infrared survey of the entire Milky Way disk, Jackie Faherty about brown dwarfs, and Ben Oppenheimer and Doug Brenner about the issues of finding incredibly faint companions among the speckle noise in a coronagraph. Great day!
I spent the weekend and today working on the emission-line archetypes project, which makes use of binary programming. At the same time that I started to get concerned that this is the wrong approach (I have what may be a better method that makes use of density estimation with delta functions), I found that the engineering-grade binary programming solvers are awesome. Thanks, Roweis!
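To give the flavor of the binary-programming formulation (a toy with made-up one-dimensional "spectra"; the real problem goes to an industrial-strength solver): archetype selection is a minimum set cover, that is, choose the fewest archetypes such that every object lies within tolerance of at least one of them.

```python
from itertools import combinations

# made-up one-dimensional stand-ins for emission-line measurements
spectra = [0.0, 0.1, 0.9, 1.0, 2.0, 2.05]
tol = 0.15

# which objects each candidate archetype would cover
covers = [{j for j, s in enumerate(spectra) if abs(s - a) <= tol}
          for a in spectra]

def min_archetypes():
    """Smallest set of archetype indices covering every object.
    Brute force over subsets stands in for a real binary-program solver."""
    everything = set(range(len(spectra)))
    for k in range(1, len(spectra) + 1):
        for combo in combinations(range(len(spectra)), k):
            if set().union(*(covers[i] for i in combo)) == everything:
                return list(combo)

print(min_archetypes())  # three archetypes suffice for these made-up data
```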