I tried to work out the limit on cosmic transparency implied by the accuracy of the COBE FIRAS experiment measurement of the blackbody spectrum. It provides an effective "Tolman test" because the experiment measures not just the spectral shape, but also the absolute amplitude of the intensity field.
I spent most of the day reading and summarizing published work on tidal streams as possible measuring devices for massive substructure in the Galaxy halo. So far I have found no analytic calculations, and no investigations of individual features in individual streams.
In conversation with Tsalmantza, I worked out a fully empirical methodology for constructing a set of spectral archetypes that fully represents all of the galaxy spectra in the SDSS. Before today, the only methodology I had involved theoretical spectral fitting. But the work we have been doing on constructing a reliable and useful PCA subspace may actually pay off (despite my dislike of PCA).
Many years ago (2003, maybe?), I worked out a maximum-likelihood method for choosing the best possible binning for a histogram of discrete data. It is based on leave-one-out cross-validation. This has slept in my CVS repository until today, when I was about to post it to arXiv. Of course, just as I was beginning to post it I found this paper, with which I don't totally agree but which is clearly highly relevant (along with a number of papers referenced within it), so I will post tomorrow.
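As a rough illustration of the idea, here is a minimal sketch of choosing a bin count by leave-one-out likelihood. It is not the code from the repository: the equal-width bins, the function names, and the smoothing parameter `alpha` (which keeps a point that sits alone in its bin from getting zero probability) are all assumptions made for this example.

```python
import numpy as np

def loo_log_likelihood(data, n_bins, lo, hi, alpha=1.0):
    """Leave-one-out log-likelihood of an equal-width histogram density.

    alpha > 0 is an assumed smoothing term added to every bin count.
    """
    counts, _ = np.histogram(data, bins=n_bins, range=(lo, hi))
    width = (hi - lo) / n_bins
    n_total = counts.sum()
    # Density seen by a point in bin k when that point itself is left out.
    loo_density = (counts - 1 + alpha) / ((n_total - 1 + alpha * n_bins) * width)
    occupied = counts > 0  # empty bins contain no points, so they add nothing
    return float(np.sum(counts[occupied] * np.log(loo_density[occupied])))

def best_bin_count(data, candidates, alpha=1.0):
    """Return the candidate bin count with the highest leave-one-out score."""
    lo, hi = float(np.min(data)), float(np.max(data))
    scores = [loo_log_likelihood(data, m, lo, hi, alpha) for m in candidates]
    return candidates[int(np.argmax(scores))], scores

# Toy usage on synthetic data.
rng = np.random.default_rng(0)
sample = rng.normal(size=500)
candidates = [1, 2, 5, 10, 20, 50, 200, 1000]
best, scores = best_bin_count(sample, candidates)
```

Too few bins are penalized because the density is too flat to follow the data; too many are penalized because each left-out point lands in a nearly empty bin.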
In separate conversations, Lang and I and Marshall and I have been talking about image modeling and deconvolution. I finally realized today something very fundamental about deconvolution: A deconvolved image will never have a well-defined point-spread function, because the high signal-to-noise parts of the image (think bright stars in astronomy) will deconvolve well, while the low signal-to-noise parts (think faint stars) won't. So the effective point-spread function will depend on the brightness of the source.
Properly done, image modeling or deconvolution in some sense maps the information in the image; it doesn't really make a higher-resolution image in the sense that astronomers usually use the word.
This all gets back to the point that you shouldn't really do anything complicated like deconvolution unless you have really figured out that for your very specific science goals, it is the only or best way to achieve them. For most tasks, deconvolution per se is not the best thing to be doing. Kind of like PCA, which I have been complaining about recently.
In other news, Kathryn Johnston (Columbia) agreed with me that my first baby steps on cold streams (very cold streams) are probably novel, though my literature search is not yet complete.
In yet other news, Surhud More and I figured out a correct (well, I really mean justifiable) Bayesian strategy on constraining the cosmic transparency in the visible, by marginalizing over world models.
I spent most of my research time today thinking about how to analyze large collections of images. Lang and I are coming around to a data compression framework: We add or change or make more precise model parameters (such as star positions and fluxes and adjustments to the PSF or flatfield) when adding or changing or making more precise those parameters reduces the total information content in (smallest compressed size of) the residuals by more than it costs us in an information sense (again, compressed size) to add to the parameters. This is data reduction.
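The accept-or-reject rule can be sketched in a few lines. Everything here is an assumption made for illustration: a toy one-dimensional image, Gaussian noise of known sigma, a Gaussian PSF at a known position, and a flat cost of `bits_per_param` bits per new parameter. A real scheme would compute the parameter cost from the precision actually needed.

```python
import numpy as np

def residual_bits(residuals, sigma):
    """Bits needed to encode Gaussian residuals of known sigma, up to a
    constant that is the same for every model of the same pixels."""
    return float(np.sum(0.5 * (residuals / sigma) ** 2) / np.log(2.0))

def fit_flux(data, psf):
    """Linear least-squares flux for a source of known position and PSF."""
    return float(np.sum(data * psf) / np.sum(psf ** 2))

def worth_adding(data, model_old, model_new, sigma, n_new_params,
                 bits_per_param=16.0):
    """Return (accept, bits_saved, bits_spent) for a proposed model change."""
    bits_saved = (residual_bits(data - model_old, sigma)
                  - residual_bits(data - model_new, sigma))
    bits_spent = n_new_params * bits_per_param  # assumed flat cost per parameter
    return bits_saved > bits_spent, bits_saved, bits_spent

# Toy one-dimensional "image": one star on a flat, zero background.
rng = np.random.default_rng(42)
pixels = np.arange(64)
psf = np.exp(-0.5 * ((pixels - 32.0) / 2.0) ** 2)
sigma = 1.0
image_with_star = 20.0 * psf + rng.normal(0.0, sigma, pixels.size)
image_blank = rng.normal(0.0, sigma, pixels.size)
empty_model = np.zeros(pixels.size)

# Charge two parameters (position and flux) for each proposed star.
star_result = worth_adding(image_with_star, empty_model,
                           fit_flux(image_with_star, psf) * psf, sigma, 2)
blank_result = worth_adding(image_blank, empty_model,
                            fit_flux(image_blank, psf) * psf, sigma, 2)
```

In this toy, the bright star saves far more bits in the residuals than its parameters cost, while a "star" fit to pure noise does not pay for itself.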
There is a fully worked-out theory of inference based on data compression; in fact to the extremists, the only probabilistic theory of inference associates probabilities with bit lengths of the model description (lossless compression) of the data stream. A beautiful (and freely available on the web; nice!) book on the subject is Information Theory, Inference, and Learning Algorithms by David MacKay.
For astronomical imaging, the best compression scheme ought to be a physical model of the sky, a physical model of every camera, and, for each image, its pointing on the sky, the camera from which it came, and residuals. The parameters of the sky model constitute the totality of our astronomical knowledge, and we can marginalize over the rest. I love the insanity of that.
I spent much of the day working on an old-school perturbation-theory calculation. I consider a cold tidal stream in a host galaxy potential, perturbed by the close passage of a point mass. In the limit of small perturbations of very cold streams, this calculation has only two free parameters: the angle between the direction of the stream and the velocity of the perturber (in the comoving frame of the stream or equivalent), and the time since the impulse (in some scaled time units related to the mass and velocity of the perturber or amplitude of the perturbation). This is all very idealized, but actually, to zeroth order, I think I may have exhaustively described all possible perturbations to cold streams. Now the idea is to use this to constrain the substructure in our Galaxy with observations of cold streams.
I finished the faint-source proper-motion paper. It still needs to be vetted by collaborators, but I am stoked. Here is the abstract:
The near future of astrophysics involves many large solid-angle, multi-epoch, multi-band imaging surveys. These surveys will, at their faint limits, have data on large numbers of sources that are too faint to detect at any individual epoch. Here we show that it is possible to measure in multi-epoch data not only the fluxes and positions, but also the parallaxes and proper motions of sources that are too faint to detect at any individual epoch. The method involves fitting a model of a moving point source simultaneously to all imaging, taking account of the noise and point-spread function in each image. By this method it is possible—in well-understood data—to measure the proper motion of a point source with an uncertainty (found after marginalizing over flux, mean position, and parallax) roughly equal to the minimum possible uncertainty given the information in the data, which is limited by the point-spread function, the distribution of observation times, and the total signal-to-noise in the combined data. We demonstrate our technique on artificial data and on multi-epoch Sloan Digital Sky Survey imaging of the SDSS Southern Stripe. With the SDSSSS data we show that with this technique it is possible to distinguish very red brown dwarfs from very high-redshift quasars and from resolved galaxies more than 1.6 mag fainter than by the traditional technique. Proper motions distinguish faint brown dwarfs from faint quasars with better fidelity than multi-band imaging alone; we present 16 new candidate brown dwarfs in the SDSSSS, identified on the basis of high proper motion. They are likely to be halo stars because none has a significantly measured parallax.
Continued writing on the faint-motion paper. It is funny how much there is left to do when a project is done!
Wu began working on the GALEX properties of post-starburst galaxies: They aren't detected at the MIS depth. That is good, because star-forming galaxies are detected. So a GALEX selection is likely to work, at some level. The question is, how good will it be? We would like to have a GALEX-based selection of the post-starburst galaxies so we can perform emission-line studies without worrying about the fact that the post-starbursts are selected on the basis of emission lines. Fortuitously, Christy Tremonti (Arizona) showed up at the MPIA today for a month, so she may be able to help out.
Bovy, with some help from Moustakas and me, got the galaxy-cluster transparency paper ready for resubmission in response to the referee. The referee really made a big difference to the paper, because he or she recommended averaging the samples in a better way, which improved the results. I, with help from Barron and Roweis, got the Blind Date paper (on estimating image dates using proper motions) ready for resubmission. And I, with help from Lang, have promised my co-authors I will get the paper on faint-source proper motions ready for submission by the end of the week. That end approaches fast.
I switched my search for low-kurtosis directions in spectrum space into a search for bimodal directions. That is, I wrote down a scalar (which has to do with k-means with k=2) that decreases as a distribution becomes more bimodal. Then I searched Tsalmantza's high-variance PCA components for directions in the space that are most bimodal. I find three mutually perpendicular bimodal directions! Of course each one is a different version of the red–blue galaxy bimodality, of which I have been an unheard critic. More on this as I understand it better.
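One scalar with this behavior, offered only as a guess at the kind of statistic meant, is the ratio of the best two-cluster within-cluster sum of squares to the total sum of squares of a one-dimensional projection. The function name and the exact definition are assumptions for this sketch. In one dimension the optimal 2-means solution is a single threshold split, so it can be found exactly by scanning the sorted values.

```python
import numpy as np

def two_means_ratio(x):
    """Within-cluster / total sum of squares for the optimal 1-D 2-means split.

    Smaller values mean a more bimodal distribution. A Gaussian gives
    roughly 0.36; two well-separated clumps give values near zero.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    total = np.sum((x - x.mean()) ** 2)
    csum = np.cumsum(x)
    csum2 = np.cumsum(x ** 2)
    k = np.arange(1, n)  # number of points in the left cluster
    # Sum of squares about the mean for each side of every possible split.
    left = csum2[k - 1] - csum[k - 1] ** 2 / k
    right = (csum2[-1] - csum2[k - 1]) - (csum[-1] - csum[k - 1]) ** 2 / (n - k)
    return float(np.min(left + right) / total)

# Toy comparison on synthetic one-dimensional samples.
rng = np.random.default_rng(1)
unimodal = rng.normal(0.0, 1.0, 2000)
bimodal = np.concatenate([rng.normal(-3.0, 1.0, 1000), rng.normal(3.0, 1.0, 1000)])
ratio_unimodal = two_means_ratio(unimodal)
ratio_bimodal = two_means_ratio(bimodal)
```

To score a direction in the PCA subspace, one would apply `two_means_ratio` to the projection of the component amplitudes onto that direction and search for directions that minimize it.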
Surhud More (MPIA) and I began working this week on the monopole term in the opacity of the Universe, using the consistency of baryon-acoustic and supernovae measures of the expansion history to check the phase-space conservation of photons. This test is (nearly) independent of world model, as it depends almost entirely on purely special-relativistic considerations. We hope to have an LPU (least publishable unit) on the subject soon.
Wu and I are back onto looking at the processes that lead to post-starburst galaxies, this time with Frank van den Bosch and Anna Pasquali here at MPIA. The first step was to create comparison samples to make null hypotheses; because the catalog we are using has redshift and flux dependencies, we built comparison samples to be exactly matched in redshift and brightness (stellar mass). Wu finished that today.
With Tsalmantza's help, I got the kurtosis minimization working on high-variance directions in the SDSS spectral space. In the minimal kurtosis directions (within the high-variance subspace), the star-forming and non-star-forming galaxies separate very clearly, and there are other tantalizing structures. I think this technique may have legs.
A few days ago I bashed PCA on various grounds, in particular that it ranks components by their contribution to the data variance, and it is rarely the data variance about which one cares. Today in discussions with Tsalmantza I realized that one could rank components by the kurtosis of their amplitudes (rather than the variance), and lowest first. This has a number of advantages, but one is that (uninteresting) data artifacts and outliers tend to create high-kurtosis directions in data space, and another is that if there are directions that are multi-modal, they tend also to be low in kurtosis (think the color distribution of galaxies, which is bimodal and low in kurtosis). It is still a very frequentist approach, but a search for minimal kurtosis directions in data space might be productive. Tsalmantza and I hope to give it a shot next week.
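A minimal sketch of the ranking step, assuming the data are rows of a matrix and the PCA is done by a plain SVD of the mean-subtracted data. The function names are made up for this example, and it only re-orders existing components; it does not search over general directions in the subspace.

```python
import numpy as np

def excess_kurtosis(amplitudes):
    """Excess kurtosis of each column (zero for a Gaussian)."""
    a = amplitudes - amplitudes.mean(axis=0)
    m2 = np.mean(a ** 2, axis=0)
    m4 = np.mean(a ** 4, axis=0)
    return m4 / m2 ** 2 - 3.0

def rank_components_by_kurtosis(data, n_keep):
    """Order the n_keep highest-variance PCA components by kurtosis, lowest first."""
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_keep]               # rows are unit-length directions
    amplitudes = centered @ components.T   # projection of each object
    kurt = excess_kurtosis(amplitudes)
    order = np.argsort(kurt)
    return order, kurt[order], components[order]

# Toy data with independent columns: Gaussian, bimodal, heavy-tailed.
rng = np.random.default_rng(2)
n = 5000
gaussian = rng.normal(0.0, 10.0, n)
bimodal = rng.choice([-5.0, 5.0], n) + rng.normal(0.0, 0.5, n)
heavy_tailed = rng.laplace(0.0, 1.0, n)
toy = np.column_stack([gaussian, bimodal, heavy_tailed])
order, kurt_sorted, comps = rank_components_by_kurtosis(toy, n_keep=3)
```

In the toy data the bimodal column is only the second-highest in variance, but it comes first in the kurtosis ranking, and the heavy-tailed column comes last.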
I made plots of my quasar–photon and white-dwarf–photon cross-correlations, and of the random samples that are supposed to be equivalent. There must be some kind of bug, because the random sample has a negative feature at the center! So I will spend the rest of this week de-bugging.
Schiminovich and I hope to detect intergalactic scattering with our quasar–photon cross-correlations. In order to make this detection, which will require precision, we need to create a differential experiment. The first difference is between mean quasars and mean white dwarfs. The white dwarfs are so close, they should have essentially no scattering (or scattering local to the observatory that is shared with the quasars). I made the white-dwarf–photon cross-correlations this weekend.
The mean white-dwarf image would probably be interesting in its own right if I broke down the white dwarfs by type and temperature, because the resulting mean images would provide extremely high signal-to-noise GALEX information. Does anyone want that?
Wrote code to combine the jackknife subsamples of my quasar–photon cross-correlation functions and visualize the output. The redshift-dependence of the ultraviolet flux from quasars was not as strong as I expected. Either I have a bug in my code or a bug in my thinking.
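For reference, a minimal sketch of one standard way to combine jackknife subsamples, assuming each row is the cross-correlation function measured with one sky region left out (a delete-one jackknife). The function name and the array layout are assumptions, not the code described above.

```python
import numpy as np

def combine_jackknife(estimates):
    """Mean and covariance from delete-one jackknife estimates.

    estimates has shape (n_subsamples, n_bins); row i is the measurement
    made with subsample i removed.
    """
    estimates = np.asarray(estimates, dtype=float)
    n_sub = estimates.shape[0]
    mean = estimates.mean(axis=0)
    dev = estimates - mean
    # The (n - 1) / n prefactor accounts for the strong overlap between
    # delete-one subsamples.
    cov = ((n_sub - 1) / n_sub) * (dev.T @ dev)
    return mean, cov

# Tiny worked example with three subsamples and two bins.
jk_mean, jk_cov = combine_jackknife([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
jk_err = np.sqrt(np.diag(jk_cov))  # per-bin error bars for plotting
```

The square roots of the diagonal of the covariance give the per-bin error bars to plot.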
I got back up to speed on the quasar–photon cross-correlation, splitting the SDSS photometric quasar sample by redshift. This is the first test: Does the ultraviolet flux depend on redshift as we expect? It ought to drop out at the redshifts at which the two GALEX bandpasses cross Lyman-alpha and the Lyman limit. Hope to have results for tomorrow.