Today Bovy finished his paper on dynamical inference based on Galactic masers, presented it at MPIA Galaxy Coffee, and submitted it to the ApJ. He shows that the Reid et al. masers can be used to constrain the Galactic potential, provided that sufficient prior information is supplied about the position of the Sun in the Galaxy and about symmetries in the maser phase-space distribution function. The issue is that the masers do not represent an angle-mixed population, but nor can they be assumed to travel on circular orbits (they are observed not to). So you have to assume something in order to get the dynamical inference going. The masers are measured very precisely, so in principle they could have a lot to say about the Milky Way. However, most of the information in the observations is used to infer things about the maser phase-space distribution function, the parameters of which are (for our purposes) nuisance parameters and must be marginalized away. Look for it on the arXiv on Monday.
Lang and I continued to write code related to searching for overdensities in high-dimensional astronomical data. Bovy continued to write his paper on dynamical inference from Milky Way masers—the material he talked about at the Leiden meeting a few weeks ago.
Lang and I got up early and pair-coded the "arbitrarily covariant and heterogeneous errors in both dimensions" case of fitting a line, with outlier rejection. I then showed this figure in the afternoon in the MPIA Hauskolloquium. My points of greatest emphasis were:
- If you want to say you have the "best fit" model, then your model parameters had better optimize a justified, scalar objective function. I mean "scalar" both in the sense of "single-valued" and in the sense of "respecting relevant symmetries".
- When you can create a generative model for your data, inference proceeds by maximizing the likelihood (or, better, sampling the posterior probability distribution function). You have no freedom about this; fitting does not involve much choice, at least at the conceptual level.
- Markov chain Monte Carlo methods (in particular with the Metropolis algorithm) are very easy to implement and run, and they work even on non-linear problems, explore multiple local minima, and automatically provide marginalizations over nuisance parameters.
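The points above (a justified scalar objective, a generative model for the data, and Metropolis MCMC for sampling and marginalization) can be illustrated with a minimal sketch. It is a simplified stand-in, not the fitting-a-line code itself: it assumes errors only in y (not the arbitrarily covariant two-dimensional case), a hypothetical two-component mixture for outliers (a "good" line component plus a broad Gaussian for bad points), flat priors inside arbitrary bounds, and made-up synthetic data. The names `ln_posterior` and `metropolis` are illustrative.

```python
import numpy as np

def ln_posterior(theta, x, y, sigma_y):
    """Log posterior (up to a constant) for y = m*x + b with an outlier mixture.

    theta = (m, b, p_bad, mu_bad, ln_v_bad). Good points scatter about the line
    with their quoted errors; bad points come from a broad Gaussian. Priors are
    flat inside loose bounds (an assumption made for this sketch).
    """
    m, b, p_bad, mu_bad, ln_v_bad = theta
    if not (0.0 < p_bad < 1.0 and -10.0 < ln_v_bad < 20.0):
        return -np.inf
    var_good = sigma_y**2
    var_bad = sigma_y**2 + np.exp(ln_v_bad)
    ln_good = np.log1p(-p_bad) - 0.5 * ((y - m * x - b)**2 / var_good
                                        + np.log(2.0 * np.pi * var_good))
    ln_bad = np.log(p_bad) - 0.5 * ((y - mu_bad)**2 / var_bad
                                    + np.log(2.0 * np.pi * var_bad))
    # Per point, log(good + bad): the good/bad labels are summed out analytically.
    return np.sum(np.logaddexp(ln_good, ln_bad))


def metropolis(ln_post, theta0, step, n_steps, rng, args=()):
    """Plain Metropolis sampler with an axis-aligned Gaussian proposal."""
    theta = np.asarray(theta0, dtype=float)
    lp = ln_post(theta, *args)
    chain = np.empty((n_steps, theta.size))
    n_accept = 0
    for i in range(n_steps):
        proposal = theta + step * rng.standard_normal(theta.size)
        lp_new = ln_post(proposal, *args)
        # Accept with probability min(1, exp(lp_new - lp)).
        if np.log(rng.uniform()) < lp_new - lp:
            theta, lp = proposal, lp_new
            n_accept += 1
        chain[i] = theta
    return chain, n_accept / n_steps


# Hypothetical synthetic data: a line plus a few gross outliers.
rng = np.random.default_rng(42)
x = np.sort(rng.uniform(0.0, 10.0, 40))
sigma_y = np.full_like(x, 0.5)
y = 2.0 * x + 1.0 + sigma_y * rng.standard_normal(x.size)
y[::10] += 15.0

m0, b0 = np.polyfit(x, y, 1)  # start from an ordinary least-squares fit
theta0 = [m0, b0, 0.2, np.mean(y), np.log(np.var(y))]
step = np.array([0.02, 0.1, 0.02, 0.5, 0.2])
chain, acc = metropolis(ln_posterior, theta0, step, 15000, rng,
                        args=(x, y, sigma_y))

# Marginalizing over the nuisance parameters = ignoring their columns.
burned = chain[5000:]
m_mean, b_mean = burned[:, 0].mean(), burned[:, 1].mean()
print(f"acceptance {acc:.2f}; m = {m_mean:.3f} +/- {burned[:, 0].std():.3f}, "
      f"b = {b_mean:.3f} +/- {burned[:, 1].std():.3f}")
```

Marginalizing over the nuisance parameters (outlier fraction, outlier mean and variance) amounts to looking only at the slope and intercept columns of the chain; the per-point good/bad labels never appear because they are summed out in the likelihood.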
Lang and I pair-coded some analysis and plots for the fitting-a-line document, about which I am giving an informal talk tomorrow here at MPIA. We re-re-discovered just how easy MCMC is, and how useful it is when you have nonlinear problems with parameters that you need to marginalize out. Rix, Lang, Bovy, and I had many conversations today about what is best communicated in a short seminar on this subject. We have been polling the crowd at MPIA and there are as many issues to address as people we have interviewed.
T. J. Cox (Harvard) gave a nice seminar at the MPIA today about merging disk galaxies in realistic simulations. He does a nice job of creating a huge diversity of spheroidal galaxies, with gas content and/or dissipation one of the dominant parameters. He (indirectly) confirms my methodologies for constraining the galaxy–galaxy merger rate and growth rate of spheroidal galaxies.
Today Lang and I ruled out the streams we have been finding in 5D this week. There is structure in the data that is almost certainly systematic, not real.
Lang and I ran some friends-of-friends group finding in 5D phase space, continuing our search for substructure. We have tons of tantalizing structures, but we feel so confident that they are not real that we are assigning ourselves the job of ruling them out, not the job of confirming them. We failed to rule them out by the end of the day, although we found many bugs in our code.
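Friends-of-friends links any two points closer than a linking length and takes connected chains of links as groups. A minimal brute-force sketch follows, assuming the five coordinates have been rescaled (here, hypothetically, to unit variance) so that a single Euclidean linking length makes sense; the scaling and linking length actually used are not stated here, and the function name and the synthetic background-plus-clump data are illustrative.

```python
import numpy as np

def friends_of_friends(points, linking_length):
    """Label groups of points connected by chains of links shorter than linking_length.

    points: (N, D) array, assumed already scaled so that one Euclidean linking
    length is meaningful in every dimension. Brute-force O(N^2) with union-find.
    Returns an integer group label per point.
    """
    n = len(points)
    parent = np.arange(n)

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        d2 = np.sum((points[i + 1:] - points[i]) ** 2, axis=1)
        for j in np.nonzero(d2 < linking_length**2)[0] + i + 1:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri  # merge the two groups
    roots = np.array([find(i) for i in range(n)])
    _, labels = np.unique(roots, return_inverse=True)
    return labels


# Hypothetical 5D catalog: uniform background plus one tight clump.
rng = np.random.default_rng(1)
background = rng.uniform(-1.0, 1.0, size=(300, 5))
clump = 0.3 + 0.02 * rng.standard_normal((20, 5))
data = np.vstack([background, clump])
scaled = (data - data.mean(axis=0)) / data.std(axis=0)  # hypothetical scaling
labels = friends_of_friends(scaled, linking_length=0.15)
sizes = np.bincount(labels)
print("largest group:", sizes.max(), "members; groups of 2+:", np.sum(sizes > 1))
```

In a real 5D search the relative scaling of angular positions, distances, and angular velocities is the hard, physics-laden choice, and friends-of-friends will happily find groups in noise or in systematics, which is one reason tantalizing structures need to be ruled out rather than believed.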
On the weekend, Wu left Heidelberg and Bovy arrived. Rix made Lang and me a five-dimensional catalog (angular position, radial distance, and transverse angular velocity) of metal-poor halo star candidates. Today Lang and I began searching it for substructure. We found lots of structure; now is it real or useful?
My only real research contribution today was to re-write the abstract for the Astrometry.net paper. Over the two years between the first writing of that abstract and today, we have learned so much about the system and the problem in general that the old abstract was close to useless.
Lang and I discussed what it would take to get his thesis chapter on the design and performance of Astrometry.net ready for submission to the AJ. We decided that we have to add a short section on the false positives (which are extremely rare, and which can all be attributed to bizarre coincidences between flaws in the USNO-B Catalog and flaws in the input images). We also spun our wheels a bit on investigating some recent claims of Milky Way halo substructure.
Wu whipped into shape her Spitzer measurements of molecular H2 in galaxies selected (to have nebular emission) from the SDSS. Her goal is to measure the molecular-hydrogen mass function. This will require many assumptions, but we hope somewhat fewer than other estimates of this mass function. We will see. She has automated all of the IRS spectral data reduction and analysis and has automatically updating web pages for everything, in her usual (excellent) style.
In preparation for Lang's arrival in Heidelberg, I (nearly) finished the second draft of my enormous document about fitting a straight line to data. I also printed out this paper on linear regression in preparation for some
I never had the guts to post anywhere my polemic against principal components analysis (mentioned in passing here previously). In thinking about it for the last year I have come up with various alternatives that are better, more appropriate, and justifiable from a generative-modeling perspective. The simplest is the bilinear model: Treat each data point as being close to a linear combination of component spectra, and optimize both the coefficients and the component spectra. The optimization can be of chi-squared, so it can properly represent the errors (after all, in any scientific application you want to minimize chi-squared, not the mean squared error, which is what PCA does). This whole idea is a re-re-re-discovery, even for me. It is the technique used in Blanton's kcorrect.
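The bilinear model can be fit by alternating between two linear problems: with the component spectra fixed, each spectrum's coefficients are a small weighted least-squares solve; with the coefficients fixed, each pixel of the components is another. Each step can only lower chi-squared. Below is a minimal sketch of that alternating scheme, assuming Gaussian errors with known inverse variances; it is an illustration, not kcorrect's implementation, and `bilinear_fit` and the synthetic data are hypothetical.

```python
import numpy as np

def bilinear_fit(Y, ivar, K, n_iter=100, seed=0):
    """Fit Y ~ A @ G by alternating weighted least squares (minimizes chi-squared).

    Y    : (N, M) data, N spectra by M pixels.
    ivar : (N, M) inverse variances; 0 marks a missing pixel.
    K    : number of component spectra.
    Returns coefficients A (N, K), components G (K, M), and the chi-squared
    value after each iteration.
    """
    rng = np.random.default_rng(seed)
    N, M = Y.shape
    G = rng.standard_normal((K, M))  # random starting components
    A = np.zeros((N, K))
    history = []
    for _ in range(n_iter):
        # Coefficients: one K x K weighted normal-equation solve per spectrum.
        for i in range(N):
            Gw = G * ivar[i]             # weight each pixel column
            A[i] = np.linalg.solve(Gw @ G.T, Gw @ Y[i])
        # Components: one K x K weighted normal-equation solve per pixel.
        for j in range(M):
            Aw = A.T * ivar[:, j]        # weight each spectrum
            G[:, j] = np.linalg.solve(Aw @ A, Aw @ Y[:, j])
        history.append(np.sum(ivar * (Y - A @ G) ** 2))
    return A, G, np.array(history)


# Hypothetical test: rank-3 data with heteroscedastic Gaussian noise.
rng = np.random.default_rng(3)
N, M, K = 60, 40, 3
Y_true = rng.standard_normal((N, K)) @ rng.standard_normal((K, M))
sigma = rng.uniform(0.05, 0.3, size=(N, M))
Y = Y_true + sigma * rng.standard_normal((N, M))
A, G, history = bilinear_fit(Y, 1.0 / sigma**2, K)
print(f"chi2 per pixel: first {history[0] / Y.size:.2f}, last {history[-1] / Y.size:.3f}")
```

Setting all the inverse variances equal makes the objective the unweighted squared error, and the best rank-K fit then spans the same space as a truncated singular value decomposition (PCA without mean subtraction). The weights are what let the fit respect heterogeneous errors and missing pixels (zero inverse variance), though a pixel or spectrum with no valid data at all would make its solve singular.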
Wu and I submitted the paper to the ApJ today, and it will appear on the arXiv next week. Here's the abstract:
We present optical and mid-infrared photometry of a statistically complete sample of 29 very faint dwarf galaxies (M_r > -15 mag) selected from the SDSS spectroscopic sample and observed in the mid-infrared with Spitzer IRAC. This sample contains nearby (redshift z<0.005) galaxies three magnitudes fainter than previously studied samples. We compare our sample with other star-forming galaxies that have been observed with both IRAC and SDSS. We examine the relationship of the infrared color, sensitive to PAH abundance, with star-formation rates, gas-phase metallicities and radiation hardness, all estimated from optical emission lines. Consistent with studies of more luminous dwarfs, we find that the very faint dwarf galaxies show much weaker PAH emission than more luminous galaxies with similar specific star-formation rates. Unlike more luminous galaxies, we find that the very faint dwarf galaxies show no significant dependence at all of PAH emission on star-formation rate, metallicity, or radiation hardness, despite the fact that the sample spans a significant range in all of these quantities. When the very faint dwarfs in our sample are compared with more luminous (M_r ~ -18 mag) dwarfs, we find that PAH emission depends on metallicity and radiation hardness. These two parameters are correlated; we look at the PAH-metallicity relation at fixed radiation hardness and the PAH-hardness relation at fixed metallicity. This test shows that the PAH emission in dwarf galaxies depends most directly on metallicity.
Didn't get much work done today, but there were nice seminars by Rudnick (Kansas) on the evolution of the mass density on the red sequence as a function of environment, by Carilli (NRAO) on the status and promise of high-redshift 21-cm experiments (he is involved in the one called PAPER), and by Sheth (Penn) on statistical techniques for cross-correlations and deconvolution. My take-home message from the Carilli talk is that there is some awesome work to be done in the software domain.
Today I gave an informal seminar on decision making (in a Bayesian context) in the "Applications of Machine Learning in Astronomy" course led by Coryn Bailer-Jones (MPIA). It forced me to write down what I think about this subject and why I think it is important. I don't think I conveyed, however, why making decisions controlled by an explicit utility matrix is better than just making uncontrolled decisions.
On a related note, Joe Hennawi (MPIA) and I discussed quasar target selection for SDSS-III at length, yesterday and today. Today our subject was hypothesis testing (star vs quasar) using not just colors but also variability. Choosing targets for spectroscopy ought to be a perfect application for decision theory, although the SDSS-III target selection code does not, at present, make use of utilities.
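As a minimal illustration of deciding with an explicit utility matrix: combine a prior with color and variability likelihoods to get star and quasar probabilities, then choose the action with the highest expected utility. All numbers, including the utilities, are made up, the conditional independence of the color and variability likelihoods is an assumption, the function names are hypothetical, and none of this reflects the SDSS-III target selection code.

```python
import numpy as np

def class_posterior(prior, *likelihoods):
    """Combine a class prior with per-class likelihoods from independent data.

    Assumes the color and variability likelihoods are conditionally
    independent given the class (an assumption of this sketch).
    """
    post = np.asarray(prior, dtype=float).copy()
    for like in likelihoods:
        post *= np.asarray(like, dtype=float)
    return post / post.sum()

def best_action(p_class, utility):
    """Return the index of the action with maximum expected utility.

    utility[a, c] is the (hypothetical) value of taking action a when the
    true class is c; p_class[c] is the posterior probability of class c.
    """
    expected = utility @ p_class
    return int(np.argmax(expected)), expected

# Classes: (star, quasar). Actions: (skip, take a spectrum). Made-up numbers.
U = np.array([[0.0, 0.0],     # skip: nothing gained, nothing spent
              [-1.0, 10.0]])  # spectrum: a wasted fiber vs. a new quasar

p = class_posterior([0.9, 0.1], [1.0, 2.0], [0.5, 3.0])  # prior, color, variability
action, eu = best_action(p, U)
print(p, ["skip", "target"][action], eu)
```

With these utilities, targeting beats skipping whenever 10p - (1 - p) > 0, that is, whenever the quasar probability p exceeds 1/11 (about 0.09). Changing the utilities moves that threshold explicitly, which is the sense in which the decision is controlled rather than uncontrolled.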
In between various writing projects, Wu and I inspected and modified some of her fits of emission lines in our Spitzer spectroscopy of SDSS galaxies. We also had a chat with Schiminovich about them. Wu's project is to understand the mid-infrared and optical lines in general, and also assess whether we can use them to measure the abundances of molecular hydrogen from its direct line emission.
Working on the weekend and today, we finished Koposov's paper on the GD-1 cold stellar stream. His constraints on the Galaxy potential are significant; we have the (formally) best current constraint on the Galaxy circular velocity, and we constrain the flattening of the dark-matter halo (though not at awesome precision). Congratulations, Koposov! It will hit the arXiv within days.
Wu and I finished her thesis chapter / paper on the ultra-low-luminosity dwarf galaxies observed with SDSS and Spitzer. Our co-author has one week to respond and then we submit it! Congratulations, Ronin.
I worked on abstracts for the Koposov and Wu papers that are being finished now in Heidelberg. Abstracts are the hardest parts of papers and therefore should be written first and constantly tinkered with as the paper is written and revised.
Marshall and I developed a plan to start image modeling, with the goal of finding faint lensing galaxies under the flux of bright, multiply-imaged quasars. This problem is a hard one in ground-based data, but essential if the lenses from PanSTARRS and LSST are going to be properly mined. The project is related to the image modeling projects of Lang and Bovy and myself; Marshall and I are hoping that we can join forces somehow. Among other big issues: Can we believe the surveys' pipeline-output PSF models, or do we have to fit for a set of perturbations or modifications around or away from those outputs? If we do have to fit, what basis functions make most sense?
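One possible answer to the basis-function question, sketched under heavy assumptions: keep the pipeline PSF (here, hypothetically, a circular Gaussian) and add its first derivatives with respect to centroid and width as perturbation components, fitting all amplitudes to a star image by linear least squares. The functions, the Gaussian PSF, and the noise-free synthetic star are illustrative only; real data would need per-pixel weights, and whether this basis is adequate is exactly the open question.

```python
import numpy as np

def gaussian_psf(xx, yy, x0, y0, sigma):
    """Unit-flux circular Gaussian evaluated at pixel centers (no pixel integration)."""
    r2 = (xx - x0) ** 2 + (yy - y0) ** 2
    return np.exp(-0.5 * r2 / sigma**2) / (2.0 * np.pi * sigma**2)

def perturbation_basis(xx, yy, x0, y0, sigma):
    """Pipeline PSF plus its first derivatives with respect to x0, y0, and sigma."""
    p = gaussian_psf(xx, yy, x0, y0, sigma)
    r2 = (xx - x0) ** 2 + (yy - y0) ** 2
    dp_dx0 = p * (xx - x0) / sigma**2
    dp_dy0 = p * (yy - y0) / sigma**2
    dp_dsigma = p * (r2 / sigma**3 - 2.0 / sigma)
    return np.stack([p, dp_dx0, dp_dy0, dp_dsigma], axis=-1).reshape(-1, 4)

yy, xx = np.mgrid[0:25, 0:25].astype(float)
# Hypothetical star: the true PSF is slightly shifted and broader than the pipeline's.
image = 1000.0 * gaussian_psf(xx, yy, 12.2, 11.9, 1.7)

basis = perturbation_basis(xx, yy, 12.0, 12.0, 1.5)  # pipeline PSF model
data = image.ravel()

coef_flux, *_ = np.linalg.lstsq(basis[:, :1], data, rcond=None)  # flux only
coef_full, *_ = np.linalg.lstsq(basis, data, rcond=None)         # with perturbations
rss_flux = np.sum((data - basis[:, :1] @ coef_flux) ** 2)
rss_full = np.sum((data - basis @ coef_full) ** 2)
flux, cx, cy, cs = coef_full
print(f"rss flux-only {rss_flux:.3g}, with perturbations {rss_full:.3g}")
print(f"first-order shift ~ ({cx / flux:.2f}, {cy / flux:.2f}), d(sigma) ~ {cs / flux:.2f}")
```

The coefficient ratios give first-order estimates of the centroid and width errors of the pipeline PSF; if those corrections are small, the pipeline model plus a few perturbations may suffice, and if they are not, a richer basis is needed.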