Jonathan Zrake (NYU) gave a very nice oral candidacy exam today in which he described his relativistic MHD plus cosmic ray particle code that he and Andrew MacFadyen built to test acceleration mechanisms. He gets enormous accelerations and shows that they are—in some circumstances—caused by many little shocks in the turbulent post-shock fluid rather than the big shock (think relativistic outflow or blast wave) behind which the turbulence is growing.
Lang and I hacked away at our Comet Holmes project today, trying to get an MCMC to find the global optimum of the likelihood function, and sample it. We know the right answer—the point is not to get the right answer in this case—the point is to make a system that just works, every time. In this and in my exoplanet stuff, there are many local minima; of course that is true for generic optimization problems, but I have an intuition that a lot of the simple dynamics problems have similar kinds of local minima.
One theme of these kinds of problems is that we (meaning astronomers) tend to use MCMC for three things that are really distinct operations:
- We want the MCMC to crawl the parameter space and find, among all the local optima, the global optimum. This is search.
- We want the MCMC to gradient-descend into that global optimum. This is optimization.
- We want the MCMC to give us back a fair sampling from the likelihood or posterior probability distribution function. This is integration.
In the last few seconds of Mierle's visit, I realized that he might be the perfect person to execute the experimental design project I envision on filter curves for optical astronomy. Right now we all use filters (like UBVRI or ugrizy) that were hand-designed by humans for intuitive goals. I don't think there is any sense in which any filter system (even the exceedingly complex COMBO-17 set) was shown to be the optimum of anything; that is, I don't think anyone has found the best set of filters for any objective.
I had a long phone conversation with Lang today about PSF modeling and astrometric calibration. My loyal reader knows that my position is that precise catalog matching is not possible; for this reason precise calibration requires image modeling. Bovy is in fact testing this position with SDSS imaging. But there is an interesting and unresolved issue, which is that there is no necessary definition, given a PSF, of what constitutes the center of that PSF. That is, you can shift all the stars one pixel left and shift the definition of the center of the PSF one pixel right, and everything about astrometric calibration remains the same. This means many things, including (1) that decisions must be made to break the degeneracies (which are perfect, since they are logical), and (2) that atmospheric distortions to the precise astrometric mapping can be set to zero by definition if we are willing to suck them up into the PSF centroid definition, and (3) that naive tests of PSF centroiding (like what Bovy is starting) could fail because of inconsistencies about what the PSF centroid means or is. Important point: Because of atmospheric variations during an exposure, the centroid of a stellar image does not have to be the true location of the star in any sense.
Our engineering day was derailed partially by a great talk by Nathan Smith (Berkeley) about peculiarly luminous and ultra-luminous core-collapse supernovae. This talk was so beautifully structured, so clear, and so full of remarkable and tantalizing results, that my engineering team (read: Lang and Mierle) both wanted to quit astrometry and start working on extremely massive stars. They insisted on going to lunch with Smith; we spent lunch time talking about fundamental observables in astronomy, many of which were touched on in Smith's talk: He could measure delay times to explosive events with kinematic measurements, distances to objects by relating angular kinetic times to linear kinetic times, radiative and gaseous properties of nebulae by measuring families of emission lines, and three-dimensional structures of expanding nebulae with highly featured Balmer emission-line profiles. What a cornucopia of astronomy! Mierle, of course, suggested that we automate all of Smith's data analysis. Smith agreed that in any LSST-like future, automated spectral analysis will be a requirement.
Lang hit town and we worked on resurrecting tweak. Mierle worked on building an interactive tweak tool that runs in the browser, while Lang and I started getting together some kind of hacky optimizer based on what we already have in our old tweak and some ideas from our verify code. It was a bit of a mess, but by the end of the day we had code to make first-guess WCS. Tomorrow we will try to improve it.
Keir Mierle (Google) came into town for a week of hacking. He is on vacation; it pleases me that this is his idea of vacation! We spent the day working on the objective—that is, the scalar function we are trying to optimize—for the tweak problem, the fitting of fine-scale astrometric distortions in images. In the end we came close to what Astrometry.net uses in its verify step, the step at which it decides whether a quad-inspired hypothesis about astrometric calibration is correct (or not). The new thing is that model complexity must be penalized: At equal performance we prefer a simpler model; we need good evidence to instantiate new model freedoms. If we can get consensus with Lang, we will hack tomorrow.
I worked on my fitting-a-line document, including some re-structuring. I spoke with Wu about job applications, her thesis, and her thesis schedule; it is due in May. I talked to Bovy about various referee reports. I spent a long time talking to Lang about the tweak and verify problems, for which we both think proper specification of a good objective will go a long way towards solving the problems in general. It is a nice point, and very Sam-like: Specify your objective carefully, and everything else is just clever optimization.
I wrote a page in our long-suffering—and exceedingly long—note about fitting a straight line to data. My page was about all the reasons the (bulk of the) literature on
linear regression is wrong and useless. I have blogged this point before.
Schiminovich and I
worked (in our usual style) on the photometry of SDSS sources overlapping GALEX imaging. We planned the first scientific vetting steps for the catalog. I also spent a bit of time at the end of the day writing a polemical introduction to the paper (will need editing).
On a very short day (it is spring break here and beautiful weather) I spoke with Wu about her thesis abstract (due this week) and Zolotov about the first draft of her paper on stellar metallicities in galaxy formation models. Not a huge research day.
I spent a long time on the phone with Lang today discussing various matters but in particular our objective function for matching stars between an image and a catalog. This objective function is necessary to write down in advance of us declaring success (or anything else) in our plan to nail
tweak next week. (Tweak is the part of Astrometry.net that measures detailed focal-plane distortions.)
I also dusted off our document about fitting a straight line to data. Part of my reminiscing about Roweis has brought me back to this subject and I promised myself I would finish that up. It could be a useful document and I learned that it is very close. I started to mark up what we have.
I got up early and worked on the text of Bovy's new piece on Oort-like methods for understanding the Milky Way disk, then the tables and figures of Jiang's piece on the small-scale clustering and merger rate of galaxies of a range of colors and luminosities. I don't have much to say about these activities, since they are entirely read-and-mark-up, but it was very satisfying to sink a few hours into my students' (excellent) work.
Not much research got done today, but I gave a lunch-time blackboard talk about the BOSS meeting. I emphasized to the crowd the operations research issue—the unsolved problem that we have to make local, highly constrained observing decisions but that we must end with a contiguous footprint—in the hopes that someone would get interested to help solve it.
It was a science-filled day, with conversations with Price-Whelan, Jagannath, Maffucci, Blanton, and Jiang about various projects underway. But the best moment was when Dmitry Malyshev (NYU) came into my office and proposed a project (out of the blue) to determine the statistical properties of the distribution of sources too faint to be individually detected in the Fermi data. He had some good ideas about working in Fourier space, and I suggested some overly complicated ideas in real space. There ought to be a lot of information in the data set, because the photons are individually detected and there isn't a huge background. That is, there are many sources there at a significance too low to claim but which should show themselves in aggregate. One approach we did not talk about is doing the whole hyper-prior Bayes thing to find them.
The BOSS meeting got me fired up enough that on the airplane home I worked a bit more on specifying (and simplifying) Roweis and my data-driven model of the BOSS spectroscopic data (consisting of flats, arcs, and science frames). I realized that a lot of the problem comes down to sensible (fast, differentiable, and good) interpolation; this explains why such a big part of the SDSS spectro2d code (written by Schlegel and Burles) is it's B-spline core. I hope this all exists in Python in a useful form. If it doesn't, I might have to bring Burles out of retirement. I also realized that optimization might become an issue. Roweis was pushing stochastic gradient. I have to figure that out too.
Today was quasar target-selection day, and the meeting (which lasted three hours) was great. The target-selection group
downselected to one method from three or four competitors (all of which were performing well) for the
core (uniformly selected) subsample and made decisions for operations through to the summer. But the group also tasked itself with quite a bit of research and optimization leading up to the summer so that corrections can be made for the second year of data. It was a challenging, tiring, and substantially successful session.
In the afternoon, Blanton spoke about tiling, drilling, and survey strategy. There are dangers, given weather and observing constraints, that the survey might end with a non-contiguous footprint in the North Galactic Cap. This would be bad for both the galaxy clustering and the Lyman-alpha forest clustering. So the challenge is to operate the survey—that is, make local day-to-day decisions—such that at the end of five years, there is a large and contiguous footprint observed. This is pretty hard, especially since contiguity at the end of five years is not, in any sense, a local constraint on operations.
I am in Salt Lake City for the SDSS-III BOSS meeting. On the first day, there were a series of talks about the hardware state of the survey, and whether or not the early data (and the things they imply about hardware and targeting) are meeting requirements, as set down in a very detailed requirements document. Highlights include David Schlegel (LBNL) explaining why you should fit redshifts to the individual-camera exposures and not to any co-added spectrum, Adam Myers (UIUC) going through great detail about how complicated and uncertain the quasar target selection is, and Bill Carithers (LBNL) noting that for many science frames we don't (for reasons I didn't quite get) have arc lines shortward of 4000 Angstroms, where we care deeply about the wavelength solution for the quasar-absorption-line density field. It was not gloomy, though: The survey is underway and the data look excellent. Almost all non-spec items had paths to spec and I anticipate an amazing data set.
Wow, it really feels like no research gets done (by me) any more. But I do go to a lot of good seminars! Today it was Dani Maoz (Tel Aviv) who gave a nice talk about inferring the type-Ia supernova delay function from observations. He came up with four methods (really five if I count one that was in his talk but he didn't separate out as a separate method), and they all (but that one) agree that most of the type Ia supernovae go off promptly (within less than a billion years) after star formation. He has additional evidence—though weaker because it is more dependent on star-formation history fitting—for a longer tail to long delays. That is, there appear, observationally, to be two populations. There are many remaining puzzles and interesting bits of phenomenology worth following up there. This is all very important to the chemical history of the Universe and the formation of metal-rich systems like rocky planets and the like.
One of Maoz's methods involves a classical inverse problem, with two things related by a linear transformation. As my loyal reader knows, I think all the interesting things about these inverse problems lie in the transformation itself, not the inversion, but that is the subject for a beer-in-hand conversation with Maoz this weekend.
In the morning, Jeremy Tinker (Berkeley) gave a nice seminar on halo occupation. He showed that by modifying halo occupation parameters, he can fit the two-point correlation functions of galaxies of different types in a wide range of cosmological models. But then he showed that if he used those models to predict other observables, like the void probability function or the mass-to-light ratios of groups from weak lensing or the multiplicity function for clusters, he can strongly distinguish models. His results are beautifully consistent, in the end, with WMAP results, and more constraining in some directions.
In the late afternoon I
went to California to give a seminar at SLAC by videocon. This worked pretty well, although there are definite and strong limitations to communicating over the wire. The event was set up by Phil Marshall because he is exploring ways for astronomers to reduce their environmental footprints, and air travel sure is a big fraction of mine, according to the bible (and, indeed, any common-sense order-of-magnitude calculation). I avoid politics here, so no more said on this. As for the seminar, it was on my dynamical inference stuff. Phil even went so far as to arrange sit-down chats with various people there before and after my talk, which was very nice.
environment tag on this post is for HOD modeling, not carbon footprint.]
[Update next day: Phil sent me these photos taken by Kelen Tuttle (SLAC):]
I got nothing done today, if I don't count getting ready (in a technical sense) for my remote seminar tomorrow at SLAC. That is, we tested software and sound for a telecom from my office to SLAC. Getting the software working (we are using XMeeting) was annoying, but a lot less annoying than traveling cross country.
With Price-Whelan and Jagannath, and also now with Lang for our April-Fools' attempt (may fail), we are facing the problem of comparing a line of points on the sky (stars or measurements) with a line of points generated by a model (a sampling of a curve). This is a classic issue in astronomy (or any science) and yet there is no widely accepted probabilistic descriptions of this problem in general (that is, when the curve is arbitrary and can do things like cross and kink). In astronomy, if the curve is an orbit, it lives in phase space too, so really the comparison is being done in six-dimensional space, but where some of the dimensions are unmeasured for most of the points or stars.
Polemical notes I could make but will restrain myself include the following: There are many odd and wrong choices in the literature for comparing streams of stars to orbit calculations, we should correct those. In general, looking at the distance between a data point and the closest point on the curve is the wrong thing to do; the investigator should be marginalizing over a choice of points.
Distance must be defined with a metric, which is like the inverse covariance matrix describing the data-point uncertainty, or that convolved with a model width or uncertainty. There is no difference in principle (or in practice) between missing data and badly measured data, so there is a covariance approach that deals with the 6-d fit on the 2-d sky if you think about it right.
Lang and I spent much of the day discussing these issues and coming up with an approximate likelihood for the problem we are working on. I have so much to say about all these issues I should write a paper, but I almost certainly won't.
Ingyin Zaw (NYU) came into my office to ask about separating non-maser galaxies from maser galaxies using optical and infrared properties. We talked about support vector machines and the like but she wanted to use properly the data-point uncertainties, and the objects with missing data (some quantities of interest not measured). That got me on my soap-box:
If you want to do classification right, you have to have a generative model for your two distributions, preferably one that takes account of the individual data-point errors (which are different for each point) and missing data. This means fitting the noise-free distribution in a generalized way to each noisy data point. Then you can construct likelihood ratios of non-maser vs maser (or whatever you like) by ratioing, for any new data point, the error-convolved densities evaluated at the new datum. And, of course, we have the best code for constructing the noise-free (or uncertainty-deconvolved) distribution functions in our extreme deconvolution code base (and associated paper).
Support vector machines and neural nets and the like are great, but they work only in the never-existing situation that (a) all training data points have (roughly) the same error properties, (b) there are no missing measurements for any data point, and (c) your training data are identical in selection and noise properties to your test or untagged data. Show me a problem like this in the observational sciences (I have yet to find one) and you are good to go! Otherwise, you have to build generative models for your data. (I would say
IMHO but there is no
H going on.)