Today I started NYU undergrad Jeffrey Gertler on the GALEX photon list project with Schiminovich, Greenberg, and me. His first job is to get our software running at NYU, where we have all 10^12 photons spinning on disk. Then he will work on spacecraft attitude, point-spread function, and unresolved background.
Fergus and I spent a good chunk of the day together, discussing in great detail the outline for our first paper on high dynamic-range imaging, and then our possible NSF BIGDATA proposal. The former is about our data-driven PSF (speckle) modeling for the P1640 spectroscopic coronagraph. The latter is going to be about self-calibration: the point that if you have huge amounts of data, your science data are far more informative about instrument calibration and properties than the output of any separate, prior, or interspersed calibration strategy. It just takes guts to use them. But that's how we calibrated the SDSS imaging after all: Once we adjusted the observing strategy to include lots of redundancy, we were able to get better flats and atmospheric transparency information out of the science scans across random stars than we got from any realistic calibration data on standard stars.
Last week, Patel took a test strip of galaxy prints made with different RGB-to-CMYK conversions for C-printing and we analyzed the results. Today I used the results of that to make a refined test print, with a finer grid centered on the best results from last week's test. The idea is to do an RGB-to-CMYK conversion that respects the physical properties of the CMYK inks, such that the reflectance of the CMYK print is related in a sensible way to the intensity of the RGB image on a standard monitor. Or, another way to put it: RGB is a light-emission standard, CMYK is a light-absorption standard. The light-emission and light-absorption mechanisms are imperfect, but in ways that can be modeled. We should use that! This is all in preparation for the Atlas.
In related news, Patel and Mykytyn visited Princeton to meet Lang today, and Mykytyn got the star masking working in our Tractor fits of very bright galaxies. All this bodes well for our photometry and sample selection of the brightest galaxies in the SDSS footprint.
Foreman-Mackey and I worked away at doing Gaussian-aperture (or Gaussian-model) photometry on the output of Traditional Lucky Imaging and The Thresher, both acting on some of Bianco's fast imaging data. We went around in circles a bit with bad optimization, but we can now measure the point-spread function width and the signal-to-noise of point sources in the final data products. In the morning, I worked on improving the mixture-of-Gaussian approximations to the SDSS galaxy profiles that Lang and I use in The Tractor, in part to help out Kevin Bundy, who is using them in preparation for MaNGA data. Gaussians rock!
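For readers who want to see what Gaussian-model photometry amounts to, here is a minimal sketch. It is not our code: the function names are made up for illustration, it assumes the sky has already been subtracted, that the pixel noise is independent with a single known standard deviation, and that the source position and Gaussian width are held fixed (we also fit the width, which needs an optimizer on top of this). Under those assumptions the best-fit flux is a linear least-squares amplitude.

```python
import math


def gaussian_profile(shape, x0, y0, sigma):
    """Unit-flux circular Gaussian evaluated at pixel centers (hypothetical helper)."""
    ny, nx = shape
    norm = 1.0 / (2.0 * math.pi * sigma ** 2)
    return [[norm * math.exp(-0.5 * ((x - x0) ** 2 + (y - y0) ** 2) / sigma ** 2)
             for x in range(nx)] for y in range(ny)]


def gaussian_photometry(image, x0, y0, sigma, noise_sigma):
    """Least-squares flux and signal-to-noise for a fixed Gaussian model.

    Assumes a sky-subtracted image and independent pixel noise of
    standard deviation noise_sigma in every pixel.
    """
    profile = gaussian_profile((len(image), len(image[0])), x0, y0, sigma)
    numerator = sum(p * d for prow, drow in zip(profile, image)
                    for p, d in zip(prow, drow))
    denominator = sum(p * p for prow in profile for p in prow)
    flux = numerator / denominator
    flux_error = noise_sigma / math.sqrt(denominator)
    return flux, flux / flux_error


# Noiseless toy image: a source of flux 100 with width 2 pixels.
truth = gaussian_profile((21, 21), 10.0, 10.0, 2.0)
toy_image = [[100.0 * p for p in row] for row in truth]
flux, snr = gaussian_photometry(toy_image, 10.0, 10.0, 2.0, noise_sigma=1.0)
```

The signal-to-noise returned here is the one you would get from a matched filter with the same Gaussian, which is why the measured width matters so much for comparing data products.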
Martin (Strasbourg) and I discussed his project to detect new satellites of M31 in the PAndAS survey. He can construct a likelihood ratio (possibly even a marginalized likelihood ratio) at every position in the M31 imaging, between the best-fit satellite-plus-background model and the best nothing-plus-background model. He can make a two-dimensional map of these likelihood ratios and show a histogram of them. Looking at this histogram, which has a tail to very large ratios, he asked me, "Where should I put my cut?" That is, at what likelihood ratio does a candidate deserve follow-up? Here's my unsatisfying answer:
To a statistician, the distribution of likelihood ratios is interesting and valuable to study. To an astronomer, it is uninteresting. You don't want to know the distribution of likelihoods, you want to find satellites! The likelihood ratio at which you make your cut depends on your willingness to publish (or really follow up and reject) crap, relative to your desire to get a complete sample. That is, the ROC curve is more interesting than the distribution of likelihoods, but the ROC curve takes a lot of work to generate (since you have to follow up a representative sample of stuff!).
But, fundamentally, where you set the likelihood ratio cut is determined in the end by your best estimate of your expected long-term future discounted free-cash flow. Your LTFDFCF is affected by many things, including the amount of time and effort you will spend on follow-up, the impressiveness of the paper you can publish, the time it takes that paper to be written and posted, the perceived or actual penalties for including non-satellites in your final results, the value of discovery priority, and the opportunity costs of failure to obtain priority, to name a few.
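To make the trade-off concrete, here is a toy sketch. Everything in it is an assumption for illustration: the log likelihood ratios and the follow-up outcomes are invented, the function names are hypothetical, and the "utility" is a crude stand-in for the cash-flow argument (a fixed value per confirmed satellite minus a fixed cost per candidate followed up).

```python
def roc_points(log_ratios, is_real):
    """Return (cut, false_positive_rate, true_positive_rate) for each possible cut.

    A candidate is accepted when its log likelihood ratio is >= cut.
    Needs at least one real and one spurious candidate.
    """
    n_real = sum(is_real)
    n_fake = len(is_real) - n_real
    points = []
    for cut in sorted(set(log_ratios), reverse=True):
        accepted = [real for lr, real in zip(log_ratios, is_real) if lr >= cut]
        true_pos = sum(accepted)
        false_pos = len(accepted) - true_pos
        points.append((cut, false_pos / n_fake, true_pos / n_real))
    return points


def best_cut(log_ratios, is_real, value_of_find, cost_of_followup):
    """Return (cut, utility) maximizing value of real finds minus follow-up cost."""
    best = None
    for cut in sorted(set(log_ratios), reverse=True):
        accepted = [real for lr, real in zip(log_ratios, is_real) if lr >= cut]
        utility = value_of_find * sum(accepted) - cost_of_followup * len(accepted)
        if best is None or utility > best[1]:
            best = (cut, utility)
    return best


# Invented follow-up results for eight candidates (True means a real satellite).
log_ratios = [9.1, 7.4, 6.0, 5.2, 3.3, 2.8, 1.9, 0.7]
is_real = [True, True, False, True, False, False, True, False]

roc = roc_points(log_ratios, is_real)
cheap_followup = best_cut(log_ratios, is_real, value_of_find=10, cost_of_followup=3)
costly_followup = best_cut(log_ratios, is_real, value_of_find=10, cost_of_followup=6)
```

The same candidate list gives a different best cut when follow-up gets more expensive, and that is the whole point: the cut is set by costs and benefits, not by the shape of the histogram.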
Nicolas Martin (Strasbourg) came in to NYU after the M31 Conference in Princeton. He gave an extended version of his conference talk on the substructure, streams, shells, and stellar halo discoveries they are making with their huge PAndAS survey of a large part of the M31–M33 virial region. Martin is also thinking a lot about star–galaxy separation, a subject close to my heart these days. We are both working in ways that let us combine results, in that we are both producing likelihoods or marginalized likelihoods that can be multiplied together, with his likelihood relating to magnitude and morphology, and our (heavily marginalized) likelihood relating to spectral energy distribution. Remember: Even if you and your friends are all Bayesians, you want to communicate via likelihood functions!
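Here is a small sketch of why likelihoods are the right currency. The numbers and the function name are invented, and it assumes the two data sets (morphology and spectral energy distribution) are independent given the class, which is what makes multiplication legitimate. If instead we traded posteriors, each of our priors would get counted and we would have to divide them back out.

```python
import math


def combine(log_likelihoods, log_prior):
    """Add independent per-class log-likelihoods, apply one prior, and normalize."""
    log_post = {c: log_prior[c] + sum(ll[c] for ll in log_likelihoods)
                for c in log_prior}
    biggest = max(log_post.values())
    log_norm = biggest + math.log(sum(math.exp(v - biggest)
                                      for v in log_post.values()))
    return {c: math.exp(v - log_norm) for c, v in log_post.items()}


# Invented likelihoods for one object from two independent analyses.
morphology = {"star": math.log(0.2), "galaxy": math.log(0.6)}
sed = {"star": math.log(0.9), "galaxy": math.log(0.1)}
prior = {"star": math.log(0.5), "galaxy": math.log(0.5)}

posterior = combine([morphology, sed], prior)
```

In this toy case the morphology mildly prefers a galaxy, the spectral energy distribution strongly prefers a star, and the combined answer follows from simple addition of logs with the prior applied exactly once.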
I wrote a tiny snippet of code to fit a PSF (plus sky level) to a photon stream from a (bright enough) source in the GALEX photon catalog. This is just for vetting; we need to know if our spacecraft attitude model is giving us as good imaging quality as the official GALEX pipeline products. Also spent the morning chatting with Schiminovich and Greenberg, who are distracting me away from the Atlas with beautiful visualizations of thousands of GALEX photons.
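For the curious, a fit like this can be sketched in a few lines. This is not the snippet I wrote; it is an illustration under stated assumptions: the PSF is a circular Gaussian, the sky is uniform over a square window, the photons are just (x, y) positions in arbitrary units, and the fit is done by expectation-maximization on a two-component mixture. The function name and the simulated data are hypothetical.

```python
import math
import random


def fit_psf_plus_sky(photons, half_width, n_iter=100):
    """EM fit of a circular Gaussian PSF plus uniform sky to photon positions.

    photons: list of (x, y) inside a square window of side 2 * half_width.
    Returns (x0, y0, sigma, source_fraction).
    """
    n = len(photons)
    sky_density = 1.0 / (2.0 * half_width) ** 2  # uniform over the window
    x0 = sum(x for x, _ in photons) / n
    y0 = sum(y for _, y in photons) / n
    sigma, frac = half_width / 4.0, 0.5
    for _ in range(n_iter):
        # E-step: probability that each photon came from the source.
        weights = []
        for x, y in photons:
            r2 = (x - x0) ** 2 + (y - y0) ** 2
            psf = math.exp(-0.5 * r2 / sigma ** 2) / (2.0 * math.pi * sigma ** 2)
            weights.append(frac * psf / (frac * psf + (1.0 - frac) * sky_density))
        total = sum(weights)
        # M-step: weighted centroid, width, and source fraction.
        x0 = sum(w * x for w, (x, _) in zip(weights, photons)) / total
        y0 = sum(w * y for w, (_, y) in zip(weights, photons)) / total
        r2_mean = sum(w * ((x - x0) ** 2 + (y - y0) ** 2)
                      for w, (x, y) in zip(weights, photons)) / total
        sigma = math.sqrt(0.5 * r2_mean)
        frac = total / n
    return x0, y0, sigma, frac


# Simulated photon list: 2000 source photons plus 1000 sky photons.
rng = random.Random(0)
source = [(rng.gauss(0.3, 1.5), rng.gauss(-0.2, 1.5)) for _ in range(2000)]
sky = [(rng.uniform(-10.0, 10.0), rng.uniform(-10.0, 10.0)) for _ in range(1000)]
x0, y0, sigma, frac = fit_psf_plus_sky(source + sky, half_width=10.0)
```

The fitted sigma is the image-quality number to compare between attitude solutions; a worse attitude model shows up as a fatter PSF.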
I spent part of the morning talking to Mykytyn and Patel about the Sloan Atlas. We have awesome two-d image fitting working on Messier-sized galaxies (the hard ones) and everything looks great but we are having trouble interpreting the output! I fixed a monster bug in the code and nothing changed. This isn't atypical: A huge bug can be fundamental and insane but not actually flow down to different results. That makes me fear for the correctness of the results from almost every large scientific project.
Late in the day, Greenberg sent a huge png file showing the GALEX photon list for a single stellar source. Variations of the photon arrival positions with time, synchronized with spacecraft movements, suggest that we will be able to make a better model of spacecraft attitude and calibration. I am pretty stoked about that. I have done a lot of calibration in my life, but never of a remote spacecraft.
Greenberg, Schiminovich and I met in the Schiminovich Lab at Columbia today to discuss Greenberg's progress on reconstructing GALEX images from the photon list. I requested extremely detailed visualizations of the detected photons in the space of time, RA, Dec, detector position, and charge-pulse amplitude. It is in these signals—or really their departures from expectations—that we will see issues with the spacecraft attitude model, the detector sensitivity map, and the point-spread function. We worked out short-term to-do items and plans for the first scientific papers.
Foreman-Mackey and I made a prioritized list of outstanding issues for The Thresher and for our SDSS Stripe 82 projects. In the case of The Thresher we are just a few issues away from a publishable piece of code and paper. We also ate lunch with Sandford and got him on the point-spread-conversion problem I mentioned yesterday.
As another terrible distraction from my main lines of work, I wrote up a short note for Konrad Kuijken (Leiden) about how to create an image that is the expected image you would have got if you had had a uniform, circular point-spread function. That is, your point-spread function is not circular or uniform, but you would like that in many applications; this is an inference problem from my perspective. This problem is closely related to problems Foreman-Mackey and I are working on for The Thresher and Marshall and Sandford are working on for LensZoo.
In the last half-hour of an almost-no-research day, I wrote a few paragraphs of text for the Atlas. These can only be draft paragraphs, of course, because we are taught that final-form paragraphs take 30 minutes each to write. However, it was a great end to a frustratingly low-research summer day.
I spent day two of jury duty reading about probability, first this note by Andrew Gelman about prejudices regarding statistical philosophy and then this (long PDF) piece by Radford Neal about anthropic arguments. Gelman's piece emphasizes the oft-ignored point that it is the likelihood not the prior that is usually the most suspect and challenging thing about a statistical analysis. Neal's confirms my view that Susskind-like approaches to anthropic arguments are just plain wrong. I think I have complained about this before.
On the first day of my jury duty stint at NY State Criminal Court, I wrote a simulator for a radio telescope array. It seems to work, though I don't have any unit or functional tests yet. The nice thing is it simulates a thermally—not coherently—emitting source. That was a bit non-trivial to get right. When the source is thermally emitting, the correlation amplitudes are very noisy, but the phases can still be very well measured. This is all part of getting intuition about noise models for radio arrays.
Columbia astronomer Jennifer Sokoloski is also serving on jury duty; we took the opportunity to have a conversation about accreting white dwarfs and supernovae. Sokoloski has some data on resolved stellar environments (circumstellar shells and winds) that might benefit from The Thresher.
En route back to New York, I read this paper by Bastian & Biermann (recommended by Anthony Brown) about the astrometric calibration issues that arise in the Gaia satellite because of the finite drift-scan integration time across the CCDs. There are several valuable insights in the paper, but the most amusing to me is that very rapidly variable objects will obtain signals in Gaia with a different effective mean time on the CCD and therefore be subject to a very slightly different spacecraft attitude model, in some sense. I don't think this will be problematic for many stars, but at Gaia precision, a lot of things matter. It is one of the many reasons I love interacting with the Gaia community.
I spent an enjoyable day at the IAP in Paris, chatting with the local luminaries. It was great to see the man himself Bertin (IAP), with whom I talked about extreme deconvolution, the Tractor, and modeling point-spread functions. On the first of these, he strongly encouraged us to work on photometric redshifts for galaxies (we have already done quasars); he expressed a strong conviction that most methods being developed were not properly taking into account the photometric uncertainties or noise, and were therefore doomed to underperform. We could do this easily; the way Bovy set up XDQSO, it is an easy swap-in replacement of training data. Does anyone out there want to do this? I would be very happy to consult, and I bet we could convince Bovy too. A successful project could have a big impact on next-generation weak lensing projects.
McCracken (IAP) and I talked about calibration and catalog generation in huge new surveys. McCracken made a nice point (preaching to the converted I should say) that the generation of catalogs is very closely related to the calibration and reduction of the data. You can't really separate these, in part because if the catalog is an approximation to the parameters of a maximum-likelihood model, its creation depends sensitively on an understanding of the noise. McCracken asked me to distinguish the Tractor from the next generation of SExtractor and I think the key point is that in the Tractor we see all the information as parameters; we make no distinction between catalog quantities and calibration quantities. We can also very flexibly freeze or fit parameters as our knowledge or confidence changes. That said, we are also vapor-ware right now!
My ex-student Ronin Wu is now at CEA, so we met at IAP. We talked about getting her most interesting thesis chapter—a measurement of the molecular-hydrogen mass function using Spitzer IRS spectroscopic data—ready for publication. It is very close, but we have some strange results that suggest that different radiation-hardness indicators don't correlate well, and that the star-formation rate is not a strong function of molecular gas mass. Those need to be tracked down or understood.
In transit to Paris, I worked on a problem given to me by Rix and Brent Groves (MPIA): Model the dust emission in M31 at the resolution of the highest-resolution Herschel maps, but constrained by all the maps. This is a perfect job for the Tractor but it requires creating new objects (think unresolved, emitting dust blobs). I didn't get very far, because it is a substantial generalization of what we have. What we have doesn't currently constrain the spectral energy distribution of any source very much a priori. That is, the Tractor doesn't yet have strong priors on emission mechanisms.
Rix and I spent part of the morning with Frank Bigiel (Heidelberg) talking about radio interferometry. In the upcoming ALMA world, people doing work on faint sources (deep fields) will want to get probabilistic information out of radio maps—information like confidence intervals on source brightness and existence. What they don't currently do in the radio world is write down a likelihood function for the scene; there is no probability of the read-out amplitudes and phases on baselines conditioned on the model for the scene on the sky. We discussed the possibility of creating that function; we spent (therefore) a lot of time talking about noise.
I spent a bit of the morning interviewing Wolfgang Brandner (MPIA) about Astralux, their fast imaging camera (no longer a Lucky Imaging camera, since we are going to get everyone using The Thresher). We talked about how the fast cameras work; I learned that they have significant non-linearity and that saturation can be a problem. The first has been corrected (how well, I wonder?) and the second we would need to take into account at Threshing. We also discussed the possibility (which is hard to deal with) that the same atmospheric variations that create a non-trivial PSF can also create plate-scale variations. Not sure whether to think about that; we do see some evidence for that in the data we have.
In the afternoon Hans Kjeldsen (Aarhus) gave a very nice seminar about Kepler. He showed that they can use asteroseismology to determine the fraction of hydrogen burning that has completed inside a star; that is, as a clock! He showed nice results on exoplanets too, including surprising albedo measurements and a demonstration that multi-planet systems draw their planets from a different distribution than single-planet systems. That's nice. He said that in exchange for a mission extension, the Kepler team is giving away all the data with no proprietary period, starting soon. That might have a big impact on my activity in the next year!
I spent the weekend—in transit to Heidelberg and in the garden there—trying to spec out the least complicated possible project demonstrating that hierarchical inference beats averaging for weak-lensing shear-map measurement. Today I was at MPIA, my summer home, where I will be in residence for July and August. I showed off The Thresher to the PanSTARRS team at MPIA, and then spent the afternoon struggling with Python with Rory Holmes. We are very close to resubmitting our self-calibration paper, which shows that some imaging survey strategies are much better than others. Holmes and I also talked about the next projects, which involve more realistic survey strategies for Euclid and BigBOSS and also methods that will permit simultaneous self-calibration of the flatfield, of intra-pixel variations in sensitivity, of the astrometric mapping, and of the point-spread function.
I spent the afternoon in Amsterdam talking one-on-one with people about Gaussian processes. There are a lot of time-domain projects in astrophysics that involve both stochastic and quasi-periodic variability. Aside from eclipsing binaries, almost no variables are perfectly periodic. Indeed even eclipsing binaries are complicated if the members of the binary are themselves stochastically variable. Gaussian processes are among the simplest methods for modeling stochastically varying and quasi-periodically varying objects, but the simplicity is only visible once you have confronted some conceptual blocks! It is simultaneously very simple and not obvious that you can think of your entire data set as a single draw from an enormous, multi-variate Gaussian (or even better, a finite-dimensional sampling of an infinite-dimensional one). Once you make that leap (the leap of seeing not each point as a Gaussian draw but the whole data set as a single Gaussian draw), you get a huge amount of power to describe non-trivial phenomena in the time domain.
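Here is a minimal sketch of that leap in code. It is an illustration, not anyone's production method: the quasi-periodic kernel (a periodic kernel multiplied by a slow squared-exponential decay), its parameter values, and all the function names are assumptions chosen for the example. The entire light curve is one draw from a single multivariate Gaussian whose covariance matrix is built from the kernel, and the likelihood of a whole data set is one Gaussian evaluation.

```python
import math
import random


def quasi_periodic_kernel(t1, t2, amp=1.0, period=3.0, decay=10.0, smooth=1.0):
    """Covariance between times t1 and t2: periodic structure that slowly decorrelates."""
    dt = t1 - t2
    return amp ** 2 * math.exp(-0.5 * (dt / decay) ** 2
                               - 2.0 * math.sin(math.pi * dt / period) ** 2 / smooth ** 2)


def covariance(times, noise_sigma):
    """Kernel matrix plus independent measurement noise on the diagonal."""
    return [[quasi_periodic_kernel(ti, tj) + (noise_sigma ** 2 if i == j else 0.0)
             for j, tj in enumerate(times)] for i, ti in enumerate(times)]


def cholesky(matrix):
    """Lower-triangular L with L L^T = matrix (matrix must be positive definite)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            lower[i][j] = math.sqrt(s) if i == j else s / lower[j][j]
    return lower


def draw_light_curve(times, noise_sigma, rng):
    """One draw of the whole data set from the zero-mean multivariate Gaussian."""
    lower = cholesky(covariance(times, noise_sigma))
    white = [rng.gauss(0.0, 1.0) for _ in times]
    return [sum(lower[i][k] * white[k] for k in range(i + 1))
            for i in range(len(times))]


def log_likelihood(values, times, noise_sigma):
    """Log of the zero-mean multivariate Gaussian density at the full data vector."""
    n = len(values)
    lower = cholesky(covariance(times, noise_sigma))
    z = []  # forward substitution: solve lower z = values
    for i, v in enumerate(values):
        z.append((v - sum(lower[i][k] * z[k] for k in range(i))) / lower[i][i])
    log_det = 2.0 * sum(math.log(lower[i][i]) for i in range(n))
    return -0.5 * sum(zi ** 2 for zi in z) - 0.5 * log_det - 0.5 * n * math.log(2.0 * math.pi)


times = [0.5 * i for i in range(40)]
light_curve = draw_light_curve(times, noise_sigma=0.1, rng=random.Random(1))
log_like = log_likelihood(light_curve, times, noise_sigma=0.1)
```

Fitting a real variable star then means optimizing or sampling the kernel parameters (here held at their defaults) under `log_likelihood`; for large data sets you would want a proper linear-algebra library rather than this hand-rolled Cholesky.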