lens finding

Marshall was in town and we tried to get some work done on our project to find strong lenses in ground-based imaging. We spent most of the time hacking around in legacy code. Foreman-Mackey showed us how to write a setup.py file for a Python module, which was very useful.
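For the record (and as a hedge against my forgetting), a minimal setup.py for a small pure-Python module looks something like the following; the package name and metadata here are placeholders of my own, not our actual project:

```python
# A minimal setup.py for a small pure-Python package; the name,
# version, and package directory here are hypothetical examples.
from setuptools import setup

setup(
    name="lensfinder",           # hypothetical package name
    version="0.0.1",
    description="toy example of a minimal setup.py",
    packages=["lensfinder"],     # directory containing __init__.py
)
```

With a file like this in place, `python setup.py install` (or, these days, `pip install .`) does the right thing.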


eclipse visualization

My only significant reportable research today was working on our GALEX eclipses, in particular the figures. Here's one:


MCMC for least-squares

Goodman came by with his student Bo Zhu to talk about a method for sampling in the special case that the logarithm of the likelihood is quadratic in a residual (like a chi-squared) and the priors on the parameters are Gaussian (or improperly flat and unbounded). They have a very clever method, inspired by Levenberg-Marquardt optimization, that makes use of the derivative of the model with respect to the parameters; their method perfectly (that is, with unit autocorrelation time) samples any linear least-squares problem straight out of the box. They asked me for problems (nails for their MCMC Hammer, if you will); we discussed a few but decided to start by testing it on problems on which we have run our ensemble sampler.
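To make the "perfect sampling" claim concrete: for a linear model with Gaussian noise (and a flat prior), the posterior is exactly Gaussian, so independent samples come straight from a Cholesky factor of the posterior covariance. This toy sketch is my own illustration of that special case, not their Levenberg-Marquardt-inspired method:

```python
import numpy as np

# Toy linear least-squares problem: y = A x + Gaussian noise.
# In this conjugate case the posterior on x is exactly Gaussian, so
# independent samples (unit autocorrelation time) come for free.
rng = np.random.default_rng(42)
ndata, ndim = 50, 3
A = rng.normal(size=(ndata, ndim))        # design matrix
x_true = np.array([1.0, -2.0, 0.5])
sigma = 0.1
y = A @ x_true + sigma * rng.normal(size=ndata)

# Posterior mean and covariance (flat prior on x):
ATA = A.T @ A / sigma**2
ATy = A.T @ y / sigma**2
cov = np.linalg.inv(ATA)
mean = cov @ ATy

# Draw perfectly independent posterior samples via a Cholesky factor:
L = np.linalg.cholesky(cov)
samples = mean + rng.normal(size=(10000, ndim)) @ L.T

print(mean)  # close to x_true, as it should be
```

Anything with a nonlinear model breaks the closed form, of course; that's where their derivative-based proposal earns its keep.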


good attitude

I spent some of my travel time working on a suggestion for the Gaia pipeline teams: A pillar of the Gaia data analysis plan is a fully data-driven (that is, not physical) attitude model, set by the timings of the star transits. The model they plan on implementing is immense: It consists of a hundred million attitude pseudo-vectors on a grid of spline knots. I am proposing that the team set the freedom or complexity of this model objectively, using either cross-validation or a Bayesian mixture of different complexities. Either way, objective setting of the complexity should beat the team's plan of hard-set complexity; what I don't know is whether this will or can improve the results significantly enough to justify the work. I also suggest that they can tune the complexity in two different ways: They can either change the knot spacing or else tie together adjacent knots using a smoothness prior.
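As a toy illustration of the model-selection idea (not the attitude model itself), here is cross-validation choosing the complexity of a one-dimensional fit, with polynomial degree standing in for spline-knot density; all the numbers are invented:

```python
import numpy as np

# Toy version of objective complexity selection: fit noisy 1-D data
# with polynomials of increasing degree (standing in for spline knot
# density) and pick the complexity by held-out validation error.
rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0.0, 1.0, 200))
y = np.sin(2.0 * np.pi * t) + 0.1 * rng.normal(size=200)

train = np.arange(200) % 2 == 0   # even points train, odd validate
best_deg, best_err = None, np.inf
for deg in range(1, 11):
    coeffs = np.polyfit(t[train], y[train], deg)
    resid = y[~train] - np.polyval(coeffs, t[~train])
    err = np.mean(resid**2)
    if err < best_err:
        best_deg, best_err = deg, err

print(best_deg, best_err)  # an intermediate degree wins
```

The same loop works with knot spacing (or smoothness-prior strength) as the complexity knob; the point is only that the data, not the engineer, pick the freedom of the model.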


telnet protocol

[I am on family-related travel this week, so although I am working, it is not at full capacity, and I hereby assert The Rules.]

I got on the screen+skype pair-code tip with Lang for a couple of hours today to debug the part of our white-dwarf-eclipse project that queries the JPL Horizons system for the GALEX ephemeris. We need to contact the ephemeris because we have to translate the ingress and egress times for the eclipses into Solar System barycentric time. Think: Eight minutes from Earth to the Sun and all, plus additional aberration because of the spacecraft. It turned out that the problem was our negotiation with the telnet port (yes, we party like it's 1983 here at Astrometry.net headquarters) at JPL. We are using the standard Python telnetlib module, which doesn't hold your hand at all. That is, it requires us to construct all the "DO", "DONT", "WILL", and "WONT" commands by hand. If you have no idea what I am talking about, thank your lucky stars! Suffice it to say that as we were working, it sounded like we were hacking into NORAD for some War Games.
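For the curious, the negotiation amounts to answering the server's option proposals, and the simplest well-formed stance is to refuse everything. This sketch uses the raw RFC 854 byte constants rather than telnetlib so the logic is explicit; the helper function is mine, not anything in our actual code:

```python
# Sketch of telnet option negotiation by hand. Constants are the
# command bytes from RFC 854; the refuse-everything policy below is
# the simplest well-formed response to a negotiating server.
IAC  = bytes([255])  # "interpret as command" escape byte
DONT = bytes([254])
DO   = bytes([253])
WONT = bytes([252])
WILL = bytes([251])

def refuse_option(send, cmd, opt):
    """Refuse every option the server proposes: answer DO with WONT
    and WILL with DONT."""
    if cmd == DO:
        send(IAC + WONT + opt)
    elif cmd == WILL:
        send(IAC + DONT + opt)

sent = []
refuse_option(sent.append, DO, bytes([24]))   # server: DO TERMINAL-TYPE
refuse_option(sent.append, WILL, bytes([1]))  # server: WILL ECHO
print(sent)  # -> [b'\xff\xfc\x18', b'\xff\xfe\x01']
```

In real telnetlib usage you would hang logic like this on `Telnet.set_option_negotiation_callback`, which hands you the socket, the command byte, and the option byte.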

In the end, we succeeded, and it occurs to me that a Python module that wraps the telnetlib module and talks competently to the JPL Horizons system could be very, very valuable. Should we clean that up and push it to github? I am feeling all "code releasy" these days, with the huge positive community reaction we have got for emcee.


Dr. Berry Holl

I was the opponent today in the PhD defense of Berry Holl (Lund). As opponent, my role was to draw out the content of the thesis with questions and also subject the candidate to (extremely public) scientific scrutiny. It was great fun, in part because the thesis was so good, and in part because Holl handled the questions so well and so completely. His thesis has a lot of challenging linear algebra in its core (and I love linear algebra) but in the end I found myself spending most of my time on the 2/7 of his thesis that dealt with radiation damage to the Gaia CCDs. The question Holl answered is what that damage is likely to do to the data and to the precise measurement (and more importantly modeling) of the transit times of the stars. The conclusion we came to during the event is that the Gaia science data will actually contain more information about charge-transfer inefficiency and radiation damage than any other source of data before or during the mission. And that includes all the data that have been taken expressly for the purpose of learning about radiation damage!

Many Gaia projects have this character: The nasty systematic effects (general relativity in the Solar System, substellar companions, surface convection on giants) that matter at Gaia precision turn Gaia into the world's most sensitive measuring device for those effects. So in particular, at the end of the Gaia mission, we will have a great deal of precise quantitative knowledge about the behavior of CCDs that have been damaged; a great deal of precise knowledge about CCDs in general, really.

Holl passed his defense unanimously, of course, and it is with this post that I congratulate him and welcome him to the community of scholars!


three themes of data analysis

Melvyn Davies (Lund) upbraided me for not bringing a jacket and tie to Lund for tomorrow's PhD defense event, where I am a very important person. After he finished haranguing me about my jeans, we discussed the possibility of ever imaging or detecting directly free-floating planets. The conversation was discouraging!

I gave my seminar, and in preparing it I realized that there are a lot of simple themes connecting the crazy array of seemingly disconnected topics I work on. I was able even to classify my projects: Those that involve data-driven models (Tsalmantza quasar, Bovy quasar, and Fergus high-contrast projects); those that involve probabilistic classification or mixture models (Foreman-Mackey calibration, Lang Comet Holmes, and Koposov GD-1 projects); and those that involve moving away from catalogs and down towards rawer (pixel) data (Lang faint-motion and my own crazy large-scale structure projects). All this pleased me, because those are three ideas that can (in principle) be put into a one-hour seminar. I failed today, but it is a process, right?

By the way, one of the nicest conclusions of the Holl (Lund) thesis is that forward modeling is the best way to deal with Gaia's charge-transfer inefficiency issues. That's good for my brand.


inverting inverse covariance matrices

On my flight across the Atlantic I took photos from my seat which I today assembled into this video:

I also finished reading and making comments on Berry Holl's (Lund) PhD thesis. He has worked out expansions for inverting enormous Gaia-like inverse covariance matrices—inverse covariance matrices tend to be simple and sparse, covariance matrices tend to be complex and dense—and he has shown that Gaia can deliver on its promises despite the expected radiation damage to its CCDs. This radiation damage leads to charge transfer inefficiency, which leads to changes in the point-spread function in the scan (charge-transfer) direction on the CCDs. This leads to timing residuals which in principle affect astrometric measurements. However, the multiple scan angles at which Gaia hits each field save it, even if the CTI is evolving with time and doesn't match exactly any of the (somewhat heuristic) models.

Holl impressed me 1.5 years ago and it will be an honor to play the (formal) role of opponent at his defense.

One thing I need to work out (for my own good) is how inverting the inverse covariance matrix relates to marginalization. The diagonal elements of the inverse covariance matrix are like the inverse uncertainty variances holding all other parameters fixed, whereas the diagonal elements of the covariance matrix are like the uncertainties marginalizing out all other parameters. That's all cool. But inverting the inverse covariance matrix is something any responsible frequentist must do; marginalization is only permitted for Bayesians. Do you see why I am confused? I am not confused about the math; I am confused about the meaning.
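The math, at least, is easy to demonstrate. Here is a two-parameter toy showing that the marginal variances (diagonal of the covariance) exceed the conditional variances (reciprocal diagonal of the inverse covariance) whenever there are correlations:

```python
import numpy as np

# Two correlated parameters: marginal variances (diagonal of the
# covariance matrix) exceed conditional variances (reciprocals of
# the diagonal of the inverse covariance) when the off-diagonal is
# nonzero.
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])
inv_cov = np.linalg.inv(cov)

marginal_var = np.diag(cov)               # other parameter marginalized out
conditional_var = 1.0 / np.diag(inv_cov)  # other parameter held fixed

print(marginal_var)     # [1. 1.]
print(conditional_var)  # [0.36 0.36]
```

The stronger the correlation, the bigger the gap between the two; only for a diagonal covariance matrix do they agree.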


everything except the task at hand

Once again, Schiminovich and I were at our undisclosed location to work on GALEX time-domain projects. Instead we had a wide-ranging conversation about upcoming projects. We discussed the GALEX photon list project, and the kinds of things we will be able to do with a trillion time-tagged photons. We discussed joint NYU-Columbia advanced astrophysics lab courses that would have the students building and operating autonomous telescope hardware on NYC roofs. That project could beautifully combine teaching and research. We discussed what Popper calls verisimilitude: Are our well established models of the physical world literally true? Uh oh, weren't we supposed to be writing a paper? We discussed integral-field spectroscopy that is dispersed not only spectrally but also spatially, to drop the background levels.

In the airport (on my way to Lund, Sweden) I had conversations with Foreman-Mackey about the arXiv and Lang about jobs.


gravitational lens modeling

Despite all my intentions to stop working on strong gravitational lensing, I keep getting accidentally pulled back, with interesting projects from Marshall and new discoveries from Tsalmantza. Today I pitched some strong lensing modeling projects to NYU graduate student Cato Sandford. The idea is to simultaneously model multi-epoch multi-band imaging (with variable PSF) with a time-varying multiply imaged quasar and the galaxy that is lensing it. That conversation is about the only real research I can report on today.


publishing implementations

Foreman-Mackey and I got very close today to finishing a note for arXiv on his super-fast, parallel, ensemble sampler that we have been using in a range of projects (see recent papers by Lang and Bovy). We will put it up as an arXiv-only paper, which is something I love to do. But the fact that this is not a typical or normal kind of publication—for example, there is nowhere that it could appear in the peer-reviewed literature—is crazy: A great implementation of a good algorithm that enables lots of science is itself an extremely important contribution to science, just like a telescope or a camera or a spectrograph. How can we make these things count like publications? And how can we change the language we all use that separates these contributions out into categories that are always contrasted with the category "science"? Enough spouting; watch the arXiv this week for some block-busting code.

[Note added a week later: Here is the arXiv paper.]


electromagnetism and massive stars

Inspired in part by our meetings yesterday about Fergus's modeling of imaging data in a coronograph, I worked on a physically motivated re-factor of my code to model electromagnetic fields (phase and amplitude) in astronomical telescopes and cameras. I am just a few dozen lines of code away from having a full (if highly approximate) model of a simple coronograph.

In the afternoon, Selma de Mink (STScI) gave a nice seminar about extremely massive star evolution. Among many other things, she noted that there is a possibility that low-metallicity, rapidly rotating, massive stars could evolve to very hot temperatures and very high luminosities where no other kinds of stars can be. I think we can find these things in PHAT data on Andromeda; I need to email the team.


exoplanets and speckles

Fergus did a set of demonstrations today for Oppenheimer, Brenner, and me of his planet-finding code for Oppenheimer's P1640 high dynamic-range imager. The imager blocks out most of the light of the star in an intermediate focal plane, but a combination of atmosphere and optical distortions plus physical optics means that huge amounts of light still hit the focal plane, in a very speckly pattern of blobs. Fergus showed us that he can (potentially) find planets among those speckles, even planets that are percent-level distortions of the speckle pattern! If this holds up it could have huge impact on high dynamic-range imaging, now and in the near future. For the past week or two I have also been playing around with modeling electromagnetic fields in imperfect cameras to see if we can make a more physically motivated model (Fergus's model is data-driven rather than physics-driven).


HST target selection

Tsalmantza and I discussed how we might winnow down our list of potentially lensed quasars into a set of sensible targets for HST imaging. It is essential to look for marginal evidence of extension; that is, do the quasars depart from our expectation of point-source morphology? A more speculative path is to look at luminosity indicators: Are any of the quasars brighter than you would expect given line strengths and ratios, possibly indicating gravitational magnification?


eclipsing binaries to population inference

Schiminovich and I returned to our undisclosed location today to work on the eclipsing white dwarfs we found in our GALEX time-stream project. We spent an inordinate amount of time working out how to infer the properties of all white-dwarf binary systems from a small number of discovered eclipsers. It is possible of course; the magic of a probabilistic generative model makes anything possible when selection and discovery can be modeled at least statistically, which it can (easily) in this case. In paper one, we are only going to do the most rough (order-of-magnitude) population inference, but eventually we should be able to say quite a bit. One consequence of our discussion was an increased optimism that we might have or get some companions that are substellar.
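The most rough version of that inference fits in a few lines. Assuming (purely for illustration; every number below is invented) a known detection efficiency and a flat prior on the rate, the posterior on the underlying number of systems is a Gamma distribution:

```python
import numpy as np

# Toy population inference with selection: if our search detects an
# eclipsing system with probability eps, and we found n_det of them,
# then with a flat prior the underlying number Lambda of systems has
# a Gamma(n_det + 1, 1/eps) posterior. All numbers are made up.
rng = np.random.default_rng(7)
eps = 0.05    # hypothetical detection efficiency
n_det = 3     # hypothetical number of discovered eclipsers

# Posterior samples of the true number of systems in the survey:
lam_samples = rng.gamma(shape=n_det + 1, scale=1.0 / eps, size=100_000)

print(lam_samples.mean())  # posterior mean is (n_det + 1) / eps = 80
```

The real problem has an efficiency that depends on period, depth, and sampling, but the structure is the same: model the selection, and the small discovered sample constrains the whole population.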


MCMC tips

I spent my research time today working on the discussion and tips section of Foreman-Mackey's MCMC code document. We hope to go up on the arXiv later this week. His MCMC code is being used by Bovy, Lang, and others as well as Foreman-Mackey and me in various places. The tips I worked through involve initialization of an ensemble of samplers, how to know that you have sampled for long enough, how to manage acceptance fraction, how to limit your sampling so you don't waste time, and how to deal with multi-modal probability distributions. His document is starting to look like part of the nascent "Data Analysis Recipes" series of papers being written by me and friends. Can't wait to see this hit arXiv; he has built a pretty great tool with a wide range of applications.
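One of those tips, estimating the integrated autocorrelation time to know whether you have sampled long enough, can be sketched in a few lines of numpy. This is a bare-bones estimator of my own, not the machinery in Foreman-Mackey's code:

```python
import numpy as np

def autocorr_time(chain, window=50):
    """Integrated autocorrelation time of a 1-D chain, with the
    autocorrelation function truncated at `window` lags."""
    x = chain - chain.mean()
    var = np.dot(x, x) / len(x)
    tau = 1.0
    for lag in range(1, window):
        tau += 2.0 * np.dot(x[:-lag], x[lag:]) / (len(x) * var)
    return tau

# Check on an AR(1) chain, whose true integrated autocorrelation
# time is (1 + rho) / (1 - rho) = 19 for rho = 0.9:
rng = np.random.default_rng(0)
rho, n = 0.9, 100_000
noise = rng.normal(size=n)
chain = np.empty(n)
chain[0] = 0.0
for i in range(1, n):
    chain[i] = rho * chain[i - 1] + noise[i]

print(autocorr_time(chain))  # roughly 19
```

The rule of thumb is then that a chain is only trustworthy once its length is many (tens of) autocorrelation times; the truncation window has to be chosen with some care for very correlated chains.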


JWST astrobiology

Lisa Kaltenegger (MPIA) gave the astro seminar today. Her group has been doing great things on exoplanetary atmospheres and their observability; she showed that even the temperature structure and detailed energy balance and chemistry of an Earth-like atmosphere could in principle be inferred from low-resolution spectroscopy. The future is extremely bright in this area. One shocking thing that she said is that, given planned lifetime and sensitivity, JWST will not be able to realistically and at good signal-to-noise follow up the atmospheres of more than a few rocky planets (which are small and therefore very hard to observe). If we don't have rocky-planet candidates that are JWST-appropriate (that is, orbiting stars nearby enough to the Sun), JWST won't be able to do very much. It is not clear that we will be ready! In my view, which is perhaps not a consensus view, if JWST doesn't make a big impact on exoplanet research, it was probably not worth the (very large amount of) money. Cosmology and galaxy formation are great and all, but at the JWST price-tag (and exclusion of very worthy competitors), it has to make big impacts in many areas. So let's work on projects that get that candidate list ready; Kepler—awesome as it is—doesn't do the job because its field doesn't contain bright enough (near enough) primaries.


station keeping

Not much to report today, except referee-responding with Lang, MCMC packaging and documentation with Foreman-Mackey, and safe operation of heavy Tractor machinery with Mykytyn. Also, Mulin Ding (the best sysadmin in all of science) installed a new 30 TB of disk on our current favorite big compute machine. I am sure we will fill it by April.


data archives filled with undiscovered discoveries

I spoke with Tsalmantza today in our weekly phone meeting. She has run our two-redshift model (which discovered some binary black hole candidates last year) on all the quasar spectra in SDSS. She has found a large number that show evidence of a foreground galaxy. These are all excellent candidates to be multiply imaged gravitational lens systems, some of which will have time variations for measuring time delays. HST proposal time! The nice thing is that you can tell a lot about each system just from the spectrum, not just the redshifts but detailed galaxy and quasar properties, so target selection for follow-up can be a highly informed process.

In a not unrelated conversation, I discussed with anthropologist Katie Detwiler (New School) the industrialization of astronomy (especially in Chile); she is interested in how changes in astronomy (increasing importance of archival research and specialized observing staff separated from academic astronomers) are related to larger trends that include globalization and development. She asked me how astronomy has changed now that many of the things being discovered each year are in fact just being located in databases that were filled long (years) ago (as with SDSS). That's a great question.