On a low-research day, Iain Murray (Edinburgh) sent me email telling me to never use kernel density estimates! I was flattered but I asked for some clarification. Chris Martin (Caltech) gave an inspiring seminar about directly observing the intergalactic medium in emission with ultraviolet imaging spectroscopy. He showed some very nice astrophysics results from pilot projects. This could change the world; right now most of the baryons in the Universe are invisible. Foreman-Mackey started work on his punk Kepler data interface.
Fadely's calibration projects are evolving from k-nearest-neighbor to kernel density estimates. We spent some time today trying to figure out how KDEs are implemented in practice in dimensions larger than 1. Almost all the discussion on the web uses one-dimensional examples, which are cool and instructive, but there are a few details about generalization to higher dimension that are non-trivial. I think we figured them all out by the end of the day.
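For concreteness, here is a minimal sketch of a KDE in more than one dimension. It is not a record of what we settled on; it assumes a product-Gaussian kernel with one bandwidth per dimension (a diagonal bandwidth matrix) and sets those bandwidths with Scott's rule. The function names are made up for illustration. The two details that one-dimensional examples hide are that each dimension needs its own scale and that the normalization depends on the dimension.

```python
import math


def scott_bandwidths(points):
    """Per-dimension bandwidths from Scott's rule: sigma_j * n**(-1/(d+4))."""
    n, d = len(points), len(points[0])
    factor = n ** (-1.0 / (d + 4))
    bandwidths = []
    for j in range(d):
        column = [p[j] for p in points]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / (n - 1)
        bandwidths.append(factor * math.sqrt(var))
    return bandwidths


def gaussian_kde(points, bandwidths):
    """Return a density function built from a product-Gaussian kernel."""
    n, d = len(points), len(points[0])
    # The normalization depends on dimension: n * (2 pi)^(d/2) * prod(h_j).
    norm = n * (2.0 * math.pi) ** (d / 2.0) * math.prod(bandwidths)

    def density(x):
        total = 0.0
        for p in points:
            # Scaled squared distance, one bandwidth per dimension.
            chi2 = sum(((xj - pj) / hj) ** 2
                       for xj, pj, hj in zip(x, p, bandwidths))
            total += math.exp(-0.5 * chi2)
        return total / norm

    return density
```

Usage would be something like `density = gaussian_kde(points, scott_bandwidths(points))` followed by `density(x)` at any point `x`. A full (non-diagonal) bandwidth matrix is the obvious next generalization, and this sketch does not attempt it.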
Foreman-Mackey found some craziness with the eccentricities in his Kepler fits. These were puzzling for a while: Kepler data is only minimally sensitive to eccentricity. But then Foreman-Mackey remembered the obvious point that if you fix the parent-star mass and radius, then you can constrain the eccentricity. That's even been done for real systems.
Foreman-Mackey resubmitted the emcee paper and it is now accepted for publication in PASP. Update your CVs and your reference lists, people! Chris Martin (Caltech) showed up yesterday for a week at NYU; I spent a good chunk of the day today bending his ear about my project with Schiminovich to recalibrate the GALEX satellite and deliver time-domain lightcurves at the quantum limit (photon by photon). Martin, being the PI of GALEX (and now the owner-operator, since NASA loaned GALEX to Caltech), pointed out some subtleties of any GALEX model-of-everything.
Foreman-Mackey and I discussed the referee report for the emcee paper (which we submitted to PASP at the beginning of the month). We want to make a few additional changes, including citing all the people who are using it. I was pleased to learn that in its first 11 months on the arXiv, the emcee paper got 25 citations! And those are citations of an unrefereed paper! It was for these general reasons that Rix had insisted that we submit it to a refereed journal.
Foreman-Mackey is also under the gun to finish the Bart paper, in which we deliver ultra-fast MCMC sampling for exoplanet transit lightcurves. We discussed parameterizations of the celestial mechanics problem, as we so often do. It's not trivial.
Late in the day, I wrote words in my document that justifies the Long-Term Discounted Free Cash Flow as the basis for all decision making, even in statistical inference.
Around a nice talk by Daniel Grin (IAS) about estimators for non-Gaussianity in CMB projects and related matters, Marshall and I worked on initialization and optimization of lens models for PanSTARRS data. The optimizer was borked but we figured out it was because the (numerical) derivatives were being taken with step sizes that were too large. Fixed that and it now almost works!
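To illustrate the step-size problem (with a toy function, not our lens model), here is a hedged sketch of a central finite difference. The function `wiggly` is hypothetical; the point is only that when the step is comparable to the scale on which the function varies, the numerical derivative can be badly wrong, even in sign.

```python
import math


def central_difference(f, x, step):
    """Central finite-difference estimate of df/dx at x."""
    return (f(x + step) - f(x - step)) / (2.0 * step)


def wiggly(x):
    """Toy objective that varies on a scale of about 1/50; its true slope at x=0 is 50."""
    return math.sin(50.0 * x)


# A step comparable to the variation scale straddles more than one wiggle.
coarse = central_difference(wiggly, 0.0, 0.1)
# A step far below the variation scale recovers the slope.
fine = central_difference(wiggly, 0.0, 1e-5)
```

Here `coarse` comes out negative while the true derivative is +50, which is exactly the kind of thing that sends an optimizer in the wrong direction. Going too small eventually runs into round-off, so the right step depends on the problem.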
Marshall and I got Lang on the phone to talk about The Tractor, which Marshall and I are adapting to fit strong lenses (and alternative models) to every source in the PanSTARRS imaging. With help from Lang we got the code running, but it looks like initialization (as usual) is the problem. Late in the day we wrote the abstract of paper 1 on the subject.
Sarah Ballard (UW) came in for the day to give an outstanding talk about habitability and exoplanets. Her talk was a beautiful balance between the general (exoplanet populations and host stars; you can't characterize a planet if you can't characterize its star; M dwarfs have lots of planets but bad spectral models; etc) and the specific (how Kepler 61b went from being habitable and Earth-like to not either). She pointed us to some great literature on the statistics of planets and false positives. Among the many tidbits from the talk: She repeatedly called exoplanet researchers "exoplaneteers". She predicted that there would be a rocky, habitable-zone planet discovered in 2013. She made the point that directly measured stellar diameters (from optical interferometry) have revolutionized the field. I can't help mentioning also that the talk was hilarious and interactive, with many audience members participating, just like a seminar (in my mind) should be. I love my job!
There has been a lot of discussion at CampHogg about how we might publish and get attribution and citation for code. There is the Astrophysics Source Code Library, which is definitely a step in the right direction, but I would like something that is as valuable as a refereed publication. In discussions over lunch today, Foreman-Mackey and I came up with a solution: We have been thinking about how to publish code but we should rather be thinking about how to code-ify publications. That is, we should think of each (relevant) publication as being a code release. That might work well with the current direction of our work, and certainly for Foreman-Mackey's dissertation work. Since we are tool-building, each science publication is backed up by releasable tools, for which those publications will serve as documentation and citation cows. I haven't really worked it all out yet, but this might work. Not much else got done today, except some writing for Mykytyn and some consulting with Ingyin Zaw (NYUAD) regarding maser galaxies.
Fadely and I got together a simpler formulation of Fadely, Fergus, and my self-calibration project. I think we have something that makes sense but also might be computable in practice. I realized that there are two different kinds of self-calibration: The first category contains methods (like "grid tests") in which you know the identities of the sources in the images and you are checking that the measurements of those sources in properly calibrated images do not depend on detector position. The second category contains methods (like the "super-flat") in which you don't know what the pointing is of any image, but you expect that the statistics of properly calibrated detector pixels (in the long run) will all be identical. We are working on generalizations of the second type, which I am calling "probabilistic" self-calibration.
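As a cartoon of the second category (the classic super-flat, not the probabilistic generalization we are working on), here is a minimal sketch. It assumes every pixel sees scene values drawn from the same distribution in the long run, so that differences between per-pixel averages can be attributed to the detector. The simulation, the one-dimensional "detector", and the function names are all hypothetical.

```python
import random


def estimate_flat(images):
    """Estimate relative pixel sensitivities from many images of unknown pointing.

    Each image is a list of pixel values. The estimate is the per-pixel
    mean over images, normalized so that the mean sensitivity is 1.
    """
    n_pix = len(images[0])
    pixel_means = [sum(img[i] for img in images) / len(images)
                   for i in range(n_pix)]
    overall = sum(pixel_means) / n_pix
    return [m / overall for m in pixel_means]


def simulate_images(flat, n_images, seed=0):
    """Toy data: every pixel sees an independent scene value from the same
    distribution, multiplied by that pixel's (unknown) sensitivity."""
    rng = random.Random(seed)
    return [[f * rng.uniform(0.5, 1.5) for f in flat]
            for _ in range(n_images)]
```

With a few thousand simulated images, `estimate_flat(simulate_images(flat, 2000))` recovers a made-up `flat` to around a percent. Real data are harder because the scene statistics are not identical across the detector on any finite set of exposures.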
Fouesneau (UW) and I discussed and adjusted the initialization for his ensemble sampling (with emcee) fits of King models to young stellar clusters in the PHAT data. Our pretty consistent experience is that you should initialize the ensemble in a pretty small ball in parameter space and then burn it in to a fair sampling. We also looked at the autocorrelation times, which are not stably measured in short chains, and only stably measured when the chains are long enough that you are properly converged. All of the experience we have developed in MCMC sampling for inference in typical astronomy problems ought to be passed on to the community somewhere! After Foreman-Mackey finishes his current exoplanet paper, we might take a couple weeks and write the "how to do MCMC" document.
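Two of those pieces of experience can be sketched in a few lines. This is a standalone illustration in plain Python, not emcee's own interface: the function names, the default ball size of 1e-4, and the windowing constant of 5 are assumptions for illustration. The first function starts the walkers in a tiny Gaussian ball around a guess. The second estimates the integrated autocorrelation time with a self-consistent window and reports whether the chain was long enough for the window to close.

```python
import random


def initialize_small_ball(center, n_walkers, scale=1e-4, seed=0):
    """Start every walker in a tiny Gaussian ball around a first guess."""
    rng = random.Random(seed)
    return [[c + scale * rng.gauss(0.0, 1.0) for c in center]
            for _ in range(n_walkers)]


def integrated_autocorr_time(chain, c=5.0):
    """Estimate tau = 1 + 2 * sum_k rho(k) for a one-dimensional chain.

    The sum is truncated at the first lag W with W >= c * tau(W).
    Returns (tau, converged); converged is False when the chain is too
    short for the window to close, in which case tau is not trustworthy.
    """
    n = len(chain)
    mean = sum(chain) / n
    resid = [x - mean for x in chain]
    var = sum(r * r for r in resid) / n
    if var == 0.0:
        raise ValueError("chain is constant; autocorrelation is undefined")
    tau = 1.0
    for lag in range(1, n // 2):
        rho = sum(resid[i] * resid[i + lag] for i in range(n - lag)) / (n * var)
        tau += 2.0 * rho
        if lag >= c * tau:
            return tau, True
    return tau, False
```

The `converged` flag is the code version of the statement above: if the window never closes, the chain is too short to measure its own autocorrelation time, and you should keep running.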
On my way to work I ran into Fergus, taking photos in preparation for a SIGGRAPH submission. I helped him out for an hour. Fouesneau (UW) made plots to evaluate the quality of his PHAT young stellar cluster fits. They look great; I think he has nailed this fitting. Next up: Do the fits in multiple bands and check that the color distribution is tighter than it used to be. At lunch Foreman-Mackey figured out that the dry-erase glass coffee table I am building (yes, building) could include a back-projected external monitor. That lost us some time with online shopping for compact, short-throw projectors. At MCMC meeting we (Goodman, Hou, Foreman-Mackey, Fadely, and I) discussed combinatoric degeneracies and their ubiquity and hardness. The fact that you can reorder a set of planets or stars or whatever in your model and leave the likelihood unchanged is a real problem: You either fix this massive, perfect degeneracy in the prior (by enforcing order) or else live with it. Either way, it hurts performance in sampling with almost all known methods. Late in the day, as I was describing my dream of a data-driven modeling approach to APOGEE chemical abundances, Fadely had a brainstorm: We could consider random subsets of the spectrum to avoid unknown or uncharacterized data or noise issues. That made Foreman-Mackey immediately say "random forest", a method I never thought I would find myself using. Over the next hour I became more and more convinced that random forest could do almost exactly what I need. There were many other breakthroughs in that conversation, including desiderata for the data-driven model and ways to test it or start out.
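The combinatoric degeneracy is easy to demonstrate with a toy. In this sketch (a made-up model, not any of our real ones) the signal is a sum of unit-amplitude sinusoids, one per "planet", so the likelihood cannot tell which planet is which. The ordered prior is one way to enforce order; it keeps exactly one of the K! relabelings.

```python
import itertools
import math


def log_likelihood(periods, times, data, sigma=1.0):
    """Toy Gaussian log-likelihood (up to a constant) for a sum of sinusoids.

    The model sums over components, so any reordering of `periods`
    gives the same value.
    """
    total = 0.0
    for t, y in zip(times, data):
        model = sum(math.sin(2.0 * math.pi * t / p) for p in periods)
        total += -0.5 * ((y - model) / sigma) ** 2
    return total


def log_prior_ordered(periods):
    """Flat prior restricted to strictly increasing periods."""
    ordered = all(a < b for a, b in zip(periods, periods[1:]))
    return 0.0 if ordered else -math.inf


def count_equivalent_modes(periods):
    """Number of relabelings that share the same likelihood: K factorial."""
    return len(list(itertools.permutations(periods)))
```

With three components there are six identical modes in the unordered posterior and one in the ordered one. The ordered prior removes the copies but puts hard walls into parameter space, which is its own kind of trouble for a sampler.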
Morgan Fouesneau (UW) arrived for a few days of paper-finishing intensity. He is trying to measure the unresolved (confused) light in young star clusters observed in the PHAT data on M31. We discussed the model and I asked for some diagnostics. They look good; the models seem to be working better than Fouesneau himself thought! Marla Geha (Yale) also showed up for some spectroscopic calibration consulting. We discussed the projects I had been doing with Roweis (and Bolton a bit) at the time of Roweis's death. She may want to give them a shot! We started by specifying some visualizations to make from her current calibrations. The idea is: If we can find invariants of the calibration, we can strongly regularize the calibration fits. Roweis and I were doing this in full generality, but the first thing to do is just look at the data.
The meeting was mostly science today, with various collaboration teams showing results on chemistry and dynamics in the Milky Way disk. An extreme stand-out for me was work by Gail Zasowski (OSU) on diffuse interstellar bands observed in the spectra as absorption lines tracing the ISM. These could be combined with Bovy's stellar tracers to build a highly assumption-free model of the disk. There were many other beautiful results shown. I kept encouraging the teams to make sure they publish before SDSS-III DR10, both because that's when everything goes public, and because there will be a fun press release. Late in the day the discussion of DR10 to-do items started. The APOGEE team has its work cut out for itself: This is the reward for building a great spectrograph and using it with creativity and efficiency.
I spent the day at the APOGEE meeting at OCIW in Pasadena. The talks varied between science results and technical issues. Highlights for me included the following: Madore and Kollmeier both gave nice talks about RR Lyrae variables. They showed infrared calibrations of the period–luminosity relation and highly efficient spectroscopic methods for doing fast surveys. This could have a big impact on Milky Way substructure projects. Udalski showed outrageously awesome results from OGLE, including free-floating planet discoveries. Law showed the beautiful and highly encouraging first-light images from the MANGA instrument built on the BOSS spectrograph. Hearty and Wilson gave great talks about the issues involved in moving the APOGEE spectrograph to the Las Campanas Observatory, which is part of the plans for APOGEE-II. I love engineering!
Today was the first-ever AAS Hack Day. We held it on the last day of the Long Beach AAS Meeting. The basic structure was: You sign up (registration fee: zero); arrive at 10:00; people with ideas pitch them; hack hack hack in groups of one or two to six; reconvene at 16:40 to report success or progress or failure. No experience necessary! And it exceeded my wildest expectations. There were many successes; here are some personal favorites:
- Mars video: CMU Undergraduate Ashley Disbrow with help from Michele Vallisneri (Caltech) built from NASA Mars Rover data a freakin' three-d movie on the surface of Mars! Go get some red-blue glasses and fire it up!
- Justin Vasel (Minnesota) proposed, executed, and shipped a site to track astronomy-related US legislation. Let's hear it for open government, and serious hack skills.
- Various groups worked on different approaches to making spoken versions of abstracts or summaries of arXiv papers for busy commuters and the like.
- Fund Me Maybe: Emily Rice (Staten Island) built a large team to do filming for a parody music video about the profession of astronomy. They got some hilarious footage. They didn't finish more than a rough cut, but watch the internets for it.
- cosmology calculator: Brooke Simmons (Oxford) built an OSX widget to do cosmology calculations at dotastronomy last year. Today she made a web version of it. It includes the WMAP+BAO cosmologies and so on. Link here.
- FITS viewer: Eddie Schlafly (MPIA) wrote a bare-bones but highly functional FITS image viewer in python/matplotlib.
- Doug Finkbeiner (CfA) said in the pitch session that he wanted to have stretch and contrast sliders in World-Wide Telescope. He sat down with Jonathan Fay (Microsoft) and the two of them pair-coded it right there. Fay says that the modification will get pushed into the next major WWT release!
- Elizabeth Lovegrove (UCSC) had the exceedingly ambitious plan to build an observation scheduler for ground-based observing programs. She recruited team members to calculate exposure times and earth geometry issues. She then, in parallel, looked at optimization subject to constraints and non-trivial utility. Her team didn't finish—maybe because the project would take an operations research firm several years—but my goodness that's a good idea!
In addition to all those, there were several projects making improvements to astropy, writing documentation for astronomers, implementing APIs for various web services (I think mainly in Python), and trying to make diverse code play well together.
I think we are going to do this again next year! Thanks to Kelle Cruz (Hunter, AMNH) for organizing the thing, my start-up for paying for it, and Jonathan Fay and Microsoft for buying us all lunch.
Only an idiot would suggest recalibrating HST imagers, given that STScI has a set of teams of people who do this! Nonetheless, on the plane to Long Beach, I started writing a proposal to do just that. The key idea is that stars illuminate the device slightly differently than the sky or any calibration source: They are compact, and they have particular spectral energy distributions (SEDs). Fadely, Fergus, and I are close to having a method to infer the calibration from the science observations of actual stars.
My New Year's resolution is to finish at least one paragraph of writing in an unfinished paper (or proposal) before opening email or any other internet distractions. At lunch we calculated that if I successfully do this I will write (by myself, by this method alone) five full papers in 2013. Maybe even more, if I keep on writing once the first paragraph is done (as I did today). 2013 is the year of writing! (It has to be, given the number of unfinished projects we have gathering dust on the github and Astrometry.net servers.)
Today my paragraphs were written in my note about fitting tidal streams. I am writing the note for Andreas Kuepper (Bonn), who doesn't even know I am doing that, but I want to trap him into collaborating: He has great models of tidal streams, various people have awesome data, and I have a likelihood function.
My only research today was writing a paragraph or two in the Sloan Atlas photometric method paper. Several people asked me to explain my cryptic comments from yesterday, so here is a (draft, needs work) version of the first paragraph from the paper:
There is a deep sense in which obtaining precise and accurate galaxy photometry is fundamentally impossible. The reasons are multiple, but the dominant reasons are, first, that the angular outskirts of galaxies can contain significant luminosity but at incredibly low surface brightness and unknown morphology, and secondly, that the imaging point-spread function (PSF) can also have large-angle contributions that are unknown. The latter problem affects stellar photometry also, but so long as the PSF is constant, precise and accurate stellar photometry is possible. The difference between galaxies and stars is that all point-like stars will illuminate the PSF (correlate with it) identically. Stellar photometry does not rely on getting all this right so long as it deals with it consistently across stars. Not so for galaxies, each of which might have very different correlations with the PSF at large angles. There is no way to produce consistent photometry without knowing things about every galaxy and every PSF that are—almost in principle—unknowable.
I started writing the introduction for Mykytyn and Patel's coming paper on galaxy photometry. It opens by explaining why it is impossible to know the total flux coming into the telescope from a diffuse object like a galaxy. That's a subject I have spent hours discussing with the great Robert Lupton (Princeton). It is fun to try to get it written down in a paper.
I spent the morning trying to write stuff about my nascent Atlas. I spent the afternoon helping Fadely with debugging the calibration code. We realized that we need to take it apart and functional-test every part separately. Duh. It seemed like a low-research day in part because my to-do list is gaining items faster than I am crossing them off. That's not how January is supposed to go.
Mike Kesden (NYU) and I spent an enjoyable two hours coming to final consensus on what we think of the Hui et al paper about GW detection. We built an order-of-magnitude argument from the ground up (which confirms the Hui results but which, sadly, doesn't exist in the Hui paper). In the end, we reproduce the Hui expressions but do not confirm their final numbers. The paper is vague about how they get their final numbers, but I think the discrepancy must be in the bandwidth; I think the Hui team is mis-estimating the bandpass in which the effect is relevant.
Today, in (he claims) twenty minutes, inspired by our K-nearest-neighbors frenzy, Fadely built a nearest-neighbor data-driven photometric redshift system and it seems to work competitively straight out of the box. That led to an afternoon of nearest-neighbor hacking. I worked on expressing my issues about making KNN heteroskedastic (that is, switching to a weighted chi-squared definition of "neighbor"). Fadely and I pair-coded a tweak to our KNN detector calibration system, in which we don't use the neighbors naively, but rather use them to build a more complex model of the neighboring data space. Oh, there are whole careers that could be built on KNN methods!
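One way to read "a weighted chi-squared definition of neighbor" is sketched below. This is an assumption for illustration, not the code Fadely and I wrote: it takes independent Gaussian noise with known per-feature variances on both the query and the training objects, and scales each squared difference by the sum of the two variances. The function names are made up.

```python
def chi2_distance(x, x_var, y, y_var):
    """Chi-squared distance between two noisy points.

    Each squared difference is divided by the sum of the two noise
    variances for that feature (independent Gaussian noise assumed).
    """
    return sum((a - b) ** 2 / (va + vb)
               for a, va, b, vb in zip(x, x_var, y, y_var))


def chi2_neighbors(query, query_var, train, train_var, k):
    """Indices of the k training points nearest the query in chi-squared."""
    dists = [chi2_distance(query, query_var, y, y_var)
             for y, y_var in zip(train, train_var)]
    order = sorted(range(len(train)), key=lambda i: dists[i])
    return order[:k]
```

The consequence is that a noisy training point that is far away in Euclidean distance can be a better neighbor than a precisely measured point that is closer, because the noisy point is statistically consistent with the query and the precise one is not. Whether that is the behavior you want (noisy points become everyone's neighbor) is one of the issues.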
In the air home I wrote YATPD (or "yet another three-page document") proposing a method for finding interesting periodic variable stars in the PanSTARRS 3-Pi survey. The survey catalog is very hard to use, because it has no upper-limit information when stars aren't detected; it isn't even easy to find out if a star was in the detector footprint on a particular night; but we have no fear: We have the best methods in the world for missing data. I think my whole life these days is writing YATPDs. I think instead I ought to be ETPWITPD (or "executing the plans written in three-page documents"). I hope some of that is in store for 2013. Happy New Year!