I spent the morning working on Tsalmantza and my HMF method paper. I signed off and in the afternoon she finished and submitted it. I feel pretty good about that.
NYU undergraduate David Mykytyn has been helping Lang and me get the Tractor working on big galaxies, with the thought that we could model big galaxies with the same machinery that we are using to model small ones (and by big I mean angularly big). He got everything working in record time, but the model is so simple that it is having trouble dealing with all the real-world craziness at the centers of real, huge galaxies (think dust lanes etc). This is on the path to a new Atlas.
In a many-hour delay in Madrid airport at gate U58, I re-hashed the CMML workshop for a group of New-York-bound machine learners who were in other workshops, but who (today) had nothing but time (thanks to a mechanical issue with the airplane). I then tried to do some of the math that Marshall and I had been waving around yesterday. I think we might have some conceptual issues. The plan is sound, way sound, but we might not be able to transfer probabilistic information with quite the fidelity we had in mind yesterday.
One thing I didn't mention about the awesome day yesterday is that Marshall and I scoped (not scooped) two papers on PanSTARRS-like data. One involves making a super-simple catalog from the imaging, a catalog made up of a mixture of Gaussians (and nothing else, not even PSF determination). Marshall has the intuition that this catalog—not even really a catalog but more like a lossy but responsible description of the mean properties of the imaging—could be used to transmit to end users far more information than is transferred by traditional astronomical catalogs. We were on a roller coaster of
good news, bad news until we cracked it in a nasty cafe across the street from the Granada bus station.
Today was the first day of the workshops at NIPS, and the day of the Cosmology Meets Machine Learning workshop, organized by a group led by Michael Hirsch (UCL and MPI Tübingen). What a day it was! The talks, by astronomers doing cosmology with sophisticated machine-learning tools, were edutaining, with (among others) Lupton doing his best to pretend to be curmudgeonly (okay, he does have a point that some of the stuff I say is not all that practical), Starck showing amazing decompositions of Planck-like maps, and Refregier doing his best to alarm us about the difficulty of the cosmological weak-lensing problem. In between these talks were shorts by the poster presenters; all good and all high bandwidth in their four-minute spots. A standout for me was Kaisey Mandel and his hierarchical probabilistic model for type-Ia SNe, making the cosmological constraints more precise by hierarchically learning the priors over the nuisance parameters you need to marginalize out if you want to do things right!
While many left to ski, Marshall declared the afternoon break to be an
un-workshop in which workshop topics self-assembled and self-organized. This evolved to two big un-workshops, one on probabilistic graphical models, with Iain Murray doing the heavy lifting, and one on blind deconvolution with Hirsch throwing down. Hirsch showed some devastating results in blind and non-blind deconvolution, including (in the style of Rob Fergus), outrageous ability to compensate for bad hardware or bad photography. Outrageous.
Despite all that, it was the PGM workshop with Murray that—and I am not exaggerating here—was possibly the most educational ninety minutes of my post-graduate-school life. After some introductory remarks by Murray, we (as a group) tried to build a PGM for Refregier and Bridle's weak-lensing programs. Marshall insisted we use the notation that is common in the field and keep it simple, Murray insisted that we do things that are not blatantly wrong, Stefan Harmeling provided philosophy and background, especially about the relationship between generative modeling and probabilistic modeling, Lupton tried to stay as curmudgeonly as he could, and at the end, Murray broke it all down. It wasn't just science; it was like we were starring in an HBO special about science. We realized that PGMs are very valuable for de-bugging your thinking, structuring the elements of your code, and, of course, making sure you write down not-wrong probability expressions. Aawww Yeah!
At the end of the day, Marshall moderated a (huge) panel, which covered a lot of ground. The crazy thing is that we had some important points of consensus, not limited to the following: (1) As a pair of overlapping communities, our best area of overlap is in structured, physics-informed probabilistic modeling. Many cosmologists are stuck on problems like these, many machine learners have good technology (things like sparse methods, online and stochastic methods, and sampling foo). Neil Lawrence pointed out that the machine learners got their Bayes from astronomers Gauss and Laplace. Now the astronomers are asking for it back. (2) We should be setting up some simple challenges and toy problems. These make it easy to draw machine learners into the field, and help us boil our issues down to the key ideas and problems. That's Murray's big point.
Hirsch, Bridle, Marshall, Murray, and everyone else: Thank you. I absolutely cannot understand why Sam Roweis wasn't there for it. I never really will.
I went to (most of) the NIPS 2011 talks today (my first day at NIPS). Unlike the AAS meetings, the talks are very highly vetted (getting a talk at NIPS is harder—statistically speaking—than getting a prize fellowship in astronomy) and there are no parallel sessions, even though the meeting is almost as large as the AAS (NIPS is 1400; AAS winter is 2000-ish). One standout talk was by Laurent on the strange encoding of olfactory information in insects (and, apparently, humans, which are similar in this respect). There is a part of the olfactory system that looks like a sparse coding of the input, which looks (to my eyes) to be a very inefficient use of neurons. Another was by Feldman on
coresets, which are data subsamples (imagine that you have way too many data points to fit in RAM) plus associated weights, chosen such that the weighted sum of log-likelihoods of the coreset points is an epsilon-close approximation to the full-data log-likelihood (or other additive objective function). This concept could be useful for astrophysics; it reminds me of my projects with Tsalmantza on archetypes.
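Just to fix the idea for myself (this is not Feldman's construction, which chooses points and weights cleverly to get the epsilon guarantee), here is the bookkeeping with a uniformly chosen subsample and weights N/m, on a made-up Gaussian problem:

```python
import numpy as np

# Illustration only: a uniformly chosen "coreset" with weights N/m.
# Real coresets pick points and weights cleverly to get an epsilon-close
# guarantee; this just shows the weighted-log-likelihood bookkeeping.

rng = np.random.default_rng(42)
N, m = 100_000, 5_000
data = rng.normal(loc=3.0, scale=1.0, size=N)

def loglike(x, mu):
    # Gaussian log-likelihood, unit variance, constants dropped
    return -0.5 * (x - mu) ** 2

mu = 3.1
full = loglike(data, mu).sum()                     # full-data objective

idx = rng.choice(N, size=m, replace=False)
weights = np.full(m, N / m)                        # reweight to full size
approx = (weights * loglike(data[idx], mu)).sum()  # coreset objective

rel_err = abs(approx - full) / abs(full)
```

The weighted subsample sum is an unbiased estimator of the full-data objective; the clever part of a real coreset is making the error bound hold for every parameter value at once.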
On the bus to Sierra Nevada in the afternoon, Marshall and I tried to scope out our "next paper". I put that in quotation marks because we don't have a very good track record of finishing projects! We are going to do something that involves image modeling and approximations thereto that can be performed on catalog (database) quantities.
I have been reading about graphical models for a few days here and there. So on the plane (to NIPS 2011) I tried to draw a graphical model for a large-scale structure and weak-lensing jointly supported model of the density field and cosmological parameters. I think I am close. Marshall and I and anyone who can stand it are going to unconference on Friday and try to work it out.
Guangtun Zhu (JHU) stopped by this week (welcome back!) and came for a chat about modeling spectra, in particular quasar spectra that might have absorption-line systems redward of Lyman alpha. Most catalogs of absorption systems involve significant hand-vetting; we discussed methods to ameliorate or eliminate the by-hand aspects of the problem. Guess what I advocated? Justified probabilistic modeling of the spectra, plus some sensible priors.
It is not clear it counts as research, since I did it purely for fun, but I wrote a recursive code to build this kind of space-filling curve. It is slow, but it works for any image (array) that is 2^n×2^n. If it has any research application, it is in image compression, but I officially don't care about that subject (this paper notwithstanding; it isn't about image compression; it is about information!).
I am so proud of this; check out how scale-free it is: The top-left 4×4 block has the same two-dimensional pattern for the ones digit as the whole chart has for the sixteens digit, but with the
pixels being sixteen-cell blocks.
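In case anyone wants to play, here is a recursive construction with exactly that digit-wise scale-free property; it is the Z-order (Morton) layout, which may or may not be the curve I actually built:

```python
import numpy as np

def space_filling(n):
    """Recursively fill a 2**n x 2**n array with a space-filling index.

    This is the Z-order (Morton) layout, a guess at the curve described;
    it has the scale-free property that the top-left 4x4 block repeats,
    in its ones digit, the pattern the whole chart shows in its
    sixteens digit at the block level.
    """
    if n == 0:
        return np.zeros((1, 1), dtype=int)
    q = space_filling(n - 1)           # the quadrant pattern
    step = 4 ** (n - 1)                # cells per quadrant
    return np.block([[q,            q + step],
                     [q + 2 * step, q + 3 * step]])

chart = space_filling(4)               # 16 x 16, values 0..255
```

The scale-free check: `chart[:4, :4] % 16` is the same 4×4 pattern as `chart[::4, ::4] // 16`.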
Price-Whelan returned to NYU for a day (from distant Columbia, where he is now a grad student) to discuss methodologies for some of the projects he is thinking about with Palomar Transient Factory imaging data. There are lots of things possible; like most big projects, PTF creates more possible projects than there are collaborators to do them; this is why astrophysics is a great field! I argued that he should use the things we have been thinking about with respect to image combination and source detection to detect—and then analyze—the sources below the single-exposure detection limit. Any project PTF has done at high signal-to-noise can be done for far more sources if you are willing to work at lower signal-to-noise (and at the image level rather than the catalog level).
As my loyal reader knows, I am strongly against modeling non-integrable systems with integrable potentials. And guess what: No galaxy in any kind of realistic dark-matter mass distribution could ever have an integrable potential! During a meeting today (and yesterday also, which perhaps doesn't count according to the implicit parts of The Rules), Fadely, Foreman-Mackey, and I talked about how to generalize things like Schwarzschild modeling to non-integrable potentials and proper probabilistic inference. Glenn van de Ven (MPIA) does these things for a living, so we wrote him email and he sent back some great suggestions for how to proceed. Now the question is: Should we implement something, even if it means reinventing a few wheels? It didn't take us long to decide: Yes! Ah, Camp Hogg prefers engineering to science these days. One set of ideas I am excited about is how to write down a prior over phase-space distribution functions when you don't have an integrable-potential
action–angle language in which to write it. I love that hairball of problems.
After I used my five fingers to put my foot squarely in my mouth on the APOGEE Collaboration mailing list (yes folks, apparently you should be careful with that
Reply All button), I was treated late in the day to a couple of excellent talks about NYU efforts that border on astrobiology; not astrobiology per se but close: Paul Chaikin (NYU Physics) talked about his work with Pine and Seamans to make artificial structures out of DNA or small particles painted with DNA or the like that can self-assemble and then self-replicate. Bud Mishra (NYU Math and Biology) talked about simulations of the interactions that lead to multi-cellularity, with the thought that multi-cellularity might be inexpensive and generic in the path from first life forms to complex ones.
One of Mishra's most gripping (ha ha) points was about the number of fingers we have: Is five a random accident or is it something you would expect to see on other worlds? He pointed to a nice math result of his that shows that for frictionless fingers to hold any arbitrary solid object under load, you need between 7 and 12 fingers. Given that fingers come at a cost, and that there is friction, five might be very close to optimal for a wide range of utility functions! That's an idea I will be thinking about for a while. I also got him to say a bit about higher dimensions: He said my problem wasn't crazy because an articulated object is like a solid object in higher dimensions!
Douglas Brenner (AMNH) dropped by for a discussion of spectrum extraction, related to the coronagraphy stuff we have been working on. I pointed him to this instant classic, a paper much discussed a few years ago in this forum (for example here). At lunch, Muna talked about software development for stellar population synthesis, which could have a big impact in the large-spectroscopic-survey world we are currently in. In the afternoon, Foreman-Mackey produced color-color plots for stars in his re-calibration (self-calibration) of SDSS Stripe 82 imaging. They are looking very good.
At the end of yesterday, there was a discussion of astrobiology at NYU, and then this morning I had email from Rix about a probability calculation for p(habitable|data) where "habitable" is the bit "yes this exoplanet is habitable" or "no it isn't" and "data" is whatever you have measured about an exoplanetary system. He was inspired by this paper. We discussed briefly by email; one of the issues (of course) is that many of the parameters that might determine habitability do not even indirectly affect the observable (at present day) properties of the exoplanet. That's a nice problem; it reminds me of my conversations about extraterrestrial life: Prior information informs the current debate much more than the data do; it is a playground for Bayesian reasoners.
In an all-talk day, I talked with Lang about next steps on our various slowly moving projects; I talked with Sjoert van Velzen (Amsterdam) and Foreman-Mackey about large surveys; I talked with Fadely yet again about the scope for paper zero on classification. This was a highlight: This time I think we have done it! The scope will be showing that hierarchical modeling beats likelihood tests, no matter how bad the models are. In the afternoon, Nissanke (Caltech) gave a seminar about gravitational-radiation detection, particularly in the light of coincident electromagnetic signals. At the end, discussion devolved into the insane data-secrecy policies of LIGO, which I abhor. Occupy LIGO!
I had lunch today with Demitri Muna (NYU) of scicoder fame. We discussed software engineering, code-writing environments, and the idea that Muna should write a book about what he knows. He has been teaching scicoder workshops at NYU for a couple of years now, and there is no good book for scientists who code.
Almost no research got done today, with the exception of being impressed by David Mykytyn (NYU), my new undergraduate researcher, picking up IDL and using it within an hour. Of course we hate IDL here at Camp Hogg, but sometimes we can't avoid it!
I spent my research time editing Holmes's nearly-ready paper on self-calibration of imaging surveys (what I have called "uber-calibration" previously). The results are so simple and so easy-to-use; I hope we have an influence on the next generation of surveys.
Hou, Goodman, and I have a radial-velocity linear damped oscillator model to capture non-exoplanet near-sinusoidal variability in radial velocity surveys (especially for giant stars). Unfortunately, it doesn't seem to be a good fit to the radial-velocity jitter in our test-case real star. So now we have to see if among the data we have there is any nail on which we can use this hammer.
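For the record, here is a cartoon of the kind of signal such a model generates: a white-noise-driven, damped harmonic oscillator, crudely integrated, with every parameter value invented (the real Hou-Goodman model is more sophisticated than this):

```python
import numpy as np

# Noise-driven damped harmonic oscillator as a toy radial-velocity
# jitter model, integrated by semi-implicit Euler. All parameter
# values here are invented for illustration.
rng = np.random.default_rng(11)
period, gamma, drive = 10.0, 0.05, 0.5   # days, damping rate, noise amp
omega0 = 2 * np.pi / period
dt, n = 0.1, 5000

v = np.zeros(n)   # the "radial velocity" signal
u = 0.0           # its time derivative
for i in range(1, n):
    u += (-2 * gamma * u - omega0 ** 2 * v[i - 1]) * dt
    u += drive * np.sqrt(dt) * rng.normal()   # stochastic driving
    v[i] = v[i - 1] + u * dt
```

The output is near-sinusoidal at the oscillator period but with a slowly wandering phase and amplitude, which is the qualitative behavior we want for stellar jitter.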
My only research today was attendance at a nice blackboard talk by Renbin Yan (NYU) on the origin of LINER emission in galaxies. He has kinematic evidence that it is produced by evolved stars rather than nuclear emission from an accreting black hole. If he is right—and I think he is—this will make some waves in the galaxy-evolution world, where LINER ratios have been assumed to mean black hole.
In the morning, Mustafa Amin (MIT) gave a short talk about consequences of potential shapes in inflation for the period between the end of inflation and the beginning of the
normal physics of standard-model particles. In the afternoon, Joel Primack (Santa Cruz) gave the astrophysics seminar on the extragalactic background light. For the latter, some of the constraints come from the absorption of high-energy photons from blazars. I asked about scattering—when I was a boy I was told that whenever there is absorption there is always scattering, was that wrong?—but he implied in his answer that the scattering mechanisms are very different from the absorbing mechanisms. I am confused! If you do the classical calculation of a plane wave intercepted by finite absorbers, you have to get scattering too, just as a consequence of physical optics. Maybe my reader will un-confuse me.
Between talks, Jagannath and I discussed the implications of recent literature developments for our paper on fitting streams. I think we have a plan to re-scope it.
Sarah Bridle (UCL) and Marshall called me by Skype (tm) this morning (my time) to discuss Phil's and my crazy ideas about inferring the mass distribution and the cosmological parameters from weak-lensing data. Instead, we got sidetracked onto clustering measurements, with me making the pitch that we might be sacrificing signal-to-noise on the baryon acoustic feature by fitting models to the point-estimated (pair counts in separation bins) two-point correlation function. That is, no large-scale structure project ever writes down a likelihood function for the galaxy positions.
At lunch, Fergus showed me his new and improved models for Oppenheimer's coronagraph data. He factorizes the data matrix into principal components, and then finds that he can't fit the data well near the companion (exoplanet) locations with the dominant principal components. Furthermore, he finds that if he fits the data as a mixture of an (empirical, data-driven) speckle model for the residual star light plus a (heuristic, theory-driven) point-source model for the companion light, he does a great job of separating the components and thereby photometering the companion. Beautiful stuff and music to my ears, methodologically.
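The factorization step is simple enough to cartoon in a few lines; everything below (shapes, rank, noise level) is invented for illustration and is not Fergus's actual code:

```python
import numpy as np

# Cartoon of the factorization idea: build a rank-k principal-component
# model of a stack of (flattened) images and look at the residual,
# which is where light that the low-rank speckle model cannot absorb
# would have to appear.
rng = np.random.default_rng(0)
n_frames, n_pix, k = 50, 400, 5

# fake data with genuine rank-k "speckle" structure plus noise
speckles = rng.normal(size=(n_frames, k)) @ rng.normal(size=(k, n_pix))
data = speckles + 0.01 * rng.normal(size=(n_frames, n_pix))

mean = data.mean(axis=0)
u, s, vt = np.linalg.svd(data - mean, full_matrices=False)
model = mean + (u[:, :k] * s[:k]) @ vt[:k]   # rank-k reconstruction

residual = data - model   # a companion would have to live here
```

A real companion is (by construction) not well described by the dominant components, so it survives into the residual while the speckles do not.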
I got Tsalmantza and my HMF method paper up to the second-draft level. I was supposed to finish this back in October! It still needs a bit of work—Tsalmantza and I speak with very different
voices—but it is extremely close. One thing the paper lacks is a clear demonstration that optimizing chi-squared is far preferable to optimizing an unscaled mean square error. I guess we always considered that obvious, but I now wonder if everyone else does.
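In case it isn't obvious to everyone, the point can be made with a two-line experiment on an invented straight-line problem with heteroscedastic noise:

```python
import numpy as np

# With heteroscedastic noise, chi-squared (inverse-variance-weighted
# least squares) recovers parameters far better than an unscaled mean
# square error. The straight-line model and noise levels are invented.
rng = np.random.default_rng(7)
x = np.linspace(0.0, 1.0, 100)
sigma = np.where(x < 0.5, 0.1, 5.0)            # half good data, half terrible
y = 1.0 + 2.0 * x + sigma * rng.normal(size=x.size)

slope_mse, intercept_mse = np.polyfit(x, y, 1)               # unscaled MSE
slope_chi, intercept_chi = np.polyfit(x, y, 1, w=1.0 / sigma)  # chi-squared
```

The unweighted fit is dominated by the terrible half of the data; the chi-squared fit effectively ignores it and lands close to the truth.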
I spent the day at the Spitzer Science Center as part of my Oversight-Committee duties. This is not research. However, on the airplane to the meeting, I started image simulations to explore an idea that came up while I was at UCLA: Can you use multi-epoch imaging to beat the naive confusion limit if your sources are moving fast? I am sure the answer is yes.
Daniel Mortlock (Imperial) dropped in for the day and we spent some quality time talking about data. We don't have any project to work on, but we should! Mortlock holds the current quasar redshift record. At lunch time, Gabe Perez-Giz (NYU) talked about computing geodesic orbits around Kerr black holes. He is a big believer in making use of symmetries, and there are additional computational symmetries if you look at closed orbits—orbits where the azimuthal, radial, and polar-angle frequencies are rationally related. If you can build a dense orbit library out of these orbits (and it appears you can), then you might be able to compute a whole bunch of stuff fast. Now show me the money!
In the astro seminar today, Joey Richards (Berkeley), about whom I have been blogging all week, spoke about the methodologies and successes of the Bloom-led Center for Time-Domain Informatics team in automatically classifying time-variable objects in various imaging surveys. He concentrated on random forest (a combination of many decision trees, each of which is made by randomizing in various ways the training data) in part because it is extremely effective in these kinds of problems. He even claimed that it beat well-tuned support-vector machine implementations. I will have to sanity-check that with Schölkopf in Tübingen! Richards did a great job, in particular in explaining and responding to the principal disagreement we have, which is this: I argue that a generative model that can generate the raw pixels will always beat any black-box classifier, no matter how clever; Richards argues that you will never have a generative model that is accurate for all (or even most) real systems. It is this tension that made me invite him (along with Bloom and Long) to NYU this week.
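For readers who haven't met random forest, the parenthetical definition above can be made concrete with a toy forest of depth-one trees (decision stumps); this is a caricature, not the Richards et al. implementation:

```python
import numpy as np

# Toy random forest: many trees (depth-one stumps here, for brevity),
# each fit to a bootstrap resample using a random subset of features,
# combined by majority vote. A caricature of a real implementation.
rng = np.random.default_rng(3)

def fit_stump(X, y):
    """Best (feature, threshold, sign) split on a random feature subset."""
    feats = rng.choice(X.shape[1], size=max(1, X.shape[1] // 2 + 1),
                       replace=False)
    best, best_acc = None, -1.0
    for f in feats:
        for t in np.quantile(X[:, f], [0.25, 0.5, 0.75]):
            acc = ((X[:, f] > t) == y).mean()
            for sign, a in ((1, acc), (-1, 1.0 - acc)):
                if a > best_acc:
                    best, best_acc = (f, t, sign), a
    return best

def fit_forest(X, y, n_trees=50):
    forest = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(y), size=len(y))  # bootstrap resample
        forest.append(fit_stump(X[idx], y[idx]))
    return forest

def predict(forest, X):
    votes = np.mean([(X[:, f] > t) == (s > 0) for f, t, s in forest], axis=0)
    return votes > 0.5
```

Real implementations grow deep trees and randomize the feature subset at every split, but the bootstrap-plus-vote structure is the whole trick.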
After the seminar, Or Graur (Tel Aviv, AMNH) showed us how he can find supernovae lurking in SDSS spectra and pitched (very successfully) a test with SDSS-III BOSS data. We will get on that next week.
While Richards worked on
faintifying bright Mira-variable light curves and censoring them in the manner of an insane robot (or astronomical imaging pipeline), Long and I worked on Python-ifying and numpy-ifying some slow marginalized-likelihood code. The issue is that our likelihood model has two nuisance parameters per data point (the true uncertainty variance and the true censoring flux value, considered poorly known and different for every datum), which we want to marginalize out inside the repeatedly called likelihood function. There are lots of ways to do this slowly, and few ways to do this fast. The goal is to have the skeleton of a paper by tomorrow afternoon!
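The fast way is the obvious numpy one: broadcast a per-datum nuisance grid and sum along one axis instead of looping in Python. A cartoon with just one nuisance parameter and placeholder distributions (the real model has two nuisance parameters and different distributions):

```python
import numpy as np

# Grid-marginalize one nuisance parameter per datum with broadcasting.
# Placeholder likelihood (Gaussian) and prior (Gaussian-ish in the
# per-datum variance); the real model is different.
rng = np.random.default_rng(5)
y = rng.normal(size=300)                     # the data

var_grid = np.linspace(0.5, 2.0, 64)         # shared nuisance grid
dv = var_grid[1] - var_grid[0]
prior = np.exp(-((var_grid - 1.0) ** 2))     # unnormalized prior
prior /= prior.sum() * dv                    # normalize on the grid

def marginal_loglike(mu):
    # (n_data, n_grid) array via broadcasting, then sum over the grid
    like = (np.exp(-0.5 * (y[:, None] - mu) ** 2 / var_grid)
            / np.sqrt(2 * np.pi * var_grid))
    marg = (like * prior).sum(axis=1) * dv   # marginalize each datum
    return np.log(marg).sum()
```

One call does all three hundred per-datum integrals at once; the equivalent double Python loop gives the same number, slowly.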
Over a long and hearty breakfast, I turned our equations for fitting lightcurves in the presence of carelessly censored data into a LaTeX document complete with lots of discussion. I handed it to Richards and Long.
I have published one paper and have two more in the works under the subject measuring the undetectable. My evil plan is panning out; today Joey Richards, James Long (Berkeley), Dan Foreman-Mackey, and I all agreed that we should work together on a very nice and extremely practical project. Here it is:
Imagine you have a catalog of point-source fluxes, measured for a bunch of sources in a badly documented multi-epoch survey. Now imagine that you are looking at one source, which is variable, and it has been detected at some epochs and not at others. Imagine further that at the non-detect epochs, you are not provided with any information about the flux upper limits; all you know is that the source
wasn't seen. How do you fit the stellar (lightcurve) properties of this source, using both the detections and the non-detections?
Without priors, this is impossible, I think, because you don't know whether the non-detections are non-detections because the data at those epochs was extremely bad or whether they are non-detections because the star at those epochs was very faint. But we figured out what you could do if you could hierarchically infer a prior over the detection threshold and over the noise properties of the detected sources. We started to write documents and code in the afternoon.
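In equations-as-code form, the per-epoch likelihood we have in mind looks something like the following; every distributional choice here is a stand-in, and the point is just that a non-detection contributes the probability of falling below the (unknown) threshold, marginalized over a prior on that threshold:

```python
import math

def norm_cdf(z):
    # standard normal cumulative distribution function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def loglike_epoch(model_flux, obs_flux, sigma, detected,
                  thresh_grid, thresh_prior):
    """Log-likelihood of one epoch; stand-in distributions throughout.

    thresh_grid, thresh_prior: a discrete prior over the unknown
    detection threshold (probabilities summing to one).
    """
    if detected:
        # a detection contributes an ordinary Gaussian term
        z = (obs_flux - model_flux) / sigma
        return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))
    # a non-detection: the flux fell below the threshold, threshold unknown
    p = sum(pt * norm_cdf((t - model_flux) / sigma)
            for t, pt in zip(thresh_grid, thresh_prior))
    return math.log(max(p, 1e-300))
```

The hierarchical part is that `thresh_prior` (and the per-epoch noise model) would itself be inferred from the ensemble of detected sources, not assumed.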
Josh Bloom, James Long, and Joey Richards (Berkeley), all of Palomar Transient Factory and Center for Time-Domain Informatics fame, are visiting Camp Hogg this week. My not-so-secret plan is to come up with a jointly executed project, so that I can get a piece of this talented team for free. We discussed options, ranging over classification, prediction, decision making, and so on. We have a substantial difference of opinion on various matters—Camp Hogg is all about models that generate the raw data, while the CTDI likes data-driven models in feature space—but we are working towards very similar goals in our work on time-domain astrophysics. One shared goal is to measure time-domain properties of sources too faint to identify at any individual epoch; that might be our project.
Only research today was a talk by Gwenael Giacinti (APC, Paris) about anisotropies in the local distribution of cosmic rays. It ended with a short argument (in the audience) about lensing of the intensity field, with various people confused and not confused about the meaning of the point that lensing conserves phase-space density of photons. The point is trivial in one sense, but in an equally important sense, it is not at all trivial, as evidenced by confusion among astrophysicists about what it can and can't mean. The argument inspired me to write something about it.
After getting back on the red-eye, I attended a nice seminar by Mike Eracleous (Penn State) about black-hole binaries. He is a bit more optimistic than I am about finding and confirming them, and about their usefulness. But we had near-simultaneous papers on the subject this year. In the afternoon, Fergus and I continued our debates about how to model speckles in the 1640 coronagraph data. The issues are all about how to make the model so flexible that it can fit all the speckles, but so constrained that it does not at the same time want to fit out the companion (think exoplanet). Our most insane idea is to fit for the electric field in the camera! But that, I think, is hard.
I was at UCLA all day today to give a seminar. I had great conversations with many of the faculty, postdocs, and students. One idea (of many, many) that stands out is from Mark Morris, who is thinking about whether time-resolved imaging of the Galactic Center (where the stars move fast) could be used to beat the confusion limit. I am sure it can; this is a great project to think about! Another is from Brad Hansen, who points out that if we do find eclipsing planets with GALEX (as Schiminovich and I hope to), they ought to be around massive (rather than normal) white dwarfs. This is because the massive ones might be formed by mergers that could also bring in post-main-sequence planets.
The scariest thing about Hallowe'en this year was how little research I got done! Maryam Modjaz (NYU) gave a nice blackboard talk at lunch about the supernova zoo, and the hopes we have for indirect determination of stellar and binary progenitors. For prompt supernovae, local gas-phase metallicity could be extremely informative; however, you often have to wait years to get a good spectrum. In the afternoon I had a short conversation with NYU undergraduate Gregory Lemberskiy, who is looking at some issues in image model initialization and optimization.
Phil Marshall and I spent the day pair-coding a tiny Python package that fits a set of mixture-of-Gaussian models of increasing complexity to a PanSTARRS-produced catalog cutout image. It is insanely fast! (We did this outside of Lang's Tractor for reasons that are too annoying to explain here.) Our success means that in principle, any survey that is doing any kind of galaxy fitting can also, very easily, fit general non-parametric models to everything: That is, we could have a variable-complexity description of every imaging patch. I love this just because I am a geek. Marshall loves this because maybe—just maybe—the mixture-of-Gaussians description of a patch of imaging will be close to a compact sufficient statistic for performing other kinds of fits or analyses. He has weak and strong gravitational lensing in mind, of course. More on this soon, I very much hope!
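The core move of such a fitter is easy to sketch: weighted EM on pixel positions, with the fluxes as weights. The circular Gaussians and all the details below are placeholders, not our actual package:

```python
import numpy as np

# Fit K circular Gaussians to an image by EM, treating pixel fluxes as
# weights on pixel positions. A placeholder sketch, not our real code.
def fit_mog(img, mu0, n_iter=200):
    ny, nx = img.shape
    yy, xx = np.mgrid[:ny, :nx]
    pts = np.column_stack([xx.ravel(), yy.ravel()]).astype(float)
    w = img.ravel().clip(min=0.0)
    w = w / w.sum()                            # fluxes as weights

    mu = np.array(mu0, dtype=float)            # (K, 2) initial centers
    K = len(mu)
    var = np.full(K, 4.0)                      # per-component variance
    amp = np.full(K, 1.0 / K)                  # mixing amplitudes
    for _ in range(n_iter):
        d2 = ((pts[None, :, :] - mu[:, None, :]) ** 2).sum(-1)   # (K, P)
        resp = amp[:, None] * np.exp(-0.5 * d2 / var[:, None]) / var[:, None]
        resp = resp / (resp.sum(axis=0, keepdims=True) + 1e-300)
        wk = resp * w                          # flux-weighted responsibilities
        amp = wk.sum(axis=1)
        mu = (wk[:, :, None] * pts[None]).sum(axis=1) / amp[:, None]
        var = (wk * d2).sum(axis=1) / (2.0 * amp)
        amp = amp / amp.sum()
    return amp, mu, var
```

Increasing-complexity model selection is then just re-running with larger K and comparing objectives.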
[In other news, Brian Cox thinks I am guilty of scientific misconduct. Fortunately, Sarah Kendrew (MPIA) doesn't.]
Marshall showed up for a couple of days of hacking. He appealed to my sense of irresponsibility to convince me not to work on much more pressing things, like letters of recommendation, mid-term exams, hallowe'en costumes, and other matters that don't constitute research (consult rules at right). We discovered that the PanSTARRS preliminary catalogs might not have the angular resolution (or, perhaps, deblending aggression) we would like in order to use them to find gravitational lenses. The experience will lead to feedback that will (we hope) improve the PanSTARRS software, but they remind me of the great opportunities that could be afforded if we had a probabilistic framework for producing catalogs from imaging. I should work on that!
It is Open Access Week and for that reason, SUNY Albany libraries held an afternoon-long event. I learned a lot at the brown-bag discussion about how open access policies could dramatically improve the abilities of librarians to serve their constituents, and dramatically improve the ability of universities to generate and transmit knowledge. The horror stories about copyright, DRM, and unfair IP practices were, well, horrific. In the afternoon I gave a seminar about the openness of our group at NYU, including this blog, our web-exposed SVN repo, and our free data and code policies (obeyed where we are permitted to obey them; see above). It was great, and a great reminder that librarians are currently—in many universities—the most radical intellectuals, with sharp critiques of the conflicts and interactions between institutions of higher learning and institutions of commerce.
On the train home, I tried out importance sampling for my
posterior PDF over catalogs project. Not a good idea! The prior is so very, very large.
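The failure mode is easy to quantify with an effective sample size; here is a one-dimensional cartoon (the real catalog-space problem is catastrophically worse):

```python
import numpy as np

# Importance-sample a narrow posterior using a very broad prior as the
# proposal: almost all of the weight lands on a handful of samples.
# Numbers are purely illustrative.
rng = np.random.default_rng(2)
n = 20_000
samples = rng.normal(0.0, 100.0, size=n)   # draws from the broad prior
logw = -0.5 * samples ** 2                 # unit-variance likelihood weights
w = np.exp(logw - logw.max())
ess = w.sum() ** 2 / (w ** 2).sum()        # effective sample size
```

The effective sample size here is a tiny fraction of the draws, and it shrinks geometrically with the number of dimensions, which is exactly why the idea dies for a posterior over catalogs.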
In separate conversations with Hou and with Foreman-Mackey, I found myself discouraging each of them from looking into serious sparse methods for Gaussian processes. Both students are potentially matrix-inversion-limited (or determinant-computation-limited). We could definitely benefit from having matrix representations (even if it means approximations) that have sparse inverses (or analytic inverses). But going there is (at this point) a research project in sparse methods, not an astronomy project. So I am going to hold off until we really need it. That said, there is a huge literature now of very promising techniques.
Seminars filled my research time today. At lunch, Sergei Dubovsky (NYU) talked about conjectures for the fundamental action of strings in string theory. He gave a tiny bit of motivation for why 26 dimensions is the preferred dimensionality of spacetime in string theory. Apparently the 26 dimensions are expected to be 25 spacelike and one timelike, which seems odd to me, but causality is a bitch (as they say). Causality featured prominently in his talk, because he is experimenting with different causal structures for the dynamics on the worldsheet of the fundamental string.
In the afternoon, Michele Vallisneri (JPL) told us about detecting gravitational radiation with LISA-like missions. They face there the same issue that Lang and I have been talking about for astronomical imaging: It is no longer possible to think about there being the catalog of sources from the data. There will always be many qualitatively different explanations of the data stream, and always secure scientific conclusions will have to be based on marginalizations over that catalog-space probability distribution function. He said interesting things about the
hardware injections of false events into the LIGO data stream to stress-test the analyses and about the software engineering aspects of these huge projects.
Lang and I spent the weekend at the Googleplex for the Google Summer of Code Mentor Summit. We did some work on our source detection paper, and had a lot of conversations about open-source software. Things I learned of relevance to astronomers include: The Software Freedom Conservancy (and other organizations that are similar, like SPI) can provide a non-profit umbrella for your project, making it possible for you to raise money for your project as a non-profit organization. The commercial movie industry uses a lot of open-source computer vision and graphics (which is really physics) open-source software, even in blockbuster films (like Smurfs). The Climate Code Foundation is trying to do some of the things advocated in Weiner's white paper, but for climate science. The
semantic web dream of many an astronomer (though not me; I am suspicious of all meta data) has been realized in the music space, in open-source projects like MusicBrainz.
Highlights for me today included El Ghaoui (Berkeley) talking about text document clustering and searching, and Das (NASA Ames) talking about sparse matrix methods in Gaussian processes. In the former, I realized it is incredibly easy to make sparse models of authors' use of words; I have always wanted to launch a project to figure out, in multi-author papers, which authors wrote which paragraphs. If you want to do that project with my consulting, send me email! In the latter, I learned about some inferential techniques for finding sparse approximations to the inverses of sparse matrices. In general the inverse is not sparse if the matrix is sparse, but you can come up with sparse approximations to the matrix and inverse pair. I spoke today, about my comprehensive image modeling projects. I also flashed some of Fergus's results.
Day two (but day one for me) of the 2011 Conference on Intelligent Data Understanding at Mountain View was today. Many of the talks were about airport safety and global climate. I learned, for example, that there are far more forest fires going on in Canada and Russia than in past years, probably as a result of global climate change. Highlights of the day included comments by Basu (Google) about the issue that useful algorithms have to be simple so that they can be scaled up and run on the highly parallel cloud (MapReduce and all that), and comments by Djorgovski (Caltech) about needing to take humans out of the loop for transient follow-up. On both points, I agree completely. At the end of the day there was a panel discussion about data-driven science and interdisciplinary data science with me as one of the panelists. I opened my discussion about how we work successfully with applied mathematicians (like Goodman) by saying that I can't understand any paper that begins with the word "Let".
As my loyal reader knows, Fergus and I have been working on data from Oppenheimer's (AMNH) 1640 coronograph. Fergus's model is a data-driven, empirical model of the speckle pattern as a function of wavelength, informed by—but not fully determined by—the physical expectation that the pattern should grow (in an angular sense) with wavelength. Fergus's model is very simple but at the same time competitive with the official data pipeline. Nonetheless, we had to make a lot of decisions about what we can and can't hold fixed, and what we can and can't assume about the observations. We resolved many of these issues today in a long meeting with Oppenheimer and Douglas Brenner (AMNH).
Complications we discussed include the following: Sometimes in the observing program, guiding fails and the star slips off the coronograph stop and into the field. That definitely violates the assumptions of our model! The spectrograph is operating at Cassegrain on the Palomar 200-inch, so as the telescope tracks, the gravitational load on the instrument changes continuously. That says that we can't think of the optics as being rigid (at the wavelength level) over time. When the stars are observed at significant airmass, differential chromatic refraction makes it such that the star cannot be centered on the coronograph stop simultaneously at all wavelengths. The planets or companions to which we are sensitive are not primarily reflecting light from their host stars; these are young planets and brown dwarfs that are emitting their own thermal energy; this has implications for our generative model.
One more general issue we discussed is the
obvious point made repeatedly in computer vision but rarely in astronomy that astronomical imaging (and spectroscopy too, actually) is a bilinear problem: There is an intensity field created by superposing many sources and an instrumental convolution made by superposing point-spread-function basis functions. The received image is the convolution of these two unknown functions; since convolution is linear, this makes the basic model bilinear—a product of two linear objects. The crazy thing is that any natural model of the data will have far more parameters than pixels, because the PSF and the scene both are (possibly) arbitrary functions of space and time. Astronomers deal with this by artificially reducing the number of free parameters (by, for example, restricting the number of basis functions or the freedom of the PSF to vary), but computer vision types like Fergus (and, famously, my late colleague Sam Roweis) aren't afraid of this situation. There is no problem in principle with having more parameters than data!
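A tiny numpy sketch of the bilinearity (my own illustration, using circular convolution via FFTs for brevity): the model is linear in the scene for a fixed PSF, and linear in the PSF for a fixed scene.

```python
import numpy as np
from numpy.fft import fft2, ifft2

def model_image(scene, psf):
    # The data model: the intensity field convolved with the PSF
    # (circular convolution via FFTs, for brevity).
    return np.real(ifft2(fft2(scene) * fft2(psf)))

rng = np.random.default_rng(0)
scene1, scene2, psf1, psf2 = rng.normal(size=(4, 16, 16))
a, b = 2.0, -0.7

# Linear in the scene for a fixed PSF...
ok_scene = np.allclose(model_image(a * scene1 + b * scene2, psf1),
                       a * model_image(scene1, psf1) + b * model_image(scene2, psf1))
# ...and linear in the PSF for a fixed scene: bilinear overall.
ok_psf = np.allclose(model_image(scene1, a * psf1 + b * psf2),
                     a * model_image(scene1, psf1) + b * model_image(scene1, psf2))
print(ok_scene and ok_psf)  # True
```

Bilinear, but not linear: the product of the two unknown functions makes the optimization non-convex, which is part of why astronomers are tempted to freeze one factor.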
Only conversation today, making me worry about my ability to still actually do anything! Gabe Perez-Giz (NYU) and I spent an inordinate amount of time talking about how GPS works, and whether recent claims about systematic error in the OPERA neutrinos could be correct. We also talked about his work on classifying very odd test-particle orbits in the Kerr spacetime.
In the afternoon, Goodman, Hou, Foreman-Mackey, and I talked about Gaussian Processes, among other things. Hou's model of stellar variability is a very restricted (and physically motivated) GP, while Foreman-Mackey's Stripe 82 calibration model involves a (not physically motivated) GP for interpolation and error propagation. Goodman pointed us to the idea of the copula, which apparently destroyed the world recently.
Not much research got done today, but I did have a nice long chat about instrumentation and calibration with Nick Konidaris (Caltech), who is building the SED Machine, and attended a blackboard talk on relativistic turbulence by Andrew MacFadyen (NYU).
I spent the morning writing in Tsalmantza and my HMF paper. This is my top priority for October. In the afternoon, Feryal Ozel (Arizona) gave a great talk about getting precise neutron-star mass and radius information to constrain the nuclear equation of state at high densities. She is doing very nice data analysis with very nice data and can rule out huge classes of equation-of-state models. She also showed some nice results on neutron-star masses from other areas, and (after the talk) showed me some hierarchical inferences about neutron-star mass distribution functions as a function of binary companion type.
Maayane Soumagnac (UCL) visited for a few hours to discuss her projects on classification and inference in the Dark Energy Survey. She is using artificial neural networks, but wants to compete them or compare them with Bayesian methods that involve modeling the data probabilistically. I told her about what Fadely, Willman, and I are doing and perhaps she will start doing some of the same, but focused on photometric redshifts. The key idea is to make the galaxy type, luminosity, and redshift priors hierarchically; that is, to use the data on many galaxies to construct the best priors to use for each individual galaxy. Any such system makes photometric redshift predictions but also makes strong predictions or precise measurements of many other things, including galaxy metallicities, star-formation rates, and properties as a function of redshift and environment.
One of the things we discussed, which definitely requires a lot more research, is the idea of hybrid methods between supervised and unsupervised. Imagine that you have a training set, but the training set is incomplete, small, or unreliable. Then you want to generate your priors using a mixture of the training data and all the data. Hierarchical methods can be trained on all the data with no supervision—no training set—but they can also be trained with supervision, so hierarchical methods are the best (or simplest, anyway) places to look at hybrid training.
I got up exceedingly early in the morning, highly motivated to write a short theory paper—that's theory of data analysis, of course—about the posterior probability distribution over catalogs. I have become motivated to write something like this because I have started to become concerned about various bits of wrongness in my old paper with Turner on faint source photometry. The principal results or conclusions of the paper are not wrong, but the language is all wrong (the terms likelihood and measurement are used wrongly) and the priors are improper. I asked Phil Marshall what you do about a paper that is wrong; he said:
Write a new paper correcting it!
One of the key things that has to be fixed in the problem is that the explanation of an image in terms of a catalog is—or should be—properly probabilistic. That means that if the image is not high in signal to noise, there are necessarily many even qualitatively different catalogs that could explain the image. This means describing or sampling a posterior distribution over models with varying complexity (or number of parameters or number of sources). That's not a trivial problem, of course.
The nice thing, if I can do it all, is that the new paper will not just resolve the issues with Hogg & Turner, it will also generalize the result to include positional and confusion noise, all in one consistent framework. The key idea is that for any source population you care about (faint galaxies, say, or Milky-Way stars), it is very easy to write down a proper and informative prior over catalog space (because, as Marshall often reminds me, we can simulate imaging data and the catalogs they imply very accurately).
After giving a seminar at the Cavendish lab in Cambridge UK by videocon, I spent most of my research day talking with Fergus, discussing his model of residual speckle patterns in coronograph images from Oppenheimer's group. Fergus's model is extremely general, but has strong priors which "regularize" the solution when there isn't much data or the data aren't troublesome. Because it is so general, the model is in fact very simple to write down, has analytic derivatives, and can be optimized quickly by some (pretty clever) Matlab code. We decided that he should run on everything and then we should meet with Oppenheimer's group to decide what's next. I think we might be able to improve the sensitivity of coronographs generally with these models.
Roman Rafikov (Princeton) spent the day at NYU and we had long discussions about the state and future of exoplanets. This in addition to his nice talk on the subject. I realized that between the image modeling I am doing with Fergus, the transit discoveries with Schiminovich, and the MCMC work with Goodman and Hou, more than half my current work is related to exoplanets.
I don't think I can count a nice lunch with Christopher Stumm at Etsy as research. But I did meet up with Schiminovich afterwards to figure out how we are going to write the papers we need to write. I am so stoked to have the full list of time- and angle-tagged photons from the full observing history of GALEX.
The highlight of my research day was a long conversation with Hou and Goodman about stars, stellar oscillations (linear and non-linear), modeling those with Gaussian processes and the like, and next-generation Markov-Chain Monte Carlo methods. On the latter, the idea is to use an ensemble sampler (or a pair of them) to perform very high-quality proposals, for applications where posterior (likelihood or prior) calls are expensive.
I didn't do much research today, but I did remind myself how to get a new Mac computer working. The answer: Fink. As much as I hate compiling everything from scratch (which fink does), the pre-built binaries never work on new hardware or operating systems, so I found that all shortcuts I wanted to take were useless. The key commands were:
sudo fink selfupdate
sudo fink update-all
sudo fink install numpy-py27
sudo fink install scipy-py27
sudo fink install matplotlib-py27
sudo fink install texlive
sudo fink install imagemagick
(All these run after installing Xcode from the App store.) Setting up a new computer is annoying and a headache, but actually, I think maybe it is research.
Stephen Feeney (UCL) gave a nice talk on finding bubble collisions by performing model fits and Bayesian hypothesis tests on the cosmic microwave background maps. He had to make some approximations to do the Bayes integrals. He concludes that the rate of bubble collisions is low (I guess we would be suspicious if he concluded otherwise), but because of resource limitations and decisions made early in the project, he and his team did not test for large-angle signatures, which are expected to be the most common. The good news is that such tests are going to be easy in the near future.
After a few minutes of conversation, Blanton demonstrated to Willman and me that even though the SDSS data are not designed to deal with crowded fields well, they can be used to measure the proper motions of halo globular clusters. Koposov, Rix, and I demonstrated that the proper motions can be used statistically when we looked at the GD-1 Stream, but then I left this subject behind, even though globular clusters are actually much easier to measure than the GD-1 Stream. This relates to my repetitive rant that there are many great measurements waiting to be made in existing data.
Fergus came by in the morning to discuss modeling speckles as a function of wavelength in coronography, and we spent a while counting parameters. As is usual in these discussions I have with vision people, there are more parameters than data points in the natural models we want to write down. So we either have to apply priors, or else simplify the model; we decided to do the latter. The odd thing (in my mind) is that simplifying the model (that is, reducing the number of free parameters) is actually equivalent to applying extremely strong priors. So the idea that one can "avoid using priors" by choosing a simpler model is straight-up wrong, no? That said, I am very happy with Fergus's beautiful model, which involves an extremely general description of how one might transform an image locally.
Foreman-Mackey and I started work on—and Goodman and I discussed—next-generation ensemble samplers, that update a pair of ensembles in a leapfrog fashion. I still haven't completely understood the rules, but it appears that as long as at every step we satisfy detailed balance, we are fine. If this is true, then I have an intuition that we can design a new sampler with incredible performance on non-trivial probability distribution functions.
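This is not the new sampler, but a minimal sketch of the two-ensemble idea using the Goodman & Weare stretch move: each ensemble proposes moves for the walkers of the other, and detailed balance holds at every step because the proposing ensemble is held fixed during the update. The function names and the toy Gaussian target are mine.

```python
import numpy as np

def stretch_step(movers, others, logp, lnp_movers, rng, a=2.0):
    """One stretch-move update of the 'movers' ensemble, drawing
    partners from the (fixed) 'others' ensemble."""
    n, dim = movers.shape
    z = ((a - 1.0) * rng.uniform(size=n) + 1.0) ** 2 / a  # g(z) ~ 1/sqrt(z) on [1/a, a]
    partners = others[rng.integers(len(others), size=n)]
    proposal = partners + z[:, None] * (movers - partners)
    lnp_prop = np.array([logp(p) for p in proposal])
    # Accept with probability min(1, z^(dim-1) * p(new) / p(old)).
    accept = np.log(rng.uniform(size=n)) < (dim - 1.0) * np.log(z) + lnp_prop - lnp_movers
    movers[accept] = proposal[accept]
    lnp_movers[accept] = lnp_prop[accept]
    return movers, lnp_movers

# Toy run: two 32-walker ensembles, updated in leapfrog fashion,
# sampling a one-dimensional unit Gaussian.
logp = lambda x: -0.5 * np.sum(x ** 2)
rng = np.random.default_rng(42)
A, B = rng.normal(size=(2, 32, 1))
lnpA = np.array([logp(p) for p in A])
lnpB = np.array([logp(p) for p in B])
samples = []
for _ in range(2000):
    A, lnpA = stretch_step(A, B, logp, lnpA, rng)
    B, lnpB = stretch_step(B, A, logp, lnpB, rng)
    samples.append(A.copy())
samples = np.concatenate(samples).ravel()
print(round(samples.std(), 1))  # close to 1.0
```

The promise of the leapfrog structure is that the proposal for one ensemble can use arbitrarily clever density modeling of the other ensemble without breaking detailed balance.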
Other than a few conversations, my only research today was to talk about Bovy, Rix, and my results on the chemical and position-space structure of the Milky-Way disk. We find that if you look at disk structure as a function of chemical abundances, you get very simple and nice results; it is less messy (for reasons we don't entirely understand) than looking at the chemical abundances as a function of position. Manuscript forthcoming soon; watch this space.
Maxim Lyutikov (Purdue) came through to give a nice seminar about fundamental electrodynamics around black holes. The big deal is that near the horizon, E gets a component along B and then plasma is produced easily out of the vacuum to short the parallel component and stabilize the B field. This is very nice!
I had many conversations with team members about various projects in progress; nothing specific to report, except maybe that Jagannath and I realized that we were semi-scooped but that the scooping paper has some serious errors in it. More on this when we get together our thinking and become confident that we are right in our criticisms.
I spent time today talking with Goodman more about his (with Weare) ensemble samplers. He points out that if you have two ensembles, you have a lot of freedom for using one ensemble to inform the proposal distribution for the other ensemble. This could permit non-trivial density modeling of the one ensemble to help sample the other ensemble efficiently. We discussed the possible implications of this for multi-modal probability distribution functions and I am optimistic that we could make a very efficient next-generation sampler. This is all about proposal distributions of course; there are lots of things people want samplers to do; we are concentrating on those things that make the autocorrelation time short or the number of independent samples per likelihood call high.
Fadely, Willman, and I realized today that our data-driven, hierarchical Bayesian discrete classification system for faint sources in current and future optical survey data is also a photometric redshift prediction system. We decided (tentatively) that we should write a paper on that. I also had conversations with Foreman-Mackey about Gaussian processes, and Ben Weaver (NYU) about SDSS-III BOSS redshift fitting.
I spent the day in Delaware, hanging out with John Gizis (Delaware) and talking about finding rare objects. He has found some of the coolest and nearest stars in the WISE data, by very straightforward methods. I think he published the first WISE paper after the WISE public data release; this reminds me of 1996 and the Hubble Deep Field! He has also found hotter stars with strange infrared properties. One of the many things we discussed is understanding the spatial distribution of stars with dust disks relative to the population of very young stars in the local neighborhood. I also gave a seminar.
In most of my research time today I talked with our new NSF postdoc Gabe Perez-Giz (NYU) about his plans for his first year here. These include working through a set of challenging and fundamental problems in numerical methods for computing gravitational radiation. Part of this plan is to produce a quantitative description of test-particle phase space (qualitative orbit properties) around Kerr (spinning) black holes. I think this is a great idea, but it involves a huge amount of literature review, synthesis, and computation.
At lunch, the CCPP brown-bag series was kicked off by Kleban (NYU) who told us about natural properties of the cosmological constant in M-theory. The idea is that one natural (or mathematically equivalent) way of thinking about the cosmological constant is as a high-dimensional analog of electromagnetism, with a vacuum field value. This gets all the stringy or M-y properties of the cosmological constant: Huge number of possible vacua, finite probabilities of transitioning to other (much worse) vacua, no non-trivial dynamics in the cosmological constant sector (except for vacuum-changing dynamics).
I spent the morning with Fergus working on fitting a dumb model to Ben Oppenheimer's (AMNH) coronographic imaging of possibly-planet-hosting stars. Oppenheimer's instrument is not just a coronograph but an integral field spectrograph, so there is a huge amount of information in the wavelength-dependence of the residual light from the primary star. Fergus and I worked on building a simple model of it, hoping to increase the sensitivity to faint planets.
At lunch, Beth Willman (Haverford, long-term visitor at NYU) made the case that part of the definition of a galaxy (yes, believe it or not, people are currently arguing about how to define a galaxy; see, eg, Willman 1) ought to involve the chemical abundances of the stars. This is a beautiful, simple, and convincing idea.
I asked the BOSS galaxy-clustering team about my concerns about measuring the baryon acoustic feature by first making a point estimate of the two-point correlation function of galaxies and then, from that two-point function estimate, inferring the baryon acoustic feature length scale. I got no sympathy. Maybe I am wrong, but I feel like if we are going to move the information from the data to the BAF, we need to write down a likelihood. The reason the BOSSes think I am wrong (and maybe I am) is because on large scales the density field is close to a Gaussian random field, and the two-point function is a sufficient statistic. But the reason I think they might be wrong is (a) the distribution of galaxies is a non-linear sampling of the density field, and (b) the two-point function might be sufficient to describe the density, but a point estimate of the two-point function is not the two-point function. Anyway, I retreated, and resolved to either drop it or do the math.
Marshall and I spent a phone call talking about how we can model strong gravitational lenses in the PanSTARRS data, given that the data (in the short term, anyway) will be in the form of catalogs. As both my loyal readers know, I am strongly against catalogs. What Marshall and I had independently realized—me for catalog source matching and Marshall for lens modeling—is that the best way to deal with a catalog is to treat it as a lossy compression of the data. That is, use the catalog to synthesize an approximation to the imaging from which it was generated, use the error analysis to infer a reasonable noise model, and then fit better or new models to those synthesized images. I love this idea, and it is very deep; indeed it may solve the combinatoric complexity that makes catalog matching impossible, as well as the lensing problems that Marshall and I are working on.
Foreman-Mackey and I went over to Fergus's office to discuss with Fergus and Andrew Flockhart (NYU) a few possible projects. One is to perform cosmic-ray identification in imaging without a CR-split. That is, to find the cosmic rays by modeling the data probabilistically rather than by comparing overlapping imaging. This could make HST snapshot surveys and SDSS data more productive at no additional cost, or just as productive at smaller observing intervals. Another project we discussed is one to model the speckle patterns in multi-wavelength coronograph images taken by Oppenheimer's crew. In the latter we talked about priors that could help given that the space of possible solutions is almost unimaginably large.
Prompted by some comments yesterday from Iain Murray on my incomprehensible post about PCA, I spent some of the morning looking into the different meanings of
factor analysis in the machine-learning literature. At least some of those are very close to Tsalmantza and my HMF method; I think this means some changes to the abstract, introduction, and method sections of our paper.
In the morning I worked on Tsalmantza and my paper on matrix factorization. Specifically, I worked on the initialization, which we perform with a PCA. But before we do the PCA, we do two things: The first is we "isotropize" the space from a measurement-noise point of view. That is, we re-scale the axes (which number in the thousands when running on SDSS spectra) so that the median measurement uncertainty (noise variance) is the same in every direction. The second is that we compute the mean, and then
project every spectrum into the subspace that is orthogonal to the mean spectrum. That is, we compute the principal variance directions that are orthogonal to the mean-spectrum direction. Then our K-spectrum initialization is based on the mean spectrum and the first K−1 variance eigenvectors orthogonal to the mean.
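In numpy, the initialization might look like the following sketch (the function name and toy data are mine, and this is the initialization only, not the HMF fit itself): isotropize by the median per-pixel uncertainty, then PCA in the subspace orthogonal to the mean spectrum.

```python
import numpy as np

def hmf_init(spectra, ivar, K):
    """K initial basis spectra: the mean spectrum plus the leading
    K-1 variance eigenvectors orthogonal to it, computed in the
    noise-isotropized space."""
    # Re-scale the axes so the median noise variance is the same in every direction.
    scale = np.sqrt(np.median(1.0 / ivar, axis=0))
    X = spectra / scale[None, :]
    mean = X.mean(axis=0)
    mhat = mean / np.linalg.norm(mean)
    # Project every spectrum into the subspace orthogonal to the mean spectrum.
    R = X - np.outer(X @ mhat, mhat)
    # Principal variance directions within that subspace.
    _, _, Vt = np.linalg.svd(R - R.mean(axis=0), full_matrices=False)
    basis = np.vstack([mhat, Vt[:K - 1]])
    return basis * scale[None, :]  # undo the isotropizing re-scale

rng = np.random.default_rng(1)
spectra = rng.normal(loc=5.0, size=(200, 50))   # toy stand-in for SDSS spectra
ivar = rng.uniform(0.5, 2.0, size=(200, 50))    # toy inverse variances
basis = hmf_init(spectra, ivar, K=3)
print(basis.shape)  # (3, 50)
```

Because the projected residuals R live entirely in the subspace orthogonal to the mean direction, the SVD eigenvectors automatically come out orthogonal to it; no extra orthogonalization step is needed.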
Whew! All that work put into the PCA, which is just to initialize our method. As my loyal reader knows, I think PCA is almost never the right thing to do. I really should finish the polemic I started to write about this a few years ago.
[Comments below by Murray make me realize that I am not being very clear here. In the case of our matrix factorization, we are trying to find K orthogonal eigenspectra that can be coadded to explain a large set of data. To initialize, we want to start with K orthogonal eigenspectra that include the mean spectrum. That's why we project and PCA.]
I worked on getting SDSS-III DR8 versions of my RC3 galaxies working. I was still finding problems until late in the day when Ben Weaver (NYU), our awesome SDSS-III data guru, stepped in and saved my day with a bug fix. The great thing about working on SDSS and SDSS-III is working with great people.
Fadely, Willman, and I have a star–galaxy separation method (hierarchical Bayesian discrete classifier) for LSST that can be trained on untagged data; that is, we do not need properly classified objects (any
truth table) to learn or optimize the parameters of our model. However, we do need a truth table to test whether our method is working. Not surprisingly, there are not a huge number of astronomical sources at faint magnitudes (think 24) where confident classification (that is, with spectroscopy or very wide-wavelength SED measurements) has been done. So our biggest problem is to find such a sample for paper zero (the method paper). One idea is to run on a bunch of data and just make the prediction, which is gutsy (so I like it), but really who will believe us that our method is good if we haven't run the test to the end?
A combination of illness, travel, and preparation for class prevented much research getting done in the last few days. One exception is that Fergus and I met to discuss modeling planet-obscuring speckle in coronograph images, where Fergus thinks he may have a breakthrough method that is simple and physically motivated. It seems promising, so I am about to ask the Oppenheimer team for some example data!
Marshall and I (nearly) finished writing the lens-equation-solving part of our plug-in to the Tractor. This reminded me of graduate school, where I spent a lot of time solving the lens equation. One of my first efforts along this line became my first arXiv submission (astro-ph/9311077), which I am proud to say was submitted when the wonderful, wonderful arXiv was in its second year of infancy (and wasn't called
arXiv). Today Marshall and I re-discovered that uniquely identifying—by root finding—all the multiple images of a lensed source is non-trivial, especially near cusp caustics! That also reminded me of graduate school, where I worked out and then abandoned dozens of semi-clever techniques.
Marshall and I continued working on our lens prior PDF code, and we also spent some time looking at candidate double-redshift (maybe lens?) objects found in SDSS spectra by Tsalmantza. But the big news for my day is that Foreman-Mackey's code for re-calibrating SDSS Stripe 82 data and finding variable stars scaled up very well; it looks like running it on a single (okay, 12-core) machine will take only a couple of weeks. I hope that is right! We also confirmed that all the imaging is spinning on NYU disks.
Marshall and I wrote a piece of code that draws sets of four point-source positions from a prior over lensing configurations, without explicitly solving the lens equation. This code penalizes configurations by their non-compactness on the source plane, transformed to image-plane units. It works: When we MCMC the prior with Foreman-Mackey's awesome ensemble sampler we get realistic-looking lenses, even though we never actually did any root-finding. The goal is to plug this prior into the Tractor when it is working on images that contain lenses.
As my reader knows, I have been working on a responsible and probabilistic use of the Jeans equations (which model velocity second moments) for inference. Today I re-built how I simultaneously model the density as a function of position and the velocity variance as a function of position to meet a few desiderata: (1) I want the model to be extremely flexible and data-driven, or non-parametric (meaning huge numbers of parameters). (2) I want the model to strictly enforce the Jeans equation given a parametric (or non-parametric) density or gravitational potential model. (3) I want the model to be parameterized so that I don't have to put harsh or ugly barriers in parameter space that will cause my optimizer to choke. (4) I want the parameterization to be relatively stable in the sense that I want small parameter changes to lead to small, smooth changes to the density and velocity moment models. I got all of this working with a very odd parameterization: I build a non-parametric model of the derivative with respect to position of the number density times the velocity second moment! This gets divided by a potential gradient to give the number density model, and it gets integrated and divided by the density model to get the velocity moment model. Crazy, but it works. I am sure there are much better solutions for my desiderata, but I found that I was much more willing to write code than go to the library!
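A toy numpy check of that bookkeeping (my own sketch, not the real code), using the one-dimensional Jeans equation d(n v2)/dx = -n dPhi/dx with a harmonic potential, where the isothermal answer n proportional to exp(-x^2/2) with v2 = 1 is known:

```python
import numpy as np

def trapezoid(y, x):
    # Simple trapezoid rule (avoids numpy version differences).
    return 0.5 * np.sum((y[1:] + y[:-1]) * np.diff(x))

# 1D Jeans equation: d(n * v2)/dx = -n * dPhi/dx.
# Parameterize the model by f(x) = d(n * v2)/dx; then
#   n  = -f / Phi'            (division by the potential gradient)
#   v2 = (integral of f) / n  (integration, then division by n)
x = np.linspace(0.5, 6.0, 800)      # stay away from x = 0, where Phi' = 0
dPhi = x                            # harmonic potential, Phi = x^2 / 2
f = -x * np.exp(-0.5 * x ** 2)      # toy choice of f: the isothermal answer

n = -f / dPhi
# n*v2 is the antiderivative of f; integrate in from the right,
# using the boundary condition n*v2 -> 0 at large x.
nv2 = np.array([-trapezoid(f[i:], x[i:]) for i in range(len(x))])
v2 = nv2 / n

inner = x <= 2.5  # the outer edge suffers from truncating the integral at x = 6
print(np.allclose(n, np.exp(-0.5 * x ** 2)))   # True: density recovered
print(np.allclose(v2[inner], 1.0, atol=1e-3))  # True: isothermal v2 = 1
```

The division and integration steps are exactly the "crazy" parameterization described above; nothing in them requires f to take this particular analytic form.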
I have a huge collection of RC3 galaxy images on the web; these appear in many talks, some even occasionally credited to me. Keep those cards and letters rolling in! Today I started to re-make them with the (better and bigger) DR8 data set, which should make the collection larger, if not far better. Blanton and Ben Weaver (NYU) helped me with some of the crucial technical details.
Phil Marshall showed up at MPIA today for a week. He wants to work on lensing. I told him I will work on lensing provided that it involves image modeling. He was puzzled, but agreed.
I spent all weekend madly coding up my likelihood or Bayesian moment modeling and full phase-space modeling for one-dimensional dynamical systems. I can't see any easy, simple ways to make the code fast, so it is super-slow. I have a method for inferring the gravitational potential using the position distribution and second moments of the velocity distribution and then I have one using the full position-and-velocity two-dimensional distribution function. Although there is a lot of computation involved, the problem is technically easy because all one-dimensional potentials are integrable. That makes it a nice toy system for learning about inference with different noise models and different selection functions.
I spent the morning pair-coding a simple stellar phase-space distribution modeling code with Rix, and the afternoon solo-coding the likelihood functions for my one-dimensional dynamics code. In both cases, the attempt is to have the code benefit strongly from the things Python can do for us.
In the afternoon, I had a fruitful conversation with Coryn Bailer-Jones (MPIA) about the internals and outputs of the Gaia pipelines for stellar parameter estimation and source classification. For the former, we are discussing three-dimensional models of the Milky Way, and for the latter we are working out methods that make use of discrete models.
I also spoke with Glenn van de Ven (MPIA) about what's been done in one-dimensional dynamical modeling, and Bovy sent Rix and me some beautiful plots of the Milky Way structural parameters as a function of stellar chemical abundances.
At MPIA Galaxy Coffee, Foreman-Mackey spoke about his very robust calibration model for SDSS Stripe 82. The model is a mixture of variable and non-variable stars, and for each, the observations are treated as a mixture of good data and bad data. These mixture models are effectively marginalizations over classifications; they don't produce or require hard classifications of stars into variable and non-variable, nor measurements into good and bad. This way, they take the information they can get from everything, but learn the most from the best data. The calibration model looks great. On his RR Lyrae finding, it occurred to us that maybe we need to model the stochastic as well as the periodic variations if we are going to be as sensitive as possible: On long time scales, RR Lyrae lightcurves evolve.
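Schematically (a sketch with made-up parameter names, not Foreman-Mackey's actual code), the marginalized likelihood for one star's repeat photometry is a mixture over {variable, non-variable}, and within each branch every observation is a mixture of {good, bad}:

```python
import numpy as np

def ln_gauss(x, mu, var):
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)

def ln_star_likelihood(mags, sigmas, mu, p_var, s_var2, p_bad, s_bad2):
    # Per-observation good/bad mixture under the non-variable model...
    ln_quiet = np.logaddexp(np.log(1 - p_bad) + ln_gauss(mags, mu, sigmas ** 2),
                            np.log(p_bad) + ln_gauss(mags, mu, s_bad2))
    # ...and under the variable model (extra variance for the variability).
    ln_var = np.logaddexp(np.log(1 - p_bad) + ln_gauss(mags, mu, sigmas ** 2 + s_var2),
                          np.log(p_bad) + ln_gauss(mags, mu, s_bad2))
    # Marginalize over the variable/non-variable classification.
    return np.logaddexp(np.log(1 - p_var) + ln_quiet.sum(),
                        np.log(p_var) + ln_var.sum())

# A quiet star's repeat photometry prefers a small variable fraction.
rng = np.random.default_rng(3)
sigmas = np.full(30, 0.02)
mags = 17.0 + sigmas * rng.normal(size=30)
lo = ln_star_likelihood(mags, sigmas, 17.0, p_var=0.1, s_var2=0.25, p_bad=0.01, s_bad2=1.0)
hi = ln_star_likelihood(mags, sigmas, 17.0, p_var=0.9, s_var2=0.25, p_bad=0.01, s_bad2=1.0)
print(lo > hi)  # True
```

No hard classifications anywhere: every sum inside the log is a marginalization, which is exactly why the model can learn the most from the best data without throwing anything away.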
Late in the day, at the Schwimmbad (swimming pool), I got my orbit-fitting code to use proper Python classes, and have the potential models inherit functions from a generic potential class. That made the code much nicer and easier to extend and test.
Astrometry.net went beta today. If you want to try out the new site, check out nova, our new image-sharing site. Note that we don't guarantee data integrity, so don't use us as a data backup system! But I think you will like what our Google Summer of Code interns, Kevin Chen and Carlos Lalimarmo, have done. Also, of course, huge effort from Lang. I am infinitely pleased.
Foreman-Mackey finished demonstrating to me that his Stripe 82 calibration process is working correctly and giving precise zeropoints for even the non-photometric runs (or at least as precise as possible) in the multi-epoch data set. He also has one candidate RR Lyrae star at enormous distance (it would be the most distant Milky Way RRL star ever, if it is real).
Nao Suzuki gave a nice talk in Hennawi's group meeting about things you can do with empirical (read PCA, which should make me groan, but he is doing good things with them) models of quasar spectra. In particular he has a very nice model-free way to estimate (under some strong but fairly reasonable assumptions) the attenuation of Lyman-alpha photons as a function of redshift. He does not confirm multiple reports of a feature at redshift around 3.2.
Ross Fadely arrived in Heidelberg today for a short visit. We discussed how to demonstrate success in and set the scope for our paper on star–galaxy classification using hierarchical Bayesian methods with (known bad) template SEDs. We are going to start by visualizing a few of the correct and incorrect classifications and see why they went that way.
Joe Hennawi consulted with me about a problem he is working on with Suzuki and Prochaska, to estimate the mean attenuation by the Lyman-alpha-absorbing IGM in large samples of quasars. The idea, roughly, is to assume that there are reasonable or modelable variations among quasars intrinsically, and then model the cosmological IGM attenuation and the proximity effects directly with a probabilistic model. We figured out how to do it on averages (stacks) of spectra, and we have a rough idea of how to do it on non-coadded individual quasars, but the latter is harder. I learned something important: Though I am consistently against stacking your data, it has the great advantage of employing the central limit theorem to make your noise Gaussian!
Foreman-Mackey and I worked on his mixture model for photometric calibration of the heterogeneous runs that make up SDSS Stripe 82. The mixture model is a mixture of variable and non-variable stars, and for each of those, the observations are a mixture of good observations and bad observations; it is a mixture of mixtures, with a lot of degeneracies. But it seems to do a great job of getting the zeropoints, and he successfully recovers the known RR Lyrae stars in the Stripe. Can we find new ones? That is the question for this month.
There is a standard technique for estimating gravitational potentials called Jeans modeling, which uses the Jeans equations to relate the gravitational potential, the number density as a function of position, and the velocity dispersion. As my regular reader can imagine, I have many issues with it, some of them physical (what distributions are angle-mixed and integrable?) and some of them statistical (so you measure your data and then do arithmetic operations on those measurements?). But I spent my week of vacation (just ended) building a little sandbox for testing it out in one dimension and comparing it to better methods—methods that start from a likelihood function, or probability of getting the data given model parameters. I am sure the latter will win in every way, but I don't have my ducks in a row yet.
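Here is roughly what I mean by a one-dimensional sandbox (a toy I am sketching here, not the actual vacation code): in a harmonic potential with an isothermal distribution function, a Jeans-style moment estimate and a likelihood maximization can be compared head to head.

```python
import numpy as np

# Toy 1-D sandbox: stars in a harmonic potential Phi = 0.5 * omega**2 * x**2
# with an isothermal distribution function f propto exp(-E / sigma**2),
# which makes x and v independent Gaussians.
rng = np.random.default_rng(42)
omega_true, sigma = 2.0, 1.0
n = 4000
x = rng.normal(0.0, sigma / omega_true, n)   # position dispersion sigma/omega
v = rng.normal(0.0, sigma, n)                # velocity dispersion sigma

# Jeans-style moment estimate: for this model, omega = sigma_v / sigma_x.
omega_jeans = np.std(v) / np.std(x)

# Likelihood route: the omega-dependent part of ln p(x, v | omega),
# maximized on a grid (sigma taken as known for simplicity).
omegas = np.linspace(0.5, 4.0, 351)
lnL = (n * np.log(omegas)
       - 0.5 * np.sum(v**2) / sigma**2
       - 0.5 * omegas**2 * np.sum(x**2) / sigma**2)
omega_ml = omegas[np.argmax(lnL)]
```

In this deliberately easy case the two estimators agree; the sandbox gets interesting when you add selection effects and noisy measurements, where the moment-based route degrades and the likelihood route does not.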
By the way, when I say likelihood function I don't mean I am going to do maximum likelihood; I mean I am going to transmit information from the data to the parameters of interest via a probability calculation! Just a reminder for those who hear maximum likelihood when all that is said is likelihood.
I worked with Holmes and Foreman-Mackey on calibration projects. With Holmes, we are finishing his thesis chapter on survey strategies optimized for photometric self-calibration. His results inform the proposed Euclid deep-field strategy. With Foreman-Mackey, we are trying to convince ourselves that our super-robust multi-exposure calibration model is doing a good job at setting the photometric zero-points for Stripe 82. In the late afternoon, Nicolas Martin (MPIA) gave a nice talk about work on M31 satellites by Crystal Brasseur (MPIA, deceased).
In the morning's Galaxy Coffee at MPIA, Rix talked about (among other things) Bovy's results on the structure of the Milky Way as a function of metallicity (from SEGUE data). Rix has the view that the results qualitatively confirm the view that vertical structure is set by radial mixing, where vertical action is sort-of conserved. Rix was followed by David Fisher (UMD) who spoke about bulges and pseudo-bulges. I challenged Fisher on whether his subjects of study were in fact bulgy, since he finds them by fitting two-dimensional images of near-face-on galaxies. He admitted that many of them may not be; his interest is in the action at the centers of disks, whatever the three-dimensional morphology.
In the afternoon, Lang and I worked on testing our optimal detection ideas in Stripe 82 data. It looks like we can beat what astronomers think of as optimal coadds (signal-to-noise-squared-weighted co-add images) by nearly ten percent in signal-to-noise-squared. That is like a ten-percent discount on observing time or total cost, so I am pretty pleased. We crush normal methods like exposure-time-weighted, unweighted, and (gasp) stacks where all contributing images have been smoothed to the same PSF.
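The arithmetic behind the claim can be sketched in one dimension (toy numbers, not our Stripe 82 pipeline): matched-filtering each exposure separately and summing the (S/N)² values beats matched-filtering an inverse-variance stack whenever the contributing PSFs differ.

```python
import numpy as np

def matched_filter_snr2(psf, sigma):
    """(S/N)^2 for a unit-flux point source under a matched filter,
    with per-pixel Gaussian noise of rms sigma."""
    return np.sum(psf**2) / sigma**2

x = np.arange(-10, 11)
psf_a = np.exp(-0.5 * (x / 1.0)**2); psf_a /= psf_a.sum()   # good seeing, shallow
psf_b = np.exp(-0.5 * (x / 2.5)**2); psf_b /= psf_b.sum()   # bad seeing, deep
sig_a, sig_b = 1.0, 0.7

# Optimal: matched-filter each exposure separately, then add the (S/N)^2:
snr2_opt = matched_filter_snr2(psf_a, sig_a) + matched_filter_snr2(psf_b, sig_b)

# Usual coadd: inverse-variance stack first, then one matched filter on
# the stack, whose PSF is the weighted mix and whose per-pixel noise
# variance is 1 / (wa + wb):
wa, wb = 1.0 / sig_a**2, 1.0 / sig_b**2
psf_stack = (wa * psf_a + wb * psf_b) / (wa + wb)
snr2_stack = np.sum(psf_stack**2) * (wa + wb)
```

With these toy numbers the per-exposure route wins by a double-digit percentage, which is the flavor of gain described above.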
Today was Lang's last day in Heidelberg, and Foreman-Mackey's first. I spent a long time talking with Bovy and Rix about Bovy's results on the structure of the Milky Way as a function of chemical abundances, based on SEGUE data. I looked over Lang's shoulder as he worked on optimal image stacking and optimal object detection (without stacking). I made a few-day plan with Foreman-Mackey for his first few days.
David Martinez-Delgado (MPIA) gave a beautiful seminar today about his work obtaining extremely flat and sensitive images of nearby galaxies, showing that many have faint, tidal remnants of past accretion events. The data are beautiful, thanks to his exquisite flat-fielding. He and his team made most of the images with 10 or 20 cm (yes, cm) telescopes!
During the rest of the day, Lang and I worked on source detection in multi-exposure imaging; we are pretty sure we can beat all of the standard methods, but we don't have a clear demo yet.
In a very nice MPIA Galaxy Coffee presentation, Bovy noted that there are sixty thousand parameters in the XDQSO BOSS CORE quasar target selection, and three hundred thousand in the XDQSOz photometric redshift system. So we did some monster optimizations there! He showed how it all works and why. I finished the day paying scientific IOUs.
Hennawi and I spent part of the day pair-coding a simple toy code to infer the spectroscopic noise model given a few (extracted, one-dimensional) spectra of the same object taken with the same spectrograph. Hennawi wants a precise noise model for the SDSS spectrographs for Lyman-alpha forest clustering and IGM studies. We figured out a quasi-correct Gibbs sampling method, that draws posterior samples of the object's true spectrum, and given each of those, subsequently samples in noise-model parameter space. It seems to work! Now to run on real data, which contain a few additional complications (like non-trivial interpolation and non-Gaussian noise components like cosmic rays).
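The structure of that Gibbs sampler can be sketched with a stripped-down noise model (a single unknown variance; the real spectrograph model has more parameters, and all numbers here are invented): alternate between drawing the true spectrum given the noise level, and the noise level given the spectrum.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pix, n_exp = 200, 6
true_spec = 10.0 + np.sin(np.linspace(0.0, 8.0, n_pix))
true_var = 0.25
data = true_spec + rng.normal(0.0, np.sqrt(true_var), (n_exp, n_pix))

var = 1.0  # deliberately bad initial guess for the noise variance
var_samples = []
for step in range(2000):
    # (1) draw the true spectrum given the variance (flat prior => Gaussian):
    spec = rng.normal(data.mean(axis=0), np.sqrt(var / n_exp))
    # (2) draw the variance given the spectrum (Jeffreys prior => inverse gamma,
    #     sampled as residual-sum-of-squares over a chi-squared draw):
    resid2 = np.sum((data - spec)**2)
    var = resid2 / rng.chisquare(n_pix * n_exp)
    var_samples.append(var)
var_post = np.mean(var_samples[500:])
```

Both conditionals are conjugate here, which is what makes the Gibbs scheme "quasi-correct" and easy; cosmic rays and interpolation break the conjugacy and are where the real work lies.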
In the morning I had discussions with Gennaro (MPIA), Brandner (MPIA), and Stolte (Bonn) about their work on fitting the IMF in Milky Way young star cluster Westerlund 1. Dan Weisz and I were getting different results from them and I was interested in finding out why; there are significant methodological differences, but the same data. I think I see a path to resolving the discrepancies, but it depends on the method that Gennaro et al used to estimate their completeness as a function of position and magnitude.
In the afternoon, Lang and I worked on objects that don't match well between PHAT epochs. These are all likely to be either foreground stars with significant proper motions or else variable objects. By fitting out high-order distortions in the HST ACS camera, we can get few-milliarcsec per-source position uncertainties, so we are pretty sensitive even with PHAT's short internal time baseline.
I worked with Dan Weisz (Washington) on his IMF fitting code for much of the afternoon. We got it working, but we had to make some approximations with which we are not entirely happy. We started to get results on one Milky Way cluster with published masses, but we were surprised enough by our numbers to not trust them. We learned a lot in the pair-coding mode, though, so I thoroughly enjoyed the coding session.
Lang and I stole some weekend time to get the Tractor to run on SDSS images but with strong priors on source locations. We are doing this for Myers, who has measured some quasar positions with the EVLA and wants to know what SDSS has to say about them, conditioned on the EVLA results. This is a baby step towards the problem: How do you analyze a big bag of heterogeneous imaging data at the angular resolution of the best data?
I spent much of the day working on toy examples for, and presentation for, my MPIA Hauskolloquium on classification. I argued that marginalized likelihood is the way to go (before applying your utility, that is, your long-term future discounted free cash flow model), but that for it to work well you need to learn the relevant priors from the data you are classifying. That is, if you are working at the bleeding edge (as you should be), the most informative data set you have (about, say, star–galaxy classification at 29.5 magnitude) is the data set you are using; if it isn't: Change data sets! Or another way to put it: Any labeled data you have to use for classification—or any priors you have—are based on much smaller, or much worse, data sets. So for my seminar I argued that you should just learn the priors hierarchically as you go. I demonstrated that this program all works extremely well, in realistic demos that involve fitting with wrong and incomplete models (as we do in the real world).
[This is post 1500. And this exercise of daily blogging still is a valuable (to me) part of my practice.]
How to measure precisely the position of a star in an HST ACS image? The imager is not well sampled, so you can't just interpolate a smooth PSF model to the sub-pixel offset that is relevant. You have to build a pixel-convolved model for every sub-pixel offset. That is annoying! After figuring out how unlikely it is that our current software (not written by us) is doing this correctly, we returned to the issue of the high-order polynomial terms. In the morning, Robert da Silva (UCSC) gave a very nice talk about the observational consequences of the fact that galaxies form stars in bursts and therefore provide stochastic illumination.
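In one dimension, the pixel-convolved alternative looks like this (a toy Gaussian stand-in for a real HST PSF; the widths and window are made up): for each sub-pixel offset you integrate the profile over every pixel, rather than sampling a smooth PSF at pixel centers and interpolating.

```python
import numpy as np
from math import erf, sqrt

def pixel_convolved_gaussian(dx, sigma=1.2, half=4):
    """Flux in each pixel (1-D toy) from a Gaussian star at sub-pixel
    offset dx: integrate the profile across each pixel (the pixel-convolved
    model) instead of interpolating a smooth PSF to the offset."""
    edges = np.arange(-half, half + 2) - 0.5   # pixel boundaries
    cdf = np.array([0.5 * (1.0 + erf((e - dx) / (sigma * sqrt(2.0))))
                    for e in edges])
    return np.diff(cdf)   # per-pixel integrals; one value per pixel
```

For a Gaussian the per-pixel integral is an erf difference; for a realistic PSF you would tabulate the model on a supersampled grid and bin it down, but the principle, integrate then compare to pixels, is the same.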
Chatted with Holmes about his thesis chapter about ubercalibration; chatted with Bovy about his results on the Milky-Way disk as observed by SEGUE; filed tickets against the new web version of Astrometry.net. Late in the day, Lang and I got onto fitting the polynomial distortion terms in the HST ACS instrument; the official distortion map only goes to fourth order (if I remember correctly) but we see what look like systematic effects at higher orders.
In an action-packed day, I had several conversations about software as instrumentation. The first was with Rob Simcoe (MIT), after his excellent talk about his FIRE spectrograph: I asked him what aspects of the design were driven by the desire to make data reduction simpler. He claimed none, though I didn't entirely believe him. My point is that instruments should be designed to preserve or transmit information and we shouldn't be afraid to write complicated software. So making the scattered light low is good, because scattered light reduces signal-to-noise, but making the traces straight is bad, because that can be handled in software. Simcoe said that all significant costs were of the first kind and not the second. The second conversation about software was with J. D. Smith (Toledo) on the bus, who is responsible for some of the most important Spitzer-related projects. He, like me, would like to see changes in which there are ways to reward and recognize astronomers for the infrastructure they create.
The day also included work on binary black holes, M31 dust, and M31 proper motion.
In the morning I wrote code to implement Dalcanton's M31 dust model, both for making fake data and for performing likelihood calculations given model parameters. By the time I was done she had moved on to a more sophisticated model! But I was pleased with my code; I think the simple model is strongly ruled out by the data.
In the afternoon, Hennawi, Tsalmantza, Maier, Myers, and I discussed using Tsalmantza and my HMF method to improve quasar redshift determination. Hennawi needs precise redshifts to do some sensitive IGM and proximity-effect measurements; when quasars don't have narrow lines in the visible, this is challenging. We are hoping to beat the current status quo. Gabriele Maier is making a training set for Tsalmantza and me to work with. Hennawi specified a very limited paper 1 for us to be working towards.
I didn't do much research on Friday or the weekend, except for chats with Lang and Dalcanton about their respective technical issues. Dalcanton's is to infer the dust distribution in M31 from the colors and magnitudes of the red giants. It seems easy enough, but when you are doing dust, and you are letting it live in three dimensions, some nasty integrals appear; integrate numerically or use simple distributions with analytic results or approximations?
The answer to this question is yes of course, but Lang and I spent a bit of time trying to find them, in the places where multi-epoch HST data overlap, for all the obvious astrometric reasons you might want to do that. The day also included a talk by Martin Elvis on mining asteroids (of all things!), and discussions with Tsalmantza and Adam Bolton about finding double-redshift objects in spectra.
In the morning we continued our discussions of multi-exposure imaging. I love the style of this computational imaging group: Work hard all day, but work equals sitting in the garden arguing! We particularly discussed what you could believe about a model made by forward-modeling through the PSF (that is, a deconvolution). My position is that because there are near-degeneracies in such modeling, you have to return a posterior probability distribution over deconvolved images (or probably a sampling of that); Fergus thought it might be possible to make an adaptive model complexity designed to maintain unimodality in the posterior PDF. Either way, representing the posterior PDF is not going to be trivial! We postponed all such issues to subsequent projects; we have a scope for a first paper that skirts such issues.
In the afternoon, Christopher Burger, Stefan Harmeling, and I discussed making probabilistic models of CCD bias, dark-current, flat, and read-noise frames, from a combination of zero, dark, flat, and science data. We decided to make some experiments with a laboratory CCD camera and, if they work, repeat them with archival HST data.
In the morning a group of us (Fergus, Schölkopf, Hirsch, Harmeling, myself) worked on the idea that Hirsch et al's lucky image deconvolution system could be used to combine and model any multi-exposure imaging. The system seems to work on pretty much anything (as it should), although there are knobs to turn when it comes to engineering aspects (as in: batch vs online, how to initialize, multiplicative or additive update steps, parameterization of the image plane, etc). In the afternoon, Hirsch, Harmeling, and I specified the content of a publishable (in astronomy) paper making that point, with explicit demonstration of applicability to PanSTARRS, DES, and LSST. We spent a long time talking about what is the most assumption-free kind of model; Harmeling likes grids of pixels as models; I like grids of pixels plus floating Gaussians or delta-functions.
I also gave an informal seminar about work by Lang and me on Astrometry.net, web 2.0 stuff, the Open-Source Sky Survey, the Tractor, and other vapor-ware.
I am spending three days visiting the group of Bernhard Schölkopf in Tübingen to discuss possible overlaps between his work on computational photography and image processing and current problems in astrophysics. Wow, there is a lot of overlap! In particular, his group has solved a problem I am interested in: How to combine lucky imaging data (fast images, variable PSF) to get information at the highest possible resolution and signal-to-noise. The current schemes usually throw out less-good data, rather than combine it optimally. Here in Tübingen, Hirsch et al have something close to an optimal system. It is described here.
Rob Fergus is also visiting at the same time. In the afternoon he gave a talk about his work on blind deconvolution of natural images, using the statistics of image gradients as a handle on the point-spread function. All of astronomy is a form of blind deconvolution: The instrument (plus atmosphere) convolves what we care about with an unknown point-spread function; we must simultaneously figure out that function and what we care about. This is an ill-posed problem we solve using heuristics (like find the stars, estimate the PSF using them, then look at everything else assuming that PSF), but something that truly does simultaneous inference will win in the end.
Tomorrow is supposed to be a hack day!
Among many other things today, the currently-in-Heidelberg part of the PHAT team (namely Dalcanton, Weisz, Lang, Rix, and Hogg) discussed the first baby steps towards using the young stellar clusters imaged in multiple bands in M31 to measure the stellar initial mass function and its dependence on cluster mass and clustocentric radius. We discussed all the limitations of working at catalog level and also the benefits, and decided to start out with that, with all approaches that are more precise and accurate postponed until we understand all the imaging details better. Lang and I were assigned the task of looking at the information content of the catalogs with respect to radial and mass dependences; Weisz is working on the machinery used for fitting.
Late in the day, Bolton and I discussed spectrograph modeling. My position is that we might be able to vastly improve the accuracy of spectroscopic reductions and high-level science results if we build hierarchical models of the spectrographs, using all the available science and calibration data to infer the parameters. In this vision, the analysis of every exposure would benefit as much as it could from the existence of all the other exposures. We decided that it is too early to embark on this idea; indeed, Bolton was perhaps suspicious that it was even a good idea in the first place.
Today Schiminovich showed up for a day in Heidelberg; he, Lang, and I spent some time discussing the scope and timing of our paper on transiting white dwarf stars in the GALEX time stream. We made plans for the figures (they must be readable after output by a black-and-white printer) and for the zeroth draft (it must be ready by July 20). We also worked on the title and abstract. We tentatively decided to put the word planet in the title, because that is the long-term goal of all this.
In the morning I spoke with the PanSTARRS crew meeting at Heidelberg about hierarchical methods for star–galaxy separation. I showed the demo I made yesterday, and results obtained with Fadely and Willman on real data. In the afternoon, I spoke with Rory Holmes about our ubercalibration (self-calibration) theory paper, which he is writing up as a paper and as a PhD thesis chapter.
I sat in on an all-day meeting in Heidelberg of the PanSTARRS Milky Way and Local Group structure working group. I learned a great deal, including that PanSTARRS is operating well and has tons of data. I spoke brashly about detecting sources at low signal-to-noise in multi-epoch imaging, and about star–galaxy separation, and I got assigned homework on both subjects. On the latter, I agreed to speak tomorrow, and I stayed up late preparing a quantitative toy demo to illustrate the principles behind hierarchical modeling.
Upon arrival in Heidelberg today, Rix gave me homework regarding inferring the stellar mass function in a cluster given a sample of measured masses. This is a Poisson sampling problem, and there are two somewhat different formulations: In the first, you consider the independent probability for each of the observed masses (given a model of the distribution), and then multiply the product of those by a Poisson probability for the total observed number. In the second, you consider infinitesimal bins in mass, and product together the Poisson probabilities for the (binary) occupation of each of the bins. Late in the day I wrote a document showing that these two approaches are mathematically identical, at least as far as the likelihood function is concerned.
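For the record, the little argument goes like this (my notation: λ(m) is the rate density over mass, Λ its integral, and N the number of observed masses):

```latex
% Formulation 1: N iid draws from the normalized density, times a
% Poisson probability for the total observed number N:
\mathcal{L}_1
  = \frac{\Lambda^{N} e^{-\Lambda}}{N!}\,
    \prod_{i=1}^{N} \frac{\lambda(m_i)}{\Lambda}
  = \frac{e^{-\Lambda}}{N!}\,\prod_{i=1}^{N} \lambda(m_i) ,
\qquad
\Lambda \equiv \int \lambda(m)\,\mathrm{d}m .

% Formulation 2: infinitesimal bins of width dm; each occupied bin
% contributes \lambda(m_i)\,\mathrm{d}m\, e^{-\lambda\,\mathrm{d}m},
% and each empty bin contributes e^{-\lambda\,\mathrm{d}m}:
\mathcal{L}_2
  = \Bigl[\prod_{i=1}^{N} \lambda(m_i)\,\mathrm{d}m\Bigr]\,
    \prod_{\mathrm{bins}} e^{-\lambda(m)\,\mathrm{d}m}
  = e^{-\Lambda}\,\prod_{i=1}^{N} \lambda(m_i)\,(\mathrm{d}m)^{N} .
```

The two expressions differ only by N! and (dm)^N, neither of which depends on the model parameters, so as likelihood functions for inference they are identical.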
These models are for independently drawn (iid) mass samples. We would like to break the iid assumption, but non-independent samples are more complicated. There aren't good general ways to think about dependent data, and yet there is a vociferous literature about whether stellar masses in clusters are iid. The literature is loud but doesn't contain anything that I consider a "model", which for me is a quantitative specification of the likelihood (the probability of the data given model parameters).
Foreman-Mackey and I decided at lunch that the density of his calibration grid for SDSS Stripe 82 and the radius out to which he selects stars around each grid point are both complexity parameters in our calibration model. The first sets the number of free parameters, and the second sets the smoothness. We worked out a semi-practical way to search this space, applying cross-validation to choose the complexity parameters. It is an enormous amount of data; F-M has re-photometered every point source in every run, engineering-grade or science-grade, whether the source was detected or not. That's a good fraction of a billion measurements.
In Jagannath and my project to fit dynamical models to streams in phase space, we have a simple problem, which is to take a diagonal covariance tensor for the noise in the observables (distances, angles, proper motions) and transform it, by the best first-order approximation, to a non-diagonal covariance tensor for the noise in the phase-space coordinates. This transformation is a bad idea because distance uncertainties plus the non-linear transformation take Gaussian noise in the observables to non-Gaussian noise in the phase-space coordinates. However, it is a very good idea because if we do this transformation and live with the inaccuracy it brings (it brings inaccuracy because we are treating the noise in phase space as Gaussian), our code becomes very fast! We are checking our math now (for the Nth time) and coding it up.
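The transformation itself is standard first-order error propagation, C' = J C Jᵀ with J the Jacobian of the map; here is a generic sketch (the distance-angle toy map and all numbers are mine for illustration, not our actual stream observables):

```python
import numpy as np

def propagate_covariance(transform, obs, cov_obs, eps=1e-6):
    """First-order (linearized) propagation of a covariance matrix through
    a nonlinear map: C' = J C J^T, with the Jacobian J estimated by
    forward finite differences."""
    obs = np.asarray(obs, dtype=float)
    f0 = np.asarray(transform(obs))
    J = np.empty((f0.size, obs.size))
    for j in range(obs.size):
        step = np.zeros_like(obs)
        step[j] = eps
        J[:, j] = (np.asarray(transform(obs + step)) - f0) / eps
    return J @ cov_obs @ J.T

# Toy observables (distance d, angle theta) -> Cartesian (x, y):
to_xy = lambda p: [p[0] * np.cos(p[1]), p[0] * np.sin(p[1])]
cov = propagate_covariance(to_xy, [10.0, 0.5], np.diag([0.04, 1e-4]))
```

This is exactly the speed-for-accuracy trade described above: the output covariance is Gaussian by fiat, which is wrong at large fractional distance errors but makes the downstream likelihood evaluations cheap.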
I spent most of the day at scicoder, Demetri Muna's workshop for astronomers who code. I spoke about the practice of building academic software systems—pair coding, functional testing, and using packages vs writing your own—and then went to lunch with a small group. On the writing your own point, I said that it is a good idea to both write your own and use pre-built packages, because you learn so much by writing your own, and you get so much performance out of (well built) industrial-strength code (though you can use your own if performance isn't a problem). Partly my remarks are motivated by the point that academic programming is about learning, not just shipping. In the afternoon, Muna taught us about using R for browsing data and making plots. Tsalmantza and I wrote all of our heteroscedastic matrix factorization stuff in R, so I was already a believer, though Python is my one true love.
Foreman-Mackey showed me a beautiful periodogram (well, actually a generalization of the periodogram to more general periodic functions) for one of the SDSS Stripe 82 RR Lyrae stars. We are so close to being able to start a proper search for new stars. Lang and I worked on tasking Lalimarmo and Chen (our GSOC team) on critical tasks for taking Astrometry.net to beta on the web. Part of the issue is keeping the development cycle non-onerous but data-preserving.
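One way to build such a generalization (a sketch of the idea, not Foreman-Mackey's code; the two-harmonic signal and all numbers are invented): at each trial period, least-squares fit a truncated Fourier series and record the chi-squared improvement over a constant model. With one harmonic this reduces to the classical floating-mean (generalized) Lomb-Scargle periodogram; more harmonics accommodate the non-sinusoidal shapes of RR Lyrae lightcurves.

```python
import numpy as np

def periodogram_power(t, y, yerr, periods, n_harmonics=2):
    """Chi-squared improvement from a weighted least-squares fit of a
    truncated Fourier series at each trial period."""
    w = 1.0 / yerr**2
    chi2_const = np.sum(w * (y - np.average(y, weights=w))**2)
    power = []
    for P in periods:
        phase = 2 * np.pi * t / P
        cols = [np.ones_like(t)]
        for k in range(1, n_harmonics + 1):
            cols += [np.cos(k * phase), np.sin(k * phase)]
        A = np.vstack(cols).T
        Aw = A * np.sqrt(w)[:, None]             # weighted design matrix
        coeffs, *_ = np.linalg.lstsq(Aw, y * np.sqrt(w), rcond=None)
        chi2 = np.sum(w * (y - A @ coeffs)**2)
        power.append(chi2_const - chi2)
    return np.array(power)

# toy usage: recover a 0.7-day period from noisy two-harmonic data
rng = np.random.default_rng(3)
t = rng.uniform(0.0, 30.0, 300)
y = (np.sin(2 * np.pi * t / 0.7) + 0.3 * np.sin(4 * np.pi * t / 0.7)
     + rng.normal(0.0, 0.1, 300))
periods = np.linspace(0.5, 1.0, 501)
power = periodogram_power(t, y, 0.1 * np.ones_like(t), periods)
best_period = periods[np.argmax(power)]
```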
Lang and I, at the request of Zeljko Ivezic (Washington) worked on carving out from the Tractor just the minimal code required to grab a single SDSS field, grab the relevant catalog entries and calibration data, and synthesize the image. This permits comparison of SDSS images with the implicit catalog model.
I spent a good chunk of the day figuring out how detection of isolated sources in astronomical imaging relates to the Bayesian evidence, and then decision theory. I am nearly there. The nice thing is that if you see it as a subject in decision theory, you can infer things about investigators' utility by looking at the likelihood cut at which they cut off their catalogs. In the short term, it is teaching me something about measure theory in practice.
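The simplest version of the evidence calculation is analytic (my toy setup, not a worked-out piece of the project: a linear point-source model with a Gaussian prior on the flux, versus blank sky), and the detection threshold is then a cut on this log-evidence ratio, i.e., a utility statement:

```python
import numpy as np

def ln_evidence_ratio(data, psf, sigma, prior_var=100.0):
    """Analytic log evidence ratio for 'one point source with a zero-mean
    Gaussian flux prior' vs 'blank sky', for data = flux * psf + noise
    with per-pixel Gaussian noise of rms sigma."""
    F = np.sum(psf**2) / sigma**2       # Fisher information for the flux
    b = np.sum(data * psf) / sigma**2   # data--psf correlation
    a = F + 1.0 / prior_var
    # marginalize the flux analytically: ln[p(data|source) / p(data|blank)]
    return 0.5 * b**2 / a - 0.5 * np.log(prior_var * a)
```

The second term is the Occam penalty: even at zero correlation with the data, the source model loses by a fixed amount that grows with the prior volume, which is the measure-theory-in-practice part.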
Lang and I are big on pair coding, where we are either co-spatial or else have an audio channel open (usually Skype) and work together in an editor (usually emacs, and using unix screen we can both be there when we are not co-spatial). Today we had the audio channel open, but Lang worked on getting our mixture-of-Gaussians PSF code working on the output of the SDSS K-L PSF model while I filed a final report for one of my grants. Not exactly pair-coding, but it is more fun than grinding out code and reports alone!
The most surprising thing I learned today was from Benitez about the JPAS survey, a Spain–Brazil collaboration to do 56-band imaging to get large-scale structure. It is an ambitious project, and designed for maximum efficiency. It is also funded; I will be interested to see it proceed.
Talks by Richards, Bernstein, and Willman all set me up nicely; they all said that source classification and characterization at low signal-to-noise is extremely important scientifically, very difficult, and essentially probabilistic. They all showed incredible things we could do with respect to quasar science, weak lensing, and Local Group discovery if we can classify things properly at the faint end. After this crew, Gray, Lupton, and I spoke about computational methods, with Gray concentrating on classes of problems and the algorithms to make them fast and effective, me concentrating on producing probabilistic outputs (that is, no more hard, final, static catalogs), and Lupton talking about how it worked in SDSS and how it could work better in LSST and HSC. Lupton's talk closed the meeting, and it was a pleasure!
One constant note throughout the meeting, and especially today, was that a lot of science and discovery was enabled by the exquisite photometric calibration of the SDSS. I am proud of my (admittedly very small) contributions to that effort and the enormous amount they have paid off in so many areas.
There was a lot of transient discussion today, with talks about PTF and LSST. In Quimby's PTF talk, he showed Nick Konidaris's SED Machine which looks like a prototype for fast transient response. On Monday Strauss noted that LSST will issue 100,000 alerts per night, so a lot of thinking has to go on about how to deal with that. Transient argumentation continued into the semi-organized post-talk discussion over beer. Walkowicz's talk had some nice meta-discussion about the different choices you make for rare events vs common events, and within that, for known types of events vs unknown types.
For me the highlight talk today was by Dave Monet, talking about the future of astrometry. He very rightly pointed out that if Gaia and LSST work as planned, it is not just that they will dominate all of astrometry, but more importantly that they will do just about as well as it is possible to do, ever. That is, you could only beat them with insanely expensive things like new kinds of launch vehicles. So the key point for an astrometrist is how to be doing things that make sense in this context. I agree! Also, in response to a comment by me, he endorsed my (strongly held) position that the point of astrometry is not to produce long lists of numbers (positions, parallaxes, and proper motions); the point of astrometry is to answer high-level science questions, and with Gaia-scale data no one really knows how to do that at the limit of what's in-principle possible. One of the most interesting things Monet said is that there is no real point in working on the global astrometric solution right now; there are no critical science questions that depend on it, and Gaia will break you.
In a day in which Lang and I spent an inordinate amount of time formatting our Comet 17P/Holmes paper for error-free submission to the AJ (this was hilarious but not exactly enlightening scientifically), one bright spot was having lunch with Rob Fergus, his student Li Wan, and my student Dan Foreman-Mackey to discuss the first steps towards our probabilistic theory of everything. The short-term goal is to create a quantitative model of every pixel of digital or digitized astronomical imaging ever taken from any facility anywhere at any time in any bandpass. Once that's done, we will think about bigger projects. But seriously, we came up with some (still vague) first steps, one of which is for the computer scientists to read some astronomy papers, and for the astronomers to read some computer-science papers.