you can't have resolution and signal-to-noise

Federica Bianco and BJ Fulton (LCOGT) pointed out that in our test of lucky imaging pipelines, our blind deconvolution runs—which try to find the high-resolution scene that explains all of the data—fail to detect a faint star which is clearly there in the traditional lucky imaging stack. Fundamentally the issue is that you can't deconvolve and also maintain high signal-to-noise. We really should reconvolve our deconvolved scenes as we go. This is a big issue for those of us trying to get away from simple co-adding of images; co-adding is very easy to understand, and preserves some aspects of signal-to-noise (though not all, because it treats all data the same).
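To make the trade-off concrete, here is a minimal one-dimensional sketch (all numbers invented): naive Fourier deconvolution amplifies noise at spatial frequencies where the PSF has little power, while reconvolving the deconvolved scene with the same kernel buys the signal-to-noise back at the cost of resolution.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 256
x = np.arange(n)

# A single faint point source.
scene = np.zeros(n)
scene[n // 2] = 10.0

# Gaussian PSF, shifted so circular FFT convolution is centered.
psf = np.exp(-0.5 * ((x - n // 2) / 4.0) ** 2)
psf /= psf.sum()
H = np.fft.rfft(np.fft.ifftshift(psf))

data = np.fft.irfft(np.fft.rfft(scene) * H, n) + rng.normal(0, 0.05, n)

# Naive Fourier deconvolution amplifies the noise wherever |H| is small...
decon = np.fft.irfft(np.fft.rfft(data) / (H + 1e-3), n)

# ...while reconvolving the deconvolved scene recovers signal-to-noise.
recon = np.fft.irfft(np.fft.rfft(decon) * H, n)
```

The reconvolved image is enormously quieter than the deconvolved one; that is the whole point of reconvolving as we go.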

In the morning, I re-worked my forward-going Sloan Atlas schedule. Writing a book is hard, at least for someone as easily distracted as me.


galaxy and star images, job ad

I met with Ekta Patel to discuss progress on sample selection and context information for the Sloan Atlas of Galaxies. I am past due on various deadlines on that project! I met with Kilian Walsh (NYU), who is going to do some trial projects with me this summer relating to Astrometry.net and automated analysis of astrophysical data. Our hope is to facilitate making discoveries in the data that users submit to the system on the Web. Schiminovich came down for a few hours and found that WISE photometry does a good job of ruling out the possibility that any of our GALEX-discovered white-dwarf transits is caused by a planet (rather than a low-mass star). The jury is still out, I hope, but we also have much more to discover. Foreman-Mackey and I took a run at the challenge that Federica Bianco gave us, but we didn't find the elusive third star that we are supposed to find. But then I started to get suspicious that it even exists.

In related news, a group that includes Rob Fergus and me has funding to hire a postdoc. The successful applicant will have experience in advanced probabilistic inference and also sophisticated astronomical image modeling, calibration, or measurement. If that sounds like you and you are interested in doing more of that kind of stuff, send me email.



I commented on Foreman-Mackey's document describing his new, faster calibration model for SDSS Stripe 82. I also worked on my probabilistic reasoning tutorial document, working in comments from Dan Weisz.


PSF models

In the morning, Fergus and I discussed the exoplanet imaging project we are doing, and funding opportunities related to it. The project can be seen as an attempt to make an extremely precise model of a point-spread function in a very non-trivial optical system. We also discussed relationships between computer vision and astronomy, in the context of things I learned at AISTATS.

In the afternoon, Federica Bianco (LCOGT) showed me some lucky imaging data where she finds that the Hirsch et al. online blind deconvolution method does not give good results, whereas old-school lucky imaging works okay. In a deconvolution problem, or an image modeling problem, you can either put the structure you see in the image into the scene or into the PSF. Where the structure goes is explicitly degenerate, although some choices might have much higher marginalized likelihood under sensible priors, or much lower cost under properly regularized cost functions. The challenge is setting those priors or cost functions. (And sampling or optimizing them!) Anyway, Bianco challenged Foreman-Mackey and me to beat her lucky imaging stack with a deconvolution run. We accepted.



I hacked away on the blind-deconvolution formulation of Lucky Imaging. I have reimplemented the online blind deconvolution method of Hirsch et al, but added regularization and multiple passes through the data (thus getting a better solution but breaking the online-iness). The code is now a very elegant core surrounded by lots of hacky, heuristic code. Not happy about that, but it gets the job done.
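Not my actual code, but a sketch of the flavor: Richardson–Lucy-style multiplicative updates applied frame by frame, swept over the data several times. Here the per-frame PSFs are fixed and known for brevity (in the blind problem they get updated by the same kind of step), and the multiplicative form itself keeps the scene non-negative.

```python
import numpy as np

def convolve(a, k):
    # circular convolution via FFT
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(k), len(a))

def gauss_psf(n, sigma):
    p = np.exp(-0.5 * ((np.arange(n) - n // 2) / sigma) ** 2)
    return np.fft.ifftshift(p / p.sum())

rng = np.random.default_rng(0)
n = 128
truth = np.zeros(n)
truth[40], truth[48] = 5.0, 1.0            # a bright star and a faint neighbor

# Six frames with different (known, for brevity) seeing widths.
psfs = [gauss_psf(n, s) for s in (2.0, 2.5, 3.0, 3.5, 4.0, 4.5)]
frames = [convolve(truth, p) + rng.normal(0, 0.01, n) for p in psfs]

scene = np.ones(n)                          # flat, positive initialization
for sweep in range(20):                     # multiple passes through the data
    for d, p in zip(frames, psfs):
        model = convolve(scene, p)
        ratio = np.clip(d, 1e-9, None) / np.clip(model, 1e-9, None)
        p_adj = np.roll(p[::-1], 1)         # adjoint (flipped) PSF
        scene *= convolve(ratio, p_adj)     # multiplicative update keeps scene >= 0
```

The inner loop is the online part; the outer sweep is what breaks the online-iness in exchange for a better solution.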


images: philosophy and modeling

On the plane home, I started writing a paper about our lucky imaging stuff, with a combination of computer scientists and astronomers in mind. This led me to write quite a bit about how we think about images: They are noisy samples of a PSF-convolved intensity field. The PSF should be the pixel-convolved PSF at the focal plane. That's important! Well-sampled images are easier to use than ill-sampled images. Telescopes can never measure the unconvolved intensity field; that's an object that doesn't really exist in empirical science. And so on. All this kind of discussion sounds very philosophical, but it has important implications for how the math works, what we present as our output, and the speed of our computations.
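A tiny illustration of why the pixel-convolved point matters (all numbers invented): for a badly sampled Gaussian PSF, integrating the profile over each unit pixel differs noticeably from just sampling the profile at pixel centers.

```python
import numpy as np
from math import erf, sqrt

def pixel_convolved_gaussian(centers, mu, sigma):
    # Integral of a unit-flux Gaussian over each unit pixel (erf differences):
    # this is the PSF convolved with the pixel box, evaluated per pixel.
    lo = (centers - 0.5 - mu) / (sigma * sqrt(2.0))
    hi = (centers + 0.5 - mu) / (sigma * sqrt(2.0))
    verf = np.vectorize(erf)
    return 0.5 * (verf(hi) - verf(lo))

pix = np.arange(-10, 11).astype(float)
sigma = 0.6                                  # badly sampled PSF
integrated = pixel_convolved_gaussian(pix, 0.2, sigma)
sampled = np.exp(-0.5 * ((pix - 0.2) / sigma) ** 2) / (sigma * sqrt(2 * np.pi))
```

The integrated profile sums to the total flux exactly, and near the peak it disagrees with the center-sampled profile at the several-percent level, which is fatal for precise photometry.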


AISTATS days one and two

[Posting has been slow because internet at AISTATS was nearly nonexistent.] At AISTATS days one and two I learned a great deal. There were interesting contributions on automatically adapting MCMC proposal distributions, on computing marginalized likelihoods using densities of states, on model-selection criteria like AIC and so on, and on topological structures in data sets.

In addition I spent some time talking to people about stochastic gradient, in which you optimize a function by taking random data subsamples (think, perhaps, images coming from the telescope) and for each one compute a gradient in the objective function and take a small step towards the best fit. These are methods for optimizing models on enormous data sets. It turns out that there is a lot of freedom (and a lot of heuristic ideas) about how to set the step sizes. In my nascent lucky imaging project, I am finding that I can get very different results and performance by choosing very different step sizes.
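A toy version of the step-size sensitivity (problem and numbers invented): stochastic gradient on a sum-of-squares objective, one random data point per step, under two very different step-size schedules.

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy problem: estimate the mean of a large "data set" by stochastic
# gradient on the sum-of-squares objective, one sample at a time.
data = rng.normal(3.0, 1.0, 10_000)

def sgd(step0, decay, n_steps=5000):
    theta = 0.0
    for t in range(n_steps):
        x = data[rng.integers(len(data))]     # random subsample (size 1)
        grad = 2.0 * (theta - x)              # gradient of (theta - x)^2
        theta -= step0 / (1.0 + decay * t) * grad
    return theta

# Very different step-size choices give very different results.
print(sgd(0.5, 1e-2), sgd(1e-4, 0.0))
```

The decaying schedule lands near the true mean; the tiny constant step never gets there in the same number of steps, which is exactly the kind of behavior I am seeing in the lucky imaging runs.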

Various posters and talks at the meeting reminded me that there are versions of sparse coding that are convex. That is, you can do matrix factorizations like Tsalmantza's and my HMF method, but with additional constraints that try to set most of the coefficients to zero, and you can do that without breaking the convexity of the coefficient-fitting step (the a-step in our paper on the subject). This could be very valuable for our next-generation projects.
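A sketch of that convex coefficient step under an L1 penalty, via ISTA (proximal gradient with soft thresholding); the dictionary and data here are random stand-ins, not HMF itself.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 100, 20
G = rng.normal(size=(n, k))                # dictionary ("basis spectra")
a_true = np.zeros(k)
a_true[3], a_true[11] = 2.0, -1.5          # truly sparse coefficients
y = G @ a_true + 0.01 * rng.normal(size=n)

# Minimize ||y - G a||^2 + lam ||a||_1 -- convex, so no local optima.
lam = 2.0
L = np.linalg.norm(G, 2) ** 2              # gradient of the quadratic is 2L-Lipschitz
a = np.zeros(k)
for _ in range(500):
    z = a - G.T @ (G @ a - y) / L          # gradient step of size 1 / (2 L)
    a = np.sign(z) * np.maximum(np.abs(z) - lam / (2 * L), 0.0)  # soft threshold
```

The L1 penalty zeroes out all but the two true coefficients without any combinatorial search, which is the appeal for the a-step.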


AISTATS tutorial day

On day zero of AISTATS, I gave a workshop on machine learning in astronomy, concentrating on the ideas of (a) trusting unreliable data and (b) the necessity of having a likelihood, or probability of the data given the model, making use of a good noise model. Before me, Zoubin Ghahramani gave a very valuable overview of Bayesian non-parametric methods. He emphasized something that was implied to me by Brendon Brewer's success on my MCMC High Society challenge and mentioned by Rob Fergus when we last talked about image modeling, but which has rarely been explored in astronomy: If you want to have the flexibility to discover correct structure in your data, you may have to adopt methods that permit variable model complexity. The issues are two-fold: For one, a sampler or an optimizer can easily get stuck in a bad local spot if it doesn't have the freedom to branch more model complexity somewhere else and then later delete the structure that is getting it stuck. For another, if you try to model an image that really does have five stars in it with a model containing only four stars, you are requiring that you will do a bad job! Bayesian non-parametrics is this kind of argument on speed, with all sorts of processes named after different kinds of restaurants. But just working with the simple dictionary of stars and galaxies, we could benefit from the sampling ideas at least.
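The four-stars-fitting-five point in one toy example (everything here invented): with star positions held fixed and fluxes fit by linear least squares, the model missing a star is forced into a much worse chi-squared no matter how the remaining fluxes adjust.

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.arange(200)
centers = [30, 70, 105, 140, 175]          # five stars; positions known here

def star(c):
    return np.exp(-0.5 * ((x - c) / 2.0) ** 2)

data = sum(2.0 * star(c) for c in centers) + rng.normal(0, 0.1, x.size)

def best_chi2(cs):
    # Linear least squares for the star fluxes, given fixed centers.
    A = np.stack([star(c) for c in cs], axis=1)
    flux, *_ = np.linalg.lstsq(A, data, rcond=None)
    return float(np.sum((data - A @ flux) ** 2))

print(best_chi2(centers[:4]), best_chi2(centers))
```

The four-star model cannot absorb the missing star's flux, so its best-fit residuals stay far above the noise floor; that is the "requiring that you will do a bad job" problem.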

In the evening Bernhard Schölkopf drove some people from his group and me up to the observatories at the summit on La Palma, where we did some observing with amateur equipment. His student Edgar Klenske is working on using a Gaussian Process to do far better real-time control of inexpensive (think unreliable) telescope tracking hardware. If they succeed, they could have a big impact not just for unreliable hardware but also for reliable hardware.
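A minimal sketch of the idea (the kernel choice, lengthscales, and the fake "drift" signal are all invented, not Klenske's actual setup): condition a Gaussian Process on past tracking-error measurements and predict the error at the next time stamp, which is what a controller would then correct.

```python
import numpy as np

def k(a, b, ell=3.0, s=1.0):
    # Squared-exponential covariance between two sets of times.
    return s ** 2 * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

rng = np.random.default_rng(5)
t = np.linspace(0.0, 20.0, 40)
err = np.sin(0.7 * t) + 0.05 * rng.normal(size=t.size)  # fake periodic drift

t_next = np.array([20.5])
K = k(t, t) + 0.05 ** 2 * np.eye(t.size)                # noisy-observation covariance
mean = k(t_next, t) @ np.linalg.solve(K, err)           # GP predictive mean
```

The prediction tracks the smooth drift a step ahead of the data, which is what lets cheap hardware behave like expensive hardware.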


Lang and Hogg re Lupton

In preparation for writing our first paper on the Tractor, Lang and I discussed the relationship between the Tractor and the things that have come before, like SDSS Photo (Robert Lupton's baby) and SExtractor (Emmanuel Bertin's). We need to get right the ways in which Photo is and isn't an approximation to a likelihood maximization. For example, in fitting galaxy shapes, it is very close to doing maximum likelihood. However, it doesn't set the centroids of the objects or do aperture photometry that way. Of course, the Tractor can do sampling—that is, it can go way beyond optimization—but we don't want to be unfair to the precedents. We also want to slip in lots of the good ideas we gathered from Photo, including profile truncation and sampling considerations.

In the morning, Fergus and I spent some quality time with Andrew Flockhart (NYU), who is doing cosmic-ray classification in single-epoch imaging using nearest-neighbor (and maybe soon more insane methods) given a good training set derived from HST data. We decided to fully understand the system by making lots of visualizations of all the things it is doing; this is all related to my ideas about functional testing: You don't find bugs by looking at your code; you find them by producing results and comparing with expectations.


Atlas re-planning

I spent time today re-organizing the tasks required for completion of the Sloan Atlas of Galaxies. I am falling behind (of course) and have to re-schedule some parts of the job. I also spent some time working on organization of the content and (future) captioning of the plates.


making inference faster, K-giants

Here at Camp Hogg we aim to provide a satisfying, high-performance data-analysis experience. Today Andreas Berlind (Vanderbilt) swung by and told us about his need to sample a halo-occupation posterior PDF with only ten-ish parameters but a likelihood function that takes ten minutes to execute (because it ambitiously uses so much data). No problem! That's the kind of problem that emcee was built for. Foreman-Mackey and I described the algorithm to Berlind and gave him the task of providing a callable posterior PDF (or really likelihood times prior PDF) function.
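The interface emcee wants is just a callable that maps a parameter vector to a log-posterior value. Here is a minimal stand-in for what Berlind needs to write; the Gaussian "likelihood" and the box prior are placeholders for the real ten-minute halo-occupation computation.

```python
import numpy as np

def ln_prior(theta):
    # Flat prior inside a box, -inf outside: a common simple choice.
    if np.all(np.abs(theta) < 10.0):
        return 0.0
    return -np.inf

def ln_likelihood(theta):
    # Stand-in for the expensive likelihood: a unit Gaussian.
    return -0.5 * np.sum(theta ** 2)

def ln_posterior(theta):
    lp = ln_prior(theta)
    if not np.isfinite(lp):
        return -np.inf                      # skip the expensive call off-prior
    return lp + ln_likelihood(theta)

# With emcee, this callable is all the sampler needs, e.g.:
#   sampler = emcee.EnsembleSampler(nwalkers, ndim, ln_posterior)
#   sampler.run_mcmc(initial_positions, nsteps)
```

Note the early return when the prior vanishes: with a ten-minute likelihood, you never want to evaluate it at parameters the prior already forbids.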

In the afternoon, Foreman-Mackey explained how he made his Stripe 82 calibration model far faster and more robust, by switching to a model that is an approximation to what we want but has fast, analytic derivatives. Like with the censored data last week, an approximation to what you want that is much faster can be better in the long run. Also, I am not convinced that what we originally wrote down for our Stripe 82 recalibration isn't an approximation to Foreman-Mackey's new formulation; that is, this might be a conceptual improvement as well as a performance improvement.

In the morning, Hou nearly convinced Goodman and me that he has found a non-exoplanet-caused 17-day period in some K-giant radial velocity data. It isn't an exoplanet because it shows phase and amplitude variations with time. We are currently calling it a damped oscillator but I am wondering what it is going to turn out to be, really. Can K-giants have oscillations at such long periods? I wish I knew something about stars.



Well, we failed. We don't have a complete paper about fitting arbitrarily censored data with arbitrarily unreliable noise and censoring information. The many changes we made yesterday led to constant re-starting of the runs, and there are many unresolved issues in our use of black-box function integrators and optimization without analytic derivatives. So we aren't done. Ah well, we tried. It has been a great week; I really love irresponsibly ignoring all my real job requirements and hacking. Bloom pointed out that it probably isn't optimal for my LTFDFCF. (I guess I don't obey my own rules.)

In the morning I gave my seminar at the Center for Time Domain Informatics. The audience was mixed and I argued (to the choir, I think) that astronomical catalogs are very limited and we have to explore—as a community—probabilistic replacements. Currently I think it has to be the raw data instrumented with hypothesis-testing software (that is, an executable likelihood function), and (at the present day) this would mean living in the cloud, for practical reasons. But I talked more about promise than implementation. After my talk, Lisa Randall (Harvard) suggested that there might be some relationships between these ideas and next-generation ideas for inference with data from the LHC and other particle experiments.


hacking all day

Spent the whole day hacking, with Long and Richards and (returned from DC) Bloom (Berkeley). We had the standard experience that when you try to apply your good ideas to real data, you make many improvements. That is, real data are a much better testing environment than fake data! We really want to have everything run (that is, have the model optimized on every one of the lightcurves) by tomorrow. Also Brendon Brewer (UCSB) showed up and we ran into Lisa Randall (Harvard) at lunch.


generative model, slowness

Yesterday Long coded up the insane-robot censored-data model, and today we reviewed and tested the code. It is super-slow, because it contains a double (or triple, depending on how you look at it) marginalization over nuisance parameters, as many nuisance parameters as data points (more, really). In the afternoon Richards refactored the code and I made some fake data by the same generative model as that generating our likelihood. That's the great thing about a generative model: It gives you the likelihood and a synthetic data procedure all at once. At the end of the day, Long figured out that we can make some different distribution choices and make at least some of the integrals analytic. Better to use an approximate, computable likelihood than a better, incomputable one! We have to get on the good foot because there are only two days of hacking left.
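The kind of move Long found, in its simplest form: when a per-point nuisance parameter enters a Gaussian likelihood with a Gaussian prior, the marginalization integral is analytic, and the brute-force numerical integral agrees. (The particular numbers here are invented.)

```python
import numpy as np

def gauss(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

# Marginalize the nuisance m out of N(y | m, s2) with prior N(m | mu, t2):
# the analytic answer is N(y | mu, s2 + t2).
y, mu, s2, t2 = 1.3, 0.2, 0.5, 0.8

m = np.linspace(-20.0, 20.0, 400_001)
numeric = np.sum(gauss(y, m, s2) * gauss(m, mu, t2)) * (m[1] - m[0])
analytic = gauss(y, mu, s2 + t2)
```

One analytic integral like this, per data point, per likelihood call, is the difference between super-slow and fast.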


censored-data graphical model

Long and I spent part of the morning arguing about whether our model for the insane robot astronomical censored catalog data stream is mathematically consistent. We decided it is (and then I went off for an awesome afternoon at the Exploratorium). It helped that we made a graphical model of the problem (typeset in LaTeX by Foreman-Mackey).


insane robot model

Imagine you have an insane robot, randomly taking images of the sky. Every so often it takes an image that includes in its field of view a star in which you are interested. When it takes this image, it processes the image with some software you don't get to see, and it returns a magnitude for the star and an uncertainty estimate, or else no measurement and no uncertainty information at all (because, presumably, it didn't detect the star, but you don't really know). I am in Berkeley this week to answer the following question: Do the null values—the times at which the robot gives you no output at all—provide useful information about the lightcurve of the star?

To make life more interesting, we are assuming that we don't believe the uncertainty estimates reported by the robot, and that the robot returns no information at all about upper limits or detection limits or the data processing, ever. That is, you don't know anything about the non-detections. You can't even assume that there are other stars detected in the images that are useful for saying anything about them.

Of course the answer to this question is yes, as Joey Richards (Berkeley), James Long (Berkeley), and I are demonstrating this week. As long as you don't think that the robot's insane software is varying in sync with properties of your star, the nulls (non-detections) are very informative. We even have examples of badly observed variable stars, for which the use of the nulls is necessary to get a proper period determination.
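A toy version of the claim (the sinusoid, threshold, and noise level are all invented): model the detections with a plain Gaussian term and the nulls with the probability that the robot's noisy measurement fell below its threshold; the resulting log-likelihood prefers the true period.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(3)

# Truth: a sinusoidal lightcurve; the robot reports a noisy flux only
# when that noisy flux clears a threshold, else it returns a null.
P_true, thresh, sig = 7.3, 1.0, 0.1
t = np.sort(rng.uniform(0.0, 200.0, 300))
flux = 1.0 + 0.5 * np.sin(2 * np.pi * t / P_true)
meas = flux + rng.normal(0.0, sig, t.size)
det = meas > thresh

Phi = np.vectorize(lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0))))  # Gaussian CDF

def ln_like(P):
    model = 1.0 + 0.5 * np.sin(2 * np.pi * t / P)
    # detections: ordinary Gaussian term (up to a P-independent constant)
    ll = -0.5 * np.sum(((meas[det] - model[det]) / sig) ** 2)
    # nulls: probability the noisy measurement fell below the threshold
    p_null = np.clip(Phi((thresh - model[~det]) / sig), 1e-300, 1.0)
    return ll + np.sum(np.log(p_null))
```

Each null epoch contributes a term that punishes periods which put the model above threshold at times the robot saw nothing; that is exactly how the nulls carry information.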

The crazy thing is, in astronomy, there are many data sets that have this insane-robot property: There are lots of catalogs created by methods that no one precisely knows, that are missing objects for reasons no one precisely knows. And yet they are useful. We are hoping to make them much more useful.


Geha and lucky

Marla Geha (Yale) gave the astrophysics seminar, on the star-formation properties of low-luminosity galaxies. She (with Blanton, Tinker, Yan at NYU) finds an incredibly strong environmental effect: Low-luminosity galaxies that are far from any massive parent galaxy are all star-forming, whereas those more nearby are a mix of star-forming and non-star-forming. It is not just a trend: There really are no non-star-forming, isolated, low-luminosity galaxies. This makes these galaxies very valuable tools for cosmology and galaxy formation, as Geha noted and the crowd discussed. The seminar was just what I love: Lots of constructive and interesting interruptions and discussion, mainly because the results are so interesting and valuable; this really is a significant discovery. It is the strongest (in some sense) known environmental effect for galaxies, and it immediately rules out many simplistic ideas about low-mass galaxies, like that their star formation histories will be shut off by supernova feedback.

On the airplane to visit Bloom's CTDI at Berkeley, I worked more on our side project on Lucky Imaging. I got the code working (finally) and then handed it off to Foreman-Mackey. I have to get back to the things I am supposed to be doing! (Although one of the things that I love about my job is that procrastination like this is valuable in the long term.)


not much

I didn't get much work done today except some preparation for my week (next week) at the CTDI at Berkeley. I also spent a nice chunk of time discussing induction and inference with Michael Strevens (NYU Philosophy). It turns out that we can make use of each others' expertise.


lucky sprint

Foreman-Mackey and I (very irresponsibly) did a monster code sprint today on lucky imaging. We got a piece of code working, doing online stochastic gradient like in the Hirsch et al (2011) paper I cited two days ago. We are running on data provided by Federica Bianco (UCSB). However, we have not applied various useful and important regularizations (like centering the PSFs, making the PSFs non-negative, and so on) so currently the code wanders off to fuzzy model space. More soon!


mixture of Gaussians

I started writing a short note on using mixtures of Gaussians to improve two-dimensional image fitting. I realized in writing it that our expansions of the exponential and de Vaucouleurs profiles in terms of concentric Gaussians have an amusing property: They provide (automatically) their own three-dimensional deprojections, in the optically thin limit. That is, a set of concentric, two-dimensional, isotropic Gaussians that make a de Vaucouleurs profile (and we have that set) can be the projection of a very simple-to-compute set of concentric, three-dimensional, isotropic Gaussians. That could be amusing. I should also look at fitting mixtures of Gaussians to various three-dimensional profiles. This reminds me of my (dormant) project (with Jon Barron) to build a three-dimensional model of all galaxies. But of course the main point of all this is to make image fitting code fast and numerically stable.
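A quick numerical check of the deprojection claim (amplitude and width invented): an isotropic three-dimensional Gaussian integrated along the line of sight is exactly the two-dimensional isotropic Gaussian with the same sigma, so a concentric-Gaussian fit to a 2D profile deprojects term by term.

```python
import numpy as np

amp, sigma = 2.5, 1.7

def gauss3d(R, z):
    # isotropic 3D Gaussian with total "flux" amp
    return amp * np.exp(-0.5 * (R ** 2 + z ** 2) / sigma ** 2) \
        / (2 * np.pi * sigma ** 2) ** 1.5

def gauss2d(R):
    # isotropic 2D Gaussian with the same sigma and total flux
    return amp * np.exp(-0.5 * R ** 2 / sigma ** 2) / (2 * np.pi * sigma ** 2)

# Integrate the 3D profile along the line of sight at a few radii.
z = np.linspace(-30.0, 30.0, 600_001)
projected = {R: np.sum(gauss3d(R, z)) * (z[1] - z[0]) for R in (0.0, 1.0, 3.0)}
```

The agreement is exact (up to quadrature error) at every projected radius, which is what makes the mixture-of-Gaussians deprojection automatic.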


taking the lucky out of lucky imaging

Federica Bianco (UCSB) stopped by for a few hours; she will join us at the CCPP next year as a fellowship postdoc. We discussed lucky imaging, which she has been doing with the LCOGT project. I bragged to her that I think I can beat any standard lucky imaging pipeline, and now I have to put up or shut up. I was basing my claims on this obscure paper which is packed with awesome, game-changing ideas. The basic idea is that if you see each image as putting a constraint on a PSF-convolved scene, every image, no matter how bad its PSF, contributes information to the high-resolution image you seek. The cool thing is that if you have enough data, the final reconstructed scene will have better angular resolution than even your best single image.