software engineering for science

I had lunch today with Demitri Muna (NYU) of scicoder fame. We discussed software engineering, code-writing environments, and the idea that Muna should write a book about what he knows. He has been teaching scicoder workshops at NYU for a couple of years now, and there is no good book for scientists who code.



Almost no research got done today, with the exception of being impressed by David Mykytyn (NYU), my new undergraduate researcher, picking up IDL and using it within an hour. Of course we hate IDL here at Camp Hogg, but sometimes we can't avoid it!



I spent my research time editing Holmes's nearly-ready paper on self-calibration of imaging surveys (what I have called "uber-calibration" previously). The results are so simple and so easy-to-use; I hope we have an influence on the next generation of surveys.


oscillation model bad?

Hou, Goodman, and I have a radial-velocity linear damped oscillator model to capture non-exoplanet near-sinusoidal variability in radial velocity surveys (especially for giant stars). Unfortunately, it doesn't seem to be a good fit to the radial-velocity jitter in our test-case real star. So now we have to see if among the data we have there is any nail on which we can use this hammer.


no "N" in "LINER"?

My only research today was attendance at a nice blackboard talk by Renbin Yan (NYU) on the origin of LINER emission in galaxies. He has kinematic evidence that it is produced by evolved stars rather than nuclear emission from an accreting black hole. If he is right (and I think he is), this will make some waves in the galaxy evolution world, where LINER ratios have been assumed to mean black hole.


cosmology, scattering

In the morning, Mustafa Amin (MIT) gave a short talk about consequences of potential shapes in inflation for the period between the end of inflation and the beginning of the normal physics of standard-model particles. In the afternoon, Joel Primack (Santa Cruz) gave the astrophysics seminar on the extragalactic background light. For the latter, some of the constraints come from the absorption of high-energy photons from blazars. I asked about scattering—when I was a boy I was told that whenever there is absorption there is always scattering, was that wrong?—but he implied in his answer that the scattering mechanisms are very different from the absorbing mechanisms. I am confused! If you do the classical calculation of a plane wave intercepted by finite absorbers, you have to get scattering too, just as a consequence of physical optics. Maybe my reader will un-confuse me.

Between talks, Jagannath and I discussed the implications of recent literature developments for our paper on fitting streams. I think we have a plan to re-scope it.


clustering likelihood; coronagraph mixture

Sarah Bridle (UCL) and Marshall called me by Skype (tm) this morning (my time) to discuss Phil's and my crazy ideas about inferring the mass distribution and the cosmological parameters from weak-lensing data. Instead, we got sidetracked onto clustering measurements, with me making the pitch that we might be sacrificing signal-to-noise on the baryon acoustic feature by fitting models to point-estimated two-point correlation functions (pair counts in separation bins). That is, no large-scale structure project ever writes down a likelihood function for the galaxy positions.
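For concreteness, here is a minimal sketch of what "pair counts in separation bins" means as a point estimate. It uses the simple DD/RR "natural" estimator as an assumption for illustration; real surveys use more careful estimators, and the function names here are made up. Note that nothing in it is a likelihood for the galaxy positions, which is the complaint.

```python
import math
from itertools import combinations


def pair_counts(points, edges):
    """Count unordered pairs of points whose separation falls in each bin."""
    counts = [0] * (len(edges) - 1)
    for p, q in combinations(points, 2):
        r = math.dist(p, q)
        for k in range(len(counts)):
            if edges[k] <= r < edges[k + 1]:
                counts[k] += 1
                break
    return counts


def xi_natural(data, randoms, edges):
    """Natural estimator: normalized DD / RR - 1 in each separation bin."""
    dd = pair_counts(data, edges)
    rr = pair_counts(randoms, edges)
    nd, nr = len(data), len(randoms)
    # Ratio of the total number of random pairs to the total number of data pairs.
    norm = (nr * (nr - 1)) / (nd * (nd - 1))
    return [norm * d / r - 1.0 if r > 0 else float("nan") for d, r in zip(dd, rr)]
```

Any model fit then happens downstream of these binned numbers, so whatever information the binning and the estimator throw away is gone before the fit starts.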

At lunch, Fergus showed me his new and improved models for Oppenheimer's coronagraph data. He factorizes the data matrix into principal components, and then finds that he can't fit the data well near the companion (exoplanet) locations with the dominant principal components. Furthermore, he finds that if he fits the data as a mixture of (empirical, data-driven) speckle model for the residual star light plus (heuristic, theory-driven) point-source model for the companion light, he does a great job of separating the components and thereby photometering the companion. Beautiful stuff and music to my ears, methodologically.
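A toy sketch of the mixture idea, not Fergus's actual code: assume the speckle components have already been obtained somehow (for example from a principal components analysis), assume a known point-source template, and fit the data as a linear combination of all of them at once by least squares. The names `fit_components` and `solve` are hypothetical, and the linear solver is plain Gaussian elimination to keep the example dependency-free.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_components(components, data):
    """Least-squares amplitudes for data ~ sum_k a_k * components[k].

    Builds and solves the normal equations (C C^T) a = C d, where each
    component is a flattened image with the same length as the data.
    """
    A = [[sum(u * v for u, v in zip(ci, cj)) for cj in components] for ci in components]
    b = [sum(u * d for u, d in zip(ci, data)) for ci in components]
    return solve(A, b)
```

Usage would be something like `fit_components(speckle_components + [psf_template], pixels)`, with the last amplitude read off as the companion flux. The speckle components and the template must be linearly independent for the solve to work.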


HMF second draft

I got Tsalmantza and my HMF method paper up to the second-draft level. I was supposed to finish this back in October! It still needs a bit of work—Tsalmantza and I speak with very different voices—but it is extremely close. One thing the paper lacks is a clear demonstration that optimizing chi-squared is far preferable to optimizing an unscaled mean square error. I guess we always considered that obvious, but I now wonder if everyone else does.
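The simplest demonstration I can think of is not HMF at all, just estimating one constant from data with very different error bars. This sketch assumes independent Gaussian noise with known per-point sigmas; the function names are made up for illustration.

```python
def chi2_estimate(y, sigma):
    """Constant m minimizing chi-squared = sum(((y_i - m) / sigma_i)**2)."""
    w = [1.0 / s**2 for s in sigma]
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def mse_estimate(y):
    """Constant m minimizing the unscaled sum((y_i - m)**2): the plain mean."""
    return sum(y) / len(y)


def estimator_variances(sigma):
    """Variances of the two estimators under independent Gaussian noise.

    Returns (variance of chi-squared estimate, variance of unscaled estimate).
    """
    n = len(sigma)
    var_chi2 = 1.0 / sum(1.0 / s**2 for s in sigma)
    var_mse = sum(s**2 for s in sigma) / n**2
    return var_chi2, var_mse
```

With two good points (sigma 0.1) and one terrible point (sigma 5), the unscaled estimate is dominated by the terrible point, and its variance is hundreds of times larger than that of the chi-squared estimate.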


beating confusion

I spent the day at the Spitzer Science Center as part of my Oversight-Committee duties. This is not research. However, on the airplane to the meeting, I started image simulations to explore an idea that came up while I was at UCLA: Can you use multi-epoch imaging to beat the naive confusion limit if your sources are moving fast? I am sure the answer is yes.


geodesic motion

Daniel Mortlock (Imperial) dropped in for the day and we spent some quality time talking about data. We don't have any project to work on, but we should! Mortlock holds the current high-redshift quasar redshift record. At lunch time, Gabe Perez-Giz (NYU) talked about computing geodesic orbits around Kerr black holes. He is a big believer in making use of symmetries, and there are additional computational symmetries if you look at closed orbits—orbits where the azimuthal, radial, and polar-angle frequencies are rationally related. If you can build a dense orbit library out of these orbits (and it appears you can), then you might be able to compute a whole bunch of stuff fast. Now show me the money!


random forests; supernova discovery

In the astro seminar today, Joey Richards (Berkeley), about whom I have been blogging all week, spoke about the methodologies and successes of the Bloom-led Center for Time-Domain Informatics team in automatically classifying time-variable objects in various imaging surveys. He concentrated on random forest (a combination of many decision trees, each of which is made by randomizing in various ways the training data) in part because it is extremely effective in these kinds of problems. He even claimed that it beat well-tuned support-vector machine implementations. I will have to sanity-check that with Schölkopf in Tübingen! Richards did a great job, in particular in explaining and responding to the principal disagreement we have, which is this: I argue that a generative model that can generate the raw pixels will always beat any black-box classifier, no matter how clever; Richards argues that you will never have a generative model that is accurate for all (or even most) real systems. It is this tension that made me invite him (along with Bloom and Long) to NYU this week.
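To make the parenthetical about random forest concrete, here is a deliberately tiny sketch: each "tree" is a single-split stump, trained on a bootstrap resample of the training data with a random subset of the features, and the ensemble predicts by majority vote. This is an illustration of the randomize-and-combine idea only, not the CTDI implementation and not a full random forest (real ones grow deep trees); all names are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y, feature_ids):
    """Best single-threshold split (fewest training errors) over the given features."""
    best = None
    for j in feature_ids:
        vals = sorted({row[j] for row in X})
        for lo, hi in zip(vals, vals[1:]):
            t = 0.5 * (lo + hi)  # split halfway between neighboring values
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, l_lab, r_lab)
    if best is None:  # no split possible: always predict the majority label
        maj = Counter(y).most_common(1)[0][0]
        return (0, float("inf"), maj, maj)
    return best[1:]


def fit_forest(X, y, n_trees=25, seed=0):
    """Train stumps on bootstrap resamples, each with a random feature subset."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        feats = rng.sample(range(d), max(1, int(d**0.5)))
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote over the stumps."""
    votes = Counter(l if row[j] <= t else r for j, t, l, r in forest)
    return votes.most_common(1)[0][0]
```

Nothing in here knows anything about how the features were generated, which is exactly the black-box property Richards and I disagree about.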

After the seminar, Or Graur (Tel Aviv, AMNH) showed us how he can find supernovae lurking in SDSS spectra and pitched (very successfully) a test with SDSS-III BOSS data. We will get on that next week.



While Richards worked on faintifying bright Mira-variable light curves and censoring them in the manner of an insane robot (or astronomical imaging pipeline), Long and I worked on Python-ifying and numpy-ifying some slow marginalized likelihood code. The issue is that our likelihood model has two nuisance parameters per data point (the true uncertainty variance and the true censoring flux value, considered poorly known and different for every datum) which we want to marginalize out inside the repeatedly called likelihood function. Lots of ways to do this slowly; few ways to do this fast. The goal is to have the skeleton of a paper by tomorrow afternoon!
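A slow-but-simple sketch of the kind of marginalization involved, under assumptions that are mine and not necessarily those of our code: Gaussian noise, a non-detection meaning that the noisy flux fell below an unknown threshold, and the marginalization done by averaging over samples drawn from a prior on (threshold, sigma). For brevity the detections use their reported sigma as-is and ignore the detection probability. All names are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_logpdf(x, mu, sigma):
    """Log of the Gaussian density N(x | mu, sigma**2)."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def log_like_nondetection(model_flux, prior_samples):
    """Log P(not detected | model flux), marginalized over (threshold, sigma).

    prior_samples is a list of (threshold, sigma) pairs drawn from the prior;
    the marginal is approximated by the average over those samples.
    """
    total = sum(norm_cdf((b - model_flux) / s) for b, s in prior_samples)
    return math.log(max(total / len(prior_samples), 1e-300))  # floor avoids log(0)


def log_likelihood(model_fluxes, observed, prior_samples):
    """Sum over epochs; observed[i] is (flux, sigma) or None for a non-detection."""
    ll = 0.0
    for m, obs in zip(model_fluxes, observed):
        if obs is None:
            ll += log_like_nondetection(m, prior_samples)
        else:
            flux, sigma = obs
            ll += norm_logpdf(flux, m, sigma)
    return ll
```

The inner sum over prior samples runs once per non-detected epoch per likelihood call, which is where the time goes and why vectorizing it matters.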


blowing text

Over a long and hearty breakfast, I turned our equations for fitting lightcurves in the presence of carelessly censored data into a LaTeX document complete with lots of discussion. I handed it to Richards and Long.


measuring the undetectable

I have published one paper and have two more in the works under the subject measuring the undetectable. My evil plan is panning out; today Joey Richards, James Long (Berkeley), Dan Foreman-Mackey, and I all agreed that we should work together on a very nice and extremely practical project. Here it is:

Imagine you have a catalog of point-source fluxes, measured for a bunch of sources in a badly documented multi-epoch survey. Now imagine that you are looking at one source, which is variable, and it has been detected at some epochs and not at others. Imagine further that at the non-detect epochs, you are not provided with any information about the flux upper limits; all you know is that the source wasn't seen. How do you fit the stellar (lightcurve) properties of this source, using both the detections and the non-detections?

Without priors, this is impossible, I think, because you don't know whether the non-detections are non-detections because the data at those epochs was extremely bad or whether they are non-detections because the star at those epochs was very faint. But we figured out what you could do if you could hierarchically infer a prior over the detection threshold and over the noise properties of the detected sources. We started to write documents and code in the afternoon.


classification etc

Josh Bloom, James Long, and Joey Richards (Berkeley), all of Palomar Transient Factory and Center for Time-Domain Informatics fame, are visiting Camp Hogg this week. My not-so-secret plan is to come up with a jointly executed project, so that I can get a piece of this talented team for free. We discussed options, ranging over classification, prediction, decision making, and so on. We have a substantial difference of opinion on various matters—Camp Hogg is all about models that generate the raw data, while the CTDI likes data-driven models in feature space—but we are working towards very similar goals in our work on time-domain astrophysics. One shared goal is to measure time-domain properties of sources too faint to identify at any individual epoch; that might be our project.


intensity invariance

Only research today was a talk by Gwenael Giacinti (APC, Paris) about anisotropies in the local distribution of cosmic rays. It ended with a short argument (in the audience) about lensing of the intensity field, with various people confused and not confused about the meaning of the point that lensing conserves phase-space density of photons. The point is trivial in one sense, but in an equally important sense, it is not at all trivial, as evidenced by confusion among astrophysicists about what it can and can't mean. The argument inspired me to write something about it.
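For the record, the trivial version of the point, written for photons in standard textbook notation (the symbols are not from the talk), assuming a static lens so that there is no frequency shift, and a source small enough that the magnification is constant across it:

```latex
% Photon occupation number (phase-space density) in terms of specific intensity:
n = \frac{c^2}{2 h \nu^3}\, I_\nu
% Liouville: n is constant along a ray. With \nu unchanged by a static lens,
I_\nu^{\rm obs} = I_\nu^{\rm src}
% The flux changes only because the image subtends a different solid angle:
F_\nu^{\rm obs} = \int I_\nu \, d\Omega_{\rm obs} = \mu \, F_\nu^{\rm src},
\qquad \mu \equiv \frac{d\Omega_{\rm obs}}{d\Omega_{\rm src}}
```

So lensing moves intensity around on the sky and changes solid angles, but it never changes the intensity along any individual ray.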


black-hole binaries, image formation

After getting back on the red-eye, I attended a nice seminar by Mike Eracleous (Penn State) about black-hole binaries. He is a bit more optimistic than I am about finding and confirming them, and about their usefulness. But we had near-simultaneous papers on the subject this year. In the afternoon, Fergus and I continued our debates about how to model speckles in the 1640 coronagraph data. The issues are all about how to make the model so flexible that it can fit all the speckles, but so constrained that it does not at the same time want to fit out the companion (think exoplanet). Our most insane idea is to fit for the electric field in the camera! But that, I think, is hard.



I was at UCLA all day today to give a seminar. I had great conversations with many of the faculty, postdocs, and students. One idea (of many, many) that stands out is from Mark Morris, who is thinking about whether time-resolved imaging of the Galactic Center (where the stars move fast) could be used to beat the confusion limit. I am sure it can; this is a great project to think about! Another is from Brad Hansen, who points out that if we do find eclipsing planets with GALEX (as Schiminovich and I hope to), they ought to be around massive (rather than normal) white dwarfs. This is because the massive ones might be formed by mergers that could also bring in post-main-sequence planets.


goose egg

No research at all today. But I am not ashamed: You should have seen my office at office hours! (Big exam on Thursday.)