ephemeris, sampling

I fielded email today from various undergrads to whom I pitched (on Wednesday) a big project to build a precise probabilistic ephemeris for the Solar System. I am hoping to execute it with a team of undergrads handling all the important parts.

At group meeting (before lunch), Iain Murray (Toronto, Edinburgh), who is one of our Bayesian consultants, gave a talk on sampling for inference, reviewing a few simple methods and then some hard pitfalls. A bit of an argument broke out about how you deal with well-separated maxima, with Roweis and me more on the side of "you never know if you have found the true maximum so you have to figure out how to do science without knowing that" and Scoccimarro and Blanton more on the side of "you have to keep working until you are sure".


history of cosmology

Jim Peebles (Princeton) visited today. Roweis and I spent an hour or so advertising to him our data analysis projects. Late in the day he gave a talk on the history of cosmology, which made a number of remarkable points, like that Hoyle was (almost) the first person to identify the cosmic microwave background (despite the fact that he thought it was impossible). An argument broke out about the importance of Friedmann; Peebles takes the (tough) view that to be really important in cosmology you must have not just done theory but connected it directly to experiment, or not just done experiment but had the theorists realize that it was relevant. That's a high bar.



Zolotov and I worked on estimating eccentricities for stars in N-body simulations. This is not trivial—not even well defined—because there is no overall, unique potential (let alone a spherical one!).
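One simple (if crude) way to make the problem tractable, sketched below purely as an illustration (the function name and the point-mass approximation are my own, not what Zolotov and I actually implemented): sphericalize the potential by using the mass enclosed at the star's radius, and compute the Kepler eccentricity from the specific energy and angular momentum.

```python
import numpy as np

G = 1.0  # work in simulation units


def kepler_eccentricity(pos, vel, m_enclosed):
    """Crude eccentricity estimate: treat the enclosed mass as a point
    mass and use the Kepler relation e^2 = 1 + 2 E L^2 / (G M)^2,
    with E the specific energy and L the specific angular momentum."""
    r = np.linalg.norm(pos)
    energy = 0.5 * np.dot(vel, vel) - G * m_enclosed / r
    L = np.linalg.norm(np.cross(pos, vel))
    e2 = 1.0 + 2.0 * energy * L ** 2 / (G * m_enclosed) ** 2
    return np.sqrt(max(e2, 0.0))


# sanity check: a circular orbit (r = 1, v = sqrt(GM/r)) gives e = 0
e = kepler_eccentricity(np.array([1.0, 0.0, 0.0]),
                        np.array([0.0, 1.0, 0.0]), 1.0)
```

Of course this inherits all the ambiguity in the text above: a different (and equally defensible) sphericalization gives a different eccentricity for the same star.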


sampling and Gaia

I spoke with Bovy about the possibility of doing an ambitious analysis of what is possible with Gaia. I spoke with Iain Murray (Edinburgh), who is visiting, about sampling, where he is one of the world's experts. I asked him to tell me what we can do if likelihood calls are outrageously expensive (so converged samplings of the posterior PDF are impossible).


out sick

I was on vacation on Friday and out sick today. That's not research.


grant proposal and infrared emission

Today was devoted to an NSF proposal, which, according to the rules, is not research. The only research activity of the day was a talk by Nadia Zakamska (IAS) about infrared emission lines and what they have to say about the physics of galaxies. She has some beautiful results, especially that the PAH emission and the H2 emission in the interstellar medium are differently affected by dust extinction. This is odd, because they are both strongly correlated with star formation and one another. So they are strongly correlated but not co-spatial. Odd.


Comet Holmes

Lang and I worked on (that is, pair coded) making figures for our somewhat strange Comet Holmes project. We started by re-running everything from scratch (find images on web, download, source extract, calibrate) and made this, one of several images of the individual-image footprints on the sky. In this image, the brighter footprints are the more constraining footprints; axis units are degrees on the sky from some reference point.



My only research time today was spent fielding email about fractal universes, emails that were inspired by my arXiv submission yesterday. It turns out—contrary to what I wrote in that note—that there have been some serious attempts to compute observables in inhomogeneous models. I think my conclusions are still safe, but my language might have been a bit strong.


Roger Blandford

I spent all day today at the meeting in honor of Roger Blandford, in the organizing of which I played an embarrassingly microscopic role. The meeting was incredibly well attended, with many old friends in attendance from all over the world. There were talks across the whole electromagnetic spectrum and covering a large range of astrophysical processes, all of which I enjoyed. In particular, Maxim Lyutikov (Purdue) gave just about the perfect description of Blandford the advisor (Lyutikov and I are coeval). Chris Kochanek (OSU) described a brute-force generative modeling of microlensing light curves that made my Bayesian heart sing. I spoke about model selection in cosmology, in a general way; I put my remarks on the arXiv here.


modeling galaxies with galaxies

I spent a long time on the phone with Lang, who is building quantitative models of galaxy images using other galaxy images. We spoke about the free parameters, and the robustness of the fitting (given that there are some random superimposed stars and the like), and so on. He has some nice results already, and this has only been underway for a few days now. At the end of the day I gave a big public talk here in Buffalo.



I worked on two talks for Buffalo and one for Stanford today; I also gave one of the Buffalo talks.


averaging data

I spoke with Schiminovich about what we can achieve by taking first moments (means) of low signal-to-noise data. There is quite a bit you can do (I think), though we haven't worked out all the details. This all relates to our hope of constraining properties of quasars and the Universe with low signal-to-noise GALEX observations.
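The basic move can be sketched in a few lines (with invented numbers, not GALEX data): the inverse-variance-weighted mean of N low signal-to-noise measurements has an uncertainty smaller by a factor of roughly sqrt(N), so a stack can be a confident detection even when no individual measurement is.

```python
import numpy as np


def weighted_mean(values, sigmas):
    """Inverse-variance-weighted mean and its uncertainty."""
    w = 1.0 / np.asarray(sigmas) ** 2
    mean = np.sum(w * np.asarray(values)) / np.sum(w)
    sigma_mean = 1.0 / np.sqrt(np.sum(w))
    return mean, sigma_mean


# toy example: 100 measurements of a signal of true strength 1.0,
# each with S/N of about 0.5 (per-measurement sigma = 2.0)
rng = np.random.default_rng(42)
truth, sigma = 1.0, 2.0
data = truth + sigma * rng.standard_normal(100)
mean, err = weighted_mean(data, np.full(100, sigma))
# err is sigma / sqrt(100) = 0.2, so the stack is a ~5-sigma detection
```

With equal uncertainties this reduces to the ordinary mean; the weighting only matters (and matters a lot) when the data are heteroskedastic, which GALEX data certainly are.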


The WWW is a sky survey

One of the things I have been saying for a few years is that the astronomical images available on the Web, taken together, form some kind of very heterogeneous and odd sky survey. Could this be used to do science? Lang and I have an argument that it can: He Web-searched "comet holmes", took all images found that way, calibrated them with Astrometry.net (many didn't calibrate because, for example, they were images of cats or grandmothers), and fit a gravitational trajectory in the Solar System to the image locations. We started to write this up today.


done and done

Astrometry.net paper up on arXiv today; noise-information paper going up tomorrow. My research today was just finishing tweaks on the latter.


arXiv issues

I spent most of the day chatting, notably with Stumm, who is in NYC for a few days. Late in the day, Lang (attempted to) put the Astrometry.net paper up on the arXiv. This was a failure, and not for the usual reasons: None of the figures got flagged as too big!

The paper compiles without complaint with "pdflatex astrometry-dot-net" on any unix (mac or Linux) platform we have been able to try. And yet on arXiv, the figures can't be understood! (It can't determine size or bounding box or format or even see the file in some cases.)

The arXiv system has a number of issues, not the least of which is that it doesn't just run vanilla pdflatex, ever. It runs in some strange box in which various pdflatex options have been changed from their default values. This would be fine if the arXiv system exposed its pdflatex or latex configuration. But it doesn't. Better still, since all the processing is done by robots, a "sandbox" robot could be established for people to test uploads before submission, greatly reducing the time wasted on this, not to mention the stress; documents and tarballs could be tested as they are being written and not "by fire" right at posting time.

Indeed, the inscrutable arXiv robot will reject a submission on any number of grounds, many of which are mentioned in the arXiv help pages, but few of which are described in enough detail for a user to reliably avoid them. For example, the figure size constraints (which were not our problem today) are never stated explicitly in the help pages; the help pages (like this one) only say that figures "should" be made "small" because that is more "efficient"!

As my loyal reader knows, I love the arXiv; it has transformed astrophysics and all of the sciences. Now let's just make it easier to use! Note that anything that makes it easier to use also makes it easier to maintain and run. (Think of all the emails and blog posts that could be saved!)


done and ExxonMobil

Lang and I finished the primary Astrometry.net paper for submission! We will put it on the arXiv tomorrow. I am extremely excited to see it go out. Congratulations to Lang and the team.

At the end of the day the Physics Colloquium was by Halsey from ExxonMobil on carbon sequestration. It is amazing that this is on the table, because it is so expensive: it would cost as much to sequester the carbon as it currently costs to produce the oil; that is, it would double the price of oil. The talk reinforced the point that there are no real technological solutions; if we are going to reverse global warming, carbon emissions, and pollution, there have to be social, cultural, and political changes. Technology can only play a small role.


frustrating day

I just failed (though only just) to complete the noise-information paper (with Price-Whelan) for submission today. And then I failed to complete the Astrometry.net paper as well! But both papers are close, so by Friday, if all goes well.


first draft and last details

I finished the first draft of my contribution to the Blandford meeting at the end of the day today.

At the beginning of the day, Price-Whelan and I worked on last details for his paper on the information in astronomical images.


unconverged chains

I spent the day talking to anyone who would listen about unconverged MCMC (or equivalent) chains. The issue is a big one, and I have many thoughts, all unorganized. But basically, the point is that when likelihood calls take a long time (think weeks), then there is no way in hell we will ever have a converged and dense sampling of the posterior probability distribution for any parameter space, let alone a large one. At the IPMU meeting last week, most practitioners thought it was impossible to work without a converged chain, but I noted that we never have a converged chain in the larger space of all possible models; whenever we have a converged chain it is just in some extremely limited and constrained subspace (for example the 11-dimensional space of ΛCDM or the like; this is a tiny subspace of all the possible cosmological model spaces). The fact that we don't have a converged chain on all the possible models and all the possible parameters conceivable does not prevent us from doing science. This has connections to the multi-armed bandit problem. I have also been thinking about Rob Fergus's 80 million tiny images project, which treats the result of a huge set of Google (tm) searches as a sampling of the space of all natural images. Of course this is not a converged or dense sampling! But nonetheless, science (and engineering) can be done, very successfully.
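To make the bandit analogy concrete, here is a tiny epsilon-greedy bandit, purely my own toy illustration: the agent never densely samples any arm's reward distribution, and its estimates for the poor arms never converge, yet it reliably ends up exploiting a good arm.

```python
import numpy as np


def epsilon_greedy(true_means, n_pulls=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy multi-armed bandit: with probability epsilon
    explore a random arm; otherwise exploit the best arm found so far."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    counts = np.zeros(k)
    estimates = np.zeros(k)
    for _ in range(n_pulls):
        if rng.random() < epsilon:
            arm = int(rng.integers(k))
        else:
            arm = int(np.argmax(estimates))
        reward = rng.normal(true_means[arm], 1.0)  # noisy "likelihood call"
        counts[arm] += 1
        # running-mean update of this arm's reward estimate
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates


counts, estimates = epsilon_greedy([0.1, 0.5, 0.9])
```

The point of the toy: good decisions (where to spend the next expensive likelihood call) do not require a converged characterization of every alternative, only enough sampling to rank them.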



I spent the flight home working on my contribution for the Blandford fest. The SFoA meeting—well, really the side conversations and coffee talk—definitely helped me sharpen up some of the issues.


SFoA, day 4

Today was exoplanet day at the SFoA meeting, which meant I learned the most. Among other things, Winn (MIT) told us that he can measure the alignments of planetary orbits with stellar rotation and that planet transits more-or-less directly tell you the surface gravity on the planet. Turner (Princeton) spoke about strong biases in fitting planets to radial velocity curves, which emerge from the nonlinearity of the fitting. Loredo (Cornell) showed a system for optimizing or guiding future radial velocity measurements given measurements in the past, where he is optimizing for information gained about the orbit. As he points out, you can optimize for many things there; it is a multi-armed bandit kind of problem. One thing that surprised me is that he takes the convergence of his MCMC chains very seriously, which is good in general, but does not seem necessary to me in order to perform these experimental design activities. After all, your utility will always be approximate, so why spend millions of hours of CPU time to fill out in enormous detail predictions of future trajectories that will only be used to approximately calculate your utility? But he is certainly thinking about the problems the right way.
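Winn's surface-gravity point can be made concrete: combining the transit observables with the stellar radial-velocity semi-amplitude yields the planet's surface gravity with no knowledge of the stellar mass or radius (this is the relation popularized by Southworth and collaborators). A sketch, with approximate literature values for HD 209458 b quoted from memory, so treat the numbers as illustrative only:

```python
import math


def planet_surface_gravity(P, K, rp_over_a, e=0.0, i_deg=90.0):
    """Planet surface gravity (m/s^2) from directly observed quantities:
    orbital period P (s), stellar RV semi-amplitude K (m/s), the ratio
    R_p/a from the transit, eccentricity e, and inclination i.
    No stellar mass or radius required."""
    return (2.0 * math.pi / P) * math.sqrt(1.0 - e ** 2) * K \
        / (rp_over_a ** 2 * math.sin(math.radians(i_deg)))


# approximate values for HD 209458 b (illustrative only)
P = 3.52475 * 86400.0   # period in seconds
K = 84.7                # stellar RV semi-amplitude, m/s
rp_over_a = 0.0138      # (R_p/R_*) / (a/R_*)
g = planet_surface_gravity(P, K, rp_over_a, i_deg=86.7)
# comes out around 9 m/s^2
```

The nice property is that every input is a direct observable, so the inference is immune to the usual stellar-model uncertainties.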

Several mentioned that although orbit fitting is now being performed in quasi-optimal ways, the original data reduction (going from spectral pixels to radial velocity measurements with errors) is not. Turner opined that if this were done better, more planets would be discovered, because there are many detections close to the current limits. That's a big—but very important—problem.