On the Memorial Day long weekend here, I outlined Lang's and my next NSF proposal.
I wrote English prose and LaTeX equations like the wind on our brand-new exoplanet project. In particular, I spent some time working out the difference between hierarchical Bayesian approaches to distribution estimation and deconvolution or forward-modeling approaches. There is a lot of overlap, but the key difference is that, if all you care about is the distribution itself, in the Bayesian approaches you integrate out all of the individual measurements (which, in this context, should be thought of as fits to more "raw" data). That is, if you are deconvolving (forward modeling), you are trying to explain the individual-object fit results; if you are hierarchical Bayesian, you are trying to obliterate them. As I wrote text, Myers wrote code, and Lang (who came into town) worked on image modeling in preparation for the NIPS deadline.
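To make the forward-modeling side of that distinction concrete, here is a minimal sketch under assumptions that are mine, not the project's: the population is Gaussian with mean `mu` and intrinsic width `tau`, and each object's fit result `x[n]` carries a known Gaussian uncertainty `s[n]`. Deconvolution then means explaining the fit results themselves: each is modeled as a draw from the population broadened by its own noise, so the per-object likelihood is a Gaussian of variance `tau**2 + s[n]**2`. The function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def deconvolution_neg_log_likelihood(params, x, s):
    """-log p({x_n} | mu, tau): each fit result x_n is a draw from the population
    N(mu, tau^2) broadened by its own Gaussian noise s_n."""
    mu, log_tau = params
    var = np.exp(2.0 * log_tau) + s ** 2  # intrinsic variance plus noise variance
    return 0.5 * np.sum((x - mu) ** 2 / var + np.log(2.0 * np.pi * var))


def fit_population(x, s):
    """Maximum-likelihood (mu, tau) of the deconvolved population."""
    start = [np.mean(x), np.log(np.std(x))]
    result = minimize(deconvolution_neg_log_likelihood, start, args=(x, s),
                      method="Nelder-Mead")
    mu, log_tau = result.x
    return mu, np.exp(log_tau)


# Usage on simulated fit results: true population mu = 1.0, tau = 0.5
rng = np.random.default_rng(42)
true_values = rng.normal(1.0, 0.5, size=2000)
s = rng.uniform(0.2, 1.0, size=2000)
x = true_values + rng.normal(0.0, s)
mu_hat, tau_hat = fit_population(x, s)
```

The naive spread of the fit results, `np.std(x)`, overestimates `tau` because it includes the measurement noise; removing that noise is exactly what the deconvolution does, while the individual fit results stay in the likelihood.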
Blanton, Guangtun Zhu (NYU), and I discussed the possibility of writing a paper on the archetypes system we set up for PRIMUS. The idea of the paper would be to split the archetype finding and optimization out of any PRIMUS data paper, because it has much wider applicability. The idea is to model a distribution of d-dimensional data by a set of delta functions in the d-dimensional space, with the set chosen to be the minimal set that adequately represents every data point. The nice thing is that you can choose whatever operation you want to decide what represents what, and it can handle any kind of crazy degeneracies, missing data, or marginalization over nuisance parameters (think calibration, or extinction). The hard thing is that the search for the minimal set of archetypes is hard (in the technical algorithmic sense: it is a set-cover problem, which is NP-hard), but Roweis cast the problem for us as a binary programming task, which is incredibly well handled by any number of open-source and commercial packages. For PRIMUS we used the IBM CPLEX code, which was astoundingly fast.
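A minimal sketch of that binary-programming formulation, using SciPy's open-source `milp` (SciPy 1.9 or later) in place of CPLEX. The "represents" operation here is an assumption for illustration, not necessarily the PRIMUS choice: candidate archetype j represents data point i if their noise-weighted chi-squared distance is below a threshold, and the candidates are just the data points themselves. Each binary variable says whether a candidate is kept; we minimize the number kept subject to every data point being represented at least once. Any other "represents" rule (with marginalization over nuisance parameters, say) only changes the 0/1 matrix.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp


def find_archetypes(data, sigma, chi2_max):
    """Indices of a minimal subset of data points (candidate archetypes) such
    that every data point lies within chi2_max of at least one chosen archetype."""
    # chi2[i, j]: squared, noise-weighted distance from point i to candidate j
    diff = (data[:, None, :] - data[None, :, :]) / sigma
    chi2 = np.sum(diff ** 2, axis=-1)
    represents = (chi2 < chi2_max).astype(float)  # A[i, j] = 1 if j represents i

    n = len(data)
    cost = np.ones(n)  # objective: number of archetypes kept
    cover = LinearConstraint(represents, lb=np.ones(n), ub=np.inf)  # A x >= 1
    result = milp(cost, constraints=[cover],
                  integrality=np.ones(n, dtype=int), bounds=Bounds(0, 1))
    return np.flatnonzero(result.x > 0.5)


# Usage: two tight, well-separated clumps should need exactly two archetypes
data = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0]])
idx = find_archetypes(data, sigma=0.1, chi2_max=4.0)
```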
I spent a big chunk of the day working on the Spitzer Oversight Committee, which is helping the Spitzer Science Center react to funding realities and the slow shut-down of an incredibly productive and successful, but finite, observatory mission. We spent some time in the meeting talking about Spitzer's capabilities for (and successes in) exoplanet science, and the possibility of encouraging that even more in the future. This post bends the rules, but oddly I truly find the work I do on this committee to be of great intellectual interest.
You can't marginalize over a parameter in the likelihood without a prior because the units are wrong! The likelihood is the probability of the data given the model, and therefore has units of inverse data; integrating it over a nuisance parameter would leave something with units of inverse data times that parameter, which is not a probability for the data. If you want to marginalize out some nuisance parameter, you have to multiply the likelihood by a prior probability distribution for that parameter (which has units of inverse parameter) and then integrate. So, as I like to point out, only Bayesians can marginalize.
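A small numerical check of the units argument, under a toy model that is purely an assumption: one datum y = theta + phi + noise, with Gaussian noise of width sigma and a nuisance offset phi (think calibration) with a Gaussian prior of width tau. The likelihood p(y | theta, phi) has units of 1/y; multiplying by the prior p(phi), with units of 1/phi, makes the integral over dphi come out in units of 1/y again. For this toy model the marginal likelihood is known in closed form (a Gaussian of variance sigma^2 + tau^2), so the numerical integral can be checked against it.

```python
import numpy as np


def gaussian(x, mean, sd):
    """Normalized Gaussian density; has units of 1/x."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (np.sqrt(2.0 * np.pi) * sd)


def marginal_likelihood(y, theta, sigma, tau, n_grid=20001):
    """p(y | theta) = integral of p(y | theta, phi) p(phi) dphi on a uniform grid.
    Without the prior p(phi) the integral would carry units of phi / y."""
    phi = np.linspace(-10.0 * tau, 10.0 * tau, n_grid)
    integrand = gaussian(y, theta + phi, sigma) * gaussian(phi, 0.0, tau)
    return np.sum(integrand) * (phi[1] - phi[0])  # Riemann sum; tails are ~0


numeric = marginal_likelihood(y=1.3, theta=1.0, sigma=0.5, tau=0.3)
analytic = gaussian(1.3, 1.0, np.sqrt(0.5 ** 2 + 0.3 ** 2))
```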
Adam Myers and I are using marginalization to get the likelihood for parameters of a distribution for a quantity (in this case exoplanet mass), marginalizing out every individual quantity (mass) estimate. You have to marginalize out the individual mass determinations because they are all terribly biased individually, and it is only the underlying or uncertainty-deconvolved distribution that you really care about. More soon, especially if we succeed!
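One standard way to do that marginalization numerically is importance sampling, sketched here under assumptions that are mine, not necessarily the project's: each object has posterior samples for its quantity drawn under a known "interim" prior (flat here), and the population is Gaussian with mean `mu` and width `tau`. Each object's marginalized likelihood is then, up to a constant, the mean over its samples of population density divided by interim-prior density; the individual values never survive into the answer. All names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def flat_interim_prior(q, low=-10.0, high=10.0):
    """Hypothetical interim prior used when each object was fit: uniform on [low, high]."""
    return np.full_like(q, 1.0 / (high - low))


def hierarchical_log_likelihood(params, samples, interim_prior):
    """Importance-sampled log p(all data | mu, tau) for a Gaussian population.

    samples[n, k] is the k-th posterior sample for object n, drawn under the
    interim prior; the individual values are integrated out, not kept.
    """
    mu, log_tau = params
    weights = norm.pdf(samples, loc=mu, scale=np.exp(log_tau)) / interim_prior(samples)
    return np.sum(np.log(np.mean(weights, axis=1)))


# Usage on simulated data: true population mu = 2.0, tau = 0.3
rng = np.random.default_rng(0)
n_objects, n_samples = 1000, 256
truth = rng.normal(2.0, 0.3, size=n_objects)
s = rng.uniform(0.3, 0.8, size=n_objects)
x = truth + rng.normal(0.0, s)
# Flat interim prior + Gaussian noise: each posterior is N(x[n], s[n]^2)
# (the truncation at +/-10 is negligible here)
samples = x[:, None] + s[:, None] * rng.standard_normal((n_objects, n_samples))

result = minimize(
    lambda p: -hierarchical_log_likelihood(p, samples, flat_interim_prior),
    x0=[np.mean(x), np.log(np.std(x))],
    method="Nelder-Mead",
)
mu_hat, tau_hat = result.x[0], np.exp(result.x[1])
```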
Adam Myers (UIUC) showed up today for a week of work; we spent (too little) time today planning what it is we are going to accomplish by Friday. At the same time, several of my students are writing papers and I am getting way behind on comments and feedback. In the afternoon there was a nice talk by Suvi Gezari (JHU) about shock-breakout SNe and tidal disruption events discovered in GALEX repeat-visit data. I had seen these results before, but she shocked us (or me at least) by noting that there are essentially no plans for future surveys in the ultraviolet. I knew this, but it really is crazy to place all of our (community's) bets on the infrared. That's so 2006!
I spent the day typesetting linear algebra for the paper with Tsalmantza and making sanity plots of the GALEX data with Schiminovich. Our photometry looks good; time to send it to some of our users for testing.
Maryam Modjaz (Berkeley) gave a great seminar about what you can learn about GRB and SN progenitors from studies of host environments. She showed one of the cleanest results I have ever seen about this: if you look at SNe type Ic-BL (broad-line), the ones associated with prompt GRBs are very much lower in local metallicity (oxygen-indicated) than those not associated with GRBs. The non-GRB SNe type Ic-BL are higher in metallicity and don't show evidence for afterglows at late times. This all shows that there are no orphan afterglows (all the low-metallicity SNe type Ic-BL have prompt GRB emission) and therefore that the GRBs cannot be highly beamed. Strange! Of course these SN-associated GRBs tend to be lower in total luminosity than typical GRBs, and this might be relevant. How big a deal is it if we find that GRBs are not beamed?
Schiminovich and I have been talking a good talk for many years now, but today we actually started to draft a written description of what we are doing with the GALEX and SDSS data with the intention of writing a paper. We also thought of some instructive figures to make and some baby science results to include in the data release paper.
In the morning I spoke with Tsalmantza about our spectral modeling project, which we are using to find gravitational lenses, black-hole binaries, and quasar redshifts. Well, we aren't, but we hope to be. We have a smoothing or softening parameter—a continuous model-complexity parameter—and Tsalmantza proposed a simple cross-validation test to set it. In the afternoon I tried to start writing words about the project.
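Tsalmantza's test isn't spelled out here, so this is only a generic sketch of k-fold cross-validation for one continuous smoothing parameter. The model is a hypothetical stand-in, not our spectral model: a 1-D "spectrum" smoothed by penalized least squares with a second-difference penalty of strength `lam`. Each fold zeroes the weights of its held-out pixels, predicts them from the rest, and the `lam` with the smallest total held-out squared error wins.

```python
import numpy as np


def smooth(y, weights, lam):
    """Penalized least squares: minimize sum(weights * (y - z)**2) + lam * sum(diff2(z)**2)."""
    n = len(y)
    d2 = np.diff(np.eye(n), n=2, axis=0)  # (n - 2, n) second-difference operator
    lhs = np.diag(weights) + lam * (d2.T @ d2)
    return np.linalg.solve(lhs, weights * y)


def cross_validate(y, lams, n_folds=5, seed=0):
    """k-fold cross-validation: return the lam with the smallest held-out squared
    error, plus the score for every lam. Requires every lam > 0."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds  # random, balanced fold labels
    scores = np.zeros(len(lams))
    for i, lam in enumerate(lams):
        for k in range(n_folds):
            held_out = folds == k
            z = smooth(y, (~held_out).astype(float), lam)  # fit without held-out pixels
            scores[i] += np.sum((y[held_out] - z[held_out]) ** 2)
    return lams[np.argmin(scores)], scores


# Usage on a noisy toy "spectrum"
rng = np.random.default_rng(1)
wave = np.linspace(0.0, 4.0 * np.pi, 200)
flux = np.sin(wave) + rng.normal(0.0, 0.3, size=wave.size)
lams = np.logspace(-3, 7, 21)
best_lam, scores = cross_validate(flux, lams)
```

Too small a `lam` chases the noise and too large a `lam` flattens the signal; both predict held-out pixels badly, so the cross-validation score should bottom out in between.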
I had the great pleasure of talking at Margaret Smith's OpenSci NY meeting today. I spoke about the benefits my group gets from its extreme openness: not just my blogging but our web-exposed code, paper, and proposal repository, our open-source software projects, and our citizen-science stuff. After me was Heather Joseph from SPARC, who talked about a huge range of interesting things, including the scale (8 billion dollars) of the science and medicine publishing industry, copyright agreement amendments for publishers, the Harvard faculty vote on open access, page charges and journal subscription costs, and many other crucial issues for our future. She also made the point that in the open-science future, papers have to be machine-readable as well as human-readable. It was a great morning.
Schiminovich came to NYU and we got closer to having a complete set of measured GALEX photometry for all SDSS point sources. We obtained these photometric measurements by performing aperture photometry; today we worked out a heuristic strategy for inflating the error bar when a measurement is likely being affected by another nearby source. The idea is that we want—for our science goals—to report a flux and uncertainty for every known source; we do not want to just cut out or mask or flag sources that might be bad or contaminated. Our method is very empirical and has no precise justification, but unless we think of something more clever (or go whole-Hogg and perform the photometry by simultaneously fitting all GALEX pixels with a self-consistent model of the sky), something heuristic like this will go into our planned data release.
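The heuristic itself isn't recorded here, and the following is not it. As a purely hypothetical illustration of the "inflate, don't mask" idea, one could predict how much flux from known neighbors lands in the target's aperture, assuming a Gaussian PSF of known width, and add that predicted contamination (times a tunable `fudge` factor) in quadrature to the uncertainty. The function names, the PSF model, and `fudge` are all assumptions.

```python
import numpy as np


def aperture_fraction(dx, dy, radius, psf_sigma, step=0.05):
    """Fraction of a unit-flux Gaussian PSF centred at (dx, dy) that falls inside a
    circular aperture of the given radius centred at the origin, by summing the
    PSF over a fine grid (all lengths in the same units, e.g. pixels)."""
    grid = np.arange(-radius, radius + step, step)
    xx, yy = np.meshgrid(grid, grid)
    inside = xx ** 2 + yy ** 2 <= radius ** 2
    psf = np.exp(-0.5 * ((xx - dx) ** 2 + (yy - dy) ** 2) / psf_sigma ** 2)
    psf /= 2.0 * np.pi * psf_sigma ** 2
    return np.sum(psf[inside]) * step ** 2


def inflated_uncertainty(flux_err, neighbor_fluxes, neighbor_offsets,
                         radius, psf_sigma, fudge=1.0):
    """Hypothetical heuristic: add the predicted neighbor contamination in the
    aperture, times a fudge factor, in quadrature to the uncertainty.
    Every source keeps a flux and an (inflated) error; nothing is masked."""
    contamination = sum(
        flux * aperture_fraction(dx, dy, radius, psf_sigma)
        for flux, (dx, dy) in zip(neighbor_fluxes, neighbor_offsets)
    )
    return np.hypot(flux_err, fudge * contamination)


# Usage: an isolated source keeps its error; a bright close neighbor inflates it
err_isolated = inflated_uncertainty(2.0, [], [], radius=3.0, psf_sigma=1.0)
err_crowded = inflated_uncertainty(2.0, [100.0], [(2.0, 0.0)], radius=3.0, psf_sigma=1.0)
```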
In a day shortened by our (fun, very fun) graduation party, I worked on Myers's NASA proposal, which is effectively due tomorrow! We also had a meeting of all the students and postdocs who are working on SDSS-III BOSS spectroscopy to get everyone up to speed on downloading, plotting, and analyzing the spectra. In this meeting I learned that Ben Weaver (NYU) has duplicated a lot of our legacy IDL code in Python.
In the morning, Schiminovich and I tried (in a very short time) to write code to combine our multi-aperture aperture photometry of GALEX data by fitting it with simple models. As we do every time, we discussed the merits of various levels of image modeling, recognizing always that it would not be difficult or slow to do the Right Thing but then deciding to continue with our (justifiable) approximation nonetheless. This fast morning was followed by grant-proposal writing and document tweaking.
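A sketch of one way such a fit could look, under assumptions of my own rather than the model we actually coded: the source is a point source with a Gaussian PSF of known width sigma, so the flux it puts inside radius r is F (1 - exp(-r^2 / 2 sigma^2)), plus a residual sky level b per unit area contributing b pi r^2. That model is linear in (F, b), so combining the apertures is weighted linear least squares. It also illustrates the kind of approximation mentioned above: nested apertures share pixels, so their errors are really correlated, and this sketch ignores that.

```python
import numpy as np


def fit_curve_of_growth(radii, fluxes, flux_errs, psf_sigma):
    """Fit nested-aperture fluxes with total flux F and residual sky b per unit area:
    flux(r) = F * (1 - exp(-r^2 / (2 psf_sigma^2))) + b * pi * r^2.
    Linear in (F, b); treats the aperture errors as independent (they are not)."""
    enclosed = 1.0 - np.exp(-0.5 * (radii / psf_sigma) ** 2)  # PSF curve of growth
    area = np.pi * radii ** 2
    design = np.column_stack([enclosed, area]) / flux_errs[:, None]  # whitened
    target = fluxes / flux_errs
    (total_flux, sky), *_ = np.linalg.lstsq(design, target, rcond=None)
    cov = np.linalg.inv(design.T @ design)  # covariance of (F, b)
    return total_flux, sky, cov


# Usage on noiseless synthetic aperture fluxes: F = 100, b = 0.5
radii = np.array([1.0, 2.0, 3.0, 4.0, 6.0])
psf_sigma = 1.5
model = 100.0 * (1.0 - np.exp(-0.5 * (radii / psf_sigma) ** 2)) + 0.5 * np.pi * radii ** 2
total_flux, sky, cov = fit_curve_of_growth(radii, model, np.ones_like(radii), psf_sigma)
```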
A two-day figure-making bonanza by Bovy completed the fitting-a-line document, of which my reader is all too aware. We are mulling it over before we send it off to the arXiv, where it will live a quiet life, undisturbed by readers. If you want to see a pre-release copy, send me an email; I will gladly share it in exchange for a promise to send me comments. Of course it isn't light reading!
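For readers who want the starting point without the long read, here is a sketch of the simplest case any such discussion begins from, the standard weighted-least-squares result: a straight line y = b + m x, with known Gaussian uncertainties on y only and none on x. Stacking the data into Y, the design matrix A with rows (1, x_i), and the diagonal covariance C of the y uncertainties, the best-fit parameters are (A^T C^-1 A)^-1 A^T C^-1 Y, with covariance (A^T C^-1 A)^-1. The function name is hypothetical.

```python
import numpy as np


def fit_line(x, y, sigma_y):
    """Weighted least-squares line y = b + m x with known Gaussian uncertainties
    sigma_y on y only. Returns (b, m) and their 2x2 covariance matrix."""
    A = np.vander(x, 2, increasing=True)  # columns: 1, x
    Cinv = np.diag(1.0 / sigma_y ** 2)  # inverse of the diagonal data covariance
    cov = np.linalg.inv(A.T @ Cinv @ A)
    b, m = cov @ (A.T @ Cinv @ y)
    return (b, m), cov


# Usage: points exactly on y = 2 + 3x, with unequal error bars
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 + 3.0 * x
sigma_y = np.array([0.5, 1.0, 0.5, 2.0])
(b, m), cov = fit_line(x, y, sigma_y)
```

Everything harder (uncertainties in x, outliers, intrinsic scatter) is a departure from these assumptions.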
I brainstormed with Adam Myers (UIUC) about quasar–galaxy cross-correlation projects for future work together, possibly funded by NASA. As my loyal reader knows, cross-correlations are very powerful; much more powerful than auto-correlations. There is a proposal deadline next week; will we make it?
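For concreteness, a minimal sketch of measuring a quasar-galaxy cross-correlation with a Davis-Peebles-style estimator, w = (D_q D_g / D_q R) (N_R / N_g) - 1, where D_q D_g counts quasar-galaxy pairs and D_q R counts quasar-random pairs in separation bins. Assumptions, none of them a specific plan from the text: flat 2-D geometry in a unit square, a random catalog uniformly filling the galaxies' footprint, brute-force pair counting (fine only for small samples), and toy data in which some galaxies are planted near quasars.

```python
import numpy as np


def pair_counts(a, b, bins):
    """Histogram of separations between every point in a and every point in b."""
    sep = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))
    return np.histogram(sep.ravel(), bins=bins)[0].astype(float)


def cross_correlation(quasars, galaxies, randoms, bins):
    """Davis-Peebles-style estimator w = (DqDg / DqR) * (N_R / N_g) - 1, where the
    randoms trace the galaxy footprint (flat 2-D geometry assumed)."""
    dd = pair_counts(quasars, galaxies, bins)
    dr = pair_counts(quasars, randoms, bins)
    return dd / dr * (len(randoms) / len(galaxies)) - 1.0


# Usage on toy data: 5 galaxies planted near each quasar, plus a uniform background
rng = np.random.default_rng(1)
quasars = rng.uniform(0.0, 1.0, size=(200, 2))
neighbors = np.repeat(quasars, 5, axis=0) + rng.normal(0.0, 0.005, size=(1000, 2))
galaxies = np.vstack([neighbors, rng.uniform(0.0, 1.0, size=(2000, 2))])
randoms = rng.uniform(0.0, 1.0, size=(10000, 2))
bins = np.array([0.0, 0.02, 0.05, 0.1, 0.2])
w = cross_correlation(quasars, galaxies, randoms, bins)
```

The planted neighbors show up as a strong positive signal in the smallest separation bin, while the widest bin should sit close to zero.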
In the morning I finished yet another draft of our insanely long-winded fitting-a-line-to-data document. It is now in Lang's hands. Bovy is working on figures. In the afternoon, the graduate students in the Center for Cosmology and Particle Physics gave shotgun seminars about their research. These were great, and I hope we do this every semester from here on.