New Directions in Modern Cosmology, day 4

Highlights for me today were talks by Lewis (Sussex) about lensing in the CMB and Joyce (Paris) about the scale-invariance of gravity and numerical simulations. Lewis showed some beautiful results with WMAP data, in which they can show that the statistics of the CMB are distorted by the slightly anisotropic beam of the satellite combined with the slightly anisotropic scan pattern. This was in the context of lensing, because he finds it using the same techniques that will be used to find lensing-induced distortions to the Planck map.

Joyce showed some beautiful results on gravity, including exceedingly precise simulations of one-dimensional (yes, one-dimensional) gravitational collapse, and an analytic description of the behavior of n-body simulations. He is trying to understand whether the departures from scale-invariance seen in n-body simulations (the correlation function is not a power law in the dark sector, while the observed correlation function of galaxies is very close to a power law) could be somehow related to the difference between gravity and the simulations thereof. He doesn't have conclusive results, but his models of the models are impressive.


New Directions in Modern Cosmology, day 3

Another good day in Leiden. One highlight was work by Kunz (Geneva), who showed simple models for generating CDM-like inhomogeneities in the CMB without inflation. He gets very close, but in the details at large scale, causality requires some inflation-like activity. His arguments were very general. More was said about this by Magueijo (Imperial), who uses a variable speed of light to do inflation's work. In general these causality arguments come from the problem that it is very hard to set up the initial conditions.

Lachièze-Rey (Paris) described nice detectors for the baryon acoustic feature that don't require construction of the full correlation function or power spectrum. This led nicely into an afternoon discussion of homogeneity, where my results, Sylos Labini's (Rome), and Kazin's (NYU) were batted around. I didn't fare as well as I would have liked in part because we have not closed all the loopholes remaining for some kinds of technical inhomogeneities. I certainly think that it is well established that the Universe has a mean density, but Sylos Labini and others have a good point that if you can do your data analysis without assuming the mean density then you should, if for no reason other than that it is rarely well measured (so it adds uncertainty to your results). Don't get the wrong idea from my measured tone here: The universe is not a fractal on large scales!


New Directions in Modern Cosmology, day 2

Today was my first day at the meeting in Leiden, named above. It is an eclectic group, because the idea was to bring together outsiders and insiders in the whole cosmology thing and have people hash it out: A great idea. The talks were good, but I hope I will be forgiven for saying that (for my geeky self) I learned the most from the talk given by Hao Liu (Beijing), who is re-analyzing the WMAP data starting at the time stream. He is finding out many very interesting things about the calibration, attitude model, and map-making. It appears that the quadrupole amplitude is very soft. It is all about attitude and configuration modeling, just like Hipparcos, which is not surprising. Plus there are sidelobe issues. It appears that no one, not Liu or the WMAP team, is building the full likelihood including all calibration parameters and then marginalizing out everything but the map (or the cosmological parameters). That is, there still is not an honest Bayesian analysis (if that is even possible!).


stellar oscillations

Hou, Goodman, and I had our weekly meeting today to discuss exoplanet fitting and inference.  Goodman suggested a possible statistical model for stellar oscillations that would permit us to treat them as a kind of structured Gaussian noise.  Yet another project becomes a kind of Gaussian process!  The idea is to drive a damped oscillator with a broad-band source.  I assigned that to myself as homework for my flight to Amsterdam.
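
One way to sketch the idea, under assumptions that may well differ from Goodman's actual proposal: an underdamped oscillator with natural frequency omega0 and damping rate gamma, driven by white noise, has a known stationary covariance function, and that function can be used directly as a Gaussian-process kernel. The function names and the parameterization below are hypothetical, and only the underdamped case is handled.

```python
import math


def sho_kernel(tau, variance, omega0, gamma):
    """Stationary covariance at lag tau of a white-noise-driven oscillator.

    Assumes x'' + gamma x' + omega0^2 x = white noise, underdamped.
    `variance` is the covariance at zero lag.
    """
    if gamma <= 0.0 or gamma >= 2.0 * omega0:
        raise ValueError("sketch covers only 0 < gamma < 2 * omega0")
    omega1 = math.sqrt(omega0 ** 2 - 0.25 * gamma ** 2)  # damped frequency
    t = abs(tau)
    envelope = variance * math.exp(-0.5 * gamma * t)
    return envelope * (math.cos(omega1 * t)
                       + 0.5 * gamma / omega1 * math.sin(omega1 * t))


def covariance_matrix(times, variance, omega0, gamma, jitter=0.0):
    """Covariance matrix for observations at `times`.

    `jitter` is an extra white-noise variance added on the diagonal
    (an assumption standing in for measurement noise).
    """
    return [[sho_kernel(ti - tj, variance, omega0, gamma)
             + (jitter if i == j else 0.0)
             for j, tj in enumerate(times)]
            for i, ti in enumerate(times)]
```

The resulting matrix would go into an ordinary Gaussian likelihood for the residuals, so the oscillations get treated as correlated noise rather than as signal.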



In between seminars, Schiminovich and I played around a bit with our transit search.



Price-Whelan and I pair-coded some image stacking stuff, which we are going to use to criticize the (plan for the) PanSTARRS data pipeline.


automated time-domain discovery

Lang and I spent the day writing an automated system to detect rate changes in photon streams. We applied the code, which is pretty principled, to some GALEX data (extracted by Schiminovich) and we re-discovered a bunch of known things. We also discovered a bunch of other things; it remains to be seen if they are really discoveries.
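
For concreteness, here is a minimal sketch of one simple way to look for a rate change; it is not the code described above. It assumes the photons form a Poisson process on a known exposure window, and it compares a constant-rate model to a model with a single step in the rate. All names are hypothetical.

```python
import math


def max_loglike(n_events, duration):
    """Poisson-process log-likelihood at its best-fit constant rate."""
    if n_events == 0:
        return 0.0  # best-fit rate is zero, so the likelihood is 1
    return n_events * math.log(n_events / duration) - n_events


def best_rate_change(times, t_start, t_stop):
    """Return (change_time, log-likelihood gain) for the best single step.

    Assumes every arrival time lies inside [t_start, t_stop].
    """
    times = sorted(times)
    n = len(times)
    null = max_loglike(n, t_stop - t_start)
    best_time, best_gain = None, 0.0
    for k in range(1, n):
        # Candidate change time: midpoint between consecutive photons,
        # so k photons arrive before it (assuming no tied arrival times).
        c = 0.5 * (times[k - 1] + times[k])
        if not t_start < c < t_stop:
            continue
        gain = (max_loglike(k, c - t_start)
                + max_loglike(n - k, t_stop - c) - null)
        if gain > best_gain:
            best_time, best_gain = c, gain
    return best_time, best_gain
```

The two-rate model can never fit worse than the one-rate model, so the gain needs a threshold, calibrated for example on simulated constant-rate streams, before anything is called a detection.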


timely distraction

I went up to Columbia today to work on the IGM and radiation fields with Schiminovich and then he distracted me with all sorts of crazy stuff he is finding in the GALEX time-stream. GALEX time-tags every photon (and it has one hell of a lot of photons), so it is the best time-domain project in astronomy ever. Almost nothing has been done in that time-stream to date, with some extremely notable exceptions. We talked, wrote code, and planned our exploitation.


beta distribution, hierarchical

In response to a (good) referee comment, I worked on re-fitting the exoplanet eccentricity distribution—in our hierarchical Bayesian model—with a beta distribution. This is a very useful family of distributions with two shape parameters and a lot of freedom. On the weekend and this afternoon I also started to write down the quasar absorption model that we might be able to use to make the clustering measurement the output of another hierarchical model. If we can do that successfully, it will be a real triumph.
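
A minimal sketch of what the beta-distribution piece could look like, under assumptions not stated above: each planet comes with posterior samples of its eccentricity, drawn under a flat interim prior on eccentricity, and the hierarchical likelihood for the shape parameters a and b is approximated by averaging the beta density over those samples. The function names are hypothetical.

```python
import math


def beta_logpdf(e, a, b):
    """Log density of the beta distribution with shape parameters a, b."""
    if not 0.0 < e < 1.0:
        raise ValueError("eccentricity must lie strictly between 0 and 1")
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return ((a - 1.0) * math.log(e)
            + (b - 1.0) * math.log(1.0 - e) - log_norm)


def hierarchical_loglike(samples_per_planet, a, b):
    """Approximate log-likelihood of (a, b) given per-planet samples.

    Assumes the samples were drawn under a flat prior on eccentricity,
    so the importance weight of each sample is just the beta density.
    """
    total = 0.0
    for samples in samples_per_planet:
        mean_density = sum(math.exp(beta_logpdf(e, a, b))
                           for e in samples) / len(samples)
        total += math.log(mean_density)
    return total
```

With a = b = 1 the beta distribution is flat, so the function returns zero for any set of samples; other values of a and b are judged relative to that.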


IGM model

At the end of the day, Foreman-Mackey, Bovy, and I discussed how we might model the Lyman-alpha absorption in quasar spectra, such that the continuum (which is unobservable) becomes a fit parameter usefully constrained by the data. The long-term goal is to do Lyman-alpha clustering. The key thing is that this would be hierarchical: The parameters we care about would be parameters of the state and clustering of the IGM.


ionizing photons

Schiminovich and I planned our project to measure the intergalactic ionizing flux in the neighborhood of the Milky Way.


all code all day

Lang and I pair-coded all day. We can now optimize the likelihood of a new SDSS Catalog against the imaging pixels in a field, permitting the model both continuous changes (parameter updates) and qualitative changes (model type changes, like star to galaxy and so on). We find that we can enormously improve the goodness of fit and reduce greatly the number of parameters, so it is win–win. It was a great day! Now we must write, write, write.


practice of science

In the morning, Lang and I pair-coded. In the afternoon, Jiang and I discussed the finer points of English grammar! This sounds like a joke, but of course writing is the most important tool in the scientist's toolkit, bar none.



Among other discussion topics, Bovy and I debated the usefulness of the fact that if you have N pieces of data about an object, each of which says something about, say, the question of whether that object is a quasar, you can multiply together the N independently calculated likelihoods for quasar and for star. This sounds great, but it only makes sense when the N pieces of data tell you independent things. That is, it is only true if the joint probability of all the data equals the product of the probabilities of each data item separately. This is almost never the case!
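
A toy illustration of the failure, with made-up probabilities: two binary measurements that are exact copies of one another. Multiplying the per-measurement likelihoods counts the same evidence twice. The tables and function names are hypothetical.

```python
def marginal(joint, axis):
    """Marginal distribution of one measurement from a joint table."""
    out = {}
    for outcome, p in joint.items():
        out[outcome[axis]] = out.get(outcome[axis], 0.0) + p
    return out


def joint_ratio(data, joint_quasar, joint_star):
    """Correct likelihood ratio, using the joint probability of the data."""
    return joint_quasar[data] / joint_star[data]


def naive_ratio(data, joint_quasar, joint_star):
    """Product of per-measurement ratios; valid only under independence."""
    ratio = 1.0
    for axis, value in enumerate(data):
        ratio *= (marginal(joint_quasar, axis)[value]
                  / marginal(joint_star, axis)[value])
    return ratio


# Made-up joint probabilities; the second measurement always equals the first.
QUASAR = {(0, 0): 0.2, (1, 1): 0.8}
STAR = {(0, 0): 0.6, (1, 1): 0.4}
```

For the observation (1, 1), the joint ratio is 0.8 / 0.4 = 2, while the naive product is 2 times 2 = 4: the redundant second measurement makes the object look more quasar-like than the data justify.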

In related news, we figured out why Zolotov and I are having trouble computing the Bayes factor for our project: Bayes factors are hard to compute! And they are also very slippery, because a small change to an uninformative prior—one that makes no change whatsoever to the posterior probability distribution—can have a huge impact on the Bayes factor. Once again, I am reminded that model selection is where I part company from the Bayesians. If you don't have proper, informed priors, you can't compute marginalized relative probabilities of qualitatively different models. It is cross-validation and data-prediction only that is reliable in most real situations.
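
The slipperiness shows up even in a toy problem, not the Zolotov project: n Gaussian measurements with known noise sigma, comparing "the mean is exactly zero" against "the mean is free, with a flat prior of some width centered on zero". The setup and names below are assumptions chosen so that the Bayes factor has a closed form.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def log_bayes_factor(xbar, sigma, n, width):
    """log[ evidence(mean free) / evidence(mean = 0) ].

    xbar is the sample mean of n points with known noise sigma; the
    free-mean model has a flat prior on [-width / 2, +width / 2].
    """
    s = sigma / math.sqrt(n)  # uncertainty on the mean
    # Fraction of the Gaussian likelihood that falls inside the prior range.
    inside = (normal_cdf((0.5 * width - xbar) / s)
              - normal_cdf((-0.5 * width - xbar) / s))
    return (0.5 * math.log(2.0 * math.pi) + math.log(s) + math.log(inside)
            - math.log(width) + 0.5 * (xbar / s) ** 2)
```

Once the prior is much wider than the likelihood, the posterior for the mean stops caring about the width, but the log Bayes factor keeps dropping by log(10) for every factor of ten in width. The prior width alone can therefore swing the comparison either way.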


finding AGN

At group meeting today, Renbin Yan (NYU) talked about finding AGN by X-ray and optical criteria. He has an optical color criterion that can replace the NII to H-alpha part of the BPT diagram; it works surprisingly well. He also showed that some supposed classes of AGN that are X-ray bright or X-ray faint are just the undetected edges of a broad distribution.


GALEX, time domain

Schiminovich came down for most of the day, and we discussed next steps on our GALEX projects. We also discussed the time domain, where we are set up to do some crazy projects, but haven't started. With Astro2010 endorsing LSST, this could be a valuable thing for the community as well as scientifically interesting. In the late afternoon, Lang and I continued to pair-code the reoptimization of the SDSS Catalog.



With the chaos of the starting semester, and job season, I didn't get much research done beyond conversations with Lang and Zolotov and Bovy.


SDSS Catalog gradient descent

Lang and I worked on setting up the synthetic-image gradient-descent we are preparing for a next-generation SDSS imaging catalog. If we can get it started, we will be optimizing a multi-billion-parameter model. I have done something close before for SDSS ubercalibration, but in that project we transformed the problem into a linear fit by making some not-terrible approximations. No such luck here.


don't stack!

I finished (in my mind, anyway) specifying the full "don't stack your images" project for Price-Whelan today. At first we will work on synthetic data (despite the wrongness of that), and move to real data after we have clear results.


stacking images?

Price-Whelan and I spent part of the afternoon on our project to criticize (naive methods for) image stacking. We are using fake data but quickly realized that we shouldn't be. We should be using SDSS Stripe 82!


diagnosis of code

After a morning of discussion with new student Daniel Foreman-Mackey (NYU), I spent a chunk of the afternoon helping Jagannath diagnose his likelihood function. He is doing MCMC with a marginalized likelihood function; it is a pretty non-trivial piece of code. Something goes very wrong early, and we can't figure it out. It made me realize—once again—how important the idea of diagnosis is in science. And yet it is not part of any formal curriculum or training. This point is made in a childhood context by Seymour Papert in the wonderful book Mindstorms.



My only real research today was conversations with Zolotov and Wu about their current papers.