MCMC: A users' manual

In my first week of being (mostly) off the grid, my only research was writing in the MCMC manual that Foreman-Mackey and I are putting together. It is not as close to being done as I remembered it from way back (I think we started it in 2012).


bad chemical tags, earning travel

It was a day packed with non-research, except for group meeting, which was great, as always. Sanderson updated us on extensions of her action-space clustering methods for measuring the Milky Way gravitational potential. One of the ideas that emerged in the discussion relates to "extended distribution functions": In principle any stellar "tags" or labels or parameters that correlate with substructure identification could help in finding or constraining potential parameters. Even very noisy chemical-abundance labels might in principle help a lot. That's worth checking. Also, chemical labels that have serious systematics are not necessarily worse than labels that are "good" in an absolute sense. That is one of my great hopes: That we don't need good models of stars to do things very similar to chemical tagging.

Also in group meeting we gave Hattori marching orders for (a) getting a final grade on his independent study and then (b) getting to fly to Hawaii for the exoplanet stats meeting. For the former, he needs to complete a rudimentary search of all stars in Kepler. For the latter he needs to do hypothesis tests on the outcome of the search and write it up!


calibration and search in Kepler; replacing humans

In group meeting we went through the figures for Dun Wang's paper on pixel-level self-calibration of Kepler, figure by figure. We gave him lots of to-do items to tweak the figures. This is all in preparation not just for his paper but also for AAS 225, which is in early January. At the end we asked "Does this set of figures tell the whole story?" and Fadely said "no, they don't show general improvement across a wide range of stars". So we set Wang on finding a set of stars to serve as a statistical "testbed" for the method.

Also in group meeting, Foreman-Mackey showed some of the systems returned by his search for long-period planets in the light-curves of bright Kepler G dwarf stars. So far it looks like he doesn't have anything that Kepler doesn't have, but he also thinks that some of the Kepler objects might be false positives.

We spent some time looking at Huppenkothen's attempt to reproduce human classification of states of black hole GRS 1915 using machine learning. We scoped a project in which she finds the features that do the best job of reproducing the human classification and then does an unsupervised clustering using those features. That should do something similar to the humans but possibly much better. She has good evidence that the unsupervised clustering will lead to changes in classes and classifications.


job season

Research ground to a halt today in the face of job applications. There's always tomorrow.


improving photometry hierarchically

Fadely handed me a draft manuscript which I expected to be about star–galaxy classification but ended up being about all the photometric measurements ever! He proposes that we can improve the photometry of an individual object in some band using all the observations we have about all other objects (and the object itself but in different bands). This would all be very model-dependent, but he proposes that we build a flexible model of the underlying distribution with hierarchical Bayes. We spent time today discussing what the underlying assumptions of such a project would be. He already has some impressive results that suggest that hierarchical inference is worth some huge amount of observing time: That is, the signal-to-noise ratios or precisions of individual object measurements rise when the photometry is improved by hierarchical modeling. Awesome!

Fadely and I also discussed with Vakili and Foreman-Mackey Vakili's project of inferring the spatially varying point-spread function in large survey data sets. He wants to do the inference by shifting the model and not shifting (interpolating or smoothing) the data. That's noble; we wrote the equations on the board. It looks a tiny bit daunting, but there are many precedents in the machine-learning literature (things like convolutional dictionary methods).


git trouble

I spent a bit of Sunday working on paper one from The Cannon. Unfortunately, most of my time was spent resolving (and getting mad about) git fails, where my co-authors (who shall remain nameless) were editing wrong versions and then patching them in. Argh.


redshift probability, lensed supernova, interstellar metallicity

At group meeting, Alex Malz showed some first results on using redshift probability distributions in a (say) luminosity function analysis. He showed that he gets different results if he takes the mean of the redshift pdf or the mode or does something better than either of those. I asked him to write that up so we can see if we all agree what "better" is. Fadely handed me a draft of his work to date on the star–galaxy separation stuff he has been working on.
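To make the mean-versus-mode point concrete, here is a minimal sketch of how the two summaries can disagree for a redshift pdf on a grid. It is not Malz's code: the function names are hypothetical, the grid is assumed to be uniform, and "something better" is assumed (for illustration only) to mean averaging a quantity over the whole pdf rather than evaluating it at a single summary redshift.

```python
def pdf_summaries(z_grid, pdf):
    """Normalized weights, mean, and mode of a pdf tabulated on a uniform grid."""
    norm = sum(pdf)
    weights = [p / norm for p in pdf]
    mean = sum(w * z for w, z in zip(weights, z_grid))
    # Mode: the grid point where the tabulated pdf is largest.
    mode = z_grid[max(range(len(pdf)), key=pdf.__getitem__)]
    return weights, mean, mode


def expectation(func, z_grid, weights):
    """Average func(z) over the pdf instead of evaluating func at one summary z."""
    return sum(w * func(z) for w, z in zip(weights, z_grid))
```

For a bimodal pdf the mean can land at a redshift with almost no probability, and for any nonlinear function of redshift the pdf-weighted average generally differs from the function evaluated at the mean or the mode.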

After group meeting, at journal club, Or Graur (NYU) showed work he has been doing on a multiply imaged supernova. It looks very exciting, and it is multiply imaged by a galaxy in a lensing cluster, so there are actually something like seven or eight possibly detectable images of the supernova, some possibly with substantial time delays. Very cool.

The astro seminar was by Christy Tremonti (Wisconsin), who told us about gas and metallicity in galaxy disks. She has some possible evidence—in the form of gradients in effective yield and gas-to-star ratio—that the gas is being moved around the galaxy by galactic fountains. She is one of the first users of the SDSS-IV MaNGA integral-field data, so that was particularly fun to see.


dotastronomy, day 3

The day started with the reporting back of results from the Hack Day. There were many extremely impressive hacks. The stand-outs for me—and this is a very incomplete list—were the following: Angus and Foreman-Mackey delivered two Kepler sonification hacks. In the first, they put Kepler lightcurves into an online sequencer so the user can build rhythms out of noises made by the stars. In the second, they reconstructed a pop song (Rick Astley, of course) using lightcurves as fundamental basis vectors. This just absolutely rocked. Along similar lines, Sascha Ishikawa (Adler) made a rockin' club hit out of Kepler lightcurves. Iva Momcheva did a very nice analysis of NASA ADS to learn things about who drops out of astronomy post-PhD, and when. This was a serious piece of stats and visualization work, executed in one day. Jonathan Fay (Microsoft) implemented the Astrometry.net API to get amateur photographs incorporated into World-Wide Telescope. Jonathan Sick (Queens) and Adam Becker (freelance) built tools to make context-rich bibliographic and citation information that could be used to build better network analysis of the literature. Stuart Lynn (Adler) augmented HTML with tags that are appropriate for fine-grained markup for scientific publications, with the goal of making responsive design for the scientific literature while preserving scholarly information and referencing. Hanno Rein (Toronto) built a realistic three-dimensional mobile-platform fly-through system for the HST 3D survey.

After these hacks, there were some great talks. The highlights for me included Laura Whyte (Adler) talking about their incredibly rich and deep programs for getting girls and under-represented groups to come in and engage deeply at Adler. Amazingly well thought out and executed. Stefano Meschiari (UT) blew us away with a discussion of astronomy games, including especially "minimum viable games" like Super Planet Crash, which is just very addictive. He has many new projects and funding to boot. He had thoughtful things to say about how games interact with educational goals.

Unconference proceeded in the afternoon, but I spent time recuperating, and discussing data analysis with Kelle Cruz (CUNY) and Foreman-Mackey.


dotastronomy, day 2

Today was the Hack Day at dotastronomy. An incredible number of pitches started the day. I pitched using webcam images (behind a fisheye lens) from the Liverpool Telescope in the Canary Islands to measure the sidereal day, the aberration of starlight, and maybe even things like precession and nutation of the equinoxes.

I spent much of the day discussing and commenting on other hacks: I helped a tiny bit with Angus and Foreman-Mackey's hack to sonify Kepler data, I listened to Jonathan Fay (Microsoft) as he complained about the (undocumented, confusing) Astrometry.net API, and I discussed testing environments for science with Arfon Smith (GitHub) and Foreman-Mackey and others.

Very late in the evening, I decided to get serious on the webcam stuff. There is an image every minute from the camera and yet I found that I was able to measure sidereal time differences to better than a second, in any pair of images. Therefore, I think I have abundant precision and signal-to-noise to make this hack work. I went to bed having satisfied myself that I can determine the sidereal period, which is equivalent to figuring out from one day's rotation how many days there are in the year. Although I measured the sidereal day to nearly one part in 100,000, my result is equivalent to a within-a-single-day estimate for the length of the year of 366.6 days. If I use more than one image pair, or span more than one day in time, I will do far, far better on this!
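The arithmetic that turns a sidereal-day measurement into a length of the year can be sketched as follows. This is a rough illustration, not the hack itself: the function names are made up, and the mean solar day is assumed to be exactly 86400 seconds. The idea is that over one year the Earth makes one more rotation relative to the stars than relative to the Sun, so the number of sidereal days in a year is T_sol / (T_sol - T_sid).

```python
SOLAR_DAY_S = 86400.0  # mean solar day in seconds (assumed exact for this sketch)


def sidereal_days_per_year(sidereal_day_s):
    """Number of sidereal rotations in one year implied by the sidereal day length."""
    return SOLAR_DAY_S / (SOLAR_DAY_S - sidereal_day_s)


def year_error(sidereal_day_s, frac_err):
    """First-order error on that count from a fractional error on the sidereal day."""
    n = sidereal_days_per_year(sidereal_day_s)
    # dn/dT_sid = T_sol / (T_sol - T_sid)**2 = n / (T_sol - T_sid)
    return n * sidereal_day_s * frac_err / (SOLAR_DAY_S - sidereal_day_s)


n_year = sidereal_days_per_year(86164.0905)  # standard mean sidereal day in seconds
err = year_error(86164.0905, 1.0e-5)         # one part in 100,000
```

Because the sidereal and solar days differ by only about four minutes, a small fractional error on the sidereal day is amplified by a factor of several hundred; under these assumptions a part in 100,000 maps to an error of order one day on the year.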


dotastronomy, day 1

Today was the first day of dotastronomy, hosted by the Adler Planetarium. There were talks by Arfon Smith (GitHub), Erin Braswell (Open Science), Dustin Lang (Astrometry), and Alberto Pepe (Authorea). Smith made a lot of parallels between the open collaborations built around GitHub and scientific collaborations. I think this analogy might be deep. In the afternoon, unconference was characteristically diverse and interesting. Highlights for me included a session on making scientific articles readable on many platforms, and the attendant implications for libraries, journals, and the future of publishing. Also, there was a session on the putative future Open Source Sky Survey, for which Lang and I own a domain, and for which Astrometry.net is a fundamental technology (and possibly Enhance!). There were many good ideas for defining the mission and building the communities for this project.

At coffee break, Foreman-Mackey and I looked at McFee's project of using Kepler light-curves as basis vectors for synthesizing arbitrary music recordings. Late at night, tomorrow's hack day started early, with informal pitches and exploratory work at the bar. More on all this tomorrow!



Kepler, uncertainties, inference

Bernhard Schölkopf showed up for the day today. He spent the morning working with Foreman-Mackey on search and the afternoon working with Wang on self-calibration of Kepler. In the latter conversation, we hypothesized that we might improve Wang's results if we augment the list of pixels he uses as predictors (features) with a set of smooth Fourier modes. This permits the model to capture long-term variability without amplifying feature noise.
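The idea of augmenting the pixel predictors with smooth Fourier modes can be sketched with ordinary least squares. This is a toy version on synthetic data, not Wang's pipeline: the function names, the number of modes, and the synthetic signal are all assumptions, and there is no regularization or train-and-test split here.

```python
import numpy as np


def fourier_features(t, n_modes):
    """Constant column plus sin/cos pairs at the n_modes lowest frequencies."""
    phase = 2.0 * np.pi * (t - t.min()) / (t.max() - t.min())
    cols = [np.ones_like(t)]
    for k in range(1, n_modes + 1):
        cols.append(np.sin(k * phase))
        cols.append(np.cos(k * phase))
    return np.column_stack(cols)


def fit_predict(target, predictors, t, n_modes=3):
    """Least-squares prediction of target from [pixel predictors, Fourier modes]."""
    A = np.hstack([predictors, fourier_features(t, n_modes)])
    coeffs, *_ = np.linalg.lstsq(A, target, rcond=None)
    return A @ coeffs


# Synthetic demo: shared systematics plus a slow trend in the target pixel.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 90.0, 500)
systematics = rng.normal(size=(500, 5))  # stand-in for other stars' pixels
slow = 0.5 * np.sin(2.0 * np.pi * t / 90.0)
target = systematics @ rng.normal(size=5) + slow + 0.01 * rng.normal(size=500)

resid_with = target - fit_predict(target, systematics, t, n_modes=3)
resid_without = target - fit_predict(target, systematics, t, n_modes=0)
```

In this toy setup the slow trend is invisible to the pixel predictors alone, so the residuals without Fourier modes retain it, while the augmented model absorbs it with a handful of smooth, noise-free columns.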

Before that, in group meeting, Sanderson told us about the problem of assigning errors or uncertainties to our best-fit potential in her method for Milky Way gravitational potential determination. She disagrees with the referee and I agree with the referee. Ultimately, we concluded that the referee (and I) are talking about precision, but the accuracy of the method is lower than the precision. I think we understand why; it has something to do with the amount of structure (or number of structures) in phase space.

At lunch, we met up with David Blei (Columbia) and shot the shih about data science and statistics. Blei is a probabilistic inference master; we convinced him that he should apply his super-powers towards astrophysics. He offered one of our party a postdoc, on the spot!


black holes and weird pixel effects

In group meeting, Huppenkothen argued out the projects we discussed on Monday related to machine classification of black-hole accretion states of GRS 1915. We talked about all three levels of project: Using supervised methods to transfer classifications for a couple of years of data onto all the other years of data, using unsupervised methods to find out how many classes there plausibly are for the state, and building some kind of generative model either for state transitions or for literally the time-domain photon data. We discussed feature selection for the first and second projects.

Also at group meeting, Foreman-Mackey showed a new Earth-like exoplanet he has discovered in the Kepler data! Time to open our new Twitter (tm) account. He also showed that a lot of his false positives relate to un-discovered discontinuities in the Kepler photometry of stars. After lunch, we spent time investigating these and building (hacky, heuristic) code to find them.

Here are the symptoms of these events (which are sometimes called "sudden pixel sensitivity drops"): They are very fast (within one half-hour data point) changes to the brightness of the star. Although the star brightness drops, in detail if you look at the pixel level, some pixels brighten and some get fainter at the same time. These events appear to have signs and amplitudes that are consistent with a sudden change in telescope pointing. However, they are not shared by all stars on the focal plane, or even on the CCD. Insane! It is like just a few stars jump all at once, and nothing else does. I am confused.

Anyway, we now have code to find these and (in our usual style) split the data at their locations.
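A hacky, heuristic jump finder in that spirit might look like the sketch below. It is not our actual code: the function names, window size, and threshold are assumptions. It flags a sample where the point-to-point step is large compared to a robust estimate of the scatter and where the median level also changes across that point, so that single-point outliers are not flagged.

```python
from statistics import median


def find_jumps(flux, window=5, n_sigma=6.0):
    """Indices i where flux steps between samples i-1 and i and stays stepped."""
    diffs = [b - a for a, b in zip(flux[:-1], flux[1:])]
    # Robust scatter of point-to-point differences (MAD scaled to a Gaussian sigma).
    med = median(diffs)
    sigma = 1.4826 * median([abs(d - med) for d in diffs])
    jumps = []
    for i in range(window, len(flux) - window + 1):
        step = flux[i] - flux[i - 1]
        level_change = median(flux[i:i + window]) - median(flux[i - window:i])
        if abs(step) > n_sigma * sigma and abs(level_change) > n_sigma * sigma:
            jumps.append(i)
    return jumps


def split_at(flux, jumps):
    """Split the light curve into contiguous segments at the jump indices."""
    edges = [0] + list(jumps) + [len(flux)]
    return [flux[a:b] for a, b in zip(edges[:-1], edges[1:])]
```

Each segment returned by `split_at` can then be treated as an independent light curve, at the cost of losing information about the relative levels on either side of the jump.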


finding tiny planets, Kepler jumps, papers

Foreman-Mackey and I had a long and wide-ranging conversation about exoplanet search. He has search completeness in regimes of exoplanet period heretofore unexplored, and more completeness at small radii than anyone previously, as far as we can tell. However, his search still isn't as sensitive as we would like. We are doing lots of hacky and heuristic things ("ad hockery" as Jaynes and Rix both like to say), so there is definitely room for improvement. All that said, we might find a bunch of smaller and longer-period planets than anyone before. I am so stoked.

In related news, we looked at a Kepler star that suffered an instantaneous change in brightness. We went back to the pixel level, and found the discontinuity exists in every pixel in the star's image, but the discontinuity has different amplitudes including different signs in the different pixels. It is supposed to be some kind of CCD defect, but it is as if the star jumped in position (but its fellow stars on the CCD didn't). It is just so odd. When you do photometry at this level of precision, so many crazy things appear.

Late in the day I caught up on reading and commenting on papers people have sent me, including a nice paper by Megan Shabram (PSU) et al on a hierarchical Bayesian model for exoplanet eccentricities, a draft paper (with my name on it) by Jessi Cisewski (CMU) et al on likelihood-free inference for the initial mass function of stars, a draft paper by Dustin Lang (CMU) and myself on principled probabilistic source detection, a paper by John Jenkins (Ames) et al on optimal photometry from a jittery spacecraft (think Kepler), and the draft paper by Melissa Ness et al on The Cannon.


classification of black-hole states

I had a discussion today with Huppenkothen about the qualitatively different states of accreting black-hole GRS 1915. The behavior of the star has been classified into some dozen-ish different states, based on time behavior and spectral properties. We figured out at least three interesting approaches. The first is to do old-school (meaning, normal) machine learning, based on a training set of classified time periods, and try to classify all the unclassified periods. It would be interesting to find out what features are most informative, and whether or not there are any classes that the machine has trouble with; these would be candidates for deletion.

The second approach is to do old-school (meaning, normal) clustering, and see if the clusters correspond to the known states, or whether it splits some and merges others. This would generate candidates for deletion or addition. It also might give us some ideas about whether the states are really discrete or whether there is a continuum of intermediate states.

The third approach is to try to build a generative model of the path the system takes through states, using a Markov model or something similar. This might reveal patterns of state switches. It could even work at a lower level and try to predict the detailed time-domain behavior, which is incredibly rich and odd. This is a great set of projects, and easy (at least to get started).
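As a starting point for the third approach, a first-order Markov model can be estimated just by counting transitions in a sequence of classified time periods. The sketch below assumes the classifications arrive as a time-ordered list of labels; the function name and the labels in the example are hypothetical, not the real GRS 1915 classes.

```python
from collections import Counter, defaultdict


def transition_probabilities(states):
    """Maximum-likelihood first-order transition probabilities from a label sequence."""
    counts = defaultdict(Counter)
    for a, b in zip(states[:-1], states[1:]):
        counts[a][b] += 1
    probs = {}
    for a, row in counts.items():
        total = sum(row.values())
        probs[a] = {b: n / total for b, n in row.items()}
    return probs


# Hypothetical labels standing in for the human-assigned states.
sequence = ["A", "A", "B", "A", "B", "B"]
probs = transition_probabilities(sequence)
```

Transitions that never occur in the sequence get no entry at all here; with a dozen-ish states and limited data, some kind of smoothing or prior on the counts would probably be needed before reading much into the pattern of state switches.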