testing works

OMG, testing works! (Duh.) I wrote proper tests for my celestial mechanics code, and sure enough I instantly located my bugs. Now I have working code, and tests to prove that it is working! Aaahh.
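
For concreteness, here is the shape of the tests I mean, written against a toy Kepler-equation solver (the function and test names are invented for this sketch, not my actual code): each test pins down a property the solver must have, so a bug fails loudly and locally.

```python
import numpy as np

def solve_kepler(M, e, tol=1e-12, maxiter=100):
    """Solve Kepler's equation M = E - e sin E for E by Newton's method."""
    M = np.atleast_1d(np.asarray(M, dtype=float))
    E = M.copy()  # starting guess; fine at moderate eccentricity
    for _ in range(maxiter):
        dE = (E - e * np.sin(E) - M) / (1.0 - e * np.cos(E))
        E -= dE
        if np.max(np.abs(dE)) < tol:
            break
    return E

def test_defining_equation():
    # the output must satisfy the equation it claims to solve
    M = np.linspace(0.0, 2.0 * np.pi, 32)
    E = solve_kepler(M, 0.3)
    assert np.allclose(E - 0.3 * np.sin(E), M, atol=1e-10)

def test_circular_limit():
    # for e = 0 the equation is trivial: E = M
    assert np.allclose(solve_kepler(0.5, 0.0), 0.5)

test_defining_equation()
test_circular_limit()
```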


computational science, microlensing, etc

In the morning I gave an informal talk at the Simons SCDA, about my work on The Cannon and about what I like and dislike about machine learning. I discussed the point (which inspires our ABC research, and which I also discussed with Kravtsov at Chicago) that quantitative natural science is now almost entirely computational, meaning that the theory is a simulation that makes artificial data, and that this changes both how we do inference and how we talk about what is real.

In the afternoon, I spoke with Megan Bedell (Chicago) about the echelle spectroscopy radial-velocity data she has; she has done some dimensionality reduction, and there are, I think, promising opportunities to improve the end-to-end radial-velocity precision. I also worked on my celestial mechanics code for Price-Whelan; it is not working and I don't know why! My only option is to write proper tests. Tomorrow!

While all this was going on, in the background, Dun Wang has been steadily finding cool stuff in the K2C9 data, including, for example, this (previously known from the ground) baby!


new discoveries in the K2 data!

At group meeting, Dun Wang showed new discoveries in the brand-new K2 Campaign 9 data. The K2 team released the data the moment it came off the s/c, in fact before they even ran their own data-reformatting code. This was no problem for us, because K2 god Geert Barentsen (Ames) has (clean-roomed? and) open-sourced some of the core code. Wang ran the image-prediction / image-differencing version of our CPM code to predict the data and identify sources with strange excursions. He found sources that looked like possible microlensing events (known and unknown) and published them to the K2C9 group. I asked him to re-run CPM with other parameters to make it less (and more) aggressive and thereby address the (probable) over-fitting. The next step will be to incorporate a microlensing model and fit the K2 systematics (the point of the CPM code) and the microlensing parameters simultaneously.
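
Very roughly, the CPM idea is a regularized linear prediction of each target pixel from the pixels of other stars; the residual away from that prediction is where real variability (like microlensing) shows up, and the regularization strength is the "aggressiveness" knob. A minimal sketch on fake data (this is the idea only, not Wang's actual pipeline; `lam` and all the numbers are made up):

```python
import numpy as np

def cpm_predict(target, predictors, lam=1.0):
    """Ridge regression of one pixel's light curve on other stars' pixels.

    target: (T,) fluxes; predictors: (T, P) fluxes of pixels from other
    stars; lam: regularization strength (larger = less aggressive fit).
    """
    A = predictors
    w = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ target)
    return A @ w

rng = np.random.default_rng(42)
T, P = 200, 8
trend = np.sin(np.linspace(0.0, 6.0, T))                   # shared systematic
predictors = trend[:, None] + 0.05 * rng.normal(size=(T, P))
target = 2.0 * trend + 0.05 * rng.normal(size=T)
residual = target - cpm_predict(target, predictors, lam=1.0)
```

Larger `lam` makes the prediction stiffer (less aggressive), so it is less able to eat real astrophysical signal; smaller `lam` fits, and over-fits, more.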

Later in the day I started actually writing the celestial mechanics code for my new eclipsing-binary team (Price-Whelan, Ness, and Foreman-Mackey). It is probably all wrong, but it is core technology, so it needs to get instrumented with tests.


celestial mechanics

As part of my project with Adrian Price-Whelan (and also Melissa Ness and Dan Foreman-Mackey), I spent my research time today figuring out time derivatives of Kepler's equations. This is so we can simultaneously fit the eclipsing-binary light curve and the radial velocities revealed in the double-line spectrum. This was actual pen-on-paper calculus! It's been a while, though taking these derivatives reminded me that I have taken them many times in the past.
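
For the record, the derivative at the heart of it: with mean anomaly M = n (t - t0), differentiating Kepler's equation M = E - e sin E gives dE/dt = n / (1 - e cos E). A quick numerical check of that expression (a sanity sketch, not the project code):

```python
import numpy as np

def ecc_anomaly(M, e, tol=1e-13):
    """Solve Kepler's equation M = E - e sin E by Newton iteration."""
    M = np.atleast_1d(np.asarray(M, dtype=float))
    E = M.copy()
    for _ in range(100):
        dE = (E - e * np.sin(E) - M) / (1.0 - e * np.cos(E))
        E -= dE
        if np.max(np.abs(dE)) < tol:
            break
    return E

n, e, t0 = 2.0 * np.pi / 10.0, 0.4, 0.0     # mean motion, eccentricity, epoch
t = np.linspace(0.1, 9.9, 50)
E = ecc_anomaly(n * (t - t0), e)
dEdt_analytic = n / (1.0 - e * np.cos(E))

# centered finite difference as an independent check
h = 1e-6
dEdt_numeric = (ecc_anomaly(n * (t + h - t0), e)
                - ecc_anomaly(n * (t - h - t0), e)) / (2.0 * h)
```

From dE/dt, the chain rule carries you to the time derivatives of position and radial velocity through the true anomaly; that is the pen-on-paper part.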

In the afternoon I had a great conversation with Duane Lee (Vanderbilt) about chemical tagging and nucleosynthesis. He is close to being able to fit our data in the Milky Way halo with a mixture of dwarf-galaxy stellar populations. That would be awesome! We talked about low-hanging fruit with our APOGEE chemical abundance data.


a new project: eclipsing binaries

I have a dream! If we could get enough long-period eclipsing binaries with multi-epoch spectroscopy, we could go a long way towards building a truly data-driven model of stellar spectra. It would be truly data-driven, because we would use the gravitational model of the eclipsing binary to get stellar masses and radii, and thus give good label (truth) inputs to a model like The Cannon for the stellar spectra. (Yes, if you have an eclipsing binary and spectroscopy for radial velocities, you get everything.) And then we could get densities, masses, and radii of stars for the interpretation of transit and radial-velocity results on exoplanets, without relying on stellar models. There are lots of other things to do too, like build population models for binary stars, and exploit the stellar models for Milky Way science. And so on.
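
To make the "you get everything" claim concrete in the simplest case: for a circular, edge-on, double-lined binary, the period and the two radial-velocity semi-amplitudes alone fix both masses, with no stellar models anywhere. A sketch (the input numbers are illustrative only, not measured values for any real system, and real orbits need eccentricity and inclination treated properly):

```python
import numpy as np

G = 6.674e-11        # m^3 kg^-1 s^-2
MSUN = 1.989e30      # kg
DAY = 86400.0        # s

def binary_masses(P_days, K1_kms, K2_kms):
    """Masses (solar units) of a circular, edge-on double-lined binary.

    Uses M_tot = P (K1 + K2)^3 / (2 pi G) and the ratio M1 / M2 = K2 / K1.
    """
    P = P_days * DAY
    K1, K2 = K1_kms * 1e3, K2_kms * 1e3
    Mtot = P * (K1 + K2) ** 3 / (2.0 * np.pi * G)
    return Mtot * K2 / (K1 + K2) / MSUN, Mtot * K1 / (K1 + K2) / MSUN

# illustrative inputs only
m1, m2 = binary_masses(P_days=171.0, K1_kms=33.0, K2_kms=33.0)
```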

Today, because of a meeting cancellation, both Adrian Price-Whelan and I got the full day off from responsibilities, so we decided to use it very irresponsibly. We searched the (very incomplete and under-studied) Kepler eclipsing binary list for binaries with long periods, deep eclipses, and APOGEE spectroscopy. It turns out there are lots! We started with the system KIC 9246715, which is a red-giant pair.

In the APOGEE spectrum, the pair of velocities (double line) is clearly visible, and it clearly changes from epoch to epoch. We found the velocities at each epoch first by auto-correlation and then by modeling the spectrum as a sum of two single stars. A project is born!
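
The "sum of two single stars" idea, on a toy spectrum (a synthetic Gaussian line profile and made-up velocities, not the APOGEE data): shift two templates by trial velocities and keep the pair that best reproduces the composite.

```python
import numpy as np

c_kms = 299792.458

def line(wave, center=16005.0, depth=0.5, width=0.5):
    """One absorption line: a Gaussian dip in a flat continuum."""
    return 1.0 - depth * np.exp(-0.5 * ((wave - center) / width) ** 2)

def shifted(wave, v_kms):
    """The template Doppler-shifted by radial velocity v."""
    return line(wave / (1.0 + v_kms / c_kms))

wave = np.linspace(16000.0, 16010.0, 500)   # toy wavelength grid, angstroms
v1_true, v2_true = 30.0, -25.0              # km/s, made-up epoch velocities
data = 0.6 * shifted(wave, v1_true) + 0.4 * shifted(wave, v2_true)

# brute-force grid search over the two velocities
vgrid = np.arange(-60.0, 61.0, 1.0)
best, best_chi2 = None, np.inf
for v1 in vgrid:
    for v2 in vgrid:
        model = 0.6 * shifted(wave, v1) + 0.4 * shifted(wave, v2)
        chi2 = np.sum((data - model) ** 2)
        if chi2 < best_chi2:
            best, best_chi2 = (v1, v2), chi2
```

The unequal flux weights are what break the which-star-is-which degeneracy in this toy; in real data the two stars' line depths do that job.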


extreme precision radial velocities

I continued working on my document about release of data and code. Twitter (tm) continues awesome.

Research highlight of the day was a long discussion with Megan Bedell (Chicago) about the consistency of exoplanet-host radial-velocity measurements order-by-order in a many-order high-resolution echelle spectrograph. The measurements are made by cross-correlation with a binary (ugh) template, and some orders are consistently high and some are consistently low, and we very much hope there are other more subtle regularities to exploit. Why are there these discrepancies? Probably because the model is inflexible and wrong. Unfortunately we don't have access to it directly (yet) so we have to live with the cross-correlation functions. We discussed simple methods to discover regularities in the order-by-order offsets and results, and sent Bedell off with a long to-do list.
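
One of the simplest methods on that to-do list, sketched here on fake data (all names and numbers are hypothetical): model each order's RV as the epoch's true RV plus a fixed per-order offset plus noise, estimate the offsets robustly, and subtract them.

```python
import numpy as np

rng = np.random.default_rng(0)
n_epoch, n_order = 40, 20
true_rv = rng.normal(0.0, 5.0, size=n_epoch)       # m/s; per-epoch signal
offsets = rng.normal(0.0, 3.0, size=n_order)       # fixed per-order biases
noise = rng.normal(0.0, 1.0, size=(n_epoch, n_order))
rv = true_rv[:, None] + offsets[None, :] + noise   # (epoch, order) RV matrix

# robust per-order offset: median over epochs of (order RV - epoch mean)
epoch_mean = np.mean(rv, axis=1, keepdims=True)
offset_hat = np.median(rv - epoch_mean, axis=0)
rv_corrected = rv - offset_hat[None, :]

# the order-to-order scatter within each epoch should shrink a lot
scatter_raw = np.std(rv - epoch_mean)
scatter_corr = np.std(rv_corrected - np.mean(rv_corrected, axis=1, keepdims=True))
```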

I ended the day with a long conversation with Kat Deck (Caltech). Among other things, we discussed what we would do with our lives if exoplanet research evolves into nothing other than atmosphere transmission spectroscopy and modeling. Of course neither of us considers this outcome likely!


writing by tweeting

I spent the day working on my document about releasing data and code. I tweeted (tm) some of the ideas in the paper and started responding to the storm of replies. The twitters are excellent for getting ideas from the community!


the Bronx; the baryon acoustic feature

Although perhaps this doesn't count as research, I spent today at public middle school CIS 303 in the Bronx, for Career and College Day. I met lots of kids (and lots of other people talking about their careers) and said words and answered questions about my career and how I got here. The format was panels, interviewed by classrooms of kids. Most interesting idea of the day (and it was from other panelists): Success in a career requires empathy and the ability to listen. That's deep! Strongest impression of the day: Comparing this public middle school to that of my own daughter, I can (still) see a lot of disparity in the NYC public school system, and that disparity isn't just about money: It is also about discipline, school organization, and academic priorities. (These disparities are what got me studying education way back in the late eighties when I was in college.)

At the beginning of the day, I did get a bit of research in thanks to Jeremy Tinker (NYU), who showed the Blanton–Hogg group meeting what is currently going on with (finishing) BOSS data analyses and (starting) eBOSS ones. The combination of baryon acoustic feature scale measurements and redshift-space distortion measurements leads to very strong constraints on cosmological parameters. I'd like to say more! But papers will appear within weeks.


not-so-latent variable latent variable model

I had a conversation today with Boris Leistedt about his work to build a latent model of galaxy SEDs and get template-space-marginalized photometric redshifts. I proposed that he instantiate the latent variables as observables (like rest-frame colors or line strengths or something); this will help the model break degeneracies and sensibly order the templates in the latent space. That is, it should regularize or simplify the model for inference. That's just an intuition. But it might also help people who have drunk less of our Kool Aid to understand!


release your data and code; diffraction microscopy

Having sent my cosmology inference draft to various friendlies for them to beat it up, I returned to other priorities. I have the goal of finishing two more papers before my sabbatical ends. The first would be something (in the Data Analysis Recipes series) about why you should (or shouldn't) release your data and code. I keep making the same arguments over and over in person and by email, so I should write them down once and for all! This is exactly why I started the fitting-a-line document oh-so-many years ago. The second is a paper on diffraction imaging with very few photons. I booted up both of these projects today.

On the first, I made a draft table of all the things that come in on the pro and con sides for releasing data and code. There are lots of overlaps, and lots of things appear in both the pro and the con column! For example, documentation: This is a con, because it is a burden, but a pro because you are encouraged to do it (and benefit from it). And it applies to both data and code.

On the second, I worked through the mathematics of the Ewald sphere, in preparation for generalizing my code so that it doesn't have to work in the small-angle limit.
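
The relevant geometry, as a sketch (my own toy check, not the code being generalized): a detector position maps to momentum transfer q = k_out - k_in with |k_out| = |k_in| = 2 pi / lambda; the small-angle limit sets q_z to zero, but the full Ewald-sphere mapping keeps q_z = k (cos theta - 1).

```python
import numpy as np

def momentum_transfer(x, y, z, lam):
    """Momentum transfer q for a detector pixel at (x, y), distance z.

    Full Ewald-sphere geometry (no small-angle approximation):
    q = k_out - k_in, |k_out| = |k_in| = 2 pi / lam, beam along +z.
    """
    k = 2.0 * np.pi / lam
    r = np.sqrt(x ** 2 + y ** 2 + z ** 2)
    k_out = k * np.array([x / r, y / r, z / r])
    k_in = np.array([0.0, 0.0, k])
    return k_out - k_in

lam = 1.0
q = momentum_transfer(3.0, 4.0, 12.0, lam)   # a 3-4-12-13 geometry, so r = 13
theta = np.arccos(12.0 / 13.0)               # scattering angle
```

The consistency checks are |q| = 2 k sin(theta / 2) and q_z = k (cos theta - 1); a small-angle code sets that q_z term to zero, which is exactly what has to be relaxed.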


finished a paper!

Sabbatical is an unreal experience. In case the loyal reader was wondering How awesome is your job?, let me add to the list of awesome the fact of having a year (every seven, in principle) with no teaching or committee duties, in which I can do whatever I want! One piece of evidence that this is happening right now is that I just finished the third (zeroth draft) first-author paper of this academic year, which is about six times as many first-author papers as I complete in a normal year (yes, my normal rate is 0.5 first-author papers per year; thankfully my junior colleagues are far more productive and permit my co-authorship).

Of course these three first-author papers are not really done: On one I need to respond to the referee, on one I am waiting for a bit of work from collaborators, and this new one is still very, very rough. But it is fully drafted, from end to end. The subject is: How we compare cosmological simulations to cosmological data, and the incorrect inferences we might be drawing because of the wrongness of all this.


training a photometric-redshift method with a single redshift

Boris Leistedt dropped in today and we discussed his methods to build a physically possible model of galaxy spectral energy distributions and therefore photometric redshifts, but with an exceedingly flexible model. His method is brilliant because it is entirely data-driven (no fixed templates) and yet it respects the physics of special relativity (the Doppler shift), which the machine-learning methods do not.

He made the amusing point that his method can be trained with a training set that contains literally a single galaxy with a spectroscopic redshift! That is, even if you have only a single redshift, you can put photometric redshifts (with, admittedly, large error bars) on all the other photometric galaxies! That is a property that no other data-driven method has. The point is that if you have multi-wavelength data on a single galaxy with a redshift, you can make rough predictions about how other galaxies would look at other redshifts.

His real breakthrough is the idea of using Gaussian processes to put priors on the spectral energy distributions (templates): If the SEDs are drawn from a Gaussian process, then all of the photometry (which consists of linear projections of the SEDs) is also drawn from a Gaussian process. We discussed the magic of all of this.
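
The magic is the standard fact that linear functionals of a Gaussian process are jointly Gaussian: if the SED has covariance K on a wavelength grid, photometry through a bandpass matrix A has covariance A K A^T, exactly. A toy check of that fact (made-up top-hat "filters" and an RBF kernel, nothing from Leistedt's actual setup):

```python
import numpy as np

rng = np.random.default_rng(1)

# SED on a wavelength grid, with a GP (RBF-kernel) prior
wave = np.linspace(0.0, 1.0, 100)
K = np.exp(-0.5 * (wave[:, None] - wave[None, :]) ** 2 / 0.1 ** 2)

# three made-up top-hat "filters"; rows are bandpass weights
A = np.zeros((3, wave.size))
A[0, 10:40] = 1.0
A[1, 35:65] = 1.0
A[2, 60:90] = 1.0

# photometry = A @ sed is a linear projection, so its covariance is A K A^T
C_theory = A @ K @ A.T

# Monte Carlo check: draw SEDs from the GP, project, measure the covariance
L = np.linalg.cholesky(K + 1e-8 * np.eye(wave.size))
seds = (L @ rng.normal(size=(wave.size, 20000))).T
phot = seds @ A.T
C_mc = np.cov(phot, rowvar=False)
```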

I also read a proposal by Daniela Huppenkothen, and wrote words in my inference-of-variances paper.



ABC works

For my inference-of-variances project, I got ABC working. It is delivering a correct posterior, even at a reasonable (finite) distance threshold. All is good! I put a figure showing this into the nascent paper.
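
The setup, roughly (a sketch of the toy problem, not the paper's code): the data are N draws from a zero-mean Gaussian of unknown variance, the summary statistic is the empirical variance, and rejection ABC keeps prior draws whose simulated statistic lands within a finite threshold eps of the observed one.

```python
import numpy as np

rng = np.random.default_rng(7)

# "observed" data: N draws from a zero-mean Gaussian, true variance 4.0
N, true_var = 50, 4.0
data = rng.normal(0.0, np.sqrt(true_var), size=N)
s2_obs = np.var(data)

# rejection ABC: flat prior on the variance, empirical variance as the
# summary statistic, finite acceptance threshold eps
n_prop, eps = 100000, 0.2
var_prop = rng.uniform(0.1, 20.0, size=n_prop)
sims = rng.normal(size=(n_prop, N)) * np.sqrt(var_prop)[:, None]
s2_sim = np.var(sims, axis=1)
posterior = var_prop[np.abs(s2_sim - s2_obs) < eps]
```

With eps finite, the accepted samples only approximate the posterior; the point is that at a reasonable threshold the approximation is already good (here the accepted-sample mean sits close to the observed statistic, as the exact posterior requires).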


Dr Greg Green

It was my pleasure to sit on the PhD defense committee of Gregory Green (Harvard) today. I had to do it remotely (for uninteresting reasons). Green has built a three-dimensional map of the dust in the Milky Way, by modeling every single star in the PanSTARRS data. This is an impressive feat computationally, since it is a huge problem, and also probabilistically, since most things you can write down are either intractable or wrong.

Being a good probabilistic reasoner, Green did something both tractable and correct, and got a beautiful map. His tours of the map in his presentation were mesmerizing. He was cagey about spiral structure; his method wouldn't necessarily find it even if it were there.

It was a great PhD defense based on absolutely great work. At the end of his talk he discussed ways he might do things that are even righter in the future, given our prior beliefs about the interstellar medium (and lots of new data). That's super interesting, and we hope to discuss when his dust settles (so to speak). Congratulations Dr Green!


variance of variances

I spent the day trying to understand the frequentist properties of empirical (sample) variances, and the properties (expectation value and variance) of estimators of the variance of the distribution that generated the sample. This is all related to my issues with cosmological inference, which I am trying to write up. I am not surprised cosmologists have made mistakes here; it is hard to understand even in the most trivial situation. I am working out the trivial case to make analogies to the real case.
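
The trivial-case results I keep re-deriving (standard facts for Gaussian draws, stated here for the record): the unbiased sample variance s^2 of n draws with true variance sigma^2 has expectation sigma^2 and variance 2 sigma^4 / (n - 1). A quick Monte Carlo check:

```python
import numpy as np

rng = np.random.default_rng(3)

n, sigma2, n_trials = 10, 2.0, 400000
x = rng.normal(0.0, np.sqrt(sigma2), size=(n_trials, n))
s2 = np.var(x, axis=1, ddof=1)   # unbiased sample variance, n - 1 denominator

mean_s2 = np.mean(s2)            # expect sigma2
var_s2 = np.var(s2)              # expect 2 * sigma2**2 / (n - 1)
```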