#AAAC, day 2, classification by discriminability

The morning started with the second day of the #AAAC meeting. Steve Kahn (Stanford) talked to us about the relationships among the superficially similar projects LSST, Euclid, and WFIRST. The argument is that they are highly complementary. I didn't really disagree, but it is not obvious that we as a community would be willing to spend a lot of money on WFIRST if we knew that LSST and Euclid are definitely going forward. I asked pointed questions and hope to follow up. Since WFIRST can do so many things, maybe it should slightly re-prioritize given the context?

In the afternoon, I talked with Amit Singer (Princeton), who was pretty adamant that the stuff I am doing on single-photon imaging is stupid and a waste of time! Late in the day, based on a comment by Greengard, Jeremy Magland and I formulated an awesome new clustering (or unsupervised classification) algorithm: Define discriminability (of j from k) to be the empirical probability that a point from distribution j be closer to a neighbor in distribution j than to a point in distribution k. Now set the boundary (which could be an arbitrarily shaped surface) to maximize discriminability. Magland ended up getting pessimistic when we realized that it would be slow. But it is worth exploring.
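
The discriminability defined above can be estimated by brute force for a fixed labeling. What follows is a minimal sketch, not the algorithm Magland and I formulated: it assumes Euclidean distance, reads "a neighbor" as the nearest other point in the same set, and does no optimization over the boundary. The function name and the toy data are made up for illustration.

```python
import numpy as np

def discriminability(xj, xk):
    """Fraction of points in xj whose nearest other point in xj is
    closer than their nearest point in xk (Euclidean distance)."""
    xj = np.asarray(xj, dtype=float)
    xk = np.asarray(xk, dtype=float)
    # Squared distances within j; mask out each point's distance to itself.
    d_jj = np.sum((xj[:, None, :] - xj[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d_jj, np.inf)
    # Squared distances from every point in j to every point in k.
    d_jk = np.sum((xj[:, None, :] - xk[None, :, :]) ** 2, axis=-1)
    return np.mean(d_jj.min(axis=1) < d_jk.min(axis=1))

rng = np.random.default_rng(42)
a = rng.normal(0.0, 1.0, size=(200, 2))
far = rng.normal(10.0, 1.0, size=(200, 2))   # well separated from a
near = rng.normal(0.0, 1.0, size=(200, 2))   # same distribution as a
d_far = discriminability(a, far)
d_near = discriminability(a, near)
```

Each evaluation is quadratic in the number of points, and a boundary search would need many evaluations, which is consistent with the worry about speed.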


#AAAC, day 1

Today was day 1 of the two-day meeting of the Astronomy and Astrophysics Advisory Committee (which advises NSF, NASA, and the DOE on places of intellectual and funding overlap). In the meeting (which is open to all by phone) I learned various things. One is that the LISA Pathfinder has successfully arrived at L1 and is apparently fully functional, including in its high-tech thruster system. The test masses are set to be released in weeks (if I remember correctly). John Carlstrom (Chicago) gave a nice presentation about CMB Stage 4 experiments, which are ramping up. There is a lot still to learn from the CMB. He emphasized that foregrounds are the dominant issue for many important experiments, along with the lensing distortions. I have ideas about both, but especially foregrounds: I don't think the CMB community is using the most sophisticated, tractable models that are out there. I made a mental note to contact people off-line about this.


photon pile-up

At Blanton–Hogg group meeting, Daniela Huppenkothen brought up photon pile-up in x-ray and gamma-ray detectors. The issue is that if two photons arrive at the same time, or in the same electronics-restricted time window, they will appear as one photon, but of higher energy. It is an issue for Chandra and for Fermi, among other assets. This pile-up leads to a distortion of the spectrum (and point-spread function, and so on) of very bright sources. We discussed how one might model this, given that it is easy to simulate but hard to describe with a likelihood function. We also came up with a ridiculously simple idea for testing cosmic-ray detection in time-series imaging, which really, really needs to be done.
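
To illustrate the "easy to simulate" half of the pile-up problem, here is a toy forward model. Everything in it is an assumption made for illustration: photons arrive as a Poisson process, the detector reads out in fixed frames, all photons landing in one frame are recorded as a single event with the summed energy, and the true spectrum is exponential. No real instrument (Chandra, Fermi) is being modeled.

```python
import numpy as np

def simulate_pileup(rate, frame_time, n_frames, rng, mean_energy=1.0):
    """Toy frame-based detector: returns (true photon energies,
    recorded event energies) with per-frame pile-up."""
    # Number of photons arriving in each readout frame.
    counts = rng.poisson(rate * frame_time, size=n_frames)
    # True photon energies (exponential spectrum is an assumption).
    energies = rng.exponential(mean_energy, size=counts.sum())
    # Label each photon with its frame, then sum energies per frame.
    frame_index = np.repeat(np.arange(n_frames), counts)
    summed = np.bincount(frame_index, weights=energies, minlength=n_frames)
    # Only frames that caught at least one photon produce an event.
    recorded = summed[counts > 0]
    return energies, recorded

rng = np.random.default_rng(0)
true_faint, rec_faint = simulate_pileup(0.01, 1.0, 200_000, rng)
true_bright, rec_bright = simulate_pileup(2.0, 1.0, 200_000, rng)
```

For the faint source the recorded spectrum is nearly the true one; for the bright source there are fewer events than photons and the events are systematically too energetic. Writing down the likelihood of the recorded events given the true spectrum is the hard part, and this sketch does not attempt it.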

In the afternoon, I did text writing and problem (exercise) writing in my MCMC tutorial. Stay on target. #AcWriYear


MCMC and the Milky Way

I put more exercises into my MCMC document, and put another figure into my single-photon microscopy document. I had a long conversation with Hans-Walter Rix about all the projects we can do with our results on detailed abundances from The Cannon for stellar clusters and in the Milky Way disk.

In the afternoon I started preparing for two Gaia Sprints. These are going to be hack weeks in which we try to exploit the Gaia first data release data in the late Summer and early Fall. The idea is to produce publishable science in one tough week. Watch this space for an announcement soon.


not writing a book; single-photon imaging code

Stoked by having finished my first first-author paper in a long while, I had a call with Daniel Foreman-Mackey in which I proposed to him that I try to finish a paper every week until I got through my backlog! He talked me down to one paper per month, and we agreed that our MCMC tutorial document should be next. He argued that we should add exercises (it is, after all, a chapter of the book I will never write). I agreed and wrote an exercise later in the day. I have a bunch more to go.

In the afternoon, I had a discussion with Leslie Greengard of my results on imaging molecules at random, unknown (yes, random=unknown) orientations with single-photon images. We discussed two big issues. The first is writing and testing analytic derivatives of my fully marginalized likelihood function (which is the objective function I (horror) optimize for this project). The other issue is representation for the molecule. We discussed many options and tentatively settled on a simple linear parameterization in real space (not Fourier space). Still confused; Greengard points out that it is confusing because there genuinely is no simple answer: There are no bases with elements that are compact in both Fourier space and real space, for deep, deep reasons.


candidate Wang

Today Dun Wang (NYU) passed his oral candidacy exam. His PhD thesis is pretty ambitious: A self-calibration of the Kepler Spacecraft main-mission data, an ultraviolet map of the Milky Way from GALEX data (which he will also self-calibrate), and photometry in crowded fields for the K2 mission!


all the plots, radial-velocity survey design

In response to requests from my team, I made a 256-page, 2560-panel plot of every single one of the 256 k-means clusters we found in abundance space, a few of which we published in our paper yesterday. I don't see much else in there that is easy to interpret, but it looks to me like the higher metallicity groups we see are actually multiple groups mashed together. So I resolved to run at higher values of K on the weekend.

In the afternoon, I had a call with Dan Foreman-Mackey about exoplanet populations. (Thinking of the future) he observed that the goal of finding reliable targets for some kind of Terrestrial Planet Finder mission and the goal of understanding how typical planetary systems form and evolve might be very much at odds, especially when it comes to radial-velocity surveys. Indeed, there haven't been many populations analyses of radial-velocity surveys to date, in part because most of them are not designed with long-term statistical goals in mind. We discussed a bit about what we might do to encourage a future in which both goals can be met, handily. I pointed out something that Charlie Lawrence (JPL) said to me at #AAS227 a couple weeks ago: If some kind of TPF is going to cost billions of dollars, it is worth spending a few hundred million on the ground in preparation. So resources might be abundant.


paper submitted!

Today I presented our chemical-tagging results at Blanton–Hogg group meeting. We show that (now that we can see them with The Cannon) overdensities in chemical space appear also to be overdensities (or at least oddities) in phase space. I followed the meeting by making final edits to the paper, submitting it to the Astrophysical Journal and the APOGEE Collaboration, and putting it on the arXiv. I also sent it to friends and colleagues. This led to an email battle with Charlie Conroy (CfA), who believes our results are somewhere between trivial and wrong!


galaxy redshift model of everything

I produced today a second draft of the chemical tagging paper. Boris Leistedt came by the SCDA and updated me on his SED and redshift model for photometric surveys. I hesitate to even call this “photometric redshifts” because the ambition is so much greater: It is to get the luminosity distribution, the redshift dependence, and the spectral energy distribution for every type of object on the sky, plus assign types and redshifts to all the sources in a multi-band survey (or collection thereof). We even talked about constraining cosmology with galaxy counts, as was first proposed by Hubble so many years ago: It is a generative model for all the redshifted objects on the sky, plus perhaps an admixture of stars in our own Galaxy. Ambitious! And, I think, not impossible, at least if we keep our early goals limited.



I decided to go all in on the chemical-tagging paper; I completed a first draft today, with boat-loads of help from Andy Casey and Melissa Ness over the weekend. Hoping to finish and submit this week.


chemical tagging, phase retrieval, and imaging with single photons

I spent the morning writing about chemical tagging. It appears that The Cannon now delivers precise enough chemical abundance measurements that we can find structure in abundance space (as I discovered in Florida this past weekend). I started to write this up into some kind of paper. In the afternoon, I joined a discussion with the usual mathematical suspects about phase retrieval and set intersection methods. This is very promising! Also, during that discussion, Leslie Greengard handed me a paper by Ayyer et al that poses (pretty much) the question I solved over the break: Can you reconstruct a 3-d image from unknown projections, in each of which you get only one photon?


data-driven model of supernova yields

I spent the last few days working on abundance-space structure, with some detours to hang out with colleagues in town for the Future of AI meeting at NYU, including Bernhard Schölkopf. Today Schölkopf and I spent some quality time talking about our next round of projects, in two categories. In the first category, we talked about simple situations in astronomy in which independent components analysis might be useful. One is supernova yields: Jennifer Johnson (OSU) had asked me last weekend what kinds of supernovae create potassium; I promised her not an answer but a method for getting an answer. Schölkopf suggested that this is a perfect case for ICA: We want to matrix factorize, but in a way that separates causes, not variance! ICA is based on some great math, to which he pointed me.

In the second category, we talked about the next generation of what astronomers call “image differencing”. We want to build out the Causal Pixel Model we built for Kepler self-calibration so that it could work in situations (like LSST) in which there is heterogeneous temporal and spatial coverage of the sky. Then, if it works, we can use everything we have to predict the imaging data we are trying to subtract (or really just predict precisely).


artificial what? and k-means

I spent a few hours at the Future of AI symposium, at which various luminaries speculated about the future of machine learning. Mainly I learned that representatives of huge companies are willing to perform insane extrapolations of their current technologies, which are all, entirely, based on convolutional nets (and some recurrent nets) as Yann Lecun (NYU) reminded us. There was an interesting call for unsupervised methods: Clearly you don't have AI if all you can do is supervised learning!

In the afternoon, Dan Foreman-Mackey told me to use k-means. That in answer to my question: I am trying to find structure in 15-dimensional chemical-abundance space (the output of The Cannon); what algorithms should I use? His answer was k-means or else suck it up and do extreme deconvolution. So of course I will go with k-means in the short run! That said, extreme deconvolution is the right tool for this job eventually.
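
For concreteness, here is a minimal sketch of k-means (plain Lloyd's algorithm) on 15-dimensional vectors. The data are random numbers standing in for abundances, not output from The Cannon, and the function and variable names are made up for illustration. Note what it ignores: the per-star measurement uncertainties, which is exactly what extreme deconvolution would handle.

```python
import numpy as np

def kmeans(x, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (centers, labels, inertia)."""
    rng = np.random.default_rng(seed)
    # Initialize the centers at k distinct data points.
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its members (if it has any).
        new_centers = centers.copy()
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:
                new_centers[j] = members.mean(axis=0)
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    # Final assignment and total within-cluster sum of squares.
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    labels = d2.argmin(axis=1)
    inertia = d2.min(axis=1).sum()
    return centers, labels, inertia

rng = np.random.default_rng(5)
# Hypothetical stand-in for 15-dimensional abundance vectors.
group_a = rng.normal(0.0, 0.05, size=(300, 15))
group_b = rng.normal(0.3, 0.05, size=(300, 15))
abundances = np.vstack([group_a, group_b])

centers1, labels1, inertia1 = kmeans(abundances, k=1)
centers2, labels2, inertia2 = kmeans(abundances, k=2)
```

The result depends on the random initialization and on the choice of K, so in practice one would try several seeds and several values of K.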


APOGEE meeting, day 2

In the morning, Diane Feuillet (NMSU) spoke about her project to use hierarchical inference to get measures of the Milky Way star-formation history and improved age estimates for individual stars. I love this, of course. In the afternoon, the Collaboration talked through the mechanisms by which The Cannon could be incorporated into the APOGEE pipeline as a second-stage stellar abundance code and released publicly along with the data. This means, effectively, that my proposal of yesterday was accepted! In the break-out times, I spent some CPU visualizing our 15-dimensional abundance space and I think maybe we see interesting structures and clusters!


APOGEE meeting, day 1

The annual APOGEE Collaboration meeting started in Cocoa Beach, FL today, following the AAS meeting. I presented to the Collaboration the best current results from The Cannon training on 15 abundances and two stellar parameters. I showed that The Cannon de-noises even the training set—that is, it returns labels that are even better than those on which it was trained! That's impossible of course, in an accuracy sense, but it can be true in a precision sense. (Our evidence comes from the precision with which The Cannon finds stars in the same open cluster to be mono-abundance.)

I proposed that The Cannon be incorporated into the APOGEE pipeline as a second stage of labeling after the physical-model-based labeling by their ASPCAP pipeline. There were many discussions about details of our results (from Melissa Ness, Andy Casey, and myself) and their comparisons with other pipelines and methods. It looks likely that my proposal will be accepted.

In another part of the meeting, Jennifer Johnson (OSU) proposed that potassium might have a different supernova origin than is believed. She is basing that on its covariance with other elements. I promised to deliver a methodologically correct answer to that question, which is in the area of causal inference.


#AAS227, day 4; AAS Hack Day

Today was the fourth annual AAS Hack Day (#hackaas) at #AAS227, organized by Kelle Cruz (CUNY), Meg Schwamb (Taiwan), and myself, and sponsored by the LSST Corporation and Northrop Grumman. We had a huge crowd: About fifty people and the staff had to bring in extra tables, chairs, and power strips. The hacks varied enormously in scope and category; here are just a few that stood out:

AAS meeting conflicts
Adrian Price-Whelan (and a bit Scott Idem and me) used some vector-of-words methods from previous AAS Hack Days to look at schedule issues in the AAS 227 program. He found pairs of oral sessions that were scheduled in conflict that contain talks with abstracts that are close in word space. The idea was to predict which sessions led to the largest number of complaints to the AAS about scheduling, and also provide prototypes of tools that might be used to make scheduling better in the future.
gender and questions in AAS oral sessions
Mehmet Alpaslan and a team including Hack-Day veteran Jim Davenport looked at new data on oral session question-askers and speakers and chairs, finding (as we learned at earlier meetings) that men ask more questions than women, but also finding that the gender of the speaker seems to be correlated with the gender of the question-asker. The data are barely understood at present, being only days old.
crowd-sourcing the old literature reference graph
In some twitter activity prior to the meeting, we discovered that old papers have poor citation and reference information, because the references were often in footnotes, formatted inconsistently, and OCR-ed badly. Brooke Simmons taught the AAS Hack Day participants how to build prototype Zooniverse projects for crowd-sourcing, and Brendan Wells used that knowledge to build a project to solve this old-reference problem. Love that collaboration, which was un-imagined prior to the Hack Day!
glassdome: glassdoor for astronomy
Ellie Schwab and friends started to build a site where people of all different ranks and seniorities could openly or anonymously review their home institutions, and comment on salary and other often-private things. Originally the project started as anonymous, but it evolved toward encouraging open and transparent reviewing as the day went on.
finding asteroids with Kepler
Geert Barentsen arrived with the retrospectively obvious point that the Kepler satellite is awesome for finding asteroids: It spends (in its K2 mode) half of its time looking inside the Earth's orbit, so it is great for finding Earth-crossing and inner asteroids. It also has great cadence and sensitivity. He assembled a great team and started to look. Science! Also on the science with Kepler tip, Jennifer Cash and Lucianne Walkowicz started work extracting photometry from full-field images.
death to Jet
Timothy Pickering propagated the new matplotlib non-Jet colormaps to plotly.js. This is God's work, as it permits web-plotting gurus to benefit from the latest research in visual perception of continuous data. In case you haven't been paying attention, Hack Days are a great time for people to bond over their hatred of the Jet colormap, but Pickering also reminded us of the research that shows that it leads to misconceptions about the data, fails in black-and-white printing, and is bad for people with vision impairments.
exoplanetary systems in WWT
David Weigel, after reminding us that World-Wide Telescope has gone open source and is now a project of the AAS, showed us how he put a known exoplanet system into the software. The plan is to get them all in there and then make possible tours and activities around exoplanet discovery and science.
fabric poster upcycling
Ashley Pagnotta and company brought a sewing machine to #hackaas. It turns out that it makes sense these days to print your poster on fabric not paper! This is because fabric printing is now very cheap, and you can pack a fabric poster trivially in your luggage. Check it out. But Pagnotta and colleagues brought patterns and skills and turned posters into infotaining clothing. Insane.
More (crowd-sourced but incomplete) notes are available here. Thanks to our sponsors and everyone who came, and see you next year!


#AAS227, day 3

I arrived at the AAS in Kissimmee today. While I was preparing for APOGEE meetings and the AAS Hack Day, Andy Casey was sending incredible figures showing that The Cannon returns element abundances for stars that are far more precise even than the input labels used in the training step: It is an element-abundance de-noising system. His best tests involve open and globular clusters, the members of which form very small clumps in abundance space in The Cannon outputs.

At lunch, I ate with the exoplanet crew. I had a conversation with Angie Wolfgang (PSU) about the paper Dan Foreman-Mackey and I wrote about Petigura's sample of exoplanets. This paper is sometimes seen as a criticism of Petigura, but it really is not: We only wrote that paper because Petigura's catalog was the first exoplanet catalog that was good enough that we could do the kind of populations analysis we wanted to do! I guess no excellent research project goes uncriticized, as it were.

There were many impressive results and ideas presented at the APOGEE splinter session. Zasowski (JHU) talked about target selection for the survey and what constraints there are on what can and can't be changed, given physical and scheduling constraints. Renbin Yan (Kentucky) spoke about building a stellar library with APOGEE to support modeling of MaNGA galaxies. He and I also talked in the break with Mike Blanton (NYU) about sky subtraction in APOGEE and MaNGA, which are both hard problems. David Nataf (ANU) spoke about the GALAH survey, and its current status. It is getting stellar parameters but not yet its 29 abundances. I resolved to look into helping them out with The Cannon this semester.

I also had interesting conversations with Scot Kleinman (Gemini) about career paths for observers and operational staff in astrophysics, and with Phil Marshall (KIPAC) about LSST cadence and utility functions. I love these meetings.


likelihood-free inference in cosmology

At group meeting, MJ Vakili told us about his project with Chang Hoon Hahn (NYU) to do cosmological large-scale structure inferences using likelihood-free inference (ABC). We talked about the technical details, but also the philosophy: Current large-scale structure inferences generally use a substitute or made-up likelihood function that can't be correct: It is a Gaussian in correlation-function space. We can make fewer assumptions using ABC; indeed we can get correct posteriors without ever committing to the exact form of the likelihood function. We are performing the demonstration project with just the halo occupation distribution parameters, because this inference is cheaper than the full cosmological inference.
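
The simplest form of ABC is rejection sampling: draw parameters from the prior, simulate data, and keep the draw only if a summary statistic of the simulation lands close to that of the real data. The sketch below is a toy, not the Vakili and Hahn project: the "simulator" is a Poisson counter, the prior is flat, the summary is the sample mean, and the tolerance `eps` is chosen by hand. All names are hypothetical.

```python
import numpy as np

def abc_rejection(observed, simulate, prior_sample, summary, eps, n_draws, rng):
    """Rejection ABC: keep prior draws whose simulated summary
    statistic falls within eps of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        s_sim = summary(simulate(theta, rng))
        if abs(s_sim - s_obs) < eps:
            accepted.append(theta)
    return np.array(accepted)

rng = np.random.default_rng(1)
true_rate = 4.0  # toy "truth", used only to make fake data
observed = rng.poisson(true_rate, size=100)

posterior = abc_rejection(
    observed,
    simulate=lambda rate, rng: rng.poisson(rate, size=100),
    prior_sample=lambda rng: rng.uniform(0.0, 10.0),
    summary=np.mean,
    eps=0.2,
    n_draws=20_000,
    rng=rng,
)
```

The likelihood function is never written down; only the simulator is needed. The price is that the accepted samples approximate the posterior only as well as the summary statistic and the tolerance allow, and most of the simulations are thrown away.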


don't cut on significance, cut on value

I blogged yesterday about Dan Foreman-Mackey's new six-planet system. He found it, but then rejected it because the transit of the outer planet induces a significant astrometric offset. We figured out (on the board!) that the offset is consistent with small contamination from a nearby fainter star, and concluded that we should be rejecting stars not on the significance of the astrometric signal, but on the absolute amplitude of the signal. This is a re-learn of a general lesson: If you use significances for selection, errors in your selection depend on the errors in your errors, and the selection itself becomes a function of those errors. And so on. Usually (but not always) bad. (Or hard to model well.)
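
A toy demonstration of the lesson, with made-up numbers in arbitrary units: give every star the same small true centroid shift but very different measurement uncertainties, then compare a cut on significance with a cut on amplitude. The thresholds and the uncertainty distribution are assumptions chosen only to make the effect visible.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
# Every star has the same small true shift; only the noise level differs.
true_shift = 0.5
sigma = rng.uniform(0.05, 1.0, size=n)
measured = true_shift + sigma * rng.normal(size=n)

# Two different rejection rules applied to the same measurements.
rejected_by_significance = np.abs(measured) / sigma > 3.0
rejected_by_amplitude = np.abs(measured) > 3.0

# Compare the best-measured and worst-measured stars.
precise = sigma < 0.1
noisy = sigma > 0.9
frac_sig_precise = rejected_by_significance[precise].mean()
frac_sig_noisy = rejected_by_significance[noisy].mean()
frac_amp_precise = rejected_by_amplitude[precise].mean()
frac_amp_noisy = rejected_by_amplitude[noisy].mean()
```

The significance cut throws out most of the well-measured stars and keeps most of the poorly measured ones, even though the true shift is identical; the selection is a function of the uncertainties. The amplitude cut keeps nearly all the stars in both groups.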


new six-planet extra-solar planetary system!

Dan Foreman-Mackey showed up for a couple of days of hacking. He has been searching for single (isolated) transits (that is, very long-period planets) in the 4-year Kepler data. He has found a few great outer planets! When the stellar properties are well known for the host star (especially the density of the host star, it turns out), the period and eccentricity of the long-period planet are (jointly) constrained fairly well from the transit duration.
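
The role of the stellar density can be seen in the simplest limiting case. The sketch below is not Foreman-Mackey's analysis: it assumes a circular orbit, a central transit (zero impact parameter), and a planet much smaller than the star, in which case the duration is T = (3 P / (pi^2 G rho))^(1/3). Eccentricity and impact parameter change the duration, which is why the real constraint is a joint one. The function names are hypothetical.

```python
import math

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2

def central_transit_duration(period_days, stellar_density):
    """Transit duration in hours for a circular orbit and central
    transit, given the period (days) and stellar density (kg/m^3)."""
    period = period_days * 86400.0
    duration = (3.0 * period / (math.pi ** 2 * G * stellar_density)) ** (1.0 / 3.0)
    return duration / 3600.0

def period_from_duration(duration_hours, stellar_density):
    """Invert the relation above: period in days from a single
    transit duration (hours), under the same assumptions."""
    duration = duration_hours * 3600.0
    period = math.pi ** 2 * G * stellar_density * duration ** 3 / 3.0
    return period / 86400.0

# An Earth-like orbit around a Sun-like star (density about 1410 kg/m^3).
duration_hours = central_transit_duration(365.25, 1410.0)
recovered_period = period_from_duration(duration_hours, 1410.0)
```

For the Earth-Sun numbers this gives a duration of about 13 hours. Because the period scales as the cube of the duration, small errors in the duration or in the density become large errors in the period.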

Foreman-Mackey's search is super-conservative, as it must be, because the best way to find planets is through their periodicity in the data, and these long-period planets are (by construction of the problem) not periodic inside the Kepler time window. He finds candidates in the time stream and then does a set of probabilistic hypothesis tests against various kinds of data-artifact models. Finally, he rejects systems that induce what are known as “centroid shifts” in the Kepler imaging. These are blends (and therefore the planetary inferences are wrong).

One system he rejected that way, we figured out today, is the sixth planet in a known five-planet system with a set of five closely packed short-period planets. It has a small centroid shift associated with the single transit, but that centroid shift (it turns out) is quantitatively consistent with a small amount of contamination from a fainter star that is within one Kepler pixel angular separation of the target star. That is, the contamination is small and the planetary inferences are not very wrong (and this conclusion applies both to the short-period planets and to the long-period outer planet). Hence: A new six-planet system! You heard it here first, folks! Question for Foreman-Mackey: Why wasn't this announcement made by a twitter (tm) bot?


inferring a Gaussian from samples from random projections

My whole project from the last days of 2015 worked: If you give me samples from random projections of a Gaussian, I can infer the variance tensor of that Gaussian! Even if I only get one single sample point from each projection. I am amazed, but it is the magic of asymptotically unbiased, efficient estimators. (I think!) Anyway, I spent my bits of time here and there over the break coding, running, and visualizing. Now I have a good argument and acceptable (though slow) code. The question is: Is this publishable in any journal? It is a pure statistics result, but obvious in that context. I am pursuing it related to diffraction microscopy, but the problem is way too artificial for that audience, I expect. I need a scope.
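
Here is a minimal sketch of why this can work at all. It is not my code, and it swaps in a simpler estimator than the one I used: it assumes a zero-mean Gaussian, known one-dimensional projection directions, and a least-squares fit to second moments rather than anything likelihood-based. The key fact is that for a unit vector u and variance tensor V, the expectation of the squared projected sample is u^T V u, which is linear in the entries of V.

```python
import numpy as np

rng = np.random.default_rng(7)
dim, n = 3, 100_000
a = rng.normal(size=(dim, dim))
true_var = a @ a.T + np.eye(dim)  # hypothetical variance tensor

# One random unit vector and ONE sample point per projection.
u = rng.normal(size=(n, dim))
u /= np.linalg.norm(u, axis=1, keepdims=True)
x = rng.multivariate_normal(np.zeros(dim), true_var, size=n)
y = np.sum(u * x, axis=1)  # the only data: one scalar per projection

# E[y^2] = u^T V u = sum_i V_ii u_i^2 + 2 sum_{i<j} V_ij u_i u_j,
# so regress y^2 on the products u_i u_j (upper triangle of V).
iu = np.triu_indices(dim)
design = u[:, iu[0]] * u[:, iu[1]]
design[:, iu[0] != iu[1]] *= 2.0
coeffs, *_ = np.linalg.lstsq(design, y ** 2, rcond=None)

# Rebuild the symmetric matrix from its upper triangle.
est_var = np.zeros((dim, dim))
est_var[iu] = coeffs
est_var = est_var + est_var.T - np.diag(np.diag(est_var))

rel_err = np.linalg.norm(est_var - true_var) / np.linalg.norm(true_var)
```

Each individual datum says almost nothing, but the estimate converges as the number of projections grows. A likelihood-based estimator should use the data more efficiently than this moment fit.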