#astrohackny, day N

At #astrohackny, Ben Weaver (NYU) showed a huge number of binary-star fits to the APOGEE individual-exposure heliocentric radial velocity measurements. He made his code fast, but not yet sensible, in that it treats all possible radial-velocity curves as equally likely, when some are much more easily realized physically than others. In the end, we hope that he can adjust the APOGEE shift-and-add methodology and make better combined spectra.

Glenn Jones (Columbia) and Malz showed some preliminary results building a linear Planck foreground model, using things that look a lot like PCA or HMF. We argued out next steps towards making it a probabilistic model with more realism (the beam and the noise model) and more flexibility (more components or nonlinear functions). Also, the model has massive degeneracies; we talked about breaking those.
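A minimal sketch of the kind of linear model in play, with made-up sizes and a toy mixing matrix (nothing here is the real Planck setup), including the rotation degeneracy we discussed:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for a linear foreground model: n_freq frequency maps,
# each a linear mixture of a few spatial components plus noise.
# All sizes here are illustrative, not the real Planck data volume.
n_freq, n_pix, n_comp = 9, 2048, 3
components = rng.standard_normal((n_comp, n_pix))   # spatial templates
mixing = rng.standard_normal((n_freq, n_comp))      # frequency responses
maps = mixing @ components + 0.01 * rng.standard_normal((n_freq, n_pix))

# PCA via SVD recovers a rank-n_comp linear model.  Note the degeneracy:
# (mixing @ R) and (inv(R) @ components) produce identical maps for any
# invertible R, which is the kind of degeneracy that needs breaking.
U, s, Vt = np.linalg.svd(maps, full_matrices=False)
model = (U[:, :n_comp] * s[:n_comp]) @ Vt[:n_comp]
residual = maps - model
```

The residual here is at the noise level, but the recovered components are only defined up to that rotation; the probabilistic version with beam and noise models is where the realism comes in.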


light echoes

I hung out in the office while Tom Loredo (Cornell), Brendon Brewer (Auckland), Iain Murray (Edinburgh), and Huppenkothen all argued about using dictionary-like methods to model a variable-rate Poisson process or density. Quite an assemblage of talent in the room! At lunch-time, Fed Bianco talked about light echoes. It is such a beautiful subject: If the light echo is a linear response to the illumination, I have this intuition that we could (in principle) infer the full three-dimensional distribution of all the dust in the Galaxy and the time-variable illumination from all the sources. In principle!


radical fake TESS data

This past week, the visit by Zach Berta-Thompson (MIT) got me thinking about possible imaging surveys with non-uniform exposure times. In principle, at fixed bandwidth, there might be far more information in a survey with jittered exposure times than in a survey with uniform exposure times. In the context of LSST I have been thinking about this in terms of saturation, calibration, systematics monitoring, dynamic range, and point-spread function. However, in the context of TESS the question is all about frequency content in the data: Can we do asteroseismology at frequencies way higher than the inverse mean exposure time if the exposure times are varied properly? This weekend I started writing some code to play in this sandbox, that is, to simulate TESS data but with randomized exposure times (though identical total data output).
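A first cut at that sandbox might look like the following; all the numbers (cadence, jitter range, signal frequency) are placeholders, not real TESS values:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical setup: ~30-minute mean cadence, but with randomized
# exposure durations and identical total data output (same number of
# downlinked samples).
n_exp = 1000
mean_exp = 30.0  # minutes
durations = rng.uniform(0.5 * mean_exp, 1.5 * mean_exp, size=n_exp)
starts = np.concatenate([[0.0], np.cumsum(durations)[:-1]])

# A signal at a frequency ABOVE the Nyquist frequency of the mean
# cadence (0.5 / mean_exp); with jittered exposures, information about
# it can survive in the data.
nu = 1.3 / mean_exp  # cycles per minute


def flux(t):
    return 1.0 + 0.01 * np.sin(2.0 * np.pi * nu * t)


# Each datum is the signal averaged over its (irregular) exposure.
fine = 100
samples = np.array([
    flux(np.linspace(t0, t0 + dt, fine)).mean()
    for t0, dt in zip(starts, durations)
])
```

The next step in the sandbox would be to fit for `nu` from `samples` and `starts` and ask how the recovery degrades as the jitter shrinks to zero.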


probabilistic density inference, TESS cosmics

Boris Leistedt (UCL) showed up for the day; we discussed projects for the future when he is a Simons Postdoctoral Fellow at NYU. He even has a shared Google Doc with his plans, which is a very good idea (I should do that). In particular, we talked about small steps we can take towards fully probabilistic cosmology projects. One is performing local inference of large-scale structure to hierarchically infer (or shrink) posterior information about the redshift-space positions of objects with no redshift measurement (or imprecise ones).

Zach Berta-Thompson (MIT) reported on his efforts to optimize the hyper-parameters of my online robust statistics method for cosmic-ray mitigation in the TESS spacecraft. He found values for the two hyper-parameters such that, for some magnitude ranges, my method beats his simple and brilliant middle-eight-of-ten method. However, because my method is more complicated, and because its success seems to depend (possibly non-trivially) on his (somewhat naive) TESS simulation, he is inclined to stick with middle-eight-of-ten. I asked him for a full and complete search of the hyper-parameter space but agreed with his judgement in general.


online, on-board robust statistics

Zach Berta-Thompson (MIT) showed up at NYU today to discuss the on-board data analysis performed by the TESS spacecraft. His primary concern is cosmic rays: With the thick detectors in the cameras, cosmic rays will affect a large fraction of pixels in a 30-minute exposure. Fundamentally, the spacecraft takes 2-second exposures and co-adds them on-board, so there are lots of options for cosmic-ray mitigation. The catch is that the computation all has to be done on board with limited access to RAM and CPU.

Berta-Thompson showed that a "middle-eight-of-ten" strategy (every 10 sub-exposures average all but the highest and the lowest) does a pretty good job. I proposed something that looks like the standard "iteratively reweighted least squares" algorithm, but operating in an "online" mode where it can only see the last few elements of the past history. Berta-Thompson, Foreman-Mackey, and I tri-coded it in the Center for Data Science studio space. The default algorithm I wrote down didn't work great (right out of the box) but there are two hyper-parameters to tune. We put Berta-Thompson onto tuning.
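For concreteness, here is a toy version of both ideas. The middle-eight-of-ten is as described above; the online estimator is only a stand-in with two hyper-parameters (an update rate and a clip threshold), not the actual algorithm we wrote down in the studio:

```python
import numpy as np


def middle_eight_of_ten(subexposures):
    """Co-add blocks of 10 sub-exposures, dropping the highest and the
    lowest value in each block before averaging the remaining eight."""
    x = np.asarray(subexposures, dtype=float)
    n = x.shape[0] // 10
    blocks = x[: n * 10].reshape(n, 10, *x.shape[1:])
    s = np.sort(blocks, axis=1)
    return s[:, 1:-1].mean(axis=1)


def online_robust_mean(subexposures, alpha=0.2, nsigma=4.0):
    """Toy online robust co-add: keep a running mean and scale, and
    simply reject sub-exposures deviating by more than nsigma * scale
    (cosmic rays push the flux high).  alpha and nsigma stand in for
    the two tunable hyper-parameters; only O(1) memory is needed."""
    x = np.asarray(subexposures, dtype=float)
    mu, scale = x[0], 1.0
    for xi in x[1:]:
        r = xi - mu
        w = 1.0 if abs(r) < nsigma * scale else 0.0  # hard outlier rejection
        mu += alpha * w * r
        scale = (1.0 - alpha) * scale + alpha * min(abs(r), nsigma * scale)
    return mu
```

Both keep only a handful of numbers per pixel, which is the point: the on-board constraint is RAM and CPU, not statistical elegance.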


dissertation transits

Schölkopf, Foreman-Mackey, and I discussed the single-transit project, in which we are using standard machine learning and a lot of signal injections into real data to find single transits in the Kepler light curves. This is the third chapter of Foreman-Mackey's thesis, so the scope of the project is limited by the time available! Foreman-Mackey had a breakthrough on how to split the data (for each star) into train, validate, and test such that he could just do three independent trainings for each star and still capture the full variability. False positives remain dominated by rare events in individual light curves.

With Dun Wang, we discussed the GALEX photon project; his job is to see what photon-level data are available at MAST, if anything, especially anything about the focal-plane coordinates at which the photons were detected (as opposed to celestial-sphere coordinates). This was followed by lunch at Facebook with Yann LeCun.


Simons Center for Data Analysis

Bernhard Schölkopf arrived for a couple of days of work. We spent the morning discussing radio interferometry, Kepler light-curve modeling, and various things philosophical. We headed up to the Simons Center for Data Analysis at the Simons Foundation for lunch. We had lunch with Marina Spivak (Simons) and Jim Simons (Simons). With the latter I discussed the issues of finding exoplanet rings, moons, and Trojans.

After lunch we ran into Leslie Greengard (Simons) and Alex Barnett (Dartmouth), with whom we had a long conversation about the linear algebra of non-compact kernel matrices on the sphere. This all relates to tractable non-approximate likelihood functions for the cosmic microwave background. The conversation ranged from cautiously optimistic (that we could do this for Planck-like data sets) to totally pessimistic, ending on an optimistic note.

The day ended with a talk by Laura Haas (IBM) about the infrastructure (and social science) she has been building (at IBM and in academic projects) around data-driven science and discovery. She showed a great example of drug discovery (for cancer) by automated "reading" of the literature.



I took a physical-health day today, which means I stayed at home and worked on my students' projects, including commenting on drafts, manuscripts, or plots from Malz, Vakili, and Wang.


robust fitting, intelligence, and stellar systems

In the morning I talked to Ben Weaver (NYU) about performing robust (as in "robust statistics") fitting of binary-star radial-velocity functions to the radial velocity measurements of the individual exposures from the APOGEE spectroscopy. The goal is to identify radial-velocity outliers and improve APOGEE data analysis, but we might make a few discoveries along the way, a la what's implied by this paper.

At lunch-time I met up with Bruce Knuteson (Kn-X) who is starting a company (see here) that uses a clever but simple economic model to obtain true information from untrusted and anonymous sources. He asked me about possible uses in astrophysics. He also asked me if I know anyone in US intelligence. I don't!

In the afternoon, Tim Morton (Princeton) came up to discuss things related to multiple-star and exoplanet systems. One of the things we discussed is how to parameterize or build pdfs over planetary systems, which can have very different numbers of elements and parameters. One option is to classify systems into classes, and build a model of each (implicitly qualitatively different) class and then model the full distribution as a mixture of classes. Another is to model the "biggest" or "most important" planet first; in this case we build a model of the pdf over the "most important planet" and then deal with the rest of the planets later. Another is to say that every single star has a huge number of planets (like thousands or infinity) and just most of them are unobservable. Then the model is over an (effectively) infinite-dimensional vector for every system (most elements of which describe planets that are unobservable or will not be observed any time soon).

This infinite-planet descriptor sounds insane, but there are lots of tractable models like this in the world of non-parametrics. And the Solar System certainly suggests that most stars probably do have many thousands of planets (at least). You can guess from this discussion where we are leaning. Everything we figure out about planet systems applies to stellar systems too.


Blanton-Hogg group meeting

Today was the first-ever instance of the new Blanton–Hogg combined group meeting. Chang-Hoon Hahn (NYU) presented work on the environmental dependence of galaxy populations in the PRIMUS data set and a referee report he is responding to. We discussed how the redshift incompleteness of the survey might depend on galaxy type. Vakili showed some preliminary results he has on machine-learning-based photometric redshifts. We encouraged him to go down the "feature selection" path to start; it would be great to know what SDSS catalog entries are most useful for predicting redshift! Sanderson presented issues she is having with building a hierarchical probabilistic model of the Milky Way satellite galaxies. She had issues with the completeness (omg, how many times have we had such issues at Camp Hogg!) but I hijacked the conversation onto the differences between binomial and Poisson likelihood functions. Her problem is very, very similar to that solved by Foreman-Mackey for exoplanets, but just with different functional forms for everything.
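The binomial-versus-Poisson point can be made concrete with detection counts; a minimal sketch with made-up numbers, showing that the Poisson is the many-trials, small-probability limit of the binomial:

```python
from math import comb, lgamma, log


def log_binomial(k, n, p):
    """Binomial log-likelihood: k detections among n candidate objects,
    each detected with probability p."""
    return log(comb(n, k)) + k * log(p) + (n - k) * log(1.0 - p)


def log_poisson(k, lam):
    """Poisson log-likelihood: k detections at expected rate lam."""
    return k * log(lam) - lam - lgamma(k + 1)


# With many trials and a small per-trial probability (numbers made up),
# the two log-likelihoods agree to a few parts in a thousand; the
# choice matters when n is small or p is not tiny.
n, p, k = 10000, 5e-4, 3
```

Which one is right depends on whether the total number of "trials" (parent objects) is fixed and known, which is exactly the completeness question Sanderson is wrestling with.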


#astrohackny, CMB likelihood

I spent most of #astrohackny arguing with Jeff Andrews (Columbia) about white-dwarf cooling age differences and how to do inference given measurements of white dwarf masses and cooling times (for white dwarfs in coeval binaries). The problem is non-trivial and is giving Andrews biased results. In the end we decided to obey the advice I usually give, which is to beat up the likelihood function before doing the full inference. Meaning: Try to figure out if the inference issues are in the likelihood function, the prior, or the MCMC sampler. Since all these things combine in a full inference, it makes sense to "unit test" (as it were) the likelihood function first.
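That "unit test the likelihood" advice, sketched on a deliberately trivial Gaussian stand-in (not the white-dwarf cooling model): simulate data at known parameters and check that the likelihood peaks near the truth before trusting any full inference:

```python
import numpy as np

rng = np.random.default_rng(3)

# Step 1: simulate data at a KNOWN parameter value.
truth = 2.0
sigma = 0.5
data = truth + sigma * rng.standard_normal(100)


# Step 2: write down the log-likelihood (here a placeholder Gaussian;
# in the real problem this is the hard, model-dependent part).
def log_like(theta, data, sigma=sigma):
    return -0.5 * np.sum((data - theta) ** 2) / sigma ** 2


# Step 3: evaluate on a grid and check it peaks near the truth.
grid = np.linspace(0.0, 4.0, 401)
ll = np.array([log_like(t, data) for t in grid])
best = grid[np.argmax(ll)]
```

If the peak is biased away from the truth already at this stage, the bug is in the likelihood (or the simulator), not in the prior or the sampler.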

Late in the day I discussed the CMB likelihood function with Evan Biederstedt. Our goal is to show that we can perform a non-approximate likelihood function evaluation in real space for a non-uniformly observed CMB sky (heteroskedastic and cut sky). This involves solving a linear system with, and taking the determinant of, a large matrix (50 million by 50 million in the case of Planck). I, for one, think we can do this, using our brand-new linear algebra foo.
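At toy scale, the computation is just a Cholesky factorization, which delivers both the solve and the log-determinant in one shot; the whole challenge is doing the equivalent at 50-million dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny stand-in for the real computation: a Gaussian log-likelihood for
# a map d with covariance C = S + N (smooth signal covariance plus
# heteroskedastic noise).  For Planck, C would be ~5e7 x 5e7; here 200.
n = 200
x = np.linspace(0.0, 1.0, n)
S = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.1 ** 2)
N = np.diag(rng.uniform(0.1, 1.0, n))
C = S + N

L = np.linalg.cholesky(C)
d = L @ rng.standard_normal(n)  # a draw from N(0, C)

# One factorization gives both pieces of the log-likelihood:
alpha = np.linalg.solve(L, d)            # alpha @ alpha == d @ inv(C) @ d
logdet = 2.0 * np.log(np.diag(L)).sum()  # log det C
loglike = -0.5 * (alpha @ alpha + logdet + n * np.log(2.0 * np.pi))
```

Dense Cholesky scales as n cubed, which is hopeless at Planck scale; the hope is that the hierarchical linear algebra from the Greengard and Barnett conversation makes both the solve and the determinant tractable.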


probabilistic Cannon

The biggest conceptual issue with The Cannon (our data-driven model of stellar spectra) is that the system is a pure optimization or frequentist or estimator system: We presume that the training-data labels are precise and accurate, and we obtain, for each test-set spectrum, best-fit labels. In reality our labels are noisy, there are stars that could be used for training but they only have partial labels (logg only from asteroseismology, for example), and we don't have zero knowledge about the labels of the unlabeled spectra. This calls for Bayes. Foreman-Mackey drew a graphical model in the morning and suggested variational inference. Late in the afternoon, David Sontag (NYU) drew that same model and made the same suggestion! Sontag also pointed out that there are some new ideas in variational inference that might make the project an interesting project in the computer-science-meets-statistics literature too. Any takers?



I spent the day at Tufts, where I spoke about The Cannon. Conversation with the locals centered on galaxy evolution, about which there are many interesting projects brewing.


GRE issues; binary star anomalies

Keivan Stassun (Vanderbilt) was at NYU all day, giving a morning talk about his very successful STEM PhD bridge program and an afternoon talk about stars (as they relate to exoplanets and other multiple systems). There was also a great discussion in-between, with academics from around the University in attendance. During lunch, Stassun emphasized that if there is one, single take-home thing we can do to improve the way we run our PhD programs, it is to stop using the GRE as an indicator of merit. He said that there is now abundant, redundant information and studies that show that GRE performance is a very strong function of sex and race, even controlling for scholastic aptitude. The adoption of the GRE was, of course, a very progressive thing: Let's judge applicants on objective measures of merit! But it turns out in practice that it does not measure merit. Most of us (myself included) think about the GRE anecdotally (what was it like for me, or for my students); but if we think about it systematically, I think we will find that we shouldn't be using it if what we want is to admit the best possible students. Stassun: Testify!

In the afternoon talk, Stassun showed some very tantalizing and very perplexing evidence that stars in trinary systems might be physically different from stars in binary systems! He showed that for "hard" eclipsing binaries, the usual deterministic relationships among stellar radii, luminosities, and masses appear to be violated for binaries that have a distant tertiary companion. That is, the distant companion seems to affect the stars in the binary. The data set is still small, and it could be a fluke, but the observation makes clear predictions and presents an awesome physics puzzle. He also talked about the flicker method for determining stellar surface gravities, which I have discussed here previously.


inferring evolution, hidden Markov model

Sriram Sankararaman (Harvard) gave a great Computer Science Colloquium today about inferring the evolutionary tree (well, it isn't really a tree) from genetic information, particularly as regards humans and neandertals. He is able to show, using the statistics of DNA variability, that humans and neandertals had intermixing long after they separated (both geographically and as species). He was also able to show that there is statistical evidence for the sterility (infertility) of males after speciation. Awesome stuff, and very related to cosmology in many ways: The models are of two-point statistics of the DNA sequences, not the sequences themselves, and the probabilistic modeling methods (approximate Gaussian likelihood functions and MCMC) are very similar indeed.

Prior to that, in group meeting, McFee and Huppenkothen jointly proposed a plan for clustering black hole timing data using a hidden Markov model: The idea is that the data are generated by a probability distribution that is set by a state, and there are finite probabilities of transitioning from state to state at each time step. This is a well-understood idea in machine learning, but also very close to how we think about the generation of the timing data, fundamentally. Great plan! Huppenkothen's first order of business is to run k-means in a feature space (for initialization of the HMM).
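A minimal numpy sketch of that initialization step (not the actual pipeline): k-means over timing features, plus an empirical transition-matrix guess to seed the HMM:

```python
import numpy as np


def kmeans_init(features, n_states, n_iter=20, seed=0):
    """Plain k-means over a feature space (e.g. per-segment timing
    features), used only to initialize the hidden states of an HMM."""
    rng = np.random.default_rng(seed)
    X = np.asarray(features, dtype=float)
    centers = X[rng.choice(len(X), n_states, replace=False)]
    for _ in range(n_iter):
        # assign each segment to its nearest center
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for k in range(n_states):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    # initial transition-matrix guess from empirical label transitions,
    # with add-one smoothing so no transition starts at zero probability
    trans = np.ones((n_states, n_states))
    for a, b in zip(labels[:-1], labels[1:]):
        trans[a, b] += 1
    trans /= trans.sum(axis=1, keepdims=True)
    return labels, centers, trans
```

The labels and transition matrix then seed the full HMM fit (emission distributions per state, Baum-Welch or similar), which is where the real modeling choices live.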


upgrading The Cannon; finding single transits

The day started with a short conversation with Anna Ho (MPIA), Foreman-Mackey, Ness, and Rix about changing the polynomial model inside The Cannon into a Gaussian Process. This move (which Ho and Foreman-Mackey are attempting this week) brings a large number of advantages: It makes model complexity a continuous problem rather than a discrete problem, and it permits us to continuously tune model complexity at every wavelength. We are violating all the usual rules though, because although at training time we are doing standard Gaussian Process regression, at test time we are doing the inverse of Gaussian Process regression! That's a bit crazy.
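The training-time half is just standard GP regression; here is a one-label, one-wavelength sketch with illustrative numbers (nothing APOGEE-specific):

```python
import numpy as np


def rbf(a, b, amp=1.0, ell=1.0):
    """Squared-exponential kernel on a 1-D label; amp and ell are the
    continuous complexity knobs mentioned above."""
    return amp * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)


# Training time: GP regression of flux at one wavelength against a
# single stellar label (all numbers illustrative).
rng = np.random.default_rng(5)
train_labels = np.sort(rng.uniform(-2.0, 2.0, 30))
train_flux = np.sin(train_labels) + 0.05 * rng.standard_normal(30)

K = rbf(train_labels, train_labels) + 0.05 ** 2 * np.eye(30)
weights = np.linalg.solve(K, train_flux)


def predict_flux(test_labels):
    """GP mean prediction of flux given labels."""
    return rbf(test_labels, train_labels) @ weights
```

The test-time inversion is where it gets non-standard: instead of predicting flux from labels, you optimize the labels of an observed spectrum so that `predict_flux` (at every wavelength) matches the data.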

In late morning, Susan Kassin (STScI) told us about simple measures of galaxy dynamics and the formation and evolution of the Tully–Fisher relation. She busted some galaxy-evolution myths. What she has been doing so successfully at high redshifts (out to redshift one), the SDSS-IV MaNGA project will do at low redshifts (and much more).

I talked to Foreman-Mackey over lunch (and over 4000 miles) about single transits. He has developed (in partnership with Schölkopf) a machine-learning methodology for finding them and understanding both completeness and false-positive rates. We discussed the issues and scope for a first paper, which would create a complete catalog for the main-mission Kepler data.

I also gave comments on current drafts from Malz and Vakili, and had a long conversation with Wandelt (IAP) regarding probabilistic approaches to cosmology. I pitched my project to evaluate a non-approximate likelihood function in real space for CMB maps and he agreed that if we could show that a full likelihood function evaluation is computationally tractable, it might influence the next generation of analyses.


the meanings of sentences

It was a low-research day today, with the urgent blocking out the important! One highlight, however, was a talk in Computer Science by Yoav Artzi (UW) about natural language processing. Unlike other computer-science talks I have seen in this area, Artzi is trying to represent the meaning of sentences and conversations semantically. Usually in a machine-learning method, the "meaning" of the sentence is encoded in some abstract vector space. These meaning objects are generally uninterpretable and have no compositionality or other useful structure. Artzi wants compositionality, because he wants to be able to combine the meanings of multiple conjoined sentences or clarifications in a conversation and also combine with knowledge about the world. So he is learning a semantic model (and an ontology, and a lexicon, and so on) as sentences come in. This is an ambitious project! He is working in the limited domain of interactive systems to schedule meetings and travel, to make his project feasible.


editing and commenting

All I did today was comment on and edit reports by Wang, Malz, Vakili, the AAAC, and the Spitzer Oversight Committee.



At Friday group meeting, we got status reports from Wang, Vakili, Huppenkothen, and Sanderson. We came up with a well-defined path for Huppenkothen's first paper on classifying black-hole states from GRS 1915.

Over lunch, I had a brief, optimistic discussion with Ness. Yesterday she was concerned that our ability to see the red-clump stars (and separate them from the normal red-giant branch) might have been driven by statistical differences in training-set properties, but she reported that all such tests are now checking out okay: We really can see this difference spectrally. There is still a very important theoretical question of why, but that's going to be a good example of the data-driven model informing theory.


spectral age indicators, structured learning

Before group meeting, Ness and I discussed the scope of a paper that separates red-clump stars from ordinary red-giant stars, using the data-driven spectral model we call The Cannon. We discussed also the possibility that this could turn into a set of spectral age indicators: If we can separate the red clump from the red-giant branch, maybe we can split the red-giant branch into the three nearly overlapping branches on which stars rise and fall as they age.

Andreas Mueller (NYU), one of the principal developers of scikit-learn, joined my group meeting today. He told us about structured learning, in which you augment learning based on features with other kinds of structural information, usually represented as graph edges or even graph edges with features themselves. Key example: If you want to know what pixels in an image are sky pixels, you are interested in their color, but also their proximity to neighboring pixels that are also labeled as sky pixels (or not). He is building, documenting, and maintaining an open-source package called pyStruct.


#astrohackny, candidate Heinrich, LHC

In the morning, I got a great email from Ness, showing that we can separate red-clump and non-red-clump red-giant stars at huge confidence, using APOGEE spectra and The Cannon. I also read and gave comments on Malz's first attempt at doing inference with probabilistic redshifts.

At #astrohackny Price-Whelan and I started on the crazy Gaussian-Process blind source separation plan. We pitched a linear version too and divided up tasks among the various hackers interested in working on the Planck data. We didn't get very far, because we spent most of our hacking time understanding this Lawrence paper (PDF).

In the afternoon, I had the pleasure of being on a committee for the oral candidacy exam of Lukas Heinrich (NYU). He spoke about RECAST, which is a system to permit outsiders to re-interpret LHC ATLAS searches in terms of new or different physics models. The idea is: If a search has been done and it is relevant to some new or different physics, there is no need to do new searches ab initio until the existing searches have been checked for the relevant physics. This all also ties into ideas of preserving workflow and reproducibility and open science, all of which are very relevant to the Moore-Sloan Data Science Environments.


normal modes in stars

At lunch Huppenkothen gave the brown-bag talk, on neutron star normal modes and their possible use in constraining the neutron-star equation of state (and thus nuclear physics). She was pessimistic in the end, because there are so few modes measured, but in a precision sense, the data (taken at face value) do rule out some models.

After the talk, Andrei Gruzinov and I argued about the relationship between Huppenkothen's normal-mode constraints and spin constraints on neutron stars (mentioned also last week by Kaspi in our Physics Colloquium). He made a nice argument, which I will butcher to this: The speed of sound at the surface of a neutron star (or really any gravitationally bound object) must be on the order of the gravitational orbit velocity at the surface. Why? Because otherwise the object would further compress under gravity! This all flows from the point that the sound speed is related to the compressibility through some kind of modulus. Simple! I should check this for the Sun.
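Here is my quick check for the Sun, with rough values for the constants:

```python
import numpy as np

# Rough constants (SI)
G = 6.674e-11      # m^3 kg^-1 s^-2
M_sun = 1.989e30   # kg
R_sun = 6.957e8    # m
k_B = 1.381e-23    # J / K
m_p = 1.673e-27    # kg

# Surface orbital (Keplerian) velocity: comes out around 440 km/s.
v_orb = np.sqrt(G * M_sun / R_sun)

# Photospheric sound speed (T ~ 5800 K; mostly neutral gas, so mean
# molecular weight mu ~ 1.3): comes out around 8 km/s.
T_phot, mu, gamma = 5800.0, 1.3, 5.0 / 3.0
c_s_phot = np.sqrt(gamma * k_B * T_phot / (mu * m_p))
```

So at the photosphere itself the sound speed is tens of times below the orbital velocity; the argument must refer to a characteristic interior sound speed, which the virial theorem does put at order v_orb (and the sound speed in the solar core is indeed a few hundred km/s). So the argument survives, but "surface" has to be read loosely.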

I spent part of the morning prior to all this with Foreman-Mackey, discussing plans for his trip to Tübingen and Heidelberg. We want to work further on noise modeling or calibration, in the context of stellar variability, exoplanet search, and asteroseismology. We discussed Schölkopf's causal arguments and why we get overfitting despite them; I don't yet understand what is the appropriate "large data" limit at which the relevant theorems are going to hold.