fail fast!

It was a low-research day! The most productive moment came early in the morning, when I had a great discussion with Boris Leistedt (NYU) and Adrian Price-Whelan (Princeton) about the structure of my group meetings at CCA. We need to change them; we aren't hearing enough from young people, and we aren't checking in enough on projects in progress. They agreed to lead a discussion of this in both group meetings tomorrow, and to make format changes, to be implemented immediately. I have to learn that failing to do things right is only bad if we don't learn from it and try to do better.

Right after this conversation, Price-Whelan and I got into a short discussion about making kinematic age estimates for stars, using widely separated, nearly co-moving pairs. I hypothesized that for any real co-moving pair, the separation event (the spatial position and time at which the two stars were last co-spatial) will be better estimated than either the relative velocity or the separation, given (say) Gaia data.
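As a sanity check on that hypothesis, here is a toy version of the geometry (all names, units, and numbers are mine, for illustration only): under straight-line relative motion, the time and distance of closest approach follow from just the relative position and velocity of the pair.

```python
import math

def separation_event(dx, dv):
    """Time and distance of closest approach for a pair of stars, given
    relative position dx (pc) and relative velocity dv (pc/Myr), assuming
    straight-line (unaccelerated) relative motion."""
    dv2 = sum(v * v for v in dv)
    # t_min minimizes |dx + t * dv|; negative t_min puts the event in the past
    t_min = -sum(x * v for x, v in zip(dx, dv)) / dv2
    d_min = math.sqrt(sum((x + t_min * v) ** 2 for x, v in zip(dx, dv)))
    return t_min, d_min

# made-up pair: separated by 1 pc along x, drifting apart at 0.1 pc/Myr
t_min, d_min = separation_event([1.0, 0.0, 0.0], [0.1, 0.0, 0.0])
```

Propagating Gaia-like uncertainties in dx and dv through this function would be one way to test whether the separation event really is better constrained than either input.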


precise ages from imprecise indicators; QCD dark matter

In the morning, I met with Ruth Angus (Columbia) to discuss the ages of stars. We brainstormed all possible age estimates for stars, and listed some limitations and epistemological properties. In addition to the usuals (rotation, activity, isochrone fitting, asteroseismology, and so on), we came up with some amusing options.

For example, the age of the Universe is a (crude, simple, very stringent) age estimate for every single star, no matter what. It is a very low-precision estimate, but it is unassailable (at the present day). Another odd one is the separation of comoving pairs. In principle every co-moving pair provides an age estimate given the relative velocity and relative position, with the proviso that the stars might not be co-eval. This is a good age estimate except when it isn't, and we only have probabilistic information about when it isn't.

We then wrote down the basic idea for a project to build up a hierarchical model of all stellar ages, where each star gets a latent true age, and every age indicator gets latent free parameters (if there are any). Then we use stars that overlap multiple age indicators to simultaneously infer all free parameters and all ages. The hope—and this is a theme I would like to thread throughout all my research—is that many bad age indicators (and they are all bad for different reasons) will nonetheless combine to produce precise age estimates for many stars.
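The intuition behind that hope can be shown with a toy calculation (made-up numbers, and plain inverse-variance weighting rather than the full hierarchical model): five individually imprecise, unbiased age indicators combine into one estimate more precise than any of them.

```python
import math, random

random.seed(42)
true_age = 4.5  # Gyr, latent

# five imprecise, unbiased indicators with different (known) uncertainties (Gyr)
sigmas = [2.0, 3.0, 1.5, 4.0, 2.5]
measurements = [random.gauss(true_age, s) for s in sigmas]

# maximum-likelihood combination under Gaussian noise: inverse-variance weights
weights = [1.0 / s ** 2 for s in sigmas]
combined = sum(w * m for w, m in zip(weights, measurements)) / sum(weights)
combined_sigma = math.sqrt(1.0 / sum(weights))  # beats every individual sigma
```

The real project is harder than this because the indicators have unknown latent parameters and correlated failure modes; the hierarchical model is what lets those be inferred jointly with the ages.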

At lunch-time, Glennys Farrar (NYU) gave an energizing black-board talk about a dark-matter candidate that exists within QCD, made up of a highly symmetric state of six quarks. QCD is a brutal theory, so it is hard to compute the properties of this state, or its stability, but Farrar laid out some of the conditions under which it is a viable dark-matter candidate. It is very interesting phenomenologically if it exists, because it has a non-trivial cross-section for scattering off of atomic nuclei, and it could be involved in baryogenesis or the matter–anti-matter asymmetry.


#AAAC, day 2

[This is the 12th birthday of this blog, and something like the 2814th post. I remain astonished that anyone reads this blog; surely it qualifies as one of the least interesting sites on the internet.]

I spent today again inside NSF headquarters. It was a good day, because most of our session was pure unstructured discussion of the issues—not presentations from anyone—in open session. All of the AAAC sessions are completely open, with an agenda and a call-in number open to literally anyone on the planet. This openness was also part of our discussion: we talked about the opaque process by which the Decadal Survey (which is so damned important) is executed and staffed. As part of this, I published the non-disclosure agreement that the National Academy of Sciences asks people to sign if they are going to participate. It is way too strong, I think.

We also talked about many other interesting priorities and issues for our report. One is that the America Competes Act explicitly refers to the astrophysics decadal process as an exemplar for research funding priority-setting in the US government. Another is that the freedom of scientists in government agencies to clearly and openly communicate without executive-branch interference is absolutely essential to everything we do. Another is that the current (formalized, open) discussion about CMB Stage-4 experiments is an absolutely great example of inter-agency, inter-institutional, and cross-rivalry cooperation that will lead to a very strong proposal for the agencies, the community, and the Decadal Survey.

One very important point, which also came up at #AAS229, is that if we are going to make good, specific, actionable recommendations to the Decadal Survey about the state of the profession, about software, or about funding structures, we need to gather data now. These data are hard to gather; there are design and execution issues all over; let's have that conversation right now.


#AAAC, day 1

Today was the first day of an Astronomy and Astrophysics Advisory Committee meeting at NSF headquarters in Washington, DC. We had presentations from the agencies for most of the day. Random things that I learned that interest me follow in this blog post. Our meetings are open, by the way.

NSF is trying to divest from facilities in ways that keep them running under other partners, so even though they may leave the NSF portfolio, they will at least stay part of the community. In particular, they are working to offload Arecibo to a combination of NASA and private partners.

NASA has taken its ATP theory call down to once every two years, but has not reduced funding. The hope is that this will increase the amount of funding per submitted proposal, and the early data suggest it might. NASA and NSF have started a joint funding program called TCAN for computational methods in astrophysics. That might affect me! NASA also re-balanced its postdoctoral fellowships, in response to concerns about the pipeline, long-term trends in its own funding portfolio, and the rise of private fellowships. This is debatable and controversial, though they did not enter into the decision lightly. What is not controversial is that they have combined all the fellowships into a common application process, substantially reducing the workload on applicants and referees.

There is an extremely big and serious CMB S-4 process going on, in which many traditionally rival scientific groups are cooperating to find consensus around what to build or do next. That's very healthy for the field, I think, and will create a very strong set of ideas for the next Decadal Survey to discuss. The Decadal Survey is on the agenda for tomorrow!

Towards the end of the day, Paul Hertz (NASA) and I got into a fight about the Deep Space Network. I fear that I might be wrong here; I can't really claim to understand that stuff better than Hertz!


stellar mergers and oscillations; cosmological dictionaries

Today was group-meeting day. In stars group meeting, Matteo Cantiello (CCA) discussed recent results on star–star interactions, including a star–star merger that may have been caught by the OGLE experiment. He gave us some order-of-magnitude thinking about the common-envelope phase and how we might use these events to understand stars. He was pessimistic about being able to do full simulations of the events; there are too many things happening at too many scales. He also showed us another tight binary system which shows period changes that suggest a merger in 2022.

Dan Foreman-Mackey (UW) spoke about linear algebra and asteroseismology. With Eric Agol (UW) he has developed linear algebra techniques such that he can solve matrix equations in linear time (and also take the determinant, which is super-important), provided that the matrix is a kernel matrix of a certain (very flexible) form. This form is capable of modeling a star's light curve as a mixture of stochastically driven oscillators. This raises the hope of automatically getting asteroseismic parameters for all TESS stars! In the discussion, we arrived at the idea of using Kepler to measure the three-point function for stellar variability. David Spergel (Flatiron) predicted that it would lead to constraints on mode coupling and other aspects of stellar physics.
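I don't want to misrepresent the Foreman-Mackey and Agol kernels (which are mixtures of stochastically driven, damped oscillators), but the simplest member of that family, the exponential (Ornstein-Uhlenbeck) kernel, already shows how a special kernel form yields the solve and the determinant in linear time. A pure-Python sketch, with names of my own choosing:

```python
import math

def ou_loglike(t, y, sigma2, ell):
    """O(N) Gaussian log-likelihood for the Ornstein-Uhlenbeck (exponential)
    kernel K_ij = sigma2 * exp(-|t_i - t_j| / ell), exploiting its Markov
    property; a toy stand-in for the more flexible kernels discussed above."""
    n = len(t)
    logdet = n * math.log(sigma2)
    quad = y[0] ** 2 / sigma2
    for i in range(1, n):
        r = math.exp(-(t[i] - t[i - 1]) / ell)  # correlation to previous point
        v = sigma2 * (1.0 - r * r)              # innovation variance
        logdet += math.log(1.0 - r * r)
        quad += (y[i] - r * y[i - 1]) ** 2 / v
    return -0.5 * (quad + logdet + n * math.log(2 * math.pi))
```

A dense N-by-N kernel matrix would cost O(N^3) for the same solve-plus-determinant; linear scaling is what makes asteroseismology for all TESS stars thinkable.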

Cosmology group meeting was crashed by Daniel Mortlock (Imperial) and Hiranya Peiris (UCL). Mortlock told us that there are still very high-redshift quasars being discovered, but that he still has the redshift record, and that, given Eddington time-scales, his is still the most extreme high-redshift quasar. This was followed by a wide-ranging discussion (led by Elijah Visbal, Flatiron) of the possibility that we could be using generative models or better estimators than two-point functions in 21-cm surveys designed to discover the physics of reionization of the Universe. Peiris brought up dictionary methods and we spent time discussing these, and the possibility that we could learn sparse dictionaries on simulations and use them on data. It was very vague, but gives me ideas about where we at CCA need to learn more about methodologies.


gaussian-process stellar spectrum

Today was hack day, with Ruth Angus (Columbia), Megan Bedell (Chicago), Dan Foreman-Mackey (UW), and me all working on various things in parallel at NYU. Bedell and Foreman-Mackey got the Gaussian-process stellar spectroscopy model working for Bedell's HARPS data, and it is blazingly fast: much faster than the code Bedell and I wrote that fits a dense spline. The speed comes from magic that Eric Agol (UW) and Foreman-Mackey are making happen for GP kernels of a particular (very flexible) form. We made various pragmatic decisions in this project today, like working in log flux (rather than flux) and optimizing an error model along with the other hyper-parameters. These all look like good decisions in the short run.


GP stellar spectrum, explosions

Megan Bedell (Chicago) and Dan Foreman-Mackey (UW) came into town for a few days of hacking on stellar spectra. We had long discussions about the point and scope of our project, and made plans for the week. Foreman-Mackey argued that we should switch over to a Gaussian-Process model for the stellar spectrum. That seems sensible, in part because he has the fastest code in the world for that. He didn't object to our “fit and subtract” approach to looking for stellar variability in the spectral domain: As Andrew Gelman (Columbia) teaches us, inspecting residuals is how you make choices for model extension, improvement, and elaboration.

After lunch, Maryam Modjaz (NYU) gave a great, wide-ranging talk about her work on supernovae, supernova progenitors, chemical abundances, and the supernova–GRB connection. As I have commented here before, I think her results, which show that broad-line Type Ic supernovae with and without associated gamma-ray bursts live in different kinds of environments, put strong pressure on any model of GRB beaming. I also learned in her talk that there are new classes of transients, brighter than classical novae and fainter than supernovae, that are currently unexplained.


stars tracing dark matter; unresolved stars

I had lunch with Mariangela Lisanti (Princeton), where we talked about seeing the dark matter in the Milky Way using stars and stellar dynamics. One simple thing we discussed is the following: To what extent do extremely old stars in the halo trace the dark matter? There are good theoretical reasons that they should come close, but also good theoretical reasons that they should not be perfect tracers. It would be interesting to know whether we could get a very accurate view of the dark-matter distribution in space just by looking at the positions of some carefully chosen set of stars.

After this I went through the talk slides MJ Vakili (NYU) has prepared for Berkeley next week. He has a great set of results, and an impressive talk. I also discussed an ancillary science proposal for APOGEE with Gail Zasowski (STScI): We want to look in M31 for the chemical abundance trends (with kinematics and galactocentric radius) that we see in the Milky Way by taking APOGEE spectra and then deconvolving (modeling) them as a linear superposition of stars with different chemistry and kinematics. That would be living the dream!


tidal disruptions are not trivial

Today there was a great, educational visit by James Guillochon (CfA) to the Flatiron Institute. Guillochon led a discussion about tidal disruption events (stars disrupted by black holes) and the transient phenomena they should make. He (perhaps unintentionally?) sowed doubt in my mind that the things currently classified as TDEs are in fact TDEs: There ought to be a huge range of phenomena, depending on the star, the black hole, and the orbit. He gave a beautiful answer to my question about seeing the star brighten simply because of the tidal distortion (which is immense: the star stretches out to thousands of AU in length): He predicted that there should be a rapid recombination. That, of course, made me think that we should look for H-alpha or Lyman-alpha flashes! After this discussion, he and I discussed his work assembling all (and really all) published data about supernovae (photometry and spectroscopy), which is a scraping project of immense scope.


blind analysis; hierarchical models

In stars group meeting, we discussed two hierarchical models of the Gaia TGAS data. Keith Hawkins (Columbia) is building a model of the red clump stars; he finds that if he selects red-clump stars carefully, they are very good standard candles. His hierarchical model determines this and also de-noises the parallaxes for them. Boris Leistedt (NYU) went even further and deconvolved the full color-magnitude diagram, though with a baroque hierarchical model that includes a highly parameterized model for the density of stars in color–magnitude space.

In cosmology group meeting, a lot happened, with Josh Speagle (CfA) talking about next-generation photometric redshifts, Mike Blanton (NYU) talking about the huge NSF proposal we are putting in at NYU around physics and data science, and David Spergel (Flatiron), Blanton, and me arguing about blind analyses. The latter subject is rich with issues. We want (and need) exploratory data analysis, but we also want (and need) secure statistical results without p-hacking, forking paths, and so on. There was disagreement among the group, but I argued that you can have it all if you design right. There are interesting conflicts with open science there.

I also met with Francisco Villaescusa (Flatiron) to talk about work on neutrinos in large-scale structure. He promised me some papers to read on the subject.


Ohio State University

I spent the day at Ohio State University today, visiting the Physics and Astronomy Departments. I had a great time! Too many things happened to mention them all, but here are some highlights:

At the arXiv coffee discussion, among other results we discussed a new paper by Radek Poleski and colleagues in which they identify hundreds of thousands of variable stars in the OGLE data set. Poleski showed some examples, which included eclipsing binaries in which both stars are elongated tidally (and it is obvious in the light curves), transiting exoplanets, transiting exoplanets where the transits come and go as if there were rapid precession, periodic variables, periodic variables with changing periods (hence possibly accelerations), and so on. He claims that every single one of the variables was checked by hand.

In the stars group meeting run by Jennifer Johnson, we discussed mainly stellar rotation, and how it connects to age and stellar evolution. Johnson runs the group meeting such that each participant brings a figure (if they can) and that figure is discussed and improved. That's a good idea. Also in that meeting we discussed what to do if your result is scooped!

In the galaxies group meeting towards the end of the day, we argued about the dimensionality of chemical-abundance space, both in theory and in the observations, with me arguing that the observational space obviously has higher dimensionality than any theory space. But David Weinberg challenged me, forced me to sharpen my arguments, and also made me make better plans for whatever paper I am going to write about this!



I spent the day writing reports for the National Science Foundation on my grants. Does this count as research? I guess it does, in the long run! The break in the day was a long lunch with Boris Leistedt (NYU) in which we discussed his priorities for research in the near term, and also his paper on an empirical model of stars from the Gaia data.


photometric redshifts at low redshift!

Marla Geha (Yale) came into town today to ask Boris Leistedt (NYU) and me whether our new photometric-redshift method could be used at very low redshifts. In general, low redshifts are difficult because even a large fractional change in the distance creates only a small change in the colors when the redshift is small. Geha (with Wechsler and others) has a sample of possible (intrinsically) faint satellites of low-redshift galaxies, and they would like to improve the efficiency of their spectroscopic follow-up. A great project for a great tool!



In my research time today, I pair-coded a visualization of a two-d mixture of Gaussians with Lauren Anderson (CCA). This involves a little linear algebra.
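For the record (and because I will forget), the "little linear algebra" is just the closed-form eigendecomposition of a 2x2 covariance matrix, which gives the orientation and axis lengths of each Gaussian's ellipse. A sketch, with function and argument names of my own invention:

```python
import math

def gaussian_ellipse(mean, cov, nsigma=1.0, n=64):
    """Points on the nsigma contour of a 2-d Gaussian with symmetric
    covariance cov = [[a, b], [b, c]], via the closed-form 2x2
    eigendecomposition."""
    (a, b), (_, c) = cov[0], cov[1]
    tr, det = a + c, a * c - b * b
    lam1 = tr / 2 + math.sqrt(tr * tr / 4 - det)  # larger eigenvalue
    lam2 = tr / 2 - math.sqrt(tr * tr / 4 - det)  # smaller eigenvalue
    theta = 0.5 * math.atan2(2 * b, a - c)        # major-axis orientation
    pts = []
    for k in range(n):
        phi = 2 * math.pi * k / n
        x = nsigma * math.sqrt(lam1) * math.cos(phi)
        y = nsigma * math.sqrt(lam2) * math.sin(phi)
        pts.append((mean[0] + x * math.cos(theta) - y * math.sin(theta),
                    mean[1] + x * math.sin(theta) + y * math.cos(theta)))
    return pts

# 1-sigma contour of an axis-aligned Gaussian, variances 4 and 1
pts = gaussian_ellipse((0.0, 0.0), [[4.0, 0.0], [0.0, 1.0]])
```

The returned points can be handed straight to any plotting library, one ellipse per mixture component.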


so many things (I love Wednesdays)

In the stars group meeting at CCA, there was huge attendance today. David Spergel (CCA) opened by giving a sense of the WFIRST GO and GI discussion that will happen this week at CCA. The GI program is interesting: It is like an archival program within WFIRST. This announcement quickly turned into an operational discussion about what WFIRST can do to avoid saturating bright stars.

Katia Cunha (Observatorio Nacional, Brazil) spoke about two topics in APOGEE. The first is that they have found new elements in the spectra! They did this by looking at the spectra of s-process-enhanced stars (metal-poor ones) and finding strong, unidentified lines. This is exciting, because before this, APOGEE had no measurements of the s process. The second topic is that they are starting to get working M-dwarf models, which is a first, and can measure the abundances of 13 elements in M dwarfs. Verne Smith (NOAO) noted that this is very important for the future use of these spectrographs and for exoplanet science in the age of TESS. On this latter point, the huge breakthrough was in improvements to the molecular line lists.

Dave Bennett (GSFC) talked to us about observations of the Bulge with K2 and other instruments to do microlensing, microlensing parallax, and exoplanet discovery. He noted that there isn't a huge difference between doing characterization and doing search: The photometry has to be good to find microlensing events and not be fooled by false positives. He is in NYC this week working with Dun Wang (NYU).

Jeffrey Carlin (NOAO) led a discussion of detailed abundances for Sagittarius-stream stars as obtained with a CFHT spectrograph fiber-fed from Gemini N. These abundances might unravel the stream for us, and inform dynamical models. This morphed into a conversation about why the stellar atmosphere models are so problematic, which we didn't resolve (surprised?). I pitched a project in which we use Carlin's data at high resolution to train a model for the LAMOST data, as per Anna Y. Q. Ho (Caltech), and then do science with tens of thousands of stars.

In the cosmology group meeting, we discussed the possibility of evaluating (directly) the likelihood for a CMB map or time-ordered data given the C-ells and a noise model. As my loyal reader knows, this requires not just performing solve (inverse multiplication) operations but also (importantly) determinant evaluations. For the discussion, mathematicians Mike O'Neil (NYU), Leslie Greengard (CCA), and Charlie Epstein (Penn) joined us, with O'Neil leading the discussion about how we might achieve this, computationally. O'Neil outlined two strategies, one of which takes advantage of a possible HODLR form (Ambikasaran et al), the other of which takes advantage of the spherical-harmonic transform. There was some disagreement about whether the likelihood function is worth computing, with Hogg on one end (guess which) and Naess and Hill and Spergel more skeptical. Spergel noted that if we could evaluate the LF for the CMB, it opens up the possibility of doing it for LSS or intensity mapping in a three-dimensional (thick) spherical shell (think: redshift distortions and fingers of god and so on).
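To be concrete about why the determinant matters: the Gaussian log-likelihood needs both a solve (C applied inversely to the data) and log det C, and a Cholesky factorization delivers both at once. This dense toy (my own names) is O(N^3), which is exactly the scaling the HODLR and spherical-harmonic strategies are meant to beat:

```python
import math

def cholesky(C):
    """Lower-triangular L with C = L L^T (dense, O(N^3))."""
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(C[i][i] - s)
            else:
                L[i][j] = (C[i][j] - s) / L[j][j]
    return L

def gaussian_loglike(d, C):
    """log N(d | 0, C): one factorization gives both the solve and the
    log-determinant."""
    L = cholesky(C)
    n = len(d)
    # forward-substitute L z = d, so that d^T C^-1 d = z^T z
    z = []
    for i in range(n):
        z.append((d[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (sum(v * v for v in z) + logdet + n * math.log(2 * math.pi))
```

For a CMB-sized map, n is in the millions, so the whole game is replacing this dense factorization with something structured.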

Between meetings, I discussed deconvolutions of the TGAS color-magnitude diagram with Leistedt and Anderson, and low-hanging fruit in the comoving-star world with Oh and Price-Whelan.


unsupervised models of stars

I am very excited these days about the data-driven model of stellar spectra that Megan Bedell (Chicago) and I are building. In its current form, all it does is fit multi-epoch spectra of a single star with three sets of parameters: a normalization level (one per epoch) times a wavelength-by-wavelength spectral model (one parameter per model wavelength) shifted by a Doppler shift (one per epoch). This very straightforward technology appears to fit the spectra to something close to the photon-noise limit (which blows me away). The places where it doesn't fit appear to be interesting. Some of them are telluric absorption residuals, and some are intrinsic variations in the lines in the stellar spectra that are sensitive to activity and convection.
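A minimal sketch of that forward model (toy names of my own; the real code fits the template wavelength-by-wavelength and treats the spectrograph properly):

```python
C_KMS = 299792.458  # speed of light, km/s

def interp(xs, ys, x):
    """Simple linear interpolation, a stand-in for the real spectral model."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            w = (x - xs[i]) / (xs[i + 1] - xs[i])
            return (1 - w) * ys[i] + w * ys[i + 1]
    return ys[0] if x < xs[0] else ys[-1]

def predicted_flux(model_wave, model_flux, obs_wave, norm, v_kms):
    """One epoch of the model: a normalization times the template,
    Doppler-shifted by the epoch's velocity (non-relativistic)."""
    z = v_kms / C_KMS
    return [norm * interp(model_wave, model_flux, w / (1.0 + z))
            for w in obs_wave]

# flat toy template: the prediction is just the per-epoch normalization
flux = predicted_flux([5000.0, 5001.0, 5002.0], [1.0, 1.0, 1.0],
                      [5000.5, 5001.5], 0.9, 10.0)
```

Fitting is then optimizing norm and v_kms for each epoch jointly with the template values themselves.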

Today we talked about scaling this all up; right now we can only do a small part of the spectrum at a time (and we have a few hundred thousand spectral pixels!). We also spoke about how to regress the residuals against velocity or activity. The current plan is to investigate the residuals, but of course if we find anything we should add it in to the generative model and re-start.



Not much research today, but I did have conversations with Lauren Anderson (Flatiron) about deconvolving the observed (by Gaia TGAS and APASS) color-magnitude diagram of stars, with Leslie Greengard (Flatiron) and Alex Barnett (Dartmouth) about cross-over activities between CCA and CCB at Flatiron, and with Kyle Cranmer (NYU) about his immense NSF proposal.


#hackAAS at #aas229

Today was the (fifth, maybe?) AAS Hack Day; it was also the fifth day of #aas229. As always, I had a great time and great things happened. I won't use this post to list everything from the wrap-up session, but here are some personal, biased highlights:

Inclusive astronomy database
Hlozek, Gidders, Bridge, and Law worked together to create a database and web front-end for resources that astronomers can read (or use) about inclusion and astronomy, inspired in part by things said earlier at #aas229 about race and astronomy. Their system is just a prototype, but it has a few things in it already, and it is designed both to help you find resources and to let you add new ones.
Policy letter help tool
Brett Morris led a hack that created a web interface into which you can input a letter you would like to write to your representative about an issue. It searches for words that are bad to use in policy discussions and asks you to change them, and also gives you the names and addresses of the people to whom you should send it! It was just a prototype, because it turns out there is no way right now to automatically obtain representative names and contact information. That was a frustrating finding about the state of #opengov.
Budget planetarium how-to
Ellie Schwab and a substantial crew put together a budget and resources for building a low-cost but fully functional planetarium. One component was the WorldWide Telescope (WWT), which is now open source.
Differential equations
Horvat and Galvez worked on solving differential equations using basis functions, to learn (and re-learn) methods that might be applicable to new kinds of models of stars. They built some notebooks that demonstrate that you can easily solve differential equations very accurately with basis functions, but that if you choose a bad basis, you get bad answers!
K2 and the sky
Stephanie Douglas made an interface to the K2 data that shows a postage stamp from the data, the light curve, and then aligned (overlaid, even) imaging from other imaging surveys. This involved figuring out some stuff about K2's world coordinate systems, and making it work for the world.
Poster clothing
Once again, the sewing machines were out! I actually own one of these now, just for hack day. Pagnotta led a very successful sewing and knitting crew. Six of the team used a sewing machine for the first time today! In case you are still stuck in 2013: The material for sewing is the posters, which all the cool kids have printed on fabric, not paper these days!
Hack archiving
Erik Tollerud built some tools for the long-term storage and archiving of #hackAAS hacks. These leverage GitHub under the hood.

There were many other hacks, including people learning how to use testing and integration tools, people learning to use the ADS API, people learning how to use version control and GitHub, testing of different kinds of photometry, and visualization of various kinds of data. It was a great day, and I can't wait for next year.

Huge thanks to our corporate sponsor, Northrop Grumman, and my co-organizers Kelle Cruz, Meg Schwamb, and Abigail Stevens. NG provided great food, and Schwamb did a great job helping everyone in the room understand the (constructive, open, friendly, fun) point of the day.


#aas229, day 4

I arrived at the American Astronomical Society meeting this morning, just in time (well, a few minutes late, actually) for the Special Session on Software organized by Alice Allen (ASCL). There were talks about a range of issues in writing, publishing, and maintaining software in astrophysics. I spoke about software publications (slides here) and software citations. Not only were the ideas in the session diverse, the presenters had a wide range of backgrounds (three of them aren't even astronomers)!

There were many interesting contributions to the session. I was most impressed with the data that people are starting to collect about how software is built, supported, discovered, and used. Along those lines, Iva Momcheva (STScI) showed some great data she took about how software projects are funded and built. This follows great work she did with Erik Tollerud (STScI) on how software is used by astronomers (paper here). In their new work, they find that most software is funded by grants that are not primarily (or in many cases not even secondarily) related to the software, and that most software is written by early-career scientists. These data have great implications for the next decade of astrophysics funding and planning. In the discussion afterwards, there were comments about how hard it is to fund the maintenance of software (something I feel keenly).

Similarly, Mike Hucka (Caltech) showed great results he has on how scientists discover software for use in their research projects (paper here). He finds (surprise!) that documentation is key, but there are many other contributing factors to make a piece of research software more likely to be used or re-used by others. His results have strong implications for developers finishing software projects. One surprising thing is that scientists are less platform-specific or language-specific in their needs than you might think.

I spent part of the afternoon hiding in various locations around the meeting, hacking on an unsupervised data-driven model of stellar spectra with Megan Bedell (Chicago).


making slides

My only real research accomplishment today was to make slides for my AAS talk on software publications, which is for a special session organized by Alice Allen (ASCL). The slides are available here.


carbon stars, regulation of star formation, and so much more

Hans-Walter Rix (MPIA) called me to discuss the problem that when we compare the chemical abundances in pairs of stars, we get stars that are more identical than we expect, given our noise model for chemical abundances. That is, we see things with chi-squared (far) less than the number of elements. This means (I think) that our noise estimation is overly conservative: There are (at least some) stars that we are observing at very good precision. Further evidence for my view is that there are more such (very close) pairs within open clusters than across open clusters (or in the field).
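The diagnostic here is simple enough to write down (made-up numbers below): the chi-squared of the abundance differences, under the quoted per-element errors, should scatter around the number of elements if the noise model is right.

```python
def pair_chisq(abund1, abund2, err1, err2):
    """Chi-squared of the abundance difference between two stars, under an
    assumed (possibly too-conservative) per-element noise model."""
    return sum((a1 - a2) ** 2 / (e1 ** 2 + e2 ** 2)
               for a1, a2, e1, e2 in zip(abund1, abund2, err1, err2))

# toy pair: 5 elements, differences much smaller than the quoted 0.05 dex errors
x1 = [0.10, -0.05, 0.20, 0.00, 0.15]
x2 = [0.11, -0.04, 0.19, 0.01, 0.15]
err = [0.05] * 5
chi2 = pair_chisq(x1, x2, err, err)  # should be ~5 if the errors are right
```

Values far below the number of elements, as here, are the symptom under discussion: the quoted errors are larger than the true ones, at least for some stars.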

In stars group meeting, Jill Knapp (Princeton) spoke about carbon stars (stars with more carbon than oxygen, and I really mean more in counts of atoms). She discussed dredge-up and accretion origins for these, and how we might distinguish them. She has some results on the frequency of carbon stars as a function of expected (from stellar models) surface-convection properties, which suggest an accretion origin. But it is early days.

Chang-Goo Kim (Princeton) told us about simulations that are designed to understand the regulation of star formation in galaxy disks (kpc scales). He pointed out the importance of gravity in setting the star-formation rate; these arguments are always reminiscent (to me) of the Eddington argument. His simulations include supernova feedback in the form of mechanical and radiation energy, plus magnetic turbulence and cosmic-ray pressure. He emphasized that conclusions about feedback-regulated star formation depend strongly on assumptions about the spatial correlations and locations (think: escape over time) of the supernovae relative to the dense molecular clouds in which the star formation occurs. Fundamentally, the thing that sets the star-formation rate is the pressure, which can be hydrostatic or turbulent or both.

Semyeong Oh (Princeton) and I led a discussion on the lowest-hanging fruit for projects that exploit her comoving star (and group) catalog from TGAS. Some of the lowest-hanging include investigations of the locations of the pairs in phase space, to look at heating, age, and formation mechanisms.


deconvolution of labels

Lauren Anderson (CCA) and I discussed the state of our project to put spectroscopic parameters onto photometrically discovered stars using colors and magnitudes from APASS, parallaxes from Gaia TGAS, and spectroscopic parameters from the RAVE-on Catalog. We want to take the nearby neighbors in color-magnitude space and deconvolve their noisy spectroscopic parameters to make a less noisy estimate for (what you might call) the test objects. We have been using extreme deconvolution (Bovy et al.) for this, deconvolving the labels for the nearest neighbors (weighted by a likelihood). That is, find neighbors first, deconvolve second. After hours staring at the white board, we decided that maybe we should just deconvolve all the inputs up front, and do inference under the prior created by that deconvolution. Question: Is this computationally feasible?
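As a caricature of the deconvolve-first plan (one label, one Gaussian, and moment matching instead of the full extreme-deconvolution EM; all names are mine):

```python
import statistics

def deconvolved_prior(y, sigma):
    """Moment-matching 1-d 'deconvolution': noisy labels y_i = x_i + noise
    with per-object noise sigma_i, true labels x ~ N(mu, tau^2)."""
    mu = statistics.fmean(y)
    noise_var = statistics.fmean([s * s for s in sigma])
    tau2 = max(statistics.pvariance(y) - noise_var, 0.0)  # deconvolved variance
    return mu, tau2

def denoised_label(y_i, sigma_i, mu, tau2):
    """Posterior mean of a true label under the deconvolved Gaussian prior:
    the noisy measurement shrinks toward the population mean."""
    w = tau2 / (tau2 + sigma_i ** 2)
    return w * y_i + (1 - w) * mu

mu, tau2 = deconvolved_prior([0.0, 1.0, 2.0, 3.0, 4.0], [1.0] * 5)
clean = denoised_label(4.0, 1.0, mu, tau2)  # shrunk toward mu
```

The real problem is multi-dimensional and needs a mixture of Gaussians, which is exactly where the computational-feasibility question bites.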