At an undisclosed location in the Catskills, I worked on getting a talk and slides ready for the Local Group Astrostatistics meeting next week in Ann Arbor. Since the whole Stream Team will be there to talk about our stuff on streams, I plan on talking about Bovy and my April-Fools paper on the Solar System (and what that means for dynamical inference in the Milky Way) and The Cannon (because I want to encourage the crew to think about modeling the stars, too). I am not ready for my talk yet!
The close reader (there's a reader, I know, but a close reader?) of this blog will have noticed that CampHogg is in a transition. We used to think that projects producing "catalogs" from telescope data ought to produce likelihood information. Now we think this is probably impossible in general, and we will have to live with (at best) posterior probability information, under some priors. We discussed this in group meeting, in particular related to Malz's project and LSST. The photometric redshift system LSST expects to create using cross-correlations (with, say, quasars) will (if the method works) produce posterior probability distribution function estimates (not values, but estimates, which is scary).
Key questions we identified include the following: Understand what the effective priors are for that procedure. Understand what the deep assumptions are behind the cross-correlations; I think they have to be at large scales to work properly (linear bias and all that). And understand whether it has been demonstrated to work, empirically. I got all confused at the end of group meeting about the fact that the method generates noisy estimates of a pdf, which is a strange meta issue. What do you do about a noisily known posterior pdf?
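To make the meta issue concrete, here is a toy sketch (my own cartoon, with nothing to do with any actual LSST pipeline; the grid, the noise level, and the clip-and-renormalize step are all invented) of the kind of choice a noisy pdf estimate forces on you. A cross-correlation estimator can return negative values, and no pdf can be negative:

```python
import numpy as np

# Toy sketch (no relation to any real LSST pipeline): suppose the
# cross-correlation method returns several noisy, possibly-negative
# estimates of p(z) on a grid. One crude stabilization is to average
# the realizations, clip negative values, and renormalize.
def stabilize_pdf(z_grid, noisy_estimates):
    """Combine noisy p(z) realizations into one normalized pdf."""
    dz = z_grid[1] - z_grid[0]
    mean_est = np.mean(noisy_estimates, axis=0)
    clipped = np.clip(mean_est, 0.0, None)  # a pdf cannot go negative
    return clipped / (clipped.sum() * dz)

z = np.linspace(0.0, 2.0, 201)
true_pdf = np.exp(-0.5 * ((z - 0.8) / 0.1) ** 2)
true_pdf /= true_pdf.sum() * (z[1] - z[0])
rng = np.random.default_rng(17)
noisy = true_pdf + 0.2 * rng.normal(size=(50, z.size))
p_hat = stabilize_pdf(z, noisy)
```

Of course, clipping and renormalizing imposes an effective prior of its own, which is exactly the worry above.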
José Garmilla (Princeton) showed up for group meeting, and talked to us about star–galaxy separation, which is of great interest to us at CampHogg. He has been trying many things—including many of the methods we at CampHogg have been advocating—all with HyperSuprimeCam data, and he has been hitting many of the problems Fadely and I have been hitting over the years: things break down at faint magnitudes, and even the existence of a truth table or training set is hard to establish once you get below about 25th magnitude.
In that same group meeting, Blanton introduced the idea of SDSS fiber collisions and the methods that are used to mitigate them. This is a seemingly trivial question: What to do with the galaxies that don't get redshifts because of proximity to another galaxy that does get a redshift? But when it comes to two-point statistics, there is no trivial answer.
After lunch, Kopytova and I pair-coded some of her spectral-fitting code, implementing the ideas we had about fitting simultaneously for a bunch of calibration or normalization parameters along with the stellar parameters. The code we (very hastily) wrote seems to work well, but of course it has model-complexity parameters (or, really, hyperparameters). That's to think about tomorrow!
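For the record, the linear-algebra heart of what we implemented looks something like the following sketch (hedged: this is the general idea, not Kopytova's actual code; the polynomial form of the calibration and the function name are my inventions here). For a fixed stellar template, the calibration coefficients enter linearly, so weighted linear least squares solves for them, and the stellar parameters that set the template would be optimized in an outer loop:

```python
import numpy as np

# Minimal sketch (assumed form, not the actual code): model the
# observed spectrum as a fixed stellar template times a smooth
# polynomial "calibration". The polynomial degree is the
# model-complexity hyperparameter mentioned above.
def fit_calibration(wavelength, flux, ivar, template, degree=2):
    x = (wavelength - wavelength.mean()) / np.ptp(wavelength)
    A = np.vander(x, degree + 1) * template[:, None]  # design matrix
    w = np.sqrt(ivar)
    coeffs, *_ = np.linalg.lstsq(A * w[:, None], flux * w, rcond=None)
    return coeffs, A @ coeffs  # coefficients and best-fit spectrum
```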
My research day started late morning, when I met with Heather Knutson (Caltech), Marta Bryan (Caltech), and Henry Ngo (Caltech) to discuss population inferences involving long-period Jupiter-like planets. For each star, they get a velocity trend ("acceleration", I would call it) and a curvature ("jerk", I would call it), and they use these to constrain the properties of any long-period companion. We discussed what they are doing, and also my ideal vision of what is the best thing that can be done with these data.
One interesting technical point: When you have an acceleration and jerk that are both consistent with zero, do you treat them as "non-detections" or do you treat them identically to detections, and just compute the likelihood of the data given a long-period companion model? In the context of a full hierarchical model, there is no need (I think) to treat the detections and the non-detections differently. The key thing is to permit the model to include zero-mass planets or non-planets or models of differing complexity.
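To make that concrete, a cartoon in code (invented numbers and function names; not the actual Knutson et al. analysis): evaluate the same Gaussian likelihood for every star, detection or not, and let the companion model include "no companion" (zero trend and curvature).

```python
import numpy as np

# Cartoon: a single star's data are its measured velocity trend
# (acceleration) and curvature (jerk), with Gaussian uncertainties.
def log_likelihood(trend_obs, trend_err, curv_obs, curv_err,
                   trend_model, curv_model):
    """Gaussian log-likelihood of a (trend, curvature) measurement."""
    return (-0.5 * ((trend_obs - trend_model) / trend_err) ** 2
            - 0.5 * ((curv_obs - curv_model) / curv_err) ** 2)

# A "non-detection" (both measurements consistent with zero) still
# yields a finite likelihood under every model, so the hierarchical
# inference needs no special treatment for it:
ll_no_companion = log_likelihood(0.1, 1.0, 0.05, 1.0, 0.0, 0.0)
ll_big_companion = log_likelihood(0.1, 1.0, 0.05, 1.0, 5.0, 0.0)
```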
In my small amount of research time today I worked on a title, abstract, and introduction for a possible paper with Ness on inferring asteroseismological parameters (nu-max and delta-nu) from stellar spectra, using a modification of The Cannon. The project started when a preliminary study suggested that we can get ages. That is, we can train with stars with "known" ages and then infer the ages of other stars. Since the ages ultimately come from asteroseismological parameters, it makes sense to see if we can infer them first. If we can, it might have a big impact on what we think influences stellar spectra.
Today was the first day of Landolt Standards & 21st Century Photometry in Baton Rouge, organized by Pagnotta (AMNH) and Clayton (LSU). I came to speak about self-calibration. The day started with a historical overview by Bessell (MSSSO), who gave a lovely talk filled with profiles of the many people who contributed to the development of photometric calibration and magnitude systems. Many of the people he talked about (including himself) have filter systems or magnitude systems named after them! Among the many interesting things he touched on was this paper by Johnson, which I have yet to carefully read, but apparently contains some of the philosophy behind standard-star systems. He also discussed the filter choices for the Skymapper project, which seem very carefully considered.
Suntzeff (TAMU) gave an excellent talk about the limitations of the supernova cosmology projects; his main point is that systematic issues with the photometric calibration system are the dominant term in the uncertainty budget. This is important in thinking about where to apportion new resources. He made a great case for understanding physically every part of the photometric measurement system (and that includes the stars, the atmosphere, the telescope, and the detector pixels, among other things). I couldn't agree more!
Grindlay (CfA) blew us away with the scale and content of the DASCH plate-scanning project at Harvard. It is just awesome, in time span, cadence, and sky coverage. Anyone not searching these data is making a mistake! And, as we were recovering from that, Kafka (AAVSO) blew us away again with the scale and scope of the APASS survey, which was designed, built, operated, reduced, and delivered to the public almost entirely by citizen scientists. It is dramatic; we are not worthy!
There were many other great contributions—too many to mention them all—but the day ended with a crawfish boil and then Josh Peek (STScI) and I at the bar discussing recent explosive conversations in the astronomical community around TMT and development in Hawaii.
One last thing I should say: Arlo Landolt (LSU) has had a huge impact on astronomy; his work has enabled countless projects and scientific measurements and discoveries. The development and stewardship of photometric standards and systems, and all the attention to detail it requires, is unglamorous and time-consuming work, ill-suited to most of the community, and yet absolutely essential to everything we do. I can't thank Landolt—and his collaborators and the whole community of photometrists—enough.
I got no research time in today until I headed to the airport, where I worked on my presentation to the Landolt Standards & 21st Century Photometry Symposium in honor of Arlo Landolt (LSU). I will be talking about self-calibration, the set of methods that have replaced Landolt's (amazingly productive and important) standard-star catalogs for the calibration of huge surveys, which can (in principle) now be self-calibrated. I will talk about both the theory and the practice of the self-calibration arts.
Boris Leistedt (UCL) showed up for two days of chatting in preparation for his arrival at NYU this coming Fall. We discussed a set of projects in probabilistic cosmology. In one (which I have discussed previously with Fadely), we imagine what it would look like to infer galaxy redshifts from imaging without either a training set or models of galaxy spectral energy distributions. I feel like it might be possible, with the thought experiment being: What if you were given the photometry of the SDSS Luminous Red Galaxies? Couldn't you figure out their redshifts without being told their underlying spectral energy distributions? At least up to some geometric factor? Leistedt predicts that any method that doesn't use either models or training must do worse than any method that has good training data or models. However, it might make a framework for then including both model (theory) and training (data) information in a principled way.
In another class of subjects, we talked about inference of the density field or the initial conditions. In the first place, if we could infer the density field in patches, we could use that to inform photometric redshifts or reconstruction. In the second, we could perhaps infer the initial conditions (and I mean the phases, not just the power spectrum); this is interesting because the Baryon Acoustic Feature ought to be much stronger and sharper in the initial conditions than it is today. We discussed some conceivably tractable approaches, some based on likelihood-free inference, some based on Gaussian Processes, and some based on machine learning (using simulations as training data!).
In group meeting, Dun Wang talked about astrometric calibration of the GALEX Satellite, and Kat Deck (Caltech) talked about the dynamical evolution of exoplanetary systems. She pointed out that we naively expect lots of planets close in period to be locked in resonances, but in fact such resonances are rare, empirically, in the Kepler sample. She has explanations for this involving the evolving proto-planetary disk.
After lunch, Deck gave the astro seminar, on planetary system stability and the Kepler planets. She discussed chaos, stability, and heuristic stability criteria. One interesting thing is that there really is no non-heuristic stability criterion: We think of a planetary system as "stable" if there are no catastrophic, order-unity changes to any of the orbital osculating elements. That's not really an equation! And at the talk there was some discussion of the point (counter-intuitive and important) that a system can be stable (by our astronomer definition) for far, far longer than the Lyapunov time. Awesome and important.
At the end of the day, Foreman-Mackey and I made the (astonishing) decision to abort and fail on our NASA proposal: We just ran out of time. I am disappointed; we have an eminently fundable pitch. That said, we just didn't start early enough to make that pitch at the level we wanted. Not sure how to feel about it, but I sure need to catch up on sleep!
In the tiny bit of research time available today, I spoke with Kat Deck (Caltech) and Foreman-Mackey about finding planets with large transit-timing residuals. These signals aren't precisely periodic, so some searches could miss them entirely. Deck has simple models (based on perturbation theory) for the variations, so we can in principle add only a few parameters and capture a large range in possible variations. This might make us much more sensitive to extremely important and interesting systems hiding in existing data.
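Schematically (this is my own toy sinusoidal parameterization, not Deck's perturbation-theory model, and all the numbers are invented), the idea is that three extra parameters buy a large space of timing variations:

```python
import numpy as np

# Toy parameterization: transit times deviate from a strict linear
# ephemeris by a small sinusoid, adding only three parameters
# (amplitude, super-period, phase) to the search.
def transit_times(n, t0, period, amp, superperiod, phase):
    """Times of the n-th transits, with sinusoidal timing variations."""
    return (t0 + n * period
            + amp * np.sin(2 * np.pi * n * period / superperiod + phase))

n = np.arange(20)
strict = transit_times(n, 0.0, 10.0, 0.0, 500.0, 0.0)  # amp = 0
wobbly = transit_times(n, 0.0, 10.0, 0.1, 500.0, 0.0)  # 0.1-day TTVs
```

A strictly periodic search template would smear out the `wobbly` signal, which is exactly how such systems hide.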
The rest of the day was spent writing our NASA proposal, with the exception of lunch with Yann LeCun (facebook) and Léon Bottou (facebook) at the NYC office of facebook Research.
At group meeting, Roberto Sanchis-Ojeda (Berkeley) and Kat Deck (Caltech) were around to argue exoplanets. We had Sanchis-Ojeda tell us about his ultra-short-period (USP) planet work. He showed us his evidence for a disrupting (or something) USP planet; to me it looks like evidence for a technological civilization making power from a star! But, for some reason, no-one else thought so! We discussed tidal mechanisms and stripping mechanisms, but we concluded that there is no real reason to think of the USPs as being related to hot jupiters or any other kind of planet in any mechanistic way.
Foreman-Mackey and I spent quality time with Roberto Sanchis-Ojeda (Berkeley) figuring out what we are going to do together, related to ultra-short-period planets. He was pulling for things related to false-positive evaluation, I was arguing for search at lower signal-to-noise ratios. I feel like there are lots of undiscovered systems. The rest of our science time was spent proposal-writing for NASA!
Roberto Sanchis-Ojeda (Berkeley) arrived for a few days of hacking. He is responsible for finding some very short-period transiting exoplanets. We didn't get a chance to set our goals for the day, because the day was packed with talks:
Victor Gorbenko (NYU) gave an absolutely excellent PhD defense. Congratulations Dr Gorbenko. Among other things, he was considering the difference between the theoretical (analytic) action of massless modes on the 1+1 worldsheet of a string (not a 0+1 worldline but a 1+1 worldsheet) and a lattice calculation. In this comparison, he found massive modes. He very nicely connected this work to larger questions in string theory, but also to the origins of string theory in QCD.
At lunch, Guido D'Amico (NYU) spoke about axions, axion–photon interactions, and cosmic transparency. This is related to work I did with Bovy and with More in the optical. He advertised some clever ideas he has about using the Planck data to constrain this sector using millimeter wavelengths. He has been implementing some of these ideas at #astrohackny.
At the end of the day, Neil Lawrence (Sheffield) spoke about deep machine-learning models built from nested Gaussian Processes. The methods show a lot of promise and connect well to things we are thinking about in CampHogg.
I spent the morning with Leonidas Moustakas (JPL), a bit of Foreman-Mackey (in Pasadena for the Sagan Fellowship Symposium), and a bit of Andrew Romero-Wolf (JPL) and Curtis McCully (LCOGT) by phone, discussing projects related to strong-lensing time-delay measurements. We discussed two challenging projects. The first is to determine (from as many candidate models as we can construct) the best model for quasar time-domain variability. There are claims in the literature that the damped random walk is the best model, but that (very sensible) model hasn't really been competed against all that much. We know how to do this, let's do this! The reason they want good probabilistic generative models is that they want to determine time delays as precisely as possible, using a probabilistic approach.
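For concreteness, the damped random walk is just a Gaussian Process with an exponential kernel, so its likelihood function is a few lines of linear algebra (a minimal sketch, assuming Gaussian measurement errors and a constant mean; any competitor model just swaps in a different kernel):

```python
import numpy as np

# Sketch: the damped random walk (an Ornstein-Uhlenbeck process) has
# covariance sigma^2 * exp(-|t_i - t_j| / tau) between any two epochs,
# plus the per-epoch measurement variance on the diagonal.
def drw_loglike(t, y, yerr, sigma, tau):
    """Log-likelihood of a light curve under a DRW model."""
    dt = np.abs(t[:, None] - t[None, :])
    C = sigma ** 2 * np.exp(-dt / tau) + np.diag(yerr ** 2)
    sign, logdet = np.linalg.slogdet(C)
    resid = y - y.mean()  # crude constant-mean subtraction
    return -0.5 * (resid @ np.linalg.solve(C, resid)
                   + logdet + len(t) * np.log(2 * np.pi))
```

Model comparison is then just evaluating (and properly marginalizing) this kind of function for each candidate kernel.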
The second project is to perform high-quality photometry on the (overlapping from the ground) images of a strongly lensed quasar. In this case—when the point-spread functions overlap—you have to do your photometry by simultaneous fitting, but with the variable (and badly known) PSF of ground-based astronomy, I have never seen such photometry that really looks right: There are always fitting-induced covariances of the overlapping-source light curves. I think this is caused by model mismatch (under-fitting), but I don't really know. Romero-Wolf and McCully pointed out that image differencing methods work well in crowded fields, so I formulated an image-modeling approach to the photometry that is as close to image differencing as possible. I promised to write it up into a document. I am kind-of excited about it; it is still just image modeling, but it makes use of the power of image differencing technologies to get flexibility to fit the real PSF as it is.
Late in the day, Adam Miller (JPL) showed me the JPL Mission Control center and a few high bays, filled with awesome stuff (including some fake Mars!).
I spent the day at LCOGT today, hosted by Diana Dragomir. While she was giving me a tour of the facilities, I met the founder and president, Wayne Rosing, who was in the shops working on the 1-m telescopes. The director, Todd Boroson, explained to me the idea behind LCOGT: It operates not as a set of independent observatories, but as a single telescope that happens to be a network of apertures (and capabilities). That's interesting; and it permits the telescope to make continuous measurements in a way that few other telescopes can. Most of the science is in the time domain. After this I had many great conversations. Some highlights follow.
One of the big issues for LCOGT is scheduling. I had a great conversation about this with Eric Saunders, who said various things that were music to my ears: One is that they have tried to specify the scheduling problem as an integer programming problem. This separates the objectives from the method or algorithm used to optimize it. The other musical thing he said is that they optimize using industrial-grade operations research packages. They are so much better than anything you could write by hand.
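To illustrate the separation of objective from algorithm (with a toy knapsack-style problem and invented numbers; the real LCOGT scheduler is far richer), here is what an integer-programming formulation looks like when handed to an off-the-shelf solver:

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Toy scheduling problem: choose which observation requests to take
# tonight, maximizing total science priority, subject to the total
# duration fitting in the night. The *objective* lives in c and the
# constraints; the *algorithm* is whatever the solver does.
priority = np.array([5.0, 4.0, 3.0, 7.0])   # science value per request
duration = np.array([2.0, 1.0, 3.0, 2.5])   # hours per request
night_length = 5.0

res = milp(
    c=-priority,                                          # milp minimizes
    constraints=LinearConstraint(duration.reshape(1, -1),
                                 0.0, night_length),
    integrality=np.ones(4),                               # integer vars
    bounds=Bounds(0, 1),                                  # hence binary
)
chosen = np.round(res.x).astype(int)  # 0/1 decision per request
```

Swapping in a different solver (or a commercial operations-research package) changes nothing about the problem statement, which is exactly the point.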
With Dragomir I discussed exoplanet atmospheres, and her evidence for Rayleigh scattering in a planet they are writing up now. She is struggling with inconsistencies in the data, which come from different telescopes working at different wavelengths: are these inconsistencies real, and if not, how should they be modeled or removed?
Late in the day, Andy Howell, Iair Arcavi, and Curtis McCully showed me the Supernova Exchange, which is a project they are working on to coordinate supernova follow-up. They are following up so many supernovae of so many types, with so many collaborating projects, that they can't keep track without some sophisticated software. This has to be flexible enough to encode the planned and completed observations and their status, and also comments and discussion, and also tags that represent the reasons behind the follow-up. This latter point is something I have always been interested in: For many statistical questions, you need to know why something was observed.
I spent the day visiting Alberto Conti at Northrop Grumman. NG is building JWST. I got to see it being assembled in the (huge) clean room. I also saw a full-scale model of the sun shade that they use for testing deployment. It is just straight-up absolutely enormous!
The point of the visit was to learn about NG's new proposals and concepts for NASA-funded astrophysics missions. I learned that distant, formation-flying star shades are actually practicable, especially at L2, and that ground-based testing is extremely promising. The reason the formation-flying is possible is that L2 is very flat, gravitationally. That said, it does take days or weeks to switch the pointing of the "telescope", which consists of two spacecraft 10,000 km apart! And the craziness of how the star-shade works makes me want to reconsider my physical optics!
NG also has great concepts for new missions that would be ambitious but affordable. One is a telescope that starts small and grows with robotic servicing missions. Another is a pathfinder telescope on the ISS. Another is a telescope that is highly asymmetric in mirror shape, to get good angular resolution but still fit in a launch-vehicle fairing. It was a dramatic set of presentations, in part because the hardware engineering is so impressive, but in part because they are thinking of full system engineering (including software–hardware trade-offs) to control budget and make things fundable.
My second day at JPL included conversations with Gautam Vasisht (JPL) about adaptive optics systems, Geoff Bryden (JPL) about the abundance of Earth analogs, and Alina Kiessling (JPL) about intrinsic alignments and weak lensing. On the latter we discussed the problem that most of the theory is based on dark-matter-only simulations, but this is precisely a problem where the baryons matter a lot! With Vasisht I learned that JPL has a "clock" up in one of the buildings that shows you the current counts of exoplanet candidates, confirmed exoplanets, and planets in the habitable zone! Plus a huge model of the Mars Lander. Awesome!
At the end of the day, Leonidas Moustakas (JPL), Curt Cutler (JPL), and I argued about the flow from experimental design (think: satellite astrophysics project) to quantitative results on the parameters or scientific questions of greatest interest. They are thinking about making standards and principles for doing this flow, thereby strengthening the quantitative arguments in their proposals for new (and complex) projects. We discussed the challenges of doing something of general value, but they have decided (very sensibly) to start with a few very specific projects to use as "poster children" for the idea. One challenge is doing this with the right "language" such that people from different scientific backgrounds can agree on what's being said at each stage (think of words like "bias" and "noise" and "model" and "systematic" and so on).
I am visiting JPL this week, hosted by Leonidas Moustakas. I gave a seminar today, about our work with Kepler, with a focus on the noise modeling aspects of the project. In the rest of the day I had too many great and interesting conversations to list, but highlights were the following:
With Roland de Putter I discussed parameter estimation and inference in large-scale structure projects. We talked a bit about correlation functions and Gaussian kernels. He had some nice intuitions and scaling arguments about the Gaussianity of measurements of the power spectrum that helped me understand (and corrected some of my errors in) my thinking about my cosmological inference projects. He also helped me formulate the simplest possible demonstration that what is traditionally done (in, say, BAO projects) to mock up a "likelihood function" is formally (and maybe substantially) wrong and can be replaced. We also talked about inferring the initial conditions of the Universe from what we see. I pitched a "deep learning" version of this problem that I love so much I will have to write it down on the ideas blog!

Francis-Yan Cyr-Racine (JPL) told me about an absolutely awesome project to understand the statistical effect of substructure on lenses without the requirement that the substructures be detected individually. That's right up my alley, and the alley of Brewer as well. He has broken the problem into parts, combining the weak parts into a Gaussian noise and doing the strong parts the hard way. Genius.
I argued for randomized (or other regular but clever) observing strategies for WFIRST and Euclid with Jason Rhodes (JPL) and the dark-matter and dark-energy group. We lamented the lack of good observing-strategy simulations for these projects, which would make such arguments quantitative, open, transparent, and efficient.
Evidently the ApJ has switched to double refereeing for papers that are interdisciplinary between the domains of astrophysics and the methodological domains. This has happened to me twice now. Today I spent time working on responding to the methods-oriented referee for our paper on The Cannon. The referee made some great points about related methodologies in statistics and machine learning that we should be calling out in the paper; that got me doing some serious reading.
Sarbani Basu (Yale) showed up and gave a great talk on the current state of helioseismology and asteroseismology. Before her talk—which showed off the awesome of Kepler and its ability to separate the different post-main-sequence phases of stars using normal modes—she was subjected to a two-hour grilling in CampHogg group meeting. We asked her about the narrowness of modes, the time dependence of amplitudes, the principal observables and how they are determined, and so on.
After her talk, Angus, Foreman-Mackey, and I argued about what a generative probabilistic model of a stellar light curve might do for asteroseismology. Right now the data analysis chain is: Take a periodogram of the data, identify modes, identify the maximum-amplitude frequency and various frequency differences, compare those to models thereof. An alternative approach would be: Take a model of a star, compute the expected frequency spectrum and variations around that, produce a probability distribution over lightcurves that could be produced by this spectrum, and then boom, likelihood function! We are nowhere close to having this, but some of the key applied-math pieces we might need are starting to come together.
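One possible stepping stone toward that "boom, likelihood function" step is the Whittle approximation (a cartoon, under strong assumptions I am imposing here: evenly sampled data, and periodogram powers that are independent and exponentially distributed with mean equal to the model power spectrum):

```python
import numpy as np

# Whittle-approximation sketch: if a stellar model predicts a power
# spectrum S(f), and the observed periodogram powers P(f) are
# (approximately) independent exponentials with mean S(f), then the
# log-likelihood of the light curve is, up to a constant,
#   -sum over frequencies of [ log S(f) + P(f) / S(f) ].
def whittle_loglike(power, model_psd):
    """Whittle log-likelihood of periodogram powers given a model PSD."""
    return -np.sum(np.log(model_psd) + power / model_psd)
```

The hard part, of course, is the first half of the chain: predicting S(f), with its stochastic mode excitation, from a stellar model.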
At the end of the day I spoke at the Westchester Amateur Astronomers, about exoplanet search and discovery, and the prevalence of Earth analogs. I got absolutely great questions from the (very knowledgeable) crowd.