2015-01-29

AAAC, day 2

Today was the second day of the AAAC meeting in DC; I participated remotely from NYC. The most interesting discussions of the day were around proposal pressure and causes for low proposal acceptance rates (around 15 percent for most proposal categories at NSF and NASA). The number of proposals to the agencies is rising while budgets are flat or worse (for individual-investigator grants at NSF they are declining because of increasing costs of facilities). The question is: What is leading to the increase in proposal submissions? It is substantial; something like a factor of three over the last 25 years. At the same time, the AAS membership has increased, but only by tens of percent.

With data in hand, we can rule out a few easy hypotheses: It is not people just sending in lots of bad proposals; proposals are still very good, and now even proposals rated Excellent and Very Good have less than 50-percent acceptance rates. It is not proposals by young people; the increase seems to be coming from senior people (people with tenure). It is not more proposals in any one year by the same PI; most PIs put in only one proposal to (say) the NSF AAG call in a year. What it might be is that people are no longer waiting three years between proposal submissions; I know I don't! 25 years ago, a successful research-active astronomer would put in one proposal every three years. Now I think most of us put in a proposal most years; going from one proposal every three years to one every year is itself a factor of three. That might account for it, but we don't yet know. It is not absolutely trivial to get the data.

In addition to getting data from the funding agencies, we also plan to do some kind of survey of the community. This will give less "hard" data, but will be able to give answers to questions that we can't ask of the data at the agencies, about things like motivation, perception, and reasoning among proposal PIs.

2015-01-28

AAAC, day 1

Today was a meeting in Washington, DC of the Astronomy and Astrophysics Advisory Committee that advises NSF, NASA, and the DOE on their overlapping interests in astronomy and astrophysics. A few highlights were the following:

NASA plans to start studies of some of the top mission concepts for big missions that might be put forward to the decadal survey in 2020. That is, they want the community to go into the 2020 decadal process with some really well-researched, feasible, and ambitious missions. The white paper describing all this is here (PDF) and deserves reading and comment by the community. The best way to influence one of the mission studies is to join and get involved in the relevant "PAG" (whatever that is).

NSF and NASA did well in the FY15 budget process, but DOE HEP less well; this puts pressure on DOE's priorities. Nonetheless, it is attempting to move forward in all three areas of next-generation CMB experiment, next-generation dark-matter detection experiment, and something like DESI. The DOE feels a strong commitment in all these areas.

Dressler is chairing a committee to assess the "lessons learned" from the last decadal process. Somehow this slipped past me unnoticed! I wrote him email during the meeting and will send him comments tomorrow, but in fact I think it is too late to influence the survey of the survey, which is too bad. As my loyal reader (or friend) might know, I have issues with some of the decadal-survey processes. In particular, I was horrified by the non-disclosure agreement that members of the survey were asked to sign. (That's why I didn't participate.) It said, and I quote:

"When discussing the survey process with anyone from outside the survey committee, a panel member, or a consultant appointed to the survey, you should avoid discussing any matter other than the publicly available information, such as shown on the NRC’s public web pages.

"Information discussed should be limited to presentations and discussions held in open session, materials circulated in open session only, and other publicly accessible information. Discussions by the committee, subcommittees, and panels held in closed session are confidential and should not be discussed or revealed. Similarly draft documents circulated in closed sessions of a meeting or in the time periods between meetings should be treated as privileged documents and should not be shared outside the survey. This is particularly important for any documents that contain draft conclusions or recommendations, since these survey outputs are not final until the NRC review process for the report is complete."

Astronomers will either honor this agreement—in which case we will never be able to know what really happened in the decadal survey—or else they will violate it—in which case they are violating the law! Indeed, and of course, I know of many violations (after all, who can resist talking about what happens behind closed doors?).

On the way home, I built a sandbox for looking at Nyquist-violating Fourier analysis of Kepler data. The idea is (and Vanderplas is a huge supporter of this) that if you have a great generative model of your data, there really is no true "limit". We shall see if we can do anything useful with it.
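
The toy version of the sandbox idea is simple enough to sketch: with a generative model (here, a single sinusoid plus Gaussian noise) and irregular sampling, nothing special happens at the nominal Nyquist frequency; you can recover a frequency well above it. All the numbers below (time baseline, noise level, frequency grid) are invented for illustration, not taken from Kepler:

```python
import numpy as np

rng = np.random.default_rng(42)

# Irregular sampling; the "Nyquist frequency" is only nominal here.
t = np.sort(rng.uniform(0.0, 50.0, size=300))
f_nyq = 0.5 / np.median(np.diff(t))

# Inject a signal ABOVE the nominal Nyquist frequency.
f_true = 1.7 * f_nyq
y = np.sin(2.0 * np.pi * f_true * t) + 0.1 * rng.normal(size=t.size)

# Generative model: y = A sin(2 pi f t) + B cos(2 pi f t) + noise.
# Profile out (A, B) by linear least squares at each trial frequency.
def chi2(f):
    X = np.column_stack([np.sin(2.0 * np.pi * f * t),
                         np.cos(2.0 * np.pi * f * t)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

freqs = np.linspace(0.01, 2.5 * f_nyq, 5000)
f_best = freqs[np.argmin([chi2(f) for f in freqs])]
```

With this setup, f_best lands near f_true, well above f_nyq: the irregular sampling suppresses the aliases that would make the problem degenerate on a regular grid.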

2015-01-27

K2 search

I spent most of our snow day doing Center for Data Science administration, but when I got a bit of research time, I worked on Foreman-Mackey's K2 exoplanet search paper. I also had a great phone conversation with Rix about The Cannon, and especially what the next papers should be. We need a refactored code and code release paper (led by Anna Ho, MPIA) and we need to try using The Cannon to diagnose issues with the physical models. We also have to use The Cannon to tie together the stellar parameter "systems" of two (or more) surveys, since this was the whole point of the project, originally.

2015-01-26

what can you learn from galaxy spectra?

In the lunch-time brown-bag spot, Mike Blanton gave a great talk about galaxy spectra from SDSS-IV, especially the MaNGA project. He drew an amazingly realistic galaxy spectrum on the board and talked about what can be learned from various spectral features, including galaxy age (or star-formation history), chemical abundances (including abundance anomalies that connect back to star-formation history), and mass function (or dwarf-to-giant ratio). The MaNGA survey will make all of these things possible as a function of position within the galaxy for ten-thousand galaxies, which is exciting. After the talk we discussed a bit what might happen if SDSS-IV APOGEE-2 took some long-exposure spectra of galaxies; presumably there would be tons of dwarf-to-giant information, and also a great measurement of the velocity distributions of stars of different types. Interesting to think about!

2015-01-25

a little writing

In a work day that ended up being almost all not research, I did do a tiny bit of writing in my MCMC tutorial and some cosmological data-analysis fantasies.

2015-01-23

finding exoplanets that aren't transiting

Avi Shporer (JPL) gave a nice talk this morning about finding planets in the Kepler data that are not transiting, using what's called "phase curves". There are multiple effects that make this possible: One is that close (hot) planets are heated asymmetrically, so they add to the lightcurve in a non-trivial way through the orbit. Another is relativistic beaming (seriously!) and another is produced by the ellipticity induced in the star tidally by the planet. Shporer has done a search of the Kepler data for such signals. He hasn't found many, but what he has found are all interesting. I got interested during the talk in the induced-ellipticity effect; this might have odd phase relations if the star is spinning.
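
For concreteness, the three effects can be written as a toy phase-curve model: two sinusoids at the orbital frequency (thermal/reflection and beaming, 90 degrees out of phase with one another), plus one at twice the orbital frequency (the ellipsoidal distortion). The signs, phases, and amplitudes below are my own illustrative assumptions, not Shporer's actual parameterization:

```python
import numpy as np

def phase_curve(t, period, t0, a_therm, a_beam, a_ellip):
    """Toy phase-curve model for a non-transiting hot planet (relative
    flux). Conventions here are illustrative assumptions: thermal or
    reflected light and Doppler beaming vary at the orbital frequency
    (90 degrees apart); ellipsoidal variation at twice that frequency."""
    phi = 2.0 * np.pi * (t - t0) / period
    return (1.0
            - a_therm * np.cos(phi)         # day side hidden at phi = 0
            + a_beam * np.sin(phi)          # relativistic Doppler beaming
            - a_ellip * np.cos(2.0 * phi))  # tidal ellipsoidal distortion
```

The point I got interested in during the talk shows up directly in this parameterization: if the star is spinning, the ellipsoidal term need not sit at exactly these phases, so a fit would want a free phase offset on the cos(2 phi) term.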

In group meeting, we talked about various things, including Huppenkothen's Fermi proposal, Vakili's point-spread function model, and my proposals for cosmological inference using likelihood-free methods. I had lunch with Andrew Zirm (greenhouse.io) who used to work on galaxy evolution and is now a data scientist here in New York City. In the afternoon, Malz and I figured out the absolute simplest possible first project for doing hierarchical inference with probabilistic photometric redshifts.

2015-01-21

stream team

Kathryn Johnston and Marla Geha both brought their groups to NYU to discuss tidal streams in the Milky Way. Ana Bonaca showed the calibration process underway on her huge survey of the Palomar 5 stream. This globular-cluster tidal stream extends for many tens of degrees in the SDSS imaging but then goes off the edge of the survey. Her DECam imaging extends along the probable track. No results yet, but it will be very exciting to see whether the stream continues and what can be learned from it. Price-Whelan showed tantalizing evidence that, in a potential filled with chaotic orbits, tidal streams from progenitors on orbits with fast diffusion rates (short Lyapunov times) will have very different morphologies from streams on orbits with slow diffusion rates (long Lyapunov times). It is not yet clear how simple or general this is, but it is possible that chaos will be visible directly in streams and that the long streams we see are on especially (or unusually) non-chaotic orbits. That could explain things about alignments and so on. Emily Sandford (Columbia) is looking at better methods for identifying substructure interactions in the morphologies (including density distributions) of tidal streams. She is able to find energy gaps very reliably and is now looking at identifying the interactions in the simulation that produced them. The streams she has (which are from realistic n-body simulations) are really strange and unlike our standard observed streams; this might all relate to Price-Whelan's project!

2015-01-20

YCAA seminar

I spent the day at the Yale Center for Astronomy and Astrophysics (hosted by Tabby Boyajian), where I gave a seminar on data-driven models and The Cannon. I spent quite a bit of time distinguishing the model we use from vanilla machine learning; I really think that machine learning in its basic standard form is not that useful in astronomy (of course this depends in detail on your definition of the "scope" of "machine learning"). After my talk I did the rounds, and had many interesting discussions. One of these was with Andrew Hearin (Yale) who is working on generalizing the halo occupation model for large-scale structure modeling. He is convinced that we can't beat ten-percent-ish inferences about cosmology at 10-Mpc-ish scales unless we make the halo model far more general. At dinner, we had great conversations about the future of data-driven modeling and astrophysics. Thanks, Yale!

2015-01-19

don't Yale at me!

I spent my research time today getting ready a talk for Yale tomorrow. I am speaking about The Cannon (Ness, Rix, and my data-driven model for stellar spectra), which is doing a great job of labeling APOGEE spectra. It is a new talk, so I started from scratch. Big question: Will the projector be 4:3 or 16:9? Of course I can switch my slides with one line of LaTeX.

2015-01-16

light echos in LSST

Fed Bianco (NYU) gave a great talk today about eta Carinae and light echos, mass loss, and the progenitors of stripped supernovae. Amazing stuff, especially the light echos. One thing the whole audience was excited about was the idea that if you had truly high-quality synoptic imaging like LSST will provide, you might be able to do some amazing simultaneous modeling of the dust (the reflector) and time-variable point sources (the radiation field). My intuition that this "blind deconvolution" problem could be solved is based on the facts that (a) the radiation source is a very sparse set of point sources and (b) the dust lies in a (fairly) continuous, (fairly) isotropic distribution, while the light echo geometry is the surface of an expanding anisotropic ellipsoid. I bet that one isn't in the LSST Project Book.

In the afternoon, I spoke with Vakili about a possible huge speed-up in Cramer-Rao-saturating point-source centroiding methods, and I spoke with Huppenkothen about gamma-ray bursts in the Fermi data. On the latter, she is planning on putting in a data analysis proposal which would develop further our "Magnetron" project for modeling bursty bursts.

2015-01-15

good ideas over beer

Three good ideas came up in discussion with Foreman-Mackey and Fadely today. First, we formulated the likelihood-free inference for cosmological parameters given large-scale structure data (that I mentioned two days ago at CtU 2015) and concluded that it might be possible to perform this inference inexpensively with data and simulations in hand (that is, if we can convince some of the BOSS people to help out). Second, we discussed expanding Fadely's work on modeling the SDSS imaging catalog output to an infinite mixture of Gaussians using something like the Dirichlet process. Fadely pointed out that this might be impossible because we are doing the XD thing of convolving each Gaussian with each individual data point's individual uncertainty variance tensor. We agreed to at least look at the math, or maybe employ one of our math friends to help. Third, we talked about finding planets in K2 data by simultaneously fitting each lightcurve with a transit model plus a Gaussian-process-combined set of other-star lightcurves. That is, what we have been thinking about for Kepler search but (a) fitting the nuisances simultaneously and marginalizing out, and (b) predicting that all this will be even more valuable for K2 data than it was in the Kepler data.

2015-01-14

CtU 2015, day 3

I gave a second talk today at Computing the Universe, this time on nuisance parameter models. I continued my discussion of Gaussian processes and then made the following points: You must fit your nuisance simultaneously with your parameters of interest and you must marginalize them out; you don't want to estimate anything you don't care about. Don't fit and subtract or you will be subject to over- or under-fitting. The most conservative thing you can do is implement a model with a huge number of free parameters; it is not conservative to restrict the freedom of the nuisance model. Gaussian processes don't need to just be processes of spatial position or time; they can be of other features as well.
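
The fit-simultaneously-and-marginalize point has a particularly clean expression when the nuisance is Gaussian: absorb it into the covariance matrix and evaluate a marginalized likelihood, rather than fitting a trend and subtracting it. A minimal sketch, with a squared-exponential kernel, a constant-offset signal model, and invented hyperparameters:

```python
import numpy as np

def marginal_loglike(t, y, offset, amp, ell, sigma):
    """Log-likelihood of a constant-offset model, with a smooth nuisance
    trend marginalized out analytically as a Gaussian process
    (squared-exponential kernel) plus white noise of scale sigma."""
    r = y - offset                          # residual w.r.t. the model of interest
    d2 = (t[:, None] - t[None, :]) ** 2
    K = amp**2 * np.exp(-0.5 * d2 / ell**2) + sigma**2 * np.eye(t.size)
    _, logdet = np.linalg.slogdet(K)
    return -0.5 * (r @ np.linalg.solve(K, r)
                   + logdet + t.size * np.log(2.0 * np.pi))
```

Because the nuisance freedom lives in K, giving it lots of flexibility does not bias the parameter of interest; it just (correctly) inflates its uncertainty, which is exactly the sense in which the very flexible model is the conservative one.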

Again, great talks for the rest of the day; highlights included the following: Abel (Stanford) showed that if you reinterpret n-body simulations as being a set of points that define the corners of mass-filled tetrahedra (rather than mass points), the effective resolution increases and the accuracy for some calculations increases. He also talked about sensible and clever mesh refinement tricks to deliver very high precision at low additional cost. Norvig (Google) talked about the engineering at Google that permits huge operations on huge clusters with lots of contributing people. Press (Texas) gave a tour of all the things that he thought were essential to data analysis in astrophysics that are not in the third edition of Numerical Recipes. He talked about graphical models, kernel methods, Hamiltonian MCMC, the lasso (and other L1 regularizations), Dirichlet processes, the multi-armed bandit, and methods for the solution of systems of linear and non-linear equations. Skillman (Stanford) talked about an immense (trillion-particle) simulation that he has done and fully released on the web for anyone to use. His talk was hilarious in part because the data set is so damned huge. He has built great tools for interacting with the data efficiently. Schneider (Livermore) talked about our MBI entry to the GREAT3 weak-lensing competition. He did a great job of explaining the hierarchical inference, and describing The Tractor and also our importance-sampling inference method.

2015-01-13

CtU 2015, day 2

Today was the second day of Computing the Universe 2015 and Cosmo Hack Week. I spoke today about replacing frequentist correlation-function estimators with probabilistic inferences. I talked about likelihood-free inference (called "ABC" in previous posts, but I don't like that name), and about writing down a likelihood function based on Gaussian processes. Martin White (Berkeley) was my live critic, which was useful (and flattering).

There were many good talks; some highlights were the following: Cohn (Berkeley) talked about using unsupervised machine-learning methods to create features or descriptors of galaxy mass build-up histories, and then using a supervised regression to relate them to single-epoch observables; that seems like a productive way to think about designing new experiments. Budavari (JHU) spoke about hierarchical Bayes using GPUs to do the heavy lifting. He spoke mainly about GPU architecture and coding, but the hierarchical-inference ideas he gave as his starting point are very relevant to Malz's projects; I have to get the two of them in touch. Font-Ribera (Berkeley) and Beutler (Berkeley) talked about BOSS large-scale structure results; the former about the IGM and the baryon acoustic feature at redshift 2.5, and the latter about galaxies and the same feature at low redshift. Beautiful results in both cases. Schmittfull (Berkeley) gave a beautifully clear talk on three-point functions and optimal estimators thereof. He has very nice results based on analysis of an Edgeworth expansion away from the Gaussian.

After the talks was hacking; at the hacking session I got together a team to argue out my ideas about CMB foregrounds. The discussion was very clarifying and I think there are some very simple baby steps towards a good project.

2015-01-12

The Cannon discussion section

I put my research time today into the discussion section of the nearly finished paper by Ness et al on The Cannon, our project to put parameter and chemical-abundance labels on stars without using stellar models. In my current practice, I built part of the discussion around our assumptions: What assumptions did we make, why did we make them, and what would be the costs and benefits of changing them?

2015-01-09

autocorrelation function

My only research activity on the airplane home from New Zealand was to work more in our MCMC users' manual. I wrote about the autocorrelation function and autocorrelation time of the chain, and how to use it in judging convergence and estimating uncertainties on sampling-based integral estimates.
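
The estimator I wrote about is, in its simplest form, only a few lines: estimate the normalized autocorrelation function of the chain, sum it up to a cutoff to get the integrated autocorrelation time tau, and divide the chain length by tau to get an effective sample size. This sketch uses a crude first-negative-lag cutoff; real windowing schemes (Sokal-style) are more careful:

```python
import numpy as np

def autocorr_time(chain):
    """Integrated autocorrelation time of a 1-d MCMC chain, via an
    FFT-based estimate of the normalized autocorrelation function and
    a crude truncation at the first negative lag."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    n = x.size
    f = np.fft.rfft(x, 2 * n)                 # zero-pad to avoid wrap-around
    acf = np.fft.irfft(f * np.conjugate(f))[:n]
    acf = acf / acf[0]                        # normalize so acf[0] = 1
    tau = 1.0
    for lag in range(1, n // 2):
        if acf[lag] < 0.0:                    # crude cutoff; windowing is better
            break
        tau += 2.0 * acf[lag]
    return tau
```

An independent-sample chain gives tau near one; a strongly correlated chain gives tau of many steps. The effective number of samples is roughly the chain length divided by tau, and that (not the raw chain length) is what sets the uncertainty on any sampling-based integral estimate.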