The JPL HORIZONS system is amazing! You can compute the position of anything in the Solar System, at any time. With Weichi Yao (NYU) and others, I have been looking at Halley's Comet, with the thought of making a machine-learning benchmark data set (this is an idea from Soledad Villar, JHU). When we look up Halley in HORIZONS, we find many Halleys, not just one. I hypothesized that this is because there are different solutions for Halley on different apparitions. But somehow I am sort-of wrong: That's true for most of the Halleys in the system. But then today in our meeting Yao showed that there's one that seems to do well at all epochs. Huh? Anyway, HORIZONS is better on content than documentation!
2022-03-07
2017-04-26
void–galaxy cross-correlations, stellar system encounters
Both Flatiron group meetings were great today. In the first, Nathan Leigh (AMNH) spoke about collisions of star systems (meaning 2+1 interactions, 2+2, 2+3, and 3+3), using collisionless dynamics and the sticky star approximation (to assess collisions). He finds a simple scaling of collision probabilities in terms of combinatorics; that is, the randomness or chaos is efficient, or more efficient than you might think. The crowd had many questions about scattering in stellar systems and equipartition.
This led to a wider discussion of dynamical scattering. We asked the question: Can we learn about dynamical heating in stellar systems by looking at residual exoplanet populations (for example, if the heating is by close encounters with stars, systems should be truncated)? We concluded that wide-separation binaries are probably better tracers, from the perspective that they are easier to see. Then we asked: Can the Sun's own Oort cloud be used to measure star–star interactions? And: Are there interstellar comets? David Spergel (Flatiron) pointed out the (surprising, to me) fact that there are no comets on obviously hyperbolic orbits.
Raja Guhathakurta (UCSC) is in town; he showed an amazing video zooming in to a tiny patch of Andromeda’s disk. He discussed Julianne Dalcanton’s dust results in M31 (on which I am a co-author). He then showed us detailed velocity measurements he has made for 13,000 (!) stars in the M31 disk. He finds the velocity dispersion of the disk grows with age, and grows faster and to larger values than in the Milky-Way disk. That led to more lunch-time speculation.
In the cosmology meeting, Shirley Ho (CMU) spoke about large-scale structure and machine learning. She asked the question: Can we use machine learning to compare simulations to data? In order to address this, she is doing a toy project: Compare simulations to simulations. She finds that a good conv-net does as well as the traditional power-spectrum analysis. This led to some productive discussion of where machine learning is most valuable in cosmology. Ben Wandelt (Paris) hypothesized that a machine-learning emulator can’t beat an n-body simulation. I disagreed (though on weak grounds)! We proposed that we set up a challenge of some kind, very well specified.
Ben Wandelt then spoke about linear inverse problems, on which he is doing very creative and promising work. He classified foreground approaches (for LSS and CMB) into Avoid or Adapt or Attack. On Avoid: He is using a low-rank covariance constraint to find foregrounds (this capitalizes on smooth wavelength (frequency) dependences, but reduces detailed assumptions). He showed that this separates signal and foreground: the signal is high-rank and CDM-like (isotropic, homogeneous, etc.), while the foreground is low-rank (smooth in wavelength space). He then switched gears and showed us an amazingly high signal-to-noise void–galaxy cross-correlation function. We discussed how the selection affects the result. The cross-correlation is strongly negative at small separations and shows an obvious Alcock–Paczynski effect. David Spergel asked: Since this is an observation of “empty space”, does it somehow falsify modified GR or radical particle things?
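The low-rank idea can be illustrated with a toy example. This is only a sketch, and it swaps in a plain truncated SVD rather than the covariance-constrained method described in the talk. Everything here is an assumption made for illustration: the two power-law foreground spectra, the white-noise "signal", the array sizes, and the helper name `remove_low_rank`.

```python
import numpy as np

rng = np.random.default_rng(42)
n_freq, n_pix = 32, 2000
freq = np.linspace(1.0, 2.0, n_freq)  # arbitrary frequency units

# Hypothetical foregrounds: two smooth power-law spectra with bright
# spatial templates, so the foreground matrix has rank 2.
spectra = np.stack([freq ** -2.7, freq ** -2.1], axis=1)  # (n_freq, 2)
templates = 100.0 * rng.normal(size=(2, n_pix))            # (2, n_pix)
foreground = spectra @ templates

# Hypothetical signal: unit variance and uncorrelated between channels,
# so it is (nearly) full rank.
signal = rng.normal(size=(n_freq, n_pix))
data = foreground + signal


def remove_low_rank(data, rank):
    """Subtract the leading `rank` singular components from `data`."""
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
    return data - low_rank


cleaned = remove_low_rank(data, rank=2)
corr = np.corrcoef(cleaned.ravel(), signal.ravel())[0, 1]
print(f"correlation of cleaned map with true signal: {corr:.3f}")
```

The subtraction also removes the small part of the signal that lies along the two discarded modes, which is the price of making no detailed assumptions about the foreground spectra.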
2016-12-09
transits of swarms of debris
I met with Ellie Schwab (CUNY) and Kelle Cruz (CUNY) to discuss Schwab's model of stellar activity in low-mass stars. We checked her MCMC sampling diagnostics and worked out how to make her model more general. It is a mixture model, with an active and inactive population of stars, mixed.
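A two-population mixture of this general kind can be sketched as follows. This is not Schwab's model. It assumes a single scalar activity indicator per star and a Gaussian distribution for each population, and all function names and parameter values are hypothetical.

```python
import numpy as np


def log_gaussian(x, mean, sigma):
    """Log of a normalized 1-D Gaussian density."""
    return (-0.5 * ((x - mean) / sigma) ** 2
            - np.log(sigma) - 0.5 * np.log(2.0 * np.pi))


def component_log_terms(x, frac_active, active, inactive):
    """Log of (mixing weight times density) for each population.

    `active` and `inactive` are (mean, sigma) tuples.
    """
    log_active = np.log(frac_active) + log_gaussian(x, *active)
    log_inactive = np.log1p(-frac_active) + log_gaussian(x, *inactive)
    return log_active, log_inactive


def mixture_log_likelihood(x, frac_active, active, inactive):
    """Total log-likelihood of the data under the two-component mixture."""
    la, li = component_log_terms(x, frac_active, active, inactive)
    return np.sum(np.logaddexp(la, li))


def active_probability(x, frac_active, active, inactive):
    """Posterior probability that each star belongs to the active population."""
    la, li = component_log_terms(x, frac_active, active, inactive)
    return np.exp(la - np.logaddexp(la, li))


# Toy data: three stars with made-up activity indicators.
x = np.array([-1.0, 0.0, 1.0])
p_active = active_probability(x, 0.5, active=(1.0, 0.5), inactive=(-1.0, 0.5))
print(p_active)
```

Making the model more general then amounts to letting the mixing fraction or the component parameters depend on other stellar properties.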
I met with Caroline Kaler (NYU) to get her started looking at the Kepler data. Inspired by my visit to Rochester this week, I have her looking at the Boyajian Star. I have a crazy thought that we might be able to use the smoothness (or not) to limit (or measure) the number of bodies contributing to the light-curve events, if those events are multi-object transits.
2016-12-07
Rochester
I spent the day at the University of Rochester, where I gave the Physics Colloquium. I spoke about data-driven models. Before my talk, I had many interesting and valuable conversations with faculty and students. One highlight was work that Alice Quillen (Rochester) is doing on tidal dissipation. She is building mechanical models of solid bodies (think: planets) to parameterize tidal dissipation and look at tidal locking mechanisms, and spin–orbit resonances and dynamics.
Another highlight was a long conversation with Eva Bodman (Rochester) who (among other things) has been looking at extra-solar comets in the Kepler data. We discussed things she has done, but also the low-hanging fruit for future work on comets around other stars. She has built a model of the strange behavior of the Boyajian Star in terms of a (bizarre, huge) comet population; this made me think that there are lots of things we might do with comet population models, or other models of swarms of debris.
2013-04-09
lunch
Fergus, Schölkopf (MPI-IS, visiting us for three months, now counting as a Camp Hogg regular), Brewer, Fadely, Foreman-Mackey, and I went to lunch together. What a pleasure! Long (unfortunately expensive) lunches are an important part of how we get things done here at Camp Hogg. We discussed further the Kuiper Belt problem, and many other things. One beautiful idea that came up is that outer Solar System objects always have (nearly) zero proper motion, whereas extra-Solar Galactic sources always have proper motions that are as large as (or larger than) the parallax amplitude. Late in the day we discussed (over drinks I am afraid) with Brewer possible engineering improvements to nested sampling and the future of complex, high performance samplers that adaptively take advantage of everything that is now known about the huge class of sampling problems. It was a great day.
2013-04-08
Kuiper Belt
Last year, Brendon Brewer (Auckland) visited us and solved a huge problem in astronomy: How do you get the number counts (or flux distribution) of sources too faint to detect in your image? You might think you can't, but you can if they contribute significantly to the pixel noise statistics. Brewer's solution is the full Bayesian solution: Sample all possible catalogs! Today (because he is in town), I pitched to him the same problem but for moving sources, in particular sources moving like Kuiper Belt objects move (in an apparent sense from the Earth). If we can generalize what we have done to a mixture of stars and Kuiper Belt objects, we might be able to determine things about the size and semi-major axis distribution even for objects we can't detect significantly even in the full stack of imaging.
2013-03-28
data-driven spectral models; Oort cloud
I spent a good chunk of the day at Stanford, chatting with Blandford and Strigari. Blandford had lots of good thoughts to contribute to my general ideas about how one might build empirical (data-driven) and yet physically interpretable models of stars from enormous amounts of high signal-to-noise, high resolution spectral data (like we have in APOGEE). In particular, he pointed out that we don't have to ignore what we know about atomic physics and quantum mechanics when we do it! Strigari is thinking about the Oort cloud and the comets that allegedly fill it: Do they really have to be in a fluffy cloud around every star, or could they instead be in a space-filling population not bound to any star? Or a mixture of the two? Radical! And, he hopes, testable.
2012-02-16
three themes of data analysis
Melvyn Davies (Lund) upbraided me for not bringing a jacket and tie to Lund for tomorrow's PhD defense event, where I am a very important person. After he finished haranguing me about my jeans, we discussed the possibility of ever imaging or detecting directly free-floating planets. The conversation was discouraging!
I gave my seminar, and in preparing it I realized that there are a lot of simple themes connecting the crazy array of seemingly disconnected topics I work on. I was able even to classify my projects: Those that involve data-driven models (Tsalmantza quasar, Bovy quasar, and Fergus high-contrast projects); those that involve probabilistic classification or mixture models (Foreman-Mackey calibration, Lang Comet Holmes, and Koposov GD-1 projects); and those that involve moving away from catalogs and down towards rawer (pixel) data (Lang faint-motion and my own crazy large-scale structure projects). All this pleased me, because those are three ideas that can (in principle) be put into a one-hour seminar. I failed today, but it is a process, right?
By the way, one of the nicest conclusions of the Holl (Lund) thesis is that forward modeling is the best way to deal with Gaia's charge-transfer inefficiency issues. That's good for my brand.
2012-02-02
station keeping
Not much to report today, except referee-responding with Lang, MCMC packaging and documentation with Foreman-Mackey, and safe operation of heavy Tractor machinery with Mykytyn. Also, Mulin Ding (the best sysadmin in all of science) installed a new 30 TB of disk on our current favorite big compute machine. I am sure we will fill it by April.
2012-01-26
responding to referee; the disk
I spent the day in Princeton; the morning with Bovy talking about the Milky Way disk and the afternoon with Lang working on the response-to-referee on the Comet Holmes paper. We are very, very behind schedule on that! We made figures that compare the Comet Holmes orbit we inferred to the NASA orbit. We don't get quite the right orbit, in part because our model of the data we scraped from the web is so crude.
Bovy and I discussed his results on the kinematics of mono-abundance subpopulations in the Milky Way disk, a follow-up to his paper on the spatial structure of those same populations. We also discussed his measurement of the disk rotation curve with APOGEE data; he gets a low-amplitude (relative to Reid and The Colbert Report) rotation curve, which is intriguing.
2011-04-06
dotAstronomy day three
In the morning session I talked about modeling, including our Comet 17P/Holmes project (which got some press here and here; my dotastronomy viewgraphs are here). A highlight for me of the talks was Geert Barentsen (Armagh Observatory) talking about human observing of meteoroids. He showed a ridiculous distribution of meteoric material around the Earth; the detail was beautiful. At the end of his talk he showed experimentally that some meteoroids can be detected by Twitter searches!
We had an afternoon unconference, with so many good things I couldn't decide what to do. In the end I went to the mash-up discussion, which evolved into a discussion of funding, not surprisingly given the bad state of things for so many interesting projects right now (Jill Tarter of SETI noted that the Allen Telescope Array may be forced to shut down this year). We decided to look at crowd-sourcing some long-term funding propaganda.
2011-04-02
a new sampler
In my April Fool's post with Lang, arXiv/1103.6038, we made use of an affine-invariant MCMC sampler proposed by Goodman and Weare, which has very good autocorrelation properties. That is, it greatly reduces the number of likelihood calls you need to make to reach good results. Today, I worked a bit on (and then signed off on) Fengji Hou's paper on a general implementation of this sampler and application of it in exoplanet discovery.
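For readers who have not seen it, the core of the Goodman and Weare sampler is the "stretch move". Below is a minimal serial sketch of that move. It is not the implementation from Hou's paper or from any released package. The function name `stretch_move_sampler`, the toy Gaussian target, and the choice of 20 walkers are all assumptions made for illustration.

```python
import numpy as np


def log_prob(x):
    """Hypothetical target: a badly scaled 2-D Gaussian (sigmas 1 and 10)."""
    scales = np.array([1.0, 10.0])
    return -0.5 * np.sum((x / scales) ** 2)


def stretch_move_sampler(log_prob, walkers, n_steps, a=2.0, rng=None):
    """Run an ensemble of walkers with the stretch move, one walker at a time."""
    rng = np.random.default_rng() if rng is None else rng
    walkers = np.array(walkers, dtype=float)
    n_walkers, n_dim = walkers.shape
    lp = np.array([log_prob(w) for w in walkers])
    chain = np.empty((n_steps, n_walkers, n_dim))
    n_accept = 0
    for step in range(n_steps):
        for k in range(n_walkers):
            # Pick a different walker j uniformly from the rest of the ensemble.
            j = rng.integers(n_walkers - 1)
            if j >= k:
                j += 1
            # Draw z from g(z) proportional to 1/sqrt(z) on [1/a, a].
            z = ((a - 1.0) * rng.random() + 1.0) ** 2 / a
            proposal = walkers[j] + z * (walkers[k] - walkers[j])
            lp_new = log_prob(proposal)
            # Acceptance probability: min(1, z^(n_dim - 1) * p(new) / p(old)).
            log_accept = (n_dim - 1) * np.log(z) + lp_new - lp[k]
            if np.log(rng.random()) < log_accept:
                walkers[k] = proposal
                lp[k] = lp_new
                n_accept += 1
        chain[step] = walkers
    return chain, n_accept / (n_steps * n_walkers)


rng = np.random.default_rng(0)
start = rng.normal(size=(20, 2))
chain, acc = stretch_move_sampler(log_prob, start, n_steps=2000, rng=rng)
samples = chain[500:].reshape(-1, 2)  # discard the first 500 steps as burn-in
print("acceptance fraction:", acc)
print("sample std per dimension:", samples.std(axis=0))
```

Because each proposal is built from the positions of the other walkers, the move has no step-size matrix to tune. That is what makes it insensitive to the factor-of-ten difference in scale between the two dimensions of this toy target.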
(By the way, we tried to come out first on April 1, submitting 48 seconds after the deadline. There were seven papers in—or cross-listed in—astro-ph in the first 8 seconds after the deadline. You gotta be fast.)
2011-03-23
trying to be funny?
Lang and I worked on the title, abstract, and figures for the Holmes paper. The figures involved permissions from an enormous number of people; Lang has been heroically collecting these. This is definitely a problem with citizen science!
2011-03-18
how to frame a comet
Lang and I added parameters to our Comet Holmes model that capture the statistics of where an astrophotographer locates her or his subject within the frame. These seemed to improve our model likelihood, confirming intuitions from Iain Murray. Oddly we are learning a lot more in this project about astrophotographers than we are about comets! We also figured out by detective work that this image (below), which was torquing some of our fits, is not an image of Comet Holmes, but rather a faint galaxy!
2011-03-17
Bayes can't fail
Today was almost all administration, but I did have a great email conversation with Iain Murray about the Comet Holmes project. I was saying that Lang and I are getting biased answers, and his response was: If your model is good, the posterior PDF pretty-much has to include the right answer. That is, you can be biased, but Bayes never confidently rules out the truth. I have many things to say about that, but the blogosphere doesn't have room for it all! He said: Go hierarchical. I said: Argh! (because he is right).
2011-03-16
typesetting probability
I spent the research part of my day writing the method section of the Comet Holmes paper.
2011-03-15
marginalizing a model of astronomers
Lang and I worked on the Comet Holmes project more today. We encountered an interesting issue when we tried to marginalize out the time at which an image was taken: If you think an image was taken of a comet (and we do), and you don't know either the orbit of the comet or the time at which the image was taken (and we don't, by construction), then you are inclined to infer a slowly moving comet! This comes from the fact that the only sensible likelihood (probability of the image given the comet parameters) involves a marginalization over times, and more time gets into each image the slower the comet goes. A slow comet is a distant comet, and that is a less observable (and less likely to be observed) comet, so we are doing something wrong, but it is not trivial to find a principled solution to this one. Bayesians out there? This is a general issue for all parametric curve fitting, is it not?
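One way to see where the preference for slow comets comes from is the following sketch, which rests on simplifying assumptions that are not from the paper: a flat prior on the image time t over a window of length T, and a likelihood that is just an indicator of whether the comet's sky position x(t; θ) falls inside the image footprint Ω.

```latex
% Assumptions (illustrative only): flat prior p(t) = 1/T on [0, T];
% likelihood is an indicator that x(t;\theta) lies in the footprint \Omega;
% \ell is the length of the comet's track across the footprint.
\begin{align}
p(D \mid \theta)
  &= \int_0^T p(D \mid \theta, t)\, p(t)\, \mathrm{d}t
   \propto \frac{1}{T} \int_0^T \mathbf{1}\!\left[x(t;\theta) \in \Omega\right] \mathrm{d}t \\
  &= \frac{\Delta t(\theta)}{T}
   \approx \frac{\ell}{|\dot{x}(\theta)|\, T}
\end{align}
```

Under these assumptions the marginalized likelihood scales inversely with the comet's angular speed, so orbits that linger in the footprint are rewarded unless the prior on orbits, or on when images get taken, pushes back.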
2011-03-14
fitting orbits to astrophotographers' behavior

In our clandestine activities of the weekend, we wrote down a generative model for how amateur astrophotographers point their telescopes. Here (above) are some parameters of that model, superimposed on a sky picture of the footprints of their images.
2011-03-13
clandestine activity
Lang and I spent every waking minute of the weekend working on our quasi-secret paper about Comet Holmes that has an end-of-month deadline. Of course I say quasi-secret because absolutely everything we do is exposed on the web at all times in our SVN repository, and I have probably posted about it ten times previously! We got Foreman-Mackey's version of Goodman and Weare's affine-invariant ensemble sampler working (the same sampler Hou is using, but a new implementation), we figured out a new kind of likelihood function for image pointings (a generative model of astrophotographers if you will), we experimented with multiprocessor stuff, and we made figures. My kind of weekend!
2010-03-30
dynamics and MCMC
Lang and I hacked away at our Comet Holmes project today, trying to get an MCMC to find the global optimum of the likelihood function, and sample it. We know the right answer; the point in this case is not to get the right answer but to make a system that just works, every time. In this and in my exoplanet stuff, there are many local optima; of course that is true for generic optimization problems, but I have an intuition that a lot of the simple dynamics problems have similar kinds of local optima.
One theme of these kinds of problems is that we (meaning astronomers) tend to use MCMC for three things that are really distinct operations:
- We want the MCMC to crawl the parameter space and find, among all the local optima, the global optimum. This is search.
- We want the MCMC to gradient-descend into that global optimum. This is optimization.
- We want the MCMC to give us back a fair sampling from the likelihood or posterior probability distribution function. This is integration.