worst of all possible worlds

The Milky Way potential is time-dependent, filled with orbiting, massive substructure, and has no useful symmetries. The stars orbiting have not had time to fully mix in phase space. Is inference—of the dynamics from a snapshot of the kinematics—still possible in this worst of all possible worlds? I think it is and I gave some hints about it today in the last of my workshop-like seminars.

[Going on vacation now for almost a month.]


ransac streams

As if Lang and I weren't distracted enough, we revived an old idea of finding Milky Way substructure (think streams) by a ransac-related approach. Use pairs of stars to generate hypotheses, test those hypotheses with the other stars. We revived this only after coming up with plans for finishing our unfinished papers, but still it was a typical distraction from the critical path!
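
As a minimal sketch of the pair-hypothesis idea, here is a toy version in which a "stream" is just a straight line in a two-dimensional plane: each random pair of points proposes a line, and the remaining points vote by proximity. The straight-line model, the tolerance `tol`, the trial count, and the function name `ransac_line` are all assumptions made for illustration; a real stream hypothesis would be an orbit in phase space.

```python
import math
import random

def ransac_line(points, n_trials=200, tol=0.05, seed=0):
    """Return (best_pair, inlier_indices) for the line supported by the most points."""
    rng = random.Random(seed)
    best_pair, best_inliers = None, []
    for _ in range(n_trials):
        # Hypothesis generation: two points define a candidate line.
        i, j = rng.sample(range(len(points)), 2)
        (x1, y1), (x2, y2) = points[i], points[j]
        dx, dy = x2 - x1, y2 - y1
        norm = math.hypot(dx, dy)
        if norm == 0.0:
            continue
        # Hypothesis testing: count points within tol (perpendicular distance).
        inliers = [k for k, (x, y) in enumerate(points)
                   if abs(dy * (x - x1) - dx * (y - y1)) / norm < tol]
        if len(inliers) > len(best_inliers):
            best_pair, best_inliers = (i, j), inliers
    return best_pair, best_inliers

# Toy data: 30 "stream" points near a line plus 70 uniform background points.
rng = random.Random(1)
stream = [(t, 0.5 * t + 0.1 + rng.gauss(0.0, 0.01))
          for t in [rng.uniform(0.0, 1.0) for _ in range(30)]]
background = [(rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)) for _ in range(70)]
points = stream + background

pair, inliers = ransac_line(points)
n_stream_found = sum(1 for k in inliers if k < 30)
print(len(inliers), n_stream_found)
```

The cost is one pass over all points per hypothesis, so the number of trials needed depends on how likely a random pair is to come from the same structure.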


Bayesian black-hole masses, cold structures

A vigorous morning of debate with Tremaine got us to the point that our work on orbital roulette is pointed towards the Galactic Center. Our Bayesian approach permits us to use data of varying quality, subject to heterogeneous and unknown selection functions, and with missing dimensions (think projected positions and radial velocities). This will permit us to analyze simultaneously and comparatively all the available data sets, and resolve issues in the literature. Bovy is well tasked for the break.

I gave the second of my workshop-like talks about inferring Milky Way dynamics from kinematic data; today I talked about cold structures in phase space and their information content.


extreme tidal disruption, moving groups

In the limit that everything in the stellar halo is the result of tidal disruption of a satellite, the phase space will contain lots of structure, but subject to important constraints, because tidal stripping mixes or stretches one (predictable) dimension more than the others. Bovy and I discussed this along with many other ideas for finding and using phase-space structure.

Bovy also made some plots of the observational properties of moving groups of stars in Hipparcos. He may have some phenomenology that rules out some hypotheses for the origin of this structure.


extreme deconvolution, mixed angles

I gave the first of three workshop-like seminars at NYU this week. This one was on reconstructing dynamical models from kinematic measurements in the limits that the potential is integrable and the angles are mixed (the system has evolved for a very long time without resonances).

Bovy handed me a new version of his document on building the underlying, deconvolved distribution which, when given errors, generates the data in a sample, even when the errors are different and large for every point. He tentatively titled it extreme deconvolution.



Lang and I spent a long time discussing Tremaine's objection to the Bayesian program for showing that a set of angles (say) is drawn from the flat (uniform) distribution between 0 and 2 pi. If the N-dimensional probability distribution for N angles phi is just the product of N flat distributions, then every N-dimensional point is equally likely, whether it corresponds to spread-out phases or concentrated phases. Lang and I came up with some clever things to say about this, but we don't yet have an answer that will satisfy Tremaine. At some level, the problem comes down to the fact that a frequentist can ask "are the data consistent with the flat hypothesis?" whereas the Bayesian needs to ask "is the flat hypothesis better than X?" where X is a well-specified alternative.
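
A small numerical illustration of the point, under stated assumptions: the flat likelihood is the same for spread-out and clumped phases, a frequentist statistic (here the mean resultant length, chosen for illustration) tells them apart, and a Bayesian-style comparison only becomes possible once some alternative X is written down. The cardioid density used as X, and its parameters `mu` and `a`, are hypothetical choices, not anything from the discussion above.

```python
import math

def loglike_flat(phis):
    """Log-likelihood of N phases under the flat density 1 / (2 pi)."""
    return -len(phis) * math.log(2.0 * math.pi)

def loglike_cardioid(phis, mu, a):
    """Log-likelihood under p(phi) = (1 + a cos(phi - mu)) / (2 pi), with 0 <= a < 1."""
    return sum(math.log((1.0 + a * math.cos(p - mu)) / (2.0 * math.pi)) for p in phis)

def resultant_length(phis):
    """Mean resultant length: near 0 for spread-out phases, near 1 for clumped ones."""
    n = len(phis)
    c = sum(math.cos(p) for p in phis) / n
    s = sum(math.sin(p) for p in phis) / n
    return math.hypot(c, s)

spread = [2.0 * math.pi * k / 8 for k in range(8)]   # evenly spaced phases
clumped = [0.1 * k for k in range(8)]                # phases bunched near 0.35

# The flat hypothesis cannot distinguish the two sets.
print(loglike_flat(spread), loglike_flat(clumped))
# A chosen statistic can.
print(resultant_length(spread), resultant_length(clumped))
# A well-specified alternative X makes a likelihood ratio possible.
for name, phis in (("spread", spread), ("clumped", clumped)):
    print(name, loglike_cardioid(phis, mu=0.35, a=0.9) - loglike_flat(phis))
```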


stellar populations, image modeling

Bovy, Lang, and I wrote down a complete model and objective function for our image modeling / astrometric catalog / multiwavelength counterparts project. In principle, the model and objective function are all the science; optimization is just engineering, but in practice that is never trivial for problems as hard as this one. We also reminded ourselves why the best way to do catalog matching is through synthetic image reconstruction.

Scott Trager (Kapteyn) gave our group meeting talk on stellar populations in old galaxies. When I last worked in this area, the theoretical modeling limited any conclusions. Since then, the already great data have become even better, but Trager surprised me by showing that the models have also enormously improved, in part thanks to him. It might be time to dive back in.


astrometry and image modeling

Lang is in town. Bovy, Lang, and I discussed image modeling, in preparation for Bovy's shot at multi-wavelength simultaneous modeling for the construction of astrometric measurements or an astrometric catalog. We argued about what freedom to give the point-spread function: Use the survey- or observation-provided point-spread function, fit the provided point-spread function, or learn the point-spread function from the data. I lean towards the last option, despite the fact that it involves reinventing the wheel, because otherwise we have to write specialized modules, one for each input data set, to deal with the point-spread function analysis available.

Lang and I also discussed getting the main Astrometry.net paper done and submitted.


radial velocities

In my tiny amount of research time today, Blanton and I discussed the possibility of taking a billion radial velocities to match Gaia's transverse measurements. This would be an enormous project, but not expensive on the scale of space missions.


Spitzer telemetry, angular momentum, and scheduling

I am at the Oversight Committee meeting for the Spitzer Science Center, which is, by the rules, not research. However, I learned a lot about scheduling and telemetry and pointing. Pointing is done with reaction wheels. In low-Earth orbit, you can use a combination of reaction wheels and magnetic torques, and therefore control the reaction-wheel revolutions and speeds. But Spitzer is on an Earth-trailing orbit, very far away; it is not in a magnetic field that is large enough. This means that the reaction wheels are doing lots of revolutions, and they are known empirically to be among the systems most likely to fail during the Telescope's later years. The Earth-trailing orbit is also a challenge for data telemetry, because it relies on sending data bursts to enormous radio antennas in the Deep Space Network (and even this can only be done at certain satellite Earth-angles).


type Ia supernovae

Carles Badenes (Princeton) gave a great astro seminar about the remnants of type Ia supernovae. He can type ancient supernovae by observing their remnants, and in some cases his type determinations can be tested by observing light echoes (now, hundreds of years after the original explosion).


halo paper, MCMC insanity

Started reading and commenting on Zolotov's paper on observations of a simulated galaxy. It shows that kinematics is not necessarily a good tracer of origin (that is, accreted versus in-situ formation for stars).

Suffering from insanity, I wrote a demo that uses Markov Chain Monte Carlo to fit a model composed of a mixture of k Gaussians to a set of one-dimensional data with proper Poisson likelihood. This is insane, because the Expectation-Maximization algorithm solves this problem already, and it is simple and comprehensible and far faster than MCMC. But I want us to be able to marginalize over parameters, so I need a sampling around the best fit. I don't think I have any other choice.
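
For concreteness, here is a minimal sketch of the kind of sampler described, not the demo itself: a random-walk Metropolis chain over the parameters of a two-Gaussian mixture fit to one-dimensional data. Several things are assumptions made for illustration: k is fixed at 2, the ordinary unbinned mixture likelihood stands in for the Poisson likelihood mentioned above, the priors are flat (and improper) in the means, log-widths, and logit of the weight, and the step size and chain length are untuned guesses.

```python
import math
import random

SQRT_2PI = math.sqrt(2.0 * math.pi)

def log_likelihood(params, data):
    """Unbinned log-likelihood of a two-Gaussian mixture in one dimension."""
    mu1, mu2, ln_s1, ln_s2, logit_w = params
    w = 1.0 / (1.0 + math.exp(-logit_w))
    s1, s2 = math.exp(ln_s1), math.exp(ln_s2)
    total = 0.0
    for x in data:
        p1 = w * math.exp(-0.5 * ((x - mu1) / s1) ** 2) / (s1 * SQRT_2PI)
        p2 = (1.0 - w) * math.exp(-0.5 * ((x - mu2) / s2) ** 2) / (s2 * SQRT_2PI)
        total += math.log(p1 + p2 + 1e-300)  # tiny floor guards against log(0)
    return total

def metropolis(data, start, n_steps=4000, step=0.05, seed=0):
    """Random-walk Metropolis with flat priors on the sampled parameters."""
    rng = random.Random(seed)
    current = list(start)
    current_lp = log_likelihood(current, data)
    chain, n_accept = [], 0
    for _ in range(n_steps):
        proposal = [p + rng.gauss(0.0, step) for p in current]
        lp = log_likelihood(proposal, data)
        # Accept with probability min(1, exp(lp - current_lp)).
        if rng.random() < math.exp(min(0.0, lp - current_lp)):
            current, current_lp = proposal, lp
            n_accept += 1
        chain.append(list(current))
    return chain, n_accept / n_steps

# Fake data: two well-separated components.
rng = random.Random(7)
data = [rng.gauss(-2.0, 0.5) for _ in range(50)] + [rng.gauss(2.0, 0.5) for _ in range(50)]

chain, accept_fraction = metropolis(data, start=[-1.0, 1.0, 0.0, 0.0, 0.0])
kept = chain[len(chain) // 2:]  # discard the first half as burn-in
mean_mu1 = sum(s[0] for s in kept) / len(kept)
mean_mu2 = sum(s[1] for s in kept) / len(kept)
print(accept_fraction, mean_mu1, mean_mu2)
```

The retained samples are what EM does not provide: any function of the parameters can be averaged over them, which is the marginalization the post is after.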


k-means, MML, mixtures

When I crashed Bovy's office today, he was working on a minimum message length application for the k-means data clustering algorithm. Because k-means is not usually thought of as a data model, it is a little strange to apply MML, but we are interested in comparing PCA to k-means and assessing scaling and other properties from the point of view of data compression or data summary. We also discussed our usual basket of topics, but notably implementing an MCMC-optimized mixture-of-Gaussians model, which would have some (inferential, but not speed) advantages over EM.


Spitzer geometry

Wu, Schiminovich, and I met to discuss the progress of the S5 data reduction, which involves the assembly and unified reduction of a large number of galaxy spectra in the mid-infrared. We discussed progress and to-do items, but spent a bit of time understanding the geometry of the focal plane and spectral apertures, so that we could make informative figures and diagrams showing the relationship between the spectroscopy and our imaging data.


one star, one delta function

After the breakthroughs of the weekend, Bovy and I worked out the case for one single star (measure x, v, infer omega) for the one-dimensional simple harmonic oscillator, where we assume that the distribution function in action space is a delta function. It turns out to be identical to what we had before, but it will be different when we have N stars.


modeling phase space

There has been a long email conversation between Bovy, Lang, Tremaine, and me about modeling mixed-phase (ergodic) dynamical systems to determine potentials from instantaneous phase-space observations. This has been endless, for all the reasons mentioned in previous posts. I brought in Marshall before the Thanksgiving weekend, and traffic continued.

I think we understand something better than we did: We have to include parameters that explicitly model the distribution function in actions (recall that this is the action-angle formalism for integrable potentials at present). We can then marginalize over the parameters of the distribution function when we determine the parameters of the integrable potential.

But there is still the original problem that Tremaine started us on. We understand it better but we don't have a solution: How do you explicitly find the potential that is most consistent with mixed phase using straight-up Bayesian (that is, proper probabilistic) inference? It seems that all you are allowed to do is determine the posterior probability, and that posterior probability is not penalized (very much) if the particles are clustered strongly in phase. There is no way to penalize any such clustering strongly without also strongly violating the independent and identically distributed assumption or picture in which we would like to work (for now).

Partly this all goes to show that mixed-phase systems are hard to work with. Fortunately, they also don't exist.


optimization, deconvolved distributions

I started working through Bovy's opus on our technique for estimating the error-deconvolved distribution function that generates a d-dimensional data set. There are lots of algorithms for describing the distribution function that generates your data. But in general each data point has noise contributions, and in general those noise contributions are drawn from different distributions for each data point. Bovy's framework and algorithm model the distribution that, when convolved with each data point's noise distribution function, maximizes the likelihood of the data. This is the fundamental object of interest in science: The distribution you would have found if your data were perfect.
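
To make the objective concrete, here is a minimal one-dimensional sketch, assuming the underlying distribution is a single Gaussian with mean `m` and variance `V` and that each point has its own known Gaussian noise variance. Under those assumptions each datum is drawn from a Gaussian of variance `V` plus its own noise variance, and an EM-style iteration climbs that likelihood. The single-component, one-dimensional setup and the function name `deconvolve_gaussian` are illustrative assumptions; the framework described above is more general and may differ in detail.

```python
import random

def deconvolve_gaussian(xs, noise_vars, n_iter=200):
    """Estimate the noise-free mean and variance from heteroscedastic data."""
    n = len(xs)
    m = sum(xs) / n
    V = sum((x - m) ** 2 for x in xs) / n  # start from the noisy (too broad) variance
    for _ in range(n_iter):
        b, B = [], []
        for x, S in zip(xs, noise_vars):
            gain = V / (V + S)
            b.append(m + gain * (x - m))  # expected true value behind this datum
            B.append(V - gain * V)        # remaining uncertainty on that true value
        m = sum(b) / n
        V = sum((bi - m) ** 2 + Bi for bi, Bi in zip(b, B)) / n
    return m, V

# Fake data: true values ~ N(3, 1), each observed with its own noise level.
rng = random.Random(42)
sigmas = [rng.uniform(0.5, 2.0) for _ in range(4000)]
xs = [rng.gauss(3.0, 1.0) + rng.gauss(0.0, s) for s in sigmas]
noise_vars = [s * s for s in sigmas]

m_hat, V_hat = deconvolve_gaussian(xs, noise_vars)
mean_x = sum(xs) / len(xs)
naive_var = sum((x - mean_x) ** 2 for x in xs) / len(xs)
print(m_hat, V_hat, naive_var)
```

The naive sample variance includes the noise, so it comes out broader than the deconvolved estimate; that gap is the whole reason to model the noise per point.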

Conversations continued about image modeling, with Bolton, Lang, Marshall, and Bovy. We have ideas about parameterization of the model, and we have ideas about the objective function, but we are totally at sea when it comes to optimization. Everything we have in mind is expected to be incredibly slow. Incredibly. So I started looking into that.


mixture of delta functions, roulette

On the weekend, Adam Bolton (Hawaii) and I had a long discussion of modeling images, possibly as mixtures of delta functions (which, when convolved with the PSF, are a reasonable and complete basis for modeling any image). My interest is understanding multiwavelength data at the angular resolution of the highest resolution image (or better). This involves not just modeling the pixels, but also modeling spectral-energy-distribution space. We discussed using a delta-function sampling of this too.

Today, Bovy and I continued this discussion, with thoughts about how to generalize the pixel-space and SED-space models to other kinds of mixtures or linear subspaces. This is a non-trivial issue, because choices made here probably affect issues related to optimization, error analysis, and sampling, all of which will come much later.

A long email conversation among Bovy, Lang, Scott Tremaine (IAS), and me continued over our Bayesian formulation of the orbital roulette problem. This is all getting very philosophical, but Bovy and I are taking the position that the statement of roulette is the statement that the posterior probability distribution of phase, for each object taken individually, is flat. This becomes a constraint on the permitted priors. Tremaine had hoped that roulette could be derived from Bayesian considerations, not that it would be an assumption modifying the Bayesian inputs.


clustering, shapelets

Alison Coil (UCSD) gave a nice talk about clustering and the things that can be learned about galaxy evolution from a halo occupation picture of galaxy and quasar clustering and cross-correlations. She has some tantalizing results on green valley galaxies.

Along those lines, Phil Marshall and I specified a well-posed problem for doing galaxy morphologies and learning things from them using shapelets, or, better, Kuijken's sechlets. We wrote down some early specs, although we didn't in fact get started.



Phil Marshall (UCSB) came into town to talk about various things, including image modeling or deconvolution. We are both trying to reach the information limit in ground-based data by image modeling. Marshall's problem is to find or analyze arcsecond-scale gravitational lenses in large ground-based surveys where the seeing is on the order of one arcsecond. My problem is to create the best possible astrometric catalog from multi-wavelength data from NASA surveys. In both cases, we need to fit point sources to our images; this is deconvolution (even though most wouldn't call it that).


S5 visualizations, phase space

Schiminovich, Wu, and I discussed our S5 project, a statistical project with Spitzer spectroscopy of SDSS galaxies. Wu is trying to automate all of the data analysis, which is a laudable and ambitious goal. We worked out a set of data visualizations that will help us vet the output of the automated pipelines.

Kathryn Johnston (Columbia) and I discussed the determination of the Galaxy potential from the kinematics of stars when we can't assume that the potential is integrable or even time-independent, and when we can't assume the stars are phase-mixed. We discussed approaches that, instead of assuming phase mixing, assume maximum clumpiness or minimum entropy in phase space. There are no well-posed solutions to the general problem that either of us knows about; everything has to assume something we know not to be true. But which is the more damaging assumption, the assumption that the potential has time-independence or other (known to be broken) symmetries, or the assumption that the phases of the particle orbits are mixed? And how do we find out empirically?



Spent my research time today editing and commenting on manuscripts from Wu (infrared properties of ultra-low luminosity galaxies) and Bovy (Bayesian orbital roulette).


digitized plates

I spent some time learning about the horrendous photometric and astrometric issues with photographic plates that have been digitized. The photometric and astrometric solutions are non-trivial functions of stellar brightness, because as stars saturate, only the halo (outer part of the PSF) can be measured. The detectable size of the halo of each star grows slowly with magnitude (this is good, but it needs to be calibrated), and it shifts radially relative to the star (the outer part is not concentric with the inner parts). The former effect means that magnitudes are a function of "flux" in the scan and apparent angular size of the star. The latter effect means that the plate scale is a function of stellar magnitude! If we are going to get all this right, we need a generative model of the optics and the photographic process. Uh-oh.


uninformative priors

[In a violation of the rules (at right), I haven't posted for a couple of days. Not much research got done, between finishing a funding proposal, giving a talk at Michigan State University (thanks everyone!), and the related travel.]

Bovy, Lang, and I had the realization that, in the one-dimensional simple-harmonic-oscillator formulation of the Bayesian orbital roulette problem, the priors matter. Some apparently natural choices of prior on amplitude and frequency make it such that the marginalized posterior probability distribution function for the phase is not the same as the prior distribution, that is, flat in phase. These choices of prior must be wrong choices in the absence of truly informative prior information, because the phases ought to be uniformly distributed no matter what.

Once we figured out which priors are truly uninformative, we were able to make the whole thing work. Time to write it up and extend it to more complicated one-dimensional cases (such as Oort's method for determination of the vertical mass distribution in the disk).


harmonic oscillator roulette

Bovy, Lang, and I worked on the weekend and today on an even simpler problem than the orbital roulette problem: The equivalent for a one-dimensional harmonic oscillator. If we get this squared away, we can update the Oort methods for determining the mass distribution (vertically) in the Milky Way disk to a justified Bayesian framework (and improve the precision and build in the capability of accepting finite errors to boot). It turns out to be a little bit non-trivial, though we got something working by the end of the day.



Beautiful astro seminar by Leslie Greengard (NYU), who is the author of the fast multipole method and various machine-precision methods for electromagnetism and gravity. He spoke about solving equations related to scattering in velocity space in plasma physics experiments. His method gets close to machine precision by converting some of the equations into integral equations and then solving the hard part numerically with expansions. Afterwards, we ended up having a discussion of the solution of linear differential equations under non-trivial boundary conditions, which took me back to undergrad.


bayesian orbital roulette

While in Princeton, Tremaine assigned Bovy and me the problem of making a Bayesian formulation of the orbital roulette problem. The existing formulation is disturbingly frequentist (choose a statistic and optimize it!), but Tremaine thought there might be problems in principle with any Bayesian formulation. Conversations between Bovy, Lang, and myself got us to a solution that works beautifully! Unfortunately, we don't quite understand the justification of our own method, which is an odd situation, but maybe not all that uncommon when inference gets hard. This is a baby problem for the Gaia problem that I have been talking about, so it is nice to already have something solid to talk about.

The idea in the Bayesian formulation is that the fact that you expect the phases of the orbits to be randomly distributed (that is, you expect a flat distribution of phases) turns into a distribution of inferred potentials (because, for each observed position and velocity, the phase is a function of what potential you choose). The problem with our formulation is that we used this thinking to jump to the solution, without going through all the steps that ought to be there. More when we have it.
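
Here is a toy sketch of that idea for the one-dimensional harmonic oscillator, where the "potential" is just the frequency omega. It assumes x = A cos(phi) and v = -A omega sin(phi), a flat distribution of phase, and a scale-free distribution of amplitude, p(A) proportional to 1/A. Changing variables from (A, phi) to (x, v) brings in a Jacobian factor 1/(A omega), so under these assumptions p(x, v | omega) is proportional to omega / (omega^2 x^2 + v^2). The amplitude distribution is a hypothetical choice made only to get a closed form; the post does not say which choices the full solution uses.

```python
import math
import random

def phase(x, v, omega):
    """Phase implied by an observed (x, v) for a trial frequency omega."""
    return math.atan2(-v / omega, x)

def log_likelihood_omega(omega, stars):
    """Sum over stars of log[omega / (omega^2 x^2 + v^2)], up to a constant."""
    return sum(math.log(omega) - math.log(omega ** 2 * x ** 2 + v ** 2)
               for x, v in stars)

# Fake snapshot: 1000 oscillators with true frequency 2 and uniform phases.
rng = random.Random(3)
true_omega = 2.0
stars = []
for _ in range(1000):
    A = rng.uniform(0.5, 1.5)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    stars.append((A * math.cos(phi), -A * true_omega * math.sin(phi)))

# Evaluate on a logarithmic grid of trial frequencies from 0.5 to 8.
omegas = [0.5 * 16.0 ** (k / 299.0) for k in range(300)]
log_likes = [log_likelihood_omega(w, stars) for w in omegas]
omega_best = omegas[log_likes.index(max(log_likes))]
print(omega_best)
```

Setting the derivative of this log-likelihood to zero gives the condition that the sum of cos(2 phi) over stars vanishes, with each phi computed by `phase` at the trial omega; a wrong frequency bunches the inferred phases and violates it.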


huge data

Today was the second Big Data Lunch at NYU, hosted by me and Heather Stewart of NYU Information Technology Services. The speaker was Kyle Cranmer (NYU), who talked about the LHC, which is truly big data. His talk was great, and started a tiny bit of a discussion about what would constitute truly usable shared, high-performance compute infrastructure for those of us who analyze data. For most of us, common facilities are not that useful in their current form, because we are very constrained in OS, file systems, data transfer, software installation, and other issues. Indeed, Cranmer is not allowed to publish results if he does not use particular versions of operating systems and software! Cranmer advocated an elastic system like Amazon's EC2, with virtualization that would permit arbitrary OS and software installs; I am agnostic, but I agree that we need to do something other than just build huge multi-core systems if we are going to facilitate computational science that goes beyond simulations.


the disk, AEGIS

Spent the day with Bovy at the IAS speaking with Tremaine (and others) about inferring dynamics from kinematics. We decided that the best place to start is the disk, because (a) there are lots of data available right now, (b) we already have results on the velocity distribution and the significance and origins of structure therein, (c) there are other straightforward projects to work through there (most of them much easier than what Bovy has already made happen), and (d) we need to build up some experience before we start thinking about the Gaia problem, which is huge.

Sandy Faber (UCSC) gave the IAS talk today; she spoke about the stellar mass and star-formation evolution results from DEEP2, but especially the multi-wavelength AEGIS data. She argued for a very simple picture in which halo growth is simple, and galaxy growth within those haloes is also simple (depending only on halo mass, with a slight adjustment with redshift). She showed that even this very simple picture explained most of the observations adequately. I don't disagree.


steady-state Galaxy?

I spent a chunk of the weekend and this morning working through the literature on modeling the Galaxy as a steady-state system, using phase-space data from Gaia or the projects that precede it. This didn't take long, because there isn't much written on the subject. Several good papers are by Binney, who has a torus programme. However, this program and all the others (going back to Schwarzschild) make the assumption that the potential is time-independent and has no orbiting substructure (a kind of time-dependence). Binney and others suggest that the time-dependence could be seen in residuals of any fit, but that is not clear to me, and it is also not clear that such a discovery of time dependence would permit good analysis of the time dependence.

Interestingly, tidal streams (disrupting satellites and clusters) both measure the potential directly (in some sense, because the streams highlight an orbit or a small family of similar orbits), and show that the Milky Way is not in steady state. I would love to discover the analysis program that makes use of the near-steadiness and the substructure that is not steady to infer the potential and its time dependence all at once.


tristate exoplanets

Today was the (first) Tri-state Astronomy Conference, organized by Ari Maller (CUNY CityTech) and Marla Geha (Yale). There were good talks all day, with quite a few (though not all) New-York-area institutions represented. I learned the most from the talk by Ben Oppenheimer (AMNH). He made a pitch for exoplanet research and showed me several things I had not thought about or seen before:

  • Although the properties of a star are set, pretty much, by mass and chemical composition (and age), this is not even close to true for planets. Look at the Solar System! This shows that once you get to low mass, formation history and environment matter, deeply.
  • There is no clear distinction between brown dwarfs and planets, observationally. The differences are entirely related to discovery technique! In mass and orbital radius distributions, planets and brown dwarfs overlap, and—in the absence of good theories—there is no reason to make hard distinctions (although we all do).
  • Polarimetry plus coronagraphy combined do much better than either alone, and it is possible to see reflected light from planets and proto-planetary disks at incredibly small luminosity ratios with the combination. Coronagraphy is limited, at the present day, by speckles (wavefront irregularities), which are introduced not just by the atmosphere but by every optical surface, of course.

I asked Oppenheimer about modeling the speckles to remove them, and he said that they are very complicated and change with time and wavelength. Of course that is true, but that also helps with modeling them. A better answer is that the modeling can only happen with the intensity data, whereas improving the optics makes use of the amplitudes and phases, a space in which you can cancel out (rather than just model) your instrument issues. So Oppenheimer is right to be putting time and money into great optics, and to work on software only after optimizing the hardware as much as possible.


merger origin of the halo, GALEX

Adi Zolotov—spending this year at Haverford with Willman—was back in the city today, and we spent some hours talking about projects. Her big project is about the signatures of hierarchical galaxy formation in the straightforward observables available to us from halo stars, such as spatial distributions, metallicities, and kinematics. She is developing a theory-motivated set of observational tests that will be sensitive to merger history.

Chris Martin (Caltech) came and gave the Physics Colloquium. He spoke about the GALEX satellite, with a focus on technical and hardware aspects. It was beautiful stuff.


quasar absorption lines, submitted

Nikhil Padmanabhan (Berkeley) showed up at NYU today, and we discussed various matters related to BOSS and the baryon acoustic feature. Nikhil argued that we do not yet have a fully worked-out plan for analyzing the quasar absorption-line spectra, and measuring the correlation function therefrom. We discussed this in detail and I got a bit interested in the data analysis problem, which, when written down correctly, is quite difficult: It involves marginalizing over all hypotheses about each spectrum's unabsorbed continuum.

In other news, Surhud More submitted the transparency paper to ApJ. Should be on arXiv on Monday.


from Gaia to gravity

In what research time was left after completing my NSF proposal (hooray!), I worked on the theoretical question of how one might infer the gravitational forces or gravitational potential of the Milky Way from the Gaia observations. The problem is extremely non-trivial, because Gaia only measures positions and velocities, but it is accelerations that constrain the gravitational potential or forces. As I have said before, all important questions in science are ill-posed; the challenge is to come up with well-posed approximations to the ill-posed questions. I think there might be some well-posed questions to ask with Gaia, but it is hard to write them down even if you are allowed to assume that the potential is time-independent and azimuthally symmetric (which you aren't).


predicting radial velocities

Most of the day was lost to an NSF proposal, but Bovy and I got in some quality time on his prediction of radial velocities in the Solar Neighborhood. He has a model of the velocity field from the transverse velocities; that model plus the measurement of each star's transverse velocity makes a prediction for each star's radial velocity, in the form of an error-convolved probability distribution function. Bovy can search these for the most discrepant values among those stars that do have radial velocities (these are stars from the halo, probably), find the stars for which the radial velocity prediction is most informative or constraining (these provide critical tests of the model), and find the stars for which the radial velocity prediction is the least informative (these provide the most valuable follow-up observations for improving the precision of the model).
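
The prediction step can be sketched with a single Gaussian component and one transverse dimension, both simplifications assumed here for illustration (the actual velocity-field model is presumably richer). If the radial and transverse velocities are jointly Gaussian, then conditioning on a noisy transverse measurement gives a Gaussian prediction for the radial velocity. All names and numbers below are hypothetical.

```python
def predict_radial(v_t_obs, t_noise_var, mu_r, mu_t, c_rr, c_tt, c_rt):
    """Mean and variance of p(v_r | observed v_t) for a joint Gaussian model.

    The measurement noise on v_t adds to the model variance c_tt, which
    weakens the constraint and broadens the prediction.
    """
    c_tt_obs = c_tt + t_noise_var
    mean = mu_r + (c_rt / c_tt_obs) * (v_t_obs - mu_t)
    var = c_rr - c_rt ** 2 / c_tt_obs
    return mean, var

# Hypothetical model numbers in km/s and (km/s)^2.
model = dict(mu_r=10.0, mu_t=-5.0, c_rr=400.0, c_tt=225.0, c_rt=150.0)

mean_exact, var_exact = predict_radial(10.0, 0.0, **model)
mean_noisy, var_noisy = predict_radial(10.0, 75.0, **model)
print(mean_exact, var_exact)
print(mean_noisy, var_noisy)
```

A measured radial velocity can then be scored against this prediction, and the width of the prediction says how informative a new measurement of that star would be.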


transparency scoop, velocities, testing GR

More and Bovy found papers today that—to some extent—scoop our result on the transparency of the Universe with baryon acoustic feature and type Ia supernovae. We did some re-tooling to emphasize the new aspects of our work in this context.

Bovy and I discussed at length various issues related to the determination of the velocity field from Hipparcos. Bovy has implemented a beautiful system that determines the error-deconvolved distribution of velocities, even in the face of the issue that each data point has a different error. We discussed using his fit to the velocity field to predict the results of radial velocity surveys.

Bhuvnesh Jain (Penn) gave a great astro seminar on testing gravity using the comparison of lensing and kinematic / dynamical measures of the potential. He featured Adam Bolton's strong lenses as among the best tests of this, although Bolton's test is at smaller length scales (kpc) than the scales of interest to most cosmologists (Mpc to Gpc). Jain made a very good argument that if you want to test GR, you have to work at all scales and with all techniques; the different cosmological tests can only be combined or ranked if you believe GR.


white dwarfs

In between bouts of grant-proposal writing, I discussed with NYU undergraduate Antony Kaplan the idea that he might run the faint-motion software that Lang and I wrote on the white dwarfs in the SDSS Southern Stripe to determine all the proper motions and parallaxes, even below the individual-epoch detection limits.


rule of thumb

Phil Marshall emailed me, asking about the original citation / derivation of the rule of thumb that a source detected in an image at some signal-to-noise ratio [s/n] can be centroided to an accuracy of about the FWHM divided by [s/n]. It was funny he asked, because we had discussed the very same issue only days earlier in responding to the referee for the faint-motion paper. Rix found King (1983), which probably is the first paper to discuss this (interested if anyone out there knows a more recent reference). Nowadays, the standard answer is the Cramer-Rao bound (Robert Lupton said this in response to a query from me), but that isn't quite the answer most people are looking for.
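
As a worked instance of the rule of thumb (an order-of-magnitude estimate only; the function name is made up for this example):

```python
def centroid_uncertainty(fwhm, signal_to_noise):
    """Rule-of-thumb centroid accuracy: FWHM divided by signal-to-noise."""
    if signal_to_noise <= 0:
        raise ValueError("signal-to-noise must be positive")
    return fwhm / signal_to_noise

# A source with 1.2 arcsec FWHM detected at s/n = 20.
sigma_arcsec = centroid_uncertainty(1.2, 20.0)
print(sigma_arcsec)
```

So a 20-sigma detection in 1.2-arcsecond seeing can be centroided to roughly 0.06 arcseconds, before any constant factors of order unity that a proper derivation would supply.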


USNO-B and GALEX, supervised

I got stranded in Nantucket by high winds (cancelled ferries). This cost me Monday, and I spent parts of today making up for it. My research time was spent with Schiminovich, talking about what we should do with the SDSS and GALEX, and what we will do in the very short term. The very short term project is to use SDSS and GALEX to learn what quasars look like and then find them all-sky with USNO-B1.0 and GALEX. Same with white dwarfs. This is a nice project in supervised methods for automated classification, something I was railing against in Ringberg.



Spent the afternoon at the AAVSO annual meeting in Nantucket (yes, my travel schedule is not sane). My word, are the AAVSO observers impressive! Every talk showed ridiculous light curves with incredible sampling and huge signal-to-noise, and many of the photometry sources are people working visually (with their eyes, no detectors). The data are consistent from observer to observer and highly scientifically productive. Of course, many of the AAVSO members use CCDs too, and these tend to be among the best calibrated and understood among hobbyist setups. Naturally, that is why I am here.


minimum message length

On the plane home from Germany, I worked on various writing projects, including the transparency paper and my Class2008 proceedings. I tried to write down what minimum message length could say about the Milky-Way-reconstruction problem from astrometric measurements of stellar motions and parallaxes. I have a strong intuition that there is a correct—or at least very useful—approach that could be inspired by or directly derived in the context of the idea that the most probable (posterior-probable) model is the one that provides the best (lossless) compression of the data given the coding scheme suggested by your priors. If I could write it down, it might help with the upcoming GAIA data.
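
The correspondence between posterior probability and compression can be shown in a toy discrete setting, which is an assumption of this sketch and far simpler than the Milky Way problem. The two-part message length in bits is minus log2 of the prior plus minus log2 of the likelihood, so the shortest message and the highest posterior pick out the same model. The coin-flip data and the two candidate hypotheses are invented for illustration.

```python
import math

def message_length_bits(prior_prob, likelihood):
    """Two-part code length: bits to state the model plus bits for the data given it."""
    return -math.log2(prior_prob) - math.log2(likelihood)

def coin_likelihood(p_heads, n_heads, n_tails):
    """Probability of one specific flip sequence with the given counts."""
    return p_heads ** n_heads * (1.0 - p_heads) ** n_tails

# Two candidate models (fair coin, biased coin), equal prior probability.
hypotheses = {0.5: 0.5, 0.9: 0.5}
n_heads, n_tails = 9, 1

lengths = {p: message_length_bits(prior, coin_likelihood(p, n_heads, n_tails))
           for p, prior in hypotheses.items()}
best = min(lengths, key=lengths.get)
print(lengths, best)
```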


class2008, day three

On the third day of Classification and Discovery, I chaired a session on the time domain; I was blown away by the data from the CoRoT experiment. But I was even more fired up by Anthony Brown's description of the problem of inferring Galactic structure from GAIA data. This problem has so many awesome aspects, ranging from a good argument for generating the data with the model (think Lutz-Kelker problems with parallaxes) to a huge issue with priors (because the mission measures positions and velocities but not accelerations, and accelerations are what the Galaxy produces). I will say more about the latter when I get it sorted out in my head. GAIA really will provide the best inference problem ever encountered in astrophysics.


class2008, day two

This morning concentrated on understanding galaxies in large surveys. Among a set of interesting talks about galaxy classification, Boris Haeussler gave a nice talk in which he put the standard 2-d galaxy fitting codes through their paces, and found some very interesting things, including underestimated errors—even when he puts in fake data for which the fitting codes are not making approximations! Vivienne Wild spoke about a robust PCA and its use in understanding rare populations such as post-starbursts and their role in galaxy continuity. Two of my favorite topics in one talk! The PCA adjustment is very smart although somewhat ad-hoc (not described in terms of probabilistic inference). The post-starburst work is even better; it confirms our results that suggest that post-starbursts are key in the evolution of stellar mass from the blue to red sequences. Many other good contributions too numerous to mention, with a lot of people working on optimal extraction of information from spectra; very encouraging for the future of spectroscopy.


class2008, day one

The afternoon of the first day of Classification and Discovery concentrated on classification methods, almost all supervised (learn with training set, run on larger data). I am largely against these methods, in part because very few of them make good use of the individual noise estimates, and in part because your training data are never the same—in important respects—as your real data. However, a nice discussion ensued, led in large part by Alexander Gray (Georgia Tech); in this I argued for generative models for classification, but of course these are only possible when you have a good model of both the Universe and your hardware!


more writing

Spent my research time today cleaning up my class2008 proceedings, which is now a full-on polemic about massive data analysis. In the process, I learned something about minimum message length in Bayesian model selection; we have been using this but I didn't know how rich the subject is (though I don't like the persistent comment that it encodes Occam's razor—another good subject for a polemic). On the airplane to Germany I will have to convert all this into a talk.


wrote like the wind

In a miraculous couple of hours, I cranked out the remainder of our class2008 proceedings—the necessity of automating calibration, and methodologies for automated discovery in the context of a comprehensive generative model—to make a zeroth draft. In writing this, I realized that we have actually demonstrated most of the key concepts in this automated discovery area in our faint-source proper-motion paper.

Lang has promised me not just criticism, but a direct re-write of parts, within 24 hours.



In the small amount of research time I got today, I wrote my Class2008 proceedings as rapidly as possible.


catalogs as image models

I worked more on my position on catalogs, with some help from Lang. Here are some key ideas:

  • Catalogs originated as a way for astronomers to communicate information about images. For example, Abell spent thousands of hours poring over images of the sky; his catalog communicated information he found in those images, so that other workers would not have to repeat the effort. This was at a time that you couldn't just send them the data and the code.
  • Why did the SDSS produce a catalog and didn't just release the images? Because people want to search for sources and measure the fluxes of those sources, and people do this in standard ways; the SDSS made it easier for them by pre-computing all these fluxes and making them searchable. But the SDSS could have produced a piece of fast code and made it easy to run that code on the data instead; that would have been no worse (though harder to implement at the present day).
  • One of the reasons people use the SDSS catalogs is not just that they are easy to use, but that they contain all of the Collaboration's knowledge about the data, encoded as proper data analysis procedures. But here it would have been more useful to produce code that knows about these things than a dataset that knows about these things, because the code would be readable (self-documenting), re-usable, and modifiable. Code passes on knowledge, whereas a catalog freezes it.
  • The catalogs are ultimately frequentist, in that hard decisions (about, say, deblending) are made based on arithmetic operations on the data, and then the down-stream data analysis goes according to those decisions, even when the real situation is that there is uncertainty. If, instead of a fixed catalog there was a piece of code that takes any catalog and returns the likelihood of that catalog given the imaging, we could analyze those decisions probabilistically and do real inference.

And other Important Things like that.


catalogs polemic

I started writing my contribution to Classification and Discovery in Large Astronomical Surveys; I am writing about a generative model of every astronomical image ever taken. But right now the part I am most interested in is the part about catalogs being—explicitly—bayesian models of the imaging on which they are based. If the community adopted this point of view, it would have a number of advantages, in the documentation, usability, communication, interoperability, construction, and analysis of astronomical catalogs. I am trying to make this argument very clear for the proceedings.


lucky supernova, classification of algorithms

Alicia Soderberg (CfA) gave the astro seminar today, on a supernova she discovered by a soft x-ray flash apparently immediately at shock break-out, in other words at the beginning of the explosion, long before the optical came to maximum light. This permitted the study of the supernova from beginning to end. Unfortunately, her discovery involved an incredible amount of luck and we will have to wait for the next generation of x-ray experiments to discover these routinely. In answer to an off-topic question from me, she said that to her knowledge, there is no pre-cursor activity that precedes break-out. I asked because this would be an interesting effect to look for in historical data sets.

In the evening I finished writing up my short document that describes my classification of standard data-analysis algorithms.


insane theories, super-k-means

I can't say I did much research today, but while I failed to do research, Bovy (who is also attending a scientific meeting) looked at contemporary models that violate transparency to fix the supernovae Ia results in an Einstein–de Sitter Universe. These models are somewhat crazy, because they end up building epicycles to fix a problem that isn't really a problem, but in principle we will rule them all out with BOSS.

In my sliver of research time (and with Roweis's help), I figured out that PCA, k-means, mixture-of-gaussians EM, the analysis we did in our insane local standard of rest paper, and taking a weighted mean are all different limits of one uber-problem that consists of fitting a distribution function to data with (possibly) finite individual-data-point error distributions. I am trying to write something up about this.


Tolman and Etherington

In working on the More paper, I found myself looking through cosmography literature from 1929 through 1933. There is a series of papers by Tolman, in which he works out the Tolman test for the expansion of the Universe, which I think of as being a test of transparency and Lorentz invariance. Tolman worked out the test in the context of one world model (de Sitter's); his interest was in understanding the possible physics underlying the steady-state model; Etherington generalized it to a wider range of world models in 1933. After Etherington's generalization, the community should have realized that the test doesn't really test expansion per se, but it does test relativity and electromagnetism in that context.


transparency: the monopole term

Worked on the paper I am writing with More on the transparency of the Universe, showing that the consistency of baryon acoustic feature (not oscillation) measurements with supernovae type Ia measurements provides a non-trivial constraint on Lorentz invariance and transparency. Right now this is not super precise, but it is highly complementary to measurements of absorption (presumably by dust) in lines of sight near galaxies, because there is no model-independent way to integrate the absorption signal correlated with galaxies to the mean, global value—what I would call the monopole term.


mixture of delta functions

Spent the day working on a specialization of mixture-of-gaussians (as a model for a distribution function in a high-dimensionality space) to mixture-of-delta-functions (which would have terrible likelihood for any data set except when you consider that there are observational errors). With Bovy's help I realized that the method we published in 2005 in this unlikely place actually doesn't work for the zero-variance corner of model space. Have to figure out why.



I spent part of the day thinking about and part of the day writing about a generalization of the k-means clustering algorithm to the case where there are missing data dimensions and dimensions measured with varying quality. That is, I am attempting to generalize it so that it clusters the data by chi-squared rather than uniform-metric squared distance. This, if I am right, will be a maximum-likelihood model for the situation that the underlying distribution is a set of delta functions and the data points are samples of that distribution but after convolution with gaussian errors (different for each data point). My loyal reader will recognize this as a statement of the archetypes problem on which I have been working for the last week or so.
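
A minimal sketch of what such a chi-squared k-means could look like, under the assumptions that the errors are Gaussian and independent per dimension, and that a missing value is encoded as an inverse variance of zero (with any finite placeholder in the data). The function name and data layout are hypothetical, and this is not the code described in the post.

```python
def chi2_kmeans(data, ivar, centers, n_iter=20):
    """K-means with per-point, per-dimension inverse variances.

    data[i][d]   : measured value of point i in dimension d
    ivar[i][d]   : inverse variance of that value (0.0 marks a missing
                   value; the matching data entry must still be finite)
    centers[k][d]: initial guesses for the delta-function locations
    Returns (centers, labels).
    """
    centers = [list(c) for c in centers]
    n_dim = len(centers[0])
    labels = [0] * len(data)
    for _ in range(n_iter):
        # Assignment step: nearest center in chi-squared, not Euclidean distance.
        for i, (x, w) in enumerate(zip(data, ivar)):
            chi2 = [
                sum(w[d] * (x[d] - c[d]) ** 2 for d in range(n_dim))
                for c in centers
            ]
            labels[i] = chi2.index(min(chi2))
        # Update step: inverse-variance-weighted mean of the assigned points.
        for k in range(len(centers)):
            for d in range(n_dim):
                num = sum(ivar[i][d] * data[i][d]
                          for i in range(len(data)) if labels[i] == k)
                den = sum(ivar[i][d]
                          for i in range(len(data)) if labels[i] == k)
                if den > 0.0:  # keep the old value if nothing constrains it
                    centers[k][d] = num / den
    return centers, labels
```

With all inverse variances equal this reduces to ordinary k-means; like ordinary k-means it uses hard assignments and can get stuck in local optima that depend on the initial centers.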


Spitzer data, MW halo

Wu, Schiminovich, and I discussed data reduction for our large Spitzer program of spectroscopy.

At lunch—Columbia's Pizza Lunch—I described some of Koposov's results on the Milky Way halo potential as measured by a globular-cluster tidal stream.


faster code

Bovy has re-written all our "infer d-dimensional distribution functions when you have noisy data with missing values" code in C and it appears to be much faster than the (heroic) code written by Roweis and Blanton back in the day when we were all discovering the value of pair coding. Bovy and I spent some time discussing split and merge, which is a method for exploring mixture-of-gaussians models when you think you might be stuck in a local minimum.

Bovy and I also discussed the problem of comparing millions of SDSS spectra to one another in finite time. We figured out that the full N-squared calculation would take a year even if we coded it in machine language, so we want to do the full comparison only after trimming the tree with some reliable heuristics. We came up with a straw plan, but I am suspicious about its reliability (that is, we don't want to trim valid leaves) and effectiveness (that is, we want to massively speed things up).


extreme galaxy formation

My day-derailed-off-research-by-undergraduate-studies yesterday was interrupted by a nice talk by Schiminovich about what we have learned from GALEX and Spitzer about star formation in galaxies as a function of galaxy redshift and galaxy specific star-formation rate.

Today I plotted the number of archetypes required to represent a galaxy spectroscopic sample (from Moustakas) as a function of the statistical precision (as measured by chi-squared). The number monotonically increases with the precision, but differently than I expected.


integer programming

Roweis and my approach to constructing archetypes—small subsets of data points that represent all data points—is one of integer (or actually binary integer) programming. You have a large number of data points, and you include a small number of them, and exclude the rest, subject to constraints (the constraints that each point in the large set be represented), and optimizing some cost function (the total number of archetypes, in the simplest case). In general, these problems are, indeed, NP hard, as I suspected (below).

Roweis had the good idea of approximating the binary programming problem with a linear programming problem, and then post-processing the result. This is a great idea, and it works pretty well, as I discovered this morning, when everything came together and my code just worked. However, the number of archetypes we were getting in our post-processing was significantly larger than that expected given the performance of the linear program approximation.

It turns out that standard linear programming packages (open source glpk and commercial CPLEX, for example) have integer and binary programming capabilities. These also solve the linear program first and then post-process, but they do something extremely clever in the post-processing step and are much better than my greedy algorithm. They both come very close to saturating the linear programming optimal cost, for the problem we currently care about (although CPLEX does it much, much faster than glpk, in exchange for infinitely larger licensing fees).
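
For orientation, here is a sketch of a generic greedy heuristic for this kind of covering problem. It is a textbook set-cover greedy, not necessarily the greedy post-processing mentioned above, and the input layout (a dict from candidate index to the set of data points that candidate represents) is an assumption made for the example.

```python
def greedy_archetypes(represents):
    """Greedy heuristic for the covering problem.

    represents[j] is the set of data-point indices that candidate j
    represents acceptably (for example, within some delta-chi-squared
    threshold). Returns a list of chosen candidate indices.
    """
    uncovered = set().union(*represents.values())
    chosen = []
    while uncovered:
        # Pick the candidate covering the most still-uncovered points.
        best = max(represents, key=lambda j: len(represents[j] & uncovered))
        gain = represents[best] & uncovered
        if not gain:
            break
        chosen.append(best)
        uncovered -= gain
    return chosen
```

A greedy pass like this always returns a valid cover but is not guaranteed to find the smallest one, which is consistent with dedicated integer-programming solvers doing better.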

It was a very satisfying, research-filled day. As time goes on I will let my loyal readers know why we are interested in this.



I worked on code to generate from a set of delta-chi-squared values a linear program in CPLEX LP format for the archetypes project. Most of the difficulty was in formatting the lines, of course!
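
A sketch of what generating such a file might involve, assuming the problem is posed as "choose the fewest archetypes such that every data point has at least one archetype within a delta-chi-squared threshold". The function name, the variable naming (x0, x1, ...), and the threshold formulation are assumptions for illustration, not the code described here.

```python
def archetype_lp(dchi2, threshold, binary=True):
    """Write a covering problem as text in CPLEX LP format.

    dchi2[i][j] is the delta-chi-squared of representing data point i
    with candidate archetype j; candidate j "covers" point i when that
    value is at or below threshold. Variable xj = 1 means candidate j is
    kept as an archetype.
    """
    n_points = len(dchi2)
    n_cand = len(dchi2[0])
    lines = ["Minimize"]
    lines.append(" obj: " + " + ".join(f"x{j}" for j in range(n_cand)))
    lines.append("Subject To")
    for i in range(n_points):
        covering = [f"x{j}" for j in range(n_cand) if dchi2[i][j] <= threshold]
        if not covering:
            raise ValueError(f"data point {i} has no acceptable archetype")
        lines.append(f" c{i}: " + " + ".join(covering) + " >= 1")
    if binary:
        lines.append("Binary")
        lines.append(" " + " ".join(f"x{j}" for j in range(n_cand)))
    else:
        # Linear-programming relaxation: allow fractional membership.
        lines.append("Bounds")
        lines.extend(f" 0 <= x{j} <= 1" for j in range(n_cand))
    lines.append("End")
    return "\n".join(lines) + "\n"
```

Passing binary=False writes the linear-programming relaxation instead of the binary problem. A realistic problem with many candidates would probably also need its long lines wrapped, which this sketch does not attempt.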


linear programming

Spent the day learning about linear programming, for Roweis and my spectroscopic archetypes project. Our project is an integer programming problem, which is NP hard (I think), but we have a linear programming approximation. Linear programming is something I learned in high school; now there are lots of free codes that can deal with hundreds of thousands or millions of variables and constraints. Unfortunately, the languages with which the programs can be specified are a bit non-trivial; I have nearly figured out how to code my problem in one of those languages, but I don't know which language to use.



tidal stream radial velocities

I briefly helped Koposov this morning on an observing proposal to follow up his statistical measurement of the proper motion of a cold tidal stream with stellar radial velocities. The combination of transverse and radial velocities with distance and angular information means that if this proposal is accepted, Koposov will have not only full 6d phase-space information, but he will have that along the length of a long stream. This permits extremely precise orbit modeling, or, as Rix would say, a direct measurement of the acceleration due to gravity (velocity of the stream and curvature of the stream makes acceleration of the stream).
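
To put a rough scale on that last remark: if the stream is assumed to trace the path of its stars (only approximately true for a real stream), the acceleration perpendicular to the motion is a = v^2 / R, with v the speed and R the local radius of curvature. The helper below is a hypothetical illustration, and the numbers in the usage note are made up for scale, not measurements.

```python
KM_PER_KPC = 3.0857e16


def centripetal_acceleration(speed_km_s, radius_kpc):
    """Acceleration perpendicular to the motion, a = v**2 / R, in m/s^2."""
    speed_m_s = speed_km_s * 1.0e3
    radius_m = radius_kpc * KM_PER_KPC * 1.0e3
    return speed_m_s ** 2 / radius_m
```

For an illustrative 200 km/s and a 10 kpc radius of curvature, centripetal_acceleration(200.0, 10.0) is about 1.3e-10 m/s^2.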


black-hole orbits

Today was black-hole-orbit day, with talks by Gabe Perez-Giz and Janna Levin (both Columbia) on methods for calculating and classifying all possible orbits around black holes. Their techniques make use of periodic orbits, which comprise a dense set that fully covers orbit space (at least for extreme mass ratio). Two nice talks and lots of discussion.


source variability

My only contributions to astrophysics knowledge today were (1) helping van Velzen extract and analyze galaxy (yes galaxy, not star) light curves from the SDSS Southern Stripe, and (2) discussing the V-max or Malmquist issues in flux-limited samples with Wu.


cosmic-ray anisotropy, regularization and convergence

I had the privilege of serving on the PhD thesis committee for the defense of Brian Kolterman's (NYU) PhD thesis today. He performed a set of very careful statistical tests of the angle and time distributions of about 10^11 few-TeV cosmic rays incident on the Milagro experiment. He finds an anisotropy to the distribution in celestial coordinates, he finds a time dependence to that anisotropy, and he finds the (expected, known) effect of the orbit of the Earth around the Sun. The most surprising thing is the time dependence of the (very small but very high significance) anisotropy. After the very nice defense, Gruzinov and I spent some time arguing about whether the anisotropy and its time derivative were reasonable in the context of any simple model in which the cosmic ray population is fed by supernovae events throughout the disk of the Galaxy. I think I concluded that his results must put a strong constraint on the coherence or large-scale structure of the local magnetic field.

Bovy and I discussed the convergence and regularization of the mixture-of-gaussians model that he is fitting to the error-deconvolved velocity distribution in the disk in the Solar Neighborhood. We read some of the literature on EM and it was very instructive. Now Bovy has some serious coding to do. If he succeeds with all these enhancements, he will be hitting this problem with a very large hammer.


running code

I helped Sjoert van Velzen (NYU, Amsterdam) run our SDSS code to extract multiply observed objects in the SDSS Southern Stripe. He is looking for extremely variable AGN. Later, I rooted around with Wu in ancient directories looking for our copy of the PEGASE models for stellar populations. We found them. Can't say as I did much other research today!


shutting down star formation, galaxy templates

At group meeting, Wu and Moustakas spoke about galaxies with very high star formation rates, and galaxies that have just shut down their star formation abruptly. Wu is looking for the trigger for the cessation in these galaxies, which look like a generic phase in galaxy evolution. Moustakas has been looking at incredibly high velocity (thousand km/s) outflows of gas, possibly driven by very strong star formation.

After lunch I spoke with Bovy about my ideas to replace a PCA space with a set of hard templates in redshift determination and outlier finding for galaxy spectra. Although you need many more templates to represent the galaxies than you would need basis spectra to represent the spectra at some level of precision, you don't have to do nearly as much work to match spectra to individual unmixed templates as you have to do to find the position of the spectrum in the n-dimensional PCA space, in principle. Whether that is true in practice too, I don't yet know.
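
A sketch of the per-template work, assuming each hard template is fit with a single free amplitude and independent Gaussian pixel errors, and ignoring the redshift search entirely. The function name is hypothetical.

```python
def best_template(flux, ivar, templates):
    """Index, amplitude, and chi-squared of the best-fitting template.

    Each template is fit with one free amplitude, which has a closed-form
    weighted-least-squares solution, so matching costs one pass per template.
    """
    best = None
    for k, t in enumerate(templates):
        num = sum(w * f * m for f, w, m in zip(flux, ivar, t))
        den = sum(w * m * m for w, m in zip(ivar, t))
        amp = num / den
        chi2 = sum(w * (f - amp * m) ** 2 for f, w, m in zip(flux, ivar, t))
        if best is None or chi2 < best[2]:
            best = (k, amp, chi2)
    return best
```

The cost is linear in the number of templates and involves no matrix solve, which is the trade being weighed against a low-dimensional PCA fit. The sketch assumes no template is identically zero where the data have weight.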


information in an image, Auger

I had long conversations with Rob Fergus (NYU) and Lang about the information content of an image. Dustin resolved some of my paradoxes, in particular he figured out that you can't say anything about how much information is in your image unless you know the distribution function over images! That distribution function is incredibly hard to describe, of course, since even for tiny images like Fergus's, it has googol-cubed dimensions! Fergus's approach is to describe this function by sampling it, in some sense, but really what we have to do in our work is just approximate it in some sensible way.

In the afternoon, Jeff Allen (NYU) gave an excellent PhD candidacy exam talk in which he described how to reconstruct the physical properties of cosmic rays incident on the atmosphere from their fluorescence and ground-shower properties as measured by different Auger instruments. There are lots of puzzles in the reconstructions, and some of them will have mundane resolutions. There is an interesting possibility, however, that some will have extremely non-trivial resolutions.


computer failure

My household had a computer failure yesterday, which was impressive, since it was the first day of school. Fortunately, my ridiculous attention to backups paid off and we lost nothing. But it threw the supply chain into disarray and I spent today looking at purchasing some incredibly cheap spares. This is not research.


back to work

It was tough to get back to work in New York, and I spent most of the day getting ready for teaching and committees. That's not research.

I spent an hour talking with Bovy about the reconstruction of the velocity field of stars in the disk. This project is moving slowly just because our (well justified, correct) algorithm is extremely slow. I am very excited about the inference aspects of this project, because we are going to be able to make a lot of predictions about stellar radial velocities, and we will be able to test those predictions with extant data.

On a related note, I started writing a short polemic about representing posterior probability distributions (think Bayes) by sampling, and how that helps in complex scientific tasks.


last day in Heidelberg

I didn't do much research today because it was my last day at the MPIA in Heidelberg. What a great summer!

My one substantive scientific discussion was with Rix about understanding open clusters—their ages and mass functions—through astrometry. It is an old subject, but it hasn't really been tapped for all it is worth. And there are a lot of data.



Visualize me doing that NFL-style end-zone post-touchdown dance. The faint-source proper motion paper Measuring the undetectable (link should come live on 2008-09-01) is submitted to The Astronomical Journal and to the arXiv. Lang, Jester, and Rix all did some last-minute pulling together to get that done. Thanks, team! Now, does anyone want to follow up our brown dwarfs with infrared spectroscopy?


SDSS Southern Coadd Catalog

We realized today that we were slightly mis-using the SDSS Southern Stripe Coadd Catalog, and the parent sample for our faint-source proper motion paper went from 1500 sources to fewer than 100. But that's good, because now some of our statistics make much more sense.


warm Spitzer

I contemplated putting in a truly insane letter of intent to the Spitzer Cycle 6 (warm mission) call for proposals, which is for 10,000 hours of imaging at 3.6 and 4.5 microns. My contemplation was carried out while reading observing strategy cookbooks (thanks, Spitzer Science Center).


colors and proper motions

Lang computed infrared colors (from UKIDSS and SDSS-II) for the very faint, fast-moving sources to compare with the less-fast-moving sources at the same magnitude. Although all are very red in i−z, they vary hugely in z−J, with the faster-moving redder. This is good.


fundamental inference

After a few edits to the faint-motion paper, I had a long conversation with Adam Bolton (Hawaii) about constraining cosmological models (on small scales) with data (such as galaxy positions, redshifts, weak lensing, and the like). He wants to advocate an approach in which we find all (or a representative sample of) realizations of the density field that are consistent with the data, and ask whether they (or any of them) are consistent with the fundamental model, for model testing. This is in contrast to the usual techniques of computing statistics on the observations, statistics on the model, and comparing them. This standard technique rarely gets you close to saturating the available information, and given the quality of the CDM model, if you want to find problems you are probably going to have to come close to saturating the information available.


facilitating science

Jester, Rix, and I had various conversations today about how we can demonstrate that Lang and my method for measuring the astrometric variations of extremely faint sources can be used to speed, cheapen, and facilitate upcoming science projects, especially PanSTARRS. I am now working on paper modifications based on these discussions.


computer issues

What little work I did today was mainly preoccupied with computer issues, and not ones of an intellectually engaging nature.


Rix comments

Rix "bled" all over the faint proper-motion paper this weekend (think: lots of red ink). I got started mopping it up today.


finished polemicising

I finished my PCA polemic; it is too theoretical and needs to be threaded with real examples of failures and improvements and alternatives applied to real data sets. That is a long-term project, of course.



I poster-ified Lang and my paper on faint-source proper motions for the SDSS Symposium next week in Chicago.



I finished everything I could do on my project on the monopole term in the transparency with More; that project will now wait until he returns. I worked on writing up a principled approach to a side project that Lang and I have been discussing on finding variable sources in pure photon time streams.


jackknife and PCA collide

As my loyal reader knows, I have been bashing PCA and hyping jackknife. As I was responding to some local comments on our paper on faint-source proper-motions, I found myself adding the words principal components to a discussion of how well jackknife works! This is because if you have d parameters, and you want to measure the d×d covariance matrix, you will rarely have enough jackknife trials to fill in every element of the matrix precisely. To do so, you would need N much larger than d, and even then you would only do well if the covariance matrix describes a variance that is close to spherical.

However, the jackknife (except in pathological situations) will return a sampling of the covariance matrix that gets the principal components correct. This is because the principal components will dominate the variance (by definition, in some sense). And for error propagation, all you care about are the dominant directions in the space, as defined by the true covariance matrix; this is a rare case where the concept of PCA is good: it is a rare case where you care most about the directions of largest variance.
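
For concreteness, a sketch of the standard leave-one-out jackknife covariance estimate for a d-parameter estimator. The function name and the list-based interface are assumptions for illustration. With N trials the resulting matrix has rank at most N - 1, which is one way to see why not every element of a d×d matrix can be pinned down unless N is much larger than d.

```python
def jackknife_covariance(data, estimator):
    """Leave-one-out jackknife covariance of a d-parameter estimator.

    data      : list of N data points
    estimator : function mapping a list of data points to a list of d numbers
    Returns a d x d covariance matrix as a list of lists.
    """
    n = len(data)
    # One estimate per trial, each with a single data point left out.
    trials = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    d = len(trials[0])
    mean = [sum(t[a] for t in trials) / n for a in range(d)]
    scale = (n - 1.0) / n
    return [
        [scale * sum((t[a] - mean[a]) * (t[b] - mean[b]) for t in trials)
         for b in range(d)]
        for a in range(d)
    ]
```

When the estimator is a simple mean, this reproduces the usual sample variance divided by N, which makes a convenient sanity check.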

This relates to one of my unwritten polemics: uncertainties should be communicated via samplings, not analytic descriptions of multi-dimensional confidence regions.


computers down, PCA, redshifts

I wrote more of my PCA polemic, in part preparing myself for a possible assault on the emission lines in SDSS spectra, parallel to the PCA stuff I have been doing with Tsalmantza.

Work finishing the faint proper-motion paper has been halted by computer problems. The Astrometry.net project is trying to run as a web service, and we run the service and our project management and our source code repository all out of the same few machines. One of these machines is failing frequently, and we may have to harden substantially our web serving technology. Right now, we are running with home-built stuff and hoping for good luck, but that isn't sustainable. Anyone want to donate us some enterprise-scale uptime foo?

In other news, Bunn and I finished our comment on cosmological redshifts. Comments greatly appreciated.



I started to write my polemic about PCA today. I decided that it was a worthwhile paper, or part thereof.


ultraviolet-faint star-forming galaxies

Today, Wu found a population of galaxies that are clearly forming stars (significant H-alpha emission, star-formation line ratio diagnostics), not strongly extincted in the optical (H-alpha to H-beta ratio near 3), but with very little GALEX NUV flux. There is pretty-much no way to make such galaxies, so we suspect that we have issues with either the SDSS data or the GALEX data or their intercomparison (for example, overlapping sources, or star-formation offset from the nucleus, etc.). But I am now confident that we will either be able to refine our procedures so that we can use GALEX to find post-starburst galaxies, or else we will discover a new and interesting galaxy sub-population. Either way, I am happy.


cosmological redshifts

Ted Bunn (Richmond) and I finished the first draft of a piece on cosmological redshifts today. We argue—against prevailing wisdom—that the redshifts of distant sources can be regarded as kinematic, that is, as due to recession velocities. We hope to be ready to put it on the arXiv soon.


information in images

Lang and I spent more time discussing how one measures the information in an image. It turns out that despite the fact that information is a quantitative property, how one calculates it depends on what parts of the data stream are expected to contain the information, and very different quantities are produced when you make different assumptions. For example, the information in an image, by the usual methods, does not change if you randomize the pixel order, even though most of us would say that the randomized picture contains much less information! That is because most methods for measuring the information consider only the pixel histogram, and not the pixel adjacency relations.
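
A small sketch of that last point, using the Shannon entropy of the pixel-value histogram as the "usual method" (an assumption about which measure is meant). The toy image is made up for the example.

```python
import math
import random
from collections import Counter


def histogram_entropy_bits(pixels):
    """Shannon entropy (bits per pixel) of the pixel-value histogram."""
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# A tiny "image" flattened to a list: a smooth ramp of integer values.
image = [x // 4 for x in range(64)]
shuffled = image[:]
random.Random(0).shuffle(shuffled)

# Same histogram, so the same entropy, although the ramp structure is gone.
ramp_bits = histogram_entropy_bits(image)
shuffled_bits = histogram_entropy_bits(shuffled)
```

Both values come out identical because the measure never looks at which pixels are neighbors; a measure that modeled adjacency would score the ramp as far more compressible than the shuffled version.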



I worked on completing final edits and to-do items on the faint proper-motion paper.


CBR transparency

I tried to work out the limit on cosmic transparency implied by the accuracy of the COBE FIRAS experiment measurement of the blackbody spectrum. It provides an effective "Tolman test" because the experiment measures not just the spectral shape, but also the absolute amplitude of the intensity field.


tidal streams, spectral archetypes

I spent most of the day reading and summarizing published work on tidal streams as possible measuring devices for massive substructure in the Galaxy halo. So far I have found no analytic calculations, and no investigations of individual features in individual streams.

In conversation with Tsalmantza, I worked out a fully empirical methodology for constructing a set of spectral archetypes that fully represents all of the galaxy spectra in the SDSS. Before today, the only methodology I had involved theoretical spectral fitting. But the work we have been doing on constructing a reliable and useful PCA subspace may actually pay off (despite my dislike of PCA).


histogram binning

Many years ago (2003, maybe?), I worked out a maximum-likelihood method for choosing the best possible binning for a histogram of discrete data. It is based on leave-one-out cross-validation. This has slept in my CVS repository until today, when I was about to post it to arXiv. Of course, just as I was beginning to post it I found this paper, with which I don't totally agree but which is clearly highly relevant (along with a number of papers referenced within it), so I will post tomorrow.
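
The details of that method are not given here, so the following is only a generic sketch of choosing a histogram binning by leave-one-out likelihood, for equal-width bins. The smoothing term alpha, the function names, and the interface are all assumptions of the sketch.

```python
import math


def loo_log_likelihood(data, lo, hi, n_bins, alpha=1.0):
    """Leave-one-out log-likelihood of a histogram density estimate.

    data   : list of numbers, all in [lo, hi)
    n_bins : number of equal-width bins spanning [lo, hi)
    alpha  : smoothing added to every bin count (an assumption of this
             sketch; it keeps singly occupied bins from giving log(0))
    """
    n = len(data)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    bins = []
    for x in data:
        b = min(int((x - lo) / width), n_bins - 1)
        bins.append(b)
        counts[b] += 1
    total = 0.0
    for b in bins:
        # Density at this point, estimated from all the *other* points.
        density = (counts[b] - 1 + alpha) / (width * (n - 1 + alpha * n_bins))
        total += math.log(density)
    return total


def best_n_bins(data, lo, hi, candidates, alpha=1.0):
    """Candidate bin count with the highest leave-one-out log-likelihood."""
    return max(candidates,
               key=lambda m: loo_log_likelihood(data, lo, hi, m, alpha))
```

Each point is scored against a density built from all the other points, so very narrow bins are penalized because they predict held-out points poorly, and very wide bins are penalized because they wash out real structure.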


deconvolution, cold streams, marginalization

In separate conversations, Lang and I and Marshall and I have been talking about image modeling and deconvolution. I finally realized today something very fundamental about deconvolution: A deconvolved image will never have a well-defined point-spread function, because the high signal-to-noise parts of the image (think bright stars in astronomy) will deconvolve well, while the low signal-to-noise parts (think faint stars) won't. So the effective point-spread function will depend on the brightness of the source.

Properly done, image modeling or deconvolution—in some sense—maps the information in the image, it doesn't really make a higher resolution image in the sense that astronomers usually use the word resolution.

This all gets back to the point that you shouldn't really do anything complicated like deconvolution unless you have really figured out that for your very specific science goals, it is the only or best way to achieve them. For most tasks, deconvolution per se is not the best thing to be doing. Kind of like PCA, which I have been complaining about recently.

In other news, Kathryn Johnston (Columbia) agreed with me that my first baby steps on cold streams (very cold streams) are probably novel, though my literature search is not yet complete.

In yet other news, Surhud More and I figured out a correct (well, I really mean justifiable) Bayesian strategy on constraining the cosmic transparency in the visible, by marginalizing over world models.


data reduction, data compression, and probabilistic inference

I spent most of my research time today thinking about how to analyze large collections of images. Lang and I are coming around to a data compression framework: We add or change or make more precise model parameters (such as star positions and fluxes and adjustments to the PSF or flatfield) when adding or changing or making more precise those parameters reduces the total information content in (smallest compressed size of) the residuals by more than it costs us in an information sense (again, compressed size) to add to the parameters. This is data reduction.
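The accept-or-reject rule above can be sketched in a few lines. This is only an illustration under strong assumptions: residuals are coded with a zero-mean Gaussian noise model of known `sigma`, quantized to a step `step`, and each new parameter is charged a fixed, hypothetical number of bits (`parameter_bits`). None of these names come from our actual code.

```python
import numpy as np

def residual_bits(residuals, sigma, step):
    """Approximate bits needed to encode residuals quantized to `step`
    under a zero-mean Gaussian noise model with standard deviation sigma."""
    r = np.asarray(residuals, dtype=float)
    nats = 0.5 * np.log(2.0 * np.pi * sigma**2) + 0.5 * (r / sigma) ** 2
    return float(np.sum(nats) / np.log(2.0) - r.size * np.log2(step))

def worth_adding(resid_without, resid_with, sigma, step, parameter_bits):
    """True if the new parameters save more residual bits than they cost."""
    saved = residual_bits(resid_without, sigma, step) - residual_bits(resid_with, sigma, step)
    return saved > parameter_bits

# Usage: a bright source pays for its parameters, a very faint one does not.
bright = np.array([0.0, 3.0, 6.0, 3.0, 0.0])   # model in units of the noise
faint = np.array([0.0, 0.5, 1.0, 0.5, 0.0])
perfect = np.zeros(5)                          # residuals after subtracting the model
print(worth_adding(bright, perfect, sigma=1.0, step=0.01, parameter_bits=32))
print(worth_adding(faint, perfect, sigma=1.0, step=0.01, parameter_bits=32))
```

The quantization step cancels in the comparison, so only the change in chi-squared and the parameter cost matter here.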

There is a fully worked-out theory of inference based on data compression; in fact, to the extremists, the only probabilistic theory of inference associates probabilities with bit lengths of the model description (lossless compression) of the data stream. A beautiful (and freely available on the web; nice!) book on the subject is Information Theory, Inference, and Learning Algorithms by David MacKay.

For astronomical imaging, the best compression scheme ought to be a physical model of the sky, a physical model of every camera, and, for each image, its pointing on the sky, the camera from which it came, and residuals. The parameters of the sky model constitute the totality of our astronomical knowledge, and we can marginalize over the rest. I love the insanity of that.


observations of streams

I made mock observations of perturbed, cold streams, to begin the process of testing my predictions of yesterday.


disrupted and perturbed

I spent much of the day working on an old-school perturbation-theory calculation. I consider a cold tidal stream in a host galaxy potential, perturbed by the close passage of a point mass. In the limit of small perturbations of very cold streams, this calculation has only two free parameters: the angle between the direction of the stream and the velocity of the perturber (in the comoving frame of the stream or equivalent), and the time since the impulse (in some scaled time units related to the mass and velocity of the perturber or amplitude of the perturbation). This is all very idealized, but actually, to zeroth order, I think I may have exhaustively described all possible perturbations to cold streams. Now the idea is to use this to constrain the substructure in our Galaxy with observations of cold streams.
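The starting point for a calculation like this is the velocity kick each stream star receives. Here is a minimal sketch of that first step only, assuming the standard impulse approximation for a point mass on a straight-line path past stars that are at rest in the comoving frame of the stream. It does not include the subsequent evolution in the host potential, and the function name and units are choices made for this example.

```python
import numpy as np

G = 4.30091e-6  # gravitational constant in kpc (km/s)^2 / Msun

def impulse_kicks(star_positions, mass, perturber_point, perturber_velocity):
    """Velocity kicks (km/s) on stars from a point mass on a straight path.

    star_positions: (n, 3) positions in kpc, stars at rest in this frame.
    perturber_point: any point on the perturber's path, in kpc.
    perturber_velocity: perturber velocity relative to the stream, in km/s.
    """
    x = np.atleast_2d(np.asarray(star_positions, dtype=float))
    w = np.asarray(perturber_velocity, dtype=float)
    speed = np.linalg.norm(w)
    w_hat = w / speed
    sep = np.asarray(perturber_point, dtype=float) - x
    # Remove the component along the path to get the impact vector,
    # which points from each star to the perturber's closest approach.
    impact = sep - np.outer(sep @ w_hat, w_hat)
    b2 = np.sum(impact**2, axis=1, keepdims=True)
    return 2.0 * G * mass * impact / (speed * b2)

# Usage: a stream along the x axis, perturber moving along z, passing 0.1 kpc away.
stream = np.column_stack([np.linspace(-1.0, 1.0, 5), np.zeros(5), np.zeros(5)])
kicks = impulse_kicks(stream, 1e7, [0.0, 0.1, 0.0], [0.0, 0.0, 100.0])
print(kicks)
```

Each kick has magnitude 2GM/(bv), is perpendicular to the relative velocity, and points toward the perturber's path.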


finished writing

I finished the faint-source proper-motion paper. It still needs to be vetted by collaborators, but I am stoked. Here is the abstract:

The near future of astrophysics involves many large solid-angle, multi-epoch, multi-band imaging surveys. These surveys will, at their faint limits, have data on large numbers of sources that are too faint to detect at any individual epoch. Here we show that it is possible to measure in multi-epoch data not only the fluxes and positions, but also the parallaxes and proper motions of sources that are too faint to detect at any individual epoch. The method involves fitting a model of a moving point source simultaneously to all imaging, taking account of the noise and point-spread function in each image. By this method it is possible—in well-understood data—to measure the proper motion of a point source with an uncertainty (found after marginalizing over flux, mean position, and parallax) roughly equal to the minimum possible uncertainty given the information in the data, which is limited by the point-spread function, the distribution of observation times, and the total signal-to-noise in the combined data. We demonstrate our technique on artificial data and on multi-epoch Sloan Digital Sky Survey imaging of the SDSS Southern Stripe. With the SDSSSS data we show that with this technique it is possible to distinguish very red brown dwarfs from very high-redshift quasars and from resolved galaxies more than 1.6 mag fainter than by the traditional technique. Proper motions distinguish faint brown dwarfs from faint quasars with better fidelity than multi-band imaging alone; we present 16 new candidate brown dwarfs in the SDSSSS, identified on the basis of high proper motion. They are likely to be halo stars because none has a significantly measured parallax.


post-starbursts in GALEX, writing

Continued writing on the faint-motion paper. It is funny how much there is left to do when a project is done!

Wu began working on the GALEX properties of post-starburst galaxies: They aren't detected at the MIS depth. That is good, because star-forming galaxies are detected. So a GALEX selection is likely to work, at some level. The question is, how good will it be? We would like to have a GALEX-based selection of the post-starburst galaxies so we can perform emission-line studies without worrying about the fact that the post-starbursts are selected on the basis of emission lines. Fortuitously, Christy Tremonti (Arizona) showed up at the MPIA today for a month, so she may be able to help out.


submission and resubmission

Bovy, with some help from Moustakas and me, got ready the galaxy-cluster transparency paper for resubmission in response to referee. The referee really made a big difference to the paper, because he or she recommended averaging the samples in a better way, which improved the results. I, with help from Barron and Roweis, got ready the Blind Date paper (on estimating image dates using proper motions) for resubmission. And I, with help from Lang, have promised my co-authors I will get the paper on faint-source proper motions ready for submission by the end of the week. That end approaches fast.


bimodality, transparency

I switched my search for low kurtosis directions in spectrum space into a search for bimodal directions. That is, I wrote down a scalar (which has to do with k-means with k=2) that decreases as a distribution becomes more bimodal. Then I searched Tsalmantza's high-variance PCA components for directions in the space that are most bimodal. I find three, perpendicular bimodal directions! Of course each one is a different version of the red–blue galaxy bimodality, of which I have been an unheard critic. More on this as I understand it better.
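One scalar with the right behavior, offered only as a sketch and not necessarily the one I used, is the fraction of the total variance that remains within clusters after the best two-cluster split. In one dimension the optimal k-means split with k=2 is a cut of the sorted values, so it can be found exactly. The function name is hypothetical.

```python
import numpy as np

def bimodality_scalar(values):
    """Within-cluster variance fraction for the best 1D split into two groups.

    Returns a number in [0, 1]; smaller means more bimodal.
    """
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    csum = np.cumsum(x)
    csum2 = np.cumsum(x**2)
    total = csum2[-1] - csum[-1] ** 2 / n
    k = np.arange(1, n)  # size of the left-hand cluster for each possible cut
    left = csum2[k - 1] - csum[k - 1] ** 2 / k
    right = (csum2[-1] - csum2[k - 1]) - (csum[-1] - csum[k - 1]) ** 2 / (n - k)
    return float(np.min(left + right) / total)

# Usage: a two-clump distribution scores much lower than a single Gaussian.
rng = np.random.default_rng(1)
one_mode = rng.normal(size=2000)
two_modes = np.concatenate([rng.normal(-5.0, 1.0, 1000), rng.normal(5.0, 1.0, 1000)])
print(bimodality_scalar(one_mode), bimodality_scalar(two_modes))
```

To search for bimodal directions, one would evaluate this scalar on the projection of the data onto each trial direction and minimize it.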

Surhud More (MPIA) and I began working this week on the monopole term in the opacity of the Universe, using the consistency of baryon-acoustic and supernovae measures of the expansion history to check the phase-space conservation of photons. This test is (nearly) independent of world model, as it depends almost entirely on purely special-relativistic considerations. We hope to have an LPU (least publishable unit) on the subject soon.


null hypothesis

Wu and I are back onto looking at the processes that lead to post-starburst galaxies, this time with Frank van den Bosch and Anna Pasquali here at MPIA. The first step was to create comparison samples to make null hypotheses; because the catalog we are using has redshift and flux dependencies, we built comparison samples to be exactly matched in redshift and brightness (stellar mass). Wu finished that today.


kurtosis minimization

With Tsalmantza's help, I got the kurtosis minimization working on high-variance directions in the SDSS spectral space. In the minimal kurtosis directions (within the high-variance subspace), the star-forming and non-star-forming galaxies separate very clearly, and there are other tantalizing structures. I think this technique may have legs.


blind date

I worked on the response to referee for our blind date paper. We should be able to resubmit this week.


finishing paper

Lang and I discussed finishing the faint-source proper-motion paper, among other things.



A few days ago I bashed PCA on various grounds, in particular that it ranks components by their contribution to the data variance, and it is rarely the data variance about which one cares. Today in discussions with Tsalmantza I realized that one could rank components by the kurtosis of their amplitudes (rather than the variance), and lowest first. This has a number of advantages, but one is that (uninteresting) data artifacts and outliers tend to create high-kurtosis directions in data space, and another is that if there are directions that are multi-modal, they tend also to be low in kurtosis (think the color distribution of galaxies, which is bimodal and low in kurtosis). It is still a very frequentist approach, but a search for minimal kurtosis directions in data space might be productive. Tsalmantza and I hope to give it a shot next week.
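Ranking existing components by kurtosis rather than variance is a small change in code. The sketch below assumes the component amplitudes have already been computed (one row per object, one column per component) and uses the excess kurtosis, which is zero for a Gaussian. The function name is hypothetical.

```python
import numpy as np

def rank_by_kurtosis(amplitudes):
    """Order components by the excess kurtosis of their amplitudes, lowest first.

    amplitudes: array of shape (n_objects, n_components).
    Returns (order, excess_kurtosis).
    """
    a = np.asarray(amplitudes, dtype=float)
    centered = a - a.mean(axis=0)
    m2 = np.mean(centered**2, axis=0)
    m4 = np.mean(centered**4, axis=0)
    excess = m4 / m2**2 - 3.0
    return np.argsort(excess), excess

# Usage: a bimodal column ranks first, a heavy-tailed (outlier-prone) one last.
rng = np.random.default_rng(0)
n = 5000
bimodal = rng.choice([-1.0, 1.0], size=n) + 0.1 * rng.normal(size=n)
gaussian = rng.normal(size=n)
heavy = rng.laplace(size=n)
order, excess = rank_by_kurtosis(np.column_stack([bimodal, gaussian, heavy]))
print(order, excess)
```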


cross-correlation issues

I made plots of my quasar–photon and white-dwarf–photon cross-correlations, and the random samples which are supposed to be equivalent. There must be some kind of bug, because the random has a negative feature at the center! So I will spend the rest of this week de-bugging.


mean white dwarf in GALEX

Schiminovich and I hope to detect intergalactic scattering with our quasar–photon cross-correlations. In order to make this detection, which will require precision, we need to create a differential experiment. The first difference is between mean quasars and mean white dwarfs. The white dwarfs are so close, they should have essentially no scattering (or scattering local to the observatory that is shared with the quasars). I made the white-dwarf–photon cross-correlations this weekend.

The mean white-dwarf image is probably interesting in its own right, if I broke down the white dwarfs by type and temperature, because they would provide extremely high signal-to-noise GALEX information. Does anyone want that?


cross-correlate and visualize

Wrote code to combine the jackknife subsamples of my quasar–photon cross-correlation functions and visualize the output. The redshift-dependence of the ultraviolet flux from quasars was not as strong as I expected. Either I have a bug in my code or a bug in my thinking.
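For reference, combining jackknife subsamples amounts to the following, assuming each subsample estimate was made with one region of sky left out (a delete-one jackknife). This is a minimal sketch with a hypothetical function name, not my actual code.

```python
import numpy as np

def combine_jackknife(subsample_estimates):
    """Mean and root-variance from delete-one jackknife estimates.

    subsample_estimates: array whose first axis runs over the subsamples;
    any remaining axes (for example image pixels) are treated independently.
    """
    est = np.asarray(subsample_estimates, dtype=float)
    n = est.shape[0]
    mean = est.mean(axis=0)
    variance = (n - 1.0) / n * np.sum((est - mean) ** 2, axis=0)
    return mean, np.sqrt(variance)

# Usage with three scalar estimates.
mean, sigma = combine_jackknife([1.0, 2.0, 3.0])
print(mean, sigma)
```

The factor (n - 1)/n, rather than 1/(n - 1), accounts for the fact that delete-one estimates share most of their data and so scatter much less than independent estimates would.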


ultraviolet vs redshift

I got back up to speed on the quasar-photon cross-correlation, splitting the SDSS photometric quasar sample by redshift. This is the first test: Does the ultraviolet flux depend on redshift as we expect? It ought to drop out at the redshifts at which the two GALEX bandpasses cross Lyman-alpha and the Lyman limit. Hope to have results for tomorrow.
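The redshifts at which to expect the drop-outs follow from simple arithmetic. The sketch below uses approximate effective wavelengths for the two GALEX bands (about 1528 and 2271 Angstroms); those numbers are my assumption here, and the real bandpasses are broad, so the transitions will be gradual.

```python
LYMAN_ALPHA = 1215.67  # Angstrom, rest frame
LYMAN_LIMIT = 911.75   # Angstrom, rest frame

# Approximate effective wavelengths in Angstrom (assumed for this example).
GALEX_BANDS = {"FUV": 1528.0, "NUV": 2271.0}

def crossing_redshift(band_wavelength, rest_wavelength):
    """Redshift at which a rest-frame feature lands at the band wavelength."""
    return band_wavelength / rest_wavelength - 1.0

for band, wavelength in GALEX_BANDS.items():
    z_lya = crossing_redshift(wavelength, LYMAN_ALPHA)
    z_limit = crossing_redshift(wavelength, LYMAN_LIMIT)
    print(f"{band}: Lyman-alpha at z = {z_lya:.2f}, Lyman limit at z = {z_limit:.2f}")
```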


data mining

I started working this week with Vivi Tsalmantza (MPIA) on data mining in the SDSS spectra. She is starting with dimensionality reduction and classification. The standard tool is PCA, but it ranks the components in terms of their contribution to the data variance. This has two problems: The first is that in many data directions your variance is probably dominated by your errors, not anything of scientific interest, and the second is that astronomers don't necessarily care most about the data variance! But we came up with some ways to apply robust estimation techniques to the dimensionality reduction, and I have an evil plan of eventually performing the dimensionality reduction on the error-deconvolved underlying distribution. But that may not be possible, for all sorts of reasons.


source association theory

I started writing some theoretical stuff about source association. I guess this would qualify as theoretical data analysis. I don't know what could be more boring than that! I am trying to justify the position that source association across catalogs is an ill-posed problem with no well-justified (even within any reasonable approximation) solution to date. This is a bit hard to argue given that astronomers have been doing it successfully for the last hundred years.


figures done

Lang and I finished all the figures for the first draft of the faint-source proper motion paper today, and I finished a first draft of all of the figure captions, one of which is almost an entire page of text.


stellar streams

Sergey Koposov and I spent time talking about finding streams in the SDSS imaging, using things akin to matched filters. Matched filters are very frequentist; they involve differencing integrals of the data. I prefer similar methods that instead fit distributions to the data, which is more Bayesian. Either way, it is clear that there is a lot of information in the color–magnitude space that is complementary to the information in angle space, and there also appears to be information in the proper-motion space. I would like us to try using it all.


stellar stream EM

Lang and I had discussed with Rix and Sergey Koposov (MPIA) the statistical detection of the proper motion of a cold stellar stream using the proper motions from comparison of the SDSS with the USNO-B imaging. This looks possible because although no star in the stream is measured at high signal-to-noise, and although no star is clearly in the stream or not, there is power in numbers. Unfortunately, just as Lang and I were getting ready to bust out some expectation-maximization, Koposov obtained the proper motion of the stream in question by a completely straightforward frequentist analysis. Nice work!


more informative figure

Lang and I worked on figure making for our faint proper-motion project. Here is the current incarnation; it shows the best-fit path for the star, and a sampling of the error distribution as a set of N other paths. The faint disks show the star sizes and positions given the image point-spread functions and assuming the source is traveling on the best-fit path.



Today Lang and I worked on the plot that shows that our proper-motion measurement code comes close to saturating the information in the data.


hypothesis comparisons

After Lang gave the Galaxy Coffee talk at MPIA, there were lingering questions about the differences between modeling a little smudge in co-added multi-epoch data as a moving, unresolved star or as a non-moving, extended galaxy. In the co-add, these hypotheses are hard to distinguish, but in the individual images, these hypotheses are very different, even though the object may not be measured with good signal-to-noise at any epoch. We began the work of explicitly making the non-moving galaxy model, so that we can perform clean hypothesis tests and quell the last of our critics.


synthetic image modeling

Lang and I discussed once again the issue of matching up datasets at the catalog level, and learning thereby about the positions and motions of stars on the sky. In each of these discussions we always conclude that the limitations of catalog level are such that we always want the images and to work at image level directly. However, today we realized that we could work with synthetic images, created from the catalogs and our model of the sky. The parameters of the model of the sky could be optimized to create synthetic images that best fit the synthetic images created from the set of catalogs. This got us substantially closer to the scalar objective we seek at catalog level.


saving dollars with astrometry

Lang and I have shown that we could have saved the very-red-objects community quite a bit of telescope time (read: money) by measuring proper motions of faint sources and thereby obviating a bit of spectroscopy. But Rix asked us if we could have saved them not just the spectroscopy but also the infrared imaging. Lang and I spent time on this question today. Not sure that we will be able to be so cocky here.


supernova rates, GAIA

Lang and I sat in on the MPIA GAIA group meeting. We discussed the photometric and spectroscopic identification of stars, binary stars, galaxies, and quasars in the GAIA data stream, and tests on the SDSS data and other related data. The GAIA team is using support vector machines, which also got Schiminovich and me excited last month; the GAIA team may be the main (or only?) users of SVMs in astronomy. It turns out there is work here that is similar to the archetypes project I have been pitching to Bovy.

In the late afternoon, Dani Maoz (Arcetri, Tel Aviv) gave a nice talk on supernova rates, focusing on Type Ia rates. He made a pretty good case that some of the Type Ia supernovae are prompt, and those that aren't prompt occur on the short side of the delay distributions that are discussed in the literature. This makes it interesting that galaxies show such clear alpha-enhancement patterns.


group finding

Lang and I pair-coded some web-based analysis of stars taken from the SDSS imaging sample, looking for groups that are plausibly tidally disrupted structures in the Milky Way halo. We didn't find anything, although we wrote code to automatically name it if we do!


nearby supernova, fast movers

Oliver Krause (MPIA) gave a beautiful talk about the discovery of light echoes from the Cas A supernova and its identification as a type IIb. This identification was performed by taking a spectrum of the original supernova, but delayed by 300 years because it is being observed now in reflection from a nearby dust cloud! The identification is remarkable, because as of now it appears that the Local Group is way over-represented in type IIb SNe.

Lang and I worked on the fast-movers in our faint proper-motion paper.


structure in point sets, brown dwarfs

The MPIA was abuzz today with talk about finding structures in point sets with non-trivial error properties. This, of course, relates to the identification of streams and satellites in the Milky Way halo, which has been an industry for the last few years, since the discovery of Willman 1 at NYU in 2005. Lang was consulted, as our computational statistics expert.

Lang and I also made the list of low-mass star (including brown dwarf) candidates from our proper motion work into a LaTeX table for publication.



Lang and I conceived of and started to work out a project to build an all-sky astrometric catalog out of the original USNO-B imaging catalogs and the 2MASS catalog, all tied to Tycho. This would be the first step towards our first Astrometry.net astrometric catalog. This would also be a first shot at doing source matching at the catalog level, a subject about which we have been talking for half a year. The first order of business, we realized, is to make a conceptual data model and a scalar objective function.


last day

Today was a relatively unproductive day, but my excuse is that it was my last day at Columbia. I have had a lot of fun up there, mainly thanks to the collaboration with Schiminovich. Next week I will start a thirteen-week stint in Heidelberg and other points in Europe.


mean DB WD image, transparency

The image below shows a measurement of the cross-correlation between DB white dwarfs taken from the SDSS White Dwarf Catalog and far-ultraviolet photons observed by GALEX.

The central source in the image is effectively the average (mean) image of a DB white dwarf in GALEX; its morphology is (roughly) consistent with a GALEX point-spread function. The other sources in the image are residuals created by random bright FUV sources nearby to individual DBs; there weren't enough DBs to make this all average away; none of the nearby sources seen in the image are significant if you ratio this image to the root-variance image obtained by jackknife resampling.

Schiminovich and I spent a long time today discussing the possible effects that might make the same kind of average image not look like a point-spread function when the central sources are at cosmological distances, because of scattering and correlated sources. We also discussed how measurements of the average image as a function of sky position and redshift could constrain transparency.


average image, does environment matter?

Actually started up the average image slash quasar–photon cross-correlation code today. It runs. Does it give good answers? I don't know because it is incredibly slow. Ah well, if only I knew a good programmer!

Schiminovich and I spent time talking about religion and politics and then a bit about galaxy environments, in particular the resolution of the debate between Jacqueline van Gorkom (Columbia) and me about whether environment matters to galaxy evolution. Being a stickler, Schiminovich objected to that characterization of the issue up-front. But I think we almost agreed that environment has its strongest effects on galaxies near the centers of clusters and merging galaxies. These systems make up only a percent or so of the galaxy population in the last few Gyr, so although environment matters deeply to these galaxies, they are a trace population.

We certainly know about environment effects on the bulk of galaxies, but they are straightforward and relatively uninteresting: At fixed mass (or other properties), the most important environmental effect is on the specific star-formation rate. But we know from statistical tests that environment affects this very slowly or very indirectly, probably by heating the interstellar medium (slowly, not generically in rapid bursts or events, which are extremely rare). So I think there is a consensus position possible: Environment matters, but a small amount to a large fraction of galaxies and a large amount to a small fraction.


mean image code

Nearly finished my mean image code today. This takes a set of points on the sky, and a set of matched random points, and creates the mean GALEX image of those points, properly background-subtracted.
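In outline, the operation looks like the sketch below. It is heavily simplified relative to the real thing: it assumes a single pixelized image and integer pixel centers rather than GALEX photons and sky coordinates, and the function names are hypothetical.

```python
import numpy as np

def mean_cutout(image, centers, half_size):
    """Average square cutouts of side 2*half_size+1 around (row, col) centers.

    Centers whose cutout would cross the image edge are skipped.
    """
    size = 2 * half_size + 1
    total = np.zeros((size, size))
    used = 0
    for row, col in centers:
        r0, c0 = row - half_size, col - half_size
        if r0 < 0 or c0 < 0 or r0 + size > image.shape[0] or c0 + size > image.shape[1]:
            continue
        total += image[r0:r0 + size, c0:c0 + size]
        used += 1
    if used == 0:
        raise ValueError("no cutouts fit inside the image")
    return total / used

def background_subtracted_mean(image, source_centers, random_centers, half_size):
    """Mean image of the sources minus the mean image of matched random points."""
    return (mean_cutout(image, source_centers, half_size)
            - mean_cutout(image, random_centers, half_size))

# Usage: two point sources on a flat background of 2 counts per pixel.
sky = np.full((50, 50), 2.0)
sky[10, 10] += 5.0
sky[20, 30] += 5.0
stack = background_subtracted_mean(sky, [(10, 10), (20, 30)], [(5, 40), (40, 5)], 2)
print(stack)
```

Subtracting the stack at the random points removes whatever mean background the sources and randoms share, which is why the randoms need to be matched to the sources.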


ultraviolet lenses, pretty pictures

Bolton, Schiminovich, and I determined empirically that it would have been impossible to detect the SLACS lenses (which were detected as double redshifts in the SDSS spectra) using GALEX. GALEX detects some of the lensing galaxies, but it doesn't look like it detects (strongly) the background lensed galaxies (despite the fact that they are detected because they are strongly star-forming). Oh well.

Bolton and I made some pretty pictures (or relatively pretty) of the SLACS lenses. The challenge was to show the lensing clearly, despite the fact that most of his HST images are only single-band.


halo shape, Nature

Zolotov and I discussed how the Milky Way halo's shape is measured, with the configuration and velocity-space distribution of stars. She can repeat these measurements in a simulated Milky Way and see if the measurements match the truth.

We made it into the editorial page and news pages of Nature today, here, here, and here.



Adam Bolton (Hawaii) gave the pizza lunch talk today, about the SLACS project, which has found a large fraction of all known strong gravitational lens systems by looking for double redshifts among the SDSS spectra. Later in the day, Schiminovich and I discussed with Bolton the possibility that the second redshift could have been discovered in GALEX, which is insane, but maybe possible.


cross-correlation and average image

I started on my quasar–photon cross-correlation code. If you preserve the azimuthal information, this cross-correlation is what astronomers would call the average or stacked image of the quasars in the GALEX imaging. That's a nice talking point, about which I may wax poetic soon.


photon–quasar correlations

Schiminovich and I figured out that one possible method for working out the sources of the metagalactic ionizing radiation impinging on the Milky Way (local extragalactic intensity field at 912 Å) is to cross-correlate the far-ultraviolet (and also near-ultraviolet) photons recorded with GALEX with quasars at various redshifts. I have waxed poetic about cross-correlations before, because they can be measured at very high signal-to-noise in common situations, and often contain all the information you can possibly have (see yesterday's post, for example, and posts on statistical gravitational lensing previously).



Brice Ménard (CITA) gave a nice seminar at Columbia about angular correlations between background (redshifts one and higher) quasars and foreground (redshifts one third) galaxies. The correlations are dominated by lensing, but also have a small color term which is consistent with absorption by dust. His results on dust compare favorably to my results with Jo Bovy and John Moustakas, although he worked entirely in angular units; it is somewhat easier to interpret and model in projected transverse distance units (a simple modification to their current strategies). He had not done any de-projection or halo modeling of the results, so he couldn't precisely say what dust is associated with galaxies of each individual type.

After the talk Lam Hui (Columbia) and I discussed various matters transparent, including tests for the monopole (unclustered) term in attenuation, and possible other explanations for chromatic effects in photon propagation. For example, if the dark matter is made of substantial millilensing or microlensing lumps, and if quasars have a wavelength–size relation, there might be chromatic effects in the angular correlations.


nonlinear dynamics

Bovy and I spent the morning discussing places chaos (as in nonlinear dynamics) enters into astrophysics, since he is interested in both subjects. I identified several places: In the Solar System, the Lyapunov time is much shorter than the time over which there has been and will be macroscopic stability. There are also many issues in planetary system formation. In numerical models of globular clusters and similar N-body systems, some of the relaxation times seem to disagree with analytic/scaling expectations. In the disk of the Milky Way, there is very complicated velocity-space structure, possibly caused by caustics and other nonlinear structures in phase space. In the halo of the Milky Way, phase-space structure is expected to be extremely rich because the accretion history is extremely rich. In cosmology, there is my insane idea of constrained realizations of the entire observable Universe, which has a nonlinear dynamics or control theory aspect to it.


archetypes, luck, outreach

Spoke with Jo Bovy about finding spectral archetypes among the SDSS spectra; Roweis has code to find (an approximation to) the minimal set of archetypes that represent all of the SDSS spectra, if we give him the graph of which galaxies represent which others. This set of archetypes would be the ultimate in non-parametric descriptions of spectrum space; far preferable to things like PCA, which assume that the galaxies come from (and fill) a linear subspace; and it would permit new kinds of galaxy modeling and fitting.

At Pizza lunch, Arlin Crotts (Columbia) spoke about lucky imaging they are doing with a small telescope on Kitt Peak. He described a good night in which they could use two percent of the data for a diffraction-limited (or nearly) stacked image. This led to long philosophical discussions with Schiminovich about whether you can or should use all of the data. I took the position—that you might imagine—that each image must contribute information; there is no way that adding in information from a new image could reduce the information; if adding in less-good images is making the stack worse, then those images are being added in wrongly. Of course adding them in rightly might be damned difficult, since it probably involves something akin to deconvolution (or, as the astronomers say, forward modeling).

In the evening, I spoke to the Amateur Astronomers Association of Princeton about Astrometry.net and the future Open-Source Observatory. Extremely enjoyable!


metagalactic radiation field, healpix

I wasted much of the day building IDL routines to make pretty pictures from healpix maps. This was a waste both because there are better programmers than me, and better languages than IDL.

Over lunch, Schiminovich and I discussed our project of estimating the intensity of the ionizing radiation impinging on the Milky Way from extragalactic sources, and the contribution from different kinds of sources. We discussed the issue that most of the ionizing radiation comes from significant redshift; this is both because quasars are more abundant and luminous at higher redshift, and because galaxies tend to be self-shielding. Hopefully we can refine our understanding of the radiation with GALEX.


Dr Pietro Reviglio

I sat on the (successful and entertaining) thesis defense of Pietro Reviglio (Columbia), who used FIRST, NVSS, and SDSS to investigate the evolution of AGN, and the relationships between radio type and optical spectral type. He finds significant evolution, and some problems for the orientation model for the differences between broad-line and narrow-line AGN. He also finds some evolution in the host galaxy population that is consistent with some structural evolution (as in building bulges from disks), although that conclusion is more speculative, of course.


more writing, warm and hot

More writing yesterday and today. Lang made some figures for the nascent paper. I also spent some time at the Warm and Hot Universe meeting at Columbia. Several nice talks on clusters, and some good stuff on new missions.


proper motion paper, archetypes

Worked on writing up the proper motion measurement paper with Lang, and discussed the objective generation of spectral archetypes with the PRIMUS team.


brown dwarfs and quasars separated again

My loyal reader will recall that I spent the summer writing code to separate quasars and brown dwarfs using proper motions, measured in very low signal-to-noise data (that is, multi-epoch data in which the source is not detectable at any individual epoch), and that I spent the spring re-writing that code with the help of Lang. Today we finally closed the loop and showed that indeed we can separate the two populations using angular kinematics:


convex hull

Among many things which were—believe it or not—more boring than this, I spent part of today getting a geometric picture of the convex hull for stellar colors. An unresolved binary star system is composed of two stars; a galaxy is composed of many stars. Stars do not fill all of color space, but live on some restricted set of loci in the multi-dimensional space of astronomically observed colors. Combinations of stars, such as binaries and galaxies, must therefore also live in a subspace of the whole space. This is a no-brainer, but it is not trivial to visualize the situation, because color space is not linear in any sense. Some of the results were a bit surprising to me, which is embarrassing.
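The nonlinearity is easy to see numerically: combinations of stars add in flux, not in magnitude, so the colors of a composite have to be computed by going through fluxes. A minimal sketch, with made-up magnitudes in two hypothetical bands g and r:

```python
import numpy as np

def combined_magnitudes(mags_a, mags_b):
    """Magnitudes of an unresolved pair, given each star's magnitudes
    in the same set of bands. Fluxes add; magnitudes do not."""
    flux = 10.0 ** (-0.4 * np.asarray(mags_a, dtype=float)) \
         + 10.0 ** (-0.4 * np.asarray(mags_b, dtype=float))
    return -2.5 * np.log10(flux)

# Usage: a bluer star (g - r = 0.5) and a redder star (g - r = 1.4).
star_a = np.array([15.0, 14.5])   # [g, r], made-up values
star_b = np.array([16.0, 14.6])
pair = combined_magnitudes(star_a, star_b)
print(pair, pair[0] - pair[1])
```

The pair's color lands between the two stellar colors, but not at their average, because each band is weighted by the relative fluxes in that band. In a space of several colors, this is why the composite locus is a curved region rather than the straight segment joining the two stars.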


GALEX, tech report

Chris Martin (Caltech) was in town, and we discussed many things GALEX. In particular he got me excited (only I could get excited about this) about re-calibrating the sensitivity map, which is a non-trivial function of detector position. This all came up because Schiminovich and I have some ideas about using the time-tagged photon stream; this stream has residual artifacts in it from imperfect measurements of the sensitivity map, which is sampled in a well-defined way as the spacecraft dithers during the exposure.

In the morning, after a long bureaucratic mission at City Hall and before a power lunch with Masjedi on Wall Street, I worked on what the Astrometry.net team calls the tech report, the paper about the system that is written in the style of an Astronomical Journal submission, summarizing success to date of our blind astrometry and data recovery system.


clustering in two dimensions

The only real research I did today was discuss two-dimensional measures of three-dimensional galaxy clustering with Antara Basu-Zych (Columbia). She measures what we usually call w(theta); I measure what we usually call wp(rp); we figured out the relation between these, which depends on the angular diameter distance and the volume per solid angle (that is, an integral of the volume element over redshift). The main subtlety is to decide whether you want your wp(rp) in comoving or proper units; this can get confusing because almost all volume calculations that are commonly done are done in comoving, whereas transverse calculations tend to be proper.
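The comoving-versus-proper subtlety shows up already in the conversion from angle to transverse separation. The sketch below covers only that conversion, not the full relation between w(theta) and wp(rp), and it assumes a flat Lambda-CDM world model with illustrative parameter values (`h0`, `omega_m`) that are not taken from our discussion.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light in km/s

def comoving_distance(z, h0=70.0, omega_m=0.3, steps=10001):
    """Line-of-sight comoving distance in Mpc for a flat Lambda-CDM model."""
    zs = np.linspace(0.0, z, steps)
    inv_e = 1.0 / np.sqrt(omega_m * (1.0 + zs) ** 3 + (1.0 - omega_m))
    dz = zs[1] - zs[0]
    # Trapezoid rule written out by hand.
    integral = dz * (np.sum(inv_e) - 0.5 * (inv_e[0] + inv_e[-1]))
    return C_KM_S / h0 * integral

def transverse_separation(theta_rad, z, comoving=True, **cosmology):
    """Transverse separation in Mpc subtended by a small angle at redshift z.

    In a flat model the comoving value uses the comoving distance and the
    proper value uses the angular diameter distance, smaller by (1 + z).
    """
    d_c = comoving_distance(z, **cosmology)
    return theta_rad * (d_c if comoving else d_c / (1.0 + z))

# Usage: one arcminute at redshift 1 in both conventions.
theta = np.radians(1.0 / 60.0)
print(transverse_separation(theta, 1.0, comoving=True),
      transverse_separation(theta, 1.0, comoving=False))
```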


blind date nearly done, galaxy kinematics

Barron cornered me and we finished the blind date draft. Blind date is the project in which we determine the date at which a photographic plate was taken by comparing the positions of stars to those in a catalog with proper motions.
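Stripped of everything that makes it hard, the core of the idea is a one-parameter least-squares fit. The sketch below is a toy version, not the method in the paper: it assumes the plate is already astrometrically calibrated, that positions are in a common tangent plane, and that all stars have equal, isotropic errors. The function name is hypothetical.

```python
import numpy as np

def estimate_epoch(observed, catalog, proper_motion, catalog_epoch):
    """Least-squares epoch of an image from stellar positions.

    Model: observed = catalog + proper_motion * (t - catalog_epoch).
    observed, catalog: (n, 2) positions (e.g. arcsec in a tangent plane).
    proper_motion: (n, 2) proper motions in the same units per year.
    """
    offset = np.asarray(observed, dtype=float) - np.asarray(catalog, dtype=float)
    mu = np.asarray(proper_motion, dtype=float)
    dt = np.sum(mu * offset) / np.sum(mu * mu)
    return catalog_epoch + dt

# Usage: noise-free positions generated for a plate taken in 1950.
rng = np.random.default_rng(3)
cat_pos = rng.uniform(0.0, 3600.0, size=(20, 2))
cat_mu = rng.normal(0.0, 0.05, size=(20, 2))      # arcsec per year
plate_pos = cat_pos + cat_mu * (1950.0 - 2000.0)
print(estimate_epoch(plate_pos, cat_pos, cat_mu, 2000.0))
```

Stars with larger proper motions carry more weight in the fit, which matches the intuition that fast movers are the ones that tell the time.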

Reinhard Genzel spoke at Pizza Lunch about fully resolved galaxy kinematics at a redshift of 2. He argued that he could see disks in formation and evidence for secular creation of bulges. I am not sure I agreed with all the conclusions, but the data (spatially resolved infrared spectroscopy from VLT) were incredible.

The NSF proposal went in.


more proposal, z-band-only sources

The proposal did not turn out to be nearly done, and Fergus and I re-wrote it on the weekend and today. But we must submit tomorrow, so the end is in sight.

I spent a bit of time following up some of the z-band-only sources in the SDSS Southern Stripe. Most of them are spurious in one way or another. The dominant source of spurious sources is artificial satellites and other fast-moving objects. In particular, some of the strangest sources Lang had found so far turned out to be individual blinks from a blinking artificial satellite.