
2019-01-06

target selection

On the weekend, Rix (MPIA) and I got on a call to discuss the target selection for SDSS-V, which is a future survey to measure multi-epoch spectroscopy for (potentially) millions of stars. The issue is that we have many stellar targeting categories, and our view is that targeting should be based only on the measured properties of stars in a small set of public, versioned photometric and astrometric catalogs.

This might not sound like a hard constraint, but it is: It means you can't use all the things we know about the stars to select them. That seems crazy to many of our colleagues: Aren't you wasting telescope time if you observe things that you could have known, from existing observations, were not in the desired category? That is, if you require that selection be done from a fixed set of public information sources, you are guaranteeing an efficiency hit.

But that is compensated—way more than compensated—by the fact that the target selection will be understandable, repeatable, and simulate-able. That is, the more automatically the target selection follows from simple inputs, the easier it is to do population analyses, statistical analyses, and simulations of the survey (or of what the survey would have done in a different galaxy). See, for example, cosmology: The incredibly precise measurements in cosmology have been made possible by adopting simple, inefficient, but easy-to-understand-and-model selection functions. And, indeed: When the selection functions get crazy (as they do in SDSS-III quasar target selection, with which I was involved), the data become very hard to use (the clustering of those quasars on large scales can never be known extremely precisely).

Side note: This problem has been disastrous for radial-velocity surveys for planets, because in most cases, the observation planning has been done by people in a room, talking. That's extremely hard to model in a data analysis.

Rix and I also discussed a couple of subtleties. One is that not only should the selection be based on public surveys; it really should be based only on the measurements from those surveys, and not on the uncertainties or error estimates. This is in part because the uncertainties are rarely known correctly, and in part because the uncertainties are a property of the survey, not of the Universe! But this is a subtlety. Another subtlety is that we might not just want target lists; we might want priorities. Can we easily model a survey built on target priorities rather than a binary target selection? I think so, but I haven't faced that yet in my statistical work.
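The kind of selection we have in mind can be sketched as a pure, deterministic function of measured quantities from a fixed catalog release. The column names, cuts, and thresholds below are purely illustrative (they are not SDSS-V's actual targeting rules); the point is only the structure: measurements in, target list out, no human judgment and no error estimates anywhere.

```python
# Sketch of a reproducible target-selection function: a pure function of
# measured quantities from a single versioned public catalog. Because the
# cuts use only measurements (never uncertainties), the same function can
# be applied to a mock catalog to simulate what the survey would select.
# All column names and thresholds here are hypothetical.

CATALOG_VERSION = "dr2"  # the selection is tied to one public release

def select_targets(stars):
    """Return the subset of catalog rows passing the (hypothetical) cuts.

    Each row is a dict of measurements only: magnitudes and parallaxes,
    never their error estimates.
    """
    selected = []
    for star in stars:
        color = star["g_mag"] - star["rp_mag"]
        bright_enough = star["g_mag"] < 17.0
        giant_like = color > 1.0 and star["parallax_mas"] < 2.0
        if bright_enough and giant_like:
            selected.append(star)
    return selected

catalog = [
    {"g_mag": 15.2, "rp_mag": 13.9, "parallax_mas": 0.8},  # passes all cuts
    {"g_mag": 18.0, "rp_mag": 16.5, "parallax_mas": 0.5},  # too faint
    {"g_mag": 14.0, "rp_mag": 13.8, "parallax_mas": 5.0},  # too blue, too near
]
print(len(select_targets(catalog)))  # 1
```

Because the function is deterministic and pinned to one catalog version, running it on a simulated catalog yields the survey's selection function directly, which is exactly the "simulate-able" property argued for above.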

2018-12-04

#AstroData2020s, day 1

Today was the first day of a meeting hosted by the NASA IPAC (home of IRSA and Spitzer among other important projects) to start the discussion of the response of the astronomical data archives to the US Astrophysics Decadal Survey. Not everyone agreed on the point of the meeting, but I think it is to create talking points that connect to archives, but which could be incorporated into community science white papers. These white papers are due in February.

There were several highlights for me at the meeting today. One was Hillenbrand (Caltech) summarizing the white paper process from last decade, and giving advice for white-paper submitters. She emphasized that the white paper text should be cut-and-paste ready for inclusion in the final report. That is, it isn't like a proposal to be approved; it is like a community contribution to the writing of the report. And she emphasized that it doesn't make sense to make points in white papers that will be obvious to the committee!

One of the technical concepts that was discussed today by archives was that of science platforms, in which archives might provide compute resources or other scientific facilities to their users: The idea is to bring the code and the analysis to the data, since the alternatives might be too expensive. But (as I brought up in discussion) then that gets into the space of archives making decisions about what science they do and don't support, which might conflict with peer review, or put scientific projects under various kinds of double jeopardy. And it might mean that projects like LSST, which are doing related things, might end up interfering in unintended ways with the astronomical community and its scientific priorities. These are interesting issues to keep track of.

2017-06-16

cosmic rays, alien technology

I helped Justin Alsing (Flatiron) and Maggie Lieu (ESA) search for HST data relevant to their project of training a model to find cosmic rays and asteroids. They began to conclude that the HST cosmic-ray identification methods they are already using might be good enough to rely upon, which reduces their requirements to asteroids alone. That's good! But it's hard to make a good training set.

Jia Liu (Columbia) swung by to discuss the possibility of finding things at exo-L1 or exo-L2 (or the other Lagrange points). Some of the Lagrange points are unstable, so anything found there would have to be actively station-keeping, and would therefore be a clear sign of alien technology. We looked at the relevant literature; we may be fully scooped, but I think there are probably things to do still. One thing we discussed is the observability; it is somehow going to depend on the relative density of the planet and star!
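The density dependence can be made concrete with the Hill-radius scaling: the L1/L2 distance is roughly r ≈ a (m / 3M)^(1/3), and writing both masses in terms of mean densities, the L1 distance measured in planet radii depends only on a/R_star and the planet-to-star density ratio, not on the planet's size itself. A quick numerical check for the Earth–Sun system, using rough round numbers (my sketch, not from the discussion):

```python
# Hill-radius scaling: r_L1 ~ a * (m / (3 M))**(1/3).
# With m = (4/3) pi rho_p R_p**3 and M = (4/3) pi rho_s R_s**3, this becomes
#   r_L1 / R_p = (a / R_s) * (rho_p / (3 rho_s))**(1/3),
# so the L1 distance in units of the planet's radius depends on the
# density ratio. Rough Earth-Sun numbers below.

a = 1.496e11          # semi-major axis [m]
R_sun = 6.957e8       # stellar radius [m]
rho_planet = 5514.0   # Earth mean density [kg/m^3]
rho_star = 1408.0     # Sun mean density [kg/m^3]

r_L1_in_planet_radii = (a / R_sun) * (rho_planet / (3.0 * rho_star)) ** (1.0 / 3.0)
print(round(r_L1_in_planet_radii))  # ~235: Earth's L1 sits ~235 Earth radii out
```

This matches the familiar number: Earth's L1 point is about 1.5 million km away, roughly 235 Earth radii.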

2014-06-07

Ed Groth

I spent a great day in Princeton at the birthday and retirement celebration for Ed Groth (Princeton), who was instrumental in the HST WFPC project and is the originator of the incredibly influential Groth Strip. There were many great talks and reminiscences; a few of the highlights for me were the following:

Ed MacDonald (who worked on oceanography for the Navy and NATO) talked about moving data by paper tape from experiment to computer center, and the fact that mundane tasks are an important part of all important scientific discoveries. He noted that Bob Dicke (the leader in the Gravity Group at Princeton) was never afraid of doing mundane things in support of scientific discovery.

Bill Wickes (formerly of HP) talked about many things, not the least of which was the importance of calculators in scientific research. Indeed, calculators featured heavily in the stories and photographs from Groth's early days. Wickes is responsible for inventing and designing and improving various HP calculators. He also talked about the Gravity Group attitude of "you sit on it until it works", which is a very good principle for science!

Bruce Partridge (Haverford) discussed the precise timing of the Crab Pulsar, done at Princeton by him and Groth and others, which led to the discovery of period derivatives, second derivatives, and glitches. The timing was done very cleverly; he showed the electronics diagram. The Gravity Group was always motivated to precisely measure anything for which there was simultaneously a hope of precise measurement and a precise quantitative prediction. He showed also that the search for gravitational radiation was already in the air way back then.

Jason Rhodes (JPL) and Todd Lauer (NOAO) talked about HST imaging. Rhodes and Groth wrote one of the first papers on weak gravitational lensing. Lauer pointed out that Groth was instrumental in starting the HST Archive and our understanding of the huge legacy value of digital data sets.

Finally, Jim Peebles (Princeton) talked about correlation functions, on which he worked with Groth, and which remain the key tool of cosmology today. He showed some lovely visualizations of hand-taken data on galaxy counts from the 1960s and 70s. He highlighted the ways in which Groth's career spanned the transition from "small science" to "big science", doing important things in both modes. It was a great day!

2013-05-09

data preservation, meta-analysis

I spent the day at Radcliffe, in a small meeting arranged by Alyssa Goodman (Harvard) and Xiao-Li Meng (Harvard) on how to curate and keep data for analysis and re-analysis. Most of the discussion in this (free-form, informal, small) workshop was around the idea of meta-analysis and re-use of the data by other users. Some of the interesting ideas that came up were the following: Different people coming from different backgrounds have very different meanings for the word "model" and also many other words, including "data" and "provenance". The goals of data preservation, meta-analysis, re-use, and scientific reproducibility are all related and overlapping. Archivists and curators do best when they get involved with the data as early as possible in the "life cycle", preferably right at the original taking of the data. The concerns that arise with reproducibility and the concerns that arise with privacy (think: health data and the like) are strongly at odds.

Meta-analysis can be described in terms of hierarchical modeling (duh) and we should probably think about it that way. Meng showed some nice results on the idea of sufficient statistics in hierarchical models; specifically, he is thinking about statistics that are sufficient for sub-branches of the full model: When are they also sufficient statistics for the whole model? The range of expertise in the room—from statistics to particle physics to library science—made for a lively conversation, and many (small) disagreements. The goal for tomorrow is to write a document summarizing various things learned.
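Meng's point about sufficient statistics has a simple concrete instance (my example, not his): for Gaussian data, the per-group triples (n, sum, sum of squares) are sufficient within each sub-branch, and they combine additively into the sufficient statistics of the pooled sample, so a meta-analysis can recover whole-sample quantities exactly without ever seeing the raw data.

```python
# For Gaussian-distributed measurements, (n, sum x, sum x^2) is a sufficient
# statistic. These triples add across groups, so per-study summaries
# reconstruct the pooled mean and variance exactly -- one simple case where
# sub-branch sufficient statistics remain sufficient for the whole model.

def summarize(data):
    """Sufficient statistics (n, sum, sum of squares) for one group."""
    return (len(data), sum(data), sum(x * x for x in data))

def pool(*stats):
    """Combine per-group statistics into the whole-sample mean and variance."""
    n = sum(s[0] for s in stats)
    total = sum(s[1] for s in stats)
    total_sq = sum(s[2] for s in stats)
    mean = total / n
    variance = total_sq / n - mean ** 2  # population variance
    return mean, variance

group_a = [1.0, 2.0, 3.0]
group_b = [4.0, 5.0]
mean, var = pool(summarize(group_a), summarize(group_b))
print(mean, var)  # identical to computing on the concatenated raw data
```

The interesting question Meng raised is when this property holds for sub-branches of a more general hierarchical model, where the combination is not simply additive.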

2008-01-11

leveraging services

Roweis and I had a very long discussion about ways to have existing web-based photo-sharing and web-publishing sites handle the data for Astrometry.net. There are huge numbers of issues, not the least of which is that the opportunities for storing meta-data are very limited. However, if we can work something out, we win big because maintaining high-bandwidth servers is not a problem of great intellectual interest to our team.

2008-01-10

AAS meeting, day two

I actually did some work at the Google booth, helping people to operate Sky. The day was filled with discussion about how to make Sky more useful as a scientific tool, by, for example, creating mash-ups with data archives. The nice thing is that Sky has an easy API, so this can be done by third parties (such as myself). One problem with trying to make progress at a meeting is that usually all you can do is plan more meetings.

I learned stuff about APOGEE (a part of SDSS-III) from, among other people, Nick Konidaris (UCSC) and Ricardo Schiavon (UVa). It will obtain high-precision velocities and information about some dozen elements for 100,000 stars with an R=20,000 infrared spectrograph. This will provide qualitatively new information about the Milky Way.

2008-01-08

transforming images

My ideas about making human-viewable images out of scientific data in a quasi-reversible way started a big discussion on the Astrometry.net internal email lists. It is not possible to make human-viewable images that both retain the full dynamic range of a typical astronomical image and look good. In most cases you can do either one, but you can essentially never do both. So I have to accept either insane ugliness or else bad irreversibility. Or else we just punt on Flickr and Picasaweb and move to archives that can take FITS files, like MAST.

On the way to Austin, TX for the AAS meeting I completed my planning for my FITS to PNG converter. Now I just have to find all the open-source Python libraries I need.
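One standard trick for the dynamic-range problem is an arcsinh stretch, which is linear for faint pixels and logarithmic for bright ones; being monotonic, it is invertible up to the 8-bit quantization of the output image, which is about as close to "quasi-reversible" as a PNG can get. A sketch in pure Python, with illustrative softening and saturation scales (this is a common approach, not necessarily what my converter ended up doing):

```python
# Arcsinh stretch: linear at the faint end, logarithmic at the bright end.
# Monotonic, hence invertible up to 8-bit quantization of the output PNG.
# The softening scale and maximum-counts value below are illustrative.

import math

SOFTENING = 10.0   # counts; below this scale the stretch is roughly linear
MAX_COUNTS = 1e5   # brightest pixel value we expect to map

def stretch(counts):
    """Map raw counts to an 8-bit display value."""
    top = math.asinh(MAX_COUNTS / SOFTENING)
    value = math.asinh(counts / SOFTENING) / top
    return int(round(255 * max(0.0, min(1.0, value))))

def unstretch(byte):
    """Approximately invert the stretch (exact up to quantization)."""
    top = math.asinh(MAX_COUNTS / SOFTENING)
    return SOFTENING * math.sinh(byte / 255.0 * top)

for counts in (0.0, 5.0, 100.0, 1e5):
    byte = stretch(counts)
    print(counts, byte, round(unstretch(byte), 1))
```

The per-pixel round-trip error is set entirely by the 8-bit quantization, so the faint end (where the mapping is nearly linear) is recovered well, and only the very bright end loses precision.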