Trevor David (Flatiron) convened a group to talk about latent-variable models and the myriad decisions they require. He is trying to model the ages and the abundances of stars, with the hope of getting a new age indicator. Nora Eisner (Flatiron) is building a very similar model but for the classifications by participants in a big citizen-science project. We discussed the relationships between number of latents, model complexity, regularization, and predictions. We also discussed testing, interpretation, and trustworthiness. It's a big space.
2022-10-10
2018-07-02
Milky Way disk and halo, HabEx, M dwarfs, etc
Ah, back to work again. It is my incredible privilege to work in Heidelberg every summer. Today I spoke with Sara Rezaei Kh (MPIA) and Christina Eilers (MPIA) about projects to use Gaia DR2 to constrain properties of the Milky-Way disk, especially the rotation curve and the dust density as a function of position. That connected to a longer conversation with Lauren Anderson (Flatiron) and Hans-Walter Rix (MPIA) about measuring the properties of stellar populations in boxels of the Milky Way. Boxels in position, or in velocity, or in actions. It also led to some work in which Eilers and I looked at external validation (using open clusters) of our spectroscopic parallaxes.
I also re-started projects on M-type dwarf stars with Jessica Birky (UCSD) who is in HD for the summer. She will write up her results using The Cannon to transfer labels from a small training set fit by Andrew Mann (Columbia) to all of APOGEE if all goes well.
And into town came Daniel Stern (JPL), who gave an incredibly impressive talk about HabEx, the NASA mission concept for the next decadal survey. It is an ambitious mission, but strongly cost controlled. If it is paired with a starshade (an idea I love), it could do amazing exoplanet science. And it really motivates me to get back to thinking about physical optics!
Finally, I spent a couple hours in the back of the room for #StellarHalos18, where I learned about Gaia DR2 projects on the Milky-Way halo. In particular, I learned about the Malhan method for finding streams. It puts high weight on stars with likely co-orbital neighbors, and then uses a by-hand or by-eye step to link them into stream discoveries. Very impressive. Very fast. Very high impact! But a bit too heuristic for my taste; let's automate all the things!
2017-02-28
#DtU17, day two
Today I dropped in on Detecting the Unexpected in Baltimore, to provide a last-minute talk replacement. In the question period of my talk, Tom Loredo (Cornell) got us talking about precision vs accuracy. My position is a hard one: We never have ground truth about things like chemical abundances of stars; every chemical abundance is a latent variable; there is no external information we can use to determine whether our abundance measurements are really accurate. My view is that a model is accurate only inasmuch as it makes correct predictions about qualitatively different data. So we are left with only precision for many of our questions of greatest interest. More on this in some longer form, later.
Highlights (for me; very subjective) of the day's talks were stories about citizen science. Chris Lintott (Oxford) told us about tremendous lessons learned from years of Zooniverse, and the non-trivial connections between how you structure a project and how engaged users will become. He also talked about a long-term vision for partnering machine learning and human actors. He answered very thoughtfully a question about the ethical aspects of crowd-sourcing. Brooke Simmons (UCSD) showed us how easy it is to set up a crowd-sourcing project on Zooniverse; they have built an amazingly simple interface and toolkit. Steven Silverberg (Oklahoma) told us about Disk Detective and Julie Banfield (ANU) told us about Radio Galaxy Zoo. They both have amazing super-users, who have contributed to published papers. In the latter project, they have found (somewhat serendipitously) the largest radio galaxy ever found! One take-away from my perspective is that essentially all of the discoveries of the Unexpected have happened in the forums, in the deep social-interaction parts of the citizen-science sites.
2016-10-25
#dsesummit, day 2
My day started with a long breakfast conversation with Yann LeCun (NYU) about adversarial methods in deep learning. In these methods, a generator and discriminator are trained simultaneously, and against one another. It is a great method for finding or describing complex density functions in high dimensions, and people in the business have high hopes. In particular, it is crushing in image applications. We discussed the problem that is currently on my mind, which is modeling the color–magnitude diagram of stars in Gaia, using one of these adversarial systems, plus a good noise model for the parallaxes. I would love to do that, and it should be much easier than the image problems, because the data are much lower in dimensionality.
I ran a very amusing session at the Summit, in which we had participants bring figures and we crowd-sourced a reaction, critique, and to-do list for each of them. We looked at a figure from politics from Michael Gill (NYU), making a causal claim about regulations and how meeting minutes are kept, a figure from geophysics from Nicholas Swanson-Hysell (Berkeley) showing the data and a model for polar wander, and a figure from neuroscience from Bijan Pesaran (NYU) showing brain region classifications. The feedback from the group was great and useful and constructive (though not always polite; my apologies!). One theme of our discussion ended up being consistency across figure elements. I feel like this crowd-sourcing session was a model for future sessions; it would even be fun to make this a regular event in some forum in NYC.
There was a lot of non-research today, but in the remainder of my research time, I worked on outline material for our growing paper on Hack Weeks.
2016-05-19
writing by tweeting
I spent the day working on my document about releasing data and code. I tweeted (tm) some of the ideas in the paper and started responding to the storm of replies. The twitters are excellent for getting ideas from the community!
2016-01-08
#AAS227, day 4; AAS Hack Day
Today was the fourth annual AAS Hack Day (#hackaas) at #AAS227, organized by Kelle Cruz (CUNY), Meg Schwamb (Taiwan), and myself, and sponsored by the LSST Corporation and Northrop Grumman. We had a huge crowd: About fifty people and the staff had to bring in extra tables, chairs, and power strips. The hacks varied enormously in scope and category; here are just a few that stood out:
- AAS meeting conflicts
- Adrian Price-Whelan (and a bit Scott Idem and me) used some vector-of-words methods from previous AAS Hack Days to look at schedule issues in the AAS 227 program. He found pairs of oral sessions that were scheduled in conflict that contain talks with abstracts that are close in word space. The idea was to predict which sessions led to the largest number of complaints to the AAS about scheduling, and also provide prototypes of tools that might be used to make scheduling better in the future.
- gender and questions in AAS oral sessions
- Mehmet Alpaslan and a team including Hack-Day veteran Jim Davenport looked at new data on oral session question-askers and speakers and chairs, finding (as we learned at earlier meetings) that men ask more questions than women, but also finding that the gender of the speaker seems to be correlated with the gender of the question-asker. The data are barely understood at present, being only days old.
- crowd-sourcing the old literature reference graph
- In some twitter activity prior to the meeting, we discovered that old papers have poor citation and reference information, because the references were often in footnotes, formatted inconsistently, and OCR-ed badly. Brooke Simmons taught the AAS Hack Day participants how to build prototype Zooniverse projects for crowd-sourcing, and Brendan Wells used that knowledge to build a project to solve this old-reference problem. Love that collaboration, which was un-imagined prior to the Hack Day!
- glassdome: glassdoor for astronomy
- Ellie Schwab and friends started to build a site where people of all different ranks and seniorities could openly or anonymously review their home institutions, and comment on salary and other often-private things. The project started out as anonymous, but as the day went on it evolved towards encouraging open and transparent reviewing.
- finding asteroids with Kepler
- Geert Barentsen arrived with the retrospectively obvious point that the Kepler satellite is awesome for finding asteroids: It spends (in its K2 mode) half of its time looking inside the Earth's orbit, so it is great for finding Earth-crossing and inner asteroids. It also has great cadence and sensitivity. He assembled a great team and started to look. Science! Also on the science with Kepler tip, Jennifer Cash and Lucianne Walkowicz started work extracting photometry from full-field images.
- death to Jet
- Timothy Pickering propagated the new matplotlib non-Jet colormaps to plotly.js. This is God's work, as it permits web-plotting gurus to benefit from the latest research in visual perception of continuous data. In case you haven't been paying attention, Hack Days are a great time for people to bond over their hatred of the Jet colormap, but Pickering also reminded us of the research that shows that it leads to misconceptions about the data, fails in black-and-white printing, and is bad for people with vision impairments.
- exoplanetary systems in WWT
- David Weigel, after reminding us that World-Wide Telescope has gone open source and is now a project of the AAS, showed us how he put a known exoplanet system into the software. The plan is to get them all in there and then make possible tours and activities around exoplanet discovery and science.
- fabric poster upcycling
- Ashley Pagnotta and company brought a sewing machine to #hackaas. It turns out that it makes sense these days to print your poster on fabric not paper! This is because fabric printing is now very cheap, and you can pack a fabric poster trivially in your luggage. Check it out. But Pagnotta and colleagues brought patterns and skills and turned posters into infotaining clothing. Insane.
2015-12-09
hierarchical photometric redshifts, combining unreliable methods
I am still not well, but well enough for the first time in ages to do my group meeting. It was great! Boris Leistedt (NYU) talked to us about his project to do template-based photometric redshifts but where he learns the templates too. I love this project; it is the only method I like for getting the redshift distribution in LSST past the (effective) spectroscopic magnitude limit. He gave us the philosophy and showed us some preliminary successes on simple fake data.
Andy Casey (Cambridge) talked to us about the different stellar parameter and abundance codes working in the Gaia-ESO spectroscopic survey collaboration. It is complicated! Getting reliable and valuable stellar quantities out of mutually inconsistent codes is not an easy problem. His methodologies are filled with good ideas. Brian McFee (NYU) suggested that he or we look at the crowd-sourcing literature for ideas here, and that turns out to be a very good idea. I proposed to Casey that we do some reading this coming week.
Right after group meeting, Casey diagnosed our bugs from yesterday and got the L1-regularized fitting working! We celebrated with ramen.
2015-05-20
#ArloFest, day 1
Today was the first day of Landolt Standards & 21st Century Photometry in Baton Rouge, organized by Pagnotta (AMNH) and Clayton (LSU). I came to speak about self-calibration. The day started with a historical overview by Bessell (MSSSO), who gave a lovely talk filled with profiles of the many people who contributed to the development of photometric calibration and magnitude systems. Many of the people he talked about (including himself) have filter systems or magnitude systems named after them! Among the many interesting things he touched on was this paper by Johnson, which I have yet to carefully read, but apparently contains some of the philosophy behind standard-star systems. He also discussed the filter choices for the Skymapper project, which seem very considered.
Suntzeff (TAMU) gave an excellent talk about the limitations of the supernova cosmology projects; his main point is that systematic issues with the photometric calibration system are the dominant term in the uncertainty budget. This is important in thinking about where to apportion new resources. He made a great case for understanding physically every part of the photometric measurement system (and that includes the stars, the atmosphere, the telescope, and the detector pixels, among other things). I couldn't agree more!
Grindlay (CfA) blew us away with the scale and content of the DASCH plate-scanning project at Harvard. It is just awesome, in time span, cadence, and sky coverage. Anyone not searching these data is making a mistake! And, as we were recovering from that, Kafka (AAVSO) blew us away again with the scale and scope of the APASS survey, which was designed, built, operated, reduced, and delivered to the public almost entirely by citizen scientists. It is dramatic; we are not worthy!
There were many other great contributions—too many to mention them all—but the day ended with a crawfish boil and then Josh Peek (STScI) and I at the bar discussing recent explosive conversations in the astronomical community around TMT and development in Hawaii.
One last thing I should say: Arlo Landolt (LSU) has had a huge impact on astronomy; his work has enabled countless projects and scientific measurements and discoveries. The development and stewardship of photometric standards and systems, and all the attention to detail it requires, is unglamorous and time-consuming work, ill-suited to most of the community, and yet absolutely essential to everything we do. I can't thank Landolt—and his collaborators and the whole community of photometrists—enough.
2014-07-04
web cams and James Bradley
Over lunch, Markus Pössel (MPIA) mentioned that he can measure the sidereal day very accurately, using a fish-eye or wide-field web cam pointed at the sky. This led us to a discussion of whether it would be possible to repeat Bradley's experiments of the 1700s that measured stellar aberration, precession of the Earth's axis, and nutation. Pössel had the very nice realization that you don't have to specifically identify any individual stars in any images to do this experiment; you can just do cross-correlations of multi-pixel time series. That's brilliant! We decided to discuss again later this month along with a possible (high school) student researcher.
Before that, Roberto Decarli (MPIA) and I discussed various projects. The most interesting is whether or how you can "stack data" (combine information from many images or many parts of an image) but in interferometric imaging data. Decarli has shown that you can do this stacking in Fourier space rather than in image space. That's excellent, because the noise properties of the data are (conceivably) known there, but never understood properly in image space. I gave him my usual advice, which is to replace the stacking with some kind of linear fit or regression: Stacking in bins is like linear fitting but under hard assumptions about the noise model and properties of the sources. We agreed to test some ideas.
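To make the stacking-versus-fitting point concrete, here is a minimal sketch. It is not Decarli's method and not anything we actually ran. It assumes each source contributes one scalar measurement, a known template value, and a known noise sigma; the function names and the numbers are made up for illustration. Under those assumptions the weighted least-squares amplitude reduces to the plain stack exactly when all templates are unity and all sigmas are equal.

```python
def stack(values):
    """Plain stacking: the unweighted mean of the measurements."""
    return sum(values) / len(values)


def linear_fit_amplitude(values, templates, sigmas):
    """Weighted least-squares amplitude a for value_i = a * template_i + noise_i.

    The noise on each measurement is assumed Gaussian with known sigma_i.
    """
    numerator = sum(t * v / s**2 for v, t, s in zip(values, templates, sigmas))
    denominator = sum(t * t / s**2 for t, s in zip(templates, sigmas))
    return numerator / denominator


# Hypothetical measurements; the last one is an outlier with large noise.
values = [1.2, 0.7, 1.1, 3.0]
unit_templates = [1.0] * len(values)

# Identical sources and identical noise: the fit is the stack.
equal_noise = linear_fit_amplitude(values, unit_templates, [1.0] * len(values))

# Honest (unequal) noise: the fit down-weights the noisy measurement.
unequal_noise = linear_fit_amplitude(values, unit_templates, [0.5, 0.5, 0.5, 5.0])
```

The design point is that the stack is one special case of the fit, so anything known about the noise or the sources can go into `sigmas` and `templates` instead of being averaged away.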
2013-09-18
dotastro, day 3
The third and last day of dotastronomy 5 started with reports of the outcome of the Hack Day. Various extremely impressive hacks happened, way too many to mention, but including a very impressive video about planet naming, by Deacon and Angus and others, an automated astronomer-career mapping app by Foreman-Mackey and others, an Xbox-Kinect Doppler-shift app by Lynn that got everyone in the room dancing and spinning more than once, and (near and dear to my heart) improved functionality for the Zoonibot by Barentsen and Simmons and others. That latter hack is an extension of the bot that got started by Beaumont and Price-Whelan (at, I am proud to say, my suggestion) at dotastronomy 4.
Among the talks, one of the highlights for me was Trouille (Adler) talking about the Galaxy Zoo Quench project, in which Zooites are taking the project from soup to nuts, including writing the paper. She spent some time in her talk on the problem of getting the participants to boldly play with the data as professional scientists might. It is a rich and deep piece of public outreach; it takes self-selected people through the full scientific process. Another highlight was Microsoft's Tony Hey talking about open access, open data, open science, libraries, and the fourth paradigm. Very inspiring stuff.
Related to that, there was great unconference action in a session on open or low-page-charge publishing models, led by Lynn (Adler) and Lintott (Oxford), in which Simpson (Oxford; and our fearless dotastronomy leader) got emotional (in all the right ways) about how crazy it is that the professional societies and individual scientists have signed away their right to their own work that they researched, wrote, reviewed, and edited for the literature. Testify!
I ran a short unconference session on combining noisy information coming from Zoo participants (or equivalent) in citizen-science and crowd-sourcing situations. A good discussion came up, covering the graphical model that represents our assumptions about what is going on in the projects, active learning and adaptive methods, and exposing the internal data in real time so that external (third-party) systems can participate in the adaptive decision-making. I also advocated for boosting-like methods, based on the idea that there might be classifiers (people) with non-trivial and covariant residual (error) properties.
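For reference, here is a minimal sketch of the simplest baseline for combining noisy yes/no classifications. It is not what the Zooniverse does and not the boosting-like approach advocated above. It assumes each classifier has a known accuracy that is the same for "yes" and "no" objects, and that classifier errors are independent, which is exactly the assumption that covariant errors would break. The function name and numbers are hypothetical.

```python
import math


def posterior_yes(votes, accuracies, prior=0.5):
    """Posterior probability that the true answer is "yes".

    votes:      list of booleans, one per classifier (True means "yes")
    accuracies: probability that each classifier is right (assumed known,
                symmetric between classes, and independent across classifiers)
    """
    log_odds = math.log(prior / (1.0 - prior))
    for vote, accuracy in zip(votes, accuracies):
        step = math.log(accuracy / (1.0 - accuracy))
        log_odds += step if vote else -step
    return 1.0 / (1.0 + math.exp(-log_odds))


# One reliable "yes" outweighs two barely-better-than-chance "no" votes.
expert_wins = posterior_yes([True, False, False], [0.95, 0.6, 0.6])

# Two equally reliable classifiers who disagree leave the prior unchanged.
tie = posterior_yes([True, False], [0.7, 0.7])
```

If the classifiers' errors are correlated, this baseline double-counts their agreement, which is one motivation for the more structured models discussed in the session.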
It has been a great meeting; Rob Simpson (Oxford) and Gus Muench (Harvard) deserve huge thanks for organizing and running it.
2013-09-16
dotastro, day 1
Today was the first day of dotastronomy, the meeting for astronomy and web and outreach and so-on, this time in Cambridge, MA. Stand-out talks included those by Stuart Lynn (Adler) on the Zooniverse and Elisabeth Newton (Harvard) about astronomy blogging in general (she mentioned this blog) and Astrobites in particular. Astrobites has been an incredible resource for astronomy, and it is carefully cultivated, edited, and managed. What a project!
In the afternoon we switched to unconference, some of which I skipped to attend a phonecon about Kepler data with the exoSAMSI crew, organized by Bekki Dawson (Harvard), who is effectively our leader. On that call, we discussed what everyone has been doing since exoSAMSI, which is quite a bit. Barclay (Ames) has been working on inferring the limb-darkening laws using transits as measuring tools. Quarles (Texas) has been searching the real-stars-with-injected-planets that we (read: Foreman-Mackey) made back at exoSAMSI, with some success. Foreman-Mackey and Angus have been searching for long-period systems with a fast Gaussian Process inside the search loop. We also spent some time talking about modeling the pixel-level data, since we at CampHogg have become evangelists about this. The SAMSI program, organized mainly by Eric Ford (PSU), has been incredibly productive and is effectively the basis for a lot of my research these days.
In my dotastro talk this morning, I mentioned the point that in "citizen science" you have to model the behavior of your citizens, and then generalized to "scientist science": If you are using data or results over which you have almost no control, you probably have to build a model of the behavior and interests and decision-making of the human actors involved in the data-generating process. In the afternoon, Lintott (Oxford) suggested that we find a simple example of this and write a short paper about it, maybe in an area where it is obviously true that your model of the scientists impacts your conclusions. That's a good idea; suggestions about how to do this from my loyal reader (you know who you are) are welcome.
2013-09-05
KIPAC@10, day 3
Although there were very amusing and useful talks this morning from Bloom (Berkeley), Boutigny (CNRS), Marshall, and Wechsler (KIPAC), the highlight for me was a talk by Stuart Lynn (Adler) about the Zooniverse family of projects. He spent a lot of time talking about the care they take of their users; he not only demonstrated that they are doing great science in their new suite of projects, but also that they are treating their participants very ethically. He also emphasized my main point about the Zoo, which is that the rich communication and interaction on the forums of the site is in many ways what's most interesting about the projects.
In the afternoon, we had the "unconference" session. Marshall and I led a session on weak lensing. We spent the entire afternoon tweaking and re-tweaking and arguing about a single graphical model! It was useful and fun, though maybe a bit less pragmatic than we wanted.
2013-05-22
robust rank statistics
While Lang and I programmed like mad, Schölkopf read the literature on rank statistics (and galaxies with faint features). We realized that we need to do something much more robust in our combinations of rank information. We implemented a more robust method, with Schölkopf wondering if there is something much better we could be doing. Results will appear tomorrow (or late tonight).
2013-05-21
image pixel ranks, probability, provenance
In an argumentative session, we decided that everything we did and thought yesterday about combination of images was wrong, and re-started. The argument was long and complicated, but ended up delivering a very simple algorithm. The idea is to use the rank information in an input image to update or improve our beliefs about the rank information for pixels in a combined or reference image. The point of this is that we don't believe the intensity information in the images but we do believe that brighter parts are probably truly brighter. A lot of what made things complicated is that sometimes an input image covers only part of the reference image; in this case we only want to use it to reorder the pixels within its footprint.
In a not totally unrelated conversation we asked the following question: How can you combine the rolls of two six-sided dice such that you get a random integer uniformly distributed between 1 and 6? The constraint is: You must use the two dice symmetrically. One solution: Roll the two dice and then randomly choose one die and read it. We came up with a few others. You can't add the two dice rolls and divide by two, because then the result isn't uniformly distributed between 1 and 6. The central limit theorem is a hard thing to fight against. My favorite solution: Make a 6x6 table, in which the numbers from 1 through six each appear 6 times, but placed in the table randomly. Roll two dice, use the first to choose the row and the second to choose the column in the table. That's a hash, I think, mapping the two rolls (which jointly produce 36 different outcomes) onto 6 numbers.
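The table solution is easy to check by brute force. What follows is a minimal sketch, not code we wrote at the time; the function names and the seed are arbitrary. It also checks one extra rule that was not among our solutions, `(a + b) % 6 + 1`, which is included only because it treats the two dice exactly symmetrically.

```python
import random
from collections import Counter
from itertools import product


def make_table(rng):
    """Return a 6x6 table holding each of 1..6 exactly six times, shuffled."""
    cells = [value for value in range(1, 7) for _ in range(6)]
    rng.shuffle(cells)
    return [cells[6 * row: 6 * row + 6] for row in range(6)]


def combine(table, first, second):
    """Look up the result for a pair of rolls (each roll is in 1..6)."""
    return table[first - 1][second - 1]


table = make_table(random.Random(42))
rolls = list(product(range(1, 7), repeat=2))  # all 36 equally likely outcomes

# Count how many of the 36 outcomes map to each result under each rule.
table_counts = Counter(combine(table, a, b) for a, b in rolls)
modular_counts = Counter((a + b) % 6 + 1 for a, b in rolls)
mean_counts = Counter((a + b) / 2 for a, b in rolls)
```

Both `table_counts` and `modular_counts` assign exactly six of the 36 outcomes to each of the numbers 1 through 6, so the result is uniform. `mean_counts` piles up in the middle, which is the central limit theorem showing up already with two dice.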
At the end of the day, Lang and I used pixel rankings to identify human-viewable images that were built from the same source data. The idea is that the ordering of the noisy pixel values in the sky is like a "digital fingerprint". It seems to work like magic.
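Here is a minimal sketch of why pixel ranks can act as a fingerprint. It is not the code Lang and I wrote. It assumes the two renderings are pixel-aligned, that each applies a strictly increasing stretch to the same noisy source values, and that a Spearman-style rank correlation is an acceptable similarity score. The stretches (`atan` and `exp`) and all names are made up for illustration.

```python
import math
import random


def ranks(values):
    """Rank of each value within the list (0 = smallest)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def rank_correlation(x, y):
    """Pearson correlation of the ranks (Spearman-style; assumes no ties)."""
    rank_x, rank_y = ranks(x), ranks(y)
    mean = (len(rank_x) - 1) / 2
    covariance = sum((a - mean) * (b - mean) for a, b in zip(rank_x, rank_y))
    variance = sum((a - mean) ** 2 for a in rank_x)
    return covariance / variance


rng = random.Random(1)
source = [rng.gauss(0.0, 1.0) for _ in range(400)]     # noisy sky pixels
unrelated = [rng.gauss(0.0, 1.0) for _ in range(400)]  # a different patch of sky

# Two hypothetical "human-viewable" renderings with different stretches.
render_a = [math.atan(x) for x in source]
render_b = [math.exp(0.5 * x) for x in source]

same_source = rank_correlation(render_a, render_b)
different_source = rank_correlation(render_a, unrelated)
```

Because any strictly increasing stretch preserves the ordering of the pixels, the two renderings of the same source agree in rank even though their intensities differ, while independent noise gives a rank correlation near zero.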
2013-05-20
combining bad images
Dustin Lang arrived for a few days of hacking in preparation for (we hope) putting in a NIPS paper by the deadline of next week. We are working with Schölkopf on a project to combine arbitrarily badly processed human-viewable images to find very faint features in extended astronomical objects (like galaxies and nebulae). We argued for ages about the methodology and started to implement. In the background, while Lang and I pair-coded something somewhat sensible, Schölkopf coded up the straight-up average of the registered images. It looked surprisingly good, causing us to wonder whether it is worth going to all the trouble to which we are going!
2012-11-05
audio in the city, black hole populations
Mike Kesden (NYU) gave the brown-bag talk today, about black-hole–black-hole binary population synthesis in preparation for gravitational radiation detection with advanced LIGO. The principal pathway for making LIGO-relevant BH–BH binaries is an insane combination of mass transfer, supernova, common-envelope evolution, supernova, and inspiral, but hey! Kesden argued that it really is likely that advanced LIGO will see these.
In the morning I met with Oded Nov (NYU Poly) and Claudio Silva (NYU Poly) to discuss possible funding proposals related to engineering, science, and citizen science. We came up with a kick-ass idea for using smart phones and angry residents to map the audio response of the city to car alarms (really any noises, but car alarms are nice standard sirens for calibration). The project would look a lot like the (non-existent) Open-Source Sky Survey but build a three-dimensional audio-response model of the city. Cool if we did it!
2012-07-10
dotastronomy, day two
Today was Hack Day at dotastronomy. The hack day started with a session in which various people proposed hacks, in part to advertise and in part to entice people in the audience with coding (or other) skills to participate. I proposed making a bot to interact with the users on the Planet Hunters forums, helping them to discuss and analyze the variable stars and transits they are finding. Foreman-Mackey proposed writing a javascript numerical optimization library for inference in the browser. Marshall proposed making quantitative measurements of images (that is, modeling) in the browser. These three hacks are related, of course, because we are thinking about very capable bots!
When we got started, I found that many people were interested in the bot concept, especially Price-Whelan, Beaumont, Lintott, and Schwamb. They got started exercising the just-started (by Lintott and Simpson) Planet Hunters API, with help from people at Adler (in real time), sending and receiving JSON. I very quietly backed away from the table, and they executed my hack with absolutely no involvement from me whatsoever [added later: I got a prize in the hack prizes for this life hack: Getting others to hack on my behalf!]. While they made a bot, I went to help out with (read: gaze in awe at) Foreman-Mackey's hack.
Great success all around: Marshall and Kapadia got image fitting working in the browser (and now Kapadia is contemplating porting simplexy to javascript). Foreman-Mackey made this demo, which does all its fitting in the browser; click through to the code if you need a javascript numerical optimizer. The bot, called ZooniBot, started commenting on the forums and by the end of the day had three logos (one drawn, unsolicited, by the child of a Planet Hunters user) and a bunch of online followers and direct messages!
I love Hack Day.
2011-10-26
open science, importance sampling
It is Open Access Week and for that reason, SUNY Albany libraries held an afternoon-long event. I learned a lot at the brown-bag discussion about how open access policies could dramatically improve the abilities of librarians to serve their constituents, and dramatically improve the ability of universities to generate and transmit knowledge. The horror stories about copyright, DRM, and unfair IP practices were, well, horrific. In the afternoon I gave a seminar about the openness of our group at NYU, including this blog, our web-exposed SVN repo, and our free data and code policies (obeyed where we are permitted to obey them; see above). It was great, and a great reminder that librarians are currently—in many universities—the most radical intellectuals, with sharp critiques of the conflicts and interactions between institutions of higher learning and institutions of commerce.
On the train home, I tried out importance sampling for my posterior PDF over catalogs project. Not a good idea! The prior is so very, very large.
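A toy illustration of why a very broad prior hurts here. This is a one-dimensional sketch, nothing like the catalog problem itself. It assumes samples are drawn from a uniform prior of adjustable half-width and weighted by a unit-width Gaussian likelihood centered at zero; the function name, sample count, and widths are arbitrary.

```python
import math
import random


def effective_sample_size(prior_half_width, n_samples=2000, seed=0):
    """Effective number of importance samples when sampling from the prior.

    Samples come from Uniform(-W, W) and are weighted by a unit Gaussian
    likelihood; ESS = (sum of weights)^2 / (sum of squared weights).
    """
    rng = random.Random(seed)
    weights = []
    for _ in range(n_samples):
        theta = rng.uniform(-prior_half_width, prior_half_width)
        weights.append(math.exp(-0.5 * theta * theta))
    total = sum(weights)
    total_squared = sum(w * w for w in weights)
    if total_squared == 0.0:  # every weight underflowed: no useful samples
        return 0.0
    return total * total / total_squared


ess_narrow = effective_sample_size(5.0)    # prior a few times wider than the likelihood
ess_wide = effective_sample_size(500.0)    # prior hundreds of times wider
```

The wider the prior relative to the likelihood, the fewer samples carry any weight, and the effect compounds with every added dimension, which is presumably what bites in a space as large as the space of catalogs.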
2011-04-06
dotAstronomy day three
In the morning session I talked about modeling, including our Comet 17P/Holmes project (which got some press here and here; my dotastronomy viewgraphs are here). A highlight for me of the talks was Geert Barentsen (Armagh Observatory) talking about human observing of meteoroids. He showed a ridiculous distribution of meteoric material around the Earth; the detail was beautiful. At the end of his talk he showed experimentally that some meteoroids can be detected by Twitter searches!
We had an afternoon unconference, with so many good things I couldn't decide what to do. In the end I went to the mash-up discussion, which evolved into a discussion of funding, not surprisingly given the bad state of things for so many interesting projects right now (Jill Tarter of SETI noted that the Allen Telescope Array may be forced to shut down this year). We decided to look at crowd-sourcing some long-term funding propaganda.
2011-04-05
dotAstronomy day two
There were nice talks in the morning showing off some great and useful astrophysics-related engineering. One highlight for me was Thomas Robitaille (Harvard) showing off new ADS-related awesomeness. He mentioned the point that interfacing with ADS through command-line tools improves repeatability. Amen to that! Another highlight for me was Thomas Boch (CDS) showing off the next generation of insane CDS tools.
The afternoon was "Hack Day". Phil Marshall proposed that we make AstroTaches, a citizen-science platform for annotating (think "drawing moustaches on") astronomical images. Of course his first thought is for the purposes of deblending galaxy-scale strong gravitational lenses. We recruited Stuart Lowe (LCOGT), who is a javascript and HTML5 master, to make everything work in the browser, and we recruited Pamela Gay (Astrosphere), who has all necessary database, Amazon Web services, and Zooniverse foo. We got it all working, and then ran out of steam somewhere around 02:30 trying to do the data analysis on the back end!
[Note added later: The next day we won a runner-up prize in the Hack Day awards. Some of the submissions were incredible; one of them got press coverage. My favorite hack was a home-built pen-casting system (draw and record voice and drawing in real time to tell a story).]