Today at lunch Michael Blanton organized a Data Science event in which Ken Benoit (LSE) told us about quanteda, his package for manipulating text in R. This package does lots of the data massaging and munging that used to be manual work, and gets the text data into "rectangular" form for data analysis. It does lots of data analysis tasks too, but the munging was very interesting: part of Benoit's motivation is to make text analyses reproducible from beginning to end. Benoit's example texts were amusing because he works on political speeches; he had examples from US and Irish politics. Some discussion in the room was about Python vs R; the key motivation for working in R is that it is by far the dominant language at the intersection of statistics and political science.
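
To make the "rectangular" idea concrete, here is a minimal sketch of one common rectangular form for text: a document-term matrix, with one row per document and one column per word. This is not quanteda's API (quanteda is an R package); it is plain Python using only the standard library, and the function names `tokenize` and `document_term_matrix` and the two placeholder texts are my own assumptions for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def document_term_matrix(docs):
    """Return (vocabulary, rows): one row of token counts per document."""
    counts = [Counter(tokenize(text)) for text in docs.values()]
    # The vocabulary is the sorted union of all tokens seen in any document.
    vocab = sorted(set().union(*counts))
    # Each row lists that document's count for every vocabulary term.
    rows = {name: [c[term] for term in vocab] for name, c in zip(docs, counts)}
    return vocab, rows


# Hypothetical placeholder texts, not Benoit's examples.
docs = {
    "speech_a": "Taxes must fall and jobs must grow.",
    "speech_b": "Jobs, jobs, jobs: that is the budget.",
}

vocab, rows = document_term_matrix(docs)
print(vocab)
for name, row in rows.items():
    print(name, row)
```

Because every step from raw text to matrix is in code, rerunning the script reproduces the same table, which is the end-to-end reproducibility point in miniature.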

