I am about to leave after a few intense days at the Second Moscow-Tartu Late Summer School on Digital Humanities, where I was invited to give a talk about my research on the cultural dynamics of emotion in fiction (see two papers here and here) and, more recently, in song lyrics (no paper yet – a blog post documenting the very beginning of the research is here).
I uploaded a dataset to figshare (here). From the description there:
This dataset contains 1,093 movie scripts collected from the website imsdb.com, each in a separate text file. The file imsdb_sample.txt contains the titles of all movies (corresponding file names are in the form Script_TITLE.txt).
The website was crawled in January 2017. Some scripts are not present because they were missing from imsdb.com or because they were uploaded as PDF files. Please note that (i) the original scripts were uploaded to the website by individual users, so they might not correspond exactly to the actual movie scripts, and typos may be present; (ii) HTML formatting was not consistent across the website, and so neither is the formatting of the resulting text files.
Even considering (i) and (ii), the quality seems good on average, and the dataset can easily be used for text-mining tasks.
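As a starting point for such text-mining tasks, here is a minimal sketch of how the dataset could be loaded in Python. The function name and directory layout are illustrative assumptions; only the `Script_TITLE.txt` naming convention comes from the dataset description above.

```python
import os

def load_scripts(data_dir):
    """Load movie scripts from a directory of Script_TITLE.txt files.

    Returns a dict mapping each movie title to the full script text.
    errors="replace" guards against the inconsistent formatting noted
    in the dataset description.
    """
    scripts = {}
    for fname in os.listdir(data_dir):
        if fname.startswith("Script_") and fname.endswith(".txt"):
            title = fname[len("Script_"):-len(".txt")]
            path = os.path.join(data_dir, fname)
            with open(path, encoding="utf-8", errors="replace") as f:
                scripts[title] = f.read()
    return scripts
```

From here, the resulting dictionary can be fed directly into standard tokenisation or sentiment-scoring pipelines.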
Almost three years ago I programmed a simple twitterbot (see here), namely a Python script that posted every hour, when available, news or blog posts related to cultural evolution – hence the name @CultEvoBot. While the goal of the endeavour was mainly to see how difficult it was to build something like that (it was easy!), and potentially to use what I learnt for other projects (I never did, but who knows!), @CultEvoBot was relatively useful and, most of the time, posted links to interesting sources.
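To give an idea of how simple such a bot can be, here is a sketch of its two core pieces: parsing items out of an RSS feed and fitting a title plus link into a post. This is not the original @CultEvoBot code; the function names and the 280-character limit are my own illustrative assumptions, and the actual posting step (calling the Twitter API on an hourly schedule) is omitted.

```python
import xml.etree.ElementTree as ET

def fetch_items(feed_xml):
    """Return (title, link) pairs from the text of an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]

def format_post(title, link, limit=280):
    """Build a post, truncating the title so title + link fit the limit."""
    room = limit - len(link) - 1  # one space between title and link
    if len(title) > room:
        title = title[:room - 1] + "…"
    return f"{title} {link}"
```

The real bot would fetch the feed over HTTP every hour, skip items it has already posted, and hand the formatted string to the Twitter API.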
[Second post of the series “Things that I probably will not develop into a proper paper, but I find interesting enough to write about here”. The first is on the twentieth-century decrease of turnover rate in popular culture.]
In the last couple of years, part of my research has been dedicated to exploring the emotional content of published books, using the material in the Google Books Ngram Corpus. Our analysis produced some interesting results. While analyses like ours need to be carefully weighed and possibly reproduced with various samples (though this should always happen…), I think that tools like the Google Books Corpus represent an extraordinary opportunity, as my goal is to study human culture in a scientific/quantitative framework.
Many studies of cultural evolution have focused on how transmission biases affect the likelihood of cultural traits being transmitted. The concepts are quite intuitive. A useful distinction is between content biases, when the intrinsic features of a cultural trait make it more likely to spread (the effectiveness of a tool may be a content bias, but so may a sexual hint in an image), and context biases, when the likelihood is determined by the context, as when we tend to dress like our friends/coworkers (conformist bias; though one can do the opposite and prefer unpopular cultural traits), or as when I was trying to have a young Axl Rose haircut (prestige bias – see also my picture on the left).
Some interesting works in cultural evolution have examined, with analytical and simulation models, the adaptivity of transmission biases (e.g. did my Axl Rose haircut make me rich and/or attractive? It did not, but, on average, prestige bias may be useful) or the long-term dynamics of transmission biases in idealised situations (e.g. how fast will a new cultural trait “invade” a population of conformist individuals? Or of anti-conformists?). Other works investigated, in controlled experiments, whether people are indeed subject to transmission biases when copying from others (they are, with caveats).
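The kind of simulation model mentioned above can be sketched in a few lines. The toy model below is my own illustration, not taken from any of the cited works: each generation, every individual either copies a random member of the population (unbiased copying) or, with probability `conformity`, adopts the current majority trait, and we track how long a rare “new” trait survives.

```python
import random

def step(population, conformity=0.0):
    """One generation: each individual copies a random model, or, with
    probability `conformity`, adopts the current majority trait."""
    majority = max(set(population), key=population.count)
    return [majority if random.random() < conformity
            else random.choice(population)
            for _ in population]

def run(n=100, p_new=0.1, conformity=0.0, seed=1, max_steps=5000):
    """Simulate until one trait fixes; return (winning trait, generations)."""
    random.seed(seed)
    k = int(n * p_new)
    pop = ["new"] * k + ["old"] * (n - k)
    for gen in range(max_steps):
        if len(set(pop)) == 1:
            return pop[0], gen  # one trait has reached fixation
        pop = step(pop, conformity)
    return None, max_steps      # no fixation within max_steps
```

With `conformity=0`, the minority trait can occasionally drift to fixation; with strong conformity, it is driven out almost immediately, which is exactly why conformist populations are hard for new traits to invade.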
What is partly missing is an understanding of the impact of transmission biases on real-life cultural dynamics. We recently had a paper accepted in Evolution and Human Behavior that tackles this problem. In brief (much more is in the paper!): (1) we focused on the turnover of popular traits, i.e. on how many new traits enter a top list of a certain size for a certain cultural domain (like here); (2) we derived predictions of what the turnover would look like if there were no biases, that is, if everybody just copied others at random (the neutral model of cultural evolution); and (3) we showed how these predictions differ when biases are instead present.
The turnover of some cultural domains, for example recent baby names in the USA, looks like the red line in the figure above, signalling that people tend to prefer relatively uncommon names. The turnover of others, like early baby names, the musical preferences of Last.fm users who subscribed to genre-based groups (“80s Gothic”, “Acid Jazz”), or the usage of colour terms in English-language books, looks instead like the blue line, signalling a conformist bias or a content-based bias (which I call “attraction”).
Overall, turnover can be calculated whenever we have periodical top lists or, more generally, whenever we can “count” the frequency of items through time. Given the ubiquity of this kind of information in digital form, one can use this methodology to infer individual behaviour from population-level, aggregate data for several cultural domains.
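The basic measure is straightforward to compute. A minimal sketch, assuming we already have a sequence of equally sized top lists (one per period): the turnover at each step is simply the number of items present in the current list but absent from the previous one.

```python
def turnover(top_lists):
    """Given a sequence of top lists (each a list of items of the same
    size), return the number of new entries at each time step: items in
    the list at time t that were not in the list at time t-1."""
    return [len(set(curr) - set(prev))
            for prev, curr in zip(top_lists, top_lists[1:])]
```

For example, for yearly top-3 baby-name lists `[["a","b","c"], ["a","b","d"], ["a","b","d"]]`, the turnover is one new entry in the second year and none in the third.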
Acerbi, A. and Bentley, R. A. (2014). Biases in cultural transmission shape the turnover of popular traits. Evolution and Human Behavior, in press.