Thanks to Twitter-induced serendipity (others call it procrastination), I found the lyrics of the songs included in the annual Billboard Top-100 from 1965 to 2015 (i.e., allowing for a few missing ones, ~5,000 songs). On GitHub you can find the raw data, together with some clarifications on how the data were collected and their limitations, plus a pointer to a nice analysis already done.
The complexity of some cultural domains, such as technology, seems to be linked to population size. This makes intuitive sense: after all, we have iPhones and rockets, whereas hunter-gatherer societies do not. How does this work, however, for other, non-technological cultural domains? Western stories are not more complex than Aboriginal Australian ones (I guess). What about religions, or kinship systems?
The next EHBEA conference in Paris will include a “satellite meeting” on cultural attraction theory: Cultural Evolution by Cultural Attraction: Empirical Issues
I will give a talk titled Three predictions for cultural attraction theory. Below are the (tentative) slides. If you cannot wait, the three predictions are:
- lo-fi copying is more significant than hi-fi copying in cultural transmission
- domain-general social influence (context-biases) is not very important
- culture is a matter of global, often neutral, traditions, more than local, generally adaptive, differences
An interesting article by Thom Scott-Phillips was recently published in the Journal of Cognition and Culture: A (Simple) Experimental Demonstration that Cultural Evolution is not Replicative but Reconstructive – and an Explanation of Why this Difference Matters.
The article describes an experiment that nicely illustrates (“make flesh” in the words of Scott-Phillips) a thought experiment proposed by Dan Sperber. Briefly, imagine a game of Chinese whispers in which chains of individuals have to reproduce two drawings. One is a familiar configuration (in this specific case, the first three letters of the Latin alphabet), while the other is a meaningless scribble (see the image below, from Scott-Phillips’ paper).
I uploaded a dataset to figshare (here). From the description there:
This dataset contains 1,093 movie scripts collected from the website imsdb.com, each in a separate text file. The file imsdb_sample.txt contains the titles of all movies (corresponding file names are in the form Script_TITLE.txt).
The website was crawled in January 2017. Some scripts are not present, either because they were missing from imsdb.com or because they were uploaded as PDF files. Please note that (i) the original scripts were uploaded to the website by individual users, so they might not correspond exactly to the movie scripts and typos may be present; (ii) HTML formatting was not consistent across the website, and so neither is the formatting of the resulting text files.
Even considering (i) and (ii), the quality seems good on average and the dataset can be easily used for text-mining tasks.
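As a rough idea of what a first text-mining pass could look like, here is a minimal Python sketch that reads the titles from imsdb_sample.txt, opens the matching Script_TITLE.txt files, and counts word frequencies across the corpus. Several things here are assumptions rather than facts about the dataset: that the index file lists one title per line, that titles map to file names without further escaping, that the files can be read as UTF-8 (undecodable bytes are replaced), and that a crude regular-expression tokenizer is good enough. The function names are made up for illustration.

```python
import re
from collections import Counter
from pathlib import Path


def read_titles(index_file):
    """Return the non-empty lines of the index file (assumed: one title per line)."""
    text = Path(index_file).read_text(encoding="utf-8", errors="replace")
    return [line.strip() for line in text.splitlines() if line.strip()]


def script_path(data_dir, title):
    """Build the expected file name for a title, following the Script_TITLE.txt pattern."""
    return Path(data_dir) / f"Script_{title}.txt"


def word_counts(text):
    """Count lowercase word tokens; a crude tokenizer, not a linguistic one."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def corpus_counts(data_dir, index_name="imsdb_sample.txt"):
    """Sum word counts over every script listed in the index, skipping missing files."""
    total = Counter()
    for title in read_titles(Path(data_dir) / index_name):
        path = script_path(data_dir, title)
        if not path.exists():
            continue  # some titles may have no matching file under these assumptions
        total.update(word_counts(path.read_text(encoding="utf-8", errors="replace")))
    return total
```

With the files unpacked in a folder (say a hypothetical `imsdb_scripts/`), `corpus_counts("imsdb_scripts").most_common(20)` would list the twenty most frequent tokens. Given the inconsistent formatting mentioned above, some cleaning of leftover markup is probably needed before any serious analysis.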