You have probably heard the adage that “man bites dog”, and not “dog bites man”, makes for a good piece of news: unusual, exceptional events are better stories than everyday occurrences. Beneath the surface, however, there is another message that seems so obvious we do not even think about it: both examples are negative events. “Dog wins the lottery” would probably be a good piece of news too, but it is not mentioned. That negative news – and possibly negative narratives in general – is more attractive than positive news is a bit of a cliché, but it is supported by much research.
I will hold a one-day practical workshop on “Emotions in 50 Years of Pop Song Lyrics: A Text Mining Approach” at the 7th Winter School Fact and Method: Data, Borders and Interpretation in Tartu, Estonia, on 7 February 2018 (this blog post can give an idea of what we will do). Participation is free of charge for PhD students and, according to the organisers, accommodation can be reimbursed in some cases. See below for a short description of the workshop and some suggested readings.
I found, thanks to Twitter-induced serendipity (others call it procrastination), the lyrics of the songs included in the annual Billboard Top-100 from 1965 to 2015 (i.e., allowing for a few missing ones, ~5,000 songs). On GitHub you can find, together with the raw data, some clarifications on how the data were collected and on their limitations, plus a pointer to a nice analysis already done.
I uploaded a dataset to the Open Science Framework (here). From the description there:
This dataset contains 1,093 movie scripts collected from the website imsdb.com, each in a separate text file. The file imsdb_sample.txt contains the titles of all movies (corresponding file names are in the form Script_TITLE.txt).
The website was crawled in January 2017. Some scripts are not present, either because they were missing from imsdb.com or because they were uploaded as PDF files. Please note that (i) the original scripts were uploaded to the website by individual users, so they might not correspond exactly to the movie scripts and typos may be present; (ii) HTML formatting was not consistent across the website, and so neither is the formatting of the resulting text files.
Even considering (i) and (ii), the quality seems good on average and the dataset can be easily used for text-mining tasks.
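As a starting point for such tasks, here is a minimal sketch of how the scripts could be loaded in Python using the naming scheme described above. It makes several assumptions that the description does not state: that imsdb_sample.txt lists one title per line, that each title matches its file name exactly, that all files sit in the same folder, and that they can be read as UTF-8. The function names `script_path` and `load_scripts` are made up for this example.

```python
from pathlib import Path


def script_path(title, data_dir):
    """Build the expected file path for a title (Script_TITLE.txt)."""
    return Path(data_dir) / f"Script_{title}.txt"


def load_scripts(data_dir, index_name="imsdb_sample.txt"):
    """Return (scripts, missing) for the titles listed in the index file.

    scripts maps each title to the text of its script; missing lists the
    titles for which no file was found.
    Assumption: the index file holds one title per line.
    """
    data_dir = Path(data_dir)
    index_text = (data_dir / index_name).read_text(
        encoding="utf-8", errors="replace"
    )
    # Skip blank lines and strip surrounding whitespace from each title.
    titles = [line.strip() for line in index_text.splitlines() if line.strip()]

    scripts = {}
    missing = []
    for title in titles:
        path = script_path(title, data_dir)
        if path.exists():
            # errors="replace" tolerates odd bytes left over from the crawl.
            scripts[title] = path.read_text(encoding="utf-8", errors="replace")
        else:
            missing.append(title)
    return scripts, missing
```

Calling `load_scripts("path/to/dataset")` returns a dictionary from titles to script text, plus a list of titles whose files could not be found. If that list is long, one of the assumptions above probably does not hold for the actual files.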
Almost three years ago I programmed a simple Twitter bot (see here), namely a Python script that posted every hour, when available, news or blog posts related to cultural evolution – hence the name @CultEvoBot. While the goal of the endeavour was mainly to see how difficult it was to build something like that (it was easy!), and potentially to use what I learnt for other projects (I never did, but who knows!), @CultEvoBot was relatively useful and, most of the time, posted links to interesting sources.