I found, thanks to Twitter-induced serendipity (others call it procrastination), the lyrics of the songs included in the annual Billboard Top-100 from 1965 to 2015 (i.e., allowing for a few missing ones, ~5,000 songs). On GitHub you can find the raw data, together with some clarifications on how the data were collected and their limitations, plus a pointer to a nice analysis already done.
I recently published, together with Olivier Morin, a paper in Cognition and Emotion: Birth of the cool: a two-centuries decline in emotional expression in Anglophone fiction. The main result is a clear decrease in the emotional tone of English-language literature, plausibly starting from the XIX century, a decrease driven almost entirely by a decline in the proportion of positive emotion-related words, while the frequency of negative emotion-related words shows little if any decline. In other words, over the last centuries English literature became less “emotional” and, in particular, less “positive”.
[Second post of the series “Things that I probably will not develop in a proper paper, but I find interesting enough to write here”. The first is on the XX century decrease of turnover rate in popular culture]
In the last couple of years, part of my research has been dedicated to exploring the emotional content of published books, using the material present in the Google Books Ngram Corpus. Our analysis produced some interesting results. While analyses like ours need to be carefully weighed and possibly reproduced with various samples (but this should always happen…), I think that tools like the Google Books Corpus represent an extraordinary opportunity, as my goal is to study human culture in a scientific/quantitative framework.
Almost one year ago, we published a paper in which we described a large-scale analysis of cultural/literary trends, carried out using the Google Books Ngram Corpus. In particular, we showed that, through a relatively simple extraction of emotion-related words (words semantically related to “main” emotions like joy, sadness, anger, etc.), it was possible to detect some clear tendencies: a general decline in the emotional “tone” of books published in the twentieth century (or at least in the frequencies of emotion words), a divergence between American and British English (with the former being more emotional), and, finally, the existence of distinct periods of “literary mood” in the last century.
Related to the last point, PLOS ONE just published a follow-up to this research, in which we correlate this literary mood with the economic trend of the past century. The image below shows the main point of our study.
The red line is what we called the “Literary Misery Index” (how “sad” books are in a certain year, on average), which we extracted from the books in the Google Corpus, while the black line is an 11-year moving average of the economic Misery Index (how “bad” the economy is in a certain year), a well-known economic index obtained by adding the inflation and unemployment rates. The two trends are strongly correlated (you can read more in the Bristol University press release here, and, of course, in the original paper).
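To make the economic side of the comparison concrete, here is a minimal sketch of the two steps described above: summing inflation and unemployment, then smoothing with an 11-year window. The function names are mine, the figures in the usage example are made up, and the choice of a trailing window (each year averaged with the ten before it) is an assumption; the paper should be consulted for the exact procedure.

```python
from statistics import mean

def misery_index(inflation, unemployment):
    """Economic Misery Index per year: inflation rate plus unemployment rate."""
    return [i + u for i, u in zip(inflation, unemployment)]

def trailing_moving_average(values, window=11):
    """Average each value with the (window - 1) values before it.

    Assumption: a trailing window. Years without a full window of
    history are returned as None rather than being padded.
    """
    smoothed = []
    for k in range(len(values)):
        if k + 1 < window:
            smoothed.append(None)
        else:
            smoothed.append(mean(values[k + 1 - window : k + 1]))
    return smoothed

# Made-up yearly percentages, for illustration only.
inflation = [2.0, 3.0, 4.0]
unemployment = [5.0, 6.0, 7.0]
misery = misery_index(inflation, unemployment)
smoothed = trailing_moving_average(misery, window=2)  # short window for the toy data
```

With real data one would pass `window=11` and compare the smoothed series, year by year, with the literary index.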
As with the previous work, we are glad we had some media attention (see for example The New York Times and The Guardian), which generated quite a lot of buzz. Not surprisingly, this included some criticism. It is interesting that, while some commenters think that we are “stating the obvious”, others accuse us of applying a “crude” causal determinism, and of defending the implausible claim that the economy “dictates” literature and culture.
Personally, I am more sympathetic to the state-the-obvious side of the debate, so I am not going to write about it (but: we are able to substantiate an “obvious” claim, that economic conditions influence cultural mood, with empirical data, as well as provide some refinement, for example a possible estimate of a time lag). Regarding the other side of the debate, I would not say that the economy “dictates” literature, but it is quite plausible that economic conditions may have an effect on mood. This is not just common sense: many studies link, for example, financial strain and depressive symptoms (here), or general psychological distress (here). If the Google corpus is a good barometer of a culture’s mood, our results are not particularly surprising. This does not mean of course that all books published, for example, in the 80s, were gloomy (I feel like I am underestimating the intelligence of the readers, but some journalists seem to criticise our result on this shaky basis), or that the economy alone has a causal effect on literature or culture.
On a related note, given that I can safely assume that most of the “crude determinism” critics come from literary, or, in general, humanities departments: I like to imagine that a well-known German philosopher, who was once highly praised there, would be very supportive of our work!
Bentley R.A., Acerbi A., Ormerod P., Lampos V. (2014), Books Average Previous Decade of Economic Misery, PLoS ONE, 9 (1): e83147.
Just to give an idea of the analysis mentioned in the previous post, the plot below shows the trend for a rough measure of the “happiness” of the books present in the Google Books database. For WordNet-Affect (WNA) this is obtained, simplifying a little, by subtracting the cumulative score of the “Sadness” category from that of the “Joy” category, while for Linguistic Inquiry and Word Count (LIWC) the two (equivalent) categories are called “Positive emotions” and, again, “Sadness”. Values above zero indicate generally ‘happy’ periods, and values below zero indicate generally ‘sad’ periods.
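As a rough illustration of this “Joy minus Sadness” measure, the sketch below takes two yearly series of category scores and returns their difference. Standardising each series to z-scores before subtracting is my assumption here (it puts the two categories on a comparable scale and centres the result on zero); the function names and the numbers are hypothetical.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardise a series: subtract its mean, divide by its standard deviation.

    Note: a constant series has zero deviation and would raise ZeroDivisionError.
    """
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]

def happiness_index(joy_scores, sadness_scores):
    """Yearly 'happiness': standardised Joy score minus standardised Sadness score."""
    z_joy = z_scores(joy_scores)
    z_sad = z_scores(sadness_scores)
    return [j - s for j, s in zip(z_joy, z_sad)]

# Made-up yearly category scores, for illustration only.
joy = [1.0, 2.0, 3.0]
sadness = [3.0, 2.0, 1.0]
index = happiness_index(joy, sadness)
# index[0] is negative (a 'sad' year), index[2] is positive (a 'happy' year).
```

For LIWC the same function would simply be fed the “Positive emotions” and “Sadness” series instead.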
This result is interesting to me not so much because we can discover something new about the last century (even though I wonder why the 80s seem to be so sad), but because if (i) two independent ways to score the emotional content of texts, (ii) through a quite rough analysis of (iii) an enormous database of books, give highly correlated trends, this means that there is a meaningful “signal” that we can extract (which cannot be taken for granted).
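One simple way to check how closely two such yearly series agree is a correlation coefficient. The sketch below computes Pearson’s r by hand; using Pearson’s r specifically is an assumption on my part, and the two toy series stand in for the WNA and LIWC trends.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(var_x * var_y)

# Toy stand-ins for two yearly happiness series.
wna_like = [1.0, 2.0, 3.0]
liwc_like = [2.0, 4.0, 6.0]
r = pearson_r(wna_like, liwc_like)  # perfectly aligned toy series give r close to 1
```

A value near 1 means the two scoring methods rise and fall together; a value near 0 would suggest they are picking up unrelated noise.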
We also performed an analogous analysis using a tool called “Hedonometer” (HED – see the plot below). In this case the results are quite different, even though some similarities are present, e.g. the 20s positive peak, the negative peak corresponding to the Second World War, and the post-80s increase in happiness. The reason is probably that LIWC and WNA are conceptually quite different from HED. LIWC and WNA are basically “lists” of words related to specific emotions (so, for example, the first – alphabetically – 5 words in LIWC’s category of “Sadness” are: abandon*, ache*, aching, agoniz*, agony), while HED uses a list of generic words not directly related to emotional states, but evaluated by human subjects as particularly happy or sad. So, for example, HED scores in texts the presence of words such as “terrorism” or “Christmas”.
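The difference between the two approaches can be sketched in a few lines. The first function counts how often a text hits a category list, treating a trailing `*` as “any word starting with this stem”, as in the LIWC entries quoted above. The second averages human-assigned ratings over the rated words that appear. The ratings below are hypothetical placeholders on an assumed 1 to 9 scale, not actual Hedonometer values, and the function names are mine.

```python
from collections import Counter

# The five LIWC "Sadness" entries quoted in the post; '*' marks a stem.
SADNESS_PATTERNS = ["abandon*", "ache*", "aching", "agoniz*", "agony"]

# Hypothetical ratings (assumed 1 = very sad, 9 = very happy).
RATINGS = {"christmas": 8.0, "terrorism": 2.0}

def matches(word, pattern):
    """True if the word equals the pattern, or starts with its stem when it ends in '*'."""
    if pattern.endswith("*"):
        return word.startswith(pattern[:-1])
    return word == pattern

def category_frequency(tokens, patterns):
    """List-based score: share of tokens that match any pattern in the category."""
    hits = sum(1 for t in tokens if any(matches(t, p) for p in patterns))
    return hits / len(tokens)

def rated_average(tokens, ratings):
    """Rating-based score: frequency-weighted mean rating of the rated words present."""
    counts = Counter(t for t in tokens if t in ratings)
    total = sum(counts.values())
    if total == 0:
        return None  # no rated word occurs in the text
    return sum(ratings[w] * c for w, c in counts.items()) / total

sad_share = category_frequency(["the", "agony", "of", "abandonment"], SADNESS_PATTERNS)
mood = rated_average(["christmas", "christmas", "terrorism", "tree"], RATINGS)
```

Note how the second score reacts to words that name no emotion at all, which is exactly why the two kinds of index can diverge on the same books.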
One interesting thing to notice regarding HED is that it is the only index that “tracks” the effect of the First World War. Also, comparing the absolute values of our results (the right y-axis in the plot above) with the values obtained for contemporary Twitter messages (see here), it seems that, in general, books tend to be slightly more “sad” than tweets.
If you are interested in more details, and in the other analyses, the preprint of our contribution can be found here.