Aim of the book

The field of cultural evolution has emerged in the last few decades as a thriving, interdisciplinary effort to understand cultural change and cultural diversity within an evolutionary framework and using evolutionary tools, concepts and methods. Given its roots in evolutionary biology, much of cultural evolution is grounded in, or inspired by, formal models. Yet many researchers interested in cultural evolution come from backgrounds that lack training in formal models, such as psychology, anthropology or archaeology.

This book aims to partly address this gap by showing readers how to create individual-based models (IBMs, also known as agent-based models, or ABMs) of cultural evolution. We provide example code written in the programming language R, which has been widely adopted in the scientific community. We will go from very simple models of the basic processes of cultural evolution, such as biased transmission and cultural mutation, to more advanced topics such as the evolution of social learning, demographic effects, and social network analysis. Where possible we recreate existing models in the literature, so that readers can better understand those existing models, and perhaps even extend them to address questions of their own interest.

What is cultural evolution?

The theory of evolution is typically applied to genetic change. Darwin pointed out that the diversity and complexity of living things can be explained in terms of a deceptively simple process: (1) organisms vary in their characteristics, (2) these characteristics are inherited from parent to offspring, and (3) those characteristics that make an organism more likely to survive and reproduce will tend to increase in frequency over time. That’s pretty much it. Since Darwin, biologists have filled in many of the details of this abstract idea. Geneticists have shown that heritable ‘characteristics’ are determined by genes, and worked out where genetic variation comes from (e.g., mutation, recombination, migration) and how genetic inheritance works (e.g., via Mendel’s laws, and DNA). The details of selection have been explored, revealing the many reasons why some genes spread and others don’t. Others realised that not all biological change results from selection, it can also result from random processes like population bottlenecks (genetic drift).

The modern theory of cultural evolution began from the observation that culture constitutes a similar evolutionary process to that outlined above. ‘Culture’ is defined as information that passes from one individual to another socially, rather than genetically. This could include what we colloquially call knowledge, beliefs, ideas, attitudes, customs, words, or values. These are all learned from others via various ‘social learning’ mechanisms such as imitation or spoken/written language. The key point is that social learning is an inheritance system. Cultural characteristics (or cultural traits) vary across individuals, they are passed from individual to individual, and in many cases, some traits are more likely to spread than others. This is Darwin’s insight, applied to culture. Cultural evolution researchers think that we can use similar evolutionary concepts, tools and methods to explain the diversity and complexity of culture, just as biologists have done for the diversity and complexity of living forms. We hope that the models in this book will help the reader to understand many of the above principles, by creating simulations of various aspects of cultural evolution.

Importantly, we do not need to assume that cultural evolution is identical to genetic evolution. Many of the details will be different. To take an obvious example, we inherit genetic information in the form of DNA only once and only from our two parents. Cultural traits, on the other hand, we can learn throughout our entire life from various sources (teachers, strangers on the internet, long-dead authors’ books, or even our parents). The goal of cultural evolution researcher is to build a theory (based on carefully conducted theoretical and empirical experiments) that will help us understand how cultural traits are transmitted between individuals and across generations, why some traits get picked up swiftly, why some stick around for a long time, and why others appear and vanish quickly. In addition to these transmission-related questions, researchers have also focussed on the coevolution of genes and culture, and more recently on how ‘rules’ that regulate transmission can themselves evolve culturally.

Why model?

A formal model is a simplified version of reality, written in mathematical equations or computer code. Formal models are useful because reality is complex. We can observe changes in species or cultures over time, or particular patterns of biological or cultural diversity, but there are always a vast array of possible causes for any particular pattern or trend, and huge numbers of variables interacting in many different ways. A formal model is a highly simplified recreation of a small part of this complex reality, containing only those few elements and processes that the modeller suspects to be important. A model, unlike reality, can be manipulated and probed to better understand how each part works. No model is ever a complete recreation of reality. That would be pointless: we would have replaced a complex, incomprehensible reality with a complex, incomprehensible model. Instead, models are useful because of their simplicity.

Formal modelling is rare in the social sciences (with some exceptions, such as economics). Social scientists tend to be sceptical that very simple models can tell us anything useful about something as immensely complex as human culture. But the clear lesson from biology is that models are extremely useful in precisely this situation. Biologists face similar complexity in the natural world. Despite this, models are useful. Population genetics models of the early 20th century helped to reconcile new findings in genetics with Darwin’s theory of evolution. Ecological models helped understand interactions between species, such as predator-prey dynamics. These models are hugely simplified: population genetics models typically make ridiculous assumptions like infinitely large populations and random mating. Even though these assumptions are of course unrealistic, the models are still capable of producing useful predictions.

Another way to look at this is that all social scientists use models, but only some use formal models. Most theories in the social sciences are purely verbal models. The problem is that words can be imprecise, and verbal models contain all kinds of hidden or unstated assumptions. The advantage of formal modelling is that we are forced to precisely specify every element and process that we propose, and make all of our assumptions explicit. In comparison to verbal models, maths and programming code do not accept any ambiguity.

Models can also help to understand the consequences of our theories. Social systems, like many others, are typically under the influence of several different interacting forces. In isolation the effects of these forces can be easy to predict. However, when several forces interact the resulting dynamics quickly become non-trivial, which is why these systems are sometimes referred to as complex systems. With verbal descriptions, figuring out the effects of interactions is left to our insights. With formal models, we can set up systems with these forces and observe the dynamics of their interactions.

Why individual-based models?

There are several different types of formal models. Some models describe the behaviour of a system at the population-level, tracking overall frequencies or other descriptive statistics of traits without explicitly modelling individuals. For example, a model can specify that the frequency of a cultural trait \(A\) at time \(t\) depends on its frequency at time \(t-1\). Perhaps it doubles at each time step. Other models, instead, describe the behaviour of a system at the individual-level, explicitly modelling the individual entities that possess the traits. Imagine the same question, but now we specify that, in a population of \(N\) individuals, each individual observes each time a random number of other individuals and, if at least one of them has trait \(A\), it copies that trait.

Another distinction concerns models that are analytically tractable and models that are not. The former are mathematical models that consist of sets of equations that can be solved to find specific answers (e.g. equilibria points). Our population-level model described above would fit this description. A big advantage of these models is that they can provide insight into the dynamics of a system for a wide range of parameters, or exact results for specific questions. However, this approach requires the studied dynamics to be rather simple. It would be more difficult (or perhaps impossible) to write and analytically solve the systems of equations necessary to describe the behaviours of the single individuals in the second model.

Often, when we want or need to describe the behaviour at the individual level - if, for example, individuals differ in their characteristics, exhibit learning or adaptation, or are embedded in social networks - trying to write a system of equations may not be the best strategy. Instead, we need to write code and let the computer program run. These are individual-based models (IBMs). These models are both individual-level (i.e. they specifies the characteristics of the individuals and some rules by which those individuals interact or change over time) and simulations (i.e. they are not solved analytically, but simulated through a computer program).

Simulations have greater flexibility than analytical models. Due to their structure they are often more intuitive to understand, especially for people with little training in mathematics. However, it is also important to be aware of their downsides. For example, generalisations are often not possible and statements only hold for parameters (or sets thereof) that have been simulated. Another potential downside is that the high flexibility of simulations can quickly lead to models that are too complex, and it can be hard to understand what is happening inside the model. That’s why, hopefully, our IBMs are simple enough to understand, and provide a gateway into cultural evolution modelling.

How to use this book - the programming

All of the code in this book is written in R. Originally R had a strong focus on statistical data analysis. Its growing user-base has turned R into a more general-purpose programming language. While R is used less often for modelling, it is widely taught in many university departments and is the subject of lots of online tutorials and support forums. It is quite likely that many readers already have some experience in R for data analysis and visualisation which can be used also for IBMs, more easily than learning another programming language. Also, if your IBMs run in R, you can use the same language to analyse the output and plot the results.

All the code for running the simulation is included in the book, and commented, often line by line. As a reader, you can therefore read the online book and, alongside, run the code. For convenience, all the code can be found in the online version of this book at here. Of course, you can just read the book, but running the code as you go will give you more direct experience of how the code executes and will allow you to play around with parameters and commands. The best way of learning – especially modelling! – is to try it out yourself.

We assume that the reader has basic knowledge of R (and RStudio, which provides a powerful user-interface for R), including installing it, setting it up, updating it, installing packages and running code. We strived to proceed from very simple to more complex code in a gradual way and to explain all the non-obvious newly introduced programming techniques, but a basic knowledge of R as a programming language, e.g. the use of variables, data frames, functions, subsetting and loops, will greatly facilitate the reading.

In the book, we use the tidyverse package. In particular, we use the tidyverse-typical data structures (tibbles rather than data frames) and the ggplot graphic system (rather than the base R plot function). These are user-friendly and widely used, and they will make it easier to manipulate data and create professional-looking visualisations. The tidyverse, however, has not been created with IBMs in mind. We have therefore not religiously stuck to tidyverse, and we also use functions, data structures, and programming styles that go beyond the tidyverse (in Chapter 7, for example, we show how matrices are more effective than tibbles in computationally-heavy simulations).

Aside from the tidyverse package, we have limited the number of additional packages needed to run the simulations whereever possible. The few packages needed to compile some of the code are explicitly introduced in the book when needed.

How to use this book - the simulations

The book is intended — as the title suggests — as a step-by-step guide. If you are interested in modelling cultural evolution, or in modelling in general, and you do not have previous experience, you should go through the simulations we describe chapter by chapter. The chapters build in complexity both from the programming and from the conceptual point of view. Alternatively, if you are interested in specific models — and you feel confident in your programming skills — feel free to go straight to the relevant chapter.

We organise the book in the following way. We start by presenting IBM versions of some of the now-classic mathematical and population-level models described in the foundational cultural evolution books, such as Robert Boyd and Peter Richerson’s Culture and the Evolutionary Process and Luigi-Luca Cavalli-Sforza and Marc Feldman’s Cultural Transmission and Evolution. The models do not add conceptually to the original analytical treatments. However, they show how to use them to develop IBMs, and they provide several geneal tools to build models that describe cultural evolution. Some of the subsequent chapters develop aspects that are possible only with IBMs, for example, simulating cultural dynamics with many different traits (Chapter 7).

We then move to what we call ‘Advanced topics.’ These chapters deal with more recent work in cultural evolution and include different perspectives, or they concern analyses that are not customary in cultural evolution modelling (for example, network analysis in Chapter 14).

The book does not present new models, views or findings on cultural evolution. Instead, we are trying to provide some up-to-date possibilities that IBM can offer cultural evolutionists. If you—while reading this book—are struck by an idea for a new model or an adaptation of one that we present here, we have succeeded in our mission.

Conventions and formatting

In general, we follow the tidyverse style guide for naming functions and variables, and code formatting.

Names of functions and variables use underscores to separate words and lowercase letters, e.g. previous_population, biased_mutation. If in the same chapter we have more than one function for the same model (for example because we gradually add parameters), they are numbered as unbiased_transmission_1(), unbiased_transmission_2(), etc.

For the text, we use the following style conventions:

  • names of functions and data structures are in fixed-width font, e.g., unbiased_transmission(), population, output

  • technical terms are in quotes, e.g., ‘geoms,’ ‘chr’

  • names of variables are in italics, e.g., \(p\), \(generation\)

Further reading

For some recent general books on cultural evolution, you can check Mesoudi (2011), Morin (2015), Henrich (2016), Laland (2017), and Acerbi (2019).

Seminal books are by Cavalli-Sforza and Feldman (1981) and Boyd and Richerson (1985).

For more on the virtues of formal models for social scientists, with a cultural evolution perspective, see Smaldino (2017). Smaldino (2020) is dedicated to good practices to translate verbal theories into formal, especially individual-based, models.

A good introduction to R programming is Grolemund (2014). Another general introduction, with a specific focus on the tidyverse logic, is Wickham and Grolemund (2017).