Chapter 11 Social learning of social learning rules

In most of the models we explored so far, individuals decide whether or not to copy according to various rules. These rules, or heuristics, are often referred to as ‘transmission biases’ in the cultural evolution jargon. Individuals might have a bias towards copying common traits, a specific subset of the population, or certain cultural traits with respect to others by virtue of their intrinsic characteristics, and so on.

A feature of all these models is that these rules are assumed to be stable over time, or are changing much more slowly when compared to the individual learning events that we model so that we effectively treated them as fixed. However, learning biases, just like cultural traits, can be affected by cultural evolution. When, what, or from whom we learn can and does evolve. This is far from being a rare instance: parents, at least in modern western societies, invest much effort to transmit to children that learning from schoolteachers is important, or teenagers groups may discourage learning from other groups, or from adults in general. Educational systems in countries such as Korea or Japan are thought to encourage pupils to learn and trust teachers almost unconditionally, whereas, in countries like UK and USA, the emphasis is on individual creativity and critical thinking.

11.1 Openness and conservatism

How can we approach the social learning of social learning rules with simple models? To start with, we can imagine that individuals learn from others whether they should copy others or not. We can imagine the simplest possible dynamic, where a single trait, P, both regulate the probability to copy from others and is the trait that is actually copied. When an individual has $P=1$ always copies others (we will call it a completely ‘open’ individual), and when it has $P=0$ never copies others (we will call it a completely ‘conservative’ individual). All intermediate values of P are possible.

Let us initialise a population of $N=1,000$ individuals, each with a value for $P$, wich we pick at random from a uniform distribution:

library(tidyverse)
N <- 1000
population <- tibble(P = runif(N))

Now, let us set up our function to run the simulations:

openness_conservatism <- function(N, t_max, r_max) {
  output <- tibble(generation = rep(1:t_max, r_max), 
                   p = as.numeric(rep(NA, t_max * r_max)), 
                   run = as.factor(rep(1:r_max, each = t_max)))
  for (r in 1:r_max) {
    # Create first generation
    population <- tibble(P = runif(N))
    
    # Add first generation's p for run r
    output[output$generation == 1 & output$run == r, ]$p <- 
      sum(population$P) / N 
    
    for (t in 2:t_max) {
      # Copy individuals to previous_population tibble
      previous_population <- population 
      
      # Choose demonstrators at random
      demonstrators <- tibble(P = sample(previous_population$P, N, replace = TRUE)) 
      
      # Choose individuals that copy, according to their P
      copy <- previous_population$P > runif(N) 
      
      # Copy
      population[copy, ]$P <- demonstrators[copy, ]$P 
      
      # Get p and put it into output slot for this generation t and run r
      output[output$generation == t & output$run == r, ]$p <- 
        sum(population$P) / N 
    }
  }
  # Export data from function
  output 
}

Everything should be familiar with this function. The only new instruction is the line copy <- previous_population$P > runif(N). We use this line to determine which individual will or will not copy. To do this, we compare each individual’s value of P with a number randomly drawn from a uniform distribution $U(0,1)$. If the P value is higher, the individual will copy, otherwise, it will not.

To see that this works, think of the extreme cases. If $P=1$ all numbers we draw from the uniform distribution will be smaller and so the comparison would always return a TRUE and the individual copies. In the reverse case where $P=0$, no value drawn from the uniform distribution will be smaller and so we will always receive a FALSE and the individual never learns.

We can now run the simulation, and plot it with the plot_multiple_runs_p() function for continuous traits we wrote in the previous chapter.

plot_multiple_runs_p <- function(data_model) {
  ggplot(data = data_model, aes(y = p, x = generation)) +
    geom_line(aes(colour = run)) +
    stat_summary(fun = mean, geom = "line", size = 1) +
    ylim(c(0, 1)) +
    theme_bw() +
    labs(y = "p (average value of P)")
}

data_model <- openness_conservatism(N = 1000, t_max = 50, r_max = 5)
plot_multiple_runs_p(data_model)

Figure 11.1: After few generations, the popualtion is composed by conservative individuals.

The average value of P in the population quickly converges towards 0 (in fact, towards the lower initial value, as there are no mutations) in all runs. At this point of the book, you should be able to introduce mutations, as well as initialise the population with different values of P. What would happen, for example, if individuals start with values of P close to 1, that is, they are all initially very open? Another possible modification is that, instead of comparing the copier’s P value with a random number, when two individuals are paired, the individual with the higher P (that is, the most open of the two) copies the other one.

At the risk of ruining the surprise, the main result of populations converging towards maximum conservatism is robust to many modifications (but you should try your own, this is what models are about). The result seems, at first glance, counterintuitive: the outcome of social transmission is to eliminate social transmission! A way to understand this result is that conservative individuals, exactly because they are conservative, change less than open individuals and, in general, transitions from open to conservative happen more frequently than transitions from conservative to open. Imagine a room where people are all copying the t-shirt colours of each other, but one stubborn individual, with a red t-shirt, never changes. If there are no other forces acting, at some point all individuals will wear red t-shirts.

11.2 Maintaining open populations

The previous results highlight a potentially interesting aspect of what could happen when social learning rules are themselves subject to social learning, but it does not represent, of course, what happens in reality. Some models, such as the Rogers’ model we explored in [chapter 8][Rogers’ model], are useful exactly because they force us to think how reality differs from the modelled situation. Individuals, in real life, remain open because learning from others is, on average, effective, and increases their fitness.

However, even without considering the possible fitness advantages of copying from others, there may be other reasons why individuals remain open to cultural influences. We can add a bit of complexity to the previous model and see what happens. For example, instead of having a single P value, individuals can be “open” or “conservative” depending on the specific cultural trait they observe. One can be open to try exotic recipes, while another may like only its local cuisine. We can say that, instead of a single P, we have many preferences associated to cultural traits and, as before, they can be transmitted from one individual to another. Second, we decide to copy other individuals depending on our preferences for the traits they show us.

Finally, differently from the models, we explored in the previous chapters, individuals in the population are replaced through a birth/death process. They are born without cultural traits, and they acquire them during the course of their life, by copying them from others, or by introducing them through innovation. The new function openness_conservatims_2() does all the above.

openness_conservatism_2 <- function(N, M, mu, p_death, t_max, r_max){
  output <- tibble(generation = rep(1:t_max, r_max), 
                   p = as.numeric(rep(NA, t_max * r_max)), 
                   m = as.numeric(rep(NA, t_max * r_max)), 
                   run = as.factor(rep(1:r_max, each = t_max)))
  
  for (r in 1:r_max) {
    
    # Initialise population
    population_preferences <- matrix( runif(M * N), ncol = M, nrow = N)
    population_traits <- matrix(0, ncol = M, nrow = N)
    
    # Write first output
    output[output$generation == 1 & output$run == r, ]$p <- mean(population_preferences)
    output[output$generation == 1 & output$run == r, ]$m <- sum(population_traits) / N  
    
    for(t in 2:t_max){
      # Innovations
      innovators <- sample(c(TRUE, FALSE), N, prob = c(mu, 1 - mu), replace = TRUE) 
      innovations <- sample(1:M, sum(innovators), replace = TRUE)
      population_traits[cbind(which(innovators == TRUE), innovations)] <- 1
      
      # Copying
      previous_population_preferences <- population_preferences
      previous_population_traits <- population_traits
      
      demonstrators <- sample(1:N, replace = TRUE)
      demonstrators_traits <- sample(1:M, N, replace = TRUE)

      copy <- previous_population_traits[cbind(demonstrators,demonstrators_traits)] == 1 & 
        previous_population_preferences[cbind(1:N, demonstrators_traits)] > runif(N)
      
      population_traits[cbind(which(copy), demonstrators_traits[copy])] <- 1
      
      population_preferences[cbind(which(copy), demonstrators_traits[copy])] <- 
        previous_population_preferences[cbind(demonstrators[copy], demonstrators_traits[copy])] 
      
      # Birth/death
      replace <- sample(c(TRUE, FALSE), N, prob = c(p_death, 1 - p_death), replace = TRUE)
      population_traits[replace, ] <- 0
      population_preferences[replace, ] <- runif(M * sum(replace))
      
      # Write output
      output[output$generation == t & output$run == r, ]$p <- mean(population_preferences)
      output[output$generation == t & output$run == r, ]$m <- sum(population_traits) / N    
    }
  }
  # Export data from function
  output
}

The population is now described by two matrices, population_preferences and population_traits. We initialise the former again with random numbers between 0 and 1 and fill the latter with 0s only, i.e. at the beginning, there are no traits in the population. A parameter of the simulation, M, gives the maximum possible number of traits. At each time step, a proportion $\mu$ of innovators introduce a trait at random.

The main novelties of the code are in the copying procedure. After storing a copy of the P values and traits from the previous round, we select demonstrators at random (they can also be demonstrators more than once, which is why we set the replace=TRUE in the sample() function. Then we select a random trait for each demonstration. If the demonstrator actually possesses the trait (previous_population_traits[cbind(demonstrators,demonstrators_traits)]==1) and the preference of the observer for that trait is sufficiently high (previous_population_preferences[cbind(1:N, demonstrators_traits)] > runif(N)), then this evaluation will be TRUE. We store this in a new variable, copy, which we will use to identify which of the $N$ individuals will copy the demonstrator. An observer will always copy both the trait and the preference of the demonstrator.

We can start with a situation similar to the previous model, with only a single trait ($M=1$). We set a relatively high innovation rate ($\mu=0.1$) so that the initial population is quickly populated by cultural traits, and $p_\text{death}=0.01$, meaning that, with a population of 100 individuals, every time step there will be on average one newborn. (As usual, we invite you to explore the effect of these parameters.)

data_model <- openness_conservatism_2(N = 1000, M = 1, mu = 0.1, 
                                      p_death = 0.01, t_max = 50, r_max = 5)
plot_multiple_runs_p(data_model)

Simlarly to the previous model, the popualtion converges to conservatism, even if the descent is less steep as individuals need some time to acquire traits.

Figure 11.2: Simlarly to the previous model, the popualtion converges to conservatism, even if the descent is less steep as individuals need some time to acquire traits.

The plot is fairly similar to what we saw before. The average openness of the population converges towards lower values within a few generations, in all runs. The descent is less steep since at the beginning of the simulations individuals need to acquire cultural traits to kick start social transmission. We can now try with a higher number of possible traits, for example, $M=10$.

data_model <- openness_conservatism_2(N = 1000, M = 10, mu = 0.1, 
                                      p_death = 0.01, t_max = 50, r_max = 5)
plot_multiple_runs_p(data_model)

Figure 11.3: With 10 possible traits, convergence to conservatism is slower.

Now the convergence appears slower. We can try with longer simulations, fixing $t_\text{max}=1000$.

data_model <- openness_conservatism_2(N = 1000, M = 10, mu = 0.1, 
                                      p_death = 0.01, t_max = 1000, r_max = 5)
plot_multiple_runs_p(data_model)

Figure 11.4: Even after 1,000 generations, with 10 possible traits, individuals are not completely conservative.

Even after $1000$ generations, population openness did not go to $0$, but it stabilises at about $0.12$. To understand what happens it is interesting to plot the other value we are recording in the output of the simulation, that is the average number of traits that individuals possess. The function below is equivalent to the usual plot_multiple_runs(), but with a different y-axis label, and it takes $M$ (the maximum number of traits) as a parameter so that we can set the y-axis to span from $0$ to $M$, to have a better visual estimate of the proportion of traits present with respect to the maximum possible.

plot_multiple_runs_m <- function(data_model, M) {
  ggplot(data = data_model, aes(y = m, x = generation)) +
    geom_line(aes(colour = run)) +
    stat_summary(fun = mean, geom = "line", size = 1) +
    ylim(c(0, M)) +
    theme_bw() +
    labs(y = "m (average number of traits)")
}

plot_multiple_runs_m(data_model, M = 10)

Figure 11.5: Individuals, on average, do not acquire all the possible traits during the lifetime.

On average, individuals do not have all $10$ possible traits. Remember that individuals are replaced with a birth/death process, and they are born with no cultural traits so that they need time to acquire them. Let’s try now with a bigger possible cultural repertoire, say $M=50$, and plot the average openness as well as the average number of traits.

data_model <- openness_conservatism_2(N = 1000, M = 50, mu = 0.1, p_death = 0.01, t_max = 1000, r_max = 5)
plot_multiple_runs_p(data_model)

Figure 11.6: Individuals, on average, acquire less then half of the available traits, when there are 50 possible traits.

plot_multiple_runs_m(data_model, M = 50)

Figure 11.7: Individuals, on average, acquire less then half of the available traits, when there are 50 possible traits.

This time the average openness stabilises to an even higher value (around $0.4$), and the number of cultural traits is below $20$, less than half of all possible traits.

We can explicitly visualise the relationship between $M$ and population openness after $1000$ generations for a few representative values of $M$. We consider only a single run for each condition as, from the previous results, we know that different runs give very similar results.

test_openness <- tibble(M = c(1,5,10,20,50,100), p = as.numeric(rep(NA, 6)))
for(condition in test_openness$M){
  data_model <- openness_conservatism_2(N = 1000, M = condition, mu = 0.1, 
                                        p_death = 0.01, t_max = 1000, r_max = 1)
  test_openness[test_openness$M == condition, ]$p <- 
    data_model[data_model$generation == 1000, ]$p
}
ggplot(data = test_openness, aes(x = M, y = p)) +
  geom_line(linetype = "dashed") +
  geom_point() +
  theme_bw() +
  labs(x = "Maximum possible number of traits", y = "p (final average value of p)")

Relationhsip between the number of possible cultural traits and the average openness of the population: when there are more traits possible to acquire, populations remain more open.

Figure 11.8: Relationhsip between the number of possible cultural traits and the average openness of the population: when there are more traits possible to acquire, populations remain more open.

The more cultural traits that can be acquired, the more individuals remain open. Why is that the case? As we saw before, a conservative individual will be able to spread its traits because they are more stable (remember the red t-shirt example). On the other hand, to be copied, an individual needs to showcase its traits. As the traits are chosen at random, it is better for an individual - from the point of view of its cultural success - to have many traits. These two requirements are in conflict: to acquire many traits an individual needs to remain relatively open. For this reason, when the cultural repertoire is big, individuals will remain open longer.

You can easily check by yourself that decreasing $p_\text{death}$ has a similar effect of decreasing $M$. Individuals living longer will generate more conservative populations. With a bit of work to the code, the same effect can be produced if individuals can learn faster. You can add a parameter to the model that tells how many traits an observer copies from the demonstrator at each interaction (in the case above is as this parameter would have been fixed to 1). The more effective is cultural transmission, the more conservative the population. It all depends on whether individuals have time to optimise both openness and conservatism: big repertories, short lifespans, and ineffective cultural transmission all maintain relatively open populations.

11.3 Summary of the model

In this chapter, we explored the idea that we can learn from others not only beliefs and skills but also the rules that govern how and when we learn from others. The models we presented just scratch the surface of what the consequences of the ‘social learning of social learning rules’ could be, and we invite you to explore other possibilities. Still, the models provide some interesting insights: successful cultural models need to integrate openness (to acquire cultural traits liked by others) and, at the same time, conservativeness (to remain stable and repeatedly show the same traits to copy). This also suggests that successful cultural traits should not only be liked by many, but they also should promote conservativeness, as we defined it here, in their bearers. After all, the first commandment in the Abrahamic religions is ‘Thou shalt have no other gods before me’ rather than ’Check out the other gods, and you’ll see I am the better one! Regardless of the particular results, however, these models mostly highlight how unexpected cultural dynamics can emerge from systems in which the rules governing social learning are not fixed, but they are themselves subject of cultural evolution.

11.4 Further readings

The models above are based on the models described in Ghirlanda, Enquist, and Nakamaru (2006) and Acerbi, Enquist, and Ghirlanda (2009). Acerbi, Enquist, and Ghirlanda (2009) investigates possible variants of the main model of this chapter, such as continuous traits, innovations possible for preferences too, and various degrees of effectiveness of cultural transmission. It also explores how the basic dynamics affect individual characteristics (young individuals are more open than older individuals, older individuals are more effective cultural models than younger individuals, and so on). Acerbi, Ghirlanda, and Enquist (2014) summarises these models and provides a more general perspective on the ‘social learning of social learning rules’ topics, including other simulated scenarios. Mesoudi et al. (2016) is a review of the individual and cultural variation in social learning, pointing to various references, including empirical evidence of cultural variation in social learning in humans.