Evolutionary Game Theory

Part Of: Game Theory sequence
Content Summary: 1300 words, 13 min read

Prisoner’s Dilemma Review

The classical Prisoner’s Dilemma has following setup:

Two prisoners A and B are interrogated, and separated asked about one another.

  • If both prisoners betray the other, each of them serves 2 years in prison
  • If A betrays B but B remains silent, A will be set free and B will serve 3 years (and vice versa)
  • If both prisoners remain silent, they will only serve 1 year in prison.

We can express the decision structure graphically:

IPD- Prisoner's Dilemma Overview

We can also represent the penalty structure. In what follows, arrows represent preference. CC → DC is true because, given that B cooperates, A would prefer the DC outcome (0 years in prison) more than CC (1 year).

IPD- Prisoner's Dilemma Regret

Our takeaways from our exploration of the Prisoner’s Dilemma:

  • An outcome is strategic dominance happens when one choice outperforms other choices, irrespective of competitor behavior. Here, DD is strategically dominant.
  • Pareto improvement is a way to improving at least one person’s outcome without harming any other player. Here, DD → CC represents such an improvement.
  • Pareto optimal outcomes are those outcomes which cannot be Pareto-improved.

The Prisoner’s Dilemma shows us that strategically-dominant outcomes need not be Pareto optimal.  Although each arrow points towards the origin for that color, the sum of all arrows points away from the origin.

It packages together the tragedy of the commons, a profound and uncomfortable fact of social living. A person can be incentivized towards an outcome that she, and everybody else, dislikes.

Towards Iterated Prisoner’s Dilemma (IPD)

In the one-off game, mutual defection is the only (economically) rational move. If a person chooses to defect, they will likely receive a bad result.

But consider morwhat happens in a more social setting, where players compete for resources multiple times. An Iterated Prisoner’s Dilemma (IPD) has the following structure:


What strategy is best? Let’s consider two kinds of strategies we might adopt. We can imagine some vindictive prisoners always defecting (AD). Other prisoner’s might be more generous, adopting a Tit-for-Tat (TfT) strategy. This has them initially cooperating, and mirroring their opponent’s previous move.

Let’s imagine that there are 200 “prisoners” playing this game, with each strategy adopted by half of the population. Which strategy should you adopt, in such a scenario?

The games look as follows:

  • AD vs AD: { DD, DD, DD,  … }. After 10 rounds: A has 20 years, B has 20 years.
  • AD vs TfT: { CD, DD, DD,  … }. After 10 rounds: A has 18 years, B has 21 years.
  • TfT vs TfT: { CC, CC, CC, … }. After 10 rounds: A has 10 years, B has 10 years.

These computations can be generalized to n rounds:

IPD- Always Defect vs TfT

The tit-for-tat (TfT) strategy wins because TfT-TfT games are collaborative, but these players also aren’t effectively exploited by players who Always Defect (AD).

Which Kinds of Strategies Are Best?

There is an very large number of possible IPD strategies. Strategy design might include considerations such as:

  • Deterministic vs Mixed. Should we follow logical rules, or employ randomness?
  • Impersonal vs Personal. Do we remember the behavior of each opponent? Do we change strategies given what we know of other players?
  • Fixed vs Adaptive. Should we use our experiences to change the above on-the-fly?

Given this behavioral diversity, which kinds of strategy are most successful?

To answer this question, in 1980 Robert Axelrod conducted a famous experiment. He invited hundreds of scholars to enter an IPD tournament, submitting their agent’s decision algorithm digitally. In a computer simulation, every agent played every other agent 200 times. The agent with highest cumulative utility was declared the winner.

Many agent strategies employed quite complex, using hundreds of lines of code. The surprising result was that simple strategies, including Tit-for-Tat, often proved to be superior. Axelrod described three properties shared among successful strategies:

IPD- Characteristics of Winning Strategy

We can call such strategies instances of reciprocal altruism.

Moral and Emotional Implications

The theory of evolution has shown us that biological systems are the product of an optimization process known as natural selection. Only genes that improve reproductive success win over evolutionary time.

From this context, it has long seemed unclear how human beings (and other animals) came to express altruistic behavior.  W.D Hamilton’s notion of inclusive fitness explains why we behave generously to relatives. As J.B.S Haldane famously joked,

I would willingly die for two brothers or eight cousins.

Game theory explains our behavior towards non-relatives. Specifically,

IPD provides insight into moral cognition. It shows how our selfish genes might, purely for selfish reasons, come to promote behaviors that are (reciprocally) altruistic.

IPD similarly explains certain emotional processes. For example, I have posited elsewhere the existence of social intuition generators like Fairness. We can now explain why natural selection generated such “socially intelligent” mental modules.

Application: Vampire Bats

Instead of jail time, we can modify our outcome structure to be more relevant to biology.

IPD- Ecological Prisoner's Dilemma (1)

Thus, we can use game theory to interpret animals competing for resources. Consider, for example, behavior of the vampire bats.

Vampire bats feed on the blood of other mammals. Their energy budget is such that they can tolerate 2 days of food deprivation before starving to death.

On a given night, 8% of adult vampire bats will fail to find food on a given night. But when they do find food, it is often more than they need.

Of course, these animals have a genetic incentive to share blood within family. But you can also observe bats sharing their food with strangers.

How can selfish genes reward altruistic behavior? Because vampire bats are playing IPD:

  • CC (Reward). I get blood on my unlucky nights. I have to give blood on my lucky nights, which doesn’t cost me too much.
  • DC (Temptation). You save my life on my poor night. But I also don’t have to feed you on my good night.
  • CD (Sucker): I pay the cost of saving your life on my good night. But on my bad night I still may starve.
  • DD (Punishment) I don’t have to feed you on my good nights. But I run a real risk of starving on my poor nights.

Towards Evolutionary Game Theory

To show why altruistic bats are more successful? Yes; we need only invent evolutionary game theory (EGT). Recall how natural selection works:

Individuals with more biological fitness tend to leave more copies of their genes.

EGT simply adds this replicator logic to the Iterated Prisoner’s Dilemma (IPD). Players with higher final scores (most resources) leave more descendants in subsequent populations (image credit):


We saw previously that Tit-For-Tat players outperform those who Always Defect. In EGT, this fact establishes how a gene that promotes altruism successfully invaded the vampire bat gene pool:

IPD- EGT Stable Strategies (2)

Of course, iterated games don’t always have one winner. Consider the following food web (structurally similar to Rock-Paper-Scissors, of course).

Snake beats Fox. Fox beats Hawk. Hawk beats snake.

What if the size of the snake population starts out quite small? In that case, hawks and foxes predominate. Since hawks are prey to foxes, the size of the hawk population decreases. But this means the snakes have fewer natural predators.

The above traces the implications of one possible starting point. However, we can use EGT maths to model the entire dynamical system, as follows (image credit):

IPD- Food Web Rock Paper Scissors (1)

With this image, we can see that any starting point will eventually (after many generations), lead to a (⅓, ⅓, ⅓) divide of snakes, foxes, and hawks. This point is the locus of the “whirlpool”, it is also known as an attractor, or an evolutionarily stable state (ESS).


  • The Iterated Prisoner’s Dilemma (IPD) makes game theory more social, where many players compete for resources multiple times.
  • While one-off PD games favor selfish behavior, IPD can favor strategies that feature reciprocal altruism, such as Tit-for-Tat.
  • More generally, IPD strategies do best if they are nice, retaliating, and forgiving. This in turn explains how certain facets of our social and moral intuitions evolved.
  • Evolutionary Game Theory (EGT) extends IPD by adding replicator logic (more successful strategies are preferentially represented in future generations).
  • Evolutionary Stable States (ESS) encode dynamical attractors, which populations asymptotically approach.

Until next time.

An Introduction To Natural Selection

Part OfDemystifying Life sequence
Followup To: Population Genetics
Content Summary: 1400 words, 14 min read

How Natural Selection Works

Consider the following process:

  1. Organisms pass along traits to their offspring.
  2. Organisms vary. These random but small variations trickle through the generations.
  3. Occasionally, the offspring of some individual will vary in a way that gives them an advantage.
  4. On average, such individuals will survive and reproduce more successfully.

This is how favorable variations come to accumulate in populations.

Let’s plug in a concrete example. Consider a population of grizzly bears that has recently migrated to the Arctic.

  1. Occasionally, the offspring of some grizzly bear will have a fur color mutation that renders their fur white.
  2. This descendent will on average survive and reproduce more successfully.

Over time, we would expect increasing numbers of such bears to possess white fur.

Biological Fitness Is Height

The above process is straightforward enough, but it lacks a rigorous mathematical basis. In the 1940s, the Modern Evolutionary Synthesis enriched natural selection by connecting it to population genetics, and its metaphor of Gene-Space. Recall what we mean by such a landscape:

  • A Genotype Is A Location.
  • Organisms Are Unmoving Points
  • Birth Is Point Creation, Death Is Point Erasure
  • Genome Differences Are Distances

Onto this topography, we identified the following features:

  • A Species Is A Cluster Of Points
  • Species Are Vehicles
  • Genetic Drift is Random Travel.

In order to understand how natural selection enriches this metaphor, we must define “advantage”. Let biological fitness refer to how how many fertile offspring an individual organism leaves behind. An elephant with eight grandchildren is more fit than her neighbor with two grandchildren.

Every organism achieves one particular level of biological fitness. Fitness denotes how well-suited an organism is to its environment. Being a measure of organism-environment harmony, we can view fitness as defined for every genotype. Since we can define some number for every point in gene-space, we have license to introduce the following identification:

  • Biological Fitness Is Height

Here is one possible fitness landscape (image credit Bjørn Østman).

Natural Selection- Fitness Landscape (1)

We can imagine millions of alien worlds, each with its own fitness landscape. What is the contours of Earth’s?

Let me gesture at three facts of our fitness landscape, to be elaborated next time:

  • The total volume of fitness is constrained by the sun. This is hinted at by the ecological notion of carrying capacity.
  • Fitness volume can be forcibly taken from one area of the landscape to another. This is the meaning of predation.
  • Since most mutations are harmless, the landscape is flat in most directions. Most non-neutral mutations are negative, but some are positive (example).

Natural Selection As Mountain Climbing

A species is a cluster of points. Biological fitness is height. What happens when a species resides on a slope?

The organisms uphill will produce comparatively more copies of themselves than those downhill. Child points that would have been evenly distributed now move preferentially uphill. Child points continue appearing more frequently uphill. This is locomotion: a slithering, amoeba-like process of genotype improvement.


We have thus arrived at a new identification:

  • Natural Selection Is Uphill Locomotion

As you can see, natural selection explains how species gradually become better suited to their environment. It is a non-random process: genetic movement is in a single direction.

Consider: ancestral species of the camel family originated in the American Southwest millions of years ago, where they evolved a number of adaptations to wind-blown deserts and other unfavorable environments, including a  long neck and long legs. Numerous other special designs emerged in the course of time: double rows of protective eyelashes, hairy ear openings, the ability to close the nostrils, a keen sense of sight and smell, humps for storing fat, a protective coat of long and coarse hair (different from the soft undercoat known as “camel hair”), and remarkable abilities to take in water (up to 100 liters at a time) and do without it (up to 17 days).

Moles, on the other hand, evolved for burrowing in the earth in search of earthworms and other food sources inaccessible to most animals. A number of specialized adaptations evolved, but often in directions opposite to those of the camel: round bodies, short legs, a flat pointed head, broad claws on the forefeet for digging. In addition, most moles are blind and hard of hearing.

The mechanism behind these adaptations is selection, because each results in an increase in fitness, with one exception. Loss of sight and hearing in moles is not an example of natural selection, but of genetic drift: blindness wouldn’t confer any advantages underground, but arguably neither would eyesight.

Microbiologists in my audience might recognize a strong analogy with bacterial locomotion. Most bacteria have two modes of movement: directed movement (chemotaxis) when its chemical sensors detect food, and a random walk when no such signal is present. This corresponds with natural selection and genetic drift, respectively.

Consequences Of Optimization Algorithms

Computer scientists in my audience might note a strong analogy to gradient descent, a kind of algorithm. In fact, there is a precise sense in which natural selection is an optimization algorithm. In fact, computer scientists have used this insight to design powerful evolutionary algorithms that spawn not one program, but thousands of programs, rewarding those with a comparative advantage. Evolutionary algorithms have proven an extremely fertile discipline in problem spaces with high dimensionality. Consider, for example, recent advances in evolvable hardware:

As predicted, the principle of natural selection could successfully produce specialized circuits using a fraction of the resources a human would have required. And no one had the foggiest notion how it worked. Dr. Thompson peered inside his perfect offspring to gain insight into its methods, but what he found inside was baffling. The plucky chip was utilizing only thirty-seven of its one hundred logic gates, and most of them were arranged in a curious collection of feedback loops. Five individual logic cells were functionally disconnected from the rest— with no pathways that would allow them to influence the output— yet when the researcher disabled any one of them the chip lost its ability to discriminate the tones…

It seems that evolution had not merely selected the best code for the task, it had also advocated those programs which took advantage of the electromagnetic quirks of that specific microchip environment. The five separate logic cells were clearly crucial to the chip’s operation, but they were interacting with the main circuitry through some unorthodox method— most likely via the subtle magnetic fields that are created when electrons flow through circuitry, an effect known as magnetic flux. There was also evidence that the circuit was not relying solely on the transistors’ absolute ON and OFF positions like a typical chip; it was capitalizing upon analogue shades of gray along with the digital black and white.

In gradient descent, there is a distinction between global optima and local optima. Despite the existence of an objectively superior solution, the algorithm cannot get there due to its fixation with local ascent.

Natural Selection- Local vs. Global Optima

This distinction also features strongly in nature. Consider again our example of camels and moles:

Given such a stunning variety of specialized differences between the camel and the mole, it is curious that the structure of their necks remains basically the same. Surely the camel could do with more vertebrae and flex in foraging through the coarse and thorny plants that compose its standard fare, whereas moles could just as surely do with fewer vertebrae and less flex. What is almost as sure, however, is that there is substantial cost in restructuring the neck’s nerve network to conform to a greater or fewer number of vertebrae, particularly in rerouting spinal nerves which innervate different aspects of the body.

Here we see natural selection as a “tinkerer”; unable to completely throw away old solutions, but instead perpetually laboring to improve its current designs.


  • In the landscape of all possible genomes, we can encode comparative advantages as differences in height.
  • Well-adapted organisms are better at replicating their genes (in other words, none of your ancestors were childless).
  • Viewed in the lens of population genetics, natural selection becomes a kind of uphill locomotion.
  • When view computationally, natural selection reveals itself to be an optimization algorithm.
  • Natural solution can outmatch human intelligence, but it is also a “tinkerer”; unable to start from scratch.