Links (Dec 2021)

Part Of: Links sequence

Current Events

  1. Demographics on overall trends and drug use among the American homeless. Argument that 80% of homelessness is driven by the housing shortage. 
  2. Claims about geopolitics of the war in Ethiopia
  3. In nuclear war simulations, most people choose the escalation option
  4. Reasons to expect a Ukraine invasion in 2022. The associated Metaculus.
  5. Which demographic groups are driving the rise in Chinese religiosity? Wealthy people who feel mistreated by the government.

Psychotherapy and counseling are becoming increasingly popular in urban China, especially among those feeling marginalized and excluded from kinship networks. This raises the notion that religion and psychotherapy may be fulfilling similar roles in Chinese society… Findings [about political unfairness index] are consistent with the hypothesis that faith in religion and faith in government are substitutes in times of crisis, disorder, and lack of control.


  1. How AlphaZero learns chess
  2. An unsuccessful startup muses on why AI products for semantic search of the literature struggle to find business value. 
  3. An overview of Self-Assembling AI. 
  4. The ongoing consolidation of AI. Ten years ago we used LSTMs and CNNs for natural language and computer vision. Today, everything is Transformers.
  5. A review of generative models of brain dynamics.
  6. Introducing shardware. “Traditionally software people looked upon hardware people with extraordinary disdain. Now, there’s no longer a crisp boundary between hardware and software, but rather just another translation layer in the stack.”


  1. The placebo effect in the United States has been growing much stronger over time. Drugs that once would have been approved may not be now – because their performance relative to that of placebo is less convincing. May explain why the truth wears off
  1. Helpful viz for the global burden of disease. 
  2. Biological age can be measured independently of chronological age. Tests for the former are becoming more economically feasible, from $200 to $2 per test. This should accelerate progress in gerontology.
  3. Persistent organic pollutants (POPs) can infiltrate the blood-brain barrier. 
  4. Aging can be accelerated in real life, not just M Night Shyamalan movies. Cancer survivors who underwent radiation therapy, compared to survivors who didn’t, literally live 21 years less. “Approximately one-third of childhood cancer survivors aged 35 years have a disease phenotype of a person aged 65 years.”
  5. You can only hope to lose 5-10% of your body weight with diet & exercise. Even then, there’s an 85% chance your weight loss attempt will fail. Why? Your hypothalamus has decided that your heaviest weight is your correct weight. And so, for every pound lost, your metabolism decreases 11 calories, and appetite increases 45 calories above baseline. The millstone just gets too heavy. (From here, orange is weight loss maintainer, blue is weight loss regressor)


  1. Merck’s molnupiravir may accelerate viral evolution.
  2. Omicron incubation period appears to be three days. May help explain the earlier than expected peak in South Africa.
  3. Covid recently leaked from a lab in Taiwan. Speaks to the plausibility of the lab leak hypothesis.
  4. An early estimate for Omicron vaccine efficacy.
  1. T-Cell vaccines may provide longer immunity. “The Emergex shot will not be available until 2025 at the earliest, the usual timeframe for vaccine development. Last year Covid vaccines were developed within months as the regulatory process was speeded up, but the emergency has passed…”


  1. FDA approved eye drops may replace reading glasses for millions. Effect lasts 6-8 hours. 
  2. Xenobots can arguably now reproduce


  1. Here is a 40s video showing patients trying to remember pictures they were shown 5min previously. The buzzes in the audio denote when the electrodes detect memory-laden ripples. 


  1. Rampaging monkeys kill 250 dogs in India in revenge massacre.
  2. The gender-equality paradox is the (disputed) idea that countries with more gender equality have fewer women in STEM careers. This analysis suggests the paradox is robust to different operationalizations of gender equality, but mysteriously evaporates on different operationalizations of STEM participation. 
  3. Golden rice, the genetically modified superfood that almost saved millions
  4. Dead fish can still swim upstream

The Theta-Gamma Neural Code

Part Of: Neural Oscillators sequence
Related To: Habit as Action Chunking
Content Summary: 1800 words, 9 min read

Overview of Spatial Navigation

The hippocampus is centrally important for spatial navigation. Neuroscientists have discovered specialized cell types in the medial entorhinal cortex (MEC), including border cells, head-direction cells, and grid cells. 

Grids are actually organized hierarchically, with grids growing in size along the dorsal-ventral axis within MEC. But grid size does not increase continuously. Rather, there are precisely four interlocked grids (Stensola et al 2012). The ratio between successive grid distances is consistently 1.42, which may be optimally efficient (Mathis et al 2012). 
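As a quick arithmetic sketch of that hierarchy: if each grid is 1.42 times larger than the one before, four grids span just under a threefold range of distances. The 40 cm starting value below is a made-up placeholder (the papers' absolute values are not quoted here); only the ratio and the count of four come from the text.

```python
# Hypothetical smallest grid distance in cm; an assumption for illustration only.
BASE_SPACING_CM = 40.0
SCALE_RATIO = 1.42   # ratio between successive grids (from the text)
N_MODULES = 4        # number of discrete grids (from the text)

def module_spacings(base=BASE_SPACING_CM, ratio=SCALE_RATIO, n=N_MODULES):
    """Return the grid distance of each module, smallest first."""
    return [base * ratio ** k for k in range(n)]

spacings = module_spacings()
for k, s in enumerate(spacings):
    print(f"module {k}: grid distance ~ {s:.1f} cm")

# Ratio of largest to smallest grid, and the per-step growth in area.
total_range = spacings[-1] / spacings[0]
area_ratio = SCALE_RATIO ** 2
```

Since 1.42 squared is about 2.02, each step up the hierarchy roughly doubles the area covered by one grid period.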

MEC navigational cells allow for the complex behavior of path integration. An ant leaving its nest will search for food by exploring the environment; once it decides to return, it integrates all of its various exploratory movements (each vector is reconstructed from direction, speed, and duration cells) into a single, precise return trajectory. If you abruptly move the ant a couple meters before it returns, it will follow the same return vector, and exhibit confused-search behavior when it cannot find the nest. 
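Path integration is, at bottom, vector addition. Here is a minimal sketch, assuming each exploratory movement can be summarized as a (heading, speed, duration) triple; the function names and the example trip are hypothetical, not taken from the ant literature.

```python
import math

def integrate_path(segments):
    """Sum movement segments into a net displacement (x, y) from the start.

    Each segment is a (heading_radians, speed, duration) triple.
    """
    x = y = 0.0
    for heading, speed, duration in segments:
        distance = speed * duration
        x += distance * math.cos(heading)
        y += distance * math.sin(heading)
    return x, y

def home_vector(segments):
    """Return (heading_radians, distance) pointing from the current position back to the start."""
    x, y = integrate_path(segments)
    return math.atan2(-y, -x), math.hypot(x, y)

# Hypothetical outbound trip at 1 m/s: east for 2 s, north for 2 s, west for 1 s.
outbound = [(0.0, 1.0, 2.0), (math.pi / 2, 1.0, 2.0), (math.pi, 1.0, 1.0)]
heading, distance = home_vector(outbound)
print(f"return heading: {math.degrees(heading):.1f} deg, distance: {distance:.2f} m")
```

Note that the return vector depends only on the animal's own movements. If the ant is picked up and displaced, nothing in `segments` changes, so it walks the same vector and ends up offset from the nest by exactly the displacement.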

But the hippocampus also supports another, more objective approach: allocentric navigation. Essentially, landmarks are used to orient one’s environment.

Place cells also exhibit a reward gradient, perhaps stemming from interactions with the basal ganglia. This reward gradient is significant in reinforcement learning (RL) models.

These independent navigational systems rely on two different representations of the body: egocentric (me-centered) and allocentric (world-centered) perspectives. These representations can come apart in clinical out-of-body experiences. For example, clever uses of VR to disrupt egocentric processing lead to a reduction in activity in egocentric parts of the hippocampus (Bergouignan et al 2015).

Overview of Oscillation

Neural oscillations are a central organizational principle by which the brain coordinates different neural ensembles (Buzsaki 2010). In the following image (a), each row is a different neuron. With these multi-unit recordings, you’ll find that certain groups of neurons consistently fire together (gray ovals) in the trough of the theta cycle.  

There are at least ten different oscillators with unique cognitive functions.

Relevant to today’s post, we have,

  • Gamma oscillations (30-100 Hz, 10-30ms window) are ubiquitous across cortex and associated with perceptual binding (Engel & Singer 2001).
  • Theta oscillations (4-10 Hz, 100-250ms window) are not as global, but have been detected in hippocampus, MEC, medial prefrontal cortex and ventral striatum (Drieu & Zugaro 2019). 

Gamma oscillators can be phase locked with theta, with ~7 gamma cycles nested within each theta cycle. Theta and gamma exhibit phase-amplitude coupling (PAC), an example of the more general mechanism of cross-frequency coupling (CFC). This theta-gamma neural code is important for learning and memory (Lisman & Jensen 2013).
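To make phase-amplitude coupling concrete, here is a minimal sketch of one common way to quantify it: the length of the mean vector of amplitude-weighted phases. It assumes we already know the theta phase and the gamma amplitude at every sample, so the signals are built synthetically; with real recordings both would first have to be extracted from band-passed data (for instance with a Hilbert transform). All function names and parameter values are illustrative.

```python
import cmath
import math

def mean_vector_length(phases, amplitudes):
    """|mean(A * exp(i*phi))| divided by mean amplitude; near 0 means no coupling."""
    n = len(phases)
    z = sum(a * cmath.exp(1j * p) for p, a in zip(phases, amplitudes)) / n
    return abs(z) / (sum(amplitudes) / n)

def synthetic_theta_gamma(mod_depth, theta_hz=8.0, fs=1000, seconds=1.0):
    """Theta phase plus a gamma amplitude envelope that peaks at theta phase 0."""
    n = int(fs * seconds)
    phases = [(2 * math.pi * theta_hz * i / fs) % (2 * math.pi) for i in range(n)]
    amplitudes = [1.0 + mod_depth * math.cos(p) for p in phases]
    return phases, amplitudes

# Gamma amplitude locked to theta phase versus a flat gamma envelope.
coupled = mean_vector_length(*synthetic_theta_gamma(mod_depth=0.8))
uncoupled = mean_vector_length(*synthetic_theta_gamma(mod_depth=0.0))
print(f"coupled: {coupled:.3f}, uncoupled: {uncoupled:.3f}")
```

When gamma amplitude is the same at every theta phase, the weighted phases cancel and the measure falls to zero; the more gamma power is concentrated at one theta phase, the longer the mean vector.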

Theta Sequences and Phase Precession

In the brain, behavior is represented with at least two different timescales.

  1. During locomotion, place cells fire in behavioral sequences that change approximately every 1-2s (T). 
  2. The same trajectories are simultaneously played out in theta sequences every 100-200ms (tau).

As the rat moves, place cells fire at progressively earlier phases within the theta cycle. This phase precession was first noticed by O’Keefe & Recce (1993).

The ~7 gamma bursts within each theta cycle seem to encode a planning trajectory, or sweep, of the animal’s upcoming behaviors. In Wikenheiser & Redish (2015), rats could either stop at a feeder (upper left location) or continue moving through the track. Their theta sequences predicted their behavioral choice, well ahead of time. A similar result was found by Gupta et al (2012).

What happens when the rat is still making up its mind? Theta sweeps examine both trajectories within a single trial (Johnson & Redish 2007; Kay et al 2020).

Jezek et al (2011) used flickering to induce confusion about which of two environments a rat was located in. They found that theta sequences switched between the two environments, but within-sequence representations were consistent. This suggests that theta sequences may represent cognitive primitives.

The SPEAR model

We have detected sub-processes within theta sequences. Hasselmo et al (2002) noticed several physiological differences that occur at the peak versus the trough of the theta cycle. They proposed that the peak of the theta cycle is involved with the encoding of information, while the trough features a retrieval of information. These phases correspond to pattern separation and pattern completion, respectively. Notably, long term potentiation (LTP) was much stronger at theta peak. This is largely driven by acetylcholine, which is expressed more strongly during encoding than retrieval (Hasselmo 2006).

Belluscio et al (2012) found that slow gamma (30-50 Hz) and medium gamma (50-90 Hz) exclusively occur in the trough and peak of theta, respectively. Schomberg et al (2014) replicated this result. Wang et al (2020) note that beta oscillations occur only during theta trough.

Navas-Olive et al (2020) associated activity in superficial and deep CA1 pyramidal neurons with trough and peak of theta, respectively. Their lab has posited four sub-circuits which fire at distinctive locations in theta phase.

Phase precession often reveals prospective choices. But neuroscientists have long noted the existence of retrospective sweeps, theta sequences of locations behind the animal. Bieri et al (2014) have linked slow and fast gamma to prospective and retrospective sweeps, respectively. As we saw in the image above, prospective sweeps simulate possible futures, while retrospective sweeps encode the past (Kay et al 2020).

Here are the properties of the Separate Phases of Encoding And Retrieval (SPEAR) model.  

The SPEAR Model explains several behavioral results. 

Takahashi et al (2014) found that during a fixation cue, where a rat is forced to rely on spatial memory to make a movement decision, there is an abrupt shift to slow gamma emerging from CA3. This is consistent with its hypothesized function of retrieval. 

In novel environments, rats tend to move quickly and non-linearly. As an environment becomes familiar, movement slows down but becomes more linear. Kemere et al (2013) show that fast gamma is expressed at higher speeds, and slow gamma at lower speeds. This may reflect the process by which, as an animal familiarizes itself with an environment, its cognitive operations increasingly rely on memory retrieval. This result was replicated in Zheng et al (2016).

Relation to Working Memory

Long ago, Sternberg (1966) reported a linear relationship between recall latency and the number of memorized items, which suggested that the list was serially scanned at a rate of 20-30ms per memory item. Miller (1956) collated many different experiments showing a recall ceiling of about 7 items (but the limit becomes 4 items when rehearsal is removed; Cowan 2000). 

These constraints on working memory are suggestive: the 20-30ms scan time aligns with the gamma cycle, and there are about 7 gamma cycles within a theta sequence (Lisman & Idiart 1995). The ratio of theta to gamma may even correlate with WM span (Kaminski et al 2011), although this data is fairly uncertain.
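The arithmetic behind that alignment is simple enough to sketch. Assuming one memory item per gamma cycle, the number of items that fit in one theta cycle is just the ratio of the two frequencies. The specific frequencies below are illustrative picks from within the bands quoted earlier, not measured values.

```python
def period_ms(freq_hz):
    """Duration of one oscillation cycle in milliseconds."""
    return 1000.0 / freq_hz

def gamma_cycles_per_theta(theta_hz, gamma_hz):
    """How many gamma cycles fit inside one theta cycle (may be fractional)."""
    return gamma_hz / theta_hz

# A 40 Hz gamma cycle lasts 25 ms, inside the 20-30ms scan time per item.
gamma_hz = 40.0
print(f"gamma period: {period_ms(gamma_hz):.0f} ms")

# Illustrative theta frequencies: slower theta leaves room for more gamma cycles.
for theta_hz in (5.0, 6.0, 8.0):
    slots = gamma_cycles_per_theta(theta_hz, gamma_hz)
    print(f"theta {theta_hz:.0f} Hz -> {slots:.1f} gamma cycles per theta cycle")
```

Under these assumptions the count lands between 5 and 8, which brackets the familiar span of about 7 items.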

Theta power increases systematically with working memory load (Jensen & Tesche 2002). Theta does not occur uniformly across cortex during WM tasks, but is localized to specific sites (Raghavachari et al 2001).  Theta frequency decreases during periods of high load, consistent with more representations requiring longer theta sequences (Axmacher et al 2010; but see Moran et al 2010).

Heusser et al (2016) found that maximum gamma power for the memory items occurred at distinct locations in theta phase, but only when the subject was able to remember the sequence correctly. This was replicated in Reddy et al (2021). Theta also appears conducive to binding together sequences across different modalities. Clouter et al (2017) found that when audio and video clips were synchronized to theta oscillations (but not other frequency bands), recall accuracy substantially improved.

Roux & Uhlhaas (2014) review studies that explore neural oscillations during WM maintenance. They found that theta activity occurs preferentially in tasks that involve sequential items, whereas alpha oscillations tend to occur during tasks that require simultaneous maintenance of visuospatial information. They propose to identify theta-gamma binding with one of the subcomponents of working memory: the phonological loop.

Griffiths et al (2021) found that theta-gamma PAC (memory consolidation) arose after reductions in alpha/beta power (sequence perception), indicative of a two-stage retention process.

Until next time.


  1. Axmacher et al (2010). Cross-frequency coupling supports multi-item working memory in the human hippocampus
  2. Bieri et al (2014). Slow and fast gamma rhythms coordinate different spatial coding modes in hippocampal place cells
  3. Belluscio et al (2012). Cross-Frequency Phase–Phase Coupling between Theta and Gamma Oscillations in the Hippocampus
  4. Buzsaki (2010). Neural syntax: cell assemblies, synapsemblies, and readers
  5. Canolty et al (2006). High gamma power is phase-locked to theta oscillations in human neocortex
  6. Clouter et al (2017). Theta Phase Synchronization Is the Glue that Binds Human Associative Memory
  7. Colgin et al (2009). Frequency of gamma oscillations routes flow of information in the hippocampus
  8. Cowan (2000). The magical number 4 in short-term memory: A reconsideration of mental storage capacity
  9. de Almeida (2009). The Input-Output Transformation of the Hippocampal Granule Cells: From Grid Cells to Place Fields
  10. Drieu & Zugaro (2019). Hippocampal Sequences During Exploration: Mechanisms and Functions
  11. Engel & Singer (2001). Temporal binding and the neural correlates of sensory awareness. 
  12. Griffiths et al (2021). Disentangling neocortical alpha/beta and hippocampal theta/gamma oscillations in human episodic memory formation
  13. Gupta et al (2012). Segmentation of spatial experience by hippocampal theta sequences
  14. Hasselmo et al (2002). A proposed function for hippocampal theta rhythm: separate phases of encoding and retrieval enhance reversal of prior learning
  15. Hasselmo (2006). The Role of Acetylcholine in Learning and Memory
  16. Hasselmo & Stern (2013). Theta rhythm and the encoding and retrieval of space and time.
  17. Heusser et al (2016). Episodic sequence memory is supported by a theta–gamma phase code
  18. Jensen & Tesche (2002). Frontal theta activity in humans increases with working memory load in a working memory task.
  19. Jezek et al (2011). Theta-paced flickering between place-cell maps in the hippocampus
  20. Johnson & Redish (2007). Neural ensembles in CA3 transiently encode paths forward of the animal at a decision point
  21. Kaminski et al (2011). Short term memory capacity predicted by theta to gamma cycle length ratio.
  22. Kemere et al (2013). Rapid and continuous modulation of hippocampal network state during exploration of new places
  23. Lisman & Idiart (1995). Storage of 7+/-2 short-term memories in oscillatory subcycles
  24. Lisman & Jensen (2013). The theta-gamma neural code
  25. Mathis et al (2012). Optimal Population Codes for Space: Grid Cells Outperform Place Cells
  26. Miller (1956). The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information
  27. Moran et al (2010). Peak frequency in the theta and alpha bands correlates with human working memory capacity. 
  28. Navas-Olive et al (2020). Multimodal determinants of phase-locked dynamics across deep-superficial hippocampal sublayers during theta oscillations
  29. O’Keefe & Recce (1993). Phase relationship between hippocampal place units and the EEG theta rhythm
  30. Penttonen & Buzsaki (2003). Natural logarithmic relationship between brain oscillators
  31. Raghavachari et al (2001). Gating of human theta oscillations by a working memory task
  32. Reddy et al (2021). Theta-phase dependent neuronal coding during sequence learning in human single neurons
  33. Roux & Uhlhaas (2014). Working memory and neural oscillations: alpha–gamma versus theta–gamma codes for distinct WM information?
  34. Schomberg et al (2014). Theta phase segregation of input-specific gamma patterns in entorhinal-hippocampal networks
  35. Shirvalkar et al (2010). Bidirectional changes to hippocampal theta-gamma comodulation predict memory for recent spatial episodes
  36. Srinivasan (2015). Where paths meet and cross: navigation by path integration in the desert ant and the honeybee
  37. Stensola et al (2012). The entorhinal grid map is discretized
  38. Sternberg (1966). High-Speed Scanning in Human Memory
  39. Takahashi et al (2014). Theta phase shift in spike timing and modulation of gamma oscillation: a dynamic code for spatial alternation during fixation in rat hippocampal area CA1
  40. Valero & de la Prida (2018). The hippocampus in depth: a sublayer-specific perspective of entorhinal-hippocampal function.
  41. Wang et al (2020). Alternating sequences of future and past behavior encoded within hippocampal theta oscillations.
  42. Wikenheiser & Redish (2015). Hippocampal theta sequences reflect current goals
  43. Zheng et al (2015). The relationship between gamma frequency and running speed differs for slow and fast gamma rhythms in freely behaving rats
  44. Zheng et al (2016). Spatial Sequence Coding Differs during Slow and Fast Gamma Rhythms in the Hippocampus

Links (Nov 2021)


  1. Lithium theory of obesity epidemic (general overview, more on lithium grease, brine spills, bioaccumulation, well measurement data and city-level data).
  1. Dietary fat may contribute to lipostat ratcheting (leptin resistance).
  1. During the pandemic, childhood obesity has accelerated. Food purchases in the US are also spiking
  1. Death rates increase 10% during the winter. Is it because of the cold (weakened cardiovascular system), or the season (e.g., influenza doesn’t seem affected by temperature)? 
  1. What’s causing the sex recession? New data suggests a change in religious norms.


  1. Intramuscular injections can accidentally hit a vein, causing injection into the bloodstream. This could explain some of the rare adverse reactions to Covid-19 vaccine. Study shows solid link between intravenous mRNA vaccine and myocarditis (in mice). Needle aspiration is one way to prevent this from happening.
  1. Paxlovid, a bespoke protease inhibitor from Pfizer, shows 89% protection from hospitalization. It may be possible to combine it with molnupiravir, the new 50% protective transcriptase inhibitor from Merck.
  1. After you exclude the fraudulent and statistically illegitimate studies, several meta-analyses still show that Ivermectin can provide a small amount of protection. The effect is small compared to Paxlovid 89% or mRNA vaccines 95%, but non-zero. What’s going on? The strongyloid hypothesis.

Energy & Climate

  1. High-temperature superconductors are accelerating fusion R&D. The MIT startup is forecasting an energy-positive proof of concept as soon as 2025. 
  1. Vox: geothermal energy is poised for a big breakout. One issue with renewables (solar, wind, etc) is they have a low capacity factor: until large-scale energy storage becomes economical they need to be supplemented by high capacity energy sources. Nuclear and geothermal might qualify; main blockers are regulation and R&D, respectively. 
  1. What is France’s secret? Nuclear power.
  1. Interesting, bullish take on SpaceX. Its Starship program will likely reduce the transport cost-per-kilogram by two orders of magnitude. 

Basic Science

  1. Inspired by recent neutrino results, a new dark sector theory attempts to jointly explain dark matter, Hubble tension, baryon asymmetry, and cosmic thinness.
  1. Leading candidate axion theory (explanans: CP parity, dark matter) finds preliminary support
  1. The 3D organization of our genome.
  1. The brain senses and controls immune function, as shown by optogenetic manipulations in the insula.
  1. Lead exposure accounts for a loss of 23 million IQ points in a 6-year birth cohort of US children. See also, the lead crime hypothesis. See also, Did lead poisoning lead to the downfall of the Roman Empire? The 2021 Infrastructure Investment and Jobs Act does allocate $15b for lead abatement.

Current Events

  1. China appears to be expanding its nuclear silos from 20 to 250. And increasing its stockpile to 1000 warheads by the end of the decade. And developing maneuverable hypersonic missiles that can evade US missile defense systems (analysis here). 
  2. An interesting tweetstorm investigates the port backlog and discovers anti-stacking regulations as a key bottleneck. The Port of Long Beach rescinds the rule within one day. 


  1. Another 500 ancient Maya and Olmec complexes were discovered by Lidar in southern Mexico. 
  1. Why did the War on Terror happen? One theory points to powerful bureaucrats (“Vulcans”) gaining too much control over interagency dialogue. 
  1. A geographical way of visualizing class differences
  1. In a poll of economists, opportunity cost was found to be the most underrated. Worth applying its lessons to your everyday life!
  1. Biohacking, Level 100:


Links (Oct 2021)


1. Merck claims its COVID-19 treatment molnupiravir cuts risk of hospitalizations and death in half

2. Leaked grant proposal details high-risk coronavirus research.

Obesity Epidemic

3. The future of weight loss – new drug semaglutide shows comparable efficacy to bariatric surgery.

4. The contaminant theory of the obesity epidemic.

5. The torpor (seed oil) theory of the obesity epidemic


6. Better air quality is the easiest way not to die

7. Air filtration dramatically improves hospital outcomes.

8. Syphilis likely originated in Ohio around 0 BCE, and was ultimately brought back to Europe.


9. Can we attribute our problems to a monocausal event in 1970?

10. Did social media enable the rise of Trump?

11. Decline in GOP support for childhood vaccine mandates.

12. Update on Havana Syndrome: it is probably caused by “directed, pulsed microwaves”. 


13. Ancient footprints (22.5 kya) in New Mexico comprise more evidence against Clovis-first settlement theory.

14. Dairying played a major role in the rise of the Yamnaya.

15. Was Henrich wrong about his Marriage and Family Program theory of WEIRD psychology?


16. Proof assistants make the jump to big league math.

17. More on AlphaFold.

For decades, people have considered computational error to be the most likely source of error when a predicted structure and an experimental one don’t match, and quite rightly so. But now, if you have a big mismatch between the two, it is frankly more likely to be an experimental error, because the folding predictions are getting so solid. This is disorienting, to say the least.

Existential Risks

18. The Hanson Grabby Aliens model

19. Remnants from Theia (the collision that created the moon) forever interred in the deep Earth. Tree-like plumes emanating from these LLSVPs are likely responsible for Deccan Trap volcanism and the K-Pg extinction event. 

“The team estimates that, in tens of millions of years, a blob of nightmarishly gargantuan proportions will pinch off from the central cusp and rise to meet what is now South Africa’s foundations. This, said Sigloch, would produce cataclysmic eruptions. The Deccan Traps were caused by what we would think of as a solitary mantle plume. This future mega-blob, though, would be capable of producing volcanism so prolific and extensive that the Deccan Traps would be a firecracker in comparison.”

20. Mirror DNA has been successfully engineered.

“If mirror cells acquired the ability to photosynthesize, we’d be screwed. ‘I suspect that all hell would break loose,’ says Jim Kasting, a climate scientist at Penn State University and an expert on the global carbon cycle. All it would take would be a droplet of mirror cyanobacteria squirted into the ocean. After doing some rough calculations on the effects of a mirror cyanobacteria invasion, Jim Kasting isn’t sure which would kill us first—the global famine or the ice age.”


21. Like us, non-human primates also choke under pressure

22. The Appalachian mountain chain also extends into Europe. This is possible because plate tectonics separated this mountain range. The Appalachian Mountains are older than the Atlantic Ocean.

23. How a coastline 100 million years ago influences modern election results in Alabama.

24. The evolution of homeothermy.

But why 98.6 specifically? Well, Casadevall has an answer for that too. In humans, 98.6 is the optimal tradeoff between our metabolism and protection from fungal infections. 

But recall that global body temperature is declining; one explanation is provided by the torpor theory of obesity (Link 5 above).

Habit as Action Chunking

Part Of: Neuroeconomics sequence
Followup To: Basal Ganglia as Action Selector, Intro to Behaviorism
Content Summary: 2600 words, 13 min read

Towards a theory of habit

Life brims with habitual behavior.

All our life is but a mass of habits—practical, emotional, and intellectual—systematically organized for our weal or woe, and bearing us irresistibly toward our destiny, whatever the latter may be.

Ninety-nine hundredths or, possibly, nine hundred and ninety-nine thousandths of our activity is purely automatic and habitual, from our rising in the morning to our lying down each night. Our dressing and undressing, our eating and drinking, our greetings and partings, our hat-raisings and giving way for ladies to precede, nay, even most of the forms of our common speech, are things of a type so fixed by repetition as almost to be classed as reflex actions.

William James

Why do we find ourselves on autopilot so frequently? What happens in our brain when we switch from reflexive to reflective thought? Is there a way to objectively tell which mode your brain is in, right now?

Our brains betray the programs they employ by the errors we express.

When you flip on a light switch, your behavior could be a result of the desire for illumination coupled with the belief that a certain movement will lead to it. Sometimes, however, you just turn on the light habitually without anticipating the consequences – the very context of having arrived home in a dark room automatically triggers your reaching for the light switch. While these two cases may appear similar, they differ in the extent to which they are controlled by outcome expectancy. When the light switch is known to be broken, the habit might still persist whereas the goal-directed action might not.

Yin & Knowlton (2006)

At a conceptual level, we can differentiate three cognitive phenomena: stimulus, response, and outcome. Habitual behavior uses the environment to guide its responses (an S-R map); goal-directed behavior directly optimizes the R-O relation. Goal-directed behavior occurs immediately. Habit emerges with overtraining.

In both behavioral modes, reward is used for day-by-day learning. But only goal-directed behavior is sensitive to rapid changes in the anticipated outcome. We can operationalize this with two metrics (Balleine & Dezfouli 2019):

  1. Outcome expectancy: is behavior sensitive to changes in the environment?
  2. Reward devaluation: is it sensitive to changes in intrinsic value?

Habitual behavior is insensitive to both.

  • Sometimes, we flip the light switch despite knowing the causal path from the light switch to the bulb is severed.
  • Sometimes, we open the refrigerator despite being full.

When a rat becomes sated, a moderately-trained rat will immediately reduce its reward-seeking behavior (e.g., press the lever fewer times). An extensively trained rat, however, will not respond to such devaluation events – a sign it is acting out of habit. Interestingly, habit only occurs in predictable environments. In a more complicated task, habit (and its index, devaluation insensitivity) does not occur.

Adjudicated Competition Theory: Model-Based vs Model-Free

The basal ganglia is an action selector, giving exclusive motor access to the behavioral program with the strongest bid. We have already seen data connecting this structure with reinforcement learning (RL). But in the RL literature, there are two different ways to implement a learner: a tree system which builds an explicit world-model, and a cache system which ignores all that complexity, and just remembers stimulus-response pairings.

These two modeling approaches have different costs and benefits:

  • Tree Systems are very costly to compute, but learn quickly & are more responsive to changes in the environment.
  • Cache Systems are easier to maintain, but learn slowly & are less responsive.
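The difference in responsiveness shows up clearly in a toy devaluation experiment. This is a minimal sketch, not a model from the RL literature: the class names, the lever-pressing scenario, and the learning rate are all assumptions chosen for illustration.

```python
class TreeSystem:
    """Model-based: stores action -> outcome and outcome -> value separately."""

    def __init__(self):
        self.outcome_of = {}   # action -> expected outcome
        self.value_of = {}     # outcome -> current value

    def action_value(self, action):
        # Look through the world-model: which outcome, and what is it worth now?
        return self.value_of.get(self.outcome_of.get(action), 0.0)


class CacheSystem:
    """Model-free: stores one cached value per action, updated incrementally."""

    def __init__(self, learning_rate=0.1):
        self.cached = {}
        self.learning_rate = learning_rate

    def update(self, action, reward):
        old = self.cached.get(action, 0.0)
        self.cached[action] = old + self.learning_rate * (reward - old)

    def action_value(self, action):
        return self.cached.get(action, 0.0)


tree, cache = TreeSystem(), CacheSystem()
tree.outcome_of["press_lever"] = "food"
tree.value_of["food"] = 1.0
for _ in range(100):                  # extensive training with rewarded presses
    cache.update("press_lever", 1.0)

tree.value_of["food"] = 0.0           # devaluation: the animal is now sated

print("tree value: ", tree.action_value("press_lever"))
print("cache value:", round(cache.action_value("press_lever"), 3))
```

The tree system re-derives the action's value from the outcome's current worth, so it drops the moment food is devalued. The cache system still holds the old value and would only lower it through further unrewarded presses.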

Besides driving behavior, both models also report their own uncertainty (i.e., error bars around the reward prediction). The adjudicated competition theory of habit (Daw et al 2005) suggests that the brain implements both models, and an adjudicator gives the reins to whichever model expresses the least uncertainty.

Because cache systems are more uncertain in novel environments (stemming from their low data efficiency), tree systems tend to predominate early. But as both systems learn, cache systems eventually become relatively more confident and take over behavioral control. This shift in relative uncertainty is thought to be the reason why our brains build habits if exposed to the same environment for a couple of weeks.
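The handover can be sketched with two hand-picked uncertainty curves. To be clear, these curves are hypothetical and are not the Bayesian machinery of Daw et al (2005); they only encode the qualitative assumptions that the tree system's uncertainty falls quickly but plateaus, while the cache system's falls slowly to a lower floor.

```python
import math

def tree_uncertainty(trials):
    """Hypothetical curve: drops fast, but plateaus at a relatively high floor."""
    return 0.2 + 0.6 * math.exp(-trials / 5)

def cache_uncertainty(trials):
    """Hypothetical curve: starts higher, drops slowly, reaches a lower floor."""
    return 0.05 + 0.95 * math.exp(-trials / 50)

def controller(trials):
    """The adjudicator hands control to whichever system is less uncertain."""
    if tree_uncertainty(trials) <= cache_uncertainty(trials):
        return "tree"
    return "cache"

# First trial on which the cache (habit) system takes over.
switch_trial = next(t for t in range(1000) if controller(t) == "cache")
print(f"early control: {controller(0)}, late control: {controller(500)}")
print(f"habit takes over at trial {switch_trial}")
```

Changing either floor or either decay rate moves the switch point, which is one way to think about why some training regimes produce habits sooner than others, or not at all.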

Overtraining manufactures habits. But only sometimes! There are several quirks with our habit-generating machinery:

  • Ratio schedules (which reward behaviors in proportion to how often they are performed) tend to preclude habit formation. Interval schedules (which only provide a reward every so often) are much more habitogenic.
  • Even interval schedules only generate habits in relatively simple circumstances: for certain tasks involving two actions, behavior can remain goal-directed indefinitely.

Amazingly, not only could Daw et al (2005) reproduce the basic phenomena of overtraining, but their model also reproduces these quirks as well!

Which two brain systems underlie goal-oriented and habitual behaviors, respectively? For that, we turn to the basal ganglia.

Three Loops: Sensorimotor, Associative, Limbic

The striatum receives input from the entire cortex. As such, the fibers which comprise the basal ganglia are rather thick. As our tracing technologies matured, anatomists were able to inspect these tracts at higher resolutions. In the 1990s, it was discovered that this “bundle of fibers” actually comprised (at least) three parallel circuits.

These are called the Sensorimotor, Associative, and Limbic loops, based on their respective cortical afferents:

It’s important to note important differences between the rodent and human striatum.

The mesolimbic and nigrostriatal dopaminergic pathways, discussed above, directly map onto the Limbic and Sensorimotor/Associative loops, respectively:

All three circuits (direct, indirect, and hyperdirect) exist in all three loops (Nougaret et al, 2013); however, I omitted hyperdirect from the above for simplicity.

Given its participation in the Limbic Loop, the mesolimbic pathway is also sometimes referred to as the reward pathway. Its component structures, the ventral tegmental area (VTA) and nucleus accumbens (NAc), are particularly important.

There have been attempts to refine these loops into more specific circuits. Using resting state methods, Choi et al (2012) found five intrinsic connectivity networks (ICNs) embedded within the striatum. Pauli et al (2016), using contrastive methods, found a different 5-network parcellation, which doesn’t overlap much with that of Choi et al. More recently, Greene et al (2020) localized individual-specific ICNs within the cortical-basal ganglia-thalamic circuit.

For now, we will mostly confine ourselves to a discussion of three loops.

Localizing The Controllers

The associative loop appears to be the basis of the goal-directed action (GD-A) system. If you lesion any component of the system, behavior becomes exclusively habitual. For example, after lesions to the posterior dorsomedial striatum (pDMS), behavior becomes insensitive to changes in both reward contingency and reward value. The same effect occurs with lesions to the SNr and the mediodorsal thalamus (MD). Finally, lesions to the basolateral amygdala (BLA) also disrupt goal-directed behavior, plausibly by altering the reward signal provided by the substantia nigra (SNr).

The sensorimotor loop appears to be the basis of the habit system. If you lesion any component of the system, behavior becomes exclusively goal-directed. For example, after lesions to the dorsolateral striatum (DLS), behavior begins to track changes in both reward contingency and reward value. The same effect occurs with lesions to the GPi and the mediodorsal thalamus (MD). Finally, lesions to the posterior central nucleus of the amygdala (pCeN) also disrupt habitual behavior, plausibly by altering the reinforcement signal provided by the substantia nigra (SNc).

These conclusions are derived from both human and rodent behavioral studies (Balleine & O’Doherty 2010). In normal circumstances, these systems interoperate seamlessly. Damage to either system, however, causes exclusive reliance on the other system.

The infralimbic cortex (IL) plays an important role in habitual behavior. Lesions to this site prevent the formation of habits (Killcross & Coutureau 2003), and even block expression of already formed habits (Smith et al 2012). The IL also appears critical for the formation & retention of both Pavlovian and instrumental extinction (Barker et al 2014). But habit-related activity seems to develop first in the dorsomedial striatum and only with overtraining in the IL (Smith & Graybiel 2013). In a similar manner, the prelimbic cortex (PL) appears to play an important role in goal-directed behavior.

Action Chunks and Sequence Learning

We have defined habits with respect to outcome contingency and value. But there is a third component: the sequence learning of motor skills.

Behavior is not produced continuously. Rather, it is emitted in ~200ms atomic chunks, or behavioral syllables. Some 300 syllables have been discovered in mice (Wiltschko et al 2015).

Syllables are not emitted in random order. It often pays to use representations of multi-syllable action chunks. These chunks are, in turn, concatenated into larger sequences.

How do we know this? Chunks can be detected with response time measures: within-chunk actions occur more quickly than actions at chunk boundaries. Statistical methods also exist to detect sequence boundaries (Acuna et al 2014).
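As a rough illustration of the response-time approach, here is a minimal sketch. It uses a simple threshold heuristic (a keypress counts as a chunk boundary if its response time is more than `z` standard deviations above the mean), which is an assumption of mine and not the statistical method of Acuna et al (2014). The response times in the usage example are invented.

```python
from statistics import mean, stdev


def chunk_boundaries(rts_ms, z=1.0):
    """Indices of keypresses that plausibly start a new chunk.

    A keypress is flagged when its response time is unusually slow,
    i.e. more than `z` standard deviations above the mean.
    """
    threshold = mean(rts_ms) + z * stdev(rts_ms)
    # Keypress 0 always starts the first chunk.
    return [0] + [i for i, rt in enumerate(rts_ms) if i > 0 and rt > threshold]


def split_into_chunks(keys, rts_ms, z=1.0):
    """Cut a key sequence at the detected boundaries."""
    bounds = chunk_boundaries(rts_ms, z) + [len(keys)]
    return [keys[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    keys = list("ABCDEFG")
    rts = [600, 200, 210, 550, 190, 205, 200]  # hypothetical RTs in ms
    print(split_into_chunks(keys, rts))
```

With these invented numbers, the slow fourth keypress splits the sequence into a three-key chunk followed by a four-key chunk.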

Concatenation and execution response times also respond to dissociable events. Execution latencies are preferentially impacted by changing the location of the hand relative to the body; concatenation latencies preferentially respond to transcranial magnetic stimulation (TMS) of the pre-SMA area (Abrahamse et al 2013).

Action chunks tend to emerge organically every three or four keypresses. There exists an interesting analogy here to memory chunks: for example, we remember phone numbers in three- or four-digit chunks. The similarity between action and memory chunks may derive from a common neurological substrate.

Neural activity in the dorsolateral striatum (DLS, part of the sensorimotor loop) exhibits an interesting task bracketing pattern: firing peaks at the beginning and end of tasks. Martiros et al (2018) find that striatal projection neurons (SPNs) generate this bracketing pattern, and are complemented by fast-spiking striatal interneurons (FSIs) which fire continuously within the bracketing window.

This bracketing saves rewarding behaviors as a package for reuse. D2 antagonists don’t interfere with well-learned sequences, but do disrupt the formation of new chunks (Levesque et al 2007). Parkinson’s disease does too (Tremblay et al 2010).

Graybiel & Grafton (2015) argue that the dorsolateral striatum is specifically involved in developing skills: learning action sequences of particular value to the organism. This explains why both habitual and non-habitual skills are learned in the DLS. Indeed, innate fixed action patterns (e.g., grooming) are mediated here too (Aldridge et al 2004).

The supplementary motor area (SMA) plays a central role in implementing sequences. Rats organize their behavior with sequence learning, and lesions to the SMA disrupt these behaviors (Ostlund et al 2009). Similarly, magnetically interrupting the human SMA during a task blocks expression of the subsequent chunk (Kennerley et al 2004).

Within the SMA, the rostral pre-SMA seems to represent cognitive sequences; the caudal SMA-proper exercises motor sequences (Cona & Semenza 2016). Working memory tasks reliably activate pre-SMA, whereas language production reliably activates both pre-SMA and SMA-proper.

Hierarchy as Loop Integration

We have so far examined theories of loop competition. But consider the impact of dopamine shortages and surpluses in the various loops, per Krack et al (2010):

This data aligns with the organizational principle of hierarchy of the central nervous system. The limbic loop selects a desire, the associative loop explores its beliefs to identify a plan, and the sensorimotor loop translates those plans into motor commands. Here’s one possible interpretation, based on Guyenet (2018).

This hierarchical interpretation nicely complements results from sequence learning.

Hierarchical Collaboration Theory

We have seen computational and neurological evidence in favor of the adjudicated competition theory of habit. But the theory also has three important limitations. First, competitive models can explain devaluation behaviors, but struggle to replicate contingency responses. Second, the theory doesn’t explain sequence learning: why should habits coincide with the development of motor skill? Third, it doesn’t accord with hierarchy: why should habitual behaviors be concrete responses, rather than abstract actions?

This leads us to the hierarchical collaboration theory of habit. On Balleine & Dezfouli (2019)’s model, the associative system passes command serially to the sensorimotor system. Early in training, changes in the reward environment are noticed immediately. However, as the sensorimotor system learns increasingly complex action sequences, the associative system only notices changes to the reward environment at sequence boundaries. In other words, only after a sequence has been executed will the associative system resume control. This would explain why sequence learning so strongly coincides with habit formation and reward insensitivity.

In order to model this alternative account, one must first extend RL to accommodate chunks. These chunks replace their component parts if the benefits of using that sequence exceed its costs. This formalism is provided by Dezfouli & Balleine (2012). Dezfouli & Balleine (2013) found that their hierarchical model replicated, and in some cases outperformed, the competition model of Daw et al (2011).
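The cost-benefit rule can be sketched in a few lines. This is only my reading of the general idea, not the formalism of Dezfouli & Balleine (2012): I assume the benefit of a chunk is the deliberation time it saves (valued at the going reward rate), and the cost is the reward expected to be lost by not re-evaluating mid-sequence. The function and parameter names are hypothetical.

```python
def should_chunk(time_saved_s, reward_rate_per_s, expected_loss):
    """Adopt a chunk when its assumed benefit exceeds its assumed cost.

    benefit: deliberation time saved, valued at the average reward rate
    cost:    reward expected to be lost by running the sequence open-loop
    """
    benefit = time_saved_s * reward_rate_per_s
    return benefit > expected_loss


if __name__ == "__main__":
    # Stable environment: little is lost by not checking mid-sequence.
    print(should_chunk(time_saved_s=0.5, reward_rate_per_s=2.0, expected_loss=0.3))
    # Volatile environment: open-loop execution is too costly.
    print(should_chunk(time_saved_s=0.5, reward_rate_per_s=2.0, expected_loss=1.5))
```

On this reading, chunks form most readily in stable, well-practiced environments, which is exactly where habits are observed.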

The Balleine lab is not the only group to produce computational models of hierarchical collaboration. Baladron & Hamker (2020) produce an interesting model, which assigns the infralimbic (IL) cortex the role of loop shortcut between associative/goal-directed and sensorimotor/habitual systems. Their model is also interesting in that they localize the reward prediction error (RPE) to the limbic loop, while ascribing action prediction error (APE) and movement prediction error (MPE) to the associative and sensorimotor loops, respectively.

These are early days. I look forward to more granular models of habituation, with more attention to the limbic circuit. As our mechanistic models of habit formation improve, so too does our therapeutic reach. If Graybiel & Grafton (2015) are right, and addictions are simply over-strong habits, such models may someday prove useful in clinical settings.

Until next time.


  1. Abrahamse et al (2013). Control of automated behavior: insights from the discrete sequence production task
  2. Acuna et al (2014). Multi-faceted aspects of chunking enable robust algorithms.
  3. Aldridge et al (2004). Basal ganglia neural mechanisms of natural movement sequences
  4. Baladron & Hamker (2020). Habit learning in hierarchical cortex-basal ganglia loops
  5. Balleine & Dezfouli (2019). Hierarchical action control: adaptive collaboration between actions and habits
  6. Balleine & Dickinson (1998). Goal-directed instrumental action: contingency and incentive learning and their cortical substrates.
  7. Balleine & O’Doherty (2010). Human and Rodent Homologies in Action Control: Corticostriatal Determinants of Goal-Directed and Habitual Action
  8. Barker et al (2014). A unifying model of the role of the infralimbic cortex in extinction and habits
  9. Choi et al (2012). The organization of the human striatum estimated by intrinsic functional connectivity
  10. Cona & Semenza (2016). Supplementary motor area as key structure for domain-general sequence processing: a unified account
  11. Daw et al (2011). Model-based influences on humans’ choices and striatal prediction errors
  12. Dezfouli & Balleine (2012). Habits, action sequences, and reinforcement learning
  13. Dezfouli & Balleine (2013). Actions, Action Sequences and Habits: Evidence That Goal-Directed and Habitual Action Control Are Hierarchically Organized
  14. Graybiel & Grafton (2015). The striatum: where skills and habits meet
  15. Greene et al (2020). Integrative and Network Specific Connectivity of the Basal Ganglia and Thalamus Defined in Individuals.
  16. Guyenet (2018). The Hungry Brain
  17. Holland (2004). Relations between Pavlovian-instrumental transfer and reinforcer devaluation.
  18. Kesby et al (2018). Dopamine, psychosis and schizophrenia: the widening gap between basic and clinical neuroscience
  19. Krack et al (2010). Deep brain stimulation: from neurology to psychiatry?
  20. Levesque et al (2007). Raclopride-induced motor consolidation impairment in primates: role of the dopamine type-2 receptor in movement chunking into integrated sequences.
  21. Martiros et al (2018). Inversely Active Striatal Projection Neurons and Interneurons Selectively Delimit Useful Behavioral Sequences
  22. Pauli et al (2016). Regional specialization within the human striatum for diverse psychological functions
  23. Killcross & Coutureau (2003). Coordination of actions and habits in the medial prefrontal cortex of rats.
  24. Smith et al (2012). Reversible online control of habitual behavior by optogenetic perturbation of medial prefrontal cortex
  25. Smith & Graybiel (2013). A dual operator view of habitual behavior reflecting cortical and striatal dynamics
  26. Tremblay et al (2010). Movement chunking during sequence learning is a dopamine-dependent process: a study conducted in Parkinson’s disease.
  27. Wiltschko et al (2015). Mapping Sub-Second Structure in Mouse Behavior
  28. Yin et al (2005). The role of the dorsomedial striatum in instrumental conditioning

Intro to Multilevel Societies

Part Of: Anthropogeny sequence
Content Summary: 6000 words, 30 min read

What is a Multilevel Society?

Many mammals live with others. But for the vast majority of these species, social life revolves around families, or groups – not both.

In group-living species, there are no stable breeding bonds. In such multi-male, multi-female (mm-mf) groups, mating is promiscuous. For example, Wrangham estimates that female chimpanzees copulate between 400 and 3,000 times per conception, and female bonobos between 1,800 and 12,100 times. In such species, children bond with their mother, but cannot hope to recognize their father (who could be any of the males within the group).

In family-living species, the family lives autonomously. There are two prominent kinds of primate family: pair-living families (e.g., gibbons), and one-male units (OMUs) with one male and several females.

But some thirteen primate species (5%) live in multifamily groups. For these species, families (polygynous OMUs) not only share the same space, but also participate in group-level relationships and behaviors.  

While some of these multifamily species exhibit two levels (family and clan), other species have more. Sometimes clans coalesce as multilevel societies  – apex levels defined first as spatial tolerance, then full-fledged social affiliation.

These concepts have been operationalized. By GPS tagging individual primates, proximity data can empirically demonstrate the existence of such levels:
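To give a feel for how proximity data can reveal nested levels, here is a minimal sketch. It links any two individuals standing within a given radius of each other (single-linkage grouping) and repeats this at two radii. This is an assumed simplification: real studies work with association indices over many GPS fixes, not a single snapshot, and the coordinates and radii below are invented.

```python
from itertools import combinations
from math import dist


def clusters(positions, radius):
    """Group individuals linked by chains of pairwise distances <= radius."""
    parent = {name: name for name in positions}

    def find(x):
        # Follow parent pointers to the group representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(positions, 2):
        if dist(positions[a], positions[b]) <= radius:
            parent[find(a)] = find(b)

    groups = {}
    for name in positions:
        groups.setdefault(find(name), set()).add(name)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Hypothetical positions in meters for six tagged individuals.
    positions = {
        "a1": (0, 0), "a2": (1, 0),
        "b1": (10, 0), "b2": (11, 0),
        "c1": (100, 0), "c2": (101, 0),
    }
    print(clusters(positions, radius=2))   # tight, family-sized groups
    print(clusters(positions, radius=15))  # looser, higher-level groups
```

In this toy data, the small radius recovers three pairs, while the larger radius merges two of them into one higher-level group and leaves the third apart. Distinct grouping levels appearing at distinct spatial scales is the signature of a multilevel society.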

Only a few species live as multi-family or multi-level societies. Here is a partial list from Grueter et al (2020):

These species come from a wide diversity of taxa, suggesting that multi-level sociality is a derived trait.

Humans are one example of an MLS primate. Today, we’ll dive into nonhuman primates. By understanding these species on their own terms, we should not only improve our ability to understand how mammalian societies function, but also gain insight into how our own species evolved.

So let’s dive in! Our exemplars come from two subfamilies within the Old World Monkeys.

Cercopithecines:

  • Geladas (Theropithecus gelada)
  • Hamadryas baboons (Papio hamadryas)
  • Guinea baboons (Papio papio), discussed occasionally

Colobines:

  • Golden snub-nosed monkeys (Rhinopithecus roxellana)
  • Rwenzori colobus (Colobus angolensis ruwenzorii), discussed occasionally
  • Proboscis monkeys (Nasalis larvatus), discussed occasionally

MLS Social Organization

Kappeler & van Schaik (2001) note that primate social systems rest on three pillars:

  1. social organization (typical group size, composition, spacing, and dispersal patterns of a given species)
  2. mating system (patterns of sexual behavior)
  3. social structure (patterns of relationships between individuals)

Let’s discuss social organization first.

  • Hamadryas baboons have all four layers: OMUs, clans, bands, and troops. These layers generate a multilevel allegiance system which mirrors the complexities of a human tribe. Hamadryas clans in the same band mingle while foraging, but males ally with their own clan members in a fight. Members of different clans in the same band will in turn unite against members of alien bands.
  • Geladas only have two stable layers: OMUs and bands. Rarely, when an OMU experiences binary fission, the two separated units may cooperate to form a team (Snyder-Mackler et al 2011), by virtue of the bonds of between-OMU female kinship. Occasionally, gelada bands come together, at least spatially, into apex level communities.
  • Golden snub-nosed monkeys have three layers. Many OMUs consistently congregate as bands. Every winter, when local food density peaks, these bands fuse into a single troop (Qi et al 2014). 

Anthropoid primates typically feature unisexual dispersal: one sex disperses to preclude inbreeding, the other remains with its natal group (is philopatric). The latter typically has a kinship advantage: it maintains lifelong social ties with its same-sex relatives.

In several multilevel societies, bisexual dispersal occurs – in general, this dispersal regime tends to correspond with strong male-female bonds. Hamadryas baboons are predominantly male philopatric: most often the females disperse. Gelada baboons are female philopatric, and the males disperse. Snub-nosed monkeys are also female philopatric, and the males disperse.

Polygynous species by definition contain a sizable number of bachelor males.  Bachelors have up to three “career paths” available:

  1. Some bachelors join OMUs as followers. Leaders tolerate followers because they are often kin, and also because follower males defend against takeovers, effectively increasing OMU longevity. Followers don’t immediately receive mating access, but they do receive other benefits, detailed below.
  2. In male philopatric MLS species (e.g., hamadryas and Guinea baboons), non-follower bachelors typically become solitary.
  3. In female philopatric species (e.g., geladas, snub-nosed monkeys), bachelors can choose a solitary life, but may instead join an all-male unit (AMU), which poses an increased threat to OMU security.

In sum, here is how these social organizations differ (more contingent higher levels patterns are omitted for simplicity):

Novel Social Signatures

What are the cognitive demands for life in a multilevel society? For this, we begin with a primate whose cognition we understand quite well: Homo sapiens.

Human friendships range from casual friends to more intimate associations. But the amount of time we invest in relationships is not continuously graded. Instead, ego networks naturally cluster into four separate groups: support clique (~5 people), sympathy group (~15 people), affinity group (~50 people) and affinity network (~150 people). Willingness to act altruistically tends to stop here, but the number of acquaintances people typically have is ~500, and the number of faces we can recognize may be bounded at ~1500.

These Dunbar graphs also manifest as typical sizes of human groups (Dunbar 2020); indeed, groups that don’t conform to these sizes deteriorate more quickly (Dunbar & Sosis 2018). Human groups have four nested components (family, group, clan, tribe) – is it really so surprising that we have four kinds of friends (best friends, close friends, friends, acquaintances)?
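One regularity in the layer sizes quoted above is easy to check with a quick calculation: each layer is roughly three times the size of the one inside it. The snippet below only divides the approximate figures already given; nothing else is assumed.

```python
# Approximate layer sizes quoted in the text, innermost first.
layers = [5, 15, 50, 150, 500, 1500]

# Ratio of each layer to the layer inside it.
ratios = [outer / inner for inner, outer in zip(layers, layers[1:])]

if __name__ == "__main__":
    print([round(r, 2) for r in ratios])
```

Every ratio falls between 3 and about 3.3, so the nesting looks like a fairly regular geometric series and not an arbitrary list of sizes.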

A social signature is the distribution of social effort that people invest in their friends. Individual social signatures are resistant to turnover: the distribution of our social effort doesn’t change even when we lose or add new friends (Saramaki et al 2014). But the modal social signature for multilevel primates converges on four clusters. 

What about the social signature for other primates? Kudo & Dunbar (2001) note:

For the majority of the species in this sample, the mean size of networks (i.e. the number of animals linked together by a continuous chain of relationships at the defined discriminant level) is typically around 75% of total group size. This suggests that the majority of individuals are linked together in a single network, with a small number of attached peripheral individuals. These individuals either lead a solitary existence within the group or are members of very small peripheral networks.

Within these bonded networks, primates do develop unusually strong ties to a handful of conspecifics. These cliques are analogous in size and quality to best friendships we see in human beings:

This data (and the agent-based modeling of Sutcliffe et al 2016) suggests three signature archetypes across the primate order:

One benefit from the MLS acquaintance layer is plausibly information transfer. In humans, this hypothesis was explored in Granovetter’s seminal (1973) The Strength of Weak Ties. Henrich’s excerpt Collapse of Supermind attests to the significance of social network size for sustaining cumulative culture.

MLS Mating System

On to the second pillar: mating systems! 

For bachelors, there are up to four routes to reproductive success. In the initial unit strategy, a follower befriends (or a solitary male kidnaps) a juvenile female. The takeover strategy involves physical combat. Followers have access to two other strategies: the opportunistic strategy, where a follower gains a harem when the leader male isn’t around, and the inheritance strategy, where females are peacefully transferred from leader to follower. To illustrate how takeovers often go, here’s Pines et al (2011) describing an April 2008 hamadryas takeover:

Takeover of “Lizzy” (adult female with older infant) from “Pete” (old leader) by “Skivy” (subadult solitary): Skivy was observed following Pete and his female Lizzy (Pete’s only remaining adult female). Flanked by “Herb” (a deposed leader with similar facial features to Pete who became Pete’s follower after his own loss of females described below) and by “Feet” (a young subadult follower of Pete), Pete and Lizzy were observed hastily fleeing from Skivy, who followed but did not physically interact with Pete, Herb, or Feet. Skivy continued to pursue Pete, Lizzy, and Feet the next day. On one occasion that Skivy got close to Pete and his entourage, Feet turned and chased Skivy. Approximately half an hour after this, and with Feet no longer around, Skivy approached the foraging Lizzy and grabbed her. Pete, who was about 8 m away, picked up Lizzy’s infant and fled. After a brief flurry of mounts and grooming, Skivy repeatedly herded Lizzy toward Pete and continued to mount and groom her in full view of the now deposed leader.

From a comparative perspective, hamadryas baboons have unusually enhanced levels of sexual coercion. They exhibit herding: keeping their females within 2m using e.g., neck-biting behaviors (Swedell & Schreier 2009). Male aggression peaks immediately after a takeover, and may function to condition a female. Other species have more robust expressions of female choice. Golden snub-nosed monkeys use paternity confusion to preclude takeover-driven infanticide. Guinea baboon females don’t experience “takeovers” at all, but simply move to different OMUs if their current male doesn’t suit them.

Males in multi-level societies associate closely with females at all times. This contrasts with mm-mf primates, in which males consort and mate guard only when females are in estrus.

MLS mating systems vary along other dimensions as well:

Intense Sexual Selection

Sexual selection can be usefully decomposed into intrasexual selection (male-male, female-female), and intersexual selection (male-female). From a male-male competition perspective, strategies vary according to social structure:

In all primates, testes size correlates with body size. However, after controlling for body size, males in mm-mf species have larger testes. This is because females have multiple copulations during estrus, and the male who delivers the most gametes has a fitness advantage (hence the arms race to produce more gametes). For example, although chimps are much smaller than humans, their testes are much larger.

In polygynous societies, where a single male forms stable breeding bonds with multiple females, males don’t have to worry about competing with other males’ sperm. But they do have to worry about bachelors challenging & overtaking their harem. And as harem sizes grow larger (bigger operational sex ratio OSR), bachelor threat becomes increasingly intense (this is polygyny’s math problem). Size and weaponry drive success in contests for sexual access. As the intensity of bachelor contests increases, we see more sexual dimorphism: males whose bodies (and teeth!) are bigger than those of females (Clutton-Brock et al 1977).
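Polygyny’s math problem can be made explicit with a back-of-the-envelope sketch. It rests on two simplifying assumptions of mine: every female belongs to a harem of the same size, and the adult sex ratio is fixed (1:1 by default). The function names are hypothetical.

```python
def bachelors_per_leader(harem_size, males_per_female=1.0):
    """Number of harem-less males for every male who holds a harem."""
    # Each leader's females "account for" harem_size * males_per_female males,
    # one of whom is the leader himself.
    return harem_size * males_per_female - 1


def bachelor_fraction(harem_size, males_per_female=1.0):
    """Share of all adult males who hold no harem."""
    return 1 - 1 / (harem_size * males_per_female)


if __name__ == "__main__":
    for h in (1, 2, 4, 8):
        print(h, bachelors_per_leader(h), bachelor_fraction(h))
```

With a harem size of one there are no bachelors; with a harem size of four, three of every four males are excluded from breeding. Under these assumptions, each extra female in a harem adds another rival with nothing to lose.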

That’s how male-male sexual selection works in traditional primate societies. But what happens in a multilevel society? More specifically, what happens when you take spatially autonomous families, and have them co-reside? Two effects manifest immediately:

  1. Nearby OMU males create more opportunities for contest competition. And Grueter & van Schaik (2009) found that, indeed, sexual dimorphism is even greater in multilevel primates than in traditional polygynous societies.
  2. Nearby OMU males often provide females with more opportunities for extra-pair mating (EPM). In polygynous societies, male-driven infanticide is a constant threat (killing an infant terminates lactation, and thereby accelerates oestrus). “Cheating” promotes paternity confusion, and reduces the infanticide threat. Qi et al (2020) found that, while no EPMs were directly observed, paternity tests revealed more than half of all children were sired by non-resident males.

But it’s not just the OMU males that change the sexual selection calculus. Bachelors behave differently in multilevel societies as well! Due to the unusually tolerant male-male dispositions required in multilevel societies, and the dramatic expansion of male kin groups discussed below, bachelor males live in unusually cooperative all-male units (AMUs). Just as OMUs congregate in breeding bands (BBs), AMUs congregate in all-male bands (AMBs).

For colobus monkeys, Qi et al (2017) showed that AMBs tend to be organized along kinship lines. Their movements tend to shadow those of the breeding band. What’s more, social patterns of grooming tend to correlate with distance between these two groups – this is likely associated with preparations for violence.

Finally, let’s consider secondary sexual traits (ornaments). These come in three varieties: hairy traits (capes, tufts, beards), fleshy traits (lips, nose, humps), and colorful traits (red hair, blue scrotum, etc). 

Such ornaments serve at least one of the following functions:

  1. Status Signaling. Some ornaments signal dominance between males. 
  2. Attractiveness. Some male ornaments are associated with female choice.

In traditional social structures, individuals are well-known to one another. In multilevel societies, which occasionally coalesce into very large groups, anonymity is more prevalent. In contexts of limited social knowledge, the functional value of such ornaments is particularly high. And indeed, multilevel societies are disproportionately likely to exhibit these ornaments.

MLS Social Structure

We’ve discussed social organization and mating systems. Time for the third pillar: social structure.

In hamadryas societies, if a leader male is incapacitated, harem females will often go their own way. But in geladas, females in an OMU will stay together even if the male is deposed. This suggests that gelada OMUs are bonded by female kinship, whereas the pair-bond is the primary core unit glue in hamadryas.

You can actually predict this pattern by using social network analysis (SNA) to evaluate the social bonds of a given society. Matsuda et al (2012) found that females are more network-central in the colobines, whereas males are more central in cercopithecines. Clustering analysis, however, reveals that gelada OMUs become more unstable when females, but not males, are removed from social networks.
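The “remove an individual and see whether the network holds together” logic can be sketched as follows. This is a toy knockout test on an invented four-member unit, not the analysis of Matsuda et al (2012): the ties, the names, and the use of connected components as the stability measure are all my assumptions.

```python
def components(nodes, edges):
    """Count connected components among `nodes`, ignoring ties to absent nodes."""
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        if a in neighbours and b in neighbours:
            neighbours[a].add(b)
            neighbours[b].add(a)
    seen, count = set(), 0
    for start in nodes:
        if start in seen:
            continue
        count += 1
        stack = [start]
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(neighbours[n] - seen)
    return count


def components_after_removal(nodes, edges, removed):
    """Number of fragments left once one individual is knocked out."""
    remaining = [n for n in nodes if n != removed]
    return components(remaining, edges)


if __name__ == "__main__":
    # Hypothetical unit: three bonded females, and a male tied to only one of them.
    nodes = ["M", "F1", "F2", "F3"]
    edges = [("F1", "F2"), ("F2", "F3"), ("F1", "F3"), ("M", "F1")]
    print(components_after_removal(nodes, edges, "M"))   # females stay together
    print(components_after_removal(nodes, edges, "F1"))  # male is cut off
```

In this invented unit, removing the male leaves one intact cluster, while removing the female he is tied to splits the unit in two. That is the qualitative pattern the text describes for geladas.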

Why should these species differ so radically in social structure? These differences are largely driven by philopatry-based kindreds, sexual selection, and dispersal patterns. Recall that in multilevel societies, dispersal risk is lessened by transferring within the super-group. But at which level do individuals transfer? Here’s some data:

Interestingly, Rwenzori colobines have a pattern of upper-level female dispersal and lower-level male dispersal. This pattern is rather unusual, and closely resembles one other MLS primate species: Homo sapiens.

These dispersal patterns help explain social structure:

  • In hamadryas, male dispersal patterns contribute to the L2 male alliances, used to repulse bachelor males. In this species, male sexual coercion may explain the weak female bonds, even within-harem.
  • In geladas, the lack of female transfer within bands and a lack of clan-based male bonds are likely reasons why gelada bands are not maintained as coherently as hamadryas bands. 
  • In golden snub-nosed monkeys, female philopatry and the colobine penchant for allomothering generates very strong female cohesion within core units (but male-female relationships within OMUs are also quite strong). 

Primate societies typically spend their waking days together. But a minority of species (incl. chimps, bonobos and humans) operate under fission-fusion dynamics: subgroups with variable composition forage independently. Fission-fusion behavior is an adaptive response to variable foraging environments, allowing species to dynamically alter their social structure in spatiotemporally variable ecologies. The question of whether fission-fusion behavior requires additional cognitive abilities is an area of live research.

Historically, there has been a strong tendency to conflate fission-fusion with multi-level societies. But there is a key difference: in mm-mf primates that use fission-fusion dynamics, subgroups are formed on an individual basis (they are atomic communities). In contrast, multilevel, fission-fusion primates form subgroups on a familial basis (they are molecular communities). Families are never separated: they forage together. An interesting exception to this trend is human beings, who use an atomic style during the day (by the sexual division of labor, our families aren’t literally inseparable), but a molecular style at night (unlike chimps, human nature has us rather consistently sleeping with our families).

Stronger Kindreds

In comparison to mm-mf groups, the potential for kin selection-based cooperation is dramatically enhanced by multilevel social organization. Let’s dive in.

Among primates, the most common philopatry pattern is female philopatry. Only a few primate species, including our two closest relatives chimpanzees and bonobos, exhibit male philopatry.

Primates are capable of kin recognition. We find evidence for this in nepotism (preferentially helping relatives), incest avoidance (they largely avoid mating with conspecific kin), and dominance relations (vengeance is often directed at the aggressor’s kin). Kin recognition is grounded in parental attachment, and from there an inductive ability to notice adjacent attachment relationships. But kin recognition varies by social structure. Consider the following:

Symbol Key:

  • Circle = female; Triangle = male
  • Green = fully recognized kin, Light green = imperfectly recognized kin, White = unrecognized kin
  • Red Outline = emigrant, Blue Outline = immigrant

In female philopatric mm-mf groups (e.g., macaques), Ego recognizes her mother and children, by virtue of parturition and lactation. Ego is also able to recognize her siblings (“the other juveniles bonded to my mother”) and maternal grandmother (“the older female bonded with my mother”). She may also be able to recognize her maternal aunts and uncles, and sister’s offspring, but this requires a second inferential link, and the evidence for this ability is mixed. Ego’s father does not recognize her (impossible to say who inseminated the mother), and Ego will never know her son’s or brother’s offspring (since they dispersed as juveniles). In fact, the three males she can recognize as kin leave. The female kin she can recognize often cooperate in jockeying for dominance status. For all individuals then, matrilines (n=5) are important determinants of social structure.

In male philopatric mm-mf groups (e.g., chimps), the situation is more lonely. Resident male Ego cannot recognize his father, but he also cannot recognize his mother’s kin, because mom is an immigrant. He can recognize his siblings. But he cannot recognize his brother’s kids (even his brothers don’t know). Nor can he build alliances with his sister’s sons, because his sister emigrates. Patrilines (n=2) permeate the society, but given their shallowness, the situation is more akin to “every male for himself”. 

The situation changes when pair-bonding is introduced (the advent of multifamily groups).  

For female philopatric multifamily groups, in addition to the above, Ego gains the ability to recognize & bond with her father and maternal grandfather. These individuals may prove useful allies in her life. But these eureka moments stop there: because these males are immigrants, Ego never meets their side of her extended family.

For male philopatric multifamily groups (e.g., ancestral humans), we see that pair-bonding has radically improved Ego’s situation. Ego now recognizes his father, and his paternal grandfather (that older male with whom his father is most strongly bonded). Ego can also bond with, and promote the welfare of, his children, and his (non-emigrated) extended family. He also may form weaker bonds with his brother’s offspring, and paternal uncles & aunts. The scope of patrilines radically expands: rather than a single male ally, Ego’s patriline network has achieved n=6.

In contrast with mm-mf groups, pair-bonding dramatically expands male kin recognition. These kindreds dramatically expand the cooperative potential of agnatic kin. 

Two Evolutionary Pathways

Shultz et al (2011) model the evolution of social life within the entire primate order. The earliest primates lived solitary, nocturnal lives. The shift towards diurnal social living represents a major shift in the primate adaptive landscape, with an increased emphasis on visual processing (efficient foraging) and group living (predation mitigation).

Dunbar & Shultz (2010) argue that sociality comes in two forms. Aggregations (simple, fluid groups) form when individuals benefit from home range overlap. Congregations (complex, bonded groups) require a repurposing of mother-offspring attachment towards others. Such animals make friends (sensu Silk 2002). Most social mammals (e.g., ungulates) live in aggregations. A few mammals (including social carnivores, dolphins, elephants, and anthropoid primates) live in congregations.

For primates, the transition from solitary living seems to have two stages. First, there was a stage with unbonded mm-mf aggregations (e.g., diurnal lemurs). This was succeeded by bonded mm-mf congregations.

As mentioned, this escape from a solitary lifestyle was likely stabilized by exaptations of mother-offspring attachment. These bonds were further reinforced by a transition from bisexual dispersal to unisexual (typically male) dispersal, which enabled strong (female) kindreds. The fact that there are zero attestations of reversion back to solitary living suggests that these mechanisms play a stabilizing role.

Female philopatry is expressed in OMUs when males successfully monopolize access to females, and mm-mf when they do not. Transitions between these forms may be ecologically mediated; namely, reductions in food density that cannot support large female aggregations. This hypothesis finds support in Barton (1999)’s observation that savanna baboons, which typically form large mm-mf groups, may sometimes subdivide into polygynous groups in harsher conditions.

Shultz et al (2011) document a transition from unstable mm-mf groups to pair living and social monogamy (not to be confused with sexual monogamy). Social monogamy evolves prior to paternal provisioning, and represents an end state. Opie et al (2013) claim that all appearances of social monogamy take place in unstable mm-mf species, all with heightened levels of infanticide risk. In contrast, Lukas & Clutton-Brock (2013) claim that all instances of pair-bonding evolved from solitary species, for the ecological reason of sparse resources → female territoriality → incentivizing male pair-living strategies.

These two accounts have not yet been reconciled. But Kappeler (2014) suggests both pathways might be possible: lemurs show evidence of both group-living and solitary societies; moreover, the resultant monogamous systems have systematically different properties. 

What about multilevel societies? How did they evolve?

Consider again our exemplar organisms (plus lesser-known, suspected, and intermediate MLS species located in the Pygathrix, Nasalis, Mandrillus, and Macaca genera). Yellow denotes OMUs, red denotes stable MM-MFs, and blue denotes MLS.

These MLS are derived from different ancestral social systems! Grueter et al (2012) outline two pathways. With the bonding pathway, pair bonding substructures ancestral mm-mf groups. With the aggregation pathway, autonomous OMUs increasingly overlap, and ultimately affiliate.

Comparative data may allow us to detect substages within these pathways:

First, consider the Rwenzori colobus. This species seems to be a social Archaeopteryx: it represents a transition point halfway along the bonding pathway. The African colobine group contains both OMU and mm-mf species. The Rwenzori is multilevel, but its core units can be either OMU or MM-MF. These core units often “switch modes”. For example, Stead & Teichroeb (2019) document a case where the largest mm-mf core unit, which consisted of 8 adult males and 6 adult females, split into two units: an OMU and a 7-member AMU.  It is plausible to speculate that, for the bonding pathway, a multilevel structure predates and incentivizes the ascent of OMUs. 

Second, consider the proboscis monkeys. Derived from OMU ancestors, this species appears to be halfway along the aggregation pathway: OMUs are spatially tolerant, but don’t seem socially integrated (nested OMUs). Similar results obtain for several species in the Pygathrix genus.

Third, we’ve already seen how some MLSs have an apex or fourth level (e.g., hamadryas) and some do not (e.g., geladas). This suggests that the apex level affiliation postdates multifamily groups. Let’s call this the xenophilia pathway. This pathway too may decompose: from spatially tolerant behavior (nested MFS) towards direct social affiliation (true multi-level societies).

Two Evolutionary Scenarios

What caused certain colobines and cercopithecines to “take the leap” towards multi-level sociality?

For colobines, an important ecological determinant is diet. Snub-nosed monkeys are unusual in that their diet is dominated by lichens, a low-quality & high-abundance food. For other colobines with different dietary profiles, the amount of scramble competition is high: larger groups means more energy invested in travelling to productive forage areas. But with multilevel colobines like Rhinopithecus bieti, their diet significantly reduces scramble competition (slope is 30x smaller, in this example). 

Many other primates share this property of minimal scramble competition, yet they don’t form multilevel societies. It seems that diet is a necessary, but not sufficient precondition.

The bachelor threat hypothesis attempts to explain the snub-nosed monkey phenomenon. Consider the following facts:

  1. Higher operational sex ratios (OSRs) correlate with increasing OMU overlap (Grueter & van Schaik 2009).
  2. Multilevel species have higher OSRs (Grueter & van Schaik 2009).
  3. Infanticide and extra-pair copulation (paternity confusion) are prevalent in these populations. 
  4. Xiang et al (2014) found that colobine males in multilevel societies often collaborate to repulse all-male bands, particularly during mating season.

Perhaps this two-pronged approach explains the evolutionary origins of geladas. Like colobines, geladas feed on low-quality, super-abundant foods (young leaves). This dietary preference dramatically reduces within-group feeding competition. However, geladas don’t suffer as much from bachelor threat. Perhaps this explains why they haven’t taken the xenophilia pathway, in contrast with colobines.

For baboons, Jolly (2020) presents a five-stage reconstruction of Papio evolutionary history, based on modern genetic data. 

  1. Originally two species (Yellow and Kinda) populated South Africa. These baboons lived in a female philopatric, mm-mf configuration. 
  2. As East Africa evergreen forest dried, a pathway opened up. A male-philopatric multi-family ancestral species (gray color, P origin point), mounted a frontier invasion (similar to how starlings colonized North America). 
  3. This multifamily species ultimately diverged into three populations (Guinea, Ancestral Northern, Hamadryas – see Fig C)
  4. Olive baboons speciate from Ancestral Northern, and revert back to female-philopatric mm-mf, separating Guinea from Hamadryas baboons.
  5. The species cross-fertilizations in the last step are visible in modern-day “discordant” mitochondrial DNA.

The invasion species P derives from Kinda baboons, which are unusual compared to other baboons for indirect male-male cooperation, as well as stronger male-female relations including comparatively intimate consortships (Petersdorf et al 2019). These features may have produced useful cognitive preadaptations for multi-level social life. 

But why would P become male philopatric? Consider a dispersing male living with a group on the frontier. If he disperses away from the (rapidly-moving) frontier, his heritage does not contribute to the subsequent generations of the expanding population. If he disperses into the frontier, he has no group to join, and his chance of starvation or predation is quite high. Finally, if he disperses along the frontier, that is also disadvantageous due to the Holt-McPeek effect (Holt & McPeek 1996). Taken together, we see that male dispersal is penalized, the dispersal cost-benefit tradeoff changes, and those males predisposed to remain (always a small fraction of a group) would enjoy a fitness advantage. As the frontier expands, typically by group fission seeding the horizon, this process compounds.

Jolly (2020) suggests that this frontier-driven shift to male philopatry allowed these baboons’ social organization to find a new equilibrium:

Adding male philopatry to preexisting female philopatry would produce troops that were near-endogamous, reducing the effective population size and incurring the risk of inbreeding depression and/or shortage of acceptable mates. At the frontier itself, conditions would minimize this risk. Once the frontier had moved on, however, the male-philopatric troops in the fully occupied landscape behind it would no longer enjoy superabundant resources and would therefore tend to shrink toward a size sustainable as a foraging group within such a setting. Being small and endogamous, however, would imperil a troop’s survival. 

This unstable situation changes individual incentives for group encounters:

Troops formed by fusion in this way would maintain their critical size as breeding groups as long as all members frequently gathered at the sleeping site to spend social time communally. At the same time, clans/ parties could forage independently where the ecological setting favored such behavior. In this way, populations that had become male-philopatric remained viable by decoupling the socializing and co-foraging aspects of troop membership

On this hypothesis, the social dispositions of Guinea baboons make a plausible base from which to derive hamadryas society. In the sparser resources of the Horn of Africa, foraging units would shrink down to the “molecular” OMUs. As the sole defenders of their units, males would have been under pressure to protect aggressively, and with fewer opportunities to interact with individuals outside of their OMU, females would have become more socially isolated.

This frontier annealing hypothesis attempts to explain why some baboons embarked on the bonding pathway. But however these northern baboons did it, it’s worth noting the only known case of pathway reversion: olive baboons (Papio anubis), while ancestrally male-philopatric multi-family, reverted back to a male-dispersing mm-mf social organization. 

Roots Of Xenophilia

For many primates, xenophobia (between-group hostility) is the norm. For example, as we saw previously, coalitions of chimp males periodically raid neighboring territories, killing anyone unfortunate enough to cross their path. Xenophobia is largely grounded in resource competition: violence is incentivized if a neighbor possesses a valuable resource, and a group has enough physical power to capture & retain it.

Many intergroup encounters are agonistic. But other encounters are more tolerant. What adaptive value does extra-group affiliation unlock? Examples include:

So far, I’ve described the adaptive reasons why group encounters may be hostile vs friendly. But what proximal mechanisms nudge groups in either direction? First, much empirical work has shown relatedness of extra-group members reduces aggression (because of kin selection). Second, familiarity is another key mediator, for at least three reasons:

  1. Post-fission. When a gorilla troop experiences fission, for example, it is much less likely to fight with its previous associates. If the annealing hypothesis is correct, and large group fissions created MLSs in Papio, this would be how they did it.
  2. Shared interest. Consider a male philopatric species, where a female leaves Group A and pair-bonds with a male in Group B. Both groups meet from time to time. Such groups have some shared kinship and built-in familiarity via the “linking individual”. Not only this, but conflict that risks harm to the link and her offspring is against the reproductive interests of both the relatives and the affines (“in-laws”). As the number of out-marriages (out-mating + pair-bonding) increases, this reason for cooperation becomes increasingly robust.
  3. Mere exposure. Even groups that don’t share many fission- or dispersal-based links may come to act cooperatively, by the brute fact that neutral exposures to strangers pave the way for affinity (in humans, this is known as the propinquity effect).

A summary graphic of our discussion in this section:

A couple caveats are in order. First, all three affiliation promoters assume frequency of contact: if groups practice avoidance, intergroup relations are moot.  Second, the above analysis explores primarily dyadic factors. But the relationship Group B has to Group A may play a role in its receptivity to Group C. Geography plays a role. Some multilevel societies have strictly delineated & coherent apex level structures (e.g., hamadryas baboons); others are more flexible at this higher level (e.g., human beings).

Fry et al (2021) describe peace systems: certain human societies that have managed to avoid war & achieve long-term intergroup peace. And indeed, multilevel societies in other primate species also seem less prone to intergroup aggression. It seems likely that a deeper understanding of multilevel societies might contribute to furthering the science – and practice! – of peace.

Until next time.


Bolded references are ones I found exceptionally interesting.

  1. Arnaboldi et al (2012). Analysis of Ego Network Structure in Online Social Networks
  2. Barton (1999). Socioecology of baboons: the interaction of male and female strategies
  3. Clutton-Brock et al (1977). Sexual Dimorphism, socionomic sex ratio and body weight in primates
  4. Chapais (2008). Primeval Kinship
  5. Dixon & Vasey (2012). Beards augment perceptions of men’s age, social status, and aggressiveness, but not attractiveness 
  6. Dunbar & Shultz (2010). Bondedness and sociality
  7. Dunbar & Sosis (2018). Optimising human community sizes
  8. Dunbar (2020). Structure and function in human and primate social networks: implications for diffusion, network stability and health
  9. Fischer et al (2016). Charting the neglected West: The social system of Guinea baboons
  10. Fry et al (2021). Societies within peace systems avoid war and build positive intergroup relationships
  11. Goffe et al (2016). Social relationships of female Guinea baboons (Papio papio) in Senegal
  12. Granovetter (1973). The strength of weak ties.
  13. Grueter & van Schaik (2009). Sexual size dimorphism in Asian colobines revisited
  14. Grueter & van Schaik (2009). Evolutionary determinants of modular societies in colobines
  15. Grueter et al (2012). Evolution of Multilevel Social Systems in Nonhuman Primates and Humans. 
  16. Grueter et al (2015).  Are badges of status adaptive in large complex primate groups?
  17. Grueter et al (2017). Multilevel societies
  18. Grueter et al (2020). Multilevel Organisation of Animal Sociality
  19. Holt & McPeek (1996). Chaotic population dynamics favors the evolution of dispersal.
  20. Huang et al (2017). Male Dispersal Pattern in Golden Snub-nosed Monkey (Rhinopithecus roxellana) in Qinling Mountains and its Conservation Implication
  21. Jolly et al (2020). Philopatry at the frontier: A demographically driven scenario for the evolution of multilevel societies in baboons (Papio)
  22. Kappeler & van Schaik (2001). Evolution of primate social systems
  23. Kappeler (2014). Lemur behaviour informs the evolution of social monogamy
  24. Kirkpatrick & Grueter (2010). Snub-nosed Monkeys: Multilevel Societies across varied environments
  25. Kudo & Dunbar (2001). Neocortex size and social network size in primates
  26. Layton et al (2012). Antiquity and Social Functions of Multilevel Social Organization Among Human Hunter-Gatherers
  27. Lukas & Clutton-Brock (2013). The evolution of social monogamy in mammals.
  28. Matsuda et al (2012). Comparisons of Intraunit Relationships in Nonhuman Primates Living in Multilevel Social Systems
  29. Miller et al (2014). Diet and Use of Fallback Foods by Rwenzori Black-and-White Colobus (Colobus angolensis ruwenzori) in Rwanda: Implications for Supergroup Formation
  30. Mirville et al (2018). Low familiarity and similar ‘group strength’ between opponents increase the intensity of intergroup interactions in mountain gorillas (Gorilla beringei beringei)
  31. Opie et al (2013). Male infanticide leads to social monogamy in primates.
  32. Petersdorf et al (2019). Sexual selection in the Kinda baboon
  33. Pines et al (2011). Alternative Routes to the Leader Male Role in a Multi-Level Society: Follower vs. Solitary Male Strategies and Outcomes in Hamadryas Baboons
  34. Pisor & Surbeck (2019). The evolution of intergroup tolerance in nonhuman primates and humans
  35. Qi et al (2014). Satellite telemetry and social modeling offer new insights into the origin of primate multilevel societies
  36. Qi et al (2017). Male cooperation for breeding opportunities
  37. Qi et al (2020). Multilevel societies facilitate infanticide avoidance through increased extrapair matings
  38. Saramaki et al (2014). Persistence of social signatures in human communication
  39. Shultz et al (2011). Stepwise evolution of stable sociality in primates 
  40. Silk (2002). Using the ‘f’-word in primatology. 
  41. Stead & Teichroeb (2019). A multi-level society comprised of one-male and multi-male core units in an African colobine (Colobus angolensis ruwenzorii)
  42. Snyder-Mackler et al (2011). Defining Higher Levels in the Multilevel Societies of Geladas (Theropithecus gelada)
  43. Sutcliffe et al (2016). Modelling the Evolution of Social Structure
  44. Swedell & Schreier (2009). Male Aggression towards Females in Hamadryas Baboons: Conditioning, Coercion, and Control
  45. Swedell et al (2011). Female ‘‘Dispersal’’ in Hamadryas Baboons: Transfer Among Social Units in a Multilevel Society
  46. Xiang et al (2014). Males collectively defend their one‐male units against bachelor males in a multi‐level primate society

[Excerpt] Collapse of Supermind

Part Of: Culture sequence
Excerpt From: Henrich (2015). Secret of Our Success
Content Summary: 1000 words, 5 min read

The loss of adaptive cultural information can result from two different processes. The first is what we saw happen to the Polar Inuit: a random shock (an epidemic) happened to strike the most knowledgeable members of the community, wiping out a chunk of their cultural know-how. Because it removed the kayak (their transportation) from their cultural repertoire and they were already geographically very isolated, they went into a slow downward spiral as their populations dwindled due to the technological losses. This kind of phenomenon may be common.

The other process is more subtle and I suspect more important. If someone is copying the techniques and practices of a highly skilled and knowledgeable expert, they will often end up with a level of skill or knowledge that is less than that of the expert they are copying. The reason is that some information is lost every generation, because copies are usually worse than the originals. Cumulative cultural evolution has to fight against this force and it is best able to do so in larger populations that are highly socially interconnected. The key is that most individuals end up worse than the models they are learning from. However, a few individuals, whether by luck, fierce practice, or intentional invention, end up better than their teachers.

Larger populations can overcome the inherent loss of information in cultural transmission because if more individuals are trying to learn something, there’s a better chance that someone will end up with knowledge or skills that are at least as good as, or better than, those of the model they are learning from. Interconnectedness is important because it means more individuals have a chance to access the most skilled or successful models, and thereby have a chance to exceed them. 
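
The population-size argument above can be sketched as a toy simulation. This is a simplified stand-in, not Henrich's published model: the Gaussian noise, the `loss` and `noise` values, and the function name `simulate` are all assumptions chosen for illustration. Each generation, every learner copies the current best model, lands on average `loss` units below it, and the new best learner becomes the next model.

```python
import random

def simulate(pop_size, generations=50, loss=2.0, noise=1.0, seed=0):
    """Track the skill of the best individual across generations.

    Hypothetical toy model: every learner copies the current best model,
    ends up on average `loss` units worse, with Gaussian luck of scale `noise`.
    """
    rng = random.Random(seed)
    best = 0.0  # skill of the initial expert (arbitrary units)
    for _ in range(generations):
        learners = [rng.gauss(best - loss, noise) for _ in range(pop_size)]
        best = max(learners)  # next generation copies the most skilled learner
    return best

small = simulate(pop_size=5)    # an isolated band
large = simulate(pop_size=500)  # a large, interconnected population
print(f"final skill, 5 learners:   {small:.1f}")
print(f"final skill, 500 learners: {large:.1f}")
```

Under these assumed parameters the small population's skill drifts downward while the large one's ratchets upward, because only a large pool of learners reliably contains someone whose lucky draw exceeds the average copying loss.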

The Tasmanian Effect

These ideas first struck me when I was reading about the aboriginal inhabitants of the island of Tasmania. Roughly four-fifths the size of Ireland and comparable to Sri Lanka, Tasmania lies about 200 kilometers south of Victoria, Australia. When the earliest European explorers made contact with the Tasmanians in the late eighteenth century, they discovered a population of hunter-gatherers equipped with the simplest toolkit of any society ever encountered (by Europeans). To hunt and fight, men used only a one-piece spear, rocks, and throwing clubs. For watercraft, the Tasmanians relied on leaky reed rafts and lacked paddles. To ford rivers, women would swim the raft across, towing their husbands and offspring. In the cool maritime climate, Tasmanians slung wallaby skins over their shoulders and applied grease to their exposed skin. Curiously, the Tasmanians did not catch or eat any fish, despite fish being plentiful around the island. They drank from skulls and may even have lost the ability to make fire. In all, the Tasmanian toolkit consisted of only about twenty-four items.

To put this simplicity into perspective, let’s consider the Pama-Nyungan-speaking Aborigines who were contemporary with the Tasmanians in the eighteenth century, and lived just across the Bass Strait in Victoria. These Aborigines possessed the entire Tasmanian toolkit plus hundreds of additional specialized tools, including a fine array of bone tools, leisters, spear throwers, boomerangs, mounted adzes (for woodworking), many multipart tools, a variety of nets for birds, fish, and wallabies, sewn-bark canoes with paddles, string bags, ground-edge axes, and wooden bowls for drinking. For clothing, rather than draped wallaby skins and grease, the Aborigines wrapped themselves in snugly fitting possum-skin cloaks, sewn and tailored with bone awls and needles. For fishing, the aboriginal populations used shellfish hooks, nets, traps, and fishing spears. Somehow, the Tasmanians ended up with a much simpler toolkit than did their contemporary cousins just across the Bass Strait.

The Tasmanian toolkit is simple even when compared to that of many ancient Paleolithic societies. The archeological record from many parts of the world, going back tens and even hundreds of thousands of years, reveals the emergence of more-complex toolkits than those possessed by the Tasmanians at the time of European contact. Tasmanian stone tools are much cruder than many of the tools found in Europe made by many Neanderthals. The Tasmanians lacked bone tools, yet elsewhere finely crafted bone harpoon points date to at least 89,000 years ago. Similarly, stone points for spears date to a half a million years ago, well before the emergence of our species. Hafted tools date back before the origins of our species, 200,000 years ago. 

Slow-Motion Collapse

The puzzle deepens when we realize that Tasmania was connected to the rest of Australia until about 12,000 years ago. As the seas rose, the Bass Strait flooded and transformed Tasmania from an Australian peninsula to an island. Until this isolation, the archaeological remains left by Tasmanians cannot be distinguished in terms of complexity from those found in Australia. With their isolation, Tasmanians began to lose complex tools. The number of bone tools gradually dwindled until about 3,500 years ago, when they vanished entirely. As evidenced by fish bones, at least some ancient Tasmanian groups probably relied heavily on fish. But, gradually, fish dwindle and disappear from the record. When Captain Cook’s men offered freshly caught fish from the bountiful waters around the island in 1777, the Tasmanians reacted with disgust; yet, they gladly took and ate the bread Cook offered.

By isolating Tasmanians for eight to ten millennia, the rising seas cut them off from the vast social networks of Australia, suddenly shrinking their collective brains (supermind). A gradual loss of their most complex and difficult to learn skills and technologies ensued. 

Supermind Over Mind

In class, I show my undergraduates unlabeled pictures of four different stone toolkits from (1) eighteenth-century Tasmanians, (2) seventeenth-century Australian Aborigines, (3) Neanderthals, and (4) late Paleolithic modern-looking humans (30,000 years ago). I ask them to assess the cognitive abilities of the toolmakers by looking at the tools. My students always rate the Tasmanians and Neanderthals as less cognitively sophisticated than both the seventeenth-century Australian Aborigines and the late Paleolithic toolmakers. Of course, there’s no reason to suspect any innate cognitive differences between Tasmanians and Aborigines, who only became separate populations after the Bass Strait flooded. 

Because most Neanderthal groups possess a toolkit substantially less complex than the more modern-looking African intruders (our ancestors), the assumption has often been that Neanderthals suffered some innate cognitive deficits. 

In primates, the strongest predictor of cognitive abilities across species is overall brain size. Consequently, it’s not implausible that we were dumber than the bigger-brained Neanderthals. However, our ancestors had larger collective brains capable of generating greater cumulative cultural evolution.

Works Cited

  • Gott (2002). Fire making in Tasmania: Absence of evidence is not evidence
  • Henrich (2004). Demography and cultural evolution: Why adaptive cultural processes produced maladaptive losses in Tasmania
  • Rivers “The Disappearance of Useful Arts”
  • Jones (1977). The Tasmanian Paradox.
  • Jones (1995). Tasmanian Archaeology: Establishing the Sequences

The Evolution of God

Part Of: Religion sequence
Followup To: The cognitive basis of theism, Supernatural Punishment Hypothesis
Content Summary: 2900 words, 15 min read

NRMs and Cultural Group Evolution

In 2015, 4.11 billion people (56%) participated in the Abrahamic religious traditions (Judaism, Christianity, Islam). A further 1.6 billion people (22%) participated in karmic religions (Hinduism, Buddhism). 16% of people were unaffiliated, most of them in China. Only 7% of the world adhered to other faiths.

These five religions, the world religions, lay claim to 92% of the faithful. But the number of other religions is in the tens of thousands. On average, three new religious movements (NRMs) sprout every day, and these groups typically last 25 years (Sosis 2000). Yet some achieve striking growth. For example, the Mormon church has seen an explosive growth of 40% per decade, roughly equal to the growth rate of the Christian church in the Roman Empire. Who knows what religious life will look like, a century from now?

The religious landscape reverberates with a Darwinian restlessness. It is easy to miss this dynamic process, because a few outlier religious movements have cornered the cultural marketplace. But their monopoly is not guaranteed.

Two Interlocking Mysteries

Consider the following mysteries:

  1. Foraging tribes had a soft ceiling: groups had difficulty sustaining more than 200 people before fissions occurred. Yet farmers managed to live in much larger groups. How?
  2. The gods of hunter-gatherers tended to be weak, whimsical, and not particularly moral. Yet the gods of agriculturalists became interested in certain kinds of human actions, and slowly developed greater powers to both monitor adherents (e.g., omniscience) and punish improper behavior (hell or karma). Why?

What is the relationship between these two questions? And why does social success correlate so strongly with religious morality?

Many anthropologists look at economic and political developments to explain the social success of farming communities. On this cultural ornament theory of religion, theistic beliefs are causally inert practices: bringing solace and meaning, but disconnected from social and political outcomes.

But we have already seen evidence that religions can generate prosocial behavior. And we also saw that perceptions of supernatural monitoring & punishment can drive this effect. It is natural to suppose that increasingly powerful and morally concerned deities enhance these effects.

Why did this happen? Recall the golden rule of cultural group selection:

Within-group cooperation is essential to promote between-group competition. 

Organized religion was the fuel that powered the social complexification of the Neolithic. This is the Big Gods Hypothesis (Norenzayan 2015). 

Göbekli Tepe is the site of the world’s oldest temple (8000 BCE), which recent findings reveal housed a skull cult (Gresky et al 2017). The site seems to have been built by foragers, which is an astounding feat of coordination, especially since its construction would have required some 500 people working in tandem. The site attracted worshippers from up to 90 miles away, and the traffic it generated plausibly smoothed the way for the first cities. More significantly, the site is 20 miles away from Karaca Dağ, the site where wheat was domesticated for the first time. Would the Neolithic Agricultural Revolution have been possible without the advent of organized religion?

While not conclusive, Göbekli Tepe is precisely the kind of site we would expect to find if religious innovations were to precede social ratcheting events. But the Big Gods hypothesis is also consistent with religious innovations occurring afterwards. Do megasocieties cause (protective) Big Gods, or do Big Gods enable megasocieties? While Whitehouse et al (2021) argue the former, Beheim et al (2021) argue the latter.

Big Gods & Societal Outcomes

How do we know that moralizing religion played a role in sociopolitical development? The first place to look would be at large historical & anthropological datasets, gleaned from the early qualitative ethnographies.

  • Seshat: The Global History Databank
  • Ethnographic Atlas (EA), which has data for 1267 societies, and the Standard Cross-Cultural Sample (SCCS), a 186-society sample of the EA
  • Human Relations Area Files (HRAF)

Swanson (1960) was the first person to investigate what he termed Moralizing High Gods (MHG) in the EA. In a more recent analysis, Johnson (2005) found in the 186-society SCCS that belief in moralizing high gods was highly correlated with ten different measures of cooperation, including community size, policing, norm compliance, and food sharing. These analyses arguably suffer from Galton’s Problem: neighboring or historically related societies are treated as independent data points when they are not.

A recent attempt to work around this problem is provided by comparative phylogenetics. Linguists have long known that the similarities between languages can be quantified as a tree. As social groups fission, migrate, and lose contact, their language systems slowly diverge.

Darwin himself used these findings to inform his theory of common descent. Just as genetic cladograms reveal the secrets of natural selection, language trees can divulge the secrets of cultural group selection. More specifically, Watts et al (2015) used language phylogenies to reverse engineer the historical relationships between 96 Austronesian cultures. In this historically-sensitive dataset, the relationship between supernatural punishment and high political complexity is striking.

So far, we’ve discussed how moralizing gods tend to appear more in agricultural societies than foraging societies. But other modes of subsistence also exist. Peoples & Marlowe (2012) found that pastoralism (living off of domesticated animals) is even more strongly associated with moralizing high gods. Because pastoralists are more warlike and more unequal than agriculturalists, the need for religion-induced cohesion is even more pronounced.

Big Gods & Individual Cooperation

We’ve seen societal outcomes correlated with big gods. What about individual behavior? Are believers more likely to cooperate with one another, for the betterment of their group?

The following research makes heavy use of three games:

  1. Public Goods Game (PGG) explores questions of cooperation
  2. Ultimatum Game (UG) explores questions of fairness
  3. Random Allocation Game (RAG) explores cheating behaviors
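For readers who have not met these games, their payoff rules can be sketched in a few lines of Python. This is an illustrative sketch, not the protocol of any study cited here: the endowment, multiplier, acceptance threshold, and coin counts are assumed values, and the function names are hypothetical. In the usual version of the RAG, a participant privately rolls a die to decide which of two cups receives each coin, so cheating can only be detected in aggregate, by comparing allocations with what chance would produce.

```python
from math import comb

def public_goods_payoffs(contributions, endowment=10, multiplier=1.5):
    """PGG: contributions are pooled, multiplied, and split equally."""
    share = multiplier * sum(contributions) / len(contributions)
    return [endowment - c + share for c in contributions]

def ultimatum_payoffs(stake, offer, min_acceptable):
    """UG: the responder rejects low offers, leaving both players with nothing."""
    if offer >= min_acceptable:
        return stake - offer, offer
    return 0, 0

def rag_tail_probability(coins_to_favored, total_coins):
    """RAG: chance that honest (50/50) play puts at least this many coins
    in the cup the player favors. Small values suggest rule-bending."""
    tail = sum(comb(total_coins, k)
               for k in range(coins_to_favored, total_coins + 1))
    return tail / 2 ** total_coins

# Three cooperators and one free rider: the free rider earns the most.
print(public_goods_payoffs([10, 10, 10, 0]))

# A responder who demands at least 30 rejects an offer of 20 out of 100.
print(ultimatum_payoffs(100, 20, 30))

# Probability of 22 or more of 30 coins landing in the favored cup by chance.
print(rag_tail_probability(22, 30))
```

The free-rider result is what makes the PGG a test of cooperation, and the costly rejection is what makes the UG a test of fairness norms.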

We’ve seen prosocial religions occur in larger societies. Henrich et al (2010) took these economic games (including the Ultimatum Game) to fifteen societies that span the gamut of human existence. People who lived in very large communities (5000+ people) were much more willing to punish resource distributions that were unfair (not $50 each). 

These games were conducted with villagers playing one another. Thus, these games were informed by relationships and reputation. But increasing community size requires cooperation among strangers. Did moralizing gods help bind complex multi-ethnic empires at vast geographic scales? 

To get insight into this question, Purzycki et al (2016) examined whether moralizing gods improved cooperation with strangers who share your faith. In a sample of 2228 participants from 15 populations, they found evidence of such cooperation. They also found that, as beliefs in moralistic punishment intensified, so too did cooperative behavior.

Doesn’t this discredit our facultative religious prosociality hypothesis, put forward in Supernatural Punishment Hypothesis? There, we discussed that believers don’t behave more cooperatively than non-believers on average; they only do so when their religious practices are primed. This is measured in, among other things, willingness to punish unfair behavior in economic games.

The research we reviewed then, however, applies only to WEIRD societies. In many developing nations, participation in world religion does consistently produce a cooperative advantage. Herrmann et al (2008) found that the rule of law and strong norms of civic cooperation are responsible for “raising the sanity waterline”, and equalizing the belief advantage in many developed secular societies.

Competition Boosters

Henrich (2015) describes five basic mechanisms of intergroup competition:

  1. Group survival w/o conflict
  2. War & Raiding
  3. Differential migration
  4. Prestige-based group transmission
  5. Differential reproduction

If religious practices evolved to boost said competition, we should expect to see their signature on all of these mechanisms. And that is precisely what we find.

First, let’s consider group longevity. Sosis (2000) examined data from 112 secular and 88 religious utopian communes. He found that, after controlling for commune size, year of founding, etc., religious communes were four times more likely to survive in any given year.

Second, consider direct competition. We’ve already seen that world religions promote cooperation towards both local and distant co-religionists. What about those outside the faith? Norenzayan et al (2016) predicted that:

  1. In societies with lots of intergroup conflict, co-religionists should be hostile to outgroup members (to facilitate direct competition)
  2. In societies with less conflict, co-religionists should be more generous to outgroup members (to gain converts)

Lang et al (2019) confirmed this prediction. Less competition leads religionists to behave altruistically even to the outgroup.  

Religious ritual also seems to play a role in support for suicide attacks (and, I would predict, also support for hawkish foreign policy). Ginges et al (2009) found that participation in religious services predicted support for terrorist attacks in both Israeli and Palestinian subjects. This relationship is easy to explain if we conceive of religion as subject to cultural group selection. But they also found that frequency of prayer was statistically unrelated to this attitude (again hinting at separate mechanisms within the complex adaptive system we call “religion”).

Third, world religions invest much more heavily in conversion tactics. 

Finally, modern religions have strongly pro-natalist ideologies; the various stances of the Catholic Church on contraception, abortion, and the centrality of family are but one example.

Karma & Punishment Reification

Religion is about more than just the gods and morality. Arguably more central to religion is ritual, which also comes in secular varieties (marriage, inauguration ceremonies, etc). Other subprocesses of religion include taboo, mythology, sanctity, authority, and autobiographical meaning (Sosis 2019). These eight processes have different effects, and different historical trajectories. 

For example, Whitehouse’s (2004) mode theory describes two kinds of ritual: episodic rituals (infrequent, often dysphoric events) and doctrinal rituals (frequent, more emotionally neutral gatherings).

Episodic rituals were used by hunter-gatherers, but as farming groups became larger, rituals became increasingly doctrinal. These two forms had different psychological effects: episodic rituals promoted identity fusion (and with it, a willingness to engage in extreme acts of self-sacrifice), whereas doctrinal rituals facilitated a more conceptual sense of group identification (Whitehouse et al 2014).

We’ve spent a lot of time discussing the cognitive basis of theism, and how the fluency of disembodied-mind concepts can be exploited to promote prosocial behavior. But the above discussion of ritual shows there are many other “cracks in our psychological armor” that religious processes can exploit, driven by the eternal flame of group competition.

The Abrahamic religions use God-beliefs to promote prosociality. What do we make of karmic religions? Consider Sarkissian (2015)’s discussion of Chinese civilization:

Shang Di (Lord on High) plays the role of a supernatural monitor during the Shang Dynasty (1600–1046 BCE)… However, after the conquest of the Shang (1045 BCE), Shang Di is replaced by the notion of Tian (or heaven). Tian appears, at the outset, as the same entity as the former Shang Di. Yet Tian slowly loses its anthropomorphic characteristics and becomes less interested in human affairs. Eventually, Tian is significantly naturalized, taken to refer as much to the patterns and propensities inherent in the natural world as to any deity. 

We know that karmic religions (including Buddhism and Hinduism) promote prosociality. But how? How does karma promote prosociality?

White et al (2020) present the karma personification hypothesis, that (contra theological correctness), karma has been personified in the practiced religion of believers, and leverages the same sense of social monitoring. They adduce evidence that, for some believers some of the time, karma is conceived in this way.

But alternatives exist. Consider Nietzsche (1887):

What actually arouses indignation over suffering is not the suffering itself, but the senselessness of suffering: but neither for the Christian, who saw in suffering a whole, hidden machinery of salvation, nor for naïve man in ancient times, who saw all suffering in relation to spectators or to instigators of suffering, was there any such senseless suffering. In order to rid the world of unseen suffering, people were then practically obliged to invent gods and intermediate beings at every level, in short, something that would not miss out on an interesting spectacle of pain so easily.

Nietzsche is putting his finger on Belief in a Just World (BJW) (Hafer & Rubel 2015) and the related phenomenon of immanent justice reasoning (Callan et al 2014). Since punishment often follows social transgressions, people seem to attribute random misfortune to moral failings (victim blaming!): 

People drawing causal links between random misfortune and their own and others’ moral shortcomings, such as a boy who believed he failed an eye examination because he glanced through a Playboy magazine 3 days earlier, or a man who believed his friend became paraplegic because he was a “cocky guy” and “never really took the time to worry about the people who couldn’t keep up” 

Social psychology research has revealed that these intuitions are innate, and thus expressed universally. A few examples:

It may be that White et al (2020) are correct, and that karmic religions also work through supernatural monitoring. But I suspect the answer lies elsewhere: the cognitive basis of karmic religions is simply different. All human beings are subject to immanent justice and mind-body dualistic intuitions; cultural winnowing forces simply elaborate these intuitions differently, based on circumstance.

Supernatural monitoring and immanent justice reasoning have one thing in common: punishment. In contrast to the one-substrate MHG hypothesis, let’s call this two-substrate view the Broad Supernatural Punishment (BSP) hypothesis. For an excellent discussion of the relative virtues of these theories, see Raffield et al (2019).

Three Schools

The arguments of this sequence were heavily influenced by Norenzayan (2015) and Norenzayan et al (2016), with a few divergences (especially in the Karma section above). But I want to note other theories on offer. In fact, evolutionary accounts of religion can be organized into three camps:

  1. By-Product theories are often championed by cognitive scientists. These theories focus on religious universals. 
  2. Adaptationist theories are often championed by biologists. These theories tend to focus on social impacts of religion. 
  3. Cultural Evolutionary theories are often championed by anthropologists. These theories tend to focus on religious diversity.

Given their different backgrounds, these different schools tend to discover different, complementary facets of religious phenomena. Modern theories tend to be more synthetic, drawing evidence across these traditional lines. The cultural group selection model presented here, for example, is happy to build on psychological data from by-product theorists (see Cognitive Basis of Theism) and sociological data from adaptationists (see The Supernatural Punishment Hypothesis). 

In fact, as mentioned above, there is an adaptationist version of the cultural evolutionary Big Gods hypothesis put forward here. Johnson & Kruger (2004) first put forward what they termed the Supernatural Punishment hypothesis, as an adaptationist theory of religion. Drawing on quite similar datasets, the Supernatural Punishment hypothesis is comfortable interpreting Holocene-era religious developments (e.g., the Axial Age) as being non-genetic. But Johnson also avows that a genetic adaptation kickstarted our earliest religious behaviors in the Paleolithic.

All of which raises the question, do apes possess intuitions about mind-body dualism? Do they engage in magical thinking? What about immanent justice reasoning? I am not aware of much research on this question to date.


If you had been born before the Neolithic Revolution, you would have almost certainly worshipped weak, whimsical, and not particularly moral deities. Everyone did.

Yet over time, God evolved to become interested in certain kinds of human behavior, and slowly developed both greater powers to monitor adherents (e.g., omniscience) and harsher punishments for improper behavior (hell, or eons of karma).

Why did religion become entwined with morality? Because it increasingly played a role as a glue that stabilized communities.

Intuitions about disembodied minds, along with other “cracks in our psychological armor”, were mined by religion for their ability to induce accountability & fear of punishment. All of this was done in service to the eternal flame of group competition.

Until next time.


Two Interlocking Mysteries

  • Beheim et al (2021). Treatment of missing data determines conclusions regarding moralizing gods.
  • Roes & Raymond (2003). Belief in moralizing gods.
  • Norenzayan (2015). Big Gods: How Religion Transformed Cooperation and Conflict
  • Gresky et al (2017). Modified human crania from Göbekli Tepe provide evidence for a new form of Neolithic skull cult
  • Whitehouse et al (2021). Big Gods did not drive the rise of big societies throughout world history.

Anthropological Findings

  • Johnson (2005). God’s Punishment and Public Goods: A Test of the Supernatural Punishment Hypothesis in 186 World Cultures 
  • Peoples & Marlowe (2012). Subsistence and the Evolution of Religion 
  • Swanson (1960). The Birth of the Gods
  • Turchin et al (2019). An Introduction to Seshat: Global History Databank 
  • Watts et al (2015). Broad supernatural punishment but not moralizing high gods precede the evolution of political complexity in Austronesia

Psychological Findings

  • Herrmann et al (2008). Antisocial Punishment Across Societies
  • Henrich et al (2010). Markets, Religion, Community Size, and the Evolution of Fairness and Punishment
  • Henrich (2015). Culture and social behavior 
  • Norenzayan et al (2016). The cultural evolution of prosocial religions. 
  • Purzycki et al (2016). Moralistic gods, supernatural punishment and the expansion of human sociality

Competitive Boosters

  • Ginges et al (2009). Religion and Support for Suicide Attacks
  • Henrich (2015). The Secret of Our Success: How Culture Is Driving Human Evolution, Domesticating Our Species, and Making Us Smarter
  • Lang et al (2019). Moralizing gods, impartiality and religious parochialism across 15 societies. 
  • Sosis (2000). Religion and Intragroup Cooperation: Preliminary Results of a Comparative Analysis of Utopian Communities

The Role of Karma

  • Atkinson & Whitehouse (2011). The cultural morphospace of ritual form: Examining modes of religiosity cross-culturally
  • Callan et al (2014). Immanent Justice Reasoning: Theory, Research, and Current Directions
  • Hafer & Rubel (2015). The Why and How of Defending Belief in a Just World 
  • Nietzsche (1887). Genealogy of Morals
  • Raffield et al (2019). Religious belief and cooperation: a view from Viking Age Scandinavia.
  • Sarkissian (2015). Supernatural, social, and self-monitoring in the scaling-up of Chinese civilization.
  • Sosis (2019). The building blocks of religious systems: Approaching religion as a complex adaptive system
  • White et al (2020). Cognitive pathways to belief in karma and belief in God
  • Whitehouse (2004). Modes of religiosity
  • Whitehouse et al (2014). The ties that bind us: Ritual, fusion, and identification

A Third Way

  • Johnson & Kruger (2004). The good of wrath: supernatural punishment and the evolution of cooperation.