The Last Bits are Deepest

Part Of: Machine Learning sequence
Content Summary: 2000 words, 10 min read
Excerpt 1 From: The Unreasonable Effectiveness of Recurrent Neural Networks
Excerpt 2 From: The Scaling Hypothesis

How Much Money Is It Worth?

A language model computes the probability of the next word, given the preceding tokens. Individual predictions are optimized using cross-entropy loss, which increases the weight given to the correct prediction. For example, this toy model fails to assign much probability to the word “train”, so its loss is 2.69.

In a real language model, probability mass is spread across all possible outputs – the entire vocabulary.  On the very first prediction, the model will judge every output equally likely. For a 10k word vocabulary, you would expect the cross-entropy loss to start at approximately -ln(1/10000) ≈ 9.21. From this starting point, loss will drop to some loss floor, e.g. 4 nats/token. 
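To make these two numbers concrete, here is a minimal sketch of the cross-entropy of a single prediction. The function name `cross_entropy_nats` is illustrative, and the probability of roughly 0.068 is back-computed from the quoted loss of 2.69 rather than read off the toy model itself.

```python
import math

def cross_entropy_nats(p_correct: float) -> float:
    """Loss for one prediction: negative log of the probability given to the correct token."""
    return -math.log(p_correct)

# A model that spreads probability uniformly over a 10k-word vocabulary.
vocab_size = 10_000
uniform_loss = cross_entropy_nats(1 / vocab_size)
print(round(uniform_loss, 2))  # 9.21

# A loss of 2.69 corresponds to giving the correct word about 6.8% probability.
p_train = math.exp(-2.69)
print(round(p_train, 3))  # 0.068
```

Because the loss is a logarithm, halving it does not mean "twice as good": moving from 9.21 to 4 nats/token means the correct word goes from a 1-in-10,000 guess to roughly a 1-in-55 guess.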

The loss floor can change as a function of dataset size, model complexity, and available compute. With the discovery of neural scaling laws, we can predict the loss that a given compute budget can achieve. Research-project-sized budgets (e.g., $10k) can achieve a loss floor of 2.2. But in the era of large language models (LLMs), we have spent $400m to achieve a loss of 1.2. In contrast, humans can only achieve a loss of 1.7 on familiar texts.
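Real scaling laws are fitted to many training runs. Purely to illustrate the idea, one could draw a power law through the two budget/loss pairs quoted above. The functional form (loss = a * cost^(-b), with no irreducible-loss term) and the use of dollars as a stand-in for compute are simplifying assumptions, so the outputs are a toy interpolation, not a prediction.

```python
import math

# The two (budget in dollars, loss in nats/token) pairs quoted in the text.
c1, l1 = 1e4, 2.2
c2, l2 = 4e8, 1.2

# Assumed form: loss = a * cost ** (-b). Two points determine a and b.
b = math.log(l1 / l2) / math.log(c2 / c1)
a = l1 * c1 ** b

def predicted_loss(cost: float) -> float:
    """Loss implied by the toy two-point power law (illustrative only)."""
    return a * cost ** (-b)

print(round(b, 3))                    # 0.057
print(round(predicted_loss(4e8), 2))  # 1.2

# Under this toy fit, each 10x increase in budget multiplies the loss by:
print(round(10 ** (-b), 3))
```

The small exponent is the point: under such a curve, every further reduction in loss requires a multiplicative increase in spending.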

But what exactly is gained from building a model with 0.5 fewer nats per token (NPT)? Is 1.2 NPT really worth the $400m it takes to train it? 

To answer this, let’s explore what a model learns across different loss regimes.

Excerpt 1: Sampling Char-RNNs Output

This passage comes from: The Unreasonable Effectiveness of Recurrent Neural Networks

It’s fun to look at how the sampled text evolves while the model trains. For example, I trained an LSTM of Leo Tolstoy’s War and Peace and then generated samples every 100 iterations of training. At iteration 100 the model samples random jumbles:

tyntd-iafhatawiaoihrdemot  lytdws  e ,tfti, astai f ogoh eoase rrranbyne ‘nhthnee e plia tklrgd t o idoe ns,smtt   h ne etie h,hregtrs nigtike,aoaenns lng

However, notice that at least it is starting to get an idea about words separated by spaces. Except sometimes it inserts two spaces. It also doesn’t know that comma is almost always followed by a space. At 300 iterations we see that the model starts to get an idea about quotes and periods:

“Tmont thithey” fomesscerliund Keushey. Thom here sheulke, anmerenith ol sivh I lalterthend Bleipile shuwy fil on aseterlome coaniogennc Phe lism thond hon at. MeiDimorotion in ther thize.”

The words are now also separated with spaces and the model starts to get the idea about periods at the end of a sentence. At iteration 500:

we counter. He stutn co des. His stanted out one ofler that concossions and was to gearang reay Jotrets and with fre colt otf paitt thin wall. Which das stimn 

The model has now learned to spell the shortest and most common words such as “we”, “He”, “His”, “Which”, “and”, etc. At iteration 700 we’re starting to see more and more English-like text emerge:

Aftair fall unsuch that the hall for Prince Velzonski’s that me of her hearly, and behs to so arwage fiving were to it beloge, pavu say falling misfort how, and Gogition is so overelical and ofter.

At iteration 1200 we’re now seeing use of quotations and question/exclamation marks. Longer words have now been learned as well:

“Kite vouch!” he repeated by her door. “But I would be done and quarts, feeling, then, son is people….”

Until at last we start to get properly spelled words, quotations, names, and so on by about iteration 2000:

“Why do what that day,” replied Natasha, and wishing to himself the fact the princess, Princess Mary was easier, fed in had oftened him. Pierre aking his soul came to the packs and drove up his father-in-law women.

The picture that emerges is that the model first discovers the general word-space structure and then rapidly starts to learn the words; First starting with the short words and then eventually the longer ones. Topics and themes that span multiple words (and in general longer-term dependencies) start to emerge only much later.

Excerpt 2: Learning Trickles Up the Abstraction Hierarchy

This passage is modified from: The Scaling Hypothesis. I converted his bits per character (BPC) into nats per token (NPT) using a 4ln2 conversion factor. 
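The conversion can be sketched as follows. One bit equals ln 2 nats, and the factor of 4 amounts to assuming about four characters per token; the constant and function names here are my own.

```python
import math

CHARS_PER_TOKEN = 4  # assumption implied by the 4 ln 2 conversion factor

def bpc_to_npt(bits_per_char: float) -> float:
    """Convert bits per character into nats per token."""
    return bits_per_char * CHARS_PER_TOKEN * math.log(2)

def npt_to_bpc(nats_per_token: float) -> float:
    """Inverse conversion: nats per token back into bits per character."""
    return nats_per_token / (CHARS_PER_TOKEN * math.log(2))

print(round(bpc_to_npt(1.0), 2))  # 2.77
print(round(bpc_to_npt(8.0), 2))  # 22.18
print(round(npt_to_bpc(2.0), 2))  # 0.72
```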

Early on in training, a model learns the crudest levels: that some letters like ‘e’ are more frequent than others like ‘z’, that every 5 characters or so there is a space, and so on. It goes from predicted uniformly-distributed bytes to what looks like Base-60 encoding—alphanumeric gibberish. As crude as this may be, it’s enough to make quite a bit of absolute progress: a random predictor needs 8 bits to ‘predict’ a byte/character, but just by at least matching letter and space frequencies, it can almost halve its error. Because it is learning so much from every character, and because the learned frequencies are simple, it can happen so fast that if one is not logging samples frequently, one might not even observe the improvement.

As training progresses, the task becomes more difficult. Now it begins to learn what words actually exist and do not exist. It doesn’t know anything about meaning, but at least now when it’s asked to predict the second half of a word, it can actually do that to some degree, saving it a few more bits. This takes a while because any specific instance will show up only occasionally: a word may not appear in a dozen samples, and there are many thousands of words to learn. With some more work, it has learned that punctuation, pluralization, possessives are all things that exist. Put that together, and it may have progressed again, all the way down to 8 NPT! (While the progress is gratifyingly fast, it’s still all gibberish, though, make no mistake: a sample may be spelled correctly, but it doesn’t make even a bit of sense.)

But once a model has learned a good English vocabulary and correct formatting/spelling, what’s next? There’s not much juice left in predicting within-words. The next thing is picking up associations among words. What words tend to come first? What words ‘cluster’ and are often used nearby each other? Nautical terms tend to get used a lot with each other in sea stories, and likewise Bible passages, or American history Wikipedia article, and so on. If the word “Jefferson” is the last word, then “Washington” may not be far away, and it should hedge its bets on predicting that ‘W’ is the next character, and then if it shows up, go all-in on “ashington”. Such bag-of-words approaches still predict badly, but now we’re down to perhaps 7 NPT.

What next? Does it stop there? Not if there is enough data and the earlier stuff like learning English vocab doesn’t hem the model in by using up its learning ability. Gradually, other words like “President” or “general” or “after” begin to show the model subtle correlations: “Jefferson was President after…” With many such passages, the word “after” begins to serve a use in predicting the next word, and then the use can be broadened.

By this point, the loss is perhaps 6 NPT: every additional 0.1 decrease comes at a steeper cost and takes more time. However, now the sentences have started to make sense. A sentence like “Jefferson was President after Washington” does in fact mean something (and if occasionally we sample “Washington was President after Jefferson”, well, what do you expect from such an un-converged model). Jarring errors will immediately jostle us out of any illusion about the model’s understanding, and so training continues. (Around here, Markov chain & ngram models start to fall behind; they can memorize increasingly large chunks of the training corpus, but they can’t solve increasingly critical syntactic tasks like balancing parentheses or quotes, much less start to ascend from syntax to semantics.)

Now training is hard. Even subtler aspects of language must be modeled, such as keeping pronouns consistent. This is hard in part because the model’s errors are becoming rare, and because the relevant pieces of text are increasingly distant and ‘long-range’. As it makes progress, the absolute size of errors shrinks dramatically. Consider the case of associating names with gender pronouns: the difference between “Janelle ate some ice cream, because he likes sweet things like ice cream” and “Janelle ate some ice cream, because she likes sweet things like ice cream” is one no human could fail to notice, and yet, it is a difference of a single letter. If we compared two models, one of which didn’t understand gender pronouns at all and guessed ‘he’/‘she’ purely at random, and one which understood them perfectly and always guessed ‘she’, the second model would attain a lower average error of barely <0.05 NPT!
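The arithmetic behind that last claim can be sketched as follows. A coin-flip between two pronouns costs ln 2 nats on each pronoun token, a perfect guess costs nothing, and the gap is then diluted across every other token in the corpus. The pronoun rate used below (one gendered pronoun per 20 tokens) is a hypothetical figure chosen for illustration, not a measured one.

```python
import math

def average_gap_npt(pronoun_fraction: float) -> float:
    """Average per-token loss gap between a coin-flipping model and a perfect one."""
    loss_random = math.log(2)   # -ln(0.5) on each pronoun token
    loss_perfect = 0.0          # -ln(1.0)
    return pronoun_fraction * (loss_random - loss_perfect)

# Hypothetical rate: one gendered pronoun every 20 tokens.
print(round(average_gap_npt(1 / 20), 3))  # 0.035

# Largest pronoun fraction for which the gap stays under 0.05 NPT.
print(round(0.05 / math.log(2), 3))  # 0.072
```

So the gap stays below 0.05 NPT as long as fewer than about one token in fourteen is a gendered pronoun, which is why a capability that every reader notices barely registers in the average loss.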

Nevertheless, as training continues, these problems and more, like imitating genres, get solved, and eventually at a loss of 3-5 (where a small char-RNN might converge on a small corpus like Shakespeare or some Project Gutenberg ebooks), we will finally get samples that sound human—at least, for a few sentences. These final samples may convince us briefly, but, aside from issues like repetition loops, even with good samples, the errors accumulate: a sample will state that someone is “alive” and then 10 sentences later, use the word “dead”, or it will digress into an irrelevant argument instead of the expected next argument, or someone will do something physically improbable, or it may just continue for a while without seeming to get anywhere.

The pretraining thesis argues that this can go even further: we can compare this performance directly with humans doing the same objective task, who can achieve closer to 2 NPT. What is in that missing 1.0 NPT?

Well, everything! Everything that the model misses. While just babbling random words was good enough at the beginning, at the end, it needs to be able to reason its way through the most difficult textual scenarios requiring causality or commonsense reasoning. Every error where the model predicts that ice cream put in a freezer will “melt” rather than “freeze”, every case where the model can’t keep straight whether a person is alive or dead, every time that the model chooses a word that doesn’t help build somehow towards the ultimate conclusion of an ‘essay’, every time that it lacks the theory of mind to compress novel scenes describing the Machiavellian scheming of a dozen individuals at dinner jockeying for power as they talk, every use of logic or abstraction or instructions or Q&A where the model is befuddled and needs more bits to cover up for its mistake where a human would think, understand, and predict. For a language model, the truth is that which keeps on predicting well, because truth is one and error many. Each of these cognitive breakthroughs allows ever so slightly better prediction of a few relevant texts; nothing less than true understanding will suffice for ideal prediction.

If we trained a model which reached that loss of <2.0, which could predict text indistinguishable from a human, whether in a dialogue or quizzed about ice cream or being tested on SAT analogies or tutored in mathematics, if for every string the model did just as good a job of predicting the next character as you could do, how could we say that it doesn’t truly understand everything? (If nothing else, we could, by definition, replace humans in any kind of text-writing job!)

The last bits are deepest. The implication here is that the final few bits are the most valuable bits, which require the most of what we think of as intelligence. A helpful analogy here might be our actions: for the most part, all humans execute actions equally well. We all pick up a tea mug without dropping, and can lift our legs to walk down thousands of steps without falling even once. For everyday actions (the sort which make up most of a corpus), anybody, of any intelligence, can get enough practice & feedback to do them quite well, learning individual algorithms to solve each class of problems extremely well, in isolation. Meanwhile for rare problems, there may be too few instances to do any better than memorize the answer. In the middle of the spectrum are problems which are similar but not too similar to other problems; these are the sorts of problems which reward flexible meta-learning and generalization, and many intermediate problems may be necessary to elicit those capabilities.

[Excerpt] Syntax facilitates Predication

Part Of: Language sequence
See Also: Linear Grammar
Adapted From: Jean-Louis Dessalles (2007). Why We Talk
Content Summary: 1200 words, 6 min read

Linear Grammar as Crude Predication.

Protolanguage as a Vestige (Fossilized Competence). Bickerton sees protolanguage as a fossil, a behavioural vestige with which each of us is endowed. We are able, effortlessly and instantaneously, to adopt a pidgin form of speech, using words from our native language. Without the slightest reflexion, words come to us naturally, in an approximate order; we just spontaneously omit grammatical words, articles, prepositions, relative pronouns, markers of tense or aspect. In Bickerton’s view, this reveals the presence of a fossilized competence, an innate expertise which was once the normal form of communication among members of Homo erectus, the species from which our own derived. It is a vestige of their speech that survives in us and which we can fall back on at times when expression through normal speech is impossible.

The arbitrariness of sign. Ray Jackendoff suggests that the single-word stage represents a functional state of communication among our ancestors (Jackendoff 1999). He points out that a fundamental property of human words is that they are not attached to a particular situation, unlike the call of an animal which is. For instance, the cry that a chimpanzee utters to announce the presence of food will not be the one uttered to urge its fellows to go and fetch the food, whereas an infant will indiscriminately use the word cat or an equivalent of it to mark the presence of a cat, to enquire where the cat is, to call it, to indicate that something looks like a cat, and so on (Jackendoff 1999). The reason why some authors like Jackendoff or Deacon see the relaxing of the signified-signifier link as a decisive moment in the evolutionary history of language is no doubt that at one and the same time it originates ambiguity and semantics. Meaning ceases to be a simple reflex association and requires some cognitive processing. A system of communication in which every speech consists of one word and in which every word is essentially ambiguous becomes the simplest system using semantics.

Protolanguage evolved to report salience. Speakers belonging to the species from which we descend probably did as we do when we try to impress people by being the first to bring genuine news of salient situations when they arise. So M’s initial statement in the preceding example resembles the sort of thing that might have occasioned speech among our ancestors. This is a type of behaviour that each of us indulges in several times a day, and is no doubt one of the things which we share with our Homo erectus ancestors and perhaps also with their predecessors. The argument put forward in this chapter has been that protolanguage evolved in the service of this behaviour of reporting salient situations.

Syntax as Sophisticated Predication.

Any addressee hearing BREAD TABLE will visualize, given the context, a new loaf lying on the table. As protolanguage does not express a spatial relation between bread and table, that relation remains implicit. One feature of language is that it does express relations and properties: THE BREAD ON THE TABLE expresses a spatial relation between two entities; THE RUNNER WINS expresses that the property ‘winning’ applies to the runner. Relations and properties are generally represented by predicates. We will use the formulae On(Bread, Table) and Win(Runner) to represent the meanings of the two examples.

Protolanguage is not completely unhelpful when it comes to expressing predicates: to express Win(Runner) it is perfectly possible to say RUNNER WIN. It suffices to state the relations and properties and to express their arguments contiguously. This expressive power of protolanguage may suggest that predicative semantics appeared in the absence of syntax, contra Bickerton. But language is better adapted than protolanguage to the expression of predicates. The devices of syntax are more effective than those offered by protolanguage for dealing with predication.

The words of language, except proper nouns attached to definite entities, express predicates. Thus the word BOOK does not represent a definite entity in the perceived environment, but a property that entities in our environment may or may not possess. In contrast to the words of language, which express predicates, it can be said that the words of protolanguage behave more like proper nouns.

Predication as puzzle solving. Prepositions, complementizer phrases, or even common nouns make no direct reference. Through their expression of predication, they afford hearers a way to determine for themselves which entity is meant. As they do this, hearers are resolving a kind of equation. The phrase THE GREEN BOOK will make them seek in the perceived context for an entity x for which Book(x) & Green(x) = True. In this way, predication is used indirectly to make reference.
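The "equation" a hearer resolves can be sketched as a filter over the perceived context. The entities and their properties below are hypothetical, and `resolve` is an illustrative name rather than anything from the source.

```python
# Hypothetical perceived context: each entity is described by the predicates true of it.
context = {
    "e1": {"Book", "Green"},
    "e2": {"Book", "Red"},
    "e3": {"Cup", "Green"},
}

def resolve(*predicates: str) -> list[str]:
    """Return the entities x for which every given predicate holds."""
    return [x for x, props in context.items() if all(p in props for p in predicates)]

# THE GREEN BOOK: find x such that Book(x) & Green(x) = True.
print(resolve("Book", "Green"))  # ['e1']

# GREEN alone is not enough to single out one entity in this context.
print(resolve("Green"))          # ['e1', 'e3']
```

Neither predicate names the book directly; the reference succeeds only because their conjunction leaves a single candidate.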

Syntax, like Predication, is Recursive.

A mechanism of this sort leads inevitably to a recursive system. For example, in the sentence 

(1) Paul’s brother buys the book that John got from Jack’s sister, 

at least three levels of predication may be observed: Buy(x, y); Brother(x, Paul), Book(y), Get(John, y, z); and Sister(z, Jack). It is only the first level that constitutes an assertion. The later levels are used recursively for the determination of the arguments at the preceding levels. Semantic recursion is rather like a set of Matryoshka dolls: as long as there are more dolls inside a doll, it must be opened, since they all contain elements facilitating the determination of arguments. But predications are also like dolls: every predication contains arguments; every argument can go into a new predicate which helps determine it, and so on. This is a perspective that gives a rather fractal image of semantics, like a snow crystal which stays identical at all degrees of enlargement.

An alternative to syntax. A possible solution would be to use a number of variables to designate the shared arguments:

(1’)  x bought y; x is Paul’s brother; y is a book; John got y from z; z is the sister of Jack. 

This procedure consists of using unambiguous variables to identify the arguments of the predicates, while being sure to use the same name whenever the variable designates the same entity. These five predicates can be expressed in any order. 
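The variable-based alternative in (1′) behaves like a small conjunctive query: shared variable names tie the predicates together, and the order in which the predicates are stated does not matter. The sketch below illustrates this with a hypothetical set of facts (the names Tom, Ann, b1 and b2 are invented for the example) and a naive backtracking matcher.

```python
from itertools import permutations

# Hypothetical world: ground facts as (predicate, arg, ...) tuples.
facts = {
    ("Buy", "Tom", "b1"),
    ("Buy", "Ann", "b2"),
    ("Brother", "Tom", "Paul"),
    ("Book", "b1"),
    ("Book", "b2"),
    ("Get", "John", "b1", "Ann"),
    ("Sister", "Ann", "Jack"),
}

VARIABLES = {"x", "y", "z"}

def solve(query, binding=None):
    """Yield every variable binding that satisfies all patterns in the query."""
    binding = binding or {}
    if not query:
        yield binding
        return
    first, rest = query[0], query[1:]
    for fact in facts:
        if len(fact) != len(first):
            continue
        new = dict(binding)
        ok = True
        for pat, val in zip(first, fact):
            if pat in VARIABLES:
                # Bind the variable, or check it against its earlier binding.
                if new.setdefault(pat, val) != val:
                    ok = False
                    break
            elif pat != val:
                ok = False
                break
        if ok:
            yield from solve(rest, new)

# Sentence (1') as five predicates sharing the variables x, y, z.
sentence = [
    ("Buy", "x", "y"),
    ("Brother", "x", "Paul"),
    ("Book", "y"),
    ("Get", "John", "y", "z"),
    ("Sister", "z", "Jack"),
]

print(list(solve(sentence)))  # [{'x': 'Tom', 'y': 'b1', 'z': 'Ann'}]

# The same answer comes out whatever order the five predicates are stated in.
answers = {
    tuple(sorted(b.items()))
    for order in permutations(sentence)
    for b in solve(list(order))
}
print(len(answers))  # 1
```

The cost of this order-independence is visible in the query itself: every shared argument has to be named again wherever it recurs.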

Evolution might have endowed us with the ability to cope with an efficient system of variables, such as at (1′), capable of expressing the links between predicates. The fact is, though, that human beings do not spontaneously express themselves in that way. They use syntax based on the assembling of phrases. If we compare sentences (1′) with (1), the solution devised by evolution for the expressing of semantic relations does not seem the worst possible. The real problem with a system of variables is that it is bothersome and repetitious. And it is a problem that can be avoided by expressing semantic relations through the assembling of phrases.

Polity Upsweeps and Technological Shocks

Part Of: Politics sequence
Followup To: The Evolution of Social Structure
Content Summary: 2400 words, 12 min read

A Quick Overview

Last time, we introduced phase models of social structure. We started with five phases:

  • Bands (~20 people) with egalitarian relationships and ad-hoc ritual
  • Tribes (~200 people) which live in villages, sometimes led by Big Men, with calendric ritual.
  • Simple chiefdoms (~2k people) with a hereditary chief leading a two-tier settlement hierarchy amidst social inequality.
  • Complex chiefdoms (~20k people) with a hereditary chief leading a three-tier settlement hierarchy.
  • States (~200k people) with a monarch leading a four-tier settlement hierarchy, with power delegated to full-time specialist bureaucrats.

Polities also vary along the network vs corporate dimension. Corporate-oriented polities tend to be run by oligarchies, and emphasize cult; network-oriented polities tend to emphasize individual aggrandizement, and elaborate symbols of wealth and power.

Phase change can involve complexification or simplification (structural movement towards larger group sizes, or vice versa). Today we will explore cycling, with a particular society undergoing phase changes in both directions.  

Gradualists view phase change as a slow process driven by the accretion of sociocultural variables. The cycling data reviewed here provides indirect support for the punctuationist perspective. We will also explore more direct confirmation of this model with polity upsweeps driven by technological shocks.

Polity ratcheting is linked to the five channels which underlie intergroup competition. One of the most hotly-debated examples of complexification is pristine state formation. The functionalist vs coercive debate can be interpreted in terms of which channels played the dominant role.

Let’s dive in.

The Creation of Inequality

As of 2024, the top 1 percent own 43 percent of all global financial assets. As measured by the Gini coefficient, income inequality is getting worse.

Differences in intergenerational wealth transmission are an important driver of inequality. Foragers and horticulturalists show less intergenerational wealth transmission than polities with other modes of subsistence (Mulder et al 2009). What causes polities to make the jump from achievement-based societies with no inequality to rank-based societies with substantial inequality?

Insight comes from achievement-rank cycling, with a single ethnic group exhibiting different phases in different villages. Kachin villages in highland Burma transition between gumlao and gumsa (achievement- and rank-based) forms, with the appearance and disappearance of “thigh-eating chiefs”. This same cycling has been observed in the Manambu and Konyak Naga peoples (Flannery & Marcus 2012, Ch10).

In the early Neolithic, all villages had Big Men. But something was eating away at the egalitarian ethos of most villages. Hundreds of thousands of villages likely underwent achievement-rank cycling, back and forth, before eventually stabilizing in the ranked equilibrium. We are fortunate to have these three modern-day examples of cycling. These social archaeopteryx help us shed light on why hereditary inequality was born.

Aspiring Big Men will often go into debt trying to subsidize a ritual. Failure to achieve renown can often result in debt slavery. Perhaps this process writ large drives the creation of an underclass. But for the establishment of an elite “overclass”, lineage competition seems to lie at the root of hereditary inequality. When Kachin societies revert to gumlao, it is often because the subordinate lineages fight back.

For the Kachin, different facets of their cosmology were invoked to legitimize the egalitarian and hierarchical ethos. There are also hints of ideological takeover (high-status clans rewriting cosmology to legitimize their control of ritual), which we also see in the Avatip clan and in the lineage competition of Bears vs Spiders:

The Spiders argued that they were equal to the Bears in ritual authority but were never allowed to provide Oraibi with its headman. The Bears sought justification for their ritual preeminence in the legend of Matcito. The Kokop clan of Phratry VI sided with the Spiders. More and more clans began to choose sides in the dispute, and eventually half of Oraibi’s population picked up and moved to nearby Hotevilla.

Lineage competition predates the Neolithic. So why did it produce rank societies only in the past 10,000 years? Perhaps fission became less viable as global population density reached a certain level (social circumscription). Or perhaps resource variance changed enough to incentivize more direct competition.

The Emergence of Pristine States

Cycling is not confined to the achievement-rank boundary. Intrachiefdom cycling, the transition between simple and complex chiefdoms, is common in the archaeological record (Anderson 1994).

When chiefdoms come into contact with states, they often quickly transform into states themselves. When Gaul encountered Rome, it became a state within three centuries. But the very first pristine states had no examples to emulate. There are perhaps no more than six examples in human history (Spencer 2010). These occurred exclusively in agricultural societies, but millennia after the domestication of plants.

Two pristine states show evidence of chiefdom predecessors. In Mesoamerica, strong evidence suggests that the first state emerged after an extended period of chiefly cycling. The Moche have a four-tiered settlement hierarchy, and many other archeological signatures of statehood. In Upper Egypt 5400 years ago, just before the first kingdom, we see three competing rank societies. 

But evidence for chiefly predecessors for the other pristine states is less clear. Consider Mesopotamia. In the Ubaid period, polities in Southern Mesopotamia, Northern Mesopotamia, and Susiana were competing for influence. The astonishingly abrupt arrival of Uruk, likely caused by an emigration from Susiana, heralded the very first state in human history. Yet precious few indicators of inequality or chiefdoms have been unearthed from this period in Southern Mesopotamia. Later written documents attest to the existence of an oligarchy in the Early Dynastic period. Perhaps their society was a corporate-emphasis chiefdom in the Ubaid Period. 

Much ink has been spilled on the formation of pristine states. But a full theory of chiefdom-state cycling predicts that the first states were unstable and prone to regress to regional chiefdoms. Since writing was invented in several states, this process is extremely visible in human history. It goes by the name of civilizational collapse, which is more accurately understood as a reversion towards chiefdom.

But what causes these four-tiered settlement hierarchy, bureaucratic, temple-worshipping, literate societies to emerge in the first place?

Functionalist Theories of State Formation

Many functionalist theories tend to emphasize the benefits states provide. Economists distinguish between different types of goods: public goods (and public bads) are associated with positive externalities (and negative externalities). Consider air pollution. When a manufacturing plant emits large amounts of carbon, everyone suffers; yet the toll is not located on that firm’s balance sheet: a negative externality. In the Pigouvian theory, these situations produce market failures. In this view, the role of the state is to incentivize and generate public goods (in this case, by penalizing actions that pollute).

Specific theories emphasize particular goods. The hydraulic theory (Wittfogel 1957) argued that states evolved to build the irrigation projects required for intensive agriculture. Others emphasize risk management with palace granaries, or international trade with centralized coordination. Unfortunately, many of these theories are not empirically well-supported. For example, irrigation systems were built before the first states had emerged, and were local in scope.

Some economists deny that only states can generate public goods. The Coase Theorem holds that externalities can be internalized through Coasian bargaining, provided that transaction costs are sufficiently low. Some interpret this result to mean that the state is unnecessary; others see a role for government in the reduction of transaction costs. This research tradition tends to interpret current outcomes as optimal (bad policies lose because competitors avoid deadweight loss). But once you expand your gaze beyond voluntary exchange, it becomes easier to explain cases of apparently suboptimal policies (Boix 2015).

Coercive Theories of State Formation

A wide variety of coercive theories emphasize the role that violence has played in the formation of the state. Here is Turchin (2015):

War is the reason why big states emerged. No other explanation really makes sense. I don’t deny that large-scale social integration can also bring economic and information benefits, but the returns to scale in these aspects of social function are primarily relevant for modern societies in which war is less pervasive. Economic and informational challenges simply did not loom as large in prehistory as the existential challenge of battle. Besides, we have seen that war was the chief preoccupation, to the point of tedium, of archaic kings like Tiglath Pileser. We don’t find boastful inscriptions from Ashurnasirpal about trading networks or well-maintained irrigation systems. In their own official statements, the first kings were all about war. Shouldn’t we pay attention to what they tell us?

While elite-organized warfare is not frequently attested for foraging bands, it does occur frequently amongst chiefdoms. Chiefly warfare extracts tribute. But pristine states engage in territorial conquest.

Nearly all examples of pristine state formation occurred in regions with no escape. 

  • Fleeing Egypt is challenging because of the surrounding desert.
  • Fleeing southern Mesopotamia is difficult because of the Tigris and Euphrates rivers.
  • Fleeing Peru is difficult because of the Andes Mountains. 

Social circumscription plays a significant role in polity evolution; geographical circumscription also seems to accelerate state formation (Carneiro 1970). 

New Guinea farmers domesticated yams, whose caloric productivity rivals that of cereals. As an island, New Guinea is also subject to geographical circumscription. Why didn’t a state form there? Mayshar et al (2021) show how appropriability is a more accurate predictor of state formation than agriculture per se. Cereals can be preserved in granaries, their harvest is temporally constrained, and they have a high calorie-per-gram ratio (Scott 2017). Yams enjoy none of these properties, and are thus less appropriable.

Warfare is particularly pernicious in meta-ethnic frontiers, between groups with different languages or subsistence modes. Before centralized leadership, matriliny fosters ingroup cooperation to maximize outgroup competition (Jones 2011). The total warfare of such frontiers has an imperiogenic effect (Turchin 2015).

Nearly all the capitals of ancient empires are located not at the center of the polity, but near these meta-ethnic frontiers. Consider Egypt. Most Egyptian capitals emerged in Upper Egypt, near the Nubian frontier. When climate change removed the competition with Nubia, Egyptian polities never quite recovered.

Since the Neolithic Revolution, the steppe has been an extremely important frontier between agriculturalists and pastoralists. Pastoralist “barbarians” extract tribute from farming communities, and these states resist militarily by becoming empires. Distance to the steppe is an excellent proxy for warfare intensity, and along with agricultural duration explains most of the variance.

It seems clear that states provide public goods, and also use coercion to extract rents from their citizenry. Functionalist theories struggle to explain the latter, but stationary bandit theory (Olson 2010) can explain both.

Upsweeps and Technological Shocks

Everyone accepts the basic fact of polity ratcheting. But does polity size increase smoothly, or have there been brief periods of dramatic complexification (upsweeps)? The political science debate between gradualism and punctuationism mirrors Gould’s biological theory of punctuated equilibrium.

Many upsweeps are linked to technological shocks. Economic technology can increase the productivity or appropriability of a particular region, with social consequences. Here are a few case studies to illustrate economic shocks:

  • In Alaska at 500 CE, the drag float apparatus allowed for the safe hunting of whales (Sheehan 1985). These expeditions ultimately could only be conducted on four coastal ice-leads of northwestern Alaska.  Only these places, at this particular moment, evolved into complex, unequal polities with intense warfare (Boix 2015). 
  • On the northwest coast of North America at 500 CE, there was a switch from herring to salmon fishing. Before this time period, house sizes were very similar. Afterwards, a bimodal distribution of housing sizes revealed the emergence of an elite. Villages were moved to less convenient, more defensible locations.
  • In Ukraine in 3000 BCE, the wagon was introduced to the steppe, unlocking a new form of nomadic pastoralism. This likely occurred in conjunction with horse domestication. The resulting migration was so explosive that the Yamnaya language (proto-Indo-European) is today used by 3.5 billion people (Anthony 2007).

Weapons technology can also dramatically increase a state’s ability to coerce its neighbors. Many upsweeps are linked to military shocks:

  • When it arrived from the Bering Strait about 1,300 years ago, the Asian War Complex expanded down the Californian coast, leading to a major expansion of warfare (Lambert & Walker 1991). This shock also led to an increase in social inequality, as indexed by variance in residential housing.
  • The Akkadian and Old Kingdom upsweeps (2300 BCE) occurred soon after the mass production of bronze weapons (Boix 2015).
  • The Achaemenid upsweep (500 BCE) occurred in response to the Scythian advances in cavalry warfighting (Turchin et al 2021).
  • The Napoleonic Empire upsweep occurred soon after the advent of gunpowder weapons.
  • Heavy cavalry (large breeds, stirrup, saddle, knight-based tactics) arrived in the 1340s in West Africa. This led to the formation of several states in the West African savanna, all of them governed by horse-owning dynasties (Law 1976; Levtzion 1977; Goody 1971).

Many military shocks promote complexification. Tin-bronze weapons did, because they required elite-controlled trade (tin importation). So did cavalry warfare, since horse ownership often required access to the steppe and a good deal of money.

But some military shocks promote simplification! By virtue of their ubiquity, iron weapons brought power back to the commoners, and helped facilitate the “barbarian resistance” (McNeill 1982; Keegan 2004). In the Middle Ages, heavy cavalry was a complexifying force, and pikemen were a simplifying one.

Until next time.

References

  • Anderson (1994). The Savannah River Chiefdoms: Political Change in the late prehistoric southeast
  • Anthony (2007). The Horse, the Wheel, and Language: How Bronze-Age Riders from the Eurasian Steppes Shaped the Modern World
  • Boix (2015). Political Order and Inequality: their foundations and their consequences for human welfare. 
  • Carneiro (1970). A theory of the origin of the state. 
  • Currie et al (2020). Duration of agriculture and distance from the steppe predict the evolution of large-scale human societies in Afro-Eurasia
  • Flannery & Marcus (2012). The Creation of Inequality: How our Prehistoric Ancestors set the stage for monarchy, slavery, and empire
  • Goody (1971). Technology, tradition, and state in Africa. 
  • Inoue et al (2015). Polity scale shifts in world-systems since the Bronze Age: A comparative inventory of upsweeps and collapses. 
  • Jones (2011). The matrilocal tribe: an organization of demic expansion
  • Keegan (2004). A history of warfare.  
  • Lambert & Walker (1991). Physical anthropological evidence for the evolution of social complexity in coastal southern California
  • Law (1976). Horses, firearms, and political power in pre-colonial West Africa. 
  • Levtzion (1977). The Western Maghrib and Sudan
  • Maschner & Mason (2013). The Bow and Arrow in Northern North America
  • Mayshar et al (2022). The Origin of the State: Land Productivity or Appropriability?
  • McNeill (1982). The pursuit of power.
  • Mulder et al (2009). Intergenerational wealth transmission and the dynamics of inequality in small-scale societies
  • Olson (2010). Power and Prosperity
  • Scott (2017). Against the Grain: a deep history of the earliest states
  • Sheehan (1985). Whaling as an organizing focus in Northwestern Alaskan Eskimo society
  • Spencer (2010). Territorial expansion and primary state formation
  • Turchin (2015). Ultrasociety: how 10,000 years of war made humans the greatest cooperators on earth
  • Turchin et al (2021). Rise of the War Machines: Charting the Evolution of Military Technologies from the Neolithic to the Industrial Revolution
  • Turchin et al (2022). Disentangling the evolutionary drivers of social complexity: A comprehensive test of hypotheses
  • Wittfogel (1957). Oriental Despotism: A comparative study of total power

The Evolution of Social Structure

Part Of: Politics sequence
Content Summary: 2400 words, 12 min read

The Unilinear Stage Model

A polity is an identifiable political entity. Polities exhibit a remarkable degree of variation. Traditionally, four polity types have been distinguished (Service 1962).

First, most human beings have lived in family-level groups (“bands”), about 20 people living together. We’ve previously discussed the egalitarian ethos, which appears to have been maintained by a reverse dominance hierarchy.

Second, some communities came together as local groups (“tribes”), often in the form of villages housing ~200 people. Villages often contain clans, each with ~30 people. Clans employ unilineal inheritance, emphasizing intensive kinship. Local groups with charismatic leaders were known as achievement-based societies, or Big Man groups. 

In some ways, it can be difficult to archaeologically differentiate tribes from bands (Renfrew 1974). Both show no evidence of differentiation among the buildings, no products of specialist craftsmanship, and a complete absence of grave goods which might indicate disparities in individual wealth.

Third, a rank society (“chiefdom”) is a collection of several villages. The village of the paramount chief typically houses ~1000 people. Leadership becomes inherited rather than earned. Social stratification and inequality emerge, often visible in grave burials. Warfare becomes more organized, and increasingly furthers economic goals (rather than more personal matters of homicide and infidelity). With the advent of the warrior class, leadership styles transition from prestige towards coercion.

Many achievement-based societies destroyed a prominent man’s property at his death. But Carneiro (1949) notes that rank societies let the son inherit his father’s property, a key step in intergenerational inequality. Achievement-based societies brought captives home to torture or kill, and expelled criminals. Rank societies like the Cauca considered prisoners and criminals a commodity, to be kept as slaves.

Fourth, a state (“kingdom”) is a collection of several chiefdoms. Such polities can house upwards of 100,000 people. Bureaucrats, rather than kinship-based relationships alone, began to administer the state. States also feature cities with full-time specialists freed from the burdens of food production (specialization of labor; see Childe 1950). Warfare pivots towards conquest; tribute becomes taxation.

The centralized, undifferentiated leadership style of chiefdoms had its limitations: a chief can only be at one location at a time. Archaeologically, the chiefdom is spatially limited to about half a day’s walk, or 25 km (Spencer 1987). Due to this ceiling, after a certain point, conquest had limited marginal utility. By switching to centralized, differentiated leadership, states were able to transcend this constraint on size. However, differentiated leadership requires delegation, which increases the risk of usurpation. States first had to mitigate this usurpation threat before delegation became truly possible.

Religious ritual changes dramatically across polity types. Family-level groups tend to place their huts in a circle, and conduct ad-hoc rituals in the common space. Local groups hold calendric rituals in men’s houses. Finally, chiefdoms de-emphasize men’s houses and instead use temples, which serve the elevated gods associated with the leadership clan.  In chiefdoms, sacred myths provide legitimation of authority. In many early states, the priesthood often plays a forceful role in leadership; this theocratic mode becomes less prevalent in more mature states (Webster 1976).

This tight relationship between cultural institutions and polity size is rather mysterious. Why are clans so pervasive in achievement-based societies, but less so for bands? Why do groups of ten thousand people invariably express inequality? Why do societies of one million people always require a bureaucracy? 

Check-List Archaeology

Anthropology provides rich synchronic data on modern-day political institutions. But ethnographies only go back a few decades or centuries. Archaeology provides sparse diachronic data on institutions across history. But we can use insights from anthropology to shed light on the past. 

With chiefdoms, polities began to span multiple settlements. If you first identify settlements with a shared material culture, and organize them by their population size, you’ll typically see a chiefdom comprising one large village, surrounded by multiple dependent hamlets. But some chiefdoms express a more complex settlement hierarchy, with a large village of the paramount chief, surrounded by medium-sized villages of the subchiefs, in turn surrounded by small hamlets. The number of levels in the settlement hierarchy is an indicator of polity type. A chiefdom is either a simple chiefdom (two administrative levels) or a complex chiefdom (three). A state ranges from an archaic city-state (four levels) all the way up to a nation-state (arguably six).
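The tier-counting rule above can be written as a small lookup. This is only a sketch: the function name `classify_polity` is hypothetical, and the label for a single tier is an assumption (the text only says that polities began to span multiple settlements with chiefdoms).

```python
def classify_polity(levels: int) -> str:
    """Map the number of settlement-hierarchy tiers to a polity type."""
    if levels < 1:
        raise ValueError("a polity needs at least one settlement tier")
    if levels == 1:
        # Assumption: one tier means an autonomous band or village.
        return "family-level or local group"
    if levels == 2:
        return "simple chiefdom"
    if levels == 3:
        return "complex chiefdom"
    # Four or more tiers: archaic city-states sit at four levels,
    # nation-states at (arguably) six.
    return "state"


for tiers in (1, 2, 3, 4, 6):
    print(tiers, classify_polity(tiers))
```

In practice the hard part is upstream of this function: deciding how many distinct size classes a set of excavated settlements really contains.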

Burial rituals are useful indicators of social inequality. For example, during the Copper and Bronze Ages, dramatic changes in social differentiation took place throughout Europe. Sumptuary goods such as metal were used in burial rituals to advertise an individual’s personal wealth. When children are buried with such sumptuary goods, this is often taken as evidence of hereditary rank, since achievement-based societies often destroy the wealth of their Big Men upon death. But there are many other indicators of social inequality, such as energy invested in residential housing, and even variance of height as a window into access to nutritional variety (Boix & Rosenbluth 2014).

Polity Ratcheting across History

One of the most dramatic trends in human history is polity ratcheting – polity size tends to grow. 

A corollary is that the number of independent polities tends to decline (Carneiro 1977). During Neolithic times, there were probably more than 100,000 independent political units of family or local group scale. Through expansion, conquest, incorporation, and treaties, this number has been reduced to roughly 160 nation-states today.

Since the speciation of Sapiens (~250 kya), humans lived in family-level groups. This polity type was favored through most of our history. Ratcheting is not observed. This did not change during the transition to behavioral modernity (~80-50 kya) or the emigration from Africa (~60 kya).

In the Ancient Near East, sedentism appears in the late Epipaleolithic (15 kya), and typically manifests in ecological zones rich in flora (e.g., in alluvial soil) and fauna (e.g., fishing sites). These Natufian settlements supplemented hunting with abundant wild cereals. Settlements and ritual complexes (e.g., Gobekli Tepe) became more elaborate by the Pre-Pottery Neolithic. Only later, at 9.5 kya, does food production (agriculture and pastoralism) hit its stride. It took three more millennia for some of these societies to develop into states.

At first, growing crops and raising livestock did not have much of a perceptible effect on social structure. Small-scale societies of agriculturalists were nearly as egalitarian as small-scale societies of foragers. And those farming groups that stayed small-scale retained their resistance to hierarchy. In contrast, those societies that went down the path to civilization–growing large, acquiring cities, developing writing and extensive division of labor, and eventually becoming states–these societies became highly unequal, even despotic.

Egalitarian bands had less inequality and warfare than modern nation-states, but less public-good cooperation and more homicide. Political ideology biases which of these considerations we find salient. Pro-state thinkers appeal to consent-based mechanisms of state formation, whereas anti-state thinkers are attracted to coercive foundations. Discussions of social evolution tend to be muddied by ideological commitments.

Economic Drivers of Social Evolution

There have been two subsistence modes in human history: foragers, who gather and hunt wild flora and fauna, and food producers, who use domesticated species. Within the food producers category, we see farmers who rely on domesticated plants, and pastoralists who rely on the meat and milk of domesticated animals.

Larger polities require larger population densities, and larger population densities are only possible with a more productive subsistence economy. So it is not surprising that family-level groups were mainly foragers, and chiefdoms were mainly farmers. With each technological advance in subsistence, a wave of intensification occurs, and the ceiling of population density is raised. 

But the relationship between polity phase and subsistence mode is not deterministic! Some peoples, like the Northwest Coast fishers, were hierarchical foragers – even holding slaves. And some farmers (particularly horticulturalists like the Machiguenga) lived in family-level groups (Johnson & Earle 2000).

Only in unusually productive ecologies are foragers able to achieve population densities sufficient for chiefdoms. And even the richest natural environments are insufficient to fund a full-fledged state. States only emerge in agricultural centers.

In chiefdoms we see part-time specialists, and in states we see full-time craftsmen. Specialists depend on the food production of others. This is only possible if food producers can generate a surplus. Chiefdoms and states redirect this surplus towards public works and/or personal enrichment; this is the political economy (to be contrasted with the subsistence economy). 

Towards Evolutionary Anthropology

We can model social evolution as a Markov chain. Polities can undergo complexification (e.g., complex chiefdom → state) and simplification (state → complex chiefdom). It is tempting to ascribe moral significance to the direction of social evolution, but this practice isn’t particularly constructive.
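To make the Markov chain framing concrete, here is a minimal sketch in Python. The four phases follow the stage model above, but every transition probability is invented for illustration; none of these numbers come from the literature. Each row gives the chance that a polity in one phase stays put, complexifies, or simplifies in a single time step.

```python
PHASES = ["band", "tribe", "chiefdom", "state"]

# Hypothetical per-step transition probabilities (illustrative only).
# Row = current phase, column = next phase; each row sums to 1.
P = [
    [0.97, 0.03, 0.00, 0.00],
    [0.02, 0.94, 0.04, 0.00],
    [0.00, 0.03, 0.92, 0.05],
    [0.00, 0.00, 0.02, 0.98],
]


def step(dist, matrix):
    """Advance a probability distribution over phases by one time step."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def evolve(dist, matrix, steps):
    """Apply `step` repeatedly and return the resulting distribution."""
    for _ in range(steps):
        dist = step(dist, matrix)
    return dist


start = [1.0, 0.0, 0.0, 0.0]  # every polity begins as a band
after = evolve(start, P, 1000)
for phase, share in zip(PHASES, after):
    print(f"{phase:9s} {share:.3f}")
```

Because simplification has nonzero probability in every row, the chain is bidirectional. With these made-up numbers, complexification is slightly more likely than simplification at each step, so the long-run distribution ends up weighted toward states.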

Boasian anthropology vehemently rejected 1960s-era cultural evolutionism. Some of this debate stems from different research styles:

  • Anthropologists in the historical particularism tradition often use an idiographic style, focusing on properties unique to their case studies.
  • Cultural evolutionists tend to use a nomothetic style, seeking to locate shared properties and to explain their emergence.

It is true that 1960s-era versions of cultural evolutionary theory were empirically impoverished, and tarnished by colonial ideologies of “savage to civilization” progress. But the field has matured since. Evolutionary typologies are often useful in facilitating cross-cultural research (Earle 1987).

Stage models are best couched with three disclaimers:

  1. Some societies never undergo complexification beyond a certain level. The achievement-based societies of the Tewa, Hopi, Mandan, and Hidatsa of North America were extremely durable.  The Pueblo appear to have transitioned to chiefdoms in 900-1200 CE, but then inequality went into remission.
  2. Social evolution is not unidirectional (with only complexification). We have seen societies transition between phases in phase cycling. Phase cycling especially gives a useful window into the dynamics of social evolution. 
  3. Many societies blend elements across phases.  The Trobriand Islanders blended social conventions from both chiefdoms and Big Man collectives. Some villages in the Oaxaca Valley simultaneously housed men’s houses and temples, during their transition between phases.

Are phases arbitrary partitions of uniformly distributed social variables? Not necessarily. Technological shocks tend to elicit rapid change. These punctuationist data suggest the underlying variables are multimodally distributed. Further, transitions between phases may be unstable. In most chiefdoms, the inability to delegate power represents a geographical ceiling (half a day’s walk). Once delegation-enabling institutions were introduced by ambitious usurpers, polity size exploded upwards.

Multilineal theorists suggest that there are other social structures besides the five already discussed. Renfrew (1974) first described group-oriented chiefdoms, or aristocracy without chiefs. One modern-day example comes from the Apa Tani near Tibet (Flannery & Marcus 2012; Ch13). These societies engaged in large-scale monumentalism (henges in England, statues in Easter Island) or multi-tiered settlement hierarchies (American Southwest), but did not feature appreciable inequality.

We can generalize this into the network vs corporate dimension (Kowalewski 2000; Wason & Baldia 2000). Many large-scale polities encourage personal aggrandizement, concentrated wealth, and individual power. But some polities scaled up while retaining more egalitarian economic outcomes; these societies tend to share power, and place a heavier emphasis on ritual. 

Cultural Group Selection

Phase change can be bidirectional. So why does complexification tend to dominate over the long run? Why don’t we see this ballooning community size in the great apes, for example?

Cultural Group Selection (CGS) offers a framework for approaching this question.  While biological group selection is still disputed by evolutionary biologists, an emerging consensus is that group competition can occur on a cultural basis. Simply put, larger groups tend to outcompete smaller groups, and values and institutions that foster a competitive edge differentially survive. 

In many communities, selfish behavior pays off. Free-riders will invade any single population. But cooperative communities outcompete selfish ones. If between-group selection is strong, this can lead to a cooperative equilibrium. But, per the Price Equation, this outcome only works if the between-group variance is sufficiently high. Moralistic punishment serves to reduce in-group variation, and this human adaptation may have increased our capacity for group selection.
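The Price Equation argument can be checked numerically. The sketch below rests on assumptions that are not in the original: groups are equal-sized, the trait is binary (cooperate or defect), and fitness follows a hypothetical linear rule w = 1 + b·p_g − c·z, where p_g is the share of cooperators in the group, z is 1 for cooperators, and b and c are made-up benefit and cost parameters. Under those assumptions the change in cooperator frequency splits into a between-group term and a within-group term.

```python
from statistics import fmean


def price_partition(group_freqs, b=0.5, c=0.1):
    """Return (between, within, total) change in cooperator frequency.

    group_freqs: cooperator frequency p_g in each equal-sized group.
    Hypothetical fitness: w = 1 + b * p_g - c * z (z = 1 for cooperators).
    """
    p_bar = fmean(group_freqs)
    # Variance of group means: the raw material for between-group selection.
    var_between = fmean([(p - p_bar) ** 2 for p in group_freqs])
    # Average within-group variance of a 0/1 trait is p * (1 - p).
    var_within = fmean([p * (1 - p) for p in group_freqs])
    w_bar = 1 + (b - c) * p_bar  # mean fitness across the whole population
    between = (b - c) * var_between / w_bar  # cooperative groups outgrow others
    within = -c * var_within / w_bar         # free-riders gain inside each group
    return between, within, between + within


mixed = price_partition([0.5, 0.5, 0.5, 0.5])     # identical groups
assorted = price_partition([1.0, 1.0, 0.0, 0.0])  # fully assorted groups
print("mixed:   ", mixed)
print("assorted:", assorted)
```

When all groups are identical, the between-group term is zero and cooperation declines. When groups are fully assorted, the within-group term is zero and cooperation spreads. Anything that shrinks within-group variance relative to between-group variance, such as moralistic punishment, pushes the balance toward the second case.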

There are at least five different channels of intergroup competition (Henrich 2015).

A few general comments are in order:

  • Intragroup cooperation (parochial altruism) generally tends to correlate with intergroup competition (xenophobia).
  • Intergroup competition likely intensifies as population density rises.
  • For small polities, non-coercive channels take center stage. Warfare dominates in later stages.

Larger groups win wars. “God is on the side of the big battalions”. Institutions that promote larger groups tend to replicate. This simple observation is the best explanation I know for polity ratcheting. Why do humans not live in chiefdoms anymore? Because states outcompeted them. 

Human Norms are Phase-Dependent

This suggests an intimate relationship between human norms and polity size. On the cultural evolutionary perspective, clannishness (tight kinship) is a package of cultural values selected to prevent fission, preserve group cohesion, and promote high fertility. Each polity phase becomes possible only with a set of values and institutions: 

  • Morality is biologically innate; rudimentary law is an invention of chiefdoms.
  • Speech is biologically innate; literacy is an invention of primary states.
  • Spiritualism and symbolism are biologically innate; religion is an invention of primary states.
  • Mate guarding is biologically innate; fraternal interest groups became more potent in achievement-based societies.
  • Dominance hierarchy is biologically innate; aristocracy is an invention of chiefdoms.
  • Stranger antipathy may be innate, but stranger tolerance emerges in the transition to states.

The intimate relationship between values and social organization manifests in values revolutions linked to phase change:

  • The Axial Age (fairness ethics and moralizing high gods) arrived at 600 BCE, with the emergence of the first mega-empires.
  • The Enlightenment (modern science) arrived at 1700 CE, with the emergence of modern nation-states.

Does ideology have a retroactive legitimizing effect, or can it play a proactive role in facilitating conquest? The debate rages in many different historical circumstances. In the example of moralizing high gods, Purzycki et al (2016) argue for a proactive role, and Turchin et al (2023) argue the retroactive case.

Until next time.

References

  • Abrutyn & Lawrence (2010). From Chiefdom to State: Toward an Integrative Theory of the Evolution of Polity 
  • Boix & Rosenbluth (2014). Bones of contention: the political economy of height inequality. 
  • Childe (1950). The Urban Revolution
  • Flannery & Marcus (2012). The Creation of Inequality: How our Prehistoric Ancestors set the stage for monarchy, slavery, and empire
  • Henrich (2015). Secret of our Success: How Culture Is Driving Human Evolution, Domesticating Our Species, and Making Us Smarter
  • Johnson & Earle (2000). The Evolution of Human Societies
  • Korotayev (1995). Mountains and democracy: an introduction
  • Kowalewski (2000). Cyclical transformations in North American prehistory
  • Liverani (2008). The Shape of the Ancient Near East: Historical Overview
  • Mulder et al (2009). Intergenerational Wealth Transmission and the Dynamics of Inequality in Small-Scale Societies
  • Purzycki et al (2016). Moralistic gods, supernatural punishment and the expansion of human sociality
  • Renfrew (1974). Beyond a subsistence economy: the evolution of social organization in prehistoric Europe
  • Smith (2009). V. Gordon Childe and the Urban Revolution: a historical perspective on a revolution in urban studies
  • Spencer (2010). Territorial expansion and primary state formation
  • Turchin & Gavrilets (2009). Evolution of Complex Hierarchical Societies 
  • Turchin et al (2023). Explaining the rise of moralizing religions: A test of competing hypotheses using the Seshat Databank.
  • Wason & Baldia (2000). Religion, communication, and the genesis of social complexity in the European Neolithic

The Yamnaya Singularity

Part Of: History sequence
Content Summary: 5000 words, 25 min read

The Indo-European Language Family

In 1786, a British judge and linguist in India was asked to learn Sanskrit, to better understand how to integrate British and Hindu law. Three years after his arrival in Calcutta, Sir William Jones wrote,

The Sanskrit language, whatever be its antiquity, is of a wonderful structure: more perfect than the Greek, more copious than the Latin, and more exquisitely refined than either; yet bearing to both of them a stronger affinity, both in the roots of verbs and in the forms of grammar, than could possibly have been produced by accident; so strong indeed, that no philologer could examine them all three, without believing them to have sprung from some common source, which, perhaps, no longer exists.

Language variation is pervasive. We speak more like people we interact with, than people with whom we do not. Thus, the separation of speech communities causes language drift.

The discovery of language families, with their branching patterns, is a hallmark of comparative linguistics. These linguistic trees were a source of inspiration to Charles Darwin in his discovery of common descent and natural selection. The Indo-European language family is the largest known language tree, with some 3 billion speakers. The geographical breadth of this single language family suggests its original language community, the Proto-Indo-Europeans (PIE), were politically successful. When and where did the first speakers of PIE live?

Who Spoke PIE?

There are three theories: 

  1. Some archaeologists are content to deny PIE was spoken by any particular peoples. They claim that Indo-European similarities derive from extensive borrowing. But linguists can differentiate creolizations from linguistic descent.
  2. Colin Renfrew (1987) advanced the Anatolian hypothesis, which identifies early Neolithic farmers as the original speakers of PIE. This explains why the earliest branch in the PIE family is located in modern-day Turkey.
  3. Marija Gimbutas (1965) advanced the steppe hypothesis, which has pastoralists from the Pontic-Caspian steppes as the original speakers of PIE.

Language evolution involves systematic changes in pronunciation. For example, the Great Vowel Shift is one reason older forms of English are so hard for modern speakers to understand. But if you can detect where and when these changes occurred for a particular word, you can compare its descendants across related languages (cognates) and reconstruct the word as it was originally spoken.

A list of cognates sheds light on the original PIE vocabulary. This in turn can help us understand the people who spoke it. As Edward Sapir once said, “the complete vocabulary of a language may indeed be looked upon as a complex inventory of all the ideas, interests, and occupations that take up the attention of the community.”

The PIE vocabulary includes language for clans. Their gods were all male, suggesting a patrilineal society. They also had a word for chief. Patron-client institutions, a common thread in Indo-European cultures, are common among chiefdoms. 

PIE includes words related to wool textiles, the wheel, and wagons. None of these existed before about 4000 BCE. This suggests that Proto-Indo-European was spoken after 4000-3500 BCE. The vocabulary also contains few agricultural concepts, but reflects an elaborate understanding of domesticated animals.

Finally, temperate-zone flora and fauna dominate in the reconstructed vocabulary, with no tropical or Mediterranean species. PIE also borrows many words from Proto-Uralic, and shows less clear links to the Kartvelian languages of the Caucasus region.

Taken together, these data suggest the original PIE language community were Yamnaya pastoralists in the Pontic-Caspian steppes after 4000 BCE. This group’s migration patterns are consistent with the branching structure of the language family. These people also exerted an enormous influence on geopolitics in the Bronze Age, which explains their linguistic success.

The Politics of Migration

The topic of migration is politically sensitive. Flannery & Marcus (2012) suggest that the “we were here first” principle is a cultural universal, ubiquitously invoked in land disputes. Absence of migration provides clear answers to this question. But evidence of genetic change can be deployed as ammunition in these matters.

Anthony (2012) notes the politicization of Indo-European research:

The problem of Indo-European origins was politicized almost from the beginning. It became enmeshed in nationalist and chauvinist causes, nurtured the murderous fantasy of Aryan racial superiority, and was actually pursued in archaeological excavations funded by the Nazi SS… In Russia some modern nationalist political groups and neo-Pagan movements claim a direct linkage between themselves, as Slavs, and the ancient “Aryans.” In the United States white supremacist groups refer to themselves as Aryans. There actually were Aryans in history – the composers of the Rig Veda and the Avesta – but they were Bronze Age tribal people who lived in Iran, Afghanistan, and the northern Indian subcontinent. It is highly doubtful that they were blonde or blue-eyed, and they had no connection with the competing racial fantasies of modern bigots.

This Nazi legacy caused the entire topic of migration to become taboo in post-WWII archaeology. The new orthodoxy insisted that changes in language and material culture are not sufficient evidence for migration. “Pots are not people” was their rallying cry.

But migration does happen. In the early 2000s, oxygen and strontium isotopes were showing that many Neolithic people died far from where they were born – they had migrated. Even very long-range migrations occurred:

The Afanasievo culture was intrusive in the Altai, and it introduced a suite of domesticated animals, metal types, pottery types, and funeral customs that were derived from the Volga-Ural steppes. This long-distance migration almost certainly separated the dialect group that later developed into the Indo-European languages of the Tocharian branch, spoken in Xinjiang in the caravan cities of the Silk Road around 500 CE but divided at that time into two or three quite different languages, all exhibiting archaic Indo-European traits. Most studies of Indo-European sequencing put the separation of Tocharian after that of Anatolian and before any other branch. The Afanasievo migration meets that expectation. The migrants might also have been responsible for introducing horseback riding to the pedestrian foragers of the northern Kazakh steppes, who were quickly transformed into the horse-riding, wild-horse-hunting Botai culture just when the Afanasievo migration began.

The Ancient DNA Revolution

Modern DNA does contain information about the past, of course. But the further back you look, the more tenuous the inferential bridge. Ancient DNA can directly verify historical hypotheses about genetic admixture, population replacement, and natural selection. All DNA is lost within a few million years, so this technique cannot inform the earliest events of human origins (ancient proteins might). But ancient DNA does illuminate the Pleistocene and Holocene – the entire history of Sapiens.

The scientific community, which had been using teeth, discovered that sampling the cochlea of the petrous bone could recover two orders of magnitude more DNA! And genome sequencing has become affordable. Just as radiometric dating brought about the second scientific revolution in archaeology, with ancient DNA (aDNA) we are in the midst of the third scientific revolution.

Paleogenetic information often necessitates revisions to hypotheses based on linguistics and material culture. In our case, Indo-European studies were revolutionized in 2015.

First, paleogenetic data showed that the agricultural revolution (~9kya) was a movement of people, not a movement of ideas. During the early Neolithic, European ancestry derives from both incoming farmers and local hunter-gatherers.

But less than 5,000 years ago, a new genetic signature started to predominate. The Yamnaya genome arrives and establishes a lion’s share of modern European ancestry (Allentoft et al 2015; Haak et al 2015). Massive migration from the steppe was the source for Indo-European languages in Europe. 


Even David Anthony, a leading proponent of the steppe hypothesis, did not suggest genetic diffusion. He proposed that material and linguistic aspects of Yamnaya culture spread through imitation and proselytization. A genetic version of the steppe hypothesis is now consensus. Even the leading proponent of the Anatolian hypothesis, Colin Renfrew, has by now endorsed the steppe hypothesis.

Massive migrations are not confined to the Yamnaya. They are pervasive across continents, and across centuries. aDNA showed that inferences derived from modern DNA were more problematic than expected, because human mobility was surprisingly high (Pickrell & Reich 2014). 

We will be diving into the details of the Yamnaya story, starting with their origins in the Eneolithic, then their overwhelming of Europe in the Early Bronze Age (EBA), and then their arrival in the Indian subcontinent in the Late Bronze Age (LBA). We will close with a discussion on whether this migration was a genocide, or something else entirely.

The Kuban Steppe in the Eneolithic

In the Northern Caucasus Mountains, chiefs appeared among what had been small-scale farmers. The Maikop chiefs were very rich, and also left kurgan graves similar to those found in Suvorovo-Novodanilovka. This was a Mesopotamian outpost, with Uruk symbols of power adorning their graves.

Ancient DNA data reveals three clines in the Eneolithic steppe (Lazaridis & Reich 2024). The Volga Cline represents genetic admixture along the Volga river, and derives from Eastern Hunter Gatherers (EHG). The Dnipro Cline follows the Dnipro River, and derives from the Ukraine Neolithic Hunter Gatherer (UNHG) population. Finally, the Caucasus-Lower Volga (CLV) Cline is situated in the Kuban steppe, includes the Maikop people, and represents admixture between steppe people and Mesopotamia. 

In 4200 BCE, Danubian Culture (“Old Europe”) was at its peak. Centered in Bulgaria, the Varna cemetery had the most elaborate funerals in the world, richer than anything of the same age in the Near East. Goddess fertility cults with female statues were ubiquitous, suggesting patriarchal control was weak in these societies. 

Old Europe collapsed between 4200 and 3900 BCE. More than six hundred tell sites were burned and abandoned in eastern Bulgaria. Steppe material culture appeared in Old Europe just before the collapse. 

The colder climate of this period undoubtedly strained the economies of Old Europe. But pervasive evidence of warfare (ubiquitous stone maces), tell fortifications, site abandonment, and massacre-linked mass graves all point to a complementary role of endemic warfare. This would not have been a coordinated military invasion. Small-scale raids and piecemeal migrations did the trick.

The Usatovo and Cernavoda I cultures have 49% and 76% CLV ancestry, suggesting they were the product of migration. The Suvorovo-Novodanilovka complex left kurgan graves, which may hint at a CLV connection (the CLV also used kurgans).

The discovery of the CLV people and their migrations provides the first paleogenetic evidence corroborating the Indo-Anatolian hypothesis. Specifically, Lazaridis & Reich (2024) argue for an eastern migration of the CLV people into Anatolia. Before this, Anatolian languages emerged in western Anatolia, prompting Kloekhorst (2023) to hypothesize a western migration from the Balkans. However, the expansion of the Kura-Araxes culture removed any trace of CLV ancestry from eastern Anatolia, which may explain the absence of PIA languages from this region.

And then, in 3500 BCE, the Yamnaya singularity occurred. But first, let’s try to understand what caused the profound success of these particular people.

Towards Nomadic Pastoralism

In severe snowstorms, cattle die quickly. They cannot burrow through the snow, and perish without fodder. But horses are supremely well adapted to the cold grasslands where they evolved.  A shift to colder climatic conditions would incentivize the domestication of horses. Just such a shift occurred between 4200 and 3800 BCE. Did horse domestication occur in the Eneolithic?

In modern-day Kazakhstan, the Botai people used domesticated horses in order to more effectively hunt wild equids. The 3500 BCE site featured horse teeth wear consistent with bitting, stables, dairy consumption, and other circumstantial evidence strongly suggestive of Eneolithic husbandry (Outram et al 2009). The Botai relationship with horses may have initially been analogous to that of reindeer-herding tribes with their reindeer. However, Botai horses are not the ancestors of modern-day horses (Librado et al 2021), but of the smaller Przewalski’s horse (Gaunitz et al 2018).

Przewalski’s horse was likely not imported into the Pontic steppe. But the ancestors of modern horses already lived there. Could the Yamnaya have applied such techniques to local equids in the Eneolithic? After the collapse of Old Europe, CLV Cline settlements regularly contain horse bones. Maces with horse heads become common in steppe graves. If horses were not being ridden into the Danube valley, it is difficult to explain their sudden symbolic importance in Old European settlements. 

While the status of the horse in the Eneolithic is unclear, EBA Yamnaya exhibit skeletal pathologies consistent with horseriding (Trautmann et al 2023). So “proto-modern” horses likely contributed to the Yamnaya singularity. The horse revolutionized both the pastoral economy and the military efficacy of the steppe peoples.

With a herding dog, a person on foot can herd ~200 sheep. On horseback with the same dog, one can herd ~500. Larger herds require larger pastures, and the desire for larger pastures would have caused a series of boundary conflicts.  Further, horses also make raiding much more profitable.  When the indigenous peoples of the North American Plains first began to ride, chronic horse-stealing raids soured relationships even between tribes that had been friendly. Riding also was an excellent way to retreat quickly; often the most dangerous part of tribal raiding on foot was the running retreat after a raid. 

The wagon was likely invented in the Near East, and rapidly disseminated to the Yamnaya via the Maikop. This technology greatly expanded the pastoralist niche:

With a wagon full of tents and supplies, herders could take their herds out of the river valleys and live for weeks or months out in the open steppes between the major rivers, the great majority of the Eurasian steppes. Land that had been open and wild became pasture that belonged to someone. Soon these more mobile herding clans realized that bigger pastures and a mobile home base permitted them to keep bigger herds. Amid the ensuing disputes over borders, pastures, and seasonal movements, new rules were needed to define what counted as an acceptable move: people began to manage local migratory behavior. Those who did not participate in these agreements or recognize the new guest-host institutions became cultural Others, stimulating an awareness of a distinctive Yamnaya identity. 

Natural selection in humans is also subject to controversy. Most adaptations take a long time to persist. Given our recent speciation date (~250 kya), Sapiens exhibit remarkable genetic homogeneity (Boyd & Silk 2020). But selective sweeps in humans may be accelerating (Hawks et al 2007), and aDNA reveals at least seven instances of selective sweeps (Mathieson et al 2018). Two of these are relevant to the Yamnaya:

First, the Yamnaya were among the first to evolve lactase persistence (Segurel et al 2020), a genetic adaptation which disseminated with their migration patterns. Lactase persistence has also evolved independently in Africa and other locales, but these variants did not spread as widely (Segurel & Bon 2017). Lactase persistence seems to evolve as an adaptation to a pastoralist diet. Paleoproteomic data show dairying became pervasive in the Early Bronze Age (predominantly from sheep), and the Yamnaya show iron deficiency characteristic of heavy milk drinking. Milk processing (converting it into cheeses, yogurts, etc) also reduces the lactose content of milk. LP adaptations complement processing, uplifting its bearers to new caloric opportunities. 

Second, the Yamnaya were subject to selection for increased height. There have been three changes in stature trends: a reduction at the advent of agriculture, an increase at the advent of mobile pastoralism, and an increase during the Industrial Revolution. The change in human height 200 years ago is clearly caused by non-genetic changes. But height is 80% heritable, and the genome shows selection for reduced height with the birth of agriculture, and selection for increased height in steppe pastoralists. These genetic changes are likely a response to changes in human diet. The Yamnaya simply enjoyed more protein and fat. They were 12-20 cm (5-8 inches) taller than their agriculturalist neighbors!

Sherratt (1983) proposed a revolution in subsistence economics, where pastoralists who had been using animals for their primary products (meat, blood, hides) started capitalizing on secondary products (wool, milk, and muscle power). This secondary products revolution is a nice way of conceptualizing the ascent of the steppe. 

The horse and the wagon gave pastoralists an immense economic advantage. The horse and their physical robustness gave them a considerable military advantage. These factors help explain what happened next.

The Yamnaya Singularity

Europe’s farming communities were booming in 3800 BCE. But starting in 3400 BCE, a demographic collapse gripped the subcontinent. The causes of the Neolithic decline are poorly understood. Seersholm et al (2024) found a very high prevalence of plague among Neolithic farmers in Scandinavia. Proximity to domesticated animals correlates with volume of zoonotic pathogens, which makes a steppe origin of plague plausible. If Yersinia pestis exhibited high lethality, it may help explain the Neolithic decline.


The Yamnaya singularity began 3800 BCE. It did not occur within a single generation. But it was sudden. aDNA shows a founder event of a few thousand Yamnaya individuals. We also see dramatic changes in pollen cores. In Sweden, we see the steppe-affiliated Single Grave Culture burning down forests to make grassland for their herds.

We can see the suddenness from aDNA time series. Steppe peoples invaded Britain around 2500 BCE, leading to a 90% replacement of ancestry. The Iberian migration was less totalizing, but it still involved a fairly sudden 40% replacement. 

Two of the very earliest Yamnaya migrations produced the Afanasievo culture (who spoke Tocharian, discussed above) and, by way of the Globular Amphora peoples, the Corded Ware culture. The Corded Ware peoples (haplogroup R1a) received 70-80% ancestry from the steppe. The Bell Beaker peoples (haplogroup R1b) exhibit more cultural than genetic diffusion. 

Yamnaya in the eastern steppe were more mobile than those in the west, an economic difference with interesting implications. The western steppe had more exposure to agriculture, as shown in archaeology and western PIE languages. Western steppe rituals were female-inclusive, while eastern steppe rituals were more male-centered. In western religions, the spirit of the hearth was female, and in Indo-Iranian it was male. Western graves had some women, eastern graves had nearly zero. 

Chariots and the Indo-Iranians

A millennium later, the Yamnaya hegemony was weakening. A Corded Ware population from Poland was moving east (Saag et al 2021). The first culture on the road to the Ural mountains was the Fatyanovo-Balanovo culture. The next step, further east, was the Abashevo culture. These Indo-Iranian people described themselves as Aryan, but the term today carries connotations of Aryanism. 

The MBA saw the beginning of the aridity crisis, which would ultimately culminate in uninhabitability in the LBA steppe. Deteriorating climate typically exacerbates warfare, and so it was in the Abashevo culture. Weapons were deposited in 10% of EBA Yamnaya graves; in the Abashevo the frequency was closer to 50%. We also see the appearance of many fortified towns. This warfare may also explain the invigorated trade of the region:

Susan Vehik studied political change in the deserts and grasslands of the North American Southwest after 1200 CE, during a period of increased aridity and climatic volatility comparable to the early Sintashta era in the steppes. Warfare increased sharply during this climatic downturn in the Southwest. Vehik found that long-distance trade increased greatly at the same time; trade after 1350 CE was more than forty times greater than it had been before then. To succeed in war, chiefs needed wealth to fund alliance-building ceremonies before the conflict and to reward allies afterward. Similarly, during the climatic crisis of the late MBA in the steppes, competing steppe chiefs searching for new sources of prestige valuables probably discovered the merchants of Sarazm in the Zeravshan valley, the northernmost outpost of Central Asian civilization. Although the connection with Central Asia began as an extension of old competitions between tribal chiefs, it created a relationship that fundamentally altered warfare, metal production, and ritual competition among the steppe cultures. 

Sintashta sites were heavily fortified. They were also metallurgic production centers feeding an enormous, high-volume trade route. Sintashta was also the source of two major military shocks. It was here that the modern DOM2 horse was domesticated (with two notable mutations: one for lumbar support, the other associated with calmness). And it was here that the spoked-wheel chariot was invented. 

In Syria, the Mittani kingdom hired Indo-Iranian chariot mercenaries who in 1500 BCE appear to have usurped the throne and founded a dynasty (Anthony 2012). The dynasty quickly became Hurrian in almost every sense but clung to Indo-Iranian concepts long after its founders faded into history. This is why the deities, moral concepts, and Old Indic language of the Rig Veda are first attested not in India, but in northern Syria. 

The Andronovo culture, descendants of the Sintashta, established dominance in this region. They established a trade relationship with the Bactria-Margiana Archaeological Complex (BMAC). This was the home of fire rituals and soma consumption, which became central elements in Zoroastrian and Hindu religion.

The BMAC genome has no steppe component. Theirs was a cultural contribution. But in 1700 BCE, the Indo-Iranian people migrated into Punjab, bringing Sanskrit with them. The Rig Veda was likely authored in this area around 1500-1300 BCE. The people of the Rig Veda did not live in brick houses and had no cities, although their enemies, the Dasyus, did live in walled strongholds. Chariots were used in races and war; the gods drove chariots across the sky. The chief god Indra is surrounded by a heavenly war band called the maryanna. 

This migration into Punjab is evident from aDNA evidence. Modern India shows ~40% ancestry from the steppe, and this genetic contribution is male-biased. Several Out of India theories also try to explain this admixture, but I have not found these alternatives convincing. Two hundred years after the steppe migrations, the Indus Valley Civilization (IVC), aka the Harappan Civilization, collapsed. The Ancestral North Indian (ANI) population is a mixture of IVC and steppe peoples (Narasimhan et al 2019). 

Was the formidable military of the Indo-Iranians responsible for the collapse of the Indus Valley Civilization? The archaeological evidence on the Aryan invasion theory is quite mixed. But the more modest Aryan migration theory receives significant support from paleogenetics. 

Steppe ancestry is not distributed equally amongst the ANI – it has a status component. Brahmin ancestry is heavily derived from the Sintashta (Reich et al 2009; Narasimhan et al 2019). Brahmins are among the traditional custodians of Vedic religious texts. The high status of the Vedic migrants may have strong genetic persistence due to endogamy practices of the subcontinent. 

There are remarkable parallels between this tale of two subcontinents:

Sex-Biased Demic Expansion

Genetic data cannot tell us how many individuals were alive at any point in time. But it allows estimates of effective population, or those individuals who leave offspring. Karmin et al (2015) found the effective population in women expands very predictably, with demographic booms associated with the initial expansion out of Africa (50 kya), and the invention of agriculture (9 kya). But the effective population in males cratered in the Neolithic, falling by almost half. 
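
To get a feel for how effective population can diverge from a head count, here is a minimal sketch using Wright's textbook formula for unequal numbers of breeding males and females, N_e = 4 N_m N_f / (N_m + N_f). This is an illustration only: it is not the method Karmin et al (2015) used, and the census numbers are hypothetical.

```python
def effective_population(breeding_males: float, breeding_females: float) -> float:
    """Wright's effective size for unequal sex ratios: 4*Nm*Nf / (Nm + Nf)."""
    if breeding_males <= 0 or breeding_females <= 0:
        raise ValueError("both sexes need at least one breeder")
    return 4 * breeding_males * breeding_females / (breeding_males + breeding_females)


# Hypothetical community of 1000 adults, half of each sex.
balanced = effective_population(500, 500)  # every adult leaves offspring
skewed = effective_population(50, 500)     # only 1 in 10 men leaves offspring

print("balanced:", balanced)
print("skewed:", round(skewed, 1))
```

In this toy case the number of living adults never changes, yet the effective population shrinks sharply once a small minority of men do most of the reproducing.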

Poznik et al (2016) find the most intense bursts in male demography are associated with the Yamnaya singularity. Where did all the Neolithic men go?

Demic expansion also occurred during the agricultural revolution. But Goldberg et al (2017) show that these massive migrations were conducted by men and women moving in roughly equal numbers; perhaps families or clans moving together. In contrast, the Yamnaya singularity was a male-biased migration (Saag et al 2017), with a male:female ratio of roughly 15:1. Lazaridis & Reich (2017) contest this result; they argue that the signal is distorted by Corded Ware males taking Yamnaya brides. 

Several other factors lead me to the male-biased migration hypothesis.

  1. The Y-chromosome replacement rate substantially exceeds the autosomal rate (e.g., in Iberia, 40% of the genome comes from Yamnaya, but 90% of the Y-chromosome haplogroups are from the steppe).
  2. Burials in Sweden in the Early Bronze Age are predominantly male, with women and children getting more representation in later centuries (Tornberg & Vandkilde 2024).
  3. Strontium analyses of burials suggest a pattern of local steppe males marrying females from other villages (Sjogren et al 2020).
  4. Female burial practices are more diverse than those of males.
  5. Discontinuities in skeletal femur length in the Battle Axe Culture (BAC) are suggestive of a male-biased migration:

Some theorists argue that the genetic data could be explained by differential fertility. I find this difficult to reconcile with the data suggesting male-biased migration and haplogroup turnover. 

This pattern is not confined to the Yamnaya. In many of the great admixtures in human history, a central theme has been the coupling of men with social power in one population and women from the other. In African Americans, European ancestry from females is 10% and from males is 38%. In Colombia, European ancestry from females is 10% and from males is 94%. We also see similar patterns in the Bantu migration, and perhaps from the Papuan takeover of the Pacific Islands (Reich 2018). One might view colonialism as the latest incarnation of a deep-seated human behavior – demic expansion. 
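
The arithmetic behind such sex-biased signatures can be sketched in a few lines. Autosomes come equally from mothers and fathers, while two of every three X chromosomes in a population sit in women, so a male-biased source leaves less ancestry on the X than on the autosomes. The sketch below assumes a simplified model in which the quoted percentages are the European share among male and female ancestors; the function names are hypothetical, and the X formula is a long-run expectation rather than the full model of Goldberg et al (2017).

```python
def autosomal_ancestry(from_males: float, from_females: float) -> float:
    """Expected autosomal share: each parent contributes half of the autosomes."""
    return (from_males + from_females) / 2


def x_ancestry(from_males: float, from_females: float) -> float:
    """Long-run X-chromosome share: females carry two of every three X copies."""
    return (from_males + 2 * from_females) / 3


# (share from male ancestors, share from female ancestors), as quoted in the text.
populations = {
    "African Americans": (0.38, 0.10),
    "Colombia": (0.94, 0.10),
}

for name, (m, f) in populations.items():
    print(name,
          "autosomal:", round(autosomal_ancestry(m, f), 2),
          "X:", round(x_ancestry(m, f), 2))
```

Under these assumptions the X chromosome always shows less of the male-biased ancestry than the autosomes do, which is the kind of contrast that X-versus-autosome comparisons look for.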

Genocide or Hypergamy?

It seems that Yamnaya males migrated into Europe fairly rapidly, and Neolithic females began preferentially reproducing with them. But is this a case of genocide in prehistory, on a scale never before seen? Or was this driven by hypergamy, with females marrying wealthy, prestigious outsiders?

Comparative mythologists have reconstructed several aspects of PIE rituals related to raiding. The Trito myth legitimized the cattle raid (“so they might sacrifice the cattle properly”). The transition to manhood was organized around koryos, the warrior brotherhood of young men bound by oath to one another and to their ancestors during a ritually mandated raid. These boys were transformed into warriors by symbolically becoming dogs and wolves through the consumption of their flesh (Anthony & Brown 2017). Kristiansen argues that the Yamnaya practiced primogeniture, which would have incentivized non-firstborn males to make a living elsewhere, perhaps via the koryos ritual. This is an ideology of expansion and exploitation. 

If the Yamnaya singularity was a genocide, where are all the bodies? Also, military campaigns in the Early Bronze Age are simply anachronisms, and only 10% of Yamnaya were buried with weapons. Further, use-wear analysis has revealed that many Corded Ware axes were primarily used in agriculture (Wentink 2020). 

Or it could have been as simple as female choice. We simply do not know. Historical demic expansions are conducted by states executing military campaigns. But these instances often leave a weaker genetic fingerprint. It is difficult to know how migrant males competed for females in prehistory. 

Different facets of the Yamnaya express male competition differently. Britain and Iberia had different experiences. The South Asian outcome is more likely to have been bloody, because the Sintashta were more warlike.

Perhaps someday we will understand more. 

References

  • Allentoft et al (2015). Population genomics of Bronze Age Eurasia 
  • Anthony (2012). The horse, the wheel and language: how Bronze-age riders from the Eurasian steppes shaped the modern world. 
  • Anthony & Brown (2017). The dogs of war: a Bronze Age initiation ritual in the Russian steppes
  • Boyd & Silk (2020). How humans evolved.
  • Flannery & Marcus (2012) The creation of inequality. 
  • Gaunitz et al (2018). Ancient genomes revisit the ancestry of domestic and Przewalski’s horses.
  • Goldberg et al (2017). Ancient X chromosomes reveal contrasting sex bias in Neolithic and Bronze Age Eurasian migrations
  • Goldberg et al (2017). Reply to Lazaridis and Reich: Robust model based inference of male-biased admixture during Bronze Age migration from the Pontic Caspian Steppe
  • Gimbutas (1965). Bronze Age Cultures of Central and Eastern Europe
  • Haak et al (2015). Massive migration from the steppe was a source for Indo-European languages in Europe. 
  • Haak et al (2023). The Corded Ware Complex in Europe in light of current archaeogenetic and environmental evidence. 
  • Hermanussen (2003). Stature of early Europeans
  • Hawks et al (2007). Recent acceleration of human adaptive evolution
  • Karmin et al (2015). A recent bottleneck of Y chromosome diversity coincides with a global change in culture 
  • Kloekhorst (2023). Proto-Indo-Anatolian, the “Anatolian split” and the “Anatolian trek”: a comparative linguistic perspective.
  • Lazaridis & Reich (2017). Failure to replicate a genetic signal for sex bias in the steppe migration into central Europe
  • Lazaridis & Reich (2024). The genetic origin of the Indo-Europeans
  • Librado et al (2021). The origins and spread of domestic horses from the Western Eurasian steppes. 
  • Mathieson et al (2018). Eight thousand years of natural selection in Europe.
  • Moorjani et al (2013). Genetic evidence for recent population mixture in India
  • Narasimhan et al (2019). The formation of human populations in South and Central Asia
  • Olalde et al (2019). The genomic history of the Iberian Peninsula over the past 8000 years
  • Olalde et al (2018). The Beaker Phenomenon and the Genomic Transformation of Northwest Europe
  • Pickrell & Reich (2014). Toward a new history and geography of human genes informed by ancient DNA
  • Poznik et al (2016). Punctuated bursts in human male demography inferred from 1,244 worldwide Y-chromosome sequences
  • Rasmussen et al (). Early divergent strains of Yersinia pestis in Eurasia 5,000 years ago
  • Reich et al (2009). Reconstructing Indian population history
  • Reich (2018). Who we are and how we got here. 
  • Renfrew (1987). Archaeology and Language: the puzzle of Indo-European origins
  • Saag et al (2017). Extensive farming in Estonia started through a sex-biased migration from the Steppe
  • Saag et al (2021). Genetic ancestry changes in Stone to Bronze Age transition in the East European plain
  • Seersholm et al (2024). Repeated plague infections across six generations of Neolithic Farmers
  • Segurel & Bon (2017). On the evolution of lactase persistence in humans. 
  • Segurel et al (2020). Why and when was lactase persistence selected for? Insights from Central Asian herders and ancient DNA. 
  • Sherratt (1983). The secondary exploitation of animals in the Old World.
  • Taylor & Barron-Ortiz (2024). Rethinking the evidence for early horse domestication at Botai
  • Tornberg (2018). Stature and the Neolithic transition – skeletal evidence from southern Sweden
  • Tornberg & Vandkilde (2024). Modelling age at death reveals Nordic Corded Ware paleodemography
  • Trautmann et al (2023). First bioanthropological evidence for Yamnaya horsemanship
  • Wang et al (2022). The genetic formation of human populations in East Asia
  • Wentink (2020). Stereotype: the role of grave sets in Corded Ware and Bell Beaker funerary practices


Intro to Kinship Systems

Part Of: Culture sequence
See Also: Kinship explains the IC Dimension
Content Summary: 1800 words, 9 min read

The Economics of Kinship

Kinship systems are solutions to economic problems. 

In foraging societies, resources are scarce and dispersed. Hunter-gatherers that exploit a diversity of unevenly distributed food resources may need to hedge their bets by having kin in different places in times of need (Yellen and Harpending, 1972). They employ loose (extensive) kinship.

In food production societies, resources become abundant and concentrated. Competing over resources becomes much more profitable. Agropastoralists need to keep their kin close by, both to reduce the dilution of inheritable family wealth (Borgerhoff Mulder et al., 2009) and to help with kin-based defense of that wealth, which matters more for many agropastoral societies than for hunter-gatherer societies. They employ tight (intensive) kinship.

Empirically, we see a tight correlation between kinship tightness and dependence on foraging:

Residence and Descent

Residence patterns describe where a new couple lives after marriage. Many foragers practice bilocal residence, with newlyweds flexibly choosing a residence that fits their needs. In contrast, most agriculturalists practice unilocal residence: consistently moving into the residence of the wife’s kin (matrilocality) or the husband’s kin (patrilocality).

Descent ideology reflects cultural norms about inherited wealth. Patrilineal systems have inherited wealth flowing through the father’s line; matrilineal systems have it flowing through the mother’s. Western surname systems, for example, reflect patrilineal logic.

According to main sequence theory, descent ideology tends to follow changes in residence pattern. Many foraging societies manifest patrilocality without patrilineality, but the reverse is not found. Patrilineal societies require a patrilocal precedent, and matrilineal societies require a matrilocal precedent. This has been empirically confirmed by cross-cultural analyses (Divale 1984; Ember & Ember 1971).

Here is an example of societies moving from patrilocal to matrilocal residence, and back again. Avunculocality is residence with the man’s mother’s brother – his closest senior male matrilineal relative. It is a compromise between the matrilineal principle and pressure to keep related males residing in one community. 

Many hunter-gatherer societies may tend toward bilocality as a means of adapting to fluctuations in sex ratio in small local groups (Lee 1972), with bilateral descent serving as the companion ideology. 

We will discuss the matrilineal puzzle some other time, and for now confine our attention to patrilineal clans, which comprise the majority of unilineal systems. 

What caused the shift to patrilineal descent? This ideology is correlated with sedentism and community stability, which explains its prevalence for agriculturalists and not foragers (Murdock 1949). But these factors don’t explain its prevalence in pastoralists (Korotayev 2004). To explain that, a third driver of patrilineal descent has been discovered: high warfare frequency (Ember et al 1974). One of the strongest predictors of warfare frequency is the “threat of unpredictable natural disasters destroying food supplies”, and pastoralists are especially susceptible to this sort of disaster. 

Ember et al (1974) wrote,

We suggest that if allies are needed for offense or defense, it would be most advantageous to have a group to call upon which has no conflicting loyalties. If there are unilineal descent groups, every person belongs to just one set (or hierarchy of sets) of persons. A unilineal descent principle thus provides unambiguously discrete groups of people for collective action in competitive situations. In bilateral societies, where kin groups are overlapping and nondiscrete, individuals may have conflicting loyalties with respect to which particular set of kinsmen they should join in competitive conditions. In short, the speed and effectiveness of collective response in a competitive situation should be greater in a unilineal society and therefore we should expect unilineal descent groups to be more likely to occur. 

Unilineal descent groups are distinctive in their sense of corporate identity (Fortes 1953). If a member of one lineage kills a member of another, it is not the individual who is responsible for paying blood money. The group as a whole is. Similarly, the entire group competes over property rights to various economic resources (e.g., watering holes). This sense of corporate identity is reinforced by use of classificatory kinship: describing everyone in one’s community as descending from one (perhaps fictional) eponymous ancestor. Clans use kinship terms as a kind of political ideology, a behavior which we retain in modern nation-states in the guise of nationalism (Rodseth & Wrangham 2004).

Exogamy and Consanguineous Marriage

Tight kinship societies typically practice arranged marriage, with the parents (particularly the father) making most of the decisions. Patrilocal clans practice female exogamy, with daughters leaving their clan to go live with their husband’s clan. 

Indeed, marriages among kin are much more common in agropastoralist societies (Walker & Bailey 2014). One of the most common forms of consanguineous marriage is cross-cousin marriage. In patrilineal systems, parallel cousins are members of one’s own clan, and forbidden via incest taboos. Cross cousins do not participate in the same patrilineal group, and hence often are deemed eligible for marriage. 

As long as parents are not too related, a little bit of consanguinity goes a long way.  The small cost of genetic disorders is offset by social benefits of marriage alliances, creating a “Goldilocks Zone” of optimal mates at intermediate levels of relatedness (e.g., second cousins). 
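
To make "intermediate levels of relatedness" concrete, here is a small sketch of the standard coefficient of relatedness for relatives of the same generation. It assumes non-inbred ancestors and that both members of an ancestral couple are shared; the function name and parameters are illustrative, not taken from any cited source.

```python
def relatedness(generations_to_ancestor: int, shared_ancestors: int = 2) -> float:
    """Coefficient of relatedness for two same-generation relatives.

    Each path between them runs up to a shared ancestor and back down,
    so it has 2 * generations_to_ancestor links, and each link halves
    the expected share of genes in common.
    """
    path_length = 2 * generations_to_ancestor
    return shared_ancestors * 0.5 ** path_length


for label, generations in [("siblings", 1), ("first cousins", 2), ("second cousins", 3)]:
    r = relatedness(generations)
    # For non-inbred parents, a child's inbreeding coefficient is half of r.
    print(label, "r =", r, "offspring F =", r / 2)
```

Each extra generation of distance cuts relatedness by a factor of four, which is why the genetic cost falls off so quickly between siblings, first cousins, and second cousins.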

Consanguineous marriage might seem foreign. But it was the predominant social pattern of the Neolithic.

Fraternal Interest Groups

Unilineal descent appears linked to external warfare to control abundant, concentrated resources. But some groups also feature fierce competition within the group. In situations of internal war, patrilocal residence becomes spatially contiguous (Otterbein 1968; Ember et al 1974). Societies with internal war organize themselves into lineages (Ember et al 1974), with explicit (rather than vague) genealogies demarcating specific factional boundaries.

All of this suggests that internal war selects for fraternal interest groups (FIGs): large kinship-bonded male coalitions capable of military action. These power groups often unite within a men’s house or other exclusively masculine setting (Murphy 1957). FIGs manifest strongly in societies that are patrilocal, patrilineal, lineage-based, and contiguous-residence, with high levels of consanguineous marriage. FIGs are associated not just with internal war, but also with within-society violence and feuding (van Velzen & van Wetering 1960, Otterbein & Otterbein 1965).

Rituals often serve to surveil and marshal social consensus. But with fraternal interest groups, direct enforcement of contracts becomes possible. This may explain why non-FIG societies use menarche rituals to safeguard marriage contracts, but FIG-based societies eschew these rituals in favor of explicit brideprice contracts (Price & Price 1984). 

While light polygyny appears a human universal, these societies are more likely to practice it more extensively, and the polygynous intensity strongly correlates with even higher rates of consanguineous marriage. FIG societies manifest marital aloofness, and sexual segregation may occupy an inverse relationship with male violence (Whiting & Whiting 1975). 

The Epidemiology of Kinship

What causes kinship tightness? We have already discussed (abundant and concentrated) resources and (internal and external) warfare. Let’s turn to another possible driver: parasite load (Fincher & Thornhill 2012). To understand this, let’s revisit first principles.

The immune system is a necessary but insufficient protection against disease. The visceral immune system is energetically costly (a 13% increase in metabolic expenditure is required to increase human body temperature by 1 degree Celsius), and temporarily debilitating (the syndrome is known as sickness behavior). In addition to a reactive system, animals take preventative measures against disease: a behavioral immune system (Schaller & Park 2011). 

Disgust evolved to promote pathogen avoidance, and thus contributes to the behavioral immune system. But unfamiliar people traveling from distant groups represent contagion risk, even if they aren’t manifesting signs of infection. Xenophobia is mediated by disgust, and intensifies when disease concepts are primed. 

Everyone has a behavioral immune system; but it becomes more virulent in places of high parasite load.  Stable, local, and fractionalized networks have fewer links with the rest of the community. This insulation reduces the risk of an infection entering the collective, allowing the participants to live longer. But it also restricts the group’s exposure to new technologies (Fogli & Veldkamp 2021). 

Kinship Systems Across History

The niche of Sapiens was hunting and gathering. Only 9000 years ago we switched to agriculture, and hence to tight kinship systems.

The less nutritious diets of farmers left them shorter, sicker, and more likely to die young. But farmers did reproduce more quickly than hunter-gatherers. Farming societies spread across the landscape like an epidemic, driving out foragers in their path. Early farming spread not because it was a better lifestyle, but because farming communities with particular institutions beat mobile hunter-gatherer populations in intergroup competition. 

Kinship intensification events seem to emerge near agricultural centers (e.g., Uruk) and diffuse outwards at a remarkably slow pace. Kinship systems seem very stable.

Tight kinship won. The severe competition manifested by this system is visible in the Y-chromosome:

Tight kinship won. This is why all major world religions (with the exception of the Roman Catholic Church, for reasons we’ll explore next time) encourage or permit intensive kinship strategies including cousin marriage.

Tight kinship won, so we would expect the entire modern world to have it. Well, not exactly:

From 9,000 to 1,000 years ago, this map may indeed have been bright red. However, loose kinship has made a resurgence in Western (European or European-descended) societies. We’ll explore why this happened next time. 

Implications

Most human societies are conceptualized as residing on a continuum between two kinship systems: tight and loose kinship. 

The individualism-collectivism (IC) dimension is widely regarded as the most significant dimension of cultural variation. As we will see next time, kinship systems (and their drivers, like parasite load) drive the IC dimension. 

While foragers are patriarchal (due to our inheritance of primate polygyny), we will see patriarchy was much exacerbated by the tightening of kinship systems.

While polity ratcheting denotes an overall increase in group size since the Neolithic, economic growth was essentially non-existent until the Industrial Revolution. The reversion to loose kinship (i.e., individualism) may help explain the advent of growth. 

References

  • Bittles & Black (2010). Consanguineous marriage and human evolution.
  • Borgerhoff et al (2009). The intergenerational transmission of wealth and the dynamics of inequality in pre-modern societies.
  • Divale (1984). Matrilocal residence in pre-literate society. 
  • Ember & Ember (1971). The conditions favoring matrilocal versus patrilocal residence
  • Ember et al (1974). On the development of unilineal descent
  • Enke (2019). Kinship, Cooperation, and the Evolution of Moral Systems. 
  • Fincher & Thornhill (2012). Parasite-stress promotes in-group assortative sociality: the cases of strong family ties and heightened religiosity.
  • Fogli & Veldkamp (2021). Germs, Social Networks, and Growth. 
  • Fortes (1953). The structure of unilineal descent groups
  • Helgason et al (2008). An Association Between the Kinship and Fertility of Human Couples
  • Korotayev (2004). Unilocal Residence and Unilineal Descent: A Reconsideration
  • Le Bris & Gay (2023). Distance to innovations, kinship intensity, and psychological traits
  • Lee (1972). !Kung Spatial Organization: an ecological and historical perspective. 
  • Murdock (1949). Social Structure
  • Murphy (1957). Intergroup hostility and social cohesion
  • Rodseth & Wrangham (2004). Human Kinship: a continuation of politics by other means
  • Schaller & Park (2011). The Behavioral Immune System (and Why It Matters)
  • Walker & Bailey (2014). Marrying kin in small-scale societies.




The Evolution of War

Part Of: Anthropogeny sequence
Content Summary: 1000 words, 5 min read. 

Territoriality in Mammals

Many mammals exhibit territoriality. A community will occupy a fixed piece of real estate, and defend it from conspecifics. Once an intrusion is detected, boundaries are defended by group members (often the females) coming together to chase the outsider away. The goal of these fights over land is simply the opponents’ defeat.

In general, mammals do not kill their conspecifics, despite the xenophobic emotions inherent in matters of territory defense. To be clear, infanticide is extremely pervasive in the animal kingdom. But territoriality is a defensive posture, and the killing of adult members of one’s own species is virtually unknown.

Which brings me to chimpanzees.

The Raid Adaptation

Chimpanzees also inhabit demarcated territories, and neighboring communities are treated with hostility, so much so that up to 75% of the time is spent in the central 35% of the range. Border patrols are conducted by groups of male chimps moving stealthily to enforce their territory’s boundaries.

But chimpanzees also engage in raids with large groups of males penetrating deep into enemy territory, stalking and killing members of competing troops.

https://www.youtube.com/watch?v=a7XuXi3mqYM

The ape raiders are quiet, alert to enemies… The raiding chimps appear to assess tactical risk by locating and observing their enemy before attacking, they make sure they have a clear numerical advantage, and they try as well to gain the advantage of surprise. In addition, such attacks typically immobilize the victim, so that the attackers themselves are barely injured. The victims may be either male or female; but the aggression usually focuses more severely on adult males, less severely on obviously fertile females. Young females at the start of their breeding careers (nulliparous females) are most likely to escape injury and can be forced to travel back with the raiding party into their home territory.

In contrast to the vast majority of mammals, chimpanzee raids feature a deliberate search for victims, followed by the killing and mutilation of a helpless neighbor despite his appeals for mercy. 

Why then has such behavior been selected? Killing opens the door to territorial expansion (Mitani et al 2010), by weakening the other groups’ overall fighting power. In Gombe, a series of these raids ultimately led to the collapse of the targeted group.  In turn, territory size directly correlates with resource and mate availability.

So why haven’t other animals evolved to kill?

The Imbalance of Power Hypothesis

A handful of other exceptional species also manifest lethal territoriality. But none of these involve raids. Hyenas do often kill rival clan members in gang-like warfare, but these killings occur during patrols; there are no secret incursions into enemy territory. And because hyenas are a female-bonded species, the warfighters are female. Hyenas do not “take captives” of either sex, but they do plunder resources (e.g., meat). Finally, lions kill lions from other groups, but this occurs exclusively in the context of takeover events.

What do lions, hyenas, and chimpanzees have in common? All are fission-fusion groups, with a stable residential group, but with foraging group size varying with seasonal fluctuations in resources. 

We’ve already discussed how raids are governed by the logic of a local imbalance of power. Raids preferentially occur when the attacking party has gathered significantly more fighting power than the defender (Wrangham 1999). 

Perhaps these are linked! Because foraging groups vary in size, encounters in fission-fusion species are often catastrophically imbalanced. This should provide the selective pressure towards the evolution of lethal territoriality. 
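To make the intuition concrete, here is a toy simulation. It is only a sketch under invented assumptions: the party sizes, the 3-to-1 threshold for a "lopsided" encounter, and the function names are all hypothetical and not taken from Wrangham (1999). It compares a species whose parties are always roughly the same size with a fission-fusion species whose party size swings widely.

```python
import random


def share_lopsided(draw_party_size, ratio=3.0, trials=10_000, seed=0):
    """Fraction of random two-party encounters in which one side
    outnumbers the other by at least `ratio` (an assumed threshold)."""
    rng = random.Random(seed)
    lopsided = 0
    for _ in range(trials):
        a = draw_party_size(rng)
        b = draw_party_size(rng)
        if max(a, b) >= ratio * min(a, b):
            lopsided += 1
    return lopsided / trials


def stable_party(rng):
    # Hypothetical cohesive species: parties always number 8 to 10.
    return rng.randint(8, 10)


def fission_fusion_party(rng):
    # Hypothetical fission-fusion species: parties number 1 to 15.
    return rng.randint(1, 15)


print("stable groups:        ", share_lopsided(stable_party))
print("fission-fusion groups:", share_lopsided(fission_fusion_party))
```

With stable parties, a 3-to-1 mismatch never arises in this toy model. With variable parties it arises in a substantial share of encounters, and that is the situation in which killing becomes cheap for the larger side.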

The Lowly Origin of Human War

Modern warfare doesn’t resemble chimpanzee raids. But primitive warfare is heavily reliant on raids! Let’s dive into one example from Wrangham ():

War among the Yanomamö is an overtly acknowledged relationship, part of an escalating tension between villages, possessing a history that men and women discuss. It can be provoked by sorcery. It can be motivated by revenge. The combatants prepare ceremonially. They use hand-held weapons instead of teeth, and their poisoned arrows can pierce the body of an individual or be fired in a volley against a whole village. Their war can include dastardly tricks. It sometimes has a plan. It targets specific enemies. A raid often takes days, not hours. Abduction and rape are common. Retaliation is expected. And so on. When Gombe and Yanomamö are compared, the gulf that divides our two species is unmistakable. Because language makes discussion and meaning possible, the cultural dimensions to human war will always make it richer, more complicated, more exciting, as well as more self-deceiving and confused, than chimpanzee intercommunity violence.

But the similarities are also clear. All seven features of chimpanzee raids discussed above manifest in human raids. Humans not only engage in lethal territoriality; we also share a particular style of warmaking with our closest living relative.

Indeed, even the rate at which foraging humans and chimpanzees engage in between-group violence is quite similar:

These data suggest a common mechanism. It is not that humans evolved a unique thirst for warfare. Rather, this instinct likely derives from our common ancestor with chimpanzees.

Implications

Why should a hairless ape behave so strangely? From the perspective of an alien scientist, the imbalance of power hypothesis makes human warmaking behavior less surprising.  

It appears that warfare is an adaptation. That is not to say that war is good (the naturalistic fallacy). Neither is war inevitable (biological determinism).  

There is a live debate on how prevalent warmaking was in foraging communities. We have strong evidence of high homicide rates in such bands. But it is difficult to say how much stems from between-group raids, versus within-group feuds. I have not yet looked at this data thoroughly.

The fact that humans live in multilevel societies makes us somewhat more xenophilic than our chimpanzee relatives. Total war, with weaker peacemaking affordances, is more prevalent on meta-ethnic frontiers between very culturally dissimilar groups. 

There is also a large amount of variation in warmaking behavior across groups. We can plumb Fry et al’s (2021) work on peace systems to improve our understanding of how to engineer a world system with less war. There is also variation across time; the transition from achievement-based societies to chiefdoms in particular marked an intensification of war. Thus, cultural group selection might also inform the search for peace technologies.

Until next time. 

References

  • Fry et al (2021). Societies within peace systems avoid war and build positive intergroup relationships
  • Keeley (1996). War before civilization: the myth of the peaceful savage.
  • Wrangham & Peterson (1996). Demonic Males: apes and the origins of human violence.
  • Manson et al (1991). Intergroup aggression in chimpanzees and humans
  • Mitani et al (2010). Lethal intergroup aggression leads to territorial expansion in chimpanzees
  • Wrangham (1999). Evolution of Coalitionary Killing


Links (May 2024)

Part Of: Links sequence

Biology

Cognitive Science

  • A few months ago, a new ICN was discovered: the somato-cognitive action network (SCAN). This action-oriented SCAN is located in the interstices of M1. More recently, the cingulo-opercular network was found to be strongly functionally linked to SCAN. Given its high-level position, the “cingulo-opercular” network has been renamed the Action Mode Network (AMN) to align with our new understanding.

AI

  • In the aftermath of Cambridge Analytica, I had concluded that big-data microtargeting (disinformation campaigns) is overblown, or not very effective in practice. But LLMs are consistently achieving superhuman results in persuasion. While microtargeting might not have been very effective in 2016, that may change in 2028. 

Physics

  • Roger Penrose long ago proposed that the human brain uses quantum effects in microtubules, and that this is the origin of consciousness (Orch-OR). For a long time, microtubules were thought to be “too warm and wet” to sustain quantum coherence. But they have been shown capable of sustaining quantum effects. This has implications both for quantum computing and for the Orch-OR hypothesis. 

Other

  • AI, and automation more generally, seems to damage religious affiliation. Supporting data includes 1) correlations between and within nations, 2) longitudinal studies tracking change in religious attitudes across different levels of automation exposure, and 3) experimental support. (This result confuses me, because it doesn’t align with my understanding of why people practice religion. The inverse correlation between welfare and religion is more explicable.)
  • France began the demographic transition (aka fertility crisis) one century before the rest of Europe. There are two primary explanations for the transition: cultural evolution and life history models. The evidence from France supports the former (perhaps early secularization played a role; the pro-natal Catholic message lost clout). 

The Avian Pathway

Part Of: Anthropogeny sequence
Followup To: Intro to Multilevel Societies
Content Summary: 2400 words, 12 min read

Introduction

In today’s post, I will argue that humans have a multi-level society (MLS) social organization, and also evolved as cooperative breeders (CB). These attributes together explain a great deal of human uniqueness. They are an explanatory bridge over the yawning chasm from the other great apes to our species.

MLS+CB animals are very rare in mammals, but common in birds. I will also argue our ancestors took this avian-like pathway with the advent of early menopause.

Multilevel Societies in Humans

Social organization in animals can be categorized into four distinct types.

  • Solitary Living. Every individual has its own territorial stake on foraging real estate, and guards access. Solitary females will reproduce with solitary males at the boundary of their territory, but raise the child alone. The pair-bond is exclusively expressed in the relationship between a mother and her children. 
  • Monogamous Pair-Living. Female and male animals will co-reside in the same territory. The pair-bond is extended to support not just parenthood, but also a romantic commitment between the reproducing parents. 
  • One-Male Units (OMU). Adult males engage in contest competition for women, which ultimately takes the form of one-male multi-female harems. Sexual body dimorphism is accentuated due to the selective pressure of inter-male competition. Because of polygyny’s math problem, bachelor male primates often cause problems for the established “families”.
  • Multi-male Multi-female (MM-MF) groups. Adult males don’t typically fight over access to women; sexual selection in these societies tends towards sperm competition instead. Female sexuality is promiscuous, which can be interpreted as a paternal uncertainty device. 

To be extremely reductive, MM-MF groups possess groups, but lack families and love. Monogamous and OMU social structures have families and love, but lack groups.

Every animal species exhibits one of these social systems. Which one best characterizes Sapiens? Well, it is natural to identify as a family-living species. But pair-living and OMU animals don’t live in groups; our multifamily groups are anathema. What gives?

The discovery of multilevel societies (MLS) gives us the language to dramatically improve our understanding of human social organization. There are some species that exhibit a blend of both family- and group-living. From a phylogenetic perspective, MLS seem to emerge from two pathways:

  • With the bonding pathway, pair bonding nucleation occurs within ancestral MM-MF groups.
  • With the aggregation pathway, autonomous OMUs increasingly overlap, and ultimately affiliate.

Enriched Male Coalitions in MLS

To quote from Chapais (2008)

It has been estimated that female chimpanzees copulate ~1500 times per conception. Consistent, long-term preferential relationships between males and females have not been observed. Paternal certainty is near zero, as is paternal investment. In such promiscuous systems, nearly all (95%) of siblings are half siblings.

Consider the perspective of a single male chimpanzee, Ego. Ego can identify who his mother and siblings are, by virtue of developmental familiarity. The reliable ability to recognize one’s siblings is the basis of the Westermarck effect.

But Ego faces two barriers to kin recognition. First, due to paternal uncertainty in promiscuous MM-MF settings, a male chimpanzee cannot identify his father (nor, in consequence, any of his father’s relatives). He also cannot identify his own sons and daughters, nor the offspring of his brothers. Second, chimpanzee societies are organized by female dispersal, with females relocating to other troops when they reach reproductive age. Without this emigration, Ego could recognize his mother’s relatives, and his sister’s children… but he will never meet them (male chimpanzees cannot visit neighboring communities, because they would be killed by resident males).

The theory of inclusive fitness predicts substantial cooperation between relatives. But kinship-based cooperation is viable only amongst kin you can recognize. So, male coalitions are very small in male philopatric societies. 

But multilevel societies (MLS) reintroduce the pair bond, and paternal uncertainty is lessened. Ego is suddenly able to identify his father, and – by social inference – his father’s siblings. Ego is also able to recognize his brother’s offspring (but not his sister’s offspring, since she still disperses before reproducing). 

In principle, Ego can also identify his son’s children, and his father’s parents — but in practice, chimpanzee life history is such that only three generations typically coexist (these possible recognitions are denoted in light green). 

Kin recognition is expanded in MLS societies. The male cooperative networks have become much more powerful! In our example, rather than having one ally (his brother), Ego is able to forge alliances with five other males. 

Male cooperative networks were further expanded by the ability to recognize affines (in anthropomorphic terms, “brother-in-law”), whose shared interest in their offspring provided another kinship basis for cooperation.

Reinterpreting Human Monogamy

Consider the following facts:

  • Our closest relatives are either MM-MF (chimpanzees) or OMUs (gorillas). None of them are monogamous pair-living.
  • On average, male body size is about 15% larger than female body size. Sexual dimorphism of 15% is not what we would expect from a monogamous species.
  • We know of zero examples of a monogamous species evolving an MLS organization. In contrast, OMUs and MM-MF species become MLS often.  
  • In monogamous primates, sexual coercion of females is rarely, if ever observed. Mate guarding and contest competition are also largely absent.

In fact, polygyny seems to be the cross-cultural human universal. Not monogamy.

For a given band of foragers, we can count how many men have one wife, and how many have more than one. Marlowe (2003) reports that polygynously-married men exist in some 90% of societies in the standard cross-cultural sample (SCCS). To be more specific, 60% of societies have roughly 10% of men with more than one wife, and 30% of societies have even more sexual inequality than this.

Polygyny creates a “math problem”: it inevitably creates pools of unmarried, low-status males.
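The arithmetic behind the math problem can be sketched in a few lines. This is an illustration under simplifying assumptions that are mine rather than Marlowe's: an equal number of adult men and women, every woman married, and every polygynist holding the same number of wives. The function name is hypothetical.

```python
def unmarried_share(share_polygynous, wives_each=2):
    """Fraction of men left without a wife.

    Assumes (for illustration only) equal numbers of men and women,
    every woman married, and all non-polygynous husbands monogamous.
    """
    if not 0.0 <= share_polygynous <= 1.0:
        raise ValueError("share_polygynous must be between 0 and 1")
    if wives_each < 2:
        raise ValueError("a polygynist has at least two wives")
    if share_polygynous * wives_each > 1.0:
        raise ValueError("not enough women for that many wives")
    # Every wife beyond a man's first is a wife some other man goes without.
    return share_polygynous * (wives_each - 1)


for share in (0.05, 0.10, 0.20):
    print(f"{share:.0%} polygynists with 2 wives -> "
          f"{unmarried_share(share):.0%} of men unmarried")
```

Under these assumptions, a society where 10% of men have two wives leaves another 10% of men with none, and the pool of bachelors grows quickly as harems get larger.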

All polygynous societies have bachelor males (“incels”) who harass males in established harems. But in MLS societies, bachelor primates cooperate in all-male bands and try to wrest breeding opportunities from the “married” males (Qi et al 2017). This generates even more selective pressure for male cooperation in the breeding bands.

Henrich (2012) offers an explanation for the recent phenomenon of state-enforced monogamy.

Taking wives is always positively associated with status, wealth or nobility, even among highly egalitarian foraging societies. After the origins of agriculture, as human societies grew in size, complexity and inequality, levels of polygynous marriage intensified, reaching extremes in the earliest empires whose rulers assembled immense harems. Yet, monogamous marriage has spread across Europe, and more recently across the globe, even as absolute wealth differences have expanded. Here, we develop and explore the hypothesis that the norms and institutions that compose the modern package of monogamous marriage have been favored by cultural evolution because of their group-beneficial effects—promoting success in intergroup competition.

Intro to Cooperative Breeding

Eusocial insects, such as bees, are characterized by caste-based division of labor, cooperative raising of children (i.e., alloparenting), extreme reproductive skews (e.g., reproduction monopolized by the queen), and ultracooperative behaviors leading to colonies behaving as a cohesive superorganism.

E.O. Wilson (1975) famously called humans eusocial apes, appealing to our own species’ ultracooperative behavior. Eusociality was once thought to exist only in insects, but striking evolutionary convergences have since convinced us of a eusociality continuum (Sherman et al 1995) between these insects and cooperatively breeding birds and mammals. 

Most primates adopt a continuous care and contact (CCC) parenting model, with the infant maintaining an unceasing grasp on its mother’s fur until it reaches maturity. But CCC is a last resort for primate mothers who lack safe alternatives. When the species-specific frequency of male infanticide is low (as it is for humans), maternal hyperpossessiveness is relaxed (Hrdy 2009). 

In contrast with monkeys, mothering does not “come naturally” for great apes. Infant mortality for first time mothers is surprisingly high. Little surprise, then, that young female apes are drawn to practice mothering. 

Compared to their non-cooperative breeding peers, species that do use alloparenting exhibit shorter interbirth intervals, which dramatically improves their ability to colonize new habitats. The family Callitrichidae, for example, are famous for breeding fast and for their rapid colonization of new habitats. 

More interesting, species with alloparenting exhibit many more prosocial behaviors than those without it (Burkart & van Schaik 2010). These include enhanced mindreading abilities, shared intentionality, increased social tolerance, and spontaneous prosociality. 

What might explain the link between alloparenting and prosociality? Consider the plight of an infant raised in such an environment. Their survival no longer depends on their mother alone; it also depends on soliciting help from alloparents, some of whom may not even be kin. Consequences for failure are large: unlike CCC mothers, mothers in cooperative breeding societies exhibit high rates of child abandonment when they assess that insufficient resources are available. 

For the infant, this sets off a kind of social Machiavellian environment. Infants must attend to the mental state of nearby conspecifics, and tune their solicitations accordingly. In order to “stay in touch without touch”, infants use vocalizations like crying and babbling. This may explain why shared intentionality develops as early as 9 months in human infants (Tomasello & Gonzalez-Cabrera 2017).

Grandmothers as Human Alloparents

If you compare the senescence of physiological functions in human females, you see a remarkable pattern. 

For the vast majority of animals, death occurs very shortly after infertility. Infertile organisms live in the selection shadow, where somatic maintenance is deprioritized. And early menopause is exceedingly rare in the animal kingdom; only found in a handful of other species. 

In preindustrial high infant mortality regimes, grandmothers consistently have a tremendous influence on the reproductive success of their offspring. Post-reproductive women gained roughly two extra grandchildren for every ten years they survived past completion of their childbearing (Hrdy 2009; Sear & Mace 2008). Ethnographers report grandmothers are economically productive, provide childcare, and also play a role in teaching motherhood skills. These data suggest that early menopause evolved in the human transition to cooperative breeding.

What conditions would cause a great ape to evolve (grandmother-based) cooperative breeding? We might expect three:

  • Environmental scarcity. Cooperative breeding tends to emerge when mothers cannot raise children alone. This is consistent with the Kingdon (2000) hypothesis that Erect speciation occurred in a desert (e.g., Near East), an argument grounded in our hairlessness and moisture-retaining nose.
  • Reduction in infanticide rates. Hyper-possessive behaviors are a maternal strategy of last resort. Offspring have more latitude when the risk of male infanticide is low. Whatever the reason, humans exhibit low infanticide rates. This development almost certainly predates and prefigures cooperative breeding.
  • Closeness of maternal kin. One of the central objections to the cooperative breeding hypothesis is that both great apes and hunter-gatherers appear to be patrilocal (male bonded). How could selection for grandmothering occur if females disperse away from their own kin? One could perhaps appeal to sororal polygyny as a stepping stone. But Alvarez (2004) showed that most hunter-gatherer groups are actually bilocal, not patrilocal. Perhaps female dispersal was sufficiently flexible to allow female kin to stay close.

An Avian Pathway

What drives species towards these archetypes? We don’t have a complete theory. But both multilevel societies (MLS) and cooperative breeding (CB) tend to require environmental harshness, and kinship-bonded, socially tight knit groups.

Across all mammalian species, less than 1% exhibit CB strategies. A similar fraction of species manifest as MLS. And despite their overlapping drivers, finding species with both archetypes is very unusual in mammals (but see Ren 2012; Xiang et al 2019). 

So how did humans evolve both?

Birds may provide some insight. In them, cooperative breeding occurs more frequently (~8%). And MLSs and CB frequently co-occur in birds (Camerlenghi et al 2021). 

When females stop reproducing well before they die while older males remain fertile, the pool of fertile adults becomes male-biased. Computational modeling suggests that these high sex ratios can incentivize a behavior change away from multiple-mating and towards pair-bonding (Coxworth et al 2015). If hominin evolution followed this sequence, this would suggest that early menopause induced our species to enter the bonding pathway. 

What to expect from an animal who took the Avian Pathway?

Suppose you construct a list of the anatomy, behaviors, and cognitive affordances of humans. Then make the same list for chimpanzees, and categorize each element as either shared or derived. 

  • Shared phenotypes. Both species exhibit mindreading, toolmaking, sophisticated coalitionary politics, warlike raids, rudimentary culture…
  • Derived phenotypes. Only humans have the capacity for language, prestige status psychology, altruism, explosive cumulative culture…

The list of shared phenotypes is a useful reminder of our lowly origin. But the list of idiosyncratically-human phenotypes (aka human universals) is remarkably long. Chimpanzees and humans share a common ancestor some 7-9 mya, a considerable amount of time. But why do humans have these derived features, and not others? 

While the MLS-CB social phenotype doesn’t explain the origin of all human universals, it does make our phenotype much less surprising. Let me illustrate this in the domains of language, prestige, and cumulative cultural evolution.

As many theorists have noted, language could not evolve without shared intentionality (the capacity to share attention on a single option), altruism (information donation is an altruistic act), and vocal control (to support e.g., babbling). None of these capacities exist in great apes. But these prerequisites are likely to exist in a cooperative breeding species!  

Most primates grapple for status using dominance psychology, which involves aggression-fueled contests. But humans also gain status by prestige, a status system grounded in admiration and deference. We can see these two rival systems expressed in politics. Big men societies use prestige-based leadership; chiefdoms use dominance-based leadership. Both can be seen in eye gaze behavior. In a dominance context, eye contact is confrontational; in prestige contexts this same behavior is deferential. 

Henrich & Gil-White (2001) view prestige as a uniquely-human psychological adaptation to facilitate cultural transmission. But one other species seems to have evolved a form of prestige as well: the Arabian babbler. Babblers compete for the right to perform altruistic acts, such as feeding one another, territory defense, and sentinel guard duty (Dattner et al 2015; but see Wright et al 2001). Zahavi describes this prestige behavior using the handicap principle: only truly fit individuals can afford to produce these costly signals of altruism. 

Perhaps prestige psychology will soon be found in other cooperative breeding species too. Or perhaps prestige evolves in response to unusually high cost (often mortal) dominance contests, which dovetails nicely with the notion that human reverse dominance psychology evolved in response to projectile weapons. 

Less is known about the cognitive correlates of multilevel social organization. A research area to watch closely. 

Humans do have an unusually robust capacity for cumulative cultural evolution (CCE); this trait is often credited for our ecological dominance. Undoubtedly, language supercharged our cultural capacities. The population size hypothesis suggests that demographic factors also constrain CCE (Derex & Mesoudi 2020). Multi-level societies have much larger effective group sizes, and group size is an accelerant. Further, the social network architecture inherent in MLS societies can also facilitate cultural recombination (Cantor et al 2021). Taken together, the human shift towards MLS organization seems to have prefigured our later cultural superpowers. 

Until next time. 

References

  • Alvarez (2004). Residence groups among hunter-gatherers: a view of the claims and evidence for patrilocal bands
  • Burkart & van Schaik (2010). Cognitive consequences of cooperative breeding in primates?
  • Cantor et al (2021). Social network architecture and the tempo of cumulative cultural evolution
  • Camerlenghi et al (2021). Cooperative breeding and the emergence of multilevel societies in birds
  • Chapais (2008). Primeval Kinship
  • Coxworth et al (2015). Grandmothering life histories and human pair bonding
  • Dattner et al (2015). Competition over guarding in the Arabian babbler (Turdoides squamiceps), a cooperative breeder
  • Derex & Mesoudi (2020). Cumulative cultural evolution within evolving population structures
  • Henrich & Gil-White (2001). The evolution of prestige: freely conferred deference as a mechanism for enhancing the benefits of cultural transmission
  • Henrich (2012). The puzzle of monogamous marriage
  • Henrich (2020). The WEIRDest People in the World: How the West Became Psychologically Peculiar and Particularly Prosperous
  • Hrdy (2009). Mothers and Others
  • Hrdy & Burkart (2020). The emergence of emotionally modern humans: implications for language and learning
  • Kingdon (2000). Lowly Origin: where, when, and why our ancestors first stood up
  • Marlowe (2003). The Mating System of Foragers in the Standard Cross-Cultural Sample
  • Sear & Mace (2008). Who keeps children alive? A review of the effects of kin on child survival
  • Sherman et al (1995). The Eusociality Continuum
  • Peccei (2001). Menopause: adaptation or epiphenomenon?
  • Ren (2012). Evidence of Allomaternal Nursing across One-Male Units in the Yunnan Snub-Nosed Monkey (Rhinopithecus Bieti)
  • Tomasello & Gonzalez-Cabrera (2017). The Role of Ontogeny in the Evolution of Human Cooperation
  • Qi et al (2017). Male cooperation for breeding opportunities
  • Wilson (1975). Sociobiology: the new synthesis
  • Wright et al (2001). Cooperative sentinel behaviour in the Arabian babbler
  • Xiang et al (2019). Routine allomaternal nursing in a free-ranging Old World monkey
  • Zahavi (1995). Arabian babblers: the quest for social status in a cooperative breeder

[Excerpt] States and Stationary Bandits

Excerpt From: Olson (2010). Power and Prosperity
Content Summary: 1200 words, 6 min read
Part Of: Politics sequence

Theft in Moderation

Let us contrast the individual criminal in a populous community with the head of a Mafia family that can monopolize crime in a neighborhood. Suppose that in some well-defined turf, a criminal gang can not only steal more or less as it pleases but also prevent anyone else from committing crimes there. Will it gain from taking all it can on its own ground? Definitely not.

If business in this domain is made unprofitable by theft, or migration away from the neighborhood is prompted by crime, then the neighborhood will not generate as much income and there will not be as much to steal. Indeed, the Mafia family with a true and continuing monopoly on crime in a neighborhood will not commit any robberies at all. If it monopolizes crime in the neighborhood, it will gain from promoting business profitability and safe residential life there.

Thus, the secure Mafia family will maximize its take by selling protection, both against the crime it would commit itself (if not paid) and against that which would be committed by others (if it did not keep out other criminals). Other things being equal, the better the community is as an environment for business and for living, the more the protection racket will bring in. Accordingly, if one Mafia family has the power to monopolize crime, there is little or no crime (apart from the protection racket). The considerable literature on monopolized crime makes it clear that secure monopolization of crime does, in fact, usually lead to protection rackets rather than ordinary crime. Outbreaks of theft and violence in Mafia-type environments are normally a sign that the controlling gang is losing its monopoly.

The individual robber in a populous society obtains such a narrow stake of any loss to society that he ignores the damage his thievery does to society. By contrast, the Mafia family that monopolizes crime in a community has, because of this monopoly, a moderately encompassing stake in the income of that community, so it takes the interest of the community into account in using its coercive power. Whereas the individual criminal in a populous society bears only a minuscule share of the social loss from his crime, the gang with a secure monopoly on crime in a neighborhood obtains a significant fraction of the total income of the community from its protection tax theft. Therefore, though the individual criminal normally takes all of the money in the wallet he steals, the secure and rational Mafia leader never sets a protection tax rate anywhere near 100 percent: this would reduce the neighborhood’s income so much that the Mafia family itself would be a net loser.
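The logic of the moderate protection tax can be illustrated with a toy calculation. The sketch below is not from Olson: it assumes, purely for illustration, that neighborhood income falls linearly as the tax rate rises (the function neighborhood_income and the base income of 100 are hypothetical). Under that assumption the family's take is highest at an intermediate rate and falls to zero at 100 percent.

```python
def neighborhood_income(tax_rate, base_income=100.0):
    """Hypothetical response curve: income shrinks linearly as the
    protection tax rises, reaching zero at a 100 percent rate."""
    return base_income * (1.0 - tax_rate)


def mafia_take(tax_rate, base_income=100.0):
    """The family's revenue is its tax rate times whatever income remains."""
    return tax_rate * neighborhood_income(tax_rate, base_income)


# Search tax rates from 0% to 100% in one-point steps.
rates = [i / 100 for i in range(101)]
best_rate = max(rates, key=mafia_take)

print(f"take-maximizing rate: {best_rate:.0%}")
print(f"take at that rate:    {mafia_take(best_rate):.2f}")
print(f"take at 100 percent:  {mafia_take(1.0):.2f}")
```

The specific optimum depends entirely on the assumed response curve; the point is only that a monopolist with an encompassing stake stops well short of taking everything.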

Preference for Stationary Bandits

The warlord who I was reading about, Feng Yu-hsiang, was noted for the exceptional extent to which he used his army for suppressing thievery and for his defeat of the relatively substantial army of a notorious roving bandit called White Wolf. Apparently, most people in Feng’s domain wanted him to stay as warlord and greatly preferred him to the roving bandits. At first, this situation was puzzling: Why should warlords who were simply stationary bandits continuously stealing from a given group of victims be preferred, by those victims, to roving bandits who soon departed? The warlords had no claim to legitimacy and their thefts were distinguished from those of roving bandits only because they took the form of relentless tax theft rather than occasional plunder.

There is a good reason for this preference. As we have seen, there is little production in an anarchy and thus not much to steal. If the leader of a roving bandit gang who finds only slim pickings is strong enough to take hold of a given territory and to keep other bandits out, he can monopolize crime in that area; he becomes a stationary bandit. The advantage of this monopoly over crime is not mainly that he can take what others might have stolen: it is rather that it gives him an encompassing interest in the territory. He actually has a stronger encompassing interest than the Mafia family, since the bandit leader who takes over an anarchic area does not have competition from any government’s tax collectors: he is the only one who is able to tax or steal in the domain in question.

A Benefactor to Those He Robs

The second way in which the encompassing interest of the stationary bandit changes his incentives is that it gives him an incentive to provide public goods that benefit his domain and those from whom his tax theft is taken. Paradoxically, he provides these public goods with money that he fully controls and could spend entirely on himself. We know that a public good benefits everyone in some area or group and that many public goods, such as levees that protect against floods, police that deter crime, and quarantines that limit contagious diseases, make a society more productive.  He has an incentive to spend his resources on all productivity-enhancing public goods up to the point where his last dollar spent on these goods equals his share of the resulting increase in output. Thus, if the stationary bandit’s optimal rate of tax theft is 50 percent, he will spend on public goods up to the point where the last dollar spent on these goods adds $2 to the output of the domain, since he will then receive $1.  Readers who want formal proofs and a mathematical and geometric exposition of this argument should consult McGuire & Olson (1996).
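The 50 percent example can be checked numerically. The following is a minimal sketch under stated assumptions, not the McGuire & Olson model: the production function (a base output of 1000 plus 40 times the square root of public-goods spending) is hypothetical, chosen only because it has diminishing returns, and the tax rate is held fixed at 50 percent. The bandit picks the spending level that maximizes his tax receipts net of what he spends.

```python
import math

TAX_RATE = 0.5  # the bandit's share of output, as in the example in the text


def output(public_goods, base=1000.0, k=40.0):
    """Hypothetical production function with diminishing returns
    to public-goods spending."""
    return base + k * math.sqrt(public_goods)


def marginal_output(public_goods, k=40.0):
    """Derivative of output(): extra output from one more dollar of spending."""
    return k / (2.0 * math.sqrt(public_goods))


def bandit_net(public_goods):
    """Tax receipts minus the bandit's own outlay on public goods."""
    return TAX_RATE * output(public_goods) - public_goods


# Search whole-dollar spending levels for the one that leaves the bandit richest.
best_spend = max(range(1, 401), key=bandit_net)

print(f"optimal spending:           {best_spend}")
print(f"output added by last dollar: {marginal_output(best_spend):.2f}")
print(f"bandit's share of that:      {TAX_RATE * marginal_output(best_spend):.2f}")
```

At the optimum the last dollar spent adds $2 to output, of which the bandit collects $1, so spending any more would cost him more than it returns.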

The bandit leader, if he is strong enough to hold a territory securely and monopolize theft there, has an encompassing stake in his domain. This encompassing interest leads him to:

  1. Limit and regularize the rate of his theft, and spend some of the resources that he controls on public goods that benefit his victims no less than himself.
  2. Prohibit the murder or maiming of his subjects, since the settled bandit’s victims are for him a source of tax payments.
  3. Forbid theft by anyone but himself, because stealing by his subjects, and the theft-averting behavior that it generates, reduces total income.

He serves his interests by spending some of the resources that he controls to deter crime among his subjects and to provide other public goods. A bandit leader with sufficient strength to control and hold a territory has an incentive to settle down, to wear a crown, and to become a public-good-providing autocrat.

Autocracy has been commonplace at least since King Sargon’s conquests created the empire of Akkad in ancient Mesopotamia. Most of humanity over most of history has been subjected to autocracy and exploited by tax theft. It is very difficult to find examples of benevolent despots. The stationary bandit model fits the facts far better than the hypothesis that autocrats are altruistic.

References

  • McGuire & Olson (1996). The Economics of Autocracy and Majority Rule