The Politics of Monolatrism

Part Of: History sequence
Content Summary: 3000 words, 30 min read

The Pan-Israelite Identity

Until 700 BCE Judah is a much smaller political force than it makes itself out to be. One demonstration of the small scale of this society is the request in one of the letters sent by Abdi-Heba, king of Jerusalem, to the pharaoh that he supply fifty men “to protect the land.” Another letter asks the pharaoh for one hundred soldiers to guard Megiddo from an attack by his aggressive neighbor, the king of Shechem. These Amarna letters date to the 14th century BCE, but the population does not change much in the intervening period. Until 700 BCE, Judah comprised no more than twenty settlements with a total population of roughly 40,000, with a handful of fortified cities (not including Jerusalem).1

Judah wasn’t always beholden to the Assyrian empire. But when the Assyrian god-king Tiglath-Pileser III switched from a policy of remote domination to direct military control, the states of Canaan began looking for a way out. Israel and Aram-Damascus went to Jerusalem to pressure Judah to join the independence movement. But the Judahite King Ahaz instead appealed for military assistance from Assyria, at the price of becoming a vassal to the superpower. Tiglath-Pileser III accepted the proposition, and utterly destroyed Aram-Damascus in short order (2 Kings 16:5-18). He also conquered Megiddo and Hazor in 732 BCE, crippling the Northern economy. Assyria “cancelled” the Israelite Kingdom entirely in 722 BCE.

These Iron Age nations lived on the edge of a knife. One political miscalculation, and atrocities ensue. 

The Assyrians were feared for their war crimes, and their practice of exile: forcibly relocating thousands of people into a new region, until their national heritage was subsumed by Assyrian monoculture. Little wonder that archaeologists find evidence for a mass migration of southern Samaria Israelites into Judah, plausibly as a means to escape exile. Conservative estimates have Judah’s population doubling from 40,000 to 80,000 people. The immigration was particularly pronounced in Jerusalem, which gained 15x more people in less than a generation (Brochi, 1974).

Judah did not only experience a population boom. Sites in the Shephelah show signs of a new olive oil industry. Beyond paying tribute, Ahaz also integrated Judah into the Assyrian world economy. This economic boom complemented the population boom. Together, they led Judah towards full statehood; this time period contains the first evidence of an advanced bureaucracy, complete with public works projects and scribal activity.

The population of Judah doubled. Imagine 400 million Canadians emigrated to the United States. It’s hard to fathom the myriad ways life would have to change to accommodate such an influx. Social stability would only be possible with heavy ideological emphasis on unity.

The South remembered a King David; the North remembered a King Saul. While these historical figures may have interacted with one another, the tales of their relationship – and how David ultimately earned the right to the unified throne – are surely relevant to the interests of Judahite scribes. These scribes compiled texts in an effort to reconcile the two peoples, to motivate a sense of Pan-Israelite identity. This era is where clearly-Northern (Judges, E, Saul) and clearly-Southern (J, David) traditions were first brought together in a unified series of texts.

Preparations For Revolt

2 Kings 18:14 reports that Sennacherib levied a tribute of 30 talents of gold on Judah. Assyrian records reveal that this is in fact an extremely steep sum: only two other vassals received greater demands, per Rothlin & Roux (2013). This suggests that Judah around this time was quite wealthy, a fact attested in 2 Chronicles 32:27-29. How did Judah manage to acquire so much wealth?

Judah’s role in the international market was limited by her lack of a major sea port and natural resources. But Rothlin & Roux (2013) point out that Judah should have been able to extract taxes from traffic following two international trade routes, the King’s Highway and the Via Maris. The other two cities adjacent to international trade routes, Tyre and Damascus, were subject to comparably steep tribute demands and frequent military action, testifying to their tremendous wealth-generating potential.

Given his immense wealth, King Hezekiah did not find vassal-hood acceptable. So, in a move that would ultimately doom his nation, he began preparing a revolt. His administration built the Siloam Tunnel, bringing freshwater to Jerusalem as a defense against siege. 

Archaeologists have also discovered vast numbers of storage jars produced during Hezekiah’s reign, decorated with “LMLK”, which roughly translates to “property of the King”. Many scholars think they were used for the distribution of supplies in preparation for the revolt.

Tithes as Taxes

Genesis is rife with stories of the patriarchs building altars to worship their god. The practice is codified by the Covenant Code: Exodus 20:24 endorses the construction of local altars, where all the people of Israel can participate in the Yahweh cult. 

Contrast this with Deut 12:5-6,11-14, which insists on cult centralization. Here Moses insists that there is only one legitimate place of worship – Jerusalem. This motif is a central fixture of the book of Deuteronomy. 

From a comparative perspective, centralization is an unusual policy. Recall, these reforms occurred centuries before the concepts of prayer, synagogue, and scripture even existed – there was only cultic ritual. By necessity, centralization deprives the worshiper of that direct and spontaneous religious experience to which he was accustomed in the local altars spread throughout the country.

So, why? Why was centralization such a vital issue to the Hebrew Bible? 

Many ex-Israelites presumably still worshiped in the Bethel temple, situated in the midst of their ancestral villages. Located just a few miles north of Jerusalem, this must have posed a serious religious challenge to Judahite authority. It seems that the solution was a ban on all sanctuaries – countryside shrines in Judah and the Bethel temple alike. 

Social reasons may not have been the only factor. Theocracies like ancient Israel feature strong interactions between politics, economics, and religion. Consider the taxation system of ancient Judah. As described in Oden (1984), there were several ways a king could generate revenue:

Recall Hezekiah’s predicament. Judah was starting to mature beyond its provincial chiefdom legacy. The state was actively strengthening its power, especially in the capital city of Jerusalem. Meanwhile, the king had hatched a desperate plan to revolt against the Assyrian superpower. This act would require massive funding: standing armies and city fortifications aren’t exactly cheap.

Claburn (1973) put forward the fiscal hypothesis:

How does an ambitious king most efficiently get his hands on the largest possible proportion of the peasantry’s agricultural surplus? If he is smart, he does it not by raising the assessed level of taxes, but by reforming his fiscal system so that he brings into the capital a larger proportion of the taxes already being assessed. He does this by substituting for the semi-independent local dignitaries to whom the peasant had been paying the taxes (but who had been pocketing most of the proceeds locally) a hierarchically organized central internal revenue bureau of paid officials under his direct control. 

2 Chr 31:11-12 describes the construction of elaborate storehouses to store the new influx of wealth. Deut 14:24-26 provides helpful advice on how peasants can more efficiently transport their money to the Jerusalem coffers. Deut 16:16 offers another plain assertion (emphasis added): “three times a year all your males must appear before Yahweh in the place he chooses for the Feast of Unleavened Bread, the Feast of Weeks, and the Feast of Shelters; and they must not appear before Yahweh empty-handed.”

Hezekiah also guaranteed financial protection to the now-impoverished Levites (2 Chr 31:19), which may have gone some way in quelling a potential source of civil unrest. By granting supplies to the local priests, Hezekiah assured them that his reforms did not intend to deprive them of their livelihood.

The Pious Lie

Since Martin Noth, scholars have recognized that the books of Deuteronomy, Joshua, Judges, 1-2 Samuel, and 1-2 Kings share the same author (the Deuteronomist, Dtr), or at least the same cadre of authors. Together, these books form the Deuteronomistic History (DtrH).

We have already seen how centralization pervades Deuteronomy. But critically, centralization also plays a pivotal role in the books 1 and 2 Kings. In the DtrH, all kings of Judah and Israel (!) are evaluated, in large part, on their failure to enforce Jerusalem-only worship. All northern kings are given a bad evaluation, even Jehu, who destroyed the cult of Baal. Even the good kings of Judah after Solomon and prior to Hezekiah are given only qualified good evaluations, because they permitted sanctuaries on the high places.

Next, we turn to 2 Kings 22:8-13:

Hilkiah the high priest said to Shaphan the secretary, “I have found the Book of the Law in the temple of the Lord.” He gave it to Shaphan, who informed the king, “Hilkiah the priest has given me a book.” When the king heard the words of the Book of the Law, he tore his robes. He gave these orders: “Go and inquire of the Lord about what is written in this book that has been found. Great is the Lord’s anger that burns against us because those who have gone before us have not obeyed the words of this book; they have not acted in accordance with all that is written there concerning us.”

What was this “book of the law” that Hilkiah found? Since the early nineteenth century, scholars have identified it with Deuteronomy. It is the only book in the Pentateuch to advocate centralization, and the details of Deuteronomy’s proscriptions are reported to be specifically implemented by Josiah.

Let us put all of this together. 

  1. Deuteronomy replaces Exodus’ endorsement of decentralized Yahweh worship (the ancestral form) with the (much later) idea of centralization.
  2. Centralization served to funnel wealth away from the local Levites, and channel those funds directly into royal coffers.
  3. In Kings, monarchs are judged good/bad on two criteria: exclusive worship of Yahweh, but also conformance to centralization.
  4. In Kings, Josiah is said to “discover” the book of Deuteronomy, and use it as the basis of his centralization reforms. 
  5. Per the DtrH hypothesis, the author of Kings most likely also authored Deuteronomy. 
  6. In Kings, the most textual space and full-throated praise is given to Kings Josiah and Hezekiah – who revolted against Assyria.
  7. In Kings, the most vitriolic condemnations are reserved for Ahaz and Manasseh – who were deferential to Assyria.

These data suggest three natural conclusions:

  1. Dtr is a scribe in Josiah’s court. In Kings, he combines history with an ideological argument against decentralization and idolatry. 
  2. Dtr also penned the book of Deuteronomy. Moses never advocated centralization; these ideas were instead placed on Moses’ lips. 
  3. Dtr was not only pro-centralization and pro-intolerance. He was also orchestrating a political independence movement.

In short, Dtr was a member of a Hardliner group in Jerusalem. They were violently opposed to another faction, whom we’ll call the Internationalists:

The Hardliners needed centralization to fund their war efforts. It is less clear why they affiliated with Yahweh-only monolatrist prophets like Elijah. Couldn’t the Yahwists have just as easily chosen the Internationalists’ approach to geopolitics? Or is there some structural connection between religious fundamentalism and nationalistic fervor? I don’t have an answer, but Akhenaten’s eerily similar theopolitical reforms may suggest the latter.

Sennacherib’s Revenge

Hezekiah represented the Hardliner faction within Jerusalem. At first, he continued the posture pioneered by Ahaz: subservience to Assyria. This was appropriate given that Hezekiah was crowned during the reign of Sargon II of Assyria, who single-handedly transformed the neo-Assyrian state into a multinational empire. 

But Hezekiah planned his rebellion, using centralization as a new source of funding. And when Sargon II was killed in battle, and his untested son Sennacherib assumed the throne, Hezekiah took a gamble and declared independence.

The revolt did not go well. The Neo-Assyrian war machine in this era was absolutely devastating, and Sennacherib proved able to wield it. He besieged, captured, and ransacked every significant Judean town. Hezekiah was subjugated as a vassal, and conceded an enormous tribute.

The Hebrew Bible spends just one sentence on this state-crippling result. But the archaeological record has revealed the extent of the damage. Sennacherib commissioned artwork depicting his victory over Lachish, the second most important city in Judah. The inscription of the Lachish Relief reads “Sennacherib, the mighty king of Assyria, sitting on the throne of judgment, at the entrance of the city of Lachish. I gave permission for its slaughter”.

Sennacherib laid siege to Jerusalem, but failed to capture it. The Hardliners make much of this fact, attributing the non-capture to a miracle. Yet although the Assyrians failed to destroy the monarchy of Judah (as they had Israel and Aram-Damascus), they did cripple the state, and negotiated a very unfavorable peace treaty.

The Hardliners took the survival of Jerusalem as evidence of their own invulnerability. But the Internationalists looked to another outcome: the slaughter of their people. Little surprise the Internationalists took control of government. The next Judean king, Manasseh, reversed the unpopular doctrine of centralization, allowing Yahweh worship to continue on countryside altars.

The 55-year reign of Manasseh, with its conciliatory policy towards Assyria, surely facilitated the nation’s economic recovery. Internationalist scribes likely authored the Great Solomonic Empire tradition (archaeology suggests the historical Solomon was little more than a warlord1). From Finkelstein & Silberman (2006):

The stories of Solomon in the Bible are uniquely cosmopolitan. Foreign leaders are not enemies to be conquered or tyrants to be suffered; they are equals with whom to deal politely, if cleverly, to achieve commercial success. The Solomonic narratives were used to legitimize for all of Judah’s people the aristocratic culture and commercial concerns of the court of Manasseh that promoted Judah’s participation in the Assyrian world economy.

Return of the Hardliners

After Manasseh died, Amon inherited the throne… and was then murdered. The Hardliners emerged from the coup in control of the reins of government, with an 8-year-old boy named Josiah ultimately crowned King. During Josiah’s childhood, Dtr authored Deuteronomy, and this text was later used by the King as an ideological justification for his renewed efforts at taxation-centralization. Deuteronomy is also framed as a suzerain covenant treaty of submission of Israel to Yahweh, in the same template as was used by Assyria to assert dominance over its vassals (Romer, 2007). By declaring fealty to Yahweh, a political statement was made: Judah was no longer a vassal of Assyria.

The Hardliners were more skeptical of the Assyrian global economy. To the glories of Internationalist Solomon were added Hardliner allegations of moral depravity, stemming from corruption by his foreign wives.

In the middle of Josiah’s reign, and for external reasons, the Assyrian Empire was beginning to disintegrate, and the Neo-Babylonian Empire had not yet risen to replace it. It is possible that Egypt and Assyria reached some sort of an understanding, according to which Egypt inherited the Assyrian provinces to the west of the Euphrates in exchange for a commitment to provide Assyria with military support (Finkelstein & Silberman, 2002). Yet in these uncertain times, many nations formerly under the yoke of Assyria were able to govern themselves independently. One imagines a waft of optimism during this time, hope tinged with zealous patriotism.

A Fateful Miscalculation

So when the Egyptians journeyed north to support Assyria against the “new kid on the block” (Babylon), the precocious Judah decided not to let them pass. The thought of their hated nemesis receiving military support was perhaps too much. 

Of course, this geopolitical read turned out to be mistaken. Babylon, not Assyria, turned out to be the empire to worry about. And more to the point, Egypt defeated Judah on the battlefield of Megiddo. Josiah was killed. His army was slaughtered.

Just as Israel had done, a dramatically weakened Judah went on to stubbornly rebel against Babylon, despite the suicidal imbalance of power. And they paid the price. Nebuchadnezzar sacked Jerusalem, and exiled the Jerusalem elite. All told, 5-15% of the Judahite population – the intelligentsia – were exiled into Babylonian provinces. This is called the beginning of the diaspora. But note that most of the Judahite population remained on the land as rural subsistence farmers: their daily lives weren’t affected much by the change of power in the capital.

The Politics of Monolatrism

Had the Bible been written in a modern context, you might see it bristling with geopolitical intrigue, moving appeals for independence, and the like. But since it was written in a vastly different cultural milieu, these very same sentiments manifest themselves as zealous ardor for centralization.

After exile, Dtr naturally didn’t throw away the exuberant texts of Kings and Deuteronomy. Instead, he reworked them to explain why the state of Judah had been destroyed. Unwilling to attribute blame to the Hardliner acts of rebellion, he instead attributed the collapse of the state to two factors:

  • Past Internationalist administrations, which were blamed for the bad political outcomes of the Hardliners.
  • The sins of the peasants, who consistently failed to renounce their polytheism and worship Yahweh.

In this second edition of DtrH, the conditional Mosaic covenant (“you will keep the land if…”) was emphasized, as a way of reconciling history and the unconditional Davidic covenant (“David’s dynasty will never end”). 

Perhaps exile would have been the end of the story, if the processes of cultural assimilation had not been interrupted by Cyrus and the Achaemenid Empire. But they did intervene. And within this timeframe, as we will see, a rival faction to the Deuteronomists is responsible for one of the most important ideological innovations of our modern world. Called the Priestly source in DH parlance, these temple-less priests sitting in exile invented monotheism.

Until next time.


1. This particular section relies on low chronology. Alternative chronologies exist; see Thomas (2016). Note that most of the conclusions reached in this article do not depend on low chronology, and are also held by “high chronology” scholars.


  • Borowski (1995). Hezekiah’s Reforms and the Revolt against Assyria 
  • Brochi (1974). The Expansion of Jerusalem in the Reigns of Hezekiah and Manasseh
  • Claburn (1973). The Fiscal Basis of Josiah’s Reforms
  • Finkelstein & Silberman (2002). The Bible Unearthed
  • Finkelstein & Silberman (2006). David and Solomon
  • Oden (1984). Taxation in Biblical Israel
  • Rothlin & Roux (2013). Hezekiah and the Assyrian tribute
  • Romer (2007). The So-Called Deuteronomistic History
  • Thomas (2016). Debating the United Monarchy: Let’s See How Far We’ve Come 

Epistemic vs Aleatory Uncertainty

Part Of: Bayesianism series
Content Summary: 2300 words, 23 min read
Epistemic Status: several of these ideas are not distillations, but rather products of my own mind. Recommend a grain of salt.

The Biology of Uncertainty

In the reinforcement learning literature, there exists a bedrock distinction of exploration vs exploitation. A rat can either search for a new food source, or continue mining calories from his current stash. There is risk in exploration (what if you don’t find anything better?), and often diminishing returns (if you’re confined to 2 miles from your sleeping grounds, there’s only so much territory that needs to be explored). But without exploration, you hazard large opportunity costs and your food supply becomes quite fragile. 

Exploitation can be conducted unconsciously. You simply need nonconscious modules to track the rate of returns provided by your food site. These devices will alarm if the food source degrades, but otherwise don’t bother you much. In contrast, exploration engages an enormous amount of cognitive resources: your cognitive map (neural GPS), action plans, world-beliefs, causal inference. Exploration is about learning, and as such requires consciousness. Exploration is paying attention to the details.

Exploration will tend to produce probability matching behaviors: your actions are in proportion to your action value estimates. Exploitation tends to produce maximizing behaviors: you always choose the action estimated to produce the most value. 
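
The contrast between the two choice rules can be made concrete with a minimal sketch. The action values below are hypothetical, and the function names are illustrative rather than drawn from any particular reinforcement learning library.

```python
import random


def maximize(values):
    """Exploitation-style rule: always pick the action with the highest estimate."""
    return max(range(len(values)), key=lambda i: values[i])


def probability_match(values, rng):
    """Exploration-style rule: pick each action with probability
    proportional to its (non-negative) value estimate."""
    return rng.choices(range(len(values)), weights=values, k=1)[0]


def choice_frequencies(policy, values, trials=10_000):
    """Run a policy repeatedly and report how often each action is chosen."""
    counts = [0] * len(values)
    for _ in range(trials):
        counts[policy(values)] += 1
    return [c / trials for c in counts]


rng = random.Random(0)   # fixed seed so the sketch is repeatable
values = [0.7, 0.3]      # hypothetical action value estimates

print("maximizing:", choice_frequencies(maximize, values))
print("matching:  ", choice_frequencies(lambda v: probability_match(v, rng), values))
```

Under these assumptions the maximizer never samples the second action, while the matcher keeps visiting it roughly 30% of the time, which is what lets it notice if that action's value changes.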

Statistics and Controversy

Everyone agrees that probability theory is a profoundly useful tool for understanding uncertainty. The problem is, statisticians cannot agree on what probability means. Frequentists insist on interpreting probability as relative frequency; Bayesians interpret probability as degree of confidence. Frequentists use random variables to describe data; Bayesians are comfortable also using them to describe model parameters. 

We can reformulate the debate as between two conceptions of uncertainty. Epistemic uncertainty is the subjective Bayesian interpretation, the kind of uncertainty that can be reduced by learning. Aleatory uncertainty is the objective Frequentist stuff, the kind of uncertainty you accept and work around.

Philosophical disagreements often have interesting implications. For example, you might approach deontological (rule-based) and consequentialist (outcome-based) ethical theories as a winner-take-all philosophical slugfest. But Joshua Greene has shown that both camps express unique circuitry in the human mind: every human being experiences both ethical intuitions during moral dilemmas (but at different intensities and with different activation profiles).

The sociological fact of persistent philosophical disagreement sometimes reveals conflicting intuitions within human nature itself. Controversy reification is a thing. Is it possible this controversy within philosophy of statistics suggests a tension buried in human nature?

I submit these rivaling definitions of uncertainty are grounded in the exploration and exploitation repertoires. Exploratory behavior treats unpredictability as ignorance to be overcome; exploitation behavior treats unpredictability as noise to be accommodated. All vertebrates possess two ways of approaching uncertainty. Human philosophers and statisticians are rationalizing and formalizing truly ancient intuitions.

Cleaving Nature At Its Joints

Most disagreements are trivial. Nothing biologically significant hinges on the fact that some people prefer the color blue, and others green. Do frequentist/Bayesian intuitions resemble blue/green, or deontological/consequentialist? How would you tell?

Blue-preferring statements don’t seem systematically different from green-preferring statements. But intuitions about epistemic vs aleatory uncertainty do systematically differ. The psychological data presented in Brun et al (2011) is very strong on this point.

Statistical concepts are often introduced with ridiculously homogeneous events, like a coin flip. It is essentially impossible for a neurotypical human to perfectly predict the outcome of a coin flip (which is determined by the arcane minutiae of muscular spasms, atmospheric friction, and chaos theory). Coin flips are perceived as the same. The location of the coin flip, the atmosphere of the room, the force you apply: none seem to disturb the outcome of a fair coin. In contrast, epistemic uncertainty is perceived within single-case heterogeneous events, such as propositions like “Is it true that Osama Bin Laden is inside the compound?”

As mentioned previously, these uncertainties elicit different kinds of information search (causal mental models versus counting), linguistic markers (“plausible” vs “chance”), and even different behaviors (exploration vs exploitation). 

People experience epistemic uncertainty as more aversive. People prefer to guess the roll of a die, the sex of a child, and the outcome of a horse race before the event rather than after. Before a coin flip, we experience aleatory uncertainty; if you flip the coin and hide the result, our psychology switches to a more uncomfortable sense of epistemic uncertainty. We are often less willing to bet money when we experience significant epistemic uncertainty.

These epistemic discomforts of course make sense from a sociological perspective: if we sit under epistemic uncertainty, we are more vulnerable to being exploited – both materially by betting, and reputationally by appearing ignorant.

Several studies have found that although participants tend to be overconfident when assessing probabilities that their specific answers are correct, they tend to be underconfident when later asked to estimate the proportion of items that they had answered correctly. While the particular mechanism driving this phenomenon is unclear, the pattern suggests that evaluations of epistemic vs aleatory uncertainty rely on distinct information, weights, and/or processes.

People can be primed to switch their representation. If you advise a person to “think like a statistician”, they will invariably … The same holds when drawing balls from an urn: if you remove a ball but don’t show its color, people switch from the Outside View (extensional) to the Inside View (intensional).

Other Appearances of the Distinction

Perhaps the most famous expression of the distinction comes from Donald Rumsfeld in 2002:

As we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns—the ones we don’t know we don’t know. And if one looks throughout the history of our country and other free countries, it is the latter category that tend to be the difficult ones.

You can also find the distinction hovering in Barack Obama’s retrospective on the decision to raid a suspected OBL compound:

  • The question of whether Osama Bin Laden was within the compound is an unknown fact – an epistemic uncertainty.
  • The question of whether the raid would be successful is an outcome of a distribution – an aleatory uncertainty.

A related distinction, Knightian uncertainty, comes from the economist Frank Knight. “Uncertainty must be taken in a sense radically distinct from the familiar notion of Risk, from which it has never been properly separated…. The essential fact is that ‘risk’ means in some cases a quantity susceptible of measurement, while at other times it is something distinctly not of this character; and there are far-reaching and crucial differences in the bearings of the phenomena depending on which of the two is really present and operating…. It will appear that a measurable uncertainty, or ‘risk’ proper, as we shall use the term, is so far different from an unmeasurable one that it is not in effect an uncertainty at all.” It is well illustrated by the Ellsberg Paradox:

As Hsu et al (2005) demonstrate, people literally use different systems in their brains to process the above games. When the game structure is known, the reward processing centers (the basal ganglia) are used. When the game structure is unknown, fear processing centers (amygdala nuclei) are instead employed.
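
For readers who want the arithmetic behind the paradox, here is a minimal sketch of one common two-urn formulation. The urn sizes and the uniform weighting over compositions are assumptions made for illustration, not details taken from the studies cited here. Urn A is known to hold 50 red and 50 black balls; urn B holds 100 balls in an unknown red/black mix; a bet pays out if a red ball is drawn.

```python
from fractions import Fraction


def win_probability_known(red, total):
    """Chance of drawing red when the urn's composition is known (pure risk)."""
    return Fraction(red, total)


def win_probability_unknown(total):
    """Chance of drawing red when the composition is unknown, assuming
    every possible number of red balls (0..total) is equally plausible."""
    compositions = range(total + 1)
    return sum(Fraction(r, total) for r in compositions) / len(compositions)


known = win_probability_known(50, 100)
unknown = win_probability_unknown(100)
print(known, unknown)  # prints "1/2 1/2"
```

Under this assumption the two bets are numerically identical, yet most people prefer the known urn. The preference tracks the kind of uncertainty involved rather than the expected payoff.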

Mousavi & Gigerenzer (2017) use Knightian uncertainty to defend the rationality of heuristics in decision making. Nassim Taleb’s theory of “fat tailed distributions” is often interpreted as an affirmation of Knightian uncertainty, a view he rejects.

Towards a Formal Theory

For some, Knightian uncertainty has been a rallying cry driven by discontents with orthodox probability theory. It is associated with efforts at replacing its Kolmogorov foundations. Intuitionistic probability theory, replacing classical axioms with computationally tractable alternatives, is a classic example of this kind of work. But as Weatherson (2003) notes, other alternatives exist:

It is a standard claim of modern Bayesian epistemology that reasonable epistemic states should be representable by probability functions. There have been a number of authors who have opposed this claim. For example, it has been claimed that epistemic states should be representable by Zadeh’s fuzzy sets, Dempster and Shafer’s evidence functions, Shackle’s potential surprise functions, Cohen’s inductive probabilities or Schmeidler’s non-additive probabilities. A major motivation of these theorists has been that in cases where we have little or no evidence for or against p, it should be reasonable to have low degrees of belief in each of p and not-p, something apparently incompatible with the Bayesian approach. 

Evaluating the validity of these heterodoxies is beyond the scope of this article. For now, let me state that it may be possible to simply accommodate the epistemic/aleatory distinction within probability theory itself. As Andrew Gelman claims:

The distinction between different sources of uncertainty can in fact be encoded in the mathematics of conditional probability. So-called Knightian uncertainty can be modeled using the framework of probability theory.

You can arguably see the distinction in the statistical concept of Bayesian optimality. For tasks with low aleatory uncertainty (e.g., classification on high-res images), classification performance can approach 100%. But for other tasks with higher aleatory uncertainty (e.g., predicting future stock prices), model performance asymptotically approaches a much lower bound.
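
A toy case makes the performance ceiling explicit. The setup is an assumption chosen for simplicity: two equally likely classes, one feature, each class normally distributed with unit variance, and class means a given distance apart. The best possible classifier thresholds at the midpoint, and its error rate is the Gaussian tail area beyond half the separation.

```python
import math


def bayes_error(separation):
    """Lowest achievable error rate for two equally likely classes whose
    single feature is N(-s/2, 1) for one class and N(+s/2, 1) for the other.

    The optimal rule thresholds at 0, so the error is Phi(-s/2), written
    here with the complementary error function.
    """
    return 0.5 * math.erfc(separation / (2 * math.sqrt(2)))


for s in (0.0, 1.0, 2.0, 6.0):  # hypothetical class separations
    print(f"separation={s:.1f}  best possible accuracy={1 - bayes_error(s):.4f}")
```

No amount of extra training data moves a model past this ceiling. More data pays down the epistemic part of the error, and what remains is the aleatory overlap between the classes.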

Recall the Bayesian interpretation of learning:

Learning is a plausibility calculus, where new data pays down uncertainty. What is uncertainty? Uncertainty is how “loosely held” our beliefs are. The more data we have, the less uncertain we must be, and the sharper the peaks in our belief distribution.

We can interpret learning as asymptotic distribution refinement, approaching some raw noise profile beyond which we cannot reach:
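
One way to watch this refinement happen is a Beta-Bernoulli sketch: a learner with a uniform prior observes coin flips and updates its belief about the coin's bias. The fair coin, the seed, and the function names are assumptions for illustration. The spread of the posterior stands in for epistemic uncertainty, and the spread of the next flip stands in for the aleatory noise floor.

```python
import random


def posterior_sd(heads, tails):
    """Std dev of Beta(1 + heads, 1 + tails): how loosely held the belief
    about the coin's bias is (epistemic uncertainty)."""
    a, b = 1 + heads, 1 + tails
    return ((a * b) / ((a + b) ** 2 * (a + b + 1))) ** 0.5


def predictive_sd(heads, tails):
    """Std dev of the next flip under the posterior mean bias: the noise
    that remains even when the bias is well known (aleatory uncertainty)."""
    p = (1 + heads) / (2 + heads + tails)
    return (p * (1 - p)) ** 0.5


rng = random.Random(0)
true_bias = 0.5  # hypothetical fair coin
heads = tails = 0
for n in range(1, 10_001):
    if rng.random() < true_bias:
        heads += 1
    else:
        tails += 1
    if n in (10, 100, 1_000, 10_000):
        print(n, round(posterior_sd(heads, tails), 4), round(predictive_sd(heads, tails), 4))
```

The first column of uncertainty shrinks toward zero as flips accumulate, while the second settles near 0.5 and stays there: the belief sharpens, but the next flip is no easier to call.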

Science qua cultural learning, then, is not about certainty, not about facts etched into stone tablets. Rather, science is about painstakingly paying down epistemic uncertainty: sharpening our hypotheses to be “as simple as possible, but no simpler”. 

Inside vs Outside View

The epistemic/aleatory distinction seems to play an underrated role in forecasting. Consider the inside vs outside view, first popularized by Kahneman & Lovallo (1993):

Two distinct modes of forecasting were applied to the same problem in this incident. The inside view of the problem is the one that all participants adopted. An inside view forecast is generated by focusing on the case at hand, by considering the plan and the obstacles to its completion, by constructing scenarios of future progress, and by extrapolating current trends. The outside view is the one that the curriculum expert was encouraged to adopt. It essentially ignores the details of the case at hand, and involves no attempt at detailed forecasting of the future history of the project. Instead, it focuses on the statistics of a class of cases chosen to be similar in relevant respects to the present one. The case at hand is also compared to other members of the class, in an attempt to assess its position in the distribution of outcomes for the class. …

Tetlock (2015) describes how superforecasters tend to start with the outside view:

It’s natural to be drawn to the inside view. It’s usually concrete and filled with engaging detail we can use to craft a story about what’s going on. The outside view is typically abstract, bare, and doesn’t lend itself so readily to storytelling. But superforecasters don’t bother with any of that, at least not at first.

Suppose I pose to you the following question. “The Renzettis live in a small house at 84 Chestnut Avenue. Frank Renzetti is forty-five and works as a bookkeeper for a moving company. Mary Renzetti is thirty-five and works part-time at a day care. They have one child, Tommy, who is five. Frank’s widowed mother, Camila, also lives with the family.” Given all that information, how likely is it that the Renzettis have a pet?

A superforecaster knows to start with the outside view; in this case, the base rates. The first thing they would do is find out what percentage of American households own a pet. Starting from this probability, you can then slowly incorporate the idiosyncrasies of the Renzettis into your answer.
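One way to sketch "start from the base rate, then adjust" is an odds-form Bayesian update. Everything numeric here is invented for illustration: the 60% base rate and the two likelihood ratios are assumptions, not figures from Tetlock, and `update_odds` is a hypothetical helper.

```python
def update_odds(prior_prob, likelihood_ratios):
    """Convert a base rate to odds, apply each adjustment, convert back."""
    odds = prior_prob / (1.0 - prior_prob)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)

# Outside view: a made-up base rate for household pet ownership.
base_rate = 0.60

# Inside view: made-up likelihood ratios for case-specific details
# (say, a young child at home nudges up; a small house nudges down).
adjustments = [1.2, 0.9]

estimate = update_odds(base_rate, adjustments)
print(round(estimate, 3))
```

The design choice worth noticing is the ordering: the details only ever move the estimate away from the base rate by modest factors, instead of generating the estimate from scratch.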

At first, it is very difficult to square this recommendation with how rats learn. This ordering is, in fact, precisely backwards:

Fortunately, the tension disappears when you remember the human faculty of social learning. In contrast with rats, we don’t merely form beliefs from experience; we also ingest mimetic beliefs – those which we directly download from the supermind of culture. The rivaling fields of personal epistemology and social epistemology are yet another example of controversy reification.

This, then, is why Tetlock’s advice tends to work well in practice1:

On some occasions, for some topics, humans cannot afford to engage in individual epistemic learning (see the evolution of faith). But for important descriptive matters, it is often advisable to start with a socially accepted position and “mix in” your own personal insights and perspectives (developing the Inside View). 

When I read complaints about the blind outside view, what I hear is a simple defense of individual learning.


1. Even this individual/social distinction is not quite precise enough. There are, in fact, two forms of social learning. Qualitative social learning is learning by speech generated by others; quantitative social learning is learning by maths and data curated by others. Figuring out how the quantitative/qualitative data intake mechanisms work is left as an exercise to the reader 😉


  • Brun et al (2011). Two Dimensions of Uncertainty
  • Hsu et al (2005). Neural Systems Responding to Degrees of Uncertainty in Human Decision-Making
  • Kahneman & Lovallo (1993). Timid choices and bold forecasts: A cognitive perspective on risk taking
  • Mousavi & Gigerenzer (2017). Heuristics are Tools for Uncertainty
  • Tetlock (2015). Superforecasting
  • Weatherson (2003). From Classical to Intuitionistic Probability. 

GDP as Standard of Living

Part Of: Economics sequence
Content Summary: 2500 words, 25 min read

This will be an (embarrassingly high-level) overview of macroeconomics. This post is intended as a framework, a jumping-off point for more detailed analyses.


During the Great Depression, Americans had a vague sense that it was harder to keep a job, and harder to pay your bills. But no one really knew how long it would last, or if it could be brought to a merciful end. Governments tried several policy solutions, but it was very hard to tell whether their policies were helping or hurting. Governments were making decisions on the basis of such sketchy data as stock price indices, freight car loading, and incomplete indices of industrial production. 

As the problem worsened, it weighed increasingly on the public mind. For the first time, the economy entered the public lexicon as a noun. And the situation prompted governments to get more serious about economic data collection. In order to forecast economic outcomes, it pays to get quantitative about the present. What is the state of the economy?

To answer this, we might endeavor to calculate the value of all the stuff in the United States. 

Imagine going through your living space, taking every possession and entering its value into a spreadsheet. Imagine doing this, but for all goods in every house, every apartment, every place of business, every square meter of pavement (and services too, like your last haircut). 

The calculation sounds daunting. So, why not just keep track of the stuff you bought this year? Rather than calculating wealth (net worth), it’s often simpler to calculate spending. Just as a wealthier person spends more, a wealthier nation (like the US) most plausibly produces more every year.

The formal definition:

Gross Domestic Product (GDP) is the market value of all finished goods and services produced within a country in a year. 

Now, three facets of this definition are worth keeping in mind:

  • finished: A finished good is a good that will not be sold again as part of another good. Steel, engines, and flour often serve as examples of intermediate goods: raw materials that are repackaged into final goods like bicycles, cars, and bread. But if a customer buys eggs to make an omelette, those eggs still count as final goods, since the omelette will not be again put up for sale. 
  • produced: GDP only counts new goods and services. A used car sold this year does not count towards GDP; but a new car does. 
  • within a country: exports count (they are produced domestically); imports are subtracted (they are produced abroad).

GDP is how economists measure three very important aspects of human societies:

  • Standard of living is GDP per capita. 
  • Productivity is GDP per hour worked.
  • Growth is GDP change over time.

Standard of living matters. In the DR Congo, people earn on average $500 per year. That’s only six baskets of stuff… for an entire year. In Mexico, $21,000 or 178 baskets. In the United States, it’s $67,000 or 545 baskets worth of stuff.

GDP Categories

In a personal budget, it often helps to group your spending habits by categories – so too for nations. GDP is often decomposed into four components:

  1. Consumption. Private expenditures, including durable goods, non-durable goods, and services.
  2. Investment. Does not include financial products (which are instead considered saving).
  3. Government Consumption. All government expenditures on final goods/services, and also its investments.
  4. Net Exports. Exports have outbound value, imports have inbound value. Imports detract from export receipts; Net Exports = Exports – Imports.
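The four components add up to GDP. Here is a small sketch of that sum, using a hypothetical helper `gdp_expenditure` and invented figures (in billions of an unspecified currency) for a fictional country.

```python
def gdp_expenditure(consumption, investment, government, exports, imports):
    """Sum the four expenditure components; imports enter with a minus sign."""
    net_exports = exports - imports
    return consumption + investment + government + net_exports

# Made-up figures for illustration only.
gdp = gdp_expenditure(
    consumption=700,
    investment=180,
    government=170,
    exports=120,
    imports=150,
)
print(gdp)
```

Because this fictional country imports more than it exports, net exports are negative and pull the total below the sum of the first three components.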

To understand what drives changes in GDP, other disaggregations are possible. For example,

  • Partitioning by State or Province is useful in interrogating geographical information.
  • Partitioning by Industry is useful to flagging problematic industries. 

There is a related notion of Gross National Income (GNI). The relationship between expenditures and income is something like Newton’s Third Law: “for every action, there is an equal and opposite reaction”. In theory, GDP and GNI should be equivalent; in practice they sometimes slightly come apart (for complicated reasons). GNI thus provides a complementary way of measuring changes in wealth.

There are many ways to disaggregate GNI; one of the more popular operationalizations is to consider four factors: { employee compensation, rent, interest, and profit }.

Towards Real GDP

During the hyperinflation era of Zimbabwe, the price of a sheet of toilet paper went to 417 Z-dollars. Surely, we don’t want to confuse the act of printing more money, with producing more valuable goods and services. 

We’ve all heard our grandparents say, “when I was a kid, that cost a quarter”. But such memories conflate nominal versus real prices. If you control for inflation, some goods (e.g. movie tickets) have kept roughly the same price; other goods (e.g. electricity) have become easier to purchase. Yet inflation makes both feel more expensive.

In general, money illusion denotes our predisposition to focus on nominal rather than real prices. People positively revolt when their nominal salary is cut, but rarely notice if their real salary is cut (eg if inflation increases more than your raise).

To compute real GDP over time, simply fix your dollar value to a single year (eg 2020 dollars). This allows for comparison between real GDP and nominal GDP. In the United States before 1980, nominal growth has been about 7.5% per year, whereas real growth has been about 3.5%.
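A minimal sketch of the adjustment, assuming we have a price index that equals 100 in the base year. The helpers `real_gdp` and `annual_growth` and all of the numbers are hypothetical, chosen only to show nominal growth outpacing real growth.

```python
def real_gdp(nominal_gdp, price_index, base_index=100.0):
    """Express nominal GDP in base-year dollars using a price index."""
    return nominal_gdp * base_index / price_index

def annual_growth(start, end, years):
    """Compound annual growth rate between two values."""
    return (end / start) ** (1.0 / years) - 1.0

# Invented economy: nominal GDP doubles in ten years while prices rise 50%.
nominal_start, nominal_end = 1000.0, 2000.0
index_start, index_end = 100.0, 150.0
years = 10

nominal_growth = annual_growth(nominal_start, nominal_end, years)
real_growth = annual_growth(
    real_gdp(nominal_start, index_start),
    real_gdp(nominal_end, index_end),
    years,
)
print(nominal_growth, real_growth)
```

The gap between the two growth rates is the part of "growth" that was only rising prices.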

Economic data like this also showcases an important convention: a recession is commonly defined as two consecutive quarters of negative real GDP growth (you can’t see that as clearly in this dataset, which aggregates growth by year).

The cost of an iPhone is $700 USD in the United States, and $700 in India. The cost of a haircut is $20 in the United States, and $1 in India. This is the Balassa-Samuelson effect. Why should it exist?

If iPhones were sold for less in India, more people would purchase iPhones from India and have them shipped to their house. This process is called arbitrage, and it enforces the Law of One Price. However, the Law of One Price only applies to tradable goods: you cannot ship a haircut overseas.

Before adjusting for this effect, you might conclude that the average income of a person living in India is 33 times smaller than someone living in the United States. After the adjustment, the actual number becomes visible: only 10 times less purchasing power. 

The Significance of Growth

For most developed nations, GDP doesn’t increase linearly (say, an additional $10b per year), but exponentially (e.g. 2% more per year). Just as exponential growth in epidemics can lead to surprisingly horrendous outcomes, exponential growth in economies can lead to surprisingly affluent outcomes.
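To get a feel for the difference, here is a short sketch comparing linear and compound growth. The helpers `compound` and `doubling_time` are hypothetical, and the starting value of 100 is arbitrary.

```python
import math

def compound(initial, rate, years):
    """Value after growing by a fixed percentage each year."""
    return initial * (1.0 + rate) ** years

def doubling_time(rate):
    """Years needed to double at a constant annual growth rate."""
    return math.log(2.0) / math.log(1.0 + rate)

# Arbitrary starting point of 100, observed over two centuries.
linear_after_200 = 100 + 2 * 200          # add 2 units every year
exponential_after_200 = compound(100, 0.02, 200)  # add 2% every year

print(linear_after_200, exponential_after_200, doubling_time(0.02))
```

At 2% per year an economy doubles roughly every 35 years, so two centuries of compounding dwarfs two centuries of fixed increments.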

The economically naive think to themselves, “previous lives were similar to mine, except with different ideas and older technologies.” But consider that, for the entirety of human history, our predecessors lived as close to starvation as the modern-day poorest nations. After controlling for inflation – with today’s dollars – almost everyone made less than $1 per day. 

Jesus once said, “The poor will always be with you”. And yes, a person living in the 1st century would have good reason to believe our species is eternally doomed to absolute poverty. But then the industrial revolution happened!

Take a moment to get your head around this. Extreme poverty has been the fate of 90% of the world’s population since our species emerged on the world scene some 270,000 years ago. Only two centuries ago did this state of affairs change.

Prior to the industrial revolution, all human beings were subject to the Malthusian trap, where resources were a zero-sum game. Wealth temporarily increased during the Black Death, simply because there were fewer people to “share the pie” with.

Another way to view this same data is by looking at land fertility (since agriculture used to be the only significant economic sector). Ever since the first agricultural revolution in 10,000 BCE, productivity has produced people, not prosperity. 

This was the state of affairs for 99.925% of human history. You are living in a very unusual time.

The Causes of Growth

So… how did our species escape the Malthusian trap?

Escape is not a guarantee. It didn’t happen before 1800. And it also didn’t happen uniformly; it began as a phenomenon of the West.

Why is there “divergence, big time”? What causes growth to succeed or fail? To answer this, we need a theory of the causes of growth. 

As a first pass, people use cultural knowledge and physical tools to produce goods and services. The Solow Model is used to model these immediate causes of growth. But as we arguably learned from communism, bad institutions can impede incentives to produce. While harder to measure, institutional structure orchestrates economic production. Finally, institutions do not derive ex nihilo; rather, they too are (slowly) molded by the forces of history, geography, etc etc. Our account of growth thus features three tiers of causes.

You can see the effect of institutions clearly in satellite photos of the Korean peninsula:

Most people see this picture and think, “wow, communism really made its citizenry poor”. But that is fuzzy thinking. In 1945, Korea was a single country, with the same (quite impoverished) economy. Sure, North Korea did become somewhat poorer, but the much larger effect was that South Korea became prosperous.

The field of development economics studies what causes some nations to catch the growth train, and others to miss it (and what can be done).

GDP vs Wealth

If you could only choose two economic measures to track, which would they be?

  • A person’s finances cannot be completely described by income; it also helps to know their net worth.
  • A company’s finances cannot be completely described by profits; it also helps to know its balance sheet.
  • A country’s finances cannot be completely described by GDP; it also helps to know its total wealth.

Imagine a partially-full bathtub, with some water entering and some leaving. In system dynamics jargon of stocks and flows: GDP is an inflow, wealth is a stock.

After it has been bombed, a city’s GDP often increases. Why? The damage sustained during warfare is destruction of wealth: a large outflow. Yet by the law of diminishing marginal utility, it is often easier to replace capital than to make even more stuff. While GDP gives you a rosy picture, if you also track wealth you will have an easier time grasping the true cost of war.
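A toy stock-and-flow ledger makes the point concrete. The helper `simulate_wealth` is hypothetical and the yearly figures are invented: production is the inflow, and depreciation or wartime destruction is the outflow.

```python
def simulate_wealth(initial_wealth, inflows, outflows):
    """Track a stock of wealth given yearly inflows and outflows."""
    wealth = initial_wealth
    history = []
    for inflow, outflow in zip(inflows, outflows):
        wealth += inflow - outflow
        history.append(wealth)
    return history

# Invented city: year 2 is the bombing, year 3 is the rebuilding boom.
production = [10, 10, 14]   # the flow that GDP measures
losses = [2, 30, 2]         # depreciation and destruction

history = simulate_wealth(100, production, losses)
print(history)
```

In year 3 the flow is at its highest, yet the stock is still below where it stood after year 1. Tracking only the flow would miss that.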

It is often useful to extend our mental model to include the environment. GDP relies on extraction of (often non-renewable) resources from the Earth. In this sense, GDP is not just an inflow to wealth, but also an outflow of natural resources.

Exactly how large is the stock of natural resources? Your answer will likely affect your judgment of the morality of the capitalistic enterprise. 

GDP vs Welfare

One way of interpreting policy decisions is that they ought to maximize a single variable: societal welfare. But what is this variable sensitive to?

Welfare is a multidimensional measure. Other dimensions arguably should be included in any final analysis:

Importantly, GDP tends to correlate with immaterial factors of welfare. As countries become more affluent, for example, they tend to invest more in health care (and vice versa). The correlation (bidirectional causal link) between GDP and life expectancy is very strong.

Positive psychology has been directly measuring subjective life satisfaction for many years now. Enduring low standards of living is unpleasant! 

In the above, GDP per capita has been log-transformed. When you are very poor, becoming more wealthy matters a lot; when you are rich, less so. 

I used to think consequentialist thinking was confined to 19th century philosophical traditions… and then I learned economics.

Five Concerns

I’ll mention five concerns often leveled against free-market economics generally, and productivity specifically.

  1. Unsustainability. Exponential growth means exponential depletion. It cannot be sustained.
  2. Materialism. Developed nations produce much more than they need; so we lionize gratuitous consumption to increase demand. 
  3. Specialization. Division of labor produces more wealth. Yet as this process intensifies, our mental lives become increasingly banal.
  4. Inequality. Capitalism is extinguishing absolute poverty, but at the same time exacerbating relative inequality. This is unfair, and socially toxic.
  5. Monoculture. The West got rich first, and abused its power first by direct colonialist enslavement, and later by sneakily-abstract trade deals.

I will defer an evaluation of these charges for now; I simply felt it useful to present this incomplete list. 


This post discussed eight topics:

  1. GDP is “the market value of all finished goods and services produced within a country in a year”
  2. There are many ways to disaggregate GDP; GNI is the equivalent, income-based variant.
  3. After you adjust for inflation, Nominal GDP becomes Real GDP. After you adjust for the Balassa-Samuelson effect, Real GDP can facilitate between-country comparisons.
  4. Before the industrial revolution, our species was stuck in a Malthusian trap, where productivity produced people not prosperity.
  5. Physical capital, human capital, and ideas conspire to create wealth. More distal influences include institutions, including property rights, reliable courts, etc…
  6. Growth is an inflow into a country’s wealth. It is important to recognize that growth depletes natural resources.  
  7. Welfare (aggregate life satisfaction) requires more than material comfort. But note! GDP strongly correlates with life expectancy and happiness.
  8. There are five concerns often voiced towards GDP talk. They are unsustainability, materialism, specialization, inequality, and monoculture.

[Excerpt] The Moral/Conventional Distinction

Part Of: Demystifying Ethics sequence
Excerpt From: Kelly et al (2007). Harm, affect, and the moral/conventional distinction.
Content Summary: 800 words, 8 min read.

Commonsense intuition seems to recognize a distinction between two quite different sorts of rules governing behavior, namely moral rules and conventional rules. Prototypical examples of moral rules include those prohibiting killing or injuring other people, stealing their property, or breaking promises. Prototypical examples of conventional rules include those prohibiting wearing gender-inappropriate clothing (e.g., men wearing dresses), licking one’s plate at the dinner table, and talking in a classroom when one has not been called on by the teacher.

Starting in the mid-1970s, a number of psychologists, following the lead of Elliott Turiel, have argued that the moral/conventional distinction is both psychologically real and psychologically important. 

Though the details have varied from one author to another, the core ideas about moral rules are as follows:

  • Moral rules have objective, prescriptive force; they are not dependent on the authority of any individual or institution.
  • Moral rules hold generally, not just locally; they not only proscribe behavior here and now, they also proscribe behavior in other countries and at other times in history.
  • Violations of moral rules typically involve a victim who has been harmed, whose rights have been violated, or who has been subject to an injustice
  • Violations of moral rules are typically more serious than violations of conventional rules. 

By contrast, the following are offered as the core features of conventional rules:

  • Conventional rules are arbitrary, situation-dependent rules that facilitate social coordination and organization; they do not have an objective, prescriptive force, and they can be suspended or changed by an appropriate authoritative individual or institution. 
  • Conventional rules are often local; the conventional rules that are applicable in one community often will not apply in other communities or at other times in history.
  • Violations of conventional rules do not involve a victim who has been harmed, whose rights have been violated, or who has been subject to an injustice
  • Violations of conventional rules are typically less serious than violations of moral rules. 

To make the case that the moral/conventional distinction is both psychologically real and important, Turiel and his associates developed an experimental paradigm in which subjects are presented with prototypical examples of moral and conventional rule transgressions and asked a series of questions aimed at eliciting their judgment of such actions. 

Early findings using this paradigm indicated that subjects’ responses to prototypical moral and conventional transgressions do indeed differ systematically. Transgressions of prototypical moral rules were judged to be more serious, the wrongness of the transgression was not ‘authority dependent’, the violated rule was judged to be general in scope, and the judgments were justified by appeal to harm, justice or rights. Transgressions of prototypical conventional rules, by contrast, were judged to be less serious, the rules themselves were authority dependent and not general in scope, and the judgments were not justified by appeal to harm, justice, and rights. 

During the last 25 years, much the same pattern has been found in an impressively diverse set of subjects ranging in age from toddlers (as young as 3.5yo) to adults, with a substantial array of different nationalities and religions. The pattern has also been found in children with a variety of cognitive and developmental abnormalities, including autism. Much has been made of the intriguing fact that the pattern is not found in psychopaths or in children exhibiting psychopathic tendencies. 

What conclusions have been drawn from this impressive array of findings? The clear majority of investigators in this research tradition would likely endorse something like the following collection of conclusions:

  1. In moral/conventional task experiments, subjects typically exhibit one of two signature response patterns. Moreover, these patterns are what philosophers of science call nomological clusters – there is a strong (‘lawlike’) tendency for the members of the cluster to occur together. 
  2. Transgressions involving harm, justice, or rights evoke the signature moral pattern. Transgressions that do not involve these things evoke the signature conventional pattern.
  3. The regularities described here are pan-cultural, and emerge quite early in development.

Kevin’s Addendum

The paper goes on to criticize the moral-conventional distinction as not well supported by the data. The above introduction is thus notable in its clarity of steel-manning. Their two biggest complaints are,

  1. Experiments designed to measure the distinction are based on “schoolyard dilemmas”; those with more real-to-life moral scenarios manifest the effect less robustly.
  2. The theory is highly predicated on the progressive conceit that care/harm is the only moral dimension that matters; but cross-cultural analyses have revealed many moral taste buds.

My personal betting money is that the research tradition will survive these objections, as it responds and re-engineers itself in the coming decades.

Randomized Controlled Trials (RCTs)

Part Of: Causality sequence
See Also: Potential Outcomes model
Content Summary: 2300 words, 23 min read

Counterfactuals and the Control Group

If businesses were affected by one factor at a time, the notion of a control group would be unnecessary: just intervene and see what changes. But in real life, many causal factors can influence an outcome.

Consider click-through rates (CTR) for a website’s promotional campaign. Suppose we want to know how a website redesign will affect CTR. One naive approach would be to simply compare click-through rates before and after the change is deployed. However, even if the CTR did change, there are plenty of potential confounds: other processes that may better explain the change.

Can we conclude the website decreased click-throughs by 2,000? Only if the other causal factors driving CTR were fixed. Call this assertion ceteris paribus: other things being equal. 

In practice, can we safely assert nothing else changed from Friday to Saturday? By no means! We have taken no action to ensure these factors are fixed, and any number of wrenches can be thrown at us.

The trick is to create an environment where other causal factors are held constant. The control group is identical to the experimental group, except for the causal factor under investigation. So we create two servers, and ensure the product and its consumers are as similar as possible.

So long as the two groups are in fact similar, so that the (sometimes unmeasured) causal forces are equivalent, we can safely draw a causal conclusion. From this data, we might conclude that the website helped, despite the drop in CTR.

It is imperative that the experimental group be as similar to the control group as possible. If the control group outcome was measured on a different day, the weekend effect would disappear.

To recap, how would things be different, if something else had occurred? Such counterfactual questions capture something important about causality. But we cannot access such parallel universes. The best you can do is create “clones” (maximally similar groups) in this universe. Counterfactuals are replaced with ceteris paribus; clones with the control group.

The Problem of Selection Bias

The above argument was qualitative. To get clearer on randomized controlled trials (RCTs), it helps to formalize the argument.

Consider two individuals: hearty Alice and frail Bob. We want to know whether or not some drug improves their health. 

Alice is assigned to the control group, Bob the treatment group. Despite taking the drug, Bob has a worse health outcome than Alice. While the treatment group is performing worse than the control group, this is not due to drug inefficacy. Rather, the difference in outcome is caused by difference in group demographics.

Let’s formalize this example. In Potential Outcome Models, we can represent whether or not a person took the drug as X \in \{ 0, 1\}, and their health outcome as Y. Here Y_{1} denotes the outcome with the drug, and Y_{0} the outcome without it.

For each person, the individual causal effect (ICE) of the drug is:

Y_{1,Alice} - Y_{0,Alice} = 5 - 5 = 0

Y_{1,Bob} - Y_{0,Bob} = 4 - 3 = 1

But these potential outcomes are fundamentally unobservable. The only observation we can make is:

Y_{treatment} - Y_{control} = Y_{1,Bob} - Y_{0,Alice} = -1

Taken at face value, this suggests that Bob’s decision to take the drug was counterproductive. But this conclusion is erroneous. We can express this mathematically with the following device:

Y_{1,Bob} - Y_{0,Alice} = ( Y_{1,Bob} - Y_{0,Bob} ) + ( Y_{0, Bob} - Y_{0, Alice} )

In other words,

Difference = Average Causal Effect + Selection Bias

The difference in outcomes between experimental and control groups is a combination of the causal effect of the treatment, and the differences among groups before the treatment is applied. To isolate the causal effect, you must minimize selection bias.
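The decomposition can be checked by hand with the numbers from the example: hearty Alice scores 5 with or without the drug, and frail Bob scores 3 without it and 4 with it. The dictionary layout below is my own framing, not part of the Potential Outcomes formalism.

```python
# Potential outcomes, keyed by person and then by treatment status (0 or 1).
potential = {
    "Alice": {0: 5, 1: 5},
    "Bob": {0: 3, 1: 4},
}

# What we actually observe: treated Bob versus untreated Alice.
observed_difference = potential["Bob"][1] - potential["Alice"][0]

# What we wish we could observe: Bob with and without the drug.
causal_effect_bob = potential["Bob"][1] - potential["Bob"][0]

# How different the two people were before any treatment.
selection_bias = potential["Bob"][0] - potential["Alice"][0]

print(observed_difference, causal_effect_bob, selection_bias)
```

The drug helps Bob by one point, but he started two points behind Alice, so the naive comparison comes out negative.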

Randomization versus Selection Bias

Group differences contaminate causal analyses. How often is observational data contaminated in this way? 

Quite often. For example, here are a few comparisons between those who have health insurance versus those who do not. People with health insurance are 2.71 years older, have 2.74 more years of education, are 7% more likely to be employed, and have an annual income $60,000 higher. With so many large differences in our data, we should suspect other differences in unobserved dimensions, too.

To minimize selection bias, we need our groups to be as similar as possible. We need to compare apples to apples.

Random allocation is a good way to promote between-group homogeneity, before the causal intervention. We can demonstrate this statistically. Let’s say that the causal effect of a treatment is the same across individuals, \forall i, Y_{1,i} - Y_{0,i} = \kappa. Then,

E_{treatment}[Y_{1,i}] - E_{control}[Y_{0,i}]

= E_{treatment}[\kappa + Y_{0,i}] - E_{control}[Y_{0,i}]

= \kappa + E_{treatment}[Y_{0,i}] - E_{control}[Y_{0,i}]

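= \kappa

The last step holds because, under random allocation, E_{treatment}[Y_{0,i}] = E_{control}[Y_{0,i}]: the two groups share the same expected baseline, so the selection bias term vanishes.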
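A small simulation can illustrate the same result. It is a sketch under invented assumptions: half the population is "hearty" (baseline 5) and half "frail" (baseline 3), the treatment adds a constant \kappa to everyone, and `simulate_difference` is a hypothetical helper. When frail people self-select into treatment, the observed difference is badly biased; when a coin flip assigns treatment, it lands close to \kappa.

```python
import random

def simulate_difference(n, kappa, randomize, seed=0):
    """Observed mean(treatment) - mean(control) under two allocation schemes."""
    rng = random.Random(seed)
    # Baseline health Y0: alternate hearty (5.0) and frail (3.0) people.
    baselines = [5.0 if i % 2 == 0 else 3.0 for i in range(n)]
    if randomize:
        # Random allocation: a fair coin decides who is treated.
        treated = [rng.random() < 0.5 for _ in range(n)]
    else:
        # Self-selection: only the frail take the drug.
        treated = [y0 == 3.0 for y0 in baselines]
    treatment_outcomes = [y0 + kappa for y0, t in zip(baselines, treated) if t]
    control_outcomes = [y0 for y0, t in zip(baselines, treated) if not t]
    mean_treatment = sum(treatment_outcomes) / len(treatment_outcomes)
    mean_control = sum(control_outcomes) / len(control_outcomes)
    return mean_treatment - mean_control

biased = simulate_difference(n=20000, kappa=1.0, randomize=False)
randomized = simulate_difference(n=20000, kappa=1.0, randomize=True)
print(biased, randomized)
```

The randomized estimate is not exactly \kappa in any single run; it only converges toward it as the sample grows.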

Consider, for example, the Health Insurance Experiment undertaken by RAND. They randomly divided their sample into four 1000-person groups: a catastrophic plan with essentially zero insurance, and then three treatment groups with variations of different forms of health insurance.

The left column shows means for each attribute (e.g. 56% of the catastrophic group are female). Other columns represent differences between the various treatment groups and control (e.g. 56-2 = 54% of the deductible group are female). How do we know if random allocation succeeded? We simply compare each group difference with its standard error: if a group difference is more than twice its standard error, the difference is statistically significant.
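The rule of thumb can be sketched as follows. The helper `balance_check` is hypothetical, the ages are invented, and the standard error formula assumes two independent samples with possibly unequal variances. None of this is taken from the RAND analysis itself.

```python
import math

def balance_check(treatment, control):
    """Return (difference in means, its standard error, flagged as significant)."""
    def mean(values):
        return sum(values) / len(values)

    def sample_variance(values):
        m = mean(values)
        return sum((v - m) ** 2 for v in values) / (len(values) - 1)

    difference = mean(treatment) - mean(control)
    standard_error = math.sqrt(
        sample_variance(treatment) / len(treatment)
        + sample_variance(control) / len(control)
    )
    # Rule of thumb from the text: flag if |difference| exceeds 2 standard errors.
    return difference, standard_error, abs(difference) > 2 * standard_error

# Invented ages for a control group and two candidate treatment groups.
control_ages = [31, 33, 35, 37]
balanced_ages = [30, 32, 34, 36]
unbalanced_ages = [50, 52, 54, 56]

print(balance_check(balanced_ages, control_ages))
print(balance_check(unbalanced_ages, control_ages))
```

With many attributes checked at once, a few flags are expected by chance alone, which is why a couple of significant differences need not indicate failed randomization.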

In these data, only two group differences are statistically significant, and the differences don’t seem to follow obvious patterns, so we can conclude that random allocation appears to have succeeded. But it’s worth underscoring that we didn’t simply randomize and walk away; rather, we empirically validated that our group composition is homogeneous.

(For those wondering, RCT studies like this consistently reveal that health insurance improves financial outcomes, but not health outcomes, for the poor. In general, medicine correlates weakly with health. In the aggregate, the US consumes 50% more medical services than it needs.)

This post doesn’t address null hypothesis significance testing (NHST) which is an analysis technology frequently paired with RCT methodology. There are also extensions of NHST such as factorial designs and repeated measures (within-subject tests) which merit future discussion. 

External vs Internal Validity

Randomness is a proven way to minimize selection bias. It occurs in two stages:

  1. Random sampling mitigates sampling bias, thereby ensuring that the study’s inferences generalize to the broader population. By the law of large numbers (LLN), with sufficiently large samples, the distribution of the sample is guaranteed to approach that of the population. Random sampling promotes external validity.
  2. Random allocation mitigates selection bias, thereby ensuring that the groups have a comparable baseline. We can then safely access a causal interpretation of the study results. Random allocation promotes internal validity.

RCTs were pioneered in the field of medicine. How do you test if a drug works? You might consider simply giving the pill to treatment subjects. But human beings are complicated. We often manifest the placebo effect, where even an empty pill can produce real physiological relief in the body. There is much debate how the mere expectation of health can produce health; recent research points to the top-down control signals containing the predictions of your body’s autonomic nervous system. 

Remember our guiding principle: To minimize selection bias, we need our groups to be as similar as possible. If you want to isolate the medicinal properties of a drug, you need both groups to believe they are being treated. Giving the control group sugar-water pills is an example of blinding: your group similarity increases if subjects can’t see what group they are in. 

Blinding can mitigate our psychological penchant for letting expectations structure our experience in other domains too. Experimenters may unconsciously measure trial outcomes differently if they are financially vested in the outcome (detection bias). The most careful RCTs are double-blind trials: both experimenters and participants are ignorant of their group status for the duration of the trial.

There are other complications to bear in mind:

  • The Hawthorne effect: people behave differently if they are aware of being watched
  • Meta-analyses have revealed high levels of unblinding in pharmacological trials. 
  • Often patients will fail to comply with experimental protocol. Compliance issues may not occur at random, effectively violating ceteris paribus.
  • Often patients will drop out from the study. Just as before, attrition issues may not occur at random, effectively violating ceteris paribus. 

How do you deal with noncompliance and attrition? 

  • Intention to treat studies will leave them in the analysis: more external validity, less internal validity
  • Per protocol studies will exclude them from the analysis: less statistical power, more internal validity.

RCTs in Medical History

The field of medicine is a story of learning to trust experimental results over the opinions of the knowledgeable. Here’s an excerpt from Tetlock’s Superforecasting. 

Consider Galen, the second-century physician to Roman emperors. No one has influenced more generations of physicians. Galen’s writings were the indisputable source of medical authority for more than a thousand years. “It is I, and I alone, who has revealed the true path to medicine,” Galen wrote with his usual modesty. And yet Galen never conducted anything resembling a modern experiment. Why should he? Experiments are what people do when they aren’t sure what the truth is. And Galen was untroubled by doubt. Each outcome confirmed he was right, no matter how equivocal the evidence might look to someone less wise than the master. “All who drink of this treatment recover in a short time, except those whom it does not help, who all die,” he wrote. “It is obvious, therefore, that it fails only in incurable cases.”

Galen is the sort of figure who pops up repeatedly in the history of medicine. They are men of strong conviction and a profound trust in their own judgment. They embrace treatments, develop bold theories for why they work, denounce rivals as quacks and charlatans, and spread their insights with evangelical passion. So it went from the ancient Greeks to Galen to Paracelsus to the German Samuel Hahnemann and the American Benjamin Rush. In the nineteenth century, American medicine saw pitched battles between orthodox physicians and a host of charismatic figures with curious new theories like Thomsonianism, which posited that most illness was due to an excess of cold in the body. Fringe or mainstream, almost all of it was wrong, with the treatments on offer ranging from the frivolous to the dangerous. Ignorance and confidence remained defining features of medicine. As the surgeon and historian Ira Rutkow observed, physicians who furiously debated the merits of various treatments and theories were “like blind men arguing over the colors of the rainbow.”

Not until the twentieth century did the idea of RCTs, careful measurement, and statistical inference take hold. “Is the application of the numerical method to medicine a trivial and time-wasting idea as some hold, or is it an important stage in the development of our art, as others proclaim it”, the Lancet asked in 1921. 

Unfortunately, this story doesn’t end with physicians suddenly realizing the virtues of doubt and rigor. The idea of RCTs was painfully slow to catch on and it was only after World War II that the first serious trials were attempted. They delivered excellent results. But still the physicians and scientists who promoted the modernization of medicine routinely found that the medical establishment wasn’t interested, or was even hostile to their efforts.

When hospitals created cardiac care units to treat patients recovering from heart attacks, Cochrane proposed an RCT to determine whether the new units delivered better results than the old treatment, which was to send the patients home for monitoring and bed rest. Physicians balked. It was obvious the cardiac care units were superior, they said, and denying patients the best care would be unethical. But Cochrane persisted in running a trial. Partway through the trial, Cochrane told a group of cardiologists preliminary results. The difference in outcomes between the two treatments was not statistically significant, he emphasized, but it appeared that patients might do slightly better in the cardiac care units. They were vociferous in their abuse: “Archie,” they said, “we always thought you were unethical. You must stop the trial at once.” But then Cochrane revealed that he had reversed the results: home care had done slightly better than the cardiac units. There was dead silence, and a palpable sense of nausea.

Today, evidence-based medicine (EBM) rightly privileges RCTs as more authoritative than expert opinion. This movement has put forward a hierarchy of evidence, to gesture at which sources of evidence to take lightly. 

I personally deny that evidence-based medicine is the best approach to evidence. It gets confused by how to interpret “absence of evidence”, as we have seen in the Covid-19 debate on mask efficacy. Yet EBM is undeniably a big improvement from the epistemic learned helplessness that was ancient medicine.

Limitations & Prospects

Everyone agrees that RCTs are the gold standard at drawing conclusions about cause and effect. It is worth seriously considering whether RCTs can be effectively deployed to answer questions besides medicine. Can we use RCTs to get better at policy making? Charity? Managerial science?

There are several important criticisms of RCTs that are worth mentioning:

  • Ecological Sterility. The more rigorously you attempt to enforce ceteris paribus, the less your laboratory environment resembles the real world. 
  • Ethical Limitations of Scope. RCTs were never employed to test whether smoking causes cancer, because it is unethical to force someone to smoke.
  • Expense. Pharmacological RCTs cost $12 million to implement, on average.
  • Statistical Power. Because of their expense, sample sizes for RCTs are often much lower than observational studies. 

RCTs are the gold standard for causal inference, but they are not the only product on the market. As we will see later, there are other technologies in the Furious Five toolbox, which statistics and econometrics use to learn causal relationships. These are:

  1. Randomized Controlled Trials (RCTs)
  2. Regression
  3. Instrumental Variables
  4. Regression Discontinuity
  5. Differences-in-Differences

Until next time. 

Seeing Through Calibrated Eyes

Part Of: Bayesianism sequence
See Also: [Excerpt] Fermi Estimates
Content Summary: 1500 words, 15 min read, 15 min exercise (optional)

The most important questions of life are indeed, for the most part, really only problems of probability.

Pierre Simon Laplace, 1812

Accessing One’s Own Predictive Machinery

Any analyst can describe the unnerving intimacy one develops while acclimating to a dataset.  With data visualizations, we acclimate ourselves to the contours of the Manifold of Interest, one slice at a time. Human beings simply become more incisive, powerful thinkers when we choose to put aside the rhetoric and reason directly with quantitative data. 

The Bayesian approach interprets learning as a plausibility calculus, where new data pays down uncertainty. What is uncertainty? Uncertainty is how “loosely held” our beliefs are. The more data we have, the less uncertain we must be, and the sharper the peaks in our belief distribution.

The Bayesian approach affirms silicon and nervous tissue conform to the same principles. Machines learn from digital data, our brains do the same with perceptual data.  The chamber of consciousness is small. Yet, could there be a way to directly tap into the sophisticated inference systems within our subconscious mind?

Quantifying Error Bars

How many hours per week do employees spend in meetings? Even if you don’t know the exact values to questions like these, you still know something. You know that some values would be impossible or at least highly unlikely. Getting clear on what you already know is an absolutely crucial skill to develop as a thinker. To do that, we need to find a way to accurately report our own uncertainty. 

One method to report our uncertainty is to use words of estimative probability.

But these words are crude tools. A more sophisticated way to express uncertainty about a number is to think of it as a range of probable values. In statistics, a range that has a particular chance of containing the correct answer is called a confidence interval (CI). A 90% CI is a range that has a 90% chance of containing the correct answer. For example, if you are 90% sure the average number of hours spent in meetings is between 6 and 15 hours, then we can say you have a 90% CI [6, 15]. You might have produced this range with sophisticated statistical inference methods, or you might have just picked the bounds from your experience. Either way, the values should be a reflection of your uncertainty about this quantity.

When you say “I am 70% sure of X”, how do you know your stated uncertainty is correct? Suppose you make 10 such predictions. A calibrated estimator should get about 7 out of 10 predictions correct. An overconfident estimator will get fewer than 7 answers right (they knew less than they thought). An underconfident estimator will get more than 7 answers correct (they knew more than they thought). You can be a better thinker if you learn to balance the scales between under- and over-confidence.
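
The definition above can be sketched in a few lines of Python. The function name, the `tolerance` parameter, and the sample outcomes are all hypothetical choices made for this illustration; with only ten predictions, some slack around the stated confidence is needed before calling someone miscalibrated, and the 0.05 default is an arbitrary assumption.

```python
def calibration_verdict(stated_confidence, outcomes, tolerance=0.05):
    """Compare a stated confidence level with the observed hit rate.

    stated_confidence: probability attached to every prediction (e.g. 0.7)
    outcomes: list of booleans, True where the prediction came true
    tolerance: slack allowed before flagging miscalibration (an assumption)
    """
    hit_rate = sum(outcomes) / len(outcomes)
    if hit_rate < stated_confidence - tolerance:
        verdict = "overconfident"
    elif hit_rate > stated_confidence + tolerance:
        verdict = "underconfident"
    else:
        verdict = "calibrated"
    return hit_rate, verdict

# Hypothetical record: ten predictions made at 70% confidence, five correct.
outcomes = [True] * 5 + [False] * 5
hit_rate, verdict = calibration_verdict(0.7, outcomes)
print(hit_rate, verdict)
```

Getting five of ten right at a stated 70% falls short of the claimed hit rate, so this hypothetical estimator is flagged as overconfident.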

Unfortunately, extensive research has shown that most people are systematically overconfident. For example, here are the results from 972 estimation tests for 90% CIs. If people were naturally calibrated, the number of correct responses would most typically be 9/10; but in practice the actual mean is roughly 5.5.

Here’s a real life example of overconfidence: overly narrow error bars in expert forecasts of US COVID-19 case load.

From a psychological perspective, our ignorance of our state of knowledge is not a particularly surprising fact. All animals are metacognitively incompetent – we are truly strangers to ourselves. Our bias towards overconfidence is easily explained by the argumentative theory of reasoning, and closely aligns with the Dunning-Kruger effect.

Bad news so far. However, with practice and some debiasing techniques, people can become much more reliably calibrated estimators. Consider the premise of superforecasting:

In Superforecasting, Tetlock and coauthor Dan Gardner offer a masterwork on prediction, drawing on decades of research and the results of a massive, government-funded forecasting tournament. The Good Judgment Project involves tens of thousands of ordinary people—including a Brooklyn filmmaker, a retired pipe installer, and a former ballroom dancer—who set out to forecast global events. Some of the volunteers have turned out to be astonishingly good. They’ve beaten other benchmarks, competitors, and prediction markets. They’ve even beaten the collective judgment of intelligence analysts with access to classified information. They are “superforecasters.”

Calibration is a foundational skill in the art of rationality. And it can be taught.

Try It Yourself

Like other skills, calibration emerges through practice. Let’s try it out!


  • 90% CI. For each of the 90% CI questions, provide both an upper bound and a lower bound. Remember that the range should be wide enough that you believe there is a 90% chance that the answer will be between the bounds. 
  • Binary Questions. Answer whether each of the statements is true or false, then circle the probability that reflects how confident you are in your answer. If you are absolutely certain in your answer, you should say you have a 100% chance of getting the answer right. If you have no idea whatsoever, then your chance would be the same as a coin flip (50%). Otherwise (probably usually), it is one of the values between 50% and 100%.

Alright, good luck! 🙂 

I recommend printing this out!

To evaluate your results, the answer key is an image at the end of this article. Go ahead and count how many answers you got correct.

  • 90% CI. If you were fully calibrated, then you should have gotten 9 out of 10 answers right. Your test performance can be interpreted like this: if you got 7 to 10 within your range, you might be calibrated; if you got 6 right, you are very likely to be overconfident; if you got 5 or less right, you are almost certainly overconfident and by a large margin. 
  • Binary Questions. To compute the expected outcome, convert each of the percentages you circled to a decimal (i.e., .5, .6, … 1.0) and add them up. Let’s say your confidence in your answers was 0.5, 0.7, 0.6, 1, 1, 0.8, 0.5, 0.6, 0.5, 0.7, totaling to 6.9. This means your “expected” number is 6.9. For tests with 20 binary questions, most participants should get the expected score to within 2.5 points of the actual score.
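
The scoring rule for the binary questions is just a sum, as this small Python sketch shows. The confidence values are the ones from the worked example above; the `calibration_gap` helper and the actual score of 5 are hypothetical additions for illustration.

```python
# Confidence levels from the worked example above (ten binary questions).
confidences = [0.5, 0.7, 0.6, 1, 1, 0.8, 0.5, 0.6, 0.5, 0.7]

# The expected number of correct answers is the sum of the stated probabilities.
expected_correct = sum(confidences)

def calibration_gap(confidences, actual_correct):
    """Expected minus actual score; a positive gap suggests overconfidence."""
    return sum(confidences) - actual_correct

print(round(expected_correct, 1))                     # 6.9
print(round(calibration_gap(confidences, 5), 1))      # hypothetical actual score of 5
```

A gap near zero means the stated confidences matched reality; a large positive gap means you claimed more certainty than your answers earned.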

Calibration Training is Possible

There are five tactics used to improve one’s calibration in practice. We will discuss them in order of descending efficacy.

First, the most important thing we can do to improve is to practice and to go over our mistakes. This simple advice has deep roots in global workspace theory, where the primary function of consciousness is to serve as a learning device. As I wrote elsewhere:

Consider the radical simplicity of the act of learning itself. To learn anything new, we merely pay attention to it, and thereby become conscious of it.

For a public example of self-evaluation, see SlateStarCodex annual predictions and his calibration scores. If you would like to practice against more of these general trivia tests, three are provided in the book which inspired this article, How to Measure Anything. 

Second, a particularly powerful tactic for becoming more calibrated is to pretend to bet money. 

Consider another 90% CI question: what is the average weight in tons of an adult male African elephant? As you did before, provide an upper and lower bound that are far apart enough that you think there is a 90% chance the true answer is between them. Now consider the two following games:

  • Game A. You win $1000 if the true answer turns out to be between your upper and lower bound. If not, you win nothing.
  • Game B. You roll a 10-sided die. If the die lands on anything but 10, you win $1000. Else you win nothing.

80% of subjects prefer Game B. This means that their “90% CI” is actually too narrow (they are unconsciously overconfident). 

Give yourself a choice between betting on your answer being correct or rolling the dice. I call this the equivalent bet test. Research indicates that even just pretending to bet money significantly improves a person’s ability to assess odds (Kahneman & Tversky, 1972, 1973). In fact, actually betting money turns out to be only slightly better than pretending to bet. 

Third, people apply sophisticated evaluation techniques to evaluate the claims of other people. These faculties are typically not employed for the stuff coming out of our own mouths. But there is a simple technique to promote this behavior: the premortem. Imagine you got a question wrong, and in this hypothetical scenario, ask yourself why you got it wrong. This technique has also been shown to significantly improve your performance (Koriat et al 1980).

Fourth, it’s worth noting that the anchoring heuristic can contaminate bound estimation (an example of anchoring: if I ask you whether Gandhi died at 120 years old, your estimate of his age at death will likely be higher than if I had not provided the anchor). In order not to be unduly influenced by your initial guess, it can help to determine bounds separately. Instead of asking yourself “Is there a 90% chance the answer is between LB and UB?”, ask yourself “Is there a 95% chance the answer is above (below) my LB (UB)?”

Fifth, rather than approaching estimation by generating guesses, it can sometimes help to instead eliminate answers that seem absurd. Rather than guess 5,000 pounds for the elephant, explore what weights you consider absurd.

In practice, these techniques are fairly effective at improving calibration in people. Here are the results of Hubbard’s half-day of training (n=972); as you can see most people did achieve nearly perfect calibration within half a day.

All of this training was done on general trivia. Does calibrative skill generalize to other domains? There is not much research on this question, but provisionally speaking – generalization seems plausible. Individual forecasters who completed calibration training had their job performance measured and they saw improvements to their job performance.

Until next time.

Quiz Answer Key


  • Kahneman & Tversky (1972). Subjective probability: A judgment of representativeness.
  • Kahneman & Tversky (1973). On the psychology of prediction.
  • Koriat et al (1980). Reasons for confidence.

[Excerpt] Fermi Estimates

Excerpt From: How to Measure Anything book
Part Of: Bayesianism sequence
Content Summary: 1200 words, 6 min read


Our first mentor of measurement did something that was probably thought by many in his day to be impossible. An ancient Greek named Eratosthenes (ca 276-194 BCE) made the first recorded measurement of the circumference of the Earth. If he sounds familiar, it might be because he is mentioned in many high school trigonometry and geometry textbooks. 

Eratosthenes didn’t use accurate survey equipment and he certainly didn’t have lasers and satellites. He didn’t even embark on a risky and potentially lifelong attempt at circumnavigating the Earth. Instead, while in the Library of Alexandria, he read that a certain deep well in Syene (a city in southern Egypt) would have its bottom entirely lit by the noon sun one day a year. This meant the sun must be directly overhead at that point in time. He also observed that at the same time, vertical objects in Alexandria (almost directly north of Syene) cast a shadow. This meant Alexandria received sunlight at a slightly different angle at the same time. Eratosthenes recognized that he could use this information to assess the curvature of Earth.

He observed that the shadows in Alexandria at noon at that time of year made an angle that was equal to an angle of 7.2 degrees. Using geometry, he could then prove that this meant that the circumference of Earth must be 50 times the distance between Alexandria and Syene. Modern attempts to replicate Eratosthenes’ calculations put his answer within 3% of the actual value. Eratosthenes’s calculation was a huge improvement on previous knowledge, and his error was much less than the error modern scientists had just a few decades ago for the size and age of the universe. Even 1700 years later, Columbus was apparently unaware of Eratosthenes’s result; his estimate was fully 25% short. (This is one of the reasons Columbus thought he might be in India, not another large, intervening landmass where I reside). In fact, a more accurate measurement than Eratosthenes’s would not be available for another 300 years after Columbus. By then, two Frenchmen, armed with the finest survey equipment available in eighteenth-century France, numerous staff, and a significant grant, finally were able to do better than Eratosthenes.
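
The geometry in this passage reduces to a single proportion: a 7.2 degree shadow angle is 7.2/360 of a full circle, so the Alexandria to Syene distance is that same fraction of the circumference. Here is a minimal Python sketch of that step. The function name is hypothetical, and the distance of 5,000 stadia is the figure traditionally attributed to Eratosthenes; it does not appear in the excerpt and is used here only as an illustrative input.

```python
def circumference_from_shadow(shadow_angle_deg, distance):
    """Scale the distance between the two cities up to a full circle."""
    fraction_of_circle = shadow_angle_deg / 360.0
    return distance / fraction_of_circle

# With a unit distance, the result is the multiplier quoted in the text.
multiplier = circumference_from_shadow(7.2, 1.0)
print(round(multiplier, 6))  # 50.0

# Assumed input (not from the excerpt): 5,000 stadia between the cities.
circumference_stadia = circumference_from_shadow(7.2, 5000)
print(round(circumference_stadia))  # 250000
```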

Here is the lesson: Eratosthenes made what might seem like an impossible measurement by making a clever calculation on some simple observations. When I ask participants in my seminars how they would make this estimate without modern tools, they usually identify one of the “hard ways” to do it (e.g., circumnavigation). But Eratosthenes, in fact, need not have even left the vicinity of the library to make this calculation. He wrung more information out of the few facts he could confirm instead of assuming the hard way was the only way. 

Enrico Fermi

Consider Enrico Fermi (1901-1954 CE), a physicist who won the Nobel Prize in Physics in 1938. 

One renowned example of his measurement skills was demonstrated at the first detonation of the atom bomb on July 16, 1945, where he was one of the atomic scientists observing the blast from base camp. While other scientists were making final adjustments to instruments used to measure the yield of the blast, Fermi was making confetti out of a page of notebook paper. As the wind from the initial blast began to blow through the camp, he slowly dribbled the confetti into the air, observing how far back it was scattered by the blast (taking the farthest scattered pieces as being the peak of the pressure wave). Simply put, Fermi knew that how far the confetti scattered in the time it would flutter down from a known height (his outstretched arm) gave him a rough approximation of wind speed which, together with knowing the distance from the point of detonation, provided an approximation of the energy of the blast. 

Fermi concluded that the yield must be greater than 10 kilotons. This would have been news, since other initial observers of the blast did not know that lower limit. Could the observed blast be less than 5 kilotons? Less than 2? These answers were not obvious at first. (As it was the first atomic blast on the planet, nobody had much of an eye for these things.) After much analysis of the instrument readings, the final yield estimate was determined to be 18.6 kilotons. Like Eratosthenes, Fermi was aware of a rule relating one simple observation – the scattering of confetti in the wind – to a quantity he wanted to measure. The point of the story is not to teach you enough physics to estimate like Fermi, but that, rather, you should start thinking about measurements as a multistep chain of thought. Inferences can be made from highly indirect observations.

The value of quick estimates was something Fermi was known for throughout his career. He was famous for teaching his students skills to approximate fanciful-sounding quantities that, at first glance, they might presume they knew nothing about. The best-known example of such a “Fermi question” was Fermi asking his students to estimate the number of piano tuners in Chicago. His students – science and engineering majors – would begin by saying that they could not possibly know anything about such a quantity. What Fermi was trying to teach his students was to figure out that they already knew something about the quantity in question.

Fermi would start by asking them to estimate other things about pianos and piano tuners that, while still uncertain, might seem easier to estimate. These included the current population of Chicago (a little over 3 million in the 1930s), the average number of people per household (two or three), the share of households with regularly tuned pianos (not more than 1 in 10 but not less than 1 in 30), the required frequency of tuning (perhaps once a year, on average), how many pianos a tuner could tune in a day (four or five, including travel time), and how many days a year the tuner works (say, 250 or so). The result would be computed:

Tuners in Chicago = population / people per household
* percentage of households with tuned pianos
* tunings per year per piano / (tunings per tuner per day * workdays per year)

Depending on which specific values you chose, you would probably get answers in the range of 30 to 150, with something like 50 being fairly common. When this number was compared to the actual number (which Fermi would already have acquired from the phone directory or a guild list), it was always closer to the true value than the students would have guessed. This may seem like a very wide range, but consider the improvement this was from the “How could we possibly even guess?” attitude his students often started with.
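
The formula above translates directly into code. The Python sketch below plugs in the extremes of the ranges quoted in the text; the function and parameter names are hypothetical, chosen only to mirror the terms of the formula.

```python
def tuners_in_chicago(population, people_per_household, share_with_tuned_piano,
                      tunings_per_piano_per_year, tunings_per_tuner_per_day,
                      workdays_per_year):
    """Fermi estimate: yearly tunings demanded divided by yearly tunings per tuner."""
    households = population / people_per_household
    tunings_needed = households * share_with_tuned_piano * tunings_per_piano_per_year
    tunings_per_tuner = tunings_per_tuner_per_day * workdays_per_year
    return tunings_needed / tunings_per_tuner

# Low end: larger households, fewer pianos, faster tuners.
low = tuners_in_chicago(3_000_000, 3, 1 / 30, 1, 5, 250)

# High end: smaller households, more pianos, slower tuners.
high = tuners_in_chicago(3_000_000, 2, 1 / 10, 1, 4, 250)

print(round(low), round(high))  # 27 150
```

The extremes come out at roughly 27 and 150 tuners, in line with the range quoted above. Each input is uncertain, yet the product is pinned down to within an order of magnitude.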


Taken together, these examples show us something very different from what we are typically exposed to in business. Executives often say “We can’t even begin to guess at something like that.” They dwell ad infinitum on the overwhelming uncertainties. Instead of making any attempt at measurement, they sometimes prefer to be stunned into inactivity by the apparent difficulty in dealing with these uncertainties. Fermi might say, “Yes, there are a lot of things you don’t know, but what do you know?”

Viewing the world as these individuals do – through calibrated eyes that see things in a quantitative light – has been a historical force propelling both science and economic productivity. If you are prepared to rethink some assumptions and put in the time, you will see through calibrated eyes as well.

[Excerpt] The Evolution of Infanticide

See Also: Cooperative Breeding Hypothesis
Excerpt From: Hrdy (2009) Mothers and Others. Page 70-72, 99-100
Content Summary: 1300 words, 13 minute read

Child Abandonment in Nonhuman Primates

Many mammalian mothers can be surprisingly selective about babies they care for. A mother mouse or prairie dog may cull her litter, shoving aside a runt; a lioness whose cubs are too weak to walk may abandon the entire litter “with no attempt to nudge them to their feet, carry them or otherwise help.” Some mammals (and this includes humans) even discriminate against healthy babies, if they happen to be born the “wrong” sex. But not Great Ape or most primate mothers. No matter how deformed, scrawny, odd, or burdensome, there is no baby that a wild ape mother won’t keep. Babies born blind, limbless, or afflicted with cerebral palsy – newborns that a hunter-gatherer mother would likely abandon at birth – are picked up and held close. If her baby is too incapacitated to hold on, the mother may walk tripedally so as to support the baby with one hand.

Monkey and ape mothers rarely discriminate based on a baby’s particular attributes, as some human mothers do. Except perhaps those born very prematurely, babies are cared for (and carried) almost no matter what. Even if her baby dies, the mother will continue to carry the desiccated corpse around for days.

Child Abandonment in Humans

Maternal devotion in the human case is more complicated. A woman undergoes the same endocrinological transformations during pregnancy as other apes. At birth, her cortisol levels and heartbeat reflect just how sensitive to infant cues she has become. But whereas the nonhuman ape mother undiscriminatingly accepts any infant born to her without taking into account physical attributes, the human mother’s devotion is more conditional. A newborn perceived as defective may be drowned, buried alive, or simply wrapped in leaves and left in the bush within a few hours of birth. “Defective” may mean anything from having too many toes or too few. It may mean being born with a deformed limb or at a very low birthweight, coming too soon after the birth of an older sibling, or having some culturally arbitrary “affliction” such as having too much or too little hair, or being born the wrong sex.

Unlike any other ape, a mother in a hunter-gatherer society examines her baby right after birth and, depending on its specific attributes and her own social circumstances (especially how much social support she is likely to have) makes a conscious decision to either keep the baby or let it die. In most traditional hunter-gatherer societies, abandonment is rare, and almost always undertaken with regret. It is an act no woman wants to recall, a topic ethnographers must tiptoe around gingerly. Typically, interviewers will broach the subject indirectly, asking other women rather than the mother herself. Back when the !Kung still lived as nomadic foragers, the rate of abandonment was about one in one hundred live births. Higher rates were reported among people with strong sex preferences, as among the pre-missionized Eipo horticulturalists of highland New Guinea. Forty-one percent of live births in this group resulted in abandonment, and in the vast majority of cases the abandoned babies were newborn daughters whose mothers hoped to reduce the time until a son might be born.

Once a baby has nursed at his mother’s breast and lactation is under way, a woman’s hormonal and neurological responses to this stimulation, combined with visual, auditory, tactile, and olfactory cues, produce a powerful emotional attachment to her baby. Once she passes this tipping point, a mother’s passionate desire to keep her baby safe usually overrides other (including conscious) considerations. This is why, if a mother is going to abandon her infant, she usually does so immediately, before her milk comes in and before mother-infant bonding is past the point of no return.

Two Kinds of Parenting Style

There are two kinds of primate parenting styles:

  • Continuous care and contact, where the mother’s hyper-possessive instincts rebuff offers of otherwise-interested babysitters
  • Cooperative breeding, where relatives (“allomothers”) take turns carrying the young, and sometimes provisioning them with food.

About half of all primate species use cooperative breeding models. However, in only 20% of primate species do alloparents provision the young, and for the most part this provisioning does not amount to much. Let us call robust cooperative breeders those species that generously provision their young. So far the only full alloparents belong to the family Callitrichidae, mostly marmosets and tamarins. Callitrichidae are famous for breeding fast and for their rapid colonization of new habitats.

More than 30 million years have passed since humans last shared a common ancestor with these tiny (rarely more than four pounds), clawed, squirrel-like arboreal creatures. New World monkeys literally inhabit a different world from that of their primate cousins who evolved in Africa. Theirs is a sensory world dominated by smell rather than sight. Yet in many respects callitrichids may provide better insight into early hominin family lives than do far more closely related species like chimpanzees or cercopithecine monkeys.

What humans have in common with the Callitrichidae is worth itemizing. In both types of primates, group members are unusually sensitive to the needs of others and are characterized by potent impulses to give. In both groups, a mother produces closely spaced offspring whose needs exceed her capacity to provide for them. Thus the mother must rely on others to help care for and provision her young. When prospects for support seem poor, mothers in both groups are more likely to bail out than other primates are. Human and callitrichid mothers stand out for their pronounced ambivalence toward newborns and their extremely contingent maternal commitment. Infants have adapted, as we will see later, with special traits for attracting the attention of potential caregivers. And finally, humans have a marmoset-like ability to colonize and thrive in novel environments. 

What happens when you take a clever ape with incipient social intelligence, tool manufacturing, and robust mindreading, then introduce cooperative breeding? This, we submit, is the recipe to produce a uniquely human cognitive system. Prosocial motivations transformed the mindreading system into a mindsharing system, which ultimately led to the development of norms, language, and cumulative culture.

This is the cooperative breeding hypothesis. 

The Dark Side of Cooperative Breeding

As noted above, by far the most common exceptions to this general primate pattern are found in the family Callitrichidae. Like all cooperative breeders, tamarin and marmoset mothers depend on others to help rear their young. Shared care and provisioning clearly enhances maternal reproductive success, but there is also a dark side to such dependence. Tamarin mothers short on help may abandon their young, bailing out at birth by failing to pick up neonates when they fall to the ground or forcing clinging newborns off their bodies. Although infanticide is a hazard across the Primate order, observations almost always implicate either strange males or females other than the mother, not the mother herself.

The high rates of maternal abandonment seen among callitrichids and humans are almost unheard of elsewhere among primates. Cooperative breeding systems endowed humans with a deeply felt sense of cooperation and altruism… but increased rates of child abandonment are a corollary.

The Evolution of Abortion

Note: this section is my own; these are not Hrdy’s words.

It is possible to trace modern debates about abortion to the ancient primate instinct documented above. As humans became increasingly culturally sophisticated, the motivation to abandon a child could be acted upon prenatally.

This is not to make an appeal to nature, “X is good because it is natural”. Indeed, our normative systems (mindsharing writ large) allow us to push against human nature when we so choose. And I won’t speak towards a moral appraisal of abortion here.

But let’s imagine human parenting systems were instead inherited from the continuous care and contact model of the other great apes. In such a system, I submit the topic of abortion would be as foreign as meat-eating might be to a talking gorilla.