Randomized Controlled Trials (RCTs)

Part Of: Causality sequence
See Also: Potential Outcomes model
Content Summary: 2300 words, 23 min read

Counterfactuals and the Control Group

If businesses were affected by one factor at a time, the notion of a control group would be unnecessary: just intervene and see what changes. But in real life, many causal factors can influence an outcome.

Consider click-through rates (CTR) for a website’s promotional campaign. Suppose we want to know how a website redesign will affect CTR. One naive approach would be to simply compare click-through rates before and after the change is deployed. However, even if the CTR did change, there are plenty of potential confounds: other processes that may better explain the change.

Can we conclude the website decreased click-throughs by 2,000? Only if the other causal factors driving CTR were fixed. Call this assertion ceteris paribus: other things being equal. 

In practice, can we safely assert nothing else changed from Friday to Saturday? By no means! We have taken no action to ensure these factors are fixed, and any number of wrenches can be thrown at us.

The trick is to create an environment where other causal factors are held constant. The control group is identical to the experimental group, except for the causal factor under investigation. So we create two servers, and ensure the product and its consumers are as similar as possible.

So long as the two groups are in fact similar, if the (sometimes unmeasured) causal forces are equivalent, then we can safely make a causal conclusion. From this data, we might conclude that the website helped, despite the drop in CTR.

The experimental group must be as similar to the control group as possible. If the control group outcome were measured on a different day, the weekend effect would no longer be held constant.

To recap, how would things be different, if something else had occurred? Such counterfactual questions capture something important about causality. But we cannot access such parallel universes. The best you can do is create “clones” (maximally similar groups) in this universe. Counterfactuals are replaced with ceteris paribus; clones with the control group.

The Problem of Selection Bias

The above argument was qualitative. To get clearer on randomized controlled trials (RCTs), it helps to formalize the argument.

Consider two individuals: hearty Alice and frail Bob. We want to know whether or not some drug improves their health. 

Alice is assigned to the control group, Bob the treatment group. Despite taking the drug, Bob has a worse health outcome than Alice. While the treatment group is performing worse than the control group, this is not due to drug inefficacy. Rather, the difference in outcome is caused by difference in group demographics.

Let’s formalize this example. In the Potential Outcomes model, we can represent whether or not a person took the drug as X \in \{0, 1\}, and their resulting health as Y. We write Y_{1} for the outcome with the drug and Y_{0} for the outcome without it.

For each person, the individual causal effect (ICE) of the drug is:

Y_{1,Alice} - Y_{0,Alice} = 5 - 5 = 0

Y_{1,Bob} - Y_{0,Bob} = 4 - 3 = 1

But these potential outcomes are fundamentally unobservable. The only observation we can make is:

Y_{treatment} - Y_{control} = Y_{1,Bob} - Y_{0,Alice} = -1

Taken at face value, this suggests that Bob’s decision to take the drug is counterproductive. But this conclusion is erroneous. We can express this mathematically with the following device:

Y_{1,Bob} - Y_{0,Alice} = Y_{1,Bob} - Y_{0,Bob} + ( Y_{0,Bob} - Y_{0,Alice} )

In other words,

Difference = Average Causal Effect + Selection Bias

The difference in outcomes between experimental and control groups is a combination of the causal effect of the treatment and the differences between the groups before the treatment is applied. To isolate the causal effect, you must minimize selection bias.
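
To make the decomposition concrete, here is a minimal sketch that plugs in the Alice and Bob numbers from above. The dictionary layout and the function name `decompose` are illustrative choices, not part of the Potential Outcomes model itself.

```python
# Potential outcomes from the Alice/Bob example (Y0 = without drug, Y1 = with drug).
potential_outcomes = {
    "Alice": {"Y0": 5, "Y1": 5},
    "Bob": {"Y0": 3, "Y1": 4},
}


def decompose(treated, control, outcomes):
    """Split the observed difference into causal effect and selection bias."""
    observed = outcomes[treated]["Y1"] - outcomes[control]["Y0"]
    causal_effect = outcomes[treated]["Y1"] - outcomes[treated]["Y0"]
    selection_bias = outcomes[treated]["Y0"] - outcomes[control]["Y0"]
    return observed, causal_effect, selection_bias


observed, causal_effect, selection_bias = decompose("Bob", "Alice", potential_outcomes)
print(observed, causal_effect, selection_bias)  # -1 1 -2
```

The drug helps Bob by 1 point, but Bob started 2 points below Alice, so the naive comparison comes out at -1.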

Randomization versus Selection Bias

Group differences contaminate causal analyses. How often is observational data contaminated in this way? 

Quite often. For example, here are a few comparisons between those who have health insurance versus those who do not. People with health insurance are 2.71 years older, have 2.74 more years of education, are 7% more likely to be employed, and have annual incomes $60,000 higher. With so many large differences in our data, we should suspect other differences in unobserved dimensions, too.

To minimize selection bias, we need our groups to be as similar as possible. We need to compare apples to apples.

Random allocation is a good way to promote between-group homogeneity, before the causal intervention. We can demonstrate this statistically. Let’s say that the causal effect of a treatment is the same across individuals, \forall i, Y_{1,i} - Y_{0,i} = \kappa. Then,

E_{treatment}[Y_{1,i}] - E_{control}[Y_{0,i}]

= E_{treatment}[\kappa + Y_{0,i}] - E_{control}[Y_{0,i}]

= \kappa + E_{treatment}[Y_{0,i}] - E_{control}[Y_{0,i}]

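= \kappa

The last step holds because random allocation makes the two groups comparable before treatment, so E_{treatment}[Y_{0,i}] = E_{control}[Y_{0,i}] and the selection bias term vanishes.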
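
A small simulation can illustrate the same point. This is only a sketch under made-up assumptions: the effect size `KAPPA`, the baseline health distribution, and the rule that frailer people seek treatment are all hypothetical and chosen purely for illustration.

```python
import random
import statistics

KAPPA = 2.0  # assumed constant treatment effect (hypothetical value)


def difference_in_means(baselines, treated_flags):
    """Observed difference: mean treated outcome minus mean control outcome."""
    treated = [y0 + KAPPA for y0, t in zip(baselines, treated_flags) if t]
    control = [y0 for y0, t in zip(baselines, treated_flags) if not t]
    return statistics.mean(treated) - statistics.mean(control)


rng = random.Random(0)  # fixed seed so the run is repeatable
baselines = [rng.gauss(50, 10) for _ in range(20_000)]  # untreated outcomes Y0

# Self-selection: people with below-average baseline health seek treatment.
self_selected = [y0 < 50 for y0 in baselines]
# Random allocation: a fair coin decides, independent of baseline health.
randomized = [rng.random() < 0.5 for _ in baselines]

biased_estimate = difference_in_means(baselines, self_selected)
randomized_estimate = difference_in_means(baselines, randomized)
print(f"self-selected: {biased_estimate:.2f}, randomized: {randomized_estimate:.2f}")
```

Under self-selection the estimate is dominated by the gap in baseline health and even has the wrong sign, while under random allocation it lands close to `KAPPA`.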

Consider, for example, the Health Insurance Experiment undertaken by RAND. They randomly divided their sample into four 1000-person groups: a catastrophic plan with essentially zero insurance, and then three treatment groups with variations of different forms of health insurance.

The left column shows means for each attribute (e.g. 56% of the catastrophic group are female). Other columns represent differences between the various treatment groups and control (e.g. 56-2 = 54% of the deductible group are female). How do we know if random allocation succeeded? We simply compare the group differences with their standard errors: if a group difference is more than twice its standard error, the difference is statistically significant.

In these data, only two group differences are statistically significant, and the differences don’t seem to follow obvious patterns, so we can conclude that random allocation appears to have executed successfully. But it’s worth underscoring that we didn’t perform randomization and then walk away; rather, we empirically validated that our group composition is homogeneous.
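
The "twice the standard error" rule of thumb can be sketched as a small balance check. The ages below are invented for illustration (they are not RAND data), and the standard error formula used here, the usual one for a difference of two independent sample means, is an assumption about how such a table would be built.

```python
import math
import statistics


def balance_check(control, treatment, threshold=2.0):
    """Return (difference, standard_error, flagged) for one baseline attribute."""
    difference = statistics.mean(treatment) - statistics.mean(control)
    # Standard error of a difference between two independent sample means.
    standard_error = math.sqrt(
        statistics.variance(control) / len(control)
        + statistics.variance(treatment) / len(treatment)
    )
    flagged = abs(difference) > threshold * standard_error
    return difference, standard_error, flagged


# Hypothetical ages, purely illustrative.
control_age = [30, 42, 35, 51, 28, 44, 39, 47]
similar_age = [33, 40, 36, 49, 30, 45, 38, 46]
older_age = [60, 62, 65, 61, 63, 64, 66, 62]

print(balance_check(control_age, similar_age))  # small difference, not flagged
print(balance_check(control_age, older_age))    # large difference, flagged
```

In a real study you would run this once per baseline attribute and per treatment arm, expecting a few flags by chance alone.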

(For those wondering, RCT studies like this consistently reveal that health insurance improves financial outcomes, but not health outcomes, for the poor. In general, medicine correlates weakly with health. In aggregate, the US consumes 50% more medical services than it needs.)

This post doesn’t address null hypothesis significance testing (NHST) which is an analysis technology frequently paired with RCT methodology. There are also extensions of NHST such as factorial designs and repeated measures (within-subject tests) which merit future discussion. 

External vs Internal Validity

Randomness is a proven way to minimize selection bias. It occurs in two stages:

  1. Random sampling mitigates sampling bias, thereby ensuring that inferences from the study generalize to the broader population. By the law of large numbers (LLN), as a random sample grows, its distribution approaches that of the population. Random sampling promotes external validity.
  2. Random allocation mitigates selection bias, thereby ensuring that the groups have a comparable baseline. We can then safely access a causal interpretation of the study results. Random allocation promotes internal validity.

RCTs were pioneered in the field of medicine. How do you test if a drug works? You might consider simply giving the pill to treatment subjects. But human beings are complicated. We often manifest the placebo effect, where even an empty pill can produce real physiological relief in the body. There is much debate how the mere expectation of health can produce health; recent research points to the top-down control signals containing the predictions of your body’s autonomic nervous system. 

Remember our guiding principle: To minimize selection bias, we need our groups to be as similar as possible. If you want to isolate the medicinal properties of a drug, you need both groups to believe they are being treated. Giving the control group sugar-water pills is an example of blinding: your group similarity increases if subjects can’t see what group they are in. 

Blinding can mitigate our psychological penchant for letting expectations structure our experience in other domains too. Experimenters may unconsciously measure trial outcomes differently if they are financially vested in the outcome (detection bias). The most careful RCTs are double-blind trials: both experimenters and participants are ignorant of their group status for the duration of the trial.

There are other complications to bear in mind:

  • The Hawthorne effect: people behave differently if they are aware of being watched
  • Meta-analyses have revealed high levels of unblinding in pharmacological trials. 
  • Often patients will fail to comply with experimental protocol. Compliance issues may not occur at random, effectively violating ceteris paribus.
  • Often patients will drop out from the study. Just as before, attrition issues may not occur at random, effectively violating ceteris paribus. 

How do you deal with noncompliance and attrition? 

  • Intention to treat studies will leave them in the analysis: more external validity, less internal validity
  • Per protocol studies will exclude them from the analysis: less statistical power, more internal validity.

RCTs in Medical History

The field of medicine is a story of learning to trust experimental results over the opinions of the knowledgeable. Here’s an excerpt from Tetlock’s Superforecasting. 

Consider Galen, the second-century physician to Roman emperors. No one has influenced more generations of physicians. Galen’s writings were the indisputable source of medical authority for more than a thousand years. “It is I, and I alone, who has revealed the true path to medicine,” Galen wrote with his usual modesty. And yet Galen never conducted anything resembling a modern experiment. Why should he? Experiments are what people do when they aren’t sure what the truth is. And Galen was untroubled by doubt. Each outcome confirmed he was right, no matter how equivocal the evidence might look to someone less wise than the master. “All who drink of this treatment recover in a short time, except those whom it does not help, who all die,” he wrote. “It is obvious, therefore, that it fails only in incurable cases.”

Galen is the sort of figure who pops up repeatedly in the history of medicine. They are men of strong conviction and a profound trust in their own judgment. They embrace treatments, develop bold theories for why they work, denounce rivals as quacks and charlatans, and spread their insights with evangelical passion. So it went from the ancient Greeks to Galen to Paracelsus to the German Samuel Hahnemann and the American Benjamin Rush. In the nineteenth century, American medicine saw pitched battles between orthodox physicians and a host of charismatic figures with curious new theories like Thomsonianism, which posited that most illness was due to an excess of cold in the body. Fringe or mainstream, almost all of it was wrong, with the treatments on offer ranging from the frivolous to the dangerous. Ignorance and confidence remained defining features of medicine. As the surgeon and historian Ira Rutkow observed, physicians who furiously debated the merits of various treatments and theories were “like blind men arguing over the colors of the rainbow.”

Not until the twentieth century did the idea of RCTs, careful measurement, and statistical inference take hold. “Is the application of the numerical method to medicine a trivial and time-wasting idea as some hold, or is it an important stage in the development of our art, as others proclaim it?” the Lancet asked in 1921.

Unfortunately, this story doesn’t end with physicians suddenly realizing the virtues of doubt and rigor. The idea of RCTs was painfully slow to catch on and it was only after World War II that the first serious trials were attempted. They delivered excellent results. But still the physicians and scientists who promoted the modernization of medicine routinely found that the medical establishment wasn’t interested, or was even hostile to their efforts.

When hospitals created cardiac care units to treat patients recovering from heart attacks, Cochrane proposed an RCT to determine whether the new units delivered better results than the old treatment, which was to send the patients home for monitoring and bed rest. Physicians balked. It was obvious the cardiac care units were superior, they said, and denying patients the best care would be unethical. But Cochrane persisted in running a trial. Partway through the trial, Cochrane told a group of cardiologists preliminary results. The difference in outcomes between the two treatments was not statistically significant, he emphasized, but it appeared that patients might do slightly better in the cardiac care units. They were vociferous in their abuse: “Archie,” they said, “we always thought you were unethical. You must stop the trial at once.” But then Cochrane revealed that he had reversed the results: home care had done slightly better than the cardiac units. There was dead silence, and a palpable sense of nausea.

Today, evidence-based medicine (EBM) rightly privileges RCTs as more authoritative than expert opinion. This movement has put forward a hierarchy of evidence, to gesture at which sources of evidence to take lightly. 

I personally deny that evidence-based medicine is the best approach to evidence. It gets confused by how to interpret “absence of evidence”, as we have seen in the Covid-19 debate on mask efficacy. Yet EBM is undeniably a big improvement from the epistemic learned helplessness that was ancient medicine.

Limitations & Prospects

RCTs are widely regarded as the gold standard for drawing conclusions about cause and effect. It is worth seriously considering whether RCTs can be effectively deployed to answer questions outside medicine. Can we use RCTs to get better at policy making? Charity? Managerial science?

There are several important criticisms of RCTs that are worth mentioning:

  • Ecological Sterility. The more rigorously you attempt to enforce ceteris paribus, the less your laboratory environment resembles the real world. 
  • Ethical Limitations of Scope. RCTs were never employed to test whether smoking causes cancer, because it is unethical to force someone to smoke.
  • Expense. Pharmacological RCTs cost $12 million to implement, on average.
  • Statistical Power. Because of their expense, sample sizes for RCTs are often much lower than observational studies. 

RCTs are the gold standard for causal inference, but they are not the only product on the market. As we will see later, there are other technologies in the Furious Five toolbox, which statistics and econometrics use to learn causal relationships. These are,

  1. Randomized Controlled Trials (RCTs)
  2. Regression
  3. Instrumental Variables
  4. Regression Discontinuity
  5. Differences-in-Differences

Until next time. 

Seeing Through Calibrated Eyes

Part Of: Bayesianism sequence
See Also: [Excerpt] Fermi Estimates
Content Summary: 1500 words, 15 min read, 15 min exercise (optional)

The most important questions of life are indeed, for the most part, really only problems of probability.

Pierre Simon Laplace, 1812

Accessing One’s Own Predictive Machinery

Any analyst can describe the unnerving intimacy one develops while acclimating to a dataset.  With data visualizations, we acclimate ourselves to the contours of the Manifold of Interest, one slice at a time. Human beings simply become more incisive, powerful thinkers when we choose to put aside the rhetoric and reason directly with quantitative data. 

The Bayesian approach interprets learning as a plausibility calculus, where new data pays down uncertainty. What is uncertainty? Uncertainty is how “loosely held” our beliefs are. The more data we have, the less uncertain we must be, and the sharper the peaks in our belief distribution.

The Bayesian approach affirms silicon and nervous tissue conform to the same principles. Machines learn from digital data, our brains do the same with perceptual data.  The chamber of consciousness is small. Yet, could there be a way to directly tap into the sophisticated inference systems within our subconscious mind?

Quantifying Error Bars

How many hours per week do employees spend in meetings? Even if you don’t know the exact values to questions like these, you still know something. You know that some values would be impossible or at least highly unlikely. Getting clear on what you already know is an absolutely crucial skill to develop as a thinker. To do that, we need to find a way to accurately report our own uncertainty. 

One method to report our uncertainty is to use words of estimative probability.

But these words are crude tools. A more sophisticated way to express uncertainty about a number is to think of it as a range of probable values. In statistics, a range that has a particular chance of containing the correct answer is called a confidence interval (CI). A 90% CI is a range that has a 90% chance of containing the correct answer. For example, if you are 90% sure the average number of hours spent in meetings is between 6 and 15 hours, then we can say you have a 90% CI [6, 15]. You might have produced this range with sophisticated statistical inference methods, or you might have just picked it out from your experience. Either way, the values should be a reflection of your uncertainty about this quantity.

When you say “I am 70% sure of X”, how do you know your stated uncertainty is correct? Suppose you make 10 such predictions. A calibrated estimator should get about 7 out of 10 predictions correct. An overconfident estimator will get fewer than 7 answers right (they knew less than they thought). An underconfident estimator will get more than 7 answers correct (they knew more than they thought). You can be a better thinker if you learn to balance the scales between under- and over-confidence.
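
Here is a minimal sketch of that scoring rule. The function name `calibration_report` and the `tolerance` parameter (how far the hit rate may drift from the stated confidence before we call it miscalibrated) are hypothetical choices for illustration.

```python
def calibration_report(stated_confidence, outcomes, tolerance=0.05):
    """Compare a stated confidence level against the observed hit rate.

    outcomes is a list of booleans, one per prediction (True = it came true).
    tolerance is an assumed slack, not a standard value.
    """
    hit_rate = sum(outcomes) / len(outcomes)
    if abs(hit_rate - stated_confidence) <= tolerance:
        verdict = "calibrated"
    elif hit_rate < stated_confidence:
        verdict = "overconfident"
    else:
        verdict = "underconfident"
    return hit_rate, verdict


# Ten predictions, each made at 70% confidence.
print(calibration_report(0.7, [True] * 7 + [False] * 3))  # (0.7, 'calibrated')
print(calibration_report(0.7, [True] * 5 + [False] * 5))  # (0.5, 'overconfident')
print(calibration_report(0.7, [True] * 9 + [False] * 1))  # (0.9, 'underconfident')
```

With only ten predictions the verdict is noisy; the more predictions you score, the more the hit rate tells you.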

Unfortunately, extensive research has shown that most people are systematically overconfident. For example, here are the results from 972 estimation tests for 90% CIs. If people were naturally calibrated, the number of correct responses would most typically be 9/10; but in practice the actual mean is roughly 5.5.

Here’s a real life example of overconfidence: overly narrow error bars in expert forecasts of US COVID-19 case load.

From a psychological perspective, our ignorance of our state of knowledge is not a particularly surprising fact. All animals are metacognitively incompetent – we are truly strangers to ourselves. Our bias towards overconfidence is easily explained by the argumentative theory of reasoning, and closely aligns with the Dunning-Kruger effect.

Bad news so far. However, with practice and some debiasing techniques, people can become much more reliably calibrated estimators. Consider the premise of superforecasting:

In Superforecasting, Tetlock and coauthor Dan Gardner offer a masterwork on prediction, drawing on decades of research and the results of a massive, government-funded forecasting tournament. The Good Judgment Project involves tens of thousands of ordinary people—including a Brooklyn filmmaker, a retired pipe installer, and a former ballroom dancer—who set out to forecast global events. Some of the volunteers have turned out to be astonishingly good. They’ve beaten other benchmarks, competitors, and prediction markets. They’ve even beaten the collective judgment of intelligence analysts with access to classified information. They are “superforecasters.”

Calibration is a foundational skill in the art of rationality. And it can be taught.

Try It Yourself

Like other skills, calibration emerges through practice. Let’s try it out!


  • 90% CI. For each of the 90% CI questions, provide both an upper bound and a lower bound. Remember that the range should be wide enough that you believe there is a 90% chance that the answer will be between the bounds. 
  • Binary Questions. Answer whether each of the statements is true or false, then circle the probability that reflects how confident you are in your answer. If you are absolutely certain in your answer, you should say you have a 100% chance of getting the answer right. If you have no idea whatsoever, then your chance would be the same as a coin flip (50%). Otherwise (probably usually), it is one of the values between 50% and 100%.

Alright, good luck! 🙂 

I recommend printing this out!

To evaluate your results, the answer key is an image at the end of this article. Go ahead and count how many answers you got correct.

  • 90% CI. If you were fully calibrated, then you should have gotten 9 out of 10 answers right. Your test performance can be interpreted like this: if you got 7 to 10 within your range, you might be calibrated; if you got 6 right, you are very likely to be overconfident; if you got 5 or less right, you are almost certainly overconfident and by a large margin. 
  • Binary Questions. To compute the expected outcome, convert each of the percentages you circled to a decimal (i.e., .5, .6, … 1.0) and add them up. Let’s say your confidence in your answers was 0.5, 0.7, 0.6, 1, 1, 0.8, 0.5, 0.6, 0.5, 0.7, totaling to 6.9. This means your “expected” number is 6.9. For tests with 20 binary questions, most participants should get the expected score to within 2.5 points of the actual score.
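
Both scoring rules can be sketched in a few lines. The binomial calculation assumes the ten interval questions are independent and that a calibrated person hits each with probability 0.9; that independence is an idealization, and the function names are illustrative.

```python
from math import comb


def expected_correct(confidences):
    """Expected number of correct binary answers if the confidences are calibrated."""
    return sum(confidences)


def prob_at_most(k, n=10, p=0.9):
    """Probability of k or fewer hits in n intervals, each with true hit chance p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


confidences = [0.5, 0.7, 0.6, 1, 1, 0.8, 0.5, 0.6, 0.5, 0.7]
print(round(expected_correct(confidences), 1))  # 6.9

# How surprising is a low score for someone who really is calibrated?
print(round(prob_at_most(6), 4))  # roughly 0.013
print(round(prob_at_most(5), 4))  # roughly 0.0016
```

A truly calibrated person would score 6 or fewer only about 1% of the time, which is why such a score points so strongly to overconfidence.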

Calibration Training is Possible

There are five tactics used to improve one’s calibration in practice. We will discuss them in descending order of efficacy.

First, the most important thing we can do to improve is to practice and to go over our mistakes. This simple advice has deep roots in global workspace theory, where the primary function of consciousness is to serve as a learning device. As I wrote elsewhere:

Consider the radical simplicity of the act of learning itself. To learn anything new, we merely pay attention to it, and thereby become conscious of it.

For a public example of self-evaluation, see SlateStarCodex annual predictions and his calibration scores. If you would like to practice against more of these general trivia tests, three are provided in the book which inspired this article, How to Measure Anything. 

Second, a particularly powerful tactic for becoming more calibrated is to pretend to bet money. 

Consider another 90% CI question: what is the average weight in tons of an adult male African elephant? As you did before, provide an upper and lower bound that are far apart enough that you think there is a 90% chance the true answer is between them. Now consider the two following games:

  • Game A. You win $1000 if the true answer turns out to be between your upper and lower bound. If not, you win nothing.
  • Game B: You roll a 10-sided die. If the die lands on anything but 10, you win $1000. Else you win nothing.

80% of subjects prefer Game B. This means that their “90% CI” is actually too narrow (they are unconsciously overconfident). 

Give yourself a choice between betting on your answer being correct or rolling the dice. I call this the equivalent bet test. Research indicates that even just pretending to bet money significantly improves a person’s ability to assess odds (Kahneman & Tversky, 1972, 1973). In fact, actually betting money turns out to be only slightly better than pretending to bet. 

Third, people apply sophisticated evaluation techniques to evaluate the claims of other people. These faculties are typically not employed for the stuff coming out of our own mouths. But there is a simple technique to promote this behavior: the premortem. Imagine you got a question wrong, and in this hypothetical scenario, ask yourself why you got it wrong. This technique has also been shown to significantly improve your performance (Koriat et al 1980).

Fourth, it’s worth noting that the anchoring heuristic can contaminate bound estimation (an example of anchoring might be, if I ask you whether Gandhi died at 120 years old, your estimate of his age will likely be higher than if I had not provided the anchor). In order not to be unduly influenced by your initial guess, it can help to determine bounds separately. Instead of asking yourself “Is there a 90% chance the answer is between LB and UB”, ask yourself “Is there a 95% chance the answer is above (below) my LB (UB)”?

Fifth, rather than approaching estimation by generating guesses, it can sometimes help to instead eliminate answers that seem absurd. Rather than guess 5,000 pounds for the elephant, explore what weights you consider absurd.

In practice, these techniques are fairly effective at improving calibration in people. Here are the results of Hubbard’s half-day of training (n=972); as you can see most people did achieve nearly perfect calibration within half a day.

All of this training was done on general trivia. Does calibrative skill generalize to other domains? There is not much research on this question, but provisionally speaking – generalization seems plausible. Individual forecasters who completed calibration training subsequently showed measurable improvements in their job performance.

Until next time.

Quiz Answer Key


  • Kahneman & Tversky (1972). Subjective probability: A judgment of representativeness.
  • Kahneman & Tversky (1973). On the psychology of prediction.
  • Koriat et al (1980). Reasons for confidence.

[Excerpt] Fermi Estimates

Excerpt From: How to Measure Anything book
Part Of: Bayesianism sequence
Content Summary: 1200 words, 6 min read


Our first mentor of measurement did something that was probably thought by many in his day to be impossible. An ancient Greek named Eratosthenes (ca 276-194 BCE) made the first recorded measurement of the circumference of the Earth. If he sounds familiar, it might be because he is mentioned in many high school trigonometry and geometry textbooks. 

Eratosthenes didn’t use accurate survey equipment and he certainly didn’t have lasers and satellites. He didn’t even embark on a risky and potentially lifelong attempt at circumnavigating the Earth. Instead, while in the Library of Alexandria, he read that a certain deep well in Syene (a city in southern Egypt) would have its bottom entirely lit by the noon sun one day a year. This meant the sun must be directly overhead at that point in time. He also observed that at the same time, vertical objects in Alexandria (almost directly north of Syene) cast a shadow. This meant Alexandria received sunlight at a slightly different angle at the same time. Eratosthenes recognized that he could use this information to assess the curvature of Earth.

He observed that the shadows in Alexandria at noon at that time of year made an angle of 7.2 degrees. Using geometry, he could then prove that this meant that the circumference of Earth must be 50 times the distance between Alexandria and Syene. Modern attempts to replicate Eratosthenes’ calculations put his answer within 3% of the actual value. Eratosthenes’s calculation was a huge improvement on previous knowledge, and his error was much less than the error modern scientists had just a few decades ago for the size and age of the universe. Even 1700 years later, Columbus was apparently unaware of Eratosthenes’s result; his estimate was fully 25% short. (This is one of the reasons Columbus thought he might be in India, not another large, intervening landmass where I reside). In fact, a more accurate measurement than Eratosthenes’s would not be available for another 300 years after Columbus. By then, two Frenchmen, armed with the finest survey equipment available in eighteenth-century France, numerous staff, and a significant grant, finally were able to do better than Eratosthenes.
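
The geometry behind the factor of 50 is just the ratio of a full circle to the shadow angle. In the sketch below, the 800 km distance between the two cities is an assumed round figure used only for illustration; the excerpt itself does not give the distance.

```python
import math


def circumference_multiplier(shadow_angle_degrees):
    """How many Alexandria-to-Syene arcs fit in a full circle of 360 degrees."""
    return 360.0 / shadow_angle_degrees


def estimate_circumference(shadow_angle_degrees, distance_between_cities):
    """Circumference in whatever unit the city-to-city distance is given in."""
    return circumference_multiplier(shadow_angle_degrees) * distance_between_cities


multiplier = circumference_multiplier(7.2)
print(round(multiplier))  # 50

# ASSUMPTION: roughly 800 km between the cities (not stated in the excerpt).
assumed_distance_km = 800
print(round(estimate_circumference(7.2, assumed_distance_km)))  # 40000
```

The whole measurement reduces to one angle and one distance, which is the point of the story.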

Here is the lesson: Eratosthenes made what might seem like an impossible measurement by making a clever calculation on some simple observations. When I ask participants in my seminars how they would make this estimate without modern tools, they usually identify one of the “hard ways” to do it (e.g., circumnavigation). But Eratosthenes, in fact, need not have even left the vicinity of the library to make this calculation. He wrung more information out of the few facts he could confirm instead of assuming the hard way was the only way. 

Enrico Fermi

Consider Enrico Fermi (1901-1954 CE), a physicist who won the Nobel Prize in Physics in 1938. 

One renowned example of his measurement skills was demonstrated at the first detonation of the atom bomb on July 16, 1945, where he was one of the atomic scientists observing the blast from base camp. While other scientists were making final adjustments to instruments used to measure the yield of the blast, Fermi was making confetti out of a page of notebook paper. As the wind from the initial blast began to blow through the camp, he slowly dribbled the confetti into the air, observing how far back it was scattered by the blast (taking the farthest scattered pieces as being the peak of the pressure wave). Simply put, Fermi knew that how far the confetti scattered in the time it would flutter down from a known height (his outstretched arm) gave him a rough approximation of wind speed which, together with knowing the distance from the point of detonation, provided an approximation of the energy of the blast. 

Fermi concluded that the yield must be greater than 10 kilotons. This would have been news, since other initial observers of the blast did not know that lower limit. Could the observed blast be less than 5 kilotons? Less than 2? These answers were not obvious at first. (As it was the first atomic blast on the planet, nobody had much of an eye for these things.) After much analysis of the instrument readings, the final yield estimate was determined to be 18.6 kilotons. Like Eratosthenes, Fermi was aware of a rule relating one simple observation – the scattering of confetti in the wind – to a quantity he wanted to measure. The point of the story is not to teach you enough physics to estimate like Fermi, but that, rather, you should start thinking about measurements as a multistep chain of thought. Inferences can be made from highly indirect observations.

The value of quick estimates was something Fermi was known for throughout his career. He was famous for teaching his students skills to approximate fanciful-sounding quantities that, at first glance, they might presume they knew nothing about. The best-known example of such a “Fermi question” was Fermi asking his students to estimate the number of piano tuners in Chicago. His students – science and engineering majors – would begin by saying that they could not possibly know anything about such a quantity. What Fermi was trying to teach his students was to figure out that they already knew something about the quantity in question.

Fermi would start by asking them to estimate other things about pianos and piano tuners that, while still uncertain, might seem easier to estimate. These included the current population of Chicago (a little over 3 million in the 1930s), the average number of people per household (two or three), the share of households with regularly tuned pianos (not more than 1 in 10 but not less than 1 in 30), the required frequency of tuning (perhaps once a year, on average), how many pianos a tuner could tune in a day (four or five, including travel time), and how many days a year the tuner works (say, 250 or so). The result would be computed:

Tuners in Chicago = population / people per household
* percentage of households with tuned pianos
* tunings per year per piano / (tunings per tuner per day * workdays per year)

Depending on which specific values you chose, you would probably get answers in the range of 30 to 150, with something like 50 being fairly common. When this number was compared to the actual number (which Fermi would already have acquired from the phone directory or a guild list), it was always closer to the true value than the students would have guessed. This may seem like a very wide range, but consider the improvement this was from the “How could we possibly even guess?” attitude his students often started with.
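
The piano tuner formula translates directly into a few lines of code. This is a sketch that plugs in the extreme ends of the ranges quoted above; the function and parameter names are illustrative.

```python
def piano_tuners(population, people_per_household, share_with_tuned_piano,
                 tunings_per_piano_per_year, tunings_per_tuner_per_day,
                 workdays_per_year):
    """Fermi estimate: yearly tuning demand divided by one tuner's yearly capacity."""
    households = population / people_per_household
    tunings_needed = households * share_with_tuned_piano * tunings_per_piano_per_year
    tunings_per_tuner = tunings_per_tuner_per_day * workdays_per_year
    return tunings_needed / tunings_per_tuner


# Choices that push the estimate down: big households, few pianos, fast tuners.
low = piano_tuners(3_000_000, 3, 1 / 30, 1, 5, 250)
# Choices that push the estimate up: small households, many pianos, slow tuners.
high = piano_tuners(3_000_000, 2, 1 / 10, 1, 4, 250)

print(round(low), round(high))  # 27 150
```

Even the two extremes differ by less than a factor of six, which is a far tighter answer than "we cannot possibly know."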


Taken together, these examples show us something very different from what we are typically exposed to in business. Executives often say “We can’t even begin to guess at something like that.” They dwell ad infinitum on the overwhelming uncertainties. Instead of making any attempt at measurement, they sometimes prefer to be stunned into inactivity by the apparent difficulty in dealing with these uncertainties. Fermi might say, “Yes, there are a lot of things you don’t know, but what do you know?”

Viewing the world as these individuals do – through calibrated eyes that see things in a quantitative light – has been a historical force propelling both science and economic productivity. If you are prepared to rethink some assumptions and put in the time, you will see through calibrated eyes as well.

[Excerpt] The Evolution of Abortion

See Also: Cooperative Breeding Hypothesis
Excerpt From: Hrdy (2009) Mothers and Others. Page 70-72, 99-100
Content Summary: 1300 words, 13 minute read

Child Abandonment in Nonhuman Primates

Many mammalian mothers can be surprisingly selective about babies they care for. A mother mouse or prairie dog may cull her litter, shoving aside a runt; a lioness whose cubs are too weak to walk may abandon the entire litter “with no attempt to nudge them to their feet, carry them or otherwise help.” Some mammals (and this includes humans) even discriminate against healthy babies, if they happen to be born the “wrong” sex. But not Great Ape or most primate mothers. No matter how deformed, scrawny, odd, or burdensome, there is no baby that a wild ape mother won’t keep. Babies born blind, limbless, or afflicted with cerebral palsy – newborns that a hunter-gatherer mother would likely abandon at birth – are picked up and held close. If her baby is too incapacitated to hold on, the mother may walk tripedally so as to support the baby with one hand.

Monkey and ape mothers rarely discriminate based on a baby’s particular attributes, as some human mothers do. Except perhaps those born very prematurely, babies are cared for (and carried) almost no matter what. Even if her baby dies, the mother will continue to carry the desiccated corpse around for days.

Child Abandonment in Humans

Maternal devotion in the human case is more complicated. A woman undergoes the same endocrinological transformations during pregnancy as other apes. At birth, her cortisol levels and heartbeat reflect just how sensitive to infant cues she has become. But whereas the nonhuman ape mother undiscriminatingly accepts any infant born to her without taking into account physical attributes, the human mother’s devotion is more conditional. A newborn perceived as defective may be drowned, buried alive, or simply wrapped in leaves and left in the bush within a few hours of birth. “Defective” may mean anything from having too many toes or too few. It may mean being born with a deformed limb or at a very low birthweight, coming too soon after the birth of an older sibling, or having some culturally arbitrary “affliction” such as having too much or too little hair, or being born the wrong sex.

Unlike any other ape, a mother in a hunter-gatherer society examines her baby right after birth and, depending on its specific attributes and her own social circumstances (especially how much social support she is likely to have), makes a conscious decision to either keep the baby or let it die. In most traditional hunter-gatherer societies, abandonment is rare, and almost always undertaken with regret. It is an act no woman wants to recall, a topic ethnographers must tiptoe around gingerly. Typically, interviewers will broach the subject indirectly, asking other women rather than the mother herself. Back when the !Kung still lived as nomadic foragers, the rate of abandonment was about one in one hundred live births. Higher rates were reported among people with strong sex preferences, as among the pre-missionized Eipo horticulturalists of highland New Guinea. Forty-one percent of live births in this group resulted in abandonment, and in the vast majority of cases the abandoned babies were newborn daughters whose mothers hoped to reduce the time until a son might be born.

Once a baby has nursed at his mother’s breast and lactation is under way, a woman’s hormonal and neurological responses to this stimulation, combined with visual, auditory, tactile, and olfactory cues, produce a powerful emotional attachment to her baby. Once she passes this tipping point, a mother’s passionate desire to keep her baby safe usually overrides other (including conscious) considerations. This is why, if a mother is going to abandon her infant, she usually does so immediately, before her milk comes in and before mother-infant bonding is past the point of no return.

Two Kinds of Parenting Style

There are two kinds of primate parenting styles:

  • Continuous care and contact, where the mother’s hyper-possessive instincts rebuff offers of otherwise-interested babysitters
  • Cooperative breeding, where relatives (“allomothers”) take turns carrying the young, and sometimes provisioning them with food.

About half of all primate species use cooperative breeding models. However, in only 20% of primate species do alloparents provision the young, and for the most part this provisioning does not amount to much. Let us call robust cooperative breeding those species that generously provision their young. So far the only full alloparents belong to the family Callitrichidae – mostly marmosets and tamarins. Callitrichidae are famous for breeding fast and for their rapid colonization of new habitats.

More than 30 million years have passed since humans last shared a common ancestor with these tiny (rarely more than four pounds), clawed, squirrel-like arboreal creatures. New World monkeys literally inhabit a different world from that of their primate cousins who evolved in Africa. Theirs is a sensory world dominated by smell rather than sight. Yet in many respects callitrichids may provide better insight into early hominin family lives than do far more closely related species like chimpanzees or cercopithecine monkeys.

What humans have in common with the Callitrichidae is worth itemizing. In both types of primates, group members are unusually sensitive to the needs of others and are characterized by potent impulses to give. In both groups, a mother produces closely spaced offspring whose needs exceed her capacity to provide for them. Thus the mother must rely on others to help care for and provision her young. When prospects for support seem poor, mothers in both groups are more likely to bail out than other primates are. Human and callitrichid mothers stand out for their pronounced ambivalence toward newborns and their extremely contingent maternal commitment. Infants have adapted, as we will see later, with special traits for attracting the attention of potential caregivers. And finally, humans have a marmoset-like ability to colonize and thrive in novel environments. 

What happens when you take a clever ape with incipient social intelligence, tool manufacturing, and robust mindreading, then introduce cooperative breeding? This, we submit, is the recipe to produce a uniquely human cognitive system. Prosocial motivations transformed the mindreading system into a mindsharing system, which ultimately led to the development of norms, language, and cumulative culture.

This is the cooperative breeding hypothesis. 

The Dark Side of Cooperative Breeding

As noted above, by far the most common exceptions to the general primate pattern of unconditional maternal care are found in the family Callitrichidae. Like all cooperative breeders, tamarin and marmoset mothers depend on others to help rear their young. Shared care and provisioning clearly enhances maternal reproductive success, but there is also a dark side to such dependence. Tamarin mothers short on help may abandon their young, bailing out at birth by failing to pick up neonates when they fall to the ground or forcing clinging newborns off their bodies. Although infanticide is a hazard across the Primate order, observations almost always implicate either strange males or females other than the mother, not the mother herself.

The high rates of maternal abandonment seen among callitrichids and humans are almost unheard of elsewhere among primates. Cooperative breeding systems endowed humans with a deeply felt sense of cooperation and altruism… but increased rates of child abandonment are a corollary.

The Evolution of Abortion

Note: this section is my own; these are not Hrdy’s words.

It is possible to interpret modern debates about abortion in light of the ancient primate instinct documented above. As humans became increasingly culturally sophisticated, the motivation to abandon a child could be acted upon prenatally.

This is not to make an appeal to nature, “X is good because it is natural”. Indeed, our normative systems (mindsharing writ large) allow us to push against human nature when we so choose. And I won’t speak towards a moral appraisal of abortion here.

But let’s imagine human parenting systems were instead inherited from the continuous care and contact model of the other great apes. In such a system, I submit the topic of abortion would be as foreign as meat-eating might be to a talking gorilla.

The Domestication of Sapiens

Part Of: Anthropogeny sequence
Followup To: An Introduction to Domestication
Content Summary: 2000 words, 20 min read

Two Forms of Aggression

Aggression is not a natural kind. Rather, as described in e.g., Siegel & Victoroff (2009), there are two kinds of aggression.

  1. Reactive aggression is based on the RAGE subsystem. It is the biological basis of resource competition. 
  2. Proactive aggression is based on the SEEKING subsystem. It is the biological basis of predation, and sexual selection-driven infanticide.

These two systems have different behavioral signatures. Reactive aggression is associated with high arousal, sudden initiation, and functions to remove a threatening stimulus. As observers of a bar fight can tell you, you don’t want to get close to an enraged person at the wrong time – the aggressive behavior can easily switch its target. In contrast, proactive aggression is associated with low arousal, planned initiation, and functions to achieve some sort of goal. 

These systems also feature different physiological signatures. Reactive aggression is caused by activation of the mediobasal hypothalamic nuclei, and dorsal nuclei of the periaqueductal gray (PAG). Amygdala activity promotes these behaviors, and they are accompanied by low levels of prefrontal control. In contrast, proactive aggression is caused by activation of the lateral hypothalamic nucleus, and ventral regions of the PAG; amygdala activity suppresses its expression, and it is accompanied by significant cortical activity.

Of course, these two systems can interact.

  • When a beta chimpanzee challenges an alpha, he may convert predatory aggression (plotting a coup) to an escalating sequence of reactive violence.
  • When a human being suffers intense personal injury and is unable to immediately retaliate, he may convert that reactive rage into the more proactive and delayed phenomenon known as vengeance.

The distinction also prominently appears in human legal codes: we tend to punish proactive aggression (premeditated murder) more severely than reactive aggression (bar fight).

Anyone looking at homicide data will tell you that being male, and being young, render a person much more likely to kill. Violence-generating mechanisms differ by sex, because each sex is subject to diverging selective pressures.

Of course, homicide can be produced by two kinds of aggression. It would be more useful to policy makers to analyze rates of reactive versus proactive aggression separately. Given its more cognitive basis, I suspect proactive violence is more amenable to cultural interventions; whereas reactive violence might be best treated with therapy and pharmaceuticals to strengthen one’s self-control.

And indeed, just these kinds of considerations are now being employed by social scientists seeking to better understand and mitigate phenomena such as domestic violence, and delinquency in children.

From a historical perspective, our species spent most of its history as foragers (i.e., hunter-gatherers), with statecraft a consequence of the agricultural revolution. There is a keen interest in understanding the natural tendency of forager populations, since these are more representative of the “original social contract”. The Rousseau paradigm sees foraging humans as a naturally benign and unaggressive species. This position considers violence to be promoted by the state. The Hobbes paradigm rejects the idea of the noble savage and holds that violence is part of our evolutionary heritage. In this view, the state is an instrument to restrain violence.

The Evolution of War

Comparative biology data can resolve the Rousseau-Hobbes debate. 

First, consider how chimpanzees use gangs of allied individuals to achieve political ends through aggressive means. These coalitions are very rare in the animal kingdom. They are only known to occur among social carnivores and primates. These acts of coalition-based aggression are proactive in nature. 

Second, it is important to understand how chimpanzees express xenophobia. Chimp troops don’t wander haphazardly; they instead inhabit clearly demarcated territories. The troops of neighboring communities are treated with hostility, so much so that up to 75% of the time is spent in the central 35% of the range. Another expression of chimp territoriality is border patrols, conducted by groups of male chimps moving stealthily to enforce their territory’s boundaries. 

Third, these factors coalesce in the phenomenon of chimpanzee commando raids, with large groups of males penetrating deep into enemy territory, stalking and killing members of competing troops. Why small-scale raids instead of large-scale brawls? Well, warfare is only adaptive when the potential benefits outweigh the risk of personal injury. Thus, these raids are governed by the logic of a local imbalance of power. Raids preferentially occur when the attacking party has gathered significantly more fighting power than the defender (Wrangham 1999).

Killing doesn’t directly increase one’s biological fitness. Why then has such behavior been selected? Because successful raids promote the possibility of territorial expansion (Mitani et al 2010), plausibly by weakening the other groups’ overall fighting power. In turn, territory size directly correlates with resource and mate availability.

Here is Wrangham (1999) explaining parallels with human warfare:

It is clear that intergroup aggression has occurred among many, possibly all, hunter-gatherer populations and follows a rather uniform pattern. From the most northern to the most southern latitudes, the most common pattern of intergroup aggression was for a party of men from one group to launch a surprise attack in circumstances in which the attackers were unlikely to be harmed. Attacks were sometimes unsuccessful but were, at other times, responsible for the deaths of one or many victims. Women and girls were sometimes captured.

Chimpanzees and hunter gatherers, we conclude, share a tendency to respond aggressively in encounters with members of other social groups; to avoid intensely aggressive confrontations in battle line (typically, by retreating); and to seek, or take advantage of, opportunities to use imbalances of power for males to kill members of neighboring groups.

Indeed, even the rate at which foraging humans and chimpanzees engage in between-group violence is quite similar:

These data suggest a common mechanism. It is not that humans evolved a unique thirst for warfare. Rather, this instinct long predates our species.

The Domestication of Bonobos

It is hard to imagine species with more dramatically different social lives than bonobos and chimps. Bonobos are renowned for their ultra-sexuality: sexual acts are used in lieu of grooming, as the primary vehicle to strengthen relationships. They also exhibit startlingly low rates of violence:

  1. Killing of any kind (including coalition-based acts of violence) is literally unheard of. 
  2. Rape and infanticide have also never been observed.
  3. Commando raids do not occur; bonobos do not even express hostility to neighboring troop “outgroups”. 

The bonobo and chimp lineages diverged very recently (less than 1 mya); yet they lead entirely different social lives. How is this possible?

A clue comes from observations of unusually strong female coalitions in bonobos. Every time a male tries to coerce a female for food or sex, that female’s coalition vigorously rebuffs the male. These female coalitions in effect give non-aggressive males an advantage. Over the generations, this selective pressure will yield decreasing levels of (proactive) aggression in the bonobo species.

As we learned in An Introduction to Domestication: when aggression is downregulated in a species, a whole complex of unintended byproducts occurs. And we see precisely this domestication syndrome in bonobos. Bonobos have smaller crania, reduced pigmentation, increased sexual behaviors, and a general uptick in childlike mannerisms. Bonobos domesticated themselves! Here is the model from Hare et al (2012):

The Puzzle of Humanity

Chimpanzees and humans have comparably high rates of proactive (predatory) violence, and this proclivity underlies a shared love of warfare. In contrast, bonobos exhibit near-zero rates of proactive aggression.

Let’s turn our attention back to reactive violence. Bonobos exhibit moderate forms of reactive aggression; primarily expressed by female coalitions to curtail male domination behaviors. In contrast, chimps are notoriously short-tempered; reacting violently to even trivial “provocations”. How do rates of human reactive aggression compare in practice?

Even in comparatively violent forager groups, the difference is remarkably large. Humans experience reactive violence at rates two orders of magnitude less than our chimpanzee cousins. 

With these data, the following picture has emerged:

Let’s assume chimpanzee aggression behaviors are representative of the last common ancestor (LCA). Bonobos became docile by a process of self-domestication. Why are humans less reactively violent? Did we self-domesticate too?

Another Case of Self-Domestication

The surprising answer is yes. A host of anatomical changes in H. Sapiens around 300 ka all support the self-domestication hypothesis (Leach 2003, Cieri et al 2014).

One symptom of domestication is paedomorphism: childlike features that extend into adulthood. Our adult cranium (especially the smooth, round skull) resembles the skull of chimpanzee children (in contrast with a chimpanzee adult’s prognathic face):

In domestication (among others), we see a reduction in face size, and a feminization of the skull:

These changes look a lot like the change between our mid-Pleistocene ancestor and anatomically modern H. Sapiens:

Other “domestication signatures” in modern humans include:

  • Brain volume reduction (in last 30,000 years)
  • Smaller teeth, small face-body ratio
  • Reduced sexual dimorphism (differences between male vs female skeletons)
  • More childlike features in adults (longer juvenile period, extended learning, adult play)
  • Increased fertility rate (incl. hidden estrus)
  • Increased rates of lifelong homosexuality

This anatomical evidence of self-domestication fits nicely with our species’ unique relationship with violence.

Significance of Self-Domestication

Consider again that the domestication syndrome appears between 400 and 100 kya. The Heidelbergs were more violent than Sapiens. 

Most primates don’t get enraged by acts of violence that don’t involve them personally. But humans do experience moral outrage at such acts, to the point of being willing to engage in so-called altruistic punishment (risking personal injury to punish a third-party offense).  

Moral instincts are one of the couple dozen traits that are uniquely human. Evolutionary anthropology must explain when and why these uniquely human faculties were forged. Being willing to punish acts of reactive violence surely played a role in the self-domestication process. We can safely conclude that morality as a cognitive adaptation evolved late. Heidelbergs were amoral; Sapiens were increasingly subject to the moral sentiments.

I’ll speak towards why morality evolved another time. For now, let’s turn our attention from the causes, to the effects of self-domestication. For it turns out that these data give us unique insights into what made our species ecologically dominant. Heidelbergs did not conquer the globe – Sapiens did. But how could a reduction in intra-group violence create the necessary conditions for our species’ success?

Our species was not successful because of its pacifism. Rather, the cultural intelligence hypothesis holds that our species’ unique gifts for coordinating with others to transmit cultural knowledge created the conditions for cultural ratcheting. Rather than inheriting only our genetic legacy, we also inherit cultural knowledge which (together with our innate endowments) gives us increasing powers to control our environment.

Importantly, the ability for our cultural knowledge (or “super mind”) to accumulate information is not guaranteed. If a particular community of humans is too few in number, or too antagonistic towards one another, its net cultural know-how will not grow across the generations. 

In this model, our cultural instincts evolved earlier in our lineage. But the advent of morality, and its concomitant reduction in reactive violence, was the event that unleashed the astonishing generative potential of human culture.

Until next time. 

Inspiring Materials

Some of these views are articulated in more detail in Wrangham (2019a).

Works Cited

  • Cieri et al (2014). Craniofacial Feminization, Social Tolerance, and the Origins of Behavioral Modernity
  • Hare et al (2012). The self-domestication hypothesis: evolution of bonobo psychology is due to selection against aggression
  • Leach (2003). Human Domestication Reconsidered
  • Marean (2015). An Evolutionary Anthropological Perspective on Modern Human Origins
  • Mitani et al (2010). Lethal intergroup aggression leads to territorial expansion in chimpanzees
  • Siegel & Victoroff (2009). Understanding human aggression: New insights from neuroscience 
  • Wrangham (1999). Evolution of Coalitionary Killing 
  • Wrangham (2003). Intergroup Relations in Chimpanzees
  • Wrangham (2018). Two types of aggression in human evolution
  • Wrangham (2019a). The Goodness Paradox: The Strange Relationship Between Virtue and Violence
  • Wrangham (2019b). Hypotheses for the Evolution of Reduced Reactive Aggression in the Context of Human Self-Domestication

Strangers To Ourselves

Part Of: Sociality sequence
Followup To: Intro to Confabulation
Content Summary: 2000 words, 20 min read

We do not have direct access to our mental lives. Rather, self-perception is performed by other-directed faculties (i.e., mindreading) being “turned inwards”. We guess our intentions, in exactly the same way we guess at the intentions of others.

Self-Knowledge vs Other-Knowledge

The brain is organized into perception-action cycles, with decisions mediating these streams.  We can represent this thesis as a simple cartoon, which also captures the abstraction hierarchy (concrete vs abstract decisions) and the two loop hypothesis (world vs body).

Agent files are the mental records we maintain about our relationships with people. Mindreading denotes the coalition of processes that attempt to reverse engineer the mental state of other people: their goals, their idiosyncratic mental states, and even their personality. Folk psychology contrasts this interpretive method of understanding other people with our ability to understand ourselves. 

We have powerful intuitions that self-understanding is fundamentally different than other-understanding. The Cartesian doctrine of introspection holds that our mental states and mechanisms are transparent; that is, directly accessible to us. It doesn’t matter which mental system generates the attitude, or why it does so – we can directly perceive all of this. 

Our Unconscious Selves

Cartesian thinking has fallen out of favor. Why? Because we discovered that most mental activity happens outside of conscious awareness.

A simple example should illustrate. When we speak, the musculature in our vocal tracts contorts in highly specific ways. Do you have any idea which muscles move, and in which direction, to speak? No – you are merely conscious of the high-level desire. The way that those instructions are cashed out as more detailed motor commands is opaque to you.

The first movement against transparency came from Freud, who championed a repression hypothesis: that unconscious beliefs are too depraved to be admitted to consciousness. But, after a brief detour through radical behaviorism, modern cognitive psychology tends to avow a plumbing hypothesis: that unconscious states are too complex (or not sufficiently useful) to merit admission to consciousness.

The distinction between unconscious and conscious processes can feel abstract, until you grapple with the limited capacity of consciousness. Why is it possible to read one, but not two books simultaneously? Why is it possible for most of us to remember a new phone number, but not the first twenty digits of pi, after the first 15 minutes of exposure? 

The ISA Theory

The Interpretive Sensory-Access (ISA) theory holds that our conscious selves are completely ignorant of our own mental lives save for the mindreading faculty. That is, the very same faculty used in our social interactions also constructs models of ourselves. 

It is important to realize that the range of perceptual data available for self-interpretation is larger than that available for people outside of ourselves. For both types of mindreading, we have perceptual data on various behaviors. In the case of self-mindreading, we also have access to our subvocalizations (inner speech) and the low-capacity contents of the global broadcast, more generally. 

Perhaps our mindreading faculties are more accurate when turned inwards, given they have more data on which to construct a self-narrative.

The ISA theory explains the behavior-identity bootstrap; i.e., why the “fake it until you make it” proverb is apt. By acting in accordance with a novel role (e.g., helping the homeless more often), we gradually begin to become that person (e.g., resonating to the needs of others more powerfully in general). 

Theses, Predictions, Evidence

The ISA theory can be distilled into four theses:

  1. There is a single mental faculty underlying our attributions of propositional attitudes, whether to ourselves or to others
  2. That faculty has only sensory access to its domain
  3. Its access to our attitudes is interpretive rather than transparent
  4. The mental faculty in question evolved to sustain and facilitate other-directed forms of social cognition. 

The ISA theory is testable. It generates the following predictions:

  1. No non-sensory awareness of our inner lives
  2. There should be no substantive differences in the development of a child’s capacities for first-person and third-person understanding. 
  3. There should be no dissociation between a person’s ability to attribute mental states to themselves and to others. 
  4. Humans should lack any form of deep and sophisticated metacognitive competence. 
  5. People should confabulate promiscuously. 
  6. Any non-human animal capable of mindreading should be capable of turning its mindreading abilities on itself. 

These predictions are largely borne out in experimental data:

  1. Introspection-sampling studies suggest that some people believe themselves to experience non-sensory attitudes. These data are hard for ISA theory to accommodate. But they are also hard for introspection-based theories to accommodate – if we had transparent access to our attitudes, why do some people only experience them with a sensory overlay?
  2. Wellman et al (2001) conducted a meta-analysis of well over 100 pairs of experiments in which children had been asked both to ascribe a false belief to another person and to attribute a previous false belief to themselves. They were able to find no significant difference in performance, even at the youngest ages tested. 
  3. Other theorists (e.g., Nichols & Stich 2003) claim that autism exemplifies deficits in other-knowledge but not in self-knowledge, and that schizophrenia is an impairment of self-knowledge but not other-knowledge. But on inspection, these claims have weak to nonexistent empirical support. These syndromes injure both forms of knowledge.
  4. Transparent self-knowledge should entail robust metacognitive competencies. But we do not possess them. For example, the correlation between people’s judgments of learning and later recall is not very strong (Dunlosky & Metcalfe 2009). 
  5. The philosophical doctrine of first-person authority holds that we cannot hold false beliefs about our mental lives. The robust phenomenon of confabulation discredits this hypothesis (Nisbett & Wilson 1977). We are allergic to admitting “I don’t know why I did that”; rather, we invent stories about ourselves without realizing their contrived nature. I discuss this form of “sincere dishonesty” at length here.
  6. Primates are capable of desire mindreading, and their behavior is consistent with their possessing some rudimentary forms of self-knowledge.

The ISA theory thus receives ample empirical confirmation.

Competitors to ISA Theory

There are many competitors to the ISA account. For the below, we will use attitude to denote non-perceptual mental representations such as desires, goals, reasons and decisions. 

  1. Source tagging theories (e.g., Rey 2013) hold that, whenever the brain generates a new attitude, the generating system(s) add a tag indicating their source. Whenever that representation is globally broadcast, our conscious selves can inspect the tag to view its origin. 
  2. Attitudinal working memory theories (e.g., Fodor 1983, Evans 1982) hold that, in addition to a perception-based working memory system, there is a separate faculty to broadcast conscious attitudes and decisions. 
  3. Constitutive authority theories (e.g., Wilson 2002, Wegner 2002, Frankish 2009) admit that conscious events (e.g., suppose we say I want to go to the store) do not directly cause action. However, we do attribute these utterances to ourselves, and the subconscious metanorm I DESIRE TO REALIZE MY COMMITMENTS works to translate these conscious self-attributions to unconscious action programs. 
  4. Inner sense theories hold that, as animal brains increased in complexity, there was increasing need for cognitive monitoring and control. To perform that adaptive function, the faculty of inner sense evolved to generate metarepresentations: representations of object-level computational state. There are three important flavors of this theory:

But there are data speaking against these theories:

  1. Contra source tagging, the source monitoring literature shows that people simply don’t have transparent access to the sources of their memory images. For example, Henkel et al (2000) required subjects to either see, hear, imagine as seen, or imagine as heard, a number of familiar events, such as a basketball bouncing. But people frequently misremembered which of these four mediums had produced their memory, when asked later. 
  2. The capacity limits of sensory-based working memory explain nearly the entire phenomenon of fluid g, also known as IQ (Colom et al 2004). If attitudinal working memory evolved alongside this system, it is hard to explain why it doesn’t contribute to fluid intelligence scores. 

More tellingly, however, each of the above theories fails to explain confabulation data. Most inner sense theories today (e.g., Goldman 2006) adopt a dual-method stance: when confabulating, people are using mindreading; else people are using transparent inner sense. But as an auxiliary hypothesis, dual-method theories fail to explain the patterning of when a person will make correct versus incorrect self-attributions. 

Biased ISA Theory

The ISA theory holds self-knowledge to be grounded in sparse but unbiased perceptual knowledge. But this does not seem to be the whole story. For we know that we are prone to overestimate the good qualities of the Self and Us, but underestimate the bad qualities of the Other and Them. 

For example, the fundamental attribution error describes the tendency to explain our own failings as contingent on the situation, but to attribute the failings of others to immutable character flaws. More generally, the argumentative theory of reasoning posits a justification faculty which subconsciously makes our reasons rosier, and our folk sociology faculty demonizes members of the outgroup. 

In social psychology, there is a distinction between dispositional beliefs (avowals that are generated live) and standing beliefs (those actively represented in long-term memory). The relationship between the content of what one says and the content of the underlying attitude may be quite complex. It is unclear whether these parochial biases act upon standing or dispositional beliefs. 

Explaining Transparency

The following section is borrowed from Carruthers (2020). 

In general, our judgments of others’ opinions come in two phases:

  1. First pass representation of the attitude expressed, relying on syntax, prosody, and the salient feature of conversational context.
  2. Lie Detection. Whenever the degree of support for the initial interpretation is lower than normal, or there is a competing interpretation in play that has at least some degree of support, or the potential costs of a misunderstanding are higher than normal, a signal would be sent to executive systems to slow down and issue inquiries more widely before a conclusion is reached. 

Why do our self-attributions feel transparent? Plausibly because the attribution of self-attitudes only undergoes the first stage (it is not subject to the disambiguation and lie detection systems). This architecture would likely generate the following inference rules:

  1. One believes one is in mental state M → one is in mental state M.
  2. One believes one isn’t in mental state M → one isn’t in mental state M.

The first will issue in intuitions of infallible knowledge, and the second in the intuition that mental states are always self-presenting to their possessors.

For example, consider the following two sentences:

  1. John thinks he has just decided to go to the party, but really he hasn’t. 
  2. John thinks he doesn’t intend to go to the party, but really he does.

These sentences are hard to parse, precisely because the mindreading inference rules render them strikingly counterintuitive.

These intuitions may be merely tacit initially, but will rapidly transition into explicit transparency beliefs in cultures that articulate them. Such beliefs might be expected to exert a deep “attractor effect” on cultural evolution, being sustained and transmitted because of their apparent naturalness. And indeed, transparency doctrines have been found in traditions from Aristotle, to the Mayans, to pre-Buddhist China.

Until next time. 

Inspiring Materials

These views are more completely articulated in Carruthers (2011). For a lecture on this topic, please see:

Works Cited

  • Carruthers (2011). The Opacity of Mind
  • Carruthers (2020). How mindreading might mislead cognitive science
  • Colom et al (2004). Working memory is (almost) perfectly predicted by g
  • Evans (1982). The Varieties of Reference
  • Henkel et al (2000). Cross-modal source monitoring confusions between perceived and imagined events
  • Fodor (1983). The Modularity of Mind
  • Goldman (2006). Simulating Minds. 
  • Frankish (2009). How we know our conscious minds. 
  • Nichols & Stich (2003). Mindreading: An Integrated Account of Pretence, Self-Awareness, and Understanding Other Minds
  • Nisbett & Wilson (1977). Telling more than we can know: verbal reports on mental processes.
  • Rey (2013). We aren’t all self-blind: A defense of modest introspectionism
  • Wilson (2002). Strangers to ourselves
  • Wegner (2002). The illusion of conscious will

The Mindreading System

Part Of: Sociality sequence
Followup To: Counterfactual Simulation
Content Summary: 1600 words, 16 min read

A Brief Review

Mindreading (also known as mentalizing, the intentional stance, or theory of mind) is the penchant of animals to represent the mental lives of one another. What are the beliefs and desires of those around us? A classic demonstration of mindreading comes from Heider & Simmel (1944):

While the mindreading faculty was designed to understand the minds of other animals, it had no trouble ascribing beliefs and goals to two-dimensional shapes. This is roughly analogous to your email provider accepting a tennis ball as a login password.

Another classic demonstration of mindreading is the Sally-Anne test, from Baron-Cohen et al (1985):

Imagine Sally puts a marble in the basket, then leaves the room. Anne moves the marble from the basket into the box. When Sally returns to the room looking for the marble, where will she turn next?

This task is designed to test the ability to understand false beliefs.

The Phylogeny of Mindreading

Historically, mindreading intrigued researchers because it seemed a uniquely human capacity. Here is a quote from Carruthers (2011):

Until 2000, all the evidence seemed to point to the conclusion that apes lack any understanding of the perceptual access of others. But a breakthrough came when it was realized that all the initial experiments had involved cooperative paradigms of one sort or another. For example, the apes might have had to decide which of two humans to request food from. But non-human apes are not naturally cooperative. They are, however, immensely competitive. Hence it might be they weren’t sufficiently motivated to take account of others’ perceptions in cooperative contexts, but would do so in competitive ones. 

In a ground-breaking series of experiments, Hare et al (2000, 2001) set out to test this idea. Indeed, in competing for food, the subordinate seemed to take account of what the dominant could see. Taken together with the results for chimps understanding desires, it seems that monkeys and apes possess at least a goal-perception mindreading system. 

Indeed, Call & Tomasello (2008) review mindreading studies conducted on chimpanzees, conclusively showing that chimpanzee behavior is responsive to the mental states they perceive in other living things. 

But do chimpanzees possess knowledge of false beliefs? The answer as recently as 2014 was a clear “no”. Dozens of attempts to test chimpanzee awareness of false beliefs had failed to establish any such confidence…

Until they did. Krupenye et al (2016) and Buttelmann et al (2017) demonstrate that, indeed, great apes can pass false belief tasks when the experimental paradigm is sufficiently motivating. In anticipatory looking studies such as this, the question is whether the ape anticipates the action according to what he knows, or what the actor falsely believes. As you can see, the ape anticipates the actor to act erroneously (red dots indicate where the chimp is looking).

Chimpanzees (and other great apes) possess a full capacity for mindreading. There is (more contentious) evidence of mindreading in non-ape primates, canids and corvids. I suspect some of the simpler capacities like detecting self-generated motion are deeply homologous. But the more sophisticated faculties seem likely to be cases of convergent evolution.  In the case of the great apes, I suspect that their robust mindreading faculties evolved in service of competitive foraging efficiency.

The Ontogeny of Mindreading

Ontogeny builds individuals. It is one thing to say adults can pass these three dozen tests related to mindreading. It is another to say when the ability to pass these tests is delivered by ontogeny.

For example, when are children able to pass the Sally-Anne test? Baron-Cohen et al (1985) demonstrate that this ability arrives at 44 months, except in autistic children for whom the ability to pass is severely delayed.

Do the Sally-Anne results suggest that the ability to model the knowledge of others arrives at 44 months? By no means! While a child can verbalize false beliefs at 44 months, it turns out their subconscious mind begins generating the appropriate expectancies much earlier. Looking time studies (e.g., Onishi & Baillargeon 2005) demonstrate that 15-month-old infants are surprised by violations of false-belief scenarios. 

Why the 2.5 year gap between intuition-based and language-based understanding of false belief? Carruthers (2011, section 8.4) reviews attempts to explain it; but I didn’t find his explanation (or anyone else’s) particularly satisfying. It turns out that this intuition vs report gap may generalize beyond mindreading; similar phenomena occur in e.g., measures of outcome bias. Let’s bookmark this as an open question, and move on…

While the mindreading abilities of adult humans are fairly comparable to those of other apes, in humans they are more frequently expressed in non-competitive contexts. Further, comparative development data has shown that mindreading emerges much earlier in human infancy:

As we will see later, this acceleration of mindreading has important implications.

Sub-processes: Components of Mindreading

The term mindreading has proven a useful banner to rally research efforts. But more work needs to be done to identify the basis functions underlying mindreading. In the language of the theoretician’s quadrant, to move forward, we must engage in a Q3 exercise. 

We have begun to sketch an outline of these basis functions during previous discussions of social phenomena. 

Other mindreading-impacting phenomena we have not yet discussed include:

  • Agency Detection. When the natural world violates our expectations (a leaf moves against gravity), often these events are caused by a (presently unseen) agent. Mismatches between agent and agency detection are thought to generate the intuitions that underlie our species’ folk animism. 
  • Emotion Contagion
  • Friendship behaviors.
  • Shared Attention mechanisms. Before we can reason about the beliefs of another agent, we must learn to 
  • Cultural Psychological mechanisms. We have yet to discuss prestige biases, and our compulsive need to share information.
  • Six Pillars of Selfhood. Kahneman distinguishes the Remembering vs Experiencing self. There’s 

The following graphic attempts to bring together these subcomponents into a 10,000 foot view of the system. For more on this train of thought, I recommend Schaafsma et al (2015).

Clearly, representation mindreading does not rely on a single faculty, but rather deploys a broad coalition of social faculties. It is likely that distinct “mindreading tasks” employed in experiments typically recruit coalitions with subtly different profiles. 

Relationship to ICNs

The cognitive neuroscience community has converged on a set of neural mechanisms underlying goal and representation mindreading. These five regions are:

  1. Medial Prefrontal Cortex (MPFC)
  2. Posterior Cingulate Cortex (PCC)
  3. Temporo-Parietal Junction (TPJ)
  4. Superior Temporal Sulcus (STS).
  5. Temporal Pole (TP). 

We have begun localizing specific functions to these five regions-of-interest. The TPJ seems to be the key site for representation mindreading, whereas goal mindreading is produced by the other sites; with the temporal pole appearing to underlie desire attribution specifically.

Scientific consensus is hard to achieve without a deluge of data; this network is here to stay. But there are two reasons to hesitate before drawing further conclusions. First, “mindreading” is probably not a natural kind; neural mechanisms probably map to more granular functions that join together to produce both macrosystems.

Second, these five regions must be structurally understood in terms of intrinsic connectivity networks (ICNs), and this work has not yet been undertaken. In my writeup of ICNs, we described evidence for five “processing networks”:

  1. Default mode network (and its three subcomponents)
  2. Salience Network and the closely related Ventral Attention Network (VAN)
  3. Dorsal Attention Network (DAN)
  4. Fronto-Parietal Control Network (FPCN) implicated in volitional control and willpower
  5. Cingulo-Opercular Control Network (COCN), implicated in working memory rehearsal and fluid intelligence.  

The five regions of interest above are a subset of what social cognition theorists describe as the sociality network. In turn, the sociality network seems to comprise a subset of the default mode network. An increasing number of theorists are gesturing towards three subnetworks within the DM network, with mindreading modules mostly but not entirely residing within one of those subnetworks. Further, we have evidence that the default mode network is the basis of interoception and allostasis (that is, the brain’s unconscious representation of the body aka the hot loop). 

These hints are suggestive. But precious little of our knowledge is detailed enough to be formalized and modeled. Someday I will be able to say more about the relationship between sociality, mindreading, interoception, and the default mode network. But that is not yet possible in 2020… at least, as far as I know.

Until next time. 


Works Cited

  • Baron-Cohen et al (1985). Does the autistic child have a “theory of mind”?
  • Call & Tomasello (2008). Does the chimpanzee have a theory of mind? 30 years later
  • Tomasello (2014). A Natural History of Human Thinking
  • Gergely et al (1994). Taking the intentional stance at 12 months of age
  • Heider & Simmel (1944). An experimental study of apparent behavior
  • Schaafsma et al (2015). Deconstructing and reconstructing theory of mind

Intrinsic Connectivity Networks

Part Of: Neuroanatomy sequence
Content Summary: 2200 words, 22 min read

Four Cortical Networks

Cognitive neuroscience typically employs fMRI scans under a carefully crafted task structure. Such research localized various task functions to different neural structures (cortical areas). For example, these studies produced evidence suggesting that the hippocampus is the seat of autobiographical memory. 

In the early 2000s, researchers stumbled upon a different question: what brain regions are active when the brain is at rest? Here is Raichle (2015) describing his discovery of the default mode network:

One of the guiding principles of cognitive psychology at that time was that a control state must explicitly contain all the elements of the associated task other than the one element of interest (e.g., seeing a word versus reading the same word). Using a control state of rest would clearly seem to violate that principle. Despite our commitment to the strategies of cognitive psychology in our experiments, we routinely obtained resting-state scans in all our experiments, a habit largely carried over from experiments involving simple sensory stimuli, in which the control state was simply the absence of the stimulus. At some point in our work, and I do not recall the motivation, I began to look at the resting-state scans minus the task scans. What immediately caught my attention was the fact that regardless of the task under investigation, the activity decreases almost always included the posterior cingulate and the adjacent precuneus. 

Well before the discovery of the default mode network, Peterson and Posner (1990) had put forward three networks underlying attention. The dorsal attention network generates salience maps across the perceptual field, and uses these maps to orient to interesting stimuli. The ventral attention network is involved in switching attention to novel stimuli. The executive network produces top-down control of attention, for example translating the instruction “pay attention to the green triangle” into sustained attention on an otherwise-uninteresting object. 

Fox et al (2005) brought these two worlds together in their seminal paper, which identified a brain-wide task-positive network which anti-correlated with their task-negative network. Their use of resting-state functional connectivity MRI (rs-fcMRI) provided independent evidence of the existence of these networks.  
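To make “anti-correlated” concrete, here is a minimal sketch in Python. It is not Fox et al’s pipeline: the two time series are synthetic stand-ins for the averaged signal of a task-positive and a task-negative region, and `pearson` is just an illustrative helper. In its simplest form, functional connectivity is the correlation between such time series.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length time series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Synthetic (made-up) resting-state signals, one sample per time point.
timepoints = range(100)
task_positive = [math.sin(0.3 * t) for t in timepoints]
# The task-negative signal mostly mirrors the task-positive one,
# plus an unrelated oscillation standing in for other activity.
task_negative = [-0.8 * math.sin(0.3 * t) + 0.2 * math.cos(1.1 * t)
                 for t in timepoints]

r = pearson(task_positive, task_negative)
print(f"correlation between the two signals: {r:.2f}")
```

A strongly negative value of `r` is what anti-correlation means here: when one signal rises, the other tends to fall.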

Their task-negative network was the default mode network. And the task-positive network seemed to contain two networks previously identified: the executive network, and the dorsal top-down attention network. The ventral attention network, however, was not identified in their analysis.

And that was the state of the world in 2006. Neuroscientists had identified four networks, which we will henceforth call intrinsic connectivity networks (ICNs). They are:

  1. Executive Control
  2. Dorsal Attention
  3. Ventral Attention
  4. Default Mode Network

Towards Eight Networks

While the data supporting the legitimacy of these networks was strong, these anatomical structures pose a fairly routine challenge in neuroscience: they correlate with “too many functions”. Take the default mode network. It is associated with mind-wandering, social cognition, self-reference, semantic concepts, and autobiographical memory. How could one structure produce these widely divergent behaviors?

When you have too many functions, you have two options: look for more specific mechanisms (Q3), or group similar concepts (Q4). In many neuroscience applications, the former is more productive: reality has a surprising amount of detail.

Researchers began to find subnetworks within the executive control network. 

Dosenbach et al (2007) found two networks within the “executive network”. They found a fronto-parietal control network (FPCN), involved in error correction and control over task execution. They also found a cingulo-opercular control network (COCN), involved in task set maintenance. The FPCN was most active at task onset and after errors, whereas the COCN expressed activity consistently throughout the task.

These subgraphs pick out useful psychological concepts. We have long known that rehearsal increases working memory capacity from 3 to 7 chunks. It seems the COCN produces this miracle (but recall that the contents of working memory, the stuff it rehearses, lives in perceptual cortex, Postle 2006). Likewise, psychologists have long studied the phenomenon of willpower or volition. The FPCN might be the neural substrate of this ability. 

Seeley et al (2007) also found substructures within the original executive network. But they didn’t see a rehearsal system in the cingulo-opercular regions. Instead, they found a salience network, which binds affective and emotional information into perceptual objects, and links to the basal ganglia reward system. 

Since publication, each of these networks has been replicated dozens of times, using a widely diverging set of paradigms (ROI vs voxel granularity, fMRI vs rs-fcMRI) and statistical techniques (graph theory, dynamic causal modeling, hierarchical clustering, and independent component analysis). 

Unfortunately, these subnetworks looked and behaved radically differently. For years, neuroscientists collected data using these diverging theories. Peterson & Posner’s (2012) updated theory of attention relies on Dosenbach’s rehearsal network, whereas many other articles took inspiration from Seeley’s salience network. 

And then, a miracle. Power et al (2011), using graph theoretic tools and more granular data, identified both salience and rehearsal networks hidden within the cingulo-opercular graph. Despite the close proximity of these two networks, they perform dramatically diverging functions (left image).

They also discussed the spatial distribution of these networks across cortex. Essentially, the attention networks are sandwiched between sensorimotor networks and prefrontal control networks. This configuration might play an important role in reducing wiring cost for between-network communication. 

Default Mode Network and Interoception

Power et al (2011) also compared network properties of their ICNs and discovered two categories of ICN:

  • processing networks that are directly involved in perceptual-action loops. These networks tend to be very modular in their organization.
  • control networks that modulate cybernetic loops. These networks tend to have more extra-subgraph relationships.

The above illustrates an intriguing finding: the default mode network is a processing network, rather than a control network. But what sense modality does it underlie? 

The answer is straightforward to an affective neuroscientist. The default mode network and the salience network comprise the seat of the hot loop; together they perform:

  • interoception (viscerosensory body perception); and
  • allostasis (visceromotor body regulation)

It is a cornerstone of dual cybernetic loops. Indeed, comparative studies with macaque monkeys put empirical meat on this assertion:

  • { anterior cingulate cortex, dorsal amygdala, ventral anterior insula } perform visceromotor functions (allostasis)
  • { dorsal anterior insula } perform viscerosensory functions (interoception). 

As Kleckner et al (2017) show, these assertions are borne out by myriad human rs-fcMRI studies, and further bolstered by tract-tracing studies in non-human animals.

I’ll note in passing that most experts now detect three subgraphs within the default mode network (cf Andrews-Hanna et al (2014)). But the functional signature of these subgraphs has not yet been worked out, so I will not say more here.

Network Neuroscience

We have so far discussed results from function-derived structures, with techniques such as rs-fcMRI computing ICNs from the dynamics of neural activity. A complementary research tradition studies anatomy-derived structures, with an emphasis on connectome studies. These two network types have important differences, including time scales (anatomy-derived structures tend to persist longer than task-dependent structures) and levels of detail (neuron versus region-of-interest). Nevertheless, these data can be made to usefully constrain one another (functional networks are beginning to look more like structural networks, and vice versa). 

These approaches have recently coalesced (Bassett & Sporns (2017)) into the new discipline of network neuroscience. Very similar techniques are used in network science and in social network analysis. 

If a neuron is a node in a graph, and a synapse is an edge, what properties does the graph of a human brain enjoy? There are several kinds of networks possible. Regular networks enjoy rich local connections, but few cross-graph connections. Random networks enjoy more long-range connections, but are less structured. Small-world networks represent a kind of middle ground: they have lots of local structure, but also afford the ability to make long-range connections.
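The trade-off between local structure and long-range connections can be sketched numerically. The Python below is an illustration under stated assumptions, not a model of any real brain: it builds a regular ring lattice, then adds two hand-picked shortcut edges (a simplified, deterministic stand-in for the random rewiring used in standard small-world models), and compares the mean clustering coefficient (local structure) with the mean shortest path length (ease of cross-graph travel). All function names are hypothetical helpers.

```python
from collections import deque

def ring_lattice(n, k):
    """Regular network: each node links to its k nearest ring neighbours (k even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj

def add_shortcuts(adj, shortcuts):
    """Copy the graph and add long-range edges; no existing edge is removed."""
    new = {node: set(nbrs) for node, nbrs in adj.items()}
    for a, b in shortcuts:
        new[a].add(b)
        new[b].add(a)
    return new

def mean_clustering(adj):
    """Average fraction of a node's neighbour pairs that are themselves linked."""
    total = 0.0
    for nbrs in adj.values():
        nb = list(nbrs)
        d = len(nb)
        if d < 2:
            continue
        links = sum(1 for x in range(d) for y in range(x + 1, d)
                    if nb[y] in adj[nb[x]])
        total += 2 * links / (d * (d - 1))
    return total / len(adj)

def mean_path_length(adj):
    """Average shortest path length over reachable pairs (breadth-first search)."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

lattice = ring_lattice(40, 4)
small_world = add_shortcuts(lattice, [(0, 20), (10, 30)])
for name, graph in (("regular", lattice), ("with shortcuts", small_world)):
    print(name, round(mean_clustering(graph), 2), round(mean_path_length(graph), 2))
```

In this toy case, the two shortcuts leave clustering nearly intact while shortening the average path, which is the qualitative signature of a small-world network.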

With graph theoretic measures, we can quantitatively partition networks into sets of modules. A hub is a node with a high degree of centrality (e.g. node degree: how many edges that node supports). A connector hub facilitates between-module communication; a provincial hub promotes communication within modules. 
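As a hedged illustration of these definitions, the Python sketch below computes node degree and a participation coefficient on a made-up graph with two modules. The participation coefficient (one minus the sum of squared fractions of a node’s edges that land in each module) is one common way to separate connector hubs from provincial hubs; the graph, the node names, and the module labels are all hypothetical, and the module assignment is simply given rather than detected.

```python
# Toy undirected graph with two modules, "A" and "B" (all names are made up).
edges = [
    ("a0", "a1"), ("a0", "a2"), ("a0", "a3"),            # a0 links only inside A
    ("b0", "b1"), ("b0", "b2"), ("b0", "b3"),            # b0 links only inside B
    ("c", "a1"), ("c", "a2"), ("c", "b1"), ("c", "b2"),  # c bridges both modules
]
module = {"a0": "A", "a1": "A", "a2": "A", "a3": "A", "c": "A",
          "b0": "B", "b1": "B", "b2": "B", "b3": "B"}

neighbours = {node: set() for node in module}
for u, v in edges:
    neighbours[u].add(v)
    neighbours[v].add(u)

def degree(node):
    """Node degree: how many edges the node supports."""
    return len(neighbours[node])

def participation(node):
    """1 - sum over modules of (share of the node's edges in that module)^2."""
    k = degree(node)
    counts = {}
    for nbr in neighbours[node]:
        counts[module[nbr]] = counts.get(module[nbr], 0) + 1
    return 1.0 - sum((c / k) ** 2 for c in counts.values())

for node in ("a0", "b0", "c"):
    print(node, degree(node), round(participation(node), 2))
```

Here `a0` and `b0` behave like provincial hubs (participation 0, all edges within their own module), while `c` behaves like a connector hub (participation 0.5, edges split evenly across the two modules).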

Connectome studies (anatomy-derived structural networks) have shown that brain hub regions are more densely interconnected than predicted on the basis of their degree alone. This set of unusually central connector hubs is called the rich club. The rich club comprises the most metabolically expensive areas of cortex: they are “high cost, high value”. They are loosely analogous to DNS root servers (the thirteen root servers are the global basis of the internet).
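A minimal sketch of how “densely interconnected hubs” can be quantified, assuming the usual unnormalized rich-club coefficient: the density of edges among nodes whose degree exceeds some threshold k. The toy graph below is invented for illustration. Note that the claim in the text concerns density beyond what degree alone predicts, which requires comparing against degree-matched random graphs; that normalization step is omitted here.

```python
from itertools import combinations

# Toy undirected graph (hypothetical): three hubs wired to each other,
# each hub also serving three peripheral nodes.
hubs = ["h1", "h2", "h3"]
edges = set()
for u, v in combinations(hubs, 2):
    edges.add(frozenset((u, v)))
for hub in hubs:
    for i in range(3):
        edges.add(frozenset((hub, f"{hub}_leaf{i}")))

degree = {}
for edge in edges:
    for node in edge:
        degree[node] = degree.get(node, 0) + 1

def rich_club_coefficient(k):
    """Edge density among the nodes whose degree is greater than k."""
    rich = {node for node, d in degree.items() if d > k}
    if len(rich) < 2:
        return 0.0
    links = sum(1 for edge in edges if edge <= rich)  # both endpoints are rich
    return 2 * links / (len(rich) * (len(rich) - 1))

print(round(rich_club_coefficient(0), 2))  # density of the whole graph
print(round(rich_club_coefficient(1), 2))  # density among the hubs only
```

In this toy graph the hubs are fully interconnected (coefficient 1.0) even though the graph as a whole is sparse, which is the pattern the term “rich club” describes.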

Human neural architecture is thus a specific kind of small-world network, one equipped with a “rich club”. These topologies have been shown to exist in other species, such as macaque monkeys and cats. Interestingly, some hubs (posterior cingulate, precuneus, and medial frontal cortex) act as sinks (more afferent than efferent connections) whereas hubs within attentional networks (incl. dorsal prefrontal, posterior parietal, visual, and insular cortex) act as sources (more efferent than afferent connections). 

What does this have to do with ICNs? As shown by von den Heuval & Sporns (2013b), the rich club seems to be the substrate of inter-ICN communication. 

Networks vs Consciousness

According to global workspace theory, conscious contents are generated via a publicity organ which selects perceptual information worthy of further processing by downstream modules. There is, however, much disagreement about the mechanism of conscious contents. Theories include:

  1. Dehaene and Changeux have focused on frontal cortex 
  2. Edelman and Tononi on complexity in re-entrant thalamocortical dynamics 
  3. Singer and colleagues on gamma synchrony
  4. Flohr on NMDA synapses
  5. Llinas on a thalamic hub
  6. Newman and Baars on thalamocortical distribution from sensory cortex

Shanahan (2012) offered a new hypothesis: that the rich club is the basis of consciousness. Its central location and role in synchronizing large-scale brain networks make it a plausible suspect. However, it is unclear whether the rich club is primarily facilitated by corticocortical white matter, or corticothalamic reentrant loops. If the latter, the hypothesis would converge with existing theories that emphasize the role of the thalamus.

There is some evidence that the thalamus facilitates ICNs. Habas et al (2009) found strong links between cerebellar substructures and various ICNs. This finding is suggestive because cerebellar error signals are passed to cortex through the thalamus. 

Networks vs Modules

ICNs comprise a central organizing principle of the nervous system. But they are not the only such principle; we have identified some fifteen others!

It is difficult to reconcile intrinsic connectivity networks (ICNs) with massive modularity, so that will be the topic of this section.

ICNs have been seized upon by some theorists in the Bayesian predictive coding traditions (e.g. Barrett & Simmons (2015)) as evidence of the illegitimacy of modules. But most ICN theorists still admit the centrality of modules (e.g., Sporns & Betzel (2015)). Here, for example, is von den Heuval & Sporns (2013a):

Since the beginning of modern neuroscience, the brain has generally been viewed as an anatomically differentiated organ whose many parts and regions are associated with the expression of specific mental faculties, behavioral traits, or cognitive operations. The idea that individual brain regions are functionally specialized and make specific contributions to mind is supported by a wealth of evidence from both anatomical and physiological studies. These studies have documented highly specific cellular and circuit properties, finely tuned neural responses, and highly differentiated regional activation profiles across the human brain. Functional specialization has become one of the enduring theoretical foundations of cognitive neuroscience. 

Most researchers now admit the interaction of both principles (specialization and integration). It is unclear how it could be otherwise. I have personally read far too many papers that have described activity in the dorsolateral prefrontal cortex as task-specific, without considering it is a simple expression of the volitional control or working memory rehearsal networks. Similarly, I have read dozens of reviews of the anterior insula that would have profited from the realization that it participates in at least three different ICNs. 

The three streams hypothesis integrates notions of massive modularity, cortical streams, the abstraction hierarchy, and the cybernetic loop hypothesis. It is less clear how ICNs might integrate with these organizing principles. 

Does the ventral temporal parietal junction (vTPJ) only perform integrative functions in service of the ventral attention network (VAN)? Or is the real estate claimed by these ICNs also used to perform specialized computations such as mindreading? The latter proposition strikes me as more likely. But I’d like to see more data on this. To be continued…

Wrapping Up

The human cortex has intrinsic connectivity networks (ICNs) that coordinate to provide integrative services on behalf of our central nervous system. Researchers have so far identified the following networks:

  • Default mode network (and its three subcomponents)
  • Salience Network and the closely related Ventral Attention Network (VAN)
  • Dorsal Attention Network (DAN)
  • Fronto-Parietal Control Network (FPCN) implicated in volitional control and willpower
  • Cingulo-Opercular Control Network (COCN), implicated in working memory rehearsal and fluid intelligence.  

Until next time.

Works Cited

I’ve put the papers I found especially helpful in bold.

  1. Andrews-Hanna et al (2014). The default network and self-generated thought: component processes, dynamic control, and clinical relevance. 
  2. Bassett & Sporns (2017). Network Neuroscience
  3. Barrett & Simmons (2015). Interoceptive predictions in the brain. 
  4. Christoff et al (2016). Mind-wandering as spontaneous thought: a dynamic framework
  5. Dosenbach et al (2007). Distinct brain networks for adaptive and stable task control in humans
  6. Fox et al (2005). The human brain is intrinsically organized into dynamic, anticorrelated functional networks
  7. Kleckner et al (2017). Evidence for a large-scale brain system supporting allostasis and interoception in humans. 
  8. Laird et al (2011). Behavioral Interpretations of Intrinsic Connectivity Networks
  9. Habas et al (2009). Distinct Cerebellar Contributions to Intrinsic Connectivity Networks
  10. Peterson & Posner (1990). The attention system of the human brain
  11. Peterson & Posner (2012). The Attention System of the Human Brain: 20 Years After
  12. Postle (2006). Working memory as an emergent property of the mind and brain.
  13. Power et al (2011). Functional Network Organization of the Human Brain
  14. Raichle (2015). The Brain’s Default Mode Network
  15. Seeley et al (2007). Dissociable Intrinsic Connectivity Networks for Salience Processing and Executive Control
  16. Shanahan (2012). The Brain’s Connective Core and its Role in Animal Cognition
  17. Sporns & Betzel (2015). Modular Brain Networks
  18. Von den Heuval & Sporns (2013a). Network hubs in the human brain 
  19. Von den Heuval & Sporns (2013b) An Anatomical Substrate for Integration among Functional Networks in Human Cortex

Consciousness as a Learning Device

Part Of: Consciousness sequence
Content Summary: 1600 words, 16 min read
Inspiration: Baars (1998) A Cognitive Theory of Consciousness.

Automatization in Tasks

Almost everything we do, we do better unconsciously than consciously. In first learning a new skill we fumble, feel uncertain, and are conscious of many details of action. Once the task is learned, we lose consciousness of the details, forget the painful encounter with uncertainty, and sincerely wonder why beginners seem so slow and awkward. 

In dual task paradigms, subjects are asked to perform two tasks simultaneously. Performance is often poor, because of the limited capacity of consciousness. But when a subject extensively practices one of these tasks, the task will stop interfering with others, and performance improves.

Consider reading, the act of translating visual letters into conceptual meaning. Reading proceeds automatically. If you see the word “pink”, it is nearly impossible to avoid subvocalizing and imagining the color (inner speech and semantic recall). You are not aware of identifying individual letters, or searching your memory for the requisite sounds and meanings – they just occur.

Driving a car is yet another example of a skill that becomes automatic:

When we first learn to drive a car, we are very conscious of the steering wheel, the transmission lever, the foot pedals, and so on. But once having learned to drive, we minimize consciousness of these things and become mainly concerned with the road, with turns in the road, traffic to cope with, and pedestrians to evade. The mechanics of driving become part of the unconscious frames within which we experience the road. 

But even the road can be learned to the point of minimal conscious involvement if it is predictable enough: then we devote most of our consciousness to thinking of different destinations, of long-term goals, and so forth. The road has itself now become “framed”. The whole process is much like Alice moving through the Looking Glass, entering a new reality, and forgetting for the time being that it is not the only reality. Things that were previously conscious become presupposed in the new reality. In fact, tools and subgoals in general become framed as they become predictable and automatic.

Why, when the act of driving becomes automatic, do we become conscious of the road? Presumably the road is much more informative within our purposes than driving has become. Dodging another car, turning a blind corner, braking for a pedestrian – these are much less predictable than the handling of the steering wheel.

The process of automatizing a skill is called habituation. Habituation involves an increase in performance and a decrease in demand for cognitive resources. But it also involves:

  • loss of self-monitoring: an unpracticed beginner is aware of their own performance, but an expert practitioner can be deceived into believing their performance was much worse than it actually was.
  • loss of long-term working memory: consider, in typing, which finger is used to type the letter c? Most people have to consult their fingers to find the answer.

Suppose someone is given a shape from among the following set, and asked to memorize it. They then receive pairs of other images, and select which one is more similar to the memorized shape.

Pani (1982) found that, as subjects practiced the task, the original image faded from consciousness even as the responses became faster and more accurate. 

Automatization in Perception

The Pani experiment suggests that it is not merely actions that move to autopilot. Perception can fade from consciousness as well.

Consider the pressure of the chair you are sitting in. Before I mentioned it, that tactile sensation had likely faded into the background. In contrast, the visual experience of reading these words was very much at the center of your conscious experience. 

What is the difference between the tactile quality of the chair and the visual experience of these words? Redundancy! The chair feels very similar from one moment to the next, whereas each new word produces a subtly different experience.

These redundancy effects are pervasive. Consider the experience of moving to an area with a distinctive smell. For the first few days, the smell is at the forefront of your conscious experience; but over time, this redundant sensation fades to the background.

We have seen redundant touch and smell fade from consciousness. Why don’t we become blind to redundant visual information?

Unlike the receptors for touch and smell, our foveae constantly move across the visual field in involuntary movements called saccades. This might be one way that the visual system combats redundancy.

If you mount a tiny projector on a contact lens firmly attached to the eye, you can ensure that the visual image is invariant to eye movements. Pritchard et al (1960) found that in such conditions, the visual image fades in a few seconds. Similarly, when people look at a bright but featureless field (the Ganzfeld), they experience “blank outs” – periods when visual perception seems to fade altogether (Natsoulas, 1982). When vision is not protected by saccades, it behaves just like the other senses.

Becoming blind to redundant information is not limited to perception. Semantic satiation occurs when a person repeats the same word over and over again, until the word starts to feel foreign and arbitrary. Try this for yourself: say “gum” to yourself 50 times and see what happens.

There is a school of thought that interprets these redundancy effects as anatomical fatigue (perhaps processing the same image dozens of times exhausts neurotransmitters in the relevant microcircuits). But these interpretations are undermined by our ability to be surprised by the absence of a stimulus, which implies that the redundancy is encoded in terms of information rather than energy.

It is also worth noting that redundant perceptions do not fade into the background if they are highly relevant to the organism’s health and goals. Chronic pain and hunger fall under this rubric. These are, however, exceptions to the rule. 

Errors and Curiosity

When we experience difficulty performing automatized tasks, conscious access returns.

  • In reading, lexical access becomes automatic. But simply turning a book upside down will interfere with our reading proficiency, and the perceptual details of “stitching letters to form words” come back to us.
  • In visual matching, our ability to describe the original target image disappears as we become proficient. But by simply increasing task complexity, our ability to describe the target image returns.
  • In driving, if we move to a new city, our routing autopilot procedures evaporate, and we are more conscious of navigational decisions. If we buy a new car with different operating characteristics (a more sensitive brake pedal, and less sensitive steering control), the mechanical details of driving flood back into our consciousness. 

It seems that consciousness is used to debug automatic processes that run into difficulties.

We often tire of practicing tasks that we have mastered. We often tire of receiving sense data we can fully anticipate. In the case where our brain has fully habituated to some phenomenon (and indeed, often before that point is reached), curiosity moves our attention towards other domains. This impulse towards novelty is one way our brain builds a diverse coalition of mental modules capable of responding to an intrinsically complicated world.

Towards A Theory of Conscious Learning

From the global workspace perspective, we expect consciousness to be involved in learning novel events. Such learning requires unpredictable communication patterns between modules; a feat only possible by way of widespread broadcasting. 

Consider the radical simplicity of the act of learning itself. To learn anything new, we merely pay attention to it. By merely allowing ourselves to interact consciously with a new language, even without a learning plan or knowledge of its syntactic structure, we nevertheless “magically” acquire the ability to comprehend and speak.

Today we explored the relationship between learning and the habituation of awareness. Baars says it best:

Habituation is not an accidental by-product of learning. Rather, it is something essential, connected at the very core to the acquisition of new information. And since learning and adaptation are perhaps the most basic functions of the nervous system, the connection between consciousness, habituation, and learning is fundamental indeed.

Factoring in our observations about error and curiosity, it seems as though learning can be modeled as a push-pull system. Learning promotes habituation, error promotes deautomatization, and curiosity redirects the brain to different activities if the current one has been mastered.

The learning-surprise versus curiosity systems bear a striking resemblance to the reinforcement learning dichotomy of exploitation versus exploration.
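To make the analogy concrete, here is a minimal sketch of an epsilon-greedy agent, a standard textbook strategy from reinforcement learning (it is not taken from Baars). With probability epsilon the agent explores a random option, which loosely plays the role of curiosity. Otherwise it exploits the option it currently rates best, which loosely plays the role of a practiced routine. The function name, payoff values, and parameters are all illustrative assumptions.

```python
import random

def epsilon_greedy(true_means, epsilon=0.1, steps=2000, seed=0):
    """Run an epsilon-greedy agent on a toy bandit with Gaussian rewards.

    Returns (estimates, counts). All numbers are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_options = len(true_means)
    estimates = [0.0] * n_options  # running average reward per option
    counts = [0] * n_options       # how often each option was tried

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try an option at random ("curiosity").
            choice = rng.randrange(n_options)
        else:
            # Exploit: pick the option with the best current estimate.
            choice = max(range(n_options), key=lambda i: estimates[i])
        reward = rng.gauss(true_means[choice], 1.0)
        counts[choice] += 1
        # Incremental update of the running average for the chosen option.
        estimates[choice] += (reward - estimates[choice]) / counts[choice]
    return estimates, counts

estimates, counts = epsilon_greedy([0.2, 0.5, 0.9])
print(counts)
```

With epsilon set to 0 the agent never explores and can stay locked into a mediocre routine; with epsilon set to 1 it never settles on anything. The interesting behavior lives in between.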

Towards The Future

I noted in Function of the Basal Ganglia that habituation has been associated with control shifting from the associative to the sensorimotor loop in the basal ganglia. This is hard to reconcile with the neurological basis of consciousness in the corticothalamic system. A more systematic account of these biological interactions is required. 

Consciousness has been linked to many other functions besides learning and habituation. It is most natural to interpret polyfunctional biological systems like this as having accreted functions across evolutionary time. Untangling the phylogenetic ordering of these subfunctions (peeling the onion) is an important task that will require input from comparative anatomy.

The consciousness organ is not the only system to exhibit redundancy effects. Habituation to repeated input is a universal property of neural tissue. Even a single neuron will respond to electrical stimulation at a given frequency only for a while; after that, it will cease responding to the original frequency, but continue to respond to other frequencies (Kaidel et al, 1960). The relationship between the specific corticothalamic system and these microproperties of neurons is also an open research area.

Until next time. 


  • Baars (1988). A Cognitive Theory of Consciousness, especially sections 1.2.4, 1.3.3, 1.4.1, 1.4.4, and 3.
  • Pani (1982). A functionalist approach to mental imagery.
  • Pritchard et al (1960). Visual perception approached by the method of stabilized images.
  • Kaidel et al (1960). Sensory Communication (pp 319-338).
  • Natsoulas (1982). Dimensions of perceptual awareness.

[Excerpt] Language vs Communication

Part Of: Language sequence.
Excerpt From: Tecumseh Fitch, The Evolution of Language
Content Summary: 800 words, 4 min read

What kind of sound does a dog make? That depends on which language you speak. Dogs are said to go ouah ouah in French, but ruff or woof in English. 

Crucially, however, the sounds that the dogs themselves make do not vary in this way. Dogs growl, whine, bark, howl and pant in the same way all over the world. This is because such sounds are part of the innate behavioral repertoire that every dog is born with. This basic vocal repertoire will be present even in a deaf and blind dog. This is not, of course, to say that dog sounds do not vary: they do. You may be able to recognize the bark of your own dog, as an individual, and different dog breeds produce recognizably different vocalizations. But such differences are not learned; they are the inevitable byproducts of the fact that individuals vary, and differences at the morphological, neural or “personality” level will have an influence on the sounds an individual makes. Dogs do not learn how to bark or growl, cats do not learn how to meow, and cows do not learn their individual “moos”. Such calls constitute an innate call system. By “innate” in this context, I simply mean “reliably developing without acoustic input from others” or canalized. For example, in experiments where young squirrel monkeys were raised by muted mothers, and never heard conspecific vocalizations, they nevertheless produced the full range of calls.

The same regularity applies to important aspects of human communication. A smile is a smile all over the world, and a frown or grimace of disgust indicates displeasure everywhere. Not only are many facial expressions equivalent in all humans, but their interpretation is as well. Many vocal expressions are equally universal. Such vocalizations as laughter, sobbing, screaming, and groans of pain or pleasure are just as innately determined as the facial expressions that normally accompany them. Babies born both deaf and blind, unable to perceive either facial or vocal signals in their environment, nonetheless smile, laugh, frown, and cry normally. Again, just as for dog barking, individuals vary, and you may well recognize the laugh of a particular friend echoing above the noise in a crowded room. And we have some volitional control over our laughter: we can (usually) inhibit socially inappropriate laughter. These vocalizations form an innate human call system. Just like other animals, we have a species-specific, innate set of vocalizations, biologically associated with particular emotional and referential states. In contrast, we must learn the words or signs of language.

This difference between human innate calls, like laughter and crying, and learned vocalizations, like speech and song, is fundamental (even down to the level of neural circuitry). An anencephalic human baby (entirely lacking a forebrain) still produces normal crying behavior but will never learn to speak or sing. In aphasia, speech is often lost while laughter and crying remain normal. Innate human calls provide an intuitive framework for understanding a core distinction between language and most animal signals, which are more like the laughs and cries of our own species than like speech. Laughs and cries are unlearned signals with meanings tied to important biological functions. To accept this fact is not to deny their communicative power. Innate calls can be very expressive and rich – indeed their affective power may be directly correlated with their unlearned nature. The “meaning” of a laugh can range from good-natured conviviality to scornful, derisive exclusion, just as a cat’s meow might “mean” she wants to go out, she wants food, or she wants to be petted. Insightful observers of animals and man have recognized these fundamental facts for many years.

Obviously, signals of emotion and signals of linguistic meaning are not always neatly separable. In vocal prosodic cues, facial expressions, and gestures, our linguistic utterances are typically accompanied by “non-verbal” cues to how we feel about what we are saying. One signal typically carries both semantic information intelligible only to those who know the language, and a more basic set of information that can be understood by any human being or even other animals. Non-verbal expressive cues are invaluable to the child learning language, helping to coordinate joint attention and disambiguate the message and context. They also make spoken utterances more expressive than a written transcription alone. Other than the exclamation mark or emoticons, our tools to transcribe the expressive component are limited, but the ease and eagerness with which humans read illustrates that we can nonetheless understand language without this expressive component. This too, reinforces the value of a distinction between two parallel, complementary systems. 

As we discuss other animals’ communication systems, I invite the reader to compare these systems not only to language exchanges, but also to the last time you had a good laugh with a group of friends, and the warm feeling that goes along with it, or the sympathetic emotions summoned by seeing someone else cry, scream, or groan in pain. The question we must ask is: “Is this call type more like human laughter and crying, or more like speech or song?” I will shortly argue that all non-human communication systems fall in the former category.