# The Evolution of Disgust

Introduction

Why did disgust evolve? Why does it play a role in morality? Should it?

One of the best ways to understand an emotion is to build a behavioral profile: a list of its responses (outputs) and elicitors (inputs).

Disgust Responses

One of the striking features of disgust is how diverse its set of responses is. These include an affect program:

• Gape face. This is characterized by a nose wrinkle, extension of the tongue, and a wrinkled upper brow.
• Feeling of nausea. In fact, the physiological signature of intense disgust closely matches physical nausea.
• A withdrawal reflex. This reflex need not be physical retreat, but can also yield motivation to remove the offending object.

But disgust also produces an inferential signature:

• Sense of oral incorporation. That is, the subjective feeling that the offending object is already in one’s mouth.
• Offensiveness tagging. Even after the object has been removed, it will continue to be treated as offensive indefinitely.
• Asymmetric transmission logic. See the law of contagion: a clean object that touches something gross is contaminated, but not vice versa.

Disgust Elicitors

Even more diverse than its outputs, the elicitors of disgust include several cultural universals:

• Organic decay.
• People and objects associated with illness.
• Compromised body envelope. These include: cuts, gashes, lesions, or open sores.
• Substances that have left the body. These include feces, vomit, spit.

Swallowing the saliva that is currently in your mouth is innocuous, but even imagining yourself drinking a glass of spit (even if it is, or was, your own) is disgusting. These last two elicitors amount to body perimeter tracking: they not only police the boundaries of the body in peripersonal space, but also seem to enforce a no re-entry policy: anything that exits or becomes detached triggers disgust.

There exists another suite of elicitors that are culturally tuned:

• Specific foods.  Some foods are deemed disgusting even when they have never been tried (e.g., liver).
• Specific living animals. These can include: flies, maggots, worms, rats, lice, ticks, slugs, snails, and spiders…
• Specific sexual practices. These can include: homosexuality, pedophilia, bestiality, necrophilia, …
• Specific morphological signatures. Deviations from bodily normality, however that is construed in a particular culture. These can include: the elderly, disabled, little people, …

It is worth emphasizing that disgust over sexual practices and morphological signatures varies widely across cultures and across individuals. For example, ancient Greeks mostly didn’t find homosexuality disgusting, but 20th-century Americans mostly did.

Finally, people comprise another category of elicitors.

• Moral transgressors. These can include: murderers, rapists, …
• Members of an out-group. These can include: untouchable caste, Jews (in Nazi Germany), …

Neuroscientific data suggest that, when people are deemed sufficiently disgusting, brain areas associated with mindreading become deactivated. This is likely the neural basis of dehumanization.

The Entanglement Thesis

Taken together, here is the behavioral profile of disgust:

Puzzle: Why should the sight of a person with leprosy evoke a gape face and a feeling of nausea? Leprosy has nothing to do with digestion.

Solution: Disgust is a kludge! It is the unholy merger of two separate systems.

Poison monitoring is a faculty of the digestive system. It evolved to regulate food intake and protect the gut against ingested substances that are poisonous or otherwise harmful. It was designed to expel substances entering the gastrointestinal system via the mouth. It also acquires new elicitors very quickly.

Infection avoidance is a faculty of the immune system. It evolved to protect against infection from pathogens and parasites, by avoiding them. It is not specific to ingestion, but serves to guard against coming into close physical proximity with infectious agents. This involves avoiding not only visible pathogens and parasites, but also places, substances and other organisms that might be harboring them.

Any theory of disgust should explain the unity of responses to disgust. Here is how entanglement theory does it:

• Poison monitoring produces the affect program. Gape face, nausea and withdrawal all serve digestive (and not immunological) purposes.
• Infection avoidance produces (most of) the inferential signature. The tendency to monitor disgusting things even when not immediately exposed, and the asymmetric logic of contamination, make perfect sense when tracking the spread of parasites.

Any theory of disgust should explain the diversity of elicitors of disgust. Here is how entanglement theory does it:

• Poison monitoring is sensitive to certain foods (namely, those that are associated with toxicity).
• Infection avoidance explains the aversion to certain living animals (flies are more likely to carry disease than dogs), to apparently disease-infected substances, to certain sexual practices (sexual practices can bring increased risk of disease) and to morphological deviations (e.g., violations of facial symmetry correlate with parasites). It also explains the general tendency for disgust to monitor the body perimeter, which is, after all, how pathogens can enter the body!

Any theory of disgust should explain cultural variation of the elicitors. Here is how entanglement theory does it:

• The poison monitoring system is very quick to learn: it features the Garcia effect (one-shot learning), so new food aversions are acquired easily and can differ from culture to culture.
• In women, aversion to deviant sexual practices (and not other forms of disgust) varies with where they are in the ovulation cycle.

Besides the increase in explanatory power, phylogenetic and ontogenic data also support the independence of these two systems:

• Researchers disagree whether disgust is unique to humans, or whether homologies exist in the animal kingdom. Both are right: animals show clear signs of the existence of both systems but the systems are expressed separately.
• Ever wonder why children don’t seem to mind disgusting objects & behaviors? It is because poison monitoring appears very early (within the first year of life) but infection avoidance emerges significantly later.

The Evolution of Disgust

Why should poison monitoring and pathogen avoidance have become entangled in the course of human evolution? Why didn’t poison monitoring become entangled with, e.g., FEAR instead?

First, the two systems both care about digestion. Food intake can bring both poison and pathogens into the body, and as such it is monitored by both systems.

Why did entanglement only happen in humans, specifically? Compared to other primates, early hominids adopted a unique lifestyle that combined scavenging with a nascent ultrasociality. These two characteristics put enormous adaptive pressure on the pathogen avoidance system to innovate.

Perhaps the most important reason for entanglement has to do with signaling. As hominids began to increasingly emphasize social cooperation, a need arose to communicate pathogenic information. Before the emergence of language, the pathogen avoidance module had an inferential signature, but how could this contamination tagging information be communicated to others? The functionally overlapping toxin monitoring system had a clearly visible output: the gape face. Plausibly, the two modules merged such that the pathogen avoidance system could co-opt the gape face to communicate. We can call this the gape face as signal theory.

My Take on the Theory

The theory I have presented here was developed in Daniel Kelly’s book Yuck! The Nature and Moral Significance of Disgust. The theory strongly complements Mark Schaller’s work on the behavioral immune system. The overlap between these two researchers will become clear next time, when we turn to the social co-optation of the disgust system.

I personally find the entanglement thesis (the merger of toxin monitoring and pathogen avoidance systems) compelling, given its tremendous explanatory power outlined above.

Despite accepting the overall architecture, Kelly’s theory for why the architecture evolved (gape face as signal) strikes me as incomplete.

I also feel like this theory will remain incomplete until we discover how toxin monitoring and parasite avoidance are implemented in dissociable neurobiological structures (i.e., modules).

After the psychological mechanisms are mapped to their physical roots, we could attempt to integrate our knowledge of disgust with other systems:

• What is the relationship of disgust to the generalized stress response? The stress and immune systems co-evolved to share the HPA axis, after all.
• How is disgust implemented in the microbiome-gut-brain axis, which also has links to both the digestive system (enteric nervous system) and the immune system (e.g., leaky gut)?
• How does the MGB axis differentially produce both disgust and other social phenomena like anxiety?

Open questions are exciting! To me, they suggest a clear research program where we can start integrating our newfound theory of disgust into the broader picture of visceral processes (the hot loop).

Takeaways

The human brain comes equipped with two systems:

1. Poison monitoring is a faculty of the digestive system. It evolved to regulate food intake and protect the gut against harmful substances.
2. Infection avoidance is a faculty of the immune system. It evolved to protect against infection from pathogens and parasites, by avoiding them.

In humans, these two systems were entangled in the emotion of disgust. This explains the otherwise baffling diversity of disgust elicitors & behaviors.

# Confabulation: saying more than we can know

Anosognosia

It is unfortunate to experience illness. It is strange to fail to recognize illness within oneself. Anosognosia is the name for this inability. A few examples:

Example 1. In a letter to his friend Lucilius, Seneca (40 CE) described a woman who obstinately denied her blindness: “…You know that Harpestes, my wife’s fatuous companion, has remained in my home as an inherited burden… This foolish woman has suddenly lost her sight. Incredible as it might appear, what I am going to tell you is true: She does not know she is blind. Therefore, again and again she asks her guardian to take her elsewhere because she claims that my home is dark… It is difficult to recover from a disease if you do not know to be ill…”

Example 2. After a right-hemisphere stroke, a patient lost movement in her left arm but continuously denied it. When the doctor asked her to move her arm, and she observed it not moving, she claimed that it wasn’t actually her arm, it was her daughter’s. Why was her daughter’s arm attached to her shoulder? The patient claimed her daughter had been there in the bed with her all week. Why was her wedding ring on her daughter’s hand? The patient said her daughter had borrowed it. Where was the patient’s arm? The patient “turned her head and searched in a bemused way over her left shoulder”.

Spend enough time with these patients, and it becomes clear that their problem is not cognitive dissonance. No, the delusion has a much deeper, subterranean hold on their mental lives. These patients freely generate explanations for their illness-related behavior (“I can’t walk around because the house is dark”, “The unmoving arm isn’t mine, it is my daughter’s”). These explanations are not examples of dishonesty. They are genuine perceptions of a misfiring mind. The word for these honest lies is confabulation.

If you’re anything like me, you’ll find such epistemic fences a bit unsettling. Is it possible our entire species is entertaining a similar delusion that increases biological fitness? Do we actually have four fingers but are collectively convinced that little fingers exist?

Split Brain Patients

The vertebrate brain has two hemispheres. Some neural functions are bilateral: visual processing occurs in both the right and left hemispheres (each handling the opposite half of the visual field). Other functions are unilateral: language processing is usually left-lateralized (with the exceptions tending to be left-handed). The advantages and disadvantages of lateralized brain function are an active research area.

In neurotypical animals, there exist transverse fibers (commissures) which integrate information between the hemispheres. The corpus callosum is the overwhelmingly dominant bridge between hemispheres:

• Corpus Callosum: 250 million fibers
• Anterior commissure: 0.5 million fibers
• Posterior commissure: 0.5 million fibers
• Habenular commissure: 0.1 million fibers

Split brain patients are those who have had their corpus callosum severed. These patients tend to exhibit selfhood fracturing: each hemisphere constitutes a largely autonomous entity with its own beliefs and desires.

Present the left hemisphere with a picture of a chicken claw, and the right with a picture of a wintry scene. Now show the patient an array of cards with pictures of objects on them, and ask them to point (with each hand) to something related to what they saw. The hand controlled by the left hemisphere points to a chicken, the hand controlled by the right hemisphere points to a snow shovel. So far so good.

But what happens when you ask the patient to explain why they pointed to those objects in particular? The left hemisphere is in control of the verbal apparatus. It knows that it saw a chicken claw, and it knows that it pointed at the picture of the chicken, and that the hand controlled by the other hemisphere pointed at the picture of a shovel. Asked to explain this, it comes up with the explanation that the shovel is for cleaning up after the chicken. While the right hemisphere knows about the snowy scene, it doesn’t control the verbal apparatus and can’t communicate directly with the left hemisphere, so this doesn’t affect the reply. The patient instead confabulates.

What did “the patient” think was going on? This is a wrong question. Once you know what the left hemisphere believes, what the right hemisphere believes, and how this influences organism behavior, then you know all that there is to know.

Gazzaniga has described this propensity of patients to confabulate reasons for the behavior of the right brain as the left-brain apologist. The left hemisphere functions as an interpreter, a lawyer, a press secretary: it justifies behavior to make the organism look good. V.S. Ramachandran, drawing on observations that right-brain lesions disproportionately produce delusions, claims the existence of a right-brain revolutionary. It is the failure of some module in the right hemisphere that causes anosognosia: the left-brain apologist goes unchecked, and confabulation is exacerbated by delusion.

Confabulation in Neurotypicals

We have so far explored confabulation in patients with brain damage. Do neurotypical, everyday people produce “honest lies”?

We confabulate all the time. We just don’t realize that we are.

In Telling More Than We Can Know: Verbal Reports on Mental Processes, Nisbett & Wilson (1977) review hundreds of studies, across dozens of disciplines. Their evidence reveals a theme: people’s attempts to explain their behavior are almost always unhelpful in identifying the important factors influencing their decisions. Let me briefly review four example findings.

Study 1: Insufficient Justification.

Zimbardo et al. (1969) ask participants to accept a series of painful shocks while performing a learning task. Participants were split into two groups:

• Inadequate Justification (“I’m curious to see what happens”)

Who suffers less?

→ The Inadequate Justification group. This group learns much more quickly, and exhibits a lower galvanic skin response (lower “fight or flight”).

Why do they suffer less?

→ These people were given a poor justification for continuing, and yet they continued anyway. To explain their own behavior, they generate intrinsic motivation for continuing. (As an aside, this phenomenon is similar to the overjustification effect).

Do they know that they suffer less?

→ No! Subjective reports of pain were the same across groups.

Study 2: Attribution Effect

Storms & Nisbett (1970) ask insomnia-suffering participants to sleep under observation. Participants were split into two groups:

Who falls asleep more quickly?

→ Arousal Attribution group (28% faster).

Why do they fall asleep more quickly?

→ Attribution of restlessness to placebo, rather than cognitive factors.

Do they know why they fall asleep more quickly?

→ No! More than 80% of patients would not attribute their sleep improvement to the pill, even after the experiment was explained to them.

Study 3: Counterattitudinal Advocacy

Bem & McConnell (1970) ask participants for their view on a political topic, then ask them to write an essay against their own view. Participants were split into two groups:

• Coercion: bribed to write the essay
• Freedom: led to believe they had a choice

Who changes their position after writing the essay?

→ Freedom group.

Why do they change?

→ Difficult to explain writing that essay, unless they wanted to.

Do they know that they changed their position?

→ No! In contrast to the Coercion group which had accurate memories, those whose opinions had changed failed to remember their previous position.

Study 4: Choice Blindness

Johansson et al. (2005) ask participants to evaluate which of two female faces was more attractive. Researchers then hand subjects the face they had chosen, asking them to explain the motives behind their choice. Participants were split into two groups:

• Switch: used a sleight-of-hand trick to switch the photos, showing viewers the face they had not chosen.
• Control: show the face they had chosen

Does the Switch group notice the change?

→ Most don’t. ⅔ of participants believe they had chosen the other face.

How did those who didn’t notice explain their (non-)choice?

→ Without missing a beat. They happily explained why they preferred the face they had actually rejected, inventing reasons like “I like her smile” even though they had actually chosen the solemn-faced picture.

Putting It All Together

Confabulation is “honest lying”: communicating an untruth, while earnestly believing in its veracity.

• Anosognosia patients cannot admit that they are paralyzed. When asked to explain their inability to move, they confabulate answers.
• Split brain patients similarly confabulate explanations for the behavior of the non-linguistic right hemisphere.
• Confabulation is not merely a medical curiosity. Confabulation is everywhere: most self-reports are utterly useless. Some evidence includes:
1. Insufficient Justification: people didn’t notice when they were suffering less
2. Attribution Effect: people failed to understand the reason why they slept better
3. Counterattitudinal Advocacy: after people change their minds, they fail to remember they ever thought differently
4. Choice Blindness: once tricked into thinking they chose something different, people are happy to explain their reasons.

Why do human beings confabulate so often? How can we be such utter strangers to ourselves?  We shall explore these questions next time. Until then!

# The Construction of Body Status

Connection To Philosophy of Well-being

What is well-being?

Philosophers have put forward three theories.

• Hedonic Theory. Well-being is experiencing pleasure.
• Desire Fulfillment Theory. Well-being is achieving your goals.
• Objective List Theory. Well-being is living an objectively good life.

In this post, we ask “does the brain have any incentive to compute biological measures of well-being? If so, what would this data structure be used for?”

Well-being is Body Status

Everyone agrees that the following are true about well-being:

1. Well-being is sensitive to variables of body status. Instantaneous well-being is lower if an animal is in pain, other things being equal.
2. Well-being responds to many divergent factors (e.g., both pain and hunger reduce instantaneous well-being).

But there is only one biological apparatus that satisfies these properties:

Proposition 1. Well-being is body status, constructed by regulatory processes.

In 1925, Walter Cannon formulated homeostasis, which posits that the body strives to maintain internal variables essential for life. For example, the body measures its own temperature. If it is too hot or cold, a negative feedback process will initiate actions to bring the variable back to its optimal value.
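To make the negative feedback idea concrete, here is a minimal sketch of a single regulated variable. The set point, the gain, and the function name `regulate` are illustrative assumptions, not physiological measurements: the only point is that a correction proportional to the deviation pulls the variable back toward its optimal value.

```python
def regulate(temperature, set_point=37.0, gain=0.5, steps=20):
    """Simulate a negative feedback loop on one body variable.

    set_point and gain are made-up illustrative values, not physiology.
    """
    trajectory = [temperature]
    for _ in range(steps):
        error = set_point - temperature   # deviation from the optimal value
        temperature += gain * error       # corrective action opposes the deviation
        trajectory.append(temperature)
    return trajectory


too_hot = regulate(39.5)    # starts above the set point, cools down
too_cold = regulate(35.0)   # starts below the set point, warms up
```

Whether the starting value is too high or too low, the correction always points back toward the set point, which is what makes the feedback "negative".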

The body tracks many more variables besides body temperature. These variables together constitute a representation I will call body status:

Body status representations play a key role in the biological construction of personal identity and subjectivity. We will return to this topic at another time.

Desire from Body Status

Markov Decision Processes (MDPs) are a lens through which we can interpret behavior. An MDP contains states, actions, and a reward signal. The organism selects a policy $\pi$ such that the states encountered maximize the reward signal.

Within the brain, the basal ganglia implement two data structures which together generate motivation:

• A policy $\pi$ which maps states to actions, S → A.
• A value function V(s) which represents expected reward.
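As a rough illustration of how a policy and a value function relate, the sketch below runs value iteration on a toy two-state MDP. Every state, action, reward, and the discount factor here is a hypothetical choice made for the example; it is not a claim about how the basal ganglia actually compute these quantities.

```python
# Toy MDP: all names and numbers below are illustrative assumptions.
STATES = ["cold", "comfortable"]
ACTIONS = ["seek_warmth", "rest"]
GAMMA = 0.9  # discount factor for future reward

# Deterministic transitions: (state, action) -> next state
NEXT_STATE = {
    ("cold", "seek_warmth"): "comfortable",
    ("cold", "rest"): "cold",
    ("comfortable", "seek_warmth"): "comfortable",
    ("comfortable", "rest"): "comfortable",
}

# Immediate reward for taking an action in a state
REWARD = {
    ("cold", "seek_warmth"): -1.0,         # effort spent while still cold
    ("cold", "rest"): -2.0,                # staying cold is worse
    ("comfortable", "seek_warmth"): -0.5,  # wasted effort
    ("comfortable", "rest"): 0.0,
}


def q_value(V, s, a):
    """Immediate reward plus discounted value of the state that follows."""
    return REWARD[s, a] + GAMMA * V[NEXT_STATE[s, a]]


def value_iteration(sweeps=100):
    """Return the value function V(s) and the greedy policy derived from it."""
    V = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        V = {s: max(q_value(V, s, a) for a in ACTIONS) for s in STATES}
    policy = {s: max(ACTIONS, key=lambda a: q_value(V, s, a)) for s in STATES}
    return V, policy


V, policy = value_iteration()
```

In this toy world the greedy policy seeks warmth when cold and rests when comfortable, and the value function simply records how much reward each state can expect under that policy.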

Reinforcement learning theory is silent on the biological substrate of the reward signal. But to us, the solution is clear:

Proposition 2.  Reward is derived from the body status representation.

This is one mechanism by which low body temperature is corrected. Body status deviations elicit a reward signal that prompts “cold” motor desires (e.g., shivering). In contrast, notice that “hot” visceral desires (e.g., blood vessel constriction) are constructed directly, not implemented by the basal ganglia.

Hedonics from Body Status

There are two liking systems in the brain:

1. Hedonics is a global measure of pleasure and pain. It summarizes body state information.
2. Valence is an object-specific judgment of value. Valence usually correlates with desire: we approach things that are pleasant, and avoid things that are unpleasant.

Yet drug addicts often reach the point where drug consumption is unpleasant, yet they pursue a fix regardless. Wanting and liking are dissociable. Why? Because they are implemented by different neurochemical systems (phasic dopamine and opioids, respectively).

Body status is not only used to behaviorally motivate. In my view, it also tags perceptual data with information about its visceral relevance.  This includes the two primary affective dimensions:

• Object salience (“does this merit attention, further computation”)
• Object valence (“is this safe to approach”)

So we have arrived at our next thesis:

Proposition 3. Hedonics and valence are derived from body status representation.

Philosophers debate whether well-being is best attributed to pleasure/pain or desire. But body status is used to construct both of these phenomena. This gives us reason to believe that the philosophical theories of hedonism and desire fulfillment can be unified.

The Socialification of Body Status

Across the course of natural history, certain animals have become increasingly social, able to interact more meaningfully with their conspecifics.

• In mammals, social status. Animals track their standing in the group.
• In primates, social inclusion. Group living made possible by e.g., exchange of favors.
• In hominids, social reputation. A prosocial alternative to power, independent of the dominance hierarchy.

How might a biological organism introduce these new behavioral repertoires? A simple way to do it might be to extend body status to incorporate social variables of interest:

Proposition 4. Body status was extended to support novel social behaviors.

This proposition lends a biological perspective on why social ostracization is so painful, and why it elicits physiological distress directly comparable to, e.g., evading predation.

This socialification hypothesis is more speculative than my other three propositions. How might we go about evaluating whether it is true?

Recall that body status is represented by an overlapping set of neurochemical networks, whose main connecting hub is the hypothalamus. If Proposition 4 is true, we would expect to find new chemical systems uniquely responsive to these proposed dimensions.

I suspect these connections will be established rather quickly. We already possess several extremely suggestive lines of evidence. See, for example, Hennessy et al (2014). Sociality and sickness: have cytokines evolved to serve social functions beyond times of pathogen exposure?

Takeaways

Today, I presented the following ideas:

• Proposition 1. Well-being is body status, constructed by regulatory processes.
• Proposition 2. Desire is derived from body status representation.
• Proposition 3. Hedonics and valence are derived from body status representation.
• Proposition 4. Body status was extended to support novel social behaviors.

Until next time.

# The Relational Sphere Hypothesis

A Theory of Relationship Dynamics

How can we make sense of social life? Let’s start by considering a simple cup of coffee.

1. In my own house, I can just help myself to as much as I want, sharing with others in the framework of “what’s mine is yours.”
2. Or my friend can get me a cup of coffee in return for the one I got for him yesterday, so we take turns or match small favors for each other.
3. At Starbucks, I buy my coffee, using price and value as the framework.
4. To my children, however, none of these principles apply. To them, coffee is something that only “big people” are allowed to drink: It is a privilege that goes with social rank.

What is true of a humble cup of coffee is true of the moral dilemmas surrounding major policy questions such as organ donation. Decisions have to be made, and there are again four fundamental ways to make them:

1. Should we hold a lottery, giving each person an equal chance?
2. Should we somehow rank the social importance of potential recipients?
3. Should we sell organs to the highest bidder?
4. Or should we expect everyone in a local community to give freely, offering a kidney to any group member in need?

(The above excerpt is from [FE].)

Relational Models Theory (RMT) proposes that these four social categories are exhaustive and culturally universal. Human interactions are complex, and typically use more than one of the above processes. But every relationship, in every culture, seems to be some combination of the following:

• In Communal Sharing (Communality), people are viewed as equals oriented around some particular identity. This can include being in love, sports fans, and co-religionists.
• In Authority Ranking (Dominance), people are situated in a hierarchy where superiors are deferred to, respected, and in some cases obeyed.
• In Equality Matching (Reciprocity) people are interested in restoring balance, turn-taking, and making sure everyone is treated fairly.
• In Market Pricing (Exchange), relationships are governed by quantitative, utilitarian concerns such as prices, exchanges, or cost-benefit analyses.

We can use relational models to explain a wide swathe of social phenomena:

• Some examples of norm violation are in fact category errors. For example, we would interpret a situation such as “the price of our meal is two hours on dishwasher duty” as a conflation of Market Pricing and Equality Matching.
• Some (but not all) examples of taboo trade-offs are in fact category errors. The Finite Price of Human Life thesis feels counterintuitive because it pits our Market Pricing against the sacred values held by Communality.
• Humans often use indirect speech acts to reconcile relationship types with semantic content. Rather than saying e.g., “pick me up after work”, we often say things like, “If you would pick me up after work, that would be awesome”. While more verbose, the latter expression feels more polite because it is couched in a Communality frame, rather than signaling Dominance.

In addition to its explanatory reach, multiple strands of evidence come together in support of Relational Models Theory:

• Factor analysis. If you ask people to describe their relationships, you can see whether your theory predicts statistical patterns in their responses. When RMT was compared with other taxonomies (and there are a lot of them), RMT starkly outperformed its competitors.
• Ethnographies. RMT was invented by anthropologist Alan Fiske to capture regularities he saw across different cultures. For example, he found examples of marriage treated as Dominance, as Market Pricing, etc – but never a fifth type. A number of cross-cultural studies indicate that the four relational models constitute a human universal.
• Social errors. When people misremember a person’s name, it tends to be a person with whom they share the same relationship type. For example, if you flub the name of your boss, you are more likely to say the name of someone else in a position of authority over you.
• Brain studies. In the cortex, the default mode network is widely acknowledged to perform social processing. But within this specialized region, different subregions are activated when processing e.g., Communality vs Reciprocity relationships.

The Relational Sphere Hypothesis

Human societies can be conceived as operating in three spheres: markets, governments, and communities. The Cultural Sphere Hypothesis holds this trichotomy to be fundamental, and exhaustive of social space.

There seems to be a relationship between the cultural spheres and relational models. But there are three spheres vs four models. What gives?

Things become more clear when we remember that market-based economies were invented during the Neolithic Revolution, with the dawn of agriculture. Before this inflection point in history, transactions took place with gift economies.

This suggests that the Market Pricing relational model is evolutionarily recent: before the invention of agriculture, it simply did not exist.

I call this particular mapping from relational models to cultural spheres the Relational Sphere Hypothesis (RSH). It is an intertheoretic reduction: it purports to be a significant join point between micro- and macro-sociality.

RSH predicts that three out of four relational models can be traced back to the birthplace of Homo sapiens. Thus, we should expect predecessors for these relationship categories in primate societies! And we find precisely that:

• Dominance models are expressed in the dominance hierarchy (where physical dominance slowly gave way to symbolic dominance).
• Communality models are expressed in kin selection (where attachment to and care for relatives was slowly extended towards e.g. close friends).
• Reciprocity models are expressed in reciprocal altruism (where increasingly large delays between favor-transactions became possible).

I have argued elsewhere that the dual-process models so popular in today’s moral psychology can be captured in the interactions between (cortical) propriety frames and (subcortical) social intuitions. These two systems comprise the building blocks of sociality. RSH dovetails nicely with this dual process account, as it perceives categories within these systems, each with its own distinctive logic:

With the exception of Sanctity, these subconscious social intuitions arguably exist in primates. For example, here is evidence that rhesus monkeys have strong intuitions about Fairness:

A New Kind of Social Network

The Relational Sphere Hypothesis can be further illustrated by social networks: graphs where nodes are individuals, and edges are relationships. These kinds of models are very common across many disciplines that study aggregate social phenomena, for example evolutionary game theory. A social network may look something like this:

But relationships inhabit different categories. We can express this fact by coloring edges according to their relational model:

Note that some nodes (e.g. A and B) are connected by more than one color. This signifies that the relationship between A and B features both Communality and Dominance.

From this more complete picture of human relationships, we can derive our cultural spheres by examining the (mono-color) subgraphs:
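As a minimal sketch of this derivation, the snippet below stores each relationship as an edge labelled with a set of relational models, then splits the network into one subgraph per model. The node names, the edge labels, and the helper `mono_color_subgraphs` are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy network: each edge carries the set of relational models
# ("colors") that characterize the relationship between two individuals.
EDGES = {
    ("A", "B"): {"Communality", "Dominance"},  # a multi-colored edge
    ("A", "C"): {"Reciprocity"},
    ("B", "C"): {"Communality"},
    ("C", "D"): {"Dominance", "Reciprocity"},
}


def mono_color_subgraphs(edges):
    """Split an edge-colored graph into one sorted edge list per color."""
    subgraphs = defaultdict(list)
    for (u, v), models in edges.items():
        for model in models:
            subgraphs[model].append((u, v))
    return {model: sorted(pairs) for model, pairs in subgraphs.items()}


spheres = mono_color_subgraphs(EDGES)
for model in sorted(spheres):
    print(model, spheres[model])
```

Note that the A-B edge shows up in two subgraphs, which is exactly what it means for a single relationship to participate in more than one sphere.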

Sphere Evolution & Competition

Political, social, and economic institutions have dramatically changed across the course of human history. As we saw in Deep History of Humanity, the evolution of our species can be usefully divided into three time periods:

The Sphere Competition Conjecture comprises a set of informal intuitions that relational models “compete for our attention”: gains in one sphere are often accompanied by losses in another.

Let me illustrate this conjecture with examples. 🙂

Social vs Economic spheres

• The religious instinct is etched deeply into the hominid mind, and evidence for shamanic animism dates back to the advent of behavioral modernity. Modern religion is located squarely within the Social sphere. But what caused its institutionalization, the invention of the full-time religious specialist: the priest? Religious institutions were founded during the transition from gift economies to market economies. For the first time in history, material wealth mattered more in transactions than interpersonal reputation. With the Social sphere threatening to collapse, perhaps it is not a coincidence that it was at this moment in history that religion became more explicitly social.
• Some existential philosophers argue that the industrial revolution, with its obscenely large increase in Economic productivity, has correlated with a weakening of Social values, as witnessed empirically by the rise of materialism. Perhaps the malaise and cynicism of postmodernity can be explained by the weakening of the ties of community.
• The custom of tipping can be conceived as an organ of Sociality that feels misplaced in today’s Market-oriented economy. This institution shows no signs of abating (for example, Uber recently rescinded its no-tipping policy). Perhaps the reason this Social technology persists, while others have disintegrated, is that tipping solves the principal-agent problem: customer service is otherwise not factored into the price, because that information is not easily available to management.
• Product boycotts are another example of Social outrage affecting Economic markets.

Social vs Political

• Another important event in the history of religion is the transition to universal religions: where the concerns of the gods and the consequences of moral violations were imbued with an aura of the eternal. Anthropological evidence clearly suggests that universal religions succeeded because they facilitated larger group sizes.
• Corruption is often treated as a political problem, but in fact bribery and collusion both require high amounts of social capital.
• In American history, political partisanship was most severe in the 1880s, and is so again at present. Both then and now are periods of an intense drought of social capital. Further, participation in voting strongly correlates with vibrant community and civic life. We might conjecture that weaker communities are more vulnerable to partisan infighting. This conjecture is aligned with the oft-cited observation that partisanship tends to correlate with moderates abandoning the political arena.

Economic vs Political

• Capitalist Peace Theory formalizes the observed inverse relationship between free trade and international conflict. On this hypothesis, one of the strongest predictors of war is resource acquisition, and the risk-benefit calculus changes (improves) substantially with the removal of tariffs.

Economic vs Political vs Social

• The Size of Nations Hypothesis is the idea that the size of a nation (Political) is driven by two competing factors: larger nations are able to produce public goods more efficiently (Economic), but conversely their populations are more heterogeneous and thereby less cohesive (Social).

Some of the phenomena described above have been extensively studied by social scientists. However, to my knowledge, no extant models robustly capture the doctrine of relational model theory. Perhaps the next generation of formal models will do better.

# New Foundations: Towards Tribal Unity

Overview

In Five Tribes of Machine Learning, I reviewed Pedro Domingos’ account of tribes within machine learning. These were the Symbolists, Connectionists, Bayesians, Evolutionaries, and Analogizers. Domingos thinks the future of machine learning lies in unifying these five tribes into a single algorithm. This master algorithm would weld together the different focal points of the various tribes (cf. the parable of the blind men and the elephant).

Today, I will argue that Domingos’ goal is worthy, but his approach too confined. Integrating theories of learning surely constitutes a constructive line of inquiry. But direct attempts to unify the tribes (e.g., Markov logic) are inadequate. Instead, we need to turn our gaze towards pure mathematics: the bedrock of machine learning theory. Just as there are tribes within machine learning, mathematical research has its own tribes (image credit Axel Sarlin):

The tribes described by Domingos draw from the math of the 1950s. Attempting mergers based on these antiquated foundations is foolhardy. Instead, I will argue that updating towards modern foundational mathematics is a more productive way to pursue the master algorithm. Specifically, I submit that machine learning tribes should strive to incorporate constructive mathematics, category theory, and algebraic topology.

Classical Foundations

Domingos argues for five machine learning tribes. I argue for four. I agree that his Symbolists, Connectionists, and Bayesians are worthy of attention. But I will not consider his Evolutionaries and Analogizers: these tribes have been much less conceptually coherent, and also less influential. Finally, I submit Frequentists as a fourth tribe. While this discipline tends to self-identify as “predictive statistics” instead of “machine learning”, its technology is sufficiently similar to merit consideration.

The mathematical foundations of the Symbolists rest on predicate logic, invented by Gottlob Frege and C.S. Peirce. This calculus in turn forms the roots of set theory, invented by Georg Cantor and elaborated by Bertrand Russell. Note that 3 out of 4 of these names come from analytic philosophy. Alan Turing’s invention of his eponymous machine marked the birthplace of computer science. The twin pillars of computer science are computability theory and complexity theory, which in turn both rest on top of set theory. Finally, algorithm design connects with the mathematical discipline of combinatorics.

The foundation of the Statisticians (both Bayesian and Frequentist) is measure theory (which, coincidentally, borrows from set theory). The field of information theory gave probability distributions the concept of uncertainty: see entropy as belief uncertainty. Finally, formal theories of learning draw heavily from optimization: where model parameters are tuned to optimize against miscellaneous objective functions.

Mathematical research can largely be decomposed into two flavors: algebraic and analytical. Algebra focuses on mathematical objects and structures: group theory, for example, falls under its umbrella. Analysis alternatively focuses on continuity, and includes fields like measure theory and calculus. Notice that the mathematical foundations of the Symbolists are fundamentally algebraic, whereas those of the Statisticians are analytic. This gets at the root of why machine learning tribes often have difficulty communicating with one another.

Classical Applications

We have already noted that Symbolists, Connectionists, and Bayesians have all created applications in machine learning (decision trees, neural networks, and graphical models, respectively). These tribes are also expressed in neuroscience (language of thought, Hebbian learning, and Bayesian Brain, respectively). They have also all developed their own flavors of cognitive architectures (e.g., production rule systems, attractor networks, and predictive coding, respectively).

Frequentist Statisticians have no real presence in machine learning, neuroscience, or cognitive architecture. But they are the dominant force in the social sciences; e.g., econometrics.

I should also note that, in addition to the fields already noted, Symbolists have a unique presence in linguistics (especially Chomskyan universal grammar) and analytic philosophy (cf. that field’s heavy reliance on predicate logic, and the linguistic turn in the early twentieth century).

Finally, causal inference only exists in the Bayesian (Pearlian d-separation) and Frequentist (Rubin potential outcome models) tribes. To my knowledge, this technology has not yet been robustly integrated into the Symbolist or Connectionist tribes.

These four tribes largely draw from early twentieth century mathematics. Let us now turn to what mathematicians have been up to, in the past century.

Towards New Foundations

Let me now introduce you to three developments in modern mathematics: constructive mathematics, category theory, and algebraic topology.

In classical logic, truth is interpreted ontologically: a fact about the world. But truth can also be interpreted epistemically: a true proposition is one that we can prove. This epistemic reading (aka intuitionistic logic) has us reject the Law of Excluded Middle (LEM): failing to prove a theorem is not the same thing as disproving it.

By removing LEM from mathematics, classical proof by contradiction (inferring a statement from the impossibility of its negation) becomes unavailable. While this may seem limiting, in fact it also opens the doors for constructive mathematics: mathematics that can be input, and verified, by a computer. Erdős’ Book of God will be supplanted by the GitHub of God.
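To make the distinction concrete, here is a minimal sketch in Lean 4 (my choice of proof assistant for illustration; any constructive system would do). The first example is proved without any classical axiom: a constructivist cannot refute excluded middle, only decline to assume it. The second example obtains LEM itself, but only by appealing to Lean's classical axioms through `Classical.em`.

```lean
-- Constructively provable: excluded middle can never be refuted.
example (p : Prop) : ¬¬(p ∨ ¬p) :=
  fun h => h (Or.inr (fun hp => h (Or.inl hp)))

-- Excluded middle itself is only available through the classical axioms.
example (p : Prop) : p ∨ ¬p := Classical.em p
```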

In recent years, category theory has emerged as the lingua franca of theoretical mathematics. It is built on the observation that all mathematical disciplines (algebraic and analytic) fundamentally describe mathematical objects and their relationships. Importantly, category theory allows theorems proved in one category to be translated into entirely novel disciplines.

Finally, since Alexander Grothendieck’s work on sheaf and topos theory, algebraic topology (and algebraic geometry) have come to occupy an increasingly central role in mathematics. This trend has only intensified in the 21st century. As John Baez puts it,

These are just the first steps in the ‘homotopification’ of mathematics, a trend in which algebra more and more comes to resemble topology, and ultimately abstract ‘spaces’ (for example, homotopy types) are considered as fundamental as sets.

These three “pillars” are perhaps best motivated by the technology that rests on them.

Computational trinitarianism is built on deep symmetries between proof theory, type theory and category theory. The movement is encapsulated in the slogans “Proofs are Programs” and “Propositions are Types”. This realization led to the development of Martin-Löf dependent type theory, which in turn has led to theorem proving software packages such as Coq.
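The two slogans can be seen side by side in a short sketch. I use Lean 4 here rather than Coq, purely as an illustrative choice, and the names `and_swap_sketch` and `pairSwap` are hypothetical. The proof that conjunction commutes and the program that swaps a pair are, up to notation, the same term.

```lean
-- "Propositions are Types": the statement p ∧ q → q ∧ p is a type,
-- and the function below is a program inhabiting it (i.e., a proof).
theorem and_swap_sketch (p q : Prop) : p ∧ q → q ∧ p :=
  fun ⟨hp, hq⟩ => ⟨hq, hp⟩

-- "Proofs are Programs": the same shape of term, now acting on data.
def pairSwap {α β : Type} : α × β → β × α :=
  fun (a, b) => (b, a)
```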

In metamathematics, researchers investigate whether a single formal language can form the basis of the rest of mathematics. Historically, candidates have included Zermelo-Fraenkel (ZF) set theory and, more recently, the Elementary Theory of the Category of Sets (ETCS). Homotopy type theory (HoTT) is a new entry into the arena, and extends computational trinitarianism by the Univalence Axiom, an entirely new interpretation of logical equality. Under the hood, the univalence axiom relies on a topological interpretation of the equality type. Suffice it to say, this particular theory has recently inspired a torrent of novel research. Time will tell how things develop.

Thermodynamics is built on the idea of Gibbs entropy (or, more formally, free energy). The basic intuition, which stems from statistical physics, is that disorder tends to increase over time. And thermodynamics does appear to be relevant in a truly diverse set of physical phenomena.

• In physics, entropy is the reason behind the arrow of time (its “forward directionality”).
• In chemistry, entropy forms the basis for spontaneous (asymmetric) reactions.
• In paleoclimatology, there is increasing reason to think that abiogenesis occurred via a thermodynamic process.
• In anatomy, entropy is the organizing principle underlying cellular metabolism.
• In ecology, entropy explains emergent phenomena related to biodiversity.

If I were to point at one candidate for the Universal Algorithm, entropy minimization would be my first pick. It turns out, strangely enough, that thermodynamic (Gibbs) entropy has the same functional form as information-theoretic (Shannon) entropy, which measures uncertainty in probability distributions. This is no accident. Information geometry extends this notion of “thermodynamic information” by interpreting families of probability distributions as statistical manifolds.
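The shared functional form is easy to check numerically. The sketch below assumes a discrete distribution over microstates and uses the standard textbook definitions, S = -k_B Σ p ln p for Gibbs entropy and H = -Σ p log2 p for Shannon entropy; the example distribution `P` is hypothetical. The two quantities differ only by the constant factor k_B ln 2.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact by SI definition)


def shannon_entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def gibbs_entropy(probs):
    """Gibbs entropy in J/K: S = -k_B * sum(p * ln(p))."""
    return -K_B * sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical distribution over four microstates.
P = [0.5, 0.25, 0.125, 0.125]

h_bits = shannon_entropy(P)
s_joules_per_kelvin = gibbs_entropy(P)

print("Shannon entropy (bits):", h_bits)
print("Gibbs entropy (J/K):   ", s_joules_per_kelvin)
print("k_B * ln(2) * H:       ", K_B * math.log(2) * h_bits)
```

The last two printed values agree up to floating-point rounding, which is all that "same functional form" amounts to here: a change of logarithm base and a physical unit.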

In physics, of course, the two dominant theories of nature (general relativity + QFT) are mutually incompatible. It is increasingly becoming apparent that quantum topology is the most viable way to achieve a Grand Unified Theory. From this paper,

Feynman diagrams are used to reason about quantum processes. In the 1980s, it became clear that underlying these diagrams is a powerful analogy between quantum physics and topology. Namely, a linear operator behaves very much like a ‘cobordism’: a manifold representing spacetime, going between two manifolds representing space. This led to a burst of work on topological quantum field theory and ‘quantum topology’.

Searching For Unity

That was a lot of content. Let’s zoom out. What is the point of being introduced to these new foundations? To give a more detailed intuition on which ML research is worthy of your attention (and participation!).

Most attempts to unify machine learning draw from merely classical foundations. For example, consider fuzzy logic, Markov logic networks, Dempster-Shafer theory, and Bayesian Neural Networks. While these ideas may be worth learning (particularly the last two), as candidates for unification they are necessarily incomplete; doomed by their unimaginative foundations.

In contrast, I submit you should funnel more enthusiasm towards ideas that draw from our new foundations. These may be active research concepts.

• In linguistics, categorical compositionality is the marriage of category theory and traditional syntax. It blends nicely with probabilistic approaches to meaning (e.g., word2vec). See this 2015 paper, for example.
• In statistics, topological data analysis is a rapidly expanding discipline. Rather than limiting oneself to probabilistic distribution theory (exponential families), this approach to statistics incorporates structural notions from algebraic topology. See this introductory tutorial, for example.
• In neuroscience, the most recent Blue Brain experiment suggests that Hebbian-style learning is not the whole story. Instead, the brain seems to rely on connectome topology: it dynamically summons and disperses cliques of neurons, whose cooperation subsequently disappears like a tower of sand.
• In macroeconomics, neoclassical models (based on partial differential equations) are being challenged by a new kind of model, econophysics, which views the market as a kind of heat machine.

Or they may be entirely unexplored questions that dawn on you by contemplating conceptual lacunae.

• What would happen if I were to re-imagine probability theory from intuitionistic principles?
• How might I formalize production rule cognitive architectures like ACT-R in category theory?
• Is there a way to understand neural network behavior and the information bottleneck from a topological perspective?
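To give a flavor of what the topological data analysis item above means in practice, here is a minimal sketch of its simplest invariant: the number of connected components (the zeroth Betti number) of a point cloud, tracked across distance scales. The point cloud, the scales, and the helper `betti0` are hypothetical, and real TDA libraries compute far richer invariants (loops, voids, full persistence diagrams) than this toy does.

```python
import math


def betti0(points, eps):
    """Count connected components when points within distance eps are linked."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)  # merge the two components
    return len({find(i) for i in range(len(points))})


# Hypothetical point cloud: two well-separated clusters.
POINTS = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]

for eps in (0.05, 0.5, 10.0):
    print(f"eps={eps}: {betti0(POINTS, eps)} component(s)")
```

The count that survives across a wide range of scales (here, two clusters) is the structural signal; counts that appear only at very small or very large scales are treated as noise.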

Until next time.

# ERTAS: The Engine of Consciousness

Existential Mode Generators

In Why We Sleep, we discussed sleep architecture diagrams. These diagrams show clear electrical differences between three existential modes: NREM (“sleeping”), REM (“dreaming”), and Consciousness.

While EEG excels at providing temporal resolution, it doesn’t provide much spatial information. Where does the brain construct these three modes?

To answer this, neuroscientists cut the brains of cats in half… literally. If you perform a Cerveau Isolé cut (slice above the midbrain), the top half’s electrical signature is NREM. If you do a Midpontine Pre-Trigeminal cut (slice below the midbrain), the top half’s electrical signature is NREM + Consciousness.

This evidence shows that existential modes are generated by different areas. Specifically:

• Sleep is induced by the diencephalon.
• Dreaming is initiated by the metencephalon.
• Consciousness is ignited by the mesencephalon.

Neuroscientists now knew where to look! It was not long before they discovered the machinery that creates consciousness, sleeping, and dreaming:

We now turn our gaze to the ascending reticular activating system (ARAS).  “Reticular” is a word that means “web-like”, so the name roughly means “web-like ignition switch”.  But before we do so, we need to turn our gaze to the relationship between cortico-thalamic (CT) radiations and consciousness.

Thalamus Anatomy & Function

We have also explained that the purpose of consciousness is to solve the binding problem: gluing together disparate adjectives into coherent nouns:

Consciousness creates the coherent objects of working memory by implementing phase binding, where object features are stitched together in distinct frequency bands, not unlike the radio in your car.

We have previously described the thalamus and cortex as dually innervating spheres, not dissimilar to a plasma globe:

And indeed, the nuclei within the thalamus tile the entire cortex:

Note, however, that only some thalamic nuclei are specific (project to discrete patches of cortex). Nonspecific thalamic nuclei are also present, including the Intralaminar Nuclei (ILN) and Reticular Nucleus of the Thalamus (RNT).

These nonspecific nuclei are the principal components of the ERTAS system, and plausible candidates for the engine of consciousness.

Damage to specific nuclei produces loss of a particular modality. In contrast, lesions to nonspecific nuclei produce deep disturbances of consciousness. In fact, recent evidence suggests that such lesions perturb cortico-cortical information transmission.

The ERTAS Hypothesis

The ascending reticular activating system (ARAS) consists of a dense web of nuclei. Indeed, the word “reticular” means “web-like”. Parvizi and Damasio (2001) outline the more significant members of the system:

These nuclei project to the following three sites:

1. Reticular Nucleus of the Thalamus (RNT), a sheet that sits on top of the thalamus.
2. Intralaminar Nuclei (ILN), which are embedded deep within the thalamus.
3. Basal Forebrain, which receives & distributes several neurochemical systems.

These structures in turn route information flowing to cortex:

The extended reticular-thalamic activating system (ERTAS) hypothesis connects the ARAS system with the phase binding interpretation of the cortico-thalamo-cortical reentrant loop. One hypothesis, adapted from Newman (1999), has three theses:

• ILN performs phase binding (and is thus the consciousness generator).
• RNT implements selective attention.
• Basal Forebrain provides visceral “body-relevant” information.

More recent research has corroborated the role of the ILN in phase binding, and expanded its scope. Saalmann (2014) notes that the ILN seems to participate in a larger group of higher-order nuclei which each manage information within more constrained parts of cortex. The anterior ILN seems more related to oculomotor processes; the posterior deals with the multimodal integration of different sense data.

One unexpected recent finding has been that lesions of “higher-order nuclei” such as the ILN seem to perturb cortico-cortical information transmission. This underscores the need to understand interactions between the CTC Loop and other reentrant loops.

The Role of The Claustrum

The claustrum is a tiny sheet of gray matter suspended between thalamus and cortex. However, it receives information from essentially the entire cortex:

Given that the purpose of consciousness is to integrate cortical information, the anatomical position of the claustrum is suggestive.

Recent anatomical evidence has only strengthened the case for claustrum promoting consciousness:

• Koubeissi et al. (2014) is a case study in which electrical stimulation of the claustrum induced loss of consciousness (!).
• Chau et al. (2015) announced evidence that correlates claustrum lesions with the duration, but not the frequency, of loss of consciousness.
• Wang et al. (2016) conclusively proved that the claustrum has reciprocal connections everywhere in cortex.
• Reardon (2017) announced the discovery of a single neuron whose dendrites encircled the entire brain.

These data are suggestive. However, it will be some time before we know enough to integrate claustrum function within the ERTAS system.

Until next time.

