Jesus, disciple of John

Why do the Gospels care about John?

In the 20s CE, at least two prophets were active in the Israelite highlands: John the Baptist and Jesus of Nazareth. Both were killed on political grounds. Jesus left behind disciples who remained loyal to him in some sense. So did John. In fact, these two religious groups interacted (and perhaps vied for influence) after the deaths of their leaders.

Ultimately, John’s religious group died out; Jesus’ following did not. With the exception of Josephus and a few other secular sources, the Christian gospels are our best source of information on the religious climate of this time period.

These Christian gospels spend an astonishing amount of time describing John: both his independent ministry and his relationship with Jesus. John’s message that a powerful Son of Man will judge the world is interpreted by Christians as referring to Jesus.

Why should the gospels lavish John with such attention and theological import? Two hypotheses suggest themselves:

  1. The early Christians shared a broader Jewish respect for John’s ministry, and that reverence led to the attention and theological significance.
  2. The early Christians crafted the gospels partly in an effort to convert John’s disciples.

As we shall see, neither of these hypotheses is adequate. Instead, the evidence suggests that Jesus began his ministry as a disciple of John.

On Jesus’ Baptism

The gospels record that John baptized Jesus. This event is prima facie embarrassing for two reasons:

  1. Implications of imperfection. John’s baptism was clearly and consistently described as “for the forgiveness of sins”, so Jesus’ submission to it implies he had sins to be forgiven.
  2. Implications of subordination. This is why Matthew has the Baptist say “I need to be baptized by you, yet you come to me?”

Mark and Matthew counter these implications by describing a theophany in which God calls Jesus his Son. Luke, by contrast, makes the Baptist a relative of Jesus and has John imprisoned before Jesus’ baptism; we are never explicitly told who baptizes Jesus. In the fourth gospel, John is never even called “the Baptist”. He also denies that he is Elijah, even though in Matthew, Jesus flatly affirms that he is.

This incredible diversity of interpretations is due to a simple fact. At the beginning of Jesus’ ministry stands an independent Baptist, a Jewish prophet who won great popularity and reverence before and apart from Jesus, who also won the reverence and submission of Jesus to his baptism of repentance for the forgiveness of sins, and who left behind a religious group that continued to exist apart from Christianity.

The Baptist constituted a stone of stumbling right at the beginning of the story of Jesus, a stone too well known to be ignored or denied, a stone that each evangelist had to come to terms with as best he could. The embarrassment of the evangelists is illustrated by the diverse, not to say contradictory, ways in which they try to bend the independent Baptist to a dependent position within the story of Jesus.

A Common Vision

The gospels record that Jesus was baptized by this prophet. But why would he go? Since nobody compelled him, he must have gone to John because he agreed with John’s message.

There were lots of other groups vying for Jewish attention. Jesus did not join the Pharisees, who emphasized scrupulous observance of the Torah. He did not align himself with the Sadducees, who focused on the worship of God through the Temple cult. Nor did he associate with the Essenes, who formed monastic communities to maintain their own ritual purity. Nor did he subscribe to the teaching of the “fourth philosophy”, which advocated a violent rejection of Roman domination.

No, Jesus associated with an ascetic prophet who proclaimed an imminent end of history. As we will see later, this fact sheds light on the ministry of the historical Jesus.

A Common Practice

Was Jesus’ baptism a singular event? Did he spend much time with John? Was he admitted into John’s inner circle?

Jesus’ first disciples were John’s disciples. If some disciples of the Baptist came to transfer their allegiance to him while they were still in the company of the Baptist, that suggests that Jesus had stayed in the Baptist’s orbit long enough for some of the latter’s disciples to come to know him and be impressed by him.

The fourth gospel admits that Jesus’ ministry included baptism. Yet not ten sentences later, that claim is baldly contradicted. However, several pieces of evidence suggest that the contradiction is the (rather clumsy) work of a Johannine redactor.

Jesus’ practice of baptism is further supported by Mark 11:27-30: “The chief priests asked Jesus, ‘Who gave You this authority to do these things?’ Jesus replied, ‘One question, then I will tell you. Was John’s baptism from heaven or from men?’”

The Sadducees were keen to deny Jesus’ religious authority, but reluctant to deny John’s openly, since the crowds held John to be a prophet (Mark 11:32). So why would Jesus invoke John’s baptism? A likely explanation is that baptism was an area of overlap between their ministries: the Sadducees couldn’t well admit that John’s baptism was divine, yet criticize Jesus’ ministry, which included that very baptism.

Jesus as Disciple

A picture is slowly emerging. Jesus began his public life as one of John’s disciples. This is the best explanation for his a) being baptized by John, b) taking John’s disciples, and c) practicing John’s baptism. He slowly differentiated himself in the following ways:

  • Non-asceticism. John was renowned for his minimal lifestyle. Jesus was no stranger to parties, so to speak.
  • Miraculous works. John’s ministry did not feature miracles. Jesus’ did, and he used this to illustrate his end-times message.

Yet despite these divergences, Jesus and John operated largely complementary ministries. Consider Matthew 11:16-19:

To what should I compare this generation? It’s like children who call out to each other: “We played the flute for you, but you didn’t dance; we sang a lament, but you didn’t mourn!”

For John did not come eating or drinking, and they say, ‘He has a demon!’ Jesus came eating and drinking, and they say, ‘Look, a glutton and a drunkard, a friend of tax collectors and sinners!’

Yet wisdom is vindicated by her children.

This passage is remarkable because it places the ministries of John and Jesus side by side. Absent are theological claims of Jesus’ superiority. To be sure, John’s asceticism and Jesus’ non-asceticism are contrasted. Yet John (lamenter) and Jesus (flute player) are both children of wisdom.

Jesus after John

What was the relationship like between John and Jesus? Did they always function collaboratively, or competitively?

The details of this relationship are largely lost to history. Some evidence of tension can be inferred in how frequently Jesus was asked to clarify his relationship to John.

One of our most compelling clues, however, lies in the moving plea from Jesus to his former rabbi:

When John heard in prison what the Messiah was doing, he sent a message by his disciples and asked Him, “Are You the One who is to come, or should we expect someone else?” Jesus replied to them, “Go and report to John what you hear and see: the blind see, the lame walk, those with skin diseases are healed, the deaf hear, the dead are raised, and the poor are told the good news. And if anyone is not offended because of Me, he is blessed.”

Absent are the polemics so typical of Jesus’ sayings. This beatitude has an audience of one: it is a delicate appeal to his former rabbi, “please do not be offended because of [my origin]”. And yet here, tellingly, the conversation stops. We are not told John’s reply. The relationship is left ambiguous, as John heads for his execution by Herod Antipas.

After the execution of the Baptist, Jesus’ ministry developed on its own. And yet, as we will see, Jesus never fully emerges from the shadow of John. Their common ministry and message pervade the remaining years of Jesus’ ministry.


Five Tribes of Machine Learning

Part Of: Machine Learning sequence
Content Summary: 900 words, 9 min read

ML is tribal, not monolithic

Research in artificial intelligence (AI) and machine learning (ML) has been going on for decades. Indeed, the textbook Artificial Intelligence: A Modern Approach reveals a dizzying variety of learning algorithms and inference schemes. How can we make sense of all the technologies on offer?

As argued in Domingos’ book The Master Algorithm, the discipline is not monolithic. Instead, five tribes have progressed relatively independently. What are these tribes?

  1. Symbolists use formal systems. They are influenced by computer science, linguistics, and analytic philosophy.
  2. Connectionists use neural networks. They are influenced by neuroscience.
  3. Bayesians use probabilistic inference. They are influenced by statistics.
  4. Evolutionaries are interested in evolving structure. They are influenced by biology.
  5. Analogizers are interested in mapping to new situations. They are influenced by psychology.

Expert readers may better recognize these tribes by their signature technologies:

  • Symbolists use decision trees, production rule systems, and inductive logic programming.
  • Connectionists rely on deep learning technologies, including recurrent neural networks (RNNs), convolutional neural networks (CNNs), and deep reinforcement learning.
  • Bayesians use Hidden Markov Models, graphical models, and causal inference.
  • Evolutionaries use genetic algorithms, evolutionary programming, and evolutionary game theory.
  • Analogizers use k-nearest neighbor and support vector machines.

[Figure: Five Tribes, Strengths and Technologies]

In fact, my blog can be meaningfully organized under this research landscape.

History of Influence

Here are some historical highlights in the development of artificial intelligence.

Symbolist highlights:

  • 1950: Alan Turing proposes the Turing Test in Computing Machinery and Intelligence.
  • 1974-80: The frame problem and combinatorial explosion cause the First AI Winter.
  • 1980: Expert systems and production rules re-animate the field.
  • 1987-93: Expert systems prove too fragile and expensive, causing the Second AI Winter.
  • 1997: Deep Blue defeats reigning chess world champion Garry Kasparov.

Connectionist highlights:

  • 1957: Perceptron invented by Frank Rosenblatt.
  • 1969: Minsky and Papert publish the book Perceptrons, criticizing single-layer perceptrons. This puts the entire field to sleep, until...
  • 1986: Rumelhart, Hinton, and Williams popularize backpropagation, and connectionist research restarts.
  • 2006: Hinton et al. publish A fast learning algorithm for deep belief nets, which rejuvenates interest in deep learning.
  • 2017: AlphaGo defeats Ke Jie, the world’s top-ranked Go player, using deep reinforcement learning (DRL).

Bayesian highlights:

  • 1953: Markov chain Monte Carlo (MCMC) invented, which eventually makes Bayesian inference tractable on real problems.
  • 1968: Hidden Markov Model (HMM) invented.
  • 1988: Judea Pearl authors Probabilistic Reasoning in Intelligent Systems, and creates the discipline of probabilistic graphical models (PGMs).
  • 2000: Judea Pearl authors Causality: Models, Reasoning, and Inference, and creates the discipline of causal inference on PGMs.

Evolutionary highlights:

  • 1975: Holland invents genetic algorithms.

Analogizer highlights:

  • 1968: k-nearest neighbor algorithm increases in popularity.
  • 1979: Douglas Hofstadter publishes Gödel, Escher, Bach.
  • 1992: support vector machines (SVMs) invented.

We can summarize this information visually, by creating an AI version of the Histomap:

[Figure: Five Tribes, Historical Size and Competition]

These data reflect my own impression of AI history. It would be interesting to replace them with real data on funding and paper volume.

Efforts Towards Unification

Will there be more or fewer tribes, twenty years from now? And which sociological outcome is best for AI research overall?

Theory pluralism and cognitive diversity are underappreciated assets to the sciences. But scientific progress is often heralded by unification. Unification comes in two flavors:

  • Reduction: identifying isomorphisms between competing languages,
  • Generalization: creating a supertheory that yields antecedents as special cases.

Perhaps AI progress will mirror revolutions in physics, like when Maxwell unified theories of electricity and magnetism.

Symbolists, Connectionists, and Bayesians suffer from a lack of stability, generality, and creativity, respectively. But one tribe’s weakness is another tribe’s strength. This is a big reason why unification seems worthwhile.

What’s more, our tribes possess “killer apps” that other tribes would benefit from. For example, only Bayesians are able to do causal inference. Learning causal relations in logical structures, or in neural networks, remains an important unsolved problem. Similarly, only Connectionists are able to explain modularity (function localization). Symbolist and Bayesian tribes are more normative than Connectionism, which makes their technologies tend towards (overly?) universal mechanisms.

Symbolic vs Subsymbolic

You’ve heard of the symbolic-subsymbolic debate? It’s about reconciling Symbolist and Connectionist interpretations of neuroscience. But some (e.g., [M01]) claim that both theories might be correct, at different levels of abstraction. Marr [M82] once outlined a hierarchy of explanation, as follows:

  • Computational: what is the structure of the task, and viable solutions?
  • Algorithmic: what procedure should be carried out, in producing a solution?
  • Implementation: what biological mechanism in the brain performs the work?

One theory, supported by [FP98], is that Symbolist architectures (e.g., ACT-R) may be valid explanations, but somehow “carried out” by Connectionist algorithms and representations.

[Figure: Five Tribes, Tribes vs Levels]

I have put forward my own theory: Symbolist representations are properties of the Algorithmic Mind, whereas Connectionism is more relevant to the Autonomous Mind.

This distinction may help us make sense of why [D15] proposes Markov Logic Networks (MLN) as a bridge between Symbolist logic and Bayesian graphical models. He is seeking to generalize these technologies into a single construct, in the hope that he can later find a reduction of MLN in the Connectionist paradigm. Time will tell.


Today we discussed five tribes within ML research: Symbolists, Connectionists, Bayesians, Evolutionaries, and Analogizers. Each tribe has different strengths, technologies, and developmental trajectories. These categories help to parse technical disputes and locate promising research vectors.

The most significant problem facing ML research today is this: how do we unify these tribes?


  • [D15] Domingos (2015). The Master Algorithm
  • [M01] Marcus (2001). The Algebraic Mind
  • [M82] Marr (1982). Vision
  • [FP98] Fodor & Pylyshyn (1988). Connectionism and cognitive architecture: A critical analysis

Constraint Satisfiability: Zebra Puzzle

Part Of: Logic sequence
Content Summary: 1000 words, 10 min read.

Today, we look at the Zebra Puzzle (aka Einstein Puzzle). According to legend, Albert Einstein invented this as a child, and claimed that 98% of the human population cannot solve it.

Let’s see if we are in the 2%.

The Puzzle

Five men of different nationalities and with different jobs live in consecutive houses on a street. These houses are painted different colors. The men have different pets and have different favorite drinks. The following rules are provided:

  1. The English man lives in a red house
  2. The Spaniard owns a dog
  3. The Japanese man is a painter
  4. The Italian drinks tea
  5. The Norwegian lives in the first house on the left
  6. The green house is immediately to the right of the white one
  7. The photographer breeds snails
  8. The diplomat lives in the yellow house
  9. Milk is drunk in the middle house
  10. The owner of the green house drinks coffee
  11. The Norwegian’s house is next to the blue one
  12. The violinist drinks orange juice
  13. The fox is in a house that is next to that of the physician
  14. The horse is in a house next to that of the diplomat

Who owns a zebra? And whose favorite drink is mineral water?

To answer this problem, we must learn 5 house-nation-color-drink-pet-job combinations. A solution might look like this:

  • Yellow far-left house has Norwegian diplomat who drinks water and owns a fox
  • White left house has Italian photographer who drinks tea and owns a zebra.
  • Red middle house has English photographer who drinks milk and owns snails.
  • Green right house has Spanish physician who drinks OJ and owns a dog
  • Blue far-right house has Japanese painter who drinks coffee and owns a horse.

But this solution is incorrect. Among other problems, it violates Rule 6: “The green house is immediately to the right of the white one.”

How do we find a solution that doesn’t violate any of our constraints? Does one even exist? Or is this set of constraints not satisfiable?

Formalizing Logical Structure

Words are distracting. Let’s use symbols instead.

[Figure: Einstein's Puzzle, Symbol Code]

With this code, we can write the above solution as a matrix.

[Figure: Einstein's Puzzle, Solution Matrix]

We can also formalize our constraints.

[Figure: Einstein's Puzzle, Constraint Formalization]

These constraints are ugly. Let’s write them in matrix form instead!

[Figure: Einstein's Puzzle, Constraint Matrix]

Constraint Satisfaction as a Jigsaw Puzzle

We can use the above constraints to visually check satisfiability. Whereas before you had to parse the meaning of Rule 6 verbally, now you can just inspect whether there is a visual match between rule and solution.

[Figure: Einstein's Puzzle, Visual Satisfiability Check]

One way to determine satisfiability is to perform these checks until you find a viable solution. But this is computationally expensive: there are about 25 billion candidate solutions. Instead of inspecting every possible solution, why don’t we generate one?
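The 25-billion figure follows from a quick count, assuming each of the five attribute rows (nation, color, drink, pet, job) can independently be any arrangement of its five values across the five houses:

```python
from math import factorial

ROWS = 5    # nation, color, drink, pet, job
HOUSES = 5

# Each row independently assigns its 5 values to the 5 houses: 5! ways per row.
candidates = factorial(HOUSES) ** ROWS
print(f"{candidates:,}")  # 24,883,200,000
```

Brute force would have to check every one of these against all 14 rules, which is why building a solution piece by piece is attractive.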

How? Since our Rules are used for solution-checking, why can’t we use them for solution-building?

On this view, solution building takes on the flavor of a jigsaw puzzle. Each constraint is a puzzle piece, and from these pieces we construct the solution.


Unfortunately, there is more than one way to solve a 5×5 jigsaw puzzle. Let me show you one way to solve this one. We will use choice minimization to simplify our lives: always play the move with the fewest degrees of freedom.

Solution: Path A

Rules 5 and 9 relate to the houses, so they are easy to apply.

After these, the Rule 11 puzzle piece fits unambiguously.


Let’s apply Rule 6 next. That jigsaw piece can fit in two locations: the M+R columns, or the R+FR columns. Gotta choose one: let’s select the former. After that move, Rule 10 fits unambiguously.
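Choice minimization can be sketched in a few lines of Python. This is a hypothetical encoding of my own, not the notation of the figures: houses are numbered 0 (FL) to 4 (FR), and each remaining piece links two values that share a house (offset 0) or, for Rule 6, sit in adjacent houses with the left value first (offset 1). Counting the legal placements of a few candidate pieces in the state after Rules 5, 9, and 11 shows why Rule 6 is a natural next move:

```python
# Houses are numbered 0 (FL) to 4 (FR). State after Rules 5, 9, and 11:
placed = {"Norwegian": 0, "milk": 2, "blue": 1}

# Row membership tells us which cells an attribute value competes for.
ROW = {"Norwegian": "nation", "English": "nation", "Italian": "nation",
       "milk": "drink", "tea": "drink", "coffee": "drink",
       "blue": "color", "red": "color", "white": "color",
       "green": "color", "yellow": "color", "diplomat": "job"}

# Some remaining pieces: (left value, right value, column offset between them).
PIECES = {
    "Rule 1": ("English", "red", 0),
    "Rule 4": ("Italian", "tea", 0),
    "Rule 6": ("white", "green", 1),
    "Rule 8": ("diplomat", "yellow", 0),
    "Rule 10": ("green", "coffee", 0),
}

def fits(value, house):
    """True if `value` may occupy `house` given the values already placed."""
    if value in placed:
        return placed[value] == house
    # The cell is free unless another value of the same row already sits there.
    return all(not (h == house and ROW[v] == ROW[value]) for v, h in placed.items())

def placements(piece):
    """Leftmost house of every legal position for a piece."""
    left, right, offset = piece
    return [h for h in range(5 - offset) if fits(left, h) and fits(right, h + offset)]

options = {name: placements(piece) for name, piece in PIECES.items()}
# Choice minimization: play the piece with the fewest legal placements.
best = min(options, key=lambda name: len(options[name]))
print(best, options[best])  # Rule 6 [2, 3]
```

The two leftmost positions for Rule 6, houses 2 and 3, are exactly the M+R and R+FR placements described above; every other piece still has three or four options.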

The FR column is the only place that has an unclaimed nation and color: Rule 1 must go there. Similarly, the FL column is the only available spot for Rule 8.


Here we can apply Rule 14 (the original clue’s wording, “The horse is in a house next to that of the diplomat”, means that the puzzle piece can be flipped horizontally).

After that, only column L can accommodate Rule 4. Then FR must accept Rule 12. 


Disaster! Consider Rules 2, 3, and 7. These rules are mutually exclusive (each shares at least one row with another), and they have overlapping domains (each of them can only fit in M or R).

[Figure: Einstein's Puzzle, Path A Paradox]

This is the pigeonhole principle: just as three pigeons cannot fit into two holes, these three pieces cannot fit into the two remaining columns. There is no way to reach a solution.

Does that mean the puzzle is unsolvable? No, it means we must explore other choices.

Solution: Path B

Let’s return to the other possible placement of Rule 6. Instead of putting it in M+R columns, we’ll put it in R+FR. Then, Rules 10, 1, 8, and 14 follow inevitably (each has precisely one choice).


Here we face another choice: do we put puzzle piece 4 in the left or right house? Let’s choose the right house. Then, Rule 12 and Rule 3 follow logically.


Alas! Another disaster. Rule 2 doesn’t fit. 😦

Solution: Path C

Retrace our steps! The last choice we made was Place(4, R). What if we place it in the left house instead?


To our delight, we now see that Path C is the only correct logical journey through our puzzle. The concluding steps are given below, and the desired quantities are shown in the “missing” tiles.


Recall the original questions:

Who owns a zebra (P5)? Whose favorite drink is mineral water (D5)?

Our symbol table can translate our answer:

The Japanese man (N3) owns the zebra, and the Norwegian (N5) drinks mineral water.


The above solution is nothing more than solving a 5×5 jigsaw puzzle. I suspect this technique will only become clear with practice. Go solve Einstein’s Riddle on your own, or one of its many variants!

For the solution above, it is helpful to review our search history. Remarkably, we faced only two choices in our solution. When one branch failed, we turned our attention to other branches. This strategy is known as backtracking (a form of recursive search), and will be the subject of another blog post.

[Figure: Einstein's Puzzle, Search History]

Many programming solutions exist for these kinds of problems. In practice, libraries can be used to write more concise solvers.
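For instance, a solver with early pruning fits comfortably in plain Python. The sketch below is one such approach, not the only one. It assumes houses are numbered 0 (far left) to 4 (far right), and `solve`, `next_to`, and `right_of` are illustrative names. It fixes one attribute row at a time and abandons a branch as soon as a rule fails, which is simple chronological backtracking; unlike the walkthrough above, it does not use choice minimization.

```python
from itertools import permutations

HOUSES = range(5)  # 0 = far left ... 4 = far right

def next_to(a, b):
    return abs(a - b) == 1

def right_of(a, b):
    """True if house a is immediately to the right of house b."""
    return a == b + 1

def solve():
    """Yield (zebra owner, water drinker) for every assignment satisfying all 14 rules.

    Each row is a permutation of houses, e.g. red == 2 means the red house is
    the middle one. Rules are checked as soon as their rows are assigned, so a
    failing branch is abandoned before deeper rows are enumerated.
    """
    for english, spaniard, japanese, italian, norwegian in permutations(HOUSES):
        if norwegian != 0:                                        # Rule 5
            continue
        for red, green, white, yellow, blue in permutations(HOUSES):
            if not (english == red and right_of(green, white)    # Rules 1, 6
                    and next_to(norwegian, blue)):               # Rule 11
                continue
            for tea, coffee, milk, oj, water in permutations(HOUSES):
                if not (italian == tea and milk == 2             # Rules 4, 9
                        and green == coffee):                    # Rule 10
                    continue
                for painter, photographer, diplomat, violinist, physician in permutations(HOUSES):
                    if not (japanese == painter and diplomat == yellow   # Rules 3, 8
                            and violinist == oj):                        # Rule 12
                        continue
                    for dog, snails, fox, horse, zebra in permutations(HOUSES):
                        if (spaniard == dog and photographer == snails      # Rules 2, 7
                                and next_to(fox, physician)                 # Rule 13
                                and next_to(horse, diplomat)):              # Rule 14
                            nation_in = {english: "English", spaniard: "Spaniard",
                                         japanese: "Japanese", italian: "Italian",
                                         norwegian: "Norwegian"}
                            yield nation_in[zebra], nation_in[water]

print(list(solve()))  # [('Japanese', 'Norwegian')]
```

Because the branches corresponding to Paths A and B dead-end, the generator yields exactly one assignment, matching the hand-derived answer.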

This kind of problem is called propositional satisfiability (SAT), or constraint programming (CP), although these two disciplines differ in subtle ways.
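To make the SAT framing concrete, here is a partial encoding sketch, with illustrative names: one Boolean variable per (value, house) pair, the “each value in exactly one house” and “each house holds exactly one value per row” constraints written as clauses, and Rule 1 as a biconditional. Clauses are lists of signed integers (a negative number means a negated variable), the convention of the DIMACS CNF format that most SAT solvers read. The other 13 rules and the call to an actual solver are omitted.

```python
from itertools import combinations

ROWS = {
    "nation": ["English", "Spaniard", "Japanese", "Italian", "Norwegian"],
    "color": ["red", "green", "white", "yellow", "blue"],
    "drink": ["tea", "coffee", "milk", "orange juice", "water"],
    "job": ["painter", "photographer", "diplomat", "violinist", "physician"],
    "pet": ["dog", "snails", "fox", "horse", "zebra"],
}
HOUSES = range(5)

# One Boolean variable per (value, house), numbered 1..125 as in DIMACS CNF.
pairs = [(v, h) for row in ROWS.values() for v in row for h in HOUSES]
var = {pair: i + 1 for i, pair in enumerate(pairs)}

def exactly_one(lits):
    """One at-least-one clause plus a pairwise at-most-one clause per pair."""
    return [list(lits)] + [[-a, -b] for a, b in combinations(lits, 2)]

clauses = []
for row in ROWS.values():
    for v in row:            # each value occupies exactly one house
        clauses += exactly_one([var[v, h] for h in HOUSES])
    for h in HOUSES:         # each house holds exactly one value from this row
        clauses += exactly_one([var[v, h] for v in row])

# Rule 1, "The English man lives in a red house": English@h <-> red@h.
for h in HOUSES:
    clauses += [[-var["English", h], var["red", h]],
                [var["English", h], -var["red", h]]]

print(len(var), "variables,", len(clauses), "clauses")  # 125 variables, 560 clauses
```

The pairwise at-most-one encoding is the simplest choice; it grows quadratically with the number of options, which is harmless at this size.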

As we will see next time, SAT problems are at the root of complexity theory and artificial intelligence. Until then.

Complementary Learning Systems

Part Of: Demystifying Memory sequence
Content Summary: 1000 words, 10 min read


Your brain is constantly keeping track of the world and your body. It represents these ever-changing environments by patterns of neural activation. Knowledge is not kept in the neurons themselves, but in the connections between neurons.

Sometimes, the brain will discover useful regularities in the environment, and store these patterns for later use. This is long-term memory. We shall concern ourselves with five kinds of long-term memory:

  1. Episodic: ability to remember events or episodes (e.g., dinner last Tuesday night)
  2. Semantic: ability to remember facts and concepts (e.g., hands have five fingers)
  3. Procedural: ability to develop skills (e.g., playing the piano).
  4. Behavioral: ability to remember stimulus-outcome pairs (e.g., bell means food)
  5. Emotional: ability to remember emotional information (e.g., she is always angry).

These memory systems are computed in different areas of the brain.

  1. Episodic memories are computed by the hippocampus
  2. Semantic memories are computed by the association neocortex
  3. Procedural memories are computed by the somatosensory neocortex
  4. Behavioral memories are computed by the basal ganglia
  5. Emotional memories are computed by the central amygdala

Only episodic and semantic memory are directly accessible to consciousness (i.e., working memory). The others are available only to the autonomous mind.

[Figure: CLS, Categories of Long-Term Memory]


We have previously described conscious experience as a mental movie. But, unlike a normal theater, consciousness has several screens, each playing a different sense modality: visual, auditory, and so on. Call this the multimodal movie.

Semantic memory comes in two forms: encyclopedic memory (abstract descriptions of events) and conceptual memory (concepts and their inter-relationships). Both abstractions are derived from the movie by removing redundant information.

[Figure: CLS, Episodic vs Semantic Memory]

Mind wandering is the tendency of animals to recall past experiences. But why does mind wandering resurrect the details of what was seen, heard, smelled, touched? Why not simply use the plot summary (encyclopedic memory) instead?

Why does episodic memory exist at all?


Henry Molaison was born on February 26, 1926. As a child, he suffered from epilepsy.

[Figure: CLS, Patient HM]

His doctors removed what they thought to be the source of the seizures: the hippocampus. After the surgery, Henry still recognized objects, was able to solve puzzles, and even retained the same IQ. He had a rich emotional life, and could learn new motor skills (e.g., mirror drawing). But he was completely incapable of forming new episodic memories. Henry (i.e., Patient HM) was locked in a 5-minute loop, never remembering prior events.

Let’s imagine different kinds of amnesia Henry might have experienced.

Scenario 1. Henry has no retrograde amnesia (old memories were unperturbed), but suffers severe anterograde amnesia (unable to create new memories). From this data, we might conclude that the hippocampus creates, but does not store, episodic memories.

[Figure: HM amnesia pattern, Scenario 1]

Scenario 2. Henry experiences both severe retrograde and anterograde amnesia. From this data, we might conclude that the hippocampus creates and stores episodic memories.

[Figure: HM amnesia pattern, Scenario 2]

Neither scenario actually happened. Instead, Henry experienced temporally graded retrograde amnesia:

[Figure: HM amnesia pattern, temporally graded retrograde amnesia]

This shows that, while the hippocampus creates and stores episodic memories, these memories are eventually copied elsewhere. This process is called consolidation. Hippocampal damage destroys memories that have not yet been consolidated.

But why should the brain copy memories? This seems inefficient. And why does this process take years, even decades?


The connectionist paradigm models the brain as a neural network. The AB-AC task illustrates a challenge for connectionism. It goes as follows:

You want to associate stimulus A with response B. For example, when you hear “chair”, you should say “map”. There are many such associations (Chair-Map, Book-Dog, Car-Idea). This is the AB list.

After you achieve 100% recall on the AB list, a new set of stimulus-response words is given: the AC list. You want to learn both. However, the AB and AC lists have the same stimuli paired with novel responses (e.g., Chair-Printer, Book-Flower, Car-Shirt).

How well do humans and connectionist models do on this task? Let’s find out! The following graphs begin after the AB list has been learned perfectly. The y-axis is percent correct; the x-axis is the number of exposures to the AC list.

[Figure: CLS, Catastrophic Interference (left: humans; right: connectionist model)]

Consider the left graph. The dotted line is AC recall over time: humans were able to learn the AC list. The solid line shows AB list performance. As humans learned AC associations, their AB performance suffered, dropping from 100% to 60%. This is moderate interference.

Consider the right graph. Dotted line shows that the model is able to learn the AC list, just like the human. But solid line shows that AB recall very quickly drops to 0%. This is catastrophic interference.

Catastrophic interference occurs when the AB list and AC list are learned separately (focused learning). But what if you learn them at the same time? More specifically, what if you train against a shuffled set of AB and AC associations (interleaved learning)?

[Figure: CLS, Interleaved vs Focused Learning]

On the left, focused learning (black squares) shows catastrophic interference against AB memories, as before. But interleaved learning (white dots) shows zero interference!

On the right, we see another consequence of interleaved learning: new memories are acquired much more slowly.


We are ready to put the puzzle together.

Catastrophic interference is an inevitable consequence of systems that employ highly-overlapping distributed representations, despite the fact that such systems have a number of highly desirable properties (e.g., the ability to perform generalization and inference).

This problem can be addressed by employing a structurally distinct system with complementary learning properties: sparse, non-overlapping representations that are highly robust to interference from subsequent learning. Such a sparse system by itself would be like an autistic savant: good at memorization but unable to perform everyday inferences. But when paired with the highly overlapping system, a much more versatile overall system can be achieved.
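The argument can be reproduced in a toy simulation. The sketch below is my own illustrative construction, not the model from [M95] (which used multilayer networks trained with backpropagation): a single-layer linear associator trained with the delta rule, with hypothetical helper names (`overlapping_code`, `sparse_code`, `focused_ab_then_ac`). It trains the AB list and then the AC list (focused learning) under two input codes: an overlapping code in which both lists reuse the same stimulus units, and a sparse conjunctive code that gives each (stimulus, list) pair its own unit.

```python
import random

def one_hot(k, size):
    v = [0.0] * size
    v[k] = 1.0
    return v

def overlapping_code(stim, lst, n):
    """Distributed code: stimulus units are shared by both lists, plus one
    context unit saying which list (0 = AB, 1 = AC) is being tested."""
    return one_hot(stim, n) + one_hot(lst, 2)

def sparse_code(stim, lst, n):
    """Sparse conjunctive code: each (stimulus, list) pair gets its own unit."""
    return one_hot(lst * n + stim, 2 * n)

def recall(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def train(W, pairs, lr=0.1, epochs=200, seed=0):
    """Delta-rule (LMS) training of the linear associator y = W x."""
    rng = random.Random(seed)
    pairs = list(pairs)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for x, target in pairs:
            y = recall(W, x)
            for row, t, out in zip(W, target, y):
                err = t - out
                for c, xi in enumerate(x):
                    row[c] += lr * err * xi

def accuracy(W, pairs):
    """Fraction of stimuli whose strongest output unit is the correct response."""
    hits = 0
    for x, target in pairs:
        y = recall(W, x)
        hits += max(range(len(y)), key=y.__getitem__) == target.index(1.0)
    return hits / len(pairs)

def focused_ab_then_ac(encode, n=5):
    """Return (AB accuracy before AC training, AB after, AC after)."""
    # Output units 0..n-1 are the B responses, n..2n-1 the C responses.
    W = [[0.0] * len(encode(0, 0, n)) for _ in range(2 * n)]
    ab = [(encode(i, 0, n), one_hot(i, 2 * n)) for i in range(n)]
    ac = [(encode(i, 1, n), one_hot(n + i, 2 * n)) for i in range(n)]
    train(W, ab)
    ab_before = accuracy(W, ab)
    train(W, ac)
    return ab_before, accuracy(W, ab), accuracy(W, ac)

print("overlapping:", focused_ab_then_ac(overlapping_code))
print("sparse:     ", focused_ab_then_ac(sparse_code))
```

With the overlapping code, AC training drags the shared stimulus weights toward the C responses and AB recall collapses. With the sparse code, AC training never touches the AB weights, so AB recall survives intact. The same property is why the sparse system cannot generalize on its own: an unseen (stimulus, list) pair activates a unit with no learned weights.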

The neocortex and hippocampus comprise these learning systems:

[Figure: CLS, Two Component Model]

First introduced in 1995, Complementary Learning Systems (CLS) theory accounts for a wide range of extant biological, neuropsychological, and behavioral data. It explains why the hippocampus exists, why it performs consolidation, and why consolidation takes years to complete.

The CLS theory was first presented in [M95], which is also the source of the interference data above. The concluding discussion quotes liberally from [O11].

  • [M95] McClelland et al (1995). Why There Are Complementary Learning Systems in the Hippocampus and Neocortex: Insights From the Successes and Failures of Connectionist Models of Learning and Memory
  • [O11] O’Reilly et al (2011). Complementary Learning Systems

The Argumentative Theory of Reason

Part Of: Demystifying Language sequence
Content Summary: 1200 words, 12 min read.

The Structure of Reason

Learning is the construction of beliefs from experience. Conversely, inference predicts experience given those beliefs.

Reasoning refers to the linguistic production and evaluation of arguments. Learning and inference are ubiquitous across animal species. But only one species is capable of reasoning: human beings.

Arguments can be understood through the lens of deductive logic. Logical syllogisms are a calculus that maps premises to conclusions. An argument is valid if its conclusion follows from its premises. An argument is sound if it is valid and its premises are true.
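Validity is a purely formal property, so for propositional arguments it can be checked mechanically with a truth table. The sketch below is illustrative (the function names and the two example argument forms are my choices): it searches for an assignment that makes every premise true and the conclusion false. Soundness cannot be checked this way, because it also requires the premises to actually be true.

```python
from itertools import product

def implies(a, b):
    """Material conditional: false only when a is true and b is false."""
    return (not a) or b

def is_valid(premises, conclusion, n_vars):
    """An argument is valid iff no truth assignment makes every premise
    true while the conclusion is false (a brute-force truth table)."""
    for env in product([False, True], repeat=n_vars):
        if all(p(*env) for p in premises) and not conclusion(*env):
            return False  # counterexample found
    return True

# Modus ponens: If P then Q. P. Therefore Q.
modus_ponens = is_valid([lambda p, q: implies(p, q), lambda p, q: p],
                        lambda p, q: q, n_vars=2)
# Affirming the consequent: If P then Q. Q. Therefore P.
affirming_consequent = is_valid([lambda p, q: implies(p, q), lambda p, q: q],
                                lambda p, q: p, n_vars=2)
print(modus_ponens, affirming_consequent)  # True False
```

The invalid form fails on the assignment P = False, Q = True, where both premises hold but the conclusion does not.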

Premises can be evaluated directly via intuition. The relationship between argument structure and intuition parallels decision trees versus evaluative functions.

Two Theories of Reason

Why did reasoning evolve? What is its biological purpose? Consider the following theories:

  1. Epistemic theory: reasoning is an extension of our individual cognitive powers.
  2. Argumentative theory: reasoning is a device for social communication.

One way to adjudicate between these rival theories is to examine domain gradients. Roughly, a biological mechanism performs optimally when situated in the contexts for which it was originally designed. Our cravings for sugars and fats mislead us today, but they encouraged optimal foraging in the Pleistocene epoch.

Reasoning is used in both individual and social contexts. But our theories disagree on which is the original domain. Thus, they generate opposing predictions as to which context will elicit the most robust performance.

[Figure: Argumentative Reason, Domain Gradients]

Here we see our first direct confirmation of the argumentative theory: in practice, people are terrible at reasoning in individual contexts. Their reasoning skills become vibrant only when placed in social contexts. It’s a bit like Kevin Malone doing mental math. 🙂

Structure of Argumentative Reason

All languages ever discovered contain both nouns and verbs. This universal distinction reflects the brain’s perception-action dichotomy. Nouns express perceptual concepts, and verbs express action concepts.

Recall that natural language has two processes: speech production & speech comprehension. These functions both accept nouns and verbs as arguments. Thus, we can express the cybernetics of language as follows:
[Figure: Argumentative Reason, Cybernetics of Language]

Argumentative reasoning is a social extension of the faculty of language. It consists of two processes:

  1. Persuasion deals with arguments to support beliefs. 
  2. Justification deals with reasons to justify our actions.

Persuasion and justification draw on perceptual and action concepts, respectively. Thus, the persuasion-justification distinction mirrors the noun-verb distinction, but at a higher level of abstraction. Here is our cybernetics of reasoning diagram.

[Figure: Argumentative Reason, Cybernetics of Reason]

We return to phylogeny. Why did reasoning-as-argumentation evolve?

For communication to persist, it must benefit both senders and receivers. But stability is often threatened by senders who seek to manipulate receivers. We know that humans are gullible by default. Nevertheless, our species does possess lie detection devices. 

The evolution of argumentative reason was shaped by ecological pressures similar to those that shaped language. I will cover these hypotheses in another post.

For now, it helps to think of beliefs as clothes, serving both pragmatic and social functions. A wide swathe of biases stems from persuasive arguments serving social rather than epistemic ends. This is not to say that truth is irrelevant to reasoning; it is simply not always the dominant factor.

On Persuasion

Persuasion involves arguments about beliefs. It has two subprocesses: argument production (persuading listeners) and argument evaluation (inspecting argument quality). These two processes are locked in an evolutionary arms race, each developing ever more sophisticated mechanisms to defeat the other.

Argument production is responsible for the two most damning biases in the human repertoire. There is extensive evidence that we are subject to confirmation bias: the habit of preferentially attending to evidence that supports our case. We are also victims of motivated reasoning, which biases our judgments toward our self-interest. We often describe instances of motivated reasoning as hypocrisy.

Consider the following example:

There are two tasks: one short and pleasant, the other long and unpleasant. Selectors are asked to choose their own task, knowing that the other task will be given to another participant (the Receiver). Once both have completed their tasks, each participant states how fair the Selector has been. It is then possible to compare the fairness ratings of Selectors with those of Receivers.

On average, Selectors rate their decisions as fairer than Receivers do. However, when participants are distracted while giving their fairness judgments, the two groups’ ratings are identical and show no hint of hypocrisy. If reasoning were not the cause of motivated reasoning but the cure for it, the opposite would be expected.
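The key comparison in this design reduces to a single number: the mean fairness rating given by Selectors minus the mean given by Receivers. A positive gap is the hypocrisy effect. The distraction result corresponds to a gap of roughly zero. The Python sketch below assumes ratings on an arbitrary numeric scale. The ratings in it are made up for illustration and are not data from the study.

```python
from statistics import mean


def hypocrisy_gap(selector_ratings, receiver_ratings):
    """Mean fairness rating from Selectors minus mean rating from Receivers.

    A positive value means Selectors judge their own allocation as fairer
    than the people on the receiving end do.
    """
    return mean(selector_ratings) - mean(receiver_ratings)


# Made-up ratings (illustration only, not experimental data).
undistracted = hypocrisy_gap([5, 6, 5, 6], [3, 4, 3, 4])  # 5.5 - 3.5 = 2.0
distracted = hypocrisy_gap([4, 4, 3, 5], [4, 3, 5, 4])    # 4 - 4 = 0
```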

In contrast to production, argument evaluation involves two subprocesses: trust calibration and coherence checking. The ability to distrust malevolent informants has been shown to develop in stages between the ages of 3 and 6.

Coherence checking is less self-serving than argument production. In fact, it is responsible for the phenomenon known as “truth wins”. For example, in group puzzles, whoever stumbles on the solution will successfully persuade her peers, regardless of her social standing. In practice, good arguments tend to be more persuasive than bad ones.

On Justification

Justification processes involve reasons about behavior. These are not to be confused with the motivations for behavior, which operate at the subconscious level. In fact, there is evidence to suggest that the reasons we acquire by introspection are often not the true causes of our behavior. It has been consistently observed that attitudes based on reasons are much less predictive of future behaviors (and often not predictive at all) than attitudes stated without recourse to reasons.

The justification module produces reason-based choice; that is, we tend to choose behaviors that are easy to justify to our peers. Reason-based choice explains an impressive number of documented human biases. For example,

The sunk cost fallacy is the tendency to continue an endeavor because of the investment already made in it, even when continuing is no longer worthwhile. It doesn’t occur in children or non-human animals. If reasoning were not the cause of this phenomenon but the cure for it, the opposite would be expected.
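This counts as a fallacy because past investment cannot be recovered, so a rational choice should weigh only future costs and benefits. The Python sketch below contrasts that rule with one hypothetical way of modeling the sunk cost effect, in which money already spent counts as a reason to continue. The `weight` parameter and all numbers are illustrative assumptions, not estimates from any study.

```python
def continue_normative(future_benefit: float, remaining_cost: float) -> bool:
    """Continue only if what is still to be gained exceeds what is still to be paid."""
    return future_benefit > remaining_cost


def continue_sunk_cost(future_benefit: float, remaining_cost: float,
                       already_spent: float, weight: float = 1.0) -> bool:
    """Hypothetical sunk-cost rule: past spending (scaled by `weight`) is
    wrongly treated as extra value in favor of continuing."""
    return future_benefit + weight * already_spent > remaining_cost


# Illustrative project: 100 already spent, 60 more needed, worth 40 if finished.
print(continue_normative(40, 60))       # False: abandon the project
print(continue_sunk_cost(40, 60, 100))  # True: 40 + 100 > 60, so keep going
```

On the reason-based account, the extra term could be read as the easily voiced justification “I’ve already invested too much to quit”. Setting `weight=0` recovers the normative rule.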

The disjunction effect, endowment effect, and decoy effect can similarly be explained in terms of reason-based choice.

This is not to say that justification is insensitive to the truth. Better decisions are usually easier to justify. But when a more easily justifiable decision is not a good one, reasoning still drives us towards ease of justification.

Theory Evaluation

I was initially skeptical of the argumentative theory because it felt “fashionable” in precisely the wrong sense, underwritten by postmodern connotations of narrative-is-everything and epistemic nihilism. Another warning flag is that the theory draws on social psychology, a field that has been quite vulnerable to the replication crisis.

However, the evidential weight in favor of the argumentative theory has recently persuaded me. For a comprehensive review of that evidence, see [MS11]. I no longer believe argumentative reason entails epistemic nihilism, and I predict its evidential basis will not erode substantially in the coming decades.

I am also attracted to the theory because it helps tie several other theories together into a comprehensive meta-theory: The Tripartite Mind. Let me sketch just one example of this appeal.

The heuristics and biases literature has uncovered a bewildering variety of errors, shortcuts, and idiosyncrasies in human cognition. Responses to this literature vary widely. But too many voices treat such biases as “conceptual atoms”, that is, as fundamental facts about the human brain. Neuroscience can and must identify the mechanisms underlying these phenomena.

The argumentative theory is attractive in that it explains a wide swathe of this zoo of biases.

Argumentative Reason- Bias Explanation (1)


Reason is not a profoundly flawed general mechanism. Instead, it is an efficient linguistic device adapted to a certain type of social interaction.


[MS11]. Mercier & Sperber (2011). Why do humans reason? Arguments for an argumentative theory.

The Social Behavior Network

Part Of: Affective Neuroscience sequence
Content Summary: 800 words, 8 min read

Primary Emotion

There are many possible emotions. How can we make sense of this diversity?

Primary emotions are often used to shed light on our emotional lives. Like primary colors, these emotions blend together to reconstitute the full spectrum of emotional experience. For example, contempt is viewed as a combination of anger and disgust.

An emotion qualifies as primary if it satisfies the following criteria:

  1. Unique Machinery. It must be localized to specific neural processes.
  2. Known Signature. It must have a fixed set of phenomenological and behavioral expressions.
  3. Universal (Pre-Cultural). It must be expressed in all members of a given species. For ecologically valid stimuli, the response does not detract from overall fitness.
  4. Primitive (Pre-Cognitive). It must be activated more strongly during early development or immediate crisis (i.e., with minimal cognitive regulation).
  5. Differentiable. It must be dissociable from other primary emotions.
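The criteria are conjunctive, so a candidate qualifies only if it meets all five. A minimal Python sketch of that checklist follows. Treating each criterion as a yes/no flag is a simplifying assumption; in practice, each one is a matter of accumulating evidence. The class and field names are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class CandidateEmotion:
    name: str
    unique_machinery: bool  # 1. localized to specific neural processes
    known_signature: bool   # 2. fixed phenomenological and behavioral expressions
    universal: bool         # 3. expressed in all members of the species
    primitive: bool         # 4. strongest in early development or immediate crisis
    differentiable: bool    # 5. dissociable from other primary emotions

    def unmet_criteria(self) -> list[str]:
        """Names of the criteria this candidate fails."""
        return [f.name for f in fields(self)
                if f.name != "name" and not getattr(self, f.name)]

    def is_primary(self) -> bool:
        """A candidate is primary only if no criterion is unmet."""
        return not self.unmet_criteria()


# Hypothetical candidate that fails one criterion.
candidate = CandidateEmotion("example", True, True, True, False, True)
print(candidate.is_primary())      # False
print(candidate.unmet_criteria())  # ['primitive']
```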

Although there is consensus about these criteria, there is less agreement on which emotions deserve membership. Here are three representative lists.

SBN- Theories of Primary Emotions (4)

The Social Behavior Network (SBN)

Neuroscientists studying aggression have identified six brain regions that seem to produce this behavior. They are:

  1. Preoptic Area of the Hypothalamus (PO)
  2. Anterior Nucleus of the Hypothalamus (AH)
  3. Ventromedial Nucleus of the Hypothalamus (VMH)
  4. Periaqueductal Gray (PAG)
  5. Lateral Septum (LS)
  6. Extended Amygdala (extAMY)

If any of these regions are damaged, an animal often becomes less aggressive. If you electrically stimulate these regions, the animal becomes enraged.

What is interesting about these six regions is that they were independently discovered by other neuroscientists who labelled them as the seat of parental care.

… AND, by yet other neuroscientists who had been investigating the neural basis of sexual behavior.

What do { Parental Care, Aggression, Sexual Behavior } have in common? They are entirely directed at members of one’s own species. These primary emotions are deeply related to animal social behavior.

Since the six nuclei { PO, AH, VMH, PAG, LS, extAMY } contribute to each of these three emotions & behaviors, they are now called the social behavior network (SBN). 

SBN- Overview
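The naming logic amounts to a set intersection: the SBN is whatever the aggression, parental care, and sexual behavior literatures independently converged on. In this Python sketch, each set holds the six nuclei listed above, so the intersection is all six. The variable names are illustrative. With other data the sets need not coincide, and the intersection would then pick out only the shared core.

```python
# Regions implicated by three independent research programs (per the text).
aggression = {"PO", "AH", "VMH", "PAG", "LS", "extAMY"}
parental_care = {"PO", "AH", "VMH", "PAG", "LS", "extAMY"}
sexual_behavior = {"PO", "AH", "VMH", "PAG", "LS", "extAMY"}

# The social behavior network: regions shared by all three behaviors.
social_behavior_network = aggression & parental_care & sexual_behavior

print(sorted(social_behavior_network))
```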

Will it turn out that all social primary emotions are created by the SBN? I don’t know. It is suggestive, however, that Play has been partially localized to the lateral septum (LS).

SBN and Emotion Selection

The SBN is one brain structure that can produce three distinct emotional responses. How is this possible? How does each emotion individuate itself within a single apparatus?

To proceed, we consult our “theorizing roadmap”:

SBN- Principles of Structure Function

Conceptually, we are plagued by “too many emotions”. Thus, we can either:

  1. Examine whether our three emotions can be unified; or
  2. Look for granularity within the SBN

Since the former is impractical, let’s look more carefully at the SBN.

One way to explain emotion individuation is the shape hypothesis. If the firing intensity of each neuron is represented as height, the SBN’s activity forms a landscape, and different topographies might encode different emotions.

SBN- Emotion Differentiation Shape Hypothesis

Another is the granularity hypothesis. It posits that there may be, e.g., three subdivisions of the lateral septum, each supporting a different emotion.

I tend to find this approach more plausible, given my experience with other subcortical structures. That said, time will tell. 🙂
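The two hypotheses imply different ways of reading an emotion out of SBN activity. The Python sketch below decodes an emotion under each scheme. Everything in it is hypothetical: the four recording sites, the template values, the subdivision labels `LS-1` to `LS-3`, and the choice of Euclidean distance as the matching rule.

```python
import math

# Shape hypothesis: one shared population; each emotion is a different
# activity landscape over the same recording sites (hypothetical values).
SHAPE_TEMPLATES = {
    "aggression":      [0.9, 0.2, 0.8, 0.1],
    "parental care":   [0.2, 0.9, 0.1, 0.8],
    "sexual behavior": [0.5, 0.5, 0.9, 0.9],
}


def decode_by_shape(activity: list[float]) -> str:
    """Return the emotion whose template is nearest (Euclidean) to the activity."""
    return min(SHAPE_TEMPLATES,
               key=lambda emotion: math.dist(activity, SHAPE_TEMPLATES[emotion]))


# Granularity hypothesis: each emotion has its own subdivision
# (hypothetical labels for three subdivisions of the lateral septum).
SUBDIVISION_TO_EMOTION = {
    "LS-1": "aggression",
    "LS-2": "parental care",
    "LS-3": "sexual behavior",
}


def decode_by_granularity(activity_by_subdivision: dict[str, float]) -> str:
    """Return the emotion served by the most active subdivision."""
    most_active = max(activity_by_subdivision, key=activity_by_subdivision.get)
    return SUBDIVISION_TO_EMOTION[most_active]


print(decode_by_shape([0.85, 0.25, 0.75, 0.15]))                      # aggression
print(decode_by_granularity({"LS-1": 0.1, "LS-2": 0.7, "LS-3": 0.3}))  # parental care
```

The difference shows up under damage. In the granularity sketch, removing one subdivision removes exactly one emotion. In the shape sketch, every emotion’s template spans every site.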

Relation To The Basal Ganglia

The SBN is anatomically related to the basal ganglia. Recall that the basal ganglia has three loops: Associative, Sensorimotor, and Limbic. The SBN is strongly connected to, and shares two nodes with, the Limbic Loop.

SBN- SBN vs Limbic Loop (2)

As we have seen, the basal ganglia is the seat of motivation. The anatomical connection between SBN and basal ganglia mirrors the behavioral link between sociality and motivation. However, on a mathematical level, it is less clear how social emotions can be incorporated into the reinforcement learning apparatus:

SBN- Application to Neuroeconomics

Evolution of Emotion

Let’s use comparative anatomy to discover when the social behavior network evolved. By dissecting brains from five representative species, we can infer that the basal ganglia dates back to at least the origin of ray-finned fish.

SBN- Phylogeny (1)

The SBN nuclei are preserved across our representative species:

SBN- Comparative Anatomy

And the hodology (connections) between SBN nuclei is preserved:

SBN- Comparative Hodology

This evidence demonstrates that the social behavior network has been around since the invention of vertebrates. It also raises important questions, such as:

  • How has the SBN changed to support hyper-social animals like primates?
  • How much further back do emotional adaptations go? Do insects feel emotions? If yes, which kinds?

Until next time.

Related Works

  • Newman (1999). The Medial Extended Amygdala in Male Reproductive Behavior: A Node in the Mammalian Social Behavior Network
  • O’Connell & Hofmann (2011). The vertebrate mesolimbic reward system and social behavior network: a comparative synthesis