Confabulation: saying more than we can know

Part Of: Demystifying Sociality sequence
Content Summary: 1500 words, 15min read

Anosognosia

It is unfortunate to experience illness. It is strange to fail to recognize illness within oneself. Anosognosia is the name for this inability. A few examples:

Example 1. In a letter to his friend Lucilius, Seneca (c. 64 CE) described a woman who obstinately denied her blindness: “…You know that Harpestes, my wife’s fatuous companion, has remained in my home as an inherited burden… This foolish woman has suddenly lost her sight. Incredible as it might appear, what I am going to tell you is true: She does not know she is blind. Therefore, again and again she asks her guardian to take her elsewhere because she claims that my home is dark… It is difficult to recover from a disease if you do not know to be ill…”

Example 2. After a right-hemisphere stroke, one patient lost movement in her left arm but continually denied it. When the doctor asked her to move her arm and she observed it not moving, she claimed that it wasn’t actually her arm; it was her daughter’s. Why was her daughter’s arm attached to her shoulder? The patient claimed her daughter had been there in the bed with her all week. Why was her wedding ring on her daughter’s hand? The patient said her daughter had borrowed it. Where was the patient’s arm? The patient “turned her head and searched in a bemused way over her left shoulder”.

Spend enough time with these patients, and it becomes clear that their problem is not cognitive dissonance. No, the delusion has a much deeper, subterranean hold on their mental lives. These patients freely generate explanations for their illness-related behavior (“I can’t walk around because the house is dark”, “The unmoving arm isn’t mine, it is my daughter’s”). These explanations are not examples of dishonesty. They are genuine perceptions of a misfiring mind. The word for these honest lies is confabulation.

[Figure: confabulation compared to dishonesty]

If you’re anything like me, you’ll find such epistemic fences a bit unsettling. Is it possible our entire species is entertaining a similar delusion that increases biological fitness? Do we actually have four fingers but are collectively convinced that little fingers exist?

Split Brain Patients

The vertebrate brain has two hemispheres. Some neural functions are bilateral: visual processing occurs in both the right and left hemispheres (each hemisphere processes the opposite half of the visual field). Other functions are unilateral: language processing is usually left-lateralized (the exceptions tend to be left-handed). The advantages & disadvantages of lateralized brain function are an active research area.

In neurotypical animals, transverse fibers (commissures) integrate information between the hemispheres. The corpus callosum is by far the dominant bridge between hemispheres:

  • Corpus Callosum: 250 million fibers
  • Anterior commissure: 0.5 million fibers
  • Posterior commissure: 0.5 million fibers
  • Habenular commissure: 0.1 million fibers

Split brain patients are those that have had their corpus callosum severed. These patients tend to exhibit selfhood fracturing: each hemisphere constitutes a largely autonomous entity with its own beliefs and desires.

Present the left hemisphere with a picture of a chicken claw, and the right with a picture of a wintry scene. Now show the patient an array of cards with pictures of objects on them, and ask them to point (with each hand) to something related to what they saw. The hand controlled by the left hemisphere points to a chicken, the hand controlled by the right hemisphere points to a snow shovel. So far so good.

But what happens when you ask the patient to explain why they pointed to those objects in particular? The left hemisphere is in control of the verbal apparatus. It knows that it saw a chicken claw, and it knows that it pointed at the picture of the chicken, and that the hand controlled by the other hemisphere pointed at the picture of a shovel. Asked to explain this, it comes up with the explanation that the shovel is for cleaning up after the chicken. While the right hemisphere knows about the snowy scene, it doesn’t control the verbal apparatus and can’t communicate directly with the left hemisphere, so this doesn’t affect the reply. The patient instead confabulates.

What did “the patient” think was going on? This is a wrong question. Once you know what the left hemisphere believes, what the right hemisphere believes, and how this influences organism behavior, then you know all that there is to know.

Gazzaniga has described this propensity of patients to confabulate reasons for the behavior of the right brain as the left-brain apologist. The left hemisphere functions as an interpreter, a lawyer, a press secretary: it justifies behavior to make the organism look good. V. S. Ramachandran, drawing on observations that right-brain lesions disproportionately produce delusions, posits a right-brain revolutionary. On this view, anosognosia is caused by the failure of some module in the right hemisphere, which leaves the left-brain apologist unchecked: confabulation exacerbated by delusion.

Confabulation in Neurotypicals

We have so far explored confabulation in patients with brain damage. Do neurotypical, everyday people produce “honest lies”?

We confabulate all the time. We just don’t realize that we are.

In Telling More Than We Can Know: Verbal Reports on Mental Processes, Nisbett & Wilson (1977) review hundreds of studies across dozens of disciplines. A theme emerges from their evidence: people’s attempts to explain their behavior are almost always unhelpful in identifying the important factors influencing their decisions. Let me briefly review four example findings.

Study 1: Insufficient Justification.

Zimbardo et al. (1969) ask participants to accept a series of painful shocks while performing a learning task. Participants were split into two groups:

  • Adequate Justification (“nothing will be learned unless shocks administered again”)
  • Inadequate Justification (“I’m curious to see what happens”)

Who suffers less?

→ The Inadequate Justification group. This group learns much more quickly, and shows lower galvanic skin response (less “fight or flight”).

Why do they suffer less?

→ These people were given a poor justification for continuing, and yet they continued anyway. To explain their own behavior, they generate intrinsic motivation for continuing. (As an aside, this phenomenon is similar to the overjustification effect).

Do they know that they suffer less?

→ No! Subjective reports of pain were the same across groups.

Study 2: Attribution Effect

Storms & Nisbett (1970) ask insomnia-suffering participants to sleep under observation. Participants were split into two groups:

  • Arousal Attribution: given a placebo pill, described as causing restlessness and alertness
  • Control: no placebo administered

Who falls asleep more quickly?

→ Arousal Attribution group (28% faster).

Why do they fall asleep more quickly?

→ Attribution of restlessness to placebo, rather than cognitive factors.

Do they know why they fall asleep more quickly?

→ No! Even after the experiment was explained to them, more than 80% of participants would not attribute their improved sleep to the pill.

Study 3: Counterattitudinal Advocacy

Bem & McConnell (1970) ask participants for their view on a political topic, then ask them to write an essay against that view. Participants were split into two groups:

  • Coercion: bribed to write the essay
  • Freedom: led to believe they had a choice

Who changes their position after writing the essay?

→ Freedom group.

Why do they change?

→ Difficult to explain writing that essay, unless they wanted to.

Do they know that they changed their position?

→ No! In contrast to the Coercion group which had accurate memories, those whose opinions had changed failed to remember their previous position.

Study 4: Choice Blindness

Johansson et al. (2005) ask participants to evaluate which of two female faces is more attractive. Researchers then hand participants the face they had chosen, asking them to explain the motives behind their choice. Participants were split into two groups:

  • Switch: a sleight-of-hand trick switches the photos, showing participants the face they had not chosen.
  • Control: shown the face they had chosen.

Does the Switch group notice the change?

→ Most don’t. About ⅔ of participants accepted the switched face as the one they had chosen.

Could those who didn’t notice explain their (non-)choice?

→ Without missing a step. They happily explained why they preferred the face they had actually rejected, inventing reasons like “I like her smile” even though they had actually chosen the solemn-faced picture.

Putting It All Together

Confabulation is “honest lying”: communicating an untruth, while earnestly believing in its veracity.

  • Anosognosia patients cannot admit that they are paralyzed. When asked to explain their inability to move, they confabulate answers.
  • Split brain patients similarly confabulate explanations for the behavior of the non-linguistic right hemisphere.
  • Confabulation is not merely a medical curiosity. Confabulation is everywhere: most self-reports are utterly useless. Some evidence includes:
    1. Insufficient Justification: people didn’t notice when they were suffering less
    2. Attribution Effect: people failed to understand the reason why they slept better
    3. Counterattitudinal Advocacy: after people change their minds, they fail to remember they ever thought differently
    4. Choice Blindness: once tricked into thinking they chose something different, people are happy to explain their reasons.

[Figure: overview of the evidence for confabulation]

Why do human beings confabulate so often? How can we be such utter strangers to ourselves?  We shall explore these questions next time. Until then!

The Construction of Body Status

Part Of: Neuroeconomics sequence
Content Summary: 800 words, 8 min read

Connection To Philosophy of Well-being

What is well-being?

Philosophers have put forward three theories.

  • Hedonic Theory. Well-being is experiencing pleasure.
  • Desire Fulfillment Theory. Well-being is achieving your goals.
  • Objective List Theory. Well-being is living an objectively good life.

In this post, we ask “does the brain have any incentive to compute biological measures of well-being? If so, what would this data structure be used for?”

Well-being is Body Status

Everyone agrees that the following are true about well-being:

  1. Well-being is sensitive to variables of body status. Instantaneous well-being is less if an animal is in pain, other things being equal.
  2. Well-being responds to many divergent factors (e.g., both pain and hunger reduce instantaneous well-being).

But there is only one biological apparatus that satisfies these properties:

Proposition 1. Well-being is body status, constructed by regulatory processes.

In 1925, Walter Cannon formulated the concept of homeostasis: the body strives to maintain internal variables essential for life. For example, the body measures its own temperature. If it is too hot or too cold, a negative feedback process initiates actions to bring the variable back to its optimal value.
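A minimal sketch of such a negative feedback loop, assuming a simple proportional correction. The setpoint of 37.0 degrees and the gain of 0.5 are illustrative numbers chosen for readability, not physiological measurements, and real thermoregulation involves many interacting mechanisms.

```python
def regulate(temperature: float, setpoint: float = 37.0, gain: float = 0.5,
             steps: int = 20) -> list:
    """Simulate a proportional negative-feedback loop (illustrative only).

    Each step applies a correction opposite in sign to the current error,
    so the error shrinks by a factor of (1 - gain) per step.
    """
    history = [temperature]
    for _ in range(steps):
        error = temperature - setpoint   # positive if too hot, negative if too cold
        temperature -= gain * error      # act against the deviation
        history.append(temperature)
    return history


trajectory = regulate(34.0)  # start 3 degrees too cold
print(trajectory[:4])        # [34.0, 35.5, 36.25, 36.625]
```

The key design choice is the sign: the correction always opposes the error, which is what makes the feedback negative and drives the variable back toward its setpoint.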

[Figure: homeostasis]

The body tracks many more variables besides body temperature. These variables together constitute a representation I will call body status:

[Figure: body status of a healthy organism]

Body status representations play a key role in the biological construction of personal identity and subjectivity. We will return to this topic at another time.

Desire from Body Status

Markov Decision Processes (MDPs) are a lens through which we can interpret behavior. An MDP contains states, actions, and a reward signal. The organism selects a policy \pi such that the states it encounters yield the maximum expected cumulative reward.

[Figure: MDP basics]

Within the brain, the basal ganglia implements two data structures which together generate motivation:

  • A policy 𝝅 which maps states to actions, S → A.
  • A value function V(s) which represents expected future reward from state s.
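To make these two data structures concrete, here is a minimal value-iteration sketch on a toy two-state MDP. The states, actions, rewards, and discount factor are all hypothetical, chosen only to show how V(s) and a policy S → A can be computed from a reward signal; this is standard reinforcement-learning machinery, not a claim about how the basal ganglia implements it.

```python
# Toy deterministic MDP; every name and number here is hypothetical.
STATES = ["cold", "warm"]
ACTIONS = ["rest", "seek_shelter"]
# NEXT[s][a] is the state reached by taking action a in state s.
NEXT = {
    "cold": {"rest": "cold", "seek_shelter": "warm"},
    "warm": {"rest": "warm", "seek_shelter": "warm"},
}
REWARD = {"cold": 0.0, "warm": 1.0}  # reward received while in each state
GAMMA = 0.9                           # discount factor for future reward


def value_iteration(tol: float = 1e-9):
    """Return the value function V(s) and a greedy policy pi: S -> A."""
    V = {s: 0.0 for s in STATES}
    while True:
        # Bellman optimality update: V(s) = R(s) + gamma * max_a V(next(s, a)).
        new_V = {
            s: REWARD[s] + GAMMA * max(V[NEXT[s][a]] for a in ACTIONS)
            for s in STATES
        }
        delta = max(abs(new_V[s] - V[s]) for s in STATES)
        V = new_V
        if delta < tol:
            break
    # The policy picks, in each state, the action leading to the highest-valued next state.
    pi = {s: max(ACTIONS, key=lambda a: V[NEXT[s][a]]) for s in STATES}
    return V, pi


V, pi = value_iteration()
print({s: round(v, 3) for s, v in V.items()})  # {'cold': 9.0, 'warm': 10.0}
print(pi)
```

In the cold state the resulting policy chooses seek_shelter, because the warm state it leads to has the higher value.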

Reinforcement learning theory is silent on the biological substrate of the reward signal. But to us, the solution is clear:

Proposition 2.  Reward is derived from the body status representation.

[Figure: construction of the reward signal from body status]

This is one mechanism by which low body temperature is corrected. Body status deviations elicit a reward signal that prompts “cold” motor desires (e.g., shivering). In contrast, notice that “hot” visceral desires (e.g., blood vessel constriction) are constructed directly, not implemented by the basal ganglia.

Hedonics from Body Status

There are two liking systems in the brain:

  1. Hedonics is a global measure of pleasure and pain. It summarizes body state information.
  2. Valence is an object-specific judgment of value. Valence usually correlates with desire: we approach things that are pleasant, and avoid things that are unpleasant.

Yet drug addicts often reach a point where drug consumption is unpleasant, and still they pursue a fix. Wanting and liking are dissociable. Why? Because they are implemented by different neurochemical systems (phasic dopamine and opioids, respectively).

Body status is not only used to motivate behavior. In my view, it also tags perceptual data with information about its visceral relevance. This includes the two primary affective dimensions:

  • Object salience (“does this merit attention, further computation”)
  • Object valence (“is this safe to approach”)

[Figure: body status tags perceptual data for visceral relevance]

So we have arrived at our next thesis:

Proposition 3. Hedonics and valence are derived from body status representation.

Philosophers debate whether well-being is best attributed to pleasure/pain or desire. But body status is used to construct both of these phenomena. This gives us reason to believe that the philosophical theories of hedonism and desire fulfillment can be unified.

The Socialification of Body Status

Across the course of natural history, certain animals have become increasingly social, able to interact more meaningfully with their conspecifics.

Three important social adaptations were:

  • In mammals, social status. Animals track their standing in the group.
  • In primates, social inclusion. Group living made possible by e.g., exchange of favors.
  • In hominids, social reputation. A prosocial alternative to power, independent of the dominance hierarchy.

How might a biological organism introduce these new behavioral repertoires? A simple way to do it might be to extend body status to incorporate social variables of interest:

[Figure: socialification of body status]

Proposition 4. Body status was extended to support novel social behaviors.

This proposition lends a biological perspective on why social ostracization is so painful, eliciting physiological distress directly comparable to, e.g., evading predation.

This socialification hypothesis is more speculative than my other three propositions. How might we go about evaluating whether it is true?

Recall that body status is represented by an overlapping set of neurochemical networks, whose main connecting hub is the hypothalamus. If Proposition 4 is true, we would expect to find new chemical systems uniquely responsive to these proposed dimensions.

I suspect these connections will be established rather quickly. We already possess several extremely suggestive lines of evidence. See, for example, Hennessy et al. (2014), Sociality and sickness: have cytokines evolved to serve social functions beyond times of pathogen exposure?

Takeaways

Today, I presented the following ideas:

  • Proposition 1. Well-being is body status, constructed by regulatory processes.
  • Proposition 2. Desire is derived from body status representation.
  • Proposition 3. Hedonics and valence are derived from body status representation.
  • Proposition 4. Body status was extended to support novel social behaviors.

Until next time.

Counting: The Fourfold Way

Part Of: Statistics, Algebra sequences
Content Summary: 1100 words, 11 min read

The Fundamental Principle of Counting

We often want to count the number of possible outcomes for multiple events.

Example 1. Consider purchasing a lunch with the following components:

  • Burger b \in \{ Chicken, Beef \}
  • Side s \in \{ Fries, Chips \}
  • Drink d \in \{ Fanta, Coke, Sprite \}

How many lunch outcomes are possible?

Three approaches to counting suggest themselves. We might make a list. But this process can be error prone. Other representations are more systematic: we can build a tree, or imagine a (hyper)-volume. Each strategy converges on the same answer: 12 possible lunches.
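Making the list by machine removes the error-prone part. A minimal sketch using Python's itertools.product, which walks every path through the tree (the component names are copied from Example 1):

```python
from itertools import product

burgers = ["Chicken", "Beef"]
sides = ["Fries", "Chips"]
drinks = ["Fanta", "Coke", "Sprite"]

# Each lunch is one (burger, side, drink) triple: one leaf of the tree.
lunches = list(product(burgers, sides, drinks))
print(len(lunches))  # 12
```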

[Figure: trees of events]

Can we generalize? Yes, with the help of the fundamental principle of counting (aka the rule of multiplication). For any event A with a possible outcomes, and another event B with b possible outcomes, the composite event (A, B) has a \times b possible outcomes.

How does counting work for a repeating event? If an event with n possibilities occurs k times, there are n^k possible outcomes.

Example 2. How many numbers can be represented by a byte (8 bits)?

Each bit has two possible assignments: zero or one. For eight such “bit events”, we have 2^8 = 256 possible outcomes.
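A brute-force check of Example 2: with n = 2 possible values per bit and k = 8 repetitions, enumerating every bit pattern gives n^k outcomes.

```python
from itertools import product

n, k = 2, 8  # n possible values per bit, k repetitions of the "bit event"
patterns = list(product(range(n), repeat=k))  # every assignment of 0/1 to 8 bits
print(len(patterns), n ** k)  # 256 256
```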

Permutations

Example 3. A trifecta bet guesses which horse will place first, which second, and which third.  How many such bets are possible in a 9-horse race?

Each medal has nine possible assignments. For three such “medal events”, we have 9^3 = 729 possible outcomes.

This answer is completely wrong. To understand why, consider the lottery machine:

[Figure: lottery (Powerball) machine]

  • In Example 2, a single value (e.g., 0) can freely be assigned to multiple bits. Every time you draw a bit from the “possibility machine”, it is replaced before the next bit is drawn. Sampling with replacement means that each event has exactly the same possibilities.
  • In Example 3, a single horse (e.g., Secretariat) cannot be assigned multiple medals. Every time you draw a horse from the “possibility machine”, it cannot be drawn for subsequent events. Sampling without replacement means that each event has diminishing numbers of possibilities.

Definition 4. A permutation is a list of outcomes drawn without replacement.

For the trifecta bet, how many permutations exist? Well, there are 9 different horses that can earn the gold. Given that one horse won the gold, there are 8 different horses that can earn the silver. Then there are 7 different horses that can earn bronze. Thus, there are 9 \times 8 \times 7 = 504 possible trifecta bets.
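The difference between the wrong answer and the right one can be checked by enumeration. In this sketch, itertools.product samples with replacement (one horse may take several medals), while itertools.permutations samples without replacement; horses are labeled 0 through 8 purely for illustration.

```python
from itertools import permutations, product

horses = range(9)
# With replacement: the naive 9^3 count lets one horse win several medals.
naive = list(product(horses, repeat=3))
# Without replacement: (gold, silver, bronze) must be three distinct horses.
bets = list(permutations(horses, 3))
print(len(naive), len(bets))  # 729 504
```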

[Figure: 9 perm 3]

Similar to how exponentiation is defined as repeated multiplication, a factorial is defined as repeated multiplication by successively smaller integers.

9! = 9 \times 8 \times 7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1

But we only want 9 \times 8 \times 7. How can we get rid of the other terms? By division, of course!

9 \times 8 \times 7 = \dfrac{9 \times 8 \times 7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1}{6 \times 5 \times 4 \times 3 \times 2 \times 1} = \dfrac{9!}{6!}

Why did we use the number 6? Because if three of our nine horses place, six do not place. So a more general way to write this equation,

\dfrac{9!}{(9-3)!}

More generally, if you have n items and want to find the number of ordered lists of k distinct items drawn from them:

Equation 5: Permutation. P(n, k) = \dfrac{n!}{(n-k)!}
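Equation 5 translates directly into code. A minimal sketch (the function name P simply mirrors the notation above); Python 3.8+ also ships math.perm, which computes the same quantity and is used here only as a cross-check.

```python
from math import factorial, perm


def P(n: int, k: int) -> int:
    """Ordered lists of k distinct items drawn from n items: n! / (n - k)!."""
    return factorial(n) // factorial(n - k)


print(P(9, 3), perm(9, 3))  # 504 504
print(P(4, 3))              # 24
```

Integer division is exact here, because (n - k)! always divides n!.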

Combinations

In contrast to permutations, for combinations, order doesn’t matter. A permutation is a list, a combination is a set.

A boxed trifecta bet requires correctly guessing which three horses will place first, second and third, in any order. A trifecta bet selects a permutation; a boxed trifecta bet selects a combination.

Imagine only four horses in the race. That’s P(4,3) = \dfrac{4!}{(4-3)!} = 24 possible trifecta bets. But how many boxed trifecta bets are possible?

Combinations treat duplicates as a single entry. For example, abc and acb are equivalent for a boxed trifecta bet. We can identify four groups, with six equivalent permutations each:

[Figure: 4 choose 3]

In general, how many winner duplicates exist? How many ways can we shuffle k winners? Well, if you have k winners and are wondering how many permutations exist for that entire set… that’s P(k,k)!

Equation 6: Combination. C(n, k) = \dfrac{P(n,k)}{P(k,k)} = \dfrac{P(n,k)}{k!} = \dfrac{n!}{(n-k)! k!}
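The four-horse example can be verified by collapsing each ordered bet into the set of horses it names. In this sketch the horse labels a through d are illustrative; math.comb (Python 3.8+) provides the closed form for comparison.

```python
from itertools import permutations
from math import comb, factorial

horses = "abcd"
trifectas = list(permutations(horses, 3))     # ordered bets: P(4, 3) = 24
# A boxed bet ignores order, so collapse each ordered bet to its set of horses.
boxed = {frozenset(bet) for bet in trifectas}
print(len(trifectas), len(boxed))             # 24 4
# Each boxed bet absorbs P(3, 3) = 3! = 6 orderings, as in Equation 6.
print(len(trifectas) // factorial(3), comb(4, 3))  # 4 4
```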

For an example of combinations used to solve a real problem, I recommend this post.

The Fourth Way: Stars and Bars

Example 7. You have k=3 identical cookies to give to n=4 kids a, b, c, d. How many possible ways are there to do so?

In the case of medals and horses, we claimed four solutions: \{ a, b, c\}, \{ a, b, d\}, \{ a, c, d\}, and \{ b, c, d\}. But there is an important difference: a horse cannot win multiple medals, but a child can receive multiple cookies! So we need to account for situations like \{ a, a, a\}, with one child getting all of the cookies.

We can use the traditional bins-as-containers metaphor to visualize outcomes (top row). Or we can instead visualize bin boundaries (bottom row). This visualization strategy is called stars and bars.

[Figure: stars and bars]

How many kid-cookie outcomes are possible? The answer becomes apparent only if we use stars and bars (bottom row). With 3 stars and 3 bars there are 6 slots, and every way of placing the 3 stars among those 6 slots produces a valid event. That is, \binom{6}{3} = 20.

How many outcomes are possible in general? There are k stars (cookies) and n bins (kids). Since bars represent bin boundaries, there are n-1 bars, giving n+k-1 slots, k of which hold stars. Thus:

Equation 8: Multi-Combination. \left(\!\!\binom{n}{k}\!\!\right) = \binom{n+(k-1)}{k}
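Two independent counts of Example 7 should agree with \binom{n+k-1}{k}: a direct enumeration of multisets of kids (itertools.combinations_with_replacement), and an enumeration of stars-and-bars diagrams, built here as strings where "*" is a cookie and "|" is a bin boundary. The string encoding is just an illustrative choice.

```python
from itertools import combinations, combinations_with_replacement
from math import comb

n, k = 4, 3  # n kids (bins), k identical cookies (stars)

# Direct enumeration: each outcome is a multiset of kids, e.g. ('a', 'a', 'a').
outcomes = list(combinations_with_replacement("abcd", k))

# Stars and bars: choose which k of the n + k - 1 slots hold stars; the rest are bars.
slots = n + k - 1
diagrams = ["".join("*" if i in stars else "|" for i in range(slots))
            for stars in combinations(range(slots), k)]

print(len(outcomes), len(diagrams), comb(slots, k))  # 20 20 20
print(diagrams[0])  # ***|||  (kid a gets all three cookies)
```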

The Fourfold Way

Every example we have seen differentiates possibilities and outcomes. We will use the metaphor of balls for outcomes (something concrete) and bins for possibilities (something to “clothe” outcomes).

[Figure: possibilities vs. outcomes]

Equation 9. An event is a function that maps outcomes to possibilities:

Event : Outcomes \rightarrow Possibilities

[Figure: events as functions]

This function can be compactly represented as bbd.

Functions require every element of the domain to map to some element of the codomain. Event functions therefore allow no unrealized outcomes. That is: every outcome manifests a possibility. Every ball is given a bin.

We saw previously that combinations and permutations don’t allow events like aab and ccc. A single horse cannot win multiple medals. Multiply-realized possibilities are not allowed.

Recall the definitions of injective, surjective and bijective functions. This requirement is the injective property. Sampling without replacement is the same thing as injectivity.

Tuples, permutations, combinations, and multi-combinations: this is the fourfold way. The Way can be made more general by counting situations where the possibilities are unlabeled, or where the event function must satisfy the surjective property. For details on the more complete twelvefold way, I recommend this post.
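All four ways can be recovered from one brute-force enumeration of event functions, following the balls-to-bins picture: a tuple counts every function, a permutation keeps only injective ones, and the two unordered ways additionally forget which ball went where. This is a sketch for small n and k only (it enumerates all n^k functions), and the helper names are hypothetical.

```python
from itertools import product
from math import comb, perm


def fourfold_counts(n: int, k: int) -> dict:
    """Brute-force count of events mapping k balls (outcomes) to n bins (possibilities).

    An event is a tuple: position i holds the bin assigned to ball i.
    """
    events = list(product(range(n), repeat=k))           # every function
    injective = [e for e in events if len(set(e)) == k]   # no bin used twice

    def unlabel(es):
        # Unlabeled balls: events differing only by which ball went where collapse.
        return {tuple(sorted(e)) for e in es}

    return {
        "tuple": len(events),
        "permutation": len(injective),
        "combination": len(unlabel(injective)),
        "multi-combination": len(unlabel(events)),
    }


counts = fourfold_counts(4, 3)
print(counts)
# {'tuple': 64, 'permutation': 24, 'combination': 4, 'multi-combination': 20}
```

Each brute-force count matches its closed form: n^k, P(n, k), C(n, k), and \binom{n+k-1}{k}.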

[Figure: the fourfold way]

Towards a Rosetta Stone

Now, consider all possible functions for 4 bins and 3 balls:

[Figure: Rosetta Stone, all functions for 4 bins and 3 balls]

What do the equations above have to do with this shape? Well, each way of counting corresponds with a different subset of this broader shape:

[Figure: shape of the Way]

I leave it to the interested reader to ponder how such a jagged shape can be represented by these four relatively clean formulae.

Until next time.

Related Resources

https://www.ece.utah.edu/eceCTools/Probability/Combinatorics/ProbCombEx15.pdf

https://wizardofodds.com/games/poker/

http://www.math.hawaii.edu/~ramsey/Probability/PokerHands.html

Jesus, disciple of John

Why do the Gospels care about John?

In the 20s CE, at least two prophets were active in the Israelite highlands: John the Baptist and Jesus of Nazareth. Both were killed on political grounds. Jesus left behind disciples that remained loyal to him in some sense. So did John. In fact, these two religious groups interacted (vied for influence?) after the deaths of their leaders.

Ultimately, John’s religious group died out; Jesus’ following did not. With the exception of Josephus and a few other secular sources, the Christian gospels are our best source of information on the religious climate of this time period.

These Christian gospels spend an astonishing amount of time describing John: both his independent ministry and his relationship with Jesus. John’s message, that a powerful Son of Man will judge the world, is interpreted by Christians as referring to Jesus.

Why should the gospels lavish John with such attention and theological import? Two hypotheses suggest themselves:

  1. The early Christians shared a broader Jewish respect for John’s ministry, and that reverence led to the attention & theological significance.
  2. The early Christians crafted the gospels partially in effort to convert John’s disciples.

As we shall see, neither of these hypotheses is adequate. Instead, the evidence suggests that Jesus began his ministry as a disciple of John.

On Jesus’ Baptism

The gospels record that John baptized Jesus. This event is prima facie embarrassing for two reasons:

  1. Implications of imperfection. John’s baptism was clearly and consistently described as “for the forgiveness of sins”.
  2. Implications of subordination.  This is the reason Matthew has the Baptist say “I need to be baptized by you, yet you come to me?”

Mark and Matthew combat these implications by describing a theophany in which God calls Jesus his Son. In contrast, Luke makes the Baptist a relative of Jesus, and has John imprisoned before Jesus’ baptism; we are never explicitly told who baptizes Jesus. And in the fourth gospel, John the Baptist is not the Baptist: the title is never applied to him. He even denies that he is Elijah, even though in Matthew, Jesus flatly affirms that he is.

This incredible diversity of interpretations is due to a simple fact. At the beginning of Jesus’ ministry stands an independent Baptist, a Jewish prophet who won great popularity and reverence before and apart from Jesus, who also won the reverence and submission of Jesus to his baptism of repentance for the forgiveness of sins, and who left behind a religious group that continued to exist apart from Christianity.

The Baptist constituted a stone of stumbling right at the beginning of the story of Jesus, a stone too well known to be ignored or denied, a stone that each evangelist had to come to terms with as best he could. The embarrassment of the evangelists is illustrated by the diverse, not to say contradictory ways in which they try to bend the independent Baptist to a dependent position within the story of Jesus.

A Common Vision

The gospels record that Jesus was baptized by this prophet. But why would he go? Since nobody compelled him, he must have gone to John because he agreed with John’s message.

There were lots of other groups vying for Jewish attention. Jesus did not join the Pharisees, who emphasized scrupulous observance of the Torah. He did not align himself with the Sadducees, who focused on the worship of God through the Temple cult. Nor did he associate with the Essenes, who formed monastic communities to maintain their own ritual purity. Nor did he subscribe to the teaching of the “fourth philosophy”, which advocated a violent rejection of Roman domination.

No, Jesus associated with an ascetic prophet who proclaimed an imminent end of history. As we will see later, this fact will shed light on the ministry of the historical Jesus.

A Common Practice

Was Jesus’ baptism a singular event? Did he spend much time with John? Was he admitted into John’s inner circle?

Jesus’ first disciples were John’s disciples. If some disciples of the Baptist came to transfer their allegiance to him while they were still in the company of the Baptist, that suggests that Jesus had stayed in the Baptist’s orbit long enough for some of the latter’s disciples to come to know him and be impressed by him.

The fourth gospel admits that Jesus’ ministry included baptism. Not ten sentences later, that claim is baldly contradicted. However, several pieces of evidence suggest that this contradiction is the (rather clumsy) work of a Johannine redactor.

That Jesus practiced baptism is further reinforced by Mark 11:27-30: “The chief priests asked Jesus, ‘Who gave You this authority to do these things?’ Jesus replied, ‘One question, then I will tell you. Was John’s baptism from heaven or from men?’”

The Sadducees were keen to admit John’s religious authority, and deny Jesus’. So why would Jesus invoke John’s baptism? A likely explanation is that it was an area of ministry overlap: the Sadducees couldn’t well admit John’s baptism was divine, yet criticize Jesus’ ministry which included that very baptism.

Jesus as Disciple

A picture is slowly emerging. Jesus began his public life as one of John’s disciples. This is the best explanation for his a) being baptized by John, b) taking John’s disciples, c) practicing John’s baptism. He slowly differentiated himself with the following teachings:

  • Non-asceticism. John was renowned for his minimal lifestyle. Jesus was no stranger to parties, so to speak.
  • Miraculous works. John’s ministry did not feature miracles. Jesus’ did, and he used this to illustrate his end-times message.

Yet despite these divergences, Jesus and John operated largely complementary ministries. Consider Matthew 11:16-19:

To what should I compare this generation? It’s like children who call out to each other: “We played the flute for you, but you didn’t dance; we sang a lament, but you didn’t mourn!”

For John did not come eating or drinking, and they say, ‘He has a demon!’ Jesus came eating and drinking, and they say, ‘Look, a glutton and a drunkard, a friend of tax collectors and sinners!’

Yet wisdom is vindicated by her children.

This passage is remarkable because it places John’s and Jesus’ ministries side by side. Absent are theological claims of Jesus’ superiority. To be sure, John’s asceticism and Jesus’ non-asceticism are contrasted. Yet John (lamenter) and Jesus (flute player) are both children of wisdom.

Jesus after John

What was the relationship like between John and Jesus? Did they always function collaboratively, or competitively?

The details of this relationship are largely lost to history. Some evidence of tension can be inferred in how frequently Jesus was asked to clarify his relationship to John.

One of our most compelling clues, however, lies in the moving plea from Jesus to his former rabbi:

When John heard in prison what the Messiah was doing, he sent a message by his disciples and asked Him, “Are You the One who is to come, or should we expect someone else?” Jesus replied to them, “Go and report to John what you hear and see: the blind see, the lame walk, those with skin diseases are healed, the deaf hear, the dead are raised, and the poor are told the good news. And if anyone is not offended because of Me, he is blessed.”

Absent are the polemics so typical of Jesus’ sayings. This beatitude has an audience of one: a delicate appeal to his former rabbi, “please do not be offended because of [my origin]”. And yet here, tellingly, the conversation stops. We are not told John’s reply. The relationship is left ambiguous, as John heads for his execution by Herod Antipas.

After the execution of the Baptist, Jesus’ ministry developed by itself. And yet, as we will see, Jesus never fully emerges from the shadow of John. Their common ministry and message pervades the remaining years of Jesus’ ministry.

Polytheistic Roots of Israelite Religion

Part Of: Demystifying Religion sequence
Followup To: Yahweh and the Levites
Content Summary: 2000 words, 10min read.

Introduction

Is the Hebrew Bible monotheistic?  

We might be tempted to say yes after reading Isaiah 44:6 “I am the first and I am the last; besides me there is no God”.

But the situation is more complicated. The Hebrew Bible is also replete with polytheism. A few examples:

  • “Do you not possess that which Chemosh, your god, has given you? So shall we possess what Yahweh has given us.” Judges 11:24
  • “Who is like Yahweh among the gods?” Exodus 15:11
  • “The people of Judah have as many gods as they have towns.” Jeremiah 11:13

We also see middle ground staked out between these two positions. For example, the original audience of the book of Deuteronomy is often exhorted not to follow after other gods, without it ever being asserted that these gods did not exist or were not real. This is known as monolatrism (“single worship”).

Which belief came first?

Last time, we showed how Yahweh was originally a god of metallurgy in northwest Saudi Arabia. Today, we will work with the framework that Yahweh was introduced to Israel in a five-stage process:

  1. Traditional Polytheism. The earliest Israelites worshipped the creator god El, his wife Asherah, and his sons (e.g., Baal).
  2. Incorporation. Yahweh was incorporated as a 2nd tier god in El’s pantheon.
  3. Elevation. Yahweh and El are identified as the same deity.
  4. Monolatrism. A new Yahweh-only movement emerges, and the gods of the second tier are denied.
  5. Monotheism. Gods of other nations are denied, Yahweh’s power is deemed universal in scope.

Why did Yahweh worship progress along this trajectory? As we shall explore next time, changes in the religious landscape (as with the theocracies of surrounding nations) have strong, robust correlates in sociopolitical life.

Today I’d like to focus on a different, simpler topic. We shall turn to archaeology and cultural anthropology to explore expressions of polytheism within the Hebrew Bible. Many of my readers already know that the text acknowledges (polemicizes against) polytheistic practices. Less well-known are examples of celebration (bald assertions of polytheistic beliefs) and assimilation (Yahweh “adopts” the roles and characteristics of rival deities). 

[Figure: Monotheism: Five Stages]

 

Let’s review the deities in El’s pantheon, and their appearance in the Hebrew Bible.

A Disclaimer

For many modern readers, polytheism is a term loaded with negative connotation. Partisans use it as a weapon. Attackers point to continuities between Israelite religion & polytheism, and defenders point to instances where Israelite rhetoric polemicizes against polytheism. But all ideological innovations have both features.

More to the point, those who spend time interacting with polytheism understand how earnestly it grapples with the same aspects of the human condition as other strands of religious expression. Polytheism must be encountered on its own terms. To weaponize is to misunderstand.

The important thing to bear in mind in what follows is that underneath the images and icons of religious expression lies a particular group of people, responding to social and political pressures in thoroughly understandable ways. In my experience, the more time you spend in someone else’s culture, the easier it becomes to empathize with their plight.

El

[Figure: Israelite Polytheism: El]

At some point in its history, El was identified with Yahweh as the same god.

This equation is expressed clearly in Exodus 6:2-3: “And God said to Moses, ‘I am Yahweh. I appeared to the patriarchs as El, but by my name Yahweh I did not make myself known to them.’” Other Biblical material also asserts this equation. Joshua 22:22 states “the god of gods is Yahweh”. Judges 9:46 refers to “El of the covenant”.

The Yahweh-alone movement vigorously condemned prominent Canaanite gods… except El. There are zero condemnations of El in the Hebrew Bible. This makes sense if Yahweh was ultimately identified with this Canaanite creator-god. What’s more, archaeological evidence suggests that the Yahweh religious centers in Shiloh and Bethel were originally places of El worship.

El and Yahweh are attributed the same characteristics. El is depicted as a wise old man with a beard, e.g., “You are great, O El, and your hoary beard instructs you”. Yahweh is described in the same terms (Daniel 7:9, Job 36:26, Habakkuk 3:6). Like “Kind El, the Compassionate”, Yahweh is a “merciful and gracious god”. The description of Yahweh’s dwelling place as a tent (Psalms 15:1, 27:6, 91:10) recalls the tent of El in the Canaanite narrative of Elkunirsa. Finally, both Yahweh and El are said to dwell amidst cosmic waters (Isaiah 33:20-22, Ezekiel 47:1-12, Zechariah 14:8).

Just as Zeus had a council, or assembly, of other gods, so too does Yahweh. The Hebrew Bible is overflowing with references to Yahweh’s (El’s) assembly. See for example Psalm 89:6-8, Zechariah 14:5, 1 Kings 22:19, Isaiah 6:1-8, and Jeremiah 23:18,22.

Baal

[Figure: Israelite Polytheism: Baal]

Worship of Baal can be dated back to the foundation of Israelite societies. This can be seen in onomatology, the study of proper names. Names in the Ancient Near East tend to have a theophoric component: a prefix or suffix that honors a deity. Yahwistic names include Josiah and Jehu (note the “J” sound); Baal-oriented names include, e.g., Jerubbaal. In addition to hundreds of icons devoted to Baal worship, we also see that Ba’al theophoric names were common in the Levant in this time period.

Yahwistic prophets of this period reserve the most vitriol for Baal worship. Why? Because the Omride dynasty (including King Ahab & Jezebel) erected a temple to Ba’al. While the cult of Yahweh continued in the northern kingdom, Baal was perhaps elevated as the patron god of the northern monarchy, thus creating some sort of theopolitical unity between the kingdom of the north and the city of Tyre.

Indeed, there is some evidence that the cults of Baal and Yahweh became conflated in the north. Hosea 2:16-24 suggests that some northern Israelites did not distinguish between Yahweh and Baal. The religious sanctuaries in the Israelite cities of Dan and Bethel centered around golden calves; this iconography strongly parallels that of Baal. Finally, the redundancy in 1 Kings 16:32 almost certainly reflects a scribe glossing over the original text, “altar for Baal in temple of Yahweh”.

To induce the Israelites to stop worshipping Baal, the Yahweh cult adopted Baal’s imagery. The Baal Cycle, a body of ancient mythology on the scale of the Epic of Gilgamesh, has four literary themes for the storm god. Here are those themes, along with the Biblical texts that mirror them.

  1. The march of the divine warrior (Psalm 104:3 “He makes the clouds his chariot, and travels along on the wings of the wind”)
  2. The convulsions of nature as the divine warrior manifests his power (Judges 5:5, Hab 3:10)
  3. The return of the divine warrior to his holy mountain to assume divine kingship (Isaiah 31:4)
  4. The utterance of the divine warrior’s voice from his palace provides rains that fertilize the earth (Jeremiah 10:13)

Yahweh is also depicted as defeating Baal’s classic enemies:

  • Baal/Yahweh defeats a seven headed dragon, Leviathan, and River (CAT 5.1, Psalm 74:13-15).
  • Baal/Yahweh defeats Sea (KTU 1.14, Psalm 89:10).
  • Baal/Yahweh defeats Death/Mot (KTU 1.4 VIII-1.6, Isaiah 25:8).

Asherah

[Figure: Israelite Polytheism: Asherah]

El’s wife was named Asherah. When Yahweh was identified with El, did he also inherit his wife? In the blessings of Joseph, Genesis 49:25 contains language specific to the Asherah cult: “blessings from Breast-and-Womb”. The Bible further admits that the Israelites frequently worshipped a “Queen of Heaven” (Jeremiah 7:18, 44:17-25). Indeed, 2 Kings 21:7 tells us that worship of Asherah happened within the Temple itself. Finally, archaeology has uncovered several icons with the inscription “Yahweh and his Asherah”. This evidence cumulatively suggests that, in early forms of Israelite religion, Yahweh was believed to have a wife.

[Figure: Israelite Polytheism: Yahweh and his Asherah]

The push towards monolatrism led to the eviction of the Asherah cult, whose memory may be preserved in Zechariah 5:5-11. But this eviction created a deficit of femininity in Israelite religious expression. To compensate, the Biblical writers began attributing feminine attributes to Yahweh (Isaiah 49:15, 46:3, 44:2,24, 42:14). Asherah-like characteristics also appear in the goddess of Wisdom in Proverbs 8.

Astral-ification

There is extensive evidence for worship of an astral deity (a sun god) in Jerusalem. And Jerusalem is presumably the site where Yahweh was identified with El. Since the Ugaritic texts hint that El’s family was astral in character, it is not unthinkable that Yahweh was viewed similarly.

  • Proper names. A number of proper names are constructed from the root ‘-w-r (“shine, gleam, light”). These include Uriyyah (“Yhwh is my light”), the name of one of David’s generals; Neriyahu (“Yhwh is my lamp”); Yizrayah (“Yhwh gleams”), a minister of Hezekiah; and dozens more.
  • Archaeology. There are many pieces of material evidence, including numerous seals found in Jerusalem bearing an image of the sun, or of the sun god in the form of a winged scarab.
  • Biblical affirmations. Job 38:6-7 may attest to Israelite recognition of astral deities: “Who sets its cornerstone when the morning stars sang together, and all the divine beings shouted for joy?” Similarly, Judges 5:20 features conflict in the astral plane: “the stars fought in the heavens”.
  • Biblical acknowledgements. Ezekiel 8:16 has Israelites worshipping sun gods. So do 2 Kings 23:5,10-11 and Zephaniah 1:4-5.
  • Biblical Incorporation. The story of Sodom and Gomorrah reflects astral themes: the divine punishment is meted out at the moment the sun rises. It is even possible that the two messengers and the deity in the story represent the sun god and his two acolytes. Psalm 19:4-6 and Psalm 84:11 also show Yahweh taking on astral qualities.

Other Deities

The Ugaritic texts mention hundreds of Canaanite gods. The Bible only criticizes two of them: Ba’al and Asherah. What gives?

The Biblical authors conflate Asherah and Astarte, and lump multiple male gods together as “the Baals”. Despite this, there is only evidence of ~10 gods worshipped in early Israel. This is also true amongst Israel’s neighbors. It appears that the religious landscape of Iron Age Canaan was simply less diverse than that of Bronze Age Ugarit.

Do we see evidence for these gods in the Bible, despite their not being named in that text?

Anat. Known for her savagery, Anat is the focus of a cult that celebrates gore: “Knee-deep she gleans in warrior blood, neck-deep in the gore of soldiers, until she [Anat] is sated with fighting.” While no evidence of Anat-worship exists in ancient Israel, these divine themes have strong parallels in the Biblical text, which describes heaps of corpses, drinking blood, devouring flesh, and swords dripping with viscera.

Astarte. In the Bible, the Name of Yahweh is described in personal terms. The divine name acts as a warrior (Isaiah 30:27) and possesses martial qualities such as radiance and strength (Psalm 29:1-2). The warrior goddess Astarte bears the title “name of Baal”. Astarte’s title, her martial character, and her special relationship to Baal closely parallel the martial character of the Name and its special relationship to Yahweh as warrior god. Further evidence for this hypothesis has been adduced from the Elephantine papyri.

Similar lines of argument can be made for entities like Light and Truth of Psalm 43:3.

Angels. The lowest tier of the Israelite pantheon also went through alterations. As the Ugaritic texts show, the lowest tier involved a number of deities who served in menial capacities. A common task for such gods was to act as messenger, the literal meaning of the word “angel” (from the Greek angelos). Certainly angels are not regarded in later traditions as gods. But they were in early traditions.

Takeaways

This post provides evidence for a simple point. Polytheistic expression (not just condemnation!) occurs in the Hebrew Bible.

These expressions are best explained by the Yahweh cult shifting away from its traditional pagan roots, towards a monolatrist (worship one god) and later a monotheist (acknowledge one god) understanding.

As we will see next time, the reasons why Yahweh worship proceeded along this interesting (but not original) trajectory are fairly easy to understand.

Yahweh, god of metallurgy

Part Of: History sequence
Content Summary: 2200 words, 11 min read.

Where, and how, was the god of Judaism first worshiped?

Yahweh was originally a god of metallurgy in northwest Saudi Arabia. 

Rethinking the Israelite origin story

First, a mass exodus of two million people (six hundred thousand fighting-age men) is vanishingly unlikely. If it were historical, we would expect:

  1. physical debris from the pilgrimage, at any of the thirty locations they are said to have stopped.
  2. archaeological evidence of a dramatic demographic shift in the highlands of Israel.
  3. inclusion in the (otherwise quite voluminous) records of the Egyptian border guards
  4. Egyptian texts discussing the new political situation (since the Egyptians had control over, and military outposts throughout Canaan)

How much of the above evidence do we have? Zero! Recall that absence of evidence can (and in this case does) constitute evidence of absence. The first piece of evidence that aligns with the Biblical account of the monarchy is the Tel Dan stele (9th century BCE), which affirms the existence of the “house of David”.

Second, the conquest narrative is non-historical. Most cities listed as razed in the Joshua narrative show evidence of uninterrupted prosperity in the archaeological record. And the three (out of thirty-one!) cities that do show interruption have not been attributed to Israelite violence.

Third, until 700 BCE Judah was a much smaller political force than it makes itself out to be. One demonstration of the small scale of this society is an Amarna letter in which the king of Jerusalem asks the pharaoh to supply fifty men “to protect the land.” Another letter asks the pharaoh for one hundred soldiers to guard Megiddo from an attack by its aggressive neighbor, the king of Shechem (Finkelstein, p. 78). These letters date to the 14th century BCE, but the population does not change much in the intervening period. Until 700 BCE, Judah consisted of no more than twenty settlements, with a total population of roughly 30,000. Only after the fall of Israel did Judah experience a population boom and full statehood.

The Israelite people were indigenous Canaanites.

So where did the Israelite people come from? The Israelite people were originally Canaanite pastoralists who, around 1300 BCE, changed their economic strategy in response to worsening conditions. There is substantial evidence for this hypothesis:

  • Ecological: there is strong evidence that the Late Bronze Age collapse (a dark age from 1200 – 900 BCE) was driven largely by climate-change-induced famine. The pastoralist strategy can only succeed if neighboring agriculturalists have surplus wheat available to trade. When that surplus dried up, former pastoralists were forced to grow their own wheat and adopt a hybrid lifestyle.
  • Linguistic: Hebrew and Canaanite become increasingly indistinguishable the further back you go in the Iron Age.
  • Material culture: Israelites and Canaanites shared the same building plans, pottery designs, village layouts, cooking habits …
  • Historic repetition: Canaanite pastoralists had twice before settled the highlands, but the previous two attempts had eventually failed.

We can also see when these highland settlements began to slowly differentiate themselves from their “parent” lowland cities. First, the highland settlements did not consume pork (even though pigs were available for food in all regions of Canaan). Second, the highland peoples seem to have identified themselves by the name “Israelite”, the earliest mention of which is in the Merneptah stele (1204 BCE).

Since Israelites were indigenous Canaanites, we know they shared the same culture. But did they start out worshipping the same gods?

The first Israelites worshiped the pantheon of El

In Egyptian mythology, the most powerful god was Ra. In Babylon, it was Marduk. In Greece, it was Zeus.

[Figure: Monotheism: Greek Pantheon]

In Canaan, the chief god was El. El’s wife was Asherah, and his children included Ba’al and Anat. The Canaanite pantheon is well understood thanks to the discovery of the Ugaritic texts.

In most English translations of the Hebrew Bible, you will see frequent use of the words “God” and “Lord”. The Hebrew terms for these phrases are more literally translated “El” and “Yahweh”. They are used so interchangeably in the Hebrew Bible that you would think them synonyms.

  • Names. The very name “Israel” is built on the divine name El (it is usually glossed “El strives” or “El rules”). In contrast, later Israelite names have “Yahweh”-based elements, e.g., Jehu. Further, many Israelite cities were named after the gods in El’s assembly. The goddess Anat was honored in the city of Anathoth, the place of origin of the prophet Jeremiah. The god Dagan in Beth-Dagan. The god El in Beth-El. The god Shamash in Beth-Shamash. The god Shalimu in Jerusalem.
  • Ritual systems. The priestly system laid out in Leviticus is very nearly copy-and-pasted from the Ugaritic sacrificial system.
  • Legal codes. The Covenant, Holiness, and Deuteronomic law codes share strong parallels with surrounding Canaanite legal systems.
  • Iconography. A seal found in Jerusalem in a tomb of the seventh century shows a solar god flanked by two minor gods: “Righteousness” and “Justice”.

There are also expressions of polytheism throughout the Hebrew Bible. For example,

  • “Do you not possess that which Chemosh, your god, has given you? So shall we possess what Yahweh has given us.” Judges 11:24
  • “Who is like Yahweh among the gods?” Exodus 15:11
  • “The people of Judah have as many gods as they have towns.” Jeremiah 11:13

In part two of this series, we will see hundreds more data points establishing that Israel’s traditional religion was polytheistic.

The original Yahweh cult was a Shasu religion located in southern Edom

Recognized for their goatees and hair held back in a hairband, the Shasu nomads were well-known to the Egyptian authorities. They conducted copper mining in the wilderness, and also were quite successful camel breeders. The Bible uses the terms Edom, Teman, and Midianite interchangeably. Egyptian descriptions of the Shasu geographically overlap the Biblical land of the Midianites.

Okay. So how do we know that the Yahweh cult originated with the Shasu people?

  • Four of the oldest texts in the Bible tell us so. See Deut 33:2, Judges 5:4-5, Habakkuk 3:3 and Isaiah 63:1.
  • Special treatment of Edom. The Bible repeatedly condemns the gods of the Ammonites, the Moabites, and the Sidonians, but never the god of Edom. Deut 23:7 calls Edomites the “brothers” of the Israelites. Edom’s patriarch Esau is said to be the brother of Israel’s patriarch Jacob. The Bible makes a point of not mentioning Qos, the national god of Edom. We have evidence that Qos was a rather late theological development in Edom. Given this evidence, it is plausible that Yahweh was worshiped in Edom and that Qos stepped in only when Yahweh became the national god of Israel/Judah.
  • Archaeology.  Two Egyptian inscriptions, one dated to the period of Amenhotep III (14th century BCE), the other to the age of Ramesses II (13th century BCE), refer to “Yahweh in the land of the Shasu”. We also have one 9th century BCE text at Kuntillet Ajrud which refers to “Yahweh of Teman”.

Yahweh was first worshiped as a god of metallurgy

Gods in the ancient world were given a specific set of powers. For reasons we will get into next time, Yahweh in the Bible is given the attributes of many kinds of gods: he exhibits the powers of the storm and the sun, and even feminine traits. But if we limit our search to descriptions of God in Midianite territory, we see the following picture:

For more information, I recommend Amzallag, 2009. Yahweh, the Canaanite God of Metallurgy?

The founder of Judaism, Moses, was said to be a Midianite

Moses is described as having settled down with the Midianite people (the Shasu). His wife Zipporah and two sons were Midianite. What’s more: Moses’ father-in-law Jethro is called a priest. A priest of what god? Well, in Exodus 18:12, Jethro (and not Moses) is portrayed initiating a sacrifice to Yahweh. The Biblical editors seem uncomfortable with this tradition, for they later interjected a confession of faith on Jethro’s lips, which very much mirrors other such confessions. All of this suggests that Moses’ Midianite father-in-law was a priest of Yahweh. In fact, he seems to have spiritual authority over Moses in this passage.

The E source is replete with this kind of claim. We first meet Moses in Midian (this document makes no claim that he was born in Egypt). Moses’ response to Yahweh’s call, “Who am I that I should bring the Israelites out of Egypt?”, would be a fair question for a man from Midian. E also has Moses say he cannot go to Egypt because he is “heavy of tongue”. Traditionally interpreted as a speech defect, this phrase occurs in only one other place in the Hebrew Bible, where it means being unable to speak the language. Finally, E also claims that the Midianites are direct descendants of Abraham.

While two Levite sources admit Moses’ Midianite connection, P actively tries to hide it. The P source says absolutely nothing about his ever being in Midian: nothing about a Midianite wife, a priest father-in-law, or his sons. Two books later, the P source injects a (blood-curdling) story designed to vilify the Midianites, in which Moses himself gives the order to kill all of the Midianite women. And this source omits the little fact that Moses has a wife who happens to be a Midianite woman. The fact that the P source tries to deny the Midianite connection suggests the underlying claim is historical.

One does not need to take a position on the historicity of Moses, or of a mini-Exodus, to consider the above evidence. Even if he was entirely fictional, the fact that Israelite priests portrayed Moses as a Midianite is significant.

Yahweh was introduced to Israel as a second tier deity (a member of El’s family)

This can be seen in Deuteronomy 32:8-9, where El gives each of his sons a nation to rule over:

When El gave the nations their inheritance, when he divided all mankind, he set up boundaries for the peoples according to the number of the sons of El. For Yahweh’s portion is his people, Jacob his allotted inheritance.

In Psalm 82, we see Yahweh not at the head of the pantheon, but later asked to assume the job of all gods. “Yahweh stands in the divine assembly of El. Among the divinities, he pronounces judgment… Arise O Yahweh, judge the world; for You inherit all the nations.” Genesis 49:24-25 and Numbers 23-24 also treat YHWH and El as distinct deities.

We have seen how Yahweh was first worshiped in Midian, and not Israel. Concurrently, El was worshiped in the land of Israel.

Then, when Yahwism emigrated to Israel (incorporation), Yahweh was not recognized as a god of gods. Rather, Yahweh was elevated to this position (equated with El) as the nation of Judah transitioned towards statehood.

As we will see next, worship of Yahweh emerged gradually, in five stages:

[Figure: Monotheism: Five Stages]

Takeaways

Here’s what we covered today:

  • The Israelite origin story is largely a patriotic fiction.
  • The Israelite people were indigenous Canaanites.
  • The first Israelites worshiped the pantheon of El.
  • The original Yahweh cult was a Shasu religion located in southern Edom
  • Yahweh was first worshiped as a god of metallurgy
  • The founder of Judaism, Moses, was said to be a Midianite
  • Yahweh was introduced to Israel as a second tier deity (a member of El’s family)

Until next time.

The Documentary Hypothesis

Part Of: Demystifying Religion sequence
Followup To: A Secret in The Ark
Content Summary: 1900 words, 19min read.

Who Wrote The Hebrew Bible?

A close reading of the Hebrew Bible reveals the existence of doublets: two stories that describe the same event. A few examples:

  • Abraham’s covenant (Genesis 15:1-21 and 17:1-27),
  • Jacob becoming Israel (Genesis 32:25-33 and 35:9-15),
  • Yahweh summons Moses (Exodus 3-4 and 6:2-30)
  • Water in the wilderness (Exodus 15:22b-25a and 17:1-7)

Dozens of these doublets appear throughout the first five books of the Hebrew Bible (also known as the Torah). Traditionally, the Torah is thought to have a single author, and doublets like these were explained as either a) different events, or b) same event but with different emphases.

But what if these doublets exist because the Torah has multiple authors?

Let’s look deeper.

Source Identification as Unsupervised Learning

In principle, how might we discern between a single- and a multi-author book?

The Clustering Method. Let’s conjecture two sources (clusters) and, for each sentence, assign it to either Cluster 1 or Cluster 2. We have complete freedom in our assignments. We want to choose clusters that maximize the coherence within each source, and also maximize the difference between the sources.

  • If the clusters are not very different, there is probably only one author.
  • If they are very different, we can safely conclude two authors.

For readers familiar with machine learning: this is unsupervised learning – searching for latent variables that best explain our data.
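To make the Clustering Method concrete, here is a toy Python sketch. It swaps in one specific technique as an assumption for illustration: 2-means clustering over bag-of-words sentence vectors, compared with cosine similarity. This is not how biblical scholars actually partitioned the Torah, and the function names (`bag_of_words`, `two_clusters`) and the four-sentence “book” are hypothetical. The returned `separation` score stands in for “how different are the two clusters”.

```python
from collections import Counter
import math


def bag_of_words(sentence):
    """Lowercased word counts for one sentence, with edge punctuation stripped."""
    return Counter(w.strip(".,;:!?\"'").lower() for w in sentence.split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors (dicts)."""
    dot = sum(a[w] * b.get(w, 0) for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Average of several word-count vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {w: c / len(vectors) for w, c in total.items()}


def two_clusters(sentences, iterations=10):
    """Assign each sentence to cluster 0 or 1 with a simple 2-means loop."""
    vecs = [bag_of_words(s) for s in sentences]
    # Seed the two clusters with the most dissimilar pair of sentences.
    pairs = [(i, j) for i in range(len(vecs)) for j in range(i + 1, len(vecs))]
    i, j = min(pairs, key=lambda p: cosine(vecs[p[0]], vecs[p[1]]))
    centers = [dict(vecs[i]), dict(vecs[j])]
    labels = []
    for _ in range(iterations):
        # Assignment step: each sentence joins the more similar center.
        labels = [0 if cosine(v, centers[0]) >= cosine(v, centers[1]) else 1
                  for v in vecs]
        # Update step: recompute each center from its current members.
        for k in (0, 1):
            members = [v for v, lab in zip(vecs, labels) if lab == k]
            if members:
                centers[k] = centroid(members)
    # Separation: 0 means identical cluster profiles, near 1 means little shared vocabulary.
    separation = 1.0 - cosine(centers[0], centers[1])
    return labels, separation


book = [
    "The merchant sold soda and counted his coins.",
    "Prices of soda rose as the merchant counted coins.",
    "The senator argued about the vote with pop in hand.",
    "The senator drank pop and argued about the vote.",
]
labels, separation = two_clusters(book)
print(labels, round(separation, 2))  # prints: [0, 0, 1, 1] 0.73
```

On real texts a single high score proves little by itself: as the Book A example suggests, almost any text can be split along some axis, so a score like this would need to be compared against a baseline, such as the same procedure run on texts of known single authorship.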

A Tale of Two Books

Suppose you encounter a book you have never read before, originally written in English by a single author. Call this Book A.

But you don’t know if Book A has one or two authors! To find out, you might use the Clustering Method.

What happens if you look at every sentence in Book A and try to make each source-cluster as different as possible? Even for books written by a single author, the resulting source-clusters could be contrived to look different. For example, you could put all optimistic sentences in one bucket, and all pessimistic sentences in the other. But even though the texts feel a little different, they don’t differ that much (after all, a single person wrote both!)

In contrast, imagine you come across another book, Book B, replete with doublets. You break those doublets into clusters, and discover the following facts:

  1. Dialect. One cluster uses an antiquated dialect of English (e.g., Shakespearean), the other a modern dialect (e.g., African-American Vernacular English).
  2. Terminology. One cluster consistently uses the word “soda”, the other consistently uses the alternative, “pop”. 
  3. Consistent Content. One cluster is very interested in economic issues. The other is more interested in rehashing political debates.
  4. Narrative Flow.  Reading each cluster as a standalone book tends to smooth out non-sequiturs, and generally improve the sense of narrative flow.
  5. Inter-Source Relationships. Imagine Book B is situated in an anthology with other books (B2 and B3) of unknown authorship. These other books are kinda dissimilar from B. But B2 has lots in common with Cluster 1, and B3 sounds like it shares an author with Cluster 2.
  6. Historical Grounding. Given the above information, we can make a pretty good guess as to the identity of both authors, and why they got merged into a single anonymous volume.
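The “Terminology” criterion (one cluster says “soda”, the other “pop”) can also be made measurable. The Python sketch below is a hypothetical illustration, not a method from the text: given a proposed split of sentences into two clusters, the made-up helper `marker_consistency` reports what share of a word’s occurrences fall in its majority cluster. A distinctive marker word should score near 1.0, while a common word like “the” should sit closer to 0.5. The sentences and the `labels` partition are invented for the example.

```python
from collections import Counter


def tokens(sentence):
    """Lowercased words with edge punctuation stripped."""
    return [w.strip(".,;:!?\"'").lower() for w in sentence.split()]


def marker_consistency(word, sentences, labels):
    """Share of a word's occurrences that fall in its majority cluster.

    With two clusters the result lies between 0.5 (evenly split) and 1.0
    (every occurrence in one cluster). Returns None if the word never appears.
    """
    per_cluster = Counter()
    for sentence, label in zip(sentences, labels):
        per_cluster[label] += tokens(sentence).count(word)
    total = sum(per_cluster.values())
    if total == 0:
        return None
    return max(per_cluster.values()) / total


book = [
    "The merchant sold soda and counted his coins.",
    "Prices of soda rose as the merchant counted coins.",
    "The senator argued about the vote with pop in hand.",
    "The senator drank pop and argued about the vote.",
]
labels = [0, 0, 1, 1]  # hypothetical two-author partition

for word in ("soda", "pop", "the"):
    # prints "soda 1.0", then "pop 1.0", then "the 0.67"
    print(word, round(marker_consistency(word, book, labels), 2))
```

A single consistent word means little; what would carry weight is a partition under which many marker words stay consistent at once.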

On this evidence, it seems very unlikely that there is a single author of Book B. Instead, most people would indeed accept that this document has two different authors.

The Hypothesis: Five Sources

The Hebrew Bible is like Book B. Only, instead of two distinct authors, scholars have identified five. These are the Jahwist source (J), the Elohim source (E), the Priestly source (P), the Deuteronomist source (D), and the Redactor (R). This is the Documentary Hypothesis.

We will explore the different personalities of these authors in more detail in the next section; for now, I want to briefly describe their contributions to the Torah from a textual perspective:

[Figure: Documentary Hypothesis: Source Distribution]

And here is the timeline on which our source documents were authored, where the final redactor R (Ezra) compiled the final JEPD product.

[Figure: Documentary Hypothesis: Composition Timeline]

Evidence For The Hypothesis

How do we know all of this? On the following grounds:

  1. Dialect. Sources J and E are written in the Hebrew of the 10th century BCE. In contrast, P and D are written in the Hebrew of the 8th century BCE.
  2. Terminology. A couple of examples: only Source D uses the phrase “with all your heart and with all your soul”. Source P contains all 100 instances of the word “congregation”, and 67 out of 69 instances of the word “chieftain”. Here are more examples:

[Figure: Documentary Hypothesis: Terminology]

  3. Consistent Content.
    • The Revelation of God’s Name. According to J, the name YHWH was known since the earliest generations of humans. But in E and P it is stated just as explicitly that YHWH does not reveal this name until the generation of Moses.
    • Sacred Objects.
      • Tabernacle: P discusses the Tabernacle 200 times; it receives more attention than any other subject. It is never mentioned in J or D. E mentions it three times.
      • The Ark: J identifies the ark as crucial to Israel’s travels and military successes; it is never mentioned in E.
      • Urim and Thummim: P mentions Urim and Thummim. J, E, and D never do.
      • Cherubs: P and J invoke cherubs. E and D never do.
      • Miracles: E has miracles performed by Moses’ staff. P uses Aaron’s staff.
    • Priestly Leadership. In P, access to the divine is limited to Aaronid priests. There is no talk of dreams, angels, talking animals, or judges, and very few mentions of prophets. These themes are developed almost exclusively in J, E, and D.
  4. Narrative Flow. Reading J, E, D, and P as standalone narratives tends to remove non-sequiturs and contradictions, and generally improves the sense of narrative flow. Want to see this for yourself? Compare the composite story of Noah with the two original stories that a later redactor wove together.
  5. Inter-Source Relationships. Source D shares the same tone, emphases, and worldview as the book of Jeremiah. Source P resonates strongly with the book of Ezekiel. Finally, Sources J and E mirror the book of Hosea.
  6. Historical Grounding. This is the most exciting piece of evidence, for reasons I will more fully explore next time. Suffice it to say that we can localize each source to the historical context in which it was written. We have evidence suggesting that J and E were composed during the divided monarchy, before Israel fell in 722 BCE. J is written from a Southern perspective (in Judah), E from a Northern perspective (in Israel). After the fall of the northern kingdom, many Israelites fled to Judah. Because the old tribal disputes had faded in importance, J and E were combined into a JE narrative. The Priestly source P was an alternative telling of JE written in 8th century Judah. Finally, the first iteration of Deuteronomy was composed during the reign of King Josiah (c. 640-609 BCE), whose reforms are traditionally dated to 622 BCE, a few decades before the Babylonian exile (586 BCE).

I’ll let Richard Elliott Friedman wrap up this section.

Above all, the strongest evidence establishing the Documentary Hypothesis is that several different lines of evidence converge. There are more than thirty cases of doublets: stories or laws that are repeated in the Torah. The existence of so many overlapping texts is noteworthy itself. But their mere existence is not the strongest argument. One could respond, after all, that this is just a matter of style or narrative strategy. Similarly, there are hundreds of apparent contradictions in the text, but one could respond that we can take them one by one and find some explanation for each contradiction. And, similarly, there is a matter of the texts that consistently call the deity God while other texts consistently call God by the name YHWH, to which one could respond that this is simply like calling someone sometimes by his name and sometimes by his title.

The powerful argument is not any one of these matters. It is that all these matters converge. When we separate the doublets, this also results in the resolution of nearly all the contradictions. And when we separate the doublets, the name of God divides consistently in all but three out of more than two thousand occurrences. And when we separate the doublets, the terminology of each source remains consistent within the source. And when we separate the sources, this produces continuous narratives that flow with only a rare break. And when we separate the sources, this fits with the linguistic evidence, where the Hebrew of each source fits consistently with what we know of the Hebrew in each period. And so on for each of the categories that precede this section.

The name of God and doublets were the starting points of the investigation into the formation of the Bible. But they are not major arguments or evidence in themselves. The most compelling argument is that all this evidence of so many kinds comes together so consistently. To this day, no one known to me who challenged the hypothesis has ever addressed this fact.

Open Questions

Most scholars agree with the broad picture of four sources (J, E, P, D) and two redactions (JE and JEPD). There does exist considerable controversy at finer levels of detail. The four most contentious mini-debates I know of are as follows:

  • While there is consensus on the dating of J, E, and D, the dating of P is somewhat controversial (700 vs 500 BCE).
  • The exact relationship of J and E is at times hard to work out, particularly because E has less material than J. Were parts of E ejected during the redaction process of JE? Or was E composed as a supplement to J, and not a standalone work?
  • It is hard to make out how the two redaction processes actually worked. The Hebrew Bible is often described as the world's first extended prose narrative (earlier narrative writing was largely poetic), so there are few precedents to compare its editing against.
  • There is consensus that J, E, P, and D were authored long after the events that they describe. They were undoubtedly influenced by early oral traditions. However, the extent of continuity and historical memory transferred from these oral traditions is in some doubt.

Takeaways

I was raised in an evangelical household, which means that growing up, I read the Hebrew Bible (known to Christians as the Old Testament) cover-to-cover several times. I found such reading difficult. Some of this was mere cultural distance: a kid in the 20th century CE is three millennia removed from the Canaanite culture in which the Bible was written.

But for me, the Hebrew Bible feels much easier to understand in light of the Documentary Hypothesis.

  • Contradictions are explained.
  • The within-source stories flow much better.
  • It is easier to understand the narrative discontinuities in the composite.
  • The diverse perspectives can be situated within their originating cultural milieu.

I wish more people knew about the Documentary Hypothesis for these reasons. Or, better yet, that they could look at the labelled sources of the Hebrew Bible online. But for now, if you’d like to read the Hebrew Bible yourself with labelled sources, the best way to do this is simply to purchase a book like The Bible With Sources Revealed, which presents the complete Torah color-coded by authorship.

Until next time.

An Introduction to Language Models

Part Of: Language sequence
Content Summary: 1500 words, 15 min read

Why Language Models?

In the English language, ‘e’ appears more frequently than ‘z’. Similarly, “the” occurs more frequently than “octopus”. By examining large volumes of text, we can learn the probability distributions of characters and words.

[Figure: Letter and word frequency]

Roughly speaking, statistical structure is distance from maximal entropy. The fact that the above distributions are non-uniform means that English is internally recoverable: if noise corrupts part of a message, the surrounding context can be used to recover the original signal. Statistical structure can also be used to break secret codes such as the Caesar cipher, by matching ciphertext letter frequencies against those of English.
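A few lines of Python can check both claims. This minimal sketch first measures the entropy of a letter distribution against the uniform maximum of log2(26) bits. It then breaks a Caesar shift with a deliberately crude heuristic: assume the most frequent ciphertext letter stands for ‘e’. The sample sentence and the helper names (`entropy_bits`, `caesar_shift`, `crack_caesar`) are illustrative assumptions. Serious frequency analysis compares the whole distribution and needs far more text.

```python
import math
import string
from collections import Counter

ALPHABET = string.ascii_lowercase

def letter_counts(text):
    """Count the letters a-z, ignoring case and non-letters."""
    return Counter(c for c in text.lower() if c in ALPHABET)

def entropy_bits(text):
    """Shannon entropy, in bits per letter, of the text's letter distribution."""
    counts = letter_counts(text)
    n = sum(counts.values())
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

def caesar_shift(text, k):
    """Shift every letter k places along the alphabet (mod 26); keep other characters."""
    return "".join(
        ALPHABET[(ALPHABET.index(c) + k) % 26] if c in ALPHABET else c
        for c in text.lower()
    )

def crack_caesar(ciphertext):
    """Guess the key by assuming the most common ciphertext letter encodes 'e'."""
    top_letter, _ = letter_counts(ciphertext).most_common(1)[0]
    key = (ALPHABET.index(top_letter) - ALPHABET.index("e")) % 26
    return key, caesar_shift(ciphertext, -key)

plain = "we need these seven sheep here before evening"
print(round(entropy_bits(plain), 2), "bits/letter vs. maximum", round(math.log2(26), 2))
print(crack_caesar(caesar_shift(plain, 3)))  # (3, 'we need these seven sheep here before evening')
```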

We can illustrate the predictability of English by generating text based on the above probability distributions. As you factor in more of the surrounding context, the utterances begin to sound less alien, and more like natural language.

[Figure: Structure of English]
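Here is a minimal sketch of this experiment at the character level, assuming a tiny made-up corpus. A model of order k draws each next character in proportion to how often it followed the previous k characters in the training text. At order 0 the output tends to be letter soup, and more real words survive as k grows. On a corpus this small, higher orders mostly copy the text verbatim, so read this as an illustration of the mechanism rather than of English. The `generate` helper and the fixed seed are illustrative choices.

```python
import random
from collections import defaultdict

def train_char_model(text, k):
    """Map each k-character context to the characters observed right after it."""
    followers = defaultdict(list)
    for i in range(len(text) - k):
        followers[text[i:i + k]].append(text[i + k])
    return followers

def generate(text, k, length, seed=0):
    """Sample up to `length` characters; each one is drawn given the previous k."""
    rng = random.Random(seed)
    followers = train_char_model(text, k)
    start = rng.randrange(len(text) - k)
    out = text[start:start + k]           # seed with a real k-character context
    while len(out) < length:
        context = out[len(out) - k:]      # last k characters ("" when k == 0)
        options = followers.get(context)
        if not options:                   # context only occurs at the very end of the text
            break
        out += rng.choice(options)        # repeated entries give frequency weighting
    return out

corpus = (
    "the cat sat on the mat and the dog sat on the log while the cat "
    "and the dog watched the rat run to the hat on the mat"
)
for k in range(4):
    print(k, repr(generate(corpus, k, 60)))
```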

A language model exploits the statistical structure of a language to express the following:

  • Assign a probability to a sentence: P(w_1, w_2, w_3, \ldots, w_N)
  • Assign a probability to an upcoming word: P(w_4 \mid w_1, w_2, w_3)

Language models are particularly useful in language perception, because they can help interpret ambiguous utterances. Three such applications are:

  • Machine Translation: P(\text{high winds tonight}) > P(\text{large winds tonight})
  • Spelling correction: P(\text{fifteen minutes from}) > P(\text{fifteen minuets from})
  • Speech Recognition: P(\text{I saw a van}) > P(\text{eyes awe of an})

Language models can also aid in language production. One example of this is autocomplete-based typing assistants, commonly displayed within text messaging applications. 

Towards N-Grams

A sentence is a sequence of words \textbf{w} = (w_1, w_2, \ldots, w_N). To model the joint probability over this sequence, we use the chain rule:

p(\text{this is the house})

= p(\text{this})p(\text{is}\mid\text{this})p(\text{the}\mid\text{this is})p(\text{house}\mid\text{this is the})

As the number of words grows, our conditional probability tables (CPTs) grow exponentially and quickly become intractable. What is to be done? Well, recall the Markov assumption we introduced in Markov chains.

[Figure: The Markov assumption]

The Markov assumption constrains the size of our CPTs. However, sometimes we want to condition on more (or less!) than just one previous word. Let v denote how many previous words we admit in our context. A variable order Markov model (VOM) allows v elements in its context: p(s_{t+1} \mid s_{t-v+1}, \ldots, s_{t}). Each CPT then involves n = v+1 variables, because we must also take the predicted variable into account. Thus an N-gram model is a Markov model of order v = N-1. By far, the most common choices are trigrams, bigrams, and unigrams:

[Figure: N-gram comparison]
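A small sketch makes the bookkeeping concrete. An N-gram is a window of N consecutive words, where the first N-1 words are the context and the last is the word being predicted. A full table over N variables has |V|^N entries. The 10,000-word vocabulary below is an arbitrary illustrative size.

```python
def ngrams(tokens, n):
    """All length-n windows: (n - 1) context words plus the word being predicted."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def cpt_entries(vocab_size, n):
    """Entries in a full conditional probability table over n variables."""
    return vocab_size ** n

words = "this is the house".split()
for n in (1, 2, 3):
    print(n, ngrams(words, n))
print(cpt_entries(10_000, 3))  # 10**12 entries for a trigram table over 10,000 words
```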

We have already discussed Markov Decision Processes, used in reinforcement learning applications. We haven’t yet discussed MRFs and HMMs. VOMs represent a fourth extension: the formalization of N-grams. Hopefully you are starting to appreciate the richness of this “formalism family”. 🙂

[Figure: Markov formalisms]

Estimation and Generation

How can we estimate these probabilities? By counting!

[Figure: N-gram probability estimation]

Let’s consider a simple bigram language model. Imagine training on this corpus:

This is the cheese.

That lay in the house that Alice built.

Suppose our trained LM encounters the new sentence “this is the house”. It estimates its probability as:

p(\text{this is the house})

= p(\text{this})p(\text{is} \mid \text{this})p(\text{the} \mid \text{is})p(\text{house} \mid \text{the}) 

= \dfrac{1}{12} \cdot 1 \cdot 1 \cdot \dfrac{1}{2} = \dfrac{1}{24}
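The estimate above can be reproduced by counting. This minimal sketch assumes a naive tokenizer (lowercase, strip the final period, split on spaces). It uses `fractions.Fraction` so the result is exact.

```python
from collections import Counter
from fractions import Fraction

corpus = ["This is the cheese.", "That lay in the house that Alice built."]

def tokenize(sentence):
    """Lowercase and drop the final period: a deliberately naive tokenizer."""
    return sentence.lower().rstrip(".").split()

tokens = [w for s in corpus for w in tokenize(s)]
unigrams = Counter(tokens)
# Bigrams are counted within each sentence, never across a sentence boundary.
bigrams = Counter(pair for s in corpus for pair in zip(tokenize(s), tokenize(s)[1:]))

def p_bigram(word, prev):
    """Maximum-likelihood estimate count(prev, word) / count(prev)."""
    return Fraction(bigrams[(prev, word)], unigrams[prev])

def p_sentence(words):
    """p(w1) * p(w2 | w1) * ... under the bigram model, as in the text."""
    prob = Fraction(unigrams[words[0]], len(tokens))
    for prev, word in zip(words, words[1:]):
        prob *= p_bigram(word, prev)
    return prob

print(p_sentence("this is the house".split()))  # 1/24
```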

How many problems do you see with this model? Let me discuss two.

First, we have estimated that p(\text{this}) = \dfrac{1}{12}. And it is true that “this” occurs only once among the twelve words of our toy corpus. But out of two sentences, “this” leads half of them. We can express this fact by adding a special START token into our vocabulary.

Second, recall what happens when language models generate text. Once they begin a sentence, they are unable to end it! Adding a new END token allows our model to terminate a sentence and begin a new one.

With these new tokens in hand, we update our products as follows:

[Figure: Sentence estimation with START and END tokens]
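Here is a sketch of the same bigram model with the two new tokens added. They are written here as `<s>` and `</s>`, a common convention that is assumed rather than taken from the figure. START fixes p(this) at 1/2. END exposes a new problem: “this is the house” now receives probability zero.

```python
from collections import Counter
from fractions import Fraction

START, END = "<s>", "</s>"
corpus = ["This is the cheese.", "That lay in the house that Alice built."]

def padded(sentence):
    """Naive tokenization, wrapped in START and END markers."""
    return [START] + sentence.lower().rstrip(".").split() + [END]

contexts = Counter()
bigrams = Counter()
for s in corpus:
    words = padded(s)
    for prev, word in zip(words, words[1:]):
        contexts[prev] += 1
        bigrams[(prev, word)] += 1

def p_sentence(sentence):
    """Product of p(word | prev) over the padded sentence, START to END."""
    words = padded(sentence)
    prob = Fraction(1)
    for prev, word in zip(words, words[1:]):
        if contexts[prev] == 0:
            return Fraction(0)  # context never seen in training
        prob *= Fraction(bigrams[(prev, word)], contexts[prev])
    return prob

print(p_sentence("this is the cheese"))  # 1/4
print(p_sentence("this is the house"))   # 0, because p(END | house) = 0
```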

A couple of other “bug fixes” I’ll mention in passing:

  • Out-of-vocabulary words are given zero probability. It helps to add an unknown (UNK) pseudoword and assign it some probability mass.
  • LMs prefer very short sentences: every factor in the product is at most 1, so the probability can only shrink as the sentence grows. We can address this by, e.g., normalizing by sentence length.
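Both fixes are easy to sketch. In the code below, `map_unknown` and the `<unk>` spelling are illustrative choices. The per-word probabilities are invented numbers, chosen to show that a raw product favors the shorter sentence while the average log probability per word does not.

```python
import math

def map_unknown(words, vocab, unk="<unk>"):
    """Replace out-of-vocabulary words with a single UNK pseudoword."""
    return [w if w in vocab else unk for w in words]

def per_word_log_prob(word_probs):
    """Average log probability per word: the log of the geometric mean."""
    return sum(math.log(p) for p in word_probs) / len(word_probs)

vocab = {"this", "is", "the", "house"}
print(map_unknown("this is the octopus".split(), vocab))  # ['this', 'is', 'the', '<unk>']

# Hypothetical per-word probabilities for a short and a longer sentence.
short = [0.1, 0.1]
longer = [0.2] * 6
print(math.prod(short) > math.prod(longer))                  # True: raw product favors short
print(per_word_log_prob(short) < per_word_log_prob(longer))  # True once normalized
```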

Smoothing

In the last sentence in the image above, we estimate p(\text{END} \mid \text{house}) = 0, because we have no instances of this two-word sequence in our toy corpus. But this causes our language model to fail catastrophically: the sentence is deemed impossible (0% probability).

This zero-probability problem gets worse as we increase the order of our N-grams. Trigram models are more accurate than bigrams, but produce more p=0 events. You’ll notice echoes of the bias-variance (accuracy-generalization) tradeoff.

How can we remove zero counts? Why not add one to every count? Of course, we’d then need to increase the size of our denominator, to ensure the probabilities still sum to one. This is Laplace smoothing.

[Figure: Laplace smoothing]
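Here is a minimal sketch of add-one smoothing for the bigram model, again on the toy corpus with `<s>`/`</s>` markers (assumed spellings). V is taken to be the number of distinct tokens that can follow a context, which includes END but not START. The last line checks that the smoothed distribution after “the” still sums to one.

```python
from collections import Counter
from fractions import Fraction

START, END = "<s>", "</s>"
corpus = ["This is the cheese.", "That lay in the house that Alice built."]

bigrams = Counter()
for s in corpus:
    words = [START] + s.lower().rstrip(".").split() + [END]
    bigrams.update(zip(words, words[1:]))

# Every token that can follow a context (START itself never does).
vocab = {word for _, word in bigrams}

# How often each token appears as a context, i.e. as the first half of a bigram.
context_counts = Counter()
for (prev, _), c in bigrams.items():
    context_counts[prev] += c

def p_laplace(word, prev):
    """Add-one estimate: (count(prev, word) + 1) / (count(prev) + V)."""
    return Fraction(bigrams[(prev, word)] + 1, context_counts[prev] + len(vocab))

print(p_laplace(END, "house"))                  # 1/12: no longer zero
print(sum(p_laplace(w, "the") for w in vocab))  # 1
```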

In a later post, we will explore how (in a Bayesian framework) such smoothing algorithms can be interpreted as a form of regularization (MAP vs MLE).

Due to its simplicity, Laplace smoothing is well known. But several algorithms achieve better performance. How do they approach smoothing?

Note that an event with a zero count in an N-gram model is less likely to have a zero count in an (N-1)-gram model. For example, it is very possible that the phrase “dancing were thought” hasn’t been seen before, even if “were thought” has.

[Figure: Backoff smoothing]

While a trigram model may balk at the above sentence, we can fall back on the bigram and/or unigram models. This technique underlies the Stupid Backoff algorithm.
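Here is a minimal sketch of Stupid Backoff on a made-up ten-word corpus in which “were thought” occurs but “dancing were thought” does not. The back-off factor alpha = 0.4 is the value suggested in the original Stupid Backoff paper. Note that the result is a score, not a normalized probability.

```python
from collections import Counter

def count_ngrams(tokens, max_n=3):
    """Counts of every 1-, 2-, ..., max_n-gram, keyed by tuple."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def stupid_backoff(word, context, counts, total, alpha=0.4):
    """Relative frequency if the n-gram was seen, else alpha times the
    shorter-context score. The result is a score, not a probability."""
    if not context:
        return counts[(word,)] / total
    ngram = tuple(context) + (word,)
    if counts[ngram] > 0:
        return counts[ngram] / counts[tuple(context)]
    return alpha * stupid_backoff(word, tuple(context)[1:], counts, total, alpha)

tokens = "we were thought to be dancing and dancing was fun".split()
counts = count_ngrams(tokens)
print(stupid_backoff("thought", ("dancing", "were"), counts, len(tokens)))  # 0.4
```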

As another variant on this theme, some smoothing algorithms train multiple N-gram models and combine their estimates, essentially using interpolation as an ensembling method. Examples include simple linear interpolation and interpolated Kneser-Ney smoothing.
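Here is a sketch of simple linear interpolation on a made-up ten-word corpus. The weights (0.6, 0.3, 0.1) are hypothetical; in practice they are tuned on held-out data. The sketch also glosses over one detail: when the trigram context was never seen, its term contributes nothing, and the result is not renormalized.

```python
from collections import Counter

tokens = "we were thought to be dancing and dancing was fun".split()
counts = Counter(
    tuple(tokens[i:i + n]) for n in (1, 2, 3) for i in range(len(tokens) - n + 1)
)

def p_ml(word, context):
    """Maximum-likelihood p(word | context); 0 if the context was never seen."""
    if not context:
        return counts[(word,)] / len(tokens)
    seen = counts[tuple(context)]
    return counts[tuple(context) + (word,)] / seen if seen else 0.0

def p_interp(word, w2, w1, lambdas=(0.6, 0.3, 0.1)):
    """Linear interpolation of trigram, bigram and unigram estimates."""
    l3, l2, l1 = lambdas  # hypothetical weights; they must sum to 1
    return l3 * p_ml(word, (w2, w1)) + l2 * p_ml(word, (w1,)) + l1 * p_ml(word, ())

print(round(p_interp("thought", "dancing", "were"), 2))  # 0.31
```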

Beam Search

We have so far seen examples of language perception, which assigns probabilities to text. Let us now consider language production, which generates text from the probabilistic model. Consider machine translation. For a French sentence \textbf{x}, we want to produce the English sentence \textbf{y}^* = \text{argmax}_{\textbf{y}} \, p(\textbf{y} \mid \textbf{x}).

This seemingly innocent expression conceals a truly monstrous search space. Exhaustive search has us examine every possible English sentence. For a vocabulary of size V, there are V^2 possible two-word sentences. For sentences of length n, the time complexity of this brute-force algorithm is O(V^n).
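A quick check of the counting argument, using `itertools.product` to enumerate every sequence over a made-up three-word vocabulary. With a realistic 10,000-word vocabulary, the same formula gives 10^{40} ten-word sentences, far too many to score one by one.

```python
import itertools

def all_sentences(vocab, n):
    """Every length-n word sequence: the search space of exhaustive decoding."""
    return itertools.product(vocab, repeat=n)

vocab = ["jane", "visits", "africa"]
for n in range(1, 5):
    count = sum(1 for _ in all_sentences(vocab, n))
    print(n, count, len(vocab) ** n)  # the enumerated count matches V**n
```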

Since exhaustive search is so costly, we might consider greedy search instead. Consider the example French sentence \textbf{x} “Jane visite l’Afrique en septembre”. Three candidate translations might be:

  • y^A: Jane is visiting Africa in September
  • y^B: Jane is going to Africa in September
  • y^C: In September, Jane went to Africa

Of these, y^A is the best (most probable) translation, i.e., it maximizes p(y \mid x). We would like greedy search to recover it.

Greedy search generates the English translation, one word at a time. If “Jane” is the most probable first word \text{argmax } p(w_1 \mid x), then the next word generated is \text{argmax } p(w_2 \mid \text{Jane}, x). However, it is not difficult to contemplate p(\text{going}\mid\text{Jane is}) > p(\text{visiting}\mid\text{Jane is}), since the word “going” is used so much more frequently in everyday conversation. These problems of local optima happen surprisingly often.
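This failure mode can be reproduced with a hand-built toy model. The lookup table below is entirely hypothetical. It stands in for p(next word | English prefix, x) with the French sentence held fixed. Greedy decoding commits to “going” (0.6 beats 0.4) and ends with a complete sentence of probability 0.33, even though “jane is visiting africa” has probability 0.4.

```python
END = "</s>"
# Hypothetical toy model: p(next word | English words so far), with the
# French source sentence x held fixed and left implicit.
NEXT = {
    (): {"jane": 1.0},
    ("jane",): {"is": 1.0},
    ("jane", "is"): {"going": 0.6, "visiting": 0.4},
    ("jane", "is", "going"): {"to": 0.55, "home": 0.45},
    ("jane", "is", "going", "to"): {"africa": 1.0},
    ("jane", "is", "going", "home"): {END: 1.0},
    ("jane", "is", "going", "to", "africa"): {END: 1.0},
    ("jane", "is", "visiting"): {"africa": 1.0},
    ("jane", "is", "visiting", "africa"): {END: 1.0},
}

def greedy_decode(next_probs, max_len=10):
    """Always take the single most probable next word."""
    seq, prob = (), 1.0
    while len(seq) < max_len and (not seq or seq[-1] != END):
        word, p = max(next_probs[seq].items(), key=lambda kv: kv[1])
        seq, prob = seq + (word,), prob * p
    return seq, prob

def sentence_prob(next_probs, seq):
    """Chain-rule probability of a complete sequence under the toy model."""
    prob = 1.0
    for i, word in enumerate(seq):
        prob *= next_probs[seq[:i]][word]
    return prob

seq, prob = greedy_decode(NEXT)
print(" ".join(seq), round(prob, 2))  # jane is going to africa </s> 0.33
print(round(sentence_prob(NEXT, ("jane", "is", "visiting", "africa", END)), 2))  # 0.4
```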

The exhaustive search space is too large, and greedy search is too confining. Let’s look for a middle ground.

Beam search resembles greedy search in that it generates words sequentially. But whereas greedy search drills only one path in the search tree, beam search keeps the b most probable partial sentences at each step. Consider the following example with beam width b=3:

[Figure: Beam search with beam width b=3]

As you can see, beam search elects to explore y^A as a “second rate” translation candidate despite y^B initially receiving the most probability mass. Only later in the sentence does the language model discover the virtues of the y^A translation. 🙂
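Here is a minimal beam search over a small hypothetical lookup table of next-word probabilities (all numbers invented; the French sentence is held fixed and left implicit). Scores are summed log probabilities, the usual way to avoid numeric underflow on long sentences. Finished hypotheses are carried forward unchanged. With b=1 the search reduces to greedy search. With b=2 or b=3 it keeps “visiting” alive as a second-rate candidate and ends up preferring it, mirroring the y^A example above.

```python
import math

END = "</s>"
# Hypothetical toy model: p(next word | English words so far).
NEXT = {
    (): {"jane": 1.0},
    ("jane",): {"is": 1.0},
    ("jane", "is"): {"going": 0.6, "visiting": 0.4},
    ("jane", "is", "going"): {"to": 0.55, "home": 0.45},
    ("jane", "is", "going", "to"): {"africa": 1.0},
    ("jane", "is", "going", "home"): {END: 1.0},
    ("jane", "is", "going", "to", "africa"): {END: 1.0},
    ("jane", "is", "visiting"): {"africa": 1.0},
    ("jane", "is", "visiting", "africa"): {END: 1.0},
}

def beam_search(next_probs, b=3, max_len=10):
    """Keep the b highest-scoring partial sentences at every step."""
    beam = [((), 0.0)]  # (partial sentence, log probability)
    for _ in range(max_len):
        candidates = []
        for seq, logp in beam:
            if seq and seq[-1] == END:
                candidates.append((seq, logp))  # finished: carry forward as-is
                continue
            for word, p in next_probs[seq].items():
                candidates.append((seq + (word,), logp + math.log(p)))
        beam = sorted(candidates, key=lambda c: c[1], reverse=True)[:b]
        if all(seq[-1] == END for seq, _ in beam):
            break
    best_seq, best_logp = beam[0]
    return best_seq, math.exp(best_logp)

# Width 1 reproduces greedy search; widths 2 and 3 recover the "visiting" sentence.
for width in (1, 2, 3):
    seq, prob = beam_search(NEXT, b=width)
    print(width, " ".join(seq), round(prob, 2))
```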

Strengths and Weaknesses

Language models have three very significant weaknesses.

First, language models are blind to syntax. They don’t even have a concept of nouns vs. verbs! You have to look elsewhere to find representations of pretty much any latent structure discovered by linguistic and psycholinguistic research.

Second, language models are blind to semantics and pragmatics. This is particularly evident in the case of language production: try having your SMS autocomplete write out an entire sentence for you. In the real world, communication is more constrained: we choose the most likely word given the semantic content we wish to express right now.

Third, the Markov assumption is problematic due to long-distance dependencies. Compare the phrase “dog runs” vs “dogs run”. Clearly, the verb suffix depends on the noun suffix (and vice versa). Bigram and trigram models are able to capture this dependency. However, if you center-embed a relative clause, e.g., “dog/s that live on my street and bark incessantly at night run/s”, the noun and verb end up more than N-1 words apart, and N-gram models fail to capture the dependency.

Despite these limitations, language models “just work” in a surprising diversity of applications. These models are particularly relevant today because it turns out that Deep Learning sequence models like LSTMs share much in common with VOMs. But that is a story we shall have to take up next time.

Until then.

 

[Video] An introduction to reinforcement learning

Part Of: Reinforcement Learning sequence

Sorry it’s been so long since my last post! I’ve been teaching a Deep Learning class, based on Andrew Ng’s Coursera specialization. Don’t worry, my other lectures will ultimately be cleaned & shared here too 🙂

This talk covers the mathematical intuitions of RL, which draws from content relating to Markov Chains and Markov Decision Processes. It also contains some novel material, including my thoughts on how RL compares with other machine learning techniques.