Polytheistic Roots of Israelite Religion

Part Of: Demystifying Religion sequence
Followup To: Yahweh and the Levites
Content Summary: 2000 words, 10min read.


Is the Hebrew Bible monotheistic?  

We might be tempted to say yes after reading Isaiah 44:6 “I am the first and I am the last; besides me there is no God”.

But the situation is more complicated. The Hebrew Bible is also replete with polytheism. A few examples:

  • “Do you not possess that which Chemosh, your god, has given you? So shall we possess what Yahweh has given us.” Judges 11:24
  • “Who is like Yahweh among the gods?” Exodus 15:11
  • “The people of Judah have as many gods as they have towns.” Jeremiah 11:13

We also see middle ground staked out between these two positions. For example, the original audience of the book of Deuteronomy is often exhorted not to follow after other gods, without it ever being asserted that these gods did not exist or were not real. This is known as monolatrism (“single worship”).

Which belief came first?

Last time, we showed how Yahweh was originally a god of metallurgy in northwest Saudi Arabia. Today, we will work with the framework that Yahweh was introduced to Israel in a five-stage process:

  1. Traditional Polytheism. The earliest Israelites worshipped creator god El, his wife Asherah, and his sons e.g., Baal.
  2. Incorporation. Yahweh was incorporated as a 2nd tier god in El’s pantheon.
  3. Elevation. Yahweh and El are identified as the same deity.
  4. Monolatrism. A new Yahweh-only movement emerges, and the gods of the second tier are denied.
  5. Monotheism. Gods of other nations are denied, Yahweh’s power is deemed universal in scope.

Why did Yahweh worship progress along this trajectory? As we shall explore next time, changes in the religious landscape have strong, robust correlates in sociopolitical life, just as they did in the theocracies of surrounding nations.

Today I’d like to focus on a different, simpler topic. We shall turn to archaeology and cultural anthropology to explore expressions of polytheism within the Hebrew Bible. Many of my readers already know that the text acknowledges (polemicizes against) polytheistic practices. Less well-known are examples of celebration (bald assertions of polytheistic beliefs) and assimilation (Yahweh “adopts” the roles and characteristics of rival deities). 

[Figure: Monotheism: Five Stages]


Let’s review the deities in El’s pantheon, and their appearance in the Hebrew Bible.

A Disclaimer

For many modern readers, polytheism is a term loaded with negative connotation. Partisans use it as a weapon. Attackers point to continuities between Israelite religion & polytheism, and defenders point to instances where Israelite rhetoric polemicizes against polytheism. But all ideological innovations have both features.

More to the point, those who spend time interacting with polytheism understand how earnestly it grapples with the same aspects of the human condition as other strands of religious expression. Polytheism must be encountered on its own terms. To weaponize is to misunderstand.

The important thing to bear in mind in what follows is that underneath the images and icons of religious expression lies a particular group of people, responding to social and political pressures in thoroughly understandable ways. My experience has been that the more time you spend in someone else’s culture, the easier it becomes to empathize with their plight.


Israelite Polytheism: El

At some point in Israel’s history, El was identified with Yahweh as the same god.

This equation is expressed clearly in Exodus 6:2-3: “And God said to Moses, ‘I am Yahweh. I appeared to the patriarchs as El, but by my name Yahweh I did not make myself known to them.’” Other Biblical material also asserts this equation. Joshua 22:22 states “the god of gods is Yahweh”. Judges 9:46 refers to “El of the covenant”.

The Yahweh-alone movement vigorously condemns prominent Canaanite gods… except El. There are zero condemnations of El in the Hebrew Bible. This makes sense if Yahweh was ultimately identified with this Canaanite creator-god. What’s more, archaeological evidence suggests that the Yahweh religious centers in Shiloh and Bethel were originally places of El worship.

El and Yahweh are given the same characteristics. El is depicted as a wise old man with a beard, e.g., “You are great, O El, and your hoary beard instructs you”. Yahweh is described in the same terms (Daniel 7:9, Job 36:26, Habakkuk 3:6). Like “Kind El, the Compassionate”, Yahweh is a “merciful and gracious god”. The description of Yahweh’s dwelling place as a tent (Psalms 15:1, 27:6, 91:10) recalls the tent of El in the Canaanite narrative of Elkunirsa. Finally, both Yahweh and El are said to dwell amidst cosmic waters (Isaiah 33:20-22, Ezekiel 47:1-12, Zechariah 14:8).

Just as Zeus had a council, or assembly, of other gods, so too does Yahweh. The Hebrew Bible is overflowing with references to Yahweh’s (El’s) assembly. See for example Psalm 89:6-8, Zechariah 14:5, 1 Kings 22:19, Isaiah 6:1-8, and Jeremiah 23:18,22.


Israelite Polytheism: Baal

Worship of Baal can be dated back to the foundation of Israelite societies. This can be seen in onomastics, the study of proper names. Names in the Ancient Near East tend to have a theophoric component: usually a prefix or suffix that honors a deity. Yahwistic names include Josiah and Jehu (note the “J” sound); Baal-oriented names include “Jerubbaal” and “Eshbaal”. In addition to hundreds of icons devoted to Baal worship, we also see that Ba’al theophoric names were common in the Levant in this time period.

Yahwistic prophets of this period reserve the most vitriol for Baal worship. Why? Because the Omride dynasty (including King Ahab & Jezebel) erected a temple to Ba’al. While the cult of Yahweh continued in the northern kingdom, Baal was perhaps elevated as the patron god of the northern monarchy, thus creating some sort of theopolitical unity between the kingdom of the north and the city of Tyre.

Indeed, there is some evidence that the cults of Baal and Yahweh were conflated in the north. Hosea 2:16-24 suggests that some northern Israelites did not distinguish between Yahweh and Baal. The religious sanctuaries in the Israelite cities of Dan and Bethel centered around golden calves; this iconography strongly parallels that of Baal. Finally, the redundancy in 1 Kings 16:32 was almost certainly a scribe glossing over the original text, “altar for Baal in temple of Yahweh”.

To induce the Israelites to stop worshipping Baal, the imagery of Baal was adopted by the Yahweh cult. The Baal Cycle, ancient mythology on the scale of the Epic of Gilgamesh, has four literary themes for the storm god. Here are those themes, along with the Biblical texts that mirror them.

  1. The march of the divine warrior (Psalm 104:3 “He makes the clouds his chariot, and travels along on the wings of the wind”)
  2. The convulsions of nature as the divine warrior manifests his power (Judges 5:5, Hab 3:10)
  3. The return of the divine warrior to his holy mountain to assume divine kingship (Isaiah 31:4)
  4. The utterance of the divine warrior’s voice from his palace provides rains that fertilize the earth (Jeremiah 10:13)

Yahweh is also depicted as defeating Baal’s classic enemies:

  • Baal/Yahweh defeats a seven headed dragon, Leviathan, and River (CAT 5.1, Psalm 74:13-15).
  • Baal/Yahweh defeats Sea (KTU 1.14, Psalm 89:10).
  • Baal/Yahweh defeats Death/Mot (KTU 1.4 VIII-1.6, Isaiah 25:8).


Israelite Polytheism: Asherah

El’s wife was named Asherah. When Yahweh was identified with El, did he also inherit his wife? In the blessings of Joseph, Genesis 49:25 contains language specific to the Asherah cult: “blessings from Breast-and-Womb”. The Bible further admits that the Israelites frequently worshipped a “Queen of Heaven” (Jeremiah 7:18, 44:17-25). Indeed, 2 Kings 21:7 tells us that worship of Asherah happened within the Temple itself. Finally, archaeology has uncovered several icons with the inscription “Yahweh and his Asherah”. Cumulatively, this evidence suggests that in early forms of Israelite religion, Yahweh was believed to have a wife.

[Figure: Israelite Polytheism: Yahweh and his Asherah]

The push towards monolatrism led to the eviction of the Asherah cult, whose memory may be preserved in Zechariah 5:5-11. But this eviction created a deficit of femininity in Israelite religious expression. To compensate, the Biblical writers began attributing feminine qualities to Yahweh (Isaiah 49:15, 46:3, 44:2,24, 42:14). Asherah-like characteristics also appear in the goddess of Wisdom in Proverbs 8.


There is extensive evidence for worship of an astral deity (sun god) in Jerusalem. And Jerusalem is presumably the site where Yahweh was identified with El. Since the Ugaritic texts hint that El’s family was astral in character, it is not unthinkable that Yahweh was viewed similarly.

  • Proper names. A certain number of proper names are constructed from the root ‘-w-r (“shine, gleam, light”). These include Uriyyah (“Yhwh is my light”), the name of one of David’s generals; Neriyahu (“Yhwh is my lamp”); Yizrayah (“Yhwh gleams”), minister of Hezekiah; and dozens more.
  • Archaeology. Much material evidence points the same way, including many seals found in Jerusalem bearing an image of the sun, or of the sun god in the form of a winged scarab.
  • Biblical affirmations. Job 38:6-7 may attest to Israelite recognition of astral deities “Who sets its cornerstone when the morning stars sang together, and all the divine beings shouted for joy?” Similarly Judges 5:20 features conflict in the astral plane “the stars fought in the heavens”.
  • Biblical acknowledgements. Ezekiel 8:16 has Israelites worshipping sun gods. So does 2 Kings 23:5,10-11 and Zephaniah 1:4-5.
  • Biblical Incorporation. The story of Sodom and Gomorrah reflects astral themes, where the divine punishment is meted out at the moment when the sun rises. It is even possible that the two messengers and the deity in the story represent the sun god and his two acolytes. Psalm 19:4-6 and Psalm 84:11 also show Yahweh taking on astral qualities.

Other Deities

The Ugaritic texts mention hundreds of Canaanite gods. The Bible only criticizes two of them: Ba’al and Asherah. What gives?

The Biblical authors conflate Asherah and Astarte, and conflate multiple male gods as “the Baals”. Despite this, there is only evidence of ~10 gods worshipped in early Israel. This is also true amongst Israel’s neighbors. It appears that the religious landscape of Iron Age Canaan was simply less diverse than Bronze Age Ugarit.

Do we see evidence for these gods in the Bible, despite their not being named in that text?

Anat. Known for her savagery, Anat worship involves a celebration of gore. “Knee-deep she gleans in warrior blood, neck-deep in the gore of soldiers, until she [Anat] is sated with fighting.” While no evidence of Anat-worship exists in ancient Israel, these divine themes have strong parallels in the Biblical text. The Bible describes heaps of corpses, drinking blood, devouring flesh, and swords dripping with viscera.

Astarte. In the Bible, the Name of Yahweh is described in personal terms. The divine name acts as a warrior (Isaiah 30:27) and possesses martial qualities such as radiance and strength (Psalm 29:1-2). The warrior goddess Astarte bears the title “name of Baal”. Astarte’s title, her martial character, and her special relationship to the god Baal closely resemble the martial character of the Name and its special relationship to Yahweh as warrior god. Further evidence for this hypothesis has been adduced from the Elephantine papyri.

Similar lines of argument can be made for entities like Light and Truth of Psalm 43:3.

Angels. The lowest tier of the Israelite pantheon also went through alterations. As the Ugaritic texts show, the lowest tier involved a number of deities who served in menial capacities. A common task for such gods was to act as messenger, the literal meaning of the Greek word behind the English “angel”. Certainly angels are not regarded in later traditions as gods. But they were in early traditions.


This post provides evidence for a simple point. Polytheistic expression (not just condemnation!) occurs in the Hebrew Bible.

These expressions are best explained by the Yahweh cult shifting away from its traditional pagan roots, and towards a monolatrist (worship one god) and later monotheist (acknowledge one god) understanding.

As we will see next time, the reasons why Yahweh worship proceeded along this interesting (but not original) trajectory are fairly easy to understand.


Yahweh and the Levites

Part Of: Demystifying Religion sequence
Related To: Who Wrote The Bible?
Content Summary: 4000 words, 20min read.

Exodus 3:14 has God saying to Moses, “I Am that I Am.” And he said, “You must say this to the Israelites, ‘I Am has sent me to you.’” The divine name YHWH (pronounced “Yahweh”) is traditionally linked to the Hebrew verb “to be” that lies behind “I am that I am”. This tetragrammaton is the name of the god of Judaism.

But where, and by whom, was Yahweh first worshipped?

Today, we shall see that Yahweh was originally a god of metallurgy in northwest Saudi Arabia. The Levites brought worship of him to Israel via a “mini-Exodus”.

A Disclaimer

The historicity of the exodus is a fairly partisan topic. Many uninformed people like to give their opinions, and many opinions are uninformed. 

None of my material comes from Christian or atheistic apologetic websites. I made a point to only draw material from academic sources. Specifically, I draw from the following books (and journal articles and lecture videos, not pictured):

[Figure: Yahweh: Books]

People familiar with this field will note that my sources do not see eye to eye. For example, Friedman and Romer leverage conservative and liberal approaches, respectively. Yet despite the range of expression, my sources converge on complementary solutions to the origin of Yahweh. My task today is to weld their insights together into a coherent whole.

Researching this post has felt a little like digging into a mystery novel. I hope reading it provides you with a similar experience.

Stage 1: El’s Pantheon in Israel

1.1) Certain aspects of Israelite prehistory as given by the Bible are non-historical.

First, a mass exodus of two million people (six hundred thousand fighting-age men) is vanishingly unlikely. If it had actually happened, we would expect

  1. physical debris from the pilgrimage, at any of the thirty locations they are said to have stopped.
  2. archaeological evidence of a dramatic demographic shift in the highlands of Israel.
  3. inclusion in the (otherwise quite voluminous) records of the Egyptian border guards
  4. Egyptian texts discussing the new political situation (since the Egyptians had control over, and military outposts throughout Canaan)

And how much evidence do we have in each of these four dimensions? Literally zero, in all of them. Recall that absence of evidence can (and in this case does) mean evidence of absence. The very first piece of evidence confirming the Biblical text is the Tel Dan stele (9th century BCE), which affirms the existence of the “house of David”.

Second, the conquest narrative is non-historical. Most cities listed as razed in the Joshua narrative show evidence of uninterrupted prosperity in the archaeological record. And the three (out of thirty-one!) cities that do show interruption have not been shown to have fallen to Israelite violence.

Third, until 700 BCE Judah was a much smaller political force than it makes itself out to be. One demonstration of the small scale of this society is the request, in one of the Amarna letters sent by the king of Jerusalem to the pharaoh, that he supply fifty men “to protect the land.” Another letter asks the pharaoh for one hundred soldiers to guard Megiddo from an attack by his aggressive neighbor, the king of Shechem (Finkelstein, p. 78). These letters date to the 14th century BCE. But the population in the intervening time period does not change much. Until 700 BCE, Judah totaled no more than twenty settlements with a population of roughly 30,000. Only after the fall of Israel did Judah experience a population boom and full statehood.

1.2) The Israelite people were indigenous Canaanites.

So where did the Israelite people come from? The Israelite people were originally Canaanite pastoralists who, around 1300 BCE, changed their economic strategy in response to worsening conditions. We have a wealth of evidence supporting this positive hypothesis, including:

  • Ecological: we now know that the Late Bronze Age collapse (a dark age from 1200 – 900 BCE) was caused primarily by climate change-driven famine. The pastoralist strategy can only be successful if neighboring agriculturalists have surplus wheat available to trade. When that surplus dried up, former pastoralists were forced to grow their own wheat, and adopt a hybrid lifestyle.
  • Linguistic: the Hebrew and Canaanite languages are increasingly indistinguishable the further back you go in the Iron Age.
  • Material culture: Israelites and Canaanites shared the same building plans, pottery designs, village layouts, cooking habits …
  • Historic repetition: Canaanite pastoralists had twice before settled the highlands, but the previous two attempts had eventually failed.

We can also see when these highland settlements began to slowly differentiate themselves from their “parent” lowland cities. First, the highland settlements did not consume pork (pigs were available for food in all regions of Canaan). Second, the highland peoples seem to have identified themselves by the name “Israelite”, the earliest mention of which is in the Merneptah stele (1204 BCE).

Since Israelites were indigenous Canaanites, we know they shared the same culture. But did they start out worshipping the same gods?

1.3) The Israelites and the Canaanites shared the same religion: the pantheon of El.

In Egyptian mythology, the most powerful god was Ra. In Babylon, it was Marduk. In Greece, it was Cronus.

[Figure: Monotheism: Greek Pantheon]

In Canaan, the chief god was El. El’s wife was Asherah, and his children include Ba’al and Anat. The Canaanite pantheon is well-understood from the discovery of the Ugaritic texts.

In most English translations of the Hebrew Bible, you will see frequent use of the words “God” and “Lord”. The Hebrew terms behind these words are more literally rendered “El” and “Yahweh”. They are used so interchangeably in the Hebrew Bible that you would think them synonyms.

  • Names. The very name “Israel” is built on the divine name El, not Yahweh. In contrast, later Israelite names have “Yahweh”-based components e.g., Jehu. Further, most Israelite cities were named after the gods in El’s assembly. The goddess Anat was honored in the city of Anathoth, the place of origin of the prophet Jeremiah. The god Dagan in Beth-Dagan. The god El in Beth-El. The god Shamash in Beth-Shemesh. The god Shalimu in Jerusalem.
  • Ritual systems. The priestly system laid out in Leviticus is very nearly copy-and-pasted from the Ugaritic sacrificial system.
  • Legal codes. The Covenant, Holiness, and Deuteronomic law codes share strong parallels with surrounding Canaanite legal systems.
  • Iconography. A seal found in Jerusalem in a tomb of the seventh century shows a solar god flanked by two minor gods: “Righteousness” and “Justice”.

There are also expressions of polytheism throughout the Hebrew Bible. For example,

  • “Do you not possess that which Chemosh, your god, has given you? So shall we possess what Yahweh has given us.” Judges 11:24
  • “Who is like Yahweh among the gods?” Exodus 15:11
  • “The people of Judah have as many gods as they have towns.” Jeremiah 11:13

In part two of this series, we will see hundreds more data points establishing Israel’s traditional religion as polytheism.

Stage 2: Yahwism in Edom

2.1) The original Yahweh cult was a Shasu religion located in southern Edom (northwest Saudi Arabia). (video)

Recognized for their goatees and hair held back in a hairband, the Shasu nomads were well-known to the Egyptian authorities. They conducted copper mining in the wilderness, and also were quite successful camel breeders. The Bible uses the terms Edom, Teman, and Midianite interchangeably. Egyptian descriptions of the Shasu geographically overlap the Biblical land of the Midianites.

Okay. So how do we know that the Yahweh cult originated with the Shasu people?

  • Four of the oldest texts in the Bible tell us so. See Deut 33:2, Judges 5:4-5, Habakkuk 3:3 and Isaiah 63:1.
  • Special treatment of Edom. The Bible repeatedly condemns the gods of the Ammonites, the Moabites, and the Sidonians, but never the god of Edom. Deut 23:7 calls Edomites the “brothers” of the Israelites. Edom’s patriarch Esau is said to be the brother of Israel’s patriarch Jacob.
    • The Bible makes a point of not mentioning Qos, the national god of Edom. We have evidence that Qos was a rather late theological development in Edom. Given this evidence, it is plausible to assume that Yahweh was worshipped in Edom and Qos stepped in only when Yahweh became the national god of Israel/Judah.
  • Archaeology.  Two Egyptian inscriptions, one dated to the period of Amenhotep III (14th century BCE), the other to the age of Ramesses II (13th century BCE), refer to “Yahweh in the land of the Shasu”. We also have one 9th century BCE text at Kuntillet Ajrud which refers to “Yahweh of Teman”.

2.2) Who was Yahweh? A god of metallurgy.  (paper)

Gods in the ancient world were each given a specific set of powers. For reasons we will get into next time, Yahweh in the Bible is given the attributes of many kinds of gods: he exhibits power of the storm, of the sun, and even of femininity. But if we limit our search to descriptions of God in Midianite territory, we see the following picture:

Stage 3. The Levite Encounter

According to the documentary hypothesis, the Torah was written by four authors: J, E, P and D. Of these, E, P and D are traced to Levite priestly authors. There exist startling differences between Levite and non-Levite texts.

3.1) There was no mass exodus. But there was a mini-exodus of a group of Levites from Egypt (article, video).

Textual evidence:

  • The two oldest texts in the Bible are the Song of the Sea and the Song of Deborah. The Song of the Sea is a Levite text that does not mention Israel. The Song of Deborah, meanwhile, lists all ten tribes of Israel (Judah and Simeon were a separate community at this time and not part of Israel) but doesn’t mention Levi. Similarly, all twelve tribes are mentioned in the Blessings of Moses, but Levi is the only tribe associated with the exodus.
  • Detail in Egyptian stories. It is only the Levite sources (E, P, and also D) that tell the entire story of the plagues and exodus from Egypt. J, the non-Levite source, doesn’t tell it. If you read J, it jumps from Moses’ saying “Let my people go” in Exodus 5:1f to the people’s already having departed Egypt in Exodus 13:21.
  • Name of God. If the Levites brought Yahweh into Israel, they should be keen to describe the relationship between Yahweh and El. And only our Levite sources do this: J presumes the name is Yahweh from the beginning of her document.
  • It is likewise the Levite sources that concentrate on the Tabernacle.  E mentions it a little; P treats it a lot. There is more about the Tabernacle than about anything else in the Torah.  But the non-Levite source J never mentions it at all.

Egypt was known to host many Semitic peoples over the years. It is not unthinkable that some small group escaped. The Shasu people were allowed by Merneptah to bring their herds into Egyptian territory. The absence of evidence only militates against a massive exodus. It is silent on the question of an exodus on a small scale.

  • Names of the Levites. Hophni, Hur, Phinehas, Merari, Pashhur and above all Moses are Egyptian names. No one else, in all the names mentioned in the Bible, has an Egyptian name. If Egyptian names were invented, why only attribute them to the Levites? Further, the story of Moses’ name suggests the Biblical redactors did not know these names were Egyptian.
  • Cultural derivatives. There are strong parallels between the Levite priests’ description of the Ark and Egyptian barks. Likewise, the Seraphim that occupy the First Temple come from Egypt (the uraeus) IG.151. The serpent on Aaron’s staff mirrors Egyptian mythology. Professor Michael Homan showed that the Tabernacle has architectural parallels with the battle tent of Pharaoh Ramses II.
  • Circumcision. Only texts written by Levites (11/11)  give the requirement to practice circumcision — which was a known practice in Egypt.  So Egyptian cultural influences are present, but only in the Levite texts!

3.2) Moses was a Midianite.

  • Moses is described as having settled down with the Midianite people (the Shasu). His wife Zipporah and two sons were Midianite. What’s more: Moses’ father-in-law Jethro is called a priest. A priest of what god? Well, in Exodus 18:12, Jethro (and not Moses) is portrayed initiating a sacrifice to Yahweh. The Biblical editors seem uncomfortable with this tradition, for they later interjected a confession of faith on Jethro’s lips, which very much mirrors other such confessions. All of this suggests that Moses’ Midianite father-in-law was a priest of Yahweh. In fact, he seems to have spiritual authority over Moses in this passage.
  • The E source is replete with this kind of claim. We first meet Moses in Midian (no claims of him being born in Egypt, in this document). Moses’ response to Yahweh’s call, “Who am I that I should bring the Israelites out of Egypt?” would be a fair question for a man in Midian. E also claims he cannot go to Egypt because he is “heavy of tongue”. Traditionally interpreted as a speech defect, this phrase only occurs in one other place in the Hebrew Bible, where it means being unable to speak the language. Finally, E also claims that the Midianites are direct descendants of Abraham.
  • While two Levite sources admit Moses’ Midianite connection, P actively tries to hide it. The P source has absolutely nothing about his ever being in Midian. Nothing about a Midianite wife, a priest father-in-law, nothing about his sons. Two books later, the P source injects a (blood-curdling) story designed to vilify the Midianites. Moses himself gives the order to kill all of the Midianite women. And this source does not include the little fact that Moses has a wife who happens to be a Midianite woman. The fact that the P source tries to deny the Midianite connection suggests the underlying claim is historical.

There are a couple problems with this theory. First, if Moses was Midianite, why did he have an Egyptian name? Further, why would he come to be in Egypt? There are ways around these difficulties (perhaps his name was retrofitted, or perhaps he didn’t come to Egypt, or …).

These problems illustrate that, of the theories in this post, this particular hypothesis carries the most uncertainty. Fortunately, we can fairly easily swap it out with alternative theories (Moses as enslaved Levite, Moses as Egyptian royalty, etc) without harming the overall thesis. The key point in all of this is that the Levites left Egypt and encountered Yahweh in Midian.

3.3) The Levites came into contact with the Shasu cult, and accelerated Yahweh’s introduction to Israel and Judah.

  • We need some account for how Yahweh was introduced into El’s pantheon. It is possible that Yahweh was slowly introduced to Israel via trade with its southern neighbors. However, the Levite emigration to Israel explains how the Yahweh cult became so influential.
  • Location of Sinai. Religious thinking in that era strongly associated gods with locations. In fact, deities were commonly thought to reside in sacred mountains. Mount Olympus was the home of Zeus & his pantheon. Mount Sapan was the home of Ba’al and his pantheon. Mount Sinai (aka Mount Horeb) was the house of Yahweh. This mountain was located in southern Edom, and the Levites regularly traveled to that location to worship him.
  • Exodus 24:8 features Moses splashing blood on his followers in a ritual ceremony. This kind of blood covenant was unknown to Canaan, but common in pre-Islamic Arabia.

Stage 4: El’s Adoption of Yahweh

4.1) On arrival in Israel, Yahweh was introduced as a second tier deity (a member of El’s family).

This can be seen in Deuteronomy 32:8-9, where El gives each of his sons a nation to rule over:

When El gave the nations their inheritance, when he divided all mankind, he set up boundaries for the peoples according to the number of the sons of El. For Yahweh’s portion is his people, Jacob his allotted inheritance.

In Psalm 82, we see Yahweh not at the head of the pantheon, but later asked to assume the job of all gods. “Yahweh stands in the divine assembly of El. Among the divinities, he pronounces judgment… Arise O Yahweh, judge the world; for You inherit all the nations.” Genesis 49:24-25 and Numbers 23-24 also view YHWH and El existing as distinct deities.

Again, we will see more evidence for this particular proposition in part two of this series.

4.2) The Levites “attached” themselves as a priestly class.

  • The Levites claim responsibility for the massacres in Genesis 34, Exodus 32:26-29, and Numbers 25:6-15, and Jacob’s blessing says of them, “Levi’s knives are vicious weapons. May I never enter their council. For in their anger they kill men, and on a whim they hamstring oxen. Their anger is cursed, for it is strong, and their fury, for it is cruel!” While the bloody purges specified in the conquest narrative are non-historical, they too speak to the bloody zeal of the Levite people. All of this is to say: when they did arrive in Israel asking for refuge, they were not a people the Israelites could easily say no to.
  • In the book of Exodus, there are myriad references to “the people” and very few (retro-fitted) references to the Israelites. It is very plausible that “the people” referred exclusively to militant Levites. Deut 33:2-5 seems to support this distinction: “his people assembled with the tribes of Israel”.
  • On arrival, the Levites are not given territory. Instead, they are given a 10% tithe as priests. This fits into William Propp’s commentary on Exodus, which makes a strong case on the etymology of the very word “Levi” that its most probable meaning is an “attached person” in the sense of resident alien.
  • Over and over, the Levite sources command that one must not mistreat an alien. Why? “Because we were aliens in Egypt”. In the three Levite sources, the command to treat aliens fairly comes up 52 times! And how many times in the non-Levite source, J? None. Compared to legal texts of surrounding nations, this aspect is unique to the Israelite law code.

4.3) The Levites wrote the national history.

Those who accept the (very) strong reasons to think the mass exodus non-historical (section 1.1) need to explain how the story of the Exodus made it into the Bible. But we are not being asked to explain how it was invented whole-cloth. Rather, we must explain why and how memory of the mini-exodus (section 3.1) became stretched and aggrandized over time.

Why did the Levites invent the mass-exodus narrative?

  1. Promoting worship of Yahweh. The Levites were convinced that Yahweh had saved them from Egypt. What better way to have Israel worship Yahweh, than create a new history?
  2. Simple power politics. Political influence is easier to hold & retain if your group is the only “outsider”.
  3. Political unification. Iron age Israel was theocratic. The priests and kings shared (and sometimes competed for) power. A common origin story is a powerful tool for unification and shared identity. Similarly, the demonization of lowland city states (cultural & ethnic siblings) as “Canaanite” served to support campaigns against them.

How did they accomplish this? By the production and dissemination of an origin story.  

While we are investigating the historicity of the Biblical narrative, we should also consider: why do these texts exist at all? The Hebrew Bible is humanity’s first attempt at prose, and at history. This intermingling of religion and history was unique in the ancient world. Instead of cyclic episodes of mythological combat, the Israelite religious imagination was fixated on events of their material past. Its structure is entirely unique, and cries out for an explanation. The Bible was written to create a written tradition (much more stable than oral traditions) of national identity.

In addition to violence, the Levites also had a reputation for teaching. We can see this in verses like Deuteronomy 6:20-23, which reads,

When your children ask you later on, “What are these laws that Yahweh commanded you?” you must say to them, “We were Pharaoh’s slaves in Egypt, but the Lord brought us out of Egypt in a powerful way. And he brought signs and great, devastating wonders on Egypt, on Pharaoh, and on his whole family before our very eyes. He delivered us from there so that he could give us the land he had promised our ancestors.”

What specifically did the Levites fabricate?

They started with their own experience (an actual event), and added the following:

First, to make a mini-exodus massive, you need large numbers. You can actually “watch” the estimates grow as we move from earlier to later sources. J doesn’t mention numbers at all. E estimates a total of around 600,000, and P estimates a total of 600,000 fighting-age males (for a total of two million).

Second, the Exodus, without the conquest, would never have survived as a story. You need to explain how a nomadic nation came to reside in someone else’s territory. The conquest does this (and also stokes political sentiment of a later time period).

Why did the Israelites believe this story?

Don’t we all evaluate our personal origin stories with a bit too much credulity? Many Romans literally believed a wolf raised their patriarchs. Even in American culture, many people I’ve spoken with conceive of the Founding Fathers in mythic, rather than human, terms.

But why didn’t the first recipients of the mass exodus story reject it? Imagine the Levites waited ten or twenty generations before telling the story, and the mini-exodus narrative expansion happened only gradually. Israelites would only have distant inklings of the remembered past to go on. It is true that, for the exodus story to take root in early Israel it was necessary for it to pertain to the remembered past of settlers who did not emigrate from Egypt. And this is in fact the case. Egypt did control and oppress Canaan, during the mini-Exodus.


Today we learned that Yahweh was originally a god of metallurgy in northwest Saudi Arabia. The Levites brought worship of him to Israel.

More specifically:

  • Certain aspects of Israelite prehistory as given by the Bible cannot be read literally. We have strong evidence that the Israelite people were indigenous Canaanites. The Israelites and the Canaanites shared the same religion: the pantheon of El. The earliest Israelites worshipped creator god El, his wife Asherah, and his sons e.g., Baal.
  • The original Yahweh cult was located in south Edom (northwest Saudi Arabia). Yahweh was there worshipped by the Shasu people as a god of metallurgy.
  • There was no mass exodus. But there was a mini-exodus of a group of Levite priests from Egypt. The Biblical evidence suggests that Moses was a Midianite, and his encounter with Yahweh occurred in Midian.
  • On arrival at Israel, the Levites were incorporated into the Israelite population. Instead of land, they were ceded priestly roles, which included a 10% tithe. Their deity Yahweh was introduced as a second tier god: a member of El’s family. The national history created by the Levites thus helped unify Israel around her new pre-history.

Until next time.

[Sequence] Demystifying Religion

Most of my efforts focus on various aspects of science and mathematics. Why write about religion?

I am not particularly interested in evaluating theological claims. But this blog is very interested in the computational and biological bases of primate sociality. And religion plays a key role in our evolved social capacities.

This post is meant as an executive summary of various positions I have come to accept over the years. As with my other overview posts, the positions laid out here are a moving target. I’m hoping to eventually motivate each topic; to give you the evidence rather than summarizing the belief. If you want to hear more about a particular topic, don’t hesitate to let me know!

Social Theses

At a social level, religious belief brings communities together. This explains the special attention many faiths place on ethics: ethical norms are the frame on which social institutions rest. It also explains why most conversion experiences tend to occur at a deeper, more emotional place of the mind (not so much in the cold light of reason).

  1. The Relational Sphere Hypothesis. Social institutions come in three flavors. There is the political sphere, economic sphere, and social sphere. Religious institutions are an extension of (a buttressing of) the social sphere.
  2. Generator of Social Capital. The reason why religion became institutionalized is that, with the triumph of market economies over gift economies, religious structure provided an alternative mode for promoting social bonds within a community.
  3. Monotheistic Cohesion Hypothesis. Monotheistic cultures tend to treat strangers more fairly than polytheistic ones. Monotheism was successful in part because it facilitated larger group size (strangers could identify as the same team).

Cognitive Theses

At a cognitive level, religious experience sits at the nexus of animism, mythology, and ritual. Occasionally it is accentuated by numinous (altered) states of consciousness. Only very recently has belief played a role in some forms of religious participation. Here, I survey the cognitive machinery that drives these aspects of religiosity.

  1. Animism as Hyperactive Agency Detection. Mammals are good at differentiating events caused by inanimate nature, versus those caused by animate agents. Due to the asymmetry of false positives vs false negatives, our Agency Detectors are built on a hair trigger: we are often too quick to attribute agency. Humans are prone to invoking supernatural agents whenever emotionally eruptive events arise that have superficial characteristics of agency in the absence of a corresponding agent.
  2. Mythology as Counterintuitive Narratives
  3. Ritual as Paradox-Based Social Bonding
  4. The Numinous as Altered States of Consciousness
  5. Two Faces of Meaning. Beliefs are like clothes; they serve two purposes. The first purpose is functional: beliefs can constrain expectations of physical experience. The second purpose is signaling: beliefs can signal group membership, ethical values, and personality. Most beliefs serve both purposes, at least to some extent. Religious belief is notable in that its content is mostly the latter. That is, religious belief typically does not constrain expectation of physical experience.

Historical Theses

It is admittedly strange to discuss 1st century Palestine in depth. Why pay so much attention here, as opposed to 7th century Saudi Arabia, or 19th century US state of Utah?

To this I must disclose that most of my friends and family self-describe as evangelical Christians. 1st century Palestine is brought to my attention literally once a week. I am hoping these conversations become more interesting after I construct positive theories that go beyond “I don’t know”.

Thus, the following historical theses are rightly viewed as less interesting than more universal topics on my blog. That said, perhaps you will find value in them.

Judaism subsequence

  • A Secret In The Ark. Presents the linguistic evidence that the story of Noah was not authored by Moses, but was instead produced by the interweaving of two (surprisingly divergent) narratives.
  • Who Wrote The Bible? Introduces the theory as a generalization of observations such as the above.

Christianity subsequence

  • Jesus as Apocalyptic Prophet.

Who Wrote The Bible?

Part Of: Demystifying Religion sequence
Followup To: A Secret in The Ark
Content Summary: 1900 words, 19min read.

Who Wrote The Hebrew Bible?

A close reading of the Hebrew Bible reveals the existence of doublets: two stories that describe the same event. A few examples:

  • Abraham’s covenant (Genesis 15:1-21 and 17:1-27),
  • Jacob becoming Israel (Genesis 32:25-33 and 35:9-15),
  • Yahweh summons Moses (Exodus 3-4 and 6:2-30)
  • Water in the wilderness (Exodus 15:22b-25a and 17:1-7)

Dozens of these doublets appear throughout the first five books of the Hebrew Bible (also known as the Torah). Traditionally, the Torah is thought to have a single author, and doublets like these were explained as either a) different events, or b) same event but with different emphases.

But what if these doublets exist because the Torah has multiple authors?

Let’s look deeper.

Source Identification as Unsupervised Learning

In principle, how might we discern between a single- and a multi-author book?

The Clustering Method. Let’s conjecture two sources (clusters) and, for each sentence, assign it either Cluster 1 or Cluster 2. We have complete freedom in our assignments. We want to choose clusters that maximize the coherence within each source, and also maximize the difference between the sources.

  • If the clusters are not very different, there is probably only one author.
  • If they are very different, we can safely conclude two authors.

For readers familiar with machine learning: this is unsupervised learning – searching for latent variables that best explain our data.
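As a rough illustration of the "difference between the sources" half of this method, here is a minimal Python sketch that scores how different two candidate clusters are. Everything in it is an assumption made for illustration: the bag-of-words representation, the cosine distance, and the toy sentences are hypothetical and are not how source critics actually work.

```python
from collections import Counter
from math import sqrt

def word_counts(sentences):
    """Bag-of-words counts for one cluster of sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.lower().split())
    return counts

def cosine_distance(a, b):
    """1 - cosine similarity of two count vectors (near 0 = same usage, 1 = no shared words)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0

# Hypothetical toy clusters: two differ only in one word, the third shares nothing.
soda_cluster = ["i would like a soda", "the soda is cold"]
pop_cluster = ["i would like a pop", "the pop is cold"]
economics_cluster = ["tariffs raise consumer prices", "markets dislike uncertainty"]

print(cosine_distance(word_counts(soda_cluster), word_counts(pop_cluster)))
print(cosine_distance(word_counts(soda_cluster), word_counts(economics_cluster)))
```

A full clustering method would also search over assignments and reward within-cluster coherence; this sketch only covers the scoring step.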

A Tale of Two Books

Suppose you encounter a book you have never read before, originally written in English by a single author. Call this Book A.

But you don’t know if Book A has one or two authors! To find out, you might use the Clustering Method.

What happens if you look at every sentence in Book A, and try to make each source-cluster as different as possible? Even for books written by a single author, the resultant source-clusters could be contrived to look different. For example, you could put all optimistic sentences in one bucket, and all pessimistic sentences in the other. But even though the texts feel a little different, they don’t differ that much (after all, a single person wrote both!)

In contrast, imagine you come across another book, Book B, replete with doublets. You break those doublets into clusters, and discover the following facts:

  1. Dialect. One cluster uses an antiquated dialect of English (e.g., Shakespearean), the other a modern dialect (e.g., African-American Vernacular English).
  2. Terminology. One cluster consistently uses the word “soda”, the other consistently uses the alternative, “pop”. 
  3. Consistent Content. One cluster is very interested in economic issues. The other is more interested in rehashing political debates.
  4. Narrative Flow.  Reading each cluster as a standalone book tends to smooth out non-sequiturs, and generally improve the sense of narrative flow.
  5. Inter-Source Relationships.  Imagine Book B is situated in an anthology with other books (B2 and B3) of unknown authorship. These other books are kinda dissimilar  from B. But B2 has lots in common with Cluster 1, and B3 sounds like it shares an author with Cluster 2. 
  6. Historical Grounding. Given the above information, we can make a pretty good guess as to the identity of both authors, and why they got merged into a single anonymous volume.

On this evidence, it seems very unlikely that there is a single author of Book B. Instead, most people would indeed accept that this document has two different authors.

The Hypothesis: Five Sources

The Hebrew Bible is like Book B. Only, instead of two distinct authors, we have identified five. These are the Jahwist source (J), the Elohist source (E), the Priestly source (P), the Deuteronomist source (D), and the Redactor (R). This is the Documentary Hypothesis.

We will explore the different personalities of these authors in more detail next section; for now, I want to briefly describe their contributions to the Torah from a textual perspective:

[Figure] Documentary Hypothesis: Source Distribution

And here is the timeline on which our source documents were authored, where the final redactor R (Ezra) compiled the final JEPD product.

[Figure] Documentary Hypothesis: Composition Timeline

Evidence For The Hypothesis

How do we know all of this? On the following grounds:

  1. Dialect. Sources J and E are written in the Hebrew of the 10th century BCE. In contrast, P and D are written in the Hebrew of the 8th century BCE.
  2. Terminology. A couple examples. Source D alone uses the phrase “with all your heart and with all your soul”. Source P accounts for all 100 instances of the word “congregation”, and 67 out of 69 instances of the word “chieftain”. Here are more examples:

[Figure] Documentary Hypothesis: Terminology

  3. Consistent Content.
    • The Revelation of God’s Name.  According to J, the name YHWH was known since the earliest generations of humans. But in E and P it is stated just as explicitly that YHWH does not reveal this name until the generation of Moses.
    • Sacred Objects.
      • Tabernacle: P discusses the Tabernacle 200 times; it receives more attention than any other subject. It is never mentioned in J or D. E mentions it three times.
      • The Ark: J identifies the ark as crucial to Israel’s travels and military successes; it is never mentioned in E.
      • Urim and Thummim: P mentions Urim and Thummim. J, E, and D never do.
      • Cherubs: P and J invoke cherubs. E and D never do.
      • Miracles: E has miracles performed by Moses’ staff. P uses Aaron’s staff.
    • Priestly Leadership. In P, access to the divine is limited to Aaronid priests. There is no talk of dreams, angels, talking animals, judges, and very few mentions of prophets. These themes are developed almost exclusively in J, E, and D.
  4. Narrative Flow.  Reading J, E, D, and P as standalone narratives tends to remove non-sequiturs and contradictions, and generally improve the sense of narrative flow. Want to see this for yourself? Compare the composite story of Noah with the two original stories (which were woven together by a later redactor).
  5. Inter-Source Relationships. Source D shares the same tone, emphases, and worldview as the book of Jeremiah. Source P resonates strongly with the book of Ezekiel. Finally, Sources J and E mirror the book of Hosea.
  6. Historical Grounding. This is the most exciting piece of evidence, for reasons I will more fully explore next time. Suffice to say that we can localize each source to the historical context in which it was written. We have evidence suggesting that J and E were composed during the divided monarchy, before Israel fell in 722 BCE. J is written from a Southern perspective (in Judah), E is written from a Northern perspective (in Israel). After the fall of the northern kingdom, many Israelites fled to Judah. Because the old tribal disputes had faded in importance, J and E were combined into a JE narrative. The Priestly source P was an alternative telling of JE written in 8th century Judah. Finally, the first iteration of Deuteronomy was composed during the reign of King Josiah (around 622 BCE), only a few decades before the Babylonian exile (which began in 597 BCE).

I’ll let Richard Elliott Friedman wrap up this section.

Above all, the strongest evidence establishing the Documentary Hypothesis is that several different lines of evidence converge. There are more than thirty cases of doublets: stories or laws that are repeated in the Torah. The existence of so many overlapping texts is noteworthy itself. But their mere existence is not the strongest argument. One could respond, after all, that this is just a matter of style or narrative strategy. Similarly, there are hundreds of apparent contradictions in the text, but one could respond that we can take them one by one and find some explanation for each contradiction. And, similarly, there is a matter of the texts that consistently call the deity God while other texts consistently call God by the name YHWH, to which one could respond that this is simply like calling someone sometimes by his name and sometimes by his title.

The powerful argument is not any one of these matters. It is that all these matters converge. When we separate the doublets, this also results in the resolution of nearly all the contradictions. And when we separate the doublets, the name of God divides consistently in all but three out of more than two thousand occurrences. And when we separate the doublets, the terminology of each source remains consistent within the source. And when we separate the sources, this produces continuous narratives that flow with only a rare break. And when we separate the sources, this fits with the linguistic evidence, where the Hebrew of each source fits consistently with what we know of the Hebrew in each period. And so on for each of the categories that precede this section.

The name of God and doublets were the starting-points of the investigation into the formation of the Bible. But they are not major arguments or evidence in themselves. The most compelling argument is that all this evidence of so many kinds comes together so consistently. To this day, no one known to me who challenged the hypothesis has ever addressed this fact.

Open Questions

Most scholars agree with the broad picture of four sources (J, E, P, D) and two redactions (JE and JEPD). There does exist considerable controversy at finer levels of detail. The four most contentious mini-debates I know of are as follows:

  • While there is consensus on the dating of J, E, and D, the dating of P is somewhat controversial (700 vs 500 BCE).
  • The exact relationship of J and E is at times hard to work out, particularly because E has less material than J. Were parts of E ejected during the redaction process of JE? Or was E composed as a supplement to J, and not a standalone work?
  • It is hard to make out how the two redaction processes actually worked. The Hebrew Bible is the very first example of prose writing in the entire world (earlier writing was entirely poetic).
  • There is consensus that J, E, P, and D were authored long after the events that they describe. They were undoubtedly influenced by early oral traditions. However, the extent of continuity and historical memory transferred from these oral traditions is in some doubt.


I was raised in an evangelical household, which means that growing up, I read the Hebrew Bible (known to Christians as the Old Testament) cover-to-cover several times. I found such reading difficult. Some of this was mere cultural distance: a kid in the 20th century CE is three millennia removed from the Canaanite culture in which the Bible was written.

But for me, the Hebrew Bible feels much easier to understand in light of the Documentary Hypothesis.

  • Contradictions are explained.
  • The within-source stories flow much better.
  • It is easier to understand the narrative discontinuities in the composite.
  • The diverse perspectives can be situated within their originating cultural milieu.

I wish more people knew about the Documentary Hypothesis for these reasons. Or better yet, could look at the labelled sources of the Hebrew Bible online. But for now, if you’d like to read the Hebrew Bible yourself, with labelled sources, the best way to do this is simply to purchase a book like The Bible With Sources Revealed, a copy of the complete Torah color coded by authorship.

Until next time.

An Introduction to Language Models

Part Of: Language sequence
Content Summary: 1500 words, 15 min read

Why Language Models?

In the English language, ‘e’ appears more frequently than ‘z’. Similarly,  “the” occurs more frequently than “octopus”. By examining large volumes of text, we can learn the probability distributions of characters and words.

[Figure] Language Models: Letter and Word Frequency

Roughly speaking, statistical structure is distance from maximal entropy. The fact that the above distributions are non-uniform means that English is internally recoverable: if noise corrupts part of a message, the surrounding context can be used to recover the original signal. Statistical structure is also used to reverse engineer secret codes such as the Caesar cipher.
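To make "distance from maximal entropy" a little more concrete, here is a minimal Python sketch that estimates a letter distribution by counting and compares its entropy with that of a uniform distribution over the same letters. The sample sentence and the helper names `distribution` and `entropy` are assumptions chosen for illustration; a real estimate would need a far larger corpus.

```python
from collections import Counter
from math import log2

def distribution(symbols):
    """Relative frequency of each symbol in a sequence."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return {symbol: count / total for symbol, count in counts.items()}

def entropy(dist):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# Hypothetical toy text; any English sample would do.
text = "the quick brown fox jumps over the lazy dog and the cat"
letter_dist = distribution(ch for ch in text if ch.isalpha())

observed = entropy(letter_dist)
maximal = log2(len(letter_dist))  # entropy if every observed letter were equally likely
print(observed, maximal)
```

The gap between the two numbers is one rough measure of the statistical structure that a language model can exploit.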

We can illustrate the predictability of English by generating text based on the above probability distributions. As you factor in more of the surrounding context, the utterances begin to sound less alien, and more like natural language.

[Figure] Language Models: Structure of English

A language model exploits the statistical structure of a language to express the following:

  • Assign a probability to a sentence P(w_1, w_2, w_3, \ldots w_N)
  • Assign probability of an upcoming word P(w_4 \mid w_1, w_2, w_3)

Language models are particularly useful in language perception, because they can help interpret ambiguous utterances. Three such applications might be,

  • Machine Translation: P(\text{high winds tonight}) > P(\text{large winds tonight})
  • Spelling correction: P(\text{fifteen minutes from}) > P(\text{fifteen minuets from})
  • Speech Recognition: P(\text{I saw a van}) > P(\text{eyes awe of an})

Language models can also aid in language production. One example of this is autocomplete-based typing assistants, commonly displayed within text messaging applications. 

Towards N-Grams

A sentence is a sequence of words \textbf{w} = (w_1, w_2, \ldots, w_N). To model the joint probability over this sequence, we use the chain rule:

p(\text{this is the house})

= p(\text{this})p(\text{is}\mid\text{this})p(\text{the}\mid\text{this is})p(\text{house}\mid\text{this is the})

As the number of words grows, the size of our conditional probability tables (CPTs) quickly becomes intractable. What is to be done? Well, recall the Markov assumption we introduced in Markov chains.


The Markov assumption constrains the size of our CPTs. However, sometimes we want to condition on more (or less!) than just one previous word. Let v denote how many variables we admit in our context. A variable order Markov model (VOM) allows v elements in its context: p(s_{t+1} \mid s_{t-v+1}, \ldots, s_{t}). Each CPT entry then spans n = v+1 variables, because we must take the predicted variable into account. Thus an N-gram model is a Markov model of order v = N-1. By far, the most common choices are trigrams, bigrams, and unigrams:

[Figure] Language Models: N-gram Comparison
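For readers who prefer code, here is a small Python sketch of what "N-gram" means operationally, plus a back-of-the-envelope count of how quickly the tables grow. The function names `ngrams` and `cpt_entries` and the vocabulary size used in the printout are hypothetical, chosen only for illustration.

```python
def ngrams(tokens, n):
    """All length-n windows over a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def cpt_entries(vocab_size, n):
    """Number of cells in a table over n variables with vocab_size values each."""
    return vocab_size ** n

sentence = "this is the house".split()
print(ngrams(sentence, 1))  # unigrams
print(ngrams(sentence, 2))  # bigrams
print(ngrams(sentence, 3))  # trigrams

# With an assumed vocabulary of 10,000 words, the table size explodes with n.
for n in (1, 2, 3):
    print(n, cpt_entries(10_000, n))
```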

We have already discussed Markov Decision Processes, used in reinforcement learning applications.  We haven’t yet discussed MRFs and HMMs. VOMs represent a fourth extension: the formalization of N-grams. Hopefully you are starting to appreciate the  richness of this “formalism family”. 🙂

[Figure] Language Models: Markov Formalisms

Estimation and Generation

How can we estimate these probabilities? By counting!


Let’s consider a simple bigram language model. Imagine training on this corpus:

This is the cheese.

That lay in the house that Alice built.

Suppose our trained LM encounters the new sentence “this is the house”. It estimates its probability as:

p(\text{this is the house})

= p(\text{this})p(\text{is} \mid \text{this})p(\text{the} \mid \text{is})p(\text{house} \mid \text{the}) 

= \dfrac{1}{12} * 1 * 1 * \dfrac{1}{2} = \dfrac{1}{24}
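The arithmetic above can be checked by counting. The following Python sketch is one way to do it, assuming lowercase tokens, no punctuation, and bigrams counted within each sentence only; the function names are hypothetical. Exact fractions are used so the result can be compared with the hand calculation.

```python
from collections import Counter
from fractions import Fraction

corpus = ["this is the cheese", "that lay in the house that alice built"]
sentences = [s.split() for s in corpus]

unigram_counts = Counter(w for s in sentences for w in s)
bigram_counts = Counter(pair for s in sentences for pair in zip(s, s[1:]))
total_words = sum(unigram_counts.values())

def p_unigram(word):
    """Maximum likelihood estimate of p(word)."""
    return Fraction(unigram_counts[word], total_words)

def p_bigram(word, prev):
    """Maximum likelihood estimate of p(word | prev); prev must occur in the corpus."""
    return Fraction(bigram_counts[(prev, word)], unigram_counts[prev])

def sentence_probability(sentence):
    words = sentence.split()
    p = p_unigram(words[0])
    for prev, word in zip(words, words[1:]):
        p *= p_bigram(word, prev)
    return p

print(sentence_probability("this is the house"))
```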

How many problems do you see with this model? Let me discuss two.

First, we have estimated that p(\text{this}) = \dfrac{1}{12}. And it is true that “this” occurs only once in our toy corpus above. But out of two sentences, “this” leads half of them. We can express this fact by adding a special START token into our vocabulary.

Second, recall what happens when language models generate speech. Once they begin a sentence, they are unable to end it! Adding a new END token will allow our model to terminate a sentence, and begin a new one.

With these new tokens in hand, we update our products as follows:

[Figure] Language Models: Sentence Estimation
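Here is a minimal Python sketch of the same toy bigram model with START and END tokens added. The literal token strings "START" and "END", and the function names, are assumptions made for illustration.

```python
from collections import Counter
from fractions import Fraction

START, END = "START", "END"
corpus = ["this is the cheese", "that lay in the house that alice built"]
padded = [[START] + s.split() + [END] for s in corpus]

bigram_counts = Counter(pair for s in padded for pair in zip(s, s[1:]))
# How often each token appears as a context (every token except the final END).
context_counts = Counter(prev for s in padded for prev in s[:-1])

def p_next(word, prev):
    """Maximum likelihood estimate of p(word | prev); prev must occur in the corpus."""
    return Fraction(bigram_counts[(prev, word)], context_counts[prev])

def sentence_probability(sentence):
    words = [START] + sentence.split() + [END]
    p = Fraction(1)
    for prev, word in zip(words, words[1:]):
        p *= p_next(word, prev)
    return p

print(p_next("this", START))                       # "this" leads half the sentences
print(sentence_probability("this is the cheese"))
print(sentence_probability("this is the house"))   # "house" never ends a sentence here
```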

A couple other “bug fixes” I’ll mention in passing:

  • Out-of-vocabulary words are given zero probability. It helps to add an unknown (UNK) pseudoword and assign it some probability mass.
  • LMs prefer very short sentences (sequential multiplication is monotonically decreasing). We can address this by, e.g., normalizing by sentence length.
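The length bias in the second bullet is easy to see with made-up numbers. In this Python sketch, the per-word probabilities are hypothetical, and dividing the log-probability by the number of words is just one simple normalization among several possible ones.

```python
from math import log

def log_score(word_probs):
    """Log-probability of a sentence, given the probability of each word in context."""
    return sum(log(p) for p in word_probs)

def length_normalized(word_probs):
    """Average log-probability per word."""
    return log_score(word_probs) / len(word_probs)

# Hypothetical per-word probabilities for two candidate sentences.
short_unlikely = [0.2, 0.2]   # two fairly surprising words
long_likely = [0.5] * 6       # six unsurprising words

print(log_score(short_unlikely), log_score(long_likely))
print(length_normalized(short_unlikely), length_normalized(long_likely))
```

The raw score favors the short sentence simply because it has fewer factors; the per-word score favors the sentence whose words are individually more predictable.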


In the last sentence in the image above, we estimate p(END|house) = 0, because we have no instances of this two-word sequence in our toy corpus. But this causes our language model to fail catastrophically: the sentence is deemed impossible (0% probability).

This problem of zero probability increases as we increase the complexity of our N-grams. Trigram models are more accurate than bigrams, but produce more p=0 events. You’ll notice echoes of the bias-variance (accuracy-generalization) tradeoff.

How can we remove zero counts? Why not add one to every count? Of course, we’d then need to increase the size of our denominator, to ensure the probabilities still sum to one. This is Laplace smoothing.

[Figure] Language Models: Laplace Smoothing
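A minimal Python sketch of add-one smoothing on the toy corpus follows. It assumes the vocabulary consists of every token that can be predicted (all words plus END), which is one common convention rather than the only one; the names are hypothetical.

```python
from collections import Counter
from fractions import Fraction

START, END = "START", "END"
corpus = ["this is the cheese", "that lay in the house that alice built"]
padded = [[START] + s.split() + [END] for s in corpus]

bigram_counts = Counter(pair for s in padded for pair in zip(s, s[1:]))
context_counts = Counter(prev for s in padded for prev in s[:-1])

# Assumed vocabulary: every token that can follow a context (words and END).
vocab = {w for s in padded for w in s[1:]}
V = len(vocab)

def p_laplace(word, prev):
    """Add-one estimate: (count + 1) / (context count + vocabulary size)."""
    return Fraction(bigram_counts[(prev, word)] + 1, context_counts[prev] + V)

print(p_laplace(END, "house"))                  # no longer zero
print(sum(p_laplace(w, "the") for w in vocab))  # still a proper distribution
```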

In a later post, we will explore how (in a Bayesian framework) such smoothing algorithms can be interpreted as a form of regularization (MAP vs MLE).

Due to its simplicity, Laplace smoothing is well-known. But several algorithms achieve better performance. How do they approach smoothing?

Recall that an event with zero count in an N-gram model is less likely to have zero count in the corresponding (N-1)-gram model. For example, it is very possible that the phrase “dancing were thought” hasn’t been seen before.

[Figure] Language Models: Backoff Smoothing

While a trigram model may balk at the above sentence, we can fall back on the bigram and/or unigram models. This technique underlies the Stupid Backoff algorithm.
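The fallback idea can be sketched in a few lines of Python. This is a simplified rendering of Stupid Backoff under stated assumptions: the toy token stream is hypothetical, the penalty of 0.4 is the value usually quoted for the algorithm, and the function returns scores rather than normalized probabilities.

```python
from collections import Counter

ALPHA = 0.4  # assumed backoff penalty applied each time we shorten the context

# Hypothetical toy token stream.
tokens = "the cat sat on the mat the cat ate".split()

# Counts of every unigram, bigram, and trigram, keyed by tuple.
counts = Counter(
    tuple(tokens[i:i + n])
    for n in (1, 2, 3)
    for i in range(len(tokens) - n + 1)
)
total = len(tokens)

def score(word, context):
    """Relative frequency if the n-gram was seen; otherwise back off to a shorter context."""
    context = tuple(context)
    ngram = context + (word,)
    if not context:
        return counts[ngram] / total
    if counts[ngram] > 0:
        return counts[ngram] / counts[context]
    return ALPHA * score(word, context[1:])

print(score("sat", ("the", "cat")))  # trigram seen: uses the trigram estimate
print(score("mat", ("the", "cat")))  # trigram and bigram unseen: backs off twice
```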

As another variant on this theme, some smoothing algorithms train multiple N-grams, and essentially use interpolation as an ensembling method. Such models include Good-Turing and Kneser-Ney algorithms.

Beam Search

We have so far seen examples of language perception, which assigns probabilities to text. Let us now consider language production, which generates text from the probabilistic model. Consider machine translation. For a French sentence \textbf{x}, we want to produce the English sentence \textbf{y} such that y^* = \text{argmax}_y \, p(y\mid x).

This seemingly innocent expression conceals a truly monstrous search space. Deterministic search has us examine every possible English sentence. For a vocabulary size V, there are V^2 possible two-word sentences. For sentences of length n, the time complexity of our brute force algorithm is O(V^n).

Since deterministic search is so costly, we might consider greedy search instead. Consider an example French sentence \textbf{x} “Jane visite l’Afrique en Septembre”. Three candidate translations might be,

  • y^A: Jane is visiting Africa in September
  • y^B: Jane is going to Africa in September
  • y^C: In September, Jane went to Africa

Of these, y^A is the best (most probable) translation: p(y^A \mid x) is the largest of the three. We would like greedy search to recover it.

Greedy search generates the English translation, one word at a time. If “Jane” is the most probable first word \text{argmax } p(w_1 \mid x), then the next word generated is \text{argmax } p(w_2 \mid \text{Jane}, x). However, it is not difficult to contemplate p(\text{going}\mid\text{Jane is}) > p(\text{visiting}\mid\text{Jane is}), since the word “going” is used so much more frequently in everyday conversation. These problems of local optima happen surprisingly often.

The deterministic search space is too large, and greedy search is too confining. Let’s look for a middle ground.

Beam search resembles greedy search in that it generates words sequentially. Whereas greedy search only drills one such path in the search tree, beam search drills a finite number of paths. Consider the following example with beamwidth b=3.


As you can see, beam search elects to explore y^A as a “second rate” translation candidate despite y^B initially receiving the most probability mass. Only later in the sentence does the language model discover the virtues of the y^A translation. 🙂
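The contrast between greedy search and beam search can be reproduced with a toy model. In the Python sketch below, the hand-written table stands in for p(w_t \mid w_1, \ldots, w_{t-1}, x); its probabilities are invented for illustration, and the function names are hypothetical. Greedy search is simply beam search with beamwidth 1.

```python
# Invented next-word probabilities; any prefix not listed is followed by END.
TABLE = {
    (): {"Jane": 1.0},
    ("Jane",): {"is": 1.0},
    ("Jane", "is"): {"going": 0.6, "visiting": 0.4},
    ("Jane", "is", "going"): {"to": 1.0},
    ("Jane", "is", "going", "to"): {"Africa": 0.55, "Paris": 0.45},
    ("Jane", "is", "visiting"): {"Africa": 0.9, "Paris": 0.1},
}

def next_word_probs(prefix):
    return TABLE.get(prefix, {"END": 1.0})

def beam_search(beam_width, max_len=10):
    """Keep the beam_width most probable partial sentences at each step."""
    beams = [((), 1.0)]
    finished = []
    for _ in range(max_len):
        candidates = [
            (prefix + (word,), p * q)
            for prefix, p in beams
            for word, q in next_word_probs(prefix).items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, p in candidates[:beam_width]:
            if seq[-1] == "END":
                finished.append((seq, p))
            else:
                beams.append((seq, p))
        if not beams:
            break
    return max(finished or beams, key=lambda c: c[1])

greedy_seq, greedy_p = beam_search(beam_width=1)
beam_seq, beam_p = beam_search(beam_width=3)
print(greedy_seq, greedy_p)
print(beam_seq, beam_p)
```

With beamwidth 1, the search commits to "going" and never sees the stronger "visiting" continuation; with beamwidth 3, the second-ranked path survives long enough to win.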

Strengths and Weaknesses

Language models have three very significant weaknesses.

First, language models are blind to syntax. They don’t even have a concept of nouns vs. verbs!  You have to look elsewhere to find representations of pretty much any latent structure discovered by linguistic and psycholinguistic research.

Second, language models are blind to semantics and pragmatics. This is particularly evident in the case of language production: try having your SMS autocomplete write out an entire sentence for you. In the real world, communication is more constrained: we choose the most likely word given the semantic content we wish to express right now.

Third, the Markov assumption is problematic due to long-distance dependencies. Compare the phrases “dog runs” vs “dogs run”. Clearly, the verb suffix depends on the noun suffix (and vice versa). Trigram models are able to capture this dependency. However, if you embed a relative clause between the noun and the verb, e.g., “dog/s that live on my street and bark incessantly at night run/s”, N-grams fail to capture this dependency.

Despite these limitations, language models “just work” in a surprising diversity of applications. These models are particularly relevant today because it turns out that Deep Learning sequence models like LSTMs share much in common with VOMs. But that is a story we shall have to take up next time.

Until then.


[Video] An introduction to reinforcement learning

Part Of: Reinforcement Learning sequence

Sorry it’s been so long since my last post!  I’ve been teaching a Deep Learning class, based on Andrew Ng’s Coursera specialization.  Don’t worry, my other lectures will ultimately be cleaned & shared here too 🙂

This talk covers the mathematical intuitions of RL, which draws from content relating to Markov Chains and Markov Decision Processes. It also contains some novel material, including my thoughts on how RL compares with other machine learning techniques.


The Relational Sphere Hypothesis

Part Of: Demystifying Sociality sequence
Followup To: The Three Spheres of Culture
Content Summary: 1700 words, 17 min read

A Theory of Relationship Dynamics

How can we make sense of social life? Let’s start by considering a simple cup of coffee.  

  1. In my own house, I can just help myself to as much as I want, sharing with others in the framework of “what’s mine is yours.”  
  2. Or my friend can get me a cup of coffee in return for the one I got for him yesterday, so we take turns or match small favors for each other.
  3. At Starbucks, I buy my coffee, using price and value as the framework.
  4. To my children, however, none of these principles apply. To them, coffee is something that only “big people” are allowed to drink: It is a privilege that goes with social rank.

What is true of a humble cup of coffee is true of the moral dilemmas surrounding major policy questions such as organ donation. Decisions have to be made, and there are again four fundamental ways to make them:

  1. Should we hold a lottery, giving each person an equal chance?  
  2. Should we somehow rank the social importance of potential recipients?
  3. Should we sell organs to the highest bidder?  
  4. Or should we expect everyone in a local community to give freely, offering a kidney to any group member in need?

(The above excerpt is from [FE].)

Relational Models Theory (RMT) proposes that these four social categories are exhaustive and culturally universal. Human interactions are complex, and typically use more than one of the above processes. But every relationship, in every culture, seems to be some combination of the following:

  • In Communal Sharing (Communality), people are viewed as equals oriented around some particular identity. This can include being in love, sports fans, and co-religionists.
  • In Authority Ranking (Dominance), people are situated in a hierarchy where superiors are deferred to, respected, and in some cases obeyed.
  • In Equality Matching (Reciprocity), people are interested in restoring balance, turn-taking, and making sure everyone is treated fairly.
  • In Market Pricing (Exchange), relationships are governed by quantitative, utilitarian concerns such as prices, exchanges, or cost-benefit analyses.

We can use relational models to explain a wide swathe of social phenomena:

  • Some examples of norm violation are in fact category errors. For example, a restaurant announcing that “the price of your meal is two hours on dishwasher duty” feels wrong because it conflates Market Pricing with Equality Matching.
  • Some (but not all) examples of taboo trade-offs are in fact category errors. The Finite Price of Human Life thesis feels counterintuitive because it pits Market Pricing against the sacred values of Communality.
  • Humans often use indirect speech acts to reconcile relationship types with semantic content. Rather than saying e.g., “pick me up after work”, we often say things like, “If you would pick me up after work, that would be awesome”. While more verbose, the latter expression feels more polite because it is couched in a Communality frame, rather than signaling Dominance.

In addition to its explanatory reach, multiple strands of evidence come together in support of Relational Models Theory:

  • Factor analysis. If you ask people to describe their relationships, you can test whether a theory predicts the statistical patterns in their responses. When RMT was compared with other taxonomies (and there are a lot of them), it starkly outperformed its competitors.
  • Ethnographies. RMT was invented by anthropologist Alan Fiske to capture regularities he saw across different cultures. For example, he found examples of marriage treated as Dominance, as Market Pricing, etc – but never a fifth type. A number of cross-cultural studies indicate that the four relational models constitute a human universal.
  • Social errors. When people misremember a person’s name, it tends to be a person with whom they share the same relationship type. For example, if you flub the name of your boss, you are more likely to say the name of someone else in a position of authority over you.
  • Brain studies. In the cortex, the default mode network is widely associated with social processing. But within this network, different subregions are activated when processing e.g., Communality vs Reciprocity relationships.

The Relational Sphere Hypothesis

Human societies can be conceived as operating in three spheres: markets, governments, and communities. The Cultural Sphere Hypothesis holds this trichotomy to be fundamental, and exhaustive of social space.

[Figure: Relational Models: Cultural Regime Dissociations]

There seems to be a relationship between the cultural spheres and relational models. But there are three spheres vs four models. What gives?

Things become clearer when we remember that market-based economies were invented during the Neolithic Revolution, with the dawn of agriculture. Before this inflection point in history, transactions took place within gift economies.

This suggests that the Market Pricing relational model is evolutionarily recent: before the invention of agriculture, it simply did not exist.

[Figure: Relational Models Theory: Models vs Spheres]

I call this particular mapping from relational models to cultural spheres the Relational Sphere Hypothesis (RSH). It is an intertheoretic reduction: it purports to be a significant join point between micro- and macro-sociality.

RSH predicts that three out of four relational models can be traced back to the birthplace of Homo sapiens. Thus, we should expect predecessors for these relationship categories in primate societies! And we find precisely that:

  • Dominance models are expressed in the dominance hierarchy (where physical dominance slowly gave way to symbolic dominance).
  • Communality models are expressed in kin selection (where attachment to and care for relatives was slowly extended towards e.g. close friends).
  • Reciprocity models are expressed in reciprocal altruism (where increasingly large delays between favor-transactions became possible).

I have argued elsewhere that the dual-process models so popular in today’s moral psychology can be captured in the interactions between (cortical) propriety frames and (subcortical) social intuitions. These two systems comprise the building blocks of sociality. RSH dovetails nicely with this dual-process account, as it identifies categories within these systems, each with its own distinctive logic:


With the exception of Sanctity, these subconscious social intuitions arguably exist in primates. For example, here is evidence that rhesus monkeys have strong intuitions about Fairness:

A New Kind of Social Network

The Relational Sphere Hypothesis can be further illustrated by social networks: graphs where nodes are individuals, and edges are relationships. These kinds of models are very common across the many disciplines that study aggregate social phenomena, such as evolutionary game theory. A social network may look something like this:

[Figure: Relational Models: Aggregated Social Networks]

But relationships inhabit different categories. We can express this fact by coloring edges according to their relational model:

[Figure: Relational Models: Complete Social Network]

Note that some nodes (e.g. A and B) are connected by more than one color. This signifies that the relationship between A and B features both Communality and Dominance.

From this more complete picture of human relationships, we can derive our cultural spheres by examining the (mono-color) subgraphs:

[Figure: Relational Models: Social Network Subgraphs]
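The coloring and subgraph steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy network, the names `EDGES`, `MODEL_TO_SPHERE`, `models_between`, `subgraphs_by_model`, and `subgraphs_by_sphere` are all hypothetical, and the model-to-sphere mapping is my reading of the discussion above (Exchange as Economic, Dominance as Political, Communality and Reciprocity as Social), not something the figures are guaranteed to match.

```python
from collections import defaultdict

# ASSUMPTION: one plausible mapping from relational models to cultural spheres.
MODEL_TO_SPHERE = {
    "Communality": "Social",
    "Reciprocity": "Social",
    "Dominance": "Political",
    "Exchange": "Economic",
}

# Hypothetical toy network: each edge is (person, person, relational model).
# A and B share two edges, i.e. their relationship has two "colors".
EDGES = [
    ("A", "B", "Communality"),
    ("A", "B", "Dominance"),
    ("B", "C", "Exchange"),
    ("C", "D", "Reciprocity"),
    ("A", "D", "Exchange"),
]


def models_between(edges, a, b):
    """Return the set of relational models linking two individuals."""
    pair = frozenset((a, b))
    return {model for x, y, model in edges if frozenset((x, y)) == pair}


def subgraphs_by_model(edges):
    """Group undirected edges into one mono-color subgraph per relational model."""
    groups = defaultdict(set)
    for a, b, model in edges:
        groups[model].add(frozenset((a, b)))
    return dict(groups)


def subgraphs_by_sphere(edges, model_to_sphere=MODEL_TO_SPHERE):
    """Group undirected edges by cultural sphere, using the assumed mapping."""
    groups = defaultdict(set)
    for a, b, model in edges:
        groups[model_to_sphere[model]].add(frozenset((a, b)))
    return dict(groups)


if __name__ == "__main__":
    print(models_between(EDGES, "A", "B"))
    for sphere, pairs in sorted(subgraphs_by_sphere(EDGES).items()):
        print(sphere, sorted(sorted(p) for p in pairs))
```

Storing each relationship as a separate labeled edge (rather than one edge per pair) is what lets a single pair of people appear in more than one subgraph at once.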

Sphere Evolution & Competition

Political, social, and economic institutions have dramatically changed across the course of human history. As we saw in Deep History of Humanity, the evolution of our species can be usefully divided into three time periods:

[Figure: Relational Models: Sphere Evolution]


The Sphere Competition Conjecture comprises a set of informal intuitions that relational models “compete for our attention”: gains in one sphere are often accompanied by losses in another.

Let me illustrate this conjecture with examples. 🙂

Social vs Economic.

  • The religious instinct is etched deeply into the hominid mind, and evidence for shamanic animism dates back to the advent of behavioral modernity. Modern religion is located squarely within the Social sphere. But what caused its institutionalization, the invention of the full-time religious specialist: the priest? Religious institutions were founded during the transition from gift economies to market economies. For the first time in history, material wealth mattered more in transactions than interpersonal reputation. With the Social sphere threatening to collapse, perhaps it is not a coincidence that it was at this moment in history that religion became more explicitly social.
  • Some existential philosophers argue that the industrial revolution, with its obscenely large increase in Economic productivity, has correlated with a weakening of Social values, as witnessed empirically by the rise of materialism. Perhaps the malaise and cynicism of postmodernity can be explained by the weakening of the ties of community.
  • The custom of tipping can be conceived as an organ of Sociality that feels misplaced in today’s Market-oriented economy. This institution shows no signs of abating (for example, Uber recently rescinded its no-tipping policy). Perhaps the reason this Social technology persists, while others have disintegrated, is that tipping solves the principal-agent problem: customer service is otherwise not factored into the price, because that information is not easily available to management.
  • Product boycotts are another example of Social outrage affecting Economic markets.

Social vs Political.

  • Another important event in the history of religion is the transition to universal religions: where the concerns of the gods and the consequences of moral violations were imbued with an aura of the eternal. Anthropological evidence clearly suggests that universal religions succeeded because they facilitated larger group sizes.
  • Corruption is often treated as a political problem, but in fact bribery and collusion both require high amounts of social capital.
  • In American history, political partisanship has been most severe in two periods: the 1880s and the present. Both are periods of intense drought in social capital. Further, participation in voting strongly correlates with vibrant community and civic life. We might conjecture that weaker communities are more vulnerable to partisan infighting. This conjecture is aligned with the oft-cited observation that partisanship tends to correlate with moderates abandoning the political arena.

Economic vs Political.

  • Capitalist Peace Theory formalizes the observed inverse relationship between free trade and international conflict. On this hypothesis, one of the strongest predictors of war is resource acquisition, and the risk-benefit calculus changes (improves) substantially with the removal of tariffs.

Economic vs Political vs Social.

  • The Size of Nations Hypothesis is the idea that the size of a nation (Political) is driven by two competing factors: larger nations are able to produce public goods more efficiently (Economic), but conversely their populations are more heterogeneous and thereby less cohesive (Social).

Some of the phenomena described above have been extensively studied by social scientists. However, to my knowledge, no extant formal models robustly capture the claims of Relational Models Theory. Perhaps the next generation of formal models will do better.

Recommended Resources