The Documentary Hypothesis

Part Of: Demystifying Religion sequence
Followup To: A Secret in The Ark
Content Summary: 1900 words, 19min read.

Who Wrote The Hebrew Bible?

A close reading of the Hebrew Bible reveals the existence of doublets: two stories that describe the same event. A few examples:

  • Abraham’s covenant (Genesis 15:1-21 and 17:1-27)
  • Jacob becoming Israel (Genesis 32:25-33 and 35:9-15)
  • Yahweh summons Moses (Exodus 3-4 and 6:2-30)
  • Water in the wilderness (Exodus 15:22b-25a and 17:1-7)

Dozens of these doublets appear throughout the first five books of the Hebrew Bible (also known as the Torah). Traditionally, the Torah is thought to have a single author, and doublets like these were explained as either a) different events, or b) the same event told with different emphases.

But what if these doublets exist because the Torah has multiple authors?

Let’s look deeper.

Source Identification as Unsupervised Learning

In principle, how might we discern between a single- and a multi-author book?

The Clustering Method. Let’s conjecture two sources (clusters) and assign each sentence to either Cluster 1 or Cluster 2. We have complete freedom in our assignments. We want to choose clusters that maximize the coherence within each source, and also maximize the difference between the sources.

  • If the clusters are not very different, there is probably only one author.
  • If they are very different, we can safely conclude two authors.

For readers familiar with machine learning: this is unsupervised learning – searching for latent variables that best explain our data.
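
For the curious, here is a minimal sketch of the Clustering Method in Python. Everything in it is an assumption made for illustration: the bag-of-words features, the cosine similarity, the `score` and `best_split` names, and the four toy sentences (which I made up; they are not biblical text). It brute-forces every two-cluster assignment, so it only works for a handful of sentences.

```python
from collections import Counter
from itertools import combinations
import math


def bag_of_words(sentence):
    """Lowercased word counts for one sentence."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score(sentences, labels):
    """Mean within-cluster similarity minus mean between-cluster similarity."""
    vecs = [bag_of_words(s) for s in sentences]
    within, between = [], []
    for i, j in combinations(range(len(vecs)), 2):
        bucket = within if labels[i] == labels[j] else between
        bucket.append(cosine(vecs[i], vecs[j]))
    if not within or not between:
        return 0.0
    return sum(within) / len(within) - sum(between) / len(between)


def best_split(sentences):
    """Try every assignment of sentences to two non-empty clusters."""
    n = len(sentences)
    best_labels, best_score = None, -math.inf
    # The last sentence always stays in cluster 0, which skips mirror-image splits.
    for mask in range(1, 2 ** (n - 1)):
        labels = [(mask >> i) & 1 for i in range(n)]
        s = score(sentences, labels)
        if s > best_score:
            best_labels, best_score = labels, s
    return best_labels, best_score


# Toy sentences, invented for this example.
toy = [
    "the priest measured the altar in cubits",
    "the priest counted the cubits of the court",
    "the storm broke and the river rose",
    "the river rose and the storm raged",
]
labels, separation = best_split(toy)
print(labels, round(separation, 3))
```

On these toy sentences the best split puts the two priest sentences in one cluster and the two storm sentences in the other. The sketch deliberately leaves open how large `separation` must be before we conclude two authors; that threshold would have to come from comparison against books known to have a single author.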

A Tale of Two Books

Suppose you encounter a book you have never read before, originally written in English by a single author. Call this Book A.

But you don’t know if Book A has one or two authors! To find out, you might use the Clustering Method.

What happens if you look at every sentence in Book A and try to make the two source-clusters as different as possible? Even for a book written by a single author, the resulting clusters can be contrived to look different. For example, you could put all optimistic sentences in one bucket, and all pessimistic sentences in the other. But even though the two texts feel a little different, they don’t differ that much (after all, a single person wrote both!).

In contrast, imagine you come across another book, Book B, replete with doublets. You break those doublets into clusters, and discover the following facts:

  1. Dialect. One cluster uses an antiquated dialect of English (e.g., Shakespearean), the other a modern dialect (e.g., African-American Vernacular English).
  2. Terminology. One cluster consistently uses the word “soda”, the other consistently uses the alternative, “pop”. 
  3. Consistent Content. One cluster is very interested in economic issues. The other is more interested in rehashing political debates.
  4. Narrative Flow.  Reading each cluster as a standalone book tends to smooth out non-sequiturs, and generally improve the sense of narrative flow.
  5. Inter-Source Relationships. Imagine Book B is situated in an anthology with other books (B2 and B3) of unknown authorship. These other books are kinda dissimilar from B. But B2 has lots in common with Cluster 1, and B3 sounds like it shares an author with Cluster 2.
  6. Historical Grounding. Given the above information, we can make a pretty good guess as to the identity of both authors, and why they got merged into a single anonymous volume.

On this evidence, it seems very unlikely that there is a single author of Book B. Instead, most people would indeed accept that this document has two different authors.
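
Of the six facts, Terminology is the easiest to check mechanically. Here is a rough sketch, assuming we already have a proposed split into two clusters. The function names and the toy sentences are hypothetical, invented for illustration.

```python
from collections import Counter


def term_distribution(clusters, terms):
    """Count how often each term occurs in each cluster.

    clusters: dict mapping a cluster name to a list of sentences.
    """
    table = {}
    for term in terms:
        counts = Counter()
        for name, sentences in clusters.items():
            counts[name] = sum(s.lower().split().count(term) for s in sentences)
        table[term] = dict(counts)
    return table


def exclusivity(counts):
    """Share of a term's occurrences found in its dominant cluster (1.0 = exclusive)."""
    total = sum(counts.values())
    return max(counts.values()) / total if total else 0.0


# Toy clusters, invented for this example.
book_b = {
    "cluster_1": ["she bought a soda downtown", "the soda was cold"],
    "cluster_2": ["he ordered a pop", "the pop went flat quickly"],
}
for term, counts in term_distribution(book_b, ["soda", "pop"]).items():
    print(term, counts, exclusivity(counts))
```

A single exclusive term proves little on its own. The signal is many terms lining up with the same split, which is why a table across many terms is more useful than one count.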

The Hypothesis: Five Sources

The Hebrew Bible is like Book B. Only, instead of two distinct authors, we have identified five. These are the Jahwist source (J), the Elohist source (E), the Priestly source (P), the Deuteronomist source (D), and the Redactor (R). This is the Documentary Hypothesis.

We will explore the different personalities of these authors in more detail next section; for now, I want to briefly describe their contributions to the Torah from a textual perspective:

[Figure: Documentary Hypothesis: Source Distribution]

And here is the timeline on which the source documents were authored, ending with the redactor R (Ezra) compiling the final JEPD product.

[Figure: Documentary Hypothesis: Composition Timeline]

Evidence For The Hypothesis

How do we know all of this? On the following grounds:

  1. Dialect. Sources J and E are written in the Hebrew of the 10th century BCE. In contrast, P and D are written in the Hebrew of the 8th century BCE.
  2. Terminology. A couple of examples. Source D alone uses the phrase “with all your heart and with all your soul”. Source P accounts for all 100 instances of the word “congregation”, and 67 out of 69 instances of the word “chieftain”. Here are more examples:

[Figure: Documentary Hypothesis: Terminology]

  3. Consistent Content.
    • The Revelation of God’s Name. According to J, the name YHWH was known since the earliest generations of humans. But E and P state just as explicitly that YHWH does not reveal this name until the generation of Moses.
    • Sacred Objects.
      • Tabernacle: P discusses the Tabernacle 200 times; it receives more attention than any other subject. It is never mentioned in J or D. E mentions it three times.
      • The Ark: J identifies the ark as crucial to Israel’s travels and military successes; it is never mentioned in E.
      • Urim and Thummim: P mentions Urim and Thummim. J, E, and D never do.
      • Cherubs: P and J invoke cherubs. E and D never do.
      • Miracles: E has miracles performed by Moses’ staff. P uses Aaron’s staff.
    • Priestly Leadership. In P, access to the divine is limited to Aaronid priests. There is no talk of dreams, angels, talking animals, or judges, and prophets are rarely mentioned. These themes are developed almost exclusively in J, E, and D.
  4. Narrative Flow. Reading J, E, D, and P as standalone narratives tends to remove non-sequiturs and contradictions, and generally improves the sense of narrative flow. Want to see this for yourself? Compare the composite story of Noah with the two original stories (which a later redactor wove together).
  5. Inter-Source Relationships. Source D shares the same tone, emphases, and worldview as the book of Jeremiah. Source P resonates strongly with the book of Ezekiel. Finally, Sources J and E mirror the book of Hosea.
  6. Historical Grounding. This is the most exciting piece of evidence, for reasons I will explore more fully next time. Suffice it to say that we can localize each source to the historical context in which it was written. We have evidence suggesting that J and E were composed during the divided monarchy, before Israel fell in 722 BCE. J is written from a Southern perspective (in Judah); E is written from a Northern perspective (in Israel). After the fall of the northern kingdom, many Israelites fled to Judah. Because the old tribal disputes had faded in importance, J and E were combined into a JE narrative. The Priestly source P was an alternative telling of JE written in 8th century Judah. Finally, the first iteration of Deuteronomy was composed during the reign of King Josiah (c. 640-609 BCE), whose reform is dated to 622 BCE, only a few decades before the Babylonian exile began (597 BCE).

I’ll let Richard Elliott Friedman wrap up this section.

Above all, the strongest evidence establishing the Documentary Hypothesis is that several different lines of evidence converge. There are more than thirty cases of doublets: stories or laws that are repeated in the Torah. The existence of so many overlapping texts is noteworthy itself. But their mere existence is not the strongest argument. One could respond, after all, that this is just a matter of style or narrative strategy. Similarly, there are hundreds of apparent contradictions in the text, but one could respond that we can take them one by one and find some explanation for each contradiction. And, similarly, there is a matter of the texts that consistently call the deity God while other texts consistently call God by the name YHWH, to which one could respond that this is simply like calling someone sometimes by his name and sometimes by his title.

The powerful argument is not any one of these matters. It is that all these matters converge. When we separate the doublets, this also results in the resolution of nearly all the contradictions. And when we separate the doublets, the name of God divides consistently in all but three out of more than two thousand occurrences. And when we separate the doublets, the terminology of each source remains consistent within the source. And when we separate the sources, this produces continuous narratives that flow with only a rare break. And when we separate the sources, this fits with the linguistic evidence, where the Hebrew of each source fits consistently with what we know of the Hebrew in each period. And so on for each of the categories that precede this section.

The name of God and doublets were the starting-points of the investigation into the formation of the Bible. But they are not major arguments or evidence in themselves. The most compelling argument is that all this evidence of so many kinds comes together so consistently. To this day, no one known to me who challenged the hypothesis has ever addressed this fact.

Open Questions

Most scholars agree with the broad picture of four sources (J, E, P, D) and two redactions (JE and JEPD). There does exist considerable controversy at finer levels of detail. The four most contentious mini-debates I know of are as follows:

  • While there is consensus on the dating of J, E, and D, the dating of P is somewhat controversial (700 vs 500 BCE).
  • The exact relationship of J and E is at times hard to work out, particularly because E has less material than J. Were parts of E ejected during the redaction process of JE? Or was E composed as a supplement to J, and not a standalone work?
  • It is hard to make out how the two redaction processes actually worked. The Hebrew Bible is arguably the earliest long work of prose narrative we have (earlier long literary works were largely poetry), so there is little comparable material to reason from.
  • There is consensus that J, E, P, and D were authored long after the events that they describe. They were undoubtedly influenced by early oral traditions. However, the extent of continuity and historical memory transferred from these oral traditions is in some doubt.


I was raised in an evangelical household, which means that growing up, I read the Hebrew Bible (known to Christians as the Old Testament) cover-to-cover several times. I found such reading difficult. Some of this was mere cultural distance: a kid in the 20th century CE is three millennia removed from the Canaanite culture in which the Bible was written.

But for me, the Hebrew Bible feels much easier to understand in light of the Documentary Hypothesis.

  • Contradictions are explained.
  • The within-source stories flow much better.
  • It is easier to understand the narrative discontinuities in the composite.
  • The diverse perspectives can be situated within their originating cultural milieu.

I wish more people knew about the Documentary Hypothesis for these reasons. Or better yet, could look at the labelled sources of the Hebrew Bible online. But for now, if you’d like to read the Hebrew Bible yourself with labelled sources, the best way is simply to purchase a book like The Bible With Sources Revealed, which contains the complete Torah, color coded by authorship.

Until next time.


A Secret In The Ark

Part Of: History sequence
Content Summary: 1500 words, 15min read


Today, I want to try something unusual: I want to analyze the story of Noah from a literary perspective. Some surprises lurk beneath the surface.

A Fresh Take On Noah

Try your utmost to read the following with fresh eyes. There will be a quiz after! (Okay, so you can review its four questions below, and there is no grade. :P)

Ready to begin? Okay. See you soon!

Examining The Text

Q1. How many animals?

You are to bring into the ark two of all living creatures, male and female, to keep them alive with you. Two of every kind of bird, of every kind of animal and of every kind of creature that moves along the ground will come to you to be kept alive.

Take with you seven pairs of every kind of clean animal, a male and its mate, and one pair of every kind of unclean animal, a male and its mate, and also seven pairs of every kind of bird, male and female

Now, the above seems contradictory.  The difference seems to be:

  • { “clean”: “1 pair”; “unclean”: “1 pair” }     vs
  • { “clean”: “7 pairs”; “unclean”: “1 pair” }

Is this apparent contradiction a real one? Can it be resolved? Such questions are irrelevant to the argument. The simple point is: there is tension in the narrative.

Q2. How long did the flood last?

Another hard question. Take your best guess.

As you re-read the story, you are probably struck with the fact that there is A LOT of temporal information in this story. The task of constructing a coherent answer is hard. Especially when you compare quotes like these:

For forty days the flood kept coming on the earth

The waters flooded the earth for a hundred and fifty days.

Again, the point here is about tension. Notice your confusion.

Q3. How was the narrative flow?

Yes, the narrative had structure. Yes, its plot holds together. But was it a pleasure to read?

Well, I didn’t think so.

To most modern readers, perhaps, the level of detail is painful, the amount of repetition tiresome. What are we to make of this? Are we to judge the story’s author as less enlightened regarding narrative structure?

A typical counter-argument is that such a judgment is chronological snobbery. Writing styles change, and over the millennia they plausibly change a lot.

But this response misses the point. For it turns out that these Israelite authors were better at constructing prose than the text might suggest at first glance.

Q4. What is the point-of-view of the author?

Could you create a compelling answer to this question, dear reader? I’m not sure if I could. My answer would be vague, and would lean heavily on the contents of the story itself.

A New Hypothesis

Okay, so we’ve identified a few points of discomfort within the story.  If we modify our beliefs about how it was constructed, can we better explain our confusion?

Consider what happens if we view this text as the work of two different authors. We’d then need to get out two highlighters, and guess which passages come from the first, and which come from the second. Let’s consider one such guess now. I’d like you to just briefly skim through the following:

Notice anything cool?

As an aside: I want you thinking about how we could automate this “highlighter procedure”. Could we teach a computer how to reconstruct multiple authorship, if and only if such blending had occurred? How would we make it learn the process? How could we test it?
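
One hedged way to approach the testing question: take two texts whose authorship we already know, weave them together ourselves, and see whether a given procedure recovers the split. The sketch below only builds that test harness; `interleave` and `recovery_accuracy` are hypothetical names, and the coin-flip weaving is an assumption, not a model of how a real redactor worked.

```python
import random


def interleave(source_a, source_b, seed=0):
    """Weave two sentence lists into one composite, keeping each source's own order.

    Returns the composite and the hidden truth labels (0 = source_a, 1 = source_b).
    """
    rng = random.Random(seed)
    a, b = list(source_a), list(source_b)
    composite, truth = [], []
    while a or b:
        # Draw from a when b is exhausted, otherwise flip a fair coin.
        take_a = bool(a) and (not b or rng.random() < 0.5)
        composite.append(a.pop(0) if take_a else b.pop(0))
        truth.append(0 if take_a else 1)
    return composite, truth


def recovery_accuracy(truth, predicted):
    """Fraction of sentences labelled correctly, ignoring which cluster is called 0 or 1."""
    agree = sum(t == p for t, p in zip(truth, predicted))
    return max(agree, len(truth) - agree) / len(truth)


# Placeholder sentence labels standing in for two texts of known authorship.
composite, truth = interleave(["a1", "a2", "a3"], ["b1", "b2"])
print(composite, truth)
```

The “if and only if” part needs a control: run the same procedure on a book known to have one author, and check that it does not report a confident split there.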

Okay, time to name the authors.

  • The author of the orange text we shall call J: the Jahwist source (because he likes to use the name YHWH).
  • The author of the pink text we shall call P: the Priestly source (for reasons I’ll explain in my next article).

Refining Our Hypothesis

Imagine for a moment I have written a novel. Do you think you would be able to carve my novel into two pieces, and preserve the structure and coherence of both halves?  I suspect not.

Let us name our hypotheses:

  • Let H1 represent the original, one-author hypothesis.
  • Let H2 represent the new, two-author hypothesis.

H2 can be visualized as follows:

[Figure: Compilation of Noah]

I’ve already shown you the right hand side (the previous excerpt). Now, I’ll introduce you to the (more exciting) left hand side: the original narratives.

Evaluating The Evidence

Like good little Bayesians, we have H1 (one author) and H2 (two authors) floating around in our mental apparatus. Which hypothesis best explains this document?

To find out, let’s revisit the evidence.

Q1: How many animals were brought onto the ark?

  • The Jahwist narrative has the rule: 7 pairs for clean animals, 1 pair for unclean animals.
  • The Priestly narrative has the rule: 1 pair of all living creatures.

The tension dissolves.

Notice that the burnt offering only occurs in the Jahwist tale, and he is careful to describe the sacrifice of only clean animals (of which his version has 7 pairs). No more need to worry about burnt offerings causing extinctions! 🙂

Q2: How long did the flood last?

  • The Jahwist narrative has the flood lasting for 40 days.
  • The Priestly narrative has the flood lasting for 150 days.

The tension dissolves.

Q3: How would you rate the narrative flow?

… it’s a lot better!

Q4: How well can you make out the author’s point-of-view?

Recall that, before, we didn’t have much of an answer: we just mumbled something about the story. But now, look:

  • P only uses the more universal term God (16 times). J uses the more personal YHWH exclusively (10 times).
  • P is interested in details such as ark dimensions, and lineages (only he names the sons of Noah). J is more oriented around the events.
  • P uses very precise dates, reminiscent of a calendar. J uses the numeric theme of 7 and 40.
  • Stylistically, P reads like the work of a scribe. J reads like an epic saga, like the Epic of Gilgamesh.
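
For readers who want the Bayesian bookkeeping spelled out, here is a small sketch of how the four observations could be combined. The function name is hypothetical, and the likelihood ratios in the example are placeholders chosen for illustration, not estimates of anything. It also assumes the pieces of evidence are independent, which is a simplification.

```python
import math


def posterior_probability(prior_h2, likelihood_ratios):
    """Posterior P(H2 | evidence) from a prior and per-observation likelihood ratios.

    Each ratio is P(E_i | H2) / P(E_i | H1). The observations are treated as
    independent, so their log ratios simply add to the prior log odds.
    """
    if not 0 < prior_h2 < 1:
        raise ValueError("prior_h2 must be strictly between 0 and 1")
    log_odds = math.log(prior_h2 / (1 - prior_h2))
    log_odds += sum(math.log(r) for r in likelihood_ratios)
    return 1 / (1 + math.exp(-log_odds))


# Placeholder ratios, one per question Q1-Q4; these are NOT measured values.
illustrative_ratios = [3.0, 3.0, 2.0, 2.0]
print(posterior_probability(0.5, illustrative_ratios))
```

The point of the sketch is the shape of the argument: several modest ratios multiply, so evidence that is individually unimpressive can still move the posterior a long way when it all points in the same direction.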

Epistemic Status

I am not a philologist. I did not make this argument. What do the experts think?

The multiple authorship solution to the story of Noah (H2)  is the consensus of modern academia. It is not a contentious issue.

That this consensus is not public knowledge to those who would like to know is a rather interesting cultural failure mode.

Parting Thoughts

I hope that learning about the two authors of Noah elicited an “aha moment” from you. A few parting thoughts:

  • The debates surrounding apparent contradictions in the Bible would be more useful if they incorporated source criticism results like these.
  • It seems long overdue for resources like BibleGateway to offer different versions of authorship highlighting, just as they do for translation options.
  • Which narrative did the Noah movie borrow from the most, and will the OTHER STORY also land a blockbuster hit? 😉

Next time, I will situate this example of multiple authorship inference within the context of the Documentary Hypothesis and the modern atmosphere of Biblical studies. See you then!


During the construction of this article, I drew from this textbook and this UPenn resource.