Five Tribes of Machine Learning

Part Of: Machine Learning sequence
Content Summary: 900 words, 9 min read

ML is tribal, not monolithic

Research in artificial intelligence (AI) and machine learning (ML) has been going on for decades. Indeed, the textbook Artificial Intelligence: A Modern Approach reveals a dizzying variety of learning algorithms and inference schemes. How can we make sense of all the technologies on offer?

As argued in Domingos’ book The Master Algorithm, the discipline is not monolithic. Instead, five tribes have progressed relatively independently. What are these tribes?

  1. Symbolists use formal systems. They are influenced by computer science.
  2. Connectionists use neural networks. They are influenced by neuroscience.
  3. Bayesians use probabilistic inference. They are influenced by statistics.
  4. Evolutionaries are interested in evolving structure. They are influenced by biology.
  5. Analogizers are interested in mapping to new situations. They are influenced by psychology.

Expert readers may better recognize these tribes by their signature technologies:

  • Symbolists use decision trees, production rule systems, and inductive logic programming.
  • Connectionists rely on deep learning technologies, including RNN, CNN, and deep reinforcement learning.
  • Bayesians use Hidden Markov Models, graphical models, and causal inference.
  • Evolutionaries use genetic algorithms, evolutionary programming, and evolutionary game theory.
  • Analogizers use k-nearest neighbor, and support vector machines.
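To make the Analogizer entry concrete, here is a minimal k-nearest neighbor classifier. This is a toy sketch of my own (the data points and function names are invented for illustration, not taken from any of the cited texts): classification by analogy means answering "which known examples is this new case most similar to?"

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k closest training points."""
    # Sort training examples by Euclidean distance to the query point.
    neighbors = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy 2-D data: two made-up clusters with labels "a" and "b".
train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]

print(knn_predict(train, (0.5, 0.5)))  # near the "a" cluster
print(knn_predict(train, (5.5, 5.5)))  # near the "b" cluster
```

Note that there is no "training" step at all: the model *is* the stored examples, which is exactly the Analogizer instinct.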

[Figure: Five Tribes - Strengths and Technologies]

In fact, my blog can be meaningfully organized under this research landscape.

History of Influence

Here are some historical highlights in the development of artificial intelligence.

Symbolist highlights:

  • 1950: Alan Turing proposes the Turing Test in Computing Machinery & Intelligence.
  • 1974-80: The frame problem & combinatorial explosion cause the First AI Winter.
  • 1980: Expert systems & production rules re-animate the field. 
  • 1987-93: Expert systems too fragile & expensive, causing the Second AI Winter.
  • 1997: Deep Blue defeated reigning chess world champion Garry Kasparov.

Connectionist highlights:

  • 1957: Perceptron invented by Frank Rosenblatt.
  • 1969: Minsky and Papert publish the book Perceptrons, criticizing single-layer perceptrons. This puts the entire field to sleep, until…
  • 1986: Backpropagation invented, and connectionist research restarts.
  • 2006: Hinton et al. publish A fast learning algorithm for deep belief nets, which rejuvenates interest in Deep Learning.
  • 2017: AlphaGo defeats reigning Go world champion, using DRL.

Bayesian highlights:

  • 1953: Markov chain Monte Carlo (MCMC) invented. Bayesian inference finally becomes tractable on real problems.
  • 1968: Hidden Markov Model (HMM) invented.
  • 1988: Judea Pearl authors Probabilistic Reasoning in Intelligent Systems, and creates the discipline of probabilistic graphical models (PGMs).
  • 2000: Judea Pearl authors Causality: Models, Reasoning, and Inference, and creates the discipline of causal inference on PGMs.

Evolutionary highlights

  • 1975: Holland invents genetic algorithms.

Analogizer highlights

  • 1968: The k-nearest neighbor algorithm gains popularity.
  • 1979: Douglas Hofstadter publishes Gödel, Escher, Bach.
  • 1992: support vector machines (SVMs) invented.

We can summarize this information visually, by creating an AI version of the Histomap:

[Figure: Five Tribes - Historical Size and Competition]

These data are my own impression of AI history. It would be interesting to replace them with real funding & paper-volume data.

Efforts Towards Unification

Will there be more or fewer tribes, twenty years from now? And which sociological outcome is best for AI research overall?

Theory pluralism and cognitive diversity are underappreciated assets to the sciences. But scientific progress is often heralded by unification. Unification comes in two flavors:

  • Reduction: identifying isomorphisms between competing languages,
  • Generalization: creating a supertheory that yields antecedents as special cases.

Perhaps AI progress will mirror revolutions in physics, like when Maxwell unified theories of electricity and magnetism.

Symbolists, Connectionists, and Bayesians suffer from a lack of stability, generality, and creativity, respectively. But one tribe’s weakness is another tribe’s strength. This is a big reason why unification seems worthwhile.

What’s more, each tribe possesses “killer apps” that the other tribes would benefit from. For example, only Bayesians are able to do causal inference; learning causal relations in logical structures, or in neural networks, remains an important unsolved problem. Similarly, only Connectionists are able to explain modularity (function localization). The Symbolist and Bayesian tribes are more normative than Connectionism, which makes their technologies tend towards (overly?) universal mechanisms.

Symbolic vs Subsymbolic

You’ve heard of the symbolic-subsymbolic debate? It concerns reconciling Symbolist and Connectionist interpretations of neuroscience. But some (e.g., [M01]) claim that both theories might be correct, at different levels of abstraction. Marr [M82] once outlined a hierarchy of explanation, as follows:

  • Computational: what is the structure of the task, and viable solutions?
  • Algorithmic: what procedure should be carried out, in producing a solution?
  • Implementation: what biological mechanism in the brain performs the work?

One theory, supported by [FP98], is that Symbolist architectures (e.g., ACT-R) may be valid explanations that are somehow “carried out” by Connectionist algorithms & representations.

[Figure: Five Tribes - Tribes vs Levels]

I have put forward my own theory: that Symbolist representations are properties of the Algorithmic Mind, whereas Connectionism is more relevant to the Autonomous Mind.

This distinction may help us make sense of why [D15] proposes Markov Logic Networks (MLNs) as a bridge between Symbolist logic and Bayesian graphical models. He seeks to generalize these technologies into a single construct, in the hope that he can later find a reduction of MLNs in the Connectionist paradigm. Time will tell.


Today we discussed five tribes within ML research: Symbolists, Connectionists, Bayesians, Evolutionaries, and Analogizers. Each tribe has different strengths, technologies, and developmental trajectories. These categories help us parse technical disputes and locate promising research vectors.

The most significant problem facing ML research today is: how do we unify these tribes?


  • [D15] Domingos (2015). The Master Algorithm
  • [M01] Marcus (2001). The Algebraic Mind
  • [M82] Marr (1982). Vision
  • [FP98] Fodor & Pylyshyn (1988). Connectionism and cognitive architecture: A critical analysis

Logic Design: Harmony in IPL

Followup To: Logic Structure: Connectives in IPL
Part Of: Logic sequence
Content Summary: 300 words, 3 min read


Last time, we looked at Intuitionistic Propositional Logic (IPL). In IPL, there are five connectives, and hence five introduction-elimination pairs:

[Figure: IPL - All Rules]

What if you had to design a new logic from scratch? Suppose we were to invent five new connective symbols. Would you start by defining their introduction rules, and use these to derive the elimination rules? Or would you instead define elimination first?

This choice reflects different ways to interpret the semantics of logic:

  • The verificationist starts with introduction. For them, the meaning of a connective is given by its constructors (introduction rules).
  • The pragmatist starts with elimination. For them, the meaning of a proposition lies in how you use it.

But if introduction and elimination rules agree, then a logical system has harmony.

How do we evaluate harmony in practice? Harmony is defined by two properties:

  • Local soundness: if I introduce and then eliminate a connective, do I gain information? If so, the elimination rules are too strong.
  • Local completeness: if I eliminate and then re-introduce a connective, do I lose information? If so, the elimination rules are too weak.

Demonstrating Harmony in IPL

We can show that conjunction rules exhibit harmony.

[Figure: IPL Harmony - Conjunction Connective]

Note that we have only shown soundness for left-elimination; the demonstration for right-elimination is analogous.
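For the conjunction connective, the two checks can also be written compactly in proof-term notation. This is a sketch of the standard presentation (anticipating the Curry-Howard reading mentioned at the end of this article), not a transcription of the figure; ⟨M, N⟩ is the pairing (introduction) form and π₁, π₂ are the projections (eliminations):

```latex
% Local soundness: an introduction followed by an elimination reduces away.
\pi_1 \langle M, N \rangle \;\Longrightarrow_{\beta}\; M
\qquad
\pi_2 \langle M, N \rangle \;\Longrightarrow_{\beta}\; N

% Local completeness: any proof of A \wedge B can be eliminated
% and then re-introduced without losing information.
M : A \wedge B \;\Longrightarrow_{\eta}\; \langle \pi_1 M, \pi_2 M \rangle
```

The soundness direction is a reduction (a detour is simplified away); the completeness direction is an expansion (the original proof is recoverable from its parts).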

Implication rules also exhibit harmony.

[Figure: IPL Harmony - Implication Connective]

So does disjunction.

[Figure: IPL Harmony - Disjunction Connective]

It is trivial to demonstrate the harmony of truth and falsity. Thus, we can say that IPL, as a formal system, has harmony.


In this article, we have discussed harmony, which helps us evaluate how useful a given formal system is. This notion may seem straightforward in IPL; however, it will prove useful in designing new logics, such as linear logic.

Another, more subtle point: the soundness demonstration also seems to reflect a logic of simplification. This point will return when we discuss the Curry-Howard-Lambek correspondence, and the deep symmetries between logic and computation.

Until next time.

Satisfiability & the Zebra Puzzle

Part Of: Logic sequence
Content Summary: 1000 words, 10 min read.

Today, we look at the Zebra Puzzle (aka Einstein Puzzle). According to legend, Albert Einstein invented this puzzle as a child and claimed that 98% of the human population cannot solve it.

Let’s see if we are in the 2%.

The Puzzle

Five men of different nationalities and with different jobs live in consecutive houses on a street. These houses are painted different colors. The men have different pets and have different favorite drinks. The following rules are provided:

  1. The English man lives in a red house
  2. The Spaniard owns a dog
  3. The Japanese man is a painter
  4. The Italian drinks tea
  5. The Norwegian lives in the first house on the left
  6. The green house is immediately to the right of the white one
  7. The photographer breeds snails
  8. The diplomat lives in the yellow house
  9. Milk is drunk in the middle house
  10. The owner of the green house drinks coffee
  11. The Norwegian’s house is next to the blue one
  12. The violinist drinks orange juice
  13. The fox is in a house that is next to that of the physician
  14. The horse is in a house next to that of the diplomat

Who owns a zebra? And whose favorite drink is mineral water?

To answer this problem, we must determine 5 house-nation-color-drink-pet-job combinations. A solution might look like this:

  • Yellow far-left house has Norwegian diplomat who drinks m. water and owns a fox
  • White left house has Italian photographer who drinks tea and owns a zebra.
  • Red middle house has English violinist who drinks milk and owns snails.
  • Green right house has Spanish physician who drinks OJ and owns a dog
  • Blue far-right house has Japanese painter who drinks coffee and owns a horse.

But this solution is incorrect: it violates Rule 6: “The green house is immediately to the right of the white one.”

How do we find a solution that doesn’t violate any of our constraints? Does one even exist? Or is this set of constraints not satisfiable?

Formalizing Logical Structure

Words are distracting. Let’s use symbols instead.

[Figure: Einstein's Puzzle - Symbol Code]

With this code, we can write the above solution as a matrix.

[Figure: Einstein's Puzzle - Solution Matrix]

We can also formalize our constraints.

[Figure: Einstein's Puzzle - Constraint Formalization]

These constraints are ugly. Let’s write them in matrix form instead!

[Figure: Einstein's Puzzle - Constraint Matrix Horizontal]

Constraint Satisfaction as a Jigsaw Puzzle

We can use the above constraints to visually check satisfiability. Whereas before you had to parse the meaning of Rule 6 verbally, now you can just inspect whether there is a visual match between rule and solution.

[Figure: Einstein's Puzzle - Visual Satisfiability Check]

One way to determine satisfiability is to perform these checks until you find a viable solution. But this is computationally expensive: there are nearly 25 billion candidate solutions. Instead of inspecting every possible solution, why don’t we generate one?
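(As an aside, the 25 billion figure is easy to check: each of the five attributes — nation, color, drink, pet, job — can be assigned to the five houses in 5! ways, and the assignments are independent.)

```python
import math

# Five attributes, each independently a permutation of five houses.
candidates = math.factorial(5) ** 5
print(candidates)  # 24883200000, i.e. roughly 25 billion
```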

How? Since our rules are used for solution-checking, why can’t we use them for solution-building?

On this view, solution-building takes on the flavor of a jigsaw puzzle: each constraint is a puzzle piece, and from these pieces we assemble the solution.


Unfortunately, there is more than one way to solve a 5×5 jigsaw puzzle. Let me show you one way to solve this one. We will use choice minimization to simplify our lives: always try to play the move with the fewest degrees of freedom.

Solution: Path A

Rules 5 and 9 fix house positions directly, so they are easy to apply.

After these, the Rule 11 puzzle piece fits unambiguously.


Let’s apply Rule 6 next. That jigsaw piece can fit in two locations: the M+R columns, or the R+FR columns. We must choose one: let’s select the former. After that move, Rule 10 fits unambiguously.

The FR column is the only place that has an unclaimed nation and color: Rule 1 must go there. Similarly, the FL column is the only available spot for Rule 8.


Here we can apply Rule 14 (the original clue’s wording “The horse is in a house next to that of the diplomat” means that the puzzle piece can be flipped horizontally).

After that, only column L can accommodate Rule 4. Then FR must accept Rule 12. 


Disaster! Consider Clues 2, 3, and 7. These rules are mutually exclusive (each shares at least one row with the others), and they have overlapping domains (none of them fits in FL, so all three must fit into M or R).

[Figure: Einstein's Puzzle - Path A3 Paradox]

This is the pigeonhole principle: just as three pigeons cannot fit into two holes, there is no way to reach a solution.

Does that mean the puzzle is unsolvable? No, it means we explore other choices.

Solution: Path B

Let’s return to the other possible placement of Rule 6. Instead of putting it in M+R columns, we’ll put it in R+FR. Then, Rules 10, 1, 8, and 14 follow inevitably (each has precisely one choice).


Here we face another choice: do we put the Rule 4 puzzle piece in the left or right house? Let’s choose the right house. Then, Rules 12 and 3 follow logically.


Alas! Another disaster. Rule 2 doesn’t fit. 😦

Solution: Path C

Retrace our steps! The last choice we made was Place(4, R). What if we place it in the left house instead?


To our delight, we now see that this path is the only correct logical journey through our puzzle. The concluding steps are given below, and the desired quantities are shown in the “missing” tiles.


Recall the original questions:

Who owns a zebra (P5)? Whose favorite drink is mineral water (D5)?

Our symbol table can translate our answer:

The Japanese man (N3) owns the zebra, and the Norwegian (N5) drinks mineral water.


The above solution is nothing more than solving a 5×5 jigsaw puzzle. I suspect this technique will only become clear with practice. Go solve your own jigsaw here!

For the solution above, it is helpful to review our search history. Remarkably, we only faced two choices in our solution. When one branch failed, we turned our attention to other branches. This is known as backtracking, and will be the subject of another blog post.

[Figure: Einstein's Puzzle - Search History]

Many programming solutions exist for these kinds of problems. In practice, libraries can be used to write more concise solvers.
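As a minimal illustration, here is a brute-force solver with early pruning, using only the standard library. This is my own sketch of the search described above (the variable names and house indexing are mine), not a production constraint solver: each attribute is a permutation of house indices, and we discard partial assignments the moment a rule fails.

```python
from itertools import permutations

HOUSES = range(5)  # 0 = far left (FL) ... 4 = far right (FR)

def next_to(a, b):
    return abs(a - b) == 1

def solve():
    # Each permutation assigns one house index to each value of an attribute.
    for red, green, white, yellow, blue in permutations(HOUSES):
        if green != white + 1:                           # Rule 6
            continue
        for english, spaniard, japanese, italian, norwegian in permutations(HOUSES):
            if norwegian != 0:                           # Rule 5
                continue
            if english != red:                           # Rule 1
                continue
            if not next_to(norwegian, blue):             # Rule 11
                continue
            for tea, coffee, milk, oj, water in permutations(HOUSES):
                if milk != 2:                            # Rule 9
                    continue
                if italian != tea:                       # Rule 4
                    continue
                if coffee != green:                      # Rule 10
                    continue
                for painter, photographer, diplomat, violinist, physician in permutations(HOUSES):
                    if japanese != painter:              # Rule 3
                        continue
                    if diplomat != yellow:               # Rule 8
                        continue
                    if violinist != oj:                  # Rule 12
                        continue
                    for dog, snails, fox, horse, zebra in permutations(HOUSES):
                        if spaniard != dog:              # Rule 2
                            continue
                        if photographer != snails:       # Rule 7
                            continue
                        if not next_to(fox, physician):  # Rule 13
                            continue
                        if not next_to(horse, diplomat): # Rule 14
                            continue
                        nations = {english: "English", spaniard: "Spaniard",
                                   japanese: "Japanese", italian: "Italian",
                                   norwegian: "Norwegian"}
                        return nations[zebra], nations[water]
    return None

print(solve())  # ('Japanese', 'Norwegian')
```

Note how the loop nesting mirrors choice minimization: the most constrained attributes are filtered first, so almost all of the 25 billion candidates are pruned before the inner loops ever run.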

This kind of problem is called propositional satisfiability (SAT), or constraint programming (CP), although these two disciplines differ in subtle ways.

As we will see next time, SAT problems are at the root of complexity theory and artificial intelligence. Until then.

Complementary Learning Systems

Part Of: Demystifying Memory sequence
Content Summary: 1000 words, 10 min read


Your brain is constantly keeping track of the world and your body. It represents these ever-changing environments by patterns of neural activation. Knowledge is not kept in the neurons themselves, but in the connections between neurons.

Sometimes, the brain will discover useful regularities in the environment, and store these patterns for later use. This is long-term memory. We shall concern ourselves with five kinds of long-term memory:

  1. Episodic: ability to remember events or episodes (e.g., dinner last Tuesday night)
  2. Semantic: ability to remember facts and concepts (e.g., hands have five fingers)
  3. Procedural: ability to develop skills (e.g., playing the piano).
  4. Behavioral: ability to remember stimulus-outcome pairs (e.g., bell means food)
  5. Emotional: ability to remember emotional information (e.g., she is always angry).

These memory systems are computed in different areas of the brain.

  1. Episodic memories are computed by the hippocampus
  2. Semantic memories are computed by the association neocortex
  3. Procedural memories are computed by the somatosensory neocortex
  4. Behavioral memories are computed by the basal ganglia
  5. Emotional memories are computed by the central amygdala

Only episodic and semantic memory are directly accessible to consciousness (i.e., working memory). The others are available only to the autonomous mind.

[Figure: CLS - Categories of Long-Term Memory]


We have previously described conscious experience as a mental movie. But, unlike a normal theater, consciousness has several screens, each playing a different sense modality (visual, auditory, etc.). Call this the multimodal movie.

Semantic memory comes in two forms: encyclopedic memory (abstract descriptions of events) and conceptual memory (concepts and their inter-relationships). Both abstractions are derived from the movie by removing redundant information.

[Figure: CLS - Episodic vs Semantic Memory]

Mind wandering is the tendency of animals to recall past experiences. But why does mind wandering resurrect the details of what was seen, heard, smelled, and touched? Why not simply use the plot summary (encyclopedic memory) instead?

Why does episodic memory exist at all?


Henry Molaison was born on February 26, 1926. As a child, he suffered from epilepsy.

[Figure: CLS - Patient HM]

His doctors removed what they thought to be the source of the seizures: the hippocampus. After the surgery, Henry still recognized objects, was able to solve puzzles, and even had the same IQ. He had a rich emotional life, and could learn new skills (e.g., to play the piano). But he was completely incapable of forming new episodic memories. Henry (i.e., Patient HM) was locked in a five-minute loop, unable to remember recent events.

Let’s imagine different kinds of amnesia Henry might have experienced.

Scenario 1. Henry has no retrograde amnesia (old memories were unperturbed), but suffers severe anterograde amnesia (unable to create new memories). From this data, we might conclude that the hippocampus creates, but does not store, episodic memories.

[Figure: CLS - HM Amnesia Pattern v1]

Scenario 2. Henry experiences both severe retrograde and anterograde amnesia. From this data, we might conclude that the hippocampus creates and stores episodic memories.

[Figure: CLS - HM Amnesia Pattern v2]

Neither scenario actually happened. Instead, Henry experienced temporally graded retrograde amnesia:

[Figure: CLS - HM Amnesia Pattern v3]

This shows that, while the hippocampus creates and stores episodic memories, these memories are eventually copied elsewhere. This process is called consolidation. Hippocampal damage destroys memories that have not yet been consolidated.

But why should the brain copy memories? This seems inefficient. And why does this process take years, even decades?


The connectionist paradigm models the brain as a neural network. The AB-AC task illustrates a challenge for connectionism. It goes as follows:

You want to associate stimulus A with response B. For example, when you hear “chair”, you should say “map”. There are many such associations (Chair-Map, Book-Dog, Car-Idea). This is the AB list.

After you achieve 100% recall on the AB list, a new set of stimulus-response words is given: the AC list. You want to learn both. However, the AB and AC lists pair the same stimuli with novel responses (e.g., Chair-Printer, Book-Flower, Car-Shirt).

How well do humans and connectionist models perform on this task? Let’s find out! The following graphs describe what happens after the AB list has been learned perfectly. The y-axis is % correct; the x-axis is the number of exposures to the AC list.

[Figure: CLS - Catastrophic Interference]

Consider the left graph. The dotted line is AC recall over time: humans were able to learn the AC list. The solid line shows AB list performance: as humans learned the AC associations, their AB performance suffered somewhat, from 100% to about 60%. This is moderate interference.

Consider the right graph. The dotted line shows that the model is able to learn the AC list, just like the human. But the solid line shows that AB recall very quickly drops to 0%. This is catastrophic interference.

Catastrophic interference occurs when the AB list and AC list are learned separately (focused learning). But what if you learn them at the same time? More specifically, what if you train against a shuffled set of AB and AC associations (interleaved learning)?

[Figure: CLS - Interleaved vs Focused Learning]

On the left, focused learning (black squares) shows catastrophic interference with AB memories, as before. But interleaved learning (white dots) shows zero interference!

On the right, we see another consequence of interleaved learning: new memories are acquired much more slowly.


We are ready to put the puzzle together.

Catastrophic interference is an inevitable consequence of systems that employ highly-overlapping distributed representations, despite the fact that such systems have a number of highly desirable properties (e.g., the ability to perform generalization and inference).

This problem can be addressed by employing a structurally distinct system with complementary learning properties: sparse, non-overlapping representations that are highly robust to interference from subsequent learning. Such a sparse system by itself would be like an autistic savant: good at memorization but unable to perform everyday inferences. But when paired with the highly overlapping system, a much more versatile overall system can be achieved.

The neocortex and hippocampus constitute these two learning systems:

[Figure: CLS - Two Component Model]

First introduced in 1995, Complementary Learning Systems (CLS) theory predicts a wide range of extant biological, neuropsychological, and behavioral data. It explains why the hippocampus exists, why it performs consolidation, and why consolidation takes years to complete.

CLS theory was first presented in [M95]. The data in section 4 are taken from that paper. Section 5 quotes liberally from [O11].

  • [M95] McClelland et al (1995). Why There Are Complementary Learning Systems in the Hippocampus and Neocortex: Insights From the Successes and Failures of Connectionist Models of Learning and Memory
  • [O11] O’Reilly et al (2011). Complementary Learning Systems

The Tripartite Mind

Part Of: Neural Architecture sequence
Content Summary: 700 words, 7 min read

Dual-Process Theory

Dual process theory identifies two modes of human cognition: a fast, parallel System 1 and a slow, serial System 2.

[Figure: Linguistic Implications - Dual Process Theory]

This distinction can be expressed phylogenetically:

[Figure: Tripartite Mind - Dual-Process Theory Phylogeny]

But this is incorrect. We know that the engine of consciousness is the extended reticular-thalamic activating system (ERTAS), which implements feature integration by phase binding. Mammalian brains contain this device. Also, behavioral evidence indicates that non-human animals possess working memory and fluid intelligence [C13].

Conscious, non-linguistic animals exist. We need a phylogeny that accepts this fact.

The Tripartite Mind

Let’s rename System 1, and divide System 2 into two components.

  • The Autonomous Mind is a subsymbolic neural network.
  • The Algorithmic Mind constructs perceptual objects via the Global Workspace.
  • The Linguistic Mind applies linguistic processing to conscious contents.

This allows us to conceive of conscious, non-verbal mammals:

[Figure: Linguistic Musings - Architectural Phylogeny]

This lets us refresh our view of property dissociations:

[Figure: Tripartite Mind - TPM Property Dissociations]

Most dual-process theorizing (for example, our theory of moral cognition) maps neatly onto the Autonomous and Linguistic Minds, respectively.

But the Tripartite Mind theory cannot bear the weight of all behavioral phenomena. For that, we need the more robust language of two loops. In fact, we can marry these two theories as follows:

[Figure: Tripartite Mind - Cybernetics Interpretation]

This diagram reflects the following facts:

  • The Autonomous Mind constitutes most of the brain.
  • The Algorithmic Mind is perceptual, and processes Autonomous representations.
  • The Linguistic Mind receives Algorithmic (but not Autonomous) information.

Boundaries on the Linguistic Mind

The Linguistic Mind creates cultural knowledge. It is the technology underlying the invention of agriculture, calculus, and computational neuroscience. It is hard to imagine that such a device could be not only biased, but in some respects completely blind.

But the Linguistic Mind does not have access to raw sensorimotor signals. It only has access to the intricately curated contents of working memory. You cannot communicate mental experiences that exist outside of working memory. You can try, but the result would be confabulation (unintentional dishonesty). As Nisbett & Wilson [NW77] describe in their seminal paper Telling More Than We Can Know, human beings are, in practice, strangers to themselves.

The evidence suggests that working memory does not contain any information about your judgment and decision-making processes. All attempts to describe this aspect of our inner life fail: introspection on these matters cannot secure direct access to the truth. Rather, we guess at our own motives, using the exact same machinery we use to interpret the behavior of other people. For more on the Interpretive Sensory-Access theory of introspection [C10], I recommend this lecture.

Sociality and the Linguistic Mind

Per the Social Brain Hypothesis [D09], humans are not more intelligent than other primates; we are rather more social. In other words, the Linguistic Mind is a social invention, which facilitates the construction of cultural institutions that allow propriety frames to be synchronized more explicitly.

On the argumentative theory of reasoning, social reasoning is not independent of language. It is the purpose of language.

[Figure: Argumentative Reason - Module Evolution]

While the Linguistic Mind evolved to satisfy social selection pressures, not all primate sociality is linked to this device. Social mechanisms have emerged in stages:

  • Primary emotions, as part of the social behavior network, can be traced back to ray-finned fish.
  • In New World primates, body language evolved as an extension to our autonomic nervous system, as described in the polyvagal theory. [P03]
  • Certain human-specific social mechanisms evolved within the neuroimmune axis as a defense mechanism to parasites [TF14].

All of these mechanisms can be attributed to the Autonomous Mind. But since the Linguistic Mind is driven by our motivational apparatus (just like everything else in the brain), its behavior is sensitive to the wishes of these “lower” modules. This doesn’t contradict our earlier claim that its content is divorced from Autonomous data.


  • [C13] Carruthers (2013). Evolution of Working Memory
  • [C10] Carruthers (2010). Introspection: Divided and Partly Eliminated
  • [NW77] Nisbett & Wilson (1977). Telling More Than We Can Know: Verbal Reports on Mental Processes
  • [TF14] Thornhill, Fincher (2014). The Parasite-Stress Theory of Sociality, the Behavioral Immune System, and Human Social and Cognitive Uniqueness.
  • [P03] Porges (2003). The Polyvagal Theory: phylogenetic contributions to social behavior
  • [D09] Dunbar (2009). The social brain hypothesis and its implications for social evolution