The Symmetric Group

Part Of: Algebra sequence
Followup To: An Introduction to Geometric Group Theory
Content Summary: 1800 words, 18 min read

On Permutations

In the last few posts, we have discussed algebraic structures whose sets contain objects (e.g., numbers). Now, let’s consider structures over a set of functions, whose binary operation is function composition.

Definition 1. Consider two functions f and g. We will denote function composition of g(f(x)) as f \bullet g. We will use this notation instead of the more common g \circ f.  Both represent the idea “apply f, then g“. 

Consider G = (\left\{ f(x) = 2x, g(x) = x+1, h(x) = 2x + 2, i(x) = 2x+1 \right\}, \bullet)

Is this a group? Let’s check closure:

  • g \bullet f = 2(x+1) = h \in G
  • f \bullet g = (2x)+1 = i \in G
  • f \bullet h = 2(2x)+2 \not\in G

Closure is violated. G isn’t even a magma! Adding j(x) = 4x+2 to the underlying set exacerbates the problem: then both f \bullet j \not\in G and g \bullet j \not\in G.

So it is hard to establish closure under function composition. Can it be done?

Yes. Composition exhibits closure on sets of permutation functions. Recall that a permutation is simply a bijection: it re-arrange a collection of things. For example, there are six possible bijections of a set of three elements.

Symmetry Group_ Permutation Options

Definition 2. The symmetric group S_n denotes composition over a set of all bijections (permutations) over some set of n objects. The symmetric group is then of order n!.

The underlying set of S_{3} is the set of all permutations over a 3-element set. It is of order 3! = 3 \times 2 \times 1 = 6.

This graphical representation of permutations is rather unwieldy. Let’s switch to a different notation system!

Notation 3: Two Line Notation. We can use two lines to denote each permutation mapping \phi. The top row represents the original elements x, the bottom represents where each element has been relocated \phi(x).

Symmetric Group_ Two Line Notation

Two line notation is sometimes represented as an array, with the top row as matrix row, and bottom denoting matrix column. Then the identity matrix represents the identity permutation.

Definition 4. A cycle is a sequence of morphisms that forms a closed loop. An n-cycle is a cycle of length n. A 1-cycle does nothing. A 2-cycle is given the special name transposition. S_3 has two permutations with 3-cycles: can you find them?

Theorem 5. Cycle Decomposition Theorem. Every permutation can be decomposed into disjoint cycles. Put differently, a node cannot participate in more than one cycle. If it did, its parent cycles would merge.

Notation 6: Cycle Notation. Since permutations always decompose into cycles, we can represent them as (A_1\ A_2\ \ldots\ A_n), pronounced “A_1 goes to A_2 goes to …”.

Symmetric Group_ Cycle Notation (1)

Cycle starting element does not matter: (A\ C\ B) = (C\ B\ A) = (B\ A\ C).

The Cycle Algorithm

It is difficult to tell visually the outcome of permutation composition. Let’s design an algorithm to do it for us!

Algorithm 7: Cycle Algorithm. To compose two permutation functions a(S) and b(S), take each element s \in S and follow its arrows until you find the set of disjoint cycles. More formally, compose these functions x times until you get (a \bullet b)^x(s) = s.

Here’s a simple example from S_{3} = ( \left\{ 0, 1, 2, 3, 4, 5 \right\}, \bullet ).

Symmetric Group_ Cycle Algorithm S3 Ex (2)

Make sense? Good! Let’s try a more complicated example from S_4.

Symmetric Group_ Cycle Algorithm S4 Ex (1)

A couple observations are in order.

  • 1-cycles (e.g., (A)) can be omitted: their inclusion does not affect algorithm results.
  • Disjoint cycles commute: (B\ D),(A\ C) = (A\ C),(B\ D). Contrast this with composition, which does not commute (B\ D) \bullet (A\ C) \neq (A\ C) \bullet (B\ D).

Now, let’s return to S_{3} for a moment, with its set of six permutation functions. Is this group closed? We can just check every possible composition:

Symmetric Group_ Permutation Composition

From the Cayley table on the right, we see immediately that S_{3} is closed (no new colors) and non-Abelian (not diagonal-symmetric).

But there is something much more interesting in this table. You have seen it before. Remember the dihedral group D_{3}? It is isomorphic to S_{3}!

Symmetric Group_ D3 S3 Isomorphism (3)

If you go back to the original permutation pictures, this begins to make sense. Permutations 1 and 2 resemble rotations/cycles; 3, 4, and 5 perform reflections/flips.

Generators & Presentations

In Theorem 7, we learned that permutations decompose into cycles. Let’s dig deeper into this idea. 

Theorem 8. Every n-cycle can be decomposed into some combination of 2-cycles. In other words, cycles are built from transpositions.

The group S_{3} = \left\{ 0, 1, 2, 3, 4, 5 \right\}, \bullet) has three transpositions 3 = (A\ B), 4 = (B\ C), and 5 = (A\ C).

Transpositions are important because they are generators: every permutation can be generated by them. For example, 1 = 3 \bullet 4. In fact, we can lay claim to an even stronger fact:

Theorem 9. Every permutation can be generated by adjacent transpositions. Every permutation S_{n} = \langle (1 2), (2 3), \ldots, (n-1\ n) \rangle.

By the isomorphism S_3 \cong D_3 , we can generate our “dihedral-looking” Cayley graph by selecting generators 1=(A\ B\ C) and 3 = (B\ C).

But we can use Theorem 9 to produce another, equally valid Cayley diagram. There are two adjacent transpositions in S_{3}: 3 and 4. All other permutations can be written in terms of these two generators:

  • 0 = 3 \bullet 3.
  • 1 = 3 \bullet 4.
  • 2 = 4 \bullet 3.
  • 5 = 3 \bullet 4 \bullet 3

This allows us to generate a transposition-based Cayley diagram. Here are the dihedral and transposition Cayley diagrams, side by side:

Symmetric Group_ Cayley Diagram (4)

We can confirm the validity of the transposition diagram by returning to our multiplication table: 1 \bullet 3 = 5 means a green arrow 1 \rightarrow 5.

Note that the transposition diagram is not equivalent cyclic group C_6, because arrows in the latter are monochrome and unidirectional.

We’re not quite done! We can also rename our set elements to employ generator-dependent names, by “moving clockwise”:

Symmetric Group_ Abstract Cayley Diagram

We could just as easily have “moved counterclockwise”, with names like 3 \mapsto r, 1 \mapsto rl. And we can confirm by inspection that, in fact, rl = lrlr etc.

Using the original clockwise notation, one presentation of S_3 becomes:

S_3 = \langle l, r \mid r^2 = e, l^2 = e, (lr)^3 = e \rangle

Towards Alternating Groups

Any given permutation can be written as a product of permutation. Consider, for example, the above equalities

  • 1 = rl = lrlr = (lr)^3rl. These have 2, 4, and 8 permutations, respectively. 
  • 5 = lrl = lrl^3 = l^3r^3l^3. These have 3, 5, and 9 permutations, respectively.

Did you notice any patterns in the above lists? All expressions for 1 require an even number of transpositions, and all expressions of 5 require an odd number. In other words, the parity (evenness or oddness) of a given permutation doesn’t seem to be changing. In fact, this observation generalizes:

Theorem 10. For any given permutation, the parity of its transpositions is unique.

Thus, we can classify permutations by their parity. Let’s do this for S_3:

  • 0, 1, 2 are even permutations.
  • 3, 4, 5 of odd permutations.

Theorem 11. Exactly half of S_n are even permutations, and they form a group called the alternating group A_n. Just as \lvert S_n \rvert = n!, A_n has \dfrac{n!}{2} elements. 

Why don’t odd permutations form a group? For one thing, it doesn’t contain the identity permutation, which is always even.

Let’s examine A^3 = (\left\{ 0, 1, 2 \right\}, \bullet ) in more detail. Does it remind you of anything?

It is isomorphic to the cyclic group C_3 ..!

Symmetric Group_ C3 A3 Isomorphism (1)

We have so far identified the following isomorphisms: S_3 \cong D_3 and A_3 \cong C_3. Is it also true that e.g., S_4 \cong D_4 and A_4 \cong C_4?

No! Recall that the \lvert D_n \rvert=2n and \lvert C_n \rvert = nOnly n \neq 3, these sets are not even potentially isomorphic.  For example:

  • \lvert D_4 \rvert=2 \times 4 = 8 \neq 24 = 4! = \lvert S_4 \rvert.
  • \lvert C_4 \rvert= 4 \neq 12 = \frac{24}{2} = \lvert A_4 \rvert.

For these larger values of n, the symmetric group is much larger than dihedral and cyclic groups.


Why do symmetric & alternating groups matter? Let me give two answers.

Perhaps you have seen the quadratic equation, the generic solution to quadratic polynomials ax^2 + bx + c = 0.

x = \dfrac{-b \pm \sqrt{b^2 - 4ac}}{2a}

Analogous formulae exist for cubic (degree-3) and quartic (degree-4) polynomials. 18th century mathematics was consumed by the theory of equations: mathematicians attempting to solve quintic polynomials (degree-5). Ultimately, this quest proved to be misguided: there is no general solution to quintic polynomials.

Why should degree-5 polynomials admit no solution? As we will see when we get to Galois Theory, it has to do with the properties of symmetric group S_5.

A second reason to pay attention to symmetric groups comes from the classification theorem of finite groups. Mathematicians have spent decades exploring the entire universe of finite groups, finding arcane creatures such as the monster group, which may or may not explain features of quantum gravity.

One way to think about group space is by the following periodic table:

Symmetric Group_ Periodic Table

Image courtesy of Samuel Prime

Crucially, in this diverse landscape, the symmetric group plays a unique role:

Theorem 12: Cayley’s Theorem. Every finite group is a subgroup of the subgroup S_n, for some sufficiently large n.

For historical reasons, subgroups of the symmetric group are usually called permutation groups.

Until next time.

Wrapping Up


  • The symmetric group S_n is set of all bijections (permutations) over some set of n objects, closed under function composition.
  • Permutations can be decomposed into disjoint cycles: cycle notation uses this fact to provide an algorithm to solve for arbitrary compositions.
  • All permutations (and hence, the symmetric group) can be generated by adjacent transpositions. This allows us to construct a presentation of the symmetric group.
  • Permutations have unique parity: thus we can classify permutations as even or odd. The group of even presentations is called the alternating group A_n.
  • It can be shown that S_3 \cong D_3 and A_3 cong C_3. However, for larger n, the symmetric and alternating group are much larger than cyclic and dihedral groups.

The best way to learn math is through practice! If you want to internalize this material, I encourage you to work out for yourself the Cayley table & Cayley diagram for S_4. 🙂

Related Resources

For a more traditional approach to the subject, these Harvard lectures are a good resource.


An Introduction to Geometric Group Theory

Part Of: Algebra sequence
Followup To: An Introduction to Abstract Algebra
Content Summary: 1500 words, 15 min read

An Example Using Modular Addition

Last time, we saw algebraic structures whose underlying sets were infinitely large (e.g., the real numbers \mathbb{R}). Are finite groups possible?

Consider the structure ( \left\{ 0, 1, 2, 3 \right\}, +). Is it a group? No, it isn’t even a magma: 2 + 3 \not\in \left\{ 0, 1, 2, 3 \right\}! Is there a different operation that would produce closure?

Modular arithmetic is the mathematics of clocks. Clocks “loop around” after 12 hours. We can use modulo-4 arithmetic, or +_{4}, on  \left\{ 0, 1, 2, 3 \right\}. For example, 2 +_{4} 3 = 1.

To check for closure, we need to add all pairs of numbers together, and verify that each sum has not left the original set. This is possible with the help of a Cayley table. You may remember these as elementary school multiplication tables 😛 .

Geometrical Group Theory_ C4 Cayley Table Modular Arithmetic

By inspecting this table, we can classify Z_4 = ( \left\{ 0, 1, 2, 3 \right\} ), +_{4}).

  1. Does it have closure? Yes. Every element in the table is a member of the original set.
  2. Does it have associativity? Yes. (This cannot be determined by the table alone, but is true on inspection).
  3. Does it have identity? Yes. The rows and columns associated with 0 express all elements of the set.
  4. Does it have inverse? Yes. The identity element appears in every row and every column.
  5. Does it have commutativity? Yes. The table is symmetric about the diagonal.

Therefore, Z_4 is an abelian group.

An Example Using Roots of Unity

Definition 1. A group is said to be order n if its underlying set has cardinality n.

So Z_4 is order 4. What other order-4 structures exist?

Consider the equation i^4 = -1. Its solutions, or roots, is the set \left\{ 1, i, -1, -i \right\}. This set is called the fourth roots of unity.

So what is the Cayley table of this set under multiplication R_{4} = ( \left\{ 1, i, -1, -i \right\}, *)? In the following table, recall that i = \sqrt{-1}, thus i^2 = (sqrt{-1})^2 = -1.

Geometric Group Theory_ C4 Cayley Table Roots of Unity (1)

Something funny is going on. This table (and its colors) are patterned identically to Z_4! Recall that a binary operation is just a function f : A \times A \rightarrow A. Let’s compare the function maps of our two groups:

Cyclic Groups_ Binary Operation as Function (2)

These two groups for structurally identical: two sides of the same coin. In other words, they are isomorphic, we write Z_{4} \cong R_{4}. Let us call this single structure C_4.

But why are these examples of modular arithmetic and complex numbers equivalent?

One answer involves an appeal to rotational symmetry. Modular arithmetic is the mathematics of clocks: the hands of the clock rotating around in a circle. Likewise, if the reals are a number line, complex numbers are most naturally viewed as rotation on a number plane.

This rotation interpretation is not an accident. It helps use more easily spot other instances of C_4. Consider, for instance, the following shape.

Geometric Group Theory_ Rotational Symmetry Object

On this shape, the group of rotations that produce symmetry is W_4 = (\left\{ 0, 90, 180, 270 \right\}, \text{rotate}). Inspection reveals that this, too, is isomorphic to C_{4}!

Towards The Presentation Formalism

We describe C_3 as a cyclic group, for reasons that will become clear later. 

Theorem 2. For every cyclic group C_n, there exists some generator g in its underlying set such that every other set element can be constructed by that generator.

Definition 3. When a generator has been identified, we can express a group’s underlying set with generator-dependent names. Two notation are commonly used in practice:

  1. In multiplicative notation, the elements are renamed \left\{ e, r, r^2, r^3 \right\}, where r is any generator.
  2. Similarly, in additive notation, the elements become \left\{ e, r, 2r, 3r \right\}.

Geometric Group Theory_ Multiplicative and Additive Notation (1)

These two notation styles are interchangeable, and a matter of taste. In my experience, most mathematicians prefer multiplicative notation.

What generators exist in C_4? Let’s look at our three instantiations of this group:

  • In modular arithmetic, you can recreate all numbers by 0 + 1 + 1 + \ldots. But you can also recreate them by 0 + 3 + 3 + \ldots.
  • In complex numbers, you can visit all numbers by multiplying by i, or multiplying by -i. Only -1 fails to be a generator.
  • In our rotation symmetry shape, two generators exist: clockwise 90 \circ rotation, and counterclockwise 90 \circ rotation.

For now, let’s rename all elements of C_{4} to be C_4 = (\left\{ 0, 1, 2, 3 \right\}, +)  = \langle 1 \rangle = \langle 3 \rangle.

Okay. But why is 2 not a generator in C_4?

Theorem 4. For finite groups of order n, each generator must be coprime to n. That is, their greatest common divisor \text{gcd}(g, n) = 1.

  • 2 not a generator in C_4 because it is a divisor of | \left\{ 0, 1, 2, 3 \right\} | = 4.
  • What are the generators in C_5? All non-identity elements: C_{5} = \langle 1 \rangle = \langle 2 \rangle =  \langle 3 \rangle =  \langle 4 \rangle.
  • What are the generators in C_6? Only 1 and 5: C_{5} = \langle 1 \rangle = \langle 5 \rangle.

We just spent a lot of words discussing generators. But why do they matter?

Generators are useful because they allow us to discover the “essence” of a group. For example, the Rubik’s cube has 5.19 \times 10^{20} configurations. It would take a long time just writing down such a group. But it has only six generators (one for a 90 \circ rotation along each of its faces) which makes its presentation extremely simple.

Another way to think about it is, finding generators is a little bit like identifying a basis in linear algebra.

Towards Cayley Diagrams

Definition 5. We are used to specifying groups as set-operator pairs. A presentation is an generator-oriented way to specify the structure of a group. A relator is defined as constraints that apply to generators. A presentation is written \langle \text{generators} \mid \text{relators} \rangle

  • In multiplicative notation: C_4 = \langle r \mid r^3 = e \rangle.
  • In additive notation: C_4 = \langle r \mid 3r = e \rangle.

The = e suffix is often left implicit from presentations (e.g., C_4 = \langle r  \mid r^n \rangle) for the sake of concision.

Definition 6. A Cayley diagram is used to visualize the structure specified by the presentation.  Arrow color represents the generator being followed.

Note that Cayley diagrams can be invariant to your particular choice of generator:

Geometric Group Theory_ Cayley Diagram (1)

The shape of the Cayley diagram explains why C_3 is called a cyclic group, by the way!

With these tools in hand, let’s turn to more complex group structures.

Dihedral Groups

Cyclic groups have rotational symmetry. Dihedral groups have both rotational and reflectional symmetry. The dihedral group that describes the symmetries of a regular n-gon is written D_{n}. Let us consider the “triangle group” D_{3}, generated by a clockwise 120\circ rotation r and a horizontal flip f.

With triangles, we know that three rotations returns to the identity r^3 = e. Similarly, two flips returns to the identity f^2 = e. Is there some combination of rotations and flips that are equivalent to one another? Yes. Consider the following equality:

Geometric Group Theory_ Rotation vs Reflection Equivalence (2)

Analogously, it is also true that rf = fr^2.

Definition 7. Some collection of elements is a generating set if combinations amongst only those elements recreates the entire group.

Cyclic groups distinguish themselves by having only one element in their generating set. Dihedral groups require two generators.

We can write each dihedral group element based on how it was constructed by the generators:

D_n = \left\{ e, r, r^2, \ldots, r^n-1, f, rf, r^2f, \ldots, r^{n-1}f \right\}

Alternatively, we can instead just write the presentation of the group:

D_{3} = \langle r, f \mid r^3 = 1, f^2 = 1, r^2f = fr, rf = fr^2  \rangle.

We can visualize this presentation directly, or as a more abstract Cayley graph:

Geometric Group Theory_ Dihedral Groups Intro (3)

The Cayley table for this dihedral group is:

Geometrical Group Theory- Dihedral Cayley Table

This shows that D_3 is not abelian: its multiplication table is not symmetric about the diagonal.

By looking at the color groupings, one might suspect it is possible to summarize this 6 \times 6 table with a 2 \times 2 table. We will explore this intuition further, when we discuss quotients.

Until next time.

Wrapping Up


  • Finite groups can be analyzed with Cayley tables (aka multiplication tables).
  • The same group can have more than one set-operation expressions (e.g., modular arithmetic vs. roots of unity vs. rotational symmetry).
  • Generators, elements from which the rest of the set can be generated, are a useful way to think about groups.
  • Group presentation is an alternate way to describing group structure. We can represent presentation visually with the help of a Cayley diagram.
  • Cyclic groups (e.g., C_3) have one generator; whereas dihedral groups (e.g., D_3) have two.

Related Resources

  • This post is based on Professor Macaulay’s Visual Group Theory lectures, which in turn is based on Nathan Carter’s eponymous textbook.
  • Related to this style of teaching group theory are Dana Ernst’s lecture notes.
  • If you want to see explore finite groups with software, Group Explorer is excellent.
  • For a more traditional approach to the subject, these Harvard lectures are a good resource.

An Introduction to Abstract Algebra

Part Of: Algebra sequence
Content Summary: 1200 words, 12 min read

A Brief Prelude

Recall that a set is a collection of distinct objects, and a function f: A \rightarrow B is a mapping from the elements of one set to another. Further, in number theory we can express numbers as infinite sets:

  • The natural numbers \mathbb{N} = \left\{ 0, 1, 2, 3, \ldots \right\}.
  • The integers \mathbb{Z} = \left\{ \dots, -2, -1, 0, -1, -2, \ldots \right\}.
  • The rational numbers \mathbb{Q} = \left\{ x | x = p/q, p \in \mathbb{Z}, q \in \mathbb{Z}, q \neq 0 \right\}.
  • The real numbers \mathbb{R}.

The Axioms of Addition and Multiplication

In elementary school you learned that a+b = b+a, for any two integers. In fact there exist five such axioms:

  • Closure. \forall a, b \in \mathbb{Z}: a + b \in \mathbb{Z}.
  • Associativity\forall a, b, c \in \mathbb{Z}: (a + b) + c = a + (b+c).
  • Identity. There exists an element 0 such that, \forall a \in \mathbb{Z}: 0 + a = a + 0 = a.
  • Inverse. \forall a \in \mathbb{Z} there exists an element \boldsymbol{-a} such that a + (-a) = (-a) + a = 0.
  • Commutativity. \forall a, b \in \mathbb{Z}: a + b = b + a.

These axioms encapsulate all of integer addition. We can represent “integer addition” more formally as a set-operator pair: (\mathbb{Z}, +)

Likewise, you have surely learned that a \times b = b \times a. Multiplication too can be described with five axioms:

  • Closure. \forall a, b \in \mathbb{Z}: a \times b \in \mathbb{Z}.
  • Associativity\forall a, b, c \in \mathbb{Z}: (a \times b) \times c = a \times (b \times c).
  • Identity. There exists an element 1 such that, \forall a \in \mathbb{Z}: 1 \times a = a \times 1 = a.
  • Inverse. \forall a \in \mathbb{Z} there exists an element \frac{1}{a} such that a \times \frac{1}{a} = \frac{1}{a} \times a = 1.
  • Commutativity. \forall a, b \in \mathbb{Z}: a \times b = b \times a.

These axioms encapsulate all of integer multiplication. We can represent “integer multiplication” more formally as a set-operator pair: (\mathbb{Z}, \times)

Towards Algebraic Structure

Did the above section feel redundant? A lesson from software engineering: if you notice yourself copy-pasting, you should consolidate the logic into a single block of code.

Let’s build an abstraction that captures the commonalities above.

Definition 1. A binary operation is a function that takes two arguments. Since functions can only map between two sets, we write f : A \times A \rightarrow A.

Examples of binary operations include +, \times, \text{etc}. Note that a \times b is just shorthand for the more formal \times(a, b). Note that the operation symbol \times is just a name: we could just as easily rename the above function to be f(a, b), as long as the underlying mapping doesn’t change.

Definition 2. Let arity denote the number of arguments to an operation. A binary operation has arity-2. A unary operation (e.g., sin(x)) has arity-1. A finitary operation has arity-n.

Definition 3. An algebraic structure is the conjunction of a set with some number of finitary operations, and may be subject to certain axioms. For each operation in an algebraic structure, the following axioms may apply:

  • Closure. \forall a, b \in \mathbb{A}: a \bullet b \in \mathbb{A}.
  • Associativity\forall a, b, c \in \mathbb{A}: (a \bullet b) \bullet c = a \bullet (b \bullet c).
  • Identity. There exists the element \boldsymbol{e} such that, \forall a \in \mathbb{A}: \boldsymbol{e} \bullet a = a \bullet \boldsymbol{e} = a.
  • Inverse. \forall a \in \mathbb{A} there exists an element \boldsymbol{a^{-1}} such that a \bullet \boldsymbol{a^{-1}} = \boldsymbol{a^{-1}} \bullet a = \boldsymbol{e}.
  • Commutativity. \forall a, b \in \mathbb{A}: a \bullet b = b \bullet a.

Algebraic structures are a generalization of  integer addition and integer multiplication. Our  (\mathbb{Z}, +) and (\mathbb{Z}, \times) tuples actually comprise parameters that specify an algebraic structure.

As soon as we define algebraic structures, we begin to recognize these objects strewn across the mathematical landscape. But before we begin, a word about axioms!

The Axiomatic Landscape

Consider algebraic structures that exhibit one binary operation. These structures may honor different combinations of axioms. We can classify these axiom-combinations. Here then, are five kinds algebraic structures (“Abelian” means commutative):

Abstract Algebra- Structure Names

Of course, more esoteric options are available, including:

Abstract Algebra- Other Structure Names (1)

Of all these structures, groups are the most well-studied. In fact, it is easy to find it is not uncommon to of people conflating groups vs algebraic structures.

Definition 4. An algebraic structure is group-like if it contains one 2-ary operation. If it has more than one operation, or operation(s) with a different arity, it is not group-like.

All of our examples today count as group-like algebraic structures. There is also a large body of research studying algebraic structures with two operations, including ring-, lattice-, module-, and algebra-like structures. We will meet these structures another day.

Examples of Group-Like Structures

We saw above that the integers under addition (\mathbb{Z}, +) and multiplication (\mathbb{Z}, \times) are abelian groups. A similar finding occurs when you switch to the reals, or rationals, or natural numbers.

But addition and multiplication are not the only possible binary operations. What about subtraction (\mathbb{Z}, +)? Well, that is only a magma. Closure is satisfied, but all other axioms are violated (e.g., associativity (4 - 2) - 2 \neq 4 - (2-2)) and commutativity (4 - 2 \neq 2 - 4). Likewise, the natural numbers under subtraction are not even a magma: 2 - 4 \not\in \mathbb{N}.

All of our examples so far have groups encapsulating sets of numbers. But groups can contain sets of anything! Let’s switch to linear algebra. What about the set of all n \times n matrices under matrix multiplication?

  • Does it have closure? Yes. Matrix multiplication yields another n \times n matrix.
  • Does it have associativity? Yes. Matrix multiplication is associative.
  • Does it have identity? Yes. The identity element is the matrix I = [ \begin{smallmatrix}1 & 0\\0 & 1\end{smallmatrix}].
  • Does it have inverse? No!  Some n \times n matrices have determinants of 0. Thus, not all members of our set are invertible.

We can now identify this algebraic structure. The set of all n \times n matrices under matrix multiplication is a monoid.

But what if we limit our set to be all n \times n matrices with non-zero determinants? Well, that is a group (the inverse exists for all members). More formally, that set forms the basis of the general linear group GL_{n}(\mathbb{R}). Why isn’t it abelian? Because matrix multiplication is not commutative.

These five examples provide a glimpse into the landscape of algebraic structures. Our recipe is simple:

Take any set and operation that you care about. Classify the resultant algebraic structure by examining which axioms hold.

With these tools, we can begin to build a map of algebraic structures:



  • Multiplication and addition share a remarkable number of properties, including closure, associativity, identity, inverse, and commutativity.
  • An algebraic structure (set-operation pair) generalizes the similarities in the above examples.
  • Algebraic structures can have more than one operation. Group-like structures are those with only one (binary) operation.
  • Once you can know about algebraic structures, you can find examples of them strewn across the mathematical landscape.

Until next time.

Isotopy in Ambient Manifolds

Part Of: Analysis sequence
Followup To: An Introduction to Topology
Content Summary: 1300 words, 13 min read

Degenerate Geometries

Two lines can either be parallel, or not. There exist unending variations of both situation. But which is more common, on the average?

Consider two lines y_{1} = m_{1}x + b_{1} and y_{2} = m_{2}x + b_{2} When do we call these lines parallel? When their slopes are equal m_{1} = m_{2}. We can gain insight into the situation by mapping parameter space, where m_{1} and m_{2} form the horizontal and vertical axes respectively.

The red line does not represent one line, but an infinite set of parallel lines where m_{1} = m_{2}.

Isotopy Invariance_ Degenerate Geometries Parallel Lines

Suppose we start somewhere on the red line (with some pair of parallel lines)$. Perturbation of the slopes of these lines corresponds to a random walk beginning at that point. The more you walk around on the plane, the more likely you are to stand in green territory (slopes whose lines are not parallel).

Definition 1. A situation holds generically if, by that perturbing its constituent properties, it tends to default to that situation.

This concept applies in many situations. For example:

  • In two dimensions, two lines might be parallel, but generically intersect at a point.
  • In three dimensions, two planes might be parallel, but generically intersect in a line.
  • In linear algebra, a matrix might be singular is its determinant is equal to zero. But the determinant becomes non-zero on perturbation of individual matrix values. Thus, matrices are generically invertible.

This concept has also been called general position.

Manifold Intersection & Overflow

We now turn to the general science of intersection. The following observations hold generically:

  • In two dimensions, a point and another point do not intersect.
  • In two dimensions, a point and a line do not intersect.
  • In two dimensions, a line and a line do intersect, at a point.
  • In three dimensions, a line and another line do not intersect.
  • In three dimensions, a line and a plane do intersect, at a point.
  • In three dimensions, a plane and another plane do intersect, at a line.

Points, lines, planes… these are manifolds! Some of them infinitely large, but manifolds nonetheless! Let’s use the language of topology to look for patterns.

Each of the above examples contains two submanifolds K and L being placed in an ambient manifold M. We denote their intersection as K \cap L. Let us compare the dimensions of these three manifolds to the dimension of their overlap.

We represent dim(M), dim(K), dim(L), dim(K \cap L) as m, k, l, k \cap l respectively. Now our examples can be expressed as 4-tuples (m, k, l, k \cap l):

  • (2, 0, 0, \varnothing)
  • (2, 0, 1, \varnothing)
  • (2, 1, 1, 0)
  • (3, 1, 1, \varnothing)
  • (3, 1, 2, 0)
  • (3, 2, 2, 1)

In four dimensions, would we expect two planes to intersect; and if so, what would we expect the dimension of the intersection? Put differently, what would we predict to be the value of x in (4, 2, 2, x)?

If you guessed x=0, that the two planes intersect at a point, you have noticed the pattern!

Definition 2. Consider two submanifolds embedded in an ambient manifold K, L \subseteq M. The overflow is defined as o = (k+l) - m.

Theorem 3. Let K, L \subseteq M. The following properties are true, generically:

  • If o < 0, the submanifolds do not intersect: K \cap L = \varnothing
  • If o \geq 0, the intersection is non-empty K \cap L \neq \varnothing, and dim( K \cap L )= o

To see why this is the case, consider basis vectors in linear algebra. An m-dimensional space requires an m-dimensional basis vector. Submanifold dimensions are then “placed within” the ambient basis. If we try to minimize the overlap between two submanifolds, the equation for overflow falls out of the picture. 🙂

Isotopy_ Linear Algebra Basis and Overflow

Overflow during Motion

We have considered overflow in static submanifolds. But what if we move one of them?

The following observations hold generically:

  • In two dimensions, moving a point across a point do not intersect.
  • In two dimensions, moving a point across a line intersect, at a point.
  • In two dimensions, moving a line across another line intersect, across the entire line.
  • In three dimensions, moving a line across another line intersect, at a point.
  • In three dimensions, moving a line across a plane intersect, at a line.
  • In three dimensions, moving a plane across another plane intersect, across the entire plane.

Compare these “motion” (m, k, l, k \cap l) tuples against our previous tuples:

  • (2, 0, 0, \varnothing) vs. (2, 0, 0, \varnothing)
  • (2, 0, 1, 0) vs. (2, 0, 1, \varnothing)
  • (2, 1, 1, 1) vs. (2, 1, 1, 0)
  • (3, 1, 1, 0) vs. (3, 1, 1, \varnothing)
  • (3, 1, 2, 1) vs. (3, 1, 2, 0)
  • (3, 2, 2, 2) vs. (3, 2, 2, 1)

The dynamic overflow is one dimension larger than the static overflow. Why? 

The way I like to think about it is, by moving K across time, you are effectively enlisting an extra dimension (time). A moving point starts to look like a string, etc.

Theorem 4.  Let K, L \subseteq M. Suppose we want to move K from one side to the other side of L. This crossing is said to be topologically possible if K and L do not intersect at any point during the transition. The possibility of a crossing depends on the overflow o:

  • If o < -1, then generically, a crossing is possible (at all times, K \cap L = \varnothing)
  • If o \geq -1, then generically, a crossing is not possible (at some time, K \cap L \neq \varnothing and dim( K \cap L )= o

Towards Isotopy

Let’s put this theorem to work.

Example 5. In what R^m is the following movement possible?

Isotopy_ Two Line Crossover

We know from physical experience that this is not possible in our three-dimensional universe R^{3}. But Theorem 4 says that, if the ambient dimension is four, then the overflow is (1+1) - 4 = -2 < -1, so crossing is possible.

Intuitively, this makes sense. In three dimensions, a point can “hop” over a line by moving into the “extra” dimension. Similarly, the line K can cross over L by moving into the fourth dimension.

We can generalize this notion of successful crossing as follows:

Definition 6. Imagine moving submanifold K through an ambient manifold M. Let K_{0} and K_{1} represent the beginning and end positions as it travels throughout time t \in [0,1]. If K_{t} never has self-intersection at any time t, we say K_{0} is isotopic to K_{1} in M, and write K_{0} \sim_{M} K_{1}

Isotopy is especially relevant to knot theory. A classic example is the trefoil knot, a simple kind of knot. The string trefoil knot composed of 1-dimensional string is not isotopic to the unknot (1-torus) in \mathbb{R}^3: this is why it is called non-trivial.

Example 7. Show how the trefoil knot can isotope to the unknot.

  1. From K_{0} to K_{0.5}: we can unwind the trefoil knot in \mathbb{R}^4: simply lift the top-left string up.
  2. From K^{0.5} to K^{1}: you only need \mathbb{R}^3 to finish the job. Simply pull the top loop down, and then untwist.

Isotopy_ Unknotting Trefoil (1)

So knot theory is only interesting in \mathbb{R}^3! In \mathbb{R}^2, knots self-intersect. In \mathbb{R}^4, they are all isotopic to the unknot.

More Isotopy Examples

Let’s get some more practice under our belt.

Example 8. Consider a genus-two torus with its holes intertwined. What ambient manifold do we require to undo the knot?

Isotope_ Solution in R6 (1)

We might imagine the solution isotope would require us to pull apart the “hands” directly. And it is true that we can successfully isotope in M = \mathbb{R}^{6}; after all o = (2+2) - 6 = -2 < -1.

But do we really need six dimensions to get this done? Or can we do better? It turns out that we only really need M = \mathbb{R}^{3}:

Isotope_ Solution in R3

As the number of ambient dimensions increases, finding an isotope becomes increasingly easy. Thus, we often strive to find the smallest possible ambient manifold.

Example 9. Consider a genus-3 torus (\Sigma^{3}). If we isotope along its surface using M = \Sigma^{3}, is K_{1} \sim K_{2}? How about L_{1} \sim L_{2}

Isotopy_ Genus 3 Torus (1)

It is easy to see that K_{1} \sim_{\Sigma^{3}} K{2}. You just pull the circle left, along that surface of the torus.

But it takes some time to see that L_{1} \nsim L{2}. You might suspect you can just pull the blue circle over the middle hole. But that would require leaving the surface \Sigma^{3}. Thus L_{1} \nsim_{\Sigma^{3}} L{2} (but L_{1} \sim_{\mathbb{R}^{3}} L{2}).

Related Materials

This post is based on Dr. Tadashi Tokeida’s excellent lecture series, Topology & Geometry. For more details, check it out!

An Introduction to Set Theory

Part Of: Algebra sequence
Content Summary: 1800 words, 18 min read

Fundamentals of Sets

Definition 1. A set is a collection of distinct objects.

A few examples to kick us off:

  • MyFavoriteFruits = \left\{ apples, persimmon, pineapple \right\} represents the set of fruits which I prefer.
  • A = \left\{ 1, 2, 3, 4, 5 \right\} can represent, among other things, the fingers on my left hand.
  • The set of natural numbers \mathbb{N} = \left\{ 0, 1, 2, 3, 4, ... \right\}.
  • The set of integers \mathbb{Z} = \left\{ ..., -2, -1, 0, 1, 2, ... \right\}.

Sets are subject to the following properties:

  • Order blindness: \left\{2, 1, 3\right\} and \left\{1, 2, 3\right\} expresses the very same set.
  • Duplicate blindness. \left\{ 1, 1, 2, 3, 3 \right\} = \left\{1, 2, 3 \right\}. We will prefer to express sets with the latter, more compact, notation.
  • Recursion. \left\{ 1, \left\{2, 3\right\} \right\} is a perfectly valid two-element set, quite distinct from the three-element set \left\{ 1, 2, 3 \right\}.

Definition 2. Two sets A and B are said to be equal (A = B) if A and B contain exactly the same elements.

  • Let A = \left\{ 1, 2, 3 \right\} and B = \left\{ 3, 1, 2 \right\}. Then, A = B.
  • Let A = \left\{ 1, 2, 3 \right\} and B = \left\{ 1, \left\{2, 3\right\} \right\}. Then, A \neq B.

Definition 3. If an object x is an element of (i.e., member of) set S, we write x \in S. Otherwise, we write x \notin S.

  • Let PrimaryColors = \left\{ red, yellow, blue \right\}. Then yellow \in PrimaryColors means “yellow is an element of the set of primary colors”.
  • -1 \notin \mathbb{N} means “-1 is not an element of the natural numbers”.
  • 1 \notin B = \left\{ 0, \left\{ 1 \right\}, \left\{ \left\{ 2 \right\} \right\} \right\}. The element 1 is not in B: only the set \left\{ 1 \right\} is.

Definition 4. For some set X, its cardinality (i.e., size) |X|, is the number of elements in that set.

  • Let A = \left\{ 1, 2, 3, 4, 5 \right\}. Then |A| = 5.
  • Let B = \left\{ 1, \left\{2, 3 \right\} \right\}. Then |B| = 2. Note that cardinality only looks at “the outer layer”.

Definition 5. The empty set (i.e., the null set) \varnothing is the set containing no elements.

  • \varnothing = \left\{ \right\}.
  • | \varnothing | = 0.
  • | \left\{ \varnothing \right\} | = 1

Definition 6. Instead of listing all elements, set builder notation specifies rules or properties that govern the construction of a set. The notation takes the following form: { x | \normalfont{property of} x }, where the | symbol is pronounced “such that”.

  • A = \left\{ 1, 2, 3, 4, 5 \right\} = \left\{ x \in \mathbb{Z} | x > 0, x < 6 \right\}. In words “let A be the set of integers X such that x is greater than zero and less than six.”
  • The set of rational numbers \mathbb{Q} = \left\{ x / y \mid x \in \mathbb{Z}, y \in \mathbb{Z}, y \neq 0 \right\}.

Sets defined by their properties are often called set comprehension. Such properties or rules are called the intension, the resultant set is called the extension.

  • Let A = \left\{ y | x \in \mathbb{R}, y = (x+1)(x-1) \right\} and let B = \left\{ y | x \in \mathbb{R}, y = x^{2} -1 \right\}. Here A = B, despite their use of different rules. We say that A and B have the same extension, but different intensions.

The intension-extension tradeoff denotes an inverse relationship between the number of intensional rules versus the size of the set those rules pick out. Let’s consider two examples to motivate this tradeoff.

Consider hierarchical addressing in computer architecture. Suppose we have 2^{6} = 64 bits of computer memory, each bit of which is uniquely identified with a 6-bit address. Suppose further that our memory has been allocated to data structures of varying size. To promote addressing efficiency, a computer can adopt the following strategy: assign shorter addresses to larger variables.

Naive Set Theory- Hierarchical Addressing

We can also see the intension vs. extension tradeoff in the memory systems of the brain. Specifically, semantic memory is organized into a concept hierarchy. We might classify a Valentine’s day gift according to the following tree:

Naive Set Theory- Concept Hierarchy

The number of objects classified as a RED_ROSE is clearly less than the number of objects classified as LIVING_THING. But as our extensional size decreases, the size of our intension (the number of properties needed to classify a RED_ROSE) increases.

Subsets and Power set

Definition 7. A set A is a subset of another set B, written A \subseteq B, if every element of A is also an element of B.

  • Let A = \left\{ 2, 3, 9 \right\} and B = \left\{1,9,3,6, 2\right\}. Then A \subseteq B (recall that element order is irrelevant).
  • Let A = \left\{ 2, 3, 9 \right\} and C = \left\{1,7, 3, 6, 2 \right\}. Then A \nsubseteq C (C does not contain 9).
  • Is \varnothing a subset of \left\{2,3,9\right\}?  Yes. For all A, \varnothing \subseteq A is true.

Definition 8. A set A is a proper subset of another set B, written A \subset B, if A \subseteq B and A \neq B.

Definition 9. For a given set A, its power set \mathbb{P} (A) is the set of all subsets of A.

  • Let A = \left\{ 0, 1 \right\}. Then \mathbb{P}(A) = \left\{ \varnothing ,  \left\{ 0 \right\} , \left\{ 1 \right\}, \left\{ 0, 1 \right\} \right\}.

A power set can be constructed by the use of a binary tree, as follows:

Naive Set Theory- Building Powersets (1)

As can be seen above, the total number of subsets must be a power of two. Specifically, if |A| = n, then |\mathbb{P}(A)| = 2^n.

It is important to get clear on the differences between the element-of ( \in ) versus subset-of ( \subseteq ) relations. Consider again A = \left\{ 0, 1 \right\} and its power set \mathbb{P}(A) = \left\{ \varnothing ,  \left\{ 0 \right\} , \left\{ 1 \right\}, \left\{ 0, 1 \right\} \right\}

  • A \in \mathbb{P}(A). But \left\{A\right\} \not\in \mathbb{P}(A). The \in relation requires the brackets match.
  • \left\{A\right\} \subseteq \mathbb{P}(A). But A \nsubseteq \mathbb{P}(A). The \subseteq relation requires the “extra bracket”.

Probability theory is intimately connected with the notion of power set. Why? Because many discrete probability distributions have \sigma-algebras draw from the power set of natural numbers \mathbb{P} ( \mathbb{N} ).

Cartesian Product, Tuples

Definition 10. Given two sets A and B, their Cartesian product A \times B is the set of all ordered pairs \langle a, b \rangle such that a \in A and b \in B. Note that, unlike the elements in a set, the elements of an ordered pair cannot be reordered.

  • Let A = \left\{1, 2, 3 \right\} and B = \left\{4, 5, 6\right\}. Then A \times B = \left\{ (1, 4) , (1, 5), (1,6), (2, 4), (2, 5), (2, 6),  (3, 4), (3, 5), (3, 6) \right\}.

We can represent this same example visually, as follows:

Naive Set Theory- Cartesian Product (1)

  • Contrast this with B \times A = \left\{ (3, 1) , (4, 1), (2, 4), (1, 3) \right\}. Thus, A \times B \neq B \times A. This is because elements within ordered pairs cannot be rearranged.
  • Note that |A \times B| = 9 = |A_{1}| \times |A_{2}|. In combinatorics, this observation generalizes to the multiplication principle.
  • The real plane \mathbb{R}^2 = \mathbb{R} \times \mathbb{R} is a well-known example of a Cartesian product.

Definition 11. Given n sets, A_{1}, A_{2}, \ldots , A_{n}, their Cartesian product is the set of all n-tuples.

  • Let A = \left\{1, 2\right\}, B = \left\{a, b\right\} and C = \left\{ 100 \right\}. Now A \times B \times C = \left\{ (1, a, 100) , (1, b, 100), (2, a, 100), (2, b, 100) \right\}.

Linear algebra is intimately connected with the Cartesian product operation. Why? Because n-tuples are strongly related to n-dimensional vectors. 

Intersection and Union

Definition 12. The intersection of two set A and B, written A \cap B , is the set of elements common to both sets.

  • Let A = \left\{ 2, 4, 9 \right\} and B = \left\{1, 2, 3, 6, 9 \right\}. Then A \cap B = \left\{ 2, 9 \right\}.
  • Let A = \left\{ 2, 4, 9 \right\} and B = \left\{1, 3, 6, 8, 10\right\}. Then A \cap B = \varnothing.
  • Let A = \left\{ 2, 4, 9 \right\}. Then \varnothing \cap A = A \cap \varnothing = \varnothing.

Definition 13. The union of two sets A and B, written A \cup B is the set of all elements that are in A or B, or both.

  • Let A = \left\{ 2, 4, 9 \right\} and B = \left\{1, 2, 3, 6, 9 \right\}. Then A \cup B = \left\{ 1, 2, 3, 4, 6, 9 \right\}.
  • Let A = \left\{ 2, 4, 9 \right\}. Then \varnothing \cup A = A \cup \varnothing = A.

Venn diagrams represent sets as enclosed areas in a 2D plane. If two (or more!) sets have shared elements, their areas overlap. We can use this technique to visualize sets and their overlap:

Naive Set Theory- Elements vs Venn Diagram

We can also use Venn diagrams to represent our intersection and union relations:

Naive Set Theory- Union vs Intersection Venn (1)

Note that |A \cup B| = |A| + |B| - |A \cap B|. This makes sense in light of the Venn diagram. Adding the cardinality of both sets counts the elements that exist in the middle section twice. To avoid this, we subtract the cardinality of the intersection. In combinatorics, this formula is generalized by the inclusion-exclusion principle

Difference and Complement

Definition 14. Given two sets A and B, their difference A \setminus B is the set of elements in A but not also in B.

  • Let A = \left\{ 1, 2, 3, 4, 5, 6 \right\} and B = \left\{0, 2, 4, 6, 8 \right\}. Then A \setminus B = \left\{ 1, 3, 5 \right\} and B \setminus A = \left\{ 0, 8 \right\}.
  • Let A = \left\{ 1, 2, 3, 4, 5, 6 \right\} and B = \left\{7, 8, 9, 10 \right\}. Then A \setminus B = A and B \setminus A = B.

Definition 15. Given two sets A and B, the symmetric difference A \triangle B is the set of elements in A or B, but not both.

  • Let A = \left\{ 1, 2, 3, 4, 5, 6 \right\} and B = \left\{0, 2, 4, 6, 8 \right\}. Then A \triangle B = \left\{ 0, 1, 3, 5, 8 \right\}.

Definition 16. In many set problems, all sets are defined as subsets of some reference set. This reference set is called the universe U.

  • Let A = \left\{ 1+i, 12-8j, 3+0i \right\} and let its universe be the set of complex numbers \mathbb{C}. It is true that A \subseteq \mathbb{C}.

Definition 17. Relative to a universe U, the complement of A, written \overline{A} is the set of all elements of U not contained in A.

  • Let U be the set of positive integers less than 10: U = \left\{ x | x \in \mathbb{Z}^{+}, x < 10 \right\} and A = \left\{ 1, 2, 3, 4, 5 \right\}. Then \overline{A} = \left\{ 6, 7, 8, 9 \right\}.

We can again represent these relations graphically, via Venn diagrams:

Naive Set Theory- Complement and Difference Venn (2)




Let me summarize this post in terms of our 17 definitions 🙂

  • Def 1-5 introduced the notion of set, set equality, the element-of operator, cardinality (set size), and empty set.
  • Def 6 introduced set builder notation, and the intension-extension tradeoff.
  • Def 7-9 introduced subset, proper subset, and power set.
  • Def 10-11 introduced Cartesian product, ordered pairs, and n-tuples.
  • Def 12-13 introduced intersection (“and”) and union (“or”), as well as Venn diagrams.
  • Def 14-17 introduced difference, symmetric difference (“xor”), and complement (“not”).

Want to learn more? I recommend the following resources:

This introductory article focused on promoting intuitions through worked examples. Next time, we’ll look at these same operations more carefully, and examine the relationship between set theory and classical predicate logic.

New Foundations: Towards Tribal Unity

Part Of: Principles of Machine Learning sequence
Followup To: Five Tribes of Machine Learning
Content Summary: 1700 words, 17 min read


In Five Tribes of Machine Learning, I reviewed Pedro Domingos’ account of tribes within machine learning. These were the Symbolists, Connectionists, Bayesians, Evolutionaries, and Analogizers. Domingos thinks the future of machine learning lies in unifying these five tribes into a single algorithm. This master algorithm would weld together the different focal points of the various tribes (c.f. the parable of the blind men and the elephant).

Today, I will argue that Domingos’ goal is worthy, but his approach too confined. Integrating theories of learning surely constitutes a constructive line of inquiry. But direct attempts to unify the tribes (e.g., Markov logic) are inadequate. Instead, we need to turn our gaze towards pure mathematics: the bedrock of machine learning theory. Just as there are tribes within machine learning, mathematical research has its own tribes (image credit Axel Sarlin):

New Foundations- Mapthematics

The tribes described by Domingos draw from the math of the 1950s. Attempting mergers based on these antiquated foundations is foolhardy. Instead, I will argue that updating towards modern foundational mathematics is a more productive way to pursue the master algorithm. Specifically, I submit that machine learning tribes should strive to incorporate constructive mathematics, category theory, and algebraic topology.

Classical Foundations

Domingos argues for five machine learning tribes. I argue for four. I agree that his Symbolists, Connectionists , and Bayesians are worthy of attention. But I will not consider his Evolutionaries and Analogizers: these tribes have been much less conceptually coherent, and also less influential. Finally, I submit Frequentists as a fourth tribe. While this discipline tend to self-identify as “predictive statistics” instead of  “machine learning”, their technology is sufficiently similar to merit consideration.

The mathematical foundations of the Symbolists rests on predicate logic, invented by Gottlieb Frege and C.S. Peirce. This calculus in turn forms the roots of set theory, invented by Georg Cantor and elaborated by Bertrand Russell. Note that 3 out of 4 of these names come from analytic philosophy. Alan Turing’s invention of his eponymous machine marked the birthplace of computer science. The twin pillars of computer science are computability theory and complexity theory, which in turn both rest on top of set theory. Finally, algorithm design connects with the mathematical discipline of combinatorics.

The foundation of the Statisticians (both Bayesian and Frequentist) is measure theory (which, coincidentally, borrows from set theory). The field of information theory gave probability distributions the concept of uncertainty: see entropy as belief uncertainty. Finally, formal theories of learning draw heavily from optimization: where model parameters are tuned to optimize against miscellaneous objective functions.

Mathematical research can largely be decomposed into two flavors: algebraic and analytical. Algebra focuses on mathematical objects and structures: group theory, for example, falls under its umbrella. Analysis alternatively focuses on continuity, and includes fields like measure theory and calculus. Notice that the mathematical foundations of the Symbolists is fundamentally algebraic; whereas that of the Statisticians are analytic. This gets at the root of why machine learning tribes often have difficulty communicating with one another.

New Foundations- Foundations

Classical Applications

We have already noted that that Symbolists, Connectionists, and Bayesians have all created applications in machine learning (decision trees, neural networks and graphical models, respectively). These tribes are also expressed in neuroscience (language of thought, Hebbian learning, and Bayesian Brain, respectively). They have also all developed their own flavors of cognitive architectures (e.g., production rule systems, attractor networks, and predictive coding respectively).

Frequentist Statisticians have no real presence in machine learning, neuroscience, nor cognitive architecture. But they are the only dominant force in the social sciences; e.g., econometrics.

I should also note that, in addition to the fields already noted Symbolists have unique presence in linguistics (especially Chomskyian universal grammar) and analytic philosophy (c.f., that field’s heavy reliance on predicate logic, and the linguistic turn in the early twentieth century).

Finally, causal inference only exists in the Bayesian (Pearlean d-separation) and Frequentist (Rubin potential outcome models). To my knowledge, this technology has not yet been robustly integrated into the Symbolist nor Connectionist tribes to date.

New Foundations- Applications

These four tribes largely draw from early twentieth century mathematics. Let us now turn to what mathematicians have been up to, in the past century.

Towards New Foundations

Let me now introduce you to the three developments in modern mathematics: constructive mathematics, category theory, and algebraic topology.

In classical logic, truth is interpreted ontologically: a fact about the world. But truth can also be interpreted epistemically: a true proposition is one that we can prove. But epistemic logic (aka intuitionistic logic) has us reject the Law of Excluded Middle (LEM): failing to prove a theorem is not the same thing as disproving it.

By removing LEM from mathematics, proof-by-contradiction become impossible. While this may seem limiting, in fact it also opens the doors for constructive mathematics: mathematics that can be input, and verified, by a computer. Erdos’ Book of God will be supplanted by the Github of God.

In recent years, category theory has emerged as the lingua franca of theoretical mathematics. It is built on the observation that all mathematical disciplines (algebraic and analytic) fundamentally describe mathematical objects and their relationships. Importantly, category theory allows theorems proved in one category to be translated into entirely novel disciplines.

Finally, since Alexander Grothendieck’s work on sheaf and topos theory, algebraic topology (and algebraic geometry) have come to occupy an increasingly central role in mathematics. This trend has only intensified in the 21st century. As John Baez puts it,

These are just the first steps in the ‘homotopification of mathematics, a trend in which algebra more and more comes to resemble topology, and ultimately abstract ‘spaces’ (for example, homotopy types) are considered as fundamental as sets.

New Foundations- Three Pillars Overview

These three “pillars” are perhaps best motivated by the technology that rests on it.

Computational trinitarianism is built on deep symmetries between proof theory, type theory and category theory. The movement is encapsulated in the slogan “Proofs are Programs” and “Propositions are Types”. This realization led to the development of Martin-Lof dependent type theory, which in turn has led to theorem proving software packages such as Coq.

In metamathematics, researchers investigate whether a single formal language can form the basis of the rest of mathematics. Historically, three candidates have been Zermelo-Frankel (ZF) set theory, and more recently Elementary Theory of the Category of Sets (ETCS). Homotopy type theory (HoTT) is a new entry into the arena, and extends computational trinitarianism by the Univalence Axiom, an entirely new interpretation of logical equality. Under the hood, the univalence axiom relies on a topological interpretation of the equality type. Suffice it to say, this particular theory has recently inspired a torrent of novel research. Time will tell how things develop.

In thermodynamics is built on the idea of Gibbs entropy (or, more formally, free energy). The basic intuition, which stems from statistical physics, is that disorder tends to increase over time. And thermodynamics does appear to be relevant in a truly diverse set of physical phenomena.

  • In physics, entropy is the reason behind the arrow of time (its “forward directionality”)
  • In chemistry, entropy forms the basis for spontaneous (asymmetric) reactions
  • In paleoclimatology, there is increasing reason to think that abiogenesis occurred via a thermodynamic process.
  • In anatomy, entropy is the organizing principle underlying cellular metabolism.
  • In ecology, entropy explains emergent phenomena related to biodiversity.

If I were to point at one candidate for the Universal Algorithm, entropy minimization would be my first pick. It turns out, strangely enough, that thermodynamic (Gibbs) entropy has the same functional form as information-theoretic (Shannon) entropy, which measures uncertainty in probability distributions. This is no accident. Information geometry extends this notion of “thermodynamic information” by interpreting entropy-distributions as stochastic manifolds.

In physics, of course, the two dominant theories of nature (general relativity + QFT) are mutually incompatible. It is increasingly becoming apparent that quantum topology is most viable way to achieve a Grand Unified Theory. From this paper,

Feynman diagrams are used to reason about quantum processes. In the 1980s, it became clear that underlying these diagrams is a powerful analogy between quantum physics and topology. Namely, a linear operator behaves very much like a ‘cobordism’: a manifold representing spacetime, going between two manifolds representing space. This led to a burst of work on topological quantum field theory and ‘quantum topology’

New Foundations- Three Pillars Complete (1)

Searching For Unity

That was a lot of content. Let’s zoom out. What is the point of being introduced to these new foundations? To give an more detailed intuition on which ML research is worthy of your attention (and participation!).

Most attempts to unify machine learning draw from merely classical foundations. For example, consider fuzzy logic, Markov logic networks, Dempster-Shafer theory, and Bayesian Neural Networks. While these ideas may be worth learning (particularly the last two), as candidates for unification they are necessarily incomplete; doomed by their unimaginative foundations.

New Foundations- Old Research Strategies (2)

In contrast, I submit you should funnel more enthusiasm towards ideas that draw from our new foundations. These may be active research concepts.

  • In linguistics, categorical compositionality is the marriage of category theory and traditional syntax. It blends nicely with probabilistic approaches of meaning (e.g., word2vec). See this 2015 paper, for example.
  • In statistics, topological data analysis is a rapidly expanding discipline. Rather than limiting oneself to probabilistic distribution theory (exponential families), this approach to statistics incorporates structural notions from algebraic topology. See this introductory tutorial, for example.
  • In neuroscience, the most recent Blue Brain experiment suggests that the Hebbian-style learning is not the whole story. Instead, the brain seems to rely on connectome topography: dynamically summon and disperse cliques of neurons, whose cooperation subsequently disappears like a tower of sand.
  • In macroeconomics, neoclassical models (based on partial differential equations) are being challenged by a new kind of model, econophysics, which views the market as a kind of heat machine.

New Foundations- New Research Strategies (1)

Or they may be entirely unexplored questions that dawn on you by contemplating conceptual lacunae.

  • What would happen if I were to re-imagine probability theory from intuitionistic principles?
  • How might I formalize production rule cognitive architectures like ACT-R in category theory?
  • Is there a way to understand neural network behavior and the information bottleneck from a topological perspective?

Until next time.