Why Language Models?

In the English language, ‘e’ appears more frequently than ‘z’. Similarly, “the” occurs more frequently than “octopus”. By examining large volumes of text, we can learn the probability distributions of characters and words.

Roughly speaking, **statistical structure** is distance from maximal entropy. The fact that the above distributions are non-uniform means that English is internally recoverable: if noise corrupts part of a message, the surrounding context can be used to recover the original signal. Statistical structure is also used to reverse engineer secret codes such as the Caesar cipher.

We can illustrate the predictability of English by generating text based on the above probability distributions. As you factor in more of the surrounding context, the utterances begin to sound less alien, and more like natural language.
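As a rough illustration of this idea, here is a minimal sketch that samples characters from count-based distributions with increasing amounts of context. The toy corpus and the function names `train_char_model` and `generate` are assumptions made for this example, not part of the original post.

```python
import random
from collections import Counter, defaultdict

def train_char_model(text, order):
    """Map each context of `order` characters to a Counter of next characters."""
    model = defaultdict(Counter)
    for i in range(len(text) - order):
        context = text[i:i + order]
        model[context][text[i + order]] += 1
    return model

def generate(model, order, length, seed_text, rng):
    """Sample characters one at a time, conditioning on the last `order` characters."""
    out = seed_text
    for _ in range(length):
        context = out[len(out) - order:] if order > 0 else ""
        counts = model.get(context)
        if not counts:  # unseen context: stop early
            break
        chars, weights = zip(*counts.items())
        out += rng.choices(chars, weights=weights)[0]
    return out

# Hypothetical toy corpus; a real experiment would use far more text.
sample_text = "the cat sat on the mat and the dog sat on the log "
rng = random.Random(0)
for order in (0, 1, 3):
    model = train_char_model(sample_text, order)
    print(order, repr(generate(model, order, 40, sample_text[:order], rng)))
```

With `order=0` the output is character soup; with more context, the samples should start to resemble fragments of the corpus.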

A **language model** exploits the statistical structure of a language to do two things:

- Assign a probability to a sentence
- Assign a probability to an upcoming word

Language models are particularly useful in language perception, because they can help interpret ambiguous utterances. Three such applications are:

- Machine translation
- Spelling correction
- Speech recognition

Language models can also aid in language production. One example of this is autocomplete-based typing assistants, commonly displayed within text messaging applications.

Towards N-Grams

A sentence is a sequence of words $w_1, w_2, \dots, w_n$. To model the joint probability over this sequence, we use the **chain rule**:

$$P(w_1, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1})$$

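As the number of words grows, the size of our **conditional probability tables (CPTs)** quickly becomes intractable. What is to be done? Well, recall the **Markov assumption** we introduced in Markov chains: each word depends only on the word before it, so $P(w_i \mid w_1, \dots, w_{i-1}) \approx P(w_i \mid w_{i-1})$.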

The Markov assumption constrains the size of our CPTs. However, sometimes we want to condition on more (or less!) than just one previous word. Let $k$ denote how many variables we admit in our context. A **variable order Markov model (VOM)** allows $k$ elements in its context: $P(w_i \mid w_{i-k}, \dots, w_{i-1})$. Then each CPT ranges over $k+1$ variables, because we must take our original variable into account. Thus an **N-gram** is defined as an $(N-1)$-order Markov model. By far, the most common choices are trigrams ($N=3$), bigrams ($N=2$), and unigrams ($N=1$).
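To make the definition concrete, here is a small sketch that slices a token list into N-grams. The helper name `ngrams` is an assumption for this example; each tuple holds a context of N-1 words followed by the word being predicted.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "that lay in the house".split()
for n, name in ((1, "unigrams"), (2, "bigrams"), (3, "trigrams")):
    print(name, ngrams(tokens, n))
```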

We have already discussed Markov Decision Processes, used in reinforcement learning applications. We haven’t yet discussed MRFs and HMMs. VOMs represent a fourth extension: the formalization of N-grams. Hopefully you are starting to appreciate the richness of this “formalism family”.

Estimation and Generation

How can we estimate these probabilities? By counting!

Let’s consider a simple bigram language model. Imagine training on this corpus:

This is the cheese.

That lay in the house that Alice built.

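Suppose our trained LM encounters the new sentence “this is the house”. It estimates its probability as:

$$P(\text{this is the house}) = P(\text{this}) \, P(\text{is} \mid \text{this}) \, P(\text{the} \mid \text{is}) \, P(\text{house} \mid \text{the})$$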

How many problems do you see with this model? Let me discuss two.

First, we have estimated that $P(\text{this}) = 1/12$. And it is true that “this” occurs only once among the twelve words of our toy corpus above. But out of two sentences, “this” leads half of them. We can express this fact by adding a special START token into our vocabulary.

Second, recall what happens when language models generate speech. Once they begin a sentence, they are unable to end it! Adding a new END token will allow our model to terminate a sentence, and begin a new one.

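With these new tokens in hand, we update our products as follows:

$$P(\text{this is the house}) = P(\text{this} \mid \text{START}) \, P(\text{is} \mid \text{this}) \, P(\text{the} \mid \text{is}) \, P(\text{house} \mid \text{the}) \, P(\text{END} \mid \text{house})$$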
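The counting procedure can be sketched in a few lines. This is a minimal illustration on the toy corpus above, assuming lowercased tokens, stripped periods, and the hypothetical marker strings `<s>` and `</s>` for START and END.

```python
from collections import Counter

START, END = "<s>", "</s>"  # hypothetical spellings of the START and END tokens

def tokenize(sentence):
    """Lowercase, drop the final period, and wrap in START/END markers."""
    return [START] + sentence.lower().rstrip(".").split() + [END]

def train_bigram(corpus):
    """Count how often each word appears as a context, and each bigram."""
    context_counts, bigram_counts = Counter(), Counter()
    for sentence in corpus:
        tokens = tokenize(sentence)
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts

def bigram_prob(prev, word, context_counts, bigram_counts):
    """Maximum likelihood estimate: count(prev, word) / count(prev)."""
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / context_counts[prev]

def sentence_prob(sentence, context_counts, bigram_counts):
    """Multiply the bigram probabilities along the sentence."""
    tokens = tokenize(sentence)
    p = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        p *= bigram_prob(prev, word, context_counts, bigram_counts)
    return p

corpus = ["This is the cheese.", "That lay in the house that Alice built."]
context_counts, bigram_counts = train_bigram(corpus)
print(sentence_prob("this is the cheese", context_counts, bigram_counts))
print(sentence_prob("this is the house", context_counts, bigram_counts))
```

Under these assumptions, “this is the house” scores zero, because “house” never ends a sentence in the corpus. A single unseen bigram wipes out the whole product.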

A couple other “bug fixes” I’ll mention in passing:

- Out-of-vocabulary words are given zero probability. It helps to add an unknown (UNK) pseudoword and assign it some probability mass.
- LMs prefer very short sentences (sequential multiplication is monotonically decreasing). We can address this by, e.g., normalizing by sentence length.
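The length bias in the second fix can be seen in a short sketch. The per-word probabilities below are made up for illustration, and scoring in log space is an assumption of this example (a common way to avoid numerical underflow), not something the post prescribes.

```python
import math

def log_prob(word_probs):
    """Sum of log-probabilities, equivalent to the log of their product."""
    return sum(math.log(p) for p in word_probs)

def length_normalized(word_probs):
    """Average log-probability per word."""
    return log_prob(word_probs) / len(word_probs)

short = [0.1, 0.1]   # hypothetical: two fairly unlikely words
longer = [0.2] * 6   # hypothetical: six more likely words

print(log_prob(short) > log_prob(longer))                    # raw score favors the short sentence
print(length_normalized(longer) > length_normalized(short))  # per-word score favors the longer one
```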

Smoothing

In the example above, one bigram is estimated to have probability zero, because we have no instances of this two-word sequence in our toy corpus. But this causes our language model to fail catastrophically: the sentence is deemed impossible (0% probability).

This problem of zero probability grows as we increase the complexity of our N-grams. Trigram models are more accurate than bigrams, but produce more zero-count events. You’ll notice echoes of the bias-variance (accuracy-generalization) tradeoff.

How can we remove zero counts? Why not add one to every count? Of course, we’d then need to increase the size of our denominator, to ensure the probabilities still sum to one. This is **Laplace smoothing**.
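Here is a minimal sketch of add-one smoothing for bigrams, using the second sentence of the toy corpus. The denominator grows by the vocabulary size $V$, so the estimate becomes $(\text{count}(w_{i-1}, w_i) + 1) / (\text{count}(w_{i-1}) + V)$. The function name `laplace` is an assumption for this example.

```python
from collections import Counter

tokens = "that lay in the house that alice built".split()
vocab = sorted(set(tokens))
context_counts = Counter(tokens[:-1])            # how often each word has a successor
bigram_counts = Counter(zip(tokens, tokens[1:]))

def laplace(prev, word):
    """Add-one estimate of P(word | prev)."""
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + len(vocab))

print(laplace("the", "house"))                 # seen bigram
print(laplace("the", "alice"))                 # unseen bigram, no longer zero
print(sum(laplace("the", w) for w in vocab))   # the distribution still sums to one
```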

In a later post, we will explore how (in a Bayesian framework) such smoothing algorithms can be interpreted as a form of regularization (MAP vs MLE).

Due to its simplicity, Laplace smoothing is well known. But several algorithms achieve better performance. How do they approach smoothing?

Recall that a zero-count event in an $n$-gram model is less likely to be a zero-count event in an $(n-1)$-gram model. For example, it is very possible that the phrase “dancing were thought” hasn’t been seen before.

While a trigram model may balk at the above sentence, we can fall back on the bigram and/or unigram models. This technique underlies the **Stupid Backoff** algorithm.
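The fallback idea can be sketched as a short recursion. This is an illustrative version, assuming the commonly cited back-off factor of 0.4 and hypothetical helper names. Note that the result is a score rather than a normalized probability.

```python
from collections import Counter

def count_ngrams(tokens, max_n):
    """Count every n-gram of length 1..max_n, keyed by tuple."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def stupid_backoff(word, context, counts, total, alpha=0.4):
    """Score `word` given a tuple `context`, shortening the context when unseen."""
    if not context:
        return counts[(word,)] / total      # unigram relative frequency
    ngram = context + (word,)
    if counts[ngram] > 0:
        return counts[ngram] / counts[context]
    # Unseen n-gram: drop the oldest context word and apply a fixed penalty.
    return alpha * stupid_backoff(word, context[1:], counts, total, alpha)

tokens = "that lay in the house that alice built".split()
counts = count_ngrams(tokens, 3)
print(stupid_backoff("house", ("in", "the"), counts, len(tokens)))  # trigram was seen
print(stupid_backoff("alice", ("in", "the"), counts, len(tokens)))  # backs off to the unigram
```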

As another variant on this theme, some smoothing algorithms train multiple $n$-gram models, and essentially use interpolation as an ensembling method. Such methods include the Good-Turing and Kneser-Ney algorithms.

Beam Search

We have so far seen examples of language perception, which assigns probabilities to text. Let us now consider language production, which generates text *from* the probabilistic model. Consider **machine translation**. For a French sentence $f$, we want to produce the English sentence $e^*$ such that $e^* = \arg\max_{e} P(e \mid f)$.

This seemingly innocent expression conceals a truly monstrous search space. **Deterministic search** has us examine *every possible English sentence*. For a vocabulary size $V$, there are $V^2$ possible two-word sentences. For sentences of length $n$, the time complexity of our brute force algorithm is $O(V^n)$.

Since deterministic search is so costly, we might consider **greedy search** instead. Consider an example French sentence “Jane visite l’Afrique en Septembre”. Three candidate translations might be:

- $e_1$: Jane is visiting Africa in September
- $e_2$: Jane is going to Africa in September
- $e_3$: In September, Jane went to Africa

Of these, $e_1$ is the best (most probable) translation. We would like greedy search to recover it.

Greedy search generates the English translation, one word at a time. Writing $f$ for the French sentence, if “Jane” is the most probable first word $\arg\max_{w_1} P(w_1 \mid f)$, then the next word generated is $\arg\max_{w_2} P(w_2 \mid \text{Jane}, f)$. However, it is not difficult to contemplate $P(\text{going} \mid \text{Jane is}, f) > P(\text{visiting} \mid \text{Jane is}, f)$, since the word “going” is used so much more frequently in everyday conversation. These problems of local optima happen surprisingly often.

The deterministic search space is too large, and greedy search is too confining. Let’s look for a common ground.

Beam search resembles greedy search in that it generates words sequentially. Whereas greedy search only drills one such path in the search tree, beam search drills a finite number of paths. Consider the following example with a fixed **beam width**:

As you can see, beam search elects to keep exploring “Jane is visiting” as a “second rate” translation candidate, despite “Jane is going” initially receiving the most probability mass. Only later in the sentence does the language model discover the virtues of the “visiting” translation.
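The contrast between greedy search and beam search can be reproduced on a toy model. Everything in `MODEL` below is a hypothetical hand-written table of next-word probabilities, not output from a trained system, and `beam_search` is a simplified sketch. Setting `beam_width=1` gives greedy search.

```python
import math

# Hypothetical next-word probabilities P(word | prefix), chosen by hand.
MODEL = {
    (): {"jane": 0.9, "in": 0.1},
    ("jane",): {"is": 1.0},
    ("jane", "is"): {"going": 0.6, "visiting": 0.4},
    ("jane", "is", "going"): {"to": 0.5, "home": 0.5},
    ("jane", "is", "visiting"): {"africa": 1.0},
    ("jane", "is", "going", "to"): {"<end>": 1.0},
    ("jane", "is", "going", "home"): {"<end>": 1.0},
    ("jane", "is", "visiting", "africa"): {"<end>": 1.0},
    ("in",): {"<end>": 1.0},
}

def beam_search(model, beam_width, max_len=10):
    """Keep the `beam_width` most probable prefixes at each step."""
    beams = [((), 0.0)]  # (prefix, log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for word, p in model.get(prefix, {}).items():
                candidates.append((prefix + (word,), score + math.log(p)))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_width]:
            # Completed sentences leave the beam; the rest are extended next round.
            (finished if prefix[-1] == "<end>" else beams).append((prefix, score))
        if not beams:
            break
    return max(finished or beams, key=lambda c: c[1])

for width in (1, 2):
    sentence, score = beam_search(MODEL, width)
    print(width, " ".join(sentence), round(math.exp(score), 2))
```

In this toy table, “going” wins locally after “jane is”, but its probability mass is then split across two continuations, so the wider beam ends up preferring the “visiting” path.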

Strengths and Weaknesses

Language models have three very significant weaknesses.

First, language models are blind to syntax. They don’t even have a concept of nouns vs. verbs! You have to look elsewhere to find representations of pretty much any latent structure discovered by linguistic and psycholinguistic research.

Second, language models are blind to semantics and pragmatics. This is particularly evident in the case of language production: try having your SMS autocomplete write out an entire sentence for you. In the real world, communication is more constrained: we choose the most likely word *given the semantic content we wish to express right now*.

Third, the Markov assumption is problematic due to **long-distance dependencies**. Compare the phrase “dog runs” vs “dogs run”. Clearly, the verb suffix depends on the noun suffix (and vice versa). Trigram models are able to capture this dependency. However, if you center-embed a relative clause, e.g., “dog/s that live on my street and bark incessantly at night run/s”, N-grams fail to capture this dependency.

Despite these limitations, language models “just work” in a surprising diversity of applications. These models are particularly relevant today because it turns out that Deep Learning sequence models like LSTMs share much in common with VOMs. But that is a story we shall have to take up next time.

Until then.


Sorry it’s been so long since my last post! I’ve been teaching a Deep Learning class, based on Andrew Ng’s Coursera specialization. Don’t worry, my other lectures will ultimately be cleaned & shared here too.

This talk covers the mathematical intuitions of RL, which draws from content relating to Markov Chains and Markov Decision Processes. It also contains some novel material, including my thoughts on how RL compares with other machine learning techniques.


A Theory of Relationship Dynamics

How can we make sense of social life? Let’s start by considering a simple cup of coffee.

- In my own house, I can just help myself to as much as I want, sharing with others in the framework of “what’s mine is yours.”
- Or my friend can get me a cup of coffee in return for the one I got for him yesterday, so we take turns or match small favors for each other.
- At Starbucks, I buy my coffee, using price and value as the framework.
- To my children, however, none of these principles apply. To them, coffee is something that only “big people” are allowed to drink: It is a privilege that goes with social rank.

What is true of a humble cup of coffee is true of the moral dilemmas surrounding major policy questions such as **organ donation**. Decisions have to be made, and there are again four fundamental ways to make them:

- Should we hold a lottery, giving each person an equal chance?
- Should we somehow rank the social importance of potential recipients?
- Should we sell organs to the highest bidder?
- Or should we expect everyone in a local community to give freely, offering a kidney to any group member in need?

*(The above excerpt is from [FE] )*

**Relational Models Theory (RMT)** proposes that these four social categories are exhaustive and culturally universal. Human interactions are complex, and typically use more than one of the above processes. But *every* relationship, in every culture, seems to be some combination of the following:

- In **Communal Sharing (Communality)**, people are viewed as equals oriented around some particular identity. This can include being in love, sports fans, and co-religionists.
- In **Authority Ranking (Dominance)**, people are situated in a hierarchy where superiors are deferred to, respected, and in some cases obeyed.
- In **Equality Matching (Reciprocity)**, people are interested in restoring balance, turn-taking, and making sure everyone is treated fairly.
- In **Market Pricing (Exchange)**, relationships are governed by quantitative, utilitarian concerns such as prices, exchanges, or cost-benefit analyses.

We can use relational models to explain a wide swathe of social phenomena:

- Some examples of **norm violation** are in fact category errors. For example, we would interpret a situation such as *the price of our meal is two hours on dishwasher duty* as a conflation of Market Pricing vs. Equality Matching.
- Some (but not all) examples of **taboo trade-offs** are in fact category errors. The *Finite Price of Human Life* thesis feels counterintuitive because it pits our Market Pricing versus the sacred values held by Communality.
- Humans often use **indirect speech acts** to reconcile relationship types with semantic content. Rather than saying e.g., “pick me up after work”, we often say things like, “If you would pick me up after work, that would be awesome”. While more verbose, the latter expression feels more polite *because* it is couched in a Communality frame, rather than signaling Dominance.

Beyond its explanatory reach, Relational Models Theory is supported by multiple strands of evidence:

- Factor analysis. If you ask people to describe their relationships, you can see whether your theory predicts statistical patterns in their responses. When RMT was compared with other taxonomies (and there are a lot of them), RMT starkly outperformed its competitors.
- Ethnographies. RMT was invented by anthropologist Alan Fiske to capture regularities he saw across different cultures. For example, he found examples of marriage treated as Dominance, as Market Pricing, etc – but never a fifth type. A number of cross-cultural studies indicate that the four relational models constitute a **human universal**.
- Social errors. When people misremember a person’s name, it tends to be a person with whom they share the same relationship type. For example, if you flub the name of your boss, you are more likely to say the name of someone else in a position of authority over you.
- Brain studies. In the cortex, the **default mode network** is widely thought to perform social processing. But within this specialized region, different subregions are activated when processing e.g., Communality vs Reciprocity relationships.

The Relational Sphere Hypothesis

Human societies can be conceived as operating in three spheres: markets, governments, and communities. The **Cultural Sphere Hypothesis** holds this trichotomy to be fundamental, and exhaustive of social space.

There seems to be a relationship between the cultural spheres and relational models. But there are three spheres vs four models. What gives?

Things become more clear when we remember that market-based economies were invented during the Neolithic Revolution, with the dawn of agriculture. Before this inflection point in history, transactions took place with gift economies.

This suggests that the Market Pricing relational model is evolutionarily recent: before the invention of agriculture, it simply did not exist.

I call this particular mapping from relational models to cultural spheres the **Relational Sphere Hypothesis (RSH)**. It is an intertheoretic reduction: it purports to be a significant join point between micro- and macro-sociality.

RSH predicts that three out of four relational models can be traced back to the birthplace of Homo Sapiens. Thus, we should expect predecessors for these relationship categories in primate societies! And we find precisely that:

- Dominance models are expressed in the **dominance hierarchy** (where physical dominance slowly gave way to symbolic dominance).
- Communality models are expressed in **kin selection** (where attachment to and care for relatives was slowly extended towards e.g. close friends).
- Reciprocity models are expressed in **reciprocal altruism** (where increasingly large delays between favor-transactions became possible).

I have argued elsewhere that the dual-process models so popular in today’s moral psychology can be captured in the interactions between (cortical) propriety frames and (subcortical) social intuitions. These two systems comprise the building blocks of sociality. RSH dovetails nicely with this dual process account, as it perceives categories within these systems, each with its own distinctive logic:

With the exception of Sanctity, these subconscious social intuitions arguably exist in primates. For example, here is evidence that rhesus monkeys have strong intuitions about Fairness:

A New Kind of Social Network

The Relational Sphere Hypothesis can be further illustrated by **social networks**: graphs where nodes are individuals, and edges are relationships. These kinds of models are very common across many disciplines that study aggregate social phenomena, for example evolutionary game theory. A social network may look something like this:

But relationships inhabit different categories. We can express this fact by coloring edges according to their relational model:

Note that some nodes (e.g. A and B) are connected by more than one color. This signifies that the relationship between A and B features both Communality and Dominance.

From this more complete picture of human relationships, we can derive our cultural spheres by examining the (mono-color) subgraphs:
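One way to picture this derivation is as a filter over labeled edges. The sketch below assumes a made-up edge list (only the A and B pair with Communality and Dominance comes from the text) and a hypothetical helper `subgraphs_by_model`.

```python
from collections import defaultdict

# Hypothetical edges: (person, person, relational model).
edges = [
    ("A", "B", "Communality"),
    ("A", "B", "Dominance"),
    ("B", "C", "Reciprocity"),
    ("C", "D", "Market Pricing"),
    ("A", "D", "Communality"),
]

def subgraphs_by_model(edges):
    """Group undirected edges by relational model: one mono-color subgraph each."""
    graphs = defaultdict(set)
    for u, v, model in edges:
        graphs[model].add(frozenset((u, v)))
    return dict(graphs)

for model, subgraph in subgraphs_by_model(edges).items():
    print(model, sorted(sorted(edge) for edge in subgraph))
```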

Sphere Evolution & Competition

Political, social, and economic institutions have dramatically changed across the course of human history. As we saw in Deep History of Humanity, the evolution of our species can be usefully divided into three time periods:

The **Sphere Competition Conjecture** comprises a set of informal intuitions that relational models “compete for our attention”: gains in one sphere are often accompanied by losses in another.

Let me illustrate this conjecture with examples.

Social vs Economic spheres

- The religious instinct is etched deeply into the hominid mind, and evidence for shamanic animism dates back to the advent of behavioral modernity. Modern religion is located squarely within the Social sphere. But what caused its *institutionalization*, the invention of the full-time religious specialist: the priest? **Religious institutions** were founded during the transition from **gift economies** to market economies. For the first time in history, material wealth mattered more in transactions than interpersonal reputation. With the Social sphere threatening to collapse, perhaps it is not a coincidence that it was at this moment in history that religion became more explicitly social.
- Some existential philosophers argue that the industrial revolution, with its obscenely large increase in Economic productivity, has correlated with a weakening of Social values, as witnessed empirically by the rise of **materialism**. Perhaps the malaise and cynicism of postmodernity can be explained by the weakening of the ties of community.
- The custom of **tipping** can be conceived as an organ of Sociality that feels misplaced in today’s Market-oriented economy. This institution shows no signs of abating (for example, Uber recently rescinded its no-tipping policy). Perhaps the reason this Social technology persists, while others have disintegrated, is because tipping solves the **principal agent problem**: customer service is otherwise not factored into the price, because that information is not easily available to management.
- **Product boycotts** are another example of Social outrage affecting Economic markets.

Social vs Political.

- Another important event in the history of religion is the transition to **universal religions**: where the concerns of the gods and the consequences of moral violations were imbued with an aura of the eternal. Anthropological evidence clearly suggests that universal religions succeeded because they facilitated larger group sizes.
- **Corruption** is often treated as a political problem, but in fact bribery and collusion both require high amounts of social capital.
- In American history, political **partisanship** has been most severe in the 1880s, and at present. Both then and now are periods of an intense drought of social capital. Further, participation in voting strongly correlates with vibrant community and civic life. We might conjecture that weaker communities are more vulnerable to partisan infighting. This conjecture is aligned with the oft-cited observation that partisanship tends to correlate with moderates abandoning the political arena.

Economic vs Political.

- **Capitalist Peace Theory** formalizes the observed inverse relationship between free trade and international conflict. On this hypothesis, one of the strongest predictors of war is resource acquisition, and the risk-benefit calculus changes (improves) substantially with the removal of tariffs.

Economic vs Political vs Social.

- The **Size of Nations Hypothesis** is the idea that the size of a nation (Political) is driven by two competing factors: larger nations are able to produce public goods more efficiently (Economic), but conversely their populations are more heterogeneous and thereby less cohesive (Socially).

Some of the phenomena described above have been extensively studied by social scientists. However, to my knowledge, no extant models robustly capture the doctrine of relational model theory. Perhaps the next generation of formal models will do better.

Recommended Resources

- [FE] Fiske, Ehrenhalt. *Basic Relationships*. Accessible here (first link)
- [Has04] Haslam (2004). *Relational Models Theory: A Contemporary Overview*
- [Wick09] Wicks (2009). *A Model of Dynamic Balance among the Three Spheres of Society.*
- Pinker’s take on Relational Models Theory, Animated
- RMT Research Bibliography

Most people agree that human societies operate in different contexts: markets, governments, and communities. The **Three Sphere Hypothesis** holds that this trichotomy is fundamental and exhaustive of social space. What’s more, these spheres interact. Neither markets nor governments nor communities can be analyzed thoroughly without understanding their dependence upon, and their effects upon, the others.

[Excerpt] Intellectual History of the Hypothesis

**Source**: Wicks (2009). *A Model of Dynamic Balance among the Three Spheres of Society*

Social scientists – including economists – as well as journalists and others, often refer to “the economic, political, and social conditions” underlying any particular situation, but usually without any further analysis of what these terms imply, and how they relate to each other.

Apparent references to these three spheres pop up – in both popular and technical literature – almost everywhere. It can be a fun game, like “whack-a-mole”:

- Where and how will the three spheres “pop up” in this or that text?
- And, given any set of three social attributes that do “pop up”, can they be seen in some way as representing the three spheres?

Etzioni (1996:122) speaks of “three different conditions: paid, coerced, or convinced”; Etzioni (1988) explores motivations in the community sphere at length.

Personalist economics, based on Catholic theology, also recognizes three organizing principles: competition, intervention, and cooperation (Jonish and Terry, 1999:465-6; O’Boyle, 1999:536-7, 2000:550-51).

Hirschman (1992) referred to three social mechanisms: exit, voice, and loyalty. Though all three can apply in varying ways to each sphere, exit refers primarily to the market sphere where, in a competitive situation, one has unlimited choice of buyers or sellers, so can “exit” from any one. Voice might refer primarily to the political sphere, where one can attempt to influence results by persuasion, and loyalty to the community sphere – though one could argue the other way as well.

Streeck and Schmitter (1985:1) refer to these “three basic mechanisms of mediation or control” (Ouchi, 1980) as spontaneous solidarity, hierarchical control, and dispersed competition.

Friedland and Alford (1991:39) refer to three domains with different “logics of action”: In the marketplace, we are more likely to base our actions on individual utility and efficient means; in the polity, on democracy and justice; and in the family, on mutual support.

Van Staveren (2001:24) asserts that “three values appear time and again in economic analysis: liberty, justice, and care. Markets tend to express freedom, states to express justice, and unpaid labor to express care among human beings.” She notes (p. 213) that Ayres (1961:170) asserted a similar set of core human values: “freedom, equality, and security”. Van Staveren (p. 203) also notes:

- the form that these values take: exchange, redistribution, and giving;
- the locations where they operate: market, state, and the care-economy; and
- the corresponding virtues: prudence, propriety, and benevolence.

She further asserts that there are “distinct emotions and forms of deliberation as well”.

Mackey (2002:384) refers to “economic, political, and social problems” in Saddam’s Iraq; elsewhere (p. 181) she uses a different order, referring to “the new political, social, and economic paradigm” (an order which Rothstein and Stolle, 2007:1, also use); and yet elsewhere (p. 49) she notes that something “meant more socially, politically, and economically”. The order of expression doesn’t seem to matter, to Mackey or to most other authors, and one can easily find the other three permutations as well (e.g., Friedman, 2000:131; Giddens and Pierson, 1998:89; Sage, 2003).

But the community sphere is often ignored, and thus is sometimes considered third (Adaman and Madra, 2002). In political theory, the “Third Way” (Giddens, 1998) represents an alternative to either markets or governments, focused more in communities.

Waterman (1986:123) asserts “three freedoms: economic, political, and religious (conscience)”; and Hobson (1938/1976:52) refers to “the democratic triad of liberty, equality, fraternity”.

As some of these examples illustrate, a wide variety of words are used to refer to the three spheres, as in the title of the book (cited by Bennett, 1985) *Mexico: Catholicism, Capitalism, and the State*, or when

- Mackey (2002:217) discusses “political, economic, and… cultural control”;
- Bowles (1998:105) refers to “states, communities, and markets”;
- Wright (2000:211) refers to “governance, moral codes, and markets”;
- Mauss (1925/1967:52) refers to the “law, morality, and economy of the Latins” and to “the distinction between ritual, law, and economic interest”;
- Yuengert (1999:46) discusses “free markets circumscribed within a tight legal framework, and operating within a humane culture”;
- Polanyi (1997:140), in discussing “economic life”, refers to “freedom under law and custom, as laid down and amended when necessary by the State and public opinion”.

In *The Foundations of Welfare Economics* (1949:230), Little points out that “if a person argues that a certain change would increase economic welfare, it is open to anyone to argue that it would decrease spiritual or political welfare.”

This tripartite taxonomy has been used by economists since Adam Smith who, of course, had first written *The Theory of Moral Sentiments* (1759/1982) about communities and social goods, then *The Wealth of Nations* (1776/1976) about markets, economics. But he was planning a third major work – which was never completed – on the political system (Smith, 1759/1982:342 and “Advertisement” therein).

Minowitz (1993) uses the same tripartite taxonomy twice (in varying order) in the title of his book: *Profits, Priests, and Princes: Adam Smith’s Emancipation of Economics from Politics and Religion.*

The English economist and theologian Philip Wicksteed referred to “business, politics, and the pulpit” in his book of sermons titled *Is Christianity Practical?* (1885/1920, referenced in Steedman 1994:83). In discussing Wicksteed’s work, Steedman (p. 99) also refers to “potatoes, politics, and prayer”. Similarly, Hobson (1938/1976:55) referred to “the purse, power, and prestige of the ruling classes in business, politics, and society”. Success itself is often defined as “wealth, fame, and power” (Bogle, 2004:1; Carey, 2006), or sometimes as “money, status, and power”.

A similar tripartite taxonomy – perhaps Marxian – of firms, social classes, and states, can easily be seen as referring to the three spheres.

According to Trotsky (1957:255), communism would demonstrate that the human race had “ceased to crawl on all fours before God, kings, and capital” (quoted by Minowitz, 1993:240).

A variety of sources also provide evidence of an apparently widespread belief that the three spheres are both fundamental and exhaustive of social space. Michael Novak refers to the “three mutually autonomous institutions: the state, economic institutions, and cultural, religious institutions” as “the doctrine of the trinity in democratic capitalism” (Abdul-Rauf, 1986:175; also Neuhaus, 1986:517).

Dasgupta (1993:104) notes “one overarching idea, that of citizenship, with its three constituent spheres: the civil, the political, and the socio-economic.”

Meyer et al. (1992:12) assert that “individuals must acquire the means to participate effectively in the economic, social, and political life of the nation.” In the same work, Wong (1992:141) makes it clear that these three spheres are considered exhaustive by referring to “all social domains… economy… polity… and… cultural system”.

Polanyi (1997:158) describes the Russian Revolution and the Soviets’ “project for a new economic, political, and social system of mankind”.

Shadid (2001:3) points out that “political Islam, or Islamism…suggests an all-embracing approach to economics, politics, and social life.”

Dicken (2007:538) says that “corporate social responsibilities span the entire spectrum of relationships between firms [and] states, civil society, and markets.”


A graphic I created summarizing key cultural and biological milestones.

Note that time is situated on a logarithmic scale. Full resolution image here.

Hominid Phylogeny

Of course, the hominid line began diverging genetically from that of other primates around 7 million years ago.

Image from Berkeley’s Understanding Evolution. Full resolution image here.

Out of Africa

Finally, here is the geography & timeline of the emigration waves out of Africa, courtesy of Huffington Post and National Geographic.

A couple facts that provide context on our journey out of Africa:

- Emigration paths were radically influenced by the Quaternary glaciation.
- Emigration was largely caused by the Toba supereruption.

Related Content

This post bears on the history of human- and hominid-like species.

- For a history of the Earth, see my common descent graphic here.
- For a history of the universe, see Deep Time: The Story of Cosmogenesis.

On Permutations

In the last few posts, we have discussed algebraic structures whose sets contain objects (e.g., numbers). Now, let’s consider structures over a set of *functions*, whose binary operation is function composition.

Definition 1. Consider two functions $f$ and $g$. We will denote **function composition** of $f$ and $g$ as $f \cdot g$. We will use this notation instead of the more common $g \circ f$. Both represent the idea “apply $f$, then $g$”.

Consider .

Is this a group? Let’s check closure:

Closure is violated. isn’t even a magma! Adding to the underlying set exacerbates the problem: then both and .

So it is hard to establish closure under function composition. Can it be done?

Yes. Composition exhibits closure on sets of **permutation** functions. Recall that a permutation is simply a **bijection**: it rearranges a collection of things. For example, here are the six possible bijections over a set of three elements.

Definition 2. The **symmetric group** $S_n$ denotes composition over the set of all bijections (permutations) over some set of $n$ objects. The symmetric group $S_n$ is then of order $n!$.

The underlying set of $S_3$ is the set of all permutations over a 3-element set. It is of order $3! = 6$.

This graphical representation of permutations is rather unwieldy. Let’s switch to a different notation system!

Notation 3: **Two Line Notation**. We can use two lines to denote each permutation mapping $\sigma$. The top row represents the original elements $x$; the bottom represents where each element has been relocated, $\sigma(x)$.

Two line notation is sometimes represented as an array, with the top row as matrix row, and bottom denoting matrix column. Then the identity matrix represents the identity permutation.
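As a small illustration of the array view, the sketch below turns the bottom row of a two line notation into a 0/1 matrix, with a 1 in row $i$, column $\sigma(i)$. The function name `permutation_matrix` and the choice of elements $1, \dots, n$ are assumptions for this example.

```python
def permutation_matrix(bottom_row):
    """bottom_row[i - 1] is where element i is sent (elements are 1..n)."""
    n = len(bottom_row)
    return [[1 if bottom_row[r] == c + 1 else 0 for c in range(n)] for r in range(n)]

print(permutation_matrix([1, 2, 3]))  # identity permutation gives the identity matrix
print(permutation_matrix([2, 3, 1]))  # 1 goes to 2, 2 goes to 3, 3 goes to 1
```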

Definition 4. A **cycle** is a sequence of morphisms that forms a closed loop. An n-cycle is a cycle of length n. A 1-cycle does nothing. A 2-cycle is given the special name **transposition**. $S_3$ has two permutations with 3-cycles: can you find them?

Theorem 5. **Cycle Decomposition Theorem**. Every permutation can be decomposed into **disjoint cycles**. Put differently, a node cannot participate in more than one cycle. If it did, its parent cycles would merge.

Notation 6: **Cycle Notation**. Since permutations always decompose into cycles, we can represent them as , pronounced “ goes to goes to …”.

Cycle starting element does not matter: .
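To make cycle notation concrete, here is a minimal Python sketch that reads off the disjoint cycles of a permutation by following each element until it loops back. The dictionary encoding, the name `to_cycles`, and the five-element example are assumptions made for illustration.

```python
def to_cycles(perm):
    """Decompose a permutation (dict: element -> image) into disjoint cycles."""
    seen = set()
    cycles = []
    for start in sorted(perm):
        if start in seen:
            continue
        cycle = [start]
        seen.add(start)
        nxt = perm[start]
        while nxt != start:          # follow the arrows until the loop closes
            cycle.append(nxt)
            seen.add(nxt)
            nxt = perm[nxt]
        if len(cycle) > 1:           # 1-cycles do nothing, so they are left out
            cycles.append(tuple(cycle))
    return cycles

# Hypothetical permutation: 1 -> 3 -> 5 -> 1 and 2 -> 4 -> 2.
perm = {1: 3, 2: 4, 3: 5, 4: 2, 5: 1}
cycles = to_cycles(perm)
print(cycles)  # [(1, 3, 5), (2, 4)]
```

Because the loop always starts from the smallest unseen element, each cycle is printed from its smallest member, which sidesteps the fact that the starting element does not matter.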

The Cycle Algorithm

It is difficult to tell visually the outcome of permutation composition. Let’s design an algorithm to do it for us!

Algorithm 7: **Cycle Algorithm**. To compose two permutation functions and , take each element and follow its arrows until you find the set of disjoint cycles. More formally, compose these functions times until you get .

Here’s a simple example from .

Make sense? Good! Let’s try a more complicated example from .

A couple observations are in order.

- 1-cycles (e.g., ) can be omitted: their inclusion does not affect algorithm results.
- Disjoint cycles commute: . Contrast this with composition, which does not commute .
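Both observations can be checked with a short sketch. It assumes the “apply the first function, then the second” convention from Definition 1; the helper names `compose` and `from_cycles` and the particular cycles are hypothetical.

```python
def compose(f, g):
    """Return the permutation 'apply f, then g' (dict: element -> image)."""
    return {x: g[f[x]] for x in f}

def from_cycles(cycles, elements):
    """Build a permutation from a list of cycles; unlisted elements stay fixed."""
    perm = {x: x for x in elements}
    for cycle in cycles:
        for i, x in enumerate(cycle):
            perm[x] = cycle[(i + 1) % len(cycle)]
    return perm

elements = range(1, 6)
a = from_cycles([(1, 2)], elements)
b = from_cycles([(2, 3)], elements)   # overlaps with a
c = from_cycles([(4, 5)], elements)   # disjoint from a

print(compose(a, b) == compose(b, a))  # False: composition does not commute in general
print(compose(a, c) == compose(c, a))  # True: disjoint cycles commute
```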

Now, let’s return to for a moment, with its set of six permutation functions. Is this group closed? We can just check every possible composition:

From the Cayley table on the right, we see immediately that is closed (no new colors) and non-Abelian (not diagonal-symmetric).
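The same check can be run by brute force. In this sketch each permutation of three objects is encoded as a tuple (an assumed encoding, with objects labelled 0, 1, 2), and the Cayley table is a dictionary keyed by pairs.

```python
from itertools import permutations

# p[i] is the image of i under the permutation p.
S3 = list(permutations(range(3)))

def compose(f, g):
    """Apply f, then g."""
    return tuple(g[f[i]] for i in range(len(f)))

# Cayley table: the entry for (f, g) holds the composition "f, then g".
table = {(f, g): compose(f, g) for f in S3 for g in S3}

is_closed = all(result in S3 for result in table.values())
is_abelian = all(table[f, g] == table[g, f] for f in S3 for g in S3)
print(len(S3), is_closed, is_abelian)  # 6 True False
```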

But there is something much more interesting in this table. *You have seen it before*. Remember the dihedral group ? It is isomorphic to !

If you go back to the original permutation pictures, this begins to make sense. Permutations and resemble rotations/cycles; , , and perform reflections/flips.

Generators & Presentations

In Theorem 5, we learned that permutations decompose into cycles. Let’s dig deeper into this idea.

Theorem 8. Every n-cycle can be decomposed into some combination of 2-cycles. In other words, cycles are built from **transpositions.**

The group has three transpositions , , and .

Transpositions are important because they are generators: every permutation can be generated by them. For example, . In fact, we can lay claim to an even stronger fact:

Theorem 9. Every permutation can be generated by adjacent transpositions. Every permutation .

By the isomorphism , we can generate our “dihedral-looking” Cayley graph by selecting generators and .

But we can use Theorem 9 to produce another, equally valid Cayley diagram. There are two adjacent transpositions in : and . All other permutations can be written in terms of these two generators:

- .
- .
- .
- .
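Theorem 9 can be tested experimentally: start from the identity, keep composing with adjacent transpositions, and count how many distinct permutations appear. The tuple encoding and the names `generate` and `adjacent_transpositions` are assumptions for this sketch.

```python
def compose(f, g):
    """Apply f, then g (p[i] is the image of i)."""
    return tuple(g[f[i]] for i in range(len(f)))

def adjacent_transpositions(n):
    """All swaps of neighbouring positions (i, i + 1) on n objects."""
    gens = []
    for i in range(n - 1):
        p = list(range(n))
        p[i], p[i + 1] = p[i + 1], p[i]
        gens.append(tuple(p))
    return gens

def generate(generators):
    """Everything reachable from the identity by repeatedly composing with the generators."""
    identity = tuple(range(len(generators[0])))
    found = {identity}
    frontier = [identity]
    while frontier:
        next_frontier = []
        for p in frontier:
            for g in generators:
                q = compose(p, g)
                if q not in found:
                    found.add(q)
                    next_frontier.append(q)
        frontier = next_frontier
    return found

group = generate(adjacent_transpositions(3))
print(len(group))  # 6
```

Two generators are enough to reach all six permutations of three objects; calling `generate(adjacent_transpositions(4))` reaches all 24 permutations of four.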

This allows us to generate a transposition-based Cayley diagram. Here are the dihedral and transposition Cayley diagrams, side by side:

We can confirm the validity of the transposition diagram by returning to our multiplication table: means a green arrow .

Note that the transposition diagram is *not* equivalent to the cyclic group , because arrows in the latter are monochrome and unidirectional.

We’re not quite done! We can also rename our set elements to employ generator-dependent names, by “moving clockwise”:

We could just as easily have “moved counterclockwise”, with names like , . And we can confirm by inspection that, in fact, etc.

Using the original clockwise notation, one presentation of becomes:

Towards Alternating Groups

Any given permutation can be written as a product of transpositions, and in more than one way. Consider, for example, the above equalities

- . These have 2, 4, and 8 transpositions, respectively.
- . These have 3, 5, and 9 transpositions, respectively.

Did you notice any patterns in the above lists? All expressions for require an even number of transpositions, and all expressions of require an odd number. In other words, the **parity** (evenness or oddness) of a given permutation doesn’t seem to be changing. In fact, this observation generalizes:

Theorem 10. For any given permutation, the parity of the number of transpositions is unique: either every decomposition uses an even number of transpositions, or every decomposition uses an odd number.

Thus, we can classify permutations by their parity. Let’s do this for :

- are **even permutations**.
- are **odd permutations**.

Theorem 11. Exactly half of the permutations in $S_n$ are even, and they form a group called the **alternating group** $A_n$. Just as $S_n$ has $n!$ elements, $A_n$ has $n!/2$ elements.

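Why don’t odd permutations form a group? For one thing, the set of odd permutations doesn’t contain the identity permutation, which is always even. For another, composing two odd permutations yields an even one, so closure fails too.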
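Here is a small sketch of these parity claims for permutations of three objects. Parity is computed by counting inversions, a standard shortcut swapped in here in place of writing out transpositions by hand; the tuple encoding and function names are assumptions.

```python
from itertools import permutations

def parity(p):
    """Return 0 if p is even and 1 if p is odd, by counting inversions."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return inversions % 2

def compose(f, g):
    """Apply f, then g (p[i] is the image of i)."""
    return tuple(g[f[i]] for i in range(len(f)))

S3 = list(permutations(range(3)))
even = [p for p in S3 if parity(p) == 0]
odd = [p for p in S3 if parity(p) == 1]
identity = tuple(range(3))

print(len(even), len(odd))                                       # 3 3
print(all(compose(a, b) in even for a in even for b in even))    # True: evens are closed
print(identity in odd)                                           # False
print(all(compose(a, b) in even for a in odd for b in odd))      # True: odd times odd is even
```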

Let’s examine in more detail. Does it remind you of anything?

It is isomorphic to the cyclic group ..!

We have so far identified the following isomorphisms: and . Is it also true that e.g., and ?

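No! Recall that $|S_n| = n!$ and $|D_n| = 2n$, while $|A_n| = n!/2$ and the cyclic group of order $n$ has $n$ elements. These orders agree *only* when $n = 3$; for larger $n$, these sets are not even *potentially* isomorphic. For example: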

- .
- .

For these larger values of , the symmetric group is much larger than dihedral and cyclic groups.

Applications

Why do symmetric & alternating groups matter? Let me give two answers.

Perhaps you have seen the quadratic formula, the generic solution to quadratic polynomials .

Analogous formulae exist for cubic (degree-3) and quartic (degree-4) polynomials. 18th century mathematics was consumed by the **theory of equations**: mathematicians attempting to find an analogous formula for quintic (degree-5) polynomials. Ultimately, this quest proved to be misguided: there is no general solution to quintic polynomials in terms of radicals.

Why should degree-5 polynomials admit no such formula? As we will see when we get to **Galois Theory**, it has to do with the properties of the symmetric group .

A second reason to pay attention to symmetric groups comes from the **classification of finite simple groups**. Mathematicians have spent decades exploring the entire universe of finite groups, finding arcane creatures such as the monster group, which may or may not explain features of quantum gravity.

One way to think about group space is by the following periodic table:

*Image courtesy of Samuel Prime. *

Crucially, in this diverse landscape, the symmetric group plays a unique role:

Theorem 12: **Cayley’s Theorem**. Every finite group is isomorphic to a subgroup of the symmetric group , for some sufficiently large .

For historical reasons, subgroups of the symmetric group are usually called **permutation groups**.

Until next time.

Wrapping Up

Takeaways:

- The **symmetric group** is the set of all bijections (permutations) over some set of objects, closed under function composition.
- Permutations can be decomposed into disjoint cycles: **cycle notation** uses this fact to provide an algorithm to solve for arbitrary compositions.
- All permutations (and hence, the symmetric group) can be generated by **adjacent transpositions**. This allows us to construct a presentation of the symmetric group.
- Permutations have unique parity: thus we can classify permutations as even or odd. The group of even permutations is called the **alternating group**.
- It can be shown that and . However, for larger , the symmetric and alternating groups are much larger than cyclic and dihedral groups.

The best way to learn math is through practice! If you want to internalize this material, I encourage you to work out for yourself the Cayley table & Cayley diagram for .

Related Resources

- This post is based on Professor Macaulay’s Visual Group Theory lectures, which in turn is based on Nathan Carter’s eponymous textbook.
- Related to this style of teaching group theory are Dana Ernst’s lecture notes.
- If you want to explore finite groups with software, Group Explorer is excellent.

For a more traditional approach to the subject, these Harvard lectures are a good resource.

An Example Using Modular Addition

Last time, we saw algebraic structures whose underlying sets were infinitely large (e.g., the real numbers ). Are finite groups possible?

Consider the structure . Is it a group? No, it isn’t even a magma: ! Is there a different operation that would produce closure?

**Modular arithmetic** is the mathematics of clocks. Clocks “loop around” after 12 hours. We can use modulo-4 arithmetic, or , on . For example, .

To check for closure, we need to add all pairs of numbers together, and verify that each sum has not left the original set. This is possible with the help of a **Cayley table**. You may remember these as elementary school multiplication tables .

By inspecting this table, we can classify .

- Does it have closure? Yes. Every element in the table is a member of the original set.
- Does it have associativity? Yes. (This is not visible at a glance in the table, but it holds because ordinary addition is associative.)
- Does it have identity? Yes. The row and column associated with 0 reproduce the elements of the set unchanged.
- Does it have inverse? Yes. The identity element appears in every row and every column.
- Does it have commutativity? Yes. The table is symmetric about the diagonal.

Therefore, is an **abelian group**.
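The checklist above can also be run mechanically. This is a minimal sketch for addition modulo 4: it builds the Cayley table as a nested list and tests each axiom against it. Associativity is checked by brute force over all triples, which is feasible only because the set is tiny.

```python
n = 4
elements = list(range(n))

# Cayley table: entry [a][b] holds (a + b) mod n.
table = [[(a + b) % n for b in elements] for a in elements]
for row in table:
    print(row)

closure = all(x in elements for row in table for x in row)
associative = all(
    table[table[a][b]][c] == table[a][table[b][c]]
    for a in elements for b in elements for c in elements
)
# An identity's row and column reproduce the header unchanged.
identities = [
    e for e in elements
    if table[e] == elements and [table[a][e] for a in elements] == elements
]
has_inverses = bool(identities) and all(identities[0] in row for row in table)
commutative = all(table[a][b] == table[b][a] for a in elements for b in elements)

print(closure, associative, identities, has_inverses, commutative)
# True True [0] True True
```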

An Example Using Roots of Unity

Definition 1. A group is said to be **order ** if its underlying set has cardinality .

So is order 4. What other order-4 structures exist?

Consider the equation . Its solutions, or roots, form the set . This set is called the fourth **roots of unity**.

So what is the Cayley table of this set under multiplication ? In the following table, recall that , thus .

Something funny is going on. This table (and its colors) are patterned identically to ! Recall that a binary operation is just a function . Let’s compare the function maps of our two groups:

These two groups are structurally identical: two sides of the same coin. In other words, they are **isomorphic**, and we write . Let us call this single structure .
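One way to test the claimed isomorphism is to write down an explicit map and check that it respects both operations. The map below, sending k to the k-th power of i, is an assumption chosen for illustration; Python’s `1j` stands for the imaginary unit.

```python
# Fourth roots of unity, listed so that roots[k] equals i**k.
roots = [1, 1j, -1, -1j]

def phi(k):
    """Candidate isomorphism from addition mod 4 to the fourth roots of unity."""
    return roots[k % 4]

# Adding first and then mapping must agree with mapping first and then multiplying.
preserves_operation = all(
    phi((a + b) % 4) == phi(a) * phi(b)
    for a in range(4) for b in range(4)
)
is_bijection = len(set(roots)) == 4
print(preserves_operation, is_bijection)  # True True
```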

But *why* are these examples of modular arithmetic and complex numbers equivalent?

One answer involves an appeal to **rotational symmetry**. Modular arithmetic is the mathematics of clocks: the hands of the clock rotating around in a circle. Likewise, if the reals are a number *line*, complex numbers are most naturally viewed as rotation on a number *plane*.

This rotation interpretation is not an accident. It helps us more easily spot other instances of . Consider, for instance, the following shape.

On this shape, the group of rotations that produce symmetry is . Inspection reveals that this, too, is isomorphic to !

Towards The Presentation Formalism

We describe as a **cyclic group**, for reasons that will become clear later.

Theorem 2. For every cyclic group , there exists some **generator** in its underlying set such that every other set element can be constructed by that generator.

Definition 3. When a generator has been identified, we can express a group’s underlying set with **generator-dependent names**. Two notations are commonly used in practice:

- In **multiplicative notation**, the elements are renamed , where r is any generator.
- Similarly, in **additive notation**, the elements become .

These two notation styles are interchangeable, and a matter of taste. In my experience, most mathematicians prefer multiplicative notation.

What generators exist in ? Let’s look at our three instantiations of this group:

- In modular arithmetic, you can recreate all numbers by . But you can also recreate them by .
- In complex numbers, you can visit all numbers by multiplying by , or multiplying by . Only fails to be a generator.
- In our rotation symmetry shape, two generators exist: clockwise rotation, and counterclockwise rotation.

For now, let’s rename all elements of to be .

Okay. But why is not a generator in ?

Theorem 4. For finite cyclic groups of order , each generator must be **coprime** to . That is, their greatest common divisor is 1.

- not a generator in because it is a divisor of .
- What are the generators in ? All non-identity elements: .
- What are the generators in ? Only 1 and 5: .
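The coprimality rule is easy to confirm by brute force. This sketch assumes the additive setting (integers 0 to n-1 under addition mod n) and compares a direct search for generators against the `gcd` test; the function name `generators` is hypothetical.

```python
from math import gcd

def generators(n):
    """Elements g whose multiples g, 2g, 3g, ... (mod n) reach every element."""
    found = []
    for g in range(n):
        reached = {(g * k) % n for k in range(n)}
        if len(reached) == n:
            found.append(g)
    return found

for n in (4, 5, 6):
    by_search = generators(n)
    by_gcd = [g for g in range(n) if gcd(g, n) == 1]
    print(n, by_search, by_search == by_gcd)
# 4 [1, 3] True
# 5 [1, 2, 3, 4] True
# 6 [1, 5] True
```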

We just spent a lot of words discussing generators. But why do they matter?

Generators are useful because they allow us to discover the “essence” of a group. For example, the Rubik’s cube has configurations. It would take a long time just writing down such a group. But it has only six generators (one for a rotation along each of its faces) which makes its presentation extremely simple.

Another way to think about it is, finding generators is a little bit like identifying a **basis** in linear algebra.

Towards Cayley Diagrams

Definition 5. We are used to specifying groups as set-operator pairs. A **presentation** is a generator-oriented way to specify the structure of a group. **Relators** are constraints that apply to the generators. A presentation is written

- In multiplicative notation: .
- In additive notation: .

The suffix is often left implicit from presentations (e.g., ) for the sake of concision.

Definition 6. A **Cayley diagram** is used to visualize the structure specified by the presentation. Arrow color represents the generator being followed.

Note that Cayley diagrams can be invariant to your particular choice of generator:

The shape of the Cayley diagram explains why is called a cyclic group, by the way!

With these tools in hand, let’s turn to more complex group structures.

Dihedral Groups

Cyclic groups have rotational symmetry. **Dihedral groups** have both rotational and reflectional symmetry. The dihedral group that describes the symmetries of a regular n-gon is written . Let us consider the “triangle group” , generated by a clockwise rotation and a horizontal flip .

With triangles, we know that three rotations return to the identity . Similarly, two flips return to the identity . Are there combinations of rotations *and* flips that are equivalent to one another? Yes. Consider the following equality:

Analogously, it is also true that .

Definition 7. Some collection of elements is a **generating set** if combinations amongst only those elements recreates the entire group.

Cyclic groups distinguish themselves by having only one element in their generating set. Dihedral groups require two generators.

We can write each dihedral group element based on how it was constructed by the generators:

Alternatively, we can instead just write the presentation of the group:

.

We can visualize this presentation directly, or as a more abstract Cayley graph:

The Cayley table for this dihedral group is:

This shows that is not abelian: its multiplication table is not symmetric about the diagonal.
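The relations of the triangle group can be verified by treating the rotation and the flip as permutations of the triangle’s corners. The corner labelling, the choice of which corner the flip fixes, and the particular mixed relation checked below are assumptions of this sketch, so it may not match the equality pictured above term for term.

```python
# Label the triangle's corners 0, 1, 2; p[i] is where corner i is sent.
e = (0, 1, 2)
r = (1, 2, 0)  # one rotation step
f = (0, 2, 1)  # a flip that fixes corner 0

def compose(a, b):
    """Apply a, then b."""
    return tuple(b[a[i]] for i in range(3))

r2 = compose(r, r)
assert compose(r2, r) == e               # three rotations return to the identity
assert compose(f, f) == e                # two flips return to the identity
assert compose(f, r) == compose(r2, f)   # "flip, rotate" equals "rotate twice, flip"
assert compose(r, f) != compose(f, r)    # the group is not abelian

# Every element can be named by how the generators build it.
elements = {e, r, r2, f, compose(r, f), compose(r2, f)}
print(len(elements))  # 6
```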

By looking at the color groupings, one might suspect it is possible to summarize this table with a table. We will explore this intuition further, when we discuss **quotients**.

Until next time.

Wrapping Up

Takeaways:

- Finite groups can be analyzed with Cayley tables (aka multiplication tables).
- The same group can have more than one set-operation expression (e.g., modular arithmetic vs. roots of unity vs. rotational symmetry).
- Generators, elements from which the rest of the set can be generated, are a useful way to think about groups.
- Group presentation is an alternate way of describing group structure. We can represent a presentation visually with the help of a Cayley diagram.
- Cyclic groups (e.g., ) have one generator; whereas dihedral groups (e.g., ) have two.

Related Resources

- This post is based on Professor Macaulay’s Visual Group Theory lectures, which in turn is based on Nathan Carter’s eponymous textbook.
- Related to this style of teaching group theory are Dana Ernst’s lecture notes.
- If you want to explore finite groups with software, Group Explorer is excellent.
- For a more traditional approach to the subject, these Harvard lectures are a good resource.

A Brief Prelude

Recall that a **set **is a collection of distinct objects, and a **function** is a mapping from the elements of one set to another. Further, in number theory we can express numbers as infinite sets:

- The natural numbers .
- The integers .
- The rational numbers .
- The real numbers .

The Axioms of Addition and Multiplication

In elementary school you learned that , for any two integers. In fact there exist five such axioms:

- **Closure**. .
- **Associativity**. .
- **Identity**. There exists an element such that, .
- **Inverse**. there exists an element such that .
- **Commutativity**. .

These axioms encapsulate all of integer addition. We can represent “integer addition” more formally as a set-operator pair:

Likewise, you have surely learned that . Multiplication too can be described with five axioms:

- **Closure**. .
- **Associativity**. .
- **Identity**. There exists an element such that, .
- **Inverse**. there exists an element such that .
- **Commutativity**. .

These axioms encapsulate all of integer multiplication. We can represent “integer multiplication” more formally as a set-operator pair:

Towards Algebraic Structure

Did the above section feel redundant? A lesson from software engineering: if you notice yourself copy-pasting, you should consolidate the logic into a single block of code.

Let’s build an abstraction that captures the commonalities above.

Definition 1. A **binary operation** is a function that takes two arguments. Since functions can only map between two sets, we write .

Examples of binary operations include . Note that is just shorthand for the more formal . Note that the operation symbol is just a name: we could just as easily rename the above function to be , as long as the underlying mapping doesn’t change.

Definition 2. Let **arity** denote the number of arguments to an operation. A binary operation has arity-2. A unary operation (e.g., ) has arity-1. A **finitary operation** has arity-n.

Definition 3. An **algebraic structure** is the conjunction of a set with some number of finitary operations, and may be subject to certain axioms. For each operation in an algebraic structure, the following axioms *may* apply:

- **Closure**. .
- **Associativity**. .
- **Identity**. There exists the element such that, .
- **Inverse**. there exists an element such that .
- **Commutativity**. .

Algebraic structures are a generalization of integer addition and integer multiplication. Our and tuples actually comprise parameters that specify an algebraic structure.

As soon as we define algebraic structures, we begin to recognize these objects strewn across the mathematical landscape. But before we begin, a word about axioms!

The Axiomatic Landscape

Consider algebraic structures that exhibit one binary operation. These structures may honor different combinations of axioms. We can classify these axiom-combinations. Here then, are five kinds of algebraic structures (“Abelian” means commutative):

Of course, more esoteric options are available, including:

Of all these structures, groups are the most well-studied. In fact, it is not uncommon to find people conflating groups with algebraic structures in general.

Definition 4. An algebraic structure is **group-like** if it contains one 2-ary operation. If it has more than one operation, or operation(s) with a different arity, it is not group-like.

All of our examples today count as group-like algebraic structures. There is also a large body of research studying algebraic structures with two operations, including **ring-**, **lattice-**, **module-**, and **algebra-like** structures. We will meet these structures another day.

Examples of Group-Like Structures

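We saw above that the integers under addition form an **abelian group**. The same holds for the rationals and the reals under addition. Multiplication needs more care: an integer such as 2 has no integer inverse, and 0 has no inverse at all, so only the nonzero rationals and the nonzero reals form abelian groups under multiplication.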

But addition and multiplication are not the only possible binary operations. What about subtraction ? Well, that is only a **magma**. Closure is satisfied, but all other axioms are violated (e.g., associativity and commutativity ). Likewise, the natural numbers under subtraction are not even a magma: .

All of our examples so far have groups encapsulating sets of numbers. But groups can contain sets of anything! Let’s switch to linear algebra. What about the set of all matrices under matrix multiplication?

- Does it have closure? Yes. Matrix multiplication yields another matrix.
- Does it have associativity? Yes. Matrix multiplication is associative.
- Does it have identity? Yes. The identity element is the matrix .
- Does it have inverse? No! Some matrices have determinants of 0. Thus, not all members of our set are invertible.

We can now identify this algebraic structure. The set of all matrices under matrix multiplication is a **monoid**.

But what if we limit our set to be all matrices with non-zero determinants? Well, *that* is a **group **(the inverse exists for all members). More formally, that set forms the basis of the **general linear group **. Why isn’t it abelian? Because matrix multiplication is not commutative.

These five examples provide a glimpse into the landscape of algebraic structures. Our recipe is simple:

Take any set and operation that you care about. Classify the resultant algebraic structure by examining which axioms hold.
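For finite sets, this recipe can be automated. The sketch below walks through the axioms in order and reports the first label that fits. The function name `classify` is hypothetical, and the label “semigroup” (closed and associative, but no identity) is an assumption about the lost classification figure.

```python
from itertools import product

def classify(elements, op):
    """Name the group-like structure formed by a finite set and a binary operation."""
    elements = list(elements)
    pairs = list(product(elements, repeat=2))
    if not all(op(a, b) in elements for a, b in pairs):
        return "not even a magma"
    if not all(op(op(a, b), c) == op(a, op(b, c))
               for a, b, c in product(elements, repeat=3)):
        return "magma"
    identities = [e for e in elements
                  if all(op(e, a) == a == op(a, e) for a in elements)]
    if not identities:
        return "semigroup"
    e = identities[0]
    if not all(any(op(a, b) == e == op(b, a) for b in elements) for a in elements):
        return "monoid"
    if all(op(a, b) == op(b, a) for a, b in pairs):
        return "abelian group"
    return "group"

Z4 = range(4)
print(classify(Z4, lambda a, b: (a + b) % 4))  # abelian group
print(classify(Z4, lambda a, b: (a * b) % 4))  # monoid
print(classify(Z4, lambda a, b: (a - b) % 4))  # magma
print(classify(Z4, lambda a, b: a + b))        # not even a magma
```

Infinite sets such as the integers cannot be checked this way; there the axioms have to be argued rather than enumerated.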

With these tools, we can begin to build a map of algebraic structures:

Takeaways

- Multiplication and addition share a remarkable number of properties, including closure, associativity, identity, inverse, and commutativity.
- An **algebraic structure** (set-operation pair) generalizes the similarities in the above examples.
- Algebraic structures can have more than one operation. Group-like structures are those with only one (binary) operation.
- Once you know about algebraic structures, you can find examples of them strewn across the mathematical landscape.

Until next time.

Degenerate Geometries

Two lines can either be parallel, or not. There exist unending variations of both situations. But which is more common, on the average?

Consider two lines and . When do we call these lines parallel? When their slopes are equal . We can gain insight into the situation by mapping **parameter space**, where and form the horizontal and vertical axes respectively.

The red line does not represent one line, but an infinite set of parallel lines where .

Suppose we start somewhere on the red line (with some pair of parallel lines). Perturbation of the slopes of these lines corresponds to a random walk beginning at that point. The more you walk around on the plane, the more likely you are to stand in green territory (slopes whose lines are not parallel).

Definition 1. A situation holds **generically** if, upon perturbing its constituent properties, the system tends to default to that situation.

This concept applies in many situations. For example:

- In two dimensions, two lines *might* be parallel, but generically intersect at a point.
- In three dimensions, two planes *might* be parallel, but generically intersect in a line.
- In linear algebra, a matrix *might* be singular if its determinant is equal to zero. But the determinant becomes non-zero on perturbation of individual matrix values. Thus, matrices are generically invertible.

This concept has also been called **general position**.

Manifold Intersection & Overflow

We now turn to the general science of intersection. The following observations hold generically:

- In two dimensions, a point and another point do not intersect.
- In two dimensions, a point and a line do not intersect.
- In two dimensions, a line and a line do intersect, at a point.
- In three dimensions, a line and another line do not intersect.
- In three dimensions, a line and a plane do intersect, at a point.
- In three dimensions, a plane and another plane do intersect, at a line.

Points, lines, planes… these are manifolds! Some of them infinitely large, but manifolds nonetheless! Let’s use the language of topology to look for patterns.

Each of the above examples contains two **submanifolds** and being placed in an **ambient manifold** . We denote their intersection as . Let us compare the dimensions of these three manifolds to the dimension of their overlap.

We represent as respectively. Now our examples can be expressed as 4-tuples :

In four dimensions, would we expect two planes to intersect; and if so, what would we expect the dimension of the intersection to be? Put differently, what would we predict to be the value of in ?

If you guessed , that the two planes intersect at a point, you have noticed the pattern!

Definition 2. Consider two submanifolds embedded in an ambient manifold . The **overflow** is defined as .

Theorem 3. Let . The following properties are true, generically:

- If , the submanifolds do not intersect:
- If , the intersection is non-empty , and
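The pattern can be checked against every example so far. This sketch assumes the overflow is the sum of the two submanifold dimensions minus the ambient dimension, which is the formula consistent with the cases listed earlier; the names `overflow` and `generic_intersection` are hypothetical.

```python
def overflow(k, l, m):
    """Assumed formula: dim K + dim L - dim M."""
    return k + l - m

def generic_intersection(k, l, m):
    """Dimension of the generic intersection, or None if the submanifolds generically miss."""
    o = overflow(k, l, m)
    return o if o >= 0 else None

# (dim K, dim L, dim M) for the cases listed earlier, plus two planes in four dimensions.
cases = {
    "point, point in 2D": (0, 0, 2),
    "point, line in 2D": (0, 1, 2),
    "line, line in 2D": (1, 1, 2),
    "line, line in 3D": (1, 1, 3),
    "line, plane in 3D": (1, 2, 3),
    "plane, plane in 3D": (2, 2, 3),
    "plane, plane in 4D": (2, 2, 4),
}
for name, dims in cases.items():
    print(name, generic_intersection(*dims))
```

A result of `None` means no generic intersection, `0` means a point, and `1` means a line.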

To see why this is the case, consider basis vectors in linear algebra. An m-dimensional space requires a basis of m vectors. Submanifold dimensions are then “placed within” the ambient basis. If we try to minimize the overlap between two submanifolds, the equation for overflow falls out of the picture.

Overflow during Motion

We have considered overflow in static submanifolds. But what if we move one of them?

The following observations hold generically:

- In two dimensions, a point moving past another point does not intersect it.
- In two dimensions, a point moving across a line intersects it, at a point.
- In two dimensions, a line moving across another line intersects it, across the entire line.
- In three dimensions, a line moving across another line intersects it, at a point.
- In three dimensions, a line moving across a plane intersects it, at a line.
- In three dimensions, a plane moving across another plane intersects it, across the entire plane.

Compare these “motion” tuples against our previous tuples:

- vs.
- vs.
- vs.
- vs.
- vs.
- vs.

The dynamic overflow is one dimension larger than the static overflow. Why?

The way I like to think about it is, by moving K across time, you are effectively enlisting an extra dimension (time). A moving point starts to look like a string, etc.

Theorem 4. Let . Suppose we want to move K from one side to the other side of L. This **crossing** is said to be topologically possible if and do not intersect at any point during the transition. The possibility of a crossing depends on the overflow :

- If , then generically, a crossing is possible (at all times, ).
- If , then generically, a crossing is not possible (at some time, and ).
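As a rough sketch of this theorem, we can treat the moving submanifold as gaining one dimension (time) and then ask whether the resulting overflow is still negative. Both the overflow formula (dim K + dim L - dim M) and the name `crossing_possible` are assumptions, inferred from the examples rather than taken from the theorem statement.

```python
def crossing_possible(k, l, m):
    """Generic crossing test: moving K sweeps out k + 1 dimensions, which must still miss L."""
    dynamic_overflow = (k + 1) + l - m
    return dynamic_overflow < 0

print(crossing_possible(0, 1, 2))  # False: a point cannot cross a line in the plane
print(crossing_possible(0, 1, 3))  # True: a point can hop over a line in three dimensions
print(crossing_possible(1, 1, 3))  # False: two strings cannot pass through each other in 3D
print(crossing_possible(1, 1, 4))  # True: in four dimensions they can
```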

Towards Isotopy

Let’s put this theorem to work.

Example 5. In what is the following movement possible?

We know from physical experience that this is not possible in our three-dimensional universe . But Theorem 4 says that, if the ambient dimension is four, then the overflow is , so crossing is possible.

Intuitively, this makes sense. In three dimensions, a point can “hop” over a line by moving into the “extra” dimension. Similarly, the line can cross over by moving into the fourth dimension.

We can generalize this notion of successful crossing as follows:

Definition 6. Imagine moving submanifold through an ambient manifold . Let and represent the beginning and end positions as it travels throughout time . If never has self-intersection at any time , we say is **isotopic **to in , and write

Isotopy is especially relevant to **knot theory**. A classic example is the **trefoil knot**, a simple kind of knot. The trefoil knot, composed of 1-dimensional string, is not isotopic to the **unknot** (1-torus) in : this is why it is called **non-trivial**.

Example 7. Show how the trefoil knot can isotope to the unknot.

- From to : we can unwind the trefoil knot in : simply lift the top-left string up.
- From to : you only need to finish the job. Simply pull the top loop down, and then untwist.

So knot theory is only interesting in ! In , knots self-intersect. In , they are all isotopic to the unknot.

More Isotopy Examples

Let’s get some more practice under our belt.

Example 8. Consider a genus-two torus with its holes intertwined. What ambient manifold do we require to undo the knot?

We might imagine the solution isotope would require us to pull apart the “hands” directly. And it is true that we can successfully isotope in ; after all .

But do we really need six dimensions to get this done? Or can we do better? It turns out that we only really need :

As the number of ambient dimensions increases, finding an isotope becomes increasingly easy. Thus, we often strive to find the *smallest possible* ambient manifold.

Example 9. Consider a genus-3 torus (). If we isotope along its surface using , is ? How about ?

It is easy to see that . You just pull the circle left, along the surface of the torus.

But it takes some time to see that . You might suspect you can just pull the blue circle over the middle hole. But that would require leaving the surface . Thus (but ).

Related Materials

This post is based on Dr. Tadashi Tokieda’s excellent lecture series, Topology & Geometry. For more details, check it out!

Fundamentals of Sets

Definition 1. A **set **is a collection of distinct objects.

A few examples to kick us off:

- represents the set of fruits which I prefer.
- can represent, among other things, the fingers on my left hand.
- The set of natural numbers .
- The set of integers .

Sets are subject to the following properties:

- Order blindness. and express the very same set.
- Duplicate blindness. . We will prefer to express sets with the latter, more compact, notation.
- Recursion. is a perfectly valid two-element set, quite distinct from the three-element set .

Definition 2. Two sets A and B are said to be **equal **() if A and B contain exactly the same elements.

- Let and . Then, .
- Let and . Then, .

Definition 3. If an object x is an **element of **(i.e., member of) set S, we write . Otherwise, we write .

- Let . Then means “yellow is an element of the set of primary colors”.
- means “-1 is not an element of the natural numbers”.
- . The element is not in : only the set is.

Definition 4. For some set , its **cardinality **(i.e., size) , is the number of elements in that set.

- Let . Then .
- Let . Then . Note that cardinality only looks at “the outer layer”.

Definition 5. The **empty set** (i.e., the null set) is the set containing no elements.

- .
- .
- .

Definition 6. Instead of listing all elements, **set builder notation** specifies rules or properties that govern the construction of a set. The notation takes the following form: , where the | symbol is pronounced “such that”.

- . In words, “let A be the set of integers x such that x is greater than zero and less than six.”
- The set of rational numbers .

Defining a set by its properties is often called **set comprehension**. Such properties or rules are called the **intension**; the resultant set is called the **extension**.

- Let and let . Here , despite their use of different rules. We say that and have the same extension, but different intensions.
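Python’s set comprehensions mirror set builder notation closely, which makes the intension/extension distinction easy to see. In this sketch the two rules are hypothetical examples, and a finite `window` stands in for the integers, since a program cannot enumerate all of them.

```python
window = range(-10, 11)  # a finite stand-in for the integers

# Two different intensions (rules)...
A = {x for x in window if 0 < x < 6}
B = {x for x in window if 1 <= x <= 5}

# ...that pick out the same extension (resulting set).
print(sorted(A))  # [1, 2, 3, 4, 5]
print(A == B)     # True
```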

The **intension-extension tradeoff** denotes an inverse relationship between the number of intensional rules versus the size of the set those rules pick out. Let’s consider two examples to motivate this tradeoff.

Consider **hierarchical addressing** in computer architecture. Suppose we have $2^6 = 64$ bits of computer memory, each bit of which is uniquely identified with a 6-bit address. Suppose further that our memory has been allocated to data structures of varying size. To promote addressing efficiency, a computer can adopt the following strategy: *assign shorter addresses to larger variables.*

We can also see the intension vs. extension tradeoff in the memory systems of the brain. Specifically, semantic memory is organized into a **concept hierarchy**. We might classify a Valentine’s day gift according to the following tree:

The number of objects classified as a RED_ROSE is clearly less than the number of objects classified as LIVING_THING. But as our extensional size decreases, the size of our intension (the number of properties needed to classify a RED_ROSE) increases.

Subsets and Power set

Definition 7. A set is a **subset **of another set , written , if every element of is also an element of .

- Let and . Then (recall that element order is irrelevant).
- Let and . Then (C does not contain 9).
- Is a subset of ? Yes. For all , is true.

Definition 8. A set is a **proper subset** of another set , written , if and .

Definition 9. For a given set , its **power set** is the set of all subsets of .

- Let . Then .

A power set can be constructed by the use of a binary tree, as follows:

As can be seen above, the total number of subsets must be a power of two. Specifically, if , then .
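A power set is simple to construct in code, which also gives a quick check of the power-of-two count. This is a minimal sketch using `itertools.combinations`; the three-element set is a hypothetical example, and subsets are stored as `frozenset` because Python sets cannot contain ordinary (mutable) sets.

```python
from itertools import combinations

def power_set(s):
    """Return the set of all subsets of s, each as a frozenset."""
    items = list(s)
    return {
        frozenset(combo)
        for size in range(len(items) + 1)
        for combo in combinations(items, size)
    }

A = {"a", "b", "c"}  # hypothetical three-element set
P = power_set(A)
print(len(P) == 2 ** len(A))  # True

# Element-of versus subset-of:
print(frozenset({"a"}) in P)  # True: {a} is an element of the power set
print("a" in P)               # False: a itself is not
print({"a"} <= A)             # True: {a} is a subset of A
```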

It is important to get clear on the differences between the element-of ( ) versus subset-of ( ) relations. Consider again and its power set

- . But . The relation requires the brackets match.
- . But . The relation requires the “extra bracket”.
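The same distinction shows up in Python, where `in` tests membership and `<=` tests the subset relation. Below is a small sketch with an illustrative two-element set of my own choosing and its power set written out by hand.

```python
# Illustrative set and its power set, written out explicitly.
A = frozenset({1, 2})
P = {frozenset(), frozenset({1}), frozenset({2}), frozenset({1, 2})}

print(1 in A)                 # 1 is an element of A
print(frozenset({1}) <= A)    # {1} is a subset of A
print(frozenset({1}) in P)    # {1} is an element of the power set
print({frozenset({1})} <= P)  # {{1}} is a subset of the power set
print(1 in P)                 # but 1 itself is not an element of the power set
```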

Probability theory is intimately connected with the notion of power set. Why? Because many discrete probability distributions have $\sigma$-algebras drawn from the power set of the natural numbers, $\mathcal{P}(\mathbb{N})$.

Cartesian Product, Tuples

Definition 10. Given two sets $A$ and $B$, their **Cartesian product** $A \times B$ is the set of all **ordered pairs** $(a, b)$ such that $a \in A$ and $b \in B$. Note that, unlike the elements in a set, the elements of an ordered pair cannot be reordered.

- Let and . Then .

We can represent this same example visually, as follows:

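- Contrast this with $B \times A$, whose pairs list the element of $B$ first. Thus, $A \times B \neq B \times A$. This is because elements within ordered pairs cannot be rearranged.
- Note that the cardinality of a Cartesian product is the product of the cardinalities of its factors: $|A \times B| = |A| \cdot |B|$. In combinatorics, this observation generalizes to the **multiplication principle**.
- The **real plane** $\mathbb{R}^2 = \mathbb{R} \times \mathbb{R}$ is a well-known example of a Cartesian product.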
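These properties are easy to check with `itertools.product` from Python's standard library, which yields ordered pairs as tuples. The two sets below are illustrative values of my own, not the ones from the example.

```python
from itertools import product

# Illustrative sets, chosen only for this sketch.
A = {1, 2}
B = {"x", "y", "z"}

AxB = set(product(A, B))  # ordered pairs (a, b)
BxA = set(product(B, A))  # ordered pairs (b, a)

print(len(AxB) == len(A) * len(B))       # cardinalities multiply
print(AxB == BxA)                        # order inside a pair matters
print((1, "x") in AxB, ("x", 1) in AxB)  # only the first pair is in A x B
```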

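Definition 11. Given n sets, $A_1, A_2, \ldots, A_n$, their Cartesian product $A_1 \times A_2 \times \cdots \times A_n$ is the set of all n-tuples $(a_1, a_2, \ldots, a_n)$ such that each $a_i \in A_i$.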

- Let , and . Now .
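The n-ary case works the same way in code. In the sketch below (three illustrative sets of my own), `itertools.product` accepts any number of sets and returns n-tuples, and the size of the result is the product of the individual sizes.

```python
from itertools import product
from math import prod

# Three illustrative sets; any number of sets would work the same way.
sets = [{0, 1}, {"a", "b", "c"}, {"x", "y"}]

triples = set(product(*sets))

print(len(triples) == prod(len(s) for s in sets))  # 2 * 3 * 2 tuples
print(all(len(t) == 3 for t in triples))           # every element is a 3-tuple
```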

Linear algebra is intimately connected with the Cartesian product operation. Why? Because n-tuples are strongly related to n-dimensional vectors.

Intersection and Union

Definition 12. The **intersection** of two sets $A$ and $B$, written $A \cap B$, is the set of elements common to both sets.

- Let and . Then .
- Let and . Then .
- Let . Then .

Definition 13. The **union** of two sets $A$ and $B$, written $A \cup B$, is the set of all elements that are in $A$ or $B$, or both.

- Let and . Then .
- Let . Then .

**Venn diagrams** represent sets as enclosed areas in a 2D plane. If two (or more!) sets have shared elements, their areas overlap. We can use this technique to visualize sets and their overlap:

We can also use Venn diagrams to represent our intersection and union relations:

Note that $|A \cup B| = |A| + |B| - |A \cap B|$. This makes sense in light of the Venn diagram. Adding the cardinality of both sets counts the elements that exist in the middle section *twice*. To avoid this, we subtract the cardinality of the intersection. In combinatorics, this formula is generalized by the **inclusion-exclusion principle**.
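A quick way to convince yourself of this counting argument is to try it on concrete sets. Here is a minimal sketch using Python's `&` (intersection) and `|` (union) operators; the sets are illustrative values of my own.

```python
# Illustrative sets with two shared elements.
A = {1, 2, 3, 4}
B = {3, 4, 5}

intersection = A & B  # elements common to both sets
union = A | B         # elements in A or B, or both

# Adding the sizes counts the shared elements twice, so subtract them once.
print(len(union) == len(A) + len(B) - len(intersection))
```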

Difference and Complement

Definition 14. Given two sets $A$ and $B$, their **difference** $A \setminus B$ is the set of elements in $A$ but not also in $B$.

- Let and . Then and .
- Let and . Then and .

Definition 15. Given two sets $A$ and $B$, their **symmetric difference** is the set of elements in $A$ or $B$, but not both.

- Let and . Then .
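Python writes set difference as `-` and symmetric difference as `^`. The sketch below, on illustrative sets of my own, also checks two equivalent ways of building the symmetric difference from the other operations.

```python
# Illustrative sets, not the ones from the examples above.
A = {1, 2, 3, 4}
B = {3, 4, 5}

print(A - B)  # in A but not in B
print(B - A)  # in B but not in A: difference is not symmetric
print(A ^ B)  # in exactly one of the two sets

# Two equivalent constructions of the symmetric difference.
print((A ^ B) == (A - B) | (B - A))
print((A ^ B) == (A | B) - (A & B))
```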

Definition 16. In many set problems, all sets are defined as subsets of some reference set. This reference set is called the **universe** $U$.

- Let and let its universe be the set of complex numbers . It is true that .

Definition 17. Relative to a universe $U$, the **complement** of $A$, written $\overline{A}$, is the set of all elements of $U$ not contained in $A$.

- Let U be the set of positive integers less than 10: and . Then .
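Python has no built-in complement operator, since a complement only makes sense once a universe is fixed. A minimal sketch is to subtract from the universe explicitly. The universe below follows the example (positive integers less than 10); the set `A` is an illustrative choice of my own.

```python
# Universe: the positive integers less than 10.
U = set(range(1, 10))

# Illustrative subset of U (an assumption, not the set from the example).
A = {2, 4, 6, 8}

complement = U - A  # everything in the universe that is not in A

print(sorted(complement))
print(U - complement == A)  # complementing twice gives A back
```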

We can again represent these relations graphically, via Venn diagrams:

Takeaways

Let me summarize this post in terms of our 17 definitions:

- Def 1-5 introduced the notion of **set**, set **equality**, the **element-of** operator, cardinality (set size), and **empty set**.
- Def 6 introduced **set builder notation**, and the **intension-extension tradeoff**.
- Def 7-9 introduced **subset**, **proper subset**, and **power set**.
- Def 10-11 introduced **Cartesian product**, **ordered pairs**, and **n-tuples**.
- Def 12-13 introduced **intersection** (“and”) and **union** (“or”), as well as Venn diagrams.
- Def 14-17 introduced **difference**, **symmetric difference** (“xor”), and **complement** (“not”).

Want to learn more? I recommend the following resources:

- Articles: A Little Set Theory (Never Hurt Anybody), Sets, Functions, Relations
- Videos: The Trev Tutor, Discrete Math 1 series

This introductory article focused on promoting intuitions through worked examples. Next time, we’ll look at these same operations more carefully, and examine the relationship between set theory and classical predicate logic.

]]>