Prediction Functions: Regression vs Classification

Part Of: Principles of Machine Learning sequence
Content Summary: 500 words, 5 min read


Data scientists are in the business of answering questions with data. To do this, data is fed into prediction functions, which learn from the data, and use this knowledge to produce inferences.

Today we take an intuitive, non-mathematical look at two genres of prediction machine: regression and classification. Although these approaches may seem unrelated, we shall discover a deep symmetry lurking below the surface.

Introducing Regression

Consider a supermarket that has made five purchases of sugar from its supplier in the past. We therefore have access to five data points:

Regression Classification- Regression Data (3)

One of our competitors intends to buy 40kg of sugar. Can we predict the price they will pay?

This question can be interpreted visually as follows:

Regression Classification- Regression Prediction Visualization

But there is another, more systematic way to interpret this request. We can differentiate training data (the five observations where we know the answer) versus test data (where we are given a subset of the relevant information, and asked to generate the rest):

Regression Classification- Regression Prediction Schema

A regression prediction machine will, for any hypothetical x-value, predict the corresponding y-value. Sound familiar? This is just a function. There are in fact many possible regression functions, of varying complexity:

Regression Classification- Simple vs Complex Regression Outputs

Despite their simple appearance, each line represents a complete prediction machine. Each one can, for any order size, generate a corresponding prediction of the price of sugar.
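
To make this concrete, here is a minimal sketch of one such prediction machine: a straight line fit by ordinary least squares. The five (kg, price) pairs are invented for illustration, since the real observations live only in the figure above, and `fit_line` and `predict` are hypothetical names.

```python
from statistics import mean

# Hypothetical training data: (order size in kg, price paid in dollars).
# These numbers are made up; the article's real observations are in its figure.
orders = [(10, 22.0), (20, 41.0), (30, 63.0), (50, 99.0), (60, 121.0)]

def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in points) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept

slope, intercept = fit_line(orders)

def predict(order_kg):
    """The prediction machine: any order size in, a predicted price out."""
    return slope * order_kg + intercept

print(f"Predicted price for 40 kg: ${predict(40):.2f}")
```

A more complex regression function would swap out `fit_line`, but `predict` would keep the same shape: one x-value in, one y-value out.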

Introducing Classification

To illustrate classification, consider another example.

Suppose we are an animal shelter, responsible for rescuing stray dogs and cats. We have saved two hundred animals; for each, we record their height, weight, and species:

Regression Classification- Classification Data (1)

Suppose we are left a note that reads as follows:

I will be dropping off a stray tomorrow that is 19 lbs and about a foot tall.

A classification question might be: is this animal more likely to be a dog or a cat?

Visually, we can interpret the challenge as follows:

Regression Classification- Classification Prediction Interpretation (2)

As before, we can understand this prediction problem as taking information gained from training data to generate the “missing” factors from test data:

Regression Classification- Classification Prediction Schema

To actually build a classification machine, we must specify a region-color map, such as the following:

Regression Classification- Simple Classification Output


Indeed, the above solution is complete: we can produce a color (species) label for any new observation, based on whether it lies above or below our line. 
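
A region-color map of this kind can be sketched as a small function. The slope and intercept below are invented, and so is the choice of which species sits above the line; the real values would come from the training data shown in the figures.

```python
# Hypothetical linear boundary in (weight, height) space. The slope and
# intercept are invented; the article's actual line exists only in its figure.
BOUNDARY_SLOPE = 0.5       # inches of height per pound (assumed)
BOUNDARY_INTERCEPT = 4.0   # inches (assumed)

def classify(weight_lbs, height_in):
    """Label a new animal by which side of the line it falls on.

    Assumption: points above the line are labeled 'dog' and points on or
    below it 'cat'; the real assignment depends on the training data.
    """
    boundary_height = BOUNDARY_SLOPE * weight_lbs + BOUNDARY_INTERCEPT
    return "dog" if height_in > boundary_height else "cat"

# The stray from the note: 19 lbs and about a foot (12 inches) tall.
print(classify(19, 12))
```

Whatever the true line turns out to be, the machine is complete in the same sense: every possible (weight, height) pair receives a label.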

But other solutions exist. Consider, for example, a rather different kind of map:

Regression Classification- Complex Classification Boundary

We could use either map to generate predictions. Which one is better? We will explore such questions next time.

Comparing Prediction Methods

Let’s compare our classification and regression models. In what sense are they the same?

Regression Classification- Comparing Models

If you’re like me, it is hard to identify similarities. But insight is obtained when you compare the underlying schemas:

Regression Classification- Comparing Schema

Here we see that our regression example was 2D, but our classification example was 3D. It would be easier to compare these models if we removed a dimension from the classification example.

Regression Classification- 1D Classification (1)

With this simplification, we can directly compare regression and classification:

Regression Classification- More Direct Comparison

Thus, the only real difference between regression and classification is whether the prediction (the dependent variable) is continuous or discrete.
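
One way to see the symmetry is to write both machines with the same one-input signature and compare only their return types. Both functions below are hypothetical stand-ins with invented parameters, not fitted models.

```python
from typing import Callable

# Both machines share one schema: a known input in, a predicted output out.
Regressor = Callable[[float], float]   # continuous prediction
Classifier = Callable[[float], str]    # discrete prediction

def price_regressor(order_kg: float) -> float:
    """Hypothetical regression function: the output can be any real number."""
    return 2.0 * order_kg + 1.0

def species_classifier(weight_lbs: float) -> str:
    """Hypothetical 1D classifier: the output is one of a fixed set of labels."""
    return "dog" if weight_lbs > 15.0 else "cat"

regress: Regressor = price_regressor
label: Classifier = species_classifier

print(regress(40.0), label(19.0))
```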


Until next time.

Kagan: Structure of Ethical Theories

Part Of: Demystifying Ethics sequence
Followup To: An Introduction To Ethical Theories
See Also: Shelly Kagan (1994). The Structure of Normative Ethics
Content Summary: 700 words, 7 min read

Are Ethical Theories Incompatible?

Last time, we introduced five major ethical theories:

Ethical Theories- Summary

At first glance, we might consider these theories as rivals competing for the status of a ground for morality. However, when discussing these theories, one has a distinct sense that they are simply addressing different concerns.

Perhaps these theories are compatible with one another. But it is hard to see how, because we lack a map of the major conceptual regions of normative ethics, and how they relate to one another.

Let’s try to construct such a map.

Identifying Morally Relevant Factors

There are two major activities in the philosophical discourse about normative ethics: factorial analysis, and foundational theories.

Factorial analysis involves getting clear on which variables affect our moral judgments. This is the goal of moral thought experiments. By constructing maps from situations to moral judgment, we seek to understand situational factors that contribute to (and compete for control over) our final moral appraisals.

We can discern four categories of factors which bear on moral judgments: Goodness of Outcome, General Constraints, Special Obligations, and Options. We might call these categories factorial genres. Here are some example factors from each genre.

Ethics Taxonomy- Morally Relevant Factors (1)

While conducting factorial analysis, we typically ask questions about:

  1. Relative Strength. Does Don’t Harm always outweigh factors related to outcome?
  2. Explanatory Parsimony. Is Keep Your Promises redundant with Don’t Be Unfair?
  3. Subfactor Elaboration. What does Maximize Overall Happiness mean, exactly?

Constructing Foundational Theories

A foundational mechanism is a conceptual apparatus designed to generate the right set of morally relevant factors. An example of such a mechanism is contractarianism, which roughly states that:

Morally relevant factors are those which would be agreed to by a social community, if they were placed in an Original Position (imagine you are designing a social community from scratch), and subject to the Veil of Ignorance (you don’t know the details of what your particular role will be).

Thus, our two philosophic activities relate as follows:

Ethical Structure- Factors vs Foundations (2)

These two activities are fueled by different sets of intuitions.

  • Factorial intuitions are identified by appeal to concrete ethical dilemmas.
  • Foundational intuitions are often related to one’s metaethical dispositions.

Let us examine other accounts of foundational mechanisms. These claim that we should accept only morally relevant factors that…

  • …if everyone followed such rules, total well-being would be maximized (rule utilitarianism).
  • … if the factor were universalized (became like a law of nature), no contradictions would emerge (Kantian universalization).
  • … can be attributed to a being acting purely in self-interest (egoism).

Localizing Ethical Theories in our Map

We can now use this scheme to better understand the space of ethical theories.

Proposition 1. Ethical theories can be decomposed into their foundational and factorial components.

Three of our five ethical theories have the following decomposition:

Ethical Structure- Deconstructing Ethical Theories (1)

Proposition 2. Factorial pluralism is compatible with foundational monism.

Certain flavors of consequentialists and deontologists insist on factorial monism: the view that only one kind of moral factor really matters.

But as a descriptive matter, it seems that human morality is sensitive to many different kinds of factors. Outcome valence, action constraints, and role-based obligations all seem to play a role in real moral decisions.

Factorial monism has the unpleasant implication of dismissing some of these factors as misguided. But philosophers are perfectly free to affirm factorial pluralism: that each intuition “genre” is prescriptively justified.

Some examples of one foundational device generating a plurality of genres:

  1. Rule Utilitarianism (rules that maximize societal well-being) could easily generate rules to keep one’s promises.
  2. Kantian Universalization might generate outcome-sensitive moral factors that are immune to contradiction.
  3. People in the Original Position might enter into a contract of general constraints (e.g., human rights).


Are ethical theories truly competitors? One might suspect that the answer is no. Ethical theories seem to address different concerns.

We can give flesh to this intuition by analyzing the structure of ethical theories. They can be decomposed into two parts: factorial analysis, and foundational mechanisms.

  • Factorial analysis provides the list of factors relevant to moral judgments.
  • Foundational mechanisms are hypothesized to generate these moral factors.

Ethical Structure- Factors vs Foundations (2)

Most defenses of foundational mechanisms have them generating a single factorial genre. However, it is possible to endorse factorial pluralism. There is nothing incoherent in the view that e.g., both event outcome and general constraints bear on morality.

This taxonomy allows us to contrast ethical theories in a new way. Utilitarianism can be seen as a theory about the normative factors, whereas contractarianism is a foundational mechanism. Far from being rival views, one could in fact endorse both!

The Finite Price of Human Life [Excerpt]

Content Summary: 1400 words, 7 min read.
Original Author: Scott Alexander

Price of Life

Recently on both sides of the health care debate I have been hearing people make a very dangerous error. They point to a situation in which someone was denied coverage for a certain treatment because it was expensive and unproven, and say: “This is an outrage! We can’t let ‘death panels’ say some lives aren’t worth saving! How can people say money is more important than a human life? We have a moral duty to pay for any treatment, no matter how expensive, no matter how hopeless the case, if there is even the tiniest chance that it will help this poor person.”

All of these are simple errors. Contrary to popular belief, you can put a dollar value on human life. That dollar value is $5.8 million. Denying this leads to terrible consequences.

Let me explain.

On The Risks of Dying

Consider the following:

A man has a machine with a button on it. If you press the button, there is a one in five million chance that you will die immediately; otherwise, nothing happens. He offers you some money to press the button once. What do you do? Do you refuse to press it for any amount? If not, how much money would convince you to press the button?

What do you think?

If you answered something like “Never for any amount of money,” or “Only for a million dollars”, you’re not thinking clearly.

One in five million is pretty much your chance of dying from a car accident every five minutes that you’re driving. Choosing to drive for five minutes is exactly equivalent to choosing to press the man’s button. If you said you wouldn’t press the button for fifty thousand dollars, then in theory if someone living five minutes away offers to give you fifty thousand dollars no strings attached, you should refuse the offer because you’re too afraid to drive to their house.

Likewise, if you drive five minutes to a store to buy a product, instead of ordering the same product on the Internet for the same price plus $5 shipping and handling, then you should be willing to press the man’s button for $5.

When I asked this question to several friends, about two-thirds of them said they’d never press the button. This tells me people are fundamentally confused when they consider the value of life. When asked directly how much value they place on life, they always say it’s infinite. But people’s actions show that in reality they place a limited value on their life; enough that they’re willing to accept a small but real chance of death to save five bucks. And as we will see, that is a very, very good thing.
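
The arithmetic behind the button can be sketched under one assumption that the text does not state: that a person is risk-neutral, so a gamble is worth its expected value. The one-in-five-million risk, the $5 shipping fee, and the $5.8 million figure come from the text; the function names are hypothetical.

```python
from fractions import Fraction

DEATH_RISK = Fraction(1, 5_000_000)   # chance of dying per button press (from the text)

def break_even_payment(value_of_life):
    """Payment at which a risk-neutral person is indifferent to pressing once."""
    return DEATH_RISK * value_of_life

def implied_value_ceiling(accepted_payment):
    """Largest life valuation consistent with accepting this payment."""
    return accepted_payment / DEATH_RISK

print(float(break_even_payment(5_800_000)))   # dollars per press at a $5.8M valuation
print(int(implied_value_ceiling(5)))          # ceiling implied by driving to save $5
```

Under that assumption, a $5.8 million life is consistent with pressing the button for a little over a dollar, and driving to avoid a $5 shipping fee implies a valuation of no more than $25 million. Either way the number is finite.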

Insurance Example: Fixed Costs

Consider the following:

Imagine an insurance company with one hundred customers, each of whom pays $1. This insurance company wants 10% profit, so it has $90 to spend. Seven people on the company’s plan are sick, with seven different diseases, each of which is fatal. Each disease has a cure. The cures cost, in order, $90, $50, $40, $20, $15, $10, and $5.

We have decided to give everyone every possible treatment. So when the first person, the one with the $90 disease, comes to us, we gladly spend $90 on their treatment; it would be inhuman to just turn them away. Now we have no money left for anyone else. Six out of seven people die.

The fault here isn’t with the insurance company wanting to make a profit. Even if the insurance company gave up its ten percent profit, it would only have $10 more; enough to save the person with the $10 disease, but five out of seven would still die.

A better tactic would be to turn down the person with the $90 disease. Instead, treat the people with $5, $10, $15, $20, and $40 diseases. You still use only $90, but only two out of seven die. By refusing treatment to the $90 case, you save four lives. This solution can be described as more cost-effective; by spending the same amount of money, you save more people. Even though “cost-effectiveness” is derided in the media as being opposed to the goal of saving lives, it’s actually all about saving lives.

If you don’t know how many people will get sick next year with what diseases, but you assume it will be pretty close to the number of people who get sick this year, you might make a rule for next year: Treat everyone with diseases that cost $40 or less, but refuse treatment to anyone with diseases that cost $50 or more.
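
The comparison between the two tactics, and the cutoff rule that falls out of it, can be sketched in a few lines. The budget and cure costs come from the example; `treat_in_order` is a hypothetical helper, and the sketch assumes one patient per disease.

```python
BUDGET = 90                               # $100 in premiums minus 10% profit
CURE_COSTS = [90, 50, 40, 20, 15, 10, 5]  # one patient per disease, in arrival order

def treat_in_order(costs, budget):
    """Pay for each patient in the given order, skipping any the budget cannot cover."""
    treated = []
    for cost in costs:
        if cost <= budget:
            treated.append(cost)
            budget -= cost
    return treated

first_come = treat_in_order(CURE_COSTS, BUDGET)              # arrival order
cheapest_first = treat_in_order(sorted(CURE_COSTS), BUDGET)  # cost-effective order

print(len(first_come), "saved when patients are treated as they arrive")
print(len(cheapest_first), "saved when the cheapest cures are funded first")

# The rule for next year: the most expensive cure that was funded becomes the cutoff.
cutoff = max(cheapest_first)
print(f"Treat diseases costing ${cutoff} or less")
```

Funding the cheapest cures first maximizes the number of people saved here because every cure saves exactly one life; with unequal benefits per cure, the ordering would need to account for benefit as well as cost.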

Insurance Example: Probabilistic Costs

A similar argument applies to medical decisions that involve risk. Consider:

You have $900. There are four different fatal diseases: A, B, C, and D. There are 40 patients, ten with each disease. Each disease costs $300 to cure in all ten of its patients.

In this case, your only option is to cure A, B, and C… and tell patients with D that unfortunately there’s not enough left over for them.

But what if the cure for A only had a 10% chance of working? In this case, you cure A, B, and C and have, on average, 21 people left alive.

Or you could tell A that you can’t approve the treatment because it’s not proven to work. Now you use your $900 to treat B, C, and D instead, and you have on average 30 people left alive. By denying someone an unproven treatment, you’ve saved 9 lives.
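
The expected-survivor counts can be checked with a short sketch. The budget, costs, and A's 10% success rate come from the example; the assumption that the cures for B, C, and D always work is implied by the totals above rather than stated outright.

```python
from fractions import Fraction

PATIENTS_PER_DISEASE = 10
COST_PER_DISEASE = 300
BUDGET = 900

# Probability that each cure works. Only A's 10% comes from the what-if;
# B, C and D are assumed to always work, as the totals in the text imply.
success = {"A": Fraction(1, 10), "B": Fraction(1), "C": Fraction(1), "D": Fraction(1)}

def expected_survivors(funded):
    """Average number of patients left alive if these diseases are funded."""
    assert len(funded) * COST_PER_DISEASE <= BUDGET, "over budget"
    return sum(PATIENTS_PER_DISEASE * success[d] for d in funded)

fund_abc = expected_survivors(["A", "B", "C"])
fund_bcd = expected_survivors(["B", "C", "D"])
print(fund_abc, fund_bcd, fund_bcd - fund_abc)
```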

Computing the Value of a Life

So, in the real world, how should we decide how much money is a good amount to spend on someone?

I mentioned before that people don’t act as if the lives of themselves or others are infinitely valuable. They act as if they have a well-defined price tag. Well, some enterprising economists have figured out exactly what that price tag is. They made their calculations by examining, for example, how much extra you have to pay someone to take a dangerous job, or how much people who are spending their own money are willing to spend on unproven hopeless treatments. They determined that most people act as if their lives were worth, on average, 5.8 million dollars.

Most health care, government or private, uses a similar calculation. One common practice is to value an extra year of healthy life at $50,000. So:

  1. If a treatment costs $60,000 and will only let you live another year, they’ll reject it.
  2. If a treatment costs $600,000 and will let you live 20 more years, then since 600000/20 = 30,000 which is < 50,000, they’ll approve it.
  3. If a treatment costs $15,000 and has only a one in ten chance of letting you live another two years, then since [(15000)/(1/10)]/2 = 75,000 which is > 50,000, they’ll reject it.
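
The three cases follow one formula: cost divided by the expected number of healthy years gained. Here is a minimal sketch of it; the function names are hypothetical, and treating a ratio exactly at the threshold as approved is an assumption the text does not settle.

```python
from fractions import Fraction

THRESHOLD = 50_000  # dollars per extra year of healthy life (from the text)

def cost_per_expected_year(cost, years_gained, chance_of_working=Fraction(1)):
    """Treatment cost divided by the expected number of healthy years it buys."""
    return Fraction(cost) / (chance_of_working * years_gained)

def approve(cost, years_gained, chance_of_working=Fraction(1)):
    """Approve when the cost per expected year is at or under the threshold (assumed)."""
    return cost_per_expected_year(cost, years_gained, chance_of_working) <= THRESHOLD

cases = [
    (60_000, 1, Fraction(1)),
    (600_000, 20, Fraction(1)),
    (15_000, 2, Fraction(1, 10)),
]
for cost, years, chance in cases:
    ratio = cost_per_expected_year(cost, years, chance)
    verdict = "approve" if approve(cost, years, chance) else "reject"
    print(f"${cost:,}: ${int(ratio):,} per year -> {verdict}")
```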

I’m not claiming I have any of the answers to this health care thing. I’m not claiming that $50,000 is or isn’t a good number to value a year of life at. I’m not saying that government health care couldn’t become much more efficient and save lots of money, or that private health care couldn’t come up with a better incentive system that makes denying treatments less common and less traumatizing. I’m not saying that insurance companies don’t make huge and stupid mistakes when performing this type of analysis, or even that they aren’t the slime of the earth. I’m not saying the insurance system is currently fair to the poor, whatever that means. I’m not saying that there aren’t many many variables not considered in this simplistic analysis, or anything of that sort.

I am saying that if you demand that you “not be treated as a number” or that your insurance “never deny anyone treatment as long as there’s some chance it could help”, or that health care be “taken out of the hands of bureaucrats and economists”, then you will reap what you have sown: worse care and a greater chance of dying of disease, plus the certainty that you have inflicted the same on many others.

I’m also saying that this is a good example of why poorly informed people who immediately get indignant at anything packaged by the media as being “outrageous”, even when their “hearts are in the right places”, end up poisoning a complicated issue and making it harder for responsible people to make any progress.

ERTAS: The Engine of Consciousness

Part Of: Demystifying Consciousness sequence
Content Summary: 800 words, 8 min read

Existential Mode Generators

In Why We Sleep, we discussed sleep architecture diagrams. These diagrams show clear electrical differences between three existential modes: NREM (“sleeping”), REM (“dreaming”), and Consciousness.


While EEG excels at providing temporal resolution, it doesn’t provide much spatial information. Where does the brain construct these three modes?

To answer this, neuroscientists cut the brains of cats in half… literally. If you perform a Cerveau Isolé cut (slice above the midbrain), the top half’s electrical signature is NREM. If you do a Midpontine Pre-Trigeminal cut (slice below the midbrain), the top half’s electrical signature is NREM + Consciousness.

Consciousness Ignition- Localizing Circuits (2)

This evidence shows that existential modes are generated by different areas. Specifically:

  • Sleep is induced by the diencephalon.
  • Dreaming is initiated by the metencephalon.
  • Consciousness is ignited by the mesencephalon.
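
As a consistency check, we can encode these three localization claims and simulate both cuts. This is a deliberately simplified sketch: it treats each cut as falling cleanly between two divisions and places the whole metencephalon below the midpontine cut.

```python
# Brain divisions from top (rostral) to bottom (caudal), as named in the list above.
REGIONS = ["diencephalon", "mesencephalon", "metencephalon"]

# Hypothesized generator for each existential mode, taken from the list above.
GENERATOR = {
    "NREM": "diencephalon",
    "Consciousness": "mesencephalon",
    "REM": "metencephalon",
}

def modes_above_cut(lowest_region_kept):
    """Modes whose generator stays attached to the top half after a cut."""
    kept = REGIONS[: REGIONS.index(lowest_region_kept) + 1]
    return {mode for mode, region in GENERATOR.items() if region in kept}

# Cerveau isolé: slice above the midbrain, so only the diencephalon stays on top.
cerveau_isole = modes_above_cut("diencephalon")
# Midpontine pre-trigeminal: slice below the midbrain, so the midbrain stays on top.
midpontine = modes_above_cut("mesencephalon")
print(sorted(cerveau_isole), sorted(midpontine))
```

The simulated signatures match the observed ones: NREM alone after the first cut, NREM plus Consciousness after the second.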

Neuroscientists now knew where to look! It was not long before they discovered the machinery that creates consciousness, sleeping, and dreaming:

Consciousness Ignition- Mode Localization (2)

We will soon turn our gaze to the ascending reticular activating system (ARAS). “Reticular” means “web-like”, so the name roughly means “web-like ignition switch”. But first, we need to examine the relationship between cortico-thalamic (CT) radiations and consciousness.

Thalamus Anatomy & Function

We have also explained that the purpose of consciousness is to solve the binding problem: gluing together disparate adjectives into coherent nouns:

Objects- Distributed Object Networks (2)

Consciousness creates the coherent objects of working memory by implementing phase binding, where object features are stitched together in distinct frequency bands, not unlike the radio in your car.

Objects- Phase Locking & Wakefulness

We have previously described the thalamus and cortex as dually innervating spheres, not dissimilar to a plasma globe:

Brain- Plasma Globe analogy (2)

And indeed, the nuclei within the thalamus tile the entire cortex:

Consciousness Ignition- Thalamic Architecture

Note, however, that only some thalamic nuclei are specific (project to discrete patches of cortex). Nonspecific thalamic nuclei are also present, including the Intralaminar Nuclei (ILN) and Reticular Nucleus of the Thalamus (RNT).

These nonspecific nuclei are the principal components of the ERTAS system, and plausible candidates for the engine of consciousness.

Damage to specific nuclei produces loss of a particular modality. In contrast, lesions to nonspecific nuclei produce deep disturbances of consciousness. In fact, recent evidence suggests that such lesions perturb cortico-cortical information transmission.

The ERTAS Hypothesis

The ascending reticular activating system (ARAS) consists of a dense web of nuclei. Indeed, the word “reticular” means “web-like”. Parvizi, Damasio (2001) outline the more significant members of the system:

Consciousness Ignition- Mesencephalon Reticular Formation

These nuclei project to the following three sites:

  1. Reticular Nucleus of the Thalamus (RNT), a sheet that sits on top of the thalamus.
  2. Intralaminar Nuclei (ILN), which are embedded deep within the thalamus.
  3. Basal Forebrain, which receives & distributes several neurochemical systems.

These structures in turn route information flowing to cortex:

Consciousness Ignition- Thalamus ILC NR

The extended reticular-thalamic activating system (ERTAS) hypothesis connects the ARAS system with the phase binding interpretation of the cortico-thalamo-cortical reentrant loop. One hypothesis, adapted from Newman (1999), has three theses:

  • ILN performs phase binding (and is thus the consciousness generator).
  • RNT implements selective attention.
  • Basal Forebrain provides visceral “body-relevant” information.


More recent research has corroborated the role of the ILN in phase binding, and expanded its scope. Saalmann (2014) notes that the ILN seems to participate in a larger group of higher-order nuclei which each manage information within more constrained parts of cortex. The anterior ILN seems more related to oculomotor processes; the posterior deals with the multimodal integration of different sense data.

One unexpected recent finding has been that lesions of “higher-order nuclei” such as the ILN seem to perturb cortico-cortical information transmission. This underscores the need to understand interactions between the CTC Loop and other reentrant loops.


The Role of The Claustrum

The claustrum is a tiny sheet of gray matter suspended between thalamus and cortex. However, it receives information from essentially the entire cortex:

Consciousness Ignition- Claustrum Anatomy (2)

Given that the purpose of consciousness is to integrate cortical information, the anatomical position of the claustrum is suggestive.

Recent anatomical evidence has only strengthened the case for claustrum promoting consciousness:

  • Koubeissi et al (2014) report a case study in which electrical stimulation of the claustrum induced loss of consciousness (!).
  • Chau et al (2015) reported evidence correlating claustrum lesions with the duration, but not the frequency, of loss of consciousness.
  • Wang et al (2016) showed that, in the mouse, the claustrum has reciprocal connections throughout cortex.
  • Reardon (2017) reported the discovery of a single neuron whose dendrites encircled the entire mouse brain (image credit).

Consciousness Ignition- Claustrum Mega-Neurons

These data are suggestive. However, it will be some time before we know enough to integrate claustrum function within the ERTAS system.

Until next time.

Related Works

  • Chau et al (2015). The effect of claustrum lesions on human consciousness and the recovery of function
  • Crick, Koch (2005). What is the function of the claustrum?
  • Koubeissi et al (2014). Electrical stimulation of a small brain area reversibly disrupts consciousness
  • Newman (1999). Putting the puzzle together: towards a general theory of the neural correlates of consciousness
  • Parvizi, Damasio (2001). Consciousness and the brainstem
  • Reardon (2017). A giant neuron found wrapped around entire mouse brain
  • Wang et al (2016). Organization of the connections between claustrum and cortex in the mouse

The Social Behavior Network

Part Of: Affective Neuroscience sequence
Content Summary: 800 words, 8 min read

Primary Emotion

There are many possible emotions. How can we make sense of this diversity?

Primary emotions are often used to shed light on our emotional lives. Like primary colors, these emotions blend together to reconstitute the full spectrum of emotional experience. For example, contempt is viewed as a combination of anger and disgust.

An emotion qualifies as primary if it satisfies the following criteria:

  1. Unique Machinery. It must be localized to specific neural processes.
  2. Known Signature. It must have a fixed set of phenomenological and behavioral expressions.
  3. Universal (Pre-Cultural). Expressed in all members of a given species. For ecologically valid stimuli, response does not detract from overall fitness.
  4. Primitive (Pre-Cognitive). Activated more strenuously during early development or immediate crisis (i.e., with minimal cognitive regulation).
  5. Differentiable.  Can be dissociated from other primary emotions.

Despite consensus about the above criteria, there is less agreement on which emotions deserve membership.  Here are three representative lists.

SBN- Theories of Primary Emotions (4)

The Social Behavior Network (SBN)

Neuroscientists studying aggression have identified six brain regions that seem to produce this behavior. They are:

  1. Preoptic Area of the Hypothalamus (PO)
  2. Anterior Nucleus of the Hypothalamus (AH)
  3. Ventromedial Nucleus of the Hypothalamus (VMH)
  4. Periaqueductal Gray (PAG)
  5. Lateral Septum (LS)
  6. Extended Amygdala (extAMY)

If any of these regions is damaged, an animal often becomes less aggressive. If you electrically stimulate these regions, the animal becomes enraged.

What is interesting about these six regions is that they were independently discovered by other neuroscientists who labelled them as the seat of parental care.

… AND, by yet other neuroscientists who had been investigating the neural basis of sexual behavior.

What do { Parental Care, Aggression, Sexual Behavior } have in common? They are entirely directed at members of one’s own species. These primary emotions are deeply related to animal social behavior.

Since the six nuclei { PO, AH, VMH, PAG, LS, extAMY } contribute to each of these three emotions & behaviors, they are now called the social behavior network (SBN). 

SBN- Overview

Will it turn out that all social primary emotions are created by the SBN? I don’t know. It is suggestive, however, that Play has been partially localized to the lateral septum (LS).

SBN and Emotion Selection

The SBN is one brain structure that can produce three distinct emotional responses. How is this possible? How does each emotion individuate itself within a single apparatus?

To proceed, we consult our “theorizing roadmap”:

SBN- Principles of Structure Function

Conceptually, we are plagued by “too many emotions”. Thus, we can either:

  1. Examine whether our three emotions can be unified; or
  2. Look for granularity within the SBN

Since the former is impractical, let’s look more carefully at the SBN.

One way to explain emotion individuation would be a shape hypothesis. If the intensity of neuron firing is encoded by height, you might expect different topographies (landscapes) to encode different emotions. 

SBN- Emotion Differentiation Shape Hypothesis

Another hypothesis is the granularity hypothesis. This posits that there may be e.g., three subdivisions of the lateral septum, and each subdivision supports a different emotion.
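
To contrast the two hypotheses, here is a toy sketch of how an emotion could be read out under each. Every number and subdivision name is invented purely for illustration; nothing here is measured data or an established model.

```python
import math

NODES = ["PO", "AH", "VMH", "PAG", "LS", "extAMY"]

# Shape hypothesis: each emotion is a distinct activity profile over all six nodes.
# The firing intensities below are invented purely for illustration.
TEMPLATES = {
    "Aggression":      [0.2, 0.9, 0.8, 0.7, 0.3, 0.6],
    "Parental Care":   [0.9, 0.2, 0.3, 0.4, 0.6, 0.5],
    "Sexual Behavior": [0.7, 0.3, 0.9, 0.5, 0.4, 0.8],
}
assert all(len(profile) == len(NODES) for profile in TEMPLATES.values())

def decode_by_shape(activity):
    """Pick the emotion whose template is closest (Euclidean) to the activity."""
    return min(TEMPLATES, key=lambda emotion: math.dist(TEMPLATES[emotion], activity))

# Granularity hypothesis: a node has subdivisions, one per emotion.
# The subdivision names are hypothetical placeholders.
LS_SUBDIVISIONS = {"LS-a": "Aggression", "LS-b": "Parental Care", "LS-c": "Sexual Behavior"}

def decode_by_granularity(active_subdivision):
    """Read the emotion directly off whichever subdivision is firing."""
    return LS_SUBDIVISIONS[active_subdivision]

print(decode_by_shape([0.3, 0.8, 0.9, 0.6, 0.2, 0.5]))
print(decode_by_granularity("LS-b"))
```

The design difference is where the information lives: in the overall pattern across nodes for the shape hypothesis, or in the identity of a dedicated sub-circuit for the granularity hypothesis.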

I tend to find this approach more plausible, given my experience with other subcortical structures. That said, time will tell. 🙂

Relation To The Basal Ganglia

The SBN is anatomically related to the basal ganglia. Recall that the basal ganglia has three loops: Associative, Sensorimotor, and Limbic. The SBN is strongly connected to, and shares two nodes with, the Limbic Loop.

SBN- SBN vs Limbic Loop (2)

As we have seen, the basal ganglia is the seat of motivation. The anatomical connection between SBN and basal ganglia mirrors the behavioral link between sociality and motivation. However, on a mathematical level, it is less clear how social emotions can be incorporated into the reinforcement learning apparatus:

SBN- Application to Neuroeconomics

Evolution of Emotion

Let’s use comparative anatomy to discover when the social behavior network evolved. By dissecting brains from five representative species, we can infer that the basal ganglia dates back to at least the origin of ray-finned fish.

SBN- Phylogeny (1)

The SBN nuclei are preserved across our representative species:

SBN- Comparative Anatomy

And the hodology (connections) between SBN nuclei is preserved:

SBN- Comparative Hodology

This evidence demonstrates that the social behavior network has been around since the invention of vertebrates. It also raises important questions, such as:

  • How has the SBN changed to support hyper-social animals like primates?
  • How much further back do emotional adaptations go? Do insects feel emotions? If yes, which kinds?

Until next time.

Related Works

  • Newman (1999). The Medial Extended Amygdala in Male Reproductive Behavior: A Node in the Mammalian Social Behavior Network
  • O’Connell, Hofmann (2011). The vertebrate mesolimbic reward system and social behavior network: a comparative synthesis