Prediction Functions: Regression vs Classification

Part Of: Principles of Machine Learning sequence
Content Summary: 500 words, 5 min read

Motivations

Data scientists are in the business of answering questions with data. To do this, they feed data into prediction functions, which learn from it and use what they have learned to produce inferences.

Today we take an intuitive, non-mathematical look at two genres of prediction machine: regression and classification. Although these approaches may seem unrelated, we shall discover a deep symmetry lurking below the surface.

Introducing Regression

Consider a supermarket that has made five purchases of sugar from its supplier in the past. We therefore have access to five data points:

[Figure: regression data]

One of our competitors intends to buy 40kg of sugar. Can we predict the price they will pay?

This question can be interpreted visually as follows:

[Figure: regression prediction visualization]

But there is another, more systematic way to interpret this request. We can distinguish training data (the five observations where we know the answer) from test data (where we are given a subset of the relevant information and asked to generate the rest):

[Figure: regression prediction schema]
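To make the training/test split concrete, here is a minimal sketch of how the two kinds of data might be stored. The numbers are made up for illustration (they are not the supermarket's actual purchases), and the variable names are hypothetical.

```python
# Training data: (order size in kg, price paid) pairs.
# Assumption: these five values are invented for illustration.
training_data = [(10.0, 12.0), (20.0, 22.0), (30.0, 33.0), (50.0, 52.0), (60.0, 63.0)]

# Test data: the order size is known, the price is the missing piece.
test_data = [(40.0, None)]

# Split the training pairs into what we are given and what we want to predict.
known_inputs = [x for x, _ in training_data]
known_outputs = [y for _, y in training_data]

# The inputs whose outputs still need to be generated.
missing = [x for x, y in test_data if y is None]
print(missing)
```

The only structural difference between the two lists is that the second column is filled in for training data and empty for test data.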

A regression prediction machine predicts, for any hypothetical x-value, the corresponding y-value. Sound familiar? This is just a function. There are in fact many possible regression functions, of varying complexity:

[Figure: simple vs. complex regression outputs]

Despite their simple appearance, each line represents a complete prediction machine. Each one can, for any order size, generate a corresponding prediction of the price of sugar.
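As a rough sketch of two such machines, the snippet below fits a straight line and a cubic curve to the same five points using NumPy. The data values are invented, and polynomial fitting is just one convenient way to get a "simple" and a "complex" function; the figures above may have been produced differently.

```python
import numpy as np

# Assumption: hypothetical training data, not the post's actual figures.
order_kg = np.array([10.0, 20.0, 30.0, 50.0, 60.0])
price = np.array([12.0, 22.0, 33.0, 52.0, 63.0])

# A simple prediction machine: a straight line (degree-1 polynomial).
simple_model = np.poly1d(np.polyfit(order_kg, price, deg=1))

# A more complex prediction machine: a cubic curve (degree-3 polynomial).
complex_model = np.poly1d(np.polyfit(order_kg, price, deg=3))

# Test data: we know the order size and ask each machine for a price.
new_order_kg = 40.0
print("simple:", simple_model(new_order_kg))
print("complex:", complex_model(new_order_kg))
```

Both `simple_model` and `complex_model` are ordinary callables: give either one any order size and it returns a predicted price.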

Introducing Classification

To illustrate classification, consider another example.

Suppose we are an animal shelter, responsible for rescuing stray dogs and cats. We have saved two hundred animals; for each, we record its height, weight, and species:

[Figure: classification data]

Suppose we are left a note that reads as follows:

I will be dropping off a stray tomorrow that is 19 lbs and about a foot tall.

A classification question might be: is this animal more likely to be a dog or a cat?

Visually, we can interpret the challenge as follows:

[Figure: classification prediction interpretation]

As before, we can understand this prediction problem as using information gained from training data to generate the “missing” factors in test data:

[Figure: classification prediction schema]

To actually build a classification machine, we must specify a region-color map, such as the following:

[Figure: simple classification output]

Indeed, the above solution is complete: we can produce a color (species) label for any new observation, based on whether it lies above or below our line. 
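A line-based classifier of this kind can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the slope and intercept of the boundary, the choice of inches for height, and the rule that points above the line are dogs.

```python
def classify(weight_lb: float, height_in: float) -> str:
    """Label an animal by which side of a straight boundary line it falls on."""
    # Assumption: a made-up boundary, height = 0.25 * weight + 5.
    boundary_height = 0.25 * weight_lb + 5.0
    # Assumption: points above the line are dogs, points below are cats.
    return "dog" if height_in > boundary_height else "cat"


# The stray from the note: 19 lbs and about a foot (12 inches) tall.
print(classify(19.0, 12.0))
```

The label printed for the stray depends entirely on the invented boundary, so it should not be read as the answer to the shelter's question. The point is only that the function returns a label for every possible (weight, height) pair.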

But other solutions exist. Consider, for example, a rather different kind of map:

[Figure: complex classification boundary]

We could use either map to generate predictions. Which one is better? We will explore such questions next time.

Comparing Prediction Methods

Let’s compare our classification and regression models. In what sense are they the same?

[Figure: comparing models]

If you’re like me, you find it hard to identify similarities. But insight comes when you compare the underlying schemas:

[Figure: comparing schemas]

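Here we see that our regression example was 2D (one input, order size, and one output, price), but our classification example was 3D (two inputs, height and weight, and one output, species). It would be easier to compare these models if we removed a dimension from the classification example.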

[Figure: 1D classification]

With this simplification, we can directly compare regression and classification:

[Figure: more direct comparison]

We have arrived at our moral: the only real difference between regression and classification is whether the dependent (predicted) variable is continuous or discrete.
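The same point can be seen in code. In the sketch below, both machines take a single number as input; the only difference is the type of value they return. The line and the threshold are hypothetical, chosen purely for illustration.

```python
def predict_price(order_kg: float) -> float:
    """Regression: the output is a continuous number."""
    # Assumption: a made-up line, price = 1.0 * order size + 2.0.
    return 1.0 * order_kg + 2.0


def predict_species(weight_lb: float) -> str:
    """Classification: the output is one of a fixed set of labels."""
    # Assumption: a made-up threshold of 15 lbs separating cats from dogs.
    return "dog" if weight_lb > 15.0 else "cat"


print(predict_price(40.0))    # a number on a continuous scale
print(predict_species(19.0))  # one of two discrete labels
```

`predict_price` can return any value along a continuum, while `predict_species` can only ever return "dog" or "cat".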

From Regression To Classification

Until next time.
