Singular Value Decomposition

Part Of: Algebra sequence
Followup To: Eigenvalues and Eigenvectors
Content Summary: 1300 words, 13 min read.

Limitations of Eigendecomposition

Last time, we learned how to locate the eigenvalues and eigenvectors of a given matrix. Diagonalization is the process by which a square matrix is factored into a matrix Q (with eigenvectors along the columns) and a diagonal matrix \Lambda (with eigenvalues along the diagonal). For a symmetric matrix, the factorization takes the form

A = Q \Lambda Q^T

But as we saw with the spectral theorem, eigendecomposition only works well against square, symmetric matrices. If a matrix isn’t symmetric, it is easy to run into complex eigenvalues. And if a matrix isn’t square, you are out of luck entirely!

Eigendecomposition- Spectral Theorem (1)

Can we generalize eigendecomposition to apply to a wider family of matrices? Can we diagonalize matrices of any size, even if they don’t have “nice” properties?

Yes. Self-transposition is the key insight of our “eigendecomposition 2.0”. We define the self-transpositions of A as AA^{T} and A^{T}A.

Suppose A \in \mathbb{R}^{m \times n}. Then AA^T \in \mathbb{R}^{m \times m} and A^TA \in \mathbb{R}^{n \times n}. So these matrices are square. But they are also symmetric, since (AA^T)^T = (A^T)^T A^T = AA^T (and likewise for A^TA)!

To illustrate, consider the following.

A = \begin{bmatrix}  4 & 4 \\ -3 & 3 \\ \end{bmatrix}

Since A is not symmetric, we have no guarantee that its eigenvalues are real. Indeed, its eigenvalues turn out to be complex:

\det(A - \lambda I) = \begin{vmatrix} 4 - \lambda & 4 \\ -3 & 3 - \lambda \\ \end{vmatrix} = 0

(12 - 7 \lambda + \lambda^2) + 12 = 0 \Rightarrow \lambda^2 -7 \lambda + 24 = 0

\lambda = \frac{7 \pm \sqrt{(-7)^2 - 4*1*24}}{2*1} = \frac{7}{2} \pm \frac{\sqrt{47}i}{2}

Eigendecomposition on A sucks. Are the self-transposed matrices any better?

A^TA = \begin{bmatrix} 4 & -3 \\ 4 & 3 \\ \end{bmatrix} \begin{bmatrix} 4 & 4 \\ -3 & 3 \\ \end{bmatrix} = \begin{bmatrix} 25 & 7 \\ 7 & 25 \\ \end{bmatrix}

AA^T = \begin{bmatrix} 4 & 4 \\ -3 & 3 \\ \end{bmatrix} \begin{bmatrix} 4 & -3 \\ 4 & 3 \\ \end{bmatrix} = \begin{bmatrix} 32 & 0 \\ 0 & 18 \\ \end{bmatrix}

These matrices are symmetric! Thus, they are better candidates for eigendecomposition.
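
To see this numerically, here is a minimal numpy sketch (an illustration, not part of the original post) that builds the self-transpositions of A and confirms they are symmetric with real eigenvalues:

```python
import numpy as np

A = np.array([[4.0, 4.0],
              [-3.0, 3.0]])

AtA = A.T @ A   # [[25, 7], [7, 25]]
AAt = A @ A.T   # [[32, 0], [0, 18]]

# Both products are symmetric by construction
print(np.allclose(AtA, AtA.T), np.allclose(AAt, AAt.T))  # True True

# A itself has a complex eigenvalue pair, but the self-transpositions do not
print(np.linalg.eigvals(A))    # approx. 3.5 + 3.43j, 3.5 - 3.43j
print(np.linalg.eigvals(AtA))  # 32 and 18 (order may vary)
print(np.linalg.eigvals(AAt))  # 32 and 18 (order may vary)
```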

Towards Singular Value Decomposition

Singular Value Decomposition (SVD) is based on the principle that all matrices are eigendecomposable after self-transposition. It is essentially a bug fix:

SVD- Self-Transposition

An important way to picture SVD is with the idea of orthogonal bases. It is relatively easy to find any number of orthogonal bases for a given rowspace. Call the matrix of orthogonal vectors V.

We desire to find an orthogonal basis V of the row space such that AV produces an orthogonal basis in the column space. Orthogonal bases are not particularly hard to find. But most orthogonal bases, once passed through A, will lose their orthogonality! We desire that particular orthogonal basis V whose image AV is also orthogonal; call that image basis U.

We won’t require the images Av_i to be the same length as the vectors in V. Instead, we will normalize both bases: the vectors of V and of U will be orthonormal (orthogonal and unit length). Each image Av_i then differs from u_i only by a scaling factor.

As we will soon see, these scaling factors are not eigenvalues, so we denote them with sigmas rather than lambdas:

  • Scaling factors \sigma , analogous to eigenvalues \lambda.
  • Diagonal matrix \Sigma , analogous to diagonal matrix \Lambda

Our full picture then, looks like this:

SVD- Orthogonal Basis Transfer

Let us now translate this image into matrix language.

A \begin{bmatrix} \vdots & \vdots & \vdots & \vdots \\ v_1 & v_2 & \dots & v_n \\ \vdots & \vdots & \vdots & \vdots \\ \end{bmatrix} = \begin{bmatrix} \vdots & \vdots & \vdots & \vdots \\ \sigma_1u_1 & \sigma_2u_2 & \dots & \sigma_nu_n \\ \vdots & \vdots & \vdots & \vdots \\ \end{bmatrix}

But we can easily factorize the right-hand side:

\begin{bmatrix} \vdots & \vdots & \vdots & \vdots \\ \sigma_1u_1 & \sigma_2u_2 & \dots & \sigma_nu_n \\ \vdots & \vdots & \vdots & \vdots \\ \end{bmatrix} = \begin{bmatrix} \vdots & \vdots & \vdots & \vdots \\ u_1 & u_2 & \dots & u_n \\ \vdots & \vdots & \vdots & \vdots \\ \end{bmatrix} * \begin{bmatrix} \sigma_1 & 0 & \dots & 0 \\ 0 & \sigma_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \sigma_n \\ \end{bmatrix}

So we have that:

AV = U \Sigma

Since both V and U are orthogonal, inversion of either is equivalent to transposition:

A = U \Sigma V^{-1} = U \Sigma V^T

This strongly resembles our diagonalization equation A = Q \Lambda Q^T. SVD distinguishes itself by considering two orthogonal eigenmatrices U and V, not just one (Q).

Recalling that A = U \Sigma V^T ,

A^TA = (V \Sigma^T U^T) (U \Sigma V^T)

But now the innermost term cancels: U^TU = I. And since \Sigma is a square diagonal matrix, \Sigma^T\Sigma is simply \Sigma^{2}. So,

A^TA = V \Sigma^2 V^T

Since A^{T}A is a square, symmetric matrix, our diagonalization theorem applies!

A^TA = V \Sigma ^2 V^T = Q \Lambda Q^T

To find U, a similar trick works:

AA^T =  (U \Sigma V^T)(V \Sigma^T U^T) = U \Sigma^2 U^T = Q \Lambda Q^T

The relationships between SVD and eigendecomposition are as follows:

  • V is the eigenvectors of A^TA
  • U is the eigenvectors of AA^T
  • \Sigma is the square root of the eigenvalues matrix \Lambda

If any eigenvalue is negative, the corresponding sigma factor would be complex. But A^TA and AA^T are positive-semidefinite, which guarantees non-negative eigenvalues. This assures us that \Sigma contains only real values.
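
This correspondence can be checked numerically. The sketch below (a numpy illustration, not the author's code) eigendecomposes A^TA and AA^T and compares the square roots of their eigenvalues against the singular values returned by np.linalg.svd:

```python
import numpy as np

A = np.array([[4.0, 4.0],
              [-3.0, 3.0]])

# Eigendecompose the two self-transpositions (eigh: symmetric-aware, ascending eigenvalues)
lam_v, V = np.linalg.eigh(A.T @ A)
lam_u, U = np.linalg.eigh(A @ A.T)

# Singular values are square roots of the shared eigenvalues, conventionally descending
sigma = np.sqrt(lam_v)[::-1]

# numpy's built-in SVD for comparison
U_svd, S_svd, Vt_svd = np.linalg.svd(A)
print(S_svd)   # [5.6568..., 4.2426...] == [sqrt(32), sqrt(18)]
print(sigma)   # same values

# Caveat: eigenvectors are only determined up to sign (and ordering), so the
# columns of U and V must be paired so that A @ v_i == sigma_i * u_i.
```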

In contrast to eigendecomposition, every matrix has an SVD. Geometrically, V^T and U act as rotational transformations, and \Sigma acts as a scaling transformation. In other words, every linear transformation comprises a rotation, then scaling, then another rotation.

SVD- Rotation Scaling Rotation (1)

A Worked Example

Let’s revisit A. Recall that:

A = \begin{bmatrix} 4 & 4 \\ -3 & 3 \\ \end{bmatrix}, A^TA = \begin{bmatrix} 25 & 7 \\ 7 & 25 \\ \end{bmatrix}, AA^T = \begin{bmatrix} 32 & 0 \\ 0 & 18 \\ \end{bmatrix}

Eigendecomposition against A is unpleasant because A is not symmetric. But A^{T}A is guaranteed to be positive semi-definite; that is, to have non-negative eigenvalues. Let’s see this in action.

\det(A^TA - \lambda I) = \begin{vmatrix} 25 - \lambda & 7 \\ 7 & 25 - \lambda \\ \end{vmatrix} = 0

\lambda^2 - 50 \lambda + 576 = 0 \Rightarrow (\lambda - 32)(\lambda - 18) = 0

\lambda_1 = 32, \lambda_2 = 18

trace(A^TA) = 50 = \sum{\lambda_i}

det(A^TA) = 625-49 = 576 = 18 * 32 = \prod{\lambda_i}

These are positive, real eigenvalues. Perfect! Let’s now derive the corresponding (normalized) eigenvectors.

A^TA - 32I = \begin{bmatrix} -7 & 7 \\ 7 & -7 \\ \end{bmatrix}, A^TA - 18I = \begin{bmatrix} 7 & 7 \\ 7 & 7 \\ \end{bmatrix}

rref(A^TA - 32I) = \begin{bmatrix} 1 & -1 \\ 0 & 0 \\ \end{bmatrix}, rref(A^TA - 18I) = \begin{bmatrix} 1 & 1 \\ 0 & 0 \\ \end{bmatrix}

v_1 = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \\ \end{bmatrix}, v_2 = \begin{bmatrix} \frac{-1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \\ \end{bmatrix}

(Each eigenvector is determined only up to sign; the signs of V and U must be matched, as we discuss below.)

SVD intends to decompose A into U \Sigma V^{T}. The above findings give us two of these ingredients.

V^{T} = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{-1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \end{bmatrix}

\Sigma =\begin{bmatrix} \sqrt{32} & 0 \\ 0 & \sqrt{18} \\ \end{bmatrix}

What’s missing? U! To find it, we perform eigendecomposition on AA^{T}. This is an especially easy task, because AA^{T} is already a diagonal matrix. One caveat: U cannot be chosen independently of V; each column must satisfy Av_i = \sigma_i u_i. With our sign choices, Av_1 = (\sqrt{32}, 0) and Av_2 = (0, \sqrt{18}), so u_1 = (1, 0) and u_2 = (0, 1).

AA^T = \begin{bmatrix} 32 & 0 \\ 0 & 18 \\ \end{bmatrix}

U = \begin{bmatrix} u_1 & u_2 \\ \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ \end{bmatrix}

We have arrived at our first Singular Value Decomposition.

A = U \Sigma V^T = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ \end{bmatrix} \begin{bmatrix} \sqrt{32} & 0 \\ 0 & \sqrt{18} \\ \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ - \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \end{bmatrix}

Okay, so let’s check our work. 😛

A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ \end{bmatrix} \left( \begin{bmatrix} \sqrt{32} & 0 \\ 0 & \sqrt{18} \\ \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ - \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \end{bmatrix} \right) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ \end{bmatrix} \begin{bmatrix} 4 & 4 \\ -3 & 3 \\ \end{bmatrix} = \begin{bmatrix} 4 & 4 \\ -3 & 3 \\ \end{bmatrix}

These matrices are a viable factorization: multiplication successfully recovers A.
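
As an extra sanity check, a short numpy snippet (illustrative only) confirms that the three factors multiply back to A:

```python
import numpy as np

U = np.eye(2)
Sigma = np.diag([np.sqrt(32.0), np.sqrt(18.0)])
Vt = np.array([[ 1/np.sqrt(2), 1/np.sqrt(2)],
               [-1/np.sqrt(2), 1/np.sqrt(2)]])

A = np.array([[4.0, 4.0],
              [-3.0, 3.0]])

# The three factors multiply back to A
print(np.allclose(U @ Sigma @ Vt, A))   # True
```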

Takeaways

  • Eigendecomposition only works for a subclass of matrices; SVD decomposes all matrices.
  • SVD relies on self-transposition to convert an arbitrary matrix into matrices that work well with eigendecomposition (guaranteed square, m = n, and symmetric, A = A^{T}).
  • Another way to interpret SVD is by taking a special kind of orthogonal basis that, once passed through the linear transformation, preserves its orthogonality.
  • Every matrix A = U \Sigma V^{T}. That is, every linear transformation can be conceived as rotation + scaling + rotation.

Until next time.


Eigenvalues and Eigenvectors

Part Of: Algebra sequence
Followup To: An Introduction to Linear Algebra
Next Up: Singular Value Decomposition
Content Summary: 1300 words, 13 min read

Geometries of Eigenvectors

Matrices are functions that act on vectors, by mapping from row-vectors to column-vectors.  Consider two examples:

  1. Reflection matrices, which reflect vectors across some basis.
  2. Rotation matrices, which rotate vectors counterclockwise by \theta degrees.

eigenvectors-transformation-geometries-1

The set of eigenvectors of a matrix A is a special set of input vectors for which the matrix behaves as a scaling transformation. In other words, we desire the set of vectors \vec{x} whose output vectors A\vec{x} differ from \vec{x} only by a scaling factor.

Eigenvectors have a straightforward geometric interpretation:

  1. Reflection eigenvectors are orthogonal or parallel to the reflecting surface. In the left image above, that is the top two pairs of vectors.
  2. Rotation eigenvectors do not exist in \mathbb{R}^2 (for rotations other than multiples of 180°, no real vector keeps its direction).

Algebra of Eigenvectors

We can express our “parallel output” property as:

A\vec{x} = \lambda \vec{x}

Thus \vec{x} and A\vec{x} point in the same direction, but differ by scaling factor \lambda.

Scaling factor \lambda is the eigenvalue. There can be many \left( x, \lambda \right) pairs that satisfy the above equality.

For an \mathbb{R}^{n \times n} matrix, there are n eigenvalues (counted with multiplicity). These eigenvalues can be difficult to find. However, two facts aid our search:

  • The sum of eigenvalues equals the trace (sum of values along the diagonal).
  • The product of eigenvalues equals the determinant.

To solve, subtract \lambda \vec{x} from both sides:

A\vec{x} = \lambda \vec{x}

(A - \lambda I)\vec{x} = 0

We would like to identify n unique eigenvectors. But if the new matrix (A - \lambda I) has only the trivial nullspace \{0\}, it admits no eigenvectors at all. So we desire this new matrix to be singular.

How to accomplish this?  By finding eigenvalues that satisfy the characteristic equation \det(A - \lambda I) = 0. Matrices are singular iff their determinants equal zero.

Let’s work through an example! What is the eigendecomposition of matrix A:

A = \begin{bmatrix} 3 & 1 \\ 1 & 3 \\ \end{bmatrix}

We need to find eigenvalues that solve the characteristic equation.

\det(A - \lambda I) = \begin{vmatrix} 3-\lambda & 1 \\ 1 & 3-\lambda \\ \end{vmatrix} = 0

(3 - \lambda)^2 - 1^2 = \lambda^2 - 6\lambda + 8 = (\lambda-2)(\lambda-4) = 0

\lambda_1 = 2, \lambda_2 = 4

Are these eigenvalues correct? Let’s check our work:

trace(A) = 6 = \sum{\lambda_i}

det(A) = 8 = \prod{\lambda_i}

How to find our eigenvectors? By solving the nullspace given each eigenvalue.

For \lambda_1=2 :

A - 2I = \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ \end{bmatrix} \Rightarrow rref(A - 2I) = \begin{bmatrix} 1 & 1 \\ 0 & 0 \\ \end{bmatrix}

(\lambda_1, \vec{x}_1) = (2, \begin{bmatrix} 1 \\ -1 \\ \end{bmatrix})

For \lambda_2=4 :

A - 4I = \begin{bmatrix} -1 & 1 \\ 1 & -1 \\ \end{bmatrix} \Rightarrow rref(A - 4I) = \begin{bmatrix} 1 & -1 \\ 0 & 0 \\ \end{bmatrix}

(\lambda_2, \vec{x}_2) = (4, \begin{bmatrix} 1 \\ 1 \\ \end{bmatrix})
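
For readers who want to verify these pairs numerically, here is a small numpy sketch (not from the original post):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])

# eigh is appropriate for symmetric matrices; eigenvalues come back ascending
eigenvalues, eigenvectors = np.linalg.eigh(A)
print(eigenvalues)          # [2. 4.]

# Sanity checks from the text: trace = sum of eigenvalues, det = product
print(np.trace(A), eigenvalues.sum())          # 6.0 6.0
print(np.linalg.det(A), eigenvalues.prod())    # approx. 8.0 8.0

# Each column x satisfies A @ x == lambda * x
for lam, x in zip(eigenvalues, eigenvectors.T):
    print(np.allclose(A @ x, lam * x))         # True, True
```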

Desirable Matrix Properties

The above example was fairly straightforward. But eigendecomposition can “go awry”, as we shall see. Consider a rotation matrix, which in two dimensions has the following form:

R = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \\ \end{bmatrix}

What are the eigenvalues for rotation \theta = 90^{\circ} ?

R = \begin{bmatrix} 0 & -1 \\ 1 & 0 \\ \end{bmatrix}

\det(R - \lambda I) = \begin{vmatrix} - \lambda & -1 \\ 1 & - \lambda \\ \end{vmatrix} = 0

(- \lambda)^2 - (-1)(1) = 0 \Rightarrow \lambda^2 = -1

\lambda_1 = i, \lambda_2 = -i

We can check our work:

trace(R) = 0 = \sum{\lambda_i}

det(R) = 1 = \prod{\lambda_i}

We saw earlier that rotation eigenvectors have no geometric interpretation in \mathbb{R}^2. Here, we have shown algebraically why: the eigenvalues are complex.

A = \left[ \begin{smallmatrix} 3 & 1 \\ 1 & 3 \\ \end{smallmatrix} \right] has real eigenvalues, but R = \left[ \begin{smallmatrix} 0 & -1 \\ 1 & 0 \\ \end{smallmatrix} \right] has less-desirable complex eigenvalues.

We can generalize the distinction between A and R as follows:

Spectral Theorem. Any matrix that is symmetric (A = A^T) is guaranteed to have real eigenvalues. The corresponding n eigenvectors can be chosen to be orthogonal.

In other words, eigendecomposition works best against symmetric matrices.

Eigendecomposition- Spectral Theorem (1)

Diagonalization

Let us place each eigenvector in a column of a matrix S. What happens when we multiply the original matrix A by this new matrix? Since the columns of S are eigenvectors, multiplication by A reduces to scaling each column by its associated eigenvalue:

AS = \begin{bmatrix} \vdots & \vdots & \vdots & \vdots \\ \lambda_1x_1 & \lambda_2x_2 & \dots & \lambda_nx_n \\ \vdots & \vdots & \vdots & \vdots \\ \end{bmatrix}

We see the product contains a mixture of eigenvalues and eigenvectors. We can separate these by “pulling out” the eigenvalues into a diagonal matrix. Call this matrix \Lambda (“capital lambda”).

AS = \begin{bmatrix} \vdots & \vdots & \vdots & \vdots \\ x_1 & x_2 & \dots & x_n \\ \vdots & \vdots & \vdots & \vdots \\ \end{bmatrix} * \begin{bmatrix} \lambda_1 & 0 & \dots & 0 \\ 0 & \lambda_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \lambda_n \\ \end{bmatrix} = S \Lambda

Most matrices have the property that their eigenvectors are linearly independent. For such matrices, S is invertible. Given this fact, we can solve for \Lambda and for A:

\Lambda = S^{-1} A S

A = S \Lambda S^{-1}

Matrices that can be factorized in this way are said to be diagonalizable. We can see that both elimination and eigendecomposition are performing the same type of work: factorizing matrices into their component parts.

If A is symmetric, then we know \Lambda is real, and the eigenvectors in S can be chosen orthonormal. Let us rename S to Q, to reflect this additional property. Orthogonal matrices have the property that transposition equals inversion: Q^T = Q^{-1}. Thus, if A is symmetric, we can simplify the diagonalization formula to:

A = Q \Lambda Q^{-1} = Q \Lambda Q^T
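
A quick numerical illustration of this formula (a numpy sketch, reusing the symmetric matrix from the example above):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])

# For a symmetric A, eigh returns orthonormal eigenvectors in the columns of Q
Lam, Q = np.linalg.eigh(A)
Lambda = np.diag(Lam)

# Q is orthogonal: its transpose is its inverse
print(np.allclose(Q.T @ Q, np.eye(2)))    # True

# Diagonalization: A = Q Lambda Q^T
print(np.allclose(Q @ Lambda @ Q.T, A))   # True
```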

Asymptotic Interpretations

This diagonalization approach illustrates an important use case of eigenvectors: matrix powers. What happens when A is applied arbitrarily many times? What does the output look like in the limit?

We can use the diagonalization equation to represent A^k:

A^k = \prod_{i=1}^{k} (Q \Lambda Q^{-1}) = (Q \Lambda Q^{-1})(Q \Lambda Q^{-1})\dots(Q \Lambda Q^{-1})

We can simplify by canceling the inner terms Q^{-1}Q:

A^k = Q \Lambda^k Q^{-1}

This equation tells us that the eigenvectors are invariant to how many times A is applied (see the sketch after this list). In contrast, the eigenvalue matrix \Lambda has important implications for ongoing processes:

  • If each eigenvalue has magnitude less than one, the output will trend towards zero.
  • If each eigenvalue has magnitude greater than one, the output will trend to infinity.
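
The sketch referenced above: a numpy illustration (not the author's code) that computes A^k both by repeated multiplication and via the eigendecomposition, confirming they agree:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])
k = 5

Lam, Q = np.linalg.eigh(A)

# A^k computed two ways: repeated multiplication vs. eigendecomposition
direct = np.linalg.matrix_power(A, k)
via_eig = Q @ np.diag(Lam**k) @ Q.T     # Q^{-1} = Q^T since A is symmetric

print(np.allclose(direct, via_eig))     # True
```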

Fibonacci Eigenvalues

The powers interpretation of eigenvalues sheds light on the behavior of all linear processes. This includes number sequences such as the Fibonacci numbers, where each number is the sum of the previous two numbers.

Recall the Fibonacci numbers are 0,1,1,2,3,5,8,13,... What is F_{100} ?

Eigenvalues can answer this question. We must first express the Fibonacci generator as a linear equation:

F(k+2) = 1F(k+1) + 1F(k)

In order to translate this into a meaningful matrix, we must add a “redundant” equation:

F(k+1) = 1F(k+1) + 0F(k)

With these equations, we can create a 2×2 Fibonacci matrix F.

F = \begin{bmatrix} 1 & 1 \\ 1 & 0 \\ \end{bmatrix}

This matrix uniquely generates Fibonacci numbers.

u_1 = Fu_0 = \begin{bmatrix} 1 & 1 \\ 1 & 0 \\ \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ \end{bmatrix}

u_4 = F^4u_0 = \begin{bmatrix} 1 & 1 \\ 1 & 0 \\ \end{bmatrix}^4 \begin{bmatrix} 1 \\ 0 \\ \end{bmatrix} = \begin{bmatrix} 5 \\ 3 \\ \end{bmatrix}

To discover the rate at which Fibonacci numbers grow, we decompose F into its eigenvalues:

\det(F - \lambda I) = \begin{vmatrix} 1 - \lambda & 1 \\ 1 & - \lambda \\ \end{vmatrix} = 0

\lambda^2 - \lambda - 1 = 0

 \lambda_1 = \frac{1 + \sqrt{5}}{2}, \lambda_2 = \frac{1 - \sqrt{5}}{2}

 \lambda_1 = 1.61803, \lambda_2 = -0.61803

trace(F) = 1 = \sum{\lambda_i}

det(F) = -1 = \prod{\lambda_i}

We can go on to discover eigenvectors x_1 and x_2. We can then express the initial vector u_0 in the eigenbasis, u_0 = c_1 x_1 + c_2 x_2, so that

u_k = F^k u_0 = c_1 \lambda_1^k x_1 + c_2 \lambda_2^k x_2

As k goes to infinity, the second term goes to zero, since |\lambda_2| < 1. Thus the growth is dominated by the larger eigenvalue: the ratio of successive Fibonacci numbers approaches 1.61803.

Mathematicians in the audience will recognize this number as the golden ratio.

eigenvectors-fibonacci-and-golden-ratio

We have long known that the ratio of successive Fibonacci numbers converges to 1.61803. Eigenvalues provide a mechanism to derive this value analytically.
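
A short numpy sketch (illustrative, not from the original post) makes the convergence concrete, by iterating the Fibonacci matrix and comparing the ratio of successive entries against the dominant eigenvalue:

```python
import numpy as np

F = np.array([[1, 1],
              [1, 0]], dtype=np.int64)

# u_k = F^k u_0 stacks two consecutive Fibonacci numbers: u_k = (F_{k+1}, F_k)
u = np.array([1, 0], dtype=np.int64)
for _ in range(30):
    u = F @ u

print(u)            # [1346269  832040] = (F_31, F_30)
print(u[0] / u[1])  # 1.6180339887... -> ratio of successive Fibonacci numbers

# The dominant eigenvalue of F is the golden ratio (1 + sqrt(5)) / 2
print(np.linalg.eigvals(F.astype(float)).max())  # 1.618033988749895
```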

Until next time.

An Introduction to Linear Algebra

Part Of: Algebra sequence
Content Summary: 1300 words, 13 min read.

Linear algebra is the math of vectors and matrices. Let me attempt to explain it as succinctly as possible.

Vector Operations

If n is a positive integer and ℝ is the set of real numbers, then ℝ^n is the set of all n-tuples of real numbers. A vector v ∈ ℝ^n is one such n-tuple. For example,

v = (v_1, v_2, v_3) \in (\mathbb{R}, \mathbb{R}, \mathbb{R}) \equiv \mathbb{R}^3

The six vector operations are: addition, subtraction, scaling, cross product, dot product, norm (vector length). For vectors u and v, these are defined as follows:

linear-algebra-vector-operations-1

The dot product and the cross product norm are related to the angle between two vectors. Specifically,

u \cdotp v = \|u\| \|v\| cos(\theta)

\|u \times v\| = \|u\| \|v\| sin(\theta)

Finally, note that the cross product is not commutative: u \times v \neq v \times u.
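
A minimal numpy sketch (for illustration; the vector values are arbitrary) exercising these operations and the two angle identities:

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

print(u + v, u - v, 2 * u)        # addition, subtraction, scaling
print(np.dot(u, v))               # dot product: 32.0
print(np.cross(u, v))             # cross product: [-3.  6. -3.]
print(np.linalg.norm(u))          # norm: sqrt(14), approx. 3.742

# The angle between u and v, recovered from the dot-product identity
cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
print(np.degrees(np.arccos(cos_theta)))   # approx. 12.9 degrees

# The cross product is anti-commutative: u x v == -(v x u)
print(np.allclose(np.cross(u, v), -np.cross(v, u)))   # True
```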

Matrix Operations

A matrix A ∈ ℝ^{m × n} is a rectangular array of real numbers with m rows and n columns. For example,

A= \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix} \in \begin{bmatrix} \mathbb{R} & \mathbb{R} \\ \mathbb{R} & \mathbb{R} \\ \mathbb{R} & \mathbb{R} \\ \end{bmatrix} \equiv \mathbb{R}^{3x2}

Important matrix operations are addition, subtraction, product, transpose, and determinant:

linear-algebra-matrix-operations-2

Of the above operations, matrix product is especially significant. This operation is only possible when A has the same number of columns as B has rows.

Matrix as Morphism

Importantly, matrices with one row or one column can be interpreted as vectors. Matrix multiplication with a vector can then be interpreted in two ways: the row picture interprets each output entry as a dot product; the column picture interprets the output as a linear combination of the columns.

linear-algebra-row-vs-column-picture-4

Let’s take A to be a 2 by 2 matrix with row-vectors (2, -1) and (-1, 2), and let b = (0, 3). When we solve for x, we find two ways to picture the solution:

linear-algebra-row-vs-column-geometry-1

In the above example, we set b = (0,3). But this linear combination could generate any vector b. In general, linear combinations can generate the entirety of a space (in this case, a two-dimensional plane). 

In both views, we say A transforms x → b. If vectors are groups of data, matrices are functions that operate on vectors. Specifically, multiplication by a matrix A ∈ ℝ^{m × n} is a linear transformation T_A : ℝ^n → ℝ^m that converts an n-tuple into an m-tuple.

The symmetry between functions and linear transformations runs deep. When we explore category theory, we will discover the structure underpinning this relationship. But in the meantime, consider the following equivalences:

linear-algebra-function-vs-linear-transformation-4

Especially important to matrix functions is the notion of inverse matrix. Just as f^{-1}(x) = ln(x) undoes the effects of f(x) = e^x, inverse matrix A^{-1} undoes the effects of A. The cumulative effect of applying A^{-1} after A is the identity matrix: A^{-1}A = 1. The identity matrix is analogous to multiplying by one, or adding zero: it is algebraically inert.

Only square matrices can be invertible, because reversal is commutative: AA^{-1} = A^{-1}A = 1. But not all square matrices have inverses. The determinant of a matrix checks for invertibility: det(A) ≠ 0 iff A is invertible.

Matrix inversion is vitally important to solving systems of linear equations.

  • AX = B → multiply both sides by A^{-1} on the left → X = A^{-1}B
  • XA = B → multiply both sides by A^{-1} on the right → X = BA^{-1}
  • ABXC = E → put C^{-1} on the right and B^{-1}A^{-1} on the left → X = B^{-1}A^{-1}EC^{-1} (see the sketch below)
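
The sketch below (a numpy illustration with randomly generated matrices, not part of the original post) verifies each of these manipulations; in practice np.linalg.solve is preferred over forming explicit inverses:

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C, E = (rng.standard_normal((3, 3)) for _ in range(4))
inv = np.linalg.inv

# AX = B  ->  X = A^{-1} B   (solve avoids forming the inverse explicitly)
X1 = np.linalg.solve(A, B)
print(np.allclose(A @ X1, B))              # True

# XA = B  ->  X = B A^{-1}
X2 = B @ inv(A)
print(np.allclose(X2 @ A, B))              # True

# ABXC = E  ->  X = B^{-1} A^{-1} E C^{-1}
X3 = inv(B) @ inv(A) @ E @ inv(C)
print(np.allclose(A @ B @ X3 @ C, E))      # True
```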

Elimination As Inverse Solver

To solve Ax = b, we must find the inverse matrix A^{-1}. This can be done via Gauss-Jordan Elimination. This method affords three row operations:

  1. Linear Combination: R_i += kR_j
  2. Scaling: R_i *= k
  3. Row Swap: R_i ⇆ R_j

After expressing the system in the form Ax = b, we can solve for x by creating an augmented matrix of the form [ A | b ], and converting the left-hand side into the identity matrix:

linear-algebra-gauss-jordan-elimination-2

To solve for x algebraically, Ax = b → A^{-1}Ax = A^{-1}b → 𝟙x = A^{-1}b. So Gauss-Jordan facilitates the discovery of the inverse matrix A^{-1}. We can show this computation explicitly, by setting up an augmented matrix of the form [ A | 1 ]

linear-algebra-gauss-jordan-inverse-matrix-6

Row operations are functions that act on vectors. So are matrices. Thus, it isn’t very surprising that our three row operations each correspond to an elementary matrix. An elementary matrix is similar to the identity matrix; in the picture below, only the shaded region differs.

linear-algebra-elementary-matrices-7

We now see that the row operations of Gauss-Jordan elimination are a kind of factoring: we are finding the elementary matrices underlying A^{-1}:

linear-algebra-gauss-jordan-elementary-matrices-2

Thus E_4E_3E_2E_1 = A^{-1}, and Ax = b → (E_4E_3E_2E_1)Ax = (E_4E_3E_2E_1)b → x = A^{-1}b
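
To make the procedure concrete, here is a minimal Gauss-Jordan sketch in numpy (an illustration, not the author's implementation); it applies the three row operations to the augmented matrix [ A | 1 ] until the left block becomes the identity:

```python
import numpy as np

def gauss_jordan_inverse(A):
    """Invert A by row-reducing the augmented matrix [A | I] to [I | A^{-1}]."""
    n = A.shape[0]
    aug = np.hstack([A.astype(float), np.eye(n)])
    for col in range(n):
        # Row swap: move the largest pivot candidate into place (for stability)
        pivot = col + np.argmax(np.abs(aug[col:, col]))
        aug[[col, pivot]] = aug[[pivot, col]]
        # Scaling: normalize the pivot row so the pivot equals 1
        aug[col] /= aug[col, col]
        # Linear combination: clear every other entry in this column
        for row in range(n):
            if row != col:
                aug[row] -= aug[row, col] * aug[col]
    return aug[:, n:]

A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])
A_inv = gauss_jordan_inverse(A)
print(np.allclose(A @ A_inv, np.eye(2)))   # True
```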

Fundamental Vector Spaces

A vector space consists of a set of vectors and all linear combinations of these vectors. The set of all linear combinations is called the span. For example, the vector space S = span{ v_1, v_2 } consists of all vectors of the form v = αv_1 + βv_2, where α and β are real numbers.

Vector spaces can be characterized by a basis: a set of vectors whose linear combinations create the vector space. The vectors of any basis must be linearly independent; they may additionally be chosen orthogonal. You can check whether two vectors are orthogonal by confirming that their dot product u · v = 0. The dimension of a vector space is the cardinality of its basis (the number of basis vectors).

Recall that matrices are functions that map vectors from ℝ^n to ℝ^m. This corresponds to projecting from row space to column space, as follows:

linear-algebra-fundamental-space-interpretation-6

We will unpack the full implications of this graphic another time. For now, it suffices to understand that discovering the bases of these subspaces is a useful exercise.

For example, if we show that a matrix has a non-trivial kernel, this in itself is proof of non-invertibility. Why? Because if T_A sends a nonzero vector to the zero vector, there is no T_A^{-1} that can undo this operation.

We can say more. For an n x n matrix A, the following statements are equivalent:

  1. A is invertible
  2. The determinant of A is nonzero.
  3. The RREF of A is the n x n identity matrix
  4. The rank of the matrix is n
  5. The row space of A is ℝ^n
  6. The column space of A is ℝ^n
  7. The null space of A is trivial: N(A) = {0}

Elimination as Basis Identifier

We have seen elimination discover (and factorize) the matrix inverse. But what happens when an inverse does not exist? Well, elimination has a natural, unique stopping point:

linear-algebra-rref-derivation-2

Matrices in reduced row echelon form (RREF) have the following properties:

  1. Rows with all zeroes are moved to the bottom.
  2. The first nonzero number from the left (pivot) is to the right of the pivot above it.
  3. Every pivot equals 1 and is the only nonzero entry in its column

Gauss-Jordan Elimination can replace every matrix with an equivalent one in rref form. Note that we have seen several examples where elimination creates the identity matrix 1 when A is invertible. This is only possible because 1 fits the rref criteria.

Once a matrix is in rref form, it becomes trivial to discover the basis for its three fundamental spaces:

  • R(A) basis: the non-zero rows of the rref
  • C(A) basis: the columns of the original matrix whose rref columns contain a pivot
  • N(A) basis: the special solutions to Ax = 0

linear-algebra-deriving-fundamental-spaces-2

We see that rank(A) + nullity(A) = 3 + 1 = 4, which indeed is the number of columns. Also, the column space of A equals the row space of A^T. This is not an accident, since transpose simply swaps rows with columns.
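
Sympy makes these derivations easy to reproduce. The sketch below is illustrative only; it uses a stand-in 3×4 matrix (with rank 3 and nullity 1) rather than the specific matrix from the figure:

```python
import sympy as sp

# A stand-in 3x4 matrix (the figure's matrix is not reproduced here)
A = sp.Matrix([[1, 1, 0, 2],
               [0, 1, 1, 3],
               [1, 0, 1, 1]])

R, pivot_cols = A.rref()
print(R)             # the reduced row echelon form
print(pivot_cols)    # (0, 1, 2): the pivot columns

# R(A) basis: the nonzero rows of R
# C(A) basis: the pivot columns of the original A
print(A.columnspace())
# N(A) basis: the special solutions of Ax = 0
print(A.nullspace())

# rank + nullity = number of columns (3 + 1 = 4)
print(A.rank(), len(A.nullspace()), A.shape[1])
```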

Takeaways

  • Vectors are groups of numbers; matrices are functions that act on vectors.
  • We can solve linear equations by conducting Gauss-Jordan elimination.
  • Gauss-Jordan elimination-based solutions rely on searching for the inverse matrix A^{-1}.
  • A matrix’s row vectors live in its domain; its column vectors live in its codomain.
  • Even if a matrix is not invertible, elimination can find its most reduced form (RREF).
  • RREF matrices can be used to derive fundamental subspaces.

Until next time.
