Singular Value Decomposition

Part Of: Algebra sequence
Followup To: Eigenvalues and Eigenvectors
Content Summary: 1300 words, 13 min read.

Limitations of Eigendecomposition

Last time, we learned how to locate eigenvalues and eigenvectors of a given matrix. Diagonalization is the process where a square matrix can be decomposed into two factors: a matrix Q (with eigenvectors along the columns) and a matrix \Lambda (with eigenvalues along the diagonal).

A = Q \Lambda Q^T

But as we saw with the spectral theorem, eigendecomposition only works well against square, symmetric matrices. If a matrix isn’t symmetric, it is easy to run into complex eigenvalues. And if a matrix isn’t square, you are out of luck entirely!

Eigendecomposition- Spectral Theorem (1)

Can we generalize eigendecomposition to apply to a wider family of matrices? Can we diagonalize matrices of any size, even if they don’t have “nice” properties?

Yes. Self-transposition is the key insight of our “eigendecomposition 2.0”. We define the self-transpositions of A as AA^{T} and A^{T}A.

Suppose A \in \mathbb{R}^{m x n}. Then AA^T \in \mathbb{R}^{mxm} and A^TA \in \mathbb{R}^{nxn}. So these matrices are square. But they are also symmetric!

To illustrate, consider the following.

A = \begin{bmatrix}  4 & 4 \\ -3 & 3 \\ \end{bmatrix}

Since A is not symmetric, we have no guarantee that its eigenvalues are real. Indeed, its eigenvalues turn out to be complex:

\det(A - \lambda I) = \begin{bmatrix} 4 - \lambda & 4 \\ -3 & 3 - \lambda \\ \end{bmatrix} = 0

(12 - 7 \lambda + \lambda^2) + 12 = 0 \Rightarrow \lambda^2 -7 \lambda + 24 = 0

\lambda = \frac{7 \pm \sqrt{(-7)^2 - 4*1*24}}{2*1} = \frac{7}{2} \pm \frac{\sqrt{47}i}{2}

Eigendecomposition on A sucks. Are the self-transposed matrices any better?

A^TA = \begin{bmatrix} 4 & -3 \\ 4 & 3 \\ \end{bmatrix} \begin{bmatrix} 4 & 4 \\ -3 & 3 \\ \end{bmatrix} = \begin{bmatrix} 25 & 7 \\ 7 & 25 \\ \end{bmatrix}

AA^T = \begin{bmatrix} 4 & 4 \\ -3 & 3 \\ \end{bmatrix} \begin{bmatrix} 4 & -3 \\ 4 & 3 \\ \end{bmatrix} = \begin{bmatrix} 32 & 0 \\ 0 & 18 \\ \end{bmatrix}

These matrices are symmetric! Thus, they are better candidates for eigendecomposition.

Towards Singular Value Decomposition

Singular Value Decomposition (SVD) is based on the principle that all matrices are eigendecomposable after self-transposition. It is essentially a bug fix:

SVD- Self-Transposition

An important way to picture SVD is with the idea of orthogonal bases. It is relatively easy to find any number of orthogonal bases for a given rowspace. Call the matrix of orthogonal vectors V.

We desire to find an orthogonal basis V such that AV produces an orthogonal basis in the column space. Orthogonal bases are not particularly hard to find. But most orthogonal bases, once projected to column space, will lose their orthogonality! We desire that particular orthogonal basis U such that V is also orthogonal.

We won’t require vectors in U to be the same size as those in V. Instead, we will normalize V; its basis vectors will be orthonormal (orthogonal and normal). Then the length of each vector in U will diverge by a scaling factor.

As we will soon see, these scaling factors are not eigenvalues. Instead, we will use sigmas instead of lambdas.

  • Scaling factors \sigma , analogous to eigenvalues \lambda.
  • Diagonal matrix \Sigma , analogous to diagonal matrix \Lambda

Our full picture then, looks like this:

SVD- Orthogonal Basis Transfer

Let us now translate this image into matrix language.

A \begin{bmatrix} \vdots & \vdots & \vdots & \vdots \\ v_1 & v_2 & \dots & v_n \\ \vdots & \vdots & \vdots & \vdots \\ \end{bmatrix} = \begin{bmatrix} \vdots & \vdots & \vdots & \vdots \\ \sigma_1u_1 & \sigma_2u_2 & \dots & \sigma_nu_n \\ \vdots & \vdots & \vdots & \vdots \\ \end{bmatrix}

But we can easily factorize the right-hand side:

\begin{bmatrix} \vdots & \vdots & \vdots & \vdots \\ \sigma_1u_1 & \sigma_2u_2 & \dots & \sigma_nu_n \\ \vdots & \vdots & \vdots & \vdots \\ \end{bmatrix} = \begin{bmatrix} \vdots & \vdots & \vdots & \vdots \\ u_1 & u_2 & \dots & u_n \\ \vdots & \vdots & \vdots & \vdots \\ \end{bmatrix} * \begin{bmatrix} \sigma_1 & 0 & \dots & 0 \\ 0 & \sigma_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \sigma_n \\ \end{bmatrix}

So we have that:

AV = U \Sigma

Since both V and U are orthogonal, inversion of either is equivalent to transposition:

A = U \Sigma V^{-1} = U \Sigma V^T

This strongly resembles our diagonalization equation A = Q \Lambda Q^T. SVD distinguishes itself by considering two orthogonal eigenmatrices U and V, not just one (Q).

Recalling that A = U \Sigma V^T ,

A^TA = (V \Sigma^T U^T) (U \Sigma V^T)

But now the innermost term cancels. Since \Sigma is a square diagonal matrix, its self-transposition is simply equal to \Sigma^{2}. So,

A^TA = V \Sigma^2 V^T

Since A^{T}A is a square, symmetric matrix, our diagonalization theorem applies!

A^TA = V \Sigma ^2 V^T = Q \Lambda Q^T

To find U, a similar trick works:

AA^T =  (U \Sigma V^T)(V \Sigma^T U^T) = U \Sigma^2 U^T = Q \Lambda Q^T

The relationships between SVD and eigendecomposition are as follows:

  • V is the eigenvectors of A^TA
  • U is the eigenvectors of AA^T
  • \Sigma is the square root of the eigenvalues matrix \Lambda

If any eigenvalue is negative, the corresponding sigma factor would be complex. But A^TA and AA^T are positive-semidefinite, which guarantees non-negative eigenvalues. This assures us that \Sigma contains only real values.

In contrast to eigendecomposition, every matrix has an SVD decomposition. Geometrically, V and U act as rotational transformations, and Sigma acts as a scaling transformation. In other words, every linear transformation comprises a rotation, then scaling, then another rotation.

SVD- Rotation Scaling Rotation (1)

A Worked Example

Let’s revisit A. Recall that:

A = \begin{bmatrix} 4 & 4 \\ -3 & 3 \\ \end{bmatrix}, A^TA = \begin{bmatrix} 25 & 7 \\ 7 & 25 \\ \end{bmatrix}, AA^T = \begin{bmatrix} 32 & 0 \\ 0 & 18 \\ \end{bmatrix}

Eigendecomposition against A is unpleasant because A is not symmetric. But A^{T}A is guaranteed to be positive semi-definite; that is, to have non-negative eigenvalues. Let’s see this in action.

det(A^TA - \lambda I) = \begin{bmatrix} 25 - \lambda & 7 \\ 7 & 25 - \lambda \\ \end{bmatrix} = 0

\lambda^2 - 50 \lambda + 576 = 0 \Rightarrow (\lambda_1 - 32)(\lambda_2 - 18) = 0

\lambda_1 = 32, \lambda_2 = 18

trace(A^TA) = 50 = \sum{\lambda_i}

det(A^TA) = 625-49 = 576 = 18 * 32 = \prod{\lambda_i}

These are positive, real eigenvalues. Perfect! Let’s now derive the corresponding (normalized) eigenvectors.

A - 32I = \begin{bmatrix} -7 & 7 \\ 7 & -7 \\ \end{bmatrix} = 0, A - 18I = \begin{bmatrix} 7 & 7 \\ 7 & 7 \\ \end{bmatrix} = 0

rref(A - 32I) = \begin{bmatrix} -1 & 1 \\ 0 & 0 \\ \end{bmatrix} = 0, rref(A-18I) = \begin{bmatrix} 1 & 1 \\ 0 & 0 \\ \end{bmatrix} = 0

v_1 = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \\ \end{bmatrix}, v_2 = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{-1}{\sqrt{2}} \\ \end{bmatrix}

SVD intends to decompose A into U \Sigma V^{T}. The above findings give us two of these ingredients.

V^{T} = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{-1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \end{bmatrix}

\Sigma =\begin{bmatrix} \sqrt{32} & 0 \\ 0 & \sqrt{18} \\ \end{bmatrix}

What’s missing? U! To find it, we perform eigendecomposition on AA^{T}. This is an especially easy task, because AA^{T} is already a diagonal matrix.

AA^T = \begin{bmatrix} 32 & 0 \\ 0 & 18 \\ \end{bmatrix}

U = \begin{bmatrix} u_1 & u_2 \\ \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ \end{bmatrix}

We have arrived at our first Singular Value Decomposition.

A = U \Sigma V^T = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ \end{bmatrix} \begin{bmatrix} \sqrt{32} & 0 \\ 0 & \sqrt{18} \\ \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & - \frac{1}{\sqrt{2}} \\ \end{bmatrix}

Okay, so let’s check our work. 😛

A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ \end{bmatrix} \left( \begin{bmatrix} \sqrt{32} & 0 \\ 0 & \sqrt{18} \\ \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & - \frac{1}{\sqrt{2}} \\ \end{bmatrix} \right) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ \end{bmatrix} \begin{bmatrix} 4 & 4 \\ 3 & 3 \\ \end{bmatrix} = \begin{bmatrix} 4 & 4 \\ 3 & 3 \\ \end{bmatrix}

These matrices are a viable factorization: multiplication successfully recovers A.

Takeaways

  • Eigendecomposition only works for a subclass of matrices; SVD decomposes all matrices.
  • SVD relies on self-transposition to convert any arbitrary matrix into one that works well against eigendecomposition (guarantees square m = n and symmetric A = A^{T}).
  • Another way to interpret SVD is by taking a special kind of orthogonal basis that, once passed through the linear transformation, preserves its orthogonality.
  • Every matrix A = U \Sigma V^{T}. That is, every linear transformation can be conceived as rotation + scaling + rotation.

Until next time.

Eigenvalues and Eigenvectors

Part Of: Algebra sequence
Followup To: An Introduction to Linear Algebra
Next Up: Singular Value Decomposition
Content Summary: 1300 words, 13 min read

Geometries of Eigenvectors

Matrices are functions that act on vectors, by mapping from row-vectors to column-vectors.  Consider two examples:

  1. Reflection matrices, which reflect vectors across some basis.
  2. Rotation matrices, which rotate vectors clockwise by \theta degrees.

eigenvectors-transformation-geometries-1

The set of eigenvectors of a matrix A is a special set of input vectors for which the matrix behaves as a scaling transformation. In other words, we desire the set of vectors \vec{x} whose output vectors A\vec{x} differ by a scaling factor.

Eigenvectors have a straightforward geometric interpretation:

  1. Reflection eigenvectors are orthogonal or parallel to the reflecting surface. In the left image above, that is the top two pairs of vectors.
  2. Rotation eigenvectors do not exist (more formally, cannot be visualized in \mathbb{R}^2).

Algebra of Eigenvectors

We can express our “parallel output” property as:

A\vec{x} = \lambda \vec{x}

Thus \vec{x} and A\vec{x} point in the same direction, but differ by scaling factor \lambda.

Scaling factor \lambda is the eigenvalue. There can be many \left( x, \lambda \right) pairs that satisfy the above equality.

For an \mathbb{R}^{n x n} matrix, there are n eigenvalues. These eigenvalues can be difficult to find. However, two facts aid our search:

  • The sum of eigenvalues equals the trace (sum of values along the diagonal).
  • The product of eigenvalues equals the determinant.

To solve, subtract \lambda \vec{x} from both sides:

A\vec{x} = \lambda \vec{x}

(A - \lambda I)\vec{x} = 0

We would like to identify n unique eigenvectors. But if the new matrix (A - \lambda I) has an empty nullspace, it will contain zero eigenvectors. So we desire this new matrix to be singular.

How to accomplish this?  By finding eigenvalues that satisfy the characteristic equation \det(A - \lambda I) = 0. Matrices are singular iff their determinants equal zero.

Let’s work through an example! What is the eigendecompositon for matrix A:

A = \begin{bmatrix} 3 & 1 \\ 1 & 3 \\ \end{bmatrix}

We need to find eigenvalues that solve the characteristic equation.

\det(A - \lambda I) = \begin{vmatrix} 3-\lambda & 1 \\ 1 & 3-\lambda \\ \end{vmatrix} = 0

(3 - \lambda)^2 - 1^2 = \lambda_2 -6\lambda + 8 = (\lambda-2)(\lambda-4) = 0

\lambda_1 = 2, \lambda_2 = 4

Are these eigenvalues correct? Let’s check our work:

trace(A) = 6 = \sum{\lambda_i}

det(A) = 8 = \prod{\lambda_i}

How to find our eigenvectors? By solving the nullspace given each eigenvalue.

For \lambda_1=2 :

A - 2I = \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ \end{bmatrix} \Rightarrow rref(A - 2I) = \begin{bmatrix} 1 & 1 \\ 0 & 0 \\ \end{bmatrix}

(\lambda_1, \vec{x}_1) = (2, \begin{bmatrix} 1 \\ -1 \\ \end{bmatrix})

For \lambda_2=4 :

A - 4I = \begin{bmatrix} -1 & 1 \\ 1 & -1 \\ \end{bmatrix} \Rightarrow rref(A - 4I) = \begin{bmatrix} -1 & 1 \\ 0 & 0 \\ \end{bmatrix}

(\lambda_2, \vec{x}_2) = (4, \begin{bmatrix} 1 \\ 1 \\ \end{bmatrix})

Desirable Matrix Properties

The above example was fairly straightforward. But eigendecomposition can “go awry”, as we shall see. Consider a rotation matrix, which in two dimensions has the following form:

R = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \\ \end{bmatrix}

What are the eigenvalues for rotation \theta = 90^{\circ} ?

R = \begin{bmatrix} 0 & -1 \\ 1 & 0 \\ \end{bmatrix}

\det(R - \lambda I) = \begin{bmatrix} - \lambda & -1 \\ 1 & - \lambda \\ \end{bmatrix} = 0

(- \lambda)^2 - 1^2 \Rightarrow \lambda^2 = 1

\lambda_1 = i, \lambda_2 = -i

We can check our work:

trace(R) = 0 = \sum{\lambda_i}

det(R) = 1 = \prod{\lambda_i}

We saw earlier that rotation matrices have no geometric interpretation. Here, we have algebraically shown that its eigenvalues are complex.

A = \left[ \begin{smallmatrix} 3 & 1 \\ 1 & 3 \\ \end{smallmatrix} \right] has real eigenvalues, but R = \left[ \begin{smallmatrix} 0 & -1 \\ 1 & 0 \\ \end{smallmatrix} \right] has less-desirable complex eigenvalues.

We can generalize the distinction between A and R as follows:

Spectral Theorem. Any matrix that is symmetric (A = AT) is guaranteed to have real, nonnegative eigenvalues. The corresponding n eigenvectors are guaranteed to be orthogonal.

In other words, eigendecomposition works best against symmetric matrices.

Eigendecomposition- Spectral Theorem (1)

Diagonalization

Let us place each eigenvector in the column of a matrix S. What happens when you multiply the original matrix A by this new matrix? Since S contains eigenvectors, multiplication by A reduces to multiplication by the associated eigenvalues:

AS = \begin{bmatrix} \vdots & \vdots & \vdots & \vdots \\ \lambda_1x_1 & \lambda_2x_2 & \dots & \lambda_nx_n \\ \vdots & \vdots & \vdots & \vdots \\ \end{bmatrix}

We see the product contains a mixture of eigenvalues and eigenvectors. We can separate these by “pulling out” the eigenvalues into a diagonal matrix. Call this matrix \Lambda (“capital lambda”).

AS = \begin{bmatrix} \vdots & \vdots & \vdots & \vdots \\ x_1 & x_2 & \dots & x_n \\ \vdots & \vdots & \vdots & \vdots \\ \end{bmatrix} * \begin{bmatrix} \lambda_1 & 0 & \dots & 0 \\ 0 & \lambda_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \lambda_n \\ \end{bmatrix} = S \Lambda

Most matrices have the property that its eigenvectors are linearly independent. For such matrices, S is invertible. Given this fact, we can solve for A:

\Lambda = S^{-1} A S

A = S \Lambda S^{-1}

Matrices that can be factorized in this way are said to be diagonalizable. We can see that both elimination and eigendecomposition are performing the same type of work: factorizing matrices into their component parts.

If A is symmetric, then we know \Lambda is real, and its eigenvectors in S are orthogonal. Let us rename S to be Q, to reflect this additional property. But orthogonal matrices have the property that transposition equats inversion: Q^T = Q^{-1}. Thus, if A is symmetric, we can simplify the diagonalization formula to:

A = Q \Lambda Q^{-1} = Q \Lambda Q^T

Asymptoptic Interpretations

This diagonalization approach illustrates an important use case of eigenvectors: power matrices. What happens when A is applied arbitrarily many times? What does the output look like in the limit?

We can use the diagonalization equation to represent A^k:

A^k = \prod_{i=1}^{k} (Q^{-1} \Lambda Q) = (Q^{-1} \Lambda Q)(Q^{-1} A Q)\dots(Q^{-1} \Lambda Q)

We can simplify by canceling the inner terms QQ^{-1}:

A^k = Q^{-1} \Lambda^k Q

This equation tells us that the eigenvectors is invariant to how many times A is applied. In contrast, eigenvalue matrix \Lambda has important implications for ongoing processes:

  • If each eigenvalue has magnitude less than one, the output will trend towards zero.
  • If each eigenvalue has magnitude greater than one, the output will trend to infinity.

Fibonacci Eigenvalues

The powers interpretation of eigenvalues sheds light on the behavior of all linear processes. This includes number sequences such as the Fibonacci numbers, where each number is the sum of the previous two numbers.

Recall the Fibonacci numbers are 0,1,1,2,3,5,8,13,... What is F_{100} ?

Eigenvalues can answer this question. We must first express the Fibonacci generator as a linear equation:

F(k+2) = 1F(k+1) + 1F(k)

In order to translate this into a meaningful matrix, we must add a “redundant” equation

F(k+1) = 1F(k+1) + 0F(k)

With these equations, we can create a 2×2 Fibonacci matrix F.

F = \begin{bmatrix} 1 & 1 \\ 1 & 0 \\ \end{bmatrix}

This matrix uniquely generates Fibonacci numbers.

u_1 = Fu_0 = \begin{bmatrix} 1 & 1 \\ 1 & 0 \\ \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ \end{bmatrix}

u_4 = F^4u_0 = \begin{bmatrix} 1 & 1 \\ 1 & 0 \\ \end{bmatrix}^4 \begin{bmatrix} 1 \\ 0 \\ \end{bmatrix} = \begin{bmatrix} 5 \\ 3 \\ \end{bmatrix}

To discover the rate at Fibonacci numbers grow, we decompose F into its eigenvalues:

\det(F - \lambda I) = \begin{vmatrix} 1 - \lambda & 1 \\ 1 & 1- \lambda \\ \end{vmatrix} = 0

 \lambda^2 - 2\lambda - 1 = 0

 \lambda_1 = \frac{1 + \sqrt{5}}{2}, \lambda_2 = \frac{1 - \sqrt{5}}{2}

 \lambda_1 = 1.61803, \lambda_2 = -0.61803

trace(F) = 1 = \sum{\lambda_i}

det(F) = -1 = \prod{\lambda_i}

We can go on to discover eigenvectors x_1 and x_2. We can then express the Fibonnaci matrix F as

F = \lambda_1x_1 + \lambda_2 x_2

F^k = \lambda_1^k x_1 + \lambda_2^k x_2

As k goes to infinity, the second term goes to zero. Thus, the ratio is dominated by the larger eigenvalue, 1.61803.

Mathematicians in the audience will recognize this number as the golden ratio.

eigenvectors-fibonacci-and-golden-ratio

We have long known that the ratio of successive Fibonnaci numbers converges to 1.61803. Eigenvalues provide a mechanism to derive this value analytically.

Until next time.