# Singular Value Decomposition

Part Of: Algebra sequence
Followup To: Eigenvalues and Eigenvectors
Content Summary: 1300 words, 13 min read.

Limitations of Eigendecomposition

Last time, we learned how to locate eigenvalues and eigenvectors of a given matrix. Diagonalization is the process where a square matrix can be decomposed into two factors: a matrix $Q$ (with eigenvectors along the columns) and a matrix $\Lambda$ (with eigenvalues along the diagonal).

$A = Q \Lambda Q^T$

But as we saw with the spectral theorem, eigendecomposition only works well against square, symmetric matrices. If a matrix isn’t symmetric, it is easy to run into complex eigenvalues. And if a matrix isn’t square, you are out of luck entirely!

Can we generalize eigendecomposition to apply to a wider family of matrices? Can we diagonalize matrices of any size, even if they don’t have “nice” properties?

Yes. Self-transposition is the key insight of our “eigendecomposition 2.0”. We define the self-transpositions of $A$ as $AA^{T}$ and $A^{T}A$.

Suppose $A \in \mathbb{R}^{m x n}$. Then $AA^T \in \mathbb{R}^{mxm}$ and $A^TA \in \mathbb{R}^{nxn}$. So these matrices are square. But they are also symmetric!

To illustrate, consider the following.

$A = \begin{bmatrix} 4 & 4 \\ -3 & 3 \\ \end{bmatrix}$

Since A is not symmetric, we have no guarantee that its eigenvalues are real. Indeed, its eigenvalues turn out to be complex:

$\det(A - \lambda I) = \begin{bmatrix} 4 - \lambda & 4 \\ -3 & 3 - \lambda \\ \end{bmatrix} = 0$

$(12 - 7 \lambda + \lambda^2) + 12 = 0 \Rightarrow \lambda^2 -7 \lambda + 24 = 0$

$\lambda = \frac{7 \pm \sqrt{(-7)^2 - 4*1*24}}{2*1} = \frac{7}{2} \pm \frac{\sqrt{47}i}{2}$

Eigendecomposition on $A$ sucks. Are the self-transposed matrices any better?

$A^TA = \begin{bmatrix} 4 & -3 \\ 4 & 3 \\ \end{bmatrix} \begin{bmatrix} 4 & 4 \\ -3 & 3 \\ \end{bmatrix} = \begin{bmatrix} 25 & 7 \\ 7 & 25 \\ \end{bmatrix}$

$AA^T = \begin{bmatrix} 4 & 4 \\ -3 & 3 \\ \end{bmatrix} \begin{bmatrix} 4 & -3 \\ 4 & 3 \\ \end{bmatrix} = \begin{bmatrix} 32 & 0 \\ 0 & 18 \\ \end{bmatrix}$

These matrices are symmetric! Thus, they are better candidates for eigendecomposition.

Towards Singular Value Decomposition

Singular Value Decomposition (SVD) is based on the principle that all matrices are eigendecomposable after self-transposition. It is essentially a bug fix:

An important way to picture SVD is with the idea of orthogonal bases. It is relatively easy to find any number of orthogonal bases for a given rowspace. Call the matrix of orthogonal vectors $V$.

We desire to find an orthogonal basis $V$ such that $AV$ produces an orthogonal basis in the column space. Orthogonal bases are not particularly hard to find. But most orthogonal bases, once projected to column space, will lose their orthogonality! We desire that particular orthogonal basis $U$ such that $V$ is also orthogonal.

We won’t require vectors in $U$ to be the same size as those in $V$. Instead, we will normalize $V$; its basis vectors will be orthonormal (orthogonal and normal). Then the length of each vector in $U$ will diverge by a scaling factor.

As we will soon see, these scaling factors are not eigenvalues. Instead, we will use sigmas instead of lambdas.

• Scaling factors $\sigma$ , analogous to eigenvalues $\lambda$.
• Diagonal matrix $\Sigma$ , analogous to diagonal matrix $\Lambda$

Our full picture then, looks like this:

Let us now translate this image into matrix language.

$A \begin{bmatrix} \vdots & \vdots & \vdots & \vdots \\ v_1 & v_2 & \dots & v_n \\ \vdots & \vdots & \vdots & \vdots \\ \end{bmatrix} = \begin{bmatrix} \vdots & \vdots & \vdots & \vdots \\ \sigma_1u_1 & \sigma_2u_2 & \dots & \sigma_nu_n \\ \vdots & \vdots & \vdots & \vdots \\ \end{bmatrix}$

But we can easily factorize the right-hand side:

$\begin{bmatrix} \vdots & \vdots & \vdots & \vdots \\ \sigma_1u_1 & \sigma_2u_2 & \dots & \sigma_nu_n \\ \vdots & \vdots & \vdots & \vdots \\ \end{bmatrix} = \begin{bmatrix} \vdots & \vdots & \vdots & \vdots \\ u_1 & u_2 & \dots & u_n \\ \vdots & \vdots & \vdots & \vdots \\ \end{bmatrix} * \begin{bmatrix} \sigma_1 & 0 & \dots & 0 \\ 0 & \sigma_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \sigma_n \\ \end{bmatrix}$

So we have that:

$AV = U \Sigma$

Since both $V$ and $U$ are orthogonal, inversion of either is equivalent to transposition:

$A = U \Sigma V^{-1} = U \Sigma V^T$

This strongly resembles our diagonalization equation $A = Q \Lambda Q^T$. SVD distinguishes itself by considering two orthogonal eigenmatrices $U$ and $V$, not just one ($Q$).

Recalling that $A = U \Sigma V^T$ ,

$A^TA = (V \Sigma^T U^T) (U \Sigma V^T)$

But now the innermost term cancels. Since $\Sigma$ is a square diagonal matrix, its self-transposition is simply equal to $\Sigma^{2}$. So,

$A^TA = V \Sigma^2 V^T$

Since $A^{T}A$ is a square, symmetric matrix, our diagonalization theorem applies!

$A^TA = V \Sigma ^2 V^T = Q \Lambda Q^T$

To find $U$, a similar trick works:

$AA^T = (U \Sigma V^T)(V \Sigma^T U^T) = U \Sigma^2 U^T = Q \Lambda Q^T$

The relationships between SVD and eigendecomposition are as follows:

• $V$ is the eigenvectors of $A^TA$
• $U$ is the eigenvectors of $AA^T$
• $\Sigma$ is the square root of the eigenvalues matrix $\Lambda$

If any eigenvalue is negative, the corresponding sigma factor would be complex. But $A^TA$ and $AA^T$ are positive-semidefinite, which guarantees non-negative eigenvalues. This assures us that $\Sigma$ contains only real values.

In contrast to eigendecomposition, every matrix has an SVD decomposition. Geometrically, $V$ and $U$ act as rotational transformations, and Sigma acts as a scaling transformation. In other words, every linear transformation comprises a rotation, then scaling, then another rotation.

A Worked Example

Let’s revisit $A$. Recall that:

$A = \begin{bmatrix} 4 & 4 \\ -3 & 3 \\ \end{bmatrix}, A^TA = \begin{bmatrix} 25 & 7 \\ 7 & 25 \\ \end{bmatrix}, AA^T = \begin{bmatrix} 32 & 0 \\ 0 & 18 \\ \end{bmatrix}$

Eigendecomposition against $A$ is unpleasant because $A$ is not symmetric. But $A^{T}A$ is guaranteed to be positive semi-definite; that is, to have non-negative eigenvalues. Let’s see this in action.

$det(A^TA - \lambda I) = \begin{bmatrix} 25 - \lambda & 7 \\ 7 & 25 - \lambda \\ \end{bmatrix} = 0$

$\lambda^2 - 50 \lambda + 576 = 0 \Rightarrow (\lambda_1 - 32)(\lambda_2 - 18) = 0$

$\lambda_1 = 32, \lambda_2 = 18$

$trace(A^TA) = 50 = \sum{\lambda_i}$

$det(A^TA) = 625-49 = 576 = 18 * 32 = \prod{\lambda_i}$

These are positive, real eigenvalues. Perfect! Let’s now derive the corresponding (normalized) eigenvectors.

$A - 32I = \begin{bmatrix} -7 & 7 \\ 7 & -7 \\ \end{bmatrix} = 0, A - 18I = \begin{bmatrix} 7 & 7 \\ 7 & 7 \\ \end{bmatrix} = 0$

$rref(A - 32I) = \begin{bmatrix} -1 & 1 \\ 0 & 0 \\ \end{bmatrix} = 0, rref(A-18I) = \begin{bmatrix} 1 & 1 \\ 0 & 0 \\ \end{bmatrix} = 0$

$v_1 = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \\ \end{bmatrix}, v_2 = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{-1}{\sqrt{2}} \\ \end{bmatrix}$

SVD intends to decompose $A$ into $U \Sigma V^{T}$. The above findings give us two of these ingredients.

$V^{T} = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{-1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \end{bmatrix}$

$\Sigma =\begin{bmatrix} \sqrt{32} & 0 \\ 0 & \sqrt{18} \\ \end{bmatrix}$

What’s missing? $U$! To find it, we perform eigendecomposition on $AA^{T}$. This is an especially easy task, because $AA^{T}$ is already a diagonal matrix.

$AA^T = \begin{bmatrix} 32 & 0 \\ 0 & 18 \\ \end{bmatrix}$

$U = \begin{bmatrix} u_1 & u_2 \\ \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ \end{bmatrix}$

We have arrived at our first Singular Value Decomposition.

$A = U \Sigma V^T = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ \end{bmatrix} \begin{bmatrix} \sqrt{32} & 0 \\ 0 & \sqrt{18} \\ \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & - \frac{1}{\sqrt{2}} \\ \end{bmatrix}$

Okay, so let’s check our work. 😛

$A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ \end{bmatrix} \left( \begin{bmatrix} \sqrt{32} & 0 \\ 0 & \sqrt{18} \\ \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & - \frac{1}{\sqrt{2}} \\ \end{bmatrix} \right) = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ \end{bmatrix} \begin{bmatrix} 4 & 4 \\ 3 & 3 \\ \end{bmatrix} = \begin{bmatrix} 4 & 4 \\ 3 & 3 \\ \end{bmatrix}$

These matrices are a viable factorization: multiplication successfully recovers $A$.

Takeaways

• Eigendecomposition only works for a subclass of matrices; SVD decomposes all matrices.
• SVD relies on self-transposition to convert any arbitrary matrix into one that works well against eigendecomposition (guarantees square $m = n$ and symmetric $A = A^{T}$).
• Another way to interpret SVD is by taking a special kind of orthogonal basis that, once passed through the linear transformation, preserves its orthogonality.
• Every matrix $A = U \Sigma V^{T}$. That is, every linear transformation can be conceived as rotation + scaling + rotation.

Until next time.