
Singular Value Decompositions


Learning Objectives

Singular Value Decomposition

An $m \times n$ real matrix $A$ has a singular value decomposition of the form

$$A = U \Sigma V^T$$

where $U$ is an $m \times m$ orthogonal matrix, $V$ is an $n \times n$ orthogonal matrix, and $\Sigma$ is an $m \times n$ diagonal matrix with

$$\Sigma = \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_s \\ 0 & \cdots & 0 \\ \vdots & & \vdots \\ 0 & \cdots & 0 \end{bmatrix} \text{ when } m > n, \qquad \text{and} \qquad \Sigma = \begin{bmatrix} \sigma_1 & & & 0 & \cdots & 0 \\ & \ddots & & \vdots & & \vdots \\ & & \sigma_s & 0 & \cdots & 0 \end{bmatrix} \text{ when } m < n,$$

where $s = \min(m, n)$ and $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_s \ge 0$ are the square roots of the eigenvalues of $A^T A$. The diagonal entries are called the singular values of $A$.
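
As a quick illustration, the factorization can be computed with NumPy (a minimal sketch; note that `np.linalg.svd` returns the singular values as a 1D array and returns $V^T$ rather than $V$):

```python
import numpy as np

A = np.random.rand(5, 3)  # an arbitrary m x n matrix with m > n

# Full SVD: U is m x m, Vt is n x n, and sigma holds the s = min(m, n)
# singular values in descending order
U, sigma, Vt = np.linalg.svd(A, full_matrices=True)

# Assemble the m x n diagonal matrix Sigma from the 1D array of singular values
Sigma = np.zeros(A.shape)
Sigma[:sigma.size, :sigma.size] = np.diag(sigma)

assert np.allclose(A, U @ Sigma @ Vt)  # A = U Sigma V^T up to roundoff
```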

Time Complexity

The time complexity for computing the SVD factorization of an arbitrary $m \times n$ matrix is proportional to $m^2 n + n^3$, where the constant of proportionality ranges from 4 to 10 (or more) depending on the algorithm.

In general, we can define the cost as:

$$\mathcal{O}(m^2 n + n^3)$$
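
To see this scaling in practice, one can time `np.linalg.svd` on square matrices of increasing size; for $m = n$ the bound reduces to $\mathcal{O}(n^3)$, so doubling $n$ should grow the runtime by roughly a factor of 8. A rough sketch (actual timings depend on the underlying LAPACK build):

```python
import time
import numpy as np

# For square matrices (m = n) the cost bound reduces to O(n^3), so doubling
# n should increase the runtime by roughly a factor of 8
for n in [200, 400, 800]:
    A = np.random.rand(n, n)
    start = time.perf_counter()
    np.linalg.svd(A)
    print(f"n = {n:4d}: {time.perf_counter() - start:.3f} s")
```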

Reduced SVD

The SVD factorization of a non-square matrix $A$ of size $m \times n$ can be represented in a reduced format: for $m > n$, $U$ is reduced to an $m \times n$ matrix and $\Sigma$ to an $n \times n$ matrix (while $V$ remains $n \times n$); for $m < n$, $V$ is reduced to an $n \times m$ matrix and $\Sigma$ to an $m \times m$ matrix (while $U$ remains $m \times m$).

The following figure depicts the reduced SVD factorization (in red) against the full SVD factorization (in gray).

In general, we will represent the reduced SVD as:

$$A = U_R \Sigma_R V_R^T$$

where $U_R$ is an $m \times s$ matrix, $V_R$ is an $n \times s$ matrix, $\Sigma_R$ is an $s \times s$ matrix, and $s = \min(m, n)$.
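
In NumPy, the reduced factorization is obtained by passing `full_matrices=False`; a minimal sketch:

```python
import numpy as np

A = np.random.rand(5, 3)  # m = 5, n = 3, so s = 3

# Reduced SVD: U_R is m x s, sigma has the s singular values, Vt_R is s x n
U_R, sigma, Vt_R = np.linalg.svd(A, full_matrices=False)

print(U_R.shape, sigma.shape, Vt_R.shape)  # (5, 3) (3,) (3, 3)
assert np.allclose(A, U_R @ np.diag(sigma) @ Vt_R)
```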

Example: Computing the SVD

We begin with the following non-square matrix, $A$:

$$A = \begin{bmatrix} 3 & 2 & 3 \\ 8 & 8 & 2 \\ 8 & 7 & 4 \\ 1 & 8 & 7 \\ 6 & 4 & 7 \end{bmatrix}$$

and we will compute the reduced form of the SVD (where here $s = 3$):

(1) Compute $A^T A$:

$$A^T A = \begin{bmatrix} 174 & 158 & 106 \\ 158 & 197 & 134 \\ 106 & 134 & 127 \end{bmatrix}$$

(2) Compute the eigenvectors and eigenvalues of $A^T A$:

$$\lambda_1 = 437.479, \quad \lambda_2 = 42.6444, \quad \lambda_3 = 17.8766,$$

$$v_1 = \begin{bmatrix} 0.585051 \\ 0.652648 \\ 0.481418 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 0.710399 \\ -0.126068 \\ -0.692415 \end{bmatrix}, \quad v_3 = \begin{bmatrix} 0.391212 \\ -0.747098 \\ 0.537398 \end{bmatrix}$$

(3) Construct $V_R$ from the eigenvectors of $A^T A$:

$$V_R = \begin{bmatrix} 0.585051 & 0.710399 & 0.391212 \\ 0.652648 & -0.126068 & -0.747098 \\ 0.481418 & -0.692415 & 0.537398 \end{bmatrix}.$$

(4) Construct $\Sigma_R$ from the square roots of the eigenvalues of $A^T A$:

$$\Sigma_R = \begin{bmatrix} 20.916 & 0 & 0 \\ 0 & 6.53027 & 0 \\ 0 & 0 & 4.22807 \end{bmatrix}$$

(5) Find $U$ by solving $U \Sigma = A V$. For our reduced case, we can find $U_R = A V_R \Sigma_R^{-1}$. You could also find $U$ by computing the eigenvectors of $A A^T$.

$$U_R = A V_R \Sigma_R^{-1} = \begin{bmatrix} 3 & 2 & 3 \\ 8 & 8 & 2 \\ 8 & 7 & 4 \\ 1 & 8 & 7 \\ 6 & 4 & 7 \end{bmatrix} \begin{bmatrix} 0.585051 & 0.710399 & 0.391212 \\ 0.652648 & -0.126068 & -0.747098 \\ 0.481418 & -0.692415 & 0.537398 \end{bmatrix} \begin{bmatrix} 0.047810 & 0 & 0 \\ 0 & 0.153133 & 0 \\ 0 & 0 & 0.236515 \end{bmatrix}$$

$$U_R = \begin{bmatrix} 0.215371 & -0.030348 & 0.305490 \\ 0.519432 & 0.503779 & -0.419173 \\ 0.534262 & 0.311021 & 0.011730 \\ 0.438715 & -0.787878 & -0.431352 \\ 0.453759 & -0.166729 & 0.738082 \end{bmatrix}$$

We obtain the following singular value decomposition for $A$:

$$\underbrace{\begin{bmatrix} 3 & 2 & 3 \\ 8 & 8 & 2 \\ 8 & 7 & 4 \\ 1 & 8 & 7 \\ 6 & 4 & 7 \end{bmatrix}}_{A} = \underbrace{\begin{bmatrix} 0.215371 & -0.030348 & 0.305490 \\ 0.519432 & 0.503779 & -0.419173 \\ 0.534262 & 0.311021 & 0.011730 \\ 0.438715 & -0.787878 & -0.431352 \\ 0.453759 & -0.166729 & 0.738082 \end{bmatrix}}_{U_R} \underbrace{\begin{bmatrix} 20.916 & 0 & 0 \\ 0 & 6.53027 & 0 \\ 0 & 0 & 4.22807 \end{bmatrix}}_{\Sigma_R} \underbrace{\begin{bmatrix} 0.585051 & 0.652648 & 0.481418 \\ 0.710399 & -0.126068 & -0.692415 \\ 0.391212 & -0.747098 & 0.537398 \end{bmatrix}}_{V_R^T}$$

Recall that what we computed here is the reduced SVD factorization (i.e., $\Sigma$ is square and $U$ is non-square).
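
The steps above can be reproduced numerically. A minimal sketch using NumPy (note that `np.linalg.eigh` may return some eigenvectors with flipped signs; the corresponding columns of $U_R$ then flip as well, which is an equally valid factorization):

```python
import numpy as np

A = np.array([[3, 2, 3],
              [8, 8, 2],
              [8, 7, 4],
              [1, 8, 7],
              [6, 4, 7]], dtype=float)

# Steps (1)-(2): eigenvalues and eigenvectors of A^T A
# (np.linalg.eigh returns them in ascending order, so reverse to descending)
eigvals, eigvecs = np.linalg.eigh(A.T @ A)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

# Steps (3)-(5): V_R, Sigma_R, and U_R = A V_R Sigma_R^{-1}
V_R = eigvecs
Sigma_R = np.diag(np.sqrt(eigvals))
U_R = A @ V_R @ np.linalg.inv(Sigma_R)

print(np.sqrt(eigvals))  # approx [20.916, 6.53027, 4.22807]
assert np.allclose(A, U_R @ Sigma_R @ V_R.T)
```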

Rank, null space and range of a matrix

Suppose $A$ is an $m \times n$ matrix where $m > n$ (without loss of generality):

$$A = U \Sigma V^T = \begin{bmatrix} | & & | & & | \\ u_1 & \cdots & u_n & \cdots & u_m \\ | & & | & & | \end{bmatrix} \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_n \\ & \mathbf{0} & \end{bmatrix} \begin{bmatrix} v_1^T \\ \vdots \\ v_n^T \end{bmatrix}$$

We can re-write the above as:

$$A = \begin{bmatrix} | & & | \\ u_1 & \cdots & u_n \\ | & & | \end{bmatrix} \begin{bmatrix} \sigma_1 v_1^T \\ \vdots \\ \sigma_n v_n^T \end{bmatrix}$$

Furthermore, the product of two matrices can be written as a sum of outer products:

$$A = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \dots + \sigma_n u_n v_n^T$$

For a general rectangular matrix, we have:

$$A = \sum_{i=1}^{s} \sigma_i u_i v_i^T$$

where $s = \min(m, n)$.

If $A$ has $s$ non-zero singular values, the matrix is full rank, i.e. $\text{rank}(A) = s$.

If $A$ has $r$ non-zero singular values, and $r < s$, the matrix is rank deficient, i.e. $\text{rank}(A) = r$.

In other words, the rank of $A$ equals the number of non-zero singular values, which is the same as the number of non-zero diagonal elements in $\Sigma$.

Rounding errors may lead to small but non-zero singular values in a rank deficient matrix. Singular values that are smaller than a given tolerance are assumed to be numerically equivalent to zero, defining what is sometimes called the effective rank.

The right-singular vectors (columns of $V$) corresponding to vanishing singular values of $A$ span the null space of $A$, i.e. $\text{null}(A) = \text{span}\{v_{r+1}, v_{r+2}, \ldots, v_n\}$.

The left-singular vectors (columns of $U$) corresponding to the non-zero singular values of $A$ span the range of $A$, i.e. $\text{range}(A) = \text{span}\{u_1, u_2, \ldots, u_r\}$.

Example:

$$A = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 & 0 \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \sqrt{14} & 0 & 0 \\ 0 & \sqrt{14} & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

The rank of $A$ is 2.

The vectors $\begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \\ 0 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \\ 0 \\ 0 \end{bmatrix}$ provide an orthonormal basis for the range of $A$.

The vector $\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$ provides an orthonormal basis for the null space of $A$.
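
These subspaces can also be read directly off a computed SVD in code. A minimal sketch, using a tolerance that mirrors the default convention of `np.linalg.matrix_rank`:

```python
import numpy as np

# A rank-deficient 4 x 3 matrix: its third column is the sum of the first two
A = np.random.rand(4, 2)
A = np.column_stack([A, A[:, 0] + A[:, 1]])

U, sigma, Vt = np.linalg.svd(A)

# Effective rank: number of singular values above a tolerance
tol = sigma[0] * max(A.shape) * np.finfo(float).eps
r = int(np.sum(sigma > tol))
print(r)  # 2

range_basis = U[:, :r]   # columns span range(A)
null_basis = Vt[r:].T    # columns span null(A)
assert np.allclose(A @ null_basis, 0)
```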

(Moore-Penrose) Pseudoinverse

If the matrix $\Sigma$ is rank deficient, we cannot get its inverse. We define instead the pseudoinverse:

$$(\Sigma^{+})_{ii} = \begin{cases} \frac{1}{\sigma_i} & \sigma_i \neq 0 \\ 0 & \sigma_i = 0 \end{cases}$$

For a general non-square matrix $A$ with known SVD ($A = U \Sigma V^T$), the pseudoinverse is defined as:

$$A^{+} = V \Sigma^{+} U^T$$

For example, if we consider an $m \times n$ full-rank matrix where $m > n$:

$$A^{+} = \begin{bmatrix} | & & | \\ v_1 & \cdots & v_n \\ | & & | \end{bmatrix} \begin{bmatrix} 1/\sigma_1 & & & 0 & \cdots & 0 \\ & \ddots & & \vdots & & \vdots \\ & & 1/\sigma_n & 0 & \cdots & 0 \end{bmatrix} \begin{bmatrix} | & & | & & | \\ u_1 & \cdots & u_n & \cdots & u_m \\ | & & | & & | \end{bmatrix}^T$$
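
A minimal sketch comparing this construction against NumPy's built-in `np.linalg.pinv`:

```python
import numpy as np

A = np.random.rand(5, 3)  # full rank with probability 1

U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

# Invert only the non-zero singular values (rank-deficient entries map to 0)
sigma_plus = np.array([1.0 / s if s > 1e-12 else 0.0 for s in sigma])

A_plus = Vt.T @ np.diag(sigma_plus) @ U.T   # A^+ = V Sigma^+ U^T
assert np.allclose(A_plus, np.linalg.pinv(A))
```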

Euclidean norm of matrices

The induced 2-norm of a matrix $A$ can be obtained using the SVD of the matrix:

$$\|A\|_2 = \max_{\|x\|=1} \|Ax\| = \max_{\|x\|=1} \|U \Sigma V^T x\| = \max_{\|x\|=1} \|\Sigma V^T x\| = \max_{\|V^T x\|=1} \|\Sigma V^T x\| = \max_{\|y\|=1} \|\Sigma y\|$$

And hence,

$$\|A\|_2 = \sigma_1$$

In the above equations, all the norms $\|\cdot\|$ refer to the $p = 2$ Euclidean norm, and we used the fact that $U$ and $V$ are orthogonal matrices, hence $\|U\|_2 = \|V\|_2 = 1$.

Example:

We begin with the following non-square matrix $A$:

$$A = \begin{bmatrix} 3 & 2 & 3 \\ 8 & 8 & 2 \\ 8 & 7 & 4 \\ 1 & 8 & 7 \\ 6 & 4 & 7 \end{bmatrix}.$$

The matrix of singular values, $\Sigma$, computed from the SVD factorization is:

$$\Sigma = \begin{bmatrix} 20.916 & 0 & 0 \\ 0 & 6.53027 & 0 \\ 0 & 0 & 4.22807 \end{bmatrix}.$$

Consequently the 2-norm of $A$ is

$$\|A\|_2 = 20.916.$$
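
A quick numerical check of this value (a minimal sketch):

```python
import numpy as np

A = np.array([[3, 2, 3],
              [8, 8, 2],
              [8, 7, 4],
              [1, 8, 7],
              [6, 4, 7]], dtype=float)

sigma = np.linalg.svd(A, compute_uv=False)
print(sigma[0])              # approx 20.916
print(np.linalg.norm(A, 2))  # same value: the induced 2-norm equals sigma_1
```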

Euclidean norm of the inverse of matrices

Following the same derivation as above, we can show that for a full-rank $n \times n$ matrix we have:

$$\|A^{-1}\|_2 = \frac{1}{\sigma_n}$$

where $\sigma_n$ is the smallest singular value.

For non-square matrices, we can use the definition of the pseudoinverse (regardless of the rank):

$$\|A^{+}\|_2 = \frac{1}{\sigma_r}$$

where $\sigma_r$ is the smallest non-zero singular value. Note that for a full-rank square matrix, we have $\|A^{+}\|_2 = \|A^{-1}\|_2$. An exception to the definition above is the zero matrix; in this case, $\|A^{+}\|_2 = 0$.
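
A minimal sketch verifying both identities on random matrices (full rank with probability 1):

```python
import numpy as np

# Square full-rank case: ||A^{-1}||_2 = 1 / sigma_n
A = np.random.rand(4, 4)
sigma = np.linalg.svd(A, compute_uv=False)
assert np.isclose(np.linalg.norm(np.linalg.inv(A), 2), 1.0 / sigma[-1])

# Non-square case: ||B^{+}||_2 = 1 / (smallest non-zero singular value)
B = np.random.rand(5, 3)
sigma_B = np.linalg.svd(B, compute_uv=False)
assert np.isclose(np.linalg.norm(np.linalg.pinv(B), 2), 1.0 / sigma_B[-1])
```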

2-Norm Condition Number

The 2-norm condition number of a matrix $A$ is given by the ratio of its largest singular value to its smallest singular value:

$$\text{cond}_2(A) = \|A\|_2 \, \|A^{-1}\|_2 = \sigma_{\max} / \sigma_{\min}.$$

If the matrix $A$ is rank deficient, i.e. $\text{rank}(A) < \min(m, n)$, then $\text{cond}_2(A) = \infty$.
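
This matches NumPy's built-in routine, which computes the 2-norm condition number from the singular values; a minimal sketch:

```python
import numpy as np

A = np.random.rand(4, 4)
sigma = np.linalg.svd(A, compute_uv=False)

print(sigma[0] / sigma[-1])  # sigma_max / sigma_min
print(np.linalg.cond(A, 2))  # identical: computed internally from the SVD
```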

Low-rank Approximation

The best rank-$k$ approximation for an $m \times n$ matrix $A$, where $k < s = \min(m, n)$, for some matrix norm $\|\cdot\|$, is one that minimizes the following problem:

$$\min_{A_k} \ \|A - A_k\| \quad \text{such that} \quad \text{rank}(A_k) \le k.$$

Under the induced 2-norm, the best rank-$k$ approximation is given by the sum of the first $k$ outer products of the left and right singular vectors scaled by the corresponding singular value (where $\sigma_1 \ge \cdots \ge \sigma_s$):

$$A_k = \sigma_1 u_1 v_1^T + \dots + \sigma_k u_k v_k^T$$

Observe that the norm of the difference between the best approximation and the matrix under the induced 2-norm is the magnitude of the $(k+1)$-th singular value of the matrix:

$$\|A - A_k\|_2 = \left\| \sum_{i=k+1}^{s} \sigma_i u_i v_i^T \right\|_2 = \sigma_{k+1}$$

Note that the best rank-$k$ approximation to $A$ can be stored efficiently by only storing the $k$ singular values $\sigma_1, \ldots, \sigma_k$, the $k$ left singular vectors $u_1, \ldots, u_k$, and the $k$ right singular vectors $v_1, \ldots, v_k$.
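
A minimal sketch of the rank-$k$ truncation, including a check of the error identity $\|A - A_k\|_2 = \sigma_{k+1}$:

```python
import numpy as np

A = np.random.rand(6, 5)
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
# Best rank-k approximation: keep only the k largest singular triplets
A_k = U[:, :k] @ np.diag(sigma[:k]) @ Vt[:k, :]

# The 2-norm error equals the first discarded singular value, sigma_{k+1}
assert np.isclose(np.linalg.norm(A - A_k, 2), sigma[k])
```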

The figure below shows the best rank-$k$ approximations of an image (you can find the code snippet that generates these images in the IPython notebook).
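
Since the notebook is not reproduced here, the following is a minimal sketch of how such approximations can be generated; the synthetic array below is a stand-in for whatever image the notebook actually loads:

```python
import numpy as np

# A synthetic grayscale "image"; a real example would load an image file
# instead, but any 2D array works the same way
x = np.linspace(0, 1, 256)
X, Y = np.meshgrid(x, x)
image = np.sin(10 * X * Y) + X

U, sigma, Vt = np.linalg.svd(image, full_matrices=False)

for k in [1, 5, 20]:
    image_k = U[:, :k] @ np.diag(sigma[:k]) @ Vt[:k, :]
    rel_err = np.linalg.norm(image - image_k, 2) / sigma[0]
    print(f"rank {k:2d}: relative 2-norm error = {rel_err:.4f}")
    # with matplotlib, plt.imshow(image_k, cmap="gray") would display each one
```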

Review Questions

  1. For a matrix $A$ with SVD decomposition $A = U \Sigma V^T$, what are the columns of $U$ and how can we find them? What are the columns of $V$ and how can we find them? What are the entries of $\Sigma$ and how can we find them?
  2. What special properties are true of $U$, $V$ and $\Sigma$?
  3. What are the shapes of $U$, $V$ and $\Sigma$ in the full SVD of an $m \times n$ matrix?
  4. What are the shapes of $U$, $V$ and $\Sigma$ in the reduced SVD of an $m \times n$ matrix?
  5. What is the cost of computing the SVD?
  6. Given an already computed SVD of a matrix $A$, what is the cost of using the SVD to solve a linear system $Ax = b$? How would you use the SVD to solve this system?
  7. How do you use the SVD to compute a low-rank approximation of a matrix? For a small matrix, you should be able to compute a given low-rank approximation (i.e. rank-one, rank-two).
  8. Given the SVD of a matrix $A$, what is the SVD of $A^{+}$ (the pseudoinverse of $A$)?
  9. Given the SVD of a matrix $A$, what is the 2-norm of the matrix? What is the 2-norm condition number of the matrix?
