the Eckart-Young-Misky Theorem

The theorem

Eckart-Young-Misky Theorem
ARm×nA \in \mathbb{R}^{m\times n}
A=UΣV=i=1nσiuiviA = U\Sigma V^{\top}= \sum_{i=1}^n \sigma_i \vec{u}_i \vec{v}_i^{\top}

And we define

Ak=i=1kσiuiviA_k = \sum_{i=1}^k \sigma_i \vec{u}_i \vec{v}_i

We will see that

(1)

arg minBRm×n,rk(B)=kABF=Ak\argmin_{B \in \mathbb{R}^{m\times n}, rk(B) = k} ||A-B||_F= A_k

(2)

arg minBRm×n,rk(B)=kAB2=Ak\argmin_{B \in \mathbb{R}^{m \times n}, rk(B) = k} ||A-B||_2 = A_k

L2 Norm Case

We will first prove the L2-norm case

AAk2=i=k+1nσiuivi2=maxw2=1w(AAk)w2=σk+1||A-A_k||_2 = \sum_{i=k+1}^n ||\sigma_i \vec{u}_i \vec{v}_i^{\top}||_2 = \max_{||\vec{w}||_2=1} ||\vec{w}^{\top}(A-A_k)\vec{w}||_2 = \sigma_{k+1}

Now show that for all rk(B)=k,AB2σk+1rk(B) = k, ||A-B||_2 \ge \sigma_{k+1}

AB2(AB)w2(for any w)Choose wN(B)Aw2\begin{split} ||A-B||_2 &\ge ||(A-B)\vec{w}||_2 \quad (\text{for any } \vec{w} ) \\ &\text{Choose $\vec{w} \in N(B)$} \\ &\ge ||A\vec{w}||_2 \end{split}

Consider Vk+1V_{k+1}

Vk+1=[v1v2vk+1]V_{k+1} = \begin{bmatrix} \vec{v}_1 &\vec{v}_2 &\cdots &\vec{v}_{k+1} \end{bmatrix}

Property:

rk(Vk+1)=k+1rk(V_{k+1}) = k+1

Dim(N(B))=nkDim(N(B)) = n-k

So…. (nk)+(k+1)=n+1>n(n-k)+(k+1) = n+1 > n

There exist one dimension overlap in R(Vk+1)R(V_{k+1}) and N(B)N(B)

So instead of choosing wN(B)\vec{w} \in N(B), chose wN(B)R(Vk+1)\vec{w} \in N(B) \cap R(V_{k+1}) such that

w=Vα=[Vk+1Vrest][α1αk+100]=α1v1+α2v2++αk+1vk+1\begin{split} \vec{w} &= V\vec{\alpha} = \begin{bmatrix} V_{k+1} &V_{rest} \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \vdots \\ \alpha_{k+1} \\ 0 \\ \vdots \\ 0 \end{bmatrix} \\ &=\alpha_1 \vec{v}_1 + \alpha_2 \vec{v}_2 + \cdots + \alpha_{k+1} \vec{v}_{k+1} \end{split}

And choose αi\alpha_i such that w2=1||\vec{w}||_2 = 1i=1k+1αi2=1\sum_{i=1}^{k+1} \alpha_i^2 = 1

Coming back to

AB2(AB)w2(for any w)=Aw2=UΣVVα2=UΣα2=Σα2=α12σ12++αk+12σk+12σk+12\begin{split} ||A-B||_2 &\ge ||(A-B)\vec{w}||_2 \quad (\text{for any } \vec{w} ) \\ &\quad = ||A\vec{w}||_2 \\ &\quad = ||U\Sigma V^{\top} V\vec{\alpha}||_2 \\ &\quad = ||U\Sigma\vec{\alpha}||_2 \\ &\quad = ||\Sigma \vec{\alpha}||_2 \\ &\quad = \alpha_1^2\sigma_1^2 + \cdots + \alpha_{k+1} ^ 2\sigma_{k+1} ^ 2 \\ &\quad \ge \sigma_{k+1}^2 \end{split}

Frob. Norm Case

AF=i,jAij2=trace(AA)||A||_F = \sqrt{\sum_{i,j} A_{ij}^2} = \sqrt{trace(A^{\top}A)}

Nice property about trace:

trace(AB)=trace(BA)trace(AB)=trace(BA)

Also:

AUF=UAF=AF||AU||_F = ||UA||_F = ||A||_F

Frobenius Norm Invariant to Orthonormal Transform Proof

So:

AF=UΣVF=ΣF=i=1nσi2\begin{split} ||A||_F &=||U\Sigma V^{\top}||_F = ||\Sigma||_F \\ &=\sqrt{\sum_{i=1}^n \sigma_i^2} \end{split}

We still want to show:

Ak=arg minBRm×n,rk(B)=kABFA_k = \argmin_{B \in \mathbb{R}^{m \times n}, rk(B) = k} ||A-B||_F

In other words, we want

ABFAAkF||A-B||_F \ge ||A-A_k||_F

where

Ak=i=1kσiuiviA_k = \sum_{i=1}^k \sigma_i \vec{u}_i \vec{v}_i^{\top}

Let’s start!

AAkF=i=k+1nσiuiviF=i=k+1nσ2\begin{split} ||A-A_k||_F &= ||\sum_{i=k+1}^n \sigma_i \vec{u}_i \vec{v}_i^{\top}||_F \\ &= \sqrt{\sum_{i=k+1}^n \sigma^2} \end{split}

So with this, can we try show that:

i=1nσi2(AB)i=k+1nσi2(A)\sum_{i=1}^n \sigma_i^2(A-B) \ge \sum_{i = k+1}^n \sigma_i^2(A)

Note in our last proof for the L2-norm case, we have

A2=σmax(A)||A||_2 = \sigma_{\max}(A)

Therefore,

σk+i(A)=(k+i)-th largest σ of A=largest σ after top (k+i1) σ are removed=AAk+i12\begin{split} \sigma_{k+i}(A) &= \text{$(k+i)$-th largest $\sigma$ of A} \\ &= \text{largest $\sigma$ after top $(k+i-1)$ $\sigma$ are removed} \\ &= ||A-A_{k+i-1}||_2 \end{split}

Maybe we can do some similar transformation for σi2(AB)\sigma_i^2(A-B)?

Denote AB=CA-B=C

σi(AB)=σi(C)=CCi12\sigma_i(A-B)=\sigma_i(C)=||C-C_{i-1}||_2

We already have the fact that:

Brank kσk+1(B)=0BBk2=0B \rightarrow \text{rank } k \\ \sigma_{k+1}(B) = 0 \\ ||B-B_k||_2 = 0

Hmm…So we can just add a zero term to the σi(AB)\sigma_i(A-B)

σi(AB)=CCi12+BBk2undefinedtriangular inequalityC+BCi1Bk2ACi1Bk2\begin{split} \sigma_i(A-B) &=||C-C_{i-1}||_2 + ||B-B_k||_2 \\ &\underbrace{\ge}_{\mathclap{\text{triangular inequality}}}||C+B-C_{i-1}-B_k||_2 \\ &\ge ||A-C_{i-1}-B_k||_2 \end{split}

We know:

rk(Bk)=k,rk(Ci1)i1rk(B_k)=k, rk(C_{i-1}) \le i-1

We also know this fact that for any two matrices A,BA, B:

rk(A+B)rk(A)+rk(B)rk(A+B) \le rk(A)+rk(B)

So we know for D=Ci1+BkD = C_{i-1}+B_k

rk(D)k+i1rk(D) \le k + i-1

So.

σi(AB)ACi1Bk2AD2\begin{split} \sigma_i(A-B) &\ge ||A-C_{i-1}-B_k||_2 \\ &\ge ||A-D||_2 \\ \end{split}

We know that (from previous section)

arg minDRm×n,rk(D)=i+k1AD2=Ak+i1minDRm×n,rk(D)=i+k1AD2=σk+i(A)\argmin_{D \in \mathbb{R}^{m \times n}, rk(D)=i+k-1} ||A-D||_2 = A_{k+i-1} \\ \min_{D \in \mathbb{R}^{m \times n}, rk(D)=i+k-1} ||A-D||_2 = \sigma_{k+i}(A)

Therefore

σi(AB)AD2σk+i(A)\begin{split} \sigma_i(A-B) &\ge ||A-D||_2 \\ &\ge \sigma_{k+i}(A) \end{split}

Yeah!!!

We showed that

BRm×n,rk(B)=k,σi(AB)σk+i(A)\forall B\in \mathbb{R}^{m \times n}, rk(B) = k, \\ \sigma_i(A-B) \ge \sigma_{k+i}(A)

therefore

i=1nσi2(AB)i=k+1nσi2(A)\sum_{i=1}^n \sigma_i^2(A-B) \ge \sum_{i = k+1}^n \sigma_i^2(A)

And therefore

i=1nσi2(AB)i=k+1nσi2(A)\sqrt{\sum_{i=1}^n \sigma_i^2(A-B)} \ge \sqrt{\sum_{i = k+1}^n \sigma_i^2(A)}

And therefore

ABFAAkF||A-B||_F \ge ||A-A_k||_F

Some wrong type of proof

minrk(B)=kABF=minrk(B)=kUΣVBF=minrk(B)=kΣUBVF=minrk(Z)=k,ZdiagΣZFundefinedSince having elements that does not lay along the diagonal only increases F normSo just try to pick Z=Σ and it completes the proof\begin{split} \min_{rk(B) = k} ||A-B||_F &=\min_{rk(B)=k} ||U\Sigma V^{\top} -B||_F \\ &=\min_{rk(B)=k} ||\Sigma - U^{\top}BV||_F \\ &=\underbrace{\min_{rk(Z)=k, Z \in diag} ||\Sigma - Z||_F}_{\mathclap{\text{Since having elements that does not lay along the diagonal only increases F norm}}} \\ &\text{So just try to pick $Z = \Sigma$ and it completes the proof} \end{split}

What’s wrong???

Last few steps ⇒ proof by talking

When we transform UBVZU^{\top}BV \rightarrow Z, if we eliminate the non-diagonal terms, we might be increasing ranks.