manierato5h

2022-06-21

PCA formulation: Find the best rank-k subspace of d-dimensional space
Finding the best rank-$k$ subspace of a $d$-dimensional space can be interpreted as the Principal Component Analysis (PCA) problem, provided the goal of the optimization is to minimize the reconstruction error (equivalently, to maximize the retained variance).
Assume we have access to the data sample $x_1, \dots, x_n \in \mathbb{R}^d$. Prove that the first problem is equivalent to the second one:
$$\hat{U}_n = \arg\min_U \sum_{l=1}^{n} \left\| x_l - \mathcal{P}_U(x_l) \right\|_2^2 \tag{1}$$
such that
$U \in \mathbb{R}^{d \times k}, \quad U^T U = I_k$
$$\hat{U}_n = \arg\min_U \left\| X_n - U U^T X_n \right\|_F^2 \tag{2}$$
such that
$U \in \mathbb{R}^{d \times k}, \quad U^T U = I_k$
where $I_k$ is the $k \times k$ identity matrix, $\mathcal{P}_U(x) = U U^T x$ is the orthogonal projection onto the column space of $U$, $X_n = \begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix} \in \mathbb{R}^{d \times n}$ is the data matrix, and $\|\cdot\|_F$ is the Frobenius norm, i.e. $\|A\|_F = \sqrt{\operatorname{Trace}(A^T A)}$.
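Before proving the equivalence, it can be sanity-checked numerically. A minimal sketch with NumPy, assuming random data and a random orthonormal $U$ (neither comes from the problem statement):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 5, 2, 8

# Random data matrix X_n = [x_1 ... x_n] in R^{d x n}
X = rng.standard_normal((d, n))

# Random U in R^{d x k} with orthonormal columns (U^T U = I_k),
# obtained via a reduced QR decomposition
U, _ = np.linalg.qr(rng.standard_normal((d, k)))

# Objective (1): sum of squared projection residuals, sample by sample
obj1 = sum(np.sum((X[:, l] - U @ U.T @ X[:, l]) ** 2) for l in range(n))

# Objective (2): squared Frobenius norm of X - U U^T X
obj2 = np.linalg.norm(X - U @ U.T @ X, "fro") ** 2

assert np.isclose(obj1, obj2)
```

Since the two objectives agree for every feasible $U$, their minimizers coincide; the proof below shows this identity algebraically.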

Sage Mcdowell

I start with $\|x_l - \mathcal{P}_U(x_l)\|_2^2$. It can be written as
$$\|x_l - \mathcal{P}_U(x_l)\|_2^2 = (x_l - U U^T x_l)^T (x_l - U U^T x_l) = x_l^T x_l - x_l^T U U^T x_l - x_l^T U U^T x_l + x_l^T U U^T U U^T x_l.$$
Since $U^T U = I_k$, the last term simplifies to $x_l^T U U^T U U^T x_l = x_l^T U U^T x_l$, hence
$$\|x_l - \mathcal{P}_U(x_l)\|_2^2 = x_l^T x_l - x_l^T U U^T x_l.$$
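This per-sample identity can be checked numerically as well, again with a random vector and a random orthonormal $U$ (a sketch, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 6, 3
x = rng.standard_normal(d)
U, _ = np.linalg.qr(rng.standard_normal((d, k)))  # U^T U = I_k

lhs = np.sum((x - U @ U.T @ x) ** 2)   # ||x - P_U(x)||_2^2
rhs = x @ x - x @ U @ U.T @ x          # x^T x - x^T U U^T x
assert np.isclose(lhs, rhs)
```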
Summing over the samples,
$$\sum_{l=1}^{n} \|x_l - \mathcal{P}_U(x_l)\|_2^2 = \sum_{l=1}^{n} \left( x_l^T x_l - x_l^T U U^T x_l \right).$$
Since each summand is a scalar, the sum can be rewritten in terms of traces. Writing $X = X_n = \begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix}$, the claim is that
$$\sum_{l=1}^{n} \left( x_l^T x_l - x_l^T U U^T x_l \right) = \operatorname{Trace}\left( X^T X \right) - \operatorname{Trace}\left( X^T U U^T X \right),$$
because the summands appear exactly on the diagonals of $X^T X$ and $X^T U U^T X$, as the following two computations show.
First,
$$X^T X = \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_n^T \end{bmatrix} \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix} = \begin{bmatrix} x_1^T x_1 & x_1^T x_2 & \cdots & x_1^T x_n \\ x_2^T x_1 & x_2^T x_2 & \cdots & x_2^T x_n \\ \vdots & \vdots & \ddots & \vdots \\ x_n^T x_1 & x_n^T x_2 & \cdots & x_n^T x_n \end{bmatrix}$$
Note that the diagonal entries of this matrix are exactly $x_l^T x_l$, so
$$\operatorname{Trace}\left( X^T X \right) = \sum_{l=1}^{n} x_l^T x_l.$$
Similarly,
$$X^T U U^T X = \begin{bmatrix} x_1^T U U^T \\ x_2^T U U^T \\ \vdots \\ x_n^T U U^T \end{bmatrix} \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix} = \begin{bmatrix} x_1^T U U^T x_1 & x_1^T U U^T x_2 & \cdots & x_1^T U U^T x_n \\ x_2^T U U^T x_1 & x_2^T U U^T x_2 & \cdots & x_2^T U U^T x_n \\ \vdots & \vdots & \ddots & \vdots \\ x_n^T U U^T x_1 & x_n^T U U^T x_2 & \cdots & x_n^T U U^T x_n \end{bmatrix}$$
Note that the diagonal entries of this matrix are exactly $x_l^T U U^T x_l$, so
$$\operatorname{Trace}\left( X^T U U^T X \right) = \sum_{l=1}^{n} x_l^T U U^T x_l.$$
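Both trace identities can be verified numerically; a sketch with random data and a random orthonormal $U$:

```python
import numpy as np

rng = np.random.default_rng(2)
d, k, n = 5, 2, 7
X = rng.standard_normal((d, n))
U, _ = np.linalg.qr(rng.standard_normal((d, k)))  # U^T U = I_k

# Diagonal entries of X^T X are x_l^T x_l, so its trace is their sum
lhs1 = np.trace(X.T @ X)
rhs1 = sum(X[:, l] @ X[:, l] for l in range(n))

# Diagonal entries of X^T U U^T X are x_l^T U U^T x_l
lhs2 = np.trace(X.T @ U @ U.T @ X)
rhs2 = sum(X[:, l] @ U @ U.T @ X[:, l] for l in range(n))

assert np.isclose(lhs1, rhs1) and np.isclose(lhs2, rhs2)
```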
Hence,
$$\sum_{l=1}^{n} \|x_l - \mathcal{P}_U(x_l)\|_2^2 = \operatorname{Trace}\left( X^T X - X^T U U^T X \right).$$
Since $U^T U = I_k$ gives $X^T U U^T U U^T X = X^T U U^T X$, we may add and subtract one copy of this term:
$$\operatorname{Trace}\left( X^T X - X^T U U^T X \right) = \operatorname{Trace}\left( X^T X - X^T U U^T X - X^T U U^T X + X^T U U^T U U^T X \right) = \operatorname{Trace}\left( (X - U U^T X)^T (X - U U^T X) \right) = \|X - U U^T X\|_F^2.$$
Since $X = X_n$, the objectives of (1) and (2) agree for every feasible $U$, so the two minimization problems are equivalent.
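The final trace-to-Frobenius step can also be checked numerically; a sketch, again with random data and a random orthonormal $U$:

```python
import numpy as np

rng = np.random.default_rng(3)
d, k, n = 5, 2, 7
X = rng.standard_normal((d, n))
U, _ = np.linalg.qr(rng.standard_normal((d, k)))  # U^T U = I_k

R = X - U @ U.T @ X                               # residual matrix
trace_form = np.trace(X.T @ X - X.T @ U @ U.T @ X)
frob_form = np.trace(R.T @ R)                     # = ||X - U U^T X||_F^2
assert np.isclose(trace_form, frob_form)
```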
