$\newcommand{\ones}{\mathbf 1}$

Least-squares approximate solution of overdetermined equations

Suppose $A\in {\mathbf R}^{m \times n}$ is skinny and full rank, $y \in {\mathbf R}^m$, and $x_\mathrm{ls} = (A^TA)^{-1}A^Ty$.
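A minimal numerical sketch of this definition, assuming NumPy. The sizes, the random seed and the helper name `x_ls_normal_equations` are illustrative choices. It solves the normal equations $A^TAx = A^Ty$ instead of forming $(A^TA)^{-1}$ explicitly, and cross-checks the result against `np.linalg.lstsq`:

```python
import numpy as np

def x_ls_normal_equations(A, y):
    """Hypothetical helper: x_ls = (A^T A)^{-1} A^T y via the normal equations.

    Assumes A is skinny (m > n) and full rank, so A^T A is invertible.
    Solves (A^T A) x = A^T y rather than inverting A^T A.
    """
    return np.linalg.solve(A.T @ A, A.T @ y)

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))   # skinny: m = 6 > n = 3 (full rank with probability 1)
y = rng.standard_normal(6)

x_ls = x_ls_normal_equations(A, y)
# Cross-check against NumPy's least-squares solver (SVD based, better conditioned).
x_ref, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.allclose(x_ls, x_ref))
```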

$x_\mathrm{ls}$ is the point in ${\mathbf R}^n$ closest (in terms of norm) to a solution of $Ax=y$.

• Incorrect.
• Correct! There may be no solution of $Ax=y$.
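To see why the first statement fails, a small sketch (assuming NumPy; the sizes and seed are arbitrary) computes the least-squares residual for a generic $y$. Since $x_\mathrm{ls}$ minimizes $\|Ax-y\|$, a strictly positive minimum residual shows that $Ax=y$ has no solution at all, so "closest to a solution" is not meaningful:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 2))
y = rng.standard_normal(5)   # generic y: not in range(A) with probability 1

x_ls = np.linalg.solve(A.T @ A, A.T @ y)
residual = np.linalg.norm(A @ x_ls - y)
# x_ls minimizes ||Ax - y||, so a positive residual means no x solves Ax = y exactly.
print(f"min ||Ax - y|| = {residual:.3f}")
```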

$Ax_\mathrm{ls}$ is the point in $\mathcal R(A)$ closest (in terms of norm) to $y$.
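A sketch of this projection property, assuming NumPy (seed and sizes are arbitrary). It checks the orthogonality condition $A^T(y - Ax_\mathrm{ls}) = 0$ and then confirms that randomly chosen points $Az$ in $\mathcal R(A)$ are never closer to $y$:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 3))
y = rng.standard_normal(6)

x_ls = np.linalg.solve(A.T @ A, A.T @ y)
p = A @ x_ls  # least-squares fit, a point in range(A)

# Orthogonality principle: the residual y - A x_ls is orthogonal to every column of A.
print(np.allclose(A.T @ (y - p), 0))

# Any other point A z in range(A) is at least as far from y,
# since ||Az - y||^2 = ||p - y||^2 + ||A(z - x_ls)||^2.
for _ in range(1000):
    z = x_ls + rng.standard_normal(3)
    assert np.linalg.norm(A @ z - y) >= np.linalg.norm(p - y)
```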

If $y \in \mathcal R (A)$, then $Ax_\mathrm{ls} =y$.
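A quick check of this case, assuming NumPy. Here $y$ is built as $Ax$ for an illustrative random $x$, so it lies in $\mathcal R(A)$ by construction. The fit is then exact, and because $A$ is full rank, $x_\mathrm{ls}$ also recovers that $x$:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 3))
x_true = rng.standard_normal(3)
y = A @ x_true            # y lies in range(A) by construction

x_ls = np.linalg.solve(A.T @ A, A.T @ y)
print(np.allclose(A @ x_ls, y))     # the least-squares fit is exact
print(np.allclose(x_ls, x_true))    # A full rank, so the solution is unique
```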

Suppose $y=Ax+v$, where $x\in {\mathbf R}^n$ is a vector of parameters you wish to estimate, $y\in {\mathbf R}^m$ is a vector of measurements, and $v \in {\mathbf R}^m$ represents measurement noise. We assume $m>n$ and that $A$ is full rank. Consider a linear estimator of the form $\hat x=By$.
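A small simulation of this setup, assuming NumPy. The dimensions, the noise scale $0.01$ and the arbitrary random $B$ are illustrative assumptions. It checks the identity $\hat x - x = (BA - I)x + Bv$, which is the basis for the statements that follow: the first term vanishes exactly when $B$ is a left inverse of $A$.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 8, 3
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)          # unknown parameters
v = 0.01 * rng.standard_normal(m)   # noise (scale 0.01 is an arbitrary choice)
y = A @ x + v                       # measurements

B = rng.standard_normal((n, m))     # an arbitrary linear estimator, x_hat = B y
x_hat = B @ y

# Estimation error splits into a term driven by x and a term driven by the noise:
#   x_hat - x = (B A - I) x + B v
err = x_hat - x
print(np.allclose(err, (B @ A - np.eye(n)) @ x + B @ v))
```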

Choosing $B$ to be any left inverse of $A$ yields $\hat x =x$, no matter what $x$ is, provided $v=0$.
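A sketch with a left inverse other than $A^\dagger$, assuming NumPy. It uses the fact that $(I - AA^\dagger)A = 0$, so $B = A^\dagger + Z(I - AA^\dagger)$ satisfies $BA = I$ for any $n \times m$ matrix $Z$ (here a random one; `A_pinv` is just a local name for $A^\dagger$). With $v = 0$ the estimate is exact for every $x$ tried:

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 7, 3
A = rng.standard_normal((m, n))
A_pinv = np.linalg.solve(A.T @ A, A.T)        # (A^T A)^{-1} A^T

# Every matrix of the form A_pinv + Z (I - A A_pinv) is a left inverse of A,
# because (I - A A_pinv) A = 0. Z is an arbitrary n x m matrix.
Z = rng.standard_normal((n, m))
B = A_pinv + Z @ (np.eye(m) - A @ A_pinv)
print(np.allclose(B @ A, np.eye(n)))

# With v = 0 the estimate B y = B A x equals x, whatever x is.
for _ in range(5):
    x = rng.standard_normal(n)
    assert np.allclose(B @ (A @ x), x)
```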

The choice $B = A^\dagger = (A^TA)^{-1}A^T$ yields $\hat x =x$, provided $v$ is small.
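A sketch of why "small $v$" is not enough, assuming NumPy. The noise levels are arbitrary illustrative choices. With $B = A^\dagger$ the error is $\hat x - x = A^\dagger v$. It shrinks with the noise but is nonzero unless $A^\dagger v = 0$, that is, unless $v$ is orthogonal to $\mathcal R(A)$:

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 8, 3
A = rng.standard_normal((m, n))
A_pinv = np.linalg.solve(A.T @ A, A.T)   # (A^T A)^{-1} A^T
x = rng.standard_normal(n)

errors = []
for sigma in [1e-1, 1e-3, 1e-6]:          # noise levels (arbitrary illustrative choices)
    v = sigma * rng.standard_normal(m)
    x_hat = A_pinv @ (A @ x + v)
    # x_hat - x equals A_pinv v: it shrinks with the noise but does not vanish.
    errors.append(np.linalg.norm(x_hat - x))
    print(f"sigma={sigma:.0e}  ||x_hat - x|| = {errors[-1]:.2e}")
```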

The choice $B = A^\dagger$ yields the $\hat x$ that is closest to $x$.
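A counterexample sketch, assuming NumPy. It builds a hypothetical "oracle" left inverse $B = A^\dagger + Z(I - AA^\dagger)$ with $Z$ chosen from the particular noise realization $v$ so that $Bv = 0$. This $B$ recovers $x$ exactly while $A^\dagger y$ does not. Such a $B$ requires knowing $v$, so it is not a practical estimator, but it shows that $A^\dagger$ does not give the closest $\hat x$ for every noise realization:

```python
import numpy as np

rng = np.random.default_rng(7)
m, n = 8, 3
A = rng.standard_normal((m, n))
A_pinv = np.linalg.solve(A.T @ A, A.T)
x = rng.standard_normal(n)
v = 0.1 * rng.standard_normal(m)
y = A @ x + v

# Component of v orthogonal to range(A); nonzero for generic v.
w = (np.eye(m) - A @ A_pinv) @ v

# Hypothetical oracle left inverse tuned to this particular v:
# Z = -(A_pinv v) w^T / (w^T w) gives Z w = -A_pinv v, hence B v = 0.
Z = -np.outer(A_pinv @ v, w) / (w @ w)
B = A_pinv + Z @ (np.eye(m) - A @ A_pinv)

print(np.allclose(B @ A, np.eye(n)))           # still a left inverse
print(np.linalg.norm(A_pinv @ y - x))          # error of the pseudo-inverse estimate (> 0)
print(np.linalg.norm(B @ y - x))               # ~0: this B recovers x exactly
```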

The choice $B = A^\dagger$ yields the $\hat x$ that minimizes $\|A\hat x-y\|$.
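A sketch comparing residuals, assuming NumPy (seed, sizes, noise scale and the random $Z$ are illustrative). The estimate $A^\dagger y$ is the least-squares solution, so its residual $\|A\hat x - y\|$ is no larger than that of the estimate from another left inverse $B = A^\dagger + Z(I - AA^\dagger)$:

```python
import numpy as np

rng = np.random.default_rng(8)
m, n = 8, 3
A = rng.standard_normal((m, n))
A_pinv = np.linalg.solve(A.T @ A, A.T)
y = A @ rng.standard_normal(n) + 0.1 * rng.standard_normal(m)

x_hat = A_pinv @ y
r_ls = np.linalg.norm(A @ x_hat - y)
x_ref, *_ = np.linalg.lstsq(A, y, rcond=None)   # same minimizer from NumPy's solver

# A different left inverse gives a different estimate with a residual no smaller.
Z = rng.standard_normal((n, m))
B = A_pinv + Z @ (np.eye(m) - A @ A_pinv)
r_B = np.linalg.norm(A @ (B @ y) - y)
print(r_ls, r_B)
```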

If $B$ is any left inverse of $A$, then for each $i,j$, $|B_{ij}| \geq |(A^\dagger)_{ij}|$.
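A minimal counterexample to the elementwise bound, assuming NumPy. Take $A = [1\ 1]^T$, so $A^\dagger = [1/2\ \ 1/2]$, while $B = [1\ \ 0]$ is also a left inverse and has $|B_{12}| = 0 < 1/2$. What does hold (a standard fact, checked here only for this example) is that each row of $A^\dagger$ has the smallest Euclidean norm among all left inverses, so $A^\dagger$ has the smallest sum of squared entries:

```python
import numpy as np

A = np.array([[1.0], [1.0]])               # m = 2, n = 1: skinny and full rank
A_pinv = np.linalg.solve(A.T @ A, A.T)     # [[0.5, 0.5]]
B = np.array([[1.0, 0.0]])                 # another left inverse: B A = 1

print(B @ A)                                # [[1.]]
# Entry (1, 2): |B_12| = 0 < |(A^dagger)_12| = 0.5, so the elementwise bound fails.
print(abs(B[0, 1]) < abs(A_pinv[0, 1]))     # True
# What does hold here: A_pinv has the smaller sum of squared entries.
print(np.sum(B**2), np.sum(A_pinv**2))      # 1.0 0.5
```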