3 Resolution analysis

After having a solution (e.g. the least-squares solution, but generally), there are a couple of questions that can be summarized with the term resolution:

How does reality propagate into our image?
What is the quality of the inversion result?
What is the contribution of the individual data?

3.1 Model resolution matrix

We start with the data consisting of a true model $\vb m^\text{true}$ \[\vb d= \vb G \vb m^\text{true} + \vb n\]

If we do a matrix inversion with any inverse operator ${\vb G}^\dagger$, we obtain our model estimate ${\vb m}^\text{est}$ by application to $\vb d$ \[{\vb m}^\text{est} = {\vb G}^\dagger {\vb d}= {\vb G}^\dagger{\vb G}{\vb m}^\text{true} + {\vb G}^\dagger{\vb n} = \]

The first term on the right-hand side is the combination of forward and inverse operator, projecting from model to data and back to model, and is therefore called model resolution matrix ${\vb R}^M={\vb G}^\dagger{\vb G}$:

\[\vb m^\text{est} = {\vb R}^M {\vb m}^\text{true} + {\vb G}^\dagger{\vb n}\]

$\vb R^M$ obviously tells us how is the true model (${\vb m}^\text{true}$) is reflected in our estimate (${\vb m}^\text{est}$). Every column of the matrix can be interpreted as a so-called point-spread function, i.e. how a single anomaly is distributed in the model space.

The diagonal elements of $\vb R^M$ states how well the individual points are reflected correctly and can therefore be treated as resolution of individual parameters and plotted like a model, mostly between 0 and 1. Ideally, $\vb R^M$ is an identity matrix, but mostly it deviates from it.

3.2 Data resolution

Let’s put it the other way round: How well are observed data $d^\text{obs}$ explained by our model? We again use the model

\[{\vb m}^\text{est} = {\vb G}^\dagger {\vb d}^\text{obs}\]

and project into the data space

$${d}^ = {G}{m}^ = {G}{G}^^

The first term on the right-hand side projects from data to model and back to data and is therefore called data resolution or data importance (or information density) matrix ${\vb R}^D={\vb G}{\vb G}^\dagger$, so that

\[{\vb d}^\text{est} = {\vb R}^D {\vb d}^\text{obs}\]

It tells us how the individual data are correlated with each other. The diagonal of $R^D$ bears the information content of the individual data contributing to the model and can be plotted like any data vector .

3.2.1 Model resolution for overdetermined problems

\[{\vb G}^\dagger=({\vb G}^T{\vb G})^{-1} {\vb G}^T\]

\[\Rightarrow {\vb R}^M = ({\vb G}^T{\vb G})^{-1} {\vb G}^T {\vb G}= {\vb I}\]

perfect model resolution

\[{\vb G}^\dagger=({\vb G}^T{\vb G})^{-1} {\vb G}^T \quad \Rightarrow \quad {\vb R}^D = {\vb G}({\vb G}^T{\vb G})^{-1} {\vb G}^T\]

3.2.2 Resolution of matrix problem

Show the code

Ginv = np.linalg.inv(G.T @ G) @ G.T
RM = Ginv@G
RD = G@Ginv
im = plt.imshow(RD, vmin=-1, vmax=1, cmap="bwr")
plt.colorbar(im)

3rd row (measurement) is most important
1st and 2nd are highly correlated, sum(diag(RD)) $\Rightarrow$ 2

3.3 Linear regression

Show the code

x = np.arange(0, 3, 0.1)
x[-1] = 3.5
err = 0.1
d = 0.5-0.5*x + np.random.randn(len(x)) * err
A = np.ones((len(x), 2))
A[:, 1] = x[:]
Ainv = np.linalg.inv(A.T @ A) @ A.T
m = Ainv @ d
print("m = ", m)
fig, ax = plt.subplots()
ax.plot(x, d, "b+")
ax.plot(x, A@m, "r-")
ax.grid()
residual = A @ m - d
print("mean=", np.mean(residual))
print("std=", np.std(residual))
print("X²=", np.sqrt(np.mean((residual/err)**2)))

m =  [ 0.47881015 -0.46971096]
mean= 1.295260195396016e-17
std= 0.08725032651333633
X²= 0.8725032651333633

3.4 Linear regression - data importance matrix

Show the code

RD = A @ Ainv
fig, ax = plt.subplots(figsize=(7,6))
im = ax.matshow(RD, cmap="bwr",
    vmin=-0.3, vmax=0.3)
plt.colorbar(im);

neighboring data points are correlated, opposite anti-correlated

Show the code

rd = np.diag(RD)
display(np.sum(rd))
plt.plot(x, rd, "+-", ms=5);

np.float64(1.9999999999999996)

main diagonal of $\vb R^D$ $\Rightarrow$ data importance (outside biggest)
sum of importances equals $M$ (later: rank of matrix)

3.5 Errors & robust inversion

error goes into the scaling of $\vb G$ and $\vb d$
sometimes data have different errors
there might be outliers spoiling the result

3.5.1 Error weighting

Unweighted residual norm (root-mean square RMS)

\[\|{\bf d}-{\bf f}({\bf m})\| = \sqrt{1/N\sum (d_i-f_i({\bf m}))^2}\]

Weighting by individual error $\epsilon_i$ (chi-square value):

\[\chi^2=\frac1N \sum \left( \frac{d_i-f_i({\bf m})}{\epsilon_i} \right)^2 \rightarrow \min\]

In case of exact error estimates: $\chi^2=1$

3.5.2 Error weighting

replace $d_i$ by $\hat{d}_i=d_i/\epsilon_i$ leads to \[{\bf m}= (\hat{\bf G}^T\hat{\bf G})^{-1} \hat{\bf G}^T \hat{\bf d}\] with $\hat{{\bf G}}=\mbox{diag}(1/\epsilon_i)\cdot {\bf G}$

3.5.3 General $L_p$ norm

The $L_2$ norm is the rooted sum of squared elements of a vector $\bf v$:

\[\Phi^P=\|{\bf v}\|_2=\sqrt{\sum_i v_i^2}\]

More generally, the $L_P$ norm is computed by \[\Phi^P=\|{\bf v}\|_P=\left( \sum_i |v_i|^P \right)^{1/P}\]

3.5.4 $L_1$ norm minimization (IRLS)

\[\Phi^1=\|{\bf v}\|_1=\sum_i |v_i|\]

The $L_1$ norm is much less prone to outliers as the misfit is not squared

By the ratio of its contribution to L1 and L2 norm we can reweight $\epsilon_i$ \[ w_i = \frac{|v_i|/\|\vb v\|^1_1}{v_i^2/\|\vb v\|^2_2} = \frac{\|\vb v\|^2_2}{|v_i| \cdot \|\vb v\|^1_1} \]

This is known as Iteratively Reweighted Least-Squares (IRLS)

3.5.5 Linear regression with outliers

Show the code

d[-1] += 1
m = Ainv @ d
print("m = ", m)
fig, ax = plt.subplots()
ax.plot(x, d, "b+")
ax.plot(x, A@m, "r-")
ax.plot(x, 0.5-0.5*x, "m--")
ax.grid()

m =  [ 0.39065588 -0.38706633]

3.5.6 The misfit distribution

Important

Always have a look at the misfit distribution (not only $\chi^2$)

3.5.7 Wrap-up overdetermined problems ($N>M$)

objective function $\Phi$ as squared data misfit
weighting of individual data (unitless, more flexible & objective)
broad minimum of the objective function
grid search to plot $\phi$ $\Rightarrow$ only nice for $M$=2
least squares method yields
resolution matrices for model (perfect) and data (distributed)

3.6 Under-determined problems

specifically $N<M$ (too less data for the model pameters), but, more general, if the rank $r<M$

Rank of a matrix

number of independent columns and rows, i.e. the degree of non-degenerateness.

3.6.1 Example of a mixed-determined problem

\[ \textbf{G}=\mqty(1 & 1 & 0 \\ 0 & 0 & 1) \]

$N$<$M$: clearly under-determined,
parameter $m_3$ is perfectly determined (=$d_2$)
any combination of $m_1$ and $m_2=d_1-m_1$ fulfils the first equation
there is no unique solution (model ambiguity)

3.6.2 Changing the system

\[ \textbf{G}=\mqty(1 & 1 & 0 \\ 0 & 0 & 1) \]

adding data 1+2

\[ \textbf{G}=\mqty(1 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 1 & 1) \]

does not help, because they are linear dependent ($rk({\bf G})=2$).

3.6.3 Minimum-norm solution

Analog to the least-squares inverse $({\bf G}^T {\bf G})^{-1} {\bf G}^T$ there is an inverse for under-determined problems:

\[ \bf m = G^T (G G^T)^{-1} d = G^\dagger d \]

that is called minimum-norm solution.

Observation from Notebook

The backslash operator (G\d) yields the minimum-norm solution.

3.6.4 Derivation

From the models fulfilling $\bf G m=d$ we are looking for the “smallest”

\[ \text{min} \Phi=\bf m^T m \quad\mbox{with}\quad G m=d \]

We use the method of Lagrangian parameters ($\lambda_i$) \[ \Phi = \bf m^T m + \lambda^T (G m - d) \rightarrow \mbox{min} \]

The derivatives vanish

\[ \frac{\partial\Phi}{\partial \lambda} = \bf G m - d=0 \]

\[ \frac{\partial\Phi}{\partial m} = 2\bf m^T + {\bf \lambda}^T G = 0 \]

\[ \Rightarrow {\bf m} = -\frac{1}{2} ({\lambda}^T {\bf G})^T = -\frac{1}{2}{\bf G}^T \lambda \]

from which we can determine the Langrangian parameters

\[ \lambda = -2 ({\bf G G^T})^{-1} \bf d \]

Replacing them we obtain the minimum-norm solution \[ {\bf m}_{MN} = \bf G^T (G G^T)^{-1} d \]

3.6.5 Resolution of the minimum norm inverse

\[ \bf G^\dagger = G^T (G G^T)^{-1} \]

\[ {\bf R}^M = \bf G^\dagger G = G^T (G G^T)^{-1} G \]

\[ {\bf R}^D = \bf G G^\dagger = G G^T (G G^T)^{-1} = I \]

Note

For underdetermined problems, all data are independent and equally important. Model parameters are insufficiently resolved.

3.7 Resolution kernels (point spread functions)

\[ \Rightarrow \vb R^M=\vb G^\dagger \vb G=(\vb G^T \vb G + \lambda \vb C^T \vb C)^{-1} \vb G^T \vb G \]

A column represents how a cell anomaly is “spread” (Friedel 2003).

3.8 Model resolution radius

The diagonal of the resolution matrix

states how a single model cell can be retrieved

How big needs a sphere to be to integrate a value of 1 (perfect resolution)?

Resolution radius \[ r = \frac{r_i}{\sqrt{\vb R^M_{ii}}}=\frac{\sqrt{A_i/\pi}}{\sqrt{\vb R^M_{ii}}} \]

Friedel S. 2003. Resolution, stability and effiency of resistivity tomography estimated from a generalized inverse approach. Geophys. J. Int. 153. 305–16

3 Resolution analysis

3.1 Model resolution matrix

3.2 Data resolution

3.2.1 Model resolution for overdetermined problems

3.2.2 Resolution of matrix problem

3.3 Linear regression

3.4 Linear regression - data importance matrix

3.5 Errors & robust inversion

3.5.1 Error weighting

3.5.2 Error weighting

3.5.3 General \(L_p\) norm

3.5.4 \(L_1\) norm minimization (IRLS)

3.5.5 Linear regression with outliers

3.5.6 The misfit distribution

3.5.7 Wrap-up overdetermined problems (\(N>M\))

3.6 Under-determined problems

3.6.1 Example of a mixed-determined problem

3.6.2 Changing the system

3.6.3 Minimum-norm solution

3.6.4 Derivation

3.6.5 Resolution of the minimum norm inverse

3.7 Resolution kernels (point spread functions)

3.8 Model resolution radius