3  Resolution analysis

After having a solution (e.g. the least-squares solution, but generally), there are a couple of questions that can be summarized with the term resolution:

3.1 Model resolution matrix

We start with the data consisting of a true model \(\vb m^\text{true}\) \[\vb d= \vb G \vb m^\text{true} + \vb n\]

If we do a matrix inversion with any inverse operator \({\vb G}^\dagger\), we obtain our model estimate \({\vb m}^\text{est}\) by application to \(\vb d\) \[{\vb m}^\text{est} = {\vb G}^\dagger {\vb d}= {\vb G}^\dagger{\vb G}{\vb m}^\text{true} + {\vb G}^\dagger{\vb n} = \]

The first term on the right-hand side is the combination of forward and inverse operator, projecting from model to data and back to model, and is therefore called model resolution matrix \({\vb R}^M={\vb G}^\dagger{\vb G}\):

\[\vb m^\text{est} = {\vb R}^M {\vb m}^\text{true} + {\vb G}^\dagger{\vb n}\]

\(\vb R^M\) obviously tells us how is the true model (\({\vb m}^\text{true}\)) is reflected in our estimate (\({\vb m}^\text{est}\)). Every column of the matrix can be interpreted as a so-called point-spread function, i.e. how a single anomaly is distributed in the model space.

The diagonal elements of \(\vb R^M\) states how well the individual points are reflected correctly and can therefore be treated as resolution of individual parameters and plotted like a model, mostly between 0 and 1. Ideally, \(\vb R^M\) is an identity matrix, but mostly it deviates from it.

3.2 Data resolution

Let’s put it the other way round: How well are observed data \(d^\text{obs}\) explained by our model? We again use the model

\[{\vb m}^\text{est} = {\vb G}^\dagger {\vb d}^\text{obs}\]

and project into the data space

$${d}^ = {G}{m}^ = {G}{G}^^

The first term on the right-hand side projects from data to model and back to data and is therefore called data resolution or data importance (or information density) matrix \({\vb R}^D={\vb G}{\vb G}^\dagger\), so that

\[{\vb d}^\text{est} = {\vb R}^D {\vb d}^\text{obs}\]

It tells us how the individual data are correlated with each other. The diagonal of \(R^D\) bears the information content of the individual data contributing to the model and can be plotted like any data vector .

3.2.1 Model resolution for overdetermined problems

\[{\vb G}^\dagger=({\vb G}^T{\vb G})^{-1} {\vb G}^T\]

\[\Rightarrow {\vb R}^M = ({\vb G}^T{\vb G})^{-1} {\vb G}^T {\vb G}= {\vb I}\]

perfect model resolution

\[{\vb G}^\dagger=({\vb G}^T{\vb G})^{-1} {\vb G}^T \quad \Rightarrow \quad {\vb R}^D = {\vb G}({\vb G}^T{\vb G})^{-1} {\vb G}^T\]

3.2.2 Resolution of matrix problem

Show the code
Ginv = np.linalg.inv(G.T @ G) @ G.T
RM = Ginv@G
RD = G@Ginv
im = plt.imshow(RD, vmin=-1, vmax=1, cmap="bwr")
plt.colorbar(im)

  • 3rd row (measurement) is most important
  • 1st and 2nd are highly correlated, sum(diag(RD)) \(\Rightarrow\) 2

3.3 Linear regression

Show the code
x = np.arange(0, 3, 0.1)
x[-1] = 3.5
err = 0.1
d = 0.5-0.5*x + np.random.randn(len(x)) * err
A = np.ones((len(x), 2))
A[:, 1] = x[:]
Ainv = np.linalg.inv(A.T @ A) @ A.T
m = Ainv @ d
print("m = ", m)
fig, ax = plt.subplots()
ax.plot(x, d, "b+")
ax.plot(x, A@m, "r-")
ax.grid()
residual = A @ m - d
print("mean=", np.mean(residual))
print("std=", np.std(residual))
print("X²=", np.sqrt(np.mean((residual/err)**2)))
m =  [ 0.47881015 -0.46971096]
mean= 1.295260195396016e-17
std= 0.08725032651333633
X²= 0.8725032651333633

3.4 Linear regression - data importance matrix

Show the code
RD = A @ Ainv
fig, ax = plt.subplots(figsize=(7,6))
im = ax.matshow(RD, cmap="bwr",
    vmin=-0.3, vmax=0.3)
plt.colorbar(im);

  • neighboring data points are correlated, opposite anti-correlated

Show the code
rd = np.diag(RD)
display(np.sum(rd))
plt.plot(x, rd, "+-", ms=5);
np.float64(1.9999999999999996)

  • main diagonal of \(\vb R^D\) \(\Rightarrow\) data importance (outside biggest)
  • sum of importances equals \(M\) (later: rank of matrix)

3.5 Errors & robust inversion

  • error goes into the scaling of \(\vb G\) and \(\vb d\)
  • sometimes data have different errors
  • there might be outliers spoiling the result

3.5.1 Error weighting

Unweighted residual norm (root-mean square RMS)

\[\|{\bf d}-{\bf f}({\bf m})\| = \sqrt{1/N\sum (d_i-f_i({\bf m}))^2}\]

Weighting by individual error \(\epsilon_i\) (chi-square value):

\[\chi^2=\frac1N \sum \left( \frac{d_i-f_i({\bf m})}{\epsilon_i} \right)^2 \rightarrow \min\]

In case of exact error estimates: \(\chi^2=1\)

3.5.2 Error weighting

replace \(d_i\) by \(\hat{d}_i=d_i/\epsilon_i\) leads to \[{\bf m}= (\hat{\bf G}^T\hat{\bf G})^{-1} \hat{\bf G}^T \hat{\bf d}\] with \(\hat{{\bf G}}=\mbox{diag}(1/\epsilon_i)\cdot {\bf G}\)

3.5.3 General \(L_p\) norm

The \(L_2\) norm is the rooted sum of squared elements of a vector \(\bf v\):

\[\Phi^P=\|{\bf v}\|_2=\sqrt{\sum_i v_i^2}\]

More generally, the \(L_P\) norm is computed by \[\Phi^P=\|{\bf v}\|_P=\left( \sum_i |v_i|^P \right)^{1/P}\]

3.5.4 \(L_1\) norm minimization (IRLS)

\[\Phi^1=\|{\bf v}\|_1=\sum_i |v_i|\]

The \(L_1\) norm is much less prone to outliers as the misfit is not squared

By the ratio of its contribution to L1 and L2 norm we can reweight \(\epsilon_i\) \[ w_i = \frac{|v_i|/\|\vb v\|^1_1}{v_i^2/\|\vb v\|^2_2} = \frac{\|\vb v\|^2_2}{|v_i| \cdot \|\vb v\|^1_1} \]

This is known as Iteratively Reweighted Least-Squares (IRLS)

3.5.5 Linear regression with outliers

Show the code
d[-1] += 1
m = Ainv @ d
print("m = ", m)
fig, ax = plt.subplots()
ax.plot(x, d, "b+")
ax.plot(x, A@m, "r-")
ax.plot(x, 0.5-0.5*x, "m--")
ax.grid()
m =  [ 0.39065588 -0.38706633]

3.5.6 The misfit distribution

Important

Always have a look at the misfit distribution (not only \(\chi^2\))

3.5.7 Wrap-up overdetermined problems (\(N>M\))

  • objective function \(\Phi\) as squared data misfit
  • weighting of individual data (unitless, more flexible & objective)
  • broad minimum of the objective function
  • grid search to plot \(\phi\) \(\Rightarrow\) only nice for \(M\)=2
  • least squares method yields
  • resolution matrices for model (perfect) and data (distributed)

3.6 Under-determined problems

  • specifically \(N<M\) (too less data for the model pameters), but, more general, if the rank \(r<M\)
NoteRank of a matrix

number of independent columns and rows, i.e. the degree of non-degenerateness.

3.6.1 Example of a mixed-determined problem

\[ \textbf{G}=\mqty(1 & 1 & 0 \\ 0 & 0 & 1) \]

  • \(N\)<\(M\): clearly under-determined,
  • parameter \(m_3\) is perfectly determined (=\(d_2\))
  • any combination of \(m_1\) and \(m_2=d_1-m_1\) fulfils the first equation
    there is no unique solution (model ambiguity)

3.6.2 Changing the system

\[ \textbf{G}=\mqty(1 & 1 & 0 \\ 0 & 0 & 1) \]

adding data 1+2

\[ \textbf{G}=\mqty(1 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 1 & 1) \]

does not help, because they are linear dependent (\(rk({\bf G})=2\)).

3.6.3 Minimum-norm solution

Analog to the least-squares inverse \(({\bf G}^T {\bf G})^{-1} {\bf G}^T\) there is an inverse for under-determined problems:

\[ \bf m = G^T (G G^T)^{-1} d = G^\dagger d \]

that is called minimum-norm solution.

NoteObservation from Notebook

The backslash operator (G\d) yields the minimum-norm solution.

3.6.4 Derivation

From the models fulfilling \(\bf G m=d\) we are looking for the “smallest”

\[ \text{min} \Phi=\bf m^T m \quad\mbox{with}\quad G m=d \]

We use the method of Lagrangian parameters (\(\lambda_i\)) \[ \Phi = \bf m^T m + \lambda^T (G m - d) \rightarrow \mbox{min} \]

The derivatives vanish

\[ \frac{\partial\Phi}{\partial \lambda} = \bf G m - d=0 \]

\[ \frac{\partial\Phi}{\partial m} = 2\bf m^T + {\bf \lambda}^T G = 0 \]

\[ \Rightarrow {\bf m} = -\frac{1}{2} ({\lambda}^T {\bf G})^T = -\frac{1}{2}{\bf G}^T \lambda \]

from which we can determine the Langrangian parameters

\[ \lambda = -2 ({\bf G G^T})^{-1} \bf d \]

Replacing them we obtain the minimum-norm solution \[ {\bf m}_{MN} = \bf G^T (G G^T)^{-1} d \]

3.6.5 Resolution of the minimum norm inverse

\[ \bf G^\dagger = G^T (G G^T)^{-1} \]

\[ {\bf R}^M = \bf G^\dagger G = G^T (G G^T)^{-1} G \]

\[ {\bf R}^D = \bf G G^\dagger = G G^T (G G^T)^{-1} = I \]

Note

For underdetermined problems, all data are independent and equally important. Model parameters are insufficiently resolved.

3.7 Resolution kernels (point spread functions)

\[ \Rightarrow \vb R^M=\vb G^\dagger \vb G=(\vb G^T \vb G + \lambda \vb C^T \vb C)^{-1} \vb G^T \vb G \]

A column represents how a cell anomaly is “spread” (Friedel 2003).

Ideal imaging

Contrast deficiency

Geometrical distortion

Side lobes

3.8 Model resolution radius

NoteThe diagonal of the resolution matrix

states how a single model cell can be retrieved

How big needs a sphere to be to integrate a value of 1 (perfect resolution)?

Resolution radius \[ r = \frac{r_i}{\sqrt{\vb R^M_{ii}}}=\frac{\sqrt{A_i/\pi}}{\sqrt{\vb R^M_{ii}}} \]

Friedel S. 2003. Resolution, stability and effiency of resistivity tomography estimated from a generalized inverse approach. Geophys. J. Int. 153. 305–16