Inversion theory:
Contents and objectives
This is an important page. After studying it carefully, you will be able to explain why the two concepts on this page (misfit and model norms) are important to the inversion problem. A more rigorous treatment is given in the "Inversion Tutorial" which can be reached by following the link on the left-hand menu.
Introduction
In the first section the relation between data and the Earth was explored and the UBC-GIF Linear Inversion Applet was introduced. Now it is time to explore the two components underlying successful inversion schemes: data misfit and model norms. Referring to the flow chart, these are the two parallel items that can be considered after all pre-requisites have been met. After dealing with these two aspects in this section, the subsequent section will address the "perform inversion" block. (Recall you can click flowchart icons to see the complete flow chart page.)
A simple under-determined problem:
What are we to do in the face of this seemingly unsolvable problem? The answer is that we can invoke some alternative information to narrow down the options. What kinds of alternative, or prior, information are available in the applied geophysics context? Some examples are:
One way to look at the inversion problem is to recognize that we are basically trying to build an automated way of selecting one solution which contains geologically useful information about the earth. If "fitting the data" is the only criterion, then there are infinitely many acceptable models when there are more model cells than data values. Out of that group of acceptable models, we want to find the "best" solution. If we could somehow equate "best" with some measurable attribute of the models, then we could solve the inverse problem using optimization. Our solution could be the smallest or largest model, or it could be the model with the highest or lowest value of this attribute. The only information that we have about the model, other than the data, is the prior information.
To make inversion an optimization problem, we must encode this prior information into a form that will help us with the optimization. That is, we must build a mathematical "ruler" to test sizes of possible models so that we can choose the one model that is optimal. The figure to the right illustrates an analogy - if one person is to be chosen from a room full of people, how is the choice made? Some criterion must be established. It could be height (as shown) and then the ruler to measure each candidate is the familiar ruler. Alternatively, the criterion could be IQ and then a suitable test score might be used as a ruler. Or the ruler could be the ability to run a particular distance; the possibilities are endless. The point is, we can make a decision (choose an optimal solution) by defining a suitable criterion.
Once a set of models is sorted based upon some type of ruler, then there will be two end-members in that set. There will be a smallest model and a largest model. These could be found using optimization techniques where we carry out a minimization to find the smallest element or a maximization to find the largest element. There is also a philosophical issue here. If the ruler involves amount of structure, then "large" models will have much structure and will, therefore, be complicated to interpret. Alternatively, "small" models will have minimum structure and it is hoped that such solutions at least capture the essential features of the true model. For most of what is done here, we will choose this minimization route.
Summary comment: The key point that we want to re-emphasize is that the task of choosing one model from infinitely many that acceptably fit the data is critically dependent upon what prior information we have about the earth and how we include that information into computational software.
When a ruler measures the size (or length) of an object (in our case one of our candidate models), the output is a single number. As discussed in the previous section, we can devise different rulers to achieve our goals. There are many different rulers possible and we'll explore some of these throughout the CD. For the moment, we consider our model to be a vector of M unknowns, $m = (m_1, m_2, \ldots, m_M)$. If the size of m is $\phi_m$, it can be given by
$$\phi_m = \| W m \|^2$$
where W is an M x M matrix, and the notation $\| \cdot \|$ denotes a "vector norm," meaning a measure of the size of a vector. The values of vector norms can be any number that is greater than or equal to zero. Two possible cases are outlined next:
Case I: Suppose W = I, the identity matrix (the matrix with ones on the diagonal and zeros everywhere else). Then
$$\phi_m = \| m \|^2 = \sum_{i=1}^{M} m_i^2 .$$
Taking the square root would generate the common Euclidean length (i.e. distance) as we know it. If m = (2, 2, ..., 2), a constant vector, then $\phi_m = 4M$.
Case II: Alternatively, W could be a finite difference matrix, which operates upon the differences between neighboring elements:
$$\phi_m = \sum_{i=1}^{M-1} (m_{i+1} - m_i)^2 .$$
In this case, contributions to $\phi_m$ occur only if there is a difference between neighboring values. Application of this norm to the constant vector m = (2, 2, ..., 2) would yield $\phi_m = 0$.
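The two cases above can be checked numerically. The following is a minimal sketch, assuming NumPy; the helper names `model_norm` and `difference_matrix` are illustrative and not part of the applet. The difference operator is built here with M-1 rows (one per neighboring pair), which is an assumption about its shape; padding it with a row of zeros would make it M x M without changing the norm.

```python
import numpy as np

def model_norm(W, m):
    """Return phi_m = ||W m||^2, the squared Euclidean length of W m."""
    r = W @ m
    return float(r @ r)

def difference_matrix(M):
    """(M-1) x M first-difference operator: row i computes m[i+1] - m[i]."""
    return np.diff(np.eye(M), axis=0)

M = 5
m = np.full(M, 2.0)  # the constant vector (2, 2, ..., 2)

phi_smallest = model_norm(np.eye(M), m)             # Case I:  W = I
phi_flattest = model_norm(difference_matrix(M), m)  # Case II: W = differences
print(phi_smallest, phi_flattest)  # 4M = 20.0 and 0.0 for M = 5
```

Changing any single element of `m` makes the second value non-zero, which is the sense in which this ruler measures structure rather than size.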
The images to the right illustrate both of these norms using a simple hypothetical situation in which the model is made up of four elements. The first image shows the values of the four model elements. The second image shows what is "measured" by the Euclidean norm, and the third image shows what is "measured" by the difference norm.
These concepts form the basis for an important idea: given that we have a number of possible solutions to our inverse problem, any specific solution can be found by choosing the one model which has the minimum vector norm value. The key is to choose an appropriate ruler (i.e. the right form of the vector norm).
These ideas will be illustrated using a simple 4-parameter problem characterized in the two boxes below. This system has four unknowns ( m1, m2, m3, m4) and two data (6, 2). It is an underdetermined problem and has no unique solution. Four possible solutions mA, mB, mC, mD are shown:
So how do we pick one solution? Suppose we are interested in that model that has the minimum Euclidean length (our usual ruler). Then we would calculate the Euclidean norm for all possible models and choose the model with the smallest norm, using the following equation:
$$\phi_m = \| m \|^2 = \sum_{i=1}^{4} m_i^2 \qquad (3.3.2)$$
Comparing $\phi_m$ for all four possibilities, it is clear that mB is the model to choose because its value of $\phi_m$ is the smallest (shown in the left-hand table below).
Suppose we are more interested in the "structure" of the model, that is, the differences between adjacent parameter values. In that case we would compare values of the norm using equation 3.3.3. This norm effectively measures the "flatness" of the model.
$$\phi_m = \sum_{i=1}^{3} (m_{i+1} - m_i)^2 \qquad (3.3.3)$$
In this case mA would be the model chosen.
Now, what is the effect of applying these two norms to obtain specific results for our four-parameter problem? We must calculate the respective norms for our problem. You can do this by hand, but the results of these calculations are shown here with the "best" solution in boldface:
Values of the "smallest" norm for each model.
Values of the "flattest" norm for each model.
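The selection procedure can be sketched in code. The kernel `G` and the candidates `P`, `Q`, `R`, `S` below are hypothetical, made up for illustration; they are not the system or the models mA to mD from this page. Only the two data values (6, 2) are taken from the text. Each candidate fits the data exactly, yet the two rulers pick different winners.

```python
import numpy as np

# Hypothetical 2 x 4 kernel (an assumption) and the data values (6, 2).
G = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
d = np.array([6.0, 2.0])

# Hypothetical candidate models; each one reproduces the data exactly.
candidates = {
    "P": np.array([3.0, 3.0, 1.0, 1.0]),
    "Q": np.array([3.5, 2.5, 1.5, 0.5]),
    "R": np.array([6.0, 0.0, 2.0, 0.0]),
    "S": np.array([2.0, 4.0, 0.0, 2.0]),
}

def smallest(m):
    """Sum of squared model values (form of eqn 3.3.2)."""
    return float(np.sum(m ** 2))

def flattest(m):
    """Sum of squared differences between neighbors (form of eqn 3.3.3)."""
    return float(np.sum(np.diff(m) ** 2))

for name, m in candidates.items():
    assert np.allclose(G @ m, d)  # every candidate fits the data
    print(name, smallest(m), flattest(m))

best_smallest = min(candidates, key=lambda k: smallest(candidates[k]))
best_flattest = min(candidates, key=lambda k: flattest(candidates[k]))
print(best_smallest, best_flattest)  # P wins on size, Q wins on flatness
```

The point of the sketch is only that "best" depends on the ruler: the data alone cannot distinguish between these candidates.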
When m is a function (that is, our cells have been reduced to an infinitesimal size) we can still have norms to measure the size of the elements. Below are four possibilities. The first function is a usual measure of size (or energy) associated with an element, m. The second function is associated with the derivative of the model or its "flatness." The third norm measures how close an element, m, is to a reference model. The last norm is particularly useful because of the flexibility that it affords. It has two parts. For historical reasons we refer to the first term as a "smallest" model component and the second term as a "flattest" model component. These are weighted with user defined constants ($\alpha_s$, $\alpha_x$) to allow recovery of models that have different amounts of energy and structural variation.
| Equation | Norm | Description |
|---|---|---|
| 3.3.4 | $\phi_m = \int m^2 \, dx$ | This is the "smallest" model norm introduced above. |
| 3.3.5 | $\phi_m = \int \left( \frac{dm}{dx} \right)^2 dx$ | This is the "flattest" model norm introduced above. |
| 3.3.6 | $\phi_m = \int (m - m_0)^2 \, dx$ | This is similar to the smallest model norm, but size is measured with respect to a reference value $m_0$. We will see later that this reference value is valuable for incorporating prior knowledge about the earth. |
| 3.3.7 | $\phi_m = \alpha_s \int (m - m_0)^2 \, dx + \alpha_x \int \left( \frac{dm}{dx} \right)^2 dx$ | This is a combination of the second and third norms above. The coefficients in front of the two terms allow you to build a range of reasonable models. |
The Linear Inversion Applet uses a model norm like the fourth version in this table. Most inversion schemes discussed here allow the user to specify the two values of $\alpha_s$ and $\alpha_x$. This allows you to ask the inversion scheme to return models that may be smoother (flatter), or more similar to a reference model ($m_0$). Exactly how to select suitable values for the $\alpha$'s is discussed in a later section (Practicalities: 2D DC) in this chapter.
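For a model sampled on a grid, the combined norm can be approximated by sums. The sketch below assumes a uniform 1D grid with cell width `dx` and a simple forward-difference derivative; the function name `combined_norm` and this particular discretization are illustrative assumptions, not the applet's implementation.

```python
import numpy as np

def combined_norm(m, m0, dx, alpha_s, alpha_x):
    """Discrete approximation of the combined norm on a uniform 1D grid.

    m, m0   : model and reference model sampled on the grid
    dx      : cell width (assumed uniform)
    alpha_s : weight on the "smallest" term
    alpha_x : weight on the "flattest" term
    """
    m = np.asarray(m, dtype=float)
    m0 = np.asarray(m0, dtype=float)
    smallest = np.sum((m - m0) ** 2) * dx            # ~ integral of (m - m0)^2 dx
    flattest = np.sum((np.diff(m) / dx) ** 2) * dx   # ~ integral of (dm/dx)^2 dx
    return float(alpha_s * smallest + alpha_x * flattest)

m = np.array([0.0, 1.0, 3.0, 3.0])
m0 = np.zeros(4)
only_smallest = combined_norm(m, m0, dx=1.0, alpha_s=1.0, alpha_x=0.0)
only_flattest = combined_norm(m, m0, dx=1.0, alpha_s=0.0, alpha_x=1.0)
print(only_smallest, only_flattest)  # 19.0 and 5.0
```

Raising `alpha_x` relative to `alpha_s` penalizes jumps between cells more heavily, so a minimization would favor flatter models; raising `alpha_s` pulls the result toward `m0`.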
Let us summarize what has been covered so far:
The second critical aspect of inversion is misfit. Traditionally this term refers to the difference between the measured data and the predicted data. If these two quantities are sufficiently close, then we consider the model to be a viable candidate for the solution to our problem. Before continuing, we re-introduce our 2-parameter tiny problem to show the importance of not fitting data too well if the data are known to be inaccurate.
The simple problem again:
This simple example illustrates that we do not want to find a model that reproduces the observations as well as possible. Because the data are inaccurate, we know that any model that reproduces those data exactly is guaranteed to be the wrong answer. A realistic goal, therefore, is to find those models whose predicted data are consistent with the errors in the observations. Clearly this requires that we have quantitative knowledge about the accuracy of the field data.
What is a reasonable measure of misfit? To answer this, we invoke a standard approach from basic statistics. If each datum, $d_i$, contains errors (or noise) that can be described as Gaussian with a standard deviation of $\sigma_i$, then a good measure of misfit between predictions and field data, which we will call $\phi_d$, is:
$$\phi_d = \sum_{i=1}^{N} \left( \frac{d_i^{obs} - d_i^{pred}}{\sigma_i} \right)^2 \qquad (3.3.8)$$
where $d_i^{obs}$ and $d_i^{pred}$ are observed (i.e. measured) and predicted data respectively.
The predicted data will be considered acceptable when the value of $\phi_d$ is less than some pre-determined tolerance. What is a reasonable tolerance? Again we draw upon basic statistics and note that the expected value of $\phi_d$ will be N, the number of data values. This is clear if you recognize that the difference in the above equation (i.e. the numerator) will, on average, have a magnitude of about $\sigma_i$, so the result, on average, will come out to a sum of N ones, or simply N.
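The misfit measure of eqn 3.3.8 takes only a few lines to compute. In this minimal sketch the function name `data_misfit` and the three data values are made up for illustration. Each residual has been chosen to equal its standard deviation in magnitude, so the misfit comes out at about N = 3.

```python
import numpy as np

def data_misfit(d_obs, d_pred, sigma):
    """phi_d = sum over data of ((d_obs - d_pred) / sigma)^2."""
    d_obs = np.asarray(d_obs, dtype=float)
    d_pred = np.asarray(d_pred, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    r = (d_obs - d_pred) / sigma   # residuals scaled by their uncertainties
    return float(np.sum(r ** 2))

# Made-up values: every residual is one standard deviation in size.
d_obs = np.array([6.1, 1.8, 4.0])
d_pred = np.array([6.0, 2.0, 4.3])
sigma = np.array([0.1, 0.2, 0.3])

phi_d = data_misfit(d_obs, d_pred, sigma)
N = d_obs.size
print(phi_d, N)  # phi_d is approximately equal to N
```

Dividing by `sigma` is what lets data with very different accuracies contribute on an equal footing.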
We showed above that we don't want to have a model, m, that results in a misfit that is too small because this would indicate that we are fitting the noise. Instead, we want the predicted data to be reasonably close to the observed values. Again, we will resort to statistics to gain some idea for what might be a reasonable value of $\phi_d$. The quantity in eqn 3.3.8 is a "chi-squared" random variable. Therefore its expected value is equal to N, the number of data, and the standard deviation is $\sqrt{2N}$. So if the data errors really were Gaussian with standard deviations $\sigma_i$, the true model should produce a misfit with the observations that is about equal to N. Also, there will be roughly a 68% chance (for large N, where the chi-squared distribution is close to Gaussian) that the misfit between the true model and the observations lies in the range $N \pm \sqrt{2N}$.
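These statistical claims can be checked with a small simulation. The sketch below assumes NumPy's random generator; the number of data, the noise level, and the number of trials are arbitrary choices. For the true model, the difference between observed and predicted data is just the noise, so only the noise needs to be simulated.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable
N = 50                           # number of data (arbitrary choice)
trials = 20000                   # number of simulated noisy data sets
sigma = np.full(N, 0.5)          # assumed standard deviations of the data

# One row per trial: Gaussian noise with standard deviation sigma_i.
noise = rng.normal(0.0, sigma, size=(trials, N))
phi_d = np.sum((noise / sigma) ** 2, axis=1)   # misfit of the true model

mean_phi = phi_d.mean()                        # should be close to N
std_phi = phi_d.std()                          # should be close to sqrt(2N)
inside = np.mean(np.abs(phi_d - N) <= np.sqrt(2 * N))
print(mean_phi, std_phi, inside)
```

With these settings the mean should land near 50, the standard deviation near 10, and the fraction of trials inside the range should be roughly two thirds.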
On this page, we have introduced two aspects that make the solution to under-determined inversion problems possible. One is the use of a model norm that allows our models to be ranked according to "size." Importantly, this model norm is designed by the user so that the minimum norm model, which will be found by the inversion algorithm, will be compatible with prior knowledge about the earth. The second aspect pertains to the misfit between the observed and predicted data. We want this misfit to be neither too small nor too large. In the next section we combine these two concepts into a solution for the inverse problem.
Parametric problems: When a data set is used to find some model that has fewer parameters than there are data, then we have an "over-determined" problem, and non-uniqueness is not the same problem as for under-determined problems. The solution can involve finding the model that reproduces the measured data as closely as possible. Conceptually this is what least squares regression does. The aim is to find a straight line (a model with 2 parameters, slope and intercept) through more than two data values.
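A straight-line fit illustrates the over-determined case. This is a minimal sketch using NumPy's `np.linalg.lstsq`; the five data values are made up and lie near the line 2x + 1. There are five data and only two unknowns, so no line passes through every point and the least squares solution is the one with the smallest sum of squared residuals.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
d_obs = np.array([1.1, 2.9, 5.2, 6.8, 9.1])   # made-up values near 2x + 1

# Each row of G maps the model (slope, intercept) to one predicted datum.
G = np.column_stack([x, np.ones_like(x)])

# Least squares solution: minimizes the sum of squared residuals.
m_est, *_ = np.linalg.lstsq(G, d_obs, rcond=None)
slope, intercept = m_est
print(slope, intercept)  # close to 2 and 1
```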