Introduction
Over the last decade, ice-sheet model development has focused primarily on the improvement of model physics and computational efficiency. This focus is responsible for the substantial increase in the variety of glaciological problems now addressed with models. In contrast, little attention has been paid to improving the techniques for fitting model output to observations; in other words, to model-parameter tuning. This emphasis has been justified up to now by the fact that simplifications and inaccuracies in model physics usually mean that model parameters bear little resemblance to physical counterparts. In the future, however, when model physics is improved to the point where agreement between model and observation is not simply accidental, one must pay close attention to the way in which model-input variables are chosen. In so doing, one can evaluate the degree to which model tuning gives insight into the physical system being modeled.
Here, I present a method for ice-sheet model tuning which is based on control theory. (See Reference WunschWunsch (1988) and Reference Bryson and HoBryson and Ho (1975), for a general introduction to control theory.) I use a simple, idealized model-tuning problem as a vehicle for demonstrating this method. The problem consists of estimating a basal-friction parameter which, when applied in a model that predicts the velocity of an idealized ice stream, yields the best agreement between the model and an arbitrary velocity field which plays the role of observations. The so-called observed velocity field is generated from the model using a known value of the basal-friction parameter. Thus, the performance of the control method is evaluated by comparing the known parameter value with that determined by model tuning.
Notation
The Problem
Consider an imaginary, rectangular ice stream which rests on a subglacial bed characterized by a spatially variable basal-friction parameter, ß(x,y). The object of the problem posed here, is to estimate ß(x,y) from an arbitrary ice-stream surface velocity, Ud, which serves the role of observations. (Here and henceforth, variables with subscript d play the role of observations in the following demonstration.) Let the function ß(x,y) be expanded in terms of basis functions, F i (x,y), which are assumed to span the function space, Φ, in which ß(x, y) is found:
The expansion coefficients, are determined from the inner product associated with Φ, which is assumed to be
where
and where is a normalization constant, and Γ is the areal domain of the ice stream.
The function space Φ and the inner product expressed by Equation(2) are normally defined by the physical constraints satisfied by allowable ß(x,y) If, for example, ß(x,y) were to satisfy a particular partial differential equation and boundary conditions associated with subglacial hydrology, the functions Fi(x, y) would be the eigenfunctions of the differential equation. In the present circumstance, I require only that ß(x, y) and its derivatives be continuous within the domain Γ which, as shown in Figure 1, is rectangular in geometry. In this circumstance, the F((x,y) can be taken to be sinusoidal functions with x and y wavelengths that are divisible into the horizontal dimensions of the ice stream. In other words, Equation (1) expresses ß(x,y) in terms of a two-dimensional Fourier series.
In practical circumstances, the infinite series in Equation (1) should be truncated to reflect the fact that surface-velocity data may not allow ß(x,y) to be determined to an arbitrarily fine spatial resolution. Reference Balise and RaymondBalise and Raymond (1985) argued, for example, that variations in basal-sliding velocity over scales less than the thickness of the ice stream are unlikely to have an appreciable effect on the surface velocity. It may thus be wise to truncate the series in Equation (1) in such a way as to exclude basis functions Fi(x,y) which have horizontal wavelengths less than the ice thickness. Truncation in this manner will limit the accuracy with which ß(x,y) can be determined. This limitation is acceptable, however, because other sources of inaccuracy, such as the ill-conditioning of arithmetic operations necessary to evaluate the fine-scale detail of ß(x,y), can then be avoided.
Here, I truncate the infinite series of Equation (1) after two terms. All coefficients a¿ are zero except two: c*i and α2 In practical applications, this truncation would be too severe. Doing so in the present demonstration, however, allows the steps which determine αχ and 0¾ to be visualized in terms of two-dimensional graphs and contour maps. The functions, F1(x,y) and F2(x,y) are chosen arbitrarily (and do not reflect the lowest horizontal wave numbers which, in other applications, might be associated with the indices i = 1, 2). This arbitrary choice gives a ß(x,y) which represents a field of sticky spots:
where Lx and Ly are the longitudinal and transverse dimensions of the ice stream, 200 km and 50 km, respectively. The term sticky spot has been used by Reference KambKamb (1991) to describe scattered regions of strong resistance to basal sliding which might occur if bedrock protruded through a subglacial layer of deformable sediment.
The ice-stream velocity, U = (u,v), that is produced by a numerical model of the ice stream is assumed to be independent of the vertical coordinate, z, and to be related to β by the following equations which govern the ice-stream’s stress balance. (The validity of these equations is taken for granted in this exercise; for a discussion of their applicability to realistic ice streams, see Reference MacAyealMacAyeal (1989).)
The viscosity, v, is strain-rate dependent, and accounts for steady-state power-law creep
The square of β is used in the basal-friction terms of the above equations to enforce the constraint that basal friction be positive.
As stated above, the problem to be solved here is to determine α1 and α2 from an observed or prescribed velocity field, U d=(u d,u d), which in this exercise is generated from Equations (6)–(8) by imposing a known The performance of the control method is then evaluated by comparing with α1 and α2 obtained by the control method. Before describing the control method, I first present the details used to create Ud.
Ideal Observations
The imaginary ice stream flows down a 50 km by 200 km rectangular channel (Fig. 1) where ice thickness and surface elevation are given by
and
where are expressed in units of meters.
Kinematic boundary conditions are applied at all boundaries as follows
and
where the unit for constants appearing as the first terms on the righthand sides of Equations (13) and (14) is ms−1. As with the choice of F1(x, y) and F2(x, y), the variations of , Hd and the kinematic boundary conditions are chosen arbitrarily: they simply serve to generate the observations Ud.
To this end, Equations (6)–(8), subject to Equations (9)–(14), were solved by using . As shown in Figure 1, this evaluation represents a field of sticky spots distributed across the ice-stream channel. The solution of the forward problem with the above input, Ud, is shown in Figure 2. This solution will serve as the (imaginary) surface-velocity observations to be used by the control method to obtain ß(x,y). To achieve this solution, I discretized the ice-stream domain on a 65 × 65 finite-difference grid and used a software package MUDPACK (MUD2SA, described by Reference AdamsAdams (1989)) to solve Equations (6) and (7). Non-linearity represented by the (low law expressed in Equation (8) was treated using the method of successive approximations (e.g. Reference MacAyealMacAyeal, 1989). MUDPACK is an elliptic partial-differential-equation solver developed by the National Center for Atmospheric Research of the U.S.A., and is based on the multi-grid relaxation method.
In practice, velocity observations are subject to measurement uncertainty and to spatial-sampling limitations. Here Ud is error-free because it is generated by a model. Approaches to dealing with measurement uncertainty will not be outlined here, but have been summarized by Reference ThackerThacker (1988). As I shall soon demonstrate, however, uncertainty will arise anyway because of resolution limitations associated with the assumed ice dynamics, and because of inaccuracy in the computations.
A Direct Approach
A simple and direct solution to the problem of estimating ß(x,y) (or, equivalently, α1 and α2) from Ud is presented first as an example of an inappropriate method. The difficulties associated with this inappropriate method will be used to motivate the control method discussed below. A direct solution to the problem is derived by rewriting Equations (6) and (7) in a manner that will isolate β:
or
Assuming β can be determined uniquely by these equations, α1 and α2 are then found by evaluating the inner products expressed in Equation (2).
The determination of α1 and α2 by the above means may not be possible in practice for the following reasons. First, errors in the velocity observations may produce unnatural small-scale structure in the velocity derivatives, which in turn can affect the computation of β. Secondly, the velocity magnitude may approach zero in some regions, and this will introduce zeros in the denominators of the righthand sides of Equations (15) and (16). Thirdly, the velocity errors may make the square roots expressed in Equations (15) and (16) complex valued. Fourthly, the governing (Equations (6) and (7) describe only large-scale aspects of the ice-stream flow. Thus, small-scale structure in Ud is incompatible with the assumptions used to derive Equations (15) and (16).
The Control Method
To avoid the above difficulties, one can use control methods to estimate β (or, equivalently, α1 and α2). In this case, the method does not require that Ud be an exact solution of Equations (6)–(8). Instead, α1 and α2 are estimated on the basis that, when substituted into Equations (6)–(18), they give the solution U which is as close as possible to Ud. This solution, U, may differ from Ud even when the closest possible fit is achieved because, as mentioned previously, Ud may be incompatible with solutions of Equations (6)–(8). In some circumstances, an inequality of this nature may suggest that Equations (6)–(8) may not adequately describe the dynamics of the ice stream. In other circumstances, an inequality merely reflects the inherent noise in the observed velocity.
Here, I use a least-squares measure, J, to gauge the misfit between U and Ud :
The implicit relationship between U and (α1, α2 given by Equations (6) and (7) can be satisfied as a constraint on the minimization of J′ by using a Lagrange-multiplier vector Λ = (λ, μ) (Reference Bryson and HoBryson and Ho, 1975):
Conditions for the minimization of J′ are generated by requiring that the variation of J′ with respect to the variations of be zero. This variation is not difficult to generate provided the variation of the viscosity, v, associated with the variation in U(x,y) is disregarded. (The alternative leads to a very tedious expression for the variation of J’ with respect to U.) I make this simplification here because the error it introduces in the expression for the gradient of J′ with respect to α1, α2 is small. (An alternative approach would be to attach the flow law given by Equation (8) as a constraint to the definition of the performance index using a third Lagrange multiplier; and then to treat the variation of v independently.) The variation of J’ so determined is written:
In deriving the expression for δJ′ shown above, the kinematic boundary conditions, which require δu=δu=0 on boundaries, are used to eliminate boundary integrals arising from integration by parts. Had dynamic, instead of kinematic, boundary conditions been defined at some of the boundaries, the comparable expression for δJ′ would have included boundary integrals.
The variation of J′ with respect to arbitrary variations in λ and μ is zero when Equations (6) and (7) are satisfied. This is the main rationale for introducing λ and μ as variables: they enforce satisfaction of the model equations. The variation of J′ with respect to arbitrary variations in u and υ is zero when the following equations are satisfied:
These equations generate the spatial distributions of λ and μ. They are motivated by the mathematics of minimization, not by the physics of ice flow. I will interpret these equations in detail after first justifying interest in λ and μ.
The benefit to be obtained from Λ comes from the fact that the gradient of J′ with respect to α1, α2 denoted by grad(J’), is expressed as a vector having two components that are written in terms of Λ = (λ, μ) and other variables of the system:
This gradient can be visualized in the present example by plotting the contours of J′ on a map of , the vector space spanned by all (α1, α2). The grad(J′) is a vector which points towards increasing J′, and which is everywhere orthogonal to the contours (except at extremal points where grad(J′) vanishes).
The ability to express grad(J′) in terms of an analytic expression (Equation (23)) provides three important work-saving benefits. First, the effort to minimize J′ is reduced because the direction of its gradient in K2 is now defined. Secondly, the extrema of J′ are defined when the expression given in Equation (23) is zero. This provides a convenient identification for the point(s) at which J’ is minimized. Thirdly, and most important, the effort to compute the grad(J′) is substantially reduced. Without Equation (23), a finite-difference evaluation of grad(J′) would require 2N solutions of the forward problem (defined as Equations (6)–(8) with a specific evaluation of α i,i=1,....,N where Ν is the number of unknown αis (two in this problem). With Equation (23), a single solution of the forward problem and a single solution of the equations which determine Λ (Equations (20)–(22)) give the same result. If there were large numbers of unknown αis, such as would be the case for realistic ice-stream modeling problems, evaluation of grad(J′) by Equation (23) can provide a substantial saving in effort. (Effort is also minimized by the fact that the equations to be solved for the Lagrange-multiplier vector components are sufficiently similar to the equations of the forward problem that little additional programming is required to solve them.)
The Adjoint Problem
Having defined the Lagrange-multiplier vector Λ = (λ,μ), and motivated interest in its evaluation, I now return to the subject of Equations (20)–(22) which determine λ and μ from U and Ud These equations are called the adjoint equations because they are mathematically adjoint to the linear forms of Equations (6) and (7).
The adjoint equations have several distinctive features. They satisfy homogeneous boundary conditions, the surface-slope forcing term is absent and they are forced by model/observation mismatch (the difference between U and Ud). These distinctive features suggest that the Lagrange-multiplier vector Λ(x,y) can be regarded as a sensitivity function which determines the effect of model/observation mismatch on grad(J′). As was shown above, Λ not only provides for the explicit evaluation of grad(J′), it defines the spatial region over which a given velocity observation will constrain the value of α1, α2). In this way, Λ can provide useful insight into the resolving power of the observations.
Conditions Defining the Minimum of J′
A solution of the inverse problem is found when the variation of J′ with respect to arbitrary variations in α1, α2) is zero. According to the definition of δJ′ derived above, this condition is met when
Any one of five possible situations can achieve this condition. First, α1, α2) could be zero. This situation would apply when further improvements in the fit between model and observation would require a basal stress that pushed the ice stream forward rather than resisted its flow. Secondly, u and υ could be zero. This possibility must be mentioned because, where the flow is zero, the basal-friction coefficient has no relevance to the fit between model and observations. Thirdly, λ and μ could be zero. As seen from the forcing terms of Equations (20) and (21), this circumstance would occur when the misfit between model and observation is zero everywhere (the minimum value of J′ in this circumstance is zero). Fourthly, the vectors U and Λ could be everywhere orthogonal. In this case, the vector dot-products in the integrands of Equation (23) are zero. Fifthly and last, the scalar function Λ and U can be orthogonal to the functions . In this case, the integrals of Equation (23) define the inner product because may not be members of Φ. As will be described below, the last two circumstances occur when the fit between model and observations is not perfect, yet cannot be improved by additional tuning of α1, α2) The inability to improve further model tuning may reflect errors in the observations which are incompatible with model physics; or, alternatively, incorrect or over-simplified model physics.
Minimization Algorithm
To satisfy Equation (24) and find the values of α1, α2 which minimize J′, I recommend a simple down-gradient search algorithm. With an initial guess of α1, α2) I solve the forward problem expressed by Equations (6)–(8) to determine an associated velocity field U.
The velocity field, U, and effective viscosity, v(x,y), resulting from the initial guess are then used to evaluate the forcing term and effective viscosity in the adjoint equations (Equations (20)–(22)), which are then solved for the Lagrange-multiplier vector Λ. With both U and Λ determined, grad(J′) is evaluated using Equation (23).
If the value of grad(J′) is not zero (which is usually the case), the original estimate of en and α1 and α2 is improved by searching down the gradient of J′ This search will usually involve evaluating J′ (by solving the forward problem) several more times to determine how far down the gradient α1 and α2 should be changed. In this exercise, I employed a univariate minimization routine from IMSL called UVMIF to perform this search. (I added a constant times grad(J′) to the original (α1, α2); and then minimized J′ with respect to this new constant.) Once an improved estimate is achieved, the process described above is repeated until grad(J′) is zero or, more realistically, until its size is so small that further reductions in the magnitude of J′ are not warranted by the accuracy of the observations. The solution so obtained is designated (α1 *,α2 *).(Here, the superscript * signifies that are derived by the control method, and will normally be different from used to generate the data.)
The Hessian Matrix
A danger in the above algorithm is that might lie on a saddle point where J′ is minimized in some search directions, but maximized in others. To check for this, the Eigenvalues of the Hessian matrix, H, must be evaluated. This matrix is defined as the second derivative of the performance index with respect to the unknowns:
When all the Eigenvalues are positive definite, a local minimum of J′ is assured.
To perform this test, it is necessary to compute H. This computation can involve as much computation as the search for the minimum of J′. Reference ThackerThacker (1989) recommended a number of methods for evaluating H. A particularly efficient method involves using the adjoint equations to evaluate grad(J′) for points neighboring the solution . Evaluation of grad(J′) at the neighboring point generates the first column of H, and evaluation of grad(J′) at the neighboring point generates the second column of H.
Uncertainty
The uncertainty of is determined by a number of factors. First, J′ may possess multiple minima. In this circumstance, is determined by where the initial guess lies with respect to the attractor sets of the minima. Such attractor sets will be sub-sets of , and can have very complex geometry. Insight and experimentation with different initial guesses are necessary to assess the uncertainty associated with multiple minima.
A second factor which determines uncertainty in is the possibility that might lie in a subset of within which J′ is constant. I shall label this subset In this circumstance, it is not possible to identify a unique answer to the problem. Some additional constraint must be proposed to select the solution. One such constraint might be that the solution have minimum norm in in other words, that it minimize . More likely, one would select the solution on the basis of additional physics, direct observations of the bed or subjective prejudice (e.g. that the resulting ß(x,y) be as smooth as possible).
The description of a non-empty can be found by examining the Eigenvalues and Eigenvectors of the Hessian matrix (Reference ThackerThacker, 1989). If Η is singular, or if some of its Eigenvalues are close to zero (as determined by some criterion related to the observational error or to the accuracy of calculations), then the geometry of in the neighborhood of is indicated by the Eigenvectors of Η associated with the zero or small Eigenvalues. The sub-set will extend in the direction of these Eigenvectors. In the present exercise, Η has only two Eigenvalues; thus the local geometry of will be a line extending in the direction of the Eigenvector associated with the zero or small Eigenvalue.
The third factor which determines the uncertainty in is the observational error (and the spatial coherence of this error if the velocity field is measured at sparsely distributed points). Reference ThackerThacker (1988, Reference Thacker1989) discussed how this uncertainty can be computed using the Eigenvalues and Eigenvectors of H.
Solution of the Problem
To evaluate the performance of the control method described above, I determined from the imaginary observations, Ud, generated from the known . An initial guess of was used to start the minimization algorithm. (Recall that ) Convergence of the minimization algorithm was achieved after approximately 20 iterations.
The progress of the minimization algorithm is examined in Figure 3 and 4. Figure 3 displays the monotonic decrease of J’ with each cycle through the minimization algorithm. Figure 4 presents a contour map of J’ on , and shows where the initial guess and subsequent determinations of (α1,α2) lie. (This map was created by solving the forward problem and calculating J’ for 400 points surrounding the known solution.)
Mismatch between U and Ud converges to zero with each iteration of the minimization algorithm as shown in Figure 5. Lagrange-multiplier vector fields calculated at various steps through the minimization algorithm are displayed in Figure 6. These vector fields are generated by errors in fitting the model to imaginary observations, and are used to calculate the gradient of J′. As the iteration procedure converges, Ν tends everywhere toward zero magnitude as expected.
The Hessian, H, evaluated at |U–Ud(m a *#x2013;1)| was found to have two positive Eigenvalues, but one was only 12% of the other. That they are both positive confirms the fact that the solution lies at a local minimum of J′. The large difference between the Eigenvalues reflects the fact that the contours of J,′ describe a steep gradient in the (α1,α2) direction and a gentle gradient in the (—α1,α2) direction (Fig. 4). Although not shown in Figure 4, J′ was found to have three additional minima. One is located at . The other two were located at approximately .
Conclusion
The above exercise demonstrates the way in which control methods can be used objectively to tune unconstrained parameters used in ice-sheet models. Although the circumstances of this exercise have been deliberately simplified, the exercise suggests that control methods can be useful in the design of modeling experiments which address real glaciological observations. The advantages of control methods over ad hoc trial-and-error techniques are numerous. Control methods are more efficient and objective. They reduce the computational work load necessary to evaluate the gradient of model-output variables with respect to unknown model-input variables. They define the region over which a given model-input parameter can influence the match between an observation and a model counterpart. They allow efficient assessment of uncertainty. Finally, control methods provide a formal means to accomplish one of the aims of ice-sheet modeling research: the inversion of ice-dynamics equations to allow unobservable parameters to be evaluated from disparate observations and field data.
Acknowledgements
I thank R. Frolich, V. Barcilon, J. Firestone, E. Waddington, R. Bindschadler, R. Alley, I. Whillans, K. van der Veen, W. Thacker and C. Wunsch for valuable suggestions making this research possible. I especially thank K. Hutter for suggesting improvements in the mathematical discussions and for improving the clarity of the writing. Financial support was provided by the U.S. National Science Foundation grant NSF DPP 89–14938. Computer support was provided by the National Center for Atmospheric Research (NCAR) and by the National Center for Supercomputing Applications (NCSA), both of the U.S.A.
The accuracy of references in the text and in this list is the responsibility of the author, to whom queries should be addressed.