Calculation of Parameters: Principles, Criterion Function,

Robust Estimates

Tests of Adequacy

Parameters to be found

Criterion function. Robust estimates

Calculation of unknown parameters using Newton and Gauss-Newton Methods:

Algorithms

Stop criteria

Analysis of model adequacy:

Residual variance and criterion χ²

Skewness, kurtosis, mean residual and residual mean

Estimation of errors and correlation coefficients for parameters

Redundancy of a model as decided from the analysis of the Jacobian matrix

To the Help main page

Parameters to be found

  1. logarithms (p in number) of unknown equilibrium constants;
  2. intensity factors (W per analytical position) of reagents (e.g. molar absorptivities in spectrophotometry) for L analytical positions (e.g. wavelengths in spectrophotometry).

Total number of parameters to be found: z = p + L × W.

The program CLINP 2.1 allows the logarithms of unknown stability constants to be treated as functions of parameters LKu to be computed:

Here lg βi0 is the fixed contribution to lg βi;

LKu are the parameters to be found (their number must equal the number of unknown stability constants for the species),

tiu are elements of a p × p matrix T transforming the parameters LKu into lg βi.

See description of input data

For simplicity, in this section we assume lg βi0 = 0 and matrix T to be the identity. Hence, lg βi themselves are treated as the parameters to be calculated.
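The transformation just described can be sketched in code; the reconstruction below assumes the relation lg βi = lg βi0 + Σu tiu × LKu implied by the definitions of lg βi0, LKu and tiu.

```python
import numpy as np

def lg_beta_from_LK(lg_beta0, T, LK):
    """lg beta_i = lg beta_i0 + sum_u t_iu * LK_u: the fixed contributions
    plus the linear transformation T applied to the fitted parameters LK."""
    return (np.asarray(lg_beta0, dtype=float)
            + np.asarray(T, dtype=float) @ np.asarray(LK, dtype=float))

# With lg_beta0 = 0 and T the identity matrix (the simplification adopted in
# this section), lg beta coincides with LK:
lg_beta = lg_beta_from_LK([0.0, 0.0], np.eye(2), [3.1, 5.4])
```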

CRITERION FUNCTION. ROBUST ESTIMATES

A vector θ (composed of the unknown βi and αli) that minimizes the chosen criterion function U,

|θ> = arg min U(θ),

is adopted as the estimate of the sought-for parameters.

If all measured Alk are independent and the probability density of their errors ε is known, the criterion function is chosen according to the maximum likelihood principle.

If the density is normal, the parameters may be estimated by the nonlinear least squares method as the vector θ that minimizes the functional

U(θ) = Σl Σk wlk × Δlk²,

where wlk is the statistical weight corresponding to the variance estimate σ²(Δlk),

Δlk = Alk(measured) − Alk(calculated) is the discrepancy,

Alk is a measured property of the equilibrium system.
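As a sketch (not the CLINP 2.1 implementation), the weighted least-squares functional can be computed as follows; the data values below are hypothetical placeholders.

```python
import numpy as np

def ls_criterion(A_measured, A_calculated, weights):
    """Weighted least-squares criterion U = sum over l,k of w_lk * Delta_lk^2,
    with discrepancies Delta_lk = A_lk(measured) - A_lk(calculated)."""
    delta = np.asarray(A_measured, dtype=float) - np.asarray(A_calculated, dtype=float)
    return float(np.sum(np.asarray(weights, dtype=float) * delta**2))

# Hypothetical data: L = 2 analytical positions, N = 3 experimental points
A_meas = np.array([[0.10, 0.22, 0.35],
                   [0.05, 0.11, 0.18]])
A_calc = np.array([[0.11, 0.20, 0.36],
                   [0.05, 0.12, 0.17]])
w = np.ones_like(A_meas)          # unit statistical weights
U = ls_criterion(A_meas, A_calc, w)
```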

If the distribution of ε deviates from the normal one, estimates obtained by the least squares method (LSM estimates) are biased, insignificant and inefficient. Alternative methods of parameter estimation that are less sensitive to violation of the basic LSM assumptions are termed robust.

In developing a robust criterion for parameter identification, the distribution of ε is assumed to have "longer" tails than the normal distribution (i.e. a fraction of the probability mass is moved from the central region to the tails). These tails increase the probability of ε deviating considerably from zero. A quantitative measure of tail length is the kurtosis γ2: γ2 = 0 for the normal distribution and γ2 > 0 if the tails are longer.

Let the measurement errors ε have density

p(ε) = [(100 − δ) × φ(ε) + δ × h(ε)] / 100,

where φ(ε) is the standard density with zero mean and unit variance (density of the main observations), h(ε) is the density of outliers, and δ is the intensity of outliers (%). This hypothesis about experimental errors is termed "the model of outliers (gross errors)". It was shown by Huber [Huber P.J. Robust Statistical Procedures, SIAM, 1996] that the maximum likelihood estimate corresponds in this case to the minimum of the criterion

U(θ) = Σk ρ(ξk),

where ξk is the weighted discrepancy (wk^1/2 × Δk) and ρ(ξk) is a loss function. This formula is given for the case of one analytical position, L = 1.

Estimates corresponding to the minimum of the above functional are termed M-estimates. They are unbiased, significant and highly efficient provided that the function ρ(ξk) grows more slowly than ξ². M-estimates are more robust with respect to outliers than LSM estimates. The loss function introduced by Huber has the form

ρ(ξ) = ξ²/2 for |ξ| ≤ c;  ρ(ξ) = c × |ξ| − c²/2 for |ξ| > c,

where c is a constant. It depends on the intensity of outliers δ and is found as the solution of the equation

2φ(c)/c − 2Φ(−c) = δ / (100 − δ),

where Φ is the standard normal distribution function.
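A minimal sketch of the Huber loss and the resulting M-criterion; the default threshold c = 1.345 is a common textbook choice, not a value prescribed here.

```python
import numpy as np

def huber_loss(xi, c=1.345):
    """Huber loss rho(xi): xi^2/2 for |xi| <= c, c*|xi| - c^2/2 otherwise.
    Grows linearly (more slowly than xi^2) in the tails, which bounds the
    influence of outliers on the criterion."""
    xi = np.asarray(xi, dtype=float)
    small = np.abs(xi) <= c
    return np.where(small, 0.5 * xi**2, c * np.abs(xi) - 0.5 * c**2)

def m_criterion(xi, c=1.345):
    """M-estimation criterion U = sum_k rho(xi_k) over weighted discrepancies."""
    return float(np.sum(huber_loss(xi, c)))
```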

M-estimates keep their optimal properties for small numbers of experiments and are also robust to asymmetrical contamination of the normal distribution of errors.

Criterion M is a hybrid of the criteria corresponding to the LSM and the least moduli method (LMM). When c = ∞ (intensity of outliers δ = 0) it becomes the least squares criterion

U(θ) = Σk ξk² / 2,

whereas when δ → 100% (c → 0) it is equivalent to the least moduli criterion

U(θ) = Σk |ξk|.

M-estimates are not invariant with respect to scaling. To treat data with an arbitrary error variance (not only unit variance) a minimization problem

needs to be solved in which the unknown scale parameter σ is estimated along with the parameters θ. The value of a is found as

where z is the number of sought-for parameters and φ(x) is the standard normal density.

The variance of the density of the main observations, σ², is an estimate of the residual variance provided that the distribution of errors is normal (gross errors are absent):

The problem of parametric identification is easily generalized to the multidimensional case, when L properties of the equilibrium system are registered at each experimental point. In this case the criterion to be minimized is

Note that the method of data analysis based on M-estimates is highly flexible in comparison with the least squares method owing to the additional variable, the percentage of outliers δ. A priori information about δ is usually unavailable, and variation of this parameter may be considered as an adaptation of the parametric identification procedure to the data being treated.

To the top of the page

Calculation of unknown parameters using Newton and Gauss-Newton Methods

Algorithms

Stop criteria

Algorithms

Point estimates of the unknown parameters |lg βi>, <αli> are found as the values that minimize the criterion function U. Along with them the scale parameter σ is estimated.

The Newton and Gauss-Newton methods are implemented in CLINP 2.1 for minimizing U. The algorithm is as follows: at stage m the vector θ is refined at the fixed value σ(m), and then a new value of σ is calculated. The iterations are repeated until the difference between two subsequent values of σ is no more than 1%.

The iterative scheme that refines the current values of the sought-for parameters lg βc (c means current) at a fixed value of σ is represented by the formula

lg β+ = lg βc + λ × s,

in which the vector

s = −H⁻¹ × g

specifies the direction of movement from lg βc to lg β+, the new parameter values at the next iteration; H is the Hessian matrix, g is the gradient vector of U, and 0 < λ ≤ 1 is the step. The Hessian matrix is calculated exactly in the Newton method and approximated in the Gauss-Newton method. M-estimates of the unknown intensity factors αli are calculated at each lg β iteration by the linear regression method.

Global convergence of the algorithm and fast local convergence are ensured by

1) correction of an indefinite Hessian matrix based on its spectral decomposition;

2) bypassing saddle points in the direction of negative curvature;

3) a line-search strategy for λ outside the vicinity of saddle points (the Armijo-Goldstein rule [Armijo L. Pacific J. Math. 16 (1966) 1; Goldstein A. A. Constructive Real Analysis, Harper & Row, New York, 1967]).

The simplified calculation of the Hessian matrix in the Gauss-Newton method may result in loss of local convergence (a) in the case of large discrepancies caused by an inappropriate initial approximation for the sought-for parameters or an inappropriate set of chemical reactions included in the model, or (b) if the minimized criterion is essentially nonlinear.
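To illustrate the damped iteration lg β+ = lg βc + λ × s with s = −H⁻¹ × g, here is a minimal Gauss-Newton sketch for a generic least-squares problem. The model, the data and the simple step-halving line search are illustrative stand-ins (the step-halving replaces the Armijo-Goldstein rule used by the program).

```python
import numpy as np

def gauss_newton_step(residual, jac, theta, lam0=1.0, max_halvings=20):
    """One damped Gauss-Newton iteration theta+ = theta + lam*s, s = -H^{-1} g,
    with H approximated by J^T J and g = J^T r (the gradient of sum r^2 up to
    a factor of 2)."""
    r = residual(theta)
    J = jac(theta)
    g = J.T @ r                      # gradient direction
    H = J.T @ J                      # Gauss-Newton approximation to the Hessian
    s = -np.linalg.solve(H, g)       # search direction
    U0 = float(r @ r)
    lam = lam0
    for _ in range(max_halvings):    # simple backtracking: halve lam until U decreases
        theta_new = theta + lam * s
        if float(residual(theta_new) @ residual(theta_new)) < U0:
            return theta_new
        lam *= 0.5
    return theta                     # no acceptable step found

# Hypothetical one-parameter example: fit y = exp(b*x) to exact data, b = 0.7
x = np.array([0.0, 1.0, 2.0])
y = np.exp(0.7 * x)
residual = lambda th: np.exp(th[0] * x) - y
jac = lambda th: (x * np.exp(th[0] * x)).reshape(-1, 1)
theta = np.array([0.3])
for _ in range(30):
    theta = gauss_newton_step(residual, jac, theta)
```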

To the top of the page

Stop criteria

Two main stop criteria are used in iterating lg β.

The first one is constructed from the gradient of the minimized function, g(lg β). The vector gr(lg β+) of the relative gradient is calculated during the iterations, its components being

where U is the value of the minimized criterion. When the norm ||gr(lg β+)|| is small enough (smaller than 10⁻⁷–10⁻⁴) the iteration process is stopped.

The iteration process may also be stopped for another reason: when all increments si are small. Fulfillment of the following condition

is the criterion in this case, where

typβ is the "typical value" of the sought-for parameters (recommended: 1–10),

estep is the minimal admissible relative iteration step (recommended: 10⁻⁶–10⁻⁴).
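The two stop tests can be sketched as follows; the exact component-wise formulas used by CLINP 2.1 are not reproduced above, so the scaling by max(|lg βj|, typβ) is an assumed textbook form.

```python
import numpy as np

def stop_by_gradient(grad, theta, U, typ_beta=1.0, tol=1e-5):
    """Relative-gradient test (assumed form): components
    g_j * max(|theta_j|, typ_beta) / max(|U|, 1); stop when the norm of this
    relative gradient is below tol."""
    rel = grad * np.maximum(np.abs(theta), typ_beta) / max(abs(U), 1.0)
    return bool(np.linalg.norm(rel) < tol)

def stop_by_step(step, theta, typ_beta=1.0, estep=1e-5):
    """Relative-step test (assumed form): stop when every increment s_j is
    small relative to max(|theta_j|, typ_beta)."""
    rel = np.abs(step) / np.maximum(np.abs(theta), typ_beta)
    return bool(np.all(rel < estep))
```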

In addition, the user may restrict the number of iterations by specifying the maximum number Itermax.

If Itermax = 0, the program calculates the equilibrium composition of the system, the weighted discrepancies and the characteristics of model adequacy for the parameter values specified by the user.

To the top of the page

Analysis of model adequacy

Residual variance and criterion χ²

Skewness, kurtosis, mean residual and residual mean

Residual variance and criterion χ²

If the weighted discrepancies ξlk = wlk^1/2 × Δlk are independent random variables normally distributed with zero mean and unit variance, then the residual variance is

s0² = (1/f) Σl Σk ξlk²,

where f = N × L − z and z is the total number of calculated parameters. This random variable is distributed as χ²/f with f degrees of freedom and unit mean. A model may be considered adequate if the inequality

s0² ≤ χf²(α) / f

is fulfilled, where χf²(α) is the 100α-percentage point of the χ² distribution with f degrees of freedom. This condition is exact only for models linear in the parameters. It is also used for checking the adequacy of complex-formation models nonlinear in the parameters, the significance level α then being approximate.

When Huber's M-estimates are used in the parametric identification of models, the resulting values of the parameters αli and lg βi may not correspond to the minimum of the residual variance. Accordingly, the conventional procedure of checking model adequacy needs to be relaxed. This may be done by reducing f.

If the distribution of the weighted discrepancies ξlk differs from the normal one and has kurtosis γ2, a corrected value of f may be found from the formula

f = int[(N × L − z) × {1 + 0.5 × γ2 × (N × L − z)/(N × L)}⁻¹],

where int means the integer part of a number. Instead of γ2 its sampling estimate is used in CLINP 2.1.
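A sketch of the kurtosis correction and the residual variance; the sample central moments are computed directly from the weighted discrepancies, and the data below are hypothetical.

```python
import numpy as np

def corrected_dof(xi, z):
    """Kurtosis-corrected degrees of freedom
    f = int[(N*L - z) * {1 + 0.5*g2*(N*L - z)/(N*L)}^-1],
    with g2 the sample kurtosis of the weighted discrepancies xi."""
    xi = np.ravel(np.asarray(xi, dtype=float))
    NL = xi.size
    m2 = np.mean((xi - xi.mean())**2)
    m4 = np.mean((xi - xi.mean())**4)
    g2 = m4 / m2**2 - 3.0                        # sample kurtosis
    return int((NL - z) / (1.0 + 0.5 * g2 * (NL - z) / NL))

def residual_variance(xi, f):
    """Residual variance s0^2 = sum(xi^2) / f."""
    return float(np.sum(np.ravel(np.asarray(xi, dtype=float))**2) / f)

# Hypothetical discrepancies with long tails (positive sample kurtosis)
xi = [0.0, 0.0, 0.0, 2.0, -2.0, 0.0, 0.0, 0.0]
f = corrected_dof(xi, z=2)
s0_sq = residual_variance(xi, f)
```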

To the top of the page

Skewness, kurtosis, mean residual and residual mean

Proceeding from the values of the weighted discrepancies ξlk, sampling estimates may be found for the parameters of their distribution. CLINP 2.1 calculates:

skewness

γ1 = m3 / (m2)^(3/2),

where m2 and m3 are the second (variance) and third central moments of the distribution, respectively;

kurtosis

γ2 = m4 / (m2)² − 3,

where m4 is the fourth central moment;

the mean residual (the average of |ξlk|);

the residual mean (the average of ξlk).

If the ξlk are distributed normally with zero mean and unit variance, the expectations of the above quantities are 0 for the skewness, kurtosis and residual mean, and sqrt(2/π) ≈ 0.798 for the mean residual.

The sampling values may be compared with the percentage points of the distributions of the corresponding statistics (Tables 1–3).

If for a given significance level α the sampling estimates do not exceed their 100α-percentage points, the model may be considered adequate.
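The four sampling statistics can be sketched as follows; the moment formulas for skewness and kurtosis are standard, while the definitions of the mean residual and residual mean as the averages of |ξlk| and ξlk are assumptions consistent with their expected values under the standard normal distribution.

```python
import numpy as np

def residual_statistics(xi):
    """Sampling estimates for the distribution of weighted discrepancies xi:
    skewness g1 = m3/m2^(3/2), kurtosis g2 = m4/m2^2 - 3,
    mean residual = average of |xi| (assumed definition),
    residual mean = average of xi (assumed definition)."""
    xi = np.ravel(np.asarray(xi, dtype=float))
    m = xi.mean()
    m2 = np.mean((xi - m)**2)   # second central moment (variance)
    m3 = np.mean((xi - m)**3)   # third central moment
    m4 = np.mean((xi - m)**4)   # fourth central moment
    return {
        "skewness": m3 / m2**1.5,
        "kurtosis": m4 / m2**2 - 3.0,
        "mean_residual": float(np.mean(np.abs(xi))),  # ~sqrt(2/pi) under N(0,1)
        "residual_mean": float(m),                    # ~0 under N(0,1)
    }

stats = residual_statistics([0.1, -0.3, 0.5, -0.2, 0.4, -0.5])
```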

To the top of the page

Table 1. Percentage points of distribution

Number of observations    Significance level α
                          0.01    0.05    0.10
 11                       0.94    0.91    0.89
 16                       0.92    0.89    0.87
 21                       0.90    0.88    0.86
 26                       0.89    0.87    0.86
 31                       0.88    0.86    0.85
 36                       0.88    0.86    0.85
 41                       0.87    0.85    0.84
 46                       0.87    0.85    0.84
 51                       0.86    0.85    0.84
 71                       0.85    0.84    0.83
101                       0.85    0.83    0.83
201                       0.83    0.82    0.82
                          0.81    0.81    0.81

Table 2. Percentage points of the skewness γ1 distribution

Number of observations    Significance level α
                          0.01    0.05
  25                      1.06    0.71
  30                      0.98    0.66
  35                      0.92    0.62
  40                      0.87    0.59
  50                      0.79    0.53
  70                      0.67    0.49
 100                      0.57    0.39
 150                      0.46    0.32
 200                      0.40    0.28
 400                      0.285   0.20
1000                      0.18    0.13

Table 3. Percentage points of the kurtosis γ2 distribution

Number of observations    Significance level α
                          0.01    0.05
  50                      1.92    1.01
 100                      1.40    0.77
 200                      0.98    0.57
 300                      0.79    0.47
 500                      0.60    0.37
1000                      0.41    0.26

To the top of the page

Estimation of errors and correlation coefficients for parameters

After the iteration process has finished, the program calculates the symmetric covariance matrix D(lg β) of the calculated parameters. Its diagonal elements are variance estimates s²(lg βj) for the components of the vector lg β; the off-diagonal elements are their covariances cov(lg βi, lg βj):

This matrix is used for estimating the confidence limits of the parameters and their correlation coefficients.

Partial correlation coefficients rij characterize the mutual influence of the parameters lg βi and lg βj provided that all other parameters are fixed:

General correlation coefficients sij serve as a measure of the correlation of the parameters lg βi and lg βj when the other parameters are considered adjustable:

The multiple correlation coefficient Ri measures the degree of correlation of a given parameter lg βi with all the other parameters:

Correlation coefficients may vary in the interval [−1, 1]. Zero values mean total independence of the parameters; values of ±1 correspond to a linear dependence of the parameters (the model is overdetermined). Reconsideration of the chemical model (exclusion of one of the complexes) is believed to be needed when the absolute value of the corresponding correlation coefficient exceeds a threshold chosen in the interval 0.95–0.98.
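A sketch of the correlation analysis: the pairwise coefficients follow the standard covariance-to-correlation conversion, and the multiple correlation coefficient uses the standard formula via the inverse correlation matrix (an assumption, since the exact formulas used by the program are not reproduced above).

```python
import numpy as np

def correlation_analysis(D):
    """From the covariance matrix D of the estimated lg beta, compute the
    pairwise correlation matrix r_ij = cov_ij / sqrt(var_i * var_j) and the
    multiple correlation coefficients R_i (standard formula via the inverse
    of the correlation matrix; assumed, not quoted from CLINP 2.1)."""
    D = np.asarray(D, dtype=float)
    d = np.sqrt(np.diag(D))
    R = D / np.outer(d, d)                          # pairwise correlations
    Rinv = np.linalg.inv(R)
    multiple = np.sqrt(1.0 - 1.0 / np.diag(Rinv))   # multiple correlations R_i
    return R, multiple

# Hypothetical 2x2 covariance matrix of two lg beta estimates
R, multiple = correlation_analysis([[4.0, 1.2], [1.2, 1.0]])
```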

Redundancy of a model as decided from the analysis of the Jacobian matrix

The program shows the results of the singular value decomposition of the Jacobian matrix

J = ∂A / ∂lg β = U S V^T,

where U and V are orthogonal matrices and S is a rectangular matrix consisting of a diagonal block and zero blocks. The diagonal elements of S are the singular values κi.

Small singular values (κi/κmax < 10⁻⁴–10⁻⁶) correspond to linear combinations of the sought-for parameters lg βj to which the criterion function is insensitive.

The coefficients of these linear combinations are located in the columns of matrix V whose numbers correspond to those of the small singular values. Not infrequently these coefficients provide sufficient information for detecting a redundant complex. In more involved cases this may be done by analyzing the equilibrium concentrations of the reagents (an unresolvable linear combination of logarithms of constants arises when the expressions for the sought-for constants include concentrations of reagents not represented in the system).
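A sketch of the redundancy check via singular value decomposition; the Jacobian below is a hypothetical example with two nearly collinear columns.

```python
import numpy as np

def redundancy_check(J, rel_tol=1e-5):
    """Singular value decomposition J = U S V^T of the Jacobian dA/d(lg beta).
    Singular values with k_i / k_max below rel_tol flag linear combinations
    of the parameters (columns of V) to which the criterion is insensitive."""
    U, s, Vt = np.linalg.svd(np.asarray(J, dtype=float), full_matrices=False)
    small = s / s.max() < rel_tol
    return s, Vt[small].T     # columns: coefficients of insensitive combinations

# Hypothetical Jacobian whose two columns are nearly collinear, so one
# linear combination of the two parameters is poorly determined
J = np.array([[1.0, 1.0],
              [2.0, 2.0],
              [3.0, 3.0 + 1e-9]])
s, combos = redundancy_check(J)
```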

To the top of the page

To the Help main page