Transl Clin Pharmacol. 2016 Dec;24(4):161-168. English.
Published online Dec 16, 2016.
Copyright © 2016 Translational and Clinical Pharmacology
Review

R-based reproduction of the estimation process hidden behind NONMEM® Part 2: First-order conditional estimation

Kyun-Seop Bae,1 and Dong-Seok Yim2
    • 1Asan Medical Center, University of Ulsan College of Medicine, Seoul 05505, Korea.
    • 2PIPET (Pharmacometrics Institute for Practical Education and Training), College of Medicine, The Catholic University of Korea, Seoul 06591, Korea.

This article is distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/).

Abstract

The first-order conditional estimation (FOCE) method is more complex than the first-order (FO) approximation method because it estimates the empirical Bayes estimates (EBEs) at each iteration. Conversely, it is itself a further approximation of the Laplacian (LAPL) method, which uses second-order expansion terms. FOCE without INTERACTION can be used only for an additive error model, while FOCE with INTERACTION (FOCEI) can be used for any error model. The formula for FOCE without INTERACTION can be derived directly as an extension of the FO method, while the FOCE with INTERACTION method is a slight simplification of the LAPL method. Detailed formulas and R scripts are presented here for the reproduction of the objective function values calculated by NONMEM.

Keywords
R; NONMEM; FOCE

Introduction

The first-order conditional estimation (FOCE) method estimates the empirical Bayes estimate (EBE) at each step. Therefore, one iteration is almost the same as the first-order (FO) approximation method with the POSTHOC option, which produces EBEs after finalization of parameter estimation.[1] The term ‘parameter’ in this tutorial means theta (θ), omega (Ω), or sigma (Σ). NONMEM actually combines the theta vector and the vech (half-vectorization)-operated, ‘Cholesky-like’ decomposed omega and sigma matrices into one vector, which it differentiates simultaneously for the quasi-Newton-type minimization algorithm.[2, 3, 4] ‘Cholesky-like’ means that the decomposition is similar to a Cholesky decomposition, but not exactly the same, because it includes a kind of scaling for the unconstrained minimization algorithm. NONMEM 7.3 and R 3.3.1 were used for the calculations in this tutorial. The derivation of the formula and the scripts for the FO method are explained in the previous article of this tutorial series.[5]

Essentially, there are two ways to derive FOCE: one is simply an expansion of the FO approximation method,[6] and the other is a further approximation of the LAPL approximation.[7] If we want to consider the interaction between the random variables eta (η) and epsilon (ε), we need to use a second-order approximation of the Taylor series expansion or Laplacian approximation. The Taylor series expansion is adequate for the approximation of the prediction function (F), while the Laplacian approximation is suitable for the approximation of the objective function, which is usually in the form of an exponential function. Calculation of the objective function value using the second-order approximation of the prediction function (F) is too complex. Therefore, using the Laplacian approximation of likelihood (objective) function is much easier when considering the interaction of the random variables eta and epsilon, because the Laplacian method is the second-order approximation of an exponential function, such as the likelihood objective function.

FOCE without INTERACTION

The FO method can be regarded as a kind of maximum likelihood estimation (MLE) with the Taylor series expansion of the ‘observation’ (or prediction) function.[8, 9] The observation function, or the vector of observed values, will be denoted here as ‘Y’, and the prediction function, or the vector of predicted values, as ‘F’. If we know the mean and variance of the observation function (Y), we can plug them into the MLE formula of the normal distribution as follows:

L = (2π·V(Y))^(−1/2) · exp( −(Y − E(Y))² / (2·V(Y)) )    (eq. 1)

where E(Y) is the mean of observation, and V(Y) is the variance of observation.
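As a quick sketch, eq. 1 can be evaluated in R as a −2·log-likelihood, the quantity NONMEM actually minimizes; the function name and numeric values below are our own, for illustration only:

```r
# -2 * log(L) of eq. 1 for a single observation y,
# given its mean Ey = E(Y) and variance Vy = V(Y)
m2ll <- function(y, Ey, Vy) log(2 * pi) + log(Vy) + (y - Ey)^2 / Vy

# Cross-check against R's built-in normal density
y <- 1.3 ; Ey <- 1.0 ; Vy <- 0.25
m2ll(y, Ey, Vy) - (-2 * dnorm(y, Ey, sqrt(Vy), log = TRUE))  # only floating-point noise
```

NONMEM reports its objective function without the constant n·log(2π) term; that constant does not affect the minimization.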

For the FO method, we can use E(Y) and V(Y) as follows:

E(Y) = F0 = PRED (typical prediction)    (eq. 2)

V(Y) = G0 Ω G0^T + diag(H0 Σ H0^T) = C0    (eq. 3)

where

F = f1(θ, η, x),    F0 = f1(θ, 0, x)

Y = f2(F, ε)

G = ∂F/∂η = g(θ, η, x),    G0 = g(θ, 0, x)

H = ∂Y/∂ε = h(F, ε),    H0 = h(F0, 0)

Here, it is assumed that eta and epsilon are independent random variables, i.e. COV(η, ε) = 0, and that there is no correlation among residuals. The assumption of no correlation among epsilons is not necessary, but it is natural for a nonlinear structural model combined with a heteroscedastic statistical model. If we observe correlation among residuals, it means that the structural and/or statistical models are not adequate and the modeling process needs to continue. Sheiner and Beal discarded the assumption of normal distribution after building the objective function; they called this procedure the ‘extended least squares (ELS)’ method instead of the MLE method.[10, 11, 12]
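Eqs. 2 and 3 can be assembled directly in R; the matrices below are toy values of our own (3 observations, 2 etas, 1 additive epsilon), not output from any NONMEM run:

```r
# FO marginal variance (eq. 3): C0 = G0 Omega G0^T + diag(H0 Sigma H0^T)
G0 <- matrix(c(1, 0.5,
               1, 1.0,
               1, 2.0), nrow = 3, byrow = TRUE)  # dF/deta evaluated at eta = 0
H0 <- matrix(1, nrow = 3, ncol = 1)              # dY/deps for an additive error model
Omega <- diag(c(0.04, 0.09))                     # interindividual variances
Sigma <- matrix(0.01)                            # residual variance

C0 <- G0 %*% Omega %*% t(G0) + diag(diag(H0 %*% Sigma %*% t(H0)))
C0[1, 1]  # 0.04*1 + 0.09*0.25 + 0.01 = 0.0725
```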

Meanwhile, the FOCE method uses the following E(Y) and V(Y) for the MLE plug-in:

E(Y) = F1 − G1·ηt = IPRED − G1·ηt    (eq. 4)

V(Y) = G1 Ω G1^T + diag(H0 Σ H0^T) = CX    (eq. 5)

V(Y) = G1 Ω G1^T + diag(H1 Σ H1^T) = C1    (eq. 6)

where

F = f1(θ, η, x),    F1 = f1(θ, ηt, x)

Y = f2(F, ε)

G = ∂F/∂η = g(θ, η, x),    G1 = g(θ, ηt, x)

H = ∂Y/∂ε = h(F, ε),    H0 = h(F0, 0),    H1 = h(F1, 0)

F0 is called the typical prediction (PRED) with EBE = 0, while F1 is called the individual prediction (IPRED or IPRE) with the EBE. Subscript 0 denotes that 0 was used as the true eta value, and subscript 1 denotes that the EBE was used as the true eta value (ηt). The function for Y, f2, may contain theta (θ) in real situations, i.e. one can include thetas within the $ERROR clause of NONMEM; however, theta is not included in f2 here for clarity. One may also use a Y function such as Y = f(θ, η, x, ε). For the MLE plug-in, V(Y) in equation 6 seems more natural, but, strangely, NONMEM® uses V(Y) in equation 5.

Here, F is the predictive function, theta (θ) is a parameter (constant) vector within the predictive function, and eta (η) is a random variable vector indicating interindividual random variability with the mean of zero vector and the variance–covariance matrix of omega (Ω). Epsilon (ε) is also a random variable vector used to describe unexplainable residual variability between prediction (F) and observation (Y).

The primary work of NONMEM software is to estimate parameters (θ, Ω, and Σ), while the secondary work is to estimate the empirical Bayes estimate (EBE) of the random variable eta for each subject. Therefore, NONMEM has two objective functions to optimize: one for the parameters (θ, Ω and Σ), and the other for EBE. To estimate parameters (θ, Ω and Σ), EBEs are regarded as true values, while to estimate EBE, parameters (θ, Ω and Σ) are considered to be true. Parameters (θ, Ω and Σ) are constant for the entire population, while EBEs are constant for a certain subject or dosing occasion.
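With toy values of our own (2 etas, 3 observations, additive error; none of these numbers come from the NONMEM run in the figures), eqs. 4 and 5 for one subject can be computed in R as:

```r
# FOCE conditional mean (eq. 4) and variance (eq. 5) at the EBE eta_t
eta_t <- c(0.1, -0.2)                              # empirical Bayes estimate (toy)
F1 <- c(2.0, 1.5, 1.1)                             # IPRED, i.e. f1(theta, eta_t, x)
G1 <- matrix(c(1.0, 0.4,
               0.8, 1.1,
               0.6, 1.9), nrow = 3, byrow = TRUE)  # dF/deta evaluated at eta_t
H0 <- matrix(1, nrow = 3, ncol = 1)                # additive error: dY/deps = 1
Omega <- diag(c(0.04, 0.09)) ; Sigma <- matrix(0.01)

EY <- F1 - G1 %*% eta_t                            # eq. 4: IPRED - G1 * eta_t
CX <- G1 %*% Omega %*% t(G1) + diag(diag(H0 %*% Sigma %*% t(H0)))  # eq. 5
```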

For the additive error model, H1 is identical to H0; both equal 1.

Y = F + ε    (eq. 7)

H = ∂Y/∂ε = 1    (eq. 8)

In the case of the following combined error model:

Y = F + F·ε1 + ε2    (eq. 9)

∂Y/∂ε1 = F    (eq. 10)

∂Y/∂ε2 = 1    (eq. 11)

thus, H=(F,1).
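For this combined error model, each row of H is (F, 1) evaluated at one observation; a minimal R sketch with toy values of our own:

```r
# H for the combined error model Y = F + F*eps1 + eps2 (eqs. 9-11):
# one row per observation, columns = (dY/deps1, dY/deps2) = (F, 1)
F1 <- c(2.0, 1.5, 1.1)
H1 <- cbind(F1, 1)

# Per-observation residual variance: F^2 * var(eps1) + var(eps2)
Sigma <- diag(c(0.02, 0.1))          # var(eps1), var(eps2): illustrative values only
diag(H1 %*% Sigma %*% t(H1))         # 0.1800 0.1450 0.1242
```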

We can construct two types of residuals as follows:

R0 = C0^(−1/2) · (Y − F0) = WRES    (eq. 12)

R1 = CX^(−1/2) · (Y − F1) = CWRES    (eq. 13)

R1 = C1^(−1/2) · (Y − F1) = CWRES    (eq. 14)

R0 is called the weighted residual (WRES) in NONMEM, which is distinct from the weighted residual of statistics textbooks; consequently, the sign of WRES can be opposite to that of the residual, RES (= Y − F), in NONMEM. By contrast, R1 in equation 13 is called the conditional weighted residual (CWRES). However, using R1 in equation 14 seems more natural in the authors' opinion. C0, CX, and C1 are defined in equations 3, 5, and 6, respectively. The calculation of the matrix to the power of −0.5 is explained as the R function mat.sqrt.inv() in the previous article.[5]
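The matrix-power-of-−0.5 step can be sketched via eigendecomposition; the helper below is our own stand-in for the mat.sqrt.inv() function of the previous article, and the numbers are illustrative:

```r
# Inverse matrix square root of a symmetric positive-definite matrix C:
# C^(-1/2) = E diag(1/sqrt(lambda)) E^T, with E, lambda from eigen(C)
mat.sqrt.inv <- function(C) {
  e <- eigen(C, symmetric = TRUE)
  e$vectors %*% diag(1 / sqrt(e$values), nrow(C)) %*% t(e$vectors)
}

# WRES (eq. 12) with toy values: R0 = C0^(-1/2) (Y - F0)
Y  <- c(2.1, 1.4, 1.3)
F0 <- c(2.0, 1.5, 1.1)
C0 <- diag(3) * 0.04                 # toy V(Y); its inverse square root is diag(5, 5, 5)
R0 <- mat.sqrt.inv(C0) %*% (Y - F0)  # approx. (0.5, -0.5, 1.0)
```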

If we use the variance (V(Y) or C), and the weighted residual (WRES or R) for the objective function, NONMEM's objective function is as follows:

Oi = log|Ci| + Ri^T Ri    (eq. 15)

O = Σ Oi  (summed over subjects i = 1, …, n)    (eq. 16)

The total objective function value (O) is simply the summation of individual objective function values (Oi).
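Eqs. 15 and 16 translate directly into R; the subject data below are toy values of our own:

```r
# Individual OFV (eq. 15): Oi = log|Ci| + Ri^T Ri
ofv_i <- function(Ci, Ri) log(det(Ci)) + sum(Ri * Ri)

# Two toy subjects; the total OFV (eq. 16) is the plain sum over subjects
C1 <- diag(2) * 0.04 ; R1 <- c(0.5, -0.5)
C2 <- diag(3) * 0.09 ; R2 <- c(1.0, 0.2, -0.3)
O  <- ofv_i(C1, R1) + ofv_i(C2, R2)
```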

The reason that we use “G11 G21 G31 H11” (Fig. 1) for the extraction of G matrix and H matrix is that NONMEM (up to version 6.2) stores those values in the internal matrices as follows:

Figure 1
NONMEM control file for FOCE without INTERACTION

G = [ ∂F/∂η1    ∂²F/∂η1∂η1    (blank)       (blank)
      ∂F/∂η2    ∂²F/∂η2∂η1    ∂²F/∂η2∂η2    (blank)
      ∂F/∂η3    ∂²F/∂η3∂η1    ∂²F/∂η3∂η2    ∂²F/∂η3∂η3 ]

H = [ ∂Y/∂ε1    ∂²Y/∂ε1∂η1    ∂²Y/∂ε1∂η2    ∂²Y/∂ε1∂η3
      ∂Y/∂ε2    (blank)       (blank)       (blank) ]

The second subscripts in “G11, G21, G31, H11” mean the column numbers of the internal storage matrix, while the first subscripts denote the row numbers, i.e. G21 means row 2, column 1 of the G matrix. NONMEM version 7.x also supports printing the first column elements, such as G11, G21, G31, H11, and H21, but not other elements. Usually, we need the values of the first column only. If you want to reduce the time needed to calculate unnecessary derivatives, you can use the “$ABBREVIATED DERIV2=NO” clause in the NONMEM control file. If you use the PREDPP library (ADVAN subroutines), G and H matrices are renamed as GG and HH, respectively.

A slight difference in objective function values between R (Fig. 2) and NONMEM may arise from the many matrix calculations involved; matrix calculation results often differ slightly among software programs. In our example, the first seven digits were identical.

Figure 2
R script for FOCE without INTERACTION

About SIGDIG in NONMEM

SIGDIG (significant digits) can be defined as follows:

SIGDIG = −log10( |(θ − θtrue) / θtrue| )    (eq. 17)

In the scaled and transformed parameter (STP, now renamed UCP, unconstrained parameter) space, the standardization (dividing by θtrue) may not be very meaningful because the parameters are already standardized in a certain way. NONMEM starts minimization at the value 0.1 in UCP space.

If we calculate the following example, the process may be clearer. Notice that the values are in a real number domain, not in an integer domain.

−log10( |10.001 − 10.000| / 10.000 ) = 4

However, we do not know the true value of parameters. Therefore, we need to use the following instead:

ŜIGDIG = −log10( |(θi − θi−1) / θi| )    (eq. 18)

Here, θi denotes the i-th iterated value during the minimization of the objective function. As the minimization proceeds, differences between adjacent iterated values become smaller; therefore, SIGDIG increases with the iterations. NONMEM calculates SIGDIG a little differently because it also considers the gradients. Other software programs usually regard the minimization as successful if the differences in objective function values or parameter values between adjacent iterations are smaller than a specified amount. NONMEM, however, adds one more criterion for judging success or failure: the SIGDIG. When the minimization cannot reach the predefined SIGDIG criterion (the default is 3), it ends with a ‘rounding error’, a result that other software would consider a success. Because SIGDIG is calculated in the scaled and transformed parameter (STP) space, there are usually more actual significant digits in the original scale. One simple explanation is that the modeler specifies a constrained (bounded) optimization in the control file, with bounds at widest (−1e6, 1e6), whereas the background optimization performs unconstrained minimization with bounds (−inf, +inf). Therefore, even with SIGDIG = 1, one usually obtains more significant digits in the original scale. This means that, during model building, one does not need to place too much emphasis on SIGDIG; it could even be reduced to 1. One more tip for covariate screening: the OMEGA matrix can be fixed during covariate screening without causing much difference in covariate selection.
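Eq. 18 is a one-liner in R; the function name and iterate values are our own illustration:

```r
# Estimated significant digits (eq. 18) from two adjacent iterates of a parameter
sigdig <- function(th_i, th_prev) -log10(abs((th_i - th_prev) / th_i))

sigdig(10.000, 10.001)   # about 4: the iterates agree to roughly 4 significant digits
```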

FOCE with INTERACTION

To consider the interaction of eta and epsilon, we need to use at least a second-order approximation of the observation function (Y) or of the likelihood function (the accurate objective function).

The Taylor series expansion of the observation function (Y) for calculating E(Y) and V(Y) is too complex, and the resulting observations are no longer normally distributed, so we cannot use the MLE formula for the normal distribution. Therefore, it is better to use the Laplacian approximation, which is the second-order Taylor series approximation of the likelihood (an exponential function).

Oi = Φi + η̂i^T Ω^(−1) η̂i + log|Ω| + log|Ω^(−1) + Δ̂i/2|    (eq. 19)

Oi = Φi + η̂i^T Ω^(−1) η̂i + log|Ω| + log|Ω^(−1) + Gi^T Vi^(−1) Gi|    (eq. 20)

Oη̂,i = Φi + η̂i^T Ω^(−1) η̂i    (eq. 21)

where

Φi = −2LLi − ni·log(2π) = log|Vi| + (Yi − Fi)^T Vi^(−1) (Yi − Fi)

Δ̂i = ∂²Φi/∂η², evaluated at η = ηt

The objective function for LAPL is equation 19.[6, 7] Additionally, its approximation is equation 20, which can be used for the objective function of FOCE with or without INTERACTION. A more complete explanation of equation 20 will be made in the next article of this tutorial series.

Interestingly, the first two terms of equations 19 and 20 are the same as those in the objective function for EBE (eq. 21).
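A minimal numeric sketch of eq. 20 for a single subject follows; all values are toy illustrations of our own, not the dataset behind the figures:

```r
# FOCE-with-INTERACTION individual objective (eq. 20) for one subject
eta_hat <- c(0.1, -0.2)                            # EBE (toy)
Y  <- c(2.1, 1.4, 1.3) ; F1 <- c(2.0, 1.5, 1.1)    # observations and IPRED
G1 <- matrix(c(1.0, 0.4,
               0.8, 1.1,
               0.6, 1.9), nrow = 3, byrow = TRUE)  # dF/deta at eta_hat
H1 <- cbind(F1, 1)                                 # combined error: (F, 1) per row
Omega <- diag(c(0.04, 0.09)) ; Sigma <- diag(c(0.02, 0.1))

Vi   <- diag(diag(H1 %*% Sigma %*% t(H1)))         # residual variance at eta_hat
Phi  <- log(det(Vi)) + t(Y - F1) %*% solve(Vi) %*% (Y - F1)
OFVi <- Phi + t(eta_hat) %*% solve(Omega) %*% eta_hat +
        log(det(Omega)) + log(det(solve(Omega) + t(G1) %*% solve(Vi) %*% G1))
```

Note that Phi is evaluated at the EBE, so the interaction of eta and epsilon enters through both F1 and Vi.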

To show the calculation results of objective function values from the FOCE with INTERACTION method in NONMEM, the combined error model (epsilon 1 for proportional error and epsilon 2 for additive error) was chosen as shown in Figure 3.

Figure 3
NONMEM control file for FOCE with INTERACTION method

The difference in objective function values between NONMEM and R (Fig. 4) becomes larger, which may result from the more complex matrix calculations. One can also see larger differences in individual objective function values (OFV[3] in the code). If one uses the INTERACTION option of the FOCE method in NONMEM but the combination of formulae in equations 1, 4, and 5 in R, one gets a different objective function value: 88.187 in R compared with 92.831 in NONMEM. This shows that NONMEM uses equation 20 for the FOCE with INTERACTION method, not a direct extension of the FO method. A theoretical proof and numerical examples are provided in Wang's article.[7] The method of calculating CWRES in the FOCE with INTERACTION method is known to be the same as in the FOCE without INTERACTION method.

Figure 4
R script for FOCE with INTERACTION method

If one does not use the INTERACTION option when the error model is other than additive, a residual plot could show a peculiar trend as seen in Figure 5.

Figure 5
Comparison of WRES vs time plot in the proportional error model

Conclusion

Mathematical formulae and R scripts for FOCE with or without INTERACTION are shown in this article. For an additive error model, we can use the FOCE without INTERACTION method, while for other error models, we should use the FOCE with INTERACTION method. The FOCE formula obtained by approximating LAPL usually gives results very similar to those of the LAPL method. Therefore, we recommend the FOCE with INTERACTION method as a workhorse in population modeling.

Notes

Conflict of Interests: The authors declare that there is no conflict of interest.

Acknowledgement

The authors thank Dr. Yaning Wang at the FDA for his careful review and valuable comments to improve this manuscript. The authors also thank Dr. Sungpil Han for managing the references.

References

    1. Kang D, Bae KS, Houk BE, Savic RM, Karlsson MO. Standard Error of Empirical Bayes Estimate in NONMEM® VI. Korean J Physiol Pharmacol 2012;16:97–106. [doi: 10.4196/kjpp.2012.16.2.97]
    2. Sheiner LB, Beal SL. Evaluation of methods for estimating population pharmacokinetics parameters. I. Michaelis-Menten model: routine clinical pharmacokinetic data. J Pharmacokinet Biopharm 1980;8:553–571.
    3. Sheiner LB, Beal SL. Evaluation of methods for estimating population pharmacokinetic parameters. II. Biexponential model and experimental pharmacokinetic data. J Pharmacokinet Biopharm 1981;9:635–651.
    4. Sheiner LB, Beal SL. Evaluation of methods for estimating population pharmacokinetic parameters. III. Monoexponential model: routine clinical pharmacokinetic data. J Pharmacokinet Biopharm 1983;11:303–319.
    5. Kim MG, Yim DS, Bae KS. R-based reproduction of the estimation process hidden behind NONMEM® Part 1: first-order approximation method. Transl Clin Pharmacol 2015;23:1–7. [doi: 10.12793/tcp.2015.23.1.1]
    6. Beal SL, Sheiner LB, Boeckmann AJ, Bauer R. NONMEM user's guides. Ellicott City, MD, USA: 1989–2011.
    7. Wang Y. Derivation of various NONMEM estimation methods. J Pharmacokinet Pharmacodyn 2007;34:575–593. [doi: 10.1007/s10928-007-9060-6]
    8. Beal S, Sheiner L. The NONMEM System. Am Stat 1980;34:118–119. [doi: 10.2307/2684123]
    9. Plan EL, Maloney A, Mentre F, Karlsson MO, Bertrand J. Performance comparison of various maximum likelihood nonlinear mixed-effects estimation methods for dose-response models. AAPS J 2012;14:420–432. [doi: 10.1208/s12248-012-9349-2]
    10. Peck CC, Beal SL, Sheiner LB, Nichols AI. Extended least squares nonlinear regression: a possible solution to the "choice of weights" problem in analysis of individual pharmacokinetic data. J Pharmacokinet Biopharm 1984;12:545–558.
    11. Sheiner LB, Beal SL. Pharmacokinetic parameter estimates from several least squares procedures: superiority of extended least squares. J Pharmacokinet Biopharm 1985;13:185–201.
    12. Peck CC, Sheiner LB, Nichols AI. The problem of choosing weights in nonlinear regression analysis of pharmacokinetic data. Drug Metab Rev 1984;15:133–148. [doi: 10.3109/03602538409015060]
