144 PROCEEDINGS of the20-th International Conference SAER-2006 SYSTEM IDENTIFICATION OF BIOLOGICAL PROCESSES Kaloyan Yankov Medical Faculty, Thracian University, Armejska str., 11, Stara Zagora 6000, Bulgaria e-mail: [email protected] Abstract: This paper discusses some aspects of system identification of biological processes. An iterative optimization method for identification is proposed. The created software aids in appreciation and analysis of effects of new drugs. The major merit is the graphical user interface. It would make the software easy to work for users without specific expertise in system identification despite the sophistication of the theory and algorithms Key words: System identification, Systems biology, Cyclic coordinate descent. 1. INTRODUCTION The development of new drugs requires assessing their safety and efficacy before they can be marketed. The process of assessing consists of two phases: preclinical and clinical. In preclinical phase the experiments are carried out on animals in vivo or on isolated tissues and organs in vitro. The obtained results from the preclinical research phase usually are processed by the methods of statistics or by numeric analysis of their dynamic characteristics [1,2]. The received data are insufficient and it is incorrect to translate the results and conclusions upon the humans. A resources and methods are necessary in order to be able to predict the effects on humans and then to pass to the clinical research phase. To obtain reliable experimental data the use lots of animals is necessary and that is in discrepancy with modern tendencies in biological science. In addition this makes the experiment more expensive and complicates the research conditions. An alternative approach, which proves its effectiveness for system evaluation and determination its quality at reasonable price, is the use of simulation models [3]. The goal of modeling is to create a model of phenomenon using at least minimal number of variables and parameters and correctly to reproduce the main characteristics of phenomenon. Models map the relationships between the variables in the system to be modeled onto mathematical structures like simple algebraic equations, differential equations or even systems of differential equations. In developing a mathematical model two fundamental approaches are possible. The first is based on fundamental understanding of the modeled processes. The other is based on experimental data and is essentially a data-driven approach (black-box model). Experimental modeling is known in the literature as system identification (SI). Hence, nowadays there are many simulators and simulation languages aiding the process of system identification. Аll-purpose or general simulators. Мodels are expressed using differential equations or transfer functions, so they can be used for any kind of system: chemical, boilogical, mechanical, electrical, etc. Examples are Matlab-Simulink and ACSL [4]. The main obstacle in its use is that they require a good knowledge about mathematics and system theory. Specialised simulators. They are applicable in a concrete scientific field, even for concrete real systems. The scope of application is very limited – they cover a class of models typical for the field of study. The major merit is the graphical user interface (GUI), which should render it extremely easy to use, despite the sophistication of theory and algorithms. In the field of drug research physiologists, physicians and pharmacologist should work with such simulators without specific expertise in system identification. 23 – 24 September 2006, BULGARIA 145 In this study specialized software for system identification of biological processes is discussed. Biological systems are difficult to model as they are non-linear. An iterative optimization method for identification is proposed. As an example a model of plasma renin activity after nifedipine treatment is shown. 2. DESIGN OF IDENTIFICATION PROCESS The identification experiment is performed by exciting the system (Fig.1) using particular input signals U(t) and observing the corresponding output y(t) over a time interval. These signals are normally recorded in a computer mass storage for subsequent data processing. U(t) - input vector, y(t) - output vector, X(t) - state vector y(t) = f(t,X(t),U(t)) Figure 1 2.1. Input signals. They are used during the design testing. The most used signals are: • Impulse function. In biomedical experiments often the input is a short injection of a quantity of unknown or known material. These inputs can be safely modeled as impulses (Dirac functions). Because of Dirac function is a frequency-rich input signal that guarantee good parameter identification. • Step function. The input influence is considered as a step function, when the input has constant amplitude for all time course of the experiment/measurement. 2.2. Output response. It depends on an applied input signal and a state vector of the system and is denoted as y(t). During the experiment a few values of the output signal are recorded. Let Ф(t) ⊂ y(t) is the observational vector, which is known by the experiment: Ф(t) = [ф , ф ,... ф ,]Т, N – number of samples 1 2 N Vector Ф(t) is used during the process of identification. 2.3. Identification time t . The time for process modeling. It is generally defined as the time p for the response to reach the steady state level and the observed parameters get the normal values. 2.4. Sampling time. That is the time interval for performing the measurement of the output. It is unavoidable that sampling leads to information losses. Therefore, it is important to select the sampling frequency so that these losses are insignificant for the given application. 2.5. Signal processing. For increasing the quality of the SI sometimes processing the data beforehand is necessary. Filtering ignores any possible random fluctuations. Increasing the set Ф(t) when the number of measurements N is small and the function y(t) is assumed to be smooth is possible by using the appropriate approximation, for exampe spline interpolation [5]. 2.6. Model Structure. The most theoretical part of the SI procedure is the selection of the model structure. It is here that a priori knowledge as well as intuition and insight into the specific problem have to be combined with the formal (mathematical) properties of the models. Let Y(q ,q ,…,q ) is a function class and q ,q ,…,q are identification parameters. They 1 2 m 1 2 m form an identification vector Q = [q ,q ,…,q ]. Models belonging to a certain class differ 1 2 m according to the values of the parameters of Q. Defining the values of those parameters is the object of system identification. First, the experimental data should be defined as belonging to a certain class of functions Y. Many processes in biological systems, which are of scientific interest, lead to quantitative changes. They are characterized as being monotonous and most of them could be modeled by first order ordinary differential equations (ODE). Another category of processes, which belong to velocity processes in the system [6,7] follow an oscillation curve. The most appropriate 146 PROCEEDINGS of the20-th International Conference SAER-2006 model for them is second or third order ODE. The user of the modeling software chooses the model structure according to the experimental data. 3. SYSTEM IDENTIFICATION Let us chose to subject to SI the function y(t,Q) ∈Y(Q). As a result of the measurement we know its values ф in m number of points t i i y(t,q ,q ,…,q ) = ф(t), i=1,2,…,m (1) i 1 2 m i i The identification vector Q must be defined from the system (1). But this system is not well defined. For each point ф the residual d is: i i d(Q) = ||y-ф|| i i i The aim is to minimize d for the whole interval of identification t : i p D (Q) = inf|| d(Q) || < ε, ε>0 Y i (2) y∈Y Thus the identification goal is translated into an optimization problem. The optimization method is defined depending on the way of minimization of D (Q). The various identification tasks Y differ according to the way of defining of D and the chosen class of functions Y. Y 3.1. Least square fitting. An object of minimization is the functional: N D (Q)=∑d 2 →min (3) Y i i=1 It is applied when the experimental data are approximate and cannot be considered very accurate. The disadvantage of this method is that it does not guarantee the smallest absolute error, it does not lead to high accuracy and with experiment performed accurately the resulting model is unsatisfactory. 3.2. Uniform fitting The uniform fitting is aimed at minimizing the maximal deviation in the base points ф: i D (Q)=max|d |→min (4) Y i Uniform approximation is used when we are search for higher accuracy and the application of the least square fitting is difficult or impossible because of the nature of the identification function. The fault of this approach is that no method has been devised to find with a limited number of operations the coefficients of the best smooth approach. 3.3. Cyclic coordinate descent identification The use of the first derivative to find the minimum of system (2) is difficult. The received equations are very complex and the algorithms converge slowly. For nonlinear models very few results have been obtained and no standard algorithm exists for testing global identifiability. Various approaches have been proposed, e.g., power series [8], stochastic approximation [9] and similarity transformation methods [10]. In this work а Cyclic coordinate descent (CCD) [1] is applied. The CCD method is an iterative heuristic search technique that attempts to minimize the error by varying one parameter at a time. It reduces the multi-criteria optimization to the single-criteria one. A number of passes are made over the model equation to find the global minimum D for system (2). The successive Y approximation of the coordinate variable q on the k-th iteration is evaluated using the procedure: i qk = argminD (qk,qk,...θ,qk−1,...,qk−1) (5) i Y 1 2 i i+1 N θ where argmin(…θ…) is a such value of θ for which D (Q) has a minimum on the corresponding Y coordinate when the other coordinates are fixed. The values of D (Q) obtained at every iteration Y generate a sequence: D (Q0) ≥ D (Q1) ≥ D (Q2) ≥ .... D (Qк) = inf||D ||≥0 Y Y Y Y Y This sequence is monotonous and limited consequently it converges uniformly. If: 23 – 24 September 2006, BULGARIA 147 D (Qк) ≤ ε, ε>0 (6) Y then the problem is solved. If (6) is not true, i.e. D (Qк) ≥ ε and D (Qк) = D (Qк-1) (7) Y Y Y then the solution is a local not a global minimum. Usually the check for termination the identification process is the unequality: | D (Qк) - D (Qк-1) | < δ, where δ is a accuracy estimation (8) Y Y CCD is a simple for realization, relatively robust method. The main disadvantage is its locality and the fact that it may take a lot of calculations. 3.4. Program realization On fig.(2) is the CCD algorithm. Procedure Coordinate Descent; begin K:=0; // k-th Iteration Set_init_configuration(Q[K]); // initial values for configuration vector DY[K]:=Model_Identification(Q[K]); // calculate residual DY upon model Q[K] repeat // K-th iteration procedure inc(K); I:=0; repeat // minimization for I-th parameter q[I] inc(I) ; repeat q[I]:=Adaptation_step; // calc appropriate coordinate value DY[temp]:=Model_Identification(q[I]); // calculate DY upon model q[I] untul(DY[temp] < DY[K-1]); // better identification error until (I=N); // last parameter DY[K] := DY[temp] until ((DY[K] <eps) or ((DY[K] - DY[K-1])<delta); // global solution or local minimum end; Figure 2. Cyclic Coordinate Descent Algorithm The main role in this algorithm has the procedure Model_Identification. The equation of the model is solved and the error d in the base points ф is calculated for the formulated particular i i vector Q. When the model is algebraic the calculation is explicit. When the model contains differential equations the method of Runge-Kutta is used. The procedure Adaptation_step defines the current q. The explicit calculation of q can be i i very difficult or nearly impossible. That is why an adaptive procedure is used. • Start at base value q ; base • Find a direction of improvement using penalty function; • Step in the direction of improvement assuming monotonic behaviour. 5. EXAMPLE Нifedipine is a drug used for treatment of hypertension but it influences the secretion of the kidney enzyme renin [12,13]. In the present paper are studied its effect on the dynamic parameters of plasma renin activity (PRA). The experiment consists of impulse input of nifedipine, while the model output is the measured PRA. The system was influenced by peroral treatment of male Wistar rats with nifedipine 20 mg/kg body weight. (with permission from d-r A.Tolekova [14]). The identification time is 9 hours. The process is modeled using second order differential equation: d2y(t) dy(t) +2ζϖ +ϖ2y(t)+ K = KϖU(t) (9) dt2 dt 0 u where: U(t) – input signal; and identification parameters Q(ζ,ω,К , K ) are: 0 u ζ – damping ratio (10)