SAFETY, RELIABILITY AND APPLICATIONS OF EMERGING INTELLIGENT CONTROL TECHNOLOGIES

A Postprint volume from the IFAC Workshop, Hong Kong, 12-14 December 1994

Edited by T.S. NG and Y.S. HUNG
The University of Hong Kong

Published for the INTERNATIONAL FEDERATION OF AUTOMATIC CONTROL by PERGAMON

First edition 1995

IFAC Workshop on Safety, Reliability and Applications of Emerging Intelligent Control Technologies 1994

Organised by The Hong Kong Institution of Engineers
Sponsored by IFAC Technical Committee on Computers

FOREWORD

The trend to incorporate intelligent controllers into control systems has continued over the last few years. The number and type of intelligent controllers that contain variations of fuzzy logic, neural network, genetic algorithms or some other forms of knowledge based reasoning technology has increased dramatically. On the other hand, the stability of the system, when such controllers are incorporated, is difficult to analyse and the system behaviour under unexpected conditions may be difficult to predict. IFAC, recognising the industrial, technical and economic significance of safety, reliability and application issues of Emerging Intelligent Control Technologies has continued to promote research and dissemination of research results in these areas through the activities sponsored by the Technical Committee on Safety of Computer Control Systems. This first International Workshop on Safety, Reliability and Applications on Emerging Intelligent Control Technologies and the accompanied proceedings exemplified the diversity and significant progress made by leading researchers world-wide.
In addition to papers that are organised into 8 sessions, covering a wide range of topics including neural networks, fuzzy systems, genetic algorithms and applications in power systems, autonomous vehicles and fault detection, there are five keynote papers presented by internationally renowned experts and two panel discussions on current development and future research directions. The IPC members recognise the very high quality of the technical papers submitted and would like to extend their thanks and appreciation to the authors for their effort and kind co-operation. The National Organising Committee would also like to record their appreciation for the financial support by the sponsors and for individuals who have put in their valuable time and effort to make the Workshop successful.

T.S. Ng and M.G. Rodd, Chairmen IPC
T.S. Ng and Y.S. Hung, Proceedings Editors

Copyright © IFAC Emerging Intelligent Control Technologies, Hong Kong, 1994

SAFE AI - IS THIS POSSIBLE?

M.G. Rodd

Real-time AI Research Group
Department of Electrical and Electronic Engineering
University of Wales Swansea

Abstract

This deliberately provocative paper poses the question as to whether it is professionally-acceptable to use AI-based control systems in real-time automation. The premise is that the increasing demands for improved control of industrial processes are resulting in a search for new control techniques which, it is suggested, will be based on various AI methodologies. However, in the face of the inherently self-learning, self-adaptive nature of such techniques, it is pointed out that to produce safe, predictable systems, the controllers should be functionally deterministic! In posing this dilemma, the paper suggests possible ways in which it may be resolved.

Key words: Real-time AI, safety in control systems, verification of AI-based control systems, real-time control

1. INTRODUCTION — WHY AI?

The past 5 years or so have seen a dramatic shift in the way in which the design of control systems is being approached. There are probably two major influences which are seen to be at work, and which are causing control engineers to re-think their approach.

The first driving force is the realisation that much of the present mathematically-based control, however successful it has been at the lower levels of control, is simply incapable of effectively supporting and controlling process-wide integration. This latter aspect is a serious requirement — it is self-evident that to meet ever-increasing production demands, against the constant pressure to reduce all expenditures, the maximum efficiency in industrial processes must be achieved. This efficiency must be obtained, however, in the face of the need to ensure maximum process flexibility, so as to be able to meet the ever-shifting demands of a fickle international market. In most industrial processes, system-wide integration is thus becoming essential; simply controlling the lower process levels is of little use if the overall effectiveness, measured in terms of system input against system output (i.e. resources-in against finished products-out), is not as effective as possible.

Despite the extensive theoretical work that has been undertaken (for example in large-scale systems theory and industrial production scheduling) viable, practical solutions are slow to emerge — if indeed, they are emerging at all! This is not surprising, given the nature of the processes that need to be controlled, which include the following features (Rodd, 1992):

• Many real-world processes are simply incapable of being mathematically modelled — most complex processes are inherently non-linear, non-stationary and time-varying. Where models have been developed, they often prove to be far too simple to be useful in practice.
• Most, if not all, process variables will interact — often in a non-deterministic and ill-defined fashion.
• Many essential process parameters and variables are simply not capable of being measured.
• Much of the data which is measurable is often inaccurate, highly noisy and over-voluminous.
• "Good" operators always seem to be able to control processes, or parts thereof, better than their automated equivalents can.

Against the above demands, there is a second real driving force, which is gradually beginning to influence control-system design. Alongside the increasing demands for environmentally-acceptable industrial processes, lies a return to sanity in terms of reversing the perceived incessant drive towards fully automated processes. Not only is this happening for good sociological and political reasons, but experience is showing that it is extremely difficult, and extremely costly, to produce fully automated plants. Also, in many cases, if it is actually possible to achieve full automation, the resulting processes are far too rigid to cope with the dynamic demands of modern production.

The consequence of these two forces is providing, more than anything else, the drive to explore new technologies and techniques. These new technologies must cope with the shortcomings and realities mentioned previously. It can thus be suggested that the control systems of the future will have to display, amongst many other things, the following characteristics:

• The ability to handle, in an integrated fashion, symbolic and mathematical information. Process information will arise from many sources: original design data, sophisticated control algorithms, on-line process models and equations, real-time as well as historic process information, visual observations, remotely acquired indicators, operator inputs (maybe via speech or physical actions), etc.
• An inherent ability to learn, and to be able to adapt, according to changing situations, including the introduction of new products, raw materials, or energy sources, or changing environmental requirements, etc.
• An ability to cope with imprecise data, or data which simply has constituent parts missing.
• A capacity to meet the ever-increasing demands, both legal and economic, for safe, reliable operation under all circumstances.
• Conformance with the environmental acceptability requirements.

This means that the control systems of the future will have to provide overall supervision of multiple, simple, well-defined control systems, such as those based on PID controllers or simple PLC-based sequential controllers. This is, of course, continuing the current trend towards increasingly distributed control systems. However, the supervision and control provided must be combined with decisions based on a priori knowledge, or on information provided by operators, experienced workers and many other possible sources. Much of this additional information, however, might be somewhat irrational and not fully quantifiable.

2. THE DILEMMA

Against the apparent desire to develop the types of systems mentioned above, i.e. those which combine symbolic (a priori or on-going) learnt information with strictly defined mathematical calculations, is the constant demand from society for engineers to produce systems which can be totally relied upon. This is not only reliance viewed from a safety point of view, important as this is, but also reliance measured from a purely financial viewpoint. The failure of any large engineering system is a disaster — both in terms of possible human loss of life and of financial expenditure, but also, of increasing importance, in terms of public confidence in technology. The past ten years have seen too many catastrophic failures for these realities to be ignored.

The problem which faces control engineers (and indeed will probably face engineers in most disciplines) is that there is an increasing pressure to produce products which society can depend upon. On the other hand, though, for very good reasons, control techniques are being suggested (and, in fact, already introduced into daily use), which seem, literally, to be almost laws unto themselves — offering inherent capabilities to change their performance at their own will. Indeed, this is why such controllers are referred to as displaying some of the characteristics of "artificial" intelligence.

It is this dilemma which the rest of this paper addresses: starting with a discourse on aspects of reliability and safety, the paper then briefly surveys where AI-based control techniques are heading. From these observations, the dilemma can be addressed, and some possible ways ahead are indicated.

Firstly, given the somewhat casual manner in which some of the above terms are used, it is relevant to clarify what is understood here by "reliable" and "safe". For ease of understanding, only informal definitions will be used.

The starting point has to be the simple fact of engineering life that anything that can fail, will fail. Despite the corporate human arrogance, nothing human-made has ever been shown to be incapable of failing, be it a mechanical part, an electrical device or a chemical substance. The discipline, often neglected in engineering curricula, of reliability engineering sets out to study failure mechanisms and to provide statistical results which will enable an estimation of the likelihood of failure occurrences — the Mean Time Between Failures (MTBF).

On the basis, then, of the interconnection of a variety of devices, for each of which the individual MTBF has been derived, the reliability engineer can produce a statistical estimate of when, in terms of an interval of time, the overall device can be expected to fail. The resulting "reliability" thus provides a mathematical assessment of how "reliable" a product is. On the basis of this figure the supplier, or the customer, can decide what, if any, action needs to be taken. If the reliability is too "low", the product can be re-designed, maybe using components with improved individual MTBFs, but a cost penalty could well result. The customer, though, could decide that the resulting "risk" of the product failing is acceptable; otherwise, an alternative can be sought.

The essence of this is that the "reliability" of a man-made system can be assessed and, in theory (if often not in practice) the user (be it a person or a society) can be made aware of this, and can then make the appropriate decisions. This implies that the term "reliable" is meaningless unless it is qualified by some statistical estimate. Of course, as will be alluded to later, this immediately raises the problem of how the reliability of software can be expressed! Despite the best efforts of computer scientists, there are still no universally acceptable techniques for assessing the MTBF of software. Indeed, although it is now almost trite to say it, since the comprehensive testing of software is impossible, only very vague estimates of any program's reliability seem ever to be possible.

Turning to the issue of safety, this effectively takes the outcome of reliability studies one stage further. Given that the failure modes, and their associated statistics, are known, the next step in any engineering design should be to take appropriate measures to ensure that when such failures occur, there will be no injury or loss of life to persons associated with that activity. "Safety" thus relates to the responsibility of engineers towards society, and largely becomes a legal issue, relating to the need to comply with the local "safety laws" which are in place in a given situation.

These requirements of meeting reliability and safety targets are pushing control engineers into many new arenas. The most important of these probably relates to the question of design, viewed first and foremost from a reliability point of view. In striving to build highly reliable systems, engineers down the ages have of course tried to ensure that everything is as well-defined, with actual functional performance as well-understood and as well-tested, as possible. However, a system viewed from the outside consists of a single, unitary object, with the actual plant and its controller indistinguishable from each other. Thus, for example, not only should the mathematical characteristics of the control algorithms be well-defined, but every aspect of failure, both of the plant under control and of the controlling system, must be fully investigated and analysed.

Of course, it must be noted that in all areas of engineering there is nothing new in this! The mechanical engineer producing a modern automobile will have carefully designed the system knowing all aspects of potential failures. This information will have been acquired over many years, and is based on both practical experience and extensive formal and experimental failure analysis — of individual components as well as of systems and sub-systems.

It is critical to recall here that even the most reliable system will fail — given the sheer limits of technology and the fact that even in extremely well-developed areas of engineering, designers still do not have complete knowledge of all aspects of any system, or the possible components thereof. At whatever level one wishes to take this, what has to be done is always to undertake extensive system evaluations, and from these determine the MTBFs of components and sub-systems. These can then be aggregated to produce overall, system-wide assessments.

However, there always is the possibility, however small, that all five nuts on a motor-car wheel will break at the same time; the wheel will then fall off and people could be killed. What engineers have to do, of course, is to reduce this possibility to the smallest possible level, and then to design back-up systems.

Thus, where areas of potential failure are discovered, normally — at least in a responsible environment — solutions are developed to cope with the results of failure. Therefore, if that wheel does fall off the car, the rest of the system should ensure that at least the car slows down in a relatively straight line and is brought to a safe halt. Similarly, the failure of a single aircraft engine should not be catastrophic; the aeroplane should be able to fly on the remaining engines. If there is only one engine, then at least the plane should be able to glide to a safe landing. (If it is an inherently unstable jet fighter, this is probably impossible, so the pilot is given an ejector seat.) Whether this approach to systems design is always true in reality, of course, is often questionable but every attempt is normally made to make engineered systems as safe as possible.

It is worthy of comment here that, unfortunately, engineers (or more probably their managers and shareholders) are often "careful" of how they release reliability information to their customers. They might, for example, declare that the MTBF of a nuclear power station is (possibly) one million years — which sounds good enough. However, they often "forget" to tell their potential customers, particularly those living around the station, that whilst this means only one failure in a million years, that one occurrence could happen at any time within the million years — perhaps today!

In summary, it should be re-iterated that safety and reliability must be carefully separated. Reliability is all about how often a system will fail. Safety, on the other hand, is concerned with what happens if a system does, in fact, fail. It is clear, though, that there are strong links between the two concepts and it is extremely interesting to see that the common element is a question of "determinism". To assess reliability, and hence move on to ensure safety, designers must be able to determine what happens at all times and under all circumstances, within the plant and all its components, within the controlling systems and, of particular importance, within their mutual interactions.

3. DETERMINISM IN CONTROL SYSTEMS

Determinism is a doctrine that everything that happens is determined by a necessary chain of causation. A system can be said to be deterministic if the performance of all the components which comprise that system is fully defined — both logically and temporally.

The very term "determinism" has been the subject of much debate. In his excellent summary of "misconceptions" about real-time, Kurki-Suonio (1994) points out that the statement that "Real time requires determinism" is a misconception in itself. Taking a counter argument, Kurki-Suonio argues that

"Non-determinism ("A situation with some freedom for the outcome of an operation, or for the next operation to be executed.") is a property of the models (of systems) and not of reality. Reality may or may not be non-deterministic, but as far as programs are concerned, we can assume reality to be deterministic but incompletely known."

In a typical engineering fashion, and with an eye on providing a simple understanding, such sophisticated arguments will be avoided here! Instead, a naive, but "engineeringly meaningful" approach will be adopted. Firstly, the proposition is made that any understanding of a system's determinism must consider both the logical and the temporal activities. Overall, the objective is to understand what a system will do — under all circumstances (or at least all conceivable ones).

Now, in the case of software, the "logical" operations of systems have traditionally been the focus of the attention of software designers. However, to process engineers, for example, whilst the chemistry relating to a specific chemical reaction is important, the speed and sequence of such reactions is inherently absolutely vital when designing a process system. To software engineers however, until relatively recently (and despite the heroic efforts of researchers such as Stankovic (1988) and Kopetz (1984)), the "temporal" performance of a computer's software has been of little interest. The traditional approach has been to design the software and then, provided it seems to execute "fast" enough, to accept it. If it is too "slow", the algorithms are changed, or a faster computer is introduced.

But things are changing! As was pointed out by Stankovic (1988), real-time considerations are essential in the design of real-world systems. As a result, it must be recognised that to control a real-time, real-world process, both the logical operations and the timing requirements of the controlled system must be matched by those of the controller. Thus, both the logical and the temporal performances of the two systems being matched, and their joint operation, must be understood, and where possible, must be functionally deterministic.

As Kurki-Suonio (1994) quite rightly points out, of course designers will never have perfect knowledge of either the process itself, or of any model of it. This must be accepted and catered for in the design methodologies. (Here, for example, the design techniques suggested by Motus (Motus and Rodd, 1994) offer many useful contributions to the design procedures.) In practice, whilst logical determinism might be possible, at least at the design stage, the question of temporal determinism is still a difficult and misunderstood arena. In most cases, temporal performances will probably have to be expressed using timing bounds rather than absolute instances or intervals. The most critical aspect, though, is that timing must be dealt with!

In summary, when talking about determinism, it is suggested that the word should be understood to include all aspects involved in developing a full understanding of the overall functional performance of a system. This will apply even if some of this knowledge is incomplete and may have to be represented statistically (just like MTBF estimates, though) or by timing bounds (using maximum communication delays, for example).

To achieve such overall functional determinism, then, the designer must ensure that all the links in any causation chain are fully defined, and that their actions/reactions are completely understood. This requires that the effect of each component in a system be fully defined, both logically and temporally. Therefore, in any complex system, each component must be specifically defined such that its causes and effects under all conditions are well understood. There could well be, as was pointed out above, some degree of probability in such assessments, and this is normally well understood in the design of traditional engineering systems.

Critical to the above discussion is the question of understanding the failing modes of the various system components. Moving into the complex world of computer-based systems, there are several links in the chain which designers are simply unable to cope with. The first, and most obvious, one relates to software. It is very well-known (see, for example, (Musa, 1989)) that engineers still have to come to grips with the problem of developing totally reliable software. Indeed, despite extensive work over the past three decades, very few clues have yet emerged relating to the determination of the reliability of a piece of software — for either existing or proposed code. This problem, of course, relates directly to the inherent nature of software — being so complex, there are so many aspects where things can go wrong. As a result, it is not even possible to test fully even a simple piece of code. Also, there is the continuing problem of software engineers who simply cannot perceive that their software could possibly ever have any errors in it!

To assist in coping with this problem, a wide range of software engineering tools have been developed, and the use of these should do much to increase the reliability of the final products. However, computer system designers continually have to come back to the fact that they simply do not know how to calculate software reliability — given that they are incapable of fully testing any code.

Indeed, the situation is even more bizarre than that! As pointed out by Motus and Rodd (1994) there is an important aspect of system testing which has continually been forgotten or (where understood) conveniently been ignored. This relates to the assessment of the temporal performance of software, the relevance of which has been referred to above. The thrust in software testing so far has largely been aimed at ensuring that the algorithms operate as logically or as mathematically correctly as possible. However, when using computers to control real-time processes, the additional dimension of time is introduced. The point here is simple: in real-time automation, the computers are being used to control processes which have their own real-world, real-time, dynamic operating conditions. The fundamental issue that arises from this is that unless this match is correct, i.e. unless the dynamics of the computing system fully match the dynamics of the objects under control, there is the potential for serious errors to occur. The aeroplane about to crash into the side of a mountain will have its own inertia, and almost its own will! To control that system will require the controlling elements to take into consideration the dynamics of that aircraft; if instructions sent to an actuator arrive too late, then there is no way that any corrective action can result.

The consequence of this is not only that the algorithmic processing must be correct, but so must all timing-related aspects of the processing. At this stage in the development of industrial computer control, though, only the early shoots of serious concern about these aspects of real-time software engineering are emerging. As a result, only recently has the need for the development of appropriate specification, analysis and design tools been talked about by industrial practitioners. (See, for example, Motus and Rodd (1994).)

As a consequence of the points made in this section, if a fairly rigid view of real-time systems is taken, in that they should be functionally (i.e., both logically and temporally) deterministic, then many currently installed real-time control systems are seen to be using components which are simply not appropriate. The issue is that once it is recognised that all components in a system should be temporally deterministic (even if this is expressed in terms of upper bounds), then many current tools become inappropriate. This rules out well-loved systems such as OSI-based communication techniques, the use of Ethernet as a communications mechanism, and the wholesale use of operating systems such as UNIX, DOS and Windows.

4. THEN ALONG CAME AI

Reflecting back on the previous three sections, a critical convergence of thoughts can be observed. On the one hand there is the need to produce real-time control systems which people can rely on. On the other, there is an on-going march towards the introduction of new controlling technologies — given that the current range of available techniques is simply not adequate to cope with the new application demands. The dilemma is, therefore, that system designers are required to do a job which they simply do not, at the moment, have the correct tools to undertake.

However, it is essential that control engineers, and more importantly their paymasters, do not simply sit back and do nothing: "our" company, or "our" country, might decide not to adopt new approaches, but given the globalisation of production, someone else surely will! The truth is that many AI-based tools are being rapidly taken into regular, on-line use: there is already a significant, installed base of systems which are reliant for their functioning on real-time expert systems, neural networks, fuzzy controllers, etc. The academic view of the situation, that is, saying that the techniques are not appropriate and therefore should not be used, is simply unrealistic. The technologies are being used and the issues this raises, in terms of safety and reliability, have to be addressed.

It is proposed here (deliberately, simplistically) that there are two approaches which may be taken to handle the problem:

• Protect the plant from any adverse decisions made by the controlling system, or
• Make the controlling system deterministic.

These two approaches will be examined in the next sections.

5. PROTECTING THE SYSTEM FROM ITS CONTROLLER

This is, of course, the traditional approach, which has been adopted, in essence, by Safety Inspectorates throughout the world. The idea is simply that the actions of the controlling system must always, effectively, go through a "filter" before they are allowed to cause any actions on the plant under control. This "filter" could take various forms: it could be a system which, literally, checks all outputs to the plant, and often, to protect the computing system, all inputs from the plant. In addition, or as an alternative, the signals to be applied to the plant are handled by an additional "safe" final drive system.

Taking the second approach first, the final control element in any safety-critical control system, in order to meet licensing requirements, has traditionally had to be a "well-understood" (i.e. well-tried and tested) component. Often, the final element thus has to be inherently "fail-safe" — i.e. it must always, itself, fail to a "safe" condition. As a result, in many countries it is still required that the final interface to the plant must be a relay — the basis of this thought being that the failure mode of a relay is always defined.

It is important, though, to point out that such approaches and the assumption upon which they are based, are not themselves strictly valid, but are "acceptable" approximations. Anyone who has worked in railway signalling, for example, will be able to quote the frequency with which relays are found "stuck" in an un-safe condition. The theory that gravity will ensure, for example, that a power-failure will always result in a relay failing into a known condition is simply not true. Indeed, it is only recently that solid-state units have become acceptable as end-elements, and it is also only recently that PLCs have been accepted as possible replacements for relays at the sharp end of control!

Returning to the first point raised previously, many systems have, literally, an electronic filter between the controller and the controlled system. This approach can be visualised as providing a protective "jacket" between the controller and the system it is controlling. Thus, all output settings are screened, to make certain that they fall within acceptable limits. These limits will normally be related to data values, but it is possible to conceive of a "temporal filter" — which would know when values should be sent between the process and its controller. In practice, a combination of value and time filtering is probably required.

Of course, this all poses the question as to how one can guarantee that the filtering operation will always occur correctly. If the filter is software-based, then the question of not being able to validate software must immediately be raised again. More fundamentally, of course, the use of any jacketing-type of approach simply increases the overall system complexity, and its validity must then be questioned. The more components there are, the more things that can fail.

6. MAKING AI TECHNIQUES DETERMINISTIC

This is seemingly the most sensible way ahead, and hence this is a field which warrants urgent attention. Whilst the "jacketing" approach discussed in the previous section seems initially to make a lot of sense, it has been pointed out that this adds layers of additional complexity. The fundamental rule of successful engineering design, especially when viewed from reliability considerations, (but surely true from any point of view?) has to be to get the number of components in the functional chain down to the minimum. Traditionally, of course, much effort has gone into designing and implementing algorithms that are as reliable and, hence, as safe as possible. However, as mentioned in Section 4 above, these efforts have often excluded paying any attention to the temporal effects. Also, and perhaps more philosophically, the question of not understanding the infinite failure modes of software must be addressed. Even now, major software vendors would not disagree with commonly-used

