Responsive Computing: A Special Issue of REAL-TIME SYSTEMS, The International Journal of Time-Critical Computing Systems, Vol. 7, No. 3 (1994)


RESPONSIVE COMPUTING

by Miroslaw Malek
University of Texas, Austin

A Special Issue of REAL-TIME SYSTEMS, The International Journal of Time-Critical Computing Systems, Vol. 7, No. 3 (1994)

SPRINGER SCIENCE+BUSINESS MEDIA, LLC

REAL-TIME SYSTEMS
The International Journal of Time-Critical Computing Systems
Volume 7, No. 3, November 1994

Special Issue: Responsive Computer Systems
Guest Editor: Miroslaw Malek

Guest Editor Introduction ........................ Miroslaw Malek 1
Scheduling Algorithms for Fault-Tolerance in Hard-Real-Time Systems ........................ Alan A. Bertossi and Luigi V. Mancini 3
Runtime Monitoring of Timing Constraints in Distributed Real-Time Systems ........................ Farnam Jahanian, Ragunathan Rajkumar, and Sitaram C. V. Raju 21
Automated Verification of Responsive Protocols Modeled by Extended Finite State Machines ........................ Yoshiaki Kakuda, Tohru Kikuno, and Kenichi Kawashima 49
Compositional Reasoning about Responsive Systems with Limited Resources ........................ Henk Schepers 65

Concise Paper
Enhancing Fault-Tolerance in Rate-Monotonic Scheduling ........................ Yingfeng Oh and Sang H. Son 89

Library of Congress Cataloging-in-Publication Data

Malek, Miroslaw.
Responsive computing / edited by Miroslaw Malek.
p. cm.
"A special issue of Real-time systems, the international journal of time-critical computing systems, vol. 7, no. 3 (1994)."
Includes bibliographical references (p. ).
ISBN 978-1-4613-6204-3
ISBN 978-1-4615-2786-2 (eBook)
DOI 10.1007/978-1-4615-2786-2
1. Real-time data processing. 2. Fault-tolerant computing. I. Title.
QA76.54.M34 1994
005.2--dc20
94-31249 CIP

Copyright © 1994 by Springer Science+Business Media New York. Originally published by Kluwer Academic Publishers in 1994. Softcover reprint of the hardcover 1st edition 1994.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, mechanical, photo-copying, recording, or otherwise, without the prior written permission of the publisher, Springer Science+Business Media, LLC.

Printed on acid-free paper.

Guest Editor's Introduction

MIROSLAW MALEK

As a rapidly growing number of users demands correct computation on time, there is an urgent need to integrate the theory and practice of real-time systems, fault-tolerant computing, and parallel and distributed processing. One of the main challenges is the ability to design and implement parallel/distributed computing systems which operate correctly on time, even in the presence of faults. Maximizing the probability of correct executions on time, even in the presence of faults (this probability is called responsiveness), is the major goal of responsive computer systems design. Classical measures of fault tolerance and dependability have concentrated primarily on system attributes such as reliability, availability, and MTTF, while real-time systems research was, from the outset, concerned with service attributes such as deadlines and durations. In addition, ever-increasing system complexity, diversity of requirements and applications, quantum leaps in technology, and an ever-growing reliance on computer/communication systems by all of society make responsiveness optimization sine qua non in most contemporary designs of computer systems.
This Special Issue is devoted to concepts, methods, algorithms, and tools to aid in design and implementation of responsive computer systems.

The first paper, by Bertossi and Mancini, entitled "Scheduling Algorithms for Fault-Tolerance in Hard-Real-Time Systems," proposes scheduling algorithms for a set of independent periodic tasks in the presence of processor failures, provided states are checkpointed on some reliable medium.

The second paper, by Jahanian, Rajkumar and Raju, entitled "Runtime Monitoring of Timing Constraints in Distributed Real-Time Systems," describes a run-time environment for monitoring of timing constraints and reporting their violations in distributed real-time systems. The main advantage of this proposed approach is the ability to predict the violation of a user-level constraint (based on intermediate constraints) even before the violation occurs. This approach may aid in tolerance of potential system errors and, therefore, improvement of a system's responsiveness.

The third paper, by Kakuda, Kikuno and Kawashima, entitled "Automated Verification of Responsive Protocols Modeled by Extended Finite State Machines," proposes a verification method of self-stabilizing and timeliness properties for communication protocols which are modeled by finite state machines. This work is vital to responsive protocol design and verification.

The fourth paper, by Schepers, entitled "Compositional Reasoning about Responsive Systems with Limited Resources," introduces a compositional network proof theory to specify and verify properties of responsive computer systems. The theory provides means of reasoning about responsive systems which must respond to internal guiding programs or external inputs by predictable, timely means, even in the presence of faults.

With the concise paper, by Oh and Son, entitled "Enhancing Fault-Tolerance in Rate-Monotonic Scheduling," the focus is on the problem of scheduling in the presence of faults. The authors show how to support fault tolerance in a rate-monotonic scheduling environment.

All in all, the papers featured in this first-ever Special Issue on Responsive Computer Systems contribute to progress in this area. I anticipate that, as computer systems proliferate and invade all walks of our lives, responsiveness will be the most sought-after quality in computer/communication systems.

Scheduling Algorithms for Fault-Tolerance in Hard-Real-Time Systems

ALAN A. BERTOSSI* ([email protected])
Dipartimento di Informatica, Università di Pisa, Corso Italia 40, 56125 Pisa, Italy

LUIGI V. MANCINI ([email protected])
Dipartimento di Informatica e Scienze dell'Informazione, Università di Genova, Viale Benedetto XV 3, 16132 Genova, Italy

Abstract. Many time-critical applications require predictable performance in the presence of failures. This paper considers a distributed system with independent periodic tasks which can checkpoint their state on some reliable medium in order to handle failures. The problem of preemptively scheduling a set of such tasks is discussed, where every occurrence of a task has to be completely executed before the next occurrence of the same task can start. Efficient scheduling algorithms are proposed which yield sub-optimal schedules when there is provision for fault-tolerance. The performance of the solutions proposed is evaluated in terms of the number of processors and the cost of the checkpoints needed.
Moreover, analytical studies are used to reveal interesting trade-offs associated with the scheduling algorithms.

1. Introduction

In the area of command and control systems, flight control systems, and robotics, there is an increasing demand for more complex and sophisticated real-time computing systems. In particular, fault-tolerance is one of the requirements that are playing a vital role in the design of new real-time distributed systems.

Different schemes have been proposed to support fault-tolerant computing in distributed systems. The basic technique employed is to checkpoint the state of each task on a backup processor. When a processor P fails, the copies of the tasks executed by P are restarted on the backup processor from the last checkpointed state. This technique may require many additional processors in a hard-real-time system. Indeed, every task in the system must complete its execution before its deadline minus the task execution time. This is because, after a processor failure, one must have enough time to complete the backup copy of the failed task by its deadline. This additional requirement causes a low processor utilization, and hence an increase in the number of processors needed. As an alternative, an N-replication scheme with majority voting can be employed (Mancini 1986). However, in this case, N times the number of processors are needed, since the basic non-fault-tolerant schedule gets multiplied by the degree of replication. Therefore, it is crucial to design scheduling algorithms which minimize the number of processors needed to schedule a set of tasks.

* This work has been supported by grants from the Italian "Ministero dell'Università e della Ricerca Scientifica e Tecnologica" and the "Consiglio Nazionale delle Ricerche--Progetto Finalizzato Sistemi Informatici e Calcolo Parallelo".

In this paper, we consider periodic tasks which must be correctly executed within their periods. Many periodic scheduling problems have been found to be NP-hard; that is, it is believed that they cannot be solved by optimal polynomial-time algorithms (Lawler et al. 1989). In particular, preemptive scheduling of out-of-phase, static-priority tasks with arbitrary deadlines is NP-hard, even if only a single processor is available (Leung and Merrill 1980).

Several heuristic algorithms for scheduling periodic tasks in uniprocessor and multiprocessor systems have been proposed. Liu and Layland (1973) proposed the Rate Monotonic (RM) and Earliest Deadline First (EDF) algorithms for uniprocessor systems. The RM algorithm was generalized by Dhall and Liu (1978) to accommodate multiprocessor systems, and successively improved by Davari and Dhall (1986). Finally, Bertossi and Bonuccelli (1983) proposed a scheduling algorithm for the class of uniform multiprocessor systems.

When provision for fault-tolerance is necessary, Krishna and Shin (1986) devised a dynamic programming algorithm for multiprocessors which ensures that backup schedules can be efficiently embedded within the primary schedule. The algorithm assumes the existence of the optimal allocation of tasks to processors and schedules the tasks of each processor in order to minimize a given local cost function. Liestman and Campbell (1986) proposed an algorithm, which copes only with software failures, to generate optimal schedules in a uniprocessor system employing the recovery block scheme (Randell, Lee, and Treleaven 1978). Finally, the algorithms proposed in Balaji et al.
(1989) and Stankovic, Ramamritham, and Cheng (1985) dynamically recompute the schedule in order to redistribute the tasks among the remaining operational processors when a processor fails. In particular, the algorithm presented by Stankovic, Ramamritham, and Cheng (1985) follows a bidding strategy.

The present paper considers the classical problem of preemptively scheduling a set of independent periodic tasks under the assumptions that each task deadline coincides with the next request of the same task and that all tasks start in-phase (Liu and Layland 1973, Dhall and Liu 1978). The proposed algorithms use the processor-sharing strategy (Coffman and Denning 1976), and fault-tolerance is implemented by periodically checkpointing the state of each task on a backup processor.

This paper is organized as follows. Section 2 introduces the system model and formally characterizes the scheduling problem to be solved. Section 3 recalls how the processor-sharing strategy constructs a schedule using the minimum number of processors in the non-fault-tolerant case. Sections 4 and 5 give two heuristic algorithms for the problem defined in Section 2 under the assumptions that no two failures occur within the same task period and that the task periods are multiples of the checkpoint period, which in turn must be greater than the checkpoint overhead. Both heuristics require O(n) time to construct the schedule, allow a simple recovery from failures, and provide fault-tolerance by reserving enough spare time in the schedule by assigning a convenient execution rate to tasks. In particular, the heuristic of Section 5 is designed to reduce the overall number of preemptions needed in the schedule. In Section 6 the performance of the algorithms is analyzed and discussed. In particular, it is shown that under reasonable assumptions the two heuristics use less than twice the minimum number of processors needed in the non-fault-tolerant case. Finally, Section 7 discusses directions for further research.

2. The Scheduling Problem

The distributed system under consideration consists of a set of autonomous computers (processors) which are connected by a Real-Time Local Area Network. It is assumed that processors fail in a fail-stop manner (Schneider and Schlichting 1981); that is, a processor is either operational or faulty, and that all operational processors are able to communicate with each other.

Many distributed real-time systems perform a synchronous fault-detection protocol every few milliseconds. In such classes of systems, one could think of taking a synchronous checkpoint of the state of the tasks running on all the non-faulty processors at every execution of the failure-detection protocol. Since such systems usually employ a hardware mechanism to synchronize the processor clocks, a synchronous checkpoint can be performed on all the non-faulty processors with no additional message passing. If the time needed for checkpointing is known, then it is possible to plan the schedule in order to meet all the task deadlines even in the presence of failures. Indeed, the maximum amount of computation to recover after a failure will correspond to the period between two executions of the failure-detection protocol minus the checkpoint overhead, as the sketch below illustrates. Since such a failure-detection period is quite small, the checkpoint overhead can be considered as negligible.
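The recovery bound just stated is simple arithmetic; the following minimal sketch (Python; the function name and the millisecond figures are invented for illustration, not taken from the paper) makes it explicit:

```python
def worst_case_lost_work(detection_period_ms: float, checkpoint_cost_ms: float) -> float:
    """Upper bound on the computation that must be redone after a single
    fail-stop failure: everything executed since the last synchronous
    checkpoint, i.e. at most one failure-detection period minus the
    checkpoint overhead."""
    return detection_period_ms - checkpoint_cost_ms

# Illustrative numbers only: a 5 ms failure-detection period with 0.1 ms
# checkpoints bounds the recomputation on the backup processor by 4.9 ms.
print(worst_case_lost_work(5.0, 0.1))
```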
Indeed, one can think of checkpointing only those task variables whose values changed within the failure-detection period. It is worth noting that current technology allows checkpoints to be performed as atomic operations with a very low overhead and at a very high rate, thus making the algorithms proposed in this paper attractive. For example, the Spring kernel (Ramamritham and Stankovic 1991) provides the abstraction of a fast global shared memory implemented by a fiber optic ring. In such a distributed system the synchronization of the task copies could be implemented just as simple write operations on the global memory.

This paper assumes that the distributed system provides the support for synchronous failure detection and periodic checkpointing. The checkpoint interval is called the slot, and it is assumed that the slot length s is constant. It is also assumed that a checkpoint has a cost of a time units, with a < s, leaving only the first s - a units of each slot available for task executions. With such an architecture, a processor failure is detected by the end of the slot in which the failure occurs. In case of a failure, only the computation performed within the slot is lost and must be re-executed on a backup processor.

Before describing the algorithms to construct a schedule which allows such a recovery scheme, a precise definition of the scheduling problem addressed in this paper is given. In this context, a periodic task T_i is completely identified by a pair (t_i, r_i), where t_i is T_i's execution time and r_i is T_i's request period. The requests for T_i are periodic, with a constant integer interval r_i between every two successive requests. The worst-case execution time for all the (infinite) requests of T_i is constant and equal to t_i. The tasks are in-phase, namely, the first request of each T_i is issued at time zero. The periodic tasks T_1, ..., T_n are independent; that is, the requests of any task do not depend on the execution of the other tasks.

[Figure 1. (a) A set of four periodic tasks. (b) The pattern schedule of length one obtained with McNaughton's algorithm. (c) The periodic schedule of period R = 110 obtained with the Optimal algorithm.]

Let T_1, ..., T_n be n independent periodic tasks. The scheduling problem addressed here consists of finding an order in which all the periodic requests of the tasks are to be executed on a set of identical processors so as to satisfy the following conditions:

1. integrity is preserved, that is, each task is executed by at most one processor at a time and no processor executes more than one task at a time;
2. deadlines are met, namely, each request of any task must be completely executed before the next request of the same task is issued, that is, by the end of its period;
3. the number m of processors is minimized;
4. fault tolerance is guaranteed, namely, the above conditions are verified even when processors fail, provided that two consecutive failures do not occur within the same period of a task.

A schedule is feasible if requirements (1)-(2) are satisfied, it is optimal if requirement (3) is also verified, and it is fault-tolerant if also (4) holds. In the following, fault-tolerant scheduling algorithms will be presented which can tolerate k processor failures given that there are at least k backup processors.
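To keep the definitions concrete, here is a minimal sketch of the task model just defined (Python; the class, the function, and the sample task are mine, not the paper's):

```python
from dataclasses import dataclass

@dataclass
class PeriodicTask:
    t: float  # worst-case execution time t_i
    r: int    # request period r_i (a constant integer, as assumed above)

def requests(task: PeriodicTask, horizon: int):
    """In-phase request times and deadlines of one periodic task up to
    `horizon`: the k-th request is issued at k*r_i and, since each deadline
    coincides with the next request, must complete by (k+1)*r_i."""
    return [(k * task.r, (k + 1) * task.r) for k in range(horizon // task.r)]

# An illustrative task with t_i = 2 and r_i = 10:
print(requests(PeriodicTask(t=2, r=10), horizon=30))
# -> [(0, 10), (10, 20), (20, 30)]
```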
Such algorithms are based on a processor-sharing strategy (Coffman and Denning 1976). In this strategy, each task is assigned a fixed rate of computation per time unit within its period, without being assigned any priority at all. This strategy is non-greedy, for it does not operate according to any static or dynamic priority scheme and might insert intentional idle time into the schedule (an example of a schedule produced by such a strategy which includes intentional idle time is shown in Fig. 1c, and will be discussed in the next section).

A periodic schedule with period R is a feasible schedule in which each processor does exactly the same thing at time t that it does at time t + R. In other words, if task T_i is assigned to processor P_j at time t, then it is also assigned to the same processor at time t + R, while if a processor is idle at time t, then it is idle at time t + R, too. One can see that if there is a feasible schedule for a set of tasks T_1, ..., T_n, then there is a feasible periodic schedule with period R = lcm{r_1, ..., r_n}, the least common multiple of the task request periods (Lawler and Martel 1981).

3. Background

All the algorithms presented in this paper for fault-tolerant scheduling of periodic tasks use as a subroutine the well-known McNaughton's algorithm (McNaughton 1959). Recall that McNaughton's algorithm is designed for the general (non-periodic) problem of preemptively scheduling on m identical processors a set of n independent tasks T'_1, ..., T'_n with execution times t'_1, ..., t'_n. Such an algorithm constructs a schedule of minimum length L, where:

L = max{ max{t'_1, ..., t'_n}, (1/m) Σ_{1≤i≤n} t'_i }

This algorithm fills the processors one at a time, by successively scheduling the tasks in any order, and splitting a task between two processors whenever the length L is met. The time complexity of McNaughton's algorithm is O(n).

It is well known that the above algorithm directly leads to a processor-sharing algorithm (Coffman and Denning 1976) for scheduling the set T_1, ..., T_n of periodic tasks using the minimum number m* = ⌈Σ_{1≤i≤n} t_i/r_i⌉ of processors. Consider a non-periodic problem in which t'_i = t_i/r_i, for i = 1, ..., n. Substituting such values in the formula giving the length of the schedule produced by McNaughton's algorithm, we get:

L = max{ max{t_1/r_1, ..., t_n/r_n}, (1/m*) Σ_{1≤i≤n} t_i/r_i }

If t_i/r_i ≤ 1 for all i, then L ≤ 1. By McNaughton's analysis, in the pattern schedule generated by means of this algorithm no processor is assigned to more than one task at a time, and no task is assigned to more than one processor at a time. To construct a schedule for the periodic tasks T_1, ..., T_n, it suffices to repeat the pattern schedule for every unit interval. Since there are r_i repetitions of such a schedule during any time interval (h·r_i, (h+1)·r_i], h ≥ 0, each occurrence of task T_i is processed for r_i · (t_i/r_i) = t_i units of time, and every task occurrence meets its deadline.

The pattern schedule generated by McNaughton's algorithm can have at most n + m* - 1 preemptions per unit of time. Indeed, at most one task per processor may be preempted, within the same pattern schedule, between such a processor and the next processor, while each task may be preempted between two successive pattern schedules. Thus the overall number of preemptions within a time interval of length R is at most R(n + m* - 1), where R is the least common multiple of the task periods.
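The wrap-around rule and the processor-sharing construction built on it are short enough to sketch; the following is a hedged illustration (Python; the helper name and the (t_i, r_i) values are mine and not necessarily those of Figure 1):

```python
from math import ceil

def mcnaughton(exec_times, m):
    """McNaughton's wrap-around rule: preemptively schedule n independent
    tasks on m identical processors with minimum makespan
    L = max(max_i t_i, (1/m) * sum_i t_i).
    Returns L and, per processor, a list of (task, start, end) pieces."""
    L = max(max(exec_times), sum(exec_times) / m)
    schedule = [[] for _ in range(m)]
    proc, now = 0, 0.0
    for i, t in enumerate(exec_times):
        remaining = t
        while remaining > 1e-12:
            piece = min(remaining, L - now)   # split the task when L is reached
            schedule[proc].append((i, now, now + piece))
            remaining -= piece
            now += piece
            if L - now <= 1e-12:              # processor is full: wrap to the next
                proc, now = proc + 1, 0.0
    return L, schedule

# Processor-sharing for periodic tasks (t_i, r_i): schedule the rates
# t_i/r_i as a pattern of length L <= 1 on m* = ceil(sum t_i/r_i)
# processors, then repeat the pattern over every unit interval.
tasks = [(2, 10), (2, 10), (2, 10), (10, 11)]   # illustrative (t_i, r_i) pairs
rates = [t / r for t, r in tasks]
m_star = ceil(sum(rates))                        # here: ceil(1.509...) = 2
L, pattern = mcnaughton(rates, m_star)           # here: L = 10/11 <= 1
print(m_star, L, pattern)
```

Each wrap from one processor to the next preempts at most one task inside the pattern, and every task may additionally be preempted at a pattern boundary, which is exactly where the n + m* - 1 per-unit bound above comes from.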
This number of preemptions can be reduced by considering iteratively any interval between two successive requests of any tasks, and then expanding the pattern schedule by a scale factor equal to the length of such interval. The resulting algorithm, which we call Optimal, reduces the overall number of preemptions.
