Transactions on Pattern Languages of Programming IV
LNCS 10600, Journal Subline
James Noble · Ralph Johnson, Editors-in-Chief
Uwe Zdun · Eugene Wallingford, Editors

Lecture Notes in Computer Science 10600
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board
David Hutchison, Lancaster University, Lancaster, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Friedemann Mattern, ETH Zurich, Zurich, Switzerland
John C. Mitchell, Stanford University, Stanford, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
C. Pandu Rangan, Indian Institute of Technology Madras, Chennai, India
Bernhard Steffen, TU Dortmund University, Dortmund, Germany
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA

More information about this series at http://www.springer.com/series/8677

James Noble · Ralph Johnson · Uwe Zdun · Eugene Wallingford (Eds.)
Transactions on Pattern Languages of Programming IV

Editors-in-Chief
James Noble, Victoria University of Wellington, Wellington, New Zealand
Ralph Johnson, Siebel Center for Computer Science, Urbana, IL, USA

Editors
Uwe Zdun, University of Vienna, Vienna, Austria
Eugene Wallingford, University of Northern Iowa, Cedar Falls, IA, USA

ISSN 0302-9743 ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISSN 1869-6015 ISSN 2511-6444 (electronic)
Transactions on Pattern Languages of Programming
ISBN 978-3-030-14290-2 ISBN 978-3-030-14291-9 (eBook)
https://doi.org/10.1007/978-3-030-14291-9
Library of Congress Control Number: 2019932783

© Springer Nature Switzerland AG 2019

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

Welcome to the fourth issue of LNCS Transactions on Pattern Languages of Programming. Software patterns are an effective means for improving the quality of software design and engineering and communication among the people building them. Patterns capture the best practices of software design, making them available to all software engineers.

LNCS Transactions on Pattern Languages of Programming publishes papers on patterns and pattern languages as applied to software design, development, and use, throughout all phases of the software life cycle, from requirements and design to implementation, maintenance and evolution. The primary focus of the LNCS Transactions on Pattern Languages of Programming is on patterns, pattern collections, and pattern languages themselves. The journal also includes reviews, survey articles, criticisms of patterns and pattern languages, as well as other research on patterns and pattern languages.

This issue includes six articles that went through two phases of review and improvement. First, the articles were workshopped at one of the PLoP conferences where (after an initial peer review) they received suggestions for improvement in a shepherding process and in a writer’s workshop. Then the articles were substantially extended by the authors, and these extended versions were peer-reviewed again by at least three reviewers per article.

This edition of LNCS Transactions on Pattern Languages of Programming marks a transition in editorship: The founding editors-in-chief, James Noble and Ralph Johnson, will retire after this issue and the current editors, Eugene Wallingford and Uwe Zdun, will become the new editors-in-chief. We thank James and Ralph for their efforts during the founding period of LNCS Transactions on Pattern Languages of Programming.

We thank the anonymous reviewers who helped in the peer-review process of this issue.
January 2019
James Noble
Eugene Wallingford
Uwe Zdun

Contents

Patterns for Light-Weight Fault Tolerance and Decoupled Design in Distributed Control Systems
Pekka Alho and Jari Rauhamäki

Safety Architecture Pattern System with Security Aspects
Christopher Preschern, Nermin Kajtazovic, and Christian Kreiner

An Open Source Pattern Language
Christoph Hannebauer and Volker Gruhn

Patterns for Functional Safety System Development
Jari Rauhamäki

Internet of Things Patterns for Communication and Management
Lukas Reinfurt, Uwe Breitenbücher, Michael Falkenthal, Frank Leymann, and Andreas Riegg

A Pattern Language for Knowledge Handover When People Transition
Kei Ito, Joseph W. Yoder, Hironori Washizaki, and Yoshiaki Fukazawa

Author Index

Patterns for Light-Weight Fault Tolerance and Decoupled Design in Distributed Control Systems

Pekka Alho (1) and Jari Rauhamäki (2)

(1) Department of Intelligent Hydraulics and Automation, Tampere University of Technology, Tampere, Finland
pekka.alho@iki.fi
(2) Department of Automation Science and Engineering, Tampere University of Technology, Tampere, Finland
[email protected]

Abstract. Distributed control systems comprise networked computing units that monitor and control physical processes in feedback loops. Reliability of these systems is affected by dynamic and complex computing environments where connections and system configurations may change rapidly.
Diverse redundancy can be effective in improving system dependability, but it is susceptible to common mode failures, and development costs for design diversity are often seen as prohibitive. In this paper we present three patterns that can be used to provide a light-weight form of fault tolerance to improve system dependability and resilience by providing the ability to cope with unexpected events and faults. These patterns are presented together with a pattern language that shows how they relate to other fault tolerance patterns.

Keywords: Dependability · Distributed systems · Fault tolerance · Real-time systems · Reliability

1 Introduction

Distributed control systems are continuously gaining importance, as more and more devices and machines are equipped with embedded systems that control their operation. Computers in these control systems are increasingly powerful and networked, providing intelligence and interoperability. Examples of such systems range from large mobile machines to groups of robots and intelligent sensor networks. These cyber-physical systems (CPSs) interact with the environment and physical processes, influencing many parts of our lives either directly or indirectly. Therefore they need to be dependable, which can be measured with the attributes of availability, reliability, safety, integrity and maintainability [1]. However, with the increased functionality and intelligence, the complexity of these systems is also increased, meaning that the development process becomes more demanding and dependability becomes more costly to achieve and verify. Another significant feature of CPSs is that they often have strict timing constraints, which may put limitations on the architecture.

© Springer Nature Switzerland AG 2019
J. Noble et al. (Eds.): TPLOP IV, LNCS 10600, pp. 1–21, 2019.
https://doi.org/10.1007/978-3-030-14291-9_1

Many critical systems that have failed catastrophically are well known: examples such as the Therac-25 radiation therapy machine and the explosion of the Ariane 5 rocket are infamous, whereas highly reliable systems receive little recognition, even though their study might give valuable ideas for the design and architecture of new software. One example of such systems can be found in telephony applications, namely Ericsson AXD301 Asynchronous Transfer Mode (ATM) switches that achieved nine nines (99.9999999%) service availability, running software written in Erlang [2]. Erlang’s highly decoupled actor model and fault handling based on supervisors have especially inspired the LET IT CRASH and SERVICE MANAGER patterns found in this paper.

This paper presents three software patterns that can be used to improve control system dependability (the third pattern is called DATA-CENTRIC ARCHITECTURE) and shows how they fit in the existing literature by addressing the specific needs of CPSs. The approach promoted by these three patterns is based on implementing a decoupled architectural design with supporting fault mitigation and handling. The decoupled architecture can also be used to gradually introduce additional fault tolerance solutions such as checkpointing and rejuvenation to the system, until a sufficient level of reliability has been achieved [3]. Our patterns were originally encountered in the research of remote handling control systems for robotic manipulators, but all patterns have examples of other known uses as well. These examples are presented in the corresponding sections of the patterns.

One reason why the development of CPSs is difficult is that the systems typically consist of dynamic service chains that operate on a wide range of platforms, which complicates management of end-to-end deadlines. Moreover, modern middleware provides capabilities to flexibly change service deployment on these subsystems, but some configurations may be inefficient or even unusable if communication links become overloaded.
While adaptability has benefits, these uncertainties nevertheless complicate assurance of reliability and predictability of the system. Therefore, CPSs benefit from a design that makes the overall system more robust, whereas more traditional fault tolerance solutions, such as hardware redundancy, are arguably better suited for static safety-critical subsystems.

A data-centric approach is one way to increase decoupling between communicating units. However, data-centric design as a central communication paradigm, as well as the concept of CPS, is still fairly novel in the domain of distributed control systems. Although control systems are by nature data-centric (read sensor data and desired output, send actuator command, etc.), this has usually meant moving data from point A to point B. The patterns in this paper capture some of the ways that reliability-related challenges faced in developing more intelligent and adaptable distributed control systems have been solved. The next section shows how our patterns fit the gaps in the existing pattern literature, by addressing needs specific to CPSs.

2 Context of the Patterns

Fault tolerance cannot be implemented without redundancy of some kind. To have fault tolerance for, e.g., computer failures, we would need at least two computers: if one fails, the other one can detect the error and try to correct it. Software faults, on the other hand, are typically development faults, which are harder to detect and correct than hardware faults. To have good coverage for software faults, diverse redundancy (e.g. N-version programming) is needed, but it has been criticized for being susceptible to common mode failures [4]. Moreover, development costs for design diversity are often seen as prohibitive.

Patterns in this paper present an alternative approach to fault tolerance, based on dividing the system into highly decoupled modules and implementing a lightweight form of fault tolerance.
We present an architectural pattern called DATA-CENTRIC ARCHITECTURE as one way to achieve a high level of decoupling. One of the key points of decoupling is that it should by itself improve reliability by limiting fault propagation and improving modularity and understandability of the system. In a way, the modular approach can be seen as similar to the compartmentalization of ships: without compartments, every leak can sink the ship. An example of a software system that uses modularity to successfully implement fault isolation and resilience is the MINIX 3 operating system released in 2005 [5]. Driver management of MINIX 3 is presented as one of the known uses of SERVICE MANAGER.

A modular and decoupled architecture can also be used to implement other reliability-improving patterns like SERVICE MANAGER and LET IT CRASH documented in this paper or other well-known patterns like LEAKY BUCKET COUNTER [6], WATCHDOG [6, 7], etc. The short descriptions of the patterns presented in this paper are listed in Table 1. A list of all referenced patterns with descriptions can be found in an appendix.

Table 1. Pattern descriptions

Pattern: DATA-CENTRIC ARCHITECTURE
Description: How to implement a reliable and scalable distributed control system? Build the system from autonomous modules that communicate by sharing data that is based on a well-designed and consistent data model.

Pattern: SERVICE MANAGER
Description: How to detect faults and restart modules or processes after a failure? Implement a service manager that can monitor, start and stop modules.

Pattern: LET IT CRASH
Description: How to react to failures without crashing the whole system? Flush the corrupted state by “crashing” the process instead of writing extensive error handling code. Let some other process, like a service manager, do the error recovery, e.g. by restarting the crashed process.

DATA-CENTRIC ARCHITECTURE provides the decoupled architectural model needed to use LET IT CRASH for fault handling. The SERVICE MANAGER pattern provides a way to attempt recovery after failures, in addition to providing error detection and monitoring.
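As an illustration only, the decoupling described for DATA-CENTRIC ARCHITECTURE can be sketched in a few lines of Python. The sketch assumes a hypothetical in-process DataSpace class, a Sample record, the topic names and the controller gain shown; none of these come from the paper, and a real system would rely on data-centric middleware rather than a dictionary. The point is only that the modules share data through a common data model and never call each other directly.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, Dict, List, Optional


@dataclass(frozen=True)
class Sample:
    """One entry of the shared data model (fields are hypothetical)."""
    topic: str
    value: float
    timestamp: float


class DataSpace:
    """Hypothetical in-process stand-in for data-centric middleware."""

    def __init__(self) -> None:
        self._latest: Dict[str, Sample] = {}
        self._subscribers: DefaultDict[str, List[Callable[[Sample], None]]] = (
            defaultdict(list)
        )

    def subscribe(self, topic: str, callback: Callable[[Sample], None]) -> None:
        """Register interest in a topic; the publisher stays unknown."""
        self._subscribers[topic].append(callback)

    def publish(self, sample: Sample) -> None:
        """Store the newest sample and notify every subscriber of its topic."""
        self._latest[sample.topic] = sample
        for callback in list(self._subscribers[sample.topic]):
            callback(sample)

    def read(self, topic: str) -> Optional[Sample]:
        """Return the most recent sample for a topic, or None if none exists."""
        return self._latest.get(topic)


space = DataSpace()
commands: List[Sample] = []


def controller(sample: Sample) -> None:
    # Proportional control toward a setpoint of 0.0; the gain is an assumption.
    command = Sample("actuator/command", -0.5 * sample.value, sample.timestamp)
    space.publish(command)


# The controller only knows topic names, not the sensor or actuator modules.
space.subscribe("sensor/joint_angle", controller)
space.subscribe("actuator/command", commands.append)

# A sensor module publishes a reading; the command follows through the data space.
space.publish(Sample("sensor/joint_angle", 2.0, 0.0))
```

Because each module depends only on the data model, a crashed module can in principle be restarted or replaced without the others having to reconnect to it, which is the property the fault handling patterns build on.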
The idea of crashing a process suggested by LET IT CRASH may sound like a risky action to take. However, the idea is to offer recovery from transient physical and interaction faults (sometimes called Heisenbugs), the ability to keep the system as a whole functioning even if some internal process crashes, and the possibility to hot-swap code and bug fixes. The downside of this approach is of course that it is not suited for fail-operate systems such as flight controllers that must be operational all the time; this type of system would be the right domain in which to apply design diversity.
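The interplay of LET IT CRASH and SERVICE MANAGER described above can be sketched as follows. This is a minimal, single-threaded illustration under stated assumptions: the ServiceManager class, its max_restarts limit, and the flaky example module are all hypothetical and not taken from the paper, and a real service manager would supervise separate processes rather than plain function calls.

```python
from typing import Callable, List


class ServiceManager:
    """Hypothetical supervisor: runs a module and restarts it after a crash."""

    def __init__(self, factory: Callable[[], Callable[[], str]],
                 max_restarts: int = 3) -> None:
        self._factory = factory            # builds a module with fresh state
        self._max_restarts = max_restarts  # assumed limit, avoids endless loops
        self.restarts = 0
        self.log: List[str] = []

    def run(self) -> str:
        while True:
            module = self._factory()       # corrupted state is discarded here
            try:
                return module()
            except Exception as exc:       # the module "crashed"
                self.log.append(repr(exc))
                if self.restarts >= self._max_restarts:
                    raise RuntimeError("restart limit reached") from exc
                self.restarts += 1


attempts = {"count": 0}


def make_module() -> Callable[[], str]:
    def module() -> str:
        # No defensive error handling inside the module: it simply fails
        # on a (simulated) transient fault and leaves recovery to the manager.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise ValueError("transient sensor glitch")
        return "ok"
    return module


manager = ServiceManager(make_module, max_restarts=3)
result = manager.run()
```

In this sketch the simulated transient fault disappears after two restarts, so the manager eventually returns the module's result. A fault that persists across restarts exhausts the limit and is escalated as a RuntimeError, which reflects why the approach targets transient faults rather than systems that must never stop.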