BULK PRIMITIVES IN LINDA RUN-TIME SYSTEMS

Antony Ian Taylor Rowstron

Submitted for the Degree of Doctor of Philosophy
Department of Computer Science
October 1996

Abstract

This thesis investigates techniques for the efficient implementation of the Linda parallel process coordination model for open, distributed computing systems. The principal focus of the research is on the use of the bulk movement of tuples within open systems which, contrary to intuition, can result in significant efficiency gains for a large class of problems.

The emphasis on open systems (those in which the future history of process creation and deletion cannot be known at compile-time) is due to the current interest in extending the Linda model to encompass widely distributed computing, as exemplified by the ‘Network Computer’ notion. However, such open systems place severe constraints on the types of optimisation available relative to closed systems; in particular, the very powerful compile-time analysis techniques previously used are no longer feasible.

Methods for the construction of efficient Linda kernels are introduced based on a novel method of dynamically classifying tuple spaces according to their locality, which allows the run-time movement of tuple spaces’ locations within the distributed kernel. An important consequence of the proposed technique is that it does not require any ‘global’ information: it works solely on information locally available to each component of the distributed kernel. Equally importantly, the scheme is entirely transparent to the programmer, and therefore requires no user-supplied ‘hints’ or ‘pragmas’.

The implemented kernel is fully distributed; consequently, tuples within a particular tuple space may be stored on several physical nodes. The kernel supports standard Linda with multiple tuple spaces, the collect primitive, and another primitive called copy-collect. The justification for the addition of copy-collect is the multiple rd problem which is described in detail in this dissertation.
No acceptable way of overcoming the multiple rd problem without the use of the copy-collect primitive has been published.

The performance of the implemented kernel is shown to be significantly better than the performance of a kernel that does not use the bulk movement of tuples, and through using a “real-world” example the kernel is shown to provide, under some circumstances, better performance than the best commercially available closed implementation which uses compile-time analysis.

Finally, an extension of the concept of classifying tuple spaces is presented, which generalises the concept leading to a detailed proposal for a multi-layer hierarchical kernel, which is more scalable than current traditional implementations.

Contents

Abstract
Dedication
Acknowledgements
Declaration

1 Introduction
  1.1 Thesis Overview
  1.2 Contributions

2 Related and background work
  2.1 Introduction
  2.2 Linda
    2.2.1 The Linda primitives
    2.2.2 Tuple matching
    2.2.3 Linda1
  2.3 Linda extensions
    2.3.1 Multiple tuple spaces
    2.3.2 New primitives
    2.3.3 Piranha and Bauhaus Linda
    2.3.4 The Linda machine
    2.3.5 Closing comments
  2.4 The inp and rdp primitives: the need for out ordering
    2.4.1 Global synchronisation
    2.4.2 out ordering
  2.5 Host languages used in this dissertation
    2.5.1 ISETL-Linda syntax
    2.5.2 C-Linda syntax
    2.5.3 Processes and the host languages
  2.6 Non-determinism and Linda
    2.6.1 Instructional footprinting
  2.7 Summary

3 The multiple rd problem
  3.1 Introduction
  3.2 Parallel composition of two binary relations
    3.2.1 Formal definition of the composition of two binary relations
    3.2.2 The general approach to implementation
  3.3 Tuples as semaphores
    3.3.1 Performance
  3.4 Streams
    3.4.1 Performance
  3.5 Experimental results
    3.5.1 The binary semaphore method
    3.5.2 The stream method
    3.5.3 Experimental conclusions
  3.6 Coarsening the approach
  3.7 Conclusions

4 Copy-collect: A new primitive for the Linda model
  4.1 Introduction
  4.2 The copy-collect primitive
  4.3 Using copy-collect to solve the multiple rd problem
  4.4 Experimental results
  4.5 Modelling the performance
  4.6 Comparison with other similar proposals
  4.7 New coordination constructs
  4.8 Conclusion

5 The implementation of bulk primitives
  5.1 Introduction
  5.2 Review of implementations
    5.2.1 Open implementations versus closed implementations
    5.2.2 Closed implementation techniques
  5.3 Kernel implementation techniques
    5.3.1 Tuple distribution
    5.3.2 The eval primitive
  5.4 Adding explicit information to Linda programs
  5.5 Implementing multiple tuple spaces
    5.5.1 Linda-Polylith
    5.5.2 MTS-Linda
  5.6 A naive approach to implementing the bulk primitives
  5.7 Classification of tuple spaces
  5.8 The York Kernel II
  5.9 Tuple Distribution within the kernel
  5.10 Tuples and tuple storage within the kernel
  5.11 The Local Tuple Space Manager
  5.12 Implementing the out primitive
  5.13 Implementing the in and rd primitives
  5.14 Implementing the bulk primitives
  5.15 Tracking tuple space handles
  5.16 The bulk movement of tuples
  5.17 Optimising the York Kernel II
    5.17.1 out optimisation
    5.17.2 LTSM tuple insertion optimisation
  5.18 Why classify tuple spaces?
  5.19 Conclusions

6 Performance of the York Kernel II
  6.1 Introduction
  6.2 Experiments
    6.2.1 Experimental results
    6.2.2 Experimental conclusions
  6.3 The Hough transform
  6.4 Parallel decomposition of the Hough transform
    6.4.1 Experimental results
    6.4.2 Conclusions
  6.5 Conclusion

7 Generalising tuple space classification
  7.1 Introduction
  7.2 Eagerness in tuple space movement
  7.3 Reclassification of a RTS to LTSs
  7.4 Scalability
  7.5 N-layer hierarchical kernel
  7.6 The TSS node structure
  7.7 Demonstration of the N-layer hierarchical kernel
  7.8 Conclusions

8 Conclusions and Future research
  8.1 Introduction
  8.2 Future research
    8.2.1 A “Linda” for distributed computing?
  8.3 Contributions
    8.3.1 Multiple rd operation
    8.3.2 Proposal for the adoption of the copy-collect primitive
    8.3.3 A novel kernel for Linda
    8.3.4 Detailed description of a hierarchical kernel
  8.4 Closing remarks

A Overview of the Linda implementations

B Source code for experimental results
  B.1 Experiment one
  B.2 Experiment two
  B.3 Experiment three
  B.4 Experiment four
  B.5 Experiment five
  B.6 Experiment six

List of Figures

2.1 An example of an interactive session using ISETL-Linda.
2.2 How the philosophers spend their time in the Linda version.
2.3 How the philosophers may spend their time in the optimised version.
3.1 Composition of two binary relations represented using three tuple spaces.
3.2 Execution time for the parallel composition of binary relations when using the binary semaphore method.
3.3 Execution patterns for two and five worker processes using the semaphore method.
3.4 Execution time for the parallel composition of binary relations when using the stream approach.
3.5 The worker process when adopting a coarser approach to the parallel composition of binary relations.
4.1 Using the copy-collect primitive to solve the multiple rd problem.
4.2 Execution time for the parallel composition of binary relations when using the new copy-collect primitive.
4.3 Execution time for the parallel composition versus the number of pairs in tuple space S which each pair in tuple space R matches.
4.4 Absolute primitive count for the stream method for the multiple rd problem.
4.5 Absolute primitive count for the semaphore method for the multiple rd problem.
4.6 Absolute primitive count for the copy-collect method for the multiple rd problem.
4.7 Comparison of absolute primitive counts for the stream and semaphore methods.
4.8 Comparison of the non-concurrent primitive counts for the stream and semaphore methods to the multiple rd problem.
4.9 Comparison of the non-concurrent primitive counts for copy-collect and stream methods to the multiple rd problem.
4.10 Best case expected communication overheads for the three different methods to the multiple rd problem.
4.11 Worst case expected communication overheads for the three different methods to the multiple rd problem.
5.1 Intermediate uniform distribution using 16 kernel processes.
5.2 A naive approach to implementing the copy-collect primitive.
5.3 The York Kernel II architecture.
5.4 The tuple storage data structure used within TSS processes and the LTSM.
5.5 Tuple space handles embedded within tuple spaces.
5.6 Latency for messages up to 1024 bytes in size using PVM.
5.7 Latency for messages up to 100 Kilobytes in size using PVM.
5.8 Bandwidth versus message size up to 1024 bytes.
5.9 Bandwidth versus message size up to 100 Kilobytes.
5.10 The time taken to send a single byte for message sizes up to 1024 bytes.
5.11 The time taken to send a single byte for message sizes up to 100 Kilobytes.
6.1 Summary of the results of experiment one.
6.2 Summary of the results of experiment two.
6.3 Summary of the results of experiment three.
6.4 Summary of the results of experiment four.
6.5 Summary of the results of experiment five.
6.6 Summary of the results of experiment six.
6.7 Summary of the results of all six experiments.
6.8 Processing an aeroplane image.
6.9 A simple image and its Parameter space after the Hough transform has been applied.
6.10 The parallel decomposition of the Hough transform.
7.1 An example five layer hierarchical kernel.
7.2 The pseudo-code for managing an out message within a TSS node in the N-layer hierarchical kernel.
7.3 The pseudo-code for managing an in (and rd) message within a TSS node in the N-layer hierarchical kernel.
7.4 The pseudo-code for managing a reply message within a TSS node in the N-layer hierarchical kernel.
7.5 The pseudo-code for managing a request message within a TSS node in the N-layer hierarchical kernel.
7.6 The pseudo-code for managing a packet message within a TSS in the N-layer hierarchical kernel.
7.7 The pseudo-code for managing a collect message within a TSS node in the N-layer hierarchical kernel.
7.8 The pseudo-code for managing a packet down message within a TSS node in the N-layer hierarchical kernel.