Low-level Concurrent Programming Using the Relaxed Memory Calculus

Michael J. Sullivan

CMU-CS-17-126
November 2017

School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

Thesis Committee:
Karl Crary, Chair
Kayvon Fatahalian
Todd Mowry
Paul McKenney, IBM

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Copyright © 2017 Michael J. Sullivan

This research was sponsored in part by the Carnegie Mellon University Presidential Fellowship. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of any sponsoring institution, the U.S. government or any other entity.

Keywords: programming languages, compilers, concurrency, memory models, relaxed memory

Abstract

The Relaxed Memory Calculus (RMC) is a novel approach for portable low-level concurrent programming in the presence of the relaxed memory behavior caused by modern hardware architectures and optimizing compilers. RMC takes a declarative approach to programming with relaxed memory: programmers explicitly specify constraints on execution order and on the visibility of writes. This differs from other low-level programming language memory models, which, when they exist, are usually based on ordering annotations attached to synchronization operations and/or explicit memory barriers.

In this thesis, we argue that this declarative approach based on explicit programmer-specified constraints is a practical approach for implementing low-level concurrent algorithms. To this end, we present RMC-C++, an extension of C++ with RMC constraints, and rmc-compiler, an LLVM-based compiler for it.

We live on a placid island of ignorance in the midst of black seas of infinity, and it was not meant that we should voyage far.
-- “The Call of Cthulhu”, H.P. Lovecraft

And all I ask is a tall ship and a star to steer her by;
-- “Sea Fever”, John Masefield

Acknowledgments

It’s difficult to overstate how much thanks is owed to my parents, Jack and Mary Sullivan.
They’ve provided nearly three decades of consistent encouragement, support (emotional, moral, logistical, financial, and more), and advice (“Put ‘write thesis proposal’ on your To-Do List, then write the proposal and cross it off!”). I know I wasn’t the easiest kid to raise, but they managed to mostly salvage the situation. Thank you; I wouldn’t be here without you.

Thanks to my brother, Joey; my interactions with him over the years have shaped me in more ways than I understand. And I’m very grateful that, at this point in our lives, most of those interactions are now positive.

Immeasurable thanks to my advisor, Karl Crary, without whom this never would have happened. Karl is an incredible researcher and it was a pleasure to spend the last six years collaborating with him. Karl also, when I asked him about the CMU 5th Year Masters, was the one to convince me to apply to Ph.D. programs instead. Whether that is quite worth a “thanks” is a hard question to answer with anything approaching certainty, but I believe it is.

I also owe a great debt to Dave Eckhardt. Operating Systems Design and Implementation (15-410) is the best class that I have ever taken and I haven’t been able to get it out of my head. The work I did over seven semesters as one of Dave’s TAs for 410 (teaching students about concurrency, operating systems, and, most of all, giving thoughtful and detailed feedback about software architecture and design) is the work I am most proud of, and I am supremely thankful for being able to be a part of it. While much of my work is likely to take me far afield, I try to carry a little of the kernel hacker ethic in my heart wherever I go. On top of that, Dave has been a consistent source of advice, support, and guidance without which I’d never have made it through grad school.

I owe much to a long string of excellent teachers I had in the Mequon-Thiensville School District, beginning with my third-grade teacher Mrs.
Movell, who, in response to me asking if there was a less tedious way to draw some particular shape in LOGO, arranged for a math teacher to come over from the high school to teach me about the wonders of variables, if statements, and GOTOs. I owe a particularly large debt to Robin Schlei, who provided years of support, and to Kathleen Connelly, who first put me in touch with Mark Stehlik. Thank you!

On that note, thanks to Mark Stehlik, who was as good an undergraduate advisor as could be asked for, even if he wouldn’t let me count HOT Compilation as an “applications” class. And thanks to all the great teachers I had, especially Bob Harper, who kindled an interest in programming languages.

Grad school was a long and sometimes painful road, and I’d have never made it through without my hobbies. Chief among them, my regular tabletop role-playing game group, who sacrificed countless Saturday nights to join me in liberating Polonia, exploring the Ironian Wastes for the Ambrosia Trading Company, and conclusively demonstrating, via their work for the Kromian Mages Guild, that not all wizards are subtle: Rhett Lauffenburger (Roger Kazynski), Gabe Routh (Hrothgar Boarbuggerer né Hrunting), Ben Blum (Yolanda Swaggins-Moonflute), Matt Maurer (Jack Silver), Michael Arntzenius (Jakub the Half-Giant), and Will Hagen (Daema). A GM couldn’t ask for a better group of players. I’d also like to thank my other major escape: the EVE Online alliance Of Sound Mind, and especially the CEOs who have built SOUND into such a great group to fly with: XavierVE, June Ting, Ronnie Cordova, and Jacob Matthew Jansen. Fly safe!

The SCS Musical provided another consistently valuable form of stress in my life, and I’d like to thank everybody involved in helping to revive it and keep it running. I had allowed myself to forget about the giant technical-theatre-shaped hole in my heart, and I don’t think I’ll be able to again.

Thanks to the old gang from Homestead: Becki Johnson, Gina Buzzanca, and Michael and Andrew Bartlein.
Though we’ve been scattered far and wide from our parents’ basements in Wisconsin (and though the Bartleins are rubbish at keeping in touch), you’ve remained some of my truest friends.

A lot of people provided, in their own ways, important emotional support at different points in the process. I owe particular thanks to Jack Ferris, Joshua Keller, Margaret Meyerhofer, Amanda Watson, and Sarah Schultz.

Salil Joshi and Joe Tassarotti contributed in important ways to the design of RMC and its compiler. Ben Segall, Paul Dagnelie, and Rick Benua provided helpful suggestions about benchmarking woes. Paul and Rick also provided useful comments on the draft. Matt Maurer provided some much-needed statistical help. I’d like to thank Shaked Flur et al. for providing me with the ARMv8 version of the ppcmem tool, which proved invaluable for exploring the possible behaviors of ARMv8s.

A heartfelt thanks to everyone on my committee. Paul McKenney provided early and interesting feedback on this document (and a quote from “Sea Fever”), Kayvon Fatahalian gave excellent advice on what he saw as the key arguments of my work, and Todd Mowry helped me understand the computer architect’s view of the world a little better.

I’ve had a lot of fantastic colleagues during the decade I spent at CMU, both as an undergrad and as a grad student. I learned more about computer science and about programming languages in conversation with my colleagues than I have from any books or papers. Thanks to Joshua Wise, Jacob Potter, Ben Blum, Ben Segall, Glenn Willen, Nathaniel Wesley Filardo, Matthew Wright, Elly Fong-Jones, Philip Gianfortoni, Car Bauer, Laura Abbott, Chris Lu, Eric Faust, Andrew Drake, Amanda Watson, Robert Marsh, Jack Ferris, Joshua Keller, Joshua Watzman, Cassandra Sparks, Ryan Pearl, Kelly Hope Harrington, Margaret Meyerhofer, Rick Benua, Paul Dagnelie, Michael Arntzenius, Carlo Anguili, Matthew Maurer, Joe Tassarotti, Rob Simmons, Chris Martens, Nico Feltman, Favonia, Danny Gratzer, Stefan Mueller, Anna Gommerstadt, and many many others.
Thanks also to all the wonderful colleagues I had working at Mozilla and Facebook on the Rust and Hack programming languages.

The most stressful and miserable part of my grad school process was the semester I spent preparing my thesis proposal. I was nearly burnt out after it and I don’t think I could have kept on going without the ten weeks I spent “Working From Tahoe” (WFT) from The Fairway House in Lake Tahoe. Thanks to Vail Resorts and everybody who split the ski cabin with me for making it possible, especially to Jack Ferris and Kelly Hope Harrington for bullying me into it. And thanks to the Northstar Ski Patrol, the paramedics of the Truckee Fire Protection District, and the staff at the Tahoe Forest Hospital and the University of Pittsburgh Medical Center for all the excellent care I received after I skied into a tree.

Thanks to Aaron Rodgers and Mike McCarthy, but not to Dom Capers. Go Pack Go!

NVIDIA Corporation donated a Jetson TK1 development kit which was used for testing and benchmarking. The IBM Power Systems Academic Initiative team provided access to a POWER8 machine which was used for testing and benchmarking.

To all of the undoubtedly countless people who I really really ought to have thanked here but who slipped my mind: I’m sorry, and thank you!

And last, but certainly not least, I’d like to extend a heartfelt thanks to Phil, Madz, Manny, and all the rest of the gang down at Emarhavil Heavy Industries. Emarhavil Heavy Industries: where there’s always something big just over the horizon.

Contents

1 Introduction
    1.1 Sequential Consistency
    1.2 Paradise Lost
        1.2.1 Hardware architecture problems
        1.2.2 Compiler problems
    1.3 Language Memory Models
        1.3.1 Java
        1.3.2 C++11
    1.4 A new approach

2 The Relaxed Memory Calculus
    2.1 A Tour of RMC
        2.1.1 Basics
        2.1.2 Concrete syntax: tagging
        2.1.3 Pre and post edges
        2.1.4 Transitivity
        2.1.5 Pushes
    2.2 Example
        2.2.1 Ring buffers
        2.2.2 Using data dependency
    2.3 Advanced Features
        2.3.1 Non-atomic locations and data races
        2.3.2 Sequentially consistent locations
        2.3.3 Give and take - fine-grained cross function edges
        2.3.4 LPRE and LPOST - Pre and Post edges to other actions
    2.4 Model Details
        2.4.1 Execution Model
        2.4.2 Memory system model
    2.5 Discussion
        2.5.1 Execution vs. visibility
        2.5.2 Recursion

3 RMC Formalism
    3.1 Basic RMC
        3.1.1 Syntax
        3.1.2 Thread static semantics
        3.1.3 Thread dynamic semantics
        3.1.4 The Store
        3.1.5 Trace coherence
        3.1.6 Store static semantics
        3.1.7 Signature dynamic semantics
        3.1.8 Top-level semantics
    3.2 Discussion
        3.2.1 Mootness, incorrect speculation, and semantic deadlock
        3.2.2 Read-read coherence
        3.2.3 Connections to RMC-C++
        3.2.4 Thin-Air Reads
    3.3 Metatheory
        3.3.1 Type safety
        3.3.2 Sequential consistency results
    3.4 Improvements and new features
        3.4.1 Better compare-and-swap
        3.4.2 Push edges
        3.4.3 Spawning new threads
        3.4.4 Liveness side conditions
        3.4.5 Sequentially consistent operations
        3.4.6 Non-concurrent (plain) locations
        3.4.7 Allocations

4 Compiling RMC
    4.1 General approach
    4.2 x86
    4.3 ARMv7 and POWER
    4.4 Optimization
        4.4.1 General Approach
        4.4.2 Analysis and preprocessing
        4.4.3 Compilation Using SMT
        4.4.4 Scoped constraints
        4.4.5 Finding data dependencies
        4.4.6 Using the solution
    4.5 ARMv8
        4.5.1 The ARMv8 memory model
        4.5.2 Targeting ARMv8 release/acquire
        4.5.3 Using dmb ld
        4.5.4 Faking lwsync
    4.6 Compilation weights