Steal Tree: Low-Overhead Tracing of Work Stealing Schedulers

Jonathan Lifflander*, Sriram Krishnamoorthy†, Laxmikant V. Kale*
jliffl2, kale, [email protected]
* University of Illinois Urbana-Champaign
† Pacific Northwest National Laboratory
June 19, 2013

Motivation

- Structured parallel programming idioms (e.g. async-finish) have proliferated
    - Examples: OpenMP 3.0, Java Concurrency Utilities, Intel TBB, Cilk, X10
- Work stealing is often used to schedule them:
    - Well-studied dynamic load balancing strategy
    - Provably efficient scheduling
    - Understandable bounds on time and space

Tracing

- Records where and when each task executed
- Captures the order of events and is effective for online and offline analysis
- Challenges
    - The size of the trace may limit what can be feasibly analyzed
    - Tracing may perturb the application's execution, making it impractical
- Applications
    - Replay
    - Performance analysis
    - Data-race detection, retentive stealing, ...

Trace Sizes Using the STEAL TREE

[Figure: bar chart of trace size in KB/core (axis from 0 to 5) for AQ-HF, AQ-WF, SCF-HF, SCF-WF, TCE-HF, TCE-WF, PG-HF, and PG-WF, with one bar each for 2000, 4000, 8000, 16000, and 32000 cores.]

Approach

- For async-finish programs, tracing individual tasks is not feasible
    - Often these programs expose far more concurrency than the number of threads
        - Fine granularity
        - Sheer number of tasks
- Rather than trace individual tasks, exploit the structure of the scheduler to coarsen the events traced
- We identify key properties of two scheduling policies:
    - Help-first: expose more concurrency by expanding tasks in the current scope before executing a task
    - Work-first: depth-first traversal of the code (Cilk)
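To make the difference between the two policies concrete, the following is a minimal single-worker sketch, not the schedulers studied in the talk. It assumes a hypothetical encoding in which a string is a sequential block, `("async", body)` spawns a task, and `("finish", body)` is a finish scope; the names `PROGRAM`, `work_first`, and `help_first` are invented for illustration. With no thieves, work-first reduces to a depth-first traversal of the code, while help-first enqueues each spawned task and keeps running the parent.

```python
from collections import deque

# Hypothetical encoding: a string is a sequential block,
# ("async", body) spawns a task, ("finish", body) is a finish scope.
PROGRAM = ["a", ("async", ["b", ("async", ["c"]), "d"]), "e"]


def work_first(body):
    """Single-worker order when each spawned child runs immediately.

    Under this policy the parent's continuation is what a thief could
    take; with no thieves the result is a depth-first traversal.
    """
    order = []
    for item in body:
        if isinstance(item, str):
            order.append(item)
        else:
            # Both async and finish bodies are entered right away.
            order.extend(work_first(item[1]))
    return order


def help_first(body):
    """Single-worker order when each spawned child is enqueued and the
    parent keeps running; enqueued tasks run later from the local end."""
    order, tasks = [], deque()

    def run(items):
        for item in items:
            if isinstance(item, str):
                order.append(item)
            elif item[0] == "async":
                tasks.append(item[1])      # the child becomes stealable
            else:                          # finish scope
                mark = len(tasks)
                run(item[1])
                while len(tasks) > mark:   # drain tasks spawned inside it
                    run(tasks.pop())

    run(body)
    while tasks:                           # implicit finish at the root
        run(tasks.pop())
    return order


print(work_first(PROGRAM))  # ['a', 'b', 'c', 'd', 'e']
print(help_first(PROGRAM))  # ['a', 'e', 'b', 'd', 'c']
```

The sketch models only the order seen by one worker. In a real scheduler, other workers would steal from the opposite end of the deque, which is where the two policies differ in what they leave available to thieves.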
Example async-finish Program

    fn() {
      s1;
      async {
        s5;
        async w;
        s6;
      }
      s2;
      finish {
        s7;
        async x;
        s8;
        async y;
        s9;
        async z;
        s10;
      }
      s3;
      async { s11; }
      s4;
    }

Elements labeled on the program: Root of Computation, Sequential Block, Asynchronous Task, Finish Scope, Continuation.

Help-first Scheduling Policy

- Enqueue all asyncs in the current level until a finish is reached

[Figure: the example program shown beside a snapshot of the execution deque, built up over three slides. The steal end is at the top and the local end at the bottom. Level 1: s1 s2 s3 s3; level 2: s7 s8 s9 s10; level 3: z.]
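The enqueueing rule on the help-first slide can be sketched on the example program as follows. This is a simplified single-worker model under stated assumptions, not the level-structured deque drawn in the snapshot: the encoding, the function `expand_through_first_finish`, and the labels `t_s5` and `t_s11` (standing for the two anonymous `async { ... }` blocks) are all hypothetical. The worker runs sequential blocks, enqueues every async it meets at the local end, and stops once the body of the first finish scope has been expanded.

```python
from collections import deque

# Hypothetical encoding of the example: a string is a sequential block,
# ("async", label) spawns a task, ("finish", body) is a finish scope.
# "t_s5" and "t_s11" are invented labels for the anonymous async blocks.
FN = [
    "s1", ("async", "t_s5"), "s2",
    ("finish", ["s7", ("async", "x"), "s8", ("async", "y"),
                "s9", ("async", "z"), "s10"]),
    "s3", ("async", "t_s11"), "s4",
]


def expand_through_first_finish(body):
    """Run sequential blocks and enqueue asyncs, help-first style, until
    the body of the first finish scope has been fully expanded.

    Returns the blocks executed so far and the deque of pending tasks.
    Index 0 of the deque is the steal end; the right end is the local end.
    """
    executed, pending = [], deque()

    def expand(items):
        for item in items:
            if isinstance(item, str):
                executed.append(item)
            elif item[0] == "async":
                pending.append(item[1])    # push at the local end
            else:                          # finish scope: expand, then stop
                expand(item[1])
                break

    expand(body)
    return executed, pending


executed, pending = expand_through_first_finish(FN)
print(executed)        # ['s1', 's2', 's7', 's8', 's9', 's10']
print(list(pending))   # ['t_s5', 'x', 'y', 'z']
print(pending[0])      # 't_s5': what a thief would take from the steal end
print(pending[-1])     # 'z': what the owner would pop from the local end
```

In this model the oldest task (the first async in `fn`) sits at the steal end and the most recently spawned task `z` sits at the local end, so the owner resumes with `z` while thieves take work from the other end.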