ebook img

VI-HPS Archer Tutorial PDF

18 Pages·2017·0.63 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview VI-HPS Archer Tutorial

VIRTUAL INSTITUTE – HIGH PRODUCTIVITY SUPERCOMPUTING OpenMP Runtime Error Detection with A RCHER At the 27nd VI-HPS Tuning Workshop Joachim Protze, Simone Atzeni RWTH Aachen University, University of Utah April 2018 + VIRTUAL INSTITUTE – HIGH PRODUCTIVITY SUPERCOMPUTING Data race example in OpenMP What could possibly go static double farg1,farg2; wrong? #define FMAX(a,b) (farg1=(a),farg2=(b),farg1>farg2?farg1:farg2) To avoid side effects, the arguments Double checked scoping of variables: are copied to temporary storage everything seems to be fine 1619: #pragma omp parallel for shared(bar, foo, THRESH) 1620: for (x=0; x<1000; x++) 1621: T = FMAX(0.1111*foo*bar[x],THRESH); What could possibly go Tool flags a write-write race in line 1621 wrong? JOACHIM PROTZE -RWTH AACHEN UNIVERSITY 04/27/2018 2 VIRTUAL INSTITUTE – HIGH PRODUCTIVITY SUPERCOMPUTING Threaded Applications (OpenMP) Threaded Defects Data Deadlocks Races JOACHIM PROTZE -RWTH AACHEN UNIVERSITY 04/27/2018 3 VIRTUAL INSTITUTE – HIGH PRODUCTIVITY SUPERCOMPUTING Threaded Applications (OpenMP) Threaded Defects – Deadlock A circular wait condition exists in the system that causes two or more parallel units to wait indefinitely #pragma omp parallel sections time { #pragma omp section Deadlocking { set(lock_a) set(lock_b) Execution omp_set_lock(&lock_a); Thread1 Order omp_set_lock(&lock_b); Thread2 omp_unset_lock(&lock_b); omp_unset_lock(&lock_a); set(lock_b) set(lock_a) } #pragma omp section { • Thread 1 waits for lock_b owned by omp_set_lock(&lock_b); thread 2 omp_set_lock(&lock_a); • Thread 2 waits for lock_a, owned by omp_unset_lock(&lock_a); Thread 1. omp_unset_lock(&lock_b); } • Neither thread can free a lock and } both threads wait indefinitely. JOACHIM PROTZE -RWTH AACHEN UNIVERSITY 04/27/2018 4 VIRTUAL INSTITUTE – HIGH PRODUCTIVITY SUPERCOMPUTING Threaded Applications (OpenMP) Threaded Defects – Data Race Program behavior dependent on execution order of threads/processes int x,y; int x,y; #pragma omp parallel #pragma omp parallel { { x = omp_get_thread_num (); #pragma omp master #pragma omp barrier sleep(5); #pragma omp master x = omp_get_thread_num (); printf (“Master is:%d” ,x); #pragma omp barrier } #pragma omp master printf (“Master is:%d” ,x); } A write-write race on x If the master thread is intended to write x, it will usually do so, due to the sleep; But sometimes it may not … JOACHIM PROTZE -RWTH AACHEN UNIVERSITY 04/27/2018 5 VIRTUAL INSTITUTE – HIGH PRODUCTIVITY SUPERCOMPUTING Threaded Applications (OpenMP) Definitions Data race Deadlock ▪ Two threads access the same shared ▪ Two or more threads are waiting for variable each other to release locks while ▪ at least one thread modifies the variable holding the lock the other leads to non- ▪ the accesses are concurrent, i.e. deterministic behavior unsynchronized ▪ Program hangs ▪ Leads to non-deterministic behavior ▪ May be non-deterministic ▪ Hard to find with traditional debugging tools JOACHIM PROTZE -RWTH AACHEN UNIVERSITY 04/27/2018 6 VIRTUAL INSTITUTE – HIGH PRODUCTIVITY SUPERCOMPUTING Data race detection tools Helgrind Intel Inspector (XE?) ▪ valgrind --tool=helgrind ▪ They rename the tool every other year ☺ ▪ Less false alerts ▪ Many false alerts ▪ Especially for newer OpenMP ▪ Misses synchronization information clauses/constructs ▪ Binary instrumentation during execution ▪ High runtime overhead for detailed analysis JOACHIM PROTZE -RWTH AACHEN UNIVERSITY 04/27/2018 7 VIRTUAL INSTITUTE – HIGH PRODUCTIVITY SUPERCOMPUTING Data race detection tools Archer ▪ Error checking tool for ▪ Memory errors ▪ Threading errors (OpenMP, Pthreads) ▪ Based on ThreadSanitizer (runtime check) ▪ Available for Linux, Windows and Mac ▪ Supports C, C++ (Fortran in work) ▪ Modified OpenMP runtime improved for data race detection ▪ More info: https://github.com/PRUNERS/archer JOACHIM PROTZE -RWTH AACHEN UNIVERSITY 04/27/2018 8 VIRTUAL INSTITUTE – HIGH PRODUCTIVITY SUPERCOMPUTING Archer – Background ▪ Static Analysis ▪ Only for OpenMP programs ▪ Exclude race free regions and sequential code from runtime analysis to reduce overhead ▪ Runtime check ▪ Error detection only in software branches that are executed ▪ Low runtime overhead ▪ Roughly 2x - 20x ▪ Detect races in large OpenMP applications ▪ No false positives ▪ Compiler instrumentation ▪ Slower compilation process (apply different passes on the source code to identify race free regions of code, instruments only the rest) JOACHIM PROTZE -RWTH AACHEN UNIVERSITY 04/27/2018 9 VIRTUAL INSTITUTE – HIGH PRODUCTIVITY SUPERCOMPUTING Archer – Usage ▪ Compile the program with the –g and –fsanitize=thread flag ▪ clang-archer myprog.c –o myprog ▪ Run the program under control of A Runtime RCHER ▪ export OMP_NUM_THREADS=... ./myprog ▪ Detects problems only in software branches that are executed ▪ Understand and correct the threading errors detected ▪ Edit the source code ▪ Repeat until no errors reported JOACHIM PROTZE -RWTH AACHEN UNIVERSITY 04/27/2018 10

Description:
OpenMP Runtime Error Detection with ARCHER. At the 27nd Program behavior dependent on execution order of threads/processes. 04/27/2018.
See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.