LLVM Compiler Implementation for Explicit
Parallelization and SIMD Vectorization
Xinmin Tian, Hideki Saito, Ernesto Su, Jin Lin, Satish Guggilla, Diego Caballero
Matt Masten, Andrew Savonichev, Michael Rice, Elena Demikhovsky
Ayal Zaks, Gil Rapaport, Abhinav Gaba
Vasileios Porpodas, Eric Garcia
Intel Corporation
Denver, Colorado | November 13-18, 2017
Agenda
Motivation
IR-Region Annotation RFC State
Implementation for Vectorization, Parallelization and Offloading (VPO)
– Architecture
– VPO Framework
– Fork-Join Semantics
– Issues and Solutions
Re-use VPO for OpenCL and Autonomous Driving Applications
Summary and Future Work
Motivation
Why and What to do?
• It is impossible, or very costly, to undo certain inefficient code once the front end has generated it.
• The front end cannot generate optimal code for SIMD modifiers without mixed-data-type information.
• Flexible pass ordering is needed: the loop transformation sequence is loop strip-mining, loop peeling, offload code generation, threaded code generation, and SIMD code generation with proper chunking.
• Program information must be maintained and passed among these passes.
• Passes need access to the target machine's architectural and micro-architectural parameters.
… … … …
#pragma omp target teams distribute parallel for simd schedule(simd:dynamic,16)
for (frameIDX=0; frameIDX<1024*1024; frameIDX++) {
… //loop has multiple memory refs and mixed data types
}
Motivation
• Initially proposed by NVidia, ORNL and LBL
• Education purpose
• Portable code, with the understanding of non-optimal performance

OpenACC-to-OpenMP loop-directive mapping:
!$acc loop gang               →  !$omp teams distribute
!$acc loop gang worker        →  !$omp teams distribute parallel do
!$acc loop gang vector        →  !$omp teams distribute simd
!$acc loop gang worker vector →  !$omp teams distribute parallel do simd
!$acc loop worker             →  !$omp parallel do
!$acc loop vector             →  !$omp simd
!$acc loop worker vector      →  !$omp parallel do simd

Example (OpenACC source and its OpenMP counterpart):
!$acc loop gang worker        |  !$omp target teams distribute parallel do
do i=1,N                      |  do i=1,N
  !$acc loop independent      |    !$omp concurrent
  do j=1,N                    |    do j=1,N
    ...                       |      ...
  end do                      |    end do
end do                        |  end do
IR-Region Annotation RFC State (Intel and ANL)
• Updated the language-agnostic LLVM IR extensions based on the LLVM Token and
OperandBundle representation (incorporating feedback from Google and Xilinx).
• def int_directive_region_entry : Intrinsic<[llvm_token_ty],[], []>;
• def int_directive_region_exit : Intrinsic<[], [llvm_token_ty], []>;
• def int_directive_marker : Intrinsic<[llvm_token_ty],[], []>;
• Implemented explicit parallelization, SIMD vectorization and offloading in
the LLVM middle-end based on IR-Region annotation for C/C++.
• Leveraged the new parallelizer and vectorizer for OpenCL explicit
parallelization and vectorization extensions to build autonomous driving
workloads.
IR-Region Annotation Usage Examples
Annotatable constructs:
• Parallel region/loop/sections
• Simd / declare simd
• Task / taskloop
• Offloading: target map(…)
• Single, master, critical, atomics
• … …

Source:
#pragma omp target device(1) if(a) \
        map(tofrom: x, y[5:100:1])
structured-block

Annotated LLVM IR:
%t0 = call token @llvm.directive.region.entry()
        [ "DIR.OMP.TARGET"(),
          "QUAL.OMP.DEVICE"(1),
          "QUAL.OMP.IF"(type @a),
          "QUAL.OMP.MAP.TOFROM"(type *%x),
          "QUAL.OMP.MAP.TOFROM:ARRSECT"(type *%y,1,5,100,1) ]
structured-block
call void @llvm.directive.region.exit(token %t0) [ "DIR.OMP.END.TARGET"() ]
OpenMP Examples
#pragma omp parallel for schedule(static)
#pragma omp parallel for schedule(static, 64)
#pragma omp parallel for schedule(dynamic)
#pragma omp parallel for schedule(dynamic, 128)
#pragma omp parallel for schedule(guided)
#pragma omp parallel for schedule(guided, 256)
#pragma omp parallel for schedule(auto)
#pragma omp parallel for schedule(runtime)
#pragma omp parallel for schedule(monotonic:dynamic, 16)
#pragma omp parallel for schedule(nonmonotonic:dynamic,16)
#pragma omp parallel for schedule(simd:dynamic, 16)
#pragma omp parallel for schedule(simd,monotonic:dynamic, 16)
#pragma omp parallel for schedule(simd,nonmonotonic:dynamic, 16)
#pragma omp parallel for schedule(monotonic,simd:dynamic, 16)
#pragma omp parallel for schedule(nonmonotonic,simd:dynamic,16)
LLVM IR Extension Examples
%44 = call token @llvm.directive.region.entry() [ "DIR.OMP.PARALLEL.LOOP"(), "QUAL.OMP.SCHEDULE.STATIC"(i32 0) ]
body
call void @llvm.directive.region.exit(token %44)
%47 = call token @llvm.directive.region.entry() [ "DIR.OMP.PARALLEL.LOOP"(), "QUAL.OMP.SCHEDULE.STATIC"(i32 64) ]
……
%50 = call token @llvm.directive.region.entry() [ "DIR.OMP.PARALLEL.LOOP"(), "QUAL.OMP.SCHEDULE.DYNAMIC"(i32 1) ]
……
%53 = call token @llvm.directive.region.entry() [ "DIR.OMP.PARALLEL.LOOP"(), "QUAL.OMP.SCHEDULE.DYNAMIC"(i32 128) ]
%56 = call token @llvm.directive.region.entry() [ "DIR.OMP.PARALLEL.LOOP"(), "QUAL.OMP.SCHEDULE.GUIDED"(i32 1) ]
%59 = call token @llvm.directive.region.entry() [ "DIR.OMP.PARALLEL.LOOP"(), "QUAL.OMP.SCHEDULE.GUIDED"(i32 256) ]
%62 = call token @llvm.directive.region.entry() [ "DIR.OMP.PARALLEL.LOOP"(), "QUAL.OMP.SCHEDULE.AUTO"(i32 0) ]
%65 = call token @llvm.directive.region.entry() [ "DIR.OMP.PARALLEL.LOOP"(), "QUAL.OMP.SCHEDULE.RUNTIME"(i32 0) ]
%68 = call token @llvm.directive.region.entry() [ "DIR.OMP.PARALLEL.LOOP"(), "QUAL.OMP.SCHEDULE.DYNAMIC:MONOTONIC"(i32 16) ]
%71 = call token @llvm.directive.region.entry() [ "DIR.OMP.PARALLEL.LOOP"(), "QUAL.OMP.SCHEDULE.DYNAMIC:NONMONOTONIC"(i32 16) ]
%74 = call token @llvm.directive.region.entry() [ "DIR.OMP.PARALLEL.LOOP"(), "QUAL.OMP.SCHEDULE.DYNAMIC:SIMD"(i32 16) ]
%77 = call token @llvm.directive.region.entry() [ "DIR.OMP.PARALLEL.LOOP"(), "QUAL.OMP.SCHEDULE.DYNAMIC:SIMD.MONOTONIC"(i32 16) ]
%80 = call token @llvm.directive.region.entry()
[ "DIR.OMP.PARALLEL.LOOP"(), "QUAL.OMP.SCHEDULE.DYNAMIC:SIMD.NONMONOTONIC"(i32 16) ]
%83 = call token @llvm.directive.region.entry()
[ "DIR.OMP.PARALLEL.LOOP"(), "QUAL.OMP.SCHEDULE.DYNAMIC:MONOTONIC.SIMD"(i32 16) ]
%86 = call token @llvm.directive.region.entry()
[ "DIR.OMP.PARALLEL.LOOP"(), "QUAL.OMP.SCHEDULE.DYNAMIC:NONMONOTONIC.SIMD"(i32 16) ]
Implementation for Vectorization, Parallelization and Offloading (VPO)
10,000-ft View: Intel® LLVM Compiler Architecture
[Architecture diagram] The pipeline, as recoverable from the figure:
• Front ends (Clang C/C++ FE, Fortran, OpenCL, … …) emit LLVM IR.
• The Par/Vec/Offload Prepare pass normalizes the annotated IR.
• At O0/O1: ScalarOpts runs directly, before lowering.
• At O2 and above: the Community's or Vendor's Loop Optimizers (loop distribution, loop fusion, loop unrolling) and Vectorization (explicit / auto) via the Community's or Vendor's Vectorizer; LoopOpts also produce annotated par-loops for auto-parallelization.
• Lowering and Outlining for OpenMP, Autopar, and Offload then yields LLVM IR, followed by ScalarOpts and LLVM CG.
[Offload tool-flow diagram: the icx driver compiles bar.c to bar.bc; clcpcom produces host bitcode (foo-x86.bc, bar-x86.bc) and target assembly (bar-target.asm), which as assembles into foo-target.o and bar-target.o; libtarget.bc is linked in via -mlink-target-bitcode; after file renaming, ld links foo.out and the *-target*.out images.]