Overview: Programming Environment for Intel® Xeon Phi™ Coprocessor

One Source Base, Tuned to Many Targets
• Compilers, source libraries, and parallel models span multicore CPUs, the Intel® MIC architecture, and clusters of both.

Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.

Intel® Parallel Studio XE 2013 and Intel® Cluster Studio XE 2013

Phase | Product | Feature | Benefit
Design | Intel® Advisor XE | Threading design assistant (Studio products only) | Simplifies, demystifies, and speeds parallel application design
Build | Intel® Composer XE | C/C++ and Fortran compilers; Intel® Threading Building Blocks; Intel® Cilk™ Plus; Intel® Integrated Performance Primitives; Intel® Math Kernel Library | Enabling solution to achieve the application performance and scalability benefits of multicore and to forward-scale to many-core
Build | Intel® MPI Library† | High-performance Message Passing (MPI) library | Enables high-performance scalability, interconnect independence, runtime fabric selection, and application tuning
Verify & Tune | Intel® VTune™ Amplifier XE | Performance profiler for optimizing application performance and scalability | Removes guesswork, saves time, and makes it easier to find performance and scalability bottlenecks
Verify & Tune | Intel® Inspector XE | Memory & threading dynamic analysis for code quality; static analysis for code quality | Increases productivity and code quality, lowers cost; finds memory, threading, and security defects before they happen
Verify & Tune | Intel® Trace Analyzer & Collector† | MPI performance profiler for understanding application correctness & behavior | Analyzes performance of MPI programs and visualizes parallel application behavior and communication patterns to identify hotspots
Native Models

[Diagram: the host and the coprocessor each run their own executable on the same software stack: Intel® MKL, OpenMP, MPI, Intel® TBB, OpenCL, Intel® Cilk™ Plus, C++/Fortran.]

Parallel programming is the same on coprocessor and host.

Intel® MIC Centric: Native MIC Programming
• Enabled by the -mmic compiler option
• Fully supported by compiler vectorization, Intel® MKL, OpenMP*, Intel® TBB, Intel® Cilk™ Plus, Intel® MPI, …
  – No Intel® Integrated Performance Primitives library yet
• A viable option for some applications, provided that:
  – The application fits into coprocessor memory (up to 16 GB today)
  – The code is highly parallel; serial parts run slower on MIC than on the host
  – Access to the external environment (e.g. I/O) is limited:
    – The native MIC file system exists in memory only
    – NFS allows external I/O, but with limited bandwidth

Offload Models

[Diagram: the host executable offloads heterogeneous compute to the coprocessor, either via offload directives (data marshalling) or via offload keywords (shared virtual memory); both sides use the same stack: Intel® MKL, OpenMP, MPI, Intel® TBB, OpenCL, Intel® Cilk™ Plus, C++/Fortran.]

Parallel programming is the same on coprocessor and host.
Programming Models and Mindsets

[Diagram: a spectrum of execution models from multi-core centric (Xeon) to many-core centric (MIC), each shown as Main()/Foo()/MPI_*() running on host, coprocessor, or both.]
• Multi-Core Hosted: general-purpose codes run entirely on the host (Xeon)
• Offload: codes with highly parallel phases run on the host and offload those phases to the coprocessor
• Symmetric: codes with balanced serial and parallel needs run on both host and coprocessor (e.g. MPI ranks on each)
• Many-Core Hosted: highly parallel codes run natively on the coprocessor (MIC)

Range of models to meet application needs.

Offload Models
• Intel® Xeon Phi™ supports two offload models:
  – Explicit: data transfers from host to/from coprocessor are initiated by the programmer
  – Implicit: data is (virtually) shared (VSHM) between host and coprocessor
• Also called LEO (Language Extensions for Offload)

Explicit Offload Model

[Diagram: host and target, with pointer pA. 1. Allocate on the target; 2. Copy over; 3. Read/Modify on the target; 4. Copy back; 5. Free.]

• The programmer explicitly controls data and function movement between the host and target(s)
  – Data is copied (not shared)
  – Data must be bitwise copyable (pointers are NOT relocated)
• Supported for Fortran and C/C++

Explicit Offload Model – Use

Explicit offloading requires the user to manage data persistence. Data and functions are marked for offload as follows:

C/C++:
• #pragma offload_attribute(push, target(mic)) … #pragma offload_attribute(pop)
• __attribute__((target(mic)))

Fortran:
• !DIR$ OPTIONS /OFFLOAD_ATTRIBUTE_TARGET=mic
• !DIR$ ATTRIBUTES OFFLOAD:mic :: <subroutine>

Marked data and functions exist on both the host and the target, and are copied between them when referenced.
• Named targets:
  – target(mic): the runtime picks the card
  – target(mic:n): explicitly names the logical card number n
Description: