Code Modernization – Why and How
October 2014
Peter Kerney
Intel Australia
[email protected]

Legal Disclaimers

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Relative performance is calculated by assigning a baseline value of 1.0 to one benchmark result, and then dividing the actual benchmark result for the baseline platform into each of the specific benchmark results of each of the other platforms, and assigning them a relative performance number that correlates with the performance improvements reported.

Intel does not control or audit the design or implementation of third party benchmarks or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmarks are reported and confirm whether the referenced benchmarks are accurate and reflect performance of systems available for purchase.

Intel® Hyper-Threading Technology: Available on select Intel® Xeon® processors. Requires an Intel® HT Technology-enabled system. Consult your PC manufacturer. Performance will vary depending on the specific hardware and software used. For more information including details on which processors support HT Technology, visit http://www.intel.com/info/hyperthreading.

Intel® Turbo Boost Technology requires a Platform with a processor with Intel Turbo Boost Technology capability. Intel Turbo Boost Technology performance varies depending on hardware, software and overall system configuration.
Check with your platform manufacturer on whether your system delivers Intel Turbo Boost Technology. For more information, see http://www.intel.com/technology/turboboost

Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor series, not across different processor sequences. See http://www.intel.com/products/processor_number for details.

Intel products are not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility applications.

All dates and products specified are for planning purposes only and are subject to change without notice.

Intel product plans in this presentation do not constitute Intel plan of record product roadmaps. Please contact your Intel representative to obtain Intel’s current plan of record product roadmaps. Product plans, dates, and specifications are preliminary and subject to change without notice.

Copyright © 2014 Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon and Xeon logo, Xeon Phi and Xeon Phi logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

*Other names and brands may be claimed as the property of others.

Optimization Notice

Intel® compilers, associated libraries and associated development tools may include or utilize options that optimize for instruction sets that are available in both Intel® and non-Intel microprocessors (for example SIMD instruction sets), but do not optimize equally for non-Intel microprocessors. In addition, certain compiler options for Intel compilers, including some that are not specific to Intel micro-architecture, are reserved for Intel microprocessors.
For a detailed description of Intel compiler options, including the instruction sets and specific microprocessors they implicate, please refer to the “Intel® Compiler User and Reference Guides” under “Compiler Options.” Many library routines that are part of Intel® compiler products are more highly optimized for Intel microprocessors than for other microprocessors. While the compilers and libraries in Intel® compiler products offer optimizations for both Intel and Intel-compatible microprocessors, depending on the options you select, your code and other factors, you likely will get extra performance on Intel microprocessors.

Intel® compilers, associated libraries and associated development tools may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include Intel® Streaming SIMD Extensions 2 (Intel® SSE2), Intel® Streaming SIMD Extensions 3 (Intel® SSE3), and Supplemental Streaming SIMD Extensions 3 (Intel® SSSE3) instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors.

While Intel believes our compilers and libraries are excellent choices to assist in obtaining the best performance on Intel® and non-Intel microprocessors, Intel recommends that you evaluate other compilers and libraries to determine which best meet your requirements. We hope to win your business by striving to offer the best performance of any compiler or library; please let us know if you find we do not.

Notice revision #20101101

FAQs – HPC Customers

1. Why should I care about code modernization?
2. What is the optimal platform for code modernization?
3. Is that platform’s performance compelling?
4. What is this platform’s value proposition?
5. What options do I have to evaluate this platform?

Xeon = Intel® Xeon® processor
Xeon Phi = Intel® Xeon Phi™ coprocessor

Why should I care about code modernization?

[Chart: Binomial Options DP. Vertical axis runs from 0 to 45000; horizontal axis categories are 4C, 4C, 6C, 8C, 12C. Callouts: "A lot of performance is being left on the table", "57x", "We believe most codes are here". See slide in back-up for configuration details.]

Parallelizing and vectorizing your code will maximize your ROI and keep you competitive.

Introducing Intel® Parallel Studio XE 2015 – Latest Release!

Vectorization: What is it? (Graphical View)

for (i=0;i<=MAX;i++) c[i]=a[i]+b[i];

Scalar: A + B = C
- One Instruction
- One Mathematical Operation

Vector: (a[i+7] ... a[i]) + (b[i+7] ... b[i]) = (c[i+7] ... c[i])
- One Instruction
- Eight Mathematical Operations [1]

1. The number of operations per instruction varies based on which SIMD instruction is used and the width of the operands.

What is the optimal platform for code modernization?
The world is going parallel: stick with sequential code and you will fall behind.

Products compared:
A. Intel® Xeon® Processor E5-2600 v2 Product Family (formerly codenamed Ivy Bridge)
B. Intel® Xeon® Processor E5-2600 v3 Product Family (formerly codenamed Haswell)
C. Intel® Xeon Phi™ x100 Product Family (formerly codenamed Knights Corner)
D. Intel® Xeon Phi™ x200 Product Family (codenamed Knights Landing)
E. Future Xeon (…)

                     A          B          C          D       E
Cores*               12         18         61         >60     tba
Threads/Core         2          2          4          4       tba
Vector Width         256-bit    256-bit    512-bit    tba     512-bit
Memory Bandwidth*    59 GB/s    68 GB/s    352 GB/s   tba     tba

Timeline labels on the slide: tba, tba, Future.

Optimize code today [1] with Xeon Phi to maximize ROI and be ready for highly-parallel KNL and future Xeon. Optimization benefit for Xeon is not possible with the GPU/CUDA* path.

1. e.g. NERSC

All products, computer systems, dates and figures specified are preliminary based on current expectations, and are subject to change without notice.
*Other names and brands may be claimed as the property of others.

Intel® Xeon Phi™ Coprocessors
Highly-parallel Processing for Unparalleled Discovery

Groundbreaking differences:
- Up to 61 IA cores / 1.2 GHz / 244 threads
- Up to 16 GB memory with up to 352 GB/s bandwidth
- 512-bit SIMD instructions
- Linux operating system, IP addressable
- Standard programming languages and tools

Leading to groundbreaking results:
- Over 1.2 TeraFlop/s double precision peak performance [1]
- Enjoy up to 1.79x higher memory bandwidth than on an Intel® Xeon® processor E5 family-based server [2]
- Up to 3x more performance per watt than with an Intel® Xeon® processor E5 family-based server [3]

For more information go to http://www.intel.com/performance
Notes 1, 2 & 3, see backup for system configuration details.

Intel® Xeon Phi™ Coprocessors: They’re So Much More
General purpose IA Hardware leads to less idle time for your investment

It’s a supercomputer on a chip:
- Operate as a compute node
- Run a full OS
- Program to MPI
- Run x86 code
- Run restricted code
- Run offloaded code

Diagram labels: Intel® Xeon Phi™ Coprocessor*; Custom HW Acceleration (GPU, ASIC, FPGA); Restrictive architectures

Restrictive architectures limit the ability for applications to use arbitrary nested parallelism, function calls and threading models.

*Refer to software.intel.com/mic-developer for details on the Intel Xeon Phi™ coprocessor
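
The loop on the "Vectorization: What is it?" slide can be written out as a small C sketch. This is an illustration only, not code from the presentation: the function names add_arrays and add_arrays_by_eight and the LANES constant are hypothetical, and a lane count of 8 assumes 32-bit floats in a 256-bit vector, matching the slide's "eight mathematical operations". The first function is the plain scalar loop, with restrict telling the compiler the arrays do not overlap, which is typically what an auto-vectorizer needs to know. The second function spells out by hand the blocking a vectorizing compiler might apply: groups of eight elements plus a scalar remainder loop. Whether SIMD instructions are actually emitted depends on the compiler and its options. Note that the slide's loop runs to i<=MAX (MAX+1 elements), while this sketch takes an element count n.

```c
#include <stddef.h>

/* Assumption: 8 lanes, i.e. eight 32-bit floats in one 256-bit vector. */
enum { LANES = 8 };

/* Plain element-wise sum: c[i] = a[i] + b[i] for i in [0, n).
 * `restrict` states that a, b and c do not overlap, so iterations
 * are independent and the compiler is free to vectorize the loop. */
void add_arrays(const float *restrict a, const float *restrict b,
                float *restrict c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}

/* The same computation, strip-mined by hand to show the shape of a
 * vectorized loop: the inner loop over `lane` stands in for a single
 * SIMD add on LANES elements. This is still ordinary scalar C. */
void add_arrays_by_eight(const float *restrict a, const float *restrict b,
                         float *restrict c, size_t n)
{
    size_t i = 0;

    /* Full blocks of LANES elements. */
    for (; i + LANES <= n; i += LANES) {
        for (size_t lane = 0; lane < LANES; lane++) {
            c[i + lane] = a[i + lane] + b[i + lane];
        }
    }

    /* Remainder loop for the last n % LANES elements. */
    for (; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}
```

Both functions produce the same result for any n; the second only makes the block-plus-remainder structure visible. In practice the first form is the one to write, leaving the blocking to the compiler.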