C++ AMP: Accelerated Massive Parallelism with Microsoft® Visual C++®

Kate Gregory
Ade Miller

Published with the authorization of Microsoft Corporation by:
O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, California 95472

Copyright © 2012 by Ade Miller, Gregory Consulting Limited

All rights reserved. No part of the contents of this book may be reproduced or transmitted in any form or by any means without the written permission of the publisher.

ISBN: 978-0-7356-6473-9

1 2 3 4 5 6 7 8 9 LSI 7 6 5 4 3 2

Printed and bound in the United States of America.

Microsoft Press books are available through booksellers and distributors worldwide. If you need support related to this book, email Microsoft Press Book Support at mspinput@microsoft.com. Please tell us what you think of this book at http://www.microsoft.com/learning/booksurvey.

Microsoft and the trademarks listed at http://www.microsoft.com/about/legal/en/us/IntellectualProperty/Trademarks/EN-US.aspx are trademarks of the Microsoft group of companies. All other marks are property of their respective owners.

The example companies, organizations, products, domain names, email addresses, logos, people, places, and events depicted herein are fictitious. No association with any real company, organization, product, domain name, email address, logo, person, place, or event is intended or should be inferred.

This book expresses the author’s views and opinions. The information contained in this book is provided without any express, statutory, or implied warranties. Neither the authors, O’Reilly Media, Inc., Microsoft Corporation, nor its resellers, or distributors will be held liable for any damages caused or alleged to be caused either directly or indirectly by this book.

Acquisitions and Developmental Editor: Russell Jones
Production Editor: Holly Bauer
Editorial Production: nSight, Inc.
Copyeditor: nSight, Inc.
Indexer: nSight, Inc.
Cover Design: Twist Creative • Seattle
Cover Composition: Zyg Group, LLC
Illustrator: Rebecca Demarest

Dedicated to Brian, who has always been my secret weapon, and my children, now young adults who think it’s normal for your mum to write books.
—Kate Gregory

Dedicated to The Susan, who is so much more than I deserve.
—Ade Miller

Contents at a Glance

Foreword
Introduction
Chapter 1    Overview and C++ AMP Approach
Chapter 2    NBody Case Study
Chapter 3    C++ AMP Fundamentals
Chapter 4    Tiling
Chapter 5    Tiled NBody Case Study
Chapter 6    Debugging
Chapter 7    Optimization
Chapter 8    Performance Case Study—Reduction
Chapter 9    Working with Multiple Accelerators
Chapter 10   Cartoonizer Case Study
Chapter 11   Graphics Interop
Chapter 12   Tips, Tricks, and Best Practices
Appendix     Other Resources
Index
About the Authors

Contents

Foreword
Introduction

Chapter 1    Overview and C++ AMP Approach
    Why GPGPU? What Is Heterogeneous Computing?
    History of Performance Improvements
    Heterogeneous Platforms
    GPU Architecture
    Candidates for Performance Improvement through Parallelism
    Technologies for CPU Parallelism
    Vectorization
    OpenMP
    Concurrency Runtime (ConcRT) and Parallel Patterns Library
    Task Parallel Library
    WARP—Windows Advanced Rasterization Platform
    Technologies for GPU Parallelism
    Requirements for Successful Parallelism
    The C++ AMP Approach
    C++ AMP Brings GPGPU (and More) into the Mainstream
    C++ AMP Is C++, Not C
    C++ AMP Leverages Tools You Know
    C++ AMP Is Almost All Library
    C++ AMP Makes Portable, Future-Proof Executables
    Summary

Chapter 2    NBody Case Study
    Prerequisites for Running the Example
    Running the NBody Sample
    Structure of the Example
    CPU Calculations
    Data Structures
    The wWinMain Function
    The OnFrameMove Callback
    The OnD3D11CreateDevice Callback
    The OnGUIEvent Callback
    The OnD3D11FrameRender Callback
    The CPU NBody Classes
    NBodySimpleInteractionEngine
    NBodySimpleSingleCore
    NBodySimpleMultiCore
    NBodySimpleInteractionEngine::BodyBodyInteraction
    C++ AMP Calculations
    Data Structures
    CreateTasks
    The C++ AMP NBody Classes
    NBodyAmpSimple::Integrate
    BodyBodyInteraction
    Summary

Chapter 3    C++ AMP Fundamentals
    array<T, N>
    accelerator and accelerator_view
    index<N>
    extent<N>
    array_view<T, N>
    parallel_for_each
    Functions Marked with restrict(amp)
    Copying between CPU and GPU
    Math Library Functions
    Summary