Processor Allocation, Message Scheduling, and Algorithm Selection
for Parallel Space-Time Adaptive Processing

Dissertation Defense
May 2000

Jack M. West

Committee Members:
Dr. John K. Antonio (Chair)
Dr. William Marcy
Dr. Noe Lopez-Benitez

TEXAS TECH UNIVERSITY
Department of Computer Science · College of Engineering · Lubbock · Texas · 79409 · USA

ACKNOWLEDGEMENTS

CONTENTS

ACKNOWLEDGEMENTS ...................................................... II
CONTENTS ............................................................. III
ABSTRACT .............................................................. VI
TABLES .............................................................. VIII
FIGURES ................................................................ X

CHAPTER I .............................................................. 1
INTRODUCTION ........................................................... 1

CHAPTER II ............................................................. 4
PARALLELIZATION APPROACH FOR STAP ...................................... 4
    2.1 Principles of Space-Time Adaptive Processing ................... 4
    2.2 Overview of Mercury's RACE® Multicomputer ...................... 9
    2.3 The Interconnection Network of the RACE® Multicomputer ......... 9
    2.4 The Compute Nodes of the RACE® Multicomputer .................. 17
    2.5 Linear Pipelined Execution Model for Embedded Applications .... 21
    2.6 Multi-Phased Pipelined Execution Model ........................ 23
    2.7 Multi-Phased Pipelined Execution Model with Sub-Cube Bar
        Partitioning .................................................. 26

CHAPTER III ........................................................... 31
MAPPING DATA AND SCHEDULING COMMUNICATIONS FOR
IMPROVED PERFORMANCE .................................................. 31
    3.1 Mapping a STAP Data Cube onto the Mercury RACE System ......... 31
    3.2 Scheduling Communications During Repartitioning Phases ........ 34

CHAPTER IV ............................................................ 41
DESIGN OF THE NETWORK SIMULATOR ....................................... 41
    4.1 UML Class Definitions ......................................... 41
    4.2 Refining Class Operations ..................................... 45
    4.3 UML Statecharts and Activity Diagrams of the Simulator ........ 52
    4.4 Preliminary Numerical Results ................................. 59
        4.4.1 Performance Metric for a 3×12, 12×3, and 4×9
              Process Set ............................................. 59
        4.4.2 Performance Metric for a 12×3, 9×4, 6×6, and 4×9
              Process Set ............................................. 61

CHAPTER V ............................................................. 65
ENHANCEMENT AND VERIFICATION OF THE NETWORK SIMULATOR ................. 65
    5.1 Enhancements to the Network Simulator ......................... 65
    5.2 Test Plan for Verification of the Network Simulator ........... 72
    5.3 Timing Studies for the Verification Test Plans ................ 76

CHAPTER VI ............................................................ 91
DATA MAPPING OPTIMIZATION ............................................. 91
    6.1 Optimal Mapping Heuristic for STAP Data Distribution .......... 92
    6.2 Software Mapping Utility ..................................... 100

CHAPTER VII .......................................................... 105
MESSAGE SCHEDULING OPTIMIZATION ...................................... 105
    7.1 Background ................................................... 105
    7.2 Methodology and Implementation ............................... 109

CHAPTER VIII ......................................................... 118
NUMERICAL STUDIES .................................................... 118
    8.1 Data Mapping Optimization .................................... 118
        8.1.1 Optimal Mapping for 16 and 32 CN Systems ............... 119
        8.1.2 Study of a 240×32×16 Data Cube ......................... 132
        8.1.3 Study of a 480×64×32 Data Cube ......................... 135
        8.1.4 Study of a 32×32×32 Data Cube .......................... 135
        8.1.5 Study of a 64×64×64 Data Cube .......................... 136
    8.2 GA-Based Message Scheduling Optimization ..................... 139
        8.2.1 Optimal Scheduling for 16 Compute Nodes ................ 140
        8.2.2 Optimal Scheduling for 24 Compute Nodes ................ 158
        8.2.3 Optimal Scheduling for 32 Compute Nodes ................ 174
    8.3 Power Analysis ............................................... 188
    8.4 Summary of Results ........................................... 191

CHAPTER IX ........................................................... 193
CONCLUSION ........................................................... 193

REFERENCES ........................................................... 195

ABSTRACT

The minimization of execution time (which includes both computation and communication components) and/or the maximization of throughput are of great significance in embedded parallel environments. Given the tight system constraints associated with applications in these environments, it is imperative to map the tasks and/or data of an application onto the processors efficiently so as to reduce the resulting inter-processor communication traffic. In addition to mapping tasks and data to processors efficiently, it is also important to schedule the communication of messages during phases of data movement so as to minimize network contention and attain the smallest possible communication time. Mapping and scheduling can thus be classified as optimization problems whose solutions strongly affect the performance of the parallel system. This dissertation involves optimizing the mapping of data and the scheduling of messages for a class of signal processing techniques known as space-time adaptive processing (STAP).
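To make the mapping side of this optimization concrete, the following is a minimal, hypothetical sketch of the kind of cost metric involved: given an assignment of sub-cube bars to compute nodes (CNs) before and after a repartitioning phase, count the data that must cross the network. The function name, data layout, and cost model here are illustrative assumptions, not the dissertation's actual objective function.

```python
# Toy cost metric: a sub-cube bar generates network traffic during a
# repartitioning phase only if its owning CN changes between phases.
# All names and the cost model are illustrative assumptions.

def repartition_traffic(before, after, bar_words):
    """Total inter-CN words moved when bar ownership changes between phases.

    before, after: dict mapping bar_id -> CN id (pre-/post-repartition)
    bar_words:     dict mapping bar_id -> number of words in that bar
    """
    return sum(bar_words[b] for b in before if before[b] != after[b])

# Example: 4 bars of 100 words each; bars 1 and 3 change owners.
before = {0: 0, 1: 0, 2: 1, 3: 1}
after  = {0: 0, 1: 1, 2: 1, 3: 0}
print(repartition_traffic(before, after, {b: 100 for b in before}))  # -> 200
```

A lower value of such a metric indicates a data mapping that induces less message traffic during the data-movement phases, which is the quantity the proposed objective function is designed to capture.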
An objective function is proposed to measure the quality of a data mapping onto the processing elements of a parallel system for a STAP algorithm. The objective function is a cost metric that quantifies the message traffic generated during phases of data movement, given the mapping of data to processors on a parallel system. The results show significant differences in the quality of data mappings as measured by the proposed objective function.

A genetic algorithm (GA) based approach for solving the message scheduling optimization problem is proposed, and numerical results from different scenarios are provided. The GA-based optimization is performed off-line, and its results are static schedules for each processing element in the parallel system. These static schedules are then implemented in the on-line parallel STAP application. The results of this research illustrate that significant improvement in communication time performance is possible using the proposed GA-based approach to scheduling.
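As an illustration of the off-line scheduling approach, the sketch below searches over message orderings with a standard permutation-encoded GA (order crossover plus swap mutation). The message set, the crude per-CN contention model that stands in for the dissertation's network simulator, and all parameter values are assumptions made for this example only.

```python
import random

# Toy problem: messages (src_cn, dst_cn, words) to schedule on a network.
# Assumption: a message cannot start until both its endpoint CNs are free;
# this crude contention model stands in for the network simulator.
MESSAGES = [
    (0, 1, 40), (0, 2, 30), (1, 2, 20), (2, 3, 50),
    (3, 0, 10), (1, 3, 25), (2, 0, 15), (3, 1, 35),
]

def comm_time(order):
    """Simulated total communication time for one message ordering."""
    busy_until = {}  # per-CN time at which its port becomes free
    finish = 0.0
    for idx in order:
        src, dst, words = MESSAGES[idx]
        start = max(busy_until.get(src, 0.0), busy_until.get(dst, 0.0))
        end = start + words  # one time unit per word
        busy_until[src] = busy_until[dst] = end
        finish = max(finish, end)
    return finish

def order_crossover(p1, p2, rng):
    """OX: copy a slice from p1, fill the rest in p2's relative order."""
    n = len(p1)
    a, b = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[a:b] = p1[a:b]
    fill = [g for g in p2 if g not in child[a:b]]
    j = 0
    for i in range(n):
        if child[i] is None:
            child[i] = fill[j]
            j += 1
    return child

def ga_schedule(generations=60, pop_size=30, seed=1):
    """Evolve a message ordering that minimizes simulated comm. time."""
    rng = random.Random(seed)
    n = len(MESSAGES)
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=comm_time)              # lower time = fitter
        survivors = pop[: pop_size // 2]     # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            c = order_crossover(p1, p2, rng)
            if rng.random() < 0.2:           # swap mutation
                i, j = rng.sample(range(n), 2)
                c[i], c[j] = c[j], c[i]
            children.append(c)
        pop = survivors + children
    best = min(pop, key=comm_time)
    return best, comm_time(best)
```

Because the search runs entirely off-line against the simulated cost, the resulting permutation can be frozen into a static per-CN schedule before the on-line application runs, mirroring the off-line/on-line split described above.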
TABLES

8.1  Objective function results from the best 10 and worst 10 mappings for a 16 CN network and a STAP data cube with 240 range bins, 32 pulses, and 16 channels ..... 121
8.2  Objective function results from the best 20 mappings for a 16 CN network and a STAP data cube with 32 range bins, 32 pulses, and 32 channels ..... 123
8.3  Objective function results from the best 20 mappings for a 16 CN network and a STAP data cube with 64 range bins, 64 pulses, and 64 channels ..... 126
8.4  Objective function results from the best 10 and worst 10 mappings for a 32 CN network and a STAP data cube with 240 range bins, 32 pulses, and 16 elements ..... 128
8.5  Objective function results from the best 20 mappings for a 32 CN network and a STAP data cube with 32 range bins, 32 pulses, and 32 channel elements and 64 range bins, 64 pulses, and 64 channels ..... 131
8.6  Simulated communication time improvements of the GA scenarios depicted in Fig. 8.14 with respect to the best, average, and worst schedules from the initial population ..... 143
8.7  Simulated communication time improvements of the GA scenarios depicted in Fig. 8.15 with respect to the best, average, and worst schedules from the initial population ..... 144
8.8  Simulated communication time improvements of the GA scenarios depicted in Fig. 8.18 with respect to the best, average, and worst schedules from the initial population ..... 149
8.9  Simulated communication time improvements of the GA scenarios depicted in Fig. 8.19 with respect to the best, average, and worst schedules from the initial population ..... 150
8.10 Simulated communication time improvements of the GA scenarios depicted in Figs. 8.20 and 8.21 ..... 152
8.11 Simulated communication time improvements of the GA scenarios depicted in Figs. 8.22 and 8.23 ..... 155
8.12 Simulated communication time improvements of the GA scenarios depicted in Fig. 8.26 with respect to the best, average, and worst schedules from the initial population ..... 160
8.13 Simulated communication time improvements of the GA scenarios depicted in Fig. 8.27 with respect to the best, average, and worst schedules from the initial population ..... 162
8.14 Simulated communication time improvements of the GA scenarios depicted in Fig. 8.28 with respect to the best, average, and worst schedules from the initial population ..... 163
8.15 Simulated communication time improvements of the GA scenarios depicted in Fig. 8.29 with respect to the best, average, and worst schedules from the initial population ..... 164
8.16 Simulated communication time improvements of the GA scenarios depicted in Fig. 8.32 with respect to the best initial schedule, an inherited schedule, a random schedule, and a baseline schedule ..... 168
8.17 Simulated communication time improvements of the GA scenarios depicted in Fig. 8.33 with respect to the best initial schedule, an inherited schedule, a random schedule, and a baseline schedule ..... 169
8.18 Simulated communication time improvements of the GA scenarios depicted in Fig. 8.38 with respect to the best, average, and worst schedules from the initial population ..... 176
8.19 Simulated communication time improvements of the GA scenarios depicted in Fig. 8.38 with respect to the best initial schedule, an inherited schedule, a random schedule, and a baseline schedule ..... 178
8.20 Simulated communication time improvements of the GA scenarios depicted in Fig. 8.43 with respect to the best initial schedule, an inherited schedule, a random schedule, and a baseline schedule ..... 182
8.21 Simulated communication time improvements of the GA scenarios depicted in Fig. 8.44 with respect to the best initial schedule, an inherited schedule, a random schedule, and a baseline schedule ..... 183

FIGURES

Fig. 2.1  The STAP CPI three-dimensional data cube (derived from [18]) ..... 7
Fig. 2.2  Generic space-time adaptive processor (derived from [18]) ..... 7
Fig. 2.3  The RACE® Multicomputer (derived from [4]) ..... 10
Fig. 2.4  The RACEway six-port network crossbar chip (derived from [7]) ..... 10
Fig. 2.5  The RACE® Multicomputer fat-tree interconnection network ..... 13
Fig. 2.6  Illustration of packet transfer between CNs ..... 13
Fig. 2.7  Standard hardware priority arbitration algorithm (derived from [20]) ..... 18
Fig. 2.8  Top-level hardware priority arbitration algorithm (derived from [20]) ..... 18
Fig. 2.9  SHARC compute node (derived from [4]) ..... 20
Fig. 2.10 PowerPC compute node (derived from [4]) ..... 20
Fig. 2.11 Mercury computer configuration of the in-house system ..... 22
Fig. 2.12 Linear pipelined execution model for STAP ..... 22
Fig. 2.13 Illustration showing processor mapping for a STAP application. In (a) a linear execution model is used, and in (b) a staged partitioning model is used ..... 25
Fig. 2.14 Multi-phased paradigm for a STAP application implemented on a parallel embedded system ..... 25
Fig. 2.15 STAP data cube partitioning by sub-cube bars (derived from [13]) ..... 29
Fig. 2.16 Sub-cube bar partitioning prior to pulse compression (derived from [13]) ..... 29
Fig. 2.17 Sub-cube bar partitioning prior to Doppler filtering (derived from [13]) ..... 29
Fig. 2.18 Illustration of the sub-cube bar mapping technique for the case of 12 CNs. The mapping of the sub-cube bars to CNs defines the required data communications. (a) Example illustration of the communication requirements from CN 1 to other CNs (2, 3, and 4) after completion of range processing and prior to Doppler processing. (b) Example illustration of the communication requirements from CN 1 to other CNs (5 and 9) after completion of Doppler processing and prior to adaptive weight processing ..... 30