Cluster of Excellence Cognitive Interaction Technology
Cognitronics and Sensor Systems
Prof. Dr.-Ing. U. Rückert

Heterogeneous Computing Systems for Vision-based Multi-Robot Tracking

Dissertation approved by the Faculty of Technology of Bielefeld University for the attainment of the academic degree of DOKTOR-INGENIEUR (Dr.-Ing.)

by
M.Eng. Arif Irwansyah

Reviewer: Prof. Dr.-Ing. Ulrich Rückert
Co-reviewer: Prof. Dr.-Ing. Franz Kummert
Date of oral examination: 12.09.2017

Bielefeld, September 2017

Acknowledgments

Firstly, I would like to express my profound gratitude to Prof. Dr.-Ing. Ulrich Rückert and Dr.-Ing. Mario Porrmann for their invaluable support, motivation, and encouragement throughout my doctoral research. I have learned a great deal and gained much experience from them during my work in the Cognitronics and Sensor Systems Group. I would also like to thank Jens Hagemeyer for his great support and help whenever I encountered technical and scientific difficulties. I thank Cordula Heidbrede for her great support and help in all the administrative and non-technical matters I needed. I thank Omar W Ibraheem for his brotherhood, companionship, and support during my journey in Bielefeld. The research collaboration we carried out together significantly accelerated my work.

I would also like to thank Prof. Dr.-Ing. Franz Kummert, who agreed to review my thesis. Additionally, many thanks to the members of the examination committee, Prof. Dr. Elisabetta Chicca and Dr. rer. nat. Thies Pfeiffer, for their participation as the chair and the examiner, respectively.

I am also highly thankful to the past and current members of our research group. I am especially thankful to my colleagues Andry Tanoto, René Zorn, Muhammad Shahzad, and Meysam Peykanu. My sincerest appreciation goes out to all those who have contributed directly and indirectly to the completion of this thesis.

As ever, I am especially indebted to my parents, Mrs. Mardiyah, Mrs. Sa’diyah, and Mr. Djauhari, for their love and support throughout my life. Equally, I am highly thankful to my wife’s parents, Mr. Rubiyanto and Mrs. Susiyati, for their support and prayers. I am truly grateful to all of you.
Special thanks to my beloved wife, Siti Rokhmawati, for all your love and support. You have shown unrelenting care and support throughout this challenging endeavor. I am also thankful to my precious children Dayyinah, Tsabita, and Hafidz. Finally, Alhamdulillah, all praise to Him for His guidance and for allowing me to achieve all these amazing things.

Bielefeld, September 2017
Arif Irwansyah

Abstract

Vision-based robot tracking is commonly used for monitoring and debugging in single- and multi-robot environments. Currently, most established vision-based multi-robot tracking systems are based on implementations on a general-purpose central processing unit (CPU) in the computer. These solutions are not feasible for use cases with large frame sizes, multiple cameras, and a large number of robots to be tracked. The most common way to handle an increasing number of cameras and robots is to add extra computers. As an alternative, hardware accelerators such as field programmable gate arrays (FPGAs) and general-purpose graphics processing units (GPUs) can relieve the host computer of computation-intensive tasks like vision processing through their high inherent parallelism. FPGAs and GPUs offer different approaches to maximizing the performance of a computing system. An FPGA is an integrated circuit (IC) designed to be reconfigurable in hardware after manufacturing. It provides purpose-built hardware that can be tailored to specific algorithms of the user's application to obtain higher computing performance. Meanwhile, the advantage of the GPU as an accelerator lies in its architecture, which consists of a large number of lightweight cores and applies a single instruction, multiple threads (SIMT) model for executing programs. This thesis focuses on the implementation of two distinct heterogeneous computing systems for a vision-based multi-robot tracking application, using FPGAs and GPUs as hardware accelerators. It aims to determine which architecture offers the optimal solution in terms of detection performance, computing performance, and power efficiency.
The proposed heterogeneous computing systems combine the advantages of a CPU with the benefits of an FPGA or a GPU. The designs aim to efficiently handle computationally intensive vision-based multi-robot tracking algorithms. The FPGA and GPU are used as hardware accelerators, processing the computationally intensive portion of the algorithm that detects the robots' locations. Meanwhile, the CPU in the host PC is used for post-processing and display. In the FPGA-based accelerated computing system, a complete design for detecting each robot's location is implemented, comprising a multi-camera frame grabber and IP cores for object segmentation, edge filtering, and circle detection. The number of cameras used in the proposed design is scalable. This design provides three basic configurations, which differ in the number of streaming hardware accelerators and in the parallelism of the implementation. Additionally, two unique architectures for FPGA-based circle detection for multi-robot tracking are proposed and implemented: one combines the circular Hough transform (CHT) with a graph cluster algorithm, and the other combines the circle scanning window (CSW) technique with a graph cluster algorithm. Regarding the use of the GPU as a hardware accelerator, the proposed GPU-based computing system is designed to improve computational performance by exploiting the GPU's architecture, particularly its thousands of lightweight processing cores. The algorithm's implementation on the GPU includes object segmentation (debayering, RGB-to-HSV color conversion, and color masking), edge filtering, and circle detection (CHT and CSW). The FPGA/GPU performs the computationally intensive tasks on the full-resolution image (a maximum of 2048 × 2048 pixels), while the CPU executes the post-processing algorithm on small sub-images (40 × 40 pixels). To obtain the robots' orientations and IDs, the multi-core architecture of the CPU is exploited to process all sub-images in a multi-threaded approach.
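To illustrate two of the stages named above, color masking after an RGB-to-HSV conversion and circle detection with the circular Hough transform (CHT), the following is a minimal Python sketch under stated assumptions. It is not the FPGA or GPU implementation of this thesis: the function names, the image layout (rows of 8-bit (r, g, b) tuples), the hue/saturation/value thresholds, the simple 4-neighbour boundary test standing in for the edge filter, and the single known robot radius are all hypothetical choices for illustration. The three functions would be chained in the order mask, boundary, CHT.

```python
import colorsys
import math
from collections import Counter


def hsv_mask(image, hue_lo, hue_hi, s_min, v_min):
    """Color masking (hypothetical thresholds): 1 where hue in degrees lies in
    [hue_lo, hue_hi] and saturation/value reach s_min/v_min, else 0.
    `image` is assumed to be a list of rows of 8-bit (r, g, b) tuples."""
    mask = []
    for row in image:
        out_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            hue_deg = h * 360.0
            inside = hue_lo <= hue_deg <= hue_hi and s >= s_min and v >= v_min
            out_row.append(1 if inside else 0)
        mask.append(out_row)
    return mask


def boundary_points(mask):
    """Simplified edge filter (an assumption, not the thesis' filter): a foreground
    pixel is an edge if any 4-neighbour is background or outside the image."""
    h, w = len(mask), len(mask[0])
    points = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if not (0 <= nx < w and 0 <= ny < h) or not mask[ny][nx]:
                    points.append((x, y))
                    break
    return points


def circular_hough(edge_points, radius, width, height, n_angles=360):
    """CHT for one known radius: every edge point casts one vote for each candidate
    centre cell lying on a circle of `radius` around it; the strongest cell wins."""
    acc = Counter()
    for x, y in edge_points:
        voted = set()  # so an edge point votes at most once per accumulator cell
        for k in range(n_angles):
            theta = 2.0 * math.pi * k / n_angles
            a = round(x - radius * math.cos(theta))
            b = round(y - radius * math.sin(theta))
            if 0 <= a < width and 0 <= b < height and (a, b) not in voted:
                voted.add((a, b))
                acc[(a, b)] += 1
    (cx, cy), votes = acc.most_common(1)[0]
    return cx, cy, votes


if __name__ == "__main__":
    # Synthetic edge points of a circle with radius 10 px centred at (20, 25).
    edges = sorted({
        (round(20 + 10 * math.cos(math.radians(d))), round(25 + 10 * math.sin(math.radians(d))))
        for d in range(360)
    })
    cx, cy, votes = circular_hough(edges, radius=10, width=40, height=40)
    print(cx, cy, votes)  # the peak lies at the true centre, 20 25
```

Both stages work independently per pixel or per edge point, which is why they lend themselves to the parallel FPGA and GPU implementations considered here; in a parallel version, the accumulator update in the CHT is the step that typically needs care (for example, atomic additions on a GPU).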
The results of this thesis show that the FPGA- and GPU-based hardware accelerators greatly enhance the computational performance of the computing system for vision-based multi-robot tracking. The maximum frame rate of the FPGA implementation is obtained by using four streaming hardware accelerators working in parallel. Meanwhile, the high performance of the GPU implementation is achieved by exploiting its many cores. According to the experiments, both the FPGA-based and the GPU-based designs are highly accurate: the design and its algorithm localize multiple robots with a typical detection performance (precision and recall) of 99 %. Additionally, both the FPGA and GPU hardware accelerators offer higher power efficiency than the CPU, increasing the computational performance per watt of the computing system. Finally, quantitative and qualitative parameters (e.g., computational performance, power consumption, power efficiency, and development time) are analyzed in detail to determine which technology is more suitable for the vision-based multi-robot tracking application.

Contents

1 Introduction
1.1 Contributions
1.2 Thesis Organization
2 Vision-based Robot Tracking Computing System
2.1 Basic Concept of Vision-based Robot Tracking System
2.2 Related Work
2.2.1 CPU-based Computing System
2.2.2 FPGA Accelerated Computing System
2.2.3 GPU Accelerated Computing System
2.2.4 FPGA-GPU Accelerated Computing System
2.3 Hardware Accelerators in Vision Processing
2.3.1 Multi-core CPUs
2.3.2 Graphic Processing Unit (GPU)
2.3.3 Field Programmable Gate Arrays (FPGAs)
2.4 Summary
3 Vision-based Multi-Robot Tracking with Heterogeneous Computing Systems
3.1 Heterogeneous Computing System
3.2 Architecture and Design Flow
3.2.1 FPGA-CPU Heterogeneous Computing System
3.2.2 GPU-CPU Heterogeneous Computing System
3.3 Vision-based Multi-Robot Tracking Algorithm
3.3.1 Segmentation
3.3.2 Robot Detection
3.3.3 Post-processing
3.4 Summary
4 Implementation in FPGA-accelerated Heterogeneous Computing System
4.1 FPGA-CPU Hardware Environment Description
4.2 Algorithm Implementation
4.3 Vision Processing Module Implementation in FPGAs
4.3.1 Multi-Camera GigE Vision Frame Grabber Module
4.3.2 Object Segmentation
4.3.3 Edge Filter Module
4.3.4 Object Localization
4.4 Resource Utilization
4.5 Post Processing in Host PC
4.6 Summary
5 Implementation in GPU-accelerated Heterogeneous Computing System
5.1 GPU-CPU Hardware Environment Description
5.2 Algorithm Implementation
5.3 CUDA Kernels Implementation
5.3.1 Object Segmentation
5.3.2 Edge Filter
5.3.3 Object Localization
5.4 Achieved Occupancy
5.5 Post Processing in Host PC
5.6 Summary
6 Results and Analysis
6.1 Detection Performance
6.1.1 FPGA implementation
6.1.2 GPU implementation
6.1.3 Comparisons
6.2 Computing Performance
6.2.1 FPGA implementation
6.2.2 GPU implementation
6.2.3 Comparisons
6.2.4 Power Efficiency Evaluation
6.3 Analysis
6.4 Summary
7 Conclusions and Outlook
7.1 Conclusions
7.2 Outlook
List of Figures
List of Tables
Abbreviations
References
Author's Publications