ebook img

Overcoming Hard-Faults in High-Performance Microprocessors PDF

169 Pages·2008·3.6 MB·English
by  
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Overcoming Hard-Faults in High-Performance Microprocessors

Overcoming Hard-Faults in High-Performance Microprocessors by AminAnsari A dissertationsubmittedin partialfulfillment oftherequirementsforthedegreeof DoctorofPhilosophy (ComputerScience and Engineering) in TheUniversityofMichigan 2011 DoctoralCommittee: AssociateProfessorScott Mahlke,Chair ProfessorToddM. Austin AssistantProfessorThomasF. Wenisch AssistantProfessorZhengyaZhang © AminAnsari 2011 AllRightsReserved Tomyfamily ii ACKNOWLEDGEMENTS Firstofall,Iwouldliketoexpressmygratitudetomyadvisor,ProfessorScottMahlke, forhissupportandmentorshipinthesepastyears. Anenthusiasticresearcherandaconstant source of ideas, Scott was a great advisor. I also owe thanks to the remaining members of my dissertation committee, Professor Austin, Professor Wenisch, and Professor Zhang. Theyall donatedtheirtimetohelpshapethisresearch intowhatit hasbecometoday. I am also indebted to my reliability colleagues, Shantanu Gupta and Shuguang Feng. It has been a pleasure working with them, and I cannot imagine having this thesis in its current form without their support. Shantanu spent a lot of time discussing research ideas with me and helping me to gain a better grasp of research fundamentals. Shuguang has helped with methroughoutthis process, sittingthrough ourlong meetings with Scott, giv- ing excellentresearch insights,and always beingthere forany help withwritingand proof reading papers. During my stay in Michigan, I was lucky to work with an amazing set of people in our research lab, Compilers Creating Custom Processors (CCCP). I would like to thank Gaurav Chadha, Hyoun Kyu Cho, Kevin Fan, Shuguang Feng, Shantanu Gupta, Jeff Hao, Amir Hormati, Po-Chun Hsu, Anousheh Jamshidi, Manjunath Kudlur, Yuan Lin, Andrew Lukefahr,MojtabaMehrara,HyunchulPark,YongjunPark,MehrzadSamadi,AnkitSethia, iii MarkWoh,andGriffinWright. Youfolksmadecomingtotheofficemorefun,andIwould have not made it through without you. I have also learned a great deal about different cultures, beliefs, and cuisines through our countless discussions in the lab. I also want to thank my world-class ping pong buddy, Gaurav Chadha for late-night, smash-intensive games. Most importantly, my family deserves major gratitude. My parents and my sister pro- vided their unconditional love and support. I feel blessed to have a family who has given me all the opportunities to succeed in life. My dad’s dedication and academic excellence have been a constant source of inspiration to me throughout my whole life. His endless knowledgeandcourageousgrandvisionsstillamazeme. Myuncleandhiswifehavemade my occasional California vacations truly memorable. Finally, the greatest of thanks go to mycousins,uncles,aunts,and grandparents whoenriched momentsofmylife. iv TABLE OF CONTENTS DEDICATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii ACKNOWLEDGEMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii LIST OFFIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii LIST OFTABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv CHAPTER I. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1 ReliabilityThreats inDeep SubmicronTechnologies . . . . . . . . 1 1.1.1 Process Variation . . . . . . . . . . . . . . . . . . . . . 3 1.1.2 Manufacturingdefects . . . . . . . . . . . . . . . . . . 3 1.1.3 Wearout . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.1.4 PowerConsumption . . . . . . . . . . . . . . . . . . . 5 1.2 OvercomingHard-FaultsinHigh-Performance Microprocessors . . 6 1.2.1 ChallengeswithHigh-Performance Microprocessors . . 7 1.2.2 ProtectingOn-Chip Caches . . . . . . . . . . . . . . . . 9 1.2.3 ProtectingNon-Cache PartsoftheCore . . . . . . . . . 11 1.3 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 1.4 Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 II. RelatedWork . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2.1 Fault-TolerantCache Techniques . . . . . . . . . . . . . . . . . . 16 2.1.1 CodingSolutions . . . . . . . . . . . . . . . . . . . . . 16 2.1.2 Circuit-Leveland VLSISolutions . . . . . . . . . . . . 17 2.1.3 ArchitecturalSolutions . . . . . . . . . . . . . . . . . . 18 2.2 Low-PowerCacheTechniques . . . . . . . . . . . . . . . . . . . 19 2.2.1 ConventionalLow-PowerCache Techniques . . . . . . 19 v 2.2.2 LoweringPowerbyToleratingFailuresinOn-ChipCaches 19 2.2.3 AlternativeSRAM Cells . . . . . . . . . . . . . . . . . 20 2.3 HandlingHard-Faultsin theNon-Cache Parts oftheCore . . . . . 21 2.3.1 Coarse-Grained Redundancy and Disabling . . . . . . . 21 2.3.2 Fine-Grained Redundancyand Disabling . . . . . . . . 21 2.3.3 UnconventionalApproaches . . . . . . . . . . . . . . . 22 III. Armoring CacheArchitectures inHighDefectDensity Technologies . . 24 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 3.2 ZerehCache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 3.2.1 ZCArchitecture . . . . . . . . . . . . . . . . . . . . . 28 3.2.2 Hard-FaultDetection . . . . . . . . . . . . . . . . . . . 35 3.2.3 ZCConfiguration . . . . . . . . . . . . . . . . . . . . . 36 3.3 DesignSpace Exploration . . . . . . . . . . . . . . . . . . . . . . 41 3.4 YieldAnalysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 3.5 Wearout Tolerance . . . . . . . . . . . . . . . . . . . . . . . . . . 52 3.6 Comparisonand Discussion . . . . . . . . . . . . . . . . . . . . . 55 3.6.1 ComparisonwithConventionalTechniques . . . . . . . 55 3.6.2 ComparisonwithRecently Proposed Techniques . . . . 57 3.6.3 Significance . . . . . . . . . . . . . . . . . . . . . . . . 59 3.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 IV. A Polymorphic Cache Design for Enabling Robust Near-Threshold Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 4.2 Archipelago . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 4.2.1 BaselineAPArchitecture . . . . . . . . . . . . . . . . . 67 4.2.2 APwithRelaxed GroupFormation . . . . . . . . . . . 71 4.2.3 APConfiguration . . . . . . . . . . . . . . . . . . . . . 75 4.3 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 4.3.1 Methodology . . . . . . . . . . . . . . . . . . . . . . . 81 4.3.2 DesignSpace Exploration . . . . . . . . . . . . . . . . 83 4.3.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . 86 4.4 QuantitativeComparisontoAlternativeMethods . . . . . . . . . . 90 4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 V. Enhancing System Throughput by Animating DeadCores . . . . . . . 93 5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 5.2 Utilityofan Undead Core . . . . . . . . . . . . . . . . . . . . . . 95 5.2.1 Effect ofHard-Faults onProgram Execution. . . . . . . 95 5.2.2 RelaxingCorrectness Constraints . . . . . . . . . . . . 97 5.2.3 OpportunitiesforAcceleration . . . . . . . . . . . . . . 98 vi 5.3 From TraditionalCouplingtoAnimation . . . . . . . . . . . . . . 100 5.4 NM Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 5.4.1 High-LevelNMSystem Description . . . . . . . . . . . 103 5.4.2 HintGatheringand Distribution . . . . . . . . . . . . . 106 5.4.3 ReducingCommunicationOverheads . . . . . . . . . . 109 5.4.4 HintDisablingMechanisms . . . . . . . . . . . . . . . 110 5.4.5 Resynchronization . . . . . . . . . . . . . . . . . . . . 113 5.4.6 NMDesignforCMP Systems . . . . . . . . . . . . . . 114 5.5 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116 5.5.1 ExperimentalMethodology . . . . . . . . . . . . . . . 116 5.5.2 ExperimentalResults . . . . . . . . . . . . . . . . . . . 119 5.6 ThroughputEnhancement . . . . . . . . . . . . . . . . . . . . . . 126 5.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129 VI. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130 BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134 vii LIST OF FIGURES Figure 3.1 Probabilityofhavingat leastonefaultySRAM cellat differentgranular- itieswhilevaryingthefailureprobabilityofeach SRAM cell,P . . . . . 25 F 3.2 Fractionofnon-functionalSRAMbit-cellsfora2MBL2cacheovertime. Here, the mean time to failure of each SRAM bit-cell is varied from 50 to 200years. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.3 Twosimplescenariosinwhichthelineswappingcanpreservethecorrect functionality of the cache by resolving the occurred collision. A black box showsafaulty chunkofdata. . . . . . . . . . . . . . . . . . . . . . . 29 3.4 Thehigh-levelarchitectureoftheZCisshowninthisfigureandtheextra modules that are added to the baseline cache are highlighted. Note that the slices of the base address are shown using numbers 1, 2 and 3 (Ad- dressFormat). Thefaultmaparrayandsparecachehavetheirownshared decoder to avoid getting their word-line activationsignals from the main cache’s decoder. For simplicity, the separate sense amps for the fault map and spare cache are not shown. Built-in-self-test (BIST) module is commonlyused forfault diagnosisintheembeddedmemorystructures. . 30 3.5 A Benes network is shown which connects the second rows of the four consecutive logical group of rows in the main cache. As an example, a singleroutefromthedecoderto theword-linesis alsoshown. . . . . . . . 32 3.6 Mapping between the graph coloring problem and the defect pattern in the main/spare caches. The solid edges stand for the intrinsic conflicts between the word-lines. The dotted edges correspond to the word-line conflictsduetothedefectpattern. An“X”indicatesacollisionusingade- faultgrouping. Numberswritteninthefaultmapindicatethecorrespond- ing cache word-lines to which the spare units are assigned. (G=Green, B=Blue, P=Purple, O=Orange) . . . . . . . . . . . . . . . . . . . . . . . 37 viii 3.7 Proper configuration of two BNs that transform the actual cache layout (left) to the virtual one (right) for the given coloring assignment. The upper(lower)BNconnectsthefirst(second)rowsofthe4logicalgroups. The darker 2-input MUXes are configured to output their lower input while the lighter MUXes output their upper input. (G=Green, B=Blue, P=Purple, O=Orange) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 3.8 The run-time of the IBSC graph coloring solver in ms for different edge densities and number of nodes in the graph. In this figure, p is the edge densitywhich is defined as theprobabilityofhaving an edge between an arbitrary pairofnodesin arandomgraph G(n,p). . . . . . . . . . . . . . 43 3.9 P of L2 ZC for different P while fixing two parameters and allowing op F thethirdonetovary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 3.10 Area, power, and energy overhead of the potential L1/L2 ZCs which are stated inpercentage. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 3.11 Distribution of generated chips by the number of faulty SRAM cells in their L1/L2 caches. A population of 1000 chips is generated by consid- ering the large-area clustering effect, intra-die, inter-die, systematic, and parametricvariations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 3.12 Results of Monte Carlo lifetime simulation which show the probability of operation for L1/L2 caches protected by different mechanisms. In ad- dition, the shaded region showsthe expected numberof failures overthe life-time. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 3.13 Area overhead of the different protection mechanisms for tolerating a given P . In this figure, Row-Redun stands for the row redundancy pro- F tection scheme. ECC and ECC-2 are the 1-bit and 2-bit error correction schemes, respectively. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 4.1 BiterrorrateforanSRAMcellwithvaryingV valuesin90nm. Forthis dd technology, the write-margin is the dominant factor and limits the oper- ational voltage of the SRAM structure. Here, the Y-axis is logarithmic, highlightingtheextremelyfastgrowthinfailureratewithdecreasingV . dd The two horizontal dotted-lines mark the failure rates at which the men- tionedSRAM structures(64KBand2MB)canoperatewithatleast99% manufacturingyield. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 ix

Description:
with me and helping me to gain a better grasp of research fundamentals. Shuguang has helped with me throughout this 19. 2.2.1 ConventionalLow- PowerCacheTechniques 19 v . functionality of the cache by resolving the occurred collision. A black . and, is therefore very expensive to repair. Two
See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.