NAVAL POSTGRADUATE SCHOOL
MONTEREY, CALIFORNIA

THESIS

DEVELOPMENT OF A NEW PREDICTION ALGORITHM AND A SIMULATOR FOR THE PREDICTIVE READ CACHE (PRC)

by

F. Nadir Altmisdort

September 1996

Thesis Advisor: Douglas J. Fouts

Thesis A4275

Approved for public release; distribution is unlimited.

DUDLEY KNOX LIBRARY
NAVAL POSTGRADUATE SCHOOL
MONTEREY CA 93943-5101

REPORT DOCUMENTATION PAGE
Form Approved OMB No. 0704-0188

2. REPORT DATE: September 1996
3. REPORT TYPE AND DATES COVERED: Master's Thesis
4. TITLE AND SUBTITLE: DEVELOPMENT OF A NEW PREDICTION ALGORITHM AND A SIMULATOR FOR THE PREDICTIVE READ CACHE (PRC)
6. AUTHOR(S): F. Nadir Altmisdort
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES): Naval Postgraduate School, Monterey, CA 93943-5000
11. SUPPLEMENTARY NOTES: The views expressed in this thesis are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government.
12a. DISTRIBUTION/AVAILABILITY STATEMENT: Approved for public release; distribution is unlimited.
14. SUBJECT TERMS: Cache, Predictive Read Cache, PRC, Memory, Address Traces, Simulator
15. NUMBER OF PAGES: 146
17. SECURITY CLASSIFICATION OF REPORT: Unclassified
18. SECURITY CLASSIFICATION OF THIS PAGE: Unclassified
19. SECURITY CLASSIFICATION OF ABSTRACT: Unclassified
20. LIMITATION OF ABSTRACT: UL

NSN 7540-01-280-5500    Standard Form 298 (Rev. 2-89), Prescribed by ANSI Std. 239-18, 298-102

Approved for public release; distribution is unlimited.
DEVELOPMENT OF A NEW PREDICTION ALGORITHM AND A SIMULATOR FOR THE PREDICTIVE READ CACHE (PRC)

F. Nadir Altmisdort
Ltjg., Turkish Navy
B.S., Turkish Naval Academy, 1990

Submitted in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE IN ELECTRICAL ENGINEERING

from the

NAVAL POSTGRADUATE SCHOOL
September 1996

ABSTRACT

Efforts to bridge the cycle-time gap between high-end microprocessors and low-speed main memories have led to a hierarchical approach in memory subsystem design. The predictive read cache (PRC) has been developed as an alternative way to overcome the speed discrepancy without incurring the hardware cost of a second-level cache. Although the PRC can provide an improvement over a memory hierarchy using only a first-level cache, previous studies have shown that its performance is degraded due to the poor locality of reference caused by program branches, subroutine calls, and context switches.

This thesis develops a new prediction algorithm that allows the PRC to track the miss patterns of the first-level cache, even with programs exhibiting poor locality. It presents PRC design alternatives and hardware cost estimates for the implementation of the new algorithm. The architectural support needed from the underlying microprocessor is also discussed.

The second part of the thesis involves the development of a memory hierarchy simulator and an address-trace conversion program to perform trace-driven simulations of the PRC. Using address traces captured from a SPARC-based computer system, the simulations show that the new prediction algorithm provides a significant improvement in the PRC performance. This makes the PRC ideal for embedded systems in space-based, weapons-based, and portable/mobile computing applications.

TABLE OF CONTENTS

I. INTRODUCTION
   A. MEMORY HIERARCHY
   B. CACHE THEORY
   C. THE PREDICTIVE READ CACHE
   D. OUTLINE OF THESIS
II. DESIGN OF A NEW PREDICTIVE READ CACHE
   A. A NEW PREDICTION ALGORITHM
   B. ARCHITECTURAL SUPPORT
   C. DESIGN ALTERNATIVES
      1. Direct-Mapped PRC Design
      2. Set-Associative PRC Design
      3. Fully-Associative PRC Design
III. HARDWARE COST ESTIMATES
IV. ADDRESS TRACE CONVERSION
   A. TRACE-DRIVEN CACHE SIMULATIONS
   B. ADDRESS TRACES
   C. SOFTWARE TOOL REQUIREMENTS
      1. Trace Converter (Tracer)
      2. BACH Address Trace Editor (BATE)
   D. USING CONVERSION TOOLS
V. CACHE AND PRC SIMULATOR
   A. INTRODUCTION
   B. SOFTWARE ARCHITECTURE
   C. OPERATIONAL DETAILS
      1. Configuration
      2. Simulation
      3. Evaluation
   D. INTEGRATED DEBUGGER
VI. SIMULATION RESULTS AND ANALYSIS
   A. ASSUMPTIONS
   B. CONSTANT PARAMETERS
      1. First-Level Cache Parameters
      2. PRC Parameters
      3. Transaction Priorities
      4. Buffer Module Parameters
      5. Main Memory Parameters
   C. SIMULATION RESULTS
      1. Second-Level Cache Simulations
      2. 4-Way Set-Associative PRC Simulations
      3. Fully-Associative PRC Simulations
   D. COST/PERFORMANCE
VII. CONCLUSION
   A. SUMMARY
   B. RECOMMENDATIONS