Very long instruction word architectures for digital signal processing

182 Pages·1997·6.7 MB·English

Preview Very long instruction word architectures for digital signal processing

VERY LONG INSTRUCTION WORD ARCHITECTURES FOR DIGITAL SIGNAL PROCESSING

By JONATHON D. MELLOTT

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA
1997

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
ABSTRACT

CHAPTERS

1 INTRODUCTION
  1.1 Comparison of General Purpose Processors Versus DSP Processors
  1.2 Motivation for VLIW Insertion in Digital Signal Processors
    1.2.1 Characteristics of digital signal processing algorithms
    1.2.2 Architectural resources for digital signal processing
    1.2.3 Techniques for exploiting instruction level parallelism
    1.2.4 VLIW for digital signal processing
  1.3 Research Activities

2 INTRODUCTION TO THE RESIDUE NUMBER SYSTEM
  2.1 The Chinese Remainder Theorem
  2.2 Complex Residue Number System
  2.3 Quadratic Residue Number System
  2.4 Galois-Enhanced Quadratic Residue Number System
  2.5 Logarithmic Residue Number System
  2.6 Previous Work in the RNS and Conclusions

3 THE ATHENA SENSOR ARITHMETIC PROCESSOR
  3.1 Test Chip
  3.2 Detailed Architecture Description
    3.2.1 Synchronous static RAM
    3.2.2 Data switch
    3.2.3 Command and configuration register
    3.2.4 LRNS correlator processor
    3.2.5 LRNS processor element
  3.3 Execution of Basic Algorithms
    3.3.1 Initialization
    3.3.2 Basic vector operations
    3.3.3 Convolution
  3.4 ASAP Test Fixture
  3.5 ASAP Testing
  3.6 Summary

4 VERY LONG INSTRUCTION WORD DIGITAL SIGNAL PROCESSORS
  4.1 VLIW Processor Overview
  4.2 VLIW Processor Functional Units
    4.2.1 Instruction fetch and decode unit
    4.2.2 Address arithmetic unit
    4.2.3 Conventional arithmetic unit
    4.2.4 Residue arithmetic units
  4.3 On-Chip Memories

5 VERY LONG INSTRUCTION WORD COMPILER TECHNOLOGY
  5.1 Introduction
  5.2 The Cdsp Programming Language
    5.2.1 Motivation
    5.2.2 Differences between C and Cdsp
    5.2.3 Results
  5.3 Algorithm Analysis
    5.3.1 Convolution and the finite impulse response filter
    5.3.2 Discrete Fourier transform
    5.3.3 QR decomposition
    5.3.4 Results

6 CONCLUSIONS
  6.1 Summary
  6.2 Contributions
  6.3 Future Work

APPENDICES

A Cdsp LANGUAGE REFERENCE
  A.1 Introduction
  A.2 Notation
  A.3 Lexical Elements
    A.3.1 Character set
    A.3.2 Abstract literals
    A.3.3 Comments
    A.3.4 Identifiers
    A.3.5 Reserved words
  A.4 Translation Unit
    A.4.1 Function definitions
    A.4.2 External object definitions
  A.5 Conversions
  A.6 Expressions
    A.6.1 Primary expressions
    A.6.2 Postfix operators
    A.6.3 Unary operators
    A.6.4 Cast operators
    A.6.5 Convolution and sum of products operators
    A.6.6 Multiplicative operators
    A.6.7 Additive operators
    A.6.8 Bitwise shift operators
    A.6.9 Relational operators
    A.6.10 Equality operators
    A.6.11 Bitwise AND operator
    A.6.12 Bitwise exclusive OR operator
    A.6.13 Bitwise inclusive OR operator
    A.6.14 Logical AND operator
    A.6.15 Logical OR operator
    A.6.16 Conditional operator
    A.6.17 Assignment operators
    A.6.18 Comma operator
  A.7 Constant Expressions
  A.8 Declarations
    A.8.1 Storage-class specifiers
    A.8.2 Type specifiers
    A.8.3 Type qualifiers
    A.8.4 Declarators
    A.8.5 Type names
    A.8.6 Type definitions
    A.8.7 Initialization
  A.9 Statements
    A.9.1 Labeled statements
    A.9.2 Compound statements
    A.9.3 Expression statements
    A.9.4 Selection statements
    A.9.5 Iteration statements
    A.9.6 Jump statements

B M-FILES
  B.1 DFT Code
    B.1.1 rpdft.m
    B.1.2 gtdft.m
  B.2 CRT Code
    B.2.1 crtconf.m
    B.2.2 gen.m
    B.2.3 crt.m

C TYPOGRAPHICAL NOTES

REFERENCES
BIOGRAPHICAL SKETCH

LIST OF FIGURES

1.1 Transistor Densities per Chip Trends for Memories and Microprocessors
2.1 Block Diagram of a GEQRNS Multiplier
2.2 Block Diagram of an LRNS Multiplier-Accumulator
2.3 Photograph of Gauss Machine Single Channel, Quad Processor Card
2.4 Illustration of (a) Pad Quantity to Area Ratio Management Options and (b) Impact of Process Improvements on Pad Quantity to Area Ratio
3.1 Block Diagram of ASAP Architecture
3.2 Annotated Die Photograph of the ASAP Device
3.3 Pinout of the Test Chip
3.4 ASAP Test Chip in Test Fixture
3.5 Block Diagram of Modular Multiplier/Adder/Accumulator Arithmetic Element
3.6 Block Diagram of Synchronous SRAM
3.7 Data Switch Block Diagram
3.8 LRNS Correlator Processor
3.9 Simplified Block Diagram of Modular Multiplier/Adder/Accumulator Arithmetic Element
3.10 Annotated Die Photograph of LRNS Processor Element
3.11 Pipeline Operation for Vector Multiplication
3.12 Pipeline Operation of Vector Addition
3.13 Pipeline Operation of Vector Accumulate
3.14 Pipeline Operation of Multiply-Accumulate Operation
3.15 Pipeline Operation of Linear Convolution Operation for M = N = 3
3.16 Pipeline Operation for Circular Convolution
3.17 Block Diagram of ASAP Test Fixture
3.18 Photograph of ASAP Test Fixture with Device Under Test
4.1 VLIW Machine Architecture Block Diagram
4.2 Example of VLIW Instruction Compaction
4.3 Block Diagram of an Address Arithmetic Unit
4.4 Extended RNS MAC Architecture
4.5 Next Generation Vector Unit
5.1 Cdsp Source for Convolution Sum
5.2 Data Distribution and Flow for Two Processor Convolution Sum
5.3 Data Distribution and Flow for Two Processor Convolution Sum Using Interleaved Data
5.4 Data Distribution for an L Processor Convolution Sum
5.5 Data Distribution for an L Processor Convolution Sum Using Interleaved Data
5.6 VLIW Filter Speedup Versus Filter Order and Number of Processors, Best Case
5.7 VLIW Filter Speedup Versus Filter Order and Number of Processors, Worst Case
5.8 Group of Processor Elements with Three-Level Hierarchical Processor/Memory Switching
5.9 VLIW Filter Speedup Versus Filter Order and Number of Processors Using NUMA Interconnect with (a) G = 4, and (b) G = 8
5.10 Good-Thomas FFT Permutation Maps for M = 3 × 5 = 15
5.11 Good-Thomas FFT Input/Output Sequence Permutation for M = 15 Computation
5.12 Cdsp Function for an N = 15 Good-Thomas FFT
5.13 Rader Prime DFT Circular Convolution Engine, p = 17
5.14 Cdsp Implementation of a p = 5 Rader Prime DFT
5.15 Cdsp Function for QR Decomposition
5.16 Diagram of Execution Timing and Exploitable Block Level Parallelism for Householder QR Decomposition
A.1 Semantics of index Attributes
A.2 Control Flow For the if and if-else Statements
A.3 Control Flow For the while Statement
A.4 Control Flow For the do-while Statement
A.5 Control Flow For the for Statement
A.6 Control Flow For the dopar Statement

LIST OF TABLES

3.1 ASAP Test Chip Pin Descriptions
3.2 Synchronous SRAM Command Effects
3.3 Command Register Map
3.4 Correlator Data I/O and Control Signals
3.5 LRNS Control Signals and Operations
3.6 LRNS Processor Initialization Inputs
3.7 Processor Initialization Sequence
3.8 Vector Multiplication Procedure
3.9 Vector Addition Procedure
3.10 Vector Accumulate Procedure
3.11 Vector Multiply-Accumulate Procedure
3.12 Linear Convolution for M = N = 3
3.13 Linear Convolution Procedure for N = 3
3.14 Circular Convolution for N = 3
3.15 Actual Dataflow for Circular Convolution for N = 3
3.16 Circular Convolution Procedure for N = 3
3.17 Pattern Generator Pod Mapping
3.18 Command Signals to LSAD1 Pod Mapping
3.19 Estimated Performance of LRNS MAC Cell in MOSIS Technologies, Where Available
3.20 Estimated Performance of an LRNS Array of Thirty-Two Bit MACs on a 1 cm² Die for Real and Complex Arithmetic
4.1 Addressing Modes Supported by Address Arithmetic Unit
5.1 Product of All Combinations of Two or More Primes in {2, 3, 5, 7, 11, 13}
A.1 The Cdsp Character Set
A.2 Regular Expressions for Integral Literals
A.3 Escape Sequences for Character and String Literals
A.4 Regular Expression for Floating-Point and Fixed-Point Formats
A.5 Cdsp Reserved Words
A.6 Direction of Automatic Type Conversions
A.7 Compound Assignment Operations and Equivalent Assignments
