ALGEBRAIC AND NUMBER THEORETIC COMPUTING: ADVANCES AND
APPLICATIONS IN VLSI SIGNAL PROCESSING
By
GLENN S. ZELNIKER
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE
UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE
REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
1991
ACKNOWLEDGMENTS
First, I would like to thank my advisor and committee chair, Dr. Fred Taylor.
Under his direction, I was given the freedom and autonomy to pursue whatever areas
of research I found interesting. He also provided me with financial support, showed
me the way in the academic world, and taught me the virtues of practicality.
I am indebted to Dr. R. E. Kalman for his two years of support at the Center for
Mathematical System Theory. I would also like to thank my committee members,
Dr. H. Lam, Dr. K. Sigmon, Dr. J. C. Principe, and Dr. D. Wilson. Special thanks
go to Dr. Principe for many stimulating conversations about spectral estimation and
adaptive filtering and to Dr. Wilson and Dr. Sigmon, who instilled in me a longing
to be a mathematician.
Monica Murphy at The Athena Group deserves special mention for financing much
of the later work in this dissertation.
My girlfriend, Patricia, has made this past year one of incredible growth, both
intellectual and personal. To her I am grateful for her companionship, support, and
all of the wonderful things we have done together.
Finally, but most importantly, I must thank my family. They have given me
constant support, love, and guidance. Without them, this work would not have been
possible.
TABLE OF CONTENTS
ACKNOWLEDGMENTS
LIST OF FIGURES
ABSTRACT
CHAPTERS
1 INTRODUCTION
1.1 History of Residue Number Systems
1.2 Literature Survey
1.3 Residue Number Systems in DSP
1.4 The Need for an Alternative Technology
1.5 Organization of Dissertation
2 MATHEMATICAL PRELIMINARIES
2.1 Universal Algebras
2.1.1 Algebraic Systems
2.1.2 Computation by Homomorphic Images
2.2 The Chinese Remainder Theorem
2.2.1 The Integer CRT
2.2.2 The Polynomial CRT
2.3 Finite Field Theory
2.4 Associative Algebras
3 THE INTEGER RNS
3.1 Introduction
3.2 Signed RNS
3.3 An RNS System
3.3.1 RNS Input Conversion
3.3.2 The RNS Computational Unit
3.3.3 Output Conversion
3.3.4 Efficient CRT implementation
4 THE QUADRATIC RNS
4.1 Multiple Modulus QRNS
4.2 A QRNS System
4.2.1 QRNS Input Conversion
4.2.2 The QRNS Computational Unit
4.2.3 Output Conversion
4.2.4 Logarithmic Finite Field Addition
5 THE RNS AND DIGITAL FILTERING
5.1 Digital Filtering
5.2 The RNS FIR
5.3 An RNS Adaptive Transversal Filter
6 THE POLYNOMIAL RESIDUE NUMBER SYSTEM
6.1 Introduction
6.2 PRNS Forward and Inverse Mappings
6.3 The Ring {{Zpf)[x\
6.4 Dynamic Range Extension
6.5 The PRNS FFT
6.6 2-D Cyclic Convolution
6.7 Chinese Remainder Theorem Over R[x,y]
6.8 The 2-D Rader Algorithm
7 COMPUTATIONAL COMPLEXITY
8 ALGEBRAIC INTEGER RESIDUE NUMBER SYSTEM (AIRNS)
9 DISTRIBUTED ARITHMETIC IMPLEMENTATION OF FFTs
9.1 The Good-Thomas FFT
9.2 The Rader Prime Algorithm
9.3 Distributed Arithmetic
9.4 The Distributed Arithmetic Small FFT
10 THE FFT ARRAY PROCESSOR
10.1 Introduction
10.2 The Radix-8 Fast Fourier Transform
10.2.1 Introduction
10.2.2 Efficient Memory Addressing for Parallel Computation of the Radix-8 FFT
10.2.3 Doubling FFT
10.3 Efficient CRT Implementation
10.4 The FFT Array Processor
10.4.1 Introduction
10.4.2 Overview of the FFTAP
10.4.3 Input Conversion Subsystem
10.4.4 The Radix-8 Processor
10.4.5 QRNS Radix-8 Processor
10.4.6 Scaled CRT subsystem
10.4.7 FFTAP in the Cascade Mode
10.4.8 FFTAP Memory Subsystem
10.5 VLSI Layout and Timing Analysis
10.5.1 Input Conversion Chip
10.5.2 QRNS Radix-8 Processor
10.5.3 Scaled CRT Chip
10.6 Numerical Simulation of the FFTAP
11 SUMMARY AND CONCLUSIONS
REFERENCES
BIOGRAPHICAL SKETCH
LIST OF FIGURES
3.1 The basic components of an RNS system
3.2 RNS forward conversion element
3.3 Mod-p adder architecture
3.4 Simplified scaling CRT engine (p=2^®)
3.5 Block diagram of DA-CRT
4.1 QRNS forward conversion element
4.2 QRNS multiplier unit using index addition
4.3 Scaling QRNS CRT engine
4.4 Logarithmic Zp-adders
5.1 The RNS FIR
5.2 Multiplierless multiply/accumulate unit
5.3 VLSI floorplan for FIR array
5.4 Finite field LMS tap-weight update cell
6.1 Semi-systolic arrays for PRNS forward and inverse mappings
6.2 Block diagram of PRNS DFT engine
8.1 The sets (top) and (bottom)
9.1 Distributed arithmetic FFT engine
10.1 Scaled CRT engine for QRNS output conversion
10.2 Block diagram of FFTAP
10.3 QRNS Forward Conversion Chip
10.4 Conventional radix-8 processor
10.5 Eight-point radix-2 decimation-in-time FFT
10.6 QRNS radix-2 butterfly
10.7 QRNS radix-8 processor
10.8 Cubic memory module for FFTAP
10.9 Timing analysis of QRNS input conversion chip
10.10 Radix-8 and radix-2 engine timing analysis
10.11 Timing analysis of scaled CRT chip
10.12 Timing analysis of FFTAP for a single FFT stage
Abstract of Dissertation Presented to the Graduate School of the University of
Florida in Partial Fulfillment of the Requirements of the Degree of Doctor of
Philosophy
ALGEBRAIC AND NUMBER THEORETIC COMPUTING: ADVANCES AND
APPLICATIONS IN VLSI SIGNAL PROCESSING
By
Glenn S. Zelniker
May 1991
Chairman: Dr. Fred J. Taylor
Major Department: Electrical Engineering
Digital signal processing (DSP) is a field which has benefited from advances in
device technology and integration. As devices are approaching their limits in terms
of size and speed, the need for increased throughput still remains. It seems likely that
the necessary breakthroughs will be at the arithmetic and algorithmic levels. We will
show how the computational bottleneck can be freed by using innovations from the
areas of algebraic and number theoretic computing. These innovations facilitate the
development of fast algorithms and the design of new classes of arithmetic
processors which are capable of achieving unprecedented throughput in a limited size with
limited power dissipation.
The major sub-field of algebraic and number-theoretic computing we will discuss
is the residue number system (RNS). The RNS is a system for computer arithmetic
which is based on the principle of computation by homomorphic images (CHI). The
RNS has been shown to be optimal in both size and speed and possesses some
additional properties such as modularity and fault tolerance which make it an ideal
candidate for VLSI signal processing implementations. We will also present some
other CHI schemes for high-speed signal processing which are variants of the RNS
and are easily implemented in VLSI.
This dissertation will provide the mathematical foundations of algebraic and
number-theoretic computing and show how the RNS and its variants fit into the
general theory. We also provide a new interpretation of the Chinese remainder
theorem which yields new insight as to how it provides a decrease in computational
complexity.
Finally, we will give general guidelines for the VLSI realization of RNS and other
CHI hardware. As a concrete example, we will demonstrate the design of an RNS
VLSI array processor for the computation of ultra high-speed fast Fourier transforms
(FFTs). This processor serves as a motivation for the theory and points out the
shortcomings of conventional technology. It will be seen that the processor is unmatched
in terms of size, throughput, and power dissipation and illustrates the impact that
algebraic and number-theoretic computing can have in designing extremely demanding
systems for digital signal processing.
CHAPTER 1
INTRODUCTION
One of the principal advantages and applications of the modern digital computer
is numeric data processing. High numeric data rate requirements are found in many
technical areas such as signal and image processing, communications and radar, and
artificial neural networks. The assumed performance metric in this field has been
arithmetic speed measured in MIPs or MFLOPs. However, there are other factors
which also must be considered in digital applications. First, many digital processors
are designed to operate in a small volume and consume little power. Secondly, some
of these digital arithmetic processors will operate unattended over long periods of
time and must therefore carry with them the means of recovering from component
failures.
All digital technologies bring with them their own agenda of speed, size, power,
and fault-tolerance trade-offs. This dissertation will address some of the recent
theoretical and technological advances which have led to the development of powerful
new classes of numeric processors. One particular sub-area of algebraic computing
will be treated in depth: the residue number system (RNS) and its many variants
[67]. It will be shown that the RNS possesses a unique blend of speed, size, power
consumption, fault tolerance, and arithmetic efficiency.
The use of the RNS will be motivated by the design of a machine for the ultra
high-speed computation of fast Fourier transforms, which is a problem of significant