Implementation of binary floating-point arithmetic on embedded integer processors
Polynomial evaluation-based algorithms and certified code generation

Guillaume Revy
Advisors: Claude-Pierre Jeannerod and Gilles Villard
Arénaire INRIA project-team (LIP, ENS Lyon)
Université de Lyon, CNRS
Ph.D. Defense – December 1st, 2009

Guillaume Revy – December 1st, 2009. Implementation of binary floating-point arithmetic on embedded integer processors 1/45

Motivation

Embedded systems are ubiquitous
  - microprocessors dedicated to one or a few specific tasks
  - must satisfy constraints: area, energy consumption, design cost

Some embedded systems do not have any FPU (floating-point unit)

Embedded systems are heavily used in audio and video applications
  - these applications are demanding in floating-point computations

Applications: FP computations
Embedded systems: no FPU
Applications (FP computations) → software implementing floating-point arithmetic → embedded systems (no FPU)

[Figure: block diagram of an embedded processor core; the labels are too garbled in extraction to reproduce here]
Overview of the ST231 architecture

4-issue VLIW 32-bit integer processor → no FPU

Parallel execution units
  - 4 integer ALUs
  - 2 pipelined multipliers 32 × 32 → 32

Latencies: ALU → 1 cycle, Mul → 3 cycles

VLIW (Very Long Instruction Word)
  → instructions grouped into bundles
  → Instruction-Level Parallelism (ILP) explicitly exposed by the compiler

    uint32_t R1 = A0 + C;
    uint32_t R2 = A3 * X;
    uint32_t R3 = A1 * X;
    uint32_t R4 = X * X;

    Bundle   Issue 1   Issue 2   Issue 3   Issue 4
    0        R1        R2        R3
    1        R4

[Figure: ST231 core block diagram. Labels recoverable from the extraction include: ICache, ITLB, instruction buffer, register file, Mul (×2), IU (×4), load/store unit (LSU), DTLB, UTLB, DCache, PC and branch unit, branch register file, trap controller, SCU, CMC, STBus, peripherals, interrupt controller, debug support unit, 61 interrupts, debug link, SDI ports]
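The bundle table on the ST231 slide can be reproduced with a small model. The following is a minimal sketch, not the ST231 compiler's scheduler: it assumes all operations are mutually independent (as the four statements on the slide are), ignores latencies, and only enforces the two resource limits stated on the slide (4 issue slots and 2 multipliers per bundle). The names `op_kind`, `schedule_independent` and `MAX_BUNDLES` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Kinds of operation distinguished by this toy model. */
enum op_kind { OP_ALU, OP_MUL };

#define ISSUE_WIDTH 4   /* 4-issue VLIW, as stated on the slide       */
#define NUM_MULS    2   /* 2 multipliers, as stated on the slide      */
#define MAX_BUNDLES 64  /* arbitrary capacity chosen for this sketch  */

/*
 * Place each of n mutually independent operations into the earliest
 * bundle that still has a free issue slot and, for a multiplication,
 * a free multiplier. Dependencies and latencies are NOT modelled.
 * bundle[i] receives the bundle index of operation i.
 * Returns the number of bundles used.
 */
size_t schedule_independent(const enum op_kind *ops, size_t n, size_t *bundle)
{
    size_t used_slots[MAX_BUNDLES] = {0};
    size_t used_muls[MAX_BUNDLES] = {0};
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        size_t b = 0;
        /* Skip bundles that are full or have no multiplier left. */
        while (b < MAX_BUNDLES &&
               (used_slots[b] == ISSUE_WIDTH ||
                (ops[i] == OP_MUL && used_muls[b] == NUM_MULS))) {
            b++;
        }
        assert(b < MAX_BUNDLES); /* capacity of this sketch exceeded */

        used_slots[b]++;
        if (ops[i] == OP_MUL) {
            used_muls[b]++;
        }
        bundle[i] = b;
        if (b + 1 > count) {
            count = b + 1;
        }
    }
    return count;
}
```

Feeding it the slide's four operations in order (one addition, then three multiplications) places R1, R2 and R3 in bundle 0 and R4 in bundle 1: the fourth issue slot of bundle 0 stays empty because both multipliers are already in use.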
How to emulate floating-point arithmetic in software?

Design and implementation of efficient software support for IEEE 754 floating-point arithmetic on integer processors

Existing software for IEEE 754 floating-point arithmetic:
  - Software floating-point support of GCC, Glibc and µClibc, GoFast Floating-Point Library
  - SoftFloat (→ STlib)
  - FLIP (Floating-point Library for Integer Processors)
      • software support for binary32 floating-point arithmetic on integer processors
      • correctly-rounded addition, subtraction, multiplication, division, square root, reciprocal, ...
      • handling subnormals, and handling special inputs
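To make "software support for binary32 on integer processors" concrete, here is a minimal sketch of the first step any such emulation needs: splitting a binary32 encoding into its fields and recognising subnormals and special inputs, using 32-bit integer operations only. It follows the IEEE 754 binary32 layout (1 sign bit, 8 exponent bits, 23 fraction bits). It is not FLIP's or SoftFloat's API; every name below (`b32_fields`, `b32_unpack`, `b32_pack`, `b32_is_*`) is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an IEEE 754 binary32 encoding held in a 32-bit integer. */
typedef struct {
    uint32_t sign;        /* 1 bit                  */
    uint32_t biased_exp;  /* 8 bits, bias 127       */
    uint32_t fraction;    /* 23 bits, trailing part */
} b32_fields;

/* Split an encoding into sign, biased exponent and fraction. */
b32_fields b32_unpack(uint32_t bits)
{
    b32_fields f;
    f.sign = bits >> 31;
    f.biased_exp = (bits >> 23) & 0xFFu;
    f.fraction = bits & 0x007FFFFFu;
    return f;
}

/* Reassemble an encoding from its fields. */
uint32_t b32_pack(b32_fields f)
{
    assert(f.sign <= 1u && f.biased_exp <= 0xFFu && f.fraction <= 0x007FFFFFu);
    return (f.sign << 31) | (f.biased_exp << 23) | f.fraction;
}

/* Classification works on the encoding with the sign bit cleared. */
int b32_is_zero(uint32_t bits)
{
    return (bits & 0x7FFFFFFFu) == 0u;
}

int b32_is_subnormal(uint32_t bits)
{
    uint32_t mag = bits & 0x7FFFFFFFu;
    return mag != 0u && mag < 0x00800000u;  /* exponent field is 0 */
}

int b32_is_inf(uint32_t bits)
{
    return (bits & 0x7FFFFFFFu) == 0x7F800000u;
}

int b32_is_nan(uint32_t bits)
{
    return (bits & 0x7FFFFFFFu) > 0x7F800000u;  /* exponent all ones, fraction nonzero */
}
```

For example, `b32_unpack(0x3FC00000u)` (the encoding of 1.5) gives sign 0, biased exponent 127 and fraction 0x400000. Comparing the sign-cleared encoding against a constant, rather than testing each field separately, keeps every classification to a mask and one comparison.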