
Building Custom Arithmetic Operators with the FloPoCo Generator

162 Pages·2013·1.81 MB·English
by Florent de Dinechin

Preview Building Custom Arithmetic Operators with the FloPoCo Generator

Building Custom Arithmetic Operators with the FloPoCo Generator
Florent de Dinechin

F. de Dinechin, A FloPoCo tutorial

Outline
  Introduction: custom arithmetic
  FloPoCo for application developers
  Computing just right
  FloPoCo for developers of HLS software
  FloPoCo for developers of custom arithmetic
  A tutorial: building an integer divider by 3
  A tutorial: building a faithful FIR
  A tutorial: building a $\sqrt{x^2+y^2}$ operator
  Conclusion

Introduction: custom arithmetic

Two different ways of wasting silicon
  Here are two universally programmable chips.
  Which one is best for (insert your computation here)?

Are FPGAs any good at floating-point?
  Long ago (1995), people ported the basic operations: +, −, ×.
  Versus the highly optimized FPU in the processor, each operator is 10x slower in an FPGA.
  This is the unavoidable overhead of programmability.

  If you lose according to a metric, change the metric.

  Peak figures for double-precision floating-point exponential:
    Pentium core: 20 cycles / DPExp @ 4 GHz: 200 MDPExp/s
    FPExp in FPGA: 1 DPExp/cycle @ 400 MHz: 400 MDPExp/s
    Chip vs chip: 6 Pentium cores vs 150 FPExp/FPGA
    Power consumption also better
    Single-precision data better
  (Intel MKL vector libm, vs FPExp in FloPoCo version 2.0.0)
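The peak figures for the double-precision exponential are plain clock-rate arithmetic, which the following minimal sketch reproduces. The function and variable names are illustrative, and the final chip-level ratio is not stated on the slide: it follows from the quoted numbers only under the assumption that every core and every operator runs at its peak rate.

```python
# Peak-throughput arithmetic using only the figures quoted on the slide.
# Names are illustrative; the chip-level ratio is derived, not quoted.

def peak_rate(clock_hz, cycles_per_result):
    """Results per second for a unit needing `cycles_per_result` cycles each."""
    return clock_hz / cycles_per_result

# One Pentium core: 20 cycles per exponential at 4 GHz.
pentium_core = peak_rate(4_000_000_000, 20)   # 200e6 exponentials/s

# One pipelined FPExp operator: 1 result per cycle at 400 MHz.
fpga_operator = peak_rate(400_000_000, 1)     # 400e6 exponentials/s

# Chip vs chip: 6 cores against 150 operators on one FPGA.
pentium_chip = 6 * pentium_core
fpga_chip = 150 * fpga_operator

# Assumes all units are kept busy at peak rate, which real code rarely achieves.
ratio = fpga_chip / pentium_chip

print(f"core: {pentium_core / 1e6:.0f} M/s, operator: {fpga_operator / 1e6:.0f} M/s")
print(f"chip vs chip peak ratio: {ratio:.0f}x")
```

The per-operator advantage is only 2x; the larger chip-level gap comes from replicating the operator 150 times, which is the sense in which the metric has been changed.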
Dura Amdahl lex, sed lex
  [Figure: SPICE Model-Evaluation, cut from Kapre and DeHon (FPL 2009)]

Custom arithmetic (not your Pentium's)
  [Figure: block diagram of the floating-point exponential operator.
   Block and signal labels, top to bottom: input fields $S_X$, $E_X$, $F_X$; shift to fixed-point; fixed-point X; constant multipliers $\times 1/\log(2)$ and $\times \log(2)$; E; Y; A; Z; precomputed ROM ($e^A$); generic polynomial evaluator ($e^Z - Z - 1$); truncated multiplier; normalize / round; output R.
   Datapath width labels: $w_E+w_F+g+1$, $w_E+1$, $k$ (MSB), $w_F+g+1-2k$ (MSB), $w_F+g+1-k$, $w_F+g+2-k$ (MSB), $w_F+g-k$, $1+w_F+g$.]
  Never compute 1 bit more accurately than needed!
  Need a generator.
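The blocks named in the "Custom arithmetic" diagram suggest a range reduction of the form $e^X = 2^E \cdot e^A \cdot e^Z$, with $e^Z$ rebuilt from the polynomial output $e^Z - Z - 1$. The sketch below is one reading of that diagram, not the FloPoCo implementation: it uses Python floats instead of fixed-point datapaths with guard bits, `math.exp(a)` stands in for the precomputed ROM, the degree-3 polynomial is an arbitrary choice, and the name `exp_sketch` and the default `k=8` are hypothetical.

```python
import math

def exp_sketch(x, k=8):
    """Float model of a table-plus-polynomial exponential (illustrative only)."""
    log2 = math.log(2)

    # Constant multiplier by 1/log(2): E is the power of two of the result.
    e = round(x / log2)

    # Constant multiplier by log(2): reduced argument Y, roughly in [-log2/2, log2/2].
    y = x - e * log2

    # Split Y: A keeps the bits down to weight 2^-k, Z is the small remainder.
    a = math.floor(y * 2**k) / 2**k
    z = y - a                      # 0 <= z < 2^-k

    # Precomputed ROM stand-in: in hardware this would be a table indexed by A.
    exp_a = math.exp(a)

    # Polynomial evaluator for e^Z - Z - 1 (assumed degree: first two Taylor terms).
    p = z * z / 2 + z**3 / 6

    # e^Y = e^A + e^A * (e^Z - 1); the product is where a truncated multiplier would sit.
    exp_y = exp_a + exp_a * (z + p)

    # Normalize: apply the exponent E.
    return math.ldexp(exp_y, e)

for x in (-10.0, -1.0, 0.0, 0.5, 1.0, 10.0):
    rel_err = abs(exp_sketch(x) - math.exp(x)) / math.exp(x)
    print(f"x = {x:6.1f}   relative error = {rel_err:.2e}")
```

Because Z is smaller than $2^{-k}$, the polynomial only has to supply the low-order bits of the result, which is why each intermediate width in the diagram can shrink with k; a generator can size every datapath for a given $(w_E, w_F)$ instead of fixing them once.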

Description:
FloPoCo: a generator framework (written in C++, outputting VHDL). Objectives: A complete single-precision FPU in a single VHDL file: .. My current crusade.