Elements of Floating-point Arithmetic

Sanzheng Qiao
Department of Computing and Software
McMaster University
September 2011

Outline

1 Floating-point Numbers
  Representations
  IEEE Floating-point Standards
  Underflow and Overflow
  Correctly Rounded Operations
2 Sources of Errors
  Rounding Error
  Truncation Error
  Discretization Error
3 Stability of an Algorithm
4 Sensitivity of a Problem
5 Fallacies

Two ways of representing floating-point numbers

On paper, we write a floating-point number in the format

$\pm d_1.d_2 \cdots d_t \times \beta^e$, with $0 < d_1 < \beta$ and $0 \le d_i < \beta$ for $i > 1$,

where
$t$: precision (the number of digits)
$\beta$: base (or radix), almost universally 2; other commonly used bases are 10 and 16
$e$: exponent, an integer
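To make the paper format concrete, here is a minimal sketch that computes the exact value of $\pm d_1.d_2 \cdots d_t \times \beta^e$ from its digits. The helper name `fp_value` and its argument layout are hypothetical; exact rational arithmetic (`fractions.Fraction`) is used so that the value itself introduces no rounding. The length of the digit list is the precision $t$, so a trailing zero counts as a digit.

```python
from fractions import Fraction

def fp_value(sign, digits, beta, e):
    """Exact value of sign * d1.d2...dt * beta**e (hypothetical helper).

    digits is the list [d1, d2, ..., dt]; its length is the precision t.
    """
    if sign not in (1, -1):
        raise ValueError("sign must be +1 or -1")
    if not digits:
        raise ValueError("need at least one digit")
    # Normalization: the leading digit is nonzero.
    if not 0 < digits[0] < beta:
        raise ValueError("leading digit must satisfy 0 < d1 < beta")
    if any(not 0 <= d < beta for d in digits[1:]):
        raise ValueError("other digits must satisfy 0 <= di < beta")
    # Significand d1.d2...dt: digit d_(i+1) carries weight beta**(-i).
    significand = sum(Fraction(d, beta ** i) for i, d in enumerate(digits))
    return sign * significand * Fraction(beta) ** e

print(fp_value(1, [1, 0], 10, -1))              # 1.0 x 10^-1
print(fp_value(1, [1, 2, 3, 4], 10, 2))         # 1.234 x 10^2
print(fp_value(1, [1, 1, 0, 0, 1, 1], 2, -4))   # 1.10011 x 2^-4
```

For instance, $1.10011_2 = 1 + 1/2 + 1/16 + 1/32 = 51/32$, so $1.10011 \times 2^{-4} = 51/512 = 0.099609375$.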
Two ways of representing floating-point numbers (cont.)

Examples:
$1.0 \times 10^{-1}$: $t = 2$ (the trailing zero counts), $\beta = 10$, $e = -1$
$1.234 \times 10^{2}$: $t = 4$, $\beta = 10$, $e = 2$
$1.10011 \times 2^{-4}$: $t = 6$, $\beta = 2$ (binary), $e = -4$

The precision $t$, the base $\beta$, and the range of the exponent $e$ determine a floating-point number system.

In memory, a floating-point number is stored in three consecutive fields:
sign (1 bit)
exponent (width depends on the range)
fraction (width depends on the precision)

For a memory representation to be useful, there must be a standard.
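As an illustration of the three fields, the sketch below pulls sign, exponent, and fraction out of a Python `float`. It assumes the float is an IEEE 754 binary64 (double precision) value, which is what CPython uses on essentially all platforms; the field widths 1, 11, and 52 bits and the exponent bias 1023 come from that format, not from the text above. The helper name `double_fields` is hypothetical.

```python
import struct

def double_fields(x):
    """Split a float (assumed IEEE 754 binary64) into sign, exponent, fraction."""
    # Reinterpret the 8 bytes of the double as a 64-bit unsigned integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                   # 1 bit
    exponent = (bits >> 52) & 0x7FF     # 11 bits, stored with a bias of 1023
    fraction = bits & ((1 << 52) - 1)   # 52 bits; the leading 1 of a normal number is implicit
    return sign, exponent, fraction

for x in (1.0, -2.0, 0.1):
    s, e, f = double_fields(x)
    print(f"{x!r:>5}: sign={s} exponent={e:011b} fraction={f:052b}")
```

For a normal number the fields recombine as $(-1)^{s} \times 1.f \times 2^{e - 1023}$; for example, 1.0 has sign 0, stored exponent 1023, and fraction 0.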