Slide 10.3: Floating-point representation (cont.)

The previous formats are part of the IEEE 754 floating-point standard. For a normalized floating-point number (S, E, F), the fraction field F consists of the bits

F = f₁ f₂ f₃ f₄ ...

The significand is equal to (1.F)₂ = (1.f₁f₂f₃f₄...)₂ because IEEE 754 assumes a hidden leading 1 (not stored) for normalized numbers.
The significand is therefore 1 bit longer than the fraction.

The value of a normalized floating-point number is

   (-1)^S × (1.F)₂ × 2^val(E)

 = (-1)^S × (1.f₁f₂f₃f₄...)₂ × 2^val(E)

 = (-1)^S × (1 + f₁×2^-1 + f₂×2^-2 + f₃×2^-3 + f₄×2^-4 + ...) × 2^val(E)
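The expansion of the significand can be checked with a small sketch. The function name `significand` and the choice to pass F as a string of '0'/'1' characters are assumptions made for illustration only.

```python
def significand(fraction_bits: str) -> float:
    """Return (1.F)_2 as a float, given F as a string of '0'/'1' characters."""
    value = 1.0  # hidden leading 1, not stored in the fraction field
    for i, bit in enumerate(fraction_bits, start=1):
        if bit == "1":
            value += 2.0 ** -i  # bit f_i contributes f_i * 2^-i
    return value


print(significand("1000"))  # 1.5   = 1 + 2^-1
print(significand("0110"))  # 1.375 = 1 + 2^-2 + 2^-3
```

An empty fraction gives 1.0, so the significand of a normalized number is always at least 1 and less than 2.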

To simplify sorting, IEEE 754 stores the exponent in biased form, so that a larger exponent field E always means a larger exponent value. That is,

   Value of exponent = val(E) = E - Bias

Recall that the exponent field is 8 bits for single precision, so E can be in the range

   [0=00000000₂, 255=11111111₂=2⁸-1]

E = 0 and E = 255 are reserved for special use, and E = 1 to 254 are used for normalized floating-point numbers. So Bias = 127 (= 254 ÷ 2) and val(E) = E - 127. For example,

   val(E=126=01111110₂) = 126-127 =  -1
   val(E=128=10000000₂) = 128-127 =   1
   val(E=254=11111110₂) = 254-127 = 127
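The same examples can be reproduced with a short sketch. The helper name `exponent_value` is hypothetical, and the general formula Bias = 2^(k-1) - 1 for a k-bit exponent field is an assumption here; the slide only states Bias = 127 for 8 bits.

```python
def exponent_value(E: int, exponent_bits: int = 8) -> int:
    """Return val(E) = E - Bias for a normalized number."""
    bias = 2 ** (exponent_bits - 1) - 1  # 127 for an 8-bit field
    max_field = 2 ** exponent_bits - 1   # all-ones pattern, reserved
    if not 1 <= E < max_field:
        raise ValueError("E = 0 and the all-ones E are reserved")
    return E - bias


for E in (126, 128, 254):
    print(E, format(E, "08b"), exponent_value(E))
# 126 01111110 -1
# 128 10000000 1
# 254 11111110 127
```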

For a similar reason, the exponent bias for double precision is 1023, because its 11-bit exponent has the range [0, 2047]. The value of a normalized floating-point number is therefore refined as

   (-1)^S × (1.F)₂ × 2^(E-Bias)

 = (-1)^S × (1.f₁f₂f₃f₄...)₂ × 2^(E-Bias)

 = (-1)^S × (1 + f₁×2^-1 + f₂×2^-2 + f₃×2^-3 + f₄×2^-4 + ...) × 2^(E-Bias)
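As a rough check of the final formula, the sketch below splits a single-precision number into (S, E, F) and recomputes its value. It assumes the standard single-precision layout of 1 sign bit, 8 exponent bits, and 23 fraction bits; the function name `decode_single` is made up for this example, and only normalized numbers are handled.

```python
import struct


def decode_single(x: float):
    """Split x into (S, E, F) and recompute its value from the formula."""
    # Reinterpret the 4 bytes of the single-precision encoding as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    S = bits >> 31            # 1 sign bit
    E = (bits >> 23) & 0xFF   # 8 exponent bits
    F = bits & 0x7FFFFF       # 23 fraction bits
    if not 1 <= E <= 254:
        raise ValueError("not a normalized single-precision number")
    # (-1)^S x (1.F)_2 x 2^(E - Bias), with Bias = 127
    value = (-1) ** S * (1 + F / 2**23) * 2.0 ** (E - 127)
    return S, E, F, value


print(decode_single(-6.5))  # (1, 129, 5242880, -6.5)
```

Here -6.5 = -(1.101)₂ × 2², so E = 2 + 127 = 129 and F = 0.625 × 2²³ = 5242880.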