The Guard Bit


Floating-point numbers only approximate the real numbers, because a finite number of bits cannot represent every real value. For example, there are infinitely many real numbers between 1.0 and 2.0, but a double-precision floating-point number has a 52-bit fraction, so no more than 2^52 fractions can be represented in that interval. The best we can do is pick a floating-point number close to the actual value. Intermediate results carry extra bits, but those bits are discarded when the result fraction is packed. The guard bit guards against the loss of a significant bit. The following example subtracts two numbers with 24 significant bits, first without and then with a guard bit.
   1.00000000101100010001101 × 2^5
 – 1.00000000000000010011010 × 2^-2 (subtraction)

   1.00000000101100010001101 × 2^5
 – 0.00000010000000000000001 0011010 × 2^5 (shift right 7 bits)
The following subtraction is WITHOUT a guard bit:
   1.00000000101100010001101 × 2^5
 1 1.11111101111111111111110 1100110 × 2^5 (2’s complement)
 ————————————————————————————————————————— 
 0 0.11111110101100010001011 1100110 × 2^5 (add significands)
                      
   1.11111101011000100010110 1100110 × 2^4 (normalized)
The following subtraction is WITH a guard bit (the single bit set apart immediately after the 23 fraction bits):
   1.00000000101100010001101 × 2^5
 1 1.11111101111111111111110 1 100110 × 2^5 (2’s complement)
 ————————————————————————————————————————— 
 0 0.11111110101100010001011 1 100110 × 2^5 (add significands)
                      
   1.11111101011000100010111   100110 × 2^4 (normalized)
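
The worked example above can be replayed with plain integers. The Python sketch below is only an illustration under stated assumptions: the names parse, subtract and show and the guard_bits parameter are invented here, both operands are assumed positive and normalized with the first larger than the second, and the result is truncated rather than rounded, as in the example. It follows the example's model, in which the difference is formed with all shifted-out bits present and only guard_bits of them survive into normalization.

```python
FRACTION_BITS = 23  # 24 significant bits = 1 integer bit + 23 fraction bits


def parse(text):
    """Turn a string such as '1.0101' into an integer significand."""
    return int(text.replace(".", ""), 2)


def subtract(a_sig, a_exp, b_sig, b_exp, guard_bits):
    """Return (significand, exponent) of a - b, truncated to 24 bits.

    Assumes a > b > 0, both normalized, and guard_bits <= a_exp - b_exp.
    """
    shift = a_exp - b_exp
    # Exact difference, carrying `shift` extra low-order bits.
    wide = (a_sig << shift) - b_sig
    # Retain only `guard_bits` of those extra bits and drop the rest.
    kept = wide >> (shift - guard_bits)
    exp = a_exp
    # Normalize: shift left until the integer bit is 1 again.
    while kept < (1 << (FRACTION_BITS + guard_bits)):
        kept <<= 1
        exp -= 1
    # Pack: discard whatever guard bits remain after normalization.
    return kept >> guard_bits, exp


def show(sig, exp):
    """Format a 24-bit significand and exponent like the example above."""
    bits = format(sig, "024b")
    return f"{bits[0]}.{bits[1:]} x 2^{exp}"


A = parse("1.00000000101100010001101")  # exponent 5
B = parse("1.00000000000000010011010")  # exponent -2

without_guard = subtract(A, 5, B, -2, guard_bits=0)
with_guard = subtract(A, 5, B, -2, guard_bits=1)
print(show(*without_guard))  # 1.11111101011000100010110 x 2^4
print(show(*with_guard))     # 1.11111101011000100010111 x 2^4
```

With guard_bits=0 the normalizing left shift brings in a 0. With guard_bits=1 it brings in the saved bit, so the last fraction bit is 1 instead of 0 and the packed result is one unit in the last place closer to the exact difference.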

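The claim at the start of this section, that only finitely many fractions fit between 1.0 and 2.0 in double precision, can also be checked directly. The sketch below assumes that Python's float is an IEEE 754 double (true on common platforms, but an assumption here); bit_pattern is a helper name introduced for this illustration.

```python
import struct
import sys


def bit_pattern(x):
    """Return the 64-bit pattern of a Python float as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


# Positive doubles are ordered the same way as their bit patterns, so the
# number of doubles in [1.0, 2.0) is the difference between the two patterns.
count = bit_pattern(2.0) - bit_pattern(1.0)
print(count == 2**52)  # True

# The gap between 1.0 and the next larger double is 2^-52 ...
eps = sys.float_info.epsilon
print(eps == 2.0**-52)  # True

# ... so a value closer to 1.0 than half that gap is stored as 1.0 itself.
print(1.0 + eps / 4 == 1.0)  # True
```

Every real number in the interval has to be mapped onto one of these 2^52 values, which is why the bits kept during intermediate steps matter.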


      “People know what they do; frequently they know why they do what they do;    
      but what they don’t know is what what they do does.”    
      ― Michel Foucault, Madness and Civilization: A History of Insanity in the Age of Reason