The Guard Bit
Floating-point numbers only approximate the real numbers, because a finite number of bits cannot represent every real value.
For example, there are infinitely many real numbers between 1.0 and 2.0, but a double-precision floating-point number has a 52-bit fraction, so only 2^52 of them can be represented exactly.
The best we can do is choose a floating-point number that is close to the actual value.
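The count of 2^52 can be checked directly. The following Python sketch assumes that the platform's `float` is an IEEE 754 double (the usual case for CPython); the helper name `double_bits` is illustrative and does not come from the text above.

```python
import struct
import sys

def double_bits(x: float) -> int:
    """Return the 64-bit pattern of x as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

# Positive doubles are ordered the same way as their bit patterns, so the
# difference of the patterns counts the doubles in the interval [1.0, 2.0).
count = double_bits(2.0) - double_bits(1.0)
print(count == 2**52)                      # True

# The spacing between 1.0 and the next double is 2^-52.
print(sys.float_info.epsilon == 2.0**-52)  # True

# A real number that falls between two doubles is replaced by a neighbor:
# 1 + 2^-53 is not representable and is stored as 1.0.
print(1.0 + 2.0**-53 == 1.0)               # True
```

Every real value in the interval that is not one of these 2^52 patterns has to be replaced by a nearby one, which is the approximation described above.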
Intermediate results carry extra bits beyond the width of the fraction, but these extra bits are discarded when the result fraction is packed.
The guard bit guards against the loss of a significant bit:
- Only one guard bit is needed to maintain the accuracy of the result, and
- It becomes the last fraction bit when the result is normalized by a left shift.
The following example uses 24 significant bits with and without a guard bit.
  1.00000000101100010001101 × 2^5
– 1.00000000000000010011010 × 2^-2 (subtraction)
⇓
  1.00000000101100010001101 × 2^5
– 0.00000010000000000000001 0011010 × 2^5 (shift right 7 bits)
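The alignment step above can be reproduced with plain integers. This is a minimal sketch, assuming each 24-bit significand is stored as an integer and the working format is widened by 7 extra bits so that nothing shifted out is lost; the names `A_SIG`, `B_SIG`, and `show` are illustrative.

```python
# Significands from the example, including the leading 1 (24 bits each).
A_SIG = int("1" + "00000000101100010001101", 2)   # exponent 5
B_SIG = int("1" + "00000000000000010011010", 2)   # exponent -2
A_EXP, B_EXP = 5, -2

shift = A_EXP - B_EXP          # the smaller operand moves right by 7 bits

# Wide format: 1 integer bit, 23 fraction bits, 7 extra bits (31 bits total).
a_wide = A_SIG << shift        # larger operand, extra bits are all zero
b_wide = B_SIG                 # same as (B_SIG << shift) >> shift

def show(wide: int) -> str:
    """Format a 31-bit wide significand as i.fraction extra."""
    bits = format(wide, "031b")
    return f"{bits[0]}.{bits[1:24]} {bits[24:]}"

print(show(a_wide))   # 1.00000000101100010001101 0000000
print(show(b_wide))   # 0.00000010000000000000001 0011010
```

The seven bits printed after the space are the extra bits that the rest of the example either discards or partly keeps.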
The following subtraction is WITHOUT a guard bit:
  1.00000000101100010001101 × 2^5
1 1.11111101111111111111110 1100110 × 2^5 (2’s complement)
-----------------------------------------
0 0.11111110101100010001011 1100110 × 2^5 (add significands)
⇓
  1.11111101011000100010110 1100110 × 2^4 (normalized)
The following subtraction is WITH a guard bit (the single bit set apart immediately after the fraction):
  1.00000000101100010001101 × 2^5
1 1.11111101111111111111110 1 100110 × 2^5 (2’s complement)
-----------------------------------------
0 0.11111110101100010001011 1 100110 × 2^5 (add significands)
⇓
  1.11111101011000100010111 100110 × 2^4 (normalized)
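Both versions of the subtraction can be replayed in a few lines of Python. This is a sketch under stated assumptions rather than a model of any particular hardware: operands are positive with a > b, the exact difference is formed first, and all extra bits beyond `guard_bits` are then truncated before normalization, which mirrors the worked example. The function and variable names are hypothetical.

```python
FRAC_BITS = 23  # fraction width in the example (24 significant bits)

def subtract(a_sig: int, a_exp: int, b_sig: int, b_exp: int, guard_bits: int):
    """Return (significand, exponent) of a - b, keeping `guard_bits` extra bits.

    Assumes a > b > 0 and a_exp - b_exp >= guard_bits.
    """
    shift = a_exp - b_exp
    diff = (a_sig << shift) - b_sig          # exact difference, `shift` extra bits
    diff >>= shift - guard_bits              # discard all but `guard_bits` extra bits
    exp = a_exp
    hidden = 1 << (FRAC_BITS + guard_bits)   # position of the leading 1
    while diff and diff < hidden:            # normalize with left shifts
        diff <<= 1
        exp -= 1
    return diff >> guard_bits, exp           # pack: drop the guard position

def show(sig: int, exp: int) -> str:
    bits = format(sig, "024b")
    return f"{bits[0]}.{bits[1:]} x 2^{exp}"

A_SIG = int("1" + "00000000101100010001101", 2)   # exponent 5
B_SIG = int("1" + "00000000000000010011010", 2)   # exponent -2

without_guard = subtract(A_SIG, 5, B_SIG, -2, guard_bits=0)
with_guard = subtract(A_SIG, 5, B_SIG, -2, guard_bits=1)

print(show(*without_guard))   # 1.11111101011000100010110 x 2^4
print(show(*with_guard))      # 1.11111101011000100010111 x 2^4
```

The two results differ only in the last fraction bit: without a guard bit the left shift brings in a 0, while with a guard bit it brings in the 1 that the alignment shift had pushed just past the fraction.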