CSCI 370 Computer Architecture: Homework 3 Solutions

Due date (firm): On or before Thursday, April 04, 2024
Absolutely no copying others’ works
Name: ____Professor Hu____

  1. (Floating-point representation: 20%) What decimal number does the hexadecimal number (D53A 6800 0000 0000)_16 represent if it is a floating-point number? Use the IEEE 754 standard.
    Ans>
      1 1 0 1 0 1 0 1 0 0 1 1 1 0 1 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

      Sign = 1, so the number is negative.
      Exponent = 10101010011_2 = 2^0 + 2^1 + 2^4 + 2^6 + 2^8 + 2^10 = 1 + 2 + 16 + 64 + 256 + 1024 = 1363
      E - bias = 1363 - 1023 = 340
      Significand = 1.1010011010000..._2 = 1 + 2^-1 + 2^-3 + 2^-6 + 2^-7 + 2^-9 = 1 + 0.5 + 0.125 + 0.015625 + 0.0078125 + 0.001953125 = 1.650390625
      Therefore, the value in decimal is -1.650390625 × 2^340
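
The field extraction above can be cross-checked mechanically. The following is a minimal sketch, assuming the 64-bit pattern is an IEEE 754 double (1 sign bit, 11 exponent bits, 52 fraction bits, bias 1023); the variable names are illustrative only.

```python
import struct

word = 0xD53A680000000000            # the 64-bit pattern from the question

sign = word >> 63                    # top bit
exponent = (word >> 52) & 0x7FF      # next 11 bits, still biased
fraction = word & ((1 << 52) - 1)    # low 52 bits

# Hidden 1 plus the fraction; exact here because only a few fraction bits are set.
significand = 1 + fraction / 2**52

print(sign, exponent, exponent - 1023, significand)

# Cross-check: reinterpret the same 8 bytes as a double.
value = struct.unpack(">d", word.to_bytes(8, "big"))[0]
print(value == -significand * 2.0**340)
```

Scaling by a power of two is exact in binary floating point, so the final comparison tests the hand-derived value bit for bit.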


  2. (Floating-point arithmetic: 80%) IEEE 754-2008 contains a half-precision format that is only 16 bits wide. The leftmost bit is still the sign bit, the exponent is 5 bits wide and has a bias of 15, and the mantissa is 10 bits long. A hidden 1 is assumed. Assuming the three numbers A = 1.237219, B = 2.524723, and C = 6.742837 are stored in the 16-bit IEEE 754-2008 format, calculate A×B-C by hand. Assume 1 guard bit, 1 round bit, and 1 sticky bit, and round to the nearest even (only in Step 4, rounding the significand). Show all the steps in normalized floating-point numbers with 11 bits of precision (or 10 bits of fraction); i.e., (-1)^S × (1.F)_2 × 2^E, where S is the sign bit, |F| = 10, and E does not include the bias. Note that

    • you have to show all five steps, one step at a time,
    • the calculation should be similar to those in Slide 10.11 and Slide 10.16; i.e., the numbers should use base-2 scientific notation and include 3 extra bits each (e.g., the IEEE format and bias need NOT be used),
    • to answer this question, simulate the hardware by including 3 extra bits for each operand and result and using the 3 extra bits as much as possible,
    • if a number has to be shifted to the right in Step 1 (making exponents equal), round it (using the regular rounding method) to 10 bits of fraction and 3 extra bits before the calculation,
    • rounding to the nearest even will be applied only once, after all calculations are finished, because the 3 extra bits need to be kept for further calculation, and
    • 2’s complement representation has to be used if needed.

    Ans>
      1.    1.237219
         = 1.00111100101110..._2
         ≈ 1.0011110010 1 1 1_2 × 2^0 (Guard=1, Round=1, Sticky=1)
        
        because
        
         0.237219 × 2 = 0.474438 ⇒ 0
         0.474438 × 2 = 0.948876 ⇒ 0
         0.948876 × 2 = 1.897752 ⇒ 1
         0.897752 × 2 = 1.795504 ⇒ 1
         0.795504 × 2 = 1.591008 ⇒ 1
         0.591008 × 2 = 1.182016 ⇒ 1
         0.182016 × 2 = 0.364032 ⇒ 0
         0.364032 × 2 = 0.728064 ⇒ 0
         0.728064 × 2 = 1.456128 ⇒ 1
         0.456128 × 2 = 0.912256 ⇒ 0
         0.912256 × 2 = 1.824512 ⇒ 1
         0.824512 × 2 = 1.649024 ⇒ 1
         0.649024 × 2 = 1.298048 ⇒ 1
         0.298048 × 2 = 0.596096 ⇒ 0
         ...
        
         So, 0.237219 = 0.00111100101110..._2

      2.    2.524723
         = 10.10000110010101..._2
         ≈ 1.0100001100 1 0 1_2 × 2^1 (Guard=1, Round=0, Sticky=1)
        
        because
        
         0.524723 × 2 = 1.049446 ⇒ 1
         0.049446 × 2 = 0.098892 ⇒ 0
         0.098892 × 2 = 0.197784 ⇒ 0
         0.197784 × 2 = 0.395568 ⇒ 0
         0.395568 × 2 = 0.791136 ⇒ 0
         0.791136 × 2 = 1.582272 ⇒ 1
         0.582272 × 2 = 1.164544 ⇒ 1
         0.164544 × 2 = 0.329088 ⇒ 0
         0.329088 × 2 = 0.658176 ⇒ 0
         0.658176 × 2 = 1.316352 ⇒ 1
         0.316352 × 2 = 0.632704 ⇒ 0
         0.632704 × 2 = 1.265408 ⇒ 1
         0.265408 × 2 = 0.530816 ⇒ 0
         0.530816 × 2 = 1.061632 ⇒ 1
         ...
        
         So, 0.524723 = 0.10000110010101..._2

      3.    6.742837
         = 110.10111110001010..._2
         ≈ 1.1010111110 0 0 1_2 × 2^2 (Guard=0, Round=0, Sticky=1)
        
        because
        
         0.742837 × 2 = 1.485674 ⇒ 1
         0.485674 × 2 = 0.971348 ⇒ 0
         0.971348 × 2 = 1.942696 ⇒ 1
         0.942696 × 2 = 1.885392 ⇒ 1
         0.885392 × 2 = 1.770784 ⇒ 1
         0.770784 × 2 = 1.541568 ⇒ 1
         0.541568 × 2 = 1.083136 ⇒ 1
         0.083136 × 2 = 0.166272 ⇒ 0
         0.166272 × 2 = 0.332544 ⇒ 0
         0.332544 × 2 = 0.665088 ⇒ 0
         0.665088 × 2 = 1.330176 ⇒ 1
         0.330176 × 2 = 0.660352 ⇒ 0
         0.660352 × 2 = 1.320704 ⇒ 1
         0.320704 × 2 = 0.641408 ⇒ 0
         ...
        
         So, 0.742837 = 0.10111110001010..._2
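
The three repeated-doubling conversions can be reproduced with a short routine. This is a sketch only: `to_grs` is a hypothetical helper name, it assumes a positive input, and it uses exact rational arithmetic so that no binary rounding sneaks into the digits.

```python
from fractions import Fraction

def to_grs(decimal_text, frac_bits=10):
    """Return ('1.F G R S', exponent) for a positive decimal string."""
    x = Fraction(decimal_text)            # exact value of the decimal text
    exponent = 0
    while x >= 2:                         # normalize so that 1 <= x < 2
        x /= 2
        exponent += 1
    while x < 1:
        x *= 2
        exponent -= 1
    frac = x - 1
    bits = []
    for _ in range(frac_bits + 2):        # fraction bits, then guard and round
        frac *= 2                         # same doubling step as in the tables
        bit = int(frac >= 1)
        bits.append(str(bit))
        frac -= bit
    sticky = "1" if frac != 0 else "0"    # OR of every bit that was not kept
    f = "".join(bits[:frac_bits])
    return f"1.{f} {bits[-2]} {bits[-1]} {sticky}", exponent

for text in ("1.237219", "2.524723", "6.742837"):
    print(text, to_grs(text))
```

The sticky bit is 1 for all three numbers because none of them has a terminating binary expansion.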

        A × B
      = 1.237219 × 2.524723
      ≈ (1.0011110010 1 1 1_2 × 2^0) × (1.0100001100 1 0 1_2 × 2^1)
      Step 1. Calculating the exponent of the product
         0 + 1 = 1
      Step 2. Multiplying the significands
                       1.0011110010111
       ×               1.0100001100101
      ————————————————————————————————
                        10011110010111
                       00000000000000
                      10011110010111
                     00000000000000
                    00000000000000
                   10011110010111
                  10011110010111
                 00000000000000
                00000000000000
               00000000000000
              00000000000000
             10011110010111
            00000000000000
       +   10011110010111
      ————————————————————————————————
           110001111110011011010010011 or
          1.10001111110011011010010011
         (1.0011110010 1 1 1_2 × 2^0) × (1.0100001100 1 0 1_2 × 2^1)
       = 1.10001111110011011010010011_2 × 2^1
      Step 3. Normalizing the product
         1.10001111110011011010010011_2 × 2^1
       ≈ 1.1000111111 0 0 1_2 × 2^1 (Guard=0, Round=0, Sticky=1)
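
The long multiplication and the cut back to 10 fraction bits plus guard, round, and sticky can be checked with plain integers. The sketch below treats each 14-bit significand (hidden 1, 10 fraction bits, 3 extra bits) as an integer with 13 fraction bits; the variable names are illustrative.

```python
a = int("10011110010111", 2)    # 1.0011110010 111, exponent 0
b = int("10100001100101", 2)    # 1.0100001100 101, exponent 1

product = a * b                 # 26 fraction bits
bits = format(product, "b")
exponent = 0 + 1                # Step 1: add the exponents

# Two significands in [1, 2) give a product in [1, 4): 27 or 28 bits here.
if len(bits) == 28:             # product >= 2, so normalize by one position
    exponent += 1

kept = bits[:13]                # hidden 1, 10 fraction bits, guard, round
sticky = int("1" in bits[13:])  # OR of everything that is dropped

print(bits)
print(f"{kept[0]}.{kept[1:11]} {kept[11]} {kept[12]} {sticky}", exponent)
```

Here the product has 27 bits, so it is already normalized and the exponent stays at 1.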

         A × B - C
      ≈ (1.1000111111 0 0 1_2 × 2^1) - (1.1010111110 0 0 1_2 × 2^2)
      Step 1. Making exponents equal
         1.1000111111 0 0 1_2 × 2^1
       = 0.1100011111 1 0 1_2 × 2^2 (Guard=1, Round=0, Sticky=1)
      Step 2. Performing subtraction
          0.1100011111 1 0 1_2 × 2^2 (Guard=1, Round=0, Sticky=1)
       -  1.1010111110 0 0 1_2 × 2^2 (Guard=0, Round=0, Sticky=1)
                                   ⇓
         00.1100011111 1 0 1_2 × 2^2
       + 10.0101000001 1 1 1_2 × 2^2 (2’s complement)
      ————————————————————————————————————————————
         11.0001100001 1 0 0_2 × 2^2 (negative)
       = -0.1110011110 1 0 0_2 × 2^2 (Guard=1, Round=0, Sticky=0)
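
The alignment and the 2's complement subtraction can be simulated on 15-bit patterns (2 integer bits, 10 fraction bits, 3 extra bits). This is a sketch under one assumption that differs from the question's wording: the bit shifted out during alignment is OR-ed into the sticky bit rather than rounded, which happens to give the same pattern here. The helper names are hypothetical.

```python
WIDTH = 15                      # 2 integer bits + 10 fraction bits + G, R, S
MASK = (1 << WIDTH) - 1

def shift_right_sticky(value, n):
    """Shift right by n bits, OR-ing any lost 1s into the lowest (sticky) bit."""
    lost = value & ((1 << n) - 1)
    return (value >> n) | (1 if lost else 0)

def show(value):
    """Format a WIDTH-bit pattern as 'ii.ffffffffff grs'."""
    s = format(value, f"0{WIDTH}b")
    return f"{s[:2]}.{s[2:12]} {s[12:]}"

ab = int("11000111111001", 2)   # 1.1000111111 001, exponent 1
c = int("11010111110001", 2)    # 1.1010111110 001, exponent 2

ab_aligned = shift_right_sticky(ab, 2 - 1)   # now expressed with exponent 2
neg_c = (~c + 1) & MASK                      # 2's complement of C
total = (ab_aligned + neg_c) & MASK          # any carry out of the top bit is dropped

negative = bool(total >> (WIDTH - 1))        # top bit set means a negative sum
magnitude = (-total) & MASK if negative else total

print(show(ab_aligned), show(neg_c), show(total), show(magnitude), sep="\n")
```

The four printed patterns correspond, in order, to the aligned A×B, the complemented C, the raw sum, and the magnitude of the negative result.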
      Step 3. Normalizing the difference
          -0.1110011110 1 0 0_2 × 2^2 (Guard=1, Round=0, Sticky=0)
        = -1.1100111101 0 0_2 × 2^1 (Round=0, Sticky=0)
      Step 4. Rounding the significand
          -1.1100111101 0 0_2 × 2^1 (Round=0, Sticky=0)
        ≈ -1.1100111101_2 × 2^1
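
Normalization followed by round-to-nearest-even can be sketched as below. `normalize_and_round` is a hypothetical helper that assumes a positive magnitude with 10 fraction bits and 3 extra bits; unlike the hand calculation, it keeps three extra bits after the left shift by letting a 0 enter on the right.

```python
FRAC, EXTRA = 10, 3

def normalize_and_round(magnitude, exponent):
    """magnitude: positive integer with FRAC + EXTRA fraction bits."""
    one = 1 << (FRAC + EXTRA)
    while magnitude >= 2 * one:           # too large: shift right, keep sticky
        magnitude = (magnitude >> 1) | (magnitude & 1)
        exponent += 1
    while magnitude < one:                # too small: shift left, a 0 enters
        magnitude <<= 1
        exponent -= 1
    kept = magnitude >> EXTRA             # hidden 1 plus FRAC fraction bits
    grs = magnitude & ((1 << EXTRA) - 1)
    half = 1 << (EXTRA - 1)
    if grs > half or (grs == half and kept & 1):   # ties go to the even neighbour
        kept += 1
        if kept == 2 << FRAC:             # rounding carried out to 10.000...
            kept >>= 1
            exponent += 1
    return kept, exponent

kept, exponent = normalize_and_round(int("1110011110100", 2), 2)
value = -kept / 2**FRAC * 2**exponent     # sign restored from the subtraction
print(format(kept, "b"), exponent, value)
```

With the extra bits equal to 000 after normalization, rounding leaves the significand unchanged, as in Step 4.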
      Step 5. Checking for overflow or underflow
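         The exponent 1 lies within the half-precision range -14 ≤ E ≤ 15, so there is neither overflow nor underflow.
         -1.1100111101_2 × 2^1
       = -11.100111101_2
       = -(2^1 + 2^0 + 2^-1 + 2^-4 + 2^-5 + 2^-6 + 2^-7 + 2^-9)
       = -(2 + 1 + 0.5 + 0.0625 + 0.03125 + 0.015625 + 0.0078125 + 0.001953125)
       = -3.619140625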
       ≈  1.237219 × 2.524723 - 6.742837
       = -3.61920173
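
As a final sanity check, the hand-computed result can be packed into an actual 16-bit pattern and compared with the exact decimal answer. This sketch relies on the `"e"` format of Python's `struct` module, which is IEEE 754 binary16; it checks only the final value and does not re-run the step-by-step arithmetic.

```python
import struct
from fractions import Fraction

result = -3.619140625

packed = struct.pack(">e", result)               # encode as half precision
bits = int.from_bytes(packed, "big")
sign = bits >> 15
biased = (bits >> 10) & 0x1F                     # 5-bit exponent, bias 15
fraction = bits & 0x3FF                          # 10-bit fraction

# Exact value of A×B-C from the decimal inputs, with no binary rounding.
exact = Fraction("1.237219") * Fraction("2.524723") - Fraction("6.742837")
error = abs(Fraction(result) - exact)

print(sign, biased - 15, format(fraction, "010b"), float(error))
```

A biased exponent between 1 and 30 means the result is a normal number, and the error stays below 2^-10, which is half a unit in the last place at exponent 1.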