Due date (firm): On or before Thursday, April 04, 2024
Absolutely no copying others’ works
Name: ___________________________
Upload the completed homework to the section of “COVID-19 Exams, Homeworks, & Programming Exercises” of Blackboard.
The purpose of homeworks is for students to practice for the exams without others’ help, so the penalty of mistakes will be minor.
Without practicing for the exams properly, students would not be able to do well on the exams.
(Floating-point representation: 20%)
What decimal number does the hexadecimal number (D53A 6800 0000 0000)16 represent if it is a floating point number?
Use the IEEE 754 standard.
(Floating-point arithmetic: 80%)
IEEE 754-2008 contains a half precision that is only 16 bits wide.
The leftmost bit is still the sign bit, the exponent is 5 bits wide and has a bias of 15, and the mantissa is 10 bits long.
A hidden 1 is assumed.
Assuming the three numbers, A=1.237219, B=2.524723, and C=6.742837, are stored in the 16-bit IEEE 754-2008 format, calculate A×B-C by hand.
Assume 1 guard bit, 1 round bit, and 1 sticky bit, and round to the nearest even (only on the Step 4. Rounding the Significand).
Show all the steps in normalized floating point numbers with 11 bits of precision (or 10 bits of fraction); i.e., (-1)S×(1.F)2×2E, where S is the sign bit, |F|=10, and E is without including the bias.
Note that
have to show all five steps step-by-step,
the calculation should be similar to those in Slide 10.11 and Slide 10.16; i.e., the numbers should use a base-2 scientific notation and include 3 extra bits each (e.g., IEEE format and bias need NOT be used),
to answer this question, simulate the hardware by including 3 extra bits for each operand and result and using the 3 extra bits as possible as you could,
if the number has to be shifted to the right on the Step 1. Making Exponents Equal, round (just using the regular rounding method) the number to 10 bits of fraction and 3 extra bits before calculation,
rounding to the nearest even will be applied only once after finishing all calculation because the 3 extra bits need to be kept for further calculation, and
2’s complement representation has to be used if needed.