Refined Version of Multiplication Hardware


The previous algorithm and hardware can be refined as follows. The speed-up comes from performing the operations in parallel: the multiplier and multiplicand are shifted while the multiplicand is added to the product if the multiplier bit is a 1.

The Multiplicand register, ALU, and Multiplier register are all 32 bits wide; only the Product register remains 64 bits wide. The MIPS special registers hi and lo hold the 64-bit product of the multiply instructions mult and multu: hi receives the upper 32 bits and lo the lower 32 bits.
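The split of a 64-bit product across hi and lo can be sketched in Python as follows. This is only an illustration for the unsigned case (multu); the function name multu_hi_lo and the constant MASK32 are made up for this example and are not part of MIPS.

```python
MASK32 = 0xFFFFFFFF  # hypothetical helper constant: low 32 bits set


def multu_hi_lo(rs, rt):
    """Model an unsigned 32x32 multiply and return (hi, lo)."""
    # Treat both operands as unsigned 32-bit values.
    product = (rs & MASK32) * (rt & MASK32)
    hi = product >> 32        # upper 32 bits of the 64-bit product
    lo = product & MASK32     # lower 32 bits of the 64-bit product
    return hi, lo


hi, lo = multu_hi_lo(0xFFFFFFFF, 0xFFFFFFFF)
print(f"hi = {hi:#010x}, lo = {lo:#010x}")
```

The largest possible unsigned product still fits in 64 bits, which is why two 32-bit registers are enough to hold it.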

Now the product is shifted right. The separate Multiplier register has also disappeared; the multiplier is placed in the right half of the Product register instead. The previous algorithm takes 3 × 32 = 96 clock cycles to multiply two 32-bit numbers (three steps for each of the 32 bits). This algorithm performs the addition and the shift in the same clock cycle, so it requires only 32 clock cycles to multiply two 32-bit numbers.
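The refined algorithm described above can be sketched in software. This is a minimal model under stated assumptions, not the hardware itself: the function name multiply_refined and the width parameter n are invented for the illustration, operands are treated as unsigned, and Python's unbounded integers stand in for the adder's carry-out bit.

```python
def multiply_refined(multiplicand, multiplier, n=32):
    """Shift-add multiply with the multiplier stored in the right half
    of a 2n-bit product register (unsigned operands assumed)."""
    mask = (1 << n) - 1
    multiplicand &= mask
    # Left half of the product starts at 0; right half holds the multiplier.
    product = multiplier & mask
    for _ in range(n):
        if product & 1:
            # Add the multiplicand to the left half only. The sum may be
            # n+1 bits wide; the extra bit plays the role of the carry.
            product += multiplicand << n
        # Shift the whole register (carry included) right by one bit.
        product >>= 1
    return product


print(bin(multiply_refined(0b1100, 0b1101, n=4)))
```

In hardware the add and the shift happen in the same clock cycle; the loop here performs them one after the other, but it still runs exactly n iterations.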

A 4-bit adder produces a 5-bit sum (4 bits plus a carry). Use the above algorithm to complete the following table, which shows the 4-bit multiplication
1100₂ × 1101₂ = 10011100₂
Iteration  Step               Multiplicand  Carry  Product = HI LO
0          Initialize
1          Add or do nothing
           Shift
2          Add or do nothing
           Shift
3          Add or do nothing
           Shift
4          Add or do nothing
           Shift
5          Add or do nothing
           Shift
6          Add or do nothing
           Shift
...        ...                ...           ...    ...
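A completed table can be checked against a small trace program. The sketch below is one possible way to do this and is not part of the original exercise: the names trace_multiply and print_trace are hypothetical, the operands are assumed to be unsigned, and the Carry, HI, and LO columns are modelled as separate integers. For 4-bit operands the loop runs four iterations.

```python
def trace_multiply(multiplicand, multiplier, n=4):
    """Return one row (iteration, step, carry, hi, lo) per table line."""
    mask = (1 << n) - 1
    hi, lo, carry = 0, multiplier & mask, 0
    rows = [(0, "Initialize", carry, hi, lo)]
    for i in range(1, n + 1):
        if lo & 1:
            # n-bit adder: the (n+1)-th bit of the sum is the carry.
            total = hi + (multiplicand & mask)
            carry, hi = total >> n, total & mask
            step = "Add"
        else:
            carry = 0
            step = "Do nothing"
        rows.append((i, step, carry, hi, lo))
        # Shift Carry:HI:LO right by one bit.
        lo = (lo >> 1) | ((hi & 1) << (n - 1))
        hi = (hi >> 1) | (carry << (n - 1))
        carry = 0
        rows.append((i, "Shift", carry, hi, lo))
    return rows


def print_trace(multiplicand, multiplier, n=4):
    """Print the trace in a layout similar to the table above."""
    print(f"{'Iter':<5}{'Step':<12}{'Carry':<6}{'HI':<{n + 1}}LO")
    for i, step, carry, hi, lo in trace_multiply(multiplicand, multiplier, n):
        print(f"{i:<5}{step:<12}{carry:<6}{hi:0{n}b} {lo:0{n}b}")


print_trace(0b1100, 0b1101)
```

The last row's HI and LO fields, read together, form the 8-bit product.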