CSCI 370 Computer Architecture: Homework 4 Solutions

Due date: On or before Monday, April 28, 2025
Absolutely no copying of others’ work
Name: Professor Hu

  1. Answer the following questions:

    1. (10%) What is the latency of an R-type instruction (i.e., how long must the clock period be to ensure that this instruction works correctly)?
      Hint: Find the time for the critical (longest-latency) path, as there may be several paths for an instruction.
      Ans>
        R-type (“add $t1, $t2, $t3,” e.g.) ⇒

          40 (PC: Register Read) +
          230 (I-Mem) +
          120 (Register File) +
          20 (Mux) +
          210 (ALU) +
          20 (Mux) +
          15 (Register File: Register Setup)
               = 655 ps

    2. (10%) What is the latency of lw?
      Ans>
        lw (“lw $t1, offset($t2),” e.g.) ⇒

          40 (PC: Register Read) +
          230 (I-Mem) +
          120 (Register File) +
          20 (Mux) +
          210 (ALU) +
          230 (D-Mem) +
          20 (Mux) +
          15 (Register File: Register Setup)
               = 885 ps

    3. (10%) What is the latency of sw?
      Ans>
        sw (“sw $t1, offset($t2),” e.g.) ⇒

          40 (PC: Register Read) +
          230 (I-Mem) +
          120 (Register File) +
          20 (Mux) +
          210 (ALU) +
          230 (D-Mem)
               = 850 ps

    4. (10%) What is the latency of beq?
      Ans>
        beq (“beq $t1, $t2, offset,” e.g.) ⇒

          40 (PC: Register Read) +
          230 (I-Mem) +
          120 (Register File) +
          20 (Mux) +
          210 (ALU) +
          5 (AND gate) +
          5 (OR gate) +
          20 (Mux) +
          15 (PC: Register Setup)
               = 665 ps

    5. (10%) What is the latency of an arithmetic, logic, or shift I-type (non-load) instruction?
      Ans>
        I-type (“addi $t1, $t2, 100,” e.g.) ⇒

          40 (PC: Register Read) +
          230 (I-Mem) +
          120 (Register File) +
          20 (Mux) +
          210 (ALU) +
          20 (Mux) +
          15 (Register File: Register Setup)
               = 655 ps

    6. (10%) What is the minimum clock period for this CPU?
      Hint: This is a single-cycle datapath in which all instructions are executed in one clock cycle.
      Ans>
        885 ps
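
        The six answers above can be cross-checked with a short Python sketch. The component latencies are the ones used in the sums above; the dictionary keys and function names are illustrative choices, not part of the assignment, and "reg_setup" is assumed to stand for both the register-file setup and the PC setup time (15 ps each).

```python
# Component latencies in picoseconds, copied from the answers above.
# Key names are hypothetical labels chosen for this sketch.
LATENCY_PS = {
    "pc_read": 40,     # PC: Register Read
    "i_mem": 230,      # I-Mem
    "reg_file": 120,   # Register File
    "mux": 20,         # Mux
    "alu": 210,        # ALU
    "d_mem": 230,      # D-Mem
    "and_gate": 5,     # AND gate
    "or_gate": 5,      # OR gate
    "reg_setup": 15,   # Register Setup (register file or PC; assumed equal)
}

# Every instruction class shares the same front end of its critical path.
FRONT = ["pc_read", "i_mem", "reg_file", "mux", "alu"]

# Critical path per instruction class, as listed in answers 1.1 to 1.5.
CRITICAL_PATHS = {
    "r_type": FRONT + ["mux", "reg_setup"],
    "lw": FRONT + ["d_mem", "mux", "reg_setup"],
    "sw": FRONT + ["d_mem"],
    "beq": FRONT + ["and_gate", "or_gate", "mux", "reg_setup"],
    "i_type": FRONT + ["mux", "reg_setup"],
}


def path_latency(path, latency=LATENCY_PS):
    """Sum the latencies of the components along one path."""
    return sum(latency[component] for component in path)


def instruction_latencies(latency=LATENCY_PS):
    """Latency of each instruction class along its critical path."""
    return {name: path_latency(path, latency)
            for name, path in CRITICAL_PATHS.items()}


def min_clock_period(latency=LATENCY_PS):
    """A single-cycle clock must cover the slowest instruction."""
    return max(instruction_latencies(latency).values())


for name, ps in instruction_latencies().items():
    print(f"{name}: {ps} ps")
print(f"minimum clock period: {min_clock_period()} ps")
```

        The maximum is taken over all instruction classes because a single-cycle datapath uses one clock period for every instruction, so lw sets the period.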


  2. Consider the addition of a multiplier to the CPU shown in the above figures in Slide 12.10. This addition will add 320 ps to the latency of the ALU, but will reduce the number of instructions by 7% (because there will no longer be a need to emulate the multiply instruction).

    1. (10%) What is the clock cycle time with and without this improvement?
      Hint: The clock cycle time without this improvement is the answer to question 1.6 above.
      Ans>
      • Without improvement: 885 ps

      • With improvement: 885 + 320 = 1205 ps


    2. (10%) What is the speedup achieved by adding this improvement?
      Hint: Speedup from the addition = running time without the addition ÷ running time with the addition
      Ans>
           Running time without the addition
        = Instruction count × CPI × Clock cycle time
        = Instruction count × CPI × 885

           Running time with the addition
        = New instruction count × CPI × New clock cycle time
        = 0.93 × Instruction count × CPI × 1205

           Speedup from the addition
        = Running time without the addition ÷ Running time with the addition
        = ( Instruction count × CPI × 885 ) ÷ ( 0.93 × Instruction count × CPI × 1205 )
        = 885 ÷ ( 0.93 × 1205 )
        = 0.79

        Thus, the “speedup” is 0.79. (This “improved” CPU is actually slower than the original).
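
        As a quick numeric check of the derivation above, here is a minimal Python sketch. It assumes CPI is unchanged by the multiplier (so it cancels) and that the extra 320 ps lies entirely on the critical path; the function and variable names are illustrative.

```python
def speedup(old_cycle_ps, new_cycle_ps, instruction_ratio):
    """Old running time divided by new running time.

    instruction_ratio is new instruction count / old instruction count.
    Instruction count and CPI cancel, leaving only the ratio and the
    two clock cycle times.
    """
    return old_cycle_ps / (instruction_ratio * new_cycle_ps)


old_cycle = 885                 # ps, minimum clock period without the multiplier
new_cycle = old_cycle + 320     # ps, ALU latency grows by 320 ps
result = speedup(old_cycle, new_cycle, 0.93)
print(f"speedup = {result:.2f}")
```

        A value below 1 means the modified CPU takes longer to run the same program.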


    3. (10%) What is the slowest the new ALU can be and still result in improved performance?
      Ans>
            Running time with the addition ≤ Running time without the addition
        ⇒ ( 0.93 × Instruction count × CPI × (885+x) ) ≤ ( Instruction count × CPI × 885 )
        ⇒ x ≤ ( 0.07 × Instruction count × CPI × 885 ) ÷ ( 0.93 × Instruction count × CPI )
        ⇒ x ≤ ( 0.07 × 885 ) ÷ 0.93
        ⇒ x ≤ 66.61

        Thus, the time for the ALU can increase by up to 66 ps (i.e., from 210 ps to 276 ps).
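
        The bound on x can also be computed directly. The sketch below solves 0.93 × (885 + x) ≤ 885 for x under the same assumptions as the derivation (CPI unchanged, added ALU latency entirely on the critical path); the function name is an illustrative choice.

```python
def max_extra_latency(old_cycle_ps, instruction_ratio):
    """Largest cycle-time increase x that does not slow the program down.

    From ratio * (old + x) <= old it follows that
    x <= old * (1 - ratio) / ratio.
    """
    return old_cycle_ps * (1 - instruction_ratio) / instruction_ratio


x = max_extra_latency(885, 0.93)
print(f"x <= {x:.2f} ps")
# Round down to whole picoseconds so the bound is not exceeded.
print(f"ALU latency can grow from 210 ps to about {210 + int(x)} ps")
```

        Rounding down rather than to the nearest integer keeps the new design at or under the break-even point.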