Reducing the Delay of Branches


Branch delay can be reduced from 2 cycles to 1 cycle by resolving the branch earlier. Instead of determining the branch decision in the EX stage, it can be determined in the Decode (ID) stage by giving two tasks to the “Next PC” logic block: (i) computing the branch target address and (ii) evaluating the branch decision.
Thus, only one instruction following the branch is fetched before the outcome is known. If the branch is taken, that instruction is flushed.
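
The behavior of the “Next PC” block in the Decode stage can be sketched in software. This is a minimal illustrative model, not the actual datapath: the function names (next_pc_logic, latch_if_id), the signal names, and the 32-bit MIPS-style encoding (16-bit word offset added to the branch's PC+4, all-zero word as nop) are assumptions made for the example.

```python
NOP = 0x00000000  # assumed encoding: an all-zero word acts as a nop


def sign_extend16(imm16: int) -> int:
    """Interpret the low 16 bits of imm16 as a signed value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def next_pc_logic(fetch_pc, id_pc_plus_4, imm16, rs_value, rt_value, is_beq):
    """Hypothetical model of the Next PC block placed in the Decode stage.

    fetch_pc      -- address of the instruction currently being fetched (IF)
    id_pc_plus_4  -- PC+4 of the branch, carried in the IF/ID register
    Returns (next_pc, flush_if_id).
    """
    # (i) Branch target address: PC+4 plus the word offset shifted left by 2.
    target = (id_pc_plus_4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF
    # (ii) Branch decision for beq: compare the two register values.
    taken = bool(is_beq) and rs_value == rt_value
    # Taken: redirect fetch to the target. Not taken: continue sequentially.
    next_pc = target if taken else (fetch_pc + 4) & 0xFFFFFFFF
    # The one instruction fetched after a taken branch must be flushed.
    return next_pc, taken


def latch_if_id(fetched_instr, flush_if_id):
    """Model of the IF/ID register: a flush replaces the instruction with a nop."""
    return NOP if flush_if_id else fetched_instr
```

For example, with a beq at address 0x100 and an offset of 3 words, the instruction at 0x104 is being fetched while the branch is decoded. If the registers are equal, the model redirects fetch to 0x110 and the instruction from 0x104 is replaced by a nop; otherwise fetch simply continues at 0x108.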

Flushing requires a control signal that resets the IF/ID register, which converts the fetched instruction into a nop. Note that this hardware works only for beq (branch on equal); other branches can be implemented in a similar way. Two difficulties occur when determining branch decisions at the ID stage:


