Delayed Branches


A delayed branch executes one or more instructions that follow the conditional branch before the branch takes effect. The positions these instructions occupy are called branch delay slots. This avoids stalling the pipeline while the branch condition is evaluated, which keeps the pipeline full and reduces the performance cost of conditional branches. Because the instruction in the branch delay slot always executes, whether or not the branch is taken, compilers and assemblers try to place a useful instruction there. The figure shows the three ways in which the branch delay slot can be scheduled. The left box in each pair shows the code before scheduling; the right box shows the scheduled code.

  1. The delay slot is scheduled with an independent instruction from before the branch. This is the best choice.

  2. The branch delay slot is scheduled from the target of the branch; usually the target instruction will need to be copied because it can be reached by another path. This strategy is preferred when the branch is taken with high probability, such as a loop branch.

  3. Finally, the branch delay slot may be scheduled from the not-taken fall-through path.
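To make the delay-slot semantics concrete, here is a minimal sketch of a toy interpreter in which every branch has exactly one delay slot. The instruction set, the tuple encoding, and the register values are assumptions made up for illustration; they do not come from the figure. The sketch checks that filling the slot with an independent instruction from before the branch (strategy 1) gives the same final register state as leaving a nop in the slot.

```python
def run(program, regs, max_steps=100):
    """Run a toy program in which each branch has one delay slot.

    Hypothetical instruction encoding (not a real ISA):
      ("add", rd, rs, rt)    rd = rs + rt
      ("sub", rd, rs, rt)    rd = rs - rt
      ("beqz", rs, target)   branch to index `target` if rs == 0
      ("nop",)
    """
    regs = dict(regs)          # work on a copy of the register file
    pc = 0
    pending_target = None      # set by a taken branch, used after the slot
    for _ in range(max_steps):
        if pc >= len(program):
            break
        instr = program[pc]
        next_pc = pc + 1
        if pending_target is not None:
            # This instruction sits in the delay slot of a taken branch:
            # it still executes, and only then does control transfer.
            next_pc = pending_target
            pending_target = None
        op = instr[0]
        if op == "add":
            _, rd, rs, rt = instr
            regs[rd] = regs[rs] + regs[rt]
        elif op == "sub":
            _, rd, rs, rt = instr
            regs[rd] = regs[rs] - regs[rt]
        elif op == "beqz":
            _, rs, target = instr
            if regs[rs] == 0:
                pending_target = target
        elif op != "nop":
            raise ValueError(f"unknown instruction: {instr!r}")
        pc = next_pc
    return regs


# Before scheduling: the delay slot holds a nop.
unscheduled = [
    ("add", "R1", "R2", "R3"),
    ("beqz", "R2", 4),
    ("nop",),                      # delay slot
    ("sub", "R4", "R5", "R6"),     # fall-through path
    ("add", "R7", "R1", "R1"),     # branch target
]

# After scheduling: the add, which the branch does not depend on,
# has been moved into the delay slot.
scheduled = [
    ("beqz", "R2", 3),
    ("add", "R1", "R2", "R3"),     # delay slot
    ("sub", "R4", "R5", "R6"),     # fall-through path
    ("add", "R7", "R1", "R1"),     # branch target
]

taken = {"R1": 0, "R2": 0, "R3": 5, "R4": 0, "R5": 9, "R6": 4, "R7": 0}
not_taken = dict(taken, R2=1)
```

Running both versions with `taken` and with `not_taken` should produce identical register files, while the scheduled version is one instruction shorter.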

Strategies 2 and 3 are used when strategy 1 is not possible. In the code for strategies 2 and 3, the use of R1 in the branch condition prevents the add instruction, which computes R1, from being moved into the branch delay slot. To make this optimization legal for strategy 2 or 3, it must be safe to execute the sub instruction when the branch goes in the unexpected direction. This holds, for example, if R4 is an unused temporary register when the branch goes in the unexpected direction.
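The two legality conditions described above can be sketched as simple register checks. This is an illustrative simplification, not a complete scheduler: the `Instr` type, the function names, and the register sets are hypothetical, and the sketch assumes the candidate instruction has no side effects beyond writing its destination register (no stores, no exceptions).

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    """Hypothetical three-operand instruction: dest = op(srcs)."""
    op: str
    dest: str
    srcs: Tuple[str, ...]


def can_fill_from_before(instr, branch_srcs):
    """Strategy 1: an instruction from before the branch may move into
    the delay slot only if the branch condition does not read the
    register that the instruction writes."""
    return instr.dest not in branch_srcs


def can_fill_speculatively(instr, live_on_other_path):
    """Strategies 2 and 3: the instruction also executes when the branch
    goes in the unexpected direction, so the register it writes must not
    be live (still needed) on that other path."""
    return instr.dest not in live_on_other_path


add = Instr("add", "R1", ("R2", "R3"))
sub = Instr("sub", "R4", ("R5", "R6"))

# The branch tests R1, so the add that computes R1 cannot be moved.
add_movable = can_fill_from_before(add, {"R1"})

# R4 is an unused temporary on the other path, so the sub is safe.
sub_safe = can_fill_speculatively(sub, set())

# If R4 were still needed on the other path, the sub would not be safe.
sub_unsafe = can_fill_speculatively(sub, {"R4"})
```

A real compiler would derive `live_on_other_path` from liveness analysis; here it is passed in by hand to keep the example small.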



