Zero-Delay Branches


Predicting a branch as taken or not taken still requires calculating the branch target. This calculation takes one cycle, so taken branches incur a 1-cycle penalty. In theory, delayed branches have zero delay, but they come with disadvantages of their own. Another way to achieve zero delay is to use a branch target buffer, a structure that caches the destination program counter or the destination instruction for a branch. The figure shows the structure of the branch target buffer and the branch prediction buffer, which will be explained in the next slide.

The branch target buffer is usually organized as a cache with tags, which makes it more costly than a simple prediction buffer, which uses only a small memory.
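To make the cost difference concrete, here is a minimal sketch of the two structures. The table size, the 4-byte instruction alignment, and all names (ENTRIES, index_of, tag_of, btb_insert, btb_lookup) are assumptions chosen for illustration, not details taken from the figure. It shows that an untagged prediction buffer lets two branches share an entry, while the tagged buffer reports a hit only for the branch that was actually stored.

```python
# Illustrative size only; a real design would choose this differently.
ENTRIES = 16

def index_of(pc):
    # Assumes 4-byte aligned instructions: drop the low 2 bits, keep the index bits.
    return (pc >> 2) % ENTRIES

def tag_of(pc):
    # The remaining upper bits identify which PC owns the entry.
    return (pc >> 2) // ENTRIES

# Untagged prediction buffer: one prediction bit per entry, no owner recorded.
prediction_buffer = [False] * ENTRIES

# Tagged branch target buffer: each entry is None or a (tag, target) pair.
btb = [None] * ENTRIES

def btb_insert(pc, target):
    btb[index_of(pc)] = (tag_of(pc), target)

def btb_lookup(pc):
    entry = btb[index_of(pc)]
    if entry is not None and entry[0] == tag_of(pc):
        return entry[1]   # hit: this PC is a known branch
    return None           # miss: empty entry or a different PC owns it

# Two PCs that map to the same index.
pc_a = 0x1000
pc_b = pc_a + 4 * ENTRIES

prediction_buffer[index_of(pc_a)] = True
btb_insert(pc_a, 0x2000)

aliased_prediction = prediction_buffer[index_of(pc_b)]  # pc_b reads pc_a's bit
hit_a = btb_lookup(pc_a)
hit_b = btb_lookup(pc_b)  # tag mismatch, so no target is returned
```

The tag comparison is what lets the buffer answer "is this instruction a branch?" during fetch, and the extra storage and comparison for those tags is the added cost.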

The approach of using a branch target buffer works as follows:
  1. Check the PC to see whether the instruction being fetched is a branch.

  2. Read the branch target address stored in the branch target buffer during the IF stage.

  3. If the branch is predicted taken,
        then “next PC = branch target fetched from branch target buffer”
        else “next PC = PC + 4”

The prediction bits predict whether a branch will be taken or not taken. They are set dynamically by the hardware.
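The three steps can be sketched as a small software model. This is only an illustration under stated assumptions: the dictionary keyed by PC stands in for the tagged buffer, a single prediction bit that records the last outcome stands in for the prediction bits, and the names next_pc and update are hypothetical.

```python
# Hypothetical model: branch PC -> [target address, predicted-taken bit].
btb = {}

def next_pc(pc):
    # Steps 1 and 2: a hit means the fetched instruction is a known branch,
    # and its stored target is available in the IF stage.
    entry = btb.get(pc)
    # Step 3: choose the next PC from the prediction bit.
    if entry is not None and entry[1]:
        return entry[0]
    return pc + 4

def update(pc, taken, target):
    # Called once the branch resolves. Assumed 1-bit scheme:
    # the prediction bit simply remembers the last outcome.
    btb[pc] = [target, taken]

first = next_pc(0x40)        # miss: fall through to PC + 4
update(0x40, True, 0x80)     # the branch at 0x40 was taken to 0x80
second = next_pc(0x40)       # hit, predicted taken: fetch from the target
update(0x40, False, 0x80)    # the branch was not taken this time
third = next_pc(0x40)        # hit, predicted not taken: PC + 4
```

Because the next PC is chosen during fetch, a correctly predicted taken branch does not wait for the target calculation, which is where the zero-delay behavior comes from.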