### Chapter 3 – Instruction-Level Parallelism and its Exploitation (Part 3)

- ILP vs. Parallel Computers
- Dynamic Scheduling (Sections 3.4, 3.5)
- Dynamic Branch Prediction (Sections 3.3, 3.9, and Appendix C)
- Hardware Speculation and Precise Interrupts (Section 3.6)
- Multiple Issue (Section 3.7)
- Static Techniques (Section 3.2, Appendix H)
- Limitations of ILP
- Multithreading (Section 3.11)
- Putting it Together (Mini-projects)

[Figure: pipeline timing diagrams. Instructions i through i+9 each pass through the IF, ID, EX, MEM, and WB stages, with successive instructions overlapped across clock cycles numbered 1 to 9.]

| Stage       | Considerations for multiple issue                                                                |
|-------------|--------------------------------------------------------------------------------------------------|
| IF          | Parallel access to I-cache<br>Require alignment?                                                 |
| ID          | Replicate logic<br>Fixed-length instructions?<br>HANDLE INTRA-CYCLE HAZARDS                      |
| EX          | Parallel/pipelined (as before)                                                                   |
| MEM         | ... per cycle?<br>... hazards & multi-ported D-cache                                             |
| WB          | Different register files?<br>Multi-ported register files?                                        |
| Progression | Integer + floating-point<br>Any two instructions<br>Any four instructions<br>Any n instructions? |

**Beyond Pipelining (Section 3.7)** 

Limits on Pipelining

- Latch overheads & signal skew
- Unpipelined instruction issue logic (Flynn limit: $CPI \ge 1$)

Two techniques for parallelism in instruction issue

- Superscalar or multiple issue
  - Hardware determines which of next n instructions can issue in parallel
  - May be statically or dynamically scheduled
- VLIW - Very Long Instruction Word
  - Compiler packs multiple independent operations into an instruction

#### Example Superscalar

Assume two instructions per cycle

- One integer, load/store, or branch
- One floating point

Could require 64-bit alignment and ordering of instruction pair.

| First pair | Second pair | Result |
|------------|-------------|--------|
| I F        | I F         | OK     |
| I F        | F I         | NOT OK |
| F I        | F I         | NOT OK |

Best case: CPI = 0.5

But ...
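The best-case CPI of 0.5 can be checked with a small model. The sketch below is an illustration under stated assumptions, not the slide's actual hardware: instructions are fetched as aligned pairs, a pair issues in one cycle only when it is an integer-class instruction (`I`) followed by a floating-point instruction (`F`), any other pair takes one cycle per instruction, and data hazards are ignored. The function name `dual_issue_cpi` and the string encoding are hypothetical.

```python
def dual_issue_cpi(stream):
    """Return the CPI of a string of 'I' and 'F' instructions.

    Assumed issue rule (hypothetical, for illustration only):
    aligned pairs are examined in order, and a pair dual-issues
    only if it is exactly 'I' followed by 'F'.
    """
    cycles = 0
    for k in range(0, len(stream), 2):
        pair = stream[k:k + 2]
        # An 'IF' pair issues together; anything else issues serially.
        cycles += 1 if pair == "IF" else len(pair)
    return cycles / len(stream)


print(dual_issue_cpi("IFIF"))  # every pair dual-issues
print(dual_issue_cpi("IFFI"))  # second pair is in the wrong order
print(dual_issue_cpi("FIFI"))  # no pair dual-issues
```

Under these assumptions only a perfectly alternating, correctly ordered stream reaches CPI = 0.5; any misordered pair pushes the CPI back toward 1.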






#### Compiler Techniques to Expose ILP

- Many compiler techniques exist
- Several used for multiprocessors as well
- Our focus on techniques specifically for ILP

#### Loop Unrolling (Section 3.2)

Add scalar to vector

| Version         | Code                                                                                                                                         |
|-----------------|----------------------------------------------------------------------------------------------------------------------------------------------|
| Unscheduled     | Loop: L.D F0, 0(R1)<br>stall<br>ADD.D F4, F0, F2<br>stall<br>stall<br>S.D 0(R1), F4<br>DSUBUI R1, R1, #8<br>stall<br>BNEZ R1, Loop<br>stall |
| With scheduling | Loop: L.D F0, 0(R1)<br>DSUBUI R1, R1, #8<br>ADD.D F4, F0, F2<br>stall<br>BNEZ R1, Loop ; Assume delayed branch<br>S.D 8(R1), F4             |
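The benefit of scheduling can be read directly off the two listings by counting issue slots. The sketch below is a hedged bookkeeping aid: it assumes each listed line, including each `stall`, occupies exactly one clock cycle. The names `UNSCHEDULED`, `SCHEDULED`, `cycles`, and `stalls` are hypothetical helpers, not part of the slides.

```python
UNSCHEDULED = """
L.D F0, 0(R1)
stall
ADD.D F4, F0, F2
stall
stall
S.D 0(R1), F4
DSUBUI R1, R1, #8
stall
BNEZ R1, Loop
stall
"""

SCHEDULED = """
L.D F0, 0(R1)
DSUBUI R1, R1, #8
ADD.D F4, F0, F2
stall
BNEZ R1, Loop
S.D 8(R1), F4
"""


def cycles(listing):
    # One cycle per non-empty line (assumption: single issue, one slot per line).
    return len([line for line in listing.splitlines() if line.strip()])


def stalls(listing):
    # Count the slots that do no useful work.
    return sum(1 for line in listing.splitlines() if line.strip() == "stall")


for name, listing in (("unscheduled", UNSCHEDULED), ("scheduled", SCHEDULED)):
    print(name, cycles(listing), "cycles,", stalls(listing), "stalls")
```

Both versions execute the same five instructions per element; scheduling only fills stall slots, taking the loop from 10 cycles per iteration to 6.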


#### Loop Unrolling

Unrolling the loop

| Label | Code                                                                                                                                                                                                                                                                                                  |
|-------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Loop: | L.D F0, 0(R1)<br>ADD.D F4, F0, F2<br>S.D 0(R1), F4<br>L.D F6, -8(R1)<br>ADD.D F8, F6, F2<br>S.D -8(R1), F8<br>L.D F10, -16(R1)<br>ADD.D F12, F10, F2<br>S.D -16(R1), F12<br>L.D F14, -24(R1)<br>ADD.D F16, F14, F2<br>S.D -24(R1), F16<br>DSUBUI R1, R1, #32<br>BNEZ R1, Loop ; Assume delayed branch |

- Rename registers
- Remove some branch overhead (calculate intermediate values)
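The same transformation can be sketched at source level. The code below is an illustrative assumption, not the compiler's output: `add_scalar` is the original one-element-per-iteration loop and `add_scalar_unrolled` processes four elements per trip, using separate temporaries (`t0` to `t3`) in the role of the renamed registers. The cleanup loop for lengths that are not a multiple of four is an addition of this sketch; the assembly above assumes the trip count divides evenly.

```python
def add_scalar(x, s):
    """Original loop: one element, one loop test per iteration."""
    for i in range(len(x)):
        x[i] = x[i] + s
    return x


def add_scalar_unrolled(x, s):
    """Unrolled by four: one index update and loop test per four elements."""
    n = len(x)
    i = 0
    while i + 4 <= n:
        # Distinct temporaries play the role of renamed registers,
        # so the four additions are independent of one another.
        t0 = x[i] + s
        t1 = x[i + 1] + s
        t2 = x[i + 2] + s
        t3 = x[i + 3] + s
        x[i], x[i + 1], x[i + 2], x[i + 3] = t0, t1, t2, t3
        i += 4
    # Cleanup loop (assumption of this sketch) for leftover elements.
    while i < n:
        x[i] = x[i] + s
        i += 1
    return x
```

Both functions produce the same result; the unrolled form simply exposes more independent work between loop-overhead instructions.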



#### Software Pipelining (Section H.3)

- Pipeline loops in software
- Pipelined loop iteration
  - Executes instructions from multiple iterations of original loop
  - Separates dependent instructions
- Less code than unrolling
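As a rough illustration of these points, the add-scalar-to-vector loop can be software-pipelined by hand. This is a sketch under assumptions, not the book's code: each kernel iteration stores the result for element i, adds for element i+1, and loads element i+2, with a prologue and epilogue to fill and drain the pipeline. The function name `add_scalar_swp` and the fallback for very short vectors are hypothetical.

```python
def add_scalar_swp(x, s):
    """Software-pipelined x[i] += s (illustrative sketch)."""
    n = len(x)
    if n < 2:
        # Too short to pipeline; fall back to the plain loop.
        for i in range(n):
            x[i] = x[i] + s
        return x

    # Prologue: start iterations 0 and 1.
    loaded = x[0]           # load for iteration 0
    summed = loaded + s     # add for iteration 0
    loaded = x[1]           # load for iteration 1

    # Kernel: each trip mixes work from three original iterations,
    # so the dependent load -> add -> store chain is spread across trips.
    for i in range(n - 2):
        x[i] = summed           # store for iteration i
        summed = loaded + s     # add for iteration i + 1
        loaded = x[i + 2]       # load for iteration i + 2

    # Epilogue: finish the last two iterations.
    x[n - 2] = summed
    x[n - 1] = loaded + s
    return x
```

The kernel contains one load, one add, and one store, the same as the original loop body, which is why the code growth is smaller than with unrolling.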




#### **Global Scheduling**

Loop unrolling and software pipelining work well for straight-line code

What if code has branches?

Global scheduling techniques

- Trace scheduling

#### Trace Scheduling

- Compiler predicts most frequently executed execution path (trace)
- Schedules this path and inserts repair code for mispredictions


    b[i] = "old"
    a[i] =
    if (a[i] == 0) then
        b[i] = "new"; common case
    else
        X
    endif
    c[i] =

Until done

- Select most common path - a trace
- Schedule trace across basic blocks
- Repair other paths

| Trace to be scheduled       | Repair code                   |
|-----------------------------|-------------------------------|
| b[i] = "old"<br>a[i] =      | A: restore old b[i]<br>X      |
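One way to see what the trace and its repair code do is to write both versions of the loop body and check that they agree. The sketch below is hypothetical in several respects, all of them assumptions of this example: the right-hand sides of `a[i]` and `c[i]` are elided in the slide, so they are passed in as `new_a` and `new_c`; the uncommon-path work `X` is modeled as appending `i` to `x_log`; and `c[i]` is assumed not to depend on `X`, which is what lets it move above the exit test.

```python
def original_step(a, b, c, i, new_a, new_c, x_log):
    b[i] = "old"
    a[i] = new_a[i]
    if a[i] == 0:
        b[i] = "new"        # common case
    else:
        x_log.append(i)     # stands in for X
    c[i] = new_c[i]


def trace_step(a, b, c, i, new_a, new_c, x_log):
    # Trace: the common path scheduled as one straight-line block.
    b[i] = "old"
    a[i] = new_a[i]
    b[i] = "new"            # moved above the test (speculative)
    c[i] = new_c[i]
    if a[i] != 0:           # off-trace exit
        # Repair code, label A in the slide.
        b[i] = "old"        # restore old b[i]
        x_log.append(i)     # X


def run(step, new_a, new_c):
    n = len(new_a)
    a, b, c, x_log = [None] * n, [None] * n, [None] * n, []
    for i in range(n):
        step(a, b, c, i, new_a, new_c, x_log)
    return a, b, c, x_log
```

Calling `run(original_step, ...)` and `run(trace_step, ...)` on the same inputs yields identical state; the trace version pays extra work only on the uncommon path.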




#### Predicated Instructions (Section H.4)

Used to convert control dependence to data dependence

Instruction executed based on a predicate (or guard or condition)

Example

    if (condition) then {
        A = B;
        ...
    }

Convert to:

    R1 <- result of condition evaluation
    A = B predicated on R1
    ...

Hardware can schedule instructions across the branch

Alpha, MIPS, PowerPC, SPARC V9, x86 (Pentium) have conditional moves

IA-64 has general predication - 64 1-bit predicate registers

Limitations

- Takes a clock even if annulled
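The conversion from control dependence to data dependence can be sketched with a branch-free select. This is an illustration only, assuming integer operands: `branchy` is the original conditional assignment and `predicated` computes the same value without an `if`, in the style of a conditional move. Both function names and the mask trick are assumptions of this sketch, not any particular ISA's semantics.

```python
def branchy(a, b, condition):
    """Original form: the assignment is control dependent on the branch."""
    if condition:
        a = b
    return a


def predicated(a, b, condition):
    """If-converted form: the result is data dependent on r1 instead."""
    r1 = int(bool(condition))   # R1 <- result of condition evaluation
    mask = -r1                  # all ones if r1 is 1, zero otherwise
    # Always executes; keeps a when the predicate is false,
    # which corresponds to the annulled case.
    return (b & mask) | (a & ~mask)
```

Note that `predicated` does the same amount of work whether or not the predicate holds, which mirrors the limitation listed above: the instruction takes a clock even if annulled.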


If the condition is false, a predicated instruction writes no result and raises no exceptions

#### Hardware Support for Compiler Speculation (Section H.5)

Successful compiler scheduling requires

- Preservation of exception behavior on speculation
- Mechanism to speculatively reorder memory operations




#### Poison Bits

Hardware support

- A poison bit for each register
- A speculation bit for each instruction

Behavior

- If a speculative instruction sees an exception, it sets poison bit of destination
- If a speculative instruction sees poison bit set for source, it propagates poison bit to its destination
- If normal instruction sees poison bit for source, takes exception
- Normal instruction resets poison bit of destination register
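The rules above can be exercised with a toy register file. The sketch below is a minimal model under assumptions, not a description of any real machine: `PoisonRegisterFile`, `SpeculationFault`, and `execute` are hypothetical names, a Python `ArithmeticError` stands in for a hardware exception, and a speculative instruction that completes without a fault leaves the destination's poison bit unchanged because the slide only says that normal instructions reset it.

```python
import operator


class SpeculationFault(Exception):
    """Raised when a normal instruction reads a poisoned source."""


class PoisonRegisterFile:
    def __init__(self, n):
        self.value = [0] * n
        self.poison = [False] * n   # one poison bit per register

    def execute(self, op, dest, srcs, speculative):
        # 'speculative' models the per-instruction speculation bit.
        if any(self.poison[s] for s in srcs):
            if speculative:
                self.poison[dest] = True    # propagate poison
                return
            raise SpeculationFault("poisoned source register")
        try:
            result = op(*(self.value[s] for s in srcs))
        except ArithmeticError:
            if speculative:
                self.poison[dest] = True    # defer the exception
                return
            raise
        self.value[dest] = result
        if not speculative:
            self.poison[dest] = False       # normal write resets poison


rf = PoisonRegisterFile(4)
rf.value[1], rf.value[2] = 1, 0
rf.execute(operator.floordiv, 3, (1, 2), speculative=True)  # divide by zero, deferred
rf.execute(operator.add, 0, (3, 1), speculative=True)       # poison propagates to r0
```

After these two speculative instructions, registers 3 and 0 are poisoned and no exception has been raised; the exception surfaces only if a normal instruction later reads one of them.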


