Handout #10 Winter 2003

Lecture #8: Lab 3 EE183 Pipelined Processor

> Kunle Olukotun Stanford EE183 February 3, 2003

# Lab Stuff

- Lab #2 due Friday at 6pm
- I'll be in the lab at 5pm or so for demos.
- Any questions?

# System-on-Chip (SoC) Design Challenges

- More than 100,000 gates/mm<sup>2</sup>
- Verification is a bottleneck
  - Increasing complexity
  - >70% of resources on verification
- Cost of bugs rising
  - \$1 million mask cost
  - Extra design time
- Shorter time to market
  - Markets change
  - Standards change
- Need to increase designer productivity

## RTL Design



- Datapath
  - Fixed purpose
  - consumes most of gates
- Control
  - Datapath sequencing
  - Error conditions
  - Communication
  - Control consumes most of verification time



### Processor-Based SoC

- Processor controls hardwired functions
- Memory level communication
  - Memory mapped I/O
  - VGA example
- More software development
- Added flexibility
- Decreased design time
- Adapt to changing standards
- Embedded processors
  - MIPS, ARM, PowerPC

# Application-Specific Instruction-Set Processor (ASIP)



- Specialized datapath (FUs)
- Register level communication
- ISA changes
- Advantages
  - Raises design abstraction: C instead of Verilog
  - Eliminates most control logic
  - Added flexibility
- Configurable processors
  - Tensilica
  - ARC
- Final project

## **Processor Overview**

- 12-bit RISC Microcontroller
  - What would having only 8 bits mean for addressing memory?
- 4 or 8 General Purpose Registers
  - 4 back in the days when we had a small FPGA :-)
- 43 Instructions
- 3 operand instructions
- 4 stage pipeline
- Register indirect addressing mode
  - What does this mean?

## Why Design this Processor?

- Complex enough to be "interesting"
- Simple enough to complete in 2 weeks
- Pipelining is an important technique in digital design
- *Exciting!* Tell your friends and look cool at dinner parties

# Processor Overview



# Instruction Set Architecture (ISA)

- 8 General Purpose Registers
- ALU Instructions
  - 28 Instructions
  - 3 operands
- Control Transfer Instructions
  - 12 Instructions
  - Conditional/Unconditional branches
- Memory Instructions
  - 2 instructions
  - Load/Store

### ALU Instructions I

#### **ALU Instruction format**



#### Arithmetic

| OP (hex) | Operation     | Mnemonic       |
|----------|---------------|----------------|
| 00       | C = A + B     | ADD C, A, B    |
| 01       | C = A + B + 1 | ADDINC C, A, B |
| 02       | C = A         | PASSA C, A     |
| 03       | C = A + 1     | INCA C, A      |
| 04       | C = A - B - 1 | SUBDEC C, A, B |
| 05       | C = A - B     | SUB C, A, B    |
| 06       | C = A - 1     | DECA C, A      |
| 07       | C = A         | PASSA C, A     |
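The opcode pattern in the arithmetic table can be read as a single adder: bit 0 of OP acts as the carry-in, and bits 2:1 choose what is added to A (B, zero, NOT B, or all ones). The sketch below models that reading. It is an observation drawn from the table, not necessarily how the lab's Verilog is organized, and the 12-bit `WIDTH` is an assumption taken from the "12-bit RISC" description.

```python
WIDTH = 12                 # assumed data width (handout: "12-bit RISC")
MASK = (1 << WIDTH) - 1

def arith(op, a, b):
    """Model arithmetic opcodes 0x00-0x07 as A + f(B) + carry_in.

    Assumption read off the opcode table: OP[0] is the carry-in and
    OP[2:1] selects the second adder input. Returns (result, carry_out).
    """
    if not 0x00 <= op <= 0x07:
        raise ValueError("not an arithmetic opcode")
    carry_in = op & 1
    select = (op >> 1) & 0b11
    second = (b & MASK, 0, ~b & MASK, MASK)[select]   # B, 0, NOT B, all ones
    total = (a & MASK) + second + carry_in
    return total & MASK, total >> WIDTH

MNEMONICS = ["ADD", "ADDINC", "PASSA", "INCA", "SUBDEC", "SUB", "DECA", "PASSA"]

if __name__ == "__main__":
    for op, name in enumerate(MNEMONICS):
        result, carry = arith(op, 5, 3)
        print(f"{op:02X} {name:7s} A=5 B=3 -> C={result} carry={carry}")
```

Under this model the two PASSA encodings fall out naturally: 02 is A + 0 + 0 and 07 is A + (all ones) + 1.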

### ALU Instructions II

#### **Shift Instructions**

| OP (hex) | Operation                 | Mnemonic |
|----------|---------------------------|----------|
| 08       | C = Logical Shift Left(A) | LSL C, A |
| 09       | C = Arith Shift Right(A)  | ASR C, A |

#### **Boolean Instructions**

In this table, $\bullet$ is AND, $+$ is OR, $\oplus$ is XOR, and $'$ is NOT.

| OP (hex) | Operation           | Mnemonic        |
|----------|---------------------|-----------------|
| 10       | $C = 0$             | ZEROS C         |
| 11       | $C = A \bullet B$   | AND C, A, B     |
| 12       | $C = A' \bullet B$  | ANDNOTA C, A, B |
| 13       | $C = B$             | PASSB C, B      |
| 14       | $C = A \bullet B'$  | ANDNOTB C, A, B |
| 15       | $C = A$             | PASSA C, A      |
| 16       | $C = A \oplus B$    | XOR C, A, B     |
| 17       | $C = A + B$         | OR C, A, B      |
| 18       | $C = A' \bullet B'$ | NOR C, A, B     |
| 19       | $C = A \oplus B'$   | XNOR C, A, B    |
| 1A       | $C = A'$            | PASSNOTA C, A   |
| 1B       | $C = A' + B$        | ORNOTA C, A, B  |
| 1C       | $C = B'$            | PASSNOTB C, B   |
| 1D       | $C = A + B'$        | ORNOTB C, A, B  |
| 1E       | $C = A' + B'$       | NAND C, A, B    |
| 1F       | $C = 1$             | ONES C          |
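The Boolean opcodes appear to be ordered so that the low four bits of OP form a truth table for the function. The sketch below models that reading as a sum of minterms. It is an interpretation of the table, not necessarily how the provided Verilog works, and the 12-bit `WIDTH` is an assumption.

```python
WIDTH = 12                 # assumed data width
MASK = (1 << WIDTH) - 1

def boolean(op, a, b):
    """Model Boolean opcodes 0x10-0x1F with the low nibble as a truth table.

    Assumption read off the opcode table: for each bit position,
    OP[0] is the output when (A, B) = (1, 1), OP[1] when (0, 1),
    OP[2] when (1, 0), and OP[3] when (0, 0).
    """
    if not 0x10 <= op <= 0x1F:
        raise ValueError("not a Boolean opcode")
    na, nb = ~a & MASK, ~b & MASK
    terms = (a & b, na & b, a & nb, na & nb)     # minterms for OP[0..3]
    result = 0
    for bit, term in enumerate(terms):
        if (op >> bit) & 1:
            result |= term
    return result & MASK

if __name__ == "__main__":
    a, b = 0b1100, 0b1010
    for op in range(0x10, 0x20):
        print(f"{op:02X} -> {boolean(op, a, b):012b}")
```

With A = 1100 and B = 1010 (binary), the low four bits of each result reproduce the low nibble of the opcode read as a truth table, which is a quick way to check the table row by row.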

### Literal Instruction

| OP (hex) | Operation   | Mnemonic             |
|----------|-------------|----------------------|
| 02       | C = literal | LOADLIT *C, literal* |

#### **Old literal Instruction format**



#### New literal Instruction format



### **Control Transfer Instructions**

- Extra points for making conditional branches PC-relative
  - Requires assembler changes

#### Conditional and unconditional jumps with absolute addresses

#### Instruction format

Conditional jump:

| Bits  | 15-14 | 13-12 | 11-8 | 7-0          |
|-------|-------|-------|------|--------------|
| Field | 0 0   | OP    | COND | JUMP ADDRESS |

Unconditional jump:

| Bits  | 15-14 | 13-12 | 11-0         |
|-------|-------|-------|--------------|
| Field | 0 0   | OP    | JUMP ADDRESS |

#### Instructions

| OP (bin) | Operation    | Mnemonic    |
|----------|--------------|-------------|
| 00       | Jump False   | JF.cond JPC |
| 01       | Jump True    | JT.cond JPC |
| 10       | Uncond. Jump | J JPC       |

| COND (bin) | Condition                   | Mnemonic |
|------------|-----------------------------|----------|
| 0100       | ALU result negative         | .NEG     |
| 0101       | ALU result zero             | .ZERO    |
| 0110       | ALU carry                   | .CARRY   |
| 0111       | ALU result negative or zero | .NEGZERO |
| 0000       | TRUE                        | .TRUE    |
| 1000       | External condition          | .EXT     |

- Could add up to 8 external conditions

Condition codes are only set by ALU instructions
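As a rough illustration of the jump formats above, the following sketch packs a mnemonic and an absolute address into a 16-bit word. The function name `encode_jump` and the dictionaries are hypothetical helpers for this example and are not part of ASM183. Field positions follow the format tables: bits 15-14 are zero, OP sits in bits 13-12, COND in bits 11-8, with an 8-bit address for conditional jumps and a 12-bit address for unconditional ones.

```python
OPS = {"JF": 0b00, "JT": 0b01, "J": 0b10}
CONDS = {"TRUE": 0b0000, "NEG": 0b0100, "ZERO": 0b0101,
         "CARRY": 0b0110, "NEGZERO": 0b0111, "EXT": 0b1000}

def encode_jump(mnemonic, address):
    """Encode 'JF.cond', 'JT.cond' or 'J' with an absolute address."""
    name, _, cond = mnemonic.partition(".")
    op = OPS[name]
    if name == "J":
        if cond:
            raise ValueError("unconditional jump takes no condition")
        if not 0 <= address < (1 << 12):
            raise ValueError("address must fit in 12 bits")
        return (op << 12) | address
    if not 0 <= address < (1 << 8):
        raise ValueError("address must fit in 8 bits")
    return (op << 12) | (CONDS[cond] << 8) | address

if __name__ == "__main__":
    for text, target in [("JF.TRUE", 0x00), ("JT.ZERO", 0x11), ("J", 0x123)]:
        print(f"{text:8s} {target:#05x} -> {encode_jump(text, target):#06x}")
```

Note that `JF.TRUE` with address 0 encodes to an all-zero word under this layout, since every field is zero.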

### Memory Instructions

### Register indirect addressing mode

#### Instruction format



#### Instructions

| OP (hex) | Operation  | Mnemonic   |
|----------|------------|------------|
| 08       | C = Mem[A] | LOAD C, A  |
| 10       | Mem[A] = B | STORE A, B |

### **NOP** Instruction

- NOP instructions are useful in pipelined processors
- Many different NOP instruction encodings are possible
- NOP = Jump False on condition TRUE: JF.TRUE 0x00



# I/O Devices I

- VGA
  - Memory mapped I/O
  - Pick two addresses in processor address space
    - Addr1 = BRAM Address
    - Addr2 = BRAM data

| Instruction    | Comment                 |
|----------------|-------------------------|
| STORE Addr1, X | Use X as BRAM location  |
| LOAD A, Addr2  | Load from BRAM location |
| STORE Addr2, A | Store to BRAM location  |

- Use VGA
  - Manipulate a simple polygon on the display
    - Flash a square on and off
    - move a square from side to side
  - Design needs to run at 25 MHz or faster
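The two-address BRAM scheme above can be sketched as a small memory model: a store to the first address latches a BRAM location, and loads or stores at the second address then access that location. The class name, the port addresses `0xFFE` and `0xFFF`, the BRAM size, and the read-back behavior of the address port are all assumptions made for illustration. The handout leaves those choices to you.

```python
class MemoryMappedBram:
    """Toy model of a BRAM reached through two memory-mapped addresses."""

    def __init__(self, addr_port, data_port, bram_size=256):
        self.addr_port = addr_port      # "Addr1": latches the BRAM address
        self.data_port = data_port      # "Addr2": reads/writes BRAM data
        self.bram = [0] * bram_size
        self.bram_address = 0
        self.ram = {}                   # ordinary data memory

    def store(self, address, value):    # STORE A, B: Mem[A] = B
        if address == self.addr_port:
            # Wrapping out-of-range locations is an assumption of this model.
            self.bram_address = value % len(self.bram)
        elif address == self.data_port:
            self.bram[self.bram_address] = value
        else:
            self.ram[address] = value

    def load(self, address):            # LOAD C, A: C = Mem[A]
        if address == self.data_port:
            return self.bram[self.bram_address]
        if address == self.addr_port:
            return self.bram_address    # assumed read-back behavior
        return self.ram.get(address, 0)

if __name__ == "__main__":
    bus = MemoryMappedBram(addr_port=0xFFE, data_port=0xFFF)
    bus.store(0xFFE, 7)          # select BRAM location 7
    bus.store(0xFFF, 0xABC)      # write BRAM data at that location
    print(hex(bus.load(0xFFF)))  # read it back through the data port
```

The cost of this scheme is one extra store per BRAM access whenever the location changes, in exchange for using only two addresses of the processor's address space.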

# I/O Devices II

- Connect free running counter to EXT
  - Choose 1 bit or multiple bits
  - Extra: use memory mapped I/O to reset
- Use EXT in timing loops

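| Label  | Instruction             | Comment                   |
|--------|-------------------------|---------------------------|
| \_LAB1 | JF.EXT \_LAB1           | spin waiting for posedge  |
|        | < Body of timing loop > |                           |
| \_LAB2 | JT.EXT \_LAB2           | spin waiting for negedge  |
|        | J \_LAB1                |                           |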
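To see how often the body of this timing loop runs, here is a simplified cycle-level model. It assumes EXT is one bit of a counter that increments every cycle, that every jump takes exactly one cycle, and that pipeline effects can be ignored. None of these are guaranteed by the real design, and the function name and parameters are hypothetical.

```python
def run_timing_loop(ext_bit, body_cycles, total_cycles):
    """Return the cycle numbers at which the loop body starts.

    Simplified model (assumptions): EXT is bit `ext_bit` of a counter that
    increments every cycle, each jump takes one cycle, and the body takes
    `body_cycles` cycles. Pipeline effects are ignored.
    """
    def ext(cycle):
        return (cycle >> ext_bit) & 1

    starts = []
    cycle = 0
    while cycle < total_cycles:
        while cycle < total_cycles and not ext(cycle):   # _LAB1: JF.EXT _LAB1
            cycle += 1
        if cycle >= total_cycles:
            break
        cycle += 1                      # the JF.EXT that falls through
        starts.append(cycle)
        cycle += body_cycles            # < body of timing loop >
        while cycle < total_cycles and ext(cycle):       # _LAB2: JT.EXT _LAB2
            cycle += 1
        cycle += 2                      # falling-through JT.EXT, then J _LAB1
    return starts

if __name__ == "__main__":
    print(run_timing_loop(ext_bit=3, body_cycles=2, total_cycles=64))
    # prints [9, 25, 41, 57]
```

In this model the body runs once per period of the chosen counter bit (16 cycles for bit 3), provided the body is shorter than that period. A body longer than the period misses edges, which is one reason to choose the counter bit with care.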

### Instruction Execution Steps

- 4-step sequence
  - Step I: Fetch instruction from Instruction Memory
  - Step R: Read operands from registers (A, B)
  - Step E: Execute instruction, set condition codes
  - Step W: Write results to register C
- One stage per step
- Each instruction goes through all four stages
  - Assume each stage takes one clock cycle

| Cycle number    | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|-----------------|---|---|---|---|---|---|---|---|---|----|----|----|
| Instruction i   | I | R | E | W |   |   |   |   |   |    |    |    |
| Instruction i+1 |   |   |   |   | I | R | E | W |   |    |    |    |
| Instruction i+2 |   |   |   |   |   |   |   |   | I | R  | E  | W  |

## Pipeline

| Cycle number    | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|-----------------|---|---|---|---|---|---|---|---|---|----|----|----|
| Instruction i   | I | R | E | W |   |   |   |   |   |    |    |    |
| Instruction i+1 |   | I | R | E | W |   |   |   |   |    |    |    |
| Instruction i+2 |   |   | I | R | E | W |   |   |   |    |    |    |
| Instruction i+3 |   |   |   | I | R | E | W |   |   |    |    |    |
| Instruction i+4 |   |   |   |   | I | R | E | W |   |    |    |    |
| Instruction i+5 |   |   |   |   |   | I | R | E | W |    |    |    |
| Instruction i+6 |   |   |   |   |   |   | I | R | E | W  |    |    |
| Instruction i+7 |   |   |   |   |   |   |   | I | R | E  | W  |    |
| Instruction i+8 |   |   |   |   |   |   |   |   | I | R  | E  | W  |
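The two diagrams can be regenerated with a few lines of code, which also makes the cycle counts explicit: without pipelining, N instructions take 4N cycles, while an ideal pipeline takes N + 3. This sketch assumes one cycle per stage and no stalls. The function names are hypothetical.

```python
STAGES = "IREW"

def schedule(num_instructions, pipelined):
    """Return, per instruction, the 1-based cycle in which each stage runs."""
    step = 1 if pipelined else len(STAGES)
    return [
        {stage: start * step + offset + 1 for offset, stage in enumerate(STAGES)}
        for start in range(num_instructions)
    ]

def total_cycles(num_instructions, pipelined):
    """Cycle in which the last instruction finishes its W stage."""
    return schedule(num_instructions, pipelined)[-1]["W"]

def render(num_instructions, pipelined):
    """Draw the schedule as rows of text, one column per cycle."""
    width = total_cycles(num_instructions, pipelined)
    rows = []
    for stages in schedule(num_instructions, pipelined):
        cells = ["."] * width
        for stage, cycle in stages.items():
            cells[cycle - 1] = stage
        rows.append(" ".join(cells))
    return rows

if __name__ == "__main__":
    print("\n".join(render(3, pipelined=False)))
    print("\n".join(render(9, pipelined=True)))
```

In the same 12 cycles, the unpipelined machine completes 3 instructions and the pipelined one completes 9.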



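## Bypassing/Forwarding

- Given the following code fragment:
  - ADD R1, R2, R3
  - SUB R4, R1, R5
  - XXX
  - YYY
- What's going on in the pipeline?

| Cycle number   | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|----------------|---|---|---|---|---|---|---|
| ADD R1, R2, R3 | I | R | E | W |   |   |   |
| SUB R4, R1, R5 |   | I | R | E | W |   |   |
| XXX            |   |   | I | R | E | W |   |
| YYY            |   |   |   | I | R | E | W |
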
- How many different types of data hazards are there?
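One way to reason about the fragment above is to compare the cycle in which a consumer reads a register (stage R) with the cycle in which the producer writes it (stage W). The sketch below flags read-after-write cases that would see a stale value without bypassing. It assumes no stalls, destination-first operand order as in the ALU mnemonics, and that a value written in W becomes readable only in a later cycle. That last point depends on the register file design, so treat it as an assumption.

```python
STAGES = "IREW"

def parse(line):
    """Split 'ADD R1, R2, R3' into (mnemonic, destination, sources)."""
    parts = line.split(None, 1)
    registers = [r.strip() for r in parts[1].split(",")] if len(parts) > 1 else []
    dest = registers[0] if registers else None
    return parts[0], dest, registers[1:]

def raw_hazards(program):
    """List (producer index, consumer index, register) needing a bypass.

    Assumptions: instruction k (0-based) is in stage s during cycle
    k + s + 1 with no stalls, and a register written in stage W cannot
    be read correctly in stage R until a later cycle.
    """
    read_offset = STAGES.index("R")
    write_offset = STAGES.index("W")
    decoded = [parse(line) for line in program]
    hazards = []
    for consumer, (_, _, sources) in enumerate(decoded):
        for producer in range(consumer):
            dest = decoded[producer][1]
            read_cycle = consumer + read_offset + 1
            write_cycle = producer + write_offset + 1
            if dest in sources and read_cycle <= write_cycle:
                hazards.append((producer, consumer, dest))
    return hazards

if __name__ == "__main__":
    fragment = ["ADD R1, R2, R3", "SUB R4, R1, R5", "XXX", "YYY"]
    print(raw_hazards(fragment))   # prints [(0, 1, 'R1')]
```

Here SUB reads R1 in cycle 3 while ADD does not write it until cycle 4, so the ADD result has to be forwarded. Under these assumptions, a consumer one or two instructions behind its producer needs the bypass. The sketch reports every earlier writer of a register rather than only the most recent one.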

# Control Transfer

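- Code fragment

| Address | Label   | Instruction     |
|---------|---------|-----------------|
| 00      |         | ADD R1, R2, R3  |
| 01      |         | JT.ZERO \_taken |
| 02      |         | SUB R4, R5, R6  |
| 03      |         | AND R7, R8, R1  |
| 11      | \_taken | NOR R7, R8, R1  |

- Branch Taken vs. Branch Not Taken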

| Cycle number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | Cycle number | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|--------------|---|---|---|---|---|---|---|--------------|---|---|---|---|---|---|---|
|              | I | R | E | W |   |   |   |              | I | R | E | W |   |   |   |
|              |   | I | R | E | W |   |   |              |   | I | R | E | W |   |   |
|              |   |   | I | R | E | W |   |              |   |   | I | R | E | W |   |
|              |   |   |   | I | R | E | W |              |   |   |   | I | R | E | W |
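At the ISA level, the two cases differ only in which addresses are visited after the jump. The sketch below steps through the code fragment and records the program counter. It deliberately ignores pipeline timing, so it says nothing about which fetched instructions must be discarded when the branch is taken. The 12-bit register width, the dictionary-based register file, and treating the listed addresses as plain integers are assumptions made for illustration.

```python
MASK = 0xFFF   # assumed 12-bit registers

ALU = {
    "ADD": lambda a, b: a + b,
    "SUB": lambda a, b: a - b,
    "AND": lambda a, b: a & b,
    "NOR": lambda a, b: ~(a | b),
}

# The code fragment, keyed by address. ALU operands are (C, A, B).
PROGRAM = {
    0: ("ADD", (1, 2, 3)),
    1: ("JT.ZERO", (11,)),
    2: ("SUB", (4, 5, 6)),
    3: ("AND", (7, 8, 1)),
    11: ("NOR", (7, 8, 1)),
}

def run(program, regs, max_steps=20):
    """Execute until the PC leaves the program; return visited addresses."""
    pc, zero, trace = 0, False, []
    while pc in program and len(trace) < max_steps:
        trace.append(pc)
        op, args = program[pc]
        if op == "JT.ZERO":             # jumps do not change condition codes
            pc = args[0] if zero else pc + 1
            continue
        c, a, b = args
        regs[c] = ALU[op](regs[a], regs[b]) & MASK
        zero = regs[c] == 0             # only ALU instructions set the flag
        pc += 1
    return trace

def fresh_registers(**values):
    """Registers R1..R8 as named in the fragment, zero unless overridden."""
    regs = {n: 0 for n in range(1, 9)}
    regs.update({int(name[1:]): v for name, v in values.items()})
    return regs

if __name__ == "__main__":
    print(run(PROGRAM, fresh_registers(R2=5, R3=0xFFB)))  # sum wraps to zero
    print(run(PROGRAM, fresh_registers(R2=5, R3=1)))      # sum is nonzero
```

With R2 = 5 and R3 = 0xFFB the ADD result wraps to zero, so the jump is taken and the trace is 0, 1, 11. With R3 = 1 the jump falls through and the trace is 0, 1, 2, 3.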

# ASM183 (Assembler)



## What do you get?

- Lab 3 Verilog
  - A lot of Verilog is provided
  - Look through ALL of it
    - Some files are not instantiated in the Lab 3 schematic, e.g. **boolean.v**
- ASM183
  - Perl assembler
  - Perl handout
    - How many already know Perl?