# Lecture 8: More Pipelining

David Black-Schaffer davidbbs@stanford.edu EE183 Spring 2003

#### Overview

- Getting Started with Lab 2
  - Just get a single pixel calculating at one time
  - Then look into filling your pipeline
- Multipliers
  - Different options for pipelining: what do you need?
  - 3 multipliers, or put x\*x, y\*y, and x\*y through one multiplier sequentially?
- Pipelining
  - If it won't fit in one clock cycle, you have to divide it up so each stage fits
  - The control logic must be designed with this in mind
  - Make sure you need it

EE183 Lecture 8 - Slide 2
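"Divide it up so each stage will fit" can be sketched as a greedy packing problem. This is a minimal sketch, assuming you already have a per-operation delay estimate for a chain of operations. The delays below are made-up illustration numbers, not Xilinx timing data. The 20 ns period assumes the 50 MHz clock implied by the `50e6` in the Lab 2 hint. `partition_into_stages` is a hypothetical helper: it packs consecutive operations into a stage until the next one would overflow the clock period, then places a pipe register there.

```python
def partition_into_stages(delays_ns, period_ns):
    """Greedily group a chain of combinational delays into pipeline stages.

    Consecutive operations are packed into the current stage until adding the
    next one would exceed the clock period; then a pipe register closes the
    stage and a new one starts. Returns a list of stages (lists of delays).
    """
    stages, current, current_total = [], [], 0.0
    for d in delays_ns:
        if d > period_ns:
            # A single operation slower than the clock must itself be pipelined
            # (e.g., a multiplier with internal pipeline stages).
            raise ValueError(f"{d} ns operation cannot fit in a {period_ns} ns cycle")
        if current_total + d > period_ns:
            stages.append(current)          # insert a pipe register here
            current, current_total = [], 0.0
        current.append(d)
        current_total += d
    if current:
        stages.append(current)
    return stages


if __name__ == "__main__":
    # Hypothetical delays (ns) for a chain of operations; 20 ns = 50 MHz clock.
    print(partition_into_stages([8, 7, 9, 6, 5], 20.0))
```

In a real design the delays come from the synthesis and timing reports, and you will usually iterate on where the registers go.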

## **Public Service Announcement**

- Xilinx Programmable World
  - Tuesday, May 6th
  - http://www.xilinx.com/events/pw2003/index.htm
- Guest Lectures
  - Monday, April 28th
    Ryan Donohue on Metastability and Synchronization
  - Wednesday, May 7th
    Gary Spivey on ASIC & FPGA Design for Speed
  - The content of these lectures will be on the Quiz


#### Logistics

- Lab 2 Prelab due Friday by 5pm
- Guest lecture next Monday: Synchronization and Metastability
  - These topics are critical for high-speed systems and for any design that crosses clock domains.

#### SHOW UP! (please)













## Key points on Pipelining

- Increases utilization for operators
  - You can do multiple calculations at once, so (ideally) every operator is busy every cycle
  - This is the point! Store the intermediate results of smaller calculations in registers so a new calculation can start every cycle, which raises overall throughput.
- Insert the next data item into the datapath before the previous one has finished
- The pipe registers keep the computations separate
  - You will have a lot of pipe registers if you're doing a lot of calculations (e.g., Lab 3!)
- What is the effect of the algorithm feeding back on itself?
  - Do all points have the same number of iterations? (*control*)
  - Is there a data dependency between pipeline stages? (*hazards*)
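The way pipe registers let the next data item enter before the previous one finishes can be modeled cycle by cycle. This is a software sketch, not hardware. `run_pipeline` and the toy stage functions are hypothetical. Each list slot stands for the pipe register at the output of one stage, and `None` stands for a bubble. All registers update together on each "clock edge", so N items pass through S stages in N + S - 1 cycles instead of N × S.

```python
from collections import deque


def run_pipeline(inputs, stage_funcs):
    """Cycle-by-cycle model of a linear pipeline with one pipe register per stage.

    regs[i] holds the value latched at the output of stage i (None = bubble).
    On every clock edge all stages update together, so a new input can enter
    stage 0 while older inputs are still moving through later stages.
    Returns (outputs, cycles_used). Needs at least one stage; stage functions
    must not return None.
    """
    regs = [None] * len(stage_funcs)
    pending = deque(inputs)
    outputs = []
    cycles = 0
    while len(outputs) < len(inputs):
        feed = pending.popleft() if pending else None
        # Compute every next-state value from the *current* registers.
        nxt = []
        for i, f in enumerate(stage_funcs):
            src = feed if i == 0 else regs[i - 1]
            nxt.append(None if src is None else f(src))
        regs = nxt
        cycles += 1
        if regs[-1] is not None:
            outputs.append(regs[-1])
    return outputs, cycles


if __name__ == "__main__":
    stages = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
    out, cycles = run_pipeline([1, 2, 3, 4], stages)
    print(out, cycles)   # 4 items through 3 stages: 4 + 3 - 1 = 6 cycles
```

This model has no feedback. Once results loop back for another iteration, as in the Julia calculation, the control and hazard questions above apply.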


## Multipliers

- CoreGen gives you several pipelining options
- Which is best?
  - Depends on your design
  - Data size will determine speed and pipelining
- Design is an iterative process, so you probably won't choose the best approach on the first try (i.e., get started early!)



- Multipliers are BIG
  - How can we get away with fewer multipliers?
- Multipliers may be SLOW
  - How can we utilize them maximally?
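One reason the multiplier count can stay low: a Julia iteration z ← z² + c needs only three real products, because (x + iy)² = (x·x − y·y) + i(2·x·y), and the factor of 2 is a shift. The sketch below assumes a signed fixed-point format with `frac_bits` fractional bits. That format and the name `julia_step` are illustrative assumptions, not the lab's required bit widths.

```python
def julia_step(x, y, cx, cy, frac_bits):
    """One iteration of z <- z*z + c in signed fixed point.

    z = x + i*y and c = cx + i*cy are integers scaled by 2**frac_bits.
    (x + iy)^2 = (x*x - y*y) + i*(2*x*y), so only three real products are
    needed; the factor of 2 is a left shift, not a fourth multiplier.
    Python's >> floors negative values, like an arithmetic right shift.
    """
    xx = (x * x) >> frac_bits      # product 1
    yy = (y * y) >> frac_bits      # product 2
    xy = (x * y) >> frac_bits      # product 3
    new_x = xx - yy + cx
    new_y = (xy << 1) + cy         # 2*x*y via shift
    return new_x, new_y


if __name__ == "__main__":
    # 8 fractional bits: 128 represents 0.5, so z = 0.5 + 0.5i and c = 0.
    print(julia_step(128, 128, 0, 0, 8))   # (0.5 + 0.5i)^2 = 0.5i
```

The usual escape test |z|² = x·x + y·y > 4 can reuse products 1 and 2, so it does not need another multiplier either.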



## Now we have...

- With a 3-stage multiplier plus the 2 stages after it, you've now got 5 pipeline stages
- How can you keep the pipeline full?
- How many things do you need to calculate at once?
- What is full? Will you ever get 100% utilization? What is good enough?
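One way to answer "how many things do you need to calculate at once?" is to simulate the 5-stage loop with several pixels interleaved. This is a sketch under simplifying assumptions, all of them mine rather than from the slides:

- No pixel finishes during the window.
- A result can re-enter stage 1 on the same clock edge it leaves the last stage.
- Waiting pixels are served first-in, first-out.

`feedback_utilization` is a hypothetical helper.

```python
from collections import deque


def feedback_utilization(depth, pixels_in_flight, cycles):
    """Fraction of stage-cycles doing useful work when results feed back.

    Simplifying assumptions: no pixel finishes during the window, and a pixel
    can re-enter stage 1 on the same clock edge its previous iteration leaves
    the last stage (no extra feedback delay).
    """
    stages = [None] * depth                 # pixel id held in each stage
    ready = deque(range(pixels_in_flight))  # pixels waiting to enter stage 1
    busy = 0
    for _ in range(cycles):
        leaving = stages[-1]
        stages = [None] + stages[:-1]       # every pipe register advances
        if leaving is not None:
            ready.append(leaving)           # feedback for the next iteration
        if ready:
            stages[0] = ready.popleft()
        busy += sum(s is not None for s in stages)
    return busy / (depth * cycles)


if __name__ == "__main__":
    for p in (1, 2, 3, 5, 8):
        print(p, "pixel(s):", round(feedback_utilization(5, p, 1000), 3))
```

Under these assumptions, steady-state utilization is about min(P, 5)/5. One pixel at a time uses 20% of the stages, and you need as many pixels in flight as there are pipeline stages to keep it full. Real control logic must also refill a slot when a pixel finishes early.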






#### **Pipeline Performance Analysis**

- With the bad data path (three 3-stage multipliers and 2 stages after that; multiple pixels at a time)

*(Figure: occupancy of each stage (M1a, M2a, M3a, M1b, M2b, M3b, M1c, M2c, M3c, Add1, Add2) over clock cycles 1 to 5)*

- We approach 100% utilization if there are no stalls or dependencies and we can keep getting new data
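A chart like the one on this slide can be regenerated for the no-feedback case. This sketch assumes the following:

- Multipliers a, b, and c work on the same pixel in parallel.
- One new pixel enters every clock.
- Nothing stalls.

It is a reconstruction under those assumptions, not a copy of the original figure. `depth_of`, `occupancy`, and `utilization` are hypothetical helpers.

```python
# Column order from the slide's chart: three 3-stage multipliers (a, b, c)
# working on the same pixel in parallel, then two adder stages.
COLUMNS = ["M1a", "M2a", "M3a", "M1b", "M2b", "M3b",
           "M1c", "M2c", "M3c", "Add1", "Add2"]


def depth_of(column):
    """Pipeline depth of a column: M1x/M2x/M3x -> 0/1/2, Add1 -> 3, Add2 -> 4."""
    if column.startswith("M"):
        return int(column[1]) - 1
    return 2 + int(column[3:])


def occupancy(num_pixels, num_clocks):
    """rows[t][j] = pixel in COLUMNS[j] at clock t+1, or None if that stage is idle.

    Pixel p (0-based) enters the first stage at clock p+1, one new pixel per clock.
    """
    rows = []
    for clk in range(1, num_clocks + 1):
        row = []
        for col in COLUMNS:
            p = clk - 1 - depth_of(col)
            row.append(p if 0 <= p < num_pixels else None)
        rows.append(row)
    return rows


def utilization(num_pixels, num_clocks):
    """Busy stage slots divided by total stage slots over the window."""
    rows = occupancy(num_pixels, num_clocks)
    busy = sum(v is not None for row in rows for v in row)
    return busy / (len(COLUMNS) * num_clocks)


if __name__ == "__main__":
    print("Clk " + " ".join(f"{c:>4}" for c in COLUMNS))
    for clk, row in enumerate(occupancy(3, 7), start=1):
        print(f"{clk:>3} " + " ".join(f"{'.' if v is None else v:>4}" for v in row))
```

If you run N pixels until the last one drains (N + 4 clocks), utilization is N/(N + 4). That is 20% for a single pixel and approaches 100% as N grows, which matches the slide's point about keeping new data coming.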

## What performance is required?

- Replication and pipelining are not trivial to implement, so make sure you need them
- Is either needed for Lab #2?
- How would you tell?
  - Hint: each Julia image takes at most (64\*64\*64\*7\*1/50e6) ≈ 0.0367 s to create.
  - Is this "real-time" enough for an animation?
  - Other issues? Need to meet timing for the VGA.
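The hint's bound is simple arithmetic. In the sketch below, the factors are read as follows:

- a 64 × 64 image
- at most 64 iterations per pixel
- 7 clock cycles per iteration
- a 50 MHz clock

That reading is my interpretation, since the slide does not label the factors. `worst_case_image_time` is a hypothetical helper.

```python
def worst_case_image_time(width, height, max_iterations, cycles_per_iteration, clock_hz):
    """Upper bound on the time to compute one image: every pixel runs max_iterations."""
    total_cycles = width * height * max_iterations * cycles_per_iteration
    return total_cycles / clock_hz


if __name__ == "__main__":
    t = worst_case_image_time(64, 64, 64, 7, 50e6)
    print(f"{t * 1e3:.1f} ms per image, at least {1 / t:.1f} images per second")
```

That is 1,835,008 cycles, or about 36.7 ms, so even the worst case gives roughly 27 images per second with no replication or pipelining. Whether that is "real-time" enough is the question on the slide. The VGA timing constraint is separate and is not captured here.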


#### What do we expect

- The previous data path is terribly inefficient if you only put one pixel through at a time, but doing multiple pixels at once is very complicated
- As an alternative, you can use one multiplier and feed your x\*x, y\*y, and x\*y through it back to back in a pipelined manner.
- What's the efficiency? Is it a good tradeoff for area/speed? This analysis is critical!
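The efficiency question can be worked out with a small model that compares three parallel multipliers against one time-multiplexed multiplier, with a single pixel in flight. This model rests on simplifying assumptions, not anything measured:

- Each multiplier is 3-stage and accepts one new operand pair per cycle.
- There are 2 adder stages.
- The adds start only after the last product leaves the multiplier.

A real design might overlap x\*x − y\*y with the x\*y product. The helper names are hypothetical.

```python
import math


def iteration_cycles(num_multipliers, products=3, mult_latency=3, add_stages=2):
    """Cycles for one Julia iteration with a single pixel in flight.

    Each multiplier accepts one new operand pair per cycle. The adder stages
    start once the last product leaves the multiplier pipeline.
    """
    issue_cycles = math.ceil(products / num_multipliers)
    return (issue_cycles - 1) + mult_latency + add_stages


def multiplier_utilization(num_multipliers, products=3, mult_latency=3, add_stages=2):
    """Fraction of multiplier issue slots that carry a product."""
    cycles = iteration_cycles(num_multipliers, products, mult_latency, add_stages)
    return products / (num_multipliers * cycles)


if __name__ == "__main__":
    for m in (3, 1):
        print(f"{m} multiplier(s): {iteration_cycles(m)} cycles/iteration, "
              f"{multiplier_utilization(m):.0%} multiplier utilization")
```

Under these assumptions, one multiplier takes 7 cycles per iteration instead of 5 (1.4× slower). In exchange it needs one third of the multipliers (plus some input muxing), and each multiplier is busy about 43% of the time instead of 20%.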



#### **Pipeline Performance Conclusions**

- You need to know your algorithm and what tradeoffs you are making
- What do you care about?
  - Speed?
  - Area?
  - Both. (Power is a function of speed and number of transistors, i.e., area.)





