Pipelining. March 17, Howard Huang 1

Relevanta dokument
Foto: Rona Proudfoot (some rights reserved) Datorarkitektur 1. Datapath & Control. December


Support Manual HoistLocatel Electronic Locks

Writing with context. Att skriva med sammanhang

Digitalteknik och Datorarkitektur 5hp

Isometries of the plane

12.6 Heat equation, Wave equation

Information technology Open Document Format for Office Applications (OpenDocument) v1.0 (ISO/IEC 26300:2006, IDT) SWEDISH STANDARDS INSTITUTE

Make a speech. How to make the perfect speech. söndag 6 oktober 13

Kvalitetsarbete I Landstinget i Kalmar län. 24 oktober 2007 Eva Arvidsson

Adding active and blended learning to an introductory mechanics course

Webbregistrering pa kurs och termin

Preschool Kindergarten

EXTERNAL ASSESSMENT SAMPLE TASKS SWEDISH BREAKTHROUGH LSPSWEB/0Y09

Viktig information för transmittrar med option /A1 Gold-Plated Diaphragm

Boiler with heatpump / Värmepumpsberedare

Digitala System: Datorteknik ERIK LARSSON

Workplan Food. Spring term 2016 Year 7. Name:

Beijer Electronics AB 2000, MA00336A,

Discovering!!!!! Swedish ÅÄÖ. EPISODE 6 Norrlänningar and numbers Misi.se

6 th Grade English October 6-10, 2014

2 Uppgifter. Uppgifter. Svaren börjar på sidan 35. Uppgift 1. Steg 1. Problem 1 : 2. Problem 1 : 3

Om oss DET PERFEKTA KOMPLEMENTET THE PERFECT COMPLETION 04 EN BINZ ÄR PRECIS SÅ BRA SOM DU FÖRVÄNTAR DIG A BINZ IS JUST AS GOOD AS YOU THINK 05

Webbreg öppen: 26/ /

Protokoll Föreningsutskottet

English. Things to remember

BOENDEFORMENS BETYDELSE FÖR ASYLSÖKANDES INTEGRATION Lina Sandström

Consumer attitudes regarding durability and labelling

FÖRBERED UNDERLAG FÖR BEDÖMNING SÅ HÄR

Styrteknik: Binära tal, talsystem och koder D3:1

Starla juldekoration / christmas decoration

Samverkan på departementsnivå om Agenda 2030 och minskade hälsoklyftor

1. Compute the following matrix: (2 p) 2. Compute the determinant of the following matrix: (2 p)

Problem som kan uppkomma vid registrering av ansökan

SVENSK STANDARD SS :2010

Provlektion Just Stuff B Textbook Just Stuff B Workbook

Module 1: Functions, Limits, Continuity

CHANGE WITH THE BRAIN IN MIND. Frukostseminarium 11 oktober 2018

Stad + Data = Makt. Kart/GIS-dag SamGIS Skåne 6 december 2017

Surfaces for sports areas Determination of vertical deformation. Golvmaterial Sportbeläggningar Bestämning av vertikal deformation

Kursutvärderare: IT-kansliet/Christina Waller. General opinions: 1. What is your general feeling about the course? Antal svar: 17 Medelvärde: 2.

Module 6: Integrals and applications

To Lauren Beukes Tune: Top of the World Written by Marianna Leikomaa

Andy Griffiths Age: 57 Family: Wife Jill, 1 kid Pets: Cats With 1 million SEK he would: Donate to charity and buy ice cream

Ett hållbart boende A sustainable living. Mikael Hassel. Handledare/ Supervisor. Examiner. Katarina Lundeberg/Fredric Benesch

STORSEMINARIET 3. Amplitud. frekvens. frekvens uppgift 9.4 (cylindriskt rör)

c a OP b Digitalteknik och Datorarkitektur 5hp ALU Design Principle 1 - Simplicity favors regularity add $15, $8, $11

Datorarkitekturer med operativsystem ERIK LARSSON

Svenska()(Bruksanvisning(för(handdukstork()(1400(x(250(mm(

Förändrade förväntningar

#minlandsbygd. Landsbygden lever på Instagram. Kul bild! I keep chickens too. They re brilliant.

Datasäkerhet och integritet

Graphs (chapter 14) 1

Isolda Purchase - EDI

In Bloom CAL # 3. Nu ska du göra 8 separata blomblad. Ett mitt på varje sida och ett i varje hörn. Använd nål 3.5 mm.

INSTALLATION INSTRUCTIONS

2.1 Installation of driver using Internet Installation of driver from disk... 3

Byggdokument Angivning av status. Construction documents Indication of status SWEDISH STANDARDS INSTITUTE

Service och bemötande. Torbjörn Johansson, GAF Pär Magnusson, Öjestrand GC

Health café. Self help groups. Learning café. Focus on support to people with chronic diseases and their families

Quicksort. Koffman & Wolfgang kapitel 8, avsnitt 9

Hur utforma en strategi för användande av sociala medier? Skapa nytta och nå fram i bruset

Pipelining i Intel 80486

Join the Quest 3. Fortsätt glänsa i engelska. Be a Star Reader!

Förskola i Bromma- Examensarbete. Henrik Westling. Supervisor. Examiner

Read Texterna består av enkla dialoger mellan två personer A och B. Pedagogen bör presentera texten så att uttalet finns med under bearbetningen.

D-RAIL AB. All Rights Reserved.


Grafer, traversering. Koffman & Wolfgang kapitel 10, avsnitt 4

KTH MMK JH TENTAMEN I HYDRAULIK OCH PNEUMATIK allmän kurs kl

Arctic. Design by Rolf Fransson

Solowheel. Namn: Jesper Edqvist. Klass: TE14A. Datum:

Grammar exercises in workbook (grammatikövningar i workbook): WB p 121 ex 1-3 WB p 122 ex 1 WB p 123 ex 2

Övning 5 ETS052 Datorkommuniktion Routing och Networking

Välkommen in på min hemsida. Som företagsnamnet antyder så sysslar jag med teknisk design och konstruktion i 3D cad.

Vässa kraven och förbättra samarbetet med hjälp av Behaviour Driven Development Anna Fallqvist Eriksson


Uttagning för D21E och H21E

Hur fattar samhället beslut när forskarna är oeniga?

Enkätundersökning Askar Camp 2011

MÅLSTYRNING OCH LÄRANDE: En problematisering av målstyrda graderade betyg

ISBN: Tommy Ohlsson Stockholm 2013

Övning 3 ETS052 Datorkommuniktion IP, TCP och

How to format the different elements of a page in the CMS :

Hemlig påse 1. (åk f-3) Hoppa 15 hopp baklänges. Hämta en pinne som är längre än din tumme. Hämta en tung sten. Hämta sedan en lättare sten.

EVALUATION OF ADVANCED BIOSTATISTICS COURSE, part I

Blueprint Den här planeringen skapades med Blueprints gratisversion - vänligen uppgradera nu. Engelska, La06 - Kursöversikt, 2015/2016.

Dokumentnamn Order and safety regulations for Hässleholms Kretsloppscenter. Godkänd/ansvarig Gunilla Holmberg. Kretsloppscenter

Tentamen MMG610 Diskret Matematik, GU

The Municipality of Ystad

Grafisk teknik IMCDP IMCDP IMCDP. IMCDP(filter) Sasan Gooran (HT 2006) Assumptions:

Rastercell. Digital Rastrering. AM & FM Raster. Rastercell. AM & FM Raster. Sasan Gooran (VT 2007) Rastrering. Rastercell. Konventionellt, AM

Urban Runoff in Denser Environments. Tom Richman, ASLA, AICP

Algoritmer och Komplexitet ht 08. Övning 6. NP-problem

VAD SKULLE DU HA VALT PDF

Methods to increase work-related activities within the curricula. S Nyberg and Pr U Edlund KTH SoTL 2017

Sparbankerna PDF. ==>Download: Sparbankerna PDF ebook By 0

Byggritningar Ritsätt Fästelement. Construction drawings Representation of fasteners SWEDISH STANDARDS INSTITUTE

Investeringsbedömning

Transkript:

Pipelining We ve seen two possible implementations of the IPS architectre. A single-cycle path eectes each instrction in jst one clock cycle, bt the cycle time may be very long. A mlticycle path has mch shorter cycle times and higher clock rates, bt each instrction reqires many cycles to eecte. A third approach, pipelining, yields the best of both worlds and is sed in every modern desktop processor. Cycle times are short so clock rates are high. Bt we can still eecte an instrction in abot one clock cycle! Today we introdce pipelining and its benefits, and on Wednesday we ll show a pipelined path and control nit. After spring break we ll talk abot what makes pipelining difficlt in real life and what to do abot it. arch 7, 23 2-23 Howard Hang

Instrction eection review Eecting a IPS instrction can take p to five steps. Step Instrction Fetch Instrction Decode Eecte emory back Name Description an instrction from. sorce registers and generate control signals. Compte an R-type reslt or a branch target. or write the. Store a reslt in the destination register. However, as we saw, not all instrctions need all five steps. Instrction Steps reqired beq R-type sw lw arch 7, 23 Pipelining 2

Single-cycle path diagram PC 4 Add Reg Shift left 2 Add PCSrc Instrction [3-] Instrction I [25-2] I [2-6] I [5 - ] register register 2 register 2 Registers ALU Zero Reslt ALUOp em Data emtoreg RegDst ALUSrc em I [5 - ] Sign etend arch 7, 23 Pipelining 3

Single-cycle review All five eection steps occr in one clock cycle. This means the cycle time mst be long enogh to accommodate all the steps of the most comple instrction a lw in or instrction set. If the register file has a ns latency and the memories and ALU have a 2ns latency, lw will reqire 8ns. Ths all instrctions will take 8ns to eecte. Each hardware element can only be sed once per clock cycle. A lw or sw mst access twice (in the and stages), so there are separate instrction and memories. There are mltiple adders, since each instrction increments the PC () and performs another comptation (). On top of that, branches also need to compte a target. arch 7, 23 Pipelining 4

lticycle path diagram PC PC IorD ALUSrcA em Address emory em em Data IR [3-26] [25-2] [2-6] [5-] [5-] RegDst register register 2 register Reg 2 Registers A B 4 2 3 ALU Zero Reslt ALUOp ALU Ot PCSorce Instrction register emory register Sign etend Shift left 2 ALUSrcB emtoreg arch 7, 23 Pipelining 5

lticycle review Instrction eection is split into five stages, each taking one clock cycle. The cycle time is shorter, since each stage is relatively simple. With a ns delay for registers and 2ns for the and ALU, the cycle time wold be 2ns. Only necessary stages are eected, so more comple instrctions will not slow down simpler ones. Instrction Steps reqired Cycles beq 3 R-type 4 sw 4 lw 5 The actal CPI will depend on the particlar instrction mi. We can get by with only one and one ALU, becase they can be resed in different clock cycles of a single instrction eection. arch 7, 23 Pipelining 6

Eample: Instrction Fetch () Let s qickly review how lw is eected in the single-cycle path. We ll ignore PC incrementing and branching for now. In the Instrction Fetch () step, we read the instrction. Reg Instrction [3-] Instrction I [25-2] I [2-6] I [5 - ] register register 2 register 2 Registers ALU Zero Reslt ALUOp em Data emtoreg RegDst ALUSrc em I [5 - ] Sign etend arch 7, 23 Pipelining 7

Instrction Decode () The Instrction Decode () step reads the sorce register from the register file. Reg Instrction [3-] Instrction I [25-2] I [2-6] I [5 - ] register register 2 register 2 Registers ALU Zero Reslt ALUOp em Data emtoreg RegDst ALUSrc em I [5 - ] Sign etend arch 7, 23 Pipelining 8

Eecte () The third step, Eecte (), comptes the effective from the sorce register and the instrction s constant field. Reg Instrction [3-] Instrction I [25-2] I [2-6] I [5 - ] register register 2 register 2 Registers ALU Zero Reslt ALUOp em Data emtoreg RegDst ALUSrc em I [5 - ] Sign etend arch 7, 23 Pipelining 9

emory () The emory () step involves reading the, from the compted by the ALU. Reg Instrction [3-] Instrction I [25-2] I [2-6] I [5 - ] register register 2 register 2 Registers ALU Zero Reslt ALUOp em Data emtoreg RegDst ALUSrc em I [5 - ] Sign etend arch 7, 23 Pipelining

back () Finally, in the back () step, the vale is stored into the destination register. Reg Instrction [3-] Instrction I [25-2] I [2-6] I [5 - ] register register 2 register 2 Registers ALU Zero Reslt ALUOp em Data emtoreg RegDst ALUSrc em I [5 - ] Sign etend arch 7, 23 Pipelining

A bnch of lazy fnctional nits Notice that each eection step ses a different fnctional nit. In other words, the main nits are idle for most of the 8ns cycle! The instrction RA is sed for jst 2ns at the start of the cycle. Registers are read once in (ns), and written once in (ns). The ALU is sed for 2ns near the middle of the cycle. ing the only takes 2ns as well. That s a lot of epensive hardware sitting arond doing nothing. arch 7, 23 Pipelining 2

Ptting those slackers to work We sholdn t have to wait for the entire instrction to complete before we can re-se the fnctional nits. For eample, the instrction is free in the Instrction Decode step as shown below, so... Idle Instrction Decode () Reg Instrction [3-] Instrction I [25-2] I [2-6] I [5 - ] register register 2 register 2 Registers ALU Zero Reslt ALUOp em Data emtoreg RegDst ALUSrc em I [5 - ] Sign etend arch 7, 23 Pipelining 3

Decoding and fetching together Why don t we go ahead and fetch the net instrction while we re decoding the first one? Fetch 2nd Decode st instrction Reg Instrction [3-] Instrction I [25-2] I [2-6] I [5 - ] register register 2 register 2 Registers ALU Zero Reslt ALUOp em Data emtoreg RegDst ALUSrc em I [5 - ] Sign etend arch 7, 23 Pipelining 4

Eecting, decoding and fetching Similarly, once the first instrction enters its Eecte stage, we can go ahead and decode the second instrction. Bt now the instrction is free again, so we can fetch the third instrction! Fetch 3rd Decode 2nd Eecte st Reg Instrction [3-] Instrction I [25-2] I [2-6] I [5 - ] register register 2 register 2 Registers ALU Zero Reslt ALUOp em Data emtoreg RegDst ALUSrc em I [5 - ] Sign etend arch 7, 23 Pipelining 5

Working hard The idea behind pipelining is to maimize the sage of the hardware by overlapping the eection of several instrctions. Each of or five eection steps ses a different fnctional nit. Conceivably we can have p to five instrctions eecting at the same time one in each of the,,, and stages. To make this work, we will go back to the single-cycle path with its mltiple adders and memories, so many instrctions can eecte together withot interference. arch 7, 23 Pipelining 6

A pipeline diagram lw $t, 4($sp) sb $v, $a, $a and $t, $t2, $t3 or $s, $s, $s2 add $sp, $sp, -4 2 3 Clock cycle 4 5 6 7 8 9 A pipeline diagram shows the eection of a series of instrctions. The instrction seqence is shown vertically, from top to bottom. Clock cycles are shown horizontally, from left to right. Each instrction is divided into its component stages. (We show five stages for every instrction, which will make the control nit easier.) This clearly indicates the overlapping of instrctions. For eample, there are three instrctions active in the third cycle above. The lw instrction is in its Eecte stage. Simltaneosly, the sb is in its Instrction Decode stage. Also, the and instrction is jst being fetched. arch 7, 23 Pipelining 7

Pipeline terminology lw $t, 4($sp) sb $v, $a, $a and $t, $t2, $t3 or $s, $s, $s2 add $sp, $sp, -4 2 3 Clock cycle 4 5 6 7 8 9 filling fll emptying The pipeline depth is the nmber of stages in this case, five. In the first for cycles here, the pipeline is filling, since there are nsed fnctional nits. In cycle 5, the pipeline is fll. Five instrctions are being eected simltaneosly, so all hardware nits are in se. In cycles 6-9, the pipeline is emptying. arch 7, 23 Pipelining 8

Performance benefits of pipelining lw $t, 4($sp) sb $v, $a, $a and $t, $t2, $t3 or $s, $s, $s2 add $sp, $sp, -4 2 3 Clock cycle 4 5 6 7 8 9 How mch time wold it take to eecte these five instrctions? We wold need 4ns with a single-cycle CPU with an 8ns cycle time. In a mlticycle design, these instrctions reqire 42ns. With pipelining, we need jst 9 cycles, and 8ns, as shown above! What kind of speedps are we talking abot? Compared to a single-cycle design, 4ns/8ns = 2.2. Verss the mlticycle machine, the speedp is 42ns/8ns = 2.3! arch 7, 23 Pipelining 9

It gets better! lw $t, 4($sp) sb $v, $a, $a and $t, $t2, $t3 or $s, $s, $s2 add $sp, $sp, -4 2 3 Clock cycle 4 5 6 7 8 9 This instrction seqence is too short. ost of the cycles are spent filling or emptying the pipeline. We only achieve fll tilization for one clock cycle. Imagine a longer program with instrctions. It takes 4 cycles to fill the pipeline, and more cycles to complete all the instrctions. So it takes 4 cycles to eecte instrctions. That s almost one cycle per instrction, and a speedp of 3.98 over the single-cycle path! arch 7, 23 Pipelining 2

Almost one cycle per instrction lw $t, 4($sp) sb $v, $a, $a and $t, $t2, $t3 or $s, $s, $s2 add $sp, $sp, -4 2 3 Clock cycle 4 5 6 7 8 9 Here s another way to visalize the benefits of pipelining. In the first for stages of the diagram above, the pipeline is filling. Bt after that, while the pipeline is fll or emptying, yo can see that one instrction completes on every cycle. So ecept for some initial overhead, we really are eecting at the rate of one cycle per instrction. arch 7, 23 Pipelining 2

It s GRRReeeaaAAT! Tony the Tiger is trademark of Kellogg. Don t se me. So pipelining can really deliver the goods. The CPI is nearly, like in a single-cycle processor. Bt the cycle time is very short (2ns), like a mlticycle path. This adds p to ecellent performance! arch 7, 23 Pipelining 22

Ideal speedp lw $t, 4($sp) sb $v, $a, $a and $t, $t2, $t3 or $s, $s, $s2 add $sp, $sp, -4 2 3 Clock cycle 4 5 6 7 8 9 In or pipeline, we can eecte p to five instrctions simltaneosly. This implies that the maimm speedp is 5 times. In general, the ideal speedp eqals the pipeline depth. Why was or speedp on the previos slide only 3.98 times? The pipeline stages are imbalanced: a register file operation can be done in ns, bt we mst stretch that ot to 2ns to keep the and stages synchronized with, and. Balancing the stages is one of the many hard parts in designing a pipelined processor. arch 7, 23 Pipelining 23

The pipelining parado lw $t, 4($sp) sb $v, $a, $a and $t, $t2, $t3 or $s, $s, $s2 add $sp, $sp, -4 2 3 Clock cycle 4 5 6 7 8 9 Pipelining does not improve the eection time of any single instrction. Each instrction here actally takes longer to eecte than in a singlecycle path (ns vs. 8ns)! Instead, pipelining increases the throghpt, or the amont of work done per nit time. Here, several instrctions are eected together in each clock cycle. The reslt is improved eection time for a seqence of instrctions, sch as an entire program. arch 7, 23 Pipelining 24

Smmary Pipelining attempts to maimize hardware sage by overlapping the eection stages of several different instrctions. Pipelining offers amazing speedp. The CPU throghpt is dramatically improved, becase several instrctions can be eecting concrrently. In the best case, one instrction finishes on every cycle, and the speedp is eqal to the pipeline depth. Net time we ll see an actal path and control nit for a pipelined processor, so we can see how to make all these wild ideas work. arch 7, 23 Pipelining 25