Digital Integrated Circuits: Technology and Method

Viktor Öwall

Motivation: How do we design with transistors? How does the technology affect performance, e.g. clock frequency and power consumption? Where are we heading? What alternatives do we have?

We first look at logic gates: some repetition and some new material. Then a bit more on where electronics is heading.

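What is this? [Figure: gate schematic between V_DD and GND with inputs A and B and output OUT, next to a truth table for A, B = 00, 01, 10, 11 with the OUT column left blank.]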

Logic gate, NAND. NAND + inverter → AND. [Figure: gate symbols, US and European.]
A B | NAND AND
0 0 |  1    0
0 1 |  1    0
1 0 |  1    0
1 1 |  0    1

Why do I start with NAND and not AND? It is a basic building block, and it is simple to implement with CMOS transistors.

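In digital design we work almost exclusively with CMOS transistors. [Figure: three views of the N-channel MOS transistor. Solid-state physics: cross-section with source, gate and drain, n+ regions in a p- substrate, drain current I_D. Electrical characteristics: I_D versus V_DS [V] for different V_GS. Small-signal model (amplifier design): gate, drain and source with g_m·V_gs and g_o·V_ds between V_DD and GND.] Digital: the transistor as a switch.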

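Logic gates, AND. NAND + inverter → AND. [Figure: CMOS schematic with PMOS transistors towards V_DD and NMOS transistors towards GND, inputs A and B, outputs f_NAND and f_AND; gate symbols, US and European.]
A B | NAND AND
0 0 |  1    0
0 1 |  1    0
1 0 |  1    0
1 1 |  0    1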

Two-input NAND/AND and inverter in 0.8 µm CMOS. [Figure: layout.] Early 1990s. What do we have today?

I_D as a function of V_DS. [Figure: I_D versus V_DS [V] for several V_GS, showing the linear region, the saturation region and the pinch-off boundary V_DS = V_GS - V_T.]

The NMOS transistor as a switch. v_in = high → short circuit. v_in = low → open. [Figure: NMOS with terminals G, D and S; I_D versus V_DS [V] for increasing V_GS.]

The PMOS transistor as a switch. v_in = high → open. v_in = low → short circuit. [Figure: PMOS with terminals S, G and D connected to V_DD; I_D versus V_DS [V] for different V_GS.]

The CMOS inverter with the transistor as a switch. High in → NMOS short-circuited, PMOS open. [Figure: inverter between V_DD and GND with current I_D.]

The CMOS inverter. Ideally, the transistors switch over like switches at V_DD/2. But what does it really look like? [Figure: ideal V_OUT versus V_IN with a step at V_dd/2.]

The CMOS inverter. [Figure: ideal and real V_OUT versus V_IN. Operating regions along the real curve: N off/P lin, N sat/P lin, N sat/P sat, N lin/P sat, N lin/P off.] No instantaneous switch-over. The switching point varies.

Switching time. [Figure: real V_OUT versus V_IN with the regions N off/P lin, N sat/P lin, N sat/P sat, N lin/P sat, N lin/P off.] What does the switching time depend on? A capacitance that includes both internal and external contributions, e.g. the input capacitance of the next stage. More on speed later.

How do we as engineers design e.g. an inverter? If at all, then on the computer. These basic components usually come in a cell library, i.e. few designers do this themselves.

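[Figure: CMOS cross-section with labels P-channel, N-well, P-substrate and N-channel.]

[Figure: transistor cross-section with source, gate and drain, n+ regions and p- substrate. Courtesy of Intel.]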

This is called complementary logic. [Figure: pull-up network (PUN) between V_DD and OUT, pull-down network (PDN) between OUT and GND, inputs A and B.]
Properties:
+ rail-to-rail swing, i.e. out = V_DD or GND
+ no static power, i.e. either the PUN or the PDN is off
- many transistors

Pseudo-NMOS gates. [Figure: pull-up network between V_DD and OUT, pull-down network with inputs A and B between OUT and GND.] The pull-up acts as a resistance whose value R depends on the transistor's properties.
Properties:
+ fewer transistors
+ in the early years there was only NMOS
- static power consumption
- low output level is not 0 V

Pseudo-NMOS gates: static power. [Figure: resistance R from V_DD to OUT, NMOS transistors A and B from OUT to GND, current I_static.] What happens when A = B = 1? A current flows from V_DD to GND, and its size depends on R. Static power consumption!

So with NAND alone we can do everything, i.e. in principle build a processor. But is it efficient? No: far too many transistors, too slow, and too high power consumption!
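The claim that NAND alone is enough can be checked with a small Python sketch. The function names are illustrative and not from the lecture; every gate below is built only from the two-input `nand`.

```python
from itertools import product

def nand(a: int, b: int) -> int:
    """Two-input NAND on bits 0/1."""
    return 1 - (a & b)

def not_(a: int) -> int:
    # Inverter: tie both NAND inputs together.
    return nand(a, a)

def and_(a: int, b: int) -> int:
    # NAND followed by an inverter.
    return not_(nand(a, b))

def or_(a: int, b: int) -> int:
    # De Morgan: a OR b = NOT(NOT a AND NOT b).
    return nand(not_(a), not_(b))

# Print the truth tables: a, b, NAND, AND, OR
for a, b in product((0, 1), repeat=2):
    print(a, b, nand(a, b), and_(a, b), or_(a, b))
```

Counting 4 transistors per two-input CMOS NAND, the OR above already costs 3 gates, i.e. 12 transistors, which hints at why a NAND-only design is wasteful.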

More complex functions: addition. msb = most significant bit, lsb = least significant bit. [Figure: chain of full adders from lsb to msb, with inputs a_i, b_i, cin_i and outputs sum_i, cout_{i+1}.] Overflow if the result is too large. Memory digit: carry value.

More complex functions: addition. [Figure: one full adder with inputs a_i, b_i, cin_i and output sum_i.] How do I implement this with transistors?

From an earlier lecture

Full adder with NAND: 9 x NAND → 36 transistors
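As a sanity check of the gate count, here is a minimal Python sketch of one common nine-NAND full adder. The wiring in the lecture's figure is not recoverable from the text, so this particular arrangement and the names `n1` to `n7` are assumptions.

```python
from itertools import product

def nand(a: int, b: int) -> int:
    """Two-input NAND on bits 0/1."""
    return 1 - (a & b)

def full_adder_nand(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (sum, carry_out) using exactly nine two-input NAND gates."""
    n1 = nand(a, b)
    n2 = nand(a, n1)
    n3 = nand(b, n1)
    n4 = nand(n2, n3)      # a XOR b
    n5 = nand(n4, cin)
    n6 = nand(n4, n5)
    n7 = nand(cin, n5)
    s = nand(n6, n7)       # (a XOR b) XOR cin
    cout = nand(n1, n5)    # a*b OR (a XOR b)*cin
    return s, cout

NAND_GATES = 9
TRANSISTORS_PER_NAND = 4   # two PMOS + two NMOS in complementary CMOS

for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder_nand(a, b, cin)
    assert 2 * cout + s == a + b + cin   # matches ordinary addition

print(NAND_GATES * TRANSISTORS_PER_NAND, "transistors")
```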

Full adder in CMOS, 1 bit. 24 transistors (36 with NAND only). [Figure: transistor schematic between V_DD and GND with inputs A, B and C and outputs C_o and S.]
A and B: inputs, C_in: carry in, S: sum, C_o: carry out
A B C_in | C_o S
0 0  0   |  0  0
0 0  1   |  0  1
0 1  0   |  0  1
0 1  1   |  1  0
1 0  0   |  0  1
1 0  1   |  1  0
1 1  0   |  1  0
1 1  1   |  1  1

A carry-ripple adder. [Figure: chain of full adders from lsb to msb, with the carry path from cin_i to cout_msb marked as the maximum delay.] What is the maximum delay? It is important to optimize the carry chain.
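To make the question about maximum delay concrete, the following Python sketch models a carry-ripple adder at bit level and measures how many consecutive stages a carry travels through. It is a behavioural model only; the function names and the LSB-first list convention are assumptions made for this example.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists given LSB first.

    Returns (sum_bits, carry_out); carry_out = 1 signals overflow.
    """
    sum_bits = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry

def longest_carry_ripple(a_bits, b_bits, cin=0):
    """Longest run of consecutive stages that pass an incoming carry on."""
    longest = run = 0
    carry = cin
    for a, b in zip(a_bits, b_bits):
        # A stage propagates the incoming carry when exactly one input is 1.
        run = run + 1 if (carry and (a ^ b)) else 0
        longest = max(longest, run)
        _, carry = full_adder(a, b, carry)
    return longest

# 15 + 1 on 4 bits: the carry generated in bit 0 ripples through bits 1..3.
print(ripple_add([1, 1, 1, 1], [1, 0, 0, 0]))
print(longest_carry_ripple([1, 1, 1, 1], [1, 0, 0, 0]))
```

In the worst case (all stages propagating a carry-in), the carry passes through every stage, so the delay of this structure grows linearly with the word length.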

Full adder in CMOS, 1 bit. [Figure: transistor schematic between V_DD and GND with inputs A, B and C and outputs C_o and S.] How large should the transistors be? Which inputs should be placed where? Is this a good solution? It depends, e.g. on whether the goal is speed or power consumption. There are lots of adder structures.

That was a little about functionality. Now to performance.

Speed. Power consumption. L is the channel length, which is what is referred to as the process, technology or technology node. It is given in meters. What is it today?

Capacitances: they affect the speed. [Figure: MOS cross-section with source, gate and drain, n+ regions, oxide thickness t_ox and overlap x_d; gate capacitance C_G, overlap capacitances C_GS and C_GD, junction capacitances C_SB and C_DB to bulk.]

Speed: a simple model.
$T_{pd} = k \frac{C_L V_{DD}}{(V_{DD} - V_T)^2}$
A large capacitance gives slow circuits. [Figure: RC circuit with input U_in, resistor R, capacitor C and output U_out; output voltage versus time.]
$v(t) = E \left(1 - e^{-t/RC}\right)$
Small RC → faster. Large R or C → slower.
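The RC step response on this slide is v(t) = E(1 - e^(-t/RC)). A minimal Python sketch evaluates it and solves for the time to reach a given fraction of E. The component values (10 kΩ, 10 fF) are made-up examples, not figures from the lecture.

```python
import math

def rc_step_response(t: float, E: float, R: float, C: float) -> float:
    """Capacitor voltage at time t after a step of height E: E*(1 - exp(-t/RC))."""
    return E * (1.0 - math.exp(-t / (R * C)))

def time_to_fraction(fraction: float, R: float, C: float) -> float:
    """Time for the output to reach `fraction` of E: t = -RC * ln(1 - fraction)."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return -R * C * math.log(1.0 - fraction)

R = 10e3      # ohm, assumed example value
C = 10e-15    # farad, assumed example value
print(f"RC = {R * C:.3e} s")
print(f"time to 50 % of E = {time_to_fraction(0.5, R, C):.3e} s")

# Doubling C doubles the time to reach the same level.
print(time_to_fraction(0.5, R, 2 * C) / time_to_fraction(0.5, R, C))
```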

Speed: a simple model.
$T_{pd} = k \frac{C_L V_{DD}}{(V_{DD} - V_T)^2}$
Smaller transistors → lower capacitance → faster circuits. So what is the problem? V_DD?

Speed: a simple model.
$T_{pd} = k \frac{C_L V_{DD}}{(V_{DD} - V_T)^2}$
If $V_{DD} \gg V_T$: $T_{pd} \approx k \frac{C_L}{V_{DD}}$, with $T_{pd} \propto \frac{1}{f}$.
A rough approximation today!

The distance between V_DD and V_T has decreased. Different V_T give different characteristics. [Figure from Shouri Chatterjee, Yannis Tsividis and Peter Kinget, Analog Circuit Design Techniques at 0.5V.]

Speed: a simple model. With $T_{pd} = k \frac{C_L V_{DD}}{(V_{DD} - V_T)^2} \approx k \frac{C_L}{V_{DD}}$ for $V_{DD} \gg V_T$, and $T_{pd} \propto \frac{1}{f}$, the speed is proportional to V_DD. Why not increase it? Power consumption increases, and small transistors cannot withstand large voltages!
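How rough the V_DD >> V_T approximation has become can be illustrated numerically. The sketch below compares the simple delay model with its approximation; the threshold voltage of 0.4 V and the list of supply voltages are assumed example values, and `k` and `c_load` are set to 1 because only the ratio matters here.

```python
def t_pd(c_load: float, v_dd: float, v_t: float, k: float = 1.0) -> float:
    """Simple delay model: k * C_L * V_DD / (V_DD - V_T)^2."""
    if v_dd <= v_t:
        raise ValueError("model only valid for V_DD > V_T")
    return k * c_load * v_dd / (v_dd - v_t) ** 2

def t_pd_approx(c_load: float, v_dd: float, k: float = 1.0) -> float:
    """Approximation for V_DD >> V_T: k * C_L / V_DD."""
    return k * c_load / v_dd

V_T = 0.4  # volt, assumed example value
for v_dd in (5.0, 3.3, 1.2, 0.9):
    ratio = t_pd(1.0, v_dd, V_T) / t_pd_approx(1.0, v_dd)
    print(f"V_DD = {v_dd:.1f} V: model / approximation = {ratio:.2f}")
```

With these numbers the approximation is within roughly 20 % at 5 V but off by more than a factor of two at 1.2 V, which is one way to read the remark that it is a rough approximation today.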

Power consumption in CMOS.
$P_{total} = P_{dynamic} + P_{static}$
P_dynamic: historically the most important. P_static: ignored in the past but important now.

Dynamic power consumption. [Figure: charging and discharging of the load from V_DD.]
$P_{dynamic} = f C_L V_{DD}^2$
P_dynamic varies with the square of V_DD → we want to reduce V_DD → slower circuits.
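The quadratic dependence on V_DD is easy to see numerically. A minimal Python sketch of the formula P_dynamic = f · C_L · V_DD² follows; the clock frequency, switched capacitance and supply voltages are made-up example values.

```python
def dynamic_power(f_clk: float, c_load: float, v_dd: float) -> float:
    """Dynamic power in watts: f * C_L * V_DD^2."""
    return f_clk * c_load * v_dd ** 2

F_CLK = 1e9      # Hz, assumed example value
C_LOAD = 1e-9    # F, assumed total switched capacitance

p_high = dynamic_power(F_CLK, C_LOAD, 1.2)
p_low = dynamic_power(F_CLK, C_LOAD, 0.6)
print(f"P at 1.2 V: {p_high:.2f} W")
print(f"P at 0.6 V: {p_low:.2f} W")
print(f"ratio: {p_high / p_low:.1f}")   # halving V_DD cuts dynamic power to a quarter
```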

Static power consumption: what is the current when the inputs are held fixed at high or low? [Figure: circuit connected to V_DD; transistor characteristic with linear and saturation regions.]

Static power consumption due to leakage currents. I_leakage increases with decreasing V_T.
$P_{stat} = I_{leakage} V_{DD}$
[Figure: leakage paths, drain leakage and subthreshold current.]

V_T scaling: trade-off between V_T and I_OFF. Performance vs. leakage: V_T ↓ → I_OFF ↑. [Figure: ln(I_DS) versus V_G for a low-V_T and a high-V_T transistor, with threshold voltages V_TL and V_TH and off currents I_OFFL and I_OFFH.] When V_T decreases, the leakage currents increase! But what was good about a low V_T? Faster circuits!

Trends in power consumption. [Figure from: Optimization, DSP Architecture Design Essentials, by Dejan Markovic (UCLA) and Robert W. Brodersen (UC Berkeley).]

So what do we do?
High V_DD needed for high speed → high power consumption! One possibility: parallel processing!
Low V_T needed for high speed → high leakage power! Two possibilities: multiple V_T; find ways to reduce leakage, e.g. power gating.
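Why parallel processing helps can be sketched by combining the two simple models from the earlier slides: dynamic power equal to f · C_L · V_DD², and speed proportional to V_DD (valid only for V_DD much larger than V_T). The following is a back-of-the-envelope derivation under those assumptions. It ignores leakage and the overhead of splitting and merging the data streams, and the symbols N, P_ref and P_par are introduced here for illustration.

```latex
\begin{align}
P_{ref} &= f \, C_L \, V_{DD}^2
  && \text{one unit at full speed} \\
P_{par} &= N \cdot \frac{f}{N} \cdot C_L \left( \frac{V_{DD}}{N} \right)^2
  = \frac{P_{ref}}{N^2}
  && \text{$N$ units, each at $f/N$ and $V_{DD}/N$}
\end{align}
```

The throughput is unchanged, at the price of N times the hardware. In practice V_DD cannot be scaled this far because it approaches V_T, so the real gain is smaller.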

The end of some scaling!

Going multicore, e.g. Intel Sandy Bridge! 32 nm, 64 bit, 4 cores, 995 000 000 transistors, ~3.5 GHz, 216 mm² (10x Pentium 4)

Going sub-threshold: long pursued in research. - Intel September 2011!

The MOS transistor. [Figure: cross-section with source, gate and drain, gate oxide (insulating), n+ regions and p- substrate, on a wafer.] (ESS010 Consumer Electronics: Overview, 2007-09-03)

New types of transistors: FinFET/Tri-Gate. "Going from flat to three-dimensional in the conservative microchip industry is a radical shift, but as Leo Mathew, a research scientist at Freescale Semiconductor, says, the payoff will be substantial." Found in Intel's latest processors!

and new technologies!

You can learn more in ETIN20 Digital IC-design.

Computational Platforms What options do we have? What are the trade-offs?

Software vs. hardware for an algorithm.
Processor: programmable, low design cost. CPUs, microprocessors, microcontrollers, ...
Programmable hardware: reconfigurable hardware, no processing. Programmable Logic Devices (PLD): PLA, PAL, FPGAs, gate arrays (include processing), ...
Dedicated hardware: process chips. High performance, low power, high cost.

Software or hardware? Flexibility, performance requirements, power consumption, throughput, cost, volume, know-how, time to market.

Dedicated hardware architecture, i.e. you design hardware that does what you want!
Special purpose: PLD (e.g. FPGA), ASIC, gate array.
PLDs (PLA, Field Programmable Gate Arrays): reconfigurable, fast turnaround, prototyping.
Application/Algorithm Specific Integrated Circuit: high calculation capacity, high utilization, low power, low price at volume.

Energy and area efficiencies. [Figure: energy efficiency (MOPS/mW), log scale from 0.01 to 1000, versus chip number 1 to 20 (see next slide), grouped into microprocessors, general-purpose DSPs and dedicated designs. Courtesy: Professor Bob Brodersen, UC Berkeley.]

The cost of approaching Shannon's bound. [Figure: relative complexity (log scale, 1 to 100000) versus SNR (dB, 0 to 12) for a BER of 10^-5. Curves: rate-1/2 LDPC (N = 10^7, 1100 iterations); rate-8/9 LDPC (N = 4k; 1, 3 and 5 iterations); rate-1/2 and rate-2/3 turbo codes (N = 64k; 1, 2 and 3 iterations); rate-8/9 turbo code (N = 4k); rate-1/2 and rate-2/3 convolutional codes (N = 64k); rate-8/9 convolutional code (N = 4k); capacity bounds for rates 1/2, 2/3 and 8/9. Courtesy Engling Yeo, UCB.] LDPC codes were proposed by Gallager in 1963. However, they were considered of limited practical use due to the implementation complexity. Now LDPC codes are in standards.

Multiple-antenna systems, e.g. MIMO. [Figure: data → S/P → several Tx antennas → channel → several Rx antennas → high-complexity receiver with channel estimation, matrix inversion and symbol detection.]
$r = Hs + n$, $\hat{s} = \hat{H}^{-1} r$
The multi-antenna approach exploits multi-path by sending data along several channels. This results in large theoretical improvements in bandwidth efficiency for fading channels. But it is computationally hungry. [Figure: arrays of processing elements (PE) for QR factorisation and for inversion of the triangular submatrix.]

MIMO: hardware perspective. [Figure: data → S/P → several Tx antennas → channel → several Rx antennas → channel estimation, matrix inversion and symbol detection; $r = Hs + n$, $\hat{s} = \hat{H}^{-1} r$.]
WLAN 802.11n example: modulation 256QAM; 4 Tx antennas; 108 sub-channels; 4 µs per symbol.
ML detection: 1.159 x 10^17 lattice points/sec.
Current DSP technology is 1 G inst/s → 10^8 processors! OR (Moore's law: processor capability doubles every 18 months) MUST WAIT 40 years!
Mike Faulkner 2005, Victoria Univ.
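The numbers on this slide can be reproduced with a few lines of Python. The sketch assumes, as the slide implicitly does, that exhaustive ML detection tests every combination of constellation points over the transmit antennas, and that one DSP instruction handles one lattice point; the function name is made up for this example.

```python
import math

def ml_lattice_points_per_second(constellation_size: int, tx_antennas: int,
                                 sub_channels: int, symbol_time_s: float) -> float:
    """Candidate vectors an exhaustive ML detector must test per second."""
    candidates_per_symbol = constellation_size ** tx_antennas
    return candidates_per_symbol * sub_channels / symbol_time_s

rate = ml_lattice_points_per_second(256, 4, 108, 4e-6)
print(f"lattice points per second: {rate:.3e}")

DSP_RATE = 1e9                      # instructions per second, from the slide
processors = rate / DSP_RATE        # assumes one instruction per lattice point
print(f"processors needed: {processors:.2e}")

# Moore's law as stated on the slide: capability doubles every 18 months.
years = 1.5 * math.log2(processors)
print(f"years to wait: {years:.1f}")
```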

Trading complexity, 4x4 antennas. [Figure: BER versus number of multiplications per symbol. Labels: sub-optimal (square-root), QPSK, 0.35 µm; sphere, 16QAM, 0.35 µm; ML detection; + soft output, 0.13 µm.]

MIMO going massive. World-unique testbed: 300 kg, 5 kW at start-up. Lots of cables! In cooperation with National Instruments (NI)!

LuMaMi: Lund Massive MIMO Testbed. 50 NI USRPs with 2 radio chains each → 100 simultaneous antennas. Xilinx Kintex-7: that is where the power is going!

FPGA: Virtex from Xilinx. [Figure: chip floorplan with I/O banks BANK 0 to BANK 7, block RAM, IOBs, CLBs, routing and timing resources.]

Example: Xilinx FPGAs. [Figure: CLBs connected through switching matrices, horizontal and vertical routing channels and interconnect points. Configurable Logic Block: combinational logic with two function generators F and G, each implementing any function of up to 4 variables, and storage elements, two D flip-flops Q1 and Q2 with clock, CE and R inputs.]

Basic Spartan architecture: low-end FPGA

Xilinx Virtex-II Pro: heterogeneous programmable platforms. FPGA fabric, embedded PowerPC, embedded memories, hardwired multipliers, high-speed I/O. Courtesy Xilinx

Examples of FPGA development boards, e.g. from Digilent (http://www.digilentinc.com)
Nexys4 board, US$320/179 full/academic (2014): Xilinx Artix-7 FPGA; 128 Mbit serial Flash; serial port, Ethernet, VGA port, 3-axis accelerometer, PWM audio output, temperature sensor, microphone, USB for mice, keyboards and memory sticks, etc.
Virtex-V board, US$2199/799 full/academic (2014): Virtex-5 FPGA; 256 MB DDR2 + 2 x 32 MB Flash + more; JTAG, Ethernet, video input, video (DVI/VGA) output, stereo AC97 codec with line in, line out, headphone, microphone and SPDIF digital audio, etc.

Some remarks on FPGAs. Fantastic development platforms. They offer lots of processing resources. However: they are not low power; hardware utilization is rather low; lots of resources go into configuration and routing.

Other platforms.
Arduino: "Arduino is a tool for making computers that can sense and control more of the physical world than your desktop computer. It's an open-source physical computing platform based on a simple microcontroller board, and a development environment for writing software for the board."
Raspberry Pi (from www.raspberrypi.org): "The Raspberry Pi is a low cost, credit-card sized computer that plugs into a computer monitor or TV, and uses a standard keyboard and mouse. It is a capable little device that enables people of all ages to explore computing, and to learn how to program in languages like Scratch and Python."

Memories are a crucial part of most designs.

Cell-phone ASIC complexity and cost. Courtesy: Sven Mattisson, Ericsson

Market for memories. "According to a new technical market research report, Semiconductor Memory: Technologies and Global Markets, the value of the global semiconductor memory industry was nearly $46.2 billion in 2009, but is expected to increase to nearly $79 billion in 2014, for a 5-year compound annual growth rate (CAGR) of 11.3%. The largest segment of the market, DRAM, or dynamic random access memory, is projected to increase at a CAGR of 10.4% to $41.5 billion in 2014, after being valued at nearly $25.2 billion in 2009. NAND, or nonvolatile/nno RAM, which is the second-largest segment of the market, is estimated at $12.8 billion in 2009, and is expected to increase at a 5-year CAGR of 15% to reach more than $25.7 billion in 2014."
Source: Semiconductor Memory: Technologies and Global Markets, April 2010. From http://www.electronics.ca/presscenter/articles/1272/1/global-market-for-Semiconductor-Memory-To-Be-Worth-79-Billion-In-2014/Page1.html
Report price: USD $4,850.00!!!
Motivation behind the quest for new memory technologies!

Example: FFT design. 8k-point FFT for DVB (Digital Video Broadcasting), 1996-97, in 0.5 µm CMOS. Several embedded memories whose properties and size are crucial to the implementation.

Sequential circuits & D flip-flop. Properties: can be dynamic or static. Latch or register: a latch is level sensitive, a register is edge triggered. Flip-flop most often refers to an edge-triggered register. From Lecture 3.

We have registers, why memories? Static D flip-flop: 252 µm². SRAM memory element: 30 µm². 0.35 µm CMOS technology process.

Flip-flops vs. SRAM. Alcatel Microelectronics 0.35 µm CMOS technology process. Process and library dependent. [Figure: area (square mm, 0 to 1.8) versus number of memory elements (0 to 4500) for flip-flops, single-port memory, dual-port memory and double-width memory.]

A memory is more than the storage elements, i.e. the memory cells: address decoders, sense amplifiers, clock buffers, etc.

The complete memory. [Figure: memory array with surrounding logic. From: Andrea Silvagni, Giuseppe Fusillo, Roberto Ravasio, Massimiliano Picca and Stefano Zanardi, "An Overview of Logic Architectures Inside Flash Memory Devices", Proceedings of the IEEE, vol. 91, no. 4, April 2003.]

Pseudo-NMOS ROM. [Figure: pull-up to V_DD, word lines WL[0] to WL[3], bit lines BL[0] to BL[3].] Do we recognize this? The placement of the transistors decides the memory content.

Pseudo-NMOS gates. Pull-up network = load device. [Figure: pull-up network between V_DD and OUT, pull-down network with inputs A and B between OUT and GND, and a partially filled truth table for A, B and OUT.] Function?
Properties:
+ fewer transistors
- static power consumption
- low output level is not 0 V

Pseudo-NMOS NAND ROM. [Figure: pull-up to V_DD; bit lines BL[0] to BL[3] with transistors in series towards GND; word lines WL[0] to WL[3] set to 1 1 0 1, with the individual transistors marked on or off.]
A B | Q
0 0 | 1
0 1 | 1
1 0 | 1
1 1 | 0
All transistors ON pulls down the bit line. Non-selected WL = 1, i.e. the WL lines are reversed. Select WL[2] → WL[0,1,3] = 1 and WL[2] = 0. A transistor on the selected line shuts off the path to GND.
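The read operation of the NAND ROM can be mimicked with a small behavioural model in Python. The transistor placement below is invented for the example, since the actual placement in the figure is not recoverable; `transistor_at[row][col]` is a hypothetical data structure marking where a transistor sits in the series chain of a bit line.

```python
def nand_rom_read(transistor_at, selected_row):
    """Read one word from a pseudo-NMOS NAND ROM model.

    Each bit line is a series chain to GND. A missing transistor is a wire.
    Non-selected word lines are 1 (transistors on), the selected one is 0.
    """
    n_rows = len(transistor_at)
    word_lines = [0 if r == selected_row else 1 for r in range(n_rows)]
    bits = []
    for col in range(len(transistor_at[0])):
        # The chain conducts only if every position is a wire or an ON transistor.
        chain_conducts = all(
            (not transistor_at[r][col]) or word_lines[r] == 1
            for r in range(n_rows)
        )
        # Conducting chain pulls the bit line down; otherwise the pull-up wins.
        bits.append(0 if chain_conducts else 1)
    return bits

# Hypothetical 4x4 placement (True = transistor present).
layout = [
    [True,  False, True,  False],
    [False, False, True,  True],
    [True,  True,  False, True],
    [False, True,  False, False],
]
for row in range(4):
    print(row, nand_rom_read(layout, row))
```

In this model a stored 1 corresponds to a transistor on the selected word line, because that transistor is the one that breaks the path to GND.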

Pseudo-NMOS NOR ROM. [Figure: pull-up to V_DD; word lines WL[0] to WL[3]; bit lines BL[0] to BL[3]; GND lines between the rows.]
A B | Q
0 0 | 1
0 1 | 0
1 0 | 0
1 1 | 0
One transistor ON pulls down the bit line. No transistor = always pulled up. Select WL[2] → WL[0,1,3] = 0 and WL[2] = 1; the bit lines then read 1 0 1 0. The GND lines are an overhead; the area is reduced by mirroring.
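A matching behavioural model of the NOR ROM read is sketched below in Python. The transistor placement is an assumption: row 2 was chosen so that selecting WL[2] reproduces the 1 0 1 0 shown on the slide (taking the bits in the order BL[0] to BL[3]), and the other rows are arbitrary.

```python
def nor_rom_read(transistor_at, selected_row):
    """Read one word from a pseudo-NMOS NOR ROM model.

    Each transistor connects its bit line to GND when its word line is 1.
    The selected word line is 1, all others are 0.
    """
    n_rows = len(transistor_at)
    word_lines = [1 if r == selected_row else 0 for r in range(n_rows)]
    bits = []
    for col in range(len(transistor_at[0])):
        # One conducting transistor is enough to pull the bit line down.
        pulled_down = any(
            transistor_at[r][col] and word_lines[r] == 1
            for r in range(n_rows)
        )
        bits.append(0 if pulled_down else 1)
    return bits

# Hypothetical 4x4 placement (True = transistor present).
layout = [
    [True,  False, False, True],
    [False, True,  True,  False],
    [False, True,  False, True],   # selecting WL[2] reads 1 0 1 0
    [True,  True,  False, False],
]
print(nor_rom_read(layout, 2))
```

Compared with the NAND ROM, the logic is inverted: here a transistor on the selected word line stores a 0, and a missing transistor stores a 1.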

ROM or PLA. [Figure: AND plane and OR plane, with inputs X0, /X0, X1, /X1, X2, /X2 and outputs f0, f1.] But didn't we just have NAND/NOR?

NAND-NAND, AND-OR, NOR-NOR

NOR or NAND? NOR is faster: no series transistors. NAND is smaller: no GND lines. You can see this if you look at e.g. Flash memories.

What is a Flash memory?
ROM (Read Only Memory): data doesn't change; data remain when powered down.
RAM (Random Access Memory): data can be both read and stored; data disappear when powered down.
Flash: data can be both read and stored; data remain when powered down.

Floating-gate transistor (FAMOS): electrically programmable V_TH. [Figure: transistor with floating gate and control gate, n+ regions, word line WL and bit line BL.] The control gate is connected to the word line. The floating gate is left unconnected. If it is charged heavily negative → high V_TH → never a channel. If the charge is removed → low V_TH → ordinary operation. EPROM, EEPROM and Flash have different ways of controlling the charge on the floating gate.

Flash EEPROM. [Figure: cross-section with control gate, floating gate, thin tunneling oxide, n+ source (erasure), n+ drain (programming) and p- substrate.]

Flash structure. [Figure: pull-up to V_DD, word lines word0 to word3, GND lines.] Floating-gate transistors everywhere!

Flash write, e.g. trap charge. [Figure: the same array with trapped charge marked on some transistors.] A transistor with trapped charge is always off → same content as the ROM.

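Read-write memories (RAM).
Static (SRAM): data stored as long as supply is applied; large cells (6 transistors/cell); fast.
Dynamic (DRAM): periodic refresh required; small cells (1-3 transistors/bit); slower.

Dynamic or static RAM. 6-transistor SRAM cell: [Figure: transistors M1 to M6, word line WL, bit line BL and its complement, nodes Q and its complement, V_dd.] 1-transistor DRAM cell: [Figure: transistor M1, storage capacitor C_S, word line WL, bit line BL with capacitance C_BL.] Compare with the dynamic latch/register.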

How do we design today?

Design flow, a simplified view: HDL (VHDL/Verilog/...), simulation, cell library, synthesis, P&R, configuration, post-layout simulation, fabrication.

Moving to higher abstraction levels. Algorithms, e.g. for LDPC and MIMO, are often developed in MATLAB or C/C++. How can we get from there to hardware more easily? Is High-Level Synthesis the answer? SystemC, CatapultC, Vivado, etc. It's getting there!? Trend towards High-Level Synthesis.

You can learn more in: ETIN20 Digital IC-design, EITF35 Introduction to Structured VLSI Design and ETIN35/40 IC project 1 & 2