Department of Electrical Engineering (Institutionen för systemteknik)
Master's thesis
Parallel Evaluation of Fixed-Point Polynomials
Master's thesis carried out in Electronics Systems at Linköping Institute of Technology by Shahid Nawaz Khan
LiTH-ISY-EX--10/4406--SE
Linköping 2010
Department of Electrical Engineering, Linköpings universitet, SE-581 83 Linköping, Sweden
Parallel Evaluation of Fixed-Point Polynomials
Master's thesis carried out in Electronics Systems at Linköping Institute of Technology by Shahid Nawaz Khan
LiTH-ISY-EX--10/4406--SE
Supervisor: Muhammad Abbas, ISY, Linköpings universitet
Examiner: Oscar Gustafsson, ISY, Linköpings universitet
Linköping, 14 September, 2010
Division, Department: Division of Electronic Systems, Department of Electrical Engineering, Linköpings universitet, SE-581 83 Linköping, Sweden
Date: 2010-09-14
Language: English
Report category: Master's thesis (Examensarbete)
ISRN: LiTH-ISY-EX--10/4406--SE
URL for electronic version: http://www.es.isy.liu.se/ http://www.ep.liu.se
Title: Parallell evaluering av polynom i fix-talrepresentation (Parallel Evaluation of Fixed-Point Polynomials)
Author: Shahid Nawaz Khan

Abstract
In some applications polynomials should be evaluated, e.g., polynomial approximation of elementary functions and Farrow filters for arbitrary re-sampling. For polynomial evaluation Horner's scheme uses the minimum amount of hardware resources, but it is sequential. Many algorithms have been developed to introduce parallelism in polynomial evaluation. This parallelism is achieved at the cost of hardware, but ensures evaluation in less time. This work examines the trade-off between hardware cost and the critical path for different levels of parallelism in polynomial evaluation. The trade-offs in generating powers in polynomial evaluation using different building blocks (squarers and multipliers) are also discussed. Wordlength requirements of the polynomial evaluation and the effect of power generating schemes on the timing of operations are also discussed. The area requirements are calculated using Design Analyzer from Synopsys (a tool for logic synthesis), and GLPK (GNU Linear Programming Kit) is used to calculate the bit requirements.

Keywords: Horner, Estrin, parallel, polynomial, evaluation, addition chains
Acknowledgments

Countless thanks to ALLAH Almighty, worthy of all praise, Who guides us from darkness to light, and many blessings and peace be upon Mohammad s.a.w, the final messenger of Allah.

I am thankful to my supervisor M.Sc. Muhammad Abbas for his kind support and guidance. I am very thankful to my examiner Dr. Oscar Gustafsson, head of the electronic systems division, for his help and especially for his encouragement. I am very thankful to M.Sc. Zaka Ullah Sheikh and M.Sc. Yasir Ali Shah, who were always there whenever I needed their help. M.Sc. Farooq Ul Amin is the person who convinced me to come to Linköping and guided me about the study system. M.Sc. Abdul Majid and M.Sc. Owais really encouraged me during my studies at Linköping University.

Finally I would like to thank my family, especially my mother, for her unconditional support and love.
Contents

1 Introduction
  1.1 Aim of the thesis
  1.2 Thesis organization
  1.3 Different schemes
    1.3.1 Horner's scheme
    1.3.2 K-th Order Horner's scheme
    1.3.3 Even Odd scheme
    1.3.4 Estrin's scheme
    1.3.5 A Simple Parallel Algorithm For Polynomial Evaluation
    1.3.6 Algorithm A
    1.3.7 Direct Evaluation
  1.4 A general overlook
    1.4.1 Using two schemes at the same time
2 Generating the power terms
  2.1 Algorithms for short chains
    2.1.1 The Binary Method
    2.1.2 Factor Method
    2.1.3 Power Tree Method
  2.2 Addition chains
  2.3 Implementation issues
    2.3.1 Multiple power terms
    2.3.2 Critical path
  2.4 A proposed solution
    2.4.1 Generating all powers of x from 2 to N
    2.4.2 Generating specific powers of x
    2.4.3 Sharing of hardware resources
3 Pipeline registers and bit requirements for different schemes
  3.1 Bit requirement
    3.1.1 Even Odd scheme
    3.1.2 Horner's scheme
    3.1.3 Estrin's scheme
    3.1.4 Lie's scheme
  3.2 Pipeline registers
    3.2.1 Horner's scheme
    3.2.2 Even Odd scheme
    3.2.3 Direct Evaluation
    3.2.4 Estrin's scheme
    3.2.5 K-th Order Horner's scheme
    3.2.6 Lie's Algorithm
    3.2.7 Algorithm A
4 Results
  4.1 Bit requirements for selected schemes
    4.1.1 Horner's scheme
    4.1.2 Even Odd scheme
    4.1.3 Estrin's scheme
    4.1.4 Lie's scheme
  4.2 Pipelining requirements
    4.2.1 Comparison between Horner and Even Odd schemes
    4.2.2 Comparison of different schemes
    4.2.3 Comparison between selected schemes
  4.3 Area comparison
    4.3.1 Theoretical results
    4.3.2 Simulation results
  4.4 Critical path comparison
    4.4.1 With speed as first preference
    4.4.2 With area as first preference
  4.5 Power comparison
5 Conclusions and future work
  5.1 Conclusions
  5.2 Future work
Bibliography
List of Tables

2.1 Shortest addition chains for values of N from 5 to 16
2.2 Generating multiple powers of x from 2 to 32
2.3 Generating multiple even powers of x from 2 to 32
2.4 Generating powers (multiples of 3) of x from 2 to 32
4.1 Bit requirement Horner's scheme order 3
4.2 Bit requirement Horner's scheme order 4
4.3 Bit requirement Horner's scheme order 5
4.4 Bit requirement Horner's scheme order 6
4.5 Bit requirement Horner's scheme order 7
4.6 Bit requirement Even Odd scheme order 3
4.7 Bit requirement Even Odd scheme order 4
4.8 Bit requirement Even Odd scheme order 5
4.9 Bit requirement Even Odd scheme order 6
4.10 Bit requirement Even Odd scheme order 7
4.11 Bit requirement Even Odd scheme order 8
4.12 Bit requirement Estrin's scheme order 3
4.13 Bit requirement Estrin's scheme order 4
4.14 Bit requirement Estrin's scheme order 5
4.15 Bit requirement Estrin's scheme order 6
4.16 Bit requirement Estrin's scheme order 7
4.17 Bit requirement Lie's scheme order 3
4.18 Bit requirement Lie's scheme order 5
4.19 Bit requirement Lie's scheme order 7
List of Figures

1.1 Order 6 polynomial using Horner's scheme
1.2 Order 6 polynomial using K-th Order Horner's scheme
1.3 Order 6 polynomial using Even Odd scheme
1.4 Order 6 polynomial using Estrin's scheme
1.5 Order 5 polynomial using Lie's scheme
1.6 Order 6 polynomial using Algorithm A
1.7 Order 6 polynomial using Direct Evaluation
2.1 Power Tree algorithm
2.2 Comparison between two chains of $x^{23}$
2.3 Generating powers of x
2.4 Generating even powers of x
2.5 Generating powers of x for multiples of three
2.6 Generating powers of x for Estrin's scheme
3.1 Order 4 polynomial using Estrin's scheme
3.2 Order 4 polynomial using Estrin's scheme with quantizations
3.3 Order 4 polynomial using Estrin's scheme with noise sources
3.4 Fig. 3.3 labeled for using integer optimization tool
3.5 Order 3 polynomial with Estrin's scheme, labeled for using integer optimization tool
3.6 Order 3 polynomial using Horner's scheme
3.7 Order 3 polynomial using Horner's scheme with quantizations
3.8 Order 3 polynomial using Horner's scheme with noise sources labeled for integer optimization tool
3.9 Order 5 polynomial using Estrin's scheme
3.10 Order 5 polynomial using Estrin's scheme with quantizations
3.11 Order 5 polynomial using Estrin's scheme with noise sources and labeled for integer optimization tool
3.12 Order 7 polynomial using Lie's scheme
3.13 Order 7 polynomial using Lie's scheme with quantizations
3.14 Order 7 polynomial using Lie's scheme with noise sources and labeled for integer optimization tool
3.15 Pipelining in order 6 polynomial using Horner's scheme
3.16 Pipelining in order 6 polynomial using Even Odd scheme
3.17 Pipelining in order 6 polynomial using Direct Evaluation
3.18 Pipelining in order 6 polynomial using Estrin's scheme
3.19 Pipelining in order 6 polynomial using K-th Order Horner's scheme
3.20 Pipelining in order 5 polynomial using Lie's Algorithm
3.21 Pipelining in order 5 polynomial using Algorithm A
4.1 Bit requirement for order 3 polynomial using Horner's scheme
4.2 Bit requirement for order 4 polynomial using Horner's scheme
4.3 Bit requirement for order 5 polynomial using Horner's scheme
4.4 Bit requirement for order 6 polynomial using Horner's scheme
4.5 Bit requirement for order 7 polynomial using Horner's scheme
4.6 Bit requirement for order 3 polynomial using Even Odd scheme
4.7 Bit requirement for order 4 polynomial using Even Odd scheme
4.8 Bit requirement for order 5 polynomial using Even Odd scheme
4.9 Bit requirement for order 6 polynomial using Even Odd scheme
4.10 Bit requirement for order 7 polynomial using Even Odd scheme
4.11 Bit requirement for order 8 polynomial using Even Odd scheme
4.12 Bit requirement for order 3 polynomial using Estrin's scheme
4.13 Bit requirement for order 4 polynomial using Estrin's scheme
4.14 Bit requirement for order 5 polynomial using Estrin's scheme
4.15 Bit requirement for order 6 polynomial using Estrin's scheme
4.16 Bit requirement for order 7 polynomial using Estrin's scheme
4.17 Bit requirement for order 3 polynomial using Lie's scheme
4.18 Bit requirement for order 5 polynomial using Lie's scheme
4.19 Bit requirement for order 7 polynomial using Lie's scheme
4.20 Area comparison between Horner and Even Odd schemes
4.21 Register requirements ratio between Horner and Even Odd schemes
4.22 Register requirements ratio between Horner and Even Odd schemes
4.23 Area requirement comparison between different schemes
4.24 Area requirement comparison between selected schemes
4.25 Area comparison between Horner and Even Odd schemes
4.26 Area comparison among different schemes
4.27 Area comparison between different schemes with squarers
4.28 Area comparison among selected schemes with squarers
4.29 Area comparison among selected schemes without squarers
4.30 Area requirement comparison between Horner and Even Odd schemes
4.31 Area comparison among selected schemes
4.32 Critical path comparison among different schemes with speed preferred over area
4.33 Critical path comparison among different schemes with area preferred over speed
4.34 Power requirement comparison among selected schemes
Chapter 1
Introduction

In some applications polynomials must be evaluated. This is the case for polynomial approximation of elementary functions and for Farrow filters, which can be used for arbitrary resampling. The inputs to the polynomial evaluation are the polynomial coefficients, $a_i$, and the input data $x$. The computation to be performed for an $N$-th order polynomial is

$$P(x) = \sum_{i=0}^{N} a_i x^i \tag{1.1}$$

If the polynomial is fixed, it is possible to optimize the architecture further. However, here we primarily consider architectures for general polynomial coefficients. We first discuss simple, serial architectures such as Horner's scheme and then move towards parallel architectures that require more hardware and give better speed.

1.1 Aim of the thesis

The main objectives of the thesis are as follows:
- Study the trade-off between the number of operations and the critical path for various degrees of parallelism in polynomial evaluation.
- Study the trade-off in generating powers and polynomial evaluations using different building blocks (squarers, multipliers, etc.).
- Study how the different power generating schemes affect the timing of operations.
- Study wordlength requirements of polynomial evaluation.

1.2 Thesis organization

The evaluation of a polynomial in a parallel way can be broadly divided into two major tasks:
- The first is at the algorithm level, in which the structure is expanded into more branches, i.e., parallelism is introduced, with a minimum requirement of powers of $x$.
- The second is to generate the powers of $x$ efficiently with minimum hardware. The critical path can also be an important aspect in the generation of powers of $x$.

Following the introduction in chapter one, different algorithms and schemes that introduce some kind of parallelism in the polynomial evaluation are discussed. These schemes are explained in detail with their advantages and disadvantages. In chapter two the power generation techniques are discussed in detail with their relative advantages and disadvantages. A simple solution to the shortcomings of the existing methods is also discussed. In chapter three, pipeline register and bit requirements are calculated. The pipeline registers are introduced after each arithmetic operation. The bit requirements for selected schemes are calculated using an integer optimization tool. Chapter four focuses on the results obtained and compares the schemes on the basis of these results. Finally, chapter five contains conclusions and future work.

1.3 Different schemes

Equation (1.1) can be factored in different ways with different trade-offs. Some of the schemes used are discussed below.

1.3.1 Horner's scheme

Consider a polynomial $P(x)$ of degree $N$ as in (1.1). Dividing $P(x)$ by a linear factor $x - x_0$, we get

$$P(x) = (x - x_0)(b_1 + b_2 x + \dots + b_N x^{N-1}) + b_0. \tag{1.2}$$

By equating with (1.1), the values of $b_0, b_1, \dots, b_N$ can be calculated as

$$b_N = a_N, \tag{1.3}$$

$$b_j = a_j + x_0 b_{j+1}, \quad j = N-1, \dots, 0. \tag{1.4}$$

The $b_j$ can be calculated from (1.3) and (1.4), and from (1.2) we get $P(x_0) = b_0$. This method of polynomial evaluation is called Horner's rule [4] and can also be expressed by

$$P(x) = a_0 + x(a_1 + \dots + x(a_{N-2} + x(a_{N-1} + a_N x))\dots) \tag{1.5}$$
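The recurrence (1.3)-(1.4) can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the hardware architecture studied in this thesis; the function name `horner` and the convention that `coeffs[i]` holds $a_i$ are assumptions made for the example.

```python
def horner(coeffs, x0):
    """Evaluate P(x0) = sum(coeffs[i] * x0**i) with Horner's rule.

    coeffs[i] is assumed to hold the coefficient a_i (lowest order first).
    """
    if not coeffs:
        return 0  # treat an empty list as the zero polynomial
    b = coeffs[-1]                     # b_N = a_N                (1.3)
    for a_j in reversed(coeffs[:-1]):  # j = N-1, ..., 0
        b = a_j + x0 * b               # b_j = a_j + x0 * b_{j+1} (1.4)
    return b                           # P(x0) = b_0


if __name__ == "__main__":
    # Order-6 example with a_i = i + 1, evaluated at x0 = 2.
    print(horner([1, 2, 3, 4, 5, 6, 7], 2))
```

Each loop iteration depends on the previous value of `b`, which is the sequential dependency that the parallel schemes try to break.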
This method happens to be the most economical of all the possible methods in terms of the arithmetic operations required to calculate $P(x)$. It requires $N$ multiplications and $N$ additions. For order 6 we have

$$P(x) = a_0 + x(a_1 + x(a_2 + x(a_3 + x(a_4 + x(a_5 + a_6 x))))) \tag{1.6}$$

Figure 1.1. A 6-th order polynomial in (1.6) implemented according to Horner's scheme.

Pros and Cons of Horner's scheme
- The minimum number of hardware resources is guaranteed by this method.
- There is no need to generate powers of $x$, so neither logic nor hardware for generating powers is required.
- It is the simplest of the available schemes.
- The critical path is comparatively the longest.
- It is clear from (1.6) and Fig. 1.1 that this method is sequential: each calculation depends on the result of the previous one and cannot be started before that result is available. So no parallelism is possible in this case.

1.3.2 K-th Order Horner's scheme

As already noted, Horner's scheme is the simplest scheme. To introduce some parallelism, a generalization of Horner's scheme was developed by W. S. Dorn [4]. If $P(x_0)$ represents Horner's scheme for $P(x)$ divided by $x - x_0$, then in the generalization $P(x)$ is divided by a polynomial $q(x)$ for which $q(x_0) = 0$, and the result $P(x_0)$ is obtained at $x = x_0$. By choosing $q(x) = x^k - x_0^k$, where $k \geq 1$,

$$P(x) = (x^k - x_0^k)(b_k + b_{k+1} x + \dots + b_N x^{N-k}) + (b_{k-1} x^{k-1} + \dots + b_1 x + b_0). \tag{1.7}$$

By comparing (1.7) with (1.1), we get

$$b_j = a_j, \quad j = N, \dots, N-k+1,$$

$$b_j = a_j + x_0^k b_{j+k}, \quad j = N-k, \dots, 0,$$
$$P(x_0) = b_{k-1} x_0^{k-1} + \dots + b_1 x_0 + b_0.$$

For $K = 1$ this is the same as Horner's scheme, but for values of $K$ greater than 1 it tends to produce a parallel structure. For $N = 6$ and $K = 3$ the K-th Order Horner scheme is given by

$$P(x) = a_0 + x(a_1 + a_4 x^3) + x^2(a_2 + a_5 x^3) + x^3(a_3 + a_6 x^3). \tag{1.8}$$

Figure 1.2. 6-th order polynomial arranged according to the K-th Order Horner scheme represented by (1.8).

1.3.3 Even Odd scheme

This is a special case of the K-th Order Horner's scheme for $K = 2$. It is called Even Odd because of its two branches: one consists of the even powers of the polynomial and the other of the odd powers. Hence the single path is divided into two paths. The only increase in hardware is one extra multiplier used for calculating the $x^2$ that is needed in this scheme:

$$P(x) = a_0 + x(a_1 + x^2(a_3 + a_5 x^2)) + x^2(a_2 + x^2(a_4 + a_6 x^2)). \tag{1.9}$$

Pros and Cons of K-th Order Horner's scheme
- It can increase the parallelism by a factor of $K$, at least theoretically. In other words, it decreases the processing time by $K$ times.
- The critical path is reduced because of the parallel structures.
- Different sub-structures can be formed with different values of $K$. An important one to mention is $K = 2$, here named the Even Odd scheme.
- The processing time is decreased, but the decrease is not $K$ times if we include the time to generate the powers of $x$.
- Generating powers of $x$ is an extra burden in this case, which requires some algorithm and hardware.

Figure 1.3. 6-th order polynomial arranged according to the Even Odd scheme represented by (1.9).

1.3.4 Estrin's scheme

In [5], Gerald Estrin came up with a scheme which is now known by his name. As seen in Horner's scheme, everything runs in series, so it is a long path to the end. Estrin's scheme also introduces parallelism to decrease the critical path. In Estrin's method, sub-expressions of the form $(A + Bx)$ and $x^{2^n}$ are isolated. Equation (1.1) for $N = 6$ can be written as

$$P(x) = (a_0 + a_1 x) + (a_2 + a_3 x)x^2 + ((a_4 + a_5 x) + a_6 x^2)x^4. \tag{1.10}$$

We get (1.10) from (1.1) by splitting the polynomial in the following way [9]:

$$P(x) = q(x)\,x^{\lfloor N/2 \rfloor + 1} + r(x),$$

where

$$q(x) = a_N x^{N - \lfloor N/2 \rfloor - 1} + \dots + a_{\lfloor N/2 \rfloor + 1},$$

and

$$r(x) = a_{\lfloor N/2 \rfloor} x^{\lfloor N/2 \rfloor} + \dots + a_0.$$

Estrin's scheme [5] is an intelligent scheme. The larger the polynomial, the more parallel it becomes, and it uses square terms, which are easier to implement than odd terms, since a squarer is more suitable than a multiplier in terms of hardware.

Pros and Cons of Estrin's scheme
- An intelligent algorithm which changes itself with the change in order of the polynomial unlike K-th Order Horner in which the algorithm depends
on the value of $K$, and this value is different for different polynomial orders depending upon the requirements to be met.
- Along with the advantage that parallelism increases with the polynomial order, another advantage is that it only requires powers of $x$ that are squares of the previous one. This is the minimum time needed to generate a specific power of $x$.
- In general it may be better for introducing parallelism, but for a specific polynomial order it has only one solution, which may not be the optimal one. In other schemes we can change the value of a given variable to get the desired results.

Figure 1.4. 6-th order polynomial arranged according to Estrin's scheme represented by (1.10).

1.3.5 A Simple Parallel Algorithm For Polynomial Evaluation

In [8] a parallel algorithm for polynomial evaluation is presented for a polynomial $P(x)$ of degree $N$,

$$P(x) = \sum_{i=0}^{N} a_i x^i, \qquad N + 1 = KL,$$

where the number of processors is $P = L + 1$. Divide the $N + 1$ terms of $P(x)$ into $L$ groups
$$P(x) = \sum_{i=0}^{L-1} b_i x^{iK},$$

where

$$b_i = a_{iK} + a_{iK+1} x + a_{iK+2} x^2 + \dots + a_{iK+K-1} x^{K-1}$$

for $i = 0, 1, \dots, L-1$.

For an even value of $N + 1$, [8] gives the maximum flexibility of parallelism. Since every even number is divisible by two, setting $K = 2$ gives maximum parallelism, because $L$ is then at its maximum:

$$N + 1 = KL, \qquad L = (N + 1)/2.$$

Let us take an example of a structure of this type with order $N = 5$ and $L = 3$. In this case

$$P(x) = (a_0 + a_1 x) + (a_2 + a_3 x)x^2 + (a_4 + a_5 x)x^4. \tag{1.11}$$

For odd $N + 1$, however, this level of parallelism is not achievable with this algorithm.

Figure 1.5. 5-th order polynomial arranged according to Lie's scheme represented by (1.11).

Among odd values of $N + 1$, values which are divisible by 3, i.e., $K = 3$, give better parallelism. Parallelism is reduced when $N + 1$ is divisible by neither 2 nor 3, such as 25, 49, etc. The worst case occurs when $N + 1$ is a prime number. In this scenario the algorithm fails completely as far as parallelism is concerned.
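The grouping above can be illustrated with a short Python sketch. It is a software model of the arithmetic only, under the assumption that `coeffs[i]` holds $a_i$; the function name `grouped_eval` is hypothetical, and the groups are evaluated in a loop here although in hardware each $b_i$ could be computed by its own processing element.

```python
def grouped_eval(coeffs, x, K):
    """Evaluate P(x) as sum_i b_i(x) * x**(i*K), with N + 1 = K * L terms.

    coeffs[i] is assumed to hold a_i; K is the group size.
    """
    n_terms = len(coeffs)                  # N + 1
    if K <= 0 or n_terms % K != 0:
        raise ValueError("N + 1 must be a multiple of K")
    L = n_terms // K                       # number of groups
    x_k = x ** K                           # x^K, shared by all groups
    total = 0
    power = 1                              # holds x^(i*K)
    for i in range(L):
        group = coeffs[i * K:(i + 1) * K]  # a_{iK}, ..., a_{iK+K-1}
        b_i = sum(a * x ** j for j, a in enumerate(group))
        total += b_i * power
        power *= x_k
    return total


if __name__ == "__main__":
    # N = 5, K = 2, L = 3, as in (1.11), with a_i = i + 1 and x = 3.
    print(grouped_eval([1, 2, 3, 4, 5, 6], 3, 2))
```

When $N + 1$ is prime, the only admissible group sizes in this sketch are $K = 1$ and $K = N + 1$, which matches the observation that no useful grouping exists in that case.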
Pros and Cons of the Simple Parallel Algorithm for Polynomial Evaluation
- When it is applicable it can give the best results, along with the option of changing the structure according to the priorities, whether the aim is to minimize area or critical path.
- Unfortunately, it is not generally applicable due to certain limitations.

1.3.6 Algorithm A

This algorithm is defined in [9] along with its different versions. According to this algorithm, if we have $K < N$ processors, we can express a polynomial of degree $N$ as follows:

$$P(x) = A_0(x) + A_1(x)\,\alpha + \dots + A_{k-1}(x)\,\alpha^{k-1},$$

where $\alpha = x^{N/k}$ and the $A_i$ are polynomials of degree $N/k$. Using different possible values of $K$, different aspects of the structure can be targeted. A 6-th order polynomial as shown in (1.1) with $N = 6$ can be represented using Algorithm A as:

$$P(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + (a_4 x + a_5 x^2 + a_6 x^3)x^3. \tag{1.12}$$

Figure 1.6. 6-th order polynomial arranged according to the Algorithm A scheme represented by (1.12).

Pros and Cons of Algorithm A
- It can provide better parallelism.
- A setback is that it is also not generally applicable to all polynomials.

1.3.7 Direct Evaluation

This method seems to be the most parallel version. The required powers of $x$ are computed and multiplied with the coefficients in parallel. No algorithm is required:

$$P(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + a_5 x^5 + a_6 x^6. \tag{1.13}$$

Figure 1.7. 6-th order polynomial arranged according to Direct Evaluation represented by (1.13).

Pros and Cons of Direct Evaluation
- It is simple in the sense that no algorithm is needed: generate the powers, multiply them with the constants and then add the products.
- The apparent advantage is the shortest critical path, but it does not turn out to be the shortest one.
- The number of adders is not an issue in any of these schemes, because the number of adders is the same in each scheme for a given polynomial. As far as the hardware cost comparison is concerned, the difference is in the number of multipliers used.
- In all the previous schemes only some specific powers of $x$ need to be generated, and their number is always less than the order of the polynomial. In this case all powers of $x$ from two up to the order of the polynomial have to be generated.
- Here we are considering a general polynomial, so the case of a coefficient being zero is ignored, because that would address a specific case rather than the general one.
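As a rough illustration of the multiplier cost of direct evaluation, the following Python sketch generates every power from $x^2$ to $x^N$, multiplies each by its coefficient and counts the multiplications. The function name `direct_eval` and the way each power is built (splitting exponent $k$ into $\lfloor k/2 \rfloor$ and $k - \lfloor k/2 \rfloor$) are assumptions of this sketch, not the power generation scheme proposed later in the thesis.

```python
def direct_eval(coeffs, x):
    """Return (P(x), multiplication count) for direct evaluation.

    coeffs[i] is assumed to hold a_i; coeffs must not be empty.
    """
    if not coeffs:
        raise ValueError("at least one coefficient is required")
    n = len(coeffs) - 1           # polynomial order N
    powers = [1, x]               # x^0 and x^1 need no multiplication
    mults = 0
    for k in range(2, n + 1):     # every power from x^2 up to x^N
        half = k // 2
        powers.append(powers[half] * powers[k - half])
        mults += 1
    terms = [coeffs[0]]           # a_0 needs no multiplication
    for i in range(1, n + 1):     # a_i * x^i, independent of each other
        terms.append(coeffs[i] * powers[i])
        mults += 1
    return sum(terms), mults


if __name__ == "__main__":
    value, mults = direct_eval([1, 2, 3, 4, 5, 6, 7], 2)
    print(value, mults)
```

Under these assumptions an order-$N$ polynomial uses $N - 1$ multiplications for the powers and $N$ for the coefficient products, compared with $N$ multiplications in Horner's scheme.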
1.4 A general overlook

As we move from Horner's scheme to Direct Evaluation, the demand for hardware resources tends to increase, but at the same time the parallelism also seems to increase. So, before looking into the details, it can be observed that parallelism comes at the cost of hardware. If we gain one advantage we have to compromise on the other. As the main aim is to introduce as much parallelism as possible into the whole structure with minimum hardware, all the schemes are definitely going to be more expensive than Horner's scheme, which is sequential.

A trend seen in all the schemes is that each offers a variable controlling the parallelism. By changing this variable, the advantages and disadvantages can change completely. The choice of a particular value may not follow the general trends observed up to that point. More precisely, the value of the variable should be selected that is best for the given conditions and requirements, because within the same scheme one value of the variable makes the scheme good in terms of area, while another value makes it good for the critical path at a possibly higher area cost.

1.4.1 Using two schemes at the same time

To improve the performance of some schemes, there is a good chance of using another scheme inside that scheme. Some schemes tend to divide a polynomial into different sections, and if a section is itself a polynomial, then some scheme can also be used on that inner polynomial. This is shown below for an 11-th order polynomial implemented by Lie's scheme (Section 1.3.5) with Horner's rule (Section 1.3.1) applied to the inner polynomial.

Improvement for Lie's scheme, $K \geq 3$

With $K \geq 3$, the inner sum can be considered as an independent polynomial of degree $K - 1$, and different techniques can be used to simplify the sum in terms of arithmetic operations and to gain more parallelism, e.g., we can use Horner's method for the inner polynomial.
Let $N = 11$, so $N + 1 = 12$. With $L = 6$ and $K = 2$, we get

$$P(x) = a_0 + a_1 x + x^2(a_2 + a_3 x) + x^4(a_4 + a_5 x) + x^6(a_6 + a_7 x) + x^8(a_8 + a_9 x) + x^{10}(a_{10} + a_{11} x).$$

For $K = 3$ and the same polynomial, i.e., $N = 11$, we get

$$P(x) = a_0 + a_1 x + a_2 x^2 + x^3(a_3 + a_4 x + a_5 x^2) + x^6(a_6 + a_7 x + a_8 x^2) + x^9(a_9 + a_{10} x + a_{11} x^2).$$

With Horner's rule applied to the inner sums, this becomes

$$P(x) = a_0 + x(a_1 + a_2 x) + x^3(a_3 + x(a_4 + a_5 x)) + x^6(a_6 + x(a_7 + a_8 x)) + x^9(a_9 + x(a_{10} + a_{11} x)).$$
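The combination of the two schemes can be sketched in Python as follows. This is a minimal software model, assuming `coeffs[i]` holds $a_i$; the names `horner` and `grouped_horner` are hypothetical. The inner polynomials are evaluated with Horner's rule and the outer sum uses the powers $x^{iK}$, as in the last expression above.

```python
def horner(coeffs, x):
    """Horner's rule for sum(coeffs[i] * x**i); coeffs must not be empty."""
    b = coeffs[-1]
    for a_j in reversed(coeffs[:-1]):
        b = a_j + x * b
    return b


def grouped_horner(coeffs, x, K):
    """Split the N + 1 coefficients into groups of K and apply Horner's rule
    inside each group; the groups are independent of each other."""
    if K <= 0 or not coeffs or len(coeffs) % K != 0:
        raise ValueError("N + 1 must be a positive multiple of K")
    groups = [coeffs[i:i + K] for i in range(0, len(coeffs), K)]
    inner = [horner(g, x) for g in groups]   # could be computed in parallel
    x_k = x ** K
    total = 0
    power = 1                                # holds x^(i*K)
    for value in inner:
        total += value * power
        power *= x_k
    return total


if __name__ == "__main__":
    # N = 11, K = 3, with a_i = i + 1 and x = 2.
    print(grouped_horner(list(range(1, 13)), 2, 3))
```

In this model each inner polynomial of degree $K - 1$ needs $K - 1$ multiplications, and only the powers $x^K, x^{2K}, \dots$ have to be generated for the outer sum.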
Chapter 2
Generating the power terms

2.1 Algorithms for short chains

When we move from Horner's scheme to more parallel schemes, we have to deal with higher powers of $x$ as the order of the polynomial increases. In Horner's scheme we only need $x$, so we do not need to generate powers of $x$. In Estrin's scheme, for example, we may need $x^{16}$. To get $x^{16}$ we can either start with $x$ and multiply it by $x$ 15 times, or think of a more efficient method in which the number of multiplications is reduced. An efficient method is to generate $x^2$, $x^4$, $x^8$ and $x^{16}$ by squaring the previous result successively. In this way we can reduce the number of multiplications from 15 to 4:

$$x^2 = x \cdot x, \quad x^4 = x^2 \cdot x^2, \quad x^8 = x^4 \cdot x^4, \quad x^{16} = x^8 \cdot x^8.$$

If we write the powers of $x$ successively as a set we get $s = (1, 2, 4, 8, 16)$. Let us take another example, $x^{23}$. Its generation sequence would be

$$x^2 = x \cdot x, \quad x^3 = x^2 \cdot x, \quad x^5 = x^2 \cdot x^3, \quad x^8 = x^5 \cdot x^3, \quad x^{13} = x^5 \cdot x^8, \quad x^{21} = x^{13} \cdot x^8, \quad x^{23} = x^{21} \cdot x^2.$$

If this is also written as a set then we get $s = (1, 2, 3, 5, 8, 13, 21, 23)$. It is called
a chain for 23 of length 7, since

$$2 = 1 + 1, \quad 3 = 2 + 1, \quad 5 = 3 + 2, \quad 8 = 5 + 3, \quad 13 = 8 + 5, \quad 21 = 13 + 8, \quad 23 = 21 + 2.$$

2.1.1 The Binary Method

The binary method for calculating a short chain is one of the oldest known methods; it appeared before 200 B.C.

The algorithm

To calculate $x^N$ by this method, we convert $N$ into binary form and remove the zeros at the left. We replace each 1 by SX and each 0 by S, and then remove the SX on the left. The remaining part is the rule for calculating $x^N$, where S is the squaring function and X is multiplication by $x$. Let us take $x^{23}$ as an example. In this case we have $N = 23$. The first step is to convert 23 into binary form, i.e., 10111. Replacing each 1 by SX and each 0 by S, we get SXSSXSXSX. Removing the SX on the left, we get SSXSXSX, which means that the sequence of operations is squaring, squaring, multiplication by $x$, squaring, multiplication by $x$, squaring and then multiplication by $x$ [7]. This can be expressed as

$$x^2 = x \cdot x, \quad x^4 = x^2 \cdot x^2, \quad x^5 = x \cdot x^4, \quad x^{10} = x^5 \cdot x^5, \quad x^{11} = x^{10} \cdot x, \quad x^{22} = x^{11} \cdot x^{11}, \quad x^{23} = x^{22} \cdot x,$$

$$s = (1, 2, 4, 5, 10, 11, 22, 23).$$

The main advantage of this method is that temporary storage is only required for $x$ and the current partial result. Its simplicity also makes it the most cited method in the literature. Many authors have thought of it as the optimal algorithm, but this is not true for all values of $N$. Let us take $x^{15}$ as an example. In this case we have $N = 15$, which in binary is 1111. Replacing each 1 by SX, we get SXSXSXSX. Removing
the leftmost SX we are left with SXSXSX. So the sequence to generate $x^{15}$ would be

$$x^2 = x \cdot x, \quad x^3 = x^2 \cdot x, \quad x^6 = x^3 \cdot x^3, \quad x^7 = x^6 \cdot x, \quad x^{14} = x^7 \cdot x^7, \quad x^{15} = x^{14} \cdot x,$$

$$s = (1, 2, 3, 6, 7, 14, 15).$$

With the binary method we need 6 multiplications to get $x^{15}$ from $x$, but we can get the same result in 5 multiplications:

$$x^2 = x \cdot x, \quad x^3 = x^2 \cdot x, \quad x^6 = x^3 \cdot x^3, \quad x^9 = x^6 \cdot x^3, \quad x^{15} = x^9 \cdot x^6,$$

$$s = (1, 2, 3, 6, 9, 15).$$

Another way to represent the binary method for calculating a short chain for $x^N$ is [3]

$$x^N = \begin{cases} x & \text{if } N = 1, \\ x^{N/2} \cdot x^{N/2} & \text{if } N \text{ is even}, \\ x^{N-1} \cdot x & \text{otherwise.} \end{cases}$$

2.1.2 Factor Method

As its name suggests, this method is based on the factorization of $N$. It works entirely differently from the binary method.

The algorithm

To calculate $x^N$, we take $N = p \cdot q$, where $p$ is the smallest prime factor of $N$ and $q > 1$. First $x^p$ is calculated, and the outcome is then raised to the $q$-th power. If $N$ is a prime number, $x^{N-1}$ is calculated and multiplied by $x$. For $N = 1$ we do not need any calculations. By applying this algorithm recursively, we can calculate $x^N$. Let us take $x^{55}$ as an example. Here

$$N = 55 = p \cdot q,$$

so $p = 5$ and $q = 11$.
Now we apply this algorithm again to $p$ and $q$. As $p$ is a prime number,

$$y = x^5 = x^4 \cdot x = (x^2)^2 \cdot x,$$

$$y^{11} = y^{10} \cdot y = (y^2)^5 \cdot y.$$

This can be written out step by step as

$$x^2 = x \cdot x, \quad x^4 = x^2 \cdot x^2, \quad x^5 = x^4 \cdot x, \quad x^{10} = x^5 \cdot x^5, \quad x^{20} = x^{10} \cdot x^{10}, \quad x^{40} = x^{20} \cdot x^{20}, \quad x^{50} = x^{40} \cdot x^{10}, \quad x^{55} = x^{50} \cdot x^5,$$

$$s = (1, 2, 4, 5, 10, 20, 40, 50, 55).$$

To calculate $x^{55}$, 8 multiplications are needed with this method, while the binary method requires 9 multiplications for the same calculation. Generally this method is better than the binary method, but not always. The minimum value for which the binary method is better than the factor method is $N = 33$ [7].

2.1.3 Power Tree Method

Another, graphical, method which gives minimal addition chains, i.e., the minimum number of multiplications, for relatively small values of $N$ is shown in Fig. 2.1, taken from [7]. To find the desired result, the required $N$ is located in the tree, and the path from the root to $N$ gives the sequence of operations required to calculate $x^N$.

The algorithm

Suppose that $i$ levels of the tree are completed and we have to build the $(i+1)$-th level. Take each node $N$ in the $i$-th level, starting from the left and moving towards the right. Attach the nodes $N + 1, N + a_1, N + a_2, \dots, N + a_{i-1} = 2N$ to node $N$, where $1, a_1, a_2, \dots, a_{i-1}$ is the path from the root of the tree to the node $N$. A node that has already been declared is not repeated. Let us take $N = 14$ as an example (Fig. 2.1). Here $a_1, a_2, a_3, a_4 = 2, 3, 5, 7$ respectively. The new nodes could be 15, 16, 17, 19, 21, 28, but 15, 16 and 17 have already been declared in the tree, so these are not declared again. We are left with three nodes, i.e., 19, 21, 28, attached to node $N = 14$, as shown in Fig. 2.1.

It has been verified that this method gives optimal results for the values of $N$ listed in this example, but this might not be the case for large enough values. The smallest values of $N$ for which this method is not the best are $N = 77, 154, 233$. The minimum value for which it overtakes both of Binary and Factor method is for
N = 23. It is not always better than the factor method. For $N \le 100000$, it is better than the factor method 88803 times, gives the same result as the factor method 11191 times, and loses to the factor method only 6 times [7].

Figure 2.1. Power Tree.

2.2 Addition chains

The problem of generating powers of x with the minimum number of multiplications is actually the problem of finding the shortest addition chain for an integer. In our case this integer is the power of x. In the above two examples we derived addition chains for 16 and 3. An addition chain for an integer N is an ascending list of integers $1 = a_0, a_1, \ldots, a_r = N$ such that any element except the first one can be represented as the sum of two preceding elements [3].

In [13], a new perspective on the problem was studied: the differentiation between squarers and multipliers in addition chains, which had not been considered before. Treating a squarer as a different entity from a multiplier is beneficial, as shown by the work done in [13]. According to [13], if the area cost of a binary adder on an FPGA is taken to be $C_a = n$, where n is the input bit size, then the cost of binary multiplication and binary squaring can be calculated. From [1] it is observed in [13] that a parallel array multiplier consists of n n-bit adders. A parallel add vector based multiplier also needs similar hardware resources. To produce the partial products, $n^2$ AND operations are needed. The multiplier cost on an FPGA is calculated as $C_m = 2n^2$. In [10], it is shown that when computing a square the number of partial product bits can be reduced to half. On the basis of this work, the cost of a squarer on an FPGA is calculated to be $C_s = n^2$. If this holds, then addition chains which consist of more squarers give more economical results as far as the area cost is concerned. Let us take N = 5 as an example. The two possible chains are
chain1 = 1, 2, 3, 5
chain2 = 1, 2, 4, 5.

Both chains have the same length. If we only consider the number of multiplications they are equivalent, but if we treat squarers as a different entity they differ from each other. The area cost on an FPGA is $2C_m + C_s$ for chain1 and $C_m + 2C_s$ for chain2. It can be seen that chain2 is more economical, keeping in mind that the same number of operations is needed for both chains. For this reason the minimal cost addition chain problem is defined in [13] as

$$C_A = \sum_{a \in A} w(a), \qquad w(a) = \begin{cases} C_m & \text{if } a = b + c,\ b \neq c,\ b, c \in A \\ C_s & \text{if } a = 2b,\ b \in A \\ 0 & \text{if } a = 1 \end{cases}$$

The following theorems are derived in [13].

Theorem 1 If $C_s$ and $C_m$ are the area costs of a squarer and a multiplier respectively, then a lower bound for the cost of an addition chain for N is $C_s \log_2 N + C_m \log_2 v(N)$, where $v(N)$ is the number of ones in the binary representation of N [13].

Theorem 2 If $C_s$ and $C_m$ are the area costs of a squarer and a multiplier respectively, then a lower bound for the cost $C_{pq}$ of any addition chain with elements from p to q, where p > q, is given by:
(a) If q t p, then if q is even $C_{pq} = tC_s$, otherwise $C_{pq} = (t - 1)C_s + C_m$.
(b) If q is odd and q 3 t r, then $C_{pq} = (t + 1)C_s + C_m$ [13].

2.3 Implementation issues

2.3.1 Multiple power terms

Even after considering all these power generation techniques, certain limitations remain when practical issues are tackled. The first and most important shortcoming of all the methods stated above is that they focus on finding the shortest addition chain for a single value of N, whereas in a real polynomial evaluation task more than one power of x is needed at a time. With the above methods the hardware cost for generating one specific power may be minimal, but if more than one power has to be calculated then the overall cost will increase, because the powers are generated independently. From Table 2.1 it can be seen that generating different
powers requires different shortest addition chains that are independent of each other. As an example, take N = 8. It has the shortest addition chain (1, 2, 4, 8), which does not contain 7, 6, 5 or 3; the same situation can be seen for 7, 6 and 5. This situation gets worse as the value of N increases along with the required number of powers to be generated.

Table 2.1. Shortest addition chains for values of N from 5 to 16.

| N | Shortest addition chains |
|---|---|
| 5 | (1, 2, 3, 5) (1, 2, 4, 5) |
| 6 | (1, 2, 3, 6) (1, 2, 4, 6) |
| 7 | (1, 2, 3, 4, 7) (1, 2, 3, 5, 7) (1, 2, 3, 6, 7) (1, 2, 4, 5, 7) (1, 2, 4, 6, 7) |
| 8 | (1, 2, 4, 8) |
| 9 | (1, 2, 3, 6, 9) (1, 2, 4, 5, 9) (1, 2, 4, 8, 9) |
| 10 | (1, 2, 3, 5, 10) (1, 2, 4, 5, 10) (1, 2, 4, 6, 10) (1, 2, 4, 8, 10) |
| 11 | (1, 2, 3, 4, 7, 11) (1, 2, 3, 4, 8, 11) (1, 2, 3, 5, 6, 11) (1, 2, 3, 5, 8, 11) (1, 2, 3, 5, 10, 11) (1, 2, 3, 6, 8, 11) (1, 2, 3, 6, 9, 11) (1, 2, 4, 5, 6, 11) (1, 2, 4, 5, 7, 11) (1, 2, 4, 5, 9, 11) (1, 2, 4, 5, 10, 11) (1, 2, 4, 6, 7, 11) (1, 2, 4, 6, 10, 11) (1, 2, 4, 8, 9, 11) (1, 2, 4, 8, 10, 11) |
| 12 | (1, 2, 3, 6, 12) (1, 2, 4, 6, 12) (1, 2, 4, 8, 12) |
| 13 | (1, 2, 3, 5, 8, 13) (1, 2, 3, 5, 10, 13) (1, 2, 3, 6, 7, 13) (1, 2, 3, 6, 12, 13) (1, 2, 4, 5, 8, 13) (1, 2, 4, 5, 9, 13) (1, 2, 4, 6, 7, 13) (1, 2, 4, 6, 12, 13) (1, 2, 4, 8, 9, 13) (1, 2, 4, 8, 12, 13) |
| 14 | (1, 2, 3, 4, 7, 14) (1, 2, 3, 5, 7, 14) (1, 2, 3, 6, 7, 14) (1, 2, 3, 6, 8, 14) (1, 2, 3, 6, 12, 14) (1, 2, 4, 5, 7, 14) (1, 2, 4, 5, 9, 14) (1, 2, 4, 5, 10, 14) (1, 2, 4, 6, 7, 14) (1, 2, 4, 6, 8, 14) (1, 2, 4, 6, 10, 14) (1, 2, 4, 6, 12, 14) (1, 2, 4, 8, 10, 14) (1, 2, 4, 8, 12, 14) |
| 15 | (1, 2, 3, 5, 10, 15) (1, 2, 3, 6, 9, 15) (1, 2, 3, 6, 12, 15) (1, 2, 4, 5, 10, 15) |
| 16 | (1, 2, 4, 8, 16) |

2.3.2 Critical path

The second important issue, which is not covered by shortest addition chains or any of the previous methods for generating powers, is the critical path from $x$ to $x^N$. In addition chains and the other power generation schemes, the path may be the shortest in terms of the number of additions without being the shortest in terms of critical path. Let us take the example in Fig. 2.2. Two chains for generating $x^{23}$ are compared.
Let T be the time taken by a single multiplication or squaring, so the time of a single multiplication operation is scaled horizontally. In this particular case the sizes of all the multipliers are taken to be the same, which might not be the case in a real scenario. Here a comparison is made between the critical paths of the two power generation schemes, so different multiplier sizes will not affect the conclusion. The chain for the scheme on top in Fig. 2.2 is 1, 2, 3, 4, 7, 8, 16, 23 and for the scheme at the bottom it is 1, 2, 3, 5, 10, 20, 23. It can be seen that the scheme at
the bottom is a shortest addition chain with length 6, whereas the scheme at the top has a length of 7. The important point, however, is that the scheme at the top gives the result $x^{23}$ after 5 multiplication times, i.e., 5T, while the scheme at the bottom gives the same result after 6 multiplication times. The disadvantage of the top scheme is that its chain length is 7.

Figure 2.2. Comparison of two schemes for generating $x^{23}$ (squarers and multipliers plotted against time, 1T to 6T).

2.4 A proposed solution

We have to deal, in a combined way, with three issues that were explained separately in the previous sections:

- How to use minimum resources to generate multiple powers.
- Observe the trade-offs between the least number of multiplications and the minimum critical path.
- Take advantage of the lower complexity of a squarer compared with a multiplier.

2.4.1 Generating all powers of x from 2 to N

Here we have N = 32, as shown in Table 2.2, where S represents a squarer and M represents a multiplier. The subscripts of M and S represent their corresponding position in the power generation string. Column 2 shows the length of the shortest addition chains for the respective values of N. Column 3 shows the squarers and multipliers required to generate that specific power by the proposed method. Column 4 shows the multipliers and squarers in the critical path for generating that specific power in the proposed method.
The fifth and last column shows the total expenditure of multipliers and squarers up to that particular value of N for the proposed method. A comparison between the length of the shortest addition chains and that of the proposed solution is made in columns 2 and 3. We find that, for N = 1 to N = 32, the length of the shortest addition chain is less than the proposed one only at N = 23, 27, 30, 31. From Table 2.1 it can be observed that generating each power independently would require a lot more resources than the proposed solution in Table 2.2. The reason is that the maximum number of multiplication operations are converted to squaring operations, and the critical path issue has also been taken into account. A scheme showing the power generation for N from 1 to 16 is shown in Fig. 2.3.

Figure 2.3. Generating powers of x.

2.4.2 Generating specific powers of x

In real-time implementations of polynomial evaluation schemes we find some schemes in which only some specific powers are needed. If we follow the above method we may use excessive resources that are not needed. Let us analyze a scenario in which only even powers of x are needed in the polynomial evaluation. Table 2.3 shows a scheme for generating only the even powers of x from 2 to 32. Compared with Table 2.2, especially the last column, it clearly indicates that the hardware cost has been reduced by half. A power generating scheme for only the even powers of x, from $x^2$ to $x^{32}$, is shown in Fig. 2.4. Table 2.4 shows an example of generating the powers of x that are multiples of three. The last column shows the reduction in multiplication and squaring operations. The same method is illustrated in Fig. 2.5 for N = 1 to N = 15. Figure 2.6 shows the generation of
powers for Estrin's scheme.

Figure 2.4. Generating even powers of x.

Figure 2.5. Generating powers of x for multiples of three.

2.4.3 Sharing of hardware resources

This method is independent of all the other methods explained above, and it can be used to further reduce the cost of implementation by reducing the cost of hardware and power. The idea is simple: x is the same for generating all powers, so if we look inside a multiplier circuit and at the multiplication principle, we find that some of the partial products generated when producing one power of x repeat in the partial products obtained during the generation of another power of x (this
probability certainly increases if the powers of x being compared are adjacent to each other, e.g., $x^5$ and $x^6$). These partial products can be shared instead of being produced again and again. This may increase the interconnect issues inside the circuit, and the interconnect area might become an additional overhead.

Figure 2.6. Generating powers of x for Estrin's scheme.
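The chain constructions of Section 2.1, the cost model of Section 2.2 and the critical-path comparison of Section 2.3.2 can be checked with a short sketch. The Python below is illustrative only: the function names (`binary_chain`, `factor_chain`, `chain_cost`, `chain_depth`) are hypothetical and do not come from the cited works, `chain_cost` takes the two area costs as plain parameters, and `chain_depth` assumes that every squaring or multiplication takes one time unit T and that independent operations run in parallel.

```python
def binary_chain(n):
    """Addition chain for n produced by the binary (square-and-multiply) method."""
    chain = [1]
    for bit in bin(n)[3:]:               # bits after the leading 1
        chain.append(chain[-1] * 2)      # squaring doubles the exponent
        if bit == "1":
            chain.append(chain[-1] + 1)  # multiplication by x adds one
    return chain


def smallest_prime_factor(n):
    """Smallest prime factor of n (n itself when n is prime)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n


def factor_chain(n):
    """Addition chain for n produced by the factor method."""
    if n == 1:
        return [1]
    p = smallest_prime_factor(n)
    if p == n:                           # prime: build n - 1, then multiply by x
        return factor_chain(n - 1) + [n]
    q = n // p
    # Build x**p first, then raise y = x**p to the q-th power.
    return factor_chain(p) + [p * c for c in factor_chain(q)[1:]]


def chain_cost(chain, c_m, c_s):
    """Weighted cost: c_s for a doubling of an earlier element, c_m otherwise."""
    seen, cost = {1}, 0
    for a in chain[1:]:
        cost += c_s if a % 2 == 0 and a // 2 in seen else c_m
        seen.add(a)
    return cost


def chain_depth(chain):
    """Critical path of the last element, in units of T."""
    depth = {1: 0}
    for a in chain[1:]:
        depth[a] = 1 + min(
            max(depth[b], depth[a - b]) for b in depth if a - b in depth
        )
    return depth[chain[-1]]


print(binary_chain(15), factor_chain(55))
print(chain_depth([1, 2, 3, 4, 7, 8, 16, 23]), chain_depth([1, 2, 3, 5, 10, 20, 23]))
```

With unit costs `c_m = 2` and `c_s = 1`, `chain_cost` ranks the chain 1, 2, 4, 5 below 1, 2, 3, 5, and `chain_depth` reports 5 and 6 time units for the two chains compared in Section 2.3.2.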
Table 2.2. Generating multiple powers of x from 2 to 32 (S = squarer and M = multiplier).

| N | Length of shortest addition chains | Multiplications for single power | Critical path | Multipliers needed to generate powers up to N |
|---|---|---|---|---|
| 2 | 1 | S = 1, M = 0 | S_1 | S = 1, M = 0 |
| 3 | 2 | S = 1, M = 1 | S_1 M_1 | S = 1, M = 1 |
| 4 | 2 | S = 2, M = 0 | S_1 S_2 | S = 2, M = 1 |
| 5 | 3 | S = 2, M = 1 | S_1 S_2 M_2 | S = 2, M = 2 |
| 6 | 3 | S = 2, M = 1 | S_1 M_1 S_3 | S = 3, M = 2 |
| 7 | 4 | S = 2, M = 2 | S_1 M_1 M_3 | S = 3, M = 3 |
| 8 | 3 | S = 3, M = 0 | S_1 S_2 S_4 | S = 4, M = 3 |
| 9 | 4 | S = 3, M = 1 | S_1 S_2 S_4 M_4 | S = 4, M = 4 |
| 10 | 4 | S = 3, M = 1 | S_1 S_2 M_2 S_5 | S = 5, M = 4 |
| 11 | 5 | S = 3, M = 2 | S_1 S_2 S_4 M_5 | S = 5, M = 5 |
| 12 | 4 | S = 3, M = 1 | S_1 M_1 S_3 S_6 | S = 6, M = 5 |
| 13 | 5 | S = 3, M = 1 | S_1 S_2 M_2 M_7 | S = 6, M = 6 |
| 14 | 5 | S = 3, M = 2 | S_1 M_1 M_3 S_7 | S = 7, M = 7 |
| 15 | 5 | S = 3, M = 3 | S_1 M_1 M_3 M_8 | S = 7, M = 8 |
| 16 | 4 | S = 4, M = 0 | S_1 S_2 S_4 S_8 | S = 8, M = 8 |
| 17 | 5 | S = 4, M = 1 | S_1 S_2 S_4 S_8 M_9 | S = 8, M = 9 |
| 18 | 5 | S = 4, M = 1 | S_1 S_2 S_4 M_4 S_9 | S = 9, M = 9 |
| 19 | 6 | S = 4, M = 2 | S_1 S_2 S_4 S_8 M_10 | S = 9, M = 10 |
| 20 | 5 | S = 4, M = 1 | S_1 S_2 M_2 S_5 S_10 | S = 10, M = 10 |
| 21 | 6 | S = 4, M = 2 | S_1 S_2 S_4 S_8 M_11 | S = 10, M = 11 |
| 22 | 6 | S = 4, M = 2 | S_1 S_2 S_4 M_5 S_11 | S = 11, M = 11 |
| 23 | 6 | S = 4, M = 3 | S_1 S_2 S_4 S_8 M_12 | S = 11, M = 12 |
| 24 | 5 | S = 4, M = 1 | S_1 M_1 S_3 S_6 S_12 | S = 12, M = 12 |
| 25 | 6 | S = 4, M = 2 | S_1 S_2 S_4 M_4 M_13 | S = 12, M = 13 |
| 26 | 6 | S = 4, M = 2 | S_1 M_1 S_3 M_7 S_13 | S = 13, M = 13 |
| 27 | 6 | S = 4, M = 3 | S_1 S_2 S_4 M_5 M_14 | S = 13, M = 14 |
| 28 | 6 | S = 4, M = 2 | S_1 M_1 M_3 S_7 S_14 | S = 14, M = 14 |
| 29 | 7 | S = 4, M = 3 | S_1 S_2 M_2 M_7 M_15 | S = 14, M = 15 |
| 30 | 6 | S = 4, M = 3 | S_1 M_1 M_3 M_8 S_15 | S = 15, M = 15 |
| 31 | 7 | S = 4, M = 4 | S_1 M_1 M_3 M_8 M_16 | S = 15, M = 16 |
| 32 | 5 | S = 5, M = 0 | S_1 S_2 S_4 S_8 S_16 | S = 16, M = 16 |
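The construction behind Table 2.2 can be sketched in a few lines. The sketch rests on an assumption inferred from the critical-path column rather than stated in the text: every even power $x^N$ is formed by squaring $x^{N/2}$, and every odd power by multiplying $x^{2^k}$ with $x^{N-2^k}$, where $2^k$ is the largest power of two below N. The names `split`, `depth` and `units_for` are hypothetical.

```python
def split(n):
    """Exponents (a, b) with a + b == n that are combined to form x**n."""
    if n % 2 == 0:
        return n // 2, n // 2            # squarer
    top = 1 << (n.bit_length() - 1)      # largest power of two below an odd n
    return top, n - top                  # multiplier


def depth(n):
    """Number of squarer/multiplier stages on the critical path to x**n."""
    if n == 1:
        return 0
    a, b = split(n)
    return 1 + max(depth(a), depth(b))


def units_for(n):
    """(squarers, multipliers) needed when only x**n is wanted."""
    needed, stack = set(), [n]
    while stack:
        m = stack.pop()
        if m > 1 and m not in needed:
            needed.add(m)
            stack.extend(split(m))
    squarers = sum(1 for m in needed if m % 2 == 0)
    return squarers, len(needed) - squarers


for n in (15, 23, 32):
    print(n, depth(n), units_for(n))
```

Under this assumption the critical path to $x^N$ is $\lceil \log_2 N \rceil$ stages, which is what trades a slightly longer chain for a shorter delay in the cases where the shortest addition chain needs fewer operations.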
Table 2.3. Generating multiple even powers of x from 2 to 32 (S = squarer and M = multiplier).

| N | Shortest addition chains | Multiplications for single power | Critical path | Multipliers needed to generate powers up to N |
|---|---|---|---|---|
| 2 | 1 | S = 1, M = 0 | S_1 | S = 1, M = 0 |
| 4 | 2 | S = 2, M = 0 | S_1 S_2 | S = 2, M = 0 |
| 6 | 3 | S = 2, M = 1 | S_1 S_2 M_1 | S = 2, M = 1 |
| 8 | 3 | S = 3, M = 0 | S_1 S_2 S_3 | S = 3, M = 1 |
| 10 | 4 | S = 3, M = 1 | S_1 S_2 S_3 M_2 | S = 3, M = 2 |
| 12 | 4 | S = 3, M = 1 | S_1 S_2 M_1 S_4 | S = 4, M = 2 |
| 14 | 5 | S = 3, M = 2 | S_1 S_2 M_1 M_3 | S = 4, M = 3 |
| 16 | 4 | S = 4, M = 0 | S_1 S_2 S_3 S_5 | S = 5, M = 3 |
| 18 | 5 | S = 4, M = 1 | S_1 S_2 S_3 S_5 M_4 | S = 5, M = 4 |
| 20 | 5 | S = 4, M = 1 | S_1 S_2 S_3 M_2 S_6 | S = 6, M = 4 |
| 22 | 6 | S = 4, M = 2 | S_1 S_2 S_3 S_5 M_5 | S = 6, M = 5 |
| 24 | 5 | S = 4, M = 1 | S_1 S_2 M_1 S_4 S_7 | S = 7, M = 5 |
| 26 | 6 | S = 4, M = 2 | S_1 S_2 S_3 S_5 M_6 | S = 7, M = 6 |
| 28 | 6 | S = 4, M = 2 | S_1 S_2 M_1 M_3 S_8 | S = 8, M = 6 |
| 30 | 6 | S = 4, M = 3 | S_1 S_2 M_1 M_3 M_7 | S = 8, M = 7 |
| 32 | 5 | S = 5, M = 0 | S_1 S_2 S_3 S_5 S_9 | S = 9, M = 7 |

Table 2.4. Generating powers (multiples of 3) of x from 3 to 30 (S = squarer and M = multiplier).

| N | Shortest addition chains | Multiplications for single power | Critical path | Multipliers needed to generate powers up to N |
|---|---|---|---|---|
| 3 | 2 | S = 1, M = 1 | S_1 M_1 | S = 1, M = 1 |
| 6 | 3 | S = 2, M = 1 | S_1 M_1 S_3 | S = 2, M = 1 |
| 9 | 4 | S = 3, M = 1 | S_1 S_2 S_4 M_4 | S = 2, M = 2 |
| 12 | 4 | S = 3, M = 1 | S_1 M_1 S_3 S_6 | S = 3, M = 2 |
| 15 | 5 | S = 3, M = 3 | S_1 M_1 M_3 M_8 | S = 3, M = 3 |
| 18 | 5 | S = 4, M = 1 | S_1 S_2 S_4 M_4 S_9 | S = 4, M = 3 |
| 21 | 6 | S = 4, M = 2 | S_1 S_2 S_4 S_8 M_11 | S = 4, M = 4 |
| 24 | 5 | S = 4, M = 1 | S_1 M_1 S_3 S_6 S_12 | S = 5, M = 4 |
| 27 | 6 | S = 4, M = 3 | S_1 S_2 S_4 M_5 M_14 | S = 5, M = 5 |
| 30 | 6 | S = 4, M = 3 | S_1 M_1 M_3 M_8 S_15 | S = 6, M = 5 |
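The totals in the last rows of Tables 2.3 and 2.4 can be reproduced by a small counting sketch. It assumes, as an interpretation of Section 2.4.2 rather than a statement from it, that the powers $x^s, x^{2s}, x^{3s}, \ldots$ are generated by first building $y = x^s$ and then applying the all-powers scheme to y (even powers of y by squaring, odd powers by one multiplication each). The function names are hypothetical.

```python
def split(n):
    """Exponents (a, b) with a + b == n that are combined to form x**n."""
    if n % 2 == 0:
        return n // 2, n // 2
    top = 1 << (n.bit_length() - 1)   # largest power of two below an odd n
    return top, n - top


def units_for(n):
    """Set of exponents that must be built to obtain x**n on its own."""
    needed, stack = set(), [n]
    while stack:
        m = stack.pop()
        if m > 1 and m not in needed:
            needed.add(m)
            stack.extend(split(m))
    return needed


def strided_units(stride, n_max):
    """(squarers, multipliers) for x**stride, x**(2*stride), ... up to x**n_max."""
    base = units_for(stride)                  # units that build y = x**stride
    squarers = sum(1 for m in base if m % 2 == 0)
    multipliers = len(base) - squarers
    k_max = n_max // stride                   # highest power of y required
    squarers += k_max // 2                    # y**2, y**4, ...
    multipliers += (k_max - 1) // 2           # y**3, y**5, ...
    return squarers, multipliers


print(strided_units(2, 32))   # even powers up to x**32
print(strided_units(3, 30))   # multiples of three up to x**30
```

For the even powers up to $x^{32}$ this gives 9 squarers and 7 multipliers, and for the multiples of three up to $x^{30}$ it gives 6 squarers and 5 multipliers.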
Chapter 3

Pipeline registers and bit requirements for different schemes

Due to time limitations, not all of the schemes explained earlier are considered in this chapter. Only four of them are considered here.

3.1 Bit requirement

Integer optimization

The tool used for obtaining the optimized number of bits for the different structures is GLPK (GNU Linear Programming Kit) []. It solves large-scale linear programming (LP), mixed integer programming (MIP), and other related problems. One example from each of the four schemes is explained below, and the bit requirements for all the remaining values of N (where N is the order of the polynomial, ranging from 3 to 8) are calculated for the four schemes. An important and basic requirement before using this optimization tool is to calculate the area of the multipliers that might be used in the optimization process. For this, simple VHDL code is written for the multiplication operation with different bit sizes, in our case multipliers from 2 to 20 bits. The areas are computed using the Design Analyzer tool. The area values are very important because they are the priority factors in the calculation of the bit requirement for a given output noise variance.

3.1.1 Even Odd scheme

This section explains the optimization method used to obtain the bit requirement for the Even Odd scheme. For simplicity and functional advantage, polynomials of even and odd order are considered separately.
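As a reminder of the structure being optimized, the sketch below evaluates a polynomial in even-odd form. It assumes that the Even Odd scheme splits $P(x)$ into $E(x^2) + x \cdot O(x^2)$, with E and O holding the even- and odd-indexed coefficients and each half evaluated by Horner's rule; this matches the order 4 structure used in this section, but the function names and the floating-point arithmetic are illustrative choices, not the fixed-point hardware itself.

```python
def horner(coeffs, x):
    """Evaluate sum(coeffs[k] * x**k) with Horner's rule."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


def even_odd_eval(coeffs, x):
    """Evaluate P(x) = E(x**2) + x * O(x**2); coeffs[k] multiplies x**k."""
    x2 = x * x                          # shared by both halves
    even = horner(coeffs[0::2], x2)     # a0 + a2*x^2 + a4*x^4 + ...
    odd = horner(coeffs[1::2], x2)      # a1 + a3*x^2 + ...
    return even + x * odd               # the two halves can run in parallel


# Order 4 example: P(x) = 1 + 2x + 3x^2 + 4x^3 + 5x^4
print(even_odd_eval([1, 2, 3, 4, 5], 0.5))
```

The even and odd halves share only $x^2$, so they can be computed concurrently and combined with one final multiplication and addition.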
In this scheme the structures for even order are different from those for odd-ordered polynomials, which is the reason for considering them separately in the optimization problem that finds their bit requirements.

For even ordered polynomials

For this case let us consider the simple example of a polynomial of order 4, as shown in Fig. 3.1. Fig. 3.2 shows the quantizers introduced after the multiplications and for the coefficients. In Fig. 3.3, each quantizer is replaced by its linear model.

Figure 3.1. An order 4 polynomial using the Even Odd scheme.

Figure 3.2. An Even Odd order 4 polynomial after inserting the quantizations.

The errors $e_{m1}, e_{m2}, e_{m3}, e_{m4}$ and $e_{s1}, e_{s2}, e_{s3}, e_{s4}, e_{s5}$ are introduced after the multiplications and at the coefficient inputs respectively. It is assumed that all errors are uncorrelated with each other. This provides simplicity, because the contribution from each error source can now be calculated independently using the superposition principle [6]. The impulse response h(n)
for all the noise sources $e_{m1}, e_{m2}, e_{m3}, e_{m4}$ and $e_{s1}, e_{s2}, e_{s3}, e_{s4}, e_{s5}$ is calculated independently. From Fig. 3.3 the impulse responses are

$h_{s5}(n) = b^4$
$h_{s4}(n) = b^3$
$h_{s3}(n) = b^2$
$h_{s2}(n) = b$
$h_{s1}(n) = 1$

for $e_{s5}, e_{s4}, e_{s3}, e_{s2}, e_{s1}$ respectively, and

$h_{m4}(n) = b^2$
$h_{m3}(n) = b$
$h_{m2}(n) = 1$
$h_{m1}(n) = 1$

for $e_{m4}, e_{m3}, e_{m2}, e_{m1}$ respectively.

Figure 3.3. An Even Odd order 4 polynomial after inserting the noise sources.

The total output noise variance is the sum of the variances calculated for the above sources using the following equation []

$$\sigma^2 = \frac{1}{12}\left(2^{-2b_1}\right)\sum_n h^2(n)$$

This is how the output noise variance is calculated. Next, the bit requirement for a given value of the noise variance has to be found. For that purpose integer linear optimization is used, and the objective function is the minimization of the area of the multipliers and adders used in the structure for a given round-off noise requirement at the output. The areas of multipliers of different bit lengths and of a full adder are calculated using the Design Analyzer tool by Synopsys. The general theme of the GLPK code is to find the optimal bit requirement for this structure for a given output noise variance at minimal hardware cost. A set of constraints is defined so that the calculation is limited to certain boundaries.
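The noise calculation above can be written out as a small sketch. Several points are assumptions made for illustration: the gains are the single-sample impulse responses listed for the order 4 structure, with b taken to be the value at the input; each source is given its own number of fractional bits, so that its quantization step is $2^{-\text{bits}}$ and its variance is the step squared over 12; and the names `source_gains` and `output_noise_variance` are hypothetical.

```python
def source_gains(b):
    """Gain from each noise source to the output P(x) for the order 4 structure."""
    return {
        "s5": b ** 4, "s4": b ** 3, "s3": b ** 2, "s2": b, "s1": 1.0,  # coefficients
        "m4": b ** 2, "m3": b, "m2": 1.0, "m1": 1.0,                   # multipliers
    }


def output_noise_variance(frac_bits, b):
    """Sum of (step**2 / 12) * gain**2 over all uncorrelated noise sources."""
    total = 0.0
    for name, gain in source_gains(b).items():
        step = 2.0 ** (-frac_bits[name])      # quantization step of this source
        total += step * step / 12.0 * gain * gain
    return total


# Example: every source quantized to 8 fractional bits, worst-case input b = 1.
bits = {name: 8 for name in source_gains(1.0)}
print(output_noise_variance(bits, 1.0))
```

Because the sources are assumed uncorrelated, the contributions simply add, which is what allows the total to be used as a single constraint in the optimization.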
In order to calculate the size of the multipliers and adders, the minimum and maximum of the two inputs $a_{1i}$ and $a_{2i}$ are $a_{4i}$ and $a_{5i}$ respectively. The constraints are functions of binary variables, as shown in Fig. 3.4. An important constraint is the computation of the total noise variance: for any assignment of bit widths to the binary variables in the structure, the noise variance is calculated and must be less than the specified limit at the output. The objective function is to minimize the area requirements while meeting the limit on the output noise variance. Here $a_{4i}$ and $a_{5i}$ are used to calculate the area of the multipliers and adders respectively, whereas the other input of the multiplier is fixed to some word length. As a result of this objective function, the bit widths are forced to smaller values to minimize the size of the multipliers and adders used, while still meeting the output requirement.

Figure 3.4. Binary variables used for the integer optimization code, for the structure in Fig. 3.3.

For odd ordered polynomials

Figure 3.5 shows the structure for odd-ordered polynomials in the Even Odd polynomial evaluation scheme. The structure shown is of order 3 to keep things simple. All the methods explained in Section 3.1.1 can also be applied to odd-ordered polynomials, with some minor adjustments because of the structural difference between the even-ordered and odd-ordered structures of the Even Odd scheme. These minor changes can be in the parameter set definitions and in some variable constraints, where conditions may differ according to the labeling of the structure. For even-ordered polynomials in the Even Odd scheme, the method also varies slightly with the order, e.g., from 4 to 6: some parameter values that were set for order 4 need to be adjusted for order 6.
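The optimization described in this section, minimizing area subject to a bound on the output noise variance, can be illustrated with a toy stand-in. The sketch below is not the GLPK model: it replaces the integer program with an exhaustive search, quantizes only the four multiplier outputs of the order 4 structure, and uses a hypothetical linear area model in place of the synthesized areas. The gains assume an input value of b = 0.5, and `FIXED_WIDTH`, `area`, `noise` and `optimize_widths` are names invented for this example.

```python
from itertools import product

# Gains of the four multiplier noise sources for an assumed input b = 0.5.
GAINS = {"m4": 0.25, "m3": 0.5, "m2": 1.0, "m1": 1.0}
FIXED_WIDTH = 16   # hypothetical word length of the other multiplier input


def area(width):
    """Hypothetical area model; the real values would come from synthesis."""
    return width * FIXED_WIDTH


def noise(widths):
    """Output noise variance for a dict of fractional bits per multiplier."""
    return sum(2.0 ** (-2 * w) / 12.0 * GAINS[k] ** 2 for k, w in widths.items())


def optimize_widths(max_variance, lo=2, hi=20):
    """Cheapest word lengths in [lo, hi] meeting the noise bound, or None."""
    names = list(GAINS)
    best = None
    for combo in product(range(lo, hi + 1), repeat=len(names)):
        cost = sum(area(w) for w in combo)
        if best is not None and cost >= best[0]:
            continue                           # cannot improve on the best so far
        widths = dict(zip(names, combo))
        if noise(widths) <= max_variance:
            best = (cost, widths)
    return best


cost, widths = optimize_widths(1e-6)
print(cost, widths)
```

An exhaustive search is only practical for a handful of variables; it is used here because it makes the objective and the noise constraint explicit, whereas an integer programming solver handles the larger structures.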