Optimal Signal Processing
Laboratory work

Martin Stridh, Leif Sörnmo, Bengt Mandersson
2007

Department of Electrical and Information Technology, Lund University (Lunds Tekniska Högskola), Sweden
1 Speech coding

1.1 Speech model

The speech signal consists of voiced and unvoiced sounds. A voiced sound is a pulse train (a periodic signal) generated in the glottis and then filtered in the mouth. An unvoiced sound (a non-periodic signal) is generated from noise and is also filtered in the mouth. An example of a waveform is shown in Fig. 1.

Figure 1: Example of a speech signal s(t) (waveform). [Plot annotated with the spoken Swedish phrase "Detta är en testsignal" ("This is a test signal"); horizontal axis 0 to 16000, vertical axis -1 to 1.]

Spectra for a periodic sound (left) and for a non-periodic sound (right) are shown in Fig. 2.

Figure 2: Left: spectrum for the vowel "a". Right: spectrum for the unvoiced sound "s". [Axes: frequency (Hz), 0 to 4000, versus amplitude.]

Fig. ?? shows an example of the system function of the mouth. The lips affect the model as

$$R(z) = 1 - z^{-1} \tag{1}$$
which is a high-pass filter with a slope of 6 dB/octave. The total frequency function (Fourier transform) S(f) can be written as

$$S(f) = R(f)\,V(f)\,E(f) \tag{2}$$

which is shown in Fig. 3.

Figure 3: Block diagram of speech production. [Block labels: pitch frequency; periodic impulse generator; noise generator; pulse shaper, e(t), E(f); vocal tract transfer function V(f); lip transfer function R(f); speech signal s(t), S(f).]

Figure 4: Block diagram of a speech coder. [Block labels: speech signal; LINEAR PREDICTIVE CODING (redundancy in the form of LPC parameters); speech without redundancy; PITCH CODING (information about the fundamental tone); reconstructed residual without tone; RESIDUAL CODING (residual for excitation).]
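The speech production model of Figure 3 can be tried out numerically. The following is a minimal MATLAB sketch and not part of the laboratory files: the pitch frequency (100 Hz), the single-resonance vocal tract filter and all variable names are illustrative assumptions. Only the lip filter R(z) is taken from Eq. (1).

```matlab
fs = 8000;                     % sampling rate in Hz (assumed)
N  = 8000;                     % one second of signal
f0 = 100;                      % assumed pitch frequency in Hz

% Excitation: periodic impulse train (voiced) or white noise (unvoiced)
e_voiced = zeros(N,1);
e_voiced(1:fs/f0:N) = 1;       % one pulse every fs/f0 = 80 samples
e_unvoiced = randn(N,1);

% Hypothetical vocal tract V(z): a single resonance near 700 Hz
r  = 0.97;                     % pole radius (assumed)
fc = 700;                      % resonance frequency in Hz (assumed)
aV = [1, -2*r*cos(2*pi*fc/fs), r^2];

% Lip radiation R(z) = 1 - z^-1 from Eq. (1)
bR = [1, -1];

% Time-domain counterpart of Eq. (2): excitation -> V(z) -> R(z)
s_voiced   = filter(bR, 1, filter(1, aV, e_voiced));
s_unvoiced = filter(bR, 1, filter(1, aV, e_unvoiced));
```

Listening with `sound(s_voiced/max(abs(s_voiced)), fs)` and the same call for `s_unvoiced` gives a rough impression of the difference between the two excitation types.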
2 Laboratory work 1: GSM speech coding

2.1 Preparation before the laboratory work

Read Chapters 4.7.2 and 5.2.6 in the textbook by Hayes.

2.2 Introduction

Global System for Mobile communications (GSM) is a digital system for mobile communication. It was developed in Europe but has spread all over the world. In 1982, the Conference of European Posts and Telegraphs (CEPT) started a special group, Groupe Spécial Mobile, to develop a new digital mobile phone system for Europe. In 1989, the project was moved to the European Telecommunications Standards Institute (ETSI). The first recommendation was published in 1990 and the first system started around 1991.

In this laboratory work, we will look at the 13 kbit/s speech coder Linear Predictive Coder - Long Term Prediction - Regular Pulse Excitation (LPC-LTP-RPE), known as GSM Full-rate (FR), where FR denotes the 13 kbit/s rate.

2.3 Description of the GSM speech coder

Figure 5: Block diagram for the GSM FR speech coder. [Block labels: speech signal; LPC: 8 reflection coefficients, 1800 bits/s; speech without redundancy; LONG-TERM PREDICTION: periodicity and gain of the fundamental tone, 1800 bits/s; reconstructed residual without tone; REGULAR PULSE EXCITATION: normalized and downsampled residual, gain and offset, 9400 bits/s.]

2.3.1 Input signal

Sample at 8 kHz sampling rate with 13 bits resolution. Use A-law for compression.

2.3.2 Prefiltering

The signal is first high-pass filtered in two steps,

$$H_{\text{offset}}(z) = \frac{1 - z^{-1}}{1 - \alpha z^{-1}}, \qquad \alpha = 32735 \cdot 2^{-15} \tag{3}$$

$$H_{\text{preemph}}(z) = 1 - \beta z^{-1}, \qquad \beta = 28180 \cdot 2^{-15} \tag{4}$$
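The two prefilters in Eqs. (3) and (4) map directly onto MATLAB's `filter`. The sketch below is an illustration under assumptions, not the laboratory function `prefilt`: the test input (a 200 Hz tone with a constant offset) and the variable names are made up for the example.

```matlab
alpha = 32735 * 2^-15;         % offset compensation pole, Eq. (3)
beta  = 28180 * 2^-15;         % pre-emphasis zero, Eq. (4)

% Assumed test input: two seconds of a 200 Hz tone with a constant offset
fs = 8000;
n  = (0:15999)';
x  = 0.5 + sin(2*pi*200*n/fs);

% H_offset(z) = (1 - z^-1) / (1 - alpha z^-1) removes the constant offset
y_offset = filter([1 -1], [1 -alpha], x);

% H_preemph(z) = 1 - beta z^-1 boosts the high frequencies
y = filter([1 -beta], 1, y_offset);
```

Because the pole alpha lies very close to 1, the offset decays slowly (on the order of a thousand samples), which is visible when `x` and `y` are plotted together.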
2.3.3 Linear predictive coding

In GSM, the filter parameters are updated every 160 samples (20 ms). The signal is therefore first divided into blocks of 160 samples, each modeled by

$$V(z) = \frac{1}{A(z)} = \frac{1}{1 + a_1 z^{-1} + a_2 z^{-2} + \dots + a_8 z^{-8}} \tag{5}$$

First, estimate r_s(0), ..., r_s(8) with (Hayes [4.153])

$$r_s(k) = \sum_{i=k}^{159} s(i)\,s(i-k), \qquad k = 0, \dots, 8 \tag{6}$$

Then determine the parameters of A(z) and the reflection coefficients γ_i. To increase the resolution for values close to -1 and 1, determine the Log-Area Ratios (LAR) according to

$$\mathrm{LAR}_i = \log_{10}\left(\frac{1 + \gamma_i}{1 - \gamma_i}\right) \tag{7}$$

Figure 6: Expander for the reflection parameters. [Axes: reflection coefficients, -1 to 1, versus LAR parameters, -2 to 2.]

Eight LAR parameters use 36 bits for each segment of 160 samples, which gives 1800 bits/s.

2.3.4 Pitch analysis

Here, 9 bits are used for each 40-sample sub-segment (four per 160-sample segment), which gives 1800 bits/s.

2.3.5 Residual coding

2.3.6 GSM FR decoder

The decoder receives the error signal, adds the pitch, then passes the result through the LPC decoder and finally the post filter.
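The LPC analysis of Section 2.3.3 can be sketched in a few lines of base MATLAB. This is an illustration only and not the laboratory function `LPCenc`: the synthetic test block, the variable names and the hand-written Levinson-Durbin recursion (with the sign convention of A(z) in Eq. (5)) are assumptions, and no quantization is performed.

```matlab
% One 160-sample block of an assumed synthetic test signal (AR(2) process)
rng(0);                                   % reproducible noise
s = filter(1, [1 -1.2 0.8], randn(160,1));

p = 8;                                    % model order, as in GSM FR

% Autocorrelation r_s(k), k = 0..8, Eq. (6); MATLAB indices start at 1
rs = zeros(p+1,1);
for k = 0:p
    rs(k+1) = sum(s(k+1:160) .* s(1:160-k));
end

% Levinson-Durbin recursion: A(z) coefficients and reflection coefficients
a   = 1;                                  % [1 a1 ... aj] as a column
err = rs(1);                              % prediction error energy
gam = zeros(p,1);                         % reflection coefficients
for j = 1:p
    gam(j) = -(rs(j+1:-1:2).' * a) / err;
    a      = [a; 0] + gam(j) * [0; flipud(a)];
    err    = err * (1 - gam(j)^2);
end

% Log-Area Ratios, Eq. (7)
LAR = log10((1 + gam) ./ (1 - gam));
```

Plotting `LAR` against `gam` for values between -1 and 1 reproduces the shape of the expander curve in Figure 6.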
Figure 7: The pitch analysis (LTP) in GSM FR, step by step. [Step labels:
1. Current LPC residual.
2. Selects the part of the previously reconstructed LPC residual that most resembles the current LPC residual. Cross-correlation gives the parameters N and b.
3. Quantized N and b, sent to the radio system.
4. Estimated LPC residual.
5. Current excitation signal.
6. Reconstructed excitation signal.
7. Current reconstructed LPC residual, needed for the next sub-interval.
8. Reconstructed LPC residual for the three preceding sub-intervals.]

Figure 8: Residual coding in the GSM FR. [Step labels:
1. Current excitation signal.
2. Smears out the signal (low-pass filter).
3. The 40-sample sequence is represented by every third sample, starting at sample 1, 2, 3 or 4 (13 samples) (RPE coding).
4. Start sample, sent to the radio system.
5. Residual (12 samples).
6. Coded residual, sent to the radio system (APCM coding and decoding of the residual).
7. Reconstructed residual (12 samples).
8. The 40-sample residual is reconstructed (RPE decoding).
9. Reconstructed excitation signal.]

2.4 MATLAB files for the laboratory work

function ut=convert(in,fs_in,fs_ut,bitar_in,bitar_ut)
This function resamples the signal "in", which has sampling frequency "fs_in" and "bitar_in" bits resolution, to the signal "ut" with sampling frequency "fs_ut" and "bitar_ut" bits resolution.
Example: ut=convert(in,fs_in,fs_ut,bitar_in,bitar_ut)

function [LPCres,KvantLAR]=LPCenc(sig,AntalPar);
This function removes redundancy from the speech signal and creates an LPC residual. The removed information is now contained in the quantized filter parameters.
sig - speech signal
AntalPar - number of filter parameters (LAR)
KvantLAR - the quantized LAR parameters
LPCres - the remaining signal
Example: [LPCres,KvantLAR]=LPCenc(sig,AntalPar);

function utsig=lpcdec(kvantlar,reclpcres);
This function adds the redundancy coded in the LAR parameters to the reconstructed LPC residual and thereby creates the reconstructed speech signal.
utsig - reconstructed speech signal
KvantLAR - the quantized LAR parameters
RecLPCres - the reconstructed LPC residual
Example: utsig=lpcdec(kvantlar,reclpcres);

function [Excsig,RecExcsig,KvantN,Kvantb,KvantM,KvantMax,KvantExc]=LTPenc(LPCres);
This function codes the LPC residual into pitch-related parameters and a residual.
KvantN - shift between two peaks in the signal
Kvantb - amplitude gain compared with the previous peak
KvantM - 0-3, gives the start sample for the 12 samples (every third) that represent the 40 original ones
KvantMax - gives the dynamic range that applies in the APCM of the residual
KvantExc - values of the 12 normalized samples
LPCres - LPC residual, i.e. the speech signal from which the short-term redundancy has been removed by LPC
Excsig - LPC residual from which the tone has been removed
RecExcsig - Excsig recreated from the information in the quantized parameters
Example: [Excsig,RecExcsig,KvantN,Kvantb,KvantM,KvantMax,KvantExc]=LTPenc(LPCres);

function RecLPCres=LTPdec(KvantN,Kvantb,KvantM,KvantMax,KvantExc,Res);
This function reconstructs the LPC residual from the transmitted quantized parameters.
KvantN - shift between two peaks in the signal
Kvantb - amplitude gain compared with the previous peak
KvantM - 0-3, gives the start sample for the 12 samples (every third) that represent the 40 original ones
KvantMax - gives the dynamic range that applies in the APCM of the residual
KvantExc - values of the 12 normalized samples
RecLPCres - reconstructed LPC residual, i.e. the recreated excitation signal with the tone added
Res - res if KvantExc is to be used and vit if noise is to be used
Example: RecLPCres=LTPdec(KvantN,Kvantb,KvantM,KvantMax,KvantExc,Res);

function ut=prefilt(in);
Example: ut=prefilt(in);

function ut=postfilt(in);
Example: ut=postfilt(in);
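The core ideas of Figures 7 and 8 can be sketched without the laboratory functions. The code below is a simplified illustration under assumptions, not the implementation in `LTPenc`: the test residual (period 70 samples), the lag search range 40 to 120 and all variable names are made up, and the low-pass filter, the quantization and the APCM steps are left out.

```matlab
rng(1);                                    % reproducible noise
L = 40;                                    % sub-interval length

% Assumed test residual: pulses every 70 samples plus a little noise
res = zeros(160,1);
res(1:70:160) = 1;
res = res + 0.01*randn(160,1);
past = res(1:120);                         % stands in for the reconstructed
                                           % residual of three sub-intervals
cur  = res(121:160);                       % current LPC residual

% LTP: find the lag N whose past segment best matches the current one
lags = L:120;
c = zeros(size(lags));
for m = 1:numel(lags)
    seg  = past(120-lags(m)+1 : 120-lags(m)+L);
    c(m) = cur.' * seg;                    % cross-correlation at this lag
end
[~, idx] = max(c);
N   = lags(idx);
seg = past(120-N+1 : 120-N+L);
b   = (cur.' * seg) / (seg.' * seg);       % gain
exc = cur - b*seg;                         % excitation signal

% RPE: keep every third sample; choose the start 1..4 with most energy
E = zeros(4,1);
for M = 1:4
    E(M) = sum(exc(M:3:M+36).^2);          % 13 samples per candidate
end
[~, Mbest] = max(E);
rpe = exc(Mbest:3:Mbest+36);
```

With this test signal the search returns the pulse period as the lag, and the excitation `exc` has much less energy than `cur`, which is the reason the LTP step saves bits.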
2.5 Laboratory exercises

Start WaveStudio (or some other recording program) and record one second of a voiced sound (the vowel "a") and one second of an unvoiced sound ("s").

1. Press record and Start.
2. Speak into the microphone.
3. Press Stop.
4. Select one second of the signal.
5. Then Edit and Copy.
6. Then File and New.
7. Then Edit and Paste.
8. Save as a.wav and s.wav, respectively, in c:\myfiles\.

Start Matlab and type initosb.

Problem 1
Read your files with wavread and convert them to 16 bits and 8 kHz sampling rate with convert. Now select one second (signal=signal(1:8000);).

Problem 2
Prefilter your signals with prefilt. Also plot the first 200 samples of the signal before and after the filter. Listen (sound([inputsignal;outputsignal]);).

Problem 3
Do the LPC analysis with the function LPCenc. Use eight filter coefficients. Plot the LPC residual and the quantized LAR parameters. Are eight coefficients enough? Is there any difference between voiced and unvoiced sounds? Listen. Explain the results.
Problem 4
Compute the LTP analysis of the LPC residual using LTPenc. Plot the excitation signal Excsig and the corresponding quantized signal RecExcsig. What is the fundamental frequency of your vowel "a"? Listen to the excitation signal (the error signal after LTP filtering). Compare with the reconstructed excitation signal.

Problem 5
Plot 200 samples of each signal (before prefiltering, after prefiltering, LPC residual, excitation signal and reconstructed excitation signal). Use subplot to divide the plot area. Listen and compare the vowel and the "s" sound.

Problem 6
Reconstruction: decode the quantized parameters and reconstruct the LPC residual. Use LTPdec. Listen and compare with the signals in the previous step.
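For Problem 4, one way to estimate the fundamental frequency is from the LTP lag. Assuming that the lag N is available in samples (the distance between two peaks, as described for KvantN) and that the sampling rate is f_s = 8 kHz, the relation is as follows. The value N = 80 is only an illustration, not a measured result.

```latex
f_0 = \frac{f_s}{N},
\qquad \text{e.g. } N = 80,\ f_s = 8000\ \text{Hz}
\;\Rightarrow\; f_0 = \frac{8000}{80} = 100\ \text{Hz}
```

The same value can be cross-checked by measuring the distance between two neighboring peaks in a plot of the LPC residual.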
Problem 7
Now synthesize the speech using LPCdec. Listen to the results. Comment.

Problem 8
Postfilter with postfilt.

Problem 9
Compare the signals before and after the speech coder. Listen to the signals after each step in the coder.

Problem 10
Now use white noise as the error signal (use vit as input to LTPdec; this corresponds to 3600 bits/s). Then,
3 Laboratory work 2: Power spectrum estimation

3.1 Preparation before the laboratory work

Prepare for the laboratory work by reading Chapter 8 in Hayes' book.

3.2 Introduction

In this laboratory work we will test some of the methods for power spectrum estimation given in Hayes' book.

3.3 Matlab files and signal files

function Px=pergram(x,n1,n2)
Periodogram (see Hayes' book).

function Px=mper(x,win,n1,n2);
Modified periodogram (see Hayes' book).

function Px=welch(x,L,over,win);
Welch's method (see Hayes' book).

afib.mat
ECG signal.

All files from laboratory work 1 are also used.
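As a reminder of what the periodogram and the modified periodogram compute, here is a minimal base-MATLAB sketch. It is not the code of `pergram` or `mper`: the test signal (a 1 kHz tone), the explicitly written Hamming window and the variable names are assumptions for the example.

```matlab
fs = 8000;
n  = (0:1023)';
x  = sin(2*pi*1000*n/fs);      % assumed test signal: 1 kHz tone

N  = numel(x);
Px = abs(fft(x)).^2 / N;       % periodogram: |X|^2 / N
f  = (0:N-1)' * fs / N;        % frequency axis in Hz

% Modified periodogram: window the data and normalize by N*U,
% where U is the mean power of the window
w  = 0.54 - 0.46*cos(2*pi*n/(N-1));   % Hamming window
U  = mean(w.^2);
Pm = abs(fft(x .* w)).^2 / (N*U);

% Locate the spectral peak in the range 0..fs/2
[~, kmax] = max(Px(1:N/2));
f_peak = f(kmax);
```

Plotting `10*log10(Px)` and `10*log10(Pm)` against `f` on the same axes shows how the window trades a wider main lobe for lower sidelobes.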
3.4 Laboratory exercises

Problem 1
Record an A-sound (the vowel "a") and a short sentence (4-5 words) using WaveStudio and save them to files.

Problem 2
Resample the signals to 8 kHz sampling rate and 13 bits resolution using convert. Select about one second of each signal (for example signal=signal(1:8000);).

Problem 3
Use the functions pergram, mper and welch to analyze the signals. Downsample by a factor of 4 (decimate).

Problem 4
Can you characterize the sounds using your spectra?

Problem 5
Load the signal afib.mat. Execute your Welch program file. Comment on the contents of the power spectrum of the signal. The signal is sampled at 1000 Hz. Downsample the signal by a factor of 50 (decimate). Then only frequency components below 10 Hz remain. Comments?
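The frequency ranges that remain after the downsampling in Problems 3 and 5 follow from the new sampling rate. As a short worked calculation, with M denoting the downsampling factor (a symbol introduced here for convenience):

```latex
f_s' = \frac{f_s}{M}, \qquad f_{\max} = \frac{f_s'}{2}

\text{Problem 3: } f_s' = \frac{8000}{4} = 2000\ \text{Hz}, \quad f_{\max} = 1000\ \text{Hz}

\text{Problem 5: } f_s' = \frac{1000}{50} = 20\ \text{Hz}, \quad f_{\max} = 10\ \text{Hz}
```

With a factor as large as 50, it may be worth decimating in stages (for example by 5 and then by 10) and comparing the result with a single call.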
Problem 6
Modify your Welch file so that the spectra from the individual frames are stored in different rows of a matrix. You then have a time-frequency analyzer.

Problem 7
Analyze the A-signal and the sentence with your time-frequency analyzer. Look for some different vowels in the sentence and describe the power spectra for these vowels.

Problem 8
Run the speech coder program using the A-signal and the sentence as inputs (sampling rate 8 kHz). Downsample the signals (insignal,...,recexcsig) in the coder to 2 kHz and analyze them with your time-frequency analyzer. Explain the results.
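The idea behind the time-frequency analyzer in Problem 6 can be sketched as follows. This is one possible layout under assumptions, not a modification of the laboratory file `welch`: the frame length, the 50 % overlap, the Hamming window, the test signal and the variable names are all chosen for the example.

```matlab
fs = 8000;
n  = (0:7999)';
% Assumed test signal: 500 Hz in the first half, 1500 Hz in the second
x  = [sin(2*pi*500*n(1:4000)/fs); sin(2*pi*1500*n(4001:8000)/fs)];

L   = 256;                               % frame length (assumed)
hop = 128;                               % 50 % overlap (assumed)
w   = 0.54 - 0.46*cos(2*pi*(0:L-1)'/(L-1));   % Hamming window
U   = mean(w.^2);                        % window power normalization
K   = floor((numel(x) - L)/hop) + 1;     % number of frames
TF  = zeros(K, L/2+1);                   % one spectrum per row

for k = 1:K
    frame   = x((k-1)*hop + (1:L)') .* w;
    P       = abs(fft(frame)).^2 / (L*U);    % modified periodogram
    TF(k,:) = P(1:L/2+1).';                  % keep 0..fs/2
end

f = (0:L/2)' * fs / L;                   % frequency axis in Hz
t = (0:K-1)' * hop / fs;                 % start time of each frame in s
Pwelch = mean(TF, 1);                    % averaging the rows gives Welch
```

The matrix can be displayed with `imagesc(f, t, 10*log10(TF + eps)); axis xy`, where the frequency jump of the test signal shows up halfway along the time axis.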