Hidden Markov Model


Hidden Markov Model

Definition: A hidden Markov model is a tuple (V, X, {T^(k)}, π), where:
X is an output alphabet.
V is a finite set of states.
{T^(k)} = {T^(k) : k ∈ X} are transition matrices.
T^(k) is a |V| × |V| matrix with T^(k)_{ij} ∈ [0, 1] and Σ_{j,k} T^(k)_{ij} = 1.
π is a row vector with π_i ∈ [0, 1], Σ_i π_i = 1, and π = Σ_k π T^(k).

Shorthand: λ = (π, {T^(k)})
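
As a concrete instance of this notation, the two-state "even process" that appears in the Mathematica slide later can be written out in Python. This is a sketch: the exact matrix entries are reconstructed from that (garbled) slide and the standard definition of the even process, so treat them as an assumption.

import numpy as np

# Even process: state 0 emits 0 w.p. 1/2 and stays, or emits 1 w.p. 1/2 and
# moves to state 1; state 1 emits 1 w.p. 1 and returns to state 0.
T = {
    0: np.array([[0.5, 0.0],
                 [0.0, 0.0]]),
    1: np.array([[0.0, 0.5],
                 [1.0, 0.0]]),
}
π = np.array([2/3, 1/3])     # stationary: π = π (T[0] + T[1])
V = 2                        # number of states
λ = (π, T)                   # shorthand λ = (π, {T^(k)})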

Word Probabilities

Notation and Definition

Alphabet: X = {0, 1, 2, ..., N−1}, so |X| = N.
States: V = {0, 1, 2, ..., V−1} (we write V for |V| as well).
Word: w = w_0 w_1 w_2 ... w_{L−1}, so |w| = L.

By definition, Pr(w) = π T^(w_0) T^(w_1) ··· T^(w_{L−1}) 1, where 1 is a column vector of ones.

For example, Pr(w_0 w_1 w_2) = Σ_{i,j,k,l} π_i T^(w_0)_{ij} T^(w_1)_{jk} T^(w_2)_{kl}

How do we compute Pr(w) efficiently?
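
A quick check of the definition: the probability of a word is a chain of vector-matrix products, which can be evaluated left to right. A minimal sketch, assuming the numpy arrays π and T from the sketch above (wordprob_direct is a name introduced here, not from the slides):

def wordprob_direct(w, π, T):
    """Pr(w) = π · T^(w_0) · ... · T^(w_{L-1}) · 1, evaluated left to right."""
    vec = π.copy()
    for k in w:
        vec = vec @ T[k]      # vec[j] = Pr(prefix seen so far, current state j)
    return vec.sum()          # the trailing product with the all-ones column vector

With the even-process example above, wordprob_direct([1, 1], π, T) evaluates to 0.5. This left-to-right evaluation is essentially what the forward algorithm below makes explicit.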

Brute Force Algorithm

Method: add each term in the summation.

def wordprob(w):
    L = len(w)
    s = [0] * (L + 1)            # L+1 zeros; these are the state indices
    prob = 0
    keepgoing = True
    while keepgoing:
        term = π[s[0]]
        for i in range(L):
            term = term * T[w[i]][s[i], s[i+1]]
        prob += term
        # increment the indices in lexicographic order
        keepgoing = incrementindices(s)
    return prob

This algorithm is O(L · N^L). For any reasonable L, the algorithm is too slow. The algorithm calculates the same quantities repeatedly:

Pr(w_0 w_1) = π_0 T^(w_0)_{00} T^(w_1)_{00} + π_0 T^(w_0)_{00} T^(w_1)_{01}
            + π_0 T^(w_0)_{01} T^(w_1)_{10} + π_0 T^(w_0)_{01} T^(w_1)_{11}
            + π_1 T^(w_0)_{10} T^(w_1)_{00} + π_1 T^(w_0)_{10} T^(w_1)_{01}
            + π_1 T^(w_0)_{11} T^(w_1)_{10} + π_1 T^(w_0)_{11} T^(w_1)_{11}

For example, the partial product π_0 T^(w_0)_{00} (the linked, colored quantities on the original slide) is calculated twice.
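
The slide leaves incrementindices unspecified; one way to write it, assuming the state indices run over {0, ..., V−1} with V the global number of states, is the sketch below.

def incrementindices(s):
    """Advance s to the next state sequence in lexicographic order.
    Returns False once every sequence in {0, ..., V-1}^(L+1) has been visited."""
    for pos in range(len(s) - 1, -1, -1):
        if s[pos] < V - 1:
            s[pos] += 1
            return True
        s[pos] = 0
    return False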

Forward Probabilities

Definition: α_t(j) = Pr(seeing w_0 ... w_t and ending up in state j)

For example, let w = w_0 w_1 and N = V = 2. Then,

α_0(0) = Pr(seeing w_0 and ending up in state 0)
       = π_0 T^(w_0)_{00} + π_1 T^(w_0)_{10}
α_0(1) = Pr(seeing w_0 and ending up in state 1)
       = π_0 T^(w_0)_{01} + π_1 T^(w_0)_{11}

Notice, Pr(w_0) = α_0(0) + α_0(1).

Forward Probabilities

Definition: α_t(j) = Pr(seeing w_0 ... w_t and ending up in state j)

For example, let w = w_0 w_1 and N = V = 2. Then,

α_0(0) = π_0 T^(w_0)_{00} + π_1 T^(w_0)_{10}
α_0(1) = π_0 T^(w_0)_{01} + π_1 T^(w_0)_{11}

α_1(0) = Pr(seeing w_0 w_1 and ending up in state 0)
       = π_0 T^(w_0)_{00} T^(w_1)_{00} + π_0 T^(w_0)_{01} T^(w_1)_{10}
       + π_1 T^(w_0)_{10} T^(w_1)_{00} + π_1 T^(w_0)_{11} T^(w_1)_{10}
       = α_0(0) T^(w_1)_{00} + α_0(1) T^(w_1)_{10}
α_1(1) = Pr(seeing w_0 w_1 and ending up in state 1)
       = α_0(0) T^(w_1)_{01} + α_0(1) T^(w_1)_{11}

Forward Probabilities

Definition: α_t(j) = Pr(seeing w_0 ... w_t and ending up in state j)

In general,

α_t(j) = Σ_i π_i T^(w_0)_{ij}              if t = 0
α_t(j) = Σ_i α_{t−1}(i) T^(w_t)_{ij}        if 0 < t < L

Pr(w_0 ... w_t) = Σ_j α_t(j)

Notice, w_0 ... w_t is written as w[0:t+1] in Python.

Forward Algorithm

Method: use the forward probabilities.

from numpy import zeros

def wordprob(w):
    L = len(w)
    α = zeros((L, V), float)             # an L × V matrix of zeros
    for j in range(V):
        for i in range(V):
            α[0, j] += π[i] * T[w[0]][i, j]
    for t in range(1, L):
        for j in range(V):
            for i in range(V):
                α[t, j] += α[t-1, i] * T[w[t]][i, j]
    prob = 0
    for j in range(V):
        prob += α[L-1, j]
    return prob

This algorithm is O(L · V²). In some cases, the algorithm can be improved to be linear in V. See "Fast Algorithms for Large-State-Space HMMs with Applications to Web Usage Analysis" by Felzenszwalb, Huttenlocher, and Kleinberg.
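
A quick usage sketch, assuming the even-process λ from earlier (the slide's code reads π, T, and V as module-level globals, so they are set that way here):

import numpy as np

V = 2
π = np.array([2/3, 1/3])
T = {0: np.array([[0.5, 0.0], [0.0, 0.0]]),
     1: np.array([[0.0, 0.5], [1.0, 0.0]])}

print(wordprob([1, 1]))     # 0.5 for these matrices
print(wordprob([0, 1, 0]))  # 0.0: an isolated 1 never occurs in the even process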

Complete Sets of Word Probabilities

Suppose we needed the word probabilities for every word of length L. Is there a good way to do this?

Notice: this problem is inherently exponential in L.
Using the brute-force method on each of the N^L words gives an algorithm in O(N^{2L} · L).
Using the forward algorithm on each of the N^L words gives an algorithm in O(N^L · V² · L).
Using the forward algorithm efficiently gives an algorithm in O(N^L · V² · log L) that gives probabilities for words of every length up to L.

To compute the probability of a word, we also compute the probabilities of all of its prefixes, so we can store those values and reuse them for every other word sharing a prefix.
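
One way to read "using the forward algorithm efficiently" is to carry the row vector π T^(w_0) ··· T^(w_t) down a tree of prefixes, so extending a word by one symbol costs a single vector-matrix product. A sketch under that assumption (all_word_probs is a name introduced here):

def all_word_probs(L, π, T):
    """Return {word: Pr(word)} for every word of length 1..L, reusing shared prefixes."""
    probs = {}
    # map each prefix (as a tuple of symbols) to the row vector π T^(w_0) ... T^(w_t)
    alphas = {(): π}
    for length in range(1, L + 1):
        next_alphas = {}
        for prefix, vec in alphas.items():
            for k in T:                    # extend the prefix by one output symbol
                new_vec = vec @ T[k]
                word = prefix + (k,)
                next_alphas[word] = new_vec
                probs[word] = new_vec.sum()
        alphas = next_alphas
    return probs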

Complete Sets of Word Probabilities

[A sequence of diagram slides illustrating this prefix reuse step by step; only the label λ survives in the transcript.]

Mathematica Code

Even Process in Mathematica:

T[0] = {{1/2, 0}, {0, 0}};
T[1] = {{0, 1/2}, {1, 0}};
n = Table[{1}, {i, 1, Dimensions[T[0]][[1]]}];
A = {0, 1};
ue = Table[a[i], {i, 1, Length[n]}];
evec = Solve[{ue.(T[0] + T[1]) == ue, Sum[a[i], {i, 1, Length[n]}] == 1}, ue];
e = Table[evec[[1, i, 2]], {i, 1, Length[n]}];
wordprob[L_] := Module[{currentWord, i, words},
  currentmatrices := Fold[Dot, T[i[1]], Table[T[i[j]], {j, 2, L}]];
  words := Flatten[
    Fold[Table,
      {MyStringJoin[Table[MyToString[i[k]], {k, 1, L}]], (e.currentmatrices.n)},
      Table[{i[k], 0, 1}, {k, L, 1, -1}]]
    ] /. MyToString -> ToString /. MyStringJoin -> StringJoin;
  Return[words];
]

For appropriate V × V matrices, wordprob[L] computes the probabilities for every word of length L using the brute-force method for each of the |X|^L words. For L ≤ 5, Mathematica computes wordprob[L] as fast as Python does when using the forward algorithm intelligently!

Viterbi Path

Question and Example

Given an HMM and an output sequence, what sequence of states most likely caused the output sequence?

[State-diagram slide: a two-state HMM; the only transition labels that survive in the transcript are 0.99 and 0.99.]

If we observe an output word, we can list the possible internal state sequences. Since we are in one state most of the time, the most likely state sequence is the one that stays in that state.

Viterbi Path - Brute Force

The Viterbi path, ρ ∈ V^{L+1}, is given by:

ρ(w) = argmax_s Pr(w, s)

Once again, we can compute this with brute force. Given a word w:
1. calculate Pr(w, s), where s is a sequence of states
2. do this for all N^L possible state sequences s
3. return the s which maximizes Pr(w, s)

This algorithm is ridiculously similar to the brute-force method for computing word probabilities, and thus is also O(L · N^L). As before, dynamic programming techniques should be used.

Viterbi Algorithm

δ_t(j) = (probability, path)

The first component, written p_t(j) below, is the probability of the Viterbi path for w_0 ... w_t that ends in state j:

p_t(j) = max_i π_i T^(w_0)_{ij}             if t = 0
p_t(j) = max_i p_{t−1}(i) T^(w_t)_{ij}       if 0 < t < L

The second component, written s_t(j) below, is a list: the path mentioned above.

s_t(j) = { argmax_i π_i T^(w_0)_{ij} }                                        if t = 0
s_t(j) = s_{t−1}(i*) ∪ { i* },  where i* = argmax_i p_{t−1}(i) T^(w_t)_{ij}     if 0 < t < L

The union operator should be understood as "append to the list".

Viterbi Algorithm

p_t(j) = max_i π_i T^(w_0)_{ij}             if t = 0
p_t(j) = max_i p_{t−1}(i) T^(w_t)_{ij}       if 0 < t < L

s_t(j) = { argmax_i π_i T^(w_0)_{ij} }                                        if t = 0
s_t(j) = s_{t−1}(i*) ∪ { i* },  where i* = argmax_i p_{t−1}(i) T^(w_t)_{ij}     if 0 < t < L

For each j ∈ V, δ_{L−1}(j) is a possible Viterbi path for w. The actual Viterbi path ρ is:

ρ = s_{L−1}(j*) ∪ { j* },  where j* = argmax_j p_{L−1}(j)

That is, the correct path is the one with maximum likelihood.

Viterbi Example

Consider a word w = w_0 w_1 w_2 for a 2-state HMM with X = {0, 1}.

Viterbi Example

Consider a word w = w_0 w_1 w_2 for a 2-state HMM with X = {0, 1}.

δ_0(0) = max{ π_0 T^(w_0)_{00}, π_1 T^(w_0)_{10} }

Viterbi Example

Consider a word w = w_0 w_1 w_2 for a 2-state HMM with X = {0, 1}.

δ_0(0) = max{ π_0 T^(w_0)_{00}, π_1 T^(w_0)_{10} }

For demonstration purposes, let's pick one of these paths to be more probable (say the first); notice that this algorithm assumes they are not equal. Thus, we now have

δ_0(0) = ( π_0 T^(w_0)_{00}, {0} )

Viterbi Example

Consider a word w = w_0 w_1 w_2 for a 2-state HMM with X = {0, 1}.

δ_0(1) = max{ π_0 T^(w_0)_{01}, π_1 T^(w_0)_{11} }

Viterbi Example

Consider a word w = w_0 w_1 w_2 for a 2-state HMM with X = {0, 1}.

δ_0(1) = max{ π_0 T^(w_0)_{01}, π_1 T^(w_0)_{11} }

Once again, let's pick one to be the maximum (again the first). Thus,

δ_0(1) = ( π_0 T^(w_0)_{01}, {0} )

Viterbi Example

Consider a word w = w_0 w_1 w_2 for a 2-state HMM with X = {0, 1}.

So far, we have:

δ_0(0) = ( π_0 T^(w_0)_{00}, {0} )
δ_0(1) = ( π_0 T^(w_0)_{01}, {0} )

Viterbi Example

Consider a word w = w_0 w_1 w_2 for a 2-state HMM with X = {0, 1}.

δ_1(0) = max{ δ_0(0) T^(w_1)_{00}, δ_0(1) T^(w_1)_{10} }
       = max{ π_0 T^(w_0)_{00} T^(w_1)_{00}, π_0 T^(w_0)_{01} T^(w_1)_{10} }

Viterbi Example

Consider a word w = w_0 w_1 w_2 for a 2-state HMM with X = {0, 1}.

δ_1(0) = max{ δ_0(0) T^(w_1)_{00}, δ_0(1) T^(w_1)_{10} }
       = max{ π_0 T^(w_0)_{00} T^(w_1)_{00}, π_0 T^(w_0)_{01} T^(w_1)_{10} }

Picking a maximum (say the first) gives:

δ_1(0) = ( π_0 T^(w_0)_{00} T^(w_1)_{00}, {0, 0} )

Viterbi Example

Consider a word w = w_0 w_1 w_2 for a 2-state HMM with X = {0, 1}.

δ_1(1) = max{ δ_0(0) T^(w_1)_{01}, δ_0(1) T^(w_1)_{11} }
       = max{ π_0 T^(w_0)_{00} T^(w_1)_{01}, π_0 T^(w_0)_{01} T^(w_1)_{11} }

Viterbi Example

Consider a word w = w_0 w_1 w_2 for a 2-state HMM with X = {0, 1}.

δ_1(1) = max{ δ_0(0) T^(w_1)_{01}, δ_0(1) T^(w_1)_{11} }
       = max{ π_0 T^(w_0)_{00} T^(w_1)_{01}, π_0 T^(w_0)_{01} T^(w_1)_{11} }

Computing the maximum (say the first) gives:

δ_1(1) = ( π_0 T^(w_0)_{00} T^(w_1)_{01}, {0, 0} )

Viterbi Example

Consider a word w = w_0 w_1 w_2 for a 2-state HMM with X = {0, 1}.

Now we have:

δ_0(0) = ( π_0 T^(w_0)_{00}, {0} )
δ_0(1) = ( π_0 T^(w_0)_{01}, {0} )
δ_1(0) = ( π_0 T^(w_0)_{00} T^(w_1)_{00}, {0, 0} )
δ_1(1) = ( π_0 T^(w_0)_{00} T^(w_1)_{01}, {0, 0} )

Viterbi Example

Consider a word w = w_0 w_1 w_2 for a 2-state HMM with X = {0, 1}.

δ_2(0) = max{ δ_1(0) T^(w_2)_{00}, δ_1(1) T^(w_2)_{10} }
       = max{ π_0 T^(w_0)_{00} T^(w_1)_{00} T^(w_2)_{00}, π_0 T^(w_0)_{00} T^(w_1)_{01} T^(w_2)_{10} }

Viterbi Example

Consider a word w = w_0 w_1 w_2 for a 2-state HMM with X = {0, 1}.

δ_2(0) = max{ δ_1(0) T^(w_2)_{00}, δ_1(1) T^(w_2)_{10} }
       = max{ π_0 T^(w_0)_{00} T^(w_1)_{00} T^(w_2)_{00}, π_0 T^(w_0)_{00} T^(w_1)_{01} T^(w_2)_{10} }

Hypothetically, we compute the maximum (say the first) and obtain:

δ_2(0) = ( π_0 T^(w_0)_{00} T^(w_1)_{00} T^(w_2)_{00}, {0, 0, 0} )

Viterbi Example

Consider a word w = w_0 w_1 w_2 for a 2-state HMM with X = {0, 1}.

δ_2(1) = max{ δ_1(0) T^(w_2)_{01}, δ_1(1) T^(w_2)_{11} }
       = max{ π_0 T^(w_0)_{00} T^(w_1)_{00} T^(w_2)_{01}, π_0 T^(w_0)_{00} T^(w_1)_{01} T^(w_2)_{11} }

Viterbi Example

Consider a word w = w_0 w_1 w_2 for a 2-state HMM with X = {0, 1}.

δ_2(1) = max{ δ_1(0) T^(w_2)_{01}, δ_1(1) T^(w_2)_{11} }
       = max{ π_0 T^(w_0)_{00} T^(w_1)_{00} T^(w_2)_{01}, π_0 T^(w_0)_{00} T^(w_1)_{01} T^(w_2)_{11} }

Finally, we compute the maximum (say the first) and obtain:

δ_2(1) = ( π_0 T^(w_0)_{00} T^(w_1)_{00} T^(w_2)_{01}, {0, 0, 0} )

Viterbi Example

Consider a word w = w_0 w_1 w_2 for a 2-state HMM with X = {0, 1}.

In total, we have:

δ_0(0) = ( π_0 T^(w_0)_{00}, {0} )
δ_0(1) = ( π_0 T^(w_0)_{01}, {0} )
δ_1(0) = ( π_0 T^(w_0)_{00} T^(w_1)_{00}, {0, 0} )
δ_1(1) = ( π_0 T^(w_0)_{00} T^(w_1)_{01}, {0, 0} )
δ_2(0) = ( π_0 T^(w_0)_{00} T^(w_1)_{00} T^(w_2)_{00}, {0, 0, 0} )
δ_2(1) = ( π_0 T^(w_0)_{00} T^(w_1)_{00} T^(w_2)_{01}, {0, 0, 0} )

Viterbi Example

Consider a word w = w_0 w_1 w_2 for a 2-state HMM with X = {0, 1}.

To find the Viterbi path for w, first we find the j* that maximizes p_2(j):

j* = argmax{ π_0 T^(w_0)_{00} T^(w_1)_{00} T^(w_2)_{00}, π_0 T^(w_0)_{00} T^(w_1)_{00} T^(w_2)_{01} }

Suppose the first (j = 0) term is the larger of the two. Then the Viterbi path is:

ρ = s_2(0) ∪ {0} = {0, 0, 0} ∪ {0} = {0, 0, 0, 0}

Viterbi Example Code

def viterbi(w):
    L = len(w)
    δ = {}
    for j in range(V):
        (v_prob, v_path) = (0, None)
        for i in range(V):
            prob = π[i] * T[w[0]][i, j]
            if prob > v_prob:
                (v_prob, v_path) = (prob, [i])
        δ[0, j] = (v_prob, v_path)
    for t in range(1, L):
        for j in range(V):
            (v_prob, v_path) = (0, None)
            for i in range(V):
                (prior_prob, prior_path) = δ[t-1, i]
                prob = prior_prob * T[w[t]][i, j]
                if prob > v_prob:
                    (v_prob, v_path) = (prob, prior_path + [i])
            δ[t, j] = (v_prob, v_path)
    value_max = 0
    argmax = None
    for j in range(V):
        if δ[L-1, j][0] > value_max:
            value_max = δ[L-1, j][0]
            argmax = j
    path = δ[L-1, argmax][1] + [argmax]
    return path

Like the forward algorithm, this algorithm is O(L · V²).
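
A usage sketch, again assuming the even-process globals from earlier (the expected output in the comment follows from those assumed matrices):

import numpy as np

V = 2
π = np.array([2/3, 1/3])
T = {0: np.array([[0.5, 0.0], [0.0, 0.0]]),
     1: np.array([[0.0, 0.5], [1.0, 0.0]])}

print(viterbi([1, 1, 0]))   # [0, 1, 0, 0]: emit 1 to state 1, emit 1 back to state 0, emit 0 and stay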

HMM Inference

The Idea

Given an observed word w and an assumed number of states, adjust λ = (π, {T^(k)}) to maximize Pr(w | λ). That is,

λ* = argmax_λ Pr(w | λ)

One common method to use is the Baum-Welch algorithm. In practice, only local maxima can be found. The method is iterative, producing a series of λ_i such that

Pr(w | λ_{i+1}) > Pr(w | λ_i)

Eventually, the improvements on λ decrease to zero. This is maximum-likelihood estimation via the expectation-maximization (EM) algorithm.

[Figure: the hidden state sequence q_1, ..., q_t, q_{t+1}, ..., q_L drawn alongside the outputs; the diagram did not survive the transcript.]

Backward Probabilities

Recall, the forward probabilities:

α_t(j) = Pr(w_0 ... w_t, q_{t+1} = j | λ)
       = Σ_i π_i T^(w_0)_{ij}              if t = 0
       = Σ_i α_{t−1}(i) T^(w_t)_{ij}        if 0 < t < L

Now, the backward probabilities:

β_t(i) = Pr(w_{t+1} w_{t+2} ... w_{L−1} | q_{t+1} = i, λ)
       = 1                                  if t = L−1
       = Σ_j T^(w_{t+1})_{ij} β_{t+1}(j)    if t < L−1
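
A sketch of the backward pass to go with the forward code above, using the same global π, T, V convention (the function name backward is introduced here):

import numpy as np

def backward(w):
    """β[t, i] = Pr(w_{t+1} ... w_{L-1} | state i at time t+1), with β[L-1, i] = 1."""
    L = len(w)
    β = np.zeros((L, V))
    β[L-1, :] = 1.0
    for t in range(L - 2, -1, -1):
        for i in range(V):
            for j in range(V):
                β[t, i] += T[w[t+1]][i, j] * β[t+1, j]
    return β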

Another Definition

The probability of being in state i at time t+1, given w:

γ_t(i) = Pr(q_{t+1} = i | w, λ)

Notice, Pr(w, q_{t+1} = i | λ) = α_t(i) β_t(i)

Using Pr(A ∩ B) = Pr(A | B) Pr(B),

γ_t(i) = Pr(w, q_{t+1} = i | λ) / Pr(w | λ) = α_t(i) β_t(i) / Σ_i α_t(i) β_t(i)

Thus, we compute the forward and backward probabilities, then we calculate γ_t(i). Given w, one way to estimate π is by:

π̄_i = γ_0(i)

Yet Another Definition

The probability of being in state i at time t+1 and then j, given w (for 0 ≤ t ≤ L−2):

ξ_t(i, j) = Pr(q_{t+1} = i, q_{t+2} = j | w, λ)
          = Pr(q_{t+1} = i, q_{t+2} = j, w | λ) / Pr(w | λ)
          = α_t(i) T^(w_{t+1})_{ij} β_{t+1}(j) / Σ_{i,j} α_t(i) T^(w_{t+1})_{ij} β_{t+1}(j)

Notice, γ_t(i) is the marginal distribution:

γ_t(i) = Σ_j ξ_t(i, j)
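
Both quantities fall out of the forward and backward tables. A sketch, building on the backward sketch above and the same globals (forward_table and gamma_xi are names introduced here):

import numpy as np

def forward_table(w):
    """The full L × V table of forward probabilities α[t, j]."""
    L = len(w)
    α = np.zeros((L, V))
    α[0, :] = π @ T[w[0]]
    for t in range(1, L):
        α[t, :] = α[t-1, :] @ T[w[t]]
    return α

def gamma_xi(w):
    """γ[t, i] and ξ[t, i, j] computed from the forward and backward tables."""
    L = len(w)
    α, β = forward_table(w), backward(w)
    γ = α * β
    γ /= γ.sum(axis=1, keepdims=True)      # γ_t(i) = α_t(i)β_t(i) / Σ_i α_t(i)β_t(i)
    ξ = np.zeros((L - 1, V, V))
    for t in range(L - 1):
        ξ[t] = α[t, :, None] * T[w[t+1]] * β[t+1, None, :]
        ξ[t] /= ξ[t].sum()
    return γ, ξ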

Baum-Welch Algorithm

Let λ and λ̄ be two different HMM specifications, and define

Q(λ, λ̄) = Σ_{q ∈ V^{L+1}} Pr(w, q | λ) log Pr(w, q | λ̄)

For state-output HMMs, it was shown² that:

Q(λ, λ̄) > Q(λ, λ)  ⟹  Pr(w | λ̄) > Pr(w | λ)

We can generate an edge-output HMM from a state-output HMM that describes the same process, so the results should hold (with slight modifications). This is also known as the forward-backward algorithm.

² Baum, Petrie, Soules, and Weiss. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains.

Baum-Welch Algorithm

We can build a new λ̄ from λ and w:

π̄_i = prob. of being in state i at time t = 0
    = γ_0(i)

T̄^(k)_{ij} = (prob. of transitioning from state i to j and seeing k) / (prob. of transitioning from state i to j)
           = Σ_{t=0, w_{t+1}=k}^{L−2} ξ_t(i, j)  /  Σ_{t=0}^{L−2} ξ_t(i, j)

So, we take λ̄ = (π̄, {T̄^(k)})
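
Putting the pieces together, one re-estimation step can be sketched as below. Note the denominator: this sketch normalizes by the expected number of visits to state i (Σ_t γ_t(i)), a common choice that keeps Σ_{j,k} T̄^(k)_{ij} = 1; it differs slightly from the denominator written on the slide, so treat it as illustrative rather than the slide's exact formula.

import numpy as np

def baum_welch_step(w):
    """One re-estimation λ -> λ̄ using γ and ξ from gamma_xi(w)."""
    L = len(w)
    γ, ξ = gamma_xi(w)
    π_new = γ[0].copy()
    T_new = {k: np.zeros((V, V)) for k in T}
    visits = γ[:L-1].sum(axis=0)           # expected visits to each state i
    for t in range(L - 1):
        T_new[w[t+1]] += ξ[t]              # the transition at step t emits symbol w_{t+1}
    for k in T_new:
        T_new[k] /= visits[:, None]
    return π_new, T_new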

Baum-Welch Procedure

Procedure:

# assume the number of states
# choose some initial λ, perhaps a uniform λ
# observe w
# using w, generate λ̄
while not λ ≈ λ̄:
    λ = λ̄
    # regenerate λ̄ from λ and w

Example: (eventually)
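
A concrete version of this loop, using the baum_welch_step sketch above and the same global-name convention. The alphabet, state count, observed word, random seed, iteration cap, and convergence tolerance are all assumptions made for illustration.

import numpy as np

rng = np.random.default_rng(0)
X = [0, 1]                                # output alphabet (assumed)
V = 2                                     # assumed number of states
w = [1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1]     # some observed word (assumed)

# random initial λ, normalized so each state's outgoing probabilities sum to 1
π = np.ones(V) / V
T = {k: rng.random((V, V)) for k in X}
row_sums = sum(T[k] for k in X).sum(axis=1, keepdims=True)
for k in X:
    T[k] /= row_sums

for _ in range(200):                      # iteration cap
    π_new, T_new = baum_welch_step(w)
    if np.allclose(π, π_new) and all(np.allclose(T[k], T_new[k]) for k in T):
        break
    π, T = π_new, T_new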

Sources

Most sources use state-output HMMs.

Lawrence R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, Vol. 77, No. 2, Feb 1989.
Wikipedia. Viterbi algorithm. http://en.wikipedia.org/wiki/viterbi_algorithm
Roger Boyle. Hidden Markov Models. http://www.comp.leeds.ac.uk/roger/hiddenmarkovmodels/html_dev/main.html
Benjamin Taitelbaum. The Uses of Hidden Markov Models and Adaptive Time-Delay Neural Networks in Speech Recognition. http://occs.cs.oberlin.edu/~btaitelb/projects/honors/paper.html
Narada Dilp Warakagoda. A Hybrid ANN-HMM ASR system with NN based adaptive preprocessing. http://jedlik.phy.bme.hu/~gerjanos/hmm/hoved.html