Let's build a probabilistic word model of segmentation.
By definition, the best segmentation, which we'll call S,
is the one that maximizes the joint probability of its words.
So we're going to segment the text into a sequence of words, word 1 through word n,
and find the segmentation that maximizes the joint probability.
By the chain rule of probability, that's the same as maximizing the product over the words
of the probability of each word given all the previous words.
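In symbols, this is a sketch of what was just said, writing w_1 through w_n for a candidate segmentation of the text into n words:

```latex
S = \operatorname*{argmax}_{w_{1:n}} P(w_1, \dots, w_n)
  = \operatorname*{argmax}_{w_{1:n}} \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1})
```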
Now this is going to be a little unwieldy to deal with, so we can make an approximation.
We can say that the best segmentation is approximately equal to the one that maximizes a simpler product.
One option here is to make the Markov assumption
and condition each word on only the few previous words.
But I'm going to go all the way and make the naive Bayes assumption
and treat each word independently:
we just maximize the probability of each individual word,
regardless of the words that come before or after it.
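As a sketch in symbols, the Markov version (shown here conditioning on just one previous word) keeps limited context, while the naive Bayes version drops the conditioning entirely:

```latex
S \approx \operatorname*{argmax}_{w_{1:n}} \prod_{i=1}^{n} P(w_i \mid w_{i-1})
    \qquad \text{(Markov, bigram)}

S \approx \operatorname*{argmax}_{w_{1:n}} \prod_{i=1}^{n} P(w_i)
    \qquad \text{(naive Bayes)}
```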
Now, I know that assumption is wrong and that words do depend
on the words to the right or the left of them,
but I'm hoping that this simplification will make the process of learning easier
and will turn out to be good enough.
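To make the search concrete, here is a minimal Python sketch of this naive Bayes segmenter. The unigram model Pword and the toy COUNTS table are hypothetical placeholders for illustration; a real model would estimate word probabilities from a large corpus.

```python
from functools import lru_cache

# Hypothetical toy unigram counts, just for illustration; a real model
# would estimate these frequencies from a large text corpus.
COUNTS = {'choose': 2, 'spain': 1}
TOTAL = sum(COUNTS.values())

def Pword(word):
    """Unigram probability of a single word (zero if unseen)."""
    return COUNTS.get(word, 0) / TOTAL

def Pwords(words):
    """Naive Bayes score: treat each word independently and multiply."""
    p = 1.0
    for w in words:
        p *= Pword(w)
    return p

@lru_cache(maxsize=None)
def segment(text):
    """Return the word sequence (as a tuple) that maximizes Pwords."""
    if not text:
        return ()
    # Try every possible first word, recursively segment the rest,
    # and keep the candidate with the highest product of probabilities.
    candidates = ((text[:i],) + segment(text[i:])
                  for i in range(1, len(text) + 1))
    return max(candidates, key=Pwords)

print(segment('choosespain'))  # ('choose', 'spain') under these toy counts
```

Memoizing segment with lru_cache keeps the recursion tractable: each suffix of the text is segmented at most once, so the search stays polynomial rather than exponential in the length of the text.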