You are right. B (that is, |V|) is the number of distinct terms in the vocabulary: each term has exactly one entry in the vocabulary, no matter how many times it occurs in the documents.
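As a rough illustration of that count, here is a minimal sketch. The training documents and class labels below are hypothetical, chosen only so that the vocabulary matches the six terms listed in the question; `smoothed_prob` is an illustrative helper name, not something from the linked page. It shows B as the size of the set of distinct terms and how B enters the add-one (Laplace) smoothed estimate.

```python
from collections import Counter

# Hypothetical training set (text, class), assumed for illustration only.
train_docs = [
    ("chinese beijing chinese", "china"),
    ("chinese chinese shanghai", "china"),
    ("chinese macao", "china"),
    ("tokyo japan chinese", "not_china"),
]

# The vocabulary is the set of distinct terms across all training documents.
vocabulary = {term for text, _ in train_docs for term in text.split()}
B = len(vocabulary)  # repeated occurrences of a term are counted once


def smoothed_prob(term, label):
    """Add-one smoothed estimate: (count of term in class + 1) / (tokens in class + B)."""
    counts = Counter(
        t for text, lab in train_docs if lab == label for t in text.split()
    )
    total_tokens = sum(counts.values())
    return (counts[term] + 1) / (total_tokens + B)


print(B)
print(smoothed_prob("chinese", "china"))
```

With these assumed documents, "chinese" occurs many times but contributes only one entry to the vocabulary, which is why B is 6 rather than the total number of tokens.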
Calculating B and |V| in naive bayes text classification
11-04-2022
Question
I found a link about the multinomial naive Bayes classifier. How can we calculate B' or |V|?
The page says that it is the number of terms in the vocabulary. In its example, how do we get 6 for B? Is it the count of all distinct terms?
"chinese", "beijing", "shanghai", "macao", "tokyo", "japan"
One more question: what if a new term appears in a test document? For example, suppose doc 6 contains "bangkok" or some other word that never appeared in training. How do we compute the probability of the new term?
Solution
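For the second part of the question, one common convention (an assumption here, not something stated on this page) is to skip test-time terms that are not in the training vocabulary, since no class has an estimate for them. The sketch below uses hypothetical training documents and an illustrative helper `log_score`; an alternative some implementations use is to reserve an extra "unknown" entry in the vocabulary instead.

```python
import math
from collections import Counter

# Hypothetical training set (text, class), assumed for illustration only.
train_docs = [
    ("chinese beijing chinese", "china"),
    ("chinese chinese shanghai", "china"),
    ("chinese macao", "china"),
    ("tokyo japan chinese", "not_china"),
]

vocabulary = {term for text, _ in train_docs for term in text.split()}
B = len(vocabulary)


def log_score(text, label):
    """Log prior plus summed log likelihoods, ignoring out-of-vocabulary terms."""
    class_texts = [t for t, lab in train_docs if lab == label]
    counts = Counter(term for t in class_texts for term in t.split())
    total_tokens = sum(counts.values())
    score = math.log(len(class_texts) / len(train_docs))  # class prior
    for term in text.split():
        if term not in vocabulary:
            continue  # unseen term such as "bangkok": skipped for every class
        score += math.log((counts[term] + 1) / (total_tokens + B))
    return score


test_doc = "chinese chinese chinese tokyo japan bangkok"
best = max(["china", "not_china"], key=lambda c: log_score(test_doc, c))
print(best)
```

Because the unseen term is skipped for every class alike, it does not change which class scores highest under this convention.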
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow