Understanding Bayes' Theorem

https://stackoverflow.com/questions/1974288

21-09-2019

Question

I'm working on an implementation of a Naive Bayes classifier. Programming Collective Intelligence introduces this subject by describing Bayes' theorem as:

Pr(A | B) = Pr(B | A) x Pr(A)/Pr(B)

As well as a specific example relevant to document classification:

Pr(Category | Document) = Pr(Document | Category) x Pr(Category) / Pr(Document)

I was hoping someone could explain the notation used here. What do Pr(A | B) and Pr(A) mean? It looks like some sort of function, but then what does the pipe ("|") mean?

Solution

Pr(A | B) = Probability of A happening given that B has already happened
Pr(A) = Probability of A happening

But the above concerns the calculation of conditional probability. What you want is a classifier, which uses this principle to decide whether something belongs to a category based on the prior probability.

See http://en.wikipedia.org/wiki/Naive_Bayes_classifier for a complete example
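To connect the notation to the classifier in the question, here is a minimal sketch of a word-count Naive Bayes classifier. It is not the book's implementation: the class name `NaiveBayes`, its methods, the training sentences, and the use of add-one smoothing are all assumptions made for illustration. It scores each category by Pr(Category) times the product of Pr(word | Category), and drops Pr(Document) because it is the same for every category.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Toy word-count Naive Bayes classifier (illustrative only)."""

    def __init__(self):
        self.doc_counts = Counter()               # documents seen per category
        self.word_counts = defaultdict(Counter)   # word frequencies per category
        self.vocab = set()                        # every word seen in training

    def train(self, text, category):
        words = text.lower().split()
        self.doc_counts[category] += 1
        self.word_counts[category].update(words)
        self.vocab.update(words)

    def log_score(self, text, category):
        # log Pr(Category) + sum of log Pr(word | Category).
        # Pr(Document) is identical for all categories, so it is omitted.
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[category] / total_docs)
        total_words = sum(self.word_counts[category].values())
        for word in text.lower().split():
            # Add-one smoothing (an assumption here) avoids log(0)
            # for words never seen in this category.
            count = self.word_counts[category][word] + 1
            score += math.log(count / (total_words + len(self.vocab)))
        return score

    def classify(self, text):
        # Pick the category with the highest (log) posterior score.
        return max(self.doc_counts, key=lambda c: self.log_score(text, c))


nb = NaiveBayes()
nb.train("cheap pills buy now", "spam")
nb.train("buy cheap watches", "spam")
nb.train("meeting agenda for monday", "ham")
nb.train("lunch on monday", "ham")
print(nb.classify("cheap pills"))      # spam
print(nb.classify("monday meeting"))   # ham
```

Working in log space is a common design choice: multiplying many small probabilities underflows quickly, while adding their logarithms does not.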

OTHER TIPS

I think they've got you covered on the basics.

Pr(A | B) = Pr(B | A) x Pr(A)/Pr(B)

reads: the probability of A given B is the same as the probability of B given A times the probability of A, divided by the probability of B. It's usually used when you can measure the probability of B and you are trying to figure out whether B should lead you to believe in A. In other words, we really care about A, but we can measure B more directly, so let's start with what we can measure.
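If it helps to see where the formula comes from, it follows in two lines from the definition of conditional probability. The joint probability Pr(A ∩ B), meaning "A and B both happen", is extra notation introduced here and not used elsewhere in this thread.

```latex
\begin{align*}
\Pr(A \mid B)\,\Pr(B) &= \Pr(A \cap B) = \Pr(B \mid A)\,\Pr(A) \\
\Pr(A \mid B) &= \frac{\Pr(B \mid A)\,\Pr(A)}{\Pr(B)}, \qquad \Pr(B) > 0
\end{align*}
```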

Let me give you one derivation that makes this easier for writing code. It comes from Judea Pearl. I struggled with this a little, but after I realized how Pearl helps us turn theory into code, the light turned on for me.

Prior Odds:

O(H) = P(H) / (1 - P(H))

Likelihood Ratio:

L(e|H) = P(e|H) / P(e|¬H)

Posterior Odds:

O(H|e) = L(e|H)O(H)
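As a sketch of why the posterior odds take this form: apply Bayes' theorem to both H and ¬H and divide, so that P(e) cancels. This uses only the symbols defined above, together with P(¬H) = 1 - P(H).

```latex
\begin{align*}
O(H \mid e) = \frac{P(H \mid e)}{P(\neg H \mid e)}
  &= \frac{P(e \mid H)\,P(H)/P(e)}{P(e \mid \neg H)\,P(\neg H)/P(e)} \\
  &= \frac{P(e \mid H)}{P(e \mid \neg H)} \cdot \frac{P(H)}{1 - P(H)}
   = L(e \mid H)\,O(H)
\end{align*}
```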

In English, the odds of something you're interested in (H, for hypothesis) are simply the number of times you find it to be true divided by the number of times you find it not to be true. Say one house out of 10,000 is robbed every day. Then, before any other evidence is considered, your probability of being robbed on a given day is 1/10,000, and the prior odds are 1 to 9,999 (about 0.0001).

The next one measures the evidence you're looking at: the probability of seeing that evidence when your hypothesis is true, divided by the probability of seeing it when your hypothesis is not true. Say you hear your burglar alarm go off. How often does the alarm sound when it's supposed to (someone opens a window while the alarm is on) versus when it's not supposed to (the wind sets it off)? If a burglar has a 95% chance of setting off the alarm and there is a 1% chance of something else setting it off, the likelihood ratio is 0.95 / 0.01 = 95.

Your overall belief, the posterior odds, is just the likelihood ratio times the prior odds. In this case it is:

((0.95/0.01) * ((10**-4)/(1 - (10**-4))))
# => 0.0095009500950095

I don't know if this makes it any more clear, but it tends to be easier to have some code that keeps track of prior odds, other code to look at likelihoods, and one more piece of code to combine this information.
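One way to split the code along those lines is sketched below in Python. The function names are made up for illustration, and `odds_to_probability` is an extra step not spelled out above: it converts odds back to a probability via P = O / (1 + O).

```python
def prior_odds(p_h):
    """O(H) = P(H) / (1 - P(H))."""
    return p_h / (1.0 - p_h)


def likelihood_ratio(p_e_given_h, p_e_given_not_h):
    """L(e|H) = P(e|H) / P(e|not H)."""
    return p_e_given_h / p_e_given_not_h


def posterior_odds(p_h, p_e_given_h, p_e_given_not_h):
    """O(H|e) = L(e|H) * O(H)."""
    return likelihood_ratio(p_e_given_h, p_e_given_not_h) * prior_odds(p_h)


def odds_to_probability(odds):
    """Convert odds back to a probability: P = O / (1 + O)."""
    return odds / (1.0 + odds)


# Burglary example from the text: P(H) = 1/10,000,
# P(alarm | burglary) = 0.95, P(alarm | no burglary) = 0.01.
odds = posterior_odds(1e-4, 0.95, 0.01)
print(odds)                       # roughly 0.0095
print(odds_to_probability(odds))  # roughly 0.0094
```

Because the prior is so small, the posterior odds and the posterior probability are nearly equal here; they diverge as the odds approach or exceed 1.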

I have implemented it in Python. It is easy to follow because each formula of Bayes' theorem lives in its own function:

# Bayes' theorem

def get_outcomes(sample_space, f_name='', e_name=''):
    """Count outcomes in sample_space[f_name][e_name]; '' matches any key."""
    outcomes = 0
    for e_k, e_v in sample_space.items():
        if f_name == '' or f_name == e_k:
            for se_k, se_v in e_v.items():
                if e_name == '' or se_k == e_name:
                    outcomes += se_v
    return outcomes

def p(sample_space, f_name):
    """P(F): share of all outcomes that fall in group f_name."""
    return get_outcomes(sample_space, f_name) / get_outcomes(sample_space, '', '')

def p_inters(sample_space, f_name, e_name):
    """P(F and E): joint probability of group f_name and event e_name."""
    return get_outcomes(sample_space, f_name, e_name) / get_outcomes(sample_space, '', '')

def p_conditional(sample_space, f_name, e_name):
    """P(E | F): probability of event e_name within group f_name."""
    return p_inters(sample_space, f_name, e_name) / p(sample_space, f_name)

def bayes(sample_space, f, given_e):
    """P(F | E) = P(F) * P(E | F) / P(E)."""
    # P(E) by the law of total probability, summed over all groups.
    p_e = 0
    for e_k in sample_space:
        p_e += p(sample_space, e_k) * p_conditional(sample_space, e_k, given_e)
    return p(sample_space, f) * p_conditional(sample_space, f, given_e) / p_e

sample_space = {'UK':{'Boy':10, 'Girl':20},
                'FR':{'Boy':10, 'Girl':10},
                'CA':{'Boy':10, 'Girl':30}}

print('Probability of being from FR:', p(sample_space, 'FR'))
print('Probability of being a boy from FR:', p_inters(sample_space, 'FR', 'Boy'))
print('Probability of being a boy given the person is from FR:', p_conditional(sample_space, 'FR', 'Boy'))
print('Probability of being from FR given the person is a boy:', bayes(sample_space, 'FR', 'Boy'))

sample_space = {'Grow' :{'Up':160, 'Down':40},
                'Slows':{'Up':30, 'Down':70}}

print('Probability economy is growing when stock is Up:', bayes(sample_space, 'Grow', 'Up'))
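As a sanity check on the functions above, the same posteriors can be read straight off the counts: Pr(group | attribute) is the count for that group and attribute divided by the total count for that attribute across all groups. The sketch below is self-contained; the helper name `posterior` and the variable names `people` and `economy` are made up for illustration, and `fractions.Fraction` is used so the results come out as exact ratios.

```python
from fractions import Fraction


def posterior(sample_space, group, attribute):
    """Pr(group | attribute), computed by counting directly."""
    in_group = sample_space[group].get(attribute, 0)
    overall = sum(counts.get(attribute, 0) for counts in sample_space.values())
    return Fraction(in_group, overall)


# Same counts as the two sample spaces in the answer.
people = {'UK': {'Boy': 10, 'Girl': 20},
          'FR': {'Boy': 10, 'Girl': 10},
          'CA': {'Boy': 10, 'Girl': 30}}
economy = {'Grow': {'Up': 160, 'Down': 40},
           'Slows': {'Up': 30, 'Down': 70}}

print(posterior(people, 'FR', 'Boy'))     # 1/3
print(posterior(economy, 'Grow', 'Up'))   # 16/19
```

These should agree with what `bayes` returns for the same inputs (about 0.333 and 0.842), up to floating-point rounding.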

Pr(A | B): Conditional probability of A, i.e. the probability of A given that all we know is B

Pr(A) : Prior probability of A

Pr is the probability, Pr(A|B) is the conditional probability.

Check Wikipedia for details.

The pipe (|) means "given". The probability of A given B is equal to the probability of B given A times Pr(A) / Pr(B).

Based on your question, I strongly advise reading an undergraduate book on probability theory first. Without it, you will struggle to make progress on the Naive Bayes classifier.

I would recommend this book: http://www.athenasc.com/probbook.html, or look at MIT OpenCourseWare.

The pipe is used to represent conditional probability. Pr(A | B) = Probability of A given B

Example: Let's say you are not feeling well and you search the web for your symptoms. The internet tells you that people with XYZ disease have these symptoms.

In this case: Pr(A | B) is what you are trying to find out, which is: The probability of you having XYZ GIVEN THAT you have certain symptoms.

Pr(A) is the probability of having the disease XYZ

Pr(B) is the probability of having those symptoms

Pr(B | A) is what you find out from the internet, which is: The probability of having the symptoms GIVEN THAT you have the disease.
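To put numbers on this, here is a small Python sketch. All three probabilities are hypothetical, chosen only for illustration, and the function name `bayes_posterior` is made up. Pr(B) is not given directly, so it is expanded with the law of total probability: Pr(B) = Pr(B | A) Pr(A) + Pr(B | not A) Pr(not A).

```python
def bayes_posterior(p_a, p_b_given_a, p_b_given_not_a):
    """Pr(A | B) via Bayes' theorem, expanding Pr(B) by total probability."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a)
    return p_b_given_a * p_a / p_b


# Hypothetical numbers, not taken from any real disease:
p_disease = 0.01                 # Pr(A): 1% of people have XYZ
p_symptoms_given_disease = 0.90  # Pr(B | A)
p_symptoms_given_healthy = 0.10  # Pr(B | not A)

result = bayes_posterior(p_disease,
                         p_symptoms_given_disease,
                         p_symptoms_given_healthy)
print(result)  # roughly 0.083
```

Under these assumed numbers, having the symptoms raises the chance of having XYZ from 1% to only about 8%, because the disease is rare and the symptoms are fairly common among healthy people.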

Licensed under: CC-BY-SA with attribution
