Naive bayes and conditional probability calculation

https://stackoverflow.com/questions/19744115

03-07-2022
|

Question

Well here is my situation, I know some probability theory, I know Bayes theorem, etc. But to put it into matlab I'm lost as how to calculate the conditional.

What I'm doing is the classification of the iris data set, this:

    5.1000    3.5000    1.4000    0.2000    1.0000
    4.9000    3.0000    1.4000    0.2000    1.0000
    4.7000    3.2000    1.3000    0.2000    1.0000
    4.6000    3.1000    1.5000    0.2000    1.0000
    5.0000    3.6000    1.4000    0.2000    1.0000
    5.4000    3.9000    1.7000    0.4000    1.0000
    4.6000    3.4000    1.4000    0.3000    1.0000
    5.0000    3.4000    1.5000    0.2000    1.0000
    4.4000    2.9000    1.4000    0.2000    1.0000
    4.9000    3.1000    1.5000    0.1000    1.0000
    5.4000    3.7000    1.5000    0.2000    1.0000
    4.8000    3.4000    1.6000    0.2000    1.0000
    4.8000    3.0000    1.4000    0.1000    1.0000
    4.3000    3.0000    1.1000    0.1000    1.0000
    5.8000    4.0000    1.2000    0.2000    1.0000
    5.7000    4.4000    1.5000    0.4000    1.0000
    5.4000    3.9000    1.3000    0.4000    1.0000
    5.1000    3.5000    1.4000    0.3000    1.0000
    5.7000    3.8000    1.7000    0.3000    1.0000
    5.1000    3.8000    1.5000    0.3000    1.0000
    5.4000    3.4000    1.7000    0.2000    1.0000
    5.1000    3.7000    1.5000    0.4000    1.0000
    4.6000    3.6000    1.0000    0.2000    1.0000
    5.1000    3.3000    1.7000    0.5000    1.0000
    4.8000    3.4000    1.9000    0.2000    1.0000
    5.0000    3.0000    1.6000    0.2000    1.0000
    5.0000    3.4000    1.6000    0.4000    1.0000
    5.2000    3.5000    1.5000    0.2000    1.0000
    5.2000    3.4000    1.4000    0.2000    1.0000
    4.7000    3.2000    1.6000    0.2000    1.0000
    4.8000    3.1000    1.6000    0.2000    1.0000
    5.4000    3.4000    1.5000    0.4000    1.0000
    5.2000    4.1000    1.5000    0.1000    1.0000
    5.5000    4.2000    1.4000    0.2000    1.0000
    4.9000    3.1000    1.5000    0.1000    1.0000
    5.0000    3.2000    1.2000    0.2000    1.0000
    5.5000    3.5000    1.3000    0.2000    1.0000
    4.9000    3.1000    1.5000    0.1000    1.0000
    4.4000    3.0000    1.3000    0.2000    1.0000
    5.1000    3.4000    1.5000    0.2000    1.0000
    5.0000    3.5000    1.3000    0.3000    1.0000
    4.5000    2.3000    1.3000    0.3000    1.0000
    4.4000    3.2000    1.3000    0.2000    1.0000
    5.0000    3.5000    1.6000    0.6000    1.0000
    5.1000    3.8000    1.9000    0.4000    1.0000
    4.8000    3.0000    1.4000    0.3000    1.0000
    5.1000    3.8000    1.6000    0.2000    1.0000
    4.6000    3.2000    1.4000    0.2000    1.0000
    5.3000    3.7000    1.5000    0.2000    1.0000
    5.0000    3.3000    1.4000    0.2000    1.0000
    7.0000    3.2000    4.7000    1.4000    2.0000
    6.4000    3.2000    4.5000    1.5000    2.0000
    6.9000    3.1000    4.9000    1.5000    2.0000
    5.5000    2.3000    4.0000    1.3000    2.0000
    6.5000    2.8000    4.6000    1.5000    2.0000
    5.7000    2.8000    4.5000    1.3000    2.0000
    6.3000    3.3000    4.7000    1.6000    2.0000
    4.9000    2.4000    3.3000    1.0000    2.0000
    6.6000    2.9000    4.6000    1.3000    2.0000
    5.2000    2.7000    3.9000    1.4000    2.0000
    5.0000    2.0000    3.5000    1.0000    2.0000
    5.9000    3.0000    4.2000    1.5000    2.0000
    6.0000    2.2000    4.0000    1.0000    2.0000
    6.1000    2.9000    4.7000    1.4000    2.0000
    5.6000    2.9000    3.6000    1.3000    2.0000
    6.7000    3.1000    4.4000    1.4000    2.0000
    5.6000    3.0000    4.5000    1.5000    2.0000
    5.8000    2.7000    4.1000    1.0000    2.0000
    6.2000    2.2000    4.5000    1.5000    2.0000
    5.6000    2.5000    3.9000    1.1000    2.0000
    5.9000    3.2000    4.8000    1.8000    2.0000
    6.1000    2.8000    4.0000    1.3000    2.0000
    6.3000    2.5000    4.9000    1.5000    2.0000
    6.1000    2.8000    4.7000    1.2000    2.0000
    6.4000    2.9000    4.3000    1.3000    2.0000
    6.6000    3.0000    4.4000    1.4000    2.0000
    6.8000    2.8000    4.8000    1.4000    2.0000
    6.7000    3.0000    5.0000    1.7000    2.0000
    6.0000    2.9000    4.5000    1.5000    2.0000
    5.7000    2.6000    3.5000    1.0000    2.0000
    5.5000    2.4000    3.8000    1.1000    2.0000
    5.5000    2.4000    3.7000    1.0000    2.0000
    5.8000    2.7000    3.9000    1.2000    2.0000
    6.0000    2.7000    5.1000    1.6000    2.0000
    5.4000    3.0000    4.5000    1.5000    2.0000
    6.0000    3.4000    4.5000    1.6000    2.0000
    6.7000    3.1000    4.7000    1.5000    2.0000
    6.3000    2.3000    4.4000    1.3000    2.0000
    5.6000    3.0000    4.1000    1.3000    2.0000
    5.5000    2.5000    4.0000    1.3000    2.0000
    5.5000    2.6000    4.4000    1.2000    2.0000
    6.1000    3.0000    4.6000    1.4000    2.0000
    5.8000    2.6000    4.0000    1.2000    2.0000
    5.0000    2.3000    3.3000    1.0000    2.0000
    5.6000    2.7000    4.2000    1.3000    2.0000
    5.7000    3.0000    4.2000    1.2000    2.0000
    5.7000    2.9000    4.2000    1.3000    2.0000
    6.2000    2.9000    4.3000    1.3000    2.0000
    5.1000    2.5000    3.0000    1.1000    2.0000
    5.7000    2.8000    4.1000    1.3000    2.0000
    6.3000    3.3000    6.0000    2.5000    3.0000
    5.8000    2.7000    5.1000    1.9000    3.0000
    7.1000    3.0000    5.9000    2.1000    3.0000
    6.3000    2.9000    5.6000    1.8000    3.0000
    6.5000    3.0000    5.8000    2.2000    3.0000
    7.6000    3.0000    6.6000    2.1000    3.0000
    4.9000    2.5000    4.5000    1.7000    3.0000
    7.3000    2.9000    6.3000    1.8000    3.0000
    6.7000    2.5000    5.8000    1.8000    3.0000
    7.2000    3.6000    6.1000    2.5000    3.0000
    6.5000    3.2000    5.1000    2.0000    3.0000
    6.4000    2.7000    5.3000    1.9000    3.0000
    6.8000    3.0000    5.5000    2.1000    3.0000
    5.7000    2.5000    5.0000    2.0000    3.0000
    5.8000    2.8000    5.1000    2.4000    3.0000
    6.4000    3.2000    5.3000    2.3000    3.0000
    6.5000    3.0000    5.5000    1.8000    3.0000
    7.7000    3.8000    6.7000    2.2000    3.0000
    7.7000    2.6000    6.9000    2.3000    3.0000
    6.0000    2.2000    5.0000    1.5000    3.0000
    6.9000    3.2000    5.7000    2.3000    3.0000
    5.6000    2.8000    4.9000    2.0000    3.0000
    7.7000    2.8000    6.7000    2.0000    3.0000
    6.3000    2.7000    4.9000    1.8000    3.0000
    6.7000    3.3000    5.7000    2.1000    3.0000
    7.2000    3.2000    6.0000    1.8000    3.0000
    6.2000    2.8000    4.8000    1.8000    3.0000
    6.1000    3.0000    4.9000    1.8000    3.0000
    6.4000    2.8000    5.6000    2.1000    3.0000
    7.2000    3.0000    5.8000    1.6000    3.0000
    7.4000    2.8000    6.1000    1.9000    3.0000
    7.9000    3.8000    6.4000    2.0000    3.0000
    6.4000    2.8000    5.6000    2.2000    3.0000
    6.3000    2.8000    5.1000    1.5000    3.0000
    6.1000    2.6000    5.6000    1.4000    3.0000
    7.7000    3.0000    6.1000    2.3000    3.0000
    6.3000    3.4000    5.6000    2.4000    3.0000
    6.4000    3.1000    5.5000    1.8000    3.0000
    6.0000    3.0000    4.8000    1.8000    3.0000
    6.9000    3.1000    5.4000    2.1000    3.0000
    6.7000    3.1000    5.6000    2.4000    3.0000
    6.9000    3.1000    5.1000    2.3000    3.0000
    5.8000    2.7000    5.1000    1.9000    3.0000
    6.8000    3.2000    5.9000    2.3000    3.0000
    6.7000    3.3000    5.7000    2.5000    3.0000
    6.7000    3.0000    5.2000    2.3000    3.0000
    6.3000    2.5000    5.0000    1.9000    3.0000
    6.5000    3.0000    5.2000    2.0000    3.0000
    6.2000    3.4000    5.4000    2.3000    3.0000
    5.9000    3.0000    5.1000    1.8000    3.0000

Now I know I can get the prior by counting and then dividing by the total:

load('iris.data');
iris
classes = iris(:, 5);

%priors by counting

class1 = (classes == 1);
prior_1 = sum(class1)./length(class1);
class2 = (classes == 2);
prior_2  = sum(class2)./length(class2);
class3 = (classes == 3);
prior_3 = sum(class3)./length(class3);

%% Now find a way to get the likelihood of the data given the class p(x|c)
% to apply bayes p(c|x_i) = p(x_i|c)p(c)/p(x_i){p(x_i|c_1)p(c_1) +
% p(x_i|c_2)p(c_2) + p(x_i|c_3)p(c_3)}

But how do I get that likelihood, I feel that it can't be done counting, at least I think. So how do I do it?? Help please, I'm lost completely (: Thanks.

Solution

With the normal distribution:

%% Load Fisher's Iris data set
load iris.dat;
iris;
number_of_features = 4;
classes = iris(:, number_of_features + 1);
number_of_classes = length(unique(classes));

%% Priors by counting
for class_number = 1:number_of_classes
    class{class_number} = (classes == class_number);
    prior{class_number} = sum(class{class_number})./length(class{class_number});
end

%% Compute likelihood 
% Assumption: distributions are Gaussian.
% The probability density function for the normal distribution is defined 
% by two parameters (mean and standard deviation)
% (We could shorten the code by using the 'By' parameter of fitdist())
for class_number = 1:number_of_classes
    likelihood{class_number} = struct;
    for feature_number = 1:number_of_features
        likelihood{class_number}.pd{feature_number} = fitdist(iris(find(iris(:, 5) == class_number), feature_number),'Normal');
    end
end

%% Compute posteriors for all flowers (= making predictions)
% Note that we don't take into account the predictor prior probability
% because it won't impact the class we choose.
posterior = zeros(length(iris), number_of_classes);
for flower_number = 1:length(iris)
    flower = iris(flower_number, 1:number_of_features);
    for class_number = 1:number_of_classes
        flower_likelihood = 1;
        for feature_number = 1:number_of_features
            pd = likelihood{class_number}.pd{feature_number};
            flower_likelihood = flower_likelihood * pdf(pd,flower(feature_number));
        end  % Naive Bayes -> strong (naive) independence assumptions.  
        posterior(flower_number, class_number) = flower_likelihood * prior{class_number};
    end
end

% PS: A nice tutorial: http://www.saedsayad.com/naive_bayesian.htm

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow