Question

Here is some code I wrote to calculate the probability of labels with respect to some observed features using a Naive Bayes classifier. It is intended to compute the Naive Bayes formula without smoothing, and to calculate the actual probabilities, so it does use the usually omitted denominator. The problem I have is that for the example (below) the probability of a "good" label is greater than 1 (1.30612245). Can anyone help me understand what that's about? Is this a byproduct of the naive assumption?

package NaiveBayes;

use Moose;

has class_counts => (is => 'ro', isa => 'HashRef[Int]', default => sub {{}});
has class_feature_counts => (is => 'ro', isa => 'HashRef[HashRef[HashRef[Num]]]', default => sub {{}});
has feature_counts => (is => 'ro', isa => 'HashRef[HashRef[Num]]', default => sub {{}});
has total_observations => (is => 'rw', isa => 'Num', default => 0);

sub insert {
    my( $self, $class, $data ) = @_;
    # Tally one observation: the class count, the total count,
    # and the per-feature and per-class-per-feature counts
    $self->class_counts->{$class}++;
    $self->total_observations( $self->total_observations + 1 );
    for( keys %$data ){
        $self->feature_counts->{$_}->{$data->{$_}}++;
        $self->class_feature_counts->{$_}->{$class}->{$data->{$_}}++;
    }
    return $self;
}

sub classify {
    my( $self, $data ) = @_;
    my %probabilities;
    for my $class( keys %{ $self->class_counts } ) {
        my $class_count = $self->class_counts->{$class};
        # Prior: p(C)
        my $class_probability = $class_count / $self->total_observations;
        my( $feature_probability, $conditional_probability ) = (1) x 2;
        for( keys %$data ){
            my $feature_count = $self->feature_counts->{$_}->{$data->{$_}};
            my $class_feature_count = $self->class_feature_counts->{$_}->{$class}->{$data->{$_}} || 0;
            next unless $feature_count;
            # Denominator: p(F_1)...p(F_n), as a product of per-feature frequencies
            $feature_probability *= $feature_count / $self->total_observations;
            # Likelihood: p(F_1|C)...p(F_n|C)
            $conditional_probability *= $class_feature_count / $class_count;
        }
        $probabilities{$class} = $class_probability * $conditional_probability / $feature_probability;
    }
    return %probabilities;
}

__PACKAGE__->meta->make_immutable;
1;

Example:

#!/usr/bin/env perl

use strict;
use warnings;
use NaiveBayes;

my $nb = NaiveBayes->new;

$nb->insert('good' , {browser => 'chrome'   ,host => 'yahoo'    ,country => 'us'});
$nb->insert('bad'  , {browser => 'chrome'   ,host => 'slashdot' ,country => 'us'});
$nb->insert('good' , {browser => 'chrome'   ,host => 'slashdot' ,country => 'uk'});
$nb->insert('good' , {browser => 'explorer' ,host => 'google'   ,country => 'us'});
$nb->insert('good' , {browser => 'explorer' ,host => 'slashdot' ,country => 'ca'});
$nb->insert('good' , {browser => 'opera'    ,host => 'google'   ,country => 'ca'});
$nb->insert('good' , {browser => 'firefox'  ,host => '4chan'    ,country => 'us'});
$nb->insert('good' , {browser => 'opera'    ,host => '4chan'    ,country => 'ca'});

my %classes = $nb->classify({browser => 'opera', host => '4chan', country =>'uk'});

my @classes = sort { $classes{$a} <=> $classes{$b} } keys %classes;

for( @classes ){
    printf( "%-20s : %5.8f\n", $_, $classes{$_} );
}

Prints:

bad                  : 0.00000000
good                 : 1.30612245

I'm less worried about the 0 probability than about the "probability" of good being greater than 1. I believe this implements the classic Naive Bayes definition:

p(C | F_1 ... F_n) = ( p(C) p(F_1|C) ... p(F_n|C) ) / ( p(F_1) ... p(F_n) )

How can this be > 1?
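
For reference, this is the arithmetic the code works through to arrive at 1.30612245 for the example data:

p(good) = 7/8
p(opera|good) * p(4chan|good) * p(uk|good) = 2/7 * 2/7 * 1/7 = 4/343
p(opera) * p(4chan) * p(uk) = 2/8 * 2/8 * 1/8 = 1/128
(7/8 * 4/343) / (1/128) = 1.30612245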


Answer

It's been far too long since I used Perl properly for me to debug this, but I think I can see where the problem is. The marginal probability of the feature vector, p(f_1 ... f_n), is not computed the way you are doing it, as a separate product with separately estimated parameters. Instead, if you have classes c_1 and c_2 with priors p(c_1) and p(c_2), and likelihood terms p(f | c_1) and p(f | c_2), then the marginal probability of f is:

p(c_1)*p(f|c_1) + p(c_2)*p(f|c_2)

This is why the denominator is often dropped: it just involves a sum of quantities that you're already using. Anything you want to know about relative probabilities can be computed as ratios of the unnormalised scores, so computing the constant of proportionality is only useful if you explicitly want a number between 0 and 1.
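
Here is a rough sketch of classify restructured along those lines (untested, given how rusty my Perl is; it assumes the same accessors as your NaiveBayes class). Note that the feature_counts lookup disappears entirely, because the marginal now falls out of the sum of the per-class joint scores:

sub classify {
    my( $self, $data ) = @_;
    my %scores;
    for my $class ( keys %{ $self->class_counts } ) {
        my $class_count = $self->class_counts->{$class};
        my $score = $class_count / $self->total_observations;   # prior p(c)
        for( keys %$data ){
            my $class_feature_count =
                $self->class_feature_counts->{$_}->{$class}->{$data->{$_}} || 0;
            $score *= $class_feature_count / $class_count;       # likelihood p(f|c)
        }
        $scores{$class} = $score;   # joint score p(c) * p(f_1|c) * ... * p(f_n|c)
    }
    # The marginal p(f_1 ... f_n) is the sum of the joint scores over all classes
    my $marginal = 0;
    $marginal += $_ for values %scores;
    return %scores unless $marginal;   # every class scored zero; nothing to normalise
    $scores{$_} /= $marginal for keys %scores;
    return %scores;
}

With your example data this gives good a joint score of (7/8)(2/7)(2/7)(1/7) ≈ 0.0102 and bad a joint score of 0, so after normalisation good gets probability 1 and bad gets 0.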
