Question

Hello, I'm using the KDD 1999 dataset and I want to apply naive Bayes to it in MATLAB. The KDD dataset is a 494021x42 array of data. Note "training" and "target_class" in the naive Bayes code below:

training = [1;0;-1;-2;4;0]; % this is the sample data.
target_class = ['posi';'zero';'negi';'negi';'posi';'zero'];
    % This should have the same number of rows as training data but why?

% Training and Testing the classifier (between positive and negative)
test = 10*randn(10,1) % this is for testing. I am generating random numbers.
class  = classify(test,training, target_class, 'diaglinear')  
% This command classifies the test data depending on the given training data using a naive Bayes classifier

% diaglinear is for naive bayes classifier; there is also diagquadratic

What I would like to know: is "target_class" related to the KDD dataset attack types?

back dos
buffer_overflow u2r
ftp_write r2l
guess_passwd r2l
imap r2l
ipsweep probe
land dos
loadmodule u2r
multihop r2l
neptune dos
nmap probe
perl u2r
phf r2l
pod dos
portsweep probe
rootkit u2r
satan probe
smurf dos
spy r2l
teardrop dos
warezclient r2l
warezmaster r2l

Or is the target class the column headers contained within the "test" set? I.e.:

protocol_type: symbolic.
service: symbolic.
flag: symbolic.
src_bytes: continuous.
dst_bytes: continuous.
land: symbolic.
wrong_fragment: continuous.

Solution

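If you read the task definition (e.g. here), you'll see that the target classes are indeed the attack types. That is also why target_class must have the same number of rows as the training data: each training record carries exactly one label. However, the training set contains fewer attack types than the test set.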

This is done for realism: once you have trained your intrusion detection algorithm, it must be able to deal with new attack types that are close to, but not the same as, existing ones.
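
To make the role of the label vector concrete, here is a minimal sketch rather than a complete KDD pipeline. The numbers are made-up stand-ins for two continuous features (such as src_bytes and dst_bytes), 'normal' is an assumed label for non-attack traffic, and the variable names trainX, trainY, testX and category are hypothetical. With the real data, the symbolic columns (protocol_type, service, flag) would first need a numeric encoding, because classify expects numeric matrices, and the label would presumably be taken from the last of the 42 columns.

```matlab
% Toy stand-in for KDD records (hypothetical values, not real data).
% Each row is one connection; columns are two continuous features.
trainX = [181 5450; 239 486; 235 1337; ...
          1032 0; 1030 4; 1028 2; ...
          0 0; 2 1; 1 3];

% One label per training row: this is why the row counts must match.
trainY = {'normal'; 'normal'; 'normal'; ...
          'smurf'; 'smurf'; 'smurf'; ...
          'neptune'; 'neptune'; 'neptune'};

% Unlabelled records to classify.
testX = [1031 1; 200 2000];

% 'diaglinear' uses a pooled diagonal covariance estimate (naive Bayes).
% It requires every feature to have nonzero pooled within-class variance.
predicted = classify(testX, trainX, trainY, 'diaglinear');

% Attack name -> category, using entries from the list in the question.
% The 'normal' entry is an assumption added for non-attack traffic.
category = containers.Map( ...
    {'normal', 'smurf', 'neptune', 'nmap'}, ...
    {'normal', 'dos', 'dos', 'probe'});

% The test set may contain attack names never seen in training,
% so guard the lookup instead of indexing the map directly.
name = 'some_new_attack';   % hypothetical unseen attack name
if isKey(category, name)
    kind = category(name);
else
    kind = 'unknown';
end
```

Because classify can only return labels that appear in trainY, an attack type missing from the training set will be assigned to whichever known class it most resembles. Grouping labels into the coarser dos/probe/r2l/u2r categories before training is one possible way to make that behaviour more useful.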

Licensed under: CC-BY-SA with attribution