KDD1999 dataset features explanation

https://stackoverflow.com/questions/17024961

31-05-2022

Question

I'm using the KDD1999 dataset for intrusion detection, but I have some questions about the features. Can someone explain the meaning of the flags? Here is the list of the flags used in the KDD1999 dataset:

'flag' { 'OTH', 'REJ', 'RSTO', 'RSTOS0', 'RSTR', 'S0', 'S1', 'S2', 'S3', 'SF', 'SH' }

Here is an example of KDD dataset records:

0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,normal.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,normal.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,normal.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,snmpgetattack.

Solution

First of all, note that the data set is flawed and should not be used (KDNuggets statement). Roughly speaking, there are three reasons: A) it is not at all realistic, in particular not for modern attacks (heck, not even for real attacks back in 1998!). Today, most attacks are SQL injection and password theft via trojans, neither of which is detectable with this kind of data. B) The data set is focused on attacks, so it consists of attacks with some background noise, whereas actual traffic is largely normal data with some attacks. C) It was simulated on a largely virtual network, and you can detect the "attacks" from the simulated network topology alone.

Judging from the documentation of the usual preprocessed version, the flag is a value derived from the connection state, i.e. whether the connection attempt was rejected (REJ), answered with a TCP RST, etc.
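As a rough illustration, the sketch below maps each flag from the question to a short description and looks up the flag of one record. It assumes that the KDD flag names correspond to the connection states (`conn_state`) documented for the Bro/Zeek network monitor, which use the same names. That correspondence is an assumption here, not something the KDD documentation quoted above confirms, and the descriptions are paraphrased. The names `FLAG_MEANINGS` and `describe_flag` are made up for this example, as is the assumption that the flag is always the fourth comma-separated field (it is in the records shown in the question).

```python
# Assumed meanings, paraphrased from the Zeek (formerly Bro) conn_state
# documentation. Treat this table as a hedged interpretation, not as the
# official KDD1999 definition.
FLAG_MEANINGS = {
    "S0": "connection attempt seen, no reply",
    "S1": "connection established, not terminated",
    "S2": "established, originator tried to close, no reply from responder",
    "S3": "established, responder tried to close, no reply from originator",
    "SF": "normal establishment and termination",
    "REJ": "connection attempt rejected",
    "RSTO": "connection established, originator aborted (sent a RST)",
    "RSTR": "responder sent a RST",
    "RSTOS0": "originator sent a SYN then a RST, no SYN-ACK from responder",
    "SH": "originator sent a SYN then a FIN, no SYN-ACK from responder",
    "OTH": "no SYN seen, just midstream traffic",
}


def describe_flag(record):
    """Return (flag, description) for one comma-separated KDD record.

    Assumes the flag is the fourth field, as in the sample records.
    """
    fields = record.strip().split(",")
    flag = fields[3]
    return flag, FLAG_MEANINGS.get(flag, "unknown flag")


if __name__ == "__main__":
    # First sample record from the question.
    sample = (
        "0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,"
        "0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,"
        "0.00,0.00,0.00,0.00,normal."
    )
    print(describe_flag(sample))
```

A plain dictionary lookup is enough here because the flag is a categorical value; unknown values fall back to "unknown flag" instead of raising an error.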

Licensed under: CC-BY-SA with attribution
