LGP has been already successfully implemented to a variety of machine learning problems . This paper investigates the suitability of linear genetic programming technique, for modeling intrusion detection systems and indentifying the key features that help in deciding whether a connection is intrusive or normal activity. We also compare the performance of LGP with Support Vector Machines (SVM) and a Neural Network (NN) trained using resilient backpropagation learning. Performance metrics include a few critical aspects of intrusion detection like training and testing times, scalability and detection accuracy that help IDSs perform in real time or near real time. The data we used in our experiments originated from MIT’s Lincoln Lab .
We perform experiments to classify the network traffic patterns according to a 5- class taxonomy. The five classes of patterns in the DARPA data are (normal, probe, denial of service, user to super-user, and remote to local). The experimental results of overall classification accuracy and class specific accuracies using linear genetic programs, support vector machines and resilient back propagation neural network are reported. Results obtained form feature ranking experiments are also reported with brief description of the most important features for each class.
In Section 2 a brief introduction to linear genetic programs, support vector machines and resilient back propagation algorithm is given. Section 3 briefly introduces the data we used to compare performance of different soft computing techniques. Section 4 presents the experimental results using LGP, SVM and NN. We briefly describe the feature selection/importance of ranking and the related results in section 5. The summary and conclusions of our work are given in section 6.
Linear Genetic Programming (LGP)
Linear genetic programming is a variant of the GP technique that acts on linear genomes [15,16]. Its main characteristics in comparison to tree-based GP lies in that the evolvable units are not the expressions of a functional programming language (like LISP), but the programs of an imperative language (like C/C ++). An alternate approach is to evolve a computer program at the machine code level, using lower level representations for the individuals. This can tremendously hasten up the evolution process as, no matter how an individual is initially represented, finally it always has to be represented as a piece of machine code, as fitness evaluation requires physical execution of the individuals. The basic unit of evolution is a native machine code instruction that runs on the floating-point processor unit (FPU). Since different instructions may have different sizes, here instructions are clubbed up together to form instruction blocks of 32 bits each. The instruction blocks hold one or more native machine code instructions, depending on the sizes of the instructions. A crossover point can occur only between instructions and is prohibited from occurring within an instruction. However the mutation operation does not have any such restriction.
Resilient Backpropagation (RBP)
The purpose of the resilient backpropagation training algorithm is to eliminate the harmful effects of the magnitudes of the partial derivatives. Only the sign of the derivative is used to determine the direction of the weight update; the magnitude of