Abstract: The network infrastructure of any organization is under constant threat from a variety of attacks, namely break-ins, security breaches and system misuse. A Network Intrusion Detection System (NIDS) deployed in a network detects such penetration attacks and intrusions. Known classes of attacks can be detected easily by pattern matching, while unknown attacks are harder to detect. This paper designs and implements a deep learning approach for intrusion detection that not only learns known patterns but also adjusts itself to patterns not defined earlier. A sparse auto-encoder is used for unsupervised feature learning, and a logistic classifier is then used for classification on the NSL-KDD dataset. The results are promising for future use and modification.
Index Terms: NIDS, deep learning, sparse auto-encoder, logistic classifier, NSL-KDD
Network architecture is always vulnerable to security breaches, attempted break-ins, penetration attacks and similar intrusions by unauthorized and malicious users. A network, being a repository that shares resources among authorized users, also attracts unwanted users who are interested in exploiting those resources. In addition, global protection policies are rarely formulated and are difficult to implement. A security breach or intrusion is a critical issue for any organization. It is thus important to develop precautionary measures that safeguard the interests of the organization against the categories of attack to which it is susceptible. As defined by Heady et al., “an intrusion is a set of actions that attempts to compromise the integrity, confidentiality or availability of information resources.” The system employed to detect such malicious actions in a network is termed a Network Intrusion Detection System (NIDS). It should be able to detect a wide range of attacks and security violations by outsiders, and it should also be able to detect malpractice and abuse by insiders. Intruders can broadly be classified into three categories. A Masquerader is typically an outsider who is not an authorized user but penetrates the system using a legitimate user account. A Misfeasor is an insider, a legitimate user who misuses the privileges given and accesses resources that they are not authorized to use. A Clandestine user can be either an insider or an outsider who tries to gain supervisory access to the system. NIDS fall into two categories: Signature-based Network Intrusion Detection Systems (SNIDS) and Anomaly-detection-based Network Intrusion Detection Systems (ADNIDS). An SNIDS raises an alarm for intrusion by matching observed traffic against the patterns (signatures) of the attacks it already knows.
ADNIDS, on the other hand, raises an alarm for intrusion if the user activity under analysis deviates significantly from the normal traffic pattern. SNIDS therefore has a higher detection rate for known types of attacks, while ADNIDS performs better for novel or unknown attack patterns. However, because intruder behavior varies, an ADNIDS tends to generate a high number of false alarms.
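The difference between the two categories can be illustrated with a toy sketch. It is not part of the proposed system: the signature strings, the failed-login baseline, the z-score threshold and the function names are all hypothetical, chosen only to show that a signature check fires on stored patterns while an anomaly check fires on deviation from normal behavior.

```python
from statistics import mean, stdev
from typing import Sequence

# Hypothetical signature database: substrings of known-bad payloads.
KNOWN_SIGNATURES = ("/etc/passwd", "' OR 1=1", "cmd.exe")


def signature_alert(payload: str) -> bool:
    """SNIDS-style check: alert only if a stored pattern appears in the payload."""
    return any(signature in payload for signature in KNOWN_SIGNATURES)


def anomaly_alert(value: float, baseline: Sequence[float], threshold: float = 3.0) -> bool:
    """ADNIDS-style check: alert if the value lies far from the normal baseline."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Failed logins per minute during normal operation (made-up numbers).
normal_failed_logins = [0, 1, 0, 2, 1, 0, 1, 1, 0, 2]

print(signature_alert("GET /index.html"))        # benign request, no stored pattern
print(signature_alert("GET /../../etc/passwd"))  # contains a stored signature
print(anomaly_alert(1, normal_failed_logins))    # typical value
print(anomaly_alert(40, normal_failed_logins))   # far outside the baseline
```

A fixed threshold like the one assumed here is also where the false-alarm problem comes from: set it low and ordinary variation raises alarms, set it high and slow or subtle intrusions pass unnoticed.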
Security violations can be detected by monitoring the system audit records for abnormal patterns of system usage. Different kinds of machine learning techniques have been employed to develop Network Intrusion Detection Systems for anomaly detection. The NIDS model designed can be trained and tested for performance using the NSL-KDD dataset, which is a significant upgrade of the KDD Cup 99 dataset. Different machine learning techniques perform differently depending on the input features and on the training and test datasets selected. Similar approaches, learning techniques and input features do not always guarantee the same results across different classes of possible unknown attacks. Deep learning techniques are popular as they facilitate the design of robust and efficient NIDS. A deep learning approach based on a Sparse Auto-Encoder or a Non-symmetric Deep Auto-Encoder (NDAE) is useful for unsupervised feature learning from unlabeled data to understand the behavioral patterns of the intruder. The patterns can then be classified using soft-max regression or any other suitable classifier. In the proposed work, a deep learning approach based on sparse auto-encoders is used to learn the nature of the patterns, and a logistic regression classifier is used to classify the users based on the patterns learnt by the stacked encoders. The related work is discussed in Section 2.
The proposed work is given in Section 3, followed by its design in Section 4. The experimental results are given in Section 5, and the results and discussion in Section 6. The conclusion and future work are given in Section 7.
2. Related Works
Most of the predictive-modeling work on intrusion detection uses similar types of datasets for training and testing. It is difficult to generalize to real-time events from these datasets, so the performance of most of these predictive models decreases when they are exposed to real network traffic. Several approaches have been proposed for separating normal connections from anomalies to detect intrusions in a network. Shyu et al. proposed a novel scheme using Principal Component Analysis (PCA) that treats anomalies as outliers. The anomaly detection scheme performed well on the KDD’99 dataset: the detection rate rose to 99% while the false alarm rate dropped to as low as 1%. Revathi et al. performed a detailed analysis of the NSL-KDD dataset using only relevant features, both with and without feature reduction of the dataset, on classification algorithms such as J48 decision tree, Random Forest, Support Vector Machine and Naive Bayes. Random Forest achieved the highest test accuracy in both cases. Deep learning techniques facilitate the development of flexible and robust NIDS. Niyaz et al. proposed Self-Taught Learning (STL), a deep learning technique using a Sparse Auto-Encoder for unsupervised feature learning and soft-max regression for classification. The model was evaluated for 2-class, 5-class and 23-class classification, and the results obtained were encouraging. Shone et al. proposed a novel deep learning classification model constructed using stacked Non-Symmetric Deep Auto-Encoders (NDAE) for unsupervised feature learning and the Random Forest (RF) algorithm for classification.
The model was implemented in TensorFlow using the benchmark KDD Cup’99 and NSL-KDD datasets. The model achieved a consistent level of classification accuracy, with reduced training time and a high level of precision and recall.
3. Proposed Work
The proposed work uses a deep-learning-based approach for network intrusion detection. The system uses a deep network to train itself on the patterns of anomalies and to classify network traffic into normal connections and intrusions. The approach also aims to reduce the false alarm rate to a minimum. It has the flexibility to adjust to new patterns of intrusion and to changes in an intruder's behavior over time. The proposed system implements a deep network (sparse auto-encoders with logistic regression) trained on the NSL-KDD dataset. It gives an output value of 0 or 1, where 1 denotes an intruder and 0 corresponds to a normal user. The system takes a total of 115 features as input, including the protocol used, source address, destination address, time-stamp, service, flag, number of failed logins and number of logins. Each feature is fed to one input neuron. A sparse auto-encoder with a sparsity constraint is designed for training and for learning new features from the dataset. A deep network is created by stacking the auto-encoders, and classification from the learnt features is implemented using a logistic regression network. Logistic regression is chosen because the output involves distinguishing two classes of users. The fine-tuned network is then used to classify the input data.
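The stacked design described above can be sketched as a plain forward pass. This is an illustration under stated assumptions, not the authors' implementation: the layer sizes 115, 50 and 10 are those given in the design section of the paper, while the single sigmoid output unit, the 0.5 decision threshold, the random (untrained) weights and all function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense_sigmoid(x, weights, biases):
    """One fully connected layer followed by the sigmoid activation."""
    return [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Small random weights; a trained system would load learned values instead."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def classify(x, layers, threshold=0.5):
    """Pass one record through the stack and return (probability, label)."""
    activation = x
    for weights, biases in layers:
        activation = dense_sigmoid(activation, weights, biases)
    probability = activation[0]
    return probability, int(probability >= threshold)


rng = random.Random(0)
LAYER_SIZES = [115, 50, 10, 1]  # input, encoder 1, encoder 2, logistic output
layers = [
    init_layer(n_in, n_out, rng)
    for n_in, n_out in zip(LAYER_SIZES, LAYER_SIZES[1:])
]

record = [rng.random() for _ in range(115)]  # stand-in for one normalized record
probability, label = classify(record, layers)
print(f"P(intruder) = {probability:.3f}, label = {label}")
```

With untrained weights the output is meaningless; the point is only the shape of the computation, in which each encoder compresses its input and the final logistic unit maps the 10 learned features to a label of 0 (normal) or 1 (intruder).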
Pre-processing of the dataset is done before it is applied to the network. The non-numeric parameters are replaced with numeric values and the dataset is normalized using a max-min operation. The overall flow of the proposed system is given in Figure 1. The NSL-KDD dataset, a modification of the KDD Cup 99 dataset, includes 41 features: basic features derived from TCP/IP connections, traffic features accumulated over a window interval, and content features extracted from the application-layer data of connections. Of the 41 features, 34 are continuous, 4 are binary and 3 are symbolic (protocol_type, service, flag).
Figure 1: Design Flowchart
An auto-encoder is an artificial neural network used for unsupervised learning. It learns new features from the input patterns. The input layer represents the original set of features, and the hidden layer learns a new representation of those features with reduced dimension. The output layer represents the target, which is the same as the input. Sparse auto-encoders add a sparsity constraint on the hidden units, which allows the effect of sparseness to be explored for a given dataset and helps in finding new patterns in the distribution of the input data. The auto-encoder uses a stochastic conjugate gradient for error minimization, with the sigmoid function as the activation function. The first level of the sparse auto-encoder reduces the 115-feature set to 50 features, as shown in Figure 2. In the diagram, Xi(1-115) represents the input nodes, hi(1-50) represents the hidden layer nodes and Xi(1-115) represents the output layer nodes.
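A minimal sketch of the max-min normalization and of a sparse auto-encoder objective follows. The objective shown (mean squared reconstruction error, plus an L2 weight penalty, plus a KL-divergence sparsity penalty) is a common formulation and is assumed here; the paper does not spell out its exact cost. The default values rho=0.2, beta=4 and lam=0.001 mirror the sparsity, sparsityRegularization and L2WeightRegularization settings reported for level 1 in Section 5.1, and the toy sizes (6 inputs, 3 hidden units) and all function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def min_max_normalize(column):
    """Scale one feature column to [0, 1]; constant columns map to 0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def layer(x, weights, biases):
    """Fully connected layer with sigmoid activation."""
    return [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def kl_divergence(rho, rho_hat):
    """KL divergence between Bernoulli(rho) and Bernoulli(rho_hat)."""
    return (rho * math.log(rho / rho_hat)
            + (1 - rho) * math.log((1 - rho) / (1 - rho_hat)))


def sparse_autoencoder_cost(batch, encoder, decoder, rho=0.2, beta=4.0, lam=0.001):
    """Reconstruction error + lam * L2 penalty + beta * sparsity penalty."""
    (w1, b1), (w2, b2) = encoder, decoder
    hidden = [layer(x, w1, b1) for x in batch]
    recon = [layer(h, w2, b2) for h in hidden]
    n = len(batch)
    mse = sum(
        (xi - ri) ** 2 for x, r in zip(batch, recon) for xi, ri in zip(x, r)
    ) / n
    l2 = 0.5 * sum(w * w for matrix in (w1, w2) for row in matrix for w in row)
    # Average activation of each hidden unit over the batch.
    rho_hat = [sum(h[j] for h in hidden) / n for j in range(len(b1))]
    sparsity = sum(kl_divergence(rho, r) for r in rho_hat)
    return mse + lam * l2 + beta * sparsity


def init_layer(n_in, n_out, rng):
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
n_in, n_hidden, n_records = 6, 3, 8  # toy sizes; the paper's first level is 115 -> 50

# Made-up raw feature columns, normalized column by column, then turned into records.
raw_columns = [[rng.uniform(0, 100) for _ in range(n_records)] for _ in range(n_in)]
batch = [list(rec) for rec in zip(*(min_max_normalize(c) for c in raw_columns))]

encoder = init_layer(n_in, n_hidden, rng)
decoder = init_layer(n_hidden, n_in, rng)
cost = sparse_autoencoder_cost(batch, encoder, decoder)
print(f"sparse auto-encoder cost on the toy batch: {cost:.4f}")
```

Training would minimize this cost with respect to the encoder and decoder weights. The sparsity term pushes the average activation of each hidden unit towards rho, so only a few hidden units respond strongly to any given record.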
The level 2 sparse auto-encoder further reduces the 50 learned features to 10 new features, which are then given as inputs to logistic regression. In the diagram shown in Figure 2, h1i(1-50) represents the input nodes, h2i(1-10) represents the hidden layer nodes and h1i(1-50) represents the output layer nodes. The new features learned by the level 2 auto-encoder are fed into the logistic classifier, which identifies whether a user is normal (0) or an intruder (1), as given in Figure 4. Logistic regression uses a sigmoid (logistic) function as its activation function, giving a probability measure of the output in the range [0,1].
Figure 4: Logistic Classifier (Classifies 10 inputs to two outputs)
The final stack implements a fully connected network consisting of 1 input layer, 2 hidden layers and 1 output layer. The 115 inputs from the original dataset are compressed to 50 nodes in the second layer and to 10 nodes in the third layer. The final output layer classifies whether a user is normal or not.
Figure 5: Fully Connected Layer (Input:115, Hidden 1:50 Hidden 2:10 Output:2)
5. Experimental Results
A total of 22,545 records with 41 features were taken from the NSL-KDD dataset for training. The 3 symbolic features (protocol, service, flag) were expanded using 1-of-N encoding. The encoded data contains 115 features, of which 78 come from the symbolic features (3 from protocol, 64 from service and 11 from flag). The service feature has 64 variations, such as FTP, HTTP and login, which indicate the network service used. The protocol_type feature takes the values ICMP, TCP and UDP. The flag values REJ, SF, S0, S1, etc. denote the status of the connection. The num_access_files feature is ignored as it stays 0 throughout the dataset, which leaves 37 numeric features alongside the 78 encoded ones. The NSL-KDD dataset is normalized with a max-min operation.
5.1. Auto-encoder 1:
The parameters associated with the auto-encoder ‘msesparse’ at level 1, as shown in Figure 6, are:
1. regularization = 0
2. L2WeightRegularization = 0.001
3. sparsityRegularization = 4
4. sparsity = 0.2
Figure 6: Sparse Auto-Encoder 1 (output of view (network 1)) and Performance Plot for Sparse Auto-encoder 1
The best validation performance for auto-encoder 1, 0.021971, is reached at 382 epochs.
5.2 Sparse Auto-Encoder 2:
The parameters associated with the auto-encoder ‘msesparse’ at level 2, as shown in Figure 7, are:
1. regularization = 0
2. L2WeightRegularization = 0.001
3. sparsityRegularization = 1
4. sparsity = 0.05
Figure 7: Sparse Auto-Encoder 2 (output of view (network 2)) and Performance Plot for Sparse Auto-encoder 2
The best validation performance for the sparse auto-encoder at level 2, 0.0046754, is reached at 200 epochs.
5.3 Logistic Classifier:
The logistic classifier takes the output of the level 2 encoder and classifies it into 2 classes, as shown in Figure 9.
Figure 9: Regression plot for Logistic Classifier.
5.4. Fully Connected Layer
The fully connected network formed by stacking all the networks (auto-encoder 1, auto-encoder 2, logistic regression) is constructed as shown in Figure 10. The dataset is taken as input and classified into 2 outputs (normal = 0; intruder = 1). The internal weights are derived from the previously trained auto-encoders and logistic classifier.
Figure 10: Fully Connected Layer (view (deepnet))
6. RESULTS AND DISCUSSIONS
The network was tested with 2401 sample inputs and the confusion matrix formed is shown below in Figure 11.
Figure 11: Confusion Matrix
Of the 1283 anomalies in the test samples, 1191 were identified successfully and the remaining 92 were misclassified as normal (false negatives). Of the 1118 normal patterns, 902 were classified as normal whereas 216 were identified as intrusions (false positives). The overall accuracy is 87.2%.
7. CONCLUSION AND FUTURE WORK
A deep-learning-based approach to network intrusion detection is an anomaly-based technique used to detect possible intrusions of any type in the network. An anomaly-based deep learning approach can give higher accuracy rates than a signature-based intrusion detection system. The network learns and adjusts itself to patterns which were not defined previously. The approach is useful for attaining higher accuracy rates and it also reduces the chances of false positives and false negatives. The system can be implemented on any server which monitors the network activity of an organization in real time. The deepnet can identify intrusions and adjust itself as newer data characterizing intruders becomes available. Further work can be carried out to make the system more robust and to increase the accuracy of detection.
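As a check on the figures reported in Section 6, the confusion-matrix counts can be turned into the usual detection metrics. The four counts below are taken from the paper, with intrusion as the positive class; the variable names and the choice of additional metrics (detection rate, false alarm rate, precision) are illustrative and are not reported in the original.

```python
# Counts reported in Section 6 (positive class = intrusion).
true_positives = 1191   # anomalies identified as intrusions
false_negatives = 92    # anomalies classified as normal
false_positives = 216   # normal records identified as intrusions
true_negatives = 902    # normal records classified as normal

total = true_positives + false_negatives + false_positives + true_negatives

accuracy = (true_positives + true_negatives) / total
detection_rate = true_positives / (true_positives + false_negatives)    # recall
false_alarm_rate = false_positives / (false_positives + true_negatives)
precision = true_positives / (true_positives + false_positives)

print(f"samples:          {total}")
print(f"accuracy:         {accuracy:.1%}")
print(f"detection rate:   {detection_rate:.1%}")
print(f"false alarm rate: {false_alarm_rate:.1%}")
print(f"precision:        {precision:.1%}")
```

The counts sum to the 2401 test samples and reproduce the reported overall accuracy of 87.2%. The false alarm rate, which the proposed work aims to minimize, follows directly from the 216 normal records flagged as intrusions.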