[Figure 6: Similarity of different types of traffic. (a) Comparison of NZIX-II unhidden and covert traffic. (b) Comparison of DARPA unhidden and covert traffic. (y-axis: similarity score, ×100; series include covert, udp, and ftp-data.)]
empirical distribution; the Kolmogorov-Smirnov test, for example, has been applied to this problem. In our research, we are not seeking to model either the overt or the covert network traffic. Our goal is to define metrics that differentiate covert from overt traffic; these methods are therefore not directly applicable to the detection of IP timing channels.
4.2 An Empirical Evaluation
The goal of our experiments was to examine the efficacy of our two metrics. To this end, we first report experiments with a basic covert channel that employs a single timing interval throughout the communication and does not try to mask itself in any way. Our second set of experiments looks at how our metrics fare when measures are taken to hide the channel. Our ultimate experimental objective is to measure not only our method's false negative rate for covert channels but also its false positive rate for non-covert traffic. To this end, our third experiment explores how our metrics can be combined to form an automated detection method.
4.2.1 Data sets:
In our experiments, we used both synthetic and real traffic data sets for the sake of completeness. Our synthetic data set is the '99 DARPA data set for Telnet and HTTP traffic. Additionally, we employ the second version of the NZIX data sets (NZIX-II), a collection of TCP and UDP traces collected by the WAND research group. For the TCP traces, we chose to investigate Telnet, FTP, and HTTP traffic.
For each experiment we report results for traffic flows of 2000 packets. Our goal is not to model or identify a traffic distribution, but to determine whether we can accurately detect a covert channel in a window of 2000 packets. In future work we will investigate the minimum window length for which our methods remain effective. Note that although the covert channel was run between Purdue and Georgetown Universities, for the non-covert traffic we use the recorded IA times in the datasets. A drawback is that we cannot reproduce the same network conditions (e.g., number of hops, same jitter), but excluding the case of jitter this does not impact our results. None of our measures examine absolute IA values; rather, they compute measures of regularity in terms of the relative differences among IA values.
Table 1: Regularity of NZIX-II, DARPA, and covert traffic with windows of size 250 and 100.
4.2.2 Covert Channel I: A simple timing channel:
Our first experiment examines each metric's ability to detect a covert timing channel that employs a single timing interval (set to 0.04 sec) for the entire communication. In Table 1 we show the regularity of the variance for two window sizes (100 and 250) within the 2000-packet dataset. Our results are the average of ten different sets of data for each protocol, including the covert channel.
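Such a single-interval channel can be illustrated with a short sketch. The encoding scheme below (a packet sent within an interval encodes a 1, silence encodes a 0) is an assumption for illustration; the text above fixes only the interval length, not the encoding.

```python
import time

INTERVAL = 0.04  # the single timing interval used in the experiment (seconds)

def send_bits(bits, send_packet, interval=INTERVAL):
    """Illustrative sender: one bit per interval. A packet in the
    interval encodes 1, silence encodes 0 (encoding is an assumption)."""
    for bit in bits:
        start = time.monotonic()
        if bit:
            send_packet()
        # sleep out the remainder of the interval
        time.sleep(max(0.0, interval - (time.monotonic() - start)))

def decode(arrival_times, interval=INTERVAL):
    """Illustrative receiver: bucket arrival times into intervals;
    a non-empty bucket decodes to 1."""
    if not arrival_times:
        return []
    t0 = arrival_times[0]
    n = int((arrival_times[-1] - t0) / interval) + 1
    bits = [0] * n
    for t in arrival_times:
        bits[int((t - t0) / interval)] = 1
    return bits
```

Because every intended 1-bit arrives one fixed interval apart, the resulting IA times cluster tightly around multiples of 0.04 sec, which is precisely the regularity the metrics below are designed to expose.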
Observe that the variance of the pairwise differences between the variances of each pair of windows is on average smaller for the covert channel than for the other traffic. However, one FTP and one UDP dataset had similarly low scores. This is to be expected because FTP and UDP send streams of data as fast as possible, resulting in nearly uniform IA times. Note that the smaller window size appears to better differentiate the covert channel's regularity from that of the other protocols; in other words, there is a larger difference between the value (4.63) for the covert channel and the values for the non-covert channels.
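The regularity metric described above can be sketched as follows. This is one plausible reading of the informal description (per-window standard deviations, then the spread of their relative pairwise differences); the exact normalization is an assumption.

```python
import statistics

def regularity(ia_times, window=100):
    """Regularity sketch: partition the inter-arrival (IA) times into
    fixed-size windows, take the standard deviation of each window,
    then measure how much those per-window deviations differ pairwise.
    Lower scores indicate more regular (covert-looking) traffic.
    The relative normalization by sigmas[i] is an assumption."""
    windows = [ia_times[i:i + window] for i in range(0, len(ia_times), window)]
    sigmas = [statistics.stdev(w) for w in windows if len(w) > 1]
    diffs = [abs(sigmas[i] - sigmas[j]) / sigmas[i]
             for i in range(len(sigmas))
             for j in range(i + 1, len(sigmas))
             if sigmas[i] > 0]
    return statistics.stdev(diffs) if len(diffs) > 1 else 0.0
```

A stream with identical IA times in every window scores 0, while bursty traffic whose per-window variability changes over time scores higher, matching the separation reported in Table 1.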
In Figure 6 we show the results for the second metric, the percentage of all pairs of sorted IA values whose difference is less than ε. For a covert channel we would anticipate that the majority of the traffic would have small differences in the sorted IA values. For both the NZIX-II and the DARPA datasets, the graphs show the results for Telnet,