We can easily observe that the Gnutella topology in the biased case is well correlated with the Internet AS topology, where the nodes within an AS form a dense cluster, with only a few connections go- ing to nodes in other ASes. This is in stark contrast to the unbiased Gnutella graph, where no such property can be observed.
To analyze how churn influences the metrics such as node de- gree, path length, diameter and number of intra-AS peerings, we sample the Gnutella network 10 times during the simulation run, i.e., every 500 seconds. The results are shown in Figure 3. Multi- ple runs of the above experiments, using different world topologies yield similar results.
Graph connectivity: We begin by checking whether the Gnutella network graph remains connected using biased neighbor selection. We define the Gnutella network graph at a particular time instant as the graph formed by nodes that are online at that instant, where two nodes are connected by an edge if there exists a Gnutella connection between them at that instant. We experimentally verify that the Gnutella network remains connected at all 10 times where we sample the network, for all three cases. Hence, biased neighbor selection does not affect the connectivity of Gnutella network.
Mean Node Degree: Since ultrapeers have a much larger node degree than leaf nodes, we show, in Figure 3(a) and (b), how the mean node degree changes over time in a barplot for all three cases separately for ultrapeers and leaf nodes. This enables us to check if a biased neighbor se- lection affects the structural properties of Gnutella adversely. We observe that the mean node degree for leafs decreases only slightly, across time, with a maximum decrease from 3.14 to 2.08 at 3500 seconds. The same is the case for ultrapeers, where the maximum decrease is from 15.29 to 10.75, again at 3500 seconds. In other words, despite biasing the neighbor selection via the oracle, the node degree for both leafs and ultrapeers stays within the expected range, and the network structure of Gnutella remains unchanged.
Graph diameter: The diameter of the overlay graph, which is 5 − 7 hops in the unbi- ased case, increases to 6 − 8 hops with a oracle size of 100, only a nominal increase. Using an oracle with list size of 1000 results in a diameter between 7 − 12 hops, with an average of 9.2. The AS diameter of the underlay graph remains is 4 hops in all cases.
Mean Overlay path length: The average path length in the Gnutella overlay, shown in Fig- ure 3(c), while registering an increase, does not change signifi- cantly. The maximum increase occurs at 3500 seconds, from 3.35 in the unbiased case to 5.21 hops in the biased case with oracle list size of 1000.
Mean AS distance: The benefits of using an oracle for biasing the neighborhood in Gnutella are visible in Figure 3(d), which shows the average AS distance (in the underlay) between any two connected Gnutella nodes. The AS distance is obtained as follows. We map each Gnutella node’s IP address to its parent AS, and for each overlay edge, we find the network distance in AS hops between the two end-nodes. We observe that the least amount of decrease in the average AS distance occurs from 1.93 to 0.8 at 1000 seconds, and the maximum decrease from 1.94 to 0.25 happens at 5000 seconds. Given that the AS diameter remains constant at 4 hops, the average decrease of 1.45 in the AS distance is significant. Besides, as the average AS distance in the case of oracle list size of 1000 is 0.45, a
ACM SIGCOMM Computer Communication Review
value less than 1, it implies that most of the Gnutella peerings are indeed within the ASes, i.e., they are not crossing AS boundaries. This can be a major relief for ISPs, as they do not incur any addi- tional cost for traffic within their domains. Also traffic that does not leave the network is easier to manage. Moreover, P2P traffic will not encounter inter-ISP bottlenecks.
Intra-AS P2P connections: The above observations on AS distance are even better understood from the plots in Figure 3(e) and (f), where we show the total num- ber of intra-AS P2P connections in the Gnutella network as a per- centage of the total number of intra- and inter-AS P2P connections, for both leafs and ultrapeers.
In Figure 3(e), we observe that in the case of leaf nodes, taking the average over the 10 time points, the percentage of intra-AS P2P connections increases from 14.6% in unbiased case to 47.88% in the case of oracle with list size 100. For oracle with list size 1000, we note an average of 82.22% intra-AS P2P connections.
In Figure 3(f), we observe similar results for ultrapeers. The percentage of intra-AS P2P connections increases from an average value of 14.54% in the unbiased case to 38.04% in the case of ora- cle with list size 100, and further to 74.95% in case of oracle with list size 1000.
The percentage increase in intra-AS P2P connections is larger for leaf nodes as compared to ultrapeers, a welcome development. One needs a certain number of inter-AS connections, to maintain network connectivity and to be able to search for file content that may not be available within an AS. However, as leaf nodes typically have poor connectivity to the Internet, and have lower uptimes, it is reasonable to have leaf nodes keep most of their peerings within their AS, while allowing the ultrapeers to have slightly more con- nections outside their ASes.
Overall, we observe that the results for the metrics comparison in Gnutella simulations are in conformity with the graph-based simu- lation results in Section 3.
Scalability of Gnutella: In order to quantify the impact of biased neighborhood selection on the scalability of the Gnutella network, we measure the num- ber of Gnutella messages generated in the entire network, for all the three cases. The negotiation traffic in many P2P systems like Gnutella represents a large portion of the total P2P traffic . In Table 1, we show the number of each type of Gnutella message (Ping, Pong, Query and QueryHit) generated during the en- tire simulation run. Note that the number of unique messages gen- erated is about the same in all the three cases. However, when a Ping or Query is generated by a node, and flooded to its n neigh- bors, the message is counted n times. Hence, the table shows the total number of messages exchanged between Gnutella nodes.
As we can observe, the number of Ping messages decreases from 7.6 million in the unbiased case to 4 million in the case of oracle with list size 1000. Even more significant is the reduction of Pong messages, from 75.5 million to 39 million messages. The Query and QueryHit messages also register similar improve- ments. This reduction of Ping/Pong messages by a factor of 2, and search queries by a factor of almost 3 proves that the scalability of Gnutella network improves significantly with biased neighbor- hood selection.
The reason for this reduction in message volume is as follows. Even though the node degrees are largely unchanged, the oracle helps in building an efficient overlay topology. As the nodes form a dense cluster within an AS with very few inter-AS connections, caching of messages ensures that messages are flooded within sub-
Volume 37, Number 3, July 2007