helping it pick an optimal neighbor. We simulate the simplest of such oracles where it either chooses a neighbor within the querying node’s AS if such a one is available, or a node from the nearest AS (considering AS hop distance). We experiment with different sizes of the oracle’s choice list.

We experimented with establishing from 1000 upto 40,000 neigh- bor relationships in total. Given that for random graphs, the thresh- old for the number of edges to ensure connectivity is logn/2 times the number n of nodes, it is not surprising that we need roughly 18,000 edges to ensure that the simulated graph is connected. In- creasing the number of edges beyond this number does not change the graph properties noticeably. Accordingly, we concentrate on results for 20,000 peerings.

We run four experiments for each of the five AS graphs where the oracle is used for each neighbor relationship with candidate lists of length 1, 10, 50, 100, 200, 375, resulting in 120 experiments. Note that a list length of 1 corresponds to the unbiased case. The results we obtained are as follows.

Structural properties: First, we check whether the overlay graphs remain connected using biased neighbor selection. In principle it is possible that due to a heavy bias, the graph disintegrates into disconnected components which are themselves well connected. We experimentally verify that all resulting graphs remain connected, thereby not impacting the reachability of the overlay graph.

The next question is if the mean degree of the P2P nodes changes. We find that the mean degree value of 9.138 of an unbiased graph changes to 8.8 in biased graphs with list size 200, see Figure 1(a). The small change in node degree implies that we do not affect the structural properties of the overlay graph seriously.

One may expect that our biased neighborhood selection increases the diameter and mean path length, as it prefers “closeby” neigh- bors. Yet, in all experiments the hop count diameter of the overlay graph stays at 7 or 8 hops and the AS diameter of the underlying AS graph stays at 5 hops. Neither does the average path length in the overlay graph increase significantly, see Figure 1(b). Therefore we can conclude that the biased neighborhood selection does not negatively impact the structural properties of the overlay graph.

Locality in topology: We find that locality in overlays improves significantly as captured by the average AS-distance of P2P neighbors. Figure 1(c) shows how the AS-distance improves with the ability of the P2P node to choose a nearby neighbor. A lower AS-distance should correspond to lower latency. This is also reflected in the number of P2P neigh- bor connections that stay within each of the ASes, see Figure 1(d). Without consulting the oracle, only 4% of the edges are local to any of the ASes. The use of the oracle increases locality by a factor of 7 from 697 to 5088 (in a total of 20,000 peerings), even with a rather short candidate list of length 10. With a candidate list of length 200, more than half of the edges, 59%, stay within the AS. We find that the effects are even more pronounced for smaller networks. This demonstrates how much the oracle increases the ability of the AS to keep traffic within its network and with a refined oracle to better manage the P2P traffic. These results also indicate the benefit to the user, as traffic within the AS is less likely to encounter network bottlenecks than inter-AS traffic.

Flow conductance: The remaining question is if the network maintains its ability to route traffic with low congestion. Since the run time requirements of our algorithm for computing a lower bound for the flow con- d u c t a n c e o f a g r a p h i s O ( n 4 ) , w e c a n c u r r e n t l y o n l y e s t i m a t e t h e

## ACM SIGCOMM Computer Communication Review

35

fl o w c o n d u c t a n c e f o r s m a l l g r a p h s 2 . B e i n g a b l e t o c a l c u l a t e t h e conductance of smaller graphs only is not a big problem as in case of Gnutella [15], we can calculate the conductance of the graph of ultrapeers, which is naturally much smaller than the entire Gnutella connectivity graph. We construct unbiased as well as biased graphs with 10 nodes and 21 edges, respectively 18 nodes and 51 edges. Both graphs are generated on a topology with 6 ASes.

The expected flow conductance of the unbiased graphs is 0.505 for the 10 node graph and 0.533 for the 18 node graph (see Sec- tion 2). We experimentally verify that both unbiased graphs sup- port a conductance of at least 0.5. Also, we find that the penalty for the two biased graphs is less than a factor of 2. The 10 node biased graph supports a flow conductance of at least 0.3, and the 18 node graph, of at least 0.25. We furthermore observe that subgraphs of the biased graphs support a higher flow conductance which indi- cates that the connectivity within the ASes is good. This will likely result in a performance boost if the desired content can be located within the proximity of the interested user. The locality of biased graphs increases to 50% (for 10 nodes), respectively 80% (for 18 nodes) compared to 20% in the unbiased graphs.

4.

# SIMULATIONS WITH AN ACTUAL P2P SYSTEM

In the previous section, we have seen that the results of biased neighbor selection on the graph properties of a generalized over- lay network as well as its correlation to the underlay graph are promising. We now explore how a real P2P file sharing system benefits from using the oracle using a packet level network simu- lator [36]. For this purpose, we choose Gnutella, an unstructured P2P file sharing system. In the following we first give an overview of the Gnutella protocol, then discuss how we realize it within the simulation framework, and then discuss the simulation setup and our results.

4.1

# Gnutella and SSFNet

Gnutella [15] is a popular file-sharing network with about 2 mil- lion users [12, 37]. Moreover it is an open-source system, which has attracted a healthy interest from researchers, e.g., [37, 38, 39]. The Gnutella network is comprised of agents called servents, who can initiate as well as serve requests for resources. When launched, a servent searches for other peers to connect to by sending Hello- like Ping messages. The Ping messages are answered by Pong messages, which contain address and shared resource information. The search queries are flooded within the Gnutella network using Query messages, and answered by QueryHit messages. To limit flooding Gnutella uses TTL (time to live) and message IDs. Each answer message (QueryHit/Pong) traverses the reverse path of the corresponding trigger message. While the negotiation traffic is carried within the set of connected Gnutella nodes, the actual data exchange of resources takes place outside the Gnutella network, us- ing the HTTP protocol. Due to scalability problems, new versions of Gnutella take advantage of a hierarchical design in which some servents are elevated to ultrapeers, while others become leaf nodes. Each leaf node connects to a small number of ultrapeers, while each ultrapeer maintains a large number of neighbors, both ultra- peers and leafs. To further improve performance and to discourage abuse, the Ping/Pong protocol underwent semantic changes. An- swers to Pings are cached (Pong caching) and too frequent Pings or repeated Querys may cause termination of connection.

2

## Meanwhile, we have found a way to reduce the complexity

t o O ( n 2 l o g n ) a n d w o r k o n c o m p u t i n g t h e c o n d u c t a n c e o f l a r g e r graphs is continuing.

Volume 37, Number 3, July 2007