Overview of Paper: To overcome the argument that biasing the neighborhood selection process adversely affects the structural properties of the overlay topology one needs appropriate metrics. We propose metrics for evaluating the impact of using the oracle on the overlay as well as the underlay topology in Section 2 in addition to discussing how to derive realistic topologies.
To evaluate the impact of using the oracle one should ideally study P2P systems with many nodes over the Internet, a network with many ASes and complex intra-AS topologies. Yet as the or- acle service is not yet offered by the ISPs we are confined to us- ing testlab facilities or simulators. Graph simulators enable us to explore large topologies as long as we focus purely on the graph properties. Packet level simulators enable us to incorporate the be- havior of an actual P2P system but limit the complexity of the net- work that can be considered. Using testlab facility we can run the actual P2P system code and therefore no longer require to model it. Yet we again have to reduce the complexity of the network.
Accordingly we show in Section 3, relying on graph based sim- ulations and measured Internet topologies, that the resulting P2P overlays maintain their graph properties like small diameter, small mean path length and node degree, but the densely connected sub- graphs are now local to the ISPs. To study the impact of biased neighbor selection on a real P2P network that implements its own routing, we run extensive simulations of the Gnutella protocol in Section 4. These experiments help us to evaluate the effect of churn in P2P systems, and to study the impact of oracle on scalability and traffic content localization. We find that the Gnutella topologies maintain their graph properties, the ISP now has the ability to in- fluence the overlay topology, and the scalability and network per- formance of Gnutella improves considerably. Then, in Section 5, we show that a modified version of Gnutella when used in a testbed can indeed take advantage of the oracle service. Finally, in Sec- tion 6, we summarize our findings and give an outlook on future work.
In this section, we first propose metrics for evaluating the effec- tiveness of the idea of using an oracle which can also be used to characterize overlay-underlay graphs in general. Then we describe how we derive representative topologies for our simulations from the Internet AS topology.
As a basic model for our investigations, we model the AS-graph as a complete bi-directed graph G = (V,E) with a cost function c : E → I R + a s s o c i a t e d w i t h t h e e d g e s . E v e r y n o d e r e p r e s e n t s a AS, and for every pair (u,v), let c(u,v) denote the overall cost of routing a message from AS u to AS v (which depends on the routing policies of the ASes such a message may traverse). n
G i v e n a s e t o f p e e r s 1 P , l e t A S : P → V d e fi n e h o w t h e p e e r s a r e m a p p e d t o t h e A S e s a n d b : P → I R + d e n o t e s t h e b a n d w i d t h o f t h Internet connections of the peers. The overlay network formed by the peers is given as a directed graph H = (P,F) in which every edge (p,q) ∈ F has a cost of c(AS(p),AS(q)). The graph H can be characterized using several metrics. e
The degree of a peer is defined as the number of its outgoing connections. Ideally, every peer should have a large number of
1 In this paper a peer refers to a node of the P2P network and not to a BGP peer.
ACM SIGCOMM Computer Communication Review
connections to other peers within its AS so as to favor communi- cation within the AS, while connections to other ASes should be limited to avoid high communication costs and high update costs as peers enter/leave the network.
2.1.2 Hop count diameter
Another parameter that should be small is the hop count diameter
of the overlay graph H.
The hop count diameter D of H is the
maximum over all pairs p,q ∈ P of the minimum length of a path (in terms of number of edges) from p to q in H. It is well-known that any graph of n nodes and degree d has a hop count diameter o f a t l e a s t l o g d − 1 n , a n d t h a t d y n a m i c o v e r l a y n e t w o r k s s u c variants of the de Bruijn graph  can get very close to this lower bound, a very nice property. However, even though the hop count diameter may be small, the AS diameter (i.e., the distance between two P2P nodes when taking the underlying AS-graph G with cost h a s
function c into account) can be very large.
2.1.3 AS diameter
The AS diameter of H is defined as the maximum over all pairs p,q ∈ P of the minimum cost of a path from p to q in P, where the cost of a path is defined as the sum of the cost of its edges. Ideally, we would like both the hop count diameter and the AS diameter to be as small as possible. Research in this direction was pioneered by Plaxton et al. , and the (theoretically) best construction today is the LAND overlay network .
Surprisingly, the best AS diameter achievable when avoiding many P2P connections to other ASes can be better than the best AS diameter achievable when all P2P connections go to other ASes. Consider the simple scenario in which the cost of a P2P edge within the same AS is 0 and that between two different ASes is 1. Let the maximum degree of a peer be d. In scenario 1, we require all edges of a peer to leave its AS, and in scenario 2, we only allow one edge of a peer to leave its AS. In scenario 1, the best possible AS diam-
eter is log
n (see our comments above). However, in scenario
2 one can achieve an AS diameter of just log
(n/(d − 1)). For
this, organize the peers into cliques of size d − 1 within the ASes (we assume that the number of peers in each AS is a multiple of d − 1). We can then view each clique as a node of degree d − 1. It is possible to connect these nodes with a graph of diameter close to
(n/(d − 1)), giving the result above.
2.1.4 Flow conductance
Having a small hop count diameter and AS diameter is not enough to ensure high network performance. A tree, for example, can have very low hop count and AS diameter. Yet, it is certainly not a good P2P network, since one single faulty peer is sufficient to cut the network in half. Ideally, we would like to have a network that is well-connected so that it can withstand many faults and can route traffic with low congestion. A standard measure for this has been the expansion of a network. However, it seems that the expansion of a network cannot be approximated well. The best known algo- rithm can only guarantee an approximation ratio of O(√logn) . Therefore, we propose an alternative measure here that we call the flow conductance of a network (which is related to the flow number proposed in ).
Consider a directed network G = (V,E) with edge bandwidths b : E → I R + . I f E ( v ) i s t h e s e t o f e d g e s l e a v i n g v t h e n f o r e v e r y n o d e
v ∈ V, let b(v) =
b(e). Furthermore, for any subset U ⊆ V
let b(U) =
b(v). Next we consider the concurrent multicom-
m o d i t y fl o w p r o b l e m M 0
w i t h d e m a n d s d v , w
= b(v)·b(w)/b(V ) for
every pair v,w of nodes. That is, we consider the heavy-traffic sce- nario in which each node aims at injecting a flow into the system
Volume 37, Number 3, July 2007