proposed strategy consistently outperforms all the other schemes widely used in practice under almost all network configurations. In particular, the download time of our proposed scheme can be several times smaller than any other scheme when the network is heterogeneous (possi- bly correlated) and many downloading peers coexist with source peers, as mostly is the case in reality.

II. Characterizing the Download Time in a Network

We consider our network as a discrete-time system with each time slot of length ∆. For notational simplicity, throughout the paper, we will assume that the length of a time slot is normalized to one, i.e., ∆ = 1. Let C(t) de- note the time-varying service capacity (available end-to- end bandwidth) of a given path at time slot t (t = 1, 2, . . .) over the duration of a download. Then, the download

time T for a file of size F is defined as

# T = min

s>0

s

C(t) ≥ F

t=1

.

(1)

Note that T is a stopping time or the first hitting time of a process C(t) to a fixed level F .

If C(t), t = 1, 2, . . . are independent and identically distributed (i.i.d.), then by assuming an equality in (1),

we obtain from Wald’s equation [12] that

# F =E

T t = 1

C(t)

= E{C(t)}E{T }.

(2)

The expected download time, measured in slots, then be- comes E{T } = F/E{C(t)}. Note that (2) also holds if C(t) is constant (over t). Thus, when the service ca- pacity is i.i.d. over time or constant, there exists a direct relationship between the average service capacity and the average download time, as has typically been assumed in the literature.

II.A Impact of Heterogeneity in Service Capacity

We first consider the impact of heterogeneous service capacities of different paths. In order to decouple the effect of correlations from that of heterogeneity, in this section, we assume that Wald’s equation holds true for each path (i.e., the service capacity of a given path is either constant or i.i.d. over time). But we allow the average capacities for different paths to be different. We will consider the impact of correlations in Section II.B.

Let N be the number of sources in the network (N different end-to-end paths) and C_{i}(t) be the service ca- pacity of path i at time slot t. We assume that C_{i}(t) is either constant or i.i.d. over t such that (2) holds. Let c_{i }= E{C_{i}(t)} be the average capacity of path i. Then, the average service capacity the network offers to a user becomes

# A ( c ) =

1

N

N

c_{i},

(3)

i=1

2

where c = (c_{1}, c_{2}, . . . , c_{N }) and A(c) is the arithmetic mean of the sequence c_{1}, c_{2}, . . . , c_{N }. Thus, one may expect that the average download time, E{T }, of a file of size F would

be

E{T } =

F A(c)

.

(4)

As we mentioned earlier, however, the actual service capacity of each path remains hidden unless a network- wide probe is conducted. So the common strategy for a user is to randomly pick one source (path) and keep the connection to it until the download completes. If the user connects to path i (with service capacity C_{i}(t)), the average download time over that path becomes F/c_{i }from (2). Since the user can choose one of N paths with equal probability, the actual average download time in this case

becomes

# E{T } =

1 N

## N

i=1

F c_{i }

=

F H(c)

,

(5)

where H(c) is the harmonic mean of c_{1 }

, c_{2}, . . . , c_{N }

de-

fined by H(c) = [

1 N

N 1 i=1 c i ]

−1

. Because A(c) ≥ H(c) ^{∗},

it follows that (5) ≥ (4). This implies that the actual average download time in a heterogeneous network is al- ways larger than that given by ‘the average capacity of

the network’ as in (4).

II.B First Hitting Time of a Correlated Stochastic Process

In this section we show that the expected first hitting time of a ‘positively correlated process’ is larger than that of an i.i.d. counterpart. Consider a fixed path between a downloading peer and its corresponding uploading peer for a file of size F . Let C(t) be a stationary random process denoting the available capacity over that path at time slot t. We will assume that C(t) is positively corre- lated over time. Then, as before, we can define the down- load time of a file (or the first hitting time of the process C(t) to reach a level F ) as T_{cor}, where the subscript ‘cor’ means that C(t) is a correlated stochastic process.

Suppose now that we are able to remove the correla- tions from C(t). Let C^{}(t) be the resulting process and T_{ind }be the stopping time for the process C^{}(t) to reach level F , where the subscript ‘ind’ now means that C^{}(t) is independent over time. Then, again from Wald’s equa- tion, we have E{T_{ind}} = F/E{C^{}(t)} = F/E{C(t)}.

First, as introduced earlier, consider the case that C(t) is 100% correlated over time, i.e., C(t) = C for some random variable C for all t. Then, the download time T_{cor }becomes T_{cor }= F/C assuming an equality in (1).

# Hence, from Jensen’s inequality, we have

E{T_{cor }

} = FE

1

C

≥

F

E{C}

= E{T_{ind }

},

i.e., the average first hitting time of an 100% correlated process is always larger than that of an i.i.d. counterpart.

^{∗}The arithmetic mean is always larger than or equal to the har- monic mean, where the equality holds when all c_{i}’s are identical.