Figure 2: Static tournament to obtain a score estimate for each item.

Theorem 5.1. With thumbs rating, if f = erf(√2x) and g is the normal distribution with variance 1 and ε > 1, it is impossible to estimate the ranks r within r(1 ± ε) with a bounded review bandwidth per item. More generally, if ∫_{0}^{1} 1/f(g_{c}^{−1}(x)) dx diverges and tends to ∞, then there is no such review system based on thumbs, where g_{c} is the cumulative distribution function corresponding to g, i.e., g_{c}(x) = ∫_{−∞}^{x} g(s) ds.

In the proof, we show that with thumbs rating, if an item is rated until it gets a thumbs-down at least once, the average number of ratings µ_{f} tends to infinity as n tends to infinity when ε > 1. This means that, for ε > 1, it is impossible to estimate the ranks r within r(1 ± ε).
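To make the proof idea concrete, here is a small numerical illustration (ours, not from the paper). It assumes the thumbs-up probability takes the [0, 1]-normalized form f(x) = (1 + erf(√2x))/2 corresponding to the paper's f = erf(√2x), with item scores placed at the quantiles of the standard normal g. The expected number of ratings an item collects before its first thumbs-down is 1/(1 − f(score)), and the average of this quantity over n items grows without bound as n grows:

```python
import math

def phi_inv(p, lo=-10.0, hi=10.0):
    """Inverse of the standard normal CDF, by bisection."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if 0.5 * (1.0 + math.erf(mid / math.sqrt(2.0))) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def p_thumbs_down(x):
    """Assumed thumbs-down probability 1 - f(x) with f(x) = (1 + erf(sqrt(2)x))/2,
    computed via erfc to avoid cancellation for high scores."""
    return 0.5 * math.erfc(math.sqrt(2.0) * x)

def mean_ratings(n):
    """Average, over n normal-quantile scores, of the expected number of
    ratings before the first thumbs-down (a geometric mean, 1/(1 - f))."""
    return sum(1.0 / p_thumbs_down(phi_inv(i / (n + 1.0)))
               for i in range(1, n + 1)) / n

for n in (100, 1000, 10000):
    print(n, mean_ratings(n))
```

The printed averages grow rapidly with n: the top-ranked items almost never receive a thumbs-down, so the per-item review bandwidth needed to see one explodes, matching the theorem's conclusion.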

# 5.2 Comparing items with similar rank

Intuitively, the main advantage of pairwise comparison over thumbs is that it allows an item to be compared with other items that are close to it in score, rather than always being compared with an average item (thumbs rating is effectively a comparison against an average item). This finer comparison yields more accurate score estimates. The following theorem, proved in Section 7, confirms that under general conditions, comparing an item with another item of similar score is better than comparing it to an average item.

Theorem 5.2. If f is chosen to be the logistic or the erf function, the number of comparisons required to estimate the score of an item decreases as the difference between its score and the score of the item it is compared with decreases. More generally, this is true as long as f(x)(1 − f(x))/f′(x)^{2} is an increasing function of |x| (which holds for the logistic and erf functions).
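As a quick numerical check (ours, for illustration): for the logistic f(x) = 1/(1 + e^{−x}) we have f′ = f(1 − f), so the quantity f(1 − f)/f′^{2} simplifies to 1/(f(1 − f)). It is smallest at x = 0 and grows with |x|, i.e., the fewest comparisons are needed when the compared scores are closest:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def comparisons_needed(x):
    """f(1-f)/f'^2 for the logistic reduces to 1/(f(1-f)): a proxy for the
    number of comparisons needed at score difference x."""
    p = logistic(x)
    return 1.0 / (p * (1.0 - p))

for d in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(d, comparisons_needed(d))
```

The printed values start at 4 for d = 0 and increase monotonically with the score difference, as the theorem requires.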

# 5.3 Comparison-based ranking algorithm

## 5.3.1 Reviewing Mechanism

Figure 3: Dynamic tournament using a queue.

Tournament (static case): For simplicity, consider a static collection of n items. In this case it is possible to produce a score estimate and output a unique top candidate in n pairwise comparisons. Each item is compared at most log n times. Each item is compared till it loses, as shown in Figure 2; the height at which it loses gives an estimated score and approximate rank. The average number of comparisons per item is 1.
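The static tournament can be sketched as a single-elimination simulation (a minimal sketch; the pairing and tie-handling details are our assumptions, not the paper's):

```python
import random

def static_tournament(items, beats):
    """Single-elimination sketch of the static tournament: every comparison
    eliminates one item, so n items need only n - 1 comparisons in total.
    The round in which an item first loses is its height, which serves as
    its score estimate and approximate rank. beats(a, b) is True if a wins."""
    height = {item: 0 for item in items}
    alive = list(items)
    round_no = 0
    while len(alive) > 1:
        round_no += 1
        random.shuffle(alive)
        survivors = []
        for i in range(0, len(alive) - 1, 2):
            a, b = alive[i], alive[i + 1]
            winner, loser = (a, b) if beats(a, b) else (b, a)
            height[loser] = round_no   # an item is compared till it loses
            survivors.append(winner)
        if len(alive) % 2 == 1:        # odd item out gets a bye
            survivors.append(alive[-1])
        alive = survivors
    height[alive[0]] = round_no + 1    # the unique winner never loses
    return height
```

For example, `static_tournament(list(range(8)), lambda a, b: a > b)` runs three rounds and always assigns the top height to item 7; there are n − 1 = 7 comparisons in total, an average of about 1 per item.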

Simulating the tournament using a queue (dynamic case): In the dynamic case we can simulate the tournament using a queue, as illustrated in Figure 3. An item is enqueued at the tail when it is created. In each comparison we pick the item at the head of the queue and compare it with the item in the queue whose score estimate is closest. The losing item is deleted from the queue and the winning item is inserted back at the tail. Thus an item is compared till it loses once. Every review decreases the queue size by one, so if the number of reviews is at least the number of items, the queue size stays bounded; the average bandwidth used by the system is therefore 1. If the queue is too small it will contain few items, which means we would be comparing items with very different score estimates; in this case we may compare the item at the head with an item of similar score drawn from outside the queue, from the item database. Whenever the queue size exceeds a certain threshold we revert to comparing only with items from the queue.
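One review step of the queue-based scheme can be sketched as follows (a minimal sketch; tie-breaking and the out-of-queue fallback for small queues are omitted):

```python
from collections import deque

def review_step(queue, score, beats):
    """One review in the dynamic queue scheme: compare the head item against
    the queued item with the closest score estimate; delete the loser and
    re-enqueue the winner at the tail, shrinking the queue by one."""
    head = queue.popleft()
    rival = min(queue, key=lambda item: abs(score[item] - score[head]))
    queue.remove(rival)
    winner = head if beats(head, rival) else rival
    queue.append(winner)
    return winner
```

Each call removes exactly one item from the queue, which is why the queue stays bounded whenever reviews arrive at least as fast as new items.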

The above mechanism uses a feedback bandwidth of 1 per item. If more bandwidth is available, there will be times when the queue holds at most one item; the spare bandwidth can then be used to improve the ranking among the top candidate items. Again one can pick an item at random, biased by the current (possibly time-discounted) score estimates, and send it for comparison with another item of nearby score. For instance, we may rank all items by the time-discounted score, pick the item of rank r with probability proportional to 1/r^{γ}, where γ is a parameter, and compare it with the next item in the ranking. If γ = 0 we are sampling uniformly; if γ > 0 the sampling is biased towards items with higher score. For γ > 1 the bias is large and most of the probability mass is concentrated in the top few ranks, so γ should be chosen in the range [0, 1]. Later in the simulations we will see how the convergence of the score estimates depends on γ for the standard f and g functions.
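The rank-biased sampling step can be sketched as follows (our illustration of the 1/r^{γ} rule described above; the function name is ours):

```python
import random

def pick_biased_rank(n, gamma):
    """Pick a rank r in {1, ..., n} with probability proportional to 1/r**gamma.
    gamma = 0 gives uniform sampling; larger gamma concentrates the probability
    mass on the top ranks. The item at rank r is then compared with rank r + 1."""
    weights = [1.0 / (r ** gamma) for r in range(1, n + 1)]
    return random.choices(range(1, n + 1), weights=weights, k=1)[0]
```

For example, with n = 10 and γ = 3 the top rank is drawn roughly 80% of the time, while γ = 0 spreads draws evenly over all ten ranks.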

Recall, when we get feedback on a pair of items, the up-