shown to a user, and s/he scores the item based on how much they like the item. The most common such re- viewing mechanism is the thumb-based scheme, where each user merely gives a thumbs-up or thumbs-down to any item based on whether they like it or not respec- tively. Thumb-based ranking schemes are widely used on the Web, such as in Digg [1], Twitter [8] and Face- book status message ranking [2], and movie streaming websites such as tv-links [7], and sidereel [6].

A natural generalization of the thumb-based approach is a star-rating scheme where each user gives a score (typically between 0 and 5 stars), based on the extent to which they like the item. Star-rating schemes are also popular on the Web such as in movie rating sites like IMDb [3] and NetFlix [4].

In this paper, we primarily consider the thumb-based independent scoring mechanism.

2. Comparison-based Scoring: A pair of items is shown to a user, and s/he responds by only telling the system which item they find better (irrespective of whether they like/dislike any or both items). While a comparison- based ranking scheme is not as popular as a thumb- based scheme, similar approaches have been used in various different contexts such as ranking photos [13], Elo chess ratings [12], and ranking sports teams. Our system adopts the comparison-based scoring mecha- nism; in the following we shall exhibit various benefits of it over independent scoring. In the rest of the paper, we study various technical challenges involved with ap- plying comparison-based ranking in a fair and efficient manner.

A further generalization of a comparison-based rank- ing scheme consists of presenting a user with a set of k items (k > 2), and asking the user to provide a ranking among these k items in terms of a partial/total order- ing. We do not consider this generalization for most of the paper.

Pairwise comparisons have benefits over thumb-based schemes. First, consider a case where we have n items, a bandwidth of n feedbacks from users. If a user gives a thumbs-up or down on each of the n items, we do not get much information about the relative ranking of the items. On the other hand, by comparing pairs of items, as in any knockout tournament, we are able to get better rankings of the items. Second, to ensure fairness, an item needs to receive comparisons as long as it wins (or as long as there are other items to compare against). However, such a continued thumb-based reviewing may not be practical from the bandwidth point of view. A more rigorous theoretical comparison of the two approaches appears in Section 5.

3.

# DESIR BLE PROPERTIES

In general, we would like the review module to satisfy certain desirable properties.

1. Ranking Accuracy: The system should be able to rank the items as accurately as possible within the re- view budget. While it may not be critical (or even feasible) to obtain the exact ranking, getting it ap- proximately is desirable. A reasonable goal is to ap- proximate the rank r of an item within a multiplicative

error r(1±) with high confidence, for some parameter

. This will ensure that we get the top item right and allows more tolerance for the lower ranked ones.

2.

Review Feedback Bandwidth: The ranking should converge to the correct one within the desired level of accuracy quickly with a small amount of feedback per item. An important measure here is the average amount of feedback µ

_{f }used per item as the number of items n goes to ∞ in the steady state. µ_{f }must be bounded as n → ∞ as otherwise the system is unsta- ble.

3.

Low Latency: Users should not have to wait long before receiving an estimate on their score/rank. This is distinct from feedback bandwidth as a system can need low bandwidth but have high latency. A spe- cific latency measure that is important is the amount of time the user has to wait after posting an item till s/he receives the first review from the system say in the form of a thumb rating or a result of a pair wise comparison. The latter is important to retain the in- terest and enthusiasm of the forum writers.

4.

Fairness: Items should be treated equally with re- spect to ranking and allocation of review bandwidth. Items that perform equally (thinking of the review sys- tem as a game) should be treated equally. Thus, if say in the thumb based rating scheme, two items receive identical ratings in a sequence of reviews, it should not be the case that one is scheduled for more reviews and another is not. Further, preferably, an item should be reviewed till it ‘loses’ at least once; as long as an item is winning it should not be removed from the review system. However, achieving this depends on the avail- able review bandwidth.

We show in Section 5 that for reasonable score distribu- tions, there is no thumb-based algorithm that approximates the rank within a multiplicative error using bounded feed- back bandwidth. However, we give a comparison-based algo- rithm that can achieve a multiplicative error with bounded feedback.

4.

# PRELIMIN RIES

Next we present background necessary for the rest of the paper. Section 4.1 states our assumptions on how scores of items are distributed, and presents a model of how users rate items. Section 4.2 describes how thumb-based and comparison-based algorithms update scores of items that re- ceived feedback from users.

4.1

# Probability Models

The feedback bandwidth and the ranking accuracy of an algorithm depends on two factors: (1) The distribution of scores of the items; (2) Probability model of how items are rated by the algorithm. Next we describe the models used for these factors.

Score Distribution: We will assume that each item has a score that is given by a real number and the scores come from a certain distribution (say the normal distribution). Let g(x) denote the probability density function of the scores, and g_{c}(x) denote the cumulative distribution function.