When an item with score x is rated using a thumb-based algorithm, it gets a thumbs up with probability f(x) and a thumbs down with probability 1 − f(x). A typical choice for f(x) is based on the erf(x) function: f(x) is the probability that a normal random variable with mean 0 and standard deviation 1/√2 exceeds −x, i.e., f(x) = (1 + erf(x))/2. Another popular choice is the logistic function [15].
The function f(x) for modeling the results of thumb ratings can be generalized to the results of pairwise comparisons as follows. We may assume that the probability distribution of the outcome of comparing two items with scores x and y depends only on the difference of their scores. We say that the item with score x wins against the one with score y with probability f(x − y). Thus, the thumbs rating probability is given by f(x − 0), which is essentially like comparing an item with score x to an item with score 0, which can be interpreted as a median item.
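The comparison model above can be sketched in a few lines of illustrative Python; the choice of f is left open, exactly as in the text:

```python
import random

def compare(x, y, f, rng):
    """Simulate one pairwise comparison: the item with score x beats
    the item with score y with probability f(x - y).
    Returns 1 if the x-item wins and 0 otherwise."""
    return 1 if rng.random() < f(x - y) else 0

def thumb(x, f, rng):
    """A thumbs rating of an item with score x behaves like a
    comparison against a median item with score 0."""
    return compare(x, 0.0, f, rng)
```

For example, with the logistic choice f(d) = 1/(1 + e^(−d)), an item with score 2 collects a thumbs up about 88% of the time.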
This is consistent with one of the earliest models for two-player games, suggested by Arpad Elo for chess. His central assumption was that the performance of each player is a normally distributed random variable with mean equal to the true score of the player: in any one game, the player's performance is a random number that is normally distributed around his true score. He also assumed that all players have the same variance. Observe that the difference between two normally distributed variables with the same variance σ² and means x and y is also a normally distributed variable, with mean x − y and variance 2σ². So the probability that the player with true score x beats the player with true score y equals the probability that this difference variable exceeds 0, which is given by f(x − y) = ½(1 + erf((x − y)/(2σ))), where σ is the common standard deviation.
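Under these assumptions the win probability has a closed form; the following sketch (with σ as a free model parameter, not fixed by the text) evaluates it via the error function:

```python
import math

def win_prob(x, y, sigma=1.0):
    """Elo's normal model: each player's performance is N(score, sigma^2),
    so the performance difference is N(x - y, 2 sigma^2) and the player
    with score x wins when that difference exceeds 0."""
    return 0.5 * (1.0 + math.erf((x - y) / (2.0 * sigma)))
```

Note that win_prob(x, x) is 1/2 for any x, and win_prob(x, y) + win_prob(y, x) = 1, as a symmetric model requires.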
4.2 Estimating scores from reviews
Any ranking algorithm uses the reviews to produce score estimates for each item; these estimates are then used to rank the items. The algorithm's goal is to obtain score estimates such that the ranking they induce is close to the ranking based on the inherent scores of the items. Since the goal is a near-accurate ranking, the accuracy of the score estimates themselves is less important than the ranking they induce.
Thumbs: For thumb-based algorithms, a natural way of estimating the score x of an item is the fraction of reviews in which the item received a thumbs-up rating.
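As a minimal sketch, this estimator is just an empirical mean; note it estimates f(x) rather than x itself, so recovering x would additionally require inverting f:

```python
def thumbs_estimate(reviews):
    """Estimate an item's quality from thumb reviews, encoded as
    1 (thumbs up) or 0 (thumbs down): the fraction of thumbs up."""
    if not reviews:
        raise ValueError("need at least one review")
    return sum(reviews) / len(reviews)
```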
Comparisons: For algorithms based on pairwise comparisons, a popular choice for revising score estimates is the Elo rating system, which was originally invented by Arpad Elo for chess but is now widely used for many other games.
Consider an item A with score estimate x. Under a given probability model, suppose A was expected to receive EA points in a comparison, but actually received SA points. Then A's score is revised to x' by the following formula: x' = x + K(SA − EA), where K is a parameter that decays with the number of times the item has been reviewed. K is high initially so as to approach an item's actual score quickly.
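A hypothetical revision step follows; the decay schedule k0/(1 + n) is our own illustrative assumption, since the text only requires that K decay with the number of reviews:

```python
def revise_score(x, s_a, e_a, n_reviews, k0=32.0):
    """One revision x' = x + K * (S_A - E_A), where S_A is the points
    actually received and E_A the points expected. K starts at k0 and
    decays with the item's review count (illustrative schedule)."""
    k = k0 / (1.0 + n_reviews)  # high at first, so estimates move quickly
    return x + k * (s_a - e_a)
```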
Based on our probability model, when item A is compared with item B having score estimate y, the expected points A receives is EA = f(x − y). The actual points received by A is SA = 1 if A wins the comparison and SA = 0 if A loses the comparison.
Later versions of the Elo system replaced the normal model with the logistic function because the resulting update formulae are simpler. With f(x − y) = 1/(1 + e^(y−x)), the final update formula becomes: x' = x + K(SA − 1/(1 + e^(y−x))). Intuitively, if an item wins against another item with a high score estimate, its score gets a more significant boost than it would by defeating an item with a low score estimate.
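Putting the pieces together for the logistic case, a minimal sketch of one update (K is held fixed here for brevity, though in the system described above it would decay with the review count):

```python
import math

def elo_update(x, y, a_won, k=24.0):
    """Update A's estimate x after comparing A (estimate x) with
    B (estimate y): x' = x + K * (S_A - E_A), with logistic
    expectation E_A = 1 / (1 + e^(y - x))."""
    e_a = 1.0 / (1.0 + math.exp(y - x))
    s_a = 1.0 if a_won else 0.0
    return x + k * (s_a - e_a)
```

Beating an opponent with estimate y = 1 moves x up more than beating one with y = −1, matching the intuition above.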
5. SCHEDULING ITEMS FOR REVIEW
Clearly, the main component in any (thumb-based or comparison-based) ranking algorithm is the module that picks the item(s) for review. Even in systems such as Digg and Twitter, where the reader rates items while browsing the site, the reviewing algorithm can be modeled as picking an item for review with a probability depending on its rank in the display order. We will therefore consider explicit systems in our work.
In this section we analyze thumb-based and comparison-based algorithms based on the desirable properties from Section 3. We will look at the ratings bandwidth required by both the thumb-based and the comparison-based algorithms. If there are n items with different qualities, we can order them by decreasing quality, which results in a ranking of the items. We will compute the number of comparisons/thumbs required to estimate the rank of an item. We are interested in estimating the rank r within a multiplicative error r(1 ± ε); thus we are interested in identifying the top-ranked items with small error in their rank, and allow more error as the rank increases. We will compute the average number of ratings required per item with both methods. We will assume that the qualities of the n items are normally distributed.
First, Section 5.1 presents a negative result for thumb-based algorithms: no thumb-based reviewing module can achieve a good ranking with bounded feedback bandwidth. We will then show that a comparison-based algorithm can indeed achieve such an approximation with bounded feedback bandwidth.
Fundamentally, comparison-based algorithms can distinguish between items of similar score by directly comparing them against each other, whereas a thumb-based algorithm effectively compares an item with a median item. Section 5.2 theoretically crystallizes the advantage of being able to compare items with similar score.
Therefore, we focus on comparison-based algorithms for most of the paper. Section 5.3 studies in detail review mechanisms based on pairwise comparisons, which is the model adopted by our system.
We ask the following question: Is there any algorithm for scheduling items for review that can approximate the ranks of items within some multiplicative error with bounded feedback? It turns out that, under typical probability models, there is provably no such reviewing mechanism for thumb-based systems. The following theorem formalizes this result. To maintain the flow of the paper, the proofs of this theorem and of all other results in the paper are deferred to Section 7.