# Rating Items:

# When an item with score x is rated using

thumb-based algorithm, it gets a thumbs up with probability f(x) and gets a thumbs down with probability 1 f(x). A typical distribution that has been used for f(x) is the

erf (

x) function [12], which is the probability that a normal

random variable with mean 0 and variance

/^{√}2 exceeds x.

Another popular choice is the logistic function 15].

1 (1+e

x

)

[10,

The function f(x) for modeling results of thumb ratings can be generalized to results of pairwise comparisons as fol- lows. We may assume that the probability distribution of the outcome of comparing two items with scores x and y respectively depends on the difference of their scores. We say that the item with score x wins against the one with

score y with probability f(x ing probability is given by f(x comparing an item with score

y). Thus, the thumbs rat- 0), which is essentially like x to an item which score 0

which can be interpreted as a median item.

# This is consistent with one of the earliest models for two

player games suggested by Arpad Elo [12] for chess.

Elo’s

central assumption was that the performance of each player is a normally distributed random variable with mean equal to the true score of the player. In any one game the players performance is given by a random number that is normally distributed around his true score. He also made the assump- tion that all players have the same variance. Observe that the difference between two normally distributed variables with the same variance and means x and y is also a nor- mally distributed variable with mean x y. So this gives that the probability that player with true score x loses to another with true score y is equal to the probability that this

variable exceeds x

y which is given by f(x

y) = erf(

x),

where

depends on the variance.

# 4.2 Estimating scores from reviews

Any ranking algorithm makes use of the reviews to pro- duce score estimates for each item. These score estimates are then used to rank the items. The algorithm’s goal is to obtain score estimates such that ranking based on the score estimates is close to the ranking based on the inherent scores of the items. Since the goal of the algorithm is get a near-accurate ranking, the accuracy of the actual score esti- mates to the actual score is less important than the ranking induced by them.

Thumbs: For thumb-based algorithms, a natural way of estimating the score x of an item is the fraction of reviews in which the item received a thumbs-up rating.

Comparisons: For algorithms based on pairwise-comparisons, a popular choice for revising score estimates is the Elo rating system [12], which was originally invented by Arpad Elo for Chess but is now widely used for many other games.

Consider an item A with score estimate x. Under a given probability model, suppose A was expected to receive E_{A }points in a comparison, but actually received S_{A }points. Then, A’s score is revised to x^{0 }based on the following for-

mula:

x^{0 }= x + K(S_{A }

E_{A}).

K is a parameter that decays

with the number of times the item has been reviewed.

In-

tuitively, K

is high initially so as to get close to an items

actual score quickly.

# Based on our probability model, when

item A is compared with item B

having score estimate y,

the expected points A receives is E_{A }= f(x points received by A is S_{A }= 1 if A wins

y). The actual the comparison

against B

and S_{A }= 0 if A loses the comparison.

Originally

the

# Elo

system

assumed

that

f (x)

is

given

by

the

erf (x)

function.

Later, it was simplified to the logistic function be-

cause the resulting update formulae are simpler.

# Therefore,

the final update formula becomes: x^{0 }= x+K(S_{A }

1 1 +e^{y }

x

)

(assuming

= 1).

# Intuitively, if an item wins against an-

other item with a high score estimate, it’s score gets a more significant boost than it would by defeating an item with

low score estimate.

5.

# SCHEDULING ITEMS FOR REVIEW

Clearly the main component in any (thumb-based or comparison-based) ranking algorithm is the module that picks the item(s) for review. Even in systems such as Digg and twitter, if the reader rates items while browsing through the site, the reviewing algorithm can be modeled as picking an item for review with a probability depending on its rank in the display order. We will therefore consider explicit sys- tems in our work.

In this section we analyze thumb-based and comparison- based algorithms based on the desirable properties from Sec- tion 3. We will look at the ratings bandwidth required by both the thumbs based and the comparison based algo- rithms. If there are n items with different qualities, we can order them by decreasing quality which results in a rank- ing of the items. We will compute the number of compar- isons/thumbs required to estimate the rank of an item. We are interested in estimating the rank r within a multiplica- tive error r(1 ± ); thus we are interested in identifying the top ranked items with small error in their rank and allow more error as the rank increases. We will compute the aver- age number of ratings required per item with both methods. We will assume that the qualities of the n items are normally distributed.

First, Section 5.1 shows a negative result for thumb-based algorithms, showing that no thumb-based reviewing module can achieve a good ranking with bounded feedback band- width. We will show that a comparison-based algorithm can indeed achieve such an approximation with bounded feed- back bandwidth.

Fundamentally, comparison-based algorithms allow you to distinguish between items of similar score by directly com- paring them against each other, whereas a thumb-based al- gorithm intuitively compares an item with a median item. Section 5.2 theoretically crystallizes the advantage of being able to compare items with similar score.

Therefore, we focus on comparison-based algorithms for most of the paper. Section 5.3 studies in detail review mech- anisms based on pairwise comparisons, which is the model adopted by our system.

5.1

# Thumb-based algorithm

We ask the following question: Is there any algorithm for scheduling items for review that can approximate the ranks of items within some multiplicative error with bounded feed- back? It turns out that under typical probability models, for thumb-based systems, it can be shown theoretically that there is no such reviewing mechanism. The following the- orem formalizes this result. To maintain the flow of the paper, the proof of this theorem and all other results in the paper are deferred to Section 7.