adaptation for comparing longer texts requires an extra level of sophistication [Mihalcea et al., 2006]. In contrast, our method treats both words and texts in essentially the same way. Second, considering words in context allows our approach to perform word sense disambiguation (see Table 3). WordNet cannot achieve such disambiguation, since the information about each synset is limited to a few words (its gloss); in both ODP and Wikipedia, concepts are associated with huge amounts of text. Finally, even for individual words, ESA provides a much more sophisticated mapping of words to concepts, through the analysis of the large bodies of text associated with concepts. This allows us to represent the meaning of words (or texts) as a weighted combination of concepts, whereas mapping a word into WordNet amounts to a simple lookup, without any weights. Furthermore, in WordNet, the senses of each word are mutually exclusive. In our approach, concepts reflect different aspects of the input (see Tables 1–3), thus yielding a weighted, multi-faceted representation of the text.
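The weighted mapping from words to concepts can be sketched in a few lines. The snippet below is a toy illustration, not the authors' actual pipeline: the concept articles and their texts are invented, and a word's interpretation vector is simply its TF-IDF weight in each concept's text.

```python
# Toy sketch of ESA-style interpretation vectors. The "concepts" below are
# hypothetical stand-ins for Wikipedia articles; real ESA uses the full
# article texts and more refined weighting.
import math

concepts = {
    "Bank (finance)": "bank money loan deposit interest account",
    "River": "river water bank shore stream flow",
    "Computer": "computer program software hardware code",
}

def tf(word, text):
    tokens = text.split()
    return tokens.count(word) / len(tokens)

def idf(word):
    df = sum(1 for text in concepts.values() if word in text.split())
    return math.log(len(concepts) / df) if df else 0.0

def interpretation_vector(word):
    """Weighted vector of concepts representing a single word."""
    return {c: tf(word, text) * idf(word) for c, text in concepts.items()}

vec = interpretation_vector("bank")
# "bank" receives nonzero weight for both the finance and the river concept,
# reflecting the multi-faceted representation described in the text.
```

The ambiguous word "bank" is deliberately chosen: its vector has mass on both senses, and which sense dominates can then be resolved by the surrounding context.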
On the other hand, LSA [Deerwester et al., 1990] is a purely statistical technique that leverages word cooccurrence information from a large unlabeled corpus of text. LSA does not rely on any human-organized knowledge; rather, it “learns” its representation by applying Singular Value Decomposition (SVD) to the words-by-documents cooccurrence matrix. LSA is essentially a dimensionality reduction technique that identifies the most prominent dimensions in the data, which are assumed to correspond to “latent concepts”. Meanings of words and documents are then compared in the space defined by these concepts. Latent semantic models are notoriously difficult to interpret, since the computed concepts cannot be readily mapped into the natural concepts manipulated by humans. The Explicit Semantic Analysis method we proposed circumvents this problem, as it represents meanings of text fragments using natural concepts defined by humans.
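The LSA procedure just described can be illustrated on a tiny term-by-document matrix. This is a toy example with invented counts, not the original implementation: SVD is applied to the matrix, the top-k dimensions are kept as "latent concepts", and words are compared in that reduced space.

```python
# Toy LSA: SVD of a small term-by-document count matrix (invented counts),
# keeping the k most prominent dimensions as latent concepts.
import numpy as np

terms = ["car", "automobile", "flower", "petal"]
# Columns are documents; each cell is a term frequency in that document.
A = np.array([
    [2, 1, 0, 0],   # car
    [1, 2, 0, 0],   # automobile
    [0, 0, 2, 1],   # flower
    [0, 0, 1, 2],   # petal
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                 # number of latent concepts kept
word_vecs = U[:, :k] * s[:k]          # term coordinates in the latent space

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

sim_syn = cos(word_vecs[0], word_vecs[1])    # car vs. automobile: high
sim_diff = cos(word_vecs[0], word_vecs[2])   # car vs. flower: near zero
```

Note that the two retained dimensions have no human-readable labels; this is exactly the interpretability problem the paragraph above points out, and what ESA's explicit concepts avoid.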
Our approach to estimating semantic relatedness of words is somewhat reminiscent of distributional similarity [Lee, 1999; Dagan et al., 1999]. Indeed, we compare the meanings of words by comparing their occurrence patterns across a large collection of natural language documents. However, the compilation of these documents is not arbitrary; rather, the documents are aligned with encyclopedia articles, each of which is focused on a single topic.
In this paper we deal with “semantic relatedness” rather than “semantic similarity” or “semantic distance”, which are also often used in the literature. In their extensive survey of relatedness measures, Budanitsky and Hirst [2006] argued that the notion of relatedness is more general than that of similarity, as the former subsumes many different kinds of specific relations, including meronymy, antonymy, functional association, and others. They further maintained that computational linguistics applications often require measures of relatedness rather than the more narrowly defined measures of similarity. For example, word sense disambiguation can use any related words from the context, and not merely similar words. Budanitsky and Hirst [2006] also argued that the notion of semantic distance might be confusing due to the different ways it has been used in the literature.
Prior work in the field mostly focused on semantic similarity of words, using the R&G [Rubenstein and Goodenough, 1965] list of 65 word pairs and the M&C [Miller and Charles, 1991] list of 30 word pairs. When only the similarity relation is considered, using lexical resources was often successful enough, reaching correlations of 0.70–0.85 with human judgements [Budanitsky and Hirst, 2006; Jarmasz, 2003]. In this case, lexical techniques even have a slight edge over ESA, whose correlation with human scores is 0.723 on M&C and 0.816 on R&G.4 However, when the entire wealth of the language is considered in an attempt to capture more general semantic relatedness, lexical techniques yield substantially inferior results (see Table 1). WordNet-based techniques, which only consider the generalization (“is-a”) relation between words, achieve correlations of only 0.33–0.35 with human judgements [Budanitsky and Hirst, 2006]. Jarmasz and Szpakowicz’s ELKB system [Jarmasz, 2003], based on Roget’s Thesaurus, achieves a higher correlation of 0.55 due to its use of a richer set of relations.
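The correlation figures quoted above are obtained by correlating a method's relatedness scores for a list of word pairs with averaged human judgements. The sketch below uses Pearson correlation on invented numbers; the word-pair scores are not the actual R&G or M&C data, and the benchmarks' exact evaluation protocol may differ.

```python
# Sketch of the evaluation protocol: correlate method scores against
# human judgements for the same word pairs. All numbers are invented.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

human_scores  = [3.92, 3.84, 0.42, 1.10, 3.05]   # hypothetical judgements
method_scores = [0.91, 0.88, 0.05, 0.30, 0.74]   # hypothetical system output

r = pearson(human_scores, method_scores)  # close to 1 when rankings agree
```

Note that the two score scales need not match; correlation only measures how well the method's ordering and relative spacing of pairs track the human ordering.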
Sahami and Heilman proposed using the Web as a source of additional knowledge for measuring the similarity of short text snippets. A major limitation of this technique is that it is only applicable to short texts: sending a long text as a query to a search engine is likely to return few or even no results at all. Our approach, on the other hand, is applicable to text fragments of arbitrary length.
Strube and Ponzetto [2006] also used Wikipedia for computing semantic relatedness. However, their method, called WikiRelate!, is radically different from ours. Given a pair of words w1 and w2, WikiRelate! searches for Wikipedia articles, p1 and p2, that respectively contain w1 and w2 in their titles. Semantic relatedness is then computed using various distance measures between p1 and p2. These measures rely either on the texts of the pages or on path distances within the category hierarchy of Wikipedia. Our approach, on the other hand, represents each word as a weighted vector of Wikipedia concepts, and semantic relatedness is then computed by comparing the two concept vectors.
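The path-based flavour of such measures can be sketched as a shortest-path search over the category hierarchy. The category graph below is an invented fragment, and inverse path length is only one simple variant of the distance measures Strube and Ponzetto evaluate; it is shown here purely to contrast the path-based approach with ESA's vector comparison.

```python
# Sketch of a path-based relatedness measure over a toy category graph
# (invented edges): relatedness falls with the shortest-path distance
# between the two articles' categories.
from collections import defaultdict, deque

edges = [("Bank", "Finance"), ("Finance", "Economics"),
         ("Money", "Economics"), ("River", "Geography")]
adj = defaultdict(set)
for a, b in edges:          # treat the hierarchy as an undirected graph
    adj[a].add(b)
    adj[b].add(a)

def path_length(start, goal):
    """BFS shortest-path length in edges; None if the nodes are unconnected."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, d = queue.popleft()
        if node == goal:
            return d
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None

d = path_length("Bank", "Money")        # Bank-Finance-Economics-Money: 3 edges
relatedness = 1.0 / (1 + d)             # one simple inverse-path-length variant
```

A limitation visible even in this toy graph: words whose articles fall in disconnected parts of the hierarchy get no score at all, whereas a vector representation degrades more gracefully.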
Thus, the differences between ESA and WikiRelate! are:
WikiRelate! can only process words that actually occur in titles of Wikipedia articles. ESA only requires that the word appears within the text of Wikipedia articles.
WikiRelate! is limited to single words while ESA can compare texts of any length.
WikiRelate! represents the semantics of a word either by the text of the article associated with it or by the node in the category hierarchy. ESA uses a much more sophisticated semantic representation based on a weighted vector of Wikipedia concepts.
Indeed, as we have shown in the previous section, the richer representation of ESA yields much better results.
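The vector-comparison step that distinguishes ESA can be sketched directly. The concept weights below are invented for illustration, and cosine similarity is used as the comparison function, a standard choice for weighted vectors.

```python
# Sketch of comparing two texts via their weighted concept vectors.
# Concept names and weights are hypothetical.
import math

def cosine(u, v):
    """Cosine similarity of two sparse concept->weight dicts."""
    dot = sum(u[k] * v[k] for k in set(u) & set(v))
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

text1 = {"Finance": 0.8, "Bank": 0.5, "Economy": 0.3}
text2 = {"Finance": 0.7, "Stock market": 0.6, "Economy": 0.2}
text3 = {"River": 0.9, "Geography": 0.4}

sim_related   = cosine(text1, text2)  # shared concepts -> positive score
sim_unrelated = cosine(text1, text3)  # no shared concepts -> 0.0
```

Because both words and longer texts are mapped into the same concept space, this single comparison function covers every input length, which is the uniformity the comparison with WikiRelate! emphasizes.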
We proposed a novel approach to computing semantic relat- edness of natural language texts with the aid of very large
4. WikiRelate! [Strube and Ponzetto, 2006] achieved relatively low scores of 0.31–0.54 on these domains.