An LMF-based Web Service for Accessing WordNet-type Semantic Lexicons

activities of intercultural collaboration. It currently accommodates more than sixty Web services¹. These services are classified into around twenty service types and are used through correspondingly defined APIs². In other words, a language resource has to be wrapped as a Web service of the appropriate type in order to be accommodated by the infrastructure. The APIs defined by the Language Grid are carefully designed so that users who are not LR/LT experts can use the language services relatively easily. The provided APIs, however, are neither linguistically fine-grained nor compliant with LR-related international standards. Thus, a possible direction for next-generation APIs, or for APIs in another layer, may be to accommodate more fine-grained linguistic data represented according to the relevant LR-related international standards.

3. WordNet-type Semantic Lexicon

3.1. What is the WordNet-type Semantic Lexicon?

By a WordNet-type semantic lexicon we mean a lexical resource whose fundamental structure is the same as that of the Princeton WordNet (Fellbaum, 1998) (hereafter PWN). That is, a lexicon of this type consists of a set of synset nodes and a set of links, each connecting one synset node with another under some lexical/conceptual relation. A synset denotes a lexicalized concept, and the associated synset node gathers synonymous word forms, each representing one sense carried by a word form. More precisely, a word sense in PWN is defined by the triple {word-form, part-of-speech, sense-number} and functions as a pointer to the associated synset. A number of lexical resources sharing this information structure have been developed for many languages, including the Japanese WordNet (Bond et al., 2008) (hereafter WN-ja), and these lexical resources are expected to be integrated via the Global WordNet Grid³. Note that in some of the literature a WordNet-type semantic lexicon is described as a wordnet, and we adopt this convention in this paper.
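The structure described above can be sketched in code. The following is only a minimal illustration, not the data model of PWN or of any particular toolkit: the class names, the synset identifiers, and the sample senses are all assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, NamedTuple, Tuple


class WordSense(NamedTuple):
    """A word sense as the triple {word-form, part-of-speech, sense-number}."""
    word_form: str
    part_of_speech: str
    sense_number: int


@dataclass
class Synset:
    """A synset node: one lexicalized concept with its synonymous word senses."""
    synset_id: str  # hypothetical identifier scheme
    senses: List[WordSense] = field(default_factory=list)


@dataclass
class Lexicon:
    """A set of synset nodes plus a set of typed links between them."""
    synsets: Dict[str, Synset] = field(default_factory=dict)
    # Each link is (source synset id, relation name, target synset id).
    links: List[Tuple[str, str, str]] = field(default_factory=list)
    # A word sense functions as a pointer to its synset.
    sense_index: Dict[WordSense, str] = field(default_factory=dict)

    def add_synset(self, synset: Synset) -> None:
        self.synsets[synset.synset_id] = synset
        for sense in synset.senses:
            self.sense_index[sense] = synset.synset_id

    def add_link(self, source: str, relation: str, target: str) -> None:
        self.links.append((source, relation, target))

    def synset_of(self, sense: WordSense) -> Synset:
        """Follow the pointer from a word sense to its synset."""
        return self.synsets[self.sense_index[sense]]

    def related(self, synset_id: str, relation: str) -> List[Synset]:
        """Return the synsets reachable from synset_id via the given relation."""
        return [self.synsets[target]
                for source, rel, target in self.links
                if source == synset_id and rel == relation]


# Illustrative data only; identifiers and sense numbers are invented.
lexicon = Lexicon()
lexicon.add_synset(Synset("syn-bank", [WordSense("bank", "n", 1)]))
lexicon.add_synset(Synset("syn-institution", [WordSense("institution", "n", 1)]))
lexicon.add_link("syn-bank", "hypernym", "syn-institution")
```

Keeping the links as plain triples makes the set of relation types open-ended, which suits a structure shared by lexicons in different languages.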

3.2. Modeling EDR Dictionary as a WordNet-type Semantic Lexicon

The EDR Electronic Dictionary (Yokoi, 1995; EDR, 2007) is not a single dictionary; rather, it is a dictionary system consisting of sub-dictionaries, including monolingual dictionaries (Japanese and English), bilingual dictionaries (J-to-E and E-to-J), and a concept dictionary, along with co-occurrence dictionaries and corpora. The EDR dictionary (hereafter EDR) is the result of a nine-year national project in Japan (from 1986 through 1994) whose aim was to establish a lexical knowledge infrastructure useful for intelligent information processing, including machine translation between Japanese and English. The core logical structure of EDR is depicted in Fig. 1.

¹ http://langrid.org/operation/service_manager/language-services

² http://langrid.nict.go.jp/langrid-developers-wiki-en/#f6d501a8

³ http://www.globalwordnet.org/gwa/gwa_grid.htm

Figure 1: Core logical structure of EDR.

In EDR, each entry in every sub-dictionary is associated with a concept identifier (CID), which represents a fine-grained language-independent (or Japanese/English bilingual) concept. A CID can be referred to by multiple word entries whose meanings are thought to be equivalent. For example, in Fig. 1, the Japanese words ("…", and "…") and the English words ("bank", "bnk.", "bk") have the same CID (3bc999), showing that they all denote the same concept (the financial institution sense). This enables us to form a pseudo-synset for a concept node. Note that the pseudo-synset can be bilingual, since a CID may be shared by both Japanese and English entries. As shown in Fig. 1, a concept node can also have glosses in both Japanese and English. The concept nodes make up a kind of taxonomy or ontological structure (the conceptual system in the figure) in which a node is connected to another by some conceptual/semantic relation. In short, the overall logical structure is quite similar to the PWN structure; hence EDR can be modeled as a WordNet-type semantic lexicon. This gives us an opportunity to realize an access service for EDR with exactly the same framework as for PWN/WN-ja.
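The formation of pseudo-synsets from shared CIDs can be sketched as a simple grouping step. This is a hedged illustration only: the flat (language, headword, CID) record layout is an assumption and not EDR's actual file format, the Japanese headword is a placeholder, and the second CID is invented. Only the CID 3bc999 and the three English headwords come from the example in the text.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical flattened record: (language, headword, CID).
Entry = Tuple[str, str, str]


def build_pseudo_synsets(entries: Iterable[Entry]) -> Dict[str, Dict[str, List[str]]]:
    """Group headwords by CID, then by language, to form pseudo-synsets."""
    grouped = defaultdict(lambda: defaultdict(list))
    for language, headword, cid in entries:
        grouped[cid][language].append(headword)
    # Convert nested defaultdicts to plain dicts for predictable lookups.
    return {cid: dict(by_language) for cid, by_language in grouped.items()}


def is_bilingual(pseudo_synset: Dict[str, List[str]]) -> bool:
    """A pseudo-synset is bilingual when entries of both languages share its CID."""
    return len(pseudo_synset) > 1


entries = [
    ("en", "bank", "3bc999"),
    ("en", "bnk.", "3bc999"),
    ("en", "bk", "3bc999"),
    ("ja", "<ja-headword>", "3bc999"),  # placeholder, not an actual EDR entry
    ("en", "riverbank", "ffffff"),      # invented entry and CID for contrast
]

pseudo_synsets = build_pseudo_synsets(entries)
```

Here `pseudo_synsets["3bc999"]` holds both English and Japanese headwords, so it is a bilingual pseudo-synset, whereas the invented `ffffff` node is monolingual.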

3.3. Wordnet-LMF: a Modeling Framework for WordNet-type Lexicons

Wordnet-LMF (Soria et al., 2009), developed by the EU KYOTO project⁴ (Vossen et al., 2008), is a dialect of LMF (Lexical Markup Framework) (Francopoulo et al., 2008), which is an ISO international standard (ISO 24613, 2008) for modeling a broad range of lexical resources. Wordnet-LMF was designed specifically to facilitate the interchange of lexico-semantic information encoded in wordnets for multiple languages. WN-ja (Bond et al., 2008) is a notable instance of a wordnet that demonstrates the applicability of Wordnet-LMF to encoding a large-scale lexicon in a language other than English. As suggested by the LMF specification, multilingual associations among lexical entries across lexical resources are basically expected to be modeled by using the


