
Evaluating Dependency Representation for Event Extraction

Makoto Miwa1

Sampo Pyysalo1

Tadayoshi Hara1

Jun’ichi Tsujii1,2,3

1Department of Computer Science, the University of Tokyo
2School of Computer Science, University of Manchester
3National Center for Text Mining
{mmiwa,smp,harasan,tsujii}@is.s.u-tokyo.ac.jp


The detailed analyses of sentence structure provided by parsers have been applied to address several information extraction tasks. In a recent bio-molecular event extraction task, state-of-the-art performance was achieved by systems building specifically on dependency representations of parser output. While intrinsic evaluations have shown significant advances in both general and domain-specific parsing, the question of how these translate into practical advantage is seldom considered. In this paper, we analyze how event extraction performance is affected by parser and dependency representation, further considering the relation between intrinsic evaluation and performance at the extraction task. We find that good intrinsic evaluation results do not always imply good extraction performance, and that the types and structures of different dependency representations have specific advantages and disadvantages for the event extraction task.

1 Introduction

Advanced syntactic parsing methods have been shown to be effective for many information extraction tasks. The BioNLP 2009 Shared Task, a recent bio-molecular event extraction task, is one such task: analysis showed that the application of a parser correlated with high rank in the task (Kim et al., 2009). The automatic extraction of bio-molecular events from text is important for a number of advanced domain applications such as pathway construction, and event extraction is thus a key task in Biomedical Natural Language Processing (BioNLP).

Methods building feature representations and extraction rules around dependency representations of sentence syntax have been successfully applied to a number of tasks in BioNLP. Several parsers and representations have been applied in high-performing methods both in domain studies in general and in the BioNLP’09 shared task in particular, but no direct comparison of parsers or representations has been performed. Likewise, a number of evaluations of parser outputs against gold standard corpora have been performed in the domain, but the broader implications of the results of such intrinsic evaluations are rarely considered. The BioNLP’09 shared task involved documents also contained in the GENIA treebank (Tateisi et al., 2005), creating an opportunity for direct study of intrinsic and task-oriented evaluation results. As the treebank can be converted into various dependency formats using existing format conversion methods, evaluation can further be extended to cover the effects of different representations.

In this paper, we consider three types of dependency representation and six parsers, evaluating their performance from two different aspects: dependency-based intrinsic evaluation, and effectiveness for bio-molecular event extraction with a state-of-the-art event extraction system. Comparison of intrinsic and task-oriented evaluation re-


Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pages 779–787, Beijing, August 2010
