recognition of binary relations between proteins, is one of the most basic information extraction tasks in the BioNLP field. Our findings do not conflict with those of Miyao et al. Event extraction can be viewed as an additional extrinsic evaluation task for syntactic parsers, providing a more reliable evaluation and a broader perspective on parser performance. An additional advantage of application-oriented evaluation on BioNLP shared task data is the availability of a manually annotated gold standard treebank, the GENIA treebank, that covers the same set of abstracts as the task data. This allows the gold treebank to be used as an evaluation standard, in addition to the comparison of performance on the primary task.
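Because the GENIA treebank covers the same abstracts as the task data, parser output can also be scored directly against the gold trees. This section does not specify the intrinsic metric, so the sketch below assumes a common choice: precision, recall and F-score over dependency triples, computed after both the gold trees and the parser output have been converted to the same dependency format. The function name `dependency_prf` and the `(head, dependent, label)` triple encoding are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical encoding of one dependency: (head index, dependent index, label).
# Index 0 is assumed to denote an artificial root node.
Dep = tuple[int, int, str]


def dependency_prf(gold: list[Dep], predicted: list[Dep],
                   labeled: bool = True) -> tuple[float, float, float]:
    """Score predicted dependencies against gold ones (assumed metric).

    With labeled=True a predicted dependency counts as correct only if head,
    dependent and label all match; with labeled=False only (head, dependent)
    must match.
    """
    if labeled:
        g = Counter(gold)
        p = Counter(predicted)
    else:
        g = Counter((h, d) for h, d, _ in gold)
        p = Counter((h, d) for h, d, _ in predicted)
    # Multiset intersection keeps the minimum count of each shared dependency.
    correct = sum((g & p).values())
    precision = correct / sum(p.values()) if p else 0.0
    recall = correct / sum(g.values()) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Illustrative toy sentence; labels are made up for the example.
    gold = [(2, 1, "nsubj"), (0, 2, "root"), (2, 3, "dobj")]
    pred = [(2, 1, "nsubj"), (0, 2, "root"), (2, 3, "nmod")]
    print(dependency_prf(gold, pred))                 # labeled
    print(dependency_prf(gold, pred, labeled=False))  # unlabeled
```

In the toy example the parser attaches every word correctly but mislabels one arc, so the unlabeled score is perfect while the labeled precision, recall and F-score are each 2/3. Scoring both variants is one way to separate attachment errors from labeling errors when comparing formats.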
We compared six parsers and three formats on a bio-molecular event extraction task with a state-of-the-art event extraction system from two different aspects: dependency-based intrinsic evaluation and task-based extrinsic evaluation. The specific task considered was the BioNLP shared task, allowing the use of the GENIA treebank as a gold standard parse reference. Five of the six considered parsers were applied using biomedical models trained on the GENIA treebank, and they were found to produce similar performance. The comparison of the parsers from the two aspects showed slightly different results, and the dependency representations were found to have both advantages and disadvantages for the event extraction task.

The contributions of this paper are 1) the comparison of intrinsic and extrinsic evaluation on several commonly used parsers with a state-of-the-art system, and 2) a demonstration of the limitations and possibilities of parser and system improvement on the task. One limitation of this study is that the comparison between the parsers is not perfect: the parsers are used with the provided models, the format conversions miss some information from the original formats, and results with different formats depend on the ability of the event extraction system to take advantage of their strengths. To maximize comparability, the system was designed to extract features identically from similar parts of the dependency-based formats, further adding information provided by other formats, such as the lexical entries of the Enju format, from external resources. The results of this paper are expected to be useful as a guide not only for parser selection for biomedical information extraction but also for the development of event extraction systems.

The comparison in the present evaluation is limited to dependency representations. As future work, it would be informative to extend the comparison to other syntactic representations, such as the PTB format. Finally, the evaluation showed that the system fails to recover approximately 40% of events even when provided with manually annotated treebank data, indicating that other methods and resources need to be adopted to further improve bio-molecular event extraction systems. Such improvement is left as future work.

This work was partially supported by Grant-in-Aid for Specially Promoted Research (MEXT, Japan), Genome Network Project (MEXT, Japan), and Scientific Research (C) (General) (MEXT, Japan).