Evaluating Dependency Representation for Event Extraction - page 3 / 9

Figure 2: Stanford basic dependency tree

Figure 3: CoNLL-X dependency tree

Figure 4: Predicate Argument Structure

parsers are GDep (Sagae and Tsujii, 2007), the Bikel parser (Bikel) (Bikel, 2004), the Stanford parser with two probabilistic context-free grammar (PCFG) models¹ (Wall Street Journal (WSJ) model (Stanford WSJ) and “augmented English” model (Stanford eng)) (Klein and Manning, 2003), the Charniak-Johnson reranking parser, using David McClosky’s self-trained biomedical parsing model (MC) (McClosky, 2009), the C&C CCG parser, adapted to biomedical text (C&C) (Rimell and Clark, 2009), and the Enju parser with the GENIA model (Miyao et al., 2009).

The formats are Stanford Dependencies (SD) (Figure 2), the CoNLL-X dependency format (CoNLL) (Figure 3) and the predicate-argument structure (PAS) format used by Enju (Figure 4). With the exception of Stanford and Enju, the analyses of these parsers were provided by the BioNLP 2009 Shared Task organizers.
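To make the CoNLL-X representation concrete, the following is a minimal sketch of reading one sentence in the standard ten-column, tab-separated CoNLL-X layout (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) and listing its head-dependent edges. The sample sentence, its dependency labels, and the names `Token`, `read_conll_sentence`, and `dependency_edges` are illustrative assumptions and are not taken from the paper or from any parser's actual output.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    """The subset of CoNLL-X columns needed to rebuild the dependency tree."""
    id: int      # column 1: token index, starting at 1
    form: str    # column 2: surface word form
    head: int    # column 7: index of the head token, 0 for the root
    deprel: str  # column 8: dependency type


def read_conll_sentence(block: str) -> List[Token]:
    """Parse one sentence given as tab-separated CoNLL-X lines."""
    tokens = []
    for line in block.strip().splitlines():
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        tokens.append(Token(int(cols[0]), cols[1], int(cols[6]), cols[7]))
    return tokens


def dependency_edges(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """Return (head form, dependency type, dependent form) triples."""
    forms = {t.id: t.form for t in tokens}
    forms[0] = "ROOT"  # artificial root node
    return [(forms[t.head], t.deprel, t.form) for t in tokens]


# Hypothetical example sentence with made-up labels (an assumption).
SAMPLE = "\n".join([
    "1\tIL-2\t_\tNN\tNN\t_\t2\tSBJ\t_\t_",
    "2\tactivates\t_\tVBZ\tVBZ\t_\t0\tROOT\t_\t_",
    "3\tSTAT5\t_\tNN\tNN\t_\t2\tOBJ\t_\t_",
])

for head, deprel, dependent in dependency_edges(read_conll_sentence(SAMPLE)):
    print(f"{head} -{deprel}-> {dependent}")
```

Each token carries exactly one head index, which is what makes the CoNLL output a tree over the tokens of the sentence.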

The six parsers operate in a number of different frameworks, reflected in their analyses. GDep is a native dependency parser that produces CoNLL dependency trees, with dependency types similar to those of CoNLL 2007. Bikel, Stanford, and MC are phrase-structure parsers trained on Penn Treebank format (PTB) style treebanks, and they produce PTB trees. C&C is a deep parser based on Combinatory Categorial Grammar (CCG), and its native output is in a CCG-specific format. The output of C&C can be converted into SD by a rule-based conversion script (Rimell and Clark, 2009). Enju is a deep parser based on Head-driven Phrase Structure Grammar (HPSG) and produces a format containing predicate argument structures along with a phrase structure tree in Enju format, which can be converted into PTB format (Miyao et al., 2009).

¹ Experiments showed no benefit from using the lexicalized models with the Stanford parser.

Figure 5: Format conversion dependencies in six parsers. Formats adopted for the evaluation are shown in solid boxes. SD: Stanford Dependency format, CCG: Combinatory Categorial Grammar output format, PTB: Penn Treebank format, and PAS: Predicate Argument Structure in Enju format.

For direct comparison, and to study how the formats in which the six parsers output their analyses contribute to task performance, we apply a number of conversions between the outputs, shown in Figure 5. The Enju PAS output is converted into PTB using the method introduced by Miyao et al. (2009). SD is generated from PTB by the Stanford tools (de Marneffe et al., 2006), and CoNLL is generated from PTB by using Treebank Converter (Johansson and Nugues, 2007). With the exception of GDep, all CoNLL outputs are generated by the conversion and thus share dependency types. We note that each of these conversions can introduce errors.
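The conversion dependencies described in the text can be sketched as a small directed graph. This is only an illustration under stated assumptions: it encodes the conversions named in the prose (PAS to PTB, PTB to SD, PTB to CoNLL, CCG to SD), treats PAS as Enju's starting format, and does not reproduce Figure 5, which may show more detail. The names `CONVERSIONS`, `NATIVE_FORMAT`, and `conversion_path` are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional

# Conversions named in the text: source format -> formats it converts into.
CONVERSIONS: Dict[str, List[str]] = {
    "PAS": ["PTB"],          # Enju output, method of Miyao et al. (2009)
    "PTB": ["SD", "CoNLL"],  # Stanford tools; Treebank Converter
    "CCG": ["SD"],           # rule-based script of Rimell and Clark (2009)
}

# Native output format of each parser, as described in the text.
NATIVE_FORMAT: Dict[str, str] = {
    "GDep": "CoNLL",
    "Bikel": "PTB",
    "Stanford": "PTB",
    "MC": "PTB",
    "C&C": "CCG",
    "Enju": "PAS",
}


def conversion_path(parser: str, target: str) -> Optional[List[str]]:
    """Breadth-first search for the shortest chain of formats leading from
    a parser's native output to the target format; None if unreachable."""
    start = NATIVE_FORMAT[parser]
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in CONVERSIONS.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


for name in NATIVE_FORMAT:
    print(name, conversion_path(name, "SD"), conversion_path(name, "CoNLL"))
```

The length of a returned path minus one is the number of conversion steps an analysis passes through, and each step is a point where conversion errors can enter.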
