Figure 2: Stanford basic dependency tree
Figure 3: CoNLL-X dependency tree
Figure 4: Predicate Argument Structure
parsers are GDep (Sagae and Tsujii, 2007), the Bikel parser (Bikel) (Bikel, 2004), the Stanford parser with two probabilistic context-free grammar (PCFG) models¹ (the Wall Street Journal (WSJ) model (Stanford WSJ) and the “augmented English” model (Stanford eng)) (Klein and Manning, 2003), the Charniak-Johnson reranking parser, using David McClosky’s self-trained biomedical parsing model (MC) (McClosky, 2009), the C&C CCG parser, adapted to biomedical text (C&C) (Rimell and Clark, 2009), and the Enju parser with the GENIA model (Miyao et al.,
[...] Stanford Dependencies (SD) (Figure 2), the CoNLL-X format (CoNLL) (Figure 3), and the predicate-argument structure (PAS) format used by Enju [...] Enju, the analyses of these parsers were provided by the BioNLP 2009 Shared Task organizers.
The six parsers operate in a number of different frameworks, reflected in their analyses. GDep is a native dependency parser that produces CoNLL dependency trees, with dependency types similar to those of CoNLL 2007. Bikel, Stanford, and MC are phrase-structure parsers trained on treebanks in Penn Treebank (PTB) format, and they produce PTB trees. C&C is a deep parser based on Combinatory Categorial Grammar (CCG), and its native output is in a CCG-specific format. The output of C&C can be converted into SD by a rule-based conversion script (Rimell and Clark, 2009). Enju is a deep parser based on Head-driven Phrase Structure Grammar (HPSG) and produces a format containing predicate argument structures along with a phrase structure tree in Enju format, which can be converted into PTB format (Miyao et al., 2009).
¹ Experiments showed no benefit from using the lexicalized models with the Stanford parser.
Figure 5: Format conversion dependencies in six parsers. Formats adopted for the evaluation are shown in solid boxes. SD: Stanford Dependency format, CCG: Combinatory Categorial Grammar output format, PTB: Penn Treebank format, and PAS: Predicate Argument Structure in Enju format.
To compare the parsers directly and to study how the formats in which they output their analyses contribute to task performance, we apply a number of conversions between the outputs, shown in Figure 5. The Enju PAS output is converted into PTB using the method introduced by Miyao et al. (2009). SD is generated from PTB by the Stanford tools (de Marneffe et al., 2006), and CoNLL is generated from PTB using the Treebank Converter (Johansson and Nugues, 2007). With the exception of GDep, all CoNLL outputs are generated by this conversion and thus share dependency types. We note that each of these conversions can introduce errors.
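The conversion chain described above can be pictured as a small directed graph. The following is a minimal sketch that encodes only the conversions named in the text; Figure 5 itself may show additional edges. The names `NATIVE_FORMAT`, `CONVERSIONS`, and `reachable_formats` are hypothetical and chosen for illustration, not taken from any of the tools cited.

```python
from collections import deque

# Native output format of each parser, as stated in the text.
NATIVE_FORMAT = {
    "GDep": "CoNLL",
    "Bikel": "PTB",
    "Stanford": "PTB",
    "MC": "PTB",
    "C&C": "CCG",
    "Enju": "PAS",
}

# Conversions named in the text: source format -> [(target format, tool)].
# The tool strings are descriptive labels only, not executable commands.
CONVERSIONS = {
    "PAS": [("PTB", "method of Miyao et al. (2009)")],
    "PTB": [
        ("SD", "Stanford tools (de Marneffe et al., 2006)"),
        ("CoNLL", "Treebank Converter (Johansson and Nugues, 2007)"),
    ],
    "CCG": [("SD", "rule-based script (Rimell and Clark, 2009)")],
}


def reachable_formats(parser):
    """Map each format reachable from a parser's native output to the
    list of (source, target, tool) conversion steps needed to obtain it."""
    start = NATIVE_FORMAT[parser]
    paths = {start: []}  # the native format needs no conversion
    queue = deque([start])
    while queue:  # breadth-first search over the conversion graph
        fmt = queue.popleft()
        for target, tool in CONVERSIONS.get(fmt, []):
            if target not in paths:
                paths[target] = paths[fmt] + [(fmt, target, tool)]
                queue.append(target)
    return paths


if __name__ == "__main__":
    for parser in NATIVE_FORMAT:
        for fmt, steps in reachable_formats(parser).items():
            print(f"{parser:8s} -> {fmt:5s} ({len(steps)} conversion step(s))")
```

Counting the steps on each path makes the final caveat concrete: under this sketch, Enju output in SD or CoNLL passes through two conversions (PAS to PTB, then PTB to the target format), so it has two opportunities to pick up conversion errors, whereas GDep's CoNLL output passes through none.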