XML and Perl

  3. XML and XSLT parser objects are initialized.

  4. Both of the input files are parsed.

  5. The stylesheet is validated.

  6. The XML file is transformed using the stylesheet.

  7. The transformed document is sent to STDOUT.

  8. The program quits.
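Steps 3 through 8 can be sketched roughly as follows. This is not the workshop's xsltproc.pl itself, only a minimal illustration that assumes the XML::LibXML and XML::LibXSLT modules are installed. The inline stylesheet and letter are invented so the sketch runs without any input files; a real program would read the two file names from the command line and call parse_file instead of parse_string.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

# hypothetical inline inputs standing in for the stylesheet and data files
my $xsl_source = <<'XSL';
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/letter">
    <xsl:text>Dear </xsl:text>
    <xsl:value-of select="salutation"/>
    <xsl:text>,</xsl:text>
  </xsl:template>
</xsl:stylesheet>
XSL

my $xml_source = '<letter><salutation>Melvil</salutation></letter>';

# 3. initialize the XML and XSLT parser objects
my $parser = XML::LibXML->new;
my $xslt   = XML::LibXSLT->new;

# 4. parse both inputs
my $style_doc = $parser->parse_string($xsl_source);
my $source    = $parser->parse_string($xml_source);

# 5. validate (compile) the stylesheet; this dies if the XSLT is unusable
my $stylesheet = $xslt->parse_stylesheet($style_doc);

# 6. transform the XML document using the stylesheet
my $results = $stylesheet->transform($source);

# 7. send the transformed document to STDOUT
my $output = $stylesheet->output_string($results);
print $output, "\n";

# 8. the program ends when it runs off the end of the script
```

Using output_string, rather than serializing the result document directly, lets the stylesheet's own xsl:output settings decide how the result is written.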

Give xsltproc.pl a try using the XSLT stylesheets and XML data files supplied with the workshop's distribution. Run from the root of the workshop distribution, examples might include:

bin/xsltproc.pl xslt/letter2xhtml.xsl getting-started/letter.xml

bin/xsltproc.pl xslt/tei2html.xsl xml-data/tei/poe-cask-1072537129.xml

bin/xsltproc.pl xslt/mods2xhtml-nosave.xsl xml-data/mods/many/adler-development-1072276659.xml

Batch processing

Sometimes it is necessary to read many XML documents to create the desired result. This is not very easy using pure XSLT, but Perl can come to the rescue. The program title-index.pl reads all of the TEI files in a given directory, extracts the title and author information from each file, and creates a title/author index. The heart of the program is a subroutine called process_files:

sub process_files {

  # get the name of the found file
  my $file = $File::Find::name;

  # make sure it has the correct extension; return (not next) leaves
  # the subroutine cleanly when the file is not an XML file
  return if ($file !~ m/\.xml$/);

  # parse the file and extract the necessary data; so slow!
  print "Processing $file...\n";
  my $doc    = $parser->parse_file($file);
  my $root   = $doc->getDocumentElement;
  my @header = $root->findnodes('teiHeader');

  # extract the desired data
  my $author = $header[0]->findvalue('fileDesc/titleStmt/author');
  my $title  = $header[0]->findvalue('fileDesc/titleStmt/title');

  # save it
  push @all_items, ({author=>$author, title=>$title, file=>$file});

}
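The use of $File::Find::name suggests the subroutine is called by the File::Find module once for every entry below a starting directory. The sketch below shows that calling pattern in isolation and is not taken from title-index.pl. The temporary directory, the sample file names, and the names collect_xml and @found are made up for the illustration, and the XML parsing is left out so the sketch needs only core modules.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# hypothetical sample tree standing in for a directory of TEI files
my $dir = tempdir(CLEANUP => 1);
mkdir "$dir/poe" or die "Can't create directory: $!";
for my $name ('poe/cask.xml', 'poe/notes.txt', 'walden.xml') {
    open my $fh, '>', "$dir/$name" or die "Can't write $name: $!";
    print {$fh} "<TEI.2/>\n";
    close $fh;
}

my @found;

# called by find() once for each file and directory in the tree
sub collect_xml {
    my $file = $File::Find::name;    # full path of the current entry
    return if $file !~ m/\.xml$/;    # skip anything that is not .xml
    push @found, $file;
}

# walk the tree, then list what was collected in a stable order
find(\&collect_xml, $dir);
@found = sort @found;
print "$_\n" for @found;
```

Because find() descends into subdirectories on its own, the callback only has to decide what to do with a single entry at a time.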

After all the files in the given directory are processed and the @all_items array has been filled, execu-
