XML and Perl

cess it accordingly. As a data structure, XML behaves like a FILO (First In, Last Out) stack: the first element opened is the last one closed. As the parser makes its way through the stack, computing tasks take place. SAX has the advantage over DOM in that it is not memory intensive, because it does not hold the whole document in memory. Its primary disadvantage is that a program cannot move around the data structure; each event is seen once, in document order. There really aren't very many events a SAX program needs to trap:

  • start_document - triggered when the document is opened

  • end_document - triggered when the document is closed

  • start_element - triggered as an element is opened

  • end_element - triggered as an element is closed

  • characters - triggered when character data is found after an element is opened but before it is closed

Below is a rudimentary Perl SAX script that reads XML data from any URL. As the data is read, it reports each element that is opened or closed, along with the character data it finds.

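#!/usr/bin/perl

use strict;
use XML::SAX::ParserFactory;

# create the handler and a parser that sends its events to it
my $handler = MyHandler->new();
my $parser  = XML::SAX::ParserFactory->parser(Handler => $handler);

# parse the XML found at the URL given on the command line
$parser->parse_uri($ARGV[0]);
exit;

package MyHandler;

# constructor; the handler keeps no state of its own
sub new {
    my $type = shift;
    return bless {}, $type;
}

# called each time an element is opened
sub start_element {
    my ($self, $element) = @_;
    print "Starting element $element->{Name}\n";
}

# called each time an element is closed
sub end_element {
    my ($self, $element) = @_;
    print "Ending element $element->{Name}\n";
}

# called for character data between tags
sub characters {
    my ($self, $characters) = @_;
    print "characters: $characters->{Data}\n";
}

1;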
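Because a SAX program cannot move around the document, a handler usually has to remember for itself where it is. The following is a minimal sketch of one way to do that, not part of the workbook's distribution: the handler keeps its own stack of open element names, pushing on start_element and popping on end_element. The package name PathHandler, the stack and paths keys, and the sample XML string are all made up for illustration.

```perl
use strict;
use warnings;
use XML::SAX::ParserFactory;

{
    # Hypothetical handler that tracks which elements are currently open
    package PathHandler;

    sub new {
        my $type = shift;
        return bless { stack => [], paths => [] }, $type;
    }

    # push the element name and record the path from the root to it
    sub start_element {
        my ($self, $element) = @_;
        push @{ $self->{stack} }, $element->{Name};
        push @{ $self->{paths} }, '/' . join('/', @{ $self->{stack} });
    }

    # the element being closed is always the one on top of the stack
    sub end_element {
        my ($self, $element) = @_;
        pop @{ $self->{stack} };
    }
}

my $handler = PathHandler->new();
my $parser  = XML::SAX::ParserFactory->parser(Handler => $handler);

# parse_string() takes the XML as a scalar; this sample document is made up
$parser->parse_string('<a><b>text</b><c/></a>');

# print the paths in the order the elements were opened
print "$_\n" for @{ $handler->{paths} };
```

For the sample string this should print /a, /a/b, and /a/c, one per line. The stack never holds more names than the current nesting depth, which is why the approach stays light on memory even for large files.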

A program from the workbook's distribution (fix-ead.pl) uses SAX to read many files from a directory and convert them to a version of EAD/XML that validates against the latest version of the EAD DTD.
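The general pattern of running one SAX handler over every file in a directory might be sketched as follows. This is not fix-ead.pl and performs no EAD conversion; it only counts elements, and the names CountHandler and count_elements, along with the .xml file extension, are assumptions made for the example.

```perl
use strict;
use warnings;
use XML::SAX::ParserFactory;

{
    # Hypothetical handler: counts elements instead of converting them
    package CountHandler;

    sub new {
        my $type = shift;
        return bless { count => 0 }, $type;
    }

    sub start_element {
        my ($self, $element) = @_;
        $self->{count}++;
    }
}

# Parse every .xml file in $directory and return a reference to a hash
# mapping each file name to the number of elements it contains
sub count_elements {
    my ($directory) = @_;
    my %counts;
    opendir(my $dh, $directory) or die "Cannot open $directory: $!";
    for my $name (sort grep { /\.xml$/ } readdir $dh) {
        my $handler = CountHandler->new();
        my $parser  = XML::SAX::ParserFactory->parser(Handler => $handler);
        # eval traps the exception thrown when a file is not well-formed
        if (eval { $parser->parse_uri("$directory/$name"); 1 }) {
            $counts{$name} = $handler->{count};
        }
        else {
            warn "Skipping $name: $@";
        }
    }
    closedir $dh;
    return \%counts;
}

# the directory comes from the command line, defaulting to the current one
my $counts = count_elements(@ARGV ? $ARGV[0] : '.');
print "$_: $counts->{$_} elements\n" for sort keys %$counts;
```

Creating a fresh handler for each file keeps the counts separate, and a file that fails to parse is reported and skipped rather than stopping the whole run.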
