XML and Perl

tion returns to the main program where each item in @all_items is read, re-formatted, and sent to STDOUT. Output looks like this:

Cash boy by Alger Horatio, Jr. (1834-1899) - /xml-data/tei/alger-cash-1072580839.xml

Cast upon the breakers by Alger Horatio, Jr. (1834-1899) - /xml-data/tei/alger-cast-1072723382.xml

My watch: An instructive little tale by Twain, Mark (1835-1910) - /xml-data/tei/twain-my-1072419743.xml

New crime: Legislation needed by Twain, Mark (1835-1910) - /xml-data/tei/twain-new-1072420178.xml

Niagara by Twain, Mark (1835-1910) - /xml-data/tei/twain-niagara-1079905834.xml

Give title-index.pl a try against the set of TEI data supplied with the workshop. Be forewarned: you must supply the full path to the TEI directory, otherwise the script will get confused. For example: bin/title-index.pl /lamp/xml-data/tei/ .

A bit about DOM and SAX

XML is a predictable data structure. After all, that is the whole point. The XML may reside as a file. It may reside in computer memory. It may be manifested as a stream of bits coming over an Internet connection. In any case, the data needs to be read by the computer and something needs to be done with it. This process, the reading of the data by the computer, is called parsing, and the Internet community has articulated two standardized ways to facilitate it. One is called DOM and the other is called SAX.

DOM (Document Object Model) views XML as a tree-like data structure. XML has a single root. (Remember the six simple rules regarding XML files?) From that root grow innumerable branches, limbs, twigs, and leaves -- elements, sub-elements, sub-sub-elements, and ultimately data. DOM provides a standard set of object-oriented methods for accessing sections, subsections, or any other specific part of XML. The method getDocumentElement, in the previous example, is a DOM method that returns the root element and, through it, all of the data in the document. Other methods include but are certainly not limited to:

  • createDocument - initialize an XML document

  • setDocumentElement - set the root element of an XML document

  • createElement - create a new element for an XML document

  • createAttribute - create a new attribute for an element

  • getAttribute - return the value of an attribute
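As a rough illustration of how these methods fit together, the sketch below builds a tiny document and reads an attribute back. It is written against Python's standard-library xml.dom.minidom rather than XML::LibXML, purely so that it runs without extra modules. In minidom the root is reached through the documentElement property instead of getDocumentElement and setDocumentElement. The element and attribute names (work, title, author) are invented for the example.

```python
from xml.dom.minidom import getDOMImplementation

# createDocument: initialize a document (minidom creates the root here too)
doc = getDOMImplementation().createDocument(None, "work", None)
root = doc.documentElement  # minidom's counterpart to getDocumentElement

# createElement: make a new element, then attach it to the tree
title = doc.createElement("title")
title.appendChild(doc.createTextNode("Niagara"))
root.appendChild(title)

# createAttribute: make an attribute node, give it a value, attach it
author = doc.createAttribute("author")
author.value = "Twain, Mark"
root.setAttributeNode(author)

# getAttribute: return the value of an attribute
author_name = root.getAttribute("author")
xml_text = root.toxml()
print(author_name)
print(xml_text)
```

Note that creating an element or attribute does not place it in the tree. It only becomes part of the document once it is appended to a parent.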

Since the DOM has been articulated, many programming languages and programming libraries have implemented it. XML::LibXML is one example. There are advantages and disadvantages to DOM. On the downside, DOM implementations require the entire XML file or stream to be read into memory before processing can begin. For large documents this can demand a great deal of memory. On the other hand, DOM processing allows the programmer to jump from section to section of XML with relative ease.
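The trade-off can be seen in a small sketch, again using Python's standard-library xml.dom.minidom as a stand-in for a DOM implementation such as XML::LibXML. The sample markup is a made-up, TEI-flavoured fragment, not one of the workshop files. The whole document is parsed into memory first, and only then can the program hop to whichever part it wants, in any order.

```python
from xml.dom.minidom import parseString

# A made-up, TEI-flavoured snippet; real TEI files are far larger.
SAMPLE = """<TEI><teiHeader><title>Cash boy</title>
<author>Alger Horatio, Jr.</author></teiHeader>
<text><body><p>First paragraph.</p><p>Second paragraph.</p></body></text></TEI>"""

# The entire document is parsed into an in-memory tree before anything else.
doc = parseString(SAMPLE)

# With the tree in hand, jump straight to any section, in any order.
paragraphs = doc.getElementsByTagName("p")
last_paragraph = paragraphs[-1].firstChild.data
title = doc.getElementsByTagName("title")[0].firstChild.data
print(last_paragraph)
print(title)
```

Here the last paragraph is read before the title even though the title comes first in the file. A parser that only streams through the data cannot revisit earlier parts this way without keeping its own copy.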

SAX (Simple API for XML) is an event-based parser. SAX reads XML data element by element, watching them open and close. As they do, "events" are called and computer programs read the data and pro-
