XML and Perl

tion returns to the main program where each item in @all items is read, re-formatted, and sent to STDOUT. Output looks like this:


Cash boy by Alger Horatio, Jr. (1834-1899) - /xml-data/tei/alger-cash-1072580839.xml

Cast upon the breakers by Alger Horatio, Jr. (1834-1899) - /xml-data/tei/alger-cast-1072723382.xml

My watch: An instructive little xml-data/tei/twain-my-1072419743.xml

New crime: Legislation needed xml-data/tei/twain-new-1072420178.xml

Niagara by Twain, Mark (1835-1910) - /xml-data/tei/twain-niagara-1079905834.xml

Give title-index.pl a try against the set of TEI data supplied with the workshop. Be forewarned: you must supply the full path to the TEI directory, otherwise the script will get confused. For example: bin/title-index.pl /lamp/xml-data/tei/

A bit about DOM and SAX

XML is a predictable data structure. After all, that is the whole point. The XML may reside as a file. It may reside in computer memory. It may be manifested as a stream of bits coming over an Internet connection. In any case, the data needs to be read by the computer and something needs to be done with it. This process, the reading of the data by the computer, is called parsing, and the Internet community has articulated two standardized ways to facilitate it. One is called DOM and the other is called SAX.

DOM (Document Object Model) views XML as a tree-like data structure. XML has a single root. (Remember the six simple rules regarding XML files?) From that root grow innumerable branches, limbs, twigs, and leaves -- elements, sub-elements, sub-sub-elements, and ultimately data. DOM provides a standard set of object-oriented methods for accessing sections, subsections, or any other specific part of XML. The method getDocumentElement, in the previous example, is a DOM method that returns the root element and, through it, all of the data the document contains. Other methods include but are certainly not limited to:

  • createDocument - initialize an XML document

  • setDocumentElement - create the root of an XML document

  • createElement - add an element to an XML document

  • createAttribute - add an attribute to an element

  • getAttribute - return the value of an attribute
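The methods listed above can be sketched together in one short script. This is only an illustration, assuming XML::LibXML is installed; the element and attribute names (catalog, title, id) and the subroutine name build_catalog are invented for the example and are not part of the workshop scripts.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Build a tiny document with the DOM methods listed above.
# All element/attribute names here are hypothetical.
sub build_catalog {

    # createDocument - initialize an XML document
    my $doc = XML::LibXML::Document->createDocument('1.0', 'UTF-8');

    # createElement and setDocumentElement - create the root
    my $root = $doc->createElement('catalog');
    $doc->setDocumentElement($root);

    # createElement - add an element (with some text) under the root
    my $title = $doc->createElement('title');
    $title->appendTextNode('Niagara');
    $root->appendChild($title);

    # createAttribute - add an attribute to the element
    my $attr = $doc->createAttribute('id', 'twain-niagara');
    $title->setAttributeNode($attr);

    return $doc;
}

my $doc = build_catalog();

# getDocumentElement - start at the root, then find the child element
my ($title) = $doc->getDocumentElement->getChildrenByTagName('title');

# getAttribute - read the attribute value back
print $title->getAttribute('id'), "\n";

# serialize the whole tree, indented for readability
print $doc->toString(1);
```

The helper calls appendTextNode, appendChild, and setAttributeNode are not in the list above; they are the XML::LibXML calls that attach the newly created pieces to the tree.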

Since the DOM was articulated, many programming languages and programming libraries have implemented it. XML::LibXML is one example. There are advantages and disadvantages to DOM. On the down side, DOM implementations require the entire XML file or stream to be read into memory before processing can begin, so large documents can demand large amounts of memory. On the other hand, DOM processing allows the programmer to jump from section to section of the XML with relative ease.
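The trade-off can be seen in a small sketch, again assuming XML::LibXML is installed. The record below is made up for illustration (it is not real TEI), and the subroutine name summarize is hypothetical. Note that findnodes and findvalue are XPath conveniences that XML::LibXML layers on top of the DOM tree rather than core DOM methods.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# A made-up record used only for this example
my $xml = <<'XML';
<record>
  <header><author>Twain, Mark</author><title>Niagara</title></header>
  <body><p>First paragraph.</p><p>Second paragraph.</p></body>
</record>
XML

# Visit parts of the document in whatever order is convenient
sub summarize {
    my ($string) = @_;

    # the entire document is parsed into memory at this point
    my $doc  = XML::LibXML->load_xml(string => $string);
    my $root = $doc->getDocumentElement;

    # jump to the body first...
    my @paragraphs = $root->findnodes('body/p');

    # ...then back up to the header
    my $title  = $root->findvalue('header/title');
    my $author = $root->findvalue('header/author');

    return "$title by $author (" . scalar(@paragraphs) . " paragraphs)";
}

print summarize($xml), "\n";
```

Because the whole tree is already in memory, reading the body before the header costs nothing extra; the price is that memory use grows with the size of the document.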

SAX (Simple API for XML) is an event-based parser. SAX reads XML data element by element, watching them open and close. As they do, "events" are called and computer programs read the data and pro-

