Indexing and searching XML with swish-e
          content="Biology, psychology, and medicine, by Moritmer J. Adler and V. J. McGill. Pref. by Franz Alexander."/>
    <meta name="author" content="Adler, Mortimer Jerome, 1902-"/>
    <meta name="year" content="1963"/>
    <meta name="title" content="Biology, psychology, and medicine"/>
    <meta name="publisher" content="Chicago, Encyclopedia Britannica"/>
    <meta name="pagination" content="xx, 395 p. illus. 22 cm."/>
    <meta name="note" content="Bibliography: p. 385-395."/>
    <meta name="subject" content="Biology Outlines, syllabi, etc. Psychology Outlines, syllabi, etc. Mind and body."/>
Each meta element consists of a name attribute and a content attribute. By configuring swish-e to read these attribute pairs, you can make each of them field-searchable. Combined with free-text indexing of the document's body element, this makes swish-e indexes very useful: you can search the full text of a record or restrict a query to a single field such as author or subject.
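To make the idea of field searching concrete, here is a small Python sketch. It is an illustration only, not how swish-e works internally: it pulls the name/content pairs out of meta elements with the standard library's ElementTree and runs a naive one-word lookup against a single field. The sample record is trimmed from the example above, and the function names meta_fields, words, and field_search are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# A trimmed record modeled on the marc2xhtml output above; only some of
# its meta elements are reproduced, and the body text is a placeholder.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Biology, psychology, and medicine</title>
<meta name="author" content="Adler, Mortimer Jerome, 1902-"/>
<meta name="year" content="1963"/>
<meta name="title" content="Biology, psychology, and medicine"/>
<meta name="subject" content="Biology Outlines, syllabi, etc. Psychology Outlines, syllabi, etc. Mind and body."/>
</head>
<body><p>Biology, psychology, and medicine</p></body>
</html>"""


def meta_fields(xhtml: str) -> dict[str, str]:
    """Map each meta element's name attribute to its content attribute."""
    fields = {}
    for elem in ET.fromstring(xhtml).iter():
        # Drop any "{namespace}" prefix so XHTML's namespaced tags match too.
        if elem.tag.rsplit("}", 1)[-1] == "meta" and "name" in elem.attrib:
            fields[elem.attrib["name"]] = elem.attrib.get("content", "")
    return fields


def words(text: str) -> set[str]:
    """Lower-cased word tokens; punctuation is discarded."""
    return set(re.findall(r"\w+", text.lower()))


def field_search(fields: dict[str, str], field: str, word: str) -> bool:
    """True if word occurs in the named field (a toy stand-in for a field query)."""
    return word.lower() in words(fields.get(field, ""))


if __name__ == "__main__":
    fields = meta_fields(SAMPLE)
    print(fields["year"])                                 # 1963
    print(field_search(fields, "subject", "psychology"))  # True
    print(field_search(fields, "author", "psychology"))   # False
```

The point of the last two lines is the difference a field restriction makes: "psychology" appears in the record, but only the subject field matches it.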
Swish-e supports many command-line arguments for indexing, but it is usually much easier to write a configuration file instead. Below is a configuration file (swish-indexes/xhtml.cfg) for indexing the content of the marc2xhtml directory:
    IndexDir      xml-data/xhtml/marc2xhtml
    IndexFile     swish-indexes/xhtml.idx
    IndexOnly     .html
    MetaNames     id brief author year title publisher pagination note subject
    PropertyNames id brief author year title publisher pagination note subject
Each line of the file controls one aspect of the indexing process:
IndexDir: where the data is (the marc2xhtml directory)
IndexFile: the location and name of the resulting index (xhtml.idx)
IndexOnly: which files should be indexed (only .html files)
MetaNames: which metadata fields should be searchable (all of them)
PropertyNames: which metadata fields should be available for display in search results (all of them)
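The five directives map neatly onto a few lines of Python. The sketch below is a hypothetical helper, not part of the workshop distribution: swish_config regenerates the configuration text from a single field list, which keeps MetaNames and PropertyNames from drifting apart, and files_to_index previews roughly which files an IndexDir/IndexOnly pair would select by walking the directory and matching the file suffix. Swish-e's own file-selection rules may differ in details, so treat the preview as an approximation.

```python
import os

# Field names copied from the configuration file above.
FIELDS = ["id", "brief", "author", "year", "title",
          "publisher", "pagination", "note", "subject"]


def swish_config(index_dir: str, index_file: str, suffix: str,
                 fields: list[str]) -> str:
    """Return swish-e configuration text. The same field list feeds
    MetaNames (searchable) and PropertyNames (displayable)."""
    names = " ".join(fields)
    return "\n".join([
        f"IndexDir {index_dir}",
        f"IndexFile {index_file}",
        f"IndexOnly {suffix}",
        f"MetaNames {names}",
        f"PropertyNames {names}",
    ]) + "\n"


def files_to_index(index_dir: str, suffix: str) -> list[str]:
    """Approximate the files an IndexDir/IndexOnly pair would select:
    every file under index_dir whose name ends with suffix."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(index_dir):
        for name in filenames:
            if name.endswith(suffix):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


if __name__ == "__main__":
    print(swish_config("xml-data/xhtml/marc2xhtml", "swish-indexes/xhtml.idx",
                       ".html", FIELDS), end="")
    count = len(files_to_index("xml-data/xhtml/marc2xhtml", ".html"))
    print(count, "files would be indexed")
```

Run from the root of the distribution, the first print reproduces the directives of xhtml.cfg (apart from column alignment), and the count gives a quick sanity check before indexing.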
Here's how to create your first index:
Edit the path statements in xhtml.cfg to suit your operating system. For example, on Windows you might have to change xml-data/xhtml/marc2xhtml to xml-data\xhtml\marc2xhtml.
Open a terminal session on your computer, change to the root level of the workshop's distribution directory, and run swish-e with this command, where -c tells swish-e to read its settings from the named configuration file:
    swish-e -c swish-indexes/xhtml.cfg
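If you would rather drive the indexer from a script, for example to rebuild the index repeatedly while experimenting, the steps above can be wrapped in a short Python sketch. It assumes swish-e is installed and on your PATH and that the script runs from the distribution's root directory; the function names are hypothetical. native_path mirrors the advice about separators, but only for the path passed on the command line: the paths inside xhtml.cfg still have to be edited by hand, and whether swish-e on Windows actually requires backslashes is not something this sketch settles.

```python
import os
import shutil
import subprocess
from pathlib import PurePosixPath, PureWindowsPath


def native_path(path: str, windows: bool = os.name == "nt") -> str:
    """Render a forward-slash path with the separator the target OS uses."""
    flavour = PureWindowsPath if windows else PurePosixPath
    return str(flavour(path))


def index_command(config: str = "swish-indexes/xhtml.cfg") -> list[str]:
    """Build the argument list for `swish-e -c <config>`."""
    return ["swish-e", "-c", native_path(config)]


def build_index(config: str = "swish-indexes/xhtml.cfg") -> int:
    """Run swish-e and return its exit status (0 normally means success)."""
    if shutil.which("swish-e") is None:
        raise FileNotFoundError("swish-e is not on the PATH")
    return subprocess.run(index_command(config), check=False).returncode


if __name__ == "__main__":
    print("exit status:", build_index())
```

Passing the command as a list rather than a single string avoids shell quoting issues on both Unix and Windows.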