
Indexing and searching XML with swish-e

    <meta name="brief" content="Biology, psychology, and medicine, by Moritmer J. Adler and V. J. McGill. Pref. by Franz Alexander."/>
    <meta name="author" content="Adler, Mortimer Jerome, 1902-"/>
    <meta name="year" content="1963"/>
    <meta name="title" content="Biology, psychology, and medicine"/>
    <meta name="publisher" content="Chicago, Encyclopedia Britannica"/>
    <meta name="pagination" content="xx, 395 p. illus. 22 cm."/>
    <meta name="note" content="Bibliography: p. 385-395."/>
    <meta name="subject" content="Biology Outlines, syllabi, etc. Psychology Outlines, syllabi, etc. Mind and body."/>


Each meta element consists of a name attribute and a content attribute. By configuring the swish-e indexing process to read these attribute pairs, you can make each of them field-searchable. When this functionality is combined with free-text indexing of the document's body element, swish-e indexes become very useful.
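To picture what "field-searchable" means, the following Python sketch pulls the name/content pairs out of a record and keeps the body text separately. It is only an illustration of the idea, not how swish-e works internally. It assumes the records are well-formed XHTML, possibly in the XHTML namespace; SAMPLE, local_name and extract_fields are hypothetical names introduced here.

```python
import xml.etree.ElementTree as ET

# A trimmed-down record in the style of the marc2xhtml files (assumed structure).
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Biology, psychology, and medicine</title>
<meta name="author" content="Adler, Mortimer Jerome, 1902-"/>
<meta name="year" content="1963"/>
<meta name="title" content="Biology, psychology, and medicine"/>
</head>
<body><p>Bibliography: p. 385-395.</p></body>
</html>"""


def local_name(tag):
    """Strip a namespace prefix such as {http://www.w3.org/1999/xhtml} from a tag."""
    return tag.rsplit("}", 1)[-1]


def extract_fields(xhtml):
    """Return (fields, body_text): meta name/content pairs and the free text of <body>."""
    root = ET.fromstring(xhtml)
    fields = {}
    body_text = ""
    for element in root.iter():
        name = local_name(element.tag)
        if name == "meta" and "name" in element.attrib:
            # Each name/content pair becomes one searchable field.
            fields[element.get("name")] = element.get("content", "")
        elif name == "body":
            # Everything inside <body> is treated as free text, whitespace collapsed.
            body_text = " ".join("".join(element.itertext()).split())
    return fields, body_text


if __name__ == "__main__":
    fields, body = extract_fields(SAMPLE)
    print(fields["year"], "|", body)
```

A query such as year=1963 would then match against the fields dictionary, while a plain keyword query would match against the body text.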

Swish-e supports many command-line arguments for indexing, but it is usually much easier to write a configuration file instead. Below is a configuration file (swish-indexes/xhtml.cfg) for indexing the content of the marc2xhtml directory:

    IndexDir xml-data/xhtml/marc2xhtml
    IndexFile swish-indexes/xhtml.idx
    IndexOnly .html
    MetaNames id brief author year title publisher pagination note subject
    PropertyNames id brief author year title publisher pagination note subject

Each line denotes a characteristic of the indexing process:

  • where is the data (the marc2xhtml directory)

  • what is the location and name of the resulting index (xhtml.idx)

  • what files should be indexed (only .html files)

  • what meta data fields should be indexed (all of them)

  • what meta data fields should be available for display (all of them)
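The configuration can also be generated rather than typed by hand. The Python sketch below assembles the same five directives from a single list of field names, so MetaNames and PropertyNames stay in step whenever a field is added. build_config and FIELDS are hypothetical names, not part of swish-e or the workshop files. The sketch only prints the text; saving it as swish-indexes/xhtml.cfg is left to you.

```python
# Field names taken from the configuration file above.
FIELDS = ["id", "brief", "author", "year", "title",
          "publisher", "pagination", "note", "subject"]


def build_config(index_dir, index_file, fields, suffix=".html"):
    """Return swish-e configuration text with one directive per line."""
    lines = [
        f"IndexDir {index_dir}",              # where the data is
        f"IndexFile {index_file}",            # location and name of the index
        f"IndexOnly {suffix}",                # which files to index
        "MetaNames " + " ".join(fields),      # fields made searchable
        "PropertyNames " + " ".join(fields),  # fields available for display
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_config("xml-data/xhtml/marc2xhtml",
                       "swish-indexes/xhtml.idx", FIELDS), end="")
```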

Here's how to create your first index:

  1. Edit the path statements in xhtml.cfg to suit your operating system. For example, on Windows you might have to change xml-data/xhtml/marc2xhtml to xml-data\xhtml\marc2xhtml.

  2. Open a terminal session on your computer, change to the root level of the workshop's distribution directory, and run swish-e with this command:

    swish-e -c swish-indexes/xhtml.cfg
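Step 2 can also be scripted. The Python sketch below uses swish_command and build_index, hypothetical helpers that are not part of the workshop distribution. It builds the swish-e -c argument list with pathlib, which renders the path with the operating system's own separator. It runs swish-e through subprocess only if the executable is found on the PATH. Like the manual command, it assumes it is started from the root of the distribution directory. It does not rewrite the paths inside xhtml.cfg, so step 1 still applies.

```python
import shutil
import subprocess
from pathlib import Path


def swish_command(config=Path("swish-indexes") / "xhtml.cfg"):
    """Build the argument list for 'swish-e -c <config>'.

    str() on a Path uses the native separator, so the same call works
    on Windows and on Unix-like systems.
    """
    return ["swish-e", "-c", str(config)]


def build_index():
    """Run swish-e from the current directory, if it is installed."""
    if shutil.which("swish-e") is None:
        raise FileNotFoundError("swish-e is not on the PATH")
    # check=True raises CalledProcessError if swish-e exits with an error.
    subprocess.run(swish_command(), check=True)


if __name__ == "__main__":
    # Show the command that build_index() would run.
    print(" ".join(swish_command()))
```

Passing a list to subprocess.run, rather than one command string, avoids shell quoting problems if a path ever contains spaces.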

