Indexing and searching XML with swish-e
swish-e -f swish-indexes/mods.idx -w origami -p swishtitle
As with the XHTML searches, swish-e returns a pointer (a file name) to each document matching your query. XML files are not always very human readable, but using XSL you can transform the documents into something else. For example, you could manually pass a file from the search results to xsltproc with a stylesheet, transform the document, and view it with something like this: xsltproc xslt/mods2xhtml-nosave.xsl xml-data/mods/many/piper-folk-1074964323.xml > results.html; lynx results.html
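The manual step described above can be scripted. The following is only a minimal sketch: it assumes swish-e's default result listing, in which header lines begin with "#", each hit is a line of the form rank, path, quoted title, and size, and the listing ends with a lone ".". The function names swish_paths and render_hits are hypothetical, and paths containing spaces are not handled.

```shell
#!/bin/sh
# Sketch: pull the file name (second field) out of each result line.
# Assumption: hits start with a numeric rank; "#" headers, the closing "."
# and any "err:" line are skipped because they do not match the pattern.
swish_paths() {
    awk '/^[0-9]+ / { print $2 }'
}

# Hypothetical helper: search an index, then transform every hit.
#   $1 = index file, $2 = query word, $3 = XSL stylesheet
render_hits() {
    swish-e -f "$1" -w "$2" | swish_paths | while read -r doc; do
        # Write one HTML file per hit into the current directory.
        xsltproc "$3" "$doc" > "$(basename "$doc" .xml).html"
    done
}
```

With both functions loaded, a call such as render_hits swish-indexes/mods.idx origami xslt/mods2xhtml-nosave.xsl would leave one HTML file per matching record, ready to open in lynx.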
Swish-e can search multiple indexes with a single query when you list more than one index file after the -f option. This is especially useful if the indexes have similar MetaNames and PropertyNames values. Index the TEI data with the command swish-e -c swish-indexes/tei.cfg and then search the index. Your queries can be much richer since the TEI files contain much more data:
swish-e -f swish-indexes/tei.idx -w love
swish-e -f swish-indexes/tei.idx -w love and war
swish-e -f swish-indexes/tei.idx -w love and war and art
swish-e -f swish-indexes/tei.idx -w love and war and art and science
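To see how each added "and" narrows the result set, the hit counts of the queries above can be compared side by side. This is a sketch under one assumption: that the swish-e header contains a line of the form "# Number of hits: N". The function names hit_count and compare_queries are hypothetical.

```shell
#!/bin/sh
# Sketch: extract the number after "# Number of hits: " from swish-e output.
# Prints nothing when that header line is absent (for example, no results).
hit_count() {
    sed -n 's/^# Number of hits: //p'
}

# Hypothetical helper: run the four queries from the text against one index.
#   $1 = index file
compare_queries() {
    for q in "love" "love and war" "love and war and art" \
             "love and war and art and science"; do
        # $q is deliberately unquoted so swish-e receives each word
        # as a separate argument, exactly as in the commands above.
        printf '%s\t%s\n' "$q" "$(swish-e -f "$1" -w $q | hit_count)"
    done
}
```

Calling compare_queries swish-indexes/tei.idx prints one line per query with its hit count, which should never grow as terms are added, since "and" only restricts the match.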
Listing several indexes after -f broadens the search, and -p adds the named property to each result:
swish-e -f swish-indexes/tei.idx swish-indexes/xhtml.idx swish-indexes/mods.idx -w love and -p title
swish-e -f swish-indexes/tei.idx swish-indexes/xhtml.idx swish-indexes/mods.idx -w art and science -p title
As an extra exercise, index the set of "broken" EAD files using the ead.cfg file, and then search the resulting index while displaying the "scopecontent" in the output.
Indexing techniques and the use of relational databases are two sides of the same information retrieval coin. Relational databases are great tools, especially for editing and maintaining data. While search is part of their equation, it is encumbered by complicated syntax and the lack of easy full-text queries and relevance ranking. Indexing techniques, such as the ones implemented by swish-e, make search easy at the expense of the ability to update the underlying data. By learning to combine the strengths of relational databases and indexes, information providers can build more powerful information retrieval systems.