Indexing and searching XML with swish-e

  • swish-e -f swish-indexes/mods.idx -w origami -p swishtitle

Like the output of the XHTML searching process, swish-e returns a pointer (file name) to each document matching your query. Unfortunately, XML files are not always very human readable, but you can use XSL to transform them into something else. For example, you could manually pass a file from the search results to xsltproc with a stylesheet, transform the document, and view the output with something like this: xsltproc xslt/mods2xhtml-nosave.xsl xml-data/mods/many/piper-folk-1074964323.xml > results.html; lynx results.html
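The manual step above can be scripted. The following is a minimal sketch, not a definitive implementation. It assumes swish-e's default result layout (rank, path, quoted title, size, with header lines starting with "#" and a closing "." line). The helper names result_paths and view_first_hit are hypothetical and not part of swish-e.

```shell
# Read swish-e search output on stdin and print only the matching file paths.
# Assumes the default result layout: rank, path, quoted title, size.
# Header lines (starting with "#") and the closing "." line are skipped.
# Paths containing spaces are not handled by this simple field split.
result_paths() {
  awk '/^[0-9]+ / { print $2 }'
}

# Transform the top hit for a query into XHTML and open it in lynx.
# Hypothetical helper; uses the index and stylesheet named in the text.
view_first_hit() {
  first=$(swish-e -f swish-indexes/mods.idx -w "$1" | result_paths | head -n 1)
  if [ -z "$first" ]; then
    echo "no results for: $1" >&2
    return 1
  fi
  xsltproc xslt/mods2xhtml-nosave.xsl "$first" > results.html && lynx results.html
}

# Example (needs swish-e, xsltproc, lynx, and the mods.idx index):
#   view_first_hit origami
```

Only the first hit is transformed here to keep the sketch short; looping over every path printed by result_paths would be a natural extension.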

Swish-e can search multiple index files with a single query when you list several index locations after the -f option. This is especially useful when the indexes share similar MetaNames and PropertyNames values. Index the TEI data with the following command: swish-e -c swish-indexes/tei.cfg. Then search the index. Your queries can be much richer since the TEI files contain much more data:

  • swish-e -f swish-indexes/tei.idx -w love

  • swish-e -f swish-indexes/tei.idx -w love and war

  • swish-e -f swish-indexes/tei.idx -w love and war and art

  • swish-e -f swish-indexes/tei.idx -w love and war and art and science
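Each query above narrows the previous one by adding a term with the and operator. A minimal sketch of that pattern follows, assuming you want to build the query string from a list of words; and_query is a hypothetical helper, not a swish-e feature.

```shell
# Join all arguments with " and " to form one swish-e query string.
# Hypothetical helper; the body runs in a subshell so no variables leak.
and_query() (
  query=""
  for term in "$@"; do
    if [ -z "$query" ]; then
      query=$term
    else
      query="$query and $term"
    fi
  done
  printf '%s\n' "$query"
)

# Example (needs swish-e and the tei.idx index built above):
#   swish-e -f swish-indexes/tei.idx -w "$(and_query love war art science)"
```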

Additional use of the -f and -p options produces broader results:

  • swish-e -f swish-indexes/tei.idx swish-indexes/xhtml.idx swish-indexes/mods.idx -w love and -p title

  • swish-e -f swish-indexes/tei.idx swish-indexes/xhtml.idx swish-indexes/mods.idx -w art and science -p title
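Typing every index path after -f becomes tedious as the number of indexes grows. The sketch below gathers them from a directory instead. It assumes all index files end in .idx, as in the commands above; list_indexes is a hypothetical helper, not a swish-e feature.

```shell
# Print every *.idx file found directly under the given directory, one per line.
# Companion files such as mods.idx.prop do not match the *.idx pattern.
# Hypothetical helper; the body runs in a subshell so no variables leak.
list_indexes() (
  for idx in "$1"/*.idx; do
    if [ -f "$idx" ]; then
      printf '%s\n' "$idx"
    fi
  done
)

# Example (needs swish-e and the indexes built above). The substitution is
# left unquoted on purpose so each path becomes its own argument; this
# breaks if a path contains spaces.
#   swish-e -f $(list_indexes swish-indexes) -w art and science -p title
```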

As an extra exercise, index the set of "broken" EAD files using the ead.cfg file, and then search the resulting index while displaying the "scopecontent" in the output.

Indexing techniques and the use of relational databases are two sides of the same information retrieval coin. Relational databases are great tools, especially for editing and maintaining data. While search is part of their equation, it is encumbered by complicated syntax and the lack of easy full-text queries and relevance ranking. Indexing techniques, such as the ones implemented by swish-e, make search easy at the expense of the ability to update the underlying data. By learning to combine the strengths of relational database applications with those of indexes, information providers can build more powerful information retrieval systems.
