Indexing and searching XML with swish-e
If you have Lynx, a text-based browser, you can run it and point it at one of the files in the results like this:
lynx xml-data/xhtml/marc2xhtml/honda-how-1071929753.html
Alternatively, you could open any of the files in your graphical browser.
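Rather than copying a path by hand, you can pull the result paths out of the swish-e output. The following is a minimal sketch. It assumes the default output format, in which each result line starts with a numeric rank followed by the file path, and it assumes the paths contain no spaces. swish_paths is a hypothetical helper name, not part of swish-e:

```shell
# swish_paths: hypothetical helper that reads swish-e output on stdin
# and prints the path (second field) of every result line.
# Header lines begin with "#" and are skipped, as is any other line
# that does not start with a numeric rank.
swish_paths() {
  awk '/^[0-9]+ / { print $2 }'
}

# Usage (commented out; requires swish-e, an index, and lynx):
# lynx "$(swish-e -f swish-indexes/xhtml.idx -w origami | swish_paths | head -n 1)"
```

Because the helper only filters text, it works the same way on output from any index.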
Swish-e supports a bevy of search operations, including all the expected Boolean operations, phrase searching, field searching, right-hand truncation (implemented using a "*"), and nested queries. Consequently, all of the following swish-e queries are valid:
swish-e -f swish-indexes/xhtml.idx -w origami
swish-e -f swish-indexes/xhtml.idx -w origami and colorful
swish-e -f swish-indexes/xhtml.idx -w '"colorful origami"'
swish-e -f swish-indexes/xhtml.idx -w title=origami
swish-e -f swish-indexes/xhtml.idx -w author=honda
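These queries pass through the Unix shell before swish-e sees them, so quoting matters. The shell removes unprotected double quotes, expands an unquoted "*" against file names, and treats parentheses as syntax. A phrase search therefore needs its double quotes protected from the shell, for example -w '"colorful origami"'. The sketch below uses a hypothetical show_arg function to display exactly what the shell hands over as a single argument. It relies on plain printf rather than swish-e. The truncation and nested queries are illustrative only:

```shell
# show_arg: hypothetical helper that prints its first argument in brackets,
# revealing what remains after the shell's quote removal.
show_arg() { printf '[%s]\n' "$1"; }

show_arg "colorful origami"                 # [colorful origami]: quotes stripped, separate words
show_arg '"colorful origami"'               # ["colorful origami"]: quotes kept, a phrase
show_arg 'origam*'                          # [origam*]: protected from filename globbing
show_arg '(origami or paper) and colorful'  # [(origami or paper) and colorful]
```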
Using the -p command-line option, you can alter the output to include the properties denoted in your configuration file. Thus, the command swish-e -f swish-indexes/xhtml.idx -w author=honda -p pagination will display the pagination as well as the default information (again, lines have been hard-wrapped for readability):
$ swish-e -f swish-indexes/xhtml.idx -w author=honda -p pagination
# SWISH format: 2.4.2
# Search words: author=honda
# Removed stopwords:
# Number of hits: 2
# Search time: 0.001 seconds
# Run time: 0.045 seconds
1000 xml-data/xhtml/marc2xhtml/honda-world-1071930499.html "World of origami" 1369
  "xii, 13-264 p. illus. (some col.) 31 cm."
1000 xml-data/xhtml/marc2xhtml/honda-how-1071929753.html "How to make origami" 149
  "37 p. col. illus. (part fold. mounted) 26 cm."
Indexing other XML formats
Indexing and searching other XML formats is very similar to indexing XHTML. First, you create your content. Second, you write a swish-e configuration file. Third, you index the content. Fourth, you search. Last, you retrieve the results, optionally transforming your XML for display.
Let's index a set of MODS files. Open any of the MODS files in the xml-data/mods/many directory and take note of the element names. Then open swish-indexes/mods.cfg and take note of the MetaNames and PropertyNames directives. Notice how swishtitle has been added to the MetaNames directive. swishtitle is a metadata value you get for free with swish-e, but to make it searchable in your index you sometimes need to include it explicitly in the configuration file. Index the MODS data with the command swish-e -c swish-indexes/mods.cfg, and then give the following searches a whirl:
swish-e -f swish-indexes/mods.idx -w title=cannery
swish-e -f swish-indexes/mods.idx -w love -p title
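The actual contents of swish-indexes/mods.cfg are not reproduced here, so what follows is a hypothetical illustration only. Any configuration that supports both searches needs title in MetaNames, which makes title=cannery work, and in PropertyNames, which makes -p title work. The sketch writes its configuration to a temporary file so that the real mods.cfg is left untouched. IndexDir, IndexFile, IndexContents, MetaNames, and PropertyNames are standard swish-e directives, but the index file name is made up:

```shell
# Write a hypothetical MODS configuration to a temporary file.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
# Directory holding the MODS records to index
IndexDir xml-data/mods/many
# Hypothetical index name, so the real mods.idx is not overwritten
IndexFile swish-indexes/mods-example.idx
# Parse .xml files as XML (libxml2 parser if swish-e was built with it)
IndexContents XML* .xml
# Searchable fields: swishtitle plus the MODS title element
MetaNames swishtitle title
# Fields that can be displayed with -p
PropertyNames title
EOF

# Index and search with it (commented out; requires swish-e):
# swish-e -c "$cfg"
# swish-e -f swish-indexes/mods-example.idx -w title=cannery -p title
```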