
Indexing and searching XML with swish-e

    $ swish-e -c swish-indexes/xhtml.cfg
    Indexing Data Source: "File-System"
    Indexing "xml-data/xhtml/marc2xhtml"
    Removing very common words...
    no words removed.
    Writing main index...
    Sorting words ...
    Sorting 10,687 words alphabetically
    Writing header ...
    Writing index entries ...
      Writing word text: Complete
      Writing word hash: Complete
      Writing word data: Complete
    10,687 unique words indexed.
    13 properties sorted.
    567 files indexed.  996,001 total bytes.  112,581 total words.
    Elapsed time: 00:00:04 CPU time: 00:00:04
    Indexing done!

You can now search your newly created index. For example, to search for origami, run: swish-e -f swish-indexes/xhtml.idx -w origami. This command takes only two options:

-f denotes what index to search

-w denotes the query

The search results will look a lot like this:

    $ swish-e -f swish-indexes/xhtml.idx -w origami
    # SWISH format: 2.4.2
    # Search words: origami
    # Removed stopwords:
    # Number of hits: 7
    # Search time: 0.002 seconds
    # Run time: 0.046 seconds
    1000 xml-data/xhtml/marc2xhtml/kawai-colorful-1071929621.html "Colorful origami" 1
    962 xml-data/xhtml/marc2xhtml/gross-origami-1071930127.html "Origami" 1939
    962 xml-data/xhtml/marc2xhtml/montroll-origami-1071930205.html "Origami sea life"
    916 xml-data/xhtml/marc2xhtml/lang-complete-1071929822.html "Complete book of orig
    916 xml-data/xhtml/marc2xhtml/honda-world-1071930499.html "World of origami" 1369
    916 xml-data/xhtml/marc2xhtml/engel-folding-1072054962.html "Folding the universe"
    866 xml-data/xhtml/marc2xhtml/honda-how-1071929753.html "How to make origami" 1496

Each result line at the end of the output is divided into four parts:

  1. the relevance score

  2. the pointer to the file matching the query

  3. the title of the file

  4. the size of the file
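Because every result line follows this four-part layout, it is easy to post-process in a script. The following is only a minimal sketch, not part of swish-e: the function name parse_hits is hypothetical, and it assumes that paths contain no spaces and that titles contain no double quotes. It skips any line that does not start with a digit (such as the # header lines) and prints the four parts separated by tabs.

```shell
#!/bin/sh
# parse_hits (hypothetical helper): read swish-e search output on stdin
# and print "rank<TAB>path<TAB>title<TAB>size" for each result line.
# Assumptions: no spaces in the path, no double quotes inside the title.
parse_hits() {
    while IFS= read -r line; do
        # Result lines begin with the numeric relevance score;
        # everything else (for example the "#" header lines) is skipped.
        case $line in
            [0-9]*) ;;
            *) continue ;;
        esac
        rank=${line%% *}       # text before the first space
        rest=${line#* }        # the line without the rank
        path=${rest%% *}       # text before the next space
        title=${rest#*\"}      # drop everything up to the opening quote
        title=${title%\"*}     # drop the closing quote and what follows
        size=${line##* }       # text after the last space
        printf '%s\t%s\t%s\t%s\n' "$rank" "$path" "$title" "$size"
    done
}

# Demonstration on two result lines copied from the output above.
parse_hits <<'EOF'
# Number of hits: 7
962 xml-data/xhtml/marc2xhtml/gross-origami-1071930127.html "Origami" 1939
916 xml-data/xhtml/marc2xhtml/honda-world-1071930499.html "World of origami" 1369
EOF
```

With a real index, the same function could sit at the end of a pipeline, for example: swish-e -f swish-indexes/xhtml.idx -w origami | parse_hits | cut -f 3 to list only the titles.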
