" is used to specify an index while when one
is used. If the -x option is used and no file list is specified, either
on the command line or with the -F option, sgrep obtains the list of
queried files straight from the index.
---------------------------------------------------------------------------
EXAMPLE
---------------------------------------------------------------------------
Here the input file xml.html is taken from Robin Cover's excellent
WWW-page at http://www.sil.org/sgml/xml.html
Example: Find all P elements containing the word "newsfax":
% time sgrep -x xml.index 'stag("P") .. etag("P") containing word("newsfax")'
[August 19, 1998] Tools and
Utilities from Robert Hanson: XML::Parser, LOTE NewsFax to XML Parsers,
LOTE XML to Kingdom Summaries, XML Script Server Parser.
0.03user 0.03system 0:00.25elapsed 24%CPU (0avgtext+0avgdata 0maxresident)k
The same example without using an index:
% time sgrep -i 'stag("P") .. etag("P") containing word("newsfax")' xml.html
[August 19, 1998] Tools and
Utilities from Robert Hanson: XML::Parser, LOTE NewsFax to XML Parsers,
LOTE XML to Kingdom Summaries, XML Script Server Parser.
0.82user 0.05system 0:01.18elapsed 73%CPU (0avgtext+0avgdata 0maxresident)k
---------------------------------------------------------------------------
ANOTHER EXAMPLE
---------------------------------------------------------------------------
This example uses Jon Bosak's XML-example material: religious texts
and Shakespeare's works. Since this query is slightly more complex,
it has been put together from smaller parts by using m4. File "filelist"
contains the list of all XML-example files from Bosak's collection.
Here is the file "query":
# Finds elements having given name
define(ELEMENT, (stag($1) .. etag($1)))
# Finds LINE elements
define(E_LINE, (ELEMENT("LINE")))
# Finds SPEECH elements
define(E_SPEECH, (ELEMENT("SPEECH")))
# Finds SPEECH elements where HAMLET is speaking
define(HAMLET_SPEAKING, (E_SPEECH containing (
ELEMENT("SPEAKER") containing word("HAMLET"))))
# Finds LINE elements containing words to, be, not and question
define(TOBENOTQUESTION, (E_LINE containing word("to") containing word("be")
containing word("not") containing word("question")))
# Finds the LINE where HAMLET says the famous words
define(HAMLET_SAYS, (TOBENOTQUESTION in HAMLET_SPEAKING))
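To see what the macros above add up to, the following sketch mirrors each
m4 definition with a small Python function and prints the sgrep expression
that HAMLET_SAYS stands for. It is an illustration only: the function names
are made up for this sketch, whitespace is normalized, and the exact text
m4 produces may differ in spacing and line breaks.

```python
# Illustrative only: each function mirrors one m4 define() from "query".
# The function names are hypothetical; they are not part of sgrep or m4.

def element(name: str) -> str:
    # define(ELEMENT, (stag($1) .. etag($1)))
    return f'(stag("{name}") .. etag("{name}"))'

def e_line() -> str:
    # define(E_LINE, (ELEMENT("LINE")))
    return f'({element("LINE")})'

def e_speech() -> str:
    # define(E_SPEECH, (ELEMENT("SPEECH")))
    return f'({element("SPEECH")})'

def hamlet_speaking() -> str:
    # SPEECH elements containing a SPEAKER element with the word HAMLET
    speaker = f'{element("SPEAKER")} containing word("HAMLET")'
    return f'({e_speech()} containing ({speaker}))'

def to_be_not_question() -> str:
    # LINE elements containing all four words
    words = ["to", "be", "not", "question"]
    clauses = " ".join(f'containing word("{w}")' for w in words)
    return f'({e_line()} {clauses})'

def hamlet_says() -> str:
    # define(HAMLET_SAYS, (TOBENOTQUESTION in HAMLET_SPEAKING))
    return f'({to_be_not_question()} in {hamlet_speaking()})'

if __name__ == "__main__":
    print(hamlet_says())
```

Reading the printed expression from the inside out shows why the query is
split into macros: the fully expanded form is one long line of nested
parentheses that is hard to write and check by hand.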
Evaluate the query using plain search:
% time sgrep -o "%f:\n %r\n" -f query -e HAMLET_SAYS -F filelist
/xml/shakespeare.1.10.xml/hamlet.xml:
To be, or not to be: that is the question:
16.60user 0.78system 0:18.29elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (327major+455minor)pagefaults 0swaps
Create an index of the input texts:
% time sgrep -I -c index -v -F filelist
Indexing 43/43 files 14957/14958K (99%)
Writing index file of 5472K
Writing index 35840/36691 entries (97%)
23.65user 4.56system 0:32.77elapsed 86%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (94major+5928minor)pagefaults 0swaps
Evaluate the query using the index:
% time sgrep -x index -o "%f:\n %r\n" -f query -e HAMLET_SAYS -F filelist
/xml/shakespeare.1.10.xml/hamlet.xml:
To be, or not to be: that is the question:
1.24user 0.13system 0:01.43elapsed 95%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (536major+728minor)pagefaults 0swaps
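The three timings above can be combined into a rough cost estimate. The
sketch below takes the elapsed times from the `time` output and computes
the speedup of the indexed query and the number of queries after which
building the index pays off. It assumes that other queries over the same
files cost about as much as this one, which a single measurement does not
establish, so treat the result as a ballpark figure.

```python
import math

# Elapsed wall-clock seconds copied from the `time` output above.
PLAIN_QUERY = 18.29    # 0:18.29elapsed, plain search
BUILD_INDEX = 32.77    # 0:32.77elapsed, sgrep -I
INDEXED_QUERY = 1.43   # 0:01.43elapsed, sgrep -x

def speedup(plain: float, indexed: float) -> float:
    """How many times faster the indexed query ran."""
    return plain / indexed

def break_even_queries(build: float, plain: float, indexed: float) -> int:
    """Smallest n with build + n * indexed <= n * plain."""
    saved_per_query = plain - indexed
    if saved_per_query <= 0:
        raise ValueError("the index does not save time per query")
    return math.ceil(build / saved_per_query)

if __name__ == "__main__":
    print(round(speedup(PLAIN_QUERY, INDEXED_QUERY), 1))                # 12.8
    print(break_even_queries(BUILD_INDEX, PLAIN_QUERY, INDEXED_QUERY))  # 2
```

Under that assumption the index has paid for itself by the second query.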
---------------------------------------------------------------------------
THAT'S IT! Enjoy!
---------------------------------------------------------------------------
Please send comments about sgrep-2.0 to
Jani Jaakkola (jjaakkol@cs.helsinki.fi).