Abstract
This paper introduces a new approach to add fault-tolerance to a fulltext retrieval system. The weighted pattern morphing technique circumvents some of the disadvantages of the widely used edit distance measure and can serve as a front end to almost any fast non fault-tolerant search engine. The technique enables approximate searches by carefully generating a set of modified patterns (morphs) from the original user pattern and by searching for promising members of this set by a non fault-tolerant search backend. Morphing is done by recursively applying so called submorphs, driven by a penalty weight matrix. The algorithm can handle phonetic similarities that often occur in multilingual scientific encyclopedias as well as normal typing errors such as omission or swapping of letters. We demonstrate the process of filtering out less promising morphs. We also show how results from approximate search experiments carried out on a huge encyclopedic text corpus were used to determine reasonable parameter settings.
A commercial pharmaceutic CD-ROM encyclopedia, a dermatological online encyclopedia and an online e-Learning system use an implementation of the presented approach and thus prove its “road capability”.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Bruchhausen, F.v., et al.: Hagers Handbuch der Pharmazeutischen Praxis. 10(+2) Bände. u. Folgebände. Springer, Heidelberg (1992-2000)
Blaschek, W., Ebel, S., Hackenthal, E., Holzgrabe, U., Keller, K., Reichling, J. (eds.): HagerROM 2003 - Hagers Handbuch der Drogen und Arzneistoffe. CD-ROM. Springer, Heidelberg (2003), http://www.hagerrom.de
Kukich, K.: Technique for automatically correcting words in text. ACM Computing Surveys 24(4), 377–439 (1992)
Navarro, G., Baeza-Yates, R., Sutinen, E., Tarhio, J.: Indexing Methods for Approximate String Matching. IEEE Bulletin of the Technical Committee on Data Engineering 24(4), 19–27 (2001)
Navarro, G., Baeza-Yates, R.: A Practical q-Gram Index for Text Retrieval Allowing Errors. CLEI Electronic Journal 1(2), 1 (1998)
Sutinen, E.: Filtration with q-Samples in Approximate String Matching. In: Hirschberg, D.S., Meyers, G. (eds.) CPM 1996. LNCS, vol. 1075, pp. 50–63. Springer, Heidelberg (1996)
Ukkonen, E.: Approximate string-matching with q-grams and maximal matches. Theoretical Computer Science 92, 191–211 (1992)
Russell, R.: INDEX (Soundex Patent). U.S. Patent No. 1,261 167, 1–4 (1918)
Zobel, J., Dart, P.: Phonetic String Matching: Lessons from Information Retrieval. In: SIGIR 1996, pp. 166–172. ACM Press, New York (1996)
Hodge, V., Austin, J.: An Evaluation of Phonetic Spell Checkers. Dept. of CS, University of York, U.K. (2001)
Jokinen, P., Ukkonen, E.: Two algorithms for approximate string matching in static texts. In: Tarlecki, A. (ed.) MFCS 1991. LNCS, vol. 520, pp. 240–248. Springer, Heidelberg (1991)
Myers, E.: A sublinear algorithm for approximate keyword searching. Algorithmica 12(4/5), 345–374 (1994)
Levenshtein, V.: Binary codes capable of correcting deletions, insertions, and reversals. Problems in Information Transmission 1, 8–17 (1965)
Stephen, G.: String Searching Algorithms. Lecture Notes Series on Computing, vol. 3. World Scientific Publishing, Singapore (1994)
Rosenfelder, M.: Hou tu pranownse Inglish (2003), http://www.zompist.com/spell.html
Grimm, M.: Random Access und Caching für q-Gramm-Suchverfahren. Lehrstuhl für Informatik II, Universität Würzburg (2001)
Projekt DEJAVU: Homepage (2003), http://www.projekt-dejavu.de
Altmeyer, P., Bacharach-Buhles, M.: Springer Enzyklopädie Dermatologie, Allergologie, Umweltmedizin. Springer, Heidelberg (2002), http://www.galderma.de/anmeldung_enz.html
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Esser, W.M. (2004). Fault-Tolerant Fulltext Information Retrieval in Digital Multilingual Encyclopedias with Weighted Pattern Morphing. In: McDonald, S., Tait, J. (eds) Advances in Information Retrieval. ECIR 2004. Lecture Notes in Computer Science, vol 2997. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24752-4_25
Download citation
DOI: https://doi.org/10.1007/978-3-540-24752-4_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21382-6
Online ISBN: 978-3-540-24752-4
eBook Packages: Springer Book Archive