A keyword retrieval system for historical Mongolian document images

Published: 01 March 2014

Abstract

In this paper, we propose a keyword retrieval system for locating words in historical Mongolian document images. Using word spotting, a collection of historical Mongolian document images is converted into a collection of word images by word segmentation, and a number of profile-based features are extracted to represent each word image. For each word image, a fixed-length feature vector is formed by keeping an appropriate number of the complex discrete Fourier transform (DFT) coefficients of each profile feature. The system supports online image-to-image matching: it computes the similarity between a query word image and each word image in the collection and returns the results ranked in descending order of similarity. The query word image can be synthesized from a sequence of glyphs at retrieval time. Experimental evaluations confirm the performance of the system.
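
The pipeline described in the abstract (profile features, truncated DFT, similarity ranking) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not name the profiles, the number of coefficients, or the similarity measure, so the three per-row profiles (projection, left, right), the value `k=8`, the cosine similarity, and all function names below are assumptions made for illustration. Profiles are taken per row because the traditional Mongolian script is written vertically.

```python
import numpy as np

def profile_features(img):
    """Return three per-row profiles of a binary word image (1 = ink).

    The choice of profiles is an assumption; the abstract only says
    "profile-based features".
    """
    img = np.asarray(img, dtype=bool)
    h, w = img.shape
    has_ink = img.any(axis=1)
    # Number of ink pixels in each row.
    projection = img.sum(axis=1).astype(float)
    # Distance from the left edge to the first ink pixel (w for empty rows).
    left = np.where(has_ink, img.argmax(axis=1), w).astype(float)
    # Distance from the right edge to the last ink pixel (w for empty rows).
    right = np.where(has_ink, img[:, ::-1].argmax(axis=1), w).astype(float)
    return [projection, left, right]

def dft_vector(img, k=8):
    """Fixed-length vector: first k complex DFT coefficients per profile.

    The length is 3 profiles * k coefficients * 2 (real, imaginary),
    regardless of the height of the word image.
    """
    parts = []
    for p in profile_features(img):
        n = max(len(p), k)                    # zero-pad very short profiles
        coeffs = np.fft.fft(p, n=n)[:k] / n   # scale by the transform length
        parts.extend([coeffs.real, coeffs.imag])
    return np.concatenate(parts)

def rank(query_img, collection, k=8):
    """Rank collection images by cosine similarity to the query (assumed measure)."""
    q = dft_vector(query_img, k)
    scores = []
    for idx, img in enumerate(collection):
        v = dft_vector(img, k)
        denom = np.linalg.norm(q) * np.linalg.norm(v)
        sim = float(q @ v / denom) if denom > 0 else 0.0
        scores.append((idx, sim))
    # Descending order of similarity, as in the abstract.
    return sorted(scores, key=lambda t: t[1], reverse=True)
```

Because every vector has the same length, word images of different heights can be compared directly, without first aligning or resizing them. In the system described above, the query image passed to `rank` would be one synthesized from glyphs rather than cropped from a page.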

    Information & Contributors

    Information

    Published In

    International Journal on Document Analysis and Recognition, Volume 17, Issue 1
    March 2014
    99 pages
    ISSN: 1433-2833
    EISSN: 1433-2825

    Publisher

    Springer-Verlag

    Berlin, Heidelberg

    Author Tags

    1. Discrete Fourier transform
    2. Kanjur
    3. Profile features
    4. Query image synthesis
    5. Word spotting

    Qualifiers

    • Article

    Cited By

    • (2024) Segmentation-Free Todo Mongolian OCR and its Public Dataset. Pattern Recognition and Computer Vision, pp. 72-85. DOI: 10.1007/978-981-97-8511-7_6. Online publication date: 18-Oct-2024
    • (2024) LABT: A Sequence-to-Sequence Model for Mongolian Handwritten Text Recognition with Local Aggregation BiLSTM and Transformer. Document Analysis and Recognition - ICDAR 2024, pp. 352-363. DOI: 10.1007/978-3-031-70536-6_21. Online publication date: 30-Aug-2024
    • (2021) Data Augmentation Based on CycleGAN for Improving Woodblock-Printing Mongolian Words Recognition. Document Analysis and Recognition – ICDAR 2021, pp. 526-537. DOI: 10.1007/978-3-030-86337-1_35. Online publication date: 5-Sep-2021
    • (2021) An Efficient Local Word Augment Approach for Mongolian Handwritten Script Recognition. Document Analysis and Recognition – ICDAR 2021, pp. 429-443. DOI: 10.1007/978-3-030-86337-1_29. Online publication date: 5-Sep-2021
    • (2019) End-to-End Model for Offline Handwritten Mongolian Word Recognition. Natural Language Processing and Chinese Computing, pp. 220-230. DOI: 10.1007/978-3-030-32236-6_19. Online publication date: 9-Oct-2019
    • (2018) Convolutional Neural Network for Machine-Printed Traditional Mongolian Font Recognition. Neural Information Processing, pp. 265-274. DOI: 10.1007/978-3-030-04221-9_24. Online publication date: 13-Dec-2018
    • (2016) A knowledge-based recognition system for historical Mongolian documents. International Journal on Document Analysis and Recognition 19:3, pp. 221-235. DOI: 10.1007/s10032-016-0267-1. Online publication date: 1-Sep-2016
    • (2015) Mongolian Inflection Suffix Processing in NLP: A Case Study. Natural Language Processing and Chinese Computing, pp. 347-352. DOI: 10.1007/978-3-319-25207-0_29. Online publication date: 9-Oct-2015
    • (2014) H-DocPro. Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage, pp. 131-136. DOI: 10.1145/2595188.2595203. Online publication date: 19-May-2014
