A keyword retrieval system for historical Mongolian document images

Authors:

Guanglai Gao

International Journal on Document Analysis and Recognition, Volume 17, Issue 1

Pages 33 - 45

https://doi.org/10.1007/s10032-013-0203-6

Published: 01 March 2014

Abstract

In this paper, we propose a keyword retrieval system for locating words in historical Mongolian document images. Following the word spotting approach, a collection of historical Mongolian document images is converted into a collection of word images by word segmentation, and several profile-based features are extracted to represent each word image. For each word image, a fixed-length feature vector is formed by keeping an appropriate number of complex coefficients of the discrete Fourier transform of each profile feature. The system supports online image-to-image matching: it computes the similarity between a query word image and each word image in the collection and returns the results ranked in descending order of similarity. The query word image can be generated at retrieval time by synthesizing a sequence of glyphs. Experimental evaluations confirm the performance of the system.
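The pipeline described in the abstract (profile features, a fixed number of DFT coefficients per profile, similarity ranking) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration, not the authors' method: the three per-row profiles, the choice of `k=4` coefficients, the division by the profile length, the cosine similarity, and all function names. The abstract does not state which profiles, how many coefficients, or which similarity measure the system uses.

```python
import cmath
import math


def profiles(img):
    """Three per-row profiles of a binary word image (1 = ink).

    Assumed profiles: ink count per row, gap before the first ink pixel,
    and gap after the last ink pixel. Rows are used as the sequence axis.
    """
    width = len(img[0])
    projection, left, right = [], [], []
    for row in img:
        ink = [x for x, v in enumerate(row) if v]
        projection.append(len(ink))
        left.append(ink[0] if ink else width)
        right.append(width - 1 - ink[-1] if ink else width)
    return [projection, left, right]


def dft_coefficients(signal, k):
    """First k complex DFT coefficients of a real sequence (naive DFT).

    Dividing by the length n is an assumed normalisation so that word
    images of different heights give comparably scaled coefficients.
    """
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * f * t / n) for t, x in enumerate(signal)) / n
        for f in range(k)
    ]


def feature_vector(img, k=4):
    """Fixed-length vector: real and imaginary parts of k coefficients per profile."""
    vec = []
    for profile in profiles(img):
        for c in dft_coefficients(profile, k):
            vec.extend((c.real, c.imag))
    return vec


def cosine(a, b):
    """Cosine similarity (an assumed measure; the abstract names none)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_img, collection, k=4):
    """Return (index, similarity) pairs in descending order of similarity."""
    q = feature_vector(query_img, k)
    scored = [(i, cosine(q, feature_vector(img, k))) for i, img in enumerate(collection)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Two toy word images of different heights (hypothetical data).
word_a = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 1],
    [0, 1, 0, 0],
]
word_b = [
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
]
collection = [word_b, word_a]
ranking = rank(word_a, collection)
print(ranking)
```

Because every word image is reduced to the same number of coefficients, images of different heights can be compared directly, which is what makes a simple vector similarity usable for the online matching step.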

Index terms: Applied computing; Document management and text processing; Document capture


Published In

International Journal on Document Analysis and Recognition, Volume 17, Issue 1 (March 2014, 99 pages)

ISSN: 1433-2833

EISSN: 1433-2825

Copyright © 2014 Springer-Verlag Berlin Heidelberg.

Publisher: Springer-Verlag, Berlin, Heidelberg
