DOI: 10.1145/2037342.2037345
Research article

TSV-LR: topological signature vector-based lexicon reduction for fast recognition of pre-modern Arabic subwords

Published: 16 September 2011

Abstract

Automatic recognition of Arabic words is a challenging task, and its complexity increases as the lexicon grows. In pre-modern documents the vocabulary is unconstrained, so a lexicon-reduction strategy is needed to reduce the computational complexity of recognition. This paper proposes a novel lexicon-reduction method for Arabic subwords based on the topology and geometry of their shapes. First, the topological and geometrical information of a subword shape is extracted from its skeleton and encoded into a graph. Then the graph is converted into a topological signature vector (TSV) that preserves the graph structure. The lexicon is reduced based on the TSV distance between the shapes of the lexicon subwords and a query shape, by keeping the i subwords nearest to the query. The value of i is selected according to a predetermined lexicon-reduction accuracy. The proposed framework has been tested on a database of pre-modern Arabic subword shapes with promising results.
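
The reduction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each subword shape has already been converted to a fixed-length vector, uses plain Euclidean distance as a stand-in for the TSV distance, and picks i as the smallest value that keeps the true subword for a target fraction of labelled validation queries. The function names `reduce_lexicon` and `select_i` and the toy vectors are hypothetical.

```python
import math


def reduce_lexicon(query, lexicon, i):
    """Return the labels of the i lexicon entries nearest to `query`.

    `lexicon` maps a subword label to its signature vector. Euclidean
    distance is an assumption standing in for the paper's TSV distance.
    """
    ranked = sorted(lexicon, key=lambda label: math.dist(query, lexicon[label]))
    return ranked[:i]


def select_i(validation, lexicon, target_accuracy):
    """Smallest i for which the true label survives reduction for at
    least `target_accuracy` of the validation queries.

    `validation` is a list of (query_vector, true_label) pairs.
    """
    ranks = []
    for query, true_label in validation:
        ranked = reduce_lexicon(query, lexicon, len(lexicon))
        ranks.append(ranked.index(true_label) + 1)  # 1-based rank of the truth
    ranks.sort()
    needed = max(1, math.ceil(target_accuracy * len(ranks)))
    return ranks[needed - 1]


# Toy data (hypothetical): four lexicon entries with 2-D signature vectors.
lexicon = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (5.0, 5.0), "d": (10.0, 0.0)}
validation = [
    ((0.1, 0.0), "a"),
    ((0.9, 0.1), "b"),
    ((0.6, 0.0), "a"),  # nearer to "b" than to its true label "a"
    ((4.0, 4.0), "c"),
]

i = select_i(validation, lexicon, target_accuracy=1.0)
shortlist = reduce_lexicon((0.6, 0.0), lexicon, i)
```

In this toy setup a target accuracy of 1.0 forces i = 2, because one validation query ranks its true label second. Lowering the target to 0.75 allows i = 1, which shows the trade-off between lexicon size and reduction accuracy that the abstract refers to.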




      Published In

      HIP '11: Proceedings of the 2011 Workshop on Historical Document Imaging and Processing
      September 2011
      195 pages
      ISBN: 9781450309165
      DOI: 10.1145/2037342
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Sponsors

      • IAPR: International Association for Pattern Recognition

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Conference

      HIP '11: Historical Document Imaging and Processing
      September 16 - 17, 2011
      Beijing, China

      Acceptance Rates

      Overall Acceptance Rate 52 of 90 submissions, 58%

      Article Metrics

      • Downloads (last 12 months): 2
      • Downloads (last 6 weeks): 1
      Reflects downloads up to 11 Dec 2024


      Cited By

      • (2024) Subword recognition in historical Arabic manuscripts using handcrafted features and deep learning approaches. International Journal on Document Analysis and Recognition (IJDAR). DOI: 10.1007/s10032-024-00501-x. Online publication date: 23-Sep-2024
      • (2016) Lexicon reduction of handwritten Arabic subwords based on the prominent shape regions. International Journal on Document Analysis and Recognition, 19:2 (139-153). DOI: 10.1007/s10032-016-0262-6. Online publication date: 1-Jun-2016
      • (2014) A Coarse-to-Fine Word Spotting Approach for Historical Handwritten Documents Based on Graph Embedding and Graph Edit Distance. Proceedings of the 2014 22nd International Conference on Pattern Recognition (3074-3079). DOI: 10.1109/ICPR.2014.530. Online publication date: 24-Aug-2014
      • (2014) Arabic word descriptor for handwritten word indexing and lexicon reduction. Pattern Recognition, 47:10 (3477-3486). DOI: 10.1016/j.patcog.2014.04.025. Online publication date: Oct-2014
      • (2013) Lexicon Reduction Using Segment Descriptors for Arabic Handwriting Recognition. Proceedings of the 2013 12th International Conference on Document Analysis and Recognition (1265-1269). DOI: 10.1109/ICDAR.2013.256. Online publication date: 25-Aug-2013
      • (2012) A new framework based on signature patches, micro registration, and sparse representation for optical text recognition. 2012 11th International Conference on Information Science, Signal Processing and their Applications (ISSPA) (1259-1265). DOI: 10.1109/ISSPA.2012.6310485. Online publication date: Jul-2012
      • (2012) W-TSV. Pattern Recognition, 45:9 (3277-3287). DOI: 10.1016/j.patcog.2012.02.030. Online publication date: 1-Sep-2012
