Abstract
This paper presents an integrated approach to spotting spoken keywords in digitized Tamil documents by combining word image matching and spoken word recognition techniques. The work involves segmenting document images into words, creating an index of keywords, and constructing a word image hidden Markov model (HMM) and a speech HMM for each keyword. The word image HMMs are constructed using seven-dimensional profile and statistical moment features and are used to recognize each segmented word image so that it can be added to the index if it matches a keyword. The spoken query word is recognized by selecting the speech HMM with the maximum likelihood, using 39-dimensional mel-frequency cepstral coefficients derived from speech samples of the keywords. During word spotting, the positional details of the search keyword, obtained from the automatically updated index, are used to retrieve the relevant portion of text from the document. Recall, precision, and F-measure are calculated for 40 test words from four groups of literary documents to illustrate the ability of the proposed scheme and its relevance to emerging multilingual information retrieval.
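The retrieval step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the names `recognize`, `spot`, and `precision_recall_f`, the `(page, line, word)` position format, and the use of plain callables in place of trained HMMs are all hypothetical. Each callable stands in for a speech HMM and returns a log-likelihood for a sequence of feature vectors.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical types, not taken from the paper:
# a position is (page, line, word); a scorer stands in for one keyword's
# speech HMM and returns a log-likelihood for a feature sequence.
Position = Tuple[int, int, int]
Features = Sequence[Sequence[float]]
Scorer = Callable[[Features], float]


def recognize(features: Features, models: Dict[str, Scorer]) -> str:
    """Return the keyword whose model scores the query highest."""
    return max(models, key=lambda keyword: models[keyword](features))


def spot(features: Features,
         models: Dict[str, Scorer],
         index: Dict[str, List[Position]]) -> List[Position]:
    """Recognize the spoken query, then look up its positions in the index."""
    keyword = recognize(features, models)
    return index.get(keyword, [])


def precision_recall_f(retrieved: int, relevant: int, correct: int):
    """Standard precision, recall, and F-measure from raw counts."""
    precision = correct / retrieved if retrieved else 0.0
    recall = correct / relevant if relevant else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```

In a real system each scorer would be replaced by the likelihood computation of a trained HMM over 39-dimensional feature vectors; the dictionary lookup is the part that corresponds to reading positional details from the keyword index.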
Acknowledgments
The authors thank the authorities of Annamalai University for providing the necessary facilities to carry out this research work.
Cite this article
Sigappi, A., Palanivel, S. Spoken query based word spotting in digitized Tamil documents. AI & Soc 29, 113–121 (2014). https://doi.org/10.1007/s00146-013-0452-4
DOI: https://doi.org/10.1007/s00146-013-0452-4