Abstract
Text detection and recognition in degraded video is complex and challenging due to lighting effects and blur introduced by the sensor and by motion. This paper presents a new method that derives multi-spectral images from each input video frame by studying non-linear intensity values in the Gray, R, G and B color spaces to increase the contrast of text pixels, which yields four multi-spectral images, one per color space. We then propose multiple fusion criteria for the four multi-spectral images to enhance text information in degraded video frames. A median operation combines the results of the multiple fusion criteria into a single image, which we name fusion-1. We further apply k-means clustering to the fused images obtained by the multiple fusion criteria to classify text clusters, which results in binary images. The same median operation then fuses these binary images into a single image, which we name fusion-2. We evaluate the enhanced images at fusion-1 and fusion-2 using quality measures such as Mean Square Error, Peak Signal to Noise Ratio and Structural Similarity. Furthermore, the enhanced images are validated through text detection and recognition accuracy on video frames to show the effectiveness of the enhancement.
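The abstract does not spell out the fusion criteria or the non-linear intensity mapping, so the following is only a minimal sketch of the two generic building blocks it names: a pixel-wise median over several equally sized images, and the standard Mean Square Error and Peak Signal to Noise Ratio definitions. The function names `median_fuse`, `mse` and `psnr`, the list-of-rows image representation, and the 8-bit peak value of 255 are assumptions made for illustration and are not taken from the paper.

```python
import math
from statistics import median


def median_fuse(images):
    """Pixel-wise median of equally sized 2-D images (lists of rows).

    Assumption: each image is a list of rows of numeric intensities.
    With an even number of images, the median is the mean of the two
    middle values, so the result may contain non-integer intensities.
    """
    rows, cols = len(images[0]), len(images[0][0])
    for img in images:
        if len(img) != rows or any(len(row) != cols for row in img):
            raise ValueError("all images must share the same shape")
    return [
        [median(img[r][c] for img in images) for c in range(cols)]
        for r in range(rows)
    ]


def mse(a, b):
    """Mean Square Error between two equally sized 2-D images."""
    n = len(a) * len(a[0])
    total = sum(
        (pa - pb) ** 2
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )
    return total / n


def psnr(a, b, peak=255.0):
    """Peak Signal to Noise Ratio in dB; `peak` assumes 8-bit images."""
    err = mse(a, b)
    return math.inf if err == 0 else 10.0 * math.log10(peak ** 2 / err)
```

For example, calling `median_fuse` on four hypothetical 2x2 images `[[10, 200], [30, 40]]`, `[[12, 210], [30, 44]]`, `[[50, 190], [30, 40]]` and `[[11, 205], [90, 42]]` gives `[[11.5, 202.5], [30.0, 41.0]]`: the outlier values 50 and 90 do not dominate the result, which is the usual motivation for preferring a median over a mean when combining several candidate images.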
Acknowledgment
The work described in this paper was supported by the Natural Science Foundation of China under Grants No. 61272218 and No. 61321491, and by the Program for New Century Excellent Talents under NCET-11-0232.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Weng, Y., Shivakumara, P., Lu, T., Meng, L.K., Woon, H.H. (2015). A New Multi-spectral Fusion Method for Degraded Video Text Frame Enhancement. In: Ho, YS., Sang, J., Ro, Y., Kim, J., Wu, F. (eds) Advances in Multimedia Information Processing -- PCM 2015. PCM 2015. Lecture Notes in Computer Science, vol 9314. Springer, Cham. https://doi.org/10.1007/978-3-319-24075-6_48
DOI: https://doi.org/10.1007/978-3-319-24075-6_48
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24074-9
Online ISBN: 978-3-319-24075-6
eBook Packages: Computer Science, Computer Science (R0)