Abstract
Accurate line segmentation is one of the most important and difficult tasks in text document analysis, particularly when the document consists of unconstrained handwritten text. To accomplish this objective, this work proposes a painting scheme. Motivated by the fact that handwritten Persian texts pose some of the most critical challenges for text-line segmentation, the new method was devised through an extensive study of cursive Persian scripts; in general, however, the proposed line segmentation algorithm is applicable to handwritten text in any language or script. The text block is vertically decomposed into parallel pipe structures called strips. Each row of each strip is painted with a gray intensity equal to the average gray value of all pixels in that row of the strip. The painted strips are then converted into a two-tone painting, which is smoothed. The white and black spaces in each strip of the smoothed image are analyzed to obtain a short separating line, called a Piece-wise Potential Separating Line (PPSL), between every two consecutive black spaces. The PPSLs are concatenated to produce the text-line segmentation. Additional procedures handle certain anomalies that may occur. The scheme is validated by extensive experimentation. We tested the proposed algorithm on 52 pages of Persian text documents containing 823 lines in total and achieved a correct line segmentation rate of 92.35%. The algorithm was also tested on two other datasets of 152 and 200 handwritten text pages in different languages. Comparison with various approaches presented in recent literature demonstrated the efficiency and script independence of the proposed algorithm.
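The painting step can be illustrated with a short Python sketch. It assumes a grayscale image stored as a list of rows with values from 0 to 255 (0 = black ink) and a strip width supplied by the caller; the paper computes the width automatically, which is not reproduced here. The two-tone conversion uses a fixed, caller-chosen `threshold`, a hypothetical parameter: the abstract does not say how the threshold is chosen, and a data-driven rule such as Otsu's method could be substituted. The function names are illustrative and are not the authors' code.

```python
from typing import List

# Grayscale image as a list of rows; values 0..255, 0 = black ink (assumption).
Image = List[List[int]]


def paint_strips(img: Image, strip_width: int) -> Image:
    """Paint each row segment of every vertical strip with its mean gray value."""
    height, width = len(img), len(img[0])
    painted = [[0] * width for _ in range(height)]
    for x0 in range(0, width, strip_width):
        x1 = min(x0 + strip_width, width)  # the last strip may be narrower
        for y in range(height):
            segment = img[y][x0:x1]
            mean = round(sum(segment) / len(segment))
            painted[y][x0:x1] = [mean] * (x1 - x0)
    return painted


def two_tone(painted: Image, threshold: int) -> Image:
    """Hypothetical fixed threshold: black (0) below `threshold`, else white (255)."""
    return [[0 if v < threshold else 255 for v in row] for row in painted]


if __name__ == "__main__":
    page = [[255, 255, 0, 250],
            [0, 0, 255, 255],
            [255, 255, 255, 255]]
    painted = paint_strips(page, strip_width=2)
    print(painted)                # [[255, 255, 125, 125], [0, 0, 255, 255], [255, 255, 255, 255]]
    print(two_tone(painted, 200))  # [[255, 255, 0, 0], [0, 0, 255, 255], [255, 255, 255, 255]]
```

Because every row segment of a strip collapses to a single gray value, each strip of the two-tone result can afterwards be treated as a one-dimensional column of black and white rows.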
Appendix: contributions in this paper
Line segmentation of unconstrained handwritten documents is difficult because writing styles vary between individuals. Characters of two consecutive text lines may touch or overlap, which makes line segmentation even more complex. In this paper, a painting scheme is proposed to facilitate unconstrained handwritten text-line segmentation. In the proposed scheme, the input text page is vertically decomposed into parallel pipe structures called strips. The strip width is computed automatically from the space (gap) between consecutive lines in each text page. Each row of a strip is painted with a gray intensity equal to the average gray value of all pixels in that row of the strip. The painted strips are then converted into a two-tone painted image, which is smoothed by a set of smoothing operations. The white and black spaces in each strip of the smoothed image are analyzed to obtain a short separating line, called a Piece-wise Potential Separating Line (PPSL), between every two consecutive black spaces. Finally, the PPSLs are concatenated or extended to separate the text lines. The proposed method also handles touching and overlapping cases: the system first detects the touching/overlapping zones and then segments them based on their structural behavior.
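Because every painted row of a strip is uniform, a two-tone strip can be reduced to a per-row profile (True = black row). The Python sketch below is a hypothetical illustration, not the authors' procedure: it smooths such a profile and then places one PPSL in each white run that lies between two black runs. The smoothing rule (erase black runs shorter than `min_black`, then fill interior white gaps shorter than `min_gap`) and the choice of the run midpoint as the PPSL row are assumptions; the paper's actual smoothing operations, PPSL placement, and concatenation across strips are not specified in this summary.

```python
from typing import List, Tuple

Run = Tuple[bool, int, int]  # (is_black, start_row, end_row_exclusive)


def runs(profile: List[bool]) -> List[Run]:
    """Split a per-row strip profile (True = black row) into maximal runs."""
    out: List[Run] = []
    start = 0
    for y in range(1, len(profile) + 1):
        if y == len(profile) or profile[y] != profile[start]:
            out.append((profile[start], start, y))
            start = y
    return out


def smooth(profile: List[bool], min_black: int, min_gap: int) -> List[bool]:
    """Hypothetical smoothing: erase short black runs, then fill short interior white gaps."""
    p = list(profile)
    for black, s, e in runs(p):
        if black and e - s < min_black:
            p[s:e] = [False] * (e - s)  # treat as a noise speck
    r = runs(p)
    for i, (black, s, e) in enumerate(r):
        # Runs alternate in colour, so an interior white run has black on both sides.
        if not black and 0 < i < len(r) - 1 and e - s < min_gap:
            p[s:e] = [True] * (e - s)  # merge into the surrounding text line
    return p


def ppsl_rows(profile: List[bool]) -> List[int]:
    """One PPSL row per white run between two black runs, at the run's midpoint (assumption)."""
    r = runs(profile)
    return [(s + e - 1) // 2
            for i, (black, s, e) in enumerate(r)
            if not black and 0 < i < len(r) - 1]


if __name__ == "__main__":
    strip = [c == "B" for c in "WWBBBBWBBWWWWBWWBBBBW"]
    cleaned = smooth(strip, min_black=2, min_gap=2)
    print(ppsl_rows(strip))    # [6, 10, 14]
    print(ppsl_rows(cleaned))  # [12]
```

For instance, with `min_black=2` and `min_gap=2`, the profile `WWBBBBWBBWWWWBWWBBBBW` (W = white row, B = black row) loses its one-row speck at row 13 and its one-row gap at row 6, leaving a single PPSL at row 12 between the two text lines.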
The scheme was validated by extensive experiments with many scripts. The proposed algorithm was tested on 52 pages of Persian text documents containing 823 lines in total and achieved 92.35% line segmentation accuracy. It was also tested on two other datasets containing 152 and 200 handwritten text pages in different languages such as English, Greek, French, and German. Comparison with various approaches presented in recent literature demonstrated the efficiency and script independence of the proposed algorithm.
Cite this article
Alaei, A., Nagabhushan, P. & Pal, U. Piece-wise painting technique for line segmentation of unconstrained handwritten text: a specific study with Persian text documents. Pattern Anal Applic 14, 381–394 (2011). https://doi.org/10.1007/s10044-011-0226-x
DOI: https://doi.org/10.1007/s10044-011-0226-x