
Document Image Segmentation through Clustering and Connectivity Analysis

  • Conference paper
New Research in Multimedia and Internet Systems

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 314)

Abstract

This chapter presents a new document image segmentation algorithm, called Cluster Variance Segmentation (CVSEG). The method is based on the analysis of the tiles suspected to be part of an image and filtering them subsequently. In the end, the results are enhanced through a reconstruction stage. I present the design of the algorithm as well as the test results on various document images. The experiments validate the efficacy and efficiency of the proposed approach when compared with other algorithms.
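
The abstract does not give the details of CVSEG, so the following is only an illustrative sketch of the general idea it names: analysing tiles and then filtering them by connectivity. Everything specific here is an assumption rather than the author's method: the function names, the rule that a tile is a candidate when its pixel variance exceeds a threshold, the use of 4-connectivity, and the minimum component size. The reconstruction stage is omitted because the abstract does not describe it.

```python
from statistics import pvariance


def tile_variances(img, tile):
    """Return a grid of per-tile pixel variances for a 2-D list of gray values.

    Assumes the image dimensions are multiples of `tile` (hypothetical simplification).
    """
    rows, cols = len(img) // tile, len(img[0]) // tile
    grid = []
    for r in range(rows):
        row = []
        for c in range(cols):
            pixels = [img[y][x]
                      for y in range(r * tile, (r + 1) * tile)
                      for x in range(c * tile, (c + 1) * tile)]
            row.append(pvariance(pixels))
        grid.append(row)
    return grid


def candidate_mask(variances, threshold):
    """Flag tiles whose variance exceeds the threshold (assumed rule, not from the paper)."""
    return [[v > threshold for v in row] for row in variances]


def filter_small_components(mask, min_size):
    """Keep only 4-connected groups of flagged tiles with at least min_size tiles."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    kept = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Depth-first traversal collecting one connected component.
            stack, component = [(r, c)], []
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            if len(component) >= min_size:
                for y, x in component:
                    kept[y][x] = True
    return kept
```

A typical call chain under these assumptions would be `filter_small_components(candidate_mask(tile_variances(img, 16), threshold), min_size)`, where the tile size, `threshold`, and `min_size` are values to be tuned per data set.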

Author information

Correspondence to Mihai Bogdan Ilie.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Ilie, M.B. (2015). Document Image Segmentation through Clustering and Connectivity Analysis. In: Zgrzywa, A., Choroś, K., Siemiński, A. (eds) New Research in Multimedia and Internet Systems. Advances in Intelligent Systems and Computing, vol 314. Springer, Cham. https://doi.org/10.1007/978-3-319-10383-9_1

  • DOI: https://doi.org/10.1007/978-3-319-10383-9_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-10382-2

  • Online ISBN: 978-3-319-10383-9

  • eBook Packages: Engineering (R0)