Abstract
Document analysis often starts with robust signatures, for example to look up a document from a low-quality photograph or to analyze similarity between scanned books. Signatures based on OCR typically work well, but they require good-quality OCR, which is not always available and can be very costly. In this paper we describe a novel scheme for extracting discrete signatures from document images. It operates on points that describe the position of words, typically the centroid. Each point is extracted using one of several techniques and assigned a signature based on its relation to its nearest neighbors. We discuss the benefits of this approach and demonstrate its application to multiple problems, including fast image similarity calculation and document lookup.
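To make the idea of a neighbor-based discrete signature concrete, the following is a minimal sketch and not the scheme from the paper, whose encoding the abstract does not specify. It assumes word centroids are already available as (x, y) pairs, and it assumes a signature built by quantizing the direction from a point to each of its k nearest neighbors into a fixed number of angular sectors. The names `point_signature`, `page_signatures`, and `similarity`, and the parameters `k` and `bins`, are hypothetical.

```python
import math
from collections import Counter


def point_signature(points, index, k=4, bins=8):
    """Discrete signature of points[index] from the directions to its k nearest neighbors.

    Assumption for illustration only: each direction is quantized into one of
    `bins` equal angular sectors.
    """
    px, py = points[index]
    # Rank all other points by squared distance to the chosen point.
    others = [p for i, p in enumerate(points) if i != index]
    others.sort(key=lambda q: (q[0] - px) ** 2 + (q[1] - py) ** 2)
    codes = []
    for qx, qy in others[:k]:
        angle = math.atan2(qy - py, qx - px) % (2 * math.pi)
        codes.append(int(angle / (2 * math.pi) * bins) % bins)
    # Sort the codes so the signature does not depend on neighbor ranking.
    return tuple(sorted(codes))


def page_signatures(points, k=4, bins=8):
    """Multiset of the signatures of every point on a page."""
    return Counter(point_signature(points, i, k, bins) for i in range(len(points)))


def similarity(sig_a, sig_b):
    """Multiset overlap of two pages' signatures, normalized by the larger page."""
    shared = sum((sig_a & sig_b).values())
    total = max(sum(sig_a.values()), sum(sig_b.values()))
    return shared / total if total else 0.0


# Hypothetical word centroids for one page, and the same page shifted.
page = [(0, 0), (10, 0), (20, 1), (0, 12), (11, 13), (22, 12), (1, 25), (12, 24)]
shifted = [(x + 10, y + 5) for x, y in page]
score = similarity(page_signatures(page), page_signatures(shifted))
```

Because the signatures in this sketch depend only on relative positions, a translated copy of the page produces the same multiset of signatures. Robustness to rotation, perspective distortion, or missing words would need a different encoding and is not addressed here.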
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Spasojevic, N., Poncin, G., Bloomberg, D. (2011). Discrete Point Based Signatures and Applications to Document Matching. In: Maino, G., Foresti, G.L. (eds) Image Analysis and Processing – ICIAP 2011. ICIAP 2011. Lecture Notes in Computer Science, vol 6978. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24085-0_16
DOI: https://doi.org/10.1007/978-3-642-24085-0_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24084-3
Online ISBN: 978-3-642-24085-0
eBook Packages: Computer Science, Computer Science (R0)