Attributed point matching for automatic groundtruth generation

Doe-Wan Kim¹ &
Tapas Kanungo¹

Abstract.

Geometric groundtruth at the character, word, and line levels is crucial for designing and evaluating optical character recognition (OCR) algorithms. Kanungo and Haralick proposed a closed-loop methodology for generating geometric groundtruth for rescanned document images. The procedure assumed that the original image and the corresponding groundtruth were available. It automatically registered the original image to the rescanned one using four corner points and then transformed the original groundtruth using the estimated registration transformation. In this paper, we present an attributed branch-and-bound algorithm for establishing the point correspondence that uses all the data points. We group the original feature points into blobs and use corners of blobs for matching. The Euclidean distance between character centroids is used as the error metric. We conducted experiments on synthetic point sets with varying layout complexity to characterize the performance of two matching algorithms. We also report results on experiments conducted using the University of Washington dataset. Finally, we show examples of application of this methodology for generating groundtruth for microfilmed and FAXed versions of the University of Washington dataset documents.
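The registration-and-transfer step described in the abstract can be illustrated with a minimal sketch. It assumes an affine transformation fitted by least squares to already-matched point pairs; the abstract does not state the transformation model, so that choice is an assumption, and all function names here are hypothetical. The sketch does not implement the attributed branch-and-bound matcher itself: the point correspondences are taken as given, and only the transformation estimate, the groundtruth transfer, and the Euclidean error between centroids are shown.

```python
import math


def solve3(a, b):
    """Solve the 3x3 system a * x = b by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        s = sum(m[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = (m[r][3] - s) / m[r][r]
    return x


def fit_affine(src, dst):
    """Least-squares affine map from src to dst (assumed model, not from the paper).

    Returns two rows [a, b, t] such that x' = a*x + b*y + t (and likewise for y').
    """
    # Accumulate the normal equations (A^T A) p = A^T u, where each row of A is [x, y, 1].
    ata = [[0.0] * 3 for _ in range(3)]
    atu = [0.0] * 3
    atv = [0.0] * 3
    for (x, y), (u, v) in zip(src, dst):
        row = (x, y, 1.0)
        for i in range(3):
            atu[i] += row[i] * u
            atv[i] += row[i] * v
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    return solve3(ata, atu), solve3(ata, atv)


def apply_affine(params, pts):
    """Transform groundtruth points with the estimated affine parameters."""
    (a, b, tx), (c, d, ty) = params
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in pts]


def mean_centroid_error(predicted, observed):
    """Mean Euclidean distance between corresponding centroids."""
    total = sum(math.hypot(p[0] - q[0], p[1] - q[1]) for p, q in zip(predicted, observed))
    return total / len(predicted)


# Hypothetical data: blob corners in the original image and their matches in the rescan.
original = [(0.0, 0.0), (100.0, 0.0), (100.0, 50.0), (0.0, 50.0), (40.0, 20.0)]
rescanned = [(0.9 * x + 0.1 * y + 5.0, -0.1 * x + 0.9 * y - 3.0) for x, y in original]

params = fit_affine(original, rescanned)
transferred = apply_affine(params, original)
error = mean_centroid_error(transferred, rescanned)
```

Because the synthetic rescanned points are an exact affine image of the originals, the fitted parameters reproduce the generating map and the error is close to zero. On real scans the residual would reflect both local distortion and any wrong correspondences, which is where using all data points rather than four corners matters.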

Author information

Authors and Affiliations

Information Sciences Institute/USC, 3811 North Fairfax Dr., Suite 200, Arlington, VA 22030, USA; e-mail: dwkim@isi.edu
IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120, USA; e-mail: kanungo@us.ibm.com
Doe-Wan Kim & Tapas Kanungo

Additional information

Received: July 24, 2001 / Accepted: May 20, 2002

About this article

Cite this article

Kim, DW., Kanungo, T. Attributed point matching for automatic groundtruth generation. IJDAR 5, 47–66 (2002). https://doi.org/10.1007/s10032-002-0083-7

Issue Date: November 2002
DOI: https://doi.org/10.1007/s10032-002-0083-7

Keywords: Image registration – Attributed point matching – Branch-and-bound – Automatic groundtruth generation – Microfilm – FAX
