Abstract
The growing use of Know Your Customer (KYC) online services generates a massive flow of dematerialised personal Identity Documents (IDs) captured under variable conditions and with variable quality (e.g. webcam, smartphone, scan, or even hand-crafted PDFs). Depending on their issuing country and model, IDs are designed with a specific layout (i.e. background, photo(s), fixed and variable text fields) along with various anti-fraud features (e.g. checksums, Optical Variable Devices) that are non-trivial to analyse. This paper tackles the problem of detecting and classifying captured documents and aligning them onto their reference model, a step that is essential to document reading and fraud verification. Because capture conditions and model layouts vary widely, classical handcrafted approaches require deep knowledge of the documents and are therefore hard to maintain. This work instead proposes a modular, multi-stage, fully deep-learning-based approach that accurately classifies the document and estimates its quadrilateral (localization). Unlike approaches that rely on a single end-to-end network, the proposed modular framework offers more flexibility and the potential for future incremental learning. All networks used in this work are derived from recent state-of-the-art architectures. Experiments on both the academic MIDV-500 dataset and an industrial dataset show that the proposed approach is faster than handcrafted solutions while maintaining good accuracy.
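Once the localization stage has estimated the document's quadrilateral, cropping it onto the reference model amounts to applying a planar homography. The abstract gives no implementation details, so the following is only a minimal NumPy sketch under stated assumptions: the four corners arrive in top-left, top-right, bottom-right, bottom-left order, and the reference model is a hypothetical 856 × 540 pixel frame (an ID-1 card at roughly 10 px/mm). `homography_from_quad`, `apply_homography`, and `REF_CORNERS` are illustrative names, not the authors' code.

```python
import numpy as np

# Hypothetical reference frame: an ID-1 card (85.6 x 53.98 mm) at ~10 px/mm.
REF_W, REF_H = 856, 540
REF_CORNERS = [[0, 0], [REF_W - 1, 0], [REF_W - 1, REF_H - 1], [0, REF_H - 1]]


def homography_from_quad(src, dst):
    """Solve the 3x3 homography H (H[2, 2] fixed to 1) mapping 4 src points onto 4 dst points.

    Each correspondence (x, y) -> (u, v) yields two linear equations in the
    8 unknowns, obtained from u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1)
    and the analogous expression for v.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    if src.shape != (4, 2) or dst.shape != (4, 2):
        raise ValueError("expected two arrays of shape (4, 2)")
    A = np.zeros((8, 8))
    b = np.zeros(8)
    for i, ((x, y), (u, v)) in enumerate(zip(src, dst)):
        A[2 * i] = [x, y, 1, 0, 0, 0, -u * x, -u * y]
        A[2 * i + 1] = [0, 0, 0, x, y, 1, -v * x, -v * y]
        b[2 * i], b[2 * i + 1] = u, v
    # The system is well-posed only when no three corners are collinear.
    h = np.linalg.solve(A, b)
    return np.append(h, 1.0).reshape(3, 3)


def apply_homography(H, pts):
    """Map an (N, 2) array of points through H, including the perspective divide."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]


if __name__ == "__main__":
    # Hypothetical corners predicted by a localization stage (TL, TR, BR, BL).
    quad = [[120, 80], [900, 140], [870, 620], [100, 560]]
    H = homography_from_quad(quad, REF_CORNERS)
    print(np.round(apply_homography(H, quad), 3))  # approximately REF_CORNERS
```

In practice, OpenCV's `cv2.getPerspectiveTransform` computes the same 4-point homography and `cv2.warpPerspective` resamples the image into the reference frame. With raw pixel coordinates, normalizing the points first (as in Hartley's normalized DLT) improves numerical conditioning.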
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Chiron, G., Arrestier, F., Awal, A.M. (2021). Fast End-to-End Deep Learning Identity Document Detection, Classification and Cropping. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol. 12824. Springer, Cham. https://doi.org/10.1007/978-3-030-86337-1_23
DOI: https://doi.org/10.1007/978-3-030-86337-1_23
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86336-4
Online ISBN: 978-3-030-86337-1
eBook Packages: Computer Science, Computer Science (R0)