Abstract
Progress in the digitization of cultural assets has produced online databases that are too large for a human to analyze. Moreover, some analyses can be challenging even for experts. In this paper, we explore two applications of computer vision to the analysis of historical data: watermark recognition and one-shot repeated pattern detection in artwork collections. Both problems present computer vision challenges that we believe are representative of those encountered in cultural heritage applications: limited supervision is available, the tasks require fine-grained recognition, and the data comes in several different modalities. Both applications are also highly practical: recognizing watermarks makes it possible to date and locate documents, while detecting repeated patterns allows exploring visual links between artworks. We demonstrate on both tasks the benefits of relying on deep mid-level features. More precisely, we define an image similarity score based on geometric verification of mid-level features and show how spatial consistency can be used to fine-tune out-of-the-box features for the target dataset with weak or no supervision. This paper relates to and extends our previous work (Shen et al. in Discovering visual patterns in art collections with spatially-consistent feature learning, 2019; Shen et al. in Large-scale historical watermark recognition dataset and a new consistency-based approach, 2020). Our code and data are available at http://imagine.enpc.fr/~shenx/HisImgAnalysis/.
References
Aubry, M., Russell, B.C., & Sivic, J. (2014). Painting-to-3d model alignment via discriminative visual elements. ACM Transactions on Graphics (ToG).
Belongie, S., Malik, J., & Puzicha, J. (2001). Shape context: A new descriptor for shape matching and object recognition. In NeurIPS.
Bender, K. (2015). Distant viewing in art history. A case study of artistic productivity. International Journal for Digital Art History (1).
Bounou, O., Monnier, T., Pastrolin, I., Shen, X., Benevent, C., Limon-Bonnet, M.F., et al. (2020). A web application for watermark recognition. Journal of Data Mining and Digital Humanities.
Brendel, W., & Bethge, M. (2019). Approximating cnns with bag-of-local-features models works surprisingly well on imagenet. In ICLR.
Briquet online. http://www.ksbm.oeaw.ac.at/_scripts/php/BR.php
Briquet, C. M. (1907). Les filigranes.
“Brueghel Family: Jan Brueghel the Elder.” The Brueghel Family Database. University of California, Berkeley. http://www.janbrueghel.net/. Accessed 16 October 2018.
Castellano, G., Lella, E., & Vessio, G. (2021). Visual link retrieval and knowledge discovery in painting datasets. Multimedia Tools and Applications.
Crowley, E. J., & Zisserman, A. (2013). Of gods and goats: Weakly supervised learning of figurative art. In BMVC.
Crowley, E. J., & Zisserman, A. (2016). The art of detection. In ECCV.
Crowley, E. J., Parkhi, O. M., & Zisserman, A. (2015). Face painting: Querying art with photos. In BMVC.
Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). Imagenet: A large-scale hierarchical image database. In CVPR.
Doersch, C., Gupta, A., & Efros, A. A. (2013). Mid-level visual element discovery as discriminative mode seeking. In NeurIPS.
Doersch, C., Gupta, A., & Efros, A. A. (2014). Context as supervisory signal: Discovering objects with predictable context. In ECCV.
Doersch, C., Gupta, A., & Efros, A. A. (2015). Unsupervised visual representation learning by context prediction. In ICCV.
Dutta, A., Gupta, A., & Zisserman, A. (2016). VGG image annotator (VIA). http://www.robots.ox.ac.uk/vgg/software/via/
Elezi, I., Vascon, S., Torcinovich, A., Pelillo, M., & Leal-Taixé, L. (2020). The group loss for deep metric learning. In ECCV.
Elgammal, A., Liu, B., Elhoseiny, M., & Mazzone, M. (2017). Can: Creative adversarial networks, generating “art” by learning about styles and deviating from style norms. arXiv.
Frauenknecht, E., & Stieglecker, M. (2015). Wzis – wasserzeichen-informationssystem: Verwaltung und präsentation von wasserzeichen und ihrer metadaten. Kodikologie und Paläographie im Digitalen Zeitalter 3: Codicology and Palaeography in the Digital Age 3.
Gatys, L. A., Ecker, A. S., & Bethge, M. (2016). Image style transfer using convolutional neural networks. In CVPR.
Geirhos, R., Rubisch, P., Michaelis, C., Bethge, M., Wichmann, F. A., & Brendel, W. (2019). Imagenet-trained cnns are biased towards texture; increasing shape bias improves accuracy and robustness. In ICLR.
Gidaris, S., & Komodakis, N. (2018). Dynamic few-shot visual learning without forgetting. In CVPR.
Ginosar, S., Haas, D., Brown, T., & Malik, J. (2014). Detecting people in cubist art. In Workshop at ECCV.
Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR.
Gonthier, N., Gousseau, Y., Ladjal, S., & Bonfait, O. (2018). Weakly supervised object detection in artworks. arXiv.
Gordo, A., Almazan, J., Revaud, J., & Larlus, D. (2017). End-to-end learning of deep visual representations for image retrieval. In IJCV.
Grauman, K., & Darrell, T. (2005). Pyramid match kernels: Discriminative classification with sets of image features. In ICCV.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In CVPR.
Hertzmann, A. (2018). Can computers create art? In Arts. Multidisciplinary Digital Publishing Institute.
Hertzmann, A., Jacobs, C. E., Oliver, N., Curless, B., & Salesin, D. H. (2001). Image analogies. In SIGGRAPH.
Hiary, H. (2008). Paper-based watermark extraction with image processing. Ph.D. thesis.
Hiary, H., & Ng, K. (2007). A system for segmenting and extracting paper-based watermark designs. International Journal on Digital Libraries.
Honig, E. (2016). Jan Brueghel and the Senses of Scale. University Park: Pennsylvania State University Press.
imgs.ai. https://imgs.ai/
Jabri, A., Owens, A., & Efros, A. A. (2020). Space-time correspondence as a contrastive random walk. In NeurIPS.
Jenicek, T., & Chum, O. (2019). Linking art through human poses. In ICDAR.
Karayev, S., Trentacoste, M., Han, H., Agarwala, A., Darrell, T., Hertzmann, A., & Winnemoeller, H. (2013). Recognizing image style. arXiv.
Kim, S., Kim, D., Cho, M., & Kwak, S. (2020). Proxy anchor loss for deep metric learning. In CVPR.
Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv.
Loing, V., Marlet, R., & Aubry, M. (2018). Virtual training for a real application: Accurate object-robot relative localization without calibration. In IJCV.
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. In IJCV.
Mao, H., Cheung, M., & She, J. (2017). Deepart: Learning joint representations of visual arts. In ACM Multimedia.
Massa, F., Russell, B. C., & Aubry, M. (2016). Deep exemplar 2d–3d detection by adapting from real to rendered views. In CVPR.
Mensink, T., & Van Gemert, J. (2014). The rijksmuseum challenge: Museum-centered visual recognition. In ICMR.
Noroozi, M., & Favaro, P. (2016). Unsupervised learning of visual representations by solving jigsaw puzzles. In ECCV.
Paumard, M. M., Picard, D., & Tabia, H. (2018). Jigsaw puzzle solving using local feature co-occurrences in deep neural networks. In ICIP.
Picard, D., Gosselin, P. H., & Gaspard, M. C. (2015). Challenges in content-based image indexing of cultural heritage collections. IEEE Signal Processing Magazine.
Picard, D., Henn, T., & Dietz, G. (2016). Non-negative dictionary learning for paper watermark similarity. In ACSSC.
Piccard, G. (1977). Die Wasserzeichenkartei Piccard im Hauptstaatsarchiv Stuttgart: Wasserzeichen Buchstabe P.
Pondenkandath, V., Alberti, M., Eichenberger, N., Ingold, R., & Liwicki, M. (2018). Identifying cross-depicted historical motifs. arXiv.
Qi, H., Brown, M., & Lowe, D. G. (2018). Low-shot learning with imprinted weights. In CVPR.
Rad, M., Oberweger, M., & Lepetit, V. (2018). Feature mapping for learning fast and accurate 3d pose inference from synthetic images. In CVPR.
Radenović, F., Tolias, G., & Chum, O. (2016). Fine-tuning cnn image retrieval with no human annotation. In TPAMI.
Rauber, C., Tschudin, P., & Pun, T. (1997). Retrieval of images from a library of watermarks for ancient paper identification. In Electronic Visualisation and the Arts.
Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster r-cnn: Towards real-time object detection with region proposal networks. In NeurIPS.
Rocco, I., Cimpoi, M., Arandjelović, R., Torii, A., Pajdla, T., & Sivic, J. (2018). Neighbourhood consensus networks. In NeurIPS.
Said, J., & Hiary, H. (2016). Watermark location via back-lighting modelling and verso registration. Multimedia Tools and Applications.
Seguin, B., diLenardo, I., & Kaplan, F. (2017). Tracking transmission of details in paintings. In DH.
Shen, X., Darmon, F., Efros, A. A., & Aubry, M. (2020). Ransac-flow: Generic two-stage image alignment. In ECCV.
Shen, X., Efros, A. A., & Aubry, M. (2019). Discovering visual patterns in art collections with spatially-consistent feature learning. In CVPR.
Shen, X., Pastrolin, I., Bounou, O., Gidaris, S., Smith, M., Poncet, O., & Aubry, M. (2020). Large-scale historical watermark recognition: Dataset and a new consistency-based approach. In ICPR.
Shrivastava, A., Malisiewicz, T., Gupta, A., & Efros, A. A. (2011). Data-driven visual similarity for cross-domain image matching. In SIGGRAPH ASIA.
Singh, S., Gupta, A., & Efros, A. A. (2012). Unsupervised discovery of mid-level discriminative patches. In ECCV.
Sivic, J., & Zisserman, A. (2003). Video google: A text retrieval approach to object matching in videos. In ICCV.
Snell, J., Swersky, K., & Zemel, R. (2017). Prototypical networks for few-shot learning. In NeurIPS.
Strezoski, G., & Worring, M. (2017). Omniart: Multi-task deep learning for artistic data analysis. arXiv.
Su, H., Qi, C.R., Li, Y., & Guibas, L.J. (2015). Render for cnn: Viewpoint estimation in images using cnns trained with rendered 3d model views. In ICCV.
Sun, B., Feng, J., & Saenko, K. (2016). Return of frustratingly easy domain adaptation. In AAAI.
Tan, W. R., Chan, C. S., Aguirre, H. E., & Tanaka, K. (2016). Ceci n’est pas une pipe: A deep convolutional network for fine-art paintings classification. In ICIP.
Teh, E. W., DeVries, T., & Taylor, G. W. (2020). Proxynca++: Revisiting and revitalizing proxy neighborhood component analysis. In ECCV.
timemachine. diamond.timemachine.eu
Úbeda, I., Saavedra, J. M., Nicolas, S., Petitjean, C., & Heutte, L. (2019). Pattern spotting in historical documents using convolutional models. In Proceedings of the 5th international workshop on historical document imaging and processing (pp. 60–65).
Vinyals, O., Blundell, C., Lillicrap, T., Wierstra, D., et al. (2016). Matching networks for one shot learning. In NeurIPS.
Wallraven, C., Caputo, B., & Graf, A. (2003). Recognition with local features: The kernel recipe. In ICCV.
Wang, X., Jabri, A., & Efros, A. A. (2019). Learning correspondence from the cycle-consistency of time. In CVPR.
Westlake, N., Cai, H., & Hall, P. (2016). Detecting people in artwork with cnns. In ECCV.
Wilber, M. J., Fang, C., Jin, H., Hertzmann, A., Collomosse, J., & Belongie, S. J. (2017). Bam! the behance artistic media dataset for recognition beyond photography. In ICCV.
Yin, R., Monson, E., Honig, E., Daubechies, I., & Maggioni, M. (2016). Object recognition in art drawings: Transfer of a neural network. In ICASSP.
Zhang, J., Marszałek, M., Lazebnik, S., & Schmid, C. (2007). Local features and kernels for classification of texture and object categories: A comprehensive study. In ICCV.
Zhu, J. Y., Park, T., Isola, P., & Efros, A. A. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. In ICCV.
Additional information
Communicated by Katsushi Ikeuchi.
About this article
Cite this article
Shen, X., Champenois, R., Ginosar, S. et al. Spatially-Consistent Feature Matching and Learning for Heritage Image Analysis. Int J Comput Vis 130, 1325–1339 (2022). https://doi.org/10.1007/s11263-022-01576-x
DOI: https://doi.org/10.1007/s11263-022-01576-x