Abstract
Advanced image data augmentation techniques play a pivotal role in enhancing the training of models for diverse computer vision tasks. Notably, SalfMix and KeepAugment have emerged as popular strategies, showcasing their efficacy in boosting model performance. However, SalfMix's reliance on duplicating salient features poses a risk of overfitting, potentially compromising the model's generalization capabilities. Conversely, KeepAugment, which preserves salient regions and augments only non-salient ones, introduces a domain shift between the two regions that hinders the exchange of crucial contextual information and impedes overall model understanding. In response to these challenges, we introduce KeepOriginalAugment, a novel data augmentation approach. This method intelligently places the most salient region within the non-salient area, allowing augmentation to be applied to either region. By striking a balance between data diversity and information preservation, KeepOriginalAugment enables models to leverage both diverse salient and non-salient regions, leading to enhanced performance. We explore three strategies for determining the placement of the salient region (minimum, maximum, or random saliency) and investigate swapping-perspective strategies to decide which part (salient or non-salient) undergoes augmentation. Experimental evaluations on the CIFAR-10, CIFAR-100, and TinyImageNet classification datasets demonstrate the superior performance of KeepOriginalAugment compared to existing state-of-the-art techniques. The source code for KeepOriginalAugment, along with trained models, is publicly available at https://github.com/kmr2017.
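To make the mechanism described in the abstract concrete, below is a minimal NumPy sketch of the idea, not the authors' released implementation (which is at https://github.com/kmr2017). The gradient-magnitude saliency proxy, the fixed square patch size, the exact swapping logic, and all function names (`saliency_proxy`, `keep_original_augment`, and so on) are illustrative assumptions; the paper uses a proper saliency detector and its own placement and swapping strategies.

```python
import numpy as np

def saliency_proxy(img: np.ndarray) -> np.ndarray:
    """Crude saliency stand-in (assumption): gradient magnitude of the gray image."""
    gray = img.mean(axis=2)
    gy, gx = np.gradient(gray)
    return np.hypot(gx, gy)

def window_sums(sal: np.ndarray, k: int) -> np.ndarray:
    """Summed saliency of every k x k window, computed via an integral image."""
    ii = np.pad(sal, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    return ii[k:, k:] - ii[:-k, k:] - ii[k:, :-k] + ii[:-k, :-k]

def locate(sal: np.ndarray, k: int, mode: str, rng: np.random.Generator):
    """Top-left corner of the k x k window with max/min/random summed saliency,
    mirroring the three placement strategies named in the abstract."""
    sums = window_sums(sal, k)
    if mode == "max":
        idx = int(np.argmax(sums))
    elif mode == "min":
        idx = int(np.argmin(sums))
    else:  # "random" placement
        idx = int(rng.integers(sums.size))
    return np.unravel_index(idx, sums.shape)

def keep_original_augment(img, augment, patch=16, placement="random",
                          augment_part="non_salient", seed=None):
    """Sketch of KeepOriginalAugment: keep the most salient patch in its
    original form and place it inside the (possibly augmented) rest of the
    image. `augment` must be a shape-preserving transform."""
    rng = np.random.default_rng(seed)
    sal = saliency_proxy(img)
    sy, sx = locate(sal, patch, "max", rng)           # most salient region
    salient = img[sy:sy + patch, sx:sx + patch].copy()
    if augment_part == "non_salient":
        out = augment(img)                            # augment everything else
    else:                                             # "salient": augment the patch
        out, salient = img.copy(), augment(salient)
    dy, dx = locate(sal, patch, placement, rng)       # placement strategy
    out[dy:dy + patch, dx:dx + patch] = salient
    return out

# Toy usage with a shape-preserving flip-and-brighten augmentation:
rng = np.random.default_rng(0)
img = rng.random((64, 64, 3)).astype(np.float32)
flip_and_jitter = lambda x: np.clip(x[:, ::-1] * 1.3, 0.0, 1.0)
aug = keep_original_augment(img, flip_and_jitter, patch=16,
                            placement="min", augment_part="non_salient", seed=0)
```

Under these assumptions, the salient content is never destroyed by augmentation (unlike aggressive cutout-style methods), while the rest of the image still contributes diversity; swapping `augment_part` flips which side receives the transform.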
References
Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks: visualising image classification models and saliency maps. In: Proceedings of the International Conference on Learning Representations (ICLR) (2014)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv Preprint arXiv:1409.1556 (2014)
Szegedy, C., et al.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826 (2016)
Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)
Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
Wang, X., Shrivastava, A., Gupta, A.: A-fast-RCNN: hard positive generation via adversary for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2606–2615 (2017)
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, vol. 28 (2015)
Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. In: International Conference on Learning Representations (2020)
Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images. Technical report, University of Toronto, Toronto, ON, Canada (2009)
Deng, J., Dong, W., Socher, R., Li, L., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255 (2009)
Chen, L., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.: DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 40, 834–848 (2018)
Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)
Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2881–2890 (2017)
Zhong, Z., Zheng, L., Kang, G., Li, S., Yang, Y.: Random erasing data augmentation. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 13001–13008 (2020)
Yun, S., Han, D., Oh, S., Chun, S., Choe, J., Yoo, Y.: Cutmix: regularization strategy to train strong classifiers with localizable features. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6023–6032 (2019)
Verma, V., et al.: Manifold mixup: better representations by interpolating hidden states. In: Proceedings of the 36th International Conference on Machine Learning, pp. 6438–6447 (2019)
DeVries, T., Taylor, G.: Improved regularization of convolutional neural networks with cutout. arXiv Preprint arXiv:1708.04552 (2017)
Kumar Singh, K., Jae Lee, Y.: Hide-and-seek: forcing a network to be meticulous for weakly-supervised object and action localization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3524–3533 (2017)
Chen, P., Liu, S., Zhao, H., Jia, J.: Gridmask data augmentation. arXiv Preprint arXiv:2001.04086 (2020)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Qin, J., Fang, J., Zhang, Q., Liu, W., Wang, X., Wang, X.: Resizemix: mixing data with preserved object information and true labels. arXiv Preprint arXiv:2012.11101 (2020)
Zhang, H., Cisse, M., Dauphin, Y., Lopez-Paz, D.: mixup: beyond empirical risk minimization. In: International Conference on Learning Representations (2018)
Kim, J., Choo, W., Song, H.: Puzzle mix: exploiting saliency and local statistics for optimal mixup. In: International Conference on Machine Learning, pp. 5275–5285 (2020)
Seo, J., Jung, H., Lee, S.: Self-augmentation: generalizing deep networks to unseen classes for few-shot learning. Neural Netw. 138, 140–149 (2021)
Zeiler, M., Fergus, R.: Stochastic pooling for regularization of deep convolutional neural networks. In: ICLR (2013)
Cubuk, E., Zoph, B., Mane, D., Vasudevan, V., Le, Q.: Autoaugment: learning augmentation strategies from data. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 113–123 (2019)
Cubuk, E., Zoph, B., Shlens, J., Le, Q.: Randaugment: practical automated data augmentation with a reduced search space. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 702–703 (2020)
Gong, C., Wang, D., Li, M., Chandra, V., Liu, Q.: Keepaugment: a simple information-preserving data augmentation approach. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1055–1064 (2021)
Choi, J., Lee, C., Lee, D., Jung, H.: SalfMix: a novel single image-based data augmentation technique using a saliency map. Sensors 21, 8444 (2021)
Uddin, A., Monira, M., Shin, W., Chung, T., Bae, S.: SaliencyMix: a saliency guided data augmentation strategy for better regularization. In: International Conference on Learning Representations (2020)
Le, Y., Yang, X.: Tiny imagenet visual recognition challenge. CS 231N 7, 3 (2015)
Mandal, A., Leavy, S., Little, S.: Dataset diversity: measuring and mitigating geographical bias in image search and retrieval. In: Proceedings of the 1st International Workshop on Trustworthy AI for Multimedia Computing, pp. 19–25 (2021)
Kumar, T., Park, J., Ali, M., Uddin, A., Ko, J., Bae, S.: Binary-classifiers-enabled filters for semi-supervised learning. IEEE Access 9, 167663–167673 (2021)
Kumar, T., Mileo, A., Brennan, R., Bendechache, M.: RSMDA: random slices mixing data augmentation. Appl. Sci. 13, 1711 (2023)
Turab, M., Kumar, T., Bendechache, M., Saber, T.: Investigating multi-feature selection and ensembling for audio classification. arXiv Preprint arXiv:2206.07511 (2022)
Chandio, A., Shen, Y., Bendechache, M., Inayat, I., Kumar, T.: AUDD: audio Urdu digits dataset for automatic audio Urdu digit recognition. Appl. Sci. 11, 8842 (2021)
Kumar, T., Mileo, A., Brennan, R., Bendechache, M.: Image data augmentation approaches: a comprehensive survey and future directions. arXiv Preprint arXiv:2301.02830 (2023)
Kumar, T., Park, J., Ali, M., Uddin, A., Bae, S.: Class specific autoencoders enhance sample diversity. J. Broadcast Eng. 26, 844–854 (2021)
Acknowledgment
This research was supported by Science Foundation Ireland under grant numbers 18/CRT/6223 (SFI Centre for Research Training in Artificial Intelligence), SFI/12/RC/2289/P_2 (Insight SFI Research Centre for Data Analytics), 13/RC/2094/P_2 (Lero SFI Centre for Software) and 13/RC/2106/P_2 (ADAPT SFI Research Centre for AI-Driven Digital Content Technology). For the purpose of Open Access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission.
Copyright information
© 2024 IFIP International Federation for Information Processing
About this paper
Cite this paper
Kumar, T., Mileo, A., Bendechache, M. (2024). KeepOriginalAugment: Single Image-Based Better Information-Preserving Data Augmentation Approach. In: Maglogiannis, I., Iliadis, L., Macintyre, J., Avlonitis, M., Papaleonidas, A. (eds) Artificial Intelligence Applications and Innovations. AIAI 2024. IFIP Advances in Information and Communication Technology, vol 714. Springer, Cham. https://doi.org/10.1007/978-3-031-63223-5_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-63222-8
Online ISBN: 978-3-031-63223-5