Abstract
Environmental conservation and biodiversity monitoring have been greatly enhanced by deep learning and computer vision, particularly in protected areas such as parks and wildlife reserves. However, although several animal datasets are available, the development of accurate species recognition and behaviour understanding models is limited by the lack of detailed action annotations and by a reliance on static images. This paper explores the use of generative models to create comprehensive and diverse synthetic video data for identifying bird species and their behaviour. By utilising and fine-tuning several generative models, we assess the feasibility of synthetic video generation as a way to overcome these limitations. Our work focuses on generating realistic and varied video sequences that can improve machine learning algorithms for bird action detection and species recognition, thereby contributing to the protection and management of natural habitats. We aim to provide new methodologies and tools for wildlife conservation technology, enhancing the monitoring and safeguarding of bird populations in their natural environments.
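The text-driven augmentation described above relies on conditioning a generative video model with textual descriptions of species and behaviours. As a minimal illustrative sketch of how such conditioning prompts might be enumerated, the snippet below combines placeholder species, behaviour, and setting phrases into prompt variants; the specific names, template wording, and `build_prompts` helper are assumptions for illustration, not the actual prompt scheme used in the paper.

```python
from itertools import product

# Placeholder vocabularies for illustration only; the paper's actual
# species taxonomy and behaviour labels may differ.
SPECIES = ["mallard", "barn owl", "common kingfisher"]
BEHAVIOURS = ["taking off", "preening its feathers", "feeding on the ground"]
SETTINGS = ["in a wetland reserve", "at dusk in a forest clearing"]

def build_prompts(species, behaviours, settings):
    """Combine species, behaviour and setting phrases into text prompts
    that could condition a text-to-video generative model."""
    return [
        f"A {s} {b} {loc}, wildlife documentary footage"
        for s, b, loc in product(species, behaviours, settings)
    ]

prompts = build_prompts(SPECIES, BEHAVIOURS, SETTINGS)
print(len(prompts))   # 3 species x 3 behaviours x 2 settings = 18 variants
print(prompts[0])
```

Each generated prompt would then be passed to a text-to-video pipeline to synthesise one clip per species-behaviour combination, yielding the action-level diversity that static image datasets lack.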
Acknowledgments
We would like to thank the "A way of making Europe" European Regional Development Fund (ERDF) and MCIN/AEI/10.13039/501100011033 for supporting this work under the "CHAN-TWIN" project (grant TED2021-130890B-C21). This work has also been supported by HORIZON-MSCA-2021-SE-0 action number 101086387, REMARKABLE (Rural Environmental Monitoring via ultra wide-ARea networKs And distriButed federated Learning), as well as by a Spanish national grant and two regional grants for PhD studies: FPU21/00414, CIACIF/2021/430 and CIACIF/2022/175. Finally, we would like to thank the University Institute for Computer Research at the UA for their support.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Mulero-Pérez, D., Ortiz-Perez, D., Benavent-Lledo, M., Garcia-Rodriguez, J., Azorin-Lopez, J. (2024). Text-Driven Data Augmentation Tool for Synthetic Bird Behavioural Generation. In: Ferrández Vicente, J.M., Val Calvo, M., Adeli, H. (eds) Bioinspired Systems for Translational Applications: From Robotics to Social Engineering. IWINAC 2024. Lecture Notes in Computer Science, vol 14675. Springer, Cham. https://doi.org/10.1007/978-3-031-61137-7_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-61136-0
Online ISBN: 978-3-031-61137-7