
Text-Driven Data Augmentation Tool for Synthetic Bird Behavioural Generation

  • Conference paper
  • Published in: Bioinspired Systems for Translational Applications: From Robotics to Social Engineering (IWINAC 2024)

Abstract

Environmental conservation and biodiversity monitoring efforts have been greatly enhanced by the use of deep learning and computer vision technologies, particularly in protected areas such as parks and wildlife reserves. However, although several animal datasets are available, the development of accurate species recognition and behaviour understanding models is limited by the lack of detailed action annotations and the reliance on static images. This paper explores the use of generative models to tackle the challenge of creating comprehensive and diverse synthetic video data for identifying bird species and their behaviour. By utilising and fine-tuning different generative models, this study assesses the feasibility of synthetic video data generation as a way to overcome these limitations. Our work focuses on generating realistic and varied video sequences that can improve machine learning algorithms for bird action detection and species recognition, thereby contributing to the protection and management of natural habitats. Through this investigation, we aim to provide new methodologies and tools for wildlife conservation technology, enhancing the monitoring and safeguarding of bird populations in their natural environments.
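As a rough, hypothetical illustration of the kind of text-driven generation the abstract describes (not the authors' tool or their fine-tuned models), the sketch below produces a short synthetic bird-behaviour clip from a text prompt using an off-the-shelf text-to-video diffusion model through the Hugging Face diffusers library. The checkpoint, prompt, and sampling parameters are assumptions chosen only for demonstration.

```python
# Minimal sketch of text-driven synthetic video generation, assuming the
# diffusers library and a publicly available text-to-video checkpoint.
# This is NOT the paper's pipeline; it only illustrates the general idea.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

# Load a generic text-to-video diffusion model (assumed checkpoint).
pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b",
    torch_dtype=torch.float16,
)
pipe.to("cuda")

# A behaviour-oriented prompt, as the paper's text-driven approach suggests.
prompt = "a European robin pecking at seeds on a mossy forest floor, natural light"

# Generate a short clip; num_frames and num_inference_steps are illustrative.
result = pipe(prompt, num_inference_steps=25, num_frames=16)
frames = result.frames[0]  # recent diffusers versions return a batch of frame sequences

# Write the frames to disk as a video file for use as synthetic training data.
export_to_video(frames, "robin_pecking.mp4", fps=8)
```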

Acknowledgments

We would like to thank the “A way of making Europe” European Regional Development Fund (ERDF) and MCIN/AEI/10.13039/501100011033 for supporting this work under the “CHAN-TWIN” project (grant TED2021-130890B-C21), as well as the HORIZON-MSCA-2021-SE-0 action number 101086387, REMARKABLE (Rural Environmental Monitoring via ultra wide-ARea networKs And distriButed federated Learning). This work has also been supported by a Spanish national grant and two regional grants for PhD studies: FPU21/00414, CIACIF/2021/430 and CIACIF/2022/175. Finally, we would like to thank the University Institute for Computer Research at the UA for its support.

Author information

Corresponding author

Correspondence to David Mulero-Pérez.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Mulero-Pérez, D., Ortiz-Perez, D., Benavent-Lledo, M., Garcia-Rodriguez, J., Azorin-Lopez, J. (2024). Text-Driven Data Augmentation Tool for Synthetic Bird Behavioural Generation. In: Ferrández Vicente, J.M., Val Calvo, M., Adeli, H. (eds) Bioinspired Systems for Translational Applications: From Robotics to Social Engineering. IWINAC 2024. Lecture Notes in Computer Science, vol 14675. Springer, Cham. https://doi.org/10.1007/978-3-031-61137-7_8

  • DOI: https://doi.org/10.1007/978-3-031-61137-7_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-61136-0

  • Online ISBN: 978-3-031-61137-7

  • eBook Packages: Computer Science, Computer Science (R0)
