WSUAgRobotics/data-aug-multi-modal-llm


Data-Aug-Multi-Modal-LLM

A Comprehensive Survey on Text, Audio, and Image Data Augmentation Using Multi-Modal LLMs for Deep Learning Applications

This repo contains all the relevant papers and information used in our study. It will be updated periodically as we revise our manuscript throughout the publication process.

The papers used in this study are organised by modality below, with links to each paper and its code where available:


Text Data Augmentation

Peer-Reviewed Papers

  1. Ahmed, T., Pai, K. S., Devanbu, P., & Barr, E. Automatic semantic augmentation of language model prompts (for code summarization). Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, 2024. paper code - NOCODE

  2. Cai, X., Xiao, M., Ning, Z., & Zhou, Y. Resolving the imbalance issue in hierarchical disciplinary topic inference via LLM-based data augmentation. 2023 IEEE International Conference on Data Mining Workshops (ICDMW), 2023. paper code - NOCODE

  3. Cloutier, N. A., & Japkowicz, N. Fine-tuned generative LLM oversampling can improve performance over traditional techniques on multiclass imbalanced text classification. 2023 IEEE International Conference on Big Data (BigData), 2023. paper code

  4. Santos, V. G., Santos, G. L., Lynn, T., & Benatallah, B. Identifying citizen-related issues from social media using LLM-based data augmentation. International Conference on Advanced Information Systems Engineering, 2024. paper code

  5. Hu, L., He, H., Wang, D., Zhao, Z., Shao, Y., & Nie, L. LLM vs small model? Large language model-based text augmentation enhanced personality detection model. Proceedings of the AAAI Conference on Artificial Intelligence, 2024. paper code

  6. Hua, J., Cui, X., Li, X., Tang, K., & Zhu, P. Multimodal fake news detection through data augmentation-based contrastive learning. Applied Soft Computing, 2023. paper code

  7. Jung, H., Yeen, H., Lee, J., Kim, M., Bang, N., & Koo, M.-W. Enhancing task-oriented dialog system with subjective knowledge: A large language model-based data augmentation framework. Proceedings of The Eleventh Dialog System Technology Challenge, 2023. paper code

  8. Lai, J., Yang, X., Luo, W., Zhou, L., Li, L., Wang, Y., & Shi, X. RumorLLM: A rumor large language model-based fake-news-detection data-augmentation approach. Applied Sciences, 2024. paper code

  9. Meng, Z., Liu, T., Zhang, H., Feng, K., & Zhao, P. CEAN: Contrastive event aggregation network with LLM-based augmentation for event extraction. Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, 2024. paper code

  10. Silva, K., Frommholz, I., Can, B., Blain, F., Sarwar, R., & Ugolini, L. Forged-GAN-BERT: Authorship attribution for LLM-generated forged novels. Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop, 2024. paper code

  11. Wan, M., Safavi, T., Jauhar, S. K., Kim, Y., Counts, S., Neville, J., Suri, S., Shah, C., White, R. W., Yang, L., & others. TnT-LLM: Text mining at scale with large language models. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024. paper code

  12. Wu, S.-L., Chang, X., Wichern, G., Jung, J.-W., Germain, F., Le Roux, J., & Watanabe, S. Improving audio captioning models with fine-grained audio features, text embedding supervision, and LLM mix-up augmentation. ICASSP 2024 IEEE International Conference on Acoustics, Speech and Signal Processing, 2024. paper code

  13. Zhang, J., Gao, H., Zhang, P., Feng, B., Deng, W., & Hou, Y. LA-UCL: LLM-augmented unsupervised contrastive learning framework for few-shot text classification. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), 2024. paper code

  14. Zhang, M., Jiang, G., Liu, S., Chen, J., & Zhang, M. LLM-assisted data augmentation for Chinese dialogue-level dependency parsing. Computational Linguistics, 2024. paper code

  15. Zhao, H., Chen, H., Ruggles, T. A., Feng, Y., Singh, D., & Yoon, H.-J. Improving text classification with large language model-based data augmentation. Electronics, 2024. paper code


Preprints

  1. Kang, A., Chen, J. Y., Lee-Youngzie, Z., & Fu, S. Synthetic data generation with LLM for improved depression prediction. arXiv preprint arXiv:2411.17672, 2024. paper code

  2. Song, S., Subramanyam, A., Madejski, I., & Grossman, R. L. Lab-RAG: Label boosted retrieval augmented generation for radiology report generation. arXiv preprint arXiv:2411.16523, 2024. paper code

  3. Fischer, L., Gao, Y., Lintner, A., & Ebling, S. SwissADT: An audio description translation system for Swiss languages. arXiv preprint arXiv:2411.14967, 2024. paper code

  4. Glazkova, A., & Zakharova, O. Evaluating LLM prompts for data augmentation in multi-label classification of ecological texts. arXiv preprint arXiv:2411.14896, 2024. paper code

  5. Wen, Z., Guo, D., & Zhang, H. AIDBench: A benchmark for evaluating the authorship identification capability of large language models. arXiv preprint arXiv:2411.13226, 2024. paper code

  6. Alyafeai, Z., et al. Arabic Stable LM: Adapting Stable LM 2 1.6B to Arabic. arXiv preprint arXiv:2412.04277, 2024. paper code

  7. Liu, J., & Nguyen, A. Rephrasing electronic health records for pretraining clinical language models. arXiv preprint arXiv:2411.18940, 2024. paper code

  8. Abane, A., Bekri, A., & Battou, A. FastRAG: Retrieval augmented generation for semi-structured data. arXiv preprint arXiv:2411.13773, 2024. paper code

  9. Yang, M., Shi, B., Le, M., et al. AudioBox TTA-RAG: Improving zero-shot and few-shot text-to-audio with retrieval-augmented generation. arXiv preprint arXiv:2411.05141, 2024. paper code

  10. Fuad, K. A. A., & Chen, L. LLM-Ref: Enhancing reference handling in technical writing with large language models. arXiv preprint arXiv:2411.00294, 2024. paper code

  11. Wang, Z., Xu, G., & Ren, M. LLM-generated natural language meets scaling laws: New explorations and data augmentation methods. arXiv preprint arXiv:2407.00322, 2024. paper code

  12. Dai, H., Liu, Z., & Wu, Z. AugGPT: Leveraging ChatGPT for text data augmentation. arXiv preprint arXiv:2302.13007, 2023. paper code

  13. Cegin, J., Simko, J., & Brusilovsky, P. LLMs vs established text augmentation techniques for classification: When do the benefits outweigh the costs? arXiv preprint arXiv:2408.16502, 2024. paper code

  14. Lee, N., Wattanawong, T., Kim, S., et al. LLM2LLM: Boosting LLMs with novel iterative data enhancement. arXiv preprint arXiv:2403.15042, 2024. paper code

  15. Song, Y., Zhang, J., Tian, Z., et al. LLM-based privacy data augmentation guided by knowledge distillation with a distribution tutor for medical text classification. arXiv preprint arXiv:2402.16515, 2024. paper code

  16. Yang, H., Zhao, X., Huang, S., et al. LATEX-GCL: Large language models (LLMs)-based data augmentation for text-attributed graph contrastive learning. arXiv preprint arXiv:2409.01145, 2024. paper code

  17. Liu, Y., Zhu, Y., Gu, Z., et al. Improving topic relevance model by mix-structured summarization and LLM-based data augmentation. arXiv preprint arXiv:2404.02616, 2024. paper code

  18. Cegin, J., Pecher, B., Simko, J., et al. Use random selection for now: Investigation of few-shot selection strategies in LLM-based text augmentation for classification. arXiv preprint arXiv:2410.10756, 2024. paper code

  19. Jia, K., Wu, Y., & Li, R. Curriculum-style data augmentation for LLM-based metaphor detection. arXiv preprint arXiv:2412.02956, 2024. paper code

  20. Jung, K., Seo, Y., Cho, S., et al. DALDA: Data augmentation leveraging diffusion model and LLM with adaptive guidance scaling. arXiv preprint arXiv:2409.16949, 2024. paper code

  21. Cegin, J., Pecher, B., Simko, J., et al. Effects of diversity incentives on sample diversity and downstream model performance in LLM-based text augmentation. arXiv preprint arXiv:2401.06643, 2024. paper code

  22. Zeng, L. Leveraging large language models for code-mixed data augmentation in sentiment analysis. arXiv preprint arXiv:2411.00691, 2024. paper code

  23. Litake, O., Yagnik, N., & Labhsetwar, S. Inditext boost: Text augmentation for low resource Indian languages. arXiv preprint arXiv:2401.13085, 2024. paper code

  24. Sahu, G., Vechtomova, O., Bahdanau, D., & Laradji, I. H. PromptMix: A class boundary augmentation method for large language model distillation. arXiv preprint arXiv:2310.14192, 2023. paper code

  25. Chowdhury, A. G., & Chadha, A. Generative data augmentation using LLMs improves distributional robustness in question answering. arXiv preprint arXiv:2309.06358, 2023. paper code

  26. Wang, L., Yu, L., Zhang, Y., & Xie, H. Large language model-based augmentation for imbalanced node classification on text-attributed graphs. arXiv preprint arXiv:2410.16882, 2024. paper code


Image Data Augmentation

Peer-Reviewed Papers

  1. Sapkota, R., Meng, Z., & Karkee, M. Synthetic meets authentic: Leveraging LLM generated datasets for YOLO11 and YOLOv10-based apple detection through machine vision sensors. Smart Agricultural Technology, 2024. paper code

  2. Yuan, J., Tang, R., Jiang, X., & Hu, X. Large language models for healthcare data augmentation: An example on patient-trial matching. AMIA Annual Symposium Proceedings, 2023. paper code

  3. Li, H., Chen, B., Chen, J., et al. ITIMCA: Image-text information and cross-attention for multi-modal cassava leaf disease classification based on a novel multi-modal dataset in natural environments. Crop Protection, 2024. paper code

  4. Liu, Y., Zhu, Y., Gu, Z., et al. Enhanced dual contrast representation learning with cell separation and merging for breast cancer diagnosis. Computer Vision and Image Understanding, 2024. paper code

  5. Kirilenko, D., Andreychuk, A., Panov, A. I., & Yakovlev, K. Generative models for grid-based and image-based pathfinding. Artificial Intelligence, 2024. paper code

  6. Jindal, N., Kumaresan, P. K., Ponnusamy, R., et al. MISTRA: Misogyny detection through text–image fusion and representation analysis. Natural Language Processing Journal, 2024. paper code

  7. Li, J., Guan, Z., Wang, J., et al. Integrated image-based deep learning and language models for primary diabetes care. Nature Medicine, 2024. paper code

  8. Liu, F., Zhu, T., Wu, X., et al. A medical multimodal large language model for future pandemics. NPJ Digital Medicine, 2023. paper code

  9. Cortacero, K., McKenzie, B., Müller, S., et al. Evolutionary design of explainable algorithms for biomedical image segmentation. Nature Communications, 2023. paper code

  10. Raminedi, S., Shridevi, S., & Won, D. Multi-modal transformer architecture for medical image analysis and automated report generation. Scientific Reports, 2024. paper code

  11. Wang, Y., Shi, X., & Zhao, X. MLLM4Rec: Multimodal information enhancing LLM for sequential recommendation. Journal of Intelligent Information Systems, 2024. paper code

  12. Bet, M., Mălan, A., Aldinucci, M., et al. DALLMi: Domain adaption for LLM-based multi-label classifier. Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2024. paper code

  13. Sheik, R., Sundara, K. P., & Nirmala, S. J. Neural data augmentation for legal overruling task: Small deep learning models vs. large language models. Neural Processing Letters, 2024. paper code


Preprints

  1. Wu, W., Qiu, X., Song, S., et al. Image augmentation agent for weakly supervised semantic segmentation. arXiv preprint arXiv:2412.20439, 2024. paper code

  2. Qian, R., Yin, X., & Dou, D. Reasoning to attend: Try to understand how <SEG> token works. arXiv preprint arXiv:2412.17741, 2024. paper code

  3. Yin, S., Fu, C., Zhao, S., et al. T2Vid: Translating long text into multi-image is the catalyst for video-LLMs. arXiv preprint arXiv:2411.19951, 2024. paper code

  4. Song, S., Subramanyam, A., Madejski, I., & Grossman, R. L. Lab-RAG: Label boosted retrieval augmented generation for radiology report generation. arXiv preprint arXiv:2411.16523, 2024. paper code

  5. Lingenberg, T., Reuter, M., Sudhakaran, G., et al. DIAGen: Diverse image augmentation with generative models. arXiv preprint arXiv:2408.14584, 2024. paper code

  6. Li, J., Zhang, F., Zhu, J., et al. ForgeryGPT: Multimodal large language model for explainable image forgery detection and localization. arXiv preprint arXiv:2410.10238, 2024. paper code

  7. Sultan, O., Khasin, A., Shiran, G., et al. Visual editing with LLM-based tool chaining: An efficient distillation approach for real-time applications. arXiv preprint arXiv:2410.02952, 2024. paper code

  8. Jin, J., Wang, X., Zhu, Q., et al. Pedestrian attribute recognition: A new benchmark dataset and a large language model augmented framework. arXiv preprint arXiv:2408.09720, 2024. paper code

  9. Hsieh, C., Moreira, C., Nobre, I. B., et al. DALL-M: Context-aware clinical data augmentation with LLMs. arXiv preprint arXiv:2407.08227, 2024. paper code

  10. Liu, J., Huang, X., Zheng, J., et al. MM-Instruct: Generated visual instructions for large multimodal model alignment. arXiv preprint arXiv:2406.19736, 2024. paper code


Audio/Voice Data Augmentation

Peer-Reviewed Papers

  1. Wu, S.-L., Chang, X., Wichern, G., et al. Improving audio captioning models with fine-grained audio features, text embedding supervision, and LLM mix-up augmentation. ICASSP 2024 IEEE International Conference on Acoustics, Speech and Signal Processing, 2024. paper code

  2. Xu, D. AudioSetMix: Enhancing audio-language datasets with LLM-assisted augmentations. arXiv preprint arXiv:2405.11093, 2024. paper code

  3. Dhingra, P., Agrawal, S., Veerappan, C. S., et al. Speech de-identification data augmentation leveraging large language model. IEEE International Conference on Asian Language Processing (IALP), 2024. paper code

  4. Cai, Z., Ghosh, S., Adatia, A. P., et al. AV-Deepfake1M: A large-scale LLM-driven audio-visual deepfake dataset. ACM International Conference on Multimedia, 2024. paper code

  5. Ma, Z., Wu, W., Zheng, Z., et al. Leveraging speech PTM, text LLM, and emotional TTS for speech emotion recognition. ICASSP 2024 IEEE International Conference on Acoustics, Speech and Signal Processing, 2024. paper code

  6. Dhingra, P., Agrawal, S., Veerappan, C. S., et al. Speech de-identification data augmentation leveraging large language model. ICAICTA 2024 11th International Conference on Advanced Informatics: Concept, Theory and Application, 2024. paper code

  7. Heakl, A., Zaghloul, Y., Ali, M., et al. ArzEn-LLM: Code-switched Egyptian Arabic-English translation and speech recognition using LLMs. Procedia Computer Science, 2024. paper code

  8. Hashmi, E., Yayilgan, S. Y., Yamin, M. M., et al. Self-supervised hate speech detection in Norwegian texts with lexical and semantic augmentations. Expert Systems with Applications, 2024. paper code

  9. Xu, F., Zhou, T., Nguyen, T., et al. Integrating augmented reality and LLM for enhanced cognitive support in critical audio communications. International Journal of Human-Computer Studies, 2024. paper code

  10. Cook, A., & Karakuş, O. LLM-Commentator: Novel fine-tuning strategies of large language models for automatic commentary generation using football event data. Knowledge-Based Systems, 2024. paper code

  11. Gkournelos, C., Konstantinou, C., & Makris, S. An LLM-based approach for enabling seamless human-robot collaboration in assembly. CIRP Annals, 2024. paper code

  12. Alier, M., Pereira, J., García-Peñalvo, F. J., et al. LAMB: An open-source software framework to create AI assistants deployed and integrated into LMS. Computer Standards & Interfaces, 2025. paper code

  13. Senthilselvi, A., Prawin, R., et al. Abstractive summarization of YouTube videos using Lamini-Flan-T5 LLM. ICAIT 2024 Second International Conference on Advances in Information Technology, 2024. paper code

  14. Wang, M., Shafran, I., Soltau, H., et al. Retrieval augmented end-to-end spoken dialog models. ICASSP 2024 IEEE International Conference on Acoustics, Speech and Signal Processing, 2024. paper code

  15. Qiu, P., Wu, C., Zhang, X., et al. Towards building multilingual language model for medicine. Nature Communications, 2024. paper code

  16. Hasebe, K., Fujimura, S., Kojima, T., et al. The effect of noise on deep learning for classification of pathological voice. The Laryngoscope, 2024. paper code


Preprints

  1. Xu, D. AudioSetMix: Enhancing audio-language datasets with LLM-assisted augmentations. arXiv preprint arXiv:2405.11093, 2024. paper code

  2. Ghosh, S., Kumar, S., Kong, Z., et al. Synthio: Augmenting small-scale audio classification datasets with synthetic data. arXiv preprint arXiv:2410.02056, 2024. paper code

  3. Whitehouse, C., Choudhury, M., & Aji, A. F. LLM-powered data augmentation for enhanced cross-lingual performance. arXiv preprint arXiv:2305.14288, 2023. paper code

  4. Ghosal, D., Majumder, N., Mehrish, A., & Poria, S. Text-to-audio generation using instruction-tuned LLM and latent diffusion model. arXiv preprint arXiv:2304.13731, 2023. paper code

  5. Goel, A., Kong, Z., Valle, R., & Catanzaro, B. Audio dialogues: Dialogues dataset for audio and music understanding. arXiv preprint arXiv:2404.07616, 2024. paper code

  6. Yang, D., Tian, J., Tan, X., et al. UniAudio: An audio foundation model toward universal audio generation. arXiv preprint arXiv:2310.00704, 2023. paper code

  7. Manco, I., Salamon, J., & Nieto, O. Augment, Drop & Swap: Improving diversity in LLM captions for efficient music-text representation learning. arXiv preprint arXiv:2409.11498, 2024. paper code

  8. Li, B., Xie, Z., Xu, X., et al. DiveSound: LLM-assisted automatic taxonomy construction for diverse audio generation. arXiv preprint arXiv:2407.13198, 2024. paper code

  9. Wang, Z., Tai, Y.-W., & Tang, C.-K. Audio-Agent: Leveraging LLMs for audio generation, editing, and composition. arXiv preprint arXiv:2410.03335, 2024. paper code

  10. Shu, F., Zhang, L., Jiang, H., & Xie, C. Audio-visual LLM for video understanding. arXiv preprint arXiv:2312.06720, 2023. paper code

  11. Lei, Z., Na, X., Xu, M., et al. Contextualization of ASR with LLM using phonetic retrieval-based augmentation. arXiv preprint arXiv:2409.15353, 2024. paper code

  12. Huang, J., Ren, Y., Huang, R., et al. Make-an-audio 2: Temporal-enhanced text-to-audio generation. arXiv preprint arXiv:2305.18474, 2023. paper code

  13. Ok, H., Yoo, S., & Lee, J. AudioBERT: Audio knowledge augmented language model. arXiv preprint arXiv:2409.08199, 2024. paper code

  14. Lu, Y., Xie, Y., Fu, R., et al. Codecfake: An initial dataset for detecting LLM-based deepfake audio. arXiv preprint arXiv:2406.08112, 2024. paper code

  15. Das, N., Dingliwal, S., Ronanki, S., et al. SpeechVerse: A large-scale generalizable audio language model. arXiv preprint arXiv:2405.08295, 2024. paper code

  16. Sridhar, A. K., Guo, Y., & Visser, E. Enhancing temporal understanding in audio question answering for large audio language models. arXiv preprint arXiv:2409.06223, 2024. paper code

  17. Vallaeys, T., Shukor, M., Cord, M., & Verbeek, J. Improved baselines for data-efficient perceptual augmentation of LLMs. arXiv preprint arXiv:2403.13499, 2024. paper code


Citation

If you found our work useful for your research or work, please consider citing it:

Sapkota, R., Raza, S., Shoman, M., Paudel, A. and Karkee, M., 2025. Image, Text, and Speech Data Augmentation using Multimodal LLMs for Deep Learning: A Survey. arXiv preprint arXiv:2501.18648.
