
Multimodal Event Classification in Social Media

  • Conference paper in: Neural Information Processing (ICONIP 2023)
  • Part of the book series: Communications in Computer and Information Science (CCIS, volume 1967)

Abstract

Research on events currently focuses on event extraction, a fine-grained classification task that aims to extract trigger words and arguments from text. Although some researchers have improved event extraction by additionally constructing external image datasets, these images do not come from the original source of the text and therefore cannot support the detection of real-time events. To detect events in multimodal data on social media, we propose a new multimodal approach that uses text-image pairs for event classification. Our model employs the vision-language pre-trained model CLIP to obtain visual and textual features, and builds a Transformer encoder as a fusion module that enables interaction between the modalities, yielding a strong multimodal joint representation. Experimental results show that the proposed model outperforms several state-of-the-art baselines.
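
To make the described pipeline concrete, here is a minimal PyTorch sketch of the architecture the abstract outlines: CLIP encoders producing per-modality features, followed by a Transformer encoder that fuses the two modalities before classification. The checkpoint name, fusion depth, head count, and the seven-class output are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (not the authors' code): CLIP features + Transformer fusion.
# Assumptions: the openai/clip-vit-base-patch32 checkpoint, 2 fusion layers,
# 8 attention heads, and 7 event classes are illustrative placeholders.
import torch
import torch.nn as nn
from PIL import Image
from transformers import CLIPModel, CLIPProcessor


class MultimodalEventClassifier(nn.Module):
    def __init__(self, clip_name="openai/clip-vit-base-patch32", num_classes=7):
        super().__init__()
        self.clip = CLIPModel.from_pretrained(clip_name)  # shared vision-language encoder
        dim = self.clip.config.projection_dim             # 512 for this checkpoint
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)  # cross-modal fusion
        self.head = nn.Linear(dim, num_classes)           # event-class logits

    def forward(self, input_ids, attention_mask, pixel_values):
        # Pooled, projected features for each modality, shape (B, dim).
        txt = self.clip.get_text_features(input_ids=input_ids,
                                          attention_mask=attention_mask)
        img = self.clip.get_image_features(pixel_values=pixel_values)
        # Treat the two modality vectors as a length-2 sequence so that
        # self-attention in the fusion encoder lets them interact.
        pair = torch.stack([txt, img], dim=1)             # (B, 2, dim)
        fused = self.fusion(pair).mean(dim=1)             # joint representation
        return self.head(fused)


processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
model = MultimodalEventClassifier()
inputs = processor(text=["flooding reported downtown"],
                   images=Image.new("RGB", (224, 224)),  # dummy image for the demo
                   return_tensors="pt", padding=True)
logits = model(**inputs)                                 # shape (1, 7)
```

Stacking the pooled text and image features as a length-2 sequence is one simple way to let self-attention mediate cross-modal interaction; the paper's fusion module may instead operate on full token-level sequences from both encoders.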



Acknowledgements

The authors would like to thank the three anonymous reviewers for their comments on this paper. This research was supported by the National Natural Science Foundation of China (Nos. 62276177 and 61836007) and by a project funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD).

Author information

Corresponding author

Correspondence to Peifeng Li.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Wu, H., Li, P., Wang, Z. (2024). Multimodal Event Classification in Social Media. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1967. Springer, Singapore. https://doi.org/10.1007/978-981-99-8178-6_26

Download citation

  • DOI: https://doi.org/10.1007/978-981-99-8178-6_26

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8177-9

  • Online ISBN: 978-981-99-8178-6

  • eBook Packages: Computer Science, Computer Science (R0)
