Abstract
Research on events currently focuses mainly on event extraction, a fine-grained classification task that aims to extract trigger words and arguments from text. Although some researchers have improved event extraction by additionally constructing external image datasets, these images do not come from the original source of the text and therefore cannot be used to detect real-time events. To detect events in multimodal social media data, we propose a new multimodal approach that uses text-image pairs for event classification. Our model employs the vision-language pre-trained model CLIP to obtain visual and textual features, and builds a Transformer encoder as a fusion module that enables interaction between the modalities, yielding a strong multimodal joint representation. Experimental results show that the proposed model outperforms several state-of-the-art baselines.
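To make the described architecture concrete, the following is a minimal sketch of a CLIP-plus-Transformer-fusion classifier of the kind the abstract outlines. It assumes the Hugging Face transformers CLIP implementation and PyTorch; the checkpoint, hyperparameters, and names such as MultimodalEventClassifier are illustrative stand-ins, not the authors' released code.

```python
# A minimal sketch, assuming the Hugging Face `transformers` CLIP backbone.
# Class name, hyperparameters, and `num_event_classes` are hypothetical.
import torch
import torch.nn as nn
from transformers import CLIPModel

class MultimodalEventClassifier(nn.Module):
    def __init__(self, num_event_classes: int, d_model: int = 512,
                 num_fusion_layers: int = 2, num_heads: int = 8):
        super().__init__()
        # CLIP provides aligned text and image embeddings (512-d for this checkpoint).
        self.clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=num_heads,
                                           batch_first=True)
        # A Transformer encoder fuses the two modalities via self-attention.
        self.fusion = nn.TransformerEncoder(layer, num_layers=num_fusion_layers)
        self.classifier = nn.Linear(d_model, num_event_classes)

    def forward(self, input_ids, attention_mask, pixel_values):
        text_feat = self.clip.get_text_features(input_ids=input_ids,
                                                attention_mask=attention_mask)
        image_feat = self.clip.get_image_features(pixel_values=pixel_values)
        # Treat each modality's embedding as one token of a two-token sequence,
        # then pool the fused tokens into a joint representation.
        tokens = torch.stack([text_feat, image_feat], dim=1)  # (B, 2, d_model)
        fused = self.fusion(tokens).mean(dim=1)
        return self.classifier(fused)
```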
Acknowledgements
The authors would like to thank the three anonymous reviewers for their comments on this paper. This research was supported by the National Natural Science Foundation of China (Nos. 62276177 and 61836007) and by a project funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wu, H., Li, P., Wang, Z. (2024). Multimodal Event Classification in Social Media. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1967. Springer, Singapore. https://doi.org/10.1007/978-981-99-8178-6_26
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8177-9
Online ISBN: 978-981-99-8178-6
eBook Packages: Computer Science, Computer Science (R0)