DOI: 10.1145/3503161.3551576
research-article

Deeply Exploit Visual and Language Information for Social Media Popularity Prediction

Published: 10 October 2022

Abstract

The social media popularity prediction task aims to predict the future attractiveness of new posts, with applications in online advertising, social recommendation, and demand prediction. Existing methods have explored multiple feature types to model popularity, including user profile, tag, space-time, category, and others. However, the images and texts of social media posts, although they are important and primary information, usually receive only simple or insufficient processing. In this paper, we propose a method to deeply exploit visual and language information to explore the attractiveness of posts. Specifically, images are parsed from multiple perspectives, including multi-modal semantic representation, perceptual image quality, and scene analysis. Different word-level and sentence-level semantic embeddings are extracted from all available language texts, including title, tags, concept, and category. These powerful visual and language representations make social media popularity modeling more reliable. Experimental results demonstrate the effectiveness of exploiting visual and language information with the proposed method, and we achieve new state-of-the-art results on the SMP Challenge at ACM Multimedia 2022.
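
The feature assembly described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_embed` is a hypothetical, hash-based stand-in for pretrained text encoders, and the `image_semantic`, `image_quality`, and `scene_probs` fields are assumed to be precomputed by separate image models that the abstract does not name. The sketch only shows how visual descriptors and word-level and sentence-level text embeddings might be concatenated into one vector per post before being passed to a regressor.

```python
import hashlib
from typing import Dict, List

DIM = 8  # toy embedding size (assumption); real encoders are much wider


def toy_embed(text: str, dim: int = DIM) -> List[float]:
    """Hypothetical stand-in for a pretrained text encoder.

    Maps a string to a deterministic vector with values in [0, 1].
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]


def word_level(text: str) -> List[float]:
    """Word-level view: average the toy embeddings of the individual words."""
    words = text.split()
    if not words:
        return [0.0] * DIM
    vecs = [toy_embed(w) for w in words]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def sentence_level(text: str) -> List[float]:
    """Sentence-level view: embed the whole string at once."""
    return toy_embed(text) if text else [0.0] * DIM


def build_features(post: Dict) -> List[float]:
    """Concatenate visual and language features into one vector."""
    feats: List[float] = []
    # Visual side: semantic embedding, quality score, scene probabilities
    # (all assumed to be precomputed elsewhere).
    feats.extend(post["image_semantic"])
    feats.append(post["image_quality"])
    feats.extend(post["scene_probs"])
    # Language side: both views for every available text field.
    for field in ("title", "tags", "concept", "category"):
        text = post.get(field, "")
        feats.extend(word_level(text))
        feats.extend(sentence_level(text))
    return feats


# Made-up example post; every value here is illustrative only.
post = {
    "image_semantic": [0.1] * DIM,
    "image_quality": 0.7,
    "scene_probs": [0.6, 0.3, 0.1],
    "title": "golden hour at the beach",
    "tags": "sunset beach travel",
    "concept": "landscape",
    "category": "travel",
}
features = build_features(post)
print(len(features))  # 76 = (8 + 1 + 3) visual + 4 fields * (8 + 8) text
```

In practice, a vector like this would be fed to a regression model trained on observed popularity scores; the choice of model is outside the scope of this sketch.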

Supplementary Material

MP4 File (MM22-mmgc10.mp4)
Our team won the 2022 ACM Social Media Popularity Prediction Challenge. This video introduces our related paper, "Deeply Exploit Visual and Language Information for Social Media Popularity Prediction." In this paper, we propose a method to deeply exploit the available visual and language information to explore the attractiveness of posts.

Information & Contributors

Information

Published In

MM '22: Proceedings of the 30th ACM International Conference on Multimedia
October 2022
7537 pages
ISBN:9781450392037
DOI:10.1145/3503161
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. popularity prediction
  2. social multimedia
  3. visual prediction

Qualifiers

  • Research-article

Conference

MM '22

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Article Metrics

  • Downloads (last 12 months): 100
  • Downloads (last 6 weeks): 9
Reflects downloads up to 12 Dec 2024

Cited By

  • (2024) Revisiting Vision-Language Features Adaptation and Inconsistency for Social Media Popularity Prediction. Proceedings of the 32nd ACM International Conference on Multimedia, 11464-11469. DOI: 10.1145/3664647.3689000. Online publication date: 28-Oct-2024
  • (2024) Higher-Order Vision-Language Alignment for Social Media Prediction. Proceedings of the 32nd ACM International Conference on Multimedia, 11457-11463. DOI: 10.1145/3664647.3688999. Online publication date: 28-Oct-2024
  • (2024) Dual-Stream Pre-Training Transformer to Enhance Multimodal Learning for Social Media Prediction. Proceedings of the 32nd ACM International Conference on Multimedia, 11450-11456. DOI: 10.1145/3664647.3688998. Online publication date: 28-Oct-2024
  • (2024) MMF: Winning Solution to Social Media Popularity Prediction Challenge 2024. Proceedings of the 32nd ACM International Conference on Multimedia, 11445-11449. DOI: 10.1145/3664647.3688997. Online publication date: 28-Oct-2024
  • (2024) Cross-Class Domain Adaptive Semantic Segmentation with Visual Language Models. Proceedings of the 32nd ACM International Conference on Multimedia, 5005-5014. DOI: 10.1145/3664647.3681122. Online publication date: 28-Oct-2024
  • (2024) MCDAN: A Multi-Scale Context-Enhanced Dynamic Attention Network for Diffusion Prediction. IEEE Transactions on Multimedia, Vol. 26, 7850-7862. DOI: 10.1109/TMM.2024.3372371. Online publication date: 1-Mar-2024
  • (2023) SMP Challenge: An Overview and Analysis of Social Media Prediction Challenge. Proceedings of the 31st ACM International Conference on Multimedia, 9651-9655. DOI: 10.1145/3581783.3613853. Online publication date: 26-Oct-2023
  • (2023) Double-Fine-Tuning Multi-Objective Vision-and-Language Transformer for Social Media Popularity Prediction. Proceedings of the 31st ACM International Conference on Multimedia, 9462-9466. DOI: 10.1145/3581783.3612845. Online publication date: 26-Oct-2023
  • (2023) Gradient Boost Tree Network based on Extensive Feature Analysis for Popularity Prediction of Social Posts. Proceedings of the 31st ACM International Conference on Multimedia, 9451-9455. DOI: 10.1145/3581783.3612843. Online publication date: 26-Oct-2023
  • (2023) Enhanced CatBoost with Stacking Features for Social Media Prediction. Proceedings of the 31st ACM International Conference on Multimedia, 9430-9435. DOI: 10.1145/3581783.3612839. Online publication date: 26-Oct-2023
