Abstract
Our study presents a framework for predicting the popularity of image-based social media content that addresses complex image information and the hierarchical structure of the data. We use the Google Cloud Vision API to extract key image and color information from users’ posts, achieving 6.8% higher accuracy than using non-image covariates alone. For prediction, we explore a wide range of models, including Linear Mixed Model, Support Vector Regression, Multi-layer Perceptron, Random Forest, and XGBoost, with linear regression as the benchmark. Our comparative study demonstrates that models capable of capturing the underlying nonlinear interactions between covariates outperform the other methods.
Data availability
The data are not publicly available due to data privacy regulations.
Notes
Of the 4,000 collected posts, those consisting only of video content, with no images, were excluded.
Taking the Python client as an example, the requested features can be configured in the request when an image is submitted to the API. Among these settings, the ‘max results’ value can be increased as desired, e.g., max results = 100.
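A minimal sketch of such a request, assuming the Cloud Vision REST endpoint `images:annotate` with API-key authentication. The helper names `build_request` and `annotate` are hypothetical, and the authors' actual client code is not given. The label limit is set through the `maxResults` field of the `LABEL_DETECTION` feature (in the official `google-cloud-vision` Python client, the corresponding field of `Feature` is `max_results`).

```python
import base64
import json
import urllib.request

# Public REST endpoint of the Cloud Vision API (v1).
ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_request(image_bytes: bytes, max_labels: int = 100) -> dict:
    """Build an images:annotate request body asking for labels and colors."""
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [
                    # 'max results' for labels can be raised as needed.
                    {"type": "LABEL_DETECTION", "maxResults": max_labels},
                    # Image properties carry the dominant colors.
                    {"type": "IMAGE_PROPERTIES"},
                ],
            }
        ]
    }


def annotate(image_bytes: bytes, api_key: str, max_labels: int = 100) -> dict:
    """Send the request (network call; requires a valid API key)."""
    body = json.dumps(build_request(image_bytes, max_labels)).encode("utf-8")
    req = urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Only `annotate` touches the network; `build_request` can be inspected offline to check which features and limits are being asked for.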
Unlike the ‘Detect labels’ function, the number of colors returned cannot be adjusted through the maximum-results setting. Because the ‘dominant colors’ task is not intended to extract every color in an image, the number of returned colors is limited.
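To illustrate what the dominant-color output looks like, the sketch below reads the colors out of one element of the REST `responses` array, assuming the documented JSON layout (`imagePropertiesAnnotation.dominantColors.colors`, each entry with `color`, `score`, and `pixelFraction`). Zero-valued color channels may be omitted from the JSON, so they default to 0 here. The function name `dominant_colors` is hypothetical.

```python
from typing import Dict, List


def dominant_colors(response: Dict) -> List[Dict]:
    """Extract dominant colors from one element of the REST 'responses' array."""
    colors = (
        response.get("imagePropertiesAnnotation", {})
        .get("dominantColors", {})
        .get("colors", [])
    )
    out = []
    for c in colors:
        rgb = c.get("color", {})
        # Channels equal to 0 may be omitted from the JSON, hence the default.
        r, g, b = (int(rgb.get(ch, 0)) for ch in ("red", "green", "blue"))
        out.append({
            "hex": f"#{r:02x}{g:02x}{b:02x}",
            "score": c.get("score", 0.0),
            "pixel_fraction": c.get("pixelFraction", 0.0),
        })
    # Highest-scoring colors first.
    return sorted(out, key=lambda d: d["score"], reverse=True)
```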
Because captions are written in natural language, the caption corpus has a much larger vocabulary than the set of image labels. Hence, we select fewer seed words for image-label topics than for caption topics.
We employ binarization as a denoising technique to address the inherent noise in the topic probability vectors obtained from Seeded-LDA. Because LDA typically assigns non-zero, albeit potentially small, probabilities to all topics for a given document, binarization aims to strengthen the document’s association with its most relevant topics and reduce the influence of less relevant ones. This approach has been shown to improve results in previous research (Hernández-Castañeda & Calvo, 2017), and we observed similar findings in our own experiments.
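The note does not state the binarization threshold, so the following is only a sketch under an assumed rule: a topic is kept when its probability exceeds the uniform level 1/k, and each document's most probable topic is always kept so that no document ends up with an all-zero vector. The function name and the default rule are hypothetical.

```python
import numpy as np
from typing import Optional


def binarize_topics(theta, threshold: Optional[float] = None) -> np.ndarray:
    """Turn document-topic probabilities into 0/1 topic indicators.

    theta: array of shape (n_documents, k). By default a topic is kept when its
    probability exceeds the uniform level 1/k (a hypothetical rule); the
    most probable topic of every document is always kept.
    """
    theta = np.asarray(theta, dtype=float)
    if threshold is None:
        threshold = 1.0 / theta.shape[1]
    out = (theta > threshold).astype(int)
    out[np.arange(theta.shape[0]), theta.argmax(axis=1)] = 1
    return out
```

A fixed threshold or a top-1 rule would be equally valid instances of binarization; the choice only changes how many weak topics survive.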
We use the function col2Munsell from the R package aqp. Details of the conversion method are available on the Rochester Institute of Technology (RIT) website.
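The R function returns Munsell hue, value, and chroma. As a small illustration of how a hue family (such as G or BG) can be read off a converted color, the Python sketch below parses the standard chromatic notation, assuming the three components have been combined into a string like "5BG 5/6". The function name is hypothetical and neutral greys ("N 5/") are not handled.

```python
import re

# Hue step, hue family, value/chroma, e.g. "5BG 5/6" or "7.5YR 6/4".
MUNSELL = re.compile(
    r"^\s*(\d+(?:\.\d+)?)(RP|YR|GY|BG|PB|R|Y|G|B|P)\s+(\d+(?:\.\d+)?)/(\d+(?:\.\d+)?)\s*$"
)


def parse_munsell(notation: str) -> dict:
    """Split a chromatic Munsell notation into hue step, hue family, value, chroma."""
    match = MUNSELL.match(notation)
    if match is None:
        raise ValueError(f"not a chromatic Munsell notation: {notation!r}")
    step, family, value, chroma = match.groups()
    return {
        "hue_step": float(step),
        "hue_family": family,
        "value": float(value),
        "chroma": float(chroma),
    }
```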
In our analysis, \(c=5\) was chosen because it empirically performed well in reducing within-user variance.
‘Reels’ are short videos of less than 60 s, and ‘IGTV’ videos are long videos of less than 60 min.
A video’s ‘thumbnail’ could have been used instead, but because a single thumbnail carries far less information about a post than the video itself, we excluded video content from the analysis.
Note that on the social media platform, images and Reels can be uploaded together in a single post, up to a maximum of 10 items. However, only one video can be uploaded in an IGTV post.
In this case, only the list of users who liked the post is visible. Therefore, we crawled the list of users who ‘Liked’ a post and counted the number of users to calculate the raw number of “Likes”.
Public, N. of Image, N. of Reels, Tagged place, N. of Tagged id, N. of Hashtag, Holiday, Season, Weekdays, Hour, Period, and Time difference.
Each covariate has as many SHAP values as there are observations, i.e., one per observation.
‘Food’ is not ranked among the top 16 covariates. For a full set of mean SHAP values for all covariates and a more detailed discussion on the SHAP values for each covariate, see Appendix D.
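A global ranking such as the top 16 covariates is typically obtained by averaging SHAP values over observations. The sketch below assumes the ranking uses the mean absolute SHAP value per covariate (the usual convention) and takes an (observations × covariates) matrix such as the one returned by `shap.TreeExplainer(model).shap_values(X)` for a tree regressor. The function name is hypothetical.

```python
import numpy as np
from typing import List, Sequence, Tuple


def rank_by_mean_abs_shap(
    shap_values, names: Sequence[str], top: int = 16
) -> List[Tuple[str, float]]:
    """Rank covariates by the mean absolute SHAP value over observations.

    shap_values has shape (n_observations, n_covariates): one value per
    observation for each covariate.
    """
    values = np.asarray(shap_values, dtype=float)
    importance = np.abs(values).mean(axis=0)
    order = np.argsort(-importance, kind="stable")[:top]
    return [(names[i], float(importance[i])) for i in order]
```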
References
Abousaleh, F. S., Cheng, W. H., Yu, N. H., et al. (2021). Multimodal deep learning framework for image popularity prediction on social media. IEEE Transactions on Cognitive and Developmental Systems, 13(3), 679–692. https://doi.org/10.1109/TCDS.2020.3036690
Arapakis, I., Cambazoglu, B. B. & Lalmas, M. (2014). On the feasibility of predicting news popularity at cold start. In: Social Informatics: 6th International Conference, SocInfo 2014, Barcelona, Spain, November 11-13, 2014. Proceedings. Springer, pp 290–299, https://doi.org/10.1007/978-3-319-13734-6_21.
Aryafar, K., Lynch, C. & Attenberg, J. (2014). Exploring user behaviour on etsy through dominant colors. In: 2014 22nd International Conference on Pattern Recognition, pp 1437–1442, https://doi.org/10.1109/ICPR.2014.256.
Breiman, L. (2001). Random forests. Machine Learning, 45, 5–32. https://doi.org/10.1023/A:1010933404324
Caliandro, A., & Anselmi, G. (2021). Affordances-based brand relations: An inquire on memetic brands on instagram. Social Media + Society, 7(2), 20563051211021367. https://doi.org/10.1177/20563051211021367
Chen, J., Song, X. & Nie, L. et al. (2016). Micro tells macro: Predicting the popularity of micro-videos via a transductive model. In: Proceedings of the 24th ACM International Conference on Multimedia. Association for Computing Machinery, MM ’16, p 898–907, https://doi.org/10.1145/2964284.2964314.
Chen, J., Liang, D. & Zhu, Z. et al. (2019). Social media popularity prediction based on visual-textual features with xgboost. In: Proceedings of the 27th ACM International Conference on Multimedia, MM ’19, p 2692–2696, https://doi.org/10.1145/3343031.3356072.
Chen, T. & Guestrin, C. (2016). Xgboost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, KDD ’16, p 785–794, https://doi.org/10.1145/2939672.2939785.
Chen, X., Zhou, X., Chan, J., et al. (2022). Event popularity prediction using influential hashtags from social media. IEEE Transactions on Knowledge and Data Engineering, 34(10), 4797–4811. https://doi.org/10.1109/TKDE.2020.3048428
Chopra, A., Dimri, A. & Rawat, S. (2019). Comparative analysis of statistical classifiers for predicting news popularity on social web. In: 2019 International Conference on Computer Communication and Informatics (ICCCI), pp 1–8, https://doi.org/10.1109/ICCCI.2019.8822230.
De, S., Maity, A. & Goel, V. et al. (2017). Predicting the popularity of instagram posts for a lifestyle magazine using deep learning. In: 2017 2nd International Conference on Communication Systems, Computing and IT Applications (CSCITA), pp 174–177, https://doi.org/10.1109/CSCITA.2017.8066548.
Deza, A. & Parikh, D. (2015). Understanding image virality. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 1818–1826, https://doi.org/10.1109/CVPR.2015.7298791.
Ding, K., Wang, R. & Wang, S. (2019). Social media popularity prediction: A multiple feature fusion approach with deep neural networks. In: Proceedings of the 27th ACM International Conference on Multimedia, MM ’19, p 2682–2686, https://doi.org/10.1145/3343031.3356062.
Fang, J., Liu, L., Hossin, M. A., et al. (2023). Market competition as a moderator of the effect of social signals on viewership in video-sharing platforms. Information Processing & Management, 60(3), 103329. https://doi.org/10.1016/j.ipm.2023.103329
Figueiredo, F. (2013). On the prediction of popularity of trends and hits for user generated videos. In: Proceedings of the Sixth ACM International Conference on Web Search and Data Mining. Association for Computing Machinery, WSDM ’13, p 741–746, https://doi.org/10.1145/2433396.2433489.
Gayberi, M. & Oguducu, S. G. (2020). Popularity prediction of posts in social networks based on user, post and image features. In: Proceedings of the 11th International Conference on Management of Digital EcoSystems. Association for Computing Machinery, MEDES ’19, p 9–15, https://doi.org/10.1145/3297662.3365812.
Gelli, F., Uricchio, T. & Bertini, M. et al. (2015). Image popularity prediction in social media using sentiment and context features. In: Proceedings of the 23rd ACM International Conference on Multimedia. Association for Computing Machinery, MM ’15, p 907–910, https://doi.org/10.1145/2733373.2806361.
He, K., Zhang, X. & Ren, S. et al. (2016). Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), https://doi.org/10.1109/CVPR.2016.90.
He, Z., He, Z. & Wu, J. et al. (2019). Feature construction for posts and users combined with lightgbm for social media popularity prediction. In: Proceedings of the 27th ACM International Conference on Multimedia. Association for Computing Machinery, MM ’19, p 2672–2676, https://doi.org/10.1145/3343031.3356054.
Hernández-Castañeda, Á., & Calvo, H. (2017). Deceptive text detection using continuous semantic space models. Intelligent Data Analysis, 21(3), 679–695. https://doi.org/10.3233/IDA-170882
Hessel, J., Lee, L. & Mimno, D. (2017). Cats and captions vs. creators and the clock: Comparing multimodal content to context in predicting relative popularity. In: Proceedings of the 26th International Conference on World Wide Web, WWW ’17, p 927–936, https://doi.org/10.1145/3038912.3052684.
Hidayati, S. C., Chen, Y. L. & Yang, C. L. et al. (2017). Popularity meter: An influence- and aesthetics-aware social media popularity predictor. In: Proceedings of the 25th ACM International Conference on Multimedia. Association for Computing Machinery, MM ’17, p 1918–1923, https://doi.org/10.1145/3123266.3127903.
Hidayati, S. C., Prayogo, R. B. R. & Karuniawan, S. A. V. et al. (2020). What’s in a caption?: Leveraging caption pattern for predicting the popularity of social media posts. In: 2020 Third International Conference on Vocational Education and Electrical Engineering (ICVEE), pp 1–5, https://doi.org/10.1109/ICVEE50212.2020.9243175.
Hsu, C. C., Kang, L. W. & Lee, C. Y. et al. (2019). Popularity prediction of social media based on multi-modal feature mining. In: Proceedings of the 27th ACM International Conference on Multimedia, MM ’19, p 2687–2691, https://doi.org/10.1145/3343031.3356064.
Huang, F., Chen, J. & Lin, Z. et al. (2018). Random forest exploiting post-related and user-related features for social media popularity prediction. In: Proceedings of the 26th ACM International Conference on Multimedia. Association for Computing Machinery, MM ’18, p 2013–2017, https://doi.org/10.1145/3240508.3266439.
Huang, X., Gao, Y. & Fang, Q. et al. (2017). Towards smp challenge: Stacking of diverse models for social image popularity prediction. In: Proceedings of the 25th ACM International Conference on Multimedia. Association for Computing Machinery, MM ’17, p 1895–1900, https://doi.org/10.1145/3123266.3127899.
Hutto, C., & Gilbert, E. (2014). Vader: A parsimonious rule-based model for sentiment analysis of social media text. Proceedings of the International AAAI Conference on Web and Social Media, 8(1), 216–225. https://doi.org/10.1609/icwsm.v8i1.14550
Jeon, H., Seo, W., Park, E., et al. (2020). Hybrid machine learning approach for popularity prediction of newly released contents of online video streaming services. Technological Forecasting and Social Change, 161, 120303. https://doi.org/10.1016/j.techfore.2020.120303
Kang, P., Lin, Z. & Teng, S. et al. (2019). Catboost-based framework with additional user information for social media popularity prediction. In: Proceedings of the 27th ACM International Conference on Multimedia. Association for Computing Machinery, MM ’19, pp 2677–2681, https://doi.org/10.1145/3343031.3356060.
Keneshloo, Y., Wang, S. & Han, E. H. S. et al. (2016). Predicting the popularity of news articles. In: Proceedings of the 2016 SIAM International Conference on Data Mining (SDM), pp 441–449, https://doi.org/10.1137/1.9781611974348.50.
Ketelaar, P. E., Janssen, L., Vergeer, M., et al. (2016). The success of viral ads: Social and attitudinal predictors of consumer pass-on behavior on social network sites. Journal of Business Research, 69(7), 2603–2613. https://doi.org/10.1016/j.jbusres.2015.10.151
Khosla, A., Das Sarma, A. & Hamid, R. (2014). What makes an image popular? In: Proceedings of the 23rd International Conference on World Wide Web. Association for Computing Machinery, WWW ’14, p 867–876, https://doi.org/10.1145/2566486.2567996.
Laird, N. M., & Ware, J. H. (1982). Random-effects models for longitudinal data. Biometrics, 38(4), 963–974. https://doi.org/10.2307/2529876
Lee, J. G., Moon, S. & Salamatian, K. (2010). An approach to model and predict the popularity of online contents with explanatory factors. In: 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, pp 623–630, https://doi.org/10.1109/WI-IAT.2010.209.
Li, C. T., Shan, M. K., Jheng, S. H., et al. (2016). Exploiting concept drift to predict popularity of social multimedia in microblogs. Information Sciences, 339, 310–331. https://doi.org/10.1016/j.ins.2016.01.009
Li, J., Gao, Y. & Gao, X. et al. (2019). Senti2pop: Sentiment-aware topic popularity prediction on social media. In: 2019 IEEE International Conference on Data Mining (ICDM), pp 1174–1179, https://doi.org/10.1109/ICDM.2019.00143.
Li, J., Li, D. & Xiong, C. et al. (2022). BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In: Proceedings of the 39th International Conference on Machine Learning, vol 162. PMLR, pp 12888–12900, https://proceedings.mlr.press/v162/li22n.html.
Li, Y., & Xie, Y. (2020). Is a picture worth a thousand words? an empirical study of image content and social media engagement. Journal of Marketing Research, 57(1), 1–19. https://doi.org/10.1177/0022243719881113
Lu, B., Ott, M. & Cardie, C. et al. (2011). Multi-aspect sentiment analysis with topic models. In: 2011 IEEE 11th International Conference on Data Mining Workshops, IEEE, pp 81–88, https://doi.org/10.1109/ICDMW.2011.125.
Lundberg, S. M. & Lee, S. I. (2017). A unified approach to interpreting model predictions. In: Advances in Neural Information Processing Systems, vol 30. Curran Associates, Inc., pp 1–10, https://proceedings.neurips.cc/paper/2017/hash/8a20a8621978632d76c43dfd28b67767-Abstract.html.
Lundberg, S. M., Erion, G., Chen, H., et al. (2020). From local explanations to global understanding with explainable ai for trees. Nature Machine Intelligence, 2(1), 56–67. https://doi.org/10.1038/s42256-019-0138-9
Lv, J., Liu, W. & Zhang, M. et al. (2017). Multi-feature fusion for predicting social media popularity. In: Proceedings of the 25th ACM International Conference on Multimedia, MM ’17, p 1883–1888, https://doi.org/10.1145/3123266.3127897.
Machajdik, J. & Hanbury, A. (2010). Affective image classification using features inspired by psychology and art theory. In: Proceedings of the 18th ACM International Conference on Multimedia. Association for Computing Machinery, MM ’10, p 83–92, https://doi.org/10.1145/1873951.1873965.
Marwick, A. E. (2015). Instafame: Luxury selfies in the attention economy. Public Culture, 27(1 (75)), 137–160. https://doi.org/10.1215/08992363-2798379
Mazloom, M., Rietveld, R. & Rudinac, S. et al. (2016). Multimodal popularity prediction of brand-related social media posts. In: Proceedings of the 24th ACM International Conference on Multimedia, MM ’16, p 197–201, https://doi.org/10.1145/2964284.2967210.
Mazloom, M., Pappi, I. & Worring, M. (2018). Category specific post popularity prediction. In: MultiMedia Modeling. Springer International Publishing, pp 594–607, https://doi.org/10.1007/978-3-319-73603-7_48.
McParlane, P. J., Moshfeghi, Y. & Jose, J. M. (2014). Nobody comes here anymore, it’s too crowded; predicting image popularity on flickr. In: Proceedings of International Conference on Multimedia Retrieval. Association for Computing Machinery, ICMR ’14, p 385–391, https://doi.org/10.1145/2578726.2578776.
Nanne, A. J., Antheunis, M. L., van der Lee, C. G., et al. (2020). The use of computer vision to analyze brand-related user generated image content. Journal of Interactive Marketing, 50, 156–167. https://doi.org/10.1016/j.intmar.2019.09.003
Naveed, N., Gottron, T. & Kunegis, J. et al. (2011). Bad news travel fast: A content-based analysis of interestingness on twitter. In: Proceedings of the 3rd International Web Science Conference, WebSci ’11, pp 1–7, https://doi.org/10.1145/2527031.2527052.
Newhall, S. M., Nickerson, D., & Judd, D. B. (1943). Final report of the o.s.a. subcommittee on the spacing of the munsell colors. Journal of the Optical Society of America, 33(7), 385–418. https://doi.org/10.1364/JOSA.33.000385
Noaeen, M. & Far, B. H. (2020). The efficacy of using social media data for designing traffic management systems. In: 2020 4th International Workshop on Crowd-Based Requirements Engineering (CrowdRE), pp 11–17, https://doi.org/10.1109/CrowdRE51214.2020.00009.
Overgoor, G., Mazloom, M. & Worring, M. et al. (2017). A spatio-temporal category representation for brand popularity prediction. In: Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval. Association for Computing Machinery, ICMR ’17, p 233–241, https://doi.org/10.1145/3078971.3078998.
Purba, K. R., Asirvatham, D., & Murugesan, R. K. (2020). An analysis and prediction model of outsiders percentage as a new popularity metric on instagram. ICT Express, 6(3), 243–248. https://doi.org/10.1016/j.icte.2020.07.001
Risius, M., & Beck, R. (2015). Effectiveness of corporate social media activities in increasing relational outcomes. Information & Management, 52(7), 824–839. https://doi.org/10.1016/j.im.2015.06.004
Saeed, R., Abbas, H., Asif, S., et al. (2022). A framework to predict early news popularity using deep temporal propagation patterns. Expert Systems with Applications, 195, 116496. https://doi.org/10.1016/j.eswa.2021.116496
Sanjo, S. & Katsurai, M. (2017). Recipe popularity prediction with deep visual-semantic fusion. In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. Association for Computing Machinery, CIKM ’17, p 2279–2282, https://doi.org/10.1145/3132847.3133137.
Sashi, C., Brynildsen, G., & Bilgihan, A. (2019). Social media, customer engagement and advocacy: An empirical investigation using twitter data for quick service restaurants. International Journal of Contemporary Hospitality Management. https://doi.org/10.1108/IJCHM-02-2018-0108
Shulman, B., Sharma, A. & Cosley, D. (2021). Predictability of popularity: Gaps between prediction and understanding. In: Proceedings of the international AAAI conference on web and social media, pp 348–357, https://doi.org/10.1609/icwsm.v10i1.14748.
Sievert, C. & Shirley, K. (2014). LDAvis: A method for visualizing and interpreting topics. In: Proceedings of the Workshop on Interactive Language Learning, Visualization, and Interfaces, pp 63–70, https://doi.org/10.3115/v1/W14-3110.
Smola, A. J., & Schölkopf, B. (2004). A tutorial on support vector regression. Statistics and Computing, 14, 199–222. https://doi.org/10.1023/B:STCO.0000035301.49549.88
Su, Y., Li, Y., Bai, X., et al. (2020). Predicting the popularity of micro-videos via a feature-discrimination transductive model. Multimedia Systems, 26, 519–534. https://doi.org/10.1007/s00530-020-00660-x
Sulaiman, A., Feizollah, A., Mostafa, M. M., et al. (2023). Profiling the halal food consumer on instagram: integrating image, textual, and social tagging data. Multimedia Tools and Applications, 82(7), 10867–10886. https://doi.org/10.1007/s11042-022-13685-3
Totti, L. C., Costa, F. A. & Avila, S. et al. (2014). The impact of visual attributes on online image diffusion. In: Proceedings of the 2014 ACM Conference on Web Science. Association for Computing Machinery, WebSci ’14, p 42–51, https://doi.org/10.1145/2615569.2615700.
Trzciński, T., & Rokita, P. (2017). Predicting popularity of online videos using support vector regression. IEEE Transactions on Multimedia, 19(11), 2561–2570. https://doi.org/10.1109/TMM.2017.2695439
Wang, J., Yang, S., Zhao, H., et al. (2023). Social media popularity prediction with multimodal hierarchical fusion model. Computer Speech & Language, 80, 101490. https://doi.org/10.1016/j.csl.2023.101490
Wang, Y. (2023). Pictorial map generation based on color extraction and sentiment analysis using sns photos. In: 2023 17th International Conference on Ubiquitous Information Management and Communication (IMCOM), pp 1–8, https://doi.org/10.1109/IMCOM56909.2023.10035582.
Watanabe, K., & Zhou, Y. (2022). Theory-driven analysis of large corpora: Semisupervised topic classification of the un speeches. Social Science Computer Review, 40(2), 346–366. https://doi.org/10.1177/0894439320907027
Wei-ning, W., Ying-lin, Y. & Sheng-ming, J. (2006). Image retrieval by emotional semantics: A study of emotional space and feature extraction. In: 2006 IEEE International Conference on Systems, Man and Cybernetics, pp 3534–3539, https://doi.org/10.1109/ICSMC.2006.384667.
Wu, B., & Shen, H. (2015). Analyzing and predicting news popularity on twitter. International Journal of Information Management, 35(6), 702–711. https://doi.org/10.1016/j.ijinfomgt.2015.07.003
Xie, J., Zhu, Y., & Chen, Z. (2023). Micro-video popularity prediction via multimodal variational information bottleneck. IEEE Transactions on Multimedia, 25, 24–37. https://doi.org/10.1109/TMM.2021.3120537
Yang, Y., Liu, Y., Lu, X., et al. (2020). A named entity topic model for news popularity prediction. Knowledge-Based Systems, 208, 106430. https://doi.org/10.1016/j.knosys.2020.106430
Yu, J., & Egger, R. (2021). Color and engagement in touristic instagram pictures: A machine learning approach. Annals of Tourism Research, 89, 103204. https://doi.org/10.1016/j.annals.2021.103204
Zadeh, A., & Sharda, R. (2022). How can our tweets go viral? point-process modelling of brand content. Information & Management, 59(2), 103594. https://doi.org/10.1016/j.im.2022.103594
Zaman, T., Fox, E. B., & Bradlow, E. T. (2014). A bayesian approach for predicting the popularity of tweets. The Annals of Applied Statistics, 8(3), 1583–1611. https://doi.org/10.1214/14-AOAS741
Zhang, Z., Chen, T. & Zhou, Z. et al. (2018). How to become instagram famous: Post popularity prediction with dual-attention. In: 2018 IEEE International Conference on Big Data (Big Data), pp 2383–2392, https://doi.org/10.1109/BigData.2018.8622461.
Zohourian, A., Sajedi, H. & Yavary, A. (2018). Popularity prediction of images and videos on instagram. In: 2018 4th International Conference on Web Research (ICWR), pp 111–117, https://doi.org/10.1109/ICWR.2018.8387246.
Acknowledgements
Dahyun Jeong and Yunjin Choi were supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT, No. NRF-2022M3J6A1084845).
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix
Sampling process
In this paper, we leverage the text and image data of social media posts to predict the number of “Likes”. To ensure the quality and relevance of the data used in our study, we implemented a careful selection process.
Our sample consists of influencers with public accounts who have uploaded at least five sponsored or advertisement posts among their latest 100 posts, as determined by hashtags such as #AD or #Sponsored. We further restrict the sample to users who had uploaded at least 100 posts at the time of crawling. Rather than relying on public data from random users, which often include individuals who rarely post images or who share low-quality, uninformative content, we take a targeted approach and focus on active influencers. By selecting users who consistently post visually appealing and engaging photos, we aim to capture a sample whose content and audience interactions are more likely to yield meaningful insights for predicting the number of “Likes”. This selection process focuses the analysis on users who upload regularly, post meaningful content, and cover diverse topics to actively engage their audience, which enhances the reliability and validity of our analysis and helps ensure that the data reflect the dynamics and factors influencing user engagement on social media.
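The selection criteria can be written as a small predicate. This is a sketch assuming that sponsored posts are flagged purely by the hashtags #AD and #Sponsored in the caption (matched case-insensitively); the `Profile` record and the function names are hypothetical, and the thresholds are left as parameters.

```python
import re
from dataclasses import dataclass
from typing import List

SPONSOR_TAGS = {"ad", "sponsored"}  # assumption: #AD / #Sponsored, any case
HASHTAG = re.compile(r"#(\w+)")


@dataclass
class Profile:
    is_public: bool
    n_posts: int
    recent_captions: List[str]  # captions of the latest posts, newest first


def is_sponsored(caption: str) -> bool:
    """True if the caption carries one of the sponsorship hashtags."""
    tags = {t.lower() for t in HASHTAG.findall(caption)}
    return bool(tags & SPONSOR_TAGS)


def is_eligible(
    profile: Profile, min_posts: int = 100, min_sponsored: int = 5, window: int = 100
) -> bool:
    """Public account, enough posts, enough sponsored posts in the recent window."""
    if not profile.is_public or profile.n_posts < min_posts:
        return False
    recent = profile.recent_captions[:window]
    return sum(is_sponsored(c) for c in recent) >= min_sponsored
```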
Additionally, we wanted a large number of observations for each user, for several reasons. First, in the popularity prediction task, it is critical to account for individual user effects, because many user-specific unobservable factors, such as engagement habits, content style, creativity, and the user's overall popularity, may substantially influence post popularity. Having enough observations for each user enables us to disentangle these user-specific fixed effects. Second, examining a substantial number of posts per user gives a more comprehensive picture of users’ posting behavior and content patterns. Finally, a larger sample per user helps mitigate the influence of outliers and temporary fluctuations, providing a more accurate representation of each user's typical posting style and engagement level.
Specifically, we conducted the following procedure for sample construction:
1. To select an initial user, we search for the hashtag ‘#Sponsor’ and choose the user who appears at the top of the search feed. Note that only public account users appear in the search feed. This user is assigned as the “parent user”.
2. We extract the users in the “parent user”’s following list. These users are assigned as “children users”.
3. Among the “children users”, we select users who meet the criteria mentioned above: (1) having more than 100 posts and (2) having more than five sponsored/advertisement posts among their most recent 100 posts.
4. The selected users are assigned as new “parent users”.
5. Steps 2, 3, and 4 are repeated.
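The procedure above is a breadth-first snowball sample over following lists. The sketch below assumes two hypothetical callables, `get_following(user)` (returning a user's following list) and `is_eligible(user)` (applying the criteria), and stops once a target number of users has been selected; here the seed user only acts as the first parent and is not itself counted.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, List


def snowball_sample(
    seed_user: Hashable,
    get_following: Callable[[Hashable], Iterable[Hashable]],
    is_eligible: Callable[[Hashable], bool],
    target: int = 40,
) -> List[Hashable]:
    """Breadth-first snowball sampling over following lists."""
    selected = []
    seen = {seed_user}
    parents = deque([seed_user])  # step 1: the seed is the first parent
    while parents and len(selected) < target:
        parent = parents.popleft()
        for child in get_following(parent):  # step 2: children users
            if child in seen:
                continue
            seen.add(child)
            if is_eligible(child):  # step 3: apply the criteria
                selected.append(child)
                parents.append(child)  # step 4: child becomes a parent
                if len(selected) == target:
                    break
    return selected  # step 5 is the loop itself
```

Tracking `seen` prevents revisiting accounts that appear in several following lists, which is common because users follow accounts with similar interests.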
We collected data from 40 users and, to give each selected user the same information weight, crawled the 100 most recent posts of each user. Although we collected 100 posts for every user, the total number of uploaded images varies across users, as each post can contain one to ten images. Data were crawled between 6 February 2022 and 24 March 2022, and the resulting analysis sample consists of 40 users, 3807 posts, and 13,774 images. Of the 4000 collected posts, those consisting only of video content, with no images, were excluded.
A few aspects of the sampling process deserve mention. First, usernames are not acquired in our data collection process; instead, individuals are assigned randomized identification numbers for anonymity and confidentiality. Individual user effects are addressed using a dummy variable for each user. Second, restricting the selection pool to the following list of a previously chosen user offers a realistic and practical approach to data collection, because users tend to follow accounts with similar interests or content, which makes it more likely to identify individuals who actively engage with sponsored posts.
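User dummy variables can be built directly from the randomized IDs. A minimal pandas sketch, assuming a column named `user_id` (hypothetical); `drop_first=True` keeps one user as the reference level so that the dummies are not collinear with a model intercept.

```python
import pandas as pd


def add_user_dummies(posts: pd.DataFrame, user_col: str = "user_id") -> pd.DataFrame:
    """Replace the user ID column with one 0/1 indicator column per user.

    drop_first=True leaves one user as the reference level so the dummies
    are not collinear with a model intercept.
    """
    return pd.get_dummies(
        posts, columns=[user_col], prefix="user", drop_first=True, dtype=float
    )
```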
Seeded-LDA
1.1 The settings of Seeded-LDA
The hyperparameters of LDA are the parameters of its Dirichlet priors: \(\alpha\) parameterizes the document-topic distribution, and \(\beta\) parameterizes the topic-word distribution. Both were tuned over the grid 0.05, 0.1, 0.5, 1, 5, and 10, and the values were selected based on the logarithmic ratio among the coherence measurements. For caption topics, (\(\alpha\), \(\beta\)) was set to (10, 10), and for image-label topics, (10, 0.1) was selected. The seed words for each topic are listed in Table 5.
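The exact coherence criterion is not spelled out, so this sketch assumes a UMass-style coherence, which is a sum of log ratios of document co-occurrence counts, and picks the (\(\alpha\), \(\beta\)) pair on the 6 × 6 grid with the highest mean topic coherence. `fit_topics(alpha, beta)` is a hypothetical wrapper around a Seeded-LDA fit that returns each topic's top words ordered by probability.

```python
import math
from itertools import product
from typing import Callable, List, Sequence, Tuple

GRID = [0.05, 0.1, 0.5, 1, 5, 10]


def umass_coherence(top_words: Sequence[str], docs: Sequence[Sequence[str]]) -> float:
    """UMass coherence: sum over ordered pairs of log((D(w_i, w_j) + 1) / D(w_j)).

    top_words must be ordered by decreasing topic probability, and every
    top word is assumed to occur in at least one document.
    """
    doc_sets = [set(d) for d in docs]

    def doc_freq(*words: str) -> int:
        return sum(all(w in d for w in words) for d in doc_sets)

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            score += math.log(
                (doc_freq(top_words[i], top_words[j]) + 1) / doc_freq(top_words[j])
            )
    return score


def select_alpha_beta(
    fit_topics: Callable[[float, float], List[List[str]]],
    docs: Sequence[Sequence[str]],
    grid: Sequence[float] = GRID,
) -> Tuple[float, float]:
    """Return the (alpha, beta) pair with the highest mean topic coherence."""

    def mean_coherence(pair: Tuple[float, float]) -> float:
        topics = fit_topics(*pair)
        return sum(umass_coherence(t, docs) for t in topics) / len(topics)

    return max(product(grid, grid), key=mean_coherence)
```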
1.2 Sensitivity of the number of topics
Seeded-LDA can be sensitive to the pre-determined number of topics and the choice of seed words. To investigate the impact of these factors, we evaluate the topic diversity of our five predefined topics while varying the number of topics k from 2 to 5. For each value of k, we conduct \(\binom{5}{k}\) experiments.
In each experiment, k seed sets are selected from the pool of five predefined seed sets listed in Table 5, and topic diversity is evaluated using Jaccard similarity. For two word sets \(T_i\) and \(T_j\), the Jaccard similarity is defined as
\[ J(T_i, T_j) = \frac{|T_i \cap T_j|}{|T_i \cup T_j|}. \]
Topic diversity is then calculated by averaging the pairwise Jaccard similarities over all topic pairs \(1 \le i < j \le k\), where k is the number of topics:
\[ \text{Topic diversity} = \frac{1}{\binom{k}{2}} \sum_{1 \le i < j \le k} J(T_i, T_j). \]
Topic diversity measures the extent to which the topics captured by captions and image labels represent diverse aspects of the data and quantifies the distinctiveness between topics.
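The experiment grid can be sketched as follows, with one run per subset of k seed sets, i.e. \(\binom{5}{k}\) runs for each k. `fit_topics(subset)` is a hypothetical wrapper that fits Seeded-LDA with the given seed sets and returns each topic's top words. Note that, as defined, a higher value means more overlap between topics.

```python
from itertools import combinations
from typing import Callable, Dict, Iterable, List, Sequence, Set


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """|A ∩ B| / |A ∪ B| for two word sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def topic_diversity(topics: Sequence[Set[str]]) -> float:
    """Average pairwise Jaccard similarity over all k-choose-2 topic pairs."""
    pairs = list(combinations(topics, 2))
    return sum(jaccard(s, t) for s, t in pairs) / len(pairs)


def diversity_by_k(
    seed_sets: Sequence[Set[str]],
    fit_topics: Callable[[Sequence[Set[str]]], List[Set[str]]],
    ks: Iterable[int] = range(2, 6),
) -> Dict[int, List[float]]:
    """One topic-diversity score per subset of k seed sets, for each k."""
    return {
        k: [topic_diversity(fit_topics(subset)) for subset in combinations(seed_sets, k)]
        for k in ks
    }
```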
Topic diversity values for k = 2 to 5 are shown in Fig. 7. For captions, a larger number of topics results in higher overlap between topics (left panel). This may be because common terms used regularly by users appear in several topics; for example, the term “cosmetic” often appears in both event and sponsorship posts. Image labels, by contrast, consistently exhibit lower overlap, indicating that they capture a broader range of the information embedded in the data (right panel).
1.3 Sensitivity of seed selection
To examine the sensitivity of our results to the chosen seed words, we construct an alternative set of seed words using LDA for comparison, which we refer to as the ‘contrastive set’. We first fit LDA with five topics. Then, for each topic, we identify the words with the highest information gain. The information gain of a word w with respect to a topic T is calculated as
\[ \mathrm{IG}(T, w) = H(T) - H(T \mid w), \]
where H(T) is the entropy of the topic and H(T|w) is the conditional entropy of the topic given the presence of word w. For each topic, we then select the words with the highest information gain as contrastive seed words, removing words that overlap across topics. To match the size of the original seed sets, we keep 20 and 6 seed words per topic for captions and image labels, respectively. The selected contrastive seed word sets are shown in Table 6.
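A sketch of the information-gain computation over documents. It assumes that topic membership is binary, with a document belonging to topic T when T is its highest-probability LDA topic, and that the word variable is the presence or absence of w; both encodings and the function names are assumptions.

```python
from math import log2
from typing import Sequence


def binary_entropy(p: float) -> float:
    """Entropy (bits) of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))


def information_gain(in_topic: Sequence[bool], has_word: Sequence[bool]) -> float:
    """IG(T, w) = H(T) - H(T | w) over documents.

    in_topic[d]: whether document d is assigned to topic T;
    has_word[d]: whether document d contains word w.
    """
    n = len(in_topic)
    h_t = binary_entropy(sum(in_topic) / n)
    h_t_given_w = 0.0
    for present in (True, False):
        group = [t for t, w in zip(in_topic, has_word) if w == present]
        if group:
            h_t_given_w += len(group) / n * binary_entropy(sum(group) / len(group))
    return h_t - h_t_given_w
```

A word that perfectly separates the topic's documents from the rest attains IG = H(T), while a word independent of the topic scores 0.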
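To evaluate the effectiveness of the seed word sets, we assess the coherence of the topics generated with them, using the average Normalized Pointwise Mutual Information (NPMI) as the evaluation measure. The average NPMI of a topic is evaluated across all pairs of representative words within the topic:
\[ \mathrm{NPMI}(T) = \frac{1}{\binom{N}{2}} \sum_{1 \le i < j \le N} \frac{\log \dfrac{P(r_i(T), r_j(T))}{P(r_i(T))\,P(r_j(T))}}{-\log P(r_i(T), r_j(T))}, \]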
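where N, \(r_i(T)\) for \(i = 1, \ldots, N\), and P denote the number of representative words, the \(i\)-th representative word of topic T, and the frequency with which the input arguments appear across the documents, respectively. The representative words \(r_i(T)\) of a topic T are selected based on their high relevance, as proposed by Sievert and Shirley (2014). The relevance \(r(\cdot \mid \cdot)\) is defined as
\[ r(w \mid T) = \lambda \log \phi_{T,w} + (1 - \lambda) \log \frac{\phi_{T,w}}{p_w}, \]
where \(\phi_{T,w}\) is the probability of word w under topic T, \(p_w\) is the marginal probability of w in the corpus, and \(\lambda \in [0, 1]\) is a weight parameter.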
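The two steps, selecting representative words by relevance and scoring them by average NPMI, can be sketched as follows. Probabilities are estimated as document frequencies divided by the number of documents; pairs that never co-occur get the NPMI lower bound of −1. The default \(\lambda = 0.6\) is purely illustrative (the value used in the paper is not stated here), and the function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence


def relevance(phi_tw: float, p_w: float, lam: float = 0.6) -> float:
    """Relevance: lam * log(phi) + (1 - lam) * log(phi / p_w)."""
    return lam * math.log(phi_tw) + (1 - lam) * math.log(phi_tw / p_w)


def representative_words(
    phi_t: Dict[str, float], p: Dict[str, float], n: int, lam: float = 0.6
) -> List[str]:
    """Top-n words of one topic ranked by relevance."""
    return sorted(phi_t, key=lambda w: relevance(phi_t[w], p[w], lam), reverse=True)[:n]


def average_npmi(words: Sequence[str], docs: Sequence[Sequence[str]]) -> float:
    """Average NPMI over all pairs of representative words."""
    doc_sets = [set(d) for d in docs]

    def prob(*ws: str) -> float:
        return sum(all(w in d for w in ws) for d in doc_sets) / len(doc_sets)

    scores = []
    for wi, wj in combinations(words, 2):
        p_ij = prob(wi, wj)
        if p_ij == 0.0:
            scores.append(-1.0)  # never co-occur: NPMI lower bound
        elif p_ij == 1.0:
            scores.append(1.0)  # co-occur in every document: upper bound
        else:
            pmi = math.log(p_ij / (prob(wi) * prob(wj)))
            scores.append(pmi / -math.log(p_ij))
    return sum(scores) / len(scores)
```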
Fig. 8 displays the average NPMI score of each topic achieved by the proposed seed word sets and the contrastive seed word sets. This analysis reveals a similar trend for both our proposed seed word sets and the contrastive sets. This indicates that the topic coherence, as measured by NPMI score, exhibits relatively low sensitivity to the specific choice of seed words in this dataset.
Method settings
1.1 Linear mixed model
In our analysis, covariates that yield a singular fit when modeled in a simple Linear Mixed Model (i.e., a Linear Mixed Model with only one covariate) are given fixed effects only. These covariates are ‘Period’, ‘Weekdays’, ‘Hour’, ‘Season’, ‘Topic (event)’, ‘G (green)’, and ‘BG (blue green)’. In addition to these seven covariates, ‘Time difference’ is also modeled with only a fixed effect owing to computational issues. The remaining covariates are modeled with both random and fixed effects.
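A sketch of how this split could be expressed with statsmodels' `mixedlm`, using user as the grouping factor; statsmodels is a swapped-in choice, as the LMM software is not named here. The snake_case covariate names are hypothetical stand-ins for the covariates listed above, and categorical covariates such as season would need to be wrapped as `C(season)` in a real formula.

```python
from typing import Sequence, Set, Tuple

# Hypothetical snake_case names for the fixed-effect-only covariates above.
FIXED_ONLY = {
    "period", "weekdays", "hour", "season",
    "topic_event", "color_g", "color_bg", "time_difference",
}


def build_formulas(
    response: str, covariates: Sequence[str], fixed_only: Set[str]
) -> Tuple[str, str]:
    """Fixed-effects formula plus a random-effects formula (per-user slopes)."""
    fixed = f"{response} ~ " + " + ".join(covariates)
    random = [c for c in covariates if c not in fixed_only]
    # '~ 1' keeps a random intercept only; otherwise intercept + random slopes.
    re_formula = "~ " + " + ".join(random) if random else "~ 1"
    return fixed, re_formula


def fit_lmm(df, response: str, covariates: Sequence[str], group: str = "user_id"):
    """Fit the mixed model with statsmodels (imported lazily)."""
    import statsmodels.formula.api as smf

    fixed, re_formula = build_formulas(response, covariates, FIXED_ONLY)
    model = smf.mixedlm(fixed, df, groups=df[group], re_formula=re_formula)
    return model.fit()
```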
1.2 Support vector regression
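In our analysis, kernel Support Vector Regression is used to model the structure of the covariates more flexibly. Specifically, the original p-dimensional covariates are mapped into an infinite-dimensional space by the function \(\phi : \mathbb{R}^{p} \rightarrow \mathbb{R}^{\infty}\), where
\[ K(\textbf{x}, \textbf{x}') = \langle \phi(\textbf{x}), \phi(\textbf{x}') \rangle = \exp\left(-\gamma \lVert \textbf{x} - \textbf{x}' \rVert^{2}\right) \tag{1} \]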
and \(\phi (\textbf{x})\)s are used as infinite- dimensional covariates replacing the p-dimensional covariates \(\textbf{x}\). The prediction of y given the covariate vector \(\textbf{x}\) becomes as follows:
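Equations (1) and (2) are not reproduced in this section. Assuming the usual radial (Gaussian) kernel, \(\phi(\textbf{x})^\top\phi(\textbf{x}') = \exp(-\gamma \Vert \textbf{x}-\textbf{x}'\Vert^2)\), which is what e1071's "radial" kernel computes, the prediction reduces to kernel evaluations, \(\hat{f}(\textbf{x}) = \sum_i (\alpha_i - \alpha_i^*) K(\textbf{x}_i, \textbf{x}) + b\), so the infinite-dimensional \(\phi\) never has to be formed. The sketch takes the dual coefficients and intercept as given; in practice they come from the fitted SVR.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2), i.e. phi(a_i)^T phi(b_j)."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    sq_dist = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def svr_predict(X_train, dual_coef, intercept, X_new, gamma):
    """f(x) = sum_i (alpha_i - alpha_i^*) K(x_i, x) + b; phi is never formed."""
    return rbf_kernel(X_new, X_train, gamma) @ np.asarray(dual_coef, float) + intercept
```

Only training points with non-zero dual coefficients (the support vectors) contribute to a prediction.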
1.3 Parameter settings
SVR involves three tuning parameters: \(\epsilon\), C in (2), and \(\gamma\) in (1), written \(\gamma _{\text {svr}}\) below. \(\epsilon\) is the half-width of the band around the regression function within which residuals are not penalized, C is the regularization parameter that weights the residuals exceeding this band, and \(\gamma _{\text {svr}}\) controls the smoothness of the radial kernel.
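The roles of \(\epsilon\) and C can be seen in the \(\epsilon\)-insensitive loss and the standard primal SVR objective \(\tfrac12\Vert w\Vert^2 + C\sum_i \max(0, |y_i - f(\textbf{x}_i)| - \epsilon)\). The following is a minimal sketch of that textbook formulation, which we assume matches (2); in the kernel setting \(\Vert w\Vert^2\) equals \((\alpha-\alpha^*)^\top K (\alpha-\alpha^*)\).

```python
import numpy as np


def eps_insensitive(residuals, eps):
    """Loss is zero for |r| <= eps and grows linearly beyond the band."""
    return np.maximum(np.abs(np.asarray(residuals, float)) - eps, 0.0)


def svr_primal_objective(w_norm_sq, residuals, eps, C):
    """0.5 * ||w||^2 + C * sum_i max(0, |r_i| - eps)."""
    return 0.5 * w_norm_sq + C * eps_insensitive(residuals, eps).sum()
```

A larger \(\epsilon\) widens the band in which residuals cost nothing, while a larger C penalizes residuals outside it more heavily relative to the smoothness term.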
RF and XGB share several common tuning parameters related to their tree structures, namely, the total number of trees (B), maximum depth of a tree (d), fraction of covariates to be used for each split of a tree (v), fraction of subsamples to be used in building each tree (s), and minimum number of observations that each terminal node should contain (m).
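For readers reproducing the setup, the shared tree parameters map onto arguments of the R packages roughly as follows. This lookup is our own reading of the ranger and xgboost documentation, not the authors' code, and two correspondences are approximate: ranger's `mtry` is a number of covariates rather than a fraction, and xgboost's `min_child_weight` is a minimum sum of Hessian weights, which equals the number of observations only under squared-error loss. Treating v as `colsample_bynode` (per split) rather than `colsample_bytree` is an assumption based on the wording above, and the example values are illustrative, not those of Table 7.

```python
# Hypothetical lookup: symbol in the text -> argument name in R's ranger / xgboost.
PARAM_MAP = {
    "B": {"ranger": "num.trees", "xgboost": "nrounds"},
    "d": {"ranger": "max.depth", "xgboost": "max_depth"},
    "v": {"ranger": "mtry", "xgboost": "colsample_bynode"},
    "s": {"ranger": "sample.fraction", "xgboost": "subsample"},
    # Closest analogues of m: node-size limit in ranger, Hessian-weight limit in xgboost.
    "m": {"ranger": "min.node.size", "xgboost": "min_child_weight"},
}


def translate(settings, package, n_covariates):
    """Rename {symbol: value} settings to one package's argument names."""
    out = {}
    for sym, val in settings.items():
        if package == "ranger" and sym == "v":
            val = max(1, int(val * n_covariates))  # ranger's mtry is a count
        out[PARAM_MAP[sym][package]] = val
    return out
```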
XGB has additional tuning parameters \(\rho\) and \(\gamma _{\text {XGB}}\) that control overfitting: \(\rho\) is the learning rate of gradient boosting, and \(\gamma _{\text {XGB}}\) is the minimum loss reduction required to make a tree split. Here, \(\gamma _{\text {XGB}}\) plays the same role as \(\gamma\) described in 4.5, penalizing large trees. The other regularization parameter, \(\lambda\) in 4.5, is set to its default value of 1.
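To see how \(\gamma_{\text{XGB}}\) and \(\lambda\) act, recall the standard XGBoost split gain, \(\tfrac12\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma_{\text{XGB}}\), where G and H are the sums of first- and second-order loss derivatives in each child node. A split is kept only when this gain is positive. The sketch assumes squared-error loss, for which \(g_i = \hat{y}_i - y_i\) and \(h_i = 1\).

```python
import numpy as np


def split_gain(g, h, go_left, lam=1.0, gamma=0.0):
    """Gain of one candidate split:
    0.5 * [G_L^2/(H_L+lam) + G_R^2/(H_R+lam) - G^2/(H+lam)] - gamma.
    The split is worth making only if the returned gain is positive."""
    g, h = np.asarray(g, float), np.asarray(h, float)
    go_left = np.asarray(go_left, bool)

    def leaf_score(G, H):
        return G * G / (H + lam)

    G_L, H_L = g[go_left].sum(), h[go_left].sum()
    G_R, H_R = g[~go_left].sum(), h[~go_left].sum()
    return 0.5 * (leaf_score(G_L, H_L) + leaf_score(G_R, H_R)
                  - leaf_score(G_L + G_R, H_L + H_R)) - gamma
```

For targets (1, 1, 3, 3) with a current prediction of 2 and \(\lambda = 1\), splitting the first two observations from the last two gives a gain of 4/3, so any \(\gamma_{\text{XGB}} > 4/3\) would prune that split.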
MLP involves two hidden-layer parameters: the hidden layer size h and the activation function. We use a hidden layer of size \(h=5\) with the Rectified Linear Unit (ReLU) activation. For optimization, the optimizer and the learning rate \(\rho\) must be chosen. We opt for the Adam optimizer, initializing the learning rate at \(\rho =0.001\) and letting it adapt during training. To ensure a fair comparison with the other methods under cross-validation, we use the full training data as a single batch and train the model for 500 epochs.
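The MLP configuration can be sketched in NumPy: one hidden layer of h = 5 ReLU units, a linear output, mean-squared-error loss, and full-batch Adam with \(\rho = 0.001\) for 500 epochs. This is an illustrative re-implementation, not the authors' keras code. We read "letting it adapt" as Adam's per-parameter step-size adaptation; the weight initialization and the Adam constants (\(\beta_1 = 0.9\), \(\beta_2 = 0.999\), \(10^{-7}\)) are assumptions.

```python
import numpy as np


def train_mlp(X, y, h=5, lr=1e-3, epochs=500, seed=0):
    """One hidden ReLU layer of size h, linear output, MSE loss, full-batch Adam."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    params = {
        "W1": rng.normal(0.0, np.sqrt(2.0 / p), size=(p, h)),  # He-style init (assumption)
        "b1": np.zeros(h),
        "W2": rng.normal(0.0, np.sqrt(1.0 / h), size=(h, 1)),
        "b2": np.zeros(1),
    }
    m_state = {k: np.zeros_like(w) for k, w in params.items()}  # first moments
    v_state = {k: np.zeros_like(w) for k, w in params.items()}  # second moments
    beta1, beta2, eps = 0.9, 0.999, 1e-7
    losses = []
    for t in range(1, epochs + 1):
        # Forward pass on the whole training set: one batch per epoch.
        z1 = X @ params["W1"] + params["b1"]
        a1 = np.maximum(z1, 0.0)
        pred = (a1 @ params["W2"] + params["b2"]).ravel()
        err = pred - y
        losses.append(float(np.mean(err**2)))
        # Backward pass for the mean squared error.
        d_pred = (2.0 / n) * err[:, None]
        d_z1 = (d_pred @ params["W2"].T) * (z1 > 0)
        grads = {"W2": a1.T @ d_pred, "b2": d_pred.sum(axis=0),
                 "W1": X.T @ d_z1, "b1": d_z1.sum(axis=0)}
        # Adam: bias-corrected moment estimates give per-parameter step sizes.
        for k in params:
            m_state[k] = beta1 * m_state[k] + (1 - beta1) * grads[k]
            v_state[k] = beta2 * v_state[k] + (1 - beta2) * grads[k] ** 2
            m_hat = m_state[k] / (1 - beta1**t)
            v_hat = v_state[k] / (1 - beta2**t)
            params[k] = params[k] - lr * m_hat / (np.sqrt(v_hat) + eps)
    return params, losses
```

With full-batch training, each epoch is exactly one parameter update, so 500 epochs correspond to 500 Adam steps.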
The R packages ‘e1071’, ‘ranger’, and ‘xgboost’ are used for SVR, RF, and XGB, respectively, and the Python package ‘keras’ is used for MLP. The tuning parameter settings are presented in Table 7.
TreeSHAP results
Figure 9 displays the mean absolute SHAP values of all covariates in the XGB result. Figure 10 shows the observation-wise SHAP values, where the covariate value of each observation is indicated by a color gradient: the darker the purple, the larger the covariate value.
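The quantities in Figs. 9 and 10 are Shapley values. For a handful of covariates they can be computed exactly by enumerating subsets, \(\phi_j = \sum_{S \subseteq F\setminus\{j\}} \frac{|S|!(p-|S|-1)!}{p!}[v(S\cup\{j\}) - v(S)]\), as in the brute-force sketch below. This is not TreeSHAP: TreeSHAP obtains the values in polynomial time by exploiting the tree structure and the training observations recorded in each tree, whereas here \(v(S)\) simply evaluates the model with the covariates outside S fixed at a single baseline vector, an assumption made for brevity. The cost grows exponentially with the number of covariates.

```python
import itertools
import math

import numpy as np


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x by enumerating all covariate subsets.
    v(S) = f evaluated with covariates in S taken from x and the rest from baseline."""
    x = np.asarray(x, float)
    p = len(x)

    def value(subset):
        z = np.array(baseline, dtype=float)
        if subset:
            idx = list(subset)
            z[idx] = x[idx]
        return f(z)

    phi = np.zeros(p)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            weight = math.factorial(size) * math.factorial(p - size - 1) / math.factorial(p)
            for S in itertools.combinations(others, size):
                phi[j] += weight * (value(S + (j,)) - value(S))
    return phi


def mean_abs_shap(f, X, baseline):
    """Global importance in the style of Fig. 9: mean |SHAP| per covariate."""
    return np.mean([np.abs(shapley_values(f, row, baseline)) for row in X], axis=0)
```

For \(f(\textbf{z}) = 2z_1 + 3z_2z_3\) at \(\textbf{x} = (1,1,1)\) with a zero baseline, the values are (2, 1.5, 1.5): the interaction term is split equally between \(z_2\) and \(z_3\), and the values sum to \(f(\textbf{x}) - f(\textbf{0}) = 5\).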
Figure 11 shows the SHAP dependence plots for the top 16 covariates, ordered by their SHAP-based importance. Each panel is a scatterplot of the SHAP values against the covariate values, with a trend line fitted over all observations.
The top left panel presents the plot for ‘Time Difference’. Because we collected the latest 100 posts of each user, most observations are posts uploaded within a year. The trend line indicates that, over roughly the first 100 days, the contribution of ‘Time Difference’ to the prediction of “Likes” declines as the post gets older. This suggests that users may react less to old posts because they are mainly interested in new posts on social media. Since new posts are constantly uploaded on social media, ‘Time Difference’ is therefore important information for prediction.
Another time-lapse variable, ‘Period’, displayed in the rightmost panel, was also important. Most users in our dataset upload a new post within 250 h (approximately 10 days) of the previous one. In the ‘Period’ plot, the influence on “Likes” is low when the posting cycle is short and tends to increase as the cycle becomes longer. That is, for higher popularity, it may be preferable for users to space their posts at a reasonable interval rather than posting too frequently.
Another variable, ‘N. of Image’, refers to the number of images uploaded in a single post. Its values vary widely, starting from 0 (when only reels are posted). An inverse U-shaped relationship is observed, suggesting that both too few and too many images in a single post are undesirable and that there may be an optimal number of images per post.
Among the Image label topics, the presence of the ‘Body’ and ‘Fashion’ topics in a post was associated with higher, positive SHAP values than their absence, whereas the presence of the ‘Beauty’ topic was associated with negative SHAP values.
Comparison of covariate construction approaches
In this section, we present a comparative analysis of covariate construction methods, using ‘User’ and ‘Time difference’ as baseline covariates. Based on the findings of the previous section, XGBoost is chosen as the prediction model because of its effectiveness in describing image-based social media data. We evaluate our proposed covariate construction approaches against alternative methods for image covariates (image and representative color) and for non-image covariates (caption). In particular, we contrast our approaches with neural network-based methods, which are known to perform strongly but are often difficult to interpret. Previous studies have used sequential multimodal fusion steps, employing neural networks (NNs), to obtain a single overall vector representation of captions or images (e.g., Wang et al. (2023)). The resulting vector encapsulates fused meanings from various multimodal sources, including post captions, image captions, and additional features. However, this fusion compromises the interpretability of the resulting representation. The comparison results are shown in Table 8.
1.1 Image covariate
For the image covariate, we compare the proposed Google Vision API-based approach, denoted ‘Image label topic’, with two methods based on deep convolutional neural networks (DCNNs) and Bootstrapping Language-Image Pre-training (BLIP), referred to as ‘Image class’ and ‘Image caption topic’, respectively. Convolutional neural network-based models demonstrate outstanding performance in image processing. We therefore employ a comparison method that leverages DCNNs and uses their output image categories, following approaches outlined in works such as Huang et al. (2017), Hessel et al. (2017), Chen et al. (2016), and Overgoor et al. (2017). Specifically, we apply the ResNet-50 architecture (He et al., 2016) to each image and extract the category with the highest probability. This yields a 700-dimensional binary vector in which each element indicates one image category: the element corresponding to the image's predicted category is set to 1 and all others are 0. We then sum the binary vectors of the images in a post to form the post-level image covariate. Because our approach extracts word labels from a given image, we also compare against a method that first generates a caption from an image and then applies LDA to the caption, as proposed in Zhang et al. (2018). To generate image captions, we use BLIP (Li et al., 2022), a Transformer-based vision-language pre-training approach.
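The ‘Image class’ covariate described above can be sketched as follows, starting from per-image class probabilities. The classifier itself (ResNet-50 in the text) is not included; the probability matrix, post identifiers, and function name are hypothetical inputs, and the number of classes would be 700 in the setting described above.

```python
import numpy as np


def post_class_counts(probs, post_ids, n_classes):
    """Post-level 'Image class' covariate: sum of one-hot top-1 class indicators.
    probs: (n_images, n_classes) class probabilities from an image classifier;
    post_ids: the post each image belongs to."""
    probs = np.asarray(probs, float)
    top1 = probs.argmax(axis=1)                  # most probable category per image
    onehot = np.zeros((len(top1), n_classes))
    onehot[np.arange(len(top1)), top1] = 1.0     # binary indicator vector per image
    ids = np.asarray(post_ids)
    return {pid: onehot[ids == pid].sum(axis=0) for pid in sorted(set(post_ids))}
```

Each post's vector therefore counts how many of its images fall into each category, which is why a single category per image can oversimplify the content.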
For the color covariate, we compare the prediction performance of Hue-Saturation-Value (HSV) color features, as used in Trzciński and Rokita (2017), with that of the Munsell space employed in this study, referred to as ‘HSV space’ and ‘Munsell space’, respectively. To construct representative colors in the HSV color space, which is commonly used in image analysis tasks, we first convert the RGB values of the image to HSV. The converted HSV values are then assigned to one of eight discrete classes: black/white, blue, cyan, green, yellow, orange, red, and magenta. Each color category is then summarized at the post level in the same manner as detailed in Sect. 3.2.2.
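The HSV-based color classes can be sketched with Python's standard `colorsys` module. The text does not give the class boundaries, so the hue cut-offs and the saturation/value thresholds that define ‘black/white’ below are hypothetical choices for illustration. The post-level summary of Sect. 3.2.2 is not reproduced; the sketch stops at per-image class proportions.

```python
import colorsys
from collections import Counter

# Hypothetical upper hue bounds in degrees; the paper does not list its cut-offs.
HUE_BINS = [(15, "red"), (45, "orange"), (75, "yellow"), (165, "green"),
            (195, "cyan"), (255, "blue"), (345, "magenta"), (360, "red")]


def hsv_class(r, g, b, min_sat=0.15, min_val=0.15):
    """Map an 8-bit RGB pixel to one of the eight discrete HSV color classes."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    if s < min_sat or v < min_val:
        return "black/white"  # achromatic: too grey or too dark (assumed thresholds)
    hue = h * 360
    for upper, name in HUE_BINS:
        if hue < upper:
            return name
    return "red"


def class_proportions(pixels):
    """Share of pixels in each color class for one image."""
    counts = Counter(hsv_class(*px) for px in pixels)
    return {name: c / len(pixels) for name, c in counts.items()}
```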
Table 8 provides the test errors of the feature extraction methods. Among the image content processing methods, the proposed ‘Image label topic’ exhibits the lowest error in both RMSE and MAE, and ‘Image class’ exhibits lower error than ‘Image caption topic’. The results tend to improve as the image content is used more directly. For instance, ‘Image label topic’ uses more direct information than ‘Image class’: whereas ‘Image class’ assigns a single category to an image, which can oversimplify its representation, ‘Image label topic’ uses multiple labels extracted from the image, yielding a more direct and informative representation. In addition, unlike ‘Image class’, ‘Image caption topic’ requires an additional step of converting the image into text. This indirect approach may be less suitable for social media platforms, where images often convey a general impression rather than detailed information. In such contexts, the textual description generated by ‘Image caption topic’ might introduce irrelevant details that obscure the core message of the image and act as noise in covariate construction. These findings suggest that the information carried by image posts on social media is best represented by features extracted directly from the image itself. The two color processing methods, ‘HSV space’ and ‘Munsell space’, show close results, which may reflect the inherent similarity between the HSV and Munsell color spaces.
1.2 Non-image covariate
For the caption covariate, we compare the proposed Seeded-LDA approach with a pre-trained neural network-based model (BERT) and a sentiment analysis algorithm (VADER), referred to as ‘Caption topic’, ‘Caption deep feature’, and ‘Sentiment score’ in Table 8, respectively. BERT is included to represent attention-based methods; specifically, the caption covariates are extracted using BERT, as in Ding et al. (2019). We additionally compare caption features obtained through sentiment analysis with the VADER algorithm (Hutto & Gilbert, 2014), as employed in prior research on popularity prediction (e.g., Keneshloo et al. (2016), Saeed et al. (2022)). VADER outputs four sentiment scores: positive, negative, and neutral scores, each ranging between 0 and 1, and a compound score normalized to range between -1 and 1, with values of larger magnitude indicating stronger sentiment.
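The range of the compound score follows from how VADER builds it. To our understanding, the reference vaderSentiment implementation sums the rule-adjusted lexicon valences of a text and normalizes the sum \(x\) as \(x/\sqrt{x^2+\alpha}\) with \(\alpha = 15\), clipping the result to \([-1, 1]\). The sketch reproduces only that normalization step; the lexicon and valence rules are not included, and the function name is hypothetical.

```python
import math


def vader_compound(valence_sum, alpha=15.0):
    """Normalize a summed valence score into [-1, 1] as x / sqrt(x^2 + alpha)."""
    norm = valence_sum / math.sqrt(valence_sum * valence_sum + alpha)
    return max(-1.0, min(1.0, norm))
```

The mapping is odd and monotone, so it preserves the sign of the summed valence while squashing large sums toward -1 or 1; a sum of 1 maps to 0.25.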
Among the caption processing methods, ‘Caption topic’ achieves better results than both ‘Caption deep feature’ and ‘Sentiment score’. Although both caption topic and caption deep feature represent textual content, caption deep feature has a considerably higher dimensionality (768) than caption topic (5), which may degrade performance. ‘Sentiment score’ also shows limited performance, possibly because sentiment alone cannot capture the richer information that captions often convey.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Jeong, D., Son, H., Choi, Y. et al. Enhancing social media post popularity prediction with visual content. J. Korean Stat. Soc. 53, 844–882 (2024). https://doi.org/10.1007/s42952-024-00270-7