
Measuring International Online Human Values with Word Embeddings

Published: 22 December 2021

Abstract

As the Internet grows in number of users and in diversity of services, it becomes more influential on people's lives. It can shape or change the opinions, mental perceptions, and values of individuals. What is created and published online reflects people's values and beliefs. As a global platform, the Internet is a rich source of information for researching the online culture of many different countries. In this work we develop a methodology that uses word embedding models to measure textual data from online sources and to build a country-based online human values index that captures cultural traits and values worldwide. We apply the methodology to a dataset of 1.7 billion tweets, whose locations we identify among 59 countries. We create a list of 22 Online Values Inquiries (OVI), each one capturing different questions from the World Values Survey, related to values such as religion, science, and abortion. We observe that our methodology is indeed capable of capturing human values online for different countries and different topics. We also show that some online values are highly correlated (up to c = 0.69, p < 0.05) with the corresponding offline values, especially religion-related ones. Our method is generic, and we believe it is useful for social science specialists, such as demographers and sociologists, who can use their domain knowledge and expertise to create their own Online Values Inquiries and analyze human values in the online environment.
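
The abstract does not spell out how an Online Values Inquiry is scored, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes one embedding model per country, scores an inquiry as the mean cosine similarity of a target word to one set of pole words minus its mean similarity to an opposing set, and compares the per-country scores with survey values using a Pearson correlation. The function names, the toy 2-D vectors, the country labels, and the survey numbers are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def association_score(target, pole_a, pole_b):
    """Mean similarity of target to pole_a minus mean similarity to pole_b.

    Assumed scoring rule for illustration; the paper may define it differently.
    """
    sim_a = sum(cosine(target, w) for w in pole_a) / len(pole_a)
    sim_b = sum(cosine(target, w) for w in pole_b) / len(pole_b)
    return sim_a - sim_b

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-country embeddings (toy 2-D vectors, not real data).
toy_models = {
    "A": {"religion": [1.0, 0.0], "important": [1.0, 0.0], "unimportant": [0.0, 1.0]},
    "B": {"religion": [0.0, 1.0], "important": [1.0, 0.0], "unimportant": [0.0, 1.0]},
    "C": {"religion": [1.0, 1.0], "important": [1.0, 0.0], "unimportant": [0.0, 1.0]},
}

# One online score per country for a single religion-related inquiry.
scores = {
    country: association_score(
        vectors["religion"], [vectors["important"]], [vectors["unimportant"]]
    )
    for country, vectors in toy_models.items()
}

# Made-up offline survey values for the same three countries.
offline = {"A": 0.9, "B": 0.1, "C": 0.5}

countries = sorted(scores)
r = pearson([scores[c] for c in countries], [offline[c] for c in countries])
print(scores, r)
```

In practice the vectors would come from embedding models trained on each country's tweets, and a significance test would accompany the correlation, as the reported p-value suggests.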

Information & Contributors

Information

Published In

ACM Transactions on the Web, Volume 16, Issue 2
May 2022
148 pages
ISSN: 1559-1131
EISSN: 1559-114X
DOI: 10.1145/3506669

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 December 2021
Accepted: 01 November 2021
Revised: 01 October 2021
Received: 01 November 2020
Published in TWEB Volume 16, Issue 2

Author Tags

  1. Internet
  2. culture
  3. values
  4. countries
  5. online social networks
  6. word embeddings

Qualifiers

  • Research-article
  • Refereed

Cited By

  • (2024) Exif2Vec: A Framework to Ascertain Untrustworthy Crowdsourced Images Using Metadata. ACM Transactions on the Web 18:3 (1-27). DOI: 10.1145/3645094. Online publication date: 15-Apr-2024
  • (2024) Comparison of Vocabulary Features Among Multiple Data Sources for Constructing a Knowledge Base on Disaster Information. Technologies and Applications of Artificial Intelligence (139-150). DOI: 10.1007/978-981-97-1711-8_10. Online publication date: 28-Mar-2024
  • (2023) Construction of Makeup Vocabulary Datasets for Scene Search on Makeup Movies (化粧動画の工程検索を指向した化粧語彙セット構築の試み). Journal of Japan Society for Fuzzy Theory and Intelligent Informatics 35:2 (645-654). DOI: 10.3156/jsoft.35.2_645. Online publication date: 15-May-2023
  • (2023) Studying the Political Values in the Digital: The Review of Russian and Foreign Cases. RUDN Journal of Public Administration 10:4 (543-551). DOI: 10.22363/2312-8313-2023-10-4-543-551. Online publication date: 15-Dec-2023
  • (2023) Graph Attention Network for Text Classification and Detection of Mental Disorder. ACM Transactions on the Web 17:3 (1-31). DOI: 10.1145/3572406. Online publication date: 22-May-2023
