DOI: 10.1145/2808797.2809328
Research article

Twitter Population Sample Bias and its impact on predictive outcomes: a case study on elections

Published: 25 August 2015

Abstract

In recent years, considerable effort has been spent analyzing online social network data to understand how real-world phenomena are reflected in the "virtual" world. Twitter is by far the most widely used network in these studies, given its policy of public data availability. However, there is still much debate about whether the available data suffice for user characterization or event outcome prediction, and about which pitfalls researchers usually fail to account for. In this direction, we propose a new methodology for drawing representative samples from Twitter data, divided into four phases: (i) user filtering, (ii) user demographic characterization, (iii) user sampling, and (iv) event prediction. The methodology is tested on a common scenario in Twitter event outcome prediction, elections: specifically, municipal elections in six Brazilian cities, with predictions compared against the official election results. The results show that the topic is worth further investigation, but that a very high number of messages is required to match real data distributions.
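The abstract names the four phases without detailing them. The sketch below illustrates one plausible shape of phases (iii) and (iv) using plain stratified sampling without replacement; it is not the authors' implementation. The `User` record, the stratum labels, the target proportions (assumed to come from an external source such as census data), and the use of mean absolute error against official results are all hypothetical choices made for illustration.

```python
# Hypothetical sketch of phases (iii) user sampling and (iv) event prediction.
# Assumptions, not taken from the paper: every user already has a demographic
# stratum label (phase ii) and an inferred candidate preference; target stratum
# proportions come from an external source such as census data.
import random
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: str
    stratum: str    # hypothetical demographic label, e.g. "young" or "old"
    candidate: str  # hypothetical inferred vote intention


def stratified_sample(users, target_props, n, seed=0):
    """Draw roughly n users, without replacement, so that each stratum's share
    follows target_props (a mapping stratum -> proportion summing to 1)."""
    rng = random.Random(seed)
    by_stratum = {}
    for user in users:
        by_stratum.setdefault(user.stratum, []).append(user)
    sample = []
    for stratum, prop in target_props.items():
        pool = by_stratum.get(stratum, [])
        # A stratum cannot contribute more users than were collected for it,
        # so scarce strata cap how closely the target can be matched.
        k = min(round(prop * n), len(pool))
        sample.extend(rng.sample(pool, k))
    return sample


def vote_shares(users):
    """Predicted vote share per candidate: fraction of users preferring them."""
    counts = Counter(user.candidate for user in users)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cand: k / total for cand, k in counts.items()}


def mean_absolute_error(predicted, official):
    """Average absolute gap between predicted and official shares per candidate."""
    candidates = set(predicted) | set(official)
    if not candidates:
        return 0.0
    return sum(abs(predicted.get(c, 0.0) - official.get(c, 0.0))
               for c in candidates) / len(candidates)


if __name__ == "__main__":
    # Toy, made-up data: the collected users over-represent the "young" stratum.
    users = [User(f"y{i}", "young", "A" if i < 45 else "B") for i in range(60)]
    users += [User(f"o{i}", "old", "A" if i < 2 else "B") for i in range(14)]
    sample = stratified_sample(users, {"young": 0.3, "old": 0.7}, n=20, seed=1)
    print("raw shares:", vote_shares(users))
    print("stratified shares:", vote_shares(sample))
```

Capping each stratum at the number of users actually collected is a deliberate choice: when a demographic group is scarce in the data, the sample cannot reach its target share, which is one way to see why matching real distributions may require very large volumes of messages.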



      Information

      Published In

      ASONAM '15: Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015
      August 2015
      835 pages
      ISBN: 9781450338547
      DOI: 10.1145/2808797
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Qualifiers

      • Research-article
      • Research
      • Refereed limited


      Conference

      ASONAM '15

      Acceptance Rates

      Overall Acceptance Rate 116 of 549 submissions, 21%


      Bibliometrics

      • Downloads (last 12 months): 26
      • Downloads (last 6 weeks): 9
      Reflects downloads up to 19 Dec 2024

      Cited By

      • (2024) Sensing the pulse of the pandemic: unveiling the geographical and demographic disparities of public sentiment toward COVID-19 through social media. Cartography and Geographic Information Science 51:3, 366-384. DOI: 10.1080/15230406.2024.2323489. Online publication date: 21 Mar 2024.
      • (2024) Predicting the demographics of Twitter users with programmatic weak supervision. TOP 32:3, 354-390. DOI: 10.1007/s11750-024-00666-y. Online publication date: 8 Feb 2024.
      • (2023) Two-Layered Machine Learning Approach for Sentiment Analysis of tweets related to Electric Vehicles. 2023 International Conference on Innovations in Engineering and Technology (ICIET), 1-6. DOI: 10.1109/ICIET57285.2023.10220717. Online publication date: 13 Jul 2023.
      • (2023) Incorporating Emotions into Health Mention Classification Task on Social Media. 2023 IEEE International Conference on Big Data (BigData), 4834-4842. DOI: 10.1109/BigData59044.2023.10386330. Online publication date: 15 Dec 2023.
      • (2023) Do I have time to build the ark calmly? Characterizing attitudes towards climate change via sentiment analysis of social media. Journal of Integrative Environmental Sciences 20:1. DOI: 10.1080/1943815X.2023.2264380. Online publication date: 2 Oct 2023.
      • (2023) A critical review of social media research in sensory-consumer science. Food Research International 165, 112494. DOI: 10.1016/j.foodres.2023.112494. Online publication date: Mar 2023.
      • (2023) Filling in the White Space: Spatial Interpolation with Gaussian Processes and Social Media Data. Current Research in Ecological and Social Psychology, 100159. DOI: 10.1016/j.cresp.2023.100159. Online publication date: Oct 2023.
      • (2022) Methods to Establish Race or Ethnicity of Twitter Users: Scoping Review. Journal of Medical Internet Research 24:4, e35788. DOI: 10.2196/35788. Online publication date: 29 Apr 2022.
      • (2022) Political polarization on Twitter during the COVID-19 pandemic: a case study in Brazil. Social Network Analysis and Mining 12:1. DOI: 10.1007/s13278-022-00949-x. Online publication date: 23 Sep 2022.
      • (2021) Geographies of Twitter debates. Journal of Computational Social Science 5:1, 647-663. DOI: 10.1007/s42001-021-00143-7. Online publication date: 12 Sep 2021.
