DOI: 10.1145/3099023.3099048
research-article

Generating Labeled Datasets of Twitter Users

Published: 09 July 2017

Abstract

In this paper we present a simple yet powerful approach to generating labeled datasets of Twitter users. We focus on sensitive personal details that users share as background information in their tweets. Such details lie outside the focus of the user's attention, and the tweets that contain them tend to be less affected by the humor, wishes, and hypothetical thinking that are typical of tweets.
Our approach combines the selection of search queries with semi-supervised filtering of indicative messages. We create datasets in several unrelated domains and show that a wide range of target groups can be built with minimal manual annotation effort.
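
The two stages named above (search queries, then semi-supervised filtering) can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption for illustration: the regular-expression queries for a hypothetical pet-ownership group, the toy tweets and seed labels, and the use of a naive Bayes self-training loop as the semi-supervised step. It is not the authors' implementation.

```python
import math
import re
from collections import Counter

# Stage 1 (assumed): hypothetical search queries for a pet-ownership group.
QUERIES = [r"\bmy (dog|cat)\b", r"\bwalk(ed|ing)? the dog\b"]

def search(tweets, queries=QUERIES):
    """Return the tweets that match at least one query pattern (case-insensitive)."""
    patterns = [re.compile(q, re.IGNORECASE) for q in queries]
    return [t for t in tweets if any(p.search(t) for p in patterns)]

def tokens(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train_nb(labeled):
    """Fit a multinomial naive Bayes model on (text, label) pairs, label in {0, 1}."""
    counts = {0: Counter(), 1: Counter()}
    docs = Counter()
    for text, label in labeled:
        counts[label].update(tokens(text))
        docs[label] += 1
    vocab = set(counts[0]) | set(counts[1])
    return counts, docs, vocab

def prob_positive(model, text):
    """Posterior P(label = 1 | text), using add-one (Laplace) smoothing."""
    counts, docs, vocab = model
    n_docs = docs[0] + docs[1]
    log_p = {}
    for c in (0, 1):
        denom = sum(counts[c].values()) + len(vocab)
        lp = math.log((docs[c] + 1) / (n_docs + 2))
        for w in tokens(text):
            lp += math.log((counts[c][w] + 1) / denom)
        log_p[c] = lp
    m = max(log_p.values())  # subtract the max for numerical stability
    e0, e1 = math.exp(log_p[0] - m), math.exp(log_p[1] - m)
    return e1 / (e0 + e1)

def self_train(seed, unlabeled, threshold=0.9, rounds=5):
    """Stage 2 (assumed): self-training that adds confidently labeled tweets each round."""
    labeled, pool = list(seed), list(unlabeled)
    for _ in range(rounds):
        model = train_nb(labeled)
        added, rest = [], []
        for t in pool:
            p = prob_positive(model, t)
            if p >= threshold:
                added.append((t, 1))
            elif p <= 1 - threshold:
                added.append((t, 0))
            else:
                rest.append(t)  # still uncertain; retry next round
        if not added:
            break
        labeled.extend(added)
        pool = rest
    return train_nb(labeled)

# Toy usage: label 1 = tweet indicates pet ownership, 0 = it does not (e.g. a wish).
tweets = [
    "Took my dog to the vet today",
    "walked the dog in the rain again",
    "I wish my cat could talk lol",
    "Coffee first, then work",
    "my cat knocked my coffee off the desk",
]
candidates = search(tweets)
seed = [("took my dog to the vet", 1), ("i wish my cat could talk lol", 0)]
model = self_train(seed, candidates)
for t in candidates:
    print(f"{prob_positive(model, t):.2f}  {t}")
```

Self-training is only one way to realize semi-supervised filtering; the 0.9 confidence threshold and five rounds are arbitrary choices that trade recall against the risk of the model reinforcing its own mistakes.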
The generated datasets cover separate groups of users with specific characteristics (pet ownership, blood pressure, diabetes, and psychotropic medication use) for which, to our knowledge, no manually labeled data was previously available.
We also use our search-based approach to generate a cross-domain corpus that matches Twitter users with their Yelp profiles.
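
One way a search-based step could link the two platforms is to look for tweets whose links point to Yelp profile pages. The sketch below assumes tweets are available as hypothetical (screen_name, expanded_url) pairs and that Yelp profile URLs take the form https://www.yelp.com/user_details?userid=<id>; both are assumptions, and the paper's actual matching procedure may differ.

```python
from urllib.parse import urlparse, parse_qs

def yelp_user_id(url):
    """Return the `userid` query parameter of an (assumed) Yelp profile URL, else None."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if not (host == "yelp.com" or host.endswith(".yelp.com")):
        return None
    if parsed.path.rstrip("/") != "/user_details":
        return None
    ids = parse_qs(parsed.query).get("userid")
    return ids[0] if ids else None

def match_users(tweets):
    """Map each Twitter screen name to the set of Yelp user IDs it linked to.

    `tweets` is an iterable of hypothetical (screen_name, expanded_url) pairs.
    """
    matches = {}
    for screen_name, url in tweets:
        uid = yelp_user_id(url)
        if uid is not None:
            matches.setdefault(screen_name, set()).add(uid)
    return matches

pairs = [
    ("alice", "https://www.yelp.com/user_details?userid=abc123"),
    ("bob", "https://www.yelp.com/biz/some-cafe"),
]
print(match_users(pairs))
```

A self-posted profile link is a strong but sparse signal; a real corpus would still need checks against users linking to someone else's profile.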

Cited By

  • (2020) Analyzing history-related posts in twitter. International Journal on Digital Libraries. DOI: 10.1007/s00799-020-00296-2. Online publication date: 28 Oct 2020.

    Published In

    UMAP '17: Adjunct Publication of the 25th Conference on User Modeling, Adaptation and Personalization
    July 2017
    456 pages
    ISBN: 9781450350679
    DOI: 10.1145/3099023
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. corpus creation
    2. twitter queries
    3. user characteristics

    Funding Sources

    • National Science Fund

    Conference

    UMAP '17

    Acceptance Rates

    Overall Acceptance Rate 162 of 633 submissions, 26%

    Article Metrics

    • Downloads (last 12 months): 1
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 01 Jan 2025.
