research-article

Generating Labeled Datasets of Twitter Users

Authors:

Ivan Koychev

UMAP '17: Adjunct Publication of the 25th Conference on User Modeling, Adaptation and Personalization

Pages 191 - 196

https://doi.org/10.1145/3099023.3099048

Published: 09 July 2017

Abstract

In this paper we present a simple yet powerful approach to generating labeled datasets of Twitter users. We focus on sensitive personal details that are shared as background information in tweets. Because such details are not the focus of the user's attention, these tweets tend to be free of the humor, wishes, and hypothetical thinking that are typical of tweets.

Our approach combines the selection of search queries with a subsequent semi-supervised filtering of indicative messages. We create datasets in several unrelated domains and show that a wide variety of target groups can be built with minimal manual annotation effort.
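The two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: the example tweets, the `dog` query, the seed labels, and the token-voting filter are all assumptions made for illustration, and the filter is a simple self-training loop standing in for whatever semi-supervised method the paper actually uses.

```python
import re
from collections import Counter

# Hypothetical tweets (user id, text); not data from the paper.
TWEETS = [
    ("u1", "Taking my dog to the vet before work again"),
    ("u2", "I wish I had a dog, maybe one day"),
    ("u3", "Walked my dog this morning, then coffee"),
    ("u4", "If I had a dog I would name him Rex"),
    ("u5", "Great football game tonight"),
    ("u6", "Dog hair in my coffee this morning, I give up"),
]

# Assumed search query for the "pet ownership" target group.
QUERY = re.compile(r"\bdog\b", re.IGNORECASE)


def search(tweets, pattern):
    """Stage 1: keep only the tweets that match the search query."""
    return [(user, text) for user, text in tweets if pattern.search(text)]


def tokens(text):
    """Lowercased set of word tokens in a tweet."""
    return set(re.findall(r"[a-z']+", text.lower()))


def self_train(candidates, seed_labels, margin=1, max_rounds=5):
    """Stage 2: grow a small set of manual labels by self-training.

    Each round, every token gets +1 per positive tweet and -1 per negative
    tweet containing it. Unlabeled tweets whose summed token weight reaches
    the margin are labeled and join the training set for the next round.
    """
    texts = dict(candidates)
    labels = dict(seed_labels)
    for _ in range(max_rounds):
        weights = Counter()
        for user, is_positive in labels.items():
            for tok in tokens(texts[user]):
                weights[tok] += 1 if is_positive else -1
        newly_labeled = {}
        for user, text in texts.items():
            if user in labels:
                continue
            score = sum(weights[tok] for tok in tokens(text))
            if abs(score) >= margin:
                newly_labeled[user] = score > 0
        if not newly_labeled:
            break  # nothing confident left; the rest needs manual review
        labels.update(newly_labeled)
    return labels


candidates = search(TWEETS, QUERY)
# Two manually annotated seeds: one indicative tweet, one wishful tweet.
labels = self_train(candidates, {"u1": True, "u2": False})
positives = sorted(user for user, ok in labels.items() if ok)
print(positives)  # ['u1', 'u3', 'u6']
```

In this toy run the tweet from `u6` is ambiguous given only the two seeds and is resolved in the second round, after `u3` and `u4` have been added to the labeled set. That is the sense in which a few manual annotations can be stretched over a larger candidate pool.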

The generated datasets include separate groups of users with specific characteristics: pet ownership, blood pressure, diabetes, and psychotropic medicine usage. To our knowledge, manually labeled data for these groups was not previously available.

We also use our search-based approach to generate a cross-domain corpus that matches Twitter users with their Yelp profiles.

Cited By

Sumikawa Y, Jatowt A (2020). Analyzing history-related posts in twitter. International Journal on Digital Libraries. Online publication date: 28-Oct-2020. https://doi.org/10.1007/s00799-020-00296-2

Index Terms

Generating Labeled Datasets of Twitter Users
1. Social and professional topics
  1. User characteristics

Published In

UMAP '17: Adjunct Publication of the 25th Conference on User Modeling, Adaptation and Personalization

July 2017

456 pages

ISBN:9781450350679

DOI:10.1145/3099023

Editors:
Marko Tkalcic (Free University of Bolzano)
Dhaval Thakker (University of Leeds)
Panagiotis Germanakos (SAP SE & University of Cyprus)
Kalina Yacef (University of Sydney)
Cecile Paris (CSIRO ICT Centre)
Olga Santos (Spanish National University for Distance Education)

General Chairs:
Maria Bielikova (Slovak University of Technology in Bratislava, Slovakia)
Eelco Herder (L3S Research Center, Germany and Radboud Universiteit Nijmegen, The Netherlands)

Program Chairs:
Michel Desmarais (Polytechnique University, Montreal, Canada)
Federica Cena (University of Torino, Italy)
Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

National Science Fund

Conference

UMAP '17

UMAP '17: 25th Conference on User Modeling, Adaptation and Personalization

July 9 - 12, 2017

Bratislava, Slovakia

Acceptance Rates

Overall Acceptance Rate 162 of 633 submissions, 26%
