Research article. DOI: 10.1145/2835776.2835784

Towards Modelling Language Innovation Acceptance in Online Social Networks

Published: 08 February 2016

Abstract

Language change and innovation are constant in online and offline communication, and have led to new words entering people's lexicons and even modern-day dictionaries, with recent additions including 'e-cig' and 'vape'. However, the manual work required to identify these 'innovations' is both time-consuming and subjective. In this work we demonstrate how such innovations in language can be identified across two different OSNs (Online Social Networks) through the operationalisation of known language acceptance models that incorporate relatively simple statistical tests. By grounding our work in language theory, we identified three statistical tests that can be applied: variation in frequency, form and meaning. Each shows different success rates across the two networks (a geo-bound Twitter sample and a sample of Reddit). These tests were also applied to different community levels within the two networks, allowing different innovations to be identified across different community structures, for instance regional variation across Twitter and variation across groupings of Subreddits; identified example innovations included 'casualidad' and 'cym'.
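
As a rough illustration of what a test for variation in frequency might look like, the sketch below scores how consistently a term's relative frequency rises across successive time windows. The abstract does not say which statistic the authors use, so the choice of Spearman rank correlation, the function names (`relative_frequencies`, `spearman`, `frequency_trend`) and the toy windows are all assumptions made for illustration, not the paper's method.

```python
from collections import Counter


def relative_frequencies(windows, term):
    """Relative frequency of `term` in each time window (a list of token lists)."""
    freqs = []
    for tokens in windows:
        counts = Counter(tokens)
        total = len(tokens)
        freqs.append(counts[term] / total if total else 0.0)
    return freqs


def _ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    rx, ry = _ranks(xs), _ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # constant series: no trend can be measured
    return cov / (vx * vy) ** 0.5


def frequency_trend(windows, term):
    """Correlation between window index and the term's relative frequency."""
    freqs = relative_frequencies(windows, term)
    return spearman(list(range(len(freqs))), freqs)


# Hypothetical toy data: one token list per time window, oldest first.
windows = [
    "i smoke an e-cig every day".split(),
    "i vape and smoke sometimes".split(),
    "i vape and you vape too".split(),
    "vape vape vape all day".split(),
]

rising = frequency_trend(windows, "vape")
falling = frequency_trend(windows, "smoke")
```

A score near 1 suggests steadily growing use and a score near -1 steadily declining use; a real pipeline would also need a significance threshold and a minimum-count filter, neither of which is specified in the abstract.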

Published In

WSDM '16: Proceedings of the Ninth ACM International Conference on Web Search and Data Mining
February 2016
746 pages
ISBN: 9781450337168
DOI: 10.1145/2835776

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. language change
  2. language evolution
  3. osn
  4. reddit
  5. twitter

Qualifiers

  • Research-article

Conference

WSDM 2016: Ninth ACM International Conference on Web Search and Data Mining
February 22 - 25, 2016
San Francisco, California, USA

Acceptance Rates

WSDM '16 Paper Acceptance Rate: 67 of 368 submissions, 18%
Overall Acceptance Rate: 498 of 2,863 submissions, 17%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 18
  • Downloads (Last 6 weeks): 3
Reflects downloads up to 24 Dec 2024

Citations

Cited By

  • (2024) Detecting emerging vocabulary in a large corpus of Italian tweets. Research in Corpus Linguistics 13:1 (139-170). DOI: 10.32714/ricl.13.01.07. Online publication date: 2024
  • (2023) Can Large Language Models Transform Computational Social Science? Computational Linguistics 50:1 (237-291). DOI: 10.1162/coli_a_00502. Online publication date: 1-Mar-2023
  • (2022) Detecting and categorising lexical innovations in a corpus of tweets. Psychology of Language and Communication 26:1 (313-329). DOI: 10.2478/plc-2022-15. Online publication date: 21-Oct-2022
  • (2021) Data Augmentation for Layperson's Medical Entity Linking Task. Proceedings of the 13th Annual Meeting of the Forum for Information Retrieval Evaluation (99-106). DOI: 10.1145/3503162.3503172. Online publication date: 13-Dec-2021
  • (2021) Beyond action video games: Differences in gameplay and ability preferences among gaming genres. Entertainment Computing 38 (100408). DOI: 10.1016/j.entcom.2021.100408. Online publication date: May-2021
  • (2019) Lexical Emergence on Reddit: An Analysis of Lexical Change on the "Front Page of the Internet". Lexis. DOI: 10.4000/lexis.4917. Online publication date: 2-Sep-2019
  • (2016) Computational sociolinguistics. Computational Linguistics 42:3 (537-593). DOI: 10.1162/COLI_a_00258. Online publication date: 1-Sep-2016
