‘Linguistics-Lite’ Topic Extraction from Multilingual Social Media Data

Peter A. Chew

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9021)

Included in the following conference series:

International Conference on Social Computing, Behavioral-Cultural Modeling, and Prediction

Abstract

To achieve accurate situation assessments and information dominance, the commander needs accurate and rapid insight into the socio-cognitive landscape of his communities of interest. This requires insight into the key actors and groups and their issues and concerns, as well as early indicators of change. Social media (which by its nature is noisy and multilingual) is increasing the amount and type of data available for early assessment of rapidly emerging and changing situations such as disasters or crises. In this paper, we present a way of extracting topics from this kind of data in a principled and scalable fashion, regardless of the mix of languages, subject matter, or provenance of data (e.g. Twitter, VKontakte). Using a non-trivial validation task, we demonstrate that the technique is highly accurate (around 92%). We then show the results of applying the technique to a sample of around 100,000 Twitter posts generally relating to the early-2014 conflict in Ukraine, and explain how these results, or comparable results from applying the technique to other datasets, would enable a busy analyst to quickly gain a top-down understanding of a large set of data and decide where to focus more detailed attention.

Author information

Authors and Affiliations

Galisteo Consulting Group, Inc., 4004 Carlisle Blvd NE, Suite H, Albuquerque, NM, 87107, USA
Peter A. Chew

Corresponding author

Correspondence to Peter A. Chew.

Editor information

Editors and Affiliations

University of Arkansas, Little Rock, Arkansas, USA
Nitin Agarwal
Technicolor Research, Los Altos, California, USA
Kevin Xu
University of Saskatchewan, Saskatoon, Saskatchewan, Canada
Nathaniel Osgood

About this paper

Cite this paper

Chew, P.A. (2015). ‘Linguistics-Lite’ Topic Extraction from Multilingual Social Media Data. In: Agarwal, N., Xu, K., Osgood, N. (eds) Social Computing, Behavioral-Cultural Modeling, and Prediction. SBP 2015. Lecture Notes in Computer Science, vol 9021. Springer, Cham. https://doi.org/10.1007/978-3-319-16268-3_30

DOI: https://doi.org/10.1007/978-3-319-16268-3_30
Published: 17 March 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16267-6
Online ISBN: 978-3-319-16268-3
eBook Packages: Computer Science, Computer Science (R0)
