Abstract
To achieve accurate situation assessments and information dominance, the commander needs accurate and rapid insight into the socio-cognitive landscape of his or her communities of interest. This requires insight into the key actors and groups, and their issues and concerns, as well as early indicators of change. Social media, which is by nature noisy and multilingual, is increasing the amount and variety of data available for early assessment of rapidly emerging and changing situations such as disasters or crises. In this paper, we present a way of extracting topics from this kind of data in a principled and scalable fashion, regardless of the mix of languages, subject matter, or provenance of the data (e.g. Twitter, VKontakte). Using a non-trivial validation task, we demonstrate that the technique is highly accurate (around 92%). We then show the results of applying the technique to a sample of around 100,000 Twitter posts generally relating to the early-2014 conflict in Ukraine, and explain how these results, or comparable results from applying the technique to other datasets, would enable a busy analyst to gain a top-down understanding of a large set of data quickly and help him or her decide where to focus more detailed attention.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Chew, P.A. (2015). ‘Linguistics-Lite’ Topic Extraction from Multilingual Social Media Data. In: Agarwal, N., Xu, K., Osgood, N. (eds) Social Computing, Behavioral-Cultural Modeling, and Prediction. SBP 2015. Lecture Notes in Computer Science, vol 9021. Springer, Cham. https://doi.org/10.1007/978-3-319-16268-3_30
DOI: https://doi.org/10.1007/978-3-319-16268-3_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16267-6
Online ISBN: 978-3-319-16268-3
eBook Packages: Computer Science, Computer Science (R0)