Analyzing topics and authors in chat logs for crime investigation

Abdur Rahman M. A. Basher¹ &
Benjamin C. M. Fung¹

Abstract

Cybercriminals use the Internet to carry out illegitimate activities and to execute catastrophic attacks. Computer-mediated communication, such as online chat, provides an anonymous channel for predators to exploit victims. To prosecute criminals in a court of law, an investigator often needs to extract evidence from a large volume of chat messages. Most existing search tools are keyword-based, so the quality of the retrieved results depends on the search terms the investigator provides. Given the large volume of chat messages and the large number of participants in public chat rooms, the process is often time-consuming and error-prone. This paper presents a topic search model that analyzes archives of chat logs to separate crime-relevant logs from the rest. Specifically, we propose an extension of the Latent Dirichlet Allocation (LDA) model to extract topics, compute the contribution of authors to these topics, and study the transitions of these topics over time. In addition, we present a special model for characterizing author-topic relationships over time. This is crucial for investigation because it shows which authors are involved in which topics, and when. Experiments on two real-life datasets suggest that the proposed approach can discover hidden criminal topics and the distribution of authors over these topics.
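
The abstract does not spell out the extended model, so the following is only a rough illustration of the underlying idea, not the authors' method. It is a minimal sketch that fits plain Latent Dirichlet Allocation with collapsed Gibbs sampling to a few hypothetical chat messages, then averages the per-message topic mixtures by author and by time slice. The toy messages, the author names, the hyperparameters (alpha, beta), and the function names (gibbs_lda, mean_topic_mix) are all assumptions introduced for illustration.

```python
import random


def gibbs_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Plain LDA via collapsed Gibbs sampling (NOT the paper's extended model).

    docs is a list of token lists; returns one topic mixture per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]              # n(d, k)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # n(k, w)
    topic_total = [0] * num_topics                            # n(k)
    assign = []                                               # topic of each token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        row = []
        for w in doc:
            k = rng.randrange(num_topics)
            row.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assign.append(row)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi = word_id[w]
                k = assign[d][i]
                # Remove the current token from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wi] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wi] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wi] += 1
                topic_total[k] += 1

    # Smoothed document-topic proportions; each row sums to 1.
    return [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]


def mean_topic_mix(theta, keys):
    """Average topic mixtures over documents sharing a key (author or slice)."""
    sums, counts = {}, {}
    for vec, key in zip(theta, keys):
        acc = sums.setdefault(key, [0.0] * len(vec))
        for t, p in enumerate(vec):
            acc[t] += p
        counts[key] = counts.get(key, 0) + 1
    return {key: [s / counts[key] for s in acc] for key, acc in sums.items()}


# Hypothetical toy data: (author, time slice, tokens).
messages = [
    ("user_a", 0, "meet park tonight bring cash".split()),
    ("user_b", 0, "game score team win match".split()),
    ("user_a", 1, "cash deal meet tonight park".split()),
    ("user_b", 1, "team match score game win".split()),
]

theta = gibbs_lda([tokens for _, _, tokens in messages], num_topics=2)
author_mix = mean_topic_mix(theta, [author for author, _, _ in messages])
slice_mix = mean_topic_mix(theta, [t for _, t, _ in messages])

print("topic mix per author:", author_mix)
print("topic mix per time slice:", slice_mix)
```

Averaging after the fact treats authorship and time as labels on finished topic mixtures. A model that builds authors and time into the generative process, as the abstract describes, would estimate those quantities jointly instead.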

Acknowledgments

The authors would like to thank the anonymous reviewers for their constructive comments that greatly helped improve this paper. The research is supported in part by research grants from Le Fonds québécois de la recherche sur la nature et les technologies (FQRNT) new researchers start-up program, Concordia ENCS seed funding program, and the National Cyber-Forensics and Training Alliance Canada (NCFTA Canada).

Author information

Authors and Affiliations

Concordia Institute for Information Systems Engineering, Concordia University, Montreal, QC, H3G 1M8, Canada
Abdur Rahman M. A. Basher & Benjamin C. M. Fung

Authors

Abdur Rahman M. A. Basher
Benjamin C. M. Fung

Corresponding author

Correspondence to Benjamin C. M. Fung.

About this article

Cite this article

Basher, A.R.M.A., Fung, B.C.M. Analyzing topics and authors in chat logs for crime investigation. Knowl Inf Syst 39, 351–381 (2014). https://doi.org/10.1007/s10115-013-0617-y

Received: 01 March 2012
Revised: 28 January 2013
Accepted: 17 February 2013
Published: 08 March 2013
Issue Date: May 2014
DOI: https://doi.org/10.1007/s10115-013-0617-y
