Abstract
Software projects evolve continually, driven by new requirements, architecture refactoring, bug fixing, and other needs, usually with the aim of improving quality and performance. Every change made during development is reflected in the source code, which therefore offers an opportunity to explore software evolution. In this paper, we propose a visual analytics system, based on topic modeling, that supports software evolution analysis. We focus on three aspects: (1) when significant changes to the source code occur, (2) how software features evolve, and (3) why software evolution occurs. Each source file is treated as a document and represented by its topic vector. The files of every two successive versions are classified into four types to quantify the differences between versions, and the number of files associated with each topic, termed the topic assignment, is used to characterize feature evolution. Finally, we inspect the causes of software evolution through visual comparison between versions. Two case studies on JavaScript libraries demonstrate the usefulness and effectiveness of our system.
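The topic assignment mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each file's topic vector has already been inferred (for example with latent Dirichlet allocation) and that a file counts as associated with a topic when its probability for that topic reaches a threshold. The threshold rule, the function names `topic_assignment` and `assignment_change`, and the file names in the example are hypothetical and are not taken from the paper; the comparison between versions is simply a per-topic difference of counts.

```python
from typing import Dict, List, Sequence


def topic_assignment(topic_vectors: Dict[str, Sequence[float]],
                     threshold: float = 0.25) -> List[int]:
    """Count, for each topic, the files associated with it in one version.

    ASSUMPTION (hypothetical rule): a file is associated with topic k
    when its probability for topic k is at least `threshold`.
    """
    if not topic_vectors:
        return []
    num_topics = len(next(iter(topic_vectors.values())))
    counts = [0] * num_topics
    for path, vec in topic_vectors.items():
        if len(vec) != num_topics:
            raise ValueError(f"{path}: expected {num_topics} topics, got {len(vec)}")
        for k, prob in enumerate(vec):
            if prob >= threshold:
                counts[k] += 1
    return counts


def assignment_change(prev: Sequence[int], curr: Sequence[int]) -> List[int]:
    """Per-topic change in the number of associated files between two versions."""
    if len(prev) != len(curr):
        raise ValueError("both versions must use the same topic model")
    return [c - p for p, c in zip(prev, curr)]


if __name__ == "__main__":
    # Hypothetical topic vectors for two successive versions (3 topics).
    v1 = {"core.js": [0.7, 0.2, 0.1], "ajax.js": [0.1, 0.8, 0.1]}
    v2 = {"core.js": [0.6, 0.3, 0.1], "ajax.js": [0.1, 0.8, 0.1],
          "event.js": [0.05, 0.15, 0.8]}
    a1, a2 = topic_assignment(v1), topic_assignment(v2)
    print(a1)                        # [1, 1, 0]
    print(a2)                        # [1, 2, 1]
    print(assignment_change(a1, a2)) # [0, 1, 1]
```

Topics whose counts rise or fall sharply between versions would be natural candidates for closer inspection. Assigning each file only to its highest-probability topic is an equally plausible alternative to the threshold rule assumed here.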
Acknowledgements
This work was supported by the National Key Research & Development Program of China (2017YFB0202203) and the National Natural Science Foundation of China (61672452, 61890954, and 61972343).
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Liu, H., Tao, Y., Qiu, Y. et al. Visual exploration of software evolution via topic modeling. J Vis 24, 827–844 (2021). https://doi.org/10.1007/s12650-020-00739-7
DOI: https://doi.org/10.1007/s12650-020-00739-7