Topic modeling in software engineering research

Published: 01 November 2021

Abstract

Topic modeling using models such as Latent Dirichlet Allocation (LDA) is a text mining technique that extracts human-readable semantic “topics” (i.e., word clusters) from a corpus of textual documents. In software engineering, topic modeling has been used to analyze textual data in empirical studies (e.g., to find out what developers talk about online), but also to build new techniques that support software engineering tasks (e.g., source code comprehension). Topic modeling needs to be applied carefully (e.g., depending on the type of textual data analyzed and the modeling parameters). Our study describes how topic modeling has been applied in software engineering research, with a focus on four aspects: (1) which topic models and modeling techniques have been applied, (2) which textual inputs have been used for topic modeling, (3) how textual data was “prepared” (i.e., pre-processed) for topic modeling, and (4) how generated topics (i.e., word clusters) were named to give them a human-understandable meaning. We analyzed topic modeling as applied in 111 papers from ten highly-ranked software engineering venues (five journals and five conferences) published between 2009 and 2020. We found that (1) LDA and LDA-based techniques are the most frequent topic modeling techniques, (2) developer communication and bug reports have been modeled most, (3) data pre-processing and modeling parameters vary considerably and are often vaguely reported, and (4) manual topic naming (such as deducing names from frequent words in a topic) is common.
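
To make the pipeline named in the abstract concrete (pre-process text, fit a topic model, inspect frequent words per topic as a basis for manual naming), the following is a minimal sketch, not the procedure of any surveyed paper. It assumes a small collapsed Gibbs sampler for LDA written in plain Python; the example posts, the stop-word list, the function names, and the parameter values (alpha, beta, number of topics, iterations) are all invented for illustration.

```python
import random
from collections import Counter

# Hypothetical stop-word list and developer posts, invented for this sketch.
STOP_WORDS = {"the", "a", "to", "in", "is", "on", "my", "when", "how", "i", "and", "of", "for"}
RAW_POSTS = [
    "How to fix the crash when the app starts?",
    "App crash on startup after update",
    "Unit test fails in the build pipeline",
    "How to run a unit test in the build?",
]

def preprocess(text):
    """Split on whitespace, strip punctuation, lowercase, drop stop words."""
    tokens = (tok.strip(".,?!:;()\"'").lower() for tok in text.split())
    return [t for t in tokens if t and t not in STOP_WORDS]

def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns the final count tables."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]       # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                     # tokens per topic
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word

def top_words(topic_word, n=5):
    """Most frequent words per topic: the raw material for manual naming."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]

def doc_topic_mixture(doc_topic, alpha=0.1):
    """Smoothed per-document topic proportions; each row sums to 1."""
    k = len(doc_topic[0])
    return [[(c + alpha) / (sum(row) + k * alpha) for c in row] for row in doc_topic]

if __name__ == "__main__":
    docs = [preprocess(post) for post in RAW_POSTS]
    doc_topic, topic_word = fit_lda(docs, num_topics=2)
    for k, words in enumerate(top_words(topic_word, n=4)):
        print(f"topic {k}: {words}")
```

A person reading the printed word lists would then assign each topic a name, which is the manual naming step the abstract reports as common. Changing the stop-word list, the number of topics, or alpha and beta can change the resulting word clusters, which is one reason the choice of pre-processing and parameters matters when reporting a study.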

References

[1]
Abdellatif A, Costa D, Badran K, Abdalkareem R, Shihab E (2020) Challenges in Chatbot Development: A Study of Stack Overflow Posts. In: Proceedings of the 17th international conference on mining software repositories., vol 12. IEEE/ACM, Seoul, pp 174–185
[2]
Abdellatif TM, Capretz LF, and Ho DAutomatic recall of software lessons learned for software project managersInf Softw Technol201911544-57https://doi.org/10.1016/j.infsof.2019.07.006
[3]
Aggarwal CC and Zhai CMining text data2012New YorkSpringerhttps://doi.org/10.1007/978-1-4614-3223-4
[4]
Agrawal A, Fu W, and Menzies TWhat is wrong with topic modeling? And how to fix it using search-based software engineeringInf Softw Technol201898January 201774-88https://doi.org/10.1016/j.infsof.2018.02.005
[5]
Ahasanuzzaman M, Asaduzzaman M, Roy CK, and Schneider KACAPS: a supervised technique for classifying Stack Overflow posts concerning API issuesEmpir Softw Eng2019251493-1532https://doi.org/10.1007/s10664-019-09743-4
[6]
Ahmed S, Bagherzadeh M (2018) What do concurrency developers ask about?: A large-scale study using Stack Overflow. In: Proceedings of the international symposium on empirical software engineering and measurement. ACM, Oulu, pp 1–10
[7]
Ali N, Sharafi Z, Guéhéneuc YG, and Antoniol GAn empirical study on the importance of source code entities for requirements traceabilityEmpir Softw Eng2015202442-478https://doi.org/10.1007/s10664-014-9315-y
[8]
Alipour A, Hindle A, Stroulia E (2013) A contextual approach towards more accurate duplicate bug report detection. In: IEEE international working conference on mining software repositories. pp 183–192.
[9]
Altarawy D, Shahin H, Mohammed A, and Meng NLASCAD: Language-agnostic software categorization and similar application detectionJ Syst Softw201814221-34https://doi.org/10.1016/j.jss.2018.04.018
[11]
Asuncion HU, Asuncion AU, Taylor RN (2010) Software traceability with topic modeling. In: Proceedings of the international conference on software engineering. IEEE/ACM, Cape Town, pp 95–104
[12]
Bagherzadeh M, Khatchadourian R (2019) Going big: a large-scale study on what big data developers ask. In: Proceedings of the 27th joint european software engineering conference and symposium on the foundations of software engineering. ACM, Tallinn, pp 432–442
[13]
Bajaj K, Pattabiraman K, Mesbah A (2014) Mining questions asked by web developers. In: Proceedings of the 11th working conference on mining software repositories. ACM, Hyderabad, pp 112–121
[14]
Bajracharya S, Lopes C (2009) Mining search topics from a code search engine usage log. In: Proceedings of the 6th international working conference on mining software repositories. IEEE, Vancouver, pp 111–120
[15]
Bajracharya SK and Lopes CVAnalyzing and mining a code search engine usage logEmpir Softw Eng201217424-466https://doi.org/10.1007/s10664-010-9144-6
[16]
Barua A, Thomas SW, and Hassan AEWhat are developers talking about? An analysis of topics and trends in Stack OverflowEmpir Softw Eng2014193619-654https://doi.org/10.1007/s10664-012-9231-y
[17]
Bavota G, Gethers M, Oliveto R, Poshyvanyk D, and Lucia ADEImproving software modularization via automated analysis of latentACM Trans Softw Eng Methodol20142311-33https://doi.org/10.1145/2559935
[18]
Bavota G, Oliveto R, Gethers M, Poshyvanyk D, and De Lucia AMethodbook: Recommending move method refactorings via relational topic modelsIEEE Trans Softw Eng2014407671-694https://doi.org/10.1109/TSE.2013.60
[19]
Beitzel SM, Jensen EC, Frieder O (2009) MAP. In: Encyclopedia of database systems. Springer US, Boston, pp 1691–1692
[20]
Belle AB, Boussaidi GE, and Kpodjedo SCombining lexical and structural information to reconstruct software layersInf Softw Technol2016741-16https://doi.org/10.1016/j.infsof.2016.01.008
[21]
Bi T, Liang P, Tang A, and Yang CA systematic mapping study on text analysis techniques in software architectureJ Syst Softw2018144533-558https://doi.org/10.1016/j.jss.2018.07.055
[22]
Biggers LR, Bocovich C, Capshaw R, Eddy BP, Etzkorn LH, and Kraft NAConfiguring latent Dirichlet allocation based feature locationEmpir Softw Eng2014193465-500https://doi.org/10.1007/s10664-012-9224-x
[23]
Binkley D, Lawrie D, Uehlinger C, and Heinz DEnabling improved IR-based feature locationJ Syst Softw201510130-42https://doi.org/10.1016/j.jss.2014.11.013
[24]
Blasco D, Cetina C, and Pastor OA fine-grained requirement traceability evolutionary algorithm: Kromaia, a commercial video game case studyInf Softw Technol20201191-12https://doi.org/10.1016/j.infsof.2019.106235
[25]
Blei DM, Jordan MI, Griffiths TL, Tenenbaum JB (2003a) Hierarchical topic models and the nested chinese restaurant process. In: Proceedings of the 16th international conference on neural information processing systems. Neural Information Processing Systems Foundation, Vancouver, pp 17–24
[26]
Blei DM, Ng AY, and Jordan MILatent Dirichlet allocationJ Mach Learn Res20033993-1022https://doi.org/10.1162/jmlr.2003.3.4-5.993
[27]
Brank J, Mladenić D, Grobelnik M, Liu H, Mladenić D, Flach PA, Garriga GC, Toivonen H, Toivonen H (2011) F 1-measure. In: Encyclopedia of machine learning. Springer US, pp 397–397
[28]
Canfora G, Cerulo L, Cimitile M, and Di Penta MHow changes affect software entropy: An empirical studyEmpir Softw Eng2014191-38https://doi.org/10.1007/s10664-012-9214-z
[29]
Cao B, Frank Liu X, Liu J, and Tang MDomain-aware Mashup service clustering based on LDA topic model from multiple data sourcesInf Softw Technol20179040-54https://doi.org/10.1016/j.infsof.2017.05.001
[30]
Capiluppi A, Ruscio DD, Rocco JD, Nguyen PT, Ajienka N (2020) Detecting Java software similarities by using different clustering techniques. Inf Softw Technol 122.
[31]
Catolino G, Palomba F, Zaidman A, and Ferrucci FNot all bugs are the same: Understanding, characterizing, and classifying bug typesJ Syst Softw2019152165-181https://doi.org/10.1016/j.jss.2019.03.002
[32]
Chang J, Blei DM (2009) Relational topic models for document networks. In: Proceedings of the 12th international conference on artificial intelligence and statistics. Society for Artificial Intelligence and Statistics, Clearwater Beach, pp 81–88
[33]
Chang J and Blei DMHierarchical relational models for document networksAnn Appl Stat201041124-150https://doi.org/10.1214/09-AOAS309
[34]
Chang J, Boyd-Graber J, Gerrish S, Wang C, Blei DM (2009) Reading tea leaves: How humans interpret topic models. In: Proceedings of the 2009 conference advances in neural information. Neural Information Processing Systems Foundation, Vancouver, pp 288–296
[35]
Chatterjee P, Damevski K, Pollock L (2019) Exploratory study of slack q&a chats as a mining source for software engineering tools. In: Proceedings of the 16th international conference on mining software repositories. IEEE, Montreal, pp 1–12
[36]
Chen H, Coogle J, and Damevski KModeling stack overflow tags and topics as a hierarchy of conceptsJ Syst Softw2019156283-299https://doi.org/10.1016/j.jss.2019.07.033
[37]
Chen L, Hassan F, Wang X, Zhang L (2020) Taming behavioral backward incompatibilities via cross-project testing and analysis. In: Proceedings of the 42nd international conference on software engineering. IEEE/ACM, Seoul, pp 112–124
[38]
Chen N, Lin J, Hoi SC, Xiao X, Zhang B (2014) AR-miner: Mining informative reviews for developers from mobile app marketplace. In: Proceedings of the international conference on software engineering., vol 1. IEEE/ACM, Hyderabad, pp 767–778
[39]
Chen TH, Thomas SW, Nagappan M, Hassan AE (2012) Explaining software defects using topic models. In: Proceedings of the international working conference on mining software repositories. IEEE, Zurich, pp 189–198
[40]
Chen TH, Thomas SW, and Hassan AEA survey on the use of topic models when mining software repositoriesEmpir Softw Eng20162151843-1919https://doi.org/10.1007/s10664-015-9402-8
[41]
Chen TH, Shang W, Nagappan M, Hassan AE, and Thomas SWTopic-based software defect explanationJ Syst Softw201712979-106https://doi.org/10.1016/j.jss.2016.05.015
[42]
Choetkiertikul M, Dam HK, Tran T, and Ghose APredicting the delay of issues with due dates in software projectsEmpir Softw Eng2017221223-1263https://doi.org/10.1007/s10664-016-9496-7
[43]
Craswell N (2009) Mean reciprocal rank. In: Encyclopedia of database systems. Springer US, pp 1703–1703
[44]
Croft WB and Metzler D Search engines: Information retrieval in practice 2010 Reading Addison-Wesley
[45]
Cui D, Liu T, Cai Y, Zheng Q, Feng Q, Jin W, Guo J, Qu Y (2019) Investigating the impact of multiple dependency structures on software defects, IEEE/ACM, Montreal.
[46]
Damevski K, Chen H, Shepherd DC, Kraft NA, and Pollock LPredicting future developer behavior in the IDE using topic modelsIEEE Trans Softw Eng201844111100-1111https://doi.org/10.1109/TSE.2017.2748134
[47]
De Lucia A, Di Penta M, Oliveto R, Panichella A, and Panichella SLabeling source code with information retrieval methods: An empirical studyEmpir Softw Eng20141951383-1420https://doi.org/10.1007/s10664-013-9285-5
[48]
Deerwester S, Dumais ST, Furnas GW, Landauer TK, and Harshman RIndexing by latent semantic analysisEmpir Softw Eng1990416391-407https://doi.org/10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9
[49]
Demissie BF, Ceccato M, and Shar LKSecurity analysis of permission re-delegation vulnerabilities in Android appsEmpir Softw Eng2020255084-5136https://doi.org/10.1007/s10664-020-09879-8
[50]
Dietz L, Bickel S, Scheffer T (2007) Unsupervised prediction of citation influences. In: Proceedings of the 24th international conference on machine learning. ACM, Corvallis, pp 233–240
[51]
Dit B, Revelle M, and Poshyvanyk DIntegrating information retrieval, execution and link analysis algorithms to improve feature location in softwareEmpir Softw Eng2013182277-309https://doi.org/10.1007/s10664-011-9194-4
[52]
El Zarif O, Da Costa DA, Hassan S, Zou Y (2020) On the relationship between user churn and software issues. In: Proceedings of the 17th international conference on mining software repositories. ACM, New York, pp 339–349
[53]
Fowkes J, Chanthirasegaran P, Ranca R, Allamanis M, Lapata M, and Sutton CAutofolding for source code summarizationProc Int Conf Softw Eng20164312649-652https://doi.org/10.1145/2889160.2889171
[54]
Fu Y, Yan M, Zhang X, Xu L, Yang D, and Kymer JDAutomated classification of software change messages by semi-supervised Latent Dirichlet AllocationInf Softw Technol201557369-377https://doi.org/10.1016/j.infsof.2014.05.017
[55]
Galvis Carreno LV, Winbladh K (2012) Analysis of user comments: an approach for software requirements evolution. In: Proceedings of the international conference on software engineering. IEEE/ACM, San Francisco, pp 582–591
[56]
Gao C, Zeng J, Lyu MR, King I (2018) Online app review analysis for identifying emerging issues. In: Proceedings of the 40th international conference on software engineering. IEEE/ACM, Gothenburg, pp 48–58
[57]
Gopalakrishnan R, Sharma P, Mirakhorli M, Galster M (2017) Can latent topics in source code predict missing architectural tactics?. In: Proceedings of the 39th international conference on software engineering, IEEE/ACM, pp 15–26. http://ghtorrent.org/
[58]
Gorla A, Tavecchia I, Gross F, Zeller A (2014) Checking app behavior against app descriptions. In: Proceedings of the international conference on software engineering. IEEE/ACM, Hyderabad, pp 1025–1035
[59]
Griffiths TL, Steyvers M (2004) Finding scientific topics. In: Proceedings of the national academy of sciences., vol 101. Neural Information Processing Systems Foundation, Irvine, pp 5228–5235
[60]
Haghighi A, Vanderwende L (2009) Exploring content models for multi-document summarization. In: Proceedings of the conference on human language technologies: the 2009 annual conference of the north american chapter of the association for computational linguistics., http://www-nlpir.nist.gov/projects/duc/data.html. Association for Computational Linguistics, Boulder, pp 362–370
[61]
Han J, Shihab E, Wan Z, Deng S, and Xia XWhat do programmers discuss about deep learning frameworksEmpir Softw Eng2020252694-2747https://doi.org/10.1007/s10664-020-09819-6
[62]
Haque MU, Ali Babar M (2020) Challenges in docker development: a large-scale study using stack overflow. In: Proceedings of the 14th international symposium on empirical software engineering and measurement. IEEE/ACM, Bari, pp 1–11
[63]
Hariri N, Castro-Herrera C, Mirakhorli M, Cleland-Huang J, and Mobasher BSupporting domain analysis through mining and recommending features from online product listingsIEEE Trans Softw Eng201339121736-1752https://doi.org/10.1109/TSE.2013.39
[64]
Henß S, Monperrus M, Mezini M (2012) Semi-automatically extracting FAQs to improve accessibility of software development knowledge. In: Proceedings of the international conference on software engineering. IEEE/ACM, Zurich, pp 793–803
[65]
Hindle A, Godfrey MW, Ernst NA, Mylopoulos J (2011) Automated topic naming to support cross-project analysis of software maintenance activities. In: Proceedings of the 33rd international conference on software engineering. ACM, Waikiki, pp 163–172
[66]
Hindle A, Ernst NA, Godfrey MW, and Mylopoulos JAutomated topic naming: Supporting cross-project analysis of software maintenance activitiesEmpir Softw Eng20131861125-1155https://doi.org/10.1007/s10664-012-9209-9
[67]
Hindle A, Bird C, Zimmermann T, and Nagappan NDo topics make sense to managers and developers?Empir Softw Eng201520479-515https://doi.org/10.1007/s10664-014-9312-1
[68]
Hindle A, Alipour A, and Stroulia EA contextual approach towards more accurate duplicate bug report detection and rankingEmpir Softw Eng2016212368-410https://doi.org/10.1007/s10664-015-9387-3
[69]
Hoffman M, Blei D, Bach F (2010) Online learning for latent dirichlet allocation. In: Proceedings of the neural information processing systems conference. https://doi.org/10.1.1.187.1883. Neural Information Processing Systems Foundation, Vancouver, pp 1–9
[70]
Hofmann T (1999) Probabilistic latent semantic indexing. In: Proceedings of the 22nd annual international conference on research and development in information retrieval. ACM, Berkeley, pp 50–57
[71]
Hu H, Bezemer CP, and Hassan AEStudying the consistency of star ratings and the complaints in 1 & 2-star user reviews for top free cross-platform Android and iOS appsEmpir Softw Eng20182363442-3475https://doi.org/10.1007/s10664-018-9604-y
[72]
Hu H, Wang S, Bezemer CP, and Hassan AEStudying the consistency of star ratings and reviews of popular free hybrid Android and iOS appsEmpir Softw Eng2019247-32https://doi.org/10.1007/s10664-018-9617-6
[73]
Hu W, Wong K (2013) Using citation influence to predict software defects. In: Proceedings of the international working conference on mining software repositories. IEEE, San Francisco, pp 419–428
[74]
Jiang H, Zhang J, Ren Z, Zhang T (2017) An unsupervised approach for discovering relevant tutorial fragments for APIs. In: Proceedings of the 39th international conference on software engineering. IEEE/ACM, Buenos Aires, pp 38–48
[75]
Jiang HE, Zhang J, Li X, Ren Z, Lo D, Wu X, and Luo ZRecommending new features from mobile app descriptionsACM Trans Softw Eng Methodol20192841-29https://doi.org/10.1145/3344158
[76]
Jipeng Q, Zhenyu Q, Yun L, Yunhao Y, Xindong W (2020) Short text topic modeling techniques, applications, and performance: a survey.
[77]
Jo Y, Oh A (2011) Aspect and sentiment unification model for online review analysis. In: Proceedings of the fourth ACM international conference on Web search and data mining. ACM, New York, pp 815–824
[78]
Jones JA, Harrold MJ (2005) Empirical evaluation of the tarantula automatic fault-localization technique. In: Proceedings of the 20th international conference on automated software engineering., http://portal.acm.org/citation.cfm?doid=1101908.1101949. IEEE/ACM, New York, pp 273–282
[79]
Kakas AC, Cohn D, Dasgupta S, Barto AG, Carpenter GA, Grossberg S, Webb GI, Dorigo M, Birattari M, Toivonen H, Timmis J, Branke J, Toivonen H, Strehl AL, Drummond C, Coates A, Abbeel P, Ng AY, Zheng F, Webb GI, Tadepalli P (2011) Area under curve. In: Encyclopedia of machine learning. Springer US, pp 40–40
[80]
Kitchenham BA Procedures for performing systematic reviews Keele, UK, Keele University 2004 33 TR/SE-0401 28 https://doi.org/10.1.1.122.3308
[81]
Layman L, Nikora AP, Meek J, Menzies T (2016) Topic modeling of NASA space system problem reports research in practice. In: Proceedings of the 13th working conference on mining software repositories. ACM, Austin, pp 303–314
[82]
Le TDB, Thung F, and Lo DWill this localization tool be effective for this bug? Mitigating the impact of unreliability of information retrieval based bug localization toolsEmpir Softw Eng2017222237-2279https://doi.org/10.1007/s10664-016-9484-y
[83]
Leach RJ Introduction to software engineering 2016 2nd edn. Boca Raton CRC Press LLC https://ebookcentral.proquest.com/lib/canterbury/detail.action?docID=4711469∖&query=Software+Engineering
[84]
Lee DD and Seung HS Learning the parts of objects by non-negative matrix factorization Nature 1999 401 6755 788-791
[85]
Li H, Chen THP, Shang W, and Hassan AEStudying software logging using topic modelsEmpir Softw Eng2018232655-2694https://doi.org/10.1007/s10664-018-9595-8
[86]
Lian X, Liu W, and Zhang LAssisting engineers extracting requirements on components from domain documentsInf Softw Technol2020118September 2019106196https://doi.org/10.1016/j.infsof.2019.106196
[87]
Lin T, Tian W, Mei Q, Cheng H (2014) The dual-sparse topic model: Mining focused topics and focused terms in short text. In: Proceedings of the 23rd international conference on world wide web. ACM, Seoul, pp 539–549
[88]
Liu Y, Liu L, Liu H, Wang X, and Yang HMining domain knowledge from app descriptionsJ Syst Softw2017133126-144https://doi.org/10.1016/j.jss.2017.08.024
[89]
Liu Y, Lin J, Cleland-Huang J (2020) Traceability support for multi-lingual software projects. In: Proceedings of the 17th international conference on mining software repositories. ACM, Seoul, pp 443–454
[90]
Lukins SK, Kraft NA, and Etzkorn LHBug localization using latent Dirichlet allocationInf Softw Technol201052972-990https://doi.org/10.1016/j.infsof.2010.04.002
[91]
Luo Q, Moran K, Poshyvanyk D (2016) A large-scale empirical comparison of static and dynamic test case prioritization techniques. In: Proceedings of the 24th international symposium on foundations of software engineering. ACM, Seattle, pp 559–570
[92]
Mahmoud A and Bradshaw GSemantic topic models for source code analysisEmpir Softw Eng20172241965-2000https://doi.org/10.1007/s10664-016-9473-1
[93]
Mann HB and Whitney DROn a test of whether one of two random variables is stochastically larger than the otherAnn Math Stat194718150-60https://doi.org/10.1214/aoms/1177730491, http://projecteuclid.org/euclid.aoms/1177730491
[94]
Manning CD, Raghavan P, Schütze H (2008) Evaluation of Clustering. In: Introduction to information retrieval. chap 16,. https://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html, http://nlp.stanford.edu/IR?book/html/htmledition/evaluation?of?clustering?1.htmlwhereisthesetofclustersan. Cambridge University Press
[95]
Mantyla MV, Claes M, Farooq U (2018) Measuring LDA topic stability from clusters of replicated runs, ACM, Oulu.
[96]
Martin W, Harman M, Jia Y, Sarro F, Zhang Y (2015) The app sampling problem for app store mining. In: Proceedings of the 12th international working conference on mining software repositories. IEEE, Florence, pp 123–133
[97]
Martin W, Sarro F, Harman M (2016) Causal impact analysis for app releases in google play. In: Proceedings of the 24th international symposium on foundations of software engineering. ACM, Seattle, pp 435–446
[98]
McIlroy S, Ali N, Khalid H, and E Hassan AAnalyzing and automatically labelling the types of user issues that are raised in mobile app reviewsEmpir Softw Eng2016211067-1106https://doi.org/10.1007/s10664-015-9375-7
[99]
Mehrotra R, Sanner S, Buntine W, Xie L (2013) Improving LDA Topic Models for Microblogs via Tweet Pooling and Automatic Labeling. In: Proceedings of the 36th International Conference on Research and Development in Information Retrieval. ACM, Dublin, pp 889–892
[100]
Mezouar ME, Zhang F, and Zou YAre tweets useful in the bug fixing process? An empirical study on Firefox and ChromeEmpir Softw Eng20182331704-1742https://doi.org/10.1007/s10664-017-9559-4
[101]
Miner G, Elder J, Fast A, Hill T, Nisbet R, and Delen DPractical text mining and statistical analysis for non-structured text data applications2012WalthamElsevier Science & Technologyhttps://doi.org/10.1016/C2010-0-66188-8
[102]
Moslehi P, Adams B, Rilling J (2016) On mining crowd-based speech documentation. In: Proceedings of the 13th working conference on mining software repositories. ACM, Austin, pp 259–268
[103]
Moslehi P, Adams B, Rilling J (2018) Feature location using crowd-based screencasts. In: Proceedings of the 15th international conference on mining software repositories. ACM, New York, pp 192–202
[104]
Moslehi P, Adams B, and Rilling JA feature location approach for mapping application features extracted from crowd-based screencasts to source codeEmpir Softw Eng2020254873-4926https://doi.org/10.1007/s10664-020-09874-z
[105]
Murali V, Chaudhuri S, Jermaine C (2017) Bayesian specification learning for finding API usage errors. In: Proceedings of the Joint european software engineering conference and symposium on the foundations of software engineering. ACM, Paderborn, pp 151–162
[106]
Nabli H, Ben Djemaa R, and Ben Amor IAEfficient cloud service discovery approach based on LDA topic modelingJ Syst Softw2018146233-248https://doi.org/10.1016/j.jss.2018.09.069
[107]
Naguib H, Narayan N, Brügge B, Helal D (2013) Bug report assignee recommendation using activity profiles. In: Proceedings of the international working conference on mining software repositories. IEEE, San Francisco, pp 22–30
[108]
Nayebi M, Cho H, and Ruhe GApp store mining is not enough for app improvementEmpir Softw Eng2018232764-2794https://doi.org/10.1007/s10664-018-9601-1
[109]
Nguyen AT, Nguyen TT, Al-Kofahi J, Nguyen HV, Nguyen TN (2011) A topic-based approach for narrowing the search space of buggy files from a bug report. In: Proceedings of the 26th international conference on automated software engineering. IEEE/ACM, Lawrence, pp 263–272
[110]
Nguyen AT, Nguyen TT, Nguyen TN, Lo D, Sun C (2012) Duplicate bug report detection with a combination of information retrieval and topic modeling. In: Proceedings of the 27th international conference on automated software engineering. IEEE/ACM, Essen, pp 70–79
[111]
Nguyen VA, Boyd-Graber J, Resnik P, Chang J, Graber JB (2014) Learning a concept hierarchy from multi-labeled documents. In: Proceedings of the neural information processing systems conference. Neural Information Processing Systems Foundation, Montreal, pp 1–9
[112]
Noei E and Heydarnoori AEXAF: A search engine for sample applications of object-oriented framework-provided conceptsInf Softw Technol201675135-147https://doi.org/10.1016/j.infsof.2016.03.007
[113]
Noei E, Da Costa DA, Zou Y (2018) Winning the app production rally. In: Proceedings of the 26th ACM joint meeting on european software engineering conference and symposium on the foundations of software engineering. ACM, Lake Buena Vista, pp 283–294
[114]
Noei E, Zhang F, Wang S, and Zou YTowards prioritizing user-related issue reports of mobile applicationsEmpir Softw Eng2019241964-1996https://doi.org/10.1007/s10664-019-09684-y
[115]
Pagano D and Maalej WHow do open source communities blog?Empir Softw Eng20131861090-1124https://doi.org/10.1007/s10664-012-9211-2
[116]
Palomba F, Salza P, Ciurumelea A, Panichella S, Gall H, Ferrucci F, De Lucia A (2017) Recommending and localizing change requests for mobile apps based on user reviews. In: Proceedings of the 39th international conference on software engineering. IEEE/ACM, Buenos Aires, pp 106–117
[117]
Panichella A, Dit B, Oliveto R, Di Penta M, Poshynanyk D, De Lucia A (2013) How to effectively use topic models for software engineering tasks? An approach based on Genetic Algorithms. In: Proceedings of the international conference on software engineering. IEEE/ACM, San Francisco, pp 522–531
[118]
Pérez F, Lapeṅa R, Font J, and Cetina CFragment retrieval on models for model maintenance: Applying a multi-objective perspective to an industrial case studyInf Softw Technol2018103188-201https://doi.org/10.1016/j.infsof.2018.06.017
[119]
Petersen K, Vakkalanka S, and Kuzniarz LGuidelines for conducting systematic mapping studies in software engineering: An updateInf Softw Technol20156411-18https://doi.org/10.1016/j.infsof.2015.03.007
[120]
Pettinato M, Gil JP, Galeas P, and Russo BLog mining to re-construct system behavior: An exploratory study on a large telescope systemInf Softw Technol2019114121-136https://doi.org/10.1016/j.infsof.2019.06.011s
[121]
Poshyvanyk D, Gueheneuc YG, Marcus A, Antoniol G, Rajlich V (2007) Feature location using probabilistic ranking of methods based on execution scenarios and information retrieval. https://www.researchgate.net/publication/3189749, vol 33, pp 420–431
[122]
Poshyvanyk D, Marcus A, Ferenc R, and Gyimóthy TUsing information retrieval based coupling measures for impact analysisEmpir Softw Eng20091415-32https://doi.org/10.1007/s10664-008-9088-2, http://www.mozilla.org/
[123]
Poshyvanyk D, Gethers M, and Marcus AConcept location using formal concept analysis and information retrievalACM Trans Softw Eng Methodol20122141-34https://doi.org/10.1145/2377656.2377660
[124]
Poursabzi-Sangdeh F, Goldstein DG, Hofman JM, Vaughan JW, Wallach H (2021) Manipulating and measuring model interpretability. In: Proceedings of the conference on human factors in computing systems. ACM, Yokohama
[125]
Ramage D, Hall D, Nallapati R, Manning CD (2009) Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora. In: Proceedings of the conference on empirical methods in natural language processing. ACL/AFNLP, Singapore, pp 248–256
[126]
Rao S, Kak A (2011) Retrieval from software libraries for bug localization: A comparative study of generic and composite text models. In: Proceedings of the international conference on software engineering. IEEE/ACM, Waikiki, pp 43–52
[127]
Ray B, Posnett D, Filkov V, Devanbu P (2014) A large scale study of programming languages and code quality in GitHub. In: Proceedings of the symposium on the foundations of software engineering, pp 155–165.
[128]
Revelle M, Gethers M, and Poshyvanyk DUsing structural and textual information to capture feature coupling in object-oriented softwareEmpir Softw Eng2011166773-811https://doi.org/10.1007/s10664-011-9159-7
[129]
Röder M, Both A, Hinneburg A (2015) Exploring the space of topic coherence measures. In: Proceedings of the eighth ACM international conference on web search and data mining - WSDM ’15. ACM, Shanghai, pp 399–408
[130]
Rosen C and Shihab EWhat are mobile developers asking about? A large scale study using Stack OverflowEmpir Softw Eng2016211192-1223https://doi.org/10.1007/s10664-015-9379-3
[131]
Rosenberg CM, Moonen L (2018) Improving problem identification via automated log clustering using dimensionality reduction. In: Proceedings of the international symposium on empirical software engineering and measurement. ACM, Oulu, pp 1–10
[132]
Rothermel G, Untcn RH, Chu C, and Harrold MJPrioritizing test cases for regression testingIEEE Trans Softw Eng20012710929-948https://doi.org/10.1109/32.962562
[133]
Salton G, Wong A, and Yang CSA vector space model for automatic indexingCommun ACM19751811613-620https://doi.org/10.1145/361219.361220
[134]
Savage T, Dit B, Gethers M, and Poshyvanyk DTopicXP: exploring topics in source code using latent Dirichlet allocation2010TimisoaraIEEEhttps://doi.org/10.1109/ICSM.2010.5609654
[135]
Shannon CEA mathematical theory of communicationBell Syst Tech J1948273379-423https://doi.org/10.1002/j.1538-7305.1948.tb01338.x
[136]
Shimagaki J, Kamei Y, Ubayashi N, Hindle A (2018) Automatic topic classification of test cases using text mining at an android smartphone vendor. In: Proceedings of the 12th international symposium on empirical software engineering and measurement. IEEE/ACM, Oulu, pp 1–10
[137]
Silva B, Sant’anna C, Rocha N, and Chavez CThe effect of automatic concern mapping strategies on conceptual cohesion measurementInf Softw Technol20167556-70https://doi.org/10.1016/j.infsof.2016.03.006
[138]
Silva LL, Valente MT, and Maia MACo-change patterns: A large scale empirical studyJ Syst Softw2019152196-214https://doi.org/10.1016/j.jss.2019.03.014
[139]
Soliman M, Galster M, Salama AR, Riebisch M (2016) Architectural knowledge for technology decisions in developer communities: An exploratory study with Stack Overflow. In: Proceedings of the 13th working conference on software architecture. IEEE, Venice, pp 128–133
[140]
Somasundaram K, Murphy GC (2012) Automatic categorization of bug reports using latent Dirichlet allocation. In: Proceedings of the 5th India software engineering conference., vol 12. ACM, pp 125–130
[141]
Souza LB, Campos EC, Madeiral F, Paixão K, Rocha AM, and Maia MdABootstrapping cookbooks for APIs from crowd knowledge on Stack OverflowInf Softw Technol2019111March 201837-49https://doi.org/10.1016/j.infsof.2019.03.009
[142]
Steyvers M, Griffiths T (2010) Probalistic Topic Models. In: Landauer T, McNamara D, Dennis S, Kintsch W (eds) Latent semantic analysis: a road to meaning. University of California, Irvine, pp 993–1022
[143]
Sun X, Li B, Leung H, Li B, and Li YMSR4SM: Using topic models to effectively mining software repositories for software maintenance tasksInf Softw Technol2015661-12https://doi.org/10.1016/j.infsof.2015.05.003
[144]
Sun X, Liu X, Li B, Duan Y, Yang H, Hu J (2016) Exploring topic models in software engineering data analysis: A survey, IEEE, Shangai.
[145]
Sun X, Yang H, Xia X, and Li BEnhancing developer recommendation with supplementary information via mining historical commitsJ Syst Softw2017134355-368https://doi.org/10.1016/j.jss.2017.09.021
[146]
Taba SES, Keivanloo I, Zou Y, and Wang SAn exploratory study on the usage of common interface elements in android applicationsJ Syst Softw2017131491-504https://doi.org/10.1016/j.jss.2016.07.010
[147]
Tairas R, Gray J (2009) An information retrieval process to aid in the analysis of code clones., http://www.cis.uab.edu/tairasr/clones/literature, vol 14, pp 33–56
[148]
Tamrawi A, Nguyen TT, Al-Kofahi JM, Nguyen TN (2011) Fuzzy set and cache-based approach for bug triaging. In: Proceedings of the 19th ACM symposium on foundations of software engineering. ACM, pp 365–375
[149]
Tang J, Zhang M, Mei Q (2013) One theme in all views: modeling consensus topics in multiple contexts. In: Proceedings of the 19th international conference on knowledge discovery and data mining. ACM, New York, pp 5–13
[150]
Tantithamthavorn C, Lemma Abebe S, Hassan AE, Ihara A, Matsumoto K (2018) The impact of IR-based classifier configuration on the performance and the effort of method-level bug localization. Inf Softw Technol 102(June):160–174. https://doi.org/10.1016/j.infsof.2018.06.001
[151]
Teh YW, Jordan MI, Beal MJ, Blei DM (2006) Hierarchical Dirichlet processes. J Am Stat Assoc 101(476):1566–1581. https://doi.org/10.1198/016214506000000302
[152]
Thomas SW, Nagappan M, Blostein D, Hassan AE (2013) The impact of classifier configuration and classifier combination on bug localization. IEEE Trans Softw Eng 39(10):1427–1443. https://doi.org/10.1109/TSE.2013.27
[153]
Thomas SW, Hemmati H, Hassan AE, Blostein D (2014) Static test case prioritization using topic models. Empir Softw Eng 19(1):182–212. https://doi.org/10.1007/s10664-012-9219-7
[154]
Tiarks R, Maalej W (2014) How does a typical tutorial for mobile development look like? In: Proceedings of the 11th international conference on mining software repositories. IEEE/ACM, Hyderabad, pp 272–281
[155]
Treude C, Wagner M (2019) Predicting good configurations for GitHub and stack overflow topic models. In: Proceedings of the 16th international conference on mining software repositories. IEEE, Montreal, pp 84–95
[156]
Vargha A, Delaney HD (2000) A critique and improvement of the CL common language effect size statistics of McGraw and Wong. J Educ Behav Stat 25(2):101–132. https://doi.org/10.3102/10769986025002101
[157]
Wallach HM, Mimno D, McCallum A (2009) Rethinking LDA: Why priors matter. In: Proceedings of the conference on advances in neural information processing systems. Curran Associates Inc., Vancouver, pp 1973–1981. http://rexa.info/
[158]
Wang C, Blei DM (2011) Collaborative topic modeling for recommending scientific articles. In: Proceedings of the international conference on knowledge discovery and data mining. ACM, New York, pp 448–456
[159]
Wang W, Malik H, Godfrey MW (2015) Recommending posts concerning API issues in developer Q&A sites. In: Proceedings of the international working conference on mining software repositories. http://stackoverflow.com/questions/5358219/. IEEE/ACM, pp 224–234
[160]
Wei X, Croft WB (2006) LDA-based document models for ad-hoc retrieval. In: Proceedings of the 29th annual international conference on research and development in information retrieval. ACM, Seattle, pp 178–185
[161]
Weng J, Lim EP, Jiang J, He Q (2010) TwitterRank: Finding topic-sensitive influential twitterers. In: Proceedings of the 3rd international conference on web search and data mining. ACM, New York, pp 261–270
[162]
Wold S, Esbensen K, Geladi P (1987) Principal component analysis. Chemom Intell Lab Syst 2:37–52. https://doi.org/10.1016/0169-7439(87)80084-9
[163]
Xia X, Bao L, Lo D, Kochhar PS, Hassan AE, Xing Z (2017) What do developers search for on the web? Empir Softw Eng 22(6):3149–3185. https://doi.org/10.1007/s10664-017-9514-4
[164]
Xia X, Lo D, Ding Y, Al-Kofahi JM, Nguyen TN, Wang X (2017) Improving automated bug triaging with specialized topic model. IEEE Trans Softw Eng 43(3):272–297. https://doi.org/10.1109/TSE.2016.2576454
[165]
Yan M, Fu Y, Zhang X, Yang D, Xu L, Kymer JD (2016) Automatically classifying software changes via discriminative topic model: Supporting multi-category and cross-project. J Syst Softw 113:296–308. https://doi.org/10.1016/j.jss.2015.12.019
[166]
Yan M, Zhang X, Yang D, Xu L, Kymer JD (2016) A component recommender for bug reports using Discriminative Probability Latent Semantic Analysis. Inf Softw Technol 73:37–51. https://doi.org/10.1016/j.infsof.2016.01.005
[167]
Yang X, Lo D, Li L, Xia X, Bissyandé TF, Klein J (2017) Characterizing malicious Android apps by mining topic-specific data flow signatures. Inf Softw Technol 90:27–39. https://doi.org/10.1016/j.infsof.2017.04.007
[168]
Ye D, Xing Z, Kapre N (2017) The structure and dynamics of knowledge network in domain-specific Q&A sites: a case study of stack overflow. Empir Softw Eng 22(1):375–406. https://doi.org/10.1007/s10664-016-9430-z
[169]
Zaman S, Adams B, Hassan AE (2011) Security versus performance bugs: A case study on firefox. In: Proceedings - international conference on software engineering, pp 93–102
[170]
Zeugmann T, Poupart P, Kennedy J, Jin X, Han J, Saitta L, Sebag M, Peters J, Bagnell JA, Daelemans W, Webb GI, Ting KM, Ting KM, Webb GI, Shirabad JS, Fürnkranz J, Hüllermeier E, Matwin S, Sakakibara Y, Flener P, Schmid U, Procopiuc CM, Lachiche N, Fürnkranz J (2011) Precision and recall. In: Encyclopedia of machine learning. Springer US, pp 781–781
[171]
Zhang E, Zhang Y (2009) Average precision. In: Encyclopedia of database systems. Springer US, pp 192–193
[172]
Zhang T, Chen J, Yang G, Lee B, Luo X (2016) Towards more accurate severity prediction and fixer recommendation of software bugs. J Syst Softw 117:166–184. https://doi.org/10.1016/j.jss.2016.02.034
[173]
Zhang Y, Lo D, Xia X, Scanniello G, Le TDB, Sun J (2018) Fusing multi-abstraction vector space models for concern localization. Empir Softw Eng 23:2279–2322. https://doi.org/10.1007/s10664-017-9585-2
[174]
Zhao N, Chen J, Wang Z, Peng X, Wang G, Wu Y, Zhou F, Feng Z, Nie X, Zhang W, Sui K, Pei D (2020) Real-time incident prediction for online service systems. In: Proceedings of the 28th ACM joint meeting european software engineering conference and symposium on the foundations of software engineering, vol 20. ACM, pp 315–326
[175]
Zhao WX, Jiang J, Weng J, He J, Lim EP, Yan H, Li X (2011) Comparing twitter and traditional media using topic models. In: Lecture Notes in Computer Science, vol 6611. Springer, Berlin, chap Advances i, pp 338–349
[176]
Zhao Y, Zhang F, Shihab E, Zou Y, Hassan AE (2016) How are discussions associated with bug reworking? An empirical study on open source projects. In: Proceedings of the 10th international symposium on empirical software engineering and measurement. IEEE/ACM, Ciudad Real, pp 1–10
[177]
Zou J, Xu L, Yang M, Zhang X, Yang D (2017) Towards comprehending the non-functional requirements through Developers’ eyes: An exploration of Stack Overflow using topic analysis. Inf Softw Technol 84(1):19–32. https://doi.org/10.1016/j.infsof.2016.12.003

Information & Contributors

Information

Published In

Empirical Software Engineering  Volume 26, Issue 6
Nov 2021
1006 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 November 2021
Accepted: 29 July 2021

Author Tags

  1. Topic modeling
  2. Text mining
  3. Natural language processing
  4. Literature analysis

Qualifiers

  • Research-article

Cited By

  • (2025) Decoding educational augmented reality research trends: a topic modeling analysis. Education and Information Technologies 30:1 (57-87). Online publication date: 1-Jan-2025. https://doi.org/10.1007/s10639-024-12943-1
  • (2024) Help Me to Understand this Commit! - A Vision for Contextualized Code Reviews. Proceedings of the 1st ACM/IEEE Workshop on Integrated Development Environments (18-23). Online publication date: 20-Apr-2024. https://doi.org/10.1145/3643796.3648447
  • (2024) Text Analysis Software Using Topic Modeling Techniques for the Extraction of Knowledge from Cases Related to Vulnerability and Access to Justice. Artificial Intelligence in HCI (334-352). Online publication date: 29-Jun-2024. https://doi.org/10.1007/978-3-031-60615-1_23
  • (2023) Machine Learning for Software Engineering: A Tertiary Study. ACM Computing Surveys 55:12 (1-39). Online publication date: 2-Mar-2023. https://doi.org/10.1145/3572905
  • (2023) Automatic definition of engineer archetypes. Computers in Industry 152:C. Online publication date: 1-Nov-2023. https://doi.org/10.1016/j.compind.2023.103996
  • (2023) Evaluating pre-trained models for user feedback analysis in software engineering: a study on classification of app-reviews. Empirical Software Engineering 28:4. Online publication date: 23-May-2023. https://doi.org/10.1007/s10664-023-10314-x
  • (2023) Cross-status communication and project outcomes in OSS development. Empirical Software Engineering 28:3. Online publication date: 12-May-2023. https://doi.org/10.1007/s10664-023-10298-8
  • (2023) Student engagement research trends of past 10 years: A machine learning-based analysis of 42,000 research articles. Education and Information Technologies 28:11 (15067-15091). Online publication date: 1-Nov-2023. https://doi.org/10.1007/s10639-023-11803-8
  • (2023) Technical debt (TD) through the lens of Twitter. Journal of Software: Evolution and Process 36:4. Online publication date: 4-Mar-2023. https://doi.org/10.1002/smr.2547
