DOI: 10.1145/3524842.3527951
research-article

FaST: a linear time stack trace alignment heuristic for crash report deduplication

Published: 17 October 2022

Abstract

In software projects, applications are often monitored by systems that automatically identify crashes, collect their information into reports, and submit them to developers. Especially in popular applications, such systems tend to generate a large number of crash reports, a significant portion of which are duplicates. Because of this high submission volume, crash report deduplication is in practice handled by automatic systems, for which efficiency is a critical constraint. In this paper, we focus on improving deduplication system throughput by speeding up stack trace comparison. In contrast to state-of-the-art techniques, we propose FaST, a novel sequence alignment method that computes the similarity score between two stack traces in linear time. Our method independently aligns identical frames in two stack traces by means of a simple alignment heuristic. We evaluate FaST and five competing methods on four datasets from open-source projects using ranking and binary metrics. Despite its simplicity, FaST consistently achieves state-of-the-art performance on all metrics considered. Moreover, our experiments confirm that FaST is substantially more efficient than methods based on optimal sequence alignment.
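
To make the idea of aligning identical frames in linear time concrete, the following is a minimal sketch and not the FaST algorithm itself, whose details the abstract does not give. It assumes that a stack trace is a list of frame names with the top of the stack first, that the k-th occurrence of a frame in one trace is aligned with the k-th occurrence of the same frame in the other, and that frames near the top of the stack carry more weight. The names frame_positions, position_weight, and linear_similarity, as well as the weighting and normalization, are hypothetical choices made for illustration.

```python
from collections import defaultdict


def frame_positions(trace):
    """Map each frame name to the positions where it occurs (0 = top of stack)."""
    positions = defaultdict(list)
    for index, frame in enumerate(trace):
        positions[frame].append(index)
    return positions


def position_weight(index):
    """Assumed weighting: frames closer to the top of the stack matter more."""
    return 1.0 / (index + 1)


def linear_similarity(query, candidate):
    """Hypothetical similarity in [0, 1] between two stack traces.

    Each distinct frame is aligned independently of the others, so the work
    is one pass over each trace plus one pass over the matched occurrences,
    i.e. linear in the combined length (assuming constant-time hashing).
    """
    if not query and not candidate:
        return 1.0
    query_pos = frame_positions(query)
    candidate_pos = frame_positions(candidate)

    matched = 0.0
    for frame, q_indices in query_pos.items():
        c_indices = candidate_pos.get(frame, ())
        # Align the k-th occurrence in the query with the k-th in the candidate.
        for qi, ci in zip(q_indices, c_indices):
            shift = abs(qi - ci)
            # A match counts less when the frame sits at different depths.
            matched += (position_weight(qi) + position_weight(ci)) / (1 + shift)

    # Normalize by the total weight of all frames; unmatched frames lower the score.
    total = sum(position_weight(i) for i in range(len(query)))
    total += sum(position_weight(i) for i in range(len(candidate)))
    return matched / total


if __name__ == "__main__":
    reported = ["free", "release_buffer", "close_stream", "main"]
    incoming = ["abort", "free", "release_buffer", "close_stream", "main"]
    print(linear_similarity(reported, incoming))
```

In contrast, optimal alignment methods such as Needleman-Wunsch fill a dynamic programming table whose size is the product of the two trace lengths, which is the cost that a heuristic of this kind avoids.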


Cited By

  • (2024) Foliage: Nourishing Evolving Software by Characterizing and Clustering Field Bugs. Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, 1325-1337. DOI: 10.1145/3650212.3680363. Online publication date: 11-Sep-2024
  • (2024) CrashChecker: A Fusion Method for Clustering Duplicate Crash Failures in SAP HANA Delivery. 2024 IEEE 35th International Symposium on Software Reliability Engineering Workshops (ISSREW), 37-42. DOI: 10.1109/ISSREW63542.2024.00044. Online publication date: 28-Oct-2024
  • (2022) Separating the Wheat from the Chaff: Using Indexing and Sub-Sequence Mining Techniques to Identify Related Crashes During Bug Triage. 2022 IEEE 22nd International Conference on Software Quality, Reliability and Security (QRS), 31-42. DOI: 10.1109/QRS57517.2022.00014. Online publication date: Dec-2022

        Published In

        MSR '22: Proceedings of the 19th International Conference on Mining Software Repositories
        May 2022
        815 pages
        ISBN:9781450393034
        DOI:10.1145/3524842

        Sponsors

        In-Cooperation

        • IEEE CS

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. automatic crash reporting
        2. crash report deduplication
        3. duplicate crash report
        4. duplicate crash report detection
        5. stack trace similarity

        Qualifiers

        • Research-article

        Funding Sources

        • Natural Sciences and Engineering Research Council of Canada (NSERC), Ericsson, Ciena, and EfficiOS

        Conference

        MSR '22