research-article

Deep specification mining

Authors:

Tien-Duy B. Le,

David Lo

ISSTA 2018: Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis

Pages 106 - 117

https://doi.org/10.1145/3213846.3213876

Published: 12 July 2018

Abstract

Formal specifications are essential but usually unavailable in software systems. Furthermore, writing these specifications is costly and requires skill from developers. Recently, many automated techniques have been proposed to mine specifications in various formats, including finite-state automata (FSAs). However, more work in specification mining is needed to further improve the accuracy of the inferred specifications. In this work, we propose Deep Specification Miner (DSM), a new approach that uses deep learning to mine FSA-based specifications. Our approach uses test case generation to produce a richer set of execution traces for training a Recurrent Neural Network Based Language Model (RNNLM). From these execution traces, we construct a Prefix Tree Acceptor (PTA) and use the learned RNNLM to extract many features. Clustering algorithms then use these features to merge similar automaton states in the PTA, constructing a number of FSAs. Finally, our approach applies a model selection heuristic to estimate the F-measure of each FSA and returns the one with the highest estimated F-measure. We apply DSM to mine the specifications of 11 target library classes. Our empirical analysis shows that DSM achieves an average F-measure of 71.97%, outperforming the best-performing baseline by 28.22%. We also demonstrate the value of DSM in sandboxing Android apps.
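The abstract compresses DSM's pipeline into a few sentences. As a rough illustration of its middle steps (building a PTA from traces, turning each PTA state into a feature vector, and merging states that land in the same cluster), here is a minimal Python sketch under loudly stated assumptions: a count-based next-call model (the hypothetical `count_model`) stands in for the trained RNNLM, each state's features are simply its next-call probabilities, clustering is plain k-means with a fixed `k`, and test generation and the F-measure-based model selection are omitted. All function names are illustrative and are not DSM's actual code or API.

```python
# Sketch only: PTA construction + cluster-based state merging.
# ASSUMPTION: a count-based next-call model replaces DSM's trained RNNLM.
import math
from collections import Counter, defaultdict

END = "<END>"  # pseudo-event marking the end of a trace


def build_pta(traces):
    """Prefix tree acceptor: one state per distinct trace prefix; state 0 is the empty prefix."""
    transitions = {}    # (state, method) -> child state
    prefixes = {0: ()}  # state -> call prefix it represents
    accepting = set()   # states reached at the end of some trace
    for trace in traces:
        state = 0
        for method in trace:
            if (state, method) not in transitions:
                child = len(prefixes)
                transitions[(state, method)] = child
                prefixes[child] = prefixes[state] + (method,)
            state = transitions[(state, method)]
        accepting.add(state)
    return transitions, prefixes, accepting


def count_model(traces):
    """Hypothetical stand-in for the RNNLM: P(next call | prefix) from raw counts."""
    counts = defaultdict(Counter)
    for trace in traces:
        seq = list(trace) + [END]
        for i, method in enumerate(seq):
            counts[tuple(seq[:i])][method] += 1

    def next_call_probs(prefix):
        c = counts[tuple(prefix)]
        total = sum(c.values())
        return {m: n / total for m, n in c.items()}

    return next_call_probs


def kmeans(vectors, k, iters=20):
    """Lloyd's k-means with a deterministic start (first k distinct vectors)."""
    centroids = []
    for v in vectors:
        if v not in centroids:
            centroids.append(v)
        if len(centroids) == k:
            break
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to its nearest centroid (Euclidean distance).
        labels = [min(range(len(centroids)), key=lambda c: math.dist(v, centroids[c]))
                  for v in vectors]
        # Move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels


def merge_states(transitions, accepting, labels):
    """Collapse PTA states with the same cluster label into a single FSA state."""
    delta = defaultdict(set)  # (fsa_state, method) -> target states (may be nondeterministic)
    for (src, method), dst in transitions.items():
        delta[(labels[src], method)].add(labels[dst])
    return labels[0], {labels[s] for s in accepting}, dict(delta)


def mine_fsa(traces, k, next_call_probs):
    transitions, prefixes, accepting = build_pta(traces)
    vocab = sorted({m for t in traces for m in t}) + [END]
    # Feature vector of a PTA state = next-call distribution after its prefix.
    vectors = []
    for s in range(len(prefixes)):
        probs = next_call_probs(prefixes[s])
        vectors.append([probs.get(m, 0.0) for m in vocab])
    return merge_states(transitions, accepting, kmeans(vectors, k))


def accepts(fsa, trace):
    """Simulate the (possibly nondeterministic) FSA on a trace."""
    start, finals, delta = fsa
    current = {start}
    for method in trace:
        current = set().union(*(delta.get((s, method), set()) for s in current))
        if not current:
            return False
    return bool(current & finals)


if __name__ == "__main__":
    traces = [("open", "read", "close"), ("open", "read", "read", "close"), ("open", "close")]
    fsa = mine_fsa(traces, k=4, next_call_probs=count_model(traces))
    print(accepts(fsa, ("open", "read", "read", "read", "close")))  # True: unseen but accepted
    print(accepts(fsa, ("open", "read")))                           # False: no final state reached
```

On these three toy traces, k-means places the PTA states for the prefixes `open` and `open read` in one cluster, so the merged automaton accepts `open read read read close`, a trace that never occurs in the input; this is the kind of generalization that state merging is meant to provide. Following the abstract, a fuller version would mine FSAs from several clusterings (for example, several values of `k`) and keep the one with the highest estimated F-measure; how DSM estimates that F-measure is not reproduced here.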

Cited By

Fan Y, Wang M (2024). Specification Mining Based on the Ordering Points to Identify the Clustering Structure Clustering Algorithm and Model Checking. Algorithms 17:1 (28). DOI: 10.3390/a17010028. Online publication date: 10-Jan-2024.
https://doi.org/10.3390/a17010028
Wen H, Liu H, Song J, Chen Y, Guo W, Feng Y, Luo B, Liao X, Xu J, Kirda E, Lie D (2024). FORAY: Towards Effective Attack Synthesis against Deep Logical Vulnerabilities in DeFi Protocols. Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, 1001-1015. DOI: 10.1145/3658644.3690293. Online publication date: 2-Dec-2024.
https://dl.acm.org/doi/10.1145/3658644.3690293
Wen C, Cao J, Su J, Xu Z, Qin S, He M, Li H, Cheung S, Tian C (2024). Enchanting Program Specification Synthesis by Large Language Models Using Static Analysis and Program Verification. Computer Aided Verification, 302-328. DOI: 10.1007/978-3-031-65630-9_16. Online publication date: 25-Jul-2024.
https://doi.org/10.1007/978-3-031-65630-9_16

Index Terms

Deep specification mining
1. Software and its engineering
  1. Software organization and properties
    1. Software functional properties
      1. Formal methods
        Dynamic analysis

Information & Contributors

Information

Published In

ISSTA 2018: Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis

July 2018

379 pages

ISBN: 9781450356992

DOI: 10.1145/3213846

General Chair:
Frank Tip
Northeastern University, USA
Program Chair:
Eric Bodden
University of Paderborn, Germany / Fraunhofer IEM, Germany

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

In-Cooperation

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ISSTA '18

Sponsor:

SIGSOFT

ISSTA '18: International Symposium on Software Testing and Analysis

July 16 - 21, 2018

Amsterdam, Netherlands

Acceptance Rates

Overall Acceptance Rate 58 of 213 submissions, 27%

Bibliometrics & Citations

Article Metrics

Total Citations: 24
Total Downloads: 876
Downloads (last 12 months): 82
Downloads (last 6 weeks): 4

Reflects downloads up to 20 Dec 2024

Citations

Cited By

Peng B, Liang P, Han T, Luo W, Du J, Wan H, Ye R, Zheng Y, Bissyandé T, Klein J, Bird C, Sarro F (2023). PURLTL: Mining LTL Specification from Imperfect Traces in Testing. Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering, 1766-1770. DOI: 10.1109/ASE56229.2023.00202. Online publication date: 11-Nov-2023.
https://dl.acm.org/doi/10.1109/ASE56229.2023.00202
Brunello A, Monica D, Montanari A, Saccomanno N, Urgolo A (2023). Monitors That Learn From Failures: Pairing STL and Genetic Programming. IEEE Access 11, 57349-57364. DOI: 10.1109/ACCESS.2023.3277620. Online publication date: 2023.
https://doi.org/10.1109/ACCESS.2023.3277620
Khairunnesa S, Ahmed S, Imtiaz S, Rajan H, Leavens G (2023). What kinds of contracts do ML APIs need? Empirical Software Engineering 28:6. DOI: 10.1007/s10664-023-10320-z. Online publication date: 17-Oct-2023.
https://dl.acm.org/doi/10.1007/s10664-023-10320-z
Abdulla P, Liang C, Rümmer P (2023). Boosting Constrained Horn Solving by Unsat Core Learning. Verification, Model Checking, and Abstract Interpretation, 280-302. DOI: 10.1007/978-3-031-50524-9_13. Online publication date: 30-Dec-2023.
https://doi.org/10.1007/978-3-031-50524-9_13
Saberi I, Faghih F, Bavil F (2022). A Passive Online Technique for Learning Hybrid Automata from Input/Output Traces. ACM Transactions on Embedded Computing Systems 22:1, 1-24. DOI: 10.1145/3556543. Online publication date: 29-Oct-2022.
https://dl.acm.org/doi/10.1145/3556543
Molina F, d'Amorim M, Aguirre N, Dwyer M, Damian D, Zeller A (2022). Fuzzing class specifications. Proceedings of the 44th International Conference on Software Engineering, 1008-1020. DOI: 10.1145/3510003.3510120. Online publication date: 21-May-2022.
https://dl.acm.org/doi/10.1145/3510003.3510120
Watson C, Cooper N, Palacio D, Moran K, Poshyvanyk D (2022). A Systematic Literature Review on the Use of Deep Learning in Software Engineering Research. ACM Transactions on Software Engineering and Methodology 31:2, 1-58. DOI: 10.1145/3485275. Online publication date: 4-Mar-2022.
https://dl.acm.org/doi/10.1145/3485275