Abstract
Code clone detection supports knowledge extraction from existing software resources for maintenance, re-engineering, and bug removal, and it is an integral part of most Internet-enabled devices. Similar code fragments that occur at different locations are called code clones. These Internet-enabled devices are used for knowledge sharing and data extraction in applications related to code clone detection. However, most existing semantic code clone detection techniques cannot provide a heuristic solution for problems such as statement reordering, inversion of control predicates, and insertion of irrelevant statements, which may cause a performance bottleneck in this environment. To address these issues, we propose a novel approach that finds semantic code clones in a program or procedure using data flow analysis based on reaching definitions and liveness analysis. The algorithm is designed to find code fragments that are structurally divergent but semantically equivalent. The results demonstrate that the proposed approach is effective in detecting semantic code clones for various applications running on Internet-enabled devices. On subject systems taken from the DaCapo benchmark, comprising 216,579 lines of code (LOC), we found 5831 semantically equivalent clone pairs after eliminating 29,029 dead code statements.
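The idea of combining liveness analysis with reaching definitions can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes straight-line code only (no branches or loops), and the names `Stmt`, `eliminate_dead`, and `signature` are hypothetical. A backward liveness pass drops statements whose result is never used, and a forward pass replaces each operand by the expression of the definition that reaches it, so fragments that differ only in statement order or in inserted irrelevant statements receive the same signature.

```python
from typing import NamedTuple


class Stmt(NamedTuple):
    """Hypothetical three-address statement: target = op(args)."""
    target: str   # variable defined by the statement
    op: str       # operator name, e.g. "+", "*"
    args: tuple   # operand names (variables) or literal strings


def eliminate_dead(stmts, live_out):
    """Backward liveness pass over straight-line code.

    A statement is kept only if its target is live after it.
    """
    live = set(live_out)
    kept = []
    for s in reversed(stmts):
        if s.target in live:
            kept.append(s)
            live.discard(s.target)
            # Operands that are variable names become live before s.
            live.update(a for a in s.args if a.isidentifier())
        # Otherwise the definition is never used afterwards: dead code.
    kept.reverse()
    return kept


def signature(stmts, live_out):
    """Forward pass: expand each operand into its reaching definition.

    In straight-line code exactly one definition reaches each use
    (the most recent one), so a dictionary is sufficient here.
    """
    reaching = {}  # variable -> expression tree of its reaching definition
    for s in eliminate_dead(stmts, live_out):
        operands = tuple(reaching.get(a, a) for a in s.args)
        reaching[s.target] = (s.op, operands)
    return {v: reaching.get(v, v) for v in live_out}


# Two fragments computing r = (a + b) - (c * 2); the second one is
# reordered and contains an irrelevant statement (dbg).
frag_a = [
    Stmt("t", "+", ("a", "b")),
    Stmt("u", "*", ("c", "2")),
    Stmt("r", "-", ("t", "u")),
]
frag_b = [
    Stmt("u", "*", ("c", "2")),
    Stmt("dbg", "+", ("a", "1")),
    Stmt("t", "+", ("a", "b")),
    Stmt("r", "-", ("t", "u")),
]

same = signature(frag_a, ["r"]) == signature(frag_b, ["r"])
```

Under these assumptions the two fragments compare equal, although their statement sequences differ. Handling control flow, including inverted control predicates, would require iterating the data flow equations over a control flow graph, which this sketch does not attempt.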
Cite this article
Tekchandani, R., Bhatia, R. & Singh, M. Semantic code clone detection for Internet of Things applications using reaching definition and liveness analysis. J Supercomput 74, 4199–4226 (2018). https://doi.org/10.1007/s11227-016-1832-6