Abstract
A legacy system is an operational, large-scale software system that is maintained beyond its first generation of programmers. It typically represents a massive economic investment and is critical to the mission of the organization it serves. As such systems age, they become increasingly complex and brittle, and hence harder to maintain. They also become even more critical to the survival of their organization because the business rules encoded within the system are seldom documented elsewhere.
Our research is concerned with developing a suite of tools to aid the maintainers of legacy systems in recovering the knowledge embodied within the system. These activities, known collectively as “program understanding”, are essential preludes for several key processes, including maintenance and design recovery for reengineering.
In this paper we present three pattern-matching techniques: source code metrics, a dynamic programming algorithm for finding the best alignment between two code fragments, and a statistical algorithm that matches abstract code descriptions, written in an abstract language, against actual source code. The methods are applied to detect instances of code cloning in several moderately sized production systems, including tcsh, bash, and CLIPS.
The programmer's skill and experience are essential elements of our approach. Selection of particular tools and analysis methods depends on the needs of the particular task to be accomplished. Integration of the tools provides opportunities for synergy, allowing the programmer to select the most appropriate tool for a given task.
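As a rough illustration of the second technique named above, aligning two code fragments by dynamic programming, the sketch below computes a line-level edit distance between two fragments. It is only a minimal example under stated assumptions: the abstract does not give the features or cost functions the paper uses, so the unit insertion, deletion, and substitution costs, the whitespace normalization, and the names `align_cost` and `fragment_distance` are all hypothetical.

```python
from typing import List, Sequence


def align_cost(a: Sequence[str], b: Sequence[str],
               ins_cost: float = 1.0,
               del_cost: float = 1.0,
               sub_cost: float = 1.0) -> float:
    """Minimum cost of aligning sequence a with sequence b.

    Assumption: unit costs and exact equality between elements; the
    original paper may compare statements by other features.
    """
    n, m = len(a), len(b)
    # dist[i][j] holds the cheapest alignment of a[:i] with b[:j].
    dist: List[List[float]] = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i * del_cost
    for j in range(1, m + 1):
        dist[0][j] = j * ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = 0.0 if a[i - 1] == b[j - 1] else sub_cost
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,     # drop a line of a
                dist[i][j - 1] + ins_cost,     # insert a line of b
                dist[i - 1][j - 1] + match,    # keep or substitute a line
            )
    return dist[n][m]


def fragment_distance(frag_a: str, frag_b: str) -> float:
    """Compare two code fragments line by line, ignoring blank lines
    and differences in whitespace (a simplifying assumption)."""
    def lines(fragment: str) -> List[str]:
        return [" ".join(line.split())
                for line in fragment.splitlines() if line.strip()]
    return align_cost(lines(frag_a), lines(frag_b))
```

A small distance relative to the fragment length suggests a candidate clone; where to set the threshold is a judgment left to the programmer, in keeping with the role the abstract assigns to the programmer's skill and experience.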
Additional information
This work is supported in part by IBM Canada Ltd.; the Institute for Robotics and Intelligent Systems, a Canadian Network of Centers of Excellence; and the Natural Sciences and Engineering Research Council of Canada. Based on “Pattern Matching for Design Concept Localization” by K.A. Kontogiannis, R. DeMori, M. Bernstein, M. Galler, E. Merlo, which first appeared in Proceedings of the Second Working Conference on Reverse Engineering, pp. 96–103, July 1995, © IEEE, 1995.
About this article
Cite this article
Kontogiannis, K.A., Demori, R., Merlo, E. et al. Pattern matching for clone and concept detection. Automated Software Engineering 3, 77–108 (1996). https://doi.org/10.1007/BF00126960
DOI: https://doi.org/10.1007/BF00126960