[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/567752.567773acmconferencesArticle/Chapter ViewAbstractPublication PagespoplConference Proceedingsconference-collections
Article
Free access

String pattern matching in polynomial time

Published: 01 January 1979 Publication History

Abstract

There is a wide range of applications for string processing and SNOBOL4 (Griswold, et al. [1971]) has come to be the most widely implemented and accepted language for such applications. No doubt one of the principle reasons for this acceptance is the data structure around which the language is organized, the string pattern. This structure together with the associated pattern matching process provide great flexibility. Nevertheless it has been widely recognized in informal terms that the pattern matching process is often grossly inefficient (Ripley & Griswold [1975], Dewar & McCann [1977]) and that the pattern structure is notoriously difficult to explain and use (Ripley & Griswold [1975], Stewart [1975]). Each of these areas of difficulty relates to such things as two modes of operation (quick-full scan), problems with left-recursion, heuristics in the scan, etc. Some difficulties are inherent with string patterns but many are not; we feel the developments described here help to clarify this situation.In section 2 we describe the formal model upon which we base this work. This allows the careful analysis of the variety of sets of strings which may be specified by the patterns which we admit and deduction to be made concerning the possibility/impossibility of algorithms of interest. With SNOBOL4 it has been the case that the careful definition of the "meaning" of a pattern is in terms of the actions taken by the pattern matching algorithm. This has led to the incorporation of idiosyncrasies of a particular algorithm into the understanding of the pattern structure. This seems akin to using a compiler as the definition of a programming language and we believe it is important to future progress to have other alternatives.In section 3 we point out that the worst-case execution time of the usual SNOBOL pattern matching algorithm is exponential in the length of the subject string, even on some quite simple patterns. We then present an algorithm whose worst-case time is polynomial and that operates on patterns which include a true set complement operator. As side benefits we find that the algorithm is not multi-modal and correctly handles the null string as an alternative and left-recursion.In order to conserve space we will assume throughout this paper that the reader is familiar with the idea of a string pattern in the sense that it is described in Griswold et al. [1971]. Also it is probably necessary that the reader have some general knowledge of the formal languages area.

References

[1]
R. B. K. Dewar & A. P. McCann {1977}, "Macro SPITBOL---a SNOBOL4 Compiler," Software-Pract. and Exp., Vol. 7, 95-113.
[2]
J. Earley {1970}, "An efficient context-free parsing algorithm," Comm. ACM, Vol. 13, No. 2, 94-102.
[3]
A. C. Fleck {1978}, "Formal models for string patterns," in Current Trends in Programming Methodology Vol. 4: Data Structuring (R. Yeh, ed.), Prentice-Hall.
[4]
J. F. Gimpel {1973}, "A theory of discrete patterns and their implementation in SNOBOL4," Comm. ACM, Vol. 16, No. 2, 91-100.
[5]
R. E. Griswold {1972}, The Macro Implementation of SNOBOL4, W. H. Freeman Co.
[6]
R. E. Griswold, J. F. Poage & I. P. Polansky {1972}, The SNOBOL4 Programming Language (2nd ed.), Prentice-Hall.
[7]
K. C. Liu {1977}, "An efficient algorithm for string pattern matching," doctoral thesis, Computer Science Dept., University of Iowa.
[8]
L. Y. Liu and P. Weiner {1973}, "An infinite hierarchy of intersections of context-free languages," Math. Sys. Theory, 7, 2, 185-192.
[9]
G. D. Ripley & R. E. Griswold {1975}, "The measurement of SNOBOL4 programs," SIGPLAN Notices, Vol. 10, No. 5, 36-53.
[10]
G. F. Stewart {1975}, "An algebraic model for string patterns," ACM 2nd Symp. on Principles of Programming Languages, Palo Alto, Calif., 167-184.
[11]
W. M. Waite {1973}, Implementing Software for Non-numeric Applications, Prentice-Hall.

Cited By

View all

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
POPL '79: Proceedings of the 6th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
January 1979
290 pages
ISBN:9781450373579
DOI:10.1145/567752
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 January 1979

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Article

Acceptance Rates

POPL '79 Paper Acceptance Rate 27 of 146 submissions, 18%;
Overall Acceptance Rate 824 of 4,130 submissions, 20%

Upcoming Conference

POPL '26

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)62
  • Downloads (Last 6 weeks)10
Reflects downloads up to 25 Jan 2025

Other Metrics

Citations

Cited By

View all

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media