[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
article
Free access

Hardware for searching very large text databases

Published: 11 March 1980 Publication History

Abstract

This paper discusses the problem of searching very large text databases. It is shown that conventional techniques for searching current databases cannot be scaled up to larger ones, and that it is necessary to build hardware to search the database in parallel if reasonable search times are expected. The part of the search process requiring the highest bandwidth is scanning the database to detect instances of search terms. Methods of doing this in hardware that have been mentioned in the literature are examined, and design criteria for term matchers are discussed. A new design that uses a nondeterministic finite state automaton to control matching, is introduced, its operation is explained, and the practicality of using it in a real system is discussed.

References

[1]
McCarn, D. B., 'Online Services of the National Library of Medicine,' Digest of Papers, COMPCON Fall 1978, Washington, D. C., pp. 48-53
[2]
Black, D. V., 'System Development Corporation's Search Service,' Digest of Papers, COMPCON Fall 1978, Washington, D. C., pp. 59-64.
[3]
Bayer, M. P., 'Dialog—An Online Retrieval System for Bibliographic Information,' Digest of Papers, COMPCON Fall 1978, Washington, D. C., pp. 54-58.
[4]
Sprowl, J. A., 'Computer-Assisted Legal Research—An Analysis of Full Text Document Retrieval Systems, Particularly the LEXIS System,' American Bar Foundation Research Journal, Jan. 1976, Vol. 1, No. 1, pp. 175-226.
[5]
Hollaar, L. A., 'Text Retrieval Computers,' Computer, March 1979, Vol. 12, No. 3 (ISSN 0018-9162), pp. 40-50.
[6]
Lin, C. S., Smith, D. C. P., and Smith, J. M., 'The Design of a Rotating Associative Memory for Relational Database Applications,' ACM Trans. Database Syst., March 1976, Vol. 1, No. 1, pp. 53-65.
[7]
Ozkarahan, E. A., Schuster, S. A., and Smith, K. C., 'RAP: an Associative processor for Data Base Management,' Proc. 1975 AFIPS Nat. Comp. Conf., Vol. 44, AFIPS Press, Montvale, N. J., pp. 379-387.
[8]
Su, S., and Lipovski, G. J., 'CASSM: A Cellular System for Very Large Databases,' Proc. Conf. on Very Large Data Bases, Sept. 1975, pp. 456-472.
[9]
Hsiao, D. K., Kannan, K., and Kerr, D. S., 'Structure Memory Designs for a Database Computer,' Proc. ACM Annual Conf., 1977, pp. 343-350.
[10]
Stellhorn, W. H., 'A Specialized Computer for Information Retrieval,' Ph.D. Thesis, University of Illinois, Urbana, 1975.
[11]
Hollaar, L. A., 'An Architecture for the Efficient Combining of Linearly Ordered Lists,' Second Workshop on Comp. Arch. for Non-Numeric Processing, Jan. 1976.
[12]
Hurley, B. J., 'Analysis of Computer Architectures for Information Retrieval,' M. S. Thesis, University of Illinois, Urbana, May 1976.
[13]
Milner, J. M., 'An Analysis of Rotational Storage Access Scheduling in a Multiprogrammed Information Retrieval System,' Ph.D. Thesis, University of Illinois, Urbana, Sept. 1976.
[14]
Bird, R. M., Newsbaum, J. B., and Trefftzs, J. L., 'Text File Inversion: An Evaluation,' Proc. Fourth Non-Numeric Workshop, Syracuse, N. Y., Aug. 1978, pp. 42-50.
[15]
Cheng, W. K., 'Multiprocessor for String Manipulation,' M. S. Thesis, University of Illinois, Urbana, Oct. 1977, pp. 22-65.
[16]
Stellhorn, W. H., 'A Processor for Direct Scanning of Text,' presented at First Nonnumeric Workshop, Dallas, Oct. 1974.
[17]
Bird, R. M., Tu, J. C., and Worthy, R. M., 'Associative/Parallel Processors for Searching Very Large Textual Data Bases,' Proc. Third Non-Numeric Workshop, Syracuse, N. Y., May, 1977, pp. 8-16.
[18]
Foster, M. J., and Kung, H. T., 'Design of Special Purpose VLSI Chips,' Computer, Jan. 1980, Vol. 13, No. 1, (ISSN 0018-9162), pp. 26-40.
[19]
Mules, D. W., and Warter, P. J., 'A String Matcher for an Approximate Matches,' Department of Electrical Engineering, University of Delaware, Newark, April, 1979.
[20]
Copeland, G. P., 'String Storage and Searching for Data Base Applications: Implementation on the INDY Backend Kernel,' Proc. Fourth Non-Numeric Workshop, Syracuse, N. Y., Aug. 1978, pp. 8-17.
[21]
Mukhopadhyay, A., 'Hardware Algorithms for Non-numeric Computation,' Proc. Fifth Symposium on Computer Architecture, Palo Alto, Calif., Apr. 1978, pp. 8-16.
[22]
Moore, G., et. al., 'High-Speed-Text-Search Design Contract Interim Report,' Operating Systems Incorporated, Woodland Hills, Calif., Jan. 1977, pp. 2-69-2-101.
[23]
'High-Speed-Text-Search Design Contract Final Report', Operating Systems Incorporated, OSI:R77-009, April 1977, pp. 3-1-3-24.
[24]
Booth, T. L., Sequential Machine and Automata Theory, John Wiley and Sons, Inc., New York, 1968.
[25]
Unger, S. H., Asynchronous Sequential Switching Circuits, Wiley-Interscience, New York, 1969, pp. 28-63.

Cited By

View all
  • (1988)Intelligent String Search Processor to Accelerate Text Information RetrievalDatabase Machines and Knowledge Base Machines10.1007/978-1-4613-1679-4_22(297-310)Online publication date: 1988
  • (1986)A new string search hardware architecture for VLSIProceedings of the 13th annual international symposium on Computer architecture10.5555/17407.17359(20-27)Online publication date: 1-Jun-1986
  • (1986)A new string search hardware architecture for VLSIACM SIGARCH Computer Architecture News10.1145/17356.1735914:2(20-27)Online publication date: 1-May-1986
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGIR Forum
ACM SIGIR Forum  Volume 15, Issue 2
March 1980
150 pages
ISSN:0163-5840
DOI:10.1145/1013881
Issue’s Table of Contents
  • cover image ACM Conferences
    CAW '80: Proceedings of the fifth workshop on Computer architecture for non-numeric processing
    March 1980
    130 pages
    ISBN:9781450373951
    DOI:10.1145/800083
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 March 1980
Published in SIGIR Volume 15, Issue 2

Check for updates

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)96
  • Downloads (Last 6 weeks)18
Reflects downloads up to 09 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (1988)Intelligent String Search Processor to Accelerate Text Information RetrievalDatabase Machines and Knowledge Base Machines10.1007/978-1-4613-1679-4_22(297-310)Online publication date: 1988
  • (1986)A new string search hardware architecture for VLSIProceedings of the 13th annual international symposium on Computer architecture10.5555/17407.17359(20-27)Online publication date: 1-Jun-1986
  • (1986)A new string search hardware architecture for VLSIACM SIGARCH Computer Architecture News10.1145/17356.1735914:2(20-27)Online publication date: 1-May-1986
  • (1990)Specialized Parallel Architectures for Textual DatabasesAdvances in Computers Volume 3010.1016/S0065-2458(08)60297-1(1-37)Online publication date: 1990
  • (1989)Algorithms for string searchingACM SIGIR Forum10.1145/74697.7470023:3-4(34-58)Online publication date: 1-Apr-1989
  • (1983)Operational characteristics of a harware-based pattern matcherACM Transactions on Database Systems10.1145/319830.3198328:1(15-40)Online publication date: 1-Mar-1983
  • (1982)VLSI architectures for high speed recognition of context-free languages and finite-state languagesProceedings of the 9th annual symposium on Computer Architecture10.5555/800048.801712(43-49)Online publication date: 26-Apr-1982
  • (1982)Exploiting parallelism for the performance enhancement of non-numeric applicationsProceedings of the June 7-10, 1982, national computer conference10.1145/1500774.1500799(207-216)Online publication date: 7-Jun-1982
  • (1982)VLSI architectures for high speed recognition of context-free languages and finite-state languagesACM SIGARCH Computer Architecture News10.1145/1067649.80171210:3(43-49)Online publication date: 1-Apr-1982
  • (1982)A Hardware Hashing Scheme in the Design of a Multiterm String ComparatorIEEE Transactions on Computers10.1109/TC.1982.167609831:9(825-834)Online publication date: 1-Sep-1982

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media