[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
article
Free access

Database performance evaluation in an indexed file environment

Published: 01 March 1987 Publication History

Abstract

The use of database systems for managerial decision making often incorporates information-retrieval capabilities with numeric report generation. Of great concern to the user of such a system is the response time associated with issuing a query to the database. This study presents a procedure for estimating response time for one of the most frequently encountered physical storage mechanisms, the indexed file. The model provides a fairly high degree of accuracy, but is simple enough so that the cost of applying the model is not exorbitant. The model incorporates the knowledge that the distribution of access key occurrences is known to follow Zipf's law. It first estimates the access time required to complete the query, which includes the time needed for all input and output transactions, and CPU time used in performing the search. The effects of multiple users on an individual's response time are then assessed using a simple regression estimation technique. The two-step procedure allows for the separation of access time from multiuser influences.

References

[1]
ALLEN, A.O. Queueing models of computer systems. Computer 13, 4 (Apr. 1980), 13-24.]]
[2]
BOROVITS, i., AND NEUMANN, S. Computer Systems Performance Evaluation. u.u. Heath and Co., Lexington, Mass., 1979.]]
[3]
CARDENAS, A.F. Evaluation and selection of file organization: A model and system. Commun. 5 (May 1975), 253-263]]
[4]
CARDENAS, A.F. Analysis and performance of inverted database structures. Commun. ACM 18, 5 (May 1975), 253-263.]]
[5]
CHANDY, K.M., AND SAUER, C.H. Approximate methods for anlyzing queuing network models of computing systems. ACM comput. Surv. 10, 3 (Sept. 1978), 281-317]]
[6]
CHEN, P. P.-S. The entity-relationship model: Toward a unified view of data. A CM Trans. Database Syst. 1, 1 (Mar. 1976), 9-36.]]
[7]
CHEUNG, T.-Y. Estimating block accesses and number of records in file management. Commun. ACM 25, 7 (July 1982), 484-487.]]
[8]
CHRISTODOULAKIS, S. Estimating block transfers and join sizes. SIGMOD Rec. 13, 4 (May 1983), 40-54.]]
[9]
CHRISTODOULAKIS, S. Implications of certain assumptions in database performance evaluation. ACM Trans. Database Syst. 9, 2 (June 1984), 163-186.]]
[10]
COMER, D. 'The difficulty of optimum index selection. ACM "l'rans. Database Syst. 3, 4 (Dec. 1978), 440-445.]]
[11]
FEDOROWICZ, J. A Zipfian model of inverted file storage requirements. In Proceedings of the 12th Annual Pittsburg Conference on Modling and Simulation (Univ, of Pittsburgh, Pittsbuthgh Pa. APt. 30-May 22). 1981, 1393-1399 Pa., Apr. 30-May 2). 1981, 1393-1399.]]
[12]
FEDOROWlCZ, J. A Zipfian model of an automatic bibliographic system: An application to MEDLINE. J. AM. Soc. Inf. Sci, 33, 4 (July 1982), 223-232]]
[13]
FEDOROWmZ, J. Database evaluation using multiple regression techniques. S/GMOD Rec. 14, 2 (June 1984), 70-76.]]
[14]
FZSHMAN, G.S. Concepts and Methods in Discrete Event Digital Simulation. Wiley, New York, 1973.]]
[15]
FOSTER, D. V., MCGEHEARTY, P. F., SAUER, C. H., AND WAGGONER, C.N. A language for analysis of queueing models. In Proceedings of the 5th Annual Pittsburgh Modeling and Simulation Conference (Univ. of Pittsburgh, Pittsburgh, Pa., Apr. 24-26). 1974, pp. 381-386.]]
[16]
HXLL, E., JR. Analysis of an inverted data base structure. In Proceedings of the International Conference on Information Storage and Retrieval (ACM SIGIR, Rochester, N.Y., May 10-12), voi. 13, no. i. Summer 1978, pp. 37-64.]]
[17]
JOHNSTON, J. Econometric Methods. 2nd ed. McGraw-Hill, New York, 1972.]]
[18]
KNUTH, D. E. The Art of Computer Programming. Vol. 3, Sorting and Searching. Addison- Wesley, Reading MAss., 1973]]
[19]
LOWE, T.C. The influence of database characteristics and usage on direct access file organization. J. ACM (Oct. 1968), 535-548.]]
[20]
LUM, V.Y., AND LING, H. An oprimization problem on the selection of secondary keys. In Proceedings of the ACM National Conference (Chicago, Ill., Aug. 3-5). ACM, New York, 1971, pp. 349-356.]]
[21]
MUNTZ, R.R. Queuing networks: A critique of the state of the art and directions for the future. ACM Comput. Surv. 10, 3 (Sept. 1978), 353-359.]]
[22]
NGUYEN, H. C., OCKENE, A., REVELL, R., AND SKWlSH, W.J. The role of detailed simulation in capacity planning. IBM Syst. J. 19, 1 (1980), 81-101.]]
[23]
O'CONNOR, J. Answer-passage retrieval by text searching. JASIS 31, 4 (July 1980), 227-239.]]
[24]
PALVIA, P. Expressions for batched searching of sequential and hierarchical files. ACM Trans. Database Syst. 10, 1 (Mar. 1985), 97-106.]]
[25]
RALSTON, A., AND MEEK, C. H., Eds. Encyclopedia of Computer Science. Petroceiii/Charter, New York, 1976.]]
[26]
REISER, M., AND SAUER, C. H. Queueing network models: Methods of solution and their program implementaiton. In current Trends in Programming Methodo~gy. Vol. ~, Software Modeling and Its Impact on Performance, K. M. Chandy and R. T. Yeh, Eds. Prentice-Hall, Englewood Cliffs, N. J., 1978, pp. 115-167.]]
[27]
ROSE, C.A. A measurement procedure for queuing network models of computer systems. ACM Cornput. Surv. 10, 3 (Sept. 1978), 263-280.]]
[28]
SCHUEORAF, E.J. Compression of large inverted files with hyperbolic term distribution. Inf. Process. Manage. 12, 6 (1976), 377-384.]]
[29]
SEAMAN, P.H. Modeling considerations for predicting performance of CICSfVS systems. IBM Syst. J. 19, 1 (1980), 68-80.]]
[30]
SILER, K.F. A stochastic evaluation model for database organizations in data retrieval systems. Commun. ACM 19, 2 (Feb. 1976), 84-95.]]
[31]
WIEDERHOLD, G. Data Base Design. 2nd ed. McGraw-Hill, New York, 1983.]]
[32]
WONNACOTr, T. H., AND WONNACOTT, R.J. Regression: A Second Course in Statistics. Wiley, New York, 1981.]]
[33]
YAO, S.B. Approximating block accesses in database organizations. Commun. ACM 20, 4 (Apr. 1977), 260-261.]]
[34]
ZIPF, G.K. Human Behavior and the Principle of Lease Effort. Addison-Wesley, Reading, Mass., 1949.]]

Cited By

View all
  • (2024)Efficient indexing method based on autoencoder for large-scale data productsInternational Journal of Low-Carbon Technologies10.1093/ijlct/ctae21319(2580-2586)Online publication date: 11-Nov-2024
  • (2023)A Unified Formulation for the Frequency Distribution of Word Frequencies using the Inverse Zipf's LawProceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3539618.3591942(1776-1780)Online publication date: 19-Jul-2023
  • (2017)Benchmarking and Performance Analysis for Distributed Cache Systems: A Comparative Case StudyPerformance Evaluation and Benchmarking for the Analytics Era10.1007/978-3-319-72401-0_11(147-163)Online publication date: 30-Dec-2017
  • Show More Cited By

Recommendations

Reviews

Geoffry Stanton Howard

This paper develops and validates two analytical prediction models: one for access time and another for response time in an indexed file context. While the validity of the models proves to be excellent, their generalizability is not as broad as is implied by the title and abstract of the paper. This is because the models rely on Zipfian assumptions about the distribution of access keys, and these assumptions are valid only for files containing natural language text or keywords. The author constructed and tested the prediction models in a bibliographic searching environment, so it should not be assumed that the results are valid in the broader domain of general purpose database management systems that are implemented using indexed files. The author clearly notes these limitations in the text, but they should also have been reflected in the title and abstract. A more appropriate title might have been “Performance Prediction Models for Indexed Bibliographic Databases.” Within the bibliographic searching realm, the prediction models are quite accurate. Testing the prediction equations against actual searches in the MEDLINE database, an R-square value of 0.95 was obtained for predictions of access time, and 0.66 for response time predictions. These R-square values reflect the proportion of overall variance in the criterion variable that was explained by the analytical prediction model using regression analysis. A possible limitation of the results is that the models were validated using data collected in 1979 in an IBM mainframe environment. The parameters for the response time prediction model that were valid in 1979 may not be accurate in 1987. This would be due not only to absolute improvements in CPU speed since then, but to enhancement of operating system task scheduling and priority assignment techniques. The paper does not discuss the operating system environment in which the data were collected. Important applications for accurate models of response time in bibliographic systems are discussed. The design of optimum index structures, determination of the amount of information that should be indexed for each citation, overall database size, limits on the number of subscribers, and pricing structures can all be efficiently investigated using the validated models. The author has made a very significant contribution toward the development of more generalizable prediction models.

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Database Systems
ACM Transactions on Database Systems  Volume 12, Issue 1
March 1987
136 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/12047
Issue’s Table of Contents

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 March 1987
Published in TODS Volume 12, Issue 1

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)95
  • Downloads (Last 6 weeks)17
Reflects downloads up to 11 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Efficient indexing method based on autoencoder for large-scale data productsInternational Journal of Low-Carbon Technologies10.1093/ijlct/ctae21319(2580-2586)Online publication date: 11-Nov-2024
  • (2023)A Unified Formulation for the Frequency Distribution of Word Frequencies using the Inverse Zipf's LawProceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3539618.3591942(1776-1780)Online publication date: 19-Jul-2023
  • (2017)Benchmarking and Performance Analysis for Distributed Cache Systems: A Comparative Case StudyPerformance Evaluation and Benchmarking for the Analytics Era10.1007/978-3-319-72401-0_11(147-163)Online publication date: 30-Dec-2017
  • (2003)Experiential computingCommunications of the ACM10.1145/792704.79272946:7(48-55)Online publication date: 1-Jul-2003
  • (2003)Making virtual environments compellingCommunications of the ACM10.1145/792704.79272846:7(40-47)Online publication date: 1-Jul-2003
  • (2003)Making a game of system designCommunications of the ACM10.1145/792704.79272746:7(32-39)Online publication date: 1-Jul-2003
  • (2003)Mobile agents in distributed network managementCommunications of the ACM10.1145/792704.79271046:7(127-132)Online publication date: 1-Jul-2003
  • (2003)PFIRESCommunications of the ACM10.1145/792704.79270646:7(101-106)Online publication date: 1-Jul-2003
  • (2003)IT businesses and franchising: a research proposal36th Annual Hawaii International Conference on System Sciences, 2003. Proceedings of the10.1109/HICSS.2003.1174779(10 pp.)Online publication date: 2003
  • (2002)The metaphysics of information qualityACM Journal of Computer Documentation10.1145/604228.60423726:3(141-147)Online publication date: 1-Aug-2002
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media