More Web Proxy on the site http://driver.im/

article

Free access

Physical database design for relational databases

Editor: Gio Wiederhold Authors:

S. Finkelstein,

P. TiberioAuthors Info & Claims

ACM Transactions on Database Systems (TODS), Volume 13, Issue 1

Pages 91 - 128

https://doi.org/10.1145/42201.42205

Published: 01 March 1988 Publication History

Abstract

This paper describes the concepts used in the implementation of DBDSGN, an experimental physical design tool for relational databases developed at the IBM San Jose Research Laboratory. Given a workload for System R (consisting of a set of SQL statements and their execution frequencies), DBDSGN suggests physical configurations for efficient performance. Each configuration consists of a set of indices and an ordering for each table. Workload statements are evaluated only for atomic configurations of indices, which have only one index per table. Costs for any configuration can be obtained from those of the atomic configurations. DBDSGN uses information supplied by the System R optimizer both to determine which columns might be worth indexing and to obtain estimates of the cost of executing statements in different configurations. The tool finds efficient solutions to the index-selection problem; if we assume the cost estimates supplied by the optimizer are the actual execution costs, it finds the optimal solution. Optionally, heuristics can be used to reduce execution time. The approach taken by DBDSGN in solving the index-selection problem for multiple-table statements significantly reduces the complexity of the problem. DBDSGN's principles were used in the Relational Design Tool (RDT), an IBM product based on DBDSGN, which performs design for SQL/DS, a relational system based on System R. System R actually uses DBDSGN's suggested solutions as the tool expects because cost estimates and other necessary information can be obtained from System R using a new SQL statement, the EXPLAIN statement. This illustrates how a system can export a model of its internal assumptions and behavior so that other systems (such as tools) can share this model.

References

[1]

ANDERSON, H. D., AND BERRA, P.B. Minimum cost selection of secondary indexes for formatted files. ACM Trans. Database Syst. 2, 1 (Mar. 1977), 68-90.

[2]

ASTRAHAN, M. M., KIM, W., AND SCHKOLNICK, M. Performance of the System R access path selection mechanism. In Info Processing 80: Proceedings of the IFIP Congress 80 (Tokyo, Oct. 1980), pp. 487-491.

[3]

ASTRAHAN, M. M., SCHKOLNICK, M., AND WHANG, K.Y. Counting unique values of an attribute without sorting. In{. Syst. 12, 1 (1987), 11-16.

[4]

ASTRAHAN, M. M., BLASGEN, M. W., CHAMBERLIN, D. D., ESWARAN, K. P., GRAY, J. N., GRIFFITHS, P. P., KING, W. F., LORIE, R. A., MCJONES, P. R., MEHL, J. W., PUTZOLU, G. R., TRAIGER, I. L., WADE, B. W., AND WATSON, V. System R: Relational approach to database management. ACM Trans. Database Syst. 1, 2 (June 1976), 97-137.

[5]

BLASGEN, M. W., ASTRAHAN, M. M., CHAMBERLIN, D. D., GRAY, J. N., KING, W. F., LINDSAY, B. G., LORIE, R. A., MEHL, J. W., PRICE, T. O., PUTZOLU, G. R., SCHKOLNICK, M., SELINGER, P. G., SLUTZ, P. R., STRONG, H. R., TRAIGER, I. L., WADE, B. W., AND YOST, R.A. System R: An architectural overview. IBM Syst. J. 20, 1 (1981), 41-62.

[6]

BONANNO, R., MAIO, D., AND TIBERIO, P. Secondary index selection in relational database physical design. Comput. J. 28, 4 (Aug. 1985), 398-405.

[7]

BONFATTI, F., MAIO, D., AND TIBERIO, P. A separability-based method for secondary index selection in physical database design. In Methodology and Tools/or Data Base Design, S. Ceri, Ed. Elsevier North-Holland, New York, 1983, pp. 148-160.

[8]

CARDENAS, A.F. Analysis and performance of inverted data base structures. Comraun. ACM 18, 5 (May 1975), 253-263.

[9]

CHAMBERLIN, D. D., ASTRAHAN, M. M., ESWARAN, K. P., GRIFFITHS, P. P., LORIE, R. A., MEHL, J. W., REISNER, P., AND WADE, B. W. SEQUEL2: A unified approach to data definition, manipulation and control. IBM J. Res. Dev. 20, 6 (Nov. 1976), 560-575.

[10]

CHAMBERLIN, D. D., ASTRAHAN, M. M., KING, W. F., LORIE, R. A., MEHL, J. W., PRICE, T. G., SCHKOLNICK, M., SELINGER, P. G., SLUTZ, D. R., WADE, B. W., AND YOST, R.A. Support for repetitive transactions and ad hoc queries in System R. ACM Trans. Database Syst. 6, 1 (Mar. 1981), 70-94.

[11]

CHAMBERLIN, D. D., ASTRAHAN, M. M., BLASGEN, M. W., GRAY, J. N., KING, W. F., LINDSAY, B. G., LORIE, R., MEHL, J. W., PRICE, T. G., PUTZOLU, F., SELINGER, P. G., SCHKOLNICK, M., SLUTZ, D. R., TRAIGER, I. L., WADE, B. W., AND YOST, R.A. A history and evaluation of System R. Commun. ACM 24, 10 (Oct. 1981), 632-646.

[12]

CHRISTODOULAKIS, S. Implications of certain assumptions in database performance evaluation. ACM Trans. Database Syst. 9, 2 (June 1984), 163-186.

[13]

COMER, D. The difficulty of optimum index selection. ACM Trans. Database Syst. 3, 4 (Dec. 1978), 440-445.

[14]

COMER, D. The ubiquitous B-tree. ACM Comput. Surv. 11, 2 (June 1979), 121-137.

[15]

GANSKI, R.A., AND WONG, H.K.T. Optimization of nested SQL quries revisited. In Proceedings of the ACM SIGMOD Conference (San Francisco, May 1987). ACM, New York, 1987, pp. 23-33.

[16]

HAMMER, M., AND CHAN, A. Index selection in a self-adaptive database management system In Proceedings of the ACM SIGMOD Conference (Washington, D.C., June 1976). ACM, New York, 1976, pp. 93-101.

[17]

HATZOPOULOUS, M., AND KOLLIAS, J.Y. On the selection of a reduced set of indexes. Comput. J. 28, 4 (Aug. 1985), 406-408.

[18]

HAWTHORN, P., AND STONEBRAKER, M. Performance analysis of a relational database management system. In Proceedings of the ACM SIGMOD Conference (Boston, Mass., May 1979). ACM, New York, pp. 1-12.

[19]

IBM. Relational Design Tool--Structured Query Language/Data System. Ref. Man. SH20- 6415-1, IBM, San Jose, Calif., 1985.

[20]

IBM. 8QL-data system application programming. Man. SH24-5018-2, IBM, San Jose, Calif., Aug. 1983.

[21]

IBM. SQL-data system planning and administration. Man. SH24-5014-1, IBM, San Jose, Calif., Aug. 1983.

[22]

IBM SYSTEMS JOURNAL. Database 2. IBM Syst. d. 23, 2 (1984), 98-218.

[23]

IP, M. Y. L., SAXTON, L. V., AND RAGHAVAN, V. V. On the selection of an optimal set of

[24]

KIM, W. On optimizing an SQL-like nested query. ACM Trans. Database Syst. 7, 3 (Sept. 1982), 443-469.

[25]

KING, W.F. On the selection of indices for a file. Res. Rep. RJ1641, IBM, San Jose, Calif., Jan. 1974.

[26]

KOLLIAS, J.G. A heuristic approach for determining the optimal degree of file inversion. Inf. Syst. 4, 1 (1979), 307-318.

[27]

MACKERT, M., AND LOHMAN, G. R* optimizer validation and performance evaluation for local queries. In Proceedings of the ACM SIGMOD Conference (Washington, D.C., May 1986). ACM, New York, 1986, pp. 84-95.

[28]

PUTKONEN, A. On the selection of access paths in inverted database organizations, inf. Syst. 4, 3 {1979), 219-225.

[29]

RELATIONAL TECHNOLOGY. An Introduction to INGRES. Realtional Technology. Berkeley Calif., Jan. 1984.

[30]

SCHKOLNICK, M. The optimal selection of secondary indices for files. Inf. Syst. 1 (1975), 141-146.

[31]

SCHKOLNICK, M. A survey of physical database methodologT and techniques. In Proceedings of the Very Large Database Conference (Berlin, Sept. 1978), pp. 474-487.

[32]

SCHKOLN{CK, M., AND TIBERZO, P. Considerations in developing a design tool for a relational DBMS. In Proceedings of the IEEE COMPSAC Con{erence (Chicago, Ill., Nov. 1979). IEEE Press, New York, 1979, pp. 228-235.

[33]

SCHKOLNICK, M., AND T{BERIO, P. Estimating the cost of updates in a relational database. ACM Trans. Database Syst. 10, 2 {June 1985), 163-179.

[34]

SCHMIDT, J. W., AND BRODIE, M. L,. Relational Database Systems Analysis and Comparison. Springer-Verlag, New York, 1983.

[35]

SELINGER, P. G., ASTRAHAN, M. M., CHAMBERLIN, D. D., LORIE, R. A., AND PRICE, T.G. Access (Boston, Mass., May 1979). ACM, New York, 1979, pp. 23-34.

[36]

STONEBRAKER, M. The choice of partial inversions and combined indices. Int. J. Comput. In{. Sci. 3, 2 (June 1974), 167-188.

[37]

STONEBRAKER, M., WONG, E., AND KREPS, P. The design and implementation of INGRES. ACM Trans. Database Syst. 1, 3 (Sept. 1976), 189-222.

[38]

WHANG, K, J. A physical database design methodology using the property of separability. Rep. STAN-CS-83-968, Computer Science Dept., Stanford Univ., Stanford, Calif., May 1983.

[39]

WHANG, K, Y., WIEDERHOLD, G., AND SAGALOWITZ, D. Separability--An approach to physical database design. IEEE Trans. Comput. 33, 3 (Mar. 1984), 209-222.

[40]

WONG, E., AND YOUSSEFI, K. Decomposition--A strategy for query processing. ACM Trans. Database Syst. 1,3 (Sept. 1976), 223-241

[41]

YAO, S.B. Database storage structure design. In Proceedings of the Conference on Database Engineering (Amagi, Japan, Nov. 1979). IBM, San Jose, Calif., 1979, pp. 1-22.

[42]

YoussgF!, K._, AND WONG~ E. Query processing in a relational database management, system. In Proceedings of the Very Large Database Conference (Rio de Janeiro, Oct. 1979). pp. 409-417.

Cited By

Mozaffari MDignös AGamper JStörl U(2024)Self-tuning Database Systems: A Systematic Literature Review of Automatic Database Schema Design and TuningACM Computing Surveys10.1145/366532356:11(1-37)Online publication date: 17-May-2024
https://dl.acm.org/doi/10.1145/3665323
Siddiqui TWu W(2024)ML-Powered Index Tuning: An Overview of Recent Progress and Open ChallengesACM SIGMOD Record10.1145/3641832.364183652:4(19-30)Online publication date: 19-Jan-2024
https://dl.acm.org/doi/10.1145/3641832.3641836
Storey VLukyanenko RCastellanos A(2023)Conceptual Modeling: Topics, Themes, and Technology TrendsACM Computing Surveys10.1145/358933855:14s(1-38)Online publication date: 17-Jul-2023
https://dl.acm.org/doi/10.1145/3589338
Show More Cited By

Index Terms

Physical database design for relational databases

Recommendations

Physical Database Design: the database professional's guide to exploiting indexes, views, storage, and more
Relational Database Design and Implementation
Options in physical database design

A cornerstone of modern database systems is physical data independence, i.e., the separation of a type and its associated operations from its physical representation in memory and on storage media. Users manipulate and query data at the logical level; ...

Reviews

Reviewer: Clement R. Attanasio

This paper describes DBDSGN, a research effort which led to the IBM product relational design tool (RDT). RDT accepts as input definitions of relational tables to be used in the environment of the IBM relational database product SQL/DS and a set of queries expected to be run against the database. It produces specifications of indices to be defined to execute the workload at minimal cost, and states whether one of the indices is clustered. In a clustered index, the rows of the table are stored (approximately) by order of the data values in the index. There can be at most one clustered index on a table. The most interesting aspect of DBDSGN is that it uses the actual DBMS (SQL/DS specifically, but the method is valid for others) optimizer, rather than an external model, to evaluate the cost of query processing for hypothetical configurations of tables and indices. The authors convincingly argue that this approach is desirable, even if an independent model would be more accurate, since the query will be processed according to the plan chosen by the DBMS optimizer, not the external model. The paper is sufficiently detailed to provide a blueprint for someone embarking on the implementation of this approach for another database system. I found the overall argument interesting.

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Database Systems

ACM Transactions on Database Systems Volume 13, Issue 1

March 1988

128 pages

ISSN:0362-5915

EISSN:1557-4644

DOI:10.1145/42201

Editor:
Gio Wiederhold
Computer Science Department, Stanford University, Stanford, CA

Issue’s Table of Contents

Copyright © 1988 ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 March 1988

Published in TODS Volume 13, Issue 1

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

177
Total Citations
View Citations
3,287
Total Downloads

Downloads (Last 12 months)462
Downloads (Last 6 weeks)77

Reflects downloads up to 11 Dec 2024

Other Metrics

View Author Metrics

Citations

Cited By

Mozaffari MDignös AGamper JStörl U(2024)Self-tuning Database Systems: A Systematic Literature Review of Automatic Database Schema Design and TuningACM Computing Surveys10.1145/366532356:11(1-37)Online publication date: 17-May-2024
https://dl.acm.org/doi/10.1145/3665323
Siddiqui TWu W(2024)ML-Powered Index Tuning: An Overview of Recent Progress and Open ChallengesACM SIGMOD Record10.1145/3641832.364183652:4(19-30)Online publication date: 19-Jan-2024
https://dl.acm.org/doi/10.1145/3641832.3641836
Storey VLukyanenko RCastellanos A(2023)Conceptual Modeling: Topics, Themes, and Technology TrendsACM Computing Surveys10.1145/358933855:14s(1-38)Online publication date: 17-Jul-2023
https://dl.acm.org/doi/10.1145/3589338
Nishikawa NFujiwara SHayamizu YGoda K(2023)Physical Database Design for Manufacturing Business Analytics2023 IEEE International Conference on Big Data (BigData)10.1109/BigData59044.2023.10386475(1793-1802)Online publication date: 15-Dec-2023
https://doi.org/10.1109/BigData59044.2023.10386475
Siddiqui TWu WNarasayya VChaudhuri S(2022)DISTILLProceedings of the VLDB Endowment10.14778/3547305.354730915:10(2019-2031)Online publication date: 7-Sep-2022
https://dl.acm.org/doi/10.14778/3547305.3547309
Siddiqui TJo SWu WWang CNarasayya VChaudhuri SIves ZBonifati AEl Abbadi A(2022)ISUM: Efficiently Compressing Large and Complex Workloads for Scalable Index TuningProceedings of the 2022 International Conference on Management of Data10.1145/3514221.3526152(660-673)Online publication date: 10-Jun-2022
https://dl.acm.org/doi/10.1145/3514221.3526152
Wu WWang CSiddiqui TWang JNarasayya VChaudhuri SBernstein PIves ZBonifati AEl Abbadi A(2022)Budget-aware Index Tuning with Reinforcement LearningProceedings of the 2022 International Conference on Management of Data10.1145/3514221.3526128(1528-1541)Online publication date: 10-Jun-2022
https://dl.acm.org/doi/10.1145/3514221.3526128
Hijazi APerera SCalheiros RAlashwal A(2022)A data model for integrating BIM and blockchain to enable a single source of truth for the construction supply chain data deliveryEngineering, Construction and Architectural Management10.1108/ECAM-03-2022-020930:10(4645-4664)Online publication date: 30-Jun-2022
https://doi.org/10.1108/ECAM-03-2022-0209
(2022)BibliographyStorage Systems10.1016/B978-0-32-390796-5.00023-1(641-693)Online publication date: 2022
https://doi.org/10.1016/B978-0-32-390796-5.00023-1
Thomasian A(2022)Database parallelism, big data and analytics, deep learningStorage Systems10.1016/B978-0-32-390796-5.00017-6(385-491)Online publication date: 2022
https://doi.org/10.1016/B978-0-32-390796-5.00017-6
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Media

Figures

Other

Tables

View Issue’s Table of Contents