[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
article
Free access

Physical database design for relational databases

Published: 01 March 1988 Publication History

Abstract

This paper describes the concepts used in the implementation of DBDSGN, an experimental physical design tool for relational databases developed at the IBM San Jose Research Laboratory. Given a workload for System R (consisting of a set of SQL statements and their execution frequencies), DBDSGN suggests physical configurations for efficient performance. Each configuration consists of a set of indices and an ordering for each table. Workload statements are evaluated only for atomic configurations of indices, which have only one index per table. Costs for any configuration can be obtained from those of the atomic configurations. DBDSGN uses information supplied by the System R optimizer both to determine which columns might be worth indexing and to obtain estimates of the cost of executing statements in different configurations. The tool finds efficient solutions to the index-selection problem; if we assume the cost estimates supplied by the optimizer are the actual execution costs, it finds the optimal solution. Optionally, heuristics can be used to reduce execution time. The approach taken by DBDSGN in solving the index-selection problem for multiple-table statements significantly reduces the complexity of the problem. DBDSGN's principles were used in the Relational Design Tool (RDT), an IBM product based on DBDSGN, which performs design for SQL/DS, a relational system based on System R. System R actually uses DBDSGN's suggested solutions as the tool expects because cost estimates and other necessary information can be obtained from System R using a new SQL statement, the EXPLAIN statement. This illustrates how a system can export a model of its internal assumptions and behavior so that other systems (such as tools) can share this model.

References

[1]
ANDERSON, H. D., AND BERRA, P.B. Minimum cost selection of secondary indexes for formatted files. ACM Trans. Database Syst. 2, 1 (Mar. 1977), 68-90.
[2]
ASTRAHAN, M. M., KIM, W., AND SCHKOLNICK, M. Performance of the System R access path selection mechanism. In Info Processing 80: Proceedings of the IFIP Congress 80 (Tokyo, Oct. 1980), pp. 487-491.
[3]
ASTRAHAN, M. M., SCHKOLNICK, M., AND WHANG, K.Y. Counting unique values of an attribute without sorting. In{. Syst. 12, 1 (1987), 11-16.
[4]
ASTRAHAN, M. M., BLASGEN, M. W., CHAMBERLIN, D. D., ESWARAN, K. P., GRAY, J. N., GRIFFITHS, P. P., KING, W. F., LORIE, R. A., MCJONES, P. R., MEHL, J. W., PUTZOLU, G. R., TRAIGER, I. L., WADE, B. W., AND WATSON, V. System R: Relational approach to database management. ACM Trans. Database Syst. 1, 2 (June 1976), 97-137.
[5]
BLASGEN, M. W., ASTRAHAN, M. M., CHAMBERLIN, D. D., GRAY, J. N., KING, W. F., LINDSAY, B. G., LORIE, R. A., MEHL, J. W., PRICE, T. O., PUTZOLU, G. R., SCHKOLNICK, M., SELINGER, P. G., SLUTZ, P. R., STRONG, H. R., TRAIGER, I. L., WADE, B. W., AND YOST, R.A. System R: An architectural overview. IBM Syst. J. 20, 1 (1981), 41-62.
[6]
BONANNO, R., MAIO, D., AND TIBERIO, P. Secondary index selection in relational database physical design. Comput. J. 28, 4 (Aug. 1985), 398-405.
[7]
BONFATTI, F., MAIO, D., AND TIBERIO, P. A separability-based method for secondary index selection in physical database design. In Methodology and Tools/or Data Base Design, S. Ceri, Ed. Elsevier North-Holland, New York, 1983, pp. 148-160.
[8]
CARDENAS, A.F. Analysis and performance of inverted data base structures. Comraun. ACM 18, 5 (May 1975), 253-263.
[9]
CHAMBERLIN, D. D., ASTRAHAN, M. M., ESWARAN, K. P., GRIFFITHS, P. P., LORIE, R. A., MEHL, J. W., REISNER, P., AND WADE, B. W. SEQUEL2: A unified approach to data definition, manipulation and control. IBM J. Res. Dev. 20, 6 (Nov. 1976), 560-575.
[10]
CHAMBERLIN, D. D., ASTRAHAN, M. M., KING, W. F., LORIE, R. A., MEHL, J. W., PRICE, T. G., SCHKOLNICK, M., SELINGER, P. G., SLUTZ, D. R., WADE, B. W., AND YOST, R.A. Support for repetitive transactions and ad hoc queries in System R. ACM Trans. Database Syst. 6, 1 (Mar. 1981), 70-94.
[11]
CHAMBERLIN, D. D., ASTRAHAN, M. M., BLASGEN, M. W., GRAY, J. N., KING, W. F., LINDSAY, B. G., LORIE, R., MEHL, J. W., PRICE, T. G., PUTZOLU, F., SELINGER, P. G., SCHKOLNICK, M., SLUTZ, D. R., TRAIGER, I. L., WADE, B. W., AND YOST, R.A. A history and evaluation of System R. Commun. ACM 24, 10 (Oct. 1981), 632-646.
[12]
CHRISTODOULAKIS, S. Implications of certain assumptions in database performance evaluation. ACM Trans. Database Syst. 9, 2 (June 1984), 163-186.
[13]
COMER, D. The difficulty of optimum index selection. ACM Trans. Database Syst. 3, 4 (Dec. 1978), 440-445.
[14]
COMER, D. The ubiquitous B-tree. ACM Comput. Surv. 11, 2 (June 1979), 121-137.
[15]
GANSKI, R.A., AND WONG, H.K.T. Optimization of nested SQL quries revisited. In Proceedings of the ACM SIGMOD Conference (San Francisco, May 1987). ACM, New York, 1987, pp. 23-33.
[16]
HAMMER, M., AND CHAN, A. Index selection in a self-adaptive database management system In Proceedings of the ACM SIGMOD Conference (Washington, D.C., June 1976). ACM, New York, 1976, pp. 93-101.
[17]
HATZOPOULOUS, M., AND KOLLIAS, J.Y. On the selection of a reduced set of indexes. Comput. J. 28, 4 (Aug. 1985), 406-408.
[18]
HAWTHORN, P., AND STONEBRAKER, M. Performance analysis of a relational database management system. In Proceedings of the ACM SIGMOD Conference (Boston, Mass., May 1979). ACM, New York, pp. 1-12.
[19]
IBM. Relational Design Tool--Structured Query Language/Data System. Ref. Man. SH20- 6415-1, IBM, San Jose, Calif., 1985.
[20]
IBM. 8QL-data system application programming. Man. SH24-5018-2, IBM, San Jose, Calif., Aug. 1983.
[21]
IBM. SQL-data system planning and administration. Man. SH24-5014-1, IBM, San Jose, Calif., Aug. 1983.
[22]
IBM SYSTEMS JOURNAL. Database 2. IBM Syst. d. 23, 2 (1984), 98-218.
[23]
IP, M. Y. L., SAXTON, L. V., AND RAGHAVAN, V. V. On the selection of an optimal set of
[24]
KIM, W. On optimizing an SQL-like nested query. ACM Trans. Database Syst. 7, 3 (Sept. 1982), 443-469.
[25]
KING, W.F. On the selection of indices for a file. Res. Rep. RJ1641, IBM, San Jose, Calif., Jan. 1974.
[26]
KOLLIAS, J.G. A heuristic approach for determining the optimal degree of file inversion. Inf. Syst. 4, 1 (1979), 307-318.
[27]
MACKERT, M., AND LOHMAN, G. R* optimizer validation and performance evaluation for local queries. In Proceedings of the ACM SIGMOD Conference (Washington, D.C., May 1986). ACM, New York, 1986, pp. 84-95.
[28]
PUTKONEN, A. On the selection of access paths in inverted database organizations, inf. Syst. 4, 3 {1979), 219-225.
[29]
RELATIONAL TECHNOLOGY. An Introduction to INGRES. Realtional Technology. Berkeley Calif., Jan. 1984.
[30]
SCHKOLNICK, M. The optimal selection of secondary indices for files. Inf. Syst. 1 (1975), 141-146.
[31]
SCHKOLNICK, M. A survey of physical database methodologT and techniques. In Proceedings of the Very Large Database Conference (Berlin, Sept. 1978), pp. 474-487.
[32]
SCHKOLN{CK, M., AND TIBERZO, P. Considerations in developing a design tool for a relational DBMS. In Proceedings of the IEEE COMPSAC Con{erence (Chicago, Ill., Nov. 1979). IEEE Press, New York, 1979, pp. 228-235.
[33]
SCHKOLNICK, M., AND T{BERIO, P. Estimating the cost of updates in a relational database. ACM Trans. Database Syst. 10, 2 {June 1985), 163-179.
[34]
SCHMIDT, J. W., AND BRODIE, M. L,. Relational Database Systems Analysis and Comparison. Springer-Verlag, New York, 1983.
[35]
SELINGER, P. G., ASTRAHAN, M. M., CHAMBERLIN, D. D., LORIE, R. A., AND PRICE, T.G. Access (Boston, Mass., May 1979). ACM, New York, 1979, pp. 23-34.
[36]
STONEBRAKER, M. The choice of partial inversions and combined indices. Int. J. Comput. In{. Sci. 3, 2 (June 1974), 167-188.
[37]
STONEBRAKER, M., WONG, E., AND KREPS, P. The design and implementation of INGRES. ACM Trans. Database Syst. 1, 3 (Sept. 1976), 189-222.
[38]
WHANG, K, J. A physical database design methodology using the property of separability. Rep. STAN-CS-83-968, Computer Science Dept., Stanford Univ., Stanford, Calif., May 1983.
[39]
WHANG, K, Y., WIEDERHOLD, G., AND SAGALOWITZ, D. Separability--An approach to physical database design. IEEE Trans. Comput. 33, 3 (Mar. 1984), 209-222.
[40]
WONG, E., AND YOUSSEFI, K. Decomposition--A strategy for query processing. ACM Trans. Database Syst. 1,3 (Sept. 1976), 223-241
[41]
YAO, S.B. Database storage structure design. In Proceedings of the Conference on Database Engineering (Amagi, Japan, Nov. 1979). IBM, San Jose, Calif., 1979, pp. 1-22.
[42]
YoussgF!, K._, AND WONG~ E. Query processing in a relational database management, system. In Proceedings of the Very Large Database Conference (Rio de Janeiro, Oct. 1979). pp. 409-417.

Cited By

View all
  • (2024)Self-tuning Database Systems: A Systematic Literature Review of Automatic Database Schema Design and TuningACM Computing Surveys10.1145/366532356:11(1-37)Online publication date: 17-May-2024
  • (2024)ML-Powered Index Tuning: An Overview of Recent Progress and Open ChallengesACM SIGMOD Record10.1145/3641832.364183652:4(19-30)Online publication date: 19-Jan-2024
  • (2023)Conceptual Modeling: Topics, Themes, and Technology TrendsACM Computing Surveys10.1145/358933855:14s(1-38)Online publication date: 17-Jul-2023
  • Show More Cited By

Recommendations

Reviews

Clement R. Attanasio

This paper describes DBDSGN, a research effort which led to the IBM product relational design tool (RDT). RDT accepts as input definitions of relational tables to be used in the environment of the IBM relational database product SQL/DS and a set of queries expected to be run against the database. It produces specifications of indices to be defined to execute the workload at minimal cost, and states whether one of the indices is clustered. In a clustered index, the rows of the table are stored (approximately) by order of the data values in the index. There can be at most one clustered index on a table. The most interesting aspect of DBDSGN is that it uses the actual DBMS (SQL/DS specifically, but the method is valid for others) optimizer, rather than an external model, to evaluate the cost of query processing for hypothetical configurations of tables and indices. The authors convincingly argue that this approach is desirable, even if an independent model would be more accurate, since the query will be processed according to the plan chosen by the DBMS optimizer, not the external model. The paper is sufficiently detailed to provide a blueprint for someone embarking on the implementation of this approach for another database system. I found the overall argument interesting.

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Database Systems
ACM Transactions on Database Systems  Volume 13, Issue 1
March 1988
128 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/42201
  • Editor:
  • Gio Wiederhold
Issue’s Table of Contents

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 March 1988
Published in TODS Volume 13, Issue 1

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)462
  • Downloads (Last 6 weeks)77
Reflects downloads up to 11 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Self-tuning Database Systems: A Systematic Literature Review of Automatic Database Schema Design and TuningACM Computing Surveys10.1145/366532356:11(1-37)Online publication date: 17-May-2024
  • (2024)ML-Powered Index Tuning: An Overview of Recent Progress and Open ChallengesACM SIGMOD Record10.1145/3641832.364183652:4(19-30)Online publication date: 19-Jan-2024
  • (2023)Conceptual Modeling: Topics, Themes, and Technology TrendsACM Computing Surveys10.1145/358933855:14s(1-38)Online publication date: 17-Jul-2023
  • (2023)Physical Database Design for Manufacturing Business Analytics2023 IEEE International Conference on Big Data (BigData)10.1109/BigData59044.2023.10386475(1793-1802)Online publication date: 15-Dec-2023
  • (2022)DISTILLProceedings of the VLDB Endowment10.14778/3547305.354730915:10(2019-2031)Online publication date: 7-Sep-2022
  • (2022)ISUM: Efficiently Compressing Large and Complex Workloads for Scalable Index TuningProceedings of the 2022 International Conference on Management of Data10.1145/3514221.3526152(660-673)Online publication date: 10-Jun-2022
  • (2022)Budget-aware Index Tuning with Reinforcement LearningProceedings of the 2022 International Conference on Management of Data10.1145/3514221.3526128(1528-1541)Online publication date: 10-Jun-2022
  • (2022)A data model for integrating BIM and blockchain to enable a single source of truth for the construction supply chain data deliveryEngineering, Construction and Architectural Management10.1108/ECAM-03-2022-020930:10(4645-4664)Online publication date: 30-Jun-2022
  • (2022)BibliographyStorage Systems10.1016/B978-0-32-390796-5.00023-1(641-693)Online publication date: 2022
  • (2022)Database parallelism, big data and analytics, deep learningStorage Systems10.1016/B978-0-32-390796-5.00017-6(385-491)Online publication date: 2022
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media