More Web Proxy on the site http://driver.im/

research-article

Sort vs. Hash Revisited

Authors:

L. D. ShapiroAuthors Info & Claims

IEEE Transactions on Knowledge and Data Engineering, Volume 6, Issue 6

Pages 934 - 944

https://doi.org/10.1109/69.334883

Published: 01 December 1994 Publication History

Abstract

Efficient algorithms for processing large volumes of data are very important both for relational and new object-oriented database systems. Many query-processing operations can be implemented using sort- or hash-based algorithms, e.g. intersections, joins, and duplicate elimination. In the early relational database systems, only sort-based algorithms were employed. In the last decade, hash-based algorithms have gained acceptance and popularity, and are often considered generally superior to sort-based algorithms such as merge-join. In this article, we compare the concepts behind sort- and hash-based query-processing algorithms and conclude that (1) many dualities exist between the two types of algorithms, (2) their costs differ mostly by percentages rather than by factors, (3) several special cases exist that favor one or the other choice, and (4) there is a strong reason why both hash- and sort-based algorithms should be available in a query-processing system. Our conclusions are supported by experiments performed using the Volcano query execution engine.

References

[1]

{1} A. Aggarval and J. S. Vitter, "The input/output complexity of sorting and related problems," Commun. ACM vol. 31, p. 1116, Oct. 1988.

Digital Library

[2]

{2} M. M. Astrahan, M. W. Blasgen, D. D. Chamberlin, K. P. Eswaran, J. N. Gray, P. P. Griffiths, W. F. King, R. A. Lorie, P. R. McJones, J. W. Mehl, G. R. Putzolu, I. L. Traiger, B. W. Wade, and V. Watson, "System R: A relational approach to database management," ACM Trans. Database Syst. vol. 1, no. 2, p. 97, June 1976 (reprinted in M. Stonebraker, Readings in Database Systems. San Mateo, CA: Morgan Kaufmann, 1988.

Digital Library

[3]

{3} M. Beck, D. Bitton, and W. K. Wilkinson, "Sorting large files on a backend multiprocessor," IEEE Trans. Comput., vol. 37, p. 769, 1988.

Digital Library

[4]

{4} D. Bitton and D. J. Dewitt, "Duplicate record elimination in large data files," ACM Trans. Database Syst., vol. 8, no. 2, p. 255----, June 1983.

Digital Library

[5]

{5} D. Bitton Friedland, "Design, analysis, and implementation of parallel external sorting algorithms," Comput. Sci. Tech. Rep. 464, University of Wisconsin--Madison, Jan. 1982.

[6]

{6} M. Blasgen and K. Eswaran, "On the evaluation of queries in a relational database system," IBM Res. Rep. RJ-1745, San Jose, CA, USA, Apr. 8, 1976.

[7]

{7} M. Blasgen and K. Eswaran, "Storage and access in relational databases," IBM Syst. J., vol. 16, no. 4, 1977.

Digital Library

[8]

{8} K. Bratbergsengen, "Hashing methods and relational algebra operations," Proc. Int. Conf. Very Large Data Bases, 1984, p. 323.

Digital Library

[9]

{9} H. T. Chou, D. J. DeWitt, R. H. Katz, and A. C. Klug, "Design and implementation of the Wisconsin storage system," Software: Practice and Experience, vol. 15, no. 10, p. 943, Oct. 1985.

Digital Library

[10]

{10} D. J. DeWitt, R. Katz, F. Olken, L. Shapiro, M. Stonebraker, and D. Wood, "Implementation techniques for main memory database systems," Proc. ACM SIGMOD Conf., 1984, p. 1.

Digital Library

[11]

{11} D. J. DeWitt and R. H. Gerber, "Multiprocessor hash-based join algorithms," Proc. Int. Conf. Very Large Data Bases, 1985, p. 151.

[12]

{12} D. J. DeWitt, R. H. Gerber, G. Graefe, M. L. Heytens, K. B. Kumar, and M. Muralikrishna, "GAMMA: A high performance dataflow data-base machine," Proc. Int. Conf. Very Large Data Bases, 1986, p. 228 (reprinted in M. Stonebraker, Readings in Database Systems. San Mateo, CA: Morgan Kaufmann, 1988).

Digital Library

[13]

{13} D. J. Dewitt, S. Ghandeharizadeh, and D. Schneider "A performance analysis of the GAMMA database machine," Proc. ACM SIGMOD Conf., 1988, p. 350.

Digital Library

[14]

{14} D. J. Dewitt, "The Wisconsin benchmark: Past, present, and future," in J. Gray, Ed., Database and Transactions Processing Systems Performance Handbook. San Mateo, CA: Morgan Kaufmann, 1991.

[15]

{15} D. J. Dewitt, J. Naughton, and D. Schneider, "Parallel sorting on a shared-nothing architecture using probabilistic splitting," Proc. Int. Conf. Parallel Distrib. Inform. Syst., Miami Beach, FL, USA, Dec. 1991.

Digital Library

[16]

{16} R. Epstein, "Techniques for processing of aggregates in relational database systems," UCB/Electron. Res. Lab. Memo. M79/8, Univ. of California, Feb. 1979.

[17]

{17} R. Fagin, J. Nievergelt, N. Pippenger, and H. R. Strong, "Extendible hashing: A fast access method for dynamic files," ACM Trans. Database Syst., Vol. 4, no. 3, p, 315-----, Sept. 1979.

Digital Library

[18]

{18} S. Fushimi, M. Kitsuregawa, and H. Tanaka, "An overview of the system software of a parallel relational database machine GRACE," Proc. Int. Conf. Very Large Data Bases, Kyoto, Japan, Aug. 1986.

Digital Library

[19]

{19} G. Graefe, "Encapsulation of parallelism in the Volcano query processing system," Proc. ACM SIGMOD Conf., 1990, p. 102-----.

Digital Library

[20]

{20} G. Graefe and D. L. Davison, "Encapsulation of parallelism architecture-independence in extensible database query processing," IEEE TRans. Software Eng., vol. 19, no. 8, pp. 747, Aug. 1993.

Digital Library

[21]

{21} G. Graefe, "Parallel external sorting in Volcano," Tech. Rep. 459, Univ. of Colorado, Boulder, USA, Dept. Comput. Sci., 1991.

[22]

{22} G. Graefe, "Volcano: An extensible and parallel dataflow query processing system," IEEE Trans. Knowledge. Data Eng., vol. 6, no. 1, pp. 120-135, Feb. 1994.

Digital Library

[23]

{23} G. Graefe and S. S. Thakkar, "Tuning a parallel database algorithm on a shared-memory multiprocessor," Software--Practice and Experience, vol. 22, no. 7, p.495, July 1992.

Digital Library

[24]

{24} L. M. Haas, W. F. Cody, J. C. Freytag, G. Lapis, B. G. Lindsay, G. M. Lohman, K. Ono, and H. Pirahesh, "An extensible processor for an extended relational query language," Comput. Sci. Res. Rep., San Jose, CA, USA, Apr. 1988.

[25]

{25} B. R. Iyer and D. M. Dias, "System issues in parallel sorting for database systems," Proc. IEEE Conf. Data Eng. 1990, p. 246.

Digital Library

[26]

{26} T. Keller and G. Graefe, "The one-to-one match operator of the Volcano query processing system," Oregon Graduate Center, Comput. Sci. Tech. Rep., Beaverton, OR, USA, June 1989.

[27]

{27} A. Kemper and G. Moerkotte, "Access support in object bases," Proc. ACM SIGMOD Conf., 1990, p. 364.

Digital Library

[28]

{28} M. Kitsuregawa, H. Tanaka, and T. Motooka, "Application of hash to data base machine and its architecture," New Generation Computing, vol. 1, 1983.

[29]

{29} A. M. Kitsuregawa, M. Nakayama, and M. Takagi, "The effect of bucket size tuning in the dynamic hybrid GRACE hash join method," Proc. Int. Conf. Very Large Data Bases, 1989, p. 257.

Digital Library

[30]

{30} A. Klug, "Access paths in the 'ABE' statistical query facility," Proc. ACM SIGMOD Conf., 1982, p. 161.

Digital Library

[31]

{31} D. Knuth, The Art of Computer Programming: Sorting and Searching, vol. III. Reading, MA: Addison-Wesley, 1973

Digital Library

[32]

{32} R. P. Kooi, "The optimization of queries in relational databases," Ph.D. dissertation, Case Western Reserve Univ., OH, USA, Sept. 1980.

Digital Library

[33]

{33} R. A. Lorie and H. C. Young, "A low communication sort algorithm for a parallel database machine," Proc. Int. Conf. Very Large Data Bases, 1989, p. 125.

Digital Library

[34]

{34} J. Menon, "A study of sort algorithms for multiprocessor database machines," Proc. Int. Conf. Very Large Data Bases, 1986, p. 197.

Digital Library

[35]

{35} M. Nakayama, M. Kitsuregawa, and M. Takagi, "Hash-partitioned join method using dynamic destaging strategy," Proc. Int. Conf. Very Large Data Bases, 1988, p. 468.

Digital Library

[36]

{36} J. Ousterhout, "Why aren't operating systems getting faster as fast as hardware?" WRL Tech. Rep. TN-11, Palo Alto, CA, USA, Oct. 1989.

[37]

{37} J. E. Richardson and M. J. Carey, "Programming constructs for database system implementation in EXODUS," Proc. ACM SIGMOD Conf., 1987, p. 208.

Digital Library

[38]

{38} B. Salzberg, File Structures: An Analytic Approach. Englewood Cliffs, NJ: Prentice-Hall, 1988.

Digital Library

[39]

{39} B. Salzberg, "Merging sorted runs using large main memory," Acta Informatica , vol. 27, p. 195, 1990.

Digital Library

[40]

{40} B. Salzberg, A. Tsukerman, J. Gray, M. Stewart, S. Uren, and B. Vaughan "FastSort: A distributed single-input single-output external sort," Proc. ACM SIGMOD Conf., 1990, p. 94.

Digital Library

[41]

{41} D. Schneider and D. Dewitt, "A performance evaluation of four parallel join algorithms in a shared-nothing multiprocessor environment," Proc. ACM SIGMOD Conf., 1989, p. 110.

Digital Library

[42]

{42} P. G. Selinger, M. M. Astrahan, D.D. Chamberlin, R. A. Lorie, and T. G. Price, "Access path selection in a relational database management system," Proc. ACM SIGMOD Conf., 1979, p. 23 (reprinted in M. Stonebraker, Readings in Database Systems. San Mateo, CA: Morgan-Kaufman, 1988).

Digital Library

[43]

{43} L. D. Shapiro, "Join processing in database systems with large main memories," ACM Trans. Database Syst., vol. 11, no. 3, p. 239, Sept. 1986.

Digital Library

[44]

{44} M. Stonebraker, E. Wong, P. Kreps, and G. D. Held, "The Design and Implementation of INGRES," ACM Trans. Database Syst., vol. 1, no. 3, p. 189, Sept. 1976 (reprinted in M. Stonebraker, Readings in Database Systems. San Mateo, CA: Morgan-Kaufman, 1988).

Digital Library

[45]

{45} Teradata Corp., DBC/1012, Data Base Computer, Concepts, and Facilities , Los Angeles, CA, USA, 1983.

[46]

{46} S. Todd, "PRTV: An efficient implementation for large relational data bases," Proc. Int. Conf. Very Large Data Bases, 1975, p. 554.

[47]

{47} H. Zeller and J. Gray, "an adaptive hash join algorithm for multiuser environments," Proc. Int. Conf. Very Large Data Bases, Brisbane, Australia, 1990.

Digital Library

[48]

{48} G. K. Zipf, Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Reading, MA: Addison-Wesley, 1949.

Cited By

Sun GJun SKim TLee P(2020)ColumnBurstProceedings of the 11th ACM SIGOPS Asia-Pacific Workshop on Systems10.1145/3409963.3410494(9-16)Online publication date: 24-Aug-2020
https://dl.acm.org/doi/10.1145/3409963.3410494
Roh HShin MJung WPark S(2017)Advanced Block Nested Loop Join for Extending SSD LifetimeIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2017.265180329:4(743-756)Online publication date: 1-Apr-2017
https://dl.acm.org/doi/10.1109/TKDE.2017.2651803
Pilman MKaufmann MKöhl FKossmann DProfeta DÖzcan FKoutrika GMadden S(2016)ParTimeProceedings of the 2016 International Conference on Management of Data10.1145/2882903.2903732(999-1010)Online publication date: 26-Jun-2016
https://dl.acm.org/doi/10.1145/2882903.2903732
Show More Cited By

Sort vs. Hash Revisited
1. Information systems
  1. Data management systems
    1. Database design and models
    2. Query languages

Recommendations

Translation of Object-Oriented Queries to Relational Queries
ICDE '95: Proceedings of the Eleventh International Conference on Data Engineering

Proposes a formal approach for translating OODB queries to equivalent relational queries. The translation is accomplished through the use of relational predicate graphs and OODB predicate graphs. One advantage of using such a graph-based approach is ...
Set-Oriented Mining for Association Rules in Relational Databases
ICDE '95: Proceedings of the Eleventh International Conference on Data Engineering

Describe set-oriented algorithms for mining association rules. Such algorithms imply performing multiple joins and may appear to be inherently less efficient than special-purpose algorithms. We develop new algorithms that can be expressed as SQL queries,...
Query Interoperation Among Object-Oriented and Relational Databases
ICDE '95: Proceedings of the Eleventh International Conference on Data Engineering

We develop an efficient algorithm for the query interoperation among existing heterogeneous object-oriented and relational databases. Our algorithm utilizes a canonical deductive database as a uniform representation of object-oriented schema and data. ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image IEEE Transactions on Knowledge and Data Engineering

IEEE Transactions on Knowledge and Data Engineering Volume 6, Issue 6

December 1994

158 pages

ISSN:1041-4347

Issue’s Table of Contents

Copyright © Copyright © 1994 IEEE. All Rights Reserved.

Publisher

IEEE Educational Activities Department

United States

Publication History

Published: 01 December 1994

Author Tags

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

25
Total Citations
View Citations
0
Total Downloads

Downloads (Last 12 months)0
Downloads (Last 6 weeks)0

Reflects downloads up to 11 Dec 2024

Other Metrics

View Author Metrics

Citations

Cited By

Sun GJun SKim TLee P(2020)ColumnBurstProceedings of the 11th ACM SIGOPS Asia-Pacific Workshop on Systems10.1145/3409963.3410494(9-16)Online publication date: 24-Aug-2020
https://dl.acm.org/doi/10.1145/3409963.3410494
Roh HShin MJung WPark S(2017)Advanced Block Nested Loop Join for Extending SSD LifetimeIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2017.265180329:4(743-756)Online publication date: 1-Apr-2017
https://dl.acm.org/doi/10.1109/TKDE.2017.2651803
Pilman MKaufmann MKöhl FKossmann DProfeta DÖzcan FKoutrika GMadden S(2016)ParTimeProceedings of the 2016 International Conference on Management of Data10.1145/2882903.2903732(999-1010)Online publication date: 26-Jun-2016
https://dl.acm.org/doi/10.1145/2882903.2903732
Balkesen CAlonso GTeubner JÖzsu M(2013)Multi-core, main-memory joinsProceedings of the VLDB Endowment10.14778/2732219.27322277:1(85-96)Online publication date: 1-Sep-2013
https://dl.acm.org/doi/10.14778/2732219.2732227
Luo TLee RMesnier MChen FZhang X(2012)hStorage-DBProceedings of the VLDB Endowment10.14778/2336664.23366795:10(1076-1087)Online publication date: 1-Jun-2012
https://dl.acm.org/doi/10.14778/2336664.2336679
Zhou JBruno NLin WCandan KChen YSnodgrass RGravano LFuxman A(2012)Advanced partitioning techniques for massively distributed computationProceedings of the 2012 ACM SIGMOD International Conference on Management of Data10.1145/2213836.2213839(13-24)Online publication date: 20-May-2012
https://dl.acm.org/doi/10.1145/2213836.2213839
Graefe G(2012)New algorithms for join and grouping operationsComputer Science - Research and Development10.1007/s00450-011-0186-927:1(3-27)Online publication date: 1-Feb-2012
https://dl.acm.org/doi/10.1007/s00450-011-0186-9
Kim CChhugani JSatish NSedlar ENguyen AKaldewey TLee VBrandt SDubey P(2011)Designing fast architecture-sensitive tree search on modern multicore/many-core processorsACM Transactions on Database Systems10.1145/2043652.204365536:4(1-34)Online publication date: 19-Dec-2011
https://dl.acm.org/doi/10.1145/2043652.2043655
Kim CKaldewey TLee VSedlar ENguyen ASatish NChhugani JDi Blas ADubey P(2009)Sort vs. Hash revisitedProceedings of the VLDB Endowment10.14778/1687553.16875642:2(1378-1389)Online publication date: 1-Aug-2009
https://dl.acm.org/doi/10.14778/1687553.1687564
Lee SMoon BPark CKim JKim SLakshmanan LNg RShasha D(2008)A case for flash memory ssd in enterprise database applicationsProceedings of the 2008 ACM SIGMOD international conference on Management of data10.1145/1376616.1376723(1075-1086)Online publication date: 9-Jun-2008
https://dl.acm.org/doi/10.1145/1376616.1376723
Show More Cited By

View Options

View options

Media

Figures

Other

Tables

View Issue’s Table of Contents