Abstract
Randomized algorithms and data structures are often analyzed under the assumption of access to a perfect source of randomness. The most fundamental metric used to measure how “random” a hash function or a random number generator is, is its independence: a sequence of random variables is said to be k-independent if every variable is uniform and every size k subset is independent.
In this paper we consider three classic algorithms under limited independence. Besides the theoretical interest in removing the unrealistic assumption of full independence, the work is motivated by lower independence being more practical. We provide new bounds for randomized quicksort, min-wise hashing and largest bucket size under limited independence. Our results can be summarized as follows.
-
Randomized Quicksort. When pivot elements are computed using a 5-independent hash function, Karloff and Raghavan, J.ACM’93 showed \(\mathcal{O} ( n \log n)\) expected worst-case running time for a special version of quicksort. We improve upon this, showing that the same running time is achieved with only 4-independence.
-
Min-Wise Hashing. For a set A, consider the probability of a particular element being mapped to the smallest hash value. It is known that 5-independence implies the optimal probability \(\mathcal{O} (1 /n)\). Broder et al., STOC’98 showed that 2-independence implies it is \(\mathcal{O}(1 / \sqrt{|A|})\). We show a matching lower bound as well as new tight bounds for 3- and 4-independent hash functions.
-
Largest Bucket. We consider the case where n balls are distributed to n buckets using a k-independent hash function and analyze the largest bucket size. Alon et. al, STOC’97 showed that there exists a 2-independent hash function implying a bucket of size Ω( n 1/2). We generalize the bound, providing a k-independent family of functions that imply size Ω( n 1/k).
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Alon, N., Dietzfelbinger, M., Miltersen, P.B., Petrank, E., Tardos, G.: Is linear hashing good? In: Symposium on Theory of Computing, STOC 1997 (1997)
Broder, A.Z.: On the resemblance and containment of documents. In: Compression and Complexity of Sequences (SEQUENCES), pp. 21–29 (1997)
Broder, A.Z., Charikar, M., Frieze, A.M., Mitzenmacher, M.: Min-wise independent permutations. Journal of Computer and System Sciences 60
Carter, J.L., Wegman, M.N.: Universal classes of hash functions. Journal of Computer and System Sciences 18(2), 143–154 (1979)
Christiani, T., Pagh, R.: Generating k-independent variables in constant time. In: Foundations of Computer Science (FOCS) (2014)
Christiani, T., Pagh, R., Thorup, M.: From independence to expansion and back again. In: Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, STOC 2015, pp. 813–820. ACM, New York (2015)
Cormen, T.H., Stein, C., Rivest, R.L., Leiserson, C.E.: Introduction to algorithms, 2nd edn. McGraw-Hill Higher Education (2001)
Dietzfelbinger, M., Hagerup, T., Katajainen, J., Penttonen, M.: A reliable randomized algorithm for the closest-pair problem. Journal of Algorithms 25(1), 19–51 (1997)
Indyk, P.: A small approximately min-wise independent family of hash functions. In: ACM-SIAM Symposium on Discrete Algorithms, SODA 1999 (1999)
Karloff, H., Raghavan, P.: Randomized algorithms and pseudorandom numbers. In: Proceedings of the Twentieth Annual ACM Symposium on Theory of Computing, STOC 1988, pp. 310–321. ACM, New York (1988)
Motwani, R., Raghavan, P.: Randomized algorithms. Cambridge University Press, New York (1995)
Pagh, A., Pagh, R., Ruzic, M.: Linear probing with constant independence. In: Proceedings of the Thirty-ninth Annual ACM Symposium on Theory of Computing, STOC 2007, pp. 318–327. ACM (2007)
Pǎtraşcu, M., Thorup, M.: The power of simple tabulation hashing. In: Proceedings of the Forty-third Annual ACM Symposium on Theory of Computing, STOC 2011, pp. 1–10. ACM, New York (2011)
Pǎtraşcu, M., Thorup, M.: On the k-independence required by linear probing and minwise independence, CoRR abs/1302.5127 (2013)
Schmidt, J.P., Siegel, A., Srinivasan, A.: Chernoff-Hoeffding bounds for applications with limited independence. SIAM J. Discret. Math. 8(2), 223–250 (1995)
Siegel, A.: On universal classes of extremely random constant-time hash functions. SIAM J. Comput. 33(3), 505–543 (2004)
Thorup, M.: Simple tabulation, fast expanders, double tabulation, and high independence. In: Proc. FOCS 2013, pp. 90–99 (2013)
Thorup, M.: Even strongly universal hashing is pretty fast. In: Proceedings of the Eleventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2000, Philadelphia, PA, USA, pp. 496–497. Society for Industrial and Applied Mathematics (2000)
Thorup, M., Zhang, Y.: Tabulation based 5-universal hashing and linear probing. In: ALENEX 2010, pp. 62–76 (2010)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Knudsen, M.B.T., Stöckel, M. (2015). Quicksort, Largest Bucket, and Min-Wise Hashing with Limited Independence. In: Bansal, N., Finocchi, I. (eds) Algorithms - ESA 2015. Lecture Notes in Computer Science(), vol 9294. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48350-3_69
Download citation
DOI: https://doi.org/10.1007/978-3-662-48350-3_69
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-48349-7
Online ISBN: 978-3-662-48350-3
eBook Packages: Computer ScienceComputer Science (R0)