[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to main content

Quicksort, Largest Bucket, and Min-Wise Hashing with Limited Independence

  • Conference paper
  • First Online:
Algorithms - ESA 2015

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 9294))

  • 2353 Accesses

Abstract

Randomized algorithms and data structures are often analyzed under the assumption of access to a perfect source of randomness. The most fundamental metric used to measure how “random” a hash function or a random number generator is, is its independence: a sequence of random variables is said to be k-independent if every variable is uniform and every size k subset is independent.

In this paper we consider three classic algorithms under limited independence. Besides the theoretical interest in removing the unrealistic assumption of full independence, the work is motivated by lower independence being more practical. We provide new bounds for randomized quicksort, min-wise hashing and largest bucket size under limited independence. Our results can be summarized as follows.

  • Randomized Quicksort. When pivot elements are computed using a 5-independent hash function, Karloff and Raghavan, J.ACM’93 showed \(\mathcal{O} ( n \log n)\) expected worst-case running time for a special version of quicksort. We improve upon this, showing that the same running time is achieved with only 4-independence.

  • Min-Wise Hashing. For a set A, consider the probability of a particular element being mapped to the smallest hash value. It is known that 5-independence implies the optimal probability \(\mathcal{O} (1 /n)\). Broder et al., STOC’98 showed that 2-independence implies it is \(\mathcal{O}(1 / \sqrt{|A|})\). We show a matching lower bound as well as new tight bounds for 3- and 4-independent hash functions.

  • Largest Bucket. We consider the case where n balls are distributed to n buckets using a k-independent hash function and analyze the largest bucket size. Alon et. al, STOC’97 showed that there exists a 2-independent hash function implying a bucket of size Ω( n 1/2). We generalize the bound, providing a k-independent family of functions that imply size Ω( n 1/k).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
£29.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
GBP 19.95
Price includes VAT (United Kingdom)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
GBP 71.50
Price includes VAT (United Kingdom)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
GBP 89.99
Price includes VAT (United Kingdom)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Alon, N., Dietzfelbinger, M., Miltersen, P.B., Petrank, E., Tardos, G.: Is linear hashing good? In: Symposium on Theory of Computing, STOC 1997 (1997)

    Google Scholar 

  2. Broder, A.Z.: On the resemblance and containment of documents. In: Compression and Complexity of Sequences (SEQUENCES), pp. 21–29 (1997)

    Google Scholar 

  3. Broder, A.Z., Charikar, M., Frieze, A.M., Mitzenmacher, M.: Min-wise independent permutations. Journal of Computer and System Sciences 60

    Google Scholar 

  4. Carter, J.L., Wegman, M.N.: Universal classes of hash functions. Journal of Computer and System Sciences 18(2), 143–154 (1979)

    Article  MathSciNet  MATH  Google Scholar 

  5. Christiani, T., Pagh, R.: Generating k-independent variables in constant time. In: Foundations of Computer Science (FOCS) (2014)

    Google Scholar 

  6. Christiani, T., Pagh, R., Thorup, M.: From independence to expansion and back again. In: Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, STOC 2015, pp. 813–820. ACM, New York (2015)

    Chapter  Google Scholar 

  7. Cormen, T.H., Stein, C., Rivest, R.L., Leiserson, C.E.: Introduction to algorithms, 2nd edn. McGraw-Hill Higher Education (2001)

    Google Scholar 

  8. Dietzfelbinger, M., Hagerup, T., Katajainen, J., Penttonen, M.: A reliable randomized algorithm for the closest-pair problem. Journal of Algorithms 25(1), 19–51 (1997)

    Article  MathSciNet  MATH  Google Scholar 

  9. Indyk, P.: A small approximately min-wise independent family of hash functions. In: ACM-SIAM Symposium on Discrete Algorithms, SODA 1999 (1999)

    Google Scholar 

  10. Karloff, H., Raghavan, P.: Randomized algorithms and pseudorandom numbers. In: Proceedings of the Twentieth Annual ACM Symposium on Theory of Computing, STOC 1988, pp. 310–321. ACM, New York (1988)

    Chapter  Google Scholar 

  11. Motwani, R., Raghavan, P.: Randomized algorithms. Cambridge University Press, New York (1995)

    Book  MATH  Google Scholar 

  12. Pagh, A., Pagh, R., Ruzic, M.: Linear probing with constant independence. In: Proceedings of the Thirty-ninth Annual ACM Symposium on Theory of Computing, STOC 2007, pp. 318–327. ACM (2007)

    Google Scholar 

  13. Pǎtraşcu, M., Thorup, M.: The power of simple tabulation hashing. In: Proceedings of the Forty-third Annual ACM Symposium on Theory of Computing, STOC 2011, pp. 1–10. ACM, New York (2011)

    Google Scholar 

  14. Pǎtraşcu, M., Thorup, M.: On the k-independence required by linear probing and minwise independence, CoRR abs/1302.5127 (2013)

    Google Scholar 

  15. Schmidt, J.P., Siegel, A., Srinivasan, A.: Chernoff-Hoeffding bounds for applications with limited independence. SIAM J. Discret. Math. 8(2), 223–250 (1995)

    Article  MathSciNet  MATH  Google Scholar 

  16. Siegel, A.: On universal classes of extremely random constant-time hash functions. SIAM J. Comput. 33(3), 505–543 (2004)

    Article  MathSciNet  MATH  Google Scholar 

  17. Thorup, M.: Simple tabulation, fast expanders, double tabulation, and high independence. In: Proc. FOCS 2013, pp. 90–99 (2013)

    Google Scholar 

  18. Thorup, M.: Even strongly universal hashing is pretty fast. In: Proceedings of the Eleventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2000, Philadelphia, PA, USA, pp. 496–497. Society for Industrial and Applied Mathematics (2000)

    Google Scholar 

  19. Thorup, M., Zhang, Y.: Tabulation based 5-universal hashing and linear probing. In: ALENEX 2010, pp. 62–76 (2010)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Mathias Bæk Tejs Knudsen .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Knudsen, M.B.T., Stöckel, M. (2015). Quicksort, Largest Bucket, and Min-Wise Hashing with Limited Independence. In: Bansal, N., Finocchi, I. (eds) Algorithms - ESA 2015. Lecture Notes in Computer Science(), vol 9294. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48350-3_69

Download citation

  • DOI: https://doi.org/10.1007/978-3-662-48350-3_69

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-48349-7

  • Online ISBN: 978-3-662-48350-3

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics