ABSTRACT
Randomized algorithms are often enjoyed for their simplicity, but the hash functions used to yield the desired theoretical guarantees are often neither simple nor practical. Here we show that the simplest possible tabulation hashing provides unexpectedly strong guarantees. The scheme itself dates back to Carter and Wegman (STOC'77). Keys are viewed as consisting of c characters. We initialize c tables T_1, ..., T_c mapping characters to random hash codes. A key x=(x_1, ..., x_c) is hashed to T_1[x_1] xor ... xor T_c[x_c].
While this scheme is not even 4-independent, we show that it provides many of the guarantees that are normally obtained via higher independence, e.g., Chernoff-type concentration, min-wise hashing for estimating set intersection, and cuckoo hashing.
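The scheme described above can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: the function name `make_simple_tabulation`, the parameter names, the choice of 8-bit characters and 32-bit hash codes, and the use of Python's non-cryptographic `random.Random` to fill the tables are all assumptions made for the example.

```python
import random


def make_simple_tabulation(c=4, char_bits=8, hash_bits=32, seed=None):
    """Return a hash function h built from c random lookup tables.

    Hypothetical helper: keys are integers of c * char_bits bits,
    viewed as c characters of char_bits bits each.
    """
    rng = random.Random(seed)  # assumption: stand-in for truly random bits
    table_size = 1 << char_bits
    # tables[i] maps the i-th character of a key to a random hash code.
    tables = [[rng.getrandbits(hash_bits) for _ in range(table_size)]
              for _ in range(c)]
    char_mask = table_size - 1

    def h(x):
        if not 0 <= x < (1 << (c * char_bits)):
            raise ValueError("key does not fit in c characters")
        code = 0
        for i in range(c):
            char = (x >> (i * char_bits)) & char_mask  # extract character x_i
            code ^= tables[i][char]                    # xor in T_i[x_i]
        return code

    return h


h = make_simple_tabulation(c=4, char_bits=8, hash_bits=32, seed=1)
print(hex(h(0xDEADBEEF)))


def key(lo, hi):
    """Build a key from its two lowest characters (the others are zero)."""
    return lo | (hi << 8)


# Four keys forming a "rectangle" in two character positions: every table
# entry is used an even number of times, so the hash codes xor to zero.
assert h(key(1, 3)) ^ h(key(1, 4)) ^ h(key(2, 3)) ^ h(key(2, 4)) == 0
```

The final assertion shows one way to see why the scheme cannot be 4-independent: the hash of the fourth key in such a rectangle is fully determined by the other three, whatever random values the tables hold.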
Index Terms
- The power of simple tabulation hashing