ABSTRACT
Many algorithms and data structures employing hashing have been analyzed under the uniform hashing assumption, i.e., the assumption that hash functions behave like truly random functions. Starting with the discovery of universal hash functions, many researchers have studied to what extent this theoretical ideal can be realized by hash functions that do not take up too much space and can be evaluated quickly. In this paper we present an almost ideal solution to this problem: A hash function that, on any set of n inputs, behaves like a truly random function with high probability, can be evaluated in constant time on a RAM, and can be stored in O(n) words, which is optimal. For many hashing schemes this is the first hash function that makes their uniform hashing analysis come true, with high probability, without incurring overhead in time or space.
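As a point of reference for the universal hash functions mentioned above, the following is a minimal sketch of the classical Carter-Wegman family h(x) = ((a*x + b) mod p) mod m. It is not the construction presented in this paper: it only guarantees a collision probability of at most 1/m for any two distinct keys, which is far weaker than behaving like a truly random function on a whole set of n inputs. The prime `P`, the helper names `make_universal_hash` and `collision_rate`, and the integer-key assumption are illustrative choices, not taken from the source.

```python
import random

# Assumed prime modulus (the Mersenne prime 2^61 - 1); keys must be integers below P.
P = (1 << 61) - 1


def make_universal_hash(m, rng=random):
    """Draw a random member h(x) = ((a*x + b) mod P) mod m of the family."""
    a = rng.randrange(1, P)  # a must be nonzero
    b = rng.randrange(0, P)

    def h(x):
        return ((a * x + b) % P) % m

    return h


def collision_rate(x, y, m, trials, seed=0):
    """Estimate Pr[h(x) == h(y)] over random draws of h from the family."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        h = make_universal_hash(m, rng)
        hits += h(x) == h(y)
    return hits / trials
```

For two fixed distinct keys, `collision_rate(x, y, m, trials)` should come out close to 1/m. The sketch stores only two words per function, whereas the function described in the abstract uses O(n) words to obtain full randomness on n inputs with high probability.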
Index Terms
- Uniform hashing in constant time and linear space