ShockHash

A perfect hash function is a function that has no collisions on a given set. ShockHash constructs very compact perfect hash functions significantly faster than previous approaches.

191 commits | Last commit 6 months ago

What ShockHash can do for you

A minimal perfect hash function (MPHF) maps a set S of n keys to the first n integers without collisions. There is a lower bound of n*log2(e) - O(log n) bits (about 1.44n bits) of space needed to represent an MPHF. A matching upper bound is obtained by the brute-force algorithm that tries random hash functions until it stumbles on an MPHF and stores that function's seed. In expectation, e^n*poly(n) seeds need to be tested. The most space-efficient previous algorithms for constructing MPHFs all use such a brute-force approach as a basic building block.
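The brute-force search described above can be sketched in a few lines. This is an illustration only, not code from the ShockHash repository: the names `mix`, `position`, and `bruteForceSeed` are hypothetical, and the SplitMix64-style mixer merely stands in for a family of seeded hash functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical seeded mixer (SplitMix64-style finalizer).
// A stand-in for "a random hash function identified by its seed".
inline std::uint64_t mix(std::uint64_t key, std::uint64_t seed) {
    std::uint64_t z = key + seed * 0x9E3779B97F4A7C15ULL + 0x9E3779B97F4A7C15ULL;
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

// Position in [0, n) assigned to `key` by the candidate function with this seed.
inline std::size_t position(std::uint64_t key, std::uint64_t seed, std::size_t n) {
    return static_cast<std::size_t>(mix(key, seed) % n);
}

// Tries seeds 0, 1, ..., maxSeeds - 1 and stops at the first one that maps the
// (distinct) keys to [0, n) without collisions. Returns false if none is found.
bool bruteForceSeed(const std::vector<std::uint64_t>& keys, std::uint64_t maxSeeds,
                    std::uint64_t& seedOut) {
    const std::size_t n = keys.size();
    assert(n > 0);
    std::vector<bool> taken(n);
    for (std::uint64_t seed = 0; seed < maxSeeds; ++seed) {
        std::fill(taken.begin(), taken.end(), false);
        bool collisionFree = true;
        for (std::uint64_t key : keys) {
            const std::size_t pos = position(key, seed, n);
            if (taken[pos]) {  // two keys share a position: reject this seed
                collisionFree = false;
                break;
            }
            taken[pos] = true;
        }
        if (collisionFree) {
            seedOut = seed;  // the seed alone represents the MPHF
            return true;
        }
    }
    return false;
}
```

Only the seed has to be stored, which is why this approach is so compact. The expected number of seeds tried grows like e^n, so the sketch is only practical for very small n.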
We introduce ShockHash - Small, heavily overloaded cuckoo hash tables. ShockHash uses two hash functions h0 and h1, hoping for the existence of a function f:S -> {0,1} such that x -> h_{f(x)}(x) is an MPHF on S. In graph terminology, ShockHash generates n-edge random graphs until stumbling on a pseudoforest - a graph where each component contains as many edges as nodes. Using cuckoo hashing, ShockHash then derives an MPHF from the pseudoforest in linear time. It uses a 1-bit retrieval data structure to store f using n+o(n) bits.
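A minimal sketch of this idea, under several assumptions, is given below. It is not the repository's implementation: all names (`mix`, `candidate`, `isPseudoforest`, `orient`, `buildShockHashSketch`) are hypothetical, the mixer is a stand-in for h0 and h1, and f is kept as a plain bit vector indexed by key position instead of a 1-bit retrieval data structure. The pseudoforest test uses union-find, and the orientation step uses textbook cuckoo insertion, which is not guaranteed to run in linear time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical seeded mixer (SplitMix64-style finalizer), a stand-in for h0 and h1.
inline std::uint64_t mix(std::uint64_t key, std::uint64_t seed) {
    std::uint64_t z = key + seed * 0x9E3779B97F4A7C15ULL + 0x9E3779B97F4A7C15ULL;
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

// Candidate position h_bit(key) in [0, n); `bit` selects h0 (false) or h1 (true).
inline std::size_t candidate(std::uint64_t key, std::uint64_t seed, bool bit, std::size_t n) {
    return static_cast<std::size_t>(mix(key, 2 * seed + (bit ? 1 : 0)) % n);
}

// Union-find root lookup with path halving.
inline std::size_t findRoot(std::vector<std::size_t>& parent, std::size_t v) {
    while (parent[v] != v) {
        parent[v] = parent[parent[v]];
        v = parent[v];
    }
    return v;
}

// True if the graph with one edge {h0(x), h1(x)} per key has at most one cycle
// per component, i.e. no component has more edges than nodes.
bool isPseudoforest(const std::vector<std::uint64_t>& keys, std::uint64_t seed) {
    const std::size_t n = keys.size();
    std::vector<std::size_t> parent(n);
    std::iota(parent.begin(), parent.end(), std::size_t{0});
    std::vector<bool> cyclic(n, false);  // per root: component already has a cycle
    for (std::uint64_t key : keys) {
        const std::size_t a = findRoot(parent, candidate(key, seed, false, n));
        const std::size_t b = findRoot(parent, candidate(key, seed, true, n));
        if (a == b) {
            if (cyclic[a]) return false;  // second cycle in the same component
            cyclic[a] = true;
        } else {
            if (cyclic[a] && cyclic[b]) return false;  // merge would hold two cycles
            parent[a] = b;
            cyclic[b] = cyclic[a] || cyclic[b];
        }
    }
    return true;
}

// Cuckoo-style placement. Precondition: isPseudoforest(keys, seed) is true,
// otherwise the eviction loop may not terminate. Returns f, where f[i] tells
// whether keys[i] uses h0 (false) or h1 (true).
std::vector<bool> orient(const std::vector<std::uint64_t>& keys, std::uint64_t seed) {
    const std::size_t n = keys.size();
    const std::size_t EMPTY = n;
    std::vector<std::size_t> cell(n, EMPTY);  // cell[p] = index of the key stored at p
    std::vector<bool> f(n, false);
    for (std::size_t i = 0; i < n; ++i) {
        std::size_t cur = i;
        while (true) {
            const std::size_t p = candidate(keys[cur], seed, f[cur], n);
            std::swap(cur, cell[p]);  // place cur at p, pick up the previous occupant
            if (cur == EMPTY) break;
            f[cur] = !f[cur];         // the evicted key moves to its other position
        }
    }
    return f;
}

// Tries seeds until the random graph is a pseudoforest, then derives f.
bool buildShockHashSketch(const std::vector<std::uint64_t>& keys, std::uint64_t maxSeeds,
                          std::uint64_t& seedOut, std::vector<bool>& fOut) {
    assert(!keys.empty());
    for (std::uint64_t seed = 0; seed < maxSeeds; ++seed) {
        if (isPseudoforest(keys, seed)) {
            seedOut = seed;
            fOut = orient(keys, seed);
            return true;
        }
    }
    return false;
}
```

After a successful build, key number i is mapped to `candidate(keys[i], seed, f[i], n)`. In a real structure the bit would be looked up from the key itself through the retrieval data structure, so the key set would not need to be stored.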
By carefully analyzing the probability that a random graph is a pseudoforest, we show that ShockHash needs to try only (e/2)^n*poly(n) hash function seeds in expectation, reducing the space for storing the seed by roughly n bits. This makes ShockHash almost a factor of 2^n faster than brute force, while maintaining the asymptotically optimal space consumption. An implementation within the RecSplit framework yields the currently most space-efficient MPHFs; competing approaches need about two orders of magnitude more work to achieve the same space.
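The space accounting behind these claims can be written out as follows. This is a back-of-the-envelope restatement of the figures above, assuming that storing a seed costs roughly the base-2 logarithm of the expected number of seeds tried.

```latex
\begin{align*}
\text{speedup:} \quad
  & \frac{e^n \cdot \mathrm{poly}(n)}{(e/2)^n \cdot \mathrm{poly}(n)} \approx 2^n \\
\text{brute-force seed:} \quad
  & \log_2\bigl(e^n \cdot \mathrm{poly}(n)\bigr) = n \log_2 e + O(\log n) \approx 1.443\,n \text{ bits} \\
\text{ShockHash seed:} \quad
  & \log_2\bigl((e/2)^n \cdot \mathrm{poly}(n)\bigr) = n \log_2 e - n + O(\log n) \approx 0.443\,n \text{ bits} \\
\text{ShockHash total:} \quad
  & \underbrace{n \log_2 e - n + O(\log n)}_{\text{seed}}
    + \underbrace{n + o(n)}_{\text{retrieval structure for } f}
    = n \log_2 e + o(n) \text{ bits}
\end{align*}
```

The n bits saved on the seed are spent again on the 1-bit retrieval structure for f, so the total matches the lower bound up to lower-order terms while the number of seeds to test drops by a factor of about 2^n.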

Programming languages
  • C++ 97%
  • CMake 2%
  • Shell 1%
License
Not specified

Related projects

Core Informatics

A Helmholtz Pilot Program

Updated 8 months ago
In progress

Related software

SicHash

A perfect hash function is a function that has no collisions on a given set. SicHash places objects in a cuckoo hash table and then stores the final hash function choice of each object in a retrieval data structure. Using irregular cuckoo hashing, each object has a different number of hash functions

Updated 2 weeks ago