Aleksei Safronov

1 post

@asafrnv

ICPC '2013 (47), '2016 (26) / Google HashCode '2017 (1), '2018 (4), '2019 (2), '2021 (8) / AtCoder (yarrr) / Moscow Aviation Institute

Zurich, Switzerland · Joined January 2025
46 Following · 3 Followers
Ash Vardanian (@ashvardanian)
How do you measure the quality of a hash function? Beyond raw speed, you want statistical sanity. For StringZilla’s new hash, I ran the full SMHasher suite, but also built a smaller, minimalistic Rust suite to cover the core signals: avalanche, distribution, and short-input integer collisions.

Avalanche (SAC/BIC): Flip one input bit and observe the output bits. Ideally ~50% of them flip (Strict Avalanche Criterion), and the flips behave independently across positions (Bit-Independence Criterion).

Distribution (χ²): Hash N-grams, drop them into a fixed number of buckets, and compute a χ² statistic against the uniform distribution. Lower is better; it surfaces skew that raw collision counts miss.

Integral collisions (N ≤ 8 bytes): Treat short byte strings as little-endian integers, hash them, then map n samples to n slots (hash % n). A "perfect hash" yields zero collisions. For typical hashes, the collision count converges to ≈ n × e⁻¹ ≈ 36.8% of n, the birthday-style expectation for a random function (equivalently, about n × (1 − e⁻¹) ≈ 63.2% of the slots end up occupied).

github.com/ashvardanian/H…
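The three signals described in the post can be sketched in a few lines of Rust. This is only an illustration under stated assumptions, not the suite from the linked repository: `mix64` is a stand-in hash (the SplitMix64 finalizer), not StringZilla's function, and the helper names `avalanche_mean`, `chi_square`, and `collision_rate` are hypothetical. For brevity it hashes integer keys rather than N-grams, and it covers only the mean flip rate of the avalanche test, not bit independence.

```rust
// Stand-in 64-bit hash: the SplitMix64 finalizer. NOT StringZilla's hash.
fn mix64(mut x: u64) -> u64 {
    x ^= x >> 30;
    x = x.wrapping_mul(0xBF58476D1CE4E5B9);
    x ^= x >> 27;
    x = x.wrapping_mul(0x94D049BB133111EB);
    x ^ (x >> 31)
}

// Avalanche (SAC): flip each input bit in turn and return the mean
// fraction of output bits that change. The ideal value is 0.5.
fn avalanche_mean(samples: u64) -> f64 {
    let mut flipped = 0u64;
    for i in 0..samples {
        // Spread the sample keys out with an odd multiplier (an assumption).
        let x = i.wrapping_mul(0x9E3779B97F4A7C15);
        let h = mix64(x);
        for bit in 0..64 {
            flipped += (h ^ mix64(x ^ (1u64 << bit))).count_ones() as u64;
        }
    }
    flipped as f64 / (samples * 64 * 64) as f64
}

// Distribution: chi-squared statistic of bucket counts against uniform.
// For a well-behaved hash it should land near `buckets - 1`.
fn chi_square(n: u64, buckets: usize) -> f64 {
    let mut counts = vec![0u64; buckets];
    for key in 0..n {
        counts[(mix64(key) % buckets as u64) as usize] += 1;
    }
    let expected = n as f64 / buckets as f64;
    counts
        .iter()
        .map(|&c| {
            let d = c as f64 - expected;
            d * d / expected
        })
        .sum()
}

// Integral collisions: map n integer keys to n slots via `hash % n` and
// return the fraction of keys that land in an already occupied slot.
fn collision_rate(n: u64) -> f64 {
    let mut occupied = vec![false; n as usize];
    let mut collisions = 0u64;
    for key in 0..n {
        let slot = (mix64(key) % n) as usize;
        if occupied[slot] {
            collisions += 1;
        } else {
            occupied[slot] = true;
        }
    }
    collisions as f64 / n as f64
}
```

Calling `avalanche_mean(1_000)`, `chi_square(100_000, 256)`, and `collision_rate(100_000)` from a `main` function should give values close to 0.5, roughly 255, and roughly e⁻¹ respectively for a well-mixed hash. The exact numbers depend on the function under test.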