Results 1 to 6 of 6

Thread: Knuth's Multiplicative Hashing

  1. #1
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,979
    Thanks
    376
    Thanked 347 Times in 137 Posts

    Knuth's Multiplicative Hashing

    Let's open a small descussion about my favorite hashing method.

    The hash function can be defined as follows:
    Code:
    hash = (value * NUMBER) >> (32 - HASHBITS)
    If we deal with 32-bit math.

    The NUMBER should be large enough, possibly prime. However, it's subject for experiments.

    In some papers we may see the golden ratio term:
    2654435761 = 2^32 * Phi (golden ratio)

    In some sources we may find another values, like this one from Deflate by Bell Labs:
    0x6b43a9b5


    In sources from this forum you may find something like:
    123456791

    etc.



  2. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,239
    Thanks
    192
    Thanked 968 Times in 501 Posts
    Its pretty similar to rc logic, don't you think?
    And its not accidental as ideal hash is a mapping of value to its index,
    and compression is exactly the same thing.

  3. #3
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,979
    Thanks
    376
    Thanked 347 Times in 137 Posts
    Possibly...

    I wonder is an optimal multiplier exists? I just consider myself that multiplier like 123456791 is not optimal with 3-byte sequences. Probably such multiplier is too small for hashing 24-bit value, since larger values works better. Maybe 2654435761 should be the best? Don't you remember what Knuth wrote about this problem in his book?

  4. #4
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,239
    Thanks
    192
    Thanked 968 Times in 501 Posts
    Doesn't look like you understand...
    There's no single optimal coefficient, because its performance depends on the specific distribution of values.
    Instead, to make it perfect you should create a model for that mapping, and then approximate it until it becomes fast enough.
    Also afaik that topic is covered by some lattice theory so Knuth didn't discuss it in much detail.

  5. #5
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,979
    Thanks
    376
    Thanked 347 Times in 137 Posts
    I do understand...

    It's about good mixing - i.e. each input bit should affect each output bit...

    http://bretm.home.comcast.net/~bretm/hash/

  6. #6
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,239
    Thanks
    192
    Thanked 968 Times in 501 Posts
    I supposed that you want less collisions, not obfuscation.
    And for less collisions you'd have to exploit the value's internal correlations,
    by not assigning any cells to improbable values and assigning more cells to probable ones.

    Also multiplication is just a conditional sum of value shifts, so you can
    easily imagine what would happen after adding a specific shift, knowing
    that your value is eg. 4 adjacent symbols from text.

Similar Threads

  1. A nice article about hashing
    By encode in forum Forum Archive
    Replies: 19
    Last Post: 26th September 2007, 21:31

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •