Results 1 to 11 of 11

Thread: Interesting Deflate source

  1. #1
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Look at this source:
    http://cm.bell-labs.com/sources/plan...late/deflate.c

    Especially, I was interested in a hash function. This implementation uses multiplicative hashing:
    Code:
     
    #define hashit(c) (((ulong)(c)&0xffffff)*0x6b43a9b5)>>(32-HashLog))
    The value 0x6b43a9b5 (1799596469) is not even a prime, but I tested such constant with BALZ for 3-byte hashing and it is one of the best so far.

  2. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,340
    Thanks
    211
    Thanked 1,009 Times in 534 Posts
    Why don't you tune your hash to the data, instead of using some random values, prime or not?

  3. #3
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by Shelwien
    Why dont you tune your hash to the data
    I do. I carefully test the multiplier on the real files, collecting various performance data!

  4. #4
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,503
    Thanks
    741
    Thanked 665 Times in 359 Posts
    Quote Originally Posted by encode
    The value 0x6b43a9b5 (1799596469) is not even a prime,
    the value shouldnt be a prime. primeness is just some sign of rather good multipliers i think that this multiplier will be not much different to your one. actually, once ive tested your prime against ideal hashing and found a very little difference

    multiplication scheme has only one serious cabeat - highest bits of hashed value affects only highest bits of resulting hash value. smth like this:

    (x+(x>>16))*123456791

    may improve hashing quality

  5. #5
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,340
    Thanks
    211
    Thanked 1,009 Times in 534 Posts
    > highest bits of hashed value
    > affects only highest bits of resulting hash value. smth like this:
    > (x+(x>>16))*123456791
    > may improve hashing quality

    (x+(x>>16)) ~= (x+x/65536) = x*(65537/65536)
    x*(65537/65536)*123456791 = x*123458674.801132
    So I'm not so sure about that.

    >> Why don't you tune your hash to the data
    > I do. I carefully test the multiplier on the real files,
    > collecting various performance data!

    Somehow I think that you just manually try out and compare some
    selected hash functions/parameters.

    And I was talking about
    1. Log the memory access pattern for some dataset
    with unhashed contexts for addresses
    (reads/writes; also other parameters like value size
    and offset in hash cell)
    2. Measure the processing speed for all 2^32 multiplier values
    (or including other hash functions too), keeping some number
    of best results.
    3. Test the real compressor performance with these stored
    best hash configurations.

  6. #6
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by Shelwien
    Somehow I think that you just manually try out and compare some
    selected hash functions/parameters.
    Actually, in my case the hash function is not so extremely important. I hash 24-bit value to 1M hash HEAD. Its more than enough number of hash buckets. Since I use hash chains, hash function should do good enough separation and spread all 16M values to 1M hashes - to minimize hash chain branches. I kept multiplicative hashing. I tested the multiplier in next manner - I keep only HEAD for string searching (i.e. no further traverse on PREV) and measure the compression ratio on various files.

  7. #7
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,340
    Thanks
    211
    Thanked 1,009 Times in 534 Posts
    Well, I was talking about hash speed.
    And that depends mostly not on the distribution of indexes,
    but on the alignment and clustering of accesses.
    Your explanation didn't contain anything about that.

  8. #8
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,503
    Thanks
    741
    Thanked 665 Times in 359 Posts
    Quote Originally Posted by Shelwien
    (x+(x>>16)) ~= (x+x/65536) = x*(65537/65536)
    x*(65537/65536)*123456791 = x*123458674.801132
    So Im not so sure about that.
    why? it solves problem of distributing influence of higher value bits among all hash bits

  9. #9
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,340
    Thanks
    211
    Thanked 1,009 Times in 534 Posts
    I mean it doesn't look any different from simply changing the multiplier value.

  10. #10
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Here i really like Matt's solution, since it obviously does what you want.
    M1, CMM and other resources - http://sites.google.com/site/toffer86/ or toffer.tk

  11. #11
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,503
    Thanks
    741
    Thanked 665 Times in 359 Posts
    you may cosider this as an idea of using non-integer multipliers

Similar Threads

  1. Interesting tools
    By lunaris in forum Data Compression
    Replies: 2
    Last Post: 26th August 2009, 00:50
  2. DEFLATE/zlib implementations
    By GerryB in forum Data Compression
    Replies: 10
    Last Post: 7th May 2009, 18:03
  3. deflate model for paq8?
    By kaitz in forum Data Compression
    Replies: 2
    Last Post: 6th February 2009, 21:48
  4. FreeArc is becoming more and more interesting...
    By Vacon in forum Forum Archive
    Replies: 65
    Last Post: 9th December 2007, 21:41
  5. Outdated, but maybe interesting...?
    By Vacon in forum Forum Archive
    Replies: 1
    Last Post: 20th October 2007, 21:13

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •