Originally Posted by **Matt Mahoney**
It depends. If your 32-bit hash is:

Code:

hash = hash * (m << s) + next_byte;

then m can be any odd number and you get an order-ceil(32/s) hash. But if you compute several hashes that share one hash table, then you want the multipliers to be relatively prime to reduce collisions. The easy way to do that is to choose a different prime m for each hash.
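To see why the order is ceil(32/s): since m is odd, (m << s) has exactly s trailing zero bits, so after ceil(32/s) updates a byte's contribution has been shifted entirely out of the low 32 bits and vanishes. A minimal sketch in C (the multiplier 0x9E3779B1 and s = 4 are my own illustrative choices, not from any particular compressor):

Code:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* m odd => (m << s) has exactly s trailing zero bits, so a byte's
   contribution dies after ceil(32/s) updates: an order-8 hash here. */
static uint32_t roll(uint32_t hash, uint8_t next_byte) {
    const uint32_t m = 0x9E3779B1u; /* any odd number works */
    const int s = 4;                /* order = ceil(32/4) = 8 */
    return hash * (m << s) + next_byte;
}

static uint32_t hash_str(const char *p) {
    uint32_t h = 0;
    while (*p) h = roll(h, (uint8_t)*p++);
    return h;
}

int main(void) {
    /* Bytes more than 8 positions back no longer affect the hash... */
    assert(hash_str("Xabcdefgh") == hash_str("Yabcdefgh"));
    /* ...but the last 8 bytes do. */
    assert(hash_str("abcdefgh") != hash_str("abcdefgX"));
    puts("ok");
    return 0;
}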

However, my approach, which was used in LWCX, is slightly different (the same as Toffer's and Shelwien's, though):

Code:

hash = (context*C) >> 32;              // step 1
(...)
hash = hash ^ (last_nibble_context*N); // step 2

I've optimized those C and N coefficients via GA+RHC, and I found these values:

Code:

HASH_N0=0x4e1d74b5 (1310553269 = 43 x 821 x 37123)
HASH_N1=0x11382b3f (288893759 = 7 x 7 x 11 x 607 x 883)
HASH_N2=0xf05c8f6e (4032597870 = 2 x 3 x 3 x 5 x 7 x 353 x 18133)
HASH_N3=0xbdbc7a0b (3183245835 = 3 x 5 x 7 x 11 x 17 x 223 x 727)
HASH_C0=0x7b148525018bcc9e (8868859960185572510 = 2 x 5 x 5821 x 2653429 x 57419939)
HASH_C1=0x7307540b9da02834 (8288686048064579636 = 2 x 2 x 1021 x etc)
HASH_C2=0x215433d3cdc27541 (2401601486078506305 = 3 x 5 x 11 x 2087 x etc)
HASH_C3=0x6e14c071a725dfbf (7932176438074400703 = 3 x 40129 x etc)

As you can see, they are not prime. Note that each nibble has its own hash table; in other words, there are two hash tables in total, but 4 context models share the same hash table.
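Putting step 1 and step 2 together, a sketch of the scheme in C (assuming a 64-bit context and a 32-bit result; the function and variable names are my reading of the post, not code from LWCX itself, and only the first C/N pair is used here):

Code:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Step 1: multiply-shift hash of the full context (keep high 32 bits).
   Step 2: XOR in a multiplicative hash of the last-nibble context. */
static uint32_t ctx_hash(uint64_t context, uint32_t last_nibble_context) {
    const uint64_t C = 0x7b148525018bcc9eull; /* HASH_C0 */
    const uint32_t N = 0x4e1d74b5u;           /* HASH_N0 */
    uint32_t hash = (uint32_t)((context * C) >> 32); /* step 1 */
    hash ^= last_nibble_context * N;                 /* step 2 */
    return hash;
}

int main(void) {
    /* With context = 1 and no nibble context, step 1 just yields
       the high 32 bits of C. */
    assert(ctx_hash(1, 0) == 0x7b148525u);
    assert(ctx_hash(1, 0) != ctx_hash(2, 0));
    printf("%08x\n", ctx_hash(0x123456789abcdefull, 0xau));
    return 0;
}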

What I meant in the previous post is that collisions sometimes have a positive effect on compression because of implicit context merging. So trying to reduce collisions is not always a good idea.