Okay, here is the new testbed version. Please test it carefully for:
+ Decompression speed (compared to 1.10 and 1.11)
+ Compression speed with both modes (again compared to 1.10 and 1.11)
Thank you!
Link:
quad111x.zip (30 KB)
Link #2:
quad111x2.zip (30 KB)
QUAD 1.11x2 is also here! This one does not inline E8/E9 - just to see if there is a difference, or maybe this one works even faster!
Here you go, Ilia.
Compression Speed (ENWIK8 with -x):
1.10: 88.1s
1.11: 89.6s
1.11x: 83.5s
Compression Speed (ENWIK8 without -x):
1.10: 46.3s
1.11: 46.4s
1.11x: 46.1s
Decompression Speed (ENWIK8 compressed with 1.11 -x):
1.10: 13.1s
1.11: 13.6s
1.11x: 13.1s
I think you should keep 1.11x.
Too late. But I don't think this will be slower or faster - I mean, the filter is only used once every 16 MB.
Thanks for testing!
Link #3:
quad-1.11xHASH.zip
A very rough draft version with hashing (memory usage +4 MB). But on some files this one works two times faster!
Link #4:
quad-1.11HASH2.zip
Improved version - works even faster with BOTH modes! (+8 MB memory usage)
I can confirm an incredible speed boost with QUAD 1.11 HASH2;
the results will be online in an hour from now.
Cheers
My congratulations! Now QUAD is definitely better than LZMA:fast on text files! The speed improvement over the previous version is 1.5-2x.
Thank you! Remember, this is only a beginning - I just started playing with hashing...
Quick test...
Test file: ENWIK8
QUAD v1.11
Compression Time = 00:02:19.734
QUAD v1.11 HASH2
Compression Time = 00:01:25.953
QUAD v1.11 -x
Compression Time = 00:04:12.406
QUAD v1.11 HASH2 -x
Compression Time = 00:02:28.125
This HASH2 version has awesome speed!
Awesome improvement, Ilia!
Yes, it is faster.
http://cs.fit.edu/~mmahoney/compression/text.html# 2561
Thank you Matt!
I can explain what I've done - it's all about memory access.
During each search for the longest match we need random access to the buffer, and this leads to cache misses and inefficient memory use. For example:
Assume current context is "ab".
The table says that such context already appears at:
4
12524
675467
976579
1065744
etc.
These offsets can look just like random values.
Okay, so now we need to access each offset to check the match - and now you can see how far we may jump for each match search, even if we access only one byte (the primary match check).
To minimize buffer accesses, we can keep an additional table with the bytes that follow each context occurrence. This can be a 2-byte pair, a hash, or even a single byte.
In this case, at each match search we first check this table for the current N bytes. Only if they match do we access the buffer to verify the real match.
So, for each context->offset stored in the table, we also keep the following bytes or their hash. This table is updated in the same way as tab[] - i.e. we just keep the list of recent items. And that's it!
Currently I'm experimenting to find the best approach. At first sight, this thing is worth adding. And finally, QUAD becomes memory-asymmetric - compression needs slightly more memory than decompression. QUAD 1.11HASH2 is symmetric only due to its experimental-draft nature...
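To make the idea concrete, here is a minimal sketch of such a scheme (the names, table sizes and slot count are my own illustration, not QUAD's actual source): next to each stored offset we also cache the two bytes that followed that occurrence, so most candidates can be rejected without a random read of the main buffer.

#include <cstring>

const int CTX_SIZE  = 65536;  // 2-byte context - illustrative
const int NUM_SLOTS = 16;     // candidates kept per context - illustrative

struct MatchTable {           // in practice allocated on the heap - it is several MB
    int            offs[CTX_SIZE][NUM_SLOTS];   // recent offsets, newest first
    unsigned short next2[CTX_SIZE][NUM_SLOTS];  // the 2 bytes that followed each offset

    // remember that context 'ctx' was just seen at position 'pos'
    void update(unsigned ctx, int pos, const unsigned char *buf) {
        memmove(&offs[ctx][1],  &offs[ctx][0],  (NUM_SLOTS - 1) * sizeof(offs[0][0]));
        memmove(&next2[ctx][1], &next2[ctx][0], (NUM_SLOTS - 1) * sizeof(next2[0][0]));
        offs[ctx][0] = pos;
        memcpy(&next2[ctx][0], buf + pos, 2);    // cache the bytes following the context
    }

    // cheap pre-check: touch the buffer at offs[ctx][slot] only if the cached bytes agree
    bool worthChecking(unsigned ctx, int slot, const unsigned char *buf, int curPos) const {
        unsigned short two;
        memcpy(&two, buf + curPos, 2);
        return next2[ctx][slot] == two;
    }
};

The extra table is updated the same way as tab[]: the newest entry is pushed to the front and the oldest falls off; only when the cached bytes agree do we pay for the random buffer read to measure the real match length.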
Some additional info:
When optimizing a program, you always have some important measure you should minimize. For example, with SQL this may be the amount of data fetched or the number of queries issued.
For asm optimization, in the good old days the main measure was CPU cycles. But nowadays, before optimizing CPU cycles one should first optimize the number of cache misses. This is especially important for LZ and most other compression schemes, which utilize many megabytes with rather random access.
One memory read costs ~100 CPU cycles, but the CPU reads a whole cache line each time. This cache line is 16 bytes long (and, AFAIR, 32 bytes in 64-bit mode) and of course aligned to a 16-byte boundary. So the question is how many 16-byte blocks your program touches.
With the original scheme, your program reads 1.25 blocks for each checked match (even a bit more, because the checked strings may cross a block boundary).
With a scheme that saves 4 bytes of each match together with the match index, you get 0.5 blocks read per match, plus the blocks for matches whose length was >4.
By keeping the match index and its hash together you can save a few memory accesses - i.e. you should make one array of structs instead of two separate arrays.
About the selection of the hash function - it's a field for wide experiments. In particular, I suggest you gather more statistics about typical match lengths. Two ideas: first, you can use 3 bytes instead of 4 to store the index; second, because your minimal match is 3 bytes, you can store a reduced hash of these bytes - it should be not so good for the following bytes. The hash formula may be the following:
(value*789123456)>>24
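As a concrete illustration of that formula (my own toy example, not code from the thread): multiply the 16-bit value read from the buffer by the "random-looking" constant and keep the top 8 bits of the 32-bit product, so every input bit gets a chance to influence the result.

#include <cstdio>

inline unsigned hash8(unsigned short value) {  // maps 2 bytes to 0..255
    return (value * 789123456u) >> 24;
}

int main() {
    // similar inputs ("ab", "ac", "bb") should spread over different buckets
    printf("%u %u %u\n", hash8(0x6162), hash8(0x6163), hash8(0x6262));
    return 0;
}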
I have an idea for clever usage of the current tab[] - keep the actual offset plus an 8-bit hash (or one following byte) in each element:
int p = tab[x][m] & 0xffffff;
int hash = tab[x][m] >> 24;
or something like that...
About the hash, I like the simple schemes:
((c >> X) ^ c) & (N - 1)
This one works faster. In addition, we can use a modification of CRC32/CRC16 for hashing.
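For completeness, a minimal sketch of that xor-and-mask scheme (X and N here are placeholders of my own choosing; N must be a power of two so that & (N - 1) acts as a cheap modulo):

inline unsigned xorHash(unsigned c) {
    const unsigned X = 11;         // shift amount - illustrative only
    const unsigned N = 1u << 16;   // table size, must be a power of two
    return ((c >> X) ^ c) & (N - 1);
}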
My hashing scheme combines speed and good bit shuffling. With XOR, each hash bit depends only on two original bits.
It should be better to keep the index and any cached bytes together - this way you get 1 memory access instead of 2.
What do you mean?
Currently, I've got a super thing! I keep the offset + 8-bit hash together - memory usage is the same as with 1.10, but compression speed is even faster than 1.11HASH2...
Also, I hash two bytes into an 8-bit hash. What do you think - which hash function is better?
I think I'll keep the 8-bit hash. Hashing 3 bytes may lead to many hash collisions...
>Also, I hash two bytes into an 8-bit hash. What do you think - which hash function is better?
Mine.
>I think I'll keep the 8-bit hash. Hashing 3 bytes may lead to many hash collisions...
Anyway, you only need strings with the same first 3 bytes.
>What do you mean?
Instead of making a *separate* array for the hash data, it's better to keep the hashes together with the indexes, so they will lie in the same cache line.
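A small sketch of the layout difference (field names are mine, not QUAD's): with two separate arrays the index and its cached hash live on different cache lines, so each candidate costs two reads; one array of structs, or both values packed into a single 32-bit word as described above, needs only one.

// separate arrays: two memory accesses per candidate
//   int           idx[NUM_SLOTS];
//   unsigned char hsh[NUM_SLOTS];

// together: one access fetches both
struct Slot {
    int           idx;   // position in the buffer
    unsigned char hsh;   // cached hash of the following bytes
};

// or packed into one int: 24-bit offset + 8-bit hash
inline unsigned pack(int offset, unsigned hash) { return (hash << 24) | (offset & 0xffffff); }
inline int      offsetOf(unsigned v)            { return v & 0xffffff; }
inline unsigned hashOf(unsigned v)              { return v >> 24; }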
Read the post below - I've already done that...
Yepp, your hash works better! But I probably have to experiment with the MAGIC number.
AFAIK, it should be some large prime number... So I must experiment to find the best number for hashing 2 bytes...
THIS IS CRAZY!!!
I made a better hash function and I get a VERY serious speed boost!!!
inline unsigned int gethash(int i) { // DEBUG
  return ((*(reinterpret_cast<unsigned short *>(&buf[i])) * 987660757) >> 24);
}
The new magic number is taken from PAQ1 and it works MUCH better! I can't believe it!
However, the performance depends on the data type. On fp.log and reaktor.exe this magic number gives a fantastic speed improvement, but on ENWIK8 it gives no improvement (compared to Bulat's magic). Anyway, compared to 1.11HASH2 this new version is indeed faster!
Continue digging...
inline unsigned int gethash(int i) { // FIXME
  return ((*(reinterpret_cast<unsigned short *>(&buf[i])) * 789123456) >> 24);
}
Well, after some tuning and testing, Bulat's magic turns out to be the best! Thank you, Bulat! Just one question - why 789123456?
Are you going to release another awesome testbed version before the final release?
I do!
It's just a random 32-bit number. Actually, you should look at its binary representation and check that there are about sixteen 1's and that there are no (too many) regular patterns like 111111 or 101010. As I said, this operation just shuffles the bits.
Don't forget to use the entire 3 bytes for the hash:
return ((*(ulong*)(buf+i)&0xffffff)*789123456) >> 24;
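A quick way to apply that check to a candidate constant (a throwaway helper of my own, not part of QUAD): count the 1 bits and print the binary pattern, e.g. for 789123456 and the PAQ1-derived 987660757.

#include <cstdio>

int popcount32(unsigned v) {            // number of set bits
    int n = 0;
    for (; v; v >>= 1) n += v & 1;
    return n;
}

int main() {
    const unsigned magics[] = { 789123456u, 987660757u };
    for (int i = 0; i < 2; ++i) {
        printf("%u: %d one-bits, ", magics[i], popcount32(magics[i]));
        for (int b = 31; b >= 0; --b) putchar((magics[i] >> b & 1) ? '1' : '0');
        putchar('\n');
    }
    return 0;
}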
Excellent!!!
I tried this, and it may give a tiny speed boost with text files and some slowdown with binary data, due to hash collisions. So currently I keep just two bytes, since it's better on average...
Will run some tests!