Yes! (Originally Posted by encode)
I will try the test again after a reboot and post the results later.
Here is a link to the compressed ENWIK8 file.
http://rapidshare.com/files/98061712/enwik8.balz.html
Can't confirm either... However, I can confirm that compression on your PC went bad at some point: BALZ 1.02 compresses enwik8 to 29.2 MB, so 5 MB would be a new Hutter Prize record by a HUGE margin. Your corrupted archive also decompresses to 100,000,121 bytes here.
EDIT: I really HATE those kitten fonts at rapidshare! Why are they doing that to us?
EDIT2: Fed the original ENWIK8 and LovePimple's decompressed one to a file comparator: from byte 16775499 (inclusive) onward, everything is corrupted. I believe this is the size of one block.
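The comparison above boils down to finding the first offset at which two files diverge. Here is a minimal Python sketch of that check. The function names are hypothetical, since the thread only mentions an unnamed file comparator. It returns a 0-based offset, so a comparator that counts bytes from 1 would report a number one higher.

```python
def first_difference(a, b):
    """0-based offset of the first differing byte, or -1 if the inputs are equal.

    If one input is a prefix of the other, the shorter length is returned.
    """
    n = min(len(a), len(b))
    for i in range(n):
        if a[i] != b[i]:
            return i
    return -1 if len(a) == len(b) else n


def first_difference_files(path_a, path_b, chunk=1 << 20):
    """Chunked version for large files such as enwik8 (skips identical chunks quickly)."""
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            ca, cb = fa.read(chunk), fb.read(chunk)
            if ca != cb:
                return offset + first_difference(ca, cb)
            if not ca:  # both files ended at the same point with no difference
                return -1
            offset += len(ca)
```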
Test failed again, but the compressed size was 24.4 MB (25,622,851 bytes) this time. The decompressed file is once again longer than the original.
BALZ does not report any errors during compression or decompression of the file.
Intel Core 2 Duo E6600: test OK.
I have repeated the test on enwik8 with my MOC test program, and the decompressed file is identical to the original.
The compressed size of ENWIK8 should be 30,634,726 bytes. CRC32 of the compressed file: 28DC2D3C
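To check a downloaded archive against the CRC32 quoted above, here is a short Python sketch built on the standard `zlib.crc32`. That function computes the common zlib/PKZIP CRC-32, and I am assuming that is the variant meant above. The file is read in chunks, so the whole archive never has to fit in memory at once.

```python
import zlib


def crc32_hex(data):
    """CRC-32 (zlib/PKZIP variant) of a bytes object as 8 uppercase hex digits."""
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08X")


def crc32_of_file(path, chunk=1 << 20):
    """Same checksum for a file, updated incrementally over 1 MiB chunks."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk)
            if not block:
                break
            crc = zlib.crc32(block, crc)
    return format(crc & 0xFFFFFFFF, "08X")
```

If the assumption about the CRC variant holds, `crc32_of_file("enwik8.balz") == "28DC2D3C"` should be true for a good archive.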
The block size is exactly 16,777,216 bytes (i.e., 16 MB). In this case, the 5 MB file holds the first 16,775,498 bytes of ENWIK8 in compressed form; after that, the decompressor produces garbage... (Originally Posted by Black_Fox)
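Suppose BALZ simply cuts its input into consecutive 16,777,216-byte blocks. The `n=16777216` lines in the DEBUG log are consistent with this, but the exact splitting rule is an assumption. Under that rule, enwik8's 100,000,000 bytes make five full blocks plus a shorter sixth one. A quick check:

```python
BLOCK_SIZE = 1 << 24  # 16,777,216 bytes (16 MB)


def block_sizes(total, block=BLOCK_SIZE):
    """Sizes of the consecutive blocks that a `total`-byte input is cut into."""
    full, rest = divmod(total, block)
    return [block] * full + ([rest] if rest else [])


sizes = block_sizes(100_000_000)  # enwik8
print(len(sizes), sizes[-1])      # 6 16113920
```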
The compressor and decompressor have no error checking, since the entire program is so simple that there's nothing to go wrong... A larger decompressed file is possible only if you try to decompress a corrupted file, since the original file size is kept in a small file header. (Originally Posted by LovePimple)
The link to a valid compressed ENWIK8:
ENWIK8.balz (29 MB)
Try to decompress it on your machine!
I think this may be an OS/hardware-related problem... Maybe some compiler optimizations are not compatible with such an old CPU... Get a Core 2 Duo!
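The thread doesn't describe the BALZ header layout beyond saying that it stores the original file size. As a purely hypothetical illustration, suppose that size is stored as an 8-byte little-endian integer. A decompressor could then cheaply compare the number of bytes it produced against the stored value and report an error, instead of silently writing a 100,000,121-byte file. This only catches length errors. Catching corrupted content of the right length would need a checksum.

```python
import struct

SIZE_HEADER = struct.Struct("<Q")  # hypothetical layout: 8-byte little-endian original size


def write_header(original_size):
    return SIZE_HEADER.pack(original_size)


def read_header(archive):
    """Split an archive into (original_size, payload)."""
    if len(archive) < SIZE_HEADER.size:
        raise ValueError("archive is too short to contain a header")
    (size,) = SIZE_HEADER.unpack_from(archive, 0)
    return size, archive[SIZE_HEADER.size:]


def output_size_ok(expected, produced):
    """Cheap sanity check a decompressor could run before reporting success."""
    return produced == expected
```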
I analyzed the 5 MB file provided by LovePimple. It looks like BALZ crashes on LovePimple's machine after compressing the first block. I tried to reproduce this by killing BALZ just after compression of the first 16 MB completes, without flushing the arithmetic encoder and so on, and I got exactly the same file LovePimple provided! Later he said he got a 24 MB file; I think those 24 MB are the compressed ENWIK8 without the last block. In conclusion, BALZ crashes at block boundaries on his CPU.
The question is: does BALZ display the 'done' message after compressing? If it does, BALZ somehow crashes within the encode() function. Otherwise, BALZ simply crashes on his machine...
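The sketch below shows why a missing encoder flush truncates the archive. It is a 32-bit binary arithmetic coder in the style of Matt Mahoney's lpaq1. That BALZ's coder works the same way is an assumption, and the fixed probability `p` is a simplification of a real adaptive model. Bytes are written only once the top bytes of the bounds `x1` and `x2` agree, so when encoding stops, the final interval still lives only in those registers. The flush writes one more byte to pin it down. If that byte is skipped, as in the killed-process experiment, the archive is one byte short, and the last few symbols may decode incorrectly.

```python
MASK = 0xFFFFFFFF  # 32-bit registers


def encode_bits(bits, p=3000, flush=True):
    """Encode 0/1 symbols with a fixed 12-bit probability p = P(bit == 1) * 4096."""
    x1, x2, out = 0, MASK, bytearray()
    for y in bits:
        xmid = x1 + ((x2 - x1) >> 12) * p
        if y:
            x2 = xmid
        else:
            x1 = xmid + 1
        # Output leading bytes as soon as x1 and x2 agree on them.
        while ((x1 ^ x2) & 0xFF000000) == 0:
            out.append(x2 >> 24)
            x1 = (x1 << 8) & MASK
            x2 = ((x2 << 8) & MASK) | 0xFF
    if flush:
        # One byte of x1 is enough because the decoder pads missing input with 0xFF.
        out.append(x1 >> 24)
    return bytes(out)


def decode_bits(data, count, p=3000):
    """Decode `count` symbols; reads past the end of `data` return 0xFF."""
    it = iter(data)

    def next_byte():
        return next(it, 0xFF)

    x1, x2, x = 0, MASK, 0
    for _ in range(4):
        x = (x << 8) | next_byte()
    bits = []
    for _ in range(count):
        xmid = x1 + ((x2 - x1) >> 12) * p
        y = 1 if x <= xmid else 0
        bits.append(y)
        if y:
            x2 = xmid
        else:
            x1 = xmid + 1
        while ((x1 ^ x2) & 0xFF000000) == 0:
            x1 = (x1 << 8) & MASK
            x2 = ((x2 << 8) & MASK) | 0xFF
            x = ((x << 8) & MASK) | next_byte()
    return bits
```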
balz 1.02 was OK on LTCB. I verified decompression for both files.
enwik8=30,634,726
enwik9=268,552,062 (6 hours to compress, 1 minute to decompress)
http://cs.fit.edu/~mmahoney/compression/text.html
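Results like these are often restated as bits per character (bpc), i.e. 8 × compressed size / original size. Here is a small Python helper applied to the two sizes above, using the fact that enwik8 is 10^8 bytes and enwik9 is 10^9 bytes:

```python
def bits_per_char(compressed, original):
    """Compressed size expressed in bits per input byte (bpc)."""
    return 8 * compressed / original


print(f"enwik8: {bits_per_char(30_634_726, 10**8):.3f} bpc")   # enwik8: 2.451 bpc
print(f"enwik9: {bits_per_char(268_552_062, 10**9):.3f} bpc")  # enwik9: 2.148 bpc
```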
Thanks, Matt! Please test RZM, it's an incredible compressor! Hi!
Decompresses in about 30 seconds. Byte check (FC) reports the decompressed file to be exactly the same as the original. (Originally Posted by encode)
YES! (Originally Posted by encode)
Thank you, Matt! (Originally Posted by Matt Mahoney)
LovePimple
A special DEBUG release for you:
balz102d.zip (50 KB)
- please report the compressor's output!
Will run the test now! (Originally Posted by encode)
balz v1.02DEBUG by encode
sizeof(head)=67108864
sizeof(prev)=67108864
sizeof(path)=201326592
size=100000000
==================================================
n=16777216
pass 0 - exe transformation...
exetransform(): no 0x4550 detected
memset(head) - OK
memset(path) - OK
pass 1 - finding all matches...
(optimizing 16384k block...)
pass 2 - constructing path...
smallest approx. cost=53408815
pass 3 - encoding LZ output...
encoding block - OK
==================================================
n=16777216
pass 0 - exe transformation...
exetransform(): no 0x4550 detected
memset(head) - OK
memset(path) - OK
pass 1 - finding all matches...
(optimizing 16384k block...)
37%
Compressed file size is 5,177,344 bytes.
Elapsed Time: 00:33:53.051 (2033.051 Seconds)
Here is a link to the compressed file.
http://rapidshare.com/files/98525223/enwik8.balz.html
As you can see, BALZ drops out in the middle of match searching... No 'done' message is displayed... I just have no idea what could be wrong... (Originally Posted by LovePimple)
Will think more deeply...
OK, check out a different build (compiled with VS2005 SP1):
balz102vs2005.zip (42 KB)
I didn't take note during the first test, but it did display the 'done' message after producing the 24.4 MB file on the second test. (Originally Posted by encode)
This version seems to be working OK. (Originally Posted by encode)
Compression
Compressed size: 30,634,726 bytes
Elapsed Time: 02:46:44.181 (10004.181 Seconds)
Decompression
Elapsed Time: 00:00:37.520 (37.520 Seconds)
Byte check shows decompressed file is identical to the original.
By the way, BALZ v1.03 is coming! It will be released within a week.
What's new:
+ Now BALZ has two modes - fast (default) and max. Fast mode is very fast and crazily efficient, plus it uses less memory.
You need
~144 MB for compression with fast mode and
~336 MB for compression with max mode.
Fast mode is <u>50 times</u> faster on average than max mode!
+ Improved compression in max mode: the new BALZ compresses slightly better while remaining fully compatible with BALZ v1.02
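The two memory figures match the array sizes printed by the v1.02 DEBUG build in this thread, under two assumptions the thread does not state. First, both modes keep one 16 MB input block in memory. Second, only max mode needs the 192 MB `path` array used for optimal parsing. A quick check of that arithmetic:

```python
MiB = 1 << 20

# Array sizes printed by the BALZ v1.02 DEBUG build (bytes).
HEAD = 67_108_864    # sizeof(head) = 64 MiB
PREV = 67_108_864    # sizeof(prev) = 64 MiB
PATH = 201_326_592   # sizeof(path) = 192 MiB
BLOCK = 16_777_216   # assumed: one 16 MiB input block buffer

max_mode = HEAD + PREV + PATH + BLOCK  # assumed: optimal parsing needs `path`
fast_mode = HEAD + PREV + BLOCK        # assumed: greedy parsing does not

print(max_mode // MiB, fast_mode // MiB)  # 336 144
```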
New command line interface:
balz ex test.dat test.balz - max mode
balz e test.dat test.balz - fast mode
what's the dictionary size?
Same 512 KB!
The new fast mode uses unoptimized (greedy) parsing, plus some tricks, like discarding matches that are not good enough, for higher compression. I also played with lazy matching: in most cases it makes BALZ 2-4 times slower for only a minor compression gain, since the match-discarding trick already compensates for such suboptimal parsing in some cases. Maybe in future versions I will add lazy matching as an additional mode, but the current fast mode is a must-have for BALZ!
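Here is a minimal Python sketch of greedy parsing with a match-discard rule, to make the idea concrete. Everything specific in it is an assumption. The match search is brute force (BALZ uses a real match finder). `MAX_LEN` is arbitrary. The discard rule borrows zlib's idea of dropping 3-byte matches that lie farther than `TOO_FAR` bytes back, rather than whatever heuristic BALZ actually applies.

```python
MIN_MATCH = 3
WINDOW = 512 * 1024  # 512 KB dictionary, as stated in the thread
TOO_FAR = 4096       # hypothetical threshold: drop 3-byte matches farther than this
MAX_LEN = 255        # hypothetical maximum match length


def longest_match(data, pos, window=WINDOW):
    """Brute-force longest match for data[pos:] within the window -> (length, distance)."""
    best_len, best_dist = 0, 0
    limit = min(MAX_LEN, len(data) - pos)
    for start in range(max(0, pos - window), pos):
        n = 0
        while n < limit and data[start + n] == data[pos + n]:
            n += 1
        if n > 0 and n >= best_len:  # on ties, prefer the closer (later) start
            best_len, best_dist = n, pos - start
    return best_len, best_dist


def worth_it(length, dist):
    """Hypothetical discard rule: a short, distant match costs more than its literals."""
    if length < MIN_MATCH:
        return False
    return not (length == MIN_MATCH and dist > TOO_FAR)


def greedy_parse(data):
    """Take the longest acceptable match at each position, else emit a literal."""
    tokens, pos = [], 0
    while pos < len(data):
        length, dist = longest_match(data, pos)
        if worth_it(length, dist):
            tokens.append(("match", length, dist))
            pos += length
        else:
            tokens.append(("lit", data[pos]))
            pos += 1
    return tokens


def unparse(tokens):
    """Rebuild the original bytes from literal/match tokens (matches may overlap)."""
    buf = bytearray()
    for token in tokens:
        if token[0] == "lit":
            buf.append(token[1])
        else:
            _, length, dist = token
            for _ in range(length):
                buf.append(buf[-dist])
    return bytes(buf)
```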
it's hard to imagine how 300 MB of memory can be used for a 512 KB dictionary!
Well, actually the match finder can search for matches within the entire buffer (16 MB). If we find a match at a distance longer than 512 KB, we stop the search. I use this approach to avoid circular buffer stuff. (Originally Posted by Bulat Ziganshin)
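A common way to search the whole buffer while still honoring a 512 KB dictionary is a hash chain, sketched here in Python. `head[h]` holds the most recent position whose first three bytes hash to `h`. `prev[i]` links position `i` to the previous position with the same hash. Because the chain runs from newer to older positions, distances only grow along it, so the walk can stop at the first candidate beyond the window. The array names echo the `head`/`prev` sizes in the DEBUG log. The hash function, table size and chain limit are all assumptions, and this is not BALZ's code.

```python
WINDOW = 512 * 1024  # maximum match distance (the 512 KB dictionary)
HASH_BITS = 16       # hypothetical hash table size
MIN_MATCH = 3


def hash3(data, i):
    """Multiplicative hash of the 3 bytes at i (an arbitrary choice for this sketch)."""
    v = data[i] | (data[i + 1] << 8) | (data[i + 2] << 16)
    return ((v * 2654435761) & 0xFFFFFFFF) >> (32 - HASH_BITS)


def find_matches(data, window=WINDOW, max_len=255, max_chain=64):
    """Longest match (length, distance) at every position of the buffer, (0, 0) if none."""
    n = len(data)
    head = [-1] * (1 << HASH_BITS)  # head[h]: most recent position with hash h
    prev = [-1] * n                 # prev[i]: previous position with the same hash as i
    result = []
    for i in range(n):
        best = (0, 0)
        if i + MIN_MATCH <= n:
            h = hash3(data, i)
            cand, steps = head[h], 0
            # Walk the chain from newest to oldest; distances only grow along it.
            while cand >= 0 and steps < max_chain:
                if i - cand > window:
                    break  # every remaining candidate is even farther away
                length, limit = 0, min(max_len, n - i)
                while length < limit and data[cand + length] == data[i + length]:
                    length += 1
                if length >= MIN_MATCH and length > best[0]:
                    best = (length, i - cand)
                cand, steps = prev[cand], steps + 1
            prev[i], head[h] = head[h], i  # insert position i at the front of its chain
        result.append(best)
    return result
```

Indexing the whole buffer like this avoids wrap-around arithmetic, at the cost of `prev` entries for every position in the block.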
At the same time, I do SS parsing through the entire buffer.
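I am assuming "SS parsing" means Storer-Szymanski-style optimal parsing, i.e. choosing the cheapest sequence of literals and matches over the whole buffer. Under that reading, it can be sketched as a backward dynamic program that mirrors the three passes in the DEBUG log: find matches, construct the path, then encode. The cost model here is hypothetical: 9 bits per literal, and 28 bits per match (flag, 8-bit length, 19-bit distance for a 512 KB window). A real coder would estimate costs from its models.

```python
LITERAL_BITS = 9  # hypothetical: flag + 8-bit literal
MATCH_BITS = 28   # hypothetical: flag + 8-bit length + 19-bit distance
MIN_MATCH = 3
WINDOW = 512 * 1024


def longest_match(data, pos, window=WINDOW, max_len=255):
    """Brute-force longest match for data[pos:] -> (length, distance); (0, 0) if none."""
    best_len, best_dist = 0, 0
    limit = min(max_len, len(data) - pos)
    for start in range(max(0, pos - window), pos):
        n = 0
        while n < limit and data[start + n] == data[pos + n]:
            n += 1
        if n > 0 and n >= best_len:
            best_len, best_dist = n, pos - start
    return best_len, best_dist


def optimal_parse(data):
    """Minimum-cost parse over the whole buffer; returns (tokens, total bits)."""
    n = len(data)
    matches = [longest_match(data, i) for i in range(n)]  # pass 1: find matches
    cost = [0] * (n + 1)
    choice = [(1, 0)] * n                                 # (length, distance); 0 = literal
    for i in range(n - 1, -1, -1):                        # pass 2: cheapest path to the end
        cost[i], choice[i] = LITERAL_BITS + cost[i + 1], (1, 0)
        length, dist = matches[i]
        for l in range(MIN_MATCH, length + 1):  # every prefix of a match is also a match
            c = MATCH_BITS + cost[i + l]
            if c < cost[i]:
                cost[i], choice[i] = c, (l, dist)
    tokens, i = [], 0                                     # pass 3: walk the chosen path
    while i < n:
        l, dist = choice[i]
        tokens.append(("lit", data[i]) if dist == 0 else ("match", l, dist))
        i += l
    return tokens, cost[0]


def unparse(tokens):
    """Rebuild the original bytes from literal/match tokens."""
    buf = bytearray()
    for token in tokens:
        if token[0] == "lit":
            buf.append(token[1])
        else:
            for _ in range(token[1]):
                buf.append(buf[-token[2]])
    return bytes(buf)
```

Unlike the greedy parser, the backward pass needs per-position cost and choice arrays for the whole buffer. That is one plausible reason optimal parsing costs so much more memory and time.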
well, it seems like a highly asymmetric solution
Yep!
Having said that, I will do additional experiments with lazy matching, especially with a Deflate-like approach. Soon I'll post the results, and you will be able to suggest which one to use as the default.
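Deflate-style lazy matching defers a match by one byte when the next position offers a longer one. Here is a simplified Python sketch with a brute-force match search. zlib's `deflate_slow` carries the previous match over instead of searching twice. This sketch simply runs the search again at `pos + 1`, which makes the extra cost of lazy evaluation easy to see. The constants are assumptions.

```python
MIN_MATCH = 3
WINDOW = 512 * 1024
MAX_LEN = 255  # hypothetical maximum match length


def longest_match(data, pos, window=WINDOW):
    """Brute-force longest match for data[pos:] -> (length, distance); (0, 0) if none."""
    best_len, best_dist = 0, 0
    limit = min(MAX_LEN, len(data) - pos)
    for start in range(max(0, pos - window), pos):
        n = 0
        while n < limit and data[start + n] == data[pos + n]:
            n += 1
        if n > 0 and n >= best_len:  # on ties, prefer the closer start
            best_len, best_dist = n, pos - start
    return best_len, best_dist


def lazy_parse(data):
    """One-step lazy evaluation: emit a literal if the next byte starts a longer match."""
    tokens, pos, n = [], 0, len(data)
    while pos < n:
        length, dist = longest_match(data, pos)
        if length >= MIN_MATCH and pos + 1 < n:
            next_len, _ = longest_match(data, pos + 1)  # the extra search lazy mode pays for
            if next_len > length:
                tokens.append(("lit", data[pos]))
                pos += 1
                continue
        if length >= MIN_MATCH:
            tokens.append(("match", length, dist))
            pos += length
        else:
            tokens.append(("lit", data[pos]))
            pos += 1
    return tokens


def unparse(tokens):
    """Rebuild the original bytes from literal/match tokens."""
    buf = bytearray()
    for token in tokens:
        if token[0] == "lit":
            buf.append(token[1])
        else:
            for _ in range(token[1]):
                buf.append(buf[-token[2]])
    return bytes(buf)
```

On input like `b"abcXbcdefghYabcdefgh"`, the parser skips the 3-byte match at the final "abc". It emits "a" as a literal and then takes the 7-byte match "bcdefgh" that starts one byte later.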
Have you read Kadach's dissertation? He had a lot of interesting ideas, including improved lazy matching. Here is what I noted down about it:
http://magicssoft.ru/content/docs/12/phd.zip
The LZH part contains about half a dozen ideas that were not known as of 1997,
including four that still interest me now:
- bucket ("МЦ") sort for building the Huffman tree
- "optimal" parsing algorithms
- replacing spaces/zero characters in the hashed string with the characters that follow it
- giving REPDIST codes priority over ordinary distances
(I hope that now that I'm using Opera, Russian characters will be rendered properly)
Of course, I'm familiar with this paper. It's very hard to read, though: too many formulas and too much pseudo-scientific language. I'm afraid I may not correctly understand the ideas listed there. Anyway, in my opinion there's nothing really new.
BTW (compression people should write this as BWT), it looks like magicssoft.ru is dead!
Correct link to this paper (Russian):
http://compression.ru/download/artic...d_1997_pdf.rar
I just found that the new greedy coder may outperform BALZ with the "optimal" one:
canterbury.tar
BALZ v1.03, ex: 576,687 bytes
BALZ v1.03, e: 533,254 bytes
Testing the new mode, I'm thinking about throwing away this SS stuff and keeping only the greedy and lazy parsers... If you test the new BALZ, you'll understand what I mean.
Hello everyone,
Funny how time flows sometimes... (Originally Posted by encode)
Quick decision, indeed. Long live democracy! (Originally Posted by encode)
Just kidding!
Well done Ilia, waiting for testing-results.
Best regards!