o1rc9f uploaded and BCM 0.02 results are in the table.
Thank you!!
Tested an LZP preprocessor with BCM. It not only eliminates the worst case, it also heavily improves processing speed and compression ratio!!
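For the curious, the LZP idea in a nutshell: a hash of the last few bytes predicts a position, and a sufficiently long match is replaced by an escape plus a length code, so long repeats never reach the sorter. A rough sketch, not BCM's actual preprocessor (MINMATCH, the hash size and the escape byte are arbitrary here, and the matching decoder is omitted):
Code:
// Minimal LZP-style preprocessor sketch (illustration only, not BCM's code).
// A hash of the previous 4 bytes indexes a table of positions; if the data at
// the predicted position matches the current data for at least MINMATCH bytes,
// the run is replaced by ESC + a length code. A literal ESC byte is escaped.
#include <cstdint>
#include <cstring>
#include <vector>

static const int      MINMATCH = 32;     // assumed threshold for this sketch
static const uint8_t  ESC      = 0xFE;   // arbitrary escape byte for this sketch
static const uint32_t TABBITS  = 20;     // 1M-entry prediction table

static inline uint32_t hash4(const uint8_t* p) {
    uint32_t x;
    std::memcpy(&x, p, 4);
    return (x * 2654435761u) >> (32 - TABBITS);
}

std::vector<uint8_t> lzp_encode(const std::vector<uint8_t>& in) {
    std::vector<uint8_t>  out;
    std::vector<uint32_t> tab(1u << TABBITS, 0);   // position 0 doubles as "no prediction"
    size_t i = 0, n = in.size();
    while (i < n) {
        if (i >= 4 && i + MINMATCH < n) {
            uint32_t h = hash4(&in[i - 4]);
            uint32_t p = tab[h];
            tab[h] = (uint32_t)i;
            size_t len = 0;
            if (p) while (i + len < n && in[p + len] == in[i + len] && len < MINMATCH + 254u) ++len;
            if (len >= (size_t)MINMATCH) {          // long repeat: emit ESC + length code 0..254
                out.push_back(ESC);
                out.push_back((uint8_t)(len - MINMATCH));
                i += len;
                continue;
            }
        }
        if (in[i] == ESC) { out.push_back(ESC); out.push_back(0xFF); }  // 0xFF marks a literal ESC
        else              out.push_back(in[i]);
        ++i;
    }
    return out;
}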
Testing BCM v0.02 versus BCM v0.01
Enwik8 :
BCM v0.02 : 23.761.415 bytes / 101 sec /decomp 62,4 sec
BCM v0.01 : 21.824.948 bytes / 126 sec /decomp 49,1 sec
Scribble.wav (9.249.480 bytes) :
BCM v0.02 : 11.654 bytes / 637 sec /decomp 4,63 sec
BCM v0.01 : 12.473 bytes / 1253 sec/decomp 3,49 sec
Scribble.wav seems to be a typical "worst case" file for BCM's sorting algorithm.
For comparison:
DARK v0.51 p-b10mf : 9.638 bytes / 3,5 sec !!!
DARK v0.51 p-b10mfsr : 9.587 bytes / 2,3 sec !!!
(10m = block size / f = disable fragmentation / s = disable solid / r = disable reverse)
BBB v1 (cfm10) : 13.014 bytes / 1170 sec
For the next test I've used TAR to put together the same data that Shelwien uses:
Date/time Size Filename
----------------------------------------
2008-06-17 12:13:24 35.594.240 ./shelwdata/bookstar.txt
2008-06-17 11:45:02 52.459.520 ./shelwdata/cyg_bin.tar
2008-06-17 11:46:22 95.436.800 ./shelwdata/cyg_lib.tar
2006-05-19 15:51:12 100.000.000 ./shelwdata/enwik8.txt
2007-02-21 10:23:02 31.851.425 ./shelwdata/finn_lst.txt
2006-09-19 12:14:38 64.368.429 ./shelwdata/gits2op_mkv.mkv
----------------------------------------
6 files, total 379.710.414 bytes
Shelwdata.tar (379.710.414 bytes)
BCM v0.02 : 141.751.386 bytes / 503,5 sec /decomp 259,52 sec
BCM v0.01 : 138.700.039 bytes / 596,6 sec /decomp 198,39 sec
Based on these results, it seems that compression is worse on some files while speed is better. Decompression seems to be slower in v0.02.
@ encode
I wonder why the sorting takes so long... what algorithm does BCM use for sorting ?
I've tested the BWT v1.10 that was posted here: it outputs the BWT in a fraction of the time compared to BCM. Is the BWT output the same for all different algorithms?
Maybe you could build a version that uses BWT 1.10 for sorting and then feeds the output to your CM? It would be over 10x faster on the "worst case" files !!
What do you think about this idea ?
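Just to illustrate what I mean, a rough sketch of such a pipeline using libdivsufsort as an example of a fast suffix sorter (I don't know BWT 1.10's actual interface, and this is of course not BCM's code):
Code:
// Sketch of the suggested pipeline (assumption: libdivsufsort is available).
// divbwt() writes the BWT of 'block' into 'bwt' and returns the primary index;
// the CM coder would then model 'bwt'.
#include <divsufsort.h>
#include <vector>

bool bwt_block(const std::vector<unsigned char>& block,
               std::vector<unsigned char>& bwt, int& primary) {
    bwt.resize(block.size());
    std::vector<saidx_t> tmp(block.size());                // working space for the sorter
    saidx_t idx = divbwt(block.data(), bwt.data(), tmp.data(), (saidx_t)block.size());
    if (idx < 0) return false;                             // negative return = error
    primary = (int)idx;
    return true;                                           // feed bwt + primary to the CM stage
}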
BTW, can anyone test BCM v0.01 and v0.02 for the decompression speed?
With BCM v0.01 I used VS2005 and with v0.02 I used VS2008!
On text files a too-short LZP MINMATCH may hurt compression, but starting at MINMATCH=64 the compression loss is negligible.
Furthermore, as I said, LZP may improve compression, for example:
calgary.tar:
BCM -> 791,434 bytes, 6 sec
BCM+LZP -> 786,640 bytes, 2 sec
A 3x speed boost paired with higher compression...
Short matches replaced by LZP introduce distortion, since the coded match lengths insert artificial symbols, which destroy the context. And BWT heavily relies on the context, since it is context sorting. LZP might only improve compression on very homogeneous data, where LZP almost always replaces the same sequences with the same symbols (match lengths), like: aaaa aaaa -> (by LZP) B B.
Why don't you try a static string substitution? This is more likely to improve compression.
I wonder how your BCM will do when you remove PIC from calgary.tar... I bet there's no compression gain at all, for the reason I said above.
And you really shouldn't test against such a tar with inhomogeneous data.
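To make the suggestion concrete, a toy sketch of what I mean by static substitution (the dictionary and the code bytes are made up for illustration; a real filter would have to verify the codes don't occur in the input, or escape them):
Code:
// Toy static-substitution sketch (not a real filter). Each dictionary string is
// replaced by a single code byte. The codes 0x80.. are assumed not to occur in
// the input; a real preprocessor would verify that or add an escape mechanism.
#include <cstring>
#include <string>

std::string substitute(const std::string& in) {
    static const char* dict[] = { " the ", " and ", "tion", "ing " };   // made-up dictionary
    const size_t D = sizeof(dict) / sizeof(dict[0]);
    std::string out;
    for (size_t i = 0; i < in.size(); ) {
        bool hit = false;
        for (size_t d = 0; d < D; ++d) {
            size_t len = std::strlen(dict[d]);
            if (in.compare(i, len, dict[d]) == 0) {          // exact match at position i
                out += (char)(0x80 + d);                     // one code byte per entry
                i += len;
                hit = true;
                break;
            }
        }
        if (!hit) out += in[i++];
    }
    return out;
}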
It's not my implementation. Just my algorithm.
There was a 32MB blocksize variation at:
http://mij4x.datacompression.jp/junk...2005-02-15.zip
You have to right-click in the app GUI to get the context menu which offers different block sizes. I think he released his source code a short while back.
- Michael
I've included the decompression times in my previous post :
http://encode.su/forum/showpost.php?p=1453&postcount=34
You might have a look there.
It seems that v0.02 is slower at decompressing than v0.01. Could this be due to the smaller block size?
o1rc9g uploaded
o1rc9f uses 3 mixers, o1rc9g uses 4 (still order1 max contexts)
Code:
              [BCM]      [o1rc9f]   [o1rc9g]   [BCM+LZP]
calgary.tar   791,435    785,456    784,037    786,640
book1         212,674    211,047    210,908
world95.txt   474,291    470,390    469,640
Also, o1rc can be tuned for use with an LZP preprocessor (which one, flzp?) if anybody is interested, though you can test it yourself as is.
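For those wondering what "mixers" means here, a rough sketch of PAQ-style logistic mixing of a few bit predictions (illustration only; the weight count, contexts and learning rate are not o1rc's actual values):
Code:
// Rough sketch of logistic mixing of N bit predictions (not o1rc's real mixer).
// Each model's probability is stretched, the stretched values are combined with
// weights selected by an order-1 context, squashed back to a probability, and
// the weights are nudged toward the bit that was actually coded.
#include <cmath>

const int N = 4;                                   // e.g. 3 inputs in o1rc9f, 4 in o1rc9g

struct Mixer {
    double w[256][N];                              // one weight set per order-1 context
    Mixer() { for (int c = 0; c < 256; ++c) for (int i = 0; i < N; ++i) w[c][i] = 0.2; }

    static double stretch(double p) { return std::log(p / (1.0 - p)); }
    static double squash (double t) { return 1.0 / (1.0 + std::exp(-t)); }

    // Mix the model probabilities p[0..N-1] under context ctx (the previous byte).
    double mix(const double* p, int ctx, double* t) const {
        double s = 0.0;
        for (int i = 0; i < N; ++i) { t[i] = stretch(p[i]); s += w[ctx][i] * t[i]; }
        return squash(s);
    }

    // After coding the bit, move the weights toward reducing the prediction error.
    void update(const double* t, int ctx, double p_mix, int bit, double lr = 0.02) {
        double err = bit - p_mix;
        for (int i = 0; i < N; ++i) w[ctx][i] += lr * err * t[i];
    }
};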
Can you please test and compare the attached compile to the original BCM v0.01?
Like I said, this one was compiled using VS2008; the original BCM v0.01 was compiled using VS2005...
On my machine the VS2008 version is crazily slower:
ENWIK8 decompression:
VS2005 - 44 sec
VS2008 - 107 sec
Will try to recompile with a new profile...
Properly recompiled BCM v0.02a:
BCM v0.02a
has the same compression result for me as version 0.02,
but seems to be a little bit slower than version 0.02.
On a system with 4 GB RAM:
bcm 002a memory request:
sorting.. 58.644 kbytes
encoding. 42.240 kbytes
42.244 kbytes
sorting.. 58.644 kbytes
...
...
compression:
sourcefile db.dmp 648.331.264 bytes
balz113 e 33.390.086 bytes 9 min
7zip 35.151.362 bytes 19 min
bcm001 31.927.387 bytes 22 min
bcm002 31.022.446 bytes 17 min
bcm002a 31.022.446 bytes 19 min
What's new in this version?
Is this version more stable in the "worst case"?
New timing test :
Enwik8 :
BCM v0.02 : 23.761.415 bytes / 101 sec /decomp 62,4 sec
BCM v0.02a : / 102,2 sec /decomp 49,4 sec
Shelwdata.tar (379.710.414 bytes)
BCM v0.02 : 141.751.386 bytes / 503,5 sec /decomp 259,52 sec
BCM v0.02a : / 534,0 sec /decomp 195,9 sec
Tested on a C2Q E6600 (quad core @ 2.2 GHz / 4 GB RAM / Vista Prem (32-bit))
It seems compression is a bit slower, while decompression is FASTER !!
Did you expect this ?
Some of us are looking forward to the next version of BCM with LZP preprocessing !
Can you tell us when you're going to release this version ?
- I think the sorting algorithm is the first thing to look at, because it is the reason why BCM is very slow on some files. Maybe a completely different sorting algorithm (like Msufsort, DivSufsort, ...). Memory could go from 7N down to 6N or 5N.
- Second imho would be fragmentation, reversed bytes, ...
- 3rd step would be further optimizing the CM
- 4th step could be introducing filters based on content (not only extension) : exe, wav, images,...
BTW, about segmentation/fragmentation: someone posted here the "hidden" switch in Durilca v0.2, but I forgot it. How was it again?
durilca e -farchive file.tar -s or -f or -t.. .???.
The seg_file works only for smaller files.
Well, currently I've fully switched to the BALZ project. I need to write some README, design a new homepage, plus do some cosmetic source changes.
After that, I might release a new BCM. Mostly, BCM is just a BWT testbed for me. Anyway, for now you may test BCM with Dmitry Shkarin's LZP:
http://compression.ru/ds/lzp.rar
I'm a bit confused as to what was the latest on your sorting, Ilia? Are you still using std::sort? If I may humbly suggest a couple of basic tips which you can explore. First take order-1 stats of the block, then make the pointers based on these statistics. Then sort each of the two-byte buckets with a custom Sedgewick quicksort variant. The Sedgewick variant should work on dwords (like strcmp with dwords instead of bytes), directly comparing the input in the backward direction (if we presume we are working on a little-endian platform), and the comparison function should be inlined. These are fairly easy modifications, as you can find Sedgewick sorting source code online, and see the bzip2 source for the initial order-1 stats. If LZP is used before compression, with these settings you can match (or get close to) Blizzard in compression speed, while requiring only 5n memory.
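A rough sketch of what I mean (illustration only, not BCM's code, and std::sort stands in for the Sedgewick quicksort variant):
Code:
// 1) count two-byte contexts, 2) drop the suffix pointers into 64K buckets,
// 3) refine each bucket with a comparator that walks the input backwards four
//    bytes at a time. On a little-endian machine a plain 32-bit load of the
//    four bytes ending at a position already has the most recent byte as the
//    most significant one, so the dwords can be compared directly, no byte swap.
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <vector>

static inline uint32_t load32(const uint8_t* p) { uint32_t x; std::memcpy(&x, p, 4); return x; }

// Sorts positions 4..n-1 by their preceding context (most recent byte first).
void sort_by_context(const uint8_t* in, uint32_t n, std::vector<uint32_t>& ptr) {
    // 1) order-1 statistics: bucket key = the two bytes preceding each position
    std::vector<uint32_t> count(65536, 0), start(65537, 0);
    for (uint32_t i = 4; i < n; ++i) ++count[(in[i - 1] << 8) | in[i - 2]];
    for (uint32_t c = 0; c < 65536; ++c) start[c + 1] = start[c] + count[c];

    // 2) place the pointers into their buckets
    ptr.assign(start[65536], 0);
    std::vector<uint32_t> fill(start.begin(), start.end() - 1);
    for (uint32_t i = 4; i < n; ++i) ptr[fill[(in[i - 1] << 8) | in[i - 2]]++] = i;

    // 3) refine each bucket with the dword-at-a-time backward comparison
    auto less = [in](uint32_t a, uint32_t b) {
        while (a >= 4 && b >= 4) {
            uint32_t x = load32(in + a - 4), y = load32(in + b - 4);
            if (x != y) return x < y;
            a -= 4; b -= 4;
        }
        while (a > 0 && b > 0) {                        // near the block start: finish byte by byte
            if (in[a - 1] != in[b - 1]) return in[a - 1] < in[b - 1];
            --a; --b;
        }
        return a < b;                                   // shorter context sorts first
    };
    for (uint32_t c = 0; c < 65536; ++c)
        std::sort(ptr.begin() + start[c], ptr.begin() + start[c + 1], less);
}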
Well, like I said, the main goal of BCM is just to test BWT-output coding with my CM - no more, no less...
encode wrote:
Well, now ("After finalizing BALZ")
I have some time to release BCM with LZP preprocessing - to be in the game!
pat357 wrote:
"I guess it is possible to implement BWT for 4 cores..."
"(Winrk might be even doing this already, not sure though..)"
@encode:
Do you think it would be possible to modify your algorithm to use 2 or more threads?
The compression could be done much faster...
Well, as you know, I spent that time on a new version of my PIM file archiver... I just have completely no spare time these days...
I'm not sure that Malcolm is that advanced in BWT. He's more about cool LZ and PPM-based stuff a la ROLZ and some PAQ snips like FPW and PWCM.
Not sure about the CM part. The sorting part, which makes compression so slow, can be, I guess. Since I use stable_sort() there is no way to do such a thing. However, with custom sorting routines we may do that. Hm, I think it is possible to compress each block separately and independently with different threads - for example, 2 cores = we compress two blocks simultaneously, 4 cores = 4 blocks, etc.
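Something like this, just a sketch of the block-per-thread idea (using C++11 std::thread for brevity; compress_block() is a placeholder standing in for the whole per-block sort+CM stage):
Code:
// Sketch of the "one block per core" idea (illustration only, not BCM's code).
// Each thread owns one block, and the compressed outputs are collected in order.
#include <algorithm>
#include <cstdint>
#include <thread>
#include <vector>

// Placeholder for the real per-block work (BWT sorting + CM coding would go here).
static std::vector<uint8_t> compress_block(const std::vector<uint8_t>& block) {
    return block;
}

std::vector<std::vector<uint8_t>> compress_blocks(
        const std::vector<std::vector<uint8_t>>& blocks, unsigned threads) {
    if (threads == 0) threads = 1;
    std::vector<std::vector<uint8_t>> out(blocks.size());
    for (size_t base = 0; base < blocks.size(); base += threads) {
        size_t end = std::min(blocks.size(), base + threads);
        std::vector<std::thread> pool;
        for (size_t i = base; i < end; ++i)             // 2 cores -> 2 blocks at once, 4 -> 4, etc.
            pool.emplace_back([&out, &blocks, i] { out[i] = compress_block(blocks[i]); });
        for (auto& t : pool) t.join();                  // wait for this batch before the next
    }
    return out;
}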
His current Mcomp V2.0 seems to use 4 cores for BWT :
http://www.msoftware.biz/blog/2008/0...cs#attachments
Code:
> mcomp_x32
LibMComp Demo Compressor (v2.00). Copyright (c) 2008 M Software Ltd.

mcomp [options] pofile(s)

Options:
  -m[..]     Compression method:
             b   - BZIP2.
             c   - Experimental DMC codec.
             d   - Optimised deflate (df - fast, dx - max)
             d64 - Optimised deflate64 (d64f - fast, d64x - max)
             lz  - Optimised LZ (lzf - fast, lzx - max)
             f   - Optimised ROLZ (ff - fast, fx - max)
             f3  - Optimised ROLZ3 (f3f - fast, f3x - max)
             p   - PPMd var.J.
             sl  - Bitstream (LSB first).
             sm  - Bitstream (MSB first).
             w   - Experimental BWT codec.
  -MNN[k,m]  Model size (in kb (default) or Mb, default 64M).
  -oNN       Order (for Bitstream and PPMd).
  -np        Display no progress information.

> mcomp_x32 -mw fp.log fplog_mw.mcomp
LibMComp Demo Compressor (v2.00). Copyright (c) 2008 M Software Ltd.
Thread pool size = 4
Using 1280Mb (4 threads)
100%
From 20617071 => To 559960
Elapsed time: 5.355s (3759.82kps)