A small vote about the future of the BCM.
Simplified v0.08. Notably faster, less compression.
Optimized v0.08. About the same speed, a small compression gain.
Enhanced v0.08 + an extra SSE. Moderate speed penalty, nice compression gain.
Enhanced v0.08 + two extra SSEs. Some speed penalty, really nice compression gain.
Something else. Post your variant.
I choose the first one.
OK. Since I voted for "Something else", I'm posting my variant. The suggestion is to make a selectable compression level, so the new BCM covers all of the above speed/ratio tradeoffs and both speed fans and compression maniacs will be happy.
Anyway, if that's not an option, then I choose the last variant - Enhanced v0.08 + two extra SSEs. Some speed penalty, really nice compression gain.
Well, we already have the fastest ever (LZOP) and the slowest ever (PAQ), so anything in between is fine, honestly.
Today I tested all of the modifications. I was curious how many times faster the fastest version is compared to the slowest one. Well, it's less than two times. However, in terms of complexity, the slowest version is far more complex - I'd say 10 times more complex than the fastest. Anyway, the fastest one could be made even faster and simpler, but I see no reason to use a dummy coder inside BCM. Still, the slowest version is a few times faster than BWTmix or BBB... I'm not even talking about the latest BWMonstr...
Yep, I'm thinking about a selectable CM coder - say, a '-x' option would select the strongest one. Anyway, I did a straight comparison of the fastest and the slowest one to see how it feels. Well, from a regular user's point of view, the relatively small compression gain is not worth the extra processing time: 210xxx/209xxx bytes vs 208xxx bytes on book1, and a ~200 KB difference on ENWIK8 doesn't play a serious role from that point of view. But since the compression/decompression speed of the slowest version is still quite acceptable, I think it's OK indeed!
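Just to illustrate the idea of a selectable level, here is a minimal C++ sketch; the level names, the SSE-stage counts and the '-f' flag are all made up for the example and have nothing to do with the real BCM sources:
Code:
#include <cstdio>
#include <cstring>

// Hypothetical sketch: mapping a command-line level to model complexity.
// None of these names or numbers come from the actual BCM code.
enum Level { FAST = 1, NORMAL = 2, MAX = 3 };

struct ModelConfig {
    int sse_stages;   // number of chained SSE stages in the CM coder
    int order;        // context order of the post-BWT model
};

static ModelConfig config_for(Level lvl) {
    switch (lvl) {
        case FAST:   return {0, 1};  // simplified coder, no SSE
        case NORMAL: return {1, 2};  // default-like setup
        case MAX:    return {3, 2};  // extra SSE stages: slower, stronger
    }
    return {1, 2};
}

int main(int argc, char** argv) {
    Level lvl = NORMAL;
    for (int i = 1; i < argc; ++i) {
        if (!std::strcmp(argv[i], "-x")) lvl = MAX;   // '-x' selects the strongest coder
        if (!std::strcmp(argv[i], "-f")) lvl = FAST;  // hypothetical fast mode
    }
    ModelConfig cfg = config_for(lvl);
    // The chosen level would be stored in the archive header so the decoder
    // can rebuild exactly the same model.
    std::printf("level=%d sse_stages=%d order=%d\n", (int)lvl, cfg.sse_stages, cfg.order);
    return 0;
}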
Aha, Sami silently released BWMonstr v0.02. Thanks for the news! Another thing to test!
Why don't you give the information everyone wants to read?
Code:
0.02, July 7, 2009
------------------
This version implements a "compressed model" in which the data is kept
compressed in memory all the time.

Compressed model program flow:
Compression:   compression -> bwt -> compression
Decompression: decompression -> compression -> unbwt -> decompression

BWMonstr is able to perform BWT compression and decompression using about
0.5n space for English text. This is 10% of the amount that typical BWT
implementations use and around 3% - 5% that of PPM or CM implementations.

The program supports multi-threading. In practice the speedup for
compression translates in the following way:
2 processors: 1.57x
3 processors: 1.83x
4 processors: 2.25x
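Just to make the quoted flow easier to follow, here is a purely illustrative C++ sketch that mirrors the order of the steps; the helpers are identity placeholders and none of this reflects how BWMonstr actually implements its compressed model:
Code:
#include <cstdint>
#include <cstdio>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Placeholder helpers: a real entropy coder and a real BWT would go here.
static Bytes compress(const Bytes& b)   { return b; }
static Bytes decompress(const Bytes& b) { return b; }
static Bytes bwt(const Bytes& b)        { return b; }
static Bytes unbwt(const Bytes& b)      { return b; }

// Compression: compression -> bwt -> compression.
// The block is packed first so the working set stays around 0.5n, transformed,
// then packed again for the final output.
static Bytes encode_block(const Bytes& input) {
    return compress(bwt(compress(input)));
}

// Decompression: decompression -> compression -> unbwt -> decompression.
// The decoded BWT data is re-packed before the inverse transform so the
// inverse also runs on a compressed in-memory representation.
static Bytes decode_block(const Bytes& archive) {
    return decompress(unbwt(compress(decompress(archive))));
}

int main() {
    Bytes data = {'b', 'o', 'o', 'k', '1'};
    std::printf("round trip ok: %d\n", decode_block(encode_block(data)) == data);
    return 0;
}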
Actually, BWMonstr 0.02 has been out for a while.
http://mattmahoney.net/dc/text.html#1605
Quite impressive. It uses less memory than the block size, and uses all cores in parallel on a single block with no loss of compression ratio. It makes the Pareto frontier on size/memory. Unfortunately, it is slower than paq8px on a single core.
paq8k2 is still the slowest, however. enwik9 would take months.
Anyway, about the vote: what I'd like to see is good speed and compatibility between versions, kind of like zip.
Why don't you merge bbb and zpaq then?
Yeah, I've been meaning to write a BWT or LZP+BWT based compressor in ZPAQ. The inverse transform should not be too hard to write in ZPAQL. (I already did LZP). But for now I think I will work on a .bmp compressor first.
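For reference, the inverse transform itself is compact; below is a plain C++ version of the textbook counting-based inverse BWT (not ZPAQL, and not taken from bbb or zpaq; it assumes the primary index, i.e. the row holding the original string, is stored alongside the block):
Code:
#include <cstdio>
#include <string>
#include <vector>

// Textbook counting-based inverse BWT. 'last' is the BWT output (last column
// of the sorted rotations) and 'primary' is the row holding the original text.
static std::string inverse_bwt(const std::string& last, size_t primary) {
    const size_t n = last.size();
    std::vector<size_t> count(256, 0), next(n);

    // Count symbol frequencies in the last column.
    for (unsigned char c : last) ++count[c];

    // Turn the counts into starting positions of each symbol in the first column.
    size_t sum = 0;
    for (size_t c = 0; c < 256; ++c) { size_t t = count[c]; count[c] = sum; sum += t; }

    // next[j]: which row's rotation follows row j when walking through the text.
    for (size_t i = 0; i < n; ++i)
        next[count[(unsigned char)last[i]]++] = i;

    // Follow the chain starting from the primary row to rebuild the text.
    std::string out(n, '\0');
    for (size_t i = 0, p = next[primary]; i < n; ++i, p = next[p])
        out[i] = last[p];
    return out;
}

int main() {
    // Tiny check: the BWT of "banana" (no sentinel) is "nnbaaa" with primary row 3.
    std::printf("%s\n", inverse_bwt("nnbaaa", 3).c_str());  // prints "banana"
    return 0;
}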
Yeah - exactly: http://encode.su/forum/showthread.php?t=379
I'd vote for max compression. Maybe you can use a 2D SSE instead of multiple chained SSE stages?
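To make the suggestion concrete, here is a minimal C++ sketch of an SSE/APM stage whose table is indexed by two contexts at once instead of chaining two stages; the sizes, update rate and the missing interpolation between adjacent buckets are simplifications for the example, not anything taken from BCM:
Code:
#include <cmath>
#include <cstdint>
#include <vector>

// 12-bit probabilities, natural-log stretch/squash as usual in CM coders.
static double stretch(int p12)  { return std::log(p12 / (4096.0 - p12)); }
static int    squash(double st) { return (int)(4096.0 / (1.0 + std::exp(-st))); }

class SSE2D {
    int ctx2_size_;
    static constexpr int kBuckets = 33;      // quantized stretch(p) buckets
    std::vector<uint16_t> t_;                // refined 12-bit probabilities
    size_t idx_ = 0;                         // cell used by the last p() call
public:
    SSE2D(int ctx1_size, int ctx2_size)
        : ctx2_size_(ctx2_size),
          t_((size_t)ctx1_size * ctx2_size * kBuckets) {
        for (size_t i = 0; i < t_.size(); ++i)        // each bucket starts at its own p
            t_[i] = (uint16_t)squash(((int)(i % kBuckets) - 16) / 2.0);
    }

    // Refine probability p12 (1..4095) given two separate contexts in one lookup.
    int p(int p12, int ctx1, int ctx2) {
        int bucket = (int)((stretch(p12) + 8.0) * 2.0);   // map stretch(p) to 0..32
        if (bucket < 0) bucket = 0;
        if (bucket > kBuckets - 1) bucket = kBuckets - 1;
        idx_ = ((size_t)ctx1 * ctx2_size_ + ctx2) * kBuckets + bucket;  // 2D index
        return t_[idx_];
    }

    // After coding the bit, pull the used cell toward the observed outcome.
    void update(int bit, int rate = 6) {
        int target = bit ? 4095 : 0;
        t_[idx_] = (uint16_t)(t_[idx_] + ((target - (int)t_[idx_]) >> rate));
    }
};
A model would call p() with its mixed probability and, say, an order-1 context and a match-model state as the two coordinates, then call update() with the coded bit.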
Is it possible to have better compression without a decompression speed penalty, but only at the cost of compression time?
I really don't care about compression speed. I can compress while I do other stuff.
But when I decompress, I'm waiting for the data, and then time becomes important.
Just my thought.
Yeah, it's possible to perform a limited optimization for blocks of data
during compression, and just store the parameters for decompression.
Alphabet reordering and a dynamic dictionary are examples of that too,
but actually I meant something like trying multiple models and selecting
the best one.
However, I'd be really surprised if encode ever implemented something like that!
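A trivial sketch of what I mean, with everything hypothetical (the compress_with callback just stands in for running the real coder with a given model): compress the block with each candidate, keep the smallest output, and store only the winner's id so the decoder rebuilds just that one model.
Code:
#include <cstdint>
#include <utility>
#include <vector>

using Bytes = std::vector<uint8_t>;

struct Encoded {
    uint8_t model_id;   // written to the output so the decoder knows which model to build
    Bytes payload;
};

// Per-block model selection at compression time. 'compress_with' is a
// hypothetical front-end that runs the coder with model 'm' on the block.
static Encoded encode_block(const Bytes& block, int num_models,
                            Bytes (*compress_with)(const Bytes&, int)) {
    Encoded best{0, compress_with(block, 0)};
    for (int m = 1; m < num_models; ++m) {
        Bytes candidate = compress_with(block, m);        // try every candidate model
        if (candidate.size() < best.payload.size())
            best = {(uint8_t)m, std::move(candidate)};    // keep the smallest result
    }
    return best;   // compression pays for N passes; decompression runs only one
}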
Well, it seems that BWMonstr v0.02 has already been tested, but since I've done some tests myself, I think there's nothing wrong with publishing them.
version \ size in bytes \ comp. time (sec) \ dec. time (sec)
The compression ratio improvement is very modest, while v0.02 is more than 10 times slower!
Code:
BOOK1
0.01 = 205 397 = 10.575 = 10.025
0.02 = 204 844 = 108.833 = 50.646

ENWIK6
0.01 = 245 958 = 17.140 = 17.032
0.02 = 244 590 = 181.607 = 105.320

ENWIK8
0.01 = 20 379 365 = 1726.326 = 1675.510
0.02 = 20 307 295 = 17908.286 = 10406.475
Well, the good thing here is that v0.02 shows more asymmetry. Also, a strange thing happens during compression: the output file grows slowly, then at some point its size resets and it starts growing again, but faster.
Maybe from some technical point of view BWMonstr v0.02 is unique, but to me it looks a little bit strange. For example:
Code:
ENWIK6
0.01         = 245 958 = 17.140  = 17.032
0.02         = 244 590 = 181.607 = 105.320
paq8px_61 -1 = 230 097 = 14.250
You can use text preprocessing with BWMonstr too... like WRT or DRT.
I mean, paq8px isn't a plain universal context model.