FLASHZIP rev.0.01
Copyright ® 2008 by Nania Francesco A
INSTRUCTIONS :
To compress FlashZIP -c in out
To decompress FlazhZIP -d in out
* [10.1.2008] V. 0.1 Coming new Fast compressor!
link:
http://www.winturtle.netsons.org/FLASHZIP.zip
FLASHZIP rev.0.01
Copyright ® 2008 by Nania Francesco A
INSTRUCTIONS :
To compress FlashZIP -c in out
To decompress FlazhZIP -d in out
* [10.1.2008] V. 0.1 Coming new Fast compressor!
link:
http://www.winturtle.netsons.org/FLASHZIP.zip
Thanks Francesco!
Quick test...
Test machine: AMD Sempron 2400+ with Windows XP SP2
FLASHZIP Copyright ® 2008 by Nania Francesco Antonio (Italy).
Default
Compress 100000000 to 34053194 bytes time 17.20 5676 KB/s
Another quick test...
A10.jpg > 842,029
AcroRd32.exe > 1,900,776
english.dic > 903,765
FlashMX.pdf > 3,915,570
FP.LOG > 932,331
MSO97.DLL > 2,249,419
ohs.doc > 998,572
rafale.bmp > 1,312,917
vcfiu.hlp > 1,005,850
world95.txt > 837,486
Total = 14,898,715 bytes
Thanks Love Pimple!
Hi!
Is faster or not ?
Its very quick! I will try more tests later.
OK Thanks!
FLASHZIP vs THOR 0.96A
ENWIK8
FLASHZIP
34053194 bytes time 3.88 25201 KB/s
THOR e4
35795184 bytes 4sec 859msec 19.63 MB/sec
Quick speed test...
Timed with AcuTimer v1.2
Test File: ENWIK8 (100,000,000 bytes)
THOR v0.96a (E4)
Elapsed Time: 00:00:14.304 (14.304 Seconds)
Compressed Size: 34.1 MB (35,795,184 bytes)
PKZIP v2.50 Win32 (-max)
Elapsed Time: 00:00:16.035 (16.035 Seconds)
Compressed Size: 34.7 MB (36,432,708 bytes)
FLASHZIP v0.01
Elapsed Time: 00:00:17.292 (17.292 Seconds)
Compressed Size: 32.4 MB (34,053,198 bytes)
THOR v0.96a (E5)
Elapsed Time: 00:00:20.042 (20.042 Seconds)
Compressed Size: 34.0 MB (35,696,032 bytes)
QUAD v1.12
Elapsed Time: 00:01:07.286 (67.286 Seconds)
Compressed Size: 28.2 MB (29,649,604 bytes)
Some more stats, CeleronM 1.7
http://shelwien.googlepages.com/fzip001.htm
Btw, I'd written some perl scripts that automatically generated this.
So I can upload them (maybe include perl binaries even) if anyone would like to use something like that.
I also have an idea about new fast compressor. That is, an improved/lite version of LZPX. If combine a better hash function (multiplicative hash), with FPAQ0P like encoder, plus other stuff... However, for me no reason to do that - LZPM will outperform such things in both compression ratio and decompression time.
As I have said I don't already use in this presser the arithmetic engine similar to that of Matt Mahoney but a similar engine more to that of PPMD or of Quad to understand better. The implementations are anchors so many to do I hope!
What about a algorithm describtion?
M1, CMM and other resources - http://sites.google.com/site/toffer86/ or toffer.tk
I Use a modification of Adaptive Huffman Code in order 1 Byte ! Is difficult to write a pseudo code and the arithmetic engine at moment is not stable entirely!
> I also have an idea about new fast compressor. That is, an
> improved/lite version of LZPX. If combine a better hash function
> (multiplicative hash),
Actually, the best hash for context is its entropy code...
multiplication-based too, obviously. Like, you find
probabilities for a static model on a first pass, and
then use it as a hash for more complex adaptive model
on a second pass. Of course it slows compression, but
decompression could be significantly faster like that.
> with FPAQ0P like encoder, plus other stuff...
> However, for me no reason to do that - LZPM will outperform such things
> in both compression ratio and decompression time.
Really... Why don't anyone work on better modelling anymore?
Even Matt seems more concerned about speed now...
As if theory is perfect already...
Although in fact its fairly easy to get a better result than paq8's
on any data type with known structure. And I bet its possible
for text as well, though proper text modelling takes too much work.
I made a few attempts to improve modeling in my LZPM... Currently LZPM uses:
It's slightly slower than version with just one probability. Mostly due to double memory use.Code:class TPredictor { private: unsigned short p1; unsigned short p2; public: TPredictor(): p1(1 << 15), p2(1 << 15) {} int P() { return ((p1 + p2) >> 6); } void Update(int y) { if (y) { p1 += ((unsigned short)~p1 >> 3); p2 += ((unsigned short)~p2 >> 6); } else { p1 -= (p1 >> 3); p2 -= (p2 >> 6); } } };
What if add version with one probability combined with APM/SSE? Looking at your mixer and latest PAQ9 I see that it's possible to do a fast weighing. However, I tried your mixer with my LZPM, the compression is just a slightly better compared to the dump (a+b)/2. Or maybe I should add a table driven update like pr = tab[pr][y]?
Have you treid to mix diffrent model's predictions? Mixing under the same context won't be a big deal (you just distingush between more or less stationary).
M1, CMM and other resources - http://sites.google.com/site/toffer86/ or toffer.tk
I think as so many that date by now the compression has arrived to such a point that ZIP,RAR and 7ZIP have monopolized everything! I believe nevertheless that the problem is really tied up to the fact that any firm or great company are wanted to put in an already so difficult market! The search that we am doing is also very important however because in a future the algorithms could be used for a new codec video or for the stream-video HDTV for instance! Who knows also it to compress audio!
> I made a few attempts to improve modeling in my LZPM...
Well, I was talking about "better than paq8" modelling,
not the LZ kind. Though, I guess, good techniques can
be applied anywhere.
...And anywhere quality level is kinda the same.
Like, there's no known perfect counter (most precise,
not fastest), no good tools for correlation analysis,
no perfect implementation of combining predictions of
several models, totally no examples of good update
functions for primary submodels whose predictions do
not linearly affect the code length, etc etc.
Code:> if (y) { > p1 += ((unsigned short)~p1 >> 3); > p2 += ((unsigned short)~p2 >> 6); > } > else { > p1 -= (p1 >> 3); > p2 -= (p2 >> 6); > }
> It's slightly slower than version with just one probability. Mostly due to
> double memory use.
> What if add version with one probability combined with APM/SSE?
SSE is always good, at least for compression level.
At least when its proper SSE and not the kind used in paq
> Looking at your
> mixer and latest PAQ9 I see that it's possible to do a fast weighing. However, I
> tried your mixer with my LZPM, the compression is just a slightly better
> compared to the dump (a+b)/2.
That's probably because its not that kind of statistics.
Normally, this trick (dynamic mixing of two counters with
different update speeds) is used when there're alternating
areas with different level of nonstationarity, but no better way
to detect which kind of area is it currently.
And in your case it might be fairly stationary... then
static mix of two counters works better than any of them
probably because optimal update speed value is not power of 2.
So in your place I'd try to rewrite the update like this:
Code:SCALElog = 16; SCALE = 1<<SCALElog; mSCALE = SCALE-1; Speed = 8; wr1 = SCALE/Speed; p1 = p1*(SCALE-wr1) + y*mSCALE*wr1; p1 >>= SCALElog;
Ensure that initially it gives the same compression as yours
(though slower of course).
And then test it with all values of "Speed" from 8 to 64.
Quite possible that with some optimal value you'd get
the same or better compression level like with two counters,
but with a single counter.
Though I didn't really think about speed optimization after this
> Or maybe I should add a table driven update like
> pr = tab[pr][y]?
Maybe, if you'd find that 8 bits precision per counter is enough...
which it isn't.
in TarsaLZP im using something like this:Originally Posted by Shelwien
pr += tab[pr >> 8][y];
pr is a 16 bit value, but im using only high 8 bits of it (to speed up things, with 8 bit probabilty i will have at most one renormalization). and im using tables optimized for lzp+ sse, which gives me 0.5 % improvement on reduntant files like fp.log over normal update.
my sse tables are very small (8 * 256 entries, but most of them are rarely used), so after few kilobytes of input data they are very warm so i can use fixed speed of adaptation without much compression hit (maybe one kilobyte per file).
and as an input to sse im using 8- bit states (as in lpaq) and i think that 8- bit is enough for that. with bigger variables values would be much more unstable and adaptation will be slower.
big precision only helps with very reduntant files, with compression ratio over 100 : 1.
can also be improved as because have made a will him with my filters and is excellent! I think that if, you intend TarsalLZP OpenSource you should do I strive him/it to entirely translate him/it in C++ and you/he/she could be improved together then in 20-40% compression with the addition of a best coding Wave-Bmp-Exe-Tiff. If you want Donkey7, passing him/it in C++ can lend you a hand!
i've described lzp+ sse stuff here: http://phpfi.com/272692 for you some time ago. and it's written in c++.
i haven't described my fixed order- 2 literal coder because it's dumb and inefficient. it's just an array of 16777216 probabilities (256 symbol * 65536 contexts). almost surely you have better thing in your turtle.
TarsaLZP is combination of LZP+ SSE (for matched bytes) and fixed order- 2 PPM (for unmatched bytes).
i don't have time now (end of semester). on holidays (winter or summer) i will have more spare time to work on it.
> pr += tab[pr >> 8][y];
> pr is a 16 bit value, but i'm using only high 8 bits of it (to speed up things,
> with 8 bit probabilty i will have at most one renormalization).
Why, make a 16bit renormalization in rangecoder
> and i'm using tables optimized for lzp+ sse,
> which gives me 0.5 % improvement on reduntant
> files like fp.log over normal update.
I suppose its similar to how contextual compressors with lossy hash
mapping of several contexts into same counter seemingly compress
better than same model with full statistics.
Actually, better precision is _always_ better.
Simply because you can always simulate worse precision with it.
> my sse tables are very small (8 * 256 entries, but most of them are rarely
> used),
is that interpolated SSE at least?
> so after few kilobytes of input data they are very warm so i can use
> fixed speed of adaptation without much compression hit
> (maybe one kilobyte per file).
Well, dunno what are you comparing it with.
Same as with precision, its always possible to get some improvement
with more statistics and more submodels. And comparing LZP even with
PPMd, there's much space for improvement.
> and as an input to sse i'm using 8- bit states (as in lpaq) and i think that 8-
> bit is enough for that. with bigger variables values would be much more
> unstable and adaptation will be slower.
Guess you're doing something wrong... Its like how PPMd can't use
higher-order contexts. But it doesn't mean that they're unusable,
just the model is faulty.
> big precision only helps with very reduntant files,
> with compression ratio over 100 : 1.
Well, I can't agree here.
Actually, almost any compressible file can be considered redundant
like that - its just that there're contexts with good compression
ratio and ones with bad.
Let's suppose that data source is a algorithm which uses some
amount of true random information and some fixed rules to change its
state and generate the data.
Then, ideal compressor for that source is supposed to recover
exactly the used amount of random information from the data and
store it (and use the same generator for decompression).
So, even with not visibly redundant files (like pseudorandom streams
from Ilya's prng) you'd still need all the precision you can afford.
Ok, with most of the real files we don't actually know exactly how
they're generated. But still, its always possible to find
some almost-deterministic contexts in compressible data.
And also you need good precision to process random data or you'd get
high redundancy.
So, afaik, its best to have random and deterministic statistics
separated as much as possible. Of course, its faster to have
huffman-like state/symbol hierarchy, but to implement a correct
compressor that way you need a really good knowledge of data
properties - and even more precision as you'd have to manage
these compound states, where eg. a proper update is much more
complex than usually.
hmm, then i will be left with 16- bit precision rangecoder (i dont want to use 64 bit variables, theyre slow) and this will have big impact on literal coding (which is not bijective, theres 255 symbols in statisitcs) - etiher compression (because of less precision) or speed (i will be forced to use more complex routines to compensate rounding errors).Originally Posted by Shelwien
thats certainly not true. ive tested it and it works. its used only in sse part. sse is not hashed - it uses 3 bit of match history + 8 bit of context state for total of 11 bits (== 2048 entries).Originally Posted by Shelwien
by using tables i emulated fpaq0p behaviour but ive tuned it somewhat. ive explained the difference in article ive linked above:
// creating probability transition tables. they are used when updating probabilities and usually act like updating by 2.5 % (1/ 40th) of the error.
// but it is updating less agressively when statistics are skewed and this helps on skewed statistics (eg. on files like fp.log it is 0.5 % better)
in other words - when probabilty is like 0.01 or 0.99 then we update less aggresively.
hash mapping is always lossy (because hash is lossy by definiton). but the error in lzp model isnt very important - with collision detection i gain very little with big speed loss http://www.encode.su/forums/index.ph...page=1#msg5204
no. i know with interpolation i will benefit from bigger tables but i wanted speed.Originally Posted by Shelwien
as i written above i dont use interpolation or propagation. just simple point- to- point updates (ie. when processing one byte with one lzp model i check and change only one entry from lzp and one entry from sse, lzp models are indepedent). with some mixing, propagating errors, weighting, etc. it will compress a little better but much slower.Originally Posted by Shelwien
using big precision sometimes means dealing with noise. with small precision you filter the noise automatically.
see your post here: http://www.encode.su/forums/index.ph...ic=586#msg7095
To be specific, I have results like this:
370648 370607 // 15 bits
370652 370607 // 12 bits
370646 370628 // 11 bits
371075 370708 // 8 bits
And its not about simple change of shift value - complete
model reoptimization (contexts and other parameters) was
performed for objective comparison (numbers in first column
are results before optimization).
So I think that you need 12bit precision, but anything more
is masked by the "model noise".
TarsaLZP, FlashZIP, PPMdJ are all about efficiency not super high compression ratio.
my main goal is efficient compression of textual data (human readable and understandeable like plain text, html, source codes, charts, etc. not base64 encoded data, dna sequences or random characters). lzp is a best solution to that, although its terrible on binary data.
@Donkey7
Excuse me but in that sources tails it is not understood really a lot any arithmetic engine and there are besides some varying ones that it is not understood where you are been initialized. how example:
const word floor = 40;
const word length = 500;
byte a, b;
word pmiss[256], pmatch[256], prob;
// for Donkey7
A=?????
B=????
for (a = 0; a < 256; a++)
{
if (a == 0)
b = 0;
else if (a < 12
{
b = length / a;
if (b < floor)
b = floor;
}
else if (a == 12
b = floor;
else
{
b = length / (256 - a)
if (b < floor)
b = floor;
}
if (a != 255)
pmatch[i] = (256 - a) * 256 / b; // for Donkey7 i=????
else
pmatch[i] = 0; // for Donkey7 i=????
if (a != 1)
pmiss[i] = a * 256 / b;// for Donkey7 i=????
else
pmiss[i] = 0;// for Donkey7 i=????
}
// after a match we'll do:
prob += pmatch[prob >> 8];
// after a miss we'll do:
prob -= pmiss[prob >> 8];
Engine is equal to Fpaq0 (Write Bit) ?
Nania Francesco Antonio
sorry it should be pmiss[a] not pmiss[i] (and pmatch[a] not pmatch[i]).
yes, it's almost equal to fpaq0p. it's only slightly tuned.
so: a = index (element to be computed), b = speed of adaptation (different for different probabilities).
you can skip that code. it's rather irrelevant. it's only about tuning fpaq0p coder.
edit/ update:
b is an inverse of speed. eg. if b = 40 then probability is updated by 1/ 40' th of error.
Thanks Donkey7!
>> Why, make a 16bit renormalization in rangecoder
>
> hmm, then i will be left with 16- bit precision rangecoder (i don't want to use
> 64 bit variables, they're slow) and this will have big impact on literal coding
> (which is not bijective, there's 255 symbols in statisitcs) - etiher compression
> (because of less precision) or speed (i will be forced to use more complex
> routines to compensate rounding errors).
Well, here's a working example in my demo coder:
http://shelwien.googlepages.com/order_test4.rar
Btw, although it uses qword type casts there, its a speed optimization
actually as it seems that IC 10 is finally able to use the common
32x32->64 x86 MUL instruction - before o2_rc1 all was done in 32 bits.
>> I suppose its similar to how contextual compressors with lossy hash
>> mapping of several contexts into same counter seemingly compress
>> better than same model with full statistics.
>> Actually, better precision is _always_ better.
>> Simply because you can always simulate worse precision with it.
>
> that's certainly not true. i've tested it and it works. it's used only in sse
> part. sse is not hashed - it uses 3 bit of match history + 8 bit of context
> state for total of 11 bits (== 2048 entries).
you apparently didn't understand me, but whatever.
> in other words - when probabilty is like 0.01 or 0.99 then we update less
> aggresively.
Wonder how you find optimal parameters for these updates.
Do you have any specific tools or maths for that task?
> hash mapping is always lossy (because hash is lossy by definiton). but the error
> in lzp model isn't very important - with collision detection i gain very little
> with big speed loss
I didn't argue with that. I just said that as an example that you can't
simply replace one part of the tuned model with a better (more precise
etc) version and hope that whole model will work better.
So its kinda troublesome to prove my point.
> > is that interpolated SSE at least?
>
> no. i know with interpolation i will benefit from bigger tables but i wanted speed.
No, its the reverse. Properly implemented SSE-with-interpolation allows
to greatly reduce the number of quants. Eg. in my coders a single map
consists of 4 to 6 values and thats enough.
Of course it takes more computations, but the model might get even
faster overall due to fewer RAM access etc.
> as i written above i don't use interpolation or propagation. just simple point-
> to- point updates (ie. when processing one byte with one lzp model i check and
> change only one entry from lzp and one entry from sse, lzp models are
> indepedent). with some mixing, propagating errors, weighting, etc. it will
> compress a little better but much slower.
So, how do you determine where's the maximum compression level possible
for LZ and why do you think you reached it?
There's not that much difference between CM and LZ afaik.
If you start speed optimization of CM it'll start looking like LZ, and
vice versa.
> using big precision sometimes means dealing with noise. with small precision you
> filter the noise automatically.
Well, I wanted to say that such things should be done in a controllable
fashion, it gives more parameters which you can tune.
> To be specific, I have results like this:
> 370648 370607 // 15 bits
> 370652 370607 // 12 bits
> 370646 370628 // 11 bits
> 371075 370708 // 8 bits
> And its not about simple change of shift value - complete
> model reoptimization (contexts and other parameters) was
> performed for objective comparison (numbers in first column
> are results before optimization).
> So I think that you need 12bit precision, but anything more
> is masked by the "model noise".
Right, but that's kinda out of context here.
That was not a normal counter, more like a bit sequence hash
or something. For normal counters I prefer better precision.
And also these "abnormal counters" was used not directly, but
as a index to interpolated SSE with 16bit coefficients.
That is, I tried to shrink down my counters in best possible
circumstances.
But in most "fast compressors" these things are done in reverse order.
That is, first you implement 1-byte counters with fast and rough update
function, then try some magic to get better predictions.
Well, not that I care anyway. If it wasn't like that, there'd be nothing
to do in this field already.
> my main goal is efficient compression of textual data (human readable and
> understandeable like plain text, html, source codes, charts, etc. not base64
> encoded data, dna sequences or random characters). lzp is a best solution to th
> at, although it's terrible on binary data.
Afair, your result on book1 was like 248k or something.
Dunno how's that efficient.
LZ is the best solution for the data with long matches.
That's all.
And texts are not like that.
Even more so if you'd like to use some preprocessing.
FLASHZIP V. 0.02 Copyright ® 2008 by Nania Francesco A
-More faster!
-Removed Bugs!
-Full Compatible with old version!
Download Link:
http://www.winturtle.netsons.org/FLASHZIP.zip
Shelwien
TarsaLZP is not finished yet. it lacks proper literal coding. on files where lzp model doesn't match many symbols my program performs very bad (worse than even zip). so, on some files it can be worse than ppmdj -o2 and on some it can be better than ppmdj -o16 and that's normal. i don't use normal lz. my lzp model is more like high order ppm (with only one symbol per context).
when/ if i release new versions of my program (with proper literal coding) i will consult you to exchange some ideas, but for now discussing the performance of my program is pointless.
Thanks Francesco!Originally Posted by Nania Francesco Antonio