Well, I decided to make a first release of the new compressor before Jan 2007...
Keep in mind:
1. this is the first version of the compressor ever
2. this compressor has no filters
3. this is an alpha release!
Enjoy!
Link:
quad101a.zip (26 KB)
Thanks! Is it alpha because of possible bugs or because the algorithm is still being improved?
That's a nice little Christmas surprise.
Thanks Ilia!
Well, this must be the most reliable engine I have ever made. It is alpha because the algorithm is still being improved: I have a few variations of LZ output coding, and unlike improved parsing, a different LZ output coding completely loses compatibility. In addition, I am releasing this version to see its performance on various benchmarks and to read users' opinions about the algorithm, so that I can choose the future improvement strategy. Here I have at least two possibilities:
1. Improve LZ parsing. This is still possible. Some time ago, Igor Pavlov explained to me the "Optimal LZH" principles; this is the basis for optimal parsing in 7-Zip and many other programs such as CABARC (LZX algorithm). However, my LZ engine differs significantly from baseline LZ77, so I am currently thinking about whether similar parsing can be integrated into QUAD.
2. Improve the LZ output coding, including improved PPM or CM compression. Here we have many variants in addition to some context modeling techniques, and each improvement must break compatibility. The current coding scheme for LZ output looks fast and efficient. I also have another variant: a different algorithm that shows noticeably higher compression on binary files and slightly poorer compression on text files, at some cost in compression and decompression speed. However, since decompression speed matters here, this algorithm was rejected.
Thanks a lot. Great Christmas gift!
Good to hear that.
Well, I've tested it on my files and here's part of the chart (values in bytes):
LZPXJ 1.5g 13 111 588
PIM 1.25beta 13 266 630
rzip 2.1 13 611 631
QUAD 1.01a 13 866 240
WINRAR 3.61 (RAR) 13 989 864
OpenDark ver.A 14 009 352
What I find pretty good (taking into account that QUAD exists without any filters) is that it performs better than WinRAR, even though WinRAR is known for its multimedia filters.
The only little flaw is that QUAD performs worst among these six archivers on the PNG file (which it actually inflates by 2%, while RAR decreases its size by 4,5%).
But it's only an alpha version, so good job encode!
I uploaded my compression benchmark. Have a look at http://blackfox.wz.cz/
Thanks! Nice benchmark!
Since UNRAR source code is available, we can look at RAR's filters.
Standard filters are:
E8/E9 transform
ITANIUM
DELTA transform
RGB transform
AUDIO
UPCASE
In fact, WinRAR even has a text filter (UPCASE), which transforms upper-case letters to lower case in this manner:
Word -> 0x02 (special flag) word
The data may already contain 0x02 (ASCII text files usually do not contain this value); to handle that case we use:
0x02 -> 0x02 (special flag) 0x02
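As a rough illustration of the escaping scheme above, here is a minimal Python sketch. It is not taken from the UNRAR sources: the function names are hypothetical, and it assumes every upper-case ASCII letter is flagged individually and that the input to the decoder is well formed.

```python
ESC = 0x02  # the special flag byte described above


def upcase_encode(data: bytes) -> bytes:
    """Replace each ASCII upper-case letter with ESC + its lower-case form."""
    out = bytearray()
    for b in data:
        if 0x41 <= b <= 0x5A:            # ASCII 'A'..'Z'
            out += bytes((ESC, b + 32))  # flag, then the lower-case letter
        elif b == ESC:                   # a literal 0x02 is doubled
            out += bytes((ESC, ESC))
        else:
            out.append(b)
    return bytes(out)


def upcase_decode(data: bytes) -> bytes:
    """Invert upcase_encode (assumes the input is not truncated after ESC)."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        i += 1
        if b == ESC:
            nxt = data[i]
            i += 1
            # ESC ESC is a literal 0x02; ESC + letter is an upper-case letter.
            out.append(ESC if nxt == ESC else nxt - 32)
        else:
            out.append(b)
    return bytes(out)
```

The point of such a filter is that "Word" and "word" then share the same lower-case byte string, so the compressor sees one repeated pattern instead of two.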
Also, I have found some interesting things about the RK file archiver. It also uses a text filter, which gives a serious compression gain. In addition, it looks like the EXE filter in WinRK/RK has no auto-detection. For example, if you rename an EXE file to DAT and overwrite the first two bytes (MZ), WinRK will not use any preprocessing.
Tested WinRAR 3.62 (-m5) vs QUAD 1.01a on a bunch of MPEG videos, which I tarred into a single file before compressing:
compressor --- size --- ratio --- comp time/speed --- decomp time/speed
uncompressed --- 1 035 833 344 --- 100%
winrar 3.62 --- 985122210 --- 95,104% --- 738s/1367kBps --- 59s/17263kBps
QUAD 1.01a --- 973555042 --- 93,988% --- 3421s/295kBps --- 699s/1445kBps
Here, it looks like QUAD is the winner. However, on hard-to-compress files QUAD should show the worst results in terms of both compression ratio and decompression speed. Two reasons:
1. QUAD uses a semi-stationary model for literals, which performs poorly on random/compressed data; in most cases it can expand such a file by about 1%-2%. In addition, there is no way to fix this problem without a significant speed loss.
2. Less compression = fewer matches coded = lower decompression speed. That means:
Highly compressible files will be decompressed in the shortest time.
Incompressible files will be decompressed in the longest time.
In addition, the baseline compression of RAR is LZH (LZ77 with a 4 MB window + Huffman coding). Thus, RAR must have the fastest decompression speed. QUAD, by contrast, uses an advanced LZ + arithmetic coding, which is slower but in some cases can provide significantly higher compression. If you remove PPMd and all filters from WinRAR and compare the "pure" RAR engine with the QUAD engine, RAR will be far behind...
True, if you look at my benchmark (the complete version) at the columns SAVE, PNG and possibly MP3 (which are all already-compressed files), WinRAR saves a few more bytes compared to QUAD. But it's not that big a deal.
Is there some file type I could test, or anything else besides programming, that I could do to help improve QUAD?
Well, now I am looking forward to testing results (MFC, Squeeze Chart). In addition, these things are welcome:
+ Testing results on a large set of data, including text files in different languages (other than English or Russian)
+ Compression/decompression timings on different systems, since these values depend heavily on data type and hardware (CPU speed, CPU cache, RAM).
Now I am reading books/papers, consulting with some people and thinking about future improvements. However, the most probable future improvement is the well-known EXE filter. Also note that the estimated release date of the next alpha release is Feb/March 2007.
I tested performance on some Czech txt ebooks:
http://blackfox.wz.cz/pcman/comp/czebooks.htm
Does QUAD have a model for Russian text, or is there some other reason for not needing to test it?
And I've also tried one unusual thing:
http://blackfox.wz.cz/pcman/comp/repetition.htm
EDIT: fixed a typo.
The reason not to test on English/Russian text is that I have already tested QUAD on these languages myself, since I have many English and, of course, Russian books in plain text format.
From your new tests, we can see:
Czech txt ebooks:
All programs except QUAD use symmetric compression for text files (compression speed = decompression speed). For example, WinRAR uses PPMd. QUAD shows the worst compression and the fastest decompression. Note that any LZ-based compressor potentially performs poorly on textual and similar data; it is just a property of the algorithm.
"Unusual thing":
1,000,000 "a"s: Thanks to arithmetic coding, the algorithm is very effective on such data, since there is a big asymmetry in symbol probabilities. Note that with Huffman entropy coding we cannot encode this data as efficiently, since the most probable symbol cannot be compressed to less than one bit; with arithmetic coding this is not a problem.
80,000,000 "a"s: Due to the dictionary bounds, here we see the worst compression...
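To put a number on the Huffman vs arithmetic coding point, here is a small back-of-the-envelope sketch in Python. It assumes a plain adaptive order-0 model in which each of the 256 byte counts starts at 1; this is only an illustrative model, not QUAD's actual one, and it ignores the LZ stage entirely.

```python
import math


def adaptive_cost_bits(n: int, alphabet: int = 256) -> float:
    """Ideal code length in bits for n repeats of a single symbol under an
    adaptive order-0 model where every symbol count starts at 1."""
    bits = 0.0
    for i in range(n):
        # After i occurrences the symbol has count 1 + i out of alphabet + i.
        p = (1 + i) / (alphabet + i)
        bits -= math.log2(p)
    return bits


n = 1_000_000
arith_bytes = adaptive_cost_bits(n) / 8  # what an ideal arithmetic coder approaches
huffman_bytes = n / 8                    # Huffman spends at least one whole bit per symbol

print(f"arithmetic (ideal): about {arith_bytes:.0f} bytes")
print(f"Huffman lower bound: {huffman_bytes:.0f} bytes")
```

Under these assumptions the arithmetic-coded size stays in the hundreds of bytes, while any Huffman code needs at least 125,000 bytes for the same input.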
Oh, I see. I didn't know this, thanks.
Some testing results with a fast x86 handler:
acrord32.exe: 1,529,707 bytes
mso97.dll: 1,956,788 bytes
Without transformation, for comparison:
acrord32.exe: 1,692,763 bytes
mso97.dll: 2,062,611 bytes
I hope to release this filter soon, with x86 code auto-detection, as a separate program for testing. After that I will add it to QUAD.
Another one:
photoshop.exe
Quad 1.01a+exeflt: 6,524,840 bytes
Quad 1.01a: 7,624,885 bytes
Original: 19,533,824 bytes
AcroRd32.dll
Quad 1.01a+exeflt: 3,510,693 bytes
Quad 1.01a: 3,964,707 bytes
Original: 9,609,216 bytes
thts so c0o0o0o0ol Compressor i have ever seen ,,, good work guy's
Very nice
Okay, the current filter has been improved; it now resembles an improved version of the filter from CABARC (LZX) or WinRAR. Some new results:
Layout:
<filename>: current results (previous results; results with no filter)
acrord32.exe: 1,524,108 bytes (1,529,707 bytes; 1,692,763 bytes)
mso97.dll: 1,953,551 bytes (1,956,788 bytes; 2,062,611 bytes)
Photoshop.exe: 6,508,264 bytes (6,524,840 bytes; 7,624,885 bytes)
AcroRd32.dll: 3,502,264 bytes (3,510,693 bytes; 3,964,707 bytes)
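For readers unfamiliar with this family of filters, the basic E8 idea can be sketched as follows. This is a generic, simplified Python illustration and not the filter used in Quad, CABARC or WinRAR; real filters typically add range checks and code detection, and the function name `e8_transform` is hypothetical.

```python
import struct


def e8_transform(data: bytes, encode: bool = True) -> bytes:
    """Convert the 32-bit operand after each 0xE8 byte between a relative
    offset and an absolute target (encode=True: relative -> absolute)."""
    buf = bytearray(data)
    i = 0
    while i + 5 <= len(buf):
        if buf[i] == 0xE8:                       # x86 CALL rel32 opcode
            (value,) = struct.unpack_from("<I", buf, i + 1)
            pos = i + 5                          # offset of the next instruction
            if encode:
                value = (value + pos) & 0xFFFFFFFF
            else:
                value = (value - pos) & 0xFFFFFFFF
            struct.pack_into("<I", buf, i + 1, value)
            i += 5                               # skip the operand in both directions
        else:
            i += 1
    return bytes(buf)


# Two calls to the same target (0x100) from different positions.
code = b"\xE8" + struct.pack("<I", 0x100 - 5) + b"\x90" * 11 \
     + b"\xE8" + struct.pack("<I", 0x100 - 21)
encoded = e8_transform(code, encode=True)
```

After the transform, both calls carry identical operand bytes, which gives the LZ stage a repeated string to match; that is where the gain on executables comes from.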
Results look very promising so far. Do you have plans to add other filters to Quad (text, JPEG, etc.) at any time in the future?
With the Quad engine, I will put some emphasis on filters with auto-detection.
A JPEG filter will not fit into the Quad engine. Furthermore, I have no JPEG technology of my own right now.
The multimedia filter with auto-detection can be slow. I had already developed one for PIMPLE some time ago, but eventually it was not added.
The multimedia filter with no auto-detection can also be efficient:
track5.WAV
Quad 1.01a + MM: 19,315,852 bytes
Quad 1.01a: 24,991,780 bytes
Original: 29,644,608 bytes
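The kind of preprocessing that helps on WAV data can be illustrated with a simple byte-wise delta transform. The sketch below is a generic Python example and not the MM filter measured above; `stride` is a hypothetical parameter standing for the distance in bytes between corresponding samples (for example 4 for 16-bit stereo).

```python
def delta_encode(data: bytes, stride: int) -> bytes:
    """Replace each byte with its difference from the byte `stride` positions back."""
    out = bytearray(len(data))
    for i, b in enumerate(data):
        prev = data[i - stride] if i >= stride else 0
        out[i] = (b - prev) & 0xFF
    return bytes(out)


def delta_decode(data: bytes, stride: int) -> bytes:
    """Invert delta_encode by accumulating the differences."""
    out = bytearray(len(data))
    for i, d in enumerate(data):
        prev = out[i - stride] if i >= stride else 0
        out[i] = (d + prev) & 0xFF
    return bytes(out)
```

Slowly changing sample values turn into long runs of small, similar differences, which a general-purpose compressor models much better than the raw samples. Knowing the right stride is exactly why header detection is needed.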
However, this filter does not help with TAR files and requires a huge amount of code for header detection. For example, the PIM archiver has a special module for file-header reading.
The brand new exe-filter is the best one. It is fast, small and efficient. It not only helps with executable compression, it can also provide faster executable decompression (remember: higher compression = faster decompression). By the way, I may eventually add this exe-filter to the PIM archiver, since this one is better.
In addition, we have a table filter: preprocessing for tables inside executables and other files.
Text filters... I do not know, since none of the text filters I have made improve compression.
Well, I will carefully dig through the UNRAR sources for filters. (The current exe-filter is the result of such digging.)
Can't wait to get your x86 filter in my hands!
Well, this filter was tested on a few gigabytes of various data and it looks okay! I think after some additional testing I will release a new version of Quad!