OK, why is it that so much effort goes into squeezing the last bit of life out of content with a slightly more effective algorithm, rather than into dealing with the container formats that so many files are shipped in nowadays?
Here is an example of the point I'm trying to make, and I like to demonstrate it on real data.
Start by getting this file. It's an Nvidia demo of a really cool 3D head.
http://www.nzone.com/object/nzone_hu...downloads.html
Original file
nzd_HumanHeadSetup.exe --> 101MB
Now compressed with WinRAR 3.9 (Best), 7-Zip 9 beta (Ultra, LZMA2) and NanoZIP 0.07 (Opti2):
nzd_HumanHeadSetup.rar --> 100MB
nzd_HumanHeadSetup.7z --> 100MB
nzd_HumanHeadSetup.nz --> 98.6MB
Yeah, I bet everyone was surprised by that. Well, with 7-Zip you can open the original exe and extract its contents. That produces a bunch of files, including MP3s, textures, model formats and some exes, as well as a vcredist.exe which can itself be opened up and have its contents extracted again. The total uncompressed data is about 172MB.
Now here are the results when that uncompressed data is compressed instead:
nzd_HumanHeadSetup.rar --> 71.1MB
nzd_HumanHeadSetup.7z --> 72.6MB
nzd_HumanHeadSetup.nz --> 68.4MB
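To put numbers on the gap, here is the saving each tool achieves relative to the original 101MB installer, first compressing it as-is and then compressing the unpacked 172MB payload (figures are the ones above, rounded to one decimal):

```python
original_mb = 101  # the double-compressed installer exe

# sizes (MB) from the two runs above
direct   = {"rar": 100.0, "7z": 100.0, "nz": 98.6}
unpacked = {"rar": 71.1,  "7z": 72.6,  "nz": 68.4}

def saving(size_mb):
    """Percent saved versus the 101MB original."""
    return round(100 * (1 - size_mb / original_mb), 1)

for fmt in direct:
    print(f"{fmt}: direct {saving(direct[fmt])}% saved, "
          f"unpacked first {saving(unpacked[fmt])}% saved")
```

Even the best direct result shaves off barely 2.4%, while unpacking first gets over 30% off, with exactly the same compressors.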
So if any one of these compressors filtered the container formats, it clearly would have achieved a better ratio than the others, but none of them do. 7-Zip can open the containers up, but I'm guessing it can't put them back together again for decompression. I've also noticed a lot of antivirus programs worming their way deep into layer upon layer of container files, so is it really that hard to deal with?
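The recursive descent itself isn't the hard part. Here's a minimal sketch of the kind of walk an AV scanner (or a filtering compressor) does, using Python's standard zipfile module as a stand-in for a general container parser — a real tool would need a parser per format (NSIS, CAB, etc.), which is where the actual work is:

```python
import io
import zipfile

def walk_container(data, path=""):
    """Recursively descend into nested ZIP containers, yielding
    (path, size) for every leaf payload file found."""
    buf = io.BytesIO(data)
    if zipfile.is_zipfile(buf):
        with zipfile.ZipFile(buf) as zf:
            for name in zf.namelist():
                yield from walk_container(zf.read(name), path + "/" + name)
    else:
        yield path, len(data)

# Build a two-level test container: a zip inside a zip, mimicking
# the vcredist.exe nested inside the installer exe.
inner = io.BytesIO()
with zipfile.ZipFile(inner, "w") as zf:
    zf.writestr("texture.dds", b"\x00" * 1000)

outer = io.BytesIO()
with zipfile.ZipFile(outer, "w") as zf:
    zf.writestr("vcredist.zip", inner.getvalue())
    zf.writestr("readme.txt", b"hello")

for path, size in walk_container(outer.getvalue()):
    print(path, size)
```

Compressing the leaves it yields, instead of the outer container, is exactly the "filtering" I'm talking about.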
I played with Precomp... Good concept, but unfortunately it doesn't deal with these container types, and in my opinion the filtering should really take place inside the compression app, not a third-party one.
So there seem to be two schools of thought on how to deal with this:
1) Maintain identical bit-for-bit compression/decompression. This means identifying the container file, decompressing it, compressing all the data, and reversing the whole process on decompression. That adds a lot of extra decompression time to the compression process and compression time to the decompression process, but it would keep everyone happy by not changing the data in any way.
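Option 1 in miniature, for a single zlib stream (this is roughly the trick Precomp uses for deflate). The key step is verifying that recompressing the raw payload reproduces the original stream bit-for-bit; the loop over standard levels is a simplifying assumption, since real streams may need more parameters than a compression level:

```python
import zlib

def capture(stream):
    """Replace a zlib stream with its raw payload plus the parameter
    that reproduces the stream exactly. Returns (raw, level), or None
    if no standard zlib level rebuilds the original bytes."""
    raw = zlib.decompress(stream)
    for level in range(10):
        if zlib.compress(raw, level) == stream:
            return raw, level
    return None

def restore(raw, level):
    """Reverse step at decompression time: rebuild the exact stream."""
    return zlib.compress(raw, level)

original = zlib.compress(b"model data " * 500, 6)
raw, level = capture(original)
# The strong compressor now works on `raw` (no double compression),
# and extraction rebuilds the container bit-for-bit:
assert restore(raw, level) == original
```

If `capture` fails, you just store the stream untouched, so the scheme degrades gracefully to what compressors do today.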
2) Lossy installers but lossless content. After the data is initially decompressed out of the container file, recompress the payload into the new, highly compressed format, but somehow relink the exe to use the new compression format instead of the old one. In essence this would be upgrading the compression inside the installer exe (CABs etc.), and yes, it sounds a little crazy. But it would be more efficient than option 1, because on decompression / running the installer, no recompression back into the container's old format would need to take place.
Well, just thought I would share this.