
Thread: Dealing with container formats

1. #1 by subwolf (Member, Australia)

    Dealing with container formats

OK, why is it that so much effort is put into squeezing the life out of content with some slightly more effective algorithm to achieve a better compression ratio, rather than into dealing with the container formats that so many files are shipped in nowadays?

Here is an example of the point I'm trying to make, and I'd like to make it on some real data.

Start by getting this file. It's an Nvidia demo of a really cool 3D head.
    http://www.nzone.com/object/nzone_hu...downloads.html

    Original file

    nzd_HumanHeadSetup.exe --> 101MB

Now compressed with WinRAR 3.9 (Best), 7-Zip 9 beta (Ultra, LZMA2) and NanoZip 0.07 (Opti2):

    nzd_HumanHeadSetup.rar --> 100MB
    nzd_HumanHeadSetup.7z --> 100MB
    nzd_HumanHeadSetup.nz --> 98.6MB

Yeah, I bet everyone was surprised by that. Well, with 7-Zip you can open the original exe and extract its contents. That produces a bunch of files, including mp3s, textures, model formats and some exes, as well as a vcredist.exe which can itself be opened and have its contents extracted again. The total uncompressed data is about 172MB.
Now here are the results when this uncompressed data is compressed instead:
    nzd_HumanHeadSetup.rar --> 71.1MB
    nzd_HumanHeadSetup.7z --> 72.6MB
    nzd_HumanHeadSetup.nz --> 68.4MB

So if any one of these compressors filtered the container formats, it would clearly have achieved a better ratio than the others, but they don't. 7-Zip can open them up, but I'm guessing it can't put them back together again for decompression. I've also noticed a lot of antivirus programs worming their way deep into layers and layers of container files, so is it really that hard to deal with?

I played with Precomp... Good concept, but unfortunately it doesn't deal with these container types, and in my opinion the filtering should really take place inside the compression app, not a third-party one.

So there seem to be two schools of thought on how to deal with this (see the sketch after this list)...

1) Maintain identical bit-for-bit compression/decompression. That means identifying the container file, decompressing it, compressing all the data, and reversing the process on decompression. This adds a lot of extra decompression time to the compression process and compression time to the decompression process, but it would keep everyone happy by not changing the data in any way.

2) Lossy installers but lossless content. After the data is initially decompressed out of the container file, recompress the payload into the new, more effective format, but somehow relink the exe to use the new compression format instead of the old one. In essence this would be upgrading the compression inside the installer exe (cabs etc.), and yes, it sounds a little crazy, but it would be more efficient than option 1 because on decompression / running the installer, no recompression back to the container's old format would need to take place.
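To make option 1 concrete, here is a minimal sketch of its core trick for the common zlib/deflate case (the approach Precomp takes). The function name and the simple level-only search are illustrative; real streams may require searching more parameters (window size, memLevel, strategy):

Code:
import zlib

def try_bit_identical_recompression(stream: bytes):
    """Decompress a zlib stream, then search for parameters that reproduce
    it bit for bit; on success, store the raw data plus the recipe."""
    raw = zlib.decompress(stream)
    for level in range(10):  # search compression levels 0..9
        if zlib.compress(raw, level) == stream:
            return raw, level  # raw data + reproduction recipe
    return None  # unknown encoder: keep the original stream unchanged

# round-trip demo on a stream we created ourselves
original = zlib.compress(b"some payload " * 1000, 6)
print(try_bit_identical_recompression(original)[1])  # prints 6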

    Well just thought I would share this.

2. #2 by PiPPoNe92 (Member, Bari)


nzd_HumanHeadSetup.7z 99.8 MB (I also used Ultra7z)
nzd_HumanHeadSetup.rar 100 MB
nzd_HumanHeadSetup.nz 98.1 MB (cm option with 1300 MB RAM)
nzd_HumanHeadSetup.arc 100 MB (Ultra, requires 2 GB RAM for decompression)
nzd_HumanHeadSetup.rzm 99.8 MB (RZM.exe)
    Last edited by PiPPoNe92; 31st August 2009 at 20:45.

3. #3 by nakTT (Member, The Moon)
    subwolf,

I'm also aware of this phenomenon, and I absolutely agree with you.

4. #4 by mstar (Member, Germany)
The other way round seems to work: convert the .cab files or setup.exe to 'store' (zero-compression archives, like .tar) and then compress with nz or 7z. The only disadvantage is big temporary files.

edit:
I tested that some time ago with a game mod (Cinematic Mod 10), 9.3GB in size, consisting of a .rar containing .zip files and a setup. Changing the .zips to 'store' and compressing with 7z reduced it to 5.5GB while keeping the setup.exe working.
    Last edited by mstar; 1st September 2009 at 00:15.
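A minimal sketch of mstar's rewrite trick, assuming Python's standard zipfile module; the file names are placeholders, and per-entry metadata such as timestamps is not preserved here:

Code:
import zipfile

def rezip_as_store(src_path: str, dst_path: str) -> None:
    """Rewrite a zip archive so every entry is stored uncompressed."""
    with zipfile.ZipFile(src_path) as src, \
         zipfile.ZipFile(dst_path, "w", compression=zipfile.ZIP_STORED) as dst:
        for info in src.infolist():
            # read() inflates the entry; writing with ZIP_STORED keeps it raw
            dst.writestr(info.filename, src.read(info))

rezip_as_store("mod_part1.zip", "mod_part1_store.zip")
# afterwards, e.g.: 7z a -mx9 mod.7z mod_part1_store.zip

The outer compressor then sees the raw payload, which is what restores the redundancy.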

5. #5 by schnaader (Programmer, Hessen, Germany)
Quote Originally Posted by subwolf
So there seem to be two schools of thought on how to deal with this...

1) Maintain identical bit-for-bit compression/decompression. That means identifying the container file, decompressing it, compressing all the data, and reversing the process on decompression. This adds a lot of extra decompression time to the compression process and compression time to the decompression process, but it would keep everyone happy by not changing the data in any way.
That's the hard way Precomp goes. As you said, it keeps people happy because it doesn't change the data, but it's slow, and you need to add support for every new compression method or you can't de-/recompress the data. Recompression especially is very hard, as you have to maintain bit identity; and even where most decompression methods are available, the matching compression methods often aren't.

Quote Originally Posted by subwolf
2) Lossy installers but lossless content. After the data is initially decompressed out of the container file, recompress the payload into the new, more effective format, but somehow relink the exe to use the new compression format instead of the old one. In essence this would be upgrading the compression inside the installer exe (cabs etc.), and yes, it sounds a little crazy, but it would be more efficient than option 1 because on decompression / running the installer, no recompression back to the container's old format would need to take place.
The problem with such a "lossy" concept is that you can't generalize it. You can apply it to installers and optimize them this way (that's pretty much what NOSSO does), but you have to have good knowledge of the data and of what "lossless content" means.
For example, take a game ISO file or a game-format container. There could be, e.g., some images in there that you can optimize and that will be identical afterwards pixel for pixel. But if a checksum of the images is stored somewhere else (in the container case), or the image size changed (in the ISO case), you can run into problems.
This means that for every installer/file you optimize, you'll have to check that it works correctly afterwards. That is not an easy task, and you can't automate it...
    http://schnaader.info
    Damn kids. They're all alike.

6. #6 by nakTT (Member, The Moon)
Quote Originally Posted by mstar
The other way round seems to work: convert the .cab files or setup.exe to 'store' (zero-compression archives, like .tar) and then compress with nz or 7z. The only disadvantage is big temporary files.

edit:
I tested that some time ago with a game mod (Cinematic Mod 10), 9.3GB in size, consisting of a .rar containing .zip files and a setup. Changing the .zips to 'store' and compressing with 7z reduced it to 5.5GB while keeping the setup.exe working.
It doesn't work for me, though. I used 7z (LZMA2, Ultra, 256MB, 273, solid) after a ".tar" store, and it produced a bigger end result. Perhaps yours is a special case?

7. #7 by SvenBent (Member, Denmark)
nakTT,
I think you are missing the point. He is REMOVING the zip compression layer of the files, not just tarring them together.

And in most cases it enhances compression, since 7-Zip now works on the original data and not on something where a bad compression pass has already destroyed the redundancy.


What he is doing is this:

he is turning this
org file -> zip (compressed) -> 7-zip ultra .oO( oh no, data with low redundancy )

into this instead
org file -> zip (store) -> 7-zip ultra .oO( hooray, data with lots of redundancy )

not just
org file -> zip (compressed) -> .tar -> 7-zip ultra .oO( oh no, still data with low redundancy )
which is what it sounds like you are doing

8. #8 by nakTT (Member, The Moon)
Quote Originally Posted by SvenBent
nakTT,
I think you are missing the point. He is REMOVING the zip compression layer of the files, not just tarring them together.

And in most cases it enhances compression, since 7-Zip now works on the original data and not on something where a bad compression pass has already destroyed the redundancy.

What he is doing is this:

he is turning this
org file -> zip (compressed) -> 7-zip ultra .oO( oh no, data with low redundancy )

into this instead
org file -> zip (store) -> 7-zip ultra .oO( hooray, data with lots of redundancy )

not just
org file -> zip (compressed) -> .tar -> 7-zip ultra .oO( oh no, still data with low redundancy )
which is what it sounds like you are doing
Thanks for your explanation. But if you read the first post of this thread, the issue is that we cannot and do not have control over the original files (if we want to maintain identical bit-for-bit compression/decompression). In other words, it's up to the owner of the original data how his files are packed. Understanding that is how I arrived at my reading of schnaader's post:

    Original file (exe or whatever) -> tar (store) -> 7z ultra = no improvement

9. #9 by schnaader (Programmer, Hessen, Germany)
I just had a look at the data. The biggest part (around 93% of the compressed size) seems to be the directory "textures", which contains some interesting files (showing only the biggest ones here):

    Code:
    AdrianAlbedo.tga        67.108.908 bytes    4096x4096, 32-bit, uncompressed
    AdrianFinalNormal.tga   67.108.908 bytes    4096x4096, 32-bit, uncompressed
    cube_diaCourt.exr       23.586.382 bytes    3072x4096, 24-bit (?), PIZ wavelet
The OpenEXR file is compressed using PIZ wavelet compression (which combines wavelet and Huffman coding), so it can't be compressed much further without converting it. IrfanView can read the image using a plugin and save it as BMP, but since OpenEXR also supports HDR, I'm not sure whether this conversion is lossless. At least IrfanView says it has 24-bit depth. Converting this image to PNG gives a file of only 8,329,293 bytes, so we could perhaps save around 15 MB.

The TGA images can't be converted using IrfanView, as it only saves 24-bit images, and paq8q doesn't recognize them. I'm not sure if NanoZip recognizes/optimizes them. If that worked (or with some other image compression tricks), I'm sure there would be additional savings, and the compressed size could get under 50 MB.
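For anyone who wants to verify such files themselves, here is a minimal sketch (mine, not from the posts above) that reads the fixed 18-byte TGA header:

Code:
import struct

def tga_info(path: str):
    with open(path, "rb") as f:
        header = f.read(18)  # TGA files start with a fixed 18-byte header
    image_type = header[2]  # 2 = uncompressed true-color, 10 = RLE true-color
    width, height = struct.unpack_from("<HH", header, 12)
    depth = header[16]  # bits per pixel
    return image_type, width, height, depth

print(tga_info("AdrianAlbedo.tga"))  # expected for the file above: (2, 4096, 4096, 32)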
    http://schnaader.info
    Damn kids. They're all alike.

10. #10 by osmanturan (Programmer, Mersin, Turkiye)
Quote Originally Posted by schnaader
The OpenEXR file is compressed using PIZ wavelet compression (which combines wavelet and Huffman coding), so it can't be compressed much further without converting it. IrfanView can read the image using a plugin and save it as BMP, but since OpenEXR also supports HDR, I'm not sure whether this conversion is lossless. At least IrfanView says it has 24-bit depth. Converting this image to PNG gives a file of only 8,329,293 bytes, so we could perhaps save around 15 MB.
For creating such lighting effects (HDR light probes as a lighting environment) with shaders, there is no sense in using 8-bit channels. They are usually 16-bit floating point per channel (16×3 = 48 bits per pixel). They have to be floating point, because very bright white (e.g. the sun) and regular white (e.g. white paper) are not the same, so we have to extend the range of the values as far as possible. As a result, this image cannot be 24 bits deep. The only way to convert an HDR image to an LDR image (such as a 24-bit BMP) is to apply tone mapping, which is a lossy transform too. It is computed dynamically for each frame by games with HDR rendering, such as Half-Life 2, Crysis, Far Cry 2, Unreal Tournament 3, Gears of War, etc.
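A small illustration of the point (my sketch, using numpy's float16 as a stand-in for OpenEXR's half floats; the brightness values are made up):

Code:
import numpy as np

paper_white = np.float16(1.0)  # "regular white" reference
sun = np.float16(5000.0)       # a very bright emitter, far above 1.0
print(float(sun / paper_white))  # the brightness ratio survives: 5000.0

# an 8-bit LDR channel clips everything above its maximum code value
ldr = np.clip(np.array([1.0, 5000.0]) * 255.0, 0, 255).astype(np.uint8)
print(ldr)  # [255 255] -> sun and white paper become indistinguishable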
    BIT Archiver homepage: www.osmanturan.com

11. #11 by schnaader (Programmer, Hessen, Germany)
Yes, I also thought they're most probably 48 bits in depth. In that case, do the upper and lower bits differ that much? Because 2 × 8.3 MB = 16.6 MB would still be better than 23.5 MB. Anyway, in that case the question would be how to revert the PIZ compression to get at the uncompressed image data. Perhaps this would be possible using exrtools, but that is available for Linux only, so I haven't tested it so far.
    http://schnaader.info
    Damn kids. They're all alike.

12. #12 by SvenBent (Member, Denmark)
Quote Originally Posted by nakTT
    Original file (exe or whatever) -> tar (store) -> 7z ultra = no improvement
Using tar before 7-Zip is just the same as using solid mode; that's why it doesn't help anything.

13. #13 by osmanturan (Programmer, Mersin, Turkiye)
Quote Originally Posted by schnaader
Yes, I also thought they're most probably 48 bits in depth. In that case, do the upper and lower bits differ that much? Because 2 × 8.3 MB = 16.6 MB would still be better than 23.5 MB. Anyway, in that case the question would be how to revert the PIZ compression to get at the uncompressed image data. Perhaps this would be possible using exrtools, but that is available for Linux only, so I haven't tested it so far.
I have just downloaded the file and looked at it. It's exactly 32 bits per channel.

If you want to add support for "game ripping", it's not worth working on HDR files, because files of this kind are only used as light probes (sky textures used as a lighting source) in games, and even the most popular games do not include many HDR textures (say, at most 5). Instead they mostly consist of DDS, TGA and JPEG files. For quick game support, you can have a look at the BSP file formats. Quake-derived engines (I can roughly name >20 game engines, not even counting game titles, which can run to >100!!!) support BSP files with small variations. Their "generic" structure is like this:
    Code:
    struct LumpMeta
    {
      UInt32 Offset; // offset of the lump in the file
      UInt32 Length; // lump length
    };
    
    struct Header
    {
  UInt32 Signature; // usually "?BSP"; the ? can be R, I, etc. (it identifies the developer)
  UInt32 Version; // only useful for getting the length (N) of the lump table
  LumpMeta LumpInfo[N];
};

/*
the file's structure looks like this:
    [Header]
    [Lump #0]
    [Lump #1]
    [Lump #2]
    [...]
    [Lump #N-1]
    */
As you can see, you don't really have to care about all the versions: you can simply extract each lump out of the file. By doing this you can improve a statistical compressor's performance by acting as a perfect data analyzer.
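A minimal sketch of that lump-splitting idea (my assumptions: little-endian fields, and a lump count N already known from the version; the names and the example lump count are illustrative):

Code:
import struct

def split_bsp_lumps(path: str, lump_count: int):
    """Yield each lump as its own byte string, per the header layout above."""
    with open(path, "rb") as f:
        data = f.read()
    signature, version = struct.unpack_from("<4sI", data, 0)
    for i in range(lump_count):
        offset, length = struct.unpack_from("<II", data, 8 + 8 * i)
        yield i, data[offset:offset + length]

# each lump can then be handed to the compressor as a separate stream:
# for i, lump in split_bsp_lumps("map.bsp", lump_count=17): ...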
    Last edited by osmanturan; 2nd September 2009 at 11:01. Reason: 32 bits per pixel -> 32 bits per channel
    BIT Archiver homepage: www.osmanturan.com

14. #14 by nakTT (Member, The Moon)
Quote Originally Posted by SvenBent
Using tar before 7-Zip is just the same as using solid mode; that's why it doesn't help anything.
Exactly, which is what I was getting at in my reply to him.
    Last edited by nakTT; 2nd September 2009 at 11:07.

15. #15 by SvenBent (Member, Denmark)
Now I'm confused.

Maybe I missed something in a prior post, but I saw no one suggest just using tar on the input files before running 7-Zip...

So far you are the only one who has mentioned that "method", or did I miss something?

-- edit --

I just re-read the posts, and it seems that even though I tried to explain it to you, you are still getting it wrong. No one suggested using tar on the input files; someone suggested making the input files BE LIKE a tar file instead of a compressed zip.

Like I explained before: remove the compression layer, do NOT just tar them together.

You (and I) are the only ones so far who have talked about that "method". It is NOT what was mentioned by the people you replied to when you told them it didn't work (because you were doing it wrong).
So, to put it briefly: there was no reason for you to make that post, as you did it wrong.
    Last edited by SvenBent; 2nd September 2009 at 09:18.

16. #16 by joerg (Member, Germany)

Using the 7z format as a container format

I think that in the Windows world it is better to use the 7z format rather than the tar format, because 7z has better, well-proven support for Unicode filenames and directory trees, and it is possible to update a file within an existing 7z archive without recreating the whole archive file.

Two things I am missing:
- a simple (primitive and fast) program to create a 7z archive without compression, in a short time
- a simple (open-source) program which reads such a simple 7z archive

If we had these two basic programs as open source, it would be possible to do the compression in two steps (see the sketch below).

First step:
- in a short time, collect the files within a directory tree and create a 7z archive file

Second step:
- now there is plenty of time: read the 7z archive file and do an individually optimized compression for each component of the archive file
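A rough sketch of those two steps using the standard 7z command line (-mx0 = store, -mx9 = ultra). The directory and archive names are placeholders, and the second step is shown as one strong pass for brevity, where the idea really calls for per-component codecs:

Code:
import subprocess

# step 1 (fast): collect the directory tree into an uncompressed 7z container
subprocess.run(["7z", "a", "-mx0", "staging.7z", "my_directory"], check=True)

# step 2 (later, when there is plenty of time): unpack and recompress
subprocess.run(["7z", "x", "staging.7z", "-ostaging"], check=True)
subprocess.run(["7z", "a", "-mx9", "final.7z", "staging"], check=True)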

    just my 2 cents

17. #17 by Vacon (Member, Germany)
    Hello everyone,

Quote Originally Posted by joerg
[...]
Two things I am missing:
- a simple (primitive and fast) program to create a 7z archive without compression, in a short time
- a simple (open-source) program which reads such a simple 7z archive

If we had these two basic programs as open source
[...]
just my 2 cents
    Something like that => http://www.kmonos.net/lib/noah.en.html ?
    Or still too complicated?
Edit: I don't know for sure about the license => Ultra7z_Optimizer is probably LGPL (as it refers to "(C) by Igor Pavlov").

Edit 2: Having said that, and after a closer look at Ultra7z_Optimizer, I found that it deletes the original archive silently, without asking the user. It would be nice to change that behaviour:
- compare the new (7z) archive's size with the original archive's size (imagine the content of the original is mp3 => maybe the new archive's size will increase!). ArcConvert does this (license unknown); Convert Arc is LGPL and seems to be the follow-up.
- ask the user whether the original should be kept (at least for safety reasons). Ultra7z_Optimizer itself says "Please verify content of each archive." Yeah, it also says "Use it carefully! Please create backup copy before use it." Wouldn't it be easier (and therefore safer) to ask users?

    Best regards!
    Last edited by Vacon; 3rd September 2009 at 12:04.
