
Thread: Problems identifying file compression

  1. #1
    Member
    Join Date
    Feb 2010
    Location
    Germany

    Problems identifying file compression

    I wrote an extractor for a certain game's archive format. Some archives contain compressed files among uncompressed ones. Since I haven't dealt much with compression before, I have a hard time identifying which kind of compression has been used and how to decompress it. Maybe some of you pros can figure it out?

    I attached a compressed Lua script as sample file.
    Attached Files

  2. #2
    Expert
    Matt Mahoney
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    It doesn't look compressed to me; otherwise 7-Zip would not be able to compress it further.
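    A quick way to run the same test yourself, sketched in Python: measure the byte entropy and how much LZMA shrinks the data (Python's lzma module uses LZMA2 in an .xz container, close to what 7-Zip does by default). Already well-compressed data should sit near 8 bits/byte with a ratio near 1.0. The function names are made up for illustration, and this is a rough heuristic, not a detector for any particular format.

```python
import lzma
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for empty input)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def lzma_ratio(data: bytes) -> float:
    """Compressed size divided by original size, using Python's lzma module."""
    if not data:
        return 1.0
    return len(lzma.compress(data)) / len(data)
```

    Read the file with open(path, "rb").read() and pass the bytes to both functions; a low ratio, as Matt saw with 7-Zip, suggests the data is not compressed (or only weakly).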

  3. #3
    Programmer schnaader
    Join Date
    May 2008
    Location
    Hessen, Germany
    I would also say that this is a compiled Lua script: compiled, but uncompressed. There are some clear-text sections inside the file, e.g. at position 0x27000. If compression is used, it's a very basic one similar to LZSS, but I doubt it.
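    For spotting clear-text sections like the one at 0x27000, here is a small Python sketch that does roughly what the Unix `strings` tool does, but also reports offsets. The helper name and the 4-byte minimum run length are arbitrary choices for illustration.

```python
import re
from typing import List, Tuple


def printable_runs(data: bytes, min_len: int = 4) -> List[Tuple[int, str]]:
    """Return (offset, text) for every run of at least min_len printable ASCII bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]
```

    Readable runs broken up by binary-looking stretches are what you'd expect from literals in an LZ-style stream, but they also occur in uncompressed bytecode, so this alone doesn't settle the question.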
    http://schnaader.info
    Damn kids. They're all alike.

  4. #4
    Member
    Join Date
    Feb 2010
    Location
    Germany
    Well, the archives contain information on compressed and decompressed file sizes, so my educated guess was that this file was compressed. Also, the Lua file contains a LuaC header after a block that I assumed was the dictionary. Many files are preceded by a block of data like this. Maybe the sample file wasn't a good choice. This time I attached a PNG and a TIF. They don't have a header and don't look like typical image files in a hex editor. Also, this time 7z compression barely shrank them:
    Attached Files

  5. #5
    Member Karhunen
    Join Date
    Dec 2011
    Location
    USA

    RGB Planar (RRR GGG BBB) analysis

    One of the files seems to be "stratified". RedBurn.png compresses like this:

    # unzip -Z -m RedBurn_262_262-head_240.zip
    Archive: RedBurn_262_262-head_240.zip 182712 bytes 1 file
    -rw-a-- 2.0 fat 206172 b- 12% defX 31-Jan-12 15:59 RedBurn_262_262-head_240.raw
    1 file, 206172 bytes uncompressed, 182558 bytes compressed: 11.5%

    Attached is the data viewed as a bitmap: a 240-byte header, then 262x262 pixels, 24-bit RGB planar.
    Attached Thumbnails: RedBurn_262_262-head_240.png (195.6 KB)
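    The numbers in the file name check out: 262 x 262 pixels x 3 bytes = 205,932 bytes, and adding the 240-byte header gives exactly the 206,172 bytes unzip reports. Below is a hedged Python sketch for finding such (side length, header size) combinations by brute force. The square-image assumption and the 1 KB header limit are mine, so loosen them for non-square data.

```python
from typing import List, Tuple


def square_planar_candidates(file_size: int, channels: int = 3,
                             max_header: int = 1024) -> List[Tuple[int, int]]:
    """List (side, header_bytes) pairs with
    file_size == header_bytes + side * side * channels and header_bytes <= max_header."""
    candidates = []
    side = 1
    while channels * side * side <= file_size:
        header = file_size - channels * side * side
        if header <= max_header:
            candidates.append((side, header))
        side += 1
    return candidates
```

    For the 206172-byte file, only side 262 leaves a header under 1 KB; side 261 would already need 1809 header bytes.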

  6. #6
    Member
    Join Date
    Feb 2010
    Location
    Germany
    I'm not sure I understand you. What do you mean by stratified? Do you mean the file was scrambled? It looks like that to me. How did you decompress the file? Did you use a generic zip-header?

  7. #7
    Member Karhunen
    Join Date
    Dec 2011
    Location
    USA
    Sorry, I didn't mean to confuse you; I only illustrated schnaader's observation with a picture. There are regular, definite offsets. For example, if you have a raw RGB image and read it in RRR,GGG,BBB order instead of RGB,RGB,RGB order, you get 3 duplicate planes. In the case of your data, I see it as 3 such blocks, so you have 9 "bands". I can't say whether this helps; I only offer it as an observation for those better able to help you.
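    To make the planar/interleaved difference concrete, here is a minimal Python sketch that turns RRR...GGG...BBB planar data into the usual RGB,RGB,... order. The function name and the optional header offset are illustrative; nothing here is known about the game's actual layout.

```python
def planar_to_interleaved(data: bytes, width: int, height: int, offset: int = 0) -> bytes:
    """Convert three consecutive planes (R, then G, then B) starting at `offset`
    into interleaved RGB triples."""
    n = width * height
    red = data[offset:offset + n]
    green = data[offset + n:offset + 2 * n]
    blue = data[offset + 2 * n:offset + 3 * n]
    if len(blue) != n:
        raise ValueError("not enough data for three full planes")
    out = bytearray(3 * n)
    out[0::3] = red     # every 3rd byte starting at 0 is red
    out[1::3] = green
    out[2::3] = blue
    return bytes(out)
```

    For the RedBurn sample that would be planar_to_interleaved(raw, 262, 262, offset=240), assuming the planar reading above is right.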

  8. #8
    Member
    Join Date
    Feb 2010
    Location
    Germany
    Thanks for the clarification. Maybe I can make something out of that.

  9. #9
    Programmer schnaader
    Join Date
    May 2008
    Location
    Hessen, Germany
    Looking at the three test files, I get the impression that some container format is used here and it might use some compression/scrambling.

    All three files start similarly: bytes 3-6 are the same and look like a magic number (0xC5 0x73 0x5A 0x11).
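    A small Python sketch for this kind of comparison: list the offsets near the start where all sample files share the same byte, and check for the 4-byte value above. The post doesn't say whether "bytes 3-6" counts from 0 or 1, so the default offset of 3 (0-based) is an assumption; try 2 as well. Function names are hypothetical.

```python
from typing import List, Sequence

MAGIC = bytes([0xC5, 0x73, 0x5A, 0x11])  # the 4 bytes reported above


def shared_byte_positions(samples: Sequence[bytes], length: int = 16) -> List[int]:
    """Offsets (within the first `length` bytes) where every sample has the same byte."""
    n = min([length] + [len(s) for s in samples])
    return [i for i in range(n) if len({s[i] for s in samples}) == 1]


def has_magic(data: bytes, offset: int = 3) -> bool:
    """True if MAGIC sits at `offset` (0-based offset 3 is an assumption)."""
    return data[offset:offset + len(MAGIC)] == MAGIC
```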

    In each file, there are small regions of ~128 bytes, spaced ~11000 bytes apart. These contain many "low" bytes (0x00-0x0F), although they're not limited to those. They are visible in the picture above.
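    A rough Python sketch for locating such regions automatically: slide a 128-byte window over the file and report merged stretches where at least half the bytes are below 0x10. Window size, step, and the 50% threshold are guesses tuned to the description above, not known properties of the format.

```python
from typing import List, Tuple


def low_byte_regions(data: bytes, window: int = 128, step: int = 16,
                     threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return (start, end) spans where windows are dominated by bytes 0x00-0x0F;
    overlapping hit windows are merged into one span."""
    regions: List[List[int]] = []
    for start in range(0, len(data) - window + 1, step):
        chunk = data[start:start + window]
        if sum(b < 0x10 for b in chunk) >= threshold * window:
            if regions and start <= regions[-1][1]:
                regions[-1][1] = start + window  # extend the current span
            else:
                regions.append([start, start + window])
    return [(s, e) for s, e in regions]
```

    The differences between consecutive span starts should then show the ~11000-byte spacing, if it is regular.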

    I had a look for typical PNG strings in RedBurn.PNG, but haven't found any, so they are either stripped, there's some compression/encryption involved or the image was converted from PNG to another format and the name was kept. There also is a block at offset 0xEB0 that contains some text ("bitmap", "mode", "ccf", ".5TXD", "width 4i", "height", "dep", "flags", "framsmip", "row_pitc", "slice", "framimagplatform") - these look like a description of image parameters and after some of them, there are some values that seem to make sense - for example, 0x00000500 (1280) and 0x000002D0 (720) after "width 4i" and "height".

    There's a similar section in Scopes_Common.tif, where width seems to be 0x00000400 (1024), but "height" is directly followed by "dep" - I guess there is some compression involved and width=height, so this would be a 1024x1024 image.
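    Reading these values by hand is error-prone, so here is a hedged Python helper that finds a tag string and decodes the following 32-bit word in both byte orders. Whether the value starts right after the tag (gap 0) and which byte order the game uses are assumptions to check against the 1280/720 example; the function name is made up.

```python
import struct
from typing import Optional, Tuple


def value_after_tag(data: bytes, tag: bytes, gap: int = 0) -> Optional[Tuple[int, int]]:
    """Find the first `tag`, skip `gap` bytes, and return the next 4 bytes as
    (big-endian, little-endian) unsigned integers, or None if not found."""
    pos = data.find(tag)
    if pos < 0:
        return None
    start = pos + len(tag) + gap
    raw = data[start:start + 4]
    if len(raw) < 4:
        return None
    big, = struct.unpack(">I", raw)
    little, = struct.unpack("<I", raw)
    return big, little
```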

    The cut-off strings and the interleaved blocks still remind me of LZSS, although in LZSS the compression flags and data are close to each other, while here they seem to be strictly separated. I guess the small 128-byte blocks may be some kind of match flags; perhaps there are also some "distance/length" blocks that aren't as easy to recognize. This would fit with the strings missing their repeated parts (e.g. "dep" should be "depth 4i", but has a match in "width 4i").

    An LZSS-like compression would also explain why the files can be compressed further.
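    To illustrate the separated-flags idea, here is a Python sketch of an LZSS decoder that reads its flag bits from one buffer and literals/matches from another. The token layout (LSB-first flag bits, 1 = literal, 0 = 2-byte match with a 12-bit distance and a 4-bit length) is loosely modeled on classic Okumura-style LZSS and is purely an assumption; the game's real format is unknown, so treat this as a tool for testing hypotheses, not a decoder for these files.

```python
def lzss_decode_split(flags: bytes, payload: bytes, out_len: int) -> bytes:
    """Decode out_len bytes. Flag bits come from `flags` (LSB first): 1 = copy one
    literal byte from `payload`, 0 = read a 2-byte match token from `payload`
    holding (distance - 1) in 12 bits and (length - 3) in 4 bits (hypothetical layout)."""
    out = bytearray()
    pos = 0  # read position in payload
    bit = 0  # index of the next flag bit
    while len(out) < out_len:
        if bit // 8 >= len(flags):
            raise ValueError("ran out of flag bits")
        is_literal = (flags[bit // 8] >> (bit % 8)) & 1
        bit += 1
        if is_literal:
            out.append(payload[pos])
            pos += 1
        else:
            lo, hi = payload[pos], payload[pos + 1]
            pos += 2
            distance = (((hi & 0xF0) << 4) | lo) + 1  # 1..4096 bytes back
            length = (hi & 0x0F) + 3                  # 3..18 bytes
            start = len(out) - distance
            if start < 0:
                raise ValueError("match points before start of output")
            for k in range(length):
                out.append(out[start + k])  # byte-wise, so overlapping copies work
    return bytes(out[:out_len])
```

    Under such a scheme, "depth 4i" could be stored as the literals "dep" plus one match copying "th 4i" from the earlier "width 4i", which is exactly the kind of cut-off string seen in the files.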
    Last edited by schnaader; 1st February 2012 at 13:54.

  10. #10
    Member
    Join Date
    Feb 2010
    Location
    Germany
    Thanks for this thorough analysis. I tried googling the few readable text snippets, hoping to find out where they belong in the file (probably the header), but couldn't find a single result. On the other hand, according to Wikipedia the PNG format doesn't use any of those flags. I also have some uncompressed PNG files from those archives, but those comply with Wikipedia's description of the header and flags, so I couldn't find those strange flags in any of the game's uncompressed files either. Also, the compression must decompress quickly, since the data is loaded almost instantly in game.
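    For comparing the uncompressed PNGs with the odd files, here is a short Python sketch that checks the 8-byte PNG signature and lists the chunk types. Per the PNG specification, each chunk is a 4-byte big-endian length, a 4-byte type, the data, and a 4-byte CRC. The sketch doesn't verify CRCs, and the function name is made up.

```python
import struct
from typing import List, Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk_types(data: bytes) -> Optional[List[str]]:
    """Return the chunk type names in order, or None if the PNG signature is missing."""
    if not data.startswith(PNG_SIGNATURE):
        return None
    pos = len(PNG_SIGNATURE)
    types = []
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        types.append(ctype.decode("latin-1"))
        pos += 12 + length  # length field + type + data + CRC
        if ctype == b"IEND":
            break
    return types
```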

    Edit: I just had the idea to search the game's executable for hints and found some strings that seem to point to zip/zlib:

    Error: GFxZlibState is not set - can't load zipped image data
    GFxZlibState is not set
    Unknown zlib error
    zlib version error
    zlib memory error

    ZLib....LZX.LZD.UNKNOWN...Compression: .MiniPack ...InternalCompression ....AllowDuplicates ....<None>....Flags: ...Size: %s bytes.......Version: %i....PackFile %s:.....P..N.......(...Xz.0C@.......P...l......... ........0...P...............

    GFx_DefineBitsJpeg3Loader: charid = %d pos = %d.......
    DefBitsLossless2: tagInfo.TagType = %d, id = %d, fmt = %d, w = %d, h = %d.
    So could this be a scrambled or headerless zlib file? I ran Precomp on those files a couple of days ago and it didn't find anything, even though zlib seems to be used, so I guess the files are really just missing the header.
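    In practice, "headerless zlib" would mean a raw Deflate stream (RFC 1951) with no zlib or gzip wrapper. Below is a brute-force Python sketch in the spirit of Precomp's brute mode: try a raw inflate at every offset and keep the offsets that decode to a reasonable amount of data without error. This is not how Precomp or reflate are implemented. The 64-byte threshold is arbitrary, it is slow (one inflate attempt per byte), and short false positives on random data are possible.

```python
import zlib
from typing import List, Tuple


def find_raw_deflate(data: bytes, min_output: int = 64,
                     probe: int = 65536) -> List[Tuple[int, int]]:
    """Return (offset, decoded_length) for offsets where raw Deflate decoding
    (wbits=-15: no zlib/gzip header) yields >= min_output bytes without a zlib error."""
    hits = []
    for offset in range(len(data)):
        inflater = zlib.decompressobj(wbits=-15)
        try:
            out = inflater.decompress(data[offset:offset + probe])
        except zlib.error:
            continue
        if len(out) >= min_output:
            hits.append((offset, len(out)))
    return hits
```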
    Last edited by Mexxi; 1st February 2012 at 14:53.

  11. #11
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Quote, originally posted by Mexxi:
    So could this be a scrambled or headerless zlib file? I used precomp a couple of days ago on those files and it didn't amount to anything even though zlib seems to be used, so I guess the files are really just missing the header.
    After running both Precomp in intense and brute mode and Shelwien's reflate on the files, I doubt they are compressed using pure zLib. Both reflate and Precomp in brute mode detect zLib streams without any header and even if the stream is interleaved with something, at least the first part of it would be detected, but there wasn't anything found in the files. As we suppose there's some container format around the files, zLib streams are most likely "hidden" until the original data can be successfully extracted from the container files.

    I think the zLib routines are used to decompress the extracted PNG files, as the PNG format also uses zLib to compress the image data.

    At least, "LZX" and "LZD" seem to indicate that there are additional LZ compression algorithms used. "...Jpeg..." is most probably for TIFF containing JPEG streams or pure JPEG files if the game uses any. "MiniPack", "InternalCompression" and "PackFile" look like signs for custom compression routines.

    Still, extracting from the 3 example files seems to be possible if there's no global dictionary or additional data, but it would take very hard reverse engineering. Perhaps the easiest to analyze would be the Lua file: as you said, it contains a LuaC header (starting at offset 0xF12, 0x1B "LuaQ"), and since the following table seems to consist of more or less increasing 4-byte offsets, it seems possible to guess the original content to verify decompression attempts. There are also some clear-text blocks in the file that are large enough to distinguish and separate literal blocks from additional compression data blocks.
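    The "LuaQ" signature means ESC "Lua" followed by version byte 0x51, i.e. Lua 5.1 bytecode, whose header is 12 bytes long. Below is a Python sketch that finds and decodes that header. The field order follows Lua 5.1's luaU_header; the function and key names are my own.

```python
from typing import Any, Dict, Optional

LUA51_SIGNATURE = b"\x1bLuaQ"  # ESC "Lua" + version byte 0x51 ('Q')
LUA51_HEADER_SIZE = 12


def find_lua51_header(data: bytes) -> Optional[Dict[str, Any]]:
    """Locate the first Lua 5.1 bytecode header and decode its fields."""
    pos = data.find(LUA51_SIGNATURE)
    if pos < 0 or pos + LUA51_HEADER_SIZE > len(data):
        return None
    h = data[pos:pos + LUA51_HEADER_SIZE]
    return {
        "offset": pos,
        "format": h[5],               # 0 = official format
        "little_endian": h[6] == 1,
        "sizeof_int": h[7],
        "sizeof_size_t": h[8],
        "sizeof_instruction": h[9],
        "sizeof_lua_number": h[10],
        "integral_numbers": h[11] == 1,
    }
```

    On the sample, the reported offset 0xF12 marks where the recognizable bytecode begins. Plausible size fields there (for example 4/4/4/8) would give a solid anchor for the known-plaintext comparison described above.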

    Of course, as you have the game executable, real reverse engineering would be the easiest, but likely has legal issues depending on the license.
    Last edited by schnaader; 1st February 2012 at 22:40.

  12. #12
    Member
    Join Date
    Feb 2010
    Location
    Germany
    Quote, originally posted by schnaader:
    After running both Precomp in intense and brute mode and Shelwien's reflate on the files, I doubt they are compressed using pure zLib. Both reflate and Precomp in brute mode detect zLib streams without any header and even if the stream is interleaved with something, at least the first part of it would be detected, but there wasn't anything found in the files. As we suppose there's some container format around the files, zLib streams are most likely "hidden" until the original data can be successfully extracted from the container files.
    Thanks for your reply. The question is why the company would use a proprietary container within another proprietary container, plus compression. That doesn't make much sense, especially since this only applies to some files. I'll have to take it into consideration, though. Maybe looking at it as another container will help me reverse engineer it. The archive's TOC contains each file's compressed and extracted size, so in the meantime I tried to use that data to prepend a zip header manually. I could open the resulting archive, but extraction failed with errors. My last idea was to force extraction by ignoring the errors, but I don't know of a program with such a feature.
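    On forcing extraction despite errors: a zip entry using the deflate method is just a raw Deflate stream, so a hand-made zip header adds little beyond a CRC check. You can get "ignore errors" behaviour by feeding the data to zlib in small chunks and keeping whatever was decoded before the first error. Here is a hedged Python sketch (names are hypothetical). Given the Precomp/reflate results, it will most likely stop almost immediately on these files, but it does show where decoding breaks.

```python
import zlib
from typing import Optional, Tuple


def salvage_raw_deflate(data: bytes, chunk: int = 64) -> Tuple[bytes, Optional[int], Optional[str]]:
    """Raw-inflate `data` chunk by chunk; on the first zlib error, stop and return
    (output so far, offset of the failing chunk, error message).
    Output from inside the failing chunk is lost, so smaller chunks salvage more."""
    inflater = zlib.decompressobj(wbits=-15)
    out = bytearray()
    for i in range(0, len(data), chunk):
        try:
            out += inflater.decompress(data[i:i + chunk])
        except zlib.error as exc:
            return bytes(out), i, str(exc)
    return bytes(out), None, None
```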


    Quote, originally posted by schnaader:
    I think the zLib routines are used to decompress the extracted PNG files, as the PNG format also uses zLib to compress the image data.

    At least, "LZX" and "LZD" seem to indicate that there are additional LZ compression algorithms used. "...Jpeg..." is most probably for TIFF containing JPEG streams or pure JPEG files if the game uses any. "MiniPack", "InternalCompression" and "PackFile" look like signs for custom compression routines.
    Thanks for the clarification.


    Quote, originally posted by schnaader:
    Still, extracting from the 3 example files seems to be possible if there's no global dictionary or additional data, but it would take very hard reverse engineering. Perhaps the easiest to analyze would be the Lua file: as you said, it contains a LuaC header (starting at offset 0xF12, 0x1B "LuaQ"), and since the following table seems to consist of more or less increasing 4-byte offsets, it seems possible to guess the original content to verify decompression attempts. There are also some clear-text blocks in the file that are large enough to distinguish and separate literal blocks from additional compression data blocks.
    Yes, I also figured that a text file would be the best way to find out what is actually going on. Too bad it's compiled. I tried to find other text files, but the only other ones were either compiled Lua scripts or uncompressed text files, so no luck there.


    Quote, originally posted by schnaader:
    Of course, as you have the game executable, real reverse engineering would be the easiest, but likely has legal issues depending on the license.
    Apart from that, it's also way beyond my expertise.
