Results 1 to 9 of 9

Thread: Identify compression

  1. #1
    Member
    Join Date
    Nov 2016
    Location
    India
    Posts
    6
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Identify compression

    Hi guys,
    looking to figure out what compression this file is using. I'm not programmer and my compression knowledge is very limited.
    I do not have source code, following is my attempt to roughly describe beginning of decompression(the rest seems to be more complicated).


    Compressed file is 5700 bytes, uncompressed is 32384 bytes.
    File is from 1992, it's using an older compression algorithm.
    It works with 16bit numbers ie. words




    Code:
    compressed:
    A1 4E 8D 2A D5 E0 55 AB 1E AB 42 65 8C DC 31 76

    Code:
    decompressed:
    50 53 51 52 06 57 56 1E 55 50 8C C8 8E D8 8E C0



    1. first word, and second plus third byte interpreted as word, are shifted to right by some loop(from 0 to 7) let's call it i-loop.
    I'll start with second iteration is easier to see result, thus i-loop is 1
    Code:
        8D4E >> i-loop = 46A7
        2A8D >> i-loop = 1546

    2. from the result bitwise or is performed between upper byte of a first word and lower byte of a second word
    Code:
        46 | 46 = 46

    3. shift by one(it's always one)
    Code:
        46A7 >> 1 = 2353
        1546 >> 1 = AA3

    4. same as step 2
    Code:
        23 | A3 = A3 = A353
    5. iteration is compared to 7, and if it's above it's set back to zero

    6. 53 is stored as the final "uncompressed" byte.



    Any idea what this could be?

  2. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    4,013
    Thanks
    303
    Thanked 1,329 Times in 759 Posts
    That looks like old-style 16bit bit i/o routine.

    1. Check for zlib/deflate with http://nishi.dreamhosters.com/u/rawdet.exe
    Example:
    D:\tmp6>rawdet.exe delta_floats.png nul nul nul
    beg=0000003E last=0 type=2 size=16391 unplen=680954
    end=00045F3D bufbeg=00000002 bufend=00000000

    2. Maybe try http://aluigi.altervista.org/quickbms.htm or similar

  3. Thanks:

    encode (20th November 2016)

  4. #3
    Member
    Join Date
    Nov 2016
    Location
    India
    Posts
    6
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hi, thank you for suggestions, I've already tried quickbms and ZlibChecker which is somewhere on this forum, 0 streams found is the result, rawdet does not return anything , quickbms fails to deliver proper result..

    I ruled out possibility it's deflate some time ago (or is there some older deflate?).

    Do you know any other tool like quickbms?

    There are two distinct constants used later in the decompression 0x3ff and 0x1fff.

  5. #4
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    4,013
    Thanks
    303
    Thanked 1,329 Times in 759 Posts
    There's little point in identifying it then, if its not deflate or lzma - there're too many LZ variants in apps, just look at this: http://asmodean.reverse.net/

    What do you want to do with it? I presume you already have the decoder?

  6. #5
    Member
    Join Date
    Nov 2016
    Location
    India
    Posts
    6
    Thanks
    0
    Thanked 0 Times in 0 Posts
    How do you know it's LZ variant? Wish I had my decoder that's why I'm trying to identify algorithm so finding implementation somewhere would be possible. Now I have almost nothing and rewriting complicated 16bit assembler code into some "normal" language is really more than frustrating experience I've tried already.......

  7. #6
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    4,013
    Thanks
    303
    Thanked 1,329 Times in 759 Posts
    > How do you know it's LZ variant?

    That's simply most likely. At first, after "It works with 16bit numbers" I thought it could be LZW, but LZW won't need shifts.

    > Wish I had my decoder that's why I'm trying to identify algorithm so finding implementation somewhere would be possible.
    > Now I have almost nothing and rewriting complicated 16bit assembler code into some "normal" language is really more than frustrating experience I've tried already.......

    Unfortunately I don't see any working decompilers for 16bit code.
    But I can suggest a workaround.
    You can disassemble the code (make sure that it assembles correctly - disassemblers frequently have problems with lossless reassembling),
    then assemble it as 32-bit, then use idapro6.8/hex-rays to decompile it.

  8. #7
    Member
    Join Date
    Nov 2016
    Location
    India
    Posts
    6
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Shelwien View Post
    You can disassemble the code (make sure that it assembles correctly - disassemblers frequently have problems with lossless reassembling),
    then assemble it as 32-bit, then use idapro6.8/hex-rays to decompile it.


    Honestly,it would be perfectly fine to have even 32bit compilable assembler code, but manually dealing with all
    ds:si, es:si............is pain in the a.. Nevermind, hope someone else could help,
    I'll try supply any info when requested or I can post disassembled code.

  9. #8
    Member
    Join Date
    Nov 2016
    Location
    India
    Posts
    6
    Thanks
    0
    Thanked 0 Times in 0 Posts
    sorry for reviving the thread , I found better description of the algorithm.

    The tokens for decompression are fetched using the get_next_token function which reads the next part of the source into ax:dx shifted by cl. The cl register is being used as a bit shift position, wrapping back to zero when it reaches eight. At the top of the loop a token is read and the bottom bit is checked. If the flag is set this just stores the current byte, gets the next token and continues. If the flag is clear a larger code path is taken which ends with a rep movsb instruction. This indicates the compression is using a dictionary of some sort.


    The decompression algorithm is interesting in a few ways. The first is that it is using a variable bit length encoding. An absolute value is encoded as a 1 flag, and and 8-bit data value. The interesting bit is that the bitstream is encoded as little endian. For example if the first three bytes of a chunk are encoded as absolute values the data is arranged like this:
    Code:
    Bytes:   AAAAAAAA BBBBBBBB CCCCCCCC DDDDDDDD
    Token 1: 6543210F        7
    Token 2:          543210F        76
    Token 3:                   43210F        765
    The decompressor will also skip a byte when reading if the cl counter wraps back to zero when getting the next token. I don’t know if this is an optimisation, a bug, or a hack made by the original developer to fix an issue with their tools.


    If the flag is clear the decompressor performs a copy from an early part of the decompressed data. In this case the following bits encode a length and offset to copy from. The offset is encoded as either 10 or 13 bits, with a flag indicating which. This seems like an odd choice since it makes the code a bit more complex and is really only saving 2 bits at best.


    The length encoding is a little strange. The decompressor reads bits until it reaches a zero bit. The number of bits used to encode the length is then two plus the number of non-zero bits read. For example to encode a length of 58 (0x3a) the bitstream looks like:

    Code:
    11110
    111010
    Which requires 11 bits to encode. Short lengths encode better, since the minimum bit length is 2. Copy lengths up to 3 require only 3 bits to encode, up to 7 requires 5 bits, etc. I’m not sure if this form of encoding is a common technique or not.

  10. #9
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    4,013
    Thanks
    303
    Thanked 1,329 Times in 759 Posts
    Well, that's just a description of bit i/o, it can still be anything.
    But on other hand, now we have a 16-bit decompiler.
    Attached Thumbnails Attached Thumbnails Click image for larger version. 

Name:	ghidra.png 
Views:	113 
Size:	218.7 KB 
ID:	6509  

Similar Threads

  1. Machine Learning to identify weight loss parameters
    By Sportman in forum The Off-Topic Lounge
    Replies: 0
    Last Post: 13th August 2016, 14:10
  2. Identify compression (pre 1996)
    By McGee in forum Data Compression
    Replies: 5
    Last Post: 18th March 2016, 16:06
  3. Can anyone identify this compression/encryption tool?
    By Mexxi in forum The Off-Topic Lounge
    Replies: 2
    Last Post: 9th August 2014, 17:52
  4. Help identify compression algorithm?
    By DotDotDot in forum Data Compression
    Replies: 0
    Last Post: 1st June 2013, 10:15
  5. Identify compression in file fragment
    By Nquisitive in forum Data Compression
    Replies: 0
    Last Post: 16th September 2012, 18:02

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •