## Identify compression

1. Hi guys,
I'm trying to figure out what compression this file is using. I'm not a programmer and my compression knowledge is very limited.
I do not have the source code; the following is my attempt to roughly describe the beginning of the decompression (the rest seems to be more complicated).

The compressed file is 5700 bytes, the uncompressed one is 32384 bytes.
The file is from 1992, so it's using an older compression algorithm.
It works with 16-bit numbers, i.e. words.

Code:
```
compressed:
A1 4E 8D 2A D5 E0 55 AB 1E AB 42 65 8C DC 31 76
```

Code:
```
decompressed:
50 53 51 52 06 57 56 1E 55 50 8C C8 8E D8 8E C0
```

1. The first word, and the word formed by the second and third bytes, are shifted right by some loop counter (running from 0 to 7); let's call it i-loop.
I'll start with the second iteration since the result is easier to see there, so i-loop is 1.
Code:
```    8D4E >> i-loop = 46A7
2A8D >> i-loop = 1546```

2. From the result, a bitwise OR is performed between the upper byte of the first word and the lower byte of the second word:
Code:
`    46 | 46 = 46`

3. Shift by one (it's always one):
Code:
```    46A7 >> 1 = 2353
1546 >> 1 = AA3```

4. Same as step 2:
Code:
`    23 | A3 = A3 = A353`
5. The iteration counter is compared to 7, and if it's above 7 it's set back to zero.

6. 53 is stored as the final "uncompressed" byte.
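Steps 1-6 can be re-traced in a short Python sketch. The variable names are mine, and the starting byte position (one byte already consumed at i-loop = 1) is my guess from the example values:

```python
# Literal re-trace of steps 1-6; names and the starting byte position
# are assumptions inferred from the example values above.
data = bytes.fromhex("A1 4E 8D 2A")
p, i = 1, 1                          # byte position and i-loop counter

w1 = data[p]     | data[p + 1] << 8  # 0x8D4E (second + third byte)
w2 = data[p + 1] | data[p + 2] << 8  # 0x2A8D (third + fourth byte)

a = w1 >> i                          # step 1: 0x46A7
b = w2 >> i                          #         0x1546

step2 = (a >> 8) | (b & 0xFF)        # step 2: 0x46 | 0x46 = 0x46

a >>= 1                              # step 3: 0x2353
b >>= 1                              #         0x0AA3

step4 = (a >> 8) | (b & 0xFF)        # step 4: 0x23 | 0xA3 = 0xA3
word  = step4 << 8 | (a & 0xFF)      #         0xA353

print(hex(word & 0xFF))              # step 6: 0x53 is stored
```

The intermediate values match the worked example, which at least confirms the arithmetic as described.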

Any idea what this could be?

2. That looks like an old-style 16-bit bit I/O routine.

1. Check for zlib/deflate with http://nishi.dreamhosters.com/u/rawdet.exe
Example:
Code:
```
D:\tmp6>rawdet.exe delta_floats.png nul nul nul
beg=0000003E last=0 type=2 size=16391 unplen=680954
end=00045F3D bufbeg=00000002 bufend=00000000
```

2. Maybe try http://aluigi.altervista.org/quickbms.htm or similar

3. ## Thanks:

encode (20th November 2016)

4. Hi, thank you for the suggestions. I've already tried quickbms and ZlibChecker (which is somewhere on this forum); "0 streams found" is the result, rawdet does not return anything, and quickbms fails to deliver a proper result.

I ruled out the possibility that it's deflate some time ago (or is there some older deflate?).

Do you know any other tool like quickbms?

There are two distinct constants used later in the decompression: 0x3ff and 0x1fff.

5. There's little point in identifying it then. If it's not deflate or lzma, there are too many LZ variants in apps; just look at this: http://asmodean.reverse.net/

What do you want to do with it? I presume you already have the decoder?

6. How do you know it's an LZ variant? I wish I had my decoder; that's why I'm trying to identify the algorithm, so that finding an implementation somewhere would be possible. Right now I have almost nothing, and rewriting complicated 16-bit assembler code into some "normal" language is a really frustrating experience, which I've already tried...

7. > How do you know it's LZ variant?

That's simply the most likely. At first, after "It works with 16bit numbers", I thought it could be LZW, but LZW wouldn't need shifts.

> Wish I had my decoder that's why I'm trying to identify algorithm so finding implementation somewhere would be possible.
> Now I have almost nothing and rewriting complicated 16bit assembler code into some "normal" language is really more than frustrating experience I've tried already.......

Unfortunately I don't see any working decompilers for 16-bit code.
But I can suggest a workaround.
You can disassemble the code (make sure that it reassembles correctly - disassemblers frequently have problems with lossless reassembling),
then assemble it as 32-bit, then use idapro6.8/hex-rays to decompile it.

8. Originally Posted by Shelwien
You can disassemble the code (make sure that it assembles correctly - disassemblers frequently have problems with lossless reassembling),
then assemble it as 32-bit, then use idapro6.8/hex-rays to decompile it.

Honestly, it would be perfectly fine to have even 32-bit compilable assembler code, but manually dealing with all the
ds:si, es:si... is a pain in the a.. Never mind, I hope someone else can help;
I'll try to supply any info when requested, or I can post the disassembled code.

9. Sorry for reviving the thread, but I found a better description of the algorithm.

The tokens for decompression are fetched using the get_next_token function, which reads the next part of the source into ax:dx shifted by cl. The cl register is used as a bit shift position, wrapping back to zero when it reaches eight. At the top of the loop a token is read and the bottom bit is checked. If the flag is set, this just stores the current byte, gets the next token and continues. If the flag is clear, a larger code path is taken which ends with a rep movsb instruction. This indicates the compression is using a dictionary of some sort.
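A minimal Python sketch of that loop, as I read the description. The function names, the window arithmetic, and the bit accounting are my guesses, and only the literal path is implemented:

```python
def get_next_token(src, pos, cl):
    # Guess at the routine described above: a 16-bit window read at
    # byte position pos, shifted right by the bit counter cl.
    window = src[pos] | src[pos + 1] << 8 | src[pos + 2] << 16
    return (window >> cl) & 0xFFFF

def decompress_sketch(src, out_len):
    out, pos, cl = bytearray(), 0, 0
    while len(out) < out_len:
        token = get_next_token(src, pos, cl)
        if token & 1:                        # bottom bit set: literal
            out.append((token >> 1) & 0xFF)  # store the byte as-is
            cl += 9                          # 1 flag bit + 8 data bits
        else:                                # clear: dictionary copy
            raise NotImplementedError("length/offset copy (rep movsb)")
        pos += cl >> 3                       # consume whole bytes
        cl &= 7                              # cl wraps at eight
    return bytes(out)

# Under these assumptions, the first four bytes of the sample
# decode as four literals matching the decompressed dump:
print(decompress_sketch(bytes.fromhex("A1 4E 8D 2A D5 E0"), 4).hex())
# → 50535152
```

That this reproduces the first four decompressed bytes (50 53 51 52) suggests the literal path is understood correctly, though the byte-skip quirk mentioned below is not modelled here.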

The decompression algorithm is interesting in a few ways. The first is that it uses a variable-bit-length encoding. An absolute value is encoded as a 1 flag and an 8-bit data value. The interesting bit is that the bitstream is encoded as little endian. For example, if the first three bytes of a chunk are encoded as absolute values, the data is arranged like this:
Code:
```Bytes:   AAAAAAAA BBBBBBBB CCCCCCCC DDDDDDDD
Token 1: 6543210F        7
Token 2:          543210F        76
Token 3:                   43210F        765```
The decompressor will also skip a byte when reading if the cl counter wraps back to zero when getting the next token. I don’t know if this is an optimisation, a bug, or a hack made by the original developer to fix an issue with their tools.

If the flag is clear, the decompressor performs a copy from an earlier part of the decompressed data. In this case the following bits encode a length and offset to copy from. The offset is encoded as either 10 or 13 bits, with a flag indicating which. This seems like an odd choice, since it makes the code a bit more complex and really only saves 2 bits at best.
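A sketch of how that flag-selected offset might be read. The flag polarity and the bit order are assumptions; the 0x3FF/0x1FFF masks match the two constants mentioned earlier in the thread:

```python
def read_offset(next_bit):
    # Hypothetical: one flag bit selects a 13-bit (long) or 10-bit
    # (short) offset; polarity and bit order are assumptions.
    n = 13 if next_bit() else 10
    v = 0
    for i in range(n):                 # assemble LSB-first (assumed)
        v |= next_bit() << i
    return v & (0x1FFF if n == 13 else 0x3FF)

bits = iter([1, 1, 0, 1] + [0] * 10)   # flag=1, then 13 bits LSB-first
print(read_offset(bits.__next__))      # → 5
```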

The length encoding is a little strange. The decompressor reads bits until it reaches a zero bit. The number of bits then used to encode the length is two plus the number of non-zero bits read. For example, to encode a length of 58 (0x3a) the bitstream looks like:

Code:
```11110
111010```
Which requires 11 bits to encode. Short lengths encode better, since the minimum data field is 2 bits. Copy lengths up to 3 require only 3 bits to encode, up to 7 require 5 bits, etc. I’m not sure if this form of encoding is a common technique or not.
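That length code can be sketched like this (the MSB-first order of the data bits is an assumption based on the 58 example):

```python
def read_length(next_bit):
    # Count 1-bits until a 0, then read (ones + 2) data bits.
    ones = 0
    while next_bit():
        ones += 1
    v = 0
    for _ in range(ones + 2):          # data bits, MSB-first (assumed)
        v = v << 1 | next_bit()
    return v

# length 58: prefix 11110, then data 111010 -- 11 bits total
bits = iter([1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0])
print(read_length(bits.__next__))      # → 58
```

With this reading, a length of 3 is the 3-bit code `011`, consistent with the "lengths up to 3 in 3 bits" observation above.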

10. Well, that's just a description of bit I/O; it could still be anything.
But on the other hand, now we have a 16-bit decompiler.
