Results 1 to 9 of 9

Thread: Unknown compression algorithm in old DOS program

  1. #1
    Member
    Join Date
    May 2013
    Location
    Geneve
    Posts
    3
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Unknown compression algorithm in old DOS program

    Hello,
    I am trying to determine compression algorithm used in old DOS program (16-bit, compiled about 1993-94). It is used to compress ~30 files, I can determine the expected output by doing memory dumps. I've attached example of compressed and decompressed binary files.
    What I know by now:
    - it is some variant of LZSS (from debug information included .exe file)
    - first 3 bytes of compressed file are compression parameters
    - there is a 4th parameter with value: 1 << 2nd parameter
    What my guesses are (regarding attached example):
    - 1st parameter (5) is length bits count
    - 2nd parameter (8), dunno
    - 3rd parameter (10), dunno
    - 4th parameter (1 << 8 = 256) is window size
    I can attach the dissassembly, but it is quite big. I've tried few LZSS source codes with changed parameters, but none of them gave me expected output. Any ideas?
    Attached Files Attached Files

  2. #2
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,569
    Thanks
    777
    Thanked 687 Times in 372 Posts
    if your have program executable, you can try to compress various files and look at the compressed data. also, you can run it under turbo debugger, press ^Break due compression/decompression and look at the internal loop code

  3. #3
    Member
    Join Date
    May 2013
    Location
    Geneve
    Posts
    3
    Thanks
    0
    Thanked 0 Times in 0 Posts
    The program code contains only decompression algorithm. I have tried to trace it in Bochs emulator (I am not using Windows), but before analyzing such a huge amount of data I prefer to ask. In the worst case, I can try to cut the decompression function code from disassembly but I will still miss the compression algorithm.

    I've attached some part of disassembled code (unfortunately this function code is the only place where IDA dissassembler partly fails, I guess this function was linked from some external library).
    Attached Files Attached Files

  4. #4
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,569
    Thanks
    777
    Thanked 687 Times in 372 Posts
    i've identified comp_f2. it reads 8 bits from input buffer buff_offset, filled on underflow by sub_162CA. this literal byte then stored by stosb command

    so, it looks more like some sort of tornado bitcoder, not byte-oriented LZSS

  5. #5
    Member Jean-Marie Barone's Avatar
    Join Date
    Oct 2012
    Location
    Versailles, France
    Posts
    60
    Thanks
    15
    Thanked 4 Times in 3 Posts
    ; ---------------------------------------------------------------------------
    db 6Ah ; dissassembling error?
    comp_b2 db 0
    ; ---------------------------------------------------------------------------

    These are the opcodes of push 0 which must be an argument for comp_f2 function. With IDA, set your cursor on db 6Ah and press the C key to force it to disassembly.

    db 5 ; dissassembling error?
    comp_b3 dw 0F0F0h
    It is add ax,F0F0h.

    Sadly, I can help you much with the name of the algorithm...

  6. #6
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,569
    Thanks
    777
    Thanked 687 Times in 372 Posts
    it's not even bitcoder, but semi-LZH coder: literal bytes are copied directly as 8 bits, while LZ matches are Huffman-encoded (or at least have variable-sized codes). I've heard 15 years ago that such coders are usual for exe packers. also, this decoder modifies code on-the-fly, probably as codesize-optimization technique. those "dissassembling errors" are just palces where program modifies itself, usually immediate operand in the "push" command., you should disassemble these lines manually, looking into 8086 opcode table

    loc_166CF is a literal decode: *ptr++ = getNbits(. code

    push N
    call comp_f2

    reads N bits from some input buffer and returns them in ax.

    com_f1 == getbit() == getNbits(1)

    comp_f3 flushes bitbuffer, i.e. moves read pointer to the next whole byte



    so, entrire algo starting from loc_1668C looks as:

    comp_b1 := number of bits used to represent match lengt minus 2. in your example, it's 5, i.e. matches of 2..33 bytes
    comp_b2 := number of bits used to represent short distances (in near matches), in your example, it's 8, i.e. up to 256 bytes far
    comp_b4 := number of bits used to represent long distances (in far matches), in your example, it's 10, i.e. up to 256+1024 bytes far

    while (dst<bufend)
    {
    if (getbit()==0)
    { // near match
    src = dst - getbits(comp_b2);
    len = getbits(comp_b1)+2;
    memcpy(dst,src,len);
    dst += len;
    }
    else if (getbit()==0){ // far match
    offset = getbits(comp_b4) + (1<<comp_b2);
    src = dst - offset;
    len = getbits(comp_b1)+2;
    memcpy(dst,src,len);
    dst += len;
    }
    else
    { // literal
    *dst++ = getbits(;
    }

    }


    final conclusion: it's a bitcoder, with a parameters stored in the first 3 bytes. usual for situations when you need extremely short and fast decoder, such as exepacker. i've never seen upx decoder but it may be just it. or some other exe packer
    Last edited by Bulat Ziganshin; 17th May 2013 at 18:55.

  7. #7
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,569
    Thanks
    777
    Thanked 687 Times in 372 Posts
    oh, i forget that it's 1993 or so. so look at the packers popular those days: http://en.wikipedia.org/wiki/Executa...DOS_executable

    but if you need just compression algo, it's obvious from the code reconstruction

  8. #8
    Member
    Join Date
    May 2013
    Location
    Geneve
    Posts
    3
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Really big thanks, you have saved me many days of work!

  9. #9
    Member
    Join Date
    Oct 2009
    Location
    usa
    Posts
    60
    Thanks
    1
    Thanked 9 Times in 6 Posts
    What is the name of the DOS program? Were you able to decompress the entire dataset, and might your findings be useful to other programs which used that same algorithm?

Similar Threads

  1. BCIF image compression program
    By m^2 in forum Data Compression
    Replies: 35
    Last Post: 26th April 2013, 16:02
  2. Help with unknown compression. Maybe LZMA?
    By twisted89 in forum Data Compression
    Replies: 0
    Last Post: 29th November 2012, 11:11
  3. Unknown compression.
    By yjme in forum Data Compression
    Replies: 7
    Last Post: 21st January 2011, 11:11
  4. LZP2 - compression program by a newbye
    By Cyan in forum Data Compression
    Replies: 45
    Last Post: 1st May 2009, 13:30
  5. Replies: 18
    Last Post: 5th November 2007, 11:12

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •