Results 1 to 14 of 14

Thread: Looking for a super-simple decompressor

  1. #1
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts

    Looking for a super-simple decompressor

    I have a microcontroller project. It has some side functionality that requires a HTTP server serving a ~5KB of static text. This 5KB is a waste that's big enough to warrant compressing, but not enough to write custom code (it would be best to compress the server code too, but it's not worth the effort; still, I may do it if I'm bored). So I'm looking for a compressor that will minimise compressed size+decompressor code size. With at least somewhat free license. Encoding time doesn't matter. Decoding time practically neither, even 50 KB/s would be fine. It doesn't have to be production quality, if it works on this single file, it's OK.
    My initial candidate would be LZ4 because it has HC variant. But decoder is optimised for speed, not size. Any suggestions?

  2. #2
    Member
    Join Date
    Jul 2013
    Location
    United States
    Posts
    194
    Thanks
    44
    Thanked 140 Times in 69 Posts
    I don't know what the performance is like, but LZ4 is probably a bit complex for a microcontroller. You might want to take a look at heatshrink.

  3. Thanks:

    m^2 (27th March 2015)

  4. #3
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    The MCU has 32 KB RAM and I can use 8 for the decompressor easily, so any LZSS will fit. Smaller space will make things easier though.
    As a side note, I haven't seen this compressor, need to fsbench it.

  5. #4
    Member
    Join Date
    Dec 2012
    Location
    japan
    Posts
    164
    Thanks
    31
    Thanked 64 Times in 40 Posts
    Attached file is html. Its source is compressed by simple algorithm(LZSS+huffman).
    The size of decompressor is 356bytes.
    Attached Files Attached Files
    Last edited by xezz; 27th March 2015 at 12:54.

  6. Thanks:

    m^2 (28th March 2015)

  7. #5
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    887
    Thanks
    481
    Thanked 278 Times in 118 Posts
    5 KB is a really small amount of data.
    The problem is that in such circumstances, size of the decoder matters.
    I feel your general direction of selecting an LZSS code is a good bet.
    Anything more complex is likely to cost as much on the code size at it earns on the compression side.

  8. #6
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 796 Times in 488 Posts
    A quick look through http://mattmahoney.net/dc/text.html
    LZW decompressor is 671 bytes of source code and public domain.
    smile decompressor is 207 byte executable, although compression isn't very good.

  9. Thanks:

    m^2 (28th March 2015)

  10. #7
    Member WayneD's Avatar
    Join Date
    Mar 2015
    Location
    Down Under
    Posts
    7
    Thanks
    7
    Thanked 2 Times in 2 Posts
    [correction, there are two "supertiny's" ... "JQCoding", and "Super Tiny Compression Engine (STCE)"... the authors (different but from same group) refer to both as "Supertiny" however, hence the confusion - apologies!]

    "JQCoding" by Jacky Qwerty/29A, has a compiled x86 compressor of 154 bytes and decompressor 122 bytes
    http://vxheaven.org/29a/29a-3/29a-3.2_f
    (note this is a pro-virus website, but this page is harmless/purely about a de/compressor). Its author states the algorithm is "for VXers" but really its just another general compression algorithm that has seemingly nothing to do with viruses. [edit - although see Matt's analysis in next post that shows it uses a crypto layer probably aimed at helping bypass AV]
    This is very powerful super(tiny/fast) compresion/encryption tool for VXers! It consists of both compressor/encryptor and decompresor/decryptor routines! Compresor and decompresor routines are only 9Ah/7Ah bytes long respectively! Compression ratio is much better than RLE and acceptably good compared with other compresors. Compresion algorithm is not based on LZ dictionaries, nor huffman coding, nor any other popular algorithm, it rather implements a new compresion technique which favors code size and both compresion speed/ratio!
    ---

    "Super Tiny Compression Engine" (STCE), by Super/29A and Vecna/29A (i dont have compiled sizes but you can tell from the source its also ubertiny)
    http://repo.hackerzvoice.net/depot_m...-2/29A%232.2_8
    PACK/UNPACK [just RLE but only for nulls? *headscratch*]
    This engine consists on a simple zero-repeat compression. It copies code from the source buffer to a destination buffer, checking for zeros. When it finds one, it starts counting the number of zeros which follow that initial zero, and then stores that value. So, all the zeros will be followed by the number of times we need to repeat them. Pretty simple, as you see.

    CRUNCH/UNCRUNCH
    This routine is based on the premise that not all possible ASCII codes are used in a given text. Some letters, for instance in portuguese or spanish, like a, s, c, or e are more used than others, like k, y, w, t, etc. Other codes, such as the high ASCII ones, are almost never used! So this routine works in two steps. First it scans for the most used codes, creating two tables, each of them with the 15 most used characters. Then it starts compressing the file. If the character in the source buffer is present in the first table, the engine will put its offset in the table. Else, it will look for that character in the second table, and if it's there, will put a zero to mark a table change and put the offset of the character in that second table. Finally, if the character is not present in any table, then the engine will put a zero and the plain ASCII code. The 15 characters, which are the max number in our table, can be represented in a nibble, so we may gain space. So, it will work this way:
    * Codes in 1st table -> 1 nibble
    * Codes in 2nd table -> 2 nibbles (one 0 and the offset)
    * Codes not found in any table -> 4 nibbles (two 0 and the ASCII code)
    This compression engine is especially useful in order to compress, for instance, the text of a payload in a virus, the names of the APIs used in a Win32 PE infector... in general, any plain text you would like to compress.
    Last edited by WayneD; 28th March 2015 at 23:20.

  11. Thanks:

    comp1 (28th March 2015)

  12. #8
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 796 Times in 488 Posts
    I'm looking at the code. I can't tell for sure, but I think this is an order-2 coder. It uses a 64 KB table to predict the next byte from the last 2 bytes of context. There are 8 flag bits packed (MSB first) into 1 byte to encode whether the next 8 predictions are right (0) or wrong (1). For each wrong prediction, the byte is coded as a literal and the table is updated. It should be extremely fast.

    The algorithm is complicated by some simple (not secure) encryption code probably designed to confuse virus scanners. It uses a seed passed to EAX. The code could be made quite a bit shorter without the encryption. It also looks like there are 2 unreachable instructions in the decoder just before label jq_d5.

    Edit: I wrote a small C++ program which I think does the same compression but without the encryption. It does not compress very much: enwik9 is 703 MB. However the decoder is very simple. One difference is that instead of passing the input size to the decoder, I use EOF to mark the end of input. The last flag byte is padded with 1 bits to signal mispredictions, but without corresponding literal bytes to input.

    The speed is limited mainly by I/O. In g++, getc() and putc() are slow, but I was too lazy to use fread() and fwrite().

    Code:
    // jq.cpp - public domain by Matt Mahoney
    // To compress|decompress: jq c|d input output
    
    #define _CRT_DISABLE_PERFCRIT_LOCKS  // MSVC: speed up getc/putc
    #include <stdio.h>
    
    int main(int argc, char** argv) {
      if (argc<4) return printf("To compress|decompress: jq c|d in out\n"), 1;
      FILE *in, *out;
      if (!(in =fopen(argv[2], "rb"))) return perror(argv[2]), 1;
      if (!(out=fopen(argv[3], "wb"))) return perror(argv[3]), 1;
    
      // compress
      int flags=1;  // 8 packed bits: 1 = mispredicted byte. bit 8 = sentinel
      int cxt=0;    // 2 byte context
      int n=0;      // number of pending literals to write
      int c;        // last byte
      unsigned char buf[8]={0};  // pending literals to write
      unsigned char model[1<<16]={0};  // order 2 prediction of next byte
      if (argv[1][0]=='c') {
        while (true) {
          c=getc(in);
          if (c!=EOF) {
            flags+=flags;
            if (c!=model[cxt]) ++flags, buf[n++]=model[cxt]=c;
            cxt=((cxt<<8)|c)&0xffff;
          }
          else
            while (flags<256) flags+=flags+1;  // pad
          if (flags>=256) {
            putc(flags&255, out);
            for (int i=0; i<n; ++i) putc(buf[i], out);
            flags=1;
            n=0;
          }
          if (c==EOF) break;
        }
      }
    
      // decompress
      if (argv[1][0]=='d') {
        while ((flags=getc(in))!=EOF) {
          for (int i=7; i>=0; --i) {
            if ((flags>>i)&1) {
              c=getc(in);
              if (c==EOF) break;
              else putc(model[cxt]=c, out);
            }
            else putc(c=model[cxt], out);
            cxt=(cxt<<8|c)&0xffff;
          }
        }
      }
      return 0;
    }
    Last edited by Matt Mahoney; 28th March 2015 at 03:38.

  13. Thanks (3):

    Bulat Ziganshin (26th April 2015),m^2 (28th March 2015),WayneD (28th March 2015)

  14. #9
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by xezz View Post
    Attached file is html. Its source is compressed by simple algorithm(LZSS+huffman).
    The size of decompressor is 356bytes.
    Cool idea, but is there a compressor?
    Quote Originally Posted by Matt Mahoney View Post
    A quick look through http://mattmahoney.net/dc/text.html
    LZW decompressor is 671 bytes of source code and public domain.
    smile decompressor is 207 byte executable, although compression isn't very good.
    Thanks, I'll look into them.
    Quote Originally Posted by WayneD View Post
    "Supertiny" by Jacky Qwerty from the 29A virus group has a compiled x86 compressor of 154 bytes and decompressor 122 bytes, see http://vxheaven.org/29a/29a-3/29a-3.2_f (note this is a pro-virus website, but this page is harmless/purely about a de/compressor). Its author states the algorithm is "for VXers" but really its just another general compression algorithm that has nothing to do with viruses.
    Interesting, but I need ARMv7-M not x86 code.
    Quote Originally Posted by Matt Mahoney View Post
    I wrote a small C++ program which I think does the same compression but without the encryption. It does not compress very much: enwik9 is 703 MB. However the decoder is very simple. One difference is that instead of passing the input size to the decoder, I use EOF to mark the end of input. The last flag byte is padded with 1 bits to signal mispredictions, but without corresponding literal bytes to input.
    This may be useful, but my gut tells me that a strong encoder would pay for itself.

  15. #10
    Member WayneD's Avatar
    Join Date
    Mar 2015
    Location
    Down Under
    Posts
    7
    Thanks
    7
    Thanked 2 Times in 2 Posts
    ok so it seems there are two "supertinys"! both by the same 29A virus group but different authors. I've edited my previous post above to reflect
    Last edited by WayneD; 28th March 2015 at 21:30.

  16. #11
    Member
    Join Date
    Dec 2012
    Location
    japan
    Posts
    164
    Thanks
    31
    Thanked 64 Times in 40 Posts
    Cool idea, but is there a compressor?
    I have written it, but not works well yet. I'm not original auther.
    There are many wonderful codes. http://golf.shinh.org/p.rb?Text+Compression

  17. #12
    Member snowcat's Avatar
    Join Date
    Apr 2015
    Location
    Vietnam
    Posts
    31
    Thanks
    45
    Thanked 12 Times in 8 Posts

  18. #13
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Maybe...I'll try it in free time.

  19. #14
    Member
    Join Date
    May 2012
    Location
    United States
    Posts
    334
    Thanks
    192
    Thanked 57 Times in 41 Posts
    Quote Originally Posted by WayneD View Post
    [correction, there are two "supertiny's" ... "JQCoding", and "Super Tiny Compression Engine (STCE)"... the authors (different but from same group) refer to both as "Supertiny" however, hence the confusion - apologies!]

    "JQCoding" by Jacky Qwerty/29A, has a compiled x86 compressor of 154 bytes and decompressor 122 bytes
    http://vxheaven.org/29a/29a-3/29a-3.2_f
    (note this is a pro-virus website, but this page is harmless/purely about a de/compressor). Its author states the algorithm is "for VXers" but really its just another general compression algorithm that has seemingly nothing to do with viruses. [edit - although see Matt's analysis in next post that shows it uses a crypto layer probably aimed at helping bypass AV]


    ---

    "Super Tiny Compression Engine" (STCE), by Super/29A and Vecna/29A (i dont have compiled sizes but you can tell from the source its also ubertiny)
    http://repo.hackerzvoice.net/depot_m...-2/29A%232.2_8
    Does anyone have compiled Windows executables for these programs for testing?

Similar Threads

  1. Replies: 4
    Last Post: 8th March 2014, 14:50
  2. Fastest decompressor!?
    By Sanmayce in forum Data Compression
    Replies: 66
    Last Post: 13th April 2011, 01:18
  3. Dummy Static [Windowless] Dictionary Text Decompressor
    By Sanmayce in forum Data Compression
    Replies: 4
    Last Post: 12th October 2010, 18:55
  4. Super Sound Slimmer
    By kampaster in forum Data Compression
    Replies: 2
    Last Post: 10th July 2010, 06:10
  5. super-fast i/o
    By Bulat Ziganshin in forum Forum Archive
    Replies: 18
    Last Post: 10th May 2007, 22:08

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •