Page 2 of 2
Results 31 to 49 of 49

Thread: (Extremely) tiny decompressors

  1. #31
    Member
    Join Date
    May 2020
    Location
    Berlin
    Posts
    76
    Thanks
    21
    Thanked 25 Times in 20 Posts
    Some improvements in my tooling:
    [Attached image: Figure_sorted_w_disasm.png]
    I added sorting of entries per size bin, so that it's easier to see the distribution of different compressors (only PAQ1 and Exomizer shown here). Then I added some mouseover functionality to see the name, original and compressed size, compressor, and even up to 30 lines of disassembled code for that entry.

    After digging around a bit more on the interweb, I found this huge analysis of Z80 compressors, their resulting filesizes and decompressor sizes. Were all those tables created by you, introspec?
    https://uniabis.net/pico/msx/z80packer/

    Edit 1: Related Z80 decompressor sources are available here: https://github.com/uniabis/z80depacker

    I guess those decompressor sizes are roughly comparable to x86 solutions (16b, aside from some interesting special x86 ops), more so than to 68k. So they could serve as an estimate of decompressor code complexity, which could be weighed against the compressed data.

    Edit2: I also found this very interesting blog posting, which has been published by modexp just two weeks ago:
    https://modexp.wordpress.com/2020/05...ite-shellcode/
    Is that someone also active in this forum? This posting contains a nice collection of decompressor sources, exe compressors, a compression algorithm classification, and related stuff.

    From my own stats and from what I've read so far, the most promising concepts (for small code compression) are:

    • LZSS style compression
    • 1 byte matches
    • efficient handling of 2 byte matches
    • rep offsets (indexed ringbuffer or whatever fits in small decompressor code)
    • byte masks for matches
    • MTF (has similar effects as 1B matches, only one variant useful)
    • efficient handling of short literal runs or no literals at all (I see run length distributions of roughly 0: n, 1: n/2, 2: n/4, and so on, mostly caused by lots of 1B matches)
    • unary and interleaved gamma codes
    • specialized op byte prefix handling (0x0f, 0x66, 0x67, Z80 ones, etc.)
    • leaving out anything generic (code mover, initializations), or better suited for bigger files (long matches, handling of many different cases)
    • adaptation of the decompressor code in combination with encoding
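    The unary/interleaved gamma codes from the list above can be sketched in a few lines. This is an illustrative Python model only (bit lists stand in for a real bitstream); this variant encodes values >= 2, because the decoder always reads at least one bit pair:

```python
# Interleaved Elias gamma: value bits alternate with continuation bits,
# so the decoder is a tiny two-bits-per-iteration loop.
def gamma_encode(n):
    # this variant encodes n >= 2 (the decoder always reads one bit pair)
    assert n >= 2
    payload = [int(b) for b in bin(n)[3:]]  # bits after the leading 1
    bits = []
    for i, b in enumerate(payload):
        bits.append(b)                                 # value bit
        bits.append(1 if i < len(payload) - 1 else 0)  # continuation flag
    return bits

def gamma_decode(bits):
    it = iter(bits)
    n = 1
    while True:
        n = (n << 1) | next(it)  # shift in a value bit
        if not next(it):         # continuation bit 0 terminates
            return n
```

This interleaving is what makes the decoder loop so small in asm: one getbit into the accumulator, one getbit as the loop condition.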

    This is what my little research is about: looking for the best compression ratios at specific file sizes and contents, and then checking the actual features implemented in the compression algos used, to identify the most promising subset.

    While the match/literal statistics of my compressor point in some interesting directions, adding prominent features of the compressors in my study might also highlight promising concepts.
    Last edited by Dresdenboy; 10th June 2020 at 01:34. Reason: Added two links and a few words

  2. #32
    Member
    Join Date
    Apr 2017
    Location
    United Kingdom
    Posts
    82
    Thanks
    68
    Thanked 33 Times in 21 Posts
    Quote Originally Posted by Shelwien View Post
    > wouldn't have been able to depend on compression routines in Windows.

    I actually mean chunks of existing code, like
    https://en.wikipedia.org/wiki/Return...amming#Attacks
    https://github.com/JonathanSalwan/ROPgadget#screenshots
    Huh, I've seen this kind of trick applied on the ZX Spectrum before (the disc OS was implemented in a shadow ROM, and access to the disc controller was only available while running code in the shadow ROM).
    So ROP-style techniques were developed to get low-level access to the disc controller and extend the functionality of the disc OS.

  3. #33
    Member
    Join Date
    Apr 2017
    Location
    United Kingdom
    Posts
    82
    Thanks
    68
    Thanked 33 Times in 21 Posts
    My apologies for the slow reply; my family recently expanded, so my time for compression has massively shrunk!

    Quote Originally Posted by Dresdenboy View Post
    Some current list of tested compressors with example compressed sizes [...] If you have some interesting ones, I'd happy to add them. So far I also want to add oneKpaq, as it has a 128B decompressor.
    It is interesting from my perspective to see so many variations of paq, which I am sure won't be particularly compact (oneKpaq is massively impressive in this sense, will have to study it). paq, deflate - I do not believe most of these technologies to be really usable in 1K. I did some research on 1K intros for ZX Spectrum, the top 3 most popular compressors in 1K productions are:
    1) ZX7 (because its decompressor is only 69 bytes and the shortest stub to unpack and run could be 6-7 bytes long).
    2) MegaLZ (because it compresses better and its decompressor is 110 bytes long (could be reduced to 88 bytes)). Its modern compressor is available as part of https://github.com/lvd2/mhmt
    3) AteBit's internal compressor for micro intros (used only by AteBit on ZX Spectrum, but its decompressor is only 32 bytes long). Its compressor is included as part of this release: https://www.pouet.net/prod.php?which=53074 (but seems suboptimal and is, therefore, likely to be best re-written from scratch).

    You should probably also look into
    4) zx7mini would be an interesting option (although, I believe, not the strongest performer). See https://github.com/antoniovillena/zx7mini
    5) Pletter 4 could potentially be competitive (although it is very close to ZX7 and zx7mini). See http://www.xl2s.tk/

    Quote Originally Posted by Dresdenboy View Post
    E.g. prefix emitter: on x86 I might try to use long SSE insts. They share a 0x0f prefix. This is easy to emit with little code. On Z80 there are several prefixes (FD, ED etc.), which are rather common. It would be easy to write such a compressor with encoding the prefixes in 1-2 bits.
    I think that if you run the stats, you'll find that the vast majority of commands with prefixes are not used in size coding on Z80, because they are mostly not particularly space-efficient. My point is that the savings from such an emitter are likely not worth it on Z80.
    Last edited by introspec; 12th June 2020 at 01:33. Reason: clarified the text

  4. #34
    Member
    Join Date
    Apr 2017
    Location
    United Kingdom
    Posts
    82
    Thanks
    68
    Thanked 33 Times in 21 Posts
    Quote Originally Posted by Dresdenboy View Post
    Does the Z80 have bitfield instructions? I guess not. I programmed them the last time 28 years ago. Then the typical shift, test, reload would be required.
    A variety of shift, test, set and reset commands is available. The set is not very balanced, so your mileage will strongly depend on what you do, actually.

    Quote Originally Posted by Dresdenboy View Post
    Some further research brought up these LZ based solutions (68k, 28-42B):
    http://eab.abime.net/showthread.php?...72#post1148772 (ross)
    http://eab.abime.net/showthread.php?...32#post1152832 (paraj)
    http://eab.abime.net/showthread.php?...66#post1270966 (Blueberry)
    https://www.pouet.net/topic.php?whic...page=1#c276543 (baah, who also has a 32B variant in some productions)
    These links are amazing! This single post of yours fully justified my opening of this thread (for me anyway). I do not know 68k, but will learn it now just to read these.

    Quote Originally Posted by Dresdenboy View Post
    I will add lzsa, lz32b (ross' encoder), lz48, emcompress/smashv2 to my list soon. I will also start adding decomp sizes to the calculations when available.
    I am sure LZSA will not be competitive - it is designed for decompressor speed, not size. Same about LZ48. I do not know the other ones.

  5. #35
    Member
    Join Date
    Apr 2017
    Location
    United Kingdom
    Posts
    82
    Thanks
    68
    Thanked 33 Times in 21 Posts
    Quote Originally Posted by Dresdenboy View Post
    After digging around a bit more on the interweb, I found this huge analysis of Z80 compressors, their resulting filesizes and decompressor sizes. Were all those tables created by you, introspec?
    https://uniabis.net/pico/msx/z80packer/

    Edit 1: Related Z80 decompressor sources are available here: https://github.com/uniabis/z80depacker

    I guess that those decompressor sizes are roughly comparable to x86 solutions (16b, except for some interesting special x86 ops), more than to 68k. So they could work as an estimate for decompressor code complexity, which could be weighted against the compressed data.
    No, I do not publish the results of my tests in this way because I do not believe it is practical (I currently run my tests on two corpora of small ZX Spectrum-related files, with hundreds of files). I prefer to condense the results of my tests into overall results tables and diagrams like the one you can find on LZSA webpage (I generate these diagrams on a reasonably regular basis).

    I agree that Z80 code density is not very far from x86 (with just one caveat of severely reduced performance). This is why I tend to trust my Z80 tests a bit more than many people here would feel is reasonable.

    Quote Originally Posted by Dresdenboy View Post
    Edit2: I also found this very interesting blog posting, which has been published by modexp just two weeks ago:
    https://modexp.wordpress.com/2020/05...ite-shellcode/
    Is that someone also active in this forum? This posting contains a nice collection of decompressor sources, exe compressors, a compression algorithm classification, and related stuff.
    I believe I exchanged several emails with him. He collected a lot of information; it will take time to take it all in.

    Quote Originally Posted by Dresdenboy View Post
    From my own stats and from what I've read so far, the most promising concepts (for small code compression) are:

    • 1 byte matches
    • LZSS style compression
    • efficient handling of 2 byte matches
    • rep offsets (indexed ringbuffer or whatever fits in small decompressor code)
    • byte masks for matches
    • MTF (has similar effects as 1B matches, only one variant useful)
    • efficient handling of short literal runs or no literals at all (I see run length distributions of roughly 0: n, 1: n/2, 2: n/4, and so on, mostly caused by lots of 1B matches)
    • unary and interleaved gamma codes
    • specialized op byte prefix handling (0x0f, 0x66, 0x67, Z80 ones, etc.)
    • leaving out anything generic (code mover, initializations), or better suited for bigger files (long matches, handling of many different cases)
    • adaption of the decompressor code in combination with encoding.
    I agree with some of it. 1-byte matches are interesting, but I am undecided at present whether they are a way to reduce the number of literals or a valid compression mechanism; need to look into it more. LZSS is not a heading I'd use (LZSS usually implies a one-bit literal/match indicator, which is definitely not the only way to do things, see my link to Charles Bloom's post at the start of this thread). 2-byte matches are important for sure. Rep offsets - we are actually experimenting with them at the moment. Byte masks - I am not sure what you mean. MTF by itself is probably not a compression algorithm as such; together with even the simplest entropy coder it is probably ~40-50 bytes. It is worth trying, for sure, but I am sceptical - even with universal codes instead of a true entropy coder it is likely to be a bit big. Efficient representation of literal runs is another interesting topic which is not discussed enough, and I do not have a good answer to this one. Adaptation of the decompressor to specific data is another thing we are looking into at present.
    Last edited by introspec; 12th June 2020 at 01:35. Reason: completed the incomplete sentence

  6. #36
    Member
    Join Date
    May 2020
    Location
    Berlin
    Posts
    76
    Thanks
    21
    Thanked 25 Times in 20 Posts
    Quote Originally Posted by introspec View Post
    My apologies for slow replying; my family recently got expanded, so my time for compression massively shrunk!
    Congratulations! Enjoy this time! You see: I'm also not the fastest.
    Quote Originally Posted by introspec View Post
    It is interesting from my perspective to see so many variations of paq, which I am sure won't be particularly compact (oneKpaq is massively impressive in this sense, will have to study it). paq, deflate - I do not believe most of these technologies to be really usable in 1K. I did some research on 1K intros for ZX Spectrum, the top 3 most popular compressors in 1K productions are:
    1) ZX7 (because its decompressor is only 69 bytes and the shortest stub to unpack and run could be 6-7 bytes long).
    2) MegaLZ (because it compresses better and its decompressor is 110 bytes long (could be reduced to 88 bytes)). Its modern compressor is available as part of https://github.com/lvd2/mhmt
    3) AteBit's internal compressor for micro intros (used only by AteBit on ZX Spectrum, but its decompressor is only 32 bytes long). Its compressor is included as part of this release: https://www.pouet.net/prod.php?which=53074 (but seems suboptimal and is, therefore, likely to be best re-written from scratch).

    You should probably also look into
    4) zx7mini would be an interesting option (although, I believe, not the strongest performer). See https://github.com/antoniovillena/zx7mini
    5) Pletter 4 could potentially be competitive (although it is very close to ZX7 and zx7mini). See http://www.xl2s.tk/
    Thanks for your suggestions. The more the better. I already had the thought that my script could simply be extended to do a meta-optimization by finding the best combination of compressors and their parameters. Knowing/estimating the size optimizations gained by removing code for handling unused encodings might also help find the best solutions.

    Quote Originally Posted by introspec View Post
    I think that if you run the stats, you'll find that the vast majority of commands with prefixes are not used in size coding on Z80, because they are mostly not particularly space-efficient. My point is that the savings from such emitter are likely to be not really worth it on Z80.
    I did some tests with Z80 intros. One often-seen sequence was 23h 36h, which is actually 2 opcodes. But the matches already found reduce this count, rendering this method useless.

    (to be continued)

  7. #37
    Member
    Join Date
    Apr 2017
    Location
    United Kingdom
    Posts
    82
    Thanks
    68
    Thanked 33 Times in 21 Posts
    Quote Originally Posted by Dresdenboy View Post
    I did some tests with Z80 intros. There was one often seen sequence: 23h 36h, which actually are 2 opcodes. But the found matches already reduce this count, rendering this method useless.
    I can tell you immediately that this sequence is a common speedcode, which also makes a common appearance in code generators (inc hl : ld (hl),const). I can assure you that it is not going to be common enough in the vast majority of tiny intros to justify a 10-20 byte decoder just for that.

  8. #38
    Member
    Join Date
    May 2020
    Location
    Berlin
    Posts
    76
    Thanks
    21
    Thanked 25 Times in 20 Posts
    Quote Originally Posted by introspec View Post
    Variety of shifts, test, set and reset commands are available. The set is not very balanced, so your mileage will strongly depend on what you do, actually.
    Yeah, especially when considering opcode sizes. I just experienced that in practice when looking at the AteBit decompressor you mentioned. I thought one dec b and a later add hl,bc could be replaced by sbc hl,bc - but the latter op is 2 bytes. BTW I could add their compressor to my test bench.

    Quote Originally Posted by introspec View Post
    These links are amazing! this single post of yours fully justified my opening of this thread (for me anyway). I do not know 68k, but will learn it now just to read these.
    I'm glad that you enjoyed them. 68k is not that complicated, maybe even more straightforward (and nearly orthogonal). Different conditional branches, some postincrement and predecrement addressing modes, mul, div..

    Quote Originally Posted by introspec View Post
    No, I do not publish the results of my tests in this way because I do not believe it is practical (I currently run my tests on two corpora of small ZX Spectrum-related files, with hundreds of files). I prefer to condense the results of my tests into overall results tables and diagrams like the one you can find on LZSA webpage (I generate these diagrams on a reasonably regular basis).
    I agree with this based on a different scope: compression ratio vs. decompression speed. There might still be some variances, which could hint at a better result with a different compressor which is close on the chart (for an individual file). Averages vs. full distributions.

    Quote Originally Posted by introspec View Post
    I agree with some of it. 1 byte matches are interesting, but I am undecided at present, whether they are a way to reduce the number of literals or valid compression mechanism. Need to look into it more. LZSS is not a heading I'd use (LZSS is usually used to imply a one bit indicator of literal/match, which is definitely not necessarily the only way to do things, see my link to Charles Bloom's post at the start of this thread). 2-byte matches are important for sure. rep offsets - we are actually experimenting with them at the moment. Byte masks - I am not sure what you mean. MTF - by itself is probably not a compression algorithm as such, together with even the simplest entropy coder is probably ~40-50 bytes. It is worth trying, for sure, but I am sceptical. Even with universal codes instead of the true entropy coder it is likely to be a bit big. Efficient representation of literal runs is another interesting topic which is not discussed enough and I do not have a good answer to this one. Adaptation of the decompressor to specific data is another thing we are looking into at present.
    With MTF I actually meant a variant that encodes common bytes (opcodes, relative jump offsets, constants) using an MTF scheme. Encoding an index into a limited MTF buffer (e.g. with 16 or 32 entries, using 4b or 5b respectively) would use fewer bits. Bytes not found there would simply remain literals.
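    A hedged sketch of that limited-buffer MTF variant (the token format and default buffer size here are my own illustration, not the actual scheme - in a real encoding an 'idx' token would cost log2(size) bits and a 'lit' token a full byte plus a flag):

```python
# Limited-size move-to-front buffer: hits become short indices, misses
# stay literals and are inserted at the front of the buffer.
def mtf_encode(data, size=16):
    table, out = [], []
    for b in data:
        if b in table:
            i = table.index(b)
            out.append(('idx', i))        # short index token
            table.insert(0, table.pop(i)) # move to front
        else:
            out.append(('lit', b))        # full literal token
            table.insert(0, b)
            if len(table) > size:
                table.pop()               # evict least-recent entry
    return out

def mtf_decode(tokens, size=16):
    table, out = [], bytearray()
    for kind, v in tokens:
        if kind == 'idx':
            b = table.pop(v)
            table.insert(0, b)
        else:
            b = v
            table.insert(0, b)
            if len(table) > size:
                table.pop()
        out.append(b)
    return bytes(out)
```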

    I'll go into more detail on the other ideas in my mail, as this might easily exceed the scope of this thread.

    To add something new, here is my latest plot:
    [Attached image: Figure_compression_ratios.png]
    It shows what I'm actually looking for: where do specific algorithm families or actual implementations (with specific features) show the best performance? Some seem to work better on smaller files than on bigger ones (still < xy KB), moving up or down in the overall rankings at different sizes, while others are overall better or worse compressors. Most compressors (and the abbreviations I used) should be known to you. Some are kind of redundant (e.g. hrust and hst, the latter being the variant included in mhmt). "exo" is Exomizer 3 raw incl. the encoding tables, "exor" is without them (as they might be optimized/reduced with lots of zeros for small files).

  9. #39
    Member
    Join Date
    May 2020
    Location
    Berlin
    Posts
    76
    Thanks
    21
    Thanked 25 Times in 20 Posts
    Quote Originally Posted by Stefan Atev View Post
    My experience being with x86 1K intros, this certainly resonates; at the end of the day, the tiny (de)compressor should only be helping you with code - all the data in the intro should be custom-packed anyway, in a way that makes it difficult to compress for LZ-based algorithms. For example, I remember using basically 2bits per beat for an audio track (2 instruments only, both OPL-2 synth); Fonts would be packed, etc.

    4K is different, I think there you just have a lot more room. And for 128B and 256B demos, compression is very unlikely to help, I think.
    Getting back to this one...
    The biggest part of those small intros is code anyway - if there is any data at all. Many intros use code bytes as integer or float constants, unless you need some structured data (at least 2 elements). But even then, some of today's tiny intros play code bytes as MIDI notes and so on.

    But regarding the compressibility of such code: one important point is that the final size of the compressed file is limited to some 2^n bytes. This means we're not looking at compressing files already at this size limit, but want to get as much code as possible under this (compo entry) barrier. And as you said, one would help the compressor. This is interesting, as code would look very different in this case. Some intro coders already noted this (e.g. unrolled loops instead of a counter register plus setup/increment/loop instructions, or macros instead of subroutines). With appropriate tools at hand, we might be looking at some 650B being compressed into 512 bytes incl. the decompressor stub.

    It is not even necessary to use a compression format that works on any data (a generic data compressor). Since the coder is still involved, it would be possible to have a format with limits, which might not work on every file, but does on the small ones, with the option of adapting the file to make compression work. Here's an example: the format might only support literal run lengths up to 32. If there is a longer run, there is no need to provide code to handle it (more decompressor code and more encoding bits); the coder (the human one) might just adapt that block a bit to produce a match instead.

  10. #40
    Member
    Join Date
    May 2020
    Location
    Berlin
    Posts
    76
    Thanks
    21
    Thanked 25 Times in 20 Posts
    My own experiments look promising. With a mix of LZW, LZSS and other ideas (not ANS though), I can get close to apultra, exomizer and packfire for smaller files, while the decompression logic is still smaller than theirs.
    Last edited by Dresdenboy; 4th August 2020 at 18:04. Reason: Not allowed to tell ;)

  11. #41
    Member
    Join Date
    Aug 2016
    Location
    USA
    Posts
    73
    Thanks
    17
    Thanked 23 Times in 18 Posts
    This has all got me scratching the compression bug again. I am trying several ideas out; I will let you know if they pan out:
    1. No literals (construct a "dictionary" of 256 bytes so that a match of length 1 is always guaranteed - the code to set that up is very short). That alone is not very promising (literals encoded as len-1 matches will definitely use more than the 9 bits LZSS does). I am trying an offset encoding scheme that's very simple but unfortunately hard to optimize for: if you must _always_ be able to encode a match, you may need too many bits for the match offset (I am trying to use multiple "offset contexts"). For this to work, the savings of 1 bit on each match must offset the cost of adding len-1 matches as a replacement for literals. It also simplifies the decompressor if everything is a match.
    2. Imprecise lengths - especially if you can reuse previous match offsets, it's OK if there are "gaps" in representable match lengths; an especially long match will just be encoded as multiple shorter matches. That only makes sense if you expect to have long matches and very few bits dedicated to offsets. A Golomb or Fibonacci length distribution seems too difficult for a tiny decompressor, but I think there are easier ways to stretch your length budget.

    I am trying to see if there's a way to stick to byte-oriented IO, no variable-length codes, etc. (just more fun, I think; the best compression likely needs variable-length codes even if they add a little decompressor complexity). I guess I will shoot for a program that spits out the compressed data and a small decompressor (probably just C source to match the data). The inputs being small at least allows me not to worry about memory consumption, match-finding performance, etc. It turns out (as we all knew) that trying to make optimal decisions, when encoding choices can drastically affect future output, is quite difficult.
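    Idea 1 above can be sketched roughly as follows. This is a hypothetical illustration, not the actual scheme: the window is seeded with all 256 byte values so every byte has a guaranteed match of length >= 1, a naive greedy matcher picks (offset, length) pairs, and no bit-level encoding is modelled:

```python
# No-literals LZ sketch: seed the window with the identity dictionary
# 0..255, so any byte can always be emitted as a length-1 match.
def compress_no_literals(data):
    window = bytearray(range(256))  # trivial to synthesize in a tiny decompressor
    tokens = []
    i = 0
    while i < len(data):
        # the guaranteed length-1 match: byte value v sits at window index v
        best_off, best_len = len(window) - data[i], 1
        for off in range(1, len(window) + 1):
            w, l = bytearray(window), 0
            # overlap-style matching: the window grows while we match
            while i + l < len(data) and w[-off] == data[i + l]:
                w.append(data[i + l])
                l += 1
            if l > best_len:
                best_off, best_len = off, l
        tokens.append((best_off, best_len))
        window.extend(data[i:i + best_len])
        i += best_len
    return tokens

def decompress_no_literals(tokens):
    window = bytearray(range(256))
    for off, length in tokens:
        for _ in range(length):
            window.append(window[-off])  # overlap copy, like rep movsb
    return bytes(window[256:])           # strip the seeded dictionary
```

The decompressor is a single loop with no literal branch, which is the attraction; whether the extra offset bits pay for that is exactly the open question in the post above.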

  12. #42
    Member
    Join Date
    May 2020
    Location
    Berlin
    Posts
    76
    Thanks
    21
    Thanked 25 Times in 20 Posts
    @Stefan:
    1. This sounds interesting, but might not work. I put len-1 matches into my list because several good LZ77-family compressors use them (with 4b offsets) to avoid the longer encoding of literals when no longer match is available.
    You might also consider literal runs (early in a file) vs. match runs (1-n matches following).
    You might also check out the literal escaping mechanism of pucrunch.
    2. The Fibonacci sum calculation is a few instructions, but it quickly adds up. Gamma, by contrast, is cheap in code footprint.

    Bit-oriented encoding is also cheap in asm (the widely seen get_bit subroutines with a shift register, which is refilled with bytes, or BT instructions).
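    For comparison, the Fibonacci (Zeckendorf) code mentioned in point 2 can be sketched like this. Every value is a sum of non-consecutive Fibonacci numbers, and a "11" pair terminates the code, so decoding really is just a couple of adds per bit. Bit lists are purely illustrative; a real depacker would pull bits from its get_bit routine:

```python
# Fibonacci (Zeckendorf) universal code, sketch.
def fib_encode(n):               # n >= 1
    fibs = [1, 2]
    while fibs[-1] < n:
        fibs.append(fibs[-1] + fibs[-2])
    if fibs[-1] > n:
        fibs.pop()               # largest Fibonacci number <= n is now last
    bits = [0] * len(fibs)
    for i in range(len(fibs) - 1, -1, -1):
        if fibs[i] <= n:
            bits[i] = 1          # greedy Zeckendorf: never two 1s in a row
            n -= fibs[i]
    return bits + [1]            # appended 1 forms the "11" terminator

def fib_decode(bits):
    n, prev = 0, 0
    a, b = 1, 2                  # walking Fibonacci pair
    for bit in bits:
        if bit and prev:
            return n             # "11" terminator reached
        if bit:
            n += a
        prev = bit
        a, b = b, a + b
    raise ValueError("unterminated Fibonacci code")
```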
    Last edited by Dresdenboy; 11th July 2020 at 11:45.

  13. #43
    Member
    Join Date
    Jul 2020
    Location
    EU
    Posts
    7
    Thanks
    0
    Thanked 0 Times in 0 Posts
    https://yupferris.github.io/blog/201...S-on-6502.html
    This bit-level ANS/rABS implementation for 6502 looks like it'd be pretty small. Maybe not 40 bytes, but should be <100.

  14. #44
    Member
    Join Date
    May 2020
    Location
    Berlin
    Posts
    76
    Thanks
    21
    Thanked 25 Times in 20 Posts
    Quote Originally Posted by zeit View Post
    https://yupferris.github.io/blog/201...S-on-6502.html
    This bit-level ANS/rABS implementation for 6502 looks like it'd be pretty small. Maybe not 40 bytes, but should be <100.
    Yes, it would fit in here. The question is which probabilities it could be used with - in a simpler decompressor than the one "Ferris" built for his 4 KiB executable. BTW, that link has now come full circle, after I posted it in some other thread in this forum (though without considering it for this one).

    Ferris talks about using an arithmetic decoder with 20 instructions instead of an rANS decoder with 24 instructions in this video: https://www.youtube.com/watch?v=5bRrUr76rc4&t=1200s
    You can also see his code there. I think the small decompressor (which decompresses the 2.4 KiB second-stage decompressor) is 247 B. The decoder code is reused by the big decompressor afterwards.
    Compare that small decompressor ("smol squishy") to oneKpaq's, which is 128 B on 32b x86 with an even slightly better compression ratio. There is still some room for improvement, I think (or useful simplifications).
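    To show how small the core of such a coder is, here is a static-probability binary rABS pair in Python. This is the standard textbook construction with byte-wise renormalization, not Ferris's actual code; probabilities are scaled to 256 and p1 (probability of a 1 bit) must be in 1..255:

```python
# Binary rABS sketch: encoder runs in reverse, decoder is a short loop.
SCALE_BITS = 8
SCALE = 1 << SCALE_BITS
LOW = 1 << 16                    # state stays in [LOW, LOW << 8)

def rabs_encode(bits, p1):
    p0 = SCALE - p1
    x = LOW
    out = []                     # renormalization bytes, built in reverse
    for bit in reversed(bits):   # rABS encodes in reverse order
        freq, start = (p1, p0) if bit else (p0, 0)
        while x >= freq << 16:   # keep the next state inside the interval
            out.append(x & 0xFF)
            x >>= 8
        x = ((x // freq) << SCALE_BITS) + (x % freq) + start
    return x, bytes(reversed(out))

def rabs_decode(x, data, nbits, p1):
    p0 = SCALE - p1
    pos, bits = 0, []
    for _ in range(nbits):
        r = x & (SCALE - 1)      # slot within the probability scale
        bit, freq, start = (0, p0, 0) if r < p0 else (1, p1, p0)
        x = freq * (x >> SCALE_BITS) + r - start
        while x < LOW:           # refill a byte at a time
            x = (x << 8) | data[pos]
            pos += 1
        bits.append(bit)
    return bits
```

The decode loop body really is a handful of operations, which is why both the 6502 post and Ferris's talk arrive at decoders of only a few dozen instructions.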

    Addendum: Crinkler went open source recently. It also uses PAQ-like compression algorithms, but contains a reduced variant for 1k intros (as PE-formatted x86 exe files) with a small decompressor (the full PE header is reused and filled in, see the decompressor source).
    https://github.com/runestubbe/Crinkler

  15. #45
    Member
    Join Date
    May 2020
    Location
    Berlin
    Posts
    76
    Thanks
    21
    Thanked 25 Times in 20 Posts
    I got hold of the BeRoExePacker depacker sources for some older version from 2008. These do not include LZBRA (LZSS+AC), a PAQ variant based on kkrunchy, or some LZP+context-modelling variant, but they include LZBRS, LZBRR and LZMA (see BeRo's blog for some details). The sources can be found in this Chinese forum: https://bbs.pediy.com/thread-71242.htm
    But since it requires a somewhat complex registration and the archive contains the packer, which triggers security mechanisms all over the place (Win, Chrome, Firefox..), I stripped the exe from the archive. The current version can be downloaded from BeRo's blog.

    The LZBRS depacker (not counting CLD - clear direction flag - and source/dest init) is 69 bytes in 32b x86 asm (going 16b would save 10B from long call addresses, and add a byte here and there for replacing LEAs, see disasm):
    Code:
    00000274  BE02714000        mov esi,0x407102
    00000279  BF00204000        mov edi,0x402000
    0000027E  FC                cld
    0000027F  AD                lodsd
    00000280  8D1C07            lea ebx,[edi+eax]
    00000283  B080              mov al,0x80
    00000285  3BFB              cmp edi,ebx
    00000287  733B              jnc 0x2c4
    00000289  E81C000000        call dword 0x2aa
    0000028E  7203              jc 0x293
    00000290  A4                movsb
    00000291  EBF2              jmp short 0x285
    00000293  E81A000000        call dword 0x2b2
    00000298  8D51FF            lea edx,[ecx-0x1]
    0000029B  E812000000        call dword 0x2b2
    000002A0  56                push esi
    000002A1  8BF7              mov esi,edi
    000002A3  2BF2              sub esi,edx
    000002A5  F3A4              rep movsb
    000002A7  5E                pop esi
    000002A8  EBDB              jmp short 0x285
    000002AA  02C0              add al,al
    000002AC  7503              jnz 0x2b1
    000002AE  AC                lodsb
    000002AF  12C0              adc al,al
    000002B1  C3                ret
    000002B2  33C9              xor ecx,ecx
    000002B4  41                inc ecx
    000002B5  E8F0FFFFFF        call dword 0x2aa
    000002BA  13C9              adc ecx,ecx
    000002BC  E8E9FFFFFF        call dword 0x2aa
    000002C1  72F2              jc 0x2b5
    000002C3  C3                ret
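    To make the control flow easier to follow, here is a rough Python model of the LZBRS listing above. The stream layout (4-byte length header, 1-bit literal/match flag, interleaved gamma for offset+1 and length) is my reading of the disassembly, so treat it as a sketch:

```python
# Python model of the LZBRS depacker disassembly above.
def lzbrs_decompress(src):
    # first dword (lodsd) = decompressed length, little-endian
    length = int.from_bytes(src[:4], "little")
    pos = 4
    tag = 0x80                   # 'mov al,0x80': shift register with sentinel
    out = bytearray()

    def getbit():                # models 'add al,al / jnz / lodsb / adc al,al'
        nonlocal tag, pos
        t = tag << 1
        carry, tag = (t >> 8) & 1, t & 0xFF
        if tag == 0:             # only the sentinel was left: refill
            t = (src[pos] << 1) | carry
            pos += 1
            carry, tag = (t >> 8) & 1, t & 0xFF
        return carry

    def gamma():                 # interleaved gamma loop at 0x2b2; result >= 2
        n = 1
        while True:
            n = (n << 1) | getbit()
            if not getbit():
                return n

    while len(out) < length:
        if getbit():             # 1 = match, 0 = literal byte from the stream
            offset = gamma() - 1 # 'lea edx,[ecx-1]'
            count = gamma()
            j = len(out) - offset
            for _ in range(count):   # 'rep movsb', overlap allowed
                out.append(out[j])
                j += 1
        else:
            out.append(src[pos])
            pos += 1
    return bytes(out)
```

The test stream below was hand-assembled against this reading of the format: length header 6, three literals 'A','B','C', then one match with offset 3, length 3.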
    The LZBRR depacker is (under the same conditions) 149 bytes in 32b x86 asm (10 long relative call addresses, which would be 20B less in 16b asm):
    Code:
    00000274  BE52714000        mov esi,0x407152
    00000279  BF00204000        mov edi,0x402000
    0000027E  FC                cld
    0000027F  B280              mov dl,0x80
    00000281  33DB              xor ebx,ebx
    00000283  A4                movsb
    00000284  B302              mov bl,0x2
    00000286  E86D000000        call dword 0x2f8
    0000028B  73F6              jnc 0x283
    0000028D  33C9              xor ecx,ecx
    0000028F  E864000000        call dword 0x2f8
    00000294  731C              jnc 0x2b2
    00000296  33C0              xor eax,eax
    00000298  E85B000000        call dword 0x2f8
    0000029D  7323              jnc 0x2c2
    0000029F  B302              mov bl,0x2
    000002A1  41                inc ecx
    000002A2  B010              mov al,0x10
    000002A4  E84F000000        call dword 0x2f8
    000002A9  12C0              adc al,al
    000002AB  73F7              jnc 0x2a4
    000002AD  753F              jnz 0x2ee
    000002AF  AA                stosb
    000002B0  EBD4              jmp short 0x286
    000002B2  E84D000000        call dword 0x304
    000002B7  2BCB              sub ecx,ebx
    000002B9  7510              jnz 0x2cb
    000002BB  E842000000        call dword 0x302
    000002C0  EB28              jmp short 0x2ea
    000002C2  AC                lodsb
    000002C3  D1E8              shr eax,1
    000002C5  744D              jz 0x314
    000002C7  13C9              adc ecx,ecx
    000002C9  EB1C              jmp short 0x2e7
    000002CB  91                xchg eax,ecx
    000002CC  48                dec eax
    000002CD  C1E008            shl eax,byte 0x8
    000002D0  AC                lodsb
    000002D1  E82C000000        call dword 0x302
    000002D6  3D007D0000        cmp eax,0x7d00
    000002DB  730A              jnc 0x2e7
    000002DD  80FC05            cmp ah,0x5
    000002E0  7306              jnc 0x2e8
    000002E2  83F87F            cmp eax,byte +0x7f
    000002E5  7702              ja 0x2e9
    000002E7  41                inc ecx
    000002E8  41                inc ecx
    000002E9  95                xchg eax,ebp
    000002EA  8BC5              mov eax,ebp
    000002EC  B301              mov bl,0x1
    000002EE  56                push esi
    000002EF  8BF7              mov esi,edi
    000002F1  2BF0              sub esi,eax
    000002F3  F3A4              rep movsb
    000002F5  5E                pop esi
    000002F6  EB8E              jmp short 0x286
    000002F8  02D2              add dl,dl
    000002FA  7505              jnz 0x301
    000002FC  8A16              mov dl,[esi]
    000002FE  46                inc esi
    000002FF  12D2              adc dl,dl
    00000301  C3                ret
    00000302  33C9              xor ecx,ecx
    00000304  41                inc ecx
    00000305  E8EEFFFFFF        call dword 0x2f8
    0000030A  13C9              adc ecx,ecx
    0000030C  E8E7FFFFFF        call dword 0x2f8
    00000311  72F2              jc 0x305
    00000313  C3                ret
    Attached Files
    Last edited by Dresdenboy; 4th August 2020 at 13:50. Reason: adding code

  16. #46
    Member
    Join Date
    May 2020
    Location
    Berlin
    Posts
    76
    Thanks
    21
    Thanked 25 Times in 20 Posts
    Temisu (oneKpaq developer) also has collected a lot of depacker code (C++) here: https://github.com/temisu/ancient
    There are no small asm depackers there, though, but at least the sources could give an idea of what could be implemented with a small code footprint in asm.

    Another search revealed some more information and sources of baah's packer "louzy77" (the 32B depacker on 68000): http://abrobecker.free.fr/text/louzy.htm

  17. #47
    Member
    Join Date
    Aug 2016
    Location
    USA
    Posts
    73
    Thanks
    17
    Thanked 23 Times in 18 Posts
    Hi, is there a place where I can find decompressed demos? I'd like to use real data when tweaking parameters / experimenting.

  18. #48
    Member
    Join Date
    May 2020
    Location
    Berlin
    Posts
    76
    Thanks
    21
    Thanked 25 Times in 20 Posts
    You might check the Hardcode collection (http://hardcode.untergrund.net/). It surely has some uncompressed intros >=1k, and others might be decompressed using some general unpacker or common tools (apack, upx). Or use a debugger (which I did for a few).
    Beware of AV software warnings, though!

  19. Thanks:

    Stefan Atev (10th August 2020)

  20. #49
    Member
    Join Date
    Aug 2016
    Location
    USA
    Posts
    73
    Thanks
    17
    Thanked 23 Times in 18 Posts
    Quote Originally Posted by Dresdenboy View Post
    You might check the Hardcode collection (http://hardcode.untergrund.net/). It surely has some uncompressed intros >=1k and others might be decompressed by using some general unpacker or common tools (apack, upx) to decompress. Or use a debugger (which I did for a few).
    Beware of AV software warnings, though!
    Thanks, this will save me some digging. I don't need a ton of examples; even 1 or 2 are better than pretending that text or random chunks of compiler-produced x64 are representative...
