
Thread: LZPM's future

  1. #1
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,009
    Thanks
    399
    Thanked 397 Times in 152 Posts
    Playing with my own ASM-driven LZ77 coders, I've found one interesting modification. No ROLZ, pure LZ77 - and it easily outperforms the current LZPM in most cases. It's cool!

    I think I should completely rewrite LZPM adding new features:
    + Pure LZ77 engine
    + Decrease memory usage
    + Faster decompression, very simple decoder
    + Add inline assembly functions, where needed

    Maybe the entire ROLZ idea is not as great as I thought at the beginning.


  2. #2
    Member
    Join Date
    Dec 2006
    Posts
    611
    Thanks
    0
    Thanked 1 Time in 1 Post
    Somehow I thought from the title that this would be some kind of sad news for LZPM as a ROLZ coder
    Maybe these advantages (and the fact that ROLZ is not widespread) are the reason ROLZ isn't used very much so far. Is your LZ77 variation at least as strong and fast as the current LZPM version?

  3. #3
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    222
    Thanked 146 Times in 83 Posts
    LZ77 or ROLZ?
    LZ77 is faster but not as good!
    ROLZ is slower but better!
    Does the new LZ77 engine use [context] and [keys] to store positions?

  4. #4
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,009
    Thanks
    399
    Thanked 397 Times in 152 Posts
    If LZ77 (for example LZMA) is not good, then I'm a Chinese emperor.

    OK, some results of a new DRAFT version. This one has dummy parsing, which is even weaker than LZPM's "2". Anyway, results with different dictionary sizes:

    calgary.tar:
    64K: 960,861 bytes
    128K: 919,961 bytes
    256K: 888,468 bytes
    512K: 865,257 bytes
    1M: 849,791 bytes
    2M: 842,497 bytes

    LZPM 0.15, 2: 884,418 bytes
    LZPM 0.15, 9: 858,791 bytes

    canterbury.tar:
    64K: 569,538 bytes
    128K: 538,520 bytes
    256K: 518,184 bytes
    512K: 509,729 bytes
    1M: 504,954 bytes
    2M: 496,219 bytes

    LZPM 0.15, 2: 528,685 bytes
    LZPM 0.15, 9: 514,503 bytes

    My plan: use a 64K-2M dictionary, skip optimal parsing and instead use advanced lazy matching, and use hash chains for string searching.
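    For reference, a minimal C sketch of hash-chain string searching, assuming a 3-byte hash and a power-of-two window; the names and constants (WIN_BITS, HASH_BITS, find_match, ...) are my own illustration, not LZPM's actual code:

    Code:
    #include <stdint.h>

    #define WIN_BITS  20                  /* 1 MB window (anything in 64K..2M works) */
    #define WIN_SIZE  (1u << WIN_BITS)
    #define HASH_BITS 16
    #define HASH_SIZE (1u << HASH_BITS)
    #define MIN_MATCH 3

    static uint32_t head[HASH_SIZE];      /* latest position+1 per hash, 0 = empty  */
    static uint32_t chain[WIN_SIZE];      /* previous position+1 with the same hash */

    static uint32_t hash3(const uint8_t *p) {
        return (((uint32_t)p[0] << 12) ^ ((uint32_t)p[1] << 6) ^ p[2]) & (HASH_SIZE - 1);
    }

    /* Insert 'pos' into the chains and return the best match length behind it
       (0 if shorter than MIN_MATCH). 'buf' holds the whole input, 'end' is its
       length; the caller must ensure at least MIN_MATCH bytes remain at 'pos'. */
    static uint32_t find_match(const uint8_t *buf, uint32_t pos, uint32_t end,
                               int max_tries, uint32_t *match_pos) {
        uint32_t h    = hash3(buf + pos);
        uint32_t next = head[h];
        uint32_t best = 0;

        head[h] = pos + 1;
        chain[pos & (WIN_SIZE - 1)] = next;

        while (next != 0 && max_tries-- > 0) {
            uint32_t cand = next - 1;
            if (pos - cand >= WIN_SIZE) break;          /* candidate left the window */
            uint32_t len = 0;
            while (pos + len < end && buf[cand + len] == buf[pos + len]) len++;
            if (len > best) { best = len; *match_pos = cand; }
            next = chain[cand & (WIN_SIZE - 1)];
        }
        return best >= MIN_MATCH ? best : 0;
    }
    Lazy matching then just means calling find_match at pos+1 as well and emitting a literal when the later match turns out longer.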


  5. #5
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,009
    Thanks
    399
    Thanked 397 Times in 152 Posts
    Quote Originally Posted by Nania Francesco Antonio
    Does the new LZ77 engine use [context] and [keys] to store positions?
    Nope. The new version does not use any such tricks.

  6. #6
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    222
    Thanked 146 Times in 83 Posts
    Does the new LZPM use a ring buffer?

  7. #7
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,009
    Thanks
    399
    Thanked 397 Times in 152 Posts
    Quote Originally Posted by Nania Francesco Antonio
    Does the new LZPM use a ring buffer?
    Kind of. I use a trick to get a circular buffer.

  8. #8
    Member
    Join Date
    May 2008
    Location
    France
    Posts
    82
    Thanks
    519
    Thanked 27 Times in 19 Posts
    For me the main interest of LZPM is fast decompression. (Slightly) better compression shouldn't harm decompression speed. So my vote goes to LZ77

  9. #9
    Member
    Join Date
    Dec 2006
    Posts
    611
    Thanks
    0
    Thanked 1 Time in 1 Post
    If it's basically the same, only compressing better, decompressing faster, and using 1/100th of the memory, then I like it no matter what algorithm is involved

  10. #10
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,009
    Thanks
    399
    Thanked 397 Times in 152 Posts
    Did some more testing. Well, with optimal parsing this LZ77 should surely outperform the current LZPM. I've also made some additional experiments with LZ77.

    Check out this simple LZSS decoder:

    Code:
     
    procedure decompress;
    asm
      pushad
      cld

      lea esi,src              // ESI = compressed input
      lea edi,dst              // EDI = output buffer
      mov ecx,size             // ECX = bytes left to produce

      mov ebx,2                // tag register: forces a tag byte reload on the first pass

    @@1:
      shr ebx,1                // drop the flag bit consumed last time
      cmp ebx,1                // only the sentinel bit left?
      jne @@2
      movzx ebx,byte ptr[esi]  // load the next tag byte (8 flags)
      add esi,1
      add ebx,256              // put the sentinel bit above the 8 flags

    @@2:
      test ebx,1               // current flag: 1 = match, 0 = literal
      je @@3

      lea eax,[esi+2]          // EAX = input pointer past the 2-byte match token
      movzx esi,word ptr[esi]  // token: low 12 bits = distance-1, high 4 bits = length-3

      mov edx,ecx              // save the remaining output count
      mov ecx,esi              // keep a copy of the token for the length part

      and esi,4095             // distance-1
      add esi,1
      neg esi
      add esi,edi              // ESI = EDI - distance (match source)

      shr ecx,12               // length-3
      add ecx,3                // length = 3..18
      sub edx,ecx              // account for the copied bytes

      rep movsb                // copy the match byte by byte (overlap-safe)

      mov esi,eax              // restore the input pointer
      mov ecx,edx              // restore the remaining output count

      jmp @@4

    @@3:
      movsb                    // copy one literal
      add ecx,-1

    @@4:
      test ecx,ecx             // loop until the whole output is produced
      jne @@1

      popad
    end;
    The decoder needs no extra memory and uses no variables - everything fits in the register set. It is a standard LZSS with a 4 KB window; match/literal flags are coded using tags (one byte represents eight flags).

    I believe many things in this ASM source are done cleverly. I think this one is better than most others in this category.

    This scheme has nice compression and can compete with LZOP and similar stuff. The decompression speed should be crazy...
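    To make the format easier to follow, here is a C sketch equivalent to the decoder above, assuming the token layout implied by the asm (tag bits consumed LSB-first with 1 = match; a little-endian 16-bit match token with low 12 bits = distance-1 and high 4 bits = length-3). The real container details may differ:

    Code:
    #include <stdint.h>
    #include <stddef.h>

    void lzss_decode(const uint8_t *src, uint8_t *dst, size_t out_size) {
        size_t   left = out_size;   /* plays the role of ECX */
        unsigned tag  = 0;          /* current flag byte      */
        int      bits = 0;          /* flags left in 'tag'    */

        while (left) {
            if (bits == 0) { tag = *src++; bits = 8; }   /* fetch 8 new flags */
            if (tag & 1) {                               /* match */
                unsigned tok = src[0] | (src[1] << 8);   /* 16-bit token */
                src += 2;
                size_t dist = (tok & 4095) + 1;          /* 4 KB window */
                size_t len  = (tok >> 12) + 3;           /* lengths 3..18 */
                const uint8_t *from = dst - dist;
                left -= len;
                while (len--) *dst++ = *from++;          /* byte copy, overlap-safe */
            } else {                                     /* literal */
                *dst++ = *src++;
                left--;
            }
            tag >>= 1;
            bits--;
        }
    }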


  11. #11
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,840
    Thanks
    288
    Thanked 1,243 Times in 696 Posts
    > procedure decompress;
    > asm
    > pushad
    > cld

    You don't need CLD, as the Win32 API doesn't work with DF set.

    > lea esi,src
    > lea edi,dst
    > mov ecx,size
    >
    > mov ebx,2

    push 2
    pop ebx

    > @@1:
    > shr ebx,1
    > cmp ebx,1
    > jne @@2

    dunno why not write this using the carry flag and a zero test

    > movzx ebx,byte ptr[esi]
    > add esi,1
    > add ebx,256

    mov eax,256
    lodsb
    xchg ebx,eax

    > @@2:
    > test ebx,1
    > je @@3

    jc/jnc instead
    also you'd want fewer branches on the more frequent path

    > lea eax,[esi+2]
    > movzx esi,word ptr[esi]

    again, there's lodsw which could do that +2 too

    > mov edx,ecx
    > mov ecx,esi
    >
    > and esi,4095
    > add esi,1
    > neg esi
    > add esi,edi

    you could use a slightly different encoding
    and rewrite this like
    OR ESI,not 4095
    add esi,edi

    > shr ecx,12
    > add ecx,3
    > sub edx,ecx

    why not keep an end offset instead of recalculating
    the length every time?

    > rep movsb
    > mov esi,eax

    xchg esi,eax

    > mov ecx,edx
    >
    > jmp @@4
    >
    > @@3:
    > movsb
    > add ecx,-1

    are you sure that it's faster than DEC here?
    If DEC/INC have some faulty "feature", that doesn't
    mean you now have to avoid them completely.

    > @@4:
    > test ecx,ecx
    > jne @@1
    >
    > popad
    > end;
    >
    >
    > I believe many things in this ASM source are done cleverly. I think this one is
    > better than most others in this category.

    Well, I don't believe that
    Actually, if you upload a compressed sample of book1 or something like
    that, I'd probably be able to write some really optimized
    implementation.

  12. #12
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,009
    Thanks
    399
    Thanked 397 Times in 152 Posts
    OK, of course many things are subject to change.

    DRAFT encoder source/binary + compressed book1:
    lzss.zip

    If needed, it is possible to change the output format - i.e. different offset/match length storage.

    Most ASM optimization tutorials say that LODSx is slow and that it's better to use MOV. Whether to use INC/DEC probably depends on the context; ICL often emits ADD REG,-1.


  13. #13
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,009
    Thanks
    399
    Thanked 397 Times in 152 Posts
    Of course we should keep the final offset, say in EDX (or EBX), and check for EDI == EDX on each iteration; in this case we can free ECX and use it for REP MOVSB.
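    A C sketch of that change, based on the same assumed token layout as the earlier sketch: the end pointer is computed once and every iteration only compares pointers, so no byte counter has to be saved and restored around the copy:

    Code:
    #include <stdint.h>
    #include <stddef.h>

    void lzss_decode_endptr(const uint8_t *src, uint8_t *dst, size_t out_size) {
        uint8_t *dst_end = dst + out_size;           /* the EDX/EBX end offset */
        unsigned tag = 0;
        int bits = 0;

        while (dst < dst_end) {                      /* the EDI == EDX check */
            if (bits == 0) { tag = *src++; bits = 8; }
            if (tag & 1) {                           /* match */
                unsigned tok = src[0] | (src[1] << 8);
                src += 2;
                const uint8_t *from = dst - ((tok & 4095) + 1);
                size_t len = (tok >> 12) + 3;
                while (len--) *dst++ = *from++;      /* no remaining-count bookkeeping */
            } else {
                *dst++ = *src++;                     /* literal: no counter update */
            }
            tag >>= 1;
            bits--;
        }
    }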

  14. #14
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Before testing it, I thought this implementation could compress small files better compared with ROLZ-based compression. From my point of view, ROLZ is well suited for large files because of its tricky large dictionary. My first test looks bad for the new LZPM. Here is the result:
    Code:
    Valley.cmb (19,776,230 bytes) 
    LZPM (LZSS) -> 11,610,307 bytes 
    ZIP -> 9,887,176 bytes 
    LZPM (-9) -> 9,367,718 bytes 
    BIT 0.2 -> 9,272,426 bytes 
    WinRAR (best) -> 8,511,238 bytes 
    WinRK (ROLZ3-Max) -> 7,552,372 bytes 
    PAQ9a (-9) -> 7,528,983 bytes 
    7-Zip (LZMA-Ultra) -> 7,507,360 bytes
    My second test shocked me
    Code:
    enwik8 -> 16,262,343 bytes
    You have won the Hutter Prize!

    There is a bug, isn't there?
    BIT Archiver homepage: www.osmanturan.com

  15. #15
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,009
    Thanks
    399
    Thanked 397 Times in 152 Posts
    Don't mix up this LZSS and the future LZPM. This one is for my experimental EXE packer. No bug here - pack.exe compresses only the first 32 MB of a file. This package is mainly for Shelwien.

  16. #16
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by encode
    ...pack.exe compresses only the first 32 MB of a file...
    That explains the enwik8 result.
    I'm fully aware that this version is a kind of draft. Forgive me, but I can't see any potential compared with classical ROLZ for large files. But if you want to compress executables, the small dictionary isn't really a problem. If you do better than ROLZ, that will of course be very cool - I hope you make it. I'm looking forward to the next release. Good luck!
    BIT Archiver homepage: www.osmanturan.com

  17. #17
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,009
    Thanks
    399
    Thanked 397 Times in 152 Posts
    The main benefit of this LZSS is super fast decompression and a super small decoder. As you can see, it is written in ASM. It was also done to test the LZ77 idea.

    The new LZPM is FAR more complex, of course. New arithmetic encoder, new model for coding literals/matches, new string searching algorithm - all new...

  18. #18
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,840
    Thanks
    288
    Thanked 1,243 Times in 696 Posts
    http://shelwien.googlepages.com/lzss_sh1.rar

    Well, I'm not sure about the outcome, as I've got completely
    different results on my two machines, but keep in mind that
    I was writing this on my P4, where my new decoder is both
    faster and smaller.
    So, your original decoder was 90 bytes and my smallest one
    here (5b) is 59 bytes, and I think it's possible to push it
    even further with some changes to the compression format.

    Code:
     
                                (1)          (2) 
    dec5d   : codesize=60, time=7.031s, time=5.625s 
    dec5c:    codesize=60, time=6.468s, time=6.391s 
    dec6a:    codesize=68, time=8.953s, time=5.406s 
    original: codesize=90, time=7.859s, time=4.312s 
     
    (1) P4 3.1 
    (2) CeleronM 1.7

  19. #19
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,009
    Thanks
    399
    Thanked 397 Times in 152 Posts
    Cool!

    It confirms that we have some hardware-related dependencies here. Cool trick with "partial loop unrolling", which additionally eliminates the addition of MINMATCH. I do use this trick in my new LZPM.

    By the way, we may also use:
    Code:
     
    MOVSW 
    MOVSB 
    REP MOVSB

    Maybe code size is not an issue here - i.e. favor fast code rather than small code.

    Anyway, thank you for your feedback!

  20. #20
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,009
    Thanks
    399
    Thanked 397 Times in 152 Posts
    I heard that mixing partial registers (AL, AH, BL, ...) with full registers (EAX, EBX, ...) may cause a delay (partial register stalls). In addition, LODSB/LODSW are slow on many modern processors (better to use MOV).

    What do you think? (Question mainly to Matt Mahoney)

  21. #21
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,009
    Thanks
    399
    Thanked 397 Times in 152 Posts
    Again:

    pentopt.pdf


  22. #22
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,009
    Thanks
    399
    Thanked 397 Times in 152 Posts
    Furthermore, MOVSB and MOVSW are also not recommended, although REP MOVSD is cool in some cases.

  23. #23
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,840
    Thanks
    288
    Thanked 1,243 Times in 696 Posts
    > It confirms that we have some hardware-related dependencies here. Cool trick
    > with "partial loop unrolling", which additionally eliminates the addition of
    > MINMATCH. I do use this trick in my new LZPM.

    Well, the actual trick there is checking for the end of data only in the outer loop.
    Also, NEG(x) = NOT(x)+1, and using the fact that xchg eax,reg is single-byte
    and always fast.

    > By the way we also may use:
    >
    > MOVSW

    16-bit operations are bad in 32-bit mode (encoded with an operand-size prefix)

    > Maybe code size is not an issue here - i.e. favor fast code rather than small code.

    Smaller code was faster too, at least on the machine where I was writing it

    > I heard that mixing partial registers (AL, AH, BL, ...) with full registers
    > (EAX, EBX, ...) may cause a delay.

    That's been true since the times of the first Pentium... but it doesn't mean that you
    have to avoid using byte registers.

    > In addition, LODSB/LODSW are slow on many modern processors (better to use MOV).

    Might be really true for recent CPUs, like my CeleronM...

    Guess you have to consider unrolling the rep movsb too... or turning it into
    REP MOVSD/REP MOVSB at least.

    > Again:
    > pentopt.pdf <http://www.encode.su/downloads/pentopt.pdf>

    Again:
    http://agner.org/optimize/

    Btw, your hosting has been totally slow recently... it went at like 3 KB/s when
    I was downloading your lzss archive... both from here and from my US shell too.

    > Furthermore, MOVSB and MOVSW are also not recommended, although REP MOVSD
    > is cool in some cases.

    Well, something like that should be optimized with instruction clock measurements for
    a specific CPU... maybe also with tools like VTune.

    But first you have to find the best format

    E.g. what about using a "bit reservoir" like in mp3?
    I mean, we can encode a sequence of 32-bit fields with
    more specific cases like 1:flag+7:extra+3xLiteral and
    store these "extra" bits for later use... though it
    would be tricky to encode.

    But at least IMHO it's better to use 32-bit fields for
    match/literal flags in the current implementation... and
    invert the match offset.
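    For illustration, a small C sketch of pulling the match/literal flags from 32-bit words instead of byte tags (LSB-first order assumed, matching the decoder earlier in the thread; the struct and helper names are my own):

    Code:
    #include <stdint.h>

    typedef struct {
        uint32_t word;    /* current group of 32 flags */
        int      left;    /* flags remaining in 'word' */
    } FlagReader;

    /* Returns 1 for a match flag, 0 for a literal flag,
       refilling from the input once every 32 flags. */
    static int next_flag(FlagReader *fr, const uint8_t **src) {
        if (fr->left == 0) {
            const uint8_t *p = *src;
            fr->word = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
            fr->left = 32;
            *src += 4;
        }
        int f = (int)(fr->word & 1);
        fr->word >>= 1;
        fr->left--;
        return f;
    }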

  24. #24
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,009
    Thanks
    399
    Thanked 397 Times in 152 Posts
    Quote Originally Posted by Shelwien
    and
    invert the match offset.
    I think we should do that (to avoid doing the same work in the decoder).

    I chose a two-byte field since we can compress matches starting at three bytes long. I tested things like a three-byte field (64 KB dictionary + longer matches) on a large file, and in some cases it gives notably better compression.

    Quote Originally Posted by Shelwien
    though it
    would be tricky to encode.
    No problem with the encoder - everything should be done in favor of the fastest and simplest decoder. Again, the problem with larger fields is that we can't compress small matches, and in most cases we lose compression. The way to go should be some kind of expanding length codes:
    0..254
    255 0..65535
    ...
    In this case we can still compress small matches, and for long matches we can use a 64+ KB window.
    In addition, we now have a long enough MAXMATCH.
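    A tiny C sketch of that escape-coded length field, taking the 255-escape scheme above literally (one byte for 0..254, otherwise 255 followed by a 16-bit value); just an illustration of the idea:

    Code:
    #include <stdint.h>

    /* Write a length: one byte for 0..254, or 255 followed by a
       little-endian 16-bit value. Returns the advanced output pointer. */
    static uint8_t *put_len(uint8_t *p, uint32_t len) {
        if (len < 255) {
            *p++ = (uint8_t)len;
        } else {
            *p++ = 255;
            *p++ = (uint8_t)(len & 0xFF);
            *p++ = (uint8_t)(len >> 8);
        }
        return p;
    }

    /* Read a length written by put_len. Returns the advanced input pointer. */
    static const uint8_t *get_len(const uint8_t *p, uint32_t *len) {
        uint32_t v = *p++;
        if (v == 255) {
            v = (uint32_t)p[0] | ((uint32_t)p[1] << 8);
            p += 2;
        }
        *len = v;
        return p;
    }
    The decoder stays cheap because the common case is still a single byte; only long matches pay for the extra field.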

  25. #25
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,009
    Thanks
    399
    Thanked 397 Times in 152 Posts
    Quote Originally Posted by Shelwien
    Btw, your hosting has been totally slow recently... it went at like 3 KB/s when
    I was downloading your lzss archive... both from here and from my US shell too.
    I think I should change hosting provider. The "paid till" period ends in April. I'm 100% unsatisfied with the current one - the web statistics haven't worked for a long time, and support doesn't answer in a decent time. So, any provider recommendations are welcome. I just need about 100-200 MB of space, PHP/SQL for the forum, and SSH/SFTP for file uploading.

  26. #26
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,840
    Thanks
    288
    Thanked 1,243 Times in 696 Posts
    > > though it would be tricky to encode.
    >
    > The way to go should be some kind of expanding length codes:
    > 0..254
    > 255 0..65535
    > ...
    > In this case we can still compress small matches, and for long matches we can
    > use a 64+ KB window.
    > In addition, we now have a long enough MAXMATCH.

    Somehow I doubt that you understood what I was trying to say.
    I meant a format like this:

    1ooo oooo llll llll llll llll llll llll 3 literals
    01oo oooo ssss ssss llll llll llll llll match(15: + 2 literals
    001o oooo oooo oooo oooo oooo ssss ssss match(21:

    Here stream is split into fixed-width "frames" for i/o speed improvement,
    but a frame can contain bits for future use. Of course we'd need to select
    really useful frame types instead of that random example.

    Also, mp3 already uses this technique - even in VBR mode there are only
    a few possible frame lengths, so the unused part of a frame is filled with
    data for later frames.

  27. #27
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,840
    Thanks
    288
    Thanked 1,243 Times in 696 Posts
    >> Btw, your hosting has been totally slow recently... it went at like 3 KB/s when
    >> I was downloading your lzss archive... both from here and from my US shell too.
    >
    > I think I should change hosting provider. The "paid till" period ends in April. I'm
    > 100% unsatisfied with the current one - the web statistics haven't worked for a long
    > time, and support doesn't answer in a decent time. So, any provider recommendations
    > are welcome. I just need about 100-200 MB of space, PHP/SQL for the forum, and
    > SSH/SFTP for file uploading.

    Well, I'm using dreamhost for historical reasons. The services are mostly
    good, but they had their downtime issues... I hope that's over
    already. So basically my opinion is that you should try any major
    US hoster you'd like... keeping an English forum on Russian hosting
    is completely unreasonable.

  28. #28
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,009
    Thanks
    399
    Thanked 397 Times in 152 Posts
    I understood. We just need to come up with a really good frame design.

    How much do we lose coding each literal? The current scheme adds one bit for each literal. A one-byte flag for every three bytes is redundant. However, we could encode a literal count. I just can't currently see any benefit in terms of compression ratio. It should be faster, but compression...

    Another idea:
    0xxxxxxx - literal runs - 1..128 literals
    1xxxxxxx yyyyyyyy yyyyyyyy - match, match length, match offset
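    A decoder sketch for that control-byte layout; the exact field split of the match token (7-bit length field, 16-bit offset, minimum match of 3) is my own guess at one plausible reading, not a fixed spec:

    Code:
    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    #define MIN_MATCH 3

    /* 0xxxxxxx                   -> copy x+1 literals (1..128)
       1xxxxxxx yyyyyyyy yyyyyyyy -> match: length = x + MIN_MATCH, 16-bit offset y */
    void decode_runs(const uint8_t *src, uint8_t *dst, size_t out_size) {
        uint8_t *end = dst + out_size;
        while (dst < end) {
            unsigned c = *src++;
            if (c < 0x80) {                       /* literal run of c+1 bytes */
                size_t n = c + 1;
                memcpy(dst, src, n);
                dst += n;
                src += n;
            } else {                              /* match */
                size_t len  = (c & 0x7F) + MIN_MATCH;
                size_t dist = src[0] | (src[1] << 8);
                src += 2;
                const uint8_t *from = dst - dist;
                while (len--) *dst++ = *from++;   /* byte copy handles overlap */
            }
        }
    }
    The nice property is that a literal run becomes a single memcpy instead of one flag test per byte.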

  29. #29
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,009
    Thanks
    399
    Thanked 397 Times in 152 Posts
    Quote Originally Posted by Shelwien
    So basically my opinion is that you should try any major
    US hoster you'd like... keeping an English forum on Russian hosting
    is completely unreasonable.
    The only reason is the money transfer issue. I have no bank account; with Russian hosters it's easy to transfer money via SBERBANK.

  30. #30
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,840
    Thanks
    288
    Thanked 1,243 Times in 696 Posts
    I believe that shouldn't be a problem nowadays...
    Getting a Visa (or MasterCard) internet card would take an hour and $2 at any commercial bank.
