View Poll Results: What do you think about my ppmx v0.04?

Voters: 33
  • It's great! I love it! And it works really fine for me! (9 votes, 27.27%)
  • Very promising compressor! Will wait for future improvements... (23 votes, 69.70%)
  • Nothing special, I've expected something more! (1 vote, 3.03%)
Page 2 of 3 (Results 31 to 60 of 63)

Thread: ppmx v0.04 is here!

  1. #31
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by m^2 View Post
    Going slightly over the available memory is only a slight problem; there are usually a lot of things used only once that can be swapped out without any problem. I found that I can push the commit charge to about 1100 MB and it's OK, so for me 600 MB is fine.
    A big memory requirement is not only a resource problem. It also slows down the process due to the increased access area. You can observe this yourself by checking the different profiles of BIT: the only difference between them is memory usage, and you can see that it changes things. For me, ideally a compressor should use at most 128 MiB. If it's a really compression-oriented compressor, it can use 256 MiB, but not more than that. 32-64 MiB of memory usage is very practical, since it can then be implemented on small devices too (such as PDAs etc.).

    Quote Originally Posted by m^2 View Post
    IIRC BIT used like 680 MB really and it was too much.
    It shouldn't be quite at that level, since it uses a 512 MiB context model + a 128 MiB match model window (= 640 MiB). I don't think the other small models require ~40 MiB. Anyway, it won't be a problem in the next release, since I'm making it very light.

    Quote Originally Posted by m^2 View Post
    And Vista is unusably slow with 1 GB, so I doubt this configuration is really something to think of.
    I use 2 GiB with Vista x64 and I frequently face disk thrashing due to insufficient physical memory.
    BIT Archiver homepage: www.osmanturan.com

  2. #32
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts

    Thumbs up

    Quote Originally Posted by encode View Post
    Check out the non-public version of PPMX v0.04a (not outside this forum; it should not be placed on any benchmark site). It's quite a DRAFT version with the all-new core; the model is heavier and slower, and it still has no match model. Anyway, it's just for fun!

    P.S.
    Maybe I should continue improving the "classic" PPM scheme as v0.04...
    Thanks Ilia!

  3. #33
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,017
    Thanks
    406
    Thanked 404 Times in 154 Posts
    Returned to working on byte-wise PPM:
    calgary.tar -> 769 KB
    It's faster than the original PPMX v0.04... Memory usage is the same.

    So I guess PPMX v0.05 will be byte oriented.

  4. #34
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,017
    Thanks
    406
    Thanked 404 Times in 154 Posts
    The new version introduces a new symbol removal technique. The biggest improvement is on binary data:

    TraktorDJStudio3.exe (29,124,024 bytes)
    PPMX v0.04: 6,219,062 bytes
    PPMX v0.05: 5,785,032 bytes

    Reaktor.exe (14,446,592 bytes)
    PPMX v0.04: 2,322,470 bytes
    PPMX v0.05: 2,171,729 bytes

    test.exe (7,919,616 bytes)
    PPMX v0.04: 1,206,092 bytes
    PPMX v0.05: 910,656 bytes

    With higher speed and the same memory usage...

  5. #35
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,017
    Thanks
    406
    Thanked 404 Times in 154 Posts
    Just created my own brute-force optimizer...
    Now it's running with PPMX!

  6. #36
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,017
    Thanks
    406
    Thanked 404 Times in 154 Posts
    The optimizer has been running for two days...

    Now I'm optimizing a hash PRIME (8000 random primes + 8000 random numbers).


  7. #37
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    True brute-force optimization can take a billion years. Why don't you try simple bit flipping? Or more advanced things such as MATLAB integration, as I did, or m1's optimizer? If you want a quick 'n' dirty approach, here is the bit-flipping technique:

    Code:
    1-Pick a random parameter
    2-Toggle a random bit within the selected parameter
    3-Test with the modified parameter
    4-If there is an improvement, keep the flipped bit and go to step 1
    5-If not, undo the bit flip
    You will get some useful results in less time. But keep in mind that it's not all that useful either, because it's memoryless. At the very least you could use just the RHC part of m1. It's not too hard to understand.
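    Here is a minimal sketch of that bit-flipping hill climber in C++, assuming the tunable parameters sit in a vector and that a hypothetical compressed_size() call runs the compressor and returns the output size:
    Code:
    #include <cstdint>
    #include <cstdlib>
    #include <vector>

    // Hypothetical fitness function: runs the compressor with the given
    // parameters and returns the compressed output size in bytes.
    uint64_t compressed_size(const std::vector<uint32_t>& params);

    void bit_flip_optimize(std::vector<uint32_t>& params, int iterations) {
        uint64_t best = compressed_size(params);
        for (int i = 0; i < iterations; ++i) {
            size_t p = std::rand() % params.size();    // 1 - pick a random parameter
            uint32_t bit = 1u << (std::rand() % 32);   // 2 - pick a random bit and toggle it
            params[p] ^= bit;
            uint64_t size = compressed_size(params);   // 3 - test the modified parameter
            if (size < best)
                best = size;                           // 4 - improvement: keep the flip
            else
                params[p] ^= bit;                      // 5 - otherwise undo the flip
        }
    }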
    BIT Archiver homepage: www.osmanturan.com

  8. #38
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,017
    Thanks
    406
    Thanked 404 Times in 154 Posts
    PPMX doesn't have that many parameters, so brute force is really OK!

  9. #39
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,017
    Thanks
    406
    Thanked 404 Times in 154 Posts
    Just for info, the top numbers found:
    // 920101050n
    // 269283107p
    // 717285271p
    // 204058009p
    // 191612529n
    // 678666983p
    // 996276289p
    // 492515714n
    // 314279428n
    // 877751740n

  10. #40
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by encode View Post
    PPMX doesn't have that many parameters, so brute force is really OK!
    Actually, even a simple compressor has too many parameters to optimize - just think of some coefficient as "x1". At least you could optimize the RC precision, if you still insist that it doesn't have too many parameters.
    Last edited by osmanturan; 11th January 2009 at 21:22. Reason: Spelling...
    BIT Archiver homepage: www.osmanturan.com

  11. #41
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,017
    Thanks
    406
    Thanked 404 Times in 154 Posts
    Quote Originally Posted by osmanturan View Post
    Actually, even a simple compressor has too many parameters to optimize - just think of some coefficient as "x1". At least you could optimize the RC precision, if you still insist that it doesn't have too many parameters.
    I hope you'll do that with your BIT!

  12. #42
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by encode View Post
    I hope you'll do that with your BIT!
    At least I had tried the RC precision.

    BTW, you can try to optimize your escape strategy, maximal order, etc. You can extend the list in a short time.
    BIT Archiver homepage: www.osmanturan.com

  13. #43
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,017
    Thanks
    406
    Thanked 404 Times in 154 Posts
    Well, now I can say for sure that with multiplicative hashing, PRIMES are no more efficient than regular numbers. Furthermore, the best value found so far is not a prime number!

  14. #44
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by encode View Post
    Well, now I can say for sure that with multiplicative hashing, PRIMES are no more efficient than regular numbers. Furthermore, the best value found so far is not a prime number!
    Yep! It's as expected. Hashing coefficients have a side effect of implicit context merging, and they don't need to have an obfuscation property. So there is no reason to insist on prime numbers.
    BIT Archiver homepage: www.osmanturan.com

  15. #45
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,572
    Thanks
    783
    Thanked 687 Times in 372 Posts
    Quote Originally Posted by osmanturan View Post
    Code:
    1-Pick a random parameter
    2-Toggle a random bit within the selected parameter
    3-Test with the modified parameter
    4-If there is an improvement, keep the flipped bit and go to step 1
    5-If not, undo the bit flip
    invented by Nature

  16. #46
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    invented by Nature
    Yep. It's a GA in a way, but not a complete one. At least bit-flipping is memoryless.
    BIT Archiver homepage: www.osmanturan.com

  17. #47
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    It is not really a GA, since the population size is 1 and it lacks crossover. It's simple hill climbing using the Hamming distance as a neighbourhood measure (improve by testing "similar" samples nearby).

  18. #48
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,017
    Thanks
    406
    Thanked 404 Times in 154 Posts
    test.exe (7,919,616 bytes):

    WinRAR 3.80, Best: 1,185,700 bytes
    7-Zip 4.64, Ultra: 997,681 bytes
    PPMX v0.05: 897,311 bytes


  19. #49
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    926
    Thanks
    58
    Thanked 116 Times in 93 Posts
    When do we get to try ppmx v0.05?

  20. #50
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
    Quote Originally Posted by osmanturan View Post
    Yep! It's as expected. Hashing coefficients have a side effect of implicit context merging, and they don't need to have an obfuscation property. So there is no reason to insist on prime numbers.
    It depends. If your 32-bit hash is:

    Code:
    hash = hash * (m << s) + next_byte;
    then m can be any odd number and you have an order ceil(32/s) hash. But if you compute several hashes that share a hash table, then you want the multipliers to be relatively prime to reduce collisions. The easy way to do that is to choose a different prime m for each hash.
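    A minimal sketch of such a rolling hash; the multiplier, the shift s = 4, and the top-bits table lookup are illustrative assumptions, not values from PPMX or BIT:
    Code:
    #include <cstdint>

    const uint32_t m = 0x9E3779B1;  // any odd multiplier works for a single table
    const int      s = 4;           // gives an order ceil(32/4) = order-8 hash

    // Each update shifts the older bytes up by s bits (times an odd factor),
    // so only the last 8 bytes still influence the 32-bit hash value.
    uint32_t update_hash(uint32_t hash, uint8_t next_byte) {
        return hash * (m << s) + next_byte;
    }

    // Typical table lookup: take the top bits, which are mixed the best.
    uint32_t table_index(uint32_t hash, int table_bits) {
        return hash >> (32 - table_bits);
    }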

  21. #51
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,017
    Thanks
    406
    Thanked 404 Times in 154 Posts
    I use:
    Code:
      void Update(int c) {
        // Walk from the highest order down so hash[i-1] is read before it is
        // overwritten; each hash[i] mixes the new byte into the previous
        // lower-order hash.
        for (int i=N; i>0; i--)
          hash[i]=(hash[i-1]^c)*890577147;
      }

  22. #52
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,017
    Thanks
    406
    Thanked 404 Times in 154 Posts
    Quote Originally Posted by SvenBent View Post
    When do we get to try ppmx v0.05?
    I hope very soon! Still testing some ideas... So, within a week!

  23. #53
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Matt Mahoney View Post
    It depends. If your 32-bit hash is:

    Code:
    hash = hash * (m << s) + next_byte;
    then m can be any odd number and you have an order ceil(32/s) hash. But if you compute several hashes that share a hash table, then you want the multipliers to be relatively prime to reduce collisions. The easy way to do that is to choose a different prime m for each hash.
    However, my approach, which was used in LWCX, is slightly different (the same as Toffer's and Shelwien's, though):
    Code:
    hash=(context*C)>>32; // step 1
    (...)
    hash=hash^(last_nibble_context*N); // step 2
    I've optimized those C and N coefficients via GA+RHC, and I found these coefficients:
    Code:
    HASH_N0=0x4e1d74b5 (1310553269 = 43 x 821 x 37123)
    HASH_N1=0x11382b3f (288893759 = 7 x 7 x 11 x 607 x 883)
    HASH_N2=0xf05c8f6e (4032597870 = 2 x 3 x 3 x 5 x 7 x 353 x 18133)
    HASH_N3=0xbdbc7a0b (3183245835 = 3 x 5 x 7 x 11 x 17 x 223 x 727)
    HASH_C0=0x7b148525018bcc9e (8868859960185572510 = 2 x 5 x 5821 x 2653429 x 57419939)
    HASH_C1=0x7307540b9da02834 (8288686048064579636 = 2 x 2 x 1021 x etc)
    HASH_C2=0x215433d3cdc27541 (2401601486078506305 = 3 x 5 x 11 x 2087 x etc)
    HASH_C3=0x6e14c071a725dfbf (7932176438074400703 = 3 x 40129 x etc)
    As you can see, they are not primes. Note that each nibble has its own hash table; in other words, there are two hash tables in total, but the 4 context models share the same hash table.

    What I meant in the previous post is that collisions sometimes have a positive effect on compression because of implicit context merging. So trying to reduce collisions is not always a good idea.
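    A minimal sketch of that two-step lookup; the 64-bit context type, the cast, and the final reduction to a slot index are illustrative assumptions, not LWCX's actual code:
    Code:
    #include <cstdint>

    // Step 1: multiplicative hash of the masked, byte-aligned context.
    // Step 2: mix in the current nibble context so the two nibble positions
    // of a byte are separated. C and N stand for the optimized coefficients.
    uint32_t context_hash(uint64_t masked_context, uint64_t C,
                          uint32_t last_nibble_context, uint32_t N,
                          int table_bits) {
        uint32_t hash = (uint32_t)((masked_context * C) >> 32);  // step 1
        hash ^= last_nibble_context * N;                         // step 2
        return hash & ((1u << table_bits) - 1);                  // reduce to a slot index (assumed)
    }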
    Last edited by osmanturan; 14th January 2009 at 23:54. Reason: Spelling + Fixing code piece...
    BIT Archiver homepage: www.osmanturan.com

  24. #54
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    It would be interesting to see the actual bit masks (primary hash, first nibble) along with the context masks... What about your match model?

  25. #55
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    The context masks are still the same as in 0.7. Context mask optimization is still waiting because I'm busy (final exams, some overlapping jobs, and a good data detection filter and structure haven't been implemented yet). The context masks are:
    Code:
    MODEL_MASK0 = 0x000000000000FFFF (order-2)
    MODEL_MASK1 = 0x0000000000FFFFFF (order-3)
    MODEL_MASK2 = 0x00000000FFFFFFFF (order-4)
    MODEL_MASK3 = 0x0000FFFFFFFFFFFF (order-6)
    The initial hashes are all zero. I didn't optimize them, but I may try. Thanks.

    The match model is a bit different from 0.7. At least it doesn't use collision handling. A position is stored as a pair in the hash table: a pointer and an extra hash key for byte boundaries. Also, it's only activated when matchLength > 5. I intentionally optimized that threshold and found "5", as I expected (because of the order-6 handling).
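    A rough sketch of such a match table slot and lookup; the field layout and the verification step are illustrative assumptions, not BIT's actual structures:
    Code:
    #include <cstdint>
    #include <vector>

    // One slot per hashed context: the last position it was seen at, plus an
    // extra hash key of the byte-aligned context so a hit can be verified
    // without any collision chains.
    struct MatchSlot {
        uint32_t position;
        uint32_t check_key;
    };

    struct MatchTable {
        std::vector<MatchSlot> slots;
        explicit MatchTable(int bits) : slots(size_t(1) << bits) {}

        // Returns the stored position if the check key matches (-1 otherwise),
        // then records the current position for this context.
        int64_t find_and_update(uint32_t hash, uint32_t key, uint32_t cur_pos) {
            MatchSlot& s = slots[hash & (slots.size() - 1)];
            int64_t hit = (s.check_key == key) ? (int64_t)s.position : -1;
            s = { cur_pos, key };
            return hit;
        }
    };
    The match model would then follow the window from the returned position and only contribute predictions once the running matchLength exceeds 5.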
    BIT Archiver homepage: www.osmanturan.com

  26. #56
    Programmer toffer's Avatar
    Join Date
    May 2008
    Location
    Erfurt, Germany
    Posts
    587
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I meant showing the context masks in binary and the hashing coefficients in binary, to see if there's some correlation. To me the hashing coefficients didn't look that random; I just wanted to see if it's the same with yours.

  27. #57
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Sorry for the confusion. Here it is in binary form:
    Code:
    MODEL_MASK0 = 0000000000000000000000000000000000000000000000001111111111111111
        HASH_C0 = 0111101100010100100001010010010100000001100010111100110010011110
    
    MODEL_MASK1 = 0000000000000000000000000000000000000000111111111111111111111111
        HASH_C1 = 0111001100000111010101000000101110011101101000000010100000110100
    
    MODEL_MASK2 = 0000000000000000000000000000000011111111111111111111111111111111
        HASH_C2 = 0010000101010100001100111101001111001101110000100111010101000001
    
    MODEL_MASK3 = 0000000000000000111111111111111111111111111111111111111111111111
        HASH_C3 = 0110111000010100110000000111000110100111001001011101111110111111
    I believe there should be a correlation, because the hash is a kind of pointer to the context itself (if the coefficients are optimized, of course).
    BIT Archiver homepage: www.osmanturan.com

  28. #58
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,017
    Thanks
    406
    Thanked 404 Times in 154 Posts
    Quote Originally Posted by osmanturan View Post
    What I meant in the previous post is that collisions sometimes have a positive effect on compression because of implicit context merging. So trying to reduce collisions is not always a good idea.
    Who told you that I'm reducing collisions? Everything was optimized for the smallest compressed size!

  29. #59
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by encode View Post
    Who told you that I'm reducing collisions? Everything was optimized for the smallest compressed size!
    There is a misunderstanding. I meant that most people believe "prime hash coefficients = fewer collisions = best compression ratio". But for me this is not true, because there is at least an implicit context merging mechanism via collisions. So I wouldn't expect a true optimization to end up with prime hash coefficients, and both your results and mine already support that idea.
    BIT Archiver homepage: www.osmanturan.com

  30. #60
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
    But if you want some collisions, then masking out context bits is the way to do it. For example, mask bit 5 in text (upper/lower case) and low bits in images and audio. It's hard to do that by choosing magic multipliers.
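    For instance, a minimal sketch of that kind of deliberate masking (the masks are illustrative, not taken from any particular compressor):
    Code:
    #include <cstdint>

    // Text: clear bit 5 so upper and lower case letters fall into the same
    // context ('A' = 0x41 and 'a' = 0x61 differ only in bit 5).
    uint8_t mask_text_byte(uint8_t c)   { return c & ~0x20; }

    // Images/audio: drop the noisy low bits so nearby sample values share a
    // context slot.
    uint8_t mask_sample_byte(uint8_t s) { return s & 0xF0; }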


