
Thread: Paq8pxd dict

  1. #931
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,946
    Thanks
    294
    Thanked 1,286 Times in 728 Posts
    > I am trying to learn some by modificating sources

    There're some more parameters actually.
    First there's the tree cutting flag (1=cut some branches to free memory, 0=reset the whole tree)

    And this line: https://github.com/Shelwien/ppmd_sh/..._flush.inc#L90
    controls how much memory is freed (>>2 followed by *3 gives 3/4, but it can be any number).

    > if use more memory shouldn't I have more compression?

    As I said, unlike CM with hashtable, PPM doesn't always use all the allocated memory.
    Basically the memory is only given to internal memory manager.
    1680M is enough to process most of enwik8 at o16, so there's no benefit in giving it more memory.
    However, it's possible to specify a higher order; for example, o32 needs 2648M and o48 needs 2925M.

  2. #932
    Member
    Join Date
    Sep 2014
    Location
    Italy
    Posts
    90
    Thanks
    97
    Thanked 35 Times in 22 Posts
    Thank you for the explanation!
    Luca

  3. #933
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    1,007
    Thanks
    97
    Thanked 401 Times in 279 Posts
    Quote Originally Posted by Sportman View Post
    enwik8 - 16,372,331 bytes, 2826.75 sec., 64bit -10
    Quote Originally Posted by Sportman View Post
    Enwik8:
    16,361,221 bytes, 3,084.091 sec., paq8pxd_v12_biondivers1_x64 -11
    Intel single core and memory speed progress in almost 6 years:

    enwik8:
    16,372,331 bytes, 1,991.736 sec., paq8pxd_v12-skbuild-1-win64 -10
    16,361,221 bytes, 2,203.392 sec., paq8pxd_v12_biondivers1_x64 -11

  4. Thanks (2):

    Darek (20th June 2020),LucaBiondi (20th June 2020)

  5. #934
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,195
    Thanks
    722
    Thanked 481 Times in 371 Posts
    A 40% improvement in 6 years means a CAGR of about 7%... in single-thread performance.
    Let's hope the next CPU architectures will improve it faster.
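    The rate above can be checked with a short calculation. This is a minimal C++ sketch, assuming the speed-up is taken as the ratio of the old run time to the new one from Sportman's figures. The function name `annualSpeedup` is made up for illustration, and the result depends on how many years are assumed.

```cpp
#include <cassert>
#include <cmath>

// Compound annual growth rate of speed, given the run time of the same job
// on the old and the new machine and the number of years between the runs.
// Hypothetical helper for illustration; not part of paq8pxd.
double annualSpeedup(double oldSeconds, double newSeconds, double years) {
    assert(oldSeconds > 0 && newSeconds > 0 && years > 0);
    const double speedRatio = oldSeconds / newSeconds;  // >1 means faster now
    return std::pow(speedRatio, 1.0 / years) - 1.0;
}
```

    With the -10 timings (2826.75 s then, 1991.736 s now), `annualSpeedup(2826.75, 1991.736, 6.0)` comes out at roughly 6% per year, and the same call with 5 years gives roughly 7%.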

  6. #935
    Member
    Join Date
    Sep 2014
    Location
    Italy
    Posts
    90
    Thanks
    97
    Thanked 35 Times in 22 Posts
    Quote Originally Posted by Sportman View Post
    Intel single core and memory speed progress in almost 6 years:

    enwik8:
    16,372,331 bytes, 1,991.736 sec., paq8pxd_v12-skbuild-1-win64 -10
    16,361,221 bytes, 2,203.392 sec., paq8pxd_v12_biondivers1_x64 -11
    Hi Sportman... it has been so many years since I last saw my little biondivers version of paq8px!

  7. #936
    Member
    Join Date
    Sep 2014
    Location
    Italy
    Posts
    90
    Thanks
    97
    Thanked 35 Times in 22 Posts
    Yes, it would be nice to have a small procedure inside paq8pxd that runs as a benchmark. It could be useful for comparing various CPUs... maybe in the future...

  8. #937
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,195
    Thanks
    722
    Thanked 481 Times in 371 Posts
    Quote Originally Posted by LucaBiondi View Post
    Yes, it would be nice to have a small procedure inside paq8pxd that runs as a benchmark. It could be useful for comparing various CPUs... maybe in the future...
    I regret that I didn't start measuring the times at the beginning of my benchmark in 1996... as I remember, the whole set was compressed in a couple of hours. Of course the compression programs were different (less memory, faster, etc.) but...

    Regarding the procedure: enwik8 now looks like the best example of a test file (on an older CPU it could take a couple of hours), however different versions and builds have different speeds, and the "difference is different on different CPUs" (AVX/SSE usage, etc.). Maybe selected build(s) could be used to test it and gather the data?

    A couple of years ago Piotr Tarsa published his compression database site: https://tarsa.github.io/lossless-benchmark/
    Maybe this could be a core/source to build such a benchmark?

  9. Thanks:

    LucaBiondi (25th June 2020)

  10. #938
    Member
    Join Date
    Sep 2014
    Location
    Italy
    Posts
    90
    Thanks
    97
    Thanked 35 Times in 22 Posts
    Comparison between

    PAQ8PXD_V89 -X12 standard vs. PAQ8PXD_V89 -X12 modified with:
    ppmd_12_256_1.Init(40,3360,1,0);
    [Attached image: paq8pxd_v89_o40.png]

    Using PPMd with o=40 and 3360 MB in total, I gained only 7,3 KB.
    P.S. enwik8 reaches 15.816.569 bytes!

    Ask me for the exe if you want it.

    Bye,
    Luca

    Shelwien, what do you think about it? Did you expect more gain?
    [Attached thumbnail: image.png]
    Last edited by LucaBiondi; 26th June 2020 at 11:48.

  11. Thanks:

    Darek (2nd July 2020)

  12. #939
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,195
    Thanks
    722
    Thanked 481 Times in 371 Posts
    @LucaBiondi - could you attach the exe file of the modified paq8pxd_v89?
    Regarding the benchmark procedure: it is a good idea in my opinion, but the same benchmark file should always be used for testing. Maybe one generated procedurally by the program before the test starts?

    @Kaitz - I have an idea, but maybe it is silly or not doable. Is it possible to use some sort of very light compression of the program's memory during use?
    As I understand it, the majority of the memory is used for trees or other data structures.
    Is it possible to keep that data lightly compressed, which would virtually simulate using more memory?
    I think there could still be room for improvement on the biggest files (enwik8/9) if we could use more memory, but maybe there is no need for more physical memory if a trick like this were used instead?
    Of course it would be more time-consuming, but maybe it could be worth it...
    Last edited by Darek; 2nd July 2020 at 16:25.

  13. Thanks:

    LucaBiondi (2nd July 2020)

  14. #940
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    510
    Thanks
    208
    Thanked 348 Times in 185 Posts
    I can't see how this can be done.
    The latest version uses less memory, 5 GB or less (max option), I can't remember. And the size difference for enwik9 is 50 KB (worse). So it is not only about memory.
    There are no new(+) contexts in the wordmodel since v80, only preprocessing for enwik.

    I will probably work on this next in Feb.

    edit:
    +RC,modSSE
    KZo


  15. #941
    Member
    Join Date
    Sep 2014
    Location
    Italy
    Posts
    90
    Thanks
    97
    Thanked 35 Times in 22 Posts

    Paq8pxd_v89_40_3360 executable

    Hi Darek, this is the executable.
    Happy testing!

    paq8pxd_v89_40_3360.zip

    Luca

  16. Thanks (2):

    Darek (2nd July 2020),paleski (5th July 2020)

  17. #942
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,195
    Thanks
    722
    Thanked 481 Times in 371 Posts
    Scores of paq8pxd_v89_40_3360 on my testset. Overall, 640 bytes of improvement. It's always something!
    [Attached image: paq8pxd_v89_Luca.jpg]

  18. Thanks:

    LucaBiondi (3rd July 2020)

  19. #943
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,946
    Thanks
    294
    Thanked 1,286 Times in 728 Posts
    > Shelwien, what do you think about? Did you expect more gain?

    Not really, in PPMd these parameters have little effect on actual statistics and predictions.
    Like, o40 is only relevant for symbols after 40-byte repeated prefix, and m3360 vs m1360
    is only relevant for files longer than 90m... maybe test it on enwik9?
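
    To get a feeling for how rarely a high order matters, one could count the positions whose preceding N bytes already occurred earlier in the file. This is a rough sketch, not taken from ppmd_sh: `countRepeatedContexts` is a hypothetical helper, it needs C++17, and storing every context in a hash set is only practical for small inputs.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <unordered_set>

// Counts the positions i for which the `order` bytes just before data[i]
// have already been seen as a context earlier in the buffer.
// Hypothetical helper for illustration; not how PPMd tracks contexts.
std::size_t countRepeatedContexts(std::string_view data, std::size_t order) {
    assert(order > 0);
    std::unordered_set<std::string_view> seen;  // views point into `data`
    std::size_t hits = 0;
    for (std::size_t i = order; i < data.size(); ++i) {
        const std::string_view context = data.substr(i - order, order);
        // insert() reports through .second whether the context was new.
        if (!seen.insert(context).second) ++hits;
    }
    return hits;
}
```

    Comparing the count for order 16 against order 40 on a sample of the input would give a rough idea of how many symbols can be affected by the higher order at all.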

    However isn't it still better to use tested parameter profiles rather than something random?
    Btw, there're still other options - like an instance with something like o6 m1 r0,
    or an instance updated with filtered byte values (upcased, or with all non-letters replaced with space).

    Also it could be interesting to see where ppmd actually provides benefits,
    ie in which contexts it beats paq model.

  20. #944
    Member
    Join Date
    Sep 2014
    Location
    Italy
    Posts
    90
    Thanks
    97
    Thanked 35 Times in 22 Posts
    Hi Shelwien,
    Thank you for your explanation!
    I am trying to learn how paq8px/pxd works but, you know, it is not easy at all!
    I am a Delphi developer and I know only a little C.
    But... I am trying, and my goal is to learn.

    What do you mean exactly by "instance updated with filtered byte values"?
    If you have some time, please try to explain it to me, and if I am able to, I will do it.

    Luca


    Teach a developer and you will have a colleague ....

  21. #945
    Member
    Join Date
    Sep 2014
    Location
    Italy
    Posts
    90
    Thanks
    97
    Thanked 35 Times in 22 Posts
    Thank you Darek!!!

  22. #946
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,946
    Thanks
    294
    Thanked 1,286 Times in 728 Posts
    > I am trying to learn how paq8px/pxd works but you know is not easy at all!

    Did you read DCE? http://mattmahoney.net/dc/dce.html

    > I am a delphi developer and i know c only a little.

    Fortunately paq doesn't use that much of mainstream C++.
    There're some tools like this: https://github.com/WouterVanNifterick/C-To-Delphi

    > What do you mean exactly with "instance updated with filtered byte values"

    mod_ppmd.inc has this function (at the end):
    U32 ppmd_Predict( U32 SCALE, U32 y ) {
      if( cxt==0 ) cxt=1; else cxt+=cxt+y;
      if( cxt>=256 ) ppmd_UpdateByte( U8(cxt) ), cxt=1;
      if( cxt==1 ) ppmd_PrepareByte();
      U32 p = U64(U64(SCALE-2)*trF[cxt])/trT[cxt]+1;
      return p;
    }

    Unlike paq's main model, it does update for a whole byte at once,
    so we can change it to something like ppmd_UpdateByte( Map[U8(cxt)] )
    which could provide effects similar to preprocessing.
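
    As an illustration of the Map idea, here is a minimal sketch of two such tables, plus a bitwise accumulator that mirrors the cxt logic above. Everything here is hypothetical: `makeUpcaseMap`, `makeLettersOnlyMap` and `BitToByte` are made-up names, the tables only handle ASCII letters, and the real ppmd_UpdateByte call is replaced by simply returning the mapped byte.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using ByteMap = std::array<std::uint8_t, 256>;

// Variant 1: fold ASCII lower case to upper case, leave other bytes alone.
ByteMap makeUpcaseMap() {
    ByteMap m{};
    for (int c = 0; c < 256; ++c)
        m[c] = std::uint8_t((c >= 'a' && c <= 'z') ? c - 'a' + 'A' : c);
    return m;
}

// Variant 2: keep ASCII letters, replace every other byte with a space.
ByteMap makeLettersOnlyMap() {
    ByteMap m{};
    for (int c = 0; c < 256; ++c) {
        const bool letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
        m[c] = std::uint8_t(letter ? c : ' ');
    }
    return m;
}

// Collects bits MSB-first behind a leading 1 marker, like cxt in ppmd_Predict.
// push() returns the mapped byte once 8 bits have arrived, otherwise -1.
struct BitToByte {
    const ByteMap& map;
    std::uint32_t cxt = 1;
    int push(int bit) {
        assert(bit == 0 || bit == 1);
        cxt += cxt + std::uint32_t(bit);
        if (cxt < 256) return -1;                   // byte not complete yet
        const int mapped = map[std::uint8_t(cxt)];  // stand-in for Map[U8(cxt)]
        cxt = 1;                                    // start the next byte
        return mapped;
    }
};
```

    In the real code the mapped value would be passed to ppmd_UpdateByte, so the PPMd instance would collect its statistics over the filtered alphabet while the coder still sees the original bits.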

  23. Thanks:

    LucaBiondi (3rd July 2020)

  24. #947
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,195
    Thanks
    722
    Thanked 481 Times in 371 Posts
    Scores for 4 corpuses by paq8pxd_v89_40_3360. This version gets better scores for all corpuses.
    For Silesia, Calgary and MaximumCompression it sets the paq8pxd records; moreover, the MaximumCompression tar version at 5'991'491 bytes is, in my opinion, the best score ever!

    For Silesia there are 15KB of gain - nice!

    There is another thing worth mentioning: for the "nci" file from Silesia this version got the best score ever, beating cmix v18!
    The same is the case for A10.jpg, FP.LOG and vcfiu.hlp from Maximum Compression!
    [Attached image: paq8pxd_v89_Luca_4_Corpuses.jpg]

  25. #948
    Member
    Join Date
    Sep 2014
    Location
    Italy
    Posts
    90
    Thanks
    97
    Thanked 35 Times in 22 Posts
    Hi Darek, are you able to test enwik9?
    I can't because it is too big for me!
    Thank you!!!
    I will try to do some other experiments!!!!

  26. #949
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,195
    Thanks
    722
    Thanked 481 Times in 371 Posts
    I'll test it. Starting from enwik8.

  27. #950
    Member
    Join Date
    Sep 2014
    Location
    Italy
    Posts
    90
    Thanks
    97
    Thanked 35 Times in 22 Posts
    Great!!!
    Have a good day!

  28. #951
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,195
    Thanks
    722
    Thanked 481 Times in 371 Posts
    15'728'903 - enwik8 -s15 -w -e1,english.dic by Paq8pxd_v89, change: -0,02%
    15'728'903 - enwik8 -s15 -w -e1,english.dic by paq8pxd_v89_40_3360, change: 0,00%

    Hmmm... identical scores!
    Testing -x15...

  29. #952
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,946
    Thanks
    294
    Thanked 1,286 Times in 728 Posts
    @Darek: -s doesn't use mod_ppmd

  30. Thanks:

    Darek (3rd July 2020)

  31. #953
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,195
    Thanks
    722
    Thanked 481 Times in 371 Posts
    OK, then there are no changes...

    15'655'526 - enwik8 -x15 -w -e1,english.dic by Paq8pxd_v89, change: -0,01%
    15'654'147 - enwik8 -x15 -w -e1,english.dic by paq8pxd_v89_40_3360, change: -0,01%

    Extrapolating, that looks like about 14 KB of gain for enwik9...

  32. #954
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,195
    Thanks
    722
    Thanked 481 Times in 371 Posts
    enwik scores for paq8pxd v89 Luca's modification:

    15'728'903 - enwik8 -s15 -w -e1,english.dic by Paq8pxd_v89
    15'655'526 - enwik8 -x15 -w -e1,english.dic by Paq8pxd_v89
    123'301'984 - enwik9 -x15 -w -e1,english.dic by Paq8pxd_v89

    15'728'903 - enwik8 -s15 -w -e1,english.dic by paq8pxd_v89_40_3360, change: 0,00%
    15'654'147 - enwik8 -x15 -w -e1,english.dic by paq8pxd_v89_40_3360, change: -0,01%
    123'013'220 - enwik9 -x15 -w -e1,english.dic by paq8pxd_v89_40_3360, change: -0,23% - a very good change, with no additional memory usage and a 3% improvement in compression time!
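
    The change column can be reproduced from the byte counts. This is a small sketch, assuming the percentage is taken relative to the unmodified v89 result with the same options (`percentChange` is a made-up helper name).

```cpp
#include <cassert>
#include <cstdint>

// Relative size change in percent; negative means the new file is smaller.
// Hypothetical helper for illustration only.
double percentChange(std::int64_t baselineBytes, std::int64_t newBytes) {
    assert(baselineBytes > 0);
    return 100.0 * double(newBytes - baselineBytes) / double(baselineBytes);
}
```

    For enwik9, `percentChange(123301984, 123013220)` is about -0.23, which corresponds to a saving of 288,764 bytes; for enwik8 -x15 the saving is 1,379 bytes, or about -0.01.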

  33. #955
    Member
    Join Date
    Sep 2014
    Location
    Italy
    Posts
    90
    Thanks
    97
    Thanked 35 Times in 22 Posts
    Quote Originally Posted by Darek View Post
    enwik scores for paq8pxd v89 Luca's modification:

    15'728'903 - enwik8 -s15 -w -e1,english.dic by Paq8pxd_v89
    15'655'526 - enwik8 -x15 -w -e1,english.dic by Paq8pxd_v89
    123'301'984 - enwik9 -x15 -w -e1,english.dic by Paq8pxd_v89

    15'728'903 - enwik8 -s15 -w -e1,english.dic by paq8pxd_v89_40_3360, change: 0,00%
    15'654'147 - enwik8 -x15 -w -e1,english.dic by paq8pxd_v89_40_3360, change: -0,01%
    123'013'220 - enwik9 -x15 -w -e1,english.dic by paq8pxd_v89_40_3360, change: -0,23% - a very good change, with no additional memory usage and a 3% improvement in compression time!
    Thank you Darek!
    Next time we should go under 123.000.000 for enwik9..
    Nearly 300k of improvement, very good!!!

    P.S. What is the current record for enwik9?

  34. #956
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,195
    Thanks
    722
    Thanked 481 Times in 371 Posts
    300KB is a very, very, good score!

    The top four submitted places on LTCB are:
    cmix v18 = 115,714,367
    phda9 1.8 = 116,544,849
    nncp 2019-11-16 = 119,167,224
    paq8pxd_v48_bwt1 -s14 = 126,183,029

    Also, paq8pxd_v89 and paq8sk23 are still to be submitted. paq8sk23 score = 122'364'274 (but with 2.5x longer time and 4 GB more memory used)

  35. #957
    Member
    Join Date
    Sep 2014
    Location
    Italy
    Posts
    90
    Thanks
    97
    Thanked 35 Times in 22 Posts
    Hi Darek,
    I am trying to push mod_ppmd to its limit!
    Could you try to compress enwik9 with the attached executable?
    This time I have set o=60 and m=4095 (the max RAM supported by mod_ppmd).
    So far the other suggested tips, e.g. an instance updated with filtered byte values (upcased, or with all non-letters replaced with space), have failed to improve compression.

    thank you
    Luca

  36. #958
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,195
    Thanks
    722
    Thanked 481 Times in 371 Posts
    OK, I'll test it. I need to finish one task in progress and then I'll run it.

    I've started with enwik8 -x15. In the meantime I've run my testbed on paq8pxd_v89_ppmd_60_4095, and the total score is worse than paq8pxd_v89_40_3360... and there is no change on textual files, but we'll see...
    [Attached image: paq8pxd_v89_60_4095.jpg]
    Last edited by Darek; 18th July 2020 at 11:52.

  37. #959
    Member
    Join Date
    Sep 2014
    Location
    Italy
    Posts
    90
    Thanks
    97
    Thanked 35 Times in 22 Posts
    Thank you very much Darek!

  38. #960
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,195
    Thanks
    722
    Thanked 481 Times in 371 Posts
    Scores for enwik8:
    15'728'903 - enwik8 -s15 -w -e1,english.dic by paq8pxd_v89_40_3360
    15'654'147 - enwik8 -x15 -w -e1,english.dic by paq8pxd_v89_40_3360

    15'728'903 - enwik8 -s15 -w -e1,english.dic by paq8pxd_v89_60_4095, change: 0,00% - yes, I know, ppmd isn't used with the -s option
    15'654'151 - enwik8 -x15 -w -e1,english.dic by paq8pxd_v89_60_4095, change: 0,00% - looks like there is no change for this version...
