
Thread: Paq8pxd dict

  1. #481
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    919
    Thanks
    541
    Thanked 363 Times in 271 Posts
    Yes. It doesn't matter whether the file is in a subdirectory or not.
    This is not a critical error, it's only a message, but it's strange. As I wrote, up to version v42 there was no such message.

  2. #482
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    919
    Thanks
    541
    Thanked 363 Times in 271 Posts
    Scores for my testbed for the v48 and v49 versions and the entire table of 4 corpuses, similar to the one for paq8px (I've added colors: green if the score is better than the previous version, red if worse, white if there is no change).
    v48 got a good gain over v47 (and its subversions), but v49 lost some ground to v48 everywhere except the MaximumCompression benchmark.

    Does paq8pxd have the adaptive learning rate implemented and turned on? If yes, then the latest paq8px changes could have a similar effect to what they had in paq8px_v148...
    Attached Thumbnails: paq8pxd_v48.jpg, paq8pxd_v49.jpg, 4_Corpuses_paq8pxd_v49.jpg

  3. The Following User Says Thank You to Darek For This Useful Post:

    kaitz (10th July 2018)

  4. #483
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    520
    Thanks
    196
    Thanked 744 Times in 301 Posts
    That's probably because Kaitz reverted to the old dmc model. The gains from the improved image model are still there in v49, it's just that some of the gains v48 had from the dmc model that Gotty made are now gone.

  5. #484
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    385
    Thanks
    142
    Thanked 213 Times in 115 Posts
    All versions between v47 and v48 use more contexts and memory, and are slower. I consider v48 a bad release. So the gain (or lack of it) from v47 to v49 is somewhat OK. As I said, nothing new from my side.
    Adaptive learning in the mixer is in v48 and v49, for exe, image24 (v49) and deca. For exe it's probably not worth it.

    There are some obvious differences between the px and pxd versions (different streams, ...). It's good to be different. So if there is no mistake on my part, the gains are not 100% the same.
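    For illustration only, here is a minimal sketch of the general idea of an adaptive learning rate in a logistic mixer. The names and constants are made up; this is not the actual v48/v49 code:
    Code:
    // Minimal sketch (not the paq8pxd source): a logistic mixer whose learning
    // rate adapts to the recent prediction error. All names and constants are
    // illustrative assumptions.
    #include <cmath>
    #include <vector>

    struct AdaptiveMixer {
      std::vector<double> w;   // one weight per model input (stretched domain)
      double lr = 0.002;       // current learning rate
      double avgErr = 0.0;     // running average of |error|

      explicit AdaptiveMixer(size_t n) : w(n, 0.0) {}

      // inputs are stretched probabilities st(p) = ln(p/(1-p)) from the models
      double mix(const std::vector<double>& st) const {
        double dot = 0.0;
        for (size_t i = 0; i < w.size(); ++i) dot += w[i] * st[i];
        return 1.0 / (1.0 + std::exp(-dot));        // squash back to a probability
      }

      // after the real bit is known, update the weights and adapt the rate
      void update(const std::vector<double>& st, double p, int bit) {
        double err = bit - p;                        // signed prediction error
        for (size_t i = 0; i < w.size(); ++i) w[i] += lr * err * st[i];

        // adapt: shrink the rate while errors stay small, grow it when they rise
        avgErr = 0.99 * avgErr + 0.01 * std::fabs(err);
        lr = 0.0005 + 0.01 * avgErr;                 // toy adaptation rule
      }
    };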
    KZo


  6. The Following User Says Thank You to kaitz For This Useful Post:

    Darek (10th July 2018)

  7. #485
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Hungary
    Posts
    349
    Thanks
    238
    Thanked 229 Times in 124 Posts
    Quote Originally Posted by Darek View Post
    It looks like it. I don't know what it is.
    It started with the v43 official version.
    The "FileDisk: unable to open file (No such file or directory)" problem is indeed harmless.
    @Kaitz: It was fixed in paq8px_v130_fix2:
    Code:
    paq8px_v130_fix2
    
    -- Cleanup
    - On decompression the following message was printed: "FileDisk: unable to open file (No such file or directory)". Fixed. 
    - Printing more info when FileDisk.create() or FileDisk.createtmp() fails for any reason (name of the file and error: access denied, file not found, etc.)
    - Cosmetic fixes in remarks and/or code (Array's chkindex, IntBuf, jpegModel, Encoder, detect(), etc.)
    
    -- Remarks --
    No model or compression improvements (archives must be binary compatible with paq8px_v130 and paq8px_v130_fix1)
    Since that version the open function takes 2 parameters; the second one ensures that no error message is printed when fopen is allowed to fail.

    From the current paq8px version:
      bool open(const char *filename, bool must_succeed) {
        assert(file == 0);
        file = fopen(filename, "rb");
        bool success = (file != 0);
        if (!success && must_succeed) {
          printf("Unable to open file %s (%s)", filename, strerror(errno));
          quit();
        }
        return success;
      }
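    A small usage sketch of the call pattern implied by the two-parameter open. "ProbeFile" is an illustrative, self-contained stand-in with the same open() signature, not the real FileDisk; the file names are just examples:
    Code:
    #include <cassert>
    #include <cerrno>
    #include <cstdio>
    #include <cstdlib>
    #include <cstring>

    struct ProbeFile {
      FILE *file = 0;
      bool open(const char *filename, bool must_succeed) {
        assert(file == 0);
        file = fopen(filename, "rb");
        bool success = (file != 0);
        if (!success && must_succeed) {
          printf("Unable to open file %s (%s)", filename, strerror(errno));
          exit(1);  // stands in for quit()
        }
        return success;
      }
    };

    int main() {
      ProbeFile optional;
      // Allowed to be missing: nothing is printed when fopen fails.
      if (optional.open("maybe-missing.dat", false)) { /* use the file */ }

      ProbeFile required;
      // Required: on failure the error is printed and the program exits.
      required.open("input.dat", true);
      return 0;
    }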
    Last edited by Gotty; 13th July 2018 at 13:23.

  8. The Following 2 Users Say Thank You to Gotty For This Useful Post:

    Darek (11th July 2018),kaitz (12th July 2018)

  9. #486
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    385
    Thanks
    142
    Thanked 213 Times in 115 Posts
    paq8pxd_v50
    This version is faster on binary and text files.
    On webster it is about 3 min faster than v49; compression is mostly identical.

    EDIT:
    enwik8 -8 time.
    Attached Files
    Last edited by kaitz; 15th July 2018 at 09:06.
    KZo


  10. The Following 2 Users Say Thank You to kaitz For This Useful Post:

    comp1 (13th July 2018),Darek (13th July 2018)

  11. #487
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    919
    Thanks
    541
    Thanked 363 Times in 271 Posts
    Here are the scores for v50 for my testset and the 4 corpuses. There are some gains, but this version is still worse than paq8px_v148 (except on the MaximumCompression testset).
    Time tests are ongoing.
    Attached Thumbnails: paq8pxd_v50.jpg, 4_Corpuses_paq8pxd_v50.jpg
    Last edited by Darek; 16th July 2018 at 00:29.

  12. The Following User Says Thank You to Darek For This Useful Post:

    kaitz (16th July 2018)

  13. #488
    Member
    Join Date
    Apr 2018
    Location
    Indonesia
    Posts
    24
    Thanks
    7
    Thanked 4 Times in 4 Posts

    paq8pxd48_bwt1

    I have ported my changes from v47 to v48 and made a small tweak to the nest model.
    Here is the result for enwik8 using the -s6 option: 16.525.916 bytes.
    I think other textual files get a better compression ratio too.
    Attached Files

  14. #489
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    919
    Thanks
    541
    Thanked 363 Times in 271 Posts
    @bwt - one more request for v48_bwt1: could you please name the exe file according to the version? In this case it should be "paq8pxd_v48_bwt1".
    This version is not stable - some files, like I.EXE, crash during the test.
    Another issue is the very big memory requirement, which forces me to use at most the -s14 option.
    I'll try to test enwik8 and enwik9. Your latest version (v47_5) wasn't stable either, and enwik9 crashed at the end of compression....
    Last edited by Darek; 22nd July 2018 at 23:56.

  15. #490
    Member
    Join Date
    Apr 2018
    Location
    Indonesia
    Posts
    24
    Thanks
    7
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by Darek View Post
    @bwt - one more request for v48_bwt1: could you please name the exe file according to the version? In this case it should be "paq8pxd_v48_bwt1".
    This version is not stable - some files, like I.EXE, crash during the test.
    Another issue is the very big memory requirement, which forces me to use at most the -s14 option.
    I'll try to test enwik8 and enwik9. Your latest version (v47_5) wasn't stable either, and enwik9 crashed at the end of compression....

    I think the very big memory requirement is caused by the DMC model, because this version is based on v48....

  16. #491
    Member
    Join Date
    Apr 2018
    Location
    Indonesia
    Posts
    24
    Thanks
    7
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by bwt View Post
    I think the very big memory requirement is caused by the DMC model, because this version is based on v48....
    I improved only the textmodel, wordmodel & nest model. Maybe kaitz can fix it, because he is the author.

  17. #492
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    919
    Thanks
    541
    Thanked 363 Times in 271 Posts
    Scores for my testset for v48_bwt1. Generally almost all files got some losses, except the textual files.

    Some tests for enwik8/enwik9:

    16'038'418 - enwik8 -s15 by Paq8pxd_v47_3
    126'749'584 - enwik9_1423 -s15 by Paq8pxd_v47_3 - a very good score, 3rd place on LTCB, although the score is not listed on the official page

    16'027'128 - enwik8 -s15 by Paq8pxd_v47_5, enwik9 crashed at the end...
    16'061'557 - enwik8 -s14 by Paq8pxd_v48
    16'067'623 - enwik8 -s15 by Paq8pxd_v49

    16'004'759 - enwik8 -s15 by Paq8pxd_v48_bwt1 - it means enwik9 should get about 126'200'xxx - 126'400'000

    Darek
    Attached Thumbnails: paq8pxd_v148_bwt1.jpg

  18. The Following User Says Thank You to Darek For This Useful Post:

    bwt (23rd July 2018)

  19. #493
    Member
    Join Date
    Apr 2018
    Location
    Indonesia
    Posts
    24
    Thanks
    7
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by Darek View Post
    Scores for my testset for v48_bwt1. Generally almost all files got some losses, except the textual files.

    Some tests for enwik8/enwik9:

    16'038'418 - enwik8 -s15 by Paq8pxd_v47_3
    126'749'584 - enwik9_1423 -s15 by Paq8pxd_v47_3 - a very good score, 3rd place on LTCB, although the score is not listed on the official page

    16'027'128 - enwik8 -s15 by Paq8pxd_v47_5, enwik9 crashed at the end...
    16'061'557 - enwik8 -s14 by Paq8pxd_v48
    16'067'623 - enwik8 -s15 by Paq8pxd_v49

    16'004'759 - enwik8 -s15 by Paq8pxd_v48_bwt1 - it means enwik9 should get about 126'200'xxx - 126'400'000

    Darek

    Testing paq8pxd_v48_bwt1 using the -s6 option:
    enwik9 136.732.561 bytes

  20. The Following User Says Thank You to bwt For This Useful Post:

    Darek (25th July 2018)

  21. #494
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    919
    Thanks
    541
    Thanked 363 Times in 271 Posts
    I've started testing enwik9 with the -s14 option. It's going very slowly - after 24h it's at 25% - which means it will take 4 days to compress the full file.
    That's because -s14 on enwik9 uses 51GB of memory and, in my case, the swap file.

    I'm afraid it could crash at the end like v47_5 did, but we'll see... however, if it finishes, it could be the next paq8pxd record - a score of about 126'3xx'xxx.

    Update: after 3 days the progress is at 49% - compression is going slower....
    Last edited by Darek; 27th July 2018 at 10:31.

  22. #495
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    919
    Thanks
    541
    Thanked 363 Times in 271 Posts
    paq8pxd_v48_bwt1 scores for enwik:

    16'004'759 - enwik8 -s14 by paq8pxd_v48_bwt1 -> time 31'050.72s
    126'183'029 - enwik9_1423 -s14 by paq8pxd_v48_bwt1 -> time 579'894.70s - it was a very slow run, similar to byron's cmix timings

    Both scores are the best PAQ scores ever. PAQ is getting a more stable hold on third place on LTCB.

  23. The Following User Says Thank You to Darek For This Useful Post:

    bwt (31st July 2018)

  24. #496
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    919
    Thanks
    541
    Thanked 363 Times in 271 Posts
    @Matt - could you add the v48_bwt1 submission to the LTCB page?

    Paq8pxd_v48_bwt1:

    enwik8 - 16'004'759 -> option -s14 - encode time: 31'050,72s, memory used: 51'865MB
    enwik9_1423 - 126'183'029 -> option -s14 - encode time: 579'894,7s, memory used: 51'865MB


    System: Core i7 4900MQ at 3.8GHz, 32GB, Win7Pro 64; decompression in progress.
    The source code and the 1423 resplit are attached in a 7ZIP file = 153'295 bytes, which means the total paq8pxd score with the unpacker is 126'183'029 + 153'295 = 126'336'324 bytes! A zip archive (as Matt uses) could be a little bigger.
    Attached Files

  25. #497
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    919
    Thanks
    541
    Thanked 363 Times in 271 Posts
    @Piotr Tarsa - could you also add the Paq8pxd_v48_bwt1 enwik8/9 scores to your lossless benchmark on GitHub?

  26. #498
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
    Sure. Right now the entry for PAQ is:
    Code:
    program_series,program_name,program_options,compressed_enwik8_size,compressed_enwik9_size,decompressor_size,total_compressed_size,compression_time,decompression_time,memory_usage_in_bytes,algorithm_type,note_id,outdated
    PAQ,Paq8pxd_v47_3,-s15,16038418,126749584,"150,210 (s)",126899794,83435,86671,28940697600,CM,1,no
    What would be the entry for Paq8pxd_v48_bwt1?

  27. #499
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    919
    Thanks
    541
    Thanked 363 Times in 271 Posts
    Maybe something like this:

    program_series,program_name,program_options,compressed_enwik8_size,compressed_enwik9_size,decompressor_size,total_compressed_size,compression_time,decompression_time,memory_usage_in_bytes,algorithm_type,note_id,outdated
    PAQ,Paq8pxd_v48_bwt1,-s14,16004759,126183029,"153,295 (s)",126336324,5798947,0,54384394240,CM,1,no

    One thing doesn't fit for me - the "decompressor size" has a comma in the middle. Every other number uses a space as the thousands separator, so maybe this column should also use a space?

  28. #500
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
    Thanks, the entry looks good. But does paq8pxd_v48_bwt1 require a preprocessor? You've mentioned some "resplit 1423" or something. If yes, then add it to the program name and options like Byron did for cmix in the Silesia compression corpus results:
    Code:
    Program series: name
    options
    cmix: cmix v15, precomp v0.4.6
    precomp -cn | cmix
    One thing doesn't fit for me - the "decompressor size" has a comma in the middle. Every other number uses a space as the thousands separator, so maybe this column should also use a space?
    Whether a number uses a space or a comma as a separator depends on your locale. For example, if you put https://tarsa.github.io/lossless-benchmark/ into https://www.url2png.com/ you'll get a screenshot with commas as thousands separators, so they match the decompressor-size thousands separator. To fix the mismatch I would need to split the decompressor size into two columns (one numeric, one string suffix) or just force a single locale for every user. I've created an issue for that: https://github.com/tarsa/lossless-benchmark/issues/5
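    For example, the same integer printed under two locales (a small C++ sketch; the locale names are platform-dependent and may not be installed everywhere):
    Code:
    #include <iostream>
    #include <locale>
    #include <stdexcept>

    int main() {
        long n = 126336324;
        try {
            std::cout.imbue(std::locale("en_US.UTF-8"));  // comma as thousands separator
            std::cout << n << "\n";                        // 126,336,324
            std::cout.imbue(std::locale("pl_PL.UTF-8"));  // space (often NBSP) as separator
            std::cout << n << "\n";                        // 126 336 324
        } catch (const std::runtime_error&) {
            // requested locale is not installed on this system
        }
        return 0;
    }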
    Last edited by Piotr Tarsa; 10th August 2018 at 11:51.

  29. #501
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    919
    Thanks
    541
    Thanked 363 Times in 271 Posts
    Yes, it needs some preprocessing - splitting into 4 pieces and reordering them from less redundant to more redundant - it saves about 100KB.
    The resplit is here (in post 496, above my request): https://encode.su/threads/1464-Paq8p...ll=1#post57634
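    For reference, a hedged sketch of what such a resplit could look like: split enwik9 into 4 roughly equal pieces and write them back in a new order. The equal-sized split and the {1,4,2,3} order are assumptions based on the name; the actual resplit is the one attached in post 496:
    Code:
    // Illustrative sketch only: split a file into 4 roughly equal pieces and
    // concatenate them in a new order. The order {0,3,1,2} ("1 4 2 3") and the
    // equal-sized split are assumptions, not the exact enwik9_1423 procedure.
    #include <cstdio>
    #include <vector>

    int main() {
        FILE* in = std::fopen("enwik9", "rb");
        if (!in) return 1;
        std::fseek(in, 0, SEEK_END);
        long size = std::ftell(in);
        long part = size / 4;
        const int order[4] = {0, 3, 1, 2};             // piece order "1 4 2 3"

        FILE* out = std::fopen("enwik9_1423", "wb");
        std::vector<char> buf(1 << 20);
        for (int k = 0; k < 4; ++k) {
            int p = order[k];
            long begin = p * part;
            long end = (p == 3) ? size : begin + part;  // last piece takes the remainder
            std::fseek(in, begin, SEEK_SET);
            for (long pos = begin; pos < end; ) {
                long chunk = end - pos;
                if (chunk > (long)buf.size()) chunk = (long)buf.size();
                size_t got = std::fread(buf.data(), 1, (size_t)chunk, in);
                if (got == 0) break;
                std::fwrite(buf.data(), 1, got, out);
                pos += (long)got;
            }
        }
        std::fclose(out);
        std::fclose(in);
        return 0;
    }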

  30. #502
    Member
    Join Date
    Apr 2018
    Location
    Indonesia
    Posts
    24
    Thanks
    7
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by Darek View Post
    Yes, it needs some preprocessing - splitting into 4 pieces and reordering them from less redundant to more redundant - it saves about 100KB.
    The resplit is here (in post 496, above my request): https://encode.su/threads/1464-Paq8p...ll=1#post57634
    paq8pxd_v48_bwt2 -s6:
    enwik8 16.472.556 bytes
    It should be under 16 MB using the -s14 option.
    Attached Files

  31. The Following User Says Thank You to bwt For This Useful Post:

    Darek (12th August 2018)

  32. #503
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
    Darek:
    I've added the results for Paq8pxd_v48_bwt1 to my benchmark, but the compression time looks totally off - almost 10x slower than cmix: https://tarsa.github.io/lossless-benchmark/

    Commit with update: https://github.com/tarsa/lossless-be...76bb7285f9edd4

  33. #504
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    919
    Thanks
    541
    Thanked 363 Times in 271 Posts
    Yes, it's my mistake - sorry - the comma was in the wrong position. It should be as follows:

    PAQ,Paq8pxd_v48_bwt1,-s14,16004759,126183029,"153,295 (s)",126336324,579894,70,54384394240,CM,1,no

    I'm sorry for this.

  34. #505
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
    OK, fixed. Fractional parts are not supported so I've rounded it up. Did you verify decompression?

  35. The Following User Says Thank You to Piotr Tarsa For This Useful Post:

    Darek (12th August 2018)

  36. #506
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    919
    Thanks
    541
    Thanked 363 Times in 271 Posts
    Not yet, but I plan to do it at the end of August.

  37. #507
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    919
    Thanks
    541
    Thanked 363 Times in 271 Posts
    @bwt - regarding paq8pxd_v48_bwt2 - unfortunately there is still the same memory problem as in the v48_bwt1 version.
    Option -s14 really uses less than 20GB but requires 53GB; option -s15 is not usable on my laptop with 32GB of memory....

    I'll test it with the -s14 option, like bwt1.

  38. #508
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    919
    Thanks
    541
    Thanked 363 Times in 271 Posts
    Score of enwik8 for v48_bwt2:

    16'004'759 -> paq8pxd_v48_bwt1 with option -s14 - encode time: 31'050,72s, memory used: 51'865MB
    16'001'495 -> paq8pxd_v48_bwt2 with option -s14 - encode time: 20'852,80s, memory used: 51'865MB

    A very slight improvement at this level....

  39. #509
    Member
    Join Date
    Apr 2018
    Location
    Indonesia
    Posts
    24
    Thanks
    7
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by Darek View Post
    Score of enwik8 for v48_bwt2:

    16'004'759 -> paq8pxd_v48_bwt1 with option -s14 - encode time: 31'050,72s, memory used: 51'865MB
    16'001'495 -> paq8pxd_v48_bwt2 with option -s14 - encode time: 20'852,80s, memory used: 51'865MB

    A very slight improvement at this level....
    Hmm, but the compression time is faster than before. Could you test it on enwik9?

  40. #510
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    919
    Thanks
    541
    Thanked 363 Times in 271 Posts
    >Hmm, but the compression time is faster than before
    Not exactly. Due to the very high memory requirements, compression timings depend strongly on actual memory usage. The first 25% of compressing/decompressing goes 2-3x faster than the following quarters. If the program starts using the swap file, it slows down very rapidly. In this situation even one or two small running applications can make that big a difference. I've tested enwik8 with v48_bwt1 two times, the second time to check the timings again. The second time it took 15'173,76 sec -> two times faster...

    Below are the updated scores for enwik8 and enwik8.drt.

    enwik8:
    16'004'759 -> paq8pxd_v48_bwt1 with option -s14 - encode time: 15'173,76s, memory used: 51'865MB
    16'001'495 -> paq8pxd_v48_bwt2 with option -s14 - encode time: 20'852,80s, memory used: 51'865MB

    enwik8.drt:
    16'088'990 -> paq8pxd_v48_bwt1 with option -s14 - encode time: 42'441,17s, memory used: 51'865MB
    16'088'506 -> paq8pxd_v48_bwt2 with option -s14 - encode time: 41'259,06s, memory used: 51'865MB

    As for an enwik9 test of v48_bwt2 - the difference looks very tiny -> about 30KB of gain or less (judging by the drt results, a very small difference) for 7 days of testing.... Hmmm... nobody would see the difference.
    In that case I prefer to test and verify v48_bwt1 decompression to confirm the previous record. That would be more useful - we could submit it being 100% sure that the scores are right.
    Then I'll try to test cmix v15d (12-14 days) due to a potentially very good score. Don't get me wrong - I'm not saying no to testing it - maybe I'll borrow my daughter's laptop for those 7 days... or maybe it would be more rational to lower the memory requirements for a bwt3 version (based on the paq8pxd_v50 release?) and tune the textual model even more.


