Page 33 of 33
Results 961 to 970 of 970

Thread: Paq8pxd dict

  1. #961
    Member
    Join Date
    Sep 2014
    Location
    Italy
    Posts
    94
    Thanks
    102
    Thanked 39 Times in 25 Posts
    Yes, no changes for enwik8.
    We reached the maximum with o40 m3360.
    We should instead see improved compression for enwik9.
    At least I hope so!
    Thank you
    Luca

  2. #962
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,217
    Thanks
    743
    Thanked 495 Times in 383 Posts
    enwik scores:

    15'654'147 - enwik8 -x15 -w -e1,english.dic by paq8pxd_v89_40_3360, change: -0,01%
    123'013'220 - enwik9 -x15 -w -e1,english.dic by paq8pxd_v89_40_3360, change: -0,23%, memory used 32'130MB (by paq8pxd)

    15'654'151 - enwik8 -x15 -w -e1,english.dic by paq8pxd_v89_60_4095, change: 0,00%
    122'945'119 - enwik9 -x15 -w -e1,english.dic by paq8pxd_v89_60_4095, change: -0,06%, memory used 34'335MB (by paq8pxd) - and finally there is a gain of about 70 kB!
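
As a quick check of the numbers above, here is a minimal sketch that compares the 40_3360 and 60_4095 results directly. The sizes are copied from the scores in this post; the `scores` dictionary and the `delta` helper are illustrative names, not part of paq8pxd.

```python
# Compressed sizes in bytes, copied from the scores listed above.
scores = {
    "enwik8": {"40_3360": 15_654_147, "60_4095": 15_654_151},
    "enwik9": {"40_3360": 123_013_220, "60_4095": 122_945_119},
}


def delta(old, new):
    """Return (bytes saved, percent change); a negative percent means smaller output."""
    return old - new, (new - old) / old * 100.0


for name, sizes in scores.items():
    saved, pct = delta(sizes["40_3360"], sizes["60_4095"])
    print(f"{name}: {saved:+d} bytes saved, {pct:+.3f}% change")
```

For enwik9 the difference is 68'101 bytes, which matches the "about 70 kB" gain and the -0,06% change quoted above once rounded.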
    Last edited by Darek; 19th July 2020 at 17:02.

  3. Thanks:

    LucaBiondi (18th July 2020)

  4. #963
    Member
    Join Date
    Sep 2014
    Location
    Italy
    Posts
    94
    Thanks
    102
    Thanked 39 Times in 25 Posts
    Thank you very much Darek!

    122.945.119, well, that is not bad at all.

    My goal was to push ppm_mod to the limit.
    I have tried a few parameters. As soon as I have a moment I will write up my impressions. If kaitz or someone else wants to apply these small changes to paq8pxd, I will be happy.

    Luca

  5. #964
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,217
    Thanks
    743
    Thanked 495 Times in 383 Posts
    A gain is always a gain.
    These changes plus maybe LSTM could give us 121'xxx'xxx bytes in the end.

  6. #965
    Member
    Join Date
    Sep 2014
    Location
    Italy
    Posts
    94
    Thanks
    102
    Thanked 39 Times in 25 Posts
    Hi Darek!
    I think the same!!
    Which do you think is the best version overall, 40 3360 or 60 4095?
    Luca

  7. #966
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,217
    Thanks
    743
    Thanked 495 Times in 383 Posts
    For my testset the best version is 40 3360; however, as enwik9 showed, 60 4095 is better for big files.
    I'll need to test the 4 corpora on 60 4095, and then maybe this choice will be easier.

  8. #967
    Member
    Join Date
    Sep 2014
    Location
    Italy
    Posts
    94
    Thanks
    102
    Thanked 39 Times in 25 Posts
    Good idea!
    Luca

  9. #968
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,217
    Thanks
    743
    Thanked 495 Times in 383 Posts
    Scores for the 4 corpora with paq8pxd_v89_ppm_60_4095.

    It looks like this version scores worse than 40_3360 on the smaller-file tests (Calgary, Canterbury, Maximum Compression and my testset), but better on the bigger-file tests such as Silesia and enwik9.
    So it's hard to be objective and say definitely, but if we add all 5 corpora together then 60_4095 wins, and it also wins for enwik9. So be it: in my opinion 60_4095 should be the best version.
    Attached thumbnail: paq8pxd_v89_60_4095_4_Corpuses.jpg

  10. #969
    Member
    Join Date
    Sep 2014
    Location
    Italy
    Posts
    94
    Thanks
    102
    Thanked 39 Times in 25 Posts
    Hi Darek!
    Good job and thank you!
    If Kaitz or Shelwien want to adopt these parameters, I will be happy!
    Luca

  11. #970
    Member
    Join Date
    Sep 2015
    Location
    Italy
    Posts
    278
    Thanks
    116
    Thanked 160 Times in 117 Posts
    Quote, originally posted by kaitz:
    https://encode.su/threads/1464-Paq8p...ll=1#post61439
    Just uncomment some LSTM related lines and it works.
    It's not that easy, at least for me.
    I tried to rewrite lstm.inc using the current version of the LSTM in cmix (with the changes needed to use it in paq8pxd), but the decompressed file differs from the original.
    Then I started again from the original lstm.inc, but I had the same problem. I tried it with paq8pxd 73 (the first version that includes lstm.inc handling) and 89 (the latest version), with g++ 6.3 and 7.1 and various options.
    After debugging for some time, it is still not clear to me whether it is a compilation problem (but I have tried 2 versions of g++ with different options) or a problem in the LSTM source (it doesn't seem so, but it is difficult to debug).
    Has anyone else tried to enable the LSTM part?
    It's not my main work at the moment; if I can't solve it without spending much more time, I'll have to give up.
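
For anyone trying to reproduce the mismatch described above, here is a minimal sketch of a round-trip check that reports where the decompressed file first diverges from the original. It does not invoke paq8pxd itself; `first_mismatch` and the file names in the demo are hypothetical and only illustrate the comparison step.

```python
import os
import tempfile


def first_mismatch(path_a, path_b, chunk_size=1 << 20):
    """Return the byte offset of the first difference, or None if the files are identical."""
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                # Locate the first differing byte inside this chunk.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # No differing byte found: one file is a prefix of the other.
                return offset + min(len(a), len(b))
            if not a:
                return None  # both files ended at the same point
            offset += len(a)


if __name__ == "__main__":
    # Demo with two small temporary files standing in for the original
    # and the decompressed output (hypothetical names).
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original.bin")
        restored = os.path.join(tmp, "restored.bin")
        with open(original, "wb") as f:
            f.write(b"abcdefgh")
        with open(restored, "wb") as f:
            f.write(b"abcXefgh")
        print(first_mismatch(original, restored))
```

The offset may help narrow things down: a mismatch at the very start would point to a different place than one that only appears after many megabytes, though it does not by itself say whether the compiler or the LSTM source is at fault.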
