Page 27 of 31
Results 781 to 810 of 923

Thread: Paq8pxd dict

  1. #781
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    937
    Thanks
    95
    Thanked 362 Times in 252 Posts
    Quote Originally Posted by LucaBiondi View Post
    How much memory do you need to use the -x15 option?
    "used 33310 MB (568349169 bytes) of memory" -x15
    "used 33310 MB (568349201 bytes) of memory" -x15 -w

  2. Thanks:

    Darek (5th April 2020)

  3. #782
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,103
    Thanks
    679
    Thanked 433 Times in 331 Posts
    Looks like the even times are smaller... !

  4. #783
    Member
    Join Date
    Sep 2014
    Location
    Italy
    Posts
    58
    Thanks
    63
    Thanked 28 Times in 18 Posts
    Quote Originally Posted by Sportman View Post
    "used 33310 MB (568349169 bytes) of memory" -x15
    "used 33310 MB (568349201 bytes) of memory" -x15 -w
    Thank you! ...so I need at least 48 GB.

  5. #784
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    937
    Thanks
    95
    Thanked 362 Times in 252 Posts
    Quote Originally Posted by LucaBiondi View Post
    so I need at least 48 GB
    I use only 32GB and no problems so far.

  6. #785
    Member
    Join Date
    Sep 2014
    Location
    Italy
    Posts
    58
    Thanks
    63
    Thanked 28 Times in 18 Posts
    Quote Originally Posted by Sportman View Post
    I use only 32GB and no problems so far.
    Thank you!

  7. #786
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    611
    Thanks
    246
    Thanked 240 Times in 119 Posts
    Quote Originally Posted by Sportman View Post
    "used 33310 MB (568349169 bytes) of memory" -x15
    "used 33310 MB (568349201 bytes) of memory" -x15 -w
    The second number in each line is a display error: the byte count is shown through a 32-bit datatype and wraps around. 33310 MiB*1024*1024 modulo 2^32 = 568,328,192 bytes
    http://schnaader.info
    Damn kids. They're all alike.
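    The wraparound arithmetic can be checked with a short Python sketch. It assumes, as described above, that the MiB figure is printed correctly while the byte figure is the true count modulo 2^32; the helper names are hypothetical and not taken from paq8pxd's source.

```python
MIB = 1 << 20    # bytes per MiB
WRAP = 1 << 32   # range of an unsigned 32-bit counter

def wrap32(n: int) -> int:
    """Value an unsigned 32-bit counter would hold for a true count n."""
    return n % WRAP

def recover_bytes(mib_shown: int, bytes_shown: int) -> int:
    """Rebuild the full byte count from the correct MiB value and the
    wrapped byte value. The MiB figure tells how many times the counter
    wrapped; rounding absorbs the sub-MiB remainder (< 1/4096 of a wrap)."""
    wraps = round((mib_shown * MIB - bytes_shown) / WRAP)
    return wraps * WRAP + bytes_shown

full = recover_bytes(33310, 568349169)
print(full)                  # 34928087537
print(full // MIB)           # 33310
print(wrap32(33310 * MIB))   # 568328192
```

    So the -x15 run really used about 34.9 billion bytes, and the odd-looking second number is just the low 32 bits of it.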

  8. #787
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,103
    Thanks
    679
    Thanked 433 Times in 331 Posts
    @CompressMaster - first attempt at tarball compression:

    10'335'343 - score for compressing the individual files, each with one option set => paq8pxe v1 gc82
    10'116'571 - best score for a solid archive, revisited with paq8pxe v1 gc82; I've run into some issues testing paq8px v184 and v185...
    10'122'251 - tarball compressed by paq8pxe v1 gc82 - slightly worse than solid compression with the same version. I wonder if file order could matter for a tar file.

    Question - is it possible to put the files in a tarball in my own order?
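    On the file order question: a tar archive stores its members in the order they are added, so the order can be chosen when the archive is built. A minimal sketch with Python's standard `tarfile` module; `build_tar` and `member_order` are hypothetical helper names, and whether a given order helps paq8pxd has to be measured.

```python
import tarfile
from pathlib import Path

def build_tar(out_path: str, files_in_order: list[str]) -> None:
    """Write an uncompressed tar whose members appear in the given order."""
    with tarfile.open(out_path, "w") as tar:   # "w" = plain tar, no gzip/bz2
        for name in files_in_order:
            # arcname keeps only the base name, so directory paths stay out of headers
            tar.add(name, arcname=Path(name).name)

def member_order(tar_path: str) -> list[str]:
    """Return member names in on-disk order, to verify the result."""
    with tarfile.open(tar_path, "r") as tar:
        return tar.getnames()
```

    For example, `build_tar("set.tar", ["b.txt", "a.bmp", "c.exe"])` writes the three files in exactly that order.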

  9. #788
    Member
    Join Date
    Jun 2009
    Location
    Puerto Rico
    Posts
    208
    Thanks
    98
    Thanked 27 Times in 20 Posts

  10. Thanks:

    Darek (5th April 2020)

  11. #789
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,103
    Thanks
    679
    Thanked 433 Times in 331 Posts
    Damn, Kaitz is too fast... I'm still testing v80

  12. #790
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    505
    Thanks
    208
    Thanked 343 Times in 182 Posts
    No need to test this one.
    KZo


  13. #791
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,103
    Thanks
    679
    Thanked 433 Times in 331 Posts
    @Kaitz - this version = paq8pxd v81 - is it only a fix?

    I'm asking about enwik8 and enwik9 testing => should I use v81 instead of v80, or would the scores be different?
    Last edited by Darek; 6th April 2020 at 12:12.

  14. #792
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    505
    Thanks
    208
    Thanked 343 Times in 182 Posts
    Quote Originally Posted by Darek View Post
    @Kaitz - this version = paq8pxd v81 - is it only a fix?

    I'm asking about enwik8 and enwik9 testing => should I use v81 instead of v80, or would the scores be different?
    v81 fixes enwik10 processing only; there is still a "small" mistake I added.
    So testing v80 is OK. No need for v81.
    Also, if the 1423 order is used, please check with -s0 -w that the transform completes without failing.
    Also, drt has no effect on the -w option. The transform will fail.
    KZo


  15. Thanks (2):

    Darek (6th April 2020),xinix (6th April 2020)

  16. #793
    Member
    Join Date
    Sep 2014
    Location
    Italy
    Posts
    58
    Thanks
    63
    Thanked 28 Times in 18 Posts
    Quote Originally Posted by kaitz View Post
    Maybe open it in Notepad++ and see if it is in fact text.


    You need to change wrtpre.cpp. If it reports bintext, then it's text mixed with binary data, and wrt will probably bloat the hell out of this file.

    I found that one XML file is in UTF-8 format.
    The other is in UTF-16 format (Notepad++ tells me "UCS-2 LE BOM").

    Should UTF-16 files be detected as text files?
    Thank you as usual!!

    Luca
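    Why a UTF-16 file looks different from UTF-8 to a byte-oriented detector can be seen in a minimal Python sketch: in UTF-16LE ("UCS-2 LE BOM" in Notepad++) every ASCII character is followed by a 0x00 byte. The helpers `sniff_bom` and `zero_byte_ratio` are hypothetical illustrations, not paq8pxd's detection code; they only read the byte-order mark and count zero bytes.

```python
import codecs

def sniff_bom(data: bytes) -> str:
    """Name the encoding implied by a leading byte-order mark, if any."""
    # UTF-32 LE's BOM begins with UTF-16 LE's BOM, so check UTF-32 first
    for bom, name in ((codecs.BOM_UTF32_LE, "utf-32-le"),
                      (codecs.BOM_UTF32_BE, "utf-32-be"),
                      (codecs.BOM_UTF8, "utf-8"),
                      (codecs.BOM_UTF16_LE, "utf-16-le"),
                      (codecs.BOM_UTF16_BE, "utf-16-be")):
        if data.startswith(bom):
            return name
    return "none"

def zero_byte_ratio(data: bytes) -> float:
    """Fraction of 0x00 bytes; ASCII/UTF-8 text normally has none."""
    return data.count(0) / len(data) if data else 0.0

xml = "<a>text</a>"
utf8 = xml.encode("utf-8")
utf16 = codecs.BOM_UTF16_LE + xml.encode("utf-16-le")
print(sniff_bom(utf8), zero_byte_ratio(utf8))
print(sniff_bom(utf16), zero_byte_ratio(utf16))
```

    Nearly half of the UTF-16 bytes are zero, which is why such a file is treated as something other than plain text.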

  17. #794
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    505
    Thanks
    208
    Thanked 343 Times in 182 Posts
    Quote Originally Posted by LucaBiondi View Post
    I found that one XML file is in UTF-8 format.
    The other is in UTF-16 format (Notepad++ tells me "UCS-2 LE BOM").

    Should UTF-16 files be detected as text files?
    No.
    KZo


  18. #795
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    505
    Thanks
    208
    Thanked 343 Times in 182 Posts
    Made another test, moving some content around.
    enwik8 -s8 -w
    16222552 - my mistake fixed, compared to v80
    16217354 - some articles reordered

    The data moved after the main data is 1814632 bytes, 159 articles in total.

    It should be the same thing as Darek's 1423 ordering, but on enwik8.
    On enwik9 the same selection picks about 130 MB, which matches the gap at http://mattmahoney.net/dc/textdata.html, and about 49000 articles.
    enwik9 test running...

    The way the selection is made is kind of stupid. Really.

    EDIT:
    To be sure, I made a comparison:
    [attached image: fv.v79.v80.v82.s0w.png]
    Last edited by kaitz; 6th April 2020 at 21:36.
    KZo


  19. Thanks (2):

    Darek (7th April 2020),xinix (7th April 2020)

  20. #796
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,103
    Thanks
    679
    Thanked 433 Times in 331 Posts
    @Kaitz -> is this paq8pxd v81 or something new? OK, I see - it's v82...

    Here are the enwik8 scores I've gathered for paq8pxd v80, compared with paq8pxd v79:

    16'272'537 - enwik8 -s8 by Paq8pxd_v79_AVX2
    16'214'034 - enwik8 -x8 by Paq8pxd_v79_AVX2
    15'925'621 - enwik8 -s15 by Paq8pxd_v79_AVX2 - tested by Sportman
    15'862'122 - enwik8 -x15 by Paq8pxd_v79_AVX2 - tested by Sportman
    15'843'925 - enwik8.drt -x15 by Paq8pxd_v79_AVX2

    16'265'881 - enwik8 -s8 by Paq8pxd_v80_AVX2
    16'222'997 - enwik8 -s8 -w by Paq8pxd_v80_AVX2 - tested by Kaitz
    16'207'724 - enwik8 -x8 by Paq8pxd_v80_AVX2
    16'162'663 - enwik8 -x8 -w by Paq8pxd_v80_AVX2
    15'924'798 - enwik8 -s15 by Paq8pxd_v80_AVX2
    15'898'839 - enwik8 -s15 -w by Paq8pxd_v80_AVX2
    15'861'418 - enwik8 -x15 by Paq8pxd_v80_AVX2 - tested by Sportman
    15'835'340 - enwik8 -x15 -w by Paq8pxd_v80_AVX2 - tested by Sportman - best score ever for the paq8pxd series on enwik8; importantly, it beats the DRT-preprocessed file, the first time in a long while!
    15'849'095 - enwik8.drt -x15 by Paq8pxd_v80_AVX2
    15'849'039 - enwik8.drt -x15 -w by Paq8pxd_v80_AVX2
    Last edited by Darek; 7th April 2020 at 01:21.
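    The gains in the list above can be tabulated with a few lines of Python; all sizes are copied from the list, and positive numbers mean fewer bytes.

```python
# enwik8 compressed sizes in bytes, copied from the list above
v79 = {"-s8": 16_272_537, "-x8": 16_214_034, "-s15": 15_925_621,
       "-x15": 15_862_122, "drt -x15": 15_843_925}
v80 = {"-s8": 16_265_881, "-x8": 16_207_724, "-s15": 15_924_798,
       "-x15": 15_861_418, "drt -x15": 15_849_095}
v80_w = {"-s8": 16_222_997, "-x8": 16_162_663, "-s15": 15_898_839,
         "-x15": 15_835_340, "drt -x15": 15_849_039}

for opt in v79:
    gain_version = v79[opt] - v80[opt]   # > 0 means v80 is smaller than v79
    gain_w = v80[opt] - v80_w[opt]       # > 0 means -w helps on v80
    print(f"{opt:9} v79->v80: {gain_version:+7,}  v80 -w: {gain_w:+7,}")

# -x15 -w on raw enwik8 versus -x15 on the DRT-preprocessed file (both v80)
print(f"-x15 -w beats drt -x15 by {v80['drt -x15'] - v80_w['-x15']:,} bytes")
```

    The version change alone is worth a few hundred to a few thousand bytes, while -w saves 26-45 KB on raw enwik8 and almost nothing on the DRT file.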

  21. Thanks:

    Sportman (10th April 2020)

  22. #797
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    505
    Thanks
    208
    Thanked 343 Times in 182 Posts
    My test version finished on enwik9. I don't know the v80 result yet, but compared to v79 -s15 it's about 727 KB better.
    [attached image: pxdv82test1.PNG]
    I will think about this and test some other things, including how to handle it on the command line.
    KZo


  23. Thanks (2):

    Darek (7th April 2020),xinix (7th April 2020)

  24. #798
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,103
    Thanks
    679
    Thanked 433 Times in 331 Posts
    > My test version finished on enwik9. Now I don't know the result of v80, but compared to v79 -s15 it's about 727 KB better.

    Damn! That's a great result! I'm still testing some files with v80, then I'll test enwik9.

    This result means the -x15 version should be another 400-500 KB smaller = 124'xxx'xxx

    > And how to handle it on command line. - what does this mean? Are these options hardcoded?

  25. #799
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    505
    Thanks
    208
    Thanked 343 Times in 182 Posts
    Quote Originally Posted by Darek View Post

    > And how to handle it on command line. - what this means? Is this options hardcoded?
    Option -w is general; if I add this change, it becomes specific to one file, so it would need another option. This is the moment I'd like to have an external config (like pxv) that is used and added to the archive.
    KZo


  26. Thanks:

    Darek (8th April 2020)

  27. #800
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    937
    Thanks
    95
    Thanked 362 Times in 252 Posts
    enwik9:
    124,905,286 bytes, 72,450.859 sec., paq8pxd_v81_avx2 -x15 -w

  28. Thanks (2):

    Darek (8th April 2020),kaitz (8th April 2020)

  29. #801
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,103
    Thanks
    679
    Thanked 433 Times in 331 Posts
    Quote Originally Posted by kaitz View Post
    v81 fixes enwik10 processing only; there is still a "small" mistake I added.
    So testing v80 is OK. No need for v81.
    Also, if the 1423 order is used, please check with -s0 -w that the transform completes without failing.
    Also, drt has no effect on the -w option. The transform will fail.
    Nope... -s0 -w and also -x0 -w transforms fail completely with the pure 1423 file.
    But it performs well with the enwik9_1423.DRT file.

    Is there any difference between the -s0 -w and -x0 -w options?
    If the "small" mistake concerns enwik10 only, then enwik9 should be the same for paq8pxd v80 and v81, right?

  30. #802
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    505
    Thanks
    208
    Thanked 343 Times in 182 Posts
    Quote Originally Posted by Darek View Post
    Is there any difference between -s0 -w and -x0 -w options?
    No.
    Quote Originally Posted by Darek View Post
    If the "small" mistake concerns enwik10 only, then enwik9 should be the same for paq8pxd v80 and v81, right?
    v81 is probably a tiny bit faster.
    KZo


  31. Thanks:

    Darek (9th April 2020)

  32. #803
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,103
    Thanks
    679
    Thanked 433 Times in 331 Posts
    Quote Originally Posted by Sportman View Post
    enwik9:
    124,905,286 bytes, 72,450.859 sec., paq8pxd_v81_avx2 -x15 -w
    125,101,814 bytes, 89,181.930 sec., paq8pxd_v81_avx2 -x15, enwik 1423
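    For scale, the two enwik9 runs can be expressed in bits per input byte. A small Python sketch, assuming only that enwik9 is exactly 10^9 bytes; sizes and times are the ones quoted above.

```python
ENWIK9_SIZE = 10**9  # enwik9 is exactly 1,000,000,000 bytes

runs = {
    "v81 -x15 -w":          (124_905_286, 72_450.859),  # (bytes, seconds)
    "v81 -x15, 1423 order": (125_101_814, 89_181.930),
}

for name, (size, seconds) in runs.items():
    bpc = size * 8 / ENWIK9_SIZE               # output bits per input byte
    speed_kib = ENWIK9_SIZE / seconds / 1024   # KiB of input processed per second
    print(f"{name:22} {bpc:.4f} bpc  {speed_kib:.1f} KiB/s")

size_diff = runs["v81 -x15, 1423 order"][0] - runs["v81 -x15 -w"][0]
time_diff = runs["v81 -x15, 1423 order"][1] - runs["v81 -x15 -w"][1]
print(f"-w is {size_diff:,} bytes smaller and {time_diff:,.3f} s faster")
```

    The -w run is 196,528 bytes smaller, about 16,731 s faster, and just below 1 bpc, while the 1423-order run is just above it.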

  33. #804
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,842
    Thanks
    288
    Thanked 1,244 Times in 697 Posts
    Btw, do we have enwik9 stats for -x9 .. -x14?
    Keep in mind that the Hutter Prize (HP) has a 10GB memory limit now.

  34. Thanks:

    kampaster (9th April 2020)

  35. #805
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    505
    Thanks
    208
    Thanked 343 Times in 182 Posts
    paq8pxd_v82
    Code:
    - -w option - reorder content by title selecting ',' or ':'
    Reordered sizes:
    Main size: 734946 kb
    tags size: 138695 kb
    file size: 22409 kb
    header size: 23642 kb
    Langs size: 15477 kb


    [attached image: fv.v82.png]

    , selects cities (about 97%; the rest is other content)
    : selects images and other similar content

    If you comment out ',' selecting lines in encode/decode then it will work on any wiki dump.
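    The title-based selection can be illustrated with a minimal Python sketch. It assumes the dump has already been split into (title, body) pairs and applies a stable partition: articles whose title contains ',' or ':' move after the rest, keeping their relative order. `reorder` and `restore` are hypothetical helpers, not the -w code in paq8pxd, which also separates tags, headers and language links; for simplicity the sketch keeps the full permutation so the decoder side can undo it.

```python
def reorder(articles, keys=(",", ":")):
    """Stable partition: articles whose title contains any key go last.
    Returns the reordered list and the permutation needed to undo it."""
    keep, moved = [], []
    for idx, (title, _body) in enumerate(articles):
        (moved if any(k in title for k in keys) else keep).append(idx)
    order = keep + moved                   # new position -> original index
    return [articles[i] for i in order], order

def restore(reordered, order):
    """Invert reorder(): put every article back at its original index."""
    original = [None] * len(order)
    for new_pos, old_idx in enumerate(order):
        original[old_idx] = reordered[new_pos]
    return original

pages = [("Tallinn", "..."), ("Paris, Texas", "..."),
         ("Image:Map.png", "..."), ("Mathematics", "...")]
new, perm = reorder(pages)
print([t for t, _ in new])
assert restore(new, perm) == pages
```

    Calling `reorder(pages, keys=(":",))` mirrors the suggestion of dropping the ',' rule when working on other wiki dumps.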

    I also considered years in titles; maybe the username is important too, e.g. for bots (mathbot means the article is probably about math), ...

    I tested the -s0 -w output with 7z PPMd and there is an improvement.


    Quote Originally Posted by Shelwien View Post
    Btw, do we have enwik9 stats for -x9 .. -x14 ?
    Keep in mind that HP has 10GB memory limit now.
    I will test some of these options (-x9, -x10) on v82.

    ---
    No need for 1423 reorder anymore.
    Attached Files
    KZo


  36. Thanks (6):

    Darek (10th April 2020),Mike (10th April 2020),moisesmcardona (10th April 2020),schnaader (10th April 2020),Sportman (10th April 2020),xinix (10th April 2020)

  37. #806
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,842
    Thanks
    288
    Thanked 1,244 Times in 697 Posts
    Attached mod_SSE (cmix version):
    Code:
    183200 // paq8pxd_v82_SSE4.exe -s7 book1
    183015 // paq8pxd82.exe -s7 book1
    Attached Files

  38. Thanks (4):

    Darek (10th April 2020),kaitz (10th April 2020),Sportman (10th April 2020),xinix (10th April 2020)

  39. #807
    Member
    Join Date
    Aug 2015
    Location
    indonesia
    Posts
    245
    Thanks
    32
    Thanked 25 Times in 23 Posts
    Quote Originally Posted by kaitz View Post
    paq8pxd_v82
    Here it is, compiled with GCC 9.2.0.
    Attached Files

  40. #808
    Member
    Join Date
    Aug 2015
    Location
    indonesia
    Posts
    245
    Thanks
    32
    Thanked 25 Times in 23 Posts
    Quote Originally Posted by suryakandau@yahoo.co.id View Post
    Here it is, compiled with GCC 9.2.0.
    This build is not compiled with mod_SSE. The result with the -x11 option on an XML file is:
    5345280 bytes compressed to 249396 bytes.
    Time 1356.64 sec, used 13914 MB (1705005728 bytes) of memory

  41. #809
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    505
    Thanks
    208
    Thanked 343 Times in 182 Posts
    enwik8 16214678 -s8 -w Paq8pxd_v82_AVX2
    KZo


  42. Thanks:

    Darek (10th April 2020)

  43. #810
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    505
    Thanks
    208
    Thanked 343 Times in 182 Posts
    Quote Originally Posted by kaitz View Post
    My test version finished on enwik9. Now i dont know result of v80 but compared to v79 -s15 its about 727kb better.
    [attached image: pxdv82test1.PNG]
    I will think about this and test some other things. And how to handle it on command line.
    Quote Originally Posted by Sportman View Post
    enwik9:
    124,905,286 bytes, 72,450.859 sec., paq8pxd_v81_avx2 -x15 -w
    Quote Originally Posted by Darek View Post
    125,101,814 bytes, 89,181.930 sec., paq8pxd_v81_avx2 -x15, enwik 1423
    enwik9 125354436 -s15 -w Paq8pxd_v81_AVX2
    KZo


  44. Thanks:

    Darek (10th April 2020)
