
Thread: Paq8pxd dict

  1. #721
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,101
    Thanks
    678
    Thanked 431 Times in 329 Posts
    I'm not sure... Should I get MinGW to compile it?

  2. #722
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,840
    Thanks
    288
    Thanked 1,243 Times in 696 Posts
    Yes, it seems to compile like this with g.bat: http://nishi.dreamhosters.com/u/paq8pxd75_src_0.7z
    Might have to modify "set gcc=C:\MinGW820x\bin\g++.exe" for mingw path and "-march=k8" to "-march=native" for speed.

  3. Thanks (2):

    Darek (2nd March 2020),kaitz (14th March 2020)

  4. #723
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,101
    Thanks
    678
    Thanked 431 Times in 329 Posts
    I cannot download the file. Could you compile this version?

  5. #724
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,840
    Thanks
    288
    Thanked 1,243 Times in 696 Posts
    Yes, there's an exe inside too.
    Attached Files

  6. Thanks:

    Darek (3rd March 2020)

  7. #725
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,101
    Thanks
    678
    Thanked 431 Times in 329 Posts
    @Shelwien: Did you change something in this version? So far the enwik8 scores are the same as for paq8pxd v75.

  8. #726
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,840
    Thanks
    288
    Thanked 1,243 Times in 696 Posts
    I didn't change anything, just added a script to compile it.
    Since you're testing it, I thought you could try finding optimal mod_ppmd parameters (order and memory).

  9. Thanks:

    Darek (4th March 2020)

  10. #727
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    505
    Thanks
    207
    Thanked 343 Times in 182 Posts
    Quote Originally Posted by Darek View Post
    Another scores for enwik:

    126'227'845 - enwik9_1423 -s15 by Paq8pxd_v75_AVX2 => unfortunately, enwik9 got a 0.01% loss to paq8pxd_v74 even though enwik8 got a nice gain. The time saving is about 3.7%.
    True, enwik8 -s8 is also worse (if only the wordmodel is considered); it just compresses better as a whole.

    Quote Originally Posted by Shelwien View Post
    o13 m420 (at -s8) seems pretty random, and there're probably better settings for enwik.
    At the moment it has arbitrary settings. It was meant to be at the same (total) memory usage level as px when last used. It really depends on what the input is, and so on.
    In version 34 there was one more: https://github.com/kaitz/paq8pxd/blo...8pxd.cpp#L6483
    Edit: also to limit memory usage below 30GB at the time.
    Last edited by kaitz; 5th March 2020 at 19:23. Reason: mem
    KZo


  11. #728
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    937
    Thanks
    95
    Thanked 361 Times in 252 Posts
    I tried to compress enwik10 with paq8pxd_v75_AVX2 but it never finishes.

    Console:
    paq8pxd_v75_avx2 -s15 enwik10
    Creating archive enwik10.paq8pxd75 with 1 file(s)...

    File list (21 bytes)
    Compressed from 21 to 18 bytes.

    1/1 Filename: enwik10 (1410065408 bytes)
    Block segmentation:
    0 | default |2147483646 [0 - 4186751275]
    1 | default | 2 [2147483646 - 2147483647]
    2 | text | 135946 [2147483648 - 2147619593]
    3 | default |2147483646 [2147619594 - 4186751273]
    4 | default | 1 [135944 - 135944]
    5 | text | 135949 [135945 - 271893]
    6 | default |2147483646 [271894 - 1410065406]
    7 | ARM |1518281428 [4186751276 - 1410065407]

    Task manager:
    paq8pxd_v75_AVX2.exe (still running):
    CPU time: 302:33:06 (12.6 days)
    Peak working memory: 393,940K
    I/O read: 38,365,960,503 bytes
    I/O write: 12,515,576,592 bytes

    Files (both created 12.5+ days ago):
    enwik10.paq8pxd75 0 bytes
    tmpBAC1.tmp 7,960,731,648 bytes
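
    A side note on the log above: the reported size, 1410065408 bytes, is exactly 10^10 mod 2^32, which is what a 10,000,000,000-byte file size looks like after passing through a 32-bit unsigned field. The sketch below only demonstrates that arithmetic. It assumes enwik10 is exactly 10^10 bytes, and `truncate_to_u32` is a hypothetical helper, not a paq8pxd function.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from paq8pxd): the value a 64-bit size takes
// after being stored in a 32-bit unsigned field, i.e. size mod 2^32.
constexpr std::uint32_t truncate_to_u32(std::uint64_t size) {
    return static_cast<std::uint32_t>(size);
}

// ASSUMPTION: enwik10 is exactly 10^10 bytes.
constexpr std::uint64_t kEnwik10Size = 10000000000ULL;

// Runtime check that the truncated size matches the number shown in the log.
inline void check_logged_size() {
    assert(truncate_to_u32(kEnwik10Size) == 1410065408U);
}
```

    If that is what happens, any offset or length past 4 GB would wrap the same way, which might also explain the odd ranges in the block segmentation list.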

  12. #729
    Member DZgas's Avatar
    Join Date
    Feb 2020
    Location
    Russia
    Posts
    23
    Thanks
    8
    Thanked 7 Times in 5 Posts
    I accidentally noticed that all paq8pxd versions >10 cannot compress or decompress (they just crash) on my AMD Athlon II X4 640 processor (SSE4, 2010).
    Versions <7 work fine, and all paq8px versions work fine too.

  13. #730
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    505
    Thanks
    207
    Thanked 343 Times in 182 Posts
    Quote Originally Posted by Sportman View Post
    I tried to compress enwik10 with paq8pxd_v75_AVX2 but it never finishes.
    Will look into this next week. It should be detected as text or default. The problem is that there is a hard split at 2GB, as seen in the log. Also, detection should be over within 40-55 minutes, which did not happen in this case.
    Quote Originally Posted by DZgas View Post
    I accidentally noticed that all paq8pxd versions >10 cannot compress or decompress (they just crash) on my AMD Athlon II X4 640 processor (SSE4, 2010).
    Versions <7 work fine, and all paq8px versions work fine too.
    I think starting from versions >10 I used SSE4 as the main target. Maybe this AMD CPU lacks proper support for it, or supports it only partially? (http://www.cpu-world.com/CPUs/K10/AM...0WFGMBOX).html) Maybe compile it from source.
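
    For anyone who wants to check what their CPU reports before picking a build, here is a minimal sketch using the GCC/Clang builtin `__builtin_cpu_supports`. The names `simd_level` and `can_run_sse4_build` are made up for illustration, and treating "sse4.1" as the requirement of the SSE4 builds is an assumption. As far as I know, K10 chips such as the Athlon II report SSE4a but not SSE4.1/4.2, which would fit the crash.

```cpp
#include <cassert>
#include <string>

// Reports the highest of a few x86 SIMD levels the running CPU supports.
// Relies on the GCC/Clang builtins; on other compilers or non-x86 targets
// it returns "unknown" instead of guessing.
std::string simd_level() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))   return "avx2";
    if (__builtin_cpu_supports("sse4.1")) return "sse4.1";
    if (__builtin_cpu_supports("sse2"))   return "sse2";
    return "baseline";
#else
    return "unknown";
#endif
}

// ASSUMPTION: a build targeting SSE4 needs at least SSE4.1 (SSE4a does not count).
bool can_run_sse4_build() {
    const std::string level = simd_level();
    assert(!level.empty());
    return level == "avx2" || level == "sse4.1";
}
```

    If `can_run_sse4_build()` returns false, compiling from source with a matching `-march` is probably the safer route.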

    https://encode.su/threads/342-paq8px...ll=1#post64174
    In todo list :)

    Also got this wrton working. Some things were compiler dependent (I have gcc v8.1 and v4.9), and the order of variable initialization was also wrong (which had no effect on the older compiler).
    This allowed merging the better pdf compression. On reymont it was something like 14kb better.


    Just need some free time to think about this. :)
    KZo


  14. #731
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    505
    Thanks
    207
    Thanked 343 Times in 182 Posts
    Progress so far.

    Quote Originally Posted by Shelwien View Post
    Code:
    https://mega.nz/#!9f4DxYJR!wgX4bEvR96sPCcVLWu05TUA3WmgM9ZIQ9jyTo1QJ0Hg
    1,872,327,733 // 7z a -mx=9 -mf=off -ms=1t -m0=lzma:d=1536m:lc=8:lp=0:pb=0:fb=273 enwik10.7z enwik10
    Code:
    7689823266 pxd -s0 (time 1 hour)
    1865350519 7z (time 1,5 hours)
    There are some stupid strings in dict like:

    Code:
    000–15
    
    
    1815–1816
    1815–1817
    1815–1818
    1815–1824
    1815–1830
    – is a (3 bytes) utf8 char and is treated as such.
    Overall there are about 310000 words. There are some utf8 chars at the beginning and mostly single utf8 chars at the end of the dict.
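
    To illustrate the byte count: the en dash (U+2013) in strings like 1815–1816 is encoded in UTF-8 as the three bytes E2 80 93. Below is a rough sketch of reading the sequence length from the lead byte. The function names are hypothetical and this is not how the wrt code in paq8pxd does it; it also does no validation of continuation bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Length in bytes of a UTF-8 sequence, judged from its lead byte only.
// Returns 0 for a continuation byte or an invalid lead byte.
int utf8_sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx (the en dash starts with 0xE2)
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;
}

// Counts characters in a UTF-8 string; a stray byte is counted as one unit.
std::size_t count_utf8_chars(const std::string& s) {
    std::size_t chars = 0;
    for (std::size_t i = 0; i < s.size(); ++chars) {
        int len = utf8_sequence_length(static_cast<unsigned char>(s[i]));
        if (len == 0) len = 1;  // skip over an unexpected byte
        assert(len >= 1 && len <= 4);
        i += static_cast<std::size_t>(len);
    }
    return chars;
}
```

    So a dict entry like 1815–1816 is 11 bytes but only 9 characters.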

    For 10GB there is about 75GB of reads and 45GB of writes (detect, transform, compare, compress/copy, final archive). What a waste :D
    KZo


  15. #732
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    505
    Thanks
    207
    Thanked 343 Times in 182 Posts
    paq8pxd_v76
    Code:
        - Change wordModel1 to compress pdf text (from paq8px_183fix1)
        - Fix jpeg thumbnail compression
        - Make online wrt work
        - In wrt split num/utf8 chars, also some other utf8 chars. Large file mode
        - Allow large text block detection (+2GB)
        - Set utf8 for text if found
        - Change wordModel1, recordmodel to use wrt column mode
        - Change sparsemodelx
        - Small fixes
        - Show progress when detecting data
    This wrt column mode is really helpful, mostly in the wordmodel.
    I wanted to use it a long time ago. It probably only breaks with utf8 char columns. I expect more improvements.

    dickens is about 10kb better with -s8

    EDIT:
    I uploaded v77 to git. dickens is 1kb better vs v76, and my current test is another 1kb better vs v77.
    Attached Files
    Last edited by kaitz; 19th March 2020 at 20:54.
    KZo


  16. Thanks (6):

    Darek (19th March 2020),Mike (18th March 2020),moisesmcardona (19th March 2020),schnaader (18th March 2020),Sportman (21st March 2020),User (20th March 2020)

  17. #733
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,101
    Thanks
    678
    Thanked 431 Times in 329 Posts
    @kaitz - where did you upload the v77 version?

  18. #734
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    505
    Thanks
    207
    Thanked 343 Times in 182 Posts
    Quote Originally Posted by Darek View Post
    @kaitz - where did you upload the v77 version?
    https://github.com/kaitz/paq8pxd/releases/tag/v77
    KZo


  19. Thanks:

    Darek (20th March 2020)

  20. #735
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,101
    Thanks
    678
    Thanked 431 Times in 329 Posts
    Hmmm, it looks like there is indeed an improvement for textual files, but other file types have some drawbacks.
    In total my testset got 29KB worse score (0,3%). Here are scores of my testset for paq8pxd v76.
    Attached thumbnail: paq8pxd_v76.jpg (770.7 KB)

  21. Thanks:

    kaitz (20th March 2020)

  22. #736
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    937
    Thanks
    95
    Thanked 361 Times in 252 Posts
    enwik8:
    16,314,392 bytes, 5,782.234 sec. paq8pxd v76 -s8
    16,316,789 bytes, 5,817.031 sec. paq8pxd v77 -s8

    15,965,102 bytes, 5,904.337 sec. paq8pxd v76 -s15
    15,967,512 bytes, 5,933.575 sec. paq8pxd v77 -s15
    Last edited by Sportman; 21st March 2020 at 01:16.

  23. Thanks (2):

    Darek (20th March 2020),kaitz (20th March 2020)

  24. #737
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,101
    Thanks
    678
    Thanked 431 Times in 329 Posts
    Scores of 4 corpuses for paq8pxd v76 and v77. Despite the worse scores on my testset, both paq8pxd v76 and v77 got the best scores on all 4 corpuses, with very good improvements!

    For the Silesia corpus, paq8pxd v77 is 123KB smaller than paq8pxd v75!
    Attached thumbnail: paq8pxd_v_76&77_4_Corpuses.jpg (2.25 MB)

  25. Thanks:

    kaitz (23rd March 2020)

  26. #738
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,101
    Thanks
    678
    Thanked 431 Times in 329 Posts
    some enwik scores gathered:

    16'319'686 - enwik8 -s8 by Paq8pxd_v75_AVX2
    15'976'838 - enwik8 -s15 by Paq8pxd_v75_AVX2

    16'260'265 - enwik8 -x8 by Paq8pxd_v75_AVX2
    15'912'509 - enwik8 -x15 by Paq8pxd_v75_AVX2
    15'859'187 - enwik8.drt -x15 by Paq8pxd_v75_AVX2

    125'761'484 - enwik9_1423 -x15 by Paq8pxd_v75_AVX2
    126'074'749 estimated - enwik9_1423.drt -x15 by Paq8pxd_v75_AVX2



    16'314'392 - enwik8 -s8 by Paq8pxd_v76_AVX2 - tested by Sportman
    15'965'102 - enwik8 -s15 by Paq8pxd_v76_AVX2 - tested by Sportman

    16'253'017 - enwik8 -x8 by Paq8pxd_v76_AVX2
    15'899'380 - enwik8 -x15 by Paq8pxd_v76_AVX2
    15'856'800 - enwik8.drt -x15 by Paq8pxd_v76_AVX2

    16'316'789 - enwik8 -s8 by Paq8pxd_v77_AVX2 - tested by Sportman
    15'967'512 - enwik8 -s15 by Paq8pxd_v77_AVX2 - tested by Sportman

    16'255'214 - enwik8 -x8 by Paq8pxd_v77_AVX2
    15'901'484 - enwik8 -x15 by Paq8pxd_v77_AVX2 - tested by Kaitz
    15'856'824 - enwik8.drt -x15 by Paq8pxd_v77_AVX2

    125'65x'xxx estimated - enwik9_1423 -x15 by Paq8pxd_v76_AVX2

  27. Thanks:

    kaitz (24th March 2020)

  28. #739
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    505
    Thanks
    207
    Thanked 343 Times in 182 Posts
    paq8pxd_v78
    Code:
    - Change wordModel1,recordmodel
    l.pak,k.wad not fixed for now.

    This change will mostly work only with the internal wrt; drt-processed files will not benefit from it. Most of the compression gain is on plain text files and comes from the wordmodel.

    enwik8 -s8 should be 19kb smaller.
    Attached Files
    KZo


  29. Thanks (4):

    Darek (24th March 2020),Mike (24th March 2020),moisesmcardona (24th March 2020),Sportman (29th March 2020)

  30. #740
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,101
    Thanks
    678
    Thanked 431 Times in 329 Posts
    enwik8/9 scores for paq8pxd_v76:

    15'928'916 - enwik8 -x15 by Paq8pxd_v74_AVX2
    125'752'479 - enwik9_1423 -x15 by Paq8pxd_v74_AVX2

    15'912'509 - enwik8 -x15 by Paq8pxd_v75_AVX2
    125'761'484 - enwik9_1423 -x15 by Paq8pxd_v75_AVX2

    15'899'380 - enwik8 -x15 by Paq8pxd_v76_AVX2
    125'974'773 - enwik9_1423 -x15 by Paq8pxd_v76_AVX2 - hmmm, there is a 0.17% loss to the v75 version and a 0.18% loss to the v74 version. The v74 is still the best!

    paq8pxd v77 and v78 tests ongoing.
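
    For reference, the percentages above are just the size difference divided by the baseline size. A small sketch of that calculation follows; `relative_change_percent` is a made-up helper name, and the sizes in the usage note are the enwik9 numbers from this post.

```cpp
#include <cassert>
#include <cstdint>

// Relative size change of `candidate` versus `baseline`, in percent.
// A positive result means the candidate archive is larger (a loss).
double relative_change_percent(std::uint64_t baseline, std::uint64_t candidate) {
    assert(baseline > 0);
    const double diff = static_cast<double>(candidate) - static_cast<double>(baseline);
    return 100.0 * diff / static_cast<double>(baseline);
}
```

    Calling it as `relative_change_percent(125761484, 125974773)` gives the v76 vs v75 loss on enwik9, and swapping in 125752479 as the baseline gives the loss vs v74.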

  31. #741
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,101
    Thanks
    678
    Thanked 431 Times in 329 Posts
    paq8pxd_v78 scores on my testset. In general no big changes. Some improvements for textual files. Some losses for bigger files.
    Attached thumbnail: paq8pxd_v78.jpg (769.3 KB)

  32. Thanks:

    kaitz (26th March 2020)

  33. #742
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,101
    Thanks
    678
    Thanked 431 Times in 329 Posts
    paq8pxd_v78 scores for 4 corpuses => another version with all 4 records for the paq8pxd series!
    Attached thumbnail: paq8pxd_v_78_4_Corpuses.jpg (2.29 MB)

  34. Thanks:

    kaitz (26th March 2020)

  35. #743
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,101
    Thanks
    678
    Thanked 431 Times in 329 Posts
    First enwik scores:

    16'319'686 - enwik8 -s8 by Paq8pxd_v75_AVX2
    16'314'392 - enwik8 -s8 by Paq8pxd_v76_AVX2 = -5'300 bytes
    16'316'789 - enwik8 -s8 by Paq8pxd_v77_AVX2 = +2'400 bytes
    16'291'281 - enwik8 -s8 by Paq8pxd_v78_AVX2 = -25'500 bytes -> good improvement!

  36. #744
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,101
    Thanks
    678
    Thanked 431 Times in 329 Posts
    Other enwik8 scores:
    16'316'789 - enwik8 -s8 by Paq8pxd_v77_AVX2
    15'967'512 - enwik8 -s15 by Paq8pxd_v77_AVX2
    16'255'214 - enwik8 -x8 by Paq8pxd_v77_AVX2
    15'901'484 - enwik8 -x15 by Paq8pxd_v77_AVX2
    15'856'824 - enwik8.drt -x15 by Paq8pxd_v77_AVX2


    16'291'281 - enwik8 -s8 by Paq8pxd_v78_AVX2
    15'941'450 - enwik8 -s15 by Paq8pxd_v78_AVX2
    16'231'687 - enwik8 -x8 by Paq8pxd_v78_AVX2
    15'877'659 - enwik8 -x15 by Paq8pxd_v78_AVX2
    15'852'312 - enwik8.drt -x15 by Paq8pxd_v78_AVX2 - drt got a smaller improvement than the plain file; however, it still gives the best score ever for the paq8pxd series!

    enwik9 estimate = 125'802'xxx - very close to paq8pxd v74!

  37. #745
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,101
    Thanks
    678
    Thanked 431 Times in 329 Posts
    125'752'479 - enwik9_1423 -x15 by Paq8pxd_v74_AVX2
    125'797'519 - enwik9_1423 -x15 by Paq8pxd_v78_AVX2 - slightly worse than paq8pxd v74

  38. Thanks:

    kaitz (29th March 2020)

  39. #746
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,840
    Thanks
    288
    Thanked 1,243 Times in 696 Posts
    I found a suspicious thing:
    bufn.setsize(0x10000);
    if (level>=9) buf.setsize(0x10000000); //limit 256mb
    else buf.setsize(MEM()*8);

    Do I read it right and paq8pxd uses 256mb buffer for enwik9 here?
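
    To put a number on it: 0x10000000 is 268,435,456 bytes (256 MiB), so on a 10^9-byte input such a buffer would hold only about 27% of the data at any time. The sketch below mirrors the quoted branch. Note that `mem_for_level` is an ASSUMPTION standing in for `MEM()` (modelled here as 0x10000 << level, as in other paq8 variants, and not checked against the paq8pxd source), and the function names are made up.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMPTION: stand-in for MEM(), modelled as 0x10000 << level.
std::uint64_t mem_for_level(int level) {
    assert(level >= 0 && level <= 15);
    return static_cast<std::uint64_t>(0x10000) << level;
}

// Mirrors the quoted branch: the buffer is capped at 256 MiB from level 9 up.
std::uint64_t buf_bytes(int level) {
    if (level >= 9) return 0x10000000ULL;  // limit 256mb
    return mem_for_level(level) * 8;
}

// Fraction of the input that fits in the buffer at once (1.0 = all of it).
double buffer_coverage(std::uint64_t input_bytes, int level) {
    const std::uint64_t buf = buf_bytes(level);
    if (buf >= input_bytes) return 1.0;
    return static_cast<double>(buf) / static_cast<double>(input_bytes);
}
```

    Under these assumptions, `buffer_coverage(1000000000, 15)` comes out a bit under 0.27, while enwik8 (10^8 bytes) fits entirely at the same level.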

  40. Thanks:

    kaitz (28th March 2020)

  41. #747
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    505
    Thanks
    207
    Thanked 343 Times in 182 Posts
    Quote Originally Posted by Shelwien View Post
    I found a suspicious thing:
    bufn.setsize(0x10000);
    if (level>=9) buf.setsize(0x10000000); //limit 256mb
    else buf.setsize(MEM()*8);

    Do I read it right and paq8pxd uses 256mb buffer for enwik9 here?
    True, I did not realize it myself. Again, it was set a long time ago to reduce memory usage.

    Also made quick test. Only matchmodels active:
    174008558 250mb (buf)
    172560820 1gb (buf)
    KZo


  42. #748
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    505
    Thanks
    207
    Thanked 343 Times in 182 Posts
    paq8pxd_v79
    Code:
    - Change wordModel1
           some html entities rollback
    - Some fixes
    enwik8 -s8 is about 18kb smaller than v78.
    Attached Files
    KZo


  43. Thanks (2):

    Darek (29th March 2020),Sportman (29th March 2020)

  44. #749
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,101
    Thanks
    678
    Thanked 431 Times in 329 Posts
    Quote Originally Posted by kaitz View Post
    True, I did not realize it myself. Again, it was set a long time ago to reduce memory usage.

    Also made quick test. Only matchmodels active:
    174008558 250mb (buf)
    172560820 1gb (buf)
    @Kaitz - did you change the buffer limit in paq8pxd v79 version?

  45. #750
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    505
    Thanks
    207
    Thanked 343 Times in 182 Posts
    Quote Originally Posted by Darek View Post
    @Kaitz - did you change the buffer limit in paq8pxd v79 version?
    Yes.
    KZo


  46. Thanks:

    Darek (29th March 2020)
