Page 29 of 36 (results 841 to 870 of 1056)

Thread: Paq8pxd dict

  #841
    Member (Poland, Warsaw)
    Quote (originally posted by suryakandau@yahoo.co.id):
    @darek Could you test paq8sk2 -x11 -w on enwik9, please?
    I could do it. However, I've started testing some paq8pxd options on enwik9. My computer has "only" 32 GB of RAM, and running two instances at once overflows the memory, which makes simultaneous tests slower overall than running one instance at a time...
    After that I'll test it. I also need to check how long it would take. A compression run longer than 2-3 days is a bit risky because of possible interruptions... but I've already run 2-week instances. I'll check.

    Question: why the -x11 option? As far as I found, it's above the HP (Hutter Prize) requirements? Maybe it would be better to test -x15?

  #842
    Administrator Shelwien (Kharkov, Ukraine)

  Thanks:

    Darek (17th April 2020)

  #843
    Member (Poland, Warsaw)
    No. Looks promising, but is this real?

  #844
    Member (Puerto Rico)
    Quote (originally posted by Darek):
    No. Looks promising, but is this real?
    I'm testing it with my natively compiled paq8px v187 for Linux, and it seems to work. You just have to upload the file manually or download it with wget. Also, there seems to be a maximum limit of 12 hours on the virtual machines, which means a VM may be shut down at any time but can run for up to 12 hours.

    https://research.google.com/colaboratory/faq.html

  Thanks:

    Darek (18th April 2020)

  #845
    Administrator Shelwien (Kharkov, Ukraine)
    I guess kaitz can add model dump/restore and time limit :)

  #846
    Programmer schnaader (Hessen, Germany)
    Quote (originally posted by Shelwien):
    I guess kaitz can add model dump/restore and time limit
    Also, don't miss Colab Pro: it's not completely free, but at $9.99 per month it isn't too expensive, and it has a 24-hour timeout.
    http://schnaader.info
    Damn kids. They're all alike.

  Thanks:

    Darek (18th April 2020)

  #847
    Member (Estonia)
    I have about 2 weeks to test or add new things; after that I'll be offline until the new year.
    Reminder: I am a regular user, so don't expect magical improvements. My target is 3rd place on LTCB.
    Still, there are code and math tags in the wiki; maybe PDF-like processing? I can't test now, as WRT online is broken again. And some models give constant improvements, so maybe testing without them would save testing time.
    And if some brave soul adds the WRT part to cmix and drops some models (paq)...
    Compared to cmix, pxd is at about the level of cmix v7 or v8.
    KZo


  #848
    Member DZgas (Russia)
    Oh, paq8pxd stagnation until the new year...

    Compressing 10000 blank files:
    Code:
                bytes
    tar     5 121 536
    zip     1 397 926
    7z          4 169
    paq8px v136   257
    paq8pxd_v11   293
    paq8pxd_v25   380
    paq8pxd_v40   303
    paq8pxd_v50   313
    paq8pxd_v55   251
    paq8pxd_v58   248
    paq8pxd_v67   220 best
    paq8pxd_v69   222
    paq8pxd_v70   224
    paq8pxd_v71   235
    paq8pxd_v75   232
    paq8pxd_v80   230
    paq8pxd_v83   229
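A test set like this can be recreated with a short script. This is a minimal Python sketch, assuming "blank" means zero-byte files in a single directory; the directory name and the `file_00000` naming scheme are hypothetical, since the post does not say how the files were named.

```python
from pathlib import Path


def make_blank_files(directory: str, count: int = 10000) -> list[Path]:
    """Create `count` zero-byte files in `directory` and return their paths.

    The naming scheme (file_00000, file_00001, ...) is hypothetical.
    """
    root = Path(directory)
    root.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(count):
        path = root / f"file_{i:05d}"
        path.write_bytes(b"")  # create or truncate, so every file is 0 bytes
        paths.append(path)
    return paths
```

For example, `make_blank_files("blank_files")` creates the 10000 files, and the directory can then be passed to each archiver.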

  #849
    Member (Indonesia)
    Quote (originally posted by Darek):
    I could do it. However, I've started testing some paq8pxd options on enwik9. My computer has "only" 32 GB of RAM, and running two instances at once overflows the memory, which makes simultaneous tests slower overall than running one instance at a time...
    After that I'll test it. I also need to check how long it would take. A compression run longer than 2-3 days is a bit risky because of possible interruptions... but I've already run 2-week instances. I'll check.

    Question: why the -x11 option? As far as I found, it's above the HP (Hutter Prize) requirements? Maybe it would be better to test -x15?
    Yes, it would be better to test with -x15; maybe the result will be better than, or the same as, cmix v7 or cmix v8 on enwik9.

  #850
    Administrator Shelwien (Kharkov, Ukraine)
    I tried adding a better AC (arithmetic coder).
    I had to use v82s, since there it's possible to use the SSE prediction for coding directly, without rounding it to 12 bits.
    Code:
                  paq8pxd82s paq8pxd82sa // -x7
    00s  1,048,576       442       216 // 1M of zero bytes
    FFs  1,048,576       443       355 // 1M of FF bytes[*]
    dups 1,048,576       557       396 // repeated 100-byte string of random bytes
    rand 1,048,576 1,049,202 1,049,198 // 1M chunk of some archive
    BOOK1  768,771   182,660   182,646 // calgary book1 [**]
    [*] The different results for 00s and FFs are caused by paq's 0 turning into 1/32768 and paq's 4095 turning into 1-8/32768.
    [**] Could be 4 bytes less, but a flush after the header is incompatible with RC tail cutting.
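Footnote [*] comes down to per-bit coding cost: an ideal arithmetic coder spends -log2(q) bits on a bit it predicted with probability q. This minimal Python sketch compares the two clamped values from the footnote under the simplifying assumption that every bit of the 1M input is coded at exactly that clamp. Real models warm up and mix other predictions, so the totals are illustrative only and are not expected to reproduce the table.

```python
import math

SCALE = 32768            # probability scale used in the footnote (15 bits)
NBITS = 1_048_576 * 8    # 1M bytes of identical values, coded bit by bit


def bits_per_symbol(p_miss: float) -> float:
    """Ideal cost in bits of coding a bit that was predicted with probability 1 - p_miss."""
    return -math.log2(1.0 - p_miss)


# 00s: every bit is 0; paq's p=0 becomes 1/32768, leaving 1/32768 for a 1 bit.
cost_00 = bits_per_symbol(1 / SCALE)
# FFs: every bit is 1; paq's 4095 becomes 1-8/32768, leaving 8/32768 for a 0 bit.
cost_ff = bits_per_symbol(8 / SCALE)

print(f"per bit: 00s {cost_00:.3e}, FFs {cost_ff:.3e}, ratio {cost_ff / cost_00:.2f}")
print(f"1M bytes at the clamp: 00s {NBITS * cost_00 / 8:.1f} B, FFs {NBITS * cost_ff / 8:.1f} B")
```

In this simplified model, each bit of the FFs input costs about eight times as much as each bit of the 00s input, which matches the direction of the 00s/FFs gap in both columns.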

  #851
    Member (Poland, Warsaw)
    Quote (originally posted by kaitz):
    Code:
    enwik9    128613028    -x9 -w    Paq8pxd_v82_AVX2 5162 MB
    enwik9    126615900    -x10 -w    Paq8pxd_v82_AVX2 8482 MB
    enwik9    126272866    -x10 -w    paq8pxd82
    enwik9    126290118    -x10 -w    Paq8pxd_v83_AVX2
    How much of this is due to reordering vs. mod_sse, I don't know. Updated above.

    Code:
    enwik8    16154836    -x8 -w    Paq8pxd_v82_AVX2
    enwik8    16125181    -x8 -w    Paq8pxd_v83_AVX2
    Results by Paq8pxd_v82_AVX2 ("actually used" memory is the peak value shown in Task Manager):
    125'622'750 - enwik9 -x11 -w, time 83'126,17 s, mem declared 13'074 MB, actually used 10'850 MB
    125'241'759 - enwik9 -x12 -w, time 84'156,91 s, mem declared 16'114 MB, actually used 13'830 MB
    124'992'597 - enwik9 -x13 -w, time 83'127,00 s, mem declared 22'194 MB, actually used 19'343 MB
    124'741'041 - enwik9 -x14 -w, time 85'134,18 s, mem declared 27'422 MB, actually used 22'625 MB
    124'688'860 - enwik9 -x15 -w, time 85'118,42 s, mem declared 33'310 MB, actually used 26'000 MB
    Last edited by Darek; 23rd April 2020 at 23:52.
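These results also show how quickly the gain shrinks per extra gigabyte, which bears on the earlier -x11 vs. -x15 question. This small Python sketch only reuses the posted enwik9 sizes and peak memory values; the dictionary layout and the `marginal_gains` helper are illustrative, not part of paq8pxd.

```python
# enwik9 results for Paq8pxd_v82_AVX2 as posted:
# option -> (compressed size in bytes, memory actually used in MB)
RESULTS = {
    "-x11": (125_622_750, 10_850),
    "-x12": (125_241_759, 13_830),
    "-x13": (124_992_597, 19_343),
    "-x14": (124_741_041, 22_625),
    "-x15": (124_688_860, 26_000),
}


def marginal_gains(results):
    """Yield (from, to, bytes saved, extra MB) for each consecutive option step."""
    options = list(results)  # dicts keep insertion order (Python 3.7+)
    for prev, cur in zip(options, options[1:]):
        (size_a, mem_a), (size_b, mem_b) = results[prev], results[cur]
        yield prev, cur, size_a - size_b, mem_b - mem_a


for prev, cur, saved, extra_mb in marginal_gains(RESULTS):
    print(f"{prev} -> {cur}: {saved:>9,} bytes saved for {extra_mb:>5,} MB more")
```

For these numbers, the -x14 to -x15 step saves 52,181 bytes for 3,375 MB more, while -x11 to -x12 saves 380,991 bytes for 2,980 MB more.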

  Thanks:

    kaitz (20th April 2020)

  #852
    Administrator Shelwien (Kharkov, Ukraine)
    Code:
    // -x7 -w
    16,300,372 enwik8.paq8pxd82s 
    16,297,531 enwik8.paq8pxd82sa

  Thanks:

    kaitz (20th April 2020)

  #853
    Member (Estonia)
    paq8pxd_v84
    Code:
        - adjust jpeg, im1 predictor
        - add mod_SSE to all predictors except jpeg
        - add external dictionary command line option
        - add RC to encoder (paq8pxd82sa)
        - change ppmd mem
        - in wordmodel process math tag (wiki)
    How to use an external dictionary:
    Code:
    paq8pxd_v84   -s8 -eNum,dictfile infile
    where Num is the minimum frequency for words from the external dictfile
    Code:
    paq8pxd_v84   -s8 -e10,dictfile world95.txt
    Most smaller files should use a minfq of 10 or larger.
    Do not use it for nci etc., unless you make the dict file yourself.

    Line endings in the external dict must be \n (including the last line).
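A dictionary saved with Windows (CRLF) line endings, or without a final newline, can be fixed before passing it to -e. This is a minimal Python sketch, assuming a plain word list with one word per line; `normalize_dict` and the file names in the example are hypothetical. It works on raw bytes, so UTF-8 words are left untouched.

```python
from pathlib import Path


def normalize_line_endings(data: bytes) -> bytes:
    """Convert CRLF and lone CR to LF and make sure non-empty data ends with LF."""
    data = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    if data and not data.endswith(b"\n"):
        data += b"\n"  # the last line must be terminated too
    return data


def normalize_dict(src: str, dst: str) -> None:
    """Write a copy of the dictionary file `src` to `dst` using LF line endings only."""
    Path(dst).write_bytes(normalize_line_endings(Path(src).read_bytes()))
```

For example, `normalize_dict("english.dic", "english_lf.dic")` writes a copy that can then be given as `-e10,english_lf.dic`.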

    For wiki:
    Code:
    paq8pxd_v84   -s8 -w -e10,dictfile enwik7
    paq8pxd_v84   -s8 -w -e1,dictfile enwik8
    When an external dict is used, some parameters are hard-coded in WRT:
    UTF-8 chars are added to the dict no matter what, and the minfq for dynamic words is doubled (*2), with length >3.
    If it crashes, then don't use -e.
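The selection rules above can be modelled roughly like this: external words need at least Num occurrences in the input (as explained later in the thread), and dynamic words need 2*Num occurrences and more than 3 characters. This is a simplified Python sketch of that reading, not paq8pxd's WRT code. The tokenizer, the case-sensitive matching and the `select_words` name are assumptions, and the unconditional handling of UTF-8 chars is left out.

```python
import re
from collections import Counter


def select_words(text: str, external: set[str], num: int) -> tuple[set[str], set[str]]:
    """Rough, hypothetical model of -eNum word selection.

    Returns (external words kept, dynamic words kept):
    - a word from the external dictionary is kept if it occurs >= num times;
    - any other word is kept if it occurs >= 2*num times and is longer than 3 chars.
    """
    counts = Counter(re.findall(r"[A-Za-z]+", text))  # assumed tokenizer: ASCII letter runs
    kept_external = {w for w in external if counts[w] >= num}
    kept_dynamic = {
        w for w, c in counts.items()
        if w not in external and c >= 2 * num and len(w) > 3
    }
    return kept_external, kept_dynamic


sample = "the cat and the dog and the bird, bird, bird, bird"
ext, dyn = select_words(sample, external={"the", "and", "fish"}, num=2)
print(sorted(ext), sorted(dyn))
```

With num=1, as in -e1, the dynamic threshold in this model is 2.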

    With https://github.com/byronknoll/cmix/t...ter/dictionary
    enwik8 -x15 -w is about 1568xxxx bytes.


    The math tag thingy gives only a 1 KB improvement on enwik8; on enwik7 it was about 2 KB. So it's so-so.
    KZo


  Thanks (7):

    Darek (20th April 2020),kampaster (25th April 2020),Mike (22nd April 2020),moisesmcardona (20th April 2020),Shelwien (20th April 2020),Sportman (21st April 2020),xinix (4th May 2020)

  #854
    Member (Puerto Rico)
    Thanks @kaitz. Are any of these changes worth porting to paq8px?

  #855
    Member (Planet Earth)
    enwik8:
    15,809,103 bytes, 7,522.931 sec., paq8pxd_v84_avx2 -x15 -w

  #856
    Member (Poland, Warsaw)
    My testset scores for paq8pxd_v84. It looks like this SSE mod works completely differently from Shelwien's implementation, or it's something else.
    In general, the score is about 8'400 bytes worse than the paq8pxd_v83 version. Compression got worse than in the previous version especially for audio and for bigger 24-bit image files.
    Compared to Shelwien's version, there is a 14 KB difference (second table)... @Kaitz, maybe you could use SSE the same way Shelwien does?
    [Attached images: paq8pxd_v84.jpg, paq8pxd_v84_vs_SSE.jpg]

  #857
    Member (Estonia)
    Sorry about that, I forgot to round down the prediction.
    KZo


  #858
    Member (Estonia)
    paq8pxd_v85
    Code:
    fix rounding error
    KZo


  Thanks (5):

    Darek (21st April 2020),DZgas (27th April 2020),kampaster (25th April 2020),moisesmcardona (21st April 2020),Sportman (21st April 2020)

  #859
    Member (Planet Earth)
    enwik8:
    15,794,654 bytes, 8,417.468 sec., paq8pxd_v85_avx2 -x15 -w

  Thanks:

    Darek (21st April 2020)

  #860
    Member (Poland, Warsaw)
    Two questions about using the dictionary file:
    1. I cannot download Byron's dictionary file from GitHub... I don't know why, but every time an 8.5 KB HTML file is downloaded. How can I download this dictionary? Could I just copy the list of words into the "dic" file?
    2. When the dictionary is used, is there any message about it?

    I've found that for my DOC files, using just -e10 (without anything else) gives a few additional bytes of gain...

    I can't get any dictionary to work with this version... I think. I get the same scores whether I type the dictionary name or not...

    And the scores for my testset for paq8pxd_v85: an 8 KB gain, and a 3 KB better score than paq8pxd_v82_SSE. Still not better than paq8pxd_v75 (which is 20 KB smaller), but it's a step in the right direction.
    Very good scores for the 24 bpp image files: A.TIF gets the second-best score ever, and B.TGA the best score ever of all compressors!
    [Attached image: paq8pxd_v85.jpg]

  Thanks:

    kaitz (21st April 2020)

  #861
    Member (Estonia)
    Quote (originally posted by Darek):
    Two questions about using the dictionary file:
    1. I cannot download Byron's dictionary file from GitHub... I don't know why, but every time an 8.5 KB HTML file is downloaded. How can I download this dictionary? Could I just copy the list of words into the "dic" file?
    https://raw.githubusercontent.com/byronknoll/cmix/master/dictionary/english.dic
    and save it where the program is.
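The 8.5 KB file was most likely GitHub's HTML page for the file rather than the dictionary itself, which is why the raw.githubusercontent.com link works. This minimal Python sketch downloads that raw URL and refuses to save an HTML page; `fetch_dictionary`, `looks_like_html` and the check itself are hypothetical helpers, not part of paq8pxd or cmix.

```python
import urllib.request
from pathlib import Path

RAW_URL = "https://raw.githubusercontent.com/byronknoll/cmix/master/dictionary/english.dic"


def looks_like_html(data: bytes) -> bool:
    """Heuristic: a web page (instead of the word list) starts with an HTML tag."""
    head = data.lstrip()[:100].lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


def fetch_dictionary(url: str = RAW_URL, dest: str = "english.dic") -> int:
    """Download `url` to `dest` (put it next to the compressor) and return its size in bytes."""
    with urllib.request.urlopen(url, timeout=60) as response:
        data = response.read()
    if looks_like_html(data):
        raise ValueError("got an HTML page, not the dictionary; use the raw URL")
    Path(dest).write_bytes(data)
    return len(data)
```

Calling `fetch_dictionary()` saves `english.dic` in the current directory, which can then be used as `-e10,english.dic`.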
    Quote (originally posted by Darek):
    2. When the dictionary is used, is there any message about it?
    Only when compressing: the WRT dict count shows the total number of words.

    Example with world95.txt:

    No external dictionary:
    Code:
    paq8pxd_v85_AVX2 -s8  world95.txt
    Creating archive world95.txt.paq8pxd85 with 1 file(s)...
    
    
    File list (21 bytes)
    Compressed from 21 to 23 bytes.
    
    
    1/1  Filename: world95.txt (2988578 bytes)
    Block segmentation:
     0           | eoltxt    |   2988578 [0 - 2988577]
    
     Segment data size: 40 bytes
    
     TN |Type name |Count      |Total size
    -----------------------------------------
     34 |eoltxt    |         1 |   2988578
    -----------------------------------------
    Total level  0 |         1 |   2988578
    
    text wrt stream(9).  Total 2899483
    compressed stream(12).  Total 627
     Total 2899483 wrt: 1830527
    WRT dict count 1841 words.
    ^CTerminate batch job (Y/N)? n
    External dictionary:
    Code:
    paq8pxd_v85_AVX2 -s8  -e10,english.dic.txt  world95.txt
    Creating archive world95.txt.paq8pxd85 with 1 file(s)...
    
    
    File list (21 bytes)
    Compressed from 21 to 23 bytes.
    
    
    1/1  Filename: world95.txt (2988578 bytes)
    Block segmentation:
     0           | eoltxt    |   2988578 [0 - 2988577]
    
     Segment data size: 40 bytes
    
     TN |Type name |Count      |Total size
    -----------------------------------------
     34 |eoltxt    |         1 |   2988578
    -----------------------------------------
    Total level  0 |         1 |   2988578
    
    text wrt stream(9).  Total 2899483
    compressed stream(12).  Total 627
     Total 2899483 wrt: 1853647
    WRT dict count 2615 words.
    ^CTerminate batch job (Y/N)? n
    Maybe -e11 works better... or -e15; it's endless.

    Code:
    enwik8    15862215    -s15 -w    Paq8pxd_v85_AVX2
    enwik8    15749003    -s15 -w -e1,english.dic.txt    Paq8pxd_v85_AVX2
    KZo


  Thanks:

    Darek (21st April 2020)

  #862
    Member (Poland, Warsaw)
    @Kaitz, last question for a while: is the name of the english.dic file you use "english.dic" or "english.dic.txt"?

  #863
    Member (Estonia)
    Yes, it's just a file name like any other.
    KZo


  #864
    Member (Poland, Warsaw)
    One strange thing. I mentioned it above, but I've double-checked it, and compression and decompression are OK:

    paq8pxd_v85 -x9 R.DOC => 27'333 bytes
    paq8pxd_v85 -x9 -e1 R.DOC => 27'290 bytes - funny, just using -e1 without a dictionary gives 43 bytes of gain

    paq8pxd_v85 -x9 S.DOC => 24'284 bytes
    paq8pxd_v85 -x9 -e1 S.DOC => 24'123 bytes - the same effect: -e1 alone gives 161 bytes of gain

    This effect is visible only with these two files from my testset.

  #865
    Member (Puerto Rico)
    What does the -e flag do?

  #866
    Member (Estonia)
    Quote (originally posted by Darek):
    One strange thing. I mentioned it above, but I've double-checked it, and compression and decompression are OK:

    paq8pxd_v85 -x9 R.DOC => 27'333 bytes
    paq8pxd_v85 -x9 -e1 R.DOC => 27'290 bytes - funny, just using -e1 without a dictionary gives 43 bytes of gain

    paq8pxd_v85 -x9 S.DOC => 24'284 bytes
    paq8pxd_v85 -x9 -e1 S.DOC => 24'123 bytes - the same effect: -e1 alone gives 161 bytes of gain

    This effect is visible only with these two files from my testset.
    It was not intended, but it means that the dynamic minimum word count is multiplied by 2. So probably only some words, or none, go to the dictionary. As stated above, the WRT word count shows the difference.

    Quote (originally posted by moisesmcardona):
    What does the -e flag do?
    It selects the minimum count that a word from the external dictionary file must reach in the file being compressed; otherwise the word is not used.
    KZo


  #867
    Member (Puerto Rico)
    Great work on v85! TIFF file compression is better than with paq8px.

  #868
    Member (Poland, Warsaw)
    enwik scores for paq8pxd v85 with Byron's dictionary:

    15'674'853 - enwik8 -x15 -w -e1,english.dic by Paq8pxd_v85, time 10'775,22s
    123'523'709 - enwik9 -x15 -w -e1,english.dic by Paq8pxd_v85, time 114'420,64s

    The two best scores for the whole enwik8 series.

  Thanks:

    kaitz (29th April 2020)

  #869
    Member (Estonia)
    Quote (originally posted by Darek):
    enwik scores for paq8pxd v85 with Byron's dictionary:

    15'674'853 - enwik8 -x15 -w -e1,english.dic by Paq8pxd_v85, time 10'775,22s
    123'523'709 - enwik9 -x15 -w -e1,english.dic by Paq8pxd_v85, time 114'420,64s

    The two best scores for the whole enwik8 series.
    My enwik8 -x15 result was the same.

    enwik9 124315764 -s15 -w -e1,english.dic.txt Paq8pxd_v85_AVX2

    My latest test (-w and wrt changes):
    [Attached image: pxdv86test.PNG]
    KZo


  Thanks:

    Darek (30th April 2020)

  #870
    Member (Poland, Warsaw)
    Scores on 4 corpora for paq8pxd v85. Like earlier versions, this version also gets the best scores on all 4 corpora.
    [Attached image: 4_Corpuses_paq8pxd_v85.jpg]
