Results 691 to 720 of 923

Thread: Paq8pxd dict

  1. #691
    Member
    Join Date
    Aug 2008
    Location
    NZ
    Posts
    59
    Thanks
    30
    Thanked 11 Times in 7 Posts
    I've run some tests with paq8pxd V73 and added the results to the table below.

    Tests run under Windows 7 64 bit, with i5-3570k CPU, and 8 GB RAM. Used SSE41 compiles of paq8pxd V*.

    Code:
    Compressor               Total file(s) size (bytes)    Compression time (seconds)    Compression options
    Original 171 jpg files         64,469,752
    paq8pxd v69                    51,365,725                          7,753                      -s9
    paq8pxd v72                    51,338,132                          7,533                      -s9
    paq8pxd v73                    51,311,533                          7,629                      -s9
    
    Tarred jpg files               64,605,696
    paq8pxd v69                    50,571,934                          7,897                      -s9
    paq8pxd v72                    50,552,930                          7,756                      -s9
    paq8pxd v73                    50,530,038                          7,521                      -s9
    Overall, improved compression, and slight reduction in compression time for v73!
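As a quick cross-check of the table above, here is a minimal Python sketch that turns the reported sizes into space savings. The sizes are copied from the table (171 separate JPG files, -s9); the helper name `saving_percent` is made up for illustration.

```python
# Sizes in bytes, copied from the table above (171 separate JPG files, -s9).
original = 64_469_752
results = {"v69": 51_365_725, "v72": 51_338_132, "v73": 51_311_533}


def saving_percent(original_size, compressed_size):
    """Space saved relative to the original size, in percent."""
    return 100.0 * (original_size - compressed_size) / original_size


for version, size in results.items():
    print(f"paq8pxd {version}: {saving_percent(original, size):.3f}% saved")

# Absolute gain of v73 over v69 on the same input, in bytes.
gain_v69_to_v73 = results["v69"] - results["v73"]
print(f"v69 -> v73 gain: {gain_v69_to_v73} bytes")
```

The same helper can be pointed at the tarred-file rows by swapping in 64,605,696 as the original size.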

  2. Thanks (2):

    Darek (18th February 2020),kaitz (18th February 2020)

  3. #692
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,103
    Thanks
    679
    Thanked 436 Times in 334 Posts
    Scores on my testset for paq8pxd_v73: very nice improvements overall, especially for the K.WAD file. The total gain is 25KB.
    The -x option (second table) gives an additional 16KB of gain over the -s test, and the gains are visible for almost every file.
    The time penalty for the -x option compared to -s is about 21%.
    Attached Thumbnails: paq8pxd_v73_s.jpg, paq8pxd_v73_x.jpg

  4. Thanks:

    kaitz (18th February 2020)

  5. #693
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    505
    Thanks
    208
    Thanked 346 Times in 184 Posts
    Quote Originally Posted by brispuss View Post
    Overall, improved compression, and slight reduction in compression time for v73!
    In single mode there is a file overhead of about 50 bytes per file vs. the px version, like when you compress a 1-byte file.
    For this single-mode test I think it's about 9000 bytes in total. I'm not sure how much overhead there is on the tarred file, probably 100 bytes in total for the input data (data that can't be compressed).
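A rough sanity check of that overhead figure, assuming exactly 50 bytes per file and the 171 files from brispuss's test (both numbers come from the posts above; the variable names are only illustrative):

```python
# Assumption: a flat 50 bytes of per-file overhead, as estimated in the post.
overhead_per_file = 50
file_count = 171  # number of JPG files in brispuss's test

estimated_overhead = overhead_per_file * file_count
print(f"estimated single-mode overhead: {estimated_overhead} bytes")

# Share of the v73 single-mode result (51,311,533 bytes) that this represents.
v73_single_total = 51_311_533
overhead_share = 100.0 * estimated_overhead / v73_single_total
print(f"share of compressed total: {overhead_share:.4f}%")
```

This lands in the same range as the "about 9000 bytes" mentioned above, and it is a very small fraction of the compressed total.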
    KZo


  6. Thanks:

    brispuss (20th February 2020)

  7. #694
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,103
    Thanks
    679
    Thanked 436 Times in 334 Posts
    @kaitz - I have a question: does the -x option work at level 15 (-x15)? I've tested the enwik8 files, and the scores for -s15 and -x15 are identical. The timings are also similar.
    For my testset, there are differences between the -s9 and -x9 options for textual files.

    On the other hand, the scores are great:

    16'013'638 - enwik8 -s15 by Paq8pxd_v72_AVX2
    16'672'036 - enwik8.drt -s15 by Paq8pxd_v72_AVX2

    15'993'409 - enwik8 -s15 by Paq8pxd_v73_AVX2
    15'956'537 - enwik8.drt -s15 by Paq8pxd_v73_AVX2

  8. #695
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    505
    Thanks
    208
    Thanked 346 Times in 184 Posts
    My bad: at level 10 and up, -x currently runs in -s mode.
    KZo


  9. #696
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,103
    Thanks
    679
    Thanked 436 Times in 334 Posts
    On the one hand, that's good news -> if you could change this, the scores for enwik8 could be even better!

  10. #697
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,103
    Thanks
    679
    Thanked 436 Times in 334 Posts
    Scores of 4 corpuses for paq8pxd_v73 with -s15 mode. The best scores for Calgary, Canterbury and MaximumCompression for paq8pxd family!
    Attached Thumbnails: 4_Corpuses_paq8pxd_v73_s.jpg

  11. Thanks:

    kaitz (19th February 2020)

  12. #698
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    505
    Thanks
    208
    Thanked 346 Times in 184 Posts
    paq8pxd_v74
    Code:
    fix -x option on level 10-15
    -x option has affect only on default, text mode.
    Attached Files
    KZo


  13. Thanks (2):

    Darek (19th February 2020),moisesmcardona (27th February 2020)

  14. #699
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,103
    Thanks
    679
    Thanked 436 Times in 334 Posts
    @kaitz - first, thanks for fixing this. Second, "-x option has affect only on default, text mode." - does that mean it is ineffective for other types of data?

  15. #700
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    505
    Thanks
    208
    Thanked 346 Times in 184 Posts
    It's used in the stream where all default data is. Also for text, as humans tend to produce a lot of it. :)
    Some images have headers and it may give somewhat better compression on those, but that's not very useful. It all depends (how many files, etc.).


    It needs time-consuming testing to be actually useful on other types of data.
    My test version shows which contexts are mostly bad for the given data over time.


    I think there have been good enough improvements for someone like me. But I still wonder.
    KZo


  16. Thanks:

    Darek (20th February 2020)

  17. #701
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,103
    Thanks
    679
    Thanked 436 Times in 334 Posts
    some enwik scores for paq8pxd_v74:

    16'358'450 - enwik8 -s8 by Paq8pxd_v72_AVX2
    16'013'638 - enwik8 -s15 by Paq8pxd_v72_AVX2
    16'672'036 - enwik8.drt -s15 by Paq8pxd_v72_AVX2
    126'779'432 - enwik9_1423 -s15 by Paq8pxd_v72_AVX2
    132'464'891 - enwik9_1423.drt -s15 by Paq8pxd_v72_AVX2

    16'339'122 - enwik8 -s8 by Paq8pxd_v74_AVX2
    15'993'409 - enwik8 -s15 by Paq8pxd_v74_AVX2
    15'956'537 - enwik8.drt -s15 by Paq8pxd_v74_AVX2

    16'279'540 - enwik8 -x8 by Paq8pxd_v74_AVX2
    15'928'916 - enwik8 -x15 by Paq8pxd_v74_AVX2 - best score for non DRT preprocessed enwik8 file for whole paq8pxd family
    15'880'133 - enwik8.drt -x15 by Paq8pxd_v74_AVX2 - best overall score for enwik8 file for whole paq8pxd family - enwik9 score could be even below 126'000'000 bytes )) I'll check it.

  18. Thanks:

    kaitz (21st February 2020)

  19. #702
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,103
    Thanks
    679
    Thanked 436 Times in 334 Posts
    Scores of 4 Corpuses for paq8pxd_v74 -x15 option. Improvements to paq8pxd_v72 from 0.26% to 0.59% and of course the best scores for all four testsets for paq8pxd series.
    Attached Thumbnails: 4_Corpuses_paq8pxd_v74.jpg

  20. Thanks:

    kaitz (21st February 2020)

  21. #703
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    505
    Thanks
    208
    Thanked 346 Times in 184 Posts
    Quote Originally Posted by Darek View Post
    Some enwik scores for paq8pxd_v72:
    16'013'638 - enwik8 -s15 by Paq8pxd_v72_AVX2 - tested by Sportman, My compress time 6'811s -@Sportman - you have very fast machine!
    On my computer it takes about 9800 seconds to compress.
    KZo


  22. #704
    Member CompressMaster's Avatar
    Join Date
    Jun 2018
    Location
    Lovinobana, Slovakia
    Posts
    184
    Thanks
    49
    Thanked 13 Times in 13 Posts
    Quote Originally Posted by Darek View Post
    Scores of 4 Corpuses for paq8pxd_v74 -x15 option. Improvements to paq8pxd_v72 from 0.26% to 0.59% and of course the best scores for all four testsets for paq8pxd series.
    Why aren't the Silesia and MaximumCompression corpuses tarred?
    MaximumCompression corpus - only 40 807 bytes left to get below 6 000 000 bytes! As always, good testing, Darek!

  23. #705
    Member CompressMaster's Avatar
    Join Date
    Jun 2018
    Location
    Lovinobana, Slovakia
    Posts
    184
    Thanks
    49
    Thanked 13 Times in 13 Posts
    btw, it's interesting to see how compression is improving over time. And, maybe this is an overestimated expectation, but when do you expect the MaximumCompression corpus to go below 1 000 000 bytes?

  24. #706
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,103
    Thanks
    679
    Thanked 436 Times in 334 Posts
    Quote Originally Posted by CompressMaster View Post
    Why aren't the Silesia and MaximumCompression corpuses tarred?
    MaximumCompression corpus - only 40 807 bytes left to get below 6 000 000 bytes! As always, good testing, Darek!

    btw, it's interesting to see how compression is improving over time. And, maybe this is an overestimated expectation, but when do you expect the MaximumCompression corpus to go below 1 000 000 bytes?
    Honestly, I don't know. From the beginning these corpuses were mostly tested without a tarred-file test. I just copied the existing approach.

    Maybe it's because the tests are long, and compressing the tarred file as well doubles the test time.
    For paq8px/pxd the time is still quite reasonable, but for cmix it's an additional 3-day test per run, which can sometimes be hard to handle.

    On the other hand, a tarred-file test doesn't give you any information about the improvement on particular files.

    As for the MaximumCompression corpus estimate: 1'000'000 is a very ambitious challenge... Right now the best scores of all files add up to 5'872'598 bytes. By bringing all techniques from all compressors (especially paq8px and paq8pxd for FlashMx.pdf and vcfiu.hlp) into the currently best cmix, there is a chance to get 5'800'000, maybe 5'700'000 bytes. More parsers and better NN compression may give an additional 100-300KB of gain, so we would land at about 5'400'000-5'500'000 bytes. Getting lower would need a completely new technique or specialized parsers for all files, and then it could be 4'000'000 bytes. Hmmmm... 1'000'000 looks impossible to me for now.

    The attached table shows the best MaximumCompression corpus scores for most of the best compressors.
    Attached Thumbnails: MC_best_scores.jpg

  25. #707
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    505
    Thanks
    208
    Thanked 346 Times in 184 Posts
    The .hlp file has some LZ-compressed data; recompress it. And if you target only this set you can gain a lot. On jpeg a 1-2kb gain is possible. On im24 4kb is possible. On dict 3kb is possible. But why? Also, pxd splits data, and sometimes that is bad for compression (Silesia -> samba). Do you care about compressor/decompressor size, memory usage, time...
    The px version has gains in different places, but consider time/mem etc...
    In pxd it's clear that sometimes adding more models makes it worse: -x vs. -s option. Like adding dmc back to the jpeg stream makes it worse... (one present in pxd).
    It's hard.
    KZo


  26. #708
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,103
    Thanks
    679
    Thanked 436 Times in 334 Posts
    enwik9 score for non DRT version and -s15 (next three scores):

    126'211'491 - enwik9_1423 -s15 by Paq8pxd_v74_AVX2, time 67'980,66s.
    126'599'124 - enwik9_1423.drt -s15 by Paq8pxd_v74_AVX2, time 97604,05s - I don't know why the DRT file takes 44% longer to compress...
    125'752'479 - enwik9_1423 -x15 by Paq8pxd_v74_AVX2, time 91787,59s - a record for the whole paq8pxd series!
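The "44% more time" figure can be reproduced from the timings in this post. A minimal sketch, assuming the times use an apostrophe as thousands separator and a comma as decimal mark; `parse_forum_number` is a hypothetical helper, not part of any tool mentioned in the thread:

```python
def parse_forum_number(text):
    """Parse a number written like 67'980,66 (apostrophe groups, decimal comma)."""
    return float(text.replace("'", "").replace(",", "."))


# Timings in seconds, copied from the post above (enwik9_1423, paq8pxd_v74).
time_s15 = parse_forum_number("67'980,66")      # plain file, -s15
time_drt_s15 = parse_forum_number("97604,05")   # DRT file, -s15
time_x15 = parse_forum_number("91787,59")       # plain file, -x15


def extra_time_percent(baseline, other):
    """How much longer `other` took than `baseline`, in percent."""
    return 100.0 * (other - baseline) / baseline


drt_penalty = extra_time_percent(time_s15, time_drt_s15)
x_penalty = extra_time_percent(time_s15, time_x15)
print(f"DRT vs plain at -s15: +{drt_penalty:.1f}% time")
print(f"-x15 vs -s15 on the plain file: +{x_penalty:.1f}% time")
```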
    Last edited by Darek; 25th February 2020 at 14:06.

  27. Thanks:

    kaitz (24th February 2020)

  28. #709
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    505
    Thanks
    208
    Thanked 346 Times in 184 Posts
    paq8pxd_v75
    Code:
    - Change wordModel1
    enwik8 -s8 is about 16kb smaller. Time should be the same, memory usage maybe a bit smaller.
    Attached Files
    Last edited by kaitz; 25th February 2020 at 17:58. Reason: file
    KZo


  29. Thanks (4):

    brispuss (26th February 2020),Darek (25th February 2020),DZgas (11th March 2020),moisesmcardona (26th February 2020)

  30. #710
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    505
    Thanks
    208
    Thanked 346 Times in 184 Posts
    Quote Originally Posted by Darek View Post
    enwik9 score for non DRT version and -s15 (next three scores):

    126'211'491 - enwik9_1423 -s15 by Paq8pxd_v74_AVX2, time 67'980,66s.
    126'599'124 - enwik9_1423.drt -s15 by Paq8pxd_v74_AVX2, time 97604,05s - I don't know why DRT file is compressed 44% more time...
    It's detected as default data, and that is slower. Compression is probably worse for the same reason.
    KZo


  31. Thanks:

    Darek (26th February 2020)

  32. #711
    Member
    Join Date
    Aug 2008
    Location
    NZ
    Posts
    59
    Thanks
    30
    Thanked 11 Times in 7 Posts
    I've run some further tests with paq8pxd V75 and added the results to the table below.

    Brief tests were done with paq8pxd v74, but they showed no improvement in compression, only slightly quicker compression times. I didn't think it was worth posting results for the paq8pxd v74 tests.

    Tests run under Windows 7 64 bit, with i5-3570k CPU, and 8 GB RAM. Used SSE4 compiles of paq8pxd V*.

    Code:
    Compressor               Total file(s) size (bytes)    Compression time (seconds)    Compression options
    Original 171 jpg files         64,469,752
    paq8pxd v69                    51,365,725                          7,753                      -s9
    paq8pxd v72                    51,338,132                          7,533                      -s9
    paq8pxd v73                    51,311,533                          7,629                      -s9
    paq8pxd v75                    51,311,427                          7,509                      -s9
    
    Tarred jpg files               64,605,696
    paq8pxd v69                    50,571,934                          7,897                      -s9
    paq8pxd v72                    50,552,930                          7,756                      -s9
    paq8pxd v73                    50,530,038                          7,521                      -s9
    paq8pxd v75                    50,528,772                          7,501                      -s9
    For v75, improved compression, and slight reduction in compression time!

  33. Thanks (2):

    Darek (26th February 2020),kaitz (27th February 2020)

  34. #712
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,103
    Thanks
    679
    Thanked 436 Times in 334 Posts
    Quote Originally Posted by kaitz View Post
    It's detected as default data, and that is slower. Compression is probably worse for the same reason.
    An interesting thing is that enwik8.drt is compressed 0.23% better than plain enwik8, but enwik9.drt is compressed 0.31% worse than plain enwik9 -> that's a difference of 0.54%.
    It looks like the text/word model is more efficient for bigger files than the default model. Is the learning rate different? Hmmmm....
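For reference, the two percentages can be recomputed from the -s15 scores for paq8pxd_v74 posted earlier in the thread. A small sketch; `relative_change_percent` is just an illustrative helper name:

```python
def relative_change_percent(baseline, other):
    """Size change of `other` relative to `baseline`, in percent (negative = smaller)."""
    return 100.0 * (other - baseline) / baseline


# paq8pxd_v74 -s15 scores in bytes, copied from earlier posts in the thread.
enwik8_plain, enwik8_drt = 15_993_409, 15_956_537
enwik9_plain, enwik9_drt = 126_211_491, 126_599_124

enwik8_change = relative_change_percent(enwik8_plain, enwik8_drt)
enwik9_change = relative_change_percent(enwik9_plain, enwik9_drt)
spread = enwik9_change - enwik8_change

print(f"enwik8.drt vs enwik8: {enwik8_change:+.2f}%")
print(f"enwik9.drt vs enwik9: {enwik9_change:+.2f}%")
print(f"difference between the two: {spread:.2f} percentage points")
```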

  35. #713
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,843
    Thanks
    288
    Thanked 1,245 Times in 698 Posts
    DRT uses a static dictionary with only ~45k words. I guess enwik9 gets too many unconverted words.

    I wonder if nncp preprocessor could be used instead - either directly, or followed by utf16-to-utf8 conversion.

  36. Thanks:

    Darek (26th February 2020)

  37. #714
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,103
    Thanks
    679
    Thanked 436 Times in 334 Posts
    paq8pxd_v75 scores on my testset with -s9 option. Some better scores mainly for exe and txt files.
    Attached Thumbnails: paq8pxd_v75.jpg

  38. #715
    Member
    Join Date
    Nov 2019
    Location
    Malaysia
    Posts
    3
    Thanks
    1
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Darek View Post
    paq8pxd_v75 scores on my testset with -s9 option. Some better scores mainly for exe and txt files.
    @darek, do you still have the paq8pxd48_bwt1 source code ??

  39. #716
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,103
    Thanks
    679
    Thanked 436 Times in 334 Posts
    Quote Originally Posted by lzhuff View Post
    @darek, do you still have the paq8pxd48_bwt1 source code ??
    Here is the only file I have. It's an official source that was published on the forum; however, I don't know if it's OK.
    Attached Files
    Last edited by Darek; 27th February 2020 at 14:02.

  40. #717
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,103
    Thanks
    679
    Thanked 436 Times in 334 Posts
    Scores of 4 Corpuses for paq8pxd_v75 -> a very nice improvement on Silesia, and these are the best scores for all corpuses.
    Additionally, there is a comparison of v75 vs. v74 with the -x15 option.
    Attached Thumbnails: paq8pxd_v75x15.jpg, 4_Corpuses_paq8pxd_v75.jpg

  41. Thanks:

    kaitz (27th February 2020)

  42. #718
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,103
    Thanks
    679
    Thanked 436 Times in 334 Posts
    first enwik scores for paq8pxd_v75:

    16'339'122 - enwik8 -s8 by Paq8pxd_v74_AVX2
    15'993'409 - enwik8 -s15 by Paq8pxd_v74_AVX2
    15'956'537 - enwik8.drt -s15 by Paq8pxd_v74_AVX2

    16'279'540 - enwik8 -x8 by Paq8pxd_v74_AVX2
    15'928'916 - enwik8 -x15 by Paq8pxd_v74_AVX2
    15'880'133 - enwik8.drt -x15 by Paq8pxd_v74_AVX2

    16'319'686 - enwik8 -s8 by Paq8pxd_v75_AVX2 - 0.12% of improvement
    15'976'838 - enwik8 -s15 by Paq8pxd_v75_AVX2 - 0.10% of improvement
    15'934'372 - enwik8.drt -s15 by Paq8pxd_v75_AVX2 - 0.14% of improvement

    16'260'265 - enwik8 -x8 by Paq8pxd_v75_AVX2 - 0.12% of improvement
    15'912'509 - enwik8 -x15 by Paq8pxd_v75_AVX2 - 0.10% of improvement -> this score could translate to about 125'58x'xxx bytes for enwik9_1423
    15'859'187 - enwik8.drt -x15 by Paq8pxd_v75_AVX2 - 0.13% of improvement, best score for the paq8pxd series

    As for time, the changes vary -> from 5-7% up to 18% quicker (for enwik8.drt -x15)!
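The improvement percentages in this post follow directly from the v74 and v75 sizes listed above. A minimal sketch of the calculation; the dictionary layout and the name `gain_percent` are illustrative only:

```python
# (v74 size, v75 size) in bytes for each test, copied from the lists above.
scores = {
    "enwik8 -s8": (16_339_122, 16_319_686),
    "enwik8 -s15": (15_993_409, 15_976_838),
    "enwik8.drt -s15": (15_956_537, 15_934_372),
    "enwik8 -x8": (16_279_540, 16_260_265),
    "enwik8 -x15": (15_928_916, 15_912_509),
    "enwik8.drt -x15": (15_880_133, 15_859_187),
}


def gain_percent(old_size, new_size):
    """Size reduction of the new version relative to the old one, in percent."""
    return 100.0 * (old_size - new_size) / old_size


gains = {name: round(gain_percent(old, new), 2) for name, (old, new) in scores.items()}
for name, gain in gains.items():
    print(f"{name}: {gain:.2f}% smaller with v75")
```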

  43. Thanks:

    kaitz (29th February 2020)

  44. #719
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,103
    Thanks
    679
    Thanked 436 Times in 334 Posts
    More scores for enwik:

    16'339'122 - enwik8 -s8 by Paq8pxd_v74_AVX2
    15'993'409 - enwik8 -s15 by Paq8pxd_v74_AVX2
    15'956'537 - enwik8.drt -s15 by Paq8pxd_v74_AVX2
    126'211'491 - enwik9_1423 -s15 by Paq8pxd_v74_AVX2
    126'599'124 - enwik9_1423.drt -s15 by Paq8pxd_v74_AVX2

    16'279'540 - enwik8 -x8 by Paq8pxd_v74_AVX2
    15'928'916 - enwik8 -x15 by Paq8pxd_v74_AVX2
    15'880'133 - enwik8.drt -x15 by Paq8pxd_v74_AVX2
    125'752'479 - enwik9_1423 -x15 by Paq8pxd_v74_AVX2
    126'065'722 - enwik9_1423.drt -x15 by Paq8pxd_v74_AVX2

    16'319'686 - enwik8 -s8 by Paq8pxd_v75_AVX2 => 0.12% of gain to paq8pxd_v74
    15'976'838 - enwik8 -s15 by Paq8pxd_v75_AVX2 => 0.10% of gain to paq8pxd_v74
    15'934'372 - enwik8.drt -s15 by Paq8pxd_v75_AVX2 => 0.14% of gain to paq8pxd_v74
    126'227'845 - enwik9_1423 -s15 by Paq8pxd_v75_AVX2 => unfortunately, enwik9 loses 0.01% to paq8pxd_v74 even though enwik8 got a nice gain. The time saving is about 3.7%.
    126'615'528 estimated - enwik9_1423.drt -s15 by Paq8pxd_v75_AVX2

    16'260'265 - enwik8 -x8 by Paq8pxd_v75_AVX2
    15'912'509 - enwik8 -x15 by Paq8pxd_v75_AVX2
    15'859'187 - enwik8.drt -x15 by Paq8pxd_v75_AVX2

    enwik9_1423 -x15 by Paq8pxd_v75_AVX2 is running now
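The post does not say how the "estimated" enwik9_1423.drt figure was obtained. One plausible reconstruction, offered only as an assumption, is to scale the v74 DRT result by the same v74 -> v75 ratio measured on the plain enwik9_1423 file:

```python
# Measured -s15 scores in bytes, copied from the lists above.
v74_plain = 126_211_491   # enwik9_1423, paq8pxd_v74
v75_plain = 126_227_845   # enwik9_1423, paq8pxd_v75
v74_drt = 126_599_124     # enwik9_1423.drt, paq8pxd_v74

# Assumption: the DRT file changes by the same ratio as the plain file.
ratio = v75_plain / v74_plain
estimated_v75_drt = round(v74_drt * ratio)

loss_percent = 100.0 * (v75_plain - v74_plain) / v74_plain
print(f"v74 -> v75 change on plain enwik9: +{loss_percent:.3f}%")
print(f"estimated enwik9_1423.drt -s15 for v75: {estimated_v75_drt} bytes")
```

Under that assumption the scaled value agrees with the estimate quoted in the list, but it remains an extrapolation until the real run finishes.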

  45. #720
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,843
    Thanks
    288
    Thanked 1,245 Times in 698 Posts
    @Darek: You can compile it, right?
    It could make sense to experiment with ppmd parameters here: https://github.com/kaitz/paq8pxd/blo...pxd.cpp#L12549
    o13 m420 (at -s8) seems pretty random, and there're probably better settings for enwik.
