Results 331 to 360 of 644

Thread: Paq8pxd dict

  1. #331
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    395
    Thanks
    148
    Thanked 226 Times in 123 Posts
    paq8pxd_v28
    Quote Originally Posted by Stephan Busch View Post
    I am testing paq8pxd_v24 on the bibles testset (https://drive.google.com/file/d/0ByL...ew?usp=sharing).
    It doesn't detect UTF-8 on the following files:

    ara.txt
    heb.txt
    mao.txt
    tag.txt
    dut.txt
    eng.txt
    gre.txt
    chi.txt
    thai.txt
    kor.txt

    On heb.txt, chi.txt and thai.txt, the WRT transform creates output that is much larger than the input.
    WRT is then skipped on heb.txt and chi.txt, but not on thai.txt.
    Can you fix that please? I guess that WRT would give additional gain on both files.

    Could the following preprocessing help us here?
    http://airccse.org/journal/jcsit/7215ijcsit04.pdf
    I think this is what you want. According to the SQ2017 table, this version should have the best result on the bible set; the xml set should also get better results.

    EDIT:
    Code:
                          File           Size Compressed
    paq8pxd_v28   -s0 thai.txt       11302195    4066075
    paq8pxd_v28   -s8 thai.txt       11302195     711363
    paq8pxd_v27   -s8 thai.txt       11302195     859986
    paq8px_v99     -8 thai.txt       11302195     795101
    paq8px_v99     -8 thai.txt.pxd28  4066075     725870
    Emma v0.1.24 best thai.txt       11302195     778565
    Emma v0.1.24 best thai.txt.pxd28  4066075     715765
    Code:
    paq8pxd_v28 -s9:2  enwik8 100000000 16350992 6624.94 sec  5144.1 MB 
    paq8pxd_v28 -s10:2 enwik8 100000000 16288928 6873.61 sec  9072.1 MB - Retesting ...
    paq8pxd_v28 -s11:2 enwik8 100000000 16280617 6569.11 sec 10784.1 MB 
    Last edited by kaitz; 23rd August 2017 at 17:28. Reason: added EMMA tests; enwik8
    KZo


  2. The Following 2 Users Say Thank You to kaitz For This Useful Post:

    Darek (23rd August 2017),Stephan Busch (23rd August 2017)

  3. #332
    Member
    Join Date
    Jun 2013
    Location
    Sweden
    Posts
    150
    Thanks
    9
    Thanked 25 Times in 23 Posts
    Quote Originally Posted by kaitz View Post
    paq8pxd_v28

    Tested all methods: f1-f10, q1-q10 and s1-s3 look fine, but I found an error in s4-s10. The test file was your compile of paq8pxd_v28x64.exe.
    Viewing the file with a hex editor, I found large blocks of FFh and 00h.



    for %a in (f q s) do for %b in (1 2 3 4 5 6 7 8 9 10) do (
    copy e:\paq8pxd_v28x64.exe %a%b.exe
    %a%b.exe -%a%b %a%b.exe
    )

    pause

    for %a in (f q s) do for %b in (1 2 3 4 5 6 7 8 9 10) do (
    ren %a%b.exe %a%b.ex
    E:\paq8pxd_v28x64.exe -d %a%b.exe.paq8pxd27
    )

  4. The Following User Says Thank You to a902cd23 For This Useful Post:

    mpais (23rd August 2017)

  5. #333
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    395
    Thanks
    148
    Thanked 226 Times in 123 Posts
    It is not an error. FF FF... is a compressed run of zeros in the input; all paq8 versions do this. The 00h bytes at EOF are the stream sizes, and in this case the stream sizes are mostly zero.
    KZo


  6. The Following User Says Thank You to kaitz For This Useful Post:

    Stephan Busch (23rd August 2017)

  7. #334
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    940
    Thanks
    558
    Thanked 380 Times in 284 Posts
    >paq8pxd_v28 -s9:2 enwik8 100000000 16350992 6624.94 sec 5144.1 MB
    >paq8pxd_v28 -s10:2 enwik8 100000000 16288928 6873.61 sec 9072.1 MB
    >paq8pxd_v28 -s11:2 enwik8 100000000 16280617 6569.11 sec 10784.1 MB

    I've got:

    16'288'925 - paq8pxd_v28 -s10 enwik8 4812,21 sec ~6750 MB - strange, a slightly different score than yours; I've double-checked it.
    16'179'200 - paq8pxd_v28 -s10 enwik8.drt 4328,62 sec ~6750 MB

    These scores are slightly better than v18, v19 and v25, but worse than v27:

    16'308'754 - paq8pxd_v18 -s10 enwik8
    16'201'839 - paq8pxd_v18 -s10 enwik8.drt

    16'373'632 - paq8pxd_v19 -s10 enwik8
    16'204'616 - paq8pxd_v19 -s10 enwik8.drt

    16'306'973 - paq8pxd_v25 -s10 enwik8
    16'213'602 - paq8pxd_v25 -s10 enwik8.drt

    16'283'711 - paq8pxd_v27 -s10 enwik8
    16'176'737 - paq8pxd_v27 -s10 enwik8.drt

    If I had to bet, I'd pick v27 as the winner for enwik9 -s15, with my estimate at about 128'636'000 bytes, but I'll need to check it.

    Darek

  8. The Following User Says Thank You to Darek For This Useful Post:

    kaitz (23rd August 2017)

  9. #335
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    395
    Thanks
    148
    Thanked 226 Times in 123 Posts
    Quote Originally Posted by Darek View Post
    >paq8pxd_v28 -s9:2 enwik8 100000000 16350992 6624.94 sec 5144.1 MB
    >paq8pxd_v28 -s10:2 enwik8 100000000 16288928 6873.61 sec 9072.1 MB
    >paq8pxd_v28 -s11:2 enwik8 100000000 16280617 6569.11 sec 10784.1 MB

    I've got:

    16'288'925 - paq8pxd_v28 -s10 enwik8 4812,21 sec ~6750 MB - strange, a slightly different score than yours; I've double-checked it.
    Will check this; yours is probably correct. I probably have the wrong compile in my test folder.
    Quote Originally Posted by Darek View Post
    >
    16'283'711 - paq8pxd_v27 -s10 enwik8
    16'176'737 - paq8pxd_v27 -s10 enwik8.drt

    If I could then I'll bet that v27 to be the winner for enwik9 -s15 with my estimation about 128'636'000 bytes - but I'll need to check it.
    Do you need to include drt+dict+paq8pxd+compressed archive in the size calculation?
    KZo


  10. #336
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    940
    Thanks
    558
    Thanked 380 Times in 284 Posts
    Quote Originally Posted by kaitz View Post
    Do you need to include drt+dict+paq8pxd+compressed archive in the size calculation?
    For pure enwik I only need to add the compressed paq8pxd archive. However, as I wrote, this compile is big: about 1MB, and about 300KB when packed by 7zip. Zipped by Matt it would be even bigger, maybe 360-370KB. Paq8pxd v18 compressed by Matt for the record submission is only 100KB, which means 200KB less in the total score.

    For DRT files I need to add the compressed DRT and dict, but I could compress this part with paq8px_vXX; it's about 90KB. DRT versions come out about 130KB smaller for paq8pxd than the pure version, so the result is still slightly better.

    For the 1423.DRT split I need to add another 10KB for the resplitting program. As I discovered here: https://encode.su/threads/2459-EMMA-...ll=1#post49164 there is a 100-200KB gain (depending on the algorithm) if we split enwik9 into 4 parts and combine them in 4123 order.
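A rough sketch of the resplitting idea in Python, for readers who want to try it. It assumes four near-equal parts and takes the order as a parameter, since the thread names the variant "1423" while the sentence above says 4123. The actual fsplit32.exe and resplit.bat tools mentioned later in the thread may cut the file differently, and `split_points`, `reorder_parts` and `restore_parts` are hypothetical names.

```python
def split_points(size, parts=4):
    """Return part boundaries for `parts` near-equal pieces (an assumption)."""
    return [size * i // parts for i in range(parts + 1)]


def reorder_parts(data, order="1423"):
    """Cut `data` into len(order) pieces and concatenate them in `order`."""
    bounds = split_points(len(data), len(order))
    pieces = [data[bounds[i]:bounds[i + 1]] for i in range(len(order))]
    return b"".join(pieces[int(ch) - 1] for ch in order)


def restore_parts(shuffled, order="1423"):
    """Invert reorder_parts so the original byte order is rebuilt."""
    n = len(order)
    bounds = split_points(len(shuffled), n)
    sizes = [bounds[i + 1] - bounds[i] for i in range(n)]  # original piece sizes
    pieces = [b""] * n
    pos = 0
    for ch in order:
        idx = int(ch) - 1
        pieces[idx] = shuffled[pos:pos + sizes[idx]]
        pos += sizes[idx]
    return b"".join(pieces)
```

The reordered file is what gets compressed; the restore step is the part whose size has to be counted in the total.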

    My estimate for v27 is for pure enwik9 score.

    Darek

  11. #337
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    395
    Thanks
    148
    Thanked 226 Times in 123 Posts
    LTCB states:
    • s = source code size (if available and smaller).
    KZo


  12. The Following User Says Thank You to kaitz For This Useful Post:

    Darek (23rd August 2017)

  13. #338
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    940
    Thanks
    558
    Thanked 380 Times in 284 Posts
    Quote Originally Posted by kaitz View Post
    LTCB states:
    • s = source code size (if available and smaller).
    Yes, you're right - that would be only slightly more than 100KB!

    With this option paq8pxd_v25 got better result for enwik9_1423.drt file: 128'896'250 (packed enwik9) +96'890 (packed drt+dict+resplit) + 101'779 (my compression of paq8pxd_v25 source) = 129'094'919 bytes - about 100KB less than best paq8pxd_v18 score = 129'225'607 bytes...
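The total above is a plain sum of the archive and everything needed to decompress it. A small Python sketch of that bookkeeping, using only the figures from this post (`parse_size`, `ltcb_total` and `fmt` are hypothetical helper names, not part of any tool):

```python
def parse_size(text):
    """Parse a byte count written with apostrophe separators, e.g. 128'896'250."""
    return int(text.replace("'", ""))


def ltcb_total(archive, *extras):
    """Sum the compressed archive and every extra file needed to decompress."""
    return parse_size(archive) + sum(parse_size(e) for e in extras)


def fmt(n):
    """Format a byte count with the thread's apostrophe separators."""
    return f"{n:,}".replace(",", "'")


# packed enwik9 + packed drt+dict+resplit + compressed paq8pxd_v25 source
total = ltcb_total("128'896'250", "96'890", "101'779")
print(fmt(total))                               # 129'094'919
print(fmt(parse_size("129'225'607") - total))   # 130'688, gap to the v18 score
```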

    I'll test some configurations of v18 and v27/v28 for enwik8/9. It will take a couple of days, but I think the best paq8pxd record for LTCB could be beaten and go below 129'000'000 bytes in total.

  14. #339
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    395
    Thanks
    148
    Thanked 226 Times in 123 Posts
    I think it's only source or binary, not a mix of both...
    KZo


  15. #340
    Member
    Join Date
    Jun 2013
    Location
    Sweden
    Posts
    150
    Thanks
    9
    Thanked 25 Times in 23 Posts
    Quote Originally Posted by kaitz View Post
    It is not an error. FF FF... is a compressed run of zeros in the input; all paq8 versions do this. The 00h bytes at EOF are the stream sizes, and in this case the stream sizes are mostly zero.
    Sorry, I forgot to write that MD5 and CRC32 did not match the original file after extraction for s4-s10; all others did match.
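For anyone repeating this kind of round-trip test, a minimal Python sketch of the comparison step. It only checks two files that already exist on disk; running paq8pxd itself is left out, and `checksums` and `roundtrip_ok` are hypothetical names.

```python
import hashlib
import zlib


def checksums(path, chunk_size=1 << 20):
    """Return (md5_hex, crc32) of a file, reading it in chunks."""
    md5 = hashlib.md5()
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            md5.update(chunk)
            crc = zlib.crc32(chunk, crc)
    return md5.hexdigest(), crc & 0xFFFFFFFF


def roundtrip_ok(original_path, restored_path):
    """True when the restored file matches the original on both checksums."""
    return checksums(original_path) == checksums(restored_path)
```

Calling `roundtrip_ok("s4.ex", "s4.exe")` after decompression would mirror the rename trick in the batch file earlier in the thread, assuming those are the file names in use.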

  16. #341
    Tester
    Stephan Busch
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    876
    Thanks
    467
    Thanked 175 Times in 85 Posts
    The compression of the bibles is really impressive, Kaido: I can confirm that v28 takes the top position. Congratulations!

    Was the paper helpful for that?
    WRT transform seems much more powerful than before..


    By the way:
    could you please make paq8pxd use the current folder for creating temp files?
    The current versions use the system drive, which isn't large enough on my system
    (I want to test paq8pxd on many gigabytes of data).

  17. #342
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    395
    Thanks
    148
    Thanked 226 Times in 123 Posts
    Quote Originally Posted by Stephan Busch View Post
    The compression of the bibles is really impressive, Kaido: I can confirm that v28 takes the top position. Congratulations!

    Was the paper helpful for that?
    WRT transform seems much more powerful than before..

    .
    A little.
    Quote Originally Posted by Stephan Busch View Post
    By the way:
    could you please make paq8pxd use the current folder for creating temp files?
    The current versions use the system drive, which isn't large enough on my system
    (I want to test paq8pxd on many gigabytes of data).
    I will try.

    Quote Originally Posted by a902cd23 View Post
    Sorry, I forgot to write that MD5 and CRC32 did not match the original file after extraction for s4-s10; all others did match.
    Indeed, it's the exe model. Some mistake got in. I will try to figure out where I made it. So there is no point in testing v28 right now.
    KZo


  18. #343
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    395
    Thanks
    148
    Thanked 226 Times in 123 Posts

    v29

    paq8pxd_v29

    Fixed the exe predictor. This mistake only affected exe data. Text, image etc. decompressed OK. But still. Happy testing.
    Last edited by kaitz; 24th August 2017 at 11:35. Reason: removed attachment
    KZo


  19. The Following User Says Thank You to kaitz For This Useful Post:

    Stephan Busch (23rd August 2017)

  20. #344
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    521
    Thanks
    198
    Thanked 745 Times in 302 Posts
    @kaitz
    The problem is in this line of the exe model:

    Context = State+16*Op.BytesRead+128*(Op.REX & REX_w);

    You should change it to:

    Context = State+16*Op.BytesRead+16*(Op.REX & REX_w);

    It was actually correct in the first version, but when I was searching for bugs I misread it (somehow saw it as a boolean eval) and thought I'd made a mistake.
    REX_w is 0x08, so a left shift by 4 (x16) will get it to 128, to set the msb of Context. I'll have to fix it also on paq8px.
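The arithmetic behind this fix can be checked with a few lines of Python. This is only an illustration of the numbers in the post, not the exe model itself: the field ranges (state below 16, bytes read below 8), the use of the raw REX prefix byte 0x48, and the 8-bit width of Context are assumptions inferred from the remark about setting the msb.

```python
REX_W = 0x08  # value given in the post


def context(state, bytes_read, rex, scale):
    """Combine the fields the way the quoted line does, with a chosen scale."""
    return state + 16 * bytes_read + scale * (rex & REX_W)


def fits_in_byte(value):
    """True when the value can be stored in 8 bits."""
    return 0 <= value <= 0xFF


# Hypothetical field values: state below 16, bytes_read below 8, REX.W set.
state, bytes_read, rex = 5, 3, 0x48

buggy = context(state, bytes_read, rex, scale=128)
fixed = context(state, bytes_read, rex, scale=16)

print(buggy, fits_in_byte(buggy))                      # 1077 False
print(fixed, fits_in_byte(fixed), bool(fixed & 0x80))  # 181 True True
```

With the 128 multiplier the REX.W contribution is 1024, which lands outside an 8-bit context; with 16 it is exactly 128, the top bit.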

  21. The Following 2 Users Say Thank You to mpais For This Useful Post:

    kaitz (23rd August 2017),Stephan Busch (23rd August 2017)

  22. #345
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    395
    Thanks
    148
    Thanked 226 Times in 123 Posts
    paq8pxd_v31

    @Darek
    I figured out why the enwik8 compressed sizes are different. I forgot to initialize xmltagcache to zero. The array content was undefined, which made decompression fail.
    So there is no point in testing previous versions.

    Also added temp file creation in the program folder!
    Last edited by kaitz; 24th August 2017 at 15:42. Reason: bold
    KZo


  23. The Following User Says Thank You to kaitz For This Useful Post:

    Darek (24th August 2017)

  24. #346
    Member
    Join Date
    Sep 2015
    Location
    Italy
    Posts
    239
    Thanks
    104
    Thanked 142 Times in 103 Posts
    Did you skip paq8pxd_v30?

  25. #347
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    395
    Thanks
    148
    Thanked 226 Times in 123 Posts
    Quote Originally Posted by Mauro Vezzosi View Post
    Did you skip paq8pxd_v30?
    No, https://github.com/kaitz/paq8pxd/commits/master
    KZo


  26. The Following User Says Thank You to kaitz For This Useful Post:

    Mauro Vezzosi (24th August 2017)

  27. #348
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    940
    Thanks
    558
    Thanked 380 Times in 284 Posts
    Quote Originally Posted by Darek View Post
    Yes, you're right - that would be only slightly more than 100KB!

    With this option paq8pxd_v25 got better result for enwik9_1423.drt file: 128'896'250 (packed enwik9) +96'890 (packed drt+dict+resplit) + 101'779 (my compression of paq8pxd_v25 source) = 129'094'919 bytes - about 100KB less than best paq8pxd_v18 score = 129'225'607 bytes...

    I'll test some configurations of v18 and v27/v28 for enwik8/9. It will take a couple of days, but I think the best paq8pxd record for LTCB could be beaten and go below 129'000'000 bytes in total.
    I've compressed enwik9 with the 1423 split using paq8pxd v27 -s15. I think the score is worth submitting to LTCB:

    128'269'105 bytes of enwik9_1423 + 7'023 bytes of resplitting batch (I didn't use drt in this case, then it contains fsplit32.exe+resplit.bat packed by paq8pxd_v27 and "How to Encode.txt" - nonpacked) + 111'187 bytes (7zipped source, probably Matt got slightly worse score with his zip) =

    128'387'315 bytes in total. Great job Kaido!

    Time: 42'645.22s, memory used 27'278.9MB. System: Core i7 4900MQ 2.8GHz turbo-boosted to 3.8GHz, 32GB, Win7Pro 64

    I'm checking also three options with -s15:
    1 - pure enwik9
    2 - enwik9.drt
    3 - enwik9_1423.drt
    I don't think the total score, with the added files, could be much better than the one above because of the DRT+dict size, but never say never...

    Darek

    p.s. version v31 shows tmp files during compression.
    Last edited by Darek; 24th August 2017 at 14:48.

  28. The Following User Says Thank You to Darek For This Useful Post:

    kaitz (24th August 2017)

  29. #349
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    395
    Thanks
    148
    Thanked 226 Times in 123 Posts
    Quote Originally Posted by Darek View Post
    I've compressed enwik9 with the 1423 split using paq8pxd v27 -s15. I think the score is worth submitting to LTCB:

    128'269'105 bytes of enwik9_1423 + 7'023 bytes of resplitting batch (I didn't use drt in this case, then it contains fsplit32.exe+resplit.bat packed by paq8pxd_v27 and "How to Encode.txt" - nonpacked) + 111'187 bytes (7zipped source, probably Matt got slightly worse score with his zip) =

    128'387'315 bytes in total. Great job Kaido!

    Time: 42'645.22s, memory used 27'278.9MB. System: Core i7 4900MQ 2.8GHz turbo-boosted to 3.8GHz, 32GB, Win7Pro 64

    I'm checking also three options with -s15:
    1 - pure enwik9
    2 - enwik9.drt
    3 - enwik9_1423.drt
    I don't think the total score, with the added files, could be much better than the one above because of the DRT+dict size, but never say never...

    v27 will fail on decompression, so you need to use v31 at least.
    Quote Originally Posted by Darek View Post

    p.s. version v31 shows tmp files during compression.
    Quote Originally Posted by Stephan Busch View Post
    By the way:
    could you please make paq8pxd use the current folder for creating temp files?
    The current versions use the system drive, which isn't large enough on my system
    (I want to test paq8pxd on many gigabytes of data).
    Stephan requested it, so I changed it.
    Previous versions used the system drive. That is not so good for an SSD, if you have one as the system drive.
    KZo


  30. #350
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    940
    Thanks
    558
    Thanked 380 Times in 284 Posts
    >v27 will fail on decompression, so you need to use v31 at least.
    Are you sure? Hmmm, that's not good, because v27 has the best compression ratio....
    OK. I'll test v31.



  31. #351
    Member
    Join Date
    Jun 2013
    Location
    Sweden
    Posts
    150
    Thanks
    9
    Thanked 25 Times in 23 Posts
    v31 is also broken. I tested 1 to 11 (q f s) and only s10 s11 s4 s5 show an MD5 mismatch. (exe)
    The test file was your compile of v31.

    Another test with Base64Encode.exe (only 14KB) shows no error after decompression; tested all f q s 1-13.

    Some errors in x32 exe with s11 s10 s4 s5 s13 (testfile bcm100.exe and bcm100x32.exe)

  32. #352
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    940
    Thanks
    558
    Thanked 380 Times in 284 Posts
    OK, then there is no reason to test enwik scores.
    @Kaitz - can you fix it?

  33. #353
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    395
    Thanks
    148
    Thanked 226 Times in 123 Posts
    Quote Originally Posted by a902cd23 View Post
    v31 is also broken. I tested 1 to 11 (q f s) and only s10 s11 s4 s5 show an MD5 mismatch. (exe)
    The test file was your compile of v31.

    Another test with Base64Encode.exe (only 14KB) shows no error after decompression; tested all f q s 1-13.

    Some errors in x32 exe with s11 s10 s4 s5 s13 (testfile bcm100.exe and bcm100x32.exe)
    Indeed.

    Same mistake as in xmlmodel
    KZo


  34. The Following User Says Thank You to kaitz For This Useful Post:

    Darek (24th August 2017)

  35. #354
    Member
    Join Date
    Jun 2013
    Location
    Sweden
    Posts
    150
    Thanks
    9
    Thanked 25 Times in 23 Posts
    v32

    All q f s 1-14 are okay (MD5 OK). I couldn't run s15 because it ran out of memory (I have 32GB with a 5GB ramdisk), but q15 and f15 are also MD5 OK.

  36. #355
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    940
    Thanks
    558
    Thanked 380 Times in 284 Posts
    Quote Originally Posted by kaitz View Post
    Indeed.

    Same mistake as in xmlmodel
    I've started to test v32 with enwik8 -s15, both compression and decompression. If that is OK, I'll go further.
    v31 has a slightly better compression ratio than v27 for enwik8! Maybe v32 will be similar.

    16'254'271 - enwik8 -s15 for paq8pxd_v31 - decompressed and verified, SHA1 OK!

    It looks like every version depends differently on memory usage: sometimes -s10 gets a worse score, but the -s15 result is better than in previous versions...
    Example:

    16'283'711 - paq8pxd v27 -s10
    16'288'895 - paq8pxd v32 -s10 - worse than v27, but:

    16'262'542 - paq8pxd v27 -s15
    16'254'271 - paq8pxd v32 -s15 - better than v27
    Last edited by Darek; 24th August 2017 at 21:42.

  37. #356
    Tester
    Stephan Busch
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    876
    Thanks
    467
    Thanked 175 Times in 85 Posts
    While parsing some large files (one of them 8.3GB), v32 and previous versions sometimes show block sizes that are far too large:

    Code:
     931         | 8b-image  |    101760 b [197624886 - 197726645] (width: 384)
     932         | default   |      6780 b [197726646 - 197733425]
     933         | hdr       |      1092 b [197733426 - 197734517]
     934         | 8b-image  |      1024 b [197734518 - 197735541] (width: 32)
     935         | default   |18446744073709552000 b [197735542 - 197735537]
    ..
     34621       | default   |18446744069414584000 b [4592626242 - 297658970]
     34622       | exe       |4294967296 b [297658971 - 4592626266]
     34623       | default   |18446744069414584000 b [4592626267 - 297658983]
     34624       | exe       |4294967296 b [297658984 - 4592626279]
     34625       | default   |18446744069414584000 b [4592626280 - 297658999]
     34626       | exe       |4294967296 b [297659000 - 4592626295]
     34627       | default   |18446744069414584000 b [4592626296 - 297659090]
     34628       | exe       |4294967296 b [297659091 - 4592626386]
     34629       | default   |18446744069414584000 b [4592626387 - 297659099]
     34630       | exe       |4294967296 b [297659100 - 4592626395]
     34631       | default   |18446744069414584000 b [4592626396 - 297659115]
     34632       | exe       |4294967296 b [297659116 - 4592626411]
     34633       | default   |18446744069414584000 b [4592626412 - 297659124]
     34634       | exe       |4294967296 b [297659125 - 4592626420]
     34635       | default   |18446744069414584000 b [4592626421 - 297659347]
     34636       | exe       |4294967296 b [297659348 - 4592626643]
     34637       | default   |18446744069414584000 b [4592626644 - 297659754]
     34638       | exe       |4294967296 b [297659755 - 4592627050]
     34639       | default   |18446744069414584000 b [4592627051 - 297659814]
     34640       | exe       |4295494757 b [297659815 - 4593154571]
     34641       | default   |18446744069417353000 b [4593154572 - 300956980]
     34642       | text      |4296534990 b [300956981 - 4597491970]
     34643       | jpeg      |18446744069414724000 b [4597491971 - 302662997]
     34644       | default   |4313319557 b [302662998 - 4615982554]
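The sizes in this log are what one would get if the block length were computed as end - start + 1 in unsigned 64-bit arithmetic while some positions had been cut down to 32 bits. The Python sketch below reproduces the numbers under that assumption; it is a guess at the symptom drawn from the printed values, not a look at the paq8pxd source, and `block_size_u64` is a hypothetical name.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def block_size_u64(start, end):
    """Size of [start, end] as an unsigned 64-bit computation would give it."""
    return (end - start + 1) & MASK64


# (start, end) pairs copied from the log above.
print(block_size_u64(197735542, 197735537))    # 18446744073709551612, i.e. 2**64 - 4
print(block_size_u64(4592626242, 297658970))   # 18446744069414584345
print(block_size_u64(297658971, 4592626266))   # 4294967296, i.e. 2**32

# The small offsets look like the large ones reduced modulo 2**32.
print(4592626266 & MASK32)                     # 297658970
```

The trailing zeros in the logged values (for example 18446744069414584000) would be consistent with the size passing through a floating-point type when printed, though that too is only a guess.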

  38. #357
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    940
    Thanks
    558
    Thanked 380 Times in 284 Posts
    Paq8pxd v32 test on enwik9 with 1423 split by -s15 option:

    128'209'407 bytes of enwik9_1423 + 7'023 bytes of resplitting batch (I didn't use drt in this case, then it contains fsplit32.exe+resplit.bat packed by paq8pxd_v27 and "How to Encode.txt" - nonpacked) + 117'727 bytes (7zipped source, probably Matt got slightly worse score with his zip) =

    128'334'157 bytes in total. You need about 600KB less to beat the durilca_kingsize submission and move to 2nd place on LTCB.

    enwik8 scores:
    16'288'895 - paq8pxd v32 -s10 for pure enwik8
    16'176'742 - paq8pxd v32 -s10 for enwik8.drt

    16'254'271 - paq8pxd v32 -s15 for pure enwik8
    16'139'511 - paq8pxd v32 -s15 for enwik8.drt

    Darek

  39. #358
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    940
    Thanks
    558
    Thanked 380 Times in 284 Posts
    I've tested all my enwik cases with the -s15 option and the v32 version, and it looks as follows:

    enwik8:
    16'254'271 - paq8pxd v32 -s15 for pure enwik8, time 4219,56s, memory 27'278.9MB + decompressed and verified, SHA1 OK, time 4318,04s
    16'139'511 - paq8pxd v32 -s15 for enwik8.drt, time 4447,71, memory 27'278.9MB
    enwik9:
    128'290'115 - paq8pxd v32 -s15 for pure enwik9, time 42445,91s, memory 27'278.9MB
    128'755'019 - paq8pxd v32 -s15 for enwik9.drt, time 45311,93s, memory 27'278.9MB
    128'209'407 - paq8pxd v32 -s15 for enwik9_1423, time 41419,58s, memory 27'278.9MB + decompression in progress
    128'654'949 - paq8pxd v32 -s15 for enwik9_1423.drt, time 45155,23s, memory 27'278.9MB

    This is quite the opposite behaviour with DRT preprocessing for enwik9 compared to the v18 and v19 versions. For those versions both enwik8 and enwik9 get a 100-130KB gain from the DRT preprocess, but now, with improved XWRT, enwik9 scores are much better without DRT preprocessing. Maybe it's also better memory usage and the -s15 option improvement. Despite this, the 1423 split still gives a 55-80KB gain and it could be used to slightly crunch the record score.



  40. The Following User Says Thank You to Darek For This Useful Post:

    kaitz (31st August 2017)

  41. #359
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    940
    Thanks
    558
    Thanked 380 Times in 284 Posts
    I've decompressed enwik9_1423 packed by v32 and checksum SHA1 is OK!

    I've also tested v32 with the -s10 option for future comparison:

    enwik8:
    16'288'895 - paq8pxd v32 -s10 for pure enwik8, memory showed by paq8pxd_v32 = 9'062,5MB but real memory usage was about 7700MB max.
    16'176'742 - paq8pxd v32 -s10 for enwik8.drt
    enwik9:
    129'671'266 - paq8pxd v32 -s10 for pure enwik9
    130'075'320 - paq8pxd v32 -s10 for enwik9.drt
    129'172'690 - paq8pxd v32 -s10 for enwik9_1423
    129'527'878 - paq8pxd v32 -s10 for enwik9_1423.drt

    It looks like the behaviour is slightly different for -s10 than for -s15. For the -s10 option DRT preprocessing hurts compression by 350-400KB, but the 1423 split gives 500KB of gain.
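The two figures quoted for -s10 can be recomputed from the enwik9 sizes listed in this post. A short Python sketch (the dictionary keys are just labels, and `delta` is a hypothetical helper):

```python
# -s10 enwik9 results for v32, copied from the list above.
sizes = {
    "enwik9": 129_671_266,
    "enwik9.drt": 130_075_320,
    "enwik9_1423": 129_172_690,
    "enwik9_1423.drt": 129_527_878,
}


def delta(a, b):
    """Bytes by which variant `b` is larger than variant `a`."""
    return sizes[b] - sizes[a]


# Cost of adding DRT, without and with the 1423 split.
drt_cost = (delta("enwik9", "enwik9.drt"), delta("enwik9_1423", "enwik9_1423.drt"))
# Gain from the 1423 split, without and with DRT.
split_gain = (delta("enwik9_1423", "enwik9"), delta("enwik9_1423.drt", "enwik9.drt"))

print(drt_cost)    # (404054, 355188)
print(split_gain)  # (498576, 547442)
```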



  42. The Following User Says Thank You to Darek For This Useful Post:

    kaitz (31st August 2017)

  43. #360
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    940
    Thanks
    558
    Thanked 380 Times in 284 Posts
    @Matt - can you add three submissions to LTCB?

    1) paq8pxd_v32:

    16'254'271 bytes of enwik8 encode time: 4218,56s, decode time: 4571,76s, memory used: 27'278,9MB
    128'209'407 bytes of enwik9 preprocessed by split 1423 - encode time: 41418,58s, decode time: 43518,11s, memory used: 27'278,9MB

    option -s15

    System: Core i7 4900MQ at 3.8GHz, 32GB, Win7Pro 64

    For the full LTCB entry you should add 7'023 bytes for the resplitting batch (I didn't use drt in this case, so it contains fsplit32.exe+resplit.bat packed by paq8pxd_v32 and "How to Encode.txt", nonpacked) + 111'187 bytes (7zipped source; you will probably get a slightly worse score with your zip). I've attached a 7zip file with the source, the resplit packed by v32, and the nonpacked "How to Encode.txt" file.

    2) EMMA V1.23 with ppmd_mod v3a by Shelwien:

    16'523'517 bytes of enwik8 preprocessed by DRT, decode time: 6'168,7s, encode time: 6'218,2s, memory used: 3800MB
    134'164'521 bytes of enwik9 preprocessed by split 1423 and DRT, decode time: 67'097,2s, encode time: 73006,4s, memory used: 3800MB


    EMMA 1.23 settings: all settings = MAX, except: image and audio models = off, use fast mode on long matches = off, xml=on, x86model=off, x86 exe code = off, delta coding = off, dictionary = off, ppmd memory = 1024, ppmd order = 14

    System: Core i7 4900MQ at 3.8GHz, 32GB, Win7Pro 64

    Decompressor batch file size (also attached in post) = 1'079'026 bytes compressed by 7zip. I've added ppmd_mod64.dll v3a file.

    3) paq8px_v96:

    16'704'802 bytes of enwik8 preprocessed by drt encode time: 6242,80s, decode time: 6118,22s, memory used: 1'700MB
    137'170'609 bytes of enwik9 preprocessed by split 1423 then drt - encode time: 63'618,27s, decode time: 64'113,55, memory used: 1'700MB

    option -8

    System: Core i7 4900MQ at 3.8GHz, 32GB, Win7Pro 64

    Decompressor batch file size: also attached in the post. I've zipped the source because the file is smaller.
    I've also tested later versions, but their enwik9 scores are slightly worse than v96.

    Darek
    Last edited by Darek; 29th August 2017 at 13:06.

  44. The Following User Says Thank You to Darek For This Useful Post:

    Matt Mahoney (24th September 2017)
