Page 4 of 5
Results 91 to 120 of 141

Thread: Paq8sk

  1. #91
    Member
    Join Date
    Aug 2015
    Location
    indonesia
    Posts
    281
    Thanks
    44
    Thanked 50 Times in 40 Posts
    Paq8sk23

    - improve text model
    - faster than paq8sk22
    Attached Files

  2. Thanks (2):

    Darek (31st May 2020),moisesmcardona (17th June 2020)

  3. #92
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,151
    Thanks
    703
    Thanked 455 Times in 352 Posts
    Could you post the source code with every version?

    Short time test:

    paq8sk23 is about

    - 17% faster than paq8sk22 but is still about
    - 40% slower than paq8sk19 and
    - 78% slower than paq8pxd series.
    Last edited by Darek; 31st May 2020 at 20:56.

  4. #93
    Member
    Join Date
    Aug 2015
    Location
    indonesia
    Posts
    281
    Thanks
    44
    Thanked 50 Times in 40 Posts
    Quote Originally Posted by Darek View Post
    Could you post the source code with every version?

    Short time test:

    paq8sk23 is about

    - 17% faster than paq8sk22 but is still about
    - 40% slower than paq8sk19 and
    - 78% slower than paq8pxd series.
    You are right, but the compression result is better than the paq8pxd series.

  5. #94
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    975
    Thanks
    96
    Thanked 392 Times in 274 Posts
    enwik8:
    15,753,052 bytes, 13,791.106 sec., paq8sk23 -x15 -w
    15,618,351 bytes, 14,426.736 sec., paq8sk23 -x15 -w -e1,english.dic
    Last edited by Sportman; 1st June 2020 at 22:22.
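A minimal sketch of the arithmetic behind the two paq8sk23 runs above: how many bytes the english.dic option saves and how much extra time it costs. The figures are copied from the post; the variable names are illustrative and not part of paq8sk.

```python
# enwik8 results for paq8sk23 as (compressed bytes, seconds), from the post above.
runs = {
    "-x15 -w": (15_753_052, 13_791.106),
    "-x15 -w -e1,english.dic": (15_618_351, 14_426.736),
}

plain_size, plain_time = runs["-x15 -w"]
dict_size, dict_time = runs["-x15 -w -e1,english.dic"]

# Size gain from the dictionary, in bytes and relative to the plain run.
saved_bytes = plain_size - dict_size
saved_pct = 100 * saved_bytes / plain_size

# Time cost of the dictionary, relative to the plain run.
extra_time_pct = 100 * (dict_time - plain_time) / plain_time

print(f"dictionary saves {saved_bytes:,} bytes ({saved_pct:.2f}%)")
print(f"and takes {extra_time_pct:.1f}% longer")
```

The same comparison can be repeated for the later versions by swapping in their figures.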

  6. Thanks (2):

    Darek (3rd June 2020),suryakandau@yahoo.co.id (1st June 2020)

  7. #95
    Member
    Join Date
    Aug 2015
    Location
    indonesia
    Posts
    281
    Thanks
    44
    Thanked 50 Times in 40 Posts
    Quote Originally Posted by Sportman View Post
    enwik8:
    15,618,351 bytes, 14,426.736 sec., paq8sk23 -x15 -w -e1,english.dic
    Sportman, could you please test enwik9 using paq8sk23 -x4 -w -e1,English.dic?

  8. #96
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    975
    Thanks
    96
    Thanked 392 Times in 274 Posts
    Quote Originally Posted by suryakandau@yahoo.co.id View Post
    could you please test enwik9 using paq8sk23 -x4 -w -e1,English.dic?
    I guess you mean -x14. I am only doing fast tests at the moment, so no enwik9.

  9. #97
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,151
    Thanks
    703
    Thanked 455 Times in 352 Posts
    I've started testing paq8sk19. paq8sk22 and paq8sk23 are generally too slow...

    The first 200-300MB generally (historically) go well, but after that the program often starts to use a very large amount of memory and often freezes the whole computer. As I said, I have "only" 32GB of RAM.

  10. #98
    Member
    Join Date
    Aug 2015
    Location
    indonesia
    Posts
    281
    Thanks
    44
    Thanked 50 Times in 40 Posts
    Quote Originally Posted by Darek View Post
    I've started testing paq8sk19. paq8sk22 and paq8sk23 are generally too slow...

    The first 200-300MB generally (historically) go well, but after that the program often starts to use a very large amount of memory and often freezes the whole computer. As I said, I have "only" 32GB of RAM.
    Maybe you could test paq8sk19 with the -x14 option. Thanks.

  11. #99
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,151
    Thanks
    703
    Thanked 455 Times in 352 Posts
    122'505'372 - enwik9 -x15 -w -e1,english.dic by Paq8sk19, time 134'115,35s - not as bad a time as for paq8sk13
    Score very close to my estimate.

  12. Thanks:

    suryakandau@yahoo.co.id (5th June 2020)

  13. #100
    Member
    Join Date
    Aug 2015
    Location
    indonesia
    Posts
    281
    Thanks
    44
    Thanked 50 Times in 40 Posts
    Paq8sk25

    -faster than paq8sk23
    -better than paq8sk19

    this is the source code and the binary of paq8sk25.
    @darek/sportman, could you please test it on enwik8/9 using paq8sk25 -x15 -w -e1,english.dic?
    Attached Files

  14. Thanks (2):

    Darek (7th June 2020),moisesmcardona (17th June 2020)

  15. #101
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,151
    Thanks
    703
    Thanked 455 Times in 352 Posts
    I'll try to test it on enwik9.
    Of course it depends on how slow this version is and how much memory it uses - I can't leave my laptop for 3-4 days doing nothing but testing. Using more than 26-27GB causes heavy use of the swap file, and then my laptop is unusable for anything else... But I've found that paq8sk19 uses slightly less memory than paq8sk13 on enwik9. I'll try.

  16. #102
    Member
    Join Date
    Aug 2015
    Location
    indonesia
    Posts
    281
    Thanks
    44
    Thanked 50 Times in 40 Posts
    Quote Originally Posted by Darek View Post
    I'll try to test it on enwik9.
    Of course it depends on how slow this version is and how much memory it uses - I can't leave my laptop for 3-4 days doing nothing but testing. Using more than 26-27GB causes heavy use of the swap file, and then my laptop is unusable for anything else... But I've found that paq8sk19 uses slightly less memory than paq8sk13 on enwik9. I'll try.
    I am testing it on enwik9 using -x10 -w -e1,english.dic. The process is running now. Thank you.

  17. #103
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,151
    Thanks
    703
    Thanked 455 Times in 352 Posts
    Quote Originally Posted by suryakandau@yahoo.co.id View Post
    I am testing it on enwik9 using -x10 -w -e1,english.dic. The process is running now. Thank you.
    Started. We'll see if it runs OK.

    Looks like 36h to go, for a total time of about 42h. That is for now; after 300MB the situation often gets worse.
    Last edited by Darek; 7th June 2020 at 21:55.
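The "36h to go, about 42h total" figure above implies roughly 6h had elapsed at the time of the reading. A minimal sketch of how such an estimate can be extrapolated linearly from progress so far. The progress value (one seventh of the file) is a hypothetical reading chosen to match those numbers, not something reported in the thread, and the function name is illustrative. A linear estimate assumes constant speed, so it will be optimistic if the run slows down after the first few hundred MB, as the post warns.

```python
def linear_eta_hours(elapsed_hours, bytes_done, bytes_total):
    """Remaining hours if the rest of the file runs at the average speed so far."""
    if bytes_done <= 0:
        raise ValueError("no progress yet, cannot extrapolate")
    return elapsed_hours * (bytes_total - bytes_done) / bytes_done

ENWIK9_SIZE = 1_000_000_000  # enwik9 is the first 10^9 bytes of the Wikipedia dump

# Hypothetical progress reading: 6 h elapsed, one seventh of the file processed.
elapsed = 6.0
done = ENWIK9_SIZE // 7

remaining = linear_eta_hours(elapsed, done, ENWIK9_SIZE)
print(f"about {remaining:.0f} h to go, {elapsed + remaining:.0f} h total")
```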

  18. #104
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    975
    Thanks
    96
    Thanked 392 Times in 274 Posts
    enwik8:
    15,759,356 bytes, 8,856.254 sec., paq8sk25 -x15 -w
    15,629,136 bytes, 9,290.952 sec., paq8sk25 -x15 -w -e1,english.dic
    Last edited by Sportman; 7th June 2020 at 19:13.

  19. Thanks (2):

    Darek (7th June 2020),suryakandau@yahoo.co.id (7th June 2020)

  20. #105
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,151
    Thanks
    703
    Thanked 455 Times in 352 Posts
    My estimates:

    for -x10 -w -e1,english.dic is 123'910'xxx or close to it,
    for -x15 -w -e1,english.dic is 122'505'xxx or close to it.

  21. #106
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,151
    Thanks
    703
    Thanked 455 Times in 352 Posts
    Combined enwik8 and enwik9 scores from Sportman, Suryakandau and me:

    15'758'738 - enwik8 -x15 -w by Paq8sk19
    15'629'126 - enwik8 -x15 -w -e1,english.dic by Paq8sk19
    123'910'093 - enwik9 -x10 -w -e1,english.dic by Paq8sk19
    122'505'372 - enwik9 -x15 -w -e1,english.dic by Paq8sk19

    15'759'356 - enwik8 -x15 -w by Paq8sk25
    15'629'136 - enwik8 -x15 -w -e1,english.dic by Paq8sk25
    122'497'059 - enwik9 -x15 -w -e1,english.dic by Paq8sk25 - time 135'165,86s
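To read the table above at a glance, here is a small sketch that parses the apostrophe-grouped figures and prints the paq8sk25 minus paq8sk19 difference for each configuration that was tested with both versions. The helper name `parse_size` is illustrative; the numbers are copied from the table.

```python
def parse_size(text):
    """Convert an apostrophe-grouped figure such as "15'758'738" to an int."""
    return int(text.replace("'", ""))

# (corpus, options) -> (paq8sk19 size, paq8sk25 size), as listed in the post.
scores = {
    ("enwik8", "-x15 -w"): ("15'758'738", "15'759'356"),
    ("enwik8", "-x15 -w -e1,english.dic"): ("15'629'126", "15'629'136"),
    ("enwik9", "-x15 -w -e1,english.dic"): ("122'505'372", "122'497'059"),
}

deltas = {}
for key, (sk19, sk25) in scores.items():
    deltas[key] = parse_size(sk25) - parse_size(sk19)
    corpus, options = key
    print(f"{corpus} {options}: paq8sk25 - paq8sk19 = {deltas[key]:+,} bytes")
```

A negative difference means paq8sk25 produced the smaller file; by these figures that happens only on enwik9.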

  22. Thanks:

    suryakandau@yahoo.co.id (13th June 2020)

  23. #107
    Member
    Join Date
    Aug 2015
    Location
    indonesia
    Posts
    281
    Thanks
    44
    Thanked 50 Times in 40 Posts
    Paq8sk26
    - improve text model
    - faster than paq8sk25
    the result for dickens file (silesia benchmark) using paq8sk26 -s6 -w -e1,english.dic
    Total 10192446 bytes compressed to 1901162 bytes.
    Time 2088.93 sec, used 1147 MB (1203085224 bytes) of memory
    Attached Files

  24. Thanks:

    moisesmcardona (17th June 2020)

  25. #108
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    975
    Thanks
    96
    Thanked 392 Times in 274 Posts
    enwik8:
    15,762,241 bytes, 8,188.465 sec., paq8sk26 -x15 -w
    15,633,785 bytes, 8,596.777 sec., paq8sk26 -x15 -w -e1,english.dic
    Last edited by Sportman; 13th June 2020 at 20:25.

  26. Thanks:

    suryakandau@yahoo.co.id (13th June 2020)

  27. #109
    Member
    Join Date
    Aug 2015
    Location
    indonesia
    Posts
    281
    Thanks
    44
    Thanked 50 Times in 40 Posts
    Quote Originally Posted by Sportman View Post
    enwik8:
    15,633,785 bytes, 8,596.777 sec., paq8sk26 -x15 -w -e1,english.dic
    @darek, could you test paq8sk26 on enwik9?

  28. #110
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,151
    Thanks
    703
    Thanked 455 Times in 352 Posts
    paq8sk26 looks like a step back in compression vs. paq8sk25...
    Let's think about whether it's reasonable to test this version.

    My current plan:

    1) enwik9_1423.drt for paq8px_v172fix2 - running - 35h to go
    2) enwik9 + phda dict for paq8pxd v86 (best score so far) - to test whether this dictionary has any impact on enwik9. There was no impact on enwik8, my testfile and 4 corpuses, but maybe... I need to know for certain, because if it has any impact then there would be an impact for paq8skXX too - about 30h
    3) paq8sk - the question is whether it is more reasonable to test paq8sk23 (a long, long test, but with a chance of some improvement) or paq8sk26 (rather a loss of about 45KB)?

    In the meantime, GPT-2 tests, because they don't consume too much memory.
    Last edited by Darek; 13th June 2020 at 22:01.

  29. #111
    Member
    Join Date
    Aug 2015
    Location
    indonesia
    Posts
    281
    Thanks
    44
    Thanked 50 Times in 40 Posts
    Quote Originally Posted by Darek View Post
    paq8sk26 looks like a step back in compression vs. paq8sk25...
    Let's think about whether it's reasonable to test this version.

    My current plan:

    1) enwik9_1423.drt for paq8px_v172fix2 - running - 35h to go
    2) enwik9 + phda dict for paq8pxd v86 (best score so far) - to test whether this dictionary has any impact on enwik9. There was no impact on enwik8, my testfile and 4 corpuses, but maybe... I need to know for certain, because if it has any impact then there would be an impact for paq8skXX too - about 30h
    3) paq8sk - the question is whether it is more reasonable to test paq8sk23 (a long, long test, but with a chance of some improvement) or paq8sk26 (rather a loss of about 45KB)?

    In the meantime, GPT-2 tests, because they don't consume too much memory.
    I think your second plan is useless because the phda dict is specifically designed for the phda9 software. I have tried it on the dickens file and the result was bigger than with english.dic.

  30. #112
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,151
    Thanks
    703
    Thanked 455 Times in 352 Posts
    Quote Originally Posted by suryakandau@yahoo.co.id View Post
    I think your second plan is useless because the phda dict is specifically designed for the phda9 software. I have tried it on the dickens file and the result was bigger than with english.dic.
    I need to check it. The PHDA dictionary was (I think) optimized mostly for enwik9, so despite enwik8 getting a worse score, there is still a chance of a better score for enwik9.

  31. #113
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,151
    Thanks
    703
    Thanked 455 Times in 352 Posts
    Is it better to test paq8sk26, or maybe go back to paq8sk23?

  32. #114
    Member
    Join Date
    Aug 2015
    Location
    indonesia
    Posts
    281
    Thanks
    44
    Thanked 50 Times in 40 Posts
    Quote Originally Posted by Darek View Post
    Is it better to test paq8sk26, or maybe go back to paq8sk23?
    It is better to test paq8sk23 than paq8sk26. Thank you.

  33. #115
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,151
    Thanks
    703
    Thanked 455 Times in 352 Posts
    started
    First time estimate - 65h
    Last edited by Darek; 16th June 2020 at 19:06.

  34. #116
    Member
    Join Date
    Aug 2015
    Location
    indonesia
    Posts
    281
    Thanks
    44
    Thanked 50 Times in 40 Posts
    Paq8sk28
    - improve text model
    this is the source code and the binary of paq8sk28
    @sportman, could you please test it on enwik8 using paq8sk28 -x15 -w -e1,english.dic?
    Attached Files

  35. Thanks:

    moisesmcardona (17th June 2020)

  36. #117
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    975
    Thanks
    96
    Thanked 392 Times in 274 Posts
    enwik8:
    15,755,989 bytes, 14,188.132 sec., paq8sk28 -x15 -w
    15,622,996 bytes, 14,965.880 sec., paq8sk28 -x15 -w -e1,english.dic
    Last edited by Sportman; 18th June 2020 at 14:59.

  37. #118
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,151
    Thanks
    703
    Thanked 455 Times in 352 Posts
    Quote Originally Posted by Sportman View Post
    enwik8:
    15,622,996 bytes, 14,965.880 sec., paq8sk28 -x15 -w -e1,english.dic
    Looks similar to the paq8sk23 version but with a slightly worse score and a longer run time...

  38. #119
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,151
    Thanks
    703
    Thanked 455 Times in 352 Posts
    enwik9:

    122'364'274 - enwik9 -x15 -w -e1,english.dic by Paq8sk23, time 231'436,27s - best score, but a very bad compression time and huge memory usage - it's hard to use a laptop with 32GB while paq8sk is running - especially after the first 300-400MB have passed.

    By my estimates this is the best paq8sk score, and even paq8sk26 and paq8sk28 still don't beat it:

    122'726'582 - enwik9 -x15 -w -e1,english.dic by Paq8sk26 - hmmm, not a good direction - paq8sk25 has a better score (by 230KB) and takes 15% less time.
    122'641'887 - estimated (changed) - enwik9 -x15 -w -e1,english.dic by Paq8sk28 - running
    Last edited by Darek; 21st June 2020 at 10:50.
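A short sketch that ranks the enwik9 -x15 -w -e1,english.dic results mentioned so far and shows each one's gap to the best. The paq8sk25 figure is taken from post #106; the paq8sk28 entry is Darek's estimate, not a measured result. Variable names are illustrative.

```python
# enwik9 -x15 -w -e1,english.dic sizes in bytes, as reported in this thread.
results = {
    "paq8sk23": 122_364_274,
    "paq8sk25": 122_497_059,
    "paq8sk26": 122_726_582,
    "paq8sk28 (estimate)": 122_641_887,
}

best = min(results.values())
ranking = sorted(results.items(), key=lambda item: item[1])

for name, size in ranking:
    gap = size - best
    print(f"{name:22s} {size:>12,}  +{gap:,} bytes ({100 * gap / best:.3f}%)")

# The "230KB" remark compares paq8sk26 with paq8sk25.
sk26_vs_sk25 = results["paq8sk26"] - results["paq8sk25"]
print(f"paq8sk26 is {sk26_vs_sk25:,} bytes larger than paq8sk25")
```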

  39. Thanks:

    suryakandau@yahoo.co.id (19th June 2020)

  40. #120
    Member
    Join Date
    Aug 2015
    Location
    indonesia
    Posts
    281
    Thanks
    44
    Thanked 50 Times in 40 Posts
    Paq8sk29
    -tweak text and word model

    paq8sk29 -s6 -w -e1,english.dic enwik8
    Total 100000000 bytes compressed to 16380110 bytes.
    Time 35921.93 sec, used 3583 MB (3757850002 bytes) of memory
    Attached Files
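The two-line summary quoted in the post above can be turned into comparable figures (bits per character and throughput) with a small parser. This is a sketch that assumes the log looks exactly like the lines quoted in this thread; the function name `parse_log` and the dictionary keys are illustrative and not part of paq8sk.

```python
import re

# Summary lines as quoted in the paq8sk29 post.
LOG = """\
Total 100000000 bytes compressed to 16380110 bytes.
Time 35921.93 sec, used 3583 MB (3757850002 bytes) of memory
"""

def parse_log(text):
    """Extract sizes, time and memory from the quoted two-line summary."""
    total = re.search(r"Total (\d+) bytes compressed to (\d+) bytes", text)
    timing = re.search(r"Time ([\d.]+) sec, used (\d+) MB", text)
    if not total or not timing:
        raise ValueError("log does not match the expected two-line summary")
    original, compressed = int(total.group(1)), int(total.group(2))
    seconds = float(timing.group(1))
    return {
        "original": original,
        "compressed": compressed,
        "seconds": seconds,
        "memory_mb": int(timing.group(2)),
        "bpc": 8 * compressed / original,          # bits per input byte
        "kb_per_sec": original / 1000 / seconds,   # input throughput
    }

stats = parse_log(LOG)
print(f"{stats['bpc']:.4f} bpc, {stats['kb_per_sec']:.2f} KB/s, {stats['memory_mb']} MB")
```

Note that this run used -s6, so its size is not directly comparable with the -x15 enwik8 results posted earlier in the thread.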

  41. Thanks:

    moisesmcardona (25th June 2020)
