Page 77 of 78
Results 2,281 to 2,310 of 2329

Thread: paq8px

  1. #2281
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Switzerland
    Posts
    739
    Thanks
    424
    Thanked 486 Times in 260 Posts
    150'000? Are you sure you wanted to ask 150'000?
    OK. Let's do the math

    Open Darek's MaximumCompression results a couple of posts above.
    Look at the first result he recorded (for paq8px_v75): 637110
    Look at the last result (for paq8px_v200): 624578
    Gaining that 12532 bytes took roughly 125 versions.
    Doing a simple interpolation (which is totally incorrect, but fits very well to your question): 150'000 bytes will be reached at around paq8px_v4858.

  2. Thanks:

    CompressMaster (20th January 2021)

  3. #2282
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,273
    Thanks
    802
    Thanked 545 Times in 415 Posts
    Quote Originally Posted by Gotty View Post
    150'000? Are you sure you wanted to ask 150'000?
    OK. Let's do the math

    Open Darek's MaximumCompression results a couple of posts above.
    Look at the first result he recorded (for paq8px_v75): 637110
    Look at the last result (for paq8px_v200): 624578
    Gaining that 12532 bytes took roughly 125 versions.
    Doing a simple interpolation (which is totally incorrect, but fits very well to your question): 150'000 bytes will be reached at around paq8px_v4858.
    If we assume about 20 versions per year, that would be around the year 2263... unless there is a breakthrough, or several.
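    The back-of-the-envelope extrapolation above can be reproduced in a few lines. This is only a sketch of the linear interpolation Gotty describes (which he himself calls totally incorrect); the function name is made up for illustration, and 2021 is assumed as the starting year because that is when these posts were written.

```python
def linear_extrapolation(first_size, last_size, versions_elapsed, target_size):
    """Versions needed, counted from the first measurement, to reach target_size
    if every version kept saving the same average number of bytes."""
    bytes_per_version = (first_size - last_size) / versions_elapsed
    return (first_size - target_size) / bytes_per_version

# Figures quoted in the posts above (paq8px_v75 and paq8px_v200 results).
first_size, last_size = 637110, 624578
versions_elapsed = 125          # "roughly 125 versions", v75 -> v200
target_size = 150000

gain = first_size - last_size
versions_needed = linear_extrapolation(first_size, last_size,
                                       versions_elapsed, target_size)

# Darek's follow-up: assume roughly 20 releases per year, starting from 2021.
years_needed = versions_needed / 20
print(f"gain so far: {gain} bytes")
print(f"versions needed: {versions_needed:.1f}")
print(f"years needed: {years_needed:.1f}, i.e. around {2021 + int(years_needed)}")
```

    The result depends entirely on the assumption of a constant gain per version, which is why both posters treat it as a joke rather than a forecast.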

  4. #2283
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Switzerland
    Posts
    739
    Thanks
    424
    Thanked 486 Times in 260 Posts
    Quote Originally Posted by Darek View Post
    One thing which is worth mentioning: between versions paq8px_v192 and paq8px_v193 something strange happened to the compression of tar files. There are no big changes for the other, single files, however:

    Calgary.tar = 542'110 bytes compressed by paq8px_v192
    Calgary.tar = 557'019 bytes compressed by paq8px_v193 - about 15 KB worse score for this file... however, the score for the sum of all files compressed separately is similar (136 bytes worse, not 15KB)

    Canterbury.tar = 290'645 bytes compressed by paq8px_v192
    Canterbury.tar = 298'700 bytes compressed by paq8px_v193 - about 7KB worse score for this file - and again the score for the sum of all files compressed separately is similar (4 bytes better)

    This worse level of compression for these files persists up to paq8px_v197.

    On the other hand, there is no such effect on the MaximumCompression.tar file:

    MaximumCompression.tar = 6'004'940 bytes compressed by paq8px_v192
    MaximumCompression.tar = 5'959'185 bytes compressed by paq8px_v193 - about 5.8KB better score for this file - the score for the sum of all files compressed separately is also better, but the difference is about 3.5KB
    I need your help investigating it (I can't reproduce). Could you tell me the command line switches you used?

  5. Thanks:

    LucaBiondi (17th January 2021)

  6. #2284
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,273
    Thanks
    802
    Thanked 545 Times in 415 Posts
    Quote Originally Posted by Gotty View Post
    I need your help investigating it (I can't reproduce). Could you tell me the command line switches you used?
    Yes, of course. I'll check it again and give you switches used for these versions.

  7. Thanks:

    LucaBiondi (17th January 2021)

  8. #2285
    Member
    Join Date
    Jun 2009
    Location
    Puerto Rico
    Posts
    277
    Thanks
    164
    Thanked 64 Times in 49 Posts
    @Gotty, can the BZip2 transform that @kaitz added in paq8pxd be ported to paq8px?

  9. Thanks (2):

    Gotty (18th January 2021),mpais (18th January 2021)

  10. #2286
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    538
    Thanks
    225
    Thanked 392 Times in 203 Posts
    Quote Originally Posted by moisesmcardona View Post
    @Gotty, can the BZip2 transform that @kaitz added in paq8pxd be ported to paq8px?
    @Gotty If you do, please add: mbr, base85, uuencode
    KZo


  11. Thanks (2):

    Gotty (18th January 2021),mpais (18th January 2021)

  12. #2287
    Member
    Join Date
    May 2008
    Location
    Estonia
    Posts
    538
    Thanks
    225
    Thanked 392 Times in 203 Posts
    MC
    Code:
                            paq8px_v200  paq8px_v201  diff
    file              size           -8           -8
    A10.jpg         842468       624597       624587    10
    AcroRd32.exe   3870784       823707       823468   239
    english.dic    4067439       346422       345366  1056
    FlashMX.pdf    4526946      1315382      1314782   600
    FP.LOG        20617071       215399       213420  1979 *
    MSO97.DLL      3782416      1175358      1175162   196
    ohs.doc        4168192       454753       454784   -31
    rafale.bmp     4149414       468156       468095    61
    vcfiu.hlp      4121418       372048       371119   929
    world95.txt    2988578       313915       313828    87
    Total                       6109737      6104611  5126
    KZo


  13. Thanks (2):

    Gotty (18th January 2021),mpais (18th January 2021)

  14. #2288
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,273
    Thanks
    802
    Thanked 545 Times in 415 Posts
    Quote Originally Posted by Darek View Post
    Yes, of course. I'll check it again and give you switches used for these versions.
    I've tested it again and found slightly better options now (the effect/mechanism is the same). The scores are:

    Calgary Corpus => option "-8lrta"
    541'975 - for paq8px v191, v191a, v192
    556'856 - for paq8px v193 and higher (with some small variations, of course, but this is the baseline)

    Canterbury Corpus => option "-8lrta" - the same as for Calgary Corpus
    290'599 - for paq8px v191, v191a, v192
    298'652 - for paq8px v193 and higher (with some small variations, of course, but this is the baseline)

  15. Thanks:

    Gotty (18th January 2021)

  16. #2289
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,273
    Thanks
    802
    Thanked 545 Times in 415 Posts
    Scores for paq8px v201 on my testset - a nice gain for K.WAD, and smaller gains for other files.
    Some image files and the 0.WAV audio file also show small losses; in total, however, this is a gain of about 3.7KB.
    Attached thumbnail: paq8px_v201_DBA_Corpus.jpg (853.2 KB)

  17. Thanks (2):

    Gotty (18th January 2021),mpais (18th January 2021)

  18. #2290
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    577
    Thanks
    220
    Thanked 833 Times in 341 Posts
    Quote Originally Posted by Darek View Post
    One insight for paq8px v193 - for some files ("sum" from the Canterbury Corpus and both the Calgary and Canterbury tar files) it looks like the -r option doesn't work: it brought some improvements in paq8px v192 and now these gains are reversed. Additionally, for these three files the scores with and without -r are the same in paq8px v193, despite the fact that paq8px v192 got nice improvements from the -r option.
    Quote Originally Posted by Darek View Post
    One thing which is worth mentioning: between versions paq8px_v192 and paq8px_v193 something strange happened to the compression of tar files. There are no big changes for the other, single files, however:

    Calgary.tar = 542'110 bytes compressed by paq8px_v192
    Calgary.tar = 557'019 bytes compressed by paq8px_v193 - about 15 KB worse score for this file... however, the score for the sum of all files compressed separately is similar (136 bytes worse, not 15KB)

    Canterbury.tar = 290'645 bytes compressed by paq8px_v192
    Canterbury.tar = 298'700 bytes compressed by paq8px_v193 - about 7KB worse score for this file - and again the score for the sum of all files compressed separately is similar (4 bytes better)

    This worse level of compression for these files persists up to paq8px_v197.

    On the other hand, there is no such effect on the MaximumCompression.tar file:

    MaximumCompression.tar = 6'004'940 bytes compressed by paq8px_v192
    MaximumCompression.tar = 5'959'185 bytes compressed by paq8px_v193 - about 5.8KB better score for this file - the score for the sum of all files compressed separately is also better, but the difference is about 3.5KB
    Quote Originally Posted by Darek View Post
    Something strange even happens with the tarballs Calgary.tar and Canterbury.tar, which lost a lot between paq8px v192 and paq8px v193. They "lost" 15KB and 8KB compared to the previous scores.
    Quote Originally Posted by mpais View Post
    In v191a and v192, the loading of LSTM models was just a quick hack, the english model was always loaded at startup (when using the "r" option), and with v192 the x86/64 LSTM model was loaded when the first x86/64 block was detected. In both cases we'd then simply keep iterating on them, any previous learning would be lost.

    With v193, we use a repository of LSTM models, which for now just includes the default (untrained and randomly initialized) model, and the english and x86/64 models. If a block requiring one of the pre-trained models is found, we save the progress made with the current model, and then switch to the other model. This way, on compound files, we don't lose compression by uselessly training a text model on image data, for instance. Now, since the early versions always loaded the english model, on some files that aren't detected as text but are maybe a mixture of text with other data, it might have helped a bit to load it anyway. But for others, it hurt compression, like I.EXE from your testset. The sole exception is that when using the x86/64 LSTM model, if the next block is detected as just a "default" block, we continue using that model, because the way we detect x86/64 code isn't ideal and we often get those "default" blocks between x86/64 blocks.

    Since the LSTM takes a long time to readapt, it's important to only load pre-trained models when we're pretty sure they'll be useful, because otherwise the network will take much longer to learn than if it had simply started from random weights.
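    The switching behaviour described above can be pictured with a small sketch. It is not the paq8px code: the class and method names, the block type strings, and the use of a plain dict as "model state" are all assumptions made for illustration. Only the rules come from the post: progress is kept per model, a pre-trained model is used only for matching blocks, and the x86/64 model is kept across "default" blocks.

```python
class LstmModelRepository:
    """Toy stand-in for a repository of LSTM model states (hypothetical API)."""

    def __init__(self):
        # "default" stands for the untrained model; the other two stand in
        # for the pre-trained english and x86/64 weights.
        self.models = {
            "default": {"bytes_seen": 0},
            "english": {"bytes_seen": 0},
            "x86_64": {"bytes_seen": 0},
        }
        self.active = "default"

    def model_for_block(self, block_type):
        """Choose which model a block of the given type should use."""
        if block_type == "text":
            return "english"
        if block_type == "exe":
            return "x86_64"
        # Exception from the post: x86/64 detection is imperfect, so a
        # "default" block right after x86/64 code keeps the x86/64 model.
        if block_type == "default" and self.active == "x86_64":
            return "x86_64"
        return "default"

    def process_block(self, block_type, size):
        wanted = self.model_for_block(block_type)
        if wanted != self.active:
            # The state of the previous model stays stored in self.models,
            # so its progress is not lost when we switch away from it.
            self.active = wanted
        self.models[self.active]["bytes_seen"] += size
        return self.active


repo = LstmModelRepository()
blocks = [("text", 1000), ("image", 5000), ("text", 2000),
          ("exe", 3000), ("default", 500), ("exe", 1500)]
used = [repo.process_block(kind, size) for kind, size in blocks]
print(used)
print({name: state["bytes_seen"] for name, state in repo.models.items()})
```

    In this toy run the image block never touches the english model, which is the point of the repository: the text model is not "uselessly trained on image data".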

  19. Thanks (2):

    Gotty (18th January 2021),LucaBiondi (18th January 2021)

  20. #2291
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Switzerland
    Posts
    739
    Thanks
    424
    Thanked 486 Times in 260 Posts
    That explains it. So fixing my detection routine has jumped to the no. 1 spot on my to-do list. My next version will be about detections and transforms anyway - as requested. It fits perfectly.

  21. Thanks:

    LucaBiondi (18th January 2021)

  22. #2292
    Member CompressMaster's Avatar
    Join Date
    Jun 2018
    Location
    Lovinobana, Slovakia
    Posts
    216
    Thanks
    66
    Thanked 18 Times in 18 Posts
    How is compression of broken files handled in paq8px? Does it refuse to compress them at all, or does it at least try?
    By "broken" I don't mean text files with missing characters; I mean, for example, a JPG without a header, a PDF corrupted by a virus, etc.
    Please hit the "THANKS" button under my post if its useful for you.

  23. #2293
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Switzerland
    Posts
    739
    Thanks
    424
    Thanked 486 Times in 260 Posts
    Paq8px has a bunch of "general" models for "general data", and specialized models for each kind of special data: audio, jpg and bitmap images.
    If paq8px doesn't see anything "special" about the data (this also applies when it can't detect the file type (blockType) because of some data corruption), it will just use its default models to compress it.
    Additional note: there are a couple of special blocktypes that paq8px can transform before compression, like zip, cdrom, exe. If it fails to detect these formats, then no transformation takes place, and without these compression-helping transformations the compression ratio will be somewhat worse.
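    A rough picture of the fallback described above, assuming a much simplified detector that only looks at leading magic bytes (real detection in paq8px is more involved, and every name below is hypothetical). A JPEG whose header has been cut off no longer matches any signature and is simply treated as "default" data.

```python
# Leading signatures for two image formats (well-known magic bytes).
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"BM": "bmp",
}

def detect_block_type(data: bytes) -> str:
    """Return a block type if a known signature is found, else "default"."""
    for magic, block_type in SIGNATURES.items():
        if data.startswith(magic):
            return block_type
    return "default"

def choose_models(block_type: str) -> list:
    """General models are always used; a specialized model is added on top
    only when the block type was recognised."""
    models = ["general"]
    if block_type != "default":
        models.append(block_type + "-specific")
    return models

intact_jpeg = b"\xff\xd8\xff\xe0" + bytes(100)
headerless_jpeg = intact_jpeg[20:]   # header stripped: no signature left

for sample in (intact_jpeg, headerless_jpeg):
    kind = detect_block_type(sample)
    print(kind, choose_models(kind))
```

    The damaged file is still compressed; it just loses the benefit of the specialized model.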

  24. Thanks (2):

    CompressMaster (21st January 2021),LucaBiondi (21st January 2021)

  25. #2294
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,273
    Thanks
    802
    Thanked 545 Times in 415 Posts
    Scores of 4 corpora for paq8px v201.
    For the Calgary corpus there is no record, however most files got their best results. The strange thing is that the "trans" file lost 5.6% between versions v198 and v199.
    For the Canterbury corpus there is a record for total files compression, but no record for the tar file due to the mentioned change between v192 and v193.
    For the MaximumCompression corpus - the best score for the paq8px series, about 5KB of gain, and 5.5KB of gain for the tar file.
    For the Silesia corpus - there is a record score (15.5KB of gain) and THE BEST SCORE overall - that means paq8px beat the best cmix score (with precompressor)!!!!

    Request for Matt Mahoney: could you add this score to the official Silesia page? Detailed scores are as follows:

    compressor version: paq8px_v201, no precompression

    FILE | score | truncated score
    dickens | 1,796,777 | 1796
    mozilla | 6,569,742 | 6569
    mr | 1,839,331 | 1839
    nci | 815,620 | 815
    ooffice | 1,183,903 | 1183
    osdb | 1,978,882 | 1978
    reymont | 698,148 | 698
    samba | 1,625,828 | 1625
    sao | 3,733,172 | 3733
    webster | 4,413,395 | 4413
    x-ray | 3,521,326 | 3521
    xml | 250,710 | 250
    TOTAL | 28,426,834 | 28426
    Attached thumbnail: paq8px_v201_4_Corpuses.jpg (3.85 MB)

  26. Thanks (4):

    Gotty (21st January 2021),kaitz (21st January 2021),LucaBiondi (21st January 2021),schnaader (21st January 2021)

  27. #2295
    Member
    Join Date
    Aug 2015
    Location
    indonesia
    Posts
    514
    Thanks
    63
    Thanked 96 Times in 75 Posts
    @Darek, which options do you use?

  28. #2296
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Switzerland
    Posts
    739
    Thanks
    424
    Thanked 486 Times in 260 Posts
    Quote Originally Posted by Darek View Post
    For the Calgary corpus there is no record, however most files got their best results. The strange thing is that the "trans" file lost 5.6% between versions v198 and v199.
    The change I made to NormalModel in v199 ("NormalModel now includes the BlockType in its contexts") is not fully compatible with text pre-training. Calgary/trans consists mostly of text but is detected as "default", so it lost the boost given by text pre-training.
    A fix is coming in my next version.
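    Why adding the BlockType to a context can hide pre-trained statistics may be easier to see in a toy example. The sketch below is not paq8px code: the counting table, the function names and the block type labels are invented for illustration. Statistics learned while the block type was "text" are stored under keys containing "text", so a file detected as "default" looks them up under different keys and finds nothing.

```python
from collections import defaultdict

def make_key(block_type, history, include_block_type):
    """Context key: the preceding bytes, optionally combined with the block type."""
    return (block_type, history) if include_block_type else history

def train(table, text, block_type, include_block_type, order=2):
    """Count which byte follows each order-N context."""
    for i in range(order, len(text)):
        key = make_key(block_type, text[i - order:i], include_block_type)
        table[key][text[i]] += 1

def known_contexts(table, text, block_type, include_block_type, order=2):
    """How many positions in `text` hit a context that already has statistics."""
    hits = 0
    for i in range(order, len(text)):
        key = make_key(block_type, text[i - order:i], include_block_type)
        if key in table:
            hits += 1
    return hits

pretraining = b"the quick brown fox jumps over the lazy dog"
data = b"the lazy fox"

results = {}
for include in (False, True):
    table = defaultdict(lambda: defaultdict(int))
    train(table, pretraining, "text", include)               # pre-training as text
    results[include] = known_contexts(table, data, "default", include)
    print(f"block type in context: {include}, "
          f"contexts already known: {results[include]}")
```

    Without the block type in the key, the pre-trained contexts are found again; with it, the "default" file starts from scratch, which matches the loss reported for Calgary/trans.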

  29. #2297
    Member CompressMaster's Avatar
    Join Date
    Jun 2018
    Location
    Lovinobana, Slovakia
    Posts
    216
    Thanks
    66
    Thanked 18 Times in 18 Posts
    Quote Originally Posted by Darek View Post
    Request for Matt Mahoney: could you add this score to the official Silesia page? Detailed scores are as follow:
    @Darek, I highly doubt that this is the proper way for reaching Mr. Matt Mahoney. Better to navigate to his website mattmahoney.net and contact him from there.
    Please hit the "THANKS" button under my post if its useful for you.

  30. Thanks:

    Darek (22nd January 2021)

  31. #2298
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    4,136
    Thanks
    320
    Thanked 1,397 Times in 802 Posts
    The fastest method is probably https://www.facebook.com/mattmahoneyfl

  32. Thanks:

    Darek (22nd January 2021)

  33. #2299
    Member
    Join Date
    Aug 2015
    Location
    indonesia
    Posts
    514
    Thanks
    63
    Thanked 96 Times in 75 Posts
    Quote Originally Posted by Darek View Post
    Scores of 4 corpora for paq8px v201.
    For the Calgary corpus there is no record, however most files got their best results. The strange thing is that the "trans" file lost 5.6% between versions v198 and v199.
    For the Canterbury corpus there is a record for total files compression, but no record for the tar file due to the mentioned change between v192 and v193.
    For the MaximumCompression corpus - the best score for the paq8px series, about 5KB of gain, and 5.5KB of gain for the tar file.
    For the Silesia corpus - there is a record score (15.5KB of gain) and THE BEST SCORE overall - that means paq8px beat the best cmix score (with precompressor)!!!!

    Request for Matt Mahoney: could you add this score to the official Silesia page? Detailed scores are as follows:

    compressor version: paq8px_v201, no precompression

    FILE | score | truncated score
    dickens | 1,796,777 | 1796
    mozilla | 6,569,742 | 6569
    mr | 1,839,331 | 1839
    nci | 815,620 | 815
    ooffice | 1,183,903 | 1183
    osdb | 1,978,882 | 1978
    reymont | 698,148 | 698
    samba | 1,625,828 | 1625
    sao | 3,733,172 | 3733
    webster | 4,413,395 | 4413
    x-ray | 3,521,326 | 3521
    xml | 250,710 | 250
    TOTAL | 28,426,834 | 28426
    I have made a small improvement to the lstm model by adding 1 mixercontextsets and 2 mixerinputs. @Darek, could you test it on the Silesia files using the -12lrta option? I think it can save more space. Thank you!!
    Attached Files

  34. #2300
    Member
    Join Date
    Sep 2014
    Location
    Italy
    Posts
    109
    Thanks
    139
    Thanked 50 Times in 30 Posts
    Hi suryakandau
    Could you post your modifications, for example as a diff, please?
    I would like to learn (slowly) how paq8px works... thank you
    Luca

  35. #2301
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Switzerland
    Posts
    739
    Thanks
    424
    Thanked 486 Times in 260 Posts
    Yes, as Luca said, and as Moises said, and as I said before: don't post source code here. It's quite cumbersome to find the changes in your contributions that way.
    Do a pull request to my branch instead if you would like to have it reviewed. Would you do that, please?
    I'm not an authority here, but I must act as one, as I haven't seen a modification/improvement from your side that was bug-free or issue-free. Maybe your current one is an exception, I don't know yet. But I'm willing to review it only when you use git. No source code in the forum, please. That's not reviewer-friendly.

  36. #2302
    Member
    Join Date
    Aug 2015
    Location
    indonesia
    Posts
    514
    Thanks
    63
    Thanked 96 Times in 75 Posts
    I just added a mixercontextset in textmodel.cpp and simdlstmmodel.hpp...

  37. #2303
    Member
    Join Date
    Jun 2009
    Location
    Puerto Rico
    Posts
    277
    Thanks
    164
    Thanked 64 Times in 49 Posts
    Quote Originally Posted by suryakandau@yahoo.co.id View Post
    I just added a mixercontextset in textmodel.cpp and simdlstmmodel.hpp...
    As we said, we will not be reviewing code that is not submitted via Git.

  38. Thanks:

    LucaBiondi (25th January 2021)

  39. #2304
    Member
    Join Date
    Aug 2015
    Location
    indonesia
    Posts
    514
    Thanks
    63
    Thanked 96 Times in 75 Posts
    Quote Originally Posted by Darek View Post
    Scores for paq8px v201 on my testset - a nice gain for K.WAD, and smaller gains for other files.
    Some image files and the 0.WAV audio file also show small losses; in total, however, this is a gain of about 3.7KB.
    That is not paq8sk in your testset, Darek; it should be paq8px.

  40. #2305
    Member
    Join Date
    Aug 2015
    Location
    indonesia
    Posts
    514
    Thanks
    63
    Thanked 96 Times in 75 Posts
    @darek, I mean it should be paq8px with LSTM in your test set not paq8sk

  41. Thanks:

    Darek (27th January 2021)

  42. #2306
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,273
    Thanks
    802
    Thanked 545 Times in 415 Posts
    Quote Originally Posted by suryakandau@yahoo.co.id View Post
    @darek, I mean it should be paq8px with LSTM in your test set not paq8sk
    Thanks, I know. I didn't keep that table, so I can't change it now.

  43. #2307
    Member
    Join Date
    Aug 2015
    Location
    indonesia
    Posts
    514
    Thanks
    63
    Thanked 96 Times in 75 Posts
    Quote Originally Posted by suryakandau@yahoo.co.id View Post
    I have made a small improvement to the lstm model by adding 1 mixercontextsets and 2 mixerinputs. @Darek, could you test it on the Silesia files using the -12lrta option? I think it can save more space. Thank you!!
    I just tested paq8px201 and paq8px201fix1 on the Silesia corpus using the -8lrta option. The results are:

    -8lrta    paq8px201ori  paq8px201fix1
    xml             253766         253142
    dickens        1899100        1898292
    mozilla        6925711        6910517
    mr             1988233        1987423
    ooffice        1298763        1297553
    reymont         751183         748527
    sao            3749141        3749666
    xray           3582991        3583354
    osdb           2027465        2027137
    samba          1672088        1669365
    nci             849920         843474
    webster        4657102        4646529
              ------------  -------------
    total         29655463       29614979
    diff 40484 bytes
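    To see where the 40484 bytes come from, the per-file differences can be computed from the table above. A small sketch (the numbers are copied from the post; the variable names are arbitrary):

```python
# (file, paq8px201 original, paq8px201fix1) as posted, option -8lrta
results = [
    ("xml",      253766,  253142),
    ("dickens", 1899100, 1898292),
    ("mozilla", 6925711, 6910517),
    ("mr",      1988233, 1987423),
    ("ooffice", 1298763, 1297553),
    ("reymont",  751183,  748527),
    ("sao",     3749141, 3749666),
    ("xray",    3582991, 3583354),
    ("osdb",    2027465, 2027137),
    ("samba",   1672088, 1669365),
    ("nci",      849920,  843474),
    ("webster", 4657102, 4646529),
]

total_ori = sum(ori for _, ori, _ in results)
total_fix = sum(fix for _, _, fix in results)

for name, ori, fix in results:
    saved = ori - fix   # positive means the modified build is smaller
    print(f"{name:8} {saved:7d} bytes ({100 * saved / ori:+.3f}%)")
print(f"total    {total_ori - total_fix:7d} bytes")

# Files where the modified build produced a larger output.
regressions = [name for name, ori, fix in results if fix > ori]
print("files that got larger:", regressions)
```

    Listing the regressions separately makes it easy to spot files where the change hurts, even when the total improves.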

  44. #2308
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Switzerland
    Posts
    739
    Thanks
    424
    Thanked 486 Times in 260 Posts
    That sounds really good. I appreciate that you share your test results.
    Please pull request your changes to my git repo.

  45. #2309
    Member
    Join Date
    Aug 2015
    Location
    indonesia
    Posts
    514
    Thanks
    63
    Thanked 96 Times in 75 Posts
    Thank you for your response, Gotty. I am still struggling with making a pull request on git. Don't blame me...

  46. #2310
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,273
    Thanks
    802
    Thanked 545 Times in 415 Posts
    Quote Originally Posted by suryakandau@yahoo.co.id View Post
    I have made a small improvement to the lstm model by adding 1 mixercontextsets and 2 mixerinputs. @Darek, could you test it on the Silesia files using the -12lrta option? I think it can save more space. Thank you!!
    Two questions:
    1) Could you provide an executable file?
    2) Did you use a different english.exp file than paq8px_v201? Your file has 19'651 bytes, whereas the one in paq8px_v201 has 21'529 bytes.
