
Thread: EMMA - Context Mixing Compressor

  1. #91
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    520
    Thanks
    196
    Thanked 744 Times in 301 Posts
    Quote Originally Posted by Darek
    I'm running an ENWIK9 test on EMMA 0.1.4 without dictionaries, as Mauro suggested, to check whether this option, without the additional files, gives a better result.

    However, I didn't switch off the image models, and EMMA found one 24bpp image inside. I don't know if it's a real image, only a header, or a mistake, but if it's a mistake, fixing it might help get a slightly better result...
    It's probably a false positive from the PPM parser; I'll check it out, thank you.
    I've also realized, from glancing at SqueezeChart, that I had forgotten to enable the colorspace
    transform for PPM files, so the results weren't as good as expected. I'll fix it in the next version.

    I've almost finished writing the XML parser and model. It seems to help on small XML/HTML files
    (around 200 to 300KB), but not so much on enwik.

    Best regards

  2. #92
    Expert
    Matt Mahoney
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Probably a false positive. From http://mattmahoney.net/dc/textdata.html

    The data is UTF-8 clean. All characters are in the range U+0000 to U+10FFFF with valid encodings of 1 to 4 bytes. The byte values 0xC0, 0xC1, and 0xF5-0xFF never occur. Also, in the Wikipedia dumps, there are no control characters in the range 0x00-0x1F except for 0x09 (tab) and 0x0A (linefeed).
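    The byte-level properties quoted above are easy to check mechanically. Below is a minimal Python sketch (the helper name `check_enwik_bytes` is ours, not from Matt's page) that reports forbidden bytes, stray control characters and malformed UTF-8. It loops byte by byte, so it only illustrates the rules; it is not meant to scan the full 1GB of enwik9 quickly.

```python
def check_enwik_bytes(data: bytes) -> list[str]:
    """Return violations of the enwik8/9 byte properties (hypothetical helper)."""
    problems = []
    # 0xC0, 0xC1 and 0xF5-0xFF can never appear in valid UTF-8.
    forbidden = {0xC0, 0xC1} | set(range(0xF5, 0x100))
    # Control characters other than tab (0x09) and linefeed (0x0A).
    bad_controls = set(range(0x00, 0x20)) - {0x09, 0x0A}
    for offset, b in enumerate(data):
        if b in forbidden:
            problems.append(f"forbidden byte 0x{b:02X} at offset {offset}")
        elif b in bad_controls:
            problems.append(f"control byte 0x{b:02X} at offset {offset}")
    try:
        data.decode("utf-8")  # strict decoding rejects malformed sequences
    except UnicodeDecodeError as e:
        problems.append(f"invalid UTF-8 at offset {e.start}")
    return problems
```

    Note that a PPM header written as text, such as `b"P6\n104 82\n255\n"`, passes this check: it is plain ASCII with linefeeds, so byte statistics alone can't tell an embedded header apart from ordinary text.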

  3. #93
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    520
    Thanks
    196
    Thanked 744 Times in 301 Posts
    Well, I've found it, and it is a valid header according to the specification:

    enwik9, offset 435.132.002:

    Code:
    Uploaded by [[User:Mbecker]] in unknown format. Converted to PNG.
    
    Original image has a 56-byte header beginning "MB". I replaced it with the following:
    
    P6
    104 82
    255
    
    This resulted in a PPM file, which I then converted to PNG. -[[User:PierreAbbat|phma]]
    paq8px_v69 also detects it
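    A netpbm header is plain ASCII text, which is why a streaming parser can trigger inside enwik9. The sketch below is a hypothetical, simplified P6 header scanner in Python (not EMMA's or paq8px's parser): it accepts any whitespace between fields, does not handle `#` comment lines, and reports whether enough bytes remain for the pixel data. For the 104x82 image above that is 104*82*3 = 25,584 bytes, which enwik9 easily has after the header, so a size check alone cannot reject the hit.

```python
import re

# "P6", then width, height and maxval separated by any whitespace, then
# exactly one whitespace byte before the pixel data. Comment lines ('#')
# are not handled in this simplified sketch.
P6_HEADER = re.compile(rb"P6\s+(\d+)\s+(\d+)\s+(\d+)\s")

def find_p6_headers(data: bytes):
    """Yield (offset, width, height, maxval, pixel_offset, complete) per plausible header."""
    for m in P6_HEADER.finditer(data):
        width, height, maxval = (int(g) for g in m.groups())
        if width == 0 or height == 0 or not 0 < maxval < 65536:
            continue  # impossible dimensions or sample depth
        bytes_per_sample = 1 if maxval < 256 else 2
        needed = width * height * 3 * bytes_per_sample  # 3 samples (RGB) per pixel
        complete = m.end() + needed <= len(data)  # enough bytes left for the pixels?
        yield m.start(), width, height, maxval, m.end(), complete

sample = b"header:\nP6\n104 82\n255\nThis resulted in a PPM file"
print(list(find_p6_headers(sample)))  # [(8, 104, 82, 255, 22, False)]
```

    In this short sample the header is reported but marked incomplete; inside enwik9 the same header would be followed by plenty of (text) bytes and look like a complete image.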

  4. The Following 3 Users Say Thank You to mpais For This Useful Post:

    Darek (18th March 2016),Matt Mahoney (20th March 2016),schnaader (18th March 2016)

  5. #94
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    919
    Thanks
    541
    Thanked 363 Times in 271 Posts
    Quote Originally Posted by mpais
    That is not fair; one can't simply change the rules of a benchmark to better suit this or that compressor.
    I never intended for EMMA to top any benchmark. From the start I severely limited it (streaming compression,
    relatively low memory usage, coding with extensibility in mind, not performance) and have tried to overcome
    that by trying new ideas. For me EMMA is a research project; I wasn't even sure if I should ever release it.

    As for the LTCB, if I ever want EMMA to gain a few spots, I'll just release a CLI version and increase the memory
    for the text model to 512MB. That should do it.

    Best regards
    ENWIK9 score with the same options as last time, but with all dictionaries off. The difference from compression with dictionaries is quite small, which means we could drop the dictionary files to improve the total LTCB score.

    EMMA v0.1.4 Enwik9 score is 148.906.879 bytes, time 113232,2s, hardware: i7 4900MQ, 2.8GHz overclocked to 3.1GHz, 16GB, Win7Pro 64. Options in screenshot (Options5.jpg). Decompression not verified.
    Memory used: 1095MB

    Second test (EMMA_Options.jpg): a map of each option's influence on my testbed. I tested all combinations of options and marked those that MUST be on to improve compression (green cells) and those that MUST be off because they hurt compression (red cells). White cells mean the option/model is irrelevant to compression. Yellow cells mean better compression, but only by 1-2 bytes; it looks like only the header changes...

    My general insights:
    - x86/x64 model: should be off for text files; for other files, switching it on improves compression,
    - executable code (x86/x64) option: sometimes gives much better compression. For example, switching it on gives K.WAD 23% additional ratio, and M.DBF becomes 40% smaller! On the other hand, A.TIF gets 10% worse when it's turned on,
    - image models: help on all uncompressed graphic files, even ones not recognised by an implemented model (A.TIF), and help on the exe files from my testbed,
    - adaptive learning rate: should mostly be off, or it's irrelevant. Only 6 of 28 files gain some additional ratio from this option,
    - match, text, sparse, indirect, record, distance and DMC models: generally improve the compression ratio for almost all files (those without a specific model),
    - delta coding: generally irrelevant; in one case it hurts compression, in another it improves the compression ratio,
    - English dictionary: also helps on many files, even non-text ones.
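    The colour coding described above boils down to a simple rule on compressed sizes with an option off versus on. A minimal Python sketch, assuming a 2-byte threshold for the "yellow" cells; the function name and the numbers below are illustrative, not Darek's actual procedure or measurements:

```python
def classify_option(size_off: int, size_on: int, noise_bytes: int = 2) -> str:
    """Colour-code one option for one file from compressed sizes (hypothetical rule)."""
    gain = size_off - size_on  # positive: the option makes the output smaller
    if gain == 0:
        return "white"   # irrelevant to compression
    if 0 < gain <= noise_bytes:
        return "yellow"  # better by only 1-2 bytes, likely just header changes
    return "green" if gain > 0 else "red"  # must be on / must be off
```

    For example, `classify_option(1000, 1100)` gives "red" (10% worse, like A.TIF), and `classify_option(1000, 600)` gives "green" (40% smaller, like M.DBF).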

    Darek
    Attached thumbnails: EMMA_Options.jpg, Options5.jpg

  6. The Following User Says Thank You to Darek For This Useful Post:

    mpais (18th March 2016)

  7. #95
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    520
    Thanks
    196
    Thanked 744 Times in 301 Posts
    Quote Originally Posted by Darek
    ENWIK9 score with the same options as last time, but with all dictionaries off. The difference from compression with dictionaries is quite small, which means we could drop the dictionary files to improve the total LTCB score.

    EMMA v0.1.4 Enwik9 score is 148.906.879 bytes, time 113232,2s, hardware: i7 4900MQ, 2.8GHz overclocked to 3.1GHz, 16GB, Win7Pro 64. Options in screenshot (Options5.jpg). Decompression not verified.
    Memory used: 1095MB
    Thank you for testing. The dictionaries help mostly with smaller files. Using the image models cost you in terms of time (all image parsers active) and memory (80MB for the models, plus the memory for the filter, which would stay off if no special models were used, and for the colorspace transform), and the false .PPM positive probably gave slightly worse compression. For the LTCB, memory is a huge factor. Ludicrous mode helps attenuate the effect, but there really is no way to compete without using many, many GB of memory (I might just have to try an x64 compile of EMMA using 48GB of memory to see what it can do.)

    Quote Originally Posted by Darek
    Second test (EMMA_Options.jpg): a map of each option's influence on my testbed. I tested all combinations of options and marked those that MUST be on to improve compression (green cells) and those that MUST be off because they hurt compression (red cells). White cells mean the option/model is irrelevant to compression. Yellow cells mean better compression, but only by 1-2 bytes; it looks like only the header changes...

    My general insights:
    - x86/x64 model: should be off for text files; for other files, switching it on improves compression,
    - executable code (x86/x64) option: sometimes gives much better compression. For example, switching it on gives K.WAD 23% additional ratio, and M.DBF becomes 40% smaller! On the other hand, A.TIF gets 10% worse when it's turned on,
    - image models: help on all uncompressed graphic files, even ones not recognised by an implemented model (A.TIF), and help on the exe files from my testbed,
    - adaptive learning rate: should mostly be off, or it's irrelevant. Only 6 of 28 files gain some additional ratio from this option,
    - match, text, sparse, indirect, record, distance and DMC models: generally improve the compression ratio for almost all files (those without a specific model),
    - delta coding: generally irrelevant; in one case it hurts compression, in another it improves the compression ratio,
    - English dictionary: also helps on many files, even non-text ones.

    Darek
    That matches my observations. I'd just add that the colorspace transform also plays a pivotal role: it either helps or hurts by a significant margin.
    This is why I decided to make almost everything optional in EMMA, so we can see what effect each option has and hopefully gain some insight into what
    is worth researching further.

  8. The Following User Says Thank You to mpais For This Useful Post:

    Darek (18th March 2016)

  9. #96
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    919
    Thanks
    541
    Thanked 363 Times in 271 Posts
    Quote Originally Posted by mpais
    For the LTCB, memory is a huge factor. Ludicrous mode helps attenuate the effect, but there really is no way to compete without using many, many GB of memory (I might just have to try an x64 compile of EMMA using 48GB of memory to see what it can do.)
    I'm serious, do it!

    Maybe 48GB is too much for me, but I can run and test a 32GB or 16GB version.
    If you could build such a 64-bit version, we could test how much memory affects EMMA's compression.

    Darek

  10. #97
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    776
    Thanks
    63
    Thanked 270 Times in 190 Posts
    Quote Originally Posted by mpais
    (I might just have to try an x64 compile of EMMA using 48GB of memory to see what it can do.)
    I can test with up to just under 256GB of memory under 64-bit Windows.

  11. The Following User Says Thank You to Sportman For This Useful Post:

    Darek (19th March 2016)

  12. #98
    Member
    Join Date
    Sep 2014
    Location
    Italy
    Posts
    26
    Thanks
    33
    Thanked 18 Times in 11 Posts
    I might just have to try an x64 compile of EMMA using 48GB of memory to see what it can do
    Please, do it! Give us an Easter gift!

  13. #99
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    520
    Thanks
    196
    Thanked 744 Times in 301 Posts
    EMMA v0.1.5, attachment in first post.

    Code:
    Changes:
    - Fixed the PBM/PGM/PPM parser, wasn't using the
    colorspace transform
    - Improved text model
    - Improved audio model, enabled Ludicrous Mode
    - Preliminary XML model
    Results:

    enwik8 compressed to 17.848.906 bytes in 6088s using 984MB of memory.
    enwik9 compressed to 148.736.759 bytes in 59883s using 984MB of memory.
    Tested on an Intel Core i7 5820K @ 4.4GHz, settings in screenshot.

    Code:
    29 files from SqueezeChart PGM/PPM dataset
    
    236.094.359 bytes, each file compressed separately
    237.830.592 bytes, .TAR with all files (633.509.888 bytes)
    
    
    File: diato.sf2 from SqueezeChart, 140.384.856 bytes
    
    69.511.942 bytes, EMMA v0.1.5, Audio (Max)
    Attached image: options.png
    Last edited by mpais; 21st March 2016 at 19:52.

  14. The Following 3 Users Say Thank You to mpais For This Useful Post:

    Darek (20th March 2016),necros (27th January 2019),Stephan Busch (20th March 2016)

  15. #100
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    919
    Thanks
    541
    Thanked 363 Times in 271 Posts
    Quote Originally Posted by mpais
    EMMA v0.1.5, attachment in first post.

    Code:
    Changes:
    - Fixed the PBM/PGM/PPM parser, wasn't using the
    colorspace transform
    - Improved text model
    - Improved audio model, enabled Ludicrous Mode
    - Preliminary XML model
    Results:

    enwik8 compressed to 17.848.906 bytes in 6088s using 984MB of memory.
    enwik9 is still compressing, I'll update this post with the results.
    Tested on an Intel Core i7 5820K @ 4.4GHz, settings in screenshot.

    Code:
    29 files from SqueezeChart PGM/PPM dataset
    
    236.094.359 bytes, each file compressed separately
    237.830.592 bytes, .TAR with all files (633.509.888 bytes)
    
    
    File: diato.sf2 from SqueezeChart, 140.384.856 bytes
    
    69.511.942 bytes, EMMA v0.1.5, Audio (Max)
    Attached image: options.png
    Great job again!
    Enabling Ludicrous Mode in audio compression helps a lot. On my testbed, EMMA now moves up to 2nd place (from 4th). Only the specialised OptimFROG remains to be beaten.
    Question: could you also enable Ludicrous Mode in the image/JPEG models?

    Strange thing with the TARGA compiled file score... It's much worse than expected.

    Darek

  16. #101
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    520
    Thanks
    196
    Thanked 744 Times in 301 Posts
    Quote Originally Posted by Darek
    Great job again!
    Enabling Ludicrous Mode in audio compression helps a lot. On my testbed, EMMA now moves up to 2nd place (from 4th). Only the specialised OptimFROG remains to be beaten.
    Question: could you also enable Ludicrous Mode in the image/JPEG models?

    Strange thing with the TARGA compiled file score... It's much worse than expected.

    Darek
    Do you mean compression of .TGA files? I didn't change the parser or the model, so I guess I must have broken something.
    Could you be more specific and, if possible, provide a file where you are seeing a regression in compression ratio?

    As for the image and JPEG models, I could try to further improve them, but from my testing they are already very good.
    I think time would be better spent focusing on improving other models or adding new features. But as always, I'm open
    to suggestions, and I'll try to keep the project going; I just don't have as much time for this as I'd like.

  17. #102
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    919
    Thanks
    541
    Thanked 363 Times in 271 Posts
    Quote Originally Posted by mpais
    Do you mean compression of .TGA files? I didn't change the parser or the model, so I guess I must have broken something.
    Could you be more specific and, if possible, provide a file where you are seeing a regression in compression ratio?

    As for the image and JPEG models, I could try to further improve them, but from my testing they are already very good.
    I think time would be better spent focusing on improving other models or adding new features. But as always, I'm open
    to suggestions, and I'll try to keep the project going; I just don't have as much time for this as I'd like.
    No, sorry, my explanation was bad. I meant the .TAR file in your example.
    In most cases, combining all files into one .TAR file gives better compression or, in the worst case, a score close to compressing all files separately. In your case, compressing the .TAR file is worse by about 1.7MB.

    Darek

  18. #103
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    520
    Thanks
    196
    Thanked 744 Times in 301 Posts
    Quote Originally Posted by Darek
    No, sorry, my explanation was bad. I meant the .TAR file in your example.
    In most cases, combining all files into one .TAR file gives better compression or, in the worst case, a score close to compressing all files separately. In your case, compressing the .TAR file is worse by about 1.7MB.

    Darek
    That is to be expected, because when compressing each file separately, I can choose whether or not to use the colorspace transform.
    For that dataset, almost half the .PPM files got worse compression when using it. But when compressing the .TAR archive
    I enabled the transform for all of them, and the loss was kept small because, as you said, the models benefit from already being
    "trained" on the previous images in the archive, so they can make up some of the difference.

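    To see why a colorspace transform can either help or hurt, here is a minimal Python sketch of one common reversible transform, "subtract green" (R-G, G, B-G modulo 256, as used in WebP lossless). EMMA's actual transform isn't described in this thread, so this is a stand-in; the per-file choice is modeled by comparing zlib output sizes, with zlib standing in for EMMA's context-mixing models. All function names are hypothetical.

```python
import zlib

def subtract_green(rgb: bytes) -> bytes:
    """Forward transform on interleaved RGB: (R, G, B) -> (R-G, G, B-G) mod 256."""
    out = bytearray(rgb)
    for i in range(0, len(out) - 2, 3):
        g = out[i + 1]
        out[i] = (out[i] - g) & 0xFF
        out[i + 2] = (out[i + 2] - g) & 0xFF
    return bytes(out)

def add_green(data: bytes) -> bytes:
    """Inverse transform, so the round trip is lossless."""
    out = bytearray(data)
    for i in range(0, len(out) - 2, 3):
        g = out[i + 1]
        out[i] = (out[i] + g) & 0xFF
        out[i + 2] = (out[i + 2] + g) & 0xFF
    return bytes(out)

def choose_transform(rgb: bytes) -> bool:
    """Per-file decision: use the transform only if the stand-in compressor prefers it."""
    plain = len(zlib.compress(rgb, 9))
    transformed = len(zlib.compress(subtract_green(rgb), 9))
    return transformed < plain
```

    When R, G and B move together, the transform turns R and B into near-zero residuals; when the channels are weakly correlated it mostly adds noise, which fits the "helps or hurts" behaviour. Inside a single .TAR, one setting has to serve every image.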
  19. #104
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    919
    Thanks
    541
    Thanked 363 Times in 271 Posts
    Quote Originally Posted by mpais
    That is to be expected, because when compressing each file separately, I can choose whether or not to use the colorspace transform.
    For that dataset, almost half the .PPM files got worse compression when using it. But when compressing the .TAR archive
    I enabled the transform for all of them, and the loss was kept small because, as you said, the models benefit from already being
    "trained" on the previous images in the archive, so they can make up some of the difference.
    The XML model generally improves the compression ratio, but in some cases it should be off.
    New text model:
    - for text files, really good! One of my test files (S.DOC) moves into 2nd place in my ranking. Of course, first place goes to CMIXv6, but the difference is only 4%.
    - for non-text files, results are sometimes better, sometimes worse. In one case, switching off the text model helps.

    Darek

  20. #105
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    520
    Thanks
    196
    Thanked 744 Times in 301 Posts
    EMMA v0.1.6, attachment in first post.

    Code:
    Changes:
    - Fixed a bug in the GIF model, relaxed parsing to allow for
    GIF87a with extensions, even if the spec says it's not supported.
    - Fixed a bug in the .TGA image parser
    - Improved PBM/PGM/PPM image parser
    - Parser for .PSD (PhotoShop) images
    This is mostly a bug-fix release; the only new feature is the .PSD parser.
    The bug in the GIF model was caused by "faulty" GIF encoders. If anyone
    cares, I'll elaborate further.
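    The relaxed parsing refers to extension blocks (introducer 0x21), which the GIF87a spec doesn't define but which some encoders write anyway. A minimal Python sketch of a block walker with a `relaxed` switch; it only walks the block structure (no LZW decoding), assumes well-formed lengths, and is our illustration rather than EMMA's parser:

```python
def walk_gif(data: bytes, relaxed: bool = True) -> list:
    """List the block types in a GIF stream (hypothetical helper).

    Raises ValueError on unexpected structure; truncated input raises IndexError.
    """
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF")
    is_87a = data[:6] == b"GIF87a"
    packed = data[10]          # logical screen descriptor flags
    pos = 6 + 7                # skip header and logical screen descriptor
    if packed & 0x80:          # global colour table: 3 * 2^(N+1) bytes
        pos += 3 * (2 << (packed & 7))

    def skip_sub_blocks(p: int) -> int:
        # Data sub-blocks: a length byte, that many bytes, ended by a 0 length.
        while data[p] != 0:
            p += 1 + data[p]
        return p + 1

    blocks = []
    while True:
        introducer = data[pos]
        if introducer == 0x3B:                     # trailer
            blocks.append("trailer")
            return blocks
        if introducer == 0x21:                     # extension block
            if is_87a and not relaxed:
                raise ValueError("extension block in a GIF87a file")
            blocks.append(f"extension 0x{data[pos + 1]:02X}")
            pos = skip_sub_blocks(pos + 2)         # skip introducer + label
        elif introducer == 0x2C:                   # image descriptor (10 bytes)
            flags = data[pos + 9]
            pos += 10
            if flags & 0x80:                       # local colour table
                pos += 3 * (2 << (flags & 7))
            pos = skip_sub_blocks(pos + 1)         # skip LZW minimum code size
            blocks.append("image")
        else:
            raise ValueError(f"unexpected byte 0x{introducer:02X} at offset {pos}")
```

    A strict walker (`relaxed=False`) rejects a GIF87a file containing, say, a graphic control extension (label 0xF9), while the relaxed one simply skips it and goes on to the image data.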

    I've been researching recompression possibilities, but most interesting
    formats which would benefit from recompression would need preprocessors.
    As seen with the GIF recompression in paq8px, the gains from preprocessing
    can be really good, but it would defeat the objective of having a streaming
    compression engine.

    I've also converted the assembler code to x64, so I could compile a 64-bit
    version of EMMA. It's not a full conversion, but for now it allows me to
    increase the memory usage.

    So, as requested, I'm attaching a 64-bit version of EMMA that quadruples the
    memory usage of non-specific models.

    Code:
    29 files from SqueezeChart PGM/PPM dataset in a .TAR file (633.509.888 bytes)
    
    237.830.592 bytes, EMMA 0.1.5
    237.227.205 bytes, EMMA 0.1.6, Images (Slow)

  21. The Following 4 Users Say Thank You to mpais For This Useful Post:

    Darek (27th March 2016),necros (27th January 2019),Stephan Busch (27th March 2016),xinix (27th March 2016)

  22. #106
    Tester
    Stephan Busch
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    873
    Thanks
    460
    Thanked 175 Times in 85 Posts
    I would vote for preprocessing rather than having a streaming format.
    EMMA could set new frontiers this way.

  23. The Following User Says Thank You to Stephan Busch For This Useful Post:

    Darek (27th March 2016)

  24. #107
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    919
    Thanks
    541
    Thanked 363 Times in 271 Posts
    Quote Originally Posted by Stephan Busch
    I would vote for preprocessing rather than having a streaming format.
    EMMA could set new frontiers this way.
    Maybe both formats could be used, as an option? New frontiers, old possibilities.
    Has a similar compressor been released? I don't know if it's possible or, if so, how complicated it is to combine both options in one program... just an idea.

  25. #108
    Member
    Join Date
    Mar 2008
    Location
    Minsk
    Posts
    9
    Thanks
    0
    Thanked 2 Times in 2 Posts
    Quote Originally Posted by Darek
    Has a similar compressor been released?
    It seems WinRK had a separate preprocessing step.

  26. The Following User Says Thank You to nemoW For This Useful Post:

    Darek (27th March 2016)

  27. #109
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    520
    Thanks
    196
    Thanked 744 Times in 301 Posts
    Quote Originally Posted by Darek
    Maybe both formats could be used, as an option? New frontiers, old possibilities.
    Has a similar compressor been released? I don't know if it's possible or, if so, how complicated it is to combine both options in one program... just an idea.
    It's certainly possible. You can do it right now by using precomp + EMMA; it's just a matter of doing the work of coding a pre-processing framework.
    I mentioned it because if one really wants to go for maximum compression, there is no way around recompressing already compressed data.
    Right now I've been researching these ideas:

    - Reordering the IFDs in TIFF files, to ensure EMMA can detect them. And in the case of compressed image data, such as LZW, we could decompress it and
    use the appropriate image model
    - Decompress GIF files and use the 8bpp image model, similar to what paq8px_v75 does
    - Base64 detection and decoding
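    For the Base64 item, a minimal Python sketch of detection with a reversibility check: find runs of the Base64 alphabet, decode them, and accept a run only if re-encoding reproduces it byte for byte (otherwise non-canonical padding bits would be lost). Line-wrapped MIME Base64 is deliberately not handled, and the 64-character minimum is an arbitrary assumption:

```python
import base64
import binascii
import re

# A run of Base64 alphabet characters with optional "=" padding.
# Line-wrapped (MIME) Base64 is not handled in this sketch.
B64_RUN = re.compile(rb"[A-Za-z0-9+/]{16,}={0,2}")

def find_base64(data: bytes, min_len: int = 64):
    """Yield (start, end, decoded) for Base64 runs that round-trip exactly."""
    for m in B64_RUN.finditer(data):
        text = m.group()
        if len(text) < min_len or len(text) % 4:
            continue  # too short to be worth it, or not a whole number of quads
        try:
            decoded = base64.b64decode(text, validate=True)
        except binascii.Error:
            continue
        # Only accept runs that re-encode to exactly the same bytes, so the
        # decoder side can restore the original text losslessly.
        if base64.b64encode(decoded) == text:
            yield m.start(), m.end(), decoded
```

    The decoded bytes could then be passed to whatever model suits them (for example an embedded image), with the offsets kept so the text can be rebuilt.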

    Ideally all of these would be perfectly reversible transformations, but as seen with paq8px_v75 and GIF, sometimes you can't perfectly model the encoder
    used and the transformation fails. With EMMA that could be handled by using the GIF model when the transformation fails and the 8bpp image model when
    it succeeds.
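    The "use the transformed data only when it round-trips" idea can be sketched with Deflate standing in for GIF's LZW, since Python's `zlib` has no LZW encoder. This follows the general approach of tools like precomp (try to reproduce the original stream bit-exactly, otherwise keep it as-is), heavily simplified: only the zlib level is searched, and all names are hypothetical.

```python
import zlib

def try_expand(stream: bytes):
    """Return (level, raw) if recompressing raw reproduces stream exactly, else None."""
    try:
        raw = zlib.decompress(stream)
    except zlib.error:
        return None
    for level in range(10):  # search the encoder parameters we can model
        if zlib.compress(raw, level) == stream:
            return level, raw  # perfectly reversible: store level + raw data
    return None  # unknown encoder settings: the transformation fails

def encode_block(stream: bytes):
    """Pick the model input: decoded data when reversible, original bytes otherwise."""
    result = try_expand(stream)
    if result is None:
        return ("raw", stream)        # e.g. keep using a format-specific model
    level, raw = result
    return ("expanded", level, raw)   # e.g. switch to an image model

def decode_block(block) -> bytes:
    """Rebuild the original stream from either kind of block."""
    if block[0] == "raw":
        return block[1]
    _, level, raw = block
    return zlib.compress(raw, level)
```

    In EMMA's terms, the "raw" branch would correspond to keeping the GIF model and the "expanded" branch to switching to the 8bpp image model.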

    I still have a few things I'd like to implement first, and then I might give it a try. Though to be honest, it would probably be better to just try to port some
    features from EMMA to paq8px, since it already has pre-processing code.

  28. The Following User Says Thank You to mpais For This Useful Post:

    Darek (27th March 2016)

  29. #110
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    919
    Thanks
    541
    Thanked 363 Times in 271 Posts
    EMMA v0.1.6.64 Enwik8 score is 17.542.087 bytes, time 9231,9s, hardware: i7 4900MQ, 2.8GHz overclocked to 3.6GHz, 16GB, Win7Pro 64. Options in screenshot (Options7.jpg). Decompression in progress. Memory used: 3895MB

    It's damn fast and gives quite a good improvement. The ENWIK8 score would be in 5th place on the LTCB (without counting compressor size). ENWIK9 in progress.
    Attached thumbnail: Options7.jpg

  30. #111
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    520
    Thanks
    196
    Thanked 744 Times in 301 Posts
    Quote Originally Posted by Darek
    EMMA v0.1.6.64 Enwik8 score is 17.542.087 bytes, time 9231,9s, hardware: i7 4900MQ, 2.8GHz overclocked to 3.6GHz, 16GB, Win7Pro 64. Options in screenshot (Options5.jpg). Decompression in progress. Memory used: 3895MB

    It's damn fast and gives quite a good improvement. ENWIK9 in progress.
    I got the same result using 3686MB in 6679s with the usual options (screenshot).
    I'm also testing enwik9, but I don't have the result yet because I first tested an internal version
    using 50GB of memory.

    Attached image: options.png

    [EDIT]

    Results:

    enwik8 compressed to 17.542.087 bytes in 6679s using 3686MB of memory.
    enwik9 compressed to 142.759.448 bytes in 65305s using 3686MB of memory.

    Tested on an Intel Core i7 5820K @ 4.4GHz
    Last edited by mpais; 28th March 2016 at 13:56.

  31. The Following User Says Thank You to mpais For This Useful Post:

    Darek (28th March 2016)

  32. #112
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    919
    Thanks
    541
    Thanked 363 Times in 271 Posts
    Quote Originally Posted by mpais
    I got the same result using 3686MB in 6679s with the usual options (screenshot).
    I'm also testing enwik9, but I don't have the result yet because I first tested an internal version
    using 50GB of memory.

    Attached image: options.png
    I've tested my testbed with maximum options and found that some files get worse with larger memory settings. In particular, L.PAK reached its minimum size with 1024MB of main model memory; with 2048MB I got a worse result than with 512MB...
    I've also found that, in some cases, increasing the match model memory above 32MB hurts the compression ratio. I need to test more thoroughly to find out how changing memory affects the compression ratio.

  33. #113
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    520
    Thanks
    196
    Thanked 744 Times in 301 Posts
    Quote Originally Posted by Darek
    I've tested my testbed with maximum options and found that some files get worse with larger memory settings. In particular, L.PAK reached its minimum size with 1024MB of main model memory; with 2048MB I got a worse result than with 512MB...
    I've also found that, in some cases, increasing the match model memory above 32MB hurts the compression ratio. I need to test more thoroughly to find out how changing memory affects the compression ratio.
    Probably related to hash collisions and resolution. When using less memory you have more collisions, and if a context isn't found, another one will be cleared to make room.
    This can sometimes actually be beneficial to compression. Also, it seems that ludicrous mode does a really good job of handling memory: as you can see from the enwik8
    results, the improvement is small considering that 4x more memory is used, so the models weren't very "starved" for memory even with a ~100MB file.
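    A toy model of the mechanism described above: a bucketed hash table where each slot is tagged with a few check bits, and a context that finds no matching slot evicts the least-used one. The 2 slots per bucket, 8-bit check and count-based priority are assumptions in the style of paq-family hash tables, not EMMA's actual layout. Shrinking `n_buckets` raises the eviction count, and two contexts that share a bucket and a check value silently share statistics (the "resolution" problem).

```python
class HashedContextTable:
    """Toy bucketed hash table: slots store [check, count]; low counts get evicted."""

    def __init__(self, n_buckets: int, slots_per_bucket: int = 2):
        self.n_buckets = n_buckets
        self.buckets = [[None] * slots_per_bucket for _ in range(n_buckets)]
        self.evictions = 0

    def lookup(self, context_hash: int) -> list:
        """Find (or create) the slot for a context and return it."""
        bucket = self.buckets[context_hash % self.n_buckets]
        check = (context_hash >> 24) & 0xFF  # high bits used as a short checksum
        for slot in bucket:
            if slot is not None and slot[0] == check:
                return slot  # seen before, or a false match on the check bits
        # Not found: take an empty slot, or clear the least-used one.
        victim = min(range(len(bucket)),
                     key=lambda i: -1 if bucket[i] is None else bucket[i][1])
        if bucket[victim] is not None:
            self.evictions += 1
        bucket[victim] = [check, 0]
        return bucket[victim]

    def update(self, context_hash: int) -> None:
        slot = self.lookup(context_hash)
        slot[1] += 1  # a real model would update a bit-history state here
```

    Feeding the same stream of context hashes into a small and a large table shows the small one constantly discarding contexts, which sometimes throws away stale statistics and sometimes throws away useful ones.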

    And please consider that the x64 version is just to test the effect of memory usage on the models, something Mauro Vezzosi was interested in.
    I plan to keep developing EMMA as 32-bit.

  34. #114
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    919
    Thanks
    541
    Thanked 363 Times in 271 Posts
    Quote Originally Posted by mpais
    Probably related to hash collisions and resolution. When using less memory you have more collisions, and if a context isn't found, another one will be cleared to make room.
    This can sometimes actually be beneficial to compression. Also, it seems that ludicrous mode does a really good job of handling memory: as you can see from the enwik8
    results, the improvement is small considering that 4x more memory is used, so the models weren't very "starved" for memory even with a ~100MB file.

    And please consider that the x64 version is just to test the effect of memory usage on the models, something Mauro Vezzosi was interested in.
    I plan to keep developing EMMA as 32-bit.
    Yes, I agree. Generally, more memory helps: not a lot, but it improves compression. Whether that's a small or a big gain is a matter of perspective.

    From my side, many thanks for this version!

    EDIT:
    Question: the EMMA 64-bit status bar is generally green, but (very rarely) it turns red. Does that mean something? Is something going wrong? (screenshot)
    Attached thumbnail: Redband.jpg
    Last edited by Darek; 28th March 2016 at 00:39.

  35. #115
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    520
    Thanks
    196
    Thanked 744 Times in 301 Posts
    Quote Originally Posted by Darek
    Yes, I agree. Generally, more memory helps: not a lot, but it improves compression. Whether that's a small or a big gain is a matter of perspective.

    From my side, many thanks for this version!

    EDIT:
    Question: the EMMA 64-bit status bar is generally green, but (very rarely) it turns red. Does that mean something? Is something going wrong? (screenshot)
    Just checked, it's a bug. You probably had an error before (not enough memory? were you running several instances simultaneously?) and the color is not being
    reset when initializing a new compression job.

    I'm running some tests on the image models and the JPEG model, comparing them to paq8px_v75, to see if it's worth trying to squeeze a bit more compression
    out of them, as you requested. If you have any file where you think EMMA is performing worse than expected, I'd appreciate it if you could post it here.

  36. #116
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    520
    Thanks
    196
    Thanked 744 Times in 301 Posts
    I've finished testing, here is the summary.

    I started by testing the 24bpp and 8bpp models with the PGM/PPM dataset from SqueezeChart.
    Since EMMA doesn't support archiving multiple files, I created a .TAR file with the 29 files. It is
    most likely not the same one used by Stephan Busch, but for my comparison it does the trick. For
    EMMA I used the "Images (Slow)" preset; for paq8px_v75 I used the switch "-8". To be fair, I
    also used paq8px_v75 to create a single archive from all 29 files, using the command line
    "paq8px_v75 -8 *.pgm *.ppm". Now the results:
    Code:
    squeezechart_pgm.tar, 29 files from SqueezeChart PGM/PPM dataset in a .TAR file (633.509.888 bytes):
    237.227.205 bytes, EMMA 0.1.6
    238.529.369 bytes, paq8px_v75
    
    29 files compressed to a single .paq8px archive:
    238.541.214 bytes
    EMMA beats paq8px_v75 by a nearly insignificant margin, and paq8px_v75 seems to benefit from
    compressing all files in a container instead of compressing them one by one.
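    For scale, the margins in that table are quick to compute: EMMA's lead over paq8px_v75 on the .TAR is 1,302,164 bytes, about 0.55%, and paq8px_v75's .TAR result is 11,845 bytes smaller than its 29-file archive. A small Python snippet with the sizes quoted above:

```python
def margin_percent(size: int, reference: int) -> float:
    """How much larger `size` is than `reference`, as a percentage of `reference`."""
    return (size - reference) / reference * 100

ORIGINAL = 633_509_888  # .TAR with the 29 PGM/PPM files
results = {
    "EMMA 0.1.6, .TAR": 237_227_205,
    "paq8px_v75, .TAR": 238_529_369,
    "paq8px_v75, 29-file archive": 238_541_214,
}
emma = results["EMMA 0.1.6, .TAR"]
for name, size in results.items():
    print(f"{name:28} {size:>12,} bytes  ratio {ORIGINAL / size:.3f}  "
          f"{margin_percent(size, emma):+.2f}% vs EMMA")
```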

    I then proceeded to test with the Kodak Set, from http://r0k.us/graphics/kodak/
    I downloaded every file and converted them to .BMP files using MS Paint.
    These are all 24bpp images.

    Attached image: kodak set.png

    EMMA and paq8px_v75 trade wins, and overall EMMA again wins by an insignificant margin.
    EMMA does particularly badly on files kodim01 and kodim02; it might be worth investigating why.

    I then further tested the 24bpp model on other images: some are classic image compression tests
    (Baboon, Peppers, Lena), 3 are random photos found using Google Images, 1 is a game screenshot,
    2 are drawings, 1 is a ray-traced rendering of a 3D scene, and the last one is MARBLES.bmp, which
    seems to be a commonly used test file, though I'm unsure of its origin (it seems to be
    scanned).

    Attached image: Other, 24bpp.png

    Here EMMA shows a bigger spread in the results compared to paq8px_v75. It does really
    badly on Baboon and Lena, and I'm not sure why (maybe the model in paq8px_v75 was optimized for them?).
    It seems to do well on non-photographic images.

    So next I tested with the files from the Lossless Image Compression section of SqueezeChart,
    Non Photographic chart.
    These are 24bpp .PPM files, and paq8px_v75 doesn't detect the first 3 due to a bug in its parser.
    So, to make it a fair comparison, I used a hex editor to change a single byte in each file
    (a newline (0x0A) between the image dimensions was changed to a space (0x20)). For EMMA,
    I used the "Images (Slow)" preset, though for the 1st and 5th files I unticked the colorspace transform.
    For paq8px_v75, the "-8" switch was used.

    Attached image: squeezechart_non_photographic.png

    Again, EMMA and paq8px_v75 (fixed) trade wins. The results for the 1st, 2nd and 5th files from EMMA
    seem to be new records, as does the result for the fixed 3rd file from paq8px_v75. EMMA does badly
    on the 4th file, and both are well behind BMF's result for that file.

    Now for the 8bpp model. I again used classic image compression test files, all grayscale (Baboon,
    Barbara, Bird, Cameraman, Clown, Goldhill, Lena, Peppers), a color version of Lena, and more files
    found with Google Images.

    [Image: 8bpp.png]

    EMMA seems to have a very slightly better model, though the difference is again almost insignificant.

    I then tested with the files from the Lossless Image Compression section of SqueezeChart,
    Grayscale 8bpp Photographic chart. Again, "Image (Slow)" preset for EMMA, "-8" switch for paq8px_v75.

    [Image: squeezechart_grayscale_8bpp_photographic.png]

    These are 4 medical grayscale images and 1 grayscale photo. EMMA does badly on the 1st file, and
    paq8px_v75 does badly on the 3rd. On the other files paq8px_v75 wins by a slight margin, but both
    are clearly beaten by dedicated image compressors.

    So, all in all, it seems the 8bpp and 24bpp models are competitive.



    For JPEG testing, I used the JPEG file from the Maximum Compression benchmark, 3 files from the
    ACT JPEG Compression test, the JPEG and PDF test files from SqueezeChart, a CMYK JPEG file, 3
    personal photos from different cameras, and 3 .AVI files, since EMMA supports MJPEG as well as
    uncompressed PCM audio. For the JPEG files, I used EMMA with the "Images (Slow)" preset,
    except for the CMYK file, which used all models. The PDF file also used all models,
    and the AVI files used the "Images (Slow)" preset with the Audio model added (High complexity,
    32MB of memory, Ludicrous Mode off). paq8px_v75 used the "-8" switch.
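    How EMMA locates the MJPEG frames and PCM blocks inside an AVI isn't described here. As a rough
    illustration of what any container parser has to do first, the sketch below walks the RIFF chunk
    tree of an AVI file and groups the stream chunks by the two-letter suffix of their ID ("dc" for
    compressed video such as MJPEG, "db" for uncompressed video, "wb" for audio). The function names
    are hypothetical, and real files may need more care (e.g. OpenDML extensions beyond 1GB).

```python
import struct


def walk_riff(data: bytes, start: int, end: int, out: list) -> list:
    """Recursively walk RIFF chunks in data[start:end].

    Appends (fourcc, payload_offset, payload_size) for every leaf chunk;
    RIFF and LIST chunks are descended into instead.
    """
    pos = start
    while pos + 8 <= end:
        fourcc = data[pos:pos + 4]
        (size,) = struct.unpack_from("<I", data, pos + 4)  # little-endian length
        body = pos + 8
        if body + size > end:
            break  # truncated or corrupt chunk: stop walking this level
        if fourcc in (b"RIFF", b"LIST"):
            # A 4-byte form/list type ("AVI ", "hdrl", "movi", ...) precedes the sub-chunks.
            walk_riff(data, body + 4, body + size, out)
        else:
            out.append((fourcc.decode("latin-1"), body, size))
        pos = body + size + (size & 1)  # chunk payloads are padded to an even length
    return out


# Two-letter suffix of a stream chunk ID ("00dc", "01wb", ...) -> payload kind.
STREAM_KINDS = {"dc": "video", "db": "video", "wb": "audio"}


def classify_avi_chunks(data: bytes) -> dict:
    """Group the leaf chunks of an AVI file into video frames (e.g. MJPEG),
    audio blocks (e.g. PCM) and everything else (headers, index, ...)."""
    groups = {"video": [], "audio": [], "other": []}
    for fourcc, offset, size in walk_riff(data, 0, len(data), []):
        kind = STREAM_KINDS.get(fourcc[2:]) if fourcc[:2].isdigit() else None
        groups[kind or "other"].append((offset, size))
    return groups
```

    Each "video" entry would then be handed to a JPEG model and each "audio" entry to an audio model,
    with the remaining chunks left to the general models.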

    [Image: JPEG.png]

    EMMA loses on the CMYK file, even though it detects both JPEG streams (one is an embedded thumbnail),
    because more than half of the file is extra data. The result for the SqueezeChart PDF file is very good,
    considering that EMMA doesn't have Deflate recompression. That result and the one for Mill.jpg
    appear to be new records for SqueezeChart.
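    To see why detecting both streams doesn't help much when most of the file is other data, here is a
    rough sketch (for illustration only, not EMMA's detector) that finds complete JPEG streams by walking
    their marker segments from each SOI marker, then measures the fraction of the file lying outside all
    of them. Nested streams, such as a thumbnail stored inside an APP segment, are found but not
    double-counted. A dedicated JPEG model can presumably only help with the covered bytes.

```python
SOI = b"\xff\xd8\xff"  # SOI marker followed by the first segment's marker byte


def jpeg_stream_end(data: bytes, start: int):
    """Walk the marker segments of a JPEG stream beginning with SOI at
    data[start] and return the offset just past its EOI, or None."""
    n = len(data)
    i = start + 2  # skip SOI (FF D8)
    while i + 2 <= n:
        if data[i] != 0xFF:
            return None  # expected a marker: not a valid stream
        marker = data[i + 1]
        if marker == 0xFF:  # fill byte preceding a marker
            i += 1
        elif marker == 0xD9:  # EOI
            return i + 2
        elif 0xD0 <= marker <= 0xD7 or marker == 0x01:
            i += 2  # RSTn and TEM have no length field
        else:
            if i + 4 > n:
                return None
            seg_len = (data[i + 2] << 8) | data[i + 3]  # big-endian, counts itself
            if seg_len < 2:
                return None
            i += 2 + seg_len
            if marker == 0xDA:
                # SOS: skip entropy-coded data up to the next real marker.
                # FF 00 is a stuffed byte and FF D0..D7 are restart markers.
                while i + 1 < n and not (
                    data[i] == 0xFF and data[i + 1] != 0x00
                    and not 0xD0 <= data[i + 1] <= 0xD7
                ):
                    i += 1
    return None


def find_jpeg_streams(data: bytes):
    """Return (start, end) for every SOI that parses as a complete stream,
    including streams nested inside another one."""
    streams = []
    pos = data.find(SOI)
    while pos != -1:
        end = jpeg_stream_end(data, pos)
        if end is not None:
            streams.append((pos, end))
        pos = data.find(SOI, pos + 1)
    return streams


def non_jpeg_fraction(data: bytes) -> float:
    """Fraction of bytes not covered by any parsed JPEG stream."""
    covered, last_end = 0, 0
    for start, end in sorted(find_jpeg_streams(data)):
        start = max(start, last_end)  # bytes of nested streams are already counted
        if end > start:
            covered += end - start
            last_end = end
    return 1 - covered / len(data) if data else 0.0
```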

    I'm not aware of a standard JPEG dataset used for testing recompression, so if you know of
    one that you'd like to see tested, feel free to let me know.



    @Darek:

    I believe that, for the moment, there is no need to focus on improving these models.
    Best regards
    Last edited by mpais; 28th March 2016 at 16:55.

  38. #117
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    919
    Thanks
    541
    Thanked 363 Times in 271 Posts
    Quote Originally Posted by mpais View Post
    I'm also testing enwik9, don't have the result yet because I tested an internal version
    first using 50GB of memory
    Do you have any insights from testing your internal 50GB version? Is it worth using? Does it improve the ENWIK9 score?

  39. #118
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    520
    Thanks
    196
    Thanked 744 Times in 301 Posts
    Quote Originally Posted by Darek View Post
    Do you have any insights from testing your internal 50GB version? Is it worth using? Does it improve the ENWIK9 score?
    The result for enwik9 was 139.486.920 bytes, so not worth it in any way.

    I guess just a little more memory than that used by the x64 version I released would be enough to surpass ZPAQ.
    So I'm quite happy with the text model: even without the English dictionary it can beat most compressors
    while using significantly less memory. However, more memory alone would not be enough to top this particular benchmark,
    since 3 of the top 4 are heavily optimized for it, and #1, CMIX, is just a beast.

    I guess some sort of preprocessing step, using a transform such as WRT or similar, could help, but that is way, way down
    on my list of priorities.
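    WRT (Word Replacing Transform) replaces frequent words with short codes from a dictionary before the
    data reaches the compressor. Below is a toy sketch of the idea, assuming a hypothetical eight-word
    dictionary and using the control bytes 0x01-0x08 as one-byte codes. That only works if those bytes
    never occur in the input, which holds for enwik9 (its only control characters are tab and linefeed).
    Real WRT-style transforms use far larger, frequency-ordered dictionaries and more elaborate coding;
    here capitalized words are simply left alone.

```python
import re

# Hypothetical toy dictionary, with one-byte codes 0x01..0x08.
WORDS = [b"the", b"of", b"and", b"in", b"to", b"a", b"was", b"is"]
CODE_OF = {w: bytes([i + 1]) for i, w in enumerate(WORDS)}  # b"the" -> b"\x01", ...
WORD_OF = {c: w for w, c in CODE_OF.items()}
CODE_BYTES = rb"[\x01-\x08]"


def wrt_encode(data: bytes) -> bytes:
    """Replace whole dictionary words with their one-byte codes."""
    if re.search(CODE_BYTES, data):
        raise ValueError("input already contains code bytes 0x01-0x08")
    # [A-Za-z]+ matches maximal letter runs, so "then" or "theory" are left alone.
    return re.sub(rb"[A-Za-z]+", lambda m: CODE_OF.get(m.group(0), m.group(0)), data)


def wrt_decode(data: bytes) -> bytes:
    """Invert wrt_encode by expanding every code byte back into its word."""
    return re.sub(CODE_BYTES, lambda m: WORD_OF[m.group(0)], data)
```

    The transform is reversible because a code byte can only come from a replaced word, and the shorter,
    more regular output is what the context models then see.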

  41. #119
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    919
    Thanks
    541
    Thanked 363 Times in 271 Posts
    Quote Originally Posted by mpais View Post
    If you have any file where you think EMMA is performing worse than expected, I'd appreciate it if you could post it here.
    Are you asking only for JPEG files, or for any kind of file?

    Generally, as of version 0.1.6.64, my testbed results are as shown in the attached image, and here are some of my observations:

    JPEG - outstanding result, the best on my only JPEG file.
    24bit - very good model; best score for TGA, and for BMP only 1.4% behind GRALIC 1.7d.
    8bit - on my only recognised file (TGA), the score is slightly worse than PAQ8PX or CMIX.
    Audio - best among non-specialised compressors; only OPTIMFROG (audio-only) beats EMMA, by 4.1%.
    Exe - EMMA scores better than PAQ8PX or PAQ8PXD; only CMIX and PAQ8KX are slightly better.
    Text - quite good model; only CMIX beats it, by 5% on my files, and PAQAR/PASQDA are roughly on par.
    Main compression - close to the latest PAQ8PX/PXD versions and CMV, and 3-5% worse than CMIX, which in my opinion is a really good score against that program. Only CMIX and PAQ8KX beat it by a noticeable margin.

    I don't know if these differences from the best scores are worth working on; however, one file looks like what you described: "performing worse than expected". It's Q.WK3, a Lotus spreadsheet file.
    Generally, KX, CMIX, CMV and some other compressors get better results than EMMA on this file, while PAQ8px and PAQ8pxd get similar results. CMIX gets a 17.2% better score, which looks strange. I've attached the file in case you want to look into it.

    If you are comparing against PAQ8PX_V75, it might also be interesting to look at PAQ8KX_v7 (originally by Bill Pettis, Feb. 13, 2007, then modified by Jan Ondrus in 2010). It is a slightly slower version of PAQ but, like all older PAQ versions, uses only 1635MB of memory, and it gets quite interesting results: on files without a dedicated model it scores only about 1.5% worse than CMIX. Despite its older graphics and audio models, PAQ8KX_v7 is still #1 on my testbed.

    Maybe it could be helpful.
    Darek

    P.S. The 50GB ENWIK9 score is quite amazing anyway - it would put EMMA in 5th place in the LTCB benchmark! WRT or another transform could also help.
    Attached Thumbnails: Testbed comparison.jpg (285.3 KB)
    Attached Files: Q.7z (401.0 KB)
    Last edited by Darek; 29th March 2016 at 20:40.

  42. #120
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    520
    Thanks
    196
    Thanked 744 Times in 301 Posts
    Quote Originally Posted by Darek View Post
    Are you asking only for JPEG files, or for any kind of file?
    Any kind of file where EMMA needs to improve. From your testbed, that would be 0.WAV, D.TGA and H.EXE, besides the file you provided (thank you for that).
    With the TIFF files I know the reason for the bad performance, no need to see why.

    Quote Originally Posted by Darek View Post
    If you are comparing against PAQ8PX_V75, it might also be interesting to look at PAQ8KX_v7 (originally by Bill Pettis, Feb. 13, 2007, then modified by Jan Ondrus in 2010). It is a slightly slower version of PAQ but, like all older PAQ versions, uses only 1635MB of memory, and it gets quite interesting results: on files without a dedicated model it scores only about 1.5% worse than CMIX. Despite its older graphics and audio models, PAQ8KX_v7 is still #1 on my testbed.
    I usually use paq8px_v69 (now v75) and paq8pxd_v16 for testing, since they seem to be the top-listed compressors in most benchmarks.
    I guess I'll have to check that version out too.

    Quote Originally Posted by Darek View Post
    P.S. The 50GB ENWIK9 score is quite amazing anyway - it would put EMMA in 5th place in the LTCB benchmark! WRT or another transform could also help.
    But then again, give CMIX 50GB of memory and then you'll see what an amazing score is.

    Best regards
