
Thread: EMMA - Context Mixing Compressor

  1. #1
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    520
    Thanks
    196
    Thanked 744 Times in 301 Posts

    EMMA - Context Mixing Compressor

    EMMA v0.1.25, available at https://goo.gl/EDd1sS

    Hello,
    this is my first post, though I've been lurking here for some time.
    English is not my native language, so please bear with me.

    I'd like to present a small pet project of mine, an experimental CM
    compressor, designed to be a testing ground for some ideas I wanted to try.
    It's a streaming compressor, meaning it doesn't do a first pass through
    the data before compressing, so all the parsing and filtering is done
    alongside the compression. Each model is optional (aside from the main one),
    and can be configured separately in terms of memory usage and modelling
    complexity. The mixing complexity and level of post-mixing refinement (SSE)
    are also configurable.

    The main objective was to attempt to reach compression ratios comparable to,
    or even surpassing, the current best CM archivers, while using less memory
    and trying to be faster. To that end, I've tried to develop better models
    and parsers, and have implemented many different data structures to keep
    context statistics.

    [Features]

    Besides the main model, EMMA has a few generic data models, named following
    the nomenclature established in the PAQ compressor series, such as:
    Code:
    - Match model, which models matches to previously seen data, using a maximum
    window size defined by the ring buffer
    - Sparse model, currently much simpler than the one used in later versions of PAQ
    - Indirect model
    - Record model, which models variable record length data, such as tables
    - DMC model, an exact conversion of the one in PAQ (by Matt Mahoney)
    - PPMd model, using mod_ppmd (by Eugene Shelwien, PPMd by Dmitry Shkarin)
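    The match model above is a staple of the PAQ family; a minimal sketch of the idea (an illustrative reconstruction, not EMMA's actual code — the names MatchModel and ORDER are assumptions) could look like this:

```cpp
#include <cstdint>
#include <cstddef>
#include <vector>

// Minimal match-model sketch: hash the last ORDER bytes to remember where
// that context last occurred in the ring buffer; on a hit, predict that the
// byte following the previous occurrence repeats. For brevity, hash
// collisions are not verified against the actual context bytes.
struct MatchModel {
    static const int ORDER = 4;
    std::vector<uint8_t> buf;      // ring buffer of history
    std::vector<uint32_t> table;   // context hash -> position after last occurrence
    size_t pos = 0;                // bytes seen so far
    size_t match = 0;              // position of the predicted history byte
    size_t len = 0;                // current match length

    MatchModel(size_t bufBits, size_t tableBits)
        : buf(size_t(1) << bufBits), table(size_t(1) << tableBits, 0) {}

    uint32_t hash() const {
        uint32_t h = 0;
        for (int i = 1; i <= ORDER; i++)
            h = h * 2654435761u + buf[(pos - i) % buf.size()];
        return h % table.size();
    }

    // Predicted next byte, or -1 when no match is active.
    int predict() const {
        return len > 0 ? int(buf[match % buf.size()]) : -1;
    }

    void update(uint8_t c) {
        if (len > 0 && buf[match % buf.size()] == c) { match++; len++; }
        else len = 0;
        buf[pos % buf.size()] = c;
        pos++;
        if (pos >= ORDER) {
            uint32_t h = hash();
            if (len == 0 && table[h] > 0) { match = table[h]; len = 1; }
            table[h] = uint32_t(pos);
        }
    }
};
```

    The maximum match distance is bounded by the ring buffer size, which is why the window size option mentioned above matters.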
    In addition to these, EMMA has models and parsers for specific data types:
    Code:
    - Text model
    - x86/x64 model, which does partial disassembly of the executable code and derives contexts
    from the decoded instruction stream
    - XML model, models XML/HTML tags and structure, useful on small files
    - Image models/parsers for:
        - JPEG images (baseline only), including embedded images/thumbnails, MJPEG video frames
        - GIF images, static or animated
        - DICOM/ACR NEMA 1.0 & 2.0 images, uncompressed only, up to 16bpc
        - RAW image formats from digital cameras:
            - Sony .ARW (v2.x, only 11bit+7bit delta compressed images)
            - Panasonic/Leica .RAW and .RW2
            - Fujifilm .RAF raw images
            - Kodak .KDC
            - Olympus .ORF (only uncompressed images)
            - Pentax .PEF raw images, uncompressed only
            - Leaf/Aptus/Mamiya .MOS raw images, uncompressed only
            - Mamiya .MEF raw images
            - EPSON .ERF
        - Photoshop .PSD images, uncompressed only, up to 64bpp
        - TIFF images (.TIF), uncompressed only, including multi-page. Only sequential files supported
        - Bitmap images (.BMP/.ICO/.CUR/.ANI, embedded resources in executables) in 1, 4, 8, 24 or 32bpp, uncompressed
        - Truevision TGA (TARGA) images (.TGA)
        - Netpbm image formats (.PPM, .PGM, .PBM, .PAM), up to 16bpc
        - SGI Graphics images (.SGI, .RGB)
    - Audio model (uncompressed 8/16bps, mono or stereo only), for:
        - Wave files (.WAV)
        - AIFF files
        - SoundFont files (.SF2)
        - Uncompressed PCM streams in .AVI files
        - Module file formats .MOD, .IT/MPTM, .S3M, .XM
    EMMA also has 3 transforms:
    Code:
    - x86/x64 code, interprets executable code and performs relative to
    absolute address conversion for unconditional jumps (commonly referred
    to as "e8/e9"), with an option to also process conditional jumps
    - Colorspace, performs a lossless RGB transformation
    - Delta, an adaptive delta transform
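    The exact colorspace transform EMMA uses isn't documented; a common reversible choice for decorrelating RGB is "subtract green" (used, for instance, in lossless WebP), since R and B usually correlate strongly with G. A hedged sketch:

```cpp
#include <cstdint>

// Hypothetical lossless RGB decorrelation ("subtract green"). Differences
// wrap modulo 256, which is what keeps the transform exactly reversible.
struct Pixel { uint8_t r, g, b; };

Pixel forwardSubtractGreen(Pixel p) {
    return { uint8_t(p.r - p.g), p.g, uint8_t(p.b - p.g) };
}

Pixel inverseSubtractGreen(Pixel p) {
    return { uint8_t(p.r + p.g), p.g, uint8_t(p.b + p.g) };
}
```

    After the transform, the residual channels tend to cluster near 0 or 255, which the image models can predict more cheaply.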
    Also included are some example dictionaries that can be used to
    pre-train the main model before compression. These are composed
    of a mandatory .DIC file, containing a simple word list, and an
    optional .EXP file, containing some common expressions in that
    language (currently only available for English). They can be
    used concurrently.

    Other options include configurable mixing complexity; an option to
    use an adaptive learning rate, which may improve compression of
    (near) stationary sources; configurable 2nd-stage refinement (SSE)
    complexity; a fast mode which speeds up compression when using the
    match model in case of long matches; and a special extreme compression
    mode, named "Ludicrous mode", which is significantly slower.

    Several fully customizable presets are provided for different data
    types, and you can easily create your own. The x64 version is not
    compatible with the x86 version, since it uses more memory.

    To (de)compress files from the command line:
    "EMMA C i input_file output_file", where i is the 0-based index of the preset to use when compressing, and
    "EMMA D input_file output_directory" for decompression.

    Please keep in mind that this is experimental software, so bugs
    are par for the course. It's neither optimized nor very thoroughly
    tested. The simple GUI was designed to make it easy to explore
    all the different options, and it is Windows only. An SSE2
    capable processor is required.

    As always, your criticism is very welcome and appreciated.

    Best regards
    Last edited by mpais; 11th February 2018 at 15:56.

  2. The Following 24 Users Say Thank You to mpais For This Useful Post:

    137ben (28th February 2016),Bulat Ziganshin (28th February 2016),comp1 (27th February 2016),Darek (27th February 2016),encode (28th February 2016),Gonzalo (27th February 2016),Hacker (7th October 2016),hexagone (27th February 2016),Jan Ondrus (29th February 2016),lorents17 (8th May 2016),LucaBiondi (27th February 2016),Matt Mahoney (4th March 2016),Mauro Vezzosi (28th February 2016),mhajicek (28th February 2016),Mike (27th February 2016),necros (20th September 2016),RamiroCruzo (22nd August 2017),Razor12911 (15th October 2016),samsat1024 (20th June 2016),schnaader (28th February 2016),Shelwien (4th September 2016),Stephan Busch (28th February 2016),xinix (28th February 2016),zhisheng wang (27th February 2016)

  3. #2
    Member
    Join Date
    Sep 2015
    Location
    Italy
    Posts
    224
    Thanks
    99
    Thanked 133 Times in 96 Posts
    No CLI?
    If you are interested, I made a quick search for Italian word lists and I found these:
    - 60453 words.
    Page http://informationpoint.forumcommunity.net/?t=37000925
    Link to zip file http://informationpoint.forumcommuni...t&id=259262063
    It seems to be a good list.
    - 245K words.
    Page http://www.yorku.ca/lbianchi/italian.html and http://www.yorku.ca/lbianchi/italian_words/index.html
    Link to zip file http://www.yorku.ca/lbianchi/italian.../italia-1a.zip
    This wordlist is Copyright © 1993-2002 Luigi M Bianchi, and is made freely available under the terms of the GNU General Public License. For further information concerning this license, please visit the GNU Project Web Server.
    It seems to be a very comprehensive list; it also contains foreign words used in Italy.

  4. The Following User Says Thank You to Mauro Vezzosi For This Useful Post:

    mpais (27th February 2016)

  5. #3
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    520
    Thanks
    196
    Thanked 744 Times in 301 Posts
    Some results:

    Code:
    [TEXT]
    
    File: book1, from Calgary Corpus, 768.771 bytes
    
    191.570 bytes, paq8px_v69_sse2 -8
    188.166 bytes, paq8pxd_v16_skbuild_x64 -s15
    188.812 bytes, EMMA
    179.432 bytes, EMMA with English dictionary
    
    File: book2, from Calgary Corpus, 610.856 bytes
    
    120.402 bytes, paq8px_v69_sse2 -8
    118.603 bytes, paq8pxd_v16_skbuild_x64 -s15
    120.868 bytes, EMMA
    117.110 bytes, EMMA with English dictionary
    
    File: world95.txt, from Maximum Compression, 2.988.578 bytes
    
    351.178 bytes, paq8px_v69_sse2 -8
    331.151 bytes, paq8pxd_v16_skbuild_x64 -s15
    356.438 bytes, EMMA
    351.572 bytes, EMMA with English dictionary
    Code:
    [Executables]
    
    File: acrord32.exe, from Maximum Compression, 3.870.784 bytes
    
    892.419 bytes, paq8px_v69_sse2 -8
    889.113 bytes, paq8pxd_v16_skbuild_x64 -s15
    884.003 bytes, EMMA
    
    File: mso97.dll, from Maximum Compression, 3.782.416 bytes
    
    1.255.145 bytes, paq8px_v69_sse2 -8
    1.255.182 bytes, paq8pxd_v16_skbuild_x64 -s15
    1.235.588 bytes, EMMA
    
    File: ooffice, from Silesia Corpus, 6.152.192 bytes
    
    1.419.146 bytes, paq8px_v69_sse2 -8
    1.403.217 bytes, paq8pxd_v16_skbuild_x64 -s15
    1.386.389 bytes, EMMA
    Code:
    [Images, 24bpp]
    
    File: Lena.bmp, 786.488 bytes
    
    412.604 bytes, paq8px_v69_sse2 -8
    411.193 bytes, paq8pxd_v16_skbuild_x64 -s15
    409.161 bytes, EMMA
    
    File: Baboon.bmp, 786.488 bytes
    
    559.950 bytes, paq8px_v69_sse2 -8
    558.145 bytes, paq8pxd_v16_skbuild_x64 -s15
    556.589 bytes, EMMA
    
    File: dog.bmp, 820.890 bytes
    
    216.045 bytes, paq8px_v69_sse2 -8
    215.794 bytes, paq8pxd_v16_skbuild_x64 -s15
    203.670 bytes, EMMA with colorspace transform
    Code:
    [Images, 8bpp]
    
    File: Lena.bmp, 256 color palette, 263.222 bytes
    
    120.016 bytes, paq8px_v69_sse2 -8
    110.463 bytes, paq8pxd_v16_skbuild_x64 -s15
    105.644 bytes, EMMA
    
    File: Lena.bmp, grayscale, 263.224 bytes
    
    125.872 bytes, paq8px_v69_sse2 -8
    126.205 bytes, paq8pxd_v16_skbuild_x64 -s15
    124.046 bytes, EMMA
    
    File: barbara.bmp, grayscale, 263.224 bytes
    
    141.923 bytes, paq8px_v69_sse2 -8
    143.312 bytes, paq8pxd_v16_skbuild_x64 -s15
    138.437 bytes, EMMA
    Code:
    [JPEG]
    
    File: DSCN3974.jpg, from http://compression.ca/act/act-jpeg.html, 1.114.198 bytes
    
    827.025 bytes, paq8px_v69_sse2 -8
    826.588 bytes, paq8pxd_v16_skbuild_x64 -s8
    822.541 bytes, EMMA
    
    File: A10.jpg, from Maximum Compression, 842.468 bytes
    
    637.109 bytes, paq8px_v69_sse2 -8
    636.815 bytes, paq8pxd_v16_skbuild_x64 -s8
    635.476 bytes, EMMA
    
    File: CIMG3322.jpg, 4.441.863 bytes
    
    3.535.489 bytes, paq8px_v69_sse2 -8
    3.530.277 bytes, paq8pxd_v16_skbuild_x64 -s8
    3.386.798 bytes, EMMA
    
    File: MPC_2011_11-web.pdf, from SqueezeChart (http://dl.maximumpc.com/Archives/MPC_2011_11-web.pdf), 14.384.698 bytes
    
    9.956.189 bytes, paq8px_v69_sse2 -8
    9.973.122 bytes, paq8pxd_v16_skbuild_x64 -s8
    9.598.268 bytes, EMMA
    
    File: bruce1.sleepytom.SGP.mjpeg.avi, from https://archive.org/details/SecretGardenParty, 3.092.870 bytes
    
    2.941.465 bytes, paq8px_v69_sse2 -8
    3.006.568 bytes, paq8pxd_v16_skbuild_x64 -s8
    2.027.930 bytes, EMMA
    Code:
    [GIF]
    
    File: filou.gif, from SqueezeChart (http://www.squeezechart.com/filou.gif), 1.854.179 bytes
    
    1.782.417 bytes, paq8px_v69_sse2 -8
    1.781.944 bytes, paq8pxd_v16_skbuild_x64 -s15
    1.388.815 bytes, EMMA
    
    File: cat.gif, 3.294.027 bytes
    
    3.233.426 bytes, paq8px_v69_sse2 -8
    3.235.541 bytes, paq8pxd_v16_skbuild_x64 -s15
    2.619.713 bytes, EMMA
    
    File: GMARBLES.gif, 1.106.266 bytes
    
    1.032.387 bytes, paq8px_v69_sse2 -8
    1.028.414 bytes, paq8pxd_v16_skbuild_x64 -s15
    773.256 bytes, EMMA
    Last edited by mpais; 28th February 2016 at 20:30.

  6. #4
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    520
    Thanks
    196
    Thanked 744 Times in 301 Posts
    Quote Originally Posted by Mauro Vezzosi View Post
    No CLI?
    If you are interested, I made a quick search for Italian word lists and I found these:
    - 60453 words.
    Page http://informationpoint.forumcommunity.net/?t=37000925
    Link to zip file http://informationpoint.forumcommuni...t&id=259262063
    It seems to be a good list.
    - 245K words.
    Page http://www.yorku.ca/lbianchi/italian.html and http://www.yorku.ca/lbianchi/italian_words/index.html
    Link to zip file http://www.yorku.ca/lbianchi/italian.../italia-1a.zip
    This wordlist is Copyright © 1993-2002 Luigi M Bianchi, and is made freely available under the terms of the GNU General Public License. For further information concerning this license, please visit the GNU Project Web Server.
    It seems to be a very comprehensive list; it also contains foreign words used in Italy.
    Hello, thank you for the links, I'll check them out. I have made provisions in the code for the use of up to 8 dictionaries,
    but I still haven't decided on the languages. You can quite easily test with any word list of your choosing; simply
    rename it to one of the 4 included dictionaries. The only limitation is that files with repeated words will throw an error.

    As for the CLI, at such an early stage in development I didn't really want to commit to a fixed definition of command line
    switches, since the number of options is very large and the code was constantly changing as I added new models or tweaked
    existing ones. I certainly plan on doing it, but I don't really have much free time, so I'll have to find a way to prioritize my
    "to-do" list of features.

    Best regards

  7. #5
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    I can add to LTCB if you like to run benchmarks on enwik8/9. Let me know what options you use, time, memory usage, and hardware. I could test myself but you probably know better what options you want. Also if you decide to open source I can add to the Silesia benchmark. A CLI would make testing easier.

  8. The Following User Says Thank You to Matt Mahoney For This Useful Post:

    mpais (27th February 2016)

  9. #6
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    467
    Thanks
    202
    Thanked 81 Times in 61 Posts
    Well done! Impressive results.
    A few comments:
    1) The GUI is under-sized; I can't see all the controls on my screen.
    2) Ratio info: it's handier to see the actual ratio, not the percent saved. That's what all compressors do.
    3) You lock the source file even for read-only access. Lock it only for writing.
    4) Please add verbose output while compressing, or at least when finished, so we can see what operations EMMA actually performed, filters used, data types found, etc.

    ==============================

    I made a little dirty hack using Resource Hacker in order to fix issue #1. Now the Start buttons are a little higher and the GUI can be maximized.
    Attached Files
    Last edited by Gonzalo; 27th February 2016 at 23:18.

  10. The Following User Says Thank You to Gonzalo For This Useful Post:

    mpais (27th February 2016)

  11. #7
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    520
    Thanks
    196
    Thanked 744 Times in 301 Posts
    Quote Originally Posted by Matt Mahoney View Post
    I can add to LTCB if you like to run benchmarks on enwik8/9. Let me know what options you use, time, memory usage, and hardware. I could test myself but you probably know better what options you want. Also if you decide to open source I can add to the Silesia benchmark. A CLI would make testing easier.
    Hello, thank you for the offer. I haven't tested on enwik8/9 yet, but I'll give it a try, though I think that for such large files the limited memory usage of the models will play a large role in the results.

    I honestly haven't decided if I should open-source it, I'm currently pondering 3 options:

    1) Try to "clean-up" the code and open-source it, but I don't think it would be of much interest.
    I've written it mostly for research purposes, it's nowhere near "production" quality, and I would most
    likely have to rewrite large chunks of code, and that time could be spent on the other options.

    2) Try to port as much of the code as possible to PAQ. I've learned a lot from studying PAQ, and
    since its code is already available and understood by others, it would make sense to contribute
    to it. However, since EMMA performs all parsing and filtering online, porting a lot of the
    components would be infeasible.

    3) Probably the best option, but also the most time consuming: create detailed documentation,
    clearly describing (and including pseudo-code) the execution flow, each component, reasoning
    for every choice made, and so on. This would allow anyone interested to study it and possibly
    improve upon the results, without having to wonder what some cryptic piece of code does.

    Best regards

  12. #8
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    520
    Thanks
    196
    Thanked 744 Times in 301 Posts
    Quote Originally Posted by Gonzalo View Post
    Well done! Impressive results.
    A few comments:
    1) The GUI is under-sized; I can't see all the controls on my screen.
    2) Ratio info: it's handier to see the actual ratio, not the percent saved. That's what all compressors do.
    3) You lock the source file even for read-only access. Lock it only for writing.
    4) Please add verbose output while compressing, or at least when finished, so we can see what operations EMMA actually performed, filters used, data types found, etc.

    ==============================

    I made a little dirty hack using Resource Hacker in order to fix issue #1. Now the Start buttons are a little higher and the GUI can be maximized.
    Hello, thank you for your interest.

    Regarding 1), I'm afraid it's a consequence of the language used (Delphi XE7); DPI scaling has quite a few bugs. I'll try to iron them out, sorry for the inconvenience.
    As for the others, I will try to address them in the next version. This GUI was only designed for testing the compression engine, so I didn't really spend much time on it; I just
    wanted to have a working version of the software to share and get feedback.

  13. #9
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    919
    Thanks
    541
    Thanked 363 Times in 271 Posts
    Very impressive! Good scores.

    My comments:

    1) Good, quite stable GUI, however it sometimes (randomly) crashes after finishing compression. Visually, everything looks proper on my machine.
    2) It would be very useful to be able to save the chosen compression parameters. Currently it takes some time to select almost all the options I use (methods, memory, options, etc...).
    3) About compression:

    - audio - very good ratio, best of the PAQ series in my test, just slightly worse than WinRK;
    - graphic models - also very good - one of the best I've ever seen (maybe only some specialised graphics compressors like Gralic 1.7d could beat it on some BMP files), however it did not recognise some TIFF files from my testbed;
    - JPEG - in my test this model is better than StuffIt and any other model used in the PAQ family;
    - text model - quite good - better than the PAQ and CMV models, in total score for small files similar to WinRK, but about 10-15% worse than CMIX - that's quite a huge difference - maybe ENWIK8/9 tests will show more...
    - general (non-special-model) compression - 2% worse than CMV or PAQ8pxd v16 skybuild3, 4% worse than the best PAQ version, PAQ8kx_v7, and about 6% worse than CMIX v6 - maybe more memory would help, or something like SUPER-INSANE options could help...?

    A table with my testbed scores is in the attached JPG file.

    4) About dictionaries. A good-quality Polish word dictionary (words with inflections) is available at this official source: http://sjp.pl/slownik/odmiany/sjp-odm-20160208.zip.

    Best Regards,
    Darek
    Attached Thumbnails: Tests1.jpg (375.0 KB)

  14. The Following 2 Users Say Thank You to Darek For This Useful Post:

    Mauro Vezzosi (28th February 2016),mpais (28th February 2016)

  15. #10
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    520
    Thanks
    196
    Thanked 744 Times in 301 Posts
    Quote Originally Posted by Darek View Post
    Very impressive! Good scores.

    My comments:

    1) Good, quite stable GUI, however it sometimes (randomly) crashes after finishing compression. Visually, everything looks proper on my machine.
    2) It would be very useful to be able to save the chosen compression parameters. Currently it takes some time to select almost all the options I use (methods, memory, options, etc...).
    3) About compression:

    - audio - very good ratio, best of the PAQ series in my test, just slightly worse than WinRK;
    - graphic models - also very good - one of the best I've ever seen (maybe only some specialised graphics compressors like Gralic 1.7d could beat it on some BMP files), however it did not recognise some TIFF files from my testbed;
    - JPEG - in my test this model is better than StuffIt and any other model used in the PAQ family;
    - text model - quite good - better than the PAQ and CMV models, in total score for small files similar to WinRK, but about 10-15% worse than CMIX - that's quite a huge difference - maybe ENWIK8/9 tests will show more...
    - general (non-special-model) compression - 2% worse than CMV or PAQ8pxd v16 skybuild3, 4% worse than the best PAQ version, PAQ8kx_v7, and about 6% worse than CMIX v6 - maybe more memory would help, or something like SUPER-INSANE options could help...?

    A table with my testbed scores is in the attached JPG file.

    4) About dictionaries. A good-quality Polish word dictionary (words with inflections) is available at this official source: http://sjp.pl/slownik/odmiany/sjp-odm-20160208.zip.

    Best Regards,
    Darek
    Hello, thank you for the feedback.

    I had already thought of allowing the parameters to be saved as "presets" that could be reused, but maybe a CLI would be better, as requested above.

    The TIFF file cannot be detected because I don't have a parser for it; the format specification isn't "stream-friendly", since the offset of the
    first IFD can come after the image data.

    The text model is relatively simple, and being limited to 64Mb of memory, it will have a hard time competing on large text files; CMIX is completely
    out of its league. That was one of the reasons I decided to use external dictionaries: to attempt to improve text compression without having to
    drastically increase the memory usage or the model complexity. Also, the option to use an adaptive learning rate seems to do well on text files, due
    to their stationary nature.

    As for the "super-insane" mode, it's actually already in the code, just disabled because I didn't think anyone would want it. The data structure used
    for the models is very simple, and in my testing provided the best compression/performance trade-off, but I have implemented many others, including
    one which on some files provides a nice boost to compression (e.g. acrord32.exe goes from 884.003 bytes to 869.027). I will enable it in the next version.

    Thank you for the word list, I guess I'll have a hard time choosing the languages to include by default.

    Best regards

  16. #11
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    467
    Thanks
    202
    Thanked 81 Times in 61 Posts
    Q: Is it really mandatory to set an 8-language limit? After all, it's a demo for now; you can change whatever you want until you release a stable version.

    Notice: EMMA did well on a bad JPG; PackJPG didn't. So, nice work.

    Suggestion: If you open the source right now, you will get a lot more help. Not me, I am a donkey. But I know of certain users who are really good at tuning the behaviour of PAQ* cousins just for fun.
    Last edited by Gonzalo; 28th February 2016 at 00:56.

  17. #12
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    520
    Thanks
    196
    Thanked 744 Times in 301 Posts
    Quote Originally Posted by Gonzalo View Post
    Q: Is it really mandatory to set an 8-language limit? After all, it's a demo for now; you can change whatever you want until you release a stable version.

    Notice: EMMA did well on a bad JPG; PackJPG didn't. So, nice work.
    No, I simply reserved 8 bits to serve as a binary mask determining which dictionaries are used (so as to allow their concurrent usage).
    I only provided 4 because those are the languages I am reasonably fluent in, and even then the lists are not very good, but they can be
    used for testing.
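    The 8-bit mask scheme described here can be sketched as follows; the dictionary names are illustrative placeholders, not EMMA's actual identifiers:

```cpp
#include <cstdint>

// Each bit of an 8-bit mask enables one of up to 8 dictionaries, so any
// combination can be active concurrently. Names are hypothetical.
enum Dictionary : uint8_t {
    DictEnglish    = 1 << 0,
    DictFrench     = 1 << 1,
    DictPortuguese = 1 << 2,
    DictSpanish    = 1 << 3,
    // bits 4..7 reserved for future languages
};

bool isEnabled(uint8_t mask, Dictionary d) { return (mask & d) != 0; }
```

    For example, a mask of `DictEnglish | DictFrench` would pre-train the main model with both word lists.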

    Best regards

  18. #13
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Quick test with enwik8 -> 20,885,770 in about 30 minutes to compress or decompress on a 2.0 GHz T3200, 3 GB.
    Attached: Capture.PNG (108.6 KB)

  19. The Following User Says Thank You to Matt Mahoney For This Useful Post:

    mpais (28th February 2016)

  20. #14
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    520
    Thanks
    196
    Thanked 744 Times in 301 Posts
    Attached: enwik8.png (31.9 KB)

    enwik8, compressed to 18.412.076 bytes in 1795,6s on an Intel i5 2400.

    I was able to get it down to 18.369.075 bytes by using other models, but it's not worth it as the compression time goes up considerably (2683,5s).
    I'll test if increasing the memory for the text model from 64Mb to 256Mb is worthwhile.

  21. #15
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Most likely it will. The single biggest factor affecting compression in LTCB is memory usage. cmix wins because it requires 32 GB of RAM.

  22. #16
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    520
    Thanks
    196
    Thanked 744 Times in 301 Posts
    I guess I should explain a bit more what each option does.

    EMMA has a mandatory main model (modelling from order 6 to 9), optional models,
    transforms and dictionaries.

    For the main model, you can select its memory usage, up to 256Mb, aside from
    the maximum order.

    The ring buffer contains the previously encoded data, and can vary from 4 to
    32 Mb. It is shared by the models.

    The mixing complexity determines the structure and contexts used for mixing the
    prediction of each model, and the refinement (or Secondary Symbol Estimation)
    determines the level of post-processing done to the result of the mixing step.
    The adaptive learning rate allows the mixer to adapt the current learning
    rate, which is useful on stationary sources.
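    Mixing in the PAQ tradition combines the models' predictions in the logit domain with learned weights. A minimal sketch of such a mixer, with the adaptive learning rate hedged as a simple decay schedule (EMMA's actual scheme isn't documented, and the names here are assumptions):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Logistic mixing sketch: stretch each model's probability to the logit
// domain, take a weighted sum, squash back to a probability. The weights
// are trained by gradient descent on coding loss.
struct Mixer {
    std::vector<double> w;   // one weight per model
    double lr;               // current learning rate
    bool adaptive;

    Mixer(size_t n, double lr0, bool adapt)
        : w(n, 0.0), lr(lr0), adaptive(adapt) {}

    static double stretch(double p) { return std::log(p / (1 - p)); }
    static double squash(double x)  { return 1 / (1 + std::exp(-x)); }

    // Mix per-model probabilities of "next bit is 1" into one probability.
    double mix(const std::vector<double>& p) const {
        double dot = 0;
        for (size_t i = 0; i < w.size(); i++) dot += w[i] * stretch(p[i]);
        return squash(dot);
    }

    // Gradient step toward the observed bit; optionally decay the rate,
    // which suits (near) stationary sources.
    void update(const std::vector<double>& p, double mixed, int bit) {
        double err = bit - mixed;
        for (size_t i = 0; i < w.size(); i++) w[i] += lr * err * stretch(p[i]);
        if (adaptive) lr *= 0.9999;  // illustrative decay schedule
    }
};
```

    SSE would then take the mixed probability and refine it further against context-conditioned statistics, which is the "refinement" stage mentioned above.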

    When using the match model, you can use a "fast mode" when long matches are
    found, which skips the modelling and mixing and uses the prediction from the
    match model exclusively.

    Some optional models rely on parsing to recognize certain data, and remain
    dormant until called upon, at which time they claim exclusive control of
    encoding, shutting down all other models except the match model, if in use.
    These models are:
    - Image models (1bpp, 4bpp, 8bpp, 24bpp, 32bpp), active when parsing detects
    BMP, Targa or SGI files, or embedded DIBs in executable files, such as icons.
    - Audio models, active when parsing detects 8/16 bits mono/stereo uncompressed
    audio streams in WAV, AIFF and AVI files
    - JPEG model, active when parsing detects baseline JPEG files, or MJPEG frames
    - GIF model

    The transform for executable code parses the input stream and attempts
    to decode it as x86/x64 instructions, and only performs the relative to
    absolute address conversion when at least 8 consecutive valid instructions
    precede it. Note that the parsing done by the transform is independent
    from that done by the x86/x64 model, as the transform does not need to
    attempt disassembly for gathering contexts. The option to also process
    conditional jumps can sometimes provide interesting compression gains.
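    The core of the "e8/e9" conversion described above can be sketched as follows. This is the textbook form of the transform, not EMMA's code: x86 CALL (0xE8) and JMP (0xE9) carry a 32-bit relative displacement, and rewriting it as an absolute target makes repeated references to the same address produce identical byte patterns. The 8-consecutive-valid-instructions gate and conditional-jump handling are omitted, and little-endian byte order is assumed:

```cpp
#include <cstdint>
#include <cstddef>
#include <cstring>

// Forward transform: relative displacement -> absolute target.
// Both passes skip the 4 operand bytes after a trigger, so forward and
// inverse make identical opcode/operand decisions and round-trip exactly.
void e8e9Forward(uint8_t* data, size_t n) {
    for (size_t i = 0; i + 5 <= n; i++) {
        if (data[i] == 0xE8 || data[i] == 0xE9) {
            int32_t rel;
            std::memcpy(&rel, data + i + 1, 4);
            int32_t abs = rel + int32_t(i + 5);   // target = next-IP + disp
            std::memcpy(data + i + 1, &abs, 4);
            i += 4;                                // skip the operand bytes
        }
    }
}

// Inverse transform: absolute target -> relative displacement.
void e8e9Inverse(uint8_t* data, size_t n) {
    for (size_t i = 0; i + 5 <= n; i++) {
        if (data[i] == 0xE8 || data[i] == 0xE9) {
            int32_t abs;
            std::memcpy(&abs, data + i + 1, 4);
            int32_t rel = abs - int32_t(i + 5);
            std::memcpy(data + i + 1, &rel, 4);
            i += 4;
        }
    }
}
```

    The instruction-validity gate EMMA adds on top of this reduces false triggers on data bytes that merely happen to equal 0xE8/0xE9.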

    The colorspace transform is also dependent on the usage of image models
    and the parsing, since it is only applied to 24bpp or 32bpp images. It
    is usually beneficial to compression.

    The delta transform attempts to find correlations in the data by interpreting
    it as 8, 16 or 32 bit integers, and checking if delta-coding them may
    lead to better compression. It's usually not very useful.
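    The delta idea can be sketched like this; the 8-bit-only width and the usefulness heuristic are illustrative assumptions (EMMA also tries 16- and 32-bit interpretations, and its actual decision rule isn't documented):

```cpp
#include <cstdint>
#include <cstddef>
#include <vector>

// Reversible byte-wise delta coding: differences wrap modulo 256.
std::vector<uint8_t> deltaEncode(const std::vector<uint8_t>& in) {
    std::vector<uint8_t> out(in.size());
    uint8_t prev = 0;
    for (size_t i = 0; i < in.size(); i++) {
        out[i] = uint8_t(in[i] - prev);
        prev = in[i];
    }
    return out;
}

std::vector<uint8_t> deltaDecode(const std::vector<uint8_t>& in) {
    std::vector<uint8_t> out(in.size());
    uint8_t prev = 0;
    for (size_t i = 0; i < in.size(); i++) {
        prev = uint8_t(prev + in[i]);
        out[i] = prev;
    }
    return out;
}

// Crude usefulness check: on smooth (e.g. sampled) data the deltas cluster
// near 0 or 255, so count how many land there.
bool deltaLooksUseful(const std::vector<uint8_t>& in) {
    std::vector<uint8_t> d = deltaEncode(in);
    size_t small = 0;
    for (uint8_t v : d)
        if (v < 16 || v > 240) small++;
    return small * 2 > d.size();   // majority of deltas are tiny
}
```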

    For text, most models can be skipped, as most of the benefit comes from the
    text and match models, with the indirect, sparse, distance and DMC models
    providing only small incremental gains. If the source is truly only text,
    the adaptive learning rate usually gives nice gains for free (in terms of time),
    though text mixed with formatting code, logs, or other data that doesn't share the
    same stationary property will not benefit from it.

    On the other hand, for x86/x64 executable files, if maximum compression is desired,
    almost all models will have to be used, since these files often contain a lot more
    than just code. The executable and colorspace transforms should also be used,
    as they can provide very substantial gains nearly for free (in terms of time).

    If the file type is known and EMMA has a specific model for it, then usually selecting
    only that model will give almost the best possible result, though in case the file is
    some sort of container that simply has only files of that type in it, the match model
    is useful to detect duplicates and, when used with fast mode, can provide not only
    good compression gains, but also reduce the encoding time.

    The available dictionaries will, when selected, be used to pre-train the engine,
    and you can use more than one at the same time, should the need arise.

  23. The Following 7 Users Say Thank You to mpais For This Useful Post:

    Bulat Ziganshin (28th February 2016),Darek (29th February 2016),Gonzalo (28th February 2016),Matt Mahoney (29th February 2016),Mauro Vezzosi (28th February 2016),Paul W. (28th February 2016),samsat1024 (25th June 2016)

  24. #17
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    919
    Thanks
    541
    Thanked 363 Times in 271 Posts
    Quote Originally Posted by mpais View Post

    enwik8, compressed to 18.412.076 bytes in 1795,6s on an Intel i5 2400.

    I was able to get it down to 18.369.075 bytes by using other models, but it's not worth it as the compression time goes up considerably (2683,5s).
    I'll test if increasing the memory for the text model from 64Mb to 256Mb is worthwhile.
    My test: enwik8: 18.299.608 bytes. Time: 2524s on an i7 4900MQ.
    Options - every option on maximum (maximum memory, maximum complexity). All transforms on. Dictionaries - only English. "Use fast mode on long matches" - off.

    From my observations, using all models at the same time mostly has no negative effect on compression! So you can enable all options and be sure that you get maximum compression... Of course I could be wrong in some cases, but generally that's how it looks according to my tests. Sometimes adding a new model/option that is not directly connected to the compressed data gives a file that's a couple of bytes smaller. Maybe it's not a lot, but it's always better.

    However, there are two exceptions:
    First exception: "Use fast mode on long matches" - sometimes it gives better scores, sometimes worse - it depends on the file.
    Second exception: using all dictionaries mostly gives worse compression than using a selected one (in my case, English).

    Best regards,
    Darek
    Last edited by Darek; 29th February 2016 at 02:43.

  25. #18
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    520
    Thanks
    196
    Thanked 744 Times in 301 Posts
    A quick follow up on enwik8 testing.

    Keeping the same options as before (use the match, text and indirect model on max, main model on max, mixing and refinement on max):
    - Increasing the memory for the text model from 64Mb to 256Mb gives a result of 18.269.934 bytes
    - Increasing the memory for the main model from 256Mb to 512Mb gives a result of 18.254.313 bytes

    So not really worth it, IMHO.
    I think that memory can be better used for more optional models, which frankly are the best way to achieve good "practical" compression
    on most files. I had thought of creating other models (for ARMv7 executables, 16bpp images, XML, etc) but those seem to be of little use.

    Best regards

  26. #19
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    776
    Thanks
    63
    Thanked 270 Times in 190 Posts
    Quote Originally Posted by mpais View Post
    [Screenshot: enwik8.png]

    enwik8, compressed to 18.412.076 bytes in 1795,6s on an Intel i5 2400.

    I was able to get it down to 18.369.075 bytes by using other models, but it's not worth it as the compression time goes up considerably (2683,5s).
    I'll test if increasing the memory for the text model from 64Mb to 256Mb is worthwhile.
    enwik9, compressed to 155,015,228 bytes in 12649.9s on Intel i7 3960X OC 4.4GHz, with setting screenshot.

  27. The Following 2 Users Say Thank You to Sportman For This Useful Post:

    Matt Mahoney (29th February 2016),mpais (28th February 2016)

  28. #20
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    542
    Thanks
    194
    Thanked 174 Times in 81 Posts
    Good work, there is some nice progress here! The real gem for me is the GIF model as it gives file sizes between PAQ and Precomp + PAQ:

    Code:
    filou.gif:
    
    1.795.248 bytes, zpaq v7.05 -method 5
    1.388.815 bytes, EMMA
    1.104.854 bytes, Precomp v0.4.4 | zpaq v7.05 -method 5
    
    GMARBLES.gif:
    
    1.055.871 bytes, zpaq v7.05 -method 5
      773.256 bytes, EMMA
      561.774 bytes, Precomp v0.4.4 | zpaq v7.05 -method 5
    If it works as I suppose - predicting compressed symbols from the content decompressed so far - it has much potential. Something like this was often discussed here, but never implemented. It could be used to compress other already compressed data like deflate/lzma/bzip2/xz/... streams similar to Precomp, but with the important difference that recompression doesn't have to be exact (it doesn't hurt much if the predicted compressed symbol was wrong), so implementation is easier.
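    The Precomp-style special case of this idea (exact re-encoding) can be sketched in a few lines with Python's zlib; the predictive variant described above generalizes it, since a correctly "predicted" compressed byte is nearly free to code and only mispredictions cost anything. This toy only measures the byte-level hit rate of a brute-forced settings guess on a deflate stream:

    ```python
    import zlib

    def recompression_hit_rate(stream: bytes) -> float:
        """Decompress a deflate stream, then try to reproduce the original
        compressed bytes by re-encoding the content with guessed settings.
        Unlike Precomp, an inexact match would be tolerable for a predictive
        model, so we just measure how many compressed bytes the best guess
        got right."""
        raw = zlib.decompress(stream)
        best = 0.0
        for level in range(1, 10):          # guess the original encoder level
            guess = zlib.compress(raw, level)
            n = min(len(guess), len(stream))
            hits = sum(guess[i] == stream[i] for i in range(n))
            best = max(best, hits / len(stream))
        return best

    data = b"the quick brown fox jumps over the lazy dog " * 100
    stream = zlib.compress(data, 6)
    print(f"{recompression_hit_rate(stream):.0%}")  # 100% when the guess matches
    ```

    With a matching zlib build the guess reproduces the stream exactly; with a different encoder the hit rate drops, which is precisely where the "mispredictions are cheap" property matters.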
    Last edited by schnaader; 28th February 2016 at 18:35.
    http://schnaader.info
    Damn kids. They're all alike.

  29. The Following User Says Thank You to schnaader For This Useful Post:

    mpais (28th February 2016)

  30. #21
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    520
    Thanks
    196
    Thanked 744 Times in 301 Posts
    I've made a few quick changes and have updated the 1st post with a new version.

    Changes:

    - Ratio displayed is now the actual ratio, not the percentage saved
    - Fixed input file access sharing rights
    - Real-time parsing information
    - Preliminary "Presets" support
    - Implemented "Ludicrous complexity mode" for maximum compression,
    extremely slow
    - Fixed a bug in the AVI parser
    - I've tested the GUI at 96, 120 and 144DPI in Windows 8.1 and 10,
    but I can't be sure it will always be properly scaled
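    For reference, the two displays mentioned in the first change differ as follows (a sketch; I'm assuming "actual ratio" means original size over compressed size, using the enwik8 result posted earlier as example numbers):

    ```python
    def ratio(original: int, compressed: int) -> float:
        """Compression ratio, e.g. 5.43 means '5.43 times smaller'."""
        return original / compressed

    def percent_saved(original: int, compressed: int) -> float:
        """Space saved relative to the original, in percent."""
        return 100.0 * (1.0 - compressed / original)

    # enwik8 result from earlier in the thread:
    print(round(ratio(100_000_000, 18_412_076), 3))         # 5.431
    print(round(percent_saved(100_000_000, 18_412_076), 2))  # 81.59
    ```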

    The new complexity level changes the data structure used by some models
    (Main, Sparse, Indirect, x86/x64 and Record).

    It provides some additional gains on binary files, but not much for text.
    Executable files seem to benefit the most, since all of those models
    are usually useful for such files.

    Some results:

    Code:
    File: acrodr32.exe, 3.870.784 bytes
    
    884.003 bytes, EMMA v0.1
    869.176 bytes, EMMA v0.1.1 Ludicrous
    
    File: mso97.dll, 3.782.416 bytes
    
    1.235.588 bytes, EMMA v0.1
    1.217.680 bytes, EMMA v0.1.1 Ludicrous
    
    File: ooffice, 6.152.192 bytes
    
    1.386.389 bytes, EMMA v0.1
    1.356.452 bytes, EMMA v0.1.1 Ludicrous
    
    File: sao, 7.251.944 bytes
    
    3.765.707 bytes, EMMA v0.1
    3.762.076 bytes, EMMA v0.1.1 Ludicrous
    
    File: x-ray, 8.474.240 bytes
    
    3.593.212 bytes, EMMA v0.1
    3.591.018 bytes, EMMA v0.1.1 Ludicrous
    Further improvements will need either more memory, or changes to the models.

    Best regards

  31. The Following 2 Users Say Thank You to mpais For This Useful Post:

    Darek (29th February 2016),Gonzalo (29th February 2016)

  32. #22
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    919
    Thanks
    541
    Thanked 363 Times in 271 Posts
    Quote Originally Posted by mpais View Post
    A quick follow up on enwik8 testing.

    Keeping the same options as before (use the match, text and indirect model on max, main model on max, mixing and refinement on max):
    - Increasing the memory for the text model from 64Mb to 256Mb gives a result of 18.269.934 bytes
    - Increasing the memory for the main model from 256Mb to 512Mb gives a result of 18.254.313 bytes

    So not really worth it, IMHO.
    I think that memory can be better used for more optional models, which frankly are the best way to achieve good "practical" compression
    on most files. I had thought of creating other models (for ARMv7 executables, 16bpp images, XML, etc) but those seem to be of little use.

    Best regards
    First of all, thanks for the new version!

    In my opinion, we should distinguish between two different purposes: "practical" use and "experimental" (testing and record-breaking) use.

    And EMMA could fulfill both of them. Maybe, for less advanced users, it could offer presets for easier choosing between the compression modes: Minimum, Medium, Practical, Maximum and/or Insane.

    I'm a maximum-compression fan, so my goal is to find the best compression ratio, which means "Time Is Not An Issue". For me, the more memory the better, and the more insane the model the better. Of course, other users may prefer a different setup.

    So, if possible, could you allow even more memory for the text model and the main model, as in your enwik8 test? That would be super! For me it's worth compressing much slower for a slightly better result. That's the nature of chasing maximum compression: time and complexity rise exponentially with the compression level. So I'll wait for the mentioned Super-Insane compression level...

    Regarding a TIFF parser: the latest version of PAQ8pxd v16 skbuild3 has a very good TIFF parser - to be honest, it achieved the best score on the TIFF files in my testbed. If you could use it, it would help a lot with this kind of file.

    And last but not least: enwik9 for my setup (settings on screenshot): 153.971.476 bytes, time 28943,2s, on an i7 4900MQ

    Best regards,
    Darek
    Attached: Options.jpg
    Last edited by Darek; 29th February 2016 at 02:43.

  33. #23
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    520
    Thanks
    196
    Thanked 744 Times in 301 Posts
    Quote Originally Posted by schnaader View Post
    Good work, there is some nice progress here! The real gem for me is the GIF model as it gives file sizes between PAQ and Precomp + PAQ:

    Code:
    filou.gif:
    
    1.795.248 bytes, zpaq v7.05 -method 5
    1.388.815 bytes, EMMA
    1.104.854 bytes, Precomp v0.4.4 | zpaq v7.05 -method 5
    
    GMARBLES.gif:
    
    1.055.871 bytes, zpaq v7.05 -method 5
      773.256 bytes, EMMA
      561.774 bytes, Precomp v0.4.4 | zpaq v7.05 -method 5
    If it works as I suppose - predicting compressed symbols from the content decompressed so far - it has much potential. Something like this was often discussed here, but never implemented. It could be used to compress other already compressed data like deflate/lzma/bzip2/xz/... streams similar to Precomp, but with the important difference that recompression doesn't have to be exact (it doesn't hurt much if the predicted compressed symbol was wrong), so implementation is easier.
    The GIF model was just a proof of concept that I wanted to try; due to the streaming nature of EMMA, it could never
    really achieve good results, since I'm limited to attempting to predict LZW codewords, and with the wrong endianness at that.

    I did some tests when I was implementing it, and I believe a dedicated bit-lossless compressor for GIF images could realistically
    achieve a further 50% to 70% reduction in size, by processing each frame at once, decompressing it, compressing it with a much
    stronger model (8bpp), and then simply storing the necessary parameters for the decompressor to be able to perform a bit-lossless
    reconstruction (initial dictionary size, block-lengths, position of the clear codes, position of the EOI code, etc).
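    To make the codeword layer concrete, here is a minimal greedy LZW codec (a toy sketch - it ignores GIF specifics such as variable code widths, clear/EOI codes and sub-block framing, which are exactly the parameters a bit-lossless recompressor would have to record in order to regenerate the original code stream):

    ```python
    def lzw_encode(data: bytes) -> list[int]:
        """Greedy LZW: emit the code of the longest known prefix, then add
        that prefix plus one byte to the dictionary."""
        table = {bytes([i]): i for i in range(256)}
        w, out = b"", []
        for b in data:
            wb = w + bytes([b])
            if wb in table:
                w = wb
            else:
                out.append(table[w])
                table[wb] = len(table)
                w = bytes([b])
        if w:
            out.append(table[w])
        return out

    def lzw_decode(codes: list[int]) -> bytes:
        """Inverse of lzw_encode, including the KwKwK special case where a
        code refers to the entry being built."""
        table = {i: bytes([i]) for i in range(256)}
        w = table[codes[0]]
        out = bytearray(w)
        for c in codes[1:]:
            entry = table[c] if c in table else w + w[:1]
            out += entry
            table[len(table)] = w + entry[:1]
            w = entry
        return bytes(out)

    data = b"TOBEORNOTTOBEORTOBEORNOT"
    codes = lzw_encode(data)
    print(lzw_decode(codes) == data)  # True
    ```

    A streaming model has to predict `codes` directly; a dedicated recompressor would instead run the decode, model the recovered pixels, and keep only the bookkeeping needed to re-run the encode exactly.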

    If we simply convert the files you mentioned to 8bpp BMP (a visually lossless transformation) and compress them, we get:

    Code:
    filou.bmp
    921.011 bytes, EMMA v0.1.1 w/ Image (Slow)
    949.330 bytes, paq8pxd_v16_skbuild4_x64 -s15
    996.560 bytes, paq8px_v69 -8
    
    GMARBLES.bmp
    463.360 bytes, EMMA v0.1.1 w/ Image (Slow)
    471.789 bytes, paq8pxd_v16_skbuild4_x64 -s15
    477.650 bytes, paq8px_v69 -8
    Even accounting for the extra side information required, we can see that much better results (than those of EMMA) could be achieved.

    Best regards
    Last edited by mpais; 28th February 2016 at 21:46.

  34. #24
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    520
    Thanks
    196
    Thanked 744 Times in 301 Posts
    Quote Originally Posted by Darek View Post
    First of all, thanks for the new version!

    In my opinion, we should distinguish between two different purposes: "practical" use and "experimental" (testing and record-breaking) use.

    And EMMA could fulfill both of them. Maybe, for less advanced users, it could offer presets for easier choosing between the compression modes: Minimum, Medium, Practical, Maximum and/or Insane.

    I'm a maximum-compression fan, so my goal is to find the best compression ratio, which means "Time Is Not An Issue". For me, the more memory the better, and the more insane the model the better. Of course, other users may prefer a different setup.

    So, if possible, could you allow even more memory for the text model and the main model, as in your enwik8 test? That would be super! For me it's worth compressing much slower for a slightly better result. That's the nature of chasing maximum compression: time and complexity rise exponentially with the compression level. So I'll wait for the mentioned Super-Insane compression level...

    Regarding a TIFF parser: the latest version of PAQ8pxd v16 skbuild3 has a very good TIFF parser - to be honest, it achieved the best score on the TIFF files in my testbed. If you could use it, it would help a lot with this kind of file.

    And last but not least: enwik9 for my setup (settings on screenshot): 153.971.476 bytes, time 28943,2s, on an i7 4900MQ

    Best regards,
    Darek
    Well, I had that in mind - an experimental, practical, C.M. compressor - from the start, but then thought that there wouldn't
    be much interest in "extreme-compression", since for that there is CMIX.

    I think you'll be pleased to see that the new version already has the (basic) preset functionality, with presets for text, executables,
    images, audio, and the option to fully customize them or create your own. I've also included the aptly (?) named "Ludicrous mode"
    for when extreme compression is required.

    As for the TIFF parser, the specification states that the first IFD can come after the pixel data, so a streaming compressor like EMMA
    would only be able to parse it when the pixel data was already encoded. I guess I can still create a parser, but it will probably miss
    most TIFF files. I'll have to research that further.
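    The issue is visible right in the TIFF header: its fourth field is a free-form offset to the first IFD, which the writer may legally place after the pixel data. A quick sketch of the header parse, for illustration:

    ```python
    import struct

    def first_ifd_offset(header: bytes) -> int:
        """Parse the fixed 8-byte TIFF header: byte order ('II' little-endian
        or 'MM' big-endian), the magic number 42, and the offset of the first
        IFD. That offset may point anywhere in the file, including past the
        pixel data, so a streaming parser can't rely on seeing the IFD first."""
        endian = {b"II": "<", b"MM": ">"}.get(header[:2])
        if endian is None:
            raise ValueError("not a TIFF file")
        magic, offset = struct.unpack(endian + "HI", header[2:8])
        if magic != 42:
            raise ValueError("bad TIFF magic number")
        return offset

    # a header whose first IFD sits 1 MB into the file, after the pixel data
    hdr = b"II" + struct.pack("<HI", 42, 1_048_576)
    print(first_ifd_offset(hdr))  # 1048576
    ```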

    Best regards

  35. #25
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    919
    Thanks
    541
    Thanked 363 Times in 271 Posts
    Quote Originally Posted by mpais View Post
    Well, I had that in mind - an experimental, practical, C.M. compressor - from the start, but then thought that there wouldn't
    be much interest in "extreme-compression", since for that there is CMIX.

    I think you'll be pleased to see that the new version already has the (basic) preset functionality, with presets for text, executables,
    images, audio, and the option to fully customize them or create your own. I've also included the aptly (?) named "Ludicrous mode"
    for when extreme compression is required.

    As for the TIFF parser, the specification states that the first IFD can come after the pixel data, so a streaming compressor like EMMA
    would only be able to parse it when the pixel data was already encoded. I guess I can still create a parser, but it will probably miss
    most TIFF files. I'll have to research that further.

    Best regards
    Yes! I've tested version 0.1.1 and it has great improvements:
    - saving options + presets works fine,
    - text model improvement - good,
    - Ludicrous mode - great! It adds about 2% to the compression ratio for files without special models - that's quite a lot. Question - I know it's a bore, but could it be tweaked to be even more ludicrous?

    About TIFF - ok, I understand. I thought the parser could simply be copied from PAQ...

    About CMIX - it's generally a great compressor, but it has two disadvantages right now:
    - no special compression models (audio, graphics, jpeg, exe...)
    - it's quite hard to make a proper x86/64 build - the last stable x86/64 version was cmix v6 by Skymmer; the next version (v7) works fine on x86/64 only for general compression, i.e. without the English dictionary, and there is still no x86/64 build of cmix v8...

    So on my testbed, CMIX v6's total score is worse than EMMA v0.1.1 by 3%.

    Ok, I'll wait for the next releases.
    Best regards,
    Darek
    Last edited by Darek; 28th February 2016 at 23:38.

  36. #26
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    520
    Thanks
    196
    Thanked 744 Times in 301 Posts
    Quote Originally Posted by Darek View Post
    Yes! I've tested version 0.1.1 and it has great improvements:
    - saving options + presets works fine,
    - text model improvement - good,
    - Ludicrous mode - great! It adds about 2% to the compression ratio for files without special models - that's quite a lot. Question - I know it's a bore, but could it be tweaked to be even more ludicrous?

    About TIFF - ok, I understand. I thought the parser could simply be copied from PAQ...

    About CMIX - it's generally a great compressor, but it has two disadvantages right now:
    - no special compression models (audio, graphics, jpeg, exe...)
    - it's quite hard to make a proper x86/64 build - the last stable x86/64 version was cmix v6 by Skymmer; the next version (v7) works fine on x86/64 only for general compression, i.e. without the English dictionary, and there is still no x86/64 build of cmix v8...

    So on my testbed, CMIX v6's total score is worse than EMMA v0.1.1 by 3%.

    Ok, I'll wait for the next releases.
    Best regards,
    Darek
    I'm glad everything worked, I didn't really have much time to test everything.
    Ludicrous mode is just a switch that tells the models to use a (much) slower
    structure for the context stats, the best one I could come up with. The models
    remain the same; any improvement comes from that.

    During development I had a simple "rule": any new context added to a model
    should result in at least a further 0.1% reduction in relative compressed size,
    so I would consider a model "finished" when I couldn't meet that goal. Obviously
    this was only possible when testing on files that benefited from that model
    (no point in tuning a text model on binary data). So in order to further increase
    the compression ratio, I would most likely have to "beef up" some models and
    give them more memory. But if I keep increasing the memory usage, soon enough
    I'll have to switch to a 64-bit version of EMMA, and that would defeat the purpose
    of researching methods for practical CM compression, since I feel 32-bit software
    will still be relevant for decades to come. And if nothing else, ludicrous mode
    shows that we can still have decent gains by simply being wiser in using the
    memory we have.
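    The 0.1% rule above is easy to state in code. A sketch, using file sizes posted earlier in the thread purely as example numbers (those particular results came from model/structure changes, not single contexts):

    ```python
    def keep_context(old_size: int, new_size: int, threshold: float = 0.001) -> bool:
        """The rule of thumb described above: a new context is kept only if it
        cuts the compressed size by at least 0.1% (relative)."""
        return (old_size - new_size) / old_size >= threshold

    # enwik8 with extra models: 18.412.076 -> 18.369.075 bytes (~0.23%): keep
    print(keep_context(18_412_076, 18_369_075))  # True
    # sao in Ludicrous mode: 3.765.707 -> 3.762.076 bytes (~0.096%): reject
    print(keep_context(3_765_707, 3_762_076))    # False
    ```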

    Right now I had planned on improving the 4bpp and 1bpp image models, since
    the ones I have were just a quick "hack" to see if they could give better
    compression in executable files. I also considered a PNG model, since many
    icons are now 256 by 256px PNG files, but it would suffer from the same limitations
    as the GIF model, and I think that PNG files don't really justify the effort, since,
    being a lossless format, the best compression would be to simply convert them
    to a superior file format.
    Last edited by mpais; 29th February 2016 at 00:16.

  37. The Following User Says Thank You to mpais For This Useful Post:

    Darek (29th February 2016)

  38. #27
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    467
    Thanks
    202
    Thanked 81 Times in 61 Posts
    Quote Originally Posted by mpais View Post
    I'm glad everything worked, I didn't really have much time to test everything.
    Ludicrous mode is just a switch that tells the models to use a (much) slower
    structure for the context stats, the best one I could come up with. The models
    remain the same; any improvement comes from that.

    During development I had a simple "rule": any new context added to a model
    should result in at least a further 0.1% reduction in relative compressed size,
    so I would consider a model "finished" when I couldn't meet that goal. Obviously
    this was only possible when testing on files that benefited from that model
    (no point in tuning a text model on binary data). So in order to further increase
    the compression ratio, I would most likely have to "beef up" some models and
    give them more memory. But if I keep increasing the memory usage, soon enough
    I'll have to switch to a 64-bit version of EMMA, and that would defeat the purpose
    of researching methods for practical CM compression, since I feel 32-bit software
    will still be relevant for decades to come. And if nothing else, ludicrous mode
    shows that we can still have decent gains by simply being wiser in using the
    memory we have.

    Right now I had planned on improving the 4bpp and 1bpp image models, since
    the ones I have were just a quick "hack" to see if they could give better
    compression in executable files. I also considered a PNG model, since many
    icons are now 256 by 256px PNG files, but it would suffer from the same limitations
    as the GIF model, and I think that PNG files don't really justify the effort, since,
    being a lossless format, the best compression would be to simply convert them
    to a superior file format.
    I think this is the right path to follow if you are really into state-of-the-art development, because using more memory is a cheap trick; the real deal is finding a way to do more with fewer resources. If you can compete in the big leagues with half the memory requirements, you can still add more memory at any time, and then you'll have the most powerful compression engine ever.

  39. The Following User Says Thank You to Gonzalo For This Useful Post:

    Darek (29th February 2016)

  40. #28
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    919
    Thanks
    541
    Thanked 363 Times in 271 Posts
    My enwik8 and enwik9 summary:

    EMMA V0.1 Enwik8 score is 18.299.608 bytes, time 2643,3s, hardware: i7 4900MQ, 2.8GHz Oc. to 3.7GHz, 16GB, Win7Pro 64. Options on screenshot (Options.jpg). Decompression verified. Time 3128,6s, SHA1 checksum OK.
    EMMA V0.1 Enwik9 score is 153.971.476 bytes, time 28943,2s, hardware: i7 4900MQ, 2.8GHz Oc. to 3.7GHz, 16GB, Win7Pro 64. Options on screenshot (Options.jpg). Decompression verified. Time 29225,6s, SHA1 checksum OK.

    EMMA V0.1.1 Enwik8 score is 18.204.956 bytes, time 8946,3s, hardware: i7 4900MQ, 2.8GHz Oc. to 3.7GHz, 16GB, Win7Pro 64. Options on screenshot (Options3.jpg). Decompression verified. Time 8711,7s, SHA1 checksum OK.
    Attached: Options3.jpg, Options.jpg
    Last edited by Darek; 29th February 2016 at 16:56.

  41. The Following User Says Thank You to Darek For This Useful Post:

    mpais (29th February 2016)

  42. #29
    Member
    Join Date
    Nov 2015
    Location
    Śląsk, PL
    Posts
    81
    Thanks
    9
    Thanked 13 Times in 11 Posts
    Quote Originally Posted by Darek
    1) Try to "clean-up" the code and open-source it, but I don't think it would be of much interest.
    I've written it mostly for research purposes, it's nowhere near "production" quality, and I would most
    likely have to rewrite large chunks of code, and that time could be spent on the other options.
    Research is what may generate interest. If code works well, people want to study it.

  43. #30
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    873
    Thanks
    460
    Thanked 175 Times in 85 Posts
    Thank you for your hard work, Márcio.
    I adore the GIF model, and I would vote for a PNG model as well, because in everyday life you come across PNG everywhere,
    no matter whether better formats are available.
    It's good to see that research and experimentation on CM is still alive,
    although I think that fast CM variants such as MCM would be more practical.

    EMMA 0.1.1 does not detect any audio in the following files using the fast preset (with the audio model and delta enabled):

    https://drive.google.com/file/d/0ByL...ew?usp=sharing
    https://drive.google.com/file/d/0ByL...ew?usp=sharing
    https://drive.google.com/file/d/0ByL...ew?usp=sharing
    https://drive.google.com/file/d/0ByL...ew?usp=sharing

  44. The Following User Says Thank You to Stephan Busch For This Useful Post:

    mpais (29th February 2016)


