
Thread: Etincelle - new compression

  1. #1
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    892
    Thanks
    492
    Thanked 280 Times in 120 Posts

    Etincelle - new compression program

    Hi

    I wish to offer for your scrutiny an early look at the first public release of Etincelle, a new fast compression program with some interesting compression ratio / speed properties.

    [Update] : the latest version can be found on this webpage :
    http://phantasie.tonempire.net/pc-co...e-t102.htm#160
    The latest version is RC2 :
    - improved speed and compression ratio

    [Older versions]
    RC1 :
    - Default Dictionary 128MB
    - Better compression on binary files
    - Benchmark mode accepts large files

    beta 4 :
    - Long repetitions detection and support

    beta 3 :
    - major speed gains for files containing incompressible segments

    beta 2 :
    - small compression and speed gains
    - minor bugfix in the error message for insufficient memory

    Beta 1 :
    - selectable dictionary size (from 1MB to 3GB)

    Alpha 3 : http://sd-1.archive-host.com/membres...lle-alpha3.zip
    - drag'n'drop interface support
    - benchmark mode support

    Alpha2 : http://sd-1.archive-host.com/membres...lle-alpha2.zip
    - improved global speed
    - bugfix on decoding i/o

    Alpha1 version can be downloaded here : http://sd-1.archive-host.com/membres.../Etincelle.zip

    It gets close to 90MB/s on my system, while providing better compression than zip's best modes. An especially good use case seems to be "Mail Archives", like outlook.pst files, which are plentiful in identical attached files, thanks to Etincelle's capability to find matches at large distances (up to 1GB in this version).

    For your comments and evaluation. There are still features & controls I want to add, but the main properties (speed and compression ratio) should be quite close to the final results.
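    For the curious, the long-distance matching mentioned above can be illustrated with a toy sketch (this is not Etincelle's code; all names and constants are hypothetical): a single hash table maps each 4-byte sequence to its last 32-bit position, so a repeated sequence can be found again however far back it occurred, as long as its slot was not overwritten.

    ```c
    #include <stdint.h>
    #include <string.h>

    /* Toy sketch of long-range match finding (hypothetical, not Etincelle's
       sources): one hash slot per 4-byte sequence, storing its last 32-bit
       position. Any repeated 4-byte sequence whose slot has not been
       overwritten is found again, however far back it occurred. */
    #define HASH_LOG  19                       /* 2^19 slots * 4 bytes = 2 MB table */
    #define HASH_SIZE (1u << HASH_LOG)

    static uint32_t table[HASH_SIZE];          /* zero-initialized position table */

    static uint32_t hash4(const uint8_t *p) {
        uint32_t v;
        memcpy(&v, p, 4);                      /* read 4 bytes, alignment-safe */
        return (v * 2654435761u) >> (32 - HASH_LOG);
    }

    /* Look up the previous occurrence of the 4 bytes at `pos`, then record
       `pos` in the table. Returns 1 and sets *match_pos on a verified match. */
    static int find_match(const uint8_t *buf, uint32_t pos, uint32_t *match_pos) {
        uint32_t h = hash4(buf + pos);
        uint32_t candidate = table[h];
        table[h] = pos;
        if (candidate < pos && memcmp(buf + candidate, buf + pos, 4) == 0) {
            *match_pos = candidate;            /* distance pos - candidate may be huge */
            return 1;
        }
        return 0;
    }
    ```

    With 32-bit positions the reachable distance is bounded only by the window size; a larger HASH_LOG trades table memory for fewer slot overwrites.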

    Edit : Updated graphical comparison of fast compressors :
    http://phantasie.tonempire.net/pc-co...rk-t96.htm#149

    Regards
    Last edited by Cyan; 23rd April 2010 at 15:06.

  2. #2
    Member
    Join Date
    Oct 2007
    Location
    Germany, Hamburg
    Posts
    409
    Thanks
    0
    Thanked 5 Times in 5 Posts
    Your new compression algorithm seems to be really good in its first version.
    I used a tar file I had tested Precomp with, which contains mostly Eclipse and maybe part of a game folder; 512 MB in total.

    Etincelle
    Code:
    Compression completed : 512.0MB --> 332.0MB  (64.84%) (348089108 Bytes)
    
    Compression Time : 13.62s ==> 39.4MB/s
    Total Time : 33.39s   ( HDD Read : 17.11s / HDD Write : 2.65s / CPU : 13.62s )
    time: elapsed: 33390ms, kernel: 1250ms, user: 13593ms
    SlugX
    Code:
    524288.00 KB -> 330011.43 KB (62.94%, 337931705 bytes)
    time: elapsed: 32343ms, kernel: 1453ms, user: 18656ms
    It is at eye level with SlugX. Don't take the timings too seriously; I ran them with many programs running in parallel.

  3. #3
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    892
    Thanks
    492
    Thanked 280 Times in 120 Posts
    Thanks for testing, Simon.
    Indeed, SlugX is a tough target to reach, and I'm not trying to beat it on ratio; I'm interested in keeping a speed advantage at this stage.

    Speaking of speed, Etincelle uses a 2MB table for storing pointers. For modern processors, which are plentiful in cache, this works well.

    But I suspect a speed hit on systems with less cache (2MB or even less), and even more so on older processors.
    How big the speed impact might be, I don't know. Maybe it is not that big; maybe it makes a real difference.
    A work-around could be to introduce new modes using less memory (obviously in exchange for a hit on compression ratio).

    Alas, this is something I cannot test alone with my only Core 2. I need your advice and measurements to test this hypothesis.

    Simon, would you be so kind as to tell me the size of the L2 cache on your test system?

    Best Regards
    Last edited by Cyan; 28th March 2010 at 16:02.

  4. #4
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,593
    Thanks
    801
    Thanked 698 Times in 378 Posts
    Cyan, it adds 50-100 cycles to almost every table access. It's easy to test: just increase the dictionary and table 8x or so.
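    Bulat's point can be checked with a small microbenchmark (a sketch, unrelated to Etincelle's sources): time pseudo-random reads into a cache-resident table versus a much larger one, and compare the per-access cost.

    ```c
    #include <stddef.h>
    #include <stdint.h>
    #include <time.h>

    static volatile uint32_t sink;   /* keeps the loop from being optimized away */

    /* Time `accesses` pseudo-random reads into a table of `entries` 32-bit
       slots and return elapsed seconds. If the table fits in L2, each read
       is a cache hit; a much larger table forces misses to main memory,
       which is where the extra 50-100 cycles per access come from. */
    static double probe(const uint32_t *table, size_t entries, size_t accesses) {
        uint32_t x = 123456789u, sum = 0;
        clock_t t0 = clock();
        for (size_t i = 0; i < accesses; i++) {
            x = x * 2654435761u + 12345u;      /* cheap LCG index generator */
            sum += table[x % entries];         /* entries is a power of two */
        }
        sink = sum;
        return (double)(clock() - t0) / CLOCKS_PER_SEC;
    }
    ```

    Comparing probe() over a 2 MB table and, say, a 16 MB one (in the spirit of Bulat's "increase 8x" suggestion) should make the per-access penalty visible on cache-limited CPUs.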

  5. #5
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    1,040
    Thanks
    104
    Thanked 420 Times in 293 Posts

  6. #6
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,583
    Thanks
    234
    Thanked 160 Times in 90 Posts

    Hi!

    I wanted to report that my test program has signalled errors on these files from the new MOC 2010:

    http://www.random.org/files/2009/2009-12-27.bin
    http://www.random.org/files/2009/2009-12-26.bin

    Could you please confirm?

  7. #7
    Member
    Join Date
    Jul 2014
    Location
    chongqing
    Posts
    2
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I want Etincelle's source code; where can I download it?

  8. #8
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    892
    Thanks
    492
    Thanked 280 Times in 120 Posts
    There is none. The project of open-sourcing Etincelle never reached completion.

    Considering my current free time,
    with all of it currently gobbled up by the LZ4 framing layer,
    and the next stages planned to concentrate on the next version of Zhuff,
    there is very little chance this item will get through anytime soon.

    Developing on free time only makes for quite a limited workforce.

  9. #9
    Member
    Join Date
    Jul 2014
    Location
    chongqing
    Posts
    2
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I think LZ4 is much slower than Etincelle,
    and Etincelle RC2 can already be used in production,
    so I very much want the current Etincelle RC2 code.
    I could convert it to streaming mode for use at work.
    I'm Chinese and my English is bad, sorry.
    Maybe you could provide a DLL for Etincelle RC2? Thank you very much!

  10. #10
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,505
    Thanks
    26
    Thanked 136 Times in 104 Posts
    But opening the source doesn't require a lot of work either. Unless you have something to hide (patented algos inside?).

  11. #11
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 72 Times in 56 Posts
    Quote Originally Posted by Piotr Tarsa View Post
    But opening the source doesn't require a lot of work either. Unless you have something to hide (patented algos inside?).
    Yeah. The source could have been opened in the time it took to post to this board. <g> Also, open-sourcing instantly grows the potential workforce from 1 -> billions.

  12. #12
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 798 Times in 489 Posts
    Opening source is easy. The expensive part is all the time spent answering questions from people trying to understand your code. Probably nothing is documented, and it has to be for open source to be useful. Who wants to spend time doing that for abandoned code?

  13. Thanks:

    Cyan (3rd September 2014)

  14. #13
    Member
    Join Date
    Jan 2017
    Location
    Selo Bliny-S'edeny
    Posts
    24
    Thanks
    7
    Thanked 10 Times in 8 Posts
    So, was the "incompressible segment detection" algorithm ever described? Is it a precise algorithm that says something like "cut off right here", as opposed to simply "tread at a faster pace"?

  15. #14
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    892
    Thanks
    492
    Thanked 280 Times in 120 Posts
    It was a simpler "tread at a faster pace" strategy.
    Worked great for speed, and also produced a nice little win for compression ratio.
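    A minimal sketch of such a "tread at a faster pace" strategy (hypothetical, not Etincelle's actual heuristic, though similar ideas appear in other fast LZ compressors): after every run of consecutive match-finder misses, the step between probed positions doubles, so crossing an incompressible region costs only a few hundred probes instead of one per byte.

    ```c
    #include <stddef.h>

    /* Hypothetical sketch of an accelerating scanner (not Etincelle's
       actual heuristic): every 32 consecutive match-finder misses, the
       step between probed positions doubles, capped at 4096. Returns how
       many positions were probed while crossing `len` bytes of data on
       which every match attempt fails (i.e. incompressible input). */
    static size_t count_probes(size_t len) {
        size_t pos = 0, probes = 0, step = 1;
        unsigned misses = 0;
        while (pos < len) {
            probes++;                 /* a (failed) match attempt at pos */
            if (++misses == 32) {     /* a run of misses: accelerate */
                misses = 0;
                if (step < 4096) step <<= 1;
            }
            pos += step;
        }
        return probes;
    }
    ```

    In a real compressor, a successful match would reset the step to 1, so compressible data is still scanned densely; only incompressible stretches are crossed at the accelerated pace.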
    Last edited by Cyan; 10th December 2018 at 08:34.

  16. #15
    Member
    Join Date
    Jan 2017
    Location
    Selo Bliny-S'edeny
    Posts
    24
    Thanks
    7
    Thanked 10 Times in 8 Posts
    After an already-compressed/incompressible block is skipped by the match finder, the resumed matches come at a greater price: the distances to the preceding block are much longer. Thus an advanced algorithm targeting high compression ratios should try to "excise", rather than just "skip", the incompressible blocks. Was something like this ever attempted?
