View Poll Results: The default window size of CRUSH should be:

Voters: 16. You may not vote on this poll
  • 512 KB: 1 (6.25%)
  • 1 MB: 4 (25.00%)
  • 2 MB: 11 (68.75%)

Results 31 to 50 of 50

Thread: CRUSH 0.01 is here!

  1. #31 encode (The Founder), Moscow, Russia
    Uploaded a new version (the first actual open-source release)

    It's compatible with Linux, so this time Linux users can compile CRUSH on their own (g++ -O3 crush.cpp, or something like that), since my Linux builds never work across different Linux distributions anyway...

    LTCB:
    Code:
    c:\Test>crush cx enwik8 enwik8.z
    Compressing enwik8...
    100000000 -> 31731711 in 115.175s
    
    c:\Test>timer crush d enwik8.z e8
    
    Timer 3.01  Copyright (c) 2002-2003 Igor Pavlov  2003-07-10
    Decompressing enwik8.z...
    31731711 -> 100000000 in 0.499s
    
    Kernel Time  =     0.046 = 00:00:00.046 =   9%
    User Time    =     0.296 = 00:00:00.296 =  59%
    Process Time =     0.343 = 00:00:00.343 =  68%
    Global Time  =     0.499 = 00:00:00.499 = 100%
    
    c:\Test>crush cx enwik9 enwik9.z
    Compressing enwik9...
    1000000000 -> 279491430 in 991.272s
    
    c:\Test>timer crush d enwik9.z e9
    
    Timer 3.01  Copyright (c) 2002-2003 Igor Pavlov  2003-07-10
    Decompressing enwik9.z...
    279491430 -> 1000000000 in 4.898s
    
    Kernel Time  =     0.592 = 00:00:00.592 =  12%
    User Time    =     2.574 = 00:00:02.574 =  52%
    Process Time =     3.166 = 00:00:03.166 =  64%
    Global Time  =     4.899 = 00:00:04.899 = 100%
    System: Intel Core i7-3770K @ 4.6 GHz, Corsair Vengeance LP 16 GB @ 1800 MHz CL9, Corsair Force GS 240 GB SSD


  2. #32 Matt Mahoney (Expert), Melbourne, Florida, USA

  3. The Following User Says Thank You to Matt Mahoney For This Useful Post: encode (2nd July 2013)

  4. #33 encode (The Founder), Moscow, Russia
    Thanks for the update! But it is on the Pareto frontier, isn't it?

  5. #34
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Oops, you're right - for decompression speed.

    Also, I updated the Silesia benchmark. Compression is better for cx and c, worse for cf. http://mattmahoney.net/dc/silesia.html

  6. The Following User Says Thank You to Matt Mahoney For This Useful Post: encode (2nd July 2013)

  7. #35 encode (The Founder), Moscow, Russia
    Retested CRUSH with a higher overclock, to match the system specs noted at LTCB (60):
    Intel Core i7-3770K @ 4.8 GHz, 16 GB Corsair Vengeance LP 1800 MHz CL9, Corsair Force GS 240 GB SSD, Windows 7 SP1.
    Code:
    c:\Test>crush cx enwik9 enwik9.z
    Compressing enwik9...
    1000000000 -> 279491430 in 948.466s
    
    c:\Test>timer crush d enwik9.z e9
    
    Timer 3.01  Copyright (c) 2002-2003 Igor Pavlov  2003-07-10
    Decompressing enwik9.z...
    279491430 -> 1000000000 in 4.711s
    
    Kernel Time  =     0.452 = 00:00:00.452 =   9%
    User Time    =     2.496 = 00:00:02.496 =  52%
    Process Time =     2.948 = 00:00:02.948 =  62%
    Global Time  =     4.727 = 00:00:04.727 = 100%

  8. The Following User Says Thank You to encode For This Useful Post: Matt Mahoney (2nd July 2013)

  9. #36 avitar (Member), uk
    Did I miss something? I downloaded the latest version, but I don't see the compression levels 1..9 mentioned in the forum, only f & x. John

  10. #37 encode (The Founder), Moscow, Russia
    I've tested CRUSH with the 1..9 options for a few days, and found them slightly less useful (or intuitive) than the simple f..x. Also, CRUSH probably needs no more than four compression levels. Anyway, it's in the public domain and you can do whatever you want - change the window size, for example: if you're compressing small files like Nintendo DS ROMs you may set the window size to 128 KB; for huge files, set it to 8 MB or even 16+ MB. Same thing with the match finder - if it's too slow, reduce the max_chain[] constants and/or disable lazy matches!

  11. #38 m^2 (Member), Ślůnsk, PL
    Quote Originally Posted by encode View Post
    Anyway, it's in the public domain and you can do whatever you want - change the window size, for example: if you're compressing small files like Nintendo DS ROMs you may set the window size to 128 KB; for huge files, set it to 8 MB or even 16+ MB. Same thing with the match finder - if it's too slow, reduce the max_chain[] constants and/or disable lazy matches!
    It would be useful if you added this advice to the comments at the beginning of the file, so users can find it easily.

    I see that you haven't made it thread-safe. I want it that way in fsbench, so I will change it. I don't like to deviate from the base, though... wouldn't you like to have it this way too? It's really only a matter of adding a constructor and removing the statics.

  12. #39 encode (The Founder), Moscow, Russia
    Actually, it's too simple to bother with. The decompressor needs no extra memory (except the buffer, buf). The compressor additionally uses head and prev for the hash chains. That's it!
    So you can pass these arrays as function parameters. In addition, compress() can be transformed into compress_block() in a moment: just comment out the while() loop and pass buf and size as function parameters.
    For Nintendo DS, this code works.
    Public Domain is the Ultimate Freedom!

  13. #40 avitar (Member), uk
    Re the last post from m^2: I assume you'll modify it and make it public. J

  14. #41 m^2 (Member), Ślůnsk, PL
    Quote Originally Posted by avitar View Post
    Re the last post from m^2: I assume you'll modify it and make it public. J
    I intend to do so.

  15. #42 encode (The Founder), Moscow, Russia
    Well, I will collect user/developer feedback over the next few weeks, and based on that feedback I will do something - ranging from small changes, to make the code even more self-explanatory, to not-so-small ones.
    The most obvious thing is a mem->FILE* interface a la the high-level bzip2 library. Honestly, a huge number of things are possible, from a mem->mem interface to more optimized bit I/O (for example, a more flexible fill and shift of the bit buffer, rather than one at each get_bit() call). Anyway, most optimizations can kill readability - i.e. such things must be added with care, or else it will end up like the LZO source...
    So, please share your opinions/ideas! (And take into account that the only reason I've made CRUSH open source is user feedback - a large number of emails. So don't be silent!)

  16. #43 avitar (Member), uk
    Quote Originally Posted by encode View Post
    Actually, it's too simple to bother with. The decompressor needs no extra memory (except the buffer, buf). The compressor additionally uses head and prev for the hash chains. That's it!
    So you can pass these arrays as function parameters. In addition, compress() can be transformed into compress_block() in a moment: just comment out the while() loop and pass buf and size as function parameters.

    It's great that you are keeping control, imo. The reason I raised the issue re m^2 above was that in your 2nd July entry you seemed to be saying 'it is all open source, so just modify it as needed'. It may appear simple to tweak C++ code, but it rarely is. The problem with this, of course, is that one ends up with n versions, possibly with bugs - better to have one point of contact, so we should all be pleased that you've got time to do it. J.

  17. #44 Matt Mahoney (Expert), Melbourne, Florida, USA
    I would leave it like it is. It is simple enough.

  18. The Following User Says Thank You to Matt Mahoney For This Useful Post: encode (11th July 2013)

  19. #45 nemequ (Member), United States
    Quote Originally Posted by encode View Post
    So, please share your opinion/ideas! (And take into account, the only reason I've made CRUSH open source is user feedback - a large number of emails. So, don't be silent!)
    I would also love to see a library. I'd like to integrate it into something I've been working on but I/O streams are a bit limiting. A zlib-style API would be perfect for me... In principle, would you be receptive to such a change?

  20. #46 avitar (Member), uk
    nemequ

    I'm interested in why you want to use CRUSH rather than the many other compressors in this forum. Do you think the testing shows it is better than them?

    J

  21. #47 nemequ (Member), United States
    I don't want to hijack this thread, so if you (or anyone else) want to discuss this further, maybe we should create a new thread? The only reason I didn't create one for my project already is that I didn't think there would be any interest here.
    Quote Originally Posted by avitar View Post
    I'm interested re why do you want to use crush vs the many other compressors in this forum?
    The "vs" part isn't quite accurate. Squash uses plugins for each compression library it supports (it already has plugins for bzip2, FastLZ, LZ4, liblzma, LZF, LZO, QuickLZ, Snappy, and zlib), so the runtime cost of supporting a lot of different libraries is quite low. Also, the development cost of adding a new plugin is pretty trivial; see the lzf plugin for an example. In a perfect world I'd love to add support for all the compressors posted here. That said, CRUSH does have a few things going for it which make it a bit more compelling than most of the stuff in this forum...
    • Licensing. Public domain is great, as are liberal open-source licenses (MIT, BSD, Apache, etc.). GPL, which seems to be popular around here (especially for the PAQ stuff, for obvious reasons), is quite limiting. Proprietary and license-free code is pretty much a deal-breaker for me; anyone can distribute a plugin for Squash as part of their project, but I don't have any interest in distributing plugins for proprietary software with Squash.
    • At least for now, CRUSH seems to have an active developer who is receptive to feedback. Hopefully that receptiveness extends to matters other than just compression ratio and speed, such as the API. Registering a SourceForge project indicates, at least to me, that the author would like to see the project evolve based on feedback from the software development community, not just the data compression community (as posting here does).
    • Versioning. It is hard to target a project for which the format or (for libraries) API changes in an incompatible way with every release. I may not have seen a firm commitment to a stable format for CRUSH, but version "1.00" does imply that. On the other hand, if it were just version "1" I wouldn't expect such guarantees.
    • The source code for CRUSH is pretty straightforward. I admit I haven't read the code for many of the things posted here, but I (or someone else) should be able to turn CRUSH into exactly what I need for a good Squash plugin quite quickly.

    Given the above, if anyone has suggestions for other compressors which they think would be a good fit I'd be happy to hear them. Again, I don't want to hijack this thread, so please either use the issue tracker, create a new thread, e-mail me, etc.
    Do you think the testing shows it is better than them?
    No. I think the testing shows it to be generally competitive, but not necessarily better.

  22. #48 encode (The Founder), Moscow, Russia
    Actually, I have only a small amount of spare time at the moment. However, in the near future I'll test a few things:
    + Variable-length position slot coding (instead of a fixed 4-bit code)
    + 2-byte matches, to keep the literal count as low as possible (literals are unencoded)
    + 3/4/5-byte hash chains (a 4-byte hash is not enough for a 1 MB+ window)
    + A different literal coding scheme. For example:
    0 + 0..127 - Text literal (8 bits)
    1 0 + 128..255 - Binary literal (9 bits)
    1 1 + ... - Match (2 + ... bits)
    Saving one bit per text literal at the cost of an extra bit per match. A questionable thing, however...
    + And of course some code cleanups - at the least, keep clean file->file and buffer->file interfaces for compression and file->buffer for decompression. I must admit I have always focused on compression, not on compatibility, readability and other non-compression-related stuff.
    As to whether it's better or not - well, that's a provocative question. CRUSH has some unique properties. Nice compression - better than Deflate in most cases. It's flexible and can be a good base for any new LZ-based coder. With some tweaks it can become anything, from an online filesystem compressor or NES ROM packer to an LZMA competitor (with an added arithmetic coder). And frankly speaking, I really don't care about any of that - it's a non-profit project, I've earned zero dollars with it, so why should I care? Because I'm just a huge fan of all this compression stuff.

  23. #49 Member, United States
    I've played with CRUSH and it does well on small to medium-sized files, but it gets pathologically slow when using "cx" on large files. I tried 'cx' on a large database dump (a ~60 GB file), but it still hadn't finished after ~48 hours of running.

    Looking at the source, I can see how to adjust the window size pretty easily, but not the "max_chain[] constant and/or disable Lazy Matches" (from an earlier post). How do I adjust those?

    (FWIW, I'm building CRUSH with gcc 4.8.2 and -O3.)

  24. #50 encode (The Founder), Moscow, Russia
    Code:
    const int max_chain[]={4, 256, 1<<12};
    max_chain[] holds the search limits for levels 0, 1, 2 - fast, normal, max.

    To change max_chain for, say, max mode:
    Code:
    const int max_chain[]={4, 256, 512};
    As to lazy matches - it's line #214:
    Code:
    if ((level>=2)&&(len>=MIN_MATCH)&&(len<max_match))
    ...
    Just comment this condition out, or change it to the desired level.
    For example, to activate lazy matches in normal mode:
    Code:
    if ((level>=1)&&(len>=MIN_MATCH)&&(len<max_match))
    ...
    Hope this helps!

