
Thread: another (too) fast compressor

  1. #31
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,423
    Thanks
    223
    Thanked 1,052 Times in 565 Posts
    gcc on windows (mingw,cygwin) doesn't automatically do the \n->\r\n conversion, like MSC/IntelC do,
    thus that kind of condition.

  2. #32
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by Shelwien View Post
    gcc on windows (mingw,cygwin) doesn't automatically do the \n->\r\n conversion, like MSC/IntelC do,
    thus that kind of condition.
    Yeah, but how about the 100 other compilers for Windows?

  3. #33
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,511
    Thanks
    746
    Thanked 668 Times in 361 Posts
    Quote Originally Posted by Shelwien View Post
    gcc on windows (mingw,cygwin) doesn't automatically do the \n->\r\n conversion, like MSC/IntelC do,
    thus that kind of condition.
    try it:

    Code:
    #include <stdio.h>

    int main()
    {
      /* If stdout is in text mode, the \n below is written as \r\n. */
      printf("hi\neugene");
      return 0;
    }

    Code:
    C:\!\Haskell>g++ -O2 newline.cpp
    
    C:\!\Haskell>a.exe |od -c -x
    0000000   h   i  \r  \n   e   u   g   e   n   e
               6968    0a0d    7565    6567    656e

  4. #34
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    The cygwin and mingw versions of g++ work differently. cygwin is a UNIX-like environment, so stdin and stdout are already in binary mode. mingw is designed for Windows, so they are text mode by default. This usually works:

    Code:
    #ifndef unix  // assume Windows
    #include <fcntl.h>  // O_BINARY
    #include <io.h>     // setmode
    ...
    setmode(0, O_BINARY);  // stdin
    setmode(1, O_BINARY);  // stdout
    #endif

  5. #35
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    865
    Thanks
    463
    Thanked 260 Times in 107 Posts
    Thanks very much for your suggestions and comments. It works now.
    I finally selected the following test:
    Code:
    #ifdef _WIN32
    #include <io.h>            // _setmode
    #include <fcntl.h>         // _O_BINARY
    #endif
    (...)
    #ifdef _WIN32 /* We need to set stdin/stdout to binary mode. Damn windows. */
                    _setmode( _fileno( stdin ), _O_BINARY );
                    _setmode( _fileno( stdout ), _O_BINARY );
    #endif
    which seems to work well so far with these compilers (*nix gcc, mingw, visual)

    LZ4HC is now hosted at its own website, on Google Code.
    http://code.google.com/p/lz4hc/

    While at it, I also decided to improve the compression ratio (-c1 mode) by adding lazy evaluation, which typically improves the ratio by 2-5%.
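
    For readers unfamiliar with the term, lazy evaluation can be sketched roughly as follows. This is only an illustration and not the LZ4HC code: longest_match is a hypothetical brute-force match finder, and MIN_MATCH of 4 is an assumption made for the example. Before accepting a match at the current position, the parser also checks the next position; if a longer match starts there, it emits one literal and takes the later match instead.

```c
#include <stddef.h>

#define MIN_MATCH 4  /* assumed minimum match length for this sketch */

/* Hypothetical brute-force match finder: length of the longest match for
   buf[pos..] that starts anywhere earlier in the buffer (0 if none). */
static size_t longest_match(const unsigned char *buf, size_t len, size_t pos)
{
    size_t best = 0;
    for (size_t cand = 0; cand < pos; cand++) {
        size_t l = 0;
        while (pos + l < len && buf[cand + l] == buf[pos + l]) l++;
        if (l > best) best = l;
    }
    return best;
}

/* Counts the literals and matches produced by a greedy parse (lazy == 0)
   or by a one-step lazy parse (lazy != 0). */
static void parse(const unsigned char *buf, size_t len, int lazy,
                  size_t *literals, size_t *matches)
{
    size_t pos = 0;
    *literals = *matches = 0;
    while (pos < len) {
        size_t here = longest_match(buf, len, pos);
        if (here < MIN_MATCH) { (*literals)++; pos++; continue; }
        if (lazy && pos + 1 < len &&
            longest_match(buf, len, pos + 1) > here) {
            /* A longer match starts one byte later: emit a literal now
               and let the next iteration take that match. */
            (*literals)++; pos++; continue;
        }
        (*matches)++;
        pos += here;
    }
}
```

    On the input "abcd_bcdefg_abcdefg", the greedy parse takes the 4-byte match "abcd" and then has to emit "efg" as literals, while the lazy parse emits "a" as a literal and takes the 6-byte match "bcdefg".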

    I guess the next logical step would be to experiment with Optimal Parsing. Since all fields are coded with fixed length, it should be a relatively favorable playing ground.
    Nonetheless, it looks like a fairly complex task to fully achieve.
    Last edited by Cyan; 5th September 2011 at 12:12.

  6. #36
    Member
    Join Date
    May 2007
    Location
    Poland
    Posts
    91
    Thanks
    8
    Thanked 4 Times in 4 Posts
    I like that LZ4(HC) has drag&drop; it is so easy and fast to compress like that. But it only works for individual files, not for multiple files or folders. Can you add this functionality?

  7. #37
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    865
    Thanks
    463
    Thanked 260 Times in 107 Posts
    That's a good point.

    Unfortunately this functionality is more complex than it sounds.

    The main issue is that it requires creating an archive container format, as opposed to the simple file format generated by LZ4. A container is necessary to reproduce a folder structure.

    Although something very crude could be created fast, one has to wonder if it wouldn't be better to re-use one of the existing archive container formats, since they tend to be well tested by their user base, come with nice features (listing, testing, error correction, etc.) and may even come with an associated GUI.
    This last idea is very tempting, but unfortunately I have yet to find GPL/BSD code that I can easily integrate LZ4 into...
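
    To make "something very crude" concrete, here is a minimal sketch of what such a container could look like. The record layout and the function names (write_entry, read_entry_header) are made up for this example and are not an existing LZ4 format; in practice the data bytes would be the compressed output for each file.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record layout (an assumption for this sketch):
   [2-byte path length][path bytes][8-byte data size][data bytes],
   with integers stored little-endian. */

static int put_le(FILE *f, uint64_t v, int nbytes)
{
    for (int i = 0; i < nbytes; i++)
        if (fputc((int)((v >> (8 * i)) & 0xFF), f) == EOF) return -1;
    return 0;
}

static int get_le(FILE *f, uint64_t *v, int nbytes)
{
    *v = 0;
    for (int i = 0; i < nbytes; i++) {
        int c = fgetc(f);
        if (c == EOF) return -1;
        *v |= (uint64_t)c << (8 * i);
    }
    return 0;
}

/* Appends one entry. Storing the relative path is what lets an
   extractor recreate the folder structure. Returns 0 on success. */
static int write_entry(FILE *f, const char *path,
                       const void *data, uint64_t size)
{
    size_t plen = strlen(path);
    if (plen > 0xFFFF) return -1;
    if (put_le(f, plen, 2) || fwrite(path, 1, plen, f) != plen) return -1;
    if (put_le(f, size, 8)) return -1;
    return fwrite(data, 1, (size_t)size, f) == size ? 0 : -1;
}

/* Reads the next entry header into path (capacity cap) and *size.
   The caller then reads or skips *size bytes of data. */
static int read_entry_header(FILE *f, char *path, size_t cap, uint64_t *size)
{
    uint64_t plen;
    if (get_le(f, &plen, 2) || plen + 1 > cap) return -1;
    if (fread(path, 1, (size_t)plen, f) != plen) return -1;
    path[plen] = '\0';
    return get_le(f, size, 8);
}
```

    Everything the existing archive formats add on top of this (listing, testing, error correction, timestamps, attributes) is missing here, which is the argument for re-using one of them.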


    Regards

  8. #38
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,475
    Thanks
    26
    Thanked 121 Times in 95 Posts
    You can make a .BAT or .SH script (depending on OS) which executes TAR or something similar before compressing with LZ4(HC).

  9. #39
    Member
    Join Date
    May 2007
    Location
    Poland
    Posts
    91
    Thanks
    8
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by Cyan View Post
    That's a good point.

    Unfortunately this functionality is more complex than it sounds.

    The main issue is that it requires creating an archive container format, as opposed to the simple file format generated by LZ4. A container is necessary to reproduce a folder structure.

    Although something very crude could be created fast, one has to wonder if it wouldn't be better to re-use one of the existing archive container formats, since they tend to be well tested by their user base, come with nice features (listing, testing, error correction, etc.) and may even come with an associated GUI.
    This last idea is very tempting, but unfortunately I have yet to find GPL/BSD code that I can easily integrate LZ4 into...


    Regards
    This archive container format would be useful for all your programs. Fast compressors are often so fast that typing a command line or making a .BAT usually takes longer than the compression itself, which rather defeats their practicality.
    Drag&Drop TAR could be a solution too, just requiring 2 steps.
    Here is an example of a D&D app (with source code) for making tar.gz; maybe it could help: http://www.cabiatl.com/mricro/ezgz/index.html

    regards
    Last edited by jethro; 5th September 2011 at 20:31.

  10. #40
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,423
    Thanks
    223
    Thanked 1,052 Times in 565 Posts

  11. #41
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    876
    Thanks
    473
    Thanked 175 Times in 85 Posts
    Hi there.. Can anyone please post a compiled version of the latest LZ4HC for me?

  12. #42
    Member
    Join Date
    Jul 2006
    Location
    US
    Posts
    39
    Thanks
    26
    Thanked 1 Time in 1 Post
    Quote Originally Posted by Stephan Busch View Post
    Hi there.. Can anyone please post a compiled version of the latest LZ4HC for me?
    For which OS?

    The author produces compiled versions of all LZ4* builds for Windows:
    http://fastcompression.blogspot.com/p/lz4.html

    Also, previous releases can be found here:
    http://phantasie.tonempire.net/t95-l...-x206-260-mb-s

  13. #43
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    876
    Thanks
    473
    Thanked 175 Times in 85 Posts
    This version is from 13th December of last year. It hasn't got the -c0 and -c1 switches. Is it really the latest build?

  14. #44
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    865
    Thanks
    463
    Thanked 260 Times in 107 Posts
    Hi Stephan

    Indeed, version 0.9 hasn't changed much yet. However, it is still faster than the open-source version provided at Google Code.
    The reason is that the original build was heavily optimised and tuned for performance on Windows with Visual. As a result, the code was deeply entangled, difficult to read and not modular. To become useful, the MMC code had to become a stand-alone library with a clean interface, which decreased performance.

    Therefore, as it stands, there is no performance benefit in compiling the GPL version of LZ4HC for Windows. Note however that the open-source version is multi-platform, compatible with multiple compilers, and can be compiled for Linux or OS X, for example.

    It is not the end of the story, since I have a complete rewrite of MMC ongoing, which I expect to improve performance (quite tricky to complete). When that's done, I'm likely to update the Windows binary build, and ship LZ4 and LZ4HC together.

    @jethro : yes, I understand and agree with your argument. I'll look into it.

    @Shelwien : that's nice code you have provided. As usual, it is small, efficient and clearly written (Scan v6).
    If I understand correctly, this is a directory scanner: it goes through a massive number of directories and files, quickly sorts the filenames, and checksums the result?
    Last edited by Cyan; 7th September 2011 at 16:47.

  15. #45
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,423
    Thanks
    223
    Thanked 1,052 Times in 565 Posts
    My code is windows-specific, but there's also Sami's class which is more portable - http://compressionratings.com/d_archiver_template.html
    Afair, Sami made that thing as a reply to complaints about his use of tar for testing of single-file compressors on compressionratings.
    I think the idea was to let compressor developers easily add archiver functions for fair testing.
    So I gave him that implementation to prove that STL is slow... and it ended up as a contest (btw, feel free to post another entry ;)

    However, if you find it (scan_v6) useful, I'd suggest looking at a newer version, such as
    http://nishi.dreamhosters.com/u/paf_03a.rar (dirscanr.inc)
    (older/simpler versions of paf are also available there).

  16. #46
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,511
    Thanks
    746
    Thanked 668 Times in 361 Posts
    Eugene, have you tried making the scan multithreaded? It may further improve speed.

  17. #47
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,423
    Thanks
    223
    Thanked 1,052 Times in 565 Posts
    1. paf does the scan in parallel with actual file compression,
    thus adding complexity to the scan likely doesn't make sense there.
    Also there're certain issues with timestamp preservation - paf
    is currently able to extract the archive with original timestamps (all 3)
    on both files and directories, which is afaik a unique feature.
    (Btw, such a feature is useful for backup applications.)

    2. As to Sami's scanner contest, I guess, it could help there
    (although only when tested on cached FS), but for now my implementation
    is the fastest there anyway :)
    Also I think the more interesting idea would be to read raw directories
    using something like http://msdn.microsoft.com/en-us/libr...(v=vs.85).aspx

    3. Actually windows has _lots_ of FS features, most of which are not supported
    anywhere (path limit when \\?\ prefix is not used, short names, embedded streams,
    compression/encryption/security, softlinks/reparse points), and the worst thing
    is that it's normal to have multiple types of FS in the system (eg. flash drives
    are commonly formatted as FAT32; optical storage uses CDFS/UDF) and most features
    are FS-specific (eg. FAT32 has short filename aliases, but no streams),
    so it's very tricky to implement as it is, even without weird speed optimizations.

  18. #48
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,475
    Thanks
    26
    Thanked 121 Times in 95 Posts
    Regarding binary mode, here's a snippet from http://zlib.net/zlib_how.html :
    Code:
    #if defined(MSDOS) || defined(OS2) || defined(WIN32) || defined(__CYGWIN__)
    #  include <fcntl.h>
    #  include <io.h>
    #  define SET_BINARY_MODE(file) setmode(fileno(file), O_BINARY)
    #else
    #  define SET_BINARY_MODE(file)
    #endif

  19. #49
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,423
    Thanks
    223
    Thanked 1,052 Times in 565 Posts
    which is the same thing as I posted, but nobody atm would care about MSDOS or OS2,
    and checking for cygwin doesn't make sense because it's configurable there (and I use \n=LF in my setups).
    Also WIN32 is not automatically defined in MSC/IntelC, so that #if is basically all wrong
    (it won't work right if I try to compile it with any of my 4 different compilers).

  20. #50
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    865
    Thanks
    463
    Thanked 260 Times in 107 Posts
    What would be a correct test in your opinion, Eugene ?

  21. #51
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,475
    Thanks
    26
    Thanked 121 Times in 95 Posts
    Shelwien:
    Zlib has makefiles for MSC, GCC on Windows, Borland and others. So either they did insert such flags into the makefiles, or those flags are actually defined by the compilers.

    Lasse makes compressors/archivers that work on a multitude of platforms. Quote from quicklz.com:
    The C version compiles trouble-free on many platforms such as Visual Studio on Windows, gcc on Linux, LLVM on FreeBSD, Xcode for iPhone, aCC for SPARC, xlC for POWER, and for ARM, SH4/5, etc, etc.
    exdupe works on Windows, Linux, HP UX, FreeBSD, and there are more to come. zlib surely is at least as portable as exdupe or quicklz. I don't see a reason not to make code portable if it doesn't make life noticeably harder.

  22. #52
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,423
    Thanks
    223
    Thanked 1,052 Times in 565 Posts
    @Piotr:
    I don't see how that's related to what I said.
    Sure, linux libs usually also have "configure" scripts (or cmake or whatever else).

    As to Lasse, as far as i can see, his quicklz lib only implements memory-to-memory processing,
    and qpress has a magic macro "WINDOWS" which enables the use of winapi and such, so
    there's no setmode() stuff at all.

    @Cyan:
    Making up your own macro flag and letting the user define it (or not) is probably the best idea overall.
    Otherwise there's no certain way to test such stuff.
    Well, I'm checking __GNUC__ in that case, because it works for all my compilers and platforms, but this is certainly
    not a 100% stable solution.
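
    A user-definable flag of that kind could look roughly like the sketch below. STREAMS_NEED_BINARY_MODE and use_binary_stdio are made-up names for this example, and falling back to _WIN32 when the flag is not given is only an assumption borrowed from Cyan's earlier post, not a guaranteed test. The _setmode/_fileno spellings also follow that post; other runtimes may spell them setmode/fileno.

```c
#include <stdio.h>

/* STREAMS_NEED_BINARY_MODE is a hypothetical flag name for this sketch.
   Build with -DSTREAMS_NEED_BINARY_MODE=0 or =1 to override the guess. */
#ifndef STREAMS_NEED_BINARY_MODE
#  if defined(_WIN32)
#    define STREAMS_NEED_BINARY_MODE 1
#  else
#    define STREAMS_NEED_BINARY_MODE 0
#  endif
#endif

#if STREAMS_NEED_BINARY_MODE
#  include <fcntl.h>  /* _O_BINARY */
#  include <io.h>     /* _setmode */
#  define SET_BINARY_MODE(file) _setmode(_fileno(file), _O_BINARY)
#else
#  define SET_BINARY_MODE(file) ((void)(file), 0)
#endif

/* Switches stdin and stdout to binary mode where the flag says it is
   needed. Returns 0 on success, -1 on failure. */
static int use_binary_stdio(void)
{
    if (SET_BINARY_MODE(stdin) == -1) return -1;
    if (SET_BINARY_MODE(stdout) == -1) return -1;
    return 0;
}
```

    The point of the design is that the automatic guess is only a default: anyone whose compiler or setup behaves differently can set the flag on the command line without editing the source.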

  23. #53
    Tester
    Black_Fox's Avatar
    Join Date
    May 2008
    Location
    [CZE] Czechia
    Posts
    471
    Thanks
    26
    Thanked 9 Times in 8 Posts
    Would _WIN32 work instead of WIN32?

  24. #54
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,423
    Thanks
    223
    Thanked 1,052 Times in 565 Posts
    _WIN32 would work, it seems.
    I tried compiling this with MSC/IntelC/IntelC-x64/mingw/cygwin
    Code:
    #include <stdio.h>
    
    int main( void ) {
      int a = 10;                 /* 10 == '\n' (LF) */
      fwrite( &a, 1,1, stdout );  /* writes only the first byte of a (LF on little-endian) */
      return 0;
    }
    and first 4 had _WIN32 and expanded LF to CRLF, while cygwin didn't.

  25. #55
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,511
    Thanks
    746
    Thanked 668 Times in 361 Posts
    Quote Originally Posted by Black_Fox View Post
    Would _WIN32 work instead of WIN32?
    _WIN32 and __CYGWIN__ are defined by compilers while MSDOS/OS2/WIN32 are defined by zlib build scripts
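
    One way to check this on a given compiler is a small probe that reports which of the macros mentioned in this thread are defined in the current build. This is just a sketch; report_platform_macros is a made-up helper name, and the result also depends on compiler options and on whatever the build scripts pass with -D.

```c
#include <stdio.h>

/* Prints which of the macros discussed in this thread the current build
   defines, one per line, and returns how many of them are defined. */
static int report_platform_macros(FILE *out)
{
    int n = 0;
#ifdef _WIN32
    fputs("_WIN32\n", out); n++;
#endif
#ifdef WIN32
    fputs("WIN32\n", out); n++;
#endif
#ifdef __CYGWIN__
    fputs("__CYGWIN__\n", out); n++;
#endif
#ifdef __GNUC__
    fputs("__GNUC__\n", out); n++;
#endif
#ifdef unix
    fputs("unix\n", out); n++;
#endif
    return n;
}
```

    Calling report_platform_macros(stdout) from main and compiling once with each compiler, with no extra -D options, shows which names come from the compiler itself rather than from a makefile.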

  26. #56
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    865
    Thanks
    463
    Thanked 260 Times in 107 Posts

    LZ4 reaches v1.0

    A quick note to say that the LZ4 binary is now available as v1.0.

    The new version can be grabbed at its usual homepages :
    - http://fastcompression.blogspot.com/p/lz4.html
    - http://phantasie.tonempire.net/t95-l...-x207-295-mb-s)

    Compared to the previous release (v0.9), the main change is that it offers 3 different compression modes, effectively integrating LZ4-HC. They are selected with the -c0, -c1 or -c2 command switches.

    v1.0 also benefits from recent progress in the open-source version, as measured in this table:


    The open-source LZ4 is integrated as c0 mode.
    The open-source LZ4HC is integrated as c2 mode.

    Regards
    Last edited by Cyan; 19th September 2011 at 03:14.

  27. #57
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    865
    Thanks
    463
    Thanked 260 Times in 107 Posts

    LZ4 for ARM

    Hi

    The latest release of LZ4 (r33) has been created to support ARM processors, and more generally processors requiring aligned memory access. Thanks to Vlad Grachov for providing the __attribute__ ((packed)) approach that made this possible.
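
    The actual r33 change is not shown in this thread, so the following is only a generic sketch of the technique. The type and function names (unaligned_u32, read32_packed, read32_memcpy) are made up for the example, and the may_alias attribute is an addition of this sketch, not something mentioned above. A packed struct has alignment 1, so GCC-compatible compilers generate loads that are safe at any address; memcpy is the portable alternative.

```c
#include <stdint.h>
#include <string.h>

/* GCC/Clang extension: a packed struct has alignment 1, so the compiler
   must emit an access that works at any address. may_alias (an addition
   in this sketch) tells the compiler the type may overlay other data. */
typedef struct { uint32_t v; } __attribute__((packed, may_alias)) unaligned_u32;

/* Reads 32 bits from a possibly unaligned address via the packed struct. */
static uint32_t read32_packed(const void *p)
{
    return ((const unaligned_u32 *)p)->v;
}

/* Portable alternative: copy the bytes into an aligned local variable. */
static uint32_t read32_memcpy(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

    On x86 both versions typically compile to a plain load; on processors that fault on unaligned access the compiler falls back to byte-wise or otherwise safe loads.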

    The code has been validated on a single ARM platform so far. Therefore, if anyone is able and willing to test it on their own ARM system, please feel free to comment. This will help.

    http://code.google.com/p/lz4/


    Regards
    Last edited by Cyan; 27th September 2011 at 18:30.

  28. #58
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Cyan, you're very active recently. It makes me want to take a pause from benchmarking LZ4, because the results would be outdated soon. I expect to have some free time soon to make an overdue multithreaded benchmark comparison, but seeing there are so many changes, I'd rather wait some more. Could you send some notification when you think the LZ4 score is worth freezing?

  29. #59
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    865
    Thanks
    463
    Thanked 260 Times in 107 Posts
    Hi
    It's a bit difficult to tell, since I'm also surprised by the latest updates achieved on LZ4.
    It seems I'm constantly thinking "well, this time, it is the last update", and then, out of nowhere, a tiny little idea for improvement reaches a sleeping brain-cell.

    As far as performance goes, we should be pretty close to the limit of the LZ4 format now. Performance has not changed since r27, btw.
    My last planned update is expected to target compression ratio, which may improve by a small amount.
    The other 2 candidate items on my list are:
    - Speed improvement on ARM (but I need to properly set up an ARM testbed for that)
    - Specific mode for small packets (<64KB)

    This last item is probably the most interesting for you. So I guess once it is completed (or invalidated, since it might end up being unsuccessful), this will be good enough for your review.

    In any case, don't feel shy to evaluate LZ4 "as it is", whenever it is convenient for your planning. After all, software constantly evolves (if not improves), so we have to live with that...

  30. #60
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by Cyan View Post
    Hi
    It's a bit difficult to tell, since I'm also surprised by the latest updates achieved on LZ4.
    It seems I'm constantly thinking "well, this time, it is the last update", and then, out of nowhere, a tiny little idea for improvement reaches a sleeping brain-cell.

    As far as performance goes, we should be pretty close to the limit of the LZ4 format now. Performance has not changed since r27, btw.
    My last planned update is expected to target compression ratio, which may improve by a small amount.
    The other 2 candidate items on my list are:
    - Speed improvement on ARM (but I need to properly set up an ARM testbed for that)
    - Specific mode for small packets (<64KB)

    This last item is probably the most interesting for you. So I guess once it is completed (or invalidated, since it might end up being unsuccessful), this will be good enough for your review.

    In any case, don't feel shy to evaluate LZ4 "as it is", whenever it is convenient for your planning. After all, software constantly evolves (if not improves), so we have to live with that...
    Understood.
    Actually the most interesting size for me is exactly 128K, but smaller ones are OK too. Something tells me that after a major breakage you'll be able to find improvements for a long time...but OK, I'll test when I have time and maybe do a followup later. The hardest thing is getting several CPUs for the tests, but I don't expect changes that would have very special effect on only 1 CPU, so after comprehensive preliminary testing, followups on a smaller hardware base should be OK...right?

    And as to evolution, zlib gets slightly less than 1 update per year.
