Results 1 to 28 of 28

Thread: Some of my toy compressors

  1. #1
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    160
    Thanks
    18
    Thanked 56 Times in 27 Posts

    Talking Some of my toy compressors

    I write them just for fun and practice... It's based on LZSS method, but both the speed and ratio is bad. I hope you can give it a try and send me some ideas to improve it?
    http://comprox.googlecode.com/files/...110927.tar.bz2 (with source and executable)

    There is also another toy of mine here, using BWTS transform, but with damn slow speed. I'm still looking for ways to make BWTS run faster: http://comprox.googlecode.com/files/...110927.tar.bz2 (with source and executable)

  2. #2
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    Very interesting. I tested it on LTCB.
    http://mattmahoney.net/dc/text.html#2428
    http://mattmahoney.net/dc/text.html#2875

    Not sure how to make it faster, but you could probably improve compression by using larger block sizes, like with bwtsdc. That program also uses 5x block size but it looks like yours uses 24x. But BWTS does not seem to offer any compression improvement over regular BWT as far as I know. I just use libdivsufsort.

    I did try compiling with MinGW but saw only a very small speed improvement. But since you compress in blocks, you could make it faster using multithreading. I do that in zpaq, but I need different versions for Windows and Linux. Another problem is that it increases memory so you are forced to use smaller blocks with worse compression. libdivsufsort is multithreaded without increasing memory, but only for context sorting, not modeling, and not for the inverse transform, and only if you have OMP, which only works in Linux AFAIK.

  3. #3
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,475
    Thanks
    26
    Thanked 121 Times in 95 Posts
    libdivsufsort is multithreaded without increasing memory, but only for context sorting, not modeling, and not for the inverse transform, and only if you have OMP, which only works in Linux AFAIK.
    Is this confirmed? Where are the scaling benchmarks for libdivsufsort?

  4. #4
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts

    Cool Very nice incoming!

    Good beginning really
    I insert the results in the next release of MOC!

    Hi! Nice job!

  5. #5
    Programmer Gribok's Avatar
    Join Date
    Apr 2007
    Location
    USA
    Posts
    159
    Thanks
    0
    Thanked 1 Time in 1 Post
    Quote Originally Posted by Piotr Tarsa View Post
    Is this confirmed? Where are the scaling benchmarks for libdivsufsort?
    libdivsufsort is multithreaded without increasing memory, but multithreading is done only for first steps of B* buckets sorting. Other steps are serial and can't be paralleled. bsc uses libdivsufsort with some modifications to save multiple indexes instead of primary only. This allow multithreaded inverse transform since reconstruction can be started from multiple indexes. OpenMP works well for both Linux and Windows.
    Enjoy coding, enjoy life!

  6. #6
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    160
    Thanks
    18
    Thanked 56 Times in 27 Posts
    Quote Originally Posted by Matt Mahoney View Post
    Very interesting. I tested it on LTCB.
    http://mattmahoney.net/dc/text.html#2428
    http://mattmahoney.net/dc/text.html#2875

    Not sure how to make it faster, but you could probably improve compression by using larger block sizes, like with bwtsdc. That program also uses 5x block size but it looks like yours uses 24x. But BWTS does not seem to offer any compression improvement over regular BWT as far as I know. I just use libdivsufsort.

    I did try compiling with MinGW but saw only a very small speed improvement. But since you compress in blocks, you could make it faster using multithreading. I do that in zpaq, but I need different versions for Windows and Linux. Another problem is that it increases memory so you are forced to use smaller blocks with worse compression. libdivsufsort is multithreaded without increasing memory, but only for context sorting, not modeling, and not for the inverse transform, and only if you have OMP, which only works in Linux AFAIK.
    I just want to keep the code clean and didn't play tricks on memory size (i.e. for a 4MB block, 22bit per an integer is enough). But in BWTS, we need extra space to store information of each lyndon word. So it's not easy to reduce memory space in BWTS?
    I'm on linux and openmp/pthread should work for me. but that will make the comprox_ba eat more space (100MB for a 4MB block). I think I should try multi-thread on comprox_sa because it uses no more than 8x block size.

  7. #7
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    I would suggest C++ 2011 multithreading, over time it should get more portable than any other solution.

  8. #8
    Tester
    Black_Fox's Avatar
    Join Date
    May 2008
    Location
    [CZE] Czechia
    Posts
    471
    Thanks
    26
    Thanked 9 Times in 8 Posts
    Until now there has been Clanlib or TinyThread++, but ClanLib is of course aimed at a little different purpose and TinyThread++ should be soon obsoleted by increasing adherence of compilers to C++11 standard.
    I am... Black_Fox... my discontinued benchmark
    "No one involved in computers would ever say that a certain amount of memory is enough for all time? I keep bumping into that silly quotation attributed to me that says 640K of memory is enough. There's never a citation; the quotation just floats like a rumor, repeated again and again." -- Bill Gates

  9. #9
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    160
    Thanks
    18
    Thanked 56 Times in 27 Posts

  10. #10
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    Nice. Both programs are faster. I tested on 2 machines, one in WinXP and one in Ubuntu. http://mattmahoney.net/dc/text.html

    How did you compile them? I am still trying to figure out how to get openmp to work in Windows. I have MinGW 4.5.0.

  11. #11
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    160
    Thanks
    18
    Thanked 56 Times in 27 Posts
    Quote Originally Posted by Matt Mahoney View Post
    Nice. Both programs are faster. I tested on 2 machines, one in WinXP and one in Ubuntu. http://mattmahoney.net/dc/text.html

    How did you compile them? I am still trying to figure out how to get openmp to work in Windows. I have MinGW 4.5.0.
    I compiled the .exe using MinGW under linux... the command is 'g++ -Wall -O3 -fopenmp xxx.c -o xxx.exe'.
    Maybe you don't have libgomp installed? Go to http://sourceforge.net/projects/ming...n4/gcc-4.6.1-2 and download the libgomp package, replace gcc-4.6.1-2 with your gcc version.

  12. #12
    Member Vacon's Avatar
    Join Date
    May 2008
    Location
    Germany
    Posts
    523
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hello everyone,

    Quote Originally Posted by RichSelian View Post
    I compiled the .exe using MinGW under linux... the command is 'g++ -Wall -O3 -fopenmp xxx.c -o xxx.exe'.
    Just to let you know:
    The version comprox_sa_20110927 runs on Win 7 Starter (it prints it's options on the commandline, didn't made more tests so far)
    but
    version comprox_sa_20110928 gives some errormessage: " The program cannot be started, because 'pthreadGC2.dll" is missing on this computer. Reinstall the program to solve the problem." (translated from german).

    Edit: same is true for comprox_ba_20110927 / comprox_ba_20110928

    Best regards!
    Last edited by Vacon; 29th September 2011 at 10:37.

  13. #13
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    160
    Thanks
    18
    Thanked 56 Times in 27 Posts
    Quote Originally Posted by Vacon View Post
    Hello everyone,


    Just to let you know:
    The version comprox_sa_20110927 runs on Win 7 Starter (it prints it's options on the commandline, didn't made more tests so far)
    but
    version comprox_sa_20110928 gives some errormessage: " The program cannot be started, because 'pthreadGC2.dll" is missing on this computer. Reinstall the program to solve the problem." (translated from german).

    Edit: same is true for comprox_ba_20110927 / comprox_ba_20110928

    Best regards!
    I haven't try it on a Windows7 computer but it seems that you miss some DLLs provided by mingw.
    Anyway you can compile the code using your favorite compiler. Any compiler supporting standard C++ should work with the code under single-threaded mode. And compilers like vc2010 can also support openmp.

  14. #14
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    160
    Thanks
    18
    Thanked 56 Times in 27 Posts
    Oh I made a stupid mistake that I wrote "#ifdef OPENMP" instead of "#ifdef _OPENMP". That makes the decompression still run in single-thread mode. Fixed version is here:
    http://comprox.googlecode.com/files/...110929.tar.bz2
    http://comprox.googlecode.com/files/...110929.tar.bz2

  15. #15
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    AFAIK all threading functions of mingw (pthreads, openmp, c++ threads) use the dll, it comes with the compiler.

  16. #16
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts

    Wink Hi from Francesco!

    Test compile comprox_sa.cpp
    with Code::Bloks (selected in link libraries: libgomp-1.dll) -fopenmp -O3
    my results:
    1) one Thread is fast how much two!
    2) sometimes with openmp the programs (others) they slow down of 200%-300%
    3) for me the multithreading it is not openmp!

    Naturally I am speaking about the compilation and the compressor it is a lot interesting!

  17. #17
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    pthreadGC2.dll is also from http://sourceware.org/pthreads-win32/
    I had it already installed, so I did not get an error message.

  18. #18
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    I updated the benchmark. Decompression is now multithreaded for both programs in both Windows and Linux.
    http://mattmahoney.net/dc/text.html#2428
    http://mattmahoney.net/dc/text.html#2875
    You forgot to mention that compression is also slightly improved

    I also got multithreading to work when I compiled it myself. I updated mingw to the newest version (4.6.1) with a normal install and it all worked. Also, I was able to produce much smaller executables by compiling with just g++ -O3 -fomit-frame-pointer -s -fopenmp and compressing with upx.

    12,288 comprox_ba.exe
    9,728 comprox_sa.exe

    Edit: turns out that the programs require pthreadGC2.dll which is installed with mingw 4.6.1 but apparently not with 4.5.0 Most users probably won't have this file so you might want to include it with your distribution.
    Last edited by Matt Mahoney; 30th September 2011 at 05:33.

  19. #19
    Member Vacon's Avatar
    Join Date
    May 2008
    Location
    Germany
    Posts
    523
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hello everyone,

    thanks m^2 and thanks Matt

    Best regards!

  20. #20
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    160
    Thanks
    18
    Thanked 56 Times in 27 Posts
    A new version of comprox_sa released!

    http://comprox.googlecode.com/files/...111003.tar.bz2, with source code. All the executables are dynamically linked and the Windows version is tested under Linux with wine (for some reason the decompression of Windows version is much slower than the Linux version).

    I rewrite it completely from scratch in C and use pthread for multi-threading instead of openmp. The algorithm is also improved for better compression. Now both the speed of encoding and decoding are faster and the ratio is better (now compressed enwik8 less than 30MB).

  21. #21
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    Strangely, I found the Windows version to be slower, even on the same machine. On a Dell Latitude E6510 Core i7 M620, 2.66 GHz, 4 GB:
    Ubuntu: enwik8 29,944,631 13s 4s. enwik9 262,923,509 127s 34s. (size, compression time, decompression time).
    Windows 7: same sizes. enwik8 20s 37s. enwik9 154s 73s.
    Decompressing enwik8 in Windows, the size display does not start until a few seconds before decompression is finished.
    I had a different version of pthreadGC2.dll than yours, but I tried both and it made no difference.

    On a Gateway M7301U Pentium T3200, 2.0 GHz, 3 GB in Windows Vista: enwik8 34s 113s. enwik9 246s 118s.

    Edit: rebooted the Gateway in Ubuntu. enwik8 23s 6s. enwik9 215s 57s.

    Also forgot to mention that in Windows, even before progress display starts, both CPUs are near 100%.
    Last edited by Matt Mahoney; 4th October 2011 at 07:07.

  22. #22
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    410
    Thanks
    37
    Thanked 60 Times in 37 Posts
    @RichSelian:

    1. gratulation
    - in a very short time you have developed a wonderful fast compressor

    2. compression is really fast,
    but for now the compression ratio is a little bit low

    3. wishes:
    implementation a possibility
    to compress a whole directory with subdirectories

    running on intel core 2 duo with vista sp2

    comprox_sa encode ../ORGDTA/db.dmp ../RESULT/comp-db
    COMPROX_SA, AN EXPERIMENTAL LZSS-ARI COMPRESSOR, WRITTEN BY ZHANG LI <RICHSELIAN
    @GMAIL.COM>
    633136KB => 47377KB, ratio = 7.48% --- 41 seconde
    ---
    comprox_sa encode ../ORGDTA/dbbig.dmp ../RESULT/comp-dbbig
    COMPROX_SA, AN EXPERIMENTAL LZSS-ARI COMPRESSOR, WRITTEN BY ZHANG LI <RICHSELIAN
    @GMAIL.COM>
    2849324KB => 364954KB, ratio = 12.81% --- 276 seconds
    ---

  23. #23
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts

    Decoding is slower with Intel Core duo 2 (WXP)

    C:\>cobra comprox_sa encode c:\test\fp.log t.a
    COBRA v2.0 beta 5
    Copyright (c) 2010 by LovePimple
    COMPROX_SA, AN EXPERIMENTAL ...........
    20133KB => 1024KB, ratio = 5.09%
    Timing = QueryPerformanceCounter : Frequency = 2400.010000 MHz
    Kernel Time = 0.140 = 00:00:00.140 = 7%
    User Time = 1.421 = 00:00:01.421 = 74%
    Process Time = 1.562 = 00:00:01.562 = 82%
    Global Time = 1.897 = 00:00:01.897 = 100%

    C:\>cobra comprox_sa decode t.a test.log
    COBRA v2.0 beta 5
    Copyright (c) 2010 by LovePimple
    COMPROX_SA, AN EXPERIMENTAL LZSS-ARI ..............
    1024KB => 20133KB, ratio = 1964.73%
    Timing = QueryPerformanceCounter : Frequency = 2400.010000 MHz
    Kernel Time = 13.703 = 00:00:13.703 = 19%
    User Time = 55.359 = 00:00:55.359 = 80%
    Process Time = 69.062 = 00:01:09.062 = 100%
    Global Time = 68.684 = 00:01:08.684 = 100%

  24. #24
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    160
    Thanks
    18
    Thanked 56 Times in 27 Posts
    I made foolish code again that I do O(n) realloc() for output block. Now I'm thinking of making it more than a toy, but googlecode is now blocked in China again...

  25. #25
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    And Sourceforge?

  26. #26
    Member Surfer's Avatar
    Join Date
    Mar 2009
    Location
    oren
    Posts
    203
    Thanks
    18
    Thanked 7 Times in 1 Post
    GitHub is better.

  27. #27
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    Quote Originally Posted by RichSelian View Post
    I made foolish code again that I do O(n) realloc() for output block. Now I'm thinking of making it more than a toy, but googlecode is now blocked in China again...
    Do they give a reason for blocking it. Does it mean the chances for War are going up or what?

  28. #28
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    Quote Originally Posted by RichSelian View Post
    I made foolish code again that I do O(n) realloc() for output block. Now I'm thinking of making it more than a toy, but googlecode is now blocked in China again...
    Can you post your program here?

Similar Threads

  1. History tables for LZ compressors
    By willvarfar in forum Data Compression
    Replies: 1
    Last Post: 26th August 2010, 18:19
  2. Non Windows or Linux compressors
    By Earl Colby Pottinger in forum Data Compression
    Replies: 6
    Last Post: 8th April 2010, 17:26
  3. qc & qazar compressors
    By encode in forum Forum Archive
    Replies: 3
    Last Post: 25th August 2007, 04:58
  4. image compressors
    By maadjordan in forum Forum Archive
    Replies: 5
    Last Post: 13th August 2007, 10:28
  5. Fastest Compressors
    By LovePimple in forum Forum Archive
    Replies: 0
    Last Post: 1st November 2006, 06:36

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •