Page 3 of 3 FirstFirst 123
Results 61 to 90 of 90

Thread: BCM 0.12 is here!

  1. #61
    Programmer Gribok's Avatar
    Join Date
    Apr 2007
    Location
    USA
    Posts
    159
    Thanks
    0
    Thanked 1 Time in 1 Post
    bsc uses libdivsufsort for BWT. If bsc taking 12 seconds for enwik8 then BWT stage should be at lest 8-9 seconds. This is very impressive if tank can do to for 6 seconds. MSufsort4?
    Enjoy coding, enjoy life!

  2. #62
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    I'm testing with both divsufsort and my own experimental sorting... Experimenting with some ideas... Like I said sorting takes the most time in tank. My guesses, with BSC sorting takes the half of a final time, at least on my machine.
    Anyway, the only key is system specs. To do such heavy tuning I upgraded my PC to top components, and using liquid cooling overclocked that to the highest freqs possible. Fast memory counts - 2133+ MHz, 4.7-5.1+ GHz CPU counts as well! Previously such optimizations take at least one week (BCM 0.09) currently I can compute it in one day...

  3. #63
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    It would be cheaper and cooler if you built a cluster. :P

  4. #64
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,475
    Thanks
    26
    Thanked 121 Times in 95 Posts
    DDR3 2133 MHz is slow :P http://www.techpowerup.com/167205/Co...-Pictured.html

    I agree with m^2, you can make a distributed optimization engine.

    Could you post separate timings for BWT-ing and coding stages?

    Is your experimental blocksorting on par with libdivsufsort or MSufSort?

  5. #65
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Well, currently 2133 or little higher is more than adequate. Just small number of motherboards can support clocks higher than 2400 MHz and I doubt their stability at such insane overclock. As a note all freqs that higher than 1333 or 1600 (in newest products) are overclocked.

    I've programmed a multi threaded optimizer. Not use it all the time though.

    Probably will post timings for BWT and CM later - have a small amount of spare time for programming, which I use to start new optimization tasks (which run 24/7 for quite a long time). And I'll continue improving what I've got, finding new ideas that work!

  6. #66
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,423
    Thanks
    223
    Thanked 1,052 Times in 565 Posts
    Actually massively parallel compression optimization is rarely possible.
    Some degree of parallelism is certainly possible, but not on a cluster scale.
    Basically its necessary to compress a file to compute a value of the target function,
    and parameters are rarely independent, so in the worst case it has to be sequential.
    So overclocking and such is pretty important actually.

  7. #67
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Wow, I thought I've never exposed BWTANK name. Anyway, I've just returned to it's development - and I must say it's mostly done and the code sits at the top of my head for quite long time. In other words, I think I must release something - three years passed from my last BWT-based release (BCM v0.12).

  8. #68
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Code:
    BCM - Experimental BWT compressor, v0.13
    Copyright (C) 2008-2013 Ilya Muravyov
    
    Usage: BCM command infile outfile
    
    Commands:
      c[#] - Compress, set block size to # MB (default: 100)
      d    - Decompress

  9. #69
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    The BCM optimizer working for a few days uninterrupted... And here is the the beauty of the prodigy - it is truly portable - you can setup it at any room/place you want! B.E.A.UTIFUL!
    Attached Thumbnails Attached Thumbnails Click image for larger version. 

Name:	image.jpg 
Views:	268 
Size:	1.02 MB 
ID:	2335  

  10. #70
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,511
    Thanks
    746
    Thanked 668 Times in 361 Posts

  11. #71
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Yes, I'm familiar with this chassis. However, according to all tests, Bitfenix Prodigy has superior cooling. At some point, it is too big for miniITX, but it has zillions of cooling options and useful handles to move around.
    Sufficient cooling (230mm intake fan, 140mm exhaust fan) means I can run the CPU heavily overclocked at 100% load (8 threads of the optimizer) 24/7

  12. #72
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Results for BWTANK (Simplest FSM-based encoder, as simple as 7-Zip's (LZMA) literal coder, superfast):
    book1 -> 213,000 bytes
    calgary.tar -> 794,804 bytes
    world95.txt -> 473,562 bytes
    enwik5m -> 1,218,906 bytes
    enwik9 -> ~165,xxx,xxx bytes

    Nice and really fast, but I need more compression.

    And what's what an optimized version of BCM shows (and it's at least two times faster than BCM v0.12):
    book1 -> 210,952 bytes
    calgary.tar -> 786,941 bytes
    world95.txt -> 468,827 bytes
    enwik5m -> 1,208,360 bytes
    enwik8 -> 20,792,792 bytes
    enwik9 -> 164,251,280 bytes

    Anyway, my actual goal is 163,xxx,xxx bytes on enwik9 (to beat bbb (at least) and bsc). But even at this point, it is notable faster than BCM v0.12, with often a higher compression. So, even at this stage it's worth releasing!


  13. #73
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    BTW, the max block size in the new version is 2047 MB - quite nice!

  14. #74
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Compared to BSC:
    Code:
    C:\bcm\x64\Release>bsc e enwik8 enwik8.bsc -Tb100e1
    This is bsc, Block Sorting Compressor. Version 3.1.0. 8 July 2012.
    Copyright (c) 2009-2012 Ilya Grebnov <Ilya.Grebnov@gmail.com>.
    
    enwik8 compressed 100000000 into 20920018 in 10.452 seconds.
    
    C:\bcm\x64\Release>bsc e enwik8 enwik8.bsc -Tb100e2
    This is bsc, Block Sorting Compressor. Version 3.1.0. 8 July 2012.
    Copyright (c) 2009-2012 Ilya Grebnov <Ilya.Grebnov@gmail.com>.
    
    enwik8 compressed 100000000 into 20786218 in 11.918 seconds.
    
    C:\bcm\x64\Release>bcm c enwik8 enwik8.z
    Compressing enwik8: 100000000->20792792 in 11.372s

  15. #75
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    And BCM is BWT+CM - i.e. it encodes 8 bits per input byte. In addition, it has some advantage on binary data and its speed is constant (no RLE or transforms)

  16. #76
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Looks OK.

  17. #77
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,475
    Thanks
    26
    Thanked 121 Times in 95 Posts
    Good results. Waiting for more thorough benchmarks :]

    And BCM is BWT+CM - i.e. it encodes 8 bits per input byte.
    That is irrelevant for end user, ...

    In addition, it has some advantage on binary data and its speed is constant (no RLE or transforms)
    ..but this is not ;]

    We'll see how it performs on Francesco's benchmark - there are a lot of hardly compressible files.




    What's your method for optimization? Toffer has promised (or at least announced) that he will make a genetic optimizer for FSM's, but AFAIR he hasn't released anything like that on this forum.

  18. #78
    Member
    Join Date
    Sep 2011
    Location
    uk
    Posts
    238
    Thanks
    188
    Thanked 17 Times in 12 Posts
    Confused - is bcm 0.13 out yet- can't see it on this thread, tho' encode seems to have it ready? Where?

    If not what are these guys trying?
    J

  19. #79
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    I use Brute-Force optimizer most of the time (And some other techniques (a la Hill-Climbing) occasionally) IMHO, there is no short-cut (at least with BCM's CM) In addition, I'm checking all the results myself, building charts, comparing complexity and such, so I go beyond single model structure.

  20. #80
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Quote Originally Posted by avitar View Post
    Confused - is bcm 0.13 out yet- can't see it on this thread, tho' encode seems to have it ready? Where?

    If not what are these guys trying?
    J
    BCM v0.13 is an upcoming version currently I'm heavily working on. It will be released soon (July 2013, or earlier)

  21. #81
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    The results for the BCM with slow counters (Not fully optimized yet)

    book1 -> 210,417 bytes
    calgary.tar -> 782,952 bytes
    world95.txt -> 466,671 bytes
    enwik5m -> 1,203,579 bytes
    enwik8 -> 20,745,547 bytes
    enwik9 -> 163,988,775 bytes

    And it is somewhat slower:
    Code:
    C:\bcm\x64\Release>bcm c enwik8 enwik8.z
    Compressing enwik8: 100000000->20745547 in 13.622s


    In conclusion, some comparison chart of what I've got right now:

    No code has to be inserted here.


  22. #82
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,475
    Thanks
    26
    Thanked 121 Times in 95 Posts
    Slow counters look pretty pointless currently - almost 20% slowdown for about 0.2% improvement in compression ratio.

    Is your program fully single-threaded?

  23. #83
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Well, BCM must have a higher compression than, say, BSC!
    CM is single threaded, for sorting new BCM uses libdivsufsort-lite with OpenMP enabled

  24. #84
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Further optimized BCM shows 163,885,873 bytes on ENWIK9.

    Yet another interesting result of this build (not tuned to this file):

    pht.psd (6,755,163 bytes):
    No code has to be inserted here.


  25. #85
    Member ivan2k2's Avatar
    Join Date
    Nov 2012
    Location
    Russia
    Posts
    35
    Thanks
    13
    Thanked 6 Times in 3 Posts
    Quote Originally Posted by encode View Post
    Further optimized BCM shows 163,885,873 bytes on ENWIK9.

    Yet another interesting result of this build (not tuned to this file):

    pht.psd (6,755,163 bytes):
    No code has to be inserted here.

    What about list of primes from http://encode.su/threads/1723-Compressing-prime-numbers ?? ZIP compresses this list much better then BCM 0.12

  26. #86
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    BWT in general does poorly on sorted data.

  27. #87
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Tested this build on Sorted Prime Numbers - no magic - just a small compression improvement, as expected

  28. #88
    Tester
    Black_Fox's Avatar
    Join Date
    May 2008
    Location
    [CZE] Czechia
    Posts
    471
    Thanks
    26
    Thanked 9 Times in 8 Posts
    Quote Originally Posted by encode View Post
    Yet another interesting result of this build (not tuned to this file):

    pht.psd (6,755,163 bytes):
    Seems familiar
    I am... Black_Fox... my discontinued benchmark
    "No one involved in computers would ever say that a certain amount of memory is enough for all time? I keep bumping into that silly quotation attributed to me that says 640K of memory is enough. There's never a citation; the quotation just floats like a rumor, repeated again and again." -- Bill Gates

  29. #89
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts

  30. #90
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,984
    Thanks
    377
    Thanked 352 Times in 140 Posts
    Well, I've just finished working on a new release. Will test it for a few days to check its performance and double-check each line of a code - since I do not plan to release a next version anytime soon.

    Code:
    >C:\bcm\x64\Release>bcm
    BCM - Experimental BWT compressor, v0.13
    Copyright (C) 2008-2013 Ilya Muravyov
    
    Usage: BCM command infile outfile
    
    Commands:
      c[#] - Compress; set block size to # MB (default: 100)
      d    - Decompress
    
    >C:\bcm\x64\Release>bcm c enwik8 enwik8.z
    Compressing enwik8: 100000000->20736614 in 12.841s
    
    >C:\bcm\x64\Release>bcm c1000 enwik9 enwik9.z
    Compressing enwik9: 1000000000->163885873 in 145.885s

  31. Thanks:

    samsat1024 (21st June 2013)

Page 3 of 3 FirstFirst 123

Similar Threads

  1. BCM v0.10 is here!
    By encode in forum Data Compression
    Replies: 45
    Last Post: 20th June 2010, 22:39
  2. BCM v0.06,0.07 is here! [!]
    By encode in forum Data Compression
    Replies: 34
    Last Post: 31st May 2009, 17:39
  3. BCM v0.05 is here! [!]
    By encode in forum Data Compression
    Replies: 19
    Last Post: 8th March 2009, 22:12
  4. BCM v0.04 is here! [!]
    By encode in forum Data Compression
    Replies: 64
    Last Post: 5th March 2009, 17:07
  5. BCM v0.03 is here! [!]
    By encode in forum Data Compression
    Replies: 25
    Last Post: 14th February 2009, 15:42

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •