View Poll Results: Should I release BCM v2.00 (not compatible with v1.xx)?

  • Yes - 28 (96.55%)
  • No - 1 (3.45%)
29 voters

Thread: BCM - The ultimate BWT-based file compressor

  1. #91 encode (Moscow, Russia):
    That's correct. In other words, BCM needs approximately block size * 5 of RAM, e.g. for a 1 GB block you need 5 GB of free RAM.
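
    A quick sanity check of the arithmetic - a minimal sketch assuming the usual BWT breakdown (a 32-bit suffix-array index per input byte, 4n, plus the block itself, 1n); BCM's exact internals may differ:

    Code:
    #include <stdint.h>
    #include <stdio.h>
    
    int main()
    {
        uint64_t block = 1ull << 30;                     // a 1 GB block
        uint64_t indices = block * sizeof(uint32_t);     // 4n: one 32-bit index per byte
        uint64_t data = block;                           // 1n: the block itself
        printf("approx. RAM: %.1f GB\n", (indices + data) / (1024.0 * 1024 * 1024));
        return 0;                                        // prints 5.0, matching the 5x rule
    }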

  2. #92 joerg (Germany):
    bcm -b64 backup.dat lab-bcm-b064 65718343680 -> 7433016230 in 12778.48s
    bcm -b256 backup.dat lab-bcm-b256 65718343680 -> 6869264058 in 13618.84s
    bcm -b512 backup.dat lab-bcm-b512 65718343680 -> 6643032885 in 19543.54s
    ---
    works as expected

    bigger blocksize = better compression?

    best regards

    PS: I think the progress information "1% .. 99%" is nice to have; maybe it could be changed to

    0% -10%(xxxxxs) -20%(xxxxxs) -30%(xxxxxs) -40%(xxxxxs) -50%(xxxxxs) -60%(xxxxxs) -70%(xxxxxs) -80%(xxxxxs) -90%(xxxxxs) -END(xxxxxs)

    this means: progress info would be shown only at every 10%, together with the elapsed time,
    with the option to write it to a logfile for batch processing - something like the sketch below
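
    For example - just a rough C++ sketch of the idea, not BCM code (function and parameter names are made up):

    Code:
    #include <stdio.h>
    #include <time.h>
    
    // Print progress only at every 10% step, with elapsed seconds appended,
    // and optionally write the same info to a logfile for batch processing.
    void ShowProgress(long long done, long long total, clock_t start, FILE* log)
    {
        static int lastStep = -1;
        int step = (int)(done * 10 / total) * 10;        // 0, 10, ..., 100
        if (step == lastStep)
            return;                                      // report each step once
        lastStep = step;
        double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
        fprintf(stderr, "%d%%(%.0fs) ", step, secs);
        if (log)
            fprintf(log, "%d%%(%.0fs)\n", step, secs);
    }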

  3. #93 encode (Moscow, Russia):
    Quote Originally Posted by joerg View Post
    bigger blocksize = better compression?
    Yep, in most cases.

    Speaking of the progress view - it's a matter of personal preference...


  4. #94 joerg (Germany):
    Quote Originally Posted by encode View Post
    in most cases

    ---
    bcm -b64 backup.dat lab-bcm-b064 65718343680 -> 7433016230 in 12778.48s
    bcm -b256 backup.dat lab-bcm-b256 65718343680 -> 6869264058 in 13618.84s
    bcm -b512 backup.dat lab-bcm-b512 65718343680 -> 6643032885 in 19543.54s
    ---
    bcm -b1024 backup.dat lab-bcm-1024 65718343680 -> 6443528922
    bcm -b1152 backup.dat lab-bcm-1152 65718343680 -> 6410402570

    works as expected

    in my test: bigger block size -> better compression

    with a big block size bcm can compress better than 7zip!
    but decompressing takes a long time ...

    best regards

    wish you all the best

  5. Thanks:

    encode (27th June 2016)

  6. #95 encode (Moscow, Russia):
    BCM v1.30 has been released!

    What's new:
    • Added 4n unbwt (used when the block size is less than 16 MB). -b1 and -b2 give pretty interesting decompression speeds; see the sketch below
    • The default block size is now 16 MB - (1<<24)-1, actually, to make use of the 4n unbwt by default
    • Fixed a "Segmentation fault" caused by an unaligned-memory-access bug (GCC + Linux) - special thanks to smjohn1!

    https://github.com/encode84/bcm
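
    For the curious: a generic sketch of the 4n idea - not a paste from the BCM sources. When the block length n is below 1<<24, the 24-bit successor link and the 8-bit symbol fit together in one 32-bit word, so the inverse BWT needs 4 bytes per symbol instead of 5:

    Code:
    #include <stdint.h>
    #include <vector>
    
    // Generic "4n" inverse BWT (bzip2-style formulation); requires n < (1 << 24).
    // bwt = last column, idx = primary index, out = n decoded bytes.
    void Unbwt4n(const uint8_t* bwt, uint8_t* out, uint32_t n, uint32_t idx)
    {
        std::vector<uint32_t> t(n);                      // 4 bytes per input byte
        uint32_t cum[257] = {0};
        for (uint32_t i = 0; i < n; ++i)
            ++cum[bwt[i] + 1];
        for (int c = 1; c < 257; ++c)
            cum[c] += cum[c - 1];                        // cumulative symbol counts
        for (uint32_t i = 0; i < n; ++i)
            t[cum[bwt[i]]++] = (i << 8) | bwt[i];        // 24-bit link | 8-bit byte
        for (uint32_t k = 0, p = t[idx]; k < n; ++k)
        {
            out[k] = (uint8_t)p;                         // low 8 bits: output byte
            p = t[p >> 8];                               // high 24 bits: next position
        }
    }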


  7. Thanks (4):

    Bulat Ziganshin (12th January 2018),Darek (11th January 2018),Mike (11th January 2018),smjohn1 (11th January 2018)

  8. #96 Gonzalo (Argentina):
    Good! Very fast for BWT+CM

    Now, I have a few questions, if you don't mind:

    1) How do you compile it? I mean, the switches... My Linux version is 60% slower than your win64 exe, and I don't think it should be.
    2) I see no progress indicator under Linux. Is that a known bug, or am I doing something wrong?

  9. #97 encode (Moscow, Russia):
    1) A hand-tuned Visual Studio 2013 + Intel C++ compile.
    2) Looks like it depends on the Linux version. Please check out the "Overwrite Prompt".

  10. #98 encode (Moscow, Russia):
    Anyway, just reuploaded the source. Please recompile!


  11. #99 Gonzalo (Argentina):
    Quote Originally Posted by encode View Post
    Anyway, just reuploaded the source. Please recompile!

    Thank you. But it still doesn't work. I'll check the sources later and see if I can help. Meanwhile, this is my output:

    Code:
    linux binary
    $ time bcm -f file 
    Compressing file:
    6106817 -> 1420300 in 20.73s
    
    
    real	0m20.077s
    user	0m20.577s
    sys	0m0.170s
    
    -------------------------
    
    Windows exe
    
    $ time wine bcm -f file 
    
    
    Timer 3.01  Copyright (c) 2002-2003 Igor Pavlov  2003-07-10
    bcm v0.07 by ilia muraviev
    compressing 5963k block...
    ratio: 1.862 bpb
    done
    
    real	0m7.469s
    user	0m7.646s
    sys	0m0.209s

  12. #100 encode (Moscow, Russia):
    BCM shows progress once per block! For such a small file you shouldn't see anything...
    Reuploaded the original file.

  13. #101 smjohn1 (USA):
    Here is what I used to compile:

    g++ -Wall -O3

    and I did see the progress indication even without fflush() on my Linux distribution (Ubuntu 16.04 LTS).

    I tested a large file and got:

    1080728273 -> 1080241381 in 269.22s (about 3.8 MB/s); my CPU is an i5-3320M @ 2.60GHz.

    Try -O3 to see if your Linux build gets about the same speed as Windows.

    Quote Originally Posted by Gonzalo View Post
    Thank you. But it still doesn't work. I'll check the sources later and see if I can help. Meanwhile, this is my output: [...]

  14. #102 encode (Moscow, Russia):
    Gonzalo, you're comparing BCM v1.30 to BCM v0.07...

  15. Thanks:

    Gonzalo (12th January 2018)

  16. #103 Gonzalo (Argentina):
    What an idiot! I didn't realize it, but I was comparing the new version with an old one. See above - it's v0.07. I don't know how that could happen, because I overwrote the exe. Anyway, the speed is still an issue. Any hints?

    Edit: Yes, now I see your post, too! Thanks

  17. #104 Gonzalo (Argentina):
    Quote Originally Posted by smjohn1 View Post
    Here is what I used to compile: g++ -Wall -O3 [...] Try -O3 to see if your Linux build gets about the same speed as Windows.
    Well, I think I beat you to it. I used -Ofast. What can I say, I'm a bold man
    And that's the slow binary. Did you compare speeds on your system with the author's exe binary?

  18. #105 encode (Moscow, Russia):
    BTW, are you compiling a 64-bit Linux binary?

  19. #106 smjohn1 (USA):
    Ha, you are really bold! -Ofast doesn't show more bugs?

    Unfortunately I have no Windows box.

    Oh, mine is 64-bit.

    Then I noticed that encode used Intel C++ for Windows, so I tried the Intel compiler on Linux too (icpc), as well as g++ with -Ofast, -msse3, -mavx and -mavx2.

    Here are some results:

    -O3: 43470244 -> 38912460 in 10.84s
    -Ofast: 43470244 -> 38912460 in 10.34s
    -Ofast -msse3: 43470244 -> 38912460 in 10.26s
    -Ofast -mavx: 43470244 -> 38912460 in 10.37s
    -Ofast -mavx2: Illegal instruction
    icpc -O3 -ipo: 43470244 -> 38912460 in 10.32s

    Added: -Ofast -msse3 -march=native: 43470244 -> 38912460 in 10.20s


    Quote Originally Posted by Gonzalo View Post
    Well, I think I beat you to it. I used -Ofast. [...]
    Last edited by smjohn1; 12th January 2018 at 21:59.

  20. Thanks:

    encode (12th January 2018)

  21. #107 Gonzalo (Argentina):
    Fixed! My bad, silly typo: -ofast instead of -Ofast. That was the troublemaker.
    Now performance is just a little behind Intel's.

    As for options, I use -march=native, which enables all instruction subsets supported by my CPU, so I don't know if there is something else that can be done to improve the speed.

    And yes, -Ofast is definitely not recommended for release builds, because it tends to make programs crash. I just wanted to see how much I could speed it up, out of pure curiosity.

    Some numbers:

    Code:
    65.82s - 100.0% - Ilya's compile
    72.68s - 110.4% - GCC
    76.32s - 115.9% - LLVM (clang)
    Method: run each one three times; keep the fastest run.
    Last edited by Gonzalo; 12th January 2018 at 06:25.

  22. Thanks:

    encode (12th January 2018)

  23. #108 encode (Moscow, Russia):
    I have an idea for BCM v1.40. To be more BZIP2'ish:

    [the block-size option table from the original post was lost; a purely hypothetical reconstruction follows below]
    -9 is for a 64-bit compile only. (Memory usage is block size * 5.)

    Plus, I have some speed improvements (EncodeDirectBit for Encode32()) and code cleanups.
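
    Since the table didn't survive, here is a purely hypothetical sketch of what a bzip2-style -1..-9 scheme could look like; the block sizes below are invented for illustration, not BCM v1.40's actual values:

    Code:
    #include <stdint.h>
    
    // Hypothetical only: map "-1".."-9" to a block size, bzip2-style.
    int64_t BlockSizeForLevel(int level)                 // level = 1..9
    {
        static const int bits[9] = {20, 21, 22, 23, 24, 25, 26, 27, 30};
        return (int64_t)1 << bits[level - 1];            // -1 = 1 MB ... -9 = 1 GB
    }
    // With memory usage = block size * 5, this made-up -9 (a 1 GB block) would
    // need about 5 GB of RAM - hence 64-bit builds only, as stated above.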


  24. Thanks (2):

    load (21st November 2018),Simorq (20th November 2018)

  25. #109 encode (Moscow, Russia):
    A new version to play with:
    Attached Files

  26. Thanks (4):

    avitar (25th November 2018),CompressMaster (28th May 2019),danlock (27th November 2018),Mike (25th November 2018)

  27. #110 CompressMaster (Slovakia):
    I've tried BCM today for compressing a large text file with a simple yet randomly grouped pattern that's repeated across the file. The 3 MB file can be compressed down to 180 KB losslessly. But can it be compressed further? Could it be possible to shrink it below 90 KB? Time and resource usage do not matter.

  28. #111 Member (Cambridge, UK):
    Quote Originally Posted by CompressMaster View Post
    I've tried BCM today for compressing a large text file with a simple yet randomly grouped pattern [...] Could it be possible to shrink it below 90 KB? Time and resource usage do not matter.
    This sort of question is completely pointless. It's entirely data-specific, and even knowing the data the answer is simply "probably not, but try a different tool and you may get lucky". I'm sure you know of the Large Text Compression Benchmark already?

    Unless of course you're a proponent of recursive infinite compression, in which case the answer is obviously YES! Just ask the unicorn for the answer


