Page 9 of 13 FirstFirst ... 7891011 ... LastLast
Results 241 to 270 of 364

Thread: bsc, new block sorting compressor

  1. #241
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,505
    Thanks
    26
    Thanked 136 Times in 104 Posts
    encode:
    Did you try verifying decompression? I think that should detect most errors.

  2. #242
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,610
    Thanks
    30
    Thanked 65 Times in 47 Posts
    I've heard that overclocked GPUs tend to make many calculation errors. Encode, did you try to decompress the file with the weird size?
    ADDED: Good timing.

  3. #243
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,023
    Thanks
    415
    Thanked 416 Times in 158 Posts

    Angry

    One of the error messages with the overclocked GPU:
    Code:
    I:\>bsc e enwik9 enwik9.z -b32p -m8f
    This is bsc, Block Sorting Compressor. Version 3.0.0. 26 August 2011.
    Copyright (c) 2009-2011 Ilya Grebnov <Ilya.Grebnov@gmail.com>.
    
    Compressing enwik9(67%)[D:/Development/Tools/back40computing\b40c/util/spine.cuh
    , 112] Spine cudaFree d_spine failed:  (CUDA error 30: unknown error)
    
    General GPU failure, please contact the author!
    With the overclocked GPU, BSC randomly produces a corrupted compressed file. If you try to decompress that file:
    Code:
    I:\>bsc d enwik9.z e9
    This is bsc, Block Sorting Compressor. Version 3.0.0. 26 August 2011.
    Copyright (c) 2009-2011 Ilya Grebnov <Ilya.Grebnov@gmail.com>.
    
    enwik9.z decompressed 196399168 into 1000000000 in 24.820 seconds.
    This takes a longer time, but BSC happily decompresses it!


    Files after decompression:

    2996E86FB978F93CCA8F566CC56998923E7FE581 *ENWIK9 (Original)
    ECFC9019ACA1077E22527206FB1B7ED3CA0397AC *e9 (Decompressed)

    • CRC-32 calculation routine for data integrity verification.


    WTF???

  4. #244
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,610
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Your GPU is unstable. Nothing special. Lower the clock or increase voltage.

  5. #245
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,023
    Thanks
    415
    Thanked 416 Times in 158 Posts
    I'm talking about the fact that BSC should check data integrity, as listed in its features. And here the decompressed file is not equal to the original (as confirmed by the SHA-1 checksums above).
    Just informing the BSC author about a possible bug, to make BSC/LIBBSC more stable and better.

  6. #246
    Programmer Gribok's Avatar
    Join Date
    Apr 2007
    Location
    USA
    Posts
    162
    Thanks
    0
    Thanked 14 Times in 2 Posts
    Quote Originally Posted by encode View Post
    I'm talking about the fact that BSC should check data integrity, as listed in its features. And here the decompressed file is not equal to the original (as confirmed by the SHA-1 checksums above).
    Just informing the BSC author about a possible bug, to make BSC/LIBBSC more stable and better.
    bsc computes an Adler32 checksum of the compressed data. I will fix this issue in the next version.

    After reviewing the CUDA forums I found a lot of issues with overclocked GPUs. A GPU can work fine in 3D with some visual artifacts, but it can produce incorrect results in CUDA. Could you please test your PC using OCCT v3.1.0? It has a test for CUDA.
    Enjoy coding, enjoy life!
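    For reference, the Adler-32 checksum bsc stores is cheap to compute: two 16-bit running sums modulo 65521, as defined in RFC 1950. A minimal sketch (the function name is illustrative, not libbsc's actual routine):

```cpp
#include <cstdint>
#include <cstddef>

// Adler-32 as defined in RFC 1950: two running sums modulo 65521.
uint32_t adler32(const unsigned char* data, size_t len) {
    const uint32_t MOD = 65521;  // largest prime below 2^16
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; ++i) {
        a = (a + data[i]) % MOD;  // sum of bytes
        b = (b + a) % MOD;        // sum of the running sums
    }
    return (b << 16) | a;
}
```

    Note that because it only covers the compressed stream, a GPU fault during the transform yields a block whose checksum still matches, which is exactly the failure mode reported above.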

  7. #247
    Member przemoc's Avatar
    Join Date
    Aug 2011
    Location
    Poland
    Posts
    44
    Thanks
    3
    Thanked 23 Times in 13 Posts
    Not a CUDA programmer myself, but maybe you could introduce two new experimental modes:
    • safe mode with DMR (dual modular redundancy)
    • overclocker's mode with TMR (triple modular redundancy)

    Redundancy of (at least some, presumably the most GPU-intensive) operations performed in parallel to:
    • discover GPU problems and stop compression if the results of the redundant operations aren't identical - safe mode
    • allow "voting" in case of non-identical results if at least two of them are the same (three different results obviously mean an error) - overclocker's mode

    Well, it may be huge overkill (and any gain from using the GPU would likely be lost). Compression is not an example of a safety-critical task.
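    The voting scheme described above can be sketched generically; `tmr_vote` is a hypothetical helper, not anything in libbsc:

```cpp
#include <functional>
#include <optional>

// Triple modular redundancy: run the same computation three times and
// accept a result only if at least two runs agree ("overclocker's mode").
// Three distinct results signal an unrecoverable hardware error.
template <typename T>
std::optional<T> tmr_vote(const std::function<T()>& compute) {
    T a = compute(), b = compute(), c = compute();
    if (a == b || a == c) return a;  // a agrees with at least one other run
    if (b == c) return b;            // a is the outlier
    return std::nullopt;             // three different results: hard error
}
```

    Safe mode (DMR) is the same idea with two runs and no voting: any mismatch aborts the block.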

  8. #248
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,610
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by przemoc View Post
    Compression is not an example of safety-critical task.
    Then what is?

  9. #249
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 798 Times in 489 Posts
    Or check if decompression is correct.

  10. #250
    Programmer Gribok's Avatar
    Join Date
    Apr 2007
    Location
    USA
    Posts
    162
    Thanks
    0
    Thanked 14 Times in 2 Posts

    MTF vs QLFC

    GRZipII MTF model:
    Code:
    qlfc   :       enwik8 | 100000000 byte |  22023940(22.0%) byte |  1.747 sec
    qlfc   :      mozilla |  51220480 byte |  16631752(32.5%) byte |  1.747 sec
    qlfc   :      webster |  41458703 byte |   6745872(16.3%) byte |  0.546 sec
    qlfc   :          nci |  33553445 byte |   1235898(03.7%) byte |  0.125 sec
    qlfc   :        samba |  21606400 byte |   4063574(18.8%) byte |  0.484 sec
    qlfc   :      dickens |  10192446 byte |   2370823(23.3%) byte |  0.203 sec
    qlfc   :         osdb |  10085684 byte |   2311308(22.9%) byte |  0.327 sec
    qlfc   :           mr |   9970564 byte |   2287976(22.9%) byte |  0.219 sec
    qlfc   :        x-ray |   8474240 byte |   3872735(45.7%) byte |  0.500 sec
    qlfc   :          sao |   7251944 byte |   4869905(67.2%) byte |  0.608 sec
    qlfc   :      reymont |   6627202 byte |   1024072(15.5%) byte |  0.078 sec
    qlfc   :      ooffice |   6152192 byte |   2703163(43.9%) byte |  0.250 sec
    qlfc   :          xml |   5345280 byte |    392929(07.4%) byte |  0.032 sec
    qlfc   :     interrup |   5134954 byte |    818903(15.9%) byte |  0.062 sec
    qlfc   :       enwik5 |   5000000 byte |   1266655(25.3%) byte |  0.093 sec
    qlfc   :        book1 |    768771 byte |    221223(28.8%) byte |  0.016 sec
    Global ratio: 4.12033216, Global time: 7.037 sec
    bsc fast QLFC model:
    Code:
    qlfc   :       enwik8 | 100000000 byte |  20928559(20.9%) byte |  2.324 sec
    qlfc   :      mozilla |  51220480 byte |  16149628(31.5%) byte |  1.935 sec
    qlfc   :      webster |  41458703 byte |   6478039(15.6%) byte |  0.859 sec
    qlfc   :          nci |  33553445 byte |   1219420(03.6%) byte |  0.218 sec
    qlfc   :        samba |  21606400 byte |   3991791(18.5%) byte |  0.499 sec
    qlfc   :      dickens |  10192446 byte |   2270321(22.3%) byte |  0.312 sec
    qlfc   :         osdb |  10085684 byte |   2254247(22.4%) byte |  0.297 sec
    qlfc   :           mr |   9970564 byte |   2221703(22.3%) byte |  0.312 sec
    qlfc   :        x-ray |   8474240 byte |   3799108(44.8%) byte |  0.499 sec
    qlfc   :          sao |   7251944 byte |   4718048(65.1%) byte |  0.593 sec
    qlfc   :      reymont |   6627202 byte |    991200(15.0%) byte |  0.125 sec
    qlfc   :      ooffice |   6152192 byte |   2585650(42.0%) byte |  0.374 sec
    qlfc   :          xml |   5345280 byte |    386820(07.2%) byte |  0.047 sec
    qlfc   :     interrup |   5134954 byte |    795626(15.5%) byte |  0.140 sec
    qlfc   :       enwik5 |   5000000 byte |   1216376(24.3%) byte |  0.141 sec
    qlfc   :        book1 |    768771 byte |    213966(27.8%) byte |  0.031 sec
    Global ratio: 3.98866593, Global time: 8.706 sec
    Do we need a fast MTF model in bsc?
    Enjoy coding, enjoy life!
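    For readers comparing the two models: MTF is the classic recency-rank transform from bzip2-style compressors, which the GRZipII numbers above use. A minimal encoder sketch (illustrative, not GRZipII's actual code):

```cpp
#include <vector>
#include <cstdint>
#include <algorithm>

// Move-to-front: replace each symbol by its position in a recency list,
// then move it to the front. After a BWT, runs of equal symbols become
// runs of zeros, which the entropy coder exploits.
std::vector<uint8_t> mtf_encode(const std::vector<uint8_t>& input) {
    uint8_t table[256];
    for (int i = 0; i < 256; ++i) table[i] = static_cast<uint8_t>(i);
    std::vector<uint8_t> out;
    out.reserve(input.size());
    for (uint8_t sym : input) {
        uint8_t rank = 0;
        while (table[rank] != sym) ++rank;  // position in recency list
        out.push_back(rank);
        // shift entries 0..rank-1 up by one, then put sym at the front
        std::copy_backward(table, table + rank, table + rank + 1);
        table[0] = sym;
    }
    return out;
}
```

    QLFC instead codes, for each symbol, how many *distinct* symbols occurred since its last occurrence, which is what buys the ~3% ratio gain in the tables above.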

  11. #251
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    IMO, we don't need MTF. As you summarized already, the speed impact is not that large, while the compression ratio is much worse. Also, as you know, the current trend is multi-threaded apps, and you did well there already. I think those facts satisfy the requirements.
    BIT Archiver homepage: www.osmanturan.com

  12. #252
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    4,136
    Thanks
    320
    Thanked 1,397 Times in 802 Posts
    Why don't you try making a direct CM (maybe RLE+CM, or o1 huffman + CM) instead?
    Also how about making a postcoder that doesn't need the whole block to start working?
    And MTF is not really a fast algo anyway - I'd say it would be better to further optimize the qlfc speed if you need higher speed.

  13. #253
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    412
    Thanks
    38
    Thanked 64 Times in 38 Posts

    new "beta" gcc 4.6

    @gribok:
    15th March 2011 00:12 you wrote:
    ---
    I usually work in Visual Studio, and for bsc 2.4.5 I used a combination of VS 2008 SP1 + Intel 11.1. Now I have upgraded my PC to VS 2010 SP1 and Intel Composer XE, and the new builds of bsc are slower.
    ---

    can you please try to compile with the new "beta" gcc 4.6 ?

    http://sourceforge.net/projects/ming...4/gcc-4.6.1-2/

    http://sourceforge.net/projects/ming....lzma/download

    info is from
    http://encode.su/threads/1368-Some-o...6463#post26463


    gcc 4.6 has new optimization/support for

    Intel Core 2 : -march=core2 and -mtune=core2
    Intel Core i3/i5/i7 : -march=corei7 and -mtune=corei7
    Intel Core i3/i5/i7 processors with AVX : -march=corei7-avx and -mtune=corei7-avx

    maybe it can give bsc 3.0 a little bit of extra speed on Core2 / Core i7?!

    best regards

    ps:
    if you think there is a need for very fast compression
    - why not implement a ppmd algorithm within bsc?
    (7zip seems to have a very fast and good implementation of PPMd = Dmitry Shkarin's PPMdH with small changes)
    Last edited by joerg; 29th September 2011 at 10:05.
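    The `-march`/`-mtune` options above would be used along these lines; the file list and binary name are illustrative, not bsc's real build script:

```shell
# Hypothetical gcc 4.6 build line for a Core i7 target (illustrative only):
g++ -O3 -march=corei7 -mtune=corei7 -fopenmp -o bsc *.cpp
# For a Core 2 target, swap in -march=core2 -mtune=core2;
# for Sandy Bridge with AVX, -march=corei7-avx -mtune=corei7-avx.
```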

  14. #254
    Member przemoc's Avatar
    Join Date
    Aug 2011
    Location
    Poland
    Posts
    44
    Thanks
    3
    Thanked 23 Times in 13 Posts
    Quote Originally Posted by joerg View Post
    (7zip seems to have a very fast and good implementation of PPMD = Dmitry Shkarin's PPMdH with small changes)
    Isn't it rather var.I? Shkarin released rev.2 with Pavlov's fixes almost a year and a half ago.

  15. #255
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    4,136
    Thanks
    320
    Thanked 1,397 Times in 802 Posts
    1. 7z actually includes both ppmd vH (for rar) and vI (for 7z, maybe for zipx).
    2. Afaik they are not "implementations", but simple ports of the original ppmd sources.
    3. The latest version is vJ, which is better.
    4. There's no sense in integrating ppmd into bsc, because there's basically nothing that ppmd does better than BWT.

  16. #256
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    412
    Thanks
    38
    Thanked 64 Times in 38 Posts
    yesterday nvidia released the new CUDA 4.1 toolkit:

    http://www.developer.nvidia.com/cuda-toolkit-41

    @gribok:

    will you build a new version of your wonderful bsc 3.0.0 with the new CUDA 4.1-tools ?

    I am especially interested in a speedup for the "Sort Transform of order 5" and "Sort Transform of order 6" modes

    blazerx wrote on 7th December 2011, 14:01:
    http://encode.su/threads/1208-CUDA-G...ll=1#post27528

    ***
    For optimal performance on nVIDIA CUDA enabled cards please specify the block switch to No. of Shaders /8 and threads to 512

    for example:

    GTX460 has 336 CUDA cores - Blocks=42 ..... GTX580 has 512 CUDA cores - Blocks=64

    This should allow you to reach maximum performance and shave off some time compared to the default values.
    ***

    i can't quite understand this: how and where is it possible to specify this block switch?

    best regards

  17. #257
    Programmer Gribok's Avatar
    Join Date
    Apr 2007
    Location
    USA
    Posts
    162
    Thanks
    0
    Thanked 14 Times in 2 Posts
    Quote Originally Posted by joerg View Post
    yesterday nvidia released the new CUDA 4.1 toolkit:
    will you build a new version of your wonderful bsc 3.0.0 with the new CUDA 4.1-tools ?
    I will try it over the weekend. At least it compiles with CUDA 4.1. Also I will try to upgrade to the new version of B40C.
    Quote Originally Posted by joerg View Post
    For optimal performance on nVIDIA CUDA enabled cards please specify the block switch to No. of Shaders /8 and threads to 512
    You don't have control over the number of GPU blocks in bsc. It is computed automatically at runtime.
    Enjoy coding, enjoy life!
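    The automatic computation Gribok mentions typically boils down to a ceiling division of the problem size by the block size; a simplified sketch of the idea (hypothetical names, not bsc's actual code):

```cpp
// A CUDA kernel launch covers n elements with fixed-size thread blocks;
// the block count is rounded up so every element is assigned a thread.
struct LaunchConfig {
    int blocks;
    int threads_per_block;
};

LaunchConfig pick_launch(long long n, int threads_per_block = 512) {
    // ceil(n / threads_per_block) without floating point
    long long blocks = (n + threads_per_block - 1) / threads_per_block;
    return { static_cast<int>(blocks), threads_per_block };
}
```

    This is why a user-facing "blocks" switch like the one blazerx described would be redundant here: the grid size follows from the data size, not from the core count.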

  18. #258
    Member
    Join Date
    May 2008
    Location
    Germany
    Posts
    412
    Thanks
    38
    Thanked 64 Times in 38 Posts
    @gribok: thanks in advance - sounds wonderful

  19. #259
    Programmer Gribok's Avatar
    Join Date
    Apr 2007
    Location
    USA
    Posts
    162
    Thanks
    0
    Thanked 14 Times in 2 Posts
    Quote Originally Posted by joerg View Post
    @gribok: thanks in advance - sounds wonderful
    Experimental version with CUDA 4.1 and new entropy coder based on Eugene Shelwien's rc_sh2d.inc attached.
    Attached Files
    Enjoy coding, enjoy life!

  20. #260
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,593
    Thanks
    801
    Thanked 698 Times in 378 Posts
    as time goes on, bsc becomes a more and more brilliant program. but there are a few things that have potential for improvement:

    1. afair, when i tested grzip on my old q6600@3.24GHz, it performed the ST4 transform of a 100 mb block in 0.5 seconds, and the further MTF compression took 1.5 seconds. if the numbers are still about the same, it would be fantastic to see some super-fast encoding algorithm, even if it makes compression 10-20% worse - it would just extend bsc usage to new areas

    2. there is a well-known technique of huffman-preprocessing data before BWT/ST. afaik, it just decreases the number of memory accesses, improving the speed/compression ratio

    3. bzip2 had a -s switch that decreased the amount of memory used for compression and decompression, and BBB is also very memory-efficient. for FreeArc it's important to have the ability to decompress using a minimum of memory: modern computers usually have 2 gb of memory, but i want to keep archives decompressible even on computers with 128 MB RAM. now this means that i should limit the blocksize to 20-30 mb, and implementation of such an option would allow me to use 50-100 mb blocks. the same holds for advertising bsc as a bzip2 replacement

  21. #261
    Programmer Gribok's Avatar
    Join Date
    Apr 2007
    Location
    USA
    Posts
    162
    Thanks
    0
    Thanked 14 Times in 2 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    1. afair, when i tested grzip on my old q6600@3.24GHz, it performed the ST4 transform of a 100 mb block in 0.5 seconds, and the further MTF compression took 1.5 seconds. if the numbers are still about the same, it would be fantastic to see some super-fast encoding algorithm, even if it makes compression 10-20% worse - it would just extend bsc usage to new areas
    Likely. I have a prototype with the grzip model in bsc.

    Quote Originally Posted by Bulat Ziganshin View Post
    2. there is a well-known technique of huffman-preprocessing data before BWT/ST. afaik, it just decreases the number of memory accesses, improving the speed/compression ratio
    Unlikely, because I am using external libraries for BWT.

    Quote Originally Posted by Bulat Ziganshin View Post
    3. bzip2 had a -s switch that decreased the amount of memory used for compression and decompression, and BBB is also very memory-efficient. for FreeArc it's important to have the ability to decompress using a minimum of memory: modern computers usually have 2 gb of memory, but i want to keep archives decompressible even on computers with 128 MB RAM. now this means that i should limit the blocksize to 20-30 mb, and implementation of such an option would allow me to use 50-100 mb blocks. the same holds for advertising bsc as a bzip2 replacement
    Unlikely, because in bzip2 the block size is limited to 1MB, so you can pack bits and do other optimizations for low memory. For blocks >16MB this is not trivial.
    Enjoy coding, enjoy life!
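    To see why block size matters here: a BWT pointer must be able to address any position in the block, so its width grows with the block size. A small helper illustrating the arithmetic (hypothetical, for illustration only):

```cpp
#include <cstdint>

// A pointer into a block of block_size bytes needs ceil(log2(block_size))
// bits. bzip2-scale blocks keep pointers around 20 bits, so several can be
// packed per machine word; a 100 MB block needs 27 bits, which defeats
// the simple packing tricks Gribok refers to.
int pointer_bits(uint64_t block_size) {
    int bits = 0;
    while ((1ULL << bits) < block_size) ++bits;
    return bits;
}
```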

  22. #262
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,593
    Thanks
    801
    Thanked 698 Times in 378 Posts
    3. it's also implemented in the open-source BBB, with a full description

    1. hopefully it will be much faster than the MTF results you cited above. but overall, maybe there are some other methods for fast encoding? what needs the most time in the MTF code? the MTF itself, or the Huffman, or something else? we just need a way to do this part faster, even with worse quality
    Last edited by Bulat Ziganshin; 17th February 2012 at 22:56.

  23. #263
    Programmer Gribok's Avatar
    Join Date
    Apr 2007
    Location
    USA
    Posts
    162
    Thanks
    0
    Thanked 14 Times in 2 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    3. it's also implemented in the open-source BBB, with a full description

    1. hopefully it will be much faster than the MTF results you cited above. but overall, maybe there are some other methods for fast encoding? what needs the most time in the MTF code? the MTF itself, or the Huffman, or something else? we just need a way to do this part faster, even with worse quality
    1. We can try MTF + Huffman like in bzip2. This should be the fastest way.
    3. BBB uses a swap file.
    Enjoy coding, enjoy life!

  24. #264
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,593
    Thanks
    801
    Thanked 698 Times in 378 Posts
    1. can you please say what requires the most time - mtf, huffman, or something else?

    3.
    bbb (Aug. 31, 2006) is a Big Block BWT (Burrows-Wheeler transform) compressor. It allows blocks as large as 80% of available memory...

    bbb uses a memory efficient BWT. For compression, blocks are first context-sorted in small blocks and then merged using temporary files. For the inverse transform, instead of building a linked list, the program builds an index to the approximate location of the next node, then searches linearly for the exact location.
    Last edited by Bulat Ziganshin; 18th February 2012 at 10:20.

  25. #265
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 798 Times in 489 Posts
    Right. Then BBB models the BWT output by mixing order 1, 2, 4 indirect contexts. The problem with BBB is that it uses a naive sort that runs too slowly on highly redundant input. I could fix that by using divsufsort to sort the small blocks before merging. But instead I stopped supporting BBB and put my efforts into ZPAQ. zpaq -m2 uses a divsufsort BWT followed by an order 1-2 ISSE chain for better speed and similar compression to BBB. Blocks are also compressed and decompressed in parallel by separate threads. I did not implement the large memory model, but Jan Ondrus has written a config file and preprocessor for 1 GB blocks (using 1.4 GB memory) which is posted on the ZPAQ page.

  26. #266
    Member chornobyl's Avatar
    Join Date
    May 2008
    Location
    ua/kiev
    Posts
    153
    Thanks
    0
    Thanked 0 Times in 0 Posts

  27. #267
    Member
    Join Date
    May 2008
    Location
    Antwerp , country:Belgium , W.Europe
    Posts
    487
    Thanks
    1
    Thanked 3 Times in 3 Posts
    Does anyone have a link to the bsc v2.8.0 binaries (the latest without CUDA)?
    I would like to test the ST6-7-8 modes with and without CUDA acceleration, to see the difference.
    The latest version (v3.0.0) seems to always use CUDA for the ST7-8 modes; not sure about ST6 though.
    Further, is there a way to limit the number of blocks that are processed in parallel? On my 6+6HT cores = 12 threads, BSC allocates about 13GB RAM if I use 250mb blocks...
    I see only a switch -t, but this completely disables parallel block processing and slows things down a lot.

    Thanks in advance !

  28. #268
    Programmer Gribok's Avatar
    Join Date
    Apr 2007
    Location
    USA
    Posts
    162
    Thanks
    0
    Thanked 14 Times in 2 Posts
    Quote Originally Posted by pat357 View Post
    Does anyone have a link to the bsc v2.8.0 binaries (the latest without CUDA)?
    I would like to test the ST6-7-8 modes with and without CUDA acceleration, to see the difference.
    The latest version (v3.0.0) seems to always use CUDA for the ST7-8 modes; not sure about ST6 though.
    Further, is there a way to limit the number of blocks that are processed in parallel? On my 6+6HT cores = 12 threads, BSC allocates about 13GB RAM if I use 250mb blocks...
    I see only a switch -t, but this completely disables parallel block processing and slows things down a lot.

    Thanks in advance !
    You can download old builds using a direct link like this: http://libbsc.web.officelive.com/Doc...-2.8.0-src.zip bsc implements ST7&8 on the GPU only. ST5&6 have both GPU and CPU implementations. You can use the -G switch to control this behavior. Currently there is no switch to control the number of blocks running in parallel. I will probably add one in the next release.
    Enjoy coding, enjoy life!

  29. #269
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,583
    Thanks
    234
    Thanked 160 Times in 90 Posts
    BSC for me is a very nice compressor!
    Good job !
    Regards! Francesco!

  30. #270
    Member
    Join Date
    May 2008
    Location
    Antwerp , country:Belgium , W.Europe
    Posts
    487
    Thanks
    1
    Thanked 3 Times in 3 Posts
    Gribok,

    It seems the link you gave has only the sources for the older builds, not the compiled binaries...

    http://libbsc.web.officelive.com/Doc.../bsc-2.8.0.zip gave me: 404 not found

    I've tried with Firefox and Opera, same results.
