Thread: bsc, new block sorting compressor

  1. #301
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,897
    Thanks
    291
    Thanked 1,267 Times in 715 Posts
    Only the ST transform is ported to CUDA, not BWT - see which of the -m# options use ST/CUDA.

  2. #302
    Member
    Join Date
    Aug 2016
    Location
    India
    Posts
    36
    Thanks
    2
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Shelwien View Post
    Only the ST transform is ported to CUDA, not BWT - see which of the -m# options use ST/CUDA.
    Yes, I compiled st.cu. That file is used by libbsc.cpp, so I renamed libbsc.cpp to libbsc.cu; that in turn is used by bsc.cpp, so I renamed it to bsc.cu. That's it. But I don't know why it is not running on the GPU.

  3. #303
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,562
    Thanks
    772
    Thanked 687 Times in 372 Posts
    try with -m8 option - this mode can run ONLY on GPU

  4. Thanks:

    Vanns (25th August 2016)

  5. #304
    Member (Vanns)

    How to reduce the time taken?

    Quote Originally Posted by Bulat Ziganshin View Post
    try with -m8 option - this mode can run ONLY on GPU
    Great man!

    It is running in GPU now.

    /home/1.tar compressed 414125056 into 167088434 in 27.449 seconds.


    But for just 414.1 MB it takes 27 seconds. With OpenMP it took just 9 seconds. Is there any way to reduce the time taken on the GPU?

    It seems libbsc can compress a 1 GB file in 8 seconds.

    How can I improve the time factor now?
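
The figures quoted in these posts can be turned into a throughput number for easier comparison. A minimal sketch, assuming the summary line always has the form shown above (the helper names `parse_summary` and `throughput_mb_s` are made up for this illustration, and MB is taken as 10^6 bytes):

```python
import re

# Pattern for the summary line bsc prints, as quoted in this thread:
#   "<name> compressed <in_bytes> into <out_bytes> in <seconds> seconds."
SUMMARY = re.compile(
    r"^(?P<name>.+) compressed (?P<inp>\d+) into (?P<out>\d+) "
    r"in (?P<sec>[\d.]+) seconds\.$"
)


def parse_summary(line):
    """Return (input_bytes, output_bytes, seconds) from a bsc summary line."""
    m = SUMMARY.match(line.strip())
    if m is None:
        raise ValueError("not a bsc summary line: %r" % line)
    return int(m.group("inp")), int(m.group("out")), float(m.group("sec"))


def throughput_mb_s(input_bytes, seconds):
    """Input throughput in decimal megabytes (10**6 bytes) per second."""
    return input_bytes / seconds / 1e6


if __name__ == "__main__":
    line = "/home/1.tar compressed 414125056 into 167088434 in 27.449 seconds."
    inp, out, sec = parse_summary(line)
    print("ratio      : %.3f" % (out / inp))
    print("throughput : %.1f MB/s" % throughput_mb_s(inp, sec))
```

Running the same helper over the CPU-only and GPU lines gives numbers that can be compared directly even when the input files differ in size.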

  6. #305
    Programmer Bulat Ziganshin
    overall, it depends on your gpu and cpu. on my own system (i7-4770 + 560Ti) GPU compression is faster than CPU-only. -m7 runs faster, -m5 is fastest GPU-enabled mode. you can also disable preprocessing and use -e1

  7. #306
    Member (Vanns)
    Quote Originally Posted by Bulat Ziganshin View Post
    overall, it depends on your gpu and cpu. on my own system (i7-4770 + 560Ti) GPU compression is faster than CPU-only. -m7 runs faster, -m5 is fastest GPU-enabled mode. you can also disable preprocessing and use -e1
    with -m5 I am getting same time. I ran the command

    Code:
    time ./bsc e /home/1.tar /home/11.tar.bsc -m5 -p
    Is there any solution?

    I am working on a Xeon machine with this card:

    Code:
    VGA compatible controller: NVIDIA Corporation GM200GL [Quadro M6000] (rev a1)
    Do Core processors support CUDA? I mean Intel processors. I know they support OpenCL, but what about CUDA?

  8. #307
    Programmer Bulat Ziganshin
    no, CUDA is supported only by NVIDIA GPUs

    M6000 was the fastest GPU just half a year ago, so this needs further investigation. start by benchmarking on the standard enwik9 file. use the "-pGm5" option to ensure that CUDA is employed. it should finish in a few seconds. with my own, much slower GPU the results are:

    with GPU:
    Code:
    Z:\>bsc.exe e e9 nul -Gpm5
    This is bsc, Block Sorting Compressor. Version 3.1.0. 8 July 2012.
    Copyright (c) 2009-2012 Ilya Grebnov <Ilya.Grebnov@gmail.com>.
    
    
    e9 compressed 1000000000 into 2140 in 7.971 seconds.
    Without GPU:
    Code:
    Z:\>bsc.exe e e9 nul -pm5
    This is bsc, Block Sorting Compressor. Version 3.1.0. 8 July 2012.
    Copyright (c) 2009-2012 Ilya Grebnov <Ilya.Grebnov@gmail.com>.
    
    
    e9 compressed 1000000000 into 2140 in 10.047 seconds.

  9. #308
    Member (Vanns)
    Quote Originally Posted by Bulat Ziganshin View Post
    no, CUDA is supported only by NVIDIA GPUs
    Your system is i7-4770 + 560Ti, which is why the question came up. I am using a Xeon with this card:
    Code:
    VGA compatible controller: NVIDIA Corporation GM200GL [Quadro M6000] (rev a1)
    Is there any way to improve the speed further?

  10. #309
    Programmer Bulat Ziganshin
    well, i know how to make BSC faster, but it needs a few weeks of programming

  11. #310
    Member (Vanns)
    okay.

    But basically it could do the compression of 1 GB file in 8 seconds right?

    Why am I not able to?

    Hardware architecture problem?

    And can u tell how to make BSC faster via programming?

  12. #311
    Programmer Bulat Ziganshin
    BSC speed greatly depends on input data. It's great for texts, but may be slower for binary files. So try first with enwik9 and -pGm5. Then we can see whether your problem is input data or something else

    and publish your CPU model too. can you use 100% of it? with my options and on my system, speed is really limited by the CPU parts of the code. That is why i said that BSC can be made much faster by modifying this part

  13. #312
    Member (Vanns)
    Quote Originally Posted by Bulat Ziganshin View Post
    BSC speed greatly depends on input data. It's great for texts, but may be slower for binary files. So try first with enwik9 and -pGm5. Then we can see whether your problem is input data or something else

    and publish your CPU model too. can you use 100% of it? with my options and on my system, speed is really limited by the CPU parts of the code. That is why i said that BSC can be made much faster by modifying this part
    CPU model is

    Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz

    I'll try with enwik9 as input and report back

  14. #313
    Programmer Bulat Ziganshin
    your cpu is 6 cores at 2 GHz, mine is 4 cores at 3.4 GHz with 15% higher IPC. overall my cpu should be about 30% faster. and since enwik9 compression speed is limited by CPU even on my box, you should be 30% slower than my numbers above

    btw, one part of the problem is that data are split into 25 MB blocks by default, so you need 12*25 MB of input data to fill all CPU threads, and several times more to get rid of long tails. block size may be reduced by adding the -b option, e.g. -b8Gpm5
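
The block-size point can be made concrete with a rough model. This sketch assumes every block takes the same time, that blocks are processed in rounds of one block per thread, and that the -b value is in units of 2^20 bytes; none of that is confirmed in the thread, and `pipeline_fill` is a hypothetical helper, not part of bsc:

```python
import math


def pipeline_fill(file_bytes, block_mb, threads):
    """Estimate how well a file fills a block-parallel pipeline.

    Simplified model: equal time per block, blocks handed out in rounds
    of `threads` blocks. Returns (blocks, rounds, utilization).
    """
    block_bytes = block_mb * 1024 * 1024  # assumption: MB means 2**20 bytes
    blocks = math.ceil(file_bytes / block_bytes)
    rounds = math.ceil(blocks / threads)
    utilization = blocks / (rounds * threads)
    return blocks, rounds, utilization


if __name__ == "__main__":
    size = 414125056  # the 1.tar input from earlier in the thread
    for block_mb in (25, 8):
        blocks, rounds, util = pipeline_fill(size, block_mb, threads=12)
        print("-b%d: %d blocks, %d rounds, %.0f%% thread utilization"
              % (block_mb, blocks, rounds, util * 100))
```

Under those assumptions, the 414125056-byte file yields 16 blocks at the default size, so the second round leaves 8 of 12 threads idle; smaller blocks spread the work more evenly.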

  15. #314
    Programmer Bulat Ziganshin
    Quote Originally Posted by Vanns View Post
    And can u tell how to make BSC faster via programming?
    http://encode.su/threads/2533-LZ4-BW...=bsc#post48802

    if someone is interested in implementing that, i will give a more thorough explanation

  16. #315
    Member (Vanns)

    How to reduce the time taken?

    Quote Originally Posted by Bulat Ziganshin View Post
    your cpu is 6 cores at 2 GHz, mine is 4 cores at 3.4 GHz with 15% higher IPC. overall my cpu should be about 30% faster. and since enwik9 compression speed is limited by CPU even on my box, you should be 30% slower than my numbers above

    btw, one part of the problem is that data are split into 25 MB blocks by default, so you need 12*25 MB of input data to fill all CPU threads, and several times more to get rid of long tails. block size may be reduced by adding the -b option, e.g. -b8Gpm5
    Quote Originally Posted by Bulat Ziganshin View Post
    BSC speed greatly depends on input data. It's great for texts, but may be slower for binary files. So try first with enwik9 and -pGm5. Then we can see whether your problem is input data or something else

    and publish your CPU model too. can you use 100% of it? with my options and on my system, speed is really limited by the CPU parts of the code. That is why i said that BSC can be made much faster by modifying this part
    For enwik9 data

    In OpenMP,

    time ./bsc e /home/testfiles.tar /home/testCPU.tar.bsc -p

    real 0m37.884s
    user 7m13.858s
    sys 0m2.195s

    In GPU,

    time ./bsc e /home/testfiles.tar /home/test.tar.bsc -pGm5

    /home/testfiles.tar compressed 2264715776 into 595521502 in 91.059 seconds.

    real 1m31.162s
    user 1m28.415s
    sys 0m2.616s

    The GPU run still takes longer. How can I resolve this?

  17. #316
    Programmer Bulat Ziganshin
    1. use enwik9. i have no idea of your data, it may be suboptimal for bsc
    2. use the same options as in my test. -m5 is really important
    3. as i said, 140 MB is not enough to fill the compression pipeline with default 25 MB block

    so let's start with copying my tests and later look into details of your data

    PS: enwik9 file is "e9" in my testsuite

  18. #317
    Programmer Bulat Ziganshin
    my tests with user/real times:
    Code:
    Z:\>timer bsc.exe e e9 nul -Gpm5
    
    Timer 3.01  Copyright (c) 2002-2003 Igor Pavlov  2003-07-10
    This is bsc, Block Sorting Compressor. Version 3.1.0. 8 July 2012.
    Copyright (c) 2009-2012 Ilya Grebnov <Ilya.Grebnov@gmail.com>.
    
    e9 compressed 1000000000 into 2140 in 7.894 seconds.
    
    Kernel Time  =     6.645 = 00:00:06.645 =  83%
    User Time    =    49.764 = 00:00:49.764 = 628%
    Process Time =    56.409 = 00:00:56.409 = 711%
    Global Time  =     7.924 = 00:00:07.924 = 100%
    
    Z:\>timer bsc.exe e e9 nul -pm5
    
    Timer 3.01  Copyright (c) 2002-2003 Igor Pavlov  2003-07-10
    This is bsc, Block Sorting Compressor. Version 3.1.0. 8 July 2012.
    Copyright (c) 2009-2012 Ilya Grebnov <Ilya.Grebnov@gmail.com>.
    
    e9 compressed 1000000000 into 2140 in 10.031 seconds.
    
    Kernel Time  =     3.088 = 00:00:03.088 =  30%
    User Time    =    67.033 = 00:01:07.033 = 667%
    Process Time =    70.122 = 00:01:10.122 = 698%
    Global Time  =    10.046 = 00:00:10.046 = 100%
    as you see, even with -G multiple threads are used, and real time is 6.28x smaller than user time. in your test, they are the same. something specific to the Linux version?
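
The comparison of user and real time can be expressed as a single ratio. A small sketch (the function name `busy_cores` is invented here; the numbers are the ones reported in this thread):

```python
def busy_cores(user_s, real_s, sys_s=0.0):
    """Average number of cores kept busy: CPU time divided by wall-clock time."""
    return (user_s + sys_s) / real_s


if __name__ == "__main__":
    # Windows run above (timer output): User Time 49.764 s, Global Time 7.924 s.
    print("Windows -Gpm5: %.2f cores" % busy_cores(49.764, 7.924))
    # Linux run from the earlier post (time output): user 1m28.415s, real 1m31.162s.
    print("Linux   -pGm5: %.2f cores" % busy_cores(88.415, 91.162))
```

A value near 1 suggests a single-threaded run; a value near the hardware thread count suggests the CPU side is saturated.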

  19. #318
    Member (Vanns)
    CentOS 7.2

  20. #319
    Programmer Bulat Ziganshin
    i mean that i have no problems on Windows. can't check it on linux

  21. #320
    Member (Vanns)
    Quote Originally Posted by Bulat Ziganshin View Post
    i mean that i have no problems on Windows. can't check it on linux
    Okay

    In my case

    time ./bsc e /home/testfiles.tar /home/testCPU.tar.bsc -Gpm5

    /home/testfiles.tar compressed 2264715776 into 595521502 in 87.119 seconds.

    time ./bsc e /home/testfiles.tar /home/testCPU.tar.bsc -pm5

    /home/testfiles.tar compressed 2264715776 into 595521502 in 87.363 seconds.

    Not much of a difference, unlike in your results

  22. #321
    Programmer Bulat Ziganshin
    can you test on e9 file?!!

  23. #322
    Member (Vanns)

    How to reduce the time taken?

    Quote Originally Posted by Bulat Ziganshin View Post
    can you test on e9 file?!!
    Yes, I tested with the enwik9 file you provided in your previous messages. That file was downloaded as testfiles

  24. #323
    Programmer Bulat Ziganshin
    the archive i provided contains several files, including e9, which is the file i used for tests above

  25. #324
    Member (Vanns)
    Quote Originally Posted by Bulat Ziganshin View Post
    the archive i provided contains several files, including e9, which is the file i used for tests above
    time ./bsc e /home/e9 /home/e9G.bsc -Gpm5

    /home/e9 compressed 1000000000 into 207808604 in 31.136 seconds.

    real 0m31.238s
    user 0m30.031s
    sys 0m1.218s

    time ./bsc e /home/e9 /home/e9C.bsc -pm5

    /home/e9 compressed 1000000000 into 207808604 in 31.091 seconds.

    real 0m31.193s
    user 0m30.001s
    sys 0m1.203s

  26. #325
    Programmer Bulat Ziganshin
    great. and what are the user times (printed by the 'time' command)? maybe speed is limited by I/O - try with output to /dev/null. also try with the -T option - it should show that the GPU gets part of user execution time:

    Code:
    Z:\>timer bsc.exe e e9 nul -Gpm5T
    
    Timer 3.01  Copyright (c) 2002-2003 Igor Pavlov  2003-07-10
    This is bsc, Block Sorting Compressor. Version 3.1.0. 8 July 2012.
    Copyright (c) 2009-2012 Ilya Grebnov <Ilya.Grebnov@gmail.com>.
    
    e9 compressed 1000000000 into 2140 in 31.231 seconds.
    
    Kernel Time  =     5.397 = 00:00:05.397 =  17%
    User Time    =    25.677 = 00:00:25.677 =  81%
    Process Time =    31.075 = 00:00:31.075 =  99%
    Global Time  =    31.340 = 00:00:31.340 = 100%
    
    Z:\>timer bsc.exe e e9 nul -pm5T
    
    Timer 3.01  Copyright (c) 2002-2003 Igor Pavlov  2003-07-10
    This is bsc, Block Sorting Compressor. Version 3.1.0. 8 July 2012.
    Copyright (c) 2009-2012 Ilya Grebnov <Ilya.Grebnov@gmail.com>.
    
    e9 compressed 1000000000 into 2140 in 40.327 seconds.
    
    Kernel Time  =     1.591 = 00:00:01.591 =   3%
    User Time    =    38.501 = 00:00:38.501 =  95%
    Process Time =    40.092 = 00:00:40.092 =  99%
    Global Time  =    40.342 = 00:00:40.342 = 100%

  27. #326
    Member (Vanns)
    Updated the previous post with real and user time.

    There is no option like -T

    With /dev/null

    time ./bsc e /home/e9 /dev/null -Gpm5

    /home/e9 compressed 1000000000 into 2140 in 31.025 seconds.

    real 0m31.125s
    user 0m29.999s
    sys 0m1.138s

    time ./bsc e /home/e9 /dev/null -pm5

    /home/e9 compressed 1000000000 into 2140 in 31.016 seconds.

    real 0m31.117s
    user 0m29.941s
    sys 0m1.190s

  28. #327
    Programmer Bulat Ziganshin
    it's obvious that only one core is used. you should compile for OpenMP AND CUDA. these options don't exclude each other

  29. #328
    Member (Vanns)
    okay. You mean enabling both the OpenMP and CUDA flags. When I do that, the build ends with a core dump:

    Code:
    nvcc -DLIBBSC_SORT_TRANSFORM_SUPPORT -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -O3  -lcuda -lcudart -DLIBBSC_CUDA_SUPPORT -Xcompiler -openmp -DLIBBSC_OPENMP_SUPPORT -DNDEBUG -I/usr/local/cuda/include/ -L/usr/local/cuda/lib64/ -c libbsc/adler32/adler32.cpp
    nvcc -DLIBBSC_SORT_TRANSFORM_SUPPORT -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -O3  -lcuda -lcudart -DLIBBSC_CUDA_SUPPORT -Xcompiler -openmp -DLIBBSC_OPENMP_SUPPORT -DNDEBUG -I/usr/local/cuda/include/ -L/usr/local/cuda/lib64/ -c libbsc/bwt/divsufsort/divsufsort.c
    nvcc -DLIBBSC_SORT_TRANSFORM_SUPPORT -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -O3  -lcuda -lcudart -DLIBBSC_CUDA_SUPPORT -Xcompiler -openmp -DLIBBSC_OPENMP_SUPPORT -DNDEBUG -I/usr/local/cuda/include/ -L/usr/local/cuda/lib64/ -c libbsc/bwt/bwt.cpp
    nvcc -DLIBBSC_SORT_TRANSFORM_SUPPORT -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -O3  -lcuda -lcudart -DLIBBSC_CUDA_SUPPORT -Xcompiler -openmp -DLIBBSC_OPENMP_SUPPORT -DNDEBUG -I/usr/local/cuda/include/ -L/usr/local/cuda/lib64/ -c libbsc/coder/coder.cpp
    nvcc -DLIBBSC_SORT_TRANSFORM_SUPPORT -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -O3  -lcuda -lcudart -DLIBBSC_CUDA_SUPPORT -Xcompiler -openmp -DLIBBSC_OPENMP_SUPPORT -DNDEBUG -I/usr/local/cuda/include/ -L/usr/local/cuda/lib64/ -c libbsc/coder/qlfc/qlfc.cpp
    nvcc -DLIBBSC_SORT_TRANSFORM_SUPPORT -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -O3  -lcuda -lcudart -DLIBBSC_CUDA_SUPPORT -Xcompiler -openmp -DLIBBSC_OPENMP_SUPPORT -DNDEBUG -I/usr/local/cuda/include/ -L/usr/local/cuda/lib64/ -c libbsc/coder/qlfc/qlfc_model.cpp
    nvcc -DLIBBSC_SORT_TRANSFORM_SUPPORT -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -O3  -lcuda -lcudart -DLIBBSC_CUDA_SUPPORT -Xcompiler -openmp -DLIBBSC_OPENMP_SUPPORT -DNDEBUG -I/usr/local/cuda/include/ -L/usr/local/cuda/lib64/ -c libbsc/filters/detectors.cpp
    nvcc -DLIBBSC_SORT_TRANSFORM_SUPPORT -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -O3  -lcuda -lcudart -DLIBBSC_CUDA_SUPPORT -Xcompiler -openmp -DLIBBSC_OPENMP_SUPPORT -DNDEBUG -I/usr/local/cuda/include/ -L/usr/local/cuda/lib64/ -c libbsc/filters/preprocessing.cpp
    nvcc -DLIBBSC_SORT_TRANSFORM_SUPPORT -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -O3  -lcuda -lcudart -DLIBBSC_CUDA_SUPPORT -Xcompiler -openmp -DLIBBSC_OPENMP_SUPPORT -DNDEBUG -I/usr/local/cuda/include/ -L/usr/local/cuda/lib64/ -c libbsc/libbsc/libbsc.cu
    nvcc error   : 'cudafe' died due to signal 11 (Invalid memory reference)
    nvcc error   : 'cudafe' core dumped
    make: *** [libbsc.o] Error 139

  30. #329
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    502
    Thanks
    180
    Thanked 177 Times in 120 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    it's obvious that only one core is used. you should compile for OpenMP AND CUDA. these options don't exclude each other
    How do you enable both? My nvcc doesn't accept -fopenmp option. Maybe it's just too old (V6.5.12). This isn't a machine I have admin rights on so I can't just install new drivers and compilers alas.

    It's definitely not the easiest thing to build in the world, with no explanations of how to compile with cuda support. Ideally it should have a Makefile target ("make bsc_cuda") that just does the donkey work. I finally figured out that we need both st.cpp and st.cu built separately, but both output to st.o so you have to specify -o to give one an alternative filename.

    Note that using -m6 works as does -Gm6, so I think with an unsupported GPU asking for -G is ignored, so you may be benchmarking the CPU-only version without realising. -m7 and -m8 are the only reliable way to check if the GPU actually works.

    For me: my bsc_cuda binary seems to give very little gain. This is an oldish machine, but with a GPU which ought to really be up to the job.

    Code:
    @ acceldev2[/tmp]; time bsc_cuda e enwik9 /dev/null -pm5
    This is bsc, Block Sorting Compressor. Version 3.1.0. 8 July 2012.
    Copyright (c) 2009-2012 Ilya Grebnov <Ilya.Grebnov@gmail.com>.
    
    enwik9 compressed 1000000000 into 0 in 84.637 seconds.
    
    real    1m24.644s
    user    1m18.665s
    sys     0m2.180s
    
    @ acceldev2[/tmp]; time bsc_cuda e enwik9 /dev/null -Gpm5
    This is bsc, Block Sorting Compressor. Version 3.1.0. 8 July 2012.
    Copyright (c) 2009-2012 Ilya Grebnov <Ilya.Grebnov@gmail.com>.
    
    enwik9 compressed 1000000000 into 0 in 51.951 seconds.
    
    real    0m53.355s
    user    0m48.895s
    sys     0m2.660s


    The system is a 2 core "Intel(R) Xeon(R) CPU W3503 @ 2.40GHz" (albeit 1 core running a cfagent process flat out as it's gone rogue!). The GPU is a "NVIDIA Corporation GK110GL [Tesla K20c] (rev a1)".

  31. #330
    Programmer Bulat Ziganshin
    i just use precompiled windows executables provided by the author so sorry, i don't know how to enable it. it may be not supported on linux at all

    just a random idea - can you compile it separately? i.e. use nvcc only for the cuda-specific files and gcc -fopenmp for the rest?
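
One way to read the suggestion above is as a split build: nvcc for the .cu files, g++ with -fopenmp for everything else. The sketch below only prints candidate command lines and does not run them. The source paths and the `_cuda.o` naming are assumptions for illustration, the defines are a subset of those in the build log earlier in this thread, and nothing here was tested against the bsc sources:

```python
import shlex

# Defines copied from the build log earlier in this thread (subset).
DEFINES = ["-DLIBBSC_SORT_TRANSFORM_SUPPORT", "-DLIBBSC_CUDA_SUPPORT",
           "-DLIBBSC_OPENMP_SUPPORT", "-DNDEBUG"]


def compile_cmd(source):
    """Build one compile command: .cu goes to nvcc, everything else to g++."""
    stem, ext = source.rsplit(".", 1)
    base = stem.split("/")[-1]
    if ext == "cu":
        # Distinct object name so st.cu and st.cpp do not both write st.o.
        return ["nvcc", "-O3"] + DEFINES + ["-c", source, "-o", base + "_cuda.o"]
    return ["g++", "-O3", "-fopenmp"] + DEFINES + ["-c", source, "-o", base + ".o"]


if __name__ == "__main__":
    # Hypothetical source list; adjust the paths to the actual tree.
    for src in ["libbsc/st/st.cpp", "libbsc/st/st.cu", "bsc.cpp"]:
        print(" ".join(shlex.quote(arg) for arg in compile_cmd(src)))
```

If a .cu file itself needs OpenMP, nvcc passes host-compiler flags through -Xcompiler, so the GCC spelling would be -Xcompiler -fopenmp. The final link would presumably use g++ -fopenmp with all object files plus the CUDA runtime (-lcudart), but that step is not confirmed here either.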

    Note that using -m6 works as does -Gm6, so I think with an unsupported GPU asking for -G is ignored, so you may be benchmarking the CPU-only version without realising. -m7 and -m8 are the only reliable way to check if the GPU actually works.
    -m7/8 works for me too, moreover i got a 25% speed increase with -G. and the speed is almost the same with any of -m5..8 (which would be impossible with a CPU implementation):
    Code:
    Z:\>timer bsc.exe e e9 nul -Gpm5
    
    Timer 3.01  Copyright (c) 2002-2003 Igor Pavlov  2003-07-10
    This is bsc, Block Sorting Compressor. Version 3.1.0. 8 July 2012.
    Copyright (c) 2009-2012 Ilya Grebnov <Ilya.Grebnov@gmail.com>.
    
    e9 compressed 1000000000 into 2140 in 7.894 seconds.
    
    Kernel Time  =     6.318 = 00:00:06.318 =  79%
    User Time    =    49.670 = 00:00:49.670 = 624%
    Process Time =    55.988 = 00:00:55.988 = 703%
    Global Time  =     7.956 = 00:00:07.956 = 100%
    
    Z:\>timer bsc.exe e e9 nul -Gpm6
    
    Timer 3.01  Copyright (c) 2002-2003 Igor Pavlov  2003-07-10
    This is bsc, Block Sorting Compressor. Version 3.1.0. 8 July 2012.
    Copyright (c) 2009-2012 Ilya Grebnov <Ilya.Grebnov@gmail.com>.
    
    e9 compressed 1000000000 into 2752 in 8.143 seconds.
    
    Kernel Time  =     6.817 = 00:00:06.817 =  83%
    User Time    =    49.483 = 00:00:49.483 = 604%
    Process Time =    56.300 = 00:00:56.300 = 687%
    Global Time  =     8.190 = 00:00:08.190 = 100%
    
    Z:\>timer bsc.exe e e9 nul -Gpm7
    
    Timer 3.01  Copyright (c) 2002-2003 Igor Pavlov  2003-07-10
    This is bsc, Block Sorting Compressor. Version 3.1.0. 8 July 2012.
    Copyright (c) 2009-2012 Ilya Grebnov <Ilya.Grebnov@gmail.com>.
    
    e9 compressed 1000000000 into 2926 in 8.175 seconds.
    
    Kernel Time  =     7.628 = 00:00:07.628 =  92%
    User Time    =    50.325 = 00:00:50.325 = 610%
    Process Time =    57.954 = 00:00:57.954 = 703%
    Global Time  =     8.237 = 00:00:08.237 = 100%
    
    Z:\>timer bsc.exe e e9 nul -Gpm8
    
    Timer 3.01  Copyright (c) 2002-2003 Igor Pavlov  2003-07-10
    This is bsc, Block Sorting Compressor. Version 3.1.0. 8 July 2012.
    Copyright (c) 2009-2012 Ilya Grebnov <Ilya.Grebnov@gmail.com>.
    
    e9 compressed 1000000000 into 3554 in 9.282 seconds.
    
    Kernel Time  =     8.954 = 00:00:08.954 =  95%
    User Time    =    57.361 = 00:00:57.361 = 613%
    Process Time =    66.316 = 00:01:06.316 = 709%
    Global Time  =     9.345 = 00:00:09.345 = 100%


    my bsc_cuda binary seems to give very little gain.
    my gain is 1.25x, your gain is 1.6x. it's the best you can get without further BSC optimization
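
The gains quoted above are plain ratios of CPU-only to GPU-assisted wall-clock time. As a quick check using the timings posted in this thread (`speedup` is a hypothetical helper):

```python
def speedup(cpu_seconds, gpu_seconds):
    """How many times faster the GPU-assisted run is than the CPU-only run."""
    return cpu_seconds / gpu_seconds


if __name__ == "__main__":
    # -pm5 versus -Gpm5 on enwik9, timings as reported by bsc itself.
    print("i7-4770 + 560Ti : %.2fx" % speedup(10.031, 7.894))
    print("W3503 + K20c    : %.2fx" % speedup(84.637, 51.951))
```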
