
Thread: CUDA Technology of nVIDIA for Compression?

  1. #1
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    874
    Thanks
    464
    Thanked 175 Times in 85 Posts

CUDA Technology of nVIDIA for Compression?

    Hi there.

nVIDIA advertises its CUDA technology, which can be used on any of its
newer graphics cards. Is it also applicable to compression?

    What is CUDA technology?

NVIDIA's CUDA technology is the world's only C language environment for the
GPU. It enables programmers and developers to create software for quickly
solving complicated computation problems using the GPU's multicore parallel
processing capabilities. To date, NVIDIA has shipped over 80 million CUDA-
enabled GeForce 8 Series and higher GPUs, the largest installed base of
general-purpose, parallel-computing processors ever created. The latest
generation of NVIDIA GeForce GPUs (GTX 260, GTX 280) offers up to 240
processor cores (older ones like the GeForce 8800 GT have 112 streaming
cores), compared to a maximum of four cores on the highest-end
CPUs. Any process that can be divided into multiple elements and run in
parallel can be programmed to take advantage of the massive processing
potential of the GPU.

    What do you think?

  2. #2
    Member
    Join Date
    Aug 2008
    Posts
    5
    Thanks
    0
    Thanked 0 Times in 0 Posts
AFAIK CUDA is mostly for applications that can make good use of massive parallelization, so I don't think compression would benefit from it.

  3. #3
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    I can't think of any good use, the best being making huge neural networks for PAQ.

  4. #4
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    874
    Thanks
    464
    Thanked 175 Times in 85 Posts
I know what CUDA is designed for, but what are up to 240 cores worth compared to 4? How would an algorithm like LZMA benefit from that? I think the idea is very cool and will lead to more and more multithreaded applications.
And maybe one day also to CUDA-enabled compression applications
(just to decrease the time for e.g. server backups).

  5. #5
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by Stephan Busch View Post
I know what CUDA is designed for, but what are up to 240 cores worth compared to 4? How would an algorithm like LZMA benefit from that? I think the idea is very cool and will lead to more and more multithreaded applications.
And maybe one day also to CUDA-enabled compression applications
(just to decrease the time for e.g. server backups).
    There are problems...

GPUs have very poor integer performance, and compression is almost all about ints (right? It's just my guess, but most of you would know).
Maybe some lossy algorithms?

The only generic way of multithreading compression I've heard about (I heard about it here) is 4x4-like splitting into separate streams. Don't you think that having 240 streams compressed independently would hurt the compression ratio?
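The ratio cost of independent streams is easy to demonstrate. A rough sketch, with stdlib zlib standing in for whatever codec the 240 hypothetical threads would run: redundancy that crosses chunk boundaries is exactly what independent streams cannot exploit.

```python
import zlib

# Highly repetitive sample data; the redundancy spans the whole buffer.
data = b"the quick brown fox jumps over the lazy dog. " * 400

# One stream: the whole buffer compressed together.
whole = len(zlib.compress(data, 9))

# 240 independent streams, one per hypothetical GPU thread.
n = 240
chunk_size = -(-len(data) // n)  # ceiling division
chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
split = sum(len(zlib.compress(c, 9)) for c in chunks)

# Each tiny chunk restarts the model and pays per-stream overhead,
# so the split total comes out much larger than the single stream.
print(whole, split)
```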

Otherwise you can do only certain tasks with multiple threads, so the gain is limited. I can imagine an algorithm that does some additional processing just because it has the power, though. Maybe compressing with several different algorithms simultaneously and choosing the best result?
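The "try several codecs at once, keep the winner" idea can be sketched in a few lines. This is a toy illustration with stdlib codecs (zlib, bz2, lzma) standing in for whatever algorithms a GPU build would actually run; the one-byte codec tag is an invented convention for this sketch.

```python
import bz2
import lzma
import zlib
from concurrent.futures import ThreadPoolExecutor

# Candidate codecs, each tagged with a one-byte id for the output header.
CODECS = {
    b"Z": lambda d: zlib.compress(d, 9),
    b"B": bz2.compress,
    b"X": lzma.compress,
}

def compress_best(data: bytes) -> bytes:
    """Run every codec in parallel and keep the smallest result."""
    with ThreadPoolExecutor() as pool:
        futures = [(tag, pool.submit(fn, data)) for tag, fn in CODECS.items()]
        results = {tag: fut.result() for tag, fut in futures}
    tag, best = min(results.items(), key=lambda kv: len(kv[1]))
    return tag + best

def decompress(blob: bytes) -> bytes:
    decoders = {b"Z": zlib.decompress, b"B": bz2.decompress,
                b"X": lzma.decompress}
    return decoders[blob[:1]](blob[1:])

sample = b"abcabcabc" * 1000
packed = compress_best(sample)
assert decompress(packed) == sample
```

With only spare cores doing the extra attempts, the redundant work costs nothing in wall-clock time.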

  6. #6
    Member
    Join Date
    May 2008
    Location
    Kuwait
    Posts
    324
    Thanks
    29
    Thanked 36 Times in 21 Posts
If it's like threads, then any program can benefit, for example compressing 100 JPGs with PackJPG all at one time, or with Precomp or PAQ.

Let's consider PAQ compressing a tar of different formats: it needs to get the streams (tiff, bmp+pgm, jpg, txt, exe+dll) so it can compress these five streams together and then dump the total as one file. So I think it would help compression speed at some stage.
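The per-stream idea above can be sketched with a thread pool. This is a rough illustration only: zlib stands in for the real per-format compressors (PackJPG etc.), and the five "streams" are fabricated in-memory buffers, not a real tar parser.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

# Five stand-in streams, as if pulled out of a tar by a format detector.
streams = {name: (name.encode() * 5000)
           for name in ("tiff", "bmp", "jpg", "txt", "exe")}

def compress_stream(item):
    """Compress one stream; zlib stands in for a per-format codec."""
    name, data = item
    return name, zlib.compress(data, 9)

# Compress all five streams at the same time, then dump the results
# as one archive mapping.
with ThreadPoolExecutor(max_workers=len(streams)) as pool:
    archive = dict(pool.map(compress_stream, streams.items()))

for name, blob in archive.items():
    assert zlib.decompress(blob) == streams[name]
```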

  7. #7
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by maadjordan View Post
If it's like threads, then any program can benefit, for example compressing 100 JPGs with PackJPG all at one time, or with Precomp or PAQ.

Let's consider PAQ compressing a tar of different formats: it needs to get the streams (tiff, bmp+pgm, jpg, txt, exe+dll) so it can compress these five streams together and then dump the total as one file. So I think it would help compression speed at some stage.
If you split on a per-file basis, your performance is limited by the slowest file to be encoded: with 99 tiny website images plus one 100 MB image, you're going to get much worse performance than with the CPU. Maybe the compressor should use both CPU and GPU? The simple approach would waste a lot of processing power, but it should speed things up...

  8. #8
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    874
    Thanks
    464
    Thanked 175 Times in 85 Posts
Ok, we agree that splitting into too many streams will hurt compression, but it could be used for preprocessing (like the Precomp or PackJPG use suggested by maadjordan), or for compressing with several different algorithms simultaneously and choosing the best result, as m^2 suggested. I think combined use of GPU and CPU might be the best way to get speed and also
take advantage of the additional compression power. Even if it were just the
zLib that is used today in most hard disk backup solutions..

Pegasys Inc's TMPGEnc 4 XPress beta 4.6.0 uses this technology for video
conversion, and they claim up to a 446% speed improvement.
PackJPG is a very good example: since all JPEGs are compressed separately,
every JPEG could get its own thread. The NVIDIA Tesla board, for example,
is advertised with "Achieve the highest floating point performance from a
single chip, while meeting the precision requirements of your application."
(They are referring to IEEE 754 single & double precision floating point units,
for applications such as hexahedral meshes that make heavy use of floating point.)
I don't know if the integer performance is much lower than CPU integer performance, but still, it is an incredible way into the future and a great solution -
hopefully also for the compression scene.

  9. #9
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    859
    Thanks
    451
    Thanked 255 Times in 103 Posts
DirectX 11 will bring all this GPU power under a standard API. Those interested will be able to test it starting November 2008.
I guess that by that time CUDA will become obsolete, not because it is bad, but because it is NVIDIA-only.

Anyway, CUDA or DirectX 11, it does not change much about the conclusions discussed in this thread.

  10. #10
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by Cyan View Post
DirectX 11 will bring all this GPU power under a standard API. Those interested will be able to test it starting November 2008.
I guess that by that time CUDA will become obsolete, not because it is bad, but because it is NVIDIA-only.

Anyway, CUDA or DirectX 11, it does not change much about the conclusions discussed in this thread.
We're going to wait several years before most home computers support this.
I think OpenMP is a better way to go, as it's available already and cross-platform.

  11. #11
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,239
    Thanks
    192
    Thanked 968 Times in 501 Posts
1. I played with CUDA on my 8800GT and don't see any problems with
integer performance there. The bottleneck there, IMHO, is memory
access, because there are no caches - just a lot of registers (8k) and
slow memory (100x slower than registers).

2. http://ctxmodel.net/rem.pl?-7

3. To clarify [2], I can add that a BWT implementation for the GPU is
surely possible (though there won't be any dramatic speed
improvement - a factor of a few might be possible; also there will
be faster GPUs eventually).
Also, most CM compression algorithms can be easily threaded,
as there are multiple independent chains of counter updates and the like.

4. It's really hard to think of a useful GPU application for decompression.

  12. #12
    Member chornobyl's Avatar
    Join Date
    May 2008
    Location
    ua/kiev
    Posts
    153
    Thanks
    0
    Thanked 0 Times in 0 Posts
What about using this for video compression?
Many 8x8 DCTs done on 240 cores.
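The 8x8 DCT really is the natural per-core unit of work here: each block is independent, so hundreds can run at once. A naive pure-Python reference of the 2-D DCT-II on one block (a GPU would run one of these per thread, and in practice a separable/fast variant):

```python
import math

def dct_8x8(block):
    """Naive 2-D DCT-II on an 8x8 block of samples."""
    N = 8
    def c(k):  # orthonormal scale factors
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out

# A flat block puts all its energy into the single DC coefficient.
flat = [[100.0] * 8 for _ in range(8)]
coeffs = dct_8x8(flat)
```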

  13. #13
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    551
    Thanks
    206
    Thanked 182 Times in 87 Posts
    Quote Originally Posted by chornobyl View Post
What about using this for video compression?
Many 8x8 DCTs done on 240 cores.
This is already done. Have a look at the BadaBoom media converter (main page). It uses NVIDIA CUDA to transcode H.264 HD videos in realtime (~30 fps) or faster (some sources say 60-70 fps).

    Quote Originally Posted by maadjordan View Post
If it's like threads, then any program can benefit, for example compressing 100 JPGs with PackJPG all at one time, or with Precomp or PAQ.

Let's consider PAQ compressing a tar of different formats: it needs to get the streams (tiff, bmp+pgm, jpg, txt, exe+dll) so it can compress these five streams together and then dump the total as one file. So I think it would help compression speed at some stage.
I don't think I'll use CUDA for Precomp, but multithreading is on my todo list and should be very useful for Precomp. Splitting into threads will be done with some kind of job queue, I think. A job will be either a stream de-/recompression using one of the 81 zLib parameter combinations or a specific task like a PackJPG call.
    http://schnaader.info
    Damn kids. They're all alike.
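The job-queue approach described above can be sketched with stdlib threads. This is a hypothetical illustration, not Precomp's actual code: zlib at varying levels stands in for the real jobs (trying one of the 81 zLib parameter combinations, or a PackJPG call).

```python
import queue
import threading
import zlib

jobs = queue.Queue()
results = {}
lock = threading.Lock()

def worker():
    """Pull jobs until the None sentinel arrives; store each result."""
    while True:
        item = jobs.get()
        if item is None:
            jobs.task_done()
            return
        job_id, data, level = item
        out = zlib.compress(data, level)
        with lock:
            results[job_id] = out
        jobs.task_done()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()

payload = b"example stream " * 1000
for i, level in enumerate(range(1, 10)):  # one job per zlib level 1..9
    jobs.put((i, payload, level))
for _ in threads:                         # one sentinel per worker
    jobs.put(None)
jobs.join()
for t in threads:
    t.join()

assert all(zlib.decompress(results[i]) == payload for i in range(9))
```

The queue decouples job discovery from execution, so slow jobs (a big PackJPG call) don't stall the scheduling of small ones.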

  14. #14
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    870
    Thanks
    47
    Thanked 105 Times in 83 Posts
Multithreading in Precomp is very welcome.


Regarding BadaBoom:
BTW, a single Q9300 @ 3 GHz doing H.264 high profile encoding is already beyond realtime encoding speed.

And BadaBoom uses only the baseline profile.. I don't see the big benefit in BadaBoom's software. However, it's a step in the right direction.

