Results 1 to 4 of 4

Thread: A paper on GPGPU Huffman

  1. #1
    Member m^2's Avatar
    Join Date
    Sep 2008
    Ślůnsk, PL
    Thanked 65 Times in 47 Posts

    A paper on GPGPU Huffman
    c.a. 2x faster than CPU with OpenMP and 4-6x faster than single threaded.
    NVIDIA GeForce GTX 285 with 240 cores at 1.5GHz, Core i7 965 with four cores at 3.2 Ghz
    Last edited by m^2; 14th July 2011 at 23:25.

  2. #2
    Join Date
    Jun 2009
    Kraków, Poland
    Thanked 136 Times in 104 Posts
    Why so poor scaling on multicore CPU? 4 cores/ 8 threads and only a little above 200 % of scaling on quad i7...

  3. #3
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Hessen, Germany
    Thanked 252 Times in 128 Posts
    Nice paper. It gives some interesting and insightful explanations on the seemingly poor GPU scaling:

    Despite the GPU having 60 times the number of cores as our CPU, the differences in throughput between the GPU encoder and the OpenMP encoder are not dramatic. This paradox can be largely resolved by recalling that the architecture of the GPU was developed for the SIMD, single instruction multiple data, programming model while our CPU was developed with MIMD, multiple instruction multiple data, in mind.
    The control unit broadcasts an instruction to all the cores, and optimal performance can only be achieved when every core can execute it. If, for example, the instruction is a branching statement, then there is a likelihood that some cores will not follow the jump, and in this case, some cores must remain inactive until they either themselves satisfy the branching instruction or control passes beyond the branching sections of the code. Therefore, in the worst case, when only one core can satisfy the jump and the other seven are left idle, our GPU behaves more like a crippled 30 core shared memory MIMD machine with a slow clock speed and no automatic memory caching.

    Quote Originally Posted by Piotr Tarsa View Post
    Why so poor scaling on multicore CPU? 4 cores/ 8 threads and only a little above 200 % of scaling on quad i7...
    This is mysterious indeed. The decoding performs well (~40 MB/s=>~150 MB/s with OpenMP), so I guess they're having some problems with the encoder which is strange because by splitting data into blocks, the expected 300% speedup should be no problem. Another strange thing is that encoding is much faster then decoding - is this common for Huffman?
    Last edited by schnaader; 15th July 2011 at 01:44.
    Damn kids. They're all alike.

  4. #4
    Join Date
    Sep 2008
    Thanked 280 Times in 120 Posts
    It's common for Huffman Decoding to be slower than Encoding, but it does not need to be *that* dramatic.
    I would say that beyond a 20% speed difference, there is certainly some missing optimisation.

    Not sure if the implementation they compare GPGPU with is properly done.

    For reference :

Similar Threads

  1. Huffman code generator
    By Shelwien in forum Data Compression
    Replies: 2
    Last Post: 24th May 2011, 03:50
    By biject.bwts in forum Data Compression
    Replies: 20
    Last Post: 4th September 2009, 22:10
  3. huffman's Coding
    By swapy in forum Data Compression
    Replies: 5
    Last Post: 12th August 2009, 23:51
  4. Advanced Huffman Encoding
    By Simon Berger in forum Data Compression
    Replies: 28
    Last Post: 15th April 2009, 15:24
  5. Original paper on LZMW
    By encode in forum Forum Archive
    Replies: 3
    Last Post: 9th February 2008, 14:34

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts