
Thread: Did anyone combine GPGPU and CM?

  1. #1
    Member (Kraków, Poland)

    Did anyone combine GPGPU and CM?

    I have thought of a way to utilize a GPGPU, e.g. having a complex, delayed model computed on the GPU and a simple, instant model computed on the CPU.

    For example, consider the following scheme (substeps are done in parallel):
    1. Code the first megabyte using the instant model.
    2a. Code the second megabyte using the instant model.
    2b. Feed the first megabyte to the delayed model.
    3a. Code the third megabyte using the instant model and the delayed model (delayed model fed with the first megabyte).
    3b. Feed the second megabyte to the delayed model.
    4a. Code the fourth megabyte using the instant model and the delayed model (delayed model fed with the first and second megabytes).
    4b. Feed the third megabyte to the delayed model.
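
    Here's a minimal sketch of the pipelining I have in mind (everything is hypothetical - InstantModel, DelayedModel and DelayedState are just placeholders, and the delayed model would need snapshotted/double-buffered state so it can be read while being fed):

    #include <thread>
    #include <vector>
    #include <cstdint>

    struct DelayedState { /* snapshot of the delayed model's statistics */ };
    struct InstantModel {
        // codes a chunk, consulting a (two chunks stale) delayed-model snapshot
        void encode(const std::vector<uint8_t>&, const DelayedState&) {}
    };
    struct DelayedModel {
        DelayedState snapshot() const { return {}; } // state after chunks fed so far
        void feed(const std::vector<uint8_t>&) {}    // expensive (GPU-side) training
    };

    void encodePipelined(const std::vector<std::vector<uint8_t>>& chunks) {
        InstantModel instant;
        DelayedModel delayed;
        for (size_t i = 0; i < chunks.size(); ++i) {
            DelayedState snap = delayed.snapshot(); // step Na reads only this snapshot
            std::thread feeder;
            if (i > 0)                              // step Nb: feed chunk i-1 ...
                feeder = std::thread([&] { delayed.feed(chunks[i - 1]); });
            instant.encode(chunks[i], snap);        // ... while coding chunk i
            if (feeder.joinable()) feeder.join();
        }
    }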

    I have no idea how to make the delayed model itself, however.


    Does anyone have any ideas?

  2. #2
    Bulat Ziganshin (Programmer, Uzbekistan)
    The next idea is to break ALL the input data into N big chunks and encode each chunk independently. That's how nanozip -co works (while -cO compresses in one chunk).
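
    For example, a rough sketch (compressChunk() here is a stand-in for any real codec):

    #include <algorithm>
    #include <cstdint>
    #include <thread>
    #include <vector>

    // stand-in for any real codec working on an independent chunk
    std::vector<uint8_t> compressChunk(const uint8_t* p, size_t n) {
        return std::vector<uint8_t>(p, p + n);  // identity "codec" for the sketch
    }

    // split the input into nThreads big chunks and compress each independently;
    // costs some ratio (chunks share no statistics) but scales almost linearly
    std::vector<std::vector<uint8_t>> compressParallel(
            const std::vector<uint8_t>& in, unsigned nThreads) {
        std::vector<std::vector<uint8_t>> out(nThreads);
        std::vector<std::thread> pool;
        size_t step = (in.size() + nThreads - 1) / nThreads;
        for (unsigned t = 0; t < nThreads; ++t)
            pool.emplace_back([&, t] {
                size_t beg = t * step, end = std::min(in.size(), beg + step);
                if (beg < end) out[t] = compressChunk(in.data() + beg, end - beg);
            });
        for (auto& th : pool) th.join();
        return out;
    }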

  3. #3
    Shelwien (Administrator, Kharkov, Ukraine)
    1. Encoding (but not decoding) is easy enough to parallelize, GPU or not.
    Basically it's possible to process each context separately (scan the data, find
    the contexts belonging to the current thread and store the corresponding
    predictions) - see the sketch at the end of this post.

    2. GPU can be used for various data analysis - structure detection, parsing optimization, etc.
    Even brute-force parameter search for lzma recompression :)

    3. With various filters and compression algos I can imagine something like 10 parallel threads even for decoding,
    but that's still not GPU-level threading. And not exactly CM either.

    4. For archiving, fully solid compression rarely makes sense; some kind of segmentation is almost always
    applicable and helpful - even with files like enwik9 it's possible to parse it into a few mostly-unrelated streams
    of data (e.g. dictionary + LIPT output) and compress them separately.
    And with lots of independent segments, parallel (de)compression is obviously not a problem.
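
    Re: point 1, a toy sketch of what I mean (order-1 bytewise, contexts partitioned by thread id; a real coder would be bitwise with hashed contexts):

    #include <cstdint>
    #include <vector>

    // Each thread scans the whole input but models only the order-1 contexts
    // assigned to it, storing a prediction for every position it owns. A given
    // context's positions are visited in file order by exactly one thread, so
    // the stored predictions match what a sequential coder would compute;
    // a final single-threaded pass just feeds prob[] to the range coder.
    void predictPass(const std::vector<uint8_t>& in, unsigned tid,
                     unsigned nThreads, std::vector<double>& prob) {
        std::vector<uint32_t> cnt(256 * 256, 1);  // counts with +1 smoothing
        std::vector<uint32_t> tot(256, 256);
        for (size_t i = 1; i < in.size(); ++i) {  // prob.size() == in.size()
            unsigned c = in[i - 1];               // context = previous byte
            if (c % nThreads != tid) continue;    // owned by another thread
            prob[i] = double(cnt[c * 256 + in[i]]) / tot[c];
            cnt[c * 256 + in[i]]++;               // update after predicting
            tot[c]++;
        }
    }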

  4. #4
    Bulat Ziganshin (Programmer, Uzbekistan)
    Re: 3 and 4 - if you have 10 algos/chunks, one of them may account for 50% of the entire file/CPU time.

  5. #5
    Shelwien (Administrator, Kharkov, Ukraine)
    Not necessarily.
    Secondary modelling in psrc can be as slow as one wants (it's a generic CM).
    The same applies to secondary LZ (I mean compression of patterns in the LZ parsing output).
    Stuff like recompression and (dis)assembling also requires considerable computing resources.

    P.S. I almost forgot:
    atm the main problem with GPGPU compression is the poor error-correction logic in commodity GPUs.
    As we've already seen with bsc 3.0, it's enough to process a few GBs of data for an error to appear.

  6. #6
    Bulat Ziganshin (Programmer, Uzbekistan)
    I thought that I had errors with bsc because my GPU was overclocked. Has anyone else had problems too?

  7. #7
    Member (Kraków, Poland)
    I know that I can parallelize the whole process (i.e. encoding and decoding too) by splitting the file into chunks. That's the obvious way, and I was obviously aware of it. I also know that during encoding we can collect data from the models in parallel with some blockwise pipelining.

    But I had something like PAQ in mind, where training the neural network (or whatever PAQ-like structure is in there) takes a lot of time and is probably well suited to GPGPU. The main problem is that the neural network needs to be updated before encoding each input symbol, so that would make decoding on a GPGPU infeasible (the synchronization overhead would kill performance) - effectively that would mean a decoding speed tens of times slower than the encoding speed. But maybe there's a niche for such a codec.

    I was asking for ideas on a codec with most of the workload parallelized during both encoding and decoding; that's why I made up the scheme with the delayed model. But unfortunately I cannot imagine any working implementation of that idea.

  8. #8
    Bulat Ziganshin (Programmer, Uzbekistan)
    One niche for an asymmetric codec is backups.

  9. #9
    m^2 (Member, Ślůnsk, PL)
    Quote Originally Posted by Bulat Ziganshin View Post
    One niche for an asymmetric codec is backups.
    In the enterprise: not really. Compression has to be fairly fast because the machines are expected to process many TBs/hour...and compression is not the only task these machines have to perform. Buying faster processors that consume more energy to save 0.1% of backup size is just not worth it.
    Small businesses and home users might be different, but they often don't have backups well automated, and there's some operator waiting for the backup to happen. They usually want backups to be fast too.

  10. #10
    Shelwien (Administrator, Kharkov, Ukraine)
    > I know that I can parallelize the whole process (i.e. encoding and
    > decoding too) by splitting the file into chunks.

    But splitting into fixed-size chunks is not the only way.
    As I said, we can run a segmentation algo first, so that
    separate compression of the segments won't hurt overall compression.
    Also, frequently it's possible to split the data into multiple streams
    by parsing its structure - see 7z/bcj2 etc.
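
    For instance, a toy version of the bcj2 idea - just the stream split; real bcj2 also converts the relative offsets to absolute addresses and handles E9/jcc:

    #include <cstdint>
    #include <vector>

    // Pull the 4-byte relative offsets of x86 CALL (0xE8) instructions out
    // into their own stream, so each stream gets more uniform statistics and
    // the two can be compressed separately (and in parallel).
    void splitCalls(const std::vector<uint8_t>& in,
                    std::vector<uint8_t>& code, std::vector<uint8_t>& offsets) {
        for (size_t i = 0; i < in.size(); ++i) {
            code.push_back(in[i]);
            if (in[i] == 0xE8 && i + 4 < in.size()) {
                offsets.insert(offsets.end(), in.begin() + i + 1, in.begin() + i + 5);
                i += 4;                    // the offset bytes skip the code stream
            }
        }
    }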

    > That's the obvious way, and I was obviously aware of it.

    Well, I also keep talking about implementing compression with
    lots of async algos, but I guess it seems boring to you somehow,
    although it surely needs a lot of research to reach good results.
    At least, there's always compression of internal data structures
    for large-scale models (window etc), parsing optimization
    (in CM it can be symmetric too), dedup/filters/segmentation,
    and the secondary modelling stage in a parallel rc.

    > But I had something like PAQ in mind, where training the neural
    > network (or whatever PAQ-like structure is in there) takes a lot of time
    > and is probably well suited to GPGPU.

    There's no back-propagation and no layers, so I wouldn't call the context model
    in paq a neural network.

    And anyway, only the mixing and the mixer update can be vectorized there,
    and mixing can't really be async unless you decide to mix millions
    of submodels.
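
    To show what I mean by vectorizable - a paq-style logistic mixer sketch (the two loops are data-parallel across submodels, so they map directly to SIMD; the learning rate is illustrative):

    #include <cmath>
    #include <vector>

    float stretch(float p) { return std::log(p / (1.0f - p)); }
    float squash(float x)  { return 1.0f / (1.0f + std::exp(-x)); }

    struct Mixer {
        std::vector<float> w, st;                // weights, stretched inputs
        explicit Mixer(size_t n) : w(n, 0.0f), st(n, 0.0f) {}

        float mix(const std::vector<float>& p) { // p[i] = submodel i's P(bit=1)
            float dot = 0.0f;
            for (size_t i = 0; i < p.size(); ++i) {
                st[i] = stretch(p[i]);
                dot += w[i] * st[i];             // vectorizable dot product
            }
            return squash(dot);
        }
        void update(float pMixed, int bit, float lr = 0.002f) {
            float err = bit - pMixed;            // gradient of the coding cost
            for (size_t i = 0; i < st.size(); ++i)
                w[i] += lr * err * st[i];        // vectorizable weight update
        }
    };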

    One interesting option is speculative update, though.
    Imagine a context with p0=0.1 - we can update it 6 times for bit=1 in advance
    and still have a 53% chance (0.9^6 ≈ 0.53) of having guessed correctly.
    However, the additional storage is a problem (it's required to undo
    the updates in case of error).
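
    A sketch of the speculative update with a plain shift-based counter (the saved copy is exactly the extra storage mentioned above):

    #include <cstdint>

    struct Counter {
        uint16_t p1 = 32768;                 // P(bit==1) scaled to 1/65536
        void update(int bit) {               // ordinary shift update
            if (bit) p1 += (65535 - p1) >> 5;
            else     p1 -= p1 >> 5;
        }
    };

    // apply k updates for the likely bit ahead of time; keep the old state
    // so a mispredicted run can be rolled back
    struct SpecCounter {
        Counter cur, saved;
        void speculate(int likelyBit, int k) {
            saved = cur;                     // the storage needed for the undo
            while (k--) cur.update(likelyBit);
        }
        void resolve(bool allGuessesRight) {
            if (!allGuessesRight) cur = saved;  // undo the speculative updates
        }
    };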

    > The main problem is that the neural network needs to be updated
    > before encoding each input symbol,

    Well, it's not exactly true. Instead, each context has to be updated
    before the same context is encountered again.
    So in theory, it's possible to have a mostly async update thread...
    but synchronization would still be required sometimes, which is very slow on a CPU,
    so unless it's something of paq's complexity, there's likely not much sense in it.
    However, in theory GPUs should be much more sync-friendly...
    Still, the performance bottleneck for CM is certainly not the update.

    > a decoding speed tens of times slower than the encoding speed.

    Remember that for a GPU implementation, decoding would have to be a part
    of the encoding process, to verify that the GPU didn't make an error somewhere.

    Also, current GPUs are not really that parallel - it's more like massive vectorization
    than actual parallel execution, and there aren't that many real independent cores.

  11. #11
    Member (El Salvador)
    Asymmetric coders can also encode faster than they decode; e.g. satellite data is often coded with prime-based error-correction codes, where encoding is easy but decoding requires exhaustive searches in prime-space. There are some curious papers about these kinds of asymmetric coders.

  12. #12
    Member (France)
    Speaking of GPGPU:
    I've been doing a lot of work with OpenGL and textures lately.
    And the more I "combine" textures, the more I feel I'd be better off using a direct data stream within the fragment shaders,
    rather than a converted floating-point value (sometimes interpolated) automatically handled by OpenGL (GL_TEXTURE_2D).

    Does anyone know how to achieve that? (While staying in the OpenGL world, so no OpenCL or DirectX.)

    I've looked around in search of documentation, but most of the "docs" I find are practically encrypted in custom jargon. I sometimes feel that OpenGL snippets of information are meant to separate community members from would-be OpenGL programmers. In short, they don't teach anything; at best, they are memos for people who already know the information.
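
    The closest thing I've dug up so far (and I'm not sure it's the "right" way) is an integer texture (GL_R32UI) read with texelFetch(), which bypasses filtering and normalization entirely; it needs GL 3.0+. A rough sketch:

    #include <GL/glew.h>   // any GL 3.0+ function loader works
    #include <cstdint>
    #include <vector>

    // Upload raw 32-bit words as an *integer* texture; texelFetch() in the
    // shader then returns the exact bits, never filtered or normalized.
    GLuint makeRawTexture(const std::vector<uint32_t>& words, int w, int h) {
        GLuint tex = 0;
        glGenTextures(1, &tex);
        glBindTexture(GL_TEXTURE_2D, tex);
        // NEAREST is mandatory for integer textures - we want exact texels
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_R32UI, w, h, 0,
                     GL_RED_INTEGER, GL_UNSIGNED_INT, words.data());
        return tex;
    }
    // GLSL side: uniform usampler2D data;
    //            uint word = texelFetch(data, ivec2(x, y), 0).r;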

  13. #13
    Member (Bangalore)
    Quote Originally Posted by Shelwien View Post
    > I know that I can parallelize the whole process (i.e. encoding and
    > decoding too) by splitting the file into chunks.

    But splitting into fixed-size chunks is not the only way.
    As I said, we can run a segmentation algo first, so that
    separate compression of the segments won't hurt overall compression.
    Also, frequently it's possible to split the data into multiple streams
    by parsing its structure - see 7z/bcj2 etc.
    I am doing chunk splitting in pcompress, but with deduplication enabled it splits at a content-defined Rabin boundary rather than at a fixed position. As you mention, there are better ways of breaking data into chunks - something that Ocarina Networks (since acquired by Dell) did some time back.
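
    For reference, a toy version of the rolling-hash boundary test (the constants are illustrative, not what pcompress actually uses):

    #include <cstdint>
    #include <vector>

    // Cut wherever the low bits of a rolling hash over the last WIN bytes
    // are zero: boundaries then depend on content rather than position, so
    // inserting data early in a file doesn't shift later chunk boundaries,
    // and dedup still finds the unchanged chunks.
    std::vector<size_t> chunkBoundaries(const std::vector<uint8_t>& in) {
        const size_t WIN = 48;                 // rolling window size
        const uint64_t MASK = (1u << 13) - 1;  // ~8 KB average chunk
        uint64_t pow = 1;                      // 31^WIN mod 2^64
        for (size_t i = 0; i < WIN; ++i) pow *= 31;
        std::vector<size_t> cuts;
        uint64_t h = 0;
        for (size_t i = 0; i < in.size(); ++i) {
            h = h * 31 + in[i];                // byte entering the window
            if (i >= WIN) {
                h -= pow * in[i - WIN];        // byte leaving the window
                if ((h & MASK) == 0) cuts.push_back(i + 1);
            }
        }
        return cuts;
    }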

    Quote Originally Posted by Shelwien View Post
    > a decoding speed tens of times slower than the encoding speed.

    Remember that for a GPU implementation, decoding would have to be a part
    of the encoding process, to verify that the GPU didn't make an error somewhere.
    This depends on the class of GPU. In the case of Nvidia, the Tesla/Fermi series are designed for compute, with stable, correct processing, ECC RAM, etc. On the other hand, if you are using the consumer-grade GeForce series then yes, correctness is not guaranteed.

    Quote Originally Posted by Shelwien View Post
    > Also, current GPUs are not really that parallel - it's more like massive vectorization
    > than actual parallel execution, and there aren't that many real independent cores.
    Yes, the so-called "cores" are in fact just ALUs.

