Thread: New nvJPEG decoder from Nvidia

  1. #1
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    245
    Thanks
    100
    Thanked 48 Times in 32 Posts

    New nvJPEG decoder from Nvidia

    This is interesting: https://developer.nvidia.com/nvjpeg

    It's up to twice as fast as libjpeg-turbo on a Skylake CPU with hyperthreading off. I'm not sure what the significance of turning off HT is. I've heard that if you have more than two cores hyperthreading actually slows down most workloads, but I've never looked into it. The CPU they use is a monster. If I understand correctly, what they're casually calling a Skylake 6140 is not a desktop CPU as I first assumed – it's one of the new Skylake Xeon server chips, a "Gold" model, with 18 cores. Were hyperthreading enabled, it would have 36 threads. https://en.wikichip.org/wiki/intel/xeon_gold/6140

    The GPU is also a monster, a Tesla V100. I would expect more than a 2X improvement in decode speed with such a beast, but since they're comparing it to libjpeg-turbo on a massive 18-core Xeon CPU, maybe 2X isn't so shabby. (libjpeg-turbo uses a lot of SIMD.) nvJPEG uses both the GPU and CPU. This reinforces my view that CPU-only compression codecs are a waste of time, at multiple levels of analysis. At this point in history, all new image formats should leverage GPUs.
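    For the curious, here's roughly what a minimal single-image decode looks like with the nvJPEG C API, going by Nvidia's docs. This is a sketch only: error checking is omitted, and the exact enums and signatures may differ slightly between CUDA toolkit versions.

    // Minimal nvJPEG single-image decode sketch.
    // Build (roughly): nvcc decode.cu -lnvjpeg
    #include <nvjpeg.h>
    #include <cuda_runtime.h>
    #include <cstdio>
    #include <vector>

    int main(int argc, char **argv) {
        if (argc < 2) return 1;

        // Read the JPEG file into host memory.
        FILE *f = fopen(argv[1], "rb");
        fseek(f, 0, SEEK_END);
        long size = ftell(f);
        fseek(f, 0, SEEK_SET);
        std::vector<unsigned char> jpeg(size);
        fread(jpeg.data(), 1, size, f);
        fclose(f);

        nvjpegHandle_t handle;
        nvjpegJpegState_t state;
        nvjpegCreateSimple(&handle);
        nvjpegJpegStateCreate(handle, &state);

        // Query image geometry so we can size the output buffer.
        int nComponents;
        nvjpegChromaSubsampling_t subsampling;
        int widths[NVJPEG_MAX_COMPONENT], heights[NVJPEG_MAX_COMPONENT];
        nvjpegGetImageInfo(handle, jpeg.data(), jpeg.size(),
                           &nComponents, &subsampling, widths, heights);

        // One interleaved RGB surface on the device.
        nvjpegImage_t out = {};
        cudaMalloc((void **)&out.channel[0], (size_t)widths[0] * heights[0] * 3);
        out.pitch[0] = (size_t)widths[0] * 3;

        // Huffman decoding happens on the CPU; the IDCT, upsampling, and color
        // conversion run on the GPU, which is where the speedup comes from.
        nvjpegDecode(handle, state, jpeg.data(), jpeg.size(),
                     NVJPEG_OUTPUT_RGBI, &out, /*stream=*/0);
        cudaDeviceSynchronize();

        printf("decoded %dx%d, %d components\n", widths[0], heights[0], nComponents);

        cudaFree(out.channel[0]);
        nvjpegJpegStateDestroy(state);
        nvjpegDestroy(handle);
        return 0;
    }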

  2. #2
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    767
    Thanks
    217
    Thanked 286 Times in 168 Posts
    Which applications benefit most from faster JPEG decoding times?

  3. #3
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    245
    Thanks
    100
    Thanked 48 Times in 32 Posts
    Quote Originally Posted by Jyrki Alakuijala View Post
    Which applications benefit most from faster JPEG decoding times?
    Computer vision, deep learning, machine learning, and what some people are prematurely calling "AI". Nvidia has more context here: https://news.developer.nvidia.com/an...nvidia-nvjpeg/

    DALI relies on nvJPEG.

    And regular users would benefit somewhat. I think an Instant Computing (and secure-by-design) OS is long overdue. Almost everything we do on a computer or phone should happen instantly, but instead we're constantly waiting for computers to fully respond, to take input, etc., even with SSDs. I want an OS where things like app launch (with full readiness) are guaranteed to take 100 ms, maybe 200 ms. Everything except multimedia transcoding and the like should also happen instantly. So I want the fastest possible image decode, single image and bulk, to honor the guarantees of an Instant OS. I don't think we should ever stop trying to get better, faster, and more energy efficient.

  4. #4
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    245
    Thanks
    100
    Thanked 48 Times in 32 Posts

    MS version

    By the way, Microsoft did something similar in 2013 for Internet Explorer 11. They saw about a 45% drop in JPEG decode time by giving some of the work to the GPU, and they don't seem to be tied to any brand of GPU:

    IE11 decodes the JPG image into the chroma subsampled YCbCr color space on the CPU, but then does the chroma upsampling and YCbCr to RGB color conversion steps on the GPU at draw time, where it can happen much faster and in parallel. This process frees CPU time to perform other operations, as the CPU is a common bottleneck in modern sites and apps. In addition to the decode time improvements, copying the much smaller YCbCr image to the GPU reduces the amount of memory that is copied and stored on the GPU (a limited resource). Using less CPU and memory also reduces power consumption and increases data locality.
    I don't know if these optimizations were carried over to Microsoft Edge, but I'd bet they were.

    One thing that holds me back from using JPEG-XR is that, to my knowledge, Microsoft has not implemented similar hardware acceleration for that format in IE11 or Edge. I know that if I send JPEGs to an IE11 (and probably Edge) user, they'll decode faster than in any other browser. But I assume JPEG-XR will take longer to decode, since Microsoft has been so lazy with that format: they don't seem to have even optimized the standard CPU-based decoder, much less offered a hardware-accelerated one in their browsers. Since no other browser supports JPEG-XR and Microsoft doesn't talk about it much, using it just seems like a needless risk and a source of glitches.

    I feel similarly about WebP, particularly given the profound lack of valid performance and battery-use data from Google on the format. And I won't touch JPEG 2000, since Apple is completely silent on it. So JPEG is still the default, and improvements to JPEG decoding speed are great to see.
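    The approach Microsoft describes boils down to a tiny per-pixel kernel at the end of the pipeline. Here's a rough CUDA sketch of that final step, assuming 4:2:0 subsampling and the usual full-range BT.601 coefficients (IE11 would of course do this in a Direct3D pixel shader at draw time, not in CUDA):

    // Nearest-neighbor chroma upsampling + BT.601 YCbCr -> RGB, one thread per pixel.
    // This is the work that gets moved off the CPU: only the small Y/Cb/Cr planes
    // are uploaded, and the full-size RGB image never touches system memory.
    __global__ void ycbcr420_to_rgb(const unsigned char *Y, const unsigned char *Cb,
                                    const unsigned char *Cr, unsigned char *rgb,
                                    int width, int height, int yPitch, int cPitch) {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= width || y >= height) return;

        // 4:2:0 subsampling: one chroma sample covers a 2x2 block of luma samples.
        float yv = Y[y * yPitch + x];
        float cb = Cb[(y / 2) * cPitch + (x / 2)] - 128.0f;
        float cr = Cr[(y / 2) * cPitch + (x / 2)] - 128.0f;

        float r = yv + 1.402f * cr;
        float g = yv - 0.344136f * cb - 0.714136f * cr;
        float b = yv + 1.772f * cb;

        unsigned char *p = rgb + ((size_t)y * width + x) * 3;
        p[0] = (unsigned char)fminf(fmaxf(r, 0.0f), 255.0f);
        p[1] = (unsigned char)fminf(fmaxf(g, 0.0f), 255.0f);
        p[2] = (unsigned char)fminf(fmaxf(b, 0.0f), 255.0f);
    }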

  5. #5
    Member
    Join Date
    Oct 2009
    Location
    usa
    Posts
    59
    Thanks
    1
    Thanked 9 Times in 6 Posts
    With such CPU and GPU power, why bother with decoding JPEG at all? Just use raw, uncompressed images. We need to move on from lossy compression at this point. And JPEG is 25-year-old tech now... let's at least base the codec on H.265?

  6. #6
    Member
    Join Date
    Aug 2010
    Location
    Seattle, WA
    Posts
    79
    Thanks
    6
    Thanked 67 Times in 27 Posts
    Quote Originally Posted by SolidComp View Post
    This reinforces my view that CPU-only compression codecs are a waste of time, at multiple levels of analysis. At this point in history, all new image formats should leverage GPUs.
    There's still plenty of need for pure CPU-based codecs. Some products (like triple-A games rendering at 4K resolution) already use the entire GPU for rendering, with little if any GPU power left to spare. But they still have one or more CPUs just sitting there, mostly idling, because modern software is hard to parallelize well.

    Also, anything that uses the GPU has to talk to a driver via an API. As soon as you introduce this, you're up against driver bugs, platform dependencies, and a wide range of different APIs with different capabilities. With a CPU codec, by comparison, there is no API to worry about: it's just some code. A well-vectorized and multithreaded CPU codec can be extremely fast.
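    As a rough illustration of that last point, here's a sketch of batch JPEG decoding on the CPU with libjpeg-turbo's TurboJPEG C API, one handle per worker thread since a handle shouldn't be shared between threads. Treat it as a sketch and check the calls against turbojpeg.h:

    // Batch JPEG decode on the CPU, one TurboJPEG handle per worker thread.
    // Build (Linux): g++ -O2 -pthread batch_decode.cpp -lturbojpeg
    #include <turbojpeg.h>
    #include <algorithm>
    #include <thread>
    #include <vector>

    struct JpegBlob { const unsigned char *data; unsigned long size; };

    static void decode_range(const std::vector<JpegBlob> &jobs, size_t begin, size_t end) {
        tjhandle tj = tjInitDecompress();
        for (size_t i = begin; i < end; ++i) {
            int w, h, subsamp, colorspace;
            if (tjDecompressHeader3(tj, jobs[i].data, jobs[i].size,
                                    &w, &h, &subsamp, &colorspace) != 0)
                continue;  // skip unreadable input
            std::vector<unsigned char> rgb((size_t)w * h * 3);
            tjDecompress2(tj, jobs[i].data, jobs[i].size, rgb.data(),
                          w, /*pitch=*/0, h, TJPF_RGB, TJFLAG_FASTDCT);
            // ... hand the RGB buffer to whatever consumes it ...
        }
        tjDestroy(tj);
    }

    void decode_batch(const std::vector<JpegBlob> &jobs) {
        unsigned n = std::max(1u, std::thread::hardware_concurrency());
        size_t per = (jobs.size() + n - 1) / n;
        std::vector<std::thread> pool;
        for (unsigned t = 0; t < n; ++t) {
            size_t b = t * per, e = std::min(jobs.size(), b + per);
            if (b < e) pool.emplace_back([&jobs, b, e] { decode_range(jobs, b, e); });
        }
        for (auto &th : pool) th.join();
    }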

  7. #7
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    245
    Thanks
    100
    Thanked 48 Times in 32 Posts
    Quote Originally Posted by rgeldreich View Post
    There's still plenty of need for pure CPU-based codecs. Some products (like triple-A games rendering at 4K resolution) already use the entire GPU for rendering, with little if any GPU power left to spare. But they still have one or more CPUs just sitting there, mostly idling, because modern software is hard to parallelize well.

    Also, anything that uses the GPU has to talk to a driver via an API. As soon as you introduce this, you're up against driver bugs, platform dependencies, and a wide range of different APIs with different capabilities. With a CPU codec, by comparison, there is no API to worry about: it's just some code. A well-vectorized and multithreaded CPU codec can be extremely fast.
    I think the driver and API issues will be greatly simplified by SPIR-V. OpenCL and other languages will compile to SPIR-V and GPUs will consume it.

  8. #8
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    895
    Thanks
    54
    Thanked 109 Times in 86 Posts
    Quote Originally Posted by SolidComp View Post
    This is interesting: https://developer.nvidia.com/nvjpeg

    I'm not sure what the significance of turning off HT is. I've heard that if you have more than two cores hyperthreading actually slows down most workloads, but I've never looked into it.
    It's not about having more than two cores. If you have enough physical cores to satisfy the software's CPU-heavy threads, adding HT will slow you down, because the threads may end up scheduled onto the same physical core instead of each getting its own.

    E.g. on a 2-core CPU with HT (worst case):
    Thread 1 goes to logical core 1, which is physical core 1
    Thread 2 goes to logical core 2, which is also physical core 1

    Nothing runs on logical cores 3 and 4, which map to physical core 2. Only one physical core is in use, and the two threads have to fight over it.

    Without HT:
    Thread 1 goes to physical core 1
    Thread 2 goes to physical core 2

    Both cores are running, and no thread has to fight for a core's sub-resources.

    It's easy to test, and the result can easily be a 15% performance increase from disabling HT on a load with a low number of CPU-heavy threads.
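    It's easy to reproduce, too: pin two busy threads first to two logical cores that share a physical core, then to two separate physical cores, and compare the times. A rough Linux-only sketch (core numbering is machine-specific, so check lscpu or /proc/cpuinfo for which logical cores are siblings):

    // Pin two CPU-heavy threads to chosen logical cores and time them (Linux only).
    // Usage: ./a.out <logical core A> <logical core B>
    // Build: g++ -O2 -pthread ht_test.cpp
    #ifndef _GNU_SOURCE
    #define _GNU_SOURCE
    #endif
    #include <pthread.h>
    #include <chrono>
    #include <cstdio>
    #include <cstdlib>
    #include <thread>

    static volatile unsigned long long g_sink;

    static void busy_work(int logical_core) {
        // Pin the current thread to one logical core.
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(logical_core, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

        // Four independent multiply chains, so two hyperthreads sharing one
        // physical core end up competing for the same execution ports.
        unsigned long long a = 1, b = 2, c = 3, d = 4;
        for (long i = 0; i < 1000000000L; ++i) {
            a = a * 6364136223846793005ULL + 1442695040888963407ULL;
            b = b * 6364136223846793005ULL + 1442695040888963407ULL;
            c = c * 6364136223846793005ULL + 1442695040888963407ULL;
            d = d * 6364136223846793005ULL + 1442695040888963407ULL;
        }
        g_sink = a ^ b ^ c ^ d;  // keep the loop from being optimized away
    }

    int main(int argc, char **argv) {
        int coreA = argc > 1 ? atoi(argv[1]) : 0;
        int coreB = argc > 2 ? atoi(argv[2]) : 1;
        auto t0 = std::chrono::steady_clock::now();
        std::thread a(busy_work, coreA), b(busy_work, coreB);
        a.join(); b.join();
        double secs = std::chrono::duration<double>(
            std::chrono::steady_clock::now() - t0).count();
        printf("cores %d+%d: %.2f s\n", coreA, coreB, secs);
        return 0;
    }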

