This is interesting:

nvJPEG is up to twice as fast as libjpeg-turbo on a Skylake CPU with hyperthreading off. I'm not sure what the significance of turning off HT is. I've heard that with more than two cores, hyperthreading actually slows down most workloads, but I've never looked into it. The CPU they use is a monster. If I understand correctly, what they're casually calling a "Skylake 6140" is not a desktop CPU, as I first assumed – it's one of the new Skylake Xeon server chips, a "Gold" model with 18 cores. With hyperthreading enabled, it would have 36 threads.

The GPU is also a monster: a Tesla V100. I would expect more than a 2X improvement in decode speed from such a beast, but since they're comparing it to libjpeg-turbo on a massive 18-core Xeon CPU, maybe 2X isn't so shabby. (libjpeg-turbo uses a lot of SIMD.) nvJPEG uses both the GPU and the CPU. This reinforces my view that CPU-only compression codecs are a waste of time, at multiple levels of analysis. At this point in history, all new image formats should be designed to leverage GPUs.