https://jpeg.org/downloads/htj2k/wg1..._final_cfp.pdf
Anyone interested in working on a faster entropy coder for JPEG 2000 ?
Deadline for registration of interest is October 1.
26 pages...
"On a given platform and with identical decoded image, the HTJ2K codestream should be on average no more than 15% larger than the corresponding JPEG 2000 Part 1 codestream."
"Over a range of bitrates and on a given software platform, the throughput of the HTJ2K block decoder should be on average no less than 10 times greater than the JPEG 2000 Part 1 block decoder of the reference specified in Annex D. Increase of throughput of the HTJ2K block decoder is also desirable on hardware and GPU platforms."
However, in section B.6: "Assembly language or GPU code shall not be included". I bet JPEG 4000 will ask for GPU source code.
And btw, "The submission shall include source code to serve as a verification model, written in a high-level language, such as C or C++" -- I think nowadays C is a relatively low-level language.
This newsgroup is dedicated to image compression:
http://linkedin.com/groups/Image-Compression-3363256
This call resulted from the outcome of the JPEG XS CfP evaluation. We also had a faster JPEG 2000 entropy coder as one candidate (FBCOT) for XS, of course coming from UNSW (David Taubman). It did not enter the JPEG XS standardization basically because it seemed likely that an FPGA implementation of this coder would not fit into the target architecture, so we split this part off. It seems that it will become Part 15 of JPEG 2000.
What is currently on the table is a combination of the MelCode (as in JPEG LS) with RLE coding; there is an SPIE paper by David Taubman on this. The source of the speedup is that the updated entropy coder no longer operates on bitplanes or sub-bitplanes, but on groups of bitplanes, which should hopefully create a decent speedup. I also played with this idea approximately five years ago, where I combined a "horizontal" (over coefficients) Huffman code with a "vertical" (over bitplanes) code to create a code that was almost as good as JPEG 2000 (though not scalable, same as Taubman's FBCOT proposal today) and quite a bit faster. There was unfortunately not enough momentum in the committee to continue with this back then, but now there is.
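To make the group-of-bitplanes idea a bit more concrete, here is a toy sketch (just an illustration, not the actual MelCode/FBCOT coder): per group of four coefficients, a small "vertical" code signals once how many magnitude bitplanes the group needs, and the magnitude bits of each coefficient are then written raw in one operation (the "horizontal" part).

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Toy sketch of "code a group of bitplanes in one go" - NOT the real scheme.
struct BitSink {
    std::vector<uint8_t> bits;                       // one bit per entry, for clarity
    void put(uint32_t x, int n) { for (int i = 0; i < n; ++i) bits.push_back((x >> i) & 1); }
};

static int bit_width(uint32_t x) { int n = 0; while (x) { ++n; x >>= 1; } return n; }

void encode_group(BitSink& out, const int32_t c[4]) {
    uint32_t maxmag = 0;
    for (int i = 0; i < 4; ++i) maxmag = std::max(maxmag, (uint32_t)std::abs(c[i]));
    int planes = bit_width(maxmag);
    out.put((uint32_t)planes, 5);                    // "vertical" part: bitplane count, once per group
    for (int i = 0; i < 4; ++i) {
        out.put((uint32_t)std::abs(c[i]), planes);   // "horizontal" part: all magnitude bits at once
        if (c[i] != 0) out.put(c[i] < 0, 1);         // sign bit only for non-zero coefficients
    }
}

int main() {
    BitSink sink;
    const int32_t group[4] = { 3, -1, 0, 6 };        // e.g. quantized wavelet coefficients
    encode_group(sink, group);                       // emits 5 + 4*3 + 3 = 20 bits for this group
}
```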
I can check whether I still find my paper on this, David will certainly provide his work - you'll find him at the UNSW pages.
Concerning the speedup: at least according to David's estimate, FBCOT can reach approximately 10 times the EBCOT speed, but note that this is *only* the speed of the bitplane coder (Tier 1 of JPEG 2000), not the end-to-end speedup. How much the whole codec improves depends, of course, on the rest of the code. With a pure CPU implementation in C++, this gives an approximate 2 to 3 times end-to-end speedup according to my estimate, but much more if you parallelize the wavelet, the quantizer and the multi-component decorrelation transformation.
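As a rough sanity check on that estimate (back-of-envelope only; the time shares below are guesses, not figures from the call): if the block coder takes a fraction f of the total decode time and is sped up by a factor s, Amdahl's law gives the end-to-end speedup as 1/((1-f) + f/s).

```cpp
#include <cstdio>

// Amdahl's law: end-to-end speedup when a fraction f of the runtime is sped up by s.
static double end_to_end(double f, double s) {
    return 1.0 / ((1.0 - f) + f / s);
}

int main() {
    const double s = 10.0;                       // assumed Tier-1 (block coder) speedup
    for (double f : {0.5, 0.6, 0.7, 0.8}) {      // guessed Tier-1 shares of total decode time
        std::printf("Tier-1 share %2.0f%% -> end-to-end speedup %.1fx\n",
                    100.0 * f, end_to_end(f, s));
    }
}
```

With the block coder at roughly 60-75% of the decode time, a 10x Tier-1 speedup lands right in that 2-3x end-to-end range.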
What we need by October 1st is people raising their hands - "hey, I would like to contribute to this" - and then an optimized implementation (presumably in assembly) available in April next year, which will be run in a test bench. Only the entropy coder is needed, not the full JPEG 2000 pipeline.
The call currently only addresses CPU as architecture. If you want to play with FPGA or GPU and faster coding, then JPEG XS is the right card to play. But there we already have an architecture and a test model (albeit slow, at this moment, as we play with a lot of ideas). Again, there were plenty of papers on this at this year's SPIE (Application of Digital Image Processing XL).
PIK is an option where fast lossy photographic decoding is needed. In software, pik-to-rgb24 decodes at 220 MB/s on a single core and 1.1 GB/s on a multi-core CPU, and it gives really good compression densities for photographs (very likely better than the latest video codecs). Pik-to-rgb48 is ~35% slower. PIK is probably about 5-10% of the complexity of a modern video codec for key-frame decoding. It is based on relatively simple concepts such as entropy clustering, context modeling and psychovisual modeling that we have used in our earlier work on zopfli, WebP lossless, guetzli and brotli.
Thanks for all of this background information, Thomas. Links to papers would be greatly appreciated! From what you've described so far, this sort of implementation may not work on a GPU.
As GPUs have very limited on-chip memory, processing groups of bit planes could in fact be slower than processing one bit plane at a time - unless the entropy LUT is small enough to be put in constant memory, and/or the flags buffer for the different coding passes is small enough to fit in local memory. Good for CPU-based codecs, though.
So, the 10x speedup requirement seems to refer to CPU-based implementations.
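To put the on-chip memory concern in numbers (a back-of-envelope of my own, not a description of any actual FBCOT table): a direct-lookup decode table indexed by all raw bits of a group needs 2^(coefficients x bitplanes) entries, which outgrows a typical constant-memory budget quickly.

```cpp
#include <cstdio>

int main() {
    // Assume a direct-lookup decode table indexed by all raw bits of a group,
    // with 2 bytes per entry (decoded pattern + consumed length) - purely illustrative.
    for (int coeffs = 2; coeffs <= 4; ++coeffs)
        for (int planes = 2; planes <= 4; ++planes) {
            unsigned long long entries = 1ull << (coeffs * planes);
            std::printf("%d coeffs x %d bitplanes -> %llu entries (~%llu bytes)\n",
                        coeffs, planes, entries, entries * 2);
        }
}
```

Already at 4 coefficients x 4 bitplanes that is 64K entries (~128 KB), more than the 64 KB of constant memory on typical GPUs, so the concern seems justified.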
Yes, I agree, there should be more focus on GPU. What is interesting is that 8K and VR image sizes are going to make CPU implementations obsolete - they just cannot keep up with GPUs. So, this proposal would give CPU codecs a few more years of relevance for this use case.
Thomas, can you comment on licensing vs. royalty-free for JPEG XS ? Always an issue for open source hackers like myself.
I'm attaching my SPIE 2012 paper which somehow got the ball rolling. For David's work, please check with him directly:
https://www.engineering.unsw.edu.au/.../david-taubman
The paper is SPIE copyright, though I can certainly provide my papers for others under fair use conditions. David can probably do the same.
That's certainly correct. I asked David the same question - they are currently not targeting GPUs. Looks to me a bit of a loss, but it seems that the market for JPEG 2000 applications does not require GPUs at this moment. JPEG XS is quite a bit different, here we have a very strong focus on GPU and FPGA, and less so for CPU.
JPEG XS uses only unary codes for entropy coding (if you even want to call it that). This can be parallelized on a GPU.
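For reference, this is all a unary code is (a toy sketch, not the actual JPEG XS bitstream layout): value n becomes n one-bits followed by a terminating zero, which maps naturally onto bit-counting instructions and per-lane decoding.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Toy unary coder: value n -> n one-bits followed by a zero.
struct BitWriter {
    uint64_t bits = 0; int n = 0;
    void put(int b) { bits |= (uint64_t)b << n; ++n; }
};
struct BitReader {
    uint64_t bits = 0; int pos = 0;
    int get() { return (int)((bits >> pos++) & 1); }
};

void unary_encode(BitWriter& w, unsigned v) {
    for (unsigned i = 0; i < v; ++i) w.put(1);
    w.put(0);                                    // terminating zero
}
unsigned unary_decode(BitReader& r) {
    unsigned v = 0;
    while (r.get() == 1) ++v;                    // count ones until the zero
    return v;
}

int main() {
    BitWriter w;
    for (unsigned v : {3u, 0u, 5u}) unary_encode(w, v);
    BitReader r;
    r.bits = w.bits;
    assert(unary_decode(r) == 3 && unary_decode(r) == 0 && unary_decode(r) == 5);
}
```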
Yes.
Well, two aspects. First, the ISO aspect: We as working group cannot make statements on royalties and licensing, we are technical experts, not legal experts. In particular, we are not allowed to select technology from licensing conditions.
This being said, we can express "desires" about what we would like to do, but we have no power to enforce them. As I say: "the right way, the wrong way, the ISO way".
For High-Throughput JPEG 2000 (that is the official name), we have a desire to make this a royalty free standard. For JPEG XS, there is no such desire. The market - professional broadcasting applications - does not have problems with licensing, so it seems very likely that this standard includes IPs. The license costs in broadcasting are minor compared to the hardware costs and the savings you get from a mezzanine codec.
Thanks for the link. I will try to get a glimpse of David's SPIE paper and see how it might work on the GPU, but I fear the tables for multiple bit planes will be too large. It would be interesting to try out various entropy coders, for example the ANS variants.
The problem is decoding in parallel - this requires approaches that are particularly tuned towards the GPU architecture. However, GPUs are not the focus of HTJ2K. You are certainly more than welcome to contribute, though given the current understanding of the complexity budget, ANS might be too much. We are talking about coding multiple bitplanes in less than a handful (probably 3-5) of CPU cycles. There isn't really much you can do at that budget: combine as many coefficients as possible and encode them in one single go, and per group you cannot afford much more than a table lookup or a single arithmetic operation plus a buffer write.
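To illustrate what "a table lookup per group" can look like (a toy of my own, not the actual HTJ2K cleanup pass): code the significance of a group of four coefficients as either a single 0 bit (all insignificant) or a 1 bit followed by the 4-bit pattern, and let the decoder resolve the whole group by peeking 5 bits into one LUT.

```cpp
#include <cstdint>
#include <cstdio>

// Toy "resolve a whole group with one table lookup" decoder - NOT the real pass.
struct Entry { uint8_t pattern; uint8_t len; };
static Entry lut[32];

static void build_lut() {
    for (int i = 0; i < 32; ++i) {               // i = the next 5 bits, MSB first
        if ((i >> 4) == 0) lut[i] = {0, 1};      // leading 0: empty group, 1 bit consumed
        else lut[i] = {(uint8_t)(i & 0xF), 5};   // leading 1: 4-bit significance pattern follows
    }
}

int main() {
    build_lut();
    // Bitstream "10110 | 0 | 10001" (three groups), packed MSB-first into a 64-bit word.
    const char* code = "10110" "0" "10001";
    uint64_t bits = 0;
    int nbits = 0;
    for (const char* p = code; *p; ++p, ++nbits) bits = (bits << 1) | (uint64_t)(*p - '0');
    bits <<= (64 - nbits);                       // left-align so we can peek from the top
    int pos = 0;
    while (pos < nbits) {
        Entry e = lut[(bits << pos) >> 59];      // peek the next 5 bits, one lookup
        std::printf("group pattern %x (consumed %d bits)\n", e.pattern, e.len);
        pos += e.len;
    }
}
```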
That depends on how much SIMD you're willing to accept. With SIMD you can get huge speed-ups if the algorithm permits it. With SIMD rANS decoding I think I was hitting a max of 1.4 Gb/sec (on a 3.2 GHz machine), and maybe 800-1000 Mb/s for order-1, depending on data complexity. That was just the entropy coding though, with no other data manipulation. Encoding was probably half that speed. Hardening the software would add a bit more overhead - since then I have spotted a few issues in the frequency table handling.
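For anyone curious what that inner loop looks like, here is a minimal scalar static-rANS round trip (a sketch in the spirit of the public ryg_rans code, not the implementation the numbers above refer to); SIMD versions essentially run many such decoder states in parallel, one per lane.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Tiny fixed 4-symbol model, byte-wise renormalization, 32-bit state. Illustrative only.
static const uint32_t SCALE_BITS = 4;                  // total frequency = 16
static const uint32_t RANS_L     = 1u << 23;           // lower bound of the coder state
static const uint32_t freq[4]    = { 8, 4, 3, 1 };     // symbol frequencies
static const uint32_t cum[4]     = { 0, 8, 12, 15 };   // cumulative frequencies

std::vector<uint8_t> encode(const std::vector<uint8_t>& syms) {
    std::vector<uint8_t> out;                          // bytes come out back-to-front
    uint32_t x = RANS_L;
    for (auto it = syms.rbegin(); it != syms.rend(); ++it) {   // encode in reverse
        uint32_t f = freq[*it];
        uint32_t x_max = ((RANS_L >> SCALE_BITS) << 8) * f;
        while (x >= x_max) { out.push_back(x & 0xff); x >>= 8; }    // renormalize
        x = ((x / f) << SCALE_BITS) + (x % f) + cum[*it];           // state update
    }
    for (int i = 0; i < 4; ++i) { out.push_back(x & 0xff); x >>= 8; }  // flush final state
    return std::vector<uint8_t>(out.rbegin(), out.rend());    // stream is read forwards
}

std::vector<uint8_t> decode(const std::vector<uint8_t>& in, size_t count) {
    size_t p = 0;
    uint32_t x = 0;
    for (int i = 0; i < 4; ++i) x = (x << 8) | in[p++];         // reload state
    std::vector<uint8_t> out;
    for (size_t i = 0; i < count; ++i) {
        uint32_t slot = x & ((1u << SCALE_BITS) - 1);
        uint8_t s = 0;
        while (cum[s] + freq[s] <= slot) ++s;                   // tiny alphabet: linear scan
        x = freq[s] * (x >> SCALE_BITS) + slot - cum[s];        // the core state update
        while (x < RANS_L && p < in.size()) x = (x << 8) | in[p++];  // renormalize
        out.push_back(s);
    }
    return out;
}

int main() {
    std::vector<uint8_t> msg = { 0, 1, 0, 2, 0, 0, 3, 1, 0, 2 };
    assert(decode(encode(msg), msg.size()) == msg);
}
```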
Vanilla JPEG 2000 already has a fast mode: RESTART + RESET + BYPASS, where each code pass is encoded independently, and non-skewed lower bit planes are encoded raw.
Decoding such images would work really well on a GPU, as the on-chip memory requirements would be drastically reduced, so kernel occupancy would be very good.
Not really. You forget that even in bypass mode, the cleanup pass is MQ-coded and context-dependent, so you cannot decode the bitplanes independently of each other. What you really *need* to do is to code (or decode) as many bitplanes as possible in a single go, i.e. decode multiple bits with a single operation. A binary decoder (as demonstrated in EBCOT) will necessarily be slower.
So, as I understood it, FBCOT from David Taubman will be taken as the final solution for HTJ2K.
Then what are the sources of information for implementing it - David Taubman's patent application?
Is there reference source code available somewhere? OpenJPEG and JasPer have nothing about it yet.
I don't think the new standard has been finalized yet. But presumably, there will be a reference implementation from somebody.
Also, I believe the committee wishes for a royalty-free standard, as with the existing standard, so a royalty-free license would be granted for any relevant IP.
The link at the top of this thread, in section 1.6, mentions that the goal of HTJ2K is to make it royalty-free.
Thomas @thorfdbg would have the latest info on this.
Image compression is a heavily patented field. Royalty-free means patent holders agree not to sue you if you use their algorithm, and they won't charge you any license fees.
The HTJ2K (ISO/IEC 15444-15 | ITU T.814) specification has been published and is available free of charge at:
https://www.itu.int/rec/T-REC-T.814/en
Cool, thanks @pter: Let the HTJ2K vs XS battle begin!