Intel Integrated Performance Primitives (IPP) includes a nice set of compression-related stuff - Deflate, BZip2, LZO, LZSS, BWTFwd/BWTInv, ... Nice read:
http://software.intel.com/sites/prod...3_0_Intro.html
I wonder if Intel plans to extend the instruction set or add hardware optimizations to support their library.
There are hardware Deflate solutions already, so Intel is late to the party :P
http://en.wikipedia.org/wiki/DEFLATE#Hardware_encoders
And bzip2 is already being obsoleted by lzma/lzma2.
I think I recall a conversation a year or two ago about Intel asking how they could optimize compression. My response was that for high-end algorithms like PPM or CM, the bottleneck is random memory access, so the best thing they could do was design faster memories and larger caches.
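To make the bottleneck concrete, here is a minimal sketch (my own illustration, with made-up names and sizes) of the kind of loop a context model runs: each byte hashes its context into a table much larger than the cache, so nearly every counter update lands at an effectively random address and misses.

#include <stddef.h>
#include <stdint.h>

#define TABLE_BITS 27                       /* 128 Mi counters, far larger than any cache */
#define TABLE_SIZE ((size_t)1 << TABLE_BITS)

static uint8_t counters[TABLE_SIZE];        /* hypothetical bit-history / count table */

static inline size_t hash_context(uint64_t ctx)
{
    ctx *= 0x9E3779B97F4A7C15ULL;           /* simple multiplicative hash */
    return (size_t)(ctx >> (64 - TABLE_BITS));
}

void model_update(const uint8_t *buf, size_t n)
{
    uint64_t ctx = 0;
    for (size_t i = 0; i < n; i++) {
        size_t slot = hash_context(ctx);    /* effectively random table address */
        counters[slot]++;                   /* likely a cache/TLB miss          */
        ctx = (ctx << 8) | buf[i];          /* slide the byte context           */
    }
}

The arithmetic per byte is trivial; the time goes into waiting for the random reads, which is why faster memory and bigger caches help more than extra ALUs.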
But it seems to me that the most logical target would be to optimize video decompression. Maybe instructions for fast IDCT.
Another possibility is hardware and OS support for compressed memory. For example, blocks of zero bytes don't have to be physically allocated.
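As a rough illustration of the zero-block idea (my own sketch, not an existing OS API): before backing a 4 KiB page with physical memory, the system could check whether it is entirely zero and, if so, map a shared zero page instead.

#include <stdbool.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Returns true if all PAGE_SIZE bytes are zero: the first byte is zero and
 * every byte equals its successor, so every byte equals the first one. */
static bool page_is_zero(const unsigned char *page)
{
    return page[0] == 0 && memcmp(page, page + 1, PAGE_SIZE - 1) == 0;
}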
If I understood those "Integrated Performance Primitives" correctly, it is a bunch of libraries for Linux and Windows. I think the libraries for Windows were *.dll files which only had a few functions each (they split all the stuff up, so there are many *.dll files), and every *.dll file is something like 2 MB in size. The whole package for Windows is something like 400 MB. So I don't think that this library package is being used widely.
Did anyone ever work with those libraries? How are they doing regarding speed and memory consumption?
Quote: "My response was that for high-end algorithms like PPM or CM, the bottleneck is random memory access, so the best thing they could do was design faster memories and larger caches."

Yes, that's right. If you are trying to do a BWT detransform, then you have the same problem.
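For reference, a minimal sketch of the standard inverse-BWT output loop (assuming the "next" links have already been built from the transformed block, and using my own parameter names): the output is written sequentially, but each step follows a link to an essentially random position in the block, so on large blocks almost every iteration can miss the cache.

#include <stddef.h>
#include <stdint.h>

void bwt_inverse_output(const uint8_t *bwt, const uint32_t *next,
                        uint32_t primary, uint8_t *out, size_t n)
{
    uint32_t p = primary;
    for (size_t i = 0; i < n; i++) {
        out[i] = bwt[p];   /* sequential write, cache-friendly            */
        p = next[p];       /* random read: jumps all over the block, so   */
    }                      /* the memory latency dominates the whole loop */
}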
But I don't think that larger caches really fix the problem. They help, but they only reduce the problem to whatever part of the data still doesn't fit in the cache, because it's still too big. Usually the data you need to process is also several times bigger than the actual payload.
One thing that Intel has done recently was adding support for non-blocking memory accesses to the Atom series, which is probably the best-selling processor series at the moment. Maybe this also helps a little to increase the speed. But you would probably need to make some adjustments in the code to take good advantage of the out-of-order capabilities of a processor, because it behaves somewhat differently from an in-order processor (like the older generation of Atom processors).
Most other processors already had support for non-blocking memory accesses. I don't think that AMD ever released another in-order-execution processor after their first out-of-order-execution processor.
In the last few months I tried to get some speed improvements by using the instruction set extensions. I used the SSE2 instruction set extension with the 16-byte XMM registers to improve the detransformation speed of a BWT. I could only use it for some trivial tasks rather than anything serious, so the speed improvement was less than 10%, but it was a lot of work to dig into all this stuff. So my personal opinion about those instruction set extensions is that they are somewhat disappointing.
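As an example of the kind of "trivial task" SSE2 covers well (this is my own illustration, not the poster's actual code): moving data 16 bytes at a time through the XMM registers, e.g. when copying buffers around during the detransform. The random pointer chasing itself doesn't benefit from SIMD at all.

#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

void copy_block_sse2(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(src + i)); /* 16-byte load  */
        _mm_storeu_si128((__m128i *)(dst + i), v);               /* 16-byte store */
    }
    for (; i < n; i++)   /* tail: remaining bytes one at a time */
        dst[i] = src[i];
}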
Last edited by just a worm; 4th February 2014 at 14:01.
I tested ipp_gzip recently and, while it was faster, it was also poorer at compression. Even after accounting for the slightly lower ratio (I compared against zlib at lower compression levels to get a similar output size), there was still some speed-up, but not a particularly large one.