# Thread: Looking for compression algorithm with simple decompression

1. ## Looking for compression algorithm with simple decompression

Hi,

I'm a neuromorphic hardware designer implementing a compressed cache system on a digital chip as part of my PhD, and I was advised to ask here for help. During the design, I noticed that the data I'm going to store, despite being composed of 16-bit words, have the following distribution:

minimum number of bits to represent the value | %
--- | ---
1 | 20.1%
2 | 12.7%
3 | 22.5%
4 | 29.0%
5 | 14.7%
6 | 0.9%
7 | < 0.1%
8 | < 0.1%
9 | < 0.1%
10 | < 0.1%
11 | < 0.1%
12 | < 0.1%
13 | < 0.1%
14 | < 0.1%
15 | < 0.1%
16 | < 0.1%

The physical structure in which they are going to be stored can be seen as a set of small lists of 9 words each:

L[0][0] = [0, 1, 2, 3, 4, 5, 6, 7, 8]
L[0][1] = [0, 1, 2, 3, 4, 5, 6, 7, 8]
L[0][2] = [0, 1, 2, 3, 4, 5, 6, 7, 8]
...
L[0][7] = [0, 1, 2, 3, 4, 5, 6, 7, 8]
L[1][0] = [0, 1, 2, 3, 4, 5, 6, 7, 8]
L[1][1] = [0, 1, 2, 3, 4, 5, 6, 7, 8]
...
L[N][7] = [0, 1, 2, 3, 4, 5, 6, 7, 8]

At time step 0, the system reads, in parallel, the first value of each of the 8 lists composing L[K], one per list (so L[K][0][0], L[K][1][0], L[K][2][0], L[K][3][0], ...).
At time step 1, the second values of the lists composing L[K] are read (L[K][0][1], L[K][1][1], L[K][2][1], L[K][3][1], ...).
At time step 2, the third values, and so on.

This continues until 9 values have been read from each list (L[K][0][8], L[K][1][8], L[K][2][8], L[K][3][8], ...); then reading starts on another L[K].
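The access pattern above can be sketched in Python (a sketch only, assuming L is a plain nested list indexed as L[K][list][word]):

```python
# Sketch of the read pattern described above: at time step t, word t of
# each of the 8 lists in block L[K] is read in parallel.
def read_order(L):
    for K in range(len(L)):          # one block at a time
        for t in range(9):           # 9 time steps per block
            yield [L[K][i][t] for i in range(8)]  # 8 parallel reads

# tiny demo: 2 blocks; values encode (block, list, word) as 100*K + 10*i + t
L = [[[100 * K + 10 * i + t for t in range(9)] for i in range(8)]
     for K in range(2)]
first_step = next(read_order(L))
print(first_step)  # the 8 words fetched at t=0 of block 0
```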

I'm trying to reduce the number of bits composing each 9-word list using some compression technique. The technique can be very slow and complex during the compression phase (it is performed offline) but must be very simple for decompression (to avoid hardware overhead).

Since it is done in HW, it has some extra "unusual" constraints:
- Since we can't change the physical size of the memory cells, the resulting total number of bits required for each list is going to be rounded up to the closest multiple of 8 (e.g. a compressed string of 17 bits becomes 24 bits long).
- It would be better if the lists are stored independently (without creating a huge 81-word list), unless merging gives a really strong advantage (> 10%) in terms of compression.
- We can have shared information for the decoding (e.g. storing the number of bits of the longest word in the 8 lists).
- The decompression can't use multiplications, divisions, or large memories (e.g. no large or dynamic dictionaries): only shifts, adds, subs, ANDs, and ORs, since there are going to be 512 decoders implemented on chip.

I already tried several techniques, with the following resulting bits/word:

- exp-Golomb coding: 7.76
- Fibonacci coding: 6.73
- Rice coding: 5.57
- ANS (2-bit symbols)**: 11.35 (a lot higher than I was expecting? I have checked the implementation and it seems fine...)
- ANS (1-bit symbols)**: 10.86
- shared size*: 5.24

** Probabilities rounded to the closest power of 2 to simplify decoding in HW.
* The data are rectified (2*x if positive, -2*x-1 if negative, after a reference value X is subtracted from all of them). For each group of entries L[K][0][Y], L[K][1][Y], L[K][2][Y], L[K][3][Y], ... the number of bits required by the longest word of the group is stored within the data.
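A minimal sketch of the rectification plus shared-size cost model described in the footnote (the 4-bit width field and subtracting the reference before the mapping are assumptions on my part):

```python
def zigzag(x):
    # rectification from the footnote: 2*x if positive (or zero), -2*x - 1 if negative
    return 2 * x if x >= 0 else -2 * x - 1

def shared_size_bits(group, ref=0):
    """Cost in bits of one group under the shared-size scheme:
    a 4-bit field holding the width of the longest mapped word,
    then width bits per value. ref is the subtracted reference."""
    mapped = [zigzag(v - ref) for v in group]
    width = max(m.bit_length() for m in mapped) or 1  # at least 1 bit
    return 4 + width * len(group)

group = [3, -1, 0, 2, -4, 1, 0, -2]
print(shared_size_bits(group))  # widest mapped value needs 3 bits -> 4 + 3*8 = 28
```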

Before proceeding with the implementation of the current best solution, I would really appreciate any suggestions for algorithms or any other useful information/papers/etc.

Thank you!

2. Well, here's an arithmetic coder without multiplications or divisions (via exp/log LUTs): http://nishi.dreamhosters.com/u/rc_v1.rar
And this version has FSM counters: http://nishi.dreamhosters.com/u/rc_v5.rar
So you can basically port any statistical model (paq etc) to be mul/div-free.

Interleaved coding recently became popular in PC data compression too (zstd etc.),
so it shouldn't be a problem - it's compatible with both AC and ANS.

Also, both AC and ANS should provide compression close to the entropy limit -
for example, it's possible to simulate Golomb/Rice coding if you assign the probabilities right.
But it's hard to say anything about the best model for the data without the data.

As to compression optimization for offline encoding, the common methods are parsing optimization
and parameter optimization.
As an example of the latter, we can take static probability tables, then mix their predictions with
an adaptive mixer. In this case, the mixer would have some parameters (update rate etc.), which can be tuned
for specific data.

3. While not technically the best solution, 16-bit values can sometimes be compressed well simply by splitting them into the top 8 bits and the bottom 8 bits and then running existing standard algorithms on each half. It's crude, but it may work well, so it's worth a punt.

If the data has no predictability between values, then basic entropy encoding is sufficient. You've already demonstrated a distribution around a mean point with the "shared size". You can use this to transform your values and then apply the other techniques again, e.g. Rice or Golomb coding. If the data is not randomly sampled from a distribution, but has some correlation with previous data, then start with the simplest test, delta encoding (subtracting the previous value), and work from there.
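The delta-encoding test mentioned above is only a few lines (a sketch; a residual transform such as zigzag would typically follow it before entropy coding):

```python
def delta_encode(xs):
    # replace each value with its difference from the previous one
    prev, out = 0, []
    for x in xs:
        out.append(x - prev)
        prev = x
    return out

def delta_decode(ds):
    # a running sum restores the original sequence
    prev, out = 0, []
    for d in ds:
        prev += d
        out.append(prev)
    return out

xs = [10, 12, 11, 11, 15]
print(delta_encode(xs))  # [10, 2, -1, 0, 4]
assert delta_decode(delta_encode(xs)) == xs
```

Note that the decode side needs only additions, which fits the shift/add-only constraint from the first post.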

Finally, frankly, the easiest way of learning whether your data warrants a lot of analysis first or can just be shoved through a basic entropy encoder is to pass it through a variety of the heavyweight tools - paq8, cmix, emma, etc. I'm not suggesting you actually use them in production, but knowing how well they work can help direct your research. If they only shrink the data 10% better than your current methods, then there is likely little correlation and/or modelling to exploit.

4. Thank you, I'm going to see what comes out of your suggestions. In the meanwhile, I had this idea: since the sequences are short (9 values), is there any "generative model" that can be used? In electronics, for example, LFSRs are popular for generating test data; is there any algorithm that can initialize an LFSR-like structure so that it provides a value at each step? From my limited-knowledge point of view, it should be somehow similar to AC and ANS, no?

5. Well, there's https://en.wikipedia.org/wiki/Linear_predictive_coding .
And in general, you can use a SAT solver to compute parameters for some PRNG and generate your data.
But this kind of transformation only improves compression if the code length of the original data is longer than
the code length of the generator parameters (plus the code length of the residuals, if the generator is not precise).

In any case, this approach doesn't work for random data. For example, if you try using some LFSR-based function
to predict audio data, the parameters for that function would likely be much less compressible than the original data.
So there's no universal solution - you just need to do a statistical analysis of your data and build a model based on that.

Also, generative schemes are mostly used as a speed optimization.
Computing optimal LPC coefs for a block, then processing the block with a static filter,
results in faster decoding than recomputing optimal LPC coefs after each coding step,
but compression is better in the latter case.

And anyway, in the end, you'd just have to compress generator coefs instead of original data.

It would help if you'd post some data samples.

6. You can try the methods used for lightweight integer compression, possibly combined with zigzag, delta, or XOR encoding:
- Variable nibble, variable byte, or variable n-bits
- The "simple" family, like vsimple or "Simple-9" (similar to your shared size).
It is also possible to compress the selectors (or sizes), using RLE for example
- The TurboPFor scheme
- Extra bits, as in Deflate
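As an illustration of the first item, a variable-nibble code can be sketched like this (3 payload bits plus a continuation bit per nibble; this shows the general idea, not any specific library's format):

```python
def vnibble_encode(n):
    """Variable-nibble code: split a non-negative integer into 3-bit
    payload chunks, least significant first; the high bit (0x8) of a
    nibble flags that more nibbles follow."""
    nibbles = []
    while True:
        nibbles.append(n & 0x7)
        n >>= 3
        if n == 0:
            break
    # set the continuation bit on all but the last nibble
    return [nib | 0x8 for nib in nibbles[:-1]] + [nibbles[-1]]

def vnibble_decode(nibbles):
    # reassemble payload chunks; needs only shifts, ANDs and ORs
    n, shift = 0, 0
    for nib in nibbles:
        n |= (nib & 0x7) << shift
        shift += 3
    return n

for v in [0, 5, 42, 1000]:
    assert vnibble_decode(vnibble_encode(v)) == v
print(vnibble_encode(42))  # 42 = 0b101010 -> two nibbles
```

The decoder uses only shifts, ANDs and ORs, which matches the constraint list in the first post.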

7. Originally Posted by Shelwien
[...]
It would help if you'd post some data samples.
Ok, yes, I see the point.
Here is a sample of the data (it is txt; each line is a different entry, to be read as a 16-bit signed integer), thank you!

8. Sure looks similar to audio data.
I guess you can optimize a static LPC filter (or several) - multiplications by constants can be approximated well enough by shifts and additions.
And then encode the best filter choice for each 9 samples, or some such.
As to entropy coding, lossless audio formats frequently use Rice coding or the like, but it could be interesting if you used that mul-free AC.

Code:
```6,161,967 weights_cnn_layer_12.txt
4,718,592 weights_cnn_layer_12.bin // 1.exe
1,367,212 1.7z // 7z a -mx=9 -myx=9 1 weights_cnn_layer_12.bin
1,213,105 weights_cnn_layer_12.ofr // ofr.exe --encode --raw --sampletype SINT16 --channelconfig MONO --seek min --preset 4 weights_cnn_layer_12.bin
1,210,365 weights_cnn_layer_12.bin.paq8px // paq8px_v79 -7
1,205,940 weights_cnn_layer_12.bin.paq8pxd18 // paq8pxd_v18 -s7
1,204,062 weights_cnn_layer_12.lea // emma audio-max```
In bits/word this would be... 1204062*8/(4718592/2) = 4.08?
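The shift-and-add trick for constant multiplications mentioned above can be sketched like this (the predictor coefficients here are purely illustrative, not tuned to the posted data):

```python
def mul_0875(x):
    # x * 0.875 = x - x/8, using only a shift and a subtraction
    return x - (x >> 3)

def predict(prev1, prev2):
    """Hypothetical fixed LPC-style predictor ~ 1.75*prev1 - 0.875*prev2,
    built purely from shifts, adds and subs (coefficients are
    illustrative, not optimized)."""
    return (prev1 + mul_0875(prev1)) - mul_0875(prev2)

samples = [0, 8, 18, 30, 44]
residuals = []
for i in range(2, len(samples)):
    p = predict(samples[i - 1], samples[i - 2])
    residuals.append(samples[i] - p)  # residuals go to the entropy coder
print(residuals)
```

The residuals, not the samples, would then be fed to Rice coding or the mul-free AC.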

9. It appears you only have 8-bit data present here. You could use an escape code to embed the very occasional 16-bit element, if such elements are present but that infrequent.
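A sketch of the escape-code idea (the choice of -128 as the escape byte is arbitrary; reserving it means it never appears as a literal):

```python
ESC = -128  # reserved byte value used as the escape marker

def escape_encode(values):
    """Store values fitting in -127..127 as one byte; anything else
    (including -128 itself) as ESC followed by two raw bytes."""
    out = []
    for v in values:
        if -127 <= v <= 127:
            out.append(v)
        else:
            out += [ESC, (v >> 8) & 0xFF, v & 0xFF]
    return out

def escape_decode(stream):
    out, i = [], 0
    while i < len(stream):
        if stream[i] == ESC:
            v = (stream[i + 1] << 8) | stream[i + 2]
            out.append(v - 0x10000 if v >= 0x8000 else v)  # sign-extend
            i += 3
        else:
            out.append(stream[i])
            i += 1
    return out

vals = [5, -3, 1000, -128, 127]
assert escape_decode(escape_encode(vals)) == vals
```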

So, working on that, I get the following Perl one-liners to convert:

Code:
```perl -e '$/=undef;print pack("c*",split("\n",<>))' weights_cnn_layer_12.txt > a.bin
perl -e '$/=undef;@a=unpack("c*", <>);$"="\n";print "@a\n"' < a.bin > a.txt```
In terms of basic entropy of the data:

Code:
```len = 2359296 bytes, entropy8o0 = 1329733.119624, 4.508915 bits per byte
len = 2359296 bytes, entropy8o1 = 1213659.440753, 4.115327 bits per byte```
So a quick entropy encoder is going to get you close to those figures. The rANS I have produces 1333493 (order-0) and 1221892 (order-1), but any decent entropy encoder will be in the same ballpark. Given the above (strong but slow) codecs listed by Shelwien, I suspect there is little predictability in the signal, and so a fast, crude algorithm is likely sufficient.

Edit: I tried ppmd, and o2 onwards gives poorer compression, so correlation between successive values is very limited. Fixed-frequency order-1 statistics look like the quick route, or just taking successive deltas of the integer values and doing a basic order-0 encoding or a trivial integer bit-encoding method (Rice etc.). Delta does work and gives a theoretical entropy of 4.31 bits per byte - not as good as order-1, but close enough.
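The order-0 entropy figures quoted above come from the standard Shannon formula; as a sketch (log2 is fine here since this is offline analysis, never the on-chip decoder):

```python
import math
from collections import Counter

def entropy_bits_per_byte(data):
    # order-0 Shannon entropy: -sum p*log2(p) over byte frequencies
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(entropy_bits_per_byte(b"aabbbbcc"))  # 1.5
```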

10. ## Thanks:

Shelwien (21st July 2017)
