1. ## Compress small integers

What would be the best algorithm for compressing these integers? Would it be possible to go below 46 bytes?

Code:
`1111115151213111112411116211112152211111212131112251111412141623111143121222112212217131311113315111113221313141332111122141112123232224222321321511111323324121111624322212211421231212121211321121445121431235222`
Code:
```
1
1
1
1
1
1
5
1
5
1
2
1
3
1
1
1
1
1
2
4
1
1
11
6
2
1
1
1
1
2
1
5
2
2
1
1
1
1
1
2
1
2
1
3
1
1
1
2
2
5
1
1
1
1
4
1
2
1
4
1
6
2
3
1
1
1
1
4
3
1
2
1
2
2
2
1
1
2
2
1
2
2
1
7
1
3
1
3
1
1
1
1
3
3
1
5
1
1
1
1
1
3
2
2
1
3
1
3
1
4
1
3
3
2
1
1
1
1
2
2
1
4
1
1
1
2
1
2
3
2
3
2
2
2
4
2
2
2
3
2
1
3
2
1
5
1
1
1
1
1
3
2
3
3
2
4
1
2
1
1
1
16
2
4
3
2
2
2
1
2
2
1
1
4
2
1
2
3
1
2
1
2
1
2
1
2
1
1
3
2
1
1
2
1
4
4
5
1
2
1
4
3
1
2
3
5
2
2
2
```

2. The entropy is 1.932334 bits per digit. That's 42.5 bytes, or 43 bytes rounded up.

Edit: I miscounted the digits on my mobile phone. Correction below.

3. The range of numbers in your sample seems to be 1..16. If you gave us different samples, would that always be the case, or would we sometimes face larger numbers? That's the alphabet of your input, and it's very important to know. (You sent us the numbers twice; I believe the second listing is correct and the first one is not, right? The second one contains the numbers 11 and 16, which in the first one are mixed in with, and inseparable from, the other digits.)
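As a quick check (a Node.js sketch of my own, not from the thread), scanning the concatenated digit string confirms its alphabet:

```javascript
// The first listing, read as 211 single digits.
const digits = "1111115151213111112411116211112152211111212131112251111412141623111143121222112212217131311113315111113221313141332111122141112123232224222321321511111323324121111624322212211421231212121211321121445121431235222";

// Collect the distinct symbols and sort them numerically.
const alphabet = [...new Set(digits)].map(Number).sort((a, b) => a - b);
console.log(alphabet.join(",")); // 1,2,3,4,5,6,7
// Read as 209 integers instead, the values 11 and 16 extend the range to 1..16.
```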

For compression you'll need a model.

A model's role is to do any necessary rearranging, disassembly/assembly, and/or prediction.
A model may contain a dictionary of chunks that regularly appear in most inputs, or a way of representing repeating elements.
A model may be an order-n string model. Or it may simply use a static frequency table and encode/decode input elements based on those frequencies.
A complex model may be a combination of simple models: different parts of your input may be represented by different models, or you may use context mixing for a more fine-tuned probability estimation.

In order to find out how much your numbers can be compressed, we'll need to answer another question: which model is best in your case?
However, you need to be careful: the model you find may do a good job on the above 209 numbers, but may fail miserably when given the next bunch of numbers. That's overfitting the model to a specific input.
So 209 numbers without specifying what they represent is not enough. Where did you get them from?

For creating a good model you need to tell us what these numbers represent.
The numbers you posted may give us some clue about what they mean, but that would be speculation on our side.
Your numbers could be residuals from another model (highly likely), part of an image, or just random numbers (very, very unlikely, but still a possibility).
For different cases, different models would work best. And different models may give different answers to your original question.

It looks like the probabilities of the above numbers are very close to 50% 1s, 25% 2s, 12.5% 3s, ... i.e. 1/(2^n). Is this the true probability distribution? Without you telling us how you came up with those numbers, it's just speculation.
If the above probability distribution is true and there is no other useful information (no structure, for example) in the numbers, then the entropy (using an order-0 model with these static probabilities) is 53.625 bytes.
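That figure can be reproduced with a short sketch (mine, under the assumption that the 209-number listing with 11 and 16 is the intended reading): with the static model p(n) = 2^-n, the ideal code length of the value n is -log2(2^-n) = n bits, so the total cost is simply the sum of the values.

```javascript
// The 209 integers of the second listing (11 and 16 kept as two-digit values).
const values = [1,1,1,1,1,1,5,1,5,1,2,1,3,1,1,1,1,1,2,4,1,1,11,6,2,1,1,1,1,2,1,5,2,2,1,1,1,1,1,2,1,2,1,3,1,1,1,2,2,5,1,1,1,1,4,1,2,1,4,1,6,2,3,1,1,1,1,4,3,1,2,1,2,2,2,1,1,2,2,1,2,2,1,7,1,3,1,3,1,1,1,1,3,3,1,5,1,1,1,1,1,3,2,2,1,3,1,3,1,4,1,3,3,2,1,1,1,1,2,2,1,4,1,1,1,2,1,2,3,2,3,2,2,2,4,2,2,2,3,2,1,3,2,1,5,1,1,1,1,1,3,2,3,3,2,4,1,2,1,1,1,16,2,4,3,2,2,2,1,2,2,1,1,4,2,1,2,3,1,2,1,2,1,2,1,2,1,1,3,2,1,1,2,1,4,4,5,1,2,1,4,3,1,2,3,5,2,2,2];

// Under p(n) = 2^-n, coding the value n ideally costs n bits,
// so the total message cost is the sum of all values.
const bits = values.reduce((sum, v) => sum + v, 0);
console.log(bits, bits / 8); // 429 bits = 53.625 bytes
```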

What model did you use to get 46 bytes?
@Dresdenboy, what model did you use to get 42.5 bytes?

4. Under the assumption that this is homework, I just threw the string into an entropy function. But I didn't see the two 2-digit symbols at first. Are they real or typos?

5. >>But I didn't see the two 2-digit symbols at first. Are they real or typos?
I believe the integers 11 and 16 were mistakenly mixed in with the single-digit integers when the whitespace was removed.

>>I just threw the string into an entropy function
Aha, I see.
This is an order-0 model then, with the actual probabilities. For the 211-digit sample (11 and 16 split into single digits), the entropy in bits per digit is correct (1.932334), but in bytes it is 1.932334 * 211 / 8 = 50.96531 bytes.
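The corrected arithmetic can be reproduced in a few lines (a sketch of mine, run on the concatenated 211-digit string):

```javascript
// The 211-digit sample (11 and 16 split into single digits).
const digits = "1111115151213111112411116211112152211111212131112251111412141623111143121222112212217131311113315111113221313141332111122141112123232224222321321511111323324121111624322212211421231212121211321121445121431235222";

// Order-0 model: count each symbol's frequency ...
const counts = {};
for (const d of digits) counts[d] = (counts[d] || 0) + 1;

// ... and sum -p * log2(p) over the alphabet.
let bitsPerDigit = 0;
for (const d in counts) {
  const p = counts[d] / digits.length;
  bitsPerDigit -= p * Math.log2(p);
}

const totalBytes = bitsPerDigit * digits.length / 8;
console.log(bitsPerDigit.toFixed(6), totalBytes.toFixed(5)); // ≈ 1.9323 bits/digit → ≈ 50.97 bytes
```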

>>Under the assumption, that this is a homework
CompressMaster has a long history of asking similar questions. I hope he learns from them.

6. Originally Posted by CompressMaster
It would be possible to go below 46 bytes?
Probably not. A test with 209 values:
61 bytes, test.bin.arb (arb255 32-bit binary http://bijective.dogma.net/compres11.htm)
78 bytes, test.bin.paq8px193
84 bytes, test.txt.paq8px193
104 bytes, test.ge.paq8px193
115 bytes, test.txt.arb (arb255 32-bit binary http://bijective.dogma.net/compres11.htm)
130.625 bytes, binary 5 bits
138 bytes, test.bin.paq8pxd89
159 bytes, test.ge.paq8pxd89
167 bytes, test.txt.paq8pxd89
209 bytes, test.bin (binary 8 bits)
419 bytes, test.txt (text x,x,x)

7. Is this without file headers? Otherwise I'd guess the models don't adapt to the reduced alphabet that quickly.

Oh, and I counted 4*37+28 = 176 digits in the webform I used, hence the lower value. I'll test that on a real PC later.
Update: Indeed, these are 211 digits. My bad. So Gotty is right with ~51 bytes for an order-0 model.

But the visible repetitions might help to get this lower, thanks to some redundancy.
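One way to probe that intuition (my own sketch, ignoring the cost of transmitting the model) is to compare the order-0 entropy with an order-1 estimate that conditions each digit on its predecessor:

```javascript
// The 211-digit sample as a string.
const s = "1111115151213111112411116211112152211111212131112251111412141623111143121222112212217131311113315111113221313141332111122141112123232224222321321511111323324121111624322212211421231212121211321121445121431235222";

// Shannon entropy of a frequency table, in bits per symbol.
function entropy(counts, total) {
  let h = 0;
  for (const k in counts) {
    const p = counts[k] / total;
    h -= p * Math.log2(p);
  }
  return h;
}

// Order-0: plain symbol frequencies.
const c0 = {};
for (const d of s) c0[d] = (c0[d] || 0) + 1;
const h0 = entropy(c0, s.length);

// Order-1: frequency of each digit given the previous digit.
const ctx = {};
for (let i = 1; i < s.length; i++) {
  const prev = s[i - 1], next = s[i];
  ctx[prev] = ctx[prev] || {};
  ctx[prev][next] = (ctx[prev][next] || 0) + 1;
}

// Weighted average of the per-context entropies.
let h1 = 0;
for (const prev in ctx) {
  const total = Object.values(ctx[prev]).reduce((a, b) => a + b, 0);
  h1 += (total / (s.length - 1)) * entropy(ctx[prev], total);
}
console.log(h0.toFixed(4), h1.toFixed(4)); // a clearly lower h1 means the previous digit helps
```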

8. Originally Posted by Dresdenboy
Is this without file headers?
No.

Originally Posted by Dresdenboy
Indeed, these are 211 digits.
Same test with 211 values (11 and 16 split):
59 bytes, test.bin.arb (arb255 32-bit binary http://bijective.dogma.net/compres11.htm)
75 bytes, test.bin.paq8px193
80 bytes, test.txt.paq8px193
91 bytes, test.ge.paq8px193
105.5 bytes, binary 4 bits
116 bytes, test.txt.arb (arb255 32-bit binary http://bijective.dogma.net/compres11.htm)
136 bytes, test.bin.paq8pxd89
148 bytes, test.ge.paq8pxd89
162 bytes, test.txt.paq8pxd89
211 bytes, test.bin (binary 8 bits)
421 bytes, test.txt (text x,x,x)

9. Gamma coding gives 59 bytes (477 bits).
Code:
```
!function (A) {
  // Sum of Elias gamma code lengths: a value with bit length c costs 2*c-1 bits.
  var bits = 0;
  for (var i = 0; i < A.length; i++) {
    var c = 0;
    while (A[i] >> ++c);   // c = bit length of A[i]
    bits += 2 * c - 1;
  }
  document.write(bits, " ", bits >> 3); // 477 bits, 59 bytes
}([1,1,1,1,1,1,5,1,5,1,2,1,3,1,1,1,1,1,2,4,1,1,1,1,6,2,1,1,1,1,2,1,5,2,2,1,1,1,1,1,2,1,2,1,3,1,1,1,2,2,5,1,1,1,1,4,1,2,1,4,1,6,2,3,1,1,1,1,4,3,1,2,1,2,2,2,1,1,2,2,1,2,2,1,7,1,3,1,3,1,1,1,1,3,3,1,5,1,1,1,1,1,3,2,2,1,3,1,3,1,4,1,3,3,2,1,1,1,1,2,2,1,4,1,1,1,2,1,2,3,2,3,2,2,2,4,2,2,2,3,2,1,3,2,1,5,1,1,1,1,1,3,2,3,3,2,4,1,2,1,1,1,1,6,2,4,3,2,2,2,1,2,2,1,1,4,2,1,2,3,1,2,1,2,1,2,1,2,1,1,3,2,1,1,2,1,4,4,5,1,2,1,4,3,1,2,3,5,2,2,2])
```