1. ## Entropy (?) question

I'm no math whiz. When reading about entropy, I understand that it has something to do with the information content.
High entropy means more randomness.
Low entropy means more compact structures/patterns, etc.?

Then I heard about the Kolmogorov complexity, the Shannon Entropy etc...
I also watched a video explaining entropy, but I wasn't quite sure whether there is a mathematical way of finding the entropy of a chunk of information.

Say I have a file of 256 bytes. The data is computer opcodes, so there are some patterns; it looks random, but it is not very random.
Is it possible to find the entropy value of these bytes (or 2048 bits)? Meaning it contains all the information, but the entropy value tells how much it could have been compressed?

What is this mathematical formula called? Shannon, or something else? (Since I'm no math whiz, I was wondering if anyone here could give me a hint or a direction, or tell me if there is no such formula.)

Thanks.

Entropy can be defined in the Shannon sense as long as the way you represent your data is simple enough.
For example, in your 256-byte example, take all the bytes and make a histogram of their values. Now you know that each symbol has an optimal cost of -log2(p), where p = occurrences/total. Sum these costs over the message. The result is a nice figure, which is typically called the "entropy" of a message.
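That histogram computation can be sketched in a few lines of Python (an illustrative sketch; the function name is mine, not a standard API):

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Zero-order Shannon entropy of a byte string, in bits per byte."""
    counts = Counter(data)
    total = len(data)
    # A byte value occurring n times has probability p = n / total and
    # an optimal code cost of -log2(p) bits; average the costs over p.
    return sum((n / total) * -math.log2(n / total) for n in counts.values())

print(shannon_entropy(bytes(range(256))))  # 8.0: all 256 values, uniform
print(shannon_entropy(b"a" * 256))         # 0.0: a single repeated value
```

Multiplying the result by the message length gives the total cost in bits, i.e. roughly the best an order-0 coder (Huffman, arithmetic) could do on that message.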

That being said, this figure depends on an implicit model: in this case, each byte is considered an "independent" unit, with its probability of appearing decided only by the global histogram.

And we all know this is a very poor model, which compresses badly. We can do much better.
For example, you could condition all these probabilities on the previous byte. For human words, this works well: in an English text, after a "w" you are more likely to get an "o" or an "e" than a "k".
And then you start wondering: why not condition them on the 2 previous bytes? Or on the previous full word?
What about Arabic sentences? Chinese characters?
What about binary data? Should we still consider that we have bytes, or a lot of bits packed together (a JPG image, for example)?
Are there regular patterns, or some typical distances between them? Are some bytes special after a definable event?
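The order-1 idea (conditioning each byte's probability on the byte before it) can be sketched as follows; this is my own illustrative code, not a reference implementation:

```python
import math
from collections import Counter, defaultdict

def order1_entropy(data: bytes) -> float:
    """Average bits per byte when each byte's probability is taken
    from a histogram conditioned on the previous byte."""
    contexts = defaultdict(Counter)
    for prev, cur in zip(data, data[1:]):
        contexts[prev][cur] += 1
    bits = 0.0
    for ctx in contexts.values():
        ctx_total = sum(ctx.values())
        for n in ctx.values():
            # cost of the n occurrences under this context's distribution
            bits += n * -math.log2(n / ctx_total)
    return bits / (len(data) - 1)

# "ababab..." costs 1 bit/byte under an order-0 model, but 0 bits/byte
# under order-1: after 'a' the next byte is always 'b', and vice versa.
print(order1_entropy(b"ab" * 128))  # 0.0
```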

Pretty soon, you realize that the ways to model your data are so numerous that you can't see the end of them. It's infinite.

That's Kolmogorov complexity.

3. ## Thanks (2):

Bulat Ziganshin (2nd November 2015), Rudi (4th November 2015)

4. Originally Posted by Cyan

Entropy can be defined in the Shannon sense as long as the way you represent your data is simple enough.
For example, in your 256-byte example, take all the bytes and make a histogram of their values. Now you know that each symbol has an optimal cost of -log2(p), where p = occurrences/total. Sum these costs over the message. The result is a nice figure, which is typically called the "entropy" of a message.
To be precise, this is the zero-order entropy, or rather, the entropy of a very simple model where you suppose that the data is generated by a memoryless black box, one character at a time. A poor model indeed.

Originally Posted by Cyan
Pretty soon, you understand the ways to model your data are so numerous that you can't see the end of them. It's infinite.
That's Kolmogorov complexity.
Well, not exactly. Kolmogorov complexity does not have much to do with entropy. Entropy is a number assigned to a random process, not to a particular sequence. Hence, one cannot point at a sequence and say, "this sequence has entropy XY". You need to specify a model, and through it a random source, and through it probabilities. That is often implicitly understood as part of "computing entropy", but it isn't part of the entropy itself.
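The point that entropy belongs to the source, not to a particular sequence, can be made concrete with a Bernoulli (biased-coin) source; this is illustrative code, with names of my own choosing:

```python
import math
import random

def bernoulli_entropy(p: float) -> float:
    """Exact entropy, in bits per symbol, of a source emitting 1 with
    probability p and 0 with probability 1 - p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# The *source* has a well-defined entropy:
print(bernoulli_entropy(0.5))  # 1.0 bit per symbol

# A particular *sequence* drawn from it only yields an estimate, made by
# first assuming a model (here: i.i.d. bits) and measuring its parameter.
random.seed(0)
sample = [random.random() < 0.5 for _ in range(10_000)]
p_hat = sum(sample) / len(sample)
print(bernoulli_entropy(p_hat))  # close to 1.0, but model-dependent
```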

Kolmogorov complexity is an attribute assigned to a particular realization of a random process, i.e. a given message. It is defined as the length of the shortest computer program that regenerates the input message. This is not a very precise definition, unfortunately, but defining it precisely takes a lot of time...

For example: why does the programming language not matter, given that computer programs can be written in various styles, and so on? In the end, one finds that it does indeed not matter (much) if one only considers very long messages, or rather, lets the size of the message go to infinity. In such a setting, the choice of the computer makes a vanishing contribution, i.e. the size of the message dominates, at least for "interesting" messages.

5. ## Thanks:

Jiwei Ke (3rd November 2015)

Interesting. Thanks for the replies. It seems that I'd have to do some reading.

Cyan: Conditioning the probabilities of bits on their neighborhoods sounds like a good idea. I'd have to write something other than LZW to compress using this method; I just recently made an LZW compressor, that is. Since a file, for example, has a finite number of bits, we don't need to involve infinite numbers in these equations. At least not me :P Also, let's not base our algorithms on English or any other language (even if it's a good way of testing the compressor), since our data can contain any kind of information.

Working with small amounts of data (to compress) is very good practice for me, and I also think that if one is able to compress small amounts of data to something very small, and the algorithm is adaptive enough to handle large amounts, better compression of larger amounts of data will follow. How such a scheme should be set up is another issue. For example, I don't want it to be too complex; I want the algorithm to do the job, but I want to understand its underlying mechanics.

I am looking into cellular automata: if, for example, a CA can do some compression work alongside or independently of any other algorithm, then I am fine with it, as long as I understand the underlying mechanics of the CA. I recently made a computer simulation of totalistic CAs which I will try to research some more. I also like reversible CAs, because with them, after some bits of information are processed, one can go backwards to the initial configuration (bit strings). There are not many CA rules that are reversible, but it is still interesting stuff.
If anyone would like to see some screenshots of the TCAs, feel free to see and read a little about them on my homepage here: Totalistic CA

The problem with these CA computations is that information often gets lost. I always work with periodic boundary conditions, because with some rules the edges scramble the data too.

Entropy is a function of a probability distribution. Usually you don't know what the probability distribution is, so you have to guess. Kolmogorov complexity does not depend on guessing a probability distribution. Instead, it is defined as the length of the shortest program which outputs (or describes) your data.
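Any short program that outputs a long string gives an upper bound on that string's Kolmogorov complexity. A toy illustration, taking Python itself as the description language:

```python
# A highly patterned 1000-character string...
data = "ab" * 500

# ...is regenerated by a much shorter Python program, so the string's
# Kolmogorov complexity (relative to Python) is at most the program's
# length. We never learn the exact complexity, only this upper bound.
program = 'print("ab" * 500)'

print(len(data), len(program))  # 1000 17
```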

Kolmogorov (also Solomonoff and Levin, all working independently in the 1960s) proved that the choice of language matters only up to a constant that is independent of the data. For any description in language L1, you can describe it in L2 by prepending a fixed-size translator from L1 to L2, written in L2.

More importantly, they proved (again independently) that there is no algorithm that takes a string as input and tells you its Kolmogorov complexity. Suppose there were. Then I could describe "the first string (in some lexicographic order) that cannot be described in a million bits". But I just did, in far fewer than a million bits: a contradiction.

This is why data compression is an art, instead of a solved problem.

8. ## Thanks (2):

biject.bwts (4th November 2015), Rudi (5th November 2015)