Hi. I'm interested in finding out whether there is any work on approximating how much further compression can go.

The following two charts represent all of the compression scores on the Matt Mahoney and squeezechart sites.

WARNING 1: the time axes are not to scale as I was unable to accurately incorporate dates. So I just graphed it all as simple line charts.
WARNING 2: I cannot vouch for the existence of the slight (apparent) downturn at the extreme right of the charts for the reason above.

Clearly there is some form of asymptotic behaviour. Intuition tells us that there has to be a limit by virtue of the pigeonhole principle; otherwise anything would compress down to one byte. I'm really not sure what role Kolmogorov complexity (Kc) plays regarding incremental improvements to the current top compression algorithms featured on this site. Respectfully, are the authors here hoping for a revelation into how Kc might become computable? I personally don't think that Kc will help with the current benchmark files. They're ultimately human-generated (names, images, phone numbers etc.). It's hard to imagine how a hundred lines of magical computer code could ever hope to produce the wiki definition of "anarchism", for example.

So I think we have to limit this question to algorithmic improvements. A file compressed by cmix virtually passes the ent randomness test. This indicates that a cmix file is almost pure entropy. Consequently there doesn't seem to be much room for improvement. cmix and the paq8 compressors produce files that are within 0.1% of the theoretical Shannon sizes when I test them with synthetically generated data sets (using the standard Shannon log formula). My three graphs all look asymptotic.
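For reference, this kind of "within 0.1% of Shannon" check is easy to reproduce with just the standard library. A minimal sketch: lzma stands in for cmix/paq8 here (it is far weaker, so expect a bigger gap), and the source probabilities are an arbitrary choice so the theoretical entropy is known exactly.

```python
import lzma
import math
import random

random.seed(42)

# Synthetic i.i.d. source with known symbol probabilities.
probs = {b"a": 0.5, b"b": 0.25, b"c": 0.125, b"d": 0.125}
shannon = -sum(p * math.log2(p) for p in probs.values())  # 1.75 bits/symbol

n = 200_000
data = b"".join(random.choices(list(probs), weights=list(probs.values()), k=n))

rate = 8 * len(lzma.compress(data)) / n  # achieved bits per symbol

print(f"Shannon limit: {shannon:.4f} bits/symbol")
print(f"lzma achieves: {rate:.4f} bits/symbol")
```

A stronger context-mixing compressor run on the same `data` should land much closer to the 1.75 bits/symbol floor.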

Therefore I wager that another 50% reduction in compressed file size is impossible: enwik9 will never reach 58 MB. Are there any scientific works/references testing my assertion?

It depends on the models we develop. Most high-end codecs predict using a combination of binary, byte-wise, and word-wise contexts; if there's a higher-level context we can take advantage of, such as a topic model or a tonal model, then we can get higher compression. Compression is such a complex problem that there isn't really a mathematical way to prove we've reached a limit.

The big question is whether we are closing in on the theoretical limit: the Kolmogorov complexity, which is unknown but fixed.

Regarding text, we need to bear in mind that entropy growth is sublinear ( https://encode.su/threads/2092-Infro...xt-compression ):
H(n) ~ n^alpha
where alpha for text currently seems to be ~0.9, but might be improved in future compressors; there are suggestions that the true alpha > 0.6.
This sublinear behavior means that, at least in theory, we asymptotically need 0 bits/symbol (as n -> infinity): due to long-range correlations in text, it becomes more and more difficult to write something really new ...
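To make the sublinear claim concrete: if H(n) = n^alpha with alpha < 1, the per-symbol rate H(n)/n = n^(alpha-1) shrinks toward zero as n grows. A throwaway sketch with the alpha ~ 0.9 mentioned above (the corpus sizes are arbitrary):

```python
alpha = 0.9  # exponent suggested above; purely illustrative

# Per-symbol cost H(n)/n = n**(alpha - 1) for growing corpus sizes.
rates = {n: n ** (alpha - 1) for n in (10**6, 10**9, 10**12)}
for n, r in rates.items():
    print(f"n = 10^{len(str(n)) - 1}: {r:.6f} bits/symbol (relative units)")
```

Each factor of 1000 in corpus size cuts the per-symbol cost by roughly 1000^0.1, i.e. about 2x, which is why the decline is real but very slow.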

This sublinear entropy growth might also apply to other media; for example, all the textures in use would eventually repeat in an infinite movie ...
It raises a practical question: how do we exploit this bits/symbol -> 0 limit?
For example, building a new YouTube video out of all the old ones ... or, more practically, learning the space of possible videos and encoding within it - and with machine-learning-based image/video compressors we have now started going there.

I think the situation is different between lossless and lossy; In theory, for lossy, one can represent the manifold of "naturally occurring video/image/etc" and encode relative to that (that's a way to understand what NN based compressors are doing - they are learning a compact representation for a high-dimensional manifold embedded in a much higher-dimensional ambient space). It feels (not a very scientific statement) that lossless is different since it cannot afford to drop (not encode) "noise" that is far from the manifold in question, and noise is definitionally not compressible.

You are right - in multimedia there is almost always noise that is impossible to predict.
While we can try to somehow remove it in lossy compression, lossless compression cannot really approach 0 bits/symbol asymptotically.

... and there is a similar issue in text compression: the analogue of noise is, for example, the freedom to choose among synonyms. We could try to do "lossy text compression" by somehow standardizing expressions to completely eliminate ambiguity ... but otherwise it just cannot asymptotically approach 0 bits/symbol.

Łukasz Dębowski has a nice idealization ("Santa Fe process") of such an approach to 0 bits/symbol: there is some hidden random infinite sequence of 0/1, and we try to compress a sequence of its values at randomly chosen positions.
Information about this hidden sequence saturates along the compressed sequence - asymptotically leading to 0 bits/symbol.
The hidden sequence represents the entirety of knowledge, and e.g. successive texts saturate this knowledge ... add a few philosophical assumptions and we could approach 0 bits/symbol.
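A quick (and loose) simulation of that idea: a lazily sampled hidden random 0/1 sequence, positions drawn from a heavy-tailed distribution, and the fraction of genuinely new hidden bits per block shrinking over time. The Zipf exponent, position range, and block sizes are arbitrary choices for illustration, not taken from Dębowski's construction.

```python
import itertools
import random

random.seed(0)

hidden = {}  # the hidden infinite random 0/1 sequence, sampled lazily
def z(i):
    if i not in hidden:
        hidden[i] = random.getrandbits(1)
    return hidden[i]

# Heavy-tailed (Zipf-like) distribution over positions.
MAXPOS, S = 10_000, 1.5
positions = list(range(1, MAXPOS + 1))
cum = list(itertools.accumulate(k ** -S for k in positions))

N, BLOCK = 50_000, 10_000
seen, novelty = set(), []
new_in_block = 0
for t in range(1, N + 1):
    k = random.choices(positions, cum_weights=cum)[0]
    _bit = z(k)            # the emitted symbol is the pair (k, z(k))
    if k not in seen:      # only the first visit reveals fresh information
        seen.add(k)
        new_in_block += 1
    if t % BLOCK == 0:
        novelty.append(new_in_block / BLOCK)
        new_in_block = 0

print(novelty)  # fraction of new hidden bits per block; trends downward
```

An ideal compressor of the (position, bit) stream pays full price only for first visits, so its bits/symbol would fall along with this novelty fraction.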

Partial disclosure: My interest in (lossless) compression is for using it as a tool to measure the entropy rate of ergodic/stationary but truly random processes. Not really for text, but I've found that the latest compressors work really well with random data. I don't think that Hilberg's conjecture will hold for a truly random process. It's derived from work with human language, not avalanche diodes where memes/set phrases are not relevant.

I build true random number generators for my sins. Having developed an entropy source, I sample it and then try to guesstimate the entropy rate using compression. So my cunning plan is to do |cmix(samples)| / n. The purpose of this thread is to determine a safe yet practical value for n. So far I've used n = 2. So for example, I might have an ASCII data set like:-

There's no possible underlying model for this behaviour as it's derived from quantum indeterminacy within a hardware thingie. It's truly random, but correlated to some unknown degree. Compressing such a set may result in say 5 bits/sample. Hence I'm relying on the effectiveness of cmix to approximate the true entropy.

What would I then divide it by to be safe yet practical? Simply dividing by a million may not be clever...
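The scheme can be dry-run against a source whose entropy rate is known exactly, to see how conservative a given safety factor is. A sketch, with lzma standing in for cmix (the real test would of course use the strongest compressor available) and an arbitrary skewed 8-symbol distribution as the "entropy source":

```python
import lzma
import math
import random

random.seed(1)

# Stand-in for a hardware entropy source: i.i.d. samples over 8 symbols,
# skewed so the true entropy rate is known and well below 3 bits/sample.
probs = [0.4, 0.2, 0.1, 0.1, 0.05, 0.05, 0.05, 0.05]
true_rate = -sum(p * math.log2(p) for p in probs)  # ~2.522 bits/sample

n_samples = 100_000
data = bytes(random.choices(range(8), weights=probs, k=n_samples))

est = 8 * len(lzma.compress(data)) / n_samples  # |compress(samples)| / n, in bits
safe = est / 2                                  # the n = 2 safety factor

print(f"true rate ~ {true_rate:.3f}, estimate {est:.3f}, claimed-safe {safe:.3f} bits/sample")
```

For this toy source the compression estimate lands slightly above the true rate (compressor overhead), and halving it is comfortably conservative; whether n = 2 is enough for a real diode source is exactly the open question here.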

Ciril, as mentioned, for enwik this limit is unknown but indeed fixed: as its Kolmogorov complexity.

Regarding compression of a PRNG sequence: even the best compressors only focus on relatively simple statistical dependencies, and have no chance of finding the rules governing even the simplest (deterministic) random generators - that requires sophisticated cryptanalysis.

If you are interested in measuring the entropy rate of real (e.g. time) series as minus the asymptotic log-likelihood, as in data compression, a very accurate approach is to first normalize the variables to uniform marginal distributions as in copula theory, and then model the joint distribution with polynomials (slides) - some results for log returns of daily stock prices of 29 Dow Jones companies:
If you are interested in maximizing entropy rate, see https://en.wikipedia.org/wiki/Maxima...py_Random_Walk

Regarding compression of a PRNG sequence: even the best compressors only focus on relatively simple statistical dependencies, and have no chance of finding the rules governing even the simplest (deterministic) random generators - that requires sophisticated cryptanalysis.

I think that you're being overly pessimistic. I've done some testing that suggests these compressors are pretty good. In fact, the linked test approximates entropy to within 0.1% of the true theoretical amount. That's pretty much perfect. Matt's synthetic benchmarking thread also seems to bear this out for my kind of data. Even 7z is only a few tens of percent off the actual value. And the autocorrelation lag is over 70 bytes. Remember that this is not human text, and thus relatively simple in comparison.

I'm starting to think that for sample data of the type I posted above, no significant improvements can be achieved. That would confirm n = 2 as safe.

There is still a lot of headroom for reducing the enwik8/enwik9 size, especially if you allow unrestricted resource usage. However, making progress on the Hutter Prize will be a lot more difficult due to the resource restrictions. Over the last few years of working on cmix, I don't really feel like making incremental improvements has become any more difficult. I am attaching a graph of the enwik8+decompressor size for all the previous cmix versions: over the last three years progress has been close to linear. I can't guess where the actual compression limit will be, but I think we are still far away from it, since it is still relatively easy to make improvements.

The whole point of Kolmogorov complexity is that it's not possible to prove that the ultimate limit of compression for a file has been reached.
And basic entropy measurements do not provide a meaningful hint of what this limit could be.

Let's take an example.
Suppose the exercise is to compress enwik8, but enwik8 has been encrypted first.
Encrypted enwik8 has the same size as enwik8 (or even a bit more), and its entropy suggests that it cannot be compressed.
And indeed, trying to compress encrypted enwik8 seems to prove it: no compressor can find any gain.

Yet, that's not completely true.
If one knows the encryption algorithm and key, one can decrypt, and then compress, leading to substantial gains.

That's the essence of Kolmogorov complexity.
A source may seem impossible to compress, and this can be confirmed by entropy measurements,
but that only proves that the decoder is unable to interpret the data, not that the data is incompressible.

The big question is whether we are closing in on the theoretical limit: the Kolmogorov complexity, which is unknown but fixed.

I strongly feel that we are far from reaching the Kolmogorov complexity.
Let me explain.

Remember the 4KB intros/compos/demos? Participants create (generate) mind-blowing videos with sound from just 4KB of code+data (youtube). If you try to compress the resulting video+audio by actually modeling it as video+audio, you are soooooo far from the Kolmogorov complexity (which is less than 4KB).
I think 4KB intros/compos/demos are the best to illustrate how large the gap could be between the size of a compressed file and the Kolmogorov complexity of that file.

My enwik8 skeleton compression post demonstrates the same idea: I have a very good "compression ratio" (71:1), but my approach is different: I do not let a general compression algorithm try to guess the structure of the file; I use an algorithm crafted (almost optimally) for that particular file. Do you guys feel the difference?

So how to get closer to the Kolmogorov complexity? Here is a (non-practical) algorithm:
Take a file. See it, understand it, let your brain pick up its patterns and finally write an algorithm that creates the file. In order to get the best results - take a decade of vacation ("unrestricted resource usage" in Byron's words). If you did your job right, your result will always be better (sometimes far better) than using any "general" compression algorithm that we use today. No "general" compression algorithm is specialized for this file.

Originally Posted by ciril_desquirrel

It's hard to imagine how a hundred lines of magical computer code could ever hope to produce the wiki definition of "anarchism" for example.

The definition of Kolmogorov Complexity might be misleading. Program = code + data. Thus an upper bound of the Kolmogorov complexity of a file is the size of its compressed format + the size of its decompressor. Not just "a hundred lines of magical computer code" - that's just the algorithm. You will need to include some data as well. You may imagine the "data" part as a tight decision tree that the "a hundred lines of magical computer code" uses.
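That "code + data" upper bound is directly computable for any concrete file: compress it with anything, count the decompressor's bytes too, and you have a (usually very loose) bound on K. A sketch with zlib; the one-liner decompressor program is illustrative, not minimal.

```python
import zlib

# A hypothetical file whose Kolmogorov complexity we want to bound from above.
x = b"the quick brown fox jumps over the lazy dog. " * 1000

data = zlib.compress(x, 9)  # the "data" part of the program

# The "code" part: a complete (if unoptimized) decompressor program.
code = b"import sys,zlib;sys.stdout.buffer.write(zlib.decompress(sys.stdin.buffer.read()))"

upper_bound = len(code) + len(data)
print(f"|x| = {len(x)} bytes, K(x) <= {upper_bound} bytes (code + data)")
```

Strictly speaking the bound is relative to the chosen reference machine (here, "a machine that runs Python"), which is the point being debated further down the thread.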

Originally Posted by Jarek

at least in theory, asymptotically we need 0 bits/symbol (for n->infinity)

A small correction: I agree that it is getting closer and closer to 0, but not arbitrarily close ("asymptotically" does not sound right). It is approaching the entropy of the source (which is certainly not 0).

Originally Posted by Cyan

The whole point of Kolmogorov complexity is that it's not possible to prove that the ultimate limit of compression for a file has been reached.
And basic entropy measurements do not provide a meaningful hint of what this limit could be.
[...]
A source may seem impossible to compress, and this can be confirmed by entropy measurements,
but that only proves that the decoder is unable to interpret the data, not that the data is incompressible.

Well put!

Originally Posted by byronknoll

I can't guess where the actual compression limit will be, but I think we are still far away from that since it is still relatively easy to make improvements.

Completely agree.

Last edited by Gotty; 2nd February 2019 at 13:56.
Reason: Didn't really shine through what I meant

A source may seem impossible to compress, and this can be confirmed by entropy measurements...

This is the part of interest to me. Surely you can compress right down to the entropy content of the file. Let's leave encrypted files aside. The trick is in measuring the entropy accurately. And if you know how to measure the entropy, you know how to compress the file. Clearly the simplistic Shannon -sum(p_i log p_i) formula is not practical, as we don't know what the symbols i are in the presence of autocorrelation. I've tried using NIST entropy formulae, but the GitHub implementation is rubbish. So I'm left with strong compression as a measuring device.

The digits are random, and "-fixed-text" is, well, fixed. Therefore the theoretical Shannon entropy of this file is exactly 2.76827 bits/byte. If you use cmix, you get 2.77107 bits/byte. That's within 0.1% of the best possible result. Ergo, for this file, there is hardly any room for improvement.

So perhaps we can still further shrink the enwik* files, but I don't see how we can shrink this example as there appears to be nowhere left to go...

The transformer-XL results are really impressive. Someone should convert it into a compression program! I am guessing that by itself it won't be competitive with PAQ8. It is SoTA for language modelling with a train/test split, but it won't do as well with the online training needed for arithmetic coding. However, it would probably be very useful as a model added to PAQ8/cmix.

The transformer-XL results are really impressive. Someone should convert it into a compression program! I am guessing that by itself it won't be competitive with PAQ8. It is SoTA for language modelling with a train/test split, but it won't do as well with the online training needed for arithmetic coding. However, it would probably be very useful as a model added to PAQ8/cmix.

I disagree, completely.
Not only are the results unimpressive, but their claims to state-of-the-art status are ignorant at best, or purposefully deceptive.
It seems the rage nowadays is to come up with new machine-learning architectures, tweak them, let them train for weeks on bona fide supercomputers, and then publish ludicrous claims without any meaning.

Either the authors (and most in the field) have forgotten about Kolmogorov complexity, which is doubtful given their prestigious credentials, or the (sad) state of academic publishing is such that nowadays if you don't publish something, you're a failure.

Now, before going any further, let me justify myself with a concrete example.
I can publish a machine-learning language model right now that, within equivalent resource restrictions to those described in the paper, achieves 0.76 bpc on enwik8, i.e., compresses it to less than 9,500,000 bytes.

Now back to the paper. The code for it is located here. The pre-trained Tensorflow models are here.
Notice anything? They can compress enwik8 down to 0.99 bpc... using a model that's over 3GB(!) in size. Impressive? No, not at all.
Not to mention that it requires powerful GPUs to use the model (let alone train it), and it can still take a couple of hours.

So I modified paq8px to use only the normal model and the text model. That way, on level 9, it uses about the same memory: 3GB.
I trained it on enwik8 (so, to reproduce the result, we too would need to load 3GB of model state data). I didn't even save the mixer weights.
Compressing enwik8 after this training stage gives my claimed result, 0.76 bpc, in less than 1 hour, single-threaded, on a CPU.

Sounds ridiculous? Yes, as are their claims. The moment they decided to include results for cmix and yet claim themselves SoTA, they decided that comparing apples to oranges was fair game. What is the benefit of considering these ML models as-is for such tasks? If anything, we should be trying to understand what correlations are being modeled in these networks and devising mathematical frameworks to allow us to efficiently perform that analysis.

These networks encode the pattern correlations they were trained on in a huge high-dimensional space; they transform the information they've been given into a mapping into this space, but that gives us no better insight into how we can reduce the dimensionality to something tractable. We know that the space of actually meaningful information is a much smaller subset of the infinite space of possibilities, so we need to focus on understanding how to devise the tools needed to encode a mapping into that space using as little data as possible.

I'm not 100% sure what they are doing with transformer-XL, but if they are training their model on the same data that they then compress, that makes no sense at all. If we're allowed a 3 GB decompressor, then BARF ( http://mattmahoney.net/dc/barf.html ) can easily compress enwik8 to one byte. It would make much more sense to e.g. train on the first 100 MB of enwik9, and then evaluate the compression ratio on the next 100 MB of enwik9. Have they done that?

For their enwik8 results, they use the first 90% for training, the next 5% for validation, and the last 5% for testing.

I tried measuring the performance of cmix v16 using this setup. I posted the results under the "enwik8" section here. After a single pass through the training set, cmix v16 gets 1.1092 bpc on the test set. Their 0.99 bpc is significantly better than what has been achieved using LSTMs on the same benchmark.

For their enwik8 results, they use the first 90% for training, the next 5% for validation, and the last 5% for testing.

Yes, that's the usual methodology. But now, that's not what they're stating, is it? Look at table 2, the results for enwik8. They're directly comparing their 1.06 and 0.99 bpc results to cmix v13, listed as 1.23 bpc, which is v13's result for the full enwik8, with no 90% pre-training stage. So, deception or just ignorance?

Originally Posted by byronknoll

I tried measuring the performance of cmix v16 using this setup. I posted the results under the "enwik8" section here. After a single pass through the training set, cmix v16 gets 1.1092 bpc on the test set. Their 0.99 bpc is significantly better than what has been achieved using LSTMs on the same benchmark.

At the cost of making even cmix look fast. Honestly, what is the point? It seems like every day a new paper comes up on arXiv with yet another ridiculously over-the-top network architecture claiming "SoTA" status. The old saying truly has a point: when all you have is a hammer, everything starts looking like a nail. We know that modelling long-term correlations helps, and we know that maintaining an internal memory state helps; what we don't know is how to model this efficiently.

I'd much rather read a paper claiming 1.1 bpc on enwik8 but doing so with 3 or 4 fewer orders of magnitude of resources than the current ML approaches; that would be a lot more interesting. Alas, it seems all the work in the field nowadays is focused on this approach, instead of trying to devise new, efficient ways of achieving similar performance. Are we to resign ourselves to the fact that everything else has already been invented, that no other avenues await, that we must simply proceed on this path? I refuse to believe so, and if anything I think the admittedly poor work I've been doing on paq8 shows that it's much better to improve the way we model the data than to rely on sheer brute force.

So how to get closer to the Kolmogorov complexity? Here is a (non-practical) algorithm:
Take a file. See it, understand it, let your brain pick up its patterns and finally write an algorithm that creates the file. In order to get the best results - take a decade of vacation ("unrestricted resource usage" in Byron's words). If you did your job right, your result will always be better (sometimes far better) than using any "general" compression algorithm that we use today. No "general" compression algorithm is specialized for this file.

Oh, I can certainly write one. This is the easiest exercise. Here is its output:

0

And here is the decoder: If the input is 0, regenerate "the file". If the input is "1", strip off the "1" bit and move the rest from input to output.
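Spelled out as code, with THE_FILE as a hypothetical placeholder for whatever content the "machine" has baked in:

```python
# Gotty's one-bit decoder, written out. THE_FILE stands for whatever content
# the machine hard-wires; the value below is a hypothetical placeholder.
THE_FILE = b"...the entire file, hard-wired into the decoder..."

def decode(bits: str) -> bytes:
    if bits == "0":
        return THE_FILE               # input "0": regenerate "the file"
    assert bits.startswith("1")
    rest = bits[1:]                   # input "1...": strip the 1, pass the rest through
    return bytes(int(rest[i:i + 8], 2) for i in range(0, len(rest), 8))

print(decode("0") == THE_FILE)        # "the file" costs exactly one bit
print(decode("1" + "01000001"))       # any other content costs its own length + 1 bit
```

This makes the catch obvious: "the file" costs one bit only because its entire content is hidden in the decoder, which is exactly the objection raised in the following posts.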

Originally Posted by Gotty

The definition of Kolmogorov Complexity might be misleading. Program = code + data. Thus an upper bound of the Kolmogorov complexity of a file is the size of its compressed format + the size of its decompressor. Not just "a hundred lines of magical computer code" - that's just the algorithm. You will need to include some data as well. You may imagine the "data" part as a tight decision tree that the "a hundred lines of magical computer code" uses.

Well, that isn't all of it. The Kolmogorov complexity is relative to a Turing machine, which may have an internal (finite) state machine according to which it operates. Hence, the Kolmogorov complexity is only relative to the machine it was defined on. (Or, in your example, whether the demo was written for i86 or 68K, so to say.) Results will be different. Again, for a given job ("play this demo") I can easily find a Turing machine whose program size is minimal ("0") and whose data size is minimal ("") which plays this demo, because I can hide the complexity in the machine ("hard-wire something that plays the demo").

So this is not quite meaningful. It becomes meaningful if you consider infinite sequences, and check whether you can generate them with a (finite) Turing machine (low complexity) or with no Turing machine at all (a statistical sequence - if you believe that such a thing exists), in which case the data of the program needs to be infinite as well.

you are soooooo far from the Kolmogorov complexity (which is less than 4KB).

This is an oversimplification. Sorry if I misled anyone (see below).

Originally Posted by thorfdbg

The Kolmogorov complexity is relative to a Turing machine, which may have an internal (finite) state machine according to which it operates.

Originally Posted by thorfdbg

I can easily find a Turing machine whose program size is minimal ("0") and whose data size is minimal ("") which plays this demo. Because I can hide the complexity in the machine ("hard-wire something that plays the demo").

I'm afraid you can't cheat like that. If any algorithm is manipulating any data on the Turing Machine you must take it into consideration when determining the Kolmogorov complexity. If your Turing Machine has multiple abstraction layers: from the gates on the silicon chip(s), the memory controller, cache controller, cpu microcode, chipset microcode, bios, operating system kernel, dlls, ... up to your programs code + data - you must consider all of them and include their length in the "description" of your input because they all have a role in reproducing it.

You can't hide any tricky stuff anywhere. You would just nullify the essence of Kolmogorov complexity.
At the very least you have to read the description from the "input tape", output the original file, and finally halt. So that is at least 3 different instructions; you cannot have 0 instructions.

Originally Posted by thorfdbg

It becomes meaningful if you consider infinite sequences.

See Solomonoff's paper: "Any finite string of 0's and 1's is an acceptable input to M1"; "known finite string of symbols"; or just consider the title of Chaitin's paper: "On the length of programs for computing finite binary sequences."
It looks like our forefathers found it meaningful for finite strings only. I also find it meaningful for finite strings. It sounds like you mean determining the entropy of a source? Is that correct? Or am I mistaken?

I'm afraid you can't cheat like that. If any algorithm is manipulating any data on the Turing Machine you must take it into consideration when determining the Kolmogorov complexity. If your Turing Machine has multiple abstraction layers: from the gates on the silicon chip(s), the memory controller, cache controller, cpu microcode, chipset microcode, bios, operating system kernel, dlls, ... up to your programs code + data - you must consider all of them and include their length in the "description" of your input because they all have a role in reproducing it.

So, what is the "length of the gates of a microprocessor"? The problem is, you cannot really define that either. The Kolmogorov-complexity is only relative to a given machine, but the important aspect here is that a Turing machine is driven by a *finite* state machine (stress on *finite*). Hence, one cannot point at a particular number sequence and say "the Kolmogorov-complexity of this sequence is 5".

Originally Posted by Gotty

It looks like our forefathers found it meaningful for finite strings only. I also find it meaningful for finite strings. It sounds like you mean determining the entropy of a source? Is it correct? Or am I mistaken?

No, Kolmogorov complexity. It is meaningful. For example, consider the Kolmogorov complexity of Pi: this is certainly finite - Pi is an "algorithmic" (computable) number - so there is a finite state machine and a finite input tape forming a Turing machine with its input that, when run, produces the infinite series of the digits of Pi. How long this input tape is depends on the Turing machine, of course, but we know that a finite number of symbols is sufficient.

Consider a stochastic series of 0s and 1s: for this, we can probably find a Turing machine (finite state machine), but its input tape has to be infinite to reproduce the sequence. So there is clearly a difference between the two cases.
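The Pi example is worth seeing in code: a finite program (Gibbons' unbounded spigot, a few lines) streams as many decimal digits of Pi as you care to wait for, which is exactly the sense in which a finite description suffices for the infinite digit sequence.

```python
import itertools

def pi_digits():
    """Gibbons' unbounded spigot: yields the decimal digits of Pi forever."""
    q, r, t, k, m, x = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < m * t:
            yield m
            q, r, m = 10 * q, 10 * (r - m * t), (10 * (3 * q + r)) // t - 10 * m
        else:
            q, r, t, k, m, x = (q * k, (2 * q + r) * x, t * x,
                                k + 1, (q * (7 * k + 2) + r * x) // (t * x), x + 2)

print(list(itertools.islice(pi_digits(), 10)))  # [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
```

A truly stochastic 0/1 sequence admits no such finite generator: any program reproducing it must carry the sequence itself on its input tape, which is the contrast being drawn above.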

Yes, that's the usual methodology. But now, that's not what they're stating, is it? Look at table 2, the results for enwik8. They're directly comparing their 1.06 and 0.99 bpc results to cmix v13, listed as 1.23 bpc, which is v13's result for the full enwik8, with no 90% pre-training stage. So, deception or just ignorance?

Yes, the cmix v13 result is not comparable with the other results in that table. I think they just copied that result from other papers (e.g. this one), which include cmix results with more explanation. Even the typo in my name has propagated between papers.

Originally Posted by mpais

At the cost of making even cmix look fast. Honestly, what is the point? It seems like every day a new paper comes up on arXiv with yet another ridiculously over-the-top network architecture claiming "SoTA" status. The old saying truly has a point: when all you have is a hammer, everything starts looking like a nail. We know that modelling long-term correlations helps, and we know that maintaining an internal memory state helps; what we don't know is how to model this efficiently.

I'd much rather read a paper claiming 1.1 bpc on enwik8 but doing so with 3 or 4 fewer orders of magnitude of resources than the current ML approaches; that would be a lot more interesting. Alas, it seems all the work in the field nowadays is focused on this approach, instead of trying to devise new, efficient ways of achieving similar performance. Are we to resign ourselves to the fact that everything else has already been invented, that no other avenues await, that we must simply proceed on this path? I refuse to believe so, and if anything I think the admittedly poor work I've been doing on paq8 shows that it's much better to improve the way we model the data than to rely on sheer brute force.

For certain applications efficiency is not very important. Language modelling can be used for other tasks like translation, sentiment analysis, text categorization, etc. For some applications there are fewer time constraints so it doesn't matter how long it takes to process the data. The precision/recall of the model may take precedence over the resource cost.

Some researchers have access to huge amounts of computational resources, so they have somewhat of an unfair advantage. They can achieve SoTA results just by scaling up the size of the model. That is why the "# of parameters" gets reported.

I think transformer-XL is a genuinely innovative architecture. It is substantially different from LSTMs, and I am guessing it will achieve some nice results for data compression.

The Kolmogorov-complexity is only relative to a given machine

Just a small correction: the Kolmogorov complexity is relative to the description Language, not the hardware (machine) that runs it. Nonetheless, I understand what you mean and I agree.

Originally Posted by thorfdbg

So, what is the "length of the gates of a microprocessor"? The problem is, you cannot really define that either.

I will try to. First let's have a summary of the process and our goal (in modern terms):

(In the following paragraphs: "string" = the file content for which we need to find the Kolmogorov complexity; "machine" = any hardware or software implementation of a Universal Turing Machine, or preferably a modern computer.)

We will upload a Description file (the "compressed" form of our string) to the memory of a machine, start the machine, and when it finishes processing the Description (i.e. it halts), we expect our original "uncompressed" string to appear starting from a specific memory location.

The Description is written in a predetermined description Language. The machine has an interpreter that can decode descriptions written in this Language. The description Language is essentially a compression format. The interpreter is essentially the decompressor. Our goal is to find the shortest possible Description.
This description Language may have any complexity. We may choose it freely but we shall choose wisely (it should have a good modeling power).

Please note, that the Language is not bound to our specific machine. It will run on any machine that has an interpreter for it.
Let's write an interpreter for the Language discussed above in java or python or c++ or in any other language that runs on any older or more modern architecture. The exact implementation of the machine and the interpreter does not play a role when calculating the Kolmogorov complexity. Only the Language and the Description does.

The standard Kolmogorov complexity definition states that it is the shortest length of a Description written in a predetermined Language.
Thus the description Language must be independent of our string. It must not contain references to the string's size, its characters, its structure, or concrete frequencies of anything in it. All such information must be placed in the Description and must be acquired from it during decoding.

In order to find the shortest possible Description we'll need to use a Language that enables us to include algorithms (models/transformations crafted specifically for our string) in the Description. Let's suppose the string is a text string. When we have a general model for text, it's not useful as-is: it's not yet optimized for the string. For the smallest possible Description we must super-optimize it (and any other useful models) for the string. When we have a smaller text we must have a smaller (simpler) model; when we have a larger text we may have room for a more complex model. We'll need to find the best balance between the model size and the encoded size of the string.

When encoding "abrakadabra", for instance, we can't have too sophisticated a text model: it may just model the leading 3 bits (all 11 characters being lowercase ASCII letters). We probably cannot even model efficiently that "abra" repeats, because that model would be larger than the bits it gains. On the other hand, when encoding, for instance, dickens from silesia, we can design a more sophisticated model. It is advantageous to enhance our text model as long as the encoded size of the string shrinks by more than the Description grows from the included model enhancements.
When we include such models/transformations in the Description - preferably in a tight format - then we can call them from the interpreter of the Language (inside a virtual machine, of course).
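This tradeoff shows up even with a generic compressor: for a tiny string the container/model overhead exceeds any gain, while for a long repetitive string paying for the match model wins overwhelmingly. A quick check with zlib (the 500x repetition is an arbitrary stand-in for a dickens-sized text):

```python
import zlib

short = b"abrakadabra"
long_text = b"abrakadabra " * 500  # arbitrary stand-in for a long repetitive text

# Tiny input: overhead (header, checksum, code tables) exceeds any gain.
print(len(short), len(zlib.compress(short, 9)))       # compressed is LONGER

# Long input: the same model amortizes and the encoded size collapses.
print(len(long_text), len(zlib.compress(long_text, 9)))
```

The same logic applies to the Description: a model is only worth including once the string is long enough to amortize the model's own encoded size.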

Now, back to the problematic definition of the "length of the gates".

The gates on a chip, and especially their connections to each other, may be created in a manner that is not independent of our string. You could easily create a circuit board that, when powered, immediately outputs the full string to the desired memory location. This is why we must consider them when determining the Kolmogorov complexity.

If the behavior of the gates or their connections does contain information about the size/structure/content of the string, then they are considered part of the Description. The gates are just hardware implementations of some (boolean) logic; the information content related to their behavior is what we need. The question is: which description Language are we talking about in this case? Is it a circuit-board design encoded in a predetermined (!) format? If so, that shall be included in the Description. But that is probably too expensive; there may be better Languages for describing our string.

On the other hand, if the behavior of the gates and their connections is independent of our string, then they will not be part of the "Description"; they will be part of the "machine" or the "interpreter" and so will simply be excluded from the calculation of the complexity.

I was not very precise regarding this "inclusion"/"exclusion" in my previous post. Sorry for that. I hope it's better now.

Originally Posted by thorfdbg

No, Kolmogorov complexity. It is meaningful.

I agree: it may be meaningful for infinite strings as well.
In your post you just sounded as if the infinite case were more important than the finite case (which is where I disagree):

Originally Posted by thorfdbg

It becomes meaningful if you consider infinite sequences.

Last edited by Gotty; 7th February 2019 at 20:03.
Reason: clarifications