Well, I used JavaScript to generate a string of about 20 random char_3bit values (numbers between 0 and 7). Using my algorithm, I encoded it (by hand) and got a string of 18.3 char_3bit values. Then I decoded it and got exactly the original string. I don't care whether you guys believe it or not. I created this topic to find out whether any other algorithm can compress random data, and to discuss compression speed and compression ratio.

Almost any reasonable compression program can compress some files generated by a random number generator, as long as the output happens to have enough of the type of redundancy targeted by the compressor.

The problem is that, on average, you won't be able to achieve compression if the source of the data is random. Finding one specific output of a random number generator that you are able to compress only proves that that particular output was compressible, not that you can achieve compression on average across the whole spectrum of possible files. If you think about it enough, you will realize there are 2^60 equally likely files containing 20 random 3-bit values. You can't fit all of these files into a space containing fewer than 2^60 possible outcomes, because you will run out of unique output files before you run out of (equally probable) input files. This is the pigeonhole principle, and you can read more about it here: https://en.wikipedia.org/wiki/Pigeonhole_principle
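To make the counting concrete, here is a small C sketch (the function names are illustrative, not from any library) that counts how many distinct bit strings are strictly shorter than n bits. Even if every shorter length is allowed, including the empty string, the total is 2^n - 1, one fewer than the 2^n inputs of exactly n bits, so at least two 60-bit inputs would have to share an output.

```c
#include <stdint.h>

/* Number of distinct bit strings of length 0, 1, ..., n-1:
   2^0 + 2^1 + ... + 2^(n-1) = 2^n - 1.
   Valid for n <= 63 so the result fits in uint64_t. */
uint64_t count_shorter_outputs(unsigned n)
{
    uint64_t total = 0;
    for (unsigned len = 0; len < n; len++)
        total += (uint64_t)1 << len;   /* 2^len strings of exactly len bits */
    return total;
}

/* Number of distinct n-bit inputs, 2^n (n <= 63). */
uint64_t count_inputs(unsigned n)
{
    return (uint64_t)1 << n;
}
```

For n = 60 (20 values of 3 bits), count_shorter_outputs(60) is 2^60 - 1, which is smaller than count_inputs(60). In practice it is even worse, because a variable-length output also has to say where it ends.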

Last edited by Kennon Conrad; 8th January 2016 at 06:50.

For sure, you are not the first one with this kind of algorithm; others who built an encoding like this got around the same shrink ratio on each pass.

The problem is that nobody will believe you, not even in real live tests; they will think you cheated.

The only way to get it out, become famous and maybe earn some money with a book and TV shows, is to build it and publish binaries and/or source code in places where it will be visible, hard to remove and quickly copied by interested people.

In other words, it's a risky business because of the military and economic implications. If you can't follow what I'm talking about, watch this 4-hour video to get some understanding: https://www.youtube.com/watch?v=oHxGQjirV-c

Last edited by Sportman; 8th January 2016 at 15:07.

You see, if we assume we have N bits, they can represent 2^N possible combinations. If you dare to claim you can ALWAYS encode them in, say, N-1 bits, there is a catch: your output can represent only 2^(N-1) possible combinations, 2x fewer than before. This means one little problem: the output has fewer possible states than the input, which inevitably means more than one input state maps to the same output state. That's what people here mean: you CAN compress stuff to whatever size. But GOOD LUCK GETTING IT BACK UNCHANGED. Since at least two input states share an output state, there is uncertainty about which input produced your particular output and how to decode it. If you assume ALL possible combos of N bits are possible inputs, there is no way to encode them as N-1 bits. If that is hard to picture in binary, let's try the decimal system: can you represent the decimal number 1 549 823 using only 3 decimal digits and nothing else?
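The argument above can be checked mechanically for small N. This is a minimal C sketch under stated assumptions: a "compressor" is modelled as any function from N-bit values to (N-1)-bit values, and drop_top_bit is a hypothetical example map. Feeding every N-bit input through the function always turns up two inputs with the same output, whatever function you plug in.

```c
#include <stdint.h>
#include <stdlib.h>

/* A "compressor" here is any function from n-bit values to (n-1)-bit values. */
typedef uint32_t (*shrink_fn)(uint32_t input);

/* Hypothetical example map: drop the top bit of a 3-bit value. */
uint32_t drop_top_bit(uint32_t x) { return x & 0x3u; }

/* Feeds every n-bit input (1 <= n <= 31) through f and looks for two inputs
   that land on the same (n-1)-bit output. Returns 1 and fills *a, *b on the
   first collision, 0 if none is found (or allocation fails). By the
   pigeonhole principle a collision always exists. */
int find_collision(shrink_fn f, unsigned n, uint32_t *a, uint32_t *b)
{
    uint32_t outputs = (uint32_t)1 << (n - 1);
    uint32_t inputs = (uint32_t)1 << n;
    /* first_input[y] = (first input that produced y) + 1; 0 means unseen */
    uint32_t *first_input = calloc(outputs, sizeof *first_input);
    if (first_input == NULL)
        return 0;
    for (uint32_t x = 0; x < inputs; x++) {
        uint32_t y = f(x) & (outputs - 1);   /* force the result into n-1 bits */
        if (first_input[y] != 0) {
            *a = first_input[y] - 1;
            *b = x;
            free(first_input);
            return 1;
        }
        first_input[y] = x + 1;
    }
    free(first_input);
    return 0;
}
```

With drop_top_bit and n = 3, inputs 0 and 4 both map to 0, so a decoder given 0 cannot know which one was meant.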

NB: I guess there are still some random-data compression challenges running? Maybe they even offer a prize?

I have a question about a statement you made earlier:

Originally Posted by tefara

For example, I can encode and decode a string of 256 totally random chars (all of them different from each other)

When you say you generate a string of random values, do you mean actually random values, or randomly ordered distinct values (all different from each other)?

Because, as both Mauro Vezzosi and I pointed out, it makes a rather large difference.

For example, I can encode and decode a string of 256 totally random chars (all of them different from each other). This situation is the simplest type of random data, and encoding such a string is the first step in my algorithm. If "all of them are different from each other", my algorithm reaches its maximum compression ratio. Then I proceed to compress random chars that are "not all different". The solution is to build a dynamic dictionary of length 32 chars. To do that, we must investigate the current string => define the 32-char dictionary => encode => build a new dictionary ...
Logically, if one 32-char dictionary can be used to encode 3 blocks of 256 chars, the result goes from 256*3 chars to 32+240*3 chars...
Solution:
1. I found a way to ensure that each dictionary can meet that requirement, or can do better with further development!
2. The LZMA algorithm could help here, by completely removing the 32-char dictionary from the encoded string; however, I haven't tried this yet.
I tried my algorithm on strings of char_3bit; the result varies erratically between a 1% and 5% reduction.
3. Decompression is about 4 times faster than compression (we compress once and decompress millions of times).
Compression includes the investigation needed to build the dynamic dictionary, while decoding does not, so decompression is faster than compression.

It seems to me you are a good guy and you really believe what you say, so it is worthwhile to help you.
I suggest testing your algorithm on real-life data (e.g. on some small parts of enwik8.zip/rar/7z), not on data generated by hand or by a program whose output frequency distribution you know.
A string of 256 totally random chars (all of them different from each other) is very rare, so with real-life data you need to tell the decoder whether each block of 256 chars is compressed or not: you'll see that this extra info hurts the compression ratio and enlarges the final data size.

Thanks for sharing your ideas.
srand(time(0));
char x = (char)(rand() % 8);
I treat x as a char that needs 3 bits to represent.
This is how I generated the random string; I don't think there is any frequency distribution here. Because I'm not a coder, I only tried with 3-bit chars and compressed by hand.
The extra info here is the 32-bit dictionary, plus about 0.01% for escape codes.
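For anyone who wants to reproduce the experiment, here is a self-contained C version of the generator from the post, as a minimal sketch. The function names seed_generator and fill_random_3bit are hypothetical; the random values come from the C library rand(), exactly as in the post, so they inherit its quality issues.

```c
#include <stdlib.h>
#include <time.h>

/* Seeds the C library generator from the clock; call once before use. */
void seed_generator(void)
{
    srand((unsigned)time(NULL));
}

/* Fills out[0..n-1] with values 0..7, i.e. 3 bits of input each. */
void fill_random_3bit(unsigned char *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (unsigned char)(rand() % 8);   /* same expression as the post */
}
```

Generating 20 values this way gives 60 bits of input, matching the 2^60 count discussed earlier in the thread.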

Firstly, the low order bits of most rand() implementations are quite poor. A crude fix is to use higher bits, even though this does not help with the modulo bias.

char x = (rand() >> 8) % 8;
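If you also want to get rid of modulo bias, a common technique is rejection sampling: throw away rand() results at or above the largest multiple of n, then divide by the bucket size so the high-order bits pick the result. A minimal sketch (rand_below is a hypothetical name):

```c
#include <stdlib.h>

/* Returns a value in [0, n) with no modulo bias, for 1 <= n <= RAND_MAX + 1.
   Results of rand() at or above the largest multiple of n are rejected, and
   the survivors are divided (not reduced mod n) so the high bits decide. */
int rand_below(int n)
{
    unsigned long range = (unsigned long)RAND_MAX + 1UL;  /* rand() yields 0..RAND_MAX */
    unsigned long bucket = range / (unsigned long)n;      /* values per outcome */
    unsigned long limit = bucket * (unsigned long)n;      /* reject r >= limit */
    unsigned long r;
    do {
        r = (unsigned long)rand();
    } while (r >= limit);
    return (int)(r / bucket);
}
```

For n = 8 on an implementation where RAND_MAX + 1 is a power of two, limit equals range and the loop never rejects, so there is no bias to remove; the division still makes the result depend on the high-order bits rather than the weak low ones.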

But the question is: how many strings are there that your method can compress? How likely are we to find one of them in random data?

There are 8^8 possible strings of 8 random 3-bit values (16.777.216). There are 8! possible strings of 8 randomly ordered unique 3-bit values (40.320). So we only have a 0.24% chance of generating one of these compressible strings.

This gets a lot worse when we go to strings of 256 random 8 bit values. There are 256^256 random strings (roughly 10^616), but only 256! randomly ordered unique value strings (roughly 10^507). The odds of generating a string compressible by this method are astronomically tiny.

I tried to explain this already: if you read carefully what I wrote before, you will see that I use a special method to find the redundancy between blocks of 256 chars and store it in a 32-char dictionary. This dictionary is not static; it is a multiple dynamic dictionary. That means if I have 30 blocks of 256 chars, on average one dictionary can be used by 3 blocks, so the result is 10*(32+240*3) chars. There is a very, very low chance that a dictionary can only be used by 1 or 2 blocks; on the other hand, there is a high chance that one dictionary can be used by 4 blocks or more.
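The size arithmetic in this post can be restated as a small C sketch. Every parameter (256-char blocks, a 32-char dictionary shared by 3 blocks, 240 chars per encoded block) is the post's own claim, not a measurement, and the function names are hypothetical. No per-block flags or escape codes are counted.

```c
#include <stddef.h>

/* Claimed output size in chars: every group of blocks_per_dict blocks stores
   one dict_len-char dictionary, plus encoded_len chars per block. */
size_t claimed_size(size_t blocks, size_t blocks_per_dict,
                    size_t dict_len, size_t encoded_len)
{
    size_t dictionaries = (blocks + blocks_per_dict - 1) / blocks_per_dict; /* round up */
    return dictionaries * dict_len + blocks * encoded_len;
}

/* Uncompressed size in chars. */
size_t raw_size(size_t blocks, size_t block_len)
{
    return blocks * block_len;
}
```

For 30 blocks this gives 7520 chars against 7680 raw, a claimed saving of 160 chars (about 2.1%), which would also have to pay for any signalling the decoder needs.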

If you guys want to see results soon, please help me improve my coding skills. Right now I don't even know how to read a file into memory.

Thank you very much for sharing. This code is a little difficult for a newbie coder like me to understand, but I can learn some ways to improve processing speed from it. It would be great if you could help me with code to: read and write x chars from and to a file starting at char (position) n (to save RAM), and write a group of information such as the file name (string), file size (int), vector char 1, vector char 2... into one file and read them back correctly for decompression. Thanks.
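A minimal sketch of both requests using only standard C stdio (fopen, fseek, fread, fwrite). The function names and the record layout are hypothetical choices for illustration: each record is [uint32 name length][name][uint64 data length][data], with integers in the machine's native byte order, so files are not portable between machines with different endianness.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Reads up to count bytes starting at byte offset pos. Returns bytes read. */
size_t read_at(const char *path, long pos, unsigned char *buf, size_t count)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL) return 0;
    size_t got = 0;
    if (fseek(f, pos, SEEK_SET) == 0)
        got = fread(buf, 1, count, f);
    fclose(f);
    return got;
}

/* Overwrites count bytes at offset pos in an existing file ("r+b" keeps the rest). */
int write_at(const char *path, long pos, const unsigned char *buf, size_t count)
{
    FILE *f = fopen(path, "r+b");
    if (f == NULL) return -1;
    int ok = fseek(f, pos, SEEK_SET) == 0 && fwrite(buf, 1, count, f) == count;
    return (fclose(f) == 0 && ok) ? 0 : -1;
}

/* Appends one record: name length, name bytes, data length, data bytes. */
int write_record(FILE *f, const char *name, const unsigned char *data, uint64_t len)
{
    uint32_t name_len = (uint32_t)strlen(name);
    if (fwrite(&name_len, sizeof name_len, 1, f) != 1) return -1;
    if (fwrite(name, 1, name_len, f) != name_len) return -1;
    if (fwrite(&len, sizeof len, 1, f) != 1) return -1;
    if (fwrite(data, 1, (size_t)len, f) != len) return -1;
    return 0;
}

/* Reads one record written by write_record. name holds name_cap bytes
   (including the terminating NUL); data holds data_cap bytes. */
int read_record(FILE *f, char *name, size_t name_cap,
                unsigned char *data, size_t data_cap, uint64_t *len)
{
    uint32_t name_len;
    if (fread(&name_len, sizeof name_len, 1, f) != 1 || name_len >= name_cap) return -1;
    if (fread(name, 1, name_len, f) != name_len) return -1;
    name[name_len] = '\0';
    if (fread(len, sizeof *len, 1, f) != 1 || *len > data_cap) return -1;
    if (fread(data, 1, (size_t)*len, f) != *len) return -1;
    return 0;
}
```

Records are read back in the same order they were written: call write_record once per item, then rewind (or reopen) the file and call read_record the same number of times.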

Not bad. It compresses 256 random bytes (all of them different from each other) to 212 bytes. That's close to the theoretical limit of log2(256!)/8 = 210.5 bytes.
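The log2(256!) bound comes from the fact that a string of distinct values is a permutation, so it can be stored as its index among all n! orderings. A common way to compute that index is the Lehmer code (factorial number system). This is a minimal sketch for the small case of 8 distinct 3-bit values; it is not the program referred to above (which isn't shown here), and the 256-byte case needs the same idea with a big-integer type.

```c
#include <stdint.h>

#define PERM_N 8   /* small case: 8 distinct values 0..7; 8! = 40320 < 2^16 */

/* Lexicographic rank of a permutation of 0..PERM_N-1 (Lehmer code). */
uint32_t perm_rank(const unsigned char p[PERM_N])
{
    uint32_t rank = 0;
    for (int i = 0; i < PERM_N; i++) {
        uint32_t smaller_later = 0;             /* later values smaller than p[i] */
        for (int j = i + 1; j < PERM_N; j++)
            if (p[j] < p[i]) smaller_later++;
        rank = rank * (uint32_t)(PERM_N - i) + smaller_later;  /* mixed radix */
    }
    return rank;
}

/* Inverse of perm_rank: rebuilds the permutation from its rank. */
void perm_unrank(uint32_t rank, unsigned char p[PERM_N])
{
    unsigned char digits[PERM_N];
    for (int i = PERM_N - 1; i >= 0; i--) {     /* peel mixed-radix digits */
        uint32_t radix = (uint32_t)(PERM_N - i);
        digits[i] = (unsigned char)(rank % radix);
        rank /= radix;
    }
    unsigned char unused[PERM_N];               /* kept in ascending order */
    for (int v = 0; v < PERM_N; v++) unused[v] = (unsigned char)v;
    int remaining = PERM_N;
    for (int i = 0; i < PERM_N; i++) {
        int d = digits[i];                      /* take the d-th smallest unused value */
        p[i] = unused[d];
        for (int k = d; k < remaining - 1; k++) unused[k] = unused[k + 1];
        remaining--;
    }
}
```

Any 8 distinct 3-bit values map to a rank below 40320, which fits in 16 bits instead of the raw 24 (log2(8!) is about 15.3). The saving exists only because the input is already known to be a permutation.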

Tefara, enter 256 random bytes and select "Download to file" to get a truly random test file for testing this program or your own encoder: https://www.random.org/bytes/

Uhm, if all 256 bytes are different, they're probably not entirely random to begin with. If we take a random byte, it is not supposed to depend on the previous ones, so the chance that a new byte matches a given previous byte is 1/256; and if we already have 255 different bytes and take another one, there is a 255/256 chance that it matches one of them. So the overall probability that no two bytes share a value is quite low, isn't it?

Also, a sequence like 0, 1, ..., 255 can be compressed very efficiently, e.g. as an instruction to go from 0 to 255 in steps of 1, even though it formally contains every byte value between 0 and 255. But it's nowhere close to being random.
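A minimal C sketch of that idea, with hypothetical function names: if a buffer is an arithmetic progression modulo 256, it can be described by a start value, a step and a length instead of the bytes themselves, and rebuilt exactly from those three numbers.

```c
#include <stddef.h>

/* Returns 1 and fills start/step if buf[0..n-1] is an arithmetic progression
   modulo 256 (e.g. 0, 1, 2, ..., 255); returns 0 otherwise. */
int as_progression(const unsigned char *buf, size_t n,
                   unsigned char *start, unsigned char *step)
{
    if (n == 0) return 0;
    unsigned char s = (n > 1) ? (unsigned char)(buf[1] - buf[0]) : 0;
    for (size_t i = 2; i < n; i++)
        if ((unsigned char)(buf[i] - buf[i - 1]) != s)
            return 0;
    *start = buf[0];
    *step = s;
    return 1;
}

/* Rebuilds the n bytes from start and step (arithmetic wraps modulo 256). */
void from_progression(unsigned char start, unsigned char step, size_t n,
                      unsigned char *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (unsigned char)(start + i * step);
}
```

This only pays off for the tiny fraction of inputs that happen to be progressions; every other input still needs some marker saying "not a progression", which is where the cost reappears.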

The probability that 256 random bytes are all different is 256!/2^(8*256), or about 2.654 × 10^-110. We are almost sure that they will not all be different. https://en.wikipedia.org/wiki/Birthday_problem

I hope that once you understand the pigeonhole principle, which implies that no compression algorithm, not even the one you have in mind, can consistently make arbitrary input smaller (oh, how nice it would be if that were possible: we wouldn't need storage at all, just recursively compress everything down to one bit, which you can easily memorize), you will not be too embarrassed and lose interest in the topic.

... and never learn programming? Once I tried to teach Haskell to my friend's son. He asked me "can it be used to write games?" and the teaching ended there. Fortunately, C++ lets you write puzzles.

Claiming recursive compression is possible is like claiming you can write any 9-digit number using only 3-digit numbers instead. Sure, it is possible to map some selected 9-digit numbers to 3-digit numbers, etc. And it's going to work okay... unless you've got more than 1000 different numbers (000 through 999) to chew on. That's where the FAIL happens. So you can't keep writing ALL 9-digit numbers like that. A little catch, isn't it?

Furthermore, the bits in a file can be viewed as one (usually huge) binary number, and while the base changes from decimal to binary, the math isn't much different, so the same idea applies. Compressors only work because files aren't really random, arbitrary inputs. LZ encodes repeated sequences; it does not work if the file lacks them. Huffman coding encodes more frequent symbols with shorter codes and rarer symbols with longer codes. If you have data where some bytes are more frequent, a gain happens. But if they are all equally probable, as in random input of sufficient size, there is no gain. And so on.