
Thread: My new compression algorithm

  1. #31
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    695
    Thanks
    153
    Thanked 182 Times in 108 Posts
    Quote Originally Posted by tefara
    Well, I used JavaScript to generate a string of about 20 random 3-bit chars (numbers between 0 and 7), used my algorithm to encode it (by hand), and got a string of 18.3 3-bit chars. Then I decoded it and got back exactly the original string. I don't care whether you guys believe it or not. I created this topic to find out whether there is any other algorithm that can compress random data, and to discuss compression speed and compression ratio.
    Almost any reasonable compression program can compress some files generated by a random number generator, as long as the output happens to have enough of the type of redundancy targeted by the compressor.

    The problem is that on average you won't be able to achieve compression if the source of the data is random. Finding one specific output of a random number generator that you are able to compress only proves that that particular output was compressible, not that you can achieve compression on average across the whole spectrum of possible files. If you think about it enough, you will realize there are 2^60 equally likely files containing 20 random 3-bit values. You can't fit all of these files into a space containing fewer than 2^60 possible outcomes because you will run out of unique output files before you run out of (equally probable) input files. This is the pigeonhole principle and you can read more about it here: https://en.wikipedia.org/wiki/Pigeonhole_principle
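    To make the counting concrete, here is a small-scale sketch of the same argument (a toy mapping of my choosing, not anyone's real codec): any fixed mapping of all 256 possible 8-bit inputs into 128 possible 7-bit outputs must collide, no matter which mapping you pick.

        /* Pigeonhole at small scale: 256 inputs cannot fit injectively
           into 128 output slots. */
        #include <stdio.h>

        int main(void) {
            int seen[128] = {0}, collisions = 0;
            for (int x = 0; x < 256; x++) {
                int y = (x * 37 + 11) % 128;   /* arbitrary 8-bit -> 7-bit code */
                if (seen[y]++) collisions++;   /* output slot already taken */
            }
            /* at least 256 - 128 = 128 collisions are unavoidable */
            printf("%d collisions for 256 inputs in 128 outputs\n", collisions);
            return 0;
        }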
    Last edited by Kennon Conrad; 8th January 2016 at 06:50.

  2. #32
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    862
    Thanks
    79
    Thanked 313 Times in 217 Posts
    Quote Originally Posted by tefara
    Is it valuable enough to develop an application?
    For sure, you are not the first one with this kind of algorithm; others who built an encoding like this had around the same shrink ratio each pass.

    The problem is that nobody will believe you, not even in live tests; they will think you cheated.

    The only way to get it out, become famous, and maybe earn some money with a book and TV shows is to build it and publish the binary and/or source code in places where it will be visible, not easy to remove, and quickly copied by interested people.

    In other words, it's a risky business because of the military and economic implications. If you can't follow what I'm talking about, watch this 4-hour video to get some understanding: https://www.youtube.com/watch?v=oHxGQjirV-c
    Last edited by Sportman; 8th January 2016 at 15:07.

  3. #33
    Member
    Join Date
    Nov 2015
    Location
    boot ROM
    Posts
    95
    Thanks
    27
    Thanked 17 Times in 15 Posts
    Quote Originally Posted by tefara
    Well, I used JavaScript to generate a string of about 20 random 3-bit chars (numbers between 0 and 7), used my algorithm to encode it (by hand), and got a string of 18.3 3-bit chars. Then I decoded it and got back exactly the original string. I don't care whether you guys believe it or not. I created this topic to find out whether there is any other algorithm that can compress random data, and to discuss compression speed and compression ratio.
    You see, if we assume we have N bits, they can represent 2^N possible combinations. If you dare to claim you can ALWAYS encode them into, say, N-1 bits, there is a catch: your output can represent only 2^(N-1) possible combinations, 2x fewer than before. This means one little problem: the output has fewer possible states than the input. This inevitably means more than one input state maps to the same output state. That's what people here mean: you CAN compress stuff to whatever size. But GOOD LUCK GETTING IT BACK, UNCHANGED. Since more than one input state maps to each output state, there is uncertainty about which of the inputs produced your particular output and how to decode it. If you assume you have ALL possible combos of N bits as possible inputs, there is no way to encode them in N-1 bits. If the idea is not clear in the binary system, let's try the decimal system: can you represent the decimal number 1 549 823 using only 3 decimal digits and nothing else?

    NB: I guess there are still some random data compression challenges running? And maybe they even offer a prize?

  4. #34
    Member jibz's Avatar
    Join Date
    Jan 2015
    Location
    Denmark
    Posts
    121
    Thanks
    103
    Thanked 71 Times in 51 Posts
    Quote Originally Posted by tefara
    Well, I used JavaScript to generate a string of about 20 random 3-bit chars (numbers between 0 and 7), used my algorithm to encode it (by hand), and got a string of 18.3 3-bit chars. Then I decoded it and got back exactly the original string. I don't care whether you guys believe it or not. I created this topic to find out whether there is any other algorithm that can compress random data, and to discuss compression speed and compression ratio.
    I have a question about a statement you made earlier:

    Quote Originally Posted by tefara
    To spoil an example: I can encode and decode a string of 256 totally random chars (all of them different from each other).
    When you say you generate a string of random values, do you mean actually random values, or randomly ordered distinct values (all different from each other)?

    Because as both Mauro Vezzosi and I pointed out it makes a rather large difference.
    Last edited by jibz; 8th January 2016 at 10:04.

  5. #35
    Member
    Join Date
    Jan 2016
    Location
    Vietnam
    Posts
    15
    Thanks
    2
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by jibz
    When you say you generate a string of random values, do you mean actually random values, or randomly ordered distinct values (all different from each other)?

    Because as both Mauro Vezzosi and I pointed out it makes a rather large difference.
    To spoil an example: I can encode and decode a string of 256 totally random chars (all of them different from each other).
    This situation is the simplest type of random data; encoding this kind of string is the first step in my algorithm. If all of them are different from each other, my algorithm reaches its maximum compression ratio. Then I proceed to compress random chars where not all of them are different. The solution is to build a dynamic dictionary with a length of 32 chars. To do that we must investigate the current string => define the 32 chars of the dictionary => encode => build a new dictionary...
    Logically, if one 32-char dictionary can be used to encode 3 blocks of 256 chars, the result is 256*3 chars reduced to 32+240*3 chars...
    Solution:
    1. I am looking for a way to ensure that each dictionary can meet that requirement, or do better in future development!
    2. The LZMA algorithm could be helpful here, to completely remove the 32 chars of dictionary from our encoded string; however, I haven't tried this yet.
    I tried my algorithm with strings of 3-bit chars; the result varies chaotically between 1~5% reduction.
    3. Decompression is about 4 times faster than compression (we compress once to decompress millions of times).
    The compression process includes the investigation needed to build the dynamic dictionary, while the decoding process does not, so decompression is faster than compression.
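    For reference, the arithmetic behind the claimed sizes (a sketch of the numbers stated above only; the 32-char dictionary shared by 3 blocks and the 256 -> 240 block shrink are tefara's assumptions, not verified claims):

        /* Checking the claimed totals: 3 blocks of 256 chars versus one shared
           32-char dictionary plus 3 encoded blocks of 240 chars. */
        #include <stdio.h>

        int main(void) {
            int blocks = 3, block_in = 256, block_out = 240, dict = 32;
            int in  = blocks * block_in;           /* 768 chars in  */
            int out = dict + blocks * block_out;   /* 752 chars out */
            printf("%d -> %d chars (%.1f%% of original)\n",
                   in, out, 100.0 * out / in);     /* ~97.9%, i.e. ~2.1% saved */
            return 0;
        }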

  6. #36
    Member
    Join Date
    Sep 2015
    Location
    Italy
    Posts
    261
    Thanks
    110
    Thanked 151 Times in 111 Posts
    Quote Originally Posted by tefara
    To spoil an example: I can encode and decode a string of 256 totally random chars (all of them different from each other).
    This situation is the simplest type of random data; encoding this kind of string is the first step in my algorithm. If all of them are different from each other, my algorithm reaches its maximum compression ratio.
    It seems to me you are a good guy and you really believe what you say, so it is worthwhile to help you.
    I suggest you test your algorithm on real-life data (e.g. on some small parts of enwik8.zip/rar/7z), not on data generated by hand or by a program whose output frequency distribution you know.
    A string of 256 totally random chars (all of them different from each other) is very rare, so on real-life data you need to tell the decoder whether a block of 256 chars is compressed or not: you'll see that this extra info hurts the compression ratio and enlarges the final data size.

  7. #37
    Member
    Join Date
    Jan 2016
    Location
    India
    Posts
    2
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Anyone, please help me to highly compress the FIFA 16 game!

  8. #38
    Member
    Join Date
    Jan 2016
    Location
    India
    Posts
    2
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Will you please help me to highly compress a game?

  9. #39
    Member
    Join Date
    Jan 2016
    Location
    Vietnam
    Posts
    15
    Thanks
    2
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Mauro Vezzosi
    It seems to me you are a good guy and you really believe what you say, so it is worthwhile to help you.
    I suggest you test your algorithm on real-life data (e.g. on some small parts of enwik8.zip/rar/7z), not on data generated by hand or by a program whose output frequency distribution you know.
    A string of 256 totally random chars (all of them different from each other) is very rare, so on real-life data you need to tell the decoder whether a block of 256 chars is compressed or not: you'll see that this extra info hurts the compression ratio and enlarges the final data size.
    Thanks for sharing your ideas.
    srand(time(0));
    char x = (char)(rand() % 8);
    I consider x as a char that needs 3 bits to represent.
    This is how I generated the random string; I don't think there is any frequency distribution here. Because I'm not a coder, I only tried with 3-bit chars and compressed by hand.
    The extra info here is the 32-char dictionary, and about 0.01% for the escape code.

  10. #40
    Member jibz's Avatar
    Join Date
    Jan 2015
    Location
    Denmark
    Posts
    121
    Thanks
    103
    Thanked 71 Times in 51 Posts
    Quote Originally Posted by tefara
    Thanks for sharing your ideas.
    srand(time(0));
    char x = (char)(rand() % 8);
    I consider x as a char that needs 3 bits to represent.
    This is how I generated the random string; I don't think there is any frequency distribution here. Because I'm not a coder, I only tried with 3-bit chars and compressed by hand.
    The extra info here is the 32-char dictionary, and about 0.01% for the escape code.
    Firstly, the low order bits of most rand() implementations are quite poor. A crude fix is to use higher bits, even though this does not help with the modulo bias.
    char x = (rand() >> 8) % 8;

    But the question is: how many strings compressible by your method are there? How likely are we to find one of them in random data?

    There are 8^8 possible strings of 8 random 3-bit values (16,777,216). There are 8! possible strings of 8 randomly ordered unique 3-bit values (40,320). So we only have a 0.24% chance of generating one of these compressible strings.

    This gets a lot worse when we go to strings of 256 random 8-bit values. There are 256^256 random strings (roughly 10^616), but only 256! randomly ordered unique value strings (roughly 10^507). The odds of generating a string compressible by this method are astronomically tiny.
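    A quick way to sanity-check the 0.24% figure (a throwaway simulation, using the same crude rand() fix as above):

        /* Monte Carlo estimate of how often 8 random 3-bit values happen
           to be all distinct; theory says 8!/8^8 ~ 0.2403%. */
        #include <stdio.h>
        #include <stdlib.h>
        #include <time.h>

        int main(void) {
            srand((unsigned)time(0));
            long trials = 1000000, hits = 0;
            for (long t = 0; t < trials; t++) {
                int seen = 0, distinct = 1;
                for (int i = 0; i < 8; i++) {
                    int v = (rand() >> 8) % 8;            /* one 3-bit value  */
                    if (seen & (1 << v)) { distinct = 0; break; }
                    seen |= 1 << v;                       /* mark it as used  */
                }
                hits += distinct;
            }
            printf("all distinct: %.4f%%\n", 100.0 * hits / trials);
            return 0;
        }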

  11. #41
    Member
    Join Date
    Jan 2016
    Location
    Vietnam
    Posts
    15
    Thanks
    2
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by jibz
    Firstly, the low order bits of most rand() implementations are quite poor. A crude fix is to use higher bits, even though this does not help with the modulo bias.
    char x = (rand() >> 8) % 8;

    But the question is: how many strings compressible by your method are there? How likely are we to find one of them in random data?

    There are 8^8 possible strings of 8 random 3-bit values (16,777,216). There are 8! possible strings of 8 randomly ordered unique 3-bit values (40,320). So we only have a 0.24% chance of generating one of these compressible strings.

    This gets a lot worse when we go to strings of 256 random 8-bit values. There are 256^256 random strings (roughly 10^616), but only 256! randomly ordered unique value strings (roughly 10^507). The odds of generating a string compressible by this method are astronomically tiny.
    I'm tired of explaining. If you read carefully what I wrote before, you will see that I use a special method to find the redundancy between blocks of 256 chars and store it in a 32-char dictionary. This dictionary is not a static dictionary, it is a multi dynamic dictionary: if I have 30 blocks of 256 chars and, on average, 1 dictionary can be used by 3 blocks, the result is 10*(32+240*3) chars. There is a very, very low chance that one dictionary can only be used by 1 or 2 blocks; on the other hand, there is a high chance that 1 dictionary can be used by 4 blocks or more.

    If you guys want to see the result soon, please help me improve my coding skills. Right now I don't even know how to read a file into memory.
    Last edited by tefara; 8th January 2016 at 13:19.

  12. #42
    Member nikkho's Avatar
    Join Date
    Jul 2011
    Location
    Spain
    Posts
    550
    Thanks
    222
    Thanked 165 Times in 106 Posts
    Quote Originally Posted by ankit1
    Will you please help me to highly compress a game?
    Yes. DEL *.* /S should leave it at 0 bytes, and you could spend your time on more profitable activities.

  13. #43
    Member
    Join Date
    Dec 2012
    Location
    japan
    Posts
    159
    Thanks
    30
    Thanked 62 Times in 38 Posts
    I can encode and decode a string of 256 totally random chars (all of them different from each other)
    bsc has a good way to compress them: its block header stores the block's 1-256 distinct symbols in compressed form.
    And Sreep's Reverse CBT coding can compress them.

  14. #44
    Member
    Join Date
    Dec 2012
    Location
    japan
    Posts
    159
    Thanks
    30
    Thanked 62 Times in 38 Posts
    Here is a simple demo program.
    Attached Files
    Last edited by xezz; 9th January 2016 at 14:49.

  15. Thanks:

    tefara (9th January 2016)

  16. #45
    Member
    Join Date
    Jan 2016
    Location
    Vietnam
    Posts
    15
    Thanks
    2
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by xezz
    Here is a simple demo program.
    Thank you very much for sharing. This code is a little difficult for a newbie coder like me to understand, but I can learn some ways to improve processing speed from it. It would be great if you could help me with code to: read and write x chars from and to a file starting at char/position n (to save RAM space), and write out a group of information (the file name as a string, the file size as an int, char vector 1, char vector 2, ...) into one file and re-read it correctly for decompression. Thanks.
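    A minimal sketch of the first request, using only standard C file I/O (the file names, n, and x here are placeholders, not anything from the thread):

        /* Read x chars starting at position n from one file, then write them
           at position n in another file. Error handling kept deliberately short. */
        #include <stdio.h>
        #include <stdlib.h>

        int main(void) {
            long n = 1024;      /* start position (placeholder) */
            size_t x = 256;     /* number of chars (placeholder) */
            char *buf = malloc(x);

            FILE *in = fopen("input.bin", "rb");
            if (!in || !buf) return 1;
            fseek(in, n, SEEK_SET);             /* jump to char position n */
            size_t got = fread(buf, 1, x, in);  /* may read < x near EOF   */
            fclose(in);

            FILE *out = fopen("output.bin", "r+b");
            if (!out) out = fopen("output.bin", "wb");
            if (!out) return 1;
            fseek(out, n, SEEK_SET);
            fwrite(buf, 1, got, out);           /* write same chars at n   */
            fclose(out);
            free(buf);
            return 0;
        }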

  17. #46
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    Not bad. It compresses 256 random bytes (all of them different from each other) to 212 bytes. That's close to the theoretical limit of log2(256!)/8 = 210.5 bytes.
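    The stated limit is easy to verify numerically; a quick check (not part of the original post):

        /* Verify the quoted bound: log2(256!)/8 bytes. */
        #include <stdio.h>
        #include <math.h>

        int main(void) {
            double bits = 0.0;
            for (int k = 2; k <= 256; k++)
                bits += log2((double)k);   /* accumulates log2(256!) */
            printf("log2(256!)/8 = %.1f bytes\n", bits / 8.0);  /* ~210.5 */
            return 0;
        }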

  18. #47
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    862
    Thanks
    79
    Thanked 313 Times in 217 Posts
    Quote Originally Posted by xezz
    Here is a simple demo program.
    Tefara, enter 256 random bytes and select "Download to file" to get a real random test file for testing this program or your own encoder:
    https://www.random.org/bytes/

  19. #48
    Member
    Join Date
    Dec 2012
    Location
    japan
    Posts
    159
    Thanks
    30
    Thanked 62 Times in 38 Posts
    ha (PPM based) a2 gives 229 bytes.
    Some entropy coders with MTF give about 235 bytes.
    That's close to the theoretical limit of log2(256!)/8 = 210.5 bytes.
    The bsc algorithm's best result would be about 160 bytes (a very rare case).

  20. #49
    Member
    Join Date
    Nov 2015
    Location
    boot ROM
    Posts
    95
    Thanks
    27
    Thanked 17 Times in 15 Posts
    Uhm, if all 256 bytes are different, they're probably not entirely random to begin with. If we take a random byte, it is not supposed to depend on the previous ones, so the chance a new byte matches a previous one starts at 1/256, and once we have got 255 different bytes and take another one, there is a 255/256 chance it matches one of the previous bytes. So the overall probability that no two bytes share the same value is quite low, isn't it?

    Also, a sequence like 0, 1, ..., 255 can be very efficiently compressed, e.g. as an instruction to go from 0 to 255 at step 1, even though it formally contains all bytes between 0 and 255. But it's nowhere close to being random.

  21. #50
    Member
    Join Date
    Apr 2015
    Location
    Greece
    Posts
    83
    Thanks
    34
    Thanked 26 Times in 17 Posts
    The probability that 256 random bytes are all different is 256!/2^(8*256), or about 2.654 × 10^-110. We can be almost certain that they will not all be different. https://en.wikipedia.org/wiki/Birthday_problem
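    That figure checks out numerically; a quick sketch (mine, not from the post) using lgamma to stay in log space and avoid overflow:

        /* Compute 256!/2^(8*256) via log10; the result (~1e-110) is tiny
           but still well within double range. */
        #include <stdio.h>
        #include <math.h>

        int main(void) {
            double log10_fact = lgamma(257.0) / log(10.0);       /* log10(256!) */
            double log10_prob = log10_fact - 2048.0 * log10(2.0);
            printf("P = 10^%.3f ~ %.3e\n", log10_prob, pow(10.0, log10_prob));
            return 0;   /* prints roughly 2.65e-110 */
        }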

  22. #51
    Member
    Join Date
    Jan 2016
    Location
    Vietnam
    Posts
    15
    Thanks
    2
    Thanked 0 Times in 0 Posts
    The most difficult module, "build dynamic dictionary", is completed; proceeding to the compression and decompression code.
    Last edited by tefara; 10th January 2016 at 12:46.

  23. #52
    Member
    Join Date
    Oct 2015
    Location
    Belgium
    Posts
    31
    Thanks
    9
    Thanked 11 Times in 9 Posts
    I hope that once you understand the pigeonhole principle, which implies that no compression algorithm, not even the one you have in mind, can consistently make arbitrary input smaller (oh, how nice it would be if that were possible: we wouldn't need storage at all, we could just compress everything down recursively to one bit, which you could easily memorize), you will not be too embarrassed and will not lose your interest in the topic.

  24. #53
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,531
    Thanks
    755
    Thanked 674 Times in 365 Posts
    ... and never learn programming? Once I tried to teach Haskell to the son of my friend. He asked me "can it be used to write games?" and the teaching ended right there. Fortunately, C++ lets you write puzzles.

  25. #54
    Member
    Join Date
    May 2013
    Location
    ARGENTINA
    Posts
    54
    Thanks
    62
    Thanked 13 Times in 10 Posts
    Quote Originally Posted by Bulat Ziganshin
    ... and never learn programming? Once I tried to teach Haskell to the son of my friend. He asked me "can it be used to write games?" and the teaching ended right there. Fortunately, C++ lets you write puzzles.
    lol

  26. #55
    Member
    Join Date
    Nov 2015
    Location
    boot ROM
    Posts
    95
    Thanks
    27
    Thanked 17 Times in 15 Posts
    Claiming recursive compression is possible is like claiming you can write any 9-digit number using only 3-digit numbers instead. Sure, it is possible to establish mappings from some selected 9-digit numbers to 3-digit numbers, etc. And it is going to work okay... unless you've got more than 999 different numbers to chew on. That's where the FAIL happens. So you can't keep writing ALL 9-digit numbers like that. A little catch, isn't it?

    Furthermore, the bits in a file can be viewed as one (usually huge) binary number, and while the base changes from decimal to binary, the math isn't too different, so the same idea applies. Compressors only work because files aren't really random arbitrary inputs. LZ encodes repeated sequences; it does not work if the file lacks them. Huffman encodes more frequent symbols using shorter codes and rarer symbols using longer codes. If you have data where some bytes are more frequent, there is a gain. But if they are all equally probable, as in random input of sufficient size, there is no gain. And so on.
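    To see the Huffman point in numbers, here is a small sketch (mine, not from the post) that measures the zero-order entropy of a byte stream; for uniformly random bytes it comes out near 8 bits per byte, leaving a symbol-by-symbol coder nothing to gain:

        /* Zero-order entropy of a byte stream; ~8 bits/byte means no gain
           for a symbol-by-symbol coder such as Huffman. */
        #include <stdio.h>
        #include <stdlib.h>
        #include <math.h>
        #include <time.h>

        int main(void) {
            unsigned long hist[256] = {0}, total = 1UL << 20;
            srand((unsigned)time(0));
            for (unsigned long i = 0; i < total; i++)
                hist[rand() & 255]++;   /* crude byte source, fine for a demo */

            double h = 0.0;
            for (int i = 0; i < 256; i++) {
                if (!hist[i]) continue;
                double p = (double)hist[i] / total;
                h -= p * log2(p);       /* Shannon entropy, bits per byte */
            }
            printf("entropy: %.4f bits/byte\n", h);  /* ~8.0000 here */
            return 0;
        }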

  27. #56
    Member
    Join Date
    Jun 2018
    Location
    Slovakia
    Posts
    171
    Thanks
    46
    Thanked 11 Times in 11 Posts
    tefara, could you post your software, or at least a compressed sample?
