Page 5 of 9
Results 121 to 150 of 254

Thread: lossless data compression method for all digital data types

  1. #121
    Member rarkyan's Avatar
    Join Date
    Dec 2010
    Location
    Tell Me Where
    Posts
    88
    Thanks
    15
    Thanked 2 Times in 2 Posts
    Quote Originally Posted by Shelwien View Post
    They actually do exist: https://en.wikipedia.org/wiki/Perpet...otion_machines

    Problem is that people usually propose new designs from lack of knowledge, and it turns into a kind of spam.
    Well, it's just like Darwin's theory of evolution, or maybe flat Earth, or anything else: some will agree, but a lot of people disagree. We just need some proof. That's very hard to achieve, since there are so many well-explained arguments able to counter the theory. I'm glad so many people join this data compression forum. I hope someday people will find a solution to this problem and make history for humankind, compressing any big file into a very tiny size, just like folding paper into small pieces.

  2. #122
    Member
    Join Date
    Aug 2016
    Location
    USA
    Posts
    71
    Thanks
    16
    Thanked 21 Times in 16 Posts
    Quote Originally Posted by Shelwien View Post
    They actually do exist: https://en.wikipedia.org/wiki/Perpet...otion_machines

    Problem is that people usually propose new designs from lack of knowledge, and it turns into a kind of spam.
    Agree 110%; I had to learn and apply the misnamed Dirichlet's principle (really Dirichlet's drawer principle), a.k.a. the pigeonhole principle, in 4th grade in actual math proofs. It still amazes me that people don't seem to comprehend the impossibility of universal lossless compression of random sequences. If we fail to communicate that simple proof, how do we hope to convince them that the situation is even worse - the _vast_ majority of sequences are essentially incompressible, and the only reason compression exists is because of our redundant encodings...

    Anyhow, the occasional posts are worth a chuckle, but sometimes seem to consume too much forum traffic and too many well-intentioned responses.

  3. #123
    Programmer michael maniscalco's Avatar
    Join Date
    Apr 2007
    Location
    Boston, Massachusetts, USA
    Posts
    142
    Thanks
    27
    Thanked 95 Times in 32 Posts
    Quote Originally Posted by Shelwien View Post
    Problem is that people usually propose new designs from lack of knowledge, and it turns into a kind of spam.
    That was the downfall of comp.compression too. At one point it was the place to discuss compression but eventually it was overrun with crackpots and the vast majority of the posts were "I can compress anything down to 3.2 bytes. I only have to write the decoder." or similar. Eventually anyone with any level of credibility simply never returned. But then again perhaps the emergence of encode.su had something to do with that.

    For me the filter is very easy. You have a working prototype or you have bullshit. And I have no time for bullshit. (^:

    Let's hope that the crackpots don't overrun this forum as they once did to comp.compression

    - Michael

  4. #124
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    1,034
    Thanks
    104
    Thanked 417 Times in 290 Posts
    I encourage scientists to look more broadly, because science has a track record of being wrong in its assumptions and of being corrected by later evidence, resulting in improved, changed, or new theories.

    Variable-length encoding is something to look into.

  5. #125
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    524
    Thanks
    199
    Thanked 186 Times in 127 Posts
    Quote Originally Posted by rarkyan View Post
    Thank you, sir. At least I post them here and learn. Or maybe someday the perpetual-motion-machine fans will also be able to prove that their methods work somehow. Well, some people just try, others just argue, but it's necessary to achieve the goals. Thanks
    I'm pleased that people try new methods and think about data compression. Many of the long-term "residents" here will have produced new compression algorithms that looked very promising, only to find they were flawed once the decoder side had been implemented. It's just the insistence of a few on continuing in the face of proven mathematics that is boggling.

  6. #126
    Member CompressMaster's Avatar
    Join Date
    Jun 2018
    Location
    Lovinobana, Slovakia
    Posts
    212
    Thanks
    65
    Thanked 17 Times in 17 Posts
    @rarkyan,
    welcome back!
    The Occurrence ID is just like a complete dictionary for the program to compare against and find the pattern (bit location). I think it is a really big and super large database
    Well, that sounds like a hash-based (CRC or MD5) compression algorithm, which is impossible due to collisions - i.e. two different files would share the same hash value, so you wouldn't be able to determine which file is correct when you have only the hash value. And even if you apply a more sophisticated algorithm with a far lower collision probability, a collision-free hash output would always have to be at least as large as your input, so it would be pointless, because you would need to store more information than your input. Result? An increase in file size.

    As for the impossibilities: everything could be possible, it's all about time, scientific and technological progress, and our capabilities - for example, invisibility. Before the year 2000, nobody believed that something like that could EVEN be possible, and now it's possible, at least to a certain degree.
    The same with teleportation - throughout history we were always able to "teleport" (move) to a desired location; it only depends on WHAT exactly teleportation means to different people. But it's also possible to teleport some particles (even those that are normally visible) - only the information and assembly instructions are transported, and the resulting product is assembled on the destination side.
    The same with time travel (or space-time, because we live in a four-dimensional space but are currently able to alter only the first three coordinates) - so far nobody has proved that something like this could work, but I saw that one breakthrough was made at CERN regarding atom position states ... or, black holes used to be invisible, but a few months ago the world's first black hole photo was captured. I believe that some day we will be able to alter spacetime the same way we alter 3D space now. Nobody knows whether we live in only one universe, or whether there are many possible universes that aren't very divergent from the current one.

    But on the other side there are some exceptions, as always - today, no one (although we never know what they're doing in secret laboratories) is able to recover data that has been completely overwritten at least once. Before 1990 it was possible, but only with costly equipment, and mainly because the density of old HDDs was very low (an overwritten 1 wasn't 0, it was 0.15), so a small percentage of overwritten data COULD be retrieved.

    Now back to compression...

    First of all, let's define randomness. Random means unpredictable, but not strictly incompressible. Thus it's not easy to predict the next bit in a sequence using less information, but it's possible, at least to a certain degree. And even if you have purely random content with 0% repeated strings, there is always some pattern - it depends on interpretation. So, for me, pure randomness does not exist at all. We only need to find the correct formulas to minimize it, although it will still be present in some form. Of course, it depends on how to alter the input correctly, which algorithm to use, how to compress it, and how to decompress it back to its original form. Again, there is ALWAYS some pattern; the problem is how to encode it using less information than the original...

    Next, let's define a lossless compression method for all data - all data can be compressed, of course. Some more, some less. As an example, for human text (LTCB) the current record is 15 MB out of 100 MB. And for "random" (incl. encrypted, randomly generated, already compressed) data, the record is approx. 35% - a 200 KB JPEG to 132 KB lossless. But in the near future it can definitely be improved. Not all data can be compressed? I partially agree. Well, it depends on proper interpretation, the algorithm, and other things. But since all files can be represented using a limited charset, one day there could be an algorithm able to handle any file type and compress it losslessly by more than 20%, I guess. I'm currently "working" on my custom data compression algorithm, but the problem is that I'm not good at programming; still, the idea I have in mind will definitely work, although it will be terribly slow, because to compress just a 1 MB file it will need to process roughly 6 000 000 byte values.

  7. Thanks:

    rarkyan (30th June 2019)

  8. #127
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    524
    Thanks
    199
    Thanked 186 Times in 127 Posts
    Quote Originally Posted by CompressMaster View Post
    Next, let's define a lossless compression method for all data - all data can be compressed, of course.
    Wrong! Stop peddling the same mistakes. You say "As for the impossibilities, everything could be possible". Again wrong. Some things have been proven to be impossible, and unless our entire understanding of mathematics is fundamentally wrong, continuing to search is futile. You're welcome to do so, but please stop spamming serious forums with such drivel. As I said earlier, I'd rather just punt all such threads to their own forum area.

    Some more, some less. As an example, for human text (LTCB) the current record is 15 MB out of 100 MB. And for "random" (incl. encrypted, randomly generated, already compressed) data, the record is approx. 35% - a 200 KB JPEG to 132 KB lossless.
    Also utterly wrong. JPEG isn't random. What makes you think it is? If it were, how come we can view it and see a picture? Maybe you're not aware of how JPEG compressors work. They decompress the JPEG and recompress it with a better method. There's nothing random about that process at all. You're grasping at straws, and for your own sanity, please don't.

    Not all data can be compressed? I partially agree. Well, it depends on proper interpretation, the algorithm, and other things. But since all files can be represented using a limited charset, one day there could be an algorithm able to handle any file type and compress it losslessly by more than 20%, I guess. I'm currently "working" on my custom data compression algorithm, but the problem is that I'm not good at programming; still, the idea I have in mind will definitely work, although it will be terribly slow, because to compress just a 1 MB file it will need to process roughly 6 000 000 byte values.
    Again also wrong. PLEASE understand the counting argument before you waste more of your own and others' time, so you can do something constructive instead.

    Consider a 4-byte file. If you can compress *ALL* files by 25%, then that means ALL those 4-byte files (there are 4.3 billion of them) can become 3 bytes. There are 16.7 million possible 3-byte files. That means for every 3-byte file you'll have 256 4-byte files that compressed to the same 3-byte file. How do you know which one to decompress to? You can't, unless you add an extra byte. Basically, no algorithm can compress all data - random or otherwise. FACT. Claiming you can shows you've made a mistake somewhere. Note this applies to 4-byte to 3-byte files just as much as to 4 MB to 3 MB files. It's all the same argument.
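    For anyone who wants to see the counting in actual numbers, here is a minimal C# sketch (just the arithmetic of the counting argument, not part of any compressor in this thread):

    Code:
        using System;
        using System.Numerics;

        class CountingArgument
        {
            static void Main()
            {
                // There are 256^n distinct files of exactly n bytes.
                BigInteger fourByteFiles = BigInteger.Pow(256, 4);   // 4,294,967,296
                BigInteger threeByteFiles = BigInteger.Pow(256, 3);  // 16,777,216

                Console.WriteLine($"4-byte files: {fourByteFiles}");
                Console.WriteLine($"3-byte files: {threeByteFiles}");

                // Each 3-byte output would have to stand in for this many 4-byte inputs:
                Console.WriteLine($"Inputs per output: {fourByteFiles / threeByteFiles}"); // 256
            }
        }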

  9. #128
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    1,034
    Thanks
    104
    Thanked 417 Times in 290 Posts
    Quote Originally Posted by JamesB View Post
    Note this applies to 4-byte to 3-byte files just as much as to 4 MB to 3 MB files. It's all the same argument.
    The counting argument is right if you treat sequences of bits in bytes as fixed length, but recursive random data compression encoding and decoding do not use fixed lengths as input and output.

    For example, sequences of around 64 bits (around 8 bytes) are transformed into smaller and longer sequences of bits where, on average, for a random input, 1 bit is saved: some sequences become longer, some become smaller, and some stay equal, with the gain from the smaller sequences higher than the loss from the longer ones, and this generates an output stream.

    To gain one byte of profit, the average minimal input length must be around 8 x 64 bits = 64 bytes (1.5625% profit).

    For decoding, the input stream must be read by detecting the sequences as they were written to the output stream during encoding. To make this always work, extra bits are sometimes added during encoding that must be detected and filtered out; this increases the minimal input length and decreases the average profit calculated above.

  10. #129
    Member rarkyan's Avatar
    Join Date
    Dec 2010
    Location
    Tell Me Where
    Posts
    88
    Thanks
    15
    Thanked 2 Times in 2 Posts
    Oh man, every expert is explaining their knowledge. I think it was my mistake to start this thread, but please, I hope this thread doesn't make us start to hate each other, okay?
    I am responsible for this "lossless compression for all digital data" idea. Maybe tomorrow I will try to explain the method again. I'm a very stupid person compared to the experts on this forum, but please understand that what I do is just share the idea. I hope this method is useful someday. Or maybe not, because it's useless.
    On the first page my explanation may still be unclear.
    I need time to prepare the explanation of the idea as best as I can tomorrow. Have a nice day

  11. Thanks:

    xinix (30th June 2019)

  12. #130
    Member
    Join Date
    Nov 2014
    Location
    California
    Posts
    168
    Thanks
    57
    Thanked 46 Times in 35 Posts
    Quote Originally Posted by Sportman View Post
    The counting argument is right if you treat sequences of bits in bytes as fixed length, but recursive random data compression encoding and decoding do not use fixed lengths as input and output.
    Not to beat a dead horse, but it is not a theory, it is an argument based on simple logic. Whether it is sequences of bits or not is irrelevant.

    Quote Originally Posted by Sportman View Post
    For example, sequences of around 64 bits (around 8 bytes) are transformed into smaller and longer sequences of bits where, on average, for a random input, 1 bit is saved: some sequences become longer, some become smaller, and some stay equal, with the gain from the smaller sequences higher than the loss from the longer ones, and this generates an output stream.

    To gain one byte of profit, the average minimal input length must be around 8 x 64 bits = 64 bytes (1.5625% profit).
    Same logical fallacy. I encourage you guys to try to understand the feedback from other people instead of running in circles around faulty logic.

  13. #131
    Member
    Join Date
    Apr 2012
    Location
    London
    Posts
    265
    Thanks
    13
    Thanked 0 Times in 0 Posts
    Hello

    There are things one simply cannot announce just for the sake of settling discussion issues... this one here would definitely rank topmost among those never to announce.

    However, I do suggest ways whereby collaboration groups can be formed, to everyone's benefit, while ensuring confidentiality is adhered to (beyond merely 'I say I promise...').

  14. #132
    Member
    Join Date
    Sep 2018
    Location
    Philippines
    Posts
    121
    Thanks
    31
    Thanked 2 Times in 2 Posts
    Quote Originally Posted by rarkyan View Post
    Oh man, every expert is explaining their knowledge. I think it was my mistake to start this thread, but please, I hope this thread doesn't make us start to hate each other, okay?
    I am responsible for this "lossless compression for all digital data" idea. Maybe tomorrow I will try to explain the method again. I'm a very stupid person compared to the experts on this forum, but please understand that what I do is just share the idea. I hope this method is useful someday. Or maybe not, because it's useless.
    Nothing wrong with sharing an idea. I shared a random data compressor idea myself, which is up for others to solve or improve.

    But to others, posting your algorithm in this forum indeed shows a lack of knowledge - that you don't understand compressibility, random data compression, and recursive algorithms. Really, every once in a while I find it interesting to see other individuals' attacks on random data compression, which tend to be recursive, however futile they may be or seem.

    @CompressMaster, truly "random" data, input or output, should by definition be incompressible by known functions, processes, or algorithms. That is, everything stops at this "random" data. However, there are random-appearing data fragments; the word "appearing" itself means that they are still compressible or "decompressible" into other corresponding output fragments or bytes.

    And I found a video on YouTube by Google Developers that explains a popular idea of guessing data sources via random functions that define the sequences of data. However, it was uploaded on April 1. Maybe it's Route 85 indeed.

    https://youtu.be/KOvoD1upTxM

  15. #133
    Member rarkyan's Avatar
    Join Date
    Dec 2010
    Location
    Tell Me Where
    Posts
    88
    Thanks
    15
    Thanked 2 Times in 2 Posts
    Allow me to explain my idea.

    EXPLANATION 1 - HEXADECIMAL & HEX EDITOR

    First things first, these are my conditions:
    1. I'm not a programmer
    2. I'm just sharing an idea
    3. I found this method accidentally
    4. I will try to explain as clearly as I can from my point of view
    5. Sorry for my bad English writing; it may cause misinterpretation
    6. Other deficiencies will be added later

    OK, I found this idea when opening a file using hex editor software (Hex Editor Neo, 010 Hex Editor, etc.).
    Hex editor software is able to open any kind of file type, and the result looks like this:

    [Image: IMG1.jpg]

    The first time I saw that code I called it "computer language", because only a computer is able to read it; a human cannot understand what the code means.

    Later, I began to understand that the codes are hexadecimal. There are 256 codes, from 00 to ff.
    Every file is always made up of a combination of those 256 codes. I also later understood that the hexadecimal codes are a shorthand for binary code.
    For example: hexadecimal 42 = binary 01000010, etc.

    But from my point of view, I will just use hexadecimal to explain.
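    For illustration only, here is a minimal C# sketch (assuming some file such as "sample1.bmp" in the working directory; any file will do) that prints roughly the same 16-bytes-per-row hexadecimal view a hex editor shows:

    Code:
        using System;
        using System.IO;

        class HexDump
        {
            static void Main()
            {
                // Any file works; "sample1.bmp" is only a placeholder name.
                byte[] data = File.ReadAllBytes("sample1.bmp");

                // Print 16 bytes per row, like the hex editor panel.
                for (int offset = 0; offset < data.Length; offset += 16)
                {
                    int count = Math.Min(16, data.Length - offset);
                    string hex = BitConverter.ToString(data, offset, count).Replace("-", " ");
                    Console.WriteLine($"{offset:x8}  {hex}");
                }
            }
        }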

  16. #134
    Member rarkyan's Avatar
    Join Date
    Dec 2010
    Location
    Tell Me Where
    Posts
    88
    Thanks
    15
    Thanked 2 Times in 2 Posts
    EXPLANATION 2 - WRITE/EDIT HEXADECIMAL


    After getting the view of the hexadecimal code, I tried to edit several codes on that panel and then save the file using the same file extension. If we edit a "crucial part", the file will be unable to open. Let's try:

    I have this image file, "sample1.bmp" (attached), size 128 bytes, created using Adobe Photoshop. In a human view it looks like this:

    [Image: IMG1A.jpg]

    When I open that file in the hex editor, it looks like this:

    [Image: IMG2.jpg]

    Now I edit the first code, 42, and change it into 01, then save using the same file type, .bmp - for example, edit1.bmp.

    The result: the file can't be opened, or it is broken (maybe I changed the header, so the computer can't recognize the structure).

    After I return that 42 code to its original position and save the file again as .bmp, the image can be opened in any picture viewer software.

    At this point I began to understand that a computer is able to read a file (any digital file) and is able to construct/save the file if it is written in the correct structure.

    From this file behaviour, I think a computer may be able to create/construct any file, at any size, using the correct combination of hexadecimal codes, and save that hexadecimal structure as the correct file type. And then we have a working file. Maybe any kind of file.

    And now the next idea: how to tell the computer or hex editor program to construct the correct structure of the file.
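    The experiment above can also be scripted. Here is a minimal C# sketch (assuming a copy of "sample1.bmp" in the working directory; it is only an illustration, not part of the method itself) that changes the first code, 42, into 01 and saves the result as edit1.bmp:

    Code:
        using System.IO;

        class HeaderEdit
        {
            static void Main()
            {
                byte[] data = File.ReadAllBytes("sample1.bmp");
                data[0] = 0x01;                        // the first code was 0x42 ('B' of the "BM" signature)
                File.WriteAllBytes("edit1.bmp", data);
                // edit1.bmp is now broken for image viewers;
                // setting data[0] back to 0x42 and saving again makes it valid once more.
            }
        }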

  17. #135
    Member rarkyan's Avatar
    Join Date
    Dec 2010
    Location
    Tell Me Where
    Posts
    88
    Thanks
    15
    Thanked 2 Times in 2 Posts
    EXPLANATION 3 - DATABASE/LIBRARY


    In order to tell the hex editor how to construct a correct file, I need a database/library. The library contains the complete sequence of patterns, generated using the formula 2^n.

    Why am I using that formula? This is the reason:

    [Image: IMG3.jpg]

    In the first column, we have 42, 00, 22, 66, aa, ee.
    Each hex code/value is placed at a different "coordinate", as we can see:
    42 is placed in the first row,
    00 in the 2nd, 3rd, and 4th rows,
    22 in the 5th, and so on; finally, ee is placed in the last (8th) row.

    After knowing their positions, I need a database/library which already records all the placement possibilities, using the formula 2^n, where n is the total number of rows in 1 column. So we have 2^8 = 256 possibilities, or as I call each result, a pattern.

    Writing out all 256 manually would take more time, so I will give you a smaller example:

    If we have 3 rows, we have a database/library of 2^3 = 8 patterns, like this:

    [Image: DB1.jpg]

    That database contains all the pattern possibilities, each with an ID.
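    As a side note, those 2^3 = 8 patterns can be listed with a minimal C# sketch. The ID numbering and row order below are assumptions for illustration, chosen so that the examples further down come out the same (42 in the first row gets ID 2, 00 in rows 2 and 3 gets ID 7); the exact ordering in DB1.jpg may differ:

    Code:
        using System;

        class PatternLibrary
        {
            static void Main()
            {
                const int rows = 3;                          // 2^3 = 8 patterns
                for (int id = 1; id <= (1 << rows); id++)
                {
                    int bits = id - 1;                       // ID 1 = empty pattern
                    char[] pattern = new char[rows];
                    for (int row = 0; row < rows; row++)     // first row = lowest bit (assumption)
                        pattern[row] = ((bits >> row) & 1) == 1 ? 'x' : ' ';
                    Console.WriteLine($"ID {id}: [{new string(pattern)}]");
                }
            }
        }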

    The next step: I need to compare each code placement with the database/library. If the code placement matches one in the database, I only need to write the ID in the output table.

    For example, using the 8-pattern database.
    To keep the explanation short, because I only have a small database, I will cut the hex editor panel so it only contains 3 rows, like this:

    [Image: IMG4.jpg]

    I need to search all of the codes and determine their locations, and then I need to look up their IDs in the database/library.

    In the first column, the first code is 42, as we can see here:

    [Image: PTR1.jpg]

    And then I need to look up that placement in the database/library; we have something like this:

    [Image: PTR2.jpg]

    Hex code 42 has ID 2; write this in the output table (I will explain later).

    Repeat the process for the next code; we have hex code 00.
    We see its placement like this:

    [Image: PTR3.jpg]

    Hex code 00 has ID 7.

    Repeat the whole process for all codes in all columns. TO BE CONTINUED..... maybe tomorrow. Or maybe you already understand the next step. May I know?

  18. #136
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    1,034
    Thanks
    104
    Thanked 417 Times in 290 Posts
    Quote Originally Posted by hexagone View Post
    Whether it is sequences of bits or not is irrelevant.
    It's relevant because variable lengths add up to 50% extra possibilities.
    But there is a minimum input length needed to exploit creating sequences and distinguishing them later.


    Quote Originally Posted by hexagone View Post
    Same logical fallacy.
    It's possible to design an input bit/byte file/stream that will be seen as one sequence (it can't be split) and, in the worst case, will expand because encoding that sequence adds some bits, but in the next pass (output stream used as input stream) it will shrink.
    But it's also possible to design an input bit/byte file/stream that will be seen as many sequences and, in the best case, will shrink to almost half the input.

  19. #137
    Member rarkyan's Avatar
    Join Date
    Dec 2010
    Location
    Tell Me Where
    Posts
    88
    Thanks
    15
    Thanked 2 Times in 2 Posts
    It seems the discussion has gone rather quiet, so I will continue my explanation of this idea. I hope it will give the whole picture.

    EXPLANATION 4 - OUTPUT TABLE (COMPRESSED FILE)


    After all the IDs have been matched to the structure, I need to store all of the IDs in the output table. This is the complete simple example of the output table:

    [Image: Output Table.jpg]

    So, the basic idea of this lossless compression is to replace the hexadecimal structure with IDs written into the output table.

    To decompress, the program only needs to read the IDs from the output table and send them to the database, and the patterns will be written back into the hex editor program.

    After the hex structure is complete, the program saves the file as the original file type.

    The process is done, and I have lossless data, the same as the original.
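    To make the encode/decode idea concrete, here is a minimal C# sketch of a single 3-byte column (a simplification for illustration, not the actual program or table layout): it records, for each of the 256 possible byte values, an ID marking the rows where that value occurs, and then rebuilds the column from those IDs.

    Code:
        using System;

        class ColumnRoundTrip
        {
            static void Main()
            {
                byte[] column = { 0x42, 0x00, 0x00 };       // the 3-row example column

                // "Compress": one ID per possible byte value, marking the rows where it occurs.
                // IDs are stored here as (ID - 1), i.e. 0 means the value does not occur.
                int[] ids = new int[256];
                for (int row = 0; row < column.Length; row++)
                    ids[column[row]] |= 1 << row;           // 0x42 -> ID 2, 0x00 -> ID 7

                // "Decompress": rebuild the column by scanning all 256 IDs.
                byte[] restored = new byte[column.Length];
                for (int value = 0; value < 256; value++)
                    for (int row = 0; row < column.Length; row++)
                        if (((ids[value] >> row) & 1) == 1)
                            restored[row] = (byte)value;

                Console.WriteLine(BitConverter.ToString(restored)); // 42-00-00, same as the input
            }
        }

    Note that the 256 IDs for one column already take far more space than the 3 original bytes, which is the expansion discussed later in this thread.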

  20. #138
    Member rarkyan's Avatar
    Join Date
    Dec 2010
    Location
    Tell Me Where
    Posts
    88
    Thanks
    15
    Thanked 2 Times in 2 Posts
    EXPLANATION 5 - DIFFICULT PART


    I think the most difficult part of the idea is generating the patterns.

    So far, I don't know whether any computer has the ability (specification, storage, etc.) to run/generate such big pattern sequences.

    My example is very short; I only have 3 rows, so it only requires generating 2^3 patterns.

    But,
    what if the data has 1.000.000 rows or more?

    How long would it take a computer to generate and store 2^1.000.000 patterns, and how fast could it do so?

    Is there any way to shorten the pattern computation?

    My idea currently stops here.
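    For a sense of scale, here is a minimal C# sketch (just arithmetic, nothing specific to the method) that computes how many decimal digits 2^1.000.000 has:

    Code:
        using System;

        class PatternCount
        {
            static void Main()
            {
                // Number of decimal digits of 2^n is floor(n * log10(2)) + 1.
                long n = 1_000_000;
                long digits = (long)Math.Floor(n * Math.Log10(2)) + 1;
                Console.WriteLine($"2^{n} has {digits} decimal digits"); // 301,030 digits
            }
        }

    So the full library cannot be enumerated or stored explicitly.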

    If there is any rejection of my idea, I accept that. It is very good to reject the idea with a clear example, so I can learn from my failure.

    Thank you for the contributions. I really appreciate everyone who is still here, spending their time discussing together how to solve data compression for a future challenge.

    Have a nice day everyone.

  21. #139
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    625
    Thanks
    282
    Thanked 249 Times in 126 Posts
    Quote Originally Posted by rarkyan View Post
    After all the IDs have been matched to the structure, I need to store all of the IDs in the output table. This is the complete simple example of the output table:

    [...]

    So, the basic idea of this lossless compression is to replace the hexadecimal structure with IDs written into the output table.

    To decompress, the program only needs to read the IDs from the output table and send them to the database, and the patterns will be written back into the hex editor program.

    [...]

    The process is done, and I have lossless data, the same as the original.
    And this step is where your error is hidden. For example, the top-left element in your table is "7", which decodes to " xx" or, in hex, "?? 00 00" for the first column of 3 bytes - this is correct so far, but the missing information is what "??" is (in this case, 42).
    Essentially, what you've encoded in your output table is the position of all "00" bytes in the original file; the major information (all other non-null bytes) is missing.
    That's why everyone in this thread (and the other similar threads) says that you should check your decompression - you can only use your algorithm if you can restore all your data in a simple example like the one you posted. This is not the case here.

    By the way, you don't need a database to store your patterns - 1 => " ", 2 => "x ", 3 => " x ", that's just the binary representation of the number (minus 1).
    http://schnaader.info
    Damn kids. They're all alike.

  22. #140
    Member rarkyan's Avatar
    Join Date
    Dec 2010
    Location
    Tell Me Where
    Posts
    88
    Thanks
    15
    Thanked 2 Times in 2 Posts
    Quote Originally Posted by schnaader View Post
    And this step is where your error is hidden. For example, the top-left element in your table is "7", which decodes to " xx" or, in hex, "?? 00 00" for the first column of 3 bytes - this is correct so far, but the missing information is what "??" is (in this case, 42).
    Essentially, what you've encoded in your output table is the position of all "00" bytes in the original file; the major information (all other non-null bytes) is missing.
    That's why everyone in this thread (and the other similar threads) says that you should check your decompression - you can only use your algorithm if you can restore all your data in a simple example like the one you posted. This is not the case here.

    By the way, you don't need a database to store your patterns - 1 => " ", 2 => "x ", 3 => " x ", that's just the binary representation of the number (minus 1).
    Thank you for the feedback. I already put the 42 in the 9th row. You can check here (blue square):

    Click image for larger version. 

Name:	Output Table.jpg 
Views:	45 
Size:	435.4 KB 
ID:	6688

    I manually checked all the non-null bytes in every row and wrote them in the output table.

    Quote Originally Posted by schnaader View Post
    All other non-null bytes is missing
    They aren't missing.

  23. #141
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    625
    Thanks
    282
    Thanked 249 Times in 126 Posts
    Ah, I see, mea culpa, overlooked the meaning of the other rows.

    So it indeed is a working algorithm reordering the data, transforming a file of 16*N byte values (range 0..255) into max. 16*256 values with range 1..2^N. Your input has N * 2^7 bits, your output has N * 2^12 bits, so there's some redundancy introduced. This is reflected in most of the values becoming "1".

    The algorithm itself would not be difficult to implement and would work for any N, even the mentioned N = 1.000.000. But since you're basically reordering bits and adding redundancy (this leads to the impression that the data can be compressed better), there's no reason why it should improve compression except for some special files. File types that could benefit from the transform would be:

    1. data with a skewed histogram where not all byte values 0..255 are used. (though note that you will have to add a small header in that case that tells the decoder which values aren't used)
    2. data that only use specific columns or structured data with structure widths that are multiples of 16


    Anyway, these can already be compressed well using entropy coding like huffman (for 1.) and multimedia compressors (for 2.), so nothing really new here.

    Also, thanks for the good summary of your algorithm; if you had started the post like this, much of the hassle in this thread could have been avoided, as it is now clear what you're doing.
    http://schnaader.info
    Damn kids. They're all alike.

  24. #142
    Member rarkyan's Avatar
    Join Date
    Dec 2010
    Location
    Tell Me Where
    Posts
    88
    Thanks
    15
    Thanked 2 Times in 2 Posts
    Quote Originally Posted by schnaader View Post
    Ah, I see, mea culpa, overlooked the meaning of the other rows.

    So it indeed is a working algorithm reordering the data, transforming a file of 16*N byte values (range 0..255) into max. 16*256 values with range 1..2^N. Your input has N * 2^7 bits, your output has N * 2^12 bits, so there's some redundancy introduced. This is reflected in most of the values becoming "1".

    The algorithm itself would not be difficult to implement and would work for any N, even the mentioned N = 1.000.000. But since you're basically reordering bits and adding redundancy (this leads to the impression that the data can be compressed better), there's no reason why it should improve compression except for some special files. File types that could benefit from the transform would be:

    1. data with a skewed histogram where not all byte values 0..255 are used. (though note that you will have to add a small header in that case that tells the decoder which values aren't used)
    2. data that only use specific columns or structured data with structure widths that are multiples of 16


    Anyway, these can already be compressed well using entropy coding like huffman (for 1.) and multimedia compressors (for 2.), so nothing really new here.

    Also, thanks for the good summary of your algorithm; if you had started the post like this, much of the hassle in this thread could have been avoided, as it is now clear what you're doing.
    My apologies. I started this thread 9 years ago, when I was still evil
    By the way, is there any simplified example of Huffman coding that represents my method?
    If my idea is the same as Huffman or the multimedia compressors, at which point does the compression fail to achieve the lossless goal?
    I tried to Google that, but it's hard for me to understand the complex examples.

    Thx in advance

  25. #143
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    625
    Thanks
    282
    Thanked 249 Times in 126 Posts
    Implementation in C#, binary and source attached.
    The main part of the code goes through all the input bytes, gets each byte value (0..255), calculates the row/column (divide by 16 / mod 16), and calculates which bit to set in the output using a simple formula. This works because the whole transformation just reorders bits; no database or other complex things are needed. The formula for the forward transform is:

    Code:
            private const int ColumnCount = 16;
            // N is the total number of rows (ceil(input length / ColumnCount))
    
            var bitOffset = byteValue * ColumnCount * N + column * N + inputRow;
    The output filename "X" is set to "X_rky" for the forward transform and "X_rev" for the reverse transform; if the file exists, you can choose whether you want to overwrite it. Everything is done in memory; there's a warning if this would use more than 2 GB. The output files get 32 times larger (as explained in my previous post). Results using paq8px:

    Code:
    Transformer.cs		 	2.884		(part of the source code)
    Transformer.cs_rky             92.673
    Transformer.cs_rky_rev   	2.884		(same as original)
    
    Transformer.cs.paq8px175	  554
    Transformer.cs_rky.paq8px175	2.202
    http://schnaader.info
    Damn kids. They're all alike.

  26. Thanks:

    rarkyan (5th July 2019)

  27. #144
    Member rarkyan's Avatar
    Join Date
    Dec 2010
    Location
    Tell Me Where
    Posts
    88
    Thanks
    15
    Thanked 2 Times in 2 Posts
    Quote Originally Posted by schnaader View Post
    Implementation in C#, binary and source attached.

    Code:
            private const int ColumnCount = 16;
            // N is the total number of rows (ceil(input length / ColumnCount))
    
            var bitOffset = byteValue * ColumnCount * N + column * N + inputRow;
    The output filename "X" is set to "X_rky" for the forward transform and "X_rev" for the reverse transform; if the file exists, you can choose whether you want to overwrite it. Everything is done in memory; there's a warning if this would use more than 2 GB. The output files get 32 times larger (as explained in my previous post). Results using paq8px:

    Code:
    Transformer.cs             2.884        (part of the source code)
    Transformer.cs_rky             92.673
    Transformer.cs_rky_rev       2.884        (same as original)
    
    Transformer.cs.paq8px175      554
    Transformer.cs_rky.paq8px175    2.202
    Thank you very much for the code. I tried the .exe file, and this is my understanding:

    Forward transform = compress file
    Reverse transform = decompress file

    Am I right?

    Could you explain why the compressed file/output becomes 32 times larger?

    Because the efficiency of the compressed file should come from it being just the IDs written in the output table. May I know how you define the notation of each ID?

    My apologies, I'm weak at programming and am trying my best to understand.

  28. #145
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    625
    Thanks
    282
    Thanked 249 Times in 126 Posts
    Quote Originally Posted by rarkyan View Post
    Forward transform = compress file
    Reverse transform = decompress file
    That's correct, yes. Though it's just a transformation; to compress the result you'll have to use another compressor after it (as I did with paq8px175).

    Quote Originally Posted by rarkyan View Post
    Could you explain why the compressed file/output becomes 32 times larger?
    See my previous post:
    Your input has N * 2^7 bits, your output has N * 2^12 bits.
    2^12 = 4096, 2 ^ 7 = 128; 2^12 / 2^7 = 4096 / 128 = 32

    Quote Originally Posted by rarkyan View Post
    Because the efficiency of the compressed file should come from it being just the IDs written in the output table. May I know how you define the notation of each ID?
    Let's look at the first column from your 3 row example:

    Code:
    42
    00
    00
    The two "00" values will result in " xx" (or 7 in your example) - the notation used in the program is:

    Code:
    "   "  0
    "  x"  1
    " x "  2
    " xx"  3
    "x  "  4
    "x x"  5
    "xx "  6
    "xxx"  7
    This has the advantage that you can get the values by modifying bits. For any occurrence of a "00", you only have to set a bit in your output. This is done after the calculation of the bit offset in the code:

    Code:
        output[bitOffset / 8 + HeaderSize] ^= (byte)(1 << (bitOffset % 8));
    The final output is the result of putting all those IDs/bits next to each other, so that we get 16*256*N bits (16 is the column count 00..0f, 256 is the byte value count 00..ff, N is the row count). If the original file has length L (in bytes), it has L*8 bits and N=L/16 rows. After the transform, it has 16*256*N = 16*256*L/16 = 256 * L bits = 32 * L bytes which explains the 32 again. It comes from the fact that for each byte in the input, you store 256 bits (32 bytes) - "1"/"x" for the input value, "0"/" " for each of the 255 other byte values.
    Last edited by schnaader; 5th July 2019 at 16:48.
    http://schnaader.info
    Damn kids. They're all alike.

  29. Thanks:

    rarkyan (5th July 2019)

  30. #146
    Member rarkyan's Avatar
    Join Date
    Dec 2010
    Location
    Tell Me Where
    Posts
    88
    Thanks
    15
    Thanked 2 Times in 2 Posts
    My brain is starting to smoke now, because the processors inside are about to explode.
    I hope you can guide me to confirm that I really, really made a big mistake with this idea. I mean, in the end I may agree that I have found my mistake.

    Another question:

    What are the 7 and 12 in this formula:
    N * 2^7 bits, your output has N * 2^12 bits.
    Where do they come from?

    I think my input is just the number of rows in 1 column. I gave the example of 3 rows, so in my poor logic I only need 2^3 to create all the possible IDs. And these IDs are sufficient to fill in the patterns for the other hex codes in the next columns (from 00 -- 0f).

    I can understand your explanation :

    "the final output is the result of putting all those IDs/bits next to each other, so that we get 16*256*N bits (16 is the column count 00..0f, 256 is the byte value count 00..ff, N is the row count)"
    Yes, that explains my output table. And when the 3-row example IDs are written in the output table, it surely does not compress the file; it becomes much larger, because the output table itself may have its own minimum size, which may need more space to store the header, write the IDs, etc. But I'm glad to see a result where the file is able to return to its original state when decompressed.

    Actually, I want to check the logic using my own understanding, using any tools I can get to help me see it myself, and to think of other possibilities to solve the problem. If you can help me think/code just the way I need (only to prove my method false), I will thank you many times. Well, just consider it an experiment to keep this great forum alive.

    Btw, please answer my question about the formula. Thanks.

  31. #147
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    625
    Thanks
    282
    Thanked 249 Times in 126 Posts
    Quote Originally Posted by rarkyan View Post
    What are the 7 and 12 in this formula:
    Where do they come from?

    I think my input is just the number of rows in 1 column. I gave the example of 3 rows, so in my poor logic I only need 2^3 to create all the possible IDs.
    I call the original untransformed file (from the hex editor view) "input" - it has N rows, each row contains 16 bytes or 16*8=128 bits. 128 is 2 ^ 7. Thus N * 2 ^ 7.
    The transformed file is your table (the "output"), 16 * 256 * N bits (each cell has N bits, there are 16 rows and max. 256 columns). 16 * 256 is 4096 or 2 ^ 12. Thus N * 2 ^ 12.
    http://schnaader.info
    Damn kids. They're all alike.

  32. Thanks:

    rarkyan (5th July 2019)

  33. #148
    Member rarkyan's Avatar
    Join Date
    Dec 2010
    Location
    Tell Me Where
    Posts
    88
    Thanks
    15
    Thanked 2 Times in 2 Posts
    The transformed file is your table (the "output"), 16 * 256 * N bits (each cell has N bits, there are 16 rows and max. 256 columns). 16 * 256 is 4096 or 2 ^ 12. Thus N * 2 ^ 12.
    So after all, my output table / compressed file size depends on the size of N bits, right?
    Can I shorten the size of N bits? For example, if the file contains a large number of rows in 1 column, the sequence IDs generated from 2^n could somehow be defined like this, for example:

    00000, which would be filled with all possible combinations of alphabetical characters (upper/lower case), numbers, and all other available symbols/notation. Would that still give the same result?

    After I run your .exe file for the forward transform, it creates a file with the _rky extension. Could you give me a visualized output table in the same format as my example, containing the cells and each ID? I want to check the structure.
    Last edited by rarkyan; 5th July 2019 at 19:10.

  34. #149
    Member CompressMaster's Avatar
    Join Date
    Jun 2018
    Location
    Lovinobana, Slovakia
    Posts
    212
    Thanks
    65
    Thanked 17 Times in 17 Posts
    Quote Originally Posted by compgt View Post
    @CompressMaster, truly "random" data, input or output, should by definition be incompressible by known functions, processes, or algorithms.
    You're right. But true randomness does not exist at all. Even data that was generated using "pure pseudorandom data generators using random cryptographic functions", such as SHARND, isn't random. Why? Because we have a limited byte count - 256 possible values. And even if you have, for example, a text file with 0% repeated strings, there is STILL some pattern. Why am I claiming this? Well, I learned that if you convert a file to another form where randomness does not exist at all, and you're able to convert it back losslessly, you will end up with your original file, right? I completely agree that "random" data is "incompressible" to known algorithms. But what about new ideas/approaches? So, random data isn't incompressible at all; it is all compressible - some more, some less, depending only on a properly selected interpretation. Therefore, randomness=incompressible isn't the correct term. It should be randomness=hardly compressible data.

  35. #150
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Switzerland
    Posts
    678
    Thanks
    404
    Thanked 451 Times in 235 Posts
    Quote Originally Posted by CompressMaster View Post
    But true randomness does not exist at all.
    True randomness does exist. Let me give you a problem to solve. A compression challenge. I have an input. It contains one bit. No more, no less. One bit. I don't tell you if the bit is 0 or 1. How do you compress it? You tell me your algorithm, I tell you my bit. Let's see if your algorithm can compress it or not, and THEN let's talk about true randomness.

    Quote Originally Posted by CompressMaster View Post
    Even data that was generated using "pure pseudorandom data generators using random cryptographic functions", such as SHARND, isn't random
    No, pseudorandom is not true random by definition.

    Quote Originally Posted by CompressMaster View Post
    Why? Because we have a limited byte count - 256 possible values.
    No, the number of possible values in a byte has no relationship with randomness. One bit is even more limited - it has only 2 possible values. Yet, it can be truly random. Try to solve my 1-bit challenge, and you'll see.

    Quote Originally Posted by CompressMaster View Post
    And even if you have, for example, a text file with 0% repeated strings, there is STILL some pattern.
    A text file is not random by definition. A text file contains TEXT. And text is not random.

    Quote Originally Posted by CompressMaster View Post
    But what about new ideas/approaches?
    The second sticky in this forum is the "random compression FAQ". Please read it. It contains useful information about the subject.


