
Thread: lossless data compression method for all digital data types

  1. #151
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Hungary
    Posts
    398
    Thanks
    277
    Thanked 282 Times in 149 Posts
    Quote Originally Posted by CompressMaster View Post
    I completely agree that "random" data are "incompressible" to known algorithms. But what about new ideas/approaches? So, random data aren't incompressible at all; they are all compressible - some more, some less - it depends only on the properly selected interpretation.
    I have another (practical) claim that there does not exist any clever algorithm that would compress any file.

    Let's suppose that there is a clever algorithm. After studying it, I can create a file that will BREAK it. How? With an "anti-file".

    I will just give it a data sequence to compress that is the opposite of what it believes about the data. If the algorithm works bit by bit and predicts that the next bit is 1 (p(1)>0.5), then I will create the data sequence so that that particular bit is 0. And when it predicts that a bit is 0 (p(1)<0.5), I will make that bit 1. It is very easy to create such an anti-file for any algorithm.

    For paq8px (v180) I attached the anti 1K-2K-4K-8K-16K files. It is interesting to see that the larger the file, the more it loses. It tries hard to find patterns, but when it thinks it has found a pattern and predicts the next bit with some certainty (p(1)<>0.5), the next bit breaks it. I feel pity for the poor thing.

    Code:
    01K.bin.paq8px180  1024  -> 1041 (+17)
    02K.bin.paq8px180  2048  -> 2069 (+21)
    04K.bin.paq8px180  4096  -> 4125 (+29)
    08K.bin.paq8px180  8192  -> 8237 (+45)
    16K.bin.paq8px180 16384 -> 16466 (+82)
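    To illustrate the construction (a minimal sketch only - not the generator actually used for the attached files; the adaptive order-0 counter below is just a stand-in for paq8px's real models): pair any bit-by-bit predictor with a loop that always emits the bit the predictor considers less likely.
    Code:
    // anti-file sketch: always write the bit the model finds less likely
    #include <cstdio>
    #include <vector>

    struct Counter {                 // adaptive order-0 bit model (stand-in predictor)
        double n0 = 1, n1 = 1;       // Laplace-smoothed counts
        double p1() const { return n1 / (n0 + n1); }   // P(next bit = 1)
        void update(int bit) { bit ? ++n1 : ++n0; }
    };

    int main() {
        const int nbits = 8 * 1024;                     // 1 KB anti-file
        Counter model;
        std::vector<unsigned char> out((nbits + 7) / 8, 0);
        for (int i = 0; i < nbits; ++i) {
            int bit = model.p1() > 0.5 ? 0 : 1;         // contradict the prediction
            if (bit) out[i / 8] |= 1 << (7 - (i % 8));
            model.update(bit);
        }
        FILE* f = std::fopen("anti.bin", "wb");
        if (f) { std::fwrite(out.data(), 1, out.size(), f); std::fclose(f); }
        return 0;
    }

    Against the model it was built for, every prediction with p(1)<>0.5 then costs more than 1 bit under arithmetic coding, so the coder can only expand such a file - which is exactly what the numbers above show.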
    Attached Files

  2. Thanks (2):

    Stefan Atev (8th July 2019),xinix (7th July 2019)

  3. #152
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,423
    Thanks
    223
    Thanked 1,052 Times in 565 Posts
    1) I think it's actually true that most files on computers are compressible (even if random-looking).
    Btw, https://encode.su/threads/482-Bit-guessing-game
    Pseudorandom files just have some generation algorithm, or, more specifically, were compressed and/or encrypted with some algorithm.
    Even files with digitized analog data (e.g. Mark Nelson's "million random digits" file) usually end up with some redundancy
    (column parity in Nelson's case).
    So truly random files _can_ exist, but they are not very useful, take relatively little space, and are thus not worth consideration.

    2) The sequential bit prediction approach is not necessarily compatible even with _known_ generation algorithms.
    Of course, mathematically it applies to anything, but practically you won't crack AES that way,
    even though it's possible with other methods (because AES works with blocks).

    3) The common mistake of random compression inventors is different.
    They usually think that some simple one-line formula can be a universal solution to data compression.
    CS textbooks certainly promote that way of thinking by only describing that kind of algorithm.
    Actual compression algorithms, in contrast, are a hard challenge for programmers, involve basically every field of CS,
    and different filetypes frequently require custom handling.

    So yes, it's possible to compress most random-like files that can be found on the net etc.
    For example, we can take a paq archive, unpack it with the correct version, then pack it with cmix.
    But this kind of random compression is even harder to implement than usual compression algorithms -
    you'd need a high-precision implementation of everything just to never lose a single bit to arithmetic rounding and such.
    And, except for some contests, there's basically no practical use for that kind of algorithm.
    Btw, they'd rely on known redundancy existing in the data, and thus can't be applied recursively.

  4. Thanks (2):

    compgt (7th July 2019),xinix (7th July 2019)

  5. #153
    Member
    Join Date
    Sep 2018
    Location
    Philippines
    Posts
    38
    Thanks
    22
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Shelwien View Post
    [...]
    Bit-guessing is fundamental to bit-oriented compression algorithms that rely on bit context, e.g. bitwise PPM or even simple bitwise LZW.

    One of my early ideas when I was starting out was not to predict just bits but bytes, using just the random(256) function. Or some bit-byte correlation that leads to a smaller number of bits that predict the byte: "Route 85" to me can mean 8 bits to 5 bits, i.e. at most 32 guesses was fine and was enough.

    At one time, I hailed the random() function of Turbo C for this. Then everything broke down. I thought the compiler was really rigged for lesser compression, as I suspected during the Cold War, but paq and other top compressors were doing fine anyway, so it must have been a coding error on the decompressor's part.

  6. #154
    Member
    Join Date
    Jun 2018
    Location
    Slovakia
    Posts
    154
    Thanks
    44
    Thanked 10 Times in 10 Posts
    @Gotty,
    Quote Originally Posted by Sportman View Post
    The counting theory is right in case you see sequences of bits in bytes as fixed length, but recursive random data compression encoding and decoding do not use fixed lengths as input and output.
    Completely agree.

    Quote Originally Posted by Gotty View Post
    I have an input. It contains one bit. No more no less. One bit. I don't tell you if the bit is 0 or 1. How do you compress it?
    By looking at it. In other words, it's already compressed (but that doesn't mean that already compressed data cannot be losslessly compressed further). Your input cannot be compressed at all, because there would be overhead. Specifically, that small amount of data cannot be compressed at all. But if we packed a lot of small *hardly compressible* files into one compressed archive (such as RAR), it could be losslessly compressed further, of course. But there will always be a limit on HOW MUCH FURTHER particular data can be compressed. 100% lossless compression is impossible, but I expect a ratio very close to 98% (or at least 95%).

    Quote Originally Posted by Gotty View Post
    the number of possible values in a byte has no relationship with randomness
    Wrong. There is a STRONG relationship between the two.

    Quote Originally Posted by Gotty View Post
    A text file is not random by definition. A text file contains TEXT. And text is not random.
    In this case, by text file, I mean a pure text file with 0% repeated sequences - i.e. every character is unique - not human-readable text files such as LTCB or documents.

    Quote Originally Posted by Gotty View Post
    I have another (practical) claim that there does not exist any clever algorithm that would compress any file.
    You're partially right. There does not exist any clever algorithm SO FAR that would compress any file.

    Quote Originally Posted by Gotty View Post
    Let's suppose that there is a clever algorithm. After studying it, I can create a file that will BREAK it. How? With an "anti-file".
    Anti-files do not exist at all. Again, there are ALWAYS lots of patterns, yet they're hardly predictable.
    But if you still believe that's true, could I request four MUCH larger anti-files for the BSC algorithm (attached)?

    Quote Originally Posted by Gotty View Post
    I will just give it a data sequence to compress that is the opposite of what it believes about the data. If the algorithm works bit by bit and predicts that the next bit is 1 (p(1)>0.5), then I will create the data sequence so that that particular bit is 0. And when it predicts that a bit is 0 (p(1)<0.5), I will make that bit 1. It is very easy to create such an anti-file for any algorithm.
    You're right, if the algorithm works bit by bit.
    I disagree completely with the rest.
    Attached Files
    Last edited by CompressMaster; 8th July 2019 at 21:03. Reason: typo

  7. #155
    Programmer michael maniscalco's Avatar
    Join Date
    Apr 2007
    Location
    Boston, Massachusetts, USA
    Posts
    114
    Thanks
    11
    Thanked 88 Times in 26 Posts
    Bits, bytes, files, randomness, anti-files (whatever those are) ... all irrelevant.
    What is being addressed is information and nothing more. When information is as concise as it can be, there can logically be no further reduction of that information. The more succinct it is, the more difficult it becomes to make any further reductions.

    These are the basics of information theory. And that's all there is to data compression. No magic and no buzzwords are going to alter the rules of the game.

    - Michael

  8. Thanks:

    hexagone (8th July 2019)

  9. #156
    Member snowcat's Avatar
    Join Date
    Apr 2015
    Location
    Vietnam
    Posts
    29
    Thanks
    38
    Thanked 12 Times in 8 Posts
    Quote Originally Posted by CompressMaster View Post
    [...] In this case, by text file, I mean pure text file with an 0% of repeated sequence [...]
    That is not random at all. Randomness means there is zero knowledge available that could
    help to guess the outcome of an event. AFAIK, 0% of repeated sequences means that every time
    you read something you know that you will never see it again. That knowledge can help
    to further reduce the size of the input.
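    As a back-of-envelope check of that point (my own example, not from the thread): a 256-byte file in which every byte value occurs exactly once is one of 256! permutations, so about log2(256!) ≈ 1684 bits are enough to describe it instead of the raw 2048 bits - the "never see it again" knowledge is worth roughly 364 bits here.
    Code:
    // bits needed for a 256-byte file where every byte value occurs exactly once
    #include <cmath>
    #include <cstdio>

    int main() {
        double bits = 0;
        for (int k = 2; k <= 256; ++k) bits += std::log2((double)k);   // log2(256!)
        std::printf("permutation: %.0f bits   raw: %d bits\n", bits, 256 * 8);
        return 0;                // prints roughly: permutation: 1684 bits   raw: 2048 bits
    }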

  10. #157
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Hungary
    Posts
    398
    Thanks
    277
    Thanked 282 Times in 149 Posts
    Quote Originally Posted by CompressMaster View Post
    By looking at it. In other words, it's already compressed.
    No, it's not compressed. The bit may represent that I tossed a coin: if it is heads, the bit is 1; if it is tails, the bit is 0. Or the bit may represent the number of noses I have (I have 1 nose: the bit is 1; I have no nose: the bit is 0). That simple. No compression.
    As Michael wrote: it's all about information.
    Can you "guess" my bit? If you say that the chance is 90% that my bit is 1, and my bit is indeed 1, then you could compress that information to 0.15 bits (-log2(0.90)). And I say: well done! But if my bit is 0, you actually lost: 3.32 bits will be the result of your compression (-log2(1.0-0.90)).
    So can you guess my bit? No, you can't. You have no information about it. From your viewpoint there is a 50% chance that my bit is 1. In other words, it is random. After "compression" the result is 1.0 bits (-log2(0.50)), i.e. with no information you cannot compress a bit.
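    A worked version of those numbers (a minimal sketch, not Gotty's code): with an ideal arithmetic coder a bit costs -log2(p), where p is the probability the model assigned to the value that actually occurred. If the bit is truly 50/50, any "confident" guess makes the expected cost exceed 1 bit.
    Code:
    // cost of coding one fair-coin bit when the model assigns probability p to '1'
    #include <cmath>
    #include <cstdio>

    int main() {
        double p = 0.90;                              // the confident guess
        double cost1 = -std::log2(p);                 // bit was 1: ~0.15 bits
        double cost0 = -std::log2(1.0 - p);           // bit was 0: ~3.32 bits
        double expected = 0.5 * cost1 + 0.5 * cost0;  // ~1.74 bits on average
        std::printf("if 1: %.2f  if 0: %.2f  expected: %.2f bits\n", cost1, cost0, expected);
        return 0;
    }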

    Quote Originally Posted by CompressMaster View Post
    Wrong. There is a STRONG relationship between that.
    Why do you think so?

    Quote Originally Posted by CompressMaster View Post
    In this case, by text file, I mean pure text file with an 0% of repeated sequence - i.e. every character is unique, not human-readable text files such as LTCB or documents.
    Please write "binary file" then. That is not a text file. Wait - every character is unique? How do you do that in a file that contains 1000 characters (bytes)?

    Quote Originally Posted by CompressMaster View Post
    But, if you still believe that´s true, could I request you for four MUCH larger anti-files for BSC algorithm (attached)?
    What do you mean by "much larger"?

  11. Thanks:

    xinix (9th July 2019)

  12. #158
    Member
    Join Date
    Jun 2018
    Location
    Slovakia
    Posts
    154
    Thanks
    44
    Thanked 10 Times in 10 Posts
    Quote Originally Posted by Gotty View Post
    What do you mean by "much larger"?
    1. 100K
    2. 200K
    3. 900K
    4. 5000K

  13. #159
    Member
    Join Date
    Jun 2018
    Location
    Slovakia
    Posts
    154
    Thanks
    44
    Thanked 10 Times in 10 Posts
    Quote Originally Posted by JamesB View Post
    Also utterly wrong. JPEG isn't random. What makes you think it is? If it was how come we view it and see a picture? Maybe you're not aware of how JPEG compressors work. They decompress the JPEG and recompress it with a better method. There's nothing random about that process at all. You're grasping at straws and for your own sanity, please don't.
    You're right, JPEG isn't random if it's decompressed. But I'm talking about the already compressed data. Randomness in IT does not exist for me; it's all compressible, just with difficulty.
    I'm aware of how JPEG REcompressors (such as StuffIt or paq) work.
    But my custom data preprocessing method is able to compress even random data WITHOUT recompression, i.e. it's not necessary to decompress the JPG file. And it's noticeably faster, although the compression ratio is not that good. 200 KB original (preprocessed to almost 4 MB - preprocessed, not decompressed, see the difference) to 174 KB lossless is possible with CMIX.

    @Gotty, what's your progress with the BSC algorithm anti-files?
    Last edited by CompressMaster; 17th August 2019 at 15:29. Reason: added more sentences

  14. #160
    Member
    Join Date
    Sep 2018
    Location
    Philippines
    Posts
    38
    Thanks
    22
    Thanked 0 Times in 0 Posts
    How good their compressors/decompressors are tells us how "intelligent" the programmers are. Understanding "context" is one measure. I wonder if investigators in other areas who are good with context would perform just as well in data compression.

  15. #161
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    467
    Thanks
    149
    Thanked 160 Times in 108 Posts
    Quote Originally Posted by CompressMaster View Post
    But my custom data preprocessing method is able to compress even random data WITHOUT recompression
    No it doesn't. That is provably impossible. Please read http://www.faqs.org/faqs/compression...section-8.html

    Don't bother trying to explain why your algorithm is different - it won't be. The only way any tool works, even things like CMIX, is by spotting patterns in non-random data and exploiting them. That means no tool can compress every file, but that's fine as we don't generally want to compress random data.

    You may well have a useful tool, but if so focus on where it is useful (people's actual data) and not where it is not (random data).

  16. #162
    Member rarkyan's Avatar
    Join Date
    Dec 2010
    Location
    Tell Me Where
    Posts
    82
    Thanks
    13
    Thanked 2 Times in 2 Posts
    My apologies - can anyone give me an example of a random data file and a non-random data file as attachments? I will try to understand the difference.

    Thank you

  17. #163
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,423
    Thanks
    223
    Thanked 1,052 Times in 565 Posts

  18. #164
    Member rarkyan's Avatar
    Join Date
    Dec 2010
    Location
    Tell Me Where
    Posts
    82
    Thanks
    13
    Thanked 2 Times in 2 Posts
    Quote Originally Posted by Shelwien View Post
    Thank you. I downloaded the random data from the source
    https://archive.random.org/download?file=2019-08-19.txt

    For the non-random data, the file size is too big and takes a long time to download.

    Once the random data is downloaded, it becomes an offline file and the bits do not change (unless the file can modify itself). So if the bits in the file aren't changing, is it still random?

    Because when the bits just stay like that, I am still able to create a pattern on them.

    [Attached image: Random Data.jpg]

  19. #165
    Member
    Join Date
    Jun 2018
    Location
    Slovakia
    Posts
    154
    Thanks
    44
    Thanked 10 Times in 10 Posts
    Quote Originally Posted by JamesB View Post
    [...]
    Well, I'm able to compress already compressed JPEGs (i.e. random data) with my custom preprocessing method (it's not an algorithm) without decompression at all - see my second thread.
    A 200 KB JPEG can be losslessly shrunk down to 174 KB by CMIX without knowing that it's a JPEG image.

    You're right - patterns in non-random data. I've used my custom data preprocessing method to minimize randomness in the original file.
    As for the incompressibility of files, let's wait for my custom data compression algorithm - BestComp. Maybe I have overly high expectations, but never say never...

  20. #166
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    467
    Thanks
    149
    Thanked 160 Times in 108 Posts
    As I said before:

    A) JPEG isn't random. It's compressible because JPEG hasn't managed to entropy-encode everything perfectly (for starters, most JPEGs are Huffman-coded instead of arithmetic-coded).

    B) Reliable and consistent compression of random data is impossible. I don't mean hard, I mean mathematically provably impossible.

    You may have a valid compression method for non-random data, but please think about how your assertions look. As long as you're claiming to be able to compress random data, everyone with any knowledge of data compression will think you're a crackpot. That means even if you do have a good algorithm, it'll be completely ignored - and rightly so.

  21. #167
    Member
    Join Date
    Jun 2018
    Location
    Slovakia
    Posts
    154
    Thanks
    44
    Thanked 10 Times in 10 Posts
    Quote Originally Posted by JamesB View Post
    B) Reliable and consistent compression of random data is impossible. I don't mean hard, I mean mathematically provably impossible.
    We have a limited byte count - 256 possible values. That's a big advantage (even for text files with 0% repeated symbols - i.e. every character is unique). It means that within any 257 consecutive bytes at least one value must repeat, but it's hard to encode that without occupying more information than the original. So random isn't random; the problem is how to compress it without occupying more info than the original...
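    A quick sanity check of that claim (my own arithmetic, not CompressMaster's): within any 257 consecutive bytes at least one value must indeed repeat, but pointing at the repeat costs exactly as much as writing the byte itself, so nothing is gained on random data.
    Code:
    // a back-reference into a 256-byte window costs log2(256) = 8 bits for the offset,
    // the same as an 8-bit literal - and a literal/match flag still has to be paid on top
    #include <cmath>
    #include <cstdio>

    int main() {
        double offset_bits = std::log2(256.0);   // 8 bits to name one of 256 offsets
        int literal_bits = 8;                    // 8 bits to just store the byte
        std::printf("offset: %.0f bits, literal: %d bits, saving: %.0f bits\n",
                    offset_bits, literal_bits, literal_bits - offset_bits);
        return 0;
    }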

    Quote Originally Posted by JamesB View Post
    You may have a valid compression method for non-random data, but please think about how your assertions look. While you're claiming to be able to compress random data everyone with any knowledge in data compression will be thinking you're a crackpot. That means even if you do have a good algorithm, it'll be completely ignored and rightly so.
    See my second thread, test it on your JPEG files, post some results, and then let's discuss it in more depth.

  22. #168
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Hungary
    Posts
    398
    Thanks
    277
    Thanked 282 Times in 149 Posts
    Quote Originally Posted by CompressMaster View Post
    You´re right. JPEG isn´t random if it´s decompressed. But I´m talking about already compressed data.
    It is still not random. I can just repeat what JamesB told you: JPEG files are not random.

    Quote Originally Posted by CompressMaster View Post
    But my custom data preprocessing method is able to compress even random data
    What do you mean here? (Preprocessing is not compression.)

    Quote Originally Posted by CompressMaster View Post
    And it´s noticeably faster, although compression ratio is not that good. 200KB original (preprocessed to almost 4MB - preprocessed, not decompressed, see the difference) to 174 KB lossless is possible with CMIX.
    I'm confused here. Faster than...?

    Quote Originally Posted by CompressMaster View Post
    @Gotty, what´s your progress with BSC algorithm anti-files?
    Thanx for asking. You posted an old version in exe format. You should have posted a link to the source. But don't worry, I have it.
    As I see it, the source code (which is more than 1 MB) would take too much time to study in order to engineer a proper anti-file. For comparison, the source code for paq8px is around half a megabyte; I've been studying the latter for some time and I'm still not at 100%. But my understanding was enough to create an anti-file (and it was straightforward in that case). For BSC it's not as simple as with paq8px, because there is a transformation step. Anyway, I verified it: it tries to compress the data using p<>0.5 probabilities.
    Unfortunately I don't really have months to spend on analyzing and reverse engineering the data flow.
    I'll need to give up on this challenge, sorry.
    Also, you would need to verify whether the anti-files are really anti-files. Would you be able to do that?

  23. #169
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Hungary
    Posts
    398
    Thanks
    277
    Thanked 282 Times in 149 Posts
    Quote Originally Posted by CompressMaster View Post
    Well, I´m able to compress already compressed JPEGs (i.e. random data) by my custom preprocessing method (it´s not an algorithm) without decompression at all - see my second started thread.
    200 KB JPEG can be losslessly shrinked down to 174KB by CMIX without knowing that it´s JPEG image.

    You´re right. Patterns in non-random data. I´ve used my custom data preprocessing method to minimize randomness in original file.
    Do you know that if you don't preprocess the 200K JPEG file with your custom preprocessor, it compresses much better? I tried it with paq8px (without the JPEG model): 185K after your preprocessing, 176K in the original form. (Unfortunately I don't have the resources to run cmix.)
    That means your custom preprocessing hurts compression.

  24. #170
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    467
    Thanks
    149
    Thanked 160 Times in 108 Posts
    Quote Originally Posted by CompressMaster View Post
    We have limited byte count - 256 possible values. That´s big advantage (and even for text files with an 0% of repeated symbols - i.e. every character is unique). That means at every 256-th position, there´s repeated sequence, but it´s hard to compress it without occupying more information than original. So random isn´t random, problem is how to compress it without occupying more info than original...
    "hard to compress it without occupying more information than original". You still don't understand unfortunately. Not hard, but impossible. (Possible on some files, but not on all and over time you'll never win).

    Please read and understand the link I listed before: http://www.faqs.org/faqs/compression...section-8.html

    I'm done with this thread. I've tried to help you, honestly. I'm not here to poke fun, but to save you from yourself. You will never succeed at random data compression and the evidence for why not is right there in that link.

    Non-random data... now that's an entirely different and more fruitful story. JPEG (even uncompressed JPEG) fits into that category.

  25. #171
    Member rarkyan's Avatar
    Join Date
    Dec 2010
    Location
    Tell Me Where
    Posts
    82
    Thanks
    13
    Thanked 2 Times in 2 Posts
    I did read the counting arguments and everything about why it is impossible to compress random data. But I am just curious why humans can't solve it. Maybe it is not impossible; it's just that we haven't found the correct way.

    I am currently trying to find my mistake in case the method isn't going to work. So far, this is the result:

    I am talking about the idea of the output table size, i.e. the compressed file size. If the ID is only 2 digits long, the compressed file size is:

    N-digit ID x 16 x 256

    2 x 16 x 256 = 8,192 bytes or 8 KB

    The input file to compress must not be smaller than 8 KB. Let's take an example: I have a file of 10,000 bytes. In a hex editor (16 columns) it has 10,000/16 = 625 rows.

    To create the patterns on the rows I need (2^625)-1 =

    139.234.637.988.958.594.318.883.410.818.490.335.842.688.858.253.435.056.475.195.084.164.406.590.796.163.250.320.615.014.993.816.265.862.385.324.388.842.602.762.167.013.693.889.631.286.567.769.205.313.788.274.787.963.704.661.873.320.009.853.338.386.432

    I have that many patterns (result from a big-integer calculator: https://defuse.ca/big-number-calculator.htm).

    And now I will try to make sure that a 2-digit ID is able to address that many patterns, using any possible character to generate the ID.
    By observing the available characters manually, I noticed that there are about 216 characters with which to create the ID (each character is 1 byte). To create a 2-digit ID from the possible combinations of 216 characters, I can use the formula n^r, so I get:

    216^2 = 46,656 IDs

    Great - I am out of ID stock for the patterns.
    And yes, it fails, and that is the mistake.

    Now I am still trying to think of other possibilities. I have several alternative formulas to reduce the patterns and expand the stock of IDs. I need more time to create a small experiment. Any kind of help will be appreciated. Thank you
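    Expressed in bits (just restating rarkyan's own numbers, nothing more): a 2-character ID drawn from 216 characters carries log2(216^2) ≈ 15.5 bits, i.e. 46,656 possibilities, while selecting one of the 2^625 row patterns needs 625 bits - so the ID space falls short by a factor of roughly 2^609.
    Code:
    // capacity of a 2-character ID (216 possible characters) vs. 625 on/off rows
    #include <cmath>
    #include <cstdio>

    int main() {
        double id_bits      = 2 * std::log2(216.0);   // ~15.5 bits (216^2 = 46,656 IDs)
        double pattern_bits = 625.0;                  // 2^625 possible row patterns
        std::printf("ID capacity: %.1f bits, patterns need: %.0f bits, short by ~%.0f bits\n",
                    id_bits, pattern_bits, pattern_bits - id_bits);
        return 0;
    }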

  26. #172
    Member
    Join Date
    Jun 2018
    Location
    Slovakia
    Posts
    154
    Thanks
    44
    Thanked 10 Times in 10 Posts
    Quote Originally Posted by rarkyan View Post
    I do read the counting arguments and everything about impossible way to compress random data. But i just curious, why human cant solve it. It maybe not impossible, its just that we didnt find the correct way.
    Completely agree. Of course I've read the counting argument and also about random data compression, and I think that everything COULD be possible (although there are some limits, as always, and limits apply to other things too), even if some things are harder to compress than others. I'm not a believer in infinite lossless compression, because some information MUST be stored of course, but if we become able to express even random patterns with very little information, then we will be able to compress even random data such as SHARND much better.

  27. Thanks:

    rarkyan (22nd August 2019)

  28. #173
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,423
    Thanks
    223
    Thanked 1,052 Times in 565 Posts
    1. The counting argument is not something to "solve"; it's just a visualization of the obvious fact that there are more n-bit numbers than (n-1)-bit numbers.
    It only says that it's impossible to compress some n-bit files without expanding others.
    It doesn't say that you can't compress some random-looking files like jpegs or other compressed formats.
    The problem is that people tend to underestimate the number of files without any useful patterns in them.
    It may be counter-intuitive, but (n-1)/n of all n-bit files would have near-equal counts of 0s and 1s, for example.
    And when a file doesn't have any patterns to identify it with, it also means that it is likely to be expanded rather than compressed.

    2. Actually we can safely pretend that random data doesn't exist on a PC. 99%+ of the files one can download from the net won't be random even if random-looking.
    Their compression can also be considered solved for any personal purposes - there are enough free hosting options that can be used to store the data forever
    and just keep a hash for identification.
    But why do you think that it's possible to losslessly compress a file without any programming skill, just with simple arithmetic?
    Let's say we have a megabyte of zeroes encrypted with AES using some unknown password.
    Would you compress it by finding the password, or do you believe that there's a magical formula that can do it some other way?
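    The first point can even be checked by brute force for tiny sizes (a minimal sketch, not tied to any particular compressor): a lossless coder is an injective map, and there simply aren't enough shorter strings to go around.
    Code:
    // pigeonhole: strings shorter than n bits number 2^0 + ... + 2^(n-1) = 2^n - 1,
    // one fewer than the 2^n strings of length n - so not all of them can shrink
    #include <cstdint>
    #include <cstdio>

    int main() {
        for (int n = 1; n <= 16; ++n) {
            uint64_t inputs  = 1ULL << n;         // number of n-bit files
            uint64_t shorter = (1ULL << n) - 1;   // all strings of length 0..n-1
            std::printf("n=%2d: %6llu inputs, only %6llu shorter outputs\n",
                        n, (unsigned long long)inputs, (unsigned long long)shorter);
        }
        return 0;
    }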

  29. #174
    Member
    Join Date
    Jan 2017
    Location
    Germany
    Posts
    53
    Thanks
    28
    Thanked 11 Times in 8 Posts
    Quote Originally Posted by Shelwien View Post
    Let's say, we have a megabyte of zeroes encrypted with AES using some unknown password.
    Would you compress it by finding the password, or would you believe that there's a magical formula that can do it some other way?
    You have to take the cipher mode into account when you perform such an encryption.
    Some modes add additional randomness, others – like ECB – don't.

  30. #175
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,423
    Thanks
    223
    Thanked 1,052 Times in 565 Posts
    okay, let's make it specific:
    Code:
    openssl enc -e -aes-128-cbc -nopad -in zeroes -out zeroes.aes
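    # the all-zero input 'zeroes' is assumed to be prepared beforehand, e.g.:
    #   dd if=/dev/zero of=zeroes bs=1M count=1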
    Attached Files

  31. Thanks:

    WinnieW (23rd August 2019)

  32. #176
    Member rarkyan's Avatar
    Join Date
    Dec 2010
    Location
    Tell Me Where
    Posts
    82
    Thanks
    13
    Thanked 2 Times in 2 Posts
    Quote Originally Posted by Shelwien View Post
    But why do you think that its possible to losslessly compress a file without any programming skill, just with simple arithmetics?
    Let's say, we have a megabyte of zeroes encrypted with AES using some unknown password.
    Would you compress it by finding the password, or would you believe that there's a magical formula that can do it some other way?
    Because life is like a puzzle, sir. I mean, some pieces lack information, others have a lot of information but maybe miss a tiny piece of it. I'm not assuming my information is useful, because the big pieces already contain what they need. But maybe in some cases a little rusty bolt is still needed to make the whole engine work.

    Another thing: I actually need help from a mathematician or programmer to work out the method, because they have experience in this field of study. I only propose an idea, and learn from the feedback whether it fails or not, and try to find another path when one is blocked: how to get there, what if we use this or that way, etc.

    Humans just try to evolve. A problem surely needs a method to solve it. I don't know, but if, when someone is trying something new to develop a good thing, the others just say "stop, it's useless" - well, maybe we would never be holding a smartphone nowadays.

    My apologies for being stupid, but really I don't want to mess up this forum. I just feel like I'm sleeping in a big hall, surrounded by experts doing their great work. Somehow I can learn from everyone here. Even if it's very hard for me to understand, I want to try and I need help.

    Forget it. I just want to find another way to deal with the pattern in lossless compression.

  33. #177
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,423
    Thanks
    223
    Thanked 1,052 Times in 565 Posts
    > I just want to find another way to deal with the pattern in lossless compression.

    That's ok, but you lack too much information, so you can't find any valid ideas.
    It's like wanting to beat 5G protocols while only knowing that a smartphone is a shiny black box.
    At the very least you have to understand some basic concepts like information, entropy, probability, combinatorics, enumeration, Kolmogorov complexity.

    The funny thing is that it's possible to design successful compression algorithms without any mathematical foundation -
    but only for known types of compression algorithms, like LZ77 or RLE.
    What you seem to want - compressing data which is usually incompressible - is actually much harder to do
    and requires much more knowledge from all areas of computer science.

  34. Thanks:

    rarkyan (22nd August 2019)

  35. #178
    Member rarkyan's Avatar
    Join Date
    Dec 2010
    Location
    Tell Me Where
    Posts
    82
    Thanks
    13
    Thanked 2 Times in 2 Posts
    Quote Originally Posted by Shelwien View Post
    > I just want to find another way to deal with the pattern in lossless compression.

    That's ok, but you lack too much information, so you can't find any valid ideas.
    Its like wanting to beat 5G protocols while only knowing that smartphone is a shiny black box.
    At the very least to have to understand some basic concepts like information, entropy, probability, combinatorics, enumeration, Kolmogorov complexity.

    I need some time to validate the idea. I know I lack too much information, so I need help from the experts. I still have several formulas, and they need to be proven to fail. Let me use my own ideas, and let me see my own mistakes. Can anyone help me with a little experiment on the pattern?

    I attached a 10,000-byte file.

    I need help to cut the hex editor view down to only 1 column, like this:

    [Attached image: 1 column 10000 byte.jpg]

    I want to know how often each hex value appears among the rows. Anyone?
    Attached Files

  36. #179
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    467
    Thanks
    149
    Thanked 160 Times in 108 Posts
    Quote Originally Posted by rarkyan View Post
    Let me use my own ideas, and let me see my own mistake. Can anyone help me to ...
    There's an irony there.

    However, to help you: use the unix "cut" tool. If you run Windows, then try installing the Windows Subsystem for Linux (WSL). I'm sure there are equivalent Windows tools out there, but generally I find Unix to have readily available methods for basic file manipulation: cut, join, split, sort, in addition to trivial one-liners in simple programming languages (e.g. awk '{print $1}' would do the same too).
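    If a small standalone program is easier than installing WSL, here is a minimal C++ sketch (my suggestion, not JamesB's) that prints one hex byte per line and then counts how often each of the 256 values occurs; "data.bin" is just a placeholder filename.
    Code:
    // dump a file one hex byte per line, then print how often each value occurs
    #include <cstdio>

    int main(int argc, char** argv) {
        const char* name = argc > 1 ? argv[1] : "data.bin";   // placeholder filename
        FILE* f = std::fopen(name, "rb");
        if (!f) { std::perror(name); return 1; }
        long long count[256] = {0};
        int c;
        while ((c = std::fgetc(f)) != EOF) {
            std::printf("%02X\n", c);            // one column, one byte per row
            ++count[c];
        }
        std::fclose(f);
        for (int v = 0; v < 256; ++v)
            if (count[v]) std::printf("value %02X occurs %lld times\n", v, count[v]);
        return 0;
    }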

  37. Thanks:

    rarkyan (22nd August 2019)

  38. #180
    Member rarkyan's Avatar
    Join Date
    Dec 2010
    Location
    Tell Me Where
    Posts
    82
    Thanks
    13
    Thanked 2 Times in 2 Posts
    Quote Originally Posted by JamesB View Post
    There's an irony there.

    However to help you, use the unix "cut" tool. If you run Windows, then try installing the Windows subsystem for linux (WSL). I'm sure there are equivalent windows tools out there, but generally I find Unix to have readily available methods for basic file manipulation: cut, join, split, sort, in addition to trivial one-liners in simple programming languages (eg awk '{print $1}' would do the same too).
    Thanks in advance sir. Gonna search for that


