
Thread: Random files are compressed all the time; the definition is the issue

  1. #1
    Member
    Join Date
    Jan 2020
    Location
    Fl
    Posts
    49
    Thanks
    1
    Thanked 3 Times in 3 Posts

    Random files are compressed all the time; the definition is the issue

    Hi
    Compressing a random 255-character ASCII file into fewer than 255 characters is not likely. The dictionary/word/block size would need to be far greater, which would mean unrealistically more processing power. But compressing a random file of the ten digits 0-9 into 255 characters is likely; not if you try to compress it into 10 characters, though.

    pointless
    I find compression more like cheap tricks than skills. With compression, I see programs just take a file and try to go 1 step forward to compress it, rather than go 1 step back and 2 steps forward. I see people selling compression programs, but it does not matter, since even a small percentage gain is not worth changing the standard. Many people still rely on Zip even though 7-Zip is better. Maybe file compression should not compress at 100% of its ability but leave some room to take that 1 step back and try again, which needs a new approach.

    math
    Large numbers can, with luck, end up being compressed to a small size in a short time. Sometimes file compression comes down to luck. For example, try to compress a simple number like 387,420,489. This can theoretically be compressed to 1 digit in 1 step (about 90% compression): the answer is 9^9 (the digit raised to the power of itself). But if one digit is off, or even if the number is half that length, it is impossible to beat the original. Even the current compression methods have a limited instruction range for how to compress things. Things have to be set up right, like dominoes, to be taken down easily. If you try to make a program that takes advantage of that, it might be some tough math, unless you find a way around it. But how can someone dismiss something without evaluating it? Or maybe I am wrong, and if so, just continue down your road-blocked standard and don't look at other avenues.
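
    As a rough illustration of that power-of-itself idea (a toy sketch, not a real method; it only succeeds for the handful of numbers that happen to equal n^n):

    Code:
    // Toy "codec": a number is stored as one digit only when it equals n^n
    // for some digit n in 2..9. Anything else is left alone, which is the
    // point: the trick applies to a vanishingly small set of inputs.
    public class PowerOfItself {
        // Returns the digit n if value == n^n, otherwise -1.
        static int encode(long value) {
            for (int n = 2; n <= 9; n++) {
                long p = 1;
                for (int i = 0; i < n; i++) p *= n;
                if (p == value) return n;
            }
            return -1; // not representable by this scheme
        }
    
        static long decode(int digit) {
            long p = 1;
            for (int i = 0; i < digit; i++) p *= digit;
            return p;
        }
    
        public static void main(String[] args) {
            System.out.println(encode(387420489L)); // 9  -> one digit stands for nine
            System.out.println(encode(387420490L)); // -1 -> one value off and the trick fails
        }
    }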

    audio
    Even music files cannot be compressed well, but MIDI files can, because the music is represented in a different way. They are structured very differently: you are not compressing the tune itself, since the instruments live in the player that loads the file; you are compressing the pattern. The same goes for the art tile sets of older systems: they can describe a large image yet compress better than modern pictures, because you are compressing a pattern and the art set lives in the program that loads it. Most programs now do not work that way. A few rare ones do, and you cannot tell the difference between a 10 MB instrumental audio file and a few-KB custom music file that takes the MIDI approach.

    image
    Even image files do not deal with the pattern; they take the image as it is. Compression programs cannot come close to compressing a file the way the original program does once the content is converted to modern formats, which in and of itself is even "more" random. For example, an 8-bit game with half its artwork exported as a PNG (which only compresses a further 5%) is over 55% bigger than the compressed original game file, which contains more art plus the program. In other words, there is at least a 55% gap in modern compression performance, since it does not know how to set up, let alone look for, the patterns in a small 300 KB PNG with no artifacts and limited colors. Even those two examples show that some reevaluation is needed in compression when it comes to pattern recognition.

    setup
    The file has to be set up to be willing to be compressed, just like a hyperactive child needs to calm down for a picture. If the setup is correct, the end result will be better; if the setup is not correct, then cheap tricks cannot fix it, but more advanced image-processing programs can set it up better. Just like how you cannot change the initial code of a PNG file, but maybe (not likely) you can set it up to be taken down like dominoes, as with 9^9.

    switch
    Maybe compression has to think outside the box, and people should try to avoid the standard and find something completely different to get different results. You will say I am wrong, and that's fine; I am just presenting a different perspective. For example, another "trick" is that if you switch something as simple as a binary file's characters, you can compress the file a little over 3% more. Yet compression programs don't take advantage of that. But then again, a compression program is like trying to use a butcher knife for everything from cutting hair to surgery. If something is not flexible, it will break.

    skills
    I forgot general programming a long time ago, so I am talking from a different perspective; I only write formulas in spreadsheets. But just like every other activity in the world, you need other people's perspectives to give you an edge. You may make a better burger than McDonald's, but you sure cannot sell more than McDonald's. Everyone who knows better than McDonald's still lacks a long list of skills, from marketing, accounting, real estate, art, design, nutrition, and psychology on through "almost" every profession.

    chart
    In the end, if you do it for fun then good; if you do it for work then it is not worth it. Some people like pointless activities, and some try to be like Edison, who tested over a hundred light bulb designs and noted which methods did not work, to save others time. But like the periodic table, which predicted elements before they were discovered, it almost seems like we are all at the pre-periodic-table stage, stumbling in the dark to find a new element without understanding what we already have, rather than focusing on the big picture. Again, just another perspective.


    Thanks
    Last edited by Trench; 12th February 2020 at 05:33.

  2. #2
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    171
    Thanks
    20
    Thanked 61 Times in 30 Posts
    yet another "Random Compression" post
    By the way, all BWT-based compressors do "go back 1 step then go forward 2 steps" via the BWT transform: it always enlarges the data by 4 or 8 bytes (the stored primary index) before the later stages shrink it.
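
    A minimal sketch of why those extra bytes appear (a naive rotation sort for illustration only; real compressors build the BWT with suffix arrays, but the output shape is the same):

    Code:
    import java.util.Arrays;
    
    // Naive BWT forward transform. The output is the permuted bytes PLUS the
    // index of the original rotation, which is the extra 4 (or 8) bytes.
    public class NaiveBwt {
        static class Result {
            byte[] lastColumn; // same length as the input
            int primaryIndex;  // must be stored to invert the transform
        }
    
        static Result transform(byte[] data) {
            int n = data.length;
            Integer[] rot = new Integer[n];            // rotation start offsets
            for (int i = 0; i < n; i++) rot[i] = i;
            Arrays.sort(rot, (a, b) -> {               // sort rotations lexicographically
                for (int k = 0; k < n; k++) {
                    int x = data[(a + k) % n] & 0xFF;
                    int y = data[(b + k) % n] & 0xFF;
                    if (x != y) return x - y;
                }
                return 0;
            });
            Result r = new Result();
            r.lastColumn = new byte[n];
            for (int i = 0; i < n; i++) {
                r.lastColumn[i] = data[(rot[i] + n - 1) % n]; // last char of each rotation
                if (rot[i] == 0) r.primaryIndex = i;          // where the original row ended up
            }
            return r;
        }
    
        public static void main(String[] args) {
            Result r = transform("banana".getBytes());
            System.out.println(new String(r.lastColumn) + " index=" + r.primaryIndex);
            // prints "nnbaaa index=3": 6 bytes in, 6 bytes plus one stored int out.
        }
    }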

  3. #3
    Member CompressMaster's Avatar
    Join Date
    Jun 2018
    Location
    Lovinobana, Slovakia
    Posts
    199
    Thanks
    58
    Thanked 15 Times in 15 Posts
    Quote Originally Posted by Trench View Post
    Hi
    Compressing a random 255-character ASCII file into fewer than 255 characters is not likely. [...]
    It is still impossible to compress random or already compressed data by the ways you have described. Read, for example, the counting argument in Matt Mahoney's book on data compression. The only possible way is to DECOMPRESS the input and then COMPRESS it with much stronger algorithms - for example, JPEGs can be compressed by up to 40%, but further compression requires decompression first. I tried it many times along with all possible methods (including preprocessing) and the results were always bigger than the original file. So, no compression... forget about it. It's impossible.
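
    A one-line version of that counting argument (standard pigeonhole reasoning, not tied to any particular book): there are 2^n distinct n-bit files, but only 2^0 + 2^1 + ... + 2^(n-1) = 2^n - 1 files shorter than n bits, so no lossless coder can shrink every n-bit input; at least one input must stay the same size or grow.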
    Last edited by CompressMaster; 13th February 2020 at 12:46. Reason: added quote tags

  4. #4
    Member
    Join Date
    Aug 2015
    Location
    indonesia
    Posts
    381
    Thanks
    57
    Thanked 76 Times in 58 Posts
    Quote Originally Posted by CompressMaster View Post
    It is still impossible to compress random or already compressed data by the ways you have described. Read, for example, the counting argument in Matt Mahoney's book on data compression. The only possible way is to DECOMPRESS the input and then COMPRESS it with much stronger algorithms - for example, JPEGs can be compressed by up to 40%, but further compression requires decompression first. I tried it many times along with all possible methods (including preprocessing) and the results were always bigger than the original file. So, no compression... forget about it. It's impossible.
    Yes I agree

  5. #5
    Member
    Join Date
    Jan 2020
    Location
    Fl
    Posts
    49
    Thanks
    1
    Thanked 3 Times in 3 Posts
    RichSelian: when I say 1 step back, 2 steps forward, I mean making the change permanent rather than finalizing it back into ASCII, which increases the randomness and makes it harder.

    Quote Originally Posted by CompressMaster View Post
    It is still impossible to compress random or already compressed data by the ways you have described. Read, for example, the counting argument in Matt Mahoney's book on data compression. The only possible way is to DECOMPRESS the input and then COMPRESS it with much stronger algorithms - for example, JPEGs can be compressed by up to 40%, but further compression requires decompression first. I tried it many times along with all possible methods (including preprocessing) and the results were always bigger than the original file. So, no compression... forget about it. It's impossible.
    As stated, even a simple binary file is random, but you only deal with 2 symbols, unlike ASCII where you deal with 256, which makes it even more random. Both are random, but the odds are worse. It is not as if people are compressing numbers with only numbers, or binary with binary; they compress binary with numeric ASCII. What will people come up with next, an extended version of ASCII with 100,000 characters? Throwing more characters at the problem is not the solution; it might work, but it has its own issues.

    I gave various perspectives, even something as simple as a PNG image that is not compressed by over 50% as it should be, yet that is "random". Or how one has to change the format, like the music file, to gain over 50%.

    I agree it is hard to compress with current methods since they have reached their limit, so shouldn't other perspectives be looked at?
    Maybe try to decompress first and see where that gets you, as silly as that sounds; at least it is not the conventional approach.

    Even if you do achieve a 50% gain over all other compression, it is still not easy to become a standard.

    Also, you tried many things that did not work. Did you and others have a place to log what does not work, so that others can see it and try a new path, or see how to improve it?
    Even I showed how changing something as simple as 2 characters compressed an extra 3%, yet how come compression programs don't use that? And if they don't know about it, why not? And what standard is there to chart what works and what doesn't? None? So it is like going in blind, using luck to find logic.

    As for ASCII, I think people should try to limit its use, because the odds against you with a 256-symbol alphabet are much greater than with something like decimal digits. You have a 1/2 chance to guess a 1 or 0, 1/10 for 0-9, and 1/256 for a byte of ASCII. All are random, but the odds are different, which is what I mean, since compression programs have a limit, maybe for good reason, with their limited dictionaries.

    Maybe more formulas are needed as well, since there is plenty of room for improvement, as some examples have shown. Again, you will disagree, or maybe I am right and time will tell, but it is another perspective.
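
    For the odds comparison above, a quick back-of-the-envelope sketch (illustration only): a uniformly random symbol from an alphabet of size k carries log2(k) bits, so a random bit stream, digit stream, and byte stream with the same information content differ only in how many symbols they use, not in how "random" they are.

    Code:
    // Bits per symbol for uniformly random data over different alphabets.
    public class BitsPerSymbol {
        public static void main(String[] args) {
            int[] alphabetSizes = {2, 10, 256};
            for (int k : alphabetSizes) {
                double bits = Math.log(k) / Math.log(2);     // log2(k)
                double symbolsPerKilobit = 1000.0 / bits;    // symbols needed for 1000 bits of entropy
                System.out.printf("alphabet %3d: %5.2f bits/symbol, %7.1f symbols per 1000 bits%n",
                        k, bits, symbolsPerKilobit);
            }
            // alphabet 2: 1 bit/symbol; alphabet 10: ~3.32; alphabet 256: 8.
            // None of them is "less random" than the others once measured in bits.
        }
    }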

  6. #6
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    615
    Thanks
    261
    Thanked 242 Times in 121 Posts
    Quote Originally Posted by Trench View Post
    I gave various perspectives, even something as simple as a PNG image that is not compressed by over 50% as it should be, yet that is "random".
    I'm not sure if this is what you wanted, but compressing a PNG image to less than 50% of its size is not impossible at all:

    Code:
    test.png:  80,225 bytes (attached image)
    test.pcf:  52,864 bytes (Precomp 0.4.7, completely reversible to get the original PNG file)
    test.webp: 38,224 bytes (cwebp -lossless -q 100 -m 5, same image data as the PNG)
    Attached image: test.png (78.3 KB)
    http://schnaader.info
    Damn kids. They're all alike.

  7. #7
    Member
    Join Date
    Sep 2018
    Location
    Philippines
    Posts
    121
    Thanks
    31
    Thanked 2 Times in 2 Posts
    Quote Originally Posted by CompressMaster View Post
    It is still impossible to compress random or already compressed data by the ways you have described. Read, for example, the counting argument in Matt Mahoney's book on data compression. The only possible way is to DECOMPRESS the input and then COMPRESS it with much stronger algorithms - for example, JPEGs can be compressed by up to 40%, but further compression requires decompression first. I tried it many times along with all possible methods (including preprocessing) and the results were always bigger than the original file. So, no compression... forget about it. It's impossible.
    @CompressMaster, why are you trying random compression? I don't mean to pry, but is data compression your line of work? I think there are many companies out there trying to solve random data compression. I am not a programmer professionally now, but I did intermittent work on this compression stuff (zip format, bmp, gif, png, jpeg, vcd/svcd, dvd, paq, etc.) in the 1970s and 80s when I was a precocious child and grade-schooler. I am now just coding in C, too lazy to re-learn C++. Data compression remained an interest, though.
    Last edited by compgt; 3rd July 2020 at 17:44.

  8. #8
    Member
    Join Date
    Jan 2020
    Location
    Fl
    Posts
    49
    Thanks
    1
    Thanked 3 Times in 3 Posts
    Quote Originally Posted by schnaader View Post
    I'm not sure if this is what you wanted, but compressing a PNG image to less than 50% of its size is not impossible at all:
    The file you compressed is mostly empty space with few colors; any file that is mostly empty will behave the same. But here are some examples of something more complex that cannot be compressed anywhere near as well as the original program manages.

    Take an NES game map found online: it is 1,907 KB and compresses to 1,660 KB, while the original game file is listed online at 75 KB (95% smaller) even though it includes more art, sound, and code.
    Another game on the list is EarthBound: its map is 3.4 MB, while the file size online is listed as 194 KB (94% smaller).
    The same applies to 100 MB PNG image files that cannot come close to the original, which would probably be around 5 MB.

    Plenty of patterns, yet compression cannot even recognize them.
    Shouldn't this be a simple thing that just is not implemented yet? 95%!
    https://vgmaps.com/Atlas/NES/CastlevaniaII-Simon'sQuest-Transylvania(Unmarked).png


  9. #9
    Member JamesWasil's Avatar
    Join Date
    Dec 2017
    Location
    Arizona
    Posts
    107
    Thanks
    96
    Thanked 19 Times in 18 Posts
    There's a lot of things to address here, but first this:

    1) Compgt: PAQ did not exist in the '70s or '80s. It wasn't conceptualized until the mid-1990s; I remember reading the initial newsgroup and forum threads for it back then under comp.compression and other forums/sites that no longer exist. Some of the file formats mentioned didn't exist then either. Just politely letting you know so that you don't get flamed for it someday in a conversation or post, even if you were working on predecessor algorithms at that time.

    Trench: You have to understand the difference between pixel-by-pixel compression and macro sprites. The NES, Sega Master System, and other game systems from that time used a Z80, 6502, or similar 8-bit processor running anywhere from 1 MHz to 8 MHz depending on the machine. The graphics and audio were usually handled by separate dedicated processors running at about the same speed. Even so, while having a processor for each made a lot of things possible that a single chip would have struggled with at the time, RAM and storage space were still really expensive. They didn't want to use bloated image formats, and they needed animation and scrolling to be fast.

    What they did was use sprites limited to small fixed-size blocks (8x8 or 8x16 pixels on the NES) and background tiles that were stamped together to create the images, maps, and backgrounds that you see. Larger sprite animations were sometimes 2 or 3 of those blocks synchronized to move together, but they did flicker at times because of latency and refresh-rate issues when the graphics chip tried to do too much at once, which was evident when too many sprites were on screen at the same time.

    What this means is that, out of a given screen area of, say, 256x240 pixels, the entire background "image" was a jigsaw of tile blocks stamped from a small table, so each block of pixels was represented by a single tile index of a byte or less instead of its raw pixel data. Yes, you can fit entire images, stamped together from those sprite/tile macros, at a small fraction of their raw size in an NES game. But any image you can make that way is repetitive and restricted to the tile artwork that is loaded. PNG, GIF, and even lossy formats like JPEG are NOT restricted to premade macro images (effectively a graphical font); they have to be able to process, compress, and display ANY pixels you throw at them. The same was done for audio and sound effects to make sure everything fit in 256 to 512 KB per cartridge. The earlier systems, like the first generation of Sega game cards for the SG-1000, had to fit in under 64 KB, and you could see the differences and restrictions even more clearly there.
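
    A rough sketch of that size arithmetic (illustration only; the tile size, screen size, and color depth below are assumptions, not exact NES hardware values):

    Code:
    // Compare raw pixel storage with a tile-map representation.
    public class TileMapSize {
        public static void main(String[] args) {
            int screenW = 256, screenH = 240;   // assumed screen size in pixels
            int tile = 8;                       // assumed tile size (8x8)
            int bitsPerPixel = 2;               // assumed indexed-color depth
            int tilesInTable = 256;             // assumed unique tiles available
    
            long rawBytes = (long) screenW * screenH * bitsPerPixel / 8;
            long mapEntries = (long) (screenW / tile) * (screenH / tile);
            long mapBytes = mapEntries;                              // 1-byte index per tile
            long tableBytes = (long) tilesInTable * tile * tile * bitsPerPixel / 8;
    
            System.out.println("raw pixels:        " + rawBytes + " bytes");   // 15360
            System.out.println("tile map:          " + mapBytes + " bytes");   // 960
            System.out.println("tile table (once): " + tableBytes + " bytes"); // 4096
            // The map is tiny because it can only point at premade tiles;
            // a general image format cannot make that assumption.
        }
    }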

    There are different types of compression, and not all of them are equivalent. If there is a database of a priori knowledge, compression can be done with that. Compression isn't limited to the poor explanation in the tired and repetitive references to Mahoney's "Data Compression Explained", because in truth that is data compression in only one form, explained only one way. There are many other ways it can happen, and the NES and Sega use of sprite/tile layers for KNOWN data demonstrates that.

    Compression of unknown data becomes more difficult, of course, but not impossible. Just very difficult, and only possible with methods that can actually address it.

  10. #10
    Member
    Join Date
    Sep 2018
    Location
    Philippines
    Posts
    121
    Thanks
    31
    Thanked 2 Times in 2 Posts
    Quote Originally Posted by JamesWasil View Post
    There's a lot of things to address here, but first this:

    1) Compgt: PAQ did not exist in the '70s or '80s. It wasn't conceptualized until the mid-1990s; I remember reading the initial newsgroup and forum threads for it back then under comp.compression and other forums/sites that no longer exist. Some of the file formats mentioned didn't exist then either. Just politely letting you know so that you don't get flamed for it someday in a conversation or post, even if you were working on predecessor algorithms at that time.
    @JamesWasil, Ok... that's the official "history" you know.

    Quote Originally Posted by compgt
    These PAQ compressors were still being developed in the 1980s when i was in grade school. But it was actually started earlier than that, in the 1970s when i was a really young boy. It was our project for the US information theory education materials for the future, funded by us. What happened (the researchers' results) will be part of the history of US information theory, i said. These materials included Data Compression The Complete Reference (Salomon's book), Data Compression Explained and LTCB programs, released only in 2000s. I was running LTCB in the 1970s that some compressors were developed by my aunts or uncles, and really hardcore militaristic people. These programs were just homework assignments for advanced students before in Masbate, Masbate, Philippines. I am thinking DCE and the LTCB compressors were just given again to Matt Mahoney by the government in 2000s and officially launched them, or he already possessed, developed or added on those materials since the 80s, like some people here owning 1980s compressors but published or released only in 1990s or 2000s.

    We created ZIP format too and video formats such as VCD, SVCD, and DVD. We wrote and scheduled comp.compression FAQ. I dictated and moderated as well in data encryption, approving DES and AES, both i know from the 1970s. That's during Martial Law Cold War Philippines, the Philippines still occupied by Americans that time.

    My memories are a little foggy but if i hear my voice in the official songs of Bread, Nirvana, America, Queen and Scorpions etc, they just reinforce my belief in my computer science memories.

    -- Gerald

    But i remember this the classified top secret part of 1970s to 80s Martial Law Cold War when the Americans were here in the Philippines. It was me a kind of "central hub" among scientist networks that time (so i learned from many, and privy to state of the art science and technologies that time), dictating on what-would-be computer science history for the 80s, 90s and 2000s (e.g. in data compression and encryption, IBM PC, Intel, AMD and Microsoft Windows dominance etc.). Just think of me then as a top military analyst. I mean, i wasn't just a player on all this; it was me moderating on everything tech. I knew i already co-own Apple and Microsoft. I guess i decided to officially be co-founder of Yahoo, Google and Facebook, but didn't happen officially in the 1990s and 2000s. There was "too much fierce" competition amongst tech companies. I mean, it was Cold War, a real war. The Cold War officially ended in the early 1990s, with the Americans leaving the Philippines, military bases left behind or demolished.

    In short, the real computing history of US (and the world) was made, written, and decided here in the Philippines, with me. I chose Steve Jobs. I made Bill Gates, bettering his profile more and more. I chose Sergey Brin and Larry Page for Google as a sign of peace between the US and Russia, and i decided for a Mark Zuckerberg Chairman-CEO profile for Facebook.

    Too many ownerships for me in tech that they became greedy, wanted my ownerships for themselves, or decided to move on without me. That is, they asserted to other player groups my decisions or timetable for them to own the tech giants, but without me. What kind is that?! In late 1980s, however, they reminded me of Zuckerberg and Facebook, implying a chance for me to officially "co-found" Facebook.

    I remember this encode.su website and GUI (1970s), as the names Shelwien, David Scott, Bulat Ziganshin, dnd, Matt Mahoney, Michael Maniscalco, Jyrki Alakuijala. Some of them would come to me in the Philippines in the 80s...if it was really them. By early 1990s i was forgetting already. In the mid 2000s i was strongly remembering again. If i hear my voice in the bands "official" recordings of Bread, Nirvana, America, Queen, Scorpions etc, i then strongly believe these computer science memories.
    Last edited by compgt; 3rd July 2020 at 17:36.

  11. #11
    Member
    Join Date
    Jan 2020
    Location
    Fl
    Posts
    49
    Thanks
    1
    Thanked 3 Times in 3 Posts
    James, you know better, but I was giving that as an example. I could have made an 8x8 or 16x16 sprite and duplicated it up to 10000x10000. The point is for the compression program to have a library of methods to handle certain structures.

    Sure, there are plenty of variations, but the compression program should have a reasonably progressive structure that takes into account patterns, gradients, shapes, etc.

    Sometimes, by looking at things conventionally, we get limited to conventional limits.
    How much does compression get better every year?
    How many types of compression are there, and can they be charted out in a logical manner?
    What compression methods did not work?
    How many hours have people wasted looking at something that thousands of others have already wasted time looking at, when they could have looked elsewhere?
    Obviously these cannot all be answered easily, but maybe before people try to make the next best compression program they should understand the basic information.

    If 1 digit can compress 4 digits, and 4 digits can compress 16 digits, then why can't the new 4 digits be compressed to 1 again? It cheated in the first place, maybe by relying on another format like hex or ASCII?

    In the end it is
    recognizable randomness vs. unrecognizable randomness, or no pattern vs. pattern.
    It is all random, but programs only prepare for certain patterns and not all of them, since that can get confusing.

    If we never had ASCII or hex, would compression be possible?
    Or
    if we took the complexity of the unrecognizable and made it more recognizable, say 1@C/ to 1234, would that help?
    Again, you guys know best and I am just asking questions.

  12. #12
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    157
    Thanks
    39
    Thanked 47 Times in 27 Posts
    Quote Originally Posted by Trench View Post
    The file you compressed is mostly empty space with few colors; any file that is mostly empty will behave the same. But here are some examples of something more complex that cannot be compressed anywhere near as well as the original program manages.

    Take an NES game map found online: it is 1,907 KB and compresses to 1,660 KB, while the original game file is listed online at 75 KB (95% smaller) even though it includes more art, sound, and code.
    Another game on the list is EarthBound: its map is 3.4 MB, while the file size online is listed as 194 KB (94% smaller).
    The same applies to 100 MB PNG image files that cannot come close to the original, which would probably be around 5 MB.

    Plenty of patterns, yet compression cannot even recognize them.
    Shouldn't this be a simple thing that just is not implemented yet? 95%!
    https://vgmaps.com/Atlas/NES/CastlevaniaII-Simon'sQuest-Transylvania(Unmarked).png
    This map is rendered with a map loader that runs commands of the type: at (x, y) put source tile/sprite z. Tiles and sprites are either perfectly full chunks or chunks with transparency (a bit mask). Something as simple as LZ compression can recognize 1D segments of these tiles/sprites, which is a lot more useful than PNG (which looks for linear filter patterns, not matches, and then applies entropy coding).

    Simple template matching with fixed-size images can match those patterns and reproduce the original instructions of (x, y, z). Or to put it simply, PNG is the wrong information representation for that format.
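
    A minimal sketch of that idea (illustration only: exact-match tiles on an aligned grid, no transparency handling):

    Code:
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    
    // Rebuild "at (x, y) put tile z" instructions from an image that is known
    // to be assembled from fixed-size tiles on an aligned grid.
    public class TileMatcher {
        record Cmd(int x, int y, int tile) {}
    
        // pixels[y][x] holds palette indices; tile is the tile edge length.
        static List<Cmd> extract(int[][] pixels, int tile, List<int[]> tileTable) {
            Map<String, Integer> seen = new HashMap<>();   // tile content -> tile id
            List<Cmd> cmds = new ArrayList<>();
            for (int y = 0; y + tile <= pixels.length; y += tile) {
                for (int x = 0; x + tile <= pixels[0].length; x += tile) {
                    int[] flat = new int[tile * tile];
                    for (int dy = 0; dy < tile; dy++)
                        for (int dx = 0; dx < tile; dx++)
                            flat[dy * tile + dx] = pixels[y + dy][x + dx];
                    String key = java.util.Arrays.toString(flat);
                    Integer id = seen.get(key);
                    if (id == null) {                      // first time we see this tile
                        id = tileTable.size();
                        seen.put(key, id);
                        tileTable.add(flat);
                    }
                    cmds.add(new Cmd(x, y, id));
                }
            }
            return cmds;                                   // tile table + commands replace raw pixels
        }
    
        public static void main(String[] args) {
            int[][] img = {
                {1, 1, 2, 2},
                {1, 1, 2, 2},
                {2, 2, 1, 1},
                {2, 2, 1, 1}
            };
            List<int[]> table = new ArrayList<>();
            System.out.println(extract(img, 2, table)); // two unique tiles, four commands
        }
    }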

  13. #13
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,982
    Thanks
    298
    Thanked 1,309 Times in 745 Posts
    > Sometimes, by looking at things conventionally, we get limited to conventional limits.

    It's not about conventions, but rather about hardware constraints.
    For example, there is a known method that allows combining most of the different existing compressors as cmix submodels.
    Also, just using more memory and more contexts still helps too.
    But business only cares about efficient algorithms, like ones that transparently save storage space.

    Also, from the p.o.v. of Kolmogorov complexity, the common arithmetic operations
    are less efficient than NNs or LPC: https://en.wikipedia.org/wiki/Function_approximation

    > How much does compression get better every year?

    For cmix it's about 250 KB per year.

    > How many types of compression are there, and can they be charted out in a logical manner?

    Compression is approximate enumeration of datatype instances,
    so we can say that there are N*M types of compression, where
    N is the number of types of data and M is the number of approximation methods...
    essentially it's infinite.

    > What compression methods did not work?

    Mostly ones that you suggest, I suppose.

    "Common sense methods" mostly don't work: https://en.wikipedia.org/wiki/List_of_cognitive_biases

    > How many hours have people wasted looking at something that thousands
    > of others have already wasted time looking at, when they could have looked elsewhere?

    I'd say fewer than necessary.
    Compression developers too frequently don't actually look at the data at all
    (because hex viewers are not integrated into the OS and/or IDE, etc.),
    but instead work from their imagination and test results.

    > If 1 digit can compress 4 digits, and 4 digits can compress 16 digits,
    > then why can't the new 4 digits be compressed to 1 again?

    Because people mostly care about decompression... and a single digit
    has too few distinct values to be decompressed into many different files.

    > It cheated in the first place, maybe by relying on another format like hex or ASCII?

    Compression algorithms normally only work with binary data -
    there's no hardware for efficient storage of e.g. decimal digits (or ASCII),
    thus there's no software to support it.

    > In the end it is recognizable randomness vs. unrecognizable randomness.

    In the end, most data sequences are incompressible; there are just too many of them,
    far more than the number of possible generator programs of smaller size
    (since those have duplicates and failures).
    Attached image: cmix_v1-18_gain_per_year.png (13.3 KB)

  14. #14
    Member
    Join Date
    Jan 2020
    Location
    Fl
    Posts
    49
    Thanks
    1
    Thanked 3 Times in 3 Posts
    The original file formats are outdated in how they handle things. But oh well, at least it is not BMP format.

    True, it is infinite, just like combining elements into many different alloys, but at least there is a basic chart of the limited elements.

    Compression and decompression go hand in hand, which I assumed was implied.

    "For cmix it's about 250 KB per year." 250 KB out of what, a 100 MB file? Does a gain of under 0.25% seem worth the effort? HD sizes grow bigger and faster than that every year.

    So you are saying we have reached the limit of compression with current hardware, and to get better results we need more powerful hardware?
    If so, why bother trying when there is not much to do?

    If so, I guess we have to wait until cmix gets better at incorporating the various methods in one, since modern files have everything from text to music to art.

  15. #15
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,982
    Thanks
    298
    Thanked 1,309 Times in 745 Posts
    > " cmix its about 250kb per year." 250k out of what a 100mb file?

    250kb smaller compressed size of 100M file (which is around 15mb).
    I meant this: http://mattmahoney.net/dc/text.html#1159

    The surprising thing is that progress was equal during 4-5 recent years,
    even though one would normally expect exponential decay here.

    > HD sizes grow bigger and faster than that every year.

    Not really. Storage becomes less and less reliable instead.

    > So you are saying we have reached the limit of compression with current hardware, and
    > to get better results we need more powerful hardware?

    In a way - it's possible to achieve new results with hardware implementations of advanced algorithms.

    But in practice I don't think we're even up to date with modern hardware yet (SIMD, MT, x64).

    > If so, why bother trying when there is not much to do?

    There's very much to do; for example, video recompression basically doesn't exist at the moment.

  16. #16
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    615
    Thanks
    261
    Thanked 242 Times in 121 Posts
    Quote Originally Posted by cade View Post
    Simple template matching with fixed-size images can match those patterns and reproduce the original instructions of (x, y, z). Or to put it simply, PNG is the wrong information representation for that format.
    Exactly. There's an additional detail of the PNG algorithm that shows why it performs so badly here: it uses the default deflate window size of 32 KB. Now, looking at that Castlevania PNG, the width is 13895 pixels at 24 bits per pixel, so each line takes about 41 KB. So the algorithm doesn't even see the previous line.
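
    The arithmetic behind that, as a quick check (the 32 KB window is the standard deflate maximum; the rest follows from the image header):

    Code:
    // Why deflate's window can't reach the previous scanline of that PNG.
    public class PngWindowCheck {
        public static void main(String[] args) {
            int width = 13895;            // image width in pixels (from the PNG header)
            int bytesPerPixel = 3;        // 24-bit RGB
            int rowBytes = width * bytesPerPixel + 1;   // +1 filter byte per scanline
            int deflateWindow = 32 * 1024;              // 32 KB LZ77 window
            System.out.println("bytes per scanline: " + rowBytes);
            System.out.println("fits in window?     " + (rowBytes <= deflateWindow));
            // ~41686 bytes per line > 32768, so vertical repetition is invisible to deflate.
        }
    }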

    Webp is much better for this kind of data as it recognizes block repetitions:

    Code:
    1.952.389	Original PNG (CastlevaniaII-Simon...)
      763.246       cwebp -lossless -q 100 -m 5
      729.836       same cwebp, after that paq8p -3 (so the webp file still has some redundancies)
    Also note that the original PNG contains some additional data (text and arrows); removing it reduces the color count from 440 to 73 and improves compression further:

    Code:
    1.975.238       Modified PNG
      675.626       cwebp -lossless -q 100 -m 5
      640.962       same cwebp, after that paq8p -3
    Still 8 times bigger than the NES ROM, but you get the idea. Also, the non-modified version underlines the point that PNG/cwebp are universal compressors: they have to be able to process anything, even images with such extra annotations.
    Last edited by schnaader; 19th February 2020 at 11:57.
    http://schnaader.info
    Damn kids. They're all alike.

  17. #17
    Member
    Join Date
    Jan 2020
    Location
    Fl
    Posts
    49
    Thanks
    1
    Thanked 3 Times in 3 Posts
    Storage less reliable? I guess in some ways. I remember 1980s hard drives broke down slowly, so you had time to transfer your data, but now they just die without warning. I figure SSDs might get better, but that is another topic.

    Hardware does not seem to improve as fast as it used to.
    "250 KB smaller compressed size of the 100 MB file (whose compressed size is around 15 MB)." So a roughly 2% gain a year sounds good compared to my suggested 3% one-time trick of switching a highly repeated upper character to a lower character value. But again, I don't know if that can be implemented.

    As for the PNG, yes, that is better, but not many ready-to-view picture formats do a better job.

    But thanks for the info.

    How does one think outside the box when everyone is used to thinking inside the box?
    Sure, I can say things that are out of the box that do not work, but at least I gave some suggestions, and others have too, I assume. Maybe 1 out of 100 out-of-the-box ideas sticks, or at least inspires an idea for progress. A lot of things in life are done from a new perspective, which inspires others to add on.

    Even something as silly as playing with definitions: everyone uses the phrase "impossible random compression", which puts a road block in people's minds, yet no one can dispute that randomness itself is not the issue. With that in mind, randomness can even be used to help things; if anything, make the file more "random" to compress.

    And when I say random, I find it hard to believe people would think only about compression and not decompression, as I stated before.

    So make an organized chaos: since which file you get and how far you can compress it is luck, why not mix it up more and see if the "odds" get better? Maybe I am wrong again, but as stated above, it is not randomness that is the issue but incomprehensibility.

    Even a silly suggestion: take a compressed file with digits like 11335578... and add a basic 1-bit pattern like 01010101..., so the compressed file becomes 12345679..., and do this as many times as needed with other selected patterns, logged so they can be compressed and decompressed again.

    Here is a random number; what pattern does it have: 128705610? At first whack, nothing. Throw some other selected patterns at it and see when you get something. The thing is to have a list of the various patterns it can handle: odds/evens, high/low, etc. Maybe programs do that already, in which case don't mind me. But can a machine find a pattern in that randomly typed number? 8 of the 9 digits fit a pattern.

    Remember those IQ tests where you had to find the patterns? Experts say most people fail to get 100%, and less than 1% get them all right. They don't tell people that.
    Even something as silly as a Rubik's cube can be solved by a computer in record time, since, as they say, it is a math problem in a way.

    Take 100 pennies and flip them and you get close to 50% heads and 50% tails; take the heads out and flip again, then the remaining 25%...

    Was that 4 out-of-the-box points in this post? Let's see what ideas others have, collect 100 of them, and see if 1 sticks, or maybe it takes fewer than 100, say 1 out of 25.

    No need to comment; it is just a thought.

  18. #18
    Member
    Join Date
    Apr 2009
    Location
    The Netherlands
    Posts
    83
    Thanks
    6
    Thanked 18 Times in 11 Posts
    Code:
    public static boolean backwardsMatch(java.util.BitSet bitSet, int setIndex, char[] pattern, int maxDepth)
        {
            // These two values are set when we observe a wildcard character.  They
            // represent the locations, in the two strings, from which we start once
            // we've observed it.
            //
            int startIndex = setIndex;
            int setBookmark = 0;
            int patternBookmark = 0;
            int patternIndex = 0;
    
            // Walk the text strings one character at a time.
            while (startIndex - setIndex < maxDepth) {
                // How do you match a unique text string?
                if (pattern[patternIndex] == '*') {
                    // Easy: unique up on it!
                    while (++patternIndex < pattern.length && pattern[patternIndex] == '*') {
                    }                          // "xy" matches "x**y"
    
                    if (patternIndex == pattern.length) {
                        return true;           // "x" matches "*"
                    }
    
                    if (pattern[patternIndex] != '?') {
                        // Fast-forward to next possible match.
                        while (bitSet.get(setIndex) != (pattern[patternIndex] == '1')) {
                            if (--setIndex == -1 || startIndex - setIndex >= maxDepth)
                                return false;  // "x" doesn't match "*y*"
                        }
                    }
    
                    patternBookmark = patternIndex;
                    setBookmark = setIndex;
                } else if (setIndex > -1 &&  bitSet.get(setIndex) != (pattern[patternIndex] == '1') && pattern[patternIndex] != '?') {
                    // Got a non-match.  If we've set our bookmarks, back up to one
                    // or both of them and retry.
                    //
                    if (patternBookmark > 0) {
                        if (setIndex != patternBookmark) {
                            patternIndex = patternBookmark;
    
                            if (bitSet.get(setIndex) != (pattern[patternIndex] == '1')) {
                                // Don't go this far back again.
                                setIndex = --setBookmark;
                                continue;      // "xy" matches "*y"
                            } else {
                                patternIndex++;
                            }
                        }
    
                        if (setIndex > -1) {
                            setIndex--;
                            continue;          // "mississippi" matches "*sip*"
                        }
                    }
    
                    return false;              // "xy" doesn't match "x"
                }
    
                setIndex--;
                patternIndex++;
    
                // How do you match a tame text string?
                if (setIndex < 0) {
                    // The tame way: unique up on it!
                    while (patternIndex < pattern.length && pattern[patternIndex] == '*') {
                        patternIndex++;           // "x" matches "x*"
                    }
    
                    if (patternIndex >= pattern.length) {
                        return true;           // "x" matches "x"
                    }
    
                    return false;              // "x" doesn't match "xy"
                }
            }
    
            return false;
        }
    This is a Java function for matching wildcard patterns against a BitSet. I found a piece of code on the Dr. Dobb's website and changed it from forwards to backwards and from characters to bits. Now you can give it "000?1*" and it will return true if that pattern occurs at some index.

    The second step is a function that generates N pseudo-random patterns and walks over the bitset looking for matches. I built a simple filter that rejects not-so-useful patterns, so we end up with at least probably-useful ones. Let's say 10 bits' worth of patterns (1024). I take the pattern with the biggest entropy gain (P1 * matchCount) and save the index of the pattern together with its probability quantized to 8 bits. This way a pattern costs just 18 bits to store on disk, so the gain has to be at least 18 bits to improve compression.
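
    One way to do that break-even bookkeeping (a sketch, not necessarily the exact formula used above; the 18-bit header is the 10-bit pattern index plus the 8-bit quantized probability):

    Code:
    // Break-even check for storing one pattern: the bits saved by coding its
    // matched positions at a skewed probability must exceed the 18-bit header.
    public class PatternGain {
        static double savedBits(int matches, int ones, double q) {
            // cost of coding the matched bits with model probability q (of a 1)
            double modelCost = -(ones * log2(q) + (matches - ones) * log2(1 - q));
            double flatCost = matches;          // 1 bit per bit without the pattern
            return flatCost - modelCost;
        }
    
        static double log2(double x) { return Math.log(x) / Math.log(2); }
    
        public static void main(String[] args) {
            int headerBits = 10 + 8;
            // e.g. a pattern that matched 200 positions where 140 of the predicted bits were 1
            double gain = savedBits(200, 140, 140 / 200.0) - headerBits;
            System.out.printf("net gain: %.1f bits%n", gain);  // positive means worth storing
        }
    }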

    I ran this algorithm on AMillionRandomDigits.bin and a 7-Zipped book1. The best I could find was a gain of -8 bits, or in other words, a loss of 1 byte for every attempt to compress the incompressible. So no luck here.

    Okay, let's cheat... Let's generate hundreds of thousands of patterns on AMillionRandomDigits.bin and try to gain more than 18 bits. I could not find anything...

    Let's use a different approach: run 10000 patterns in parallel with a conservatively configured PAQ-like mixer and try to compress AMillionRandomDigits.bin and the 7-Zipped book1. My best result was a 2-bit loss on AMillionRandomDigits.bin and an 8-byte gain on the 7-Zipped book1. The reason? 7-Zip has some compressible bits at the start and the end of the file, such as the filename and a bit of checksum data.

    I think this is a fair approach to trying to compress the random, and it still fails. It fails because random means not predictable, and that means compression is not possible.

    I could try to use data science to assemble a random forest of decision trees, boosted by a modified version of gradient boosting (on entropy), that does the job for every random file you input, and then hide the decision trees in the executable. Compression by obscurity. Still no theoretical gains, though!

  19. #19
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,982
    Thanks
    298
    Thanked 1,309 Times in 745 Posts
    @Kaw:
    digits.bin is a bad target. That data is known to have at least 50 bits of redundancy (column parity), but you can't find it in packed format.

    https://encode.su/threads/3122-dec2b...on-digits-file

  20. #20
    Member
    Join Date
    Jan 2020
    Location
    Fl
    Posts
    49
    Thanks
    1
    Thanked 3 Times in 3 Posts
    Imagine if the program took an hour to scan for which pattern to use and then compressed the file. Would that be worth it? The file would be just as "random", but at least it would try to be more "comprehensible", more compression friendly.

    Nice effort, but maybe it is better to learn some other ideas by experimenting on a small scale than to go big to see results.

    The skill people have here is good, to be able to make such programs, but no one person has infinite perspectives, and gaining them takes exploration. It is like how people used to go on trips to experience new things, learn, and bring the lessons back home to make home better. The ancient Greeks went around gathering information and put it in books, so that others could add to the ideas until someone with the right perspective could put the pieces together and make something new, and that new thing inspired others in turn. But in modern times people just visit, see what they can, and bring nothing back but selfies to say they were there, which matters only to them and their circle and no one else. It would be like going to the supermarket just to look, for the sake of looking, which has no purpose. Does anyone take a selfie in a supermarket? LOL. Obviously I am giving a silly example, but you get the idea.

    Some people like wasting hours leveling up in a game only to erase all progress and forget the game in a few months, while others have fun figuring out math problems that help themselves and/or everyone for a lifetime. And some ideas go nowhere, which at least shows others where not to try... if it is done right, since some dead ends are false dead ends.


    Replacing random with other random does not help, but it becomes more meaningful if you do it with purpose. Have the code find something that can become a pattern with the help of a pattern. I don't mean scan the entire file, but randomly pick some spots in the file, see if they have similarities, and then apply a pattern or two to make the file smaller, not bigger. Again, that may be a dead end, but saying you simply had a loss makes it sound like the approach is way off; it is better if the program scans for which pattern gives the better result rather than applying one without scanning.



    Just like the random number presented before: instead of finding a solution, find how many ways it can be broken down.

    128705610

    LLHHLHHLL - high/low; the center 0 is off and would need another L
    OEEOEOEOE - odds/evens; the 2 would have needed to be an O
    GGLLGGLL - greater/lower than the previous digit; the first one is off, so based on that,
    228815621 - the result of adding the pattern 100110011, built from an average of the 3 previous patterns;
    but patterns can be infinite, which will take time.

    If it were done on small chunks of the file, having 2 random numbers and putting them in blocks would make a bigger file than a single pattern throughout. In a way it is a math problem that would need a lot of processing. Or maybe it is a dead end.

    Again, I am not saying it will work in practice, but in theory. Just like the iRobot vacuum: given enough time, it will eventually clean the whole home.



    Another odd thing is that it can be broken down by the doublings of 2 in the chart below (the left column is what is used first).
    Strangely enough, if you put a 1 next to each active number and a 0 next to each inactive row, you get 111101010111110010001001010, which is exactly the same number, 128705610, in binary. If it were divisible by 3 it would be
    101000111010100001011011100, which works 2/3 of the time, and converting that binary number gives 85803740. For other numbers like 6 it works and is one digit shorter, 10100011101010000101101110, which converted to decimal is 42901870, again one digit less. But by 9 it is
    1101101000110101110011111, which needs a -2 to make it work, since the last binary digit stays the same if the number is off by 7.

    I figure that means nothing, but I am throwing it out there.




    Powers of 2 used to build 128705610:

    Code:
    power of 2   order used   bit
    67108864     1            1
    33554432     2            1
    16777216     3            1
    8388608      4            1
    4194304                   0
    2097152      5            1
    1048576                   0
    524288       6            1
    262144                    0
    131072       7            1
    65536        8            1
    32768        9            1
    16384        10           1
    8192         11           1
    4096                      0
    2048                      0
    1024         12           1
    512                       0
    256                       0
    128                       0
    64           13           1
    32                        0
    16                        0
    8            14           1
    4                         0
    2            15           1
    1                         0
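
    A quick check of that chart (the 1/0 column is just the number written in base 2):

    Code:
    public class BinaryCheck {
        public static void main(String[] args) {
            long n = 128705610L;
            System.out.println(Long.toBinaryString(n));                           // 111101010111110010001001010
            System.out.println(Long.parseLong("111101010111110010001001010", 2)); // 128705610
        }
    }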

  21. #21
    Member
    Join Date
    Apr 2009
    Location
    The Netherlands
    Posts
    83
    Thanks
    6
    Thanked 18 Times in 11 Posts
    Quote Originally Posted by Shelwien View Post
    @Kaw:
    digits.bin is a bad target. That data is known to have at least 50 bits of redundancy (column parity), but you can't find it in packed format.

    https://encode.su/threads/3122-dec2b...on-digits-file
    50 bits of redundancy is not a very big deal when you are talking about randomness. For compression you basically have 2 options:
    1. On-the-fly statistics, trying to compress the data with partial knowledge (time series prediction).
    2. Use knowledge of the entire file, but then you have to include a description of that knowledge in the output file.

    If you look at time series prediction and randomness, it is really hard to find working patterns that will help you compress the file. In a 100% random file, 50% of the patterns will have a bias toward 0 and 50% a bias toward 1. Half of those patterns will end up with the opposite bias by the end of the file. There might be a pattern capturing those 50 bits of redundancy, but which one is it? And will it hold up for the second half of the file? If you are able to make an algorithm that finds that 50-bit redundancy in this file, it will be very strong AI.

    If you look at prior knowledge: how do you describe this redundancy or bias within 50 bits? That too would be a major improvement: being able to describe fairly complex patterns in fewer bits than the advantage you get from describing them.

  22. #22
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,982
    Thanks
    298
    Thanked 1,309 Times in 745 Posts
    I want to try writing an "Anti-LZ" coder - one that would encode some bitstrings that don't appear in the data,
    then an enumeration index of the data with those known exclusions.
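
    A tiny sketch of the first half of that idea, finding bitstrings that never occur (the enumeration-with-exclusions coder itself is the hard part and is not shown; the random buffer below just stands in for real input):

    Code:
    import java.util.BitSet;
    
    // Find the shortest length k at which some k-bit pattern never occurs in
    // the data, and report one absent pattern. An "anti-LZ" coder could
    // transmit such exclusions before enumerating the remaining possibilities.
    public class AbsentPatterns {
        public static void main(String[] args) {
            byte[] data = new byte[1 << 16];
            new java.util.Random(42).nextBytes(data);      // stand-in for the real input
    
            int totalBits = data.length * 8;
            for (int k = 1; k <= 24; k++) {
                BitSet seen = new BitSet(1 << k);
                int window = 0, mask = (1 << k) - 1;
                for (int i = 0; i < totalBits; i++) {
                    int bit = (data[i >> 3] >> (7 - (i & 7))) & 1;
                    window = ((window << 1) | bit) & mask;  // sliding k-bit window
                    if (i >= k - 1) seen.set(window);
                }
                int absent = seen.nextClearBit(0);
                if (absent < (1 << k)) {
                    String bits = String.format("%" + k + "s",
                            Integer.toBinaryString(absent)).replace(' ', '0');
                    System.out.println("k=" + k + ": pattern " + bits + " never occurs");
                    return;
                }
            }
            System.out.println("no absent pattern up to k=24");
        }
    }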

    As for Nelson's file, though, it's produced with a known process: https://www.rand.org/pubs/monograph_...18/index2.html

