Results 1 to 21 of 21

Thread: Any tutorial how to make a simple compression program?

  1. #1
    Member
    Join Date
    Jan 2020
    Location
    Fl
    Posts
    32
    Thanks
    0
    Thanked 2 Times in 2 Posts

    Any tutorial how to make a simple compression program?

An intro for non-programmers to try basic compression?

  2. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,909
    Thanks
    291
    Thanked 1,271 Times in 718 Posts
    1) Find a basic compressor in http://mattmahoney.net/dc/text.html
    2) Try it
    ...
    3) Profit

    Or learn C or C++ programming, download some open-source compressor and read the source.
Unfortunately C/C++ have the best compilers; a compressor written in another
programming language would typically be 3-4x slower.

    Or maybe read some books first:
    https://www.amazon.com/Data-Compress.../dp/8184898002
    https://www.amazon.com/Understanding.../dp/1491961538

  3. #3
    Member
    Join Date
    Sep 2018
    Location
    Philippines
    Posts
    96
    Thanks
    29
    Thanked 1 Time in 1 Post
    Quote Originally Posted by Trench View Post
An intro for non-programmers to try basic compression?
"The Data Compression Guide" is for introductory purposes, e.g. on Huffman, LZ etc. I could have modified existing freeware or GPL compression programs at the time, but I opted to make my programs distinct yet easily understandable.

    https://sites.google.com/site/datacompressionguide/

  4. #4
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Switzerland
    Posts
    468
    Thanks
    321
    Thanked 309 Times in 166 Posts
    Quote Originally Posted by Trench View Post
An intro for non-programmers to try basic compression?
    What's a "non programmer"? Someone who does not know any programming languages?

  5. #5
    Member
    Join Date
    Jan 2020
    Location
    Fl
    Posts
    32
    Thanks
    0
    Thanked 2 Times in 2 Posts
It makes it seem like even programmers won't be able to do it. LOL
I meant more of a step-by-step method for non-programmers to deal with it.
File compression does not seem to have gained as much progress as HD sizes have over the years. And if more CPU and memory is what is needed, then it is not playing on a level field and will also have its limits.

I was thinking of file modification: the file is opened, edited, and then run to see the results. Plenty of people have edited files to get "desired" results without knowing how to code, or files simplified enough to be edited. Just like one does not need to fully know Japanese to go to Japan, only enough words to get by.


    Shelwien
Well, it's not a step-by-step method, but it's interesting. The charts don't show clear % indicators that would line them all up evenly for comparison. They aren't spaced out evenly in a table that could be put into a spreadsheet to evaluate better.
The methods used are fine, but another approach has to be taken. Anything else just takes more power/memory and gets slower. I am confident that I do not see any "great" improvement happening with conventional actions.


You know what I have not seen anyone mention yet? What is the theoretical HUMP that has to be passed for recompression, since nothing is structured to allow it? Other fields have their theoretical hump to pass, just like overunity forums claim to pass theirs with a V-Gate.




    Compgt
Huffman is an obvious way to compress. How can people find something new when they are on the same track as everyone else, unable to take another road? The irony is that Huffman and Shannon, the founders of compression, were not even programmers. It seems the only way to progress is more of the same, which is why I say non-programmers should participate.


The issue is that the mindset of a programmer hinders design, but without programmers there is no progress. It's like a finger trap: the harder one tries, the more it works against them, feeling so close but not really. If anyone understood this, they would take the next step.




    Gotty
    Yes.
Just like the people who helped shape file compression were not influenced by programming. I don't get why what worked stopped, and everything went to only one skill type. That way significant progression is impossible.
    Last edited by Trench; 28th June 2020 at 23:51.

  6. #6
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Switzerland
    Posts
    468
    Thanks
    321
    Thanked 309 Times in 166 Posts
    Quote Originally Posted by Trench View Post
    Gotty
    Yes.
    There may be some contradiction here or I don't understand your question. Your question includes: "to make a simple compression program?"
    To make any program, you need to learn how to program. That would be the first step. Otherwise how would you "make a program"?

If your question actually means "Any tutorial about how data compression works", then that person still needs to understand at least what bits and bytes are. To understand more sophisticated compression algorithms you actually need some background in information theory.
Without all the above (i.e., for non-programmers without an information theory background), see these videos:

    https://www.youtube.com/watch?v=JsTptu56GM8
    How Computers Compress Text: Huffman Coding and Huffman Trees

    https://www.youtube.com/watch?v=TxkA5UX4kis
    Compression codes | Journey into information theory | Computer Science | Khan Academy

    https://www.youtube.com/watch?v=Lto-ajuqW3w
    Compression - Computerphile

These are simple enough, with visualizations. Hopefully non-programmers will understand them. But it still helps if the person knows what a bit, a byte, a frequency, and a probability are. There is no escape.

  7. #7
    Member
    Join Date
    Jan 2020
    Location
    Fl
    Posts
    32
    Thanks
    0
    Thanked 2 Times in 2 Posts
    True
Well, not for a grandmother who doesn't know how to turn on a PC.
Which is why I also said that not even a programmer can do offhand what most do here, since they first have to learn what you know. So what was said does apply to them as a good starting point, YES.
Sure, the links shown have some good things, but they're not step-by-step, and the programs cannot be edited unless one knows how to decompile, edit, and recompile them.

But to have something set up so that someone who does not know programming can try putting in their algorithm, or some basic code from W3Schools.
Or maybe you feel it's far too complicated; then you know best.

I suggest programmers here work with people from other fields to get a different perspective, since the best way to find something is to start from the beginning. Everyone here is conditioned to think in the same theories, which is not starting from scratch to find what is missing. Don't you think so too? Start from the beginning and rediscover the theories, to understand the progression and find new things.

But if you feel I asked for too much, then sorry.

  8. #8
    Member
    Join Date
    Sep 2018
    Location
    Philippines
    Posts
    96
    Thanks
    29
    Thanked 1 Time in 1 Post
    Quote Originally Posted by Gotty View Post
    There may be some contradiction here or I don't understand your question. Your question includes: "to make a simple compression program?"
    To make any program, you need to learn how to program. That would be the first step. Otherwise how would you "make a program"?

If your question actually means "Any tutorial about how data compression works", then that person still needs to understand at least what bits and bytes are. To understand more sophisticated compression algorithms you actually need some background in information theory.
Without all the above (i.e., for non-programmers without an information theory background), see these videos:

    https://www.youtube.com/watch?v=JsTptu56GM8
    How Computers Compress Text: Huffman Coding and Huffman Trees

    https://www.youtube.com/watch?v=TxkA5UX4kis
    Compression codes | Journey into information theory | Computer Science | Khan Academy

    https://www.youtube.com/watch?v=Lto-ajuqW3w
    Compression - Computerphile

These are simple enough, with visualizations. Hopefully non-programmers will understand them. But it still helps if the person knows what a bit, a byte, a frequency, and a probability are. There is no escape.
Right, even high-schoolers can start data compression with this essential knowledge. They do have a basic computer science course using Basic, Pascal, or C/C++. Talented high-schoolers can go straight to coding if they have compression ideas.

When I remembered that I co-developed image formats in the 1970s and 80s (i.e., bmp, gif, png, jpeg, jpeg2000), I wondered if I could do some coding again. Alas, not working as a professional programmer, it was hard to find compressors on the net, without constant access to the internet and software companies' resources. The results are The Data Compression Guide's core compressors from 2004-2005, based on my rediscovered knowledge of C from "The C Programming Language" by Kernighan & Ritchie and "Using C" by the Atkinson brothers.

I recall Mark Nelson remarking that one programmer's compressor took 5 years to complete and become stable. Data compression is not for the faint of heart. I say some now-popular compressors were actually done in the 1970s and 80s and released only in the 2000s. That's how advanced a subject data compression is.

  9. #9
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Switzerland
    Posts
    468
    Thanks
    321
    Thanked 309 Times in 166 Posts
    Quote Originally Posted by Trench View Post
But to have something set up so that someone who does not know programming can try putting in their algorithm, or some basic code from W3Schools.
W3Schools teaches you the basics of different programming languages, the structure of websites, and similar stuff. It does not teach you actual algorithms. I'm afraid there are no teaching materials that give you building blocks you can copy and paste so that, voilà, you have created a compression program.

Creating the structure of a website is different from creating the behavior of a web app.
The structural building blocks are small and much easier to understand (and there are building blocks!). I can teach people to create a very basic HTML website in a couple of hours. For that you will need to learn simple HTML. Doable in a short amount of time; not too difficult. But teaching you to code in C++ and implement a simple compression program requires many days. And you will need to actually learn C++. There is no other way. You will need to write that code.
It's not only about data compression; it's about any algorithmic task. You need to create an algorithm. And that is ... programming.

    Quote Originally Posted by Trench View Post
I suggest programmers here work with people from other fields to get a different perspective, since the best way to find something is to start from the beginning. Everyone here is conditioned to think in the same theories, which is not starting from scratch to find what is missing. Don't you think so too? Start from the beginning and rediscover the theories, to understand the progression and find new things.
    People here do work with people from other fields (other = no data compression).
You want to master text compression? You will benefit from linguistics. Just look at the text model in paq8px. It's full of linguistic stuff. But it's far from complete; it can still be improved.
You want to master audio compression? Image compression? Video compression? Same stuff. Applied mathematics, psychoacoustics, signal processing, machine learning, neural networks, just to name a few.
Actually, an advanced data compression program contains more from other fields than the actual "compression algorithm".
You'd better rethink the idea that we all know the same theories here. No, we don't. We actually never stop learning. Don't think that we have some common theory and that's all. We continuously invent new stuff. From the user's point of view it's not evident: a user just types a command or pushes a button, and the magic happens. And that magic might be real magic; you just don't see it.
If you would like to explore it deeper and actually make a new compression program, go for it. There are lots of ways and lots of new possibilities. Truly.

  10. #10
    Member
    Join Date
    Jan 2020
    Location
    Fl
    Posts
    32
    Thanks
    0
    Thanked 2 Times in 2 Posts
Compgt
Zip is the standard now, while in the 90s it was not a standard in Windows, I think. Now you click on a zip file and it opens up like a regular folder to view.


If you feel it takes a certain mindset to do compression, then maybe the problem is exactly that one mindset. Maybe a more abstract approach is needed to create something new.


Many programmers want to be game programmers, thinking they can make something good enough to make money, yet almost all of them have bad design. Everyone wants to be a one-man show and do it all, while it's hard to do even one skill right, let alone many.




    Gotty
    Again you are right.
Even Excel is somewhat a programming language, despite the code being hidden inside the program.
Non-data-compression programmers are a different field, but not a completely different one.

There is a program called Cheat Engine which modifies almost any program to act differently, and it doesn't take programming knowledge. Obviously it's silly, since it's mainly for games. Excel modifies things within the program to find patterns quicker than coding. Programmers as a whole are kind of rigid, since one has to be to follow the rules of programming, and that might be the case here, which is why I stated that the ones who helped make compression, like Huffman, were not programmers but engineers. And Huffman's theory is useless if not for programmers. Sometimes people look for the hardest solution when the simplest one is more effective.


As I said before, everyone here defines compression wrongly when they say randomness is the issue; it is not randomness, it is lack of patterns. All the fields you stated end up in the same place when we evaluate it like that.
Everyone is trying to make something more complex, which takes more CPU and memory and makes the program less usable. I assume you disagree, but disagreeing cuts your view off from another perspective.
I might have something or I might not, but if I have something that can't be proven then it's useless, and if I don't have something that can test it, then again it's useless. All I am trying to do is nudge the mindset to think outside the box, since a lot of things stated in this forum come from like-minded people, and someone with "no data compression", as you call it, is trying to shine a light on another view.


Just a thought: the best-selling burgers in the world are made by a real estate company that rents out stores, called McDonald's. The best-tasting burgers do not come close in sales. The reason McDonald's did so well was switching its mentality from a burger company to a real estate company.

  11. #11
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Switzerland
    Posts
    468
    Thanks
    321
    Thanked 309 Times in 166 Posts
    Let me tell you a couple of interesting facts.

  12. #12
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Switzerland
    Posts
    468
    Thanks
    321
    Thanked 309 Times in 166 Posts
Your decisions are greatly determined by the "success rate" and the "magnitude" of the positive or negative feedback you experience. We pursue happiness throughout our entire lives, and in order to reach it we make "statistics" starting at a very early age. We evaluate all situations based on these statistics and decide what to do and what not to do. For example, I tried football, basketball and handball, and I know I'm rather lame at any ball game. Even at snooker (I have my statistics). In these activities I can't shine, so they don't give me satisfaction, so I try to avoid them. But I'm good at running and jumping (I can jump my height), I got a bunch of medals in my teens, and so I love them (got my statistics). If I need to choose what to do, these statistics will tell me that I'd better not go play basketball in my free time but go running in the evenings.
When we have many friends with many different interests, how do we decide what to do when we meet and spend time together?
We will instinctively maximize our shared happiness based on how much we like or dislike these activities. We will more frequently do the activities that are liked by most of us, and less frequently those that are liked by fewer but that a couple of us still enjoy, for the sake of those few. We can formulate a "high happiness score" as "less regret" or "less cost", and a "low happiness score" as "high regret" or "high cost".

    Summary: maximum happiness = do the "high regret"/"high cost" activities less frequently and the "low regret"/"low cost" activities more frequently.

>>And Huffman theory is useless if not for programmers.

    Huffman theory says: maximum compression = "As in other entropy encoding methods, more common symbols are generally represented using fewer bits than less common symbols." [wikipedia]
Hmmm... sounds familiar? Our decision-making is intuitively based on this compression principle. Not just for programmers; for everybody.
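That "more common symbols get fewer bits" rule can be seen directly by building a tiny Huffman coder. A minimal sketch in Python (the function name and the example string are mine, not from the thread):

```python
import heapq
from collections import Counter

def huffman_code_lengths(text):
    """Build a Huffman tree over symbol frequencies and return the
    code length (in bits) assigned to each symbol."""
    freq = Counter(text)
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): 1}
    # Each heap entry: (weight, tiebreaker, {symbol: depth_so_far})
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)   # two lightest subtrees
        w2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**a, **b}.items()}  # one level deeper
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

lengths = huffman_code_lengths("this is an example of a huffman tree")
# The most frequent symbol (the space) gets the shortest code;
# rare symbols like 'x' get the longest.
print(sorted(lengths.items(), key=lambda kv: kv[1]))
```

Running it on any text shows exactly the wikipedia sentence in action: frequent symbols float to the top of the tree and end up with short codes.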
    Last edited by Gotty; 2nd July 2020 at 23:54. Reason: typo

  13. #13
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Switzerland
    Posts
    468
    Thanks
    321
    Thanked 309 Times in 166 Posts
    Decision making - again.

    When you try to decide something you actually try to predict what the outcome of that decision would be. Would it be good? Would it be bad for me?
When you need to buy a new mobile phone, for example, you have different options. Buy a high-end one and hope that it will last many, many years and that you'll be satisfied with the packed-in features. Or, for a quarter of the price, buy one from the low range? It would probably fail sooner, or you would need to replace it sooner than the top one. It may also lag or miss some features, so eventually your satisfaction would be a bit lower.
Or a second-hand phone? Hmm, the failure rate could be even higher and you don't have a warranty. But the price is really tempting...

    You make a decision by trying to predict the outcome based on different metrics: price, satisfaction rate, warranty, probability of a failure.
You don't foresee the future. But your past experiences, listening to experts, and asking the opinion of friends will help you make a good decision. (This is also called an informed decision.)

An entropy-based compression program does exactly that: it tries to predict the next character in a file, and the better the prediction is, the better the compression will be.
(Entropy-based) compression = prediction.
When you try to predict the future (what's the probability that it will rain, or the probability of a successful marriage with a person), you actually apply the same theory that is used in compression.
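The "compression = prediction" idea can be made concrete in a few lines: an adaptive order-0 model predicts each next byte from the counts seen so far, and the ideal output size is the sum of -log2(p) over the file. A sketch in Python (the function name is mine, not from the thread):

```python
import math

def adaptive_order0_bits(data):
    """Ideal compressed size, in bits, if every byte is coded with
    -log2(p) bits, where p comes from an adaptive order-0 model:
    counts of the bytes seen so far, with add-one (Laplace) smoothing
    over the 256 possible byte values."""
    counts = {}
    total = 0
    bits = 0.0
    for b in data:
        p = (counts.get(b, 0) + 1) / (total + 256)
        bits += -math.log2(p)            # cost of coding this byte
        counts[b] = counts.get(b, 0) + 1  # then update the model
        total += 1
    return bits

# A highly predictable file needs far fewer than 8 bits per byte;
# a file where every byte value is equally likely needs about 8.
print(adaptive_order0_bits(b"a" * 1000))
print(adaptive_order0_bits(bytes(range(256)) * 4))
```

The better the model predicts the next byte, the smaller the total; with a uniform file no prediction helps and the cost stays near 8 bits per byte.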

  14. #14
    Member JamesWasil's Avatar
    Join Date
    Dec 2017
    Location
    Arizona
    Posts
    77
    Thanks
    79
    Thanked 13 Times in 12 Posts
    Quote Originally Posted by Trench View Post
An intro for non-programmers to try basic compression?
    To answer your question, yes.

    I set out to do this about a year or two ago, seeing that most of the source code was always in C++ or ANSI C, and rarely if ever in anything easier to read and closer to natural language for people who were intermediate or beginners.

    Many people started out with languages like Basic or Turbo Pascal, although C++ or assembly language is going to be your best bet long-term for programming efficiency, speed, and most real-world applications these days.

BUT -- if you're getting started or are versed in Basic, you might want to start there and then gradually branch out to other languages like C++, Python, or Java... all of which are now industry standards.

There were some great and helpful commenters on that thread when it was first introduced (please ignore the 1 jerk, maybe 2, spouting off on there and read past it to get what you need out of the posts and the code):

    https://encode.su/threads/3022-TOP4-algorithm

    As a bonus, Sportman was especially helpful and compiled his own version along with the one that I submitted. He did independent testing as did at least one or two others. (jibz did one in C if you need it, too)

Although the thread title is slightly misleading, because in reality it doesn't always produce code better than Huffman... there are many modifications you can make so that it achieves more, with partial contexts and better compression. But I would use it as a starting point for an easy way to understand, since there are few places to find anything easier or more straightforward than this.

Now please understand that there are other compressors that will do better too, most likely based on Arithmetic Encoding or Range Encoding, but with those comes more complexity, and they might be too much for someone starting out. A lot of people suggest "PAQ", but it's a lot of unnecessary stuff for doing very basic compression and understanding the premise of it. When you're ready you can probably do PAQ after Arithmetic Encoding and traditional Huffman, but for an easy and fast way to do things, I'd start here.

With TOP4, you get a basic skeleton frame I made that is table-based, in BASIC, and compresses with a very straightforward, WYSIWYG approach. There's a separate file that reverses the process as a decompressor. It reads the bytes at the front of the file to get a table of codewords, and then decompresses the data based on that.

How it works: it represents the 4 statistically most likely symbols with a 1-bit-shorter codeword, while adding 1 bit to the codewords of the least frequent symbols at the end.

By doing this, you get compression because the frequencies of the shorter codewords at the front always outweigh the frequencies of the least-occurring ones at the end.

The less compressed and "balanced" the symbol frequencies are, the more you're able to compress the data at the top and expand the few at the bottom. Your compression is what you get from the difference when all the bits are tallied up and converted back to 8-bit ASCII symbols.
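The size arithmetic described above can be sketched numerically. Note that the 4/8 split below is just one code-length assignment that satisfies the Kraft inequality, since 4*(1/128 - 1/256) = 8*(1/512); the actual TOP4 code table may differ:

```python
from collections import Counter

def top4_style_size_bits(data):
    """Back-of-envelope output size if the 4 most frequent byte values
    get 7-bit codes, the 8 least frequent get 9-bit codes, and every
    other value keeps its 8 bits. (Hypothetical split, not the real
    TOP4 table: 4 codes shortened to 7 bits cost 4/256 of Kraft mass,
    paid for exactly by lengthening 8 codes to 9 bits.)"""
    freq = Counter(data)
    ranked = [s for s, _ in freq.most_common()]
    top4 = set(ranked[:4])
    bottom8 = set(ranked[-8:]) - top4
    return sum(n * (7 if s in top4 else 9 if s in bottom8 else 8)
               for s, n in freq.items())

# Four dominant symbols plus eight rare ones: the savings at the top
# outweigh the expansion at the bottom.
data = b"aaaaaaaaaabbbbbbbbbbccccccccccdddddddddd" + b"efghijkl"
print(top4_style_size_bits(data), "bits vs", 8 * len(data), "uncompressed")
```

On this toy input the four frequent symbols save 40 bits while the eight rare ones cost only 8 extra, so the tally comes out below the uncompressed size, which is exactly the trade-off the post describes.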

    (I did one in C and used QB64 for what was submitted, but you can make it for VB6 or use Sportman's VB.Net submission just as well)

    The basic code should be easy enough to read to where you can adapt it to any language you fancy or want to use, since it's very close to pseudocode for beginners.

You'll find, however, that a lot of things (most things?) are written in C++ now, and people are using it as their pseudocode as a de facto standard.

    What I would suggest is using this to get an understanding, but gradually branch out to C++ or Python from here and adapt it to that. Then, you can move on to actual Huffman coding or Adaptive Arithmetic encoding and more.

This is more or less instant gratification to help you get your feet wet compressing text files, EXE files, BMP files, and others that are easily compressed. Once you're comfortable with this, it'll be even easier for you to continue on and adapt as you grow.

And of course, the code was submitted royalty-free, with no real restrictions, more as a learning tool for people to use freely.

If you're interested in things like BWT, there are sections on that, too. I made some BWT implementations entirely in BASIC and one in Python (not sure where I put that, but I still have the BASIC one on a flash drive), but honestly you'll find more straightforward BWT implementations from others by searching this site than what I have to share. Michael has a really good BWT compressor on here that I've seen. And Shelwien and Encode have tons of random stuff lol

    There may be other really easy compressors on here for you to check out too if you search for them. They'll either be on threads or under the download sections with source code.
    Last edited by JamesWasil; 2nd July 2020 at 23:30.

  15. #15
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Switzerland
    Posts
    468
    Thanks
    321
    Thanked 309 Times in 166 Posts
    And finally... language (morphology).

    In every (human) language the words that are used more often tend to be the shortest. (Zipf's law)
We, humans, intuitively created our languages to be optimal in the sense that we express our thoughts and exchange information with the least effort: the fewest possible sounds and the fewest possible letters to convey the intended information.

Thus we compress information as we speak, in a Huffman-like way. Isn't it phenomenal?
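This can be checked, very roughly, even on a toy sample: weighting word lengths by how often each word is used gives a smaller average than counting each distinct word once, because the frequent words ("the", "a", ...) are short. A real test of Zipf's law would of course need a large corpus; this Python snippet (sample sentence mine) is only an illustration:

```python
from collections import Counter

text = ("the quick brown fox jumps over the lazy dog and the dog barks at "
        "the fox while the quick fox runs").split()
freq = Counter(text)

# Average length of a word *token* (weighted by how often it is used)
weighted = sum(len(w) * n for w, n in freq.items()) / sum(freq.values())
# Average length of a distinct word *type* (each counted once)
unweighted = sum(len(w) for w in freq) / len(freq)

print(weighted, unweighted)  # the weighted average comes out smaller
```

The gap between the two averages is the "compression" our languages perform: the words we reach for most often cost the fewest letters.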

  16. #16
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Switzerland
    Posts
    468
    Thanks
    321
    Thanked 309 Times in 166 Posts
    Summary.

>>And Huffman theory is useless if not for programmers.

As you see from the examples above, compression theory is embedded in our decisions and is a serious part of our everyday life. We didn't really invent compression; we discovered it and formulated it mathematically and statistically. It's just everywhere.

  17. #17
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Switzerland
    Posts
    468
    Thanks
    321
    Thanked 309 Times in 166 Posts
    Quote Originally Posted by Trench View Post
As I said before, everyone here defines compression wrongly when they say randomness is the issue; it is not randomness, it is lack of patterns
You need to actually try experimenting with a random file, and you'll see with your own eyes. Saying "As I said before" is not enough. You must try. It's worth it.

First, let's fix your definition a little. This is a slightly better definition of randomness:

    Randomness == lack of useful patterns.
    What does that mean?

    Do you see a pattern here: 000000000000000000000 ?
    These are 21 '0' bits.
    And is there a pattern here: 1111111111111111111111 ?
    These are 22 '1' bits.

Yes, indeed they are patterns: repeating bits. But these patterns are unfortunately worthless in the file where I found them. How can that be? Let me show you.

    I grabbed the latest 1MB random file (2020-07-02.bin) from https://archive.random.org/binary
The above bit patterns are in this random file. They are the longest repeats of zeroes and ones, and there is only one of each. No more.

    You will understand the real meaning of a "useful" pattern when you try to actually make the file shorter by using the fact that it contains these patterns.
When you would like to encode the information that the 1111111111111111111111 pattern is there, you will need to encode the position where this pattern is found in the file (and its length, of course).
It starts at bit position 5245980 and its length is 22. The file being 8388608 bits (or 1048576 bytes) long, encoding any position in this file would cost us log2(8388608) = 23 bits. Oh.
See the problem?
Even though the pattern of 22 repeating '1' bits is in the file, it is still not long enough to be useful. Encoding this info would cost us at least 23 bits. We cannot use it to shorten the file. And there are no longer repeats... We are out of luck.
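The break-even arithmetic is worth a couple of lines of Python (numbers taken from the post above):

```python
import math

n_bits = 1048576 * 8                          # the 1 MB file, in bits (8388608)
position_cost = math.ceil(math.log2(n_bits))  # bits to encode one bit position

run_length = 22       # the longest run of '1's found in the file
# The pointer alone already costs more than the run would save:
print(position_cost, "bits just for the pointer, to save", run_length, "bits")
```

So in a 1 MB file a run has to be longer than about 23 bits (plus whatever the length field costs) before pointing at it pays for itself.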

When I first started experimenting with data compression I was trying to compress random files and find patterns. Like everybody else, I guess. I did find patterns, but not useful ones. Eventually, when you count all (!) possible bit combinations in a random file, you will end up with the pure definition of randomness: everything has a near-50% chance. Count the number of '1's and '0's. Count the number of '00', '01', '10', '11' pairs; all of them will have near-equal probability. When I first experienced that, it was of course discouraging, but beautiful at the same time.
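That counting experiment is easy to reproduce; a quick Python sketch, with os.urandom standing in for the random.org file:

```python
import os
from collections import Counter

# 1 MB of random bytes, expanded to a string of '0'/'1' characters
bits = "".join(f"{b:08b}" for b in os.urandom(1 << 20))

# Fraction of '1' bits, and the distribution of non-overlapping bit pairs
ones = bits.count("1") / len(bits)
pairs = Counter(bits[i:i + 2] for i in range(0, len(bits) - 1, 2))
total = sum(pairs.values())

print(round(ones, 4))                                      # close to 0.5
print({p: round(n / total, 4) for p, n in pairs.items()})  # each close to 0.25
```

Every statistic comes out flat: no symbol, pair, or longer combination is frequent enough to give a model any edge, which is the practical face of "no useful patterns".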

    Lack of patterns? No.
    Lack of useful patterns.

    Let me quote you again:

    Quote Originally Posted by Trench View Post
As I said before, everyone here defines compression wrongly when they say randomness is the issue; it is not randomness, it is lack of patterns
    Randomness is an issue. And randomness is the lack of useful patterns.

  18. #18
    Member
    Join Date
    Sep 2018
    Location
    Philippines
    Posts
    96
    Thanks
    29
    Thanked 1 Time in 1 Post
    Quote Originally Posted by Gotty View Post
You need to actually try experimenting with a random file, and you'll see with your own eyes. Saying "As I said before" is not enough. You must try. It's worth it.

First, let's fix your definition a little. This is a slightly better definition of randomness:



    What does that mean?

    Do you see a pattern here: 000000000000000000000 ?
    These are 21 '0' bits.
    And is there a pattern here: 1111111111111111111111 ?
    These are 22 '1' bits.

Yes, indeed they are patterns: repeating bits. But these patterns are unfortunately worthless in the file where I found them. How can that be? Let me show you.

    I grabbed the latest 1MB random file (2020-07-02.bin) from https://archive.random.org/binary
The above bit patterns are in this random file. They are the longest repeats of zeroes and ones, and there is only one of each. No more.

    You will understand the real meaning of a "useful" pattern when you try to actually make the file shorter by using the fact that it contains these patterns.
When you would like to encode the information that the 1111111111111111111111 pattern is there, you will need to encode the position where this pattern is found in the file (and its length, of course).
It starts at bit position 5245980 and its length is 22. The file being 8388608 bits (or 1048576 bytes) long, encoding any position in this file would cost us log2(8388608) = 23 bits. Oh.
See the problem?
Even though the pattern of 22 repeating '1' bits is in the file, it is still not long enough to be useful. Encoding this info would cost us at least 23 bits. We cannot use it to shorten the file. And there are no longer repeats... We are out of luck.

When I first started experimenting with data compression I was trying to compress random files and find patterns. Like everybody else, I guess. I did find patterns, but not useful ones. Eventually, when you count all (!) possible bit combinations in a random file, you will end up with the pure definition of randomness: everything has a near-50% chance. Count the number of '1's and '0's. Count the number of '00', '01', '10', '11' pairs; all of them will have near-equal probability. When I first experienced that, it was of course discouraging, but beautiful at the same time.

    Lack of patterns? No.
    Lack of useful patterns.

    Let me quote you again:

    Randomness is an issue. And randomness is the lack of useful patterns.

It's a relief that somebody here admits he actually tried compressing random files, like me. And he actually suggests we experiment with a random file. But not too much, I guess.

I tried random compression coding in 2006-2007, and I actually thought I had solved it, that I had a random data compressor. I feared the Feds and tech giants would come after me, so I deleted the compressor, maybe even without a decoder yet. Two of my random compression ideas are here:

    https://encode.su/threads/3339-A-Ran...sor-s-to-solve

    RDC#1 and RDC#2 are still promising, worth a look for those interested. I may still have some random compression ideas, but I am not very active on it anymore. There is some "implied information" that a compressor can exploit, such as the order or sequence of literals (kind of temporal) in my RLLZ idea, and the minimum match length in lzgt3a. Search this forum for "RLLZ" to find my posts.

    https://encode.su/threads/3013-Reduc...highlight=RLLZ

    > Randomness is an issue. And randomness is the lack of useful patterns.

    Randomness is the lack of useful patterns, I guess, if your algorithm is a "pattern searching/encoding" algorithm, as LZ-based compressors are. Huffman and arithmetic coding are not pattern searchers, but they naturally benefit from the occurrence of patterns.
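    The distinction can be made concrete. Huffman and arithmetic coding win whenever the symbol frequencies are skewed, regardless of where the symbols occur; order-0 Shannon entropy gives their lower bound. A sketch (the sample data here is made up for the illustration):

    ```python
    import math
    import os
    from collections import Counter

    def entropy_bits_per_byte(data: bytes) -> float:
        """Order-0 Shannon entropy: the per-byte lower bound for any
        coder that only uses symbol frequencies (Huffman, arithmetic)."""
        n = len(data)
        counts = Counter(data)
        return -sum(c / n * math.log2(c / n) for c in counts.values())

    skewed = b'a' * 900 + b'b' * 90 + b'c' * 10   # skewed frequencies
    random_data = os.urandom(1000)                 # near-uniform

    print(entropy_bits_per_byte(skewed))       # well below 8 bits/byte
    print(entropy_bits_per_byte(random_data))  # close to 8 bits/byte
    ```

    An LZ coder, by contrast, needs repeated *strings*; a file can have skewed byte frequencies (good for Huffman) yet few long repeats (bad for LZ), and vice versa.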

  19. #19
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Switzerland
    Posts
    468
    Thanks
    321
    Thanked 309 Times in 166 Posts
    I'm happy that you are happy and relieved.
    I think trying compressing random data is a must for everyone who wants to understand entropy.
    I'm with you, understand your enthusiasm, and I do encourage you to experiment even more. After you understand it deeply, you will not post any more such ideas.

  20. #20
    Member
    Join Date
    Sep 2018
    Location
    Philippines
    Posts
    96
    Thanks
    29
    Thanked 1 Time in 1 Post
    Quote Originally Posted by Gotty View Post
    I'm happy that you are happy and relieved.
    I think trying compressing random data is a must for everyone who wants to understand entropy.
    I'm with you, understand your enthusiasm, and I do encourage you to experiment even more. After you understand it deeply, you will not post any more such ideas.

    Thanks for replying Gotty! Your posts here at encode.su are very informative, clearly explained, and surely very helpful to anyone doing data compression, experts and enthusiasts alike.

    > I do encourage you to experiment even more.

    Well, maybe not too much in random data compression, but on algorithms very different from Huffman, LZ, grammar-based, and arithmetic/ANS coding. If luck wills it, the question is again how programmers can "monetize" their compression ideas and compressors.
    Last edited by compgt; 3rd July 2020 at 14:03.

  21. #21
    Member
    Join Date
    Jan 2020
    Location
    Fl
    Posts
    32
    Thanks
    0
    Thanked 2 Times in 2 Posts
    James
    Cool. I thought about that too, replacing the most used with a smaller one and the least used with a bigger one, but in another way. But it's also kind of used in coding in some ways, I think.




    Gotty
    Agreed, nice associations; so my wording was wrong and I emphasized it too much. But that was a side comment. Similarities from one field to another.


    The main point is not addressed, which is that programmers need completely different fields for perspective.


    As for randomness, you are right for the most part.
    When you open up a combination lock you have a limited number of patterns to put in; the more digits, the more combinations. If it's 1 binary digit you have 2 possibilities (50% odds each), if it's 2 binary digits you have 4, if it's 3 binary digits 8, and so on; you get the idea. I don't know if you saw my other post about randomness, but I explained it there in more detail with examples. In short, a computer does not know the difference between random and ordered data; we define the difference with formulas in terms of what we understand.
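    The lock counting works out as powers of two; a tiny Python check:

    ```python
    # An n-digit binary "lock" has 2**n possible settings,
    # so a single guess succeeds with probability 1 / 2**n.
    for n in range(1, 5):
        print(n, 2 ** n, 1 / 2 ** n)   # 1 digit → 2 settings (50%), 2 → 4, 3 → 8, 4 → 16
    ```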


    Maybe you are 100% right, but for now I do not fully agree. Also, sorry, but I disagree with discouraging others from experimenting with random files. You have to push yourself to achieve something harder, which makes everything else feel easier. Random file compression is the future, I feel, even though almost no one sees it.
    https://encode.su/threads/3338-Rando...n-is-the-issue


    I forgot my programming 20 years ago and only do simple things like HTML, Excel, or hex editing. It's a different mindset dealing with other things and being away from coding.


    Compgt
    Very interesting comments.
    Do you have proof that it would be an issue? But if you can, you could at least make one for yourself and have a dead man's switch.
    I made a post about that too. https://encode.su/threads/3346-No-ho...en-if-possible


    People think of the positive aspects of finding the ultimate compression, but many ignore or fail to imagine the negative side of it. Good programmers, but not practical. What if you were in charge of a nation's GDP, a company's fiduciary duty, or the livelihood of others? 1000 steps forward, 2000 steps back. If one is going to release fire, they had better be able to control it. This forum is for compression, but the balance involves more than that; it's just not talked about since, again, this forum is only about file compression. It's hard to balance many aspects.


    Yes, they benefit from the occurrence of patterns, and that's the edge you take and exploit. Just like a boxer exploits an opponent's weakness; the same goes for coding.


    As for making money on their ideas, well, how did that work out for GIF?
    I should make a new topic, since I feel many are holding back out of fear, out of money concerns, etc.




    Anyway,
    I suggest people do what Christ said: be like a child, to at least start from the beginning and understand how they learn. The obvious is not so obvious. I talk vaguely since I am trying to make others think about it and don't like to say much about it.
