
Thread: Golden Mechanisms

  #1 - Member Self_Recursive_Data (Canada)

    Golden Mechanisms

    We know lossless compression is an evaluation for strong AI, and so is compression/decompression speed; the two together measure efficiency. And the evaluation for evolution is immortality based on a big, diverse data context in a synchronized/equilibrium (Utopia) cycle of compression and extraction/mining, where domains of loops align, like on the large Sun, to propagate brain waves faster through the swarm using less energy. The more patterns and diverse data you know, the exponentially better your model is; a little data is powerful, and lossless compression shows you the power of that little data: 100MB has been compressed to about 15MB. In actuality, here on Earth we are regenerating missing data, churning old data self-recursively, and although all of Earth (which is data) can look lossy, nothing is really lossy; we regenerate missing knowledge, employees, items. It gets exponentially faster near the end because you have recursively made/found extra data/context.

    Since yous are very knowledgeable, I want to ask: what are the key sauces? Here are the more important sauces I've realized so far:

    1) Context models: these are multiple n-grams that window the previous text. This also includes frequency counting, another sauce.
    2) Online/Adaptive Learning.
    3) Grouping related words.
    4) Finally, Arithmetic Coding the remaining error.

    Do yous know of any other major mechanisms?
    I'm unsure BWT can help a well-done implementation of the above, since the patterns have already been modelled away. Random data isn't really compressible.
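
    To make sauces 1, 2 and 4 above concrete, here is one possible minimal sketch (not taken from any real compressor): an adaptive order-2 byte model whose predictions are charged the ideal arithmetic-coding cost of -log2 p bits per symbol. The sample text and the add-one smoothing are just illustrative choices.

    Code:
    # Minimal sketch: adaptive order-2 byte context model plus the ideal
    # arithmetic-coding cost of each prediction. Illustration only.
    import math
    from collections import defaultdict

    def ideal_compressed_bits(data: bytes, order: int = 2) -> float:
        """Bits an ideal arithmetic coder would need when driven by an
        adaptive order-`order` byte model with add-one smoothing."""
        counts = defaultdict(lambda: defaultdict(int))   # context -> symbol -> count
        totals = defaultdict(int)                        # context -> total count
        bits = 0.0
        for i, sym in enumerate(data):
            ctx = data[max(0, i - order):i]              # previous `order` bytes
            p = (counts[ctx][sym] + 1) / (totals[ctx] + 256)   # adaptive estimate
            bits += -math.log2(p)                        # arithmetic-coding cost
            counts[ctx][sym] += 1                        # update after coding,
            totals[ctx] += 1                             # exactly as a decoder would
        return bits

    if __name__ == "__main__":
        text = b"the cat sat on the mat. the cat sat on the hat. " * 20
        print(len(text) * 8, "bits raw ->", round(ideal_compressed_bits(text)), "bits modelled")

    On repetitive text like the sample above the modelled size comes out far below 8 bits per byte; on random bytes it would not.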

  #2 - Member JamesWasil (Arizona)
    Well, that's a whole lot of nonsense and misconception in one post.

    Data compression is NOT an evaluation of "Artificial Intelligence", nor is it machine learning per se. It is not an application of strong AI either, and I feel that that assumption is a side road people have started to subscribe to because of the ridiculous path Matt Mahoney has sent the compression community down by claiming it for so many years that people no longer know better or think for themselves on this matter.

    Prior to the use of any algorithmic form of AI for compression, you've had statistical and entropy coders which didn't (and many still don't) use anything more than a very small initial table to cleverly organize and compress data without having to digitally "think" or use back-propagation or anything like it to compress data.

    Huffman, Arithmetic and Range encoding, RLE, and LZ algorithms do not have to implement any form of artificial intelligence to compress data; they only need to use a very structured method to do it routinely. They follow a very set path, and many of them don't even have to be adaptive. Some simply perform better just by having the ability to utilize more data at once (i.e., going from order 0 to order 1, 2, or 3 with models and matches). The computer doesn't have to "learn" anything further; it just has to be on the right track and keep doing what it's doing as it encounters the data to get the best results.

    Another example of this is that years ago (around 1998 to 2001) I saw someone "invent" what they called an "intelligent data organizer using AI", and they thought it was the greatest thing since sliced bread. They promoted it on Geocities and Angelfire websites (now defunct). They talked feverishly about having read Ray Kurzweil's interpretation of how people and technology should go, and they hoped it would "gain his attention". Alas, when evaluated the program was literally no better than the MTF algorithm and it did not perform as well as a basic BWT algorithm on data.

    The application of learning n-grams can be useful - to a point - for compressing data when algorithms are able to identify it and compress the patterns in it. But the problem with expert systems, neural networks, and any other approach to using AI for data compression is that your tables, your contexts, and additional methods for identifying those patterns become bloated and more and more complex, and they have to be stored somewhere and that requires space - often times, more space than the data you seek to compress if the files are too small and the databases and programs are already larger than what you'd save by doing it. That negates and starts to run counter-current to data compression itself at that point.

    If your super-AI compressor requires 4GB of data and stack space to compress a 40k file to 20k, you're still about 4,294,946,816 bytes in the red, because you still have to store all that bloated data on the same medium with it (or somewhere else). And at that point, you might have actually saved more space using deduplication instead, with a 1k program that identifies multiple copies of ANY other size file on the medium and makes links to them rather than storing multiple copies. When the multiple copies are replaced with links, those 4k to 40k files become 25 bytes, and data savings happen that even the most complex AI programs are not going to give you.
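
    As a rough illustration of that deduplication idea, a sketch along these lines would hash every file under a directory and replace byte-identical copies with hard links. The helper name and usage are hypothetical, not any existing tool.

    Code:
    # Hypothetical sketch of file-level deduplication: replace exact duplicate
    # files with hard links so only one copy occupies space on the medium.
    import hashlib, os, sys

    def dedupe(root: str) -> None:
        seen = {}  # sha256 digest -> path of the first copy encountered
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                with open(path, "rb") as f:
                    digest = hashlib.sha256(f.read()).hexdigest()
                if digest in seen and not os.path.samefile(seen[digest], path):
                    os.remove(path)                 # drop the duplicate...
                    os.link(seen[digest], path)     # ...and hard-link the original
                else:
                    seen[digest] = path

    if __name__ == "__main__":
        dedupe(sys.argv[1])   # e.g. python dedupe.py /some/backup/dir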

    Compression isn't always "bigger, badder, better, more complex". It's about saving space, which can be by unorthodox or unconventional means that may or may not yet have been identified.

    That said, there isn't going to be a Utopia, and we're not here to subscribe to your quasi-religion of Singularity while pretending that AI is the end-all-be-all for compression, either. We're here to talk about real compression of data - in any form it may be - but if artificial intelligence is a type of religion to you where fictitious outcomes become your reality and everyone lives in a cloud as a part of a borg-collective nightmare fantasy, then perhaps this thread might be best moved to the Off-Topic area instead. :-/

    The key "sauces" are anything and everything that works. Context modeling is currently king for size, while other methods that don't compress as much but are very fast and are able to read large blocks quickly are king for speed. Before this, PPMD and SSE were, even though implementations and speed varied.

    What is the "best" algorithm for one person trying to save the most space on a medium but is not pressed for time, may be entirely different than the "best" algorithm for a person or a company that needs to be able to save at least a certain known percentage of space, while still making retrieval and updates to data quickly, while the "best" algorithm for a multimedia company may be the one that doesn't compress that much but is able to do what it does fast when their users need it, and saves them transfer time, bandwidth, and money by reducing what it can transparently without the user having to know it is there. The "best" algorithm for a data center may not even be daily compression as much as deduplication to trim the FAT (no DOS pun intended!) from multiple files with the exact same name and contents using space where other files might be instead. The "best" algorithm doesn't exist, and even the most advanced AI is not going to be able to realistically present it without having a lot of "decision making" and require a lot of data that adds to the bloat that, past a point, makes AI inefficient rather than useful for compression situations.

    "We's" been looking at different algorithms for decades trying to determine ways to squeeze out every last drop of what we can from the most basic enhancements to huffman to LZMA and beyond. There isn't a silver bullet yet, and if there is, there is no "sauce" or "AI" that does it.

    My personal outlook on machine learning is that it partially gets certain aspects right based on training as it goes along, and only gives you the best result for a certain data set until you (or the data that you read with it) deviate too far from the model.

    That practicality is frequently limited by that data set (i.e: text exploitation rather than binary or multimedia file formats).

    Meanwhile, a real algorithm that does not have to be trained or take days or weeks to compress medium sized data sets can exist which makes the current trend of AI used for data compression entirely obsolete.

    Essentially, AI for data compression becomes like a stopped watch telling the right time twice a day: it works as long as it stays tuned to what you need, but you have to wait 24 hours for it to compress things that should take 60 seconds or less today. It's like using a substitution algorithm to try to approach entropy and tuning the lookup tables to the data, when you could use an arithmetic encoder and do better without wasting time or resources for the computer to eventually train itself to get some of those right.

    We're not rechurning or reprocessing any data on the Earth one way or another, beyond how it has always been since data was perceivable by human beings. All of the above assumed lossless continuity for algorithmic expression, but even if you try to pretend that human interpretation of lossy data is the same as lossless data in a smaller form, it is not. The interpretation made from a lossy representation only seems lossless because of the missing information that the observer supplies to "fill in the blanks", whether it is the eye, the ear, or anything else that perceives it.

    And still, the information that is filled in to make sense to the beholder of a lossy representation may not even be the same lossless data that would be reconstructed by another individual who sees it differently. Yes, the eye may see it mostly the same, which is "good enough". But if one person is partially color-blind while another can see slightly above the normal range of lighting (or if one person is partially deaf, while another has a very carefully trained ear that can hear slightly above normal ranges and frequencies), then even though the "standard approach" of reducing data in a lossy way and letting the beholder interpret it still applies, the degree and quality of the result varies drastically (some things more than others). That is why there are audiophiles who will NOT use MP3s but will save and listen to music only in FLAC or other lossless formats, because they can literally hear the difference in the music quality, where "most" people who are not that into music or fine audio will not.

    Data compression is not limited to text, even though most of the testing seems to be done on large compilations that contain it, such as enwik8 and enwik9, which replaced the Calgary corpus and others before it that were not as robust.

    You can make an AI program that "learns" which numbers will add up to 4,294,967,296 and then saves a big database on which numbers it uses or doesn't use as whole integers or fractions to do that and have it spend a few hours to days doing that while claiming equilibriums and happy dances all over the place for the sake of singularity worship...OR...you could have the computer calculate 2^32, 256^4, or 65536*65536 to get the same damn thing in a few nanoseconds.

    It might be a neat project to make a computer "learn" which numbers add up to what you want, but it isn't necessary and the end result of what it does is the same as the stopped watch with a lot of bloat, time, energy, and resources wasted to accomplish the same exact thing in the end.

    For right now, context-mixing achieves good results and uses a lot of modeling based on the data type to get what it does. But you might not even need the extreme complexity and waste it creates to get the same results in other ways when you find them.

    Assuming that random data is random and therefore not compressible is another area where you have to walk carefully, because what seems to be "random" may not always be, IF you can find a pattern for that type of data. The repetition in the data is not going to be there, and yes, the algorithms that exploit repetition will have a harder and harder time achieving compression because of it. But that doesn't mean the data has become random or incompressible in every case, just that the currently known methods will have an increasingly hard time getting larger compression ratios out of it as they reach their limits.

    You might maximize your use of all the current compression algorithms, but if you miss an optimization on one or you do it in a specific order, it can mean the difference between compressing the normally difficult or seemingly incompressible data by 1k or expanding it.

    It isn't the "sauces" at all, but the way people prepare them in a meal that makes or breaks your dinner experience.


  #3 - Administrator Shelwien (Kharkov, Ukraine)
    > Data compression is NOT an evaluation of "Artificial Intelligence", nor is it machine learning per se.
    > It is not an application of strong AI either, and I feel that that assumption might be a side road

    Compression is a good objective measure of data model quality.
    It's just rarely called that; instead there are plenty of euphemisms:
    likelihood, log-likelihood, entropy, description/message length, complexity, "Occam's razor", etc.

    So if there were a good AI (able to detect patterns in already-known data and make predictions based on them),
    we'd have better compression.
    The current best CM works on developer intelligence instead:
    a paq8 developer sees some pattern in the data and adds a new submodel,
    and we get better compression for that kind of data.
    But obviously it should be possible to automate this, and at some level that would be AI,
    because understanding the meaning of text can help in predicting the following text, etc.
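
    A toy illustration of that point, under the usual assumption of an ideal arithmetic coder: the compressed size is just the sum of -log2 p over the data, so a model with better probabilities is, by the same number, a better compressor. The probabilities below are made up for the example.

    Code:
    # Toy illustration: ideal compressed size equals the model's negative
    # log-likelihood, so "better prediction" and "better compression" coincide.
    import math

    data = "abracadabra"

    def bits_needed(probs: dict) -> float:
        """Sum of -log2 p(symbol) over the data, i.e. ideal code length in bits."""
        return sum(-math.log2(probs[c]) for c in data)

    uniform = {c: 1 / 26 for c in set(data)}                       # knows nothing
    empirical = {c: data.count(c) / len(data) for c in set(data)}  # knows the stats

    print("uniform model:  ", round(bits_needed(uniform), 1), "bits")
    print("empirical model:", round(bits_needed(empirical), 1), "bits")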

    > after the ridiculous path that Matt Mahoney has sent people in the compression community down

    Matt Mahoney did cause a problem, but not the one you think.
    The overall idea of the Hutter Prize makes sense, and it does include the decoder size,
    so overtuning doesn't automatically win it.

    However, the choice of target data was bad - enwik8 is mostly not text,
    but various markup languages: XML, HTML, wiki markup, URLs, bibliographies, etc.
    Thus it becomes possible to win by manually writing preprocessors for these
    (since the syntax is known), and obviously an AI can't learn from the data to
    predict markup syntax better than a preprocessor/model based on an external
    description of the syntax.
    So I'd expect more interesting contest results if the target data were plain text,
    maybe even the same enwik, just with all markup removed.
    Although ideally the target data shouldn't be public at all, to prevent this manual tuning.

    And another bad choice was to set the entry constraints strictly to paq8 specs.
    The time limit is long enough to accept paq8 (which excludes all fast coders),
    but not long enough for NNs with online training (like NNCP).
    And the memory limit is good enough for paq8 (with its 8-bit FSM counters and compact hashtables),
    but not good enough for parametric CM models with linear counters, or for PPMs or NNs.


  #4 - Member Self_Recursive_Data (Canada)
    Thank you two for posting!

    I know some of what I said is new to many, and I didn't explain the background behind it either; sorry.

    If you take a look, you can see the sun: https://www.youtube.com/watch?v=6tmbeLTHC_0
    Put the music on the video. Do you realize how huge that thing is? It catches fire from too many interactions in the core - context. Then it self-extracts output. The loops are like atomic electrons; unstably large atoms emit radiation. The domains/loops are what propagate material faster. They align, like humans or context models. It's a combinational effect. And there are other cases of it as well.

    What I meant in that one line was that systems with larger context are able to, like a giant sun, compress a lot and extract a lot, and this explains the upper edge in evolution for groups of humans who are educated. They try to resist change, like energy looping around in a cold, motionless battery. I call that "Immortality". We are gathering exponentially more context and repairing/resisting change exponentially faster these days.



    "Data compression is NOT an evaluation of "Artificial Intelligence", nor is it machine learning"
    I agree there are other ways to compress data with no n-gramming etc. But the best compression benchmarks have been based on many AI techniques. Also, 'online learning' of extra data context is being achieved in these benchmark compressors.

    "Prior to the use of any algorithmic form of AI for compression, you've had statistical and entropy coders which didn't (and many still don't) use anything more than a very small initial table to cleverly organize and compress data without having to digitally "think" or use back-propagation or anything like it to compress data."
    Are you talking about the few short context windows only looking at the very recent last few letters lol? Indeed, the recent context is the most important. Farther back is less important, but still carries critical weight to harness.

    "Huffman, Arithmetic and Range encoding, RLE, and LZ algorithms do not have to implement any form of artificial intelligence"
    I can tell you haven't thought this out. These algorithms destroy patterns / just fix a human-made issue; let me explain. Huffman and LZ make shorter codes, RLE makes runs of patterns into shorter codes, and Range/Arithmetic coding only stores the error correction for the desired prediction output. The shorter codes from Huffman/LZ/Range coding/Arithmetic fix a human problem, just an inflation; Huffman is almost as optimal as Arithmetic AND Range coding AND LZ - they all do the same thing! They give equivalent compression; they have the same idea. The codes/words we use on Facebook chat are twice as big as they need to be: 'this' is 32 bits but could be 16 bits; if we were aliens we could just change our language to the shortest codes. As for RLE, you could first do BWT, which is 'looking' for patterns, to find high probabilities to arithmetic-encode; this is similar to short context windows on the previous text, because they are linked to their associations to predict. Now this, BWT+RLE, is destructive, like the various n-gram methods, and is part of the AI pattern searching. Same thing: patterns; AI. So the human-made issue is just that, while BWT+RLE and n-gramming are both pattern searching. Of course they are, all similar.

    "Another example of this is"
    Yes, making sure it actually compresses wiki8 (and other human/real-world data that HAS patterns in it) is mandatory. And yes, say you landed on a good compressor: the wiki8 compressor you make is a pattern searcher, so it should work reasonably well on average on other human/real-world data (depending on whether that text has fewer patterns or is simply too unseen). If you use a text compressor on video data, it first needs code changes, but the idea is the same: consider nearby context, frequency, and grouping of similar things, e.g. animals. In vision, any pixel is meaningless without context. All data in the universe has context and will show many patterns; every word in the English dictionary explains the others.

    "The application of learning n-grams can be useful - to a point" "expert systems, neural networks, and any other approach"
    Yes, e.g. n-grams have a just-right point before growing 10x bigger than wiki8 itself. The point is that n-grams and the other mechanisms, all set to the just-right settings, result in the littlest data yet the most knowledge about wiki8. GPT-2 (see the OpenAI website and test it at TalkToTransformer) is similar to the best wiki8 compression benchmarks (n-grams far back, grouping words), and GPT-2 is a much better predictor while being efficient in speed and memory. GANs compress data: the predictor is shrinking and trying to learn the real-world data so its outputs look 'real', not fake. So it gets a lot for a little and understands the data patterns. Transformers have now been used on wiki8 to achieve really great results: https://openreview.net/pdf?id=Hygi7xStvS And yes, more data is exponentially better, because every fact you know enables 10 questions to be answered; patterns. The universe's laws are predictable. So digesting more data, I mean here, gives you better probabilities, and for longer n-grams which have few observations (order-90...). Sure, you can ramp it up, but the Hutter Prize is about finding the best beast before any ramping up, because it is smallest yet does a lot of damage (or healing, I should say; prediction is regeneration/rejuvenation). Evolution is also exponentially faster near the end because of the extra context we are sharing/thinking of. So yes, growing the brain is absolutely part of it; no one said we just want to digest wiki8 in the smallest way, we want to grab wiki9 and Mars data and grow the n-grams and probabilities and then extract free fuel/wisdom from the learnt network itself.

    "That said, there isn't going to be a Utopia"
    I was just sharing the trend where humans find patterns and grow more confident in predictions: we are able to breed new humans and mutate ideas faster, and we are able to extract new knowledge/intelligence from known data, e.g. if cats eat and cats are dogs, then dogs eat. We are getting better at Darwinian survival and are breeding/repairing faster than we are dying. To repair/regenerate, we use context. So our big data these days is bringing us close to the nanobots, where instant regeneration occurs and they are extremely hard to destroy. How can you catch all nanobots? A butterfly net? And they repair near-instantly, as their 'hospitals' are general and everywhere. They understand the data/real universe extremely well. Sure, the opposite could happen - decompression/expansion, where we eat too much energy/matter and grow like a star and burst into heat from gravity, giving it back; compression > decompression. Or a big universe crunch. But maybe there is an end of evolution, an equilibrium of physics, where we have an approximately balanced compression/decompression, e.g. the huge nanoblob world would eat a lot and dump a lot, and we compress data a lot and extract a lot of insights.

    "The key "sauces" are anything and everything that works."
    Nope: patterns. Didn't I say there are trends? Sure, you can use a hammer or a rock or a rod to break a glass window, but patterns exist.

    "Context modeling is currently king for size, while other methods that don't compress as much but are very fast and are able to read large blocks quickly are king for speed."
    I agree a speed increase can come from a memory increase, because you need to extract things you *do *not *have, while if you do have them, it's fast but big in memory. When you break this law, you actually have them all but they are just sorted better, e.g. "the and it he boat zoo Mars grommet" vs "grommet Mars zoo the he boat it and". Same memory, but slower with the second sort. But we saw compressed data can be fast to decompress? Yeah, but with the wiki8 100MB there's no decompressing, hence it's fast to scan for X. So we want to find a compressor for AGI that extracts answers/data fast. One thing: extraction is faster if you understand the data, because brute force is very slow to give you the output you want. So intelligence, per evolution, is all about efficiency; like computer chips, it is small (compression) and speedy. As I said, you can know 100GB and know most of the universe's patterns, so that is pretty fast for extracting whatever you need using 1) little data and 2) compressed little data. So we want compression and speed, and intelligent methods do exactly that: they give you the compression, like brute force would, but fast. If I were to work on a BWT compressor on bits, it would give some fair compression but it isn't fast, now is it.

    "We's" been looking" "There isn't a silver bullet yet"
    As said, our own brains are fast compressors and extractors. We can understand a lot of the universe (this works on any dataset; we digest it and build) if we study lectures from diverse domains for a few years to build a mental Theory of Everything about the laws of physics.

    "We're not re-churning or reprocessing any data on the Earth one way or another"
    Earth is evolving: we generate data, share it, compress it, then extract yet more insights, recursively self-improving our "intelligence" (intelligence, by definition, is a term that refers to data).

    "And yet still, the information that is filled in to make sense to the beholder for the situation of lossy representation may not even be the same lossless data that would exist if it were used to make sense to another individual who sees it differently."
    Yes, our knowledge makes us think/predict what we even see, but most humans and animals have a similar model of the world, and each human is weighted in and averaged lol.

    "Assuming that random data is random and therefore not compressible is yet another area to be careful when walking along, because what seems to be "random" may not always be IF you can find a pattern for that type of data produced. The repetition in the data is not going to be there and yeah, the algorithms you use that exploit that will have harder and harder times achieving compression because of it. But that doesn't mean that the data has become random or that it has become uncompressible in every case, just that the methods which are currently known and used for it will have an increasingly harder time to get larger compression ratios out of it as it reaches the limit for the algorithm used currently for compression."
    No, random means random in our discussion here. Take a global bag of context, e.g. Earth or the solar system, and count its occurrences. Random is something that is not as compressible. Hmm, but there are patterns everywhere - hills of sand on Mars, mountains, shadows, liquid, splashes, fractures, electromagnetism - so if there aren't 'really' any random patterns that exist, then I may now opt for the conclusion that there is no randomness, because all particles are elementary types and can be positioned relative to each other to become less random. Now, this doesn't mean infinite compression, because the absence of random data is not permanent. Evolution doesn't make Earth evolve into a perfect homogeneous solid; it has to be able to defend against death, as in evolution. So the best 'morph' to become is defensive/rejuvenating and has the most patterns (a general nanobot fabric, all connected). It knows there is another of the module x distance away, while sub-units know there is another of themselves in the bigger unit. So: patterny, while defensive/repairing, by knowing the right knowledge and having approximately the right technologies. Of course, the surrounding matter is not transformed yet, so it's a never-ending battle for life in the future.

    "You can make an AI program that "learns" which numbers will add up to 4,294,967,296"
    Yes. And yes, knowing that 65,536*65,536 gives the answer is faster. You can store it for future use. But how do you find such answers faster in polynomial time these days? We have big data these days. We can try dividing the number, or find the square root, e.g. try 60 as the square root; if it's too low, try 600; if still too low, try 4,000... 70,000... now it's too high, so try 60,000... eventually you find it by binary search. See how my knowledge allowed me to discover how to find this in just 2 minutes? I don't usually work with hard math, so this was exciting. (I just work with other formats/symbols, like visual images of dogs/cats drinking, or text words saying who is/does what. It works because of redundancy.)
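
    That binary-search procedure can be written down generically; this is just an integer square root by bisection (nothing compression-specific), shown to make the "guess high/low and halve the interval" idea explicit.

    Code:
    # Binary search for the integer square root of n: guess a root, test it,
    # and halve the remaining interval until it converges.
    def isqrt(n: int) -> int:
        lo, hi = 0, n
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if mid * mid <= n:
                lo = mid        # mid is still low (or exact): raise the floor
            else:
                hi = mid - 1    # mid overshoots: lower the ceiling
        return lo

    assert isqrt(4_294_967_296) == 65_536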

    "It isn't the "sauces" at all, but the way people prepare them in a meal that makes or breaks your dinner experience. "
    It is the sauces, and their relative positions, just like in the GPT-2 Transformer architecture.



    "likelihood, log-likelihood, entropy, description/message length, complexity, "Occam's razor" etc."
    Yes: patterns, nearby context and global connections (each word in the dictionary not only describes/makes the others but is also only a few steps away from each other; a small-world network. The same goes for friend connections/context: a team of workers is like neurons, where missing employees can be replaced thanks to redundancy patterns, and in fact compression by dropout, or simply a big diverse dataset, leads to a compressed/general team - small, fast, a human-net that compresses the team, just like compressing wiki8 using a network).

    "Overall idea of Hutter Prize makes sense, and it does include the decoder size, so overtuning doesn't automatically win it."
    Yes and of course.

    "However the choice of target data was bad - enwik8 is mostly not text, but various markup languages - xml,html,wiki markup,urls,bibliographies, etc. Thus it becomes possible to win by manually writing preprocessors for these (since syntax is known), and obviously an AI can't learn from data to predict markup syntax better than a preprocessor/model based on external description of the syntax. So I'd expect more interesting contest results if target data was plain text, maybe even the same enwik, just with all markup removed. Although ideally the target data shouldn't be public at all, to prevent this manual tuning."
    I wonder how much better knowing that it is "possible to win by manually writing preprocessors for these (since syntax is known)" would do - did anyone try?
    Yes, tuning to wiki8 is overfitting, but because it holds an OK amount of human knowledge, it allows you to 1) make the smallest compressor that 'understands' the wiki8 data, and therefore 2) it should work on unseen inputs to generate back unseen future discoveries (missing data of anything it looks at on the internet lol).
    Yes, the constraints on benchmarks are that you must be fast, reach a low compressed size, and never grow/spike overly large during compression/decompression - makes sense. Personally I am more interested in compression, but also speed; it's OK if it takes a few days to digest, as long as the RAM never grows overly large. To find AI, focus on compression first, then optimize the golden mechanisms' speed and working-memory sizes.
    Last edited by Self_Recursive_Data; 13th January 2020 at 06:13.

  #5 - Administrator Shelwien (Kharkov, Ukraine)
    > I wonder how better knowing "possible to win by manually writing preprocessors for these (since syntax is known)"
    > would do, did anyone try?

    http://prize.hutter1.net/
    https://encode.su/threads/2858-Hutte...e?p=58855&pp=1
    https://encode.su/threads/2924-Faste...ssion-(for-fun)
    https://encode.su/threads/2590-Some-...enwik8-parsing

    > Yes tuning to the wiki8 is overfitting but because it holds an o-k amount
    > of human knowledge

    It's not overfitting; it's a real solution for the specified task
    (since the target size includes the decoder size).

    The problem is that, compared to the enwik8 markup, English text doesn't matter that much.
    enwik8 uses multiple markup languages with reasonably complex syntax,
    so perfectly parsing all of them is hard, and it's still possible to make further progress
    by improving the markup syntax handling.

    The list of potential improvements looks kinda like this
    (in order of growing complexity):
    1) Better preprocessing of markup syntax where it is not efficiently handled by paq8
    2) A syntax/grammar/semantics model of the English language
    (in something like 100k of compressed code/data, since otherwise it's not worth
    the prediction improvement)
    3) Automatic runtime discovery of markup syntax
    (it's not sequential like English; instead there are hyperlinks, tables, etc.)

    So it's kinda obvious what would be chosen.

    I just think that starting with a "pure" natural-language target
    would be more compatible with the purpose of the contest.
    Although of course it would only work until the first person discovers
    some preprocessing trick that breaks it.
    At least that's how it turned out with Mahoney's attempt to make
    a different kind of benchmark: http://mattmahoney.net/dc/uiq/
    It worked all right... but then a preprocessor appeared.

    > 1) make the smallest compressor to 'understand' the wiki8 data and therefore

    That was the idea, but the HP contest went the wrong way right from the first entry.
    http://mattmahoney.net/dc/text.html#1323
    It was basically the same paq8, but with words replaced by shorter codes...
    which mostly improved compression due to the contest's memory limit
    (since a smaller file needs less memory for statistics).
    Then there was more and more preprocessing, paq parameter tweaking,
    adding whatever open-source components helped, then recently swapping to SSD.
    But there's essentially nothing in it that would work for compression of other files
    (even other English text) - instead of AI development it turned
    into a competition in tricking the contest rules.
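
    For readers who haven't seen that word-replacement trick, it is roughly the following kind of reversible transform. This is a generic sketch, not the actual contest entry's code; the escape byte, dictionary size and regex are arbitrary choices for the example.

    Code:
    # Generic sketch of a dictionary preprocessor: replace frequent words with
    # short escape codes before compression, then invert the mapping afterwards.
    import re
    from collections import Counter

    ESC = "\x01"   # escape marker, assumed absent from the input text

    def build_dictionary(text: str, size: int = 64) -> list:
        return [w for w, _ in Counter(re.findall(r"[a-z]+", text)).most_common(size)]

    def encode(text: str, words: list) -> str:
        table = {w: ESC + chr(0x20 + i) for i, w in enumerate(words)}
        return re.sub(r"[a-z]+", lambda m: table.get(m.group(0), m.group(0)), text)

    def decode(coded: str, words: list) -> str:
        return re.sub(re.escape(ESC) + r"(.)",
                      lambda m: words[ord(m.group(1)) - 0x20], coded)

    text = "the cat and the dog and the cat " * 10
    words = build_dictionary(text)
    assert decode(encode(text, words), words) == text

    The transform itself saves little; the point is that the shorter, more regular token stream is easier for the statistical model behind it, and (in the contest setting) needs less memory for statistics.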

  #6 - Member Self_Recursive_Data (Canada)
    P.S. I think the best non-text in wiki8 is this one:
    https://ibb.co/Mkfjzrb

  #7 - Administrator Shelwien (Kharkov, Ukraine)
    Yes, there's other ASCII art too... but it doesn't matter, since it compresses to something like 300 bytes even with plain paq8.

    You have to focus on the parts and syntax types which contribute the most to the entropy.

  #8 - Member Self_Recursive_Data (Canada)
    When I started this thread I was aiming for the top benchmark algorithms' golden mechanisms, not unproven ideas/fantasies.

    Is this how I predict the next bit? I understand what an order-1 context is now: you look at a byte and predict the next byte. But will I get the same compression if, instead of 8 bits > 8 bits, I do 15 bits > 1 bit? It's the same as order-1 in that the total length is 16 bits, but most of those 16 bits are context. Is that all you do? Or do I have to slap on a logistic activation too, and divide something in half lol and round it????? I already understand Arithmetic Coding and mixing.
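
    Not speaking for any particular implementation, but the usual CM-style answer to that question looks roughly like this: each context keeps counts of the 0s and 1s seen after it, each count pair becomes a probability for the next bit, several such probabilities are mixed in the logistic ("stretch"/"squash") domain, and the mixed probability is what the arithmetic coder uses. A toy sketch, with made-up context sizes and learning rate:

    Code:
    # Toy sketch of bitwise context modelling with logistic mixing, in the
    # general spirit of CM/paq-style coders (not any specific program's code).
    import math

    def squash(x: float) -> float:       # logistic domain -> probability
        return 1.0 / (1.0 + math.exp(-x))

    def stretch(p: float) -> float:      # probability -> logistic domain
        return math.log(p / (1.0 - p))

    class BitModel:
        """Counts the 0s and 1s seen after each context value; predicts p(bit=1)."""
        def __init__(self, context_bits: int):
            self.mask = (1 << context_bits) - 1
            self.counts = {}             # context -> [n0, n1]

        def predict(self, ctx: int) -> float:
            n0, n1 = self.counts.get(ctx & self.mask, (1, 1))
            return n1 / (n0 + n1)

        def update(self, ctx: int, bit: int) -> None:
            self.counts.setdefault(ctx & self.mask, [1, 1])[bit] += 1

    models = [BitModel(8), BitModel(15)]    # a short and a longer bit context
    weights = [0.0, 0.0]                    # mixer weights, adapted online
    history = 0                             # previous bits packed into an int

    def code_bit(bit: int) -> float:
        """Return the arithmetic-coding cost of `bit`, then adapt everything."""
        global history
        ps = [m.predict(history) for m in models]
        p = squash(sum(w * stretch(pi) for w, pi in zip(weights, ps)))
        p = min(max(p, 1e-6), 1.0 - 1e-6)
        cost = -math.log2(p if bit else 1.0 - p)
        err = bit - p                        # gradient step for the logistic mixer
        for i, pi in enumerate(ps):
            weights[i] += 0.02 * err * stretch(pi)
        for m in models:
            m.update(history, bit)
        history = ((history << 1) | bit) & 0x7FFF
        return cost

    if __name__ == "__main__":
        data = b"abracadabra " * 100
        total = sum(code_bit((byte >> i) & 1) for byte in data for i in range(7, -1, -1))
        print(len(data) * 8, "bits in ->", round(total), "bits modelled")

    So a "15-bit context predicting 1 bit" is not a different scheme from an order-1 byte model; it is the same idea done bitwise, which is what makes the mixing and the arithmetic coder simple.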

  #9 - Member JamesWasil (Arizona)
    Quote Originally Posted by Self_Recursive_Data View Post
    Thank you two for posting!

    I know some of what I said is new to many, and I didn't explain the background behind it either; sorry.
    You're welcome, but I don't see anything new with what was mentioned. I see a lot of things that have nothing to do with data compression, many of which people have said before 20+ years ago when they did not understand what data is, let alone how to manage and correctly organize it to make representation of it smaller. People start to talk about everything short of a sci-fi film they once saw combined with a bad LSD trip, and it really takes away from the reality and actual science of data compression when people do that for whatever reason they feel compelled to.

    If you take a look, you can see the sun: https://www.youtube.com/watch?v=6tmbeLTHC_0
    Put the music on the video. Do you realize how huge that thing is? It catches fire from too many interactions in the core - context. Then it self-extracts output. The loops are like atomic electrons; unstably large atoms emit radiation. The domains/loops are what propagate material faster. They align, like humans or context models. It's a combinational effect. And there are other cases of it as well.
    This has absolutely nothing to do with data compression or combinatorics, even slightly. The only science this might be relevant to is thermodynamics, and thermogenesis in relation to humans. It has no correlation to data compression whatsoever.

    What I meant in that one line was that systems with larger context are able to, like a giant sun, compress a lot and extract a lot, and this explains the upper edge in evolution for groups of humans who are educated. They try to resist change, like energy looping around in a cold, motionless battery. I call that "Immortality". We are gathering exponentially more context and repairing/resisting change exponentially faster these days.
    You're talking about philosophy of social constructs at this point and perhaps people not engaging cult-like attributes that cult leaders may find desirable. This has nothing to do with compression, nor does the use of the sun to make disconnected references to evolution or batteries. You're talking nonsense and trying to make it have a meaning for something that has no relation to data compression at this point. The question is: Why? Are you trying to troll the board, or do you actually believe whatever it is you're saying here?

    "Data compression is NOT an evaluation of "Artificial Intelligence", nor is it machine learning"
    I agree there are other ways to compress data with no n-gramming etc. But the best compression benchmarks have been based on many AI techniques. Also, 'online learning' of extra data context is being achieved in these benchmark compressors.
    No. The benchmarks are measurements of data compression techniques, many of which have no connection to any machine learning algorithms at all. The best compression benchmarks are those which are closest to Shannon's measurement for entropy that have been achievable thus far, and as mentioned, although it may be currently approached with a method that uses a form of AI, it is not AI itself that does this (but as Shelwien correctly mentioned, it is the current body of human work and human knowledge that gives those algorithms the ability to make those contexts which compress the data, and the only learning the program does is around the constraints for the weights in the algorithm itself).


    "Prior to the use of any algorithmic form of AI for compression, you've had statistical and entropy coders which didn't (and many still don't) use anything more than a very small initial table to cleverly organize and compress data without having to digitally "think" or use back-propagation or anything like it to compress data."

    Are you talking about the few short context windows only looking at the very recent last few letters lol?
    Indeed, the recent context is the most important. Farther back is less important, but still carries critical weight to harness.
    No, I am talking about any order of data that you decide to use that can be variable length, be it order 0 and 1 symbol at a time, or order X where X can be several bytes at a time. Not necessarily a window (unless you're talking about an LZ variant for implementation). They would be symbols ranging from 0 to 255 as 8 bit ASCII usually, or 4 bit hex symbols (or words, or double-words, whatever bit fields you want to work with for it). They aren't letters unless you're only trying to compress text, and they are not restricted to the last few characters if you're using other algorithms that save the entire history of all data encountered in a large buffer. You seem to be thinking very limited on this and that only the last few symbols are compressible, but that is not the case, and not how most compressors work. They can work that way yes, but there are bitwise operations and other transforms for data, precomp and converting words into more easily compressed symbols and other things which increase compressibility.

    The distance between data read is not going to be more or less important when encountered unless you have a certain arrangement or prediction of known contexts for it. For example, the word "the" before certain words can literally be expected if you're working with English text just as much as character 32 can be expected after each word, if not a comma or a period or ? symbol which has its own statistical occurrences that can be charted out in the English language.

    While things like this which parse the English language into a much more compressible form COULD be an advantageous application of machine learning to help transform it more easily, those transformations (if a priori knowledge already) could just as easily be hardcoded and then applied before compression to increase compressibility of those words and the data.

    Entire sentences and phrases could be as well, giving greater gains, and wherever you may encounter this, it would literally have nothing to do with the last few symbols you have read, but would have everything to do with the order of how many symbols you're able to read, the expectations and correlations the program can correctly observe between data structures, and its ability to represent that efficiently.

    Again, this can be done with or without the help of any form of AI if the program that is written for it knows what to look for and how to make those changes before compression (and how to return them to normal thereafter).

    If you are dealing with only certain approaches, then in certain cases the most recent bytes and bits will be applicable to those algorithms, if your goal is compression of redundant characters. But if you're sorting data, like with BWT, you'd actually want to get as much data as possible into a block to be able to organize it and put the symbols together where you need them to be. The larger the block size, the more effective the organization will be. The most recent bytes encountered are not even going to be an issue there, because of how that algorithm arranges data, and statistical coding after you've done this becomes the goal rather than pattern matching, because the focus and implementation for compression are entirely different at that point. You're going for counts and nearness of arranged data and representing them with the shortest bit sequences possible, rather than pattern matches and other attempts to compress data where you might or might not use a window.

    "Huffman, Arithmetic and Range encoding, RLE, and LZ algorithms do not have to implement any form of artificial intelligence"
    I can tell you haven't thought this out.
    Then you can't tell very much. I have thought this out for over 31 years and compression algorithms weren't using AI back then for anything to compress data, and most are still not today.

    These algorithms destroy patterns / just fix a human-made issue; let me explain. Huffman and LZ make shorter codes, RLE makes runs of patterns into shorter codes, and Range/Arithmetic coding only stores the error correction for the desired prediction output.
    They don't destroy patterns; they represent them in shorthand. The data you work with is not a human-made issue, because it could be generated by a machine, by man, or by nature itself as random data. Nuclear decay is currently considered a true form of randomness by human perception because it is unpredictable, and it has been the basis for random data generation for years by those who utilize it frequently, such as the RAND Corporation.

    Huffman and LZ try to make shorter codes, but they don't always succeed at that if the structure of the data does not favor it.

    Huffman and LZ methods approach the data as having enough redundancy that a table can be stored up front and representations combined, such that the table and the data together are smaller than the original data, based on exploiting frequency and compressing out whatever was recognizable enough to free up space - and everything above that is your "sauce" or gravy, since it becomes free space on the medium. But if the data you work with is already in a different form, even if it isn't compressed yet, those algorithms won't be able to generate very short codes, and in some cases won't at all. If the algorithm has to account for everything and doesn't get the outcome it was designed to exploit mathematically, then the data will either break even on size or expand as a result.

    The shorter codes from Huffman/LZ/Range coding/Arithmetic fix a human problem, just an inflation; Huffman is almost as optimal as Arithmetic AND Range coding AND LZ - they all do the same thing!
    This is incorrect, but I'll explain why. Huffman is not nearly as optimal as arithmetic or range encoding. Range encoding is an approximation of arithmetic coding that was implemented to get around the patent issues from IBM and others, and its output is close enough to arithmetic encoding's for most purposes. But either one is far better than Huffman, because Huffman must spend at least one whole bit per symbol (a 1 or a 0 in a tree), whereas arithmetic coding can spend fractional parts of a bit. When you compress data with a very good arithmetic encoding scheme, you can see up to 33% better performance on highly compressible data because of this difference. For example, for the ten decimal digits with equal frequencies, arithmetic coding approaches log2(10) = 3.322 bits per symbol. The best Huffman can do there is about 3.4 bits per symbol (or worse, 4 bits per symbol if you keep each digit as a nibble), which is roughly 0.08 bits per symbol more than necessary, and 0.08 bits per symbol that could have been saved over Huffman on the same file or group of data. The information saved adds up to several bytes, kilobytes, even megabytes when the file you work with is large enough. Using arithmetic encoding you can subdivide those ranges further and further according to the frequencies, scaling them to stay above the floor and under the ceiling, so that the more frequent symbols can be represented with 1.66 bits, 0.087 bits, or less, again depending on what you encounter and the frequency counts. Huffman isn't going to give you this unless you include context jumps, and even then you are betting on encountering enough of them to offset the extra table bit needed to make that possible. If you use an LZ algorithm, the best you can usually do is about 2.4 bits per symbol, give or take filters and other conditions. But yes, all of these factors are important.
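
    To put numbers on the digits example, here is a small check comparing the entropy bound that arithmetic coding approaches against the whole-bit code lengths Huffman is restricted to, for ten equally likely symbols (a generic calculation, not tied to any particular coder):

    Code:
    # Compare Huffman's whole-bit code lengths with the fractional-bit entropy
    # bound that arithmetic coding approaches, for ten equally likely digits.
    import heapq, math

    def huffman_avg_bits(freqs):
        """Average code length (bits/symbol) of an optimal Huffman code."""
        heap = list(freqs)
        heapq.heapify(heap)
        total = 0
        while len(heap) > 1:
            a = heapq.heappop(heap)
            b = heapq.heappop(heap)
            # Each merge pushes every symbol beneath it one level deeper,
            # costing (a + b) extra weighted bits in total.
            total += a + b
            heapq.heappush(heap, a + b)
        return total / sum(freqs)

    digits = [1] * 10                          # ten equally likely decimal digits
    print("entropy bound (arithmetic):", round(math.log2(10), 3), "bits/symbol")   # ~3.322
    print("optimal Huffman code      :", round(huffman_avg_bits(digits), 3), "bits/symbol")  # 3.4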

    While the approach of these algorithms and what they do are in SOME ways similar, they do not give the same results, they do not give near the same results (otherwise benchmarks would show a few bytes difference, and not megabytes or gigabytes difference in some compressed outputs), and you can't expect to get the same results even remotely...because if even 1 bit in a data stream is added or subtracted, it will throw the entire data sequence off from what your algorithm expected to work with, and what you get as a result is going to be better or worse because of it, even just because of that 1 bit.

    They give equivalent compression; they have the same idea.
    NO they do not! You haven't thought this through at all, I can tell. You are thinking that Huffman is the same as arithmetic encoding when Arithmetic can reach levels that are fractions of what Huffman can do? You are thinking that Huffman alone can give you the same results as a well-tuned LZ based algorithm that can output codes at a quarter or less the size of Huffman in ideal cases? They do not have the same idea, because they are 3 different ways of working with binary data!

    You need to study Information Theory and start from the start. Study the work and papers of Dr. Claude Shannon and Dr. Fano, and then study Kolmogorov complexity, the Kraft–McMillan inequality, and (one of) the first arithmetic encoding implementations from Ian Witten's paper on it. It wouldn't hurt to look over Charles Bloom's work and codebase, and to pay attention to PPMd and PPMZ among other things for a foundation before you try to move forward. A lot of people read a few paragraphs from Mahoney's "Data Compression Explained" and suddenly think data compression is all about AI and text compression only, and that's not the case AT ALL. I would look over and carefully study that and more before trying to apply... ideas... no matter how based in reality or far-fetched they may seem, because you may find that others have tried those approaches and can either help you with the next step, or save you a lot of time by explaining why they did or did not work in the end, so you don't reinvent the wheel.

    The codes/words we use on Facebook chat are twice as big as they need to be: 'this' is 32 bits but could be 16 bits; if we were aliens we could just change our language to the shortest codes.
    Of course, but people aren't going to run around and stare at each other screaming "1 0 0 1 0 1 1 0! 0 0 0 0 1 1 1 0 0?!?! LOL, but wait, 1 1 0 1 1 0 0 0 0 1 1 1 0 1 1 0!"

    If it were that easy, then we'd be doing that already. But humans are not digital creatures, we're analog in nature, and we respond better to patterns, learned behaviors, stimuli, and sound or expression in various forms that are more analog than not.

    The Chinese and Japanese language (and other asian languages) DO approach this at times, and the use of their language system may be seen in some cases as a form of text compression. Entire events can be written on a piece of paper that symbolize several words to an entire sentence in other languages at times. But that doesn't mean that it's always understandable or as easily distinguished as those longer sequences may be. Sure, you could switch those sentences you recognize in English or German or Italian using a standard alphabet to Chinese and even as unicode maybe save space over the English representation...but things always get lost in translation, and the original sentence may not be able to be converted back losslessly even if "most of the meaning heuristically" can be ascertained and recovered from it.

    And then again, you have issues for context. For example, the Japanese language has different meanings just for the word Kami. Kami no kazi (kaze) or kami no noki which can literally mean the difference between the god of wind who is considered the eldest god of the Japanese Shinto religion, hair on one's head, or if you were to simply say "kami" with no other context, it would mean a sheet of paper. Other caveats would be things like akachan, which is baby but also means the color of red. Then you have to account for word exclusions, like how yon rather than shi is used to count the number 4 in Japanese. The reason they use yon rather than shi is because shi often means death or bad luck, whereas yon does not. Number frequencies will change too based on cultural awareness, perceptions, interpretations, and even superstitions. For example, the number 666 is considered evil to western and many middle-eastern religious subscripts, but 666 would be fine for China and other countries where 3, 6, and 9 are considered lucky numbers. While you would be fine to use the number 4 or represent it in western and European cultures, people in both China and Japan try to avoid it at times because it sounds very close to the word for death, and the very representation of it without being aware of it or giving someone a gift or a piece of paper with a message to transmit information could be insulting or like you're trying to wish something bad when that is not your intention.

    Understanding these differences would be an important use of expert systems and AI learning for sure, but these are all caveats you encounter when trying to condense data and make a one-size-fits-all representation for the languages and cultures across the world today. Unfortunately, deduplication and central representation of one singular language that can be expressed in every other language flawlessly is not feasible, due to the differences in established cultures and civilizations today. If it were, then we might be able to make one very condensed and efficient language for humanity, but the representation for it, the learning curve (and all of the slang that people are going to introduce after it is made!) make it an almost insurmountable situation.

    To address your example, you said above that the word "this" could be represented with fewer than 32 bits. Yes, definitely true. But consider this: You can already represent that exact same word with 28 bits instead of 32 bits and save 4 bits for every 4 ASCII symbols if you know that you'll ONLY be dealing with text symbols from 0 to 127, because you can knock off the MSB (most significant bit) and get rid of the 0 that prefixes the other 7 bits after it where your text data will be held. If your program knows that it'll be reading 7 bits at a time rather than 8, then no AI is required, it just reads 7 bits, adds a 0 to the front, and converts it back to an ASCII symbol to make 8 bits again. Very easy text compression just like that, by eliminating the 0 bit prefix that is not needed for symbols that are less than 128 on the ASCII symbol chart.
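
    The 7-bit trick described above is easy to demonstrate; a quick generic sketch (a string-of-bits implementation chosen for clarity, not efficiency):

    Code:
    # Pack 8-bit ASCII (values < 128) into 7 bits per symbol and unpack it again:
    # the MSB of plain ASCII is always 0, so it can simply be dropped.
    def pack7(text: bytes) -> bytes:
        bits = "".join(f"{b:07b}" for b in text)          # 7 bits per symbol
        bits += "0" * (-len(bits) % 8)                    # pad to a whole byte
        return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

    def unpack7(packed: bytes, n_symbols: int) -> bytes:
        bits = "".join(f"{b:08b}" for b in packed)
        return bytes(int(bits[i * 7:i * 7 + 7], 2) for i in range(n_symbols))

    msg = b"this is plain ASCII text"
    assert unpack7(pack7(msg), len(msg)) == msg
    print(len(msg), "bytes ->", len(pack7(msg)), "bytes")   # 24 -> 21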

    BUT consider this, too: IF you were to get rid of that 0 on the text, guess what is going to happen to all of the precomp tables and algorithms you use to try and get 3 or 4 letter words to compress to 16 bits? They won't work anymore because now those 7 bits at a time will be shifted 1 bit to the left for every single symbol, creating an entirely different model that will be full binary data now because the MSB will no longer always be a 0. While you would get compression by converting it from 8 to 7 bits and then to ASCII, you would be losing out on the other methods which may very well outperform that without that transition made first.

    You could try to rewrite the algorithm and precomp of your text to accommodate for those 7 bit symbols rather than 8 and still compress them, but you're going to get 1 bit less compression per symbol after conversion, and 3 symbols would only be 21 bits used rather than 24, meaning your tables that may have had that extra space to fit those extra frequent words might no longer fit. Is it going to compress as well still or not? It depends how you model it from there.


    As for RLE, you could first do BWT, which is 'looking' for patterns, to find high probabilities to arithmetic-encode; this is similar to short context windows on the previous text, because they are linked to their associations to predict. Now this, BWT+RLE, is destructive, like the various n-gram methods, and is part of the AI pattern searching. Same thing: patterns; AI. So the human-made issue is just that, while BWT+RLE and n-gramming are both pattern searching. Of course they are, all similar.
    You would want to use RLE first, and then BWT and then MTF, because you can represent large RLE sequences with only 1 or very few symbols for large reps, and then exploit the delta spaces between the RLE outputs with what BWT doesn't pick up and squeeze a little more out of it by keeping them as close to the front as possible. There may be types of data where this isn't as efficient, but for many types it will be more than not. BWT and RLE have nothing to do with AI pattern searching. RLE is just encoding of long runs to a concise representation of how many symbols or bits are seen after it, while BWT has a computable and reversible structure (you can lay the data for it out in rows on a table to see how). MTF is mostly computable as well. What is not currently computable is Kolmogorov complexity and applications revealing it with the Kolmogorov-Smirnov tests, but maybe one day that will change. I won't say that it is never computable, but only that it is not yet computable.
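
    For anyone who wants to see those stages side by side, here is a deliberately naive sketch of the forward transforms (RLE, then BWT by sorting all rotations, then MTF). Real implementations use suffix sorting and more careful run encoding; this is only meant to show the shape of the pipeline and what would then be handed to an entropy coder.

    Code:
    # Naive forward transforms for illustration only: byte-level RLE, then BWT
    # (by sorting all rotations, fine for tiny inputs), then move-to-front.
    def rle(data: bytes) -> bytes:
        out, i = bytearray(), 0
        while i < len(data):
            j = i
            while j < len(data) and j - i < 255 and data[j] == data[i]:
                j += 1
            out += bytes([j - i, data[i]])      # (run length, symbol) pairs
            i = j
        return bytes(out)

    def bwt(data: bytes) -> tuple:
        rotations = sorted(data[i:] + data[:i] for i in range(len(data)))
        index = rotations.index(data)           # needed later to invert the BWT
        return bytes(r[-1] for r in rotations), index

    def mtf(data: bytes) -> bytes:
        alphabet = list(range(256))
        out = bytearray()
        for b in data:
            i = alphabet.index(b)
            out.append(i)                       # small numbers for recent symbols
            alphabet.insert(0, alphabet.pop(i))
        return bytes(out)

    stage1 = rle(b"aaaaaabanana_bandana_banana")
    stage2, idx = bwt(stage1)
    stage3 = mtf(stage2)                        # now ready for an entropy coder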

    That said, neither these algorithms nor Lempel-Ziv based sliding windows have anything to do with Artificial Intelligence. Artificial Intelligence is something that is capable of encompassing machine learning algorithms to increase its intelligence and "think" of and create ways to handle new data. The majority of the algorithms we use today do not do this, and do not need to. They could, but if they did, often times they will be returning to my numerical analogy outlined on the previous post. These algorithms are entirely different from one another and are not the same. The only thing they have in common is the goal to compress data, but the way they go about it is different in each case. They may share properties from one another, but their likeness ends at their differences and the efficiency of one over another.

    "Another example of this is"
    Yes, making sure it actually compresses wiki8 (and other human/real-world data that HAS patterns in it) is mandatory. And yes, say you landed on a good compressor: the wiki8 compressor you make is a pattern searcher, so it should work well on average on other human/real-world data observations (depending on whether the text has fewer patterns or is simply too unseen). If you use a text compressor on video data, it first needs code changes, but the idea is the same: consider nearby context, frequency, grouping similar things, e.g. animals. In vision, any pixel is meaningless without context. All data in the universe has context and will show many patterns; every word in the English dictionary explains the others.
    If you're only trying to extrapolate data to apply it to a type of learning engine, then you will need contexts. But if you're not, the data may still be compressible without a single shred of AI implemented, and it will still be very meaningful. Only to an AI system that requires grouping and "understanding" the data it processes to make "guesses" about it and give you the results you're looking for would data without context be meaningless. The universe exists with or without that implementation of AI at all, and it has a natural pattern to that data whether we, or any AI program that we make, ever understands the data that it compresses or not. Sure, being able to assign logical labels to sequences does make it more efficient and compressible for that SPECIFIC type of system, but a way of compressing data more efficiently - without ever employing an AI algorithm to do it - may still exist and give you better results by not trying to arrange the data, but by going smaller.

    Physical mediums did this for us with the advent of CD technology. And then DVD technology, having the ability to fit up to 7 700MB CDs to a single-sided DVD, and nearly 14 CDs to a dual-sided DVD. And then Blu-ray fitting 25GB per side, and 50GB per dual-sided medium. No AI was used to make that data smaller. It physically was able to BE smaller on a medium simply because it was represented in smaller space than a magnetic medium of the time or punch tape. It is possible to write data to molecules (IBM proved this) and nanostructures now, but the stability and practicality of that as a storage medium is not yet ready for the public or even the commercial business sector because of the large amounts of data stored that would be lost if anything goes wrong without a backup, and how different it is for people to adapt to.

    You can physically compress data by making it represent less space on a medium, or you can algorithmically compress it to utilize less space mathematically on any size medium. Neither of these use nor need anything with regard to AI or machine learning, even though they could possibly benefit from it down the road in some way.

    You don't know of all the data in the universe, so for you to say that 'all data in the universe has context' is preposterous. You have to observe that data first and know what it is you're observing to honestly, fairly, and accurately make a statement such as that. Anyone would. To make a blanket statement like that about data in the universe is no different than a scientist in the 1500's saying we are all riding on the back of a giant turtle floating through space on a bird, and if anyone questions it they are a heretic lol. We know that we're not flying in space on a bird or on the back of a giant turtle, but to make that assumption and then try to get the rest of the world to go along with the notion of it (a lot like how some people try to force-feed AI to the masses as data compression) is folly and does more harm than good in the end, because some will know better, but many will not, or will be misled until they realize it's "possible" but neither necessary nor likely.

    "The application of learning n-grams can be useful - to a point" "expert systems, neural networks, and any other approach"
    Yes, e.g. n-grams have a just-right point before growing 10x bigger than wiki8 itself. The point is that n-grams and the other mechanisms, all set to the just-right settings, result in the littlest data yet the most knowledge about wiki8. GPT-2 (see the openAI website and test it at TalkToTransformer) is similar to the best wiki8 compression benchmarks (n-grams far back, grouping words), and GPT-2 is a much better predictor while being efficient in speed and memory. GANs compress data: the predictor is shrinking and trying to learn the real-world data so its outputs look 'real', not fake. So it gets a lot for a little, and understands the data patterns. Transformers have now been used on wiki8 to achieve really great results: https://openreview.net/pdf?id=Hygi7xStvS And yes, more data is exponentially better because every fact you know enables 10 questions to be answered; patterns.
    Part of the problem is what Shelwien addressed earlier, though. Those n-grams are not just tuned to look at letters and numbers and words, but they're tuned to XML data and page breaks that don't have a real meaning outside of enwik8 or enwik9 that they are used on, yet "compression occurs - for that file" because those things are present along with it to make it more readable. The perpetual benchmarks we get with new improvements are not just for the natural language itself compressed, but are (what would the right word be here to say?) "polluted" or "contaminated" by the HTML, XML, and other things that the system is "learning" to compress, which don't have anything to do with the language itself and won't be there much, or at all, in other applications where that language is recognized and attempts are made to represent it more efficiently.

    The universe laws are predictable.
    NO, they are not. If they were, you wouldn't be here and neither would I. The laws are NOT predictable if you don't know what the universe IS and how it works! This is the #1 fundamental error that people don't seem to get. With the inclusion of new-age quasi moonbeam 70's hippie stuff, a lot of real science and mathematics gets tossed to the wayside or corrupted by it, and humans then laughably think they know everything about the world when they can't even get beyond the Van Allen belts and the magnetic radiation that holds them and keeps them confined to Earth, which they can't identify. If they knew what the universe was and its laws, then they would know how to get humans beyond those belts and not just heavily shielded electronics. It's a problem that NASA has admitted in recent years they've never been able to solve. Even with all their experience, they don't claim to know the laws of the universe nor do they attempt to insult it by saying or even pretending it is predictable. Why do you?


    What I mean by digesting more data here is that it gives you better probabilities, even for longer n-grams which have few observations (order-90...). Sure, you can ramp it up, but the Hutter Prize is about finding the best beast before any ramping up, because it is smallest yet does a lot of damage (or healing, I should say; prediction is regeneration/rejuvenation).
    What is next? Are you going to associate Hindi Prana to Kolmogorov complexity? Is DATAFILES/16 and WEB technologies back or did I miss something?

    Evolution is also exponentially faster near end because of that extra context we are sharing/thinking of. So yes, growing the brain is absolutely part of it, no one said we just want to digest wiki8 in the smallest way, we want to grab wiki9 and Mars data and grow the n-grams and probabilities and then Extract free fuel/wisdom from the Learnt network itself.
    Yes, growing the brain may help a lot here.

    "That said, there isn't going to be a Utopia"
    I was just sharing the trend where humans find patterns and grow more confident in predictions: we are able to breed new humans and ideas faster to be mutated, and we are able to Extract new knowledge/intelligence from known data, e.g. if cats eat and cats are dogs, then dogs eat. We are getting better at Darwinian Survival and are breeding/repairing faster than we are dying.
    Darwin's approach was disproven. At the end of his life, it was Charles Darwin himself who said it was not correct, because there's been too many variables that were the opposite of his assumptions and early conclusions. His work did help advance things certainly, but it was a dead end and knowing that it was, we have been able to move forward to new ways to analyze things and not be stuck there.

    What you are talking about has nothing to do with data compression, and is merely a figment of human fantasies for Transhumanism instead.

    To repair/regenerate, we use context.
    No, we don't. The body uses the immune system with a priori knowledge of the body it maintains to repair it. There is no conscious element required for this to happen. If there were, people in comas with scrapes and bruises would still have them. But they heal physically whether they are awake or not because it is not a conscious process, but an exclusively biological one.

    So our big data these days is making us come close to the nanobots, where instant regeneration occurs and they are extremely hard to destroy. How can you catch all nanobots? A butterfly net? And they repair near instantly, as their 'hospitals' are general and everywhere. They understand the data/real universe extremely, extremely well. Sure, the opposite could happen, decompression/expansion where we eat too much energy/matter and grow like a star and burst into heat from gravity, giving it back,
    Again, you're talking about Transhumanism, not Data Compression, and your rant is better reserved for the Off Topic area and or a subreddit of Star Trek and new developments with nanotechnology in the real world, than it is trying to "make" it have a connection here when it otherwise does not. Nanobots do not "understand" our universe any more than a virus does, a bacterium, an ant, a fly, a rodent, a cat, or a dog like Spuds Mackenzie. They are designed to do what they do in nature and by those who are impressionable to them in their world and environment. They are not out there creating nebulas while they are feeding on microorganisms or licking your hand. They're just not, and you need to understand that before you continue commenting in a DATA COMPRESSION FORUM.

    compression>decompression. Or big universe crunch. But maybe, there is an end of evolution, equilibrium of physics, where we have the approx. balanced compression/decompression ex. the huge nanoblob world would eat a lot and dump a lot, and we compress data a lot and extract a lot of insights.
    There has to be a beginning to an evolution before there can ever be an end to it.

    "The key "sauces" are anything and everything that works."
    Nope, patterns; didn't I say there are trends? Sure, you can use a hammer or rock or rod to break a glass window, but patterns exist.
    (sarcasm) Right....because it has nothing to do with the volume of data on how much you can compress anything either....it is merely about patterns, and star gazers, and battlestar galactica's interactions with evolution on the star planet of Ryza 4 where Picard and Jordie found a clone of data manufactured by the universe that Data's cat Spot understands perfectly because nanobots and singularity. Right. Got it. (/end sarcasm)


    "Context modeling is currently king for size, while other methods that don't compress as much but are very fast and are able to read large blocks quickly are king for speed."
    I agree a speed increase can result in a memory increase, because you need to extract things you *do *not *have. While if you do have them, it's fast but big memory. When you break this law, you actually have them all but they are just sorted better, e.g. "the and it he boat zoo Mars grommet" vs "grommet Mars zoo the he boat it and". Same memory, but slower in the last sort. But we saw compressed data can be fast to decompress? Yeah, but with wiki8's 100MB there's no decompressing, hence it is fast to scan for X. So we want to find a compressor for AGI that extracts answers/data fast. One thing: extraction is faster if you understand the data, because Brute Force is very slow to give you the output you want.
    You can simulate RAM with an SSD if you have enough space or an array to not have to worry about those issues, but most of the compressors are designed to use MALLOC and not a ram drive because most people don't have storage arrays nor sufficient storage to do that, whereas RAM they usually are expected to have plenty (over 2GB on the low end and near 32GB on the high end per the time of this post).


    So intelligence, per evolution, is all about efficiency, like computer chips: small (compressed) and speedy. As I said, you can know 100GB and know most of the universe's patterns, so that is pretty fast for extracting whatever you need using 1) little data and 2) compressed little data. So we want compression and speed, and intelligent methods do exactly that: they give you the compression, like Brute Force would, but fast. If I were to work on a BWT compressor on bits it would give some fair compression, but it isn't fast now, is it?
    Intelligence is not interdependent upon evolution nor is evolution interdependent upon it more than adaptability, and the fact that evolution has not been proven but was mostly disproven by its own creator. Intelligence is real, but it did not need anything exterior to validate it other than its application in the world by living creatures to do work, and incrementally do that work more efficiently.

    You don't know what the universe is, but you claim that you can do anything in the universe now that you have 100GB of data? OK, good luck with that then lol

    "We's" been looking" "There isn't a silver bullet yet"
    As said, our own brains are fast compressors and extractors. We can understand a lot of the universe (it works on any dataset; we digest it and build) if we study lectures from diverse domains for a few years to build a mental Theory Of All about the laws of physics.
    That isn't us interacting with the universe nor understanding it and how it works. You have to understand how small human beings and creatures are on this planet in contrast to the spatial bodies we are aware of, not even mentioning the ones that are out there that people are not aware of yet. What humans are understanding more of incrementally is their own environment and how to grow and adapt to it to do things. The universe may very well have given humanity and all life on this planet some very nice things to work with, but it's almost childish for one to say that they can pretend to model the universe based on models given to them on the Earth.

    That is tantamount to a young kid saying they know exactly how the world and banks work because they played Monopoly with a friend once and bought a house on Park Place. That is not how banks and money work; they wouldn't know about the Federal Reserve or currency exchanges, nor stock options and puts, or how to deal with investors or anything like that. They would THINK they knew because of what they had available to them and their desire for that to be the biggest thing in the world, but the reality is that it is not even 1/10000th of what is really out there in the world. How much more exists in the universe and the hundreds to thousands of other planets undiscovered, then? It is foolish to pretend that one knows the universe based on the things they do in some human way. While we are here in a PART of the universe and interact with a FEW or SOME of the laws that it has implemented, humans cannot know how great this system's extent is, nor how far out it extends.


    "We're not re-churning or reprocessing any data on the Earth one way or another"
    Earth is evolving, we generate data, share it, compress it, then extract yet more insights, recursively self improving our "intelligence" (intelligence by definition is a term that refers to data).
    More Ray Kurzweil Transhumanism, not anything spoken of Data Compression. If anything, humans are recursively destroying themselves...not improving nor "evolving".

    "And yet still, the information that is filled in to make sense to the beholder for the situation of lossy representation may not even be the same lossless data that would exist if it were used to make sense to another individual who sees it differently."
    Yes our knowledge makes us think/predict what we even see, but most humans and animals have a similar model of the world and each human is weighted in and averaged lol.
    That is mostly true, but not every human or animal will. Even with twins, if two identical twins see and have enough outside stimuli which is different from their combined environments, they will start to develop differences based upon those deviations around the likenesses they continue to maintain. There have been a few scientific studies upon this, and that was the general consensus and conclusion observed between all of them.

    "Assuming that random data is random and therefore not compressible is yet another area to be careful when walking along, because what seems to be "random" may not always be IF you can find a pattern for that type of data produced. The repetition in the data is not going to be there and yeah, the algorithms you use that exploit that will have harder and harder times achieving compression because of it. But that doesn't mean that the data has become random or that it has become uncompressible in every case, just that the methods which are currently known and used for it will have an increasingly harder time to get larger compression ratios out of it as it reaches the limit for the algorithm used currently for compression."

    No, random means random in our discussion here. Take a global bag of context, e.g. Earth or the solar system, and count its occurrences. Random is something that is not very compressible. Hmm, but then there are patterns everywhere - hills of sand on Mars, mountains, shadows, liquid, splashes, fractures, electro-magnetism - so if there aren't 'really' any random patterns, then actually I may now opt for the conclusion that there is no random, because all particles are elementary types and can be positioned relatively to become less random.
    One person's random is another person's pattern given the right situations, tools, and opportunities. Yes, there are patterns in everything, whether they are observed or not, but that doesn't mean that data is going to always be compressible because volume and medium for data containment are equally important to address. What good is storing half a bit somewhere if there is nothing to store a bit on?

    Now, this 'no random data' conclusion doesn't mean infinite compression is permanent. Evolution doesn't make Earth evolve into a perfect homogeneous solid; it has to be able to defend against death, like in evolution. So the best 'morph' to become is one that is defensive/rejuvenating and has the most patterns (a general nanobot fabric, all connected). It knows there is another of the module x distance away, while sub-units know there is another of themselves in the bigger unit. So: patterny, while defensive/repairing by knowing the right knowledge and having approximately the right technologies. Of course surrounding matter is not transformed yet, so it's a never-ending battle for life in the future.
    Yep, I think you need to find a good transhumanism forum because you're talking about that and nanobots, not Data Compression and Data. Maybe the nanobots will Mighty Morphin' Power Rangers into supercomputers that beat Google's current qubit machines, but until then, I'll leave that up to you to expand upon.


    "You can make an AI program that "learns" which numbers will add up to 4,294,967,296"
    Yes. And yes, knowing that 65,536*65,536 gives the answer is faster. You can store it for future use. But how do you find the answers faster, in polynomial time, these days? We have big data these days. We can try dividing the number or finding the square root, e.g. try 60 as the square root; too low, so try 600; still too low, try 4,000.....70,000...now too high, try 60,000...eventually you find it by Binary Search. See how my knowledge allowed me to discover how to find this in just 2 minutes? I don't usually work with hard math so this was exciting. (I just work with other formats/symbols, like visual images of dogs/cats drinking or text words saying who does what. Works because of redundancy.)
    Those methods can be done yes, but the trial and error for each transaction (even when arriving at the correct answer) are going to be slower than the fastest route to the answer mathematically.
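    For what it's worth, the guess-and-narrow routine described above is just a binary search for the integer square root - a quick sketch, not anyone's actual code:

    import math  # only used for the sanity check at the end

    def isqrt_binary(n: int) -> int:
        """Binary search for the integer square root: the 'try 600 ... 70,000 ... 60,000' idea made exact."""
        lo, hi = 0, n
        while lo <= hi:
            mid = (lo + hi) // 2
            if mid * mid == n:
                return mid
            if mid * mid < n:
                lo = mid + 1      # guess too low, search the upper half
            else:
                hi = mid - 1      # guess too high, search the lower half
        return hi                 # floor of the square root when n isn't a perfect square

    print(isqrt_binary(4_294_967_296))          # 65536, found by repeated halving
    print(isqrt_binary(10) == math.isqrt(10))   # True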

    "It isn't the "sauces" at all, but the way people prepare them in a meal that makes or breaks your dinner experience. "
    It is the sauces, and their relative positions, just like in the GPT-2 Transformer architecture.

  12. #10
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,975
    Thanks
    296
    Thanked 1,302 Times in 739 Posts
    > But, will I get the same compression if, instead of 8bit>8bit I instead do 15bit>1bit?
    > It's same as order-1, the length is 16 bits, but most the context is the 16 bits.

    16-bit context with 8-bit symbols would be order2 and would usually provide better compression than order1.
    16-bit context with 8-bit text parsed as 16-bit symbols would be order1, but compression won't be good because of misalignment.
    16-bit context with 16-bit unicode symbols would be order1 with slightly better results than 8-bit order1 with utf8 text.
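    Roughly, as a toy illustration (made-up Python, not code from any real coder), the "order" is just how many previous bytes form the context key:

    from collections import defaultdict

    def order_n_counts(data: bytes, order: int):
        # counts[context][next_byte]: the context is the previous `order` bytes,
        # i.e. 8 bits of context for order1, 16 bits for order2, and so on.
        counts = defaultdict(lambda: defaultdict(int))
        for i in range(order, len(data)):
            counts[data[i - order:i]][data[i]] += 1
        return counts

    text = b"the theory of the thing"
    o1 = order_n_counts(text, 1)     # 8-bit contexts
    o2 = order_n_counts(text, 2)     # 16-bit contexts: sharper statistics, but more contexts to learn
    print({chr(k): v for k, v in o1[b"t"].items()})
    print({chr(k): v for k, v in o2[b"th"].items()})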

    > Is that all you do?

    Mixing of order0..order-N (paq has up to order15 or so) is the most basic model which is a CM equivalent of LZ string matching.
    But paq8 has hundreds of other contexts and whole submodels.

    > Or do I have to slap on a logistics activation too and divide something in half lol and round it?????

    Even if you have to, there're mathematical reasons for that, for example: https://encode.su/threads/496-Paq-mixer-theory
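    Roughly, logistic mixing looks like this bare-bones sketch (made-up weights and learning rate, not the actual paq mixer code):

    import math

    def stretch(p: float) -> float:
        return math.log(p / (1.0 - p))          # probability -> logit domain

    def squash(x: float) -> float:
        return 1.0 / (1.0 + math.exp(-x))       # logit -> probability

    def mix(probs, weights):
        # weighted sum of stretched predictions, squashed back to a probability
        return squash(sum(w * stretch(p) for w, p in zip(weights, probs)))

    def update(probs, weights, bit, lr=0.02):
        # nudge each weight toward the submodels that predicted the observed bit well
        err = bit - mix(probs, weights)
        return [w + lr * err * stretch(p) for w, p in zip(weights, probs)]

    probs = [0.9, 0.6, 0.3]        # three submodels' P(next bit = 1), made-up numbers
    weights = [0.3, 0.3, 0.3]
    print(mix(probs, weights))     # mixed prediction
    weights = update(probs, weights, bit=1)
    print(mix(probs, weights))     # slightly higher after the models that said "1" gain weight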

    There're also various other components like different kinds of counters (size/precision/tunability tradeoffs),
    secondary models which take primary predictions as context (SSE/APM), static and adaptive extrapolation functions, etc.
    Even whole non-CM models like DMC or PPM, or NN components like LSTM.

    > I understand Arithmetic Coding already and mixing.

    Well, you can read Mahoney's book: http://mattmahoney.net/dc/dce.html#Section_4

  13. #11
    Member
    Join Date
    Jan 2020
    Location
    Canada
    Posts
    142
    Thanks
    12
    Thanked 2 Times in 2 Posts
    nooo, Shelwien, that wasn't my question. Matt made butterflies in my stomach when he said 'predict 1 bit at a time'. It's simple, better, all while sounding daring. Currently my code takes a look at 1 char (8 bits) and predicts the next 1 char (8 bits). I was thinking, hmm, what if I take a look at 1.9 chars (15 bits) and predict the next 1 bit (1 bit)? All I do is change the size of the windows here and I'm predicting the Next Bit. But will this allow me to do it? Sounds too good to be true. And will it give me the same compression I currently get, or worse/better?

  14. #12
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,975
    Thanks
    296
    Thanked 1,302 Times in 739 Posts
    > I was thinking, hmm, what if I take a look at 1.9 chars (15 bits) and predict the next 1 bit (1 bit).
    > All I do is change the size of windows here and I'm predicting the Next Bit.
    > But will this allow me to do it? Sounds too good to be true.

    Depends on your implementation?

    For most files byte alignment matters (bit's position in a byte),
    so you can't just use 15-bit prefix context to predict the next bit all the time.
    Normally you'd have to use a byte-aligned context (15 bits taken from previous bytes)
    _and_ 0-7 bits of extra context within the current byte.
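    A rough sketch of what that context might look like in code (all names, sizes and the update rate are made up for illustration; a real coder would also need the arithmetic coder itself):

    import math
    from collections import defaultdict

    def ideal_code_length(data: bytes, prev_bits: int = 15) -> float:
        """Toy bitwise model: context = low `prev_bits` of the previous-byte history
        plus the bits already seen in the current byte (byte-aligned, as described)."""
        prob = defaultdict(lambda: 0.5)         # context -> adaptive P(next bit = 1)
        hist, bits_cost = 0, 0.0
        for byte in data:
            partial = 1                         # leading 1 marks how many bits of this byte are known
            for i in range(7, -1, -1):
                bit = (byte >> i) & 1
                ctx = (hist & ((1 << prev_bits) - 1), partial)
                p = prob[ctx]
                bits_cost += -math.log2(p if bit else 1.0 - p)   # what an arithmetic coder would pay
                prob[ctx] += 0.05 * (bit - p)   # simple adaptive counter
                partial = (partial << 1) | bit
            hist = (hist << 8) | byte           # history advances a whole byte at a time
        return bits_cost / 8.0                  # rough output size in bytes

    data = b"abracadabra abracadabra abracadabra"
    print(len(data), round(ideal_code_length(data), 1))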

    > And will it give me the same compression I currently get or worse/better?

    1) If you'd use an 8-15 bit aligned context to encode bits, it would be a normal bitwise order0.
    You can see the related experiment here: https://encode.su/threads/1052-Bytewise-vs-Binary
    In short, working with byte alphabet gets you slightly better results on text,
    while working with bits gets you a more universal coder (better results on binaries).

    2) Using a 15-bit aligned context + 0..7 extra bits of a current byte would
    be closer to order2, and would thus give you much better compression than order1
    in most cases.

    3) Just using a 15-bit unaligned context for every bit would give you worse compression
    than normal order1 on most files, although it may be good for files that are naturally bitwise
    (1bpp pictures, data compressed with huffman).

    In any case, the compression reachable with a single context is limited.

  15. #13
    Member
    Join Date
    Jan 2020
    Location
    Canada
    Posts
    142
    Thanks
    12
    Thanked 2 Times in 2 Posts
    I know 15bit context to predict 1 bit looks like order2 but it has to be as good as order1 because my current order1 is 16bits in length as well. Order2 would be 3*8 bits; 24 bits.

    Well, I'll give it a shot.

  16. #14
    Member
    Join Date
    Jan 2020
    Location
    Canada
    Posts
    142
    Thanks
    12
    Thanked 2 Times in 2 Posts
    I won't talk 'LSD troll' in ANY future posts and will stay concise, but compression/expansion is all throughout the universe; atoms and planets become unstable when too large and self-extract radiation. During growth, they grow faster the bigger they are, because an object x the same distance away from, say, a moon or sun has a difference in gravity pull. Magnets attract/repel. High-tech cities grow faster; new skyscrapers are built. The combinational interactions in the core heat it up. Like data compression, you can compress/extract *more* the more data you have. We also extract energy from nukes, gasoline, food, and fire extracts energy. Like fire, cells spread too and burn energy; all we/nanobots will do is grow. But we had better find an equilibrium between compressing/exploding, since both kill you/data. Like a friend network, you can gain a small fast team by firing bad ones.

    The top benchmarks (I took a look again: http://mattmahoney.net/dc/text.html) all use either CM, PPM, LSTM, or Transformers (https://openreview.net/pdf?id=Hygi7xStvS), and Transformers don't use handcrafted rules as much.
    The understanding here is context, frequency, grouping words like he/him/her, etc. I know it looks 'simple', but that IS AI. Let me show you again OpenAI's GPT-2: https://openai.com/blog/better-language-models/ This is predicting the Next Word; if the top benchmarks did this then they would be fairly AI, but I assume they don't, or at most only partly do. However, we sorta know what GPT-2 does here: it can recognize unseen context by using grouping etc. and can be confident what the Next Word is. This is exactly what we want. We are basically doing this. Top benchmarks don't use ANNs much, I agree; they try to be simple, efficient, adjustable. But once they do long-term dependencies they too will predict even better. Yes, we make the AI, however 'it' does do something for us: it looks at context and can even recognize never seen/known context if done right. Once the AI starts extracting data, it can be used to make New discoveries.

    Yes whole paragraphs even could be pasted. So how does an AI look far back and do that? Humans seemed to latch on. Have to think about this.

    Yes, BWT after you RLE/MTF, but can you explain how to run this monster on 800,000,000 bits (wiki) so that it re-organizes bits, not letters, in practical time? It takes so long to run. So you can only do it in chunks. Does the highest wiki8 or wiki9 benchmark use BWT? If not, we can't say it's necessary.

    "Huffman, Arithmetic and Range encoding, RLE, and LZ algorithms do not have to implement any form of artificial intelligence"
    AC/RC simply store correction to favor the correct prediction. This isn't the AI part.
    HC/LZ make English words/codes smaller like Byte Pair Encoding does (see the rough sketch below); actually this is important 'AI', I take back what I said in my last post.
    RLE is similar, but it seems *more* destructive, and is probably not used in the top-most benchmarks (?).
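    Here is the rough sketch of a single Byte Pair Encoding merge referred to above (toy Python, not any particular implementation; real BPE repeats the merge until no pair is frequent enough):

    from collections import Counter

    def bpe_one_merge(tokens):
        # count adjacent pairs and replace the most frequent one with a new combined symbol
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            return tokens, None
        best = pairs.most_common(1)[0][0]
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == best:
                merged.append(tokens[i] + tokens[i + 1])
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        return merged, best

    tokens = list("the_theory_of_the_thing")
    tokens, rule = bpe_one_merge(tokens)
    print(rule)      # ('t', 'h') -- the most frequent adjacent pair
    print(tokens)    # 'th' now appears as a single symbol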

    Yes, AC can sometimes be 33% better than HC, e.g. you have 1025 unique code IDs and each must be 11 bits with a fixed-length code, while 1024 IDs would require only 10 bits per word/code.
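    Reading that as 1025 equally likely IDs, the gap is easy to check (a fixed-length code has to round up to whole bits, while an arithmetic coder only pays the fractional entropy):

    import math

    symbols = 1025
    fixed_bits = math.ceil(math.log2(symbols))   # 11 bits per ID with a fixed-length code
    ideal_bits = math.log2(symbols)              # ~10.0014 bits per ID, assuming equal frequencies
    print(fixed_bits, round(ideal_bits, 4))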

    Yes, you can make a movie disc physically smaller, or make the data on it smaller in size.

    If you compress 100MB into 15MB, or say (if utterly maxed) e.g. 6MB, this is 'random' data that *can't* be losslessly compressed any further. Yes, algorithms differ on how to do it, but because they have the same goal they have similar ideas; context = patterns. You could destroy a small poem and some girl will re-create it in 500 years, it just takes longer. You could have a small molecule, unique to the whole universe, destroyed, but in 500 years the exact (or a much more approximate) one emerges in another galaxy, meaning physics really has the 'lossless' data all there. The hypothetical 6MB random wiki8 isn't actually wiki8, and it will take you longer to re-generate it back from the dead. Simply put, your computer (mostly) has a limit on compressing it.

    Of course, if you run a brute force that goes through each movie, life, activity, it can run everything from nothing; a man drinking beer and then riding his bike. But the universe isn't exactly a brute force. Sorta is, because watch: without a world of particles you won't get each animal/activity or item sold in stores. So to evolve data you need a big enough system, and to move faster (we want this one) we need already-made context bags and Not start from scratch like cavemen. So this defines compressibility (you can store wiki8 as nothing, because the Earth system, if big enough, IS wiki8 (at least a bit of wiki8 will be re-created)), while it shows speed is slow if you don't utilize the sorted data of Earth.

    We want to losslessly compress wiki8 on computers though, and quickly extract insights/wiki8, so we MUST store some already-created advanced algorithm, else we'd be in caveman days and it takes longgg to re-generate anything. So it seems we can compress lots *while* extracting fast, as long as you have a good algorithm. Why? Because yes, wiki8 is so destroyed, but you have the large-sized 6MB predictor which can be organized such that, unlike evolution, it has some nifty high-tech ideas. So it's a mix: destruction of organized data, but it still has high-tech future data. Yes, any shrinking = slower; we could go lower but slower! There could be a just-right ground that's feasible.

    I meant, given a brain the size of Earth, filled with diverse trained knowledge, you would approximately understand/be able to predict most of the universe. Of course this won't work fully; you can't see what Joe is doing in galaxy 7. But it's mostly true: you know Joe is eating, sleeping, interacting, sending communications to other agents.

    Oh. Yes, our compressors are tuned to e.g. wiki8, and while they should be good predictors, and even better once they read more than wiki8, they do retain crappy data, like ASCII castles in wiki8 and other beliefs/hobbies; humans ignore a lot of data. But remember what I said: we are looking at the power - how much data you can extract from so little data. Then we will fine-tune it to become AGI.

    "No, we don't. The body uses the immune system with a priori knowledge of the body it maintains to repair it. There is no conscious element required for this to happen. If there were, people in comas with scrapes and bruises would still have them. But they heal physically whether they are awake or not because it is not a conscious process, but an exclusively biological one."
    Erm. There is no consciousness. We are machines. A brain that is a hierarchy of context self-organizes (just physics) to propagate brain waves faster. Body wounds heal by self-organizing too. They consider context, I bet. So do friend networks. Nanobots would act as a swarm of friends/neurons too, not individually stupid like you think.

    "saying they know exactly how the world and banks work because they played Monopoly with a friend once"
    You missed my point. More data doesn't just give you more data; it exponentially gives you more the more you have. An Earth-sized brain would "know" or 'deal with' a lot of dangerous cases easily. All intelligences have only the goal to survive. AI = contexts = survive death. We love sight-seeing and eating because it gives us more data, life extension, children. It's one big data collection/communication that spreads. Our data tech these days is building on itself rapidly; mobile phones are small, fast, and powerful.
    Last edited by Self_Recursive_Data; 14th January 2020 at 02:55.
