Results 1 to 12 of 12

Thread: Random questions

  1. #1
    Member
    Join Date
    Jan 2020
    Location
    Fl
    Posts
    36
    Thanks
    1
    Thanked 3 Times in 3 Posts

    Question Random questions

    1 What can compress binary files or numbers best?

    2 Why is Zip the only format that Windows can open, and isn't there a way to open 7-Zip files like zip files in Windows?

    3 Is there any way to make compressed image files viewable like regular image files on Windows and other OSes? Or can they not handle it since it would take too long to decompress?

    4 Who sets the standard, MS? And if so, how would they implement other formats like 7-Zip as a standard?

    5 What do commercial companies use for compression?
    6 Do companies have to pay a fee to use open-source programs?
    7 Would you have to sell your compression program door to door like a salesman for companies to use your open-source program?

    8 Why do companies like WinRAR even bother to sell their programs when open-source ones are better?

    9 Haven't WinRAR and other paid programs been reverse engineered? Or is using them not allowed even if you know how they work?

    10 Let's say you want to compete with the others and keep it free for everyone, but still want money. What do you do?
    Use the honor system, say it's free for non-corporations, and hope companies pay?

    11 Why does "size on disk" sometimes remain the same when two files differ in size by 300 bits? I assume block size?

    12 When using 7-Zip it seems like bzip2 is better than LZMA2 (the default). Would anything be better?

    13 Can't the program automatically select which one to use? As 7-Zip states, "You can get big difference in compression ratio for different sorting methods, if dictionary size is smaller than total size of files."

    14 Why does compressing 2 files at the same time produce a bigger result than compressing the 2 files separately?

    Besides programming, what other skills do most people here have? Seeing things from a different perspective is how new things get found.

    When people have to follow the standard road that everyone has already discovered, it is harder to find something new than if they go off the path into uncharted territory. You could even drive your car in a grass field, or on the sidewalk in reverse. I am not saying you should do that with a car, but try something with ideas even as silly as that may sound.

    I like the passion people have on this site.
    Thanks

  2. Thanks:

    Self_Recursive_Data (6th March 2020)

  3. #2
    Member
    Join Date
    Jan 2020
    Location
    Canada
    Posts
    142
    Thanks
    12
    Thanked 2 Times in 2 Posts
    > 1 What can compress binary files or numbers best?

    High-end data compressors use arithmetic encoding to store the file: prediction drives the coder to produce a single number (e.g.
    0.4574843633758), which is then written out in binary (e.g. 1000101110). https://www.rapidtables.com/convert/number/decimal-to-binary.html

    That's the best approach. Now, once you do get the encoded number, e.g. 0.4574843633758, it is mostly random, so it is not easy to compress it further using math tricks like division etc. Rarely you can; some inputs allow a lot of compression, e.g. if your file was "aaaaaaaaaaaaaa" or "12345678910111213141516".
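    A minimal sketch of that idea in Python, for illustration only: it uses exact fractions instead of the bit-by-bit renormalization real coders use, and the probabilities are made up. The final interval's width determines the code length, about -log2(width) bits.

```python
from fractions import Fraction
import math

def arith_encode(msg, probs):
    """Narrow the interval [low, low + width) once per symbol.
    Any number inside the final interval identifies the message
    (given its length); it needs roughly -log2(width) bits."""
    # cumulative ranges: each symbol owns a slice of [0, 1)
    ranges, c = {}, Fraction(0)
    for sym, p in probs.items():
        ranges[sym] = (c, c + p)
        c += p
    low, width = Fraction(0), Fraction(1)
    for sym in msg:
        lo, hi = ranges[sym]
        low += width * lo
        width *= (hi - lo)
    return low, width

probs = {"a": Fraction(9, 10), "b": Fraction(1, 10)}
low, width = arith_encode("aaaaaaaaab", probs)
bits = math.ceil(-math.log2(width)) + 1  # interval width -> code length
print(bits)  # 10 symbols fit in about 6 bits, because 'a' is so likely
```

    The skewed model is the whole point: the better the predictor, the wider the interval stays, and the fewer bits the number needs.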

    However, I'm newish and unsure whether there are dedicated number compressors out there.

  4. #3
    Member
    Join Date
    Jun 2018
    Location
    Yugoslavia
    Posts
    58
    Thanks
    8
    Thanked 3 Times in 3 Posts
    Well, everything digital is a large number; it depends how you interpret it.
    Some of those questions should be answered by Bill Gates or similar.
    I think WinRAR became popular when it was the best, so they can monetize that even now. It's not bad anyway.

  5. #4
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,942
    Thanks
    291
    Thanked 1,286 Times in 728 Posts
    > 1 What can compress binary files or numbers best?

    paq/cmix if you want a universal solution.

    Otherwise "binary file" is not specific enough.
    For simple 1D tables of numbers you can use e.g. optimfrog.
    For 2D tables there're lossless image coders, but most would
    require data conversion and/or adding a header with table dimensions and element type.
    For more complex structures there's no universal solution -
    you have to make a custom entropy coder, or at least a preprocessor.

    > 2 Why is Zip the only format that Windows can open, and isn't there a way to
    > open 7-Zip files like zip files in Windows?

    It's not the only one; there's at least .cab and related (.wim etc).
    https://superuser.com/questions/1038...ndows-explorer

    > 3 Is there any way to make compressed image files viewable like regular image files
    > on Windows and other OSes? Or can they not handle it since it would take too long to decompress?

    A decoder for the specific image format has to be implemented for the corresponding plugin system.

    > 4 Who sets the standard, MS? And if so, how would they implement other formats like 7-Zip as a standard?

    The .7z format is undocumented, so it could only be implemented natively by accident -
    if some developer in the relevant department really likes it and spends months implementing it.

    > 5 What do commercial companies use for compression?

    For paperwork and such? Generally .zip.

    Storage companies may use something fancy to save space.

    > 6 Do companies have to pay a fee to use open-source programs?

    No, but they commonly prefer to pay some intermediate company,
    which takes the responsibility for solving issues related to open-source programs.

    > 7 Would you have to sell your compression program door to door like a salesman
    > for companies to use your open-source program?

    That might actually work.

    > 8 Why do companies like WinRAR even bother to sell their programs when open-source ones are better?

    1) Open-source programs are not always better; in particular WinRAR has some unique format features
    (recovery, signing, file dedup) which have no alternatives.

    2) Companies would prefer to buy commercial software because it has explicit support and official contracts.

    > 9 Haven't WinRAR and other paid programs been reverse engineered? Or is using them not allowed
    > even if you know how they work?

    Data compression is not important enough for anyone to bother.
    Yes, all the interesting algorithms are usually reverse-engineered fast enough.
    No, companies mostly still use .zip compression, because it's "good enough".

    > 10 Let's say you want to compete with the others and keep it free for everyone, but still want money. What do you do?
    > Use the honor system, say it's free for non-corporations, and hope companies pay?

    Direct donations, patreon, kickstarter...

    Normally though you'd just write a good open-source app,
    then use it as advertisement to get a good job.

    > 11 Why does "size on disk" sometimes remain the same when two files differ in size by 300 bits?
    > I assume block size?

    https://en.wikipedia.org/wiki/Data_cluster
    https://en.wikipedia.org/wiki/NTFS#Scalability
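    In short: files are allocated in whole clusters, so "size on disk" rounds the file size up to a cluster multiple. A quick sketch (4 KB is a common NTFS default; the actual cluster size depends on the volume):

```python
def size_on_disk(file_size, cluster=4096):
    """Files occupy whole clusters, so the on-disk size
    rounds up to the next multiple of the cluster size."""
    return -(-file_size // cluster) * cluster  # ceiling division

# two files differing by a few hundred bytes often need the
# same number of clusters, so their "size on disk" matches:
print(size_on_disk(10_000), size_on_disk(10_300))  # 12288 12288
```
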

    > 12 When using 7-Zip it seems like bzip2 is better than LZMA2 (the default). Would anything be better?

    Yes, bzip2 can be better than lzma for small text files (<=900kb, bzip2's maximum block size).
    In those cases, ppmd would likely be even better.
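    The comparison is easy to reproduce with Python's standard library, which bundles both codecs. The sample text below is made up and highly repetitive, so treat this as a harness for your own files rather than a verdict; results depend heavily on the corpus.

```python
import bz2
import lzma

# a small, repetitive text sample (~90 KB)
text = b"the quick brown fox jumps over the lazy dog. " * 2000

sizes = {
    "bzip2": len(bz2.compress(text, compresslevel=9)),
    "lzma":  len(lzma.compress(text, preset=9)),
}
print(len(text), sizes)  # compare the two compressed sizes
```
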

    > 13 Can't the program automatically select which one to use? As 7-Zip states,
    > "You can get big difference in compression ratio for different sorting methods,
    > if dictionary size is smaller than total size of files."

    In theory yes, but there's no clean solution (you'd have to try all file permutations
    to find the best result). Advanced users can do this by generating filelists for compression;
    the 7-Zip author doesn't consider it a high-priority task.
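    Why file order matters is easy to demonstrate with a window-limited codec like Deflate (a sketch with synthetic data; 7-Zip's dictionary is far larger than zlib's 32 KB window, but the principle is the same once total size exceeds it):

```python
import random
import zlib

# two distinct 20 KB "files", each appearing twice in the archive
rng = random.Random(0)
a = bytes(rng.randrange(256) for _ in range(20_000))
b = bytes(rng.randrange(256) for _ in range(20_000))

# grouped: each duplicate sits within zlib's 32 KB window of its twin
grouped = len(zlib.compress(a + a + b + b, 9))
# interleaved: the duplicates are 40 KB apart, beyond the window,
# so the matches are never found
interleaved = len(zlib.compress(a + b + a + b, 9))
print(grouped, interleaved)  # grouped comes out much smaller
```

    Sorting similar files next to each other is exactly this trick at archive scale.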

    > 14 Why does compressing 2 files at the same time produce a bigger result than compressing the 2 files separately?

    A compression algorithm would try using statistics collected from file1 to compress file2,
    which would fail if the files are sufficiently different.
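    A toy model makes this concrete: an adaptive order-0 coder that has adapted to file1's statistics pays extra bits when file2 looks different. This is a simplified illustration, not how LZMA works internally; the cost computed is the ideal code length an arithmetic coder would achieve with these counts.

```python
import math

def adaptive_cost(data, alphabet=256):
    """Ideal code length in bits for an adaptive order-0 model
    with Laplace-smoothed counts (one initial count per symbol)."""
    counts, total, bits = [1] * alphabet, alphabet, 0.0
    for byte in data:
        bits -= math.log2(counts[byte] / total)
        counts[byte] += 1
        total += 1
    return bits

f1 = b"A" * 1000  # file1: all one symbol
f2 = b"B" * 1000  # file2: all a different symbol
together = adaptive_cost(f1 + f2)
separate = adaptive_cost(f1) + adaptive_cost(f2)
print(round(together), round(separate))  # together costs more bits
```

    The 'A' counts accumulated on file1 dilute the probability assigned to file2's 'B's, so the combined stream costs more than the two streams coded fresh.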

    > When people have to follow the standard road that everyone has already discovered,
    > it is harder to find something new than if they go off the path into uncharted territory.

    It's kinda like saying "Using a known formula for finding pi digits isn't good enough.
    I'd try juggling random numbers until I find my own formula."

    It's simply not how it works.
    Creativity is useful, but only when you're already able to implement a known method with good results.

    And we're already at the stage where we have more ideas and proof-of-concepts than actual
    practical implementations.

    Like, there's nothing creative and uncharted in data segmentation or file order optimization,
    so nobody does it.

  6. #5
    Member
    Join Date
    Jan 2020
    Location
    Fl
    Posts
    36
    Thanks
    1
    Thanked 3 Times in 3 Posts
    1 When I say binary, I mean a file that contains just "1" and "0". LZMA seems better in this case.

    cmix is just too much for realistic use, sorry to say; it takes up too much memory (it crashed my PC) and takes too long.
    But can you request that 7-Zip add cmix and PAQ?


    Speaking of standards, Windows uses Deflate to compress files rather than LZMA or bzip2. LZMA seems to compress binary and numbers better than bzip2, while bzip2 is better for more varied characters and LZMA for fewer.


    2 I noticed .cab being better than zip, but oddly it's not used as much. I guess it's not favored since it is slower, despite compressing better than the Deflate method, and time is money.


    3 What prevents it from being implemented? Finance? What steps do you think could be taken to implement it?


    4 I assume they want to keep the standard for financial royalties?


    5 Why would companies use zip? For speed? Or is it just what most people know?


    9 Zip is used since it's good enough... what would make it stop being good enough? With high-speed internet to stream movies now, I guess compression was more valuable back with slow dial-up modems.


    10 What about another suggestion: online compression? No one can copy what is online, and you store it on a free cloud sponsored by ads, paying only for nonstop big files.
    Maybe cmix might be better that way, since most people don't have 32 GB of RAM.


    13 In theory? It's not hard to read how big the file is and make the proper selection, so I don't understand why it's not implemented. I assume 7-Zip and other compression companies don't appreciate how much of a gain they would get.


    14 A bit of a % loss. So many small things can add up. It's like compression people have passion but aren't serious enough.


    So it may seem in all fields, until someone comes up with something new. It's like we restrict ourselves when we think there is a limit.




    On another note, 7-Zip went down in popularity and more people are searching for just "zip file". cmix compression got some interest. https://trends.google.com/trends/exp...mpression,7zip

    Thanks

  7. #6
    Member
    Join Date
    Jan 2020
    Location
    Canada
    Posts
    142
    Thanks
    12
    Thanked 2 Times in 2 Posts
    I personally use lossless compression as a true evaluation for finding AGI, which is the goal behind the Hutter Prize. Of course it doesn't cover everything - AI minds need bodies/nanobots, rewards, to surf the web, etc. Yet so far the best compressors talk to themselves, do online learning, rearrange articles more neatly, group similar words, etc. So it seems like a million-dollar idea. It makes you wonder: is there any other evaluation we need, if my goal is to massively lower the probability of death/pain on Earth? (That's where Earth is going with evolution and the Hutter Prize: it repairs missing data using prediction to make Earth a neater fractal pattern and save wasted energy, adds data using online learning, then uses its larger context tree to add new data even faster, growing exponentially in scale.) If I invent true AGI, it uses higher-intelligence means (faster than brute force) to make/predict the future and form the settled state ('utopia') that physics will settle us into. So does it need a body (do we need another evaluation) if it knows enough about the real world to extrapolate any unseen text/image a body would have seen? It has less need to gather real data. So in a sense, to invent AGI is just to predict well and smash the Hutter Prize. As for the rest of the evaluation for achieving immortality: we still need bodies to carry out tasks - intelligence doesn't 'do it all' - but we already have humanoid bodies with no real brain (and humans etc.), so yes, all we need to focus on is AGI. And big data: the other part of the intelligence evaluation is data/brain size, i.e. scale. Scale and prediction are all we need to work on. Note that your prediction can be good but slow or need large RAM; that is less important, and we can live with the effect - i.e. it tells us the cure for cancer but we had to wait a year = a big thank-you still.
    As with scale, the more workers on Earth, the faster we advance as well.

    The first thing you notice is that it is easy to get the enwik8 100MB down to 25MB, but exponentially harder the lower you go. So is intelligence solved? No. But it looks as if people are already getting the data compression they need and won't benefit much more. If a company compresses DNA data down from 100MB to 15MB, why would they work on the Hutter Prize so hard to get 13MB? Here's why: what if they had not 100MB, but 100GB? The amount cut off is now not 2MB, but 2GB! Also, the more data fed in, the more patterns there are and the more it can compress - so not 2GB but 4GB; 100GB becomes 11GB. Mind you, it is funny that our AGI prize is being used to compress data as a tool - though AI can think up any invention itself, so it doesn't seem odd exactly. Now, seeing that we get a better compression ratio the more data is fed in, making the AI predictor more intelligent at finding/creating patterns will result in a huge improvement, not 15MB>14MB>13.5MB>13.45MB>13.44MB. However, I'm unsure that holds; maybe it is indeed harder the lower you go if we look at, e.g., a 100TB input where the limit is 2TB (and currently we'd only reach, say, 4TB). Perhaps it is harder the lower you go because there are fewer hints and higher uncertainty in problems that require lots of knowledge. So it is more honoring to lower compression by just a little bit once it is already low. Of course, probability gets worse for problems that are hard to predict well on; consider the question below and predict the likely answer:

    "Those witches who were spotted on the house left in a hurry to see the monk in the cave near the canyon and there was the pot of gold they left and when they returned back they knew where to go if they wanted it back. They knew the keeper now owned it and if they waited too long then he would forever own it for now on."
    > Who owns what?
    > Possible Answers: Witches own monk/witches own canyon/monk owns gold/monk owns house/monk owns cave/cave owns pot/there was pot/he owns it

    You can't help that hard issues have a lower probability of being predicted. But you can still improve it. These issues are hard because they haven't been seen much - no match. Simply put, entailment and translation are used to see the frequency of the answer. How do I know this? Think about it: if real-world data written by humans says the turtle sat in water and yelped, you can be sure it will yelp; you can be sure a rolling ball will fall off a table, that molecule x will twist like x, etc. And if the words are unseen but similar - e.g. 'the cat ate the ?' when you have seen lots of 'this dog digested our bread' - then you know the probability of what follows (and what maps to what: cat=dog, using contexts seen prior). This works for rearranged words and letters, typos, missing or added words... So simply doing prediction will discover the true answers with the highest probabilities. Of course, it may not be quite as simple as finding a match and taking the entailing answer (or translation), e.g. for 'the cure to cancer is '. At least it doesn't seem that simple, but it is: it requires you to take the word cancer, look at what entails it (pain, rashes), what those are similar to, what entails those, and repeat - commonsense reasoning like BERT-style translation, giving you more virtually generated data. So you are still finding the entailment to 'the cure to cancer is ', except the prediction is not directly tied to those very words, if you know what I mean.

    Now if our predictor were an image predictor or movie generator, and also used text context (a multi-sensory context predictor), we would have AGI 'talking' to itself and learning new data online. It could share its visual thoughts to talk to us using vision too. We also use Byte Pair Encoded segmentation, like 'sounds' (words...), to activate certain images; it is more efficient - images aren't as abstract in that way, but can be much more detailed at a low level. We will need both later to get better prediction. And yes, images are words: words describe image objects, and they are inversely the same thing.

  8. #7
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    846
    Thanks
    242
    Thanked 309 Times in 184 Posts
    Please rename the thread to something better. There is always some underlying theme in your interests, not just a collection of random stuff.

    Quote Originally Posted by Trench View Post
    1 What can compress binary files or numbers best?
    'Best' is a combination of encode speed, decode speed, compression density, decode memory use, encode memory use, opportunities for parallel computing, energy use, complexity, the binary footprint of the encoder and decoder, and other things.

    Quote Originally Posted by Trench View Post
    4 Who sets the standard MS? and if so how would they implement something other formats like 7zip as standard?
    A standardization body such as ISO, W3C or IETF sets the standards for computing.

    Microsoft decides what goes into Microsoft operating systems.

    Quote Originally Posted by Trench View Post
    5 What do commercial companies use for compression?
    All that needs to be used, all that was thought to be good, and some that are used for legacy reasons or by mistake.


    Quote Originally Posted by Trench View Post
    6 Do companies have to pay a fee to open source programs?
    Mostly no; for some, yes. For example, Kornel's DSSIM is dual-licensed in a way that requires companies to use the commercial license.

    Quote Originally Posted by Trench View Post
    12 When using 7zip it seems like bzip2 is better than lzma2(default). Would anything be better?
    lzma2 is more generic. In some use cases bzip2 predicts better, as it has more flexibility with 'context size'.

    In practice, smart people are moving from lzma2 to brotli (with a 0.6% loss in density but a 5x decoding speedup) or zstd (with a 5% density loss but an 8x decoding speedup).
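    brotli and zstd need third-party bindings, but the same density-vs-speed tradeoff can be measured with the codecs in Python's standard library. This harness illustrates the idea with zlib vs lzma on made-up data; it is not a benchmark of brotli or zstd themselves, and the numbers will vary by machine and corpus.

```python
import time
import zlib
import lzma

data = b"compression trades density against speed in every codec. " * 5000

def measure(name, compress):
    """Report compressed size and wall-clock encode time for one codec."""
    t0 = time.perf_counter()
    out = compress(data)
    ms = (time.perf_counter() - t0) * 1000
    print(f"{name}: {len(out)} bytes in {ms:.1f} ms")
    return len(out)

z = measure("zlib -9", lambda d: zlib.compress(d, 9))
x = measure("lzma -6", lambda d: lzma.compress(d))
```
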

  9. #8
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    846
    Thanks
    242
    Thanked 309 Times in 184 Posts
    Quote Originally Posted by Trench View Post
    4 Who sets the standard MS? and if so how would they implement something other formats like 7zip as standard?
    The most recent standard general purpose compression format (https://tools.ietf.org/html/rfc7932) that penetrated Microsoft Windows, iOS, and browsers for integration is brotli.

    Microsoft: https://devblogs.microsoft.com/dotne...i-compression/

    Apple: https://developer.apple.com/videos/play/wwdc2017/709/

    Chrome: https://www.omgchrome.com/brotli-htt...ing-to-chrome/

    Firefox: https://hacks.mozilla.org/2015/11/be...n-with-brotli/

    Android: https://www.xda-developers.com/googl...r-ota-updates/

  10. #9
    Member
    Join Date
    Jul 2014
    Location
    Mars
    Posts
    197
    Thanks
    135
    Thanked 13 Times in 12 Posts
    Hm, Self_Recursive_Data, quite an interesting post. Yeah, all in all, compression and information exchange is a way to the singularity. My notes would be as follows:
    1. AI is advertised goodness. No real AI will be implemented in the coming years. By real I mean self-conscious AI - we would need to base it on the human brain, which at the current time is impossible for many reasons. 1st of all, the human brain is a constantly changing thing: a neural net driven by chemistry and constant chemical interactions. 2nd, all current data shows that every intelligence needs motivation, a drive - self-induced drugs like dopamine in us (for example, you finish your super compressor or algorithm and you're happy - self-induced hormones produced in your brain trigger this wonderful sense) - so it is a problem how to motivate such an AI to "live". The currently trendy name "AI" means only big-data analyzing and processing; as always, a great name to gain grants from unaware people. I'm not against such technology, but again, it has no connection to real AI.
    2. Things which slow progress (and a better world): human corruption; abuse of the grant system for money (examples - hyped projects with no widespread or practical result: cloning, stem cells, the Human Genome Project, cold fusion, MRI showing what part of the brain is used right now); low education; and ancient "legacy code" - drives which make it hard to live: domination, reproduction, hunger.

  11. #10
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,942
    Thanks
    291
    Thanked 1,286 Times in 728 Posts
    > For real and by real I mean selfconscious AI - we need to base it on human brain,

    Why? First, define self-conscious (or self-aware, if that's what you actually meant).
    Is it reflection?
    Is it self-preservation or something like that?
    Egoism? :)

    If it's just a bot that can navigate a city, communicate in English, and perform useful work -
    I think it might already exist. At least the modules for it have already been demonstrated, although
    by different companies.

    Sure, further improvement is possible, but computers already can do
    all the relevant recognition/prediction tasks atm.
    Comprehension is translation to internal syntax.
    Simple task solving at the human level is also already implemented (SAT,ATP etc).

    I think the main problem atm is hardware cost (who would buy a home bot for $100k?).
    Also content copyrights, if you want the bot to do content analysis and answer questions.

  12. #11
    Member
    Join Date
    Jul 2014
    Location
    Mars
    Posts
    197
    Thanks
    135
    Thanked 13 Times in 12 Posts
    I would say self-awareness and creativity - making something not made before by someone else - if you want to create a human copy as AI. (Don't say "look, there are examples of AI making music/paintings" - that's made by human-defined algorithms, not a self-aware consciousness.) All the things you listed above are programs/apps, and forever will be, because their input data is made by humans and limited by default. Yes, I agree we can make great simulants with human-like behaviour, but it will always be limited/predefined. Of course we could theoretically simulate a brain and memories, but it's beyond current technology; we would have to simulate every cell, every chemical reaction... So yes, I agree we can make human-like droids, and yes, computing power is too limited to make one as advanced as the Terminator or Aliens' Bishop droid.
    So we actually have 2 different things and goals to implement under the same name, AI: 1. a human-like assistant/tool; 2. an artificial independent consciousness (Skynet?).

  13. #12
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,942
    Thanks
    291
    Thanked 1,286 Times in 728 Posts
    > I would say self-awareness, creativity - make something not made before someone else -

    There're plenty of examples where GAs (or other optimization algorithms) find unexpected
    solutions to various problems - ones that humans would consider creative.

    > if you want to create human copy of AI. (Don`t say - look there are
    > examples of AI making music/paintings - it`s made by human defined
    > algorithms not self-aware consciousness).

    To an extent, art could be a problem, yes, and might really require making a copy of a human brain -
    because we'd need precise feedback to train an AI on existing art -
    or at least good psychoacoustic/psychovisual/association simulators.

    On the other hand, formalized aspects of art are relatively easy for AIs
    (like drawing a picture in the style of a specific artist from a photo).
    And it's not like all humans are automatically artists and musicians.
    In fact, I'm not even sure we need to try to get AIs to create real art
    (though I don't think it's impossible - just hard to find the information for AI training).

    > All the things you listed above are programs / apps - and forever will be, because their input data is made
    > by humans and limited by default,

    Why? Is runtime analysis and generation of the necessary programs that hard to imagine?

    Or something like https://youtu.be/ANkHL3UPVC4?t=56 :)

    An AI can find unique solutions or algorithms, write them down as text,
    then get another AI to read and learn them - won't that be something not made by humans?

    > we can make great simulants / human like behaviour- but it will always be limited / predefined.

    I don't see why. Existing AIs are simply not allowed to learn from the environment;
    that's why they have to use human input.
    Scan the environment, then build algorithms to reach a chosen goal using the available options -
    what's so hard about that?

    > Of course we can theoretically simulate brain and memories but it`s beyond current technology,

    I think it's already possible, just highly unethical.
    I mean, the human brain is relatively redundant, and there're supposedly some neuron-simulation chips,
    so it's probably possible to use a "Ship of Theseus" approach: incrementally replace parts of the brain
    with chips, then let it recover.

    > we have to simulate every cell every chemical reaction...

    That's only necessary for a perfect simulation of some specific brain.
    A rough approximation should be enough to get a human-like AI.

    > So yes I agree we can make humanlike droids and yes computing power is limited to make
    > it advanced like Terminator or Aliens` Bishop droid)

    Actually, atm the main problem seems to be power.
    Bots are very weak and have to be recharged frequently, or powered by wire.

    > 2. artificial independent consciousness (Skynet?) )

    I only see problems with things that require specifically emulating human hardware (like art).

    Otherwise I think currently available hardware/software should already be enough
    to build an independent "lifeform" bot, with a human-level ability to deal with the
    environment and build its own clones. Just that it would be pretty expensive while useless.

