Results 1 to 22 of 22

Thread: Compression with Pi

  1. #1
    Member
    Join Date
    Feb 2010
    Location
    Germany
    Posts
    77
    Thanks
    2
    Thanked 0 Times in 0 Posts

    Compression with Pi

    I wanted to ask you knowledgeable people about an idea that has been going around in my head for years. I'm nowhere near a compression expert, so please don't jump on me if this is ridiculous for reasons I can't grasp.

    My math teacher once told me that it is assumed that the number Pi would contain any kind of combinations of numbers thinkable. So, wouldn't it be possible to transform Pi (or any other endless mathematical constant or combination of a multitude of them) into a binary format and then trying to find matches between the file that needs to be compressed? All that would need to be stored in an archive would be coordinates of the corresponding digits to the chunks that match, not the data itself. Restoring the file would be computationally intense on a massive scale, but shouldn't it be possible to even "extract" huge files from tiny archives this way? Has someone ever thought about this?

  2. #2
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,610
    Thanks
    30
    Thanked 65 Times in 47 Posts
    You're not the first with this idea. It won't work. Instead of a file you'd have to store a single number, a position in the stream of numbers. Usually, this would be so far that the number would be bigger than the file itself.
    Learn about counting argument.

  3. #3
    Member
    Join Date
    Feb 2010
    Location
    Germany
    Posts
    77
    Thanks
    2
    Thanked 0 Times in 0 Posts
    Thank you for your answer. Yes the number will be bigger than the file itself, but since the number is not stored in the archive, but computed in real time by the archiver this shouldn't be much of a problem. What am I getting wrong here? I'll look into the counting argument.

  4. #4
    Programmer Jan Ondrus's Avatar
    Join Date
    Sep 2008
    Location
    Rychnov nad Kněžnou, Czech Republic
    Posts
    279
    Thanks
    33
    Thanked 138 Times in 50 Posts
    Quote Originally Posted by Mexxi View Post
    All that would need to be stored in an archive would be coordinates of the corresponding digits to the chunks that match, not the data itself.
    Quote Originally Posted by Mexxi View Post
    What am I getting wrong here?
    He means those coordinates would be be bigger than original file...

  5. #5
    Member
    Join Date
    Feb 2010
    Location
    Germany
    Posts
    77
    Thanks
    2
    Thanked 0 Times in 0 Posts
    That makes sense. I guess the point would be to only use the irrational number as a basis if the coordinates would be smaller than the chunk of data and otherwise leave it uncompressed.

  6. #6
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    You can't compress but you if look at the space of all files. No compressor can compress all files in fact the identity transform is about the best overall compressor for all files.

    That said one has to realize we make compressors to compress useful files smaller and hope that few bad files are encountered.

    First of all any file can be thought of as a single number since you can bijectively transform between the two sets.

    And you can map the set of all pairs of numbers to the set of numbers.

    So you could write a bijective file compressor using PI that will cause many files to compress and many files to expand.

    So if you find the files that compress are useful then you have your compressor.

    You could even create a random looking set of files that compress with your method and then have a contest where
    your compressor would likely beat all the other normal file compressors.

  7. Thanks:

    Mexxi (10th September 2013)

  8. #7
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    Whoops sorry my idea above is incomplete. Since you could have several offsets with the same length that decompress to the same file.
    So it's harder to write such a compressor that I indicated but with time I think its possible but still most likely it would not compress the kind of files one wants to compress.

  9. #8
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,572
    Thanks
    779
    Thanked 687 Times in 372 Posts

  10. #9
    Member
    Join Date
    Feb 2010
    Location
    Germany
    Posts
    77
    Thanks
    2
    Thanked 0 Times in 0 Posts
    Hilarious. I'd give it a shot if it wasn't linux based.

  11. #10
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 72 Times in 56 Posts
    If you could even create a program that would find any file in the digits of Pi (in a reasonable amount of time), that would interesting just as a novelty. I bet it would make a splash on Hacker News. But it wouldn't compress anything.

  12. #11
    Member
    Join Date
    Feb 2010
    Location
    Germany
    Posts
    77
    Thanks
    2
    Thanked 0 Times in 0 Posts
    Just for shits'n giggles I have computed Pi and can calculate theoretically 2 billion digits per hour on my cpu. GPU should be even faster. I'm playing with the thought of converting the dump to binary and look for small text fragments just for the fun of it

  13. #12
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 72 Times in 56 Posts
    Just be aware that the expected number of trials to find an n character string in random bits is on the order of 256^n, which becomes very large, very fast.

  14. #13
    Member Jean-Marie Barone's Avatar
    Join Date
    Oct 2012
    Location
    Versailles, France
    Posts
    60
    Thanks
    15
    Thanked 4 Times in 3 Posts
    This makes me think of the Infinite monkey theorem.
    Everybody has heard this story : a dozen of monkeys or more typing randomly on typewriters for a million years will produce the entire work of Shakespeare.

    This is actually a myth : "Ignoring punctuation, spacing, and capitalization, a monkey typing letters uniformly at random has a chance of one in 26 of correctly typing the first letter of Hamlet. It has a chance of one in 676 (26*26) of typing the first two letters. Because the probability shrinks exponentially, at 20 letters it already has only a chance of one in 2620 = 19,928,148,895,209,409,152,340,197,376 (almost 2*1028). In the case of the entire text of Hamlet, the probabilities are so vanishingly small they can barely be conceived in human terms".

    I guess the same rule applies if you try to find your favorite proggy in the Pi decimals.

  15. #14
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    I am not sure how long Hamlet is but lets assume it takes a single monkey a month to type enough to be the length of Hamlet.
    you start with one monkey at the end of 1/2 month you add two more monkeys to the crew. Then at end of 1/4 month you add four more monkeys to the crew. Each time doubling the number of monkeys but waiting 1/2 the previous time interval before adding the additional monkeys to the crew.

    Its my belief that by the end of 3 months there should be several copies of Hamlet typed by the monkeys. Of course you will not find enough monkeys but pretend you can and that each type randomly and independent of each other. In fact you can stop short of adding additional monkeys just before 2 month since the crew of monkey will be large enough to do the job.

  16. #15
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 72 Times in 56 Posts
    Quote Originally Posted by biject.bwts View Post
    I am not sure how long Hamlet is but lets assume it takes a single monkey a month to type enough to be the length of Hamlet.
    you start with one monkey at the end of 1/2 month you add two more monkeys to the crew. Then at end of 1/4 month you add four more monkeys to the crew. Each time doubling the number of monkeys but waiting 1/2 the previous time interval before adding the additional monkeys to the crew.

    Its my belief that by the end of 3 months there should be several copies of Hamlet typed by the monkeys. Of course you will not find enough monkeys but pretend you can and that each type randomly and independent of each other. In fact you can stop short of adding additional monkeys just before 2 month since the crew of monkey will be large enough to do the job.
    Good insight. If you assume the monkeys are breeding, then the number of monkeys grows exponentially. Where are you going to find enough typewriters, though?

  17. #16
    Member Jean-Marie Barone's Avatar
    Join Date
    Oct 2012
    Location
    Versailles, France
    Posts
    60
    Thanks
    15
    Thanked 4 Times in 3 Posts
    Quote Originally Posted by biject.bwts View Post
    I am not sure how long Hamlet is but lets assume it takes a single monkey a month to type enough to be the length of Hamlet.
    I cannot assume that, David
    Actually, some people took the matter seriously :
    "A website entitled The Monkey Shakespeare Simulator, launched on July 1, 2003, contained a Java applet that simulates a large population of monkeys typing randomly, with the stated intention of seeing how long it takes the virtual monkeys to produce a complete Shakespearean play from beginning to end. For example, it produced this partial line from Henry IV, Part 2, reporting that it took "2,737,850 million billion billion billion monkey-years" to reach 24 matching characters:
    RUMOUR. Open your ears; 9r"5j5&?OWTY Z0d..."

  18. #17
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 72 Times in 56 Posts
    Quote Originally Posted by Jean-Marie Barone View Post
    I cannot assume that, David
    Actually, some people took the matter seriously :
    "A website entitled The Monkey Shakespeare Simulator, launched on July 1, 2003, contained a Java applet that simulates a large population of monkeys typing randomly, with the stated intention of seeing how long it takes the virtual monkeys to produce a complete Shakespearean play from beginning to end. For example, it produced this partial line from Henry IV, Part 2, reporting that it took "2,737,850 million billion billion billion monkey-years" to reach 24 matching characters:
    RUMOUR. Open your ears; 9r"5j5&?OWTY Z0d..."
    I wonder if they took into account evolution. By that time, they might have produced another Shakespeare, or, at least, learned a strategy for typing nonrandomly.

    Edit: Actually, my prediction would be some sort of bloody revolt -- using weapons fashioned out of typewriters.
    Last edited by nburns; 11th September 2013 at 05:15.

  19. #18
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,572
    Thanks
    779
    Thanked 687 Times in 372 Posts
    Quote Originally Posted by biject.bwts View Post
    Of course you will not find enough monkeys but pretend you can and that each type randomly and independent of each other. In fact you can stop short of adding additional monkeys just before 2 month since the crew of monkey will be large enough to do the job.
    i think you can stop adding monkeys after 1st month since it already will be infinite

  20. Thanks:

    biject.bwts (11th September 2013)

  21. #19
    Member
    Join Date
    Sep 2013
    Location
    Wuhan, China
    Posts
    3
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I posted this exact topic already just a few days ago here on this very forum: http://encode.su/threads/1789-100-Lo...allestFilePoss

    And also over at BitCoinTalk.org .... here: https://bitcointalk.org/index.php?topic=288152.0 This thread is the best one, and starts to get good about page 6 and on. You can pick it up there.

    So my question to you, Mexxi, is are you for real, or are you just messing with me? (wink).
    "All achievements, all earned riches, have their beginning in an idea." <-- Napoleon Hill --> "More gold has been mined from the thoughts of men than has been [mined] from the earth."

  22. #20
    Member
    Join Date
    Feb 2010
    Location
    Germany
    Posts
    77
    Thanks
    2
    Thanked 0 Times in 0 Posts
    Lol dui bu qi. I saw your topic on this forum, but you were rather vague and not really technical while I thought about irrational number specifically. Also, I came here with my question already in mind before knowing of your thread, so I'd say it's a coincidence Additionally, I have to say that I'm not that far removed from reality to think that a movie could be compressed to kilobytes this way. This is absolutely impossible for the reasons stated in your thread and this one. I was merely wondering whether a proof of concept existed that would show the (in)efficiency of such a technique.

    I'll take a look at the bitcoin thread. May be it actually leads to something.
    Last edited by Mexxi; 11th September 2013 at 19:46.

  23. #21
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
    I am surprised that the people on bitcointalk are so patient with someone who doesn't know enough math to understand that you can't put n pigeons into n-1 pigeon holes.

    But maybe I shouldn't be. There are still a bunch of people on comp.compression who have for years worked on this dream that will bring them fame and riches, and another bunch who for years keep trying in vain to explain to them why it won't work. Meanwhile, everyone with actual working software has left...

    I really hate to see people waste their lives this way, when they could have spent that time doing something useful like learning a little math and how to write code.

  24. Thanks (4):

    biject.bwts (13th September 2013),Bulat Ziganshin (13th September 2013),Mexxi (12th September 2013),sh0dan (15th September 2013)

  25. #22
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 72 Times in 56 Posts
    Quote Originally Posted by Matt Mahoney View Post
    I am surprised that the people on bitcointalk are so patient with someone who doesn't know enough math to understand that you can't put n pigeons into n-1 pigeon holes.
    Pigeons may not be the best analogy, because pigeons are soft and they can probably be crammed in.

    But maybe I shouldn't be. There are still a bunch of people on comp.compression who have for years worked on this dream that will bring them fame and riches, and another bunch who for years keep trying in vain to explain to them why it won't work. Meanwhile, everyone with actual working software has left...

    I really hate to see people waste their lives this way, when they could have spent that time doing something useful like learning a little math and how to write code.
    If someone can't grasp when they are wasting their time, then they probably didn't have much of a future in programming, anyway.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •