1. ## Compression with Pi

I wanted to ask you knowledgeable people about an idea that has been going around in my head for years. I'm nowhere near a compression expert, so please don't jump on me if this is ridiculous for reasons I can't grasp.

My math teacher once told me that it is assumed that the number Pi would contain any kind of combinations of numbers thinkable. So, wouldn't it be possible to transform Pi (or any other endless mathematical constant or combination of a multitude of them) into a binary format and then trying to find matches between the file that needs to be compressed? All that would need to be stored in an archive would be coordinates of the corresponding digits to the chunks that match, not the data itself. Restoring the file would be computationally intense on a massive scale, but shouldn't it be possible to even "extract" huge files from tiny archives this way? Has someone ever thought about this?

2. You're not the first with this idea. It won't work. Instead of a file you'd have to store a single number, a position in the stream of numbers. Usually, this would be so far that the number would be bigger than the file itself.

3. Thank you for your answer. Yes the number will be bigger than the file itself, but since the number is not stored in the archive, but computed in real time by the archiver this shouldn't be much of a problem. What am I getting wrong here? I'll look into the counting argument.

4. Originally Posted by Mexxi
All that would need to be stored in an archive would be coordinates of the corresponding digits to the chunks that match, not the data itself.
Originally Posted by Mexxi
What am I getting wrong here?
He means those coordinates would be be bigger than original file...

5. That makes sense. I guess the point would be to only use the irrational number as a basis if the coordinates would be smaller than the chunk of data and otherwise leave it uncompressed.

6. You can't compress but you if look at the space of all files. No compressor can compress all files in fact the identity transform is about the best overall compressor for all files.

That said one has to realize we make compressors to compress useful files smaller and hope that few bad files are encountered.

First of all any file can be thought of as a single number since you can bijectively transform between the two sets.

And you can map the set of all pairs of numbers to the set of numbers.

So you could write a bijective file compressor using PI that will cause many files to compress and many files to expand.

So if you find the files that compress are useful then you have your compressor.

You could even create a random looking set of files that compress with your method and then have a contest where
your compressor would likely beat all the other normal file compressors.

7. ## Thanks:

Mexxi (10th September 2013)

8. Whoops sorry my idea above is incomplete. Since you could have several offsets with the same length that decompress to the same file.
So it's harder to write such a compressor that I indicated but with time I think its possible but still most likely it would not compress the kind of files one wants to compress.

9. Hilarious. I'd give it a shot if it wasn't linux based.

10. If you could even create a program that would find any file in the digits of Pi (in a reasonable amount of time), that would interesting just as a novelty. I bet it would make a splash on Hacker News. But it wouldn't compress anything.

11. Just for shits'n giggles I have computed Pi and can calculate theoretically 2 billion digits per hour on my cpu. GPU should be even faster. I'm playing with the thought of converting the dump to binary and look for small text fragments just for the fun of it

12. Just be aware that the expected number of trials to find an n character string in random bits is on the order of 256^n, which becomes very large, very fast.

13. This makes me think of the Infinite monkey theorem.
Everybody has heard this story : a dozen of monkeys or more typing randomly on typewriters for a million years will produce the entire work of Shakespeare.

This is actually a myth : "Ignoring punctuation, spacing, and capitalization, a monkey typing letters uniformly at random has a chance of one in 26 of correctly typing the first letter of Hamlet. It has a chance of one in 676 (26*26) of typing the first two letters. Because the probability shrinks exponentially, at 20 letters it already has only a chance of one in 2620 = 19,928,148,895,209,409,152,340,197,376 (almost 2*1028). In the case of the entire text of Hamlet, the probabilities are so vanishingly small they can barely be conceived in human terms".

I guess the same rule applies if you try to find your favorite proggy in the Pi decimals.

14. I am not sure how long Hamlet is but lets assume it takes a single monkey a month to type enough to be the length of Hamlet.
you start with one monkey at the end of 1/2 month you add two more monkeys to the crew. Then at end of 1/4 month you add four more monkeys to the crew. Each time doubling the number of monkeys but waiting 1/2 the previous time interval before adding the additional monkeys to the crew.

Its my belief that by the end of 3 months there should be several copies of Hamlet typed by the monkeys. Of course you will not find enough monkeys but pretend you can and that each type randomly and independent of each other. In fact you can stop short of adding additional monkeys just before 2 month since the crew of monkey will be large enough to do the job.

15. Originally Posted by biject.bwts
I am not sure how long Hamlet is but lets assume it takes a single monkey a month to type enough to be the length of Hamlet.
you start with one monkey at the end of 1/2 month you add two more monkeys to the crew. Then at end of 1/4 month you add four more monkeys to the crew. Each time doubling the number of monkeys but waiting 1/2 the previous time interval before adding the additional monkeys to the crew.

Its my belief that by the end of 3 months there should be several copies of Hamlet typed by the monkeys. Of course you will not find enough monkeys but pretend you can and that each type randomly and independent of each other. In fact you can stop short of adding additional monkeys just before 2 month since the crew of monkey will be large enough to do the job.
Good insight. If you assume the monkeys are breeding, then the number of monkeys grows exponentially. Where are you going to find enough typewriters, though?

16. Originally Posted by biject.bwts
I am not sure how long Hamlet is but lets assume it takes a single monkey a month to type enough to be the length of Hamlet.
I cannot assume that, David
Actually, some people took the matter seriously :
"A website entitled The Monkey Shakespeare Simulator, launched on July 1, 2003, contained a Java applet that simulates a large population of monkeys typing randomly, with the stated intention of seeing how long it takes the virtual monkeys to produce a complete Shakespearean play from beginning to end. For example, it produced this partial line from Henry IV, Part 2, reporting that it took "2,737,850 million billion billion billion monkey-years" to reach 24 matching characters:
RUMOUR. Open your ears; 9r"5j5&?OWTY Z0d..."

17. Originally Posted by Jean-Marie Barone
I cannot assume that, David
Actually, some people took the matter seriously :
"A website entitled The Monkey Shakespeare Simulator, launched on July 1, 2003, contained a Java applet that simulates a large population of monkeys typing randomly, with the stated intention of seeing how long it takes the virtual monkeys to produce a complete Shakespearean play from beginning to end. For example, it produced this partial line from Henry IV, Part 2, reporting that it took "2,737,850 million billion billion billion monkey-years" to reach 24 matching characters:
RUMOUR. Open your ears; 9r"5j5&?OWTY Z0d..."
I wonder if they took into account evolution. By that time, they might have produced another Shakespeare, or, at least, learned a strategy for typing nonrandomly.

Edit: Actually, my prediction would be some sort of bloody revolt -- using weapons fashioned out of typewriters.

18. Originally Posted by biject.bwts
Of course you will not find enough monkeys but pretend you can and that each type randomly and independent of each other. In fact you can stop short of adding additional monkeys just before 2 month since the crew of monkey will be large enough to do the job.
i think you can stop adding monkeys after 1st month since it already will be infinite

19. ## Thanks:

biject.bwts (11th September 2013)

20. I posted this exact topic already just a few days ago here on this very forum: http://encode.su/threads/1789-100-Lo...allestFilePoss

And also over at BitCoinTalk.org .... here: https://bitcointalk.org/index.php?topic=288152.0 This thread is the best one, and starts to get good about page 6 and on. You can pick it up there.

So my question to you, Mexxi, is are you for real, or are you just messing with me? (wink).

21. Lol dui bu qi. I saw your topic on this forum, but you were rather vague and not really technical while I thought about irrational number specifically. Also, I came here with my question already in mind before knowing of your thread, so I'd say it's a coincidence Additionally, I have to say that I'm not that far removed from reality to think that a movie could be compressed to kilobytes this way. This is absolutely impossible for the reasons stated in your thread and this one. I was merely wondering whether a proof of concept existed that would show the (in)efficiency of such a technique.

I'll take a look at the bitcoin thread. May be it actually leads to something.

22. I am surprised that the people on bitcointalk are so patient with someone who doesn't know enough math to understand that you can't put n pigeons into n-1 pigeon holes.

But maybe I shouldn't be. There are still a bunch of people on comp.compression who have for years worked on this dream that will bring them fame and riches, and another bunch who for years keep trying in vain to explain to them why it won't work. Meanwhile, everyone with actual working software has left...

I really hate to see people waste their lives this way, when they could have spent that time doing something useful like learning a little math and how to write code.

23. ## Thanks (4):

biject.bwts (13th September 2013),Bulat Ziganshin (13th September 2013),Mexxi (13th September 2013),sh0dan (15th September 2013)

24. Originally Posted by Matt Mahoney
I am surprised that the people on bitcointalk are so patient with someone who doesn't know enough math to understand that you can't put n pigeons into n-1 pigeon holes.
Pigeons may not be the best analogy, because pigeons are soft and they can probably be crammed in.

But maybe I shouldn't be. There are still a bunch of people on comp.compression who have for years worked on this dream that will bring them fame and riches, and another bunch who for years keep trying in vain to explain to them why it won't work. Meanwhile, everyone with actual working software has left...

I really hate to see people waste their lives this way, when they could have spent that time doing something useful like learning a little math and how to write code.
If someone can't grasp when they are wasting their time, then they probably didn't have much of a future in programming, anyway.

#### Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts
•