1. ## Compressing pi

Some compression results for pi.txt from the Canterbury miscellaneous corpus.
pi.txt contains the first million digits of pi: 3141592653...45815

Code:
```1,000,000 uncompressed
470,439 gzip -9
438,096 7zip
431,671 bzip2
419,212 ppmonstr
416,101 paq8px_v69 -8
159 zpaq ocpi```
The ZPAQ configuration file pi.cfg is as follows:

Code:
```comp 0 0 0 19 1
0 icm 5
hcomp
halt
a> 255 ifnot halt endif (no output unless EOF)

(Compute pi to 1000000 digits using the formula:
pi=4; for (d=r1*20/3;d>0;--d) pi=pi*d/(2*d+1)+2;
where r1 is the number of base 100 digits in M.
The precision is 1 bit per iteration so 20/3
is slightly more than the log2(100) we need. We compute
10 extra trailing digits and discard them to avoid
roundoff error.)
a= 100 a*= 100 a*= 50 a+= 10 r=a 1 (r1 = digits base 100)
a*= 20 a/= 3 d=a (d iterations)
*b= 40 (pi = 4)
do

(multiply M *= d, carry in c)
b=r 1 c=0
do
b--
a=*b a*=d a+=c c=a a%= 100 *b=a
a=c a/= 100 c=a
a=b a> 0 while

(divide M /= (2d+1), remainder in c)
a=d a+=d a++ d=a
do
a=c a*= 100 a+=*b c=a a/=d *b=a
a=c a%=d c=a
a=r 1 b++ a>b while
a=d a>>= 1 d=a

b= 0 a= 20 a+=*b *b=a
d-- a=d a> 0 while

(output the digits in ASCII, 2 per byte)
a=r 1 a-= 10 r=a 1 (discard last 10 bytes)
do
a=*b a/= 10 a+= 48 out
a=*b a%= 10 a+= 48 out
b++ a=r 1 a>b while
halt
end```
The preprocessor discard.bat takes an input and output file as arguments and creates an empty output file:

Code:
`copy nul: %2`
The postprocessor ignores the decoded output and computes pi to 1,000,000 decimal places after decoding EOF. Thus, pi.cfg works only on this one particular file. The code itself is compressed with an order 0 indirect model. It would be possible to reduce the compressed size further from 159 bytes to 126 by not storing the filename, comment, and checksum using the command "zpaq nisocpi pi.zpaq pi.txt".

The algorithm is not the most efficient. It took 31868 seconds (about 9 hours) to compress and 27943 seconds to decompress on a 2.67 GHz i7 M620 under 64 bit Linux (using an equivalent shell script for discard "cp /dev/null \$2"). Compression is slow because zpaq tests the pre-post processing sequence before encoding, which requires generating the SHA1 hash of pi and comparing it with the SHA1 hash of the input. It would have been 5 times slower without optimization using the "o" modifier (translating the above ZPAQL to C++, compiling, and rerunning zpaq) or decompressing with a non-optimizing program like pzpaq.

I realize there are faster algorithms for computing pi. For example, qpi computes a million digits in about 1 second using (I think) Chudnovsky's formula with binary splitting, FFT multiplication, and Newton-Raphson division. But then the archive would have been larger.   Reply With Quote

2. A small improvement to 114 bytes. Changes:
- Removed check for EOF
- Removed calculation of 10 extra digits. Calculation turns out to be exact anyway.
- Changed 2 byte instruction "b= 0" to 1 byte form "b=0".
- Compressed with "zpaq nisocpi pi.txt.zpaq pi.txt" (no filename, comment, or checksum).

Compression time is 27887 seconds.

Code:
```comp 0 0 0 19 1
0 icm 5
hcomp
halt

(Compute pi to 1000000 digits using the formula:
pi=4; for (d=r1*20/3;d>0;--d) pi=pi*d/(2*d+1)+2;
where r1 is the number of base 100 digits.
The precision is 1 bit per iteration so 20/3
is slightly more than the log2(100) we need.)
a= 100 a*= 100 a*= 50 r=a 1 (r1 = digits base 100)
a*= 20 a/= 3 d=a (d = n iterations)
*b= 40 (M=4)
do

(multiply M *= d, carry in c)
b=r 1 c=0
do
b--
a=*b a*=d a+=c c=a a%= 100 *b=a
a=c a/= 100 c=a
a=b a> 0 while

(divide M /= (2d+1), remainder in c)
a=d a+=d a++ d=a
do
a=c a*= 100 a+=*b c=a a/=d *b=a
a=c a%=d c=a
a=r 1 b++ a>b while
a=d a>>= 1 d=a

b=0 a= 20 a+=*b *b=a
d-- a=d a> 0 while

(output the digits in ASCII, 2 per byte)
do
a=*b a/= 10 a+= 48 out
a=*b a%= 10 a+= 48 out
b++ a=r 1 a>b while
halt
end```  Reply With Quote

3. Originally Posted by Matt Mahoney Some compression results for pi.txt from the Canterbury miscellaneous corpus.
pi.txt contains the first million digits of pi: 3141592653...45815

Code:
```416,101 paq8px_v69 -8
159 zpaq ocpi```
That looks almost like a typo It's really an astonishing reult, both in resulting size and partially also in time needed to compute.  Reply With Quote

4.  Reply With Quote

5. More computing power, actually   Reply With Quote

6. Assuming this is not to dumb a question (I can't find a calc program with logs at present).

If you assume Pi is has random digits, how many bits are needed to express one million digits?
Log2(10^(10^6)) right? How many bits is that please?

Yes, I am that dumb. I know the rough rule is ten bits for every 10^3, I know it is a fractionally less
so (1,000,000/3)*10= 3,333,333 bits / 8 = 416667 bytes. So guess this means you are getting great results.

Sorry if this seems silly, I had to do the steps to see how good you were doing. Congrats to you.  Reply With Quote

7. One digit should occupy 3,321928095 bits, that's 0,415241012 bytes, so million digits should occupy about 415241,012 bytes. Wonder how would arb255 perform.  Reply With Quote

8. @Piotr: There is minor typo. It should be 1 billion according to your result @Earl: You can always use Windows' Calculator for "log" or any other functions like I used Or even better there should be some online programs for computing such function (i.e. Wolfram Integrator is a great online tool).  Reply With Quote

9. ## I don t like floats

I don t like float numbers. In my opinion, math can use only integer numbers to calculate.  Reply With Quote

10. Originally Posted by lunaris I don t like float numbers. In my opinion, math can use only integer numbers to calculate.
No difference. Regular languages are Turing complete with both integer and fp, so whatever maths you do with ints, you can do with floats. Performance is another topic.  Reply With Quote

11. But float is just a concatenation of two integers - matissa and exponent. You can store them separately if you want.  Reply With Quote

12. I don t like float operations because it uses infinite numbers .Integer math always use complete and precise numbers.

I don t care about performance .What bothers me is the cracked math .

Pi for example is a awful number because it's very cracked and infinite.

Is there a way to calculate circles using only integer numbers?  Reply With Quote

13. Originally Posted by lunaris Is there a way to calculate circles using only integer numbers?
use a series-representation of pi

cracked, infinite ugly numbers? ... operations on real numbers are always exact, unless you do it on an finite system like a cpu

e^(i*pi)-1=0 ... that's what i call beauty  Reply With Quote

14. Originally Posted by lunaris I don t like float operations because it uses infinite numbers .Integer math always use complete and precise numbers.

I don t care about performance .What bothers me is the cracked math .

Pi for example is a awful number because it's very cracked and infinite.

Is there a way to calculate circles using only integer numbers?
I think you don't mean float but real numbers that are not rational. Because floats have finite precision and if you treat they as real numbers, calculations often have some errors. About the same as if you tried to used ints this way.  Reply With Quote

15. Because we are using finite materials like a computer , using infinite floating numbers is very ugly thing.

Pi-series still use infinite calculations and still calculates pi.I dont like numbers e,pi and i. Lemniscata (infinity) and zero are acceptable .

No one invented a way to calculate circles ,spheres using only right angles and integer numbers ?  Reply With Quote

16. Yes. Finite float numbers are acceptable.  Reply With Quote

17. 1000000*log(10)/log(256) = 415241 bytes.

BTW this simple stationary order 0 ZPAQ model compresses pi.txt to 415536 (better than paq8px_v69 - in 0.3 seconds. Command: zpaq nisoco0s pi.txt

Code:
```(o0s.cfg)
comp 0 0 0 0 1
0 cm 9 255
hcomp
halt
post
0
end```
Edit: Windows Calc program has logs. Click on View/Scientific.  Reply With Quote

18. Thank you for the math.  Reply With Quote

19. Can you attach the pi.txt and cfg file?   Reply With Quote

20. You can get pi.txt here: http://corpus.canterbury.ac.nz/descriptions/#misc

Or you can create it with the command: zpaq oprpi nul: pi.txt
using pi.cfg above. I suggest running it overnight.  Reply With Quote

21. I used my inter cpu T9300 to test lastnight. now I am still waitting@:@. Matt, could you attach the compressed 114 byte file?  Reply With Quote

22. Attached. It takes about 15 hours on my 2 GHz T3200 laptop with "zpaq ox" or with pzpaq with optimization enabled. I guess it would take 5 times as long without an external g++ compiler. Using more threads in pzpaq won't help because there is only one block.  Reply With Quote

23. Thanks Matt.
I had been analyze the 114 bytes file, and found that it has 5 bytes redundancy. Using my compression method, it can be reached to 112 bytes.  Reply With Quote

24. Contents of pi.txt.zpaq (114 bytes)

Code:
```0000000 122  80  81   1   1  10   0   0   0   0  19   1   3   5   0  56
0000016   0   1   0   0   0 254  85  82 111 233  49 153 114  20 183 195
0000032 210 125 244 198  36  91 120 209  24 215  72  78  26  17  43 168
0000048 220 168  50 202  95  36  99  17 170 197 127 165  71 199 136  88
0000064  23 226 167 242 251 159  73  79 227 240 164 109  33 130  14 205
0000080 160 248  96 171 240 215  74  81 181 141 147 200 203  34 101  97
0000096  16 192 158  66 216  32 116 192 119  67 198 117   0   0   0   0
0000112 254 255
0000114```
122 80 81 = magic number "zPQ"
1 = ZPAQ level
0 0 0 19 1 = COMP section parameters: PM = 2^19 bytes, 1 component
3 5 0 = COMP section (ICM 5), and NUL terminator
56 0 = HCOMP section (HALT), and NUL terminator
0 = filename (empty)
0 = comment (empty)
0 = reserved
254 85 ... 198 117 (87 bytes) = compressed data
0 0 0 0 = end of compressed data marker
254 = end of segment marker (no SHA1 checksum)
255 = end of block marker

The 87 bytes of compressed data decodes to 92 bytes:

1 = flag to indicate PCOMP section is present
89 0 = length of PCOMP section
... (89 bytes) = code in ZPAQL to compute pi
end of segment bit coded with p = 2^-32, which codes as 4 bytes

The compression algorithm is a simple indirect order 0 model. Partial byte context -> bit history -> prediction -> binary arithmetic coding.  Reply With Quote

25. ## Prediction model

Some months ago I started to bridge Kolmogorov complexity and adaptive prediction, so Matt helped me to show that instead of computation in pcomp you can also use hcomp to compute Pi and give a perfect prediction. It has almost the same output file size. (But less digits are used so that one does not have to wait so long for testing.)

From there it was a small step to mix it with a general text model, so that occurrences of Pi are detected and coded with a few bits because of good prediction. More text is in the B.Sc. thesis.

pi10k.cfg
mixedpi2.cfg  Reply With Quote

26. ## Thanks (3):

Gonzalo (26th September 2016),Matt Mahoney (28th September 2016),schnaader (26th September 2016)

#### Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts
•