Results 1 to 25 of 25

Thread: Compressing pi

  1. #1
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts

    Compressing pi

    Some compression results for pi.txt from the Canterbury miscellaneous corpus.
    pi.txt contains the first million digits of pi: 3141592653...45815

    Code:
    1,000,000 uncompressed
    470,439 gzip -9
    438,096 7zip
    431,671 bzip2
    419,212 ppmonstr
    416,101 paq8px_v69 -8
        159 zpaq ocpi
    The ZPAQ configuration file pi.cfg is as follows:

    Code:
    comp 0 0 0 19 1
      0 icm 5
    hcomp
      halt
    pcomp discard ;
      a> 255 ifnot halt endif (no output unless EOF)
    
      (Compute pi to 1000000 digits using the formula:
         pi=4; for (d=r1*20/3;d>0;--d) pi=pi*d/(2*d+1)+2;
       where r1 is the number of base 100 digits in M.
       The precision is 1 bit per iteration so 20/3
       is slightly more than the log2(100) we need. We compute
       10 extra trailing digits and discard them to avoid
       roundoff error.)
      a= 100 a*= 100 a*= 50 a+= 10 r=a 1 (r1 = digits base 100)
      a*= 20 a/= 3 d=a (d iterations)
      *b= 40 (pi = 4)
      do
    
        (multiply M *= d, carry in c)
        b=r 1 c=0
        do
          b--
          a=*b a*=d a+=c c=a a%= 100 *b=a
          a=c a/= 100 c=a
        a=b a> 0 while
    
        (divide M /= (2d+1), remainder in c)
        a=d a+=d a++ d=a
        do
          a=c a*= 100 a+=*b c=a a/=d *b=a
          a=c a%=d c=a
        a=r 1 b++ a>b while
        a=d a>>= 1 d=a
    
        (add 2)
        b= 0 a= 20 a+=*b *b=a
      d-- a=d a> 0 while
    
      (output the digits in ASCII, 2 per byte)
      a=r 1 a-= 10 r=a 1 (discard last 10 bytes)
      do
        a=*b a/= 10 a+= 48 out
        a=*b a%= 10 a+= 48 out
      b++ a=r 1 a>b while
      halt
    end
    The preprocessor discard.bat takes an input and output file as arguments and creates an empty output file:

    Code:
    copy nul: %2
    The postprocessor ignores the decoded output and computes pi to 1,000,000 decimal places after decoding EOF. Thus, pi.cfg works only on this one particular file. The code itself is compressed with an order 0 indirect model. It would be possible to reduce the compressed size further from 159 bytes to 126 by not storing the filename, comment, and checksum using the command "zpaq nisocpi pi.zpaq pi.txt".

    The algorithm is not the most efficient. It took 31868 seconds (about 9 hours) to compress and 27943 seconds to decompress on a 2.67 GHz i7 M620 under 64 bit Linux (using an equivalent shell script for discard "cp /dev/null $2"). Compression is slow because zpaq tests the pre-post processing sequence before encoding, which requires generating the SHA1 hash of pi and comparing it with the SHA1 hash of the input. It would have been 5 times slower without optimization using the "o" modifier (translating the above ZPAQL to C++, compiling, and rerunning zpaq) or decompressing with a non-optimizing program like pzpaq.

    I realize there are faster algorithms for computing pi. For example, qpi computes a million digits in about 1 second using (I think) Chudnovsky's formula with binary splitting, FFT multiplication, and Newton-Raphson division. But then the archive would have been larger.

  2. #2
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    A small improvement to 114 bytes. Changes:
    - Removed check for EOF
    - Removed calculation of 10 extra digits. Calculation turns out to be exact anyway.
    - Changed 2 byte instruction "b= 0" to 1 byte form "b=0".
    - Compressed with "zpaq nisocpi pi.txt.zpaq pi.txt" (no filename, comment, or checksum).

    Compression time is 27887 seconds.

    Code:
    comp 0 0 0 19 1
      0 icm 5
    hcomp
      halt
    pcomp ./discard ;
    
      (Compute pi to 1000000 digits using the formula:
         pi=4; for (d=r1*20/3;d>0;--d) pi=pi*d/(2*d+1)+2;
       where r1 is the number of base 100 digits.
       The precision is 1 bit per iteration so 20/3
       is slightly more than the log2(100) we need.)
      a= 100 a*= 100 a*= 50 r=a 1 (r1 = digits base 100)
      a*= 20 a/= 3 d=a (d = n iterations)
      *b= 40 (M=4)
      do
    
        (multiply M *= d, carry in c)
        b=r 1 c=0
        do
          b--
          a=*b a*=d a+=c c=a a%= 100 *b=a
          a=c a/= 100 c=a
        a=b a> 0 while
    
        (divide M /= (2d+1), remainder in c)
        a=d a+=d a++ d=a
        do
          a=c a*= 100 a+=*b c=a a/=d *b=a
          a=c a%=d c=a
        a=r 1 b++ a>b while
        a=d a>>= 1 d=a
    
        (add 2)
        b=0 a= 20 a+=*b *b=a
      d-- a=d a> 0 while
    
      (output the digits in ASCII, 2 per byte)
      do
        a=*b a/= 10 a+= 48 out
        a=*b a%= 10 a+= 48 out
      b++ a=r 1 a>b while
      halt
    end
    Last edited by Matt Mahoney; 31st January 2011 at 16:57.

  3. #3
    Tester
    Black_Fox's Avatar
    Join Date
    May 2008
    Location
    [CZE] Czechia
    Posts
    471
    Thanks
    26
    Thanked 9 Times in 8 Posts
    Quote Originally Posted by Matt Mahoney View Post
    Some compression results for pi.txt from the Canterbury miscellaneous corpus.
    pi.txt contains the first million digits of pi: 3141592653...45815

    Code:
    416,101 paq8px_v69 -8
        159 zpaq ocpi
    That looks almost like a typo It's really an astonishing reult, both in resulting size and partially also in time needed to compute.
    Last edited by Black_Fox; 31st January 2011 at 17:32.
    I am... Black_Fox... my discontinued benchmark
    "No one involved in computers would ever say that a certain amount of memory is enough for all time? I keep bumping into that silly quotation attributed to me that says 640K of memory is enough. There's never a citation; the quotation just floats like a rumor, repeated again and again." -- Bill Gates

  4. #4
    Member
    Join Date
    Jun 2008
    Location
    Germany
    Posts
    369
    Thanks
    5
    Thanked 6 Times in 4 Posts

  5. #5
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    More computing power, actually

  6. #6
    Member
    Join Date
    Mar 2010
    Location
    Canada
    Posts
    40
    Thanks
    0
    Thanked 2 Times in 2 Posts
    Assuming this is not to dumb a question (I can't find a calc program with logs at present).

    If you assume Pi is has random digits, how many bits are needed to express one million digits?
    Log2(10^(10^6)) right? How many bits is that please?

    Yes, I am that dumb. I know the rough rule is ten bits for every 10^3, I know it is a fractionally less
    so (1,000,000/3)*10= 3,333,333 bits / 8 = 416667 bytes. So guess this means you are getting great results.

    Sorry if this seems silly, I had to do the steps to see how good you were doing. Congrats to you.
    Last edited by Earl Colby Pottinger; 2nd February 2011 at 21:41. Reason: expansion

  7. #7
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
    One digit should occupy 3,321928095 bits, that's 0,415241012 bytes, so million digits should occupy about 415241,012 bytes. Wonder how would arb255 perform.
    Last edited by Piotr Tarsa; 2nd February 2011 at 22:40. Reason: Corrected :)

  8. #8
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    @Piotr: There is minor typo. It should be 1 billion according to your result
    @Earl: You can always use Windows' Calculator for "log" or any other functions like I used Or even better there should be some online programs for computing such function (i.e. Wolfram Integrator is a great online tool).
    BIT Archiver homepage: www.osmanturan.com

  9. #9
    Member
    Join Date
    May 2008
    Location
    brazil
    Posts
    163
    Thanks
    0
    Thanked 3 Times in 3 Posts

    I don t like floats

    I don t like float numbers. In my opinion, math can use only integer numbers to calculate.

  10. #10
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by lunaris View Post
    I don t like float numbers. In my opinion, math can use only integer numbers to calculate.
    No difference. Regular languages are Turing complete with both integer and fp, so whatever maths you do with ints, you can do with floats. Performance is another topic.

  11. #11
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
    But float is just a concatenation of two integers - matissa and exponent. You can store them separately if you want.

  12. #12
    Member
    Join Date
    May 2008
    Location
    brazil
    Posts
    163
    Thanks
    0
    Thanked 3 Times in 3 Posts
    I don t like float operations because it uses infinite numbers .Integer math always use complete and precise numbers.

    I don t care about performance .What bothers me is the cracked math .

    Pi for example is a awful number because it's very cracked and infinite.

    Is there a way to calculate circles using only integer numbers?

  13. #13
    Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    286
    Thanks
    9
    Thanked 33 Times in 21 Posts
    Quote Originally Posted by lunaris View Post
    Is there a way to calculate circles using only integer numbers?
    use a series-representation of pi

    cracked, infinite ugly numbers? ... operations on real numbers are always exact, unless you do it on an finite system like a cpu

    e^(i*pi)-1=0 ... that's what i call beauty

  14. #14
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by lunaris View Post
    I don t like float operations because it uses infinite numbers .Integer math always use complete and precise numbers.

    I don t care about performance .What bothers me is the cracked math .

    Pi for example is a awful number because it's very cracked and infinite.

    Is there a way to calculate circles using only integer numbers?
    I think you don't mean float but real numbers that are not rational. Because floats have finite precision and if you treat they as real numbers, calculations often have some errors. About the same as if you tried to used ints this way.

  15. #15
    Member
    Join Date
    May 2008
    Location
    brazil
    Posts
    163
    Thanks
    0
    Thanked 3 Times in 3 Posts
    Because we are using finite materials like a computer , using infinite floating numbers is very ugly thing.

    Pi-series still use infinite calculations and still calculates pi.I dont like numbers e,pi and i. Lemniscata (infinity) and zero are acceptable .


    No one invented a way to calculate circles ,spheres using only right angles and integer numbers ?

  16. #16
    Member
    Join Date
    May 2008
    Location
    brazil
    Posts
    163
    Thanks
    0
    Thanked 3 Times in 3 Posts
    Yes. Finite float numbers are acceptable.

  17. #17
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    1000000*log(10)/log(256) = 415241 bytes.

    BTW this simple stationary order 0 ZPAQ model compresses pi.txt to 415536 (better than paq8px_v69 - in 0.3 seconds. Command: zpaq nisoco0s pi.txt

    Code:
    (o0s.cfg)
    comp 0 0 0 0 1
      0 cm 9 255
    hcomp
      halt
    post
      0
    end
    Edit: Windows Calc program has logs. Click on View/Scientific.
    Last edited by Matt Mahoney; 3rd February 2011 at 23:59.

  18. #18
    Member
    Join Date
    Mar 2010
    Location
    Canada
    Posts
    40
    Thanks
    0
    Thanked 2 Times in 2 Posts
    Thank you for the math.
    Last edited by Earl Colby Pottinger; 6th February 2011 at 00:39.

  19. #19
    Member
    Join Date
    May 2009
    Location
    China
    Posts
    36
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Can you attach the pi.txt and cfg file?

  20. #20
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    You can get pi.txt here: http://corpus.canterbury.ac.nz/descriptions/#misc

    Or you can create it with the command: zpaq oprpi nul: pi.txt
    using pi.cfg above. I suggest running it overnight.

  21. #21
    Member
    Join Date
    May 2009
    Location
    China
    Posts
    36
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I used my inter cpu T9300 to test lastnight. now I am still waitting@:@. Matt, could you attach the compressed 114 byte file?

  22. #22
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    Attached. It takes about 15 hours on my 2 GHz T3200 laptop with "zpaq ox" or with pzpaq with optimization enabled. I guess it would take 5 times as long without an external g++ compiler. Using more threads in pzpaq won't help because there is only one block.
    Attached Files Attached Files

  23. #23
    Member
    Join Date
    May 2009
    Location
    China
    Posts
    36
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Thanks Matt.
    I had been analyze the 114 bytes file, and found that it has 5 bytes redundancy. Using my compression method, it can be reached to 112 bytes.

  24. #24
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    Contents of pi.txt.zpaq (114 bytes)

    Code:
    0000000 122  80  81   1   1  10   0   0   0   0  19   1   3   5   0  56
    0000016   0   1   0   0   0 254  85  82 111 233  49 153 114  20 183 195
    0000032 210 125 244 198  36  91 120 209  24 215  72  78  26  17  43 168
    0000048 220 168  50 202  95  36  99  17 170 197 127 165  71 199 136  88
    0000064  23 226 167 242 251 159  73  79 227 240 164 109  33 130  14 205
    0000080 160 248  96 171 240 215  74  81 181 141 147 200 203  34 101  97
    0000096  16 192 158  66 216  32 116 192 119  67 198 117   0   0   0   0
    0000112 254 255
    0000114
    122 80 81 = magic number "zPQ"
    1 = ZPAQ level
    1 = block header version
    10 0 = header length
    0 0 0 19 1 = COMP section parameters: PM = 2^19 bytes, 1 component
    3 5 0 = COMP section (ICM 5), and NUL terminator
    56 0 = HCOMP section (HALT), and NUL terminator
    1 = segment header version
    0 = filename (empty)
    0 = comment (empty)
    0 = reserved
    254 85 ... 198 117 (87 bytes) = compressed data
    0 0 0 0 = end of compressed data marker
    254 = end of segment marker (no SHA1 checksum)
    255 = end of block marker

    The 87 bytes of compressed data decodes to 92 bytes:

    1 = flag to indicate PCOMP section is present
    89 0 = length of PCOMP section
    ... (89 bytes) = code in ZPAQL to compute pi
    end of segment bit coded with p = 2^-32, which codes as 4 bytes

    The compression algorithm is a simple indirect order 0 model. Partial byte context -> bit history -> prediction -> binary arithmetic coding.
    Last edited by Matt Mahoney; 22nd February 2011 at 21:01.

  25. #25
    Member
    Join Date
    Dec 2014
    Location
    Berlin
    Posts
    29
    Thanks
    36
    Thanked 26 Times in 12 Posts

    Prediction model

    Some months ago I started to bridge Kolmogorov complexity and adaptive prediction, so Matt helped me to show that instead of computation in pcomp you can also use hcomp to compute Pi and give a perfect prediction. It has almost the same output file size. (But less digits are used so that one does not have to wait so long for testing.)

    From there it was a small step to mix it with a general text model, so that occurrences of Pi are detected and coded with a few bits because of good prediction. More text is in the B.Sc. thesis.

    pi10k.cfg
    mixedpi2.cfg
    Last edited by pothos2; 15th November 2018 at 21:55. Reason: update PDF link

  26. Thanks (3):

    Gonzalo (26th September 2016),Matt Mahoney (28th September 2016),schnaader (26th September 2016)

Similar Threads

  1. compressing a really small 1k .COM file
    By Rugxulo in forum Data Compression
    Replies: 3
    Last Post: 28th November 2009, 01:32
  2. Compressing small bits of data
    By fredrp in forum Data Compression
    Replies: 9
    Last Post: 28th April 2009, 23:27

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •