
Thread: State of PPM and Shkarin's PPMII algorithm

  #1 SolidComp (Member)

    Hi all – What's the situation with PPM in comparison to other approaches, in terms of ratio, complexity, and speed tradeoffs? I don't see a PPM or PPMII thread here, and I'm curious to learn more about it. Which codecs are using PPM?

    Also, what happened with Dmitry Shkarin's PPMII algorithm? I was intrigued by this passage in Steinruecken's 2014 thesis (PDF) on lossless data compression:

    Shkarin (2001a) published a finite-depth algorithm named PPMII (PPM with information inheritance), which uses custom probability estimation and update rules that were manually optimised for good compression and speedy computation. As of 2014, the results produced by PPMII are still state of the art and unbeaten by any other known PPM variant.
    How does PPMII compare to other, non-PPM approaches? (I didn't realize until now that PPMd is also Shkarin's.)

    (That thesis also references a 1998 paper by Charles Bloom (PDF) that focuses on PPM. I haven't read it yet.)

    Thanks.

  #2 Shelwien (Administrator)

    > Hi all – What's the situation with PPM in comparison to other approaches,

    PPM can be viewed as an ultimate version of ROLZ with unary length coding (known as escape coding).

    > I don't see a PPM or PPMII thread here, and I'm curious to learn more about it.

    Well, you can read this: https://marknelson.us/posts/1991/02/...ion.html#part2
    or this: http://mattmahoney.net/dc/dce.html#Section_422

    The main characteristic of PPM is escape coding, which is used to combine the predictions of multiple context models.
    If we combined the same predictors with logistic mixing instead, it would be CM.
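
    As a toy illustration of that escape mechanism (my own sketch, not Shkarin's code): try the longest context first; if the symbol was never seen there, code an escape and fall back to a shorter context, down to a uniform order(-1) model. Escape probabilities here follow the simple PPMC rule, distinct/(total+distinct); real PPMII uses much more refined, hand-tuned estimators.

```python
from collections import defaultdict

class ToyPPM:
    """Toy order-N PPM with PPMC-style escape estimation (illustration only)."""

    def __init__(self, max_order=2):
        self.max_order = max_order
        # counts[context][symbol] -> number of occurrences
        self.counts = defaultdict(lambda: defaultdict(int))

    def prob(self, history, symbol):
        """Blended probability of `symbol` after `history`, via escapes."""
        p_escape_path = 1.0  # product of escape probabilities taken so far
        for order in range(min(self.max_order, len(history)), -1, -1):
            ctx = tuple(history[len(history) - order:])
            table = self.counts.get(ctx)
            if not table:
                continue  # deterministic escape from an empty context
            total = sum(table.values())
            distinct = len(table)
            if symbol in table:
                return p_escape_path * table[symbol] / (total + distinct)
            p_escape_path *= distinct / (total + distinct)  # code an escape
        return p_escape_path / 256  # uniform order(-1) fallback

    def update(self, history, symbol):
        for order in range(min(self.max_order, len(history)) + 1):
            ctx = tuple(history[len(history) - order:])
            self.counts[ctx][symbol] += 1

model = ToyPPM(max_order=2)
text = "abracadabra"
for i, ch in enumerate(text):
    model.update(text[:i], ch)

print(model.prob("ra", "c"))  # seen in the ('r','a') context -> 0.5
print(model.prob("ra", "z"))  # never seen: escapes down to order(-1)
```

    Replacing the escape-based fallback with a logistic mix of all the per-order predictions is exactly the step that turns this family into CM.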

    > Which codecs are using PPM?

    The only practically useful one is PPMd and its derivatives (ppmonstr, PPMs, durilca*).

    However, there are many independent implementations of PPM; you can look for them on LTCB.
    Even @encode has 2-3 implementations, e.g. http://mattmahoney.net/dc/text.html#1936

    > in terms of ratio, complexity, and speed tradeoffs?

    Ratio is 5-10% higher than LZMA, which in turn is 10% higher than zstd.

    PPMd doesn't implement parsing optimization, so its encoding speed can reach ~20 MB/s or so (mostly with PPMs);
    decoding is symmetric, so it runs at about the same speed (or slightly lower).

    Speed is mostly determined by memory latency (L3 or RAM), so there's not much room for improvement
    on current CPU architectures. Twenty years ago PPM could compete with LZ in speed too,
    but relative memory latency (compared to CPU speed) has grown much higher since then.

    PPM also shares some properties with BWT-based compression methods (ratio etc.),
    but BWT offers more options for speed optimization, so PPM hasn't been very popular lately.

    > Also, what happened with Dmitry Shkarin's PPMII algorithm?

    You can access most of its versions here: https://compression.ru/ds/
    Shkarin also participated in GDC-2020, so there are new builds of all his codecs
    (with very little difference from the previous releases).

    > How does PPMII compare to other, non-PPM approaches?

    Ratio-wise, CM is significantly better.

    Also, PPMd is beaten by BWT codecs in both speed and ratio,
    while ppmonstr is not open-source and much slower anyway
    (similar to CM in ratio and speed).

    However, PPM has an advantage in small-block compression: table-based algorithms
    (ROLZ, BWT, CM) spend time on table initialization at the start of each block,
    while PPM can just reset its memory pool pointer.
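
    That startup difference can be sketched like this (hypothetical toy classes, not PPMd internals): the table-based coder pays a cost proportional to its table size before every block, while the PPM-style pool reset is a single pointer assignment.

```python
TABLE_ENTRIES = 1 << 16  # stand-in size for a ROLZ/BWT/CM hash table

class TableCoder:
    """Table-based coder: per-block startup must clear the whole table."""
    def __init__(self):
        self.table = [0] * TABLE_ENTRIES

    def reset(self):
        for i in range(TABLE_ENTRIES):  # O(table size) per block
            self.table[i] = 0

class PoolCoder:
    """PPM-style coder: context nodes come from a bump-pointer pool."""
    def __init__(self, pool_bytes=1 << 16):
        self.pool = bytearray(pool_bytes)  # node storage
        self.top = 0                       # next free offset

    def alloc(self, n):
        node = self.top
        self.top += n
        return node

    def reset(self):
        self.top = 0  # O(1): old nodes are simply abandoned
```

    For a large block the initialization cost is amortized away, which is why this only matters for small-block use.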

  #3 Gonzalo (Member)
    Quote Originally Posted by Shelwien:
    Also, PPMd is beaten by BWT codecs in both speed and ratio,
    ... which ones for example?

  #4 Shelwien (Administrator)
    Most of them, I guess?
    Ratio-wise it's not so simple, as BWT has more problems with local adaptation,
    and PPM results on binary files can be improved with sparse models etc.,
    which isn't really possible for BWT.
    But sparse models are only available in ppmonstr, while ppmd is pretty similar to plain BWT.
    Code:
    21,388,296 33.473s 34.148s // ppmd -o10 -m256 -r1
    22,325,416 19.387s 19.849s // ppmd -o10 -m256 -r0
    
    27,343,261 12.084s 13.082s // ppms -o8
    
    21,688,998 12.937s  8.294s // bsc -b51 -m0 -cp -e2 -p -t -T (256MB memory usage)
    21,835,062 11.412s  6.656s // bsc -b51 -m0 -cp -e1 -p -t -T (faster entropy model)
    21,820,628 11.600s  6.734s // bsc -b51 -m0 -cp -e1 -t -T (preproc enabled)
    
    21,323,232 14.400s 15.763s // bcm 2.00a3 c51r
    Maniscalco's and Lucas' works are probably better than old bsc,
    but would likely require more preparation work for a fair comparison.

  #5 SolidComp (Member)
    Quote Originally Posted by Shelwien:

    > in terms of ratio, complexity, and speed tradeoffs?

    Ratio is 5-10% higher than LZMA, which in turn is 10% higher than zstd.
    What direction is a "higher" ratio pointing here? PPMd compresses more than LZMA, which compresses more than Zstd?

  #6 Gonzalo (Member)
    Quote Originally Posted by Shelwien:
    Most of them, I guess?
    Ratio-wise it's not so simple, as BWT has more problems with local adaptation,
    and PPM results on binary files can be improved with sparse models etc.,
    which isn't really possible for BWT.
    But sparse models are only available in ppmonstr, while ppmd is pretty similar to plain BWT.
    Code:
    21,388,296 33.473s 34.148s // ppmd -o10 -m256 -r1
    22,325,416 19.387s 19.849s // ppmd -o10 -m256 -r0
    
    27,343,261 12.084s 13.082s // ppms -o8
    
    21,688,998 12.937s  8.294s // bsc -b51 -m0 -cp -e2 -p -t -T (256MB memory usage)
    21,835,062 11.412s  6.656s // bsc -b51 -m0 -cp -e1 -p -t -T (faster entropy model)
    21,820,628 11.600s  6.734s // bsc -b51 -m0 -cp -e1 -t -T (preproc enabled)
    
    21,323,232 14.400s 15.763s // bcm 2.00a3 c51r
    Maniscalco's and Lucas' works are probably better than old bsc,
    but would likely require more preparation work for a fair comparison.

    Hmm... I find these results odd, since on my computer ppmd is usually faster than bsc and bcm. I use Bulat's fazip with default options, which are -o10 -m48.
    But even when I run exactly the same test as you, I get different results.
    BTW, what is the file? enwik8? I just took a JSON file I had lying around, but the result is representative of what I see every day. I have a script that compares similar compressors and I run it pretty frequently.

    [attached image: imagen.png, a screenshot of the benchmark script's output]

    The last command is how I usually run bsc.

  #7 Shelwien (Administrator)
    @SolidComp:

    > What direction is a "higher" ratio pointing here? PPMd compresses more than LZMA, which compresses more than Zstd?

    Yes. https://en.wikipedia.org/wiki/Data_c...tio#Definition
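
    Spelled out with the linked definition (ratio = uncompressed size / compressed size, so a higher number means stronger compression; the sizes below are made-up, purely to show the direction):

```python
def compression_ratio(uncompressed_size, compressed_size):
    """Compression ratio per the usual definition: original / compressed."""
    return uncompressed_size / compressed_size

# Hypothetical sizes for a 100 MB input:
print(compression_ratio(100_000_000, 25_000_000))  # stronger codec -> 4.0
print(compression_ratio(100_000_000, 27_000_000))  # weaker codec, smaller ratio
```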

    @Gonzalo:

    Your file has too many long matches, which is bad for the BWT sort (too many redundant operations)
    and good for PPMd (which keeps encoding whole bytes in the max-order context, while bsc is bitwise).
    Your test also shows that enabling LZP preprocessing fixes that problem and provides an 18.5% higher ratio.
    It's likely that -b51 instead of -b1000 in the 2nd bsc example would improve speed,
    while the ratio would remain higher than ppmd's.
    (51 is 256/5, where 5N is BWT memory usage, to match ppmd's -m256.)
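
    The arithmetic behind that block-size choice, spelled out (sizes in MB):

```python
ppmd_memory = 256        # MB, matching ppmd -m256
bwt_cost_per_byte = 5    # a BWT block of N bytes needs roughly 5N bytes
block_size = ppmd_memory // bwt_cost_per_byte
print(block_size)        # -> 51, i.e. bsc -b51
```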

    In any case, BWT has conceptual advantages for speed optimization,
    since it's batch processing rather than incremental, plus there are a lot of
    entropy-coding options for speed/ratio tradeoffs.

