Results 1 to 11 of 11

Thread: QUAD 1.10 overview

  1. #1
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    Traditionally, some facts about upcoming version. Okay, what's already done:

    + Faster decompression (Thanks to Criss (American spelling of Chris ))
    + Slightly higher compression with Max mode (Thanks to Uwe)
    + Added a new option - -f - force overwrite of output file - I just tired to delete each time the outputfile at testing
    + Improved option parsing, now you can concatenate options:
    quad -d -f 1.txt.quad
    is equal to
    quad -df 1.txt.quad
    + Some source code changes


  2. #2
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    I have an idea to replace the "x" option with "9":

    quad -9 book1


  3. #3
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    I just tested the decompression speed - it's CRAZY - near to 2 times faster!!!

  4. #4
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    Some testing results:

    FILE PAK0.PAK (QUAKE 2 RESOURCES)
    Size: 183,997,730 bytes
    Packed: 93,457,814 bytes

    Decompression times:
    1.10: 36 sec
    1.09: 58 sec


  5. #5
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by encode
    I have an idea to replace the "x" option with "9":
    Why?

  6. #6
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    Just an idea for improvement. However, I think I will keep the "x" switch. By the way, the "f" switch is really cool - It's very useful, especially at speed testing.


  7. #7
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    fp.log packed to 604 KB:
    fp.log.quad


  8. #8
    Moderator

    Join Date
    May 2008
    Location
    Tristan da Cunha
    Posts
    2,034
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Some nice improvements!

    Quote Originally Posted by encode
    Just an idea for improvement. However, I think I will keep the "x" switch. By the way, the "f" switch is really cool - Its very useful, especially at speed testing.
    Please dont change the "-x" to "-9" unless you plan on adding nine (-1 to -9) different compression levels.

  9. #9
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    Okay!

  10. #10
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    Quote Originally Posted by Malcolm Taylor
    ...
    ROLZ2 and ROLZ3 use optimal parsing, which is where they get a lot of
    compression from
    (similar to the technique used in 7zip).ROLZ uses an order-1 context for matches, I have tried many times to use
    Normal: 4,776,318 bytesvarious other contexts, but this turns out to always be the best.
    ROLZ2 uses an order-1 (and a bit) fast literal coder. ROLZ3 uses a PAQ
    inspired order-2 literal encoder.
    Match flags are encoded using a state-machine model.

    Overall, ROLZ2 is quite similar to 7zip in algorithm choices, except for
    the reduced offset matching.
    Some testing results (ROLZ2), PariahInterface.utx:
    Normal: 4,776,318 bytes
    Max: 4,784,591 bytes
    Fast: 5,556,744 bytes

    With Fast mode I bet ROLZ2 uses Lazy Evaluation - the same parsing strategy as with QUADs Normal mode (Default). However, with Normal and Max mode ROLZ2 uses Oprimal Parsing (Dynamic Programming with choice cost estimators). Do you see the difference? Current Max mode in QUAD is good, but far from optimal. Now I can say, Im pretty close for the practical implementation of such parsing scheme.

    Some info:
    Quote Originally Posted by Malcolm Taylor
    ...
    There is also for lz77 several different optimal parsing methods. One
    (used by quantum IIRC) parses all the matches in the file, and then
    passes backwards over them finding the best match to reach each byte.
    The result is optimal for a static encoding (much like FP).
    Another one (used by 7zip and ROLZ and others), is a dynamic programming
    approach which can cope with adaptive encodings. This often leads to
    some fantastic results. Although I call this optimal, it is still only a
    near optimal method. It always converges to a local minimum, so may not
    always hit the global minimum.
    ...
    It is easier to understand than any other optimal parsing I have come
    across, but not necessarily obvious.
    1) Find all the matches for every byte of your file.
    2) Start at the end (position n).
    3) Find the cheapest way to get to n from n-1, this is the best so far.
    4) Find the cheapest way to get to n from n-2, if better than the best
    so far, then this new way becomes the best so far.
    5) Repeat step 4 until we reach the start of the file.

    By induction the way to get from 0 to n that you end up with is the
    cheapest.

    Things are a little more complicated when you consider the actual
    implementation. As you go, you have to remember the cheapest from each
    byte - ie. have an array of the best match found for each byte (or
    literal which we can think of as a length 1 match), and the cost.

    If you are testing position x which has a match length l, then the total
    cost of reaching the end of the file will be:
    best_cost_found(x) = cost_of_match(l) + best_cost_found(x+l).
    In other words, the total cost is the cost of the match at the current
    position, plus the best cost found earlier from where the match ends.

    Then you try all match lengths from l down to 1, and save the best total
    cost and match details for the current byte, and repeat for the previous
    one...

    This parsing depends on a well defined cost function, which implies a
    static encoding. If your encoding is dynamic, then it kindof screws with
    the optimisation, especially since you parse backwards. One way to solve
    this is to parse optimally only over small blocks (say 4096), then you
    can have static encodings only for each block.
    The definition of the ROLZ algorithm:

    Quote Originally Posted by Malcolm Taylor
    ...
    The fast modes make use of an algorithm called Reduced Offset LZ (ROLZ). This can be regarded as a fast large dictionary LZ.
    ...
    Simply put it is a table driven LZ algorithm that makes use of the
    order-1 context to reduce the offset set. In other words matches are
    made only to offsets with the same order-1 context. The main advantage
    of the algorithm is that it can cover very large dictionaries
    (equivalent to 1-2Mb under other techniques) both efficiently and
    fast. I use an order-1 context model to code the literals which is why
    decompression speeds are not comparable with Winzip.
    Okay, and now the first English translation of the Optimal Parsing scheme by Igor Pavlov (LZMA, 7-Zip):


    Quote Originally Posted by Igor Pavlov
    OPTIMAL LZH

    1) Get matches for each offset. During searching, additionally collect information about optimal (by distance) matches from 2 to max length.

    Offsets[] = Get_Longest_And_Other_Good_Matches();

    // Offsets.Size = length of longest match.
    // Offsets[i] = back offset in dictionary for match with len=i.

    BYTE Get_Current_Literal();
    // returns current byte

    2) Looking at previous huffman blocks, we can estimate number of bits needed for each variant (match/literal)

    int Get_Match_Huffman_Price(int Length, int Offset);

    // Length = length of match
    // Offset = offset of match;
    // Result = number of bits for coding this match;

    int Get_Literal_Huffman_Price(BYTE Literal);
    // Result = number of bits for coding this Literal;

    Build optimal sequence of codes for many steps ahead.

    We have a large array a[]:
    a[i] =
    {
    int Price; // the price in bits, to get to ith byte
    struct
    {
    int Prev; // position, where we jump to current one (=i)
    // for literal: Prev = i - 1
    // for match: Prev = i - Length

    int Offset; // offset in buffer (in case of match)
    // for storing match from prev to i
    }

    }

    a) For all elements set Price to Infinity

    b)
    for(int i=0; i < Big_Value; i++)
    {
    // here we have some conditions to break the loop
    // ...

    // get array Offsets[2..Longest_match_length] of offsets in buffer, read 1)

    Offsets[] = Get_Longest_And_Other_Good_Matches();

    for(int Len = 1; Len < Offsets[].Length; Len++) // len = 1 means Literal
    {
    // estimate the price in bits for current "jump" to Len symbols ahead
    if (Len == 1) // its a literal
    aPrice = Get_Literal_Huffman_Price(Get_Current_Literal());
    else
    aPrice = Get_Match_Huffman_Price(Len, Offsets[Len]);

    // and get price for a new candidate to a[i + Len];
    aNewPrice = a[i].Price + aPrice;

    if (aNewPrice < a[i + Len].Price )
    // if older variant is worse than new, replace it with later
    // (a[i + Len] will refer to i)
    {
    a[i + Len].Price = aNewPrice;
    a[i + Len].Prev = i;
    a[i + Len].Offset = Offsets[Len];
    }
    }

    }

    c) Now, scan a[] from end, collect and code optimal match/literal sequences

    End.

  11. #11
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,985
    Thanks
    377
    Thanked 353 Times in 141 Posts
    The proof of not optimally Max mode in QUAD:

    canterbury.tar, 2,799,104 bytes

    (Both versions) - Normal mode: 600,568 bytes
    QUAD 1.10 - Max mode: 605,177 bytes
    QUDA 1.09 - Max mode: 606,311 bytes


Similar Threads

  1. lzpm 0.03 overview
    By encode in forum Forum Archive
    Replies: 3
    Last Post: 28th April 2007, 22:16
  2. lzpm overview
    By encode in forum Forum Archive
    Replies: 4
    Last Post: 14th April 2007, 23:30
  3. QUAD 1.11 overview
    By encode in forum Forum Archive
    Replies: 45
    Last Post: 1st April 2007, 22:36
  4. QUAD v1.08 overview
    By encode in forum Forum Archive
    Replies: 40
    Last Post: 12th March 2007, 02:15
  5. Quad 1.05a overview
    By encode in forum Forum Archive
    Replies: 47
    Last Post: 21st February 2007, 21:16

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •