
Thread: Nibble MMC matchfinder

  1. #1
    Administrator Shelwien
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,378
    Thanks
    215
    Thanked 1,025 Times in 546 Posts

    Nibble MMC matchfinder

    http://nishi.dreamhosters.com/u/nmf_v0.rar

    I made a matchfinder based on my impression of Cyan's MMC,
    and it works, but unfortunately the performance turned out
    much worse than originally expected
    (3x slower than lzma).

    By design, it's somewhat similar to a PPM tree - there's no
    lookahead buffer and it indexes the occurrences of prefixes.
    This seemed like a good idea for a later parsing-optimization implementation.

    The data structure is like this:
    Code:
    struct {
      uint ch:4;    // high nibble of the data byte
      uint ptr0:28; // sibling chain: nodes with different nibble values
      uint cl:4;    // low nibble of the data byte
      uint ptr1:28; // child chain: continues with the next nibble
    };
    i.e. there's a table like in a hash-chain matchfinder, but the data
    bytes are also stored there (instead of in a separate data window),
    and there's a second pointer which references the child nodes in the tree.

    Thus, a ptr0 chain contains at most 16 elements with different nibble
    values, and if the nibble matches, we have to follow ptr1 and look for
    the next nibble.
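    To make the traversal concrete, here's a minimal C sketch of a lookup over such a structure (my own illustration; the names Node, find_nibble and the zero NIL sentinel are hypothetical, not from the nmf source): scan the ptr0 sibling chain for the wanted nibble, and on a hit descend via ptr1.

```c
#include <stdint.h>

/* Hypothetical 8-byte node mirroring the struct above; the field
   meanings are my reading of the post, not the actual nmf source. */
typedef struct {
    uint32_t ch   : 4;  /* nibble value this node represents              */
    uint32_t ptr0 : 28; /* sibling chain: other nibble values, same depth */
    uint32_t cl   : 4;  /* second data nibble                             */
    uint32_t ptr1 : 28; /* child chain: match continues with next nibble  */
} Node;

#define NIL 0u  /* sentinel index marking the end of a chain */

/* Scan a sibling chain (at most 16 nodes, one per distinct nibble
   value) for `nib`; on a hit return its child chain, else NIL. */
static uint32_t find_nibble(const Node *tab, uint32_t head, unsigned nib) {
    for (uint32_t p = head; p != NIL; p = tab[p].ptr0)
        if (tab[p].ch == nib)
            return tab[p].ptr1;   /* descend: look for the next nibble */
    return NIL;
}

/* tiny self-check on a 4-entry table: sibling chain 1 -> 2,
   node 1 has child chain starting at 3 */
static uint32_t demo_lookup(unsigned nib) {
    static const Node tab[4] = {
        { 0x0, 0, 0x0, 0 },  /* index 0 reserved as NIL        */
        { 0x5, 2, 0x0, 3 },  /* nibble 5, sibling 2, child 3   */
        { 0xA, 0, 0x0, 0 },  /* nibble A, end of chain         */
        { 0x1, 0, 0x0, 0 },  /* node on the child level        */
    };
    return find_nibble(tab, 1, nib);
}
```

    The demo table encodes a two-node sibling chain; at most 16 such siblings can exist per level, one per nibble value.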

    Also, this way we have at most 15*(2*maxorder-1) memory lookups, which
    in theory should be better than the unbounded number in hash chains, or
    255*maxorder with a similar bytewise tree (which is presumably what MMC does).

    Unfortunately, we can't just insert the new node into the tree in the
    normal way here, because then an older node would reference a newer
    one, and when that older node falls out of the window, the reference
    would be lost as well.

    So instead, the new node is always added at root level, and then an
    older node with the same nibble value is pushed into the branch
    (recursively).
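    As a toy model of that push-down insertion (my own sketch, not the nmf code; Level, insert and nib_at are made-up names): the newest position always takes the top slot for its nibble, and the displaced older position is re-inserted one level deeper, so links always run from newer to older and can never dangle when the window slides.

```c
/* Pool of 16-way levels: each level maps a nibble value to the newest
   position seen with that nibble, plus a pointer to the sub-level
   reached after matching it. */
enum { NPOOL = 256, EMPTY = -1 };

typedef struct {
    int pos[16];    /* newest position per nibble value, or EMPTY */
    int child[16];  /* sub-level index in the pool, or EMPTY      */
} Level;

static Level pool[NPOOL];
static int pool_top;

static int new_level(void) {
    Level *l = &pool[pool_top];
    for (int i = 0; i < 16; i++) l->pos[i] = l->child[i] = EMPTY;
    return pool_top++;   /* demo only: no overflow check on NPOOL */
}

/* nibble of the string at position p, depth d (high nibble first) */
static unsigned nib_at(const unsigned char *data, int p, int d) {
    unsigned byte = data[p + d / 2];
    return (d & 1) ? (byte & 15) : (byte >> 4);
}

/* Insert position p at level lv, depth d: the newest node takes the
   top slot, and the displaced older node is pushed one level down,
   recursively.  (Real code would cap the recursion at maxorder.) */
static void insert(const unsigned char *data, int lv, int d, int p) {
    unsigned v = nib_at(data, p, d);
    int old = pool[lv].pos[v];
    pool[lv].pos[v] = p;                  /* newest always wins the top */
    if (old == EMPTY) return;
    if (pool[lv].child[v] == EMPTY)       /* lazily create the sub-level */
        pool[lv].child[v] = new_level();
    insert(data, pool[lv].child[v], d + 1, old);  /* push the older one */
}

/* self-check on "abab": position 2 displaces position 0 at the root */
static int demo_pushdown(void) {
    pool_top = 0;
    int root = new_level();
    const unsigned char *s = (const unsigned char *)"abab";
    insert(s, root, 0, 0);   /* "abab" at position 0 */
    insert(s, root, 0, 2);   /* "ab" at position 2, same first nibble */
    int sub = pool[root].child[6];   /* 'a' = 0x61: high nibble 6 */
    return pool[root].pos[6] == 2 && sub != EMPTY && pool[sub].pos[1] == 0;
}
```

    After the second insert, position 2 owns the root slot and position 0 has been pushed down under it, keyed by its next nibble.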

    Due to that, it appeared necessary to have separate functions for
    search and for insertion of a new node, which certainly made the
    processing significantly slower.

    But anyway, it's a working 8N implementation which finds the offsets
    of the nearest matches of length 3..272, and its output matches
    the results of a direct search (see archive).

    Now, questions.

    1. So is this MMC or not?

    2. Any ideas for speed improvement?

  2. Thanks (2): Bulat Ziganshin, encode

  3. #2
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    863
    Thanks
    461
    Thanked 257 Times in 105 Posts
    > 1. So is this MMC or not?
    Yes, it is.
    And your idea of making the search progress by nibble instead of byte looks good.

    > By design, it's somewhat similar to a PPM tree - there's no
    > lookahead buffer and it indexes the occurrences of prefixes.
    > This seemed like a good idea for later parsing optimization implementation.
    Do you mean it's a backward tree?
    i.e. at a given position, with a given byte, you're looking for the best previous prefix? Where are the node statistics stored, then?
    I guess I've not understood this part correctly.

    Also, note that MMC works best with Greedy/Lazy matching strategies.
    A very important property is that we don't need to "insert" nodes - we just add them like in a regular hash chain, and we only insert (i.e. sort) when we are actively looking for a best match.
    In many circumstances, this means that we will only insert ~1/8th of positions. That explains a lot of the speedup compared to BST.

    This property is obviously nullified if we are actively looking for the best match at each and every position (i.e. Optimal Parsing). In this case, I guess BST should be faster.
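    The effect can be illustrated with a toy greedy loop over a plain hash chain (my sketch, not MMC itself; add_pos, longest_match and greedy_count_searches are made-up names): every position is added in O(1), but the chain is only walked once per token, so most positions are never searched - and in MMC, therefore never sorted.

```c
#include <string.h>

enum { HBITS = 8, HSIZE = 1 << HBITS, WIN = 4096, MINMATCH = 3 };

static int head[HSIZE], prev_[WIN];

/* order-3 hash of the next three bytes */
static unsigned hash3(const unsigned char *p) {
    return ((p[0] << 8) ^ (p[1] << 4) ^ p[2]) & (HSIZE - 1);
}

/* O(1) append, exactly like a plain hash chain - no sorting at all */
static void add_pos(const unsigned char *d, int p) {
    unsigned h = hash3(d + p);
    prev_[p] = head[h];
    head[h] = p;
}

/* walk the chain at p and return the longest match (naive reference) */
static int longest_match(const unsigned char *d, int n, int p) {
    int best = 0;
    for (int q = head[hash3(d + p)]; q >= 0; q = prev_[q]) {
        int l = 0;
        while (p + l < n && d[q + l] == d[p + l]) l++;  /* overlap ok */
        if (l > best) best = l;
    }
    return best;
}

/* greedy loop: search once per token, then just add covered positions;
   returns how many chain searches were needed for n input bytes */
static int greedy_count_searches(const unsigned char *d, int n) {
    memset(head, -1, sizeof head);
    int searches = 0, i = 0;
    while (i + MINMATCH <= n) {
        int mlen = longest_match(d, n, i);
        searches++;
        int step = (mlen >= MINMATCH) ? mlen : 1;
        while (step--) {
            if (i + MINMATCH <= n) add_pos(d, i);  /* keep reads in range */
            i++;
        }
    }
    return searches;
}

/* 60 bytes of "abc" repeated: only 4 searches instead of ~58 */
static int demo_searches(void) {
    static unsigned char buf[64];
    for (int i = 0; i < 60; i++) buf[i] = (unsigned char)"abc"[i % 3];
    return greedy_count_searches(buf, 60);
}
```

    Here only 4 of the 58 candidate positions ever trigger a search; MMC piggybacks its sorting work onto exactly those searches, which is where the saving over a BST comes from.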


    > 2. Any ideas for speed improvement?
    I guess the optimizations applied to MMC are still valid for your implementation.
    Typically:
    - multi-level promotion: one node can get promoted by several levels in a single pass. This is useless in an "optimal parsing" scenario, since it's a kind of "catch-up" process for data blindly inserted without sorting.
    - secondary promotions: nodes that are not being searched for, but are nonetheless crossed while searching, can get promoted if we get a chance to find correlations. It's complex, however, and thus slow. The benefit is apparent when the search window is large, i.e. larger than the CPU cache size.
    - stream detection: the usual tricks for handling long streams of zeros, similar to hash chains. Useless for enwik testing.

    All these tricks add substantial complexity though. I mostly recommend the first one.
    Last edited by Cyan; 24th April 2011 at 13:35.

  4. #3
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
    For me, Shelwien's NMF looks like MMC with Secondary Promotions (in Cyan's terminology), as any nibble chain is limited to 17 nodes (head + 16 different nibbles), and when adding a node with some value, NMF pushes the older node with that value recursively up the tree.

    As everyone can see, there's a problem with MMC: with byte-based MMC there can be at most 256 + 1 = 257 nodes at some level, compared to (16 + 1) * 16 + 1 = 273 with NMF. At first look this doesn't seem like a big problem, but it grows when there's more divergence in high nibbles than in entire bytes (relative to the number of all possible values).

    Probably it would be best to make a hybrid MMC: nibble-based on levels where there is low divergence in high nibbles compared to the number of different byte values (e.g. 15 different high nibbles vs 250 different bytes, or 3 different high nibbles vs 30 different bytes), and byte-based otherwise. Also, when there is only one distinct high-nibble value, there's no point in making two nibble levels instead of one byte level.

    That would require converting levels between the nibble-based and byte-based representations, but that could probably be done locally, with the rest of the work deferred (by adding a flag that defers the evaluation).

    I don't really know how stream detection works in Cyan's MMC, but surely with MMC you could add logic that detects repeats, like: abcabcabcabcabclolcat. At the position where the second abc starts, you can detect that there are that many repetitions, and use logic that does much less work than ordinary string searches.
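    A minimal sketch of such repeat logic (my illustration; run_length is a made-up name): once a match at distance dist is known, the run extends with one compare per byte, and the matchfinder can step through it without full searches.

```c
/* length of the repetition at p: how far d[p..] keeps matching itself
   `dist` bytes back (overlapping copies allowed, as usual in LZ) */
static int run_length(const unsigned char *d, int n, int p, int dist) {
    int l = 0;
    while (p + l < n && d[p + l] == d[p + l - dist]) l++;
    return l;
}

/* "abcabcabcabcabclolcat": at the second abc (p=3, dist=3) the run
   covers the remaining four abc copies, i.e. 12 bytes */
static int demo_run(void) {
    const unsigned char *s = (const unsigned char *)"abcabcabcabcabclolcat";
    return run_length(s, 21, 3, 3);
}
```

    Inside such a run, every position trivially has a match at the same distance, so the structure update can be batched or skipped.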

    Unfortunately, I don't have any simple idea how to speed up updating the MMC structure when parsing something like a file composed of book1 + book2 + book1. When parsing the last part of such a file there would be only very long matches, and they would occur at every position. Maybe there is a way, but it would be better to preprocess such a file with srep or something else with much lower memory requirements.
    Last edited by Piotr Tarsa; 24th April 2011 at 14:56.

  5. #4
    Member
    > Unfortunately I don't have any simple idea how to speed up updating
    > MMC structure when parsing files like file combined of book1 + book2 + book1
    Isn't this solved by Greedy Parsing?

    Shelwien, I've been looking at your sample code, and it seems it is actually "inserting", i.e. sorting the tree, at each and every position.
    This is probably what makes it slow.
    As stated earlier, MMC performance starts to become interesting when not sorting everything, but just adding the position into a standard hash chain, which will be sorted later if needed.
    That essentially means we don't have the comfort of working with a fully sorted tree: it's only partially sorted, and we finish the job while searching in it.

    It sounds convoluted, but in the end many positions will never be sorted, or only partially, which translates into a lot of saved CPU cycles.

  6. #5
    Administrator Shelwien
    @Cyan:

    > And your idea of making the search progressing by nibble instead
    > of byte looks good.

    Actually I'm still wondering whether it makes sense to make a
    nibble-wise LZ with that.

    > Do you mean it's a backward tree ?

    Yes, search goes right-to-left.

    > ie at a given position, with a given byte, you're looking for the
    > best previous prefix?

    I'm looking for nearest prefix occurrences.
    Code:
      // brute-force reference: for each position i, print the nearest
      // occurrence for every match length from minlen up to the maximum
      for( i=0; i<f_len; i++ ) {
        int l=Min(i-1,maxlen), m=minlen-1;
        for( j=i-1; j>0; j-- ) {
          n = compare( j, i, l ); // length of substring match at pos i and j
          for( ; m<=n; m++ ) printf( "%08X ", j ); // j is nearest for lengths m..n
        }
      }
    > Where are stored the node statistics then ?

    There are no statistics. What do you mean?

    > In many circumstances, it means that we will only insert ~1/8th of
    > positions. It explains a lot of the speed up compared to BST.

    Well, it's like update exclusion, I guess... It's actually applicable
    to trees too, but it's kinda imprecise like that.
    Still, I guess I need to consider this too... I guess there's no
    way to get a speed like 10MB/s with a 128M window and precise search.

    > This property is obviously nullified if we are actively looking for
    > best match at each and every position (i.e. Optimal Parsing). In
    > this case, i guess BST should be faster.

    I really like trees, like the one in ppmd - it's possible
    to make one without any search at all, just byte lookups
    within long matches.
    But there are some unfortunate issues which I still can't resolve:
    1. Memory usage. Something like order16 may still be relatively ok,
    but order256 ppmd certainly wastes too much memory.
    2. There's no clear solution for a sliding window.
    Blockwise processing is certainly not a good idea, and deleting
    expired contexts would make it 2x slower.
    3. Memory reallocation is the most problematic.
    Ppmd tries very hard in that sense - there's even block splitting
    and merging - but it's likely still possible to exploit it
    with some specially prepared data (making it impossible to
    allocate a node of some size, even though there's enough
    free memory).

    > - multi-level promotion : one node can get promoted by several
    > levels in a single pass. This is useless in an "optimal parsing"
    > scenario, since it's a kind of "catch up" process for data blindly
    > inserted without sorting.

    Actually it might work. Delayed update is certainly a good idea -
    it lets us avoid doing updates for all the unique contexts in the window.
    But on the other hand, the search becomes much more complicated with it;
    it would require scanning up to the end on each level, and sorting the nodes.
    It's hard to say which is better without actually testing it...

    Even as it is, I don't like the fact that I have to compare strings while
    walking the tree, and adding delayed updates would make some searches
    scarily slow.

    > - stream detection : usual tricks for handling long streams of zero,
    > similar to hash chain. Useless for enwik testings.

    Yes; I also plan to use a rep-like filter anyway, might as well
    include RLE in it.


    @Piotr Tarsa:

    > any nibble chain is limited to 17 nodes (head + 16 different nibbles)

    16 nodes; there's no separate head.

    > Probably it would be best to make a hybrid MMC:

    To me, short matches don't make much sense - compression
    would be better if we handled them with a higher-order
    literal model anyway.
    So a hybrid MMC with step size determined by context order
    likely won't make any sense.
    (I know that's not what you suggest, just a simpler approach.)

    > nibble based on levels where there are low divergence in high
    > nibbles compared to number of different byte values (eg 15 different
    > high nibbles vs 250 different bytes, or 3 different high nibbles
    > compared to 30 different bytes), and byte based otherwise.

    Nibbles are also important because of the node structure layout -
    currently the maximum window size there is 128M and I can't
    allow it to get any smaller (2^28 is 256M, but I need an extra
    bit to detect out-of-window pointers).
    However, nibble string comparison is certainly ugly, so
    I'm considering a compromise: 9N nodes and a separate
    data window.

    That way it would certainly be possible to detect in search
    that a chain became too long, and to subdivide it.

    > Also, when there is only one different high nibble value then
    > there's no point in making two nibble-levels instead of one
    > byte-level.

    Actually, even in my implementation the next level is created
    only when a second string with the same nibble value is inserted
    into a subtree.

    > it would be better to preprocess such file with srep or something
    > that has much lower memory requirements.

    Yes, fortunately I have a good method for that, based on anchored hashing,
    so I'm intending to do it anyway.
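    For illustration, here is one toy way anchored hashing can work (my own sketch with made-up parameters W, B, MASK - not necessarily Shelwien's method): cut points are placed where a rolling hash of the last W bytes has its low bits zero, so a long repeat produces the same relative cut points regardless of its alignment in the file.

```c
#include <stdint.h>

enum { W = 4, B = 257, MASK = 7 };  /* tiny demo parameters */

/* record positions i where the hash of d[i-W+1..i] anchors; the
   decision depends only on those W bytes, hence it is
   alignment-independent */
static int anchors(const unsigned char *d, int n, int *out) {
    uint32_t h = 0, bw = 1;
    for (int k = 0; k < W; k++) bw *= B;    /* B^W mod 2^32 */
    int cnt = 0;
    for (int i = 0; i < n; i++) {
        h = h * B + d[i];                   /* roll the window forward  */
        if (i >= W) h -= bw * d[i - W];     /* drop the byte leaving it */
        if (i >= W - 1 && (h & MASK) == 0) out[cnt++] = i;
    }
    return cnt;
}

/* same payload behind prefixes of different lengths: anchors whose
   window lies inside the payload land at identical payload-relative
   offsets */
static int demo_aligned(void) {
    const unsigned char a[] = "0000the quick brown fox jumps over";
    const unsigned char b[] = "zzthe quick brown fox jumps over";
    int oa[64], ob[64], ra[64], rb[64], ca = 0, cb = 0;
    int na = anchors(a, (int)sizeof a - 1, oa);
    int nb = anchors(b, (int)sizeof b - 1, ob);
    /* keep only anchors whose whole window is inside the payload */
    for (int k = 0; k < na; k++) if (oa[k] >= 4 + W - 1) ra[ca++] = oa[k] - 4;
    for (int k = 0; k < nb; k++) if (ob[k] >= 2 + W - 1) rb[cb++] = ob[k] - 2;
    if (ca != cb) return 0;
    for (int k = 0; k < ca; k++) if (ra[k] != rb[k]) return 0;
    return 1;
}
```

    A srep-like filter can then hash whole chunks between anchors and replace repeated chunks by references, with memory proportional to the number of chunks rather than the window size.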
