Thread: About Ring Buffers

  1. #1
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    863
    Thanks
    461
    Thanked 257 Times in 105 Posts

    About Ring Buffers

    Hi

    Thinking about a potential next compression program, I was wondering how to solve a long-standing issue with my current I/O code.

    Currently, the I/O and compression code are completely separated, and I wish to keep it that way, for simplicity. But I'm wondering if that's really possible.

    Currently, the file is divided into blocks of 8MB before being fed to the compression algorithm. But between blocks, everything is initialised again.
    Therefore, no information from the previous block is used to start compressing the next one.

    This is simple, but obviously some efficiency is lost at each initialisation.
    Indeed, that is the main reason why a "large" 8MB block was selected as the I/O buffer within LZP2/LZ4: to minimize the efficiency loss.

    But this shouldn't be necessary.
    For example, LZ4 uses only 64KB for the sliding window.
    So obviously, we could go for much less memory.

    Looking for a solution, I've read many times that a "ring buffer" should be the preferred method. But it comes with problems of its own.

    As an example, let's imagine a simple ring buffer which is fed, say, 64KB at a time, with a total size of 512KB (8 slots).
    When the last slot is filled up, it is simply a matter of filling the first one again.

    OK, but what about the hash/context/reference tables built so far?
    ==> Any reference to the first slot should be discarded (if within the authorized distance)
    ==> Any reference to "forward data" (from a ring buffer perspective) no longer respects the "max distance" rule (if any)
    ==> Any match sequence which reaches the end of the ring buffer should continue at the beginning of the ring buffer.

    This looks significant enough to have some impact on the compression algorithm (i.e. the ring buffer process is not transparent!).
    Although there are solutions to each problem, I'm worried about their potential impact on complexity, dependencies(!) and performance.

    For example:
    ==> about discarding: go through the pointer table and discard each and every entry which points into slot 1 (which translates into two comparisons per entry)
    ==> about distances: detect "negative" distances and simply correct them (could be a simple mod (%), but then the compression algorithm must know the size of the ring buffer!)
    ==> about matches which "cross" the end of the ring buffer: simply discard them. But then it also means an additional "end of ring buffer" check within the main comparison loop, and that one is going to cost an arm...
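    For illustration, here is a minimal sketch of these three workarounds. All names and sizes are illustrative (not from any actual codec), and it assumes a power-of-two ring size so the mod reduces to a mask, and absolute (monotonically increasing) positions so "negative" distances never appear in the first place:

    ```cpp
    #include <cstdint>

    // Hypothetical sketch, assuming a power-of-two ring size.
    static const uint32_t RING_BITS = 19;               // 512KB ring
    static const uint32_t RING_SIZE = 1u << RING_BITS;  // 8 slots of 64KB
    static const uint32_t RING_MASK = RING_SIZE - 1;

    // Distance between the current absolute position and a candidate match
    // position; with absolute positions there is no "negative distance" case.
    inline uint32_t matchDistance(uint64_t curPos, uint64_t matchPos) {
        return (uint32_t)(curPos - matchPos);
    }

    // "Lazy" discarding: instead of sweeping the table when a slot is
    // recycled, check on access whether the candidate is still in the window.
    inline bool inWindow(uint64_t curPos, uint64_t matchPos) {
        return (curPos - matchPos) < RING_SIZE;  // a single comparison
    }

    // Match loop with an explicit end-of-ring check: the comparison stops
    // at the ring boundary instead of wrapping (the "discard" option).
    inline uint32_t matchLength(const uint8_t* ring, uint64_t curPos,
                                uint64_t matchPos, uint32_t maxLen) {
        uint32_t len = 0;
        while (len < maxLen) {
            uint32_t c = (uint32_t)((curPos + len) & RING_MASK);
            uint32_t m = (uint32_t)((matchPos + len) & RING_MASK);
            if (ring[c] != ring[m]) break;
            ++len;
            if (c == RING_MASK || m == RING_MASK) break; // end of ring
        }
        return len;
    }
    ```

    Note that the extra boundary check inside the loop is exactly the cost being worried about here.
    
    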


    Well, I guess this issue has already been largely solved and worked on, so I'm merely looking for "best practices". It could even be something other than a ring buffer (a copy of the dictionary, for example).
    The main idea is to avoid re-inventing the wheel for something common (is it?). If possible, keeping the compression algorithm independent from such design choices would be preferable. Alternatively, concentrating the dependencies into a dedicated function could be an interesting workaround.

    Well, any idea is welcome


    Regards

  2. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,377
    Thanks
    215
    Thanked 1,024 Times in 545 Posts
    > Currently, I/O and compression codes are completely
    > separated, and I wish to keep it that way, for simplicity.
    > But I'm wondering if that's really possible.

    It's possible, but that kind of virtualization is usually slow.
    And normally the compression algorithm has to be aware of
    the window size etc.

    > Indeed, it is the main reason why a "large" block of 8MB was
    > selected as I/O buffer within LZP2/LZ4, to minimize
    > efficiency loss.

    Actually, you would also lose quite some time in I/O like that.
    It might be counter-intuitive, but loading the whole file, then
    processing it in memory, then storing the code, is much slower
    than using smaller buffers.

    I'd like to remind about this benchmark:
    http://www.ctxmodel.net/files/hddtest.png
    It shows that a 2M buffer is best for input and 4k for output.

    > As an example, let's imagine a simple ring buffer which is
    > fed, say, 64KB at a time, with a total size of 512KB (8 slots).
    > When the last slot is filled up, then it is simply a matter
    > of filling again the first one.

    I used a separate "window buffer" and "input buffer" instead -
    so there were no slots, just start and end offsets in the window.

    > OK, but what about the hash/context/references tables built ?
    > ==> Any reference to the first slot should be discarded (if
    > within authorized distance)

    I just stored absolute offsets in the hashtable - it's then
    easy to check whether an offset is in the window.
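    A minimal sketch of that absolute-offset approach (names and sizes are illustrative, not from any actual codec): the table stores positions in the uncompressed stream, never ring indices, so validity is one subtraction and one comparison.

    ```cpp
    #include <cstdint>

    static const uint32_t WINDOW_SIZE = 1u << 16;   // 64KB window (assumed)
    static const uint32_t HASH_SIZE   = 1u << 12;

    struct HashTable {
        // Absolute offsets into the uncompressed stream;
        // 0 is treated as "empty" here, for simplicity.
        uint64_t pos[HASH_SIZE];
    };

    // A stored offset is usable iff it lies within the window
    // behind the current position.
    inline bool candidateValid(const HashTable& ht, uint32_t h,
                               uint64_t curPos) {
        uint64_t p = ht.pos[h];
        return p != 0 && (curPos - p) <= WINDOW_SIZE;
    }
    ```

    Stale entries are then discarded lazily, on access, instead of by sweeping the table when a slot is recycled.
    
    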

    > ==> Any match sequence which reaches the end of the ring
    > buffer should continue to the beginning of the ring buffer.

    It's also possible to replicate the first slot here
    (so that slot[N]=slot[0]).
    Like that, any string comparison starting in slot[N-1]
    with a length smaller than the slot size won't need any
    additional checks.
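    A sketch of that mirroring trick (sizes and names are illustrative): allocate N+1 slots and keep the extra one a copy of slot 0, so a match running past the physical end just continues into the mirror and the inner loop stays check-free.

    ```cpp
    #include <cstdint>
    #include <cstring>

    static const uint32_t SLOT_SIZE = 1u << 16;   // 64KB slots (assumed)
    static const uint32_t NB_SLOTS  = 8;

    struct MirroredRing {
        uint8_t buf[(NB_SLOTS + 1) * SLOT_SIZE];  // one extra mirrored slot
    };

    // Call after slot 0 has been (re)filled.
    inline void refreshMirror(MirroredRing& r) {
        memcpy(r.buf + NB_SLOTS * SLOT_SIZE, r.buf, SLOT_SIZE);
    }

    // Plain byte-by-byte compare, no boundary check needed: the caller
    // guarantees len <= SLOT_SIZE and start positions < NB_SLOTS*SLOT_SIZE,
    // so any overrun lands in the mirrored copy.
    inline uint32_t matchLen(const MirroredRing& r, uint32_t a, uint32_t b,
                             uint32_t maxLen) {
        uint32_t len = 0;
        while (len < maxLen && r.buf[a + len] == r.buf[b + len]) ++len;
        return len;
    }
    ```

    The price is one extra slot of memory plus a 64KB copy per ring rotation, which is usually cheap next to the match search itself.
    
    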

    > This looks significant enough to have some impact on the
    > compression algorithm (i.e. the ring buffer process is not
    > transparent!).

    Sure. Again, it can be - e.g. in C++ you can define an operator[]
    which lets you work with a virtually infinite buffer.
    But that would be very slow.
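    A sketch of that operator[] idea (class name and size are illustrative): the window is indexed by absolute stream position and wraps internally, which is convenient but puts an extra masking indirection into every access of the hot loop.

    ```cpp
    #include <cstdint>

    class VirtualWindow {
        static const uint32_t SIZE = 1u << 19;   // 512KB, power of two (assumed)
        uint8_t buf_[SIZE];
    public:
        // Indexed by absolute ("infinite") position; wraps internally.
        uint8_t& operator[](uint64_t absPos) {
            return buf_[absPos & (SIZE - 1)];
        }
    };
    ```

    With this, positions SIZE bytes apart alias the same storage, which is exactly the ring-buffer semantics, hidden behind the index operator.
    
    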

    > ==> about discarding: go through the pointer table and
    > discard each and every entry which points into slot 1 (which
    > translates into two comparisons per entry)

    Well, yeah, though I prefer a "lazy" way to do that -
    on access. Also a single comparison should be enough.

    > ==> about distances: detect "negative" distances and simply
    > correct them (could be a simple mod (%), but then the
    > compression algorithm must know the size of the ring
    > buffer!)

    Of course the compression algorithm should know both the window
    size and forward buffer size.

    > ==> about matches which "cross" the end of ring buffer :
    > simply discard them. But then, it also means an additional
    > "end of ring buffer" check within the main comparison loop,
    > and this one is going to cost an arm ...

    Well, as I said, you can copy the first slot into the last
    and limit the match length; then the comparisons won't need
    any more checks.

    > Well, I guess this issue has already been largely solved
    > and worked on, so I'm merely looking for "best practices".
    > It could even be something other than a ring buffer (a copy
    > of the dictionary, for example).

    Algos like CM and LZP don't need any special processing at all.
    And if you take into account the possibility of a hash lookup
    returning the offset of a completely unrelated string, then
    it also applies to any LZ, I guess.

    > The main idea here is to not re-invent the wheel for
    > something common (is it ?).

    Then don't. There're lots of open source compressors to look at.

    > If possible, keeping the compression algorithm independent
    > from such design choices would be preferable.

    That's totally unlikely, as it highly affects the algorithm's
    performance.

    > Alternatively, concentrating the dependencies into a
    > dedicated function could be an interesting workaround.

    Well, I already suggested operator[].

    > Well, any idea is welcome

    Simple, fast, and precise are usually 3 completely different
    versions of the algorithm.

  3. #3
    Member Fu Siyuan's Avatar
    Join Date
    Apr 2009
    Location
    Mountain View, CA, US
    Posts
    176
    Thanks
    10
    Thanked 17 Times in 2 Posts
    Quote Originally Posted by Shelwien View Post
    Actually you would also lose quite some time in i/o like that.
    It might be counter-intuitive, but loading whole file, then
    processing it in memory, then storing the code, is much slower
    than if you'd use any smaller buffers.
    So does it mean a ring buffer (or sliding window?) is both memory-efficient and faster than large-block compression? I'm not sure the extra operations, such as copying the I/O buffer into the ring/window, would cost significant time.

    Though I won't have time in the coming days, I'm considering converting my program to compress blocks which are hundreds of MB in size. Osman once said he really doesn't support this. So what's your opinion?

  4. #4
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,377
    Thanks
    215
    Thanked 1,024 Times in 545 Posts
    > So does it mean ring buffer (or sliding window?) is both

    These are a little different, I guess.
    Afaiu, a ring buffer is something like
    byte buffer[16][1<<20];
    where one buffer at a time is used for the data reads,
    wrapping from 15 back to 0.

    While a sliding window is shifted incrementally,
    with a separate input buffer.
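    A sketch of that sliding-window scheme (all names and sizes are illustrative): new data is appended to a flat buffer, and when it would overflow, the most recent WINDOW bytes of history are shifted back to the front.

    ```cpp
    #include <cstdint>
    #include <cstring>
    #include <cstddef>

    static const size_t WINDOW = 1u << 16;   // 64KB of history to keep (assumed)
    static const size_t TOTAL  = 1u << 18;   // flat buffer, 4x the window

    struct SlidingWindow {
        uint8_t buf[TOTAL];
        size_t  end = 0;                     // bytes currently in buf
    };

    // Append a chunk from the input buffer; slide first if it won't fit.
    // Assumes n <= TOTAL - WINDOW.
    inline void append(SlidingWindow& w, const uint8_t* src, size_t n) {
        if (w.end + n > TOTAL) {
            // keep only the last WINDOW bytes of history
            memmove(w.buf, w.buf + w.end - WINDOW, WINDOW);
            w.end = WINDOW;
        }
        memcpy(w.buf + w.end, src, n);
        w.end += n;
    }
    ```

    The copy cost per slide is bounded by WINDOW bytes, and because the shift happens only once every TOTAL-WINDOW input bytes, it amortizes well.
    
    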

    > memory-efficient and faster than large-block compression?

    I guess, in a way.

    > I'm not sure the extra operations such as filling the IO
    > buffer into ring-window would cost significant time.

    Not really, as it can be overlapped with other processing.

    > Though I won't have time in the coming days, I'm considering
    > converting my program to compress blocks which are hundreds
    > of MB in size. Osman once said he really doesn't support
    > this. So what's your opinion?

    Algorithms like that are inconvenient.
    They are not fully efficient without async I/O.
    And most applications benefit from sequential processing.

  5. #5
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    863
    Thanks
    461
    Thanked 257 Times in 105 Posts
    Quote Originally Posted by Shelwien View Post
    I used a separate "window buffer" and "input buffer" instead -
    so there were no slots, just start and end offsets in the window.
    Thanks Shelwien,
    I agree; this is also my conclusion for a "small dictionary" algorithm,
    such as an LZ implementation with, for example, a 64KB sliding window.

    Now, to go one step further,
    talking about an algorithm with a large dictionary (for example 64MB),
    such as deep ROLZ, or a very large LZ window,
    this method seems no longer convenient.

    In this case, the "I/O slot" is likely to be a sub-part of the "dictionary buffer".
    This happens for example within Cabarc, with data packed into blocks of 32KB but the capability to use a dictionary of 2^21 = 2MB.

    Then a ring buffer seems a more interesting choice. But it comes with different problems compared to the "input buffer" method.

    Well, comments are welcome...
    Last edited by Cyan; 17th November 2009 at 15:44.

  6. #6
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,377
    Thanks
    215
    Thanked 1,024 Times in 545 Posts
    > I agree; this is also my conclusion for a "small
    > dictionary" algorithm, such as an LZ implementation with,
    > for example, a 64KB sliding window.

    Just a few weeks ago I implemented a rep-like filter
    based on that design, so the window size there was large enough.

    > such as deep ROLZ, or a very large LZ window
    > then this method seems no longer convenient.

    I don't really see why.
    The main drawback of the scheme with separate window buffer
    and input buffer is copying data from input to window.
    But that copy won't be slow, since the data is already in L1
    from the comparisons, and there'd be enough operations to
    overlap it with.

    On the other hand, the ring buffer would have 3 data intervals,
    while the separate window has only 2.
    Code:
    [slot][slot][processed|new][slot][slot]
    \____________________/\___/\__________/
    Also, it would never have the specified capacity
    (I mean, an LZ with a 128M ring buffer would almost never
    be able to match a string from exactly 128M back).

    But actually these layouts are quite similar, so
    I guess it doesn't matter.

    Btw, I had an interesting idea which I haven't implemented yet.
    We can compress the data in the window using some fast bitcode.
    That way, the window size would effectively increase by 20-50%,
    and comparisons might become faster due to fewer memory accesses.
    The tricky part is selecting the statistical model, though -
    the whole idea is to compare compressed strings, and adding a
    pass to collect statistics usually isn't a good idea.
    But I have an implementation of a "universal" model like that.
