Results 1 to 12 of 12

Thread: Compression for a stream that flushes often?

  1. #1
    Member
    Join Date
    Apr 2014
    Location
    Palo Alto
    Posts
    5
    Thanks
    2
    Thanked 0 Times in 0 Posts

    Compression for a stream that flushes often?

    I'm building compression for an RPC platform - in which most messages are approximately 100 - 1000 bytes. As we need to flush the message out very often, our compression ratio isn't great.

    What compression algorithm works well with constant flushing? Currently we're using LZMA.

  2. #2
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,488
    Thanks
    26
    Thanked 130 Times in 100 Posts
    Block based algoritms aren't suited for frequent flushing but most other can be adjusted for that.

    Your option is keeping a state on both ends of communication (eg a LZ77 window, a PPM model, etc) and/ or using a precomputed dictionary/ model. I think some PPMd variants had support for precomputed models.

  3. Thanks:

    aninternet (16th April 2014)

  4. #3
    Member
    Join Date
    Apr 2014
    Location
    Palo Alto
    Posts
    5
    Thanks
    2
    Thanked 0 Times in 0 Posts
    I'm running LZMA - is it suitable for flushing? It does keep a LZ77 window. Using HC4 branch predictor seems to perform much better than BT4.

  5. #4
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    155
    Thanks
    39
    Thanked 45 Times in 25 Posts
    The problem with dictionary-based methods is that you have to have a buffer ahead of up to the maximum match length for a "normal" ratio.

    Try something byte-based instead: PPM variants or context models.

  6. #5
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    170
    Thanks
    20
    Thanked 61 Times in 30 Posts
    lzma has a big output header, if I remember correctly.

  7. #6
    Member
    Join Date
    Apr 2014
    Location
    Palo Alto
    Posts
    5
    Thanks
    2
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by RichSelian View Post
    lzma has a big output header, if I remember correctly.
    I thought it was around 6 bytes? Using LZMA does reduce our data usage by 30% - but we think we can go further. I'll benchmark ppmd.

    Also, we have pretty strict memory requirements on decompression - what performs well on a 32kb dictionary?

  8. #7
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 72 Times in 56 Posts
    This sounds like an interesting use case. I can't quite picture all of the details, though. Is it bidirectional? Since you're talking about flushing and not terminating, I imagine the compressed stream must be spanning multiple procedure calls/messages. That way, you'd actually have a chance to build up a dictionary. I'm guessing the flushes are not at arbitrary points, either, but coincide with some kind of natural break within the content of the stream. So they wouldn't necessarily shorten matches.

    I'm interested in hearing more of the details, if you're at liberty to share them.

  9. #8
    Member
    Join Date
    Mar 2013
    Location
    Worldwide
    Posts
    565
    Thanks
    67
    Thanked 198 Times in 147 Posts
    You can use zlib with Z_SYNC_FLUSH.
    From http://www.zlib.net/manual.html:
    If the parameter flush is set to Z_SYNC_FLUSH, all pending output is flushed to the output buffer and the output is aligned on a byte boundary, so that the decompressor can get all input data available so far.

  10. #9
    Member
    Join Date
    Apr 2014
    Location
    Palo Alto
    Posts
    5
    Thanks
    2
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by nburns View Post
    This sounds like an interesting use case. I can't quite picture all of the details, though. Is it bidirectional? Since you're talking about flushing and not terminating, I imagine the compressed stream must be spanning multiple procedure calls/messages. That way, you'd actually have a chance to build up a dictionary. I'm guessing the flushes are not at arbitrary points, either, but coincide with some kind of natural break within the content of the stream. So they wouldn't necessarily shorten matches.

    I'm interested in hearing more of the details, if you're at liberty to share them.

    Sure! It's not bidirectional - since communication mostly happens entirely one way. (Server to client). The compressed stream does span multiple messages, and we do currently build up a dictionary. The flushes are not at arbitrary points - but often times longer message sequences do repeat. I have looked at not flushing as often if there is a way to "batch" up messages.

    Currently, LZMA2 actually works pretty decently for this.

  11. #10
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,552
    Thanks
    767
    Thanked 684 Times in 370 Posts
    charles bloom commercial library has exactly that you need, developed for game severs

  12. Thanks:

    aninternet (19th April 2014)

  13. #11
    Member
    Join Date
    Apr 2014
    Location
    Palo Alto
    Posts
    5
    Thanks
    2
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    charles bloom commercial library has exactly that you need, developed for game severs

    Can you link me to this?

  14. #12
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,552
    Thanks
    767
    Thanked 684 Times in 370 Posts

Similar Threads

  1. Replies: 3
    Last Post: 16th May 2016, 00:02
  2. Precomp stream detection
    By danswano in forum Data Compression
    Replies: 10
    Last Post: 26th September 2013, 03:10
  3. Data compression for stream with small packet
    By alpha_one_x86 in forum Data Compression
    Replies: 1
    Last Post: 6th May 2012, 18:51

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •