
Thread: Useful compressed streaming properties

  1. #1 Shelwien (Administrator) - Kharkov, Ukraine

    http://fastcompression.blogspot.com/...roperties.html

    1) As for appending in deflate, it's rather interesting:
    I encountered an example of that in .docx of all things -
    as far as I recall, some config section is appended to the main .xml file there.
    The deflate format includes "stored" blocks, and after a stored block the stream
    is aligned to a byte boundary (likely a decoding-speed optimization).
    So compressed appending in deflate is implemented by adding
    a zero-size stored block and then simply concatenating the streams, as sketched below.
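
    Here's a minimal Python sketch of that trick using zlib's raw-deflate mode
    (my own illustration, not anyone's actual appending code): flushing with
    Z_SYNC_FLUSH emits an empty stored block with the final-block flag unset and
    pads the output to a byte boundary, so a second raw deflate stream can simply
    be appended and the result still decodes as one stream.

    Code:
    import zlib

    def append_deflate(part_a: bytes, part_b: bytes) -> bytes:
        # Compress part A as raw deflate (wbits=-15) and end it with an
        # empty stored block via Z_SYNC_FLUSH: byte-aligned, final flag unset.
        ca = zlib.compressobj(9, zlib.DEFLATED, -15)
        head = ca.compress(part_a) + ca.flush(zlib.Z_SYNC_FLUSH)

        # Compress part B as an independent raw deflate stream whose last
        # block does carry the final flag, then just concatenate.
        cb = zlib.compressobj(9, zlib.DEFLATED, -15)
        tail = cb.compress(part_b) + cb.flush(zlib.Z_FINISH)
        return head + tail

    joined = append_deflate(b"main document xml " * 100, b"appended config section")
    expected = b"main document xml " * 100 + b"appended config section"
    assert zlib.decompressobj(-15).decompress(joined) == expected

    Note the appended part starts with an empty history window here - which is exactly
    what plain concatenation of independently compressed streams implies.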

    2) To terminate a stream of unknown size, the common method is
    to use special signatures. For example, 0xFF is special that way in JPEG -
    all FF bytes in the data are masked (converted to FF 00, as far as I recall), so it's
    possible to parse JPEG records without decoding the data and without any size fields.
    Of course, a single byte is too redundant as such a marker - imho a 3-byte
    marker is most reasonable, like what I used in lzmarec.
    It's also possible to modify a rangecoder to avoid emitting specific codes
    (and thus avoid having to mask them in the data).
    Some formats use longer dynamic markers (i.e. some hash is stored at the block start
    and also marks the block end), but imho that's inefficient (though it does work
    around the masking issue).
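
    For illustration, a small Python sketch of the masking idea (not JPEG's actual
    record structure, just the escaping scheme): every 0xFF in the payload is stuffed
    as FF 00, so any FF followed by a nonzero byte can serve as an unambiguous marker;
    FF D9 (JPEG's EOI) is borrowed as the end-of-stream marker here.

    Code:
    MARK = 0xFF                    # escape byte, as in the JPEG example
    END = bytes([MARK, 0xD9])      # FF + nonzero code = end-of-stream marker

    def mask(payload: bytes) -> bytes:
        # Escape every literal 0xFF as FF 00, then terminate with the marker.
        return payload.replace(b"\xff", b"\xff\x00") + END

    def unmask(stream: bytes) -> bytes:
        # Scan until the marker, undoing the FF 00 stuffing along the way.
        out = bytearray()
        i = 0
        while i < len(stream):
            b = stream[i]
            if b != MARK:
                out.append(b)
                i += 1
            elif stream[i + 1] == 0x00:   # escaped literal FF
                out.append(MARK)
                i += 2
            else:                         # real marker: stream ends here
                break
        return bytes(out)

    data = bytes(range(256)) * 3
    assert unmask(mask(data)) == data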

  2. #2 Member - France

    In the past I've used the "special marker" methodology to find the end of blocks, or the end of messages, in situations where data could be arbitrarily truncated, or could lose a few bytes here and there, making block-distance information useless.
    The objective was not to save the current block, but rather to save the following ones. This was for transmission over terribly lossy, bad-quality radio networks.

    Now, for a streaming format on modern storage or a LAN, I feel it is more reasonable to use the "block distance" methodology.
    It's fairly rare to "lose a few bytes" during a pipe transmission, or within a file. More probably, some data will be garbled, but the number of bytes should remain correct.

    While the marker methodology is "stronger", it also costs a lot more to process.
    It essentially means the data must first be scanned to find the marker, and any legitimate data that happens to have the same value as the marker must be escaped.
    That's not difficult to do. But to keep up with LZ4 decoding speed, even such light scanning has devastating consequences on performance down the line.
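
    For concreteness, a minimal sketch of the block-distance framing described above
    (a toy layout for illustration, not the actual LZ4 frame format): each block is
    preceded by a 4-byte little-endian compressed size, so the reader jumps from block
    to block without scanning the payload, and a zero size terminates the stream.
    zlib stands in for the compressor, since only the framing matters here.

    Code:
    import io
    import struct
    import zlib   # stand-in compressor; the framing is the point

    def write_blocks(dst, blocks):
        # <4-byte LE compressed size><compressed payload>, repeated; 0 = end.
        for block in blocks:
            payload = zlib.compress(block)
            dst.write(struct.pack("<I", len(payload)))
            dst.write(payload)
        dst.write(struct.pack("<I", 0))

    def read_blocks(src):
        # Jump ahead by the stored size each time - no scanning, no escaping.
        while True:
            size = struct.unpack("<I", src.read(4))[0]
            if size == 0:
                return
            yield zlib.decompress(src.read(size))

    buf = io.BytesIO()
    write_blocks(buf, [b"block one " * 100, b"block two " * 100])
    buf.seek(0)
    assert list(read_blocks(buf)) == [b"block one " * 100, b"block two " * 100]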

    Your comments are always welcomed

  3. #3 Shelwien (Administrator)

    > The objective was not to save the current block

    It's not really the point though.
    Just that if you want to compress and transmit incoming data with low delay,
    then it's basically the only solution - buffering the data until you know the
    compressed block size increases the delay.
    And end-of-block codes within the compressed data are much less reliable.

    Though I'm using it in a case where I have to interleave multiple data
    streams into one, while not knowing the stream lengths in advance.
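
    To illustrate what such interleaving could look like (a made-up chunk layout
    for this post, not an actual format): each chunk carries a 1-byte stream id and
    a 4-byte little-endian length, and an id-0/length-0 chunk ends the container.

    Code:
    import io
    import struct
    from collections import defaultdict

    def mux(dst, chunks):
        # chunks: iterable of (stream_id, data) in arrival order.
        for stream_id, data in chunks:
            dst.write(struct.pack("<BI", stream_id, len(data)))
            dst.write(data)
        dst.write(struct.pack("<BI", 0, 0))   # terminator

    def demux(src):
        # Reassemble each stream from the interleaved container.
        streams = defaultdict(bytearray)
        while True:
            stream_id, length = struct.unpack("<BI", src.read(5))
            if stream_id == 0 and length == 0:
                return {sid: bytes(b) for sid, b in streams.items()}
            streams[stream_id].extend(src.read(length))

    buf = io.BytesIO()
    mux(buf, [(1, b"first stream, part 1 "), (2, b"second stream"), (1, b"part 2")])
    buf.seek(0)
    assert demux(buf)[1] == b"first stream, part 1 part 2"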

    > It's fairly rare to "lose a few bytes" during a pipe transmission, or within a file

    Actually the CRCs in network packets are very small, so you can expect anything to happen
    when there's enough data. Once I tried to download some NTLM rainbow tables (36G) via HTTP -
    there were quite a few weird errors, like a chunk of data (probably a packet)
    being moved 100k further along, etc.

    > to cope with LZ4 decoding speed

    Maybe it's possible to adjust the format in such a way that a specific marker code
    would never appear within valid data.

    Or it could be a separate layer of the format - networks are usually a little slower
    than that anyway.

    > Your comments are always welcomed

    Thanks - it's just that blogspot comments are annoying.
