Results 1 to 6 of 6

Thread: Efficient dictionary vs. context modeling

  1. #1
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    353
    Thanks
    131
    Thanked 54 Times in 38 Posts

    Efficient dictionary vs. context modeling

    Hi all – Let's say we want to encode/compress URLs. Something like this:

    1 bit: HTTP or HTTPS
    1 bit: www or not
    1 bit: .com or not

    Then a slightlier beefier scheme for encoding common TLDs other than .com, as well as common file extensions like .html, .png, etc. Something efficient (less than a byte) for forward slashes, question marks, and equals signs. Then whatever codec you want for the unique text of the URL.

    Would you expect good context modeling to beat scheme like this? I mean to beat the dictionary pieces. For example, are those first three bits wasteful compared to what good context modeling could do?

  2. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,977
    Thanks
    296
    Thanked 1,304 Times in 740 Posts
    > are those first three bits wasteful compared to what good context modeling could do?

    The main difference is that CM only encodes relevant data (rather than indirectly related meta-info),
    and it uses probability estimations (so isn't limited to integer number of bits).

    So on one hand a simple custom bitcode might produce better results than some existing generic CM.
    But a custom CM tuned for specific data type would be always better (compression-wise).

  3. Thanks:

    SolidComp (13th June 2020)

  4. #3
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    353
    Thanks
    131
    Thanked 54 Times in 38 Posts
    Quote Originally Posted by Shelwien View Post
    > are those first three bits wasteful compared to what good context modeling could do?

    The main difference is that CM only encodes relevant data (rather than indirectly related meta-info),
    and it uses probability estimations (so isn't limited to integer number of bits).

    So on one hand a simple custom bitcode might produce better results than some existing generic CM.
    But a custom CM tuned for specific data type would be always better (compression-wise).
    Thanks. Will custom CM always be slow?

  5. #4
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,977
    Thanks
    296
    Thanked 1,304 Times in 740 Posts
    > Will custom CM always be slow?

    order0 adaptive bitwise coder is about ~60MB/s per thread.
    Also that can be combined with preprocessing that reduces the number of bits for CM - like LZ+Huffman.

    In any case, whether its slow depends on specific use case.
    For example, internet traffic is rarely limited by compression speed.

  6. #5
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    353
    Thanks
    131
    Thanked 54 Times in 38 Posts
    Compression imposes limits on traffic if you care about CPU and memory overhead. SLZ was a revelation when Willie introduced it. I had no idea gzip could be made so much lighter and faster.

  7. #6
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,977
    Thanks
    296
    Thanked 1,304 Times in 740 Posts
    > Compression imposes limits on traffic if you care about CPU and memory overhead.

    CM/PPM encoding is actually faster and has lower complexity than non-greedy LZ.
    And decoding speed/complexity don't really matter to end user, and in any case would
    be much lower than actual interpretation of these JS scripts.

Similar Threads

  1. efficient bitwise BWT ?
    By Boris K in forum Data Compression
    Replies: 2
    Last Post: 2nd February 2020, 20:59
  2. Efficient deflate implementations
    By Bulat Ziganshin in forum Data Compression
    Replies: 10
    Last Post: 27th March 2019, 23:25
  3. Context Modeling
    By imransuet in forum Data Compression
    Replies: 2
    Last Post: 16th January 2018, 16:09
  4. Logistic mixing & modeling
    By toffer in forum Data Compression
    Replies: 44
    Last Post: 3rd March 2011, 05:25
  5. Graphic/Image/Modeling Benchmark
    By Simon Berger in forum Data Compression
    Replies: 23
    Last Post: 24th May 2009, 17:50

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •