Thread: How to use text file as dictionary?

  1. #1
    Member
    Join Date
    Jul 2016
    Location
    Australia
    Posts
    1
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Question How to use text file as dictionary?

    Here is what I want to do...

    My plan is to take a full JSON snapshot every 4 hours (for example), compressed on its own without any dependency or dictionary. Then, every 20 minutes, I'd produce a compressed 'changes' file that can rebuild the current state without having to include the data already present in the most recent 4-hour snapshot.

    Are there any easy tools I can play around with? I like LZMA since it has about the speed and efficiency I want. Ideally I'd like to be able to decompress this in JavaScript.

    Is there a specific term I should be using to describe what I want to do? Is there any way to do this with 7-zip?

    I could keep track of just the changes and compress those by themselves, but then I'd miss out on compression efficiency :\.

  2. #2
    Member
    Join Date
    Mar 2016
    Location
    USA
    Posts
    56
    Thanks
    7
    Thanked 23 Times in 15 Posts
    Assuming you are generating that change file somehow, liblzma has functionality for using a dictionary for this purpose (which you'd create from the snapshot), but you'd have to build it yourself: https://github.com/xz-mirror/xz/blob.../lzma12.h#L224
    Alternatively, you could use the feature in brotli https://github.com/google/brotli/blo...y_generator.cc or zstd (built-in, see command-line help)

  4. #3
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    885
    Thanks
    480
    Thanked 278 Times in 118 Posts
    zstd snapshot20mn -D reference4h

    and for decompression:

    zstd -d snapshot20mn.zst -D reference4h

    For high compression (closer to lzma):
    zstd -19 snapshot20mn -D reference4h

    If reference4h is really big (>8MB), you'll want to add the --long option, or try an --ultra level (20, 21 or 22).

    > I could keep track of just the changes, and just compress that by itself, but I'll miss out on compression efficiency :\.

    Note that, barring a few specific corner cases, the savings to expect from a dictionary are not that large:
    about a few KB per file.
    That may sound like very little, but when the files are only a few KB themselves, it's actually a lot.
    However, when files become much larger, say >1 MB, the gain is comparatively tiny, and therefore not worthwhile.
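
    To get a feel for the effect described above without installing anything, here is a minimal sketch that uses the preset-dictionary support in Python's zlib module. Note that this swaps in deflate for zstd, so the numbers will differ from what zstd -D gives. The snapshot and changes payloads are made-up stand-ins, and the helper names are just for illustration.

```python
import json
import zlib


def compress_with_dict(data: bytes, dictionary: bytes) -> bytes:
    """Deflate `data` with the compressor primed by `dictionary`."""
    comp = zlib.compressobj(level=9, zdict=dictionary)
    return comp.compress(data) + comp.flush()


def decompress_with_dict(blob: bytes, dictionary: bytes) -> bytes:
    """Inverse of compress_with_dict; needs the very same dictionary."""
    decomp = zlib.decompressobj(zdict=dictionary)
    return decomp.decompress(blob) + decomp.flush()


# Hypothetical stand-ins for the 4-hour snapshot and a 20-minute change set.
snapshot = json.dumps(
    {f"sensor_{i}": {"status": "ok", "value": i * 3} for i in range(200)}
).encode()
changes = json.dumps(
    {f"sensor_{i}": {"status": "fault", "value": i * 3 + 1} for i in range(0, 200, 50)}
).encode()

# Deflate can only look back 32 KB, so keep just the tail of the snapshot.
dictionary = snapshot[-32768:]

plain = zlib.compress(changes, 9)
with_dict = compress_with_dict(changes, dictionary)

# The decompressor must be given the same dictionary to rebuild the data.
assert decompress_with_dict(with_dict, dictionary) == changes
print(f"raw={len(changes)} plain={len(plain)} with_dict={len(with_dict)}")
```

    Because the change set here is tiny, the dictionary makes a visible difference; with a change set of several MB the same trick would barely move the total, which is the point made above.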

  6. #4
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,904
    Thanks
    291
    Thanked 1,269 Times in 717 Posts
    The effect of dictionary compression can easily be tested with any sequential compression algorithm, via diff(compress(dict),compress(dict+file)).
    Code:
    100,000,000 enwik8
        465,211 dict1 // DRT dictionary
        133,741 dict.7z      // 7z a -m0=lzma dict dict1
     25,895,917 file.7z      // 7z a -m0=lzma file enwik8
     26,015,709 dict+file.7z // 7z a -m0 dict+file dict1 enwik8
     25,919,495 file.patch   // hdiffz.exe dict.7z dict+file.7z file.patch
    // 25,919,495>25,895,917 - bad dictionary
    
    100,000,000 enwik8
        768,771 dict1 // BOOK1
        261,068 dict.7z
     25,895,917 file.7z
     26,140,756 dict+file.7z
     25,879,830 file.patch
    // 25,879,830<25,895,917 - better?
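
    The same kind of test can be approximated in a few lines of Python with the standard lzma module. This is only a rough sketch: it takes the plain size difference len(compress(dict+file)) - len(compress(dict)) instead of running a binary diff tool such as hdiffz on the two archives, and it uses a raw LZMA2 stream rather than a .7z container. The function names and the sample data are hypothetical.

```python
import json
import lzma

# Raw LZMA2 stream, so container headers do not skew the comparison much.
FILTERS = [{"id": lzma.FILTER_LZMA2, "preset": 9}]


def lzma_size(data: bytes) -> int:
    """Size in bytes of `data` after raw LZMA2 compression."""
    return len(lzma.compress(data, format=lzma.FORMAT_RAW, filters=FILTERS))


def dictionary_gain(dictionary: bytes, payload: bytes) -> int:
    """Estimated bytes saved by prepending `dictionary` to `payload`.

    Positive means the dictionary helps; zero or negative means it does not.
    """
    standalone = lzma_size(payload)
    incremental = lzma_size(dictionary + payload) - lzma_size(dictionary)
    return standalone - incremental


# Hypothetical data: a snapshot, a change set derived from it, and an
# unrelated "dictionary" that shares no strings with the change set.
snapshot = json.dumps(
    {f"sensor_{i}": {"status": "ok", "value": i * 3} for i in range(200)}
).encode()
changes = json.dumps(
    {f"sensor_{i}": {"status": "fault", "value": i * 3 + 1} for i in range(0, 200, 50)}
).encode()
unrelated = bytes(range(256)) * 4

related_gain = dictionary_gain(snapshot, changes)
unrelated_gain = dictionary_gain(unrelated, changes)
print(f"related dictionary saves {related_gain} bytes")
print(f"unrelated dictionary saves {unrelated_gain} bytes")
```

    A dictionary is only worth shipping if the gain is clearly positive for your real data, so it makes sense to run this kind of check on actual snapshots rather than on toy input like the above.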