Thread: How to use text file as dictionary?

  1. #1
    Join Date
    Jul 2016
    Thanked 0 Times in 0 Posts

    How to use text file as dictionary?

    Here is what I want to do...

    My plan is to take a big JSON snapshot at a fixed interval, e.g. every 4 hours, and compress it on its own, without any dependency or dictionary. Then, every 20 minutes, I would write a 'changes' compressed file that can rebuild the current state without having to include the data already present in the most recent 4-hour snapshot.

    Are there any easy tools I can play around with? I like LZMA since its speed and efficiency are about what I want. Ideally I would want to be able to do the decompression in JavaScript.

    Is there a specific term I should be using to describe what I want to do? Is there any way to do this with 7-zip?

    I could keep track of just the changes, and just compress that by itself, but I'll miss out on compression efficiency :\.

  2. #2
    Join Date
    Mar 2016
    Thanked 24 Times in 16 Posts
    Assuming you are already generating that change file somehow, liblzma has functionality for using a preset dictionary for this purpose (you'd create the dictionary from the snapshot), but you'd have to build the tooling yourself:
    Alternatively, you could use the dictionary feature in brotli or zstd (it is built in; see their command-line help).
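
For anyone who wants to try the preset-dictionary idea without building anything, here is a minimal sketch in Python. It swaps in zlib (DEFLATE) for LZMA purely because zlib's `zdict` parameter is in the standard library; zlib only looks at the last 32 KB of the dictionary, so this is an illustration of the mechanism and not a substitute for zstd or liblzma on large snapshots. The names `reference_4h` and `snapshot_20mn` and the sample records are made up for the example.

```python
import json
import zlib


def compress_with_dict(payload: bytes, dictionary: bytes, level: int = 9) -> bytes:
    """Compress payload with the compressor primed by a preset dictionary."""
    comp = zlib.compressobj(level=level, zdict=dictionary)
    return comp.compress(payload) + comp.flush()


def decompress_with_dict(blob: bytes, dictionary: bytes) -> bytes:
    """Decompress a blob; the exact same dictionary bytes must be supplied."""
    decomp = zlib.decompressobj(zdict=dictionary)
    return decomp.decompress(blob) + decomp.flush()


# Hypothetical data: a "4-hour" reference state and a slightly changed later state.
records = [{"id": i, "name": f"sensor-{i}", "value": (i * 7919) % 10007}
           for i in range(100)]
reference_4h = json.dumps(records).encode()

records[50]["value"] = -1                                      # one record changes
records.append({"id": 100, "name": "sensor-100", "value": 42})  # one record is added
snapshot_20mn = json.dumps(records).encode()

plain = zlib.compress(snapshot_20mn, 9)                    # no dictionary
primed = compress_with_dict(snapshot_20mn, reference_4h)   # reference as dictionary
restored = decompress_with_dict(primed, reference_4h)

print("raw:", len(snapshot_20mn), "plain:", len(plain), "with dict:", len(primed))
```

The decompressing side needs the reference snapshot byte-for-byte, so the 4-hour file has to be kept around for as long as any 20-minute file depends on it.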

  4. #3
    Join Date
    Sep 2008
    Thanked 280 Times in 120 Posts
    zstd snapshot20mn -D reference4h

    and for decompression

    zstd -d snapshot20mn.zst -D reference4h

    For higher compression (closer to LZMA):
    zstd -19 snapshot20mn -D reference4h

    If reference4h is really big (>8 MB), you'll want to add the --long option, or try an --ultra level (--ultra -20, -21 or -22).

    > I could keep track of just the changes, and just compress that by itself, but I'll miss out on compression efficiency :\.

    Note that, barring some specific corner cases, the expected savings from a dictionary are not that large:
    about a few KB per file.
    That may sound like very little, but when files are only a few KB, it's actually a lot.
    However, when files become much larger, say >1 MB, the gain is comparatively tiny, and therefore not worthwhile.
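
The small-file versus large-file point can be checked with a rough experiment. The sketch below again uses zlib's preset dictionary from the Python standard library as a stand-in for zstd's -D (an assumption made for convenience; the absolute numbers will differ from zstd). The helper names and the synthetic JSON records are hypothetical.

```python
import json
import random
import zlib


def sizes(payload: bytes, dictionary: bytes) -> tuple[int, int]:
    """Return (compressed size without dictionary, compressed size with it)."""
    plain = len(zlib.compress(payload, 9))
    comp = zlib.compressobj(level=9, zdict=dictionary)
    primed = len(comp.compress(payload) + comp.flush())
    return plain, primed


def relative_saving(payload: bytes, dictionary: bytes) -> float:
    """Fraction of the no-dictionary size that the dictionary saves."""
    plain, primed = sizes(payload, dictionary)
    return (plain - primed) / plain


def make_records(rng: random.Random, count: int) -> list:
    """Synthetic JSON-like records with hard-to-compress float values."""
    return [{"id": i, "name": f"sensor-{i}", "value": rng.random()}
            for i in range(count)]


rng = random.Random(0)                      # fixed seed so runs are repeatable
dict_records = make_records(rng, 150)
dictionary = json.dumps(dict_records).encode()             # roughly 10 KB

small_file = json.dumps(dict_records[50:53]).encode()      # a few hundred bytes
large_file = json.dumps(make_records(rng, 20000)).encode() # over 1 MB

small_gain = relative_saving(small_file, dictionary)
large_gain = relative_saving(large_file, dictionary)

for label, data in (("small", small_file), ("large", large_file)):
    plain, primed = sizes(data, dictionary)
    print(f"{label}: raw={len(data)} plain={plain} with_dict={primed}")
```

In this setup the absolute saving is bounded by what the dictionary can contribute, so it shrinks to a small fraction of the output once the file itself is large.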

  6. #4
    Administrator Shelwien
    Join Date
    May 2008
    Kharkov, Ukraine
    Thanked 1,367 Times in 781 Posts
    The effect of dictionary compression can easily be tested with any sequential compression algorithm, via diff(compress(dict), compress(dict+file)):
    100,000,000 enwik8
        465,211 dict1 // DRT dictionary
        133,741 dict.7z      // 7z a -m0=lzma dict dict1
     25,895,917 file.7z      // 7z a -m0=lzma file enwik8
     26,015,709 dict+file.7z // 7z a -m0 dict+file dict1 enwik8
     25,919,495 file.patch   // hdiffz.exe dict.7z dict+file.7z file.patch
    // 25,919,495>25,895,917 - bad dictionary
    100,000,000 enwik8
        768,771 dict1 // BOOK1
        261,068 dict.7z
     25,895,917 file.7z
     26,140,756 dict+file.7z
     25,879,830 file.patch
    // 25,879,830<25,895,917 - better?
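
A simplified version of this test can be scripted. The sketch below uses Python's standard `lzma` module (.xz container, not 7z) and approximates the binary patch with a plain size difference, len(compress(dict+file)) - len(compress(dict)), which is an assumption and will not match hdiffz patch sizes exactly. The function name and the synthetic text are hypothetical.

```python
import lzma
import random
import string


def dictionary_effect(dictionary: bytes, file_data: bytes) -> tuple[int, int]:
    """Return (standalone size, estimated size of the file given the dictionary).

    The estimate is the growth of the compressed stream when the file is
    appended to the dictionary; it stands in for the patch size.
    """
    standalone = len(lzma.compress(file_data))
    dict_only = len(lzma.compress(dictionary))
    combined = len(lzma.compress(dictionary + file_data))
    return standalone, combined - dict_only


# Synthetic text built from a fixed vocabulary (seeded for repeatability).
rng = random.Random(1)
words = ["".join(rng.choices(string.ascii_lowercase, k=rng.randint(3, 9)))
         for _ in range(500)]
dict_text = " ".join(rng.choice(words) for _ in range(4000))
new_text = " ".join(rng.choice(words) for _ in range(200))

dictionary = dict_text.encode()
# The "file" reuses a 10,000-character slice of the dictionary plus some new text.
file_data = (dict_text[5000:15000] + " " + new_text).encode()

standalone, estimate = dictionary_effect(dictionary, file_data)
print("standalone:", standalone, "estimated with dictionary:", estimate)
```

If the estimate comes out larger than the standalone size, the dictionary is not helping for that data, as in the first listing above.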
