Thread: Patch/delta compression?

  1. #1 cbloom (Member)

    What's the state of the art in patch/delta compression?

    eg. you sent a large (100M-1G) archive previously, and you wish to update a few parts of that. You can use the previous archive as context for the patch compression.


    Back in the day, most of the serious command line compressors had the option to precondition their model (eg. ACB, Rkive, PPMZ) but I just tried searching a bit and I don't see it as an option on any of the modern/mainstream things I checked (RAR, FreeArc, 7z).

    The most obvious way to do a patch compress is to just use a dictionary coder and preload the dictionary from the previous archive. I'm very surprised that the mainstream archivers don't seem to offer this option, maybe I'm just missing it.
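    For what it's worth, zlib has exactly this in the form of preset dictionaries (deflateSetDictionary in the C API, the zdict argument in Python), though the dictionary is capped at deflate's 32 KB window, so it only helps data within one window of the preload. A minimal sketch:

    ```python
    import random
    import zlib

    random.seed(0)
    old = bytes(random.randrange(256) for _ in range(32768))  # previous "archive", one window's worth
    new = old[:10000] + b"PATCHED" + old[10007:]              # updated version: a few bytes changed

    plain = zlib.compress(new, 9)                 # no context: random data, stays ~32 KB

    co = zlib.compressobj(level=9, zdict=old)     # preload the previous data as a dictionary
    delta = co.compress(new) + co.flush()         # with context: mostly back-references into old

    do = zlib.decompressobj(zdict=old)            # decoder must be given the same dictionary
    assert do.decompress(delta) + do.flush() == new

    print(len(plain), len(delta))
    ```

    For real file sizes the 32 KB window is the problem, not the mechanism; more recent tools lift the cap (e.g. zstd's --patch-from uses the whole old file as the dictionary with a large window).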

    Any advice?

    A related question : has there been any clever work on very large window matching? Say for example I wanted to find LZ77 string matches in a 1GB window, and I'm okay with a min match len of 16 or something big like that. Is there a better way to do that? How about out-of-core string matching for files larger than memory? eg. if you have two 1 TB files that are nearly identical and you wish to compress the differences.
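    One standard trick for this (roughly what srep/rsync-style dedupers do; this sketch is my reconstruction, not any particular tool's code): with a big min match length you don't need to index every position of the window. Index g-byte grams at every s-th position of the old data; any common run of length >= g + s - 1 must fully contain one indexed gram, so with g = s = 8 no match of length 16+ is missed, and the index is 1/8th the size. A toy version:

    ```python
    import random

    def build_index(old: bytes, gram: int = 8, stride: int = 8) -> dict:
        # Index gram-byte substrings at every stride-th position only.
        # Any repeat of length >= gram + stride - 1 (here 15) contains
        # one indexed gram, so min-match-len 16 is safe, with
        # len(old)/stride index entries instead of one per byte.
        index = {}
        for pos in range(0, len(old) - gram + 1, stride):
            index.setdefault(old[pos:pos + gram], []).append(pos)
        return index

    def find_matches(new: bytes, old: bytes, index: dict,
                     gram: int = 8, min_len: int = 16):
        # Probe every position of new; extend anchors both ways.
        # Returns greedy, non-overlapping (new_pos, old_pos, length).
        matches, i = [], 0
        while i <= len(new) - gram:
            best = None
            for pos in index.get(new[i:i + gram], ()):
                b = 0
                while b < i and b < pos and new[i - b - 1] == old[pos - b - 1]:
                    b += 1
                f = 0
                while i + f < len(new) and pos + f < len(old) and new[i + f] == old[pos + f]:
                    f += 1
                if b + f >= min_len and (best is None or b + f > best[2]):
                    best = (i - b, pos - b, b + f)
            if best:
                matches.append(best)
                i = best[0] + best[2]   # skip past the accepted match
            else:
                i += 1
        return matches

    random.seed(1)
    old = bytes(random.randrange(256) for _ in range(1 << 16))   # stand-in for the big window
    new = (bytes(random.randrange(256) for _ in range(3000))
           + old[20000:20512]                                    # 512-byte chunk reused from old
           + bytes(random.randrange(256) for _ in range(3000)))
    found = find_matches(new, old, build_index(old))
    print(found)
    ```

    For out-of-core sizes the same idea works with the dict replaced by a sorted on-disk table of (hash, position) pairs built in one sequential pass per file and then merge-joined, so nothing random-accesses data larger than memory until match verification.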
    Last edited by cbloom; 3rd August 2016 at 20:43.

  2. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,942
    Thanks
    291
    Thanked 1,286 Times in 728 Posts
    See http://shelwien.googlepages.com/fma-diff_v0.rar and http://freearc.org/research/SREP.aspx and http://exdupe.com/
    Also http://compression.ru/ds/ppmtrain.rar

    The best actual diff is probably still bsdiff.

    And imho prebuilt stats won't be of much use with that amount of data.
    With 100M any CM/PPM would already have to flush its stats multiple times.

    Also http://encode.su/threads/1147-A-new-...ression-target

  3. #3
    Member
    Join Date
    Sep 2010
    Location
    US
    Posts
    126
    Thanks
    4
    Thanked 69 Times in 29 Posts
    srep seems like the right thing but it oddly doesn't provide any "preload dictionary" option.

    bsdiff seems okay.

    Ideally it should be built into an archiver, because the archiver can decompress the data to build the patch. eg. what you want to be able to do is something like:

    make archive of initial distribution (dist1.arc)
    send dist1.arc to user
    prepare a new distribution and archive the whole thing (dist2.arc)
    create a delta archive from dist1.arc -> dist2.arc = dist1_2.arc

    user can either download dist2.arc or dist1_2.arc
    archiver can do dist1.arc + dist1_2.arc -> dist2.arc
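    The delta step above boils down to "encode the new data as copies from the old data plus literals". A toy sketch of that encoding (difflib stands in for a real large-window matcher and is only fit for small inputs; pickle is just a lazy container format, not a serious patch format):

    ```python
    import difflib
    import pickle
    import zlib

    def make_delta(old: bytes, new: bytes) -> bytes:
        # Encode new as COPY(old_start, length) / LITERAL(bytes) ops,
        # then compress the op stream.
        ops = []
        sm = difflib.SequenceMatcher(None, old, new, autojunk=False)
        for tag, i1, i2, j1, j2 in sm.get_opcodes():
            if tag == 'equal':
                ops.append(('copy', i1, i2 - i1))
            elif tag in ('replace', 'insert'):
                ops.append(('lit', new[j1:j2]))
            # 'delete' contributes nothing to the new data
        return zlib.compress(pickle.dumps(ops))

    def apply_delta(old: bytes, delta: bytes) -> bytes:
        out = bytearray()
        for op in pickle.loads(zlib.decompress(delta)):
            if op[0] == 'copy':
                _, start, length = op
                out += old[start:start + length]
            else:
                out += op[1]
        return bytes(out)

    dist1 = b"file_a: hello world\n" * 50 + b"file_b: old contents\n" * 50
    dist2 = b"file_a: hello world\n" * 50 + b"file_b: NEW contents!\n" * 50
    delta = make_delta(dist1, dist2)
    assert apply_delta(dist1, delta) == dist2
    print(len(dist2), len(delta))
    ```

    An archiver doing this internally has the advantage the post describes: it can regenerate the uncompressed old data from dist1.arc on both ends, so the patch is computed against raw content rather than against compressed bytes that churn completely on small edits.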
    Last edited by cbloom; 3rd August 2016 at 20:43.

  4. #4
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,942
    Thanks
    291
    Thanked 1,286 Times in 728 Posts
    exdupe does just that, and so does fma-diff, in a way.
    bsdiff tries too hard, though: it needs roughly 17x the old file's size in memory, so you won't be able to diff GBs of data with it.
