
Thread: recommended formats for game data and partial updates

  1. #1
    Member
    Join Date
    Feb 2010
    Location
    Nordic
    Posts
    200
    Thanks
    41
    Thanked 36 Times in 12 Posts

    Glest is a fun free RTS with an active modding community.

    I am helping to build a mod downloader and updater tool for it.

    One neat thing in the newest version of Glest is that these addons are simply zip or 7z files in appropriate directories. Other compression formats could be added.

    What archive formats, compression options and delta-update approaches would you recommend?

    http://sourceforge.net/mailarchive/f...=glestae-devel

  2. #2
    Administrator Shelwien
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,368
    Thanks
    213
    Thanked 1,020 Times in 541 Posts
    http://nishi.dreamhosters.com/u/bsdiff_sh2.rar for patch creation + lzma for compression (E8 can be disabled if there're no exes)
    ccm is better than lzma for this, but you won't use it anyway

    As to zip/7z, I don't think that lossless archive recompression matters here, so instead of precomp, it should be better to just
    extract/repack the archives with actual archivers.

    Depending on data size, it may be feasible to extract all other files, make uncompressed archives per version (possibly with 7z -mx0,
    but it would be more compact with a custom simple archive format), then just generate patches with bsdiff.

    But if these uncompressed archives are larger than 50M or so, it would be better to make separate patches per file.
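
    As a rough illustration of the "custom simple archive format" idea, here is a minimal sketch in Python. The layout (sorted paths, length-prefixed names and data) and the names pack/unpack are assumptions made up for this example; it is not the format used by the updater mentioned in this thread.

```python
import struct

HEADER = "<HQ"  # 2-byte name length, 8-byte data length, little-endian

def pack(files):
    """Concatenate {path: bytes} into one uncompressed blob, in sorted path order."""
    out = bytearray()
    for path in sorted(files):
        name = path.encode("utf-8")
        data = files[path]
        out += struct.pack(HEADER, len(name), len(data)) + name + data
    return bytes(out)

def unpack(blob):
    """Inverse of pack(): return {path: bytes}."""
    files, pos = {}, 0
    header_size = struct.calcsize(HEADER)
    while pos < len(blob):
        name_len, data_len = struct.unpack_from(HEADER, blob, pos)
        pos += header_size
        name = blob[pos:pos + name_len].decode("utf-8")
        pos += name_len
        files[name] = blob[pos:pos + data_len]
        pos += data_len
    return files
```

    Sorting the paths keeps unchanged files in the same relative order across versions, which should help a binary differ find long matches; the resulting blobs for two versions would then be fed to the diff tool of choice.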

  3. #3
    Member
    I imagine a tool that zips the mod (or uses 7z or whatever, as long as it isn't solid). This is the file that people download when they don't have the mod installed yet. If they already have the mod installed and it's an update, I think a delta is appropriate: mods can be hundreds of MB, but differ by only a few MB between sequential updates.

    The person doing the packaging has two files: the archive of the current version of the mod and the archive of the previous version. I imagine a patch-maker tool that walks through the file-aligned blocks in the two archives looking for files that differ, and records each difference. But rather than packaging the changed files up, it can make an index of offsets for the new parts in the new archive, and clients can then use HTTP range requests to download only the parts of the new archive that they need.

    The archives would be hosted on some free web hosting that supports large files and HTTP range requests, which I speculate Google Sites does. Does this make sense?
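
    To make the "index of offsets" idea concrete, here is a minimal sketch using Python's standard zipfile module. It assumes non-solid zip archives, compares entries by the CRC stored in the zip directory, and the helper names (make_zip, entry_range, changed_ranges) are made up for illustration.

```python
import io
import struct
import zipfile

def make_zip(files):
    """Build a zip archive in memory from {name: bytes} (demo helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()

def entry_range(raw, info):
    """Byte range [start, end) of one entry: local header plus compressed data."""
    # The local file header has 30 fixed bytes; name/extra lengths sit at offset 26.
    name_len, extra_len = struct.unpack_from("<HH", raw, info.header_offset + 26)
    end = info.header_offset + 30 + name_len + extra_len + info.compress_size
    return info.header_offset, end

def changed_ranges(old_raw, new_raw):
    """List (name, start, end) for entries of the new zip that are new or changed."""
    with zipfile.ZipFile(io.BytesIO(old_raw)) as old_zip:
        old_crc = {i.filename: i.CRC for i in old_zip.infolist()}
    ranges = []
    with zipfile.ZipFile(io.BytesIO(new_raw)) as new_zip:
        for info in new_zip.infolist():
            if old_crc.get(info.filename) != info.CRC:
                ranges.append((info.filename,) + entry_range(new_raw, info))
    return ranges
```

    Each (start, end) pair could then be turned into one HTTP range request against the hosted archive; adjacent ranges would be worth merging to cut the per-request overhead.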

  4. #4
    Member Surfer
    Join Date
    Mar 2009
    Location
    oren
    Posts
    203
    Thanks
    18
    Thanked 7 Times in 1 Post
    Interesting tool for patch creation http://www.indigorose.com/products/deltamax/

  5. #5
    Administrator Shelwien
    @Surfer: thanks, I tested it:
    Code:
    25256260  win32_1161.rar // rar-m0 for v1161
    24374051  win32_1162.rar // rar-m0 for v1162
     7843054  win32_1162.rar.lzma // rar-m0 1162 compressed with lzma
     7264087  win32_1162.ari // updater's archive of 1162
     7822845  win32.diff // deltamax diff
     4702852  win32.diff.lzma // compressed with lzma, as it's clearly redundant
     8190903  xdelta-0 // xdelta3.0z -e -0
     4977936  xdelta-0.lzma
     5589487  xdelta-default // xdelta3.0z -e
     4690081  xdelta-default.lzma
    24470334  win32.bsdiff // bsdiff_sh2 rar archive diff
     2644026  win32.bsdiff.lzma // compressed
     2824097  update_180A1F39625BC233.ari // updater patch with file diffs
    1. I wasn't able to create a folder diff with deltamax.
    2. It's interesting to see that the archive diff is smaller with bsdiff - I didn't try that before for some reason.

    Update: xdelta3 results added

  6. #6
    Member Surfer
    Quote Originally Posted by Shelwien
    I tested it
    Thanks. Can you test xdelta 3.0z, please?

  7. #7
    Member
    (cross-posted from http://glest.org/glest_board/index.p...62749#msg62749)

    I added rename-checking and got these numbers for the mod MRise 1.0 -> 1.6:

    245 added: 39.8 MB
    9 renamed: 1.6 MB
    300 unchanged: 91.0 MB
    11 changed: 7.6 MB
    87 deleted: 17.1 MB

    471 previously: 117.3 MB -> 60.2 MB zipped -> 50 MB 7z/rar
    556 now: 148.7 MB -> 78.1 MB zipped -> 64 MB 7z/rar
    delta: -> 27.0 MB zipped -> 19 MB 7z


    My script, if anyone wants to run it (requires Python): https://gist.github.com/704809

    9 of the files were found to be renamed.
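
    For readers who don't want to open the gist, the classification can be sketched roughly like this. This is not the linked script, just a minimal illustration that assumes files are compared by a SHA-1 of their contents and that a "rename" means identical contents under a path that only exists in the new version; the function name classify is made up.

```python
import hashlib

def _digest(data):
    return hashlib.sha1(data).hexdigest()

def classify(old, new):
    """old/new map relative path -> file bytes; returns the five categories."""
    old_h = {p: _digest(d) for p, d in old.items()}
    new_h = {p: _digest(d) for p, d in new.items()}
    result = {"added": [], "renamed": [], "unchanged": [], "changed": []}

    # Content hashes of old files whose path no longer exists.
    gone = {}
    for path, h in old_h.items():
        if path not in new_h:
            gone.setdefault(h, []).append(path)

    for path, h in sorted(new_h.items()):
        if path in old_h:
            result["unchanged" if old_h[path] == h else "changed"].append(path)
        elif gone.get(h):
            # Same contents as a vanished old file: treat it as a rename.
            result["renamed"].append((gone[h].pop(), path))
        else:
            result["added"].append(path)

    # Whatever is left in `gone` was really deleted.
    result["deleted"] = sorted(p for paths in gone.values() for p in paths)
    return result
```

    Only the "added" and "changed" entries need to be downloaded; renamed files can be recreated locally from the old copy.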

    With zip (my format of choice) the 1.0 mod archive was 60 MB compressed; it was originally distributed as a 7zip archive using RAR compression (and, I imagine, solid), and was only 50 MB.

    The 1.6 mod archive compresses to 78 MB using zip, and was distributed as a 7zip archive with RAR too, taking just 64 MB.

    Using the 'diffing', I calculate you would only need to download 27 MB of the 1.6 zip file in order to patch a copy of 1.0 that you already had; if we used 7zip (non-solid; actually, solid didn't improve compression dramatically) then this would be only around 19 MB.

    So you can imagine two archives sitting on public web hosting, e.g. Google Sites. There is also a 'diff' meta-data file. The updater checks the meta-data file to see if it has changed, using HTTP If-Modified-Since etc. If it has been modified, the updater gets the meta-data and either fetches the full new archive or uses a range request to download the delta (which has been conveniently packaged at the end of the new archive). You can imagine a central site with scripts and things for the mod makers to use to 'publish' the availability of updates etc., but I think it is useful that the actual GBs of mods are stored elsewhere, and Google Sites seems to be a good choice.
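
    The update check described above could look roughly like this with Python's standard library. It is only a sketch: the URLs in the usage note, the function names and the idea of storing the delta offset in the meta-data file are assumptions, and whether a given host honours Range requests has to be tested.

```python
import urllib.request
from email.utils import formatdate

def metadata_request(url, last_seen_epoch):
    """Conditional GET for the meta-data file (server may answer 304 Not Modified)."""
    stamp = formatdate(last_seen_epoch, usegmt=True)  # HTTP date in GMT
    return urllib.request.Request(url, headers={"If-Modified-Since": stamp})

def delta_request(url, delta_offset):
    """Ask only for the bytes from delta_offset to the end of the new archive."""
    return urllib.request.Request(url, headers={"Range": f"bytes={delta_offset}-"})

def fetch(request):
    """Send the request. Status 206 means the range was honoured, 200 means the
    server ignored it and sent everything. urllib raises HTTPError for 304."""
    with urllib.request.urlopen(request) as resp:
        return resp.status, resp.read()
```

    Usage would be something like fetch(delta_request("http://example.com/mods/mrise-1.6.zip", offset)), where the example.com URL is a placeholder and offset comes from the meta-data file.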

    Now 7z is supported in the latest GAE, but I feel that zip is an acceptable format and will be easier to work with.

    Now of course on this forum I would hope people point out how format-specific filters, or in-file diffing might help dramatically...?

  8. #8
    Administrator Shelwien
    Code:
     51503962 MRise_v1.0_Core.7z
     67100484 MRise_CoreFiles_v1.6.rar
    
    123065315 MRise_v100_Core.rar // rar -m0
    156015046 MRise_v160_Core.rar
    
     38553779 MRise_v100_Core.nz // nanozip 08 -cc
     47743274 MRise_v160_Core.nz
    
     38765046 MRise_v100_Core.rar.nz // rar -m0 | nanozip 08 -cc
     47897290 MRise_v160_Core.rar.nz
    
     46812617 MRise_v100_Core.rar.lzma // rar -m0 | lzma
     58661570 MRise_v160_Core.rar.lzma
    
    124242689 MRise_v100_Core.rar.pcf // precomp -slow
    157192420 MRise_v160_Core.rar.pcf
    
     46795848 MRise_v100_Core.rar.pcf.lzma // rar0 | precomp | lzma
     58610169 MRise_v160_Core.rar.pcf.lzma
    
    156423371 MRise_v160_Core.rar.bsdiff // bsdiff_sh2
    157601102 MRise_v160_Core.rar.pcf.bsdiff
    
     16134305 MRise_v160_Core.rar.bsdiff.ccm // + ccm 1.30c
     16142316 MRise_v160_Core.rar.pcf.bsdiff.ccm
    
     18728493 MRise_v160_Core.rar.bsdiff.lzma // + lzma
     18736294 MRise_v160_Core.rar.pcf.bsdiff.lzma
    
     14986686 MRise_v160_Core.rar.bsdiff.nz // + nz08 -cc
     15007036 MRise_v160_Core.rar.pcf.bsdiff.nz

  9. #9
    Administrator Shelwien
    Now, as to format filters
    Code:
    57120424  v100-bmp-tga.rar0
    28796621  v100-g3d.rar0
    12810076  v100-exe-dll.rar0
    11157201  v100-misc.rar0
     6757612  v100-ogg.rar0
     6423464  v100-wav.rar0
    It seems that you mainly need an image compressor.
    1. CM-based image compressor, like bmf
    2. .g3d filter - at least float-to-fixedpoint conversion
    3. .exe filter, ie disasm
    4. misc: png recompression, ascii2binary converter for .obj, custom parser for .blend
    5. ogg: recompression possible, but not available atm anyway
    6. wav: a specialized model

  10. #10
    Administrator Shelwien
    Also, here's how my updater works:
    1. Download the target version index file (list of paths/files and their hashes)
    2. Build a similar index file for local files
    3. Compute an update hash from hash pairs of local files and target files
    4. Try downloading the patch file based on hash
    5. If it doesn't exist, download the full archive of new version
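
    The five steps above can be sketched as follows. This is a guess at the general shape, not the actual updater: the choice of SHA-1, the way the hash pairs are combined, the 16-hex-digit name and the ".patch" extension are all assumptions for illustration.

```python
import hashlib

def build_index(files):
    """Steps 1-2: map each relative path to a hash of its contents."""
    return {path: hashlib.sha1(data).hexdigest() for path, data in files.items()}

def update_hash(local_index, target_index):
    """Step 3: one identifier derived from every (local hash, target hash) pair."""
    h = hashlib.sha1()
    for path in sorted(set(local_index) | set(target_index)):
        pair = (path, local_index.get(path, "-"), target_index.get(path, "-"))
        h.update("\0".join(pair).encode("utf-8") + b"\n")
    return h.hexdigest()[:16].upper()

def patch_name(local_index, target_index):
    return f"update_{update_hash(local_index, target_index)}.patch"

def choose_download(local_index, target_index, available):
    """Steps 4-5: prefer a prebuilt patch, else fall back to the full archive."""
    name = patch_name(local_index, target_index)
    return name if name in available else "full_archive"
```

    Because the patch name depends on both the local and the target state, the server needs no logic at all: a patch either exists under that name or the client falls back to the full download.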

    And as to http requests, I considered that approach too, it surely looks attractive.
    But it's not very compact, you'd have at least 1k extra traffic per range - so you
    can't just use it like LZ matches. And depending on the webserver, these advanced http
    features may do something weird.

    However, the main point is different - you just don't need something that complicated.
    It seems that you're trying to design some protocol with runtime analysis, which would
    be able to construct a minimal set of requests by locally available data.
    But normally there's a fixed set of versions which can be installed, so you only have
    to prebuild patches for these. And if somebody edited even a single file required to
    apply the prebuilt patch - that somebody probably doesn't need to incrementally update
    his setup to a standard new version anyway.

  11. #11
    Member
    Yeah, mods change fairly frequently, mostly by either small text fiddling - changes so small as to not be a big problem to download - or the addition and subtraction of artwork, which are often not meaningfully diffable.

    Unknown to me, some prototypes using tar+xdelta had been made by others a couple of months back. For the MRise mods we used above, tar+xdelta made a diff that was 24MB. I also tried bsdiff, whose diff was much smaller: 16MB, as you had above. However, xdelta took 13 seconds to run; bsdiff took 3 minutes. Modders are an impatient bunch of people; I wonder if bsdiff is worth it, and the same can be said of all the CM approaches. Interestingly, the full tars were then compressed with xz. It's amazing and nice that all the fringe formats we love are turning up and actually getting used by this game modding community.

    With the range request bit, I was thinking of putting the diff at the end of the zip so that a single range request gets it. The downloader would then create a composite zip with valid headers from the part it downloads and the part it already has (since downloaded mods are stored zipped, it still has the previous download around). I never imagined the client sending up what it has; rather, the diff between each pair of sequential versions (and perhaps between major and minor versions too) is calculated by the mod maker using a packaging script.

    However, as the tar+xdelta prototype exists, I think I will go with that first; the bigger work is the actual separation of conflicting mods and things at the game engine level.
    Last edited by willvarfar; 19th November 2010 at 10:20.

  12. #12
    Administrator Shelwien
    bsdiff is asymmetric - it's just that its fuzzy match finder is relatively slow, but decoding is not any different from normal LZ77.
    And I don't think that it matters even if it would run for an hour to generate a smaller patch... the actual problem with bsdiff
    is its ~12N memory usage... you'd have to build a 64-bit version (which probably requires some fixes) to diff larger archives.
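
    To put the ~12N figure in numbers, here is a back-of-the-envelope sketch. It assumes N is the size of the uncompressed archive being diffed and that a 32-bit process can address about 2 GiB; both are assumptions, as are the helper names.

```python
def bsdiff_memory(n_bytes, factor=12):
    """Approximate working memory for diffing an input of n_bytes (~12N rule)."""
    return factor * n_bytes

def max_input(address_space_bytes, factor=12):
    """Largest input that still fits into the given address space."""
    return address_space_bytes // factor

# Size of MRise_v160_Core.rar (rar -m0) as listed earlier in the thread.
v160 = 156015046
print(f"{bsdiff_memory(v160) / 2**30:.2f} GiB needed for the 1.6 archive")
print(f"{max_input(2**31) / 10**6:.0f} MB is roughly the 32-bit ceiling")
```

    Under these assumptions the 156 MB MRise archive still fits, but anything much beyond about 179 MB would not.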

    Also, I used lzma with parameters "-d25 -fb273 -mc999999 -lc8 -lp0 -pb0" - anyway, there're lots of parameters and I don't
    understand how people use defaults for archives downloaded a million times.

  13. #13
    Member
    archives downloaded a million times
    a fun problem to have

  14. #14
    Member m^2
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Quote Originally Posted by Shelwien
    Also, I used lzma with parameters "-d25 -fb273 -mc999999 -lc8 -lp0 -pb0" - anyway, there're lots of parameters and I don't
    understand how people use defaults for archives downloaded a million times.
    I haven't seen the last 3 explained; I just saw Bulat saying 'only I.Pavlov understands them' in the FA 0.4 docs and decided not to bother.
    Could you tell what they are and (more importantly) how they improve strength?

  15. #15
    Administrator Shelwien
    Quote Originally Posted by lzma.exe
    -lc{N}: set number of literal context bits - [0, 8], default: 3
    -lp{N}: set number of literal pos bits - [0, 4], default: 0
    -pb{N}: set number of pos bits - [0, 4], default: 2
    Afaik "literal context bits" are bits of previous symbol, ie -lc8 means order1 literal model
    And "pos" means offset in the data, ie it can improve compression of up to 16-byte aligned data.
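
    Python's standard lzma module (a liblzma binding) exposes the same three knobs as filter options, so their effect can be tried on real data. A minimal sketch, with some caveats: as far as I know liblzma requires lc + lp <= 4, so the -lc8 setting quoted above cannot be reproduced this way and the example uses lc=4 instead; the small default dictionary (dict_bits=20) is only there to keep memory use low, and the helper names are made up.

```python
import lzma

def lzma_filters(dict_bits=20, lc=3, lp=0, pb=2, nice_len=64):
    """Filter chain for the legacy .lzma container (a single LZMA1 filter)."""
    return [{
        "id": lzma.FILTER_LZMA1,
        "dict_size": 1 << dict_bits,  # -d{N} on the lzma.exe command line
        "lc": lc,                     # literal context bits
        "lp": lp,                     # literal position bits
        "pb": pb,                     # position bits
        "nice_len": nice_len,         # -fb{N}, at most 273
    }]

def compress_alone(data, **options):
    return lzma.compress(data, format=lzma.FORMAT_ALONE,
                         filters=lzma_filters(**options))

text = b"the quick brown fox jumps over the lazy dog. " * 200
default = compress_alone(text)
tuned = compress_alone(text, lc=4, lp=0, pb=0, nice_len=273)

# Both streams decode back to the input; sizes depend on the data.
assert lzma.decompress(default, format=lzma.FORMAT_ALONE) == text
assert lzma.decompress(tuned, format=lzma.FORMAT_ALONE) == text
print(len(default), len(tuned))
```

    Which setting wins depends entirely on the data, so it is worth running this on the real patch files rather than trusting the toy input here.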
