Thread: RW-mounting of archive

  1. #1
    Member AcidBurn

    This topic is a continuation of the threads "Mounting a .zip file" and "Dokan+7zip = mount many possible archives".
    Right now 7-Zip ("7za.exe a -r -t7z -m0=lzma2 -mx=9") gives good compression (952 to 263 MB), but the result can only be mounted in RO-mode.
    Soon we will be able to mount ZIP archives with bzip2 and deflate (zlib) compression (952 to 366 MB) in RW-mode.

    This breakthrough will allow us to create portable programs as a single executable file that does not require unpacking!
    But we are still looking for alternatives that could provide better compression and a simple structure for RW-mounting.
    We did some tests and found that 7zip (7za a 7WIM.wim ./Test) makes an excellent WIM (952 to 278 MB).

    What else can we try for compressing executable files?

  2. #2
    Administrator Shelwien
    "7za a 7WIM.wim ./Test" probably produces 7zip archive, not a wim.

    As for exe files, the easy way to improve compression there is to use a good preprocessor.
    LZMS actually includes one, but it's not very good.
    http://nishi.dreamhosters.com/u/bcj2_v0.rar
    http://farbrausch.com/~fg/code/disfilter/
    http://nishi.dreamhosters.com/u/x64flt3_v0.rar

    Also a delta preprocessor should be useful:
    http://freearc.dreamhosters.com/delta151.zip

    Overall, .exe is basically a container format - it can contain code (for various CPUs), aligned data, text,
    and all kinds of filetypes in resources. So it'd take a lot of work to compress it properly.

    It might be good to use some preprocessing + zstd for FS compression (with individual files).
    Zstd also supports external dictionaries, so it's possible to build a common dictionary for the files in the archive.
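
    For example, a rough sketch using the stock zstd command-line tool (file and dictionary names are placeholders): train a shared dictionary over the files once, then compress and decompress each file against it.
    Code:
    rem Train a shared dictionary on all files under Test (recursively)
    zstd.exe --train -r Test -o common.dict
    rem Compress / decompress a single file against the shared dictionary
    zstd.exe -19 -D common.dict Test\app.dll -o app.dll.zst
    zstd.exe -d -D common.dict app.dll.zst -o app.dll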

  3. #3
    Member AcidBurn
    bcj2enc doesn't work: error 0x00000000004011A5.
    disfilter doesn't have a Windows build.
    x64flt3 and delta151 work fine.
    But ptime cmd /c "zstd.exe -19 test.tar -o ZSTD.zip" > ZSTD.txt = 305 Mbyte, 345.582 seconds.
    It's too slow and doesn't give good compression.

  4. #4
    Administrator Shelwien
    http://nishi.dreamhosters.com/u/bcj2_v0.rar reuploaded.
    Does it crash on your test file or when running its own test.bat?
    Also, it doesn't have a decoder - it's a standalone version of the 7-zip BCJ2 filter, for reference.
    Well, x64flt is better anyway.

    http://nishi.dreamhosters.com/u/dispack_v0.rar
    Compiled it for you.

    > [zstd] It's too slow and doesn't give good compression

    Test the decoding speed, and try faster encoding modes - it's not like I suggested it for best compression... it has 10x faster decoding with lzma-like compression.
    Also, don't compare plain zstd vs .7z size - 7-zip uses the BCJ2 filter by default.
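
    For example, zstd's built-in benchmark mode reports ratio plus compression and decompression speed in one run (the level range here is just an illustration):
    Code:
    rem Benchmark levels 1 through 19 on the test file
    zstd.exe -b1 -e19 test.tar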

    I guess you can also test http://nishi.dreamhosters.com/u/ooz_v1.rar (https://github.com/rarten/ooz/)

  5. #5
    Member AcidBurn
    preprocessors:
    > BCJ2ENC: test.bat works fine, but "bcj2enc c test.tar 1_" = 0x0000000000C21632.
    > DISPACK: test.bat works fine, but "dispack e test.tar 1" = "Only 32-bit PE files for x86 supported".
    > X64FLT3: "x64flt3.exe c test.tar stream" = stream+stream_1+stream_2+stream_3 (954+12+7+5 Mbyte) = ?
    > DELTA: "Delta.exe -v3 test.tar Delta" = Delta (955 Mbyte) = expected result and good speed.

    compressors:
    > ZSTD: ptime cmd /c "zstd.exe -19 Delta -o ZSTD-Delta" > ZSTD-Delta.txt = 349.091 s, 300 Mbyte.
    That's 5 MB better than the previous test, with fast decoding, but compression is slow and the ratio is poor = not interesting, sorry.
    > OOZ: ptime cmd /c "ooz.exe -z --level=4 --leviathan Delta test6.ooz" > ooz6.txt = 130.517 s, 290 Mbyte.
    2.6 times faster and 3% smaller than ZSTD = interesting, but very resource-intensive; on my old PC only these modes work:
    - 048.466 s, 302 MB = ptime cmd /c "ooz.exe -z --level=4 test.tar test1.ooz" > ooz1.txt
    - 047.717 s, 302 MB = ptime cmd /c "ooz.exe -z --level=4 --kraken test.tar test2.ooz" > ooz2.txt
    - 005.535 s, 933 MB = ptime cmd /c "ooz.exe -z --level=0 --mermaid test.tar test3.ooz" > ooz3.txt
    - 005.560 s, 933 MB = ptime cmd /c "ooz.exe -z --level=0 --selkie test.tar test4.ooz" > ooz4.txt
    - 135.328 s, 292 MB = ptime cmd /c "ooz.exe -z --level=4 --leviathan test.tar test5.ooz" > ooz5.txt

    In total we have "DELTA" and "OOZ --leviathan".
    Is it possible to improve the result or try something else?

  6. #6
    Administrator Shelwien
    https://github.com/schnaader/precomp-cpp (exes in releases)

    For x64flt3 (as with BCJ2) you need to compress all of the output streams.
    The smaller streams are 32-bit aligned, so it's possible to improve their compression by using lzma with lc0 lp2 pb2, plus delta.
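
    For instance, with 7-Zip's method-chain switches that could look roughly like this (a sketch; the stream names follow the x64flt3 output above, and the parameters may need tuning):
    Code:
    rem main stream: plain lzma
    7za.exe a -m0=LZMA:d=64m main.7z stream
    rem 32-bit aligned extra streams: delta:4 first, then lzma with lc0 lp2 pb2
    7za.exe a -m0=Delta:4 -m1=LZMA:lc=0:lp=2:pb=2 extra.7z stream_1 stream_2 stream_3
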
    There's also http://nishi.dreamhosters.com/u/x64flt.exe (usage: x64flt c/d input output), which is equivalent to BCJ
    and produces only one output stream. But it's worse than BCJ2/x64flt3.

    It's possible to apply delta multiple times.

    With data of that size it would be good to use http://freearc.dreamhosters.com/srep393a.zip
    I suppose it shouldn't be hard to use this kind of thing as a global dictionary with FS compression.
    Actually, zpaq already does this in a way.

    Also you can test http://freearc.dreamhosters.com/mm11.zip

    If you don't like zstd, maybe you'd like brotli? https://github.com/google/brotli

  7. #7
    Member AcidBurn
    > PRECOMP-CPP: very slow (maybe I'm just using the wrong settings).
    ptime cmd /c "precomp.exe -oprecomp.pcf -cb -d1 test.tar" > precomp.txt = 362 Mbyte, 244.032 s (I think that's slow for 2019)
    ptime cmd /c "bzip2.exe -k -c -9 precomp.pcf > precomp.bz2" > precomp-bzip2.txt = 362 Mbyte, 84.222 s (vs. 366 Mbyte without precomp)
    ptime cmd /c "ooz.exe -z --level=4 --leviathan precomp.pcf precomp.ooz" > precomp-ooz.txt = 360 Mbyte, 56.581 s (vs. 290 Mbyte with DELTA)

    > X64FLT3: very complicated for RW-mounting (maybe I'm wrong).
    -

    > X64FLT: very close to WIMLIB-LZX/LZMS (with OOZ).
    ptime cmd /c "x64flt.exe с test.tar test.x64flt" > x64flt.txt = 354 Mbyte, 9.954 s
    ptime cmd /c "bzip2.exe -k -c -9 test.x64flt > x64flt.bz2" > x64flt-bzip2.txt = 356 Mbyte, 154.540 s (vs. 366 Mbyte without X64FLT)
    ptime cmd /c "ooz.exe -z --level=4 --leviathan test.x64flt x64flt.ooz" > x64flt-ooz.txt = 287 Mbyte, 133.194 s (and it's very good!)

    > SREP393A: very close to 7ZIP-LZMA (with OOZ), better than WIMLIB-LZX/LZMS and 7ZIP-WIM!
    ptime cmd /c "srep.exe -m5 test.tar test.srep" > srep.txt = 689 Mbyte, 9.874 s
    ptime cmd /c "bzip2.exe -k -c -9 test.srep > srep.bz2" > srep-bzip2.txt = 315 Mbyte, 97.451 s
    ptime cmd /c "ooz.exe -z --level=4 --leviathan test.srep srep.ooz" > srep-ooz.txt = 274 Mbyte, 117.515 s

    > MM11: worse than SREP.
    ptime cmd /c "mm.exe test.tar test.mm" > mm.txt = 954 Mbyte, 8.796 s
    ptime cmd /c "bzip2.exe -k -c -9 test.mm > mm.bz2" > mm-bz2.txt = 366 Mbyte, 155.873 s
    ptime cmd /c "ooz.exe -z --level=4 --leviathan test.mm mm.ooz" > mm-ooz.txt = 292 Mbyte, 137.971 s

    > BROTLI: slower than everything I tested before (strange that zstd is the basis of WinBtrfs...).
    ptime cmd /c "brotli.exe -9 -o test.br test.tar" > brotli.txt = 314 Mbyte, 645.399 s

    I think OOZ+SREP is what deserves our attention.
    But which settings are best for OOZ+SREP, and what are the chances of mounting the resulting archive in RW-mode?

  8. #8
    Programmer schnaader
    Code:
    ptime cmd /c "precomp.exe -oprecomp.pcf -cb -d1 test.tar" > precomp.txt = 362 Mbyte, 244.032 s (I think it's long for 2019 year)
    ptime cmd /c "bzip2.exe -k -c -9 precomp.pcf > precomp.bz2" > precomp-bzip2.txt = 362 Mbyte, 84.222 s (vs. 366 Mbyte without precomp)
    I'm not sure I understand your notation, but if the precomp.pcf in the second line is the output from the first, you've got redundancy here: the output of the first line is already bzip2-compressed (because of -cb).

    My suggestion would be "precomp.exe -cn -oprecomp.pcf -d1 test.tar" to get the decompressed file (and after that, processing with pbzip2 or ooz) - or "precomp.exe -lf+x -oprecomp.pcf -d1 test.tar", which generates an LZMA2-compressed file using LZMA's x86 executable filter.

    But in general, Precomp might not fit your use case here; e.g. deduplication is still missing, so it will be slower than most other tools. And chains like "Precomp -cn" -> "SREP" -> "Precomp -lf+x" have too much overhead.

    Another suggestion would be to use pbzip2, a multi-threaded version of bzip2, to speed up both compression and decompression - but all of my links for Windows binaries are dead.

  9. #9
    Member AcidBurn
    Thanks for the advice, and sorry for the mistakes - I'm not a specialist in data compression.
    Unfortunately, I didn't find Windows versions of pbzip2 (multithreaded bzip2) or pigz (multithreaded gzip).
    I'm also confused by the many related tools. That is why I came to this forum for help...

    ptime cmd /c "precomp.exe -cn -oprecomp1.pcf -d1 test.tar" > precomp1.txt = 1.14 Gb, 77.169 s.
    ptime cmd /c "ooz.exe -z --level=4 --leviathan precomp1.pcf precomp1.ooz" > precomp-ooz.txt = 275 Mbyte, 107.696 s.
    In total, OOZ+PRECOMP1 = 275 Mbyte and 114.865 s: faster than OOZ+SREP (274 Mbyte and 127.389 s).

    ptime cmd /c "precomp.exe -lf+x -oprecomp2.pcf -d1 test.tar" > precomp2.txt = 266 Mbyte, 110.460 s.
    ptime cmd /c "ooz.exe -z --level=4 --leviathan precomp2.pcf precomp2.ooz" > precomp2-ooz.txt = 266 Mbyte, 30.201 s. (can be skipped?).
    In total, PRECOMP2 = 266 Mbyte and 110.460 s: faster and smaller than OOZ+SREP (274 Mbyte and 127.389 s).

    Any other suggestions for the test?

  10. #10
    Member AcidBurn
    Well, I want to say many thanks to all who participated in this topic!
    We got the following: the minimum size comes from 7-zip (263 MB), and the best algorithm for RW-mounting is OOZ+precomp (266 MB).
    I hope we can get the OOZ SDK and integrate this into libzip, or choose another algorithm from the available set...

    But I still want to clarify: is it possible to do something more?
    I guess it's difficult to answer without access to test files, so here is a sample I'm working with.
    Can anyone improve the compression ratio and speed, or have we done everything we can?

  11. #11
    Administrator Shelwien
    1. 7-zip has options which can significantly affect the output size, even for standard 7-zip.
    For example, you can add -md=1536M, or -mqs=on, or -mfb=273, or tweak lc/lp/pb, or use an explicit BCJ2 filter tree.
    There're also 7z updates like this: http://www.tc4shell.com/en/7zip/

    2. You can't "get the OOZ SDK". I already linked the github source, and that's it.
    Also, there are legal issues with actually using it.

    3. Yes, it's possible to improve the compression ratio even with the things already listed here.
    You seem to treat everything like "black boxes" with the same properties, but
    x64flt, srep, delta, mm, precomp are totally different preprocessors which can all be used at once,
    and combined with any compression algorithm.
    There're also some other preprocessors (like tpnibble or xwrt) which are potentially useful,
    but lack detection code.

    4. Your approach to building a compressed filesystem is very strange.
    I guess it doesn't matter if the whole archive is extracted to Temp on first access,
    then eventually compressed back all at once.
    But normally I'd expect special algorithms (like per-file compression with a global dictionary)
    to be designed for this purpose.
    Unfortunately, there're no publicly available efficient algorithms designed for this.

    5. You can also test https://encode.su/threads/2829-RAZOR...based-archiver
    and http://web.archive.org/web/201611022...//nanozip.net/
    They have integrated preprocessors (like BCJ, delta, mm).
    I didn't include them before because I presumed that you'd want open-source tools.

  12. #12
    Member AcidBurn
    1. 7-zip is interesting only for comparison with other algorithms, because it cannot be used for RW-mounting.
    I thought lc/lp/pb were not for general tasks and that "7za.exe a -r -t7z -m0=lzma2 -mx=9 LZMA.zip /Test" was not a bad solution.
    Maybe I was wrong, but the options you specified didn't give better compression:
    ptime cmd /c "7za.exe a -r -t7z -m0=lzma2 -mx=9 -md=1536M LZMA.7z ./Test" > LZMA.txt = System Error
    ptime cmd /c "7za.exe a -r -t7z -m0=lzma2 -mx=9 -mf=BCJ2 LZMA.7z ./Test" > LZMA.txt = ERROR: Can't allocate required memory
    ptime cmd /c "7za.exe a -r -t7z -m0=lzma2 -mx=9 -mqs=on LZMA.7z ./Test" > LZMA.txt = 270 Mbyte, 131.299 s
    ptime cmd /c "7za.exe a -r -t7z -m0=lzma2 -mx=9 -mfb=273 LZMA.7z ./Test" > LZMA.txt = 266 Mbyte, 161.908 s

    2. I could try discussing this with the author.
    An SDK is headers + an API description + pluggable libraries. Why not provide them for a good cause?
    But first I want to make sure that OOZ is the best solution for this purpose.

    3. Yes, these are "black boxes" for me and for many users. I can spend only a few hours a week learning this topic, and none of it is easy.
    I understand that the results can be improved using external preprocessors, but I'll keep them in reserve.
    Also, Schnaader said that "chains like "Precomp -cn" -> "SREP" -> "Precomp -lf+x" have too much overhead".

    4. PISMO PFO, IMAGEX ESD, DISM ESD, and VHD COMPACT work in RW-mode without Temp.
    That saves time, SSD wear, and disk and memory space (if Temp is in RAM).
    Poor compression is the only reason to look for alternatives.

    5. Yes, we need open-source tools, so RAZOR and the closed-source NanoZip are problematic to use.
    NanoZip is, as always, very fast:
    ptime cmd /c "nz.exe a -cc -m2g NZ.nz test.tar" > NZ.txt = 219 Mbyte, 1146.665 s
    ptime cmd /c "nz.exe a -cO -m2g NZ.nz test.tar" > NZ.txt = 228 Mbyte, 423.599 s
    ptime cmd /c "nz.exe a -cc -m2g -p6 -t16 NZ.nz test.tar" > NZ.txt = 237 Mbyte, 281.586 s
    ptime cmd /c "nz.exe a -co -m2g NZ.nz test.tar" > NZ.txt = 265 Mbyte, 98.946 s
    ptime cmd /c "nz.exe a -cD -m2g NZ.nz test.tar" > NZ.txt = 306 Mbyte, 17.151 s
    ptime cmd /c "nz.exe a -cd -m2g NZ.nz test.tar" > NZ.txt = 328 Mbyte, 9.450 s
    ptime cmd /c "nz.exe a -cf -m2g NZ.nz test.tar" > NZ.txt = 409 Mbyte, 6.691 s
    RAZOR is, as always, very slow:
    ptime cmd /c "rz.exe a RZ.rz .\Test\*.*" > RZ.txt = 238 Mbyte, 1118.316 s (~19 min.)
    ptime cmd /c "rz.exe a -d 1M RZ.rz .\Test\*.*" > RZ.txt = 261 Mbyte, 1081.555 s (~18 min.)

    If you can suggest something else, please write - I'm ready to try.

  13. #13
    Administrator Shelwien
    OK, I did some tests with 7-zip.
    Here I've got 275M with bcj2/lzma tweaks (still the d26=64M window),
    274M with a 128M window but the hc4 matchfinder (less memory),
    273M with x64flt3, and 267M with stronger entropy coding.
    Meanwhile a 1G window gives 251M.

    Code:
    279,728,069 Test.7z // https://dropmefiles.com/ZJS84
    
    277,622,379  90.906s: 7z a -mx=9 -myx=9 001 Test
    
    280,356,179  95.343s: 7z a -mx=9 -myx=9 -mqs=on 002 Test
    277,364,342 163.282s: 7z a -mx=9 -myx=9 -mc=1g 003 Test
    276,918,389 123.719s: 7z a -mx=9 -myx=9 -mfb=273 004 Test
    277,142,502  92.984s: 7z a -mx=9 -myx=9 -mlc=4 005 Test
    
    276,159,283 224.266s: t1.bat / 7z.exe a -bb3 -mx=9 -myx=9 -m0=bcj2 -m1=lzma [...]
    276,175,165 222.531s: t1.bat / bcj2:d27
    276,264,033 228.531s: t1.bat / bcj2:d25
    276,691,065 227.890s: t1.bat / bcj2:d25 / no delta:4
    276,159,283 223.656s: t1.bat / bcj2:d26 -mf=off
    
    272,925,205 262.344s: t3.bat / x64flt3 + delta
    267,468,238 496.203s: t3a.bat / plzma
    267,472,224 412.984s: t3a.bat / plzma4
    
    275,466,297 218.907s: t1a=t1.bat/d24 for extra streams
    275,569,312 223.750s: as above/lc7
    
    271,000,158 232.844s: t1a.bat/d27 for main stream; 128*11.5=1472M
    269,765,290 244.328s: t1a.bat/d28 2944M
    257,619,678 271.406s: t1a.bat/d29 5888M
    251,873,112 276.094s: t1a.bat/d30 11776M
    
    274,200,749 314.297s: t1a.bat/d28 hc4:mc16 128*6.5=834M
    Attached Files

  14. Thanks:

    AcidBurn (7th August 2019)

  15. #14
    Administrator Shelwien
    > 1. 7-zip interesting only for comparison with other algorithms,
    > because cannot be used for RW-mounting.

    Why? 7-zip is open-source, so it's possible to use its codecs with any custom container format.
    Why is precomp okay, but 7-zip isn't?

    > I thought lc/lp/pb were not for general tasks

    They are lzma/lzma2 options.
    For example, for 32bit-aligned data you'd set lc0:lp2:pb2,
    while for text it's better to use lc8:lp0:pb0 (or lc4 with lzma2, since lc8 is impossible there).
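
    In 7-Zip switch form that might look like this (a sketch; the file names are placeholders and the values should be tuned per dataset):
    Code:
    rem 32bit-aligned data: no literal context bits, 2 literal-position and 2 position bits
    7za.exe a -m0=LZMA:lc=0:lp=2:pb=2 aligned.7z table.bin
    rem text: more literal context bits, no position bits
    7za.exe a -m0=LZMA:lc=8:lp=0:pb=0 text.7z book.txt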

    > Maybe I was wrong, but the options you specified didn't give better compression

    They do (except mqs=on), see my results above.

    > 2. I could try discussing this theme with the author.
    > SDK are headers + API description + pluggable libraries.
    > Why not provide them for a good cause?

    Well, good luck: http://www.radgametools.com/oodle.htm
    What I linked before is a reverse-engineered version of that library.
    Reverse-engineering itself is legal in some countries,
    but actually using the ooz source in your project can be troublesome,
    especially if there's profit somewhere.

    > 3. Yes, these are "black boxes" for me and for many users.
    > I can spend only a few hours a week learning this topic, and none of it is easy.

    There's nothing that hard at the user level.
    Just be careful when comparing standalone tools with integrated ones (like 7-zip or nz).
    For example, 7-zip has MT, handles multiple streams, and auto-assigns
    some preprocessors (like BCJ2 and delta), so it's wrong to compare
    its compression time to srep+precomp.

    > I understand that the results can be improved using external preprocessors,
    > but I'll keep them in reserve.

    Preprocessors are the easiest way to improve compression though, especially for exes.
    Basically any other option would require months of work.

    > Also Schnaader said that "chains like "Precomp -cn" -> "SREP" -> "Precomp -lf+x" have too much overhead".

    Well, using precomp's integrated compression is a bad idea anyway.

    > 4. PISMO PFO, IMAGEX ESD, DISM ESD, and VHD COMPACT work in RW-mode without Temp.
    > That saves time, SSD wear, and disk and memory space (if Temp is in RAM).
    > Poor compression is the only reason to look for alternatives.

    Yes, but there's a limit to the possible gains from custom file-format handlers and such.
    In the end, the main improvement comes from using more memory.

    Also, existing compressed-FS implementations work at the cluster or file level,
    so comparing them to a solid 7z archive of the whole data makes no sense.

    Sure, I believe it's possible to design an FS format with stronger
    compression, but it won't be solved by collecting open-source modules from the net.

  16. #15
    Member AcidBurn
    > Why is precomp okay, but 7-zip isn't?
    Unfortunately 7-zip archives mount only in RO-mode, and "the 7zip author is not easy to approach".
    The remaining algorithms are also in doubt, but I still hope to find a solution on this forum.
    The only real alternative for now is XZ, if/when the patch is accepted, the bug is fixed, and multithreading and filters are implemented.
    Everything here is far from simple, but we don't give up. As René Descartes said, "Cogito ergo sum".
    Maybe the idea will interest someone and together we can do something useful; maybe not, we'll see...

    > for 32bit-aligned data
    I couldn't find out what "32bit-aligned data" is.

    > They do (except mqs=on), see my results above.
    Thanks for the tests and scripts! I couldn't have written something like this myself, but I'll try to understand it...

    For all other points, I completely agree with you and have nothing to add.
    Thanks again!

  17. #16
    Administrator Shelwien
    > Unfortunately 7-zip archives mount only in RO-mode

    It's not a problem with 7-zip.
    7-zip is perfectly able to update its archives, and there's an API with access to all of its functions.

    The actual problem is that .7z archives mostly use solid compression,
    so updating a file in the archive can take unreasonable time (and resources),
    since it requires re-compressing all the files in the corresponding solid block.

    Of course, the .7z format is flexible enough that you can create archives
    with reasonably small solid blocks.
    But writing to normal .7z archives would still be slow.

    > The remaining algorithms are also in doubt,
    > but I still hope to find a solution on this forum.

    A solution to what?
    RW support for 7-zip is a matter of implementing an actual write handler
    instead of "DbgPrint("WriteFile not support\n");".
    Though of course it's far from simple - it would involve file caching etc.

    Finding a file format with state-of-the-art compression
    and compatibility with RW random access?
    The best approximation I can think of would be zpaq: http://www.mattmahoney.net/dc/zpaq.html
    It might not have the relevant compression methods as it is,
    but it's extensible, has dedup, and implements archive updates
    by appending only modified file chunks to the archive.
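
    For example, with zpaq 7.15 an update run appends only the chunks of changed files, and older versions stay addressable (a sketch; archive and folder names are placeholders):
    Code:
    rem First run creates the archive
    zpaq64 a portable.zpaq Test -m4
    rem ...modify some files under Test, then run the same command again:
    rem only the changed file chunks are appended
    zpaq64 a portable.zpaq Test -m4
    rem Extract the state as of the first run, if needed
    zpaq64 x portable.zpaq -until 1 -to Restored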

    But the best approach would be designing and implementing your own
    archive format with explicit support for RW random access.

    > I couldn't find out what "32bit-aligned data" is.

    For example, a table of float-type numbers.
    Or CALL/JMP target addresses in the case of BCJ2 extra streams.

  18. #17
    Member AcidBurn
    > 7-zip
    I have no complaints about 7-zip or its author; each goes its own way.
    Solid compression is disabled by the -ms=off switch, but file writing remains slow.
    And yes, implementing an actual write handler is far from simple.
    This is the reason why it's not suitable for RW-mounting.
    What problems will appear when testing other algorithms - time will tell.

    > The best approximation I can think of would be zpaq
    Yes, you have already told me about zpaq, but I have a problem with it:
    E:\test_zips3>zpaq -m5 a Zpaq.zpaq Test.tar
    zpaq v7.15 journaling archiver, compiled Aug 17 2016
    Zpaq.zpaq: Incomplete transaction ignored
    0 versions, 0 files, 0 fragments, 0.000000 MB
    Updating Zpaq.zpaq at offset 0 + 0
    Adding 1001.371648 MB in 1 files -method 56 -threads 2 at 2019-08-07 05:33:27.
    6.73% 0:00:09 [1..889] 66988318 -method 56,106,0
    13.55% 0:00:08 [890..1831] 66994482 -method 56,105,0
    20.29% 0:00:09 [1832..2746] 67088068 -method 56,90,0
    27.02% 0:00:08 [2747..3637] 66976198 -method 56,72,0
    33.70% 0:06:04 [3638..4557] 67064633 -method 56,83,0
    job 2: std::bad_alloc

    "zpaq -m5 a Zpaq.zpaq Test\*" = std::bad_alloc too.

    > the best approach would be designing and implementing your own archive format
    Good joke; I think this is not the easiest way.
    In case of failure to RW-mount existing archive formats, it's easier to return to the very stable, but slow and poorly compressed PISMO PFO.

    > For example a table of float-type numbers.
    OK, let me reformulate the question: "How do you know what data is 32bit-aligned?"

  19. #18
    Programmer schnaader
    Quote Originally Posted by AcidBurn:
    E:\test_zips3>zpaq -m5 a Zpaq.zpaq Test.tar
    [..]
    Adding 1001.371648 MB in 1 files -method 56 -threads 2 at 2019-08-07 05:33:27.
    [..]
    job 2: std::bad_alloc
    Looks like you're using the 32-bit version (zpaq.exe), which is limited to 2 GB of memory (see http://www.mattmahoney.net/dc/zpaq.html, "Multithreaded Compression"). "-m5" will use at least 850 MB per thread (1700 MB total in your case) plus additional memory (not much is left within the 2 GB limit) for other things like deduplication and block management.

    Either use the 64-bit version (zpaq64.exe) if your OS is 64-bit or try "zpaq -m5 -threads 1 a Zpaq.zpaq Test.tar" which uses only one thread and less memory.

    Also note these messages:

    Zpaq.zpaq: Incomplete transaction ignored
    Updating Zpaq.zpaq at offset 0 + 0
    zpaq is trying to update an existing Zpaq.zpaq file, which might not be what you intended - the resulting file might be a mix of several archive operations and bigger because of it. Timings might also get skewed. For a "fresh" archive, delete Zpaq.zpaq before you run zpaq.

  20. #19
    Member AcidBurn
    Yes, thank you!
    The modes "-m5 -t3" and "-m4 -t6" are acceptable for my PC, the above ("-m5 -t4", -m59...-m611) = std::bad_alloc:
    ptime cmd /c "zpaq64 -m5 -t3 a Zpaq-M5-T3.zpaq Test\*" > Zpaq-M5-T3.txt = 1034.806 s, 249 Mbyte
    ptime cmd /c "zpaq64 -m4 -t6 a Zpaq-M4-T8.zpaq Test\*" > Zpaq-M4-T6.txt = 231.529 s, 272 Mbyte
    Are there any ways to improve compression (preprocessors/settings) and speed (GPU support hasn't been implemented?)?

    And what do you think about Radyx or LRZIP?
    Upd: the LRZIP binaries are very old (one, two), so I compiled a new one for Windows according to the instructions.
    ptime cmd /c "radyx a -mx12 Radix Test.tar" > Radix.txt = 76.599 s, 270 Mbyte - not bad.
    ptime cmd /c "lrzip.exe -Uzp 1 -L 9 -o test.lrz test.tar" > LRZIP.txt = 1640.542 s, 237 Mbyte - bad.

  21. #20
    Administrator Shelwien
    > Solid compression is disabled by the -ms=off switch, but file writing remains slow.

    Yeah, because -ms=off just means "make a solid block per file",
    and your data includes 100MB+ files.

    http://nishi.dreamhosters.com/u/method.htm#Solid
    Try -ms=100f10M or something.

    > This is the reason why it's not suitable for RW-mounting.

    Not really. The task is more or less the same for all archive formats,
    and 7-zip at least is open-source and has the 7z.dll API.

    >> the best approach would be designing and implementing your own archive format
    > Good joke; I think this is not the easiest way.

    It's probably the only way if you want to create a stable and efficient compressed FS.
    Archive formats have bad compatibility with random access,
    while FS formats are not compatible with compression
    (it requires support for variable-size blocks, caching,
    async/MT operation, and ECC).

    SSD firmware developers might have the best approximation.

    > In case of failure to RW-mount existing archive formats, it's easier to
    > return to the very stable, but slow and poorly compressed PISMO PFO.

    Actually, win10 NTFS has support for LZX compression:
    Code:
    > compact /?
    ...
    /EXE       Use compression optimized for executable files which are read
               frequently and not modified.  Supported algorithms are:
               XPRESS4K  (fastest) (default)
               XPRESS8K
               XPRESS16K
               LZX       (most compact)
    Also, you can mount a WIM as an FS, and there's LZMS, an improved version of LZX
    with a built-in x86/x64 preprocessor (which was the inspiration for x64flt).
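
    For reference, enabling NTFS LZX compression from the command line looks roughly like this (the folder name is a placeholder):
    Code:
    rem Compress everything under Test with the LZX algorithm
    compact /C /EXE:LZX /S:Test
    rem Show the resulting compression state
    compact /S:Test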

    >> For example a table of float-type numbers.
    > OK, let me reformulate the question: "How do you know what data is 32bit-aligned?"

    1) Best-compression method:
    test actual compression for all popular element sizes
    and keep the one with the smallest size (see the sketch below).

    2) Faster but less reliable method:
    gather byte statistics in the context of pos%element_size,
    and find the element_size with minimal entropy.
    Essentially this is similar to (1), but we can use
    a faster compression algorithm to estimate compression
    (or even a rough approximation of it).
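
    A crude batch sketch of method (1), using 7-Zip's delta filter to try the popular element sizes (the input name table.bin is a placeholder):
    Code:
    @echo off
    rem Try delta preprocessing with the common element sizes; the element size
    rem whose archive comes out smallest is likely the data's alignment.
    for %%N in (1 2 4 8) do (
        7za.exe a -m0=Delta:%%N -m1=LZMA try_d%%N.7z table.bin
    )
    rem Compare the resulting archive sizes and keep the winner
    dir try_d*.7z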

    > Are there any ways to improve compression (preprocessors/settings)

    Kinda, yes; zpaq even provides its own programming language (zpaql)
    for adding preprocessors with forward compatibility
    (old zpaq versions can process archives compressed with new codecs).

    On the other hand, it's probably useless if there's no existing preset that works for your purpose.
    Just making a new format while borrowing a couple of ideas from zpaq
    would probably be easier than fixing it.

    > and speed (GPU support hasn't been implemented?)?

    GPUs are still not very compatible with strong lossless compression.
    But you can test BSC if you have an nvidia GPU: http://libbsc.com/

    > And what do you think about Radyx or LRZIP?

    They are both lzma-based, and there's no archive format which you could use.

    I guess you can test https://github.com/conor42/7-Zip-FL2/releases/
    But its page seems to say that it requires 2x the memory of bt4 for the same compression.

  22. #21
    Member AcidBurn
    > 7-zip
    In any case, I cannot create a write handler or accelerate archive updates, which makes 7z (temporarily?) unusable.

    > win10 LZX, WIM
    I know and use it; on my old PC it compresses the sample to 287-437 MB in 27-175 seconds:
    the most compact solution is 287 MB (79 seconds), the fastest is 397 MB (27 seconds).
    Archiving algorithms giving >287 MB and/or taking >175 seconds are not considered.

    But 7Z, NanoZIP, OOZ, RADYX, and XZ easily give more compact results in comparable time.
    Now we are creating a list of possible archivers, algorithms and problems (including licenses).
    Then everything will be clarified (I mean the ability/speed/resource intensity of creating/mounting/accessing/updating, etc.).

    Thank you for finding and offering new interesting options. At least it’s interesting to learn and test.

    > Just making a new (archive) format while borrowing a couple of ideas from zpaq would be probably easier
    Creating a new archive format is an absolutely impossible task for me; I don't know about you.

    > you can test BSC if you have nvidia gpu: http://libbsc.com/
    > you can test https://github.com/conor42/7-Zip-FL2/releases/
    ptime cmd /c "7zf.exe a -t7z -mx=9 -m0=flzma2 FLZMA.7z ./Test" > FLZMA.txt = 100.157 s, 267 MByte
    ptime cmd /c "7za.exe a -t7z -mx=9 -m0=lzma2 LZMA.7z ./Test" > LZMA.txt = 170.567 s, 266 MByte

    ptime cmd /c "fxz.exe -z -7 -e -T0 Test.tar" > FXZ.txt = 99.815 s, 280 MByte
    ptime cmd /c "xz.exe -z -7 -e -T0 Test.tar" > XZ.txt = 117.615 s, 283 MByte

    ptime cmd /c "bsc.exe -e2 -b1024 Test.tar Test.bsc" > BSC.txt = don't work in GTX285
