
Thread: AntiZ - an open source alternative to precomp

  1. #1
    Member
    Join Date
    May 2015
    Location
    Hungary
    Posts
    25
    Thanks
    4
    Thanked 32 Times in 8 Posts

    AntiZ - an open source alternative to precomp

    Hello everyone!

    I have been working on a precompressor with the goal of making a precomp alternative. It is still very rudimentary, but it usually doesn't corrupt data, so I figured, why not release the first alpha.
    As of now it does about the same as precomp with fastmode off, intense on, all file format handling off and recursion off. The file format is not final or stable, the specification is not written yet, and while I did some tests I am sure there are bugs, so this really is an alpha release. Oh, and it is SLOW: it brute-forces the zlib clevel/memlevel/windowbits, although it does take a few shortcuts. A more intelligent and faster mode is planned.
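
    For readers unfamiliar with what that brute force involves, here is a minimal sketch (not the actual AntiZ code) of recompressing a decompressed block with every zlib clevel/memlevel/windowBits combination, stopping at the first one whose output is byte-identical to the original stream:
    Code:
    // Minimal sketch (not the actual AntiZ code): brute-force the zlib
    // parameters of a stream by recompressing the decompressed data with
    // every clevel/memlevel/windowBits combination and comparing the
    // result to the original compressed bytes. Error handling is omitted.
    #include <zlib.h>
    #include <cstring>
    #include <vector>

    bool find_zlib_params(const std::vector<unsigned char>& decompressed,
                          const std::vector<unsigned char>& original,
                          int& clevel, int& memlevel, int& wbits)
    {
        std::vector<unsigned char> out(original.size() + 1024);
        for (int w = 15; w >= 9; --w)              // window bits
            for (int m = 9; m >= 1; --m)           // memory level
                for (int c = 9; c >= 1; --c) {     // compression level
                    z_stream s{};
                    if (deflateInit2(&s, c, Z_DEFLATED, w, m,
                                     Z_DEFAULT_STRATEGY) != Z_OK) continue;
                    s.next_in   = const_cast<unsigned char*>(decompressed.data());
                    s.avail_in  = static_cast<uInt>(decompressed.size());
                    s.next_out  = out.data();
                    s.avail_out = static_cast<uInt>(out.size());
                    int ret = deflate(&s, Z_FINISH);
                    bool match = ret == Z_STREAM_END
                                 && s.total_out == original.size()
                                 && std::memcmp(out.data(), original.data(),
                                                original.size()) == 0;
                    deflateEnd(&s);
                    if (match) { clevel = c; memlevel = m; wbits = w; return true; }
                }
        return false;
    }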

    This release takes the specified input file and expands it into a .atz file (somewhat like a .pcf), then reads back the .atz and reconstructs the original file as a .rec file. The .rec file should be byte-identical to the original file; if it is not, you have found a bug, so please open an issue on GitHub.
    If the -r switch is given after the input file name, then the input is assumed to be an ATZ file and the program attempts to reconstruct the original file from it into a .rec file.

    There are two .exes in the release: one is silent and one is for debugging. Windows x64 only, Core 2 Duo or later. GPLv3 license.

    Contributions, patches, forks, etc. are welcome on Github: https://github.com/Diazonium/AntiZ
    Attached Files

  2. Thanks (18):

    Bulat Ziganshin (16th May 2015),chummy (27th October 2015),CoolOppo (26th May 2015),Gonzalo (16th May 2015),Jan Ondrus (16th May 2015),kaitz (16th May 2015),kassane (4th June 2016),Matt Mahoney (19th May 2015),milky (2nd July 2017),Nania Francesco (17th May 2015),ne0n (19th May 2015),rjmalagon (17th May 2015),samsat1024 (17th May 2015),Simorq (21st May 2017),Skymmer (17th May 2015),Stephan Busch (17th May 2015),surfersat (18th May 2015),VoLT (29th May 2015)

  3. #2
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    515
    Thanks
    234
    Thanked 89 Times in 69 Posts
    I love you man! In the sane sense

    I always wanted to do the same thing... Actually, I had already planned the whole process and was just trying to find some free time to slowly implement it. So thank you for the initiative.

    EDIT: Is it possible to provide an x86 version? Thanks!
    Last edited by Gonzalo; 17th May 2015 at 00:47.

  4. #3
    Member
    Join Date
    May 2015
    Location
    Hungary
    Posts
    25
    Thanks
    4
    Thanked 32 Times in 8 Posts
    Quote Originally Posted by Gonzalo View Post
    I love you man! In the sane sense

    I always wanted to do the same thing... Actually, I had already planned the whole process and was just trying to find some free time to slowly implement it. So thank you for the initiative.

    EDIT: Can you please provide an x86 version? Thank you.
    I am afraid I cannot provide 32-bit binaries yet. I have used 64-bit ints extensively in the code, and it is not a simple search-and-replace because some of them are written to disk in the ATZ file, so I would have to redo quite a few other things and figure out how to keep a single codebase for both while keeping the ATZ files compatible. I am not exactly an experienced programmer, and this is just a hobby project for me, so I am learning as I go.
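
    As an aside, and not necessarily how AntiZ would do it: one common way to keep a single codebase and a stable on-disk format across 32- and 64-bit builds is to serialize explicit fixed-width types in a fixed byte order instead of platform-dependent ints, for example:
    Code:
    // Sketch: write a 64-bit value in a fixed little-endian layout so the
    // ATZ file looks the same regardless of the build target's native
    // integer width or endianness.
    #include <cstdint>
    #include <cstdio>

    void write_u64_le(std::FILE* f, std::uint64_t v)
    {
        unsigned char buf[8];
        for (int i = 0; i < 8; ++i)
            buf[i] = static_cast<unsigned char>(v >> (8 * i));
        std::fwrite(buf, 1, sizeof buf, f);
    }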

    I will add 32-bit to the todo list, but no promises on when I will do it.

  5. #4
    Member Skymmer's Avatar
    Join Date
    Mar 2009
    Location
    Russia
    Posts
    681
    Thanks
    38
    Thanked 168 Times in 84 Posts
    Wow, that's really surprising. Not many precompressors have been created so far, so any new development in this area is highly interesting. Especially if it's an open source project.
    Thanks for it

    It's a little bit sad that you are aiming your project only at zlib. No doubt deflate compression is highly widespread, but for example a lot of MS Windows related distributions are LZX based.
    Anyway, it's good to see that precompression is not dead.
    I performed a little test on one continuous 92 MB zlib block. Actually, AntiZ is not so slow compared to precomp.

    Input size:96396299
    Good offsets: 1
    recompressed:1/1
    Total bytes written: 181061083
    reconstructing from jeepride.ff.atz
    File size:181061083
    Process Time : 61.406s
    Clock Time : 61.468s
    Working Set : 363 MB
    Pagefile : 554 MB
    IO Read : 264 MB (in 2 reads)
    IO Write : 264 MB (in 6 writes)



    The full encode/decode cycle of precomp 0.4.3 -intense -cn took 39.2 sec. So IMHO it's a very good start for AntiZ.
    Actually, precomp performs slower because of the temp files it writes to and reads from the disk, so it takes an additional hit from IO and from antivirus real-time scanner locks. The result above is for a RAM disk.
    Also, as far as I can see, AntiZ uses in-memory operations for data blocks. Very wise decision.
    Hope you won't stop at this alpha )

  6. Thanks (2):

    Bulat Ziganshin (17th May 2015),Gonzalo (17th May 2015)

  7. #5
    Member
    Join Date
    May 2012
    Location
    United States
    Posts
    324
    Thanks
    182
    Thanked 53 Times in 38 Posts
    Great!!!! Keep working on this!

    Quick benchmark. Size is the file size after compression with the NanoZip 0.09 "-cc" option.

    FlashMX.pdf
    Code:
    Precomp 0.4.3 -cn -intense  (2,290,842)
                    Antiz_010a  (2,286,361)

  8. #6
    Member
    Join Date
    May 2015
    Location
    Hungary
    Posts
    25
    Thanks
    4
    Thanked 32 Times in 8 Posts
    Quote Originally Posted by Skymmer View Post
    Wow, that's really surprising. Not many precompressors have been created so far, so any new development in this area is highly interesting. Especially if it's an open source project.
    Thanks for it

    It's a little bit sad that you are aiming your project only at zlib. No doubt deflate compression is highly widespread, but for example a lot of MS Windows related distributions are LZX based.
    Anyway, it's good to see that precompression is not dead.
    I performed a little test on one continuous 92 MB zlib block. Actually, AntiZ is not so slow compared to precomp.

    Input size:96396299
    Good offsets: 1
    recompressed:1/1
    Total bytes written: 181061083
    reconstructing from jeepride.ff.atz
    File size:181061083
    Process Time : 61.406s
    Clock Time : 61.468s
    Working Set : 363 MB
    Pagefile : 554 MB
    IO Read : 264 MB (in 2 reads)
    IO Write : 264 MB (in 6 writes)



    The full encode/decode cycle of precomp 0.4.3 -intense -cn took 39.2 sec. So IMHO it's a very good start for AntiZ.
    Actually, precomp performs slower because of the temp files it writes to and reads from the disk, so it takes an additional hit from IO and from antivirus real-time scanner locks. The result above is for a RAM disk.
    Also, as far as I can see, AntiZ uses in-memory operations for data blocks. Very wise decision.
    Hope you won't stop at this alpha )
    Yes, everything is done in RAM; in fact, the crazy IO barrage that precomp does was one of the reasons I started developing AntiZ. Currently the limiting factor is the deflate code (50% of CPU time is spent in longest_match; there is a nasty bottleneck caused by branch mispredictions). I will see what I can do about that.

    Also, I plan to revive the currently disabled smart mode; it will sometimes hurt compression but will be much faster than brute-forcing (many times faster).

  9. #7
    Member
    Join Date
    Oct 2007
    Location
    Germany, Hamburg
    Posts
    408
    Thanks
    0
    Thanked 5 Times in 5 Posts
    Thank you Diazonium,

    a very nice project you have started here.
    I'll wait for the recursion implementation before I start testing it. That leaves much more room for interesting results and possible problems.
    How far down your todo list is this feature?

  10. #8
    Member
    Join Date
    May 2015
    Location
    Hungary
    Posts
    25
    Thanks
    4
    Thanked 32 Times in 8 Posts
    Quote Originally Posted by Simon Berger View Post
    Thank you Diazonium,

    a very nice project you have started here.
    I'll wait for the recursion implementation before I start testing it. That leaves much more room for interesting results and possible problems.
    How far down your todo list is this feature?
    Recursion and 32-bit are both pretty far away. I don't think I will do them before the first major rewrite (separating the high-level logic from the lower-level code, putting all the separate phases into functions, etc.). The current code is messy and not very maintainable.

  11. #9
    Member Skymmer's Avatar
    Join Date
    Mar 2009
    Location
    Russia
    Posts
    681
    Thanks
    38
    Thanked 168 Times in 84 Posts
    Quote Originally Posted by Diazonium View Post
    Currently the limiting factor is the deflate code (50% of CPU time is spent in longest_match; there is a nasty bottleneck caused by branch mispredictions). I will see what I can do about that.
    I have recently been browsing the zlib sources and there are a couple of interesting things in contrib\. Among them are the masmx64 and masmx86 folders with ASM implementations of the functions longest_match() and inflate_fast(). Maybe they will help.

    Also I have a couple of suggestions:
    1. Multi-threading, with an option to control the number of cores used.
    Yes, implementing MT can be quite problematic, but it's the only way to get fast performance. Even with highly optimized deflating there is still a need to brute-force the compression levels and memory levels, which means that 81 compressions of the same data block need to be done. Actually more, due to the enumeration of window bits.
    2. Option for setting a memory cap.
    This means that if the required memory exceeds the allowed value, a mode which uses temporary files is activated. This can be useful for the memory limits that can be hit on some extraordinary zlib-packed data. I have an example of a 2.34 GB continuous zlib block which takes 8 GB when unpacked. So potentially it would lead to more than 10 GB of memory usage.
    3. Option to set the temporary files folder.
    Obviously it should only be active when the memory cap is used.
    By the way, AntiZ gives an error on this file:
    Code:
    Input file: GLOBAL.PAK
    overwriting GLOBAL.PAK.atz and GLOBAL.PAK.rec if present
    Input size:-1774453588
    terminate called after throwing an instance of 'std::bad_alloc'
      what():  std::bad_alloc
    4. Precompression-level deduplication.
    The hash of each raw deflate block can be calculated using some fast and collision-resistant algorithm (BLAKE2 for example). Then, if a block with the same hash is found, it can be marked as a duplicate. Such behaviour gives benefits for encoding (no need to re-brute-force the same data), decoding (a unique block can be compressed only once and then reduplicated) and output file size (identical blocks are stored as one precompressed data chunk).
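
    A rough illustration of the idea (the hash function and data structures here are only placeholders; a real implementation would use a stronger hash such as BLAKE2 and verify candidate duplicates byte for byte):
    Code:
    // Sketch of precompression-level deduplication: hash each raw deflate
    // block and remember the index of the first block with that digest.
    // FNV-1a stands in for a stronger hash like BLAKE2 or VMAC.
    #include <cstddef>
    #include <cstdint>
    #include <unordered_map>
    #include <vector>

    static std::uint64_t fnv1a(const unsigned char* p, std::size_t n)
    {
        std::uint64_t h = 0xcbf29ce484222325ull;
        for (std::size_t i = 0; i < n; ++i) {
            h ^= p[i];
            h *= 0x100000001b3ull;
        }
        return h;
    }

    // Returns the index of an earlier block with the same hash, or -1 if
    // this block is the first of its kind.
    long long dedup_index(const std::vector<unsigned char>& block,
                          std::size_t this_index,
                          std::unordered_map<std::uint64_t, std::size_t>& seen)
    {
        const std::uint64_t h = fnv1a(block.data(), block.size());
        auto it = seen.find(h);
        if (it != seen.end())
            return static_cast<long long>(it->second);
        seen.emplace(h, this_index);
        return -1;
    }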

  12. #10
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    515
    Thanks
    234
    Thanked 89 Times in 69 Posts
    Quote Originally Posted by Diazonium View Post
    Recursion and 32-bit are both pretty far away. I dont think that I will do them before the first major rewrite(separating high level logic from the lower level type code, put all the separate phases into functions, etc.) The current code is messy and not very maintainable.
    Then maybe the best thing you can do is the rewrite right now, because later it will become more and more difficult... Otherwise you will be doing the same job twice... Another advantage is that more users will be able to test AntiZ, so development will actually accelerate.

  13. #11
    Programmer Jan Ondrus's Avatar
    Join Date
    Sep 2008
    Location
    Rychnov nad Kněžnou, Czech Republic
    Posts
    278
    Thanks
    33
    Thanked 137 Times in 49 Posts
    Hello, you can find my branch here:
    https://github.com/hxim/AntiZ/tree/hxim
    I just removed some debug stuff and rewrote phase 1 (detecting possible offsets).

    Edit: added my attempt to merge phases 1 and 2 (detect possible offsets and check them in one go)
    Last edited by Jan Ondrus; 17th May 2015 at 21:18.

  14. Thanks (5):

    Bulat Ziganshin (18th May 2015),comp1 (17th May 2015),Diazonium (18th May 2015),Gonzalo (17th May 2015),Stephan Busch (18th May 2015)

  15. #12
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    515
    Thanks
    234
    Thanked 89 Times in 69 Posts
    Also, brute force is not always needed. You can approximate the compression level with some accuracy after a few tries using heuristics. For example, you try the fastest and the strongest modes and compare the sizes with the original stream. At that point you know in which part of the range from 1:1 to 9:9 the block was packed. The whole process takes far fewer than 81 attempts...
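
    Something along these lines (only a sketch of the bracketing idea, untested against real streams; it varies just the compression level, while a full search would still sweep memLevel and windowBits within the narrowed range):
    Code:
    // Sketch of the bracketing heuristic: compress once at the fastest and
    // once at the strongest level, then use the original stream's size to
    // decide which half of the level range is worth brute-forcing.
    #include <zlib.h>
    #include <utility>
    #include <vector>

    std::pair<int, int> guess_level_range(const std::vector<unsigned char>& raw,
                                          uLong original_compressed_size)
    {
        const uLong bound = compressBound(static_cast<uLong>(raw.size()));
        std::vector<unsigned char> buf(bound);

        uLong fast_len = bound;
        compress2(buf.data(), &fast_len, raw.data(),
                  static_cast<uLong>(raw.size()), 1);   // fastest

        uLong best_len = bound;
        compress2(buf.data(), &best_len, raw.data(),
                  static_cast<uLong>(raw.size()), 9);   // strongest

        // If the original is closer to the level-9 size, try the high levels
        // first; if it is closer to the level-1 size, try the low levels.
        if (original_compressed_size <= (fast_len + best_len) / 2)
            return {6, 9};
        return {1, 5};
    }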

  16. #13
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    515
    Thanks
    234
    Thanked 89 Times in 69 Posts
    {newbie question incoming} What about coder-only replacement? I mean Huffman >> ari/ANS/etc... Is it technically possible? The idea is to avoid the modeling stage. If it is possible, I infer that it would result in less gain but a much faster process.

  17. #14
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,511
    Thanks
    746
    Thanked 668 Times in 361 Posts
    You can try it with tornado: -c3 is Huffman and -c4 is ari. Usually it's a 0.5% improvement.

  18. #15
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,511
    Thanks
    746
    Thanked 668 Times in 361 Posts
    Quote Originally Posted by Skymmer View Post
    Hash of the raw deflate block can be calculated using some fast and collision-free algo (BLAKE2 for example)
    VMAC is an order of magnitude faster: 0.8 clocks/byte for a 256-bit result using only non-SIMD 64-bit arithmetic.
    Last edited by Bulat Ziganshin; 18th May 2015 at 01:28.

  19. #16
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,475
    Thanks
    26
    Thanked 121 Times in 95 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    You can try it with tornado: -c3 is Huffman and -c4 is ari. Usually it's a 0.5% improvement.
    Ari allows for fancier modelling, so the potential improvement can be somewhat larger. See, e.g., lossless JPEG compressors.

  20. #17
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    515
    Thanks
    234
    Thanked 89 Times in 69 Posts
    Uh, OK. So it's not worth it...

  21. #18
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,511
    Thanks
    746
    Thanked 668 Times in 361 Posts
    Quote Originally Posted by Piotr Tarsa View Post
    Ari allows for fancier modelling, so the potential improvement can be somewhat larger. See, e.g., lossless JPEG compressors.
    Well, that's another idea. Rather than replacing huf->ari (0.5% less), use some novel modeling over the stream of chars+matches already found by deflate. Isn't that the idea behind Shelwien's reflate?

  22. #19
    Member
    Join Date
    May 2015
    Location
    Hungary
    Posts
    25
    Thanks
    4
    Thanked 32 Times in 8 Posts
    Quote Originally Posted by Jan Ondrus View Post
    Hello, you can find my branch here:
    https://github.com/hxim/AntiZ/tree/hxim
    I just removed some debug stuff and rewrote phase 1 (detecting possible offsets).

    Edit: added my attempt to merge phases 1 and 2 (detect possible offsets and check them in one go)
    Thanks for the work you have done; I have already incorporated some of your changes into the main branch. I intend to keep more of the debug stuff, especially in the later phases, because it has helped me a lot in debugging and I think I may need it again. So I kind of made a hybrid of your code and my old code.

  23. #20
    Member
    Join Date
    May 2015
    Location
    Hungary
    Posts
    25
    Thanks
    4
    Thanked 32 Times in 8 Posts
    1. MT: Yes, I have plans for multithreading! It should not be that difficult after the rewrite to functions is complete.
    2-3. Yes, this is currently a limitation; again, an issue that will have to wait until the logic is separated out.
    4. An interesting idea, I had not thought about it yet. It would be nice, but it's a low-priority thing. Definitely only after 1-2-3.
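
    To illustrate what point 1 could look like once the phases are separate functions (purely a sketch; the names here are made up and not from the AntiZ code), each detected stream could be brute-forced in its own task, with an option for the thread count:
    Code:
    // Sketch of the multithreading suggestion: split the detected stream
    // offsets across N worker threads, each handling its own subset
    // independently. try_recompress() is a placeholder for the per-stream
    // brute-force work.
    #include <cstddef>
    #include <thread>
    #include <vector>

    void process_streams_mt(const std::vector<std::size_t>& offsets,
                            unsigned num_threads,
                            void (*try_recompress)(std::size_t offset))
    {
        if (num_threads == 0)
            num_threads = std::thread::hardware_concurrency();
        if (num_threads == 0)
            num_threads = 1;

        std::vector<std::thread> workers;
        for (unsigned t = 0; t < num_threads; ++t) {
            workers.emplace_back([&, t]() {
                // each stream is independent, so a simple strided split works
                for (std::size_t i = t; i < offsets.size(); i += num_threads)
                    try_recompress(offsets[i]);
            });
        }
        for (auto& w : workers)
            w.join();
    }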

  24. #21
    Member Skymmer's Avatar
    Join Date
    Mar 2009
    Location
    Russia
    Posts
    681
    Thanks
    38
    Thanked 168 Times in 84 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    VMAC is an order of magnitude faster: 0.8 clocks/byte for a 256-bit result using only non-SIMD 64-bit arithmetic.
    О_О I can't believe it. More exactly speaking, I just don't believe it. If you have a console exe which I can test on my own, then I'll gladly do it. I don't want to say that you're a liar, but such speed is simply unbelievable.

    Blake2sp gives 2.268 cpb on my system.
    z:\test_rawdet>64 -g -n -u -m -r678bde -- _test64.exe -a blake2sp CONS.dat
    daf7e9c6e345b0f2f7ab3a16906bc5053f620b7f577bdda9bbdbce6e9ffa8964 CONS.dat
    Process Time : 23.468s
    Clock Time : 4.921s
    Working Set : 17 MB
    Pagefile : 17 MB
    IO Read : 9929 MB (in 622 reads)
    IO Write : 0 MB (in 0 writes)
    CONS.dat is 10 412 264 009 bytes, CPU is 4800 MHz

  25. #22
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,511
    Thanks
    746
    Thanked 668 Times in 361 Posts
    blake2sp is ~10 cpb. VMAC has been a part of srep for 2 years; you can make a standalone program yourself. BTW, srep performs full dedup at 4 GB/s, where checksumming is only a part of the work.
    Last edited by Bulat Ziganshin; 18th May 2015 at 06:44.

  26. #23
    Member Skymmer's Avatar
    Join Date
    Mar 2009
    Location
    Russia
    Posts
    681
    Thanks
    38
    Thanked 168 Times in 84 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    blake2sp is ~10 cpb. VMAC has been a part of srep for 2 years...
    ... BTW, srep performs full dedup at 4 GB/s, where checksumming is only a part of the work
    Yes, I know about SREP's internals. I have been tracking its progress almost from the beginning.

    Quote Originally Posted by Bulat Ziganshin View Post
    you can make standalone program yourself.
    You're overestimating my abilities ). I tried to compile the original sources, but only after a couple of hours did I realize why the entry point was not visible. OK, I compiled it. But the resulting exe started to give some bad results and report differences in abc. So I took your version from SREP and everything became fine. By the way, the changes you have made to remove some of VMAC's limitations are admirable.
    Yes, you were right - VMAC is blazingly fast, to say the least.

    Code:
    
       16 bytes, 20.60 cpb   |    2048 bytes, 0.46 cpb
       32 bytes, 10.37 cpb   |    4096 bytes, 0.39 cpb
        64 bytes, 5.28 cpb   |    8192 bytes, 0.34 cpb
       128 bytes, 2.70 cpb   |   16384 bytes, 0.33 cpb
       256 bytes, 1.54 cpb   |   32768 bytes, 0.32 cpb
       512 bytes, 0.94 cpb   |   65536 bytes, 0.31 cpb
      1024 bytes, 0.62 cpb   |  131072 bytes, 0.31 cpb
    
    But I have a suspicion that VMAC is heavily ASM-optimized and tuned for aligned data lengths (correct me if I'm wrong). Also, I think it's not ready for large portions of data, at least out of the box. I tried to test it with larger byte lengths and it simply crashed at 1048576 bytes. There are also still limitations like the first bit of the nonce buffer and the special conditions for vhash_update.
    I have also retested BLAKE2 with my new compile, blake2b this time. It gets 5.90 cpb.
    There is also xxHash, which impressed me with its nice code layout and speed. It gets 0.86 cpb.

  27. #24
    Member
    Join Date
    May 2008
    Location
    brazil
    Posts
    163
    Thanks
    0
    Thanked 3 Times in 3 Posts
    Are there any plans to create a general preprocessor? (text, sound)

    Projects like xwrt, for example, are very interesting.

  28. #25
    Member Skymmer's Avatar
    Join Date
    Mar 2009
    Location
    Russia
    Posts
    681
    Thanks
    38
    Thanked 168 Times in 84 Posts
    Do you have any troubles with it?

  29. #26
    Member
    Join Date
    May 2008
    Location
    brazil
    Posts
    163
    Thanks
    0
    Thanked 3 Times in 3 Posts
    Quote Originally Posted by Skymmer View Post
    Do you have any troubles with it?
    No, but a unified preprocessing tool could be very good for data compression.

  30. #27
    Member
    Join Date
    Oct 2007
    Location
    Germany, Hamburg
    Posts
    408
    Thanks
    0
    Thanked 5 Times in 5 Posts
    Quote Originally Posted by Diazonium View Post
    1. MT: Yes, I have plans for multithreading! It should not be that difficult after the rewrite to functions is complete.
    2-3. Yes, this is currently a limitation; again, an issue that will have to wait until the logic is separated out.
    4. An interesting idea, I had not thought about it yet. It would be nice, but it's a low-priority thing. Definitely only after 1-2-3.
    An addition/question to point 2.
    I don't know which functionality you use to detect and afterwards decompress the zlib streams, but in theory there should not be any hard memory limit one could hit, because zlib has such low requirements.
    Streaming decompression/compression should be available in zlib.
    The only bottleneck could be the detection: if you go through the whole possible zlib stream to be sure it really is one, you would then need to seek back in the file.
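
    For reference, this is roughly what streaming decompression with zlib and fixed-size buffers looks like (a generic sketch modelled on zlib's zpipe example, not AntiZ code); memory use stays at two small buffers no matter how large the stream is:
    Code:
    // Generic sketch of streaming zlib decompression: inflate() is called
    // repeatedly on small chunks, so memory usage does not grow with the
    // size of the stream.
    #include <zlib.h>
    #include <cstdio>

    int inflate_stream(std::FILE* in, std::FILE* out)
    {
        unsigned char ibuf[1 << 16], obuf[1 << 16];
        z_stream s{};
        if (inflateInit(&s) != Z_OK)
            return Z_ERRNO;
        int ret = Z_OK;
        do {
            s.avail_in = static_cast<uInt>(std::fread(ibuf, 1, sizeof ibuf, in));
            if (s.avail_in == 0)
                break;                        // end of input
            s.next_in = ibuf;
            do {                              // drain this input chunk
                s.next_out  = obuf;
                s.avail_out = sizeof obuf;
                ret = inflate(&s, Z_NO_FLUSH);
                if (ret == Z_STREAM_ERROR || ret == Z_DATA_ERROR ||
                    ret == Z_MEM_ERROR) {
                    inflateEnd(&s);
                    return ret;
                }
                std::fwrite(obuf, 1, sizeof obuf - s.avail_out, out);
            } while (s.avail_out == 0);
        } while (ret != Z_STREAM_END);
        inflateEnd(&s);
        return ret == Z_STREAM_END ? Z_OK : Z_DATA_ERROR;
    }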

  31. #28
    Member Skymmer's Avatar
    Join Date
    Mar 2009
    Location
    Russia
    Posts
    681
    Thanks
    38
    Thanked 168 Times in 84 Posts
    20150518 binaries for both x86 and x64
    Attached Files

  32. Thanks (3):

    comp1 (19th May 2015),Diazonium (19th May 2015),Gonzalo (19th May 2015)

  33. #29
    Member
    Join Date
    May 2015
    Location
    Hungary
    Posts
    25
    Thanks
    4
    Thanked 32 Times in 8 Posts
    Quote Originally Posted by Skymmer View Post
    20150518 binaries for both x86 and x64
    Wow, how did you manage to build a 32-bit version? The offsets in phases 4-5 are all hand-crafted and assume 8-byte ints, so I thought building a 32-bit version would break everything. Did you make any modifications to the source code for the 32-bit build?

  34. #30
    Member
    Join Date
    May 2015
    Location
    Hungary
    Posts
    25
    Thanks
    4
    Thanked 32 Times in 8 Posts
    Quote Originally Posted by Simon Berger View Post
    An addition/question to point 2.
    I don't know which functionality you use to detect and afterwards decompress the zlib streams, but in theory there should not be any hard memory limit one could hit, because zlib has such low requirements.
    Streaming decompression/compression should be available in zlib.
    The only bottleneck could be the detection: if you go through the whole possible zlib stream to be sure it really is one, you would then need to seek back in the file.
    Currently the entire input file is read into memory; then the streams are individually decompressed into memory (but only one at any given time) and recompressed into memory. So the worst case is a large file with only one big zlib stream in it that expands a lot when decompressed. In that case the memory usage is many times larger than the input file.


