
Thread: precomp - further compress already compressed files

  1. #61
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    536
    Thanks
    237
    Thanked 90 Times in 70 Posts
    Yes, I kind of get lost in precomp.cpp; I always thought it was because I'm not really a programmer. I made a few cosmetic changes a while ago (I believe it was the ratio and speed report after completion), but I never got them quite the way I wanted, so I didn't share them.

    Anyway, I wouldn't do the log parser thing because I believe it would slow things down way too much. I'd prefer to take my chances with precomp itself.

  2. #62
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    612
    Thanks
    250
    Thanked 240 Times in 119 Posts
    Quote Originally Posted by Gonzalo View Post
    What do you think about this, Christian?
    One thing that I wanted to do soon was some kind of "extract" switch that dumps all the data from the original file unprocessed (so it would create numbered .png, .jpg, .zip, .gz, .mp3... files). The original intent would be to support file analysis by extracting all these streams. This idea could be extended by additionally creating an index file and a file for non-parsed segments as you described and a way to reconstruct the original file ("reassemble").

    After that first step, the files could be sorted and put in a container - a second pass of "precomp -cn" would decompress all the data - and finally, everything could be compressed. This would be what you described if I understood it correctly.
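    For illustration, a rough sketch of what one record of such an index file could look like (hypothetical layout and field names, not an actual Precomp format):
    Code:
    #include <cstdint>
    #include <fstream>
    #include <vector>

    // One entry of a hypothetical extraction index: where the stream sat in the
    // original file, how long it was, what it was detected as, and which numbered
    // dump file it was extracted to. "Reassemble" would walk these records in
    // offset order, copying either the dump file or the gap bytes between records.
    #pragma pack(push, 1)
    struct IndexRecord {
        uint64_t original_offset;   // byte offset of the stream in the source file
        uint64_t original_length;   // length of the stream in the source file
        char     type[4];           // e.g. "PNG ", "JPG ", "GZ  ", "RAW "
        uint32_t dump_number;       // 000001.png, 000002.zip, ...
    };
    #pragma pack(pop)

    void write_index(const char* path, const std::vector<IndexRecord>& records) {
        std::ofstream out(path, std::ios::binary);
        for (const IndexRecord& r : records)
            out.write(reinterpret_cast<const char*>(&r), sizeof(r));
    }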

    I'll prioritize this a bit higher than before as it shouldn't take too long to implement and would be useful for some people.
    http://schnaader.info
    Damn kids. They're all alike.

  3. Thanks (2):

    Gonzalo (17th December 2019),Shelwien (17th December 2019)

  4. #63
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    536
    Thanks
    237
    Thanked 90 Times in 70 Posts
    Quote Originally Posted by schnaader View Post
    This would be what you described if I understood it correctly.
    Well, yes and no. I really, really like that. I was going to propose an extraction mode later and it could be truly useful, indeed!

    What I did propose earlier was that after precomp processes a stream in a file, it should write the recompressed stream ({pmp, pjg, raw, etc.}) to a separate position.ext instead of appending it to file.pcf, and write all the metainfo needed for lossless restoration to an index file. So the idea was to have on disk, for example, the txt contents of a pdf stream, not the deflate-compressed data. There could be just a few files, one per kind of data, with all the corresponding streams in them.

    But I guess what you propose is even better. The drawback there is how to deal with false positives... We don't really know whether a stream is what it seems until we actually try to recompress it (and either succeed or fail)...
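    A minimal sketch of the kind of quick validity probe that can weed out most false positives before committing to a full recompression attempt (assuming raw deflate streams and using zlib's inflate; this is only an illustration, not Precomp's actual detection code):
    Code:
    #include <cstddef>
    #include <cstring>
    #include <zlib.h>

    // Returns true if the given bytes start a syntactically valid raw deflate
    // stream (windowBits = -15). A positive result is still only a hint: to be
    // sure the stream can be restored bit-identically, it has to be decompressed
    // completely, recompressed, and compared against the original bytes.
    bool looks_like_raw_deflate(const unsigned char* data, size_t len) {
        z_stream strm;
        std::memset(&strm, 0, sizeof(strm));
        if (inflateInit2(&strm, -15) != Z_OK) return false;

        unsigned char out[4096];
        strm.next_in   = const_cast<unsigned char*>(data);
        strm.avail_in  = static_cast<uInt>(len);
        strm.next_out  = out;
        strm.avail_out = sizeof(out);

        int ret = inflate(&strm, Z_NO_FLUSH);  // decode at most one output buffer
        inflateEnd(&strm);
        return ret == Z_OK || ret == Z_STREAM_END;
    }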

  5. #64
    Member
    Join Date
    Jun 2018
    Location
    Yugoslavia
    Posts
    54
    Thanks
    8
    Thanked 3 Times in 3 Posts
    I have started 'reiso.pm' and 're_arc.pm', which work that way. I wanted to do it for all the other formats too, but other people have the same idea.
    I also started 'reorder.pm', which tries to group similar files.


    Quote Originally Posted by Gonzalo View Post
    I was thinking about a rather naive way to improve precomp effectiveness... I'm sure somebody has thought about it before; I'm just sharing it to find out whether it could be done or whether it's a bad idea.

    The possibility of rearranging data inside the .PCFs to group similar streams, and in doing so improve compression, has been raised before. Couldn't it be simpler to output every stream as a separate file with a guessed extension, like '.ari' for incompressible streams, '.txt' for text, '.bmp' for bitmaps and '.bin' for everything else? Then any modern archiver would take care of the grouping and maybe codec selection.

    An alternative (so as not to write a million little files to disk) would be to output a few big TXT, BIN, and so on, with all the respective streams concatenated, plus an index.pcf containing the metadata needed for reconstruction.

    What do you think about it?

  6. #65
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    536
    Thanks
    237
    Thanked 90 Times in 70 Posts
    Quote Originally Posted by pklat View Post
    I have started 'reiso.pm' and 're_arc.pm', which work that way. I wanted to do it for all the other formats too, but other people have the same idea.
    I also started 'reorder.pm', which tries to group similar files.
    That's a good idea there! If you'll allow me, here are a few comments about it:

    * Unless we're working with a very small dictionary, modern LZ compressors can find similarities across vast distances.
    * There are also deduplication programs, which can cancel out any big similarities very quickly.
    * If you still want to reorder the file list, maybe you could try some tricks to speed it up. For example:
    ## find any duplicate files first (same size, same hash),
    ## sort by extension and size, and maybe by TrID-guessed type,
    ## make a quick histogram to get a rough idea of the entropy of each file, etc. (see the sketch below)
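    A minimal sketch of such a byte-histogram entropy estimate (plain Shannon entropy over byte frequencies; just an illustration of the idea, nothing taken from precomp):
    Code:
    #include <cmath>
    #include <cstdint>
    #include <fstream>
    #include <vector>

    // Rough entropy estimate in bits per byte, computed from a simple byte
    // histogram. Files scoring close to 8.0 are probably already compressed or
    // encrypted and can be grouped together (or just stored); low scores hint
    // at text-like or otherwise well-compressible data.
    double entropy_bits_per_byte(const char* path) {
        std::ifstream f(path, std::ios::binary);
        std::vector<uint64_t> hist(256, 0);
        char buf[65536];
        uint64_t total = 0;
        while (f.read(buf, sizeof(buf)) || f.gcount() > 0) {
            for (std::streamsize i = 0; i < f.gcount(); ++i)
                ++hist[static_cast<unsigned char>(buf[i])];
            total += static_cast<uint64_t>(f.gcount());
        }
        if (total == 0) return 0.0;
        double h = 0.0;
        for (uint64_t c : hist) {
            if (c == 0) continue;
            double p = static_cast<double>(c) / static_cast<double>(total);
            h -= p * std::log2(p);
        }
        return h;
    }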


    What I believe is even better for improving compression is grouping *types* of files together and using a different algorithm on each. FreeArc tried to do that based on a manually curated list of file extensions, and it wasn't bad. Some other tools do that automatically too; that's why I suggested trying to analyse the type of each stream and assigning a fitting extension, or maybe concatenating them into one big file per type. But that is a very difficult task and I don't know of any successful attempts. I haven't had much luck with the 'file' utility on Linux or with TrID, but TrIDScan seems promising enough. I'll try to make a little time to test it on precomp-ed files. @christian: the extraction feature would be really useful for that.
    Last edited by Gonzalo; 18th December 2019 at 01:14.

  7. #66
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    612
    Thanks
    250
    Thanked 240 Times in 119 Posts
    After some fiddling with the AppVeyor configuration, there's an x64 Windows binary for each automated build now. I might also switch the automated builds from Travis and AppVeyor to AppVeyor only, as it now supports Linux/macOS builds.
    http://schnaader.info
    Damn kids. They're all alike.

  8. Thanks (2):

    hmdou (23rd March 2020),moisesmcardona (18th March 2020)

  9. #67
    Member
    Join Date
    Feb 2017
    Location
    none
    Posts
    25
    Thanks
    6
    Thanked 13 Times in 6 Posts
    Hi! I compiled the latest Git version on Mint 19.3 and it is more than 50% slower than this compiled Windows version.

    Example:
    Using Wine and precomp048cl80.exe, a file with a lot of heavily compressed PNGs and zlib compression (339 MB) took 7 minutes to recover from the .pcf to the original data, with no errors and CRC OK; the precomp for Linux (which I compiled myself) took more than 14 minutes on the same data.

    How can I make the Linux version as fast as that compiled Windows version?

    And by the way, compiling precomp I got some warnings; here is the log.

    Could this be related to the speed?
    -- Looking for pthread_create
    -- Looking for pthread_create - not found
    /precomp-cpp$ mkdir build
    :~/precomp-cpp$ cd build
    :~/precomp-cpp/build$ cmake ..
    -- The C compiler identification is GNU 7.5.0
    -- The CXX compiler identification is GNU 7.5.0
    -- Check for working C compiler: /usr/bin/cc
    -- Check for working C compiler: /usr/bin/cc -- works
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Detecting C compile features
    -- Detecting C compile features - done
    -- Check for working CXX compiler: /usr/bin/c++
    -- Check for working CXX compiler: /usr/bin/c++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Looking for pthread.h
    -- Looking for pthread.h - found
    -- Looking for pthread_create
    -- Looking for pthread_create - not found
    -- Check if compiler accepts -pthread
    -- Check if compiler accepts -pthread - yes
    -- Found Threads: TRUE
    -- Configuring done
    -- Generating done
    -- Build files have been written to: /home/test/precomp-cpp/build
    :~/precomp-cpp/build$ make
    Scanning dependencies of target precomp
    [ 0%] Building C object CMakeFiles/precomp.dir/contrib/giflib/gifalloc.c.o
    [ 1%] Building C object CMakeFiles/precomp.dir/contrib/giflib/gif_err.c.o
    [ 48%] Building C object CMakeFiles/precomp.dir/contrib/liblzma/rangecoder/price_table.c.o
    In file included from /home/test/precomp-cpp/contrib/liblzma/rangecoder/price_table.c:3:0:
    /home/test/precomp-cpp/contrib/liblzma/rangecoder/range_encoder.h: In function ‘rc_encode’:
    /home/test/precomp-cpp/contrib/liblzma/rangecoder/range_encoder.h:153:2: warning: implicit declaration of function ‘assert’ [-Wimplicit-function-declaration]
    assert(rc->count <= RC_SYMBOLS_MAX);
    ^~~~~~
    [ 99%] Building CXX object CMakeFiles/precomp.dir/precomp.cpp.o
    /home/test/precomp-cpp/precomp.cpp: In function ‘recompress_deflate_result try_recompression_deflate(FILE*)’:
    /home/test/precomp-cpp/precomp.cpp:3233:69: warning: format ‘%d’ expects argument of type ‘int’, but argument 4 has type ‘size_t {aka long unsigned int}’ [-Wformat=]
    snprintf(namebuf, 49, "preflate_error_%04d.raw", counter++);
    ~~~~~~~~~^
    [100%] Linking CXX executable precomp
    [100%] Built target precomp

  10. #68
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    612
    Thanks
    250
    Thanked 240 Times in 119 Posts
    Quote Originally Posted by redrabbit View Post
    Hi! I compiled the latest Git version on Mint 19.3 and it is more than 50% slower than this compiled Windows version.

    [...]

    How can I make the Linux version as fast as that compiled Windows version?
    You most likely ran CMake without parameters, which unfortunately builds a debug version (larger executable and slower). Try deleting the CMakeCache file, the Makefile and the build directories and re-running with "-DCMAKE_BUILD_TYPE=Release" as a parameter. Sorry for the inconvenience, I'll add a "How to build" section to the README.md. Also, I'll have a look at other CMake files because I think there's a way to make Release the default configuration.

    Quote Originally Posted by redrabbit View Post
    And by the way, compiling precomp I got some warnings; here is the log.
    At the moment, none of the warnings is really critical (in terms of performance or wrong behaviour). I try to keep them low, but as each compiler gives different ones, this is kind of a tedious task. I think pthread_create is not needed; if Precomp detects multiple threads by default, the multi-threaded parts (preflate, JPG recompression) should work fine.
    http://schnaader.info
    Damn kids. They're all alike.

  11. Thanks:

    redrabbit (8th May 2020)

  12. #69
    Member
    Join Date
    Feb 2017
    Location
    none
    Posts
    25
    Thanks
    6
    Thanked 13 Times in 6 Posts
    Thanks for the tip, I compiled with this parameter and now it is much faster (from 14 minutes down to 7 minutes), BUT it is still slower than the Windows version running under Wine when restoring a big .pcf file (4.9 GB back to 322.3 MB):

    precomp git linux (x86_64) using "-DCMAKE_BUILD_TYPE=Release"
    Time: 6 minute(s), 59 second(s)

    precomp048cl80.exe (32 bits) with wine
    Time: 5 minute(s), 59 second(s)

  13. #70
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,909
    Thanks
    291
    Thanked 1,271 Times in 718 Posts
    That's probably my build, like here: https://encode.su/threads/3076-Preco...ll=1#post59388
    Clang seems to produce faster binaries for precomp, plus I used some additional compiler options like
    "-O3 -march=k8 -mtune=k8 -fomit-frame-pointer -fno-stack-protector -fno-stack-check"

  14. Thanks (2):

    redrabbit (8th May 2020),schnaader (8th May 2020)

  15. #71
    Member
    Join Date
    Feb 2017
    Location
    none
    Posts
    25
    Thanks
    6
    Thanked 13 Times in 6 Posts
    Thanks for the tip, Shelwien. I did some tests to find the best optimization parameters for precomp and this is what I got:

    export CC=/usr/bin/clang
    export CXX=/usr/bin/clang++
    cmake .. "-DCMAKE_BUILD_TYPE=Release" "-O3 -march=k8 -mtune=k8 -fomit-frame-pointer -fno-stack-protector -fno-stack-check"
    Done
    Time: 7 minute(s), 32 second(s)
    ---
    export CC=/usr/bin/clang
    export CXX=/usr/bin/clang++
    cmake .. "-DCMAKE_BUILD_TYPE=Release"
    Done
    Time: 7 minute(s), 32 second(s)
    --
    export CC=/usr/bin/gcc
    export CXX=/usr/bin/c++
    cmake .. "-DCMAKE_BUILD_TYPE=Release" "-O3 -march=k8 -mtune=k8 -fomit-frame-pointer -fno-stack-protector -fno-stack-check"
    Done.
    Time: 6 minute(s), 26 second(s)
    ---
    export CC=/usr/bin/gcc
    export CXX=/usr/bin/c++
    cmake .. "-DCMAKE_BUILD_TYPE=Release" "-O3 -mtune=intel -fomit-frame-pointer -fno-stack-protector -fno-stack-check"
    Done.
    Time: 6 minute(s), 44 second(s)
    ---
    export CC=/usr/bin/gcc
    export CXX=/usr/bin/c++
    cmake .. "-DCMAKE_BUILD_TYPE=Release" "-O3 -march=native -fomit-frame-pointer -fno-stack-protector -fno-stack-check"
    Time: 6 minute(s), 19 second(s)
    --
    export CC=/usr/bin/gcc
    export CXX=/usr/bin/c++
    cmake .. "-DCMAKE_BUILD_TYPE=Release" "-O3 -mtune=native -march=native -fomit-frame-pointer -fno-stack-protector -fno-stack-check"
    Done.
    Time: 5 minute(s), 52 second(s)
    ---
    export CC=/usr/bin/gcc
    export CXX=/usr/bin/c++
    cmake .. "-DCMAKE_BUILD_TYPE=Release" "-O3 -mtune=native -march=native -fomit-frame-pointer -fno-stack-protector -fno-stack-check -fomit-stack-pointer"
    Done.
    Time: 5 minute(s), 35 second(s)
    --
    export CC=/usr/bin/gcc
    export CXX=/usr/bin/c++
    cmake .. "-DCMAKE_BUILD_TYPE=Release" "-Ofast -mtune=native -march=native -fomit-frame-pointer -fno-stack-protector -fno-stack-check -fomit-stack-pointer"
    Done.
    Time: 5 minute(s), 45 second(s)
    I attach the compiled version for Linux x86_64
    Attached Files

  16. Thanks (3):

    Mike (8th May 2020),moisesmcardona (9th May 2020),schnaader (8th May 2020)

  17. #72
    Member
    Join Date
    Dec 2008
    Location
    Poland, Warsaw
    Posts
    1,151
    Thanks
    703
    Thanked 455 Times in 352 Posts
    @schnaader - I have two questions:
    1) Could you compile a Windows version of 0.4.8?
    2) I've tried to use precomp on the reymont pdf file from the Silesia corpus but there's no result - is it some issue with the file type, or will precomp not help in this case?

    OK, maybe I'm silly (I'm not a programmer), but I've checked the file in text viewers and it mostly contains text (in Polish) split up by some strange numbers in the middle of words...
    Of course there are also Polish special characters (ł, ś, ń, ó etc.), but the strange splits also occur in words without these characters. Example:

    PDF text:

    Td[({)-412(Gran)29(ula,)-412(b)1(iedoto,)-412(gr)1(a)-1(n)29(ula!)

    As a pure text should be:

    Granula, biedoto, granula!

    Of course, as I understand it, some characters like spaces have a -412 number, but why are there 1 or -1 figures in the middle of a word, and is it possible to separate them from the text?
    Of course I know that these numbers are necessary for the file and have to be put back before decompression, but if it were possible to separate out all these numbers, and maybe remove the brackets, then using a Polish dictionary it should be possible to compress this file to a better ratio than is currently achieved.

    I've attached the file for checking.
    Attached Files

  18. #73
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    612
    Thanks
    250
    Thanked 240 Times in 119 Posts
    Quote Originally Posted by Darek View Post
    @schnaader - I have two questions:
    1) Could you compile a Windows version of 0.4.8?
    2) I've tried to use precomp on the reymont pdf file from the Silesia corpus but there's no result - is it some issue with the file type, or will precomp not help in this case?
    Quick and short answer, might post a more elaborate version later.

    1) I recently made the AppVeyor build agent automatically create Windows binaries for each commit, so-called "artifacts". The latest for 0.4.8dev can be downloaded here: https://ci.appveyor.com/project/schn...0297/artifacts

    2) reymont is an uncompressed PDF; it doesn't contain any deflate streams like other PDFs do, so there's nothing for Precomp to do. The numbers and separated words you noticed are related to the font kerning and are quite common in PDF files. The PDF optimizations of the latest paq8px versions merge the word parts and separate them from the kerning to compress this better. But I have not found a way to make use of this in Precomp yet, as this transform doesn't work well without paq-like arithmetic coding, and Precomp uses LZMA2.
    http://schnaader.info
    Damn kids. They're all alike.

  19. Thanks:

    Darek (17th May 2020)

  20. #74
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,909
    Thanks
    291
    Thanked 1,271 Times in 718 Posts
    I posted a simple script in the other thread - https://encode.su/threads/1464-Paq8p...ll=1#post64974
    Maybe something along that line?
    This specific script doesn't actually seem to improve compression, but the idea?

  21. #75
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    612
    Thanks
    250
    Thanked 240 Times in 119 Posts
    Quote Originally Posted by Shelwien View Post
    Maybe something along that line?
    This specific script doesn't actually seem to improve compression, but the idea?
    Yes, this is the basic way to go. I did some similar experiments last year, but cancelled them because, when testing with different types of PDF, there were some problems:

    1. Different PDFs and fonts use different types of kerning (e.g. sometimes words aren't fragmented as much, sometimes the numbers are floats and so on), so the strategy has to be quite adaptive across different PDFs, sometimes even inside the same PDF. This only increases complexity, though; it's not a showstopper.
    2. Compression can only be improved when the overhead from the transform is low. This is a bigger problem: the only quick solution I found was to split the PDF content into 3 separate streams - the words, the kerning numbers, and some metadata telling when to insert the kerning values (that is, how many characters until the next kerning number has to be inserted) - and this can generate overhead that was hard to win back through improved compression (see the sketch after this list).
    3. Compression can only be improved if the kerning doesn't correlate too much with the text. This was the biggest problem and can be observed in reymont. The document often contains strings like "n)29(u" where the kerning is directly correlated with the previous and the next character - which isn't much of a surprise when you think about it, but it makes ")29(" harder for LZMA2 to predict when it is not interleaved with the words.
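    To make point 2 concrete, here is a toy sketch of that three-stream split for a single text-show array like the one Darek posted (illustration only - it ignores escaped parentheses, hex strings and other PDF syntax corner cases):
    Code:
    #include <cctype>
    #include <cstddef>
    #include <string>
    #include <vector>

    // Splits a PDF text-show array like "[({)-412(Gran)29(ula,)-412(b)1(iedoto,)]"
    // into three streams: the concatenated text, the kerning numbers, and some
    // metadata saying how many text characters precede each kerning number
    // (which is what the inverse transform needs to re-interleave them).
    struct KerningSplit {
        std::string text;                  // "{Granula,biedoto,..."
        std::vector<int> kerning;          // -412, 29, -412, 1, ...
        std::vector<size_t> chars_before;  // characters emitted before each number
    };

    KerningSplit split_text_show_array(const std::string& a) {
        KerningSplit out;
        size_t emitted = 0;
        for (size_t i = 0; i < a.size(); ++i) {
            if (a[i] == '(') {                                    // literal string part
                ++i;
                while (i < a.size() && a[i] != ')') {
                    out.text += a[i++];
                    ++emitted;
                }
            } else if (a[i] == '-' || std::isdigit(static_cast<unsigned char>(a[i]))) {
                size_t j = i + 1;                                 // kerning number
                while (j < a.size() && std::isdigit(static_cast<unsigned char>(a[j]))) ++j;
                out.kerning.push_back(std::stoi(a.substr(i, j - i)));
                out.chars_before.push_back(emitted);
                i = j - 1;
            }                                                     // '[', ']' etc. are skipped
        }
        return out;
    }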

    All of these 3 points interact with each other, e.g. on some PDF data point 3 won't be a problem, on other data it will, so the strategy from point 2 has to be adapted, further increasing the complexity from point 1.

    That's why I reacted so positively when paq8px_v182 introduced the PDF WordModel changes. I didn't have time so far to look at how it works in paq8px, though.

    By the way, also note the xref table mention there - this is a much easier optimization that can be done. It can be applied to reymont and is not included in paq8px so far. Basically, compress the PDF without the xref table and reconstruct it from the data, along with some additional information. This one might actually find its way into one of the next Precomp versions.
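    A minimal sketch of that xref reconstruction, assuming a classic single-section table, generation number 0 everywhere, and object offsets recorded while parsing (so only the offsets, not the 20-byte entries themselves, would have to be kept as additional information):
    Code:
    #include <cstdio>
    #include <string>
    #include <vector>

    // Rebuilds a classic single-section PDF xref table from recorded object
    // offsets. offsets[i] is the byte offset of object number i; entry 0 is the
    // conventional head of the free list. Each entry is exactly 20 bytes long.
    std::string rebuild_xref(const std::vector<long>& offsets) {
        std::string xref = "xref\n0 " + std::to_string(offsets.size()) + "\n";
        xref += "0000000000 65535 f \n";                 // object 0: free-list head
        char entry[32];
        for (size_t i = 1; i < offsets.size(); ++i) {
            std::snprintf(entry, sizeof(entry), "%010ld 00000 n \n", offsets[i]);
            xref += entry;
        }
        return xref;
    }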

    Conclusion: I think it's possible to compress PDF text content better in Precomp similar to what is done in paq8px, but it will need quite some research and work.
    Last edited by schnaader; 17th May 2020 at 17:14.
    http://schnaader.info
    Damn kids. They're all alike.

  22. Thanks:

    Shelwien (17th May 2020)

  23. #76
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,909
    Thanks
    291
    Thanked 1,271 Times in 718 Posts

  24. Thanks:

    schnaader (17th May 2020)

  25. #77
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    612
    Thanks
    250
    Thanked 240 Times in 119 Posts
    Quote Originally Posted by Shelwien View Post
    Nice!
    GPL-2.0 can't be integrated into Precomp (Apache license), though, so dual-licensing, switching to LGPL or permission would be needed.

    Integration might be useful for at least four things:
    1) Applying cdm to the LZMA2 stream generated by Precomp.
    2) Applying cdm to compressed substreams like the arithmetic coded output from PackJPG and PackMP3 or the output of Brunsli+Brotli.
    3) Applying cdm to unprocessed parts of the original file that have high entropy (can also be done content dependent, e.g. when detecting things like Ogg Vorbis streams, video or certain archive formats)
    4) Applying cdm to the reconstruction data that preflate generates.

    Anything else I'm missing (or any of those four that likely won't work)?
    Last edited by schnaader; 17th May 2020 at 17:25.
    http://schnaader.info
    Damn kids. They're all alike.

  26. #78
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,909
    Thanks
    291
    Thanked 1,271 Times in 718 Posts
    > GPL-2.0 can't be integrated into Precomp (Apache license), though,
    > so dual-licensing, switching to LGPL or permission would be needed.

    Switched to LGPL for now, but I don't really like the idea of commercial use for this.

    > 1) Applying cdm to the LZMA2 stream generated by Precomp.

    Yes, LZMA2 tends to leave large compressible chunks within stored blocks.
    LZMA2 stream is rather easy to parse (see http://nishi.dreamhosters.com/u/lzma2_det_v0.rar),
    so it might be a good idea to split it to headers/compressed/stored streams.
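    A rough sketch of that kind of split, assuming the documented LZMA2 chunk layout (control byte, big-endian sizes, optional props byte). It only classifies bytes into the three streams; real code would also have to emit them and cope with truncated input:
    Code:
    #include <cstddef>
    #include <cstdint>

    // Walks LZMA2 chunks and counts how many bytes belong to chunk headers,
    // LZMA-compressed payload and stored (uncompressed) payload, so that the
    // three could be routed into separate output streams. Returns false on an
    // invalid control byte or if the buffer ends in the middle of a chunk.
    bool split_lzma2(const uint8_t* p, size_t n,
                     size_t& hdr, size_t& packed, size_t& stored) {
        hdr = packed = stored = 0;
        size_t i = 0;
        while (i < n) {
            uint8_t ctrl = p[i];
            if (ctrl == 0x00) { ++hdr; return true; }        // end-of-stream marker
            if (ctrl == 0x01 || ctrl == 0x02) {              // stored chunk
                if (i + 3 > n) return false;
                size_t size = (((size_t)p[i + 1] << 8) | p[i + 2]) + 1;
                hdr += 3; i += 3;
                if (i + size > n) return false;
                stored += size; i += size;
            } else if (ctrl >= 0x80) {                       // LZMA chunk
                size_t h = (((ctrl >> 5) & 3) >= 2) ? 6 : 5; // +1 byte for new props
                if (i + h > n) return false;
                size_t csize = (((size_t)p[i + 3] << 8) | p[i + 4]) + 1;
                hdr += h; i += h;
                if (i + csize > n) return false;
                packed += csize; i += csize;
            } else {
                return false;                                // 0x03..0x7F is invalid
            }
        }
        return true;    // no end marker seen, but consistent so far
    }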

    But I don't like lzma2 - I think it's better to use lzma with self-made MT.
    Unlike LZMA2, LZMA supports lc8, and MT compression in LZMA2 is poorly made.
    Maybe I can post my MT version of lzma (plzma c0) if you're interested...
    test version is here: http://nishi.dreamhosters.com/u/plzma4_stdio_v0.rar
    (params are n_threads chunksize winlog)

    > 2) Applying cdm to compressed substreams like the arithmetic coded output
    > from PackJPG and PackMP3 or the output of Brunsli+Brotli.

    Possible, but probably not worth it; solid arithmetic code usually won't be compressible with cdm.
    At least I just tested A10.pjg and A10.brn and it's not.
    We might be able to get some effect by attaching mod_SSE though.

    Btw, did you see jojpeg? http://nishi.dreamhosters.com/u/jojpeg_sh3.rar

    > 3) Applying cdm to unprocessed parts of the original file that have high entropy.

    For LZMA2 stored blocks it could make sense, I suppose.
    But CDM doesn't have an internal matchfinder, so at least a dedup filter is necessary -
    ideally actual LZ without entropy coding.

    > 4) Applying cdm to the reconstruction data that preflate generates.

    Tested with book1__kzip.raw - not compressible.
    Some specific streams might be, but from reflate I know that it's actually a better
    idea to compress the diffs with LZ.

    > Anything else I'm missing?

    https://encode.su/threads/2742-Compr...ll=1#post52493

    cdm is more helpful for known bitcode formats without a specific handler
    (or ones with a handler that failed to recompress them normally).
    Also it's better to use it after a bitwise-BWT transformation (or some other method of sorting bits by context).

  27. Thanks:

    Mike (17th May 2020)

  28. #79
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    612
    Thanks
    250
    Thanked 240 Times in 119 Posts
    Quote Originally Posted by Shelwien View Post
    Switched to LGPL for now, but I don't really like the idea of commercial use for this.
    OK, feel free to make further changes until you're content with the license model. I don't want to integrate anything the author has mixed feelings about. Another possibility would be going the other way round - creating an additional GPL-licensed fork of Precomp that doesn't allow commercial use (and, on the other hand, being able to integrate useful GPL-licensed libraries). But there's so much on my todo list for Precomp right now that at least for the next two months I won't do more than internal testing with cdm anyway.

    Quote Originally Posted by Shelwien View Post
    But I don't like lzma2 - I think it's better to use lzma with self-made MT.
    Unlike LZMA2, LZMA supports lc8, and MT compression in LZMA2 is poorly made.
    Yes, I'm not happy with the current LZMA2 implementation either; another thing to work on. Gonzalo suggested fastlzma2 as an alternative and I'm still thinking about adding zstd as a fast option. On the other hand, people can use whatever dedup/compression they like when using "-cn" (only decompress), so it might be smarter to improve streaming support first to make things like "tar --to-stdout *.* | precomp -cn | some_dedupe | some_compressor" work.

    Quote Originally Posted by Shelwien View Post
    Btw, did you see jojpeg? http://nishi.dreamhosters.com/u/jojpeg_sh3.rar
    Seen and noticed, yes. Also tested it a few times. For Precomp, I guess I'll stick with packJPG and brunsli for now, though.

    Quote Originally Posted by Shelwien View Post
    For LZMA2 stored blocks it could make sense, I suppose.
    But CDM doesn't have an internal matchfinder, so at least a dedup filter is necessary -
    ideally actual LZ without entropy coding.
    Yeah, dedup is another thing that will have to be implemented in Precomp in the next versions, as it would give better/faster results than any external dedup, especially by removing the need to process identical compressed data twice in Precomp.
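    A minimal sketch of that kind of stream-level dedup (hash every detected stream and only process the first occurrence; the names are hypothetical, and a real implementation would have to verify matches byte-wise to guard against hash collisions):
    Code:
    #include <cstddef>
    #include <cstdint>
    #include <unordered_map>

    // FNV-1a hash over a detected stream's raw bytes.
    static uint64_t fnv1a(const uint8_t* data, size_t len) {
        uint64_t h = 1469598103934665603ULL;
        for (size_t i = 0; i < len; ++i) {
            h ^= data[i];
            h *= 1099511628211ULL;
        }
        return h;
    }

    // Maps stream hash -> offset of the first occurrence in the input file.
    // If a stream was seen before, the caller can emit a short back-reference
    // instead of running detection/recompression on it a second time.
    class StreamDedup {
    public:
        // Returns the offset of an identical earlier stream, or -1 if it is new.
        long long check_and_insert(const uint8_t* data, size_t len, long long offset) {
            uint64_t h = fnv1a(data, len);
            auto it = seen_.find(h);
            if (it != seen_.end()) return it->second;
            seen_.emplace(h, offset);
            return -1;
        }
    private:
        std::unordered_map<uint64_t, long long> seen_;
    };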

    Quote Originally Posted by Shelwien View Post
    cdm is more helpful for known bitcode formats without a specific handler
    (or ones with a handler that failed to recompress them normally).
    Also it's better to use it after a bitwise-BWT transformation (or some other method of sorting bits by context).
    Thanks, that confirms my assumptions and also fits with my plan to do more parsing in Precomp that detects more formats even if they can't be processed - they might become useful later (or let the user know that they exist in the file).
    http://schnaader.info
    Damn kids. They're all alike.

  29. Thanks:

    Mike (17th May 2020)

  30. #80
    Member
    Join Date
    Feb 2017
    Location
    none
    Posts
    25
    Thanks
    6
    Thanked 13 Times in 6 Posts
    After several tests and compilations, here is what I have; I compared with Xtool 0.9.

    I tested files from two new games.

    LIFE IS STRANGE 2 [UNREAL ENGINE 4.16]
    File: lis2-windowsnoeditor-bulk.pak 603.5 MB
    You can download the file for testing here

    ./precomp048.x86_64.17052020.r2.bin -cn -intense0 -d0
    New size: 1499226364 instead of 632785771
    Time: 5 minute(s), 58 second(s)
    Recompressed streams: 23197/23198
    GZip streams: 0/1
    zLib streams (intense mode): 23197/23197

    Recover: -r -d0
    Time: 3 minute(s), 53 second(s)
    xtool e:precomp:c32mb,t4:zlib lis2-windowsnoeditor-bulk.pak lis2-windowsnoeditor-bulk.unp
    New size:1497657676 instead of 632785771
    Time: 0m 46,610s


    xtool d:precomp:c32mb,t4:zlib lis2-windowsnoeditor-bulk.unp lis2-windowsnoeditor-bulk.pak
    Time: 0m 37,236s
    Well, in this case Xtool beats Precomp, 37 seconds vs 3 minutes 53 seconds - why?

    Another test, in this case with the game

    THE ETERNAL CASTLE REMASTERED [created with GAMEMAKER STUDIO], which uses a lot of highly compressed PNGs
    File: game.unx 213.3 MB
    You can download the file for testing here

    precomp048.x86_64.17052020.r2.bin -cn -intense0 -d0 game.unx
    Time: 6 minute(s), 15 second(s)
    Recompressed streams: 319/319
    PNG streams: 5/5
    PNG streams (multi): 314/314

    Recover: -r -d0
    Time: 5 minute(s), 34 second(s)

    Using Xtool 0.9 with hi2fraw and raw2hif dll's
    xtool e:precomp:c32mb,t4:zlib (using hif data)
    Time: 4m 54,092s
    New size: 2099419005 instead of 223699564


    xtool d:precomp:c32mb,t4:zlib (using hif data)
    Recover Time: 5m 58,695s
    In this case precomp was faster than xtool at recovering the data (5 minutes 34 seconds vs 5 minutes 58 seconds), but a bit slower at unpacking the data.

    I compiled precomp for Linux 32-bit and it crashes at 42.23% while unpacking the file game.unx to .pcf, with this error:
    ERROR 3: There is not enough space on disk
    I tested the Windows 32-bit version and it worked well.

    Is it possible to make precomp as fast as Xtool? 37 seconds vs 3 minutes 53 seconds is a big difference.

    I attach the precomp (Linux 64-bit) I compiled/used and the Xtool 0.9.
    Attached Files

  31. #81
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,909
    Thanks
    291
    Thanked 1,271 Times in 718 Posts
    > additional GPL licensed fork of Precomp that doesn't allow commercial use

    I'm okay with commercial use of precomp, but don't like the idea of people selling simple GUI wrappers.

    Anyway, that cdm version isn't fast enough to become popular on its own, so I'd just keep it under LGPL.

    > Gonzalo suggested fastlzma2 as an alternative

    Yes, it solves the issue with 7z's independent block compression.
    But there's still lc8, which is supported by lzma, but not lzma2.

    I think it may be better to just use lzma instead - you'd have
    enough independent streams for MT anyway, normally.

    > and I'm still think about adding zstd as a fast option.

    Yes, its certainly a good idea, since zstd has both MT and dedup
    already integrated.

    > "tar --to-stdout *.* | precomp -cn | some_dedupe | some_compressor" work.

    I made a blockwise MT wrapper for precomp executable:
    http://nishi.dreamhosters.com/rzwrap_precomp_v0a.rar

    But I had to patch precomp to stop it from asking questions and fix constant
    names of temp files.

    > For Precomp, I guess I'll stick with packJPG and brunsli for now, though.

    It makes sense to use jojpeg for sets of small jpegs, like in pdfs.
    Since jojpeg works with solid entropy coding, it might be actually faster than
    multiple calls of packjpg in that case (and provide better compression).

    Speed/ratio also can be adjusted to an extent, by enabling/disabling some contexts.
    (mod_jpeg.inc line 558)

    > removing the need to process identical compressed data twice in Precomp.

    Yes, detecting duplicate recognized streams would certainly be helpful.
    Though there's the question of partial dups then - I wonder if it's possible
    to combine recompression with partial dedup of compressed content.

    Also it makes sense to combine the dedup filter, format detector and entropy filter -
    otherwise it's necessary to have multiple instances of the dedup filter,
    which is very inefficient, since each needs its own multi-GB window buffer.

    > do more parsing in Precomp that detects more formats even if they can't be
    > processed, but they might get useful later

    I think we need some special framework for this, like a parsing optimizer.
    Unfortunately many relevant formats don't have any reliable signatures.

    For example, I made a port of rawdet to detect LZ4 instead...
    It turns out that it's a good idea to keep the detector running
    even on decodable data - after the first 64k (which is the LZ4 window size),
    any data becomes valid LZ4, and there's no way to detect end-of-stream.

    Same thing actually happens with deflate too, but mostly on fragmented files.
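    A rough sketch of the sequence-walking check such an LZ4 detector needs (raw LZ4 block format, illustration only). It also shows why detection degrades: once 'history' plus the decoded output reaches 64 KB, every 16-bit offset is legal, so the only failure left is running out of input in the middle of a sequence:
    Code:
    #include <cstddef>
    #include <cstdint>

    // Walks raw LZ4 block sequences: token, literal length (15 = extension bytes
    // follow), literals, 16-bit little-endian match offset, match length (+4,
    // same extension scheme). 'history' is how much already-decoded data precedes
    // the block; offsets may only point back into that window.
    bool plausible_lz4_block(const uint8_t* p, size_t n, size_t history) {
        size_t i = 0, out = history;            // 'out' = decodable bytes so far
        while (i < n) {
            uint8_t token = p[i++];
            size_t lit = token >> 4;
            if (lit == 15) {                    // literal length extension
                uint8_t b;
                do { if (i >= n) return false; b = p[i++]; lit += b; } while (b == 255);
            }
            if (i + lit > n) return false;      // literals must fit in the block
            i += lit; out += lit;
            if (i == n) return true;            // last sequence: literals only
            if (i + 2 > n) return false;
            size_t offset = p[i] | ((size_t)p[i + 1] << 8);
            i += 2;
            if (offset == 0 || offset > out) return false;  // once out >= 64 KB,
                                                            // this never fails
            size_t mlen = (size_t)(token & 0x0F) + 4;
            if ((token & 0x0F) == 15) {         // match length extension
                uint8_t b;
                do { if (i >= n) return false; b = p[i++]; mlen += b; } while (b == 255);
            }
            out += mlen;
        }
        return true;
    }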

  32. Thanks:

    Mike (18th May 2020)

  33. #82
    Member
    Join Date
    Feb 2017
    Location
    none
    Posts
    25
    Thanks
    6
    Thanked 13 Times in 6 Posts
    I tried to replace precomp's zlib with this zlib to see how it works:

    https://github.com/matbech/zlib


    3rd Party Patches
    Optimizations from Intel without the new deflate strategies (quick, medium)
    crc32: crc32 implementation with PCLMULQDQ optimized folding
    deflate: slide_hash_sse in fill_window
    deflate: use crc32 (SIMD) to calculate hash
    https://github.com/jtkukunas/zlib

    Optimizations from Cloudflare
    deflate: longest_match optimizations (https://github.com/cloudflare/zlib/c...7290bd5b63c65c)
    https://github.com/cloudflare/zlib

    Optimized longest_match
    https://github.com/gildor2/fast_zlib
    Adapted function to use crc32 (SIMD) to calculate hash and integrated match compare optimization from above

    Other small changes
    put_short optimization (https://github.com/Dead2/zlib-ng/com...ab35a037e9b9d0)
    https://github.com/Dead2/zlib-ng

    Optimizations for ARM
    adler32: Adenilson Cavalcanti <adenilson.cavalcanti@arm.com>
    fill_window: Mika T. Lindqvist <postmaster@raasu.org>

    adler32-simd from Chromium
    https://github.com/chromium/chromium...adler32_simd.c

    Additional changes
    Support and optimizations for MSVC15 compiler
    Support for _M_ARM64
    Use __forceinline

    Use tzcnt instead of bsf
    This improves performance for AMD CPUs

    Implementation optimized for modern CPUs (Intel Nehalem)
    Removed alignment loop in crc32
    Adds temporary in crc32_little calcuation
    Less manual unrolling

    Others
    Optimized insert_string loop

    New features
    General purpose crc32 interface
    Based on Intel's PCLMULQDQ crc32 implementation.
    New functions:
    crc32_init
    crc32_update
    crc32_final
    Brings ~200% performance improvement over the original zlib crc32 implementation
    Here is the benchmark
    https://github.com/matbech/zlib-perf...ter/Results.md

    But it failed to compile

    First I got this error:

    /precomp-cpp/contrib/zlib/match.h:246:55: error: ‘uintptr_t’ undeclared (first use in this function); did you mean ‘__intptr_t’?
    UPDATE_HASH_CRC_INTERNAL(s, hash, *(unsigned *)((uintptr_t)(&scan_end[0])));
    ^~~~~~~~~
    __intptr_t
    /precomp-cpp/contrib/zlib/match.h:246:55: note: each undeclared identifier is reported only once for each function it appears in
    /precomp-cpp/contrib/zlib/deflate.c: In function ‘deflate_slow’:
    /precomp-cpp/contrib/zlib/deflate.c:2134:33: warning: implicit declaration of function ‘min’ [-Wimplicit-function-declaration]
    uInt insert_count = min(string_count, max_insert - s->strstart);
    But I think I "solved" it by editing the file contrib/zlib/deflate.c and adding this line:
    Code:
    #include <stdint.h>
    Then I compiled again and got this other error at 100%:

    [100%] Linking CXX executable precomp
    CMakeFiles/precomp.dir/contrib/zlib/deflate.c.o: In function `fill_window_c':
    deflate.c:(.text+0x35c): undefined reference to `UPDATE_HASH'
    CMakeFiles/precomp.dir/contrib/zlib/deflate.c.o: In function `deflate_fast':
    deflate.c:(.text+0x1195): undefined reference to `UPDATE_HASH'
    CMakeFiles/precomp.dir/contrib/zlib/deflate.c.o: In function `deflate_slow':
    deflate.c:(.text+0x1ca6): undefined reference to `UPDATE_HASH'
    deflate.c:(.text+0x1e22): undefined reference to `UPDATE_HASH_CRC_INTERNAL'
    deflate.c:(.text+0x1f94): undefined reference to `min'
    deflate.c:(.text+0x224e): undefined reference to `UPDATE_HASH'
    CMakeFiles/precomp.dir/contrib/zlib/deflate.c.o: In function `deflateSetDictionary':
    deflate.c:(.text+0x2d7e): undefined reference to `UPDATE_HASH'
    collect2: error: ld returned 1 exit status
    CMakeFiles/precomp.dir/build.make:4644: recipe for target 'precomp' failed
    make[2]: *** [precomp] Error 1
    CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/precomp.dir/all' failed
    make[1]: *** [CMakeFiles/precomp.dir/all] Error 2
    Makefile:129: recipe for target 'all' failed
    make: *** [all] Error 2
    Any idea how to fix this?
    Thanks

  34. #83
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    612
    Thanks
    250
    Thanked 240 Times in 119 Posts
    @redrabbit: Just a quick post to let you know I'm currently researching the bad performance you describe on those 2 files, and some things I found out while profiling the latest Precomp version.

    Quote Originally Posted by redrabbit View Post
    I tried to replace precomp's zlib with this zlib to see how it works
    Don't waste too much time on this. When I saw your post, I wondered where zlib is still used at all, as preflate does all the deflate-related work itself. But zlib is indeed still used to speed up intense and brute mode (testing the first few bytes of a potential stream to avoid recompressing false positives).

    But profiling the latest version shows that for the Life Is Strange 2 file you posted, this only uses 0.3% of the CPU time (for .pak -> .pcf; in -r, zlib isn't used at all). So using a faster zlib library could only speed things up by 0.3%.

    On the other hand, I found something else and fixed it some minutes ago in both branches: About 5% of CPU time was wasted because uncompressed data was written to a file to prepare recursion even though both "-intense0" and "-d0" disable recursion, so the temporary file wasn't used at all. Fixed this by writing the file only if it's used. Testing this shows it works: 3 min 11 s instead of 3 min 21 s for "-cn -intense0 -d0" of the Life Is Strange 2 file. Not much, but some progress. Might have more impact on non-SSD drives.
    Last edited by schnaader; 24th May 2020 at 21:20.
    http://schnaader.info
    Damn kids. They're all alike.

  35. Thanks:

    redrabbit (25th May 2020)

  36. #84
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    612
    Thanks
    250
    Thanked 240 Times in 119 Posts
    OK, so here's the long answer.

    I could reproduce the bad performance on the Life is Strange 2 testfile; my results are in the table below. There are two things this all boils down to: preflate (vs. zlib brute force in xtool and Precomp 0.4.6) and multithreading. Note that both the Life is Strange 2 times and the decompressed size are very similar for Precomp 0.4.6 and xtool when considering the multithreading factor (computed by using the time command and dividing "user" time by "real" time). Also note that the testfile has many small streams (64 KB decompressed each); preflate doesn't seem to use its multithreading in that case.

    Although preflate can be slower than zlib brute force, it also has big advantages, which can be seen when looking at the Eternal Castle testfile. It consists of big PNG files; preflate can make use of multithreading (though not fully utilizing all cores) and is faster than the zlib brute force. And the zlib brute force doesn't even manage to recompress any of the PNG files. Xtool's decompressed size (using reflate) is somewhere between those two, most likely because reflate doesn't parse multi-PNG files and can therefore only decompress parts of them.

    So, enough explanation - how can the problem be solved? Multithreading, of course. The current branch already features multithreading for JPEG when using -r, and I'm working on it for deflate streams. When it's done, I'll post fresh results for the Life is Strange 2 testfile; they should be very close to xtool if things work out well. Multithreaded -cn or -cl is a bit more complex, though; I've got some ideas, but I have to test them and it will take longer. (A rough sketch of the per-stream worker idea follows after the results table below.)

    Code:
    Test system: Hetzner vServer CPX21: AMD Epyc, 3 cores @ 2.5 GHz, Ubuntu 20.04 64-Bit
    
    Eternal Castle testfile, 223,699,564 bytes
        program                                 decompressed size       time (decompression/recompression)  multithreading factor (decompression/recompression) compressed size (-nl)
        ---
        Precomp 0.4.8dev -cn -d0                5,179,454,907           5 min 31 s / 4 min 45 s             1.73 / 1.64                                         118,917,128
        Precomp 0.4.6 -cn -d0                     223,699,589           8 min 31 s                          1.00                                                173,364,804
        xtool (redrabbit's result)              2,099,419,005
        
    Life is Strange 2 testfile, 632,785,771 bytes
        program                                 decompressed size       time (decompression/recompression)  multithreading factor (decompression/recompression)
        ---
        Precomp 0.4.8dev -cn -intense0 -d0      1,499,226,364           3 min 21 s / 2 min 14 s             0.91 / 0.99
        Precomp 0.4.8dev (after tempfile fix)   1,499,226,364           3 min 11 s / 2 min 21 s             0.92 / 0.99
        Precomp 0.4.6 -cn -intense0 -d0         1,497,904,244           1 min 55 s / 1 min 43 s             0.93 / 0.98
        xtool 0.9 e:precomp:32mb,t4:zlib (Wine) 1,497,672,882                 46 s /       36 s             2.75 / 2.87
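    For what it's worth, a generic sketch of the per-stream worker idea mentioned above (just a skeleton with a shared work index; not Precomp's actual implementation, and ordered writing of the results is left out):
    Code:
    #include <cstddef>
    #include <functional>
    #include <mutex>
    #include <thread>
    #include <vector>

    // Recompress independent streams in parallel: a shared index into the stream
    // list is handed out under a mutex and each worker pulls the next stream.
    // This fits -r because every stream in the .pcf can be recompressed on its
    // own; writing the results back in the right order is omitted for brevity.
    void process_streams_mt(std::size_t stream_count,
                            const std::function<void(std::size_t)>& process_one,
                            unsigned threads = std::thread::hardware_concurrency()) {
        std::mutex m;
        std::size_t next = 0;
        std::vector<std::thread> pool;
        if (threads == 0) threads = 1;
        for (unsigned t = 0; t < threads; ++t) {
            pool.emplace_back([&] {
                for (;;) {
                    std::size_t i;
                    {
                        std::lock_guard<std::mutex> lock(m);
                        if (next >= stream_count) return;
                        i = next++;
                    }
                    process_one(i);   // e.g. re-deflate stream i into its own buffer
                }
            });
        }
        for (auto& th : pool) th.join();
    }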
    http://schnaader.info
    Damn kids. They're all alike.

  37. Thanks:

    redrabbit (25th May 2020)

  38. #85
    Member
    Join Date
    Feb 2017
    Location
    none
    Posts
    25
    Thanks
    6
    Thanked 13 Times in 6 Posts
    Thanks for the explanation and the testing


