Activity Stream

  • Gotty's Avatar
    Today, 01:54
    Gotty replied to a thread paq8px in Data Compression
    Well done, Eppie! That was a huge thing to do! Sorry I was away for so long (half a year!), I will gradually return.
    1864 replies | 539681 view(s)
  • Gotty's Avatar
    Today, 01:07
    Gotty replied to a thread paq8px in Data Compression
    Can't reproduce. I don't think that my system is 5x faster than yours. Could you try again? Could you also try disabling your antivirus when running timing tests?
      >>paq8px_v187.exe -8 nothing.txt
      paq8px archiver v187a (c) 2020, Matt Mahoney et al.
      Creating archive nothing.txt.paq8px187 in single file mode...
      Filename: nothing.txt (0 bytes)
      Block segmentation:
      -----------------------
      Total input size     : 0
      Total archive size   : 10
      Time 0.26 sec, used 2172 MB (2278028105 bytes) of memory
      >>paq8px_v183fix1.exe -8 nothing.txt
      paq8px archiver v183fix1 (C) 2019, Matt Mahoney et al.
      Creating archive nothing.txt.paq8px183fix1 in single file mode...
      Filename: nothing.txt (0 bytes)
      Block segmentation:
      -----------------------
      Total input size     : 0
      Total archive size   : 10
      Time 0.25 sec, used 2172 MB (2278036266 bytes) of memory
    1864 replies | 539681 view(s)
  • Gotty's Avatar
    Yesterday, 22:33
    Sorry for the long posts. I hope they are useful. It was not documented, but it is now. The thread is about Knuth's multiplicative hashing and paq8px has it "in the first place" - a proof that it is a good hash for the right use case.
    23 replies | 25605 view(s)
  • Gotty's Avatar
    Yesterday, 22:32
    ---- cut ----
    The current paq8px hash implementation has the following parts:
    We have 14 large odd prime numbers. The first one is the golden one. So Knuth's multiplicative hashing is there.
      static HASHEXPR uint64_t hashes[14] = {
        UINT64_C(0x9E3779B97F4A7C15), UINT64_C(0x993DDEFFB1462949), UINT64_C(0xE9C91DC159AB0D2D),
        UINT64_C(0x83D6A14F1B0CED73), UINT64_C(0xA14F1B0CED5A841F), UINT64_C(0xC0E51314A614F4EF),
        UINT64_C(0xDA9CC2600AE45A27), UINT64_C(0x826797AA04A65737), UINT64_C(0x2375BE54C41A08ED),
        UINT64_C(0xD39104E950564B37), UINT64_C(0x3091697D5E685623), UINT64_C(0x20EB84EE04A3C7E1),
        UINT64_C(0xF501F1D0944B2383), UINT64_C(0xE3E4E8AA829AB9B5)};
    How did I get these large random prime numbers? I used an online random number generator, generated a couple of numbers, and looked up the next prime with WolframAlpha. Then I tried which ones in which combination work best. Really. :-) I didn't need to try for too long, the results were very similar.
    Hashing a constant number of inputs (1, 2 .. 13):
      static ALWAYS_INLINE uint64_t hash(const uint64_t x0) {
        return (x0 + 1) * PHI64;
      }
      static ALWAYS_INLINE uint64_t hash(const uint64_t x0, const uint64_t x1) {
        return (x0 + 1) * PHI64 + (x1 + 1) * MUL64_1;
      }
      static ALWAYS_INLINE uint64_t hash(const uint64_t x0, const uint64_t x1, const uint64_t x2, const uint64_t x3,
                                         const uint64_t x4, const uint64_t x5, const uint64_t x6, const uint64_t x7,
                                         const uint64_t x8, const uint64_t x9, const uint64_t x10, const uint64_t x11,
                                         const uint64_t x12) {
        return (x0 + 1) * PHI64 + (x1 + 1) * MUL64_1 + (x2 + 1) * MUL64_2 + (x3 + 1) * MUL64_3 +
               (x4 + 1) * MUL64_4 + (x5 + 1) * MUL64_5 + (x6 + 1) * MUL64_6 + (x7 + 1) * MUL64_7 +
               (x8 + 1) * MUL64_8 + (x9 + 1) * MUL64_9 + (x10 + 1) * MUL64_10 + (x11 + 1) * MUL64_11 +
               (x12 + 1) * MUL64_12;
      }
    Or hashing a string:
      static ALWAYS_INLINE uint64_t combine64(const uint64_t seed, const uint64_t x) {
        return hash(seed + x);
      }
    Then we finalize the result by keeping the most significant bits for hash table lookup and the next couple of bits (usually 8 or 16) as a checksum for collision detection:
      static ALWAYS_INLINE auto finalize64(const uint64_t hash, const int hashBits) -> uint32_t {
        assert(uint32_t(hashBits) <= 32); // just a reasonable upper limit
        return uint32_t(hash >> (64 - hashBits));
      }
      static ALWAYS_INLINE uint64_t checksum64(const uint64_t hash, const int hashBits, const int checksumBits) {
        assert(0 < checksumBits && uint32_t(checksumBits) <= 32); //32 is just a reasonable upper limit
        return (hash >> (64 - hashBits - checksumBits)) & ((1 << checksumBits) - 1);
      }
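    For illustration, here is a minimal self-contained sketch (not from the post) of how these pieces fit together for a single table lookup; the two constants are the first two multipliers above, and the hashBits/checksumBits values are just assumed example sizes:
      #include <cstdint>
      #include <cassert>

      static constexpr uint64_t PHI64   = UINT64_C(0x9E3779B97F4A7C15); // golden-ratio multiplier
      static constexpr uint64_t MUL64_1 = UINT64_C(0x993DDEFFB1462949);

      // Hash a 2-value context the same way as above: one multiplier per input.
      static uint64_t hash2(uint64_t x0, uint64_t x1) {
        return (x0 + 1) * PHI64 + (x1 + 1) * MUL64_1;
      }

      int main() {
        const int hashBits = 19, checksumBits = 16;  // assumed example widths
        const uint64_t h = hash2('t', 'o');          // two small context values
        const uint32_t bucket   = uint32_t(h >> (64 - hashBits));  // top 19 bits select the slot
        const uint32_t checksum = uint32_t((h >> (64 - hashBits - checksumBits)) & ((1u << checksumBits) - 1)); // next 16 bits detect collisions
        assert(bucket < (1u << hashBits));
        (void)checksum;
        return 0;
      }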
    23 replies | 25605 view(s)
  • Gotty's Avatar
    Yesterday, 22:15
    Remark: Most of the statements below are well known in this community. I just wanted to have them in one place with paq8px in focus.
    Overview
    Paq8px uses hash tables to store its context statistics. For each input byte many contexts are taken into account for prediction. Therefore for each input byte (sometimes: for each input bit) many statistics are gathered by the different models and are stored in hash tables. Naturally there are collisions in hash tables. Paq8px's collision resolution policy is (simply put) to overwrite infrequent older context statistics. During compression, as the hash tables slowly become full, there are more and more collisions. As a result, sometimes useful contexts are overwritten. We need to minimize the loss of useful contexts. We need a "good" hash function to disperse our contexts nearly uniformly into hash buckets. We also need a reasonably good checksum to test for collisions.
    We need two hash functions: one that squeezes multiple integers into one integer, like
      hashkey = hashfunc(x1,x2,x3);
    and one that squeezes arbitrarily many bytes (a string) into one integer, like
      for(int i=0;i<size;i++) hashkey = hashfunc(hashkey,inputbuffer[i]);
    Both need to have good spreading (dispersion) and they need to be reasonably fast, since we do a lot of hashing for every input bit or byte.
    Hash key size matters
    When looking up or inserting a new context statistic item in a hash table, we need an index (bucket selector) and a checksum (usually 1 or 2 bytes) to detect collisions. With maximum compression (memory) settings ContextMap and ContextMap2 used 19 bits for bucket selection and 16 bits as a checksum from the 32-bit hash key. This is a 3-bit overlap, so the effective checksum is around 13 bits. This suggests that a 32-bit hash key may not be enough.
    Which hash function(s) to use in paq8px then?
    When choosing a hash function, we need to understand the data we have.
    Range: We usually have "smaller" integers. Sometimes very small. These integers are not random. Hash functions developed and fine-tuned with random data in mind (like hash functions tuned for 32-bit random numbers as inputs) may not perform optimally in this range.
    Order: Hashing order matters for strings (simplified example follows): when hashing "fireman" and "policeman" from left to right they may produce the same hash key after "fire" and "police". Including the remaining bytes (m+a+n), the two hash values will still be identical. These strings will be represented by the same hash key in a hash table, so their collision will be inevitable and undetectable. Their statistics will be calculated together as if they were the same string. However, strings having the same ending would probably also have very similar statistics, so in this case a collision is even advantageous. Hashing from left to right is therefore probably better than right to left: more recent bytes will have greater "weight" - when an early collision happens "in the middle" of the string, the last few bytes decide the final hash value. Most of the time paq8px hashed from right to left for performance reasons, so it was not ideal.
    Parameter order in multiple-value hashes: When we hash a fixed number of integers and do it similarly to the string case (i.e. combining the input values in order, one after the other), then in case of an intermediate collision the most recently added inputs will have greater "weight". But this time it is a disadvantage. We usually want our inputs to have "equal weight". So as a value hashing function hash(x1)+hash(x2)+hash(x3) is probably better than hash(hash(hash(x1),x2),x3). ("+" is not necessarily an addition.)
    Tests
    I tried several hash functions, mostly from the multiplicative, xor-rotate and crc families. I didn't test any hash functions that I thought would be overkill (for performance reasons). I ran a couple of tests with my hash function candidates using different parameters and different hash table sizes. I used n rolling bytes (n=4..16) from larger input files (binary and text) as input values. The "best" functions produced surprisingly similar results - the difference was negligible. The number of collisions was *very* similar. There is no clear winner. But some of the functions are simpler and faster. Based on that I finally chose a winner.
    Some interesting findings
    Whenever a weaker hash function was post-processed ("finalized") with a simple permutation, the result was usually better than the "raw" result. Finalizing the "raw" result usually helps. Note: most of the best hash functions have some finalization step. This final scrambling is not necessarily a simple permutation.
    Remark: paq8px (until v167) post-processed the calculated hash when it was used for hash table lookup to enhance quality:
      cx=cx*987654323+i; cx=cx<<16|cx>>16; cxt[i]=cx*123456791+i;
      i*=123456791; i=i<<16|i>>16; i*=234567891;
    (see also https://encode.su/threads/806-A-nice-article-about-hashing?p=17414&viewfull=1#post17414)
    On the other hand, when "too much" hashing was performed, the result was sometimes slightly worse. "Too much" hashing means: including the character position, including a magic additive constant (for multiplicative hashes), or beginning with a magic number as a seed (vs. beginning with 0). We can decrease the number of hash calculations: hashing more input bytes (4) in one go is a little bit better than doing 4 hashes one after the other. Too much hashing may weaken the hash key. The above finalizers were dropped in v168: hash quality had improved enough that using such finalizers was already disadvantageous.
    I'd mention two of the more outstanding string hash functions: crc32c and the family of 32/64-bit multiplicative hashes with large 32/64-bit multipliers.
    Crc32c is a stable and good hash function. It beats, for instance, the internal hash function of the .NET framework. It has hardware support in some CPUs. It is fast with hardware acceleration but slower with software emulation. Crc32c is good, but I excluded it for being 32-bit.
    Multiplicative hashes are fast and spread very well when using a good multiplier (a large odd random number).
    Remark: Paq8px (until v167) used a combination of multiplicative hashes as its main hash function with some post-processing ("finalizing"):
      // Hash 2-5 ints.
      inline U32 hash(U32 a, U32 b, U32 c=0xffffffff, U32 d=0xffffffff, U32 e=0xffffffff) {
        U32 h=a*200002979u+b*30005491u+c*50004239u+d*70004807u+e*110002499u;
        return h^h>>9^a>>2^b>>3^c>>4^d>>5^e>>6;
      }
    The magic numbers here are all primes - except the first one (I don't know why). Notice that there is some post-processing involved (xors and shifts).
    Between v160 and v167 paq8px used a new string hash from the multiplicative hash family in MatchModel (see the combine() function), and a high-quality xor-shift hash as a general hash function: https://github.com/skeeto/hash-prospector#three-round-functions
      uint32_t triple32(uint32_t x) {
        x ^= x >> 17;
        x *= UINT32_C(0xed5ad4bb);
        x ^= x >> 11;
        x *= UINT32_C(0xac4c1b51);
        x ^= x >> 15;
        x *= UINT32_C(0x31848bab);
        x ^= x >> 14;
        return x;
      }
    Note: it's high quality when the inputs are random 32-bit numbers. But our use case is different: it turned out that this is not the best for us - even though it is so good (for random inputs). So it was soon dropped.
    ---- cut ----
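    To make the "equal weight" point concrete, here is a small illustrative sketch (not from the post) contrasting the two combination styles; the multipliers are simply the first three constants from the previous post:
      #include <cstdint>

      static constexpr uint64_t PHI64 = UINT64_C(0x9E3779B97F4A7C15);
      static constexpr uint64_t MUL1  = UINT64_C(0x993DDEFFB1462949);
      static constexpr uint64_t MUL2  = UINT64_C(0xE9C91DC159AB0D2D);

      // Independent combination: each input gets its own multiplier, so all
      // inputs carry equal "weight" - preferred for a fixed tuple of values.
      uint64_t hashIndependent(uint64_t x0, uint64_t x1, uint64_t x2) {
        return (x0 + 1) * PHI64 + (x1 + 1) * MUL1 + (x2 + 1) * MUL2;
      }

      // Chained (string-style) combination: each step re-hashes the running
      // seed, so after an intermediate collision only the later inputs decide
      // the result - fine for strings, less desirable for fixed tuples.
      uint64_t hashChained(uint64_t x0, uint64_t x1, uint64_t x2) {
        uint64_t h = (x0 + 1) * PHI64;
        h = (h + x1 + 1) * PHI64;
        h = (h + x2 + 1) * PHI64;
        return h;
      }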
    23 replies | 25605 view(s)
  • Gotty's Avatar
    Yesterday, 21:42
    Yes, it is clear that there is no clear global winner - any algo or magic number must be tried in the specific context. For example, in paq8px it is important to 1) have a near-uniform spread, i.e. as few collisions as possible; 2) be fast, because there are hundreds of hash calculations for every bit (or byte). Yes. Not one hash calculation for every byte or dword. Hundreds for every bit (or byte). (Let that sink in :-)) That's a lot. 3) And finally to have a result larger than 32 bits, since we have large hash tables and we need some bits for collision detection as well. We need 35+ bits in some cases. It is also important that we usually hash small values or short strings of bytes. And a multiplicative hash or a set of multiplicative hashes is ideal for that (they are most suitable for small values). That is one specific use case. When searching for a solution to improve hashing I tried a lot of possibilities and settled for the current one. This gave the best improvement for paq8px. It is clear from the results (link#1 link#2) that the larger the files were, the better the results we got. (More collisions are to be expected for larger files, so better hashing is more crucial for them.) From a lot of experience I can say: you have to try until you are satisfied with one algo/method.
    23 replies | 25605 view(s)
  • zmodem's Avatar
    Yesterday, 20:59
    I was curious, so I looked some of these up.
    Plan 9's libflate: https://github.com/0intro/plan9/blob/c0f3d95/sys/src/libflate/deflate.c#L85
    Google-related compressors seem to favor 0x1e35a7bd:
    Snappy: https://github.com/google/snappy/blob/f16eda3/snappy.cc#L94
    Brotli: https://github.com/google/brotli/blob/78e7bbc/c/enc/hash.h#L69
    Go's flate package: https://golang.org/src/compress/flate/deflate.go#L293
    (It's also part of the WebP spec: https://developers.google.com/s/results/speed/webp?hl=pl&q=0x1e35a7bd)
    LZ4 and ZStd use 2654435761, the golden ratio:
    LZ4: https://github.com/lz4/lz4/blob/7224f9b/lib/lz4.c#L638
    ZStd: https://github.com/facebook/zstd/blob/c3e921c6398b8952b5248a1d99d0f58e69763d9e/lib/compress/zstd_compress_internal.h#L609
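    For reference, a minimal sketch (not taken from any of these codebases) of the multiplicative-hash pattern such constants are typically used in; the table size is an assumption:
      #include <cstdint>

      // Multiplicative (Fibonacci-style) hashing: multiply by a large odd
      // constant and keep the top bits as the table index.
      static inline uint32_t mulHash(uint32_t x, int tableBits) {
        return (x * UINT32_C(2654435761)) >> (32 - tableBits); // 2654435761 ~ 2^32 / golden ratio
      }
      // e.g. mulHash(nextFourBytes, 16) would give a 16-bit index into a 64K-entry match table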
    23 replies | 25605 view(s)
  • Gotty's Avatar
    Yesterday, 20:23
    Gotty replied to a thread paq8px in Data Compression
    @Eppie, in the root folder there is a test.cpp file you created earlier. It looks like it is still under development (?). Could you please finalize it sometime? For now I think I'm gonna remove it so compilation is simpler (= "compile all cpp files"). Placing it in a "test" subfolder would also work, but it looks like you may have a test folder since it's excluded in .gitignore. Can we place it there?
    1864 replies | 539681 view(s)
  • Gotty's Avatar
    Yesterday, 20:14
    Gotty replied to a thread paq8px in Data Compression
    Usually we provide exes for the public with no specific instruction set and let the CPU dispatcher do its thing during runtime. In order to be able to publish a general build for the next release again, I will modify the current behavior a little bit.
    1864 replies | 539681 view(s)
  • Gotty's Avatar
    Yesterday, 20:05
    Gotty replied to a thread paq8px in Data Compression
    Sorry that those changes caused you inconvenience.
    Folder compression was not removed but changed. The pre-v137 solution was unfortunately not deterministic, and that caused headaches. Also nobody seemed to use folder compression at that time. Darek uses pre-tarred files, for example, which also gives deterministic results. Since v137 you'll need to create a list of files you wish to add to the archive. You can build this list of files ahead of time and use it for all your subsequent tests. Since the list is fixed, the result is deterministic. Use "paq8px.exe -9 @listoffiles.txt" for compression. See the command line help for more. Or if you prefer a GUI solution, use https://github.com/moisespr123/PAQCompress. I hope it helps.
    The default compression (level 5) was removed because *really* nobody used it. The default compression level was only there because with drag and drop you cannot specify a compression level yourself. And level 5 is safe enough: even if you don't have the necessary gigabytes of RAM it would still run. But no one really uses level 5.
    Creating the archive in the same place as the original file is a no-go. When I try to compress the same file with different options and run the tests simultaneously, one archive overwrites the other. I can understand that with the drag and drop functionality the best for you is when the file is created in the original folder and not on your desktop, but when using the command line and running parallel tests the optimal default archive location is the location where you run your command, not where the original file resides. But this is just the default. You can still specify any destination folder for the archive on the command line, so having the archive in the folder of the original file is still easily achievable. It's just not the default.
    @To all: is there anyone else that needs any of the above functionality to be reverted back to how it was before?
    Related: https://github.com/hxim/paq8px/issues/111
    1864 replies | 539681 view(s)
  • Gotty's Avatar
    Yesterday, 16:52
    Gotty replied to a thread paq8px in Data Compression
    Thank you Kaitz! I'm importing your fix from paq8pxd_v76.
    1864 replies | 539681 view(s)
  • Gotty's Avatar
    Yesterday, 16:51
    Gotty replied to a thread paq8px in Data Compression
    Thank you for submitting this issue. It is fixed in the next release.
    paq8px_v183fix1.exe -8 P1050204.JPG = 4891731 -> 4718331
    paq8px_v187fix1.exe -8 P1050204.JPG = 4891731 -> 3549528
    1864 replies | 539681 view(s)
  • Gotty's Avatar
    Yesterday, 14:52
    Oh, I see. It looks like your use case is significantly different from mine. Let me share what exactly I did.
    For every file (49 in total) in calgary, canterbury, maximumcompression, silesia, plus some images from Darek, plus enwik8, I fetched the consecutive bytes ("strings"), hashed them using different algorithms and counted how many items are to be stored in hash tables with sizes in the range 16..22 bits (64K..4M). I ran the same experiment for 2..8-byte strings, just to have a more global view.
    Results for 4-byte strings and 128K hash tables: for example, in calgary\bib there are 19086 distinct 4-byte strings. Fitting them into a 128K (17-bit) hash table looks simple and easy, but we need to squeeze those 32-bit (4-byte) values into their 17-bit representation "somehow". That is what hashing does. It turns out that assigning 19086 values *randomly* to 128K buckets will produce some collisions (the birthday problem) even if we have plenty of room. We are to expect around 1300 cases where 2-3 different strings will be assigned to the same hash table bucket. That is a 0.069 collision rate per string on average. The worst collision rate to be expected for 4-byte strings with a 128K hash table is from silesia\xml. It has 125899 distinct 4-byte strings. Having 131072 buckets and assigning our strings randomly to the buckets, we are to expect 0.357 collisions per string on average.
    So I established these expected collision rates for every setup. Then I measured the actual collision rate for the different hashing algorithms. I simply normalized at the end: my metric for cross-scenario comparison is the actual collision rate divided by the expected collision rate. A value less than or around 1.0 means the hash algo performed well enough. I averaged these metrics to have a total, and also across dimensions (the string sizes and hash table sizes), to verify that this approach gives consistent results "no matter in which scenario".
    Remarks:
    - Hashing is not a random assignment to buckets, because the input strings make it deterministic, but having such an expected value for every setup we can compare the hashing algorithms across the setups. I cannot just add up all the collision counts for every setup; that would introduce a strong bias for setups with very low expected collision rates.
    - There are other metrics for measuring hash quality, but I think the above is useful for finding the best algo for our scenario: which are the hash algorithms that perform equally well with different (small) string sizes and hash table sizes.
    - With proper collision detection (and by chaining the strings, of course) any "good enough" hashing works equally well. You will then need to choose the faster/simpler one.
    - A collision is not necessarily a bad thing. In lucky cases it can even be advantageous. If, for instance, "ant", "fire" and "police" produced the same base hash value and so shared the same statistics (since you don't have collision detection), and your text contained the space or "m" as the next character (as in "antman", "fireman" and "policeman", or simply "ant ", "fire " and "police "), then you would be lucky with this collision.
    We have the following differences:
    - I ran the tests on every file separately. You tar your input files.
    - For me a "collision" is when there are multiple items to be assigned to the same bucket. For you a collision may occur multiple times: when one item overwrites an old one, and then a newer occurrence of the old string overwrites the new one. And this can go on multiple times. As I understand it, you don't have collision detection (right?).
    - What else? What do you have in your compression algorithm that would explain the result you experienced?
    Attached are the results for calgary/bib for 4-byte strings and 128K hash table size. The bold items are the hash algorithms with the different magic numbers you tried, where you hash the dword at once, not byte-by-byte.
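    As a check, a small sketch (my own, not from the post) of the birthday-problem baseline described above; dividing the expected collision count by the number of distinct strings reproduces the quoted 0.069 and 0.357 figures:
      #include <cmath>
      #include <cstdio>

      // Expected number of colliding items when n distinct keys are thrown
      // uniformly at random into m buckets (birthday-problem estimate).
      static double expectedCollisions(double n, double m) {
        const double occupied = m * (1.0 - std::pow(1.0 - 1.0 / m, n));
        return n - occupied; // items that landed in an already occupied bucket
      }

      int main() {
        const double m = 131072.0; // 17-bit (128K) hash table
        std::printf("calgary/bib : %.0f expected collisions (rate %.3f)\n",
                    expectedCollisions(19086, m), expectedCollisions(19086, m) / 19086);
        std::printf("silesia/xml : %.0f expected collisions (rate %.3f)\n",
                    expectedCollisions(125899, m), expectedCollisions(125899, m) / 125899);
        return 0;
      }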
    23 replies | 25605 view(s)
  • Dresdenboy's Avatar
    Yesterday, 14:22
    I found my latest version of the prefix emitter. It's even just 11B in the main loop (x86, 16b) without proper setup. Does the Z80 have bitfield instructions? I guess not. I last programmed one 28 years ago. Then the typical shift, test, reload would be required.
    29 replies | 1696 view(s)
  • Shelwien's Avatar
    Yesterday, 13:18
    Shelwien replied to a thread Zstandard in Data Compression
    Incidentally it was also added by winzip as method 93: https://support.winzip.com/hc/en-us/articles/115011811307-Unsupported-Zip-format
    437 replies | 131221 view(s)
  • algorithm's Avatar
    Yesterday, 09:32
    @smjohn1 You have only a variable-length offset. Your smaller-size offset only reaches up to a distance of 127. The compression ratio will not significantly improve. In my case the smaller version had a 2-bit LL, 2-bit ML and 11-bit offset. Also, for fast modes a 64KB window in LZ4 is too big. Having a 4K hash table is the bottleneck in compression ratio. LZ4 is optimized for a 32KB cache size. I think there could be an improvement in compression ratio if there were multiple versions for different cache sizes. Some mobile phones even have 128 KB of L1. But I don't expect a big improvement.
    8 replies | 256 view(s)
  • algorithm's Avatar
    Yesterday, 09:10
    I designed one using XORs and two's complement that emulates CMOV, but didn't implement it. The code is from 5 years ago.
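    For illustration, a minimal sketch (my own, not the code mentioned) of the kind of mask-based branchless select such XOR/two's-complement tricks implement; whether it actually beats a real CMOV depends on the compiler and CPU:
      #include <cstdint>

      // Branchless select: returns a if cond is true, b otherwise.
      // 0u - uint32_t(cond) is all-ones when cond is true and zero when false.
      static inline uint32_t select32(bool cond, uint32_t a, uint32_t b) {
        const uint32_t mask = 0u - uint32_t(cond);
        return b ^ ((a ^ b) & mask);
      }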
    8 replies | 256 view(s)
  • Jarek's Avatar
    Yesterday, 09:01
    Jarek replied to a thread Zstandard in Data Compression
    Zstd was added to ZIP a few days ago with method ID 20: https://pkware.cachefly.net/webdocs/APPNOTE/APPNOTE-6.3.7.TXT https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT
    437 replies | 131221 view(s)
  • Kirr's Avatar
    5th June 2020, 17:37
    FQSqueezer is in! But only tested on small data due to its extreme slowness. Thanks for the suggestion, dnd! It's also worth mentioning that FQSqueezer is designed for FASTQ data, therefore its performance in my benchmark should not be taken as indicative of its actual performance on its intended data.
    27 replies | 3146 view(s)
  • Kirr's Avatar
    5th June 2020, 17:32
    I guess this discussion is about deflate-based libraries. Since I benchmark command line tools rather than bare libraries, each lib would need a command line interface before it can be tested. I already added p7zip and libdeflate's streaming gzip replacement to my to-do benchmark queue. If you have more suggestions in mind, please post here with links. Thanks!
    9 replies | 603 view(s)
  • Darek's Avatar
    5th June 2020, 17:28
    Darek replied to a thread Paq8sk in Data Compression
    122'505'372 - enwik9 -x15 -w -e1,english.dic by Paq8sk19, time 134115.35s - not as bad a time as for paq8sk13. The score is very close to my estimate.
    98 replies | 8842 view(s)
  • lz77's Avatar
    5th June 2020, 16:56
    ((a*271+b)*271+c)*271+d is even worse than 506832829, see my message #11...
    23 replies | 25605 view(s)
  • Jyrki Alakuijala's Avatar
    5th June 2020, 16:38
    How do they relate to 506832829?
    23 replies | 25605 view(s)
  • smjohn1's Avatar
    5th June 2020, 16:37
    True. My tests of the simpler version (see attached lz4.c above) show about 2/3 of the original decompression speed. But in many applications decompression speed is already way higher than that of other components, so even at 2/3 that wouldn't be a problem. Thus there is value in improving compression ratio and speed, if the improvement is significant. My simple version of course only shows about a 1% to 2% gain in compression ratio, which is not significant in most applications. That is why I think it might be worthwhile to explore more gains.
    8 replies | 256 view(s)
  • smjohn1's Avatar
    5th June 2020, 16:31
    You are definitely welcome to change my modifications below to see if there are good improvements.
    8 replies | 256 view(s)
  • smjohn1's Avatar
    5th June 2020, 16:29
    Here is a simpler version: the window size is reduced to 32K (I cannot upload lz4.h - why is the .h extension not allowed?), and it uses a 1+7 format for the offset (1 byte for smaller offsets and 2 bytes for others). So the only changes are in lines 1011, 1016-1018 (for encoding) and 1769, 1969 (for decoding). It works for pic, news, trans in the Calgary corpus. Do any other places need to be taken care of? (Just for fast mode.)
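    For illustration, a sketch (my own, not the attached code) of one way such a 1+7 offset format could work for a 32K window, using the low bit of the first byte as a length flag; the actual modification may differ:
      #include <cstdint>

      // Offsets < 128 take one byte (flag bit = 0); larger offsets (< 32768)
      // take two bytes (flag bit = 1, low 7 bits here, high 8 bits in the next byte).
      static inline uint8_t* writeOffset(uint8_t* op, uint32_t offset) {
        if (offset < 128) {
          *op++ = uint8_t(offset << 1);
        } else {
          *op++ = uint8_t((offset << 1) | 1);
          *op++ = uint8_t(offset >> 7);
        }
        return op;
      }

      static inline const uint8_t* readOffset(const uint8_t* ip, uint32_t* offset) {
        const uint32_t b0 = *ip++;
        *offset = b0 >> 1;
        if (b0 & 1) *offset |= uint32_t(*ip++) << 7;
        return ip;
      }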
    8 replies | 256 view(s)
  • Bulat Ziganshin's Avatar
    5th June 2020, 16:24
    did you try to use CMOV and other branchless approaches?
    8 replies | 256 view(s)
  • pklat's Avatar
    5th June 2020, 14:59
    pklat replied to a thread deflate in Data Compression
    Thanks Shelwien, I understand (better) now.
    2 replies | 155 view(s)
  • lz77's Avatar
    5th June 2020, 12:49
    See my new CalcHash and compare the results with the ones in my message #11:
      function CalcHash(dw: dword): dword;
      var a,b,c,d: byte;
      begin
        a:=dw and $ff;
        b:=dw shr 8 and $ff;
        c:=dw shr 16 and $ff;
        d:=dw shr 24 and $ff;
        Result:=(((a*271+b)*271+c)*271+d) and $1FFFF;
      end;
    enwik8: 40.579%
    enwik9: 36.318%
    silesia.tar: 37.434%
    It's 15-20% slower and the ratio is always worse compared with 123456789...
    23 replies | 25605 view(s)
  • Shelwien's Avatar
    5th June 2020, 12:44
    Shelwien replied to a thread deflate in Data Compression
    Do you mean, why do other deflate libraries produce different deflate data than zlib? Any LZ format has multiple ways to parse the data into strings (e.g. "abc" = "a"."b"."c" = "ab"."c" = "a"."bc"), plus a string can be encoded as a reference to any of its previous occurrences within the window, which adds even more options. Deflate is a relatively complicated version of the LZ algorithm - it uses Huffman coding for the data, then RLE + Huffman for the length tables of the primary Huffman codes. So there are multiple options for splitting the data into blocks, then parsing each block, then encoding each block header. It's actually very hard to find one specific way of data parsing that would result in the best compression - there are too many combinations, so just trying all of them is impossible, and even iterative optimization methods can run for a week and still keep finding small improvements. Thus all practical compressor implementations can only rely on various heuristics to run with reasonable speed, so it's not surprising that different implementations produce different outputs - it's pretty unlikely that different developers would make the same intuitive choices dozens of times.
    2 replies | 155 view(s)
  • algorithm's Avatar
    5th June 2020, 11:59
    I had made an LZ4 variant with a 32KB window and an additional smaller 2-byte match sequence (instead of the normal 3). While it had better compression, decompression speed was about half as fast due to a single unpredictable branch. Your method will likely have the same performance penalty. (Unpredictable branches are expensive.)
    8 replies | 256 view(s)
  • pklat's Avatar
    5th June 2020, 09:42
    pklat started a thread deflate in Data Compression
    Apart from the compression level and the size of the sliding window, what makes a zip encoder produce different deflate data from other implementations?
    2 replies | 155 view(s)
  • Cyan's Avatar
    5th June 2020, 05:05
    Cyan replied to a thread Modifying LZ4 offset in Data Compression
    A more complete source code package will probably prove helpful. There are too many ways such a modification could introduce flaws, so it requires paying attention to implementation details. Good news is, you have tests to quickly check the outcome.
    8 replies | 256 view(s)
  • smjohn1's Avatar
    5th June 2020, 04:40
    I am trying a new encoding method for LZ4's match offset, which varies from 1 to 3 bytes, still byte-aligned. Of course the hope is that most offsets only need 1 byte, thus a better compression ratio, and maybe better compression speed as well. In implementation, by simple modifications of the LZ4 v1.9.2 package: in LZ4_compress_generic(), the two places with
      LZ4_writeLE16(op, (U16)offset); op+=2;
    and
      LZ4_writeLE16(op, (U16)(ip - match)); op+=2;
    are replaced with the new encoding method (just ignore the high compression part), and the two places in LZ4_decompress_generic() with
      offset = LZ4_readLE16(ip); ip+=2;
    are replaced with the new offset decoding rules (a bit complicated). I thought that would do. But while some files can be decompressed successfully, some cannot, with errors such as
      Invalid Checksum : Decoding error at pos 152033 (block 0, sub 1, pos 152033)
    and some with
      Error 66 : Decompression error : ERROR_decompressionFailed
    (on enwik8). So obviously somewhere else needs to be modified as well. Could anyone shed some light on where to make changes? Thanks in advance.
    8 replies | 256 view(s)
  • SolidComp's Avatar
    5th June 2020, 02:26
    SolidComp replied to a thread Brotli in Data Compression
    Strange, I don't get anywhere near that. I didn't know there was a -12 level, and when I run it I get 34,768, nowhere near as good as 7-Zip's gzipper. What's your command line syntax?
    261 replies | 82836 view(s)
  • Gotty's Avatar
    4th June 2020, 23:52
    Thank you! That is unfortunately not optimal for this kind of hash. This hash works best with small values. The larger the values get, the worse it performs. Probably that is why a smaller multiplier worked better for you (so far). I've just run some checks and I got the best results on the corpuses using a hash function originally from paq8a, still used in paq8pxd. Second place goes to crc32c and third to Robert Sedgwick's hash function. The fastest of the top 3 is paq8a's word model hash. So that is the absolute winner. Could you please try something similar, like ( ((a*271+b)*271+c)*271+d ) where a,b,c,d are four bytes? Keep the lower bits (and 0x1FFFF) at the end. If you do timings: is it much slower in your case? (I'm still running tests, there are many candidates.)
    23 replies | 25605 view(s)
  • ScottX's Avatar
    4th June 2020, 22:53
    It could be because of the UPX packer. You can try unpacking the .sfx with UPX (it's packed because of the size) with the command: UPX -d rz.sfx. But you get approx. 32kB instead of 12kB.
    208 replies | 79872 view(s)
  • JamesB's Avatar
    4th June 2020, 18:48
    I just tried the latest version. It's as I recall - very fast but very light levels of compression. It also has no decompression support at all, so obviously it can't compete on that front. On the above file with the same machine, level 1 only. Redoing these encode timings as I realise I had I/O time in there which doesn't help the ultra fast ones:
      Tool        Encode     Decode     Size
      ------------------------------------------
      vanilla     0m1.657s   0m0.546s   42298786
      intel       0m0.569s   0m0.524s   56046821
      cloudflare  0m0.974s   0m0.470s   40867185
      jtkukunas   0m1.106s   0m0.392s   40867185
      ng          0m0.695s   0m0.397s   56045984
      zstd (gz)   0m1.582s   0m0.475s   42298786
      libdeflate  0m0.838s   0m0.235s   39597396
      libslz      0m0.482s   N/A        55665941
    So yes it's the fastest. libslz has levels up to -9, but they're barely any different in size/speed.
    9 replies | 603 view(s)
  • suryakandau@yahoo.co.id's Avatar
    4th June 2020, 16:51
    Maybe you could test paq8sk19 with -x14 option..thanx. ...
    98 replies | 8842 view(s)
  • Dresdenboy's Avatar
    4th June 2020, 16:06
    We could use "~" anywhere for "on average" (some thread compression ;)).
    Saukav is probably just one encoding-format example (zx7) of what is actually a global optimization problem (any compression method + decomp stub size on target platform + decomp stub adaptions + other adaptions). I'm doing a lot of analysis of x86 intros (which methods work best, could they be stripped down, etc.). For example (and you surely saw that when doing your own small decompressors), a more or less generic packer uses a lot of encodings, code paths etc. to be good overall. But for small files (and there's also only little research in this regard, e.g. about compressed message sizes of the eth protocol etc.) some specific subset with very little case handling might fit better. In small files (like 256B intros) there are a lot of 1B (short opcode) and many 2B matches, but very few 3+ matches. So any complex handling of the latter ones, like interleaved gamma codes etc., might just be too much. Also the probabilities of literal run lengths look very different here. That's why I also think there is room for improvement. :)
    This was more about being platform agnostic: to first find the "overhead" of algorithms for finding their models, tuning to the data, or simply encoding literals and matches. Then would follow the next step of adding the decomp stub sizes, as COM might mean 16B, "low" memory (on PC at least, with 40+B for accessing >1MB).
    Some current list of tested compressors with example compressed sizes:
      lzw:    227/256 B
      puc:    221/256 B
      exo:    201/256 B
      zx7:    231/256 B
      hrust:  223/256 B
      pkf:    221/256 B
      pkft:   227/256 B
      muc:    251/256 B
      apc:    214/256 B
      apack:  217/256 B
      paq1:   189/256 B
      lzma:   292/256 B
      zlib:   246/256 B
      gzip:   258/256 B
      bz2:    288/256 B
      lz4:    261/256 B
      paq1ss: 188/256 B
      paq3n:  188/256 B
      paq6v2: 188/256 B
      paq8f:  185/256 B
      paq8h:  194/256 B
      paq8l:  204/256 B
      paqar:  179/256 B
    If you have some interesting ones, I'd be happy to add them. So far I also want to add oneKpaq, as it has a 128B decompressor.
    More on the other topics later. E.g. prefix emitter: on x86 I might try to use long SSE instructions. They share a 0x0f prefix. This is easy to emit with little code. On Z80 there are several prefixes (FD, ED etc.), which are rather common. It would be easy to write such a compressor with the prefixes encoded in 1-2 bits.
    29 replies | 1696 view(s)
  • lz77's Avatar
    4th June 2020, 13:46
    inb1 is a pointer to the current position in the source data. inb1^ is the 4 bytes at the current position; it has dword (uint32) type. Yes, but if there were a Hutter Prize for whoever can compress enwik8 to 40% with pure LZ77, I would get it. :)
    23 replies | 25605 view(s)
  • Darek's Avatar
    4th June 2020, 12:13
    Darek replied to a thread Paq8sk in Data Compression
    I've started to test paq8sk19. paq8sk22 or paq8sk23 is generally too slow... The first 200-300MB generally (historically) goes well, but after that the program often starts to use a very high amount of memory and often blocks the whole computer - as I said, I have "only" 32GB of RAM.
    98 replies | 8842 view(s)
  • SolidComp's Avatar
    4th June 2020, 12:13
    James, what about SLZ? That's the fastest gzip implementation on earth as far as I know. It doesn't compress as well though. But it's incredibly fast, and super light in terms of CPU and memory consumption.
    9 replies | 603 view(s)
  • Jyrki Alakuijala's Avatar
    4th June 2020, 00:32
    If we didn't fix that yet, we will fix it very soon now.
    156 replies | 37473 view(s)
  • vteromero's Avatar
    3rd June 2020, 17:51
    vteromero replied to a thread VTEnc in Data Compression
    If anyone is interested, here's a blog post about encoding parameters in VTEnc library: https://vteromero.github.io/2020/06/03/encoding-parameters-in-vtenc-library.html
    22 replies | 4682 view(s)
  • Gotty's Avatar
    3rd June 2020, 17:39
    Yes. And needs to be large. A large odd "random" number.
    23 replies | 25605 view(s)
  • Gotty's Avatar
    3rd June 2020, 17:38
    Looks good so far. What is supplied to this function as a parameter? I don't think that the number of bits in the multiplier has an effect on the performance of the multiplication. The CPU does not actually look for the 1 bits in the multiplier to do shifts and additions. An explanation is here. 40.008% -> 39.974% is a 0.034% gain. I would not call it noticeably better. :-)
    23 replies | 25605 view(s)
  • Jyrki Alakuijala's Avatar
    3rd June 2020, 17:28
    Jyrki Alakuijala replied to a thread Brotli in Data Compression
    There are 120 transforms. Appendix B in rfc7932. A custom 'shared brotli' dictionary can bring its own transforms.
    261 replies | 82836 view(s)
  • Jyrki Alakuijala's Avatar
    3rd June 2020, 17:26
    Jyrki Alakuijala replied to a thread Brotli in Data Compression
    Zstd's two improvements over zlib are a larger backward reference window and ANS instead of prefix coding. ANS does not help much (0.5 %) if you have a lot of literals -- in brotli's design experiments ANS was a (small) density loss over prefix coding. Larger backward references don't help much if you have a small data size (particularly if it is around 32 kB or less). So, no big surprise that zstd does not compress better than zlib here. Brotli brings three improvements that do help for small data: context modeling, the static dictionary and super-cheap swapping back to previous entropy codes. Optimized javascript is not as easy to compress as many other kinds of data, and often we see only ~10 % further savings from moving from zlib to brotli.
    261 replies | 82836 view(s)
  • Stefan Atev's Avatar
    3rd June 2020, 16:16
    I guess ROM is the only thing you can kind of depend on to be your "dictionary"; even then, you'd be making assumptions about the machine you're running on that may be a bit too specific (e.g., all the 6502 clones I ever programmed were Pravetz 82 clones, definitely with their own ROM different from an Apple II's); it's exactly like ROP exploits - they work on specific versions of OSes, not generically on a platform. Though maybe for non-malicious code, there's less risk of somebody patching what you depended on. People used to try to check the code pointed to by an interrupt handler to try to ensure that they are not running under a debugger, or that certain TSRs are not installed, and that ultimately breaks when what you're checking legitimately changes...
    29 replies | 1696 view(s)
  • lz77's Avatar
    3rd June 2020, 13:19
    I tested both numbers on my LZ77-type compressor:
      magic number =         123456789 | 506832829
      enwik8:                  40.549% | 40.573%
      enwik9:                  36.305% | 36.314%
      silesia.tar:             37.429% | 37.423%
      lz4_win64_v1_9_2.exe:    42.341% | 42.359%
    The humorous constant 123456789 gives noticeably better compression. Your number is better only on silesia.tar (by 0.006%). I think that's because silesia.tar contains very specific img files.
    > I believe the number doesn't need to be prime, but needs to be odd and have a good distribution of 0s and 1s.
    Yes, the number 123456789 includes sequences of 1, 2, 3 and 4 one-bits. By the way: when I used Knuth's number 2654435761, my algorithm with the best ratio was compressing enwik8 to 40.008%. After I changed Knuth's number to 123456789 my algorithm overcame the psychological frontier and showed 39.974%. Under 40% on enwik8 with only a hash table of 128K cells, without match search, source analysis & additional compression! Maybe after improvement the algorithm will show a compression ratio below 40% with a hash table of 64K cells...
    23 replies | 25605 view(s)
  • Jyrki Alakuijala's Avatar
    3rd June 2020, 12:14
    Are those numbers better or worse than my personal magic number, 506832829? I believe the number doesn't need to be prime, but it needs to be odd and have a good distribution of 0s and 1s.
    23 replies | 25605 view(s)
  • lz77's Avatar
    3rd June 2020, 11:19
    Here you go, this is a piece of my code:
      const
        TABLEBITS = 17;
        TABLESIZE = 1 shl TABLEBITS;
      var
        table: array of dword;
      ................................
      // Calculate hash by 4 bytes from inb1^
      function CalcHash(dw: dword): dword; assembler;
      asm
        mov ecx,123456789
        mul ecx
        shr eax,32-TABLEBITS
      end; { CalcHash }
    I believe that the fewer 1 bits there are in the magic constant, the faster the multiplication will be performed, which will increase the compression speed.
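    For readers who don't read Delphi inline assembler, here is my reading of the routine as plain C++ (an equivalent sketch, not part of the post): keep the top TABLEBITS bits of the low 32-bit half of dw * 123456789.
      #include <cstdint>

      constexpr int TABLEBITS = 17;

      // mul ecx leaves the low 32 bits of dw*123456789 in eax;
      // shr eax, 32-TABLEBITS keeps its top TABLEBITS bits as the table index.
      inline uint32_t calcHash(uint32_t dw) {
        return (dw * UINT32_C(123456789)) >> (32 - TABLEBITS);
      }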
    23 replies | 25605 view(s)
  • pklat's Avatar
    3rd June 2020, 08:49
    File timestamps are also stored, and IIRC they differ between NTFS and other filesystems.
    3 replies | 145 view(s)
  • Shelwien's Avatar
    3rd June 2020, 06:50
    Windows and linux actually have different file attributes, so you'd need a hacked zip which doesn't store attributes, otherwise normally it won't happen. You can try using a zip built under cygwin/msys2 on windows side, but even these would have different default unix attributes (eg. cygwin shows 0770 for windows files), and possibly also custom defines for cygwin which would make them use winapi.
    3 replies | 145 view(s)
  • Shelwien's Avatar
    3rd June 2020, 06:43
    > wouldn't have been able to depend on compression routines in Windows. I actually mean chunks of existing code, like https://en.wikipedia.org/wiki/Return-oriented_programming#Attacks https://github.com/JonathanSalwan/ROPgadget#screenshots
    29 replies | 1696 view(s)
  • ivan2k2's Avatar
    3rd June 2020, 06:06
    1) Try to compress non-textual files and check the results.
    2) Try to compress your text files with the -ll or -l option and check the results.
    3 replies | 145 view(s)
  • Stefan Atev's Avatar
    3rd June 2020, 04:49
    I can see that, though it goes against my instincts :) I have seen people extract common instruction sequences into subroutines even if they were pretty arbitrary and logically unrelated; you eat 3 bytes to do a call (and one ret) each time you need the sequence, and that is basically an "executable LZ"; I can see how actual LZ would quickly be better, since matches are encoded more efficiently than even near calls. However, for some data LZ is not that great, while a custom encoding could work quite well. None of the stuff I ever wrote assumed anything more than DOS, so I wouldn't have been able to depend on compression routines in Windows.
    29 replies | 1696 view(s)
  • introspec's Avatar
    3rd June 2020, 01:54
    I think some people made use of tricks like this. I have a lot of experience with older computers; for them, data compression pretty much did not exist. I'd love to be proved wrong here, but I'd be very surprised if any of the 1980s machines had anything of the kind in their ROMs.
    29 replies | 1696 view(s)
  • introspec's Avatar
    3rd June 2020, 01:51
    I think that there are two approaches to making a compressed intro. The first, more common one, would be to compress your well-tuned code so that a bit of extra squeeze can be achieved. This is a very traditional strategy, but it is not the only one. The second strategy is to design your data structures and also your code to help the compressor. E.g. often in a compressed intro a short loop can be replaced by a series of unrolled statements - an insane strategy in a size-optimized world, but quite possibly a viable approach if you know that the intro will be compressed. A complete paradigm shift is needed in this case, of course.
    29 replies | 1696 view(s)
  • introspec's Avatar
    3rd June 2020, 01:46
    1) I know some neat examples of Z80 decompressors. I am not aware of any systematic lists. I recently did some reverse-engineering of ZX Spectrum based 1Ks. About one third of them were packed; the most popular compressors seemed to be ZX7, MegaLZ and BitBuster (in order of reducing popularity, note that the respective decompressor sizes are 69, 110 and 88 bytes). 2) Maybe yes, but the large influence of the decompressor size means that data format becomes a lot more important than usual. I think that this implies a lot of scope for adaptivity and tricks.
    29 replies | 1696 view(s)
  • redrabbit's Avatar
    3rd June 2020, 01:40
    Hi! I remember a zip program which can create a zip file with the same CRC/size no matter where you use that tool. I actually tried to compress a bunch of files with 7zip 16.04 (64 bits) and zip 3.00, in Linux and in Windows, but the final files don't have the same size; I even tried storing the files, but I get different results.
    Example:
      wine zip.exe -rq -D -X -0 -A testwindows.zip *.txt
      zip -rq -D -X -0 -A testlinux.zip *.txt
      md5sum *.zip
      725d46abb1b87e574a439db15b1ba506 testlinux.zip
      70df8fe8d0371bf26a263593351dd112 testwindows.zip
    As I said, I remember a zip program (I don't know the name) whose author said that the result was always the same regardless of the platform (win, linux...).
    3 replies | 145 view(s)
  • introspec's Avatar
    3rd June 2020, 01:40
    Frankly, I do not think Crinkler (or similar tools) are very relevant to this thread. You are right that there could be improvements to the decompressor, but I was trying to say that you won't get LZMA into a sub-50-100-byte decompressor, so although it is an amazing tool for 4K or 8K intros, it is just a different kind of tool. Your idea to only have a match length of 2 is cool (although I need to try it in practice to see how much ratio one loses in this case). The smallest generic LZ77 on Z80 that I know has an 18-byte decompression loop, so your 17-byte loop would be interesting to see - have you published it anywhere? In any case, I am working on a small article about such mini-decompressors and am definitely looking forward to anything you will write. I mainly code on Z80, so I do not know much about prefix emitters. Can you point to any discussion of what they can look like?
    29 replies | 1696 view(s)
  • maadjordan's Avatar
    3rd June 2020, 01:33
    maadjordan replied to a thread WinRAR in Data Compression
    As WinRAR does not support compressing with 7-Zip and its plugins, would you kindly provide a reduced version of your plugins for extraction only? Many thanks.
    185 replies | 129952 view(s)
  • maadjordan's Avatar
    3rd June 2020, 01:32
    maadjordan replied to a thread WinRAR in Data Compression
    :)
    185 replies | 129952 view(s)
  • Darek's Avatar
    3rd June 2020, 00:33
    Darek replied to a thread Paq8pxd dict in Data Compression
    I've tested the best options for Byron's dictionary on the files of 4 corpuses. It was made for the paq8pxd v85 version. Of course I realise that it could work only for some versions, but for now it looks like it works. The best results are for the Silesia corpus -> 74KB of gain, which is worth using; for the other corpuses the gains are smaller, but there is always something. Files not mentioned below didn't get any gain due to the use of the -w option or -exx.
    file: option
    SILESIA
      dickens: -e77,dict
      mozilla: -e26,dict
      osdb: -w
      reymont: -w
      samba: -e133,dict
      sao: -w
      webster: -e373,dict
    TOTAL Silesia savings = 74'107 bytes
    CALGARY
      book1: -e47,dict
      book2: -e43,dict
      news: -e97,dict
      paper2: -e34,dict
      progp: -e75,dict
      Calgary.tar: -e49,dict
    TOTAL Calgary savings = 1'327 bytes
    Calgary.tar savings = 3'057 bytes
    CANTERBURY
      alice29.txt: -e38,dict
      asyoulik.txt: -e53,dict
      lcet10.txt: -e54,dict
      plrabn12.txt: -e110,dict
      Canterbury.tar: -e95,dict
    TOTAL Canterbury savings = 873 bytes
    Canterbury.tar savings = 1'615 bytes
    MAXIMUM COMPRESSION
      world95.txt: -e22,dict
    TOTAL Maximum Compression savings = 1'449 bytes
    Due to all these settings and changes the Maximum Compression score for paq8pxd v89 is below 6'000'000 bytes! First time ever! (w/o using the tarball option; compressing the tar file got 5'993'762 bytes)
    922 replies | 315667 view(s)
  • Shelwien's Avatar
    2nd June 2020, 23:27
    Windows has some preinstalled compression algorithms actually (deflate, LZX/LZMS): http://hugi.scene.org/online/hugi28/hugi%2028%20-%20coding%20corner%20gem%20cab%20dropping.htm I wonder if same applies to other platforms? Maybe at least some relevant code in the ROM?
    29 replies | 1696 view(s)
  • Gotty's Avatar
    2nd June 2020, 18:18
    Gotty replied to a thread paq8px in Data Compression
    Please specify what "digits" mean. Do you mean single-digit ASCII decimal numbers one after the other with no whitespace, like "20200602"? I'm not sure what you mean by "in detail". It would not be easy to demonstrate in a forum post what paq8px does exactly. Especially because paq8px is heavy stuff - it needs a lot of foundation. I suppose you did some research, study and reading since we last met and you would like to dig deeper? If you need real depth, you will need to fetch the source code and study it. However if you have a specific question, feel free to ask it here.
    1864 replies | 539681 view(s)