Activity Stream

  • Gotty's Avatar
    Today, 20:23
    Gotty replied to a thread paq8px in Data Compression
    @Eppie, in the root folder there is a test.cpp file you created earlier. It looks like it is still under development (?). Could you please finalize it sometime? For now I think I'm gonna remove it so compilation is simpler (= "compile all cpp files"). Placing it in a "test" subfolder would also work, and it looks like you may already have a test folder, since one is excluded in .gitignore. Can we place it there?
    1862 replies | 539593 view(s)
  • Gotty's Avatar
    Today, 20:14
    Gotty replied to a thread paq8px in Data Compression
    Usually we provide exes for the public with no specific instruction set and let the CPU dispatcher do its thing at runtime. In order to be able to publish a general build for the next release again, I will modify the current behavior a little bit.
    1862 replies | 539593 view(s)
  • Gotty's Avatar
    Today, 20:05
    Gotty replied to a thread paq8px in Data Compression
    Sorry that those changes caused you inconvenience. Folder compression was not removed but changed. The pre-v137 solution was unfortunately not deterministic, and that caused headaches. Also, nobody seemed to use folder compression at that time. Darek, for example, uses pre-tarred files, which also gives deterministic results. Since v137 you'll need to create a list of the files you wish to add to the archive. You can build this list of files ahead of time and use it for all your subsequent tests. Since the list is fixed, the result is deterministic. Use "paq8px.exe -9 @listoffiles.txt" for compression. See the command line help for more. Or if you prefer a GUI solution, use https://github.com/moisespr123/PAQCompress. I hope this helps.

    The default compression (level 5) was removed because *really* nobody used it. The default compression level was only there because with drag and drop you cannot specify a compression level yourself. And level 5 is safe enough: even if you don't have the necessary gigabytes of RAM it would still run. But no one really uses level 5.

    Creating the archive in the same place as the original file is a no-go. When I try to compress the same file with different options and run the tests simultaneously, one archive overwrites the other. I can understand that with the drag and drop functionality the best for you is when the file is created in the original folder and not on your desktop, but when using the command line and running parallel tests the optimal default archive location is the location where you run your command, not where the original file resides. But this is just the default. You can still specify any destination folder for the archive on the command line, so having the archive in the folder of the original file is still easily achievable. It's just not the default.

    @To all: is there anyone else who needs any of the above functionality to be reverted back to how it was before?

    Related: https://github.com/hxim/paq8px/issues/111
    1862 replies | 539593 view(s)
  • Gotty's Avatar
    Today, 16:52
    Gotty replied to a thread paq8px in Data Compression
    Thank you Kaitz! I'm importing your fix from paq8pxd_v76.
    1862 replies | 539593 view(s)
  • Gotty's Avatar
    Today, 16:51
    Gotty replied to a thread paq8px in Data Compression
    Thank you for submitting this issue. It is fixed in the next release.
    paq8px_v183fix1.exe -8 P1050204.JPG = 4891731 -> 4718331
    paq8px_v187fix1.exe -8 P1050204.JPG = 4891731 -> 3549528
    1862 replies | 539593 view(s)
  • Gotty's Avatar
    Today, 14:52
    Oh, I see. It looks like your use case is significantly different from mine. Let me share what exactly I did.

    For every file (49 in total) in the Calgary, Canterbury, MaximumCompression and Silesia corpora, plus some images from Darek, plus enwik8, I fetched the consecutive bytes ("strings"), hashed them using different algorithms and counted how many items are to be stored in hash tables with sizes in the range of 16..22 bits (64K..4M). I ran the same experiment for 2..8-byte strings, just to have a more global view.

    Results for 4-byte strings and 128K hash tables: for example, in calgary\bib there are 19086 distinct 4-byte strings. Fitting them into a 128K (17-bit) hash table looks simple and easy, but we need to squeeze those 32-bit (4-byte) values into their 17-bit representation "somehow". That is what hashing does. It turns out that assigning 19086 values *randomly* to 128K buckets will produce some collisions (the birthday problem) even if we have plenty of room. We are to expect around 1300 cases where 2-3 different strings are assigned to the same hash table bucket. That is a 0.069 collision rate on average. The worst collision rate to be expected for 4-byte strings with a 128K hash table comes from silesia\xml. It has 125899 distinct 4-byte strings. Having 131072 buckets and assigning our strings randomly to the buckets, we are to expect 0.357 collisions on average.

    So I established these expected collision rates for every setup, then measured the actual collision rate for the different hashing algorithms. I simply normalized at the end: my metric for cross-scenario comparison is the actual collision rate divided by the expected collision rate. A value less than or around 1.0 means the hash algo performed well enough. I averaged these metrics to have a total, and also across dimensions (the string sizes and hash table sizes), to verify that this approach gives consistent results "no matter in which scenario".

    Remarks:
    - Hashing is not a random assignment to buckets, because the input strings make it deterministic, but having such an expected value for every setup we can compare the hashing algorithms across the setups. I cannot just add up the number of collisions for every setup; that would give a strong bias towards setups with very low expected collision rates.
    - There are other metrics for measuring hash quality, but I think the above is useful for finding the best algo for our scenario: which hash algorithms perform equally well with different (small) string sizes and hash table sizes.
    - With proper collision detection (and by chaining the strings, of course) any "good enough" hashing works equally well; you would then just choose the faster/simpler one.
    - A collision is not necessarily a bad thing. In lucky cases it can even be advantageous. If, for instance, "ant", "fire" and "police" produced the same base hash value, and so shared the same statistics (since you don't have collision detection), and your text contained the space or "m" as the next character (as in "antman", "fireman" and "policeman", or simply "ant ", "fire " and "police "), then you would be lucky with this collision.

    We have the following differences:
    - I ran the tests on every file separately. You tar your input files.
    - For me a "collision" is when multiple items are to be assigned to the same bucket. For you, a collision may occur multiple times: one item overrides an old one, then a newer occurrence of the old string overwrites the new one, and this can go on multiple times. As I understand it, you don't have collision detection (right?).
    - What else? What do you have in your compression algorithm that would explain the result you experienced?

    Attached are the results for calgary/bib for 4-byte strings and 128K hash table size. The bold items are the hash algorithms with the different magic numbers you tried, where you hash the dword at once, not byte-by-byte.
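    For reference, a minimal sketch in C (mine, not Gotty's actual test code) of the random-assignment ("birthday problem") estimate behind the figures quoted above; the file names and string counts are the ones from the post, everything else is an assumption:

    #include <math.h>
    #include <stdio.h>

    /* Expected number of "excess" items when n distinct strings are assigned
       uniformly at random to m buckets: n minus the expected number of
       occupied buckets, m * (1 - (1 - 1/m)^n). */
    static double expected_collisions(double n, double m) {
        return n - m * (1.0 - pow(1.0 - 1.0 / m, n));
    }

    int main(void) {
        double m = 131072.0;                            /* 128K buckets (17 bits) */
        double bib = expected_collisions(19086.0, m);   /* calgary\bib, 4-byte strings */
        double xml = expected_collisions(125899.0, m);  /* silesia\xml, 4-byte strings */
        /* Dividing by the number of distinct strings reproduces the quoted rates. */
        printf("bib: %.0f expected collisions, rate %.3f\n", bib, bib / 19086.0);
        printf("xml: %.0f expected collisions, rate %.3f\n", xml, xml / 125899.0);
        /* The quality metric in the post is then: measured / expected collisions. */
        return 0;
    }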
    18 replies | 25529 view(s)
  • Dresdenboy's Avatar
    Today, 14:22
    I found my latest version of the prefix emitter. It's even just 11 bytes in the main loop (x86, 16-bit), without proper setup. Does the Z80 have bitfield instructions? I guess not. I last programmed it 28 years ago. In that case the typical shift, test, reload sequence would be required.
    29 replies | 1674 view(s)
  • Shelwien's Avatar
    Today, 13:18
    Shelwien replied to a thread Zstandard in Data Compression
    Incidentally, it was also added by WinZip as method 93: https://support.winzip.com/hc/en-us/articles/115011811307-Unsupported-Zip-format
    437 replies | 131189 view(s)
  • algorithm's Avatar
    Today, 09:32
    @smjohn1 You only have a variable-length offset. Your smaller offset only covers distances up to 127. The compression ratio will not significantly improve. In my case the smaller version had a 2-bit LL, a 2-bit ML and an 11-bit offset (see the sketch below). Also, for fast modes, a 64 KB window in LZ4 is too big; the 4K hash table is the bottleneck for compression ratio. LZ4 is optimized for a 32 KB cache size. I think there could be an improvement in compression ratio if there were multiple versions for different cache sizes. Some mobile phones even have 128 KB of L1. But I don't expect a big improvement.
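    Just to illustrate the token format described above, a hypothetical packing into a 16-bit word (the exact bit layout, and using the leftover bit as an "extended" flag, are my assumptions, not the poster's actual code):

    #include <stdint.h>

    /* 16-bit token: 1-bit extension flag, 2-bit literal length, 2-bit match
       length, 11-bit offset (2 + 2 + 11 = 15 bits, one bit left over). */
    static uint16_t pack_token(uint32_t ll, uint32_t ml, uint32_t off, int ext) {
        return (uint16_t)((ext << 15) | ((ll & 3u) << 13) | ((ml & 3u) << 11) | (off & 0x7FFu));
    }

    static void unpack_token(uint16_t t, uint32_t *ll, uint32_t *ml, uint32_t *off, int *ext) {
        *ext = t >> 15;
        *ll  = (t >> 13) & 3;
        *ml  = (t >> 11) & 3;
        *off = t & 0x7FF;
    }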
    8 replies | 242 view(s)
  • algorithm's Avatar
    Today, 09:10
    I have designed one using XORs and two's complements that emulates CMOV, but I didn't implement it. And the code is from 5 years ago.
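    For reference, the usual XOR / two's-complement trick for emulating CMOV looks roughly like this (a generic C sketch, not the poster's actual code):

    #include <stdint.h>

    /* Branchless select: returns a if cond is non-zero, else b.
       -(cond != 0) is either all-ones or zero (two's complement), so the
       masked XOR term is either (a ^ b) or 0. */
    static inline uint32_t select_u32(int cond, uint32_t a, uint32_t b) {
        uint32_t mask = (uint32_t)-(cond != 0);
        return b ^ ((a ^ b) & mask);
    }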
    8 replies | 242 view(s)
  • Jarek's Avatar
    Today, 09:01
    Jarek replied to a thread Zstandard in Data Compression
    Zstd was added to ZIP a few days ago with method ID 20: https://pkware.cachefly.net/webdocs/APPNOTE/APPNOTE-6.3.7.TXT https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT
    437 replies | 131189 view(s)
  • Kirr's Avatar
    Yesterday, 17:37
    FQSqueezer is in! But it's only tested on small data due to its extreme slowness. Thanks for the suggestion, dnd! It's also worth mentioning that FQSqueezer is designed for FASTQ data; therefore its performance in my benchmark should not be taken as indicative of its actual performance on its intended data.
    27 replies | 3143 view(s)
  • Kirr's Avatar
    Yesterday, 17:32
    I guess this discussion is about deflate-based libraries. Since I benchmark command line tools rather than bare libraries, each lib would need a command line interface before it can be tested. I already added p7zip and libdeflate's streaming gzip replacement to my to-do benchmark queue. If you have more suggestions in mind, please post here with links. Thanks!
    9 replies | 602 view(s)
  • Darek's Avatar
    Yesterday, 17:28
    Darek replied to a thread Paq8sk in Data Compression
    122'505'372 - enwik9, -x15 -w -e1,english.dic, by Paq8sk19, time 134115.35 s - not as bad a time as for paq8sk13. The score is very close to my estimate.
    98 replies | 8837 view(s)
  • lz77's Avatar
    Yesterday, 16:56
    ((a*271+b)*271+c)*271+d is even worse than 506832829, see my message #11...
    18 replies | 25529 view(s)
  • Jyrki Alakuijala's Avatar
    Yesterday, 16:38
    How do they relate to 506832829?
    18 replies | 25529 view(s)
  • smjohn1's Avatar
    Yesterday, 16:37
    True. My tests of the simpler version (see the attached lz4.c above) show about 2/3 of the original decompression speed. But in many applications decompression speed is already way higher than that of the other components, so even at 2/3 it wouldn't be a problem. Thus there is value in improving compression ratio and speed, if the improvement is significant. My simple version of course only shows about a 1% to 2% gain in compression ratio, which is not significant for most applications. That is why I think it might be worthwhile to explore further gains.
    8 replies | 242 view(s)
  • smjohn1's Avatar
    Yesterday, 16:31
    You're definitely welcome to change my modifications below to see if there are good improvements.
    8 replies | 242 view(s)
  • smjohn1's Avatar
    Yesterday, 16:29
    Here is a simpler version: the window size is reduced to 32K (I cannot upload lz4.h - why is the .h extension not allowed?), and it uses a 1+7 format for the offset (1 byte for smaller offsets and 2 bytes for the others). So the only changes are in lines 1011, 1016-1018 (for encoding) and 1769, 1969 (for decoding). It works for pic, news and trans in the Calgary corpus. Do any other places need to be taken care of? (Just for fast mode.)
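    One plausible reading of that "1+7" offset format, as a C sketch (my own interpretation, not the attached lz4.c; the exact bit layout is an assumption): the top bit of the first byte signals whether a second byte follows, leaving 7 or 15 bits for the offset, which also matches the 32K window.

    #include <stdint.h>

    /* Encode: offsets up to 127 take one byte (flag bit clear); larger offsets,
       up to 32767 (32K window), take two bytes with the flag bit set. */
    static uint8_t *write_offset(uint8_t *op, uint32_t offset) {
        if (offset < 128) {
            *op++ = (uint8_t)offset;                  /* 0xxxxxxx            */
        } else {
            *op++ = (uint8_t)(0x80 | (offset >> 8));  /* 1xxxxxxx: high bits */
            *op++ = (uint8_t)(offset & 0xFF);         /* low 8 bits          */
        }
        return op;
    }

    /* Decode: mirror of the above. */
    static const uint8_t *read_offset(const uint8_t *ip, uint32_t *offset) {
        uint32_t b = *ip++;
        if (b & 0x80) {
            *offset = ((b & 0x7F) << 8) | *ip++;
        } else {
            *offset = b;
        }
        return ip;
    }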
    8 replies | 242 view(s)
  • Bulat Ziganshin's Avatar
    Yesterday, 16:24
    Did you try using CMOV and other branchless approaches?
    8 replies | 242 view(s)
  • pklat's Avatar
    Yesterday, 14:59
    pklat replied to a thread deflate in Data Compression
    Thanks Shelwien, I understand (better) now.
    2 replies | 153 view(s)
  • lz77's Avatar
    Yesterday, 12:49
    See my new CalcHash and compare the results with the ones in my message #11:

    function CalcHash(dw: dword): dword;
    var a,b,c,d: byte;
    begin
      a:=dw and $ff;
      b:=dw shr 8 and $ff;
      c:=dw shr 16 and $ff;
      d:=dw shr 24 and $ff;
      Result:=(((a*271+b)*271+c)*271+d) and $1FFFF;
    end;

    enwik8: 40.579%
    enwik9: 36.318%
    silesia.tar: 37.434%

    It's 15-20% slower and the ratio is always worse compared with 123456789...
    18 replies | 25529 view(s)
  • Shelwien's Avatar
    Yesterday, 12:44
    Shelwien replied to a thread deflate in Data Compression
    Do you mean, why do other deflate libraries produce different deflate data than zlib?

    Any LZ format has multiple ways to parse the data into strings (e.g. "abc" = "a"."b"."c" = "ab"."c" = "a"."bc"), plus a string can be encoded as a reference to any of its previous occurrences within the window, which adds even more options. Deflate is a relatively complicated version of the LZ algorithm - it uses Huffman coding for the data, then RLE + Huffman for the length tables of the primary Huffman codes. So there are multiple options for splitting the data into blocks, then parsing each block, then encoding each block header.

    It's actually very hard to find the one specific way of parsing the data that would result in the best compression - there are too many combinations, so just trying all of them is impossible, and even iterative optimization methods can run for a week and still keep finding small improvements. Thus all practical compressor implementations can only rely on various heuristics to run at reasonable speed, so it's not surprising that different implementations produce different outputs - it's pretty unlikely that different developers would make the same intuitive choices dozens of times.
    2 replies | 153 view(s)
  • algorithm's Avatar
    Yesterday, 11:59
    I had made an LZ4 variant with a 32 KB window and an additional, smaller 2-byte match sequence (instead of the normal 3). While it had better compression, decompression was about half as fast due to a single unpredictable branch. Your method will likely have the same performance penalty (unpredictable branches are expensive).
    8 replies | 242 view(s)
  • pklat's Avatar
    Yesterday, 09:42
    pklat started a thread deflate in Data Compression
    Apart from the compression level and the size of the sliding window, what makes a zip encoder produce different deflate data from other implementations?
    2 replies | 153 view(s)
  • Cyan's Avatar
    Yesterday, 05:05
    Cyan replied to a thread Modifying LZ4 offset in Data Compression
    A more complete source code package will probably prove helpful. There are too many ways such a modification could introduce flaws, so it requires paying attention to implementation details. Good news is, you have tests to quickly check the outcome.
    8 replies | 242 view(s)
  • smjohn1's Avatar
    Yesterday, 04:40
    I am trying a new encoding method for LZ4's match offset, which varies between 1 byte and 3 bytes, still byte aligned. Of course the hope is that most offsets only need 1 byte, giving a better compression ratio, and maybe better compression speed as well. The implementation is a simple modification of the LZ4 v1.9.2 package: in LZ4_compress_generic(), the two places with

    LZ4_writeLE16(op, (U16)offset); op+=2;
    and
    LZ4_writeLE16(op, (U16)(ip - match)); op+=2;

    are replaced with the new encoding method (just ignore the high-compression part), and the two places in LZ4_decompress_generic() with

    offset = LZ4_readLE16(ip); ip+=2;

    are replaced with the new offset decoding rules (a bit complicated). I thought that would do. But while some files can be decompressed successfully, some cannot, with errors such as

    Invalid Checksum : Decoding error at pos 152033 (block 0, sub 1, pos 152033)

    and some with "Error 66 : Decompression error : ERROR_decompressionFailed" (on enwik8). So obviously somewhere else needs to be modified as well. Could anyone shed some light on where to make changes? Thanks in advance.
    8 replies | 242 view(s)
  • SolidComp's Avatar
    Yesterday, 02:26
    SolidComp replied to a thread Brotli in Data Compression
    Strange, I don't get anywhere near that. I didn't know there was a -12 level, and when I run it I get 34,768, nowhere near as good as 7-Zip's gzipper. What's your command line syntax?
    261 replies | 82832 view(s)
  • Gotty's Avatar
    4th June 2020, 23:52
    Thank you! That is unfortunately not optimal for this kind of hash. This hash works best with small values: the larger the values get, the worse it performs. That is probably why a smaller multiplier worked better for you (so far). I've just run some checks, and I got the best results on the corpuses using a hash function originally from paq8a, still used in paq8pxd. Second place goes to crc32c and third to Robert Sedgwick's hash function. The fastest of the top 3 is paq8a's wordmodel hash, so that is the absolute winner. Could you please try something similar, like ( ((a*271+b)*271+c)*271+d ), where a,b,c,d are the four bytes? Keep the lower bits (and 0x1FFFF) at the end. If you do timings: is it much slower in your case? (I'm still running tests, there are many candidates.)
    18 replies | 25529 view(s)
  • ScottX's Avatar
    4th June 2020, 22:53
    It could be because of the UPX packer. You can try unpacking the .sfx with UPX (it's packed because of the size) with the command:
    UPX -d rz.sfx
    But you get approx. 32 kB instead of 12 kB.
    208 replies | 79860 view(s)
  • JamesB's Avatar
    4th June 2020, 18:48
    I just tried the latest version. It's as I recall - very fast but very light levels of compression. It also has no decompression support at all, so obviously it can't compete on that front. On the above file with the same machine, level 1 only. Redoing these encode timings as I realised I had I/O time in there, which doesn't help the ultra-fast ones:

    Tool        Encode     Decode     Size
    ------------------------------------------
    vanilla     0m1.657s   0m0.546s   42298786
    intel       0m0.569s   0m0.524s   56046821
    cloudflare  0m0.974s   0m0.470s   40867185
    jtkukunas   0m1.106s   0m0.392s   40867185
    ng          0m0.695s   0m0.397s   56045984
    zstd (gz)   0m1.582s   0m0.475s   42298786
    libdeflate  0m0.838s   0m0.235s   39597396
    libslz      0m0.482s   N/A        55665941

    So yes, it's the fastest. libslz has levels up to -9, but they're barely any different in size/speed.
    9 replies | 602 view(s)
  • suryakandau@yahoo.co.id's Avatar
    4th June 2020, 16:51
    Maybe you could test paq8sk19 with the -x14 option. Thanks. ...
    98 replies | 8837 view(s)
  • Dresdenboy's Avatar
    4th June 2020, 16:06
    We could use "~" anywhere for "on average" (some thread compression ;)).

    Saukav is probably just one encoding-format example (zx7) of an actually global optimization problem (any compression method + decomp stub size on the target platform + decomp stub adaptions + other adaptions). I'm doing a lot of analysis of x86 intros (which methods work best, could they be stripped down, etc.). For example (and you surely saw that when doing your own small decompressors), a more or less generic packer uses a lot of encodings, code paths etc. to be good overall. But for small files (and there's also only little research in this regard, e.g. about compressed message sizes of the eth protocol etc.) some specific subset with very little case handling might fit better. In small files (like 256B intros) there are a lot of 1B (short opcodes) and many 2B matches, but very few 3+ matches. So any complex handling of the latter ones, like interleaved gamma codes etc., might be just too much. Also the probabilities of literal run lengths look very different here. That's why I also think there is room for improvement. :)

    This was more about being platform agnostic, to first find the "overhead" of algorithms for finding their models, tuning to the data, or simply encoding literals and matches. Then would follow the next step of adding the decomp stub sizes, as COM might mean 16B, "low" memory (on PC at least, with 40+B for accessing >1MB).

    Some current list of tested compressors with example compressed sizes:
    lzw: 227/256 B
    puc: 221/256 B
    exo: 201/256 B
    zx7: 231/256 B
    hrust: 223/256 B
    pkf: 221/256 B
    pkft: 227/256 B
    muc: 251/256 B
    apc: 214/256 B
    apack: 217/256 B
    paq1: 189/256 B
    lzma: 292/256 B
    zlib: 246/256 B
    gzip: 258/256 B
    bz2: 288/256 B
    lz4: 261/256 B
    paq1ss: 188/256 B
    paq3n: 188/256 B
    paq6v2: 188/256 B
    paq8f: 185/256 B
    paq8h: 194/256 B
    paq8l: 204/256 B
    paqar: 179/256 B

    If you have some interesting ones, I'd be happy to add them. So far I also want to add oneKpaq, as it has a 128B decompressor.

    More on the other topics later. E.g. the prefix emitter: on x86 I might try to use long SSE instructions. They share a 0x0f prefix. This is easy to emit with little code. On Z80 there are several prefixes (FD, ED etc.), which are rather common. It would be easy to write such a compressor with the prefixes encoded in 1-2 bits.
    29 replies | 1674 view(s)
  • lz77's Avatar
    4th June 2020, 13:46
    inb1 is a pointer to the current position in the source data. inb1^ is the 4 bytes at the current position; it has dword (uint32) type. Yes, but if there were a Hutter Prize for whoever can compress enwik8 to 40% with pure LZ77, I would get it. :)
    18 replies | 25529 view(s)
  • Darek's Avatar
    4th June 2020, 12:13
    Darek replied to a thread Paq8sk in Data Compression
    I've started to test paq8sk19; paq8sk22 or paq8sk23 is generally too slow... The first 200-300 MB generally (historically) goes well, but after that the program often starts to use a very high amount of memory and often blocks the whole computer - as I said, I have "only" 32 GB of RAM.
    98 replies | 8837 view(s)
  • SolidComp's Avatar
    4th June 2020, 12:13
    James, what about SLZ? That's the fastest gzip implementation on earth as far as I know. It doesn't compress as well though. But it's incredibly fast, and super light in terms of CPU and memory consumption.
    9 replies | 602 view(s)
  • Jyrki Alakuijala's Avatar
    4th June 2020, 00:32
    If we didn't fix that yet, we will fix it very soon now.
    156 replies | 37463 view(s)
  • vteromero's Avatar
    3rd June 2020, 17:51
    vteromero replied to a thread VTEnc in Data Compression
    If anyone is interested, here's a blog post about encoding parameters in VTEnc library: https://vteromero.github.io/2020/06/03/encoding-parameters-in-vtenc-library.html
    22 replies | 4673 view(s)
  • Gotty's Avatar
    3rd June 2020, 17:39
    Yes. And it needs to be large. A large, odd, "random" number.
    18 replies | 25529 view(s)
  • Gotty's Avatar
    3rd June 2020, 17:38
    Looks good so far. What is supplied to this function as a parameter? I don't think the number of bits in the multiplier has an effect on the performance of the multiplication - the CPU does not actually look for the 1 bits in the multiplier to do shifts and additions. An explanation is here. 40.008% -> 39.974% is a 0.034% gain; I would not call it noticeably better. :-)
    18 replies | 25529 view(s)
  • Jyrki Alakuijala's Avatar
    3rd June 2020, 17:28
    Jyrki Alakuijala replied to a thread Brotli in Data Compression
    There are 120 transforms. Appendix B in rfc7932. A custom 'shared brotli' dictionary can bring its own transforms.
    261 replies | 82832 view(s)
  • Jyrki Alakuijala's Avatar
    3rd June 2020, 17:26
    Jyrki Alakuijala replied to a thread Brotli in Data Compression
    Zstd's two improvements over zlib are a larger backward-reference window and ANS instead of prefix coding. ANS does not help much (0.5 %) if you have a lot of literals -- in brotli's design experiments ANS was a (small) density loss over prefix coding. Larger backward references don't help much if you have a small data size (particularly if it is around 32 kB or less). So, no big surprise that zstd does not compress better than zlib here. Brotli brings three improvements that do help for small data: context modeling, a static dictionary and super-cheap swapping back to previous entropy codes. Optimized JavaScript is not as easy to compress as many other data, and often we see only ~10 % further savings from moving from zlib to brotli.
    261 replies | 82832 view(s)
  • Stefan Atev's Avatar
    3rd June 2020, 16:16
    I guess ROM is the only thing you can kind of depend on to be your "dictionary"; even then, you'd be making assumptions about the machine you're running on that may be a bit too specific (e.g., all the 6502 clones I ever programmed were Pravetz 82 clones, which definitely had their own ROM, different from an Apple II's). It's exactly like ROP exploits - they work on specific versions of OSes, not generically on a platform. Though maybe for non-malicious code there's less risk of somebody patching what you depended on. People used to check the code pointed to by an interrupt handler to try to ensure that they were not running under a debugger, or that certain TSRs were not installed, and that ultimately breaks when what you're checking legitimately changes...
    29 replies | 1674 view(s)
  • lz77's Avatar
    3rd June 2020, 13:19
    I tested both numbers in my LZ77-type compressor:

    magic number:         123456789 | 506832829
    enwik8:                 40.549% | 40.573%
    enwik9:                 36.305% | 36.314%
    silesia.tar:            37.429% | 37.423%
    lz4_win64_v1_9_2.exe:   42.341% | 42.359%

    The humorous constant 123456789 gives noticeably better compression. Your number is better only on silesia.tar (by 0.006%). I think that's because silesia.tar contains very specific img files.

    > I believe the number doesn't need to be prime, but needs to be odd and have a good distribution of 0s and 1s.

    Yes, the number 123456789 includes sequences of 1, 2, 3 and 4 1-bits. By the way: when I used Knuth's number 2654435761, my algorithm with the best ratio was compressing enwik8 to 40.008%. After I changed Knuth's number to 123456789, my algorithm overcame the psychological frontier and showed 39.974%. :_yahoo2: Under 40% on enwik8 with only a hash table of 128K cells, without match search, source analysis & additional compression! Maybe after improvement the algorithm will show a compression ratio < 40% with a hash table of 64K cells...
    18 replies | 25529 view(s)
  • Jyrki Alakuijala's Avatar
    3rd June 2020, 12:14
    Are those numbers better or worse than with my personal magic number, 506832829? I believe the number doesn't need to be prime, but it needs to be odd and have a good distribution of 0s and 1s.
    18 replies | 25529 view(s)
  • lz77's Avatar
    3rd June 2020, 11:19
    Bitte, this is a piece of my code:

    const
      TABLEBITS = 17;
      TABLESIZE = 1 shl TABLEBITS;
    var
      table: array of dword;
    ................................
    // Calculate hash by 4 bytes from inb1^
    function CalcHash(dw: dword): dword; assembler;
    asm
      mov ecx,123456789
      mul ecx
      shr eax,32-TABLEBITS
    end; { CalcHash }

    I believe that the fewer 1 bits there are in the magic constant, the faster the multiplication will be performed, which will increase the compression speed.
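    For readers who don't use Pascal, the inline asm above is equivalent to the following C (my sketch, assuming dw is the 32-bit dword read from inb1^): the multiplication wraps modulo 2^32 and the top TABLEBITS bits of that 32-bit product are kept.

    #include <stdint.h>

    #define TABLEBITS 17

    /* Multiplicative hash: same effect as "mul ecx; shr eax,32-TABLEBITS". */
    static inline uint32_t calc_hash(uint32_t dw) {
        return (uint32_t)(dw * 123456789u) >> (32 - TABLEBITS);
    }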
    18 replies | 25529 view(s)
  • pklat's Avatar
    3rd June 2020, 08:49
    File timestamps are also stored, and IIRC they differ between NTFS and other filesystems.
    3 replies | 145 view(s)
  • Shelwien's Avatar
    3rd June 2020, 06:50
    Windows and Linux actually have different file attributes, so you'd need a hacked zip which doesn't store attributes; otherwise it normally won't happen. You can try using a zip built under cygwin/msys2 on the Windows side, but even these would have different default unix attributes (e.g. cygwin shows 0770 for Windows files), and possibly also custom defines for cygwin which would make them use winapi.
    3 replies | 145 view(s)
  • Shelwien's Avatar
    3rd June 2020, 06:43
    > wouldn't have been able to depend on compression routines in Windows. I actually mean chunks of existing code, like https://en.wikipedia.org/wiki/Return-oriented_programming#Attacks https://github.com/JonathanSalwan/ROPgadget#screenshots
    29 replies | 1674 view(s)
  • ivan2k2's Avatar
    3rd June 2020, 06:06
    1) Try to compress non-textual files and check the results.
    2) Try to compress your text files with the -ll or -l option and check the results.
    3 replies | 145 view(s)
  • Stefan Atev's Avatar
    3rd June 2020, 04:49
    I can see that, though it goes against my instincts :) I have seen people extract common instruction sequences into subroutines even if they were pretty arbitrary and logically unrelated; you eat 3 bytes to do a call (and one ret) each time you need the sequence, and that is basically an "executable LZ". I can see how actual LZ would quickly be better, since matches are encoded more efficiently than even near calls. However, for some data LZ is not that great, while a custom encoding could work quite well. None of the stuff I ever wrote assumed anything more than DOS, so I wouldn't have been able to depend on compression routines in Windows.
    29 replies | 1674 view(s)
  • introspec's Avatar
    3rd June 2020, 01:54
    I think some people made use of tricks like this. I have a lot of experience with older computers; for them, data compression pretty much did not exist. I'd love to be proved wrong here, but I'd be very surprised if any of the 1980s machines had anything of the kind in their ROMs.
    29 replies | 1674 view(s)
  • introspec's Avatar
    3rd June 2020, 01:51
    I think that there are two approaches to making a compressed intro. The first, more common one, is to compress your well-tuned code so that a bit of extra squeeze can be achieved. This is a very traditional strategy, but it is not the only one. The second strategy is to design your data structures and also your code to help the compressor. E.g. often in a compressed intro a short loop can be replaced by a series of unrolled statements - an insane strategy in a size-optimized world, but quite possibly a viable approach if you know that the intro will be compressed. A complete paradigm shift is needed in this case, of course.
    29 replies | 1674 view(s)
  • introspec's Avatar
    3rd June 2020, 01:46
    1) I know some neat examples of Z80 decompressors. I am not aware of any systematic lists. I recently did some reverse-engineering of ZX Spectrum based 1Ks. About one third of them were packed; the most popular compressors seemed to be ZX7, MegaLZ and BitBuster (in order of reducing popularity, note that the respective decompressor sizes are 69, 110 and 88 bytes). 2) Maybe yes, but the large influence of the decompressor size means that data format becomes a lot more important than usual. I think that this implies a lot of scope for adaptivity and tricks.
    29 replies | 1674 view(s)
  • redrabbit's Avatar
    3rd June 2020, 01:40
    Hi! I remember a zip program which can create a zip file with the same CRC/size no matter where you use the tool. I tried to compress a bunch of files with 7-Zip 16.04 (64-bit) and zip 3.00, on Linux and on Windows, but the final files don't have the same size; I even tried storing the files, but I still get different results. Example:

    wine zip.exe -rq -D -X -0 -A testwindows.zip *.txt
    zip -rq -D -X -0 -A testlinux.zip *.txt

    md5sum *.zip
    725d46abb1b87e574a439db15b1ba506 testlinux.zip
    70df8fe8d0371bf26a263593351dd112 testwindows.zip

    As I said, I remember a zip program (I don't know the name) whose author said that the result was always the same regardless of the platform (Windows, Linux...).
    3 replies | 145 view(s)
  • introspec's Avatar
    3rd June 2020, 01:40
    Frankly, I do not think Crinkler (or similar tools) are very relevant to this thread. You are right that there could be improvements to the decompressor, but I was trying to say that you won't get LZMA into a sub-50-100-byte decompressor, so although it is an amazing tool for 4K or 8K intros, it is just a different kind of tool. Your idea to only have a match length of 2 is cool (although I need to try it in practice to see how much ratio one loses in this case). The smallest generic LZ77 on Z80 that I know of has an 18-byte decompression loop, so your 17-byte loop would be interesting to see - have you published it anywhere? In any case, I am working on a small article about such mini-decompressors and am definitely looking forward to anything you will write. I mainly code on Z80, so I do not know much about prefix emitters. Can you point to any discussion of what they can look like?
    29 replies | 1674 view(s)
  • maadjordan's Avatar
    3rd June 2020, 01:33
    maadjordan replied to a thread WinRAR in Data Compression
    As WinRAR does not support compressing with 7-Zip and its plugins, would you kindly provide a reduced version of your plugins for extraction only? Many thanks.
    185 replies | 129951 view(s)
  • maadjordan's Avatar
    3rd June 2020, 01:32
    maadjordan replied to a thread WinRAR in Data Compression
    :)
    185 replies | 129951 view(s)
  • Darek's Avatar
    3rd June 2020, 00:33
    Darek replied to a thread Paq8pxd dict in Data Compression
    I've tested the best options for Byron's dictionary on the files of 4 corpora. It was made for the paq8pxd v85 version. Of course I realise that it may only work for some versions, but as of now it looks like it works. The best results are for the Silesia corpus -> 74 KB of gain, which is worth using; for the other corpora the gains are smaller, but there is always something. Files not mentioned below didn't get any gain due to the use of the -w option or -exx.

    file: option

    SILESIA
    dickens: -e77,dict
    mozilla: -e26,dict
    osdb: -w
    reymont: -w
    samba: -e133,dict
    sao: -w
    webster: -e373,dict
    TOTAL Silesia savings = 74'107 bytes

    CALGARY
    book1: -e47,dict
    book2: -e43,dict
    news: -e97,dict
    paper2: -e34,dict
    progp: -e75,dict
    Calgary.tar: -e49,dict
    TOTAL Calgary savings = 1'327 bytes
    Calgary.tar savings = 3'057 bytes

    CANTERBURY
    alice29.txt: -e38,dict
    asyoulik.txt: -e53,dict
    lcet10.txt: -e54,dict
    plrabn12.txt: -e110,dict
    Canterbury.tar: -e95,dict
    TOTAL Canterbury savings = 873 bytes
    Canterbury.tar savings = 1'615 bytes

    MAXIMUM COMPRESSION
    world95.txt: -e22,dict
    TOTAL Maximum Compression savings = 1'449 bytes

    Due to all these settings and changes, the Maximum Compression score for paq8pxd v89 is below 6'000'000 bytes! First time ever! (Without using the tarball option; compressing the tar file got 5'993'762 bytes.)
    922 replies | 315652 view(s)
  • Shelwien's Avatar
    2nd June 2020, 23:27
    Windows actually has some preinstalled compression algorithms (deflate, LZX/LZMS): http://hugi.scene.org/online/hugi28/hugi%2028%20-%20coding%20corner%20gem%20cab%20dropping.htm
    I wonder if the same applies to other platforms? Maybe there's at least some relevant code in the ROM?
    29 replies | 1674 view(s)
  • Gotty's Avatar
    2nd June 2020, 18:18
    Gotty replied to a thread paq8px in Data Compression
    Please specify what "digits" mean. Do you mean single-digit ASCII decimal numbers one after the other with no whitespace, like "20200602"? I'm not sure what you mean by "in detail". It would not be easy to demonstrate in a forum post what paq8px does exactly. Especially because paq8px is heavy stuff - it needs a lot of foundation. I suppose you did some research, study and reading since we last met and you would like to dig deeper? If you need real depth, you will need to fetch the source code and study it. However if you have a specific question, feel free to ask it here.
    1862 replies | 539593 view(s)
  • Stefan Atev's Avatar
    2nd June 2020, 18:05
    My experience being with x86 1K intros, this certainly resonates; at the end of the day, the tiny (de)compressor should only be helping you with code - all the data in the intro should be custom-packed anyway, in a way that makes it difficult to compress for LZ-based algorithms. For example, I remember using basically 2bits per beat for an audio track (2 instruments only, both OPL-2 synth); Fonts would be packed, etc. 4K is different, I think there you just have a lot more room. And for 128B and 256B demos, compression is very unlikely to help, I think.
    29 replies | 1674 view(s)
  • Gotty's Avatar
    2nd June 2020, 17:58
    How many bits are used for addressing the hash table (or: how many slots do you have)? How do you exactly implement hashing (do you shift >>)? What do you hash exactly?
    18 replies | 25529 view(s)
  • introspec's Avatar
    2nd June 2020, 17:05
    Yes, I know about Saukav too. I did not have time to do detailed testing of it, but I understand what it does quite well, and it should offer compression at the level of Pletter (likely somewhat better than Pletter) while being fairly compact. I believe that its approach to adaptivity, with multiple specific decompressors offered, is an excellent way to increase the compression "for free". However, I strongly suspect that a better solution must be available, most likely for 1K intros and definitely for 256b intros.

    I can explain what I mean as follows. Suppose you are working on a 1K intro that uses Saukav and at some point you reach the situation where the compressed intro together with the decompressor uses up all available space. Suppose that the average decompressor length is 64 bytes (this is the size of the zx7b decompressor - the origin of Saukav). Then your compressed size is 1024-64=960 bytes. I do not know the exact ratio offered by Saukav, so I'll use Pletter's ratio of 1.975 as a guide. Hence, our intro is actually 960*1.975=1896 bytes long. Let us now consider switching to a better compressor, e.g. Shrinkler, which is LZMA-based and compresses at a level similar to 7-zip. Its ratio on the same small-file corpus that I use for many of my tests is about 2.25. Thus, compressed by Shrinkler, our intro will become 1896/2.25~843 bytes long (I should be saying "on average", but it is very annoying to repeat "on average" all the time, so I assume it implicitly). We saved 960-843=117 bytes, which may sound great, yet in fact is useless. The shortest decompressor for Shrinkler on Z80 is 209 bytes long, so we saved 117 bytes in data and added 209-64=145 bytes in decompressor, i.e. lost 28 bytes overall.

    The point is, when you are making a 1K intro, Shrinkler will lose to Saukav (to ZX7, MegaLZ, to basically any decent compressor with a compact enough decompressor). When working with 4K of memory these concerns become pretty much irrelevant, and you can use Crinkler, Shrinkler or any other tool of your choice. But at 1K the ratio becomes less of a concern, as long as it is not completely dreadful, and the decompressor size starts to dominate the proceedings. For 256b intros the situation is even more dramatic. I made one such intro using ZXmini (decompressor size 38 bytes) and found it (the decompressor) a bit too large. More can be done for sure.

    So, looking at your graph, I do not know what your target size is, but for anything <=1K, trust me: without the decompressor size included, this data is meaningless.
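    The trade-off described above can be written as a break-even condition (my notation, not introspec's): for an uncompressed size $U$, switching from compressor $A$ to compressor $B$ only pays off when

    \[ \frac{U}{r_B} + s_B < \frac{U}{r_A} + s_A \]

    where $r$ is the compression ratio and $s$ is the decompressor stub size. With $U \approx 1896$, $r_A = 1.975$, $s_A = 64$, $r_B = 2.25$, $s_B = 209$, the right-hand side is 1024 bytes and the left-hand side is about 1052 bytes, i.e. the 28-byte loss computed above.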
    29 replies | 1674 view(s)
  • lz77's Avatar
    2nd June 2020, 14:44
    In my LZ77-like algorithm the magic number 123456789 is always better than Knuth's famous prime 2654435761. I tested it on enwik8, enwik9, silesia.tar, LZ4.1.9.2.exe, ... I tried using two hash functions (one for even values, the other for odd values), but this worsened the result.
    18 replies | 25529 view(s)
  • Trench's Avatar
    2nd June 2020, 06:07
    Sorry, yes, true. But something like that won't be as versatile as the bigger ones - kind of like having a Swiss army knife as a daily kitchen knife. Unless most of the code is in the file, which won't work, or it compresses itself. But it depends on what one wants to decompress, since one size can't fit all, I assume. Zip programs, like all programs, are satellite programs relying on the main computer OS they run on, so the file is technically bigger. If you were to put it on another or older OS, it won't function or be as small. So if the OS has files that help the zip program, then you can get something smaller; in a way it's kind of cheating. What would a math professor come up with? But again, it's good to try for fun. Just an opinion.
    29 replies | 1674 view(s)
  • Sportman's Avatar
    1st June 2020, 22:23
    Sportman replied to a thread Paq8sk in Data Compression
    I guess -x14. I'm only doing fast tests at the moment, so no enwik9.
    98 replies | 8837 view(s)
  • Dresdenboy's Avatar
    1st June 2020, 22:18
    Trench, I understand these more pragmatic and/or philosophical considerations. They could even lead to erasing the benefit of compressors like PAQ due to the computational costs. But here we're discussing decompressors, which ought to be used for older and current platforms, with constraints on the actual code's size (like a program that fits into a 512-byte bootsector, maybe with less than that available for custom code). There are competitions for that.
    29 replies | 1674 view(s)