Activity Stream

  • Kirr's Avatar
    Today, 17:37
    FQSqueezer is in! But it is only tested on small data due to its extreme slowness. Thanks for the suggestion, dnd! It's also worth mentioning that FQSqueezer is designed for FASTQ data, so its performance in my benchmark should not be taken as indicative of its actual performance on its intended data.
    27 replies | 3085 view(s)
  • Kirr's Avatar
    Today, 17:32
    I guess this discussion is about deflate-based libraries. Since I benchmark command line tools rather than bare libraries, each lib would need a command line interface before it can be tested. I already added p7zip and libdeflate's streaming gzip replacement to my to-do benchmark queue. If you have more suggestions in mind, please post here with links. Thanks!
    9 replies | 551 view(s)
  • Darek's Avatar
    Today, 17:28
    Darek replied to a thread Paq8sk in Data Compression
    122'505'372 - enwik9 -x15 -w -e1,english.dic by Paq8sk19, time 134115.35 s - not as bad a time as for paq8sk13. The score is very close to my estimate.
    98 replies | 8770 view(s)
  • lz77's Avatar
    Today, 16:56
    ((a*271+b)*271+c)*271+d is even worse than 506832829, see my message #11...
    17 replies | 25447 view(s)
  • Jyrki Alakuijala's Avatar
    Today, 16:38
    How do they relate to 506832829?
    17 replies | 25447 view(s)
  • smjohn1's Avatar
    Today, 16:37
    True. My tests of the simpler version (see attached lz4.c above) show about 2/3 of the original decompression speed. But in many applications decompression speed is already way higher than that of the other components, so even at 2/3 it wouldn't be a problem. Thus there is value in improving compression ratio and speed, if the improvement is significant. My simple version of course only shows about a 1% to 2% gain in compression ratio, which is not significant in most applications. That is why I think it might be worthwhile to explore more gains.
    6 replies | 113 view(s)
  • smjohn1's Avatar
    Today, 16:31
    You are definitely welcome to change my modifications below to see if there are good improvements.
    6 replies | 113 view(s)
  • smjohn1's Avatar
    Today, 16:29
    Here is a simpler version: window size reduced to 32K (I cannot upload lz4.h - why is the .h extension not allowed?), using a 1+7 format for the offset (1 byte for smaller offsets and 2 bytes for the others); an illustrative sketch of this kind of coding follows below. So the only changes are in lines 1011, 1016-1018 (for encoding) and 1769, 1969 (for decoding). Works for pic, news, trans in the Calgary corpus. Do any other places need to be taken care of? (Just for fast mode.)
    6 replies | 113 view(s)
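Purely as an illustration of the kind of change being discussed here (a rough sketch, not the poster's patch and not LZ4's real format), this is what a "1+7" offset coding can look like in C. The helper names put_offset/get_offset and the 32K offset limit are assumptions for the example, and the bounds/padding checks a real codec needs are omitted.

```c
#include <stdint.h>

/* Hypothetical "1+7" offset coding (illustrative only, not LZ4's format):
 * offsets 1..127 take one byte, offsets up to 32767 take two bytes,
 * with the top bit of the first byte flagging the long form. */
uint8_t *put_offset(uint8_t *op, uint32_t offset)
{
    if (offset < 128) {
        *op++ = (uint8_t)offset;                    /* 0xxxxxxx : short form */
    } else {
        *op++ = (uint8_t)(0x80 | (offset & 0x7F));  /* 1xxxxxxx : low 7 bits */
        *op++ = (uint8_t)(offset >> 7);             /* high 8 bits           */
    }
    return op;
}

const uint8_t *get_offset(const uint8_t *ip, uint32_t *offset)
{
    uint32_t b0 = *ip++;
    if (b0 < 128)
        *offset = b0;
    else
        *offset = (b0 & 0x7F) | ((uint32_t)(*ip++) << 7);
    return ip;
}
```

Every site that writes or reads the old fixed 2-byte little-endian offset, and every size/bound computation that assumes 2 bytes per match, has to agree with the new coding; a single missed site is one plausible way to end up with checksum or decoding errors on only some files.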
  • Bulat Ziganshin's Avatar
    Today, 16:24
    Did you try to use CMOV and other branchless approaches? (An illustrative branchless sketch follows below.)
    6 replies | 113 view(s)
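Following up on the CMOV question with a sketch (only an illustration, not LZ4 code): the variable-length offset from the hypothetical 1+7 format above can be read without a data-dependent branch, assuming the input buffer is padded so that reading one byte past a short offset is always safe.

```c
#include <stdint.h>

/* Branchless read of the hypothetical 1+7 offset: always load two bytes,
 * then select with a mask instead of an unpredictable branch. */
const uint8_t *get_offset_branchless(const uint8_t *ip, uint32_t *offset)
{
    uint32_t b0     = ip[0];
    uint32_t b1     = ip[1];             /* requires 1 byte of input padding */
    uint32_t islong = b0 >> 7;           /* 1 if this is the two-byte form   */
    uint32_t mask   = 0u - islong;       /* 0xFFFFFFFF or 0x00000000         */

    *offset = (b0 & 0x7F) | ((b1 << 7) & mask);
    return ip + 1 + islong;              /* advance 1 or 2 bytes             */
}
```

The trade is a possibly mispredicted branch for a couple of extra ALU operations and an unconditional extra load; whether that wins depends on how predictable the short/long pattern is in real data.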
  • pklat's Avatar
    Today, 14:59
    pklat replied to a thread deflate in Data Compression
    Thanks Shelwien, I understand (better) now.
    2 replies | 94 view(s)
  • lz77's Avatar
    Today, 12:49
    See my new CalcHash and compare the results with the ones in my message #11:
    function CalcHash(dw: dword): dword;
    var a,b,c,d: byte;
    begin
      a:=dw and $ff;
      b:=dw shr 8 and $ff;
      c:=dw shr 16 and $ff;
      d:=dw shr 24 and $ff;
      Result:=(((a*271+b)*271+c)*271+d) and $1FFFF;
    end;
    enwik8: 40.579%
    enwik9: 36.318%
    silesia.tar: 37.434%
    It's 15-20% slower and the ratio is always worse compared with 123456789...
    17 replies | 25447 view(s)
  • Shelwien's Avatar
    Today, 12:44
    Shelwien replied to a thread deflate in Data Compression
    Do you mean, why do other deflate libraries produce different deflate data than zlib?
    Any LZ format has multiple ways to parse the data into strings (e.g. "abc"="a"."b"."c"="ab"."c"="a"."bc"), plus a string can be encoded as a reference to any of its previous occurrences within the window, which adds even more options. Deflate is a relatively complicated version of the LZ algorithm - it uses Huffman coding for the data, then RLE + Huffman for the length tables of the primary Huffman codes. So there are multiple options for splitting the data into blocks, then parsing each block, then encoding each block header.
    It's actually very hard to find the one specific way of parsing the data that would result in the best compression - there are too many combinations, so just trying all of them is impossible, and even iterative optimization methods can run for a week and still keep finding small improvements. Thus all practical compressor implementations can only rely on various heuristics to run at reasonable speed, so it's not surprising that different implementations produce different outputs - it's pretty unlikely that different developers would make the same intuitive choices dozens of times. (A small demonstration with zlib itself follows below.)
    2 replies | 94 view(s)
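To make the point concrete, even a single library emits different deflate streams for the same input once its parsing heuristics change. A minimal sketch using zlib's standard one-shot calls (compress2/uncompress); the exact sizes printed depend on the input, and the output here is zlib-wrapped deflate.

```c
#include <stdio.h>
#include <string.h>
#include <zlib.h>   /* standard zlib one-shot API: compress2(), uncompress() */

int main(void)
{
    const char *text = "abcabcabcabc the quick brown fox jumps over the lazy dog";
    uLong srcLen = (uLong)strlen(text) + 1;

    Bytef fast[256], best[256], out[256];
    uLongf fastLen = sizeof(fast), bestLen = sizeof(best), outLen = sizeof(out);

    /* Same input, two different parsing/heuristic settings. */
    compress2(fast, &fastLen, (const Bytef *)text, srcLen, 1);  /* fast parse  */
    compress2(best, &bestLen, (const Bytef *)text, srcLen, 9);  /* best effort */

    /* The streams typically differ (at minimum in header flags, and usually
     * in the parse itself)... */
    printf("level 1: %lu bytes, level 9: %lu bytes\n", fastLen, bestLen);

    /* ...but both decode back to the identical original data. */
    uncompress(out, &outLen, fast, fastLen);
    printf("round-trip ok: %d\n", memcmp(out, text, srcLen) == 0);
    return 0;
}
```

Both outputs are valid; they just reflect different choices among the many legal parsings described above.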
  • algorithm's Avatar
    Today, 11:59
    I had made an LZ4 variant with a 32KB window and an additional smaller 2-byte match sequence (instead of the normal 3). While it had better compression, decompression was about half as fast due to a single unpredictable branch. Your method will likely have the same performance penalty (unpredictable branches are expensive).
    6 replies | 113 view(s)
  • pklat's Avatar
    Today, 09:42
    pklat started a thread deflate in Data Compression
    Apart from the compression level and the size of the sliding window, what makes one zip encoder produce different deflate data from other implementations?
    2 replies | 94 view(s)
  • Cyan's Avatar
    Today, 05:05
    Cyan replied to a thread Modifying LZ4 offset in Data Compression
    A more complete source code package will probably prove helpful. There are too many ways such a modification could introduce flaws, so it requires paying attention to implementation details. Good news is, you have tests to quickly check the outcome.
    6 replies | 113 view(s)
  • smjohn1's Avatar
    Today, 04:40
    I am trying a new encoding method for LZ4's match offset, which varies from 1 to 3 bytes, still byte-aligned. Of course the hope is that most offsets only need 1 byte, giving a better compression ratio, and maybe better compression speed as well.
    The implementation is a simple modification of the LZ4 v1.9.2 package: in LZ4_compress_generic(), the two places with LZ4_writeLE16(op, (U16)offset); op+=2; and LZ4_writeLE16(op, (U16)(ip - match)); op+=2; are replaced with the new encoding method (just ignore the high compression part), and the two places in LZ4_decompress_generic() with offset = LZ4_readLE16(ip); ip+=2; are replaced with the new offset decoding rules (a bit complicated).
    I thought that would do, but while some files can be decompressed successfully, some cannot, with errors such as "Invalid Checksum : Decoding error at pos 152033 (block 0, sub 1, pos 152033)" and some with "Error 66 : Decompression error : ERROR_decompressionFailed" (on enwik8). So obviously somewhere else needs to be modified as well. Could anyone shed some light on where to make changes? Thanks in advance.
    6 replies | 113 view(s)
  • SolidComp's Avatar
    Today, 02:26
    SolidComp replied to a thread Brotli in Data Compression
    Strange, I don't get anywhere near that. I didn't know there was a -12 level, and when I run it I get 34,768, nowhere near as good as 7-Zip's gzipper. What's your command line syntax?
    261 replies | 82794 view(s)
  • Gotty's Avatar
    Yesterday, 23:52
    Thank you! That is unfortunately not optimal for this kind of hash. This hash works best with small values: the larger the values get, the worse it performs. Probably that is why a smaller multiplier worked better for you (so far). I've just run some checks, and I got the best results on the corpuses using a hash function originally from paq8a, still used in paq8pxd. The second place goes to crc32c and the third to Robert Sedgewick's hash function. The fastest of the top 3 is paq8a's word-model hash, so that is the absolute winner. Could you please try something like ( ((a*271+b)*271+c)*271+d ) where a,b,c,d are the four bytes? Keep the lower bits (and 0x1FFFF) at the end. (A small C sketch of this follows below.) If you do timings: is it much slower in your case? (I'm still running tests, there are many candidates.)
    17 replies | 25447 view(s)
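For reference, the suggested byte-wise hash looks like this in C. A minimal sketch assuming a 17-bit hash table, matching the mask used in the code posted elsewhere in the thread; the function name is made up.

```c
#include <stdint.h>

#define TABLE_BITS 17

/* Byte-wise polynomial hash as suggested above:
 * ((a*271 + b)*271 + c)*271 + d, keeping the low TABLE_BITS bits. */
uint32_t hash4_poly271(uint32_t dw)
{
    uint32_t a = dw & 0xFF;
    uint32_t b = (dw >> 8)  & 0xFF;
    uint32_t c = (dw >> 16) & 0xFF;
    uint32_t d = (dw >> 24) & 0xFF;
    return (((a * 271 + b) * 271 + c) * 271 + d) & ((1u << TABLE_BITS) - 1);
}
```

The multiplier 271 and the final mask are exactly the ones proposed above; three multiplications per hash instead of one is the cost being asked about in the timing question.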
  • ScottX's Avatar
    Yesterday, 22:53
    It could be because of the UPX packer. You can try unpacking the .sfx with UPX (it's packed because of the size) with the command: UPX -d rz.sfx. But you get approx. 32 kB instead of 12 kB.
    208 replies | 79782 view(s)
  • JamesB's Avatar
    Yesterday, 18:48
    I just tried the latest version. It's as I recall - very fast but very light levels of compression. It also has no decompression support at all, so obviously it can't compete on that front. On the above file with the same machine, level 1 only. Redoing these encode timings as I realise I had I/O time in there which doesn't help the ultra fast ones:
    Tool        Encode     Decode     Size
    ------------------------------------------
    vanilla     0m1.657s   0m0.546s   42298786
    intel       0m0.569s   0m0.524s   56046821
    cloudflare  0m0.974s   0m0.470s   40867185
    jtkukunas   0m1.106s   0m0.392s   40867185
    ng          0m0.695s   0m0.397s   56045984
    zstd (gz)   0m1.582s   0m0.475s   42298786
    libdeflate  0m0.838s   0m0.235s   39597396
    libslz      0m0.482s   N/A        55665941
    So yes it's the fastest. libslz has levels up to -9, but they're barely any different in size/speed.
    9 replies | 551 view(s)
  • suryakandau@yahoo.co.id's Avatar
    Yesterday, 16:51
    Maybe you could test paq8sk19 with the -x14 option. Thanks.
    98 replies | 8770 view(s)
  • Dresdenboy's Avatar
    Yesterday, 16:06
    We could use "~" anywhere for "on average" (some thread compression ;)). Saukav is probably just a one encoding format example (zx7) of an actually global optimization problem (any compression method + decomp stub size on target platform + decomp stub adaptions + other adaptions). I'm doing a lot of analysis of x86 intros (which methods work best, could they be stripped down, etc.). For example (and you surely saw that when doing your own small decompressors), a more or less generic packer uses a lot of encodings, code paths etc. to be good overall. But for small files (and there's also only little research in this regard, e.g. about compressed message sizes of eth protocol etc.) some specific subset with very little case handling might fit better. In small files (like 256B intros) there are a lot of 1B (short opcodes) and many 2B matches, but very few 3+ matches. So any complex handling of the latter ones, like interleaved gamma codes etc. might be just too much. Also the probabilities of literal run lengths look very different here. That's why I also think there is room for improvement. :) This was more about being platform agnostic, to first find the "overhead" of algorithms for finding their models, tune to the data, or simply encode literals and matches. Then would follow the next step of adding the decomp stub sizes, as COM might mean 16B, "low" memory (on PC at least, with 40+B for accessing >1MB). Some current list of tested compressors with example compressed sizes: lzw: 227/ 256 B puc: 221/ 256 B exo: 201/ 256 B zx7: 231/ 256 B hrust: 223/ 256 B pkf: 221/ 256 B pkft: 227/ 256 B muc: 251/ 256 B apc: 214/ 256 B apack: 217/ 256 B paq1: 189/ 256 B lzma: 292/ 256 B zlib: 246/ 256 B gzip: 258/ 256 B bz2: 288/ 256 B lz4: 261/ 256 B paq1ss: 188/ 256 B paq3n: 188/ 256 B paq6v2: 188/ 256 B paq8f: 185/ 256 B paq8h: 194/ 256 B paq8l: 204/ 256 B paqar: 179/ 256 B If you have some interesting ones, I'd happy to add them. So far I also want to add oneKpaq, as it has a 128B decompressor. More on the other topics later. E.g. prefix emitter: on x86 I might try to use long SSE insts. They share a 0x0f prefix. This is easy to emit with little code. On Z80 there are several prefixes (FD, ED etc.), which are rather common. It would be easy to write such a compressor with encoding the prefixes in 1-2 bits.
    28 replies | 1627 view(s)
  • lz77's Avatar
    Yesterday, 13:46
    inb1 is a pointer to the current position in the source data. inb1^ is the 4 bytes at the current position; it has dword (uint32) type. Yes, but if there were a Hutter Prize for whoever can compress enwik8 to 40% with pure LZ77, I would get it. :)
    17 replies | 25447 view(s)
  • Darek's Avatar
    Yesterday, 12:13
    Darek replied to a thread Paq8sk in Data Compression
    I've started to test paq8sk19; paq8sk22 or paq8sk23 is generally too slow... The first 200-300 MB generally (historically) go well, but after that the program often starts to use a very high amount of memory and often blocks the whole computer - as I said, I have "only" 32 GB of RAM.
    98 replies | 8770 view(s)
  • SolidComp's Avatar
    Yesterday, 12:13
    James, what about SLZ? That's the fastest gzip implementation on earth as far as I know. It doesn't compress as well though. But it's incredibly fast, and super light in terms of CPU and memory consumption.
    9 replies | 551 view(s)
  • Jyrki Alakuijala's Avatar
    Yesterday, 00:32
    If we haven't fixed that yet, we will fix it very soon now.
    156 replies | 37355 view(s)
  • vteromero's Avatar
    3rd June 2020, 17:51
    vteromero replied to a thread VTEnc in Data Compression
    If anyone is interested, here's a blog post about encoding parameters in VTEnc library: https://vteromero.github.io/2020/06/03/encoding-parameters-in-vtenc-library.html
    22 replies | 4665 view(s)
  • Gotty's Avatar
    3rd June 2020, 17:39
    Yes. And it needs to be large - a large odd "random" number.
    17 replies | 25447 view(s)
  • Gotty's Avatar
    3rd June 2020, 17:38
    Looks good so far. What is supplied to this function as a parameter? I don't think that the number of bits in the multiplier has an effect on the performance of the multiplication - the CPU does not actually look for the 1 bits in the multiplier to do shifts and additions. An explanation is here. 40.008% -> 39.974% is a 0.034% gain. I would not call it noticeably better. :-)
    17 replies | 25447 view(s)
  • Jyrki Alakuijala's Avatar
    3rd June 2020, 17:28
    Jyrki Alakuijala replied to a thread Brotli in Data Compression
    There are 120 transforms. Appendix B in rfc7932. A custom 'shared brotli' dictionary can bring its own transforms.
    261 replies | 82794 view(s)
  • Jyrki Alakuijala's Avatar
    3rd June 2020, 17:26
    Jyrki Alakuijala replied to a thread Brotli in Data Compression
    Zstd's two improvements over zlib are a larger backward-reference window and ANS instead of prefix coding. ANS does not help much (0.5%) if you have a lot of literals -- in brotli's design experiments ANS was a (small) density loss over prefix coding. Larger backward references don't help much if you have a small data size (particularly if it is around 32 kB or less). So, no big surprise that zstd does not compress better than zlib here. Brotli brings three improvements that do help for small data: context modeling, a static dictionary, and super-cheap swapping back to previous entropy codes. Optimized JavaScript is not as easy to compress as many other kinds of data, and often we see only ~10% further savings from moving from zlib to brotli.
    261 replies | 82794 view(s)
  • Stefan Atev's Avatar
    3rd June 2020, 16:16
    I guess ROM is the only thing you can kind of depend on to be your "dictionary"; even then, you'd be making assumptions about the machine you're running on that may be a bit too specific (e.g., all the 6502 clones I ever programmed were Pravetz 82 clones, which definitely had their own ROM, different from an Apple II); it's exactly like ROP exploits - they work on specific versions of OSes, not generically on a platform. Though maybe for non-malicious code there's less risk of somebody patching what you depended on. People used to check the code pointed to by an interrupt handler to try to ensure that they are not running under a debugger, or that certain TSRs are not installed, and that ultimately breaks when what you're checking legitimately changes...
    28 replies | 1627 view(s)
  • lz77's Avatar
    3rd June 2020, 13:19
    I tested both numbers on my LZ77-type compressor:
    magic number           123456789 | 506832829
    enwik8:                  40.549% | 40.573%
    enwik9:                  36.305% | 36.314%
    silesia.tar:             37.429% | 37.423%
    lz4_win64_v1_9_2.exe:    42.341% | 42.359%
    The humorous constant 123456789 gives noticeably better compression. Your number is better only on silesia.tar (by 0.006%) - I think because silesia.tar contains very specific img files.
    > I believe the number doesn't need to be prime, but needs to be odd and have a good distribution of 0s and 1s.
    Yes, the number 123456789 includes runs of 1, 2, 3 and 4 one-bits. By the way: when I used Knuth's number 2654435761, my algorithm with the best ratio was compressing enwik8 to 40.008%. After I changed Knuth's number to 123456789, my algorithm overcame the psychological frontier and showed 39.974%. :_yahoo2: < 40% on enwik8 with only a hash table of 128K cells, without match search, source analysis & additional compression! Maybe after improvement the algorithm will show a compression ratio < 40% with a hash table of 64K cells...
    17 replies | 25447 view(s)
  • Jyrki Alakuijala's Avatar
    3rd June 2020, 12:14
    Are those numbers better or worse than my personal magic number, 506832829? I believe the number doesn't need to be prime, but it needs to be odd and have a good distribution of 0s and 1s.
    17 replies | 25447 view(s)
  • lz77's Avatar
    3rd June 2020, 11:19
    Bitte, this is a piece of my code (a C sketch of the same hash follows below):
    const
      TABLEBITS = 17;
      TABLESIZE = 1 shl TABLEBITS;
    var
      table: array of dword;
    ................................
    // Calculate hash by 4 bytes from inb1^
    function CalcHash(dw: dword): dword; assembler;
    asm
      mov ecx,123456789
      mul ecx
      shr eax,32-TABLEBITS
    end; { CalcHash }
    I believe that the fewer 1 bits there are in the magic constant, the faster the multiplication will be performed, which will increase the compression speed.
    17 replies | 25447 view(s)
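For readers not following the inline assembly, here is a rough C equivalent of the CalcHash above: a multiplicative hash that multiplies the four input bytes (taken as one 32-bit word) by the magic constant and keeps the top TABLEBITS bits of the low 32-bit half of the product. A sketch of the same idea, not verified against the original.

```c
#include <stdint.h>

#define TABLEBITS 17

/* Rough C equivalent of the asm CalcHash above: the 32-bit multiply keeps
 * the low half of the product, then the shift keeps its top TABLEBITS bits. */
uint32_t calc_hash(uint32_t dw)
{
    return (uint32_t)(dw * 123456789u) >> (32 - TABLEBITS);
}
```

On current x86 cores a 32-bit multiply has a fixed latency regardless of how many 1 bits the constant contains, which is the point made a few posts above in this stream.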
  • pklat's Avatar
    3rd June 2020, 08:49
    File timestamps are stored as well, and IIRC they differ between NTFS and other filesystems.
    3 replies | 137 view(s)
  • Shelwien's Avatar
    3rd June 2020, 06:50
    Windows and Linux actually have different file attributes, so you'd need a hacked zip which doesn't store attributes; otherwise it normally won't happen. You can try using a zip built under cygwin/msys2 on the Windows side, but even these would have different default unix attributes (e.g. cygwin shows 0770 for Windows files), and possibly also custom defines for cygwin which would make them use winapi.
    3 replies | 137 view(s)
  • Shelwien's Avatar
    3rd June 2020, 06:43
    > wouldn't have been able to depend on compression routines in Windows. I actually mean chunks of existing code, like https://en.wikipedia.org/wiki/Return-oriented_programming#Attacks https://github.com/JonathanSalwan/ROPgadget#screenshots
    28 replies | 1627 view(s)
  • ivan2k2's Avatar
    3rd June 2020, 06:06
    1) Try to compress non-textual files and check the results.
    2) Try to compress your text files with the -ll or -l option and check the results.
    3 replies | 137 view(s)
  • Stefan Atev's Avatar
    3rd June 2020, 04:49
    I can see that, though it goes against my instincts :) I have seen people extract common instruction sequences into subroutines even if they were pretty arbitrary and logically unrelated; you eat 3 bytes to do a call (and one ret) each time you need the sequence, and that is basically an "executable LZ". I can see how actual LZ would quickly be better, since matches are encoded more efficiently than even near calls. However, for some data LZ is not that great, while a custom encoding could work quite well. None of the stuff I ever wrote assumed anything more than DOS, so it wouldn't have been able to depend on compression routines in Windows.
    28 replies | 1627 view(s)
  • introspec's Avatar
    3rd June 2020, 01:54
    I think some people made use of tricks like this. I have a lot of experience with older computers; for them, data compression pretty much did not exist. I'd love to be proved wrong here, but I'd be very surprised if any of the 1980s machines has anything of the kind in its ROM.
    28 replies | 1627 view(s)
  • introspec's Avatar
    3rd June 2020, 01:51
    I think that there are two approaches to making a compressed intro. The first, more common one, would be to compress your well-tuned code so that a bit of extra squeeze can be achieved. This is a very traditional strategy, but it is not the only one. The second strategy is to design your data structures and also your code to help the compressor. E.g. often in a compressed intro a short loop can be replaced by a series of unrolled statements - an insane strategy in a size-optimized world, but quite possibly a viable approach if you know that the intro will be compressed. A complete paradigm shift is needed in this case, of course.
    28 replies | 1627 view(s)
  • introspec's Avatar
    3rd June 2020, 01:46
    1) I know some neat examples of Z80 decompressors, but I am not aware of any systematic lists. I recently did some reverse-engineering of ZX Spectrum-based 1Ks. About one third of them were packed; the most popular compressors seemed to be ZX7, MegaLZ and BitBuster (in order of decreasing popularity; note that the respective decompressor sizes are 69, 110 and 88 bytes).
    2) Maybe yes, but the large influence of the decompressor size means that the data format becomes a lot more important than usual. I think that this implies a lot of scope for adaptivity and tricks.
    28 replies | 1627 view(s)
  • redrabbit's Avatar
    3rd June 2020, 01:40
    Hi! I remember a zip program which can create a zip file with the same CRC/size no matter where you use that tool. I actually tried to compress a bunch of files with 7zip 16.04 (64 bits) and zip 3.00, in Linux and in Windows, but the final files don't have the same size; even when I tried to store the files I got different results. Example:
    wine zip.exe -rq -D -X -0 -A testwindows.zip *.txt
    zip -rq -D -X -0 -A testlinux.zip *.txt
    md5sum *.zip
    725d46abb1b87e574a439db15b1ba506 testlinux.zip
    70df8fe8d0371bf26a263593351dd112 testwindows.zip
    As I said, I remember a zip program (I don't know the name) whose author said that the result was always the same regardless of the platform (win, linux...)
    3 replies | 137 view(s)
  • introspec's Avatar
    3rd June 2020, 01:40
    Frankly, I do not think Crinkler (or similar tools) are very relevant to this thread. You are right that there could be improvements to the decompressor, but I was trying to say that you won't get LZMA into a sub 50-100b decompressor, so although it is an amazing tool for 4K or 8K intros, it is just a different kind of tool. Your idea to only have a match length of 2 is cool (although I need to try it in practice to see how much ratio one loses in this case). The smallest generic LZ77 on Z80 that I know has an 18 byte decompression loop, so your 17 byte loop would be interesting to see - have you published it anywhere? In any case, I am working on a small article about such mini-decompressors and am definitely looking forward to anything you will write. I mainly code on Z80, so I do not know much about prefix emitters. Can you point to any discussion of what they can look like?
    28 replies | 1627 view(s)
  • maadjordan's Avatar
    3rd June 2020, 01:33
    maadjordan replied to a thread WinRAR in Data Compression
    As WinRAR does not support compressing with 7-Zip and its plugins, would you kindly provide a reduced version of your plugins for extraction only? Many thanks.
    185 replies | 129932 view(s)
  • maadjordan's Avatar
    3rd June 2020, 01:32
    maadjordan replied to a thread WinRAR in Data Compression
    :)
    185 replies | 129932 view(s)
  • Darek's Avatar
    3rd June 2020, 00:33
    Darek replied to a thread Paq8pxd dict in Data Compression
    I've tested the best options for Byron's dictionary on the files of 4 corpuses. It was made for the paq8pxd v85 version. Of course I realise that it could work only for some versions, but for now it looks like it works. The best results are for the Silesia corpus -> 74KB of gain, which is worth using; for the other corpuses the gains are smaller, but there is always something. Files not mentioned below didn't get any gain due to the use of the -w option or -exx.
    file: option
    SILESIA
    dickens: -e77,dict
    mozilla: -e26,dict
    osdb: -w
    reymont: -w
    samba: -e133,dict
    sao: -w
    webster: -e373,dict
    TOTAL Silesia savings = 74'107 bytes
    CALGARY
    book1: -e47,dict
    book2: -e43,dict
    news: -e97,dict
    paper2: -e34,dict
    progp: -e75,dict
    Calgary.tar: -e49,dict
    TOTAL Calgary savings = 1'327 bytes
    Calgary.tar savings = 3'057 bytes
    CANTERBURY
    alice29.txt: -e38,dict
    asyoulik.txt: -e53,dict
    lcet10.txt: -e54,dict
    plrabn12.txt: -e110,dict
    Canterbury.tar: -e95,dict
    TOTAL Canterbury savings = 873 bytes
    Canterbury.tar savings = 1'615 bytes
    MAXIMUM COMPRESSION
    world95.txt: -e22,dict
    TOTAL Maximum Compression savings = 1'449 bytes
    Due to all these settings and changes, the Maximum Compression score for paq8pxd v89 is below 6'000'000 bytes! First time ever! (Without using the tarball option; compressing the tar file got 5'993'762 bytes.)
    922 replies | 315500 view(s)
  • Shelwien's Avatar
    2nd June 2020, 23:27
    Windows actually has some preinstalled compression algorithms (deflate, LZX/LZMS): http://hugi.scene.org/online/hugi28/hugi%2028%20-%20coding%20corner%20gem%20cab%20dropping.htm (a small sketch of one way to reach them follows below). I wonder if the same applies to other platforms? Maybe at least some relevant code in the ROM?
    28 replies | 1627 view(s)
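As an aside to the note above: on Windows 8 and later the built-in codecs are also reachable from ordinary user code through the Compression API (compressapi.h, linking Cabinet.lib). The sketch below is a from-memory illustration of that API and should be checked against the actual headers; it is a different mechanism from the CAB-dropping trick in the linked article.

```c
#include <windows.h>
#include <compressapi.h>   /* Windows 8+ Compression API; link with Cabinet.lib */
#include <stdio.h>

int main(void)
{
    const char in[] = "some data to squeeze with a codec that ships with the OS";
    BYTE out[512];
    SIZE_T outSize = 0;
    COMPRESSOR_HANDLE c = NULL;

    /* LZMS is one of the preinstalled algorithms; MSZIP (deflate) and
     * XPRESS are exposed through the same API. */
    if (CreateCompressor(COMPRESS_ALGORITHM_LZMS, NULL, &c)) {
        if (Compress(c, in, sizeof(in), out, sizeof(out), &outSize))
            printf("%u -> %u bytes\n", (unsigned)sizeof(in), (unsigned)outSize);
        CloseCompressor(c);
    }
    return 0;
}
```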
  • Gotty's Avatar
    2nd June 2020, 18:18
    Gotty replied to a thread paq8px in Data Compression
    Please specify what "digits" mean. Do you mean single-digit ASCII decimal numbers one after the other with no whitespace, like "20200602"? I'm not sure what you mean by "in detail". It would not be easy to demonstrate in a forum post what paq8px does exactly. Especially because paq8px is heavy stuff - it needs a lot of foundation. I suppose you did some research, study and reading since we last met and you would like to dig deeper? If you need real depth, you will need to fetch the source code and study it. However if you have a specific question, feel free to ask it here.
    1857 replies | 539431 view(s)
  • Stefan Atev's Avatar
    2nd June 2020, 18:05
    My experience being with x86 1K intros, this certainly resonates; at the end of the day, the tiny (de)compressor should only be helping you with code - all the data in the intro should be custom-packed anyway, in a way that makes it difficult to compress for LZ-based algorithms. For example, I remember using basically 2bits per beat for an audio track (2 instruments only, both OPL-2 synth); Fonts would be packed, etc. 4K is different, I think there you just have a lot more room. And for 128B and 256B demos, compression is very unlikely to help, I think.
    28 replies | 1627 view(s)
  • Gotty's Avatar
    2nd June 2020, 17:58
    How many bits are used for addressing the hash table (or: how many slots do you have)? How exactly do you implement hashing (do you shift >>)? What exactly do you hash?
    17 replies | 25447 view(s)
  • introspec's Avatar
    2nd June 2020, 17:05
    Yes, I know about Saukav too. I did not have time to do detailed testing of it, but I understand what it does quite well and it should offer compression at the level of Pletter (likely somewhat better than Pletter), while being fairly compact. I believe that its approach to adaptivity, with multiple specific decompressors offered, is an excellent way to increase the compression "for free". However, I strongly suspect that a better solution must be available, most likely for 1K intros and definitely for 256b intros.
    I can explain what I mean as follows. Suppose you are working on a 1K intro that uses Saukav and at some point you reach the situation where the compressed intro together with the decompressor uses up all available space. Suppose that the average decompressor length is 64 bytes (this is the size of the zx7b decompressor - the origin of Saukav). Then your compressed size is 1024-64=960 bytes. I do not know the exact ratio offered by Saukav, so I'll use Pletter's ratio of 1.975 as a guide. Hence, our intro is actually 960*1.975=1896 bytes long.
    Let us now consider switching to a better compressor, e.g. Shrinkler, which is LZMA-based and compresses at a level similar to 7-zip. Its ratio on the same small-file corpus that I use for many of my tests is about 2.25. Thus, compressed by Shrinkler our intro will become 1896/2.25~843 bytes long (I should be saying "on average", but it is very annoying to repeat "on average" all the time, so I assume it implicitly). We saved 960-843=117 bytes, which may sound great, yet in fact is useless. The shortest decompressor for Shrinkler on Z80 is 209 bytes long, so we saved 117 bytes in data and added 209-64=145 bytes in decompressor, i.e. lost 28 bytes overall (the arithmetic is spelled out in the sketch below).
    The point is, when you are making a 1K intro, Shrinkler will lose to Saukav (to ZX7, MegaLZ, to basically any decent compressor with a compact enough decompressor). When working with 4K of memory, these concerns become pretty much irrelevant: you can use Crinkler, Shrinkler or any other tool of your choice. But at 1K the ratio becomes less of a concern, as long as it is not completely dreadful, and the decompressor size starts to dominate the proceedings. For 256b intros the situation is even more dramatic. I made one such intro using ZXmini (decompressor size 38 bytes) and found it (the decompressor) a bit too large. More can be done for sure.
    So, looking at your graph, I do not know what your target size is, but for anything <=1K, trust me, without the decompressor size included this data is meaningless.
    28 replies | 1627 view(s)
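The break-even argument above reduces to one line of arithmetic: total footprint = compressed data + decompressor stub. A tiny sketch using the ratios and stub sizes quoted in the post (they are the post's own estimates, not fresh measurements):

```c
#include <stdio.h>

int main(void)
{
    double intro = 1896.0;                    /* uncompressed bytes, as derived above */

    double saukav    = intro / 1.975 +  64;   /* Pletter-level ratio, ~64 B stub      */
    double shrinkler = intro / 2.25  + 209;   /* LZMA-level ratio, 209 B Z80 stub     */

    printf("Saukav-style: %.0f bytes total\n", saukav);     /* ~1024 */
    printf("Shrinkler:    %.0f bytes total\n", shrinkler);  /* ~1052 */
    return 0;
}
```

With these figures the stronger compressor saves about 117 data bytes but pays 145 extra stub bytes, so the total grows by roughly 28 bytes, exactly the conclusion drawn above.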
  • lz77's Avatar
    2nd June 2020, 14:44
    In my LZ77-like algorithm the magic number 123456789 is always better than Knuth's famous prime 2654435761. I tested it on enwik8, enwik9, silesia.tar, LZ4.1.9.2.exe, ... I tried using two hash functions (one for even values, the other for odd values) but this worsened the result.
    17 replies | 25447 view(s)
  • Trench's Avatar
    2nd June 2020, 06:07
    Sorry. Yes, true. But something like that won't be as versatile as the bigger ones - kind of like having a Swiss army knife as a daily kitchen knife. Unless most of the code is in the file, which won't work, or it compresses itself. But it depends on what one wants to decompress, since one size can't fit all, I assume. Zip programs, like all programs, are like satellite programs relying on the main computer OS they run on, so the file is technically bigger. If you were to put it on another or older OS they won't function or be as small. So if the OS has files that help the zip program, then you can get something smaller. So in a way it's kind of cheating. What would a math professor come up with? But again it's good to try for fun. Just an opinion.
    28 replies | 1627 view(s)
  • Sportman's Avatar
    1st June 2020, 22:23
    Sportman replied to a thread Paq8sk in Data Compression
    I guess -x14; I only do fast tests at the moment, so no enwik9.
    98 replies | 8770 view(s)
  • Dresdenboy's Avatar
    1st June 2020, 22:18
    Trench, I understand these more pragmatic and/or philosophical considerations. They could even lead to erasing the benefit of compressors like PAQ due to the computational costs. But here we're discussing decompressors, which ought to be usable for older and current platforms, with constraints on the actual code's size (like a program that fits into a 512-byte boot sector, maybe with less than that available for custom code). There are competitions for that.
    28 replies | 1627 view(s)
  • Trench's Avatar
    1st June 2020, 19:55
    A file like 1234567890abcdefghij is Size = 20 bytes (20 bytes), size on disk = 4.00 KB (4,096 bytes). But if you make the file name 1234567890abcdefghij.txt and erase the file content, then you have 0 bytes, as cheesy as that sounds. I assume it would still take the same size on your hard drive even if the file is 0 bytes. Even if the file was 0 bytes, the file compression goes as low as 82 bytes, and with the 20 bytes it's 138, or 136 if it's 1 letter 20 times. Sure, it has some pointless info in the compressed file, just like how many pictures also have pointless info. Example: "7z¼¯' óò÷6 2 ÷|\N€€  z . t x t  ªÚ38Ö". Pointless to even have "z . t x t " with so many spaces.
    On a side note, we live in a society which ignores wastefulness, so in everything you see there is plenty of waste. Kind of like how if everyone did not throw away at least 2 grains of rice a day, it would be enough to feed every malnourished starving person, which is the main cause of health issues in the world; yet most throw away 1000 times more than that, even good quality, since many confuse the best-by date with an expiration date, while there are no expiration dates on food.
    The file compression programs don't let you set the library that small. The point is, why should it be done? It can be, depending on the file. I assume the bigger the file compression program is, to store various methods, the smaller the file. It would take more hard drive space to have 1000 20-byte files than 1 big file. The transfer rate would also be hurt greatly. Sure, it's possible to have an algorithm for something small, but again no one would bother. If you intend to use that method on a bigger file, I am guessing it might be limited. But it might be a fun challenge for someone to do.
    28 replies | 1627 view(s)
  • Dresdenboy's Avatar
    1st June 2020, 18:00
    Adding to 1): the 6502's ISA is likely too limited to do something useful in <40B. BTW, is there a comprehensive list of decompressor sizes and algos somewhere? You mentioned several in your Russian article.
    Adding to 2): the chances of being able to compete diminish quickly with less code. Compressed data size + decompressor might be an interesting metric. But I got an idea to do some LZ78 variant. Maybe it stays below 40B.
    28 replies | 1627 view(s)
  • Dresdenboy's Avatar
    1st June 2020, 17:26
    Interesting idea. So is it pop cs as a jump instruction? With a COM executable and initial registers set to DS/CS (and useful constants like -2 or 100h in others) this sounds good.
    28 replies | 1627 view(s)
  • JamesB's Avatar
    1st June 2020, 16:02
    JamesB replied to a thread Brotli in Data Compression
    With libdeflate I get 29951 (29950 with -12). So close to zstd (I got 29450 on that).
    261 replies | 82794 view(s)
  • JamesB's Avatar
    1st June 2020, 15:54
    It's good for block-based formats, but the lack of streaming may be an issue for a general-purpose zlib replacement. However, even for a streaming gzip you could artificially chunk it into relatively large blocks. It's not ideal, but maybe the better speed/ratio tradeoff still means it's a win for most data types. We use it in bgzf (the wrapper for the BAM and BCF formats), which has pathetically small block sizes, as a replacement for zlib (a sketch of the per-block call follows below).
    9 replies | 551 view(s)
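libdeflate's whole-buffer (non-streaming) interface is what makes it a natural fit for block-based wrappers like bgzf: each block is compressed with one call. The sketch below is written against libdeflate's public one-shot API as I recall it, so treat the names and signatures as an assumption to verify against libdeflate.h.

```c
#include <stdio.h>
#include <stdlib.h>
#include <libdeflate.h>   /* assumed: libdeflate's one-shot gzip API */

int main(void)
{
    const char block[] = "one block of a larger stream, compressed independently";

    struct libdeflate_compressor *c = libdeflate_alloc_compressor(6);
    if (!c) return 1;

    size_t bound = libdeflate_gzip_compress_bound(c, sizeof(block));
    unsigned char *out = malloc(bound);
    if (!out) return 1;

    /* A return value of 0 would mean the output did not fit in 'bound' bytes. */
    size_t n = libdeflate_gzip_compress(c, block, sizeof(block), out, bound);
    printf("block: %zu -> %zu bytes\n", sizeof(block), n);

    free(out);
    libdeflate_free_compressor(c);
    return 0;
}
```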
  • suryakandau@yahoo.co.id's Avatar
    1st June 2020, 15:36
    Sportman, could you test enwik9 using paq8sk23 -x4 -w -e1,English.dic please?
    98 replies | 8770 view(s)
  • Sportman's Avatar
    1st June 2020, 12:57
    Sportman replied to a thread Paq8sk in Data Compression
    enwik8:
    15,753,052 bytes, 13,791.106 sec., paq8sk23 -x15 -w
    15,618,351 bytes, 14,426.736 sec., paq8sk23 -x15 -w -e1,english.dic
    98 replies | 8770 view(s)
  • Fallon's Avatar
    1st June 2020, 07:46
    Fallon replied to a thread WinRAR in Data Compression
    WinRAR - What's new in the latest version https://www.rarlab.com/download.htm
    185 replies | 129932 view(s)
  • Shelwien's Avatar
    1st June 2020, 06:22
    Fast encoding strategies like to skip hashing inside of matches (a toy sketch of this follows below). Otherwise, you can just get collisions for hash values - it's easily possible that the cell for 'zabc' would be overwritten, while the cell for 'abcd' won't be.
    1 replies | 176 view(s)
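A toy illustration of "skip hashing inside of matches" (everything here, including the hash and the names, is illustrative rather than any particular library's code): after a match is accepted, the greedy loop jumps straight to the end of the match, so positions inside it never get their hash cells updated, and a later occurrence of, say, the 'abcd' inside 'zabcd' can be missed even though it is present in the window.

```c
#include <stdint.h>
#include <string.h>

#define HASH_BITS 16
#define MIN_MATCH 4

static uint32_t table[1 << HASH_BITS];   /* last seen position per hash slot */

static uint32_t hash4(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, 4);                    /* unaligned-safe 4-byte load */
    return (v * 2654435761u) >> (32 - HASH_BITS);
}

/* Toy greedy parse: hash only the positions the cursor actually visits. */
void toy_parse(const uint8_t *src, uint32_t len)
{
    uint32_t pos = 0;
    while (pos + MIN_MATCH <= len) {
        uint32_t h    = hash4(src + pos);
        uint32_t cand = table[h];
        table[h] = pos;                  /* update the cell for this position only */

        if (cand < pos && memcmp(src + cand, src + pos, MIN_MATCH) == 0) {
            uint32_t mlen = MIN_MATCH;
            while (pos + mlen < len && src[cand + mlen] == src[pos + mlen])
                mlen++;
            pos += mlen;                 /* skip ahead: no hashing inside the match */
        } else {
            pos++;                       /* literal */
        }
    }
}
```

Hashing every position inside matches as well would avoid those missed references, but it costs time, which is exactly the trade-off fast modes make.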
  • suryakandau@yahoo.co.id's Avatar
    1st June 2020, 02:52
    You are right, but the result is better than the paq8pxd series. :)
    98 replies | 8770 view(s)
  • Stefan Atev's Avatar
    1st June 2020, 01:56
    That was my experience for sure; I think I just had to make sure decompression started at a 16B-aligned offset so you could later just bump a segment register to point to the decompressed code when you jump to it.
    28 replies | 1627 view(s)
  • Sportman's Avatar
    31st May 2020, 19:39
    The Ongoing CPU Security Mitigation Impact On The Core i9 10900K Comet Lake: "At least for the workloads tested this round, when booting the new Intel Core i9 10900K "Comet Lake" processor with the software-controlled CPU security mitigations disabled, the overall performance was elevated by about 6% depending upon the workload." https://www.phoronix.com/scan.php?page=article&item=intel-10900k-mitigations
    19 replies | 4542 view(s)