Activity Stream

  • lz77's Avatar
    Today, 13:19
    I tested both numbers on my LZ77-type compressor:
    magic number = 123456789 | 506832829
    enwik8: 40.549% | 40.573%
    enwik9: 36.305% | 36.314%
    silesia.tar: 37.429% | 37.423%
    lz4_win64_v1_9_2.exe: 42.341% | 42.359%
    123456789 gives noticeably better compression. Your number is better only on silesia.tar (by 0.006%), I think because silesia.tar contains very specific image files.
    > I believe the number doesn't need to be prime, but needs to be odd and have a good distribution of 0s and 1s.
    Yes, the number 123456789 contains runs of 1, 2, 3 and 4 one-bits. By the way: when I used Knuth's number 2654435761, my best-ratio algorithm compressed enwik8 to 40.008%. After I changed Knuth's number to 123456789, my algorithm overcame the psychological frontier and showed 39.974%. :_yahoo2: Under 40% on enwik8 with only a 128K-cell hash table, without match search, source analysis & additional compression! (A sketch of the kind of hash being compared is after this post.)
    10 replies | 25128 view(s)
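    A minimal C sketch of the kind of multiplicative hash being compared here (my own illustration, modelled on the CalcHash routine lz77 posts later in this stream; the function and constant names are mine):

    #include <stdint.h>

    #define TABLEBITS 17                       /* 128K-cell hash table, as in the post */

    /* Multiply the next 4 input bytes (as a little-endian dword) by a magic
       constant and keep the top TABLEBITS bits of the low 32-bit product. */
    static uint32_t calc_hash(uint32_t dw, uint32_t magic)
    {
        return (uint32_t)(dw * magic) >> (32 - TABLEBITS);
    }

    /* Comparing the two constants from this thread is then just a matter of
       calling calc_hash(dw, 123456789u) vs. calc_hash(dw, 506832829u). */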
  • Jyrki Alakuijala's Avatar
    Today, 12:14
    Are those numbers better or worse than my personal magic number, 506832829? I believe the number doesn't need to be prime, but it needs to be odd and have a good distribution of 0s and 1s.
    10 replies | 25128 view(s)
  • lz77's Avatar
    Today, 11:19
    Here you go, this is a piece of my code:

    const
      TABLEBITS = 17;
      TABLESIZE = 1 shl TABLEBITS;
    var
      table: array of dword;
    ................................
    // Calculate the hash from the 4 bytes at inb1^
    function CalcHash(dw: dword): dword; assembler;
    asm
      mov ecx,123456789      // the magic constant
      mul ecx                // EDX:EAX := EAX * ECX; the low 32 bits stay in EAX
      shr eax,32-TABLEBITS   // keep the top TABLEBITS bits as the table index
    end; { CalcHash }

    I believe that the fewer 1-bits there are in the magic constant, the faster the multiplication will be performed, which would increase compression speed.
    10 replies | 25128 view(s)
  • pklat's Avatar
    Today, 08:49
    File timestamps are stored as well, and IIRC they differ between NTFS and other filesystems.
    3 replies | 58 view(s)
  • Shelwien's Avatar
    Today, 06:50
    Windows and Linux actually have different file attributes, so you'd need a hacked zip which doesn't store attributes; otherwise it normally won't happen. You can try using a zip built under cygwin/msys2 on the Windows side, but even these would have different default unix attributes (e.g. cygwin shows 0770 for Windows files), and possibly also custom defines for cygwin which would make them use WinAPI.
    3 replies | 58 view(s)
  • Shelwien's Avatar
    Today, 06:43
    > wouldn't have been able to depend on compression routines in Windows.
    I actually meant chunks of existing code, like:
    https://en.wikipedia.org/wiki/Return-oriented_programming#Attacks
    https://github.com/JonathanSalwan/ROPgadget#screenshots
    26 replies | 1403 view(s)
  • ivan2k2's Avatar
    Today, 06:06
    1) Try to compress non-textual files and check the results.
    2) Try to compress your text files with the -ll or -l option and check the results.
    3 replies | 58 view(s)
  • Stefan Atev's Avatar
    Today, 04:49
    I can see that, though it goes against my instincts :) I have seen people extract common instruction sequences into subroutines even if they were pretty arbitrary and logically unrelated; you eat 3 bytes for a call (and one ret) each time you need the sequence, so that is basically an "executable LZ" (e.g. a 10-byte sequence used twice costs 20 bytes inline, but 11 + 2*3 = 17 bytes when factored out). I can see how actual LZ would quickly be better, since matches are encoded more efficiently than even near calls. However, for some data LZ is not that great, while a custom encoding could work quite well. None of the stuff I ever wrote assumed anything more than DOS, so it wouldn't have been able to depend on compression routines in Windows.
    26 replies | 1403 view(s)
  • introspec's Avatar
    Today, 01:54
    I think some people made use of tricks like this. I have a lot of experience with older computers; for them, data compression pretty much did not exist. I'd love to be proved wrong here, but I'd be very surprised if any of the 1980s machines had anything of the kind in their ROMs.
    26 replies | 1403 view(s)
  • introspec's Avatar
    Today, 01:51
    I think that there are two approaches to making a compressed intro. The first, more common one is to compress your well-tuned code so that a bit of extra squeeze can be achieved. This is a very traditional strategy, but it is not the only one. The second strategy is to design your data structures, and also your code, to help the compressor. E.g. often in a compressed intro a short loop can be replaced by a series of unrolled statements - an insane strategy in the size-optimized world, but quite possibly a viable approach if you know that the intro will be compressed. A complete paradigm shift is needed in this case, of course.
    26 replies | 1403 view(s)
  • introspec's Avatar
    Today, 01:46
    1) I know some neat examples of Z80 decompressors, but I am not aware of any systematic lists. I recently did some reverse-engineering of ZX Spectrum based 1Ks. About one third of them were packed; the most popular compressors seemed to be ZX7, MegaLZ and BitBuster (in order of decreasing popularity; note that the respective decompressor sizes are 69, 110 and 88 bytes).
    2) Maybe yes, but the large influence of the decompressor size means that the data format becomes a lot more important than usual. I think this implies a lot of scope for adaptivity and tricks.
    26 replies | 1403 view(s)
  • redrabbit's Avatar
    Today, 01:40
    Hi! I remember a zip program which can create a zip file with the same CRC/size no matter where you use the tool. I actually tried to compress a bunch of files with 7zip 16.04 (64 bits) and zip 3.00 on Linux and on Windows, but the final files don't have the same size; I even tried storing the files and I still get different results. Example:

    wine zip.exe -rq -D -X -0 -A testwindows.zip *.txt
    zip -rq -D -X -0 -A testlinux.zip *.txt
    md5sum *.zip
    725d46abb1b87e574a439db15b1ba506 testlinux.zip
    70df8fe8d0371bf26a263593351dd112 testwindows.zip

    As I said, I remember a zip program (I don't know the name) whose author said that the result was always the same regardless of the platform (win, linux...).
    3 replies | 58 view(s)
  • introspec's Avatar
    Today, 01:40
    Frankly, I do not think Crinkler (or similar tools) are very relevant to this thread. You are right that there could be improvements to the decompressor, but I was trying to say that you won't get LZMA into a sub-50-100b decompressor, so although it is an amazing tool for 4K or 8K intros, it is just a different kind of tool. Your idea to only have a match length of 2 is cool (although I need to try it in practice to see how much ratio one loses in this case). The smallest generic LZ77 on Z80 that I know of has an 18-byte decompression loop, so your 17-byte loop would be interesting to see - have you published it anywhere? In any case, I am working on a small article about such mini-decompressors and am definitely looking forward to anything you will write. I mainly code on Z80, so I do not know much about prefix emitters. Can you point to any discussion of what they can look like?
    26 replies | 1403 view(s)
  • maadjordan's Avatar
    Today, 01:33
    maadjordan replied to a thread WinRAR in Data Compression
    As WinRAR does not support compressing with 7-Zip and its plugins, would you kindly provide a reduced version of your plugins for extraction only? Many thanks.
    185 replies | 129821 view(s)
  • maadjordan's Avatar
    Today, 01:32
    maadjordan replied to a thread WinRAR in Data Compression
    :)
    185 replies | 129821 view(s)
  • Darek's Avatar
    Today, 00:33
    Darek replied to a thread Paq8pxd dict in Data Compression
    I've tested the best options for Byron's dictionary on the files of 4 corpora. It was made for the paq8pxd v85 version. Of course I realise that it could work only for some versions, but for now it looks like it works. The best result is for the Silesia corpus -> 74KB of gain, which is worth using; for the other corpora the gains are smaller, but there is always something. Files not mentioned below didn't get any gain due to the use of the -w option or -exx.

    file: option

    SILESIA
    dickens: -e77,dict
    mozilla: -e26,dict
    osdb: -w
    reymont: -w
    samba: -e133,dict
    sao: -w
    webster: -e373,dict
    TOTAL Silesia savings = 74'107 bytes

    CALGARY
    book1: -e47,dict
    book2: -e43,dict
    news: -e97,dict
    paper2: -e34,dict
    progp: -e75,dict
    Calgary.tar: -e49,dict
    TOTAL Calgary savings = 1'327 bytes
    Calgary.tar savings = 3'057 bytes

    CANTERBURY
    alice29.txt: -e38,dict
    asyoulik.txt: -e53,dict
    lcet10.txt: -e54,dict
    plrabn12.txt: -e110,dict
    Canterbury.tar: -e95,dict
    TOTAL Canterbury savings = 873 bytes
    Canterbury.tar savings = 1'615 bytes

    MAXIMUM COMPRESSION
    world95.txt: -e22,dict
    TOTAL Maximum Compression savings = 1'449 bytes

    Due to all these settings and changes, the Maximum Compression score for paq8pxd v89 is below 6'000'000 bytes! First time ever! (w/o using the tarball option)
    922 replies | 315230 view(s)
  • Shelwien's Avatar
    Yesterday, 23:27
    Windows actually has some preinstalled compression algorithms (deflate, LZX/LZMS): http://hugi.scene.org/online/hugi28/hugi%2028%20-%20coding%20corner%20gem%20cab%20dropping.htm I wonder if the same applies to other platforms? Maybe at least some relevant code in the ROM? (A rough sketch of calling one of the built-in NT routines is after this post.)
    26 replies | 1403 view(s)
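    A rough C sketch of calling one of Windows' built-in decompression routines (my own illustration, not from the linked article; RtlDecompressBuffer in ntdll.dll handles the LZNT1 format - the deflate/LZX/LZMS variants mentioned above live in other APIs such as the Cabinet/compressapi family and are not shown here; the buffers are hypothetical):

    #include <windows.h>
    #include <stdio.h>

    /* RtlDecompressBuffer lives in ntdll.dll; prototype as in the DDK headers. */
    typedef LONG (NTAPI *RtlDecompressBuffer_t)(
        USHORT CompressionFormat, PUCHAR Uncompressed, ULONG UncompressedSize,
        PUCHAR Compressed, ULONG CompressedSize, PULONG FinalSize);

    #define COMPRESSION_FORMAT_LZNT1 2

    int main(void)
    {
        static UCHAR compressed[4096];   /* hypothetical: would hold LZNT1 data */
        static UCHAR output[65536];
        ULONG outSize = 0;

        RtlDecompressBuffer_t RtlDecompressBuffer =
            (RtlDecompressBuffer_t)GetProcAddress(GetModuleHandleA("ntdll.dll"),
                                                  "RtlDecompressBuffer");
        if (!RtlDecompressBuffer) return 1;

        LONG status = RtlDecompressBuffer(COMPRESSION_FORMAT_LZNT1,
                                          output, sizeof(output),
                                          compressed, sizeof(compressed), &outSize);
        printf("status=%ld, decompressed %lu bytes\n", status, (unsigned long)outSize);
        return 0;
    }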
  • Gotty's Avatar
    Yesterday, 18:18
    Gotty replied to a thread paq8px in Data Compression
    Please specify what "digits" mean. Do you mean single-digit ASCII decimal numbers one after the other with no whitespace, like "20200602"? I'm not sure what you mean by "in detail". It would not be easy to demonstrate in a forum post what paq8px does exactly. Especially because paq8px is heavy stuff - it needs a lot of foundation. I suppose you did some research, study and reading since we last met and you would like to dig deeper? If you need real depth, you will need to fetch the source code and study it. However if you have a specific question, feel free to ask it here.
    1857 replies | 539258 view(s)
  • Stefan Atev's Avatar
    Yesterday, 18:05
    My experience being with x86 1K intros, this certainly resonates; at the end of the day, the tiny (de)compressor should only be helping you with code - all the data in the intro should be custom-packed anyway, in a way that makes it difficult to compress for LZ-based algorithms. For example, I remember using basically 2 bits per beat for an audio track (2 instruments only, both OPL-2 synth); fonts would be packed, etc. (A sketch of that kind of 2-bit packing is after this post.) 4K is different, I think; there you just have a lot more room. And for 128B and 256B demos, compression is very unlikely to help, I think.
    26 replies | 1403 view(s)
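    A minimal sketch of the kind of 2-bit-per-beat packing described above (C; my own guess at such a layout, not Stefan Atev's actual format - the meaning of the two bits and the trigger() stand-in are assumptions):

    #include <stdio.h>
    #include <stddef.h>

    enum { INSTRUMENT_A, INSTRUMENT_B };

    static void trigger(int instrument)            /* stand-in for the OPL-2 player */
    {
        printf("trigger instrument %d\n", instrument);
    }

    /* Each beat takes 2 bits, so one byte stores four beats:
       bit 0 of the pair = instrument A plays, bit 1 = instrument B plays. */
    static void play_track(const unsigned char *track, size_t num_beats)
    {
        for (size_t beat = 0; beat < num_beats; beat++) {
            unsigned bits = (track[beat >> 2] >> ((beat & 3) * 2)) & 3u;
            if (bits & 1) trigger(INSTRUMENT_A);
            if (bits & 2) trigger(INSTRUMENT_B);
        }
    }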
  • Gotty's Avatar
    Yesterday, 17:58
    How many bits are used for addressing the hash table (or: how many slots do you have)? How do you exactly implement hashing (do you shift >>)? What do you hash exactly?
    10 replies | 25128 view(s)
  • introspec's Avatar
    Yesterday, 17:05
    Yes, I know about Saukav too. I did not have time to do detailed testing of it, but I understand what it does quite well, and it should offer compression at the level of Pletter (likely somewhat better), while being fairly compact. I believe that its approach to adaptivity, with multiple specific decompressors offered, is an excellent way to increase the compression "for free". However, I strongly suspect that a better solution must be available, most likely for 1K intros and definitely for 256b intros.

    I can explain what I mean as follows. Suppose you are working on a 1K intro that uses Saukav, and at some point you reach the situation where the compressed intro together with the decompressor uses up all available space. Suppose that the average decompressor length is 64 bytes (this is the size of the zx7b decompressor - the origin of Saukav). Then your compressed size is 1024-64=960 bytes. I do not know the exact ratio offered by Saukav, so I'll use Pletter's ratio of 1.975 as a guide. Hence, our intro is actually 960*1.975=1896 bytes long.

    Let us now consider switching to a better compressor, e.g. Shrinkler, which is LZMA-based and compresses at a level similar to 7-zip. Its ratio on the same small-file corpus that I use for many of my tests is about 2.25. Thus, compressed by Shrinkler, our intro will become 1896/2.25~843 bytes long (I should be saying "on average", but it is very annoying to repeat "on average" all the time, so I assume it implicitly). We saved 960-843=117 bytes, which may sound great, yet in fact is useless: the shortest decompressor for Shrinkler on Z80 is 209 bytes long, so we saved 117 bytes of data and added 209-64=145 bytes of decompressor, i.e. lost 28 bytes overall. (The same arithmetic is spelled out in the sketch after this post.)

    The point is, when you are making a 1K intro, Shrinkler will lose to Saukav (to ZX7, MegaLZ, to basically any decent compressor with a compact enough decompressor). When working with 4K of memory, these concerns become pretty much irrelevant; you can use Crinkler, Shrinkler or any other tool of your choice. But at 1K the ratio becomes less of a concern, as long as it is not completely dreadful, and the decompressor size starts to dominate the proceedings. For 256b intros the situation is even more dramatic. I made one such intro using ZXmini (decompressor size 38 bytes) and found it (the decompressor) a bit too large. More can be done for sure.

    So, looking at your graph, I do not know what your target size is, but for anything <=1K, trust me: without the decompressor size included, this data is meaningless.
    26 replies | 1403 view(s)
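    The trade-off above can be written down as a tiny calculation (C; a sketch of introspec's arithmetic, with the ratios and decompressor sizes taken straight from the post - they are estimates, not measured values):

    #include <stdio.h>

    /* Total budget needed = decompressor stub + uncompressed_size / ratio. */
    static double budget(double uncompressed, double ratio, double stub)
    {
        return stub + uncompressed / ratio;
    }

    int main(void)
    {
        double intro = 960.0 * 1.975;                   /* ~1896 bytes of uncompressed intro */

        double saukav    = budget(intro, 1.975, 64.0);  /* ZX7b-style stub */
        double shrinkler = budget(intro, 2.25, 209.0);  /* LZMA-class stub */

        printf("Saukav-style total:    %.0f bytes\n", saukav);    /* ~1024 */
        printf("Shrinkler-style total: %.0f bytes\n", shrinkler); /* ~1052, i.e. ~28 bytes worse */
        return 0;
    }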
  • lz77's Avatar
    Yesterday, 14:44
    In my LZ77-like algorithm the magic number 123456789 is always better than the famous Knuth prime 2654435761. I tested it on enwik8, enwik9, silesia.tar, LZ4.1.9.2.exe, ... I tried using two hash functions (one for even values, the other for odd values), but this worsened the result.
    10 replies | 25128 view(s)
  • Trench's Avatar
    Yesterday, 06:07
    Sorry. Yes, true. But something like that won't be as versatile as the bigger ones - kind of like having a Swiss army knife as a daily kitchen knife. Unless most of the code is in the file, which won't work, or it compresses itself. But it depends on what one wants to decompress, since one size can't fit all, I assume. Zip programs, like all programs, are satellite programs relying on the main computer OS they run on, so the file is technically bigger. If you were to put it on another or older OS, it won't function or be as small. So if the OS has files that help the zip program, then you can get something smaller. So in a way it's kind of cheating. What would a math professor come up with? But again, it's good to try for fun. Just an opinion.
    26 replies | 1403 view(s)
  • Sportman's Avatar
    1st June 2020, 22:23
    Sportman replied to a thread Paq8sk in Data Compression
    I guess -x14; I only do fast tests at the moment, so no enwik9.
    95 replies | 8579 view(s)
  • Dresdenboy's Avatar
    1st June 2020, 22:18
    Trench, I understand these more pragmatic and/or philosophical considerations. They could even erase the benefit of compressors like PAQ due to the computational costs. But here we're discussing decompressors which ought to be usable on older and current platforms, with constraints on the actual code's size (like a program that fits into a 512-byte boot sector, maybe with less than that available for custom code). There are competitions for that.
    26 replies | 1403 view(s)
  • Trench's Avatar
    1st June 2020, 19:55
    A file like 1234567890abcdefghij is size = 20 bytes (20 bytes), size on disk = 4.00 KB (4,096 bytes). But if you make the file name 1234567890abcdefghij.txt and erase the file content, then you have 0 bytes, as cheesy as that sounds. I assume it would still take the same size on your hard drive even if the file is 0 bytes. Even if the file was 0 bytes, the compressed file goes as low as 82 bytes, and with the 20 bytes it's 138 (136 if it's 1 letter 20 times). Sure, it has some pointless info in the compressed file, just like how many pictures also have pointless info. Example: "7z¼¯' óò÷6 2 ÷|\N€€  z . t x t  ªÚ38Ö". Pointless to even have "z . t x t " with so many spaces.

    On a side note, we live in a society which ignores wastefulness, so in everything you see there is plenty of waste. Kind of like how if everyone did not throw away at least 2 grains of rice a day, it would be enough to feed every malnourished starving person (which is the main cause of health issues in the world), yet most throw away 1000 times more than that, even good-quality food, since many confuse the best-by date with an expiration date, while there are no expiration dates on food.

    The file compression programs don't let you set the library that small. The point is: why should it be done? It can be, depending on the file. I assume the bigger the file compression program is, to store various methods, the smaller the file. It would take more hard drive space to have 1000 20-byte files than 1 big file. The transfer rate would also suffer greatly. Sure, it's possible to have an algorithm for something small, but again no one would bother. If you intend to use that method on bigger files, I am guessing it might be limited. But it might be a fun challenge for someone to do.
    26 replies | 1403 view(s)
  • Dresdenboy's Avatar
    1st June 2020, 18:00
    Adding to 1): the 6502's ISA is likely too limited to do something useful in <40B. BTW, is there a comprehensive list of decompressor sizes and algorithms somewhere? You mentioned several in your Russian article. Adding to 2): the chances of being able to compete diminish quickly with less code. Compressed data size + decompressor might be an interesting metric. But I got an idea to do some LZ78 variant. Maybe it stays below 40B.
    26 replies | 1403 view(s)
  • Dresdenboy's Avatar
    1st June 2020, 17:26
    Interesting idea. So is it pop cs as a jump instruction? With a COM executable and initial registers set to DS/CS (and useful constants like -2 or 100h in others) this sounds good.
    26 replies | 1403 view(s)
  • JamesB's Avatar
    1st June 2020, 16:02
    JamesB replied to a thread Brotli in Data Compression
    With libdeflate I get 29951 (29950 with -12). So close to zstd (I got 29450 on that).
    258 replies | 82573 view(s)
  • JamesB's Avatar
    1st June 2020, 15:54
    It's good for block-based formats, but the lack of streaming may be an issue for a general-purpose zlib replacement. However, even for streaming gzip you could artificially chunk the data into relatively large blocks. It's not ideal, but maybe the better speed/ratio tradeoff still means it's a win for most data types. We use it in bgzf (the wrapper for the BAM and BCF formats), which has pathetically small block sizes, as a replacement for zlib.
    6 replies | 435 view(s)
  • suryakandau@yahoo.co.id's Avatar
    1st June 2020, 15:36
    Sportman, could you test enwik9 using paq8sk23 -x4 -w -e1,English.dic please?
    95 replies | 8579 view(s)
  • Sportman's Avatar
    1st June 2020, 12:57
    Sportman replied to a thread Paq8sk in Data Compression
    enwik8:
    15,753,052 bytes, 13,791.106 sec., paq8sk23 -x15 -w
    15,618,351 bytes, 14,426.736 sec., paq8sk23 -x15 -w -e1,english.dic
    95 replies | 8579 view(s)
  • Fallon's Avatar
    1st June 2020, 07:46
    Fallon replied to a thread WinRAR in Data Compression
    WinRAR - What's new in the latest version https://www.rarlab.com/download.htm
    185 replies | 129821 view(s)
  • Shelwien's Avatar
    1st June 2020, 06:22
    Fast encoding strategies like to skip hashing inside of matches. Otherwise, you can simply get collisions between hash values - it's easily possible that the cell for 'zabc' gets overwritten while the cell for 'abcd' doesn't. (A sketch of what such skipping looks like is after this post.)
    1 replies | 164 view(s)
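    A minimal sketch of the first point (C; a greedy parser of my own invention for illustration, not Shelwien's or lz77's actual code - the hash function and table size are placeholders). Positions covered by an emitted match are never inserted into the hash table, so a later occurrence of 'zabcd' may have no cell even though 'abcd' does:

    #include <stdint.h>
    #include <stddef.h>

    #define TABLEBITS 17
    #define NO_POS    ((size_t)-1)

    static size_t head[1u << TABLEBITS];            /* one cell per hash: last position seen */

    static uint32_t hash4(const unsigned char *p)   /* placeholder multiplicative hash */
    {
        uint32_t dw = (uint32_t)p[0] | ((uint32_t)p[1] << 8)
                    | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
        return (dw * 123456789u) >> (32 - TABLEBITS);
    }

    static size_t match_len(const unsigned char *buf, size_t a, size_t b, size_t n)
    {
        size_t len = 0;
        while (b + len < n && buf[a + len] == buf[b + len]) len++;
        return len;
    }

    /* Greedy "fast mode" parse: after a match is taken, the positions inside it are
       skipped entirely, so strings starting there (like 'zabcd') never get a table cell. */
    void parse(const unsigned char *buf, size_t n)
    {
        for (size_t i = 0; i < (1u << TABLEBITS); i++) head[i] = NO_POS;

        size_t pos = 0;
        while (pos + 4 <= n) {
            uint32_t h    = hash4(buf + pos);
            size_t   prev = head[h];
            head[h] = pos;                           /* only the match START is inserted */

            size_t len = (prev != NO_POS) ? match_len(buf, prev, pos, n) : 0;
            pos += (len >= 4) ? len : 1;             /* emit_match / emit_literal omitted */
        }
    }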
  • suryakandau@yahoo.co.id's Avatar
    1st June 2020, 02:52
    You are right, but the result is better than the paq8pxd series. :)
    95 replies | 8579 view(s)
  • Stefan Atev's Avatar
    1st June 2020, 01:56
    That was my experience for sure; I think I just had to make sure decompression started at a 16B-aligned offset, so you could later just bump a segment register to point to the decompressed code when you jump to it.
    26 replies | 1403 view(s)
  • Sportman's Avatar
    31st May 2020, 19:39
    The Ongoing CPU Security Mitigation Impact On The Core i9 10900K Comet Lake: "At least for the workloads tested this round, when booting the new Intel Core i9 10900K "Comet Lake" processor with the software-controlled CPU security mitigations disabled, the overall performance was elevated by about 6% depending upon the workload." https://www.phoronix.com/scan.php?page=article&item=intel-10900k-mitigations
    19 replies | 4522 view(s)
  • lz77's Avatar
    31st May 2020, 18:12
    Why can the thing in the subject often be done, even if we have a hash table of 128K cells and remember a hash for each position? For example: we find a match from the current position for the substring 'abcd' in the string ...zabcd..., and then we find that 'zabcd' also matches. Sorry for my English...
    1 replies | 164 view(s)
  • Jyrki Alakuijala's Avatar
    31st May 2020, 17:53
    Many major game installs for PS4 that I observed were writing 100 kB/s for a large fraction of the install. This is pretty disappointing, since there are 5000x faster decompression solutions readily available in open source. Somehow, just going for a 15000x faster commercial solution is unlikely to help unless the system-level problems are fixed first. Most likely these relate to poorly planned or designed optical disc I/O, or the data layout on the optical disc, not decompression.
    46 replies | 24193 view(s)
  • Jyrki Alakuijala's Avatar
    31st May 2020, 17:47
    Smallest decoder for general purpose decoding just starts executing the compressed signal.
    26 replies | 1403 view(s)
  • Darek's Avatar
    31st May 2020, 16:58
    Darek replied to a thread Paq8sk in Data Compression
    Could you post the source code with every version? Short-time test: paq8sk23 is about 17% faster than paq8sk22, but is still about 40% slower than paq8sk19 and 78% slower than the paq8pxd series.
    95 replies | 8579 view(s)
  • suryakandau@yahoo.co.id's Avatar
    31st May 2020, 15:17
    Paq8sk23 - improved text model - faster than paq8sk22
    95 replies | 8579 view(s)
  • Dresdenboy's Avatar
    31st May 2020, 13:42
    As you're mentioning Crinkler: I had some interesting discussions with Ferris, who created Squishy. He uses a small decompressor to decompress the actual decompressor; he said the smaller one is about 200B. Then there is xlink, where unlord planned to give a talk at Revision Online 2020, but it was cancelled. There seems to be some progress which he hasn't published yet; this might also be interesting to watch. BTW, my smallest decompression loop (for small data sizes and only a match length of 2) is 12B. Making it more generic for offsets "blows" it up to 17B. A typical LZ with multiple lengths starts at ~20B, depending on the variant and assumptions. There are likely similarities to Stefan Atev's lost one, but I have to continue testing all those variants (with their respective encoders) before publishing more about them. (A toy model of the length-2-only case is sketched after this post.) Another idea was to encode some x86-specific prefix. Such a prefix emitter can be as small as 15B.
    26 replies | 1403 view(s)
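    A toy C model of a "match length 2 only" decoder (the token format here is invented purely for illustration - it is not Dresdenboy's 12-byte x86 loop, whose encoding is not published in this thread): a zero byte introduces a literal, any other byte is an offset for a fixed 2-byte copy.

    #include <stddef.h>

    /* Decode src[0..src_len) into dst; returns the number of bytes written.
       Token format (assumed for this sketch):
         0x00, X   -> emit literal byte X
         N (1-255) -> copy 2 bytes from N bytes behind the output pointer */
    size_t decode_len2(const unsigned char *src, size_t src_len, unsigned char *dst)
    {
        const unsigned char *end = src + src_len;
        unsigned char *out = dst;

        while (src < end) {
            unsigned char t = *src++;
            if (t == 0) {
                *out++ = *src++;                 /* literal */
            } else {
                out[0] = out[-(ptrdiff_t)t];     /* fixed-length-2 match */
                out[1] = out[1 - (ptrdiff_t)t];
                out += 2;
            }
        }
        return (size_t)(out - dst);
    }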
  • Dresdenboy's Avatar
    31st May 2020, 02:21
    No problem. Accumulating this information sounds useful. I also collected a lot of information and did some analysis, both of encoding ideas for small executables and of existing compressors. Recently I stumbled over ZX7 and, related to it, Saukav. The latter is cool, as it creates a decompressor based on the actual compression variant and parameters. Before finding it, I had already deemed this a necessity to keep sizes small - especially for tiny intros, where a coder could omit some of the generic decompressor code (code mover, decompression to the original address) to save even more bytes (aside from coding compression-friendly). Here is an example with some of the tested compression algorithms (sizes w/o decompressor stubs and other data blocks, e.g. in the PAQ archive file format), leaving out all samples with less than 5% reduction, as they might be compressed already. Also interesting would be the total size incl. decompressor (not done yet); in that case we might just see different starting offsets (decompressor stub, tables etc.) on the Y axis and different gradients with increasing X.
    26 replies | 1403 view(s)
  • introspec's Avatar
    30th May 2020, 23:54
    Yes, thank you. I should have mentioned that when I gave my estimated tiny compressor sizes, I had a quick look in several places and definitely used Baudsurfer's page for reference. Unfortunately, his collection of routines is not very systematic (in the sense that I know better examples for at least some CPUs, e.g. Z80), so I am hoping that a bit more representative collection of examples can be gradually accumulated here.
    26 replies | 1403 view(s)
  • Shelwien's Avatar
    30th May 2020, 21:41
    > I would also like to see some other limitations of the contest:
    > I read that there would be a speed limit, but what about a RAM limit?
    I guess there would be a natural one - the test machine obviously won't have infinite memory.

    > There are fast NN compressors, like MCM, or LPAQ.
    Yes, these would be acceptable, just not full PAQ or CMIX.

    > It's hard to fight LZ algorithms like RAZOR so I wouldn't try going in that direction.
    Well, RZ is a ROLZ/LZ77/Delta hybrid. It's still easy enough to achieve better compression via CM/PPM/BWT (and encoding speed too), or much faster decoding with worse compression.

    > Are AVX and other instruction sets allowed?
    Yes, but likely not AVX512, since it's hard to find a test machine for it.

    > What would be nice is some default preprocessing.
    > If it's an english benchmark, why shouldn't .drt preprocessing (like the one from cmix)
    > be available by choice (or .wrt + english.dic like the one from paq8pxd).
    I proposed that, but this approach has a recompression exploit - somebody could undo our preprocessing, then apply something better. So we'd try to explain that preprocessing is expected and post links to some open-source WRT implementations, but the data won't be preprocessed by default.

    > It would save some time for the developers not to incorporate them into their compressors,
    > if there were a time limit for the contest.
    It should run for a few months, so there should be enough time. There are plenty of ways to make a better preprocessor - WRT is not the only option (e.g. NNCP's preprocessor outputs a 16-bit alphabet) - so it's not a good idea to block that and/or force somebody to work on WRT reverse-engineering.
    15 replies | 960 view(s)
  • Jarek's Avatar
    30th May 2020, 18:19
    Jarek replied to a thread Kraken compressor in Data Compression
    Road to PS5: https://youtu.be/ph8LyNIT9sg?t=1020 custom kraken >5GB/s decompressor ...
    46 replies | 24193 view(s)
  • Dresdenboy's Avatar
    30th May 2020, 17:57
    Thanks for opening this thread. I'm working on my own tiny decompression experiments. And for starters let me point you to Baudsurfer's (Olivier Poudade) assembly art section on his Assembly Language Page: http://olivier.poudade.free.fr/ (site seems a bit buggy sometimes), which has several tiny compressors and decompressors for different platforms.
    26 replies | 1403 view(s)
  • Darek's Avatar
    30th May 2020, 14:33
    Darek replied to a thread Paq8sk in Data Compression
    I will. At least I'll try :) I need 2-3 days to finish the task which is in progress and then I'll start on paq8sk19. paq8sk22 looks to me like a move in the wrong direction - a very slight improvement at the cost of doubling the compression time.
    95 replies | 8579 view(s)
  • Darek's Avatar
    30th May 2020, 14:29
    Darek replied to a thread Paq8sk in Data Compression
    @Sportman - it's a dramatic change in compression time - does this version use much more memory than the previous one?
    95 replies | 8579 view(s)
  • AlexDoro's Avatar
    30th May 2020, 09:39
    I would vote for private. I would also like to see some other limitations of the contest: I read that there would be a speed limit, but what about a RAM limit? There are fast NN compressors, like MCM or LPAQ; I mean they could be a starting point for some experimental fast compressors. It's hard to fight LZ algorithms like RAZOR, so I wouldn't try going in that direction. Are AVX and other instruction sets allowed? What would be nice is some default preprocessing. If it's an English benchmark, why shouldn't .drt preprocessing (like the one from cmix) be available by choice (or .wrt + english.dic like the one from paq8pxd)? It would save some time for the developers not to incorporate them into their compressors, if there were a time limit for the contest.
    15 replies | 960 view(s)
  • suryakandau@yahoo.co.id's Avatar
    30th May 2020, 06:21
    How about paq8sk22 -x15 -w -e1,english.dic for enwik8
    95 replies | 8579 view(s)
  • Trench's Avatar
    30th May 2020, 05:04
    1. What is a programmer? A translator from one language to another. What is a designer? A person that creates. So what is a programmer that tries to create a better file compressor like? A translator that wants to change profession to be the next best-selling author like Stephen King.
    2. What should you probably not say when you have failed to pack a file? "Fudge, I failed to pack it."
    3. Watch out for your phrasing if you ask another whether they can squeeze your dongle.
    4. A programmer was asked "what are you doing?" and said "concentrating on how to un-concentrate something." The other said "easy, have a few beers."
    5. So the drunk programmer went and bought un-concentrated orange juice to sit on the carton, and a person asks why they are sitting on it. They say "to concentrate it, obviously."
    6. When you ask another to compress lemon juice for you and then wonder if it can be decompressed, maybe it's time to take a break.
    7. Don't be impressed if someone compresses a file for you by 99% of the file size, since they will say it can't be decompressed because you didn't also ask for that.
    8. A judge tells your lawyer to zip it, you misunderstand and say "no, RAR", and you are held in contempt for growling at the judge.
    9. A programmer says he was looking for new ways of packing files all day, and another says "you must be tired from lifting so many files."
    10. Your friend wants forgiveness from you and sends you a gift in a 7-Zip file. You uncompress the file, and there is another compressed file inside, and after the 7th file it is still compressed, with a new name saying Matthew 18:21-22. Can you guess how many files you have left to uncompress?
    7 replies | 1660 view(s)
  • Amsal's Avatar
    30th May 2020, 03:53
    Well, I can't vote, but I would go with the private dataset option. A few of the reasons why I prefer Option 2 over Option 1:
    1. The resulting compressor/algorithm has more general use in practical terms than a compressor which is optimized for a specific file/dataset, which is pretty useless most of the time if you think about it.
    2. Allowing the use of a dictionary is also a great add-on to the contest.
    3. I have no problem (and I suppose most people won't have) if an algorithm/compressor uses 10 methods (precomp+srep+lzma etc..) or just modifies 1 method (like lzma) to produce better results on multiple datasets, as long as it gets results which could be used as a better option in practical ways on multiple datasets.
    I totally agree with these three points by you as well, and it would be great to have a contest like this. As for me, I wouldn't even mind a 16MB compressor if it really saves more size than any other compressor when I compress a 50GB dataset to something like 10GB while other compressors are around 12GB - a 16MB compressor is a small size to account for. But anyway, it's a competition, so we take account of everything, so fine by me :D
    15 replies | 960 view(s)
  • Trench's Avatar
    30th May 2020, 02:53
    AMD was for years mostly better based on price/performance. It is like: why pay a billion for something that gives 1% better gain, while the other costs a dollar for 1% less than the others. You can buy more AMD CPUs to outperform Intel. It's just that Intel has better marketing. AMD just does not get it, since they are bad at presentation; I don't even understand their product ordering, and they always reference Intel as the benchmark for performance. Big game companies got it and have used AMD for a while now. So, in short, a million dollars of AMD CPUs can beat a million dollars' worth of Intel CPUs. But to be fair, 15 years is a long time. Also, Linux still sucks and will remain at 2% popularity, since they just don't get it and cannot give it away for free no matter how many flavours they have. It's mainly a hobby OS and not fit for use by ordinary people, which makes it pointless despite being more powerful. Android is better since designers took over. Which goes to show: never hire a translator to write novels, just like never hire a programmer as a designer. Progress is slowed down when one profession insists on doing another profession's work.
    2 replies | 124 view(s)
  • Sportman's Avatar
    30th May 2020, 01:47
    Sportman replied to a thread Paq8sk in Data Compression
    enwik8:
    15,755,063 bytes, 14,222.427 sec., paq8sk22 -x15 -w
    15,620,894 bytes, 14,940.285 sec., paq8sk22 -x15 -w -e1,english.dic
    95 replies | 8579 view(s)
  • suryakandau@yahoo.co.id's Avatar
    29th May 2020, 19:54
    The result using paq8sk22 -s6 -w -e1,English.dic on the Dickens file is 1900420 bytes.
    95 replies | 8579 view(s)
  • Cyan's Avatar
    29th May 2020, 19:41
    Cyan replied to a thread Zstandard in Data Compression
    This all depends on storage strategy. Dictionary is primarily useful when there are tons of small files. But if the log lines are just appended into a single file, as is often the case, then just compress the file normally, it will likely compress very well.
    435 replies | 131005 view(s)
  • Jon Sneyers's Avatar
    29th May 2020, 18:34
    Yes, that would work. Then again, if you do such non-standard stuff, you can just as well make JPEG support alpha transparency by using 4-component JPEGs with some marker that says that the fourth component is alpha (you could probably encode it in such a way that decoders that don't know about the marker relatively gracefully degrade by interpreting the image as a CMYK image that looks the same as the desired RGBA image except it is blended to a black background). Or you could revive arithmetic coding and 12-bit support, which are in the JPEG spec but just not well supported. I guess the point is that we're stuck with legacy JPEG decoders, and they can't do parallel decode. And we're stuck with legacy JPEG files, which don't have a jump table. And even if we would re-encode them with restart markers and jump tables, it would only give parallel striped decode, not efficient cropped decode.
    15 replies | 842 view(s)
  • suryakandau@yahoo.co.id's Avatar
    29th May 2020, 15:37
    Paq8sk22 - improved text model
    95 replies | 8579 view(s)
  • pklat's Avatar
    29th May 2020, 14:58
    pklat replied to a thread Zstandard in Data Compression
    What would be the best way to create a dictionary for a log file, such as these from spamassassin?

    Oct 19 03:42:59 localhost spamd: spamd: connection from blabla.bla.com :61395 to port 1783, fd 5
    Oct 19 03:42:59 localhost spamd: spamd: checking message <0OY10MRFLRL00@blablafake.com> for (unknown):101
    Oct 19 03:43:00 localhost spamd: spamd: clean message (3.3/8.0) for (unknown):101 in 2.0 seconds, 8848 bytes.
    Oct 19 03:43:00 localhost spamd: spamd: result: . 3 - DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HTML_MESSAGE,MIME_HTML_ONLY,MISSING_FROM,RDNS_NONE scantime=2.0,size=8848,user=(unknown),uid=101,required_score=8.0,rhost=blablafake.com,raddr=ip.add.re.ss,rport=45995,mid=<b9a461d565a@blabla.com>,autolearn=no autolearn_force=no

    How do I manually create a dictionary?
    435 replies | 131005 view(s)
  • Bulat Ziganshin's Avatar
    29th May 2020, 12:40
    Bulat Ziganshin replied to a thread Brotli in Data Compression
    AFAIK zstd "dictionary" is just prepended data for LZ matches. This approach can be used with any LZ compressor. While brotli dictionary is a list of byte sequences, plus 6 (?) transformations that can be applied to these byte sequences before inserting them into the stream. Yiu can prepend data for LZ matches with brotli too.
    258 replies | 82573 view(s)
  • Bulat Ziganshin's Avatar
    29th May 2020, 12:34
    Bulat Ziganshin replied to a thread Zstandard in Data Compression
    We may need a separate topic, but my little insight is the following: in image compression, we have a 2D model and try to predict each pixel using data from the left and above. In video, we even have a 3rd dimension (the previous frame). But general compression is usually limited to 1D, although repeated distances and literal masking add a tiny bit of a 2nd dimension to the LZ data model.

    Patching is a natural 2D model - rather than considering it as the 1D concatenation "ORIGINAL MODIFIED", you should look at it as
    ORIGINAL
    MODIFIED
    i.e. the two files aligned one above the other. This changes the model for LZ back-references: we should keep a "current pointer" in the ORIGINAL data and try to encode each reference relative to this pointer. It will reduce the encoded reference size and thus allow referencing smaller strings from the ORIGINAL data. Also, we can use masked literals, i.e. use the "corresponding byte" as the context for encoding the current one. (A rough sketch of the relative-reference idea is after this post.)

    Knowledge that we are patching should also allow faster match search. Each time the previous match ends, we have:
    1) the current byte in the MODIFIED data,
    2) the "current byte" in the ORIGINAL data,
    3) the last actually used byte in the ORIGINAL data.
    So we suppose that the next match may have srcpos near 2 or 3 and dstpos at 1 or a bit later. So we may look around for smaller matches (2-3 bytes) before going to a full-scale search.
    435 replies | 131005 view(s)
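    A rough C sketch of the relative-reference idea above (my own reading of Bulat's post, not the format of zstd --patch-from or any existing tool; the op layout is an assumption, and masked literals are omitted):

    #include <string.h>
    #include <stddef.h>

    /* One patch op: copy lit_len new bytes from the patch stream, then copy copy_len
       bytes from the ORIGINAL file at a small signed delta from the tracked pointer. */
    typedef struct {
        size_t lit_len;     /* literals taken from the patch stream             */
        long   src_delta;   /* offset of the copy, relative to orig_pos         */
        size_t copy_len;    /* bytes copied from ORIGINAL                       */
    } PatchOp;

    void apply_op(const unsigned char *orig, size_t *orig_pos,
                  const unsigned char *literals,
                  unsigned char *out, size_t *out_pos,
                  PatchOp op)
    {
        memcpy(out + *out_pos, literals, op.lit_len);
        *out_pos += op.lit_len;

        *orig_pos = (size_t)((long)*orig_pos + op.src_delta);   /* small delta = cheap to encode */
        memcpy(out + *out_pos, orig + *orig_pos, op.copy_len);  /* copy from ORIGINAL            */
        *out_pos  += op.copy_len;
        *orig_pos += op.copy_len;                               /* pointer advances with the copy */
    }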
  • pklat's Avatar
    29th May 2020, 11:28
    OK Jyrki, will do. Forgot to mention I don't have an AVX CPU, which was required before, if that matters.
    155 replies | 37023 view(s)
  • Jyrki Alakuijala's Avatar
    29th May 2020, 10:27
    Jyrki Alakuijala replied to a thread Brotli in Data Compression
    This is likely a misunderstanding. Brotli can use the same linear dictionaries used in zstd and the same tooling. The dictionary mechanism with a simple grammar is in addition to that, but ordinary linear dictionaries can be used. One just gets a bit less benefit from them (but not less than zstd gets from these simple dictionaries). Zstd does not yet support transforms on dictionaries as far as I know.
    258 replies | 82573 view(s)
  • Jyrki Alakuijala's Avatar
    29th May 2020, 10:13
    Could you file an issue either on the jpegxl GitLab or on the brunsli GitHub repo and we will look at it and make them not differ. We have run this on lots of files successfully, so this is either a special corner case or, more likely, a recent bug. We are currently converting these compressors to more streaming operation and to more easily streamable APIs, and this bug might have come from that effort. Thank you in advance!
    155 replies | 37023 view(s)
  • Shelwien's Avatar
    29th May 2020, 06:38
    Adding decompressor size requires absurd data sizes to avoid exploits (for a 1GB dataset, the compressed zstd size is still ~0.1% of the total result). Otherwise the contest can turn into a decoder size optimization contest if the intermediate 1st place is open-source. Also, Alex pushes for a mixed dataset (part public, part private, with uncertain shares), but I think that it just combines the negatives of both options (overtuning still possible on the public part, decoder size still necessary to avoid exploits, compressed size of the secret part still not 100% predictable in advance).
    15 replies | 960 view(s)
  • SvenBent's Avatar
    29th May 2020, 04:49
    I can't vote, but I would vote private/secret. The public dataset encourages over-tuning, which is not really helpful or a demonstration of general compression ability; in the real world the compressor does not know the data ahead of compression time. I would still add size + decompressor though.
    15 replies | 960 view(s)
  • Cyan's Avatar
    29th May 2020, 04:09
    Cyan replied to a thread Zstandard in Data Compression
    So far, we have only thoroughly compared with bsdiff. We can certainly extend the comparison to more products, to get a more complete picture. MT support for --patch-from works just fine. In terms of positioning, zstd is trying to bring speed to the formula: fast generation of patches, fast application of patches. There are use cases which need speed and will like this trade-off, compared to more established solutions which tend to be less flexible in terms of range of speed. At this stage, we don't try to claim the "best" patch size. There are a few scenarios where zstd can be quite competitive, but that's not always the case. This first release will hopefully help us understand users' expectations, in order to select the next batch of improvements. This is new territory for us; there is still plenty of room for improvements, both feature- and performance-wise. One unclear aspect to me is how much benefit a dedicated diff engine (as opposed to recycling our "normal" search engine) could achieve while preserving the zstd format. There are, most likely, some limitations introduced by the format, since it wasn't created with this purpose in mind. But how much comes from the format, as opposed to the engine? This part is unclear to me. Currently, I suspect that the most important limitations come from the engine, hence better patch sizes should be possible.
    435 replies | 131005 view(s)
  • Shelwien's Avatar
    29th May 2020, 02:42
    Shelwien replied to a thread Zstandard in Data Compression
    I asked FitGirl to test it... got this:

    1056507088 d2_game2_003.00             // (1) game data
    1383948734 d2_game2_003.resources      // (2) precomp output
    327523769  d2_game2_003.resources.x5   // xdelta -5
    245798553  d2_game2_003.resources.x5.8 // compressed
    278021923  d2_game2_003.resources.zsp  // zstd -patch
    247363158  d2_game2_003.resources.zsp.8 // compressed

    Speed-wise zstd patching seems good, but it has a 2G window limit, MT support for this is unknown, and overall specialized patchers seem to work better.
    435 replies | 131005 view(s)