Activity Stream

  • Shelwien's Avatar
    21st May 2020, 19:48
    English plaintext compression task.
    1. Public dataset:
    - we have to add decompressor size to compressed size;
    - encourages compiler tweaks, discourages speed optimization via inlining/unrolling (lzma.exe.7z: 51,165; zstd.exe.7z: 383,490);
    - encourages overtuning (participants can tune their entries to specific data);
    - embedded dictionaries are essentially blocked.
    2. Private dataset:
    - decompressor size has to be limited for technical reasons (<16MB?);
    - embedded dictionaries can be used fairly;
    - embedded dictionaries actually improve compression results (see brotli vs zstd benchmarks);
    - we can post hashes of private files in advance, then post the files after the end of the contest.
    It's for an actual contest that is being prepared. Please vote.
    15 replies | 959 view(s)
  • SolidComp's Avatar
    22nd May 2020, 15:54
    Hi all – I'm curious about any comparisons between JPEG XL and AVIF. What are the advantages and disadvantages of each? Which one is expected to be supported by web browsers? Am I correct in understanding that JPEG XL can be used as an acquisition format, in cameras? To fully replace JPEG we need an in-camera format, not just a web format like WebP. Are either of these formats suitable for graphics (replacing PNG and WebP), or just for photos? JPEG XL page. AVIF article.
    16 replies | 910 view(s)
  • Jon Sneyers's Avatar
    26th May 2020, 20:30
    Hi everyone! I wrote a blog post about the current state of JPEG XL and how it compares to other state-of-the-art image codecs. https://cloudinary.com/blog/how_jpeg_xl_compares_to_other_image_codecs
    15 replies | 839 view(s)
  • CreepyLP's Avatar
    11th May 2020, 20:07
    Hey guys, I found out about repacks like a month ago, and now I want to try to make my own. I already have a game which I want to repack and maybe crack: Lara Croft and the Temple of Osiris. But what do I have to do? As you can see, I am a noob. Ideally someone could even explain it to me in German. As I said, I'm still green behind the ears, but ready and willing to learn. What I've already done: installed Lara Croft and the Temple of Osiris from Steam; installed Lara Croft and the Temple of Osiris as a RePack (to see the differences); installed FreeArc, InnoSetup, ResourceHacker and dotPeek; and compared the Steam_API.dll files of Lara Croft and the Temple of Osiris, GTA V and the repacked one to see the differences. What should I do next? Stay healthy!
    5 replies | 666 view(s)
  • Bulat Ziganshin's Avatar
    7th May 2020, 13:15
    Note that you absolutely need to compile with optimizations on (-O2) in order to get a correct result. The algorithm is based on the assumption that the CPU is superscalar and can perform the ADD and XOR operations in a single cycle. (A build-and-run example follows this post.)
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/time.h>

    // Perform CYCLES simple in-order operations
    unsigned loop(int CYCLES)
    {
        unsigned a = rand(), b = rand(), x = rand();
        for (int i = 0; i < CYCLES/10; i++) {
            x = (x + a) ^ b;
            x = (x + a) ^ b;
            x = (x + a) ^ b;
            x = (x + a) ^ b;
            x = (x + a) ^ b;
        }
        return x;
    }

    int main(int argc, char **argv)
    {
        int CYCLES = 100*1000*1000;
        unsigned x = loop(CYCLES/10); // warm up the cpu
        struct timeval tm, tn;
        gettimeofday(&tm, NULL);
        x += loop(CYCLES);
        gettimeofday(&tn, NULL);
        double t1 = (tn.tv_sec - tm.tv_sec) + (tn.tv_usec - tm.tv_usec) / 1e6;
        if (x) printf("Time: %.6f s, CPU freq %.2f GHz\n", t1, (CYCLES/1e9)/t1);
        return 0;
    }
    6 replies | 709 view(s)
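    A minimal build-and-run sketch for the loop above (assuming gcc on Linux; the file name cpufreq.c is just a placeholder). As noted in the post, -O2 or higher is needed so the adds and xors stay in registers:
        gcc -O2 -o cpufreq cpufreq.c
        ./cpufreq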
  • SolidComp's Avatar
    25th May 2020, 17:39
    Hi all – @Kirr made an incredibly powerful compression benchmark website called the Sequence Compression Benchmark. It lets you select a bunch of options and run it yourself, with outputs including graphs, column charts, and tables. It can run every single level of every compressor. The only limitation I see at this point is the lack of text datasets – it's mostly genetic data. @Kirr, four things:
    1. Broaden it to include text? Would that require a name change or ruin your vision for it? It would be great to see web-based text, like the HTML, CSS, and JS files of the 100 most popular websites, for example.
    2. The gzipper you currently use is the GNU gzip utility program that comes with most Linux distributions. If you add some text datasets, especially web-derived ones, the zlib gzipper will make more sense than the GNU utility. That's the gzipper used by virtually all web servers.
    3. In my limited testing the 7-Zip gzipper is crazy good, so good that it approaches Zstd and brotli levels. It's long been known to be better than GNU gzip and zlib, but I didn't know it approached Zstd and brotli. It comes with the 7-Zip Windows utility released by Igor Pavlov. You might want to include it.
    4. libdeflate is worth a look. It's another gzipper.
    The overarching message here is that gzip ≠ gzip. There are many implementations, and the GNU gzip utility is likely among the worst. (A quick command-line comparison follows this post.)
    6 replies | 435 view(s)
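    For illustration, one way to compare the different gzippers (plus the zstd and brotli reference points) on one file; the flags shown are the usual maximum-compression settings, and input.html is a placeholder name:
        gzip -9 -c input.html > gnu.gz            # GNU gzip
        7z a -tgzip -mx=9 7zip.gz input.html      # 7-Zip's deflate encoder
        zstd -19 -o out.zst input.html
        brotli -q 11 -o out.br input.html
        ls -l gnu.gz 7zip.gz out.zst out.br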
  • SolidComp's Avatar
    8th May 2020, 20:42
    Hi all – What's the state of the art in archive formats, or combined archive+compression formats? Is 7z considered the best? (Does Zstd have archive features?) Did Bulat have one? It seems like there hasn't been much movement from TAR files, because I see people slinging them around constantly. I'm vaguely aware of RAR but I don't know much about it. I stumbled on this article by some dude named Antonio Diaz, arguing that XZ is inadequate for long-term archival. I haven't finished it yet, but it got me thinking about specifying a super safe archival and compression format for long-term data archival. Most of these formats seem largely undocumented, which is strange. It would be nice to have a combined format so that it takes only one step to both archive and compress. What makes for a good archival format? What features or philosophy? Thanks.
    2 replies | 603 view(s)
  • smjohn1's Avatar
    4th May 2020, 18:32
    Does anyone know any existing tools that can display the content of LZ-compressed data, e.g., lz4 or deflate? Not a decoder, but rather one step before the decoder, i.e., one that shows, say, (L,6)(M,10,20) etc., where (L,6) stands for a literal of length 6, and (M,10,20) stands for a match of length 10 at an offset of 20. I know it is not hard to write such a tool, but I'm just wondering if such tools already exist (just lazy). Thanks in advance for any tips. (A sketch for LZ4 raw blocks follows this post.)
    2 replies | 524 view(s)
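    Something along these lines would do it for LZ4. A rough C sketch that walks a raw LZ4 block (not a .lz4 frame, so any frame header has to be stripped first) and prints the sequence tokens in the (L,n)/(M,len,offset) notation above; a sketch to show the block format, not a hardened tool:
        #include <stdio.h>
        #include <stdlib.h>

        /* Dump (literal, match) tokens of one raw LZ4 block given on the command line. */
        int main(int argc, char **argv) {
            if (argc < 2) { fprintf(stderr, "usage: %s rawblock.bin\n", argv[0]); return 1; }
            FILE *f = fopen(argv[1], "rb");
            if (!f) { perror("fopen"); return 1; }
            fseek(f, 0, SEEK_END); long n = ftell(f); fseek(f, 0, SEEK_SET);
            unsigned char *p = malloc(n);
            if (!p || fread(p, 1, n, f) != (size_t)n) { fprintf(stderr, "read error\n"); return 1; }
            fclose(f);
            long i = 0;
            while (i < n) {
                unsigned token = p[i++];
                unsigned litlen = token >> 4, mlen = token & 15;
                unsigned b;
                if (litlen == 15) do { b = p[i++]; litlen += b; } while (b == 255);
                printf("(L,%u) ", litlen);
                i += litlen;                      /* skip the literal bytes themselves */
                if (i + 2 > n) break;             /* last sequence carries literals only */
                unsigned offset = p[i] | (p[i+1] << 8); i += 2;   /* little-endian offset */
                if (mlen == 15) do { b = p[i++]; mlen += b; } while (b == 255);
                printf("(M,%u,%u) ", mlen + 4, offset);           /* minmatch is 4 */
            }
            printf("\n");
            free(p);
            return 0;
        }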
  • smjohn1's Avatar
    26th May 2020, 00:36
    README.md: `LZ4_DISTANCE_MAX`: controls the maximum offset that the compressor will allow. Set to 65535 by default, which is the maximum value supported by the lz4 format. Reducing the maximum distance will reduce opportunities for LZ4 to find matches, hence will produce worse compression ratios. The above is true for the high compression modes, i.e., levels above 3, but the opposite is true for compression levels 1 and 2. Here is a test result using the default value (65535):
    <TestData> lz4-v1.9.1 -b1 enwik8
    1#enwik8 : 100000000 -> 57262281 (1.746), 325.6 MB/s, 2461.0 MB/s
    and the result using a smaller value (32767):
    <TestData> lz4-1.9.1-32 -b1 enwik8
    1#enwik8 : 100000000 -> 53005796 (1.887), 239.3 MB/s, 2301.1 MB/s
    Anything unusual in the LZ4_compress_generic() implementation? Could anyone shed some light? Thanks in advance. (A build note for reproducing this follows this post.)
    4 replies | 229 view(s)
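    For anyone who wants to reproduce the 32767 build: LZ4_DISTANCE_MAX is an #ifndef-guarded compile-time macro in lib/lz4.c, so (assuming a stock lz4 source tree) the smaller limit can be set by overriding it on the compiler command line, e.g.:
        cc -O3 -DLZ4_DISTANCE_MAX=32767 -c lib/lz4.c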
  • redrabbit's Avatar
    Today, 01:40
    Hi! I remember a zip program which can create a zip file with the same CRC/size no matter where you run the tool. I actually tried to compress a bunch of files with 7zip 16.04 (64 bits) and zip 3.00 on Linux and on Windows, but the final files don't have the same size; even when I tried to store the files I got different results. Example:
    wine zip.exe -rq -D -X -0 -A testwindows.zip *.txt
    zip -rq -D -X -0 -A testlinux.zip *.txt
    md5sum *.zip
    725d46abb1b87e574a439db15b1ba506 testlinux.zip
    70df8fe8d0371bf26a263593351dd112 testwindows.zip
    As I said, I remember a zip program (I don't know the name) whose author said that the result was always the same regardless of the platform (Windows, Linux...).
    3 replies | 43 view(s)
  • lz77's Avatar
    26th May 2020, 10:47
    https://habr.com/ru/news/t/503658/ Sorry, in Russian.
    2 replies | 124 view(s)
  • xezz's Avatar
    18th May 2020, 06:00
    xezz started a thread ppmini in Data Compression
    It is a compact PPM compressor. The size of UI + compressor + decompressor is only 1022 bytes! Enjoy. ppmini-time.7z reports the process time.
    1 replies | 291 view(s)
  • Bulat Ziganshin's Avatar
    8th May 2020, 15:11
    It's really hard to find good benchmarks of ARM vs x86 CPUs, but Anand provides some measurements on industry-standard benchmarks. Note that while the Apple A13 has the same speed as Ryzen/Core, the frequencies differ by at least 1.5x, so Apple CPUs already have 1.5x higher IPC!!! And while you may think that they have special optimizations for SPEC, I've also seen 7-zip benchmark results comparing Intel and ARM CPUs - Intel has better IPC on compression (due to the 128-bit memory controller, I believe), while ARM has better IPC on decompression. And you can see above that Apple's CPUs are significantly better than ARM's own ones.
    1 replies | 272 view(s)
  • lz77's Avatar
    31st May 2020, 18:12
    Why, even if we have a hash table of 128K cells and remember the hash for each position, can subj often happen? For example: suppose we found a match from the current position for the substring 'abcd' in the string ...zabcd..., and then we found that 'zabcd' also matches. Sorry for my English...
    1 replies | 164 view(s)
  • pacalovasjurijus's Avatar
    11th May 2020, 17:36
    With this code you can brute-force a password over a 36-symbol alphabet (digits and letters); you need to use a session to resume the calculation. Be careful: this code uses a lot of CPU. This is PHP:
    /*
    if (isset($_SESSION) && $_SESSION != "$") { // session
        if ($_SESSION == "1") {
            echo "password crack " . $_SESSION . "<br>"; // show comment
        }
    }
    */
    /*
    $sqlj = "SELECT * FROM Students WHERE student_id = '$ddd'";
    $items = $dbConnection->prepare("$sqlj");
    $items->execute(); // mysql select by student_id
    foreach ($items as $row) { // read rows
        for ($u = 0; $u < 4; $u++) {
        }
    }
    $rowwwhash = $row; // take the hashed password
    if ($_SESSION == "1") {
        if (!isset($_SESSION)) {
            $uu = 0; // value
            if (isset($_SESSION)) {
                $uu = $_SESSION;
            }
            $uue = $uu + 150; // candidates per run; not more than 150, or it is too much for the cpu
            while ($uu <= $uue) {
                $pass1 = base_convert($uu, 10, 36); // convert from base 10 to base 36 (digits and letters)
                if (password_verify($pass1, $rowwwhash)) { // check the candidate password against the hash
                    $pass2 = $pass1;
                    $_SESSION = "$pass2"; // save the string in the session
                    echo '<meta http-equiv="refresh" content="0; Update_Details.php">';
                }
                $uu++;
                $_SESSION = "$uu"; // save the number
                echo '<meta http-equiv="refresh" content="0; Update_Details.php">'; // reload the page
            }
        }
    }
    */
    0 replies | 208 view(s)
  • lz77's Avatar
    18th May 2020, 11:54
    Subj. Sorry, in Russian... https://www.youtube.com/watch?v=SZSjp-O7Ixo
    0 replies | 99 view(s)
  • pklat's Avatar
    Today, 08:49
    File timestamps are stored too, and IIRC they differ between NTFS and other filesystems.
    3 replies | 43 view(s)
  • Shelwien's Avatar
    Today, 06:50
    Windows and Linux actually have different file attributes, so you'd need a hacked zip which doesn't store attributes; otherwise it normally won't happen. You can try using a zip built under cygwin/msys2 on the Windows side, but even these would have different default unix attributes (e.g. cygwin shows 0770 for Windows files), and possibly also custom defines for cygwin which would make them use winapi. (One way to inspect the differences follows this post.)
    3 replies | 43 view(s)
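    To see exactly which stored fields differ between the two archives (assuming Info-ZIP's zipinfo is installed; the file names are from the example above), the verbose listing shows per-entry timestamps, the "made by" host OS, and external attributes, which is usually where the mismatch comes from:
        zipinfo -v testlinux.zip   > linux.txt
        zipinfo -v testwindows.zip > windows.txt
        diff linux.txt windows.txt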
  • Shelwien's Avatar
    Today, 06:43
    > wouldn't have been able to depend on compression routines in Windows. I actually mean chunks of existing code, like https://en.wikipedia.org/wiki/Return-oriented_programming#Attacks https://github.com/JonathanSalwan/ROPgadget#screenshots
    26 replies | 1395 view(s)
  • ivan2k2's Avatar
    Today, 06:06
    1) try to compress non-textual files and check the results
    2) try to compress your text files with the -ll or -l option and check the results
    3 replies | 43 view(s)
  • Stefan Atev's Avatar
    Today, 04:49
    I can see that, though it goes against my instincts :) I have seen people extract common instruction sequences into subroutines even if they were pretty arbitrary and logically unrelated; you eat 3 bytes to do a call (and one ret) each time you need the sequence, and that is basically an "executable LZ". I can see how actual LZ would quickly be better, since matches are encoded more efficiently than even near calls. However, for some data LZ is not that great, while a custom encoding could work quite well. None of the stuff I ever wrote assumed anything more than DOS, so it wouldn't have been able to depend on compression routines in Windows.
    26 replies | 1395 view(s)
  • introspec's Avatar
    Today, 01:54
    I think some people made use of tricks like this. I have a lot of experience with older computers; for them, data compression pretty much did not exist. I'd love to be proved wrong here, but I'd be very surprised if any of the 1980s machines had anything of the kind in their ROMs.
    26 replies | 1395 view(s)
  • introspec's Avatar
    Today, 01:51
    I think that there are two approaches to making a compressed intro. The first, more common one, is to compress your well-tuned code so that a bit of extra squeeze can be achieved. This is the very traditional strategy, but it is not the only one. The second strategy is to design your data structures, and also your code, to help the compressor. E.g. often in a compressed intro a short loop can be replaced by a series of unrolled statements - an insane strategy in a size-optimized world, but quite possibly a viable approach if you know that the intro will be compressed. A complete paradigm shift is needed in this case, of course.
    26 replies | 1395 view(s)
  • introspec's Avatar
    Today, 01:46
    1) I know some neat examples of Z80 decompressors. I am not aware of any systematic lists. I recently did some reverse-engineering of ZX Spectrum based 1Ks. About one third of them were packed; the most popular compressors seemed to be ZX7, MegaLZ and BitBuster (in order of reducing popularity, note that the respective decompressor sizes are 69, 110 and 88 bytes). 2) Maybe yes, but the large influence of the decompressor size means that data format becomes a lot more important than usual. I think that this implies a lot of scope for adaptivity and tricks.
    26 replies | 1395 view(s)
  • introspec's Avatar
    Today, 01:40
    Frankly, I do not think Crinkler (or similar tools) are very relevant to this thread. You are right that there could be improvements to the decompressor, but I was trying to say that you won't get LZMA into a sub-50-100b decompressor, so although it is an amazing tool for 4K or 8K intros, it is just a different kind of tool. Your idea to only have a match length of 2 is cool (although I need to try it in practice to see how much ratio one loses in this case). The smallest generic LZ77 on Z80 that I know has an 18 byte decompression loop, so your 17 byte loop would be interesting to see - have you published it anywhere? In any case, I am working on a small article about such mini-decompressors and am definitely looking forward to anything you will write. I mainly code on Z80, so I do not know much about prefix emitters. Can you point to any discussion of what they can look like?
    26 replies | 1395 view(s)
  • maadjordan's Avatar
    Today, 01:33
    maadjordan replied to a thread WinRAR in Data Compression
    As WinRAR does not support compressing with 7-Zip and its plugins, would you kindly provide a reduced version of your plugins for extraction only? Many thanks.
    185 replies | 129806 view(s)
  • maadjordan's Avatar
    Today, 01:32
    maadjordan replied to a thread WinRAR in Data Compression
    :)
    185 replies | 129806 view(s)
  • Darek's Avatar
    Today, 00:33
    Darek replied to a thread Paq8pxd dict in Data Compression
    I've tested the best options for Byron's dictionary on 4 corpus file sets. It was made for the paq8pxd v85 version. Of course I realise that it could work only for some versions, but for now it looks like it works. The best result is for the Silesia corpus: 74 KB of gain, which is worth using; for the other corpora the gains are smaller, but there is always something. Files not mentioned below didn't get any gain due to the use of the -w option or -exx.
    file: option
    SILESIA
    dickens: -e77,dict
    mozilla: -e26,dict
    osdb: -w
    reymont: -w
    samba: -e133,dict
    sao: -w
    webster: -e373,dict
    TOTAL Silesia savings = 74'107 bytes
    CALGARY
    book1: -e47,dict
    book2: -e43,dict
    news: -e97,dict
    paper2: -e34,dict
    progp: -e75,dict
    Calgary.tar: -e49,dict
    TOTAL Calgary savings = 1'327 bytes
    Calgary.tar savings = 3'057 bytes
    CANTERBURY
    alice29.txt: -e38,dict
    asyoulik.txt: -e53,dict
    lcet10.txt: -e54,dict
    plrabn12.txt: -e110,dict
    Canterbury.tar: -e95,dict
    TOTAL Canterbury savings = 873 bytes
    Canterbury.tar savings = 1'615 bytes
    MAXIMUM COMPRESSION
    world95.txt: -e22,dict
    TOTAL Maximum Compression savings = 1'449 bytes
    Due to all these settings and changes, the Maximum Compression score for paq8pxd v89 is below 6'000'000 bytes! First time ever! (w/o using the tarball option)
    922 replies | 315206 view(s)
  • Shelwien's Avatar
    Yesterday, 23:27
    Windows has some preinstalled compression algorithms actually (deflate, LZX/LZMS): http://hugi.scene.org/online/hugi28/hugi%2028%20-%20coding%20corner%20gem%20cab%20dropping.htm I wonder if the same applies to other platforms? Maybe there is at least some relevant code in the ROM?
    26 replies | 1395 view(s)
  • Gotty's Avatar
    Yesterday, 18:18
    Gotty replied to a thread paq8px in Data Compression
    Please specify what "digits" mean. Do you mean single-digit ASCII decimal numbers one after the other with no whitespace, like "20200602"? I'm not sure what you mean by "in detail". It would not be easy to demonstrate in a forum post what paq8px does exactly. Especially because paq8px is heavy stuff - it needs a lot of foundation. I suppose you did some research, study and reading since we last met and you would like to dig deeper? If you need real depth, you will need to fetch the source code and study it. However if you have a specific question, feel free to ask it here.
    1857 replies | 539243 view(s)
  • Stefan Atev's Avatar
    Yesterday, 18:05
    My experience being with x86 1K intros, this certainly resonates; at the end of the day, the tiny (de)compressor should only be helping you with code - all the data in the intro should be custom-packed anyway, in a way that makes it difficult to compress for LZ-based algorithms. For example, I remember using basically 2 bits per beat for an audio track (2 instruments only, both OPL-2 synth); fonts would be packed, etc. 4K is different, I think; there you just have a lot more room. And for 128B and 256B demos, compression is very unlikely to help, I think.
    26 replies | 1395 view(s)
  • Gotty's Avatar
    Yesterday, 17:58
    How many bits are used for addressing the hash table (or: how many slots do you have)? How do you exactly implement hashing (do you shift >>)? What do you hash exactly?
    7 replies | 25105 view(s)
  • introspec's Avatar
    Yesterday, 17:05
    Yes, I know about Saukav too. I did not have time to do detailed testing of it, but I understand what it does quite well and it should offer compression at the level of Pletter (likely somewhat better than Pletter), while being fairly compact. I believe that its approach to adaptivity, with multiple specific decompressors offered, is an excellent way to increase the compression "for free". However, I strongly suspect that a better solution must be available, most likely for 1K intros and definitely for 256b intros.
    I can explain what I mean as follows. Suppose you are working on a 1K intro that uses Saukav and at some point you reach the situation where the compressed intro together with the decompressor uses up all available space. Suppose that the average decompressor length is 64 bytes (this is the size of the zx7b decompressor - the origin of Saukav). Then your compressed size is 1024-64=960 bytes. I do not know the exact ratio offered by Saukav, so I'll use Pletter's ratio of 1.975 as a guide. Hence, our intro is actually 960*1.975=1896 bytes long. Let us now consider switching to a better compressor, e.g. Shrinkler, which is LZMA-based and compresses at a level similar to 7-zip. Its ratio on the same small file corpus that I use for many of my tests is about 2.25. Thus, compressed by Shrinkler our intro will become 1896/2.25~843 bytes long (I should be saying "on average", but it is very annoying to repeat "on average" all the time, so I assume it implicitly). We saved 960-843=117 bytes, which may sound great, yet in fact is useless. The shortest decompressor for Shrinkler on Z80 is 209 bytes long, so we saved 117 bytes in data, and added 209-64=145 bytes in decompressor, i.e. lost 28 bytes overall.
    The point is, when you are making a 1K intro, Shrinkler will lose to Saukav (to ZX7, MegaLZ, to basically any decent compressor with a compact enough decompressor). When working with 4K of memory, these concerns become pretty much irrelevant; you can use Crinkler, Shrinkler or any other tool of your choice. But at 1K the ratio becomes less of a concern, as long as it is not completely dreadful, and the decompressor size starts to dominate the proceedings. For 256b intros the situation is even more dramatic. I made one such intro using ZXmini (decompressor size 38 bytes) and found it (the decompressor) a bit too large. More can be done for sure.
    So, looking at your graph, I do not know what your target size is, but for anything <=1K, trust me, without the decompressor size included this data is meaningless.
    26 replies | 1395 view(s)
  • lz77's Avatar
    Yesterday, 14:44
    In my LZ77-like algorithm the magic number 123456789 is always better than Knuth's famous prime 2654435761. I tested it on enwik8, enwik9, silesia.tar, LZ4.1.9.2.exe, ... I tried using two hash functions (one for even values, the other for odd values) but this worsened the result. (A sketch of the kind of hash in question follows this post.)
    7 replies | 25105 view(s)
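    For context, a sketch of the kind of head-table hash being compared here (HASH_BITS and the function name are made up; this is the usual multiplicative hash of the next 4 input bytes, not anyone's exact code):
        #define HASH_BITS 17                    /* 2^17 = 128K cells */
        static unsigned hash4(unsigned x)       /* x = next 4 bytes as a little-endian uint */
        {
            return (x * 2654435761u) >> (32 - HASH_BITS);   /* or x * 123456789u */
        }
        /* head[hash4(x)] = pos; with a single cell per bucket, a colliding 4-byte
           string simply overwrites the previously stored position. */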
  • Trench's Avatar
    Yesterday, 06:07
    Sorry. Yes, true. But something like that won't be as versatile as the bigger ones. Kind of like having a Swiss army knife as a daily kitchen knife. Unless most of the code is in the file, which won't work, or it compresses itself. But it depends on what one wants to decompress, since one size can't fit all, I assume. Zip programs, like all programs, are like satellite programs relying on the main computer OS they run on. So the file is technically bigger. If you were to put it on another or older OS, it won't function or be as small. So if the OS has files that help the zip program, then you can get something smaller. So in a way it's kind of cheating. What would a math professor come up with? But again, it's good to try for fun. Just an opinion.
    26 replies | 1395 view(s)
  • Sportman's Avatar
    1st June 2020, 22:23
    Sportman replied to a thread Paq8sk in Data Compression
    I guess -x14, I do only fast tests this moment so no enwik9.
    95 replies | 8576 view(s)
  • Dresdenboy's Avatar
    1st June 2020, 22:18
    Trench, I understand these more pragmatic and/or philosophical considerations. They could even erase the benefit of compressors like PAQ due to the computational costs. But here we're discussing decompressors, which ought to be usable on older and current platforms, with constraints on the actual code's size (like a program that fits into a 512 byte bootsector, maybe with less than that available for custom code). There are competitions for that.
    26 replies | 1395 view(s)
  • Trench's Avatar
    1st June 2020, 19:55
    A file like 1234567890abcdefghij is: Size = 20 bytes (20 bytes), size on disk = 4.00 KB (4,096 bytes). But if you make the file name 1234567890abcdefghij.txt and erase the file content, then you have 0 bytes, as cheesy as that sounds. I assume it would still take the same size on your hard drive even if the file is 0 bytes. Even if the file was 0 bytes, the compressed file only goes as low as 82 bytes, and with the 20 bytes it's 138 (136 if it's 1 letter 20 times). Sure, it has some pointless info in the compressed file, just like how many pictures also have pointless info. Example: "7z¼¯' óò÷6 2 ÷|\N€€  z . t x t  ªÚ38Ö". Pointless to even have "z . t x t " with so many spaces.
    On a side note, we live in a society which ignores wastefulness, so in everything you see there is plenty of waste. Kind of like how, if everyone did not throw away at least 2 grains of rice a day, it would be enough to feed every malnourished, starving person (which is the main cause of health issues in the world), yet most throw away 1000 times more than that, even good quality, since many confuse the best-by date with an expiration date (there are no expiration dates on food).
    The file compression programs don't let you set the library that small. The point is, why should it be done? It can be, depending on the file. I assume the bigger the compression program is, to store various methods, the smaller the file. It would take more hard drive space to have 1000 20-byte files than 1 big file. The transfer rate would also get hurt greatly. Sure, it's possible to have an algorithm for something small, but again no one would bother. If you intend to use that method on bigger files, I am guessing it might be limited. But it might be a fun challenge for someone to do.
    26 replies | 1395 view(s)
  • Dresdenboy's Avatar
    1st June 2020, 18:00
    Adding to 1): the 6502's ISA is likely too limited to do something useful in <40B. BTW, is there a comprehensive list of decompressor sizes and algorithms somewhere? You mentioned several in your Russian article. Adding to 2): chances of being able to compete diminish quickly with less code. Compressed data size + decompressor might be an interesting metric. But I got an idea to do some LZ78 variant. Maybe it stays below 40B.
    26 replies | 1395 view(s)
  • Dresdenboy's Avatar
    1st June 2020, 17:26
    Interesting idea. So is it pop cs as a jump instruction? With a COM executable and initial registers set to DS/CS (and useful constants like -2 or 100h in others) this sounds good.
    26 replies | 1395 view(s)
  • JamesB's Avatar
    1st June 2020, 16:02
    JamesB replied to a thread Brotli in Data Compression
    With libdeflate I get 29951 (29950 with -12). So close to zstd (I got 29450 on that).
    258 replies | 82570 view(s)
  • JamesB's Avatar
    1st June 2020, 15:54
    It's good for block based formats, but the lack of streaming may be an issue for general purpose zlib replacement. However even for a streaming gzip you could artificially chunk it into relatively large blocks. It's not ideal, but may be the better speed/ratio tradeoff still means it's a win for most data types. We use it in bgzf (wrapper for BAM and BCF formats), which has pathetically small block sizes, as a replacement for zlib.
    6 replies | 435 view(s)
  • suryakandau@yahoo.co.id's Avatar
    1st June 2020, 15:36
    Sportman, could you test enwik9 using paq8sk23 -x4 -w -e1,English.dic please?
    95 replies | 8576 view(s)
  • Sportman's Avatar
    1st June 2020, 12:57
    Sportman replied to a thread Paq8sk in Data Compression
    enwik8:
    15,753,052 bytes, 13,791.106 sec., paq8sk23 -x15 -w
    15,618,351 bytes, 14,426.736 sec., paq8sk23 -x15 -w -e1,english.dic
    95 replies | 8576 view(s)
  • Fallon's Avatar
    1st June 2020, 07:46
    Fallon replied to a thread WinRAR in Data Compression
    WinRAR - What's new in the latest version https://www.rarlab.com/download.htm
    185 replies | 129806 view(s)
  • Shelwien's Avatar
    1st June 2020, 06:22
    Fast encoding strategies like to skip hashing inside matches. Otherwise, you can still get collisions between hash values - it's easily possible that the cell for 'zabc' gets overwritten while the cell for 'abcd' doesn't.
    1 replies | 164 view(s)
  • suryakandau@yahoo.co.id's Avatar
    1st June 2020, 02:52
    You are right, but the result is better than the paq8pxd series. :)
    95 replies | 8576 view(s)
  • Stefan Atev's Avatar
    1st June 2020, 01:56
    That was my experience for sure; I think I had to just make sure decompression started at a 16B-aligned offset, so you could later just bump a segment register to point to the decompressed code when you jump to it.
    26 replies | 1395 view(s)
  • Sportman's Avatar
    31st May 2020, 19:39
    The Ongoing CPU Security Mitigation Impact On The Core i9 10900K Comet Lake: "At least for the workloads tested this round, when booting the new Intel Core i9 10900K "Comet Lake" processor with the software-controlled CPU security mitigations disabled, the overall performance was elevated by about 6% depending upon the workload." https://www.phoronix.com/scan.php?page=article&item=intel-10900k-mitigations
    19 replies | 4522 view(s)
  • Jyrki Alakuijala's Avatar
    31st May 2020, 17:53
    Many major game installs for PS4 that I observed were writing 100 kB/s for a large fraction of the install. This is pretty disappointing since there are 5000x faster decompression solutions readily available in open source. Somehow just going for a 15000x faster commercial solution is unlikely to help unless the system-level problems are fixed first. Most likely these relate to poorly planned or designed optical disc I/O, or data layout on the optical disc, not decompression.
    46 replies | 24182 view(s)
  • Jyrki Alakuijala's Avatar
    31st May 2020, 17:47
    Smallest decoder for general purpose decoding just starts executing the compressed signal.
    26 replies | 1395 view(s)
  • Darek's Avatar
    31st May 2020, 16:58
    Darek replied to a thread Paq8sk in Data Compression
    Could you post the source code with every version? Short time test: paq8sk23 is about 17% faster than paq8sk22, but is still about 40% slower than paq8sk19 and 78% slower than the paq8pxd series.
    95 replies | 8576 view(s)
  • suryakandau@yahoo.co.id's Avatar
    31st May 2020, 15:17
    Paq8sk23 - improved text model - faster than paq8sk22
    95 replies | 8576 view(s)
  • Dresdenboy's Avatar
    31st May 2020, 13:42
    As you're mentioning Crinkler: I had some interesting discussion with Ferris, who created Squishy. He uses a small decompressor to decompress the actual decompressor. He said, the smaller one is about 200B. Then there is xlink, where unlord planned to have a talk at Revision Online 2020, but which has been cancelled. But there seems to be some progress, which he didn't publish yet. This might also be interesting to watch. BTW my smallest decompression loop (for small data sizes and only match length of 2) is 12B. Making it more generic for offsets "blows" it up to 17B. Typical LZ with multiple lengths is starting at ~20B depending on variant and assumptions. There likely are similarities to Stefan Atev's lost one. But I have to continue testing all those variants (with their respective encoders) to publish more about them. Another idea was to encode some x86 specific prefix. Such a prefix emitter can be as small as 15B.
    26 replies | 1395 view(s)
  • Dresdenboy's Avatar
    31st May 2020, 02:21
    No problem. Accumulating this information sounds useful. I also collected a lot of information and did some analysis both of encoding ideas for small executables and existing compressors. Recently I stumbled over ZX7 and related to it: Saukav. The latter one is cool as it creates a decompressor based on the actual compression variant and parameters. Before finding it I already deemed this a necessity to keep sizes small. Especially for tiny intros, where a coder could omit some of the generic compressor code (code mover, decompression to original address) to save even more bytes (aside from coding compression-friendly). Here is an example with some of the tested compression algorithms (sizes w/o decompressor stubs and other data blocks, e.g. in the PAQ archive file format), leaving out all samples with less than 5% reduction, as they might be compressed already: Also interesting would be the total size incl. decompressor (not done yet). In this case we might just see different starting offsets (decompressor stub, tables etc.) on Y axis and different gradients with increasing X.
    26 replies | 1395 view(s)
  • introspec's Avatar
    30th May 2020, 23:54
    Yes, thank you. I should have mentioned that when I gave my estimated tiny compressor sizes, I had a quick look in several places and definitely used Baudsurfer's page for reference. Unfortunately, his collection of routines is not very systematic (in the sense that I know better examples for at least some CPUs, e.g. Z80), so I am hoping that a bit more representative collection of examples can be gradually accumulated here.
    26 replies | 1395 view(s)
  • Shelwien's Avatar
    30th May 2020, 21:41
    > I would also like to see some other limitations of the contest:
    > I read that there would be a speed limit, but what about a RAM limit.
    I guess there would be a natural one - the test machine obviously won't have infinite memory.
    > There are fast NN compressors, like MCM, or LPAQ.
    Yes, these would be acceptable, just not full PAQ or CMIX.
    > It's hard to fight LZ algorithms like RAZOR so I wouldn't try going in that direction.
    Well, RZ is a ROLZ/LZ77/Delta hybrid. It's still easy enough to achieve better compression via CM/PPM/BWT (and encoding speed too). Or much faster decoding with worse compression.
    > Are AVX and other instruction sets allowed?
    Yes, but likely not AVX512, since it's hard to find a test machine for it.
    > What would be nice is some default preprocessing.
    > If it's an english benchmark, why shouldn't .drt preprocessing (like the one from cmix)
    > be available by choice (or .wrt + english.dic like the one from paq8pxd).
    I proposed that, but this approach has a recompression exploit - somebody could undo our preprocessing, then apply something better. So we'd try to explain that preprocessing is expected and post links to some open-source WRT implementations, but the data won't be preprocessed by default.
    > It would save some time for the developers not to incorporate them into their compressors,
    > if there were a time limit for the contest.
    It should run for a few months, so there should be enough time. There are plenty of ways to make a better preprocessor; WRT is not the only option (e.g. NNCP preprocess outputs a 16-bit alphabet), so it's not a good idea to block that and/or force somebody to work on WRT reverse-engineering.
    15 replies | 959 view(s)
  • Jarek's Avatar
    30th May 2020, 18:19
    Jarek replied to a thread Kraken compressor in Data Compression
    Road to PS5: https://youtu.be/ph8LyNIT9sg?t=1020 custom kraken >5GB/s decompressor ...
    46 replies | 24182 view(s)
  • Dresdenboy's Avatar
    30th May 2020, 17:57
    Thanks for opening this thread. I'm working on my own tiny decompression experiments. And for starters let me point you to Baudsurfer's (Olivier Poudade) assembly art section on his Assembly Language Page: http://olivier.poudade.free.fr/ (site seems a bit buggy sometimes), which has several tiny compressors and decompressors for different platforms.
    26 replies | 1395 view(s)
  • Darek's Avatar
    30th May 2020, 14:33
    Darek replied to a thread Paq8sk in Data Compression
    I will. At least I'll try :) I need 2-3 days to finish a task which is in progress, and then I'll start on paq8sk19. paq8sk22 looks to me like a move in a not-so-good direction - a very slight improvement at the cost of doubling the compression time.
    95 replies | 8576 view(s)
  • Darek's Avatar
    30th May 2020, 14:29
    Darek replied to a thread Paq8sk in Data Compression
    @Sportman - it's a dramatic change in compression time - does this version use much more memory than the previous one?
    95 replies | 8576 view(s)
  • AlexDoro's Avatar
    30th May 2020, 09:39
    I would vote for private. I would also like to see some other limitations of the contest: I read that there would be a speed limit, but what about a RAM limit? There are fast NN compressors, like MCM, or LPAQ. I mean, they could be a starting point for some experimental fast compressors. It's hard to fight LZ algorithms like RAZOR, so I wouldn't try going in that direction. Are AVX and other instruction sets allowed? What would be nice is some default preprocessing. If it's an English benchmark, why shouldn't .drt preprocessing (like the one from cmix) be available by choice (or .wrt + english.dic like the one from paq8pxd)? It would save some time for the developers not to incorporate them into their compressors, if there were a time limit for the contest.
    15 replies | 959 view(s)
  • suryakandau@yahoo.co.id's Avatar
    30th May 2020, 06:21
    How about paq8sk22 -x15 -w -e1,english.dic for enwik8
    95 replies | 8576 view(s)
  • Trench's Avatar
    30th May 2020, 05:04
    1. What is a programmer? A translator from one language to another. What is a designer? A person that creates. So what is a programmer that tries to create a better file compressor like? = A translator that wants to change profession to be the next best-selling author like Stephen King.
    2. What should you probably not say when you failed to pack a file? = Fudge, I failed to pack it.
    3. Watch out for your phrasing if you ask another whether they can squeeze your dongle.
    4. A programmer was asked "what are you doing" and said "concentrating on how to un-concentrate something". The other says: easy, have a few beers.
    5. So the drunk programmer went and bought un-concentrated orange juice and sat on the carton, and a person asks why they are sitting on it. They say: to concentrate it, obviously.
    6. When you ask another to compress lemon juice for you and then wonder if it can be decompressed, then maybe it's time to take a break.
    7. Don't be impressed if someone compresses a file for you to 99% less than the file size, since they will say it can't be decompressed because you didn't also ask for that too.
    8. A judge tells your lawyer to zip it, you misunderstand and say "no, RAR", and you are held in contempt for growling at the judge.
    9. A programmer says he was looking for new ways of packing files all day, and another says you must be tired from lifting so many files.
    10. Your friend wants forgiveness from you and sends you a gift in a 7-Zip file. You un-compress the file, and there is another compressed file, and after the 7th file it's still compressed, with a new name saying Matthew 18:21-22. Can you guess how many files you have left to un-compress?
    7 replies | 1659 view(s)
  • Amsal's Avatar
    30th May 2020, 03:53
    Well, I can't vote, but I would go with the private dataset option. A few of the reasons why I prefer Option 2 over Option 1:
    1. The resulting compressor/algorithm has more general practical use than a compressor which is optimized for a specific file/dataset, which is pretty useless most of the time if you think about it.
    2. Allowing the use of a dictionary is also a great add-on to the contest.
    3. I have no problem (and I suppose most people won't have) if an algorithm/compressor uses 10 methods (precomp+srep+lzma etc..) or just modifies 1 method (like lzma) to produce better results on multiple datasets, as long as it gets results which could be used as a better option in practical terms on multiple datasets.
    I totally agree with these three points by you as well, and it would be very great to have a contest like this. And as for me, I wouldn't even care about a 16MB compressor if it really saves more size than any other compressor: if I compress a 50GB dataset to something like 10GB while other compressors are around 12GB, a 16MB compressor is a mere small size to account for. But anyway, it's a competition, so we take account of everything, so fine by me :D
    15 replies | 959 view(s)
  • Trench's Avatar
    30th May 2020, 02:53
    AMD was for years mostly better based on price/performance. It is like: why pay a billion for something that gives 1% better gain while the other costs a dollar for 1% less than the others. You can buy more AMD CPUs to outperform Intel. It's just that Intel has better marketing. AMD just does not get it, since they are bad at presentation; I don't even understand their product ordering, and they always reference Intel as a benchmark for performance. Big game companies got it and have used AMD for a while now. So in short, a million dollars of AMD CPUs can beat a million dollars worth of Intel CPUs. But to be fair, 15 years is a long time. Also, Linux still sucks, and will remain at 2% popularity since they just don't get it, and cannot give it away for free no matter how many variants they have. It's mainly a hobby OS and not fit for ordinary people, which makes it pointless despite being more powerful. Android is better since designers took over. Which goes to show: never hire a translator to write novels, just like never hire a programmer as a designer. Progress is slowed down when one profession insists on doing another profession's work.
    2 replies | 124 view(s)
  • Sportman's Avatar
    30th May 2020, 01:47
    Sportman replied to a thread Paq8sk in Data Compression
    enwik8:
    15,755,063 bytes, 14,222.427 sec., paq8sk22 -x15 -w
    15,620,894 bytes, 14,940.285 sec., paq8sk22 -x15 -w -e1,english.dic
    95 replies | 8576 view(s)
  • suryakandau@yahoo.co.id's Avatar
    29th May 2020, 19:54
    The result using paq8sk22 -s6 -w -e1,English.dic on Dickens file is 1900420 bytes
    95 replies | 8576 view(s)
  • Cyan's Avatar
    29th May 2020, 19:41
    Cyan replied to a thread Zstandard in Data Compression
    This all depends on storage strategy. Dictionary is primarily useful when there are tons of small files. But if the log lines are just appended into a single file, as is often the case, then just compress the file normally, it will likely compress very well.
    435 replies | 131004 view(s)
  • Jon Sneyers's Avatar
    29th May 2020, 18:34
    Yes, that would work. Then again, if you do such non-standard stuff, you can just as well make JPEG support alpha transparency by using 4-component JPEGs with some marker that says that the fourth component is alpha (you could probably encode it in such a way that decoders that don't know about the marker relatively gracefully degrade by interpreting the image as a CMYK image that looks the same as the desired RGBA image except it is blended to a black background). Or you could revive arithmetic coding and 12-bit support, which are in the JPEG spec but just not well supported. I guess the point is that we're stuck with legacy JPEG decoders, and they can't do parallel decode. And we're stuck with legacy JPEG files, which don't have a jump table. And even if we would re-encode them with restart markers and jump tables, it would only give parallel striped decode, not efficient cropped decode.
    15 replies | 839 view(s)