Activity Stream

  • Kirr's Avatar
    Today, 10:45
    Yeah, it is more accurate to say that the two are a tool and a library implementing the DEFLATE algorithm. In my benchmark, by "gzip" I refer to the software tool, not to the "gzip" file format. zlib has "zpipe.c" in its "examples" directory; this may be what you mean. I guess there is no point testing it, but perhaps I should benchmark it to confirm this. It seems 7-Zip is still Windows-exclusive. However, there is a more portable "p7zip" - I will think about adding it to the benchmark.
    3 replies | 187 view(s)
  • suryakandau@yahoo.co.id's Avatar
    Today, 09:34
    @darek could you test paq8sk19 -x15 -w -e1,english.dic on enwik9, please? Thank you
    83 replies | 7649 view(s)
  • cssignet's Avatar
    Today, 09:13
    The host (https://i.slow.pics/) did some kind of post-processing on the PNG (dropping the iCCP chunk and recompressing the image data less efficiently). Those files are not what I uploaded (see the edited link in my first post).
    5 replies | 180 view(s)
  • hunman's Avatar
    Today, 08:01
    hunman replied to a thread MCM + LZP in Data Compression
    Maybe you can integrate it into Bulat's FreeARC...
    53 replies | 35195 view(s)
  • Li weiqin's Avatar
    Today, 05:46
    Li weiqin replied to a thread MCM + LZP in Data Compression
    I've used this wonderful program for a year and wondered who made it, and then I found this thread - thank you. But it's hard for ordinary people like me to use, since it only runs from the command line and compresses one file per operation. If somebody could design a GUI or build a graphical front end, that would be great.
    53 replies | 35195 view(s)
  • SolidComp's Avatar
    Today, 03:25
    Your lossless reduction darkened the image though. Look at them side by side.
    5 replies | 180 view(s)
  • cssignet's Avatar
    Today, 02:19
    I guess the original PNG would be this: https://res.cloudinary.com/cloudinary-marketing/image/upload/Web_Assets/blog/high_fidelity.png
    Some trials with a close filesize (webp = no meta, png = meta):
    cwebp -q 91 high_fidelity.png -o q91.webp (52.81 KB) -> q91.png
    cwebp -q 90 -sharp_yuv high_fidelity.png -o q90-sharp.webp (52.06 KB) -> q90-sharp.png
    It is unrelated to the point of the article itself, but still, since web delivery is mentioned, a few points from an end-user's point of view on the samples/results:
    - this file could be a good example of where automatic compression would be useful. This PNG could be losslessly reduced to 19.85 KB for the web (or 16.19 KB in lossless WebP), which would make the comparison (targeted filesize ~53 KB) with lossy JPEG XL or other lossy formats less relevant for users
    - about PNG itself, the encoder used here produces very over-bloated data in a web context, making the initial filesize non-representative of the format (the original PNG is 2542.12 KB, but the expected rendering for the web could be losslessly encoded to 227.08 KB with all chunks). As a side note, this PNG encoder also wrote non-standard keys for zTXt/tEXt chunks and non-standard chunks (caNv)
    Btw, instead of mathematically lossless only, do you plan to provide a "web lossless" mode somehow? I did not try, but feeding the (mathematically) lossless encoder with a 16 bits/sample PNG would probably create an over-bloated file for web usage.
    5 replies | 180 view(s)
  • SvenBent's Avatar
    Today, 01:34
    Thank you for the testing. I ran into some of the same issues with ECT; it appears ECT uses a much higher number of blocks than pngout. I reported this issue to caveman in his huffmix thread: https://encode.su/threads/1313-Huffmix-a-PNGOUT-r-catalyst?p=65017&viewfull=1#post65017 Personally, since DeflOpt never increases size, I do not believe it has the biggest effect with huffmix, but I can mix ECT + DeflOpt /b with ECT + defluff + DeflOpt /b, as defluff sometimes increases size. I wonder what the huffmix success rate is for ECT -9 with pngout /f6 /ks /kp /force on the ECT file.
    469 replies | 125651 view(s)
  • Shelwien's Avatar
    Yesterday, 23:30
    https://www.phoronix.com/scan.php?page=news_item&px=Torvalds-Threadripper Yes, but he just wanted more threads.
    1 replies | 58 view(s)
  • skal's Avatar
    Yesterday, 23:18
    Also:
    * you forgot to use the '-sharp_yuv' option for the WebP example (53 KB). Otherwise, it would have given you this noticeably sharper version: (and note that this WebP was encoded from the jpeg-q97, not the original PNG).
    * in the "Computational Complexity" section, I'm very surprised that JPEG XL is faster than libjpeg-turbo. Did you forget to mention multi-thread usage?
    5 replies | 180 view(s)
  • skal's Avatar
    Yesterday, 21:35
    Jon, your "Original PNG image (2.6 MB)"is actually a jpeg (https://res.cloudinary.com/cloudinary-marketing/image/upload/f_jpg,q_97/Web_Assets/blog/high_fidelity.png) when downloaded. did you mean to add 'f_jpg,q_97' in the URL ?
    5 replies | 180 view(s)
  • SolidComp's Avatar
    Yesterday, 21:19
    SolidComp replied to a thread Brotli in Data Compression
    Wow it shrunk jQuery down to 10 KB! That's impressive. The dictionary is 110 KB, but that's a one-time hit. There were error messages on dictionary creation though. I don't really understand them:
    255 replies | 82039 view(s)
  • Jon Sneyers's Avatar
    Yesterday, 20:30
    Hi everyone! I wrote a blog post about the current state of JPEG XL and how it compares to other state-of-the-art image codecs. https://cloudinary.com/blog/how_jpeg_xl_compares_to_other_image_codecs
    5 replies | 180 view(s)
  • SolidComp's Avatar
    Yesterday, 19:31
    "gzip" as such isn't a command line interface to the zlib library. It's just a format, one of three that zlib supports (the other two are raw DEFLATE and a "zlib" format, also DEFLATE-based). GNU gzip is just a specific app that produces gzip files (and maybe others?). I think zlib has a program that you can easily build. It might be called minizip. Someone please correct me if I'm wrong. The 7-Zip gzipper is unrelated to the .7z or LZMA formats. I'm speaking of 7-Zip the app. It can produce .7z, .xz, gzip (.gz), .zip, .bz2, and perhaps more compression formats. Pavlov wrote his own gzipper from scratch, apparently, and it's massively better than any other gzipper, like GNU gzip or libdeflate. I assume it's better than zlib's gzipper as well. I don't understand how he did it. So if you want to compare the state of the art to gzip, it would probably make sense to use the best gzipper. His gzip files are 17% smaller than libdeflate's on text...
    3 replies | 187 view(s)
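    As context for the formats mentioned in the post above, here is a minimal C sketch (not from the thread) showing that zlib itself can emit a gzip-wrapped stream; the windowBits convention is zlib's documented one, and the one-shot helper below assumes the input fits in memory:
        /* windowBits selects the container: 15 + 16 = gzip, 15 = zlib wrapper, -15 = raw DEFLATE */
        #include <string.h>
        #include <zlib.h>

        int gzip_buffer(const unsigned char *in, size_t in_len,
                        unsigned char *out, size_t out_cap, size_t *out_len)
        {
            z_stream s;
            memset(&s, 0, sizeof(s));
            if (deflateInit2(&s, Z_BEST_COMPRESSION, Z_DEFLATED,
                             15 + 16, 8, Z_DEFAULT_STRATEGY) != Z_OK)
                return -1;
            s.next_in   = (Bytef *)in;   s.avail_in  = (uInt)in_len;
            s.next_out  = out;           s.avail_out = (uInt)out_cap;
            int ret = deflate(&s, Z_FINISH);   /* one-shot call, no streaming loop */
            *out_len = s.total_out;
            deflateEnd(&s);
            return ret == Z_STREAM_END ? 0 : -1;
        }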
  • Scope's Avatar
    Yesterday, 19:15
    Scope replied to a thread JPEG XL vs. AVIF in Data Compression
    How JPEG XL Compares to Other Image Codecs: https://cloudinary.com/blog/how_jpeg_xl_compares_to_other_image_codecs
    15 replies | 664 view(s)
  • smjohn1's Avatar
    Yesterday, 19:12
    OK, that makes sense too. So reducing LZ4_DISTANCE_MAX doesn't necessarily increase compression speed. That might be a sweet spot in terms of compression speed.
    4 replies | 170 view(s)
  • Cyan's Avatar
    Yesterday, 18:57
    In fast mode, finding more matches corresponds to effectively skipping more data and searching less, so it tends to be faster indeed.
    4 replies | 170 view(s)
  • smjohn1's Avatar
    Yesterday, 17:18
    You are right. I checked the code again, and the memory use level was indeed 18 instead of 14. So that was the reason, which makes sense. On the other hand, a smaller LZ4_DISTANCE_MAX results in a (slight) decrease in compression speed. Is that because literal processing (memory copy) is slower than match processing?
    4 replies | 170 view(s)
  • lz77's Avatar
    Yesterday, 10:47
    https://habr.com/ru/news/t/503658/ ​Sorry, in Russian.
    1 replies | 58 view(s)
  • Krishty's Avatar
    Yesterday, 10:23
    While huffmix works great with pngout /r, I had little success using it on combinations of ECT/DeflOpt/defluff. Details here: https://encode.su/threads/3186-Papa%E2%80%99s-Optimizer?p=65106#post65106 I should check whether there is a way to use ECT similar to pngout /r, i.e. whether block splits are stable with different parameters …
    469 replies | 125651 view(s)
  • Krishty's Avatar
    Yesterday, 09:54
    I did some experiments with huffmix according to this post by SvenBent: https://encode.su/threads/2274-ECT-an-file-optimizer-with-fast-zopfli-like-deflate-compression?p=64959&viewfull=1#post64959 (There is no public build because I haven't gotten a response from caveman so far regarding the huffmix license.)
    I tested a few thousand PNGs from my hard drive. Previous optimization used ECT + defluff + DeflOpt; now it uses huffmix on all intermediate results. Some observations:
    - Without pngout, huffmix has only three streams to choose from: ECT output, ECT + DeflOpt, ECT + defluff + DeflOpt. So there is not much gain to expect. Actual gains were seen in about one out of fifty files. These were mostly 1-B gains; one file got 7 B smaller and another 13 B. The larger the file, the larger the gains.
    - The error rate increased significantly: "More than 1024 Deflate blocks detected, this is not handled by this version." (known error with large PNGs); "Type 0 (uncompressed) block detected, this is not handled by this version." (known error); on a few files, huffmix terminated without any error message.
    - There is a huge increase in complexity. Previously, there was just one pipeline for all DEFLATE-based formats. I have to maintain a separate path for ZIP now, which is supported by ECT/defluff/DeflOpt but not by huffmix. If huffmix exits with error code 0, all is well. Else if huffmix exits with error code 1, parse stdout: if stdout contains "Type 0 (uncompressed) block detected, this is not handled by this version.", we have a soft error, so just pick the smaller file. Else if stdout contains "More than 1024 Deflate blocks detected, this is not handled by this version.", we have a soft error, so just pick the smaller file. Else if stderr (not stdout!) contains "is an unknown file type", we have hit a file that huffmix doesn't understand, so just pick the smaller file. (This also happens with defluff and archives containing some Unicode characters.) Else we have a hard error like a destroyed file; abort.
    - There is much more file I/O going on, and there seems to be a very rare race condition with Win32's CopyFile and new processes. Not huffmix's fault, but something that is now being triggered much more often.
    All in all, I'm not sure I should keep working on it. It definitely is a great tool, but it comes with so many limitations and rough edges that the few bytes it gains me over my existing pipeline hardly justify the increased complexity.
    80 replies | 20540 view(s)
  • Jyrki Alakuijala's Avatar
    Yesterday, 09:53
    Jyrki Alakuijala replied to a thread Brotli in Data Compression
    The same custom dictionary that zstd uses can be used for brotli. In my testing, half the time Zstd's custom dictionary builder wins over Brotli's similar tool, and half the time the opposite. Surprisingly often it is an even better strategy (for resulting compression density) to take 10 random samples of the data and concatenate them as a custom dictionary, rather than trying to be smart about it.
    255 replies | 82039 view(s)
  • Cyan's Avatar
    Yesterday, 05:22
    Cyan replied to a thread Brotli in Data Compression
    I think 5 samples is the absolute minimum. Sometimes even that is not enough, when samples are pretty small. But 90K is relatively large, so that should do it (assuming you are adding multiple copies of the same file; adding empty files wouldn't work). Looking at your screenshot, I noticed a wildcard character `*`. I have no idea how shell expansion works on Windows; chances are, it doesn't. Prefer using the `-r` option to load all files from a directory. This capability is internal to `zstd`, so it should be okay even on Windows, since it doesn't depend on any shell capability.
    255 replies | 82039 view(s)
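    The library-level counterpart of the CLI's --train discussed above is ZDICT_trainFromBuffer() from zstd's zdict.h; a rough sketch of how the samples are passed (the wrapper name is made up, and it fails just as the CLI does when too few samples are given):
        #include <zdict.h>   /* ships with the zstd library */

        /* samples: all training files concatenated back to back;
           sizes[i]: length of sample i inside that buffer */
        size_t train_dict(void *dict_buf, size_t dict_cap,
                          const void *samples, const size_t *sizes, unsigned nb_samples)
        {
            size_t r = ZDICT_trainFromBuffer(dict_buf, dict_cap,
                                             samples, sizes, nb_samples);
            return ZDICT_isError(r) ? 0 : r;   /* returns the dictionary size on success */
        }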
  • Cyan's Avatar
    Yesterday, 05:14
    I don't get the same result. When testing with LZ4_DISTANCE_MAX == 32767, I get 57548126 for enwik8, aka slightly worse than a distance of 64KB. In order to get the same compressed size as you, I first need to increase the memory usage by quite a bit (from 16 KB to 256 KB), which is the actual reason for the compression ratio improvement (and compression speed decrease). The impact of MAX_DISTANCE is less dramatic than for high compression mode because, by the very nature of the fast mode, it doesn't have much time to search, so most searches will end up testing candidates at rather short distances anyway. But still, reducing max distance should nonetheless, on average, correspond to some loss of ratio, even if a small one.
    4 replies | 170 view(s)
  • Kirr's Avatar
    Yesterday, 03:24
    Thanks for the kind words, SolidComp. I work with tons of biological data, which motivated me to first make a compressor for such data, and then this benchmark. I'll probably add FASTQ data in the future, if time allows. As for text, HTML, CSS and other data, I have no immediate plans for it. There are three main obstacles: 1. computation capacity, 2. selecting relevant data, 3. my time needed to work on it. Possibly it will require cooperating with other compression enthusiasts. I'll need to think about it.
    I'm under the impression that "zlib" is a compression library, and "gzip" is a command-line interface to this same library. Since I benchmark command-line compression tools, it's "gzip" that is included, rather than "zlib". However, please let me know if there is some alternative command-line "zlib gzipper" that I am missing.
    Igor Pavlov's excellent LZMA algorithm (which powers 7-Zip) is represented by the "xz" compressor in the benchmark. Igor's unfortunate focus on Windows releases allowed "xz" to become the standard LZMA implementation on Linux (as far as I understand).
    You mean this one: https://github.com/ebiggers/libdeflate ? Looks interesting, I'll take a look at it. I noticed this bit in the GitHub readme: "libdeflate itself is a library, but the following command-line programs which use this library are also provided: gzip (or gunzip), a program which mostly behaves like the standard equivalent, except that it does not yet have good streaming support and therefore does not yet support very large files". Not supporting very large files sounds alarming, especially without specifying what exactly they mean by "very large".
    Regarding gzip, don't get me started! Every single biological database shares data in gzipped form, wasting huge disk space and bandwidth. There is a metric ton of research on biological sequence compression, in addition to excellent general-purpose compressors. Yet the field remains stuck with gzip. I want to show that there are good alternatives to gzip, and that there are large benefits in switching. Whether this will have any effect remains to be seen. At least I migrated all my own data to a better format (saving space and increasing access speed).
    3 replies | 187 view(s)
  • smjohn1's Avatar
    Yesterday, 00:36
    README.md says: `LZ4_DISTANCE_MAX`: controls the maximum offset that the compressor will allow. Set to 65535 by default, which is the maximum value supported by the lz4 format. Reducing the maximum distance will reduce opportunities for LZ4 to find matches, hence will produce a worse compression ratio.
    The above is true for the high compression modes, i.e. levels above 3, but the opposite is true for compression levels 1 and 2. Here is a test result using the default value (65535):
    <TestData> lz4-v1.9.1 -b1 enwik8
    1#enwik8 : 100000000 -> 57262281 (1.746), 325.6 MB/s, 2461.0 MB/s
    and the result using a smaller value (32767):
    <TestData> lz4-1.9.1-32 -b1 enwik8
    1#enwik8 : 100000000 -> 53005796 (1.887), 239.3 MB/s, 2301.1 MB/s
    Is there anything unusual in the LZ4_compress_generic() implementation? Could anyone shed some light? Thanks in advance.
    4 replies | 170 view(s)
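    For reference, a sketch of the fast-mode call exercised by -b1 above, assuming (as in the post) that lz4 was rebuilt with the macro overridden at compile time, e.g. -DLZ4_DISTANCE_MAX=32767; the wrapper name is made up:
        #include <stdlib.h>
        #include <lz4.h>

        /* Fast mode (level 1): match offsets are capped by the compile-time
           LZ4_DISTANCE_MAX, which defaults to 65535. */
        int compress_fast(const char *src, int src_len, char **dst_out)
        {
            int cap = LZ4_compressBound(src_len);
            char *dst = malloc((size_t)cap);
            if (!dst) return 0;
            int n = LZ4_compress_default(src, dst, src_len, cap);
            *dst_out = dst;
            return n;   /* compressed size, or 0 on failure */
        }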
  • Shelwien's Avatar
    25th May 2020, 23:14
    Shelwien replied to a thread Paq8pxd dict in Data Compression
    I don't think so - there're bugfixes and various tweaks (mostly jpeg model), according to changelog. All changes should be included in v89. If you need something to test, why not test different ppmd parameters? https://github.com/kaitz/paq8pxd/blob/master/paq8pxd.cpp#L12013 These numbers there (12,6,210,64 etc) are somewhat random, so you can try increasing or decreasing them and check how it affects compression. (12 and 6 are PPM orders and 210,64 are memory allocation per ppmd instance).
    915 replies | 313387 view(s)
  • Darek's Avatar
    25th May 2020, 22:42
    Darek replied to a thread Paq8pxd dict in Data Compression
    Are there any changes that make v87 and v88 worth testing?
    915 replies | 313387 view(s)
  • Shelwien's Avatar
    25th May 2020, 21:09
    > "shooting" hundreds of compressors a day That won't really work with gigabyte-sized datasets. At slowest allowed speeds it would take more than a hour to compress it. Number of attempts would be limited simply because of limited computing power (like 5 or so).
    10 replies | 650 view(s)
  • schnaader's Avatar
    25th May 2020, 20:10
    Another question that comes to my mind regarding a private dataset: will there be automation involved to get results quickly? Because with a private dataset I imagine literally "shooting" hundreds of compressors a day using different dictionaries to analyze the data. So would this be a valid and working strategy? Alex's quote "organizers will provide some samples" points in the direction of reducing this a bit so you can also work offline, but it would still be useful.
    10 replies | 650 view(s)
  • SolidComp's Avatar
    25th May 2020, 17:39
    Hi all – @Kirr made an incredibly powerful compression benchmark website called the Sequence Compression Benchmark. It lets you select a bunch of options and run it yourself, with outputs including graphs, column charts, and tables. It can run every single level of every compressor. The only limitation I see at this point is the lack of text datasets – it's mostly genetic data. @Kirr, four things:
    1. Broaden it to include text? Would that require a name change or ruin your vision for it? It would be great to see web-based text, like the HTML, CSS, and JS files of the 100 most popular websites for example.
    2. The gzipper you currently use is the GNU gzip utility program that comes with most Linux distributions. If you add some text datasets, especially web-derived ones, the zlib gzipper will make more sense than the GNU utility. That's the gzipper used by virtually all web servers.
    3. In my limited testing the 7-Zip gzipper is crazy good, so good that it approaches Zstd and brotli levels. It's long been known to be better than GNU gzip and zlib, but I didn't know it approached Zstd and brotli. It comes with the 7-Zip Windows utility released by Igor Pavlov. You might want to include it.
    4. libdeflate is worth a look. It's another gzipper.
    The overarching message here is that gzip ≠ gzip. There are many implementations, and the GNU gzip utility is likely among the worst.
    3 replies | 187 view(s)
  • SolidComp's Avatar
    25th May 2020, 17:20
    SolidComp replied to a thread Brotli in Data Compression
    Five times or five files? I added a fifth file, same error. Screenshot below:
    255 replies | 82039 view(s)
  • Shelwien's Avatar
    25th May 2020, 16:05
    Shelwien replied to a thread Brotli in Data Compression
    @SolidComp: Sorry, I left a mistake after renaming the samples subdirectory :) After running gen.bat, the dictionary is in the file named "dictionary". If you're on Linux, you can just repeat the operations in gen.bat manually: zstd --train produces the dictionary, and zstd -D compresses using it. There's also this option to control dictionary size: --maxdict=# : limit dictionary to specified size (default: 112640)
    255 replies | 82039 view(s)
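    On the API side, the rough equivalent of "zstd -D dictionary file" mentioned above is ZSTD_compress_usingDict(); a minimal sketch (error handling and buffer sizing via ZSTD_compressBound are left out):
        #include <zstd.h>

        size_t compress_with_dict(void *dst, size_t dst_cap,
                                  const void *src, size_t src_len,
                                  const void *dict, size_t dict_len, int level)
        {
            ZSTD_CCtx *cctx = ZSTD_createCCtx();
            /* dict is the file produced by zstd --train / ZDICT_trainFromBuffer */
            size_t r = ZSTD_compress_usingDict(cctx, dst, dst_cap,
                                               src, src_len,
                                               dict, dict_len, level);
            ZSTD_freeCCtx(cctx);
            return ZSTD_isError(r) ? 0 : r;
        }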
  • Cyan's Avatar
    25th May 2020, 15:25
    Cyan replied to a thread Zstandard in Data Compression
    --patch-from is a new capability designed to reduce the size of transmitted data when updating a file from one version to another. In this model, it is assumed that:
    - the old version is present at the destination site
    - new and old versions are relatively similar, with only a handful of changes
    If that's the case, the compression ratio will be ridiculously good. zstd will see the old version as a "dictionary" when generating the patch and when decompressing the new version. So it's not a new format: the patch is a regular zstd compressed file.
    429 replies | 130359 view(s)
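    A simplified sketch of the same idea through zstd's C API, using the old version as a one-shot prefix dictionary; the CLI's --patch-from additionally tunes window-related parameters, which is omitted here:
        #include <zstd.h>

        /* "Patch" = the new version compressed against the old version as a prefix. */
        size_t make_patch(void *dst, size_t dst_cap,
                          const void *new_buf, size_t new_len,
                          const void *old_buf, size_t old_len)
        {
            ZSTD_CCtx *cctx = ZSTD_createCCtx();
            ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 19);
            ZSTD_CCtx_refPrefix(cctx, old_buf, old_len);   /* old version acts as the dictionary */
            size_t r = ZSTD_compress2(cctx, dst, dst_cap, new_buf, new_len);
            ZSTD_freeCCtx(cctx);
            return ZSTD_isError(r) ? 0 : r;   /* the patch is a regular zstd frame */
        }
        /* Decompression mirrors this: ZSTD_DCtx_refPrefix() with the old version,
           then ZSTD_decompressDCtx() on the patch. */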
  • Cyan's Avatar
    25th May 2020, 15:15
    Cyan replied to a thread Brotli in Data Compression
    You could try it 5 times. Assuming that the source file is ~90K, this should force the trainer to provide a dictionary from this material. Note though that the produced dictionary will be highly tuned for this specific file, which is not the target model. In a production environment, we tend to use ~10K samples, randomly extracted from an even larger pool, in order to generate a dictionary for a category of documents.
    255 replies | 82039 view(s)
  • Jyrki Alakuijala's Avatar
    25th May 2020, 10:17
    Jyrki Alakuijala replied to a thread Brotli in Data Compression
    Based on a quick look at the makefiles, we are not using the fast math option. However, there can be more uncertainty, like perhaps using multiply-and-add as a single instruction leading to a different result than doing the multiply and add as two separate instructions. (I'm a bit out of touch with this field. Compilers, vectorization and new instructions are improved constantly.)
    255 replies | 82039 view(s)
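    A tiny C demonstration of the multiply-and-add effect mentioned above; the values are chosen so the intermediate product loses its low bits when the multiply is rounded separately (link with the math library):
        #include <math.h>
        #include <stdio.h>

        int main(void)
        {
            double a = 1.0 + ldexp(1.0, -30);      /* 1 + 2^-30 */
            double c = -(1.0 + ldexp(1.0, -29));   /* -(1 + 2^-29) */
            double p = a * a;                      /* exact product 1 + 2^-29 + 2^-60 rounds to 1 + 2^-29 */
            printf("separate: %g\n", p + c);       /* prints 0 */
            printf("fused:    %g\n", fma(a, a, c));/* prints ~8.67e-19, i.e. 2^-60 */
            return 0;
        }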
  • Kirr's Avatar
    25th May 2020, 05:58
    Kirr replied to a thread Zstandard in Data Compression
    From source, when possible. Thanks, will clarify it on website (in the next update).
    429 replies | 130359 view(s)
  • SolidComp's Avatar
    25th May 2020, 05:16
    SolidComp replied to a thread Zstandard in Data Compression
    Do you build the compressors from source, or do you use the builds provided by the projects?
    429 replies | 130359 view(s)
  • SolidComp's Avatar
    25th May 2020, 05:11
    SolidComp replied to a thread Brotli in Data Compression
    Do you specify fast math in the makefile or cmake?
    255 replies | 82039 view(s)
  • SolidComp's Avatar
    25th May 2020, 05:11
    SolidComp replied to a thread Brotli in Data Compression
    Where's the dictionary?
    255 replies | 82039 view(s)
  • Kirr's Avatar
    25th May 2020, 02:59
    Kirr replied to a thread Zstandard in Data Compression
    zstd is now updated to 1.4.5 in my benchmark: http://kirr.dyndns.org/sequence-compression-benchmark/ I noticed good improvement in decompression speed for all levels, and some improvement in compression speed for slower levels. (Though I am updating from 1.4.0, so the improvement may be larger than from 1.4.4).
    429 replies | 130359 view(s)
  • redrabbit's Avatar
    25th May 2020, 02:09
    Thanks for the explanation and the testing
    84 replies | 13163 view(s)
  • terrelln's Avatar
    25th May 2020, 01:55
    terrelln replied to a thread Zstandard in Data Compression
    Both single-thread and multi-thread modes are deterministic, but they produce different results. Multi-threaded compression produces the same output with any number of threads. The zstd cli defaults to multi-threaded compression with 1 worker thread. You can opt into single-thread compression with --single-thread.
    429 replies | 130359 view(s)
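    A sketch of what opting in or out of worker threads looks like through the library (assuming a zstd build with multithreading enabled; as noted above, each mode is deterministic on its own, but the two produce different output):
        #include <zstd.h>

        size_t compress_workers(void *dst, size_t dst_cap,
                                const void *src, size_t src_len, int workers)
        {
            ZSTD_CCtx *cctx = ZSTD_createCCtx();
            ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 19);
            /* 0 = single-thread code path (like --single-thread);
               >= 1 = multi-threaded path, same output for any worker count */
            ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, workers);
            size_t r = ZSTD_compress2(cctx, dst, dst_cap, src, src_len);
            ZSTD_freeCCtx(cctx);
            return ZSTD_isError(r) ? 0 : r;
        }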
  • schnaader's Avatar
    24th May 2020, 23:47
    OK, so here's the long answer. I could reproduce the bad performance of the Life is Strange 2 testfile; my results are in the tables below. There are two things this all boils down to: preflate (vs. zlib brute force in xtool and Precomp 0.4.6) and multithreading. Note that both the Life is Strange 2 times and the decompressed size are very similar for Precomp 0.4.6 and xtool when considering the multithreading factor (computed by using the time command and dividing "user" time by "real" time). Also note that the testfile has many small streams (64 KB decompressed each); preflate doesn't seem to use its multithreading in that case.
    Although preflate can be slower than zlib brute force, it also has big advantages, which can be seen when looking at the Eternal Castle testfile. It consists of big PNG files; preflate can make use of multithreading (though not fully utilizing all cores) and is faster than the zlib brute force. And the zlib brute force doesn't even manage to recompress any of the PNG files. Xtool's (using reflate) decompressed size is somewhere between those two, most likely because reflate doesn't parse multi-PNG files and can only decompress parts of them because of this.
    So, enough explanation, how can the problem be solved? Multithreading, of course. The current branch already features multithreading for JPEG when using -r, and I'm working on it for deflate streams. When it's done, I'll post fresh results for the Life is Strange 2 testfile; they should be very close to xtool if things work out well. Multithreaded -cn or -cl though is a bit more complex; I've got some ideas, but have to test them and it will take longer.
    Test system: Hetzner vServer CPX21: AMD Epyc, 3 cores @ 2.5 GHz, Ubuntu 20.04 64-Bit
    Eternal Castle testfile, 223,699,564 bytes
    program | decompressed size | time (decompression/recompression) | multithreading factor (decompression/recompression) | compressed size (-nl)
    Precomp 0.4.8dev -cn -d0 | 5,179,454,907 | 5 min 31 s / 4 min 45 s | 1.73 / 1.64 | 118,917,128
    Precomp 0.4.6 -cn -d0 | 223,699,589 | 8 min 31 s | 1.00 | 173,364,804
    xtool (redrabbit's result) | 2,099,419,005 | | |
    Life is Strange 2 testfile, 632,785,771 bytes
    program | decompressed size | time (decompression/recompression) | multithreading factor (decompression/recompression)
    Precomp 0.4.8dev -cn -intense0 -d0 | 1,499,226,364 | 3 min 21 s / 2 min 14 s | 0.91 / 0.99
    Precomp 0.4.8dev (after tempfile fix) | 1,499,226,364 | 3 min 11 s / 2 min 21 s | 0.92 / 0.99
    Precomp 0.4.6 -cn -intense0 -d0 | 1,497,904,244 | 1 min 55 s / 1 min 43 s | 0.93 / 0.98
    xtool 0.9 e:precomp:32mb,t4:zlib (Wine) | 1,497,672,882 | 46 s / 36 s | 2.75 / 2.87
    84 replies | 13163 view(s)
  • Jyrki Alakuijala's Avatar
    24th May 2020, 23:36
    Jyrki Alakuijala replied to a thread Brotli in Data Compression
    I don't remember what was improved. Perhaps hashing. There is a lot of floating point in brotli 10 and 11. I don't know how compiler invariant it is the way we do it. I'd assume it to be well-defined, but this could be a naive position.
    255 replies | 82039 view(s)
  • Shelwien's Avatar
    24th May 2020, 21:20
    Shelwien replied to a thread Brotli in Data Compression
    Normally it just needs more data for training. But here I made a workaround for you: Update: attachment deleted since there was a mistake in the script and it didn't work.
    255 replies | 82039 view(s)
  • schnaader's Avatar
    24th May 2020, 20:44
    @redrabbit: Just a quick post to let you know I'm currently researching the bad performance you describe on those 2 files, and to share some things I found out while profiling the latest Precomp version. Don't waste too much time on this. When I saw your post, I wondered where zlib is still used, as preflate is doing all deflate-related stuff itself. But zlib is indeed still used to speed up intense and brute mode (testing the first few bytes of a potential stream to avoid recompressing false positives). Profiling the latest version shows that for the Life Is Strange 2 file you posted, this is only using 0.3% of CPU time (of .pak -> .pcf; in -r, zlib isn't used at all). So using a faster zlib library could only speed things up by 0.3%. On the other hand, I found something else and fixed it some minutes ago in both branches: about 5% of CPU time was wasted because uncompressed data was written to a file to prepare recursion even though both "-intense0" and "-d0" disable recursion, so the temporary file wasn't used at all. Fixed this by writing the file only if it's used. Testing shows it works: 3 min 11 s instead of 3 min 21 s for "-cn -intense0 -d0" of the Life Is Strange 2 file. Not much, but some progress. It might have more impact on non-SSD drives.
    84 replies | 13163 view(s)
  • SolidComp's Avatar
    24th May 2020, 20:36
    SolidComp replied to a thread Brotli in Data Compression
    I tried to create a dictionary with --train, but I get an error saying not enough samples or something. I tried it with just the jQuery file as the training sample, which used to work in the past. Then I tried two, then three, then four jQuery files (the previous versions, 3.5.0, 3.4.1, etc.), and still get the error even with four files. Not sure what I'm doing wrong.
    255 replies | 82039 view(s)
  • Hakan Abbas's Avatar
    24th May 2020, 19:59
    In data compression, working speed is as important as compression ratio. Looking at the industry, it is clear that faster products are preferred even if they are less efficient. It is not worth spending much more energy than necessary for a small saving in data, no matter who is behind the product. In most cases the cost (CPU, RAM, ...) of the computations is taken into consideration, and situations requiring excessive processing load are avoided whenever possible. However, as we have seen, it cannot be said that much attention was paid to these points for AVIF/AV1.
    15 replies | 664 view(s)
  • Shelwien's Avatar
    24th May 2020, 19:58
    Shelwien replied to a thread Brotli in Data Compression
    It's an unfair comparison, since both brotli and zstd support an external dictionary, but brotli silently uses its integrated one:
    89,476 jquery-3.5.1.min.js
    27,959 jquery-3.5.1.min.bro // brotli_gc82.exe -q 11 -fo jquery-3.5.1.min.bro jquery-3.5.1.min.js
    28,754 jquery-3.5.1.min.bro // brotli with dictionary zeroed out in .exe
    29,453 jquery-3.5.1.min.zst // zstd.exe --ultra -22 -fo jquery-3.5.1.min.zst jquery-3.5.1.min.js
    28,942 jquery-3.5.1.min.zst // zstd.exe --ultra -22 -fo jquery-3.5.1.min.zst -D dictionary.bin jquery-3.5.1.min.js
    This example uses brotli's default dictionary for zstd, but we can generate a custom dictionary for zstd, while that is harder to do for brotli. Yes, brotli at max settings has a stronger entropy model than zstd. But it also has 5x slower encoding and 2x slower decoding. And the actual compression difference is still uncertain, since we can build a larger specialized dictionary for the target data with zstd --train.
    255 replies | 82039 view(s)
  • Jyrki Alakuijala's Avatar
    24th May 2020, 19:40
    My basic understanding is that AVIF decoders are roughly half the speed of JPEG XL decoders. Getting the fastest AVIF decoding may require turning off some features that are always on in JPEG XL, for example no YUV444 and no more than 8 bits of dynamic range. It may be that hardware decoders for AVIF are not able to do streaming, i.e., to display the part of the image that is already decoded. For some applications this can be a blocker.
    15 replies | 664 view(s)
  • Jyrki Alakuijala's Avatar
    24th May 2020, 19:30
    Pik is a superfast, simple format for photography-level qualities (1.5+ bpp), and gives great quality/density there. It used dct8x8. The requirements for JPEG XL included lower rates; down to 0.06 bpp was discussed. Variable sizes of DCTs and filtering improve the performance at lower bpps, and keep it at state-of-the-art down to 0.4 bpp and somewhat decent at 0.15 bpp. This added a lot of code and a 2x decoding slowdown to cover a larger bpp range. Further, we integrated FUIF into PIK. We are still in the process of figuring out all the possibilities it brings. FUIF seems much less psychovisually efficient, but is a more versatile coding. Luca and Jon developed the FUIF code further after integrating it into JPEG XL. Jon has been in general great to collaborate with, and I am quite proud of having made the initial proposals of basing JPEG XL on these two codecs. Everyone in the team has grown very comfortable with the fusion of these two codecs.
    15 replies | 664 view(s)
  • Jyrki Alakuijala's Avatar
    24th May 2020, 19:16
    I'm not an expert on JPEG XT. XT is likely an HDR patch on top of the usual JPEG. Not particularly effective at compression density.
    15 replies | 664 view(s)
  • LucaBiondi's Avatar
    24th May 2020, 19:07
    LucaBiondi replied to a thread Paq8pxd dict in Data Compression
    Thank you!!
    915 replies | 313387 view(s)
  • DZgas's Avatar
    24th May 2020, 18:36
    DZgas replied to a thread JPEG XL vs. AVIF in Data Compression
    I think the large corporations made the AV1 codec for themselves, not for users, haha. But decoding with the standard dav1d is fine... despite the fact that my laptop can't handle VP9 1080p or AV1 720p, and can only just manage HEVC 1080p. Problems of progress - I am slow. Obviously AV1 is the slowest codec and the strongest of all; encoding speeds are currently killing it. Despite the fact that it is ~25% better than HEVC or VP9, it is 10-20 times slower, which is serious for people who do not have a powerful PC/server. Well, almost the whole Internet is still using the old AVC, because it's fast and you can decode it even on a watch.
    15 replies | 664 view(s)
  • SolidComp's Avatar
    24th May 2020, 17:25
    SolidComp replied to a thread Zstandard in Data Compression
    Sportman, why is the compressed size different for single thread vs multithreaded? Is it supposed to produce different results? I thought it would be deterministic at any given compression level.
    429 replies | 130359 view(s)
  • SolidComp's Avatar
    24th May 2020, 17:19
    Yes, AVIF was not intended for cameras. The encoders are still incredibly slow, as are the encoders for AV1 video (though I think Intel's SVT AV1 encoder is improving). Do you know if the decoders are reasonably fast?
    15 replies | 664 view(s)
  • SolidComp's Avatar
    24th May 2020, 17:18
    Jyrki, is either JPEG XL or AVIF related to PIK? What became of PIK? And I'm confused by JPEG XT – do you know if it's related to XL?
    15 replies | 664 view(s)
  • SolidComp's Avatar
    24th May 2020, 17:02
    SolidComp replied to a thread Brotli in Data Compression
    Hi all – I'm impressed with the results of compressing jQuery with brotli, compared to Zstd and libdeflate (gzip):
    Original jQuery 3.5.1 (latest): 89,476 bytes (this is the minified production version from jQuery.com: Link)
    libdeflate 1.6 gzip -11: 36,043 (libdeflate adds two extra compression levels to zlib gzip's nine)
    Zstd 1.4.5 -22: 29,453
    brotli 1.0.4 -11: 28,007
    brotli 1.0.7 -11: 27,954
    Update: 7-Zip's gzipper is incredible: 29,960 bytes. I'm not sure why it's so much better than libdeflate, or how it's so close to Zstd and brotli.
    Compression of web files is much more important to me than the weird benchmarks that are typically used. And this is where brotli shines, not surprisingly since it was designed for the web. Note that brotli has a dictionary, generated from web files, whereas Zstd and libdeflate do not. You can generate a dictionary with Zstd, but it keeps giving me an error saying there aren't enough samples...
    Brotli 1.0.7 performs slightly better than 1.0.4, which was surprising since there was nothing in the release notes that indicated improvements for the max compression setting (-11), just an improvement for the -1 setting. The only other difference is that I compiled my 1.0.7 version myself in Visual Studio 2019, dynamically linked, whereas my 1.0.4 version is the official build from GitHub, a static executable (1.0.4 is the last version they released builds for – all they've released is source code for 1.0.5 through 1.0.7). Jyrki, should compression results (size) be exactly the same across compilers, deterministic given the same settings? So it has to be something in the source code of 1.0.7 compared to 1.0.4 that explains the improvement, right, not Visual Studio?
    255 replies | 82039 view(s)
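    For comparison with the command-line sizes above, the one-shot brotli API call at maximum quality looks roughly like this sketch (on entry *out_len must hold the output buffer capacity, e.g. from BrotliEncoderMaxCompressedSize):
        #include <stddef.h>
        #include <stdint.h>
        #include <brotli/encode.h>

        size_t brotli_q11(const uint8_t *in, size_t in_len,
                          uint8_t *out, size_t *out_len)
        {
            /* quality 11 = maximum (the "-11" above), lgwin 24 = largest window */
            if (!BrotliEncoderCompress(BROTLI_MAX_QUALITY, BROTLI_MAX_WINDOW_BITS,
                                       BROTLI_MODE_TEXT,   /* minified JS is text-like */
                                       in_len, in, out_len, out))
                return 0;
            return *out_len;   /* compressed size */
        }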
  • moisesmcardona's Avatar
    24th May 2020, 16:30
    The commits are there. Kaitz just didn't post the compiled versions:
    v87: https://github.com/kaitz/paq8pxd/commit/86969e4174f8f3f801f9a0d94d36a8cbda783961
    v88: https://github.com/kaitz/paq8pxd/commit/7969cc107116c31cd997f37359b433994fea1f6d
    I've attached them with the source from their respective commits. Compiled with march=native on my AMD Ryzen 9 CPU.
    915 replies | 313387 view(s)
  • Jyrki Alakuijala's Avatar
    24th May 2020, 00:19
    Jpeg XL is not frozen yet. (Other than the jpeg repacking part of it.) Our freezing schedule is end of August 2020. Before that it is not a good idea to integrate for other than testing use.
    15 replies | 664 view(s)
  • Darek's Avatar
    23rd May 2020, 20:54
    Darek replied to a thread Paq8pxd dict in Data Compression
    Scores of my testset for paq8pxd_v89. A slight regression on the total testset: mixed scores for the textual files, worse scores for the exe files.
    915 replies | 313387 view(s)
  • schnaader's Avatar
    23rd May 2020, 20:16
    You can try to reverse engineer the code coming from this C compiler: https://github.com/Battelle/movfuscator It will be very hard, as the binary will only contain mov instructions and has some (optional?) randomization applied, IIRC. On the other hand, performance of the programs compiled with this won't be very good. There's an example of DOOM 1 compiled this way that renders a frame around every 7 hours: https://github.com/xoreaxeaxeax/movfuscator/tree/master/validation/doom
    5 replies | 825 view(s)
  • Jyrki Alakuijala's Avatar
    23rd May 2020, 20:02
    I believe we have made some improvements in this area (simplicity and resource use of filters/predictors) and those will surface soon in the public repository. Encoder optimizations are on a rather low priority still, most optimizations are made for making decoding simpler and faster.
    152 replies | 35764 view(s)
  • Shelwien's Avatar
    23rd May 2020, 18:08
    1) This contest is intended for more practical algorithms - the lowest allowed speed would be something like 250 kb/s. So no PAQs or NN/ML, most likely.
    2) Archiver size can be pretty large - precomp+zstd could be 3 MB easily, more if there are several compression methods.
    3) Processing speed would be a part of the ranking, so dictionary preprocessing won't be a free way to decrease compressed size.
    10 replies | 650 view(s)
  • cssignet's Avatar
    23rd May 2020, 16:45
    Those trials were about image data transformations, and it seems that the codec would be good enough on those:
    vxFkgPwD.png = 1099.39 KB
    > cjpegxl -q 100 vxFkgPwD.png vxFkgPwD.jxl
    Kernel Time = 0.577 = 11%, User Time = 11.356 = 229%, Process Time = 11.934 = 240%, Virtual Memory = 528 MB, Global Time = 4.952 = 100%, Physical Memory = 400 MB <-- multithreaded
    vxFkgPwD.jxl = 865.66 KB
    Perhaps somehow the transformation could be improved sometimes:
    akviv0R0.png = 1247.14 KB
    > cjpegxl -q 100 akviv0R0.png akviv0R0.jxl
    Kernel Time = 0.358 = 9%, User Time = 11.091 = 278%, Process Time = 11.450 = 287%, Virtual Memory = 184 MB, Global Time = 3.986 = 100%, Physical Memory = 152 MB <-- multithreaded
    akviv0R0.jxl = 902.88 KB
    > cjpegxl -q 100 -s 9 akviv0R0.png akviv0R0-s9.jxl
    Kernel Time = 0.468 = 9%, User Time = 15.678 = 302%, Process Time = 16.146 = 311%, Virtual Memory = 184 MB, Global Time = 5.178 = 100%, Physical Memory = 153 MB <-- multithreaded
    akviv0R0-s9.jxl = 892.17 KB
    > akviv0R0.png -> WebP (with prediction + pseudo-random colors re-ordering)
    Kernel Time = 0.093 = 1%, User Time = 5.210 = 97%, Process Time = 5.304 = 99%, Virtual Memory = 81 MB, Global Time = 5.330 = 100%, Physical Memory = 78 MB <-- *not* multithreaded
    uSY86WmL.webp = 835.81 KB
    From an end-user's pov, some results could be unexpected atm, but maybe something went wrong on my side. Still, could you confirm this result (from the JPEG XL codec)?
    yfgfInnU.png = 222.15 KB
    > cjpegxl -q 100 yfgfInnU.png yfgfInnU.jxl
    Kernel Time = 0.312 = 12%, User Time = 5.413 = 216%, Process Time = 5.725 = 228%, Virtual Memory = 151 MB, Global Time = 2.502 = 100%, Physical Memory = 110 MB <-- multithreaded
    yfgfInnU.jxl = 260.82 KB
    > cjpegxl -q 100 -s 9 yfgfInnU.png yfgfInnU-s9.jxl
    Kernel Time = 0.358 = 10%, User Time = 9.578 = 272%, Process Time = 9.937 = 282%, Virtual Memory = 153 MB, Global Time = 3.517 = 100%, Physical Memory = 112 MB <-- multithreaded
    yfgfInnU-s9.jxl = 257.59 KB
    > cwebp -lossless yfgfInnU.png -o yfgfInnU.webp
    Kernel Time = 0.015 = 7%, User Time = 0.187 = 89%, Process Time = 0.202 = 96%, Virtual Memory = 25 MB, Global Time = 0.210 = 100%, Physical Memory = 24 MB <-- *not* multithreaded
    yfgfInnU.webp = 201.44 KB
    152 replies | 35764 view(s)
  • DZgas's Avatar
    23rd May 2020, 16:18
    DZgas replied to a thread JPEG XL vs. AVIF in Data Compression
    Of course JpegXR is not JpegXL, but I can definitely say that AVIF will not be used in real time on cameras; it is the slowest. AVIF is developed by the Alliance for Open Media, which includes most large corporations. AVIF is well suited for everything on the Internet: photographs in articles, video previews, stickers, and everything else. JpegXR is supported by Microsoft; you can view it in the standard Windows photo viewer. And JpegXL? ...It was released... but I think no one noticed it. There are currently no user-friendly programs for encoding and viewing JpegXL.
    15 replies | 664 view(s)
  • ivan2k2's Avatar
    23rd May 2020, 15:21
    ivan2k2 replied to a thread Zstandard in Data Compression
    There is an error if the files in the filelist are separated by Windows-style newlines (0x0d,0x0a):
    Error : util.c, 283 : fgets(buf, (int) len, file)
    Unix-style newlines work fine.
    429 replies | 130359 view(s)
  • mhajicek's Avatar
    23rd May 2020, 14:09
    I'd say a private data set sounds better, because then there is a higher chance that the resulting compressor/algorithm will have some more general use, not just something designed for one file, which would otherwise be pretty useless. So I'd say it supports useful development more. Maybe the decompressor size could be limited even more; 16 MB is really a lot - well, that depends on whether dictionary and other helper data were just a considered side effect of option 2], or whether it was intended to encourage people to use that in development :)
    10 replies | 650 view(s)
  • Sportman's Avatar
    23rd May 2020, 12:50
    Sportman replied to a thread Zstandard in Data Compression
    enwik10: 3,638,532,709 bytes, 23.290 sec. - 10.649 sec., zstd -1 --ultra (v1.4.5) 3,325,793,817 bytes, 32.959 sec. - 11.632 sec., zstd -2 --ultra (v1.4.5) 3,137,188,839 bytes, 42.442 sec. - 11.994 sec., zstd -3 --ultra (v1.4.5) 3,072,048,223 bytes, 44.923 sec. - 12.828 sec., zstd -4 --ultra (v1.4.5) 2,993,531,459 bytes, 72.322 sec. - 12.827 sec., zstd -5 --ultra (v1.4.5) 2,921,997,106 bytes, 95.852 sec. - 12.613 sec., zstd -6 --ultra (v1.4.5) 2,819,369,488 bytes, 132.442 sec. - 11.922 sec., zstd -7 --ultra (v1.4.5) 2,780,718,316 bytes, 168.737 sec. - 11.724 sec., zstd -8 --ultra (v1.4.5) 2,750,214,835 bytes, 237.175 sec. - 11.574 sec., zstd -9 --ultra (v1.4.5) 2,694,582,971 bytes, 283.778 sec. - 11.564 sec., zstd -10 --ultra (v1.4.5) 2,669,751,039 bytes, 355.330 sec. - 11.651 sec., zstd -11 --ultra (v1.4.5) 2,645,099,063 bytes, 539.770 sec. - 11.658 sec., zstd -12 --ultra (v1.4.5) 2,614,435,940 bytes, 717.361 sec. - 11.766 sec., zstd -13 --ultra (v1.4.5) 2,569,453,043 bytes, 894.063 sec. - 11.872 sec., zstd -14 --ultra (v1.4.5) 2,539,608,782 bytes, 1,198.939 sec. - 11.795 sec., zstd -15 --ultra (v1.4.5) 2,450,374,607 bytes, 1,397.298 sec. - 11.547 sec., zstd -16 --ultra (v1.4.5) 2,372,309,135 bytes, 1,994.123 sec. - 11.414 sec., zstd -17 --ultra (v1.4.5) 2,339,536,175 bytes, 2,401.207 sec. - 11.819 sec., zstd -18 --ultra (v1.4.5) 2,299,200,392 bytes, 3,093.583 sec. - 12.295 sec., zstd -19 --ultra (v1.4.5) 2,196,998,753 bytes, 3,838.985 sec. - 12.952 sec., zstd -20 --ultra (v1.4.5) 2,136,031,972 bytes, 4,488.867 sec. - 13.171 sec., zstd -21 --ultra (v1.4.5) 2,079,998,491 bytes, 5,129.788 sec. - 12.915 sec., zstd -22 --ultra (v1.4.5) enwik10: 3,642,089,943 bytes, 28.752 sec. - 10.717 sec., zstd -1 --ultra --single-thread (v1.4.5) 3,336,007,957 bytes, 38.991 sec. - 11.808 sec., zstd -2 --ultra --single-thread (v1.4.5) 3,133,763,440 bytes, 48.671 sec. - 12.157 sec., zstd -3 --ultra --single-thread (v1.4.5) 3,065,081,662 bytes, 50.724 sec. - 12.904 sec., zstd -4 --ultra --single-thread (v1.4.5) 2,988,125,022 bytes, 79.664 sec. - 13.073 sec., zstd -5 --ultra --single-thread (v1.4.5) 2,915,934,603 bytes, 103.971 sec. - 12.798 sec., zstd -6 --ultra --single-thread (v1.4.5) 2,811,448,067 bytes, 148.300 sec. - 12.125 sec., zstd -7 --ultra --single-thread (v1.4.5) 2,775,621,897 bytes, 188.946 sec. - 11.804 sec., zstd -8 --ultra --single-thread (v1.4.5) 2,744,751,362 bytes, 255.285 sec. - 11.929 sec., zstd -9 --ultra --single-thread (v1.4.5) 2,690,272,721 bytes, 304.380 sec. - 11.737 sec., zstd -10 --ultra --single-thread (v1.4.5) 2,663,964,945 bytes, 380.876 sec. - 11.848 sec., zstd -11 --ultra --single-thread (v1.4.5) 2,639,230,515 bytes, 561.791 sec. - 11.774 sec., zstd -12 --ultra --single-thread (v1.4.5) 2,609,728,690 bytes, 705.747 sec. - 11.646 sec., zstd -13 --ultra --single-thread (v1.4.5) 2,561,381,234 bytes, 896.689 sec. - 11.777 sec., zstd -14 --ultra --single-thread (v1.4.5) 2,527,193,467 bytes, 1,227.455 sec. - 11.893 sec., zstd -15 --ultra --single-thread (v1.4.5) 2,447,614,045 bytes, 1,360.777 sec. - 11.614 sec., zstd -16 --ultra --single-thread (v1.4.5) 2,370,639,588 bytes, 1,953.282 sec. - 11.641 sec., zstd -17 --ultra --single-thread (v1.4.5) 2,337,506,087 bytes, 2,411.038 sec. - 11.971 sec., zstd -18 --ultra --single-thread (v1.4.5) 2,299,225,559 bytes, 2,889.098 sec. - 12.184 sec., zstd -19 --ultra --single-thread (v1.4.5) 2,197,171,322 bytes, 3,477.477 sec. - 12.862 sec., zstd -20 --ultra --single-thread (v1.4.5) 2,136,340,302 bytes, 4,024.675 sec. 
- 12.940 sec., zstd -21 --ultra --single-thread (v1.4.5) 2,080,479,075 bytes, 4,568.550 sec. - 12.934 sec., zstd -22 --ultra --single-thread (v1.4.5) Difference zstd x --ultra --single-thread (v1.4.4) versus (v1.4.5): zstd -1, 0.168 sec., -0.137 sec., -501 bytes zstd -2, 0.389 sec., -0.610 sec., -87 bytes zstd -3, 0.021 sec., -0.490 sec., 0 bytes zstd -4, 0.029 sec., -0.441 sec., 2 bytes zstd -5, 1.127 sec., -0.204 sec., 0 bytes zstd -6, 1.427 sec., -0.504 sec., 0 bytes zstd -7, 4.892 sec., -0.257 sec., 0 bytes zstd -8, 2.735 sec., -0.202 sec., 0 bytes zstd -9, 2.494 sec., -0.008 sec., 0 bytes zstd -10, 3.077 sec., -0.181 sec., 0 bytes zstd -11, 3.802 sec., -0.165 sec., 0 bytes zstd -12, 0.884 sec., -0.158 sec., 0 bytes zstd -13, 0.184 sec., -0.146 sec., 0 bytes zstd -14, -2.345 sec., -0.114 sec., 0 bytes zstd -15, -3.486 sec., -0.124 sec., 0 bytes zstd -16, -3.167 sec., -0.102 sec., 0 bytes zstd -17, 15.255 sec., -0.041 sec., 0 bytes zstd -18, 39.073 sec., -0.147 sec., 0 bytes zstd -19, 16.304 sec., -0.151 sec., 0 bytes zstd -20, 17.468 sec., 0.119 sec., 0 bytes zstd -21, 9.179 sec., 0.014 sec., 0 bytes zstd -22, 10.418 sec., -0.163 sec., 0 bytes Minus (-) = improvement
    429 replies | 130359 view(s)