Activity Stream

  • Darek's Avatar
    Today, 10:59
    Darek replied to a thread Paq8pxd dict in Data Compression
    @LucaBiondi - could you attach the exe file of the modified paq8pxd_v89? Regarding the benchmark procedure - a good idea in my opinion, but there should be the same benchmark file for every test - maybe one procedurally generated by the program before the test starts? @Kaitz - I have an idea, but maybe it's silly or not doable. Is it possible to use some sort of very light compression of the program's memory during use? As I understand it, the majority of memory is used for some kinds of trees or other data-representation structures. Is it possible to use lightly compressed data which would virtually simulate more memory? I think there could still be room for improvement for the biggest files (enwik8/9) if we could use more memory, but maybe it isn't necessary to use more physical memory and instead a trick like this could be used? Of course it would be more time-consuming, but maybe it could be worth it...
    938 replies | 318812 view(s)
  • suryakandau@yahoo.co.id's Avatar
    Today, 10:20
    Let me try, but I can't promise anything because I am not a programmer :)
    33 replies | 1472 view(s)
  • lz77's Avatar
    Today, 10:16
    > TS40.txt:
    > 132,248,557 bytes, 8.055 sec. - 0.544 sec., zstd -7 --ultra --single-thread
    > 130,590,357 bytes, 10.530 sec. - 0.528 sec., zstd -8 --ultra --single-thread
    What ratio does zstd show after preprocessing, given 40/2.5 = 16 sec. for compression + decompression - 5% off? What ratio would be the best achievable in 16 seconds at all?
    33 replies | 1472 view(s)
  • Darek's Avatar
    Today, 09:35
    Yes, because it's an option made for enwik8/9 :)
    70'197'866 - TS40.txt -x15 -e1,english.dic by Paq8sk30, time - 73'876,51s - good score, bad time; paq8sk23 would need to be about 20x faster to meet the contest criteria. Could you try to add the use of more threads/cores?
    33 replies | 1472 view(s)
  • suryakandau@yahoo.co.id's Avatar
    Today, 04:12
    On enwik8/9 there is no error when using the -w option.
    33 replies | 1472 view(s)
  • suryakandau@yahoo.co.id's Avatar
    Today, 03:21
    paq8pxd_v89, when using the -w option, gives the error message "Transform fails at 333440671, skipping..." - so it detects ts40.txt as default, not bigtext wrt, and that causes the worse compression ratio.
    33 replies | 1472 view(s)
  • suryakandau@yahoo.co.id's Avatar
    Today, 03:04
    So the best score is with -x15 -e1,english.dic. @sportman, could you add it to the GDCC public test set file?
    33 replies | 1472 view(s)
  • cssignet's Avatar
    Today, 00:53
    I would suggest a simpler, accurate, verifiable and *fair* test for time comparison: pingo/ECT binaries built with the same compiler/flags, cold-start running on a dedicated resource (FX-4100 @ 3.6 GHz - 8 GB RAM - Windows 7 64-bit), tested on the files found in the PNG tab (side note: I could not grab these:
    https://i.redd.it/5lg9uz7fb7a41.png
    https://i.redd.it/6aiqgffywbk41.png
    https://i.redd.it/gxocab3x91e41.png
    https://i.redd.it/ks8z85usbg241.png
    https://i.redd.it/uuokrw18s4i41.png
    so the input would be 1.89 GB (2 027 341 876 bytes) - 493 files). pingo (0.99 rc2 40) - ECT (f0b38f7 (0.8.3)) (with -strip)
    multi-processing (4x):
    ECT -1 --mt-file
    Kernel Time = 14.133 = 1%
    User Time = 3177.709 = 390%
    Process Time = 3191.842 = 392%   Virtual Memory = 438 MB
    Global Time = 813.619 = 100%     Physical Memory = 433 MB
    pingo -s0
    Kernel Time = 86.518 = 16%
    User Time = 1740.393 = 328%
    Process Time = 1826.912 = 344%   Virtual Memory = 1344 MB
    Global Time = 530.104 = 100%     Physical Memory = 1212 MB
    ECT -5 --mt-file
    Kernel Time = 1557.482 = 43%
    User Time = 9361.869 = 259%
    Process Time = 10919.352 = 303%  Virtual Memory = 1677 MB
    Global Time = 3601.090 = 100%    Physical Memory = 1514 MB
    pingo -s5
    Kernel Time = 144.550 = 6%
    User Time = 6937.879 = 317%
    Process Time = 7082.429 = 324%   Virtual Memory = 1378 MB
    Global Time = 2183.105 = 100%    Physical Memory = 1193 MB
    file per file:
    ECT -1
    Kernel Time = 20.326 = 0%
    User Time = 2963.472 = 93%
    Process Time = 2983.799 = 99%    Virtual Memory = 283 MB
    Global Time = 2984.405 = 100%    Physical Memory = 282 MB
    pingo -s0 -nomulti
    Kernel Time = 68.468 = 4%
    User Time = 1443.711 = 95%
    Process Time = 1512.180 = 99%    Virtual Memory = 905 MB
    Global Time = 1513.683 = 100%    Physical Memory = 887 MB
    ECT -5 --mt-deflate   <-- multithreaded
    Kernel Time = 886.538 = 14%
    User Time = 8207.743 = 134%
    Process Time = 9094.281 = 149%   Virtual Memory = 1000 MB
    Global Time = 6083.433 = 100%    Physical Memory = 916 MB
    pingo -s5 -nomulti   <-- *not* multithreaded
    Kernel Time = 109.107 = 1%
    User Time = 5679.091 = 98%
    Process Time = 5788.198 = 99%    Virtual Memory = 978 MB
    Global Time = 5789.232 = 100%    Physical Memory = 980 MB
    The regular -sN profiles in pingo go more for small/average image sizes, paletted/RGBA. If someone seeks speed over size, -sN -faster could be used instead. On some samples it can still be competitive:
    https://i.redd.it/05vnjqzhrou31.png (13 266 623 bytes)
    ECT -1 (out: 10 023 297 bytes)
    Kernel Time = 0.140 = 2%
    User Time = 5.928 = 97%
    Process Time = 6.068 = 99%    Virtual Memory = 27 MB
    Global Time = 6.093 = 100%    Physical Memory = 29 MB
    pingo -s0 (out: 8 777 351 bytes)
    Kernel Time = 0.280 = 8%
    User Time = 2.870 = 90%
    Process Time = 3.151 = 99%    Virtual Memory = 98 MB
    Global Time = 3.166 = 100%    Physical Memory = 90 MB
    pingo -s0 -faster (out: 9 439 005 bytes)
    Kernel Time = 0.124 = 6%
    User Time = 1.825 = 92%
    Process Time = 1.950 = 99%    Virtual Memory = 86 MB
    Global Time = 1.965 = 100%    Physical Memory = 78 MB
    159 replies | 40557 view(s)
  • Shelwien's Avatar
    Yesterday, 21:42
    > Shelwien, what does your utility do? Does it just convert numbers from text to binary? What does it do with other text?
    It converts out3.txt to binary and back to text losslessly (except for the headers).
    > Do you convert floats to IEEE binary?
    Yes.
    8 replies | 207 view(s)
  • Shelwien's Avatar
    Yesterday, 21:41
    > Do you have any advice on resources for learning C++ as a language?
    https://stackoverflow.com/questions/388242/the-definitive-c-book-guide-and-list
    But you don't really need to know everything about C++ syntax and libraries to start programming in it. There are reference sites like https://www.cplusplus.com/ so you can always look up specific features. Basically, just find some open-source project that you like and read the source, while looking up things that you don't know.
    > Do you think TurboPFor is the best method for this task?
    If you need very high processing speed, then probably yes.
    > I have researched other compressors (bzip2, zfp, SZ..) and it seems to have
    > the best performance, but I am by no means an expert. I am still planning
    > on implementing other methods for comparison.
    The best compression would be provided by a custom statistical (CM) compressor (since there are probably correlations between columns). TurboPFor doesn't have any really complex algorithms - it's mostly just delta (subtracting the predicted value from each number) and bitfield rearrangement/transposition. The main purpose of the library is that it provides efficient SIMD implementations of these algorithms for multiple platforms. But if gigabytes-per-second speed is not really necessary for you, then you could just as well use something else, like self-written delta + zstd (see the sketch after this post).
    > Once I have progressed further, can I break apart the TurboPFor code?
    > Could I take a small amount of the files to run the method that seems
    > to work best from benchmarking or would this result in errors
    > with all of the interdependencies within TurboPFor?
    You can drop some of the files - actually it seems to build a library (libic.a) with the relevant part. Unfortunately it's not very readable due to all the speed optimizations.
    8 replies | 207 view(s)
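    A minimal sketch of the "self-written delta + zstd" idea mentioned above - not Shelwien's utility, just an illustration assuming 64-bit integer samples and that libzstd is available:

        #include <cstdint>
        #include <cstdio>
        #include <vector>
        #include <zstd.h>

        // Delta transform: replace each sample by its difference from the previous one.
        // Slowly varying sensor data turns into many small, highly compressible values.
        static void delta_encode(std::vector<int64_t>& v) {
            int64_t prev = 0;
            for (int64_t& x : v) {
                int64_t cur = x;
                x = cur - prev;
                prev = cur;
            }
        }

        int main() {
            std::vector<int64_t> samples = {1000, 1002, 1003, 1003, 1006, 1010};
            delta_encode(samples);  // -> 1000, 2, 1, 0, 3, 4

            // Entropy-code the raw bytes of the deltas with zstd.
            size_t srcSize = samples.size() * sizeof(int64_t);
            std::vector<char> dst(ZSTD_compressBound(srcSize));
            size_t cSize = ZSTD_compress(dst.data(), dst.size(),
                                         samples.data(), srcSize, 19 /* level */);
            if (ZSTD_isError(cSize)) { std::fprintf(stderr, "zstd error\n"); return 1; }
            std::printf("%zu -> %zu bytes\n", srcSize, cSize);
            return 0;
        }

    Decoding is the mirror image: ZSTD_decompress the buffer, then take a running sum over the deltas.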
  • SolidComp's Avatar
    Yesterday, 21:24
    Shelwien, what does your utility do? Does it just convert numbers from text to binary? What does it do with other text? Do you convert floats to IEEE binary?
    8 replies | 207 view(s)
  • AlexBa's Avatar
    Yesterday, 20:44
    Thank you so much for your help. This all makes sense. I have done a little testing, and you were right: creating individual binary files from each column results in better compression. I will continue working on code and a header file to run this compression easily. As I'm getting started, I have a few more questions:
    Do you have any advice on resources for learning C++ as a language? I took a course with C++ in high school, but that was a few years ago.
    Do you think TurboPFor is the best method for this task? I have researched other compressors (bzip2, zfp, SZ..) and it seems to have the best performance, but I am by no means an expert. I am still planning on implementing other methods for comparison.
    Once I have progressed further, can I break apart the TurboPFor code? Could I take a small subset of the files to run the method that seems to work best from benchmarking, or would this result in errors with all of the interdependencies within TurboPFor?
    Thank you for any more help and advice. You have done so much for me! Alex
    8 replies | 207 view(s)
  • Gotty's Avatar
    Yesterday, 14:46
    Gotty replied to a thread Paq8sk in Data Compression
    133 replies | 10586 view(s)
  • Shelwien's Avatar
    Yesterday, 14:26
    > If you would be able to, could you help to provide some guidance on how to
    > adapt files like fp.h into a header file.
    It is already a header file with declarations of FP-related functions.
    > My specific task requires me to compress massive files of sensor data,
    > with a short snippet of sample data attached below.
    Well, that's text. Just converting it to binary would give you a compression ratio of 3.2: 1536/480 = 3.2. I made a simple utility for this (see attachment), but in this case it would most likely be better to write each column to a different file (a minimal sketch of that idea follows this post). Also, you have to understand that float compression libraries are usually intended for _binary_ floats. It's also easier to run icapp - just "icapp -Ff out3.bin".
    8 replies | 207 view(s)
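    Not the attached utility - just a minimal sketch of the "write each column to a different file" idea, assuming whitespace-separated numeric columns (the extra header lines and inconsistent separators mentioned in the thread would need extra handling) and hypothetical output names col0.bin, col1.bin, ...:

        #include <cstdio>
        #include <fstream>
        #include <sstream>
        #include <string>
        #include <vector>

        // Read a whitespace-separated table of numbers and write each column as raw
        // IEEE-754 doubles into its own file. Grouping a column's values together
        // usually exposes more redundancy for the following compression stage.
        int main(int argc, char** argv) {
            if (argc < 2) { std::fprintf(stderr, "usage: %s input.txt\n", argv[0]); return 1; }
            std::ifstream in(argv[1]);
            std::vector<std::ofstream> cols;
            std::string line;
            while (std::getline(in, line)) {
                std::istringstream row(line);
                double v;
                size_t c = 0;
                while (row >> v) {
                    if (c >= cols.size())
                        cols.emplace_back("col" + std::to_string(c) + ".bin", std::ios::binary);
                    cols[c].write(reinterpret_cast<const char*>(&v), sizeof v);
                    ++c;
                }
            }
            return 0;
        }

    The 3.2x in the post is simply text size divided by binary size (1536/480) before any real compression; each col*.bin can then go through a delta + entropy-coding stage.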
  • Jyrki Alakuijala's Avatar
    Yesterday, 13:16
    Brotli uses my own prefix coding that is different from (length-limited) Huffman coding. In my coding I optimize not only for the code length (like Huffman), but also for the complexity of the resulting entropy code representation (a toy cost function illustrating the idea follows this post). That gives roughly a 0.15% improvement over Huffman coding. In early 2014 Zoltan Szabadka compared our coding against ANS for our reference test corpora (web and fonts), and a simple ANS implementation was slightly less dense due to more overhead in the entropy code description. In typical use the prefix code that we use is within 1% of pure arithmetic coding, and is simpler and faster (less shadowing of data because no reversals are needed). (Arithmetic coding gives about a 10% improvement over Huffman in practical implementations. 1% of that is because arithmetic coding is more efficient; the other 9% is because of the context modeling that arithmetic coding allows. In Brotli we take much of that 9% by using context modeling that is compatible with prefix coding, but none of the 1% improvement.)
    15 replies | 756 view(s)
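    This is not brotli's actual cost model - just a toy sketch, with an entirely made-up description-cost estimate, of scoring a candidate prefix code by payload bits plus the bits needed to describe the code itself, which is the trade-off described above:

        #include <cstdint>
        #include <cstdio>
        #include <vector>

        // Score a candidate prefix code: bits spent on the payload itself plus an
        // (entirely made-up) estimate of the bits needed to transmit the code lengths.
        static uint64_t total_cost(const std::vector<uint64_t>& freq,
                                   const std::vector<int>& len) {
            uint64_t payload = 0;
            for (size_t i = 0; i < freq.size(); ++i) payload += freq[i] * len[i];

            // Toy description cost: a repeated length is cheap, a change costs more.
            uint64_t header = 0;
            for (size_t i = 0; i < len.size(); ++i)
                header += (i > 0 && len[i] == len[i - 1]) ? 1 : 6;

            return payload + header;
        }

        int main() {
            // Symbol frequencies of a small block.
            std::vector<uint64_t> freq = {12, 11, 6, 3, 2, 1, 1, 1};
            // Huffman-optimal lengths vs. a flatter code that is cheaper to describe.
            std::vector<int> huffman = {2, 2, 2, 3, 4, 5, 6, 6};
            std::vector<int> flatter = {2, 2, 3, 3, 4, 4, 4, 4};
            std::printf("huffman-optimal lengths: %llu bits total\n",
                        (unsigned long long)total_cost(freq, huffman));  // 92 + 33 = 125
            std::printf("flatter lengths:         %llu bits total\n",
                        (unsigned long long)total_cost(freq, flatter));  // 93 + 23 = 116
            return 0;
        }

    With so few symbols the cheaper-to-describe flatter code wins overall here despite a slightly longer payload; on large inputs the payload term dominates and the Huffman-optimal lengths win.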
  • Jyrki Alakuijala's Avatar
    Yesterday, 12:48
    Zopflification (an attempt at optimal parsing) was added somewhere around early 2015, so a version from 2014 would likely work. IIRC, we open-sourced the first version in October 2013. The first version is not file-format compatible with the standardized brotli, but can give ideas on how well it can work for very short files. The first version only includes quality 11. Brotli's optimal parsing could be greatly improved -- it still ignores the context modeling, thus overestimating the final literal cost. I suspect another 2% in density could be possible by doing this more thoroughly. Before zopflification brotli would often outperform gzip on the smallest files (in the 500-byte category) by 40-50% in compression. After zopflification it got 10-20% worse in the smallest category. This is likely because zopfli led to more entropy codes being used, and that is not accounted for in optimal parsing.
    15 replies | 756 view(s)
  • Fallon's Avatar
    Yesterday, 10:21
    Fallon replied to a thread WinRAR in Data Compression
    WinRAR - What's new in the latest version https://www.rarlab.com/download.htm
    186 replies | 130287 view(s)
  • AlexBa's Avatar
    Yesterday, 05:36
    Thank you for your help. After posting, I did realize that only icapp was built, and that makes sense as a reason why. I'm rereading the readme, but I still feel like a lot of implementation details are lacking (it seems to mainly focus on results). I'm guessing it just assumes more knowledge than I have, so I will have to keep working on that. If you would be able to, could you help to provide some guidance on how to adapt files like fp.h into a header file? I want to learn the process, but I feel like the question is still too open-ended for me to tackle blindly. My specific task requires me to compress massive files of sensor data, with a short snippet of sample data attached below. The main complications I foresee are the few lines of extra header filler and an inconsistent separator between individual columns. I would want to compress all data to within some small factor like 1e-9. Thanks again for everything; I am just a lowly engineer trying to learn the world of CS. People like you make it a lot easier. Best, Alex
    8 replies | 207 view(s)
  • suryakandau@yahoo.co.id's Avatar
    Yesterday, 04:36
    Paq8sk31 - New experimental hash function to improve compression ratio
    133 replies | 10586 view(s)
  • SolidComp's Avatar
    Yesterday, 02:09
    Make isn't installed by default on most Linux distros. You have to install make first before trying to build – that's what your error message sounds like to me. Did you install it?
    8 replies | 207 view(s)
  • Shelwien's Avatar
    Yesterday, 01:45
    > 1. What is the difference between ./icapp and ./icbench commands for compression?
    icapp is supposedly the new benchmark framework, while icbench is the old one. The current makefile only builds icapp.
    > 2. After downloading the git and making, running $./icbench yields the error 'No such file or directory'.
    > Also, trying to specifically just $make icbench results in errors on a
    > completely new linux system. What could I be doing wrong?
    I checked it, and git clone + make seems to successfully build icapp.
    > 3. Other than the readme, what are good resources for learning how to use
    > the software? Specifically, I want to compress a huge file of floating
    > point numbers, do you have any guidance for how to do this (with TurboPFor
    > or any other software that seems better)?
    Read the whole readme page at https://github.com/powturbo/TurboPFor-Integer-Compression
    Note that icapp is a benchmark - it's not intended for actual file-to-file compression. For actual compression you're supposed to make your own frontend using some of the provided header files (like fp.h). You can also look at other libraries referenced here: https://github.com/powturbo/TurboPFor-Integer-Compression/tree/master/ext
    8 replies | 207 view(s)
  • Sportman's Avatar
    Yesterday, 00:58
    I thought the same, only the Sloot specs were more the opposite: 4x faster, 400x smaller.
    16 replies | 770 view(s)
  • madserb's Avatar
    Yesterday, 00:40
    Thanks. I wish you well with the trials. I tested Shelwien's version, but it's not easy to use and doesn't support all the functionality.
    160 replies | 85253 view(s)
  • AlexBa's Avatar
    30th June 2020, 22:31
    I am trying to use TurboPFor for compressing huge data files for a school project. However, I have run into some very basic issues that I can't seem to resolve with just the readme. Any help is appreciated:
    1. What is the difference between ./icapp and ./icbench commands for compression?
    2. After downloading the git and making, running $./icbench yields the error 'No such file or directory'. Also, trying to specifically just $make icbench results in errors on a completely new Linux system. What could I be doing wrong?
    3. Other than the readme, what are good resources for learning how to use the software? Specifically, I want to compress a huge file of floating point numbers; do you have any guidance for how to do this (with TurboPFor or any other software that seems better)?
    Thank you for any help
    8 replies | 207 view(s)
  • Lucas's Avatar
    30th June 2020, 20:29
    Interesting how they say their solution is 400x faster than compression - it's almost like their solution isn't compression at all; I'm getting a whiff of Sloot from reading this. Their 4-byte compression claims are incredibly dubious; it's almost like they don't care that UDP and TCP header sizes would become the bottleneck in such networks. E.g. 100 UDP packets containing <=4 bytes of data would send 2.94x more data over the wire than a single UDP packet with 400 bytes of payload, and TCP (which they propose using in the patent for a distributed compression network) would be 5.71x larger than a single TCP packet with 400 bytes of payload (the arithmetic is worked out after this post). And not once do they mention anything about "buffering" in their system, which would be needed to make this claim of being able to actually compress 4 bytes hold up. To me this just appears to be a pump-and-dump company to rip off investors.
    16 replies | 770 view(s)
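    A quick check of those ratios, counting only the 8-byte UDP and 20-byte TCP headers (ignoring IP and link-layer framing, which would make the overhead even worse):

        #include <cstdio>

        int main() {
            const double udp_hdr = 8, tcp_hdr = 20, payload = 400, chunk = 4;
            const double n = payload / chunk;  // 100 packets of <=4 bytes each

            // Bytes on the wire for 100 tiny packets vs. one 400-byte packet.
            double udp_ratio = n * (udp_hdr + chunk) / (udp_hdr + payload);
            double tcp_ratio = n * (tcp_hdr + chunk) / (tcp_hdr + payload);

            std::printf("UDP: %.2fx more bytes on the wire\n", udp_ratio);  // ~2.94
            std::printf("TCP: %.2fx more bytes on the wire\n", tcp_ratio);  // ~5.71
            return 0;
        }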
  • oloke5's Avatar
    30th June 2020, 20:25
    I think that it can be done, but afaik pcompress wasn't designed to work on Windows from its beginning (source1, source2). I tried to build it with msys2 under Windows 10 x64, but it failed on ./config and I gave up - although I think it isn't impossible. The main problem is to downgrade OpenSSL to version 1.0 and pretend to be a Linux-based OS. Also, I saw that Shelwien made something like that, but unfortunately it isn't working right now. So I think there's a really big chance to get it compiled on Windows too (only x64, because afaik 32-bit is not supported by pcompress), but it's just not that simple. Alternatively, you can install WSL2 on Windows and then run pcompress compiled for Linux; as I just checked, that works too. :D I will try to compile pcompress under Windows too and I will let you know if it worked ;)
    160 replies | 85253 view(s)
  • schnaader's Avatar
    30th June 2020, 16:27
    After a first quick glance, testset 3 looks fine. Looking for compressed leftovers, I only found this one so far (a very small ZIP part, 210 bytes decompressed). This is the most interesting testset for me, because this kind of data dominates things like Android APK files and is a mix of structured pointer lists, program code and string sections (e.g. method names), so some preprocessing (parsing and reordering stuff, detecting structured data) should be the way to go and would help compress data like this. The left side is from the original file, the right side is the output of "Precomp -cn".
    37 replies | 2119 view(s)
  • madserb's Avatar
    30th June 2020, 15:12
    Any chance of compiling it for windows x64?
    160 replies | 85253 view(s)
  • lz77's Avatar
    30th June 2020, 15:11
    Reverse engineering is enough. :) Therefore, I don't like sharing compressors between competitors.
    37 replies | 2119 view(s)
  • Ms1's Avatar
    30th June 2020, 14:46
    Test 3 and Test 4 sets are now available. Small wording changes in set descriptions. http://globalcompetition.compression.ru/rules/#test-set-description
    3 replies | 583 view(s)
  • compgt's Avatar
    30th June 2020, 11:54
    I might as well join this competition. We have up to Nov. 20, 2020, hmm? First, since my dual-core computer crashed, I have to buy a new computer. And I have to learn how to install g++ again, oh my! (but I still have bcc32). After almost a decade, I might be coding again. Brave. :) I tried entering lzuf2 as a test submission, but gmail failed to send; now the email is "queued". Will sponsor Huawei own the submitted compressors? If not, will it buy the winning compressor?
    37 replies | 2119 view(s)
  • Scope's Avatar
    30th June 2020, 10:35
    I selected about ~900 MB of PNG images (because testing the whole set would take a lot of time, and not all PNGs are processed correctly by all PNG optimizers - they skip them, improving their results) and measured the processing time of each optimizer several times on a CPU not loaded with other tasks. This is an average of all tests with parallel processing of 8 images on an AVX2 CPU and 8 threads; 1x is the fastest optimization (ECT -1 compiled with GCC 10.1 with PGO, LTO, AVX2). Although it is not ready yet, there are also some updates in the comparison:
    - added an ex-JPEG rating, for more convenience in identifying lossy images, but it doesn't work very well on non-photo images (and is completely useless for pixel art)
    - AVIF lossless was added and compression was performed with the latest version of libavif 0.7.3 (aom:2.0.0, dav1d:0.7.0); as for YUV444, the speed was significantly increased, but the efficiency for static images has not changed much
    - updated WebP result (with recent changes in the main branch)
    - added a comparison of the older JPEG XL build (Speed 9) with the newer one (but a comparison with the current build is not ready yet, because compression at Speed 8-9 takes a long time)
    I have tested the near-lossless modes for myself, but they are very difficult to add to a comparison without a visual representation of the result (or any metrics). I also have a set with game screenshots (but that comparison is not ready yet). The set with non-RGB images hasn't been done yet, because I need enough free time to find, collect and convert such images; it also doesn't fit the current comparison, because I tried to use only non-converted public images, with a link to view each image (although it's possible to make a separate tab for such images).
    P.S. About the need for lossless images on the Web: I do not think they are completely useless for everything except UI-like images; perhaps the need is much smaller for photographic images, but I see it more and more in art images, comics/manga and game screenshots, especially given the ever-increasing speed of the Internet in the world. Also, all images in the comparison are taken from very popular, highly viewed subreddits; I deliberately did not want to take my own images (because my needs may not reflect the needs of most other people). And considering the ineffectiveness of the lossless mode in AVIF for RGB images, I hope that WebP v2 will have a more effective or separate lossless mode (as it was in WebP v1).
    Lossless Image Formats Comparison (JPEG XL, AVIF, WebP, FLIF, PNG, ...) v0.5
    https://docs.google.com/spreadsheets/d/1ju4q1WkaXT7WoxZINmQpf4ElgMD2VMlqeDN2DuZ6yJ8/
    The total size bars on the chart do not need to be taken into account; they are not quite real and are only a very rough representation.
    159 replies | 40557 view(s)
  • Darek's Avatar
    30th June 2020, 10:14
    paq8sk30 scores:
    70'587'620 - TS40.txt -x15 by Paq8sk30, time - 38'154,53s
    70'315'001 - TS40.txt -x15 -e1,english.dic by Paq8sk30, time - 45'253,18s
    72'656'037 - TS40.txt -x15 -w -e1,english.dic by Paq8sk30, time - 89'471,51s - worse score, bad time; I didn't run the decompression test...
    33 replies | 1472 view(s)
  • Kaw's Avatar
    30th June 2020, 09:27
    It really sounds like LittleBit with a fancy description: a static Huffman tree with variable word size. Although I bet that LittleBit outperforms them on size.
    16 replies | 770 view(s)
  • oloke5's Avatar
    30th June 2020, 06:43
    Hi there! Yeah, I know that I'm kinda late, but maybe it will be useful for someone in the future. I totally agree that pcompress is a very powerful compression utility; I think it's one of the most efficient ones (along with nanozip and freearc). I've decided to compile it on my own and... being honest, that was a lot of work in 2020 (a lot more than I previously expected ;)), but at the same time I think it was worth it. For any future reader of this thread, here is a static binary of its latest repo clone. I've compiled it under an x64 Ubuntu 14 VM (mainly because of legacy OpenSSL compatibility). I also tested it on Linux Mint 20, Ubuntu 14 and a recent openSUSE snapshot, and everything was working fine. :cool: I hope it will work on every amd64 Linux distro. Also, it's my first post on this forum; I really appreciate the idea and it has helped me a lot many times. :_happy2: Sorry for my poor English and have a nice day ;)
    160 replies | 85253 view(s)
  • suryakandau@yahoo.co.id's Avatar
    30th June 2020, 04:40
    bbb v1.10: 400000000 -> 81046263 in 1071.57 sec - faster than v1.9
    33 replies | 1472 view(s)
  • Sportman's Avatar
    30th June 2020, 04:07
    "about 100 bits at a time" "400 times faster than standard compression resulting in up to 75% less bandwidth and storage utilization" https://www.youtube.com/watch?v=m-3BNenuX_Q So "AI" create from every about 100 bits 40-25 bits codewords and send that over the network + one time codebook with sourceblocks (size unknown). Sounds like an AI trained custom dictionary for every data type.
    16 replies | 770 view(s)
  • Gotty's Avatar
    30th June 2020, 00:58
    On their site: "... AtomBeam Technologies will be unveiling its patented technology at the upcoming Oct. 22nd – 24th Mobile World Congress (MWC) event in Los Angeles"
    Source: https://atombeamtech.com/2019/10/18/atombeam-unveils-patented-data-compaction-software-at-mobile-world-congress/
    Searched for it. And they really were there: https://www.youtube.com/watch?v=Jy4w-Sn-hEk
    The video was uploaded 4 months ago. Already 3 views. I'm the 4th one. Quotes from the video: "We are the only game in town. There is no other way to reduce that data." "There is no other way to reduce the size except for AtomBeam." OK. I have never ever disliked any video on youtube. This is my first. :_down2:
    16 replies | 770 view(s)
  • cssignet's Avatar
    30th June 2020, 00:58
    about the PNG tab, how did you measure speed?
    159 replies | 40557 view(s)
  • JamesWasil's Avatar
    30th June 2020, 00:06
    It's basically the longest sentence ever made to describe a Huffman table arranged by statistical frequencies. Rather than having trees and blocks, they have "chunklets", "key-value pairs", "warplets" and other fancy names that mean nothing. The data read from a file or stream isn't input anymore, it's now "source blocks". It might use another table to keep track of recents and rep matches and call that "AI training" (which nothing else does, lol). "reconstruction engine comprising a fourth plurality of programming instructions stored in the memory and operable on the processor of the computing device, wherein the programming instructions, when operating on the processor, cause the processor to: receive the plurality of warplets representing the data; retrieve the chunklet corresponding to the reference code in each warplet from the reference code library; and assemble the chunklets to reconstruct the data." ^ This means a compressor and decompressor on a computer reading from a file, lol (a toy version of that codebook idea is sketched after this post). The USPTO never turns money away, even when granting toilet paper like that.
    16 replies | 770 view(s)
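    A toy sketch of what the quoted claims boil down to - purely illustrative, not taken from the patent: a static codebook mapping fixed-size blocks ("chunklets") to reference codes, plus a lookup to reconstruct:

        #include <cstdint>
        #include <cstdio>
        #include <string>
        #include <unordered_map>
        #include <vector>

        int main() {
            const std::string data = "ABCDABCDABCDEEEE";
            const size_t kBlock = 4;  // "chunklet" size

            std::unordered_map<std::string, uint32_t> codebook;  // chunklet -> reference code
            std::vector<std::string> by_code;                    // reference code -> chunklet
            std::vector<uint32_t> codes;                         // the transmitted "warplets"

            // "Training": assign a reference code to every distinct block.
            for (size_t i = 0; i < data.size(); i += kBlock) {
                std::string block = data.substr(i, kBlock);
                auto it = codebook.find(block);
                if (it == codebook.end()) {
                    it = codebook.emplace(block, (uint32_t)by_code.size()).first;
                    by_code.push_back(block);
                }
                codes.push_back(it->second);
            }

            // "Reconstruction engine": look the codes back up.
            std::string rebuilt;
            for (uint32_t c : codes) rebuilt += by_code[c];

            std::printf("%zu blocks, %zu distinct, lossless: %d\n",
                        codes.size(), by_code.size(), rebuilt == data);
            return 0;
        }

    In other words, static dictionary substitution; the codebook still has to exist on both ends before anything can be "reconstructed".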
  • Gotty's Avatar
    29th June 2020, 19:57
    W3Schools teaches you the basics of different programming languages, the structure of websites and similar stuff. It does not teach you actual algorithms. I'm afraid there are no teaching materials that give you building blocks so that you can copy and paste and voila - you've created a compression software. Creating the structure of a website is different from creating the behavior of a web app. The structural building blocks are small and much easier to understand (and there are building blocks!). I can teach people to create a very basic HTML website in a couple of hours. For that you will need to learn simple HTML. Doable in a short amount of time. Not too difficult. But teaching you to code in C++ and implement a simple compression program - that requires many days. And you will need to actually learn C++. No other way. You will need to write that code. It's not only about data compression, it's about any algorithmic task. You need to create an algorithm. And that is ... programming. People here do work with people from other fields (other = no data compression). You want to master text compression? You will benefit from linguistics. Just look at the text model in paq8px. It's full of linguistic stuff. But it's far from complete - it can still be improved. You want to master audio compression? Image compression? Video compression? Same stuff. Applied mathematics, psychoacoustics, signal processing, machine learning, neural networks, just to name a few. Actually, an advanced data compression software has more stuff from other fields than the actual "compression algorithm". You'd better rethink the idea that we all know the same theories here. ;-) No, we don't. We actually never stop learning. Don't think that we have some common theory, and that's all ;-) We continuously invent new stuff. From the user's point of view it's not evident. A user just types a command or pushes a button, and the magic happens. And that magic might be real magic - you just don't see it. If you would like to explore it deeper and actually make a new compression software - go for it. There are lots of ways and lots of new possibilities. Truly.
    8 replies | 225 view(s)
  • Dresdenboy's Avatar
    29th June 2020, 19:57
    Here's my bunch of findings (as I've been researching compression of small binary files for a while now -> also check the tiny decompressors thread by introspec):
    Lossless Message Compression: https://www.diva-portal.org/smash/get/diva2:647904/FULLTEXT01.pdf
    The Hitchhiker's Guide to Choosing the Compression Algorithm for Your Smart Meter Data: https://ieeexplore.ieee.org/document/6348285 (PDF freely available), slides: https://www.ti5.tuhh.de/publications/2012/hitchhikersguide_energycon.pdf
    A Review of IP Packet Compression Techniques: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.111.6448
    Automatic message compression with overload protection: https://www.sciencedirect.com/science/article/pii/S0164121216300267
    Also feel free to send me compressor executables (Win32/64) and test sets, so I can include them in my test bench (see example plots here and here in the already mentioned thread).
    15 replies | 756 view(s)
  • Dresdenboy's Avatar
    29th June 2020, 18:04
    Sorry, this was just a comment about the code box formatting to Shelwien as a forum admin. ;)
    Sounds plausible! I think the LZW part is pretty standard; just the code-word/symbol encoding got more efficient. And since those symbols grow to 19 bits, the adjusted-binary savings go down to ~1/38 bits per encoded symbol, I think.
    10 replies | 572 view(s)
  • xezz's Avatar
    29th June 2020, 14:14
    shorty.js: https://github.com/enkimute/shorty.js
    P5, P6, P12a: http://mattmahoney.net/dc/p5.cpp http://mattmahoney.net/dc/p6.cpp http://mattmahoney.net/dc/p12a.cpp
    HA: https://www.sac.sk/download/pack/ha0999.zip
    MR-RePair: https://github.com/izflare/MR-RePair
    GCIS: https://github.com/danielsaad/GCIS
    15 replies | 756 view(s)
  • Krishty's Avatar
    29th June 2020, 12:13
    Is there any comment on the root cause? Live for Speed had similar issues, but they were programming errors, like inadvertently changing the random number generator for wind. But this would have happened with integer math or bignum or anything else just as well! Again, that post does not say anything about the root cause - in fact, floating-point numbers are not even mentioned in the thread?!
    I see - you're trying to say that one version of the code will later raise an additional floating-point exception (because it results in a signalling NaN) and the other won't (because it results in a quiet NaN)? That's a nasty situation indeed, but from my understanding, at least Visual C++ will mind that and won't re-order the operands or instructions under /fp:strict.
    You mean the issue described here and here? I didn't know that, thanks! Like above, it is controlled by compiler settings.
    Good paper, but from my understanding their main issue is "When does rounding take place?", and you generally won't have this issue when compiling for SSE/AVX (except when mixing different precisions, per above), because every operation rounds to register width?
    Like I said: with the FPU you're screwed anyway. SSE/AVX does not have transcendental instructions, so there is nothing to worry about as long as you stick to the proper library.
    I have little experience with non-x86 FP handling, so I assume you're right. But at least x86 with SSE/AVX and any modern compiler shouldn't need that disclaimer. (A tiny example of the reordering issue follows this post.)
    14 replies | 658 view(s)
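    A minimal illustration of why operand/instruction reordering matters at all for reproducibility - generic IEEE-754 behaviour, not tied to any particular game or compiler:

        #include <cstdio>

        int main() {
            double a = 1e16, b = -1e16, c = 1.0;

            double left  = (a + b) + c;  // a + b is exactly 0, so the result is 1
            double right = a + (b + c);  // b + c rounds back to -1e16, so the result is 0
            std::printf("(a+b)+c = %.17g\na+(b+c) = %.17g\n", left, right);

            // With /fp:strict (MSVC) or without -ffast-math (GCC/Clang), the compiler
            // must keep the order as written; relaxed modes are allowed to regroup.
            return 0;
        }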
  • compgt's Avatar
    29th June 2020, 11:45
    Right, even high-schoolers can start data compression with this essential knowledge. They do have a basic computer science course using Basic, Pascal, or C/C++. Talented high-schoolers can go straight to coding if they have compression ideas. When I remembered I co-developed image formats in the 1970s and 80s (i.e., bmp, gif, png, jpeg, jpeg2000), I wondered if I could do some coding again. Alas, not working as a professional programmer, it was hard to find compressors on the net without constant access to the internet and software companies' resources. The results are The Data Compression Guide's core compressors from 2004-2005, based on my rediscovered knowledge of C from "The C Programming Language" book by Kernighan & Ritchie and "Using C" authored by the Atkinson brothers. I recall Mark Nelson remarking that one programmer's compressor took 5 years to complete and become stable. Data compression is not for the faint of heart. I say some now-popular compressors were actually done in the 1970s and 80s and released only in the 2000s. That's how advanced a subject data compression is.
    8 replies | 225 view(s)
  • lz77's Avatar
    29th June 2020, 11:13
    I think lzturbo by DND, after tuning, has a good chance, but B. Ziganshin has claims about this program... Does lzturbo have the right to participate in the competition?
    33 replies | 1472 view(s)
  • Darek's Avatar
    29th June 2020, 09:34
    Darek replied to a thread Paq8sk in Data Compression
    70'587'620 - TS40.txt -x15 by Paq8sk30, time 38154,53s
    133 replies | 10586 view(s)
  • Trench's Avatar
    29th June 2020, 06:02
    True - well, not a grandmother who doesn't know how to turn on a PC :) Which is why I also said that even a programmer can't do offhand what most people here do, since they'd have to learn what you know. So what was said does apply to them as a good starting point, yes. Sure, the links shown have some good things, but they aren't step by step, and the programs can't be edited unless one knows how to decompile, edit, and recompile them. But something could be set up so that someone who doesn't know programming can try putting in their algorithm or some basic code from W3Schools. Or maybe you feel it's far too complicated - then you know best. I suggest the programmers here work with people from other fields to get a different perspective, since the best way to find something is to start from the beginning. And since everyone here is conditioned to think of the same theories, that is not starting from scratch to find what is missing on the road to looking for what they're missing. Don't you think so too? Start from the beginning and rediscover the theories to understand the progression and find new things. But if you feel I asked for too much, then sorry.
    8 replies | 225 view(s)
  • Gotty's Avatar
    29th June 2020, 00:53
    Gotty replied to a thread paq8px in Data Compression
    Yes, they are nothing special. I just needed to commit those so that the next "real" version does not carry too many changes. There are many small changes, so it is already not easy to diff-view anyway.
    1937 replies | 545664 view(s)
  • Gotty's Avatar
    29th June 2020, 00:46
    There may be some contradiction here, or I don't understand your question. Your question includes: "to make a simple compression program?" To make any program, you need to learn how to program. That would be the first step. Otherwise, how would you "make a program"? :_unknown2: If your question actually means "any tutorial about how data compression works", then that person still needs to understand what bits and bytes are, at least. To understand more sophisticated compression algorithms you actually need some background in information theory. Without all the above (i.e. for non-programmers without an information theory background), see these videos:
    https://www.youtube.com/watch?v=JsTptu56GM8 How Computers Compress Text: Huffman Coding and Huffman Trees
    https://www.youtube.com/watch?v=TxkA5UX4kis Compression codes | Journey into information theory | Computer Science | Khan Academy
    https://www.youtube.com/watch?v=Lto-ajuqW3w Compression - Computerphile
    These are simple enough, with visualizations. Hopefully non-programmers will understand them. But it is still beneficial if the person knows what a bit, byte, frequency and probability are. There is no escape.
    8 replies | 225 view(s)
  • Trench's Avatar
    28th June 2020, 22:46
    It makes it seem like even programmers won't be able to do it. LOL. I meant more of a step-by-step method for non-programmers to deal with it. File compression doesn't seem to have gained as much progress as HD sizes have increased over the years. And if more CPU and memory is what is needed, then it doesn't seem like a level playing field and will also have its limits. I was thinking of file modification - the file being open to be edited and then run to see the results. Plenty of people have edited files to get "desired" results without knowing how to code, or files simplified enough to be edited. Just like one doesn't need to know Japanese fully to go to Japan, only enough words to know what to do.
    Shelwien: Well, it's not a step-by-step method, but it's interesting. The charts aren't clear enough to show % indicators which would match them all up evenly for comparison. It's not even spaced out evenly in a table to be put into a spreadsheet to evaluate it better. The methods used are fine, but another approach has to be taken. Anything else just takes more power/memory and gets slower. I am confident that I don't see any "great" improvement happening with conventional approaches. You know what I haven't seen yet, that no one mentions? What is the theoretical HUMP it has to pass to be recompressed, since nothing is structured to allow it? Other fields have a theoretical hump to pass, just like overunity forums claim to pass the hump with a V-Gate.
    Compgt: Huffman is an obvious way to compress. How can people find something new when they are on the same stack as everyone else and can't take another road to find something new? The irony is that Huffman and Shannon were not even programmers and are the founders of compression. The only way to progress seems to be the same, which is why I say non-programmers should participate. The issue is that the programmer's mindset is hindering design, but without programmers there is no progress. It's like a finger trap: the harder one tries, the more it is used against them - feeling they are so close, but not really. If anyone understood this, they would take the next step.
    Gotty: Yes. Just like all the people that helped influence file compression did not get influenced by programming. I don't get why what worked stopped and went to only one skill type. Significant progression is impossible.
    8 replies | 225 view(s)
  • SolidComp's Avatar
    28th June 2020, 22:31
    Are there any innovative theories for small data compression? It seems like there should be theories and strategies tailored to it. I found the Shoco or Unishox compressor you posted a while back. I think they were hand-Huffman-coding letters or something. Small data might end up being best served by dictionaries, I guess. Maybe the technique they use in brotli, where you can permute dictionary entries.
    15 replies | 756 view(s)
  • SolidComp's Avatar
    28th June 2020, 22:13
    Jesus, that is one sentence. Why do people do this? Why would the patent office ever grant this crap?
    16 replies | 770 view(s)
  • moisesmcardona's Avatar
    28th June 2020, 22:05
    moisesmcardona replied to a thread paq8px in Data Compression
    Just some tweaks to the code.
    paq8px_v187fix4 2020.06.13
    - Cosmetic changes (formatting for better readability)
    - Added a couple of remarks
    - Restricted input buffer size to 1GB (on compression level 12 less memory is used: 28702 MB -> 27678 MB)
    - Restored INJECT #definitions for accessing Shared members (speed improvement)
    - Bucket find() function is hardwired to the non-SIMD version (speed improvement; the SIMD version introduced in v184 seems to be somewhat slower than the non-SIMD one)
    - Archives are still expected to be binary compatible with v183fix1
    paq8px_v187fix5 2020.06.14
    - More cosmetic changes (formatting for better readability), more remarks in code
    - "Shared" and "UpdateBroadcaster" instances are compositioned instead of using a singleton
    - Archives are still expected to be binary compatible with v183fix1
    1937 replies | 545664 view(s)
  • Darek's Avatar
    28th June 2020, 21:57
    Darek replied to a thread paq8px in Data Compression
    Does paq8px v187fix5 contain any compression algorithm changes, or is it only time optimization?
    1937 replies | 545664 view(s)
  • suryakandau@yahoo.co.id's Avatar
    28th June 2020, 17:23
    Here is the source code of paq8sk30
    133 replies | 10586 view(s)
  • suryakandau@yahoo.co.id's Avatar
    28th June 2020, 17:14
    Wow, it is 2x faster than paq8sk28. Thank you, Sportman.
    133 replies | 10586 view(s)
  • Gotty's Avatar
    28th June 2020, 17:06
    AtomBeam Technologies Assigned Patent
    Patent number: 10680645
    Source: https://www.storagenewsletter.com/2020/06/24/atombeam-technologies-assigned-patent-2/
    The first paragraph from the patent: if you have difficulties reading it, you are not alone. It is a single sentence. :_eek2:
    16 replies | 770 view(s)
  • Sportman's Avatar
    28th June 2020, 16:50
    Sportman replied to a thread Paq8sk in Data Compression
    enwik8: 15,643,209 bytes, 8,628.898 sec., paq8sk30 -x15 -w -e1,english.dic
    133 replies | 10586 view(s)
  • Gotty's Avatar
    28th June 2020, 16:46
    Gotty replied to a thread Paq8sk in Data Compression
    Please don't forget to always include the source code. The licensing requires that.
    133 replies | 10586 view(s)
  • Gotty's Avatar
    28th June 2020, 16:41
    Gotty replied to a thread paq8px in Data Compression
    That's good news then.
    1937 replies | 545664 view(s)
  • Gotty's Avatar
    28th June 2020, 16:37
    What's a "non programmer"? Someone who does not know any programming languages?
    8 replies | 225 view(s)
  • compgt's Avatar
    28th June 2020, 16:32
    "The Data Compression Guide" is for introductory purposes, e.g. on Huffman, LZ etc. I could have modified existing freeware or GPL compression programs that time but opted my programs to be distinct but easily understandable. https://sites.google.com/site/datacompressionguide/
    8 replies | 225 view(s)
  • Jyrki Alakuijala's Avatar
    28th June 2020, 15:56
    We will first freeze the format and have a good API available. After that we can start with integration work.
    35 replies | 2843 view(s)
  • Darek's Avatar
    28th June 2020, 15:38
    Darek replied to a thread Paq8sk in Data Compression
    I'm running ts40.txt tests - I've tried: 1) just the -x15 option, 2) -x15 plus Byron's dictionary, 3) the best of the above plus the -w option (it sometimes gets better scores and decompression is OK).
    133 replies | 10586 view(s)
  • suryakandau@yahoo.co.id's Avatar
    28th June 2020, 12:24
    @sportman, could you add this result to your GDCC public test set, please? Now I'm testing paq8pxd_v89_noAVX2 using -x10 - running.
    133 replies | 10586 view(s)
  • Shelwien's Avatar
    28th June 2020, 11:40
    > paq8pxd_v89_no_avx2 -s0 works 72 sec. Too slow preprocessing...
    Decoding is 24s though. Also there are others: xwrt, DRT, Shkarin's "liptify", Bulat's dict, cmix -s, mcm. Even WRT/LIPT is a simple algorithm which has no reason to be slow (aside from an inefficient implementation).
    > I'm interested in the file size after maximum Rapid compression (with a condition of 40/2.5 sec. for compression + decompression).
    Maybe try tweaking zstd? Like https://github.com/facebook/zstd/blob/dev/tests/paramgrill.c ? (A small example of setting zstd's advanced parameters follows this post.)
    33 replies | 1472 view(s)
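    Not paramgrill itself - just a minimal sketch of setting zstd's advanced compression parameters through its public API (zstd >= 1.4); the specific values here are placeholders that a paramgrill-style search would tune:

        #include <cstdio>
        #include <vector>
        #include <zstd.h>

        int main() {
            ZSTD_CCtx* cctx = ZSTD_createCCtx();

            // Advanced parameters: these are the knobs paramgrill searches over.
            ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 19);
            ZSTD_CCtx_setParameter(cctx, ZSTD_c_windowLog, 24);
            ZSTD_CCtx_setParameter(cctx, ZSTD_c_chainLog, 26);
            ZSTD_CCtx_setParameter(cctx, ZSTD_c_searchLog, 8);
            ZSTD_CCtx_setParameter(cctx, ZSTD_c_targetLength, 256);

            const char src[] = "example payload; in practice this would be the test file";
            std::vector<char> dst(ZSTD_compressBound(sizeof src));
            size_t n = ZSTD_compress2(cctx, dst.data(), dst.size(), src, sizeof src);
            if (!ZSTD_isError(n)) std::printf("%zu -> %zu bytes\n", sizeof src, n);

            ZSTD_freeCCtx(cctx);
            return 0;
        }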
  • Shelwien's Avatar
    28th June 2020, 11:32
    1) Find a basic compressor at http://mattmahoney.net/dc/text.html
    2) Try it ...
    3) Profit
    Or learn C or C++ programming, download some open-source compressor and read the source. Unfortunately, C/C++ have the best compilers; a compressor would automatically be 3-4x slower if another programming language were chosen. Or maybe read some books first:
    https://www.amazon.com/Data-Compression-Complete-Salomon-David/dp/8184898002
    https://www.amazon.com/Understanding-Compression-Data-Modern-Developers/dp/1491961538
    8 replies | 225 view(s)
  • lz77's Avatar
    28th June 2020, 10:12
    paq8pxd_v89_no_avx2 -s0 takes 72 sec. Too slow for preprocessing... I'm interested in the file size after maximum Rapid compression (with the condition of 40/2.5 sec. for compression + decompression).
    33 replies | 1472 view(s)