Activity Stream

  • Shelwien's Avatar
    Today, 20:54
I don't see a point in being too strict about definitions here. For example, the zlib implementation limits code lengths to 15 bits or less, so strictly speaking that's not really a Huffman code either. So:
1) deflate only uses static Huffman coding, either with a predefined table or with one described in the block header. The predefined code is still a valid Huffman code for some other data.
2) Adaptive coding is based on the idea of data stationarity - that statistics gathered for the known symbols s_0..s_i would also apply to s_{i+1}. In most cases adaptive coders adjust their codes after processing each symbol, but it's also possible to do batch updates (a small sketch of that idea follows below this post).
> do you think it would be feasible to implement adaptive arithmetic coding
> with less CPU and memory overhead than deflate?
This depends on the implementation. Compared to zlib - most likely yes, since it's not very speed-optimized. Also there's nothing good about zlib's memory usage, so that's not a problem either. But a speed-optimized deflate implementation has a decoding speed of >1GB/s, which is not feasible for truly adaptive AC/ANS. It might be technically possible with batch updates using _large_ batches (better to just encode the stats), or by only applying AC to rarely used special symbols (so most of the code would remain Huffman). However, LZ is not always applicable (it might not find matches in some data), so always beating deflate would also mean adaptive AC at >1GB/s, which is not realistic on current hardware. And encoding speed is hard to compare, since there are many different algorithms used to encode to the same deflate format. I'd say that it should be possible to make an LZ+AC/ANS coder which would be both faster and stronger than all zlib levels (zstd likely already qualifies).
    5 replies | 86 view(s)
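A minimal sketch of the batch-update idea from the post above: keep cheap per-symbol counters and only fold them into the coding table once per block, so an AC/ANS backend can use a fixed table inside each block. This is an illustration, not code from any existing coder; the class name, block size and halving rule are my own choices.

#include <cstdint>
#include <vector>

// Order-0 model whose coding frequencies are only rebuilt once per batch,
// so the entropy-coder backend can keep using a fixed table inside a block.
struct BatchAdaptiveModel {
    static const int kBatch = 4096;      // symbols between table rebuilds (illustrative)
    std::vector<uint32_t> freq;          // frequencies the coder actually uses
    std::vector<uint32_t> pending;       // counts gathered since the last rebuild
    int seen = 0;

    explicit BatchAdaptiveModel(int alphabet)
        : freq(alphabet, 1), pending(alphabet, 0) {}

    // Called once per coded symbol; cheap, no table rebuild here.
    void update(int sym) {
        ++pending[sym];
        if (++seen == kBatch) rebuild();
    }

    // Fold pending counts into the coding table and halve the old stats
    // so the model still tracks non-stationary data.
    void rebuild() {
        for (size_t i = 0; i < freq.size(); ++i) {
            freq[i] = (freq[i] >> 1) + pending[i] + 1;
            pending[i] = 0;
        }
        seen = 0;
    }
};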
  • cssignet's Avatar
    Today, 20:34
It could be possible somehow. Now, about how you compared stuff: what/how did you compare? Did you ask an external program to run 8 threads of pingo/ECT?
    164 replies | 40756 view(s)
  • Jarek's Avatar
    Today, 20:20
I don't know when the first prefix codes were used - definitely before Huffman, but Morse code is not prefix-free; instead it uses three types of gaps for separation (this redundancy is very helpful for synchronization and error correction).
Regarding context-dependence and adaptivity, they are separate concepts (a tiny sketch combining the two follows below this post):
- a fixed Markov process is an example of the former; an ARMA model uses a few previous values as context, lossless image compression uses already decoded neighboring pixels as context, etc.
- adaptation usually refers to evolving models/distributions for non-stationarity, e.g. independent variables of an evolving probability distribution in adaptive prefix/AC/ANS; this can be e.g. a Gaussian/Laplace distribution with evolving parameters (e.g. https://arxiv.org/pdf/2003.02149 ).
We can combine both, e.g. in a Markov process with online-optimized parameters, adaptive ARMA, LSTM etc.
    5 replies | 86 view(s)
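A tiny sketch illustrating the point above that context-dependence and adaptivity are orthogonal: one probability per context (context-dependent), with each probability drifting toward recently seen bits (adaptive). The shift-based update is the usual CM-style rule; the class and parameter names are illustrative, not taken from any particular codec.

#include <cstdint>
#include <vector>

// One 12-bit probability per context; contexts could be previous bits,
// neighboring pixels, etc. The rate controls how fast the model adapts.
struct ContextualAdaptiveBitModel {
    std::vector<uint16_t> p;             // P(bit==1) scaled to [0,4096)
    int rate;                            // larger = slower adaptation

    ContextualAdaptiveBitModel(size_t contexts, int rate_ = 5)
        : p(contexts, 2048), rate(rate_) {}

    uint16_t prob(uint32_t ctx) const { return p[ctx]; }

    // Adaptive step: move this context's probability toward the observed bit.
    void update(uint32_t ctx, int bit) {
        if (bit) p[ctx] += (4096 - p[ctx]) >> rate;
        else     p[ctx] -= p[ctx] >> rate;
    }
};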
  • Shelwien's Avatar
    Today, 20:19
    Shelwien replied to a thread 7-zip plugins in Data Compression
> Iso7z support for bin files are limited to 4gb
That could be because 7z's default solid mode has 4g blocks. Try adding -ms=1t or something.
    9 replies | 3093 view(s)
  • Shelwien's Avatar
    Today, 20:18
    Maybe like this?
    68 replies | 80283 view(s)
  • Shelwien's Avatar
    Today, 20:13
The description says "texts from Project Gutenberg in UTF-8 characters, so it's essentially ASCII", not that it is ASCII.
> It contains characters left over from UTF-8
That's intentional in this case. No need to make it too simple.
    39 replies | 2180 view(s)
  • Darek's Avatar
    Today, 19:50
    Darek replied to a thread Paq8pxd dict in Data Compression
    Scores of paq8pxd_v89_40_3360 on my testset. In general 640 bytes of improvement. Always something!
    941 replies | 318925 view(s)
  • Jyrki Alakuijala's Avatar
    Today, 19:26
Even when they are called Huffman codes, these are not Huffman codes. They are static prefix codes. Static prefix coding that is not adapted to a particular use was already in use in the 1840s, when the Morse, Gerke and ITU codes were introduced. Huffman coding is an improvement over static prefix coding. Huffman coding follows a process introduced in 1952: http://compression.ru/download/articles/huff/huffman_1952_minimum-redundancy-codes.pdf A Huffman code is an optimal prefix code -- when not including the coding length of the code itself.
"Dynamic Huffman" in deflate vocabulary means a normal Huffman code. Deflate does not have actual dynamic (adaptive) Huffman codes, despite the normal Huffman codes being called dynamic. A real adaptive (or dynamic) Huffman code builds a more accurate version as data is being transmitted. Overall this gives (roughly) equal density to sending the codes separately, but the decoder can receive the first symbols of the stream with less latency. The cost for decompression is quite big, so this method is usually avoided. There are some saner variations of adaptive prefix coding that are slightly cheaper, but they don't follow the Huffman process for rebuilding the code. These, too, tend to be too complex and slow for practical use.
In a canonical prefix code -- including the canonical Huffman code -- the codes are lexically ordered by length. You only communicate the lengths and the codes are implicit (a sketch of that assignment follows below this post).
Going from prefix coding to arithmetic coding is not a big improvement. One needs to have context, too, usually for adaptive coding. Adaptation is based on context; often the probabilities of a few symbols are changed based on context. Traditionally prefix coding is not compatible with the concept of context. I believe that the first practical implementations of contextful prefix coding were in WebP lossless and Brotli. In JPEG XL we use that context-based method for ANS, too.
    5 replies | 86 view(s)
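To make the canonical-code remark above concrete, here is a sketch of the standard length-to-code assignment, essentially the procedure given in RFC 1951 section 3.2.2: only the code lengths need to be transmitted, and the codes themselves are implied. Variable names are mine. Deflate's "dynamic Huffman" blocks transmit exactly such length lists.

#include <cstdint>
#include <vector>

// Given one code length per symbol (0 = unused), assign canonical prefix codes:
// shorter codes come first, and codes of equal length are consecutive integers
// in symbol order, so the lengths alone fully describe the code.
std::vector<uint32_t> canonicalCodes(const std::vector<int>& lengths) {
    int maxLen = 0;
    for (int l : lengths) if (l > maxLen) maxLen = l;

    std::vector<int> lenCount(maxLen + 1, 0);
    for (int l : lengths) if (l > 0) ++lenCount[l];

    // First code value for each length (RFC 1951, section 3.2.2).
    std::vector<uint32_t> nextCode(maxLen + 1, 0);
    uint32_t code = 0;
    for (int len = 1; len <= maxLen; ++len) {
        code = (code + lenCount[len - 1]) << 1;
        nextCode[len] = code;
    }

    // Hand out consecutive codes to symbols of each length, in symbol order.
    std::vector<uint32_t> codes(lengths.size(), 0);
    for (size_t s = 0; s < lengths.size(); ++s)
        if (lengths[s] > 0) codes[s] = nextCode[lengths[s]]++;
    return codes;
}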
  • Scope's Avatar
    Today, 19:11
I don't think so, just different configurations and different conditions; when there is time (or rather a free CPU), maybe I will run more thorough tests. On a single file Pingo is faster, but when processing multiple files in parallel with all threads, I noticed that on my configuration ECT is often faster.
05vnjqzhrou31.png
powershell Measure-Command:
ECT -1 -s: TotalMilliseconds : 3573,3911
ECT -5 -s: TotalMilliseconds : 9823,7649
ECT -5 --mt-deflate -s: TotalMilliseconds : 7871,9845
ECT_PGO -1 -s: TotalMilliseconds : 3383,1625
ECT_PGO -5 -s: TotalMilliseconds : 9509,1655
ECT_PGO -5 --mt-deflate -s: TotalMilliseconds : 7393,5582
pingo -s0 -strip: TotalMilliseconds : 2681,7484
pingo -s5 -strip: TotalMilliseconds : 8587,4876
ProcProfile64:
ECT -1 -s: User Time : 3.593s | Kernel Time : 0.046s | Process Time : 3.639s | Clock Time : 3.646s | Working Set : 42440 KB | Paged Pool : 125 KB | Nonpaged Pool : 9 KB | Pagefile : 41500 KB | Page Fault Count : 12840
ECT -5 -s: User Time : 9.437s | Kernel Time : 0.421s | Process Time : 9.858s | Clock Time : 9.879s | Working Set : 100856 KB | Paged Pool : 125 KB | Nonpaged Pool : 10 KB | Pagefile : 102556 KB | Page Fault Count : 510657
ECT -5 --mt-deflate -s: User Time : 10.562s | Kernel Time : 0.546s | Process Time : 11.108s | Clock Time : 7.848s | Working Set : 167096 KB | Paged Pool : 125 KB | Nonpaged Pool : 14 KB | Pagefile : 180024 KB | Page Fault Count : 505302
ECT_PGO -1 -s: User Time : 3.500s | Kernel Time : 0.046s | Process Time : 3.546s | Clock Time : 3.541s | Working Set : 42380 KB | Paged Pool : 125 KB | Nonpaged Pool : 9 KB | Pagefile : 41568 KB | Page Fault Count : 12830
ECT_PGO -5 -s: User Time : 9.265s | Kernel Time : 0.421s | Process Time : 9.686s | Clock Time : 9.700s | Working Set : 100672 KB | Paged Pool : 125 KB | Nonpaged Pool : 10 KB | Pagefile : 102552 KB | Page Fault Count : 510700
ECT_PGO -5 --mt-deflate -s: User Time : 10.203s | Kernel Time : 0.531s | Process Time : 10.734s | Clock Time : 7.485s | Working Set : 168432 KB | Paged Pool : 125 KB | Nonpaged Pool : 14 KB | Pagefile : 181772 KB | Page Fault Count : 502639
pingo -s0 -strip: User Time : 1.890s | Kernel Time : 0.109s | Process Time : 1.999s | Clock Time : 1.992s | Working Set : 96100 KB | Paged Pool : 124 KB | Nonpaged Pool : 9 KB | Pagefile : 104164 KB | Page Fault Count : 115651
pingo -s5 -strip: User Time : 8.328s | Kernel Time : 0.171s | Process Time : 8.499s | Clock Time : 8.524s | Working Set : 103628 KB | Paged Pool : 124 KB | Nonpaged Pool : 9 KB | Pagefile : 112428 KB | Page Fault Count : 171346
    164 replies | 40756 view(s)
  • maadjordan's Avatar
    Today, 18:19
    maadjordan replied to a thread 7-zip plugins in Data Compression
Iso7z support for bin files is limited to 4GB... larger files are cut at the 4GB limit even when using the 64-bit DLL module.
    9 replies | 3093 view(s)
  • lz77's Avatar
    Today, 18:13
TS40.txt is not an ASCII file, nor even a Latin-1 file. It contains characters left over from UTF-8 (with decimal codes 128, 145, 147, 225, 226). For example, see the next line after the line "Thank Heaven! . . . . Good-night.", or the next word after "She learned of the great medicine,".
    39 replies | 2180 view(s)
  • Trench's Avatar
    Today, 18:12
compgt: Zip is the standard now, while in the 90s it was not a standard in Windows, I think. Now you click on a zip file and it opens up like a regular folder to view. If you feel it takes a certain mindset to do compression, then maybe that is the problem - that it's just that one mindset. Maybe a more abstract approach is needed to create something new. Many programmers want to be game programmers, thinking they can make something good to make money, yet almost all of those games have bad design. Everyone wants to be a one-man show and do it all, while it's hard to do even one skill right, let alone many.
Gotty: Again you are right. Even Excel is somewhat a programming language, despite everything being encoded inside the program. Programmers outside data compression are a different field, but not a completely different one. There is a program called Cheat Engine which modifies almost any program to act differently, and it doesn't take programming knowledge. Obviously it's silly, since it's mainly for games. Excel can modify things within the program to find patterns quicker than coding. Programmers as a whole are kind of rigid, since one has to be to follow the rules of programming; that might be the case here, and why I stated that the ones that helped create compression, like Huffman, were not programmers but engineers. And Huffman's theory would be useless if not for programmers. Sometimes people look for the hardest solution when the simplest one is more effective. As I said before, everyone here is defining the problem of compression the wrong way when they say randomness is the issue - it is not; it is the lack of patterns. All those fields you listed end up in the same place when we evaluate it like that. Everyone is trying to make something more complex, which takes up more CPU and memory and makes the program less usable. I assume you disagree, but disagreeing cuts your view off from another perspective. I might have something or I might not, but if I have something that can't be proven then it's useless, and if I don't have something that can test it out, then again it's useless. All I am trying to do is assist the mindset to think outside the box, since a lot of things stated in the forum come from like-minded people, while someone with "no data compression" background, as you call it, is trying to shine a light on another view. Just a thought: the best-selling burgers in the world are made by a real estate company that rents out stores, called McDonald's; the best-tasting burgers do not come close in sales. The reason McDonald's did so well was switching its mentality from a burger company to a real estate company.
    9 replies | 258 view(s)
  • suryakandau@yahoo.co.id's Avatar
    Today, 18:01
Paq8sk32 has a better score and is faster than paq8sk29.
    138 replies | 10685 view(s)
  • Darek's Avatar
    Today, 17:48
    Darek replied to a thread Paq8sk in Data Compression
    Ok and?
    138 replies | 10685 view(s)
  • compgt's Avatar
    Today, 17:41
When I was writing The Data Compression Guide, I included Knuth's Adaptive Huffman Coding by reading the paper of Vitter. My implementations of Knuth's and Vitter's algorithms are correct, adhering exactly to Vitter's description. Accordingly, Knuth's algorithm is used in the early program "compact". You can optimize my sample code or re-implement these algorithms. https://sites.google.com/site/datacompressionguide/fgk https://sites.google.com/site/datacompressionguide/vitter Yes, adaptive Huffman coding can be done per block to avoid constant updates to the dynamic Huffman tree. There are compressors that do this, as listed in the comp.compression FAQ. Canonical Huffman coding and length-limited codes have been studied extensively by A. Moffat.
    5 replies | 86 view(s)
  • Jyrki Alakuijala's Avatar
    Today, 17:32
    My understanding is that WebP v2 lossless will be based on my original architecture for WebP v1 lossless, just with most parts slightly improved from the nine years of experience we had in between -- like changing prefix coding to ANS. Due to its heritage I expect it to be consistently a few % (perhaps 3-5 %) better than WebP v1 with roughly similar execution speed. WebP v2's lossy improvements are not incremental from WebP v1, it looks like a full redesign with some inspiration from the AV1/AVIF/VP10/VP9 family.
    164 replies | 40756 view(s)
  • Jarek's Avatar
    Today, 16:57
Probably nearly all Huffman coding in use is static and canonical (?) - which reduces the size of headers. Adaptive Huffman is interesting for theoretical considerations, but is probably more costly than adaptive AC - it doesn't seem to make sense in practice (?) But adaptive AC/ANS is quite popular for exploiting non-stationarity, e.g. LZNA, RAZOR and lolz use adaptive rANS, reaching ~100MB/s/core for a 4-bit alphabet (the additional memory cost is negligible). https://sites.google.com/site/powturbo/entropy-coder
    5 replies | 86 view(s)
  • cssignet's Avatar
    Today, 16:16
Perhaps I am wrong, so if you have some spare time to try quick tests, I am curious to see the results from your computer. If you could compile ECT (with default stuff), then run these on the original file (I randomly chose this file, but pick whatever is in the list):
timer ECT -1 -s file.png
timer ECT -5 -s file.png
timer ECT -5 --mt-deflate -s file.png
timer pingo -s0 -strip file.png
Could you please paste the logs from each here? Thanks
    164 replies | 40756 view(s)
  • LucaBiondi's Avatar
    Today, 15:35
    LucaBiondi replied to a thread Paq8pxd dict in Data Compression
Hi Darek, this is the executable. Happy testing! paq8pxd_v89_40_3360.zip Luca
    941 replies | 318925 view(s)
  • SolidComp's Avatar
    Today, 15:29
Hi all – I have some questions about the different forms of Huffman coding, and where they're used, and I figured many of you would be able to fill in the blanks. Thanks for your help.
Static Huffman: Does this mean 1) a tree generated from a single pass over all the data, or 2) some sort of predefined table independent of any given data, like one defined in a spec? I'm seeing different accounts from different sources. For example, the HPACK header compression spec for HTTP/2 has a predefined static Huffman table, with codes specified for each ASCII character (starting with five-bit codes). Conversely, I thought static Huffman in deflate / gzip was based on a single pass over the data. If deflate or gzip have predefined static Huffman tables (for English text?), I've never seen them.
Dynamic/Adaptive Huffman: What's the definition? How dynamic are we talking about? It's used in deflate and gzip, right? Is it dynamic per block? (Strangely, the Wikipedia article says that it's rarely used, but I can't think of a codec more popular than deflate...)
Canonical Huffman: Where is this used?
By the way, do you think it would be feasible to implement adaptive arithmetic coding with less CPU and memory overhead than deflate? The Wikipedia article also said this about adaptive Huffman: "...the cost of updating the tree makes it slower than optimized adaptive arithmetic coding, which is more flexible and has better compression." Do you agree that adaptive arithmetic coding should be faster and with better ratios? What about the anticipated CPU and memory overhead? Thanks.
    5 replies | 86 view(s)
  • suryakandau@yahoo.co.id's Avatar
    Today, 15:08
I just compared paq8sk32 with paq8sk29 on enwik8.
    138 replies | 10685 view(s)
  • Darek's Avatar
    Today, 14:30
    Darek replied to a thread Paq8sk in Data Compression
    Do you have the same score for paq8sk23 or paq8sk28?
    138 replies | 10685 view(s)
  • suryakandau@yahoo.co.id's Avatar
    Today, 14:23
How do I compile fp8 using this batch script?
    68 replies | 80283 view(s)
  • Scope's Avatar
    Today, 14:04
I can also add these results to the comparison (as tests on another configuration and number of threads) if they are done for all the other modes and optimizers. That was my main problem, because not all optimizers processed the PNGs from the whole set correctly (and if they skip a PNG after an error, they don't waste time optimizing it, so it's not a very honest speed result). Otherwise, I also tried to make it simple, accurate and fair; the only differences are that:
- I tested on another configuration, a dedicated i7-4770K (Haswell, AVX2, 16GB RAM, Windows 10 64-bit)
- additional optimization flags were used during compilation (but they were used equally on all open-source optimizers)
- I ran the tests 3 times on the same set with the same optimizers to make sure there was no impact from sudden Windows background processes, HDD caching, swap, etc. and to get more accurate results
- for a simpler and more convenient result, and since this is not the time spent on the whole set (for the reasons described above), the fastest result was taken as 1x, instead of a long numerical value of time spent such as TotalMilliseconds - 2355561,4903.
Mostly for the same reasons I didn't want to compare speed results at all, because they may depend on different configurations, CPU, even HDD speed (especially for fast modes), although they still give approximate data about the speed of different optimizers. Yes, some files get deleted over time; this is one of the disadvantages of using public images, but my whole idea was to compare real public images rather than specialized test sets. Also, if needed, I can upload and provide a link (in pm) to the set of files on which I was comparing the speed.
    164 replies | 40756 view(s)
  • suryakandau@yahoo.co.id's Avatar
    Today, 13:57
Paq8sk32 - improved text model with a new hash function. The result on enwik8 using -s6 -w -e1,english.dic is: Total 100000000 bytes compressed to 16285631 bytes. Time 20340.22 sec, used 1290 MB (1352947102 bytes) of memory. Faster than paq8sk29. enwik9 is running. Here is the source code... here is the binary too.
    138 replies | 10685 view(s)
  • kaitz's Avatar
    Today, 12:16
    kaitz replied to a thread Paq8pxd dict in Data Compression
I can't see how this can be done. The last version uses less memory, 5 GB or less (max option), I can't remember. And the size diff for enwik9 is 50 KB (worse). So it is not only about memory. There are no new (+) contexts in the wordmodel since v80, only preprocessing for enwik. Next time I will probably work on this in Feb. edit: +RC, modSSE
    941 replies | 318925 view(s)
  • Darek's Avatar
    Today, 10:59
    Darek replied to a thread Paq8pxd dict in Data Compression
@LucaBiondi - could you attach the exe file of the modified paq8pxd_v89? Regarding the benchmark procedure - a good idea in my opinion, but there should be the same benchmark file to test with - maybe one procedurally generated by the program before the test starts?
@Kaitz - I have an idea, but maybe it is silly or not doable. Is it possible to use some sort of very light compression of the program's memory during use? As I understand it, the majority of memory is used on some kinds of trees or other data-structure representations. Is it possible to use lightly compressed data which would virtually simulate more memory? I think there could still be room for improvement for the biggest files (enwik8/9) if we could use more memory, but maybe there is no need for more physical memory if a trick like this could be made instead. Of course it would be more time consuming, but maybe it would be worth it...
    941 replies | 318925 view(s)
  • suryakandau@yahoo.co.id's Avatar
    Today, 10:20
Let me try, but I can't promise anything because I am not a programmer :)
    33 replies | 1544 view(s)
  • lz77's Avatar
    Today, 10:16
> TS40.txt:
> 132,248,557 bytes, 8.055 sec. - 0.544 sec., zstd -7 --ultra --single-thread
> 130,590,357 bytes, 10.530 sec. - 0.528 sec., zstd -8 --ultra --single-thread
What ratio does zstd show after preprocessing, meaning 40/2.5 = 16 sec. for compression + decompression, 5% off? What ratio at all will be the best within 16 seconds? ........... lzturbo seems to be the winner in the Rapid compression category.
    33 replies | 1544 view(s)
  • Darek's Avatar
    Today, 09:35
Yes, because that option is made for enwik8/9 :)
70'197'866 - TS40.txt -x15 -e1,english.dic by Paq8sk30, time - 73'876,51s - good score, bad time - paq8sk23 would need to be about 20x faster to meet the contest criteria. Could you try to add the use of more threads/cores?
    33 replies | 1544 view(s)
  • suryakandau@yahoo.co.id's Avatar
    Today, 04:12
On enwik8/9 there is no error when using the -w option.
    33 replies | 1544 view(s)
  • suryakandau@yahoo.co.id's Avatar
    Today, 03:21
With the -w option, paq8pxd_v89 gives the error message "Transform fails at 333440671, skipping...", so it detects ts40.txt as default, not as bigtext wrt, and that causes the compression ratio to be worse.
    33 replies | 1544 view(s)
  • suryakandau@yahoo.co.id's Avatar
    Today, 03:04
So the best score is using -x15 -e1,english.dic. @Sportman, could you add it to the GDCC public test set file?
    33 replies | 1544 view(s)
  • cssignet's Avatar
    Today, 00:53
I would suggest a simpler, accurate, verifiable and *fair* test for time comparison: pingo/ECT binaries with the same compiler/flags, cold-start running on a dedicated resource (FX-4100 @ 3.6 GHz - 8GB RAM - Windows 7 64-bit), tested on the files found in the PNG tab (aside note: I could not grab these:
https://i.redd.it/5lg9uz7fb7a41.png
https://i.redd.it/6aiqgffywbk41.png
https://i.redd.it/gxocab3x91e41.png
https://i.redd.it/ks8z85usbg241.png
https://i.redd.it/uuokrw18s4i41.png
so the input would be 1.89 GB (2 027 341 876 bytes) - 493 files)
pingo (0.99 rc2 40) - ECT (f0b38f7 (0.8.3)) (with -strip)
multi-processing (4x):
ECT -1 --mt-file: Kernel Time = 14.133 = 1% | User Time = 3177.709 = 390% | Process Time = 3191.842 = 392% | Virtual Memory = 438 MB | Global Time = 813.619 = 100% | Physical Memory = 433 MB
pingo -s0: Kernel Time = 86.518 = 16% | User Time = 1740.393 = 328% | Process Time = 1826.912 = 344% | Virtual Memory = 1344 MB | Global Time = 530.104 = 100% | Physical Memory = 1212 MB
ECT -5 --mt-file: Kernel Time = 1557.482 = 43% | User Time = 9361.869 = 259% | Process Time = 10919.352 = 303% | Virtual Memory = 1677 MB | Global Time = 3601.090 = 100% | Physical Memory = 1514 MB
pingo -s5: Kernel Time = 144.550 = 6% | User Time = 6937.879 = 317% | Process Time = 7082.429 = 324% | Virtual Memory = 1378 MB | Global Time = 2183.105 = 100% | Physical Memory = 1193 MB
file per file:
ECT -1: Kernel Time = 20.326 = 0% | User Time = 2963.472 = 93% | Process Time = 2983.799 = 99% | Virtual Memory = 283 MB | Global Time = 2984.405 = 100% | Physical Memory = 282 MB
pingo -s0 -nomulti: Kernel Time = 68.468 = 4% | User Time = 1443.711 = 95% | Process Time = 1512.180 = 99% | Virtual Memory = 905 MB | Global Time = 1513.683 = 100% | Physical Memory = 887 MB
ECT -5 --mt-deflate: Kernel Time = 886.538 = 14% | User Time = 8207.743 = 134% | Process Time = 9094.281 = 149% | Virtual Memory = 1000 MB | Global Time = 6083.433 = 100% | Physical Memory = 916 MB <-- multithreaded
pingo -s5 -nomulti: Kernel Time = 109.107 = 1% | User Time = 5679.091 = 98% | Process Time = 5788.198 = 99% | Virtual Memory = 978 MB | Global Time = 5789.232 = 100% | Physical Memory = 980 MB <-- *not* multithreaded
The regular -sN profiles in pingo go more for small/average-size paletted/RGBA images. If someone seeks speed over space, -sN -faster could be used instead. On some samples it can still be competitive:
https://i.redd.it/05vnjqzhrou31.png (13 266 623 bytes)
ECT -1 (out: 10 023 297 bytes): Kernel Time = 0.140 = 2% | User Time = 5.928 = 97% | Process Time = 6.068 = 99% | Virtual Memory = 27 MB | Global Time = 6.093 = 100% | Physical Memory = 29 MB
pingo -s0 (out: 8 777 351 bytes): Kernel Time = 0.280 = 8% | User Time = 2.870 = 90% | Process Time = 3.151 = 99% | Virtual Memory = 98 MB | Global Time = 3.166 = 100% | Physical Memory = 90 MB
pingo -s0 -faster (out: 9 439 005 bytes): Kernel Time = 0.124 = 6% | User Time = 1.825 = 92% | Process Time = 1.950 = 99% | Virtual Memory = 86 MB | Global Time = 1.965 = 100% | Physical Memory = 78 MB
    164 replies | 40756 view(s)
  • Shelwien's Avatar
    Yesterday, 21:42
> Shelwien, what does your utility do? Does it just convert numbers from text to binary? What does it do with other text?
It converts out3.txt to binary and back to text losslessly (except for headers).
> Do you convert floats to IEEE binary?
Yes.
    8 replies | 220 view(s)
  • Shelwien's Avatar
    Yesterday, 21:41
> Do you have any advice on resources for learning C++ as a language?
https://stackoverflow.com/questions/388242/the-definitive-c-book-guide-and-list
But you don't really need to know everything about C++ syntax and libraries to start programming in it. There are reference sites like https://www.cplusplus.com/ so you can always look up specific features. Basically, just find some open-source project that you like and read the source, while looking up things that you don't know.
> Do you think TurboPFor is the best method for this task?
If you need very high processing speed, then probably yes.
> I have researched other compressors(bzip2, zfp, SZ..) and it seems to have
> the best performance, but I am by no means an expert. I am still planning
> on implementing other methods for comparison.
The best compression would be provided by a custom statistical (CM) compressor (since there are probably correlations between columns). TurboPFor doesn't have any really complex algorithms - it's mostly just delta (subtracting a predicted value from each number) and bitfield rearrangement/transposition. The main purpose of the library is that it provides efficient SIMD implementations of these algorithms for multiple platforms. But if gigabytes-per-second speed is not really necessary for you, then you can just as well use something else, like self-written delta + zstd (a rough sketch follows below this post).
> Once I have progressed further, can I break apart the TurboPFor code?
> Could I take a small amount of the files to run the method that seems
> to work best from benchmarking or would this result in errors
> with all of the interdependencies within TurboPFor?
You can drop some of the files - it actually seems to build a library (libic.a) with the relevant part. Unfortunately it's not very readable due to all the speed optimizations.
    8 replies | 220 view(s)
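A rough sketch of the "self-written delta + zstd" alternative mentioned above, assuming the data has already been parsed into a flat array of int32 samples per column and that libzstd is linked; the function name and the order-1 delta predictor are illustrative choices, not TurboPFor's API. zstd already applies its own entropy stage, so the delta pass mainly exposes the sample-to-sample redundancy.

#include <cstdint>
#include <vector>
#include <zstd.h>

// Delta-code a column of samples, then let zstd squeeze the (now small) residuals.
std::vector<char> compressColumn(const std::vector<int32_t>& samples, int level = 19) {
    std::vector<int32_t> delta(samples.size());
    int32_t prev = 0;
    for (size_t i = 0; i < samples.size(); ++i) {
        delta[i] = samples[i] - prev;            // predict "same as previous value"
        prev = samples[i];
    }
    size_t srcSize = delta.size() * sizeof(int32_t);
    std::vector<char> out(ZSTD_compressBound(srcSize));
    size_t written = ZSTD_compress(out.data(), out.size(),
                                   delta.data(), srcSize, level);
    if (ZSTD_isError(written)) { out.clear(); return out; }   // empty = failure
    out.resize(written);
    return out;
}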
  • SolidComp's Avatar
    Yesterday, 21:24
    Shelwien, what does your utility do? Does it just convert numbers from text to binary? What does it do with other text? Do you convert floats to IEEE binary?
    8 replies | 220 view(s)
  • AlexBa's Avatar
    Yesterday, 20:44
Thank you so much for your help. This all makes sense. I have done a little testing, and you were right: creating individual binary files from each column results in better compression. I will continue working on code and a header file to run this compression easily. As I'm getting started, I have a few more questions:
Do you have any advice on resources for learning C++ as a language? I took a course with C++ in high school, but that was a few years ago.
Do you think TurboPFor is the best method for this task? I have researched other compressors (bzip2, zfp, SZ..) and it seems to have the best performance, but I am by no means an expert. I am still planning on implementing other methods for comparison.
Once I have progressed further, can I break apart the TurboPFor code? Could I take a small number of the files to run the method that seems to work best from benchmarking, or would this result in errors with all of the interdependencies within TurboPFor?
Thank you for any more help and advice. You have done so much for me! Alex
    8 replies | 220 view(s)
  • Gotty's Avatar
    Yesterday, 14:46
    Gotty replied to a thread Paq8sk in Data Compression
    138 replies | 10685 view(s)
  • Shelwien's Avatar
    Yesterday, 14:26
> If you would be able to, could you help to provide some guidance on how to
> adapt files like fp.h into a header file.
It is already a header file with declarations of FP-related functions.
> My specific project requires me to compress massive files of sensor data,
> with a short snippet of sample data attached below.
Well, that's text. Just converting it to binary would give you a compression ratio of 3.2: 1536/480 = 3.2. I made a simple utility for this (see attach), but in this case it would be better to write each column to a different file, most likely (a rough sketch of that idea follows below this post). Also, you have to understand that float compression libraries are usually intended for _binary_ floats. It's also easier to run icapp - just "icapp -Ff out3.bin".
    8 replies | 220 view(s)
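A rough sketch of the text-to-binary, one-file-per-column idea described above (not Shelwien's actual utility). It assumes numeric columns separated by commas, semicolons or whitespace, and writes each column as raw binary doubles; the output file names, buffer size and separator handling are arbitrary choices.

#include <cstdio>
#include <sstream>
#include <string>
#include <vector>

// Read a text table of numbers and write each column as raw binary doubles,
// one file per column (col0.bin, col1.bin, ...), which compressors handle
// much better than digits in ASCII. Header lines that don't parse as numbers
// are simply skipped.
int main(int argc, char** argv) {
    if (argc < 2) { std::fprintf(stderr, "usage: txt2bin input.txt\n"); return 1; }
    std::FILE* in = std::fopen(argv[1], "r");
    if (!in) return 1;

    std::vector<std::FILE*> cols;
    char line[4096];
    while (std::fgets(line, sizeof(line), in)) {
        std::string s(line);
        for (char& c : s) if (c == ',' || c == ';') c = ' ';   // tolerate mixed separators
        std::istringstream iss(s);
        double v; size_t i = 0;
        while (iss >> v) {
            if (i >= cols.size()) {
                char name[32];
                std::snprintf(name, sizeof(name), "col%zu.bin", i);
                cols.push_back(std::fopen(name, "wb"));
            }
            if (cols[i]) std::fwrite(&v, sizeof(double), 1, cols[i]);
            ++i;
        }
    }
    for (std::FILE* f : cols) if (f) std::fclose(f);
    std::fclose(in);
    return 0;
}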
  • Jyrki Alakuijala's Avatar
    Yesterday, 13:16
Brotli uses my own prefix coding that is different from (length-limited) Huffman coding. In my coding I optimize not only for the code length (like Huffman), but also for the complexity of the resulting entropy code representation. That gives roughly a 0.15 % improvement over Huffman coding. In early 2014 Zoltan Szabadka compared our coding against ANS for our reference test corpora (web and fonts), and a simple ANS implementation was slightly less dense due to more overhead in the entropy code description. In typical use the prefix code that we use is within 1 % of pure arithmetic coding, and simpler and faster (less shadowing of data because no reversals are needed). (Arithmetic coding gives about a 10 % improvement over Huffman in practical implementations. 1 % of that is because arithmetic coding is more efficient as a coder. The other 9 % is because of the context modeling that arithmetic coding allows. In Brotli we take much of that 9 % by using context modeling that is compatible with prefix coding, but none of the 1 % improvement.)
    15 replies | 769 view(s)
  • Jyrki Alakuijala's Avatar
    Yesterday, 12:48
Zopflification (an attempt at optimal parsing) was added somewhere around early 2015, so a version from 2014 would likely work. IIRC, we open-sourced the first version in October 2013. The first version is not file-format compatible with the standardized brotli, but can give ideas on how well it can work for very short files. The first version only includes quality 11. Brotli's optimal parsing could be greatly improved -- it still ignores the context modeling, thus overestimating the final literal cost. I suspect another 2 % in density could be possible by doing this more thoroughly. Before zopflification brotli would often outperform gzip on the smallest files (in the 500 byte category) by 40-50 % in compression. After zopflification it got 10-20 % worse in the smallest category. This is likely because zopfli led to more entropy codes being used, and that is not accounted for in the optimal parsing.
    15 replies | 769 view(s)
  • Fallon's Avatar
    Yesterday, 10:21
    Fallon replied to a thread WinRAR in Data Compression
    WinRAR - What's new in the latest version https://www.rarlab.com/download.htm
    186 replies | 130304 view(s)
  • AlexBa's Avatar
    Yesterday, 05:36
Thank you for your help. After posting, I did realize that only icapp was built, and that makes sense as the reason why. I'm rereading the readme, but I still feel like a lot of implementation details are lacking (it seems to mainly focus on results). I'm guessing it just assumes more knowledge than I have, so I will have to keep working on that. If you would be able to, could you help to provide some guidance on how to adapt files like fp.h into a header file? I want to learn the process, but I feel like the question is still too open ended for me to tackle blindly. My specific project requires me to compress massive files of sensor data, with a short snippet of sample data attached below. The main complications I foresee are the few lines of extra header filler and an inconsistent separator between individual columns. I would want to compress all data to within some small factor like 1e-9. Thanks again for everything, I am just a lowly engineer trying to learn the world of CS. People like you make it a lot easier. Best, Alex
    8 replies | 220 view(s)
  • suryakandau@yahoo.co.id's Avatar
    Yesterday, 04:36
Paq8sk31 - new experimental hash function to improve compression ratio
    138 replies | 10685 view(s)
  • SolidComp's Avatar
    Yesterday, 02:09
    Make isn't installed by default on most Linux distros. You have to install make first before trying to build – that's what your error message sounds like to me. Did you install it?
    8 replies | 220 view(s)
  • Shelwien's Avatar
    Yesterday, 01:45
> 1. What is the difference between ./icapp and ./icbench commands for compression?
icapp is supposedly the new benchmark framework, while icbench is the old one. The current makefile only builds icapp.
> 2. After downloading the git and making, running $./icbench yields the error 'No such file or directory'.
> Also, trying to specifically just $make icbench results in errors on a
> completely new linux system. What could I be doing wrong?
I checked it, and git clone + make seems to successfully build icapp.
> 3. Other than the readme, what are good resources for learning how to use
> the software? Specifically, I want to compress a huge file of floating
> point numbers, do you have any guidance for how to do this(with TurboPFor
> or any other software that seems better)?
Read the whole readme page at https://github.com/powturbo/TurboPFor-Integer-Compression
Note that icapp is a benchmark; it is not intended for actual file-to-file compression. For actual compression you're supposed to make your own frontend using some of the provided header files (like fp.h). You can also look at other libraries referenced here: https://github.com/powturbo/TurboPFor-Integer-Compression/tree/master/ext
    8 replies | 220 view(s)
  • Sportman's Avatar
    Yesterday, 00:58
I thought the same, only the Sloot specs were more the opposite: 4x faster, 400x smaller.
    16 replies | 778 view(s)
  • madserb's Avatar
    Yesterday, 00:40
Thanks. I wish you well with the trials. I tested Shelwien's version, but it is not easy to use and doesn't support all the functionality.
    160 replies | 85256 view(s)
  • AlexBa's Avatar
    30th June 2020, 22:31
I am trying to use TurboPFor for compressing huge data files for a school project. However, I have run into some very basic issues that I can't seem to resolve with just the readme. Any help is appreciated:
1. What is the difference between the ./icapp and ./icbench commands for compression?
2. After downloading the git and making, running $./icbench yields the error 'No such file or directory'. Also, trying to specifically just $make icbench results in errors on a completely new Linux system. What could I be doing wrong?
3. Other than the readme, what are good resources for learning how to use the software? Specifically, I want to compress a huge file of floating point numbers; do you have any guidance for how to do this (with TurboPFor or any other software that seems better)?
Thank you for any help
    8 replies | 220 view(s)
  • Lucas's Avatar
    30th June 2020, 20:29
Interesting how they say their solution is 400x faster than compression; it's almost like their solution isn't compression at all. I'm getting a whiff of Sloot from reading this. Their 4-byte compression claims are incredibly dubious; it's almost like they don't care that UDP and TCP header sizes would become the bottleneck in such networks. E.g.: 100 UDP packets containing <=4 bytes of data would send 2.94x more data over the wire than a single UDP packet with 400 bytes of payload, and TCP (which they propose using in the patent for a distributed compression network) would be 5.71x larger than a single TCP packet with 400 bytes of payload (the arithmetic behind those factors is spelled out below this post). And not once do they mention anything about "buffering" in their system, which would be needed to make this claim of being able to actually compress 4 bytes hold up. To me this just appears to be a pump-and-dump company to rip off investors.
    16 replies | 778 view(s)
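The 2.94x and 5.71x factors above are reproducible if one counts only the transport-layer headers (8 bytes per UDP header, 20 bytes per minimal TCP header) and ignores IP and link-layer overhead - that accounting is my assumption, since the post does not state which headers it counts:

\[
\frac{100 \times (8+4)}{8+400} = \frac{1200}{408} \approx 2.94,
\qquad
\frac{100 \times (20+4)}{20+400} = \frac{2400}{420} \approx 5.71
\]

Including the 20-byte IP header per packet would make the ratios even worse for the tiny-payload case.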
  • oloke5's Avatar
    30th June 2020, 20:25
I think that it can be done, but AFAIK pcompress wasn't designed to work on Windows from the beginning (source1, source2). I tried to build it with msys2 under Windows 10 x64, but it failed on ./config and I gave up, although I think that it isn't impossible. The main problem is to downgrade OpenSSL to version 1.0 and pretend to be a Linux-based OS. Also, I saw that Shelwien made something like that, but unfortunately it isn't working right now. So I think that there's a really big chance to get it compiled on Windows too (only x64, because AFAIK 32-bit is not supported by pcompress), but it's just not that simple. Alternatively, you can install WSL2 on Windows and then run pcompress compiled for Linux; as I just checked, that works too. :D I will try to compile pcompress under Windows as well and I will let you know if it works ;)
    160 replies | 85256 view(s)
  • schnaader's Avatar
    30th June 2020, 16:27
After a first quick glance, testset 3 looks fine. Looking for compressed leftovers, I only found this one so far (a very small ZIP part, 210 bytes decompressed). This is the most interesting testset for me, because this kind of data dominates things like Android APK files and is a mix of structured pointer lists, program code and string sections (e.g. method names), so some preprocessing (parsing and reordering stuff, detecting structured data) should be the way to go and would help compress data like this. The left side is from the original file, the right side is the output of "Precomp -cn".
    39 replies | 2180 view(s)
  • madserb's Avatar
    30th June 2020, 15:12
    Any chance of compiling it for windows x64?
    160 replies | 85256 view(s)
  • lz77's Avatar
    30th June 2020, 15:11
    Reverse engineering is enough. :) Therefore, I don't like sharing compressors between competitors.
    39 replies | 2180 view(s)
  • Ms1's Avatar
    30th June 2020, 14:46
    Test 3 and Test 4 sets are now available. Small wording changes in set descriptions. http://globalcompetition.compression.ru/rules/#test-set-description
    3 replies | 589 view(s)
  • compgt's Avatar
    30th June 2020, 11:54
I might as well join this competition. We have up to Nov. 20, 2020, hmm? First, since my dual-core computer crashed, I have to buy a new computer. And I have to learn how to install g++ again, oh my! (but I still have bcc32). After almost a decade, I might be coding again. Brave. :) I tried entering lzuf2 as a test submission, but gmail failed to send. Now the email is "queued". Will sponsor Huawei own the submitted compressors? If not, will it buy the winning compressor?
    39 replies | 2180 view(s)
  • Scope's Avatar
    30th June 2020, 10:35
I selected about ~900 MB of PNG images (because a test of the whole set would take a lot of time, and not all PNGs are processed correctly by all PNG optimizers - they skip them, improving their results) and measured the processing time of each optimizer several times on a CPU not loaded with other tasks. This is an average of all tests with parallel processing of 8 images on an AVX2 CPU and 8 threads; 1x is the fastest optimization (ECT -1 compiled with GCC 10.1 with PGO, LTO, AVX2). Although it is not ready yet, there are also some updates in the comparison:
- added an ex-Jpeg rating, for more convenience in determining lossy images, but it doesn't work very well on non-photo images (and is completely useless for PixelArt)
- AVIF lossless was added and compression was performed with the latest version of Libavif 0.7.3 (aom:2.0.0, dav1d:0.7.0); as for YUV444, the speed was significantly increased, but the efficiency for static images has not changed much
- updated the WebP result (with recent changes in the main branch)
- added a comparison of the older Jpeg XL build (Speed 9) with the newer one (but a comparison with the current build is not ready yet, because compression at Speed 8-9 takes a long time)
I have tested near-lossless modes for myself, but they are very difficult to add to a comparison without a visual representation of the result (or any metrics). I also have a set with game screenshots (but that comparison is not ready yet). The set with non-RGB images hasn't been done yet, because I need enough free time to find, collect and convert such images; it also doesn't fit the current comparison, because I tried to use only non-converted public images, with a link to view each image (although it's possible to make a separate tab for such images).
P.S. And about the need for lossless images on the Web: I do not think that they are useless for everything except UI-like images; perhaps the need is much smaller for photographic images, but I see it more and more in art images, comics/manga and game screenshots, especially given the ever-increasing speed of the Internet in the world. Also, all images in the comparison are taken from very popular, viewed subreddits; I deliberately did not want to take my own images (because my needs may not reflect the needs of most other people). And considering the ineffectiveness of the lossless mode in AVIF for RGB images, I hope that WebP v2 will have a more effective or separate lossless mode (as in WebP v1).
Lossless Image Formats Comparison (Jpeg XL, AVIF, WebP, FLIF, PNG, ...) v0.5 https://docs.google.com/spreadsheets/d/1ju4q1WkaXT7WoxZINmQpf4ElgMD2VMlqeDN2DuZ6yJ8/
Total size bars on the chart do not need to be taken into account; they are not quite real and are only for a very rough representation.
    164 replies | 40756 view(s)
  • Darek's Avatar
    30th June 2020, 10:14
paq8sk30 scores:
70'587'620 - TS40.txt -x15 by Paq8sk30, time - 38'154,53s
70'315'001 - TS40.txt -x15 -e1,english.dic by Paq8sk30, time - 45'253,18s
72'656'037 - TS40.txt -x15 -w -e1,english.dic by Paq8sk30, time - 89'471,51s - worse score, bad time, I didn't do a decompression test...
    33 replies | 1544 view(s)
  • Kaw's Avatar
    30th June 2020, 09:27
It really sounds like LittleBit with a fancy description: a static Huffman tree with variable word size. Although I bet that LittleBit outperforms them on size.
    16 replies | 778 view(s)
  • oloke5's Avatar
    30th June 2020, 06:43
Hi there! Yeah, I know that I'm kinda late, but maybe it will be useful for someone in the future. I totally agree that pcompress is a very powerful compression utility; I think that it's one of the most efficient ones (along with nanozip and freearc). I've decided to compile it on my own and... being honest, that was a lot of work in 2020 (a lot more than I previously expected ;)) but at the same time I think it was worth it. For any future reader of this thread, here is a static binary of its latest repo clone. I've compiled it under an x64 Ubuntu 14 VM (mainly because of legacy OpenSSL compatibility). I also tested it on Linux Mint 20, Ubuntu 14 and a recent openSUSE snapshot and everything was working fine. :cool: I hope it will work on every amd64 Linux distro. Also, it's my first post on this forum; I really appreciate the idea and it has helped me a lot many times. :_happy2: Sorry for my poor English and have a nice day ;)
    160 replies | 85256 view(s)
  • suryakandau@yahoo.co.id's Avatar
    30th June 2020, 04:40
bbb v1.10: 400000000 -> 81046263 in 1071.57 sec, faster than v1.9.
    33 replies | 1544 view(s)
  • Sportman's Avatar
    30th June 2020, 04:07
    "about 100 bits at a time" "400 times faster than standard compression resulting in up to 75% less bandwidth and storage utilization" https://www.youtube.com/watch?v=m-3BNenuX_Q So "AI" create from every about 100 bits 40-25 bits codewords and send that over the network + one time codebook with sourceblocks (size unknown). Sounds like an AI trained custom dictionary for every data type.
    16 replies | 778 view(s)
  • Gotty's Avatar
    30th June 2020, 00:58
    On their site ... AtomBeam Technologies will be unveiling its patented technology at the upcoming Oct. 22nd – 24th Mobile World Congress (MWC) event in Los Angeles Source: https://atombeamtech.com/2019/10/18/atombeam-unveils-patented-data-compaction-software-at-mobile-world-congress/ Searched for it. And really they were there. https://www.youtube.com/watch?v=Jy4w-Sn-hEk The video was uploaded 4 months ago. Already 3 views. I'm the 4th one. Quote from the video: "We are the only game in town. There is no other way to reduce that data." "There is no other way to reduce the size except for AtomBeam." OK. I have never ever disliked any video on youtube. This is my first.:_down2:
    16 replies | 778 view(s)
  • cssignet's Avatar
    30th June 2020, 00:58
    about the PNG tab, how did you measure speed?
    164 replies | 40756 view(s)
  • JamesWasil's Avatar
    30th June 2020, 00:06
It's basically the longest sentence ever made to describe a Huffman table arranged with statistical frequencies. Rather than having trees and blocks, they have "chunklets", "key-value pairs", "warplets" and other fancy names that mean nothing. The data read from a file or stream isn't input anymore, it's now "source blocks". It might use another table to keep track of recents and rep matches and call that "AI training" (which nothing else does lol). "reconstruction engine comprising a fourth plurality of programming instructions stored in the memory and operable on the processor of the computing device, wherein the programming instructions, when operating on the processor, cause the processor to: receive the plurality of warplets representing the data; retrieve the chunklet corresponding to the reference code in each warplet from the reference code library; and assemble the chunklets to reconstruct the data." ^ This means a compressor and decompressor on a computer reading from a file lol. The USPTO never turns money away, even when granting toilet paper like that.
    16 replies | 778 view(s)
  • Gotty's Avatar
    29th June 2020, 19:57
W3Schools teaches you the basics of different programming languages, the structure of websites and similar stuff. It does not teach you actual algorithms. I'm afraid there are no teaching materials that give you building blocks so that you can copy and paste and voila, you've created compression software. Creating the structure of a website is different from creating the behavior of a web app. The structural building blocks are small and much easier to understand (and there are building blocks!). I can teach people to create a very basic HTML website in a couple of hours. For that you will need to learn simple HTML. Doable in a short amount of time. Not too difficult. But teaching you to code in C++ and implement a simple compression program - that requires many days. And you will need to actually learn C++. There's no other way. You will need to write that code. It's not only about data compression, it's about any algorithmic task. You need to create an algorithm. And that is... programming. People here do work with people from other fields (other = no data compression). You want to master text compression? You will benefit from linguistics. Just look at the text model in paq8px. It's full of linguistic stuff. But it's far from complete - it can still be improved. You want to master audio compression? Image compression? Video compression? Same stuff. Applied mathematics, psychoacoustics, signal processing, machine learning, neural networks, just to name a few. Actually, advanced data compression software has more stuff from other fields than the actual "compression algorithm". You'd better rethink the idea that here we all know the same theories. ;-) No, we don't. We actually never stop learning. Don't think that we have some common theory, and that's all ;-) We continuously invent new stuff. From the user's point of view it's not evident: a user just types a command or pushes a button, and the magic happens. And that magic might be real magic - you just don't see it. If you would like to explore it deeper and actually make new compression software - go for it. There are lots of ways and lots of new possibilities. Truly.
    9 replies | 258 view(s)
  • Dresdenboy's Avatar
    29th June 2020, 19:57
Here's my bunch of findings (as I've been researching compression of small binary files for a while now -> also check the tiny decompressors thread by introspec):
Lossless Message Compression: https://www.diva-portal.org/smash/get/diva2:647904/FULLTEXT01.pdf
The Hitchhiker's guide to choosing the compression algorithm for your smart meter data: https://ieeexplore.ieee.org/document/6348285 (PDF freely available) Slides: https://www.ti5.tuhh.de/publications/2012/hitchhikersguide_energycon.pdf
A Review of IP Packet Compression Techniques: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.111.6448
Automatic message compression with overload protection: https://www.sciencedirect.com/science/article/pii/S0164121216300267
Also feel free to send me compressor executables (Win32/64) and test sets, so I can include them in my test bench (see example plots here and here in the already mentioned thread).
    15 replies | 769 view(s)
  • Dresdenboy's Avatar
    29th June 2020, 18:04
Sorry, this was just a comment about the code box formatting, addressed to Shelwien as a forum admin. ;) Sounds plausible! I think the LZW part is pretty standard; just the code-word/symbol encoding got more efficient. And since those symbols grow to 19 bits, the adjusted-binary savings go down to ~1/38 bits per encoded symbol, I think.
    10 replies | 577 view(s)
  • xezz's Avatar
    29th June 2020, 14:14
shorty.js: https://github.com/enkimute/shorty.js
P5, P6, P12a: http://mattmahoney.net/dc/p5.cpp http://mattmahoney.net/dc/p6.cpp http://mattmahoney.net/dc/p12a.cpp
HA: https://www.sac.sk/download/pack/ha0999.zip
MR-RePair: https://github.com/izflare/MR-RePair
GCIS: https://github.com/danielsaad/GCIS
    15 replies | 769 view(s)