Activity Stream

  • cssignet's Avatar
    Today, 02:16
    the changes would be implemented atm (rc2 44). i did not test it widely though
    476 replies | 127114 view(s)
  • Trench's Avatar
    Today, 01:56
    As stated on Wikipedia: "GIF images are compressed using the Lempel–Ziv–Welch (LZW) lossless data compression technique to reduce the file size without degrading the visual quality. This compression technique was patented in 1985. Controversy over the licensing agreement between the software patent holder, Unisys, and CompuServe in 1994 spurred the development of the Portable Network Graphics (PNG) standard. By 2004 all the relevant patents had expired." "Welch filed a patent application for the LZW method in June 1983. The resulting patent, US 4558302, granted in December 1985" "when the patent was granted, Unisys entered into licensing agreements with over a hundred companies" https://en.wikipedia.org/wiki/GIF
    Did they make money? Could you, if you did the same? If it's a small difference, probably not; if it's a big one, maybe. "In January 2016, Telegram started re-encoding all GIFs to MPEG4 videos that "require up to 95% less disk space for the same image quality."" Well, MPEG4 doesn't seem as convenient or as fast, and a bigger hard drive is easier to buy than time.
    In other fields, some people create things and big companies pay them millions to buy the patent, and the company never sells the thing they bought, since there is more money to be made in the thing they already have. Other programs say "free for public use", but corporations must pay. Some use the honor system, while others give a free demo for a few days, a limited number of tries, or one use per day.
    So many questions arise, and a few of them are: Is there a legal form one must fill out to get those assurances? Even if you do make something, won't it hurt others who try to improve it? How long will the patent or payment last? Would it be better not to have a patent and keep it secret for personal use? If you do make money, will you use it to make something else better, or just screw everyone else, like the rich who let money go to their heads and destroy nations for their ego?
    You can't make money by giving money away, and you can't go forward if you spend your time making something good and can't afford to feed yourself. If Nikola Tesla had made money, or had had plenty of money, more things would have been done. Let's say you are smart: you still can't make money if you don't have money, no matter how smart you are. Unless you are smart enough to make money and connections, which make the world go round, then maybe you can influence the world; others think they can influence it with ideas alone. As Christ said, watch out for "casting pearls before swine" (Matthew 7:1-6). But a lot of things in the Bible were stated earlier by the ancient Greeks, as was much of what people have now. And many push away the one they came to the dance with in order to dance with another, and then wonder why it did not turn out so well in the end.
    In Ancient Greece only the rich paid taxes, and only if they wanted to. Many wanted to pay taxes, to help their fellow man and to help their status and business as well. The rich mainly went to war too. Now it's backwards: the poor die for the rich, the non-rich pay the most taxes, the rich want to give as little as possible and when they give it's for tax breaks, and the rich get tax money for their businesses. People seem to praise some of the rich despite being ripped off by them. Maybe the masses like being ripped off and the rich give them what they want, since people are conditioned to be that way. The rich will get richer and focus only on money, and if you feel you are morally right and give things away for free, then don't blame them if they profit off you.
    You can only blame yourself. I am not saying don't give things away; I am saying don't hurt yourself and others. Play it smart: if you are smart enough to make something, find someone else smart enough to protect you. But don't hurt the masses that could benefit from your idea. It's like all those big tech companies getting rich from you using their product, since they make money from people visiting the site; you can be part of the problem even if you disagree with them. Sure, they will make their money without you, but the mentality of the masses is the issue. Maybe that can never change, but maybe you can help by using their tactics against them, if you have the capability. Or maybe resistance is futile and you should give up everything. As the ancient Greeks said, leisure time is for learning something new.
    0 replies | 11 view(s)
  • Trench's Avatar
    Yesterday, 23:11
    James: Cool. I thought about that too, replacing the most used symbols with smaller codes and the least used with bigger ones, but in another way. It's also kind of used in coding in some ways, I think.
    Gotty: Agreed, nice associations; my wording was wrong and I emphasized it too much. But that was a side comment about similarities from one field to another. The main point is not addressed, which is that programmers need completely different fields for perspective. As for randomness, you are right for the most part. When you open a combination lock you have a limited number of patterns to put in; the more digits, the more combinations. If it's 1 binary digit you have 50% odds, if it's 2 binary digits it's 1 in 4, with 3 binary digits and so on you get the idea. I don't know if you saw my other post about randomness, but I explained it in more detail with examples. In short, a computer does not know the difference between random and ordered data; we define it with formulas, in terms of what we understand. Maybe you are 100% right, but for now I do not fully agree. Also, sorry, but I disagree with discouraging others from working on random files. You have to push yourself to achieve something harder, which makes the rest feel easier. Random-file compression is the future, I feel, even though almost no one sees it. https://encode.su/threads/3338-Random-files-are-compressed-all-the-time-by-definition-is-the-issue I forgot my programming 20 years ago and only do simple things like HTML, Excel, or hex editing. It's a different mindset dealing with other things and being away from coding.
    Compgt: Very interesting comments. Do you have proof that it would be an issue? If you can, you could at least make one for yourself and have a dead man's switch. I made a post about that too. https://encode.su/threads/3346-No-hope-for-better-compression-even-if-possible People think of the positive aspects of finding the ultimate compression, but many ignore or fail to imagine the negative side of it. Good programmers, but not practical. What if you were in charge of a nation's GDP, or of a company's fiduciary duty, or of the livelihood of others? 1000 steps forward, 2000 steps back. If one is going to release fire, one had better be able to control it. This forum is for compression, but the balance is bigger than that; it's just not talked about, since again this forum is only for file compression. It's hard to balance so many aspects. Yes, they benefit from the occurrence of patterns, and that's the edge you take and exploit. Just like a boxer exploits the opponent's weakness, the same goes for coding. As for making money on their ideas: well, how did that work out for GIF? I should make a new topic, since I feel many are holding back out of fear, out of money, etc. Anyway, I suggest people do what Christ said: be like a child, to at least start from the beginning and understand how they learn. The obvious is not so obvious. I talk vaguely since I am trying to make others think about it and don't like to say much about it.
    20 replies | 400 view(s)
  • cssignet's Avatar
    Yesterday, 20:32
    these are the results i expected. i planned changes for rc2 45, as it could require more trials. thanks for your tests, they were very useful
    476 replies | 127114 view(s)
  • compgt's Avatar
    Yesterday, 18:39
    compgt replied to a thread 2019-nCoV in The Off-Topic Lounge
    Update, from me: Coronavirus is not new. We knew about it in the 1970s. In the end, i had many thought-up vaccines for coronaviruses, even for mutations. Many were following me, asking for my inputs. Because researcher groups were tipping me on their results too, even from on-the-spot lectures. But I recall a group deliberately brainstorming powerful coronaviruses mutations in the 1970s Cold War, maybe in early EDSA, QC, Philippines. They were the bio-weapons type of guys, who would willingly monetize on coronaviruses vaccines too. Now, this year 2020, many nations are offering billion$ worth for funding covid-19 vaccine research, to find an immediate vaccine. Need i say that the vaccines "hydroxychloroquine, avigan, and remdesivir" etc. were probably designed, named or suggested by me with my Ramizo relatives who were medical researchers saving the world from coronaviruses or bio-weapons? In the 1970s, actually. Japan, my ally, can probably validate this claim. If i was the one who designed remdesivir, then how many million$ for me from designing this vaccine? But that is not needed if i get my Hollywood and tech billion$. (5/7/2020)
    41 replies | 3464 view(s)
  • Scope's Avatar
    Yesterday, 18:35
    I did a test on the entire PNG set (after these multithreading changes are in Pingo, I will redo the speed results of all optimizers when I have time). 498 files (2 053 230 418 bytes).
    ect -1 -strip --mt-file *.png
      Processed 498 files, saved 332.32MB out of 1.91GB (16.9717%)
      Global Time = 342.581 (Kernel 0.015, User 0.000, Process 0.015; Virtual Memory 9 MB, Physical Memory 8 MB)
    proto -s0 -strip *.png
      proto - (213.94s): 498 files => 419141.53 KB - (20.90%) saved
      Global Time = 213.986 (Kernel 0.015, User 0.000, Process 0.015; Virtual Memory 9 MB, Physical Memory 8 MB)
    476 replies | 127114 view(s)
  • cssignet's Avatar
    Yesterday, 16:26
    here we are. proto is pingo, just with a fair comparison of threading. thanks to your test, i would fix that later (just chunks removal); this would be 'fixed' now. if you are still up for it, the last trial: proto -s0 -strip, on the set you have tested (~900 files) on your benchmark (heuristic test), and a comparison if possible with ECT -1 -strip. thanks!
    476 replies | 127114 view(s)
  • compgt's Avatar
    Yesterday, 16:18
    Gotty, the idea is to inform the computer science world that these tech companies were started up or planned during the Cold War. American students are now reaping the benefits of world peace which we were working on in the 1970s to the 80s. World peace happened because those militaristic power groups were satisfied already they will own these billion$-companies that i was creating and planning. I negotiated for world peace immediately because i was also thinking of my ownerships, of course! The next idea is clearly for money purposes now (amidst this pandemic and economic recessions around the world). They owe me. Plain and simple.
    8 replies | 1090 view(s)
  • compgt's Avatar
    Yesterday, 15:54
    Maybe there was a "Gotty" handle then, see there's a 'G' and 't' in Gotty? But i can't recall fully, his work on paq might be new indeed, done at present realtime, not 1980s. ivan2k2, i'm fine. Maybe even finer than you. I'm remembering right? :)
    8 replies | 1090 view(s)
  • Jyrki Alakuijala's Avatar
    Yesterday, 15:25
    Entropy coders tend to run at less than 10 Gbit/s per core in software. If you have a 10 GbE network connection and 4:1 compression, you need about 400% of a CPU core to fill the 10 Gbit network. 1% is roughly 400x (close to 3 orders of magnitude) off. It might be possible with relatively simple hardware. Still, you'd be using more than 10% of the memory bus just for storing the decompressed data. So, no, it is not possible with 1%.
    11 replies | 610 view(s)
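    A back-of-envelope restatement of the figures above (my own arithmetic, assuming entropy coding runs at roughly 10 Gbit/s, i.e. ~1.25 GB/s, per core):
      10 GbE link                      ~ 1.25 GB/s of compressed output
      at 4:1 compression               ~ 5 GB/s of uncompressed data to encode
      5 GB/s / 1.25 GB/s per core      ~ 4 cores = 400% CPU
      1% CPU would need                ~ 500 GB/s on one core, i.e. roughly 400x faster than current software entropy coders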
  • Scope's Avatar
    Yesterday, 15:10
    Same set, timer64 (Timer 14.00), pingo (43), proto. Hmm, not bad: proto is noticeably faster and more efficient at speed 5 (including with -mp=8), and faster but slightly less efficient at speed 0 (maybe the HDD in this configuration has an effect, so I tested the slower speed 5 too).
    First run:
    pingo.exe -s0 -strip *.png        (181.04s)  236 files => 128508.08 KB (15.57%) saved   Global Time = 181.088
    pingo.exe -s5 -strip *.png        (499.17s)  236 files => 148348.89 KB (17.97%) saved   Global Time = 499.209
    pingo.exe -s0 -strip *.png -mp=8  (122.29s)  236 files => 128508.08 KB (15.57%) saved   Global Time = 122.331
    pingo.exe -s5 -strip *.png -mp=8  (487.98s)  236 files => 148348.89 KB (17.97%) saved   Global Time = 488.028
    proto.exe -s0 -strip *.png        ( 97.50s)  236 files => 128497.20 KB (15.56%) saved   Global Time = 97.553
    proto.exe -s5 -strip *.png        (262.32s)  236 files => 151476.87 KB (18.35%) saved   Global Time = 262.347
    Second run:
    pingo.exe -s0 -strip *.png        (164.35s)  236 files => 128508.08 KB (15.57%) saved   Global Time = 164.398
    pingo.exe -s5 -strip *.png        (504.40s)  236 files => 148348.89 KB (17.97%) saved   Global Time = 504.451
    pingo.exe -s0 -strip *.png -mp=8  (124.96s)  236 files => 128508.08 KB (15.57%) saved   Global Time = 124.992
    pingo.exe -s5 -strip *.png -mp=8  (489.69s)  236 files => 148348.89 KB (17.97%) saved   Global Time = 489.742
    proto.exe -s0 -strip *.png        ( 97.89s)  236 files => 128497.20 KB (15.56%) saved   Global Time = 97.936
    proto.exe -s5 -strip *.png        (262.13s)  236 files => 151476.87 KB (18.35%) saved   Global Time = 262.171
    (All runs: Kernel/User/Process Time 0.000-0.046 = 0%, Virtual Memory = 9 MB, Physical Memory = 8 MB.)
    476 replies | 127114 view(s)
  • ivan2k2's Avatar
    Yesterday, 15:00
    One day he will say something like: "i gave Gotty few ideas about paq8px, but he dont want to mention me anywhere". The only help he needs is better doctors and/or forum ban.
    8 replies | 1090 view(s)
  • Gotty's Avatar
    Yesterday, 14:17
    I also don't know how to respond. Sorry, compgt. I know a girl who, when she was still attending kindergarten, told us many "true stories": that she owned dragons and extinguished fires (her father is a firefighter), and about many heroic acts she did. It was cute, but sometimes I could not handle her stories properly; I'm still unsure how to handle them well. Now she is attending school, and these imaginations of hers are slowly fading away. I can understand: she needed attention, more than she got. Maybe because she had a younger brother? I don't know. Today she is more mature. She still needs more attention than an average kid. It's also true that she is extremely intelligent - a very smart girl and very active. What you wrote - the style and content - reminds me of her stories. I have no idea why you would write what you wrote, and I don't know how to respond properly. Probably the reason is the same: you need attention, and don't get enough? But you are certainly not attending kindergarten, so I'm puzzled. And JamesWasil is also puzzled. We would tell you something useful, give you something that helps, but we don't know what. How can we help?
    8 replies | 1090 view(s)
  • cssignet's Avatar
    Yesterday, 11:04
    from user pov, perhaps. from devs, it makes sense to compare stuff in the same scope. anyway, the huge speed difference here would not be about compiler/flags, but an issue which seems to be related to my tool itself (possibly mp or heuristics that would fail on the specific set you have tested). if i could solve this, then pingo *should* be faster, as expected
    168 replies | 41288 view(s)
  • compgt's Avatar
    Yesterday, 10:38
    I'm not trolling. I don't intend it. I am not sowing discord among you. I am simply stating here the real history of modern computing: that the Philippines was the main venue of its making. (We were a military superpower, my clan. We were NASA/Starfleet.) And it's about justice, me being excluded as co-owner of the tech giants, and me being unpaid for my Hollywood music and movies. I made Star Wars, Star Trek and Transformers, you think that's cool? I say it would be cooler if i am duly paid for making these movies. So i ask million$, even billion$, from them. https://grtamayoblog.blogspot.com/2018/10/hollywood-billion.html?m=1
    Well, i'll say it again. I was a child genius in the 1970s to early 80s dictating on computer science matters. Don't be intimidated by the genius keyword; well, maybe a "talented, prolific, precocious" child, though i failed in realtime college in the 1990s. I was designing Intel cpus, DOS and Microsoft Windows, Microsoft Office, Borland compilers, Visual Studio, etc. I moderated on the tech giants because i got shares in them and ownership bonds. I favored Microsoft, IBM and Intel. I accepted AMD and Cyrix, so they existed. This is the timetable i dictated for data compression and encryption too. I co-developed many ZIP compressors with my cousins and aunts (pkzip, WinZip and WinRAR), snappy, arj. The bakery bread-named compressors by Google were probably developed with me; dnd's lzturbo optimization techniques related to cpu processing, even the ZSTD webpage on the Facebook website is familiar, suggesting i co-developed the zstd core algorithm too, like bzip2 and nanozip (i was the co-programmer, our ideas). I created and approved many ciphers too that i now rediscover in Wikipedia. Yet my ownership bonds were not honored!
    Understand me. I co-own Apple, Microsoft, IBM, Intel, AMD, Yahoo, Google, and Facebook in the 1970s. I remember i was designing the Facebook GUIs while making/shooting the "Star Trek: Enterprise" tv series. These companies, to me, were already existent as we were planning the timetable of their products and technologies. I negotiated among these companies for our future plans. I outlined the computer science timetable in Windows OSes, Visual compilers, and data compression and encryption technologies, among others. I co-own Apple such that Wozniak considered me his Boss. I was a genius designing computers and algorithms and software. We planned github and LinkedIn too. Then they all stole the corporate side of me, totally pushed me out. But in the 1980s they would still come to me to ask me about computer matters, exploiting me, especially on quantum computing which i and my family pioneered. The fact that they still went to me in the 1980s is proof i was a major player in tech. Others were there to steal my shares and ownerships, wanted to take videos of me saying i didn't own the companies, wanted me to sign papers or agreements with their constant threats.
    Already owning Apple, Microsoft, IBM, and Intel, i guess my plan was for me to officially be co-founder of Yahoo, Google, and Facebook at their official start-up dates in the 1990s and 2000s, which they just had to follow -- should have followed! I moderated on everything tech, that i co-own these tech giants! They're so hardcore greedy that they wanted my ownerships for themselves, brainwashed me. That sums it up. We're talking of immense wealth here. People will do anything to be in Google, for example. Naturally, the taller, bigger, and handsome Americans and Europeans look more credible than me. But modern computer science we designed from here, in the Philippines. See, the freedom to be and to do on the Net is spawning many geniuses' creative works in the many sciences around the globe now. And on top of all these achievements, i knew quantum computing would change the world.
    8 replies | 1090 view(s)
  • JamesWasil's Avatar
    Yesterday, 04:59
    I'm not sure how to answer this poll. I remember receiving spam emails about this, but I'm not sure if this is supposed to be an effort at trolling or an exercise for mental health awareness?
    8 replies | 1090 view(s)
  • cssignet's Avatar
    Yesterday, 01:21
    would you please do a few more tests, preferably on the same set, against pingo rc2 43 vs proto:
      pingo -s0 -strip
      pingo -s0 -strip -mp=8
      proto -s0 -strip
      proto -s0 -strip -mp=8
    it would be nice if you could post the log from pingo + timer/PP64 instead of pshell. thanks
    476 replies | 127114 view(s)
  • Jarek's Avatar
    Yesterday, 00:16
    In https://sites.google.com/site/powturbo/entropy-coder the fastest rANS has ~600/1500 MB/s/core enc/dec ... AC 33/22 MB/s/core. But in https://github.com/jkbonfield/rans_static there is a faster AC: 240/150 MB/s/core. Generally, ANS is much more convenient for vectorization due to the state being a single number; in AC one needs to process two numbers ... it might be doable, but I haven't seen it. Another story is SIMD 4-bit adaptive rANS - started in LZNA, it simultaneously updates 16 probabilities and compares against all CDFs to find the proper subrange.
    11 replies | 610 view(s)
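    For context, here is a minimal scalar sketch of the kind of rANS being discussed (my own toy code in the byte-renormalized style of ryg_rans, not taken from powturbo or LZNA). The slot-to-symbol search in the decode step is the part that the SIMD 4-bit variants do by comparing the slot against all 16 CDF entries at once; everything here is static rather than adaptive.

#include <cstdint>
#include <cstdio>
#include <vector>

// Toy single-state rANS over a 4-bit alphabet, byte-wise renormalization.
static const int      PROB_BITS  = 12;
static const uint32_t PROB_SCALE = 1u << PROB_BITS;   // total of all frequencies
static const uint32_t RANS_L     = 1u << 23;          // lower bound of the state

static uint16_t cdf[17];   // cdf[0]=0 .. cdf[16]=PROB_SCALE, freq[s]=cdf[s+1]-cdf[s]

static void encode( const std::vector<int>& syms, std::vector<uint8_t>& out ) {
    uint32_t x = RANS_L;
    for( int i = (int)syms.size() - 1; i >= 0; i-- ) {   // rANS encodes in reverse
        int s = syms[i];
        uint32_t freq = cdf[s+1] - cdf[s];
        uint32_t xmax = ((RANS_L >> PROB_BITS) << 8) * freq;
        while( x >= xmax ) { out.push_back( x & 0xff ); x >>= 8; }   // renormalize
        x = ((x / freq) << PROB_BITS) + (x % freq) + cdf[s];
    }
    for( int i = 0; i < 4; i++ ) { out.push_back( x & 0xff ); x >>= 8; }   // flush state
}

static void decode( const std::vector<uint8_t>& in, size_t count, std::vector<int>& syms ) {
    size_t pos = in.size();
    uint32_t x = 0;
    for( int i = 0; i < 4; i++ ) x = (x << 8) | in[--pos];   // reload flushed state
    for( size_t n = 0; n < count; n++ ) {
        uint32_t slot = x & (PROB_SCALE - 1);
        int s = 0;
        while( cdf[s+1] <= slot ) s++;   // SIMD versions test all 16 CDFs in parallel here
        x = (uint32_t)(cdf[s+1] - cdf[s]) * (x >> PROB_BITS) + slot - cdf[s];
        while( x < RANS_L ) x = (x << 8) | in[--pos];        // renormalize
        syms.push_back( s );
    }
}

int main() {
    cdf[0] = 0;                                   // skewed toy model: symbol 0 dominates
    for( int s = 0; s < 16; s++ )
        cdf[s+1] = cdf[s] + ( s == 0 ? PROB_SCALE - 15*16 : 16 );
    std::vector<int> msg = { 0,0,0,0,3,0,0,7,0,0,0,15,0,0,1,0 };
    std::vector<uint8_t> stream;
    encode( msg, stream );
    std::vector<int> back;
    decode( stream, msg.size(), back );
    printf( "%zu symbols -> %zu bytes, roundtrip %s\n",
            msg.size(), stream.size(), back == msg ? "ok" : "FAILED" );
    return 0;
}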
  • Sportman's Avatar
    3rd July 2020, 23:04
    Sportman replied to a thread Fp8sk in Data Compression
    TS40.txt: 79,571,976 bytes, 2,152.885 sec., fp8sk1 -8
    6 replies | 251 view(s)
  • CompressMaster's Avatar
    3rd July 2020, 21:37
    Thanks, but I checked the video test in full HD and I am unsatisfied with the quality. So I need something better: full HD video and a rugged design such as Blackview.
    4 replies | 84 view(s)
  • Darek's Avatar
    3rd July 2020, 18:08
    Darek replied to a thread Paq8pxd dict in Data Compression
    Ok, then there are no changes...
    15'655'526 - enwik8 -x15 -w -e1,english.dic by Paq8pxd_v89, change: -0,01%
    15'654'147 - enwik8 -x15 -w -e1,english.dic by paq8pxd_v89_40_3360, change: -0,01%
    Looks like 14KB of gain for enwik9...
    952 replies | 319334 view(s)
  • compgt's Avatar
    3rd July 2020, 17:54
    Hear me, hear me, hear me: https://encode.su/threads/3338-Random-files-are-compressed-all-the-time-by-definition-is-the-issue?#10 https://grtamayoblog.blogspot.com/2020/02/paq-compression-programs.html?m=1
    8 replies | 1090 view(s)
  • Gotty's Avatar
    3rd July 2020, 17:21
    SolidComp, the BLU G6 is unfortunately not available around here (in Slovakia, where CompressMaster resides). Nevertheless, I checked this phone and found that users and reviews are not really satisfied. Its screen is not HD, its camera seems to be low end, and the screen is not Gorilla Glass protected. It is certainly a low-budget model. I'm afraid low-budget phones are not the kings of durability. Why did you suggest this phone? Because of its price?
    4 replies | 84 view(s)
  • Shelwien's Avatar
    3rd July 2020, 16:55
    > You said SLZ wasn't the fastest deflate implementation, now you're talking about Zstd and LZ4.
    Well, they can be modified to write to the deflate format. The slow part in LZ encoding is matchfinding, not huffman coding.
    > I was asking what other deflate implementations are faster.
    Based on this, intel gzip is faster: https://sites.google.com/site/powturbo/home/web-compression Also there's hardware for deflate encoding: https://en.wikipedia.org/wiki/DEFLATE#Hardware_encoders
    > On memory managers, you mean OS, or something application specific?
    Layers are added by the OS, the standard library, and sometimes the app itself too. SLZ is designed for a special use case where it has to compress 100s of potentially infinite streams in parallel on cheap hardware. However it's not a good solution for other compression tasks, not even storage or filesystems.
    15 replies | 352 view(s)
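    To make the "matchfinding is the slow part" point concrete, here is a toy sketch (mine, not SLZ's or zlib's actual matchfinder; "enwik8" is just an example input name): even the simplest hash-table matchfinder touches a large table and compares bytes at every input position, while the Huffman stage afterwards is one table lookup per token.

#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

// Toy greedy matchfinder: one hash-table slot per 4-byte prefix, no chains,
// no entropy coding at all. It only measures how much of the input could be
// covered by matches; even this skeleton is where an LZ encoder spends its cycles.
int main( int argc, char** argv ) {
    const char* name = argc > 1 ? argv[1] : "enwik8";
    FILE* f = fopen( name, "rb" );
    if( !f ) { perror( name ); return 1; }
    std::vector<uint8_t> in;
    int c;
    while( (c = fgetc( f )) != EOF ) in.push_back( (uint8_t)c );
    fclose( f );

    const int    MIN_MATCH = 4;
    const size_t HASH_SIZE = 1u << 16;
    std::vector<int64_t> head( HASH_SIZE, -1 );          // last position seen per hash
    int64_t matched = 0, literals = 0, i = 0, n = (int64_t)in.size();
    while( i + MIN_MATCH <= n ) {
        uint32_t h;
        memcpy( &h, &in[i], 4 );                          // 4-byte prefix
        h = (h * 2654435761u) >> 16;                      // Fibonacci hash -> 16 bits
        int64_t cand = head[h];
        head[h] = i;
        int64_t len = 0;
        if( cand >= 0 )
            while( i + len < n && in[cand + len] == in[i + len] ) len++;
        if( len >= MIN_MATCH ) { matched += len; i += len; }
        else                   { literals++;     i++;      }
    }
    literals += n - i;                                    // tail bytes
    printf( "%s: %lld bytes, %lld covered by matches, %lld literals\n",
            name, (long long)n, (long long)matched, (long long)literals );
    return 0;
}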
  • SolidComp's Avatar
    3rd July 2020, 16:39
    Should SIMD rANS always beat SIMD AC?
    11 replies | 610 view(s)
  • Shelwien's Avatar
    3rd July 2020, 16:39
    Shelwien replied to a thread Paq8pxd dict in Data Compression
    @Darek: -s doesn't use mod_ppmd
    952 replies | 319334 view(s)
  • SolidComp's Avatar
    3rd July 2020, 16:37
    You said SLZ wasn't the fastest deflate implementation, now you're talking about Zstd and LZ4. I was asking what other deflate implementations are faster. On memory managers, you mean OS, or something application specific?
    15 replies | 352 view(s)
  • Shelwien's Avatar
    3rd July 2020, 16:33
    Shelwien replied to a thread Fp8sk in Data Compression
    @suryakandau: Don't mind him. In your case it's better to make new threads. Or even better if you'd just use one thread for all of your clones of codecs. @CompressMaster: Do you really want *sk versions in the main threads of these codecs?
    6 replies | 251 view(s)
  • Shelwien's Avatar
    3rd July 2020, 16:14
    > I don't follow. There's a difference between Huffman trees and Huffman coding in this context? What's the difference?
    Dynamic and static coding are different algorithms. Dynamic is more complex and slower, but doesn't require multiple passes. "How the huffman tree is generated", "how the tree is encoded" and "how the data is encoded using the tree" are completely unrelated questions; only the last one determines whether coding is static or dynamic.
    > Those are just the stats from Willie's benchmarks.
    Well, the only table there with 100M for zlib is for "HAProxy running on a single core of a core i5-3320M, with 500 concurrent users, the default 16kB buffers", so 100M is used by 500 instances of zlib, thus ~200kb per instance, which is reasonable. As to CPU usage, it's actually limited by output bandwidth there: "gigabit Ethernet link (hence the 976 Mbps of IP traffic)". So I guess your "CPU" is supposed to be measured as (976000000/8)*100/(encoding_speed_in_bytes/compression_ratio), so 256MB/s corresponds to 100% and 512MB/s to 50% (at SLZ's CR 2.1). This is not something obvious and is only defined for a specific use case, so don't expect anybody to understand you without an explicit definition of the term.
    > How else do you measure memory use but by measuring memory use during program execution?
    Sure, it's okay in the libslz case, because he compares stats of the same program (haproxy), just with different compression modules. Comparing different programs like that is much less precise, because the stats don't show which algorithms use the memory. Anyway, in this case the best approach would be to use internal measurement (by replacing the standard memory manager). For example, this:
      for( i=0; i<1000000; i++ ) new char;   // PeakWorkingSetSize: 34,775,040
    and this:
      for( i=0; i<1; i++ ) new char[1000000];   // PeakWorkingSetSize: 2,777,088
    both allocate exactly 1000000 bytes of useful memory. But the memory usage of the corresponding executables is 17x higher in the first case. Why? Because of the overhead of the specific default memory manager, which could be replaced with something custom-made, and then the examples would suddenly be equal in memory usage.
    > I like software that is engineered for performance and low overhead, like SLZ. I want it to be 1% CPU or so.
    So in SLZ terms that's 25GB/s. SLZ itself only reaches 753MB/s (in some special cases), so good luck with that. Maybe sleeping for 10-20 years would help :)
    > It's faster than libdeflate. What else is there?
    Well, zstd: https://github.com/facebook/zstd#benchmarks or LZ4: https://github.com/lz4/lz4#benchmarks https://sites.google.com/site/powturbo/compression-benchmark https://sites.google.com/site/powturbo/entropy-coder Thing is, normally decoding is the more frequent operation, so codec developers mostly work on decoding speed optimization.
    15 replies | 352 view(s)
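    A minimal sketch of that internal-measurement idea (my own illustration, not Shelwien's code): replace the global allocator with counting wrappers, so the program reports the bytes it actually asks for, independent of the allocator overhead that shows up in PeakWorkingSetSize.

#include <cstdio>
#include <cstdlib>
#include <new>

static size_t g_bytes_requested = 0;   // useful bytes asked for via new
static size_t g_allocations     = 0;   // number of new calls

void* operator new( size_t size ) {
    g_bytes_requested += size;
    ++g_allocations;
    if( void* p = malloc( size ) ) return p;
    throw std::bad_alloc();
}
void operator delete( void* p ) noexcept { free( p ); }

int main() {
    for( int i = 0; i < 1000000; i++ ) new char;        // many tiny allocations
    printf( "requested: %zu bytes in %zu allocations\n",
            g_bytes_requested, g_allocations );         // 1000000 useful bytes,
    return 0;                                           // whatever the allocator's overhead
}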
  • Scope's Avatar
    3rd July 2020, 16:09
    Unfortunately, I read it late and have already rewritten version 40, but I have done other tests with a different set: 236 files (845 416 026 bytes).
    ect -1 -strip --mt-file *.png                               TotalMilliseconds : 187197,7023
    ppx2 -P 8 -L 1 ect.exe -1 -strip "{}" (8 MF)                TotalMilliseconds : 201311,9329
    ect -5 -strip --mt-file *.png                               TotalMilliseconds : 505926,7081
    ppx2 -P 8 -L 1 ect.exe -5 -strip "{}" (8 MF)                TotalMilliseconds : 514763,0062
    pingo (41) -s0 -strip *.png                                 TotalMilliseconds : 163415,1559
    ppx2 -P 8 -L 1 pingo (41) -s0 -strip -nomulti "{}" (8 MF)   TotalMilliseconds : 142454,0556
    pingo (41) -s5 -strip *.png                                 TotalMilliseconds : 498413,4901
    ppx2 -P 8 -L 1 pingo (41) -s5 -strip -nomulti "{}" (8 MF)   TotalMilliseconds : 398038,6889
    pingo (42) -s0 -strip *.png                                 TotalMilliseconds : 126019,8598
    pingo (42) -s5 -strip *.png                                 TotalMilliseconds : 493232,3343
    476 replies | 127114 view(s)
  • Scope's Avatar
    3rd July 2020, 15:25
    Yes and no. If people are able to use a better compiler, flags, etc. for open-source applications, they are likely to do so (or to use more optimally compiled binaries from other people). Closed-source applications may be compiled with an old or suboptimal compiler version (for example, in my experience the speed of some applications differs noticeably even between MSVC/GCC/Clang) and with non-optimal flags, but nothing can be done about that; it's not a user problem. I won't be able to "speed them up" even if I want to, although I could compile other applications with the same version, flags and compiler to make them slower (but since this isn't a speed test of individual algorithms and compilers, rather a comparison of the applications as they will actually be used, that wouldn't be quite an honest comparison either). As an alternative, there could be two versions: a stable one for generic CPUs and one for more modern CPUs with AVX2 support, more aggressive optimization flags, etc., like when I tested Lepton and there were different compiled versions: https://github.com/dropbox/lepton/releases. It's also better to move this to another topic (like ECT), as this is no longer a discussion of Google projects.
    168 replies | 41288 view(s)
  • Darek's Avatar
    3rd July 2020, 15:23
    Darek replied to a thread Paq8pxd dict in Data Compression
    15'728'903 - enwik8 -s15 -w -e1,english.dic by Paq8pxd_v89, change: -0,02%
    15'728'903 - enwik8 -s15 -w -e1,english.dic by paq8pxd_v89_40_3360, change: 0,00%
    Hmmm... identical scores! Testing -x15...
    952 replies | 319334 view(s)
  • suryakandau@yahoo.co.id's Avatar
    3rd July 2020, 14:58
    I agree with you, but what about Jan Ondrus? Does he agree if I make some improvements to it?
    6 replies | 251 view(s)
  • LucaBiondi's Avatar
    3rd July 2020, 14:56
    LucaBiondi replied to a thread Paq8pxd dict in Data Compression
    Great!!! Have a good day!
    952 replies | 319334 view(s)
  • CompressMaster's Avatar
    3rd July 2020, 14:53
    CompressMaster replied to a thread Fp8sk in Data Compression
    @suryakandau@yahoo.co.id May I know WHY you posted a new version in a separate thread AGAIN? If it's an upgrade (new features, tweaked code, etc.) IT WOULD BE FAR BETTER to have only one thread - the one Jan Ondrus started. If you disagree with my statement, consider an extreme case - suppose we decided to make a new thread for every new paq8px version. The result? Complicated navigation through many irrelevant threads. So please stop that, otherwise I'll report you to Shelwien.
    6 replies | 251 view(s)
  • suryakandau@yahoo.co.id's Avatar
    3rd July 2020, 13:58
    Paq8sk30 -s1 ts40.txt: Total 400000000 bytes compressed to 80211410 bytes. Time 76785.83 sec, used 658 MB (690640027 bytes) of memory
    Paq8sk32 -s1 ts40.txt: Total 400000000 bytes compressed to 79461853 bytes. Time 65043.98 sec, used 659 MB (691656451 bytes) of memory
    I wonder, with paq8sk32 -x15 -e1,english.dic on ts40.txt, can it reach below 70.xxx.xxx?
    140 replies | 10953 view(s)
  • Darek's Avatar
    3rd July 2020, 13:39
    Darek replied to a thread Paq8pxd dict in Data Compression
    I'll test it. Starting from enwik8.
    952 replies | 319334 view(s)
  • compgt's Avatar
    3rd July 2020, 13:11
    Maybe Jyrki and Google (with all their expertise) can implement RLLZ into an actual compressor and see actual gains for LZ77 and LZSS... Since the write buffer is imposed in the decoding algorithm, it should be very fast like byte-aligned LZ.
    18 replies | 928 view(s)
  • LucaBiondi's Avatar
    3rd July 2020, 13:06
    LucaBiondi replied to a thread Paq8pxd dict in Data Compression
    Hi Darek, are you able to test enwik9? I can't, because it is too big for me! Thank you!!! I will try to do some other experiments!!!!
    952 replies | 319334 view(s)
  • compgt's Avatar
    3rd July 2020, 12:37
    Thanks for replying Gotty! Your posts here at encode.su are very informative, clearly well explained and sure are very helpful to anyone doing data compression, experts or enthusiasts alike. > I do encourage you to experiment even more. Well, maybe not too much in random data compression, but on algorithms very different than Huffman, LZ, grammar based, and arithmetic/ANS coding. If luck wills it, the question is again how programmers can "monetize" on their compression ideas and compressors.
    20 replies | 400 view(s)
  • Darek's Avatar
    3rd July 2020, 12:26
    Darek replied to a thread Paq8pxd dict in Data Compression
    Scores for the 4 corpuses by paq8pxd_v89_40_3360. This version gets better scores for all corpuses. For the Silesia, Calgary and MaximumCompression sets these are paq8pxd records; moreover, the MaximumCompression tar version at 5'991'491 bytes is in my opinion the best score ever! For Silesia there are 15KB of gain - nice! Another thing worth mentioning: for the "nci" file from Silesia this version got the best score ever - it beat cmix v18! The same goes for A10.jpg, FP.LOG and vcfiu.hlp from MaximumCompression!
    952 replies | 319334 view(s)
  • Gotty's Avatar
    3rd July 2020, 12:06
    I'm happy that you are happy and relieved. I think trying to compress random data is a must for everyone who wants to understand entropy. I'm with you, I understand your enthusiasm, and I do encourage you to experiment even more. After you understand it deeply, you will not post more such ideas. ;-)
    20 replies | 400 view(s)
  • compgt's Avatar
    3rd July 2020, 11:24
    Dresdenboy, if you're interested in LZSS coding, search for my "RLLZ" in this forum for my remarks on it. RLLZ doesn't need literal/match prefix bit, no match_len for succeeding similar strings past the initially encoded string, and no literal_len. It was my idea in high school (1988-1992), but i was forgetting computer programming then and we didn't have access to a computer. I remembered it in 2018, so it's here again, straightened out, better explained. https://encode.su/threads/3013-Reduced-Length-LZ-(RLLZ)-One-way-to-output-LZ77-codes?highlight=RLLZ
    18 replies | 928 view(s)
  • compgt's Avatar
    3rd July 2020, 11:01
    It's a relief that somebody here is admitting he actually tried compressing random files, like me. And actually suggests us to experiment with a random file. But not too much i guess. I tried random compression coding in 2006-2007 that i actually thought i solved it, that i thought i got a random data compressor. I feared the Feds and tech giants will come after me, so i deleted the compressor, maybe even without a decoder yet. Two of my random compression ideas are here: https://encode.su/threads/3339-A-Random-Data-Compressor-s-to-solve RDC#1 and RDC#2 are still promising, worth the look for those interested. Maybe, i still have some random compression ideas but i am not very active on it anymore. There are some "implied information" that a compressor can exploit such as the order or sequence of literals (kinda temporal) in my RLLZ idea, and the minimum match length in lzgt3a. Search here in this forum "RLLZ" for my posts. https://encode.su/threads/3013-Reduced-Length-LZ-(RLLZ)-One-way-to-output-LZ77-codes?highlight=RLLZ > Randomness is an issue. And randomness is the lack of useful patterns. Randomness is the lack of useful patterns, i guess, if your algorithm is a "pattern searching/encoding" algorithm. Huffman and arithmetic coding are not pattern searchers but naturally benefit on the occurrences of patterns. LZ based compressors are.
    20 replies | 400 view(s)
  • Aniskin's Avatar
    3rd July 2020, 10:25
    Aniskin replied to a thread 7-zip plugins in Data Compression
    Is there a way to get a sample of such file to debug?
    10 replies | 3263 view(s)
  • Bulat Ziganshin's Avatar
    3rd July 2020, 10:24
    First, we need to extend the vocabulary:
    - static code: a single encoding for the entire file; the encoding tables are stored in the compressed file header
    - block-static: the file is split into blocks, and each block has its own encoding tables stored with the block
    - dynamic: encoding tables are computed on-the-fly from previous data, updated every byte
    - block-dynamic: the same, but the encoding tables are updated e.g. once every 1000 bytes
    So:
    - the first LZ+entropy coder was lzari, with dynamic AC
    - the second one was lzhuf aka lharc 1.x, with dynamic huffman
    - then pkzip 1.x got Implode with a static huffman coder. It employed canonical huffman coding, so the compressed file header stored only 4 bits of prefix code length for each of the 256 chars (and more for the LZ codewords)
    - then ar002 aka lha 2.x further improved this idea and employed block-static huffman. It also added secondary encoding tables used to encode the code lengths in the block header (instead of a fixed 4-bit field)
    - pkzip 2.x added deflate, which used just the same scheme as ar002 (in this aspect; there were a lot of other changes)
    Since then, block-static huffman has become the de-facto standard for fast LZ77-based codecs. It's used in RAR2 (which is based on deflate), cabarc (lzx), brotli (which added some O1 modelling), and zstd (which combines block-static huffman with block-static ANS).
    Static/block-static codes require two passes over the data - first you compute frequencies, then build the tables and encode the data. You can avoid the first pass by using fixed tables (or use more complex tricks, such as building tables on the first 1% of the data). A deflate block header specifies whether the block uses custom encoding tables stored in the block header (DYNAMIC) or fixed ones defined in the spec (STATIC), so this field in the block has its own vocabulary. Tornado implements both block-dynamic huffman and block-dynamic AC. Dynamic/block-dynamic codes use only one pass over the data, and block-dynamic coding is as fast as the second pass of *-static coding.
    15 replies | 352 view(s)
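    As an illustration of why storing only the code lengths is enough (my own sketch, not from the post above): canonical Huffman assigns codes deterministically from the lengths, so a decoder can rebuild exactly the same tables from a few bits of length per symbol.

#include <cstdint>
#include <cstdio>
#include <vector>

// Rebuild canonical Huffman codes from code lengths alone - which is why a
// format like Implode or deflate only stores a small length field per symbol
// in the header instead of the whole tree.
std::vector<uint32_t> canonical_codes( const std::vector<int>& len, int maxLen ) {
    std::vector<int> count( maxLen + 1, 0 );
    for( int l : len ) if( l ) count[l]++;
    std::vector<uint32_t> next( maxLen + 1, 0 );
    uint32_t code = 0;
    for( int l = 1; l <= maxLen; l++ ) {            // shorter lengths get numerically
        code = (code + count[l-1]) << 1;            // smaller codes (RFC 1951 style)
        next[l] = code;
    }
    std::vector<uint32_t> codes( len.size(), 0 );
    for( size_t s = 0; s < len.size(); s++ )        // symbols of equal length are
        if( len[s] ) codes[s] = next[ len[s] ]++;   // numbered in symbol order
    return codes;
}

int main() {
    std::vector<int> len = { 1, 2, 3, 3 };          // code lengths for symbols A,B,C,D
    std::vector<uint32_t> codes = canonical_codes( len, 3 );
    for( size_t s = 0; s < len.size(); s++ ) {
        printf( "symbol %c: ", (char)('A' + s) );
        for( int b = len[s] - 1; b >= 0; b-- ) putchar( '0' + ((codes[s] >> b) & 1) );
        putchar( '\n' );                            // prints 0, 10, 110, 111
    }
    return 0;
}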
  • Gotty's Avatar
    3rd July 2020, 09:16
    Gotty replied to a thread Fp8sk in Data Compression
    6 replies | 251 view(s)
  • Dresdenboy's Avatar
    3rd July 2020, 08:36
    My own experiments look promising. With a mix of LZW, LZSS and numeral system ideas (not ANS though ;)), I can get close to apultra, exomizer, packfire for smaller files, while the decompression logic is still smaller than theirs.
    39 replies | 2558 view(s)
  • Dresdenboy's Avatar
    3rd July 2020, 08:33
    Here's another paper describing an LZSS variant for small sensor data packets (so IoT, sensor mesh, SMS, network message compression related works look promising): An Improving Data Compression Capability in Sensor Node to Support SensorML-Compatible for Internet-of-Things http://bit.kuas.edu.tw/~jni/2018/vol3/JNI_2018_vol3_n2_001.pdf
    18 replies | 928 view(s)
  • Bulat Ziganshin's Avatar
    3rd July 2020, 08:28
    tANS requires memory lookups, which are limited to 2 loads/cycle in the best case (on Intel CPUs), so probably you can't beat SIMD rANS.
    11 replies | 610 view(s)
  • suryakandau@yahoo.co.id's Avatar
    3rd July 2020, 08:22
    @sportman/Darek could you test it on GDCC public test set file (test 1,2,3,4) please ?
    6 replies | 251 view(s)
  • SolidComp's Avatar
    3rd July 2020, 05:43
    Durability is testable, but the problem is that the data is not public. We have no way of knowing how durable a phone's buttons are, since we have no access to the manufacturer's test results. Both the manufacturers and the wireless carriers conduct extensive durability testing. A good example is T-Mobile's robot "Tappy", the one Huawei tried to copy: https://www.npr.org/2019/01/29/689663720/a-robot-named-tappy-huawei-conspired-to-steal-t-mobile-s-trade-secrets-says-doj Since extensive testing is expensive and requires large teams, one inference we can make is that phones from the top two or three companies are likely to be the best tested and most durable. So Samsung and Apple. Any phone sold in the last few years can easily record 1080p video. Most newer phones can do 4K, usually at 60 fps. I just checked cheap phones on Amazon and the BLU G6 records 1080p and costs $90 in the US. It's probably not as durable as a Samsung Galaxy S20, but flagships have gotten extremely expensive, like $1,000.
    4 replies | 84 view(s)
  • suryakandau@yahoo.co.id's Avatar
    3rd July 2020, 04:40
    this is based on fp8v6 with a small improvement
    6 replies | 251 view(s)
  • SolidComp's Avatar
    3rd July 2020, 02:37
    I don't follow. There's a difference between Huffman trees and Huffman coding in this context? What's the difference? Those are just the stats from Willie's benchmarks. How else do you measure memory use but by measuring memory use during program execution? I like software that is engineered for performance and low overhead, like SLZ. I want it to be 1% CPU or so. SIMD can be one way to get there, so long as it doesn't throttle down core frequencies like using AVX-512 does. What is faster than SLZ? It's much faster than the ridiculous Cloudflare zlib patch that no one can actually build anyway (typical Cloudflare). It's faster than libdeflate. What else is there?
    15 replies | 352 view(s)
  • Shelwien's Avatar
    3rd July 2020, 01:36
    Shelwien replied to a thread Paq8pxd dict in Data Compression
    > I am trying to learn how paq8px/pxd works but you know is not easy at all!
    Did you read DCE? http://mattmahoney.net/dc/dce.html
    > I am a delphi developer and i know c only a little.
    Fortunately paq doesn't use that much of mainstream C++. There're some tools like this: https://github.com/WouterVanNifterick/C-To-Delphi
    > What do you mean exactly with "instance updated with filtered byte values"
    mod_ppmd.inc has this function (at the end):
      U32 ppmd_Predict( U32 SCALE, U32 y ) {
        if( cxt==0 ) cxt=1; else cxt+=cxt+y;
        if( cxt>=256 ) ppmd_UpdateByte( U8(cxt) ), cxt=1;
        if( cxt==1 ) ppmd_PrepareByte();
        U32 p = U64(U64(SCALE-2)*trF)/trT+1;
        return p;
      }
    Unlike paq's main model, it does the update for a whole byte at once, so we can change it to something like ppmd_UpdateByte( Map ) which could provide effects similar to preprocessing.
    952 replies | 319334 view(s)
  • LucaBiondi's Avatar
    3rd July 2020, 01:18
    LucaBiondi replied to a thread Paq8pxd dict in Data Compression
    Thank you Darek!!!
    952 replies | 319334 view(s)
  • LucaBiondi's Avatar
    3rd July 2020, 01:18
    LucaBiondi replied to a thread Paq8pxd dict in Data Compression
    Hi Shelwien, thank you for your explanation! I am trying to learn how paq8px/pxd works, but you know it is not easy at all! I am a Delphi developer and I know C only a little. But... I am trying, and my goal is to learn. What do you mean exactly by "instance updated with filtered byte values"? If you have some time, try to explain it to me, and if I am able, I will do it. Luca. Teach a developer and you will gain a colleague... :_yes3:
    952 replies | 319334 view(s)
  • Gotty's Avatar
    3rd July 2020, 01:18
    You need to actually try experimenting with a random file, and you'll see it with your own eyes. Saying "as I said before" is not enough. You must try. It's worth it.
    First, let's fix your definition a little bit. This is a somewhat better definition of randomness: What does that mean? Do you see a pattern here: 000000000000000000000 ? These are 21 '0' bits. And is there a pattern here: 1111111111111111111111 ? These are 22 '1' bits. Yes, indeed, they are patterns: repeating bits. But these patterns are unfortunately worthless in the file where I found them. How can that be? Let me show you.
    I grabbed the latest 1MB random file (2020-07-02.bin) from https://archive.random.org/binary The above bit patterns are in this random file. They are the longest repeats of zeroes and ones, and there is only one of each. No more. You will understand the real meaning of a "useful" pattern when you try to actually make the file shorter by using the fact that it contains these patterns. When you would like to encode the information that the 1111111111111111111111 pattern is there, you will need to encode the position where this pattern is found in the file (and its length, of course). It starts at bit position 5245980 and its length is 22. The file being 8388608 bits (or 1048576 bytes) long, encoding any position in this file costs log2( 8388608 ) = 23 bits. Oh. See the problem? Even though the pattern of 22 repeating '1' bits is in the file, it is still not long enough to be useful. Encoding this info would cost us at least 23 bits; we cannot use it to shorten the file. And there are no longer repeats... We are out of luck.
    When I first started experimenting with data compression I was trying to compress random files and find patterns. Like everybody else, I guess. I did find patterns, but not useful ones. Eventually, when you count all (!) possible bit combinations in a random file, you end up with the pure definition of randomness: everything has a near 50% chance. Count the number of '1's and '0's. Count the number of '00', '01', '10', '11', ... all of them will have a near equal probability. When I first experienced that it was of course discouraging, but beautiful at the same time.
    Lack of patterns? No. Lack of useful patterns. Let me quote you again: Randomness is an issue. And randomness is the lack of useful patterns.
    20 replies | 400 view(s)
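    A small sketch of the experiment described above (my own illustration; the default file name just refers to the random.org file mentioned in the post): scan a file for the longest run of equal bits and compare it with the log2(position) cost of pointing at it.

#include <cstdio>
#include <cmath>
#include <vector>

int main( int argc, char** argv ) {
    const char* name = argc > 1 ? argv[1] : "2020-07-02.bin";
    FILE* f = fopen( name, "rb" );
    if( !f ) { perror( name ); return 1; }
    std::vector<unsigned char> buf;
    int c;
    while( (c = fgetc( f )) != EOF ) buf.push_back( (unsigned char)c );
    fclose( f );

    long long totalBits = (long long)buf.size() * 8;
    int run = 0, best = 0, prev = -1;
    long long bestPos = 0;
    for( long long i = 0; i < totalBits; i++ ) {
        int bit = (buf[i >> 3] >> (7 - (i & 7))) & 1;    // MSB-first bit order
        run = (bit == prev) ? run + 1 : 1;
        prev = bit;
        if( run > best ) { best = run; bestPos = i - run + 1; }
    }
    double posCost = std::log2( (double)totalBits );     // bits needed to encode a bit position
    printf( "longest run: %d bits at bit offset %lld\n", best, bestPos );
    printf( "encoding its position alone costs ~%.1f bits\n", posCost );
    printf( "%s\n", best > posCost ? "long enough to possibly exploit"
                                   : "too short to pay for its own address" );
    return 0;
}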
  • JamesB's Avatar
    3rd July 2020, 01:08
    For what it's worth I reckon tANS ought to be faster still than rANS, but I'm not aware of anyone going to such extreme SIMD levels. I only did static rANS because I understood it and kind of stuck with it. :-) A lame reason really!
    11 replies | 610 view(s)
  • cssignet's Avatar
    3rd July 2020, 00:27
    yes, in that case it would not be really surprising that pingo would be slower with mp. IMHO, the test would be unfair: first, an optimized binary (AVX2 etc) vs generic SSE2; second but not least, the mp in pingo rc2 40 would not run the same number of threads (not in the same exact context). since i do not have the required hardware, when you have some time, could you try the mp in rc2 41 vs 40? -s0 on ~96 files should be enough (if it works - perhaps it would be possible to make it better). thanks
    168 replies | 41288 view(s)
  • Gotty's Avatar
    3rd July 2020, 00:11
    Durability is not really measurable, but satisfaction rate (which includes durability, too) is. The Xiaomi Redmi Note 8T has the highest user satisfaction rate among currently available affordable phones. Edit: added user satisfaction rates for your current phone and the Xiaomi Redmi Note 8T; source: the Hungarian market. Edit: added the same from amazon.de, but only for the Xiaomi Redmi Note 8T; for your Lenovo it is no longer available.
    4 replies | 84 view(s)
  • JamesWasil's Avatar
    3rd July 2020, 00:01
    The advantage that arithmetic encoding has over Huffman and other prefix implementations is that it can assign a fractional number of bits to a symbol rather than a whole bit. You're going to get more out of arithmetic compression whenever the symbol probabilities are not negative powers of 2, because of the amount gained from the fractional difference. For example, if you were to measure the probabilities of symbols in a file and see that you only needed 1.67 bits per symbol, with arithmetic encoding you'll be able to represent it as 1.67 bits, or very close to it, on a fractional margin; whereas with Huffman and other prefix methods it still takes at least 2 whole bits to represent the same probability. Whenever these cases occur, you save the difference (in this example, 0.33 of a bit) for each and every occurrence of that probability. These savings add up, and the larger your data source is, the more significant the savings of fractional bits will be over Huffman and other methods. So basically, Huffman is only optimal when the symbol probabilities are exact powers of 1/2; if they're not, arithmetic encoding will give you more. If you're combining contexts, you'll usually get more out of arithmetic compression as well, because the additional contexts can be represented with fractional bits and save on that difference, too.
    15 replies | 352 view(s)
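    A small worked example of that gap (my own numbers, not from the post above): take four symbols with probabilities 0.7, 0.15, 0.1 and 0.05. The entropy is -sum(p*log2 p) ~ 1.32 bits/symbol, which an arithmetic coder can approach. A Huffman code for the same distribution assigns lengths 1, 2, 3 and 3, for an average of 0.7*1 + 0.15*2 + 0.1*3 + 0.05*3 = 1.45 bits/symbol, i.e. about 0.13 bits/symbol (roughly 9%) lost to whole-bit rounding.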
  • Gotty's Avatar
    2nd July 2020, 23:40
    Summary.
    >> And Hoffman theory is useless if not for programmers.
    As you see from the examples above, compression theory is embedded in our decisions and is a serious part of our everyday life. We didn't really invent compression; we discovered it and formulated it mathematically and statistically. It's just everywhere.
    20 replies | 400 view(s)
  • Gotty's Avatar
    2nd July 2020, 23:27
    And finally... language (morphology). In every (human) language, the words that are used most often tend to be the shortest (Zipf's law). We humans intuitively made our languages optimal in the sense that we express our thoughts and exchange information with the least effort - the fewest possible sounds and the fewest possible letters to convey the intended information. Thus we compress information as we speak, in a Huffman-like way. Isn't it phenomenal?
    20 replies | 400 view(s)
  • Shelwien's Avatar
    2nd July 2020, 23:00
    > Is he mistaken about zlib? http://www.libslz.org/
    No, you are. What he says is "Instead, SLZ uses static huffman trees" and "the dynamic huffman trees would definitely shorten the output stream". I don't see anything about dynamic _coding_ there.
    > Eugene, deflate doesn't have a predefined Huffman or prefix table, does it?
    https://en.wikipedia.org/wiki/DEFLATE#Stream_format "01: A static Huffman compressed block, using a pre-agreed Huffman tree defined in the RFC"
    > What are the assumptions of Huffman optimality?
    The Huffman algorithm is the algorithm for generating a binary prefix code which compresses a given string to the shortest bitstring.
    > Does that mean that for some data distributions, arithmetic coding has no advantage over Huffman?
    Static AC has the same codelength in the case where the probabilities of all symbols are equal to 1/2^l. Adaptive AC is basically always stronger for real data, but it's possible to generate artificial samples where a Huffman code would win.
    > Oh, and does deflate use canonical Huffman coding?
    Yes, if you accept a limited-length prefix code as a Huffman code.
    > So when you say there's nothing good about zlib memory usage, you mean it uses too much?
    It uses a hashtable and a hash-chain array. So there're all kinds of ways to use less memory - from just using a hashtable alone, to BWT SA, to direct string search.
    > It uses about 100 MB RSS, while SLZ uses only 10 MB. SLZ is my baseline for most purposes. It would be interesting if a LZ+AC/ANS solution could be made that used 10 or fewer MB, and very little CPU, and was fast.
    You have very strange methods of measurement. If some executable uses 100MB of RAM, it doesn't mean that the algorithm there requires that much memory - it could just as well be some i/o buffer for files. Also, what do you mean by "uses little CPU"? There's throughput, latency, number of threads, SIMD width, cache dependencies, IPC, core load... it's hard to understand what you want to minimize.
    > I think maybe brotli and Zstd in their low modes, maybe -1 or so, could approach those metrics, but I'm not sure.
    For zstd you'd probably have to tweak more detailed parameters, like wlog/hlog. I don't think it has defaults to save memory to _that_ extent.
    > then generated Huffman trees for each block separately, that's still static Huffman, right?
    Yes. If the code stays the same within a block, then it's static.
    15 replies | 352 view(s)
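    For reference, a hedged sketch of the wlog/hlog tweaking mentioned above, using zstd's public advanced-parameter API (my own example; the parameter names come from zstd.h, the chosen values are arbitrary, and the actual memory/ratio trade-off depends on the level and library version):

#include <zstd.h>
#include <stdio.h>

// Shrink zstd's match-finding state by capping the window and hash table sizes.
// Smaller logs mean less memory per context, usually at some cost in ratio.
size_t compress_small_mem( void* dst, size_t dstCap, const void* src, size_t srcSize ) {
    ZSTD_CCtx* cctx = ZSTD_createCCtx();
    ZSTD_CCtx_setParameter( cctx, ZSTD_c_compressionLevel, 1 );
    ZSTD_CCtx_setParameter( cctx, ZSTD_c_windowLog, 18 );   // 256 KB window
    ZSTD_CCtx_setParameter( cctx, ZSTD_c_hashLog,   14 );   // 16K hash entries
    ZSTD_CCtx_setParameter( cctx, ZSTD_c_chainLog,  14 );
    size_t r = ZSTD_compress2( cctx, dst, dstCap, src, srcSize );
    ZSTD_freeCCtx( cctx );
    return r;
}

int main( void ) {
    const char src[] = "hello hello hello hello hello hello";
    char dst[256];
    size_t n = compress_small_mem( dst, sizeof(dst), src, sizeof(src) );
    if( ZSTD_isError(n) ) printf( "error: %s\n", ZSTD_getErrorName(n) );
    else                  printf( "compressed %zu -> %zu bytes\n", sizeof(src), n );
    return 0;
}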
  • JamesWasil's Avatar
    2nd July 2020, 22:59
    To answer your question, yes. :) I set out to do this about a year or two ago, seeing that most of the source code was always in C++ or ANSI C, and rarely if ever in anything easier to read and closer to natural language for people who were intermediate or beginners. Many people started out with languages like Basic or Turbo Pascal, although C++ or assembly language is going to be your best bet long-term for programming efficiency, speed, and most real-world applications these days. BUT -- if you're getting started or are versed with Basic, you might want to start there and then gradually branch out to other languages like C++, Python, or Java...all of which are now industry standards. There were some great and helpful commenters on this thread when initially introduced (please ignore the 1 jerk, maybe 2 spouting off on there and read past it to get what you need out of the posts and the code): https://encode.su/threads/3022-TOP4-algorithm As a bonus, Sportman was especially helpful and compiled his own version along with the one that I submitted. He did independent testing as did at least one or two others. (jibz did one in C if you need it, too) Although the thread title is slightly misleading because in reality it doesn't always produce code better than Huffman...there are many modifications that you can do with it to where it can be made to achieve more with partial contexts and better compression. But I would use ait s a starting point for an easy way to understand, since there are few places to find anything easier or more straight-forward than this. Now please understand that there are other compressors that will do better too that are most likely based on Arithmetic Encoding or Range encoding, but with those is more complexity and might be too much for someone to start out with. A lot of people suggest "PAQ", but it's a lot of unnecessary stuff to do very basic compression and understand the premise of it. When you're ready you can do PAQ probably after Arithmetic Encoding and traditional Huffman, but for an easy and fast way to do things, I'd start here. With TOP4, you get a basic skeleton frame I made that is table-based in BASIC and compresses with a very straight-forward, WYSIWYG approach. There's a separate file that is able to reverse the process as a decompressor. It reads the bytes at the front of the file to get a table for codewords, and then decompresses data based on that. How it works: It represents the 4 most statistically likely symbols with 1 bit shorter code word, while adding 1 bit longer to the least frequent symbols at the end. By doing this, you get compression because of the frequencies with shorter codewords at the front always outweighing the frequency of the least occurring ones at the end. The less compressed and "balanced" the frequencies are for the symbols, the more you're able to compress data at the top and expand the few at the bottom. Your compression is what you get from the difference of this when all the bits are tallied up and converted back to 8 bit ASCII symbols. (I did one in C and used QB64 for what was submitted, but you can make it for VB6 or use Sportman's VB.Net submission just as well) The basic code should be easy enough to read to where you can adapt it to any language you fancy or want to use, since it's very close to pseudocode for beginners. You'll find however that a lot of things (most things?) are written in C++ now, and people are using that as their pseudocode as a defacto-standard. 
    What I would suggest is using this to get an understanding, but gradually branching out to C++ or Python from here and adapting it to those. Then you can move on to actual Huffman coding or Adaptive Arithmetic encoding and more. This is more or less instant gratification to help you get your feet wet with compressing text files, EXE files, BMP files, and others that are easily compressed. Once you're comfortable with this, it'll be even easier for you to continue on and adapt as you grow. :) And of course, the code was submitted royalty-free with no restrictions really, more as a learning tool for people to use freely.

    If you're interested in things like BWT, there are sections on that, too. I made some BWT implementations entirely in BASIC and one in Python (not sure where I put that, but I still have the BASIC one on a flash drive), but honestly you'll find more straightforward BWT implementations from others by searching this site than what I have to share. Michael has a really good BWT compressor on here that I've seen. And Shelwein and Encode have tons of random stuff lol There may be other really easy compressors on here for you to check out too if you search for them. They'll either be on threads or under the download sections with source code.
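    To make the "How it works" part above a bit more concrete, here is a rough sketch in Python of the general "shorten the most frequent, lengthen the rarest" idea. It is not the actual TOP4 code from the thread; the exact code lengths and the canonical-code construction are my own choices for illustration (I lengthen eight of the rarest symbols rather than four so that a valid prefix code still exists; the real TOP4 table handles its details its own way).

    # Rough sketch of "shorter codes for the most frequent symbols, longer codes
    # for the rarest" -- NOT the actual TOP4 code from the thread linked above.
    from collections import Counter

    def estimate(data: bytes):
        freq = Counter(data)
        # Rank all 256 byte values, most frequent first.
        ranked = sorted(range(256), key=lambda b: -freq[b])

        # Everyone starts at 8 bits; the 4 most frequent symbols drop to 7 bits,
        # and 8 of the rarest grow to 9 bits.  (Eight, not four, so that Kraft's
        # inequality sum(2**-len) <= 1 holds and a prefix code exists -- this is
        # an adjustment made here for the sketch.)
        length = {b: 8 for b in ranked}
        for b in ranked[:4]:
            length[b] = 7
        for b in ranked[-8:]:
            length[b] = 9
        assert sum(2.0 ** -l for l in length.values()) <= 1.0

        # Build a canonical prefix code from the lengths -- the kind of table a
        # decompressor could rebuild from a small header at the front of the file.
        code, next_code, prev_len = {}, 0, 0
        for b in sorted(ranked, key=lambda b: (length[b], b)):
            next_code <<= (length[b] - prev_len)
            code[b], prev_len = next_code, length[b]
            next_code += 1

        # Tally the bits: savings on the frequent symbols vs. losses on the rare ones.
        bits = sum(freq[b] * length[b] for b in freq)
        return bits, code

    if __name__ == "__main__":
        data = open(__file__, "rb").read()          # any test file will do
        bits, code = estimate(data)
        print(f"{len(data)} bytes in, ~{(bits + 7) // 8} bytes out (before the table)")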
    20 replies | 400 view(s)
  • Gotty's Avatar
    2nd July 2020, 22:49
    Decision making - again. When you try to decide something, you actually try to predict what the outcome of that decision would be. Would it be good? Would it be bad for me? When you need to buy a new mobile phone, for example, you have different options. Buy a high-end one, and hope that it will last many, many years and that you'll be satisfied with the packed-in features. Or, for a quarter of the price, buy one from the low range? It would probably fail sooner, or you would need to replace it sooner than the top one. It may also lag or miss some features, so eventually your satisfaction would be a bit lower. Or a second-hand phone? Hm, the failure rate could be even higher and you don't have a warranty. But the price is really tempting...

    You make a decision by trying to predict the outcome based on different metrics: price, satisfaction rate, warranty, probability of failure. You don't foresee the future. But your past experiences, listening to experts, and asking the opinion of friends will help you make a good decision. (This is also called an informed decision.)

    Entropy-based compression software does exactly that: it tries to predict the next character in a file, and the better the prediction is, the better the compression will be. (Entropy-based) compression = prediction. When you try to predict the future - what's the probability that it will rain, or the probability of a successful marriage with a person - you actually apply the same theory that is used in compression.
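    To put a number on "compression = prediction": an entropy coder spends about -log2(p) bits on a symbol its model predicted with probability p, so a sharper prediction directly means fewer bits. A tiny illustration (the probabilities below are made up for the example):

    # "Compression = prediction": the ideal cost of a symbol is -log2 of the
    # probability the model assigned to it before seeing it.
    import math

    def bits(p):                    # ideal code length for one symbol
        return -math.log2(p)

    print(bits(0.9))    # confident, correct prediction -> ~0.15 bits
    print(bits(0.5))    # "no idea" (coin flip)         -> 1 bit
    print(bits(0.01))   # confident but wrong guess     -> ~6.6 bits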
    20 replies | 400 view(s)
  • Gotty's Avatar
    2nd July 2020, 22:26
    Your decisions are greatly determined by the "success rate" and the "magnitude" of positive or negative feedback you experience. We are pursuing happiness throughout our entire lives, and in order to reach it we make "statistics" starting at a very early age. We evaluate all situations based on these statistics and decide what to do and what not to do. For example, I tried football, basketball, and handball, and I know I'm rather lame at any ball game. Even at snooker (I have my statistics ;-)). In these activities I can't shine, so they don't give me satisfaction, so I try to avoid them. But I'm good at running and jumping (I can jump my height ;-)), I got a bunch of medals in my teens, and so I love them (got my statistics). If I need to choose what to do, these statistics will tell me that I had better not go play basketball in my free time but go running in the evenings.

    When we have many friends with many different interests, how do we decide what to do when we meet and spend time together? We will instinctively maximize our shared happiness based on how much we like or dislike these activities. We will more often do activities that are liked by the majority of us, and less often the ones that are not liked by so many but that a couple of us still enjoy - for the sake of those few. We can formulate "high happiness score" as "less regret" or "less cost", and "low happiness score" as "high regret" or "high cost". Summary: maximum happiness = do the "high regret"/"high cost" activities less frequently and the "low regret"/"low cost" activities more frequently.

    >>And Hoffman theory is useless if not for programmers.
    Huffman theory says: maximum compression = "As in other entropy encoding methods, more common symbols are generally represented using fewer bits than less common symbols." [wikipedia] Hmmm... sounds familiar? Our decision making is intuitively based on this compression theory. Not just for programmers. For everybody.
    20 replies | 400 view(s)
  • Gotty's Avatar
    2nd July 2020, 22:21
    Let me tell you a couple of interesting facts.
    20 replies | 400 view(s)
  • SolidComp's Avatar
    2nd July 2020, 22:14
    So when you say there's nothing good about zlib memory usage, you mean it uses too much? It uses about 100 MB RSS, while SLZ uses only 10 MB. SLZ is my baseline for most purposes. It would be interesting if an LZ+AC/ANS solution could be made that used 10 MB or less, very little CPU, and was fast. I think brotli and Zstd in their low modes, maybe -1 or so, could approach those metrics, but I'm not sure.

    If we preprocess and sort the data into blocks that group, say, all the numeric data in one block, all the text in another, hard random data in another, and then generate Huffman trees for each block separately, that's still static Huffman, right? It should help, because you could get shorter codes on average if a block contains just numbers, or just letters, and so forth, depending on the overhead of the tables.
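    One quick way to sanity-check that blocking idea - just a sketch with made-up example blocks, not a real preprocessor, and ignoring the per-table overhead - is to compare the total cost of one static Huffman code per block against a single static code over everything:

    # Sketch of the per-block idea above: separate static Huffman codes per block
    # vs. one code over the whole input (table/header overhead is ignored here).
    import heapq
    from collections import Counter

    def huffman_bits(data: bytes) -> int:
        """Total bits a static Huffman code spends encoding `data` (no table)."""
        heap = sorted(Counter(data).values())
        if len(heap) < 2:
            return len(data)            # degenerate case: one distinct symbol, 1 bit each
        heapq.heapify(heap)
        total = 0
        while len(heap) > 1:
            a, b = heapq.heappop(heap), heapq.heappop(heap)
            total += a + b              # each merge adds 1 bit to every symbol beneath it
            heapq.heappush(heap, a + b)
        return total

    # Made-up stand-ins for a "numeric" block and a "text" block.
    numbers = b"3.14159 2.71828 1.41421 " * 200
    text    = b"the quick brown fox jumps over the lazy dog " * 200
    mixed   = numbers + text

    split = huffman_bits(numbers) + huffman_bits(text)
    whole = huffman_bits(mixed)
    print(f"separate codes: {split} bits, single code: {whole} bits")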
    15 replies | 352 view(s)
More Activity