Activity Stream

  • necros's Avatar
    Today, 12:39
    The contest goal could be to outperform, for example, 7z LZMA2 compression on multiple data sets: same or lower time with the same or better compression.
    20 replies | 1059 view(s)
  • compgt's Avatar
    Today, 11:42
    @JamesWasil, OK... that's the official "history" you know. But I remember this as the classified, top-secret part of the 1970s-80s Martial Law / Cold War era, when the Americans were here in the Philippines. I was a kind of "central hub" among scientist networks at that time (so I learned from many, and was privy to state-of-the-art science and technologies), dictating what would be computer science history for the 80s, 90s and 2000s (e.g. in data compression and encryption, the IBM PC, Intel, AMD and Microsoft Windows dominance, etc.). Just think of me then as a top military analyst. I mean, I wasn't just a player in all this; it was me moderating everything tech. I knew I already co-owned Apple and Microsoft. I guess I decided to officially be co-founder of Yahoo, Google and Facebook, but it didn't happen officially. There was "too much fierce" competition amongst tech companies. I mean, it was the Cold War, a real war. The Cold War officially ended in the early 1990s, with the Americans leaving the Philippines, military bases left behind or demolished. In short, the real computing history of the US (and the world) was made, written, and decided here in the Philippines, with me. I chose Steve Jobs. I glorified Bill Gates, bettering his profile more and more. I chose Sergey Brin and Larry Page for Google, and I decided on a Mark Zuckerberg Chairman-CEO profile for Facebook. I had too many ownerships in tech, so they became greedy, wanted my ownerships for themselves, or decided to move on without me. That is, they asserted to other player groups my decisions or timetable for them to own the tech giants, but without me. What kind is that?! In the late 1980s, however, they reminded me of Zuckerberg and Facebook, implying a chance for me to officially "co-found" Facebook. I remember this encode.su website and GUI (1970s), as well as the names Shelwien, David Scott, Bulat Ziganshin, dnd, Matt Mahoney, Michael Maniscalco, Jyrki Alakuijala. Some of them would come to me in the Philippines in the 80s... if it was really them. By the early 1990s I was already forgetting. In the mid 2000s I was strongly remembering again. If I hear my voice in the bands' "official" recordings of Bread, Nirvana, America, Queen, Scorpions etc., then I strongly believe these computer science memories.
    9 replies | 462 view(s)
  • JamesWasil's Avatar
    Today, 10:28
    There's a lot to address here, but first this:
    1) Compgt: PAQ did not exist in the 70s or 80s. It wasn't conceptualized until around the turn of the 2000s (PAQ1 itself appeared in 2002); I remember reading the initial newsgroup and forum threads for it under comp.compression and other forums/sites that no longer exist. Some of the file formats mentioned didn't exist then either. Just politely letting you know so that you don't get flamed for it someday in a conversation or post, even if you were working on predecessor algorithms at that time.
    Trench: You have to understand the difference between pixel-by-pixel compression and macro sprites. The Sega Master System used a Z80; the NES used a 6502 derivative (the Ricoh 2A03); both ran somewhere in the 1 MHz to 8 MHz range. Graphics and audio were usually handled by separate but dedicated processors running at about the same speed. Even so, while having a processor for each made a lot of things possible that a single chip would have struggled with at the time, RAM and storage were still really expensive. They didn't want bloated image formats, and they needed animation and scrolling to be fast. What they did was use small hardware sprites (8x8 or 8x16 pixels) and background tiles that were assembled like jigsaw pieces to create the images, maps, and backgrounds you see. Larger sprite animations were at times 2 or 3 of those blocks synchronized to move together, but they would flicker because of latency and refresh-rate limits when the video hardware tried to do too much at once - evident when too many sprites were on screen at the same time.
    What this means is that for a given screen area (roughly 256x240 on the NES), the entire background "image" was a layer of tiles "stamped" in blocks: each 8x8 tile (64 pixels, 16 bytes of pattern data at 2 bits per pixel) was referenced by a one-byte index into the pattern table. So every 64 pixels of background were represented by 1 byte rather than 16 (see the sketch after this post). Yes, you can fit entire images - stamped as sprite/tile macros - at 1/16th their size for an NES game. But any image you can make that way is repetitive and restricted to the tile artwork already loaded. PNG, GIF, and even lossy formats like JPEG are NOT restricted to premade macro images (i.e. graphical "fonts") and have to be able to process, compress, and display ANY pixels you throw at them.
    The same was done for audio and sound effects to make sure it all fit in 256 to 512 KB per cartridge. Earlier systems like the SG-1000-era Sega hardware had to fit games into 64 KB or less on cards, and you can see the restrictions even more clearly there.
    There are different types of compression, and not all are equivalent to one another. If there is a database of a priori knowledge, compression can be done with it. Compression isn't limited to the tired and repetitive references to Mahoney's "Data Compression Explained" - that is data compression in only one form, explained only one way. There are many other ways it can happen, and the NES and Sega use of tile/sprite layers for KNOWN data demonstrates that. Compression of unknown data is of course more difficult - not impossible, just very difficult, and only possible with methods that can address it.
    9 replies | 462 view(s)
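    A minimal sketch of the tile-map arithmetic described above (editor's addition, not from the post; the figures are the standard NES numbers - 256x240 screen, 8x8 tiles at 2 bits per pixel, a 32x30 name table of one-byte tile indices, up to 256 unique tiles of 16 bytes each):
        /* Editor's sketch: why a tiled background is so much smaller than a raw bitmap. */
        #include <stdio.h>

        int main(void) {
            int w = 256, h = 240, bpp = 2;                     /* screen size, bits per pixel */
            int tile = 8, tile_bytes = tile * tile * bpp / 8;  /* one 8x8 tile = 16 bytes */
            int raw = w * h * bpp / 8;                         /* raw bitmap of the screen */
            int name_table = (w / tile) * (h / tile);          /* one byte per tile position */
            int pattern_table = 256 * tile_bytes;              /* every unique tile, worst case */

            printf("raw bitmap    : %d bytes\n", raw);                /* 15360 */
            printf("name table    : %d bytes\n", name_table);         /* 960 */
            printf("pattern table : %d bytes (max)\n", pattern_table); /* 4096 */
            printf("tile reference: 1 byte instead of %d bytes of pixels\n", tile_bytes);
            return 0;
        }
    The point of the post stands either way: the one-byte reference only works because the decoder already owns the pattern table - a general-purpose codec like PNG gets no such a priori dictionary.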
  • Trench's Avatar
    Today, 01:36
    The file you compressed has mostly empty space and few colors; like any file that is mostly empty, it will compress the same way. But here are some examples of something more complex that can't be compressed anywhere near as well as the original program's own data. Take an NES game map image online: it is 1,907 KB and compresses to 1,660 KB, while the game's file size is listed online as 75 KB (about 95% compression) even though it includes more art, plus sound and code. Another game on the list is EarthBound: the map image is 3.4 MB while the file size listed online is 194 KB (about 94% compression). The same applies to a 100 MB PNG image file that cannot come close to the original, which would probably be 5 MB. Plenty of patterns, yet compression cannot even recognize the patterns. Shouldn't this be a simple thing that just isn't implemented yet? 95%! https://vgmaps.com/Atlas/NES/CastlevaniaII-Simon'sQuest-Transylvania(Unmarked).png
    9 replies | 462 view(s)
  • Jarek's Avatar
    Today, 01:15
    Thanks, that's a lot of priceless satisfaction. There is also WebP v2 coming ( https://aomedia.org/wp-content/uploads/2019/10/PascalMassimino_Google.pdf ), but I don't think it will stand a chance against JPEG XL - your big advantage is perceptual evaluation. VVC is also coming this year with an HEIF successor (VIF?), but it will likely have costly licenses and is computationally much more complex.
    63 replies | 16948 view(s)
  • Sportman's Avatar
    Yesterday, 21:40
    Added Shelwien compile.
    100 replies | 6135 view(s)
  • Jyrki Alakuijala's Avatar
    Yesterday, 19:28
    Thank you!! Also, JPEG XL is the only ANS-based codec in this comparison :-D
    63 replies | 16948 view(s)
  • Shelwien's Avatar
    Yesterday, 17:31
    Ask me :)
    1 replies | 26 view(s)
  • maorshut's Avatar
    Yesterday, 17:21
    Hi all, how can I change my username, or delete my account if changing is not possible? Thanks
    1 replies | 26 view(s)
  • Sportman's Avatar
    Yesterday, 13:38
    Sportman replied to a thread BriefLZ in Data Compression
    252,991,647 bytes, 4628.815 sec. (1 hour 17 min) - 3.683 sec., blzpack -9 --optimal -b1g (v1.3.0)
    1 replies | 255 view(s)
  • suryakandau@yahoo.co.id's Avatar
    Yesterday, 11:31
    How about bbb cm1000 for v1.8? I use bbb cm1000 for v1.8.
    100 replies | 6135 view(s)
  • Sportman's Avatar
    Yesterday, 11:24
    Added default mode.
    100 replies | 6135 view(s)
  • Jarek's Avatar
    Yesterday, 10:02
    Indeed, the comparisons are great:
    poster ~80 kB: https://imgsli.com/MTIxNTQ/
    lighthouse ~45 kB: https://imgsli.com/MTIxNDg/
    windows ~88 kB: https://imgsli.com/MTIxNDk/
    ice ~308 kB: https://imgsli.com/MTE3ODc/
    face ~200 kB: https://imgsli.com/MTE2MjI/
    JPEG XL is definitely the best here for maintaining details of textures like skin.
    63 replies | 16948 view(s)
  • Shelwien's Avatar
    Yesterday, 05:32
    Shelwien replied to a thread OBWT in Data Compression
    Here I compiled 1.8. But you can easily do it yourself - just install MinGW: https://sourceforge.net/projects/mingw-w64/files/Toolchains%20targetting%20Win32/Personal%20Builds/
    31 replies | 2648 view(s)
  • suryakandau@yahoo.co.id's Avatar
    Yesterday, 04:48
    Could you upload the binary please? Because it is under a GPL license.
    31 replies | 2648 view(s)
  • Jyrki Alakuijala's Avatar
    Yesterday, 02:48
    I'm looking into this. No actual progress yet.
    300 replies | 313399 view(s)
  • Jyrki Alakuijala's Avatar
    Yesterday, 02:39
    Someone compares AVIF, WebP, MozJPEG and JPEG XL with a beautiful UI. https://medium.com/@scopeburst/mozjpeg-comparison-44035c42abe8
    63 replies | 16948 view(s)
  • jibz's Avatar
    15th February 2020, 19:01
    jibz started a thread BriefLZ in Data Compression
    Since the BriefLZ 1.2.0 thread disappeared, here is a new one! I've just pushed BriefLZ 1.3.0, which includes the forwards binary tree parser (btparse) that was in the latest bcrush and blz4. It improves the speed of --optimal on many types of data, at the cost of using more memory. The format is still backwards compatible with BriefLZ 1.0.0.
    enwik8      blzpack-1.2.0 --optimal -b100m   30,496,733    30 hours
    silesia.tar blzpack-1.2.0 --optimal -b205m   63,838,305    about a week
    enwik8      blzpack-1.3.0 --optimal -b100m   30,496,733    95 sec
    silesia.tar blzpack-1.3.0 --optimal -b205m   63,836,210    4 min
    enwik9      blzpack-1.3.0 --optimal -b1g     252,991,647   10.5 hours
    Not sure why the result for silesia.tar from 1.2.0 two years ago is slightly higher, but I'm not going to rerun it. If anyone has a machine with 32 GiB of RAM, I would love to hear how long --optimal -b1g on enwik9 takes, because the results for this machine (8 GiB RAM) include swapping. Attached is a Windows 64-bit executable, and the source is at https://github.com/jibsen/brieflz
    1 replies | 255 view(s)
  • pacalovasjurijus's Avatar
    15th February 2020, 18:26
    Software: White_hole_1.0.0.1.6
    Before: 2019-07-01.bin (little bit different but Random) 1,048,576 bytes
    After: 2019-07-01.bin.b 1,048,396 bytes
    Time: 6 minutes
    22 replies | 1348 view(s)
  • compgt's Avatar
    15th February 2020, 15:35
    So, if you somehow compressed a string of jumbled characters into a significantly smaller string (or program), it was simply "random-appearing" and not algorithmically random.
    6 replies | 52 view(s)
  • compgt's Avatar
    15th February 2020, 15:23
    To handle the bombshell - the presence of "anomalous" symbols:
    (1) You must have a way to create a smooth frequency distribution, i.e. the frequencies must be of the same bit size as much as possible. There are many ways to do this, but it must be reversible. For example, you could XOR the bytes of the data source first (which is reversible), pre-whitening it for encoding (a sketch follows this post).
    (2) Or the bombshell symbol or byte can be treated like an LZ77 literal: simply output a prefix bit flag (anomalous symbol or not?) for each symbol. This means at most two bits per symbol of encoding, with the bit flag indicating whether the symbol's sum doubles (MSBit), plus the 8-bit anomalous symbol when it happens. I wonder how large the frequencies or the freqtable would be...
    And, as with Huffman coding, you can contrive or generate a file that is exactly suited to this algorithm. What would be interesting is that the generated file is "random-appearing" data, perhaps indeed incompressible to known compressors. (See the post above, which now has pseudo-code for easier understanding.)
    6 replies | 52 view(s)
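    The "XOR the bytes first, this is reversible" step in (1) can be made concrete with a simple XOR-delta transform (editor's sketch, not compgt's code; it only demonstrates that the whitening is losslessly invertible, not that it helps compression):
        /* Editor's sketch: reversible XOR "pre-whitening" of a byte buffer.
           forward: out[0]=in[0], out[i]=in[i]^in[i-1]; the inverse undoes it exactly. */
        #include <assert.h>
        #include <stddef.h>
        #include <string.h>

        static void xor_delta(unsigned char *buf, size_t n) {
            for (size_t i = n; i-- > 1; )      /* back to front, so buf[i-1] is still original */
                buf[i] ^= buf[i - 1];
        }

        static void xor_undelta(unsigned char *buf, size_t n) {
            for (size_t i = 1; i < n; i++)     /* front to back restores the original bytes */
                buf[i] ^= buf[i - 1];
        }

        int main(void) {
            unsigned char data[] = "any bytes at all", copy[sizeof data];
            memcpy(copy, data, sizeof data);
            xor_delta(data, sizeof data);
            xor_undelta(data, sizeof data);
            assert(memcmp(data, copy, sizeof data) == 0);   /* round-trips losslessly */
            return 0;
        }
    Whether this smooths the frequency distribution enough to pay for the extra flag bit per symbol in (2) is exactly the open question in the post.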
  • compgt's Avatar
    15th February 2020, 15:22
    > 4. Instead of 8-bit bytes, use 4-bit symbols;
    How about 2-bit (base-4) symbols? Or maybe even better, a data source of base-3 symbols?
    6 replies | 52 view(s)
  • compgt's Avatar
    15th February 2020, 15:21
    The compression algorithm is best understood if you visualize a bar chart or histogram in which the new symbol's frequency is always pushed above the current highest frequency: we increment the current maximum by its delta with the new symbol's frequency, and that value becomes the new symbol's frequency. Put simply, the newest symbol must always have the highest frequency, so at most the maximum can only "double", i.e. grow by 1 bit in length. (In decoding, the symbol with the highest frequency is the symbol to decode; this makes it stack-based. We add the delta to the highest frequency during encoding so we can get back to the symbol's previous frequency when decoding.) The output is actually the frequency table, which is easy to compress or generate?
    Algorithm pseudo-code (a complete round-trip sketch follows this post):
    /* initialize frequency table. */
    for (i = 0; i < 256; i++) freq[i] = i + 1;
    max = freq[255];
    do {
      c = get_byte(infile);
      if (c == EOF) break;
      freq[c] = max + (max - freq[c]);
      max = freq[c];
    } while (1);
    No runs of a single character are allowed in the input, as much as possible. "Random data" indeed.
    New or recalled observations:
    1. This algorithm ironically "expands" the frequencies at first. LOL - we're back to the early days of information theory and data compression history!
    2. The bombshell: it takes more than 1 extra bit to encode a very small frequency that suddenly must become the maximum. The solution might be to "swap" them, but that requires new information or codes. This is back to delta coding.
    3. But a total cycling of the frequency table might work...
    4. Instead of 8-bit bytes, use 4-bit symbols.
    This is similar, I think, to WEB Technologies' algorithm as featured in BYTE magazine in 1992 and noted by the comp.compression FAQ: "WEB, in fact, says that virtually any amount of data can be squeezed to under 1024 bytes by using DataFiles/16 to compress its own output multiple times." I think they were using or playing with a frequency table too: 256 32-bit frequencies = 1 KB. They might have had to output the MSBit of the highest frequency, the result of which may equal other byte frequencies - perhaps that's why they had the problem of 4 numbers in a matrix being equal, a rare case in their algorithm. Just maybe. (Ideally there is at most a 1-bit increase in the frequency of the output or new symbol, but the bombshell precludes that. If the frequencies are of the same bit size, then there is only a 1-bit increase in the new maximum. The current symbol always has the highest frequency. You decode backwards, from last symbol to first; the symbol with the highest frequency is the current symbol. One parameter in decoding is the famed file_size().)
    The problem with the algorithm is that the emitted frequency table can be very large - very large frequencies if you really implement it with BigNums or BigInts - and then you have to compress the very large frequency table. Maybe to achieve compression you can just consider the MSBit after the arithmetic (addition) operation. Or the solution is nearly just MTF (you have to output the character that *doubled*, i.e. whose MSBit activated). WEB Technologies' DataFiles/16 algorithm was clearly designed for compression of *random* data, and recursive, which is futile indeed.
    6 replies | 52 view(s)
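    Since the pseudo-code above lost its array subscripts in posting, here is an editor's reconstruction of the full round trip exactly as the post describes it - a toy 64-bit version, not the 2006 program; frequencies roughly double per input byte, so it overflows after ~50 symbols (the original idea assumes BigNums), and the table itself is the "output" whose size is the whole problem:
        /* Editor's reconstruction of the frequency-table scheme described above.
           Encode: the new symbol is pushed above the current maximum by its delta.
           Decode: read the table backwards - the current maximum is the last symbol. */
        #include <assert.h>
        #include <stdio.h>
        #include <string.h>

        enum { N = 256 };

        static void encode(const unsigned char *in, int len, unsigned long long freq[N]) {
            for (int i = 0; i < N; i++) freq[i] = i + 1;   /* initial table */
            unsigned long long max = freq[N - 1];
            for (int k = 0; k < len; k++) {
                int c = in[k];
                freq[c] = max + (max - freq[c]);           /* new symbol becomes the maximum */
                max = freq[c];
            }
        }

        static void decode(unsigned long long freq[N], int len, unsigned char *out) {
            for (int k = len - 1; k >= 0; k--) {           /* stack-like: last symbol first */
                int c = 0;
                for (int i = 1; i < N; i++) if (freq[i] > freq[c]) c = i;  /* current max */
                unsigned long long prev_max = 0;           /* max among the other symbols = */
                for (int i = 0; i < N; i++)                /* max before c was encoded       */
                    if (i != c && freq[i] > prev_max) prev_max = freq[i];
                out[k] = (unsigned char)c;
                freq[c] = 2 * prev_max - freq[c];          /* restore c's previous frequency */
            }
        }

        int main(void) {
            const unsigned char msg[] = "abracadabra";     /* no runs of a single character */
            unsigned long long freq[N];
            unsigned char back[sizeof msg - 1];
            encode(msg, (int)sizeof msg - 1, freq);
            decode(freq, (int)sizeof msg - 1, back);
            assert(memcmp(back, msg, sizeof msg - 1) == 0);
            printf("round trip OK\n");
            return 0;
        }
    The round trip works only under the post's own no-runs assumption; the expansion of the table is left exactly as described, which is the "bombshell".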
  • compgt's Avatar
    15th February 2020, 15:20
    Random data compressor #2:
    How about compressing an array of 256 very large integers (BigNums or infinite-precision integers), i.e. a frequency table for 8-bit symbols? This is hard to compress, I think, since the said frequency table is already the output of a compression algorithm (which was fascinating at first).
    6 replies | 52 view(s)
  • compgt's Avatar
    15th February 2020, 15:18
    Random data compressor #1:
    One slow early approach is to guess the pattern or symbol. Just try to guess the input byte in fewer than 32 tries, so as to output just 5 bits. (You can do this by randomly setting bits of a dummy byte on or off and comparing it with the input byte.) If not guessed, output 00000 and then the 8-bit byte. How would you initialize the dummy byte? Maybe by context, crude LZP-like. What else? Build on this. Improve this. (A sketch follows this post.)
    6 replies | 52 view(s)
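    A quick editor's sketch of compressor #1, with a deterministic guess sequence so a decoder could replay the same guesses (not compgt's code; its main purpose is to show why the method expands random data: a hit inside 31 guesses happens with probability at most 31/256, so the expected cost is roughly 12 bits per byte):
        /* Editor's sketch of the "guess the byte in <32 tries" idea.
           Guesses come from a tiny deterministic hash of the previous byte
           (crude LZP-like context), so encoder and decoder stay in sync. */
        #include <stdio.h>

        static unsigned guess(unsigned ctx, unsigned k) {        /* k-th guess for this context */
            unsigned x = ctx * 2654435761u + k * 40503u + 12345u;
            return (x >> 13) & 0xFF;
        }

        /* Cost in bits of coding byte c: 5 bits for "guess index 1..31",
           or 5 + 8 bits for "escape (00000) followed by the literal byte". */
        static int cost_bits(unsigned ctx, unsigned c) {
            for (unsigned k = 1; k < 32; k++)
                if (guess(ctx, k) == c) return 5;
            return 5 + 8;
        }

        int main(void) {
            unsigned long long bits = 0, n = 1000000;
            unsigned ctx = 0, x = 1;
            for (unsigned long long i = 0; i < n; i++) {
                x = x * 1103515245u + 12345u;                    /* pseudo-random "input" byte */
                unsigned c = (x >> 16) & 0xFF;
                bits += cost_bits(ctx, c);
                ctx = c;
            }
            printf("%.2f bits/byte on random input (8.00 would be break-even)\n",
                   (double)bits / (double)n);
            return 0;
        }
    On data with real context structure the guesses could hit far more often; on incompressible input they cannot, which is the post's "however futile" caveat made measurable.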
  • compgt's Avatar
    15th February 2020, 15:17
    I created this separate thread for my random data compression ideas posted in several threads here, so that they're easy to find in one place. My "frequency table" random data coding was programmed in 2006-2007, perhaps with or without a decoder - or maybe I solved it already but deleted the compressor. I remembered it in 2017, so here it is again. Note: do not try random data compression unless you can actually write a computer program to test your ideas, however futile they might be. It might take you years to clear up your ideas, or to admit that random data compression is impossible.
    6 replies | 52 view(s)
  • compgt's Avatar
    15th February 2020, 14:28
    My Google ownership slot was taken from me. Now *my* JPEG is being killed off, supplanted by AVIF. I didn't officially earn from the JPEG format anyway; we developed it in the 1970s to the 80s. Netflix will surely create its own image format, as Netflix is now a tech giant in media streaming. Incidentally, I planned Netflix too, having made the Hollywood movies... if I remember it right.
    63 replies | 16948 view(s)
  • suryakandau@yahoo.co.id's Avatar
    15th February 2020, 13:32
    Decompression time of enwik10 using bbb v1.8: 42,640.14 sec on my old machine. @Sportman, could you add this to your enwik10 benchmark please?
    31 replies | 2648 view(s)
  • Jarek's Avatar
    15th February 2020, 13:03
    "Netflix wants to kill off JPEGs": https://www.techradar.com/news/netflix-wants-to-kill-off-jpegs ~500 upvotes: https://www.reddit.com/r/programming/comments/f46ysc/netflix_avif_for_nextgeneration_image_coding/
    63 replies | 16948 view(s)
  • compgt's Avatar
    15th February 2020, 12:57
    @CompressMaster, why are you trying random compression? I don't mean to pry, but is data compression your line of work? I think there are many companies out there trying to solve random data compression. I am not professionally a programmer now, but I did intermittent work on this compression stuff (the zip format, bmp, gif, png, jpeg, vcd/svcd, dvd, paq etc.) in the 1970s and the 80s when I was a precocious child and grade-schooler. I now just code in C, too lazy to re-learn C++. Data compression has remained an interest, though.
    9 replies | 462 view(s)
  • Darek's Avatar
    15th February 2020, 12:19
    Darek replied to a thread Paq8pxd dict in Data Compression
    Some enwik scores for paq8pxd_v72:
    16'309'641 - enwik8 -s8 by Paq8pxd_v61
    15'968'477 - enwik8 -s15 by Paq8pxd_v61
    16'570'543 - enwik8.drt -s15 by Paq8pxd_v61
    126'587'796 - enwik9_1423 -s15 by Paq8pxd_v61 - best score for the paq8pxd series
    16'309'012 - enwik8 -s8 by Paq8pxd_v63
    15'967'201 - enwik8 -s15 by Paq8pxd_v63
    16'637'302 - enwik8.drt -s15 by Paq8pxd_v63
    126'597'584 - enwik9_1423 -s15 by Paq8pxd_v63
    16'374'223 - enwik8 -s8 by Paq8pxd_v67_AVX2
    16'048'070 - enwik8 -s15 by Paq8pxd_v67_AVX2
    16'774'998 - enwik8.drt -s15 by Paq8pxd_v67_AVX2
    127'063'602 - enwik9_1423 -s15 by Paq8pxd_v67_AVX2
    16'364'165 - enwik8 -s8 by Paq8pxd_v68_AVX2
    16'033'591 - enwik8 -s15 by Paq8pxd_v68_AVX2
    16'755'942 - enwik8.drt -s15 by Paq8pxd_v68_AVX2
    126'958'003 - enwik9_1423 -s15 by Paq8pxd_v68_AVX2
    16'358'450 - enwik8 -s8 by Paq8pxd_v72_AVX2 - my compress time 6'780s
    16'013'638 - enwik8 -s15 by Paq8pxd_v72_AVX2 - tested by Sportman; my compress time 6'811s - @Sportman, you have a very fast machine!
    16'672'036 - enwik8.drt -s15 by Paq8pxd_v72_AVX2 - my compress time 9'280s
    126'779'432 - enwik9_1423 -s15 by Paq8pxd_v72_AVX2 - very nice gain, however still about 200 KB behind the paq8pxd_v61 version; my compress time 67'740s
    132'464'891 - enwik9_1423.drt -s15 by Paq8pxd_v72_AVX2 - also a very nice gain; however, starting from paq8pxd_v57 the DRT-precompressed files get worse scores than pure enwik8/9 and the compression time is 50% higher; my compress time 93'207s
    687 replies | 280906 view(s)
  • schnaader's Avatar
    15th February 2020, 12:12
    I'm not sure if this is what you wanted, but compressing a PNG image to less than 50% of its size is not impossible at all:
    test.png: 80,225 bytes (attached image)
    test.pcf: 52,864 bytes (Precomp 0.4.7, completely reversible to get the original PNG file)
    test.webp: 38,224 bytes (cwebp -lossless -q 100 -m 5, same image data as the PNG)
    9 replies | 462 view(s)
  • schnaader's Avatar
    15th February 2020, 11:09
    The git infrastructure is already doing a great job here. First, the TurboBench repository is organized using submodules, so when you clone, you can choose which submodules to fetch:
    git clone https://github.com/powturbo/TurboBench.git
    => clones only the main repository
    => directory size: 40,213,702 bytes; transferred data: about 36,635,613 bytes (size of the biggest file in .git\objects\pack)
    git submodule update --init brotli
    => clones only the brotli submodule
    => brotli directory size: 35,512,637; transferred data: about 32,181,545 (size of the biggest file in .git\modules\brotli\objects\pack)
    Note that the 37 MB transferred for the main repository contains the whole repository history (all 1,261 commits). If you don't need that, "git clone --depth 1" gives you the latest revision only, which transfers only about 765 KB (!) of data.
    Looking at that brotli pack file, the transferred data is already compressed quite well by git, though I agree it could be improved by using lzma compression and recompression instead of deflate in git:
    .git\modules\brotli\objects\pack\.pack: 32,181,545 bytes
    .git\modules\brotli\objects\pack\.pcf_cn: 76,547,450 bytes (Precomp 0.4.7 -cn -intense) - so only 32 MB was transferred instead of 77 MB
    .git\modules\brotli\objects\pack\.pcf: 24,072,789 bytes (Precomp 0.4.7 -intense)
    I agree (though I would replace "system" with "kernel"), but those are already compressed well too, and have the same potential for improvement (tested on Ubuntu):
    /boot/initrd.img-4.15.0-74-generic: 24,305,803
    /boot/initrd.img-4.15.0-74-generic.pcf_cn: 71,762,575 (Precomp 0.4.7 -cn)
    /boot/initrd.img-4.15.0-74-generic.pcf: 16,949,956
    This is more an issue of the brotli repository than of the TurboBench repository. Note that we didn't use "--recursive" in the submodule init command above, so the submodules of the brotli repository (esaxx and libdivsufsort) aren't cloned. Test data and other files not needed to build could also be moved into brotli submodules. Of course, another thing that would help is to not use outdated image formats :p
    brotli\research\img\enwik9_diff.png: 5,096,698
    .webp: 3,511,804 (cwebp -lossless -q 100 -m 5)
    .flif: 3,488,547 (flif -e)
    169 replies | 43181 view(s)
  • brispuss's Avatar
    15th February 2020, 07:57
    brispuss replied to a thread Paq8pxd dict in Data Compression
    I've run further tests, this time "tarring" all 171 jpg images first. Again there was an improvement in compression, and the compression time was reduced a bit for v72 with respect to v69. The "tarred" file produced slightly better compression than compressing each file individually.
    Compressor | Total file(s) size (bytes) | Compression time (seconds) | Compression options
    Original 171 jpg files | 64,469,752 | |
    paq8pxd v69 | 51,365,725 | 7,753 | -s9
    paq8pxd v72 | 51,338,132 | 7,533 | -s9
    Tarred jpg files | 64,605,696 | |
    paq8pxd v69 | 50,571,934 | 7,897 | -s9
    paq8pxd v72 | 50,552,930 | 7,756 | -s9
    687 replies | 280906 view(s)
  • suryakandau@yahoo.co.id's Avatar
    15th February 2020, 02:41
    could you also add bbb v1.8 , please ?
    100 replies | 6135 view(s)
  • suryakandau@yahoo.co.id's Avatar
    15th February 2020, 02:35
    enwik10 result using bbb v1.8 on my old machine: 1,635,996,697 bytes in 92,576.33 sec :cool:
    31 replies | 2648 view(s)
  • kaitz's Avatar
    15th February 2020, 01:13
    kaitz replied to a thread Paq8pxd dict in Data Compression
    Another test. I wanted to see how well the contexts actually work (wordmodel). ContextMap collects info, and if a threshold is reached the context is permanently disabled, and so is stats collection (a simplified sketch follows this post).
    enwik6: i(0)=431965, i(3)=324985, i(24)=488462, i(27)=207541, i(33)=493349, i(34)=157440, i(35)=168725, i(36)=179219, i(37)=555562, i(38)=558076, i(45)=425289, i(58)=399287, i(60)=230295, i(61)=210033
    book1: i(0)=520011, i(3)=491253, i(24)=394110, i(25)=220564, i(26)=4994, i(32)=132856, i(33)=490038, i(34)=76312, i(35)=77780, i(36)=80227, i(37)=463017, i(38)=461574, i(45)=421269, i(58)=405311, i(60)=256524, i(61)=131212
    Below are the bad contexts: + is enwik6, - is book1. In book1, i(26) is still OK, sort of.
    -+ cm.set(hash(++i,x.spafdo, x.spaces,ccword));
    -+ cm.set(hash(++i,x.spaces, (x.words&255), (numbers&255)));
    -+ cm.set(hash(++i,h, word1,word2,lastUpper<x.wordlen));
    -  cm.set(hash(++i,text0&0xffffff));
    -  cm.set(text0&0xfffff); /// i(26)=4994, book1
    +  cm.set(hash(++i,word0,number0, data0,xword0));
    -  cm.set(hash(++i,word0, cword0,isfword));
    -+ cm.set(hash(++i,word0,buf(1), word2,isfword));
    -+ cm.set(hash(++i,word0,buf(1), word3));
    -+ cm.set(hash(++i,word0,buf(1), word4));
    -+ cm.set(hash(++i,word0,buf(1), word5));
    -+ cm.set(hash(++i,word0,buf(1), word1,word3));
    -+ cm.set(hash(++i,word0,buf(1), word2,word3));
    -+ cm.set(hash(++i,nl1-nl2,x.col,buf(1),above));
    -  cm.set(hash(++i,h, llog(wordGap), mask&0x1FF, ));
    +  cm.set(hash(x.col,x.wordlen1,above,above1,x.c4&0xfF)); else cm.set(); //wordlist
    -+ cm.set(hash(++i,x.col,above^above1,above2 , ((islink)<<8)|)); //wordlist((istemplate)<<9)|
    -+ cm.set(hash((*pWord).Hash, h));
    book1 compressed:
    183314 (pxd v72) 100 sec
    183752 (pxd vXX, skip if i(x)>2024) 88 sec
    183288 (pxd vXX, no skip) 99 sec
    184490 (px v183) 139 sec
    687 replies | 280906 view(s)
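    For readers without the paq8pxd source at hand, the skip mechanism described above boils down to something like the following (editor's sketch with hypothetical names - this is not the actual ContextMap code):
        /* Editor's sketch: permanently disable a context once its badness counter i(x)
           crosses a threshold, and stop collecting stats for it, so the model spends
           no more time or memory on a context that has proven unhelpful. */
        #define NCTX  64
        #define LIMIT 2024          /* the post tests "skip if i(x) > 2024" */

        unsigned bad_count[NCTX];   /* i(x) in the post's notation (assumed meaning) */
        unsigned char disabled[NCTX];

        void update_context(int x, int was_useful) {
            if (disabled[x]) return;                 /* skipped forever: no set(), no stats */
            if (!was_useful && ++bad_count[x] > LIMIT)
                disabled[x] = 1;
            /* ... otherwise hash and cm.set() this context as usual ... */
        }
    The measured results above suggest the trade-off: skipping saves ~12% of time on book1 for a few hundred bytes of ratio.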
  • dnd's Avatar
    14th February 2020, 23:27
    Here is a listing of the repository sizes of some codecs used in the TurboBench Compression Benchmark:
    brotli 37.3 MB
    pysap 12.8 MB
    zstd 9.5 MB
    lzma 7.0 MB
    isa-l 4.6 MB
    lzo 4.4 MB
    snappy 3.4 MB
    zlib 3.9 MB
    bzip2 2.8 MB
    Some packages include huge test data or indirectly related files; these could live in a separate repository. The size of the brotli repository is nearly as large as a whole Linux system. The bandwidth in a lot of countries is not as high as in the countries where the developers reside, and some users have only mobile connections. The paradox: we have compressors here that are designed to save internet bandwidth, yet strangely the files to download keep growing. The same holds for games, web pages, images, ...
    169 replies | 43181 view(s)
  • kaitz's Avatar
    14th February 2020, 22:51
    kaitz replied to a thread Paq8pxd dict in Data Compression
    And if you compress them as tar/zip (uncompressed), what is the compressed size and speed then?
    687 replies | 280906 view(s)
  • Trench's Avatar
    14th February 2020, 22:45
    RichSelian: When I say 1 step back, 2 steps forward, I mean make it permanent, not just finalize to ASCII, which increases the randomness and makes it harder. As stated, even simple binary is random, but there you only deal with 2 characters, unlike ASCII where you deal with 256 characters, which makes it even more random - both are random, but the odds are worse. It is not as if people are compressing numbers with only numbers, or binary with binary, but binary with numbers and ASCII. What will people come up with next, an extended version of ASCII with 100,000 characters? Throwing more characters at the problem is not the solution; it might work, but it has its own issues.
    I gave various perspectives, from something as simple as a PNG image not being compressed by over 50% as it should be even though it is "random", to having to change the format, like the music file, to gain over 50%. I agree it's hard to compress with current methods, since they have reached their limit - so shouldn't other perspectives be looked at? Maybe try to decompress first and see where that gets you, as silly as that sounds, but at least it's not the conventional approach. And even if you do achieve a 50% gain over all other compression, it still won't easily become a standard.
    Also, you tried many things that did not work - do you and others have a place to log what does not work, so that others can see it, try a new path, or see how to improve? Even I showed how changing something as simple as 2 characters compressed an extra 3%, yet how come compression programs don't use that? If they don't know about it, why not? And what standard is there to chart what works and what doesn't? None? So it's like going in blind and using luck to find logic.
    As for ASCII, I think people should try to limit its use to go further, since the randomness odds are much greater than with a smaller alphabet such as plain numbers. You have a 1/2 chance to guess a 1 or 0, 1/10 for 0-9, and 1/256 for a byte. All random, but the odds are different, which is what I mean, since compression programs have a limit - maybe for good reason, with their limited dictionaries. Maybe more formulas are needed as well, since there is plenty of room for improvement, as some examples have shown. Again, you will disagree - or maybe I am right and time will tell - but it's another perspective.
    9 replies | 462 view(s)
  • suryakandau@yahoo.co.id's Avatar
    14th February 2020, 21:30
    Could you upload your build please? I will run the test for enwik9 and enwik10 again using your build. Thank you.
    31 replies | 2648 view(s)
  • Lucas's Avatar
    14th February 2020, 18:32
    Lucas replied to a thread OBWT in Data Compression
    Your code is running way slower than it should be; compile with optimizations enabled - the compiler can do most of the hard work for you. enwik8:
    100000000 -> 20772541 in 137.19 sec (compiled with -O3)
    100000000 -> 20772541 in 469.99 sec (your build)
    31 replies | 2648 view(s)
  • Sportman's Avatar
    14th February 2020, 16:14
    Added.
    100 replies | 6135 view(s)
  • suryakandau@yahoo.co.id's Avatar
    14th February 2020, 16:05
    enwik9 result using bbb v1.8 on my old machine: 163,451,513 bytes, compression time 10,692.38 sec, decompression time 6,425.07 sec.
    31 replies | 2648 view(s)
  • birdie's Avatar
    14th February 2020, 15:34
    @Jyrki Alakuijala Have you managed to find out anything?
    300 replies | 313399 view(s)
  • RichSelian's Avatar
    14th February 2020, 14:03
    here it is :)
    100 replies | 6135 view(s)
  • brispuss's Avatar
    14th February 2020, 12:21
    brispuss replied to a thread Paq8pxd dict in Data Compression
    Did some brief tests which may be of use. Used the jpg fileset from here, consisting of 171 8-bit jpg images of slightly varying sizes. Tests were run under Windows 7 64-bit, using an i5-3570K CPU and 8 GB RAM. Used SSE41 compiles of paq8pxd as I can't run AVX2 compiles on the i5-3570K. Each jpg file was compressed individually via a batch file. Total original jpg files size = 64,469,752 bytes.
    Compressor | Resulting total files size (bytes) | Compression time (seconds) | Compression options
    paq8pxd v69 | 51,365,725 | 7,753 | -s9
    paq8pxd v72 | 51,338,132 | 7,533 | -s9
    So there was a slight improvement in compression for v72 with respect to v69, and the compression time was slightly reduced.
    687 replies | 280906 view(s)
  • vteromero's Avatar
    14th February 2020, 11:25
    vteromero replied to a thread VTEnc in Data Compression
    Version v0.0.3 is out! Link
    Compared with the initial version, the library is now about 30% faster when encoding and over 85% faster when decoding. However, there is still a long way to go to catch up with other similar algorithms in terms of encoding and decoding speed.
    Benchmarks for gov2.sorted:
    | Algorithm           |    Encoded Size | Ratio % | Encoding Speed | Decoding Speed |
    |:--------------------|----------------:|--------:|---------------:|---------------:|
    | VTEnc               |   2,889,599,350 |   12.08 |      75.86 M/s |      97.50 M/s |
    | Delta+FastPFor128   |   3,849,161,656 |   16.09 |     641.80 M/s |     645.42 M/s |
    | Delta+FastPFor256   |   3,899,341,376 |   16.30 |     682.57 M/s |     679.37 M/s |
    | Delta+BinaryPacking |   4,329,919,808 |   18.10 |       2.32 G/s |       2.25 G/s |
    | Delta+VariableByte  |   6,572,084,696 |   27.48 |       1.51 G/s |       1.59 G/s |
    | Delta+VarIntGB      |   7,923,819,720 |   33.13 |       1.79 G/s |       2.86 G/s |
    | Copy                |  23,918,861,764 |   100.0 |       4.96 G/s |              - |
    I'm now working on v0.1.0, which will add the ability to specify a compression depth. This parameter tells the encoder how many most-significant bits to encode, leaving the rest of the least-significant bits untouched. In this way, the user will be able to fine-tune the degree of compression in each case, and consequently the encoding and decoding speed (a rough illustration follows this post).
    20 replies | 3028 view(s)
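    A minimal illustration of the "compression depth" idea as described (editor's sketch only - this is not the VTEnc API or its actual bit layout): each value is split into its top depth bits, which would go through the encoder, and the remaining low bits, which would be stored verbatim.
        /* Editor's illustration of a most-significant-bits / least-significant-bits split.
           Only the high part would be encoded; the low part is copied untouched,
           trading compression ratio for speed. Not VTEnc's code. */
        #include <stdint.h>
        #include <stdio.h>

        static void split(uint32_t v, int value_bits, int depth, uint32_t *hi, uint32_t *lo) {
            int low_bits = value_bits - depth;
            *hi = v >> low_bits;                     /* encoded part (top "depth" bits) */
            *lo = v & ((1u << low_bits) - 1u);       /* stored verbatim */
        }

        int main(void) {
            uint32_t hi, lo, v = 0x00ABCDEF;
            split(v, 24, 10, &hi, &lo);              /* 24-bit values, depth = 10 */
            printf("hi=0x%X lo=0x%X\n", hi, lo);     /* rebuild with (hi << 14) | lo */
            return 0;
        }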
  • Scope's Avatar
    14th February 2020, 04:06
    AVIF for Next-Generation Image Coding https://netflixtechblog.com/avif-for-next-generation-image-coding-b1d75675fe4
    63 replies | 16948 view(s)
  • Darek's Avatar
    14th February 2020, 00:22
    Darek replied to a thread Paq8pxd dict in Data Compression
    And the 4-corpora scores for paq8pxd_v72. Best scores from the paq8pxd family for Canterbury and MaximumCompression :)
    687 replies | 280906 view(s)
  • Sportman's Avatar
    13th February 2020, 22:31
    Sportman replied to a thread Paq8pxd dict in Data Compression
    enwik8: 16,013,638 bytes, 6029.396 sec., paq8pxd_v72_AVX2 -s15
    687 replies | 280906 view(s)
  • dnd's Avatar
    13th February 2020, 18:45
    Turbo-Range-Coder update:
    - Added order-0 encode/decode in mb_o0.h. Possible RC for sizes 2, 3, 4 (nibble), 5, 6, 7, 8 (byte) bits
    - Added chunked integer coding (mbl3) in mb_vint.h (similar to lzma match length encoding)
    - Slightly improved speed for lzp and bwt
    - New adaptive CDF predictor with SSE2/AVX2/Neon/Altivec on x86/Arm/PowerPC (see the sketch after this post)
    CDF nibble (only the low nibble of each byte encoded):
    ./turborc -e40,48,49 enwik9 -t
    E MB/s     size       ratio%   D MB/s   function
    80.78      475135200  47.51%   68.26    40-rc4s  bitwise adaptive o0 simple
    308.09     474280380  47.43%   67.40    48-rccdf CDF adaptive
    306.41     474280388  47.43%   83.10    49-rccdf CDF nibble adaptive interleaved
    ./turborc -e40,48,49 enwik9bwt -t
    E MB/s     size       ratio%   D MB/s   function
    137.38     156109312  15.61%   123.31   40-rc4s  bitwise adaptive o0 simple
    377.11     163493876  16.35%   112.27   48-rccdf CDF nibble adaptive
    397.73     163493884  16.35%   143.06   49-rccdf CDF nibble adaptive interleaved
    CDF byte using high + low nibble:
    ./turborc -e11,48,49 enwik9
    E MB/s     size       ratio%   D MB/s   function
    55.32      620093192  62.01%   47.69    11-rcs   bitwise o0 simple
    165.69     622450296  62.25%   51.27    48-rccdf CDF adaptive
    157.96     622450300  62.25%   50.96    49-rccdf CDF byte adaptive interleaved
    ./turborc -e11,48,49 enwik9bwt
    E MB/s     size       ratio%   D MB/s   function
    80.92      183542676  18.35%   74.10    11-rcs   bitwise o0 simple
    203.92     195826332  19.58%   88.09    48-rccdf CDF byte adaptive
    200.22     195826340  19.58%   87.38    49-rccdf CDF byte adaptive interleaved
    27 replies | 2937 view(s)
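    An adaptive CDF predictor for nibbles is, at its core, the textbook update below (editor's sketch, not the Turbo-Range-Coder source; a real coder also vectorizes this loop, which is where the SSE2/AVX2/Neon versions come in, and this sketch assumes the usual arithmetic right shift for negative values):
        /* Editor's sketch: adaptive CDF over 16 nibble symbols for a range coder.
           cdf[i] = scaled P(symbol < i); cdf[0]=0 and cdf[16]=CDF_MAX stay fixed.
           After coding symbol s, every entry moves a small step toward the CDF of a
           distribution concentrated on s; the target keeps gaps >= 1 so no symbol's
           frequency ever reaches zero. */
        #include <stdint.h>

        #define CDF_BITS 15
        #define CDF_MAX  (1u << CDF_BITS)
        #define RATE     5                      /* adaptation speed: larger = slower */

        typedef struct { uint16_t cdf[17]; } NibbleCdf;

        void cdf_init(NibbleCdf *m) {
            for (int i = 0; i <= 16; i++)
                m->cdf[i] = (uint16_t)((i * CDF_MAX) / 16);   /* uniform start */
        }

        void cdf_update(NibbleCdf *m, int s) {  /* s in 0..15 */
            for (int i = 1; i < 16; i++) {
                int target = (i > s) ? (int)CDF_MAX - 16 + i : i;
                m->cdf[i] = (uint16_t)(m->cdf[i] + ((target - (int)m->cdf[i]) >> RATE));
            }
        }

        /* The range-coder side (not shown) codes symbol s with
           low = cdf[s], freq = cdf[s+1] - cdf[s], total = CDF_MAX. */
    Updating 15 entries per symbol is exactly the kind of loop that benefits from SIMD, which matches the speed gap between the bitwise and CDF rows above.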
  • Darek's Avatar
    13th February 2020, 10:06
    Darek replied to a thread Paq8pxd dict in Data Compression
    Scores for my testset on paq8pxd_v72. Very nice gain on audio files - about 111 KB over the whole set. Also fine gains on textual files; however, there are some losses on other kinds of files - especially K.WAD, which loses 10.5 KB. In general, a very good audio model implementation - thanks for this!
    687 replies | 280906 view(s)
  • arshalatti's Avatar
    13th February 2020, 06:56
    Hope
    22 replies | 1348 view(s)
  • suryakandau@yahoo.co.id's Avatar
    13th February 2020, 06:28
    bbb v1.8, enwik8: size 20,772,541 bytes, compression time 692.56 sec, decompression time 428.44 sec
    31 replies | 2648 view(s)
  • Darek's Avatar
    12th February 2020, 19:25
    Darek replied to a thread paq8px in Data Compression
    For textual files the difference from using the -t option is crucial:
    S.DOC: original 141'417 | v185 -9: 23'877 | v185 -9t: 22'741 | v185 -9ta: 22'700
    R.DOC: original 120'201 | v185 -9: 27'068 | v185 -9t: 24'823 | v185 -9ta: 24'759
    1813 replies | 517500 view(s)
  • moisesmcardona's Avatar
    12th February 2020, 19:22
    Edit: @kaitz beat me XD
    687 replies | 280906 view(s)
  • kaitz's Avatar
    12th February 2020, 19:21
    kaitz replied to a thread Paq8pxd dict in Data Compression
    paq8pxd_v72
    - Changed StationaryMap, StateMap
    - Changed jpegModel
    - Changed wordModel1
    - Removed wavModel
    - Added audio8, audio16
    There may be some tiny loss on 24-bit image compression. The audio models now work - I was probably tired when I couldn't get them working before. It's always possible to add the ppm and lstm models back.
    687 replies | 280906 view(s)
  • CompressMaster's Avatar
    12th February 2020, 18:49
    It is still impossible to compress random or already-compressed data by the ways you have described. Read, for example, the counting argument in Matt Mahoney's book about data compression (illustrated after this post). The only possible way is to DECOMPRESS the input and then COMPRESS it with much stronger algorithms - for example, JPEGs can be compressed up to 40%, but further compression requires decompression first. I tried it many times along with all possible methods (including preprocessing) and the results were always bigger files than the original. So, no compression... forget about it. It's impossible.
    9 replies | 462 view(s)
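    The counting (pigeonhole) argument referenced above fits in a few lines (editor's illustration): there are 2^n distinct n-bit files but only 2^n - 1 distinct files shorter than n bits, so no lossless coder can shrink every n-bit input.
        /* Editor's illustration: files shorter than n bits number
           2^0 + 2^1 + ... + 2^(n-1) = 2^n - 1, one fewer than the 2^n n-bit files. */
        #include <stdio.h>

        int main(void) {
            for (int n = 1; n <= 16; n++) {
                unsigned long long shorter = 0;
                for (int k = 0; k < n; k++) shorter += 1ULL << k;
                printf("n=%2d  n-bit files: %6llu  shorter files: %6llu\n",
                       n, 1ULL << n, shorter);
            }
            return 0;
        }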
  • suryakandau@yahoo.co.id's Avatar
    12th February 2020, 17:13
    I guess the enwik10 result can be 1,637,xxx,XXX bytes
    31 replies | 2648 view(s)
  • DZgas's Avatar
    12th February 2020, 16:40
    DZgas replied to a thread paq8px in Data Compression
    OK. exe:
    original     | 393 728 |
    v183fix1 -6  | 371 238 |
    v185 -6      | 371 238 |
    v183fix1 -6e | 371 156 |
    v185 -6e     | 371 173 |
    1813 replies | 517500 view(s)
  • Sportman's Avatar
    12th February 2020, 14:04
    I need a Windows binary for that.
    100 replies | 6135 view(s)
  • Darek's Avatar
    12th February 2020, 12:58
    Darek replied to a thread Paq8pxd dict in Data Compression
    > Also im8 compression is a lot worse than in the px version. Can't figure out why.
    But the new changes to im24 compression give you the best overall scores for paq8pxd_v70 on my testset files. :)
    687 replies | 280906 view(s)
  • Darek's Avatar
    12th February 2020, 12:53
    Darek replied to a thread paq8px in Data Compression
    @DZgas - you are right that for a pure comparison between versions of the same compressor it is better not to use switches. My tests try to find the maximum compression from each compressor/version, which means also using the most optimal switches - especially for textual files this gives much better results in the paq8px series. Using the maximum options also gives a proper comparison with other programs like cmix, cmv, emma, nncp or even paq8pxd - best score against best score.
    1813 replies | 517500 view(s)
  • RichSelian's Avatar
    12th February 2020, 12:27
    could you also add orz v1.6.1, please? the source code is here: https://github.com/richox/orz/tree/v1.6.1
    100 replies | 6135 view(s)
  • DZgas's Avatar
    12th February 2020, 10:24
    DZgas replied to a thread paq8px in Data Compression
    @Darek I cannot run v184; I am a non-AVX user, and I do not think v184 can be tested due to its problems. I never use -e or -t because they affect the tests and results (as seen in your example). Why are you using them?
    1813 replies | 517500 view(s)
  • RichSelian's Avatar
    12th February 2020, 08:46
    Yet another "Random Compression" post :D By the way, all BWT-based compressors do "go back 1 step then go forward 2 steps" via the BWT transform: it always enlarges the file by 4 or 8 bytes (the stored primary index - see the sketch after this post).
    9 replies | 462 view(s)
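    The 4 or 8 extra bytes are the primary index every BWT output has to carry so the transform can be undone; a naive forward transform makes that explicit (editor's sketch, O(n^2 log n), for illustration only - real compressors use suffix sorting):
        /* Editor's sketch: naive forward BWT. The transformed bytes alone are not
           decodable; the primary index must be stored too - the extra 4 (or 8) bytes. */
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>

        static const unsigned char *g_buf;
        static size_t g_n;

        static int cmp_rot(const void *a, const void *b) {   /* compare cyclic rotations */
            size_t i = *(const size_t *)a, j = *(const size_t *)b;
            for (size_t k = 0; k < g_n; k++) {
                unsigned char x = g_buf[(i + k) % g_n], y = g_buf[(j + k) % g_n];
                if (x != y) return x < y ? -1 : 1;
            }
            return 0;
        }

        int main(void) {
            const char *s = "banana";
            g_buf = (const unsigned char *)s;
            g_n = strlen(s);
            size_t *rot = malloc(g_n * sizeof *rot);
            for (size_t i = 0; i < g_n; i++) rot[i] = i;
            qsort(rot, g_n, sizeof *rot, cmp_rot);

            unsigned primary = 0;                 /* position of the original rotation */
            printf("last column: ");
            for (size_t i = 0; i < g_n; i++) {
                if (rot[i] == 0) primary = (unsigned)i;
                putchar(g_buf[(rot[i] + g_n - 1) % g_n]);
            }
            printf("\nprimary index: %u (stored as a 4-byte header -> n + 4 bytes)\n", primary);
            /* prints "nnbaan" and primary index 3 */
            free(rot);
            return 0;
        }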
  • suryakandau@yahoo.co.id's Avatar
    12th February 2020, 06:28
    bbb v1.7, enwik8: 20,779,982 bytes, compression time 718.26 sec, decompression time 435.48 sec. I have not tested enwik10 yet because my main job (not as a programmer) keeps me very busy.
    31 replies | 2648 view(s)
  • Trench's Avatar
    12th February 2020, 03:44
    Hi. To compress a random 255-character-alphabet ASCII file within a 255-character limit: not likely. The dictionary/word/block size would need to be far greater, which would mean unrealistically more processing power. But to compress a random 0-9 digit file into 255 characters: likely, though not if you try to compress it into 10 characters.
    pointless - I find compression more like cheap tricks than skills. With compression I see programs just take a file and try to go 1 step forward to compress it, rather than go 1 step back and 2 steps forward. I see people selling compression programs, but it does not matter, since even a small % gain is not worth changing the standard - many still rely on Zip despite 7-Zip being better. Maybe file compression should not compress at 100% of its ability but leave some room to take that 1 step back and try again, which needs a new approach.
    math - Large numbers can, with luck, be compressed to a small size in a fast time; sometimes file compression comes down to luck. Example: try to compress a simple number like 387,420,489. This can theoretically be compressed to 1 digit in 1 step (roughly 90% compression): the answer is 9^9, the digit raised to the power of itself (a small check of this arithmetic follows this post). But if even one digit is off, or the number is half that length, then it is impossible to beat the original number. Even current compression methods have a limited instruction range for how to compress things. Things have to be set up right, like dominoes, to be taken down easily. A program that takes advantage of that would need some tough math, unless you find a way around it. But how can someone dismiss something without evaluating it? Or maybe I am wrong - if so, just continue with your road-blocked standard and don't look at other avenues.
    audio - Even music files cannot be compressed well, but MIDI files can, since they are presented in a different way. They are structured very differently: you are not compressing the tune, since the instruments live in the file that loads it - you are compressing the pattern. The same goes for old art tile sets for old systems: they can describe a large image yet compress better than modern pictures, because you are compressing a pattern and the art set is in the file that loads it. Most programs now do not treat the loader file as the main thing. Some rare ones do, and you cannot tell the difference between a 10 MB instrumental audio file and a few-KB custom music file that takes the MIDI approach.
    image - Even image files do not deal with the pattern; they take the image as it is. Compression programs cannot come close to compressing a file the way the original program does once the data is converted to modern formats, which are in a sense even "more" random. Example: an 8-bit game with half the artwork presented alone in PNG format (compressed only 5%) is over 55% bigger than the original game file compressed, which contains more art plus the program. In other words there is at least a 55% gap in modern compression performance, since it does not know how to set up, let alone look for, the patterns in a small 300 KB PNG image with no artifacts and limited colors. Those 2 examples show that some re-evaluation is needed when it comes to pattern recognition in compression.
    setup - The file has to be set up to be willing to be compressed, just like a hyperactive child needs to calm down for a picture. If the setup is correct, the end result will be better; if the setup is not correct, then cheap tricks cannot fix it, but more advanced image-enhancing programs can set it up better. Just as you cannot change the initial code of a PNG file, maybe (though not likely) you can set it up to be taken down like dominoes, as with 9^9.
    switch - Maybe compression has to think outside the box, and people should try to avoid the standard and find something completely different to get different results. You will say I am wrong, but that's fine; I am just presenting a different perspective. For example, another "trick": if you switch something as simple as a binary file's characters, you can compress the file a little over 3% more - yet compression programs don't take advantage of that. But then again, a compression program is like trying to use a butcher knife for everything from cutting hair to surgery. If something is not flexible, it will break.
    skills - I forgot general programming a long time ago, so I am talking from a different perspective; I only do programming formulas in spreadsheets. But just like every other activity in the world, you need other perspectives to give you an edge. You may make a better burger than McDonald's, but you sure cannot sell more than McDonald's. Everyone who knows better than McDonald's still lacks a long list of skills, from marketing, accounting, real estate, art, design, nutrition, psychology, and so on through almost every profession.
    chart - In the end, if you do it for fun, good; if you do it for work, it's not worth it. Some people like pointless activities, and some try to be like Edison, who tested over 100 light bulbs and noted for research which methods did not work, to save others time. But like the periodic table that predicted elements before they were discovered, it almost seems we are all at the pre-periodic-table stage, stumbling in the dark to find a new element, without understanding what we have or focusing on the big picture. Again, just another perspective. Thanks
    9 replies | 462 view(s)
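    Editor's check of the arithmetic in the "math" paragraph above: the trick only works because that particular number happens to be exactly a digit raised to itself; change any digit and the short description no longer exists.
        /* Editor's check: 9^9 = 387,420,489, so that one number can be written as "9^9". */
        #include <stdio.h>

        int main(void) {
            unsigned long long p = 1;
            for (int i = 0; i < 9; i++) p *= 9;   /* 9 raised to the power of itself */
            printf("9^9 = %llu\n", p);            /* prints 387420489 */
            return 0;
        }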