Activity Stream

  • dnd's Avatar
    Today, 21:53
    Thank you for your elaboration, corrections and hints. I've recently removed some old codecs that are unmaintained or not notable. Many codecs listed in the readme but not in the TurboBench repository must be downloaded manually and activated in the makefile. I will continue to clean, simplify and automate the process.
    170 replies | 43212 view(s)
  • JamesB's Avatar
    Today, 20:51
    JamesB replied to a thread OBWT in Data Compression
    We use mingw/msys for building some of our software at work. A while back a colleague wrote a guide on how to do this here: https://github.com/samtools/htslib/issues/907#issuecomment-521182543 That's the Msys64 build of mingw64, which is a bit different from the native mingw build, but for us it works better as it has a more consistent interface and doesn't try to unixify all pathnames and drive letters. Your mileage may vary, but the instructions there hopefully help. Regarding BBB itself, I think it would be best to edit the version string to indicate this isn't an official BBB release but your fork, e.g. BBB_sury v1.9 (a minimal illustration follows this entry). This happened a lot with the PAQ series. It then acknowledges the heritage, but also that it's a new project and not by the original author. Edit: naturally the htslib part of that link is irrelevant for you, but the pacman command (or most of it anyway) is important, as it outlines how to install the compiler suite once you've got msys up and running.
    34 replies | 2784 view(s)
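    A minimal sketch of the version-string suggestion above; the macro name and banner text are hypothetical, not taken from BBB's source.
        #include <stdio.h>

        /* Hypothetical fork identifier: names the fork, its version and its origin. */
        #define BBB_FORK_VERSION "bbb_sury v1.9 (fork of bbb by Matt Mahoney)"

        int main(void) {
            /* Printed at startup so logs and benchmarks attribute results to the fork. */
            printf("%s\n", BBB_FORK_VERSION);
            return 0;
        }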
  • Darek's Avatar
    Today, 20:34
    Darek replied to a thread Paq8pxd dict in Data Compression
    On the one hand that's good news -> if you could change this, then the scores for enwik8 could be even better!
    695 replies | 281236 view(s)
  • kaitz's Avatar
    Today, 19:03
    kaitz replied to a thread Paq8pxd dict in Data Compression
    My bad, level 10 and up will be in -s mode.
    695 replies | 281236 view(s)
  • Darek's Avatar
    Today, 18:58
    Darek replied to a thread Paq8pxd dict in Data Compression
    @kaitz - I have a question: does the -x option work for -x15? I've tested enwik8 files and the scores for -s15 and -x15 are identical. Timings are also similar. For my testset, for textual files there is a difference between the -s9 and -x9 options. On the other hand, the scores are great:
    16'013'638 - enwik8 -s15 by Paq8pxd_v72_AVX2
    16'672'036 - enwik8.drt -s15 by Paq8pxd_v72_AVX2
    15'993'409 - enwik8 -s15 by Paq8pxd_v73_AVX2
    15'956'537 - enwik8.drt -s15 by Paq8pxd_v73_AVX2
    695 replies | 281236 view(s)
  • kaitz's Avatar
    Today, 18:48
    kaitz replied to a thread Paq8pxd dict in Data Compression
    In single mode there is a file overhead of about 50 bytes per file vs the px version, like when you compress a 1-byte file. For this single-mode test I think it's about 9000 bytes total. Not sure how much overhead there is on the tarred file, probably 100 bytes total for the input data (data that can't be compressed).
    695 replies | 281236 view(s)
  • pacalovasjurijus's Avatar
    Today, 18:43
    The Random file:
    24 replies | 1413 view(s)
  • Darek's Avatar
    Today, 18:23
    Darek replied to a thread Paq8pxd dict in Data Compression
    Scores on my testset for paq8pxd_v73 - very nice improvements overall, especially for the K.WAD file. In total 25KB of gain. Option -x (second table) gives an additional 16KB of gain over the -s test -> the gains are visible for almost every file. The time penalty for the -x option compared to -s is about 21%.
    695 replies | 281236 view(s)
  • Lucas's Avatar
    Today, 17:19
    Lucas replied to a thread OBWT in Data Compression
    Mingw is a suite of development tools. When you install mingw, it adds the directory containing mingw's executables to your PATH environment variable. If you have installed mingw, reboot your machine, then open command prompt (or any other terminal tool you like) and type "g++". You'll be able to compile with that, as well as use optimization flags.
    34 replies | 2784 view(s)
  • Jyrki Alakuijala's Avatar
    Today, 17:11
    We looked into this in detail. The bitrates used in the cameras are 4-5 BPP, and at 1.5 BPP we can provide similar quality. The JPEG image bytes going into Chrome in June 2019 average between 2-3 BPP depending on the image size. See for reference: https://www.spiedigitallibrary.org/conference-proceedings-of-spie/11137/111370K/JPEG-XL-next-generation-image-compression-architecture-and-coding-tools/10.1117/12.2529237.full?SSO=1 When we deliver the same images with 35 % of the bytes, we are talking about 0.7 to 1.05 BPP for internet use. This is a 3x saving on current internet practice (a small worked example of this arithmetic follows this entry). The encoder is tuned such that this corresponds to a distance setting of 1.4 to 2.1. At these distances ringing is not yet a problem in JPEG XL. There is a margin of about a doubling of compression density before JPEG XL loses the throne to video codecs. People actually do care about image quality. E-commerce sales are up by XX % with higher quality images, and click rates on videos can be higher for higher quality thumbnails. This is why people are not sending blurred or distorted images as final images on the internet. Also, I don't expect the quality requirements to go down as technology matures. If anything, the design decisions and positioning toward medium and high quality become a more ideal selection in the future. As a last note, we have two ways to counteract ringing at the lowest quality: adaptive quantization and the filter range field. When we max out the filter range field, we get similarly blurred results as video codecs do. With adaptive quantization we can allocate more bits in rare areas that show ringing. It is just that our current encoder development has focused on practically relevant use cases, which are in the distance setting range of 1.0 to 2.0.
    73 replies | 17390 view(s)
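    A small worked example of the BPP-to-file-size arithmetic above, assuming a hypothetical 12-megapixel image; the BPP values echo the ranges mentioned in the post, and the resulting sizes are purely illustrative.
        #include <stdio.h>

        int main(void) {
            const double megapixels = 12.0;                /* hypothetical camera image */
            const double pixels = megapixels * 1e6;
            /* Midpoints of the BPP ranges discussed in the post above. */
            const double bpp[] = { 4.5, 2.5, 1.5, 0.875 };
            const char *label[] = { "camera JPEG (~4-5 BPP)",
                                    "web JPEG (~2-3 BPP)",
                                    "JPEG XL, similar quality (~1.5 BPP)",
                                    "35% of web bytes (~0.7-1.05 BPP)" };
            for (int i = 0; i < 4; i++)
                printf("%-38s %6.2f MB\n", label[i], pixels * bpp[i] / 8.0 / 1e6);
            return 0;
        }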
  • suryakandau@yahoo.co.id's Avatar
    Today, 16:48
    bbb v1.9, enwik10, using the cm1000 option: 1,635,575,990 bytes. I have attached the binary and source code. @Shelwien, could you compile it please? I downloaded mingw from your link but there is no executable inside it. I have downloaded from nuwen.net and still cannot compile it, so I still use Dev-C++ 5.11 to compile it.
    34 replies | 2784 view(s)
  • JamesB's Avatar
    Today, 14:12
    I guess if you could find funding for a virtual machine instance (e.g. AWS or Google Cloud support), then one way of "winning" the competition is being on the Pareto frontier - either encode or decode. It encourages research into all aspects of data compression rather than simply chasing the best ratio. It can be on hidden data sets too (the SequenceSqueeze competition did that), so it's slow and hard to over-train to one specific set. The reason for a standard virtual machine is automation and reproducibility. Either that, or one person has to run every tool themselves for benchmarking purposes. Edit: I'd also say some subtle data permutations of the input could scupper the race for the best preprocessors, especially if the permutations are secret, while still keeping the same general data patterns. E.g. something as dumb as rotating all byte values by 17 would completely foul many custom format-specific preprocessors while not breaking general-purpose things like automatic dictionary generation or data segmentation analysis (a sketch of such a byte rotation follows this entry).
    22 replies | 1583 view(s)
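    A minimal sketch of the byte-rotation permutation suggested above; the offset 17 comes from the post, the function name and toy data are illustrative.
        #include <stdio.h>
        #include <stddef.h>

        /* Rotate every byte value by a fixed offset (wraps mod 256). General
         * statistics are preserved, but exact byte values such as format magic
         * numbers are not, which breaks format-specific preprocessors. */
        static void rotate_bytes(unsigned char *buf, size_t n, unsigned char offset) {
            for (size_t i = 0; i < n; i++)
                buf[i] = (unsigned char)(buf[i] + offset);
        }

        int main(void) {
            unsigned char data[] = "example input bytes";
            size_t n = sizeof(data) - 1;       /* exclude the trailing NUL */
            rotate_bytes(data, n, 17);         /* secret permutation applied to the test set */
            rotate_bytes(data, n, 256 - 17);   /* inverse rotation restores the original */
            printf("%s\n", data);              /* prints "example input bytes" again */
            return 0;
        }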
  • brispuss's Avatar
    Today, 10:45
    brispuss replied to a thread Paq8pxd dict in Data Compression
    I've run some tests with paq8pxd v73 and added the results to the table below. Tests run under Windows 7 64-bit, with an i5-3570K CPU and 8 GB RAM. Used SSE41 compiles of paq8pxd v*.
    Compressor               Total file(s) size (bytes)   Compression time (seconds)   Compression options
    Original 171 jpg files   64,469,752
    paq8pxd v69              51,365,725                   7,753                        -s9
    paq8pxd v72              51,338,132                   7,533                        -s9
    paq8pxd v73              51,311,533                   7,629                        -s9
    Tarred jpg files         64,605,696
    paq8pxd v69              50,571,934                   7,897                        -s9
    paq8pxd v72              50,552,930                   7,756                        -s9
    paq8pxd v73              50,530,038                   7,521                        -s9
    Overall, improved compression, and a slight reduction in compression time for v73!
    695 replies | 281236 view(s)
  • Jarek's Avatar
    Today, 09:41
    I was focused on the steam pattern on the coffee - AVIF destroyed it, HEIC and JXL maintained it but added artifacts... and indeed JXL has additional ringing on the cup, but all 3 have artifacts there.
    73 replies | 17390 view(s)
  • Jaff's Avatar
    Today, 06:14
    Jaff replied to a thread Papa’s Optimizer in Data Compression
    Put this before optimizing JPEG: 0) Extract the JPEG trailer to a separate file (needs the latest exiftool, 11.87). 5) Add back the trailer: 1.jpg = optimised file, 2.dat (saved trailer), 3.jpg (new optimized file). Then rename the file back and delete the temp files... I'm waiting for the new version. Now you can add an option to strip the trailers or not.
    79 replies | 13017 view(s)
  • Shelwien's Avatar
    Today, 04:13
    > Sometimes, looking at things conventionally, we get limited to conventional limits.
    It's not about conventions, but rather about hardware constraints. For example, there's a known method which allows combining most of the different existing compressors together as cmix submodels. Also, just using more memory and more contexts still helps too. But business only cares about efficient algorithms, like ones that transparently save storage space. Also, from the p.o.v. of Kolmogorov complexity, common arithmetic operations are less efficient than NNs or LPC: https://en.wikipedia.org/wiki/Function_approximation
    > How much does compression get better every year?
    For cmix it's about 250kb per year.
    > How many types of compression are there, and can they be charted out in a logical manner?
    Compression is approximate enumeration of datatype instances, so we can say that there're N*M types of compression, where N is the number of types of data and M is the number of approximation methods... essentially it's infinite.
    > What compression methods did not work?
    Mostly the ones that you suggest, I suppose. "Common sense methods" mostly don't work: https://en.wikipedia.org/wiki/List_of_cognitive_biases
    > How many hours have people wasted looking at something that thousands
    > of others have wasted looking at, when they could have looked elsewhere?
    I'd say less than necessary. Compression developers too frequently don't actually look at the data at all (because hex viewers are not integrated in the OS and/or IDE, etc), but instead work based on their imagination and test results.
    > If 1 digit can compress 4 digits and 4 digits can compress 16 digits,
    > then why can't the new 4 digits be compressed to 1 again?
    Because people mostly care about decompression... and a single digit has too few different values to be decompressed to many different files.
    > Did it cheat in the first place, maybe by relying on another format like hex or ASCII?
    Compression algorithms normally only work with binary data - there's no hardware for efficient storage of e.g. decimal digits (or ASCII), thus there's no software to support it.
    > In the end it is recognizable randomness vs unrecognizable randomness.
    In the end most data sequences are incompressible; there're just too many of them, much more than the number of possible generator programs of smaller size (since these have dups and failures).
    12 replies | 603 view(s)
  • cade's Avatar
    Today, 03:18
    This map is rendered with a map loader that runs commands of the type: at x, y put source tile/sprite z. Tiles and sprites are either perfectly full chunks or chunks with transparency (bit mask). Something as simple as LZ compression can recognize 1D segments of these tiles/sprites, which is a lot more useful than .png (which looks for linear patterns, then entropy reduction, not matches). Simple template matching with fixed-size images can match those patterns and reproduce the original instructions of (x, y, z); a sketch of this follows this entry. Or to put it simply, .png is the wrong information representation for that format.
    12 replies | 603 view(s)
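    A minimal sketch of the fixed-size template matching described above, assuming a hypothetical tileset of 8x8 single-byte (indexed/grayscale) tiles; the tile size, names and toy data are illustrative, not taken from the actual map loader.
        #include <stdio.h>
        #include <string.h>

        #define TILE 8   /* hypothetical tile edge length, in pixels */

        /* Compare an aligned TILE x TILE block of the map image against tile z. */
        static int block_matches(const unsigned char *img, int img_w,
                                 int x, int y, const unsigned char tile[TILE][TILE]) {
            for (int ty = 0; ty < TILE; ty++)
                if (memcmp(&img[(y + ty) * img_w + x], tile[ty], TILE) != 0)
                    return 0;
            return 1;
        }

        /* Scan the image on the tile grid and emit "at (x,y) put tile z" commands,
         * i.e. recover the (x, y, z) instructions instead of storing raw pixels. */
        static void emit_commands(const unsigned char *img, int img_w, int img_h,
                                  const unsigned char (*tiles)[TILE][TILE], int ntiles) {
            for (int y = 0; y + TILE <= img_h; y += TILE)
                for (int x = 0; x + TILE <= img_w; x += TILE)
                    for (int z = 0; z < ntiles; z++)
                        if (block_matches(img, img_w, x, y, tiles[z])) {
                            printf("at (%d,%d) put tile %d\n", x, y, z);
                            break;
                        }
        }

        int main(void) {
            /* Toy example: one 8x8 image that is exactly tile 0. */
            static const unsigned char tiles[1][TILE][TILE] = { { {0} } };
            static const unsigned char img[TILE * TILE] = { 0 };
            emit_commands(img, TILE, TILE, tiles, 1);   /* prints: at (0,0) put tile 0 */
            return 0;
        }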
  • Trench's Avatar
    Today, 02:56
    James, you know better, but I was giving that as an example. I could have made an 8x8 or 16x16 sprite and duplicated it to 10000x10000. The point is for the compression program to have a library of methods to handle certain structures. Sure, there are plenty of variations, but to have a reasonable progressive structure in the compression program, to take into account patterns, gradients, shapes, etc. Sometimes, looking at things conventionally, we get limited to conventional limits. How much does compression get better every year? How many types of compression are there, and can they be charted out in a logical manner? What compression methods did not work? How many hours have people wasted looking at something that thousands of others have wasted looking at, when they could have looked elsewhere? Obviously these all can't be answered easily, but maybe before people try to make the next best compression program they should understand the basic information. If 1 digit can compress 4 digits and 4 digits can compress 16 digits, then why can't the new 4 digits be compressed to 1 again? Did it cheat in the first place, maybe by relying on another format like hex or ASCII? In the end it is recognizable randomness vs unrecognizable randomness. Or no pattern vs pattern - it's all random, but programs only prepare for certain patterns and not all, since it can be confusing. If we never had ASCII or hex, would compression be possible? Or if we take the complexity of the unrecognizable and make it more recognizable, from 1@C/ to 1234, would that help? Again, you guys know best and I am just asking questions.
    12 replies | 603 view(s)
  • hexagone's Avatar
    Today, 02:54
    I know what it is, I have done lossy image compression myself. I was just mentioning the issue since Jarek said that JXL looked the best for this picture and the other codecs do not show this issue.
    73 replies | 17390 view(s)
  • kaitz's Avatar
    Today, 01:42
    kaitz replied to a thread Paq8pxd dict in Data Compression
    paq8pxd_v73
    - Change text detection
    - Change wordModel1
    - Add mod_ppmd/chart model back (-x option)
    695 replies | 281236 view(s)
  • algorithm's Avatar
    Today, 01:19
    They are called ringing artifacts. It is a byproduct of the DCT - the Gibbs phenomenon. JPEG XL has a lot of ringing. JPEG XL is probably the best replacement for JPEG for digital cameras, but for internet distribution, where the bitrate is lower, it has weaknesses. (A small demonstration follows this entry.)
    73 replies | 17390 view(s)
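    A minimal sketch of the ringing described above, with an arbitrary 32-sample 1-D signal and an arbitrary cutoff: a step edge is transformed with a naive DCT-II, the high-frequency coefficients are zeroed as a crude stand-in for quantization, and the inverse transform over- and undershoots near the edge (the Gibbs phenomenon).
        #include <math.h>
        #include <stdio.h>

        #define N 32     /* transform length (illustrative) */
        #define KEEP 8   /* keep only the 8 lowest-frequency coefficients */

        int main(void) {
            const double PI = 3.14159265358979323846;
            double x[N], X[N], y[N];

            /* A 1-D step edge: dark half, bright half. */
            for (int n = 0; n < N; n++) x[n] = (n < N / 2) ? 0.0 : 1.0;

            /* Naive DCT-II. */
            for (int k = 0; k < N; k++) {
                X[k] = 0.0;
                for (int n = 0; n < N; n++)
                    X[k] += x[n] * cos(PI / N * (n + 0.5) * k);
            }

            /* Crude "quantization": drop all high-frequency coefficients. */
            for (int k = KEEP; k < N; k++) X[k] = 0.0;

            /* Inverse transform (DCT-III with matching normalization). */
            for (int n = 0; n < N; n++) {
                y[n] = X[0] / N;
                for (int k = 1; k < N; k++)
                    y[n] += 2.0 / N * X[k] * cos(PI / N * (n + 0.5) * k);
            }

            /* Values undershoot 0 and overshoot 1 near the edge: ringing. */
            for (int n = 0; n < N; n++)
                printf("%2d % .3f\n", n, y[n]);
            return 0;
        }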
  • hexagone's Avatar
    Yesterday, 23:35
    For the coffee, JXL has boundary artifacts on the rim of the cup (180-270 degree angles). For the landscape, JXL does something weird with the clouds.
    73 replies | 17390 view(s)
  • Scope's Avatar
    Yesterday, 22:42
    I will try to find such examples; usually other formats and different encoders were not particularly better (I can remember something like manga or art with clear lines). But I didn't do much testing of HEIC (I tried to focus on open and royalty-free standards). AV1 is very slow and has a problem with strong blurring and loss of detail, and this cannot be fixed by changing the settings (it seems it was heavily optimized and tuned for streaming video, not static images); there is a similar problem with WebP. I tested at speed 8 (kitten), which was the default setting until the last update (now it is 7 - squirrel). I did not really notice a quality improvement at faster settings, but there were strange results at speed 9 (tortoise), so I stopped at 8, but I will try again. I tried to test the encoders without changing or filtering the images, to show what they can do with their own tools. Modular mode is now enabled with quality settings (-q) below 40; the last example was already encoded in this mode. I tested it separately, on manga/comics and drawings; at lower bitrates it sometimes shows better visual quality.
    73 replies | 17390 view(s)
  • Shelwien's Avatar
    Yesterday, 21:25
    Sure, but it's pretty hard to set up a fair competition with just that. LZMA has old-style entropy coding (from the time when out-of-order CPUs didn't exist), but there're several lzma-based codecs developed for new CPUs - LZNA etc. These new codecs don't really beat lzma in compression, but they do have 5-8x faster decoding. Also there's RZ and various MT schemes (including PPM/CM) which would consistently beat 7z LZMA2. Problem is, I'd like this to be an actual competition and not just finding and awarding the best non-public preprocessor developer.
    22 replies | 1583 view(s)
  • pacalovasjurijus's Avatar
    Yesterday, 19:27
    New version of compression. Test:
    Before: 3,145,728 bytes - 1(4)) (Random)
    After:  3,145,724 bytes - 1(4)).b
    24 replies | 1413 view(s)
  • Jyrki Alakuijala's Avatar
    Yesterday, 16:37
    One possible mitigation for JPEG XL's steeper degradation at the lowest BPPs is to resample the image to a slightly smaller resolution (for example ~70 % smaller) before compression. It might be a good idea to try this out on these 'go down to 10 kB' comparisons to represent the real use of extreme compression on the web. This would likely work pretty well for photographs at these BPPs (< 0.4 BPP), and less well for pixelized computer graphics. Then again, there is another mode (the FUIF-inspired modular mode) in JPEG XL for that.
    73 replies | 17390 view(s)
  • Jyrki Alakuijala's Avatar
    Yesterday, 16:18
    Superb presentation and good analysis of the codec situation. What if you run the JPEG XL encoder with the --distance 1.0 setting - will you be able to find an encoder that does better for any of the photographs (while consuming the same amount of bytes that JPEG XL used for 1.0)? More advanced question: at which distance settings will other encoders start to compete with JPEG XL? Would JPEG XL still win on every image if distance 1.5 were used? (if it won at 1.0 :-D) Also, I suspect that our slower speed settings (slower animals) are only great at around distance 1.0-1.5, and they can be harmful for very high distances. You might get better results with low bit budgets with the faster settings, without looping in butteraugli. (Butteraugli only works very well at small distances.) I haven't tested the slowest settings for a rather long time, so they might have bugs, too. Possibly better to run with default settings :-) Do you have personal experience with the speed settings and quality?
    73 replies | 17390 view(s)
  • Jarek's Avatar
    Yesterday, 16:17
    Thanks, they have very different artifacts; HEIC seems better than AVIF. For example, in https://imgsli.com/MTIyMDc/ AVIF completely destroys the water and sky; JXL is definitely the best - this and the faces probably come from perceptual evaluation. The upper bridge in https://imgsli.com/MTIyMDY/ is again completely destroyed by AVIF; the other two handle it. In https://imgsli.com/MTIyMDg/ all 3 have nasty artifacts in its strange sky. For the coffee in https://imgsli.com/MTIyMTk/ JXL looks the best. The fox https://imgsli.com/MTIyMjA/ is blurred in all 3; HEIC is definitely the best. In this abstract https://imgsli.com/MTIyMjQ/ JXL has really nasty boundary artifacts. In this 10 kB landscape https://imgsli.com/MTIyMzA/ AVIF is the best due to smoothing.
    73 replies | 17390 view(s)
  • Scope's Avatar
    Yesterday, 15:20
    It was me, I also added a comparison with HEIC (HEVC+HEIF) and extreme compression.
    73 replies | 17390 view(s)
  • Mauro Vezzosi's Avatar
    Yesterday, 14:37
    hbcount is always 0 in line 9905 (jpegModelx.p()):
    9869   if (hbcount==0) {
    9905     cxt=(!hbcount)?hash(mcupos, column, row, hc>>2):0; // MJPEG
    695 replies | 281236 view(s)
  • necros's Avatar
    Yesterday, 12:39
    A contest goal could be outperforming, for example, 7z LZMA2 compression on multiple data sets in terms of same or lower time and same or better compression.
    22 replies | 1583 view(s)
  • compgt's Avatar
    Yesterday, 11:42
    @JamesWasil, Ok... that's the official "history" you know. But i remember this the classified top secret part of 1970s to 80s Martial Law Cold War when the Americans were here in the Philippines. It was me a kind of "central hub" among scientist networks that time (so i learned from many, and privy to state of the art science and technologies that time), dictating on what-would-be computer science history for the 80s, 90s and 2000s (e.g. in data compression and encryption, IBM PC, Intel, AMD and Microsoft Windows dominance etc.). Just think of me then as a top military analyst. I mean, i wasn't just a player on all this; it was me moderating on everything tech. I knew i already co-own Apple and Microsoft. I guess i decided to officially be co-founder of Yahoo, Google and Facebook, but didn't happen officially in the 1990s and 2000s. There was "too much fierce" competition amongst tech companies. I mean, it was Cold War, a real war. The Cold War officially ended in the early 1990s, with the Americans leaving the Philippines, military bases left behind or demolished. In short, the real computing history of US (and the world) was made, written, and decided here in the Philippines, with me. I chose Steve Jobs. I glorified Bill Gates, bettering his profile more and more. I chose Sergey Brin and Larry Page for Google, and i decided for a Mark Zuckerberg Chairman-CEO profile for Facebook. Too many ownerships for me in tech that they became greedy, wanted my ownerships for themselves, or decided to move on without me. That is, they asserted to other player groups my decisions or timetable for them to own the tech giants, but without me. What kind is that?! In late 1980s, however, they reminded me of Zuckerberg and Facebook, implying a chance for me to officially "co-found" Facebook. I remember this encode.su website and GUI (1970s), as the names Shelwien, David Scott, Bulat Ziganshin, dnd, Matt Mahoney, Michael Maniscalco, Jyrki Alakuijala. Some of them would come to me in the Philippines in the 80s...if it was really them. By early 1990s i was forgetting already. In the mid 2000s i was strongly remembering again. If i hear my voice in the bands "official" recordings of Bread, Nirvana, America, Queen, Scorpions etc, i then strongly believe these computer science memories.
    12 replies | 603 view(s)
  • JamesWasil's Avatar
    Yesterday, 10:28
    There's a lot of things to address here, but first this:
    1) Compgt: PAQ did not exist in the 70's or 80's. It wasn't conceptualized until the middle of the 1990's; I remember reading the initial newsgroup and forum threads for it back then under comp.compression and other forums/sites that no longer exist. Some of the file formats mentioned didn't either. Just politely letting you know of that so that you don't get flamed for it someday in a conversation or post, even if you were working on predecessor algorithms at that time.
    Trench: You have to understand the difference between pixel-by-pixel compression and macro sprites. The NES, Sega Master System, and other game systems from that time used a Z80 or similar processor which ran anywhere from 1 MHz to 8 MHz based on what it was. The graphics and audio were usually separate but dedicated processors running at about the same speed. Even still, while having a processor for each made a lot of things possible that a single chip would have struggled to do for the time, RAM and storage space were still really expensive. They didn't want to use bloated image formats and they needed animation and scrolling to be fast.
    What they did is use sprites that were limited to 64x64 pixels (or less) and made background sprites that were used as tiles to create the images, maps, and backgrounds that you see. Larger sprite animations were at times 2 or 3 of those large blocks synchronized to move together, but at times did flicker because of latency and refresh rate issues when the GPU tried to do too much at once, which was evident when too many sprites were on the screen at the same time.
    What this means is that out of a given screen area of say 480x284 (may be different, but as an example), the entire background "image" was a jigsaw-piece-assembled sprite layer of tiles that were "stamped" in blocks, where 64x64-pixel blocks, the equivalent of 4096 pixels, were represented with a pointer that was either 6 bits or one 8-bit byte. This means that every 4096 pixels were represented by 1 byte rather than 16 (a small size sketch follows this entry). Yes, you might be able to fit entire images - stamped as sprite macros to form that image - at 1/16th its size for an NES game. But any image you're able to make with it is repetitive and restricted to the sprite artwork that is loaded. PNG, GIF, and even lossy formats like JPEG are NOT restricted to premade macro images / aka graphical text fonts, and have to be able to process, compress, and display ANY pixels you throw at them for an image.
    The same was done for audio and audio effects to make sure it all fit under 256 to 512k per cartridge. The earlier systems like the first generation of Sega Master Systems had to fit under 64k on game cards for the SG1000, and you were able to see the differences and restrictions even more to get it to fit.
    There are different types of compression, and not all are equivalent to one another. If there is a database of a priori knowledge, compression can be done with that. Compression isn't limited to the poor explanation of the tired and repetitive references to Mahoney's "Data Compression Explained", because in truth that is data compression only in one form and only explained one way. There are many other ways it can happen, and the NES and Sega implementation of sprite image layers for KNOWN data demonstrates that. Compression of unknown data becomes more difficult of course, but not impossible. Just very difficult, and only possible with methods that can address it.
    12 replies | 603 view(s)
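    A minimal sketch of the tile-map size argument above, assuming a hypothetical 256x192 background built from 8x8 tiles out of a 256-entry shared tileset at 1 byte per pixel; the numbers are illustrative, not exact NES or Master System figures.
        #include <stdio.h>

        int main(void) {
            /* Hypothetical screen and tile geometry (illustrative only). */
            const int screen_w = 256, screen_h = 192;
            const int tile = 8;                       /* 8x8-pixel tiles */
            const int cols = screen_w / tile, rows = screen_h / tile;

            /* Raw background at 1 byte per pixel vs. a tilemap of 1-byte tile
             * indices plus the shared tileset (256 tiles * 64 bytes each). */
            const int raw_bytes     = screen_w * screen_h;
            const int tilemap_bytes = cols * rows;
            const int tileset_bytes = 256 * tile * tile;

            printf("raw pixels : %d bytes\n", raw_bytes);
            printf("tile map   : %d bytes (+ %d bytes of shared tileset)\n",
                   tilemap_bytes, tileset_bytes);
            return 0;
        }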
  • Trench's Avatar
    Yesterday, 01:36
    The file you compressed has mostly empty space and few colors; like any file, if it's mostly empty it will compress well. But here are some examples of something more complex that can't be compressed anywhere near as small as the original program. Take an NES game map online: the image is 1,907 KB and compresses to 1,660 KB, while the game's file size online is listed as 75 KB (95% "compression"), even though the game includes more art, sound, and code. Another game on the list is Earthbound, where the map is 3.4 MB while the file size online says 194 KB (94%). This applies to 100 MB PNG image files that cannot come close to the original, which would probably be 5 MB. Plenty of patterns, yet compression cannot even recognize the patterns. Shouldn't this be a simple thing that is just not implemented yet? 95%! https://vgmaps.com/Atlas/NES/CastlevaniaII-Simon'sQuest-Transylvania(Unmarked).png
    12 replies | 603 view(s)
  • Jarek's Avatar
    Yesterday, 01:15
    Thanks, that's a lot of priceless satisfaction. There is also WebP v2 coming ( https://aomedia.org/wp-content/uploads/2019/10/PascalMassimino_Google.pdf ), but I don't think it will have a chance against JPEG XL - your big advantage is perceptual evaluation. Also, VVC is coming this year with a HEIF successor (VIF?), but it will likely have costly licenses and is also computationally much more complex.
    73 replies | 17390 view(s)
  • Sportman's Avatar
    16th February 2020, 21:40
    Added Shelwien compile.
    100 replies | 6216 view(s)
  • Jyrki Alakuijala's Avatar
    16th February 2020, 19:28
    Thank you!! Also, JPEG XL is the only ANS-based codec in this comparison :-D
    73 replies | 17390 view(s)
  • Shelwien's Avatar
    16th February 2020, 17:31
    Ask me :)
    1 replies | 46 view(s)
  • maorshut's Avatar
    16th February 2020, 17:21
    Hi all, how can I change my username, or delete my account if changing is not possible? Thanks
    1 replies | 46 view(s)
  • Sportman's Avatar
    16th February 2020, 13:38
    Sportman replied to a thread BriefLZ in Data Compression
    252,991,647 bytes, 4628.815 sec. (1 hour 17 min) - 3.683 sec., blzpack -9 --optimal -b1g (v1.3.0)
    1 replies | 292 view(s)
  • suryakandau@yahoo.co.id's Avatar
    16th February 2020, 11:31
    How about bbb cm1000 for v1.8? I use bbb cm1000 for v1.8.
    100 replies | 6216 view(s)
  • Sportman's Avatar
    16th February 2020, 11:24
    Added default mode.
    100 replies | 6216 view(s)
  • Jarek's Avatar
    16th February 2020, 10:02
    Indeed, the comparisons are great:
    poster ~80kB: https://imgsli.com/MTIxNTQ/
    lighthouse ~45kB: https://imgsli.com/MTIxNDg/
    windows ~88kB: https://imgsli.com/MTIxNDk/
    ice ~308kB: https://imgsli.com/MTE3ODc/
    face ~200kB: https://imgsli.com/MTE2MjI/
    JPEG XL is definitely the best here for maintaining details of textures like skin.
    73 replies | 17390 view(s)
  • Shelwien's Avatar
    16th February 2020, 05:32
    Shelwien replied to a thread OBWT in Data Compression
    Here I compiled 1.8. But you can easily do it yourself - just install mingw: https://sourceforge.net/projects/mingw-w64/files/Toolchains%20targetting%20Win32/Personal%20Builds/
    34 replies | 2784 view(s)
  • suryakandau@yahoo.co.id's Avatar
    16th February 2020, 04:48
    Could you upload the binary please? Because it is under the GPL license.
    34 replies | 2784 view(s)
  • Jyrki Alakuijala's Avatar
    16th February 2020, 02:48
    I'm looking into this. No actual progress yet.
    300 replies | 313514 view(s)
  • Jyrki Alakuijala's Avatar
    16th February 2020, 02:39
    Someone compares AVIF, WebP, MozJPEG and JPEG XL with a beautiful UI. https://medium.com/@scopeburst/mozjpeg-comparison-44035c42abe8
    73 replies | 17390 view(s)
  • jibz's Avatar
    15th February 2020, 19:01
    jibz started a thread BriefLZ in Data Compression
    Since the BriefLZ 1.2.0 thread disappeared, here is a new one! I've just pushed BriefLZ 1.3.0, which includes the forwards binary tree parser (btparse) that was in the latest bcrush and blz4. It improves the speed of --optimal on many types of data, at the cost of using more memory. The format is still backwards compatible with BriefLZ 1.0.0.
    enwik8        blzpack-1.2.0 --optimal -b100m   30,496,733    30 hours
    silesia.tar   blzpack-1.2.0 --optimal -b205m   63,838,305    about a week
    enwik8        blzpack-1.3.0 --optimal -b100m   30,496,733    95 sec
    silesia.tar   blzpack-1.3.0 --optimal -b205m   63,836,210    4 min
    enwik9        blzpack-1.3.0 --optimal -b1g     252,991,647   10.5 hours
    Not sure why the result for silesia.tar from 1.2.0 from two years ago is slightly higher, but I'm not going to rerun it. If anyone has a machine with 32 GiB of RAM, I would love to hear how long --optimal -b1g on enwik9 takes, because the results for this machine (8 GiB RAM) include swapping. Attached is a Windows 64-bit executable, and the source is at https://github.com/jibsen/brieflz
    1 replies | 292 view(s)
  • pacalovasjurijus's Avatar
    15th February 2020, 18:26
    Software: White_hole_1.0.0.1.6
    Before: 2019-07-01.bin (a little bit different but random) 1,048,576 bytes
    After: 2019-07-01.bin.b 1,048,396 bytes
    Time: 6 minutes
    24 replies | 1413 view(s)
  • compgt's Avatar
    15th February 2020, 15:35
    So, if you somehow compressed a string of jumbled characters into a significantly smaller string (or program), it is simply "random-appearing" and not algorithmically random.
    6 replies | 67 view(s)
  • compgt's Avatar
    15th February 2020, 15:23
    To solve the bombshell, i.e. the presence of "anomalous" symbols: (1) you must have a way to create a smooth frequency distribution, or the frequencies must be of the same bitsize as much as possible; there are many ways to do this, but it must be reversible. For example, you can maybe XOR the bytes of the data source first (this is reversible), pre-whitening it for encoding. (2) Or, the bombshell symbol or byte can be thought of as an LZ77 literal: simply output a prefix bit flag (anomalous symbol or not?) for the symbols. This means at most two bits per symbol for encoding, with the bit flag indicating whether the symbol sum doubles (MSBit), plus the anomalous symbol itself when it happens, 8 bits. I wonder how large the frequencies or the freqtable would be... And, like in Huffman coding, you can contrive or generate a file that is exactly perfect or suitable for this algorithm. What would be interesting is that the generated file is a "random-appearing data" file, perhaps indeed incompressible to known compressors. (See the other post, which now has pseudo-code for easier understanding.)
    6 replies | 67 view(s)
  • compgt's Avatar
    15th February 2020, 15:22
    > 4. Instead of 8-bit bytes, use 4-bit symbols;
    How about 2-bit (base-4) symbols? Or maybe even better, a data source of base-3 symbols?
    6 replies | 67 view(s)
  • compgt's Avatar
    15th February 2020, 15:21
    The compression algorithm is best understood if you "visualize" a bar chart or histogram, where each new symbol's frequency always tries to become greater than the current highest frequency, which we increment by its delta with the new symbol's frequency. The new highest frequency becomes the new symbol's frequency; or put simply, the new symbol must have the highest frequency. So at most, the new highest frequency can only "double", i.e. grow by 1 bit in length. (In decoding, the symbol with the highest frequency is the symbol to decode; this means it is stack based. We add the delta to the highest frequency during encoding so we can preserve, or get back to, the previous frequency of the symbol when decoding.) The output is actually the frequency table, which is easy to compress or generate?
    Algorithm pseudo-code:
        /* initialize frequency table. */
        for (i = 0; i < 256; i++) freq[i] = i + 1;
        max = freq[255];
        do {
            c = get_byte(infile);
            if (c == EOF) break;
            freq[c] = max + (max - freq[c]);
            max = freq[c];
        } while (1);
    No "runs" of a single character allowed in the input, as much as possible. "Random data" indeed.
    New or recalled observations:
    1. This algorithm ironically "expands" the frequencies at first. ? LOL We're back to the early days of information theory or data compression history!
    2. The bombshell: it takes more than 1 added bit to encode very small frequencies which suddenly must become the maximum. The solution might be to "swap" them, but this requires new information or codes. This is back to delta coding. haist
    3. But a total cycling of the frequency table might work...
    4. Instead of 8-bit bytes, use 4-bit symbols.
    *** This is similar, I think, to WEB Technologies' algorithm as featured in BYTE magazine in 1992 and noted by the comp.compression FAQ: "WEB, in fact, says that virtually any amount of data can be squeezed to under 1024 bytes by using DataFiles/16 to compress its own output multiple times." I think they were using or playing with a frequency table too, 256 32-bit frequencies = 1K. They might have had to output the MSBit of the highest frequency, the result of which may equal other byte frequencies? That's why they had the problem that 4 numbers in a matrix are equal, a rare case in their algorithm. Just maybe.
    (Ideally, at most 1 bit of increase in the frequency of the output or new symbol, but the bombshell precludes that. If they are of the same bitsize, then only 1 bit of increase in the new max frequency. The current symbol always has the highest frequency. You decode backwards, from last symbol to first; the symbol with the highest frequency is the current symbol. One parameter in decoding is the famed file_size(). A small round-trip sketch of this encode/decode rule follows this entry.)
    The problem with the algorithm is that the emitted frequency table could be very large due to very large frequencies if you implement it by really using BigNums or BigInts; you then have to compress the very large frequency table. Maybe to achieve compression, you can just consider the MSBit after the arithmetic (addition) operation. Or the solution is nearly just MTF (you have to output the character that *doubled* (MSBit activated)).
    WEB Technologies' Datafiles/16 algorithm is clearly designed for compression of *random* data, and recursive, which are futile indeed.
    6 replies | 67 view(s)
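    A minimal round-trip sketch of the encode/decode rule described in the entry above, written under the post's own restrictions: no consecutive repeats of a symbol ("no runs"), 64-bit integers standing in for the BigNums (so only short inputs avoid overflow), and no claim of actual compression - the final table itself is the output.
        #include <stdio.h>
        #include <string.h>

        /* Undo one encoding step: the current symbol is the argmax of the table;
         * the previous maximum is the largest other entry (valid only without runs). */
        static int decode_one(unsigned long long freq[256]) {
            int c = 0;
            for (int i = 1; i < 256; i++)
                if (freq[i] > freq[c]) c = i;
            unsigned long long prev_max = 0;
            for (int i = 0; i < 256; i++)
                if (i != c && freq[i] > prev_max) prev_max = freq[i];
            freq[c] = 2 * prev_max - freq[c];   /* encoding did: new = 2*prev_max - old */
            return c;
        }

        int main(void) {
            const char *msg = "abcab";          /* toy input, no consecutive repeats */
            size_t n = strlen(msg);
            unsigned long long freq[256];
            for (int i = 0; i < 256; i++) freq[i] = (unsigned long long)i + 1;
            unsigned long long max = freq[255];

            for (size_t k = 0; k < n; k++) {    /* encode: keep only the final table */
                unsigned char c = (unsigned char)msg[k];
                freq[c] = max + (max - freq[c]);
                max = freq[c];
            }

            char out[16] = {0};                 /* decode backwards, file_size() steps */
            for (size_t k = n; k-- > 0; )
                out[k] = (char)decode_one(freq);
            printf("%s\n", out);                /* prints "abcab" */
            return 0;
        }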
  • compgt's Avatar
    15th February 2020, 15:20
    Random data compressor #2 How about compressing an array of 256 Very Large Ints (BigNums or infinite-precision integers), i.e. a frequency table for 8-bit symbols? This is hard to compress, i think, since the said frequency table is already the output of a compression algorithm (which was fascinating at first).
    6 replies | 67 view(s)
  • compgt's Avatar
    15th February 2020, 15:18
    Random data compressor #1. One slow early approach is to guess the pattern or symbol: just try to guess the input byte in < 32 tries, to output just 5 bits. (You can do this by randomly setting bits of a dummy byte on or off and comparing it with the input byte.) If not guessed, output 00000 and then the 8-bit byte. How would you initialize the dummy byte? Maybe by context; crude LZP-like. What else? Build on this. Improve this.
    6 replies | 67 view(s)
  • compgt's Avatar
    15th February 2020, 15:17
    I create this separate thread for my random data compression ideas i posted in several threads here, so that it's easy to find in one thread. My "frequency table" random data coding was programmed in 2006-2007, perhaps with or without a decoder. Or maybe i solved it already but deleted the compressor. I remembered it in 2017 so here it is again. Note, do not try random data compression unless you can actually write a computer program to test your ideas, however futile they might be. It might take you years to clear up on your ideas, or admit that random data compression is impossible.
    6 replies | 67 view(s)
  • compgt's Avatar
    15th February 2020, 14:28
    My Google ownerships slot was taken from me. Now *my* JPEG is being killed off or supplanted by AVIF. I didn't earn officially from JPEG format anyway. We developed it in the 1970s to the 80s. Netflix will surely create its own image format, as Netflix is now a tech giant in media streaming. Incidentally, i planned Netflix too, having made the Hollywood movies, ... if i remember it right.
    73 replies | 17390 view(s)
  • suryakandau@yahoo.co.id's Avatar
    15th February 2020, 13:32
    Decompression time of enwik10 using bbb v1.8: 42,640.14 sec on my old machine. @Sportman, could you add this to your enwik10 benchmark please?
    34 replies | 2784 view(s)
  • Jarek's Avatar
    15th February 2020, 13:03
    "Netflix wants to kill off JPEGs": https://www.techradar.com/news/netflix-wants-to-kill-off-jpegs ~500 upvotes: https://www.reddit.com/r/programming/comments/f46ysc/netflix_avif_for_nextgeneration_image_coding/
    73 replies | 17390 view(s)
  • compgt's Avatar
    15th February 2020, 12:57
    @CompressMaster, why are you trying random compression? I don't mean to pry, but is data compression your line of work? I think there are many companies out there trying to solve random data compression. I am not a programmer now professionally, but I had done intermittent work on this compression stuff (zip format, bmp, gif, png, jpeg, vcd/svcd, dvd, paq etc.) in the 1970s and the 80s when I was a precocious child and grade-schooler. I am now just coding in C, too lazy to re-learn C++. Data compression remained an interest though.
    12 replies | 603 view(s)
  • Darek's Avatar
    15th February 2020, 12:19
    Darek replied to a thread Paq8pxd dict in Data Compression
    Some enwik scores for paq8pxd_v72:
    16'309'641 - enwik8 -s8 by Paq8pxd_v61
    15'968'477 - enwik8 -s15 by Paq8pxd_v61
    16'570'543 - enwik8.drt -s15 by Paq8pxd_v61
    126'587'796 - enwik9_1423 -s15 by Paq8pxd_v61 - best score for the paq8pxd series
    16'309'012 - enwik8 -s8 by Paq8pxd_v63
    15'967'201 - enwik8 -s15 by Paq8pxd_v63
    16'637'302 - enwik8.drt -s15 by Paq8pxd_v63
    126'597'584 - enwik9_1423 -s15 by Paq8pxd_v63
    16'374'223 - enwik8 -s8 by Paq8pxd_v67_AVX2
    16'048'070 - enwik8 -s15 by Paq8pxd_v67_AVX2
    16'774'998 - enwik8.drt -s15 by Paq8pxd_v67_AVX2
    127'063'602 - enwik9_1423 -s15 by Paq8pxd_v67_AVX2
    16'364'165 - enwik8 -s8 by Paq8pxd_v68_AVX2
    16'033'591 - enwik8 -s15 by Paq8pxd_v68_AVX2
    16'755'942 - enwik8.drt -s15 by Paq8pxd_v68_AVX2
    126'958'003 - enwik9_1423 -s15 by Paq8pxd_v68_AVX2
    16'358'450 - enwik8 -s8 by Paq8pxd_v72_AVX2 - my compress time 6'780s
    16'013'638 - enwik8 -s15 by Paq8pxd_v72_AVX2 - tested by Sportman; my compress time 6'811s - @Sportman, you have a very fast machine!
    16'672'036 - enwik8.drt -s15 by Paq8pxd_v72_AVX2 - my compress time 9'280s
    126'779'432 - enwik9_1423 -s15 by Paq8pxd_v72_AVX2 - very nice gain, however still about 200KB behind the paq8pxd_v61 version; my compress time 67'740s
    132'464'891 - enwik9_1423.drt -s15 by Paq8pxd_v72_AVX2 - also a very nice gain; however, starting from paq8pxd_v57, DRT-precompressed files get worse scores than pure enwik8/9 and the compression time is 50% higher; my compress time 93'207s
    695 replies | 281236 view(s)
  • schnaader's Avatar
    15th February 2020, 12:12
    I'm not sure if this is what you wanted, but compressing a PNG image to less than 50% of its size is not impossible at all:
    test.png: 80,225 bytes (attached image)
    test.pcf: 52,864 bytes (Precomp 0.4.7, completely reversible to get the original PNG file)
    test.webp: 38,224 bytes (cwebp -lossless -q 100 -m 5, same image data as the PNG)
    12 replies | 603 view(s)
  • schnaader's Avatar
    15th February 2020, 11:09
    The git infrastructure is already doing a great job here. First, the TurboBench repository is organized using submodules, so when you clone, you can choose which submodules to clone:
    git clone https://github.com/powturbo/TurboBench.git
    => clones only the main repository
    => directory size: 40,213,702 bytes; transferred data: about 36,635,613 bytes (size of the biggest file in .git\objects\pack)
    git submodule update --init brotli
    => clones only the brotli submodule
    => brotli directory size: 35,512,637 bytes; transferred data: about 32,181,545 bytes (size of the biggest file in .git\modules\brotli\objects\pack)
    Note that the 37 MB of transferred data for the main repository contain the whole repository history (all 1,261 commits). If you don't need that, "git clone --depth 1" will give you the latest revision only, which transfers only about 765 KB (!) of data.
    Looking at that brotli pack file, the transferred data is compressed quite well by git already, though I agree that it could be improved by using LZMA compression and recompression instead of deflate in git:
    .git\modules\brotli\objects\pack\.pack: 32,181,545 bytes
    .git\modules\brotli\objects\pack\.pcf_cn: 76,547,450 bytes (Precomp 0.4.7 -cn -intense) - so it transferred only 32 MB instead of 77 MB
    .git\modules\brotli\objects\pack\.pcf: 24,072,789 bytes (Precomp 0.4.7 -intense)
    I agree (though I would replace "system" with "kernel"), but they are already compressed well too, and have the same potential to improve (tested on Ubuntu):
    /boot/initrd.img-4.15.0-74-generic: 24,305,803 bytes
    /boot/initrd.img-4.15.0-74-generic.pcf_cn: 71,762,575 bytes (Precomp 0.4.7 -cn)
    /boot/initrd.img-4.15.0-74-generic.pcf: 16,949,956 bytes
    This is more an issue of the brotli repository than of the TurboBench repository. Note that we didn't use "--recursive" in the submodule init command above, so the submodules in the brotli repository (esaxx and libdivsufsort) aren't cloned. Test data and stuff not needed to build could also be moved to brotli submodules. Of course another thing that could help would be to not use outdated image formats :p
    brotli\research\img\enwik9_diff.png: 5,096,698 bytes
    .webp: 3,511,804 bytes (cwebp -lossless -q 100 -m 5)
    .flif: 3,488,547 bytes (flif -e)
    170 replies | 43212 view(s)
  • brispuss's Avatar
    15th February 2020, 07:57
    brispuss replied to a thread Paq8pxd dict in Data Compression
    I've run further tests, this time tarring all 171 jpg images first. Again, there was an improvement in compression, and compression time for v72 was reduced a bit with respect to v69. The tarred files produced slightly better compression than compressing each file individually.
    Compressor               Total file(s) size (bytes)   Compression time (seconds)   Compression options
    Original 171 jpg files   64,469,752
    paq8pxd v69              51,365,725                   7,753                        -s9
    paq8pxd v72              51,338,132                   7,533                        -s9
    Tarred jpg files         64,605,696
    paq8pxd v69              50,571,934                   7,897                        -s9
    paq8pxd v72              50,552,930                   7,756                        -s9
    695 replies | 281236 view(s)
  • suryakandau@yahoo.co.id's Avatar
    15th February 2020, 02:41
    Could you also add bbb v1.8, please?
    100 replies | 6216 view(s)
  • suryakandau@yahoo.co.id's Avatar
    15th February 2020, 02:35
    enwik10 result using bbb v1.8 on my old machine: 1,635,996,697 bytes in 92,576.33 sec :cool:
    34 replies | 2784 view(s)
  • kaitz's Avatar
    15th February 2020, 01:13
    kaitz replied to a thread Paq8pxd dict in Data Compression
    Another test. I wanted to see how well contexts actually work (wordmodel). ContextMap collects info, and if a threshold is reached the context is permanently disabled, stats collection also.
    enwik6
    i(0)=431965, i(3)=324985, i(24)=488462, i(27)=207541, i(33)=493349, i(34)=157440, i(35)=168725, i(36)=179219, i(37)=555562, i(38)=558076, i(45)=425289, i(58)=399287, i(60)=230295, i(61)=210033
    book1
    i(0)=520011, i(3)=491253, i(24)=394110, i(25)=220564, i(26)=4994, i(32)=132856, i(33)=490038, i(34)=76312, i(35)=77780, i(36)=80227, i(37)=463017, i(38)=461574, i(45)=421269, i(58)=405311, i(60)=256524, i(61)=131212
    Below are the bad contexts: + is enwik6, - is book1. In book1, i(26) is still ok, sort of.
    -+ cm.set(hash(++i,x.spafdo, x.spaces,ccword));
    -+ cm.set(hash(++i,x.spaces, (x.words&255), (numbers&255)));
    -+ cm.set(hash(++i,h, word1,word2,lastUpper<x.wordlen));
    -  cm.set(hash(++i,text0&0xffffff));
    -  cm.set(text0&0xfffff); /// i(26)=4994, book1
    +  cm.set(hash(++i,word0,number0, data0,xword0));
    -  cm.set(hash(++i,word0, cword0,isfword));
    -+ cm.set(hash(++i,word0,buf(1), word2,isfword));
    -+ cm.set(hash(++i,word0,buf(1), word3));
    -+ cm.set(hash(++i,word0,buf(1), word4));
    -+ cm.set(hash(++i,word0,buf(1), word5));
    -+ cm.set(hash(++i,word0,buf(1), word1,word3));
    -+ cm.set(hash(++i,word0,buf(1), word2,word3));
    -+ cm.set(hash(++i,nl1-nl2,x.col,buf(1),above));
    -  cm.set(hash(++i,h, llog(wordGap), mask&0x1FF, ));
    +  cm.set(hash(x.col,x.wordlen1,above,above1,x.c4&0xfF)); else cm.set(); //wordlist
    -+ cm.set(hash(++i,x.col,above^above1,above2 , ((islink)<<8)|)); //wordlist((istemplate)<<9)|
    -+ cm.set(hash((*pWord).Hash, h));
    book1 compressed:
    183314 (pxd v72) 100 sec
    183752 (pxd vXX, skip if i(x)>2024) 88 sec
    183288 (pxd vXX, no skip) 99 sec
    184490 (px v183) 139 sec
    695 replies | 281236 view(s)
  • dnd's Avatar
    14th February 2020, 23:27
    Here is a listing of the repository sizes of some of the codecs used in the TurboBench Compression Benchmark:
    brotli 37.3 MB
    pysap 12.8 MB
    zstd 9.5 MB
    lzma 7.0 MB
    isa-l 4.6 MB
    lzo 4.4 MB
    snappy 3.4 MB
    zlib 3.9 MB
    bzip2 2.8 MB
    Some packages include huge test data or indirectly related files. These could reside in a separate repository. The size of the brotli repository is nearly as large as the whole Linux system. The bandwidth in a lot of countries is not as high as in the countries where the developers reside; some users have only mobile connections. The paradox: we have here compressors that are designed to save internet bandwidth, yet strangely the files to download keep growing. This also holds for games, web pages, images, ...
    170 replies | 43212 view(s)
  • kaitz's Avatar
    14th February 2020, 22:51
    kaitz replied to a thread Paq8pxd dict in Data Compression
    And if you compress as tar/zip (uncompressed), what is the compressed size and speed then?
    695 replies | 281236 view(s)
  • Trench's Avatar
    14th February 2020, 22:45
    RichSelian: when I say one step back, two steps forward, I mean to make it permanent and not finalize on using ASCII, which increases the randomness and makes it harder. As stated, even simple binary is random, but you only deal with 2 characters, unlike ASCII where you deal with 255 characters, which makes it even more random. Both are random, but the odds are worse. It is not like people are compressing numbers with only numbers or binary with binary, but binary with numbers and ASCII. What will people come up with next - an extended version of ASCII with 100,000 characters? That is not the solution; throwing more characters at the problem might make it work, but that has its own issues. I gave various perspectives, from something as simple as a PNG image not being compressed over 50% as it should be even though it is random, to how one has to change the format, like the music file, to get over 50%. I agree it's hard to compress with current methods since they have reached their limit, so shouldn't other perspectives be looked at? Maybe try to decompress first and see where that gets you, as silly as that sounds, but at least it's not trying the conventional. Even if you do achieve something that gets a 50% gain over all other compression, it is still not easy for it to become a standard. Also, you tried many things that did not work; do you and others have a place to log what does not work, so that others can see and try a new path or see how to improve? Even I showed how changing something as simple as 2 characters compressed an extra 3%, yet how come compression programs don't use that? And if they don't know about it, why not? And what standard is there to chart what works or doesn't? None? So it's like going in blind, and we are using luck to find logic. As for ASCII, I think people should try to limit its use to go further, since the randomness odds are much greater than with something like numbers. You have a 1/2 chance to guess a 1 or 0, 1/10 for 0-9, and 1/255 for ASCII. All random, but the odds are different, which is what I mean, since compression programs have a limit, maybe for good reason, with their limited dictionary. Maybe more formulas are needed as well, since there is plenty of room for improvement, as some examples have shown. Again, you will disagree, or maybe I am right and time will tell, but it's another perspective.
    12 replies | 603 view(s)