Activity Stream

  • cade's Avatar
    Today, 04:09
    Simplified version of RK256, also carries the last match along:
        struct RK256 {
            const static int NICE_MATCH_UNTIL = 16;
            const static int BLOCK_BITS = 8;
            const static int BLOCK_SIZE = 1 << BLOCK_BITS;
            const static int BLOCK_MASK = BLOCK_SIZE - 1;
            const static uint32 ADDH = 0x2F0FD693u;
            //uint32 remh = 1; for (int i = 0; i < BLOCK_SIZE; i++) { remh *= addh; }
            const static uint32 REMH = 0x0E4EA401u;

            INLINE uint32 rolling_hash_add(uint32 p, int y) { return (y + p) * ADDH; }
            INLINE uint32 rolling_hash_add_remove(uint32 p, int yr, int yl) { return (yr + p - yl * REMH) * ADDH; }

            uint16* cache;
            uint32* table;
            uint32 rh, rh_end;
            uint32 cur_match_from, cur_match_len, cur_match_to;

            RK256() : cache(nullptr), table(nullptr), rh(0), rh_end(0), cur_match_from(0), cur_match_len(0), cur_match_to(0) { }

            void Roll(const byte* RESTRICT buf, uint32 p, uint32 p_end) {
                while (rh_end < BLOCK_SIZE && rh_end < p_end) {
                    rh = rolling_hash_add(rh, buf);
                    ++rh_end;
                    if (!(rh_end & BLOCK_MASK)) { cache = hash4(rh) >> 16; table = rh_end; }
                }
                if (p - cur_match_to < cur_match_len) {
                    uint32 diff = p - cur_match_to;
                    cur_match_from += diff;
                    cur_match_to += diff;
                    cur_match_len -= diff;
                } else {
                    cur_match_len = 0;
                }
                if (cur_match_len > NICE_MATCH_UNTIL) {
                    while ((p >= rh_end && rh_end < p_end) || (rh_end >= BLOCK_SIZE && rh_end - p < BLOCK_SIZE && rh_end < p_end)) {
                        rh = rolling_hash_add_remove(rh, buf, buf);
                        ++rh_end;
                        if (!(rh_end & BLOCK_MASK)) { cache = hash4(rh) >> 16; table = rh_end; }
                    }
                    return;
                }
                while ((p >= rh_end && rh_end < p_end) || (rh_end >= BLOCK_SIZE && rh_end - p < BLOCK_SIZE && rh_end < p_end)) {
                    rh = rolling_hash_add_remove(rh, buf, buf);
                    ++rh_end;
                    uint16& cache_end = cache;
                    uint16 hash_cur = hash4(rh) >> 16;
                    if (cache_end == hash_cur) {
                        uint32& hist_end = table;
                        if (hist_end < rh_end && hist_end >= BLOCK_SIZE) {
                            uint32 sp = hist_end - BLOCK_SIZE;
                            uint32 mp = rh_end - BLOCK_SIZE;
                            ASSERT(p >= mp);
                            ASSERT(p > sp);
                            uint32 pos_delta = p - mp;
                            sp += pos_delta;
                            mp += pos_delta;
                            if (sp < p) {
                                int len = try_match_unbounded(buf, sp, mp, p_end);
                                if (len > cur_match_len) {
                                    cur_match_len = len;
                                    cur_match_from = sp;
                                    cur_match_to = mp;
                                }
                            }
                        }
                        if (!(rh_end & BLOCK_MASK)) { cache_end = hash_cur; hist_end = rh_end; }
                    } else if (!(rh_end & BLOCK_MASK)) {
                        cache_end = hash_cur;
                        table = rh_end;
                    }
                }
            }
        };

        LZC1::RK256 rk256;
        rk256.table = new uint32;
        rk256.cache = new uint16;
        memset(rk256.table, -1, sizeof(uint32) << LZC1::RK256_HASH_BITS);
        memset(rk256.cache, -1, sizeof(uint16) << LZC1::RK256_HASH_BITS);
    Two limitations compared to the more complicated version:
    1. Shift left is missing (for sliding windows).
    2. Inefficient if p is not updated for more than BLOCK_SIZE (extra matches are tested that probably won't reach p while rolling).
    Experimenting with a chunk-based version with just static Huffman codes updated every 256k blocks: decompression speed is 3-5x faster, and the ratio is only ~2-3% worse in most cases.
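    Since the bracketed array indices above were eaten by the forum formatting, here is a self-contained sketch (mine, with assumed buffer handling, not cade's code) of just the rolling-hash recurrence the struct relies on: a byte is added with h' = (y_new + h) * ADDH, and the 256-byte window slides with h' = (y_new + h - y_old * REMH) * ADDH, where REMH = ADDH^BLOCK_SIZE mod 2^32, exactly as the commented-out loop computes it.
        #include <cstdint>
        #include <cstdio>

        int main() {
            const int B = 256;                 // BLOCK_SIZE
            const uint32_t ADDH = 0x2F0FD693u;
            uint32_t REMH = 1;
            for (int i = 0; i < B; i++) REMH *= ADDH;   // REMH = ADDH^B mod 2^32

            uint8_t buf[1024];
            for (int i = 0; i < 1024; i++) buf[i] = (uint8_t)(i * 131 + 7);   // arbitrary test data

            // Hash of the window [start, start+B) computed from scratch.
            auto full = [&](int start) {
                uint32_t h = 0;
                for (int i = 0; i < B; i++) h = (buf[start + i] + h) * ADDH;
                return h;
            };

            // Roll the window one byte at a time and compare against a full recompute.
            uint32_t h = full(0);
            for (int p = 0; p + B < 1024; p++) {
                h = (buf[p + B] + h - buf[p] * REMH) * ADDH;   // add new byte, remove old one
                if (h != full(p + 1)) { printf("mismatch at %d\n", p); return 1; }
            }
            printf("rolling hash matches full recompute\n");
            return 0;
        }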
    39 replies | 2363 view(s)
  • Shelwien's Avatar
    Today, 01:58
    Apparently it uses cygwin there. STATIC=1 let it build the exe anyway, though. With PATH set to mingw\bin it actually compiled with just "make", but then it didn't work because of too many linked mingw dlls. I think STATIC=1 should be the default on windows.
    176 replies | 43459 view(s)
  • kaitz's Avatar
    Today, 01:12
    kaitz replied to a thread Paq8pxd dict in Data Compression
    It's used in the stream where all default data is, and also for text, as humans tend to produce a lot of it. :) Some images have headers and it may give somewhat better compression on those, but it's not very useful. It all depends (how many files, etc). It needs time-consuming testing to be actually useful on other types of data. My test version shows which contexts are mostly bad for given data over time. I think there have been good enough improvements from someone like me. But i still wonder.
    699 replies | 281509 view(s)
  • dnd's Avatar
    Today, 00:34
    The code using dlsym is only included on linux; it is not compiled when _WIN32 is defined (see turbobench.c). Normally it compiles with mingw without any issue (see the CI Mingw build). Don't know why _WIN32 is not defined in your gcc. You can try to compile with "make STATIC=1" (NMEMSIZE will be defined).
    176 replies | 43459 view(s)
  • Shelwien's Avatar
    Today, 00:10
    Z:\010\TurboBench> C:\MinGW820x\bin\make.exe
    gcc -O3 -w -Wall -DNDEBUG -s -w -std=gnu99 -fpermissive -Wall -Ibrotli/c/include -Ibrotli/c/enc -Ilibdeflate eflate/common -Ilizard/lib -Ilz4/lib -Izstd/lib -Izstd/lib/common -Ilzo/include -D_7ZIP_ST -Ilzsa/src -Ilzsa/sr vsufsort/include turbobench.c -c -o turbobench.o
    turbobench.c: In function 'mem_init':
    turbobench.c:154:24: error: 'RTLD_NEXT' undeclared (first use in this function); did you mean 'RTLD_NOW'?
    mem_malloc = dlsym(RTLD_NEXT, "malloc" );
    ^~~~~~~~~
    RTLD_NOW
    turbobench.c:154:24: note: each undeclared identifier is reported only once for each function it appears in
    makefile:717: recipe for target 'turbobench.o' failed
    make: *** Error 1
    176 replies | 43459 view(s)
  • dnd's Avatar
    Yesterday, 23:48
    On linux it's simple to download and build the package, but on windows you must first install git and the mingw-w64 package. This scares windows users. The submodules are already updated automatically and there is also a "make cleana" (linux only) to remove some unnecessary huge directories. I've made a new release with builds for linux+windows and a cleaned, small source code 7zip package (5MB) containing all submodules, ready to build. That's a solution for users with limited download bandwidth or with difficulties building turbobench, but this step implies more work to set up. I've added this option to the readme file. As already stated, if you have git and gcc/mingw installed then there is no problem downloading and building turbobench. This option reduces the downloaded size by a few percent, but the huge submodules will still be completely downloaded.
    176 replies | 43459 view(s)
  • Darek's Avatar
    Yesterday, 23:33
    Darek replied to a thread Paq8pxd dict in Data Compression
    @kaitz - first, thanks for fixing this. Second, regarding "-x option has effect only on default, text mode" - is it ineffective for other types of data?
    699 replies | 281509 view(s)
  • kaitz's Avatar
    Yesterday, 21:14
    kaitz replied to a thread Paq8pxd dict in Data Compression
    paq8pxd_v74
    - fix -x option on levels 10-15
    The -x option has effect only on the default and text modes.
    699 replies | 281509 view(s)
  • Jarek's Avatar
    Yesterday, 16:45
    Thanks, it looks like a similar philosophy to the JPEG LS predictor ( https://en.wikipedia.org/wiki/Lossless_JPEG#LOCO-I_algorithm ) - using a manually chosen heuristic condition to select one of a few predictors, e.g. for smooth regions and edges. It could be optimized on a dataset with an automatically found classifier - cluster pixels into distinguishable types of regions such that the predictor ensemble gives the lowest MSE ... I can also build adaptive predictors - using adaptive linear regression, with coefficients evolving to adapt to local dependencies (page 4 of https://arxiv.org/abs/1906.03238 ). But it would require one LinearSolve() per adaptation ... and generally context-dependence seems a better way. Besides predicting the value, it is also crucial to evaluate the accuracy of that prediction - in practice the width of the Laplace distribution. JPEG LS uses brute force for this purpose: 365 contexts (which I really don't like) - how do you choose it in FLIF, JXL?
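    For reference, the JPEG-LS / LOCO-I heuristic is the median edge detector (MED) predictor; a minimal sketch of the standard formula (my own code):
        #include <algorithm>

        // MED predictor from LOCO-I / JPEG-LS:
        // a = left neighbor, b = above neighbor, c = above-left neighbor.
        int med_predict(int a, int b, int c) {
            if (c >= std::max(a, b)) return std::min(a, b);  // edge detected: clamp toward the lower neighbor
            if (c <= std::min(a, b)) return std::max(a, b);  // edge detected: clamp toward the higher neighbor
            return a + b - c;                                // smooth region: planar (gradient) prediction
        }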
    77 replies | 17860 view(s)
  • algorithm's Avatar
    Yesterday, 13:49
    To reduce the size of the repos you can just do a shallow clone: git clone --depth=1. This downloads only the latest revision, omitting history.
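    Since TurboBench pulls codecs in as git submodules, those can be shallow-cloned too; one hedged example (needs a reasonably recent git):
        git clone --depth=1 --recurse-submodules --shallow-submodules <repo-url>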
    176 replies | 43459 view(s)
  • Jon Sneyers's Avatar
    Yesterday, 13:15
    That part is just Haar. The nonlinear part is in smooth_tendency(), which is basically subtracting the residual you would expect from interpolation, but only if the local neighborhood is monotonic (so it avoids overshoot/ringing).
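    A rough sketch of that idea (not the actual smooth_tendency() from the FUIF/JPEG XL code, just the scheme described above, with my own clamping): predict the expected interpolation residual from the neighboring averages only when they are monotonic, and clamp it so that undoing the prediction cannot overshoot the neighbors.
        #include <algorithm>
        #include <cstdlib>

        // a = average to the left, c = current average, b = average to the right.
        // Returns a predicted detail (Haar difference); 0 unless the neighborhood is monotonic.
        int predicted_detail(int a, int c, int b) {
            bool mono = (a <= c && c <= b) || (a >= c && c >= b);
            if (!mono) return 0;                              // near an extremum: predict nothing
            int pred = (b - a) / 4;                           // expected residual from the local slope
            int limit = 2 * std::min(std::abs(c - a), std::abs(b - c));
            return std::max(-limit, std::min(limit, pred));   // clamp to avoid overshoot/ringing
        }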
    77 replies | 17860 view(s)
  • schnaader's Avatar
    Yesterday, 11:58
    Exactly. There's an additional detail of the PNG algorithm that shows why it performs so badly: it has the default deflate window size of 32 KB. Now, looking at that Castlevania PNG, the width is 13895 pixels and the bit depth is 24 bit, so each line takes 41 KB. So the algorithm doesn't even see the previous line. WebP is much better for this kind of data as it recognizes block repetitions:
        1.952.389 Original PNG (CastlevaniaII-Simon...)
        763.246 cwebp -lossless -q 100 -m 5
        729.836 same cwebp, after that paq8p -3 (so the webp file still has some redundancies)
    Also note that the original PNG contains some additional data (text and arrows); removing them reduces the color count from 440 to 73 and improves compression further:
        1.975.238 Modified PNG
        675.626 cwebp -lossless -q 100 -m 5
        640.962 same cwebp, after that paq8p -3
    Still 8 times bigger than the NES ROM, but you get the idea. Also, the non-modified version stresses the point that PNG/cwebp are universal compressors, so they can process anything, even when modified.
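    For the numbers above: 13895 pixels per line × 3 bytes per pixel = 41,685 bytes per scanline (plus one filter byte), while deflate's window is only 32,768 bytes, so the match finder can never even reach the pixel directly above.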
    15 replies | 740 view(s)
  • Darek's Avatar
    Yesterday, 11:18
    Darek replied to a thread Paq8pxd dict in Data Compression
    Scores of 4 corpora for paq8pxd_v73 in -s15 mode. The best scores for Calgary, Canterbury and MaximumCompression for the paq8pxd family!
    699 replies | 281509 view(s)
  • Scope's Avatar
    Yesterday, 00:43
    The file sizes are usually almost the same; the difference is smaller than anything that could affect quality. AVIF - 273625, JXL - 273627. AVIF and JPEG XL are usually very close in size; for other encoders it is more difficult to hit the exact same size. I uploaded these files; JXL can be decoded by djpegxl or http://libwebpjs.appspot.com/jpegxl/ and AVIF by FFmpeg or online at https://kagami.github.io/avif.js/ . Because there are still no good tools for working with AVIF and the format itself is still unstable, sometimes images are not displayed correctly after muxing, but this should not interfere with comparing quality, as it's just one frame of an AV1 encoder.
    I wanted to do this, but because the tests and article writing are not automated, it would take a lot more time. I initially made these comparisons for myself and they do not pretend to be extremely accurate, but are rather a general rough review (similar to a Netflix article). Encoder versions have been added where possible. For a more scientific article it would be nice to upload all the encoded images, use metrics and take many more examples, but I wrote a more amateurish one; any example I showed can be repeated independently, at least roughly encoding to the size I specified (because at high bpp, compared to what was shown at low bpp, the overall placement may change).
    Compared to the original, none of the images are perfect. JPEG XL blurred some areas and added artifacts, but they are not so annoying and it conveyed the general structure more correctly. AVIF and HEIC are more enjoyable when viewed closely, but they have distorted the overall image more strongly. It's easier to see this by quickly switching images, not on the slider. But I will change the description, because it was more a personal preference. - https://medium.com/@scopeburst/mozjpeg-comparison-44035c42abe8
    I also added a comparison between 8 and 10 bit AVIF/HEIC and some examples of the HTJ2K encoder.
    77 replies | 17860 view(s)
  • Shelwien's Avatar
    Yesterday, 00:10
    I think it could make sense to make a complete and clean repository on your side (make scripts to download all submodules, then remove unnecessary stuff), then push that to github. Windows users don't have git by default, etc. Also it would be good to have buildable full sources as a release archive - getting the most recent version of each codec is not always a good idea - e.g. zstd developers frequently break it.
    176 replies | 43459 view(s)
  • Shelwien's Avatar
    18th February 2020, 23:55
    > " cmix its about 250kb per year." 250k out of what a 100mb file? 250kb smaller compressed size of 100M file (which is around 15mb). I meant this: http://mattmahoney.net/dc/text.html#1159 The surprising thing is that progress was equal during 4-5 recent years, even though one would normally expect exponential decay here. > HD sizes go bigger and faster than that every year. Not really. Storage becomes less and less reliable instead. > So you say we reached a limit of compression with current hardware and > to get better results need more powerful hardware? In a way - its possible to achieve new results with hardware implementations of advanced algorithms. But in practice I don't think that we're already up to date with modern hardware (SIMD,MT,x64). > If so why bother trying when their is not much to do?? There's very much to do, for example video recompression basically doesn't exist atm.
    15 replies | 740 view(s)
  • Trench's Avatar
    18th February 2020, 22:54
    The original file formats are outdated in how they handle things. But oh well, at least it's not the BMP format :) True, it's infinite, just like combining elements into many different alloys, but at least there is a basic chart of the limited elements. Compression and decompression go hand in hand, which I assume was implied. "cmix its about 250kb per year." 250k out of what, a 100mb file? A gain of under 0.25% - doesn't that seem worth the effort? HD sizes get bigger and faster than that every year. So you say we have reached a limit of compression with current hardware, and to get better results we need more powerful hardware? If so, why bother trying when there is not much to do?? If so, I guess we have to wait until cmix gets better and incorporates the various methods in one, since modern files have everything from text, music, art, etc.
    15 replies | 740 view(s)
  • skal's Avatar
    18th February 2020, 22:47
    You should add the exact encoded file size on the comparison slider. For instance, i'm very surprised by how this one looks: https://imgsli.com/MTIyMDc/3/1 for AVIF. What are the respective file sizes? You should also print the *exact* final command line used (including the -q value, e.g.), along with the exact revision used, so that people can reproduce your exact results. Finally, i'm surprised about your comment for this one: https://imgsli.com/MTIyMTk/3/2 which says "JPEG-XL is slightly better than AVIF and HEIC", because frankly, the wool of the gloves disappeared on the left. skal/
    77 replies | 17860 view(s)
  • dnd's Avatar
    18th February 2020, 21:53
    Thank you for your elaboration, corrections and hints. I've recently removed some old, unmaintained or not notable codecs. Many codecs listed in the readme but not in the turbobench repository must be manually downloaded and activated in the makefile. I will continue to clean, simplify and automate the process.
    176 replies | 43459 view(s)
  • JamesB's Avatar
    18th February 2020, 20:51
    JamesB replied to a thread OBWT in Data Compression
    We use mingw/msys for building some of our software at work. A while back a colleague wrote a guide on how to do this here: https://github.com/samtools/htslib/issues/907#issuecomment-521182543 That's the Msys64 build of mingw64, which is a bit different to the native mingw build, but for us it works better as it has a more consistent interface and doesn't try to unixify all pathnames and drive letters. Your mileage may vary, but the instructions there hopefully help. Regarding BBB itself, I think it would be best to edit the version string to indicate this isn't an official BBB release but your fork. This happened a lot with the PAQ series, e.g. BBB_sury v1.9. It then acknowledges the heritage, but also that it's a new project and not by the original author. Edit: naturally the htslib part of that link is irrelevant for you, but the pacman command (or most of it anyway) is important as it outlines how to install the compiler suite once you've got msys up and running.
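    For reference, installing the MSYS2 mingw-w64 toolchain is typically something along the lines of the following (the exact package list in the linked guide may differ):
        pacman -S mingw-w64-x86_64-toolchain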
    34 replies | 2900 view(s)
  • Darek's Avatar
    18th February 2020, 20:34
    Darek replied to a thread Paq8pxd dict in Data Compression
    On one hand that's good news -> if you could change this then the scores for enwik8 could be even better!
    699 replies | 281509 view(s)
  • kaitz's Avatar
    18th February 2020, 19:03
    kaitz replied to a thread Paq8pxd dict in Data Compression
    My bad, level 10 and up will be in -s mode.
    699 replies | 281509 view(s)
  • Darek's Avatar
    18th February 2020, 18:58
    Darek replied to a thread Paq8pxd dict in Data Compression
    @kaitz - I have a question - does the -x option work for -x15? I've tested enwik8 and the scores for -s15 and -x15 are identical. Timings are also similar. For my testset, for textual files there is a difference between the -s9 and -x9 options. On the other side, the scores are great:
    16'013'638 - enwik8 -s15 by Paq8pxd_v72_AVX2
    16'672'036 - enwik8.drt -s15 by Paq8pxd_v72_AVX2
    15'993'409 - enwik8 -s15 by Paq8pxd_v73_AVX2
    15'956'537 - enwik8.drt -s15 by Paq8pxd_v73_AVX2
    699 replies | 281509 view(s)
  • kaitz's Avatar
    18th February 2020, 18:48
    kaitz replied to a thread Paq8pxd dict in Data Compression
    In single mode there is a file overhead of about 50 bytes per file vs the px version - like when you compress a 1-byte file. For this single-mode test i think it's about 9000 bytes total. Not sure how much overhead there is on the tarred file, probably 100 bytes total for the input data (data that can't be compressed).
    699 replies | 281509 view(s)
  • pacalovasjurijus's Avatar
    18th February 2020, 18:43
    The Random file:
    24 replies | 1476 view(s)
  • Darek's Avatar
    18th February 2020, 18:23
    Darek replied to a thread Paq8pxd dict in Data Compression
    Scores on my testset for paq8pxd_v73 - very nice improvements overall, especially for the K.WAD file. In total 25KB of gain. Option -x (second table) gives an additional 16KB of gain over the -s test -> the gains are visible for almost every file. The time penalty for the -x option compared to -s is about 21%.
    699 replies | 281509 view(s)
  • Lucas's Avatar
    18th February 2020, 17:19
    Lucas replied to a thread OBWT in Data Compression
    Mingw is a suite of development tools. When you install mingw it creates an environment variable for all of the executables which make up mingw. If you have installed mingw, reboot your machine then open command-prompt (or any other terminal tool you like) and type "g++". You'll be able to compile with that, as well as use optimization flags.
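    As a generic example (the file name here is just a placeholder for whatever source you are building), compiling with optimizations from that prompt looks like:
        g++ -O3 -march=native -o bbb.exe bbb.cpp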
    34 replies | 2900 view(s)
  • Jyrki Alakuijala's Avatar
    18th February 2020, 17:11
    We looked into this in detail. The bitrates used in the cameras are 4-5 BPP, and at 1.5 BPP we can provide similar quality. The JPEG image bytes going into Chrome in June 2019 average between 2-3 BPP depending on the image size. See for reference: https://www.spiedigitallibrary.org/conference-proceedings-of-spie/11137/111370K/JPEG-XL-next-generation-image-compression-architecture-and-coding-tools/10.1117/12.2529237.full?SSO=1
    When we deliver the same images with 35 % of the bytes, we are talking about 0.7 to 1.05 BPP for internet use. This is a 3x saving on current internet practice. The encoder is tuned such that this corresponds to a distance setting of 1.4 to 2.1. At these distances ringing is not yet a problem in JPEG XL. There is a margin of about a doubling of compression density before JPEG XL loses the throne to video codecs.
    People actually do care about image quality. E-commerce sales are up by XX % with higher quality images, and click rates to videos can be higher for higher quality thumbnails. This is why people are not sending blurred or distorted images as final images on the internet. Also, I don't expect the quality requirements to go down as the technology matures. If anything, the design decisions and the positioning to medium and high quality become a more ideal selection in the future.
    As a last note, we have two ways to counteract ringing at the lowest quality: adaptive quantization and the filter range field. When we max out the filter range field, we get similarly blurred results to what video codecs produce. With adaptive quantization we can allocate more bits in rare areas that show ringing. It is just that our current encoder development has focused on practically relevant use cases, which are in the distance setting 1.0 to 2.0.
    77 replies | 17860 view(s)
  • suryakandau@yahoo.co.id's Avatar
    18th February 2020, 16:48
    bbb v1.9, enwik10 using the cm1000 option: 1635575990 bytes. I have attached the binary and source code. @shelwien could you compile it please? I have downloaded mingw from your link but there is no executable inside it. I have downloaded from nuwen.net and still can not compile it, so I still use Dev-C++ 5.11 to compile it.
    34 replies | 2900 view(s)
  • JamesB's Avatar
    18th February 2020, 14:12
    I guess if you could find funding for a virtual machine instance (e.g. AWS or Google Cloud support) then one way of "winning" the competition is being on the Pareto frontier - either encode or decode. It encourages research into all aspects of data compression rather than simply chasing the best ratio. It can be on hidden data sets too (the SequenceSqueeze competition did that) so it's slow and hard to over-train to one specific set. The reason for a standard virtual machine is automation and reproducibility. Either that or one person has to run every tool themselves for benchmarking purposes.
    Edit: I'd also say some subtle permutations of the input data could scupper the race for the best preprocessors, especially if the permutations are secret, while still keeping the same general data patterns. E.g. something as dumb as rotating all byte values by 17 would completely foul many custom format-specific preprocessors while not breaking general-purpose things like automatic dictionary generation or data segmentation analysis.
    22 replies | 1634 view(s)
  • brispuss's Avatar
    18th February 2020, 10:45
    brispuss replied to a thread Paq8pxd dict in Data Compression
    I've run some tests with paq8pxd v73 and added the results to the table below. Tests run under Windows 7 64-bit, with an i5-3570k CPU and 8 GB RAM. Used SSE41 compiles of paq8pxd v*.
    Compressor              Total file(s) size (bytes)   Compression time (s)   Options
    Original 171 jpg files  64,469,752
    paq8pxd v69             51,365,725                   7,753                  -s9
    paq8pxd v72             51,338,132                   7,533                  -s9
    paq8pxd v73             51,311,533                   7,629                  -s9
    Tarred jpg files        64,605,696
    paq8pxd v69             50,571,934                   7,897                  -s9
    paq8pxd v72             50,552,930                   7,756                  -s9
    paq8pxd v73             50,530,038                   7,521                  -s9
    Overall, improved compression and a slight reduction in compression time for v73!
    699 replies | 281509 view(s)
  • Jarek's Avatar
    18th February 2020, 09:41
    I was focused on the steam pattern on the coffee - AVIF destroyed it, HEIC and JXL maintained it but added artifacts ... and indeed JXL has additional ringing on the cup, but all 3 have artifacts there.
    77 replies | 17860 view(s)
  • Jaff's Avatar
    18th February 2020, 06:14
    Jaff replied to a thread Papa’s Optimizer in Data Compression
    Put this before optimizing the JPEG:
    0) Extract the JPEG trailer to a separate file (needs the latest exiftool, 11.87)
    5) Add the trailer back: 1.jpg = optimised file, 2.dat (saved trailer); 3.jpg (new optimized file)
    Then rename the file back and delete the temp files... I'm waiting for the new version. Now you can add an option to strip the trailers or not. :_confused2:
    79 replies | 13166 view(s)
  • Shelwien's Avatar
    18th February 2020, 04:13
    > Sometime looking at things conventionally we get limited to conventional limits.
    Its not about conventions, but rather about hardware constraints. For example, there's a known method which allows combining most of the different existing compressors together as cmix submodels. Also just using more memory and more contexts still helps too. But business only cares about efficient algorithms, like ones that transparently save storage space. Also from the p.o.v. of Kolmogorov Complexity the common arithmetic operations are less efficient than NNs or LPC: https://en.wikipedia.org/wiki/Function_approximation
    > How much does compression get better every year?
    For cmix its about 250kb per year.
    > How many types of compression are their and can they be charted out in a logical manner?
    Compression is approximate enumeration of datatype instances, so we can say that there're N*M types of compression, where N is the number of types of data and M is the number of approximation methods... essentially its infinite.
    > What compression methods did not work?
    Mostly ones that you suggest, I suppose. "Common sense methods" mostly don't work: https://en.wikipedia.org/wiki/List_of_cognitive_biases
    > How many hours have people wasted trying to look at something that thousands of others have wasted looking at which could have looked elsewhere?
    I'd say less than necessary. Compression developers too frequently don't actually look at the data at all (because hex viewers are not integrated in the OS and/or IDE, etc), but instead work based on their imagination and test results.
    > If 1 digits can compress 4 digits and 4 digits can compress 16 digits the why cant the new 4 digit can not compressed to 1 again?
    Because people mostly care about decompression... and a single digit has too few different values to be decompressed to many different files.
    > It cheated in the first place maybe to reply on another format like like hex or ASCII?
    Compression algorithms normally only work with binary data - there's no hardware for efficient storage of e.g. decimal digits (or ASCII), thus there's no software to support it.
    > In the end is recognizable randomness vs unrecognizable randomness.
    In the end most data sequences are incompressible; there're just too many of them, much more than the number of possible generator programs of smaller size (since those have dups and failures).
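    To make the last point concrete with a quick count: there are 2^n distinct n-bit files, but only 2^0 + 2^1 + ... + 2^(n-1) = 2^n - 1 outputs shorter than n bits, and fewer than 2^(n-k+1) outputs are shorter by k or more bits. So less than a 1-in-2^(k-1) fraction of all n-bit files can be compressed by k bits or more, no matter what the compressor is.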
    15 replies | 740 view(s)
  • cade's Avatar
    18th February 2020, 03:18
    This map is rendered with a map loader that runs commands of the type: at x, y put source tile/sprite z. Tiles and sprites are either perfectly full chunks or chunks with transparency (a bit mask). Something as simple as LZ compression can recognize 1D segments of these tiles/sprites, which is a lot more useful than .png (which looks for linear patterns, not matches, then does entropy reduction). Simple template matching with fixed-size images can match those patterns and reproduce the original instructions of (x, y, z). Or to put it simply, .png is the wrong information representation for that format.
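    A tiny sketch of that representation (the names are mine, not from any particular loader): the map is just an array of tile indices, so repeated 8x8 tiles cost one small index each instead of 64 stored pixels.
        #include <cstdint>
        #include <vector>

        struct Tile { uint8_t pixels[8][8]; };      // one 8x8 tile = 64 palette indices

        struct TileMap {
            int width, height;                      // map size, in tiles
            std::vector<Tile> tiles;                // unique tiles (the "sprite sheet")
            std::vector<uint16_t> cells;            // width*height tile indices, 2 bytes per cell

            uint8_t pixel(int x, int y) const {     // reconstruct any pixel from the instructions
                const Tile& t = tiles[cells[(y / 8) * width + (x / 8)]];
                return t.pixels[y % 8][x % 8];
            }
        };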
    15 replies | 740 view(s)
  • Trench's Avatar
    18th February 2020, 02:56
    James, you know better, but I was giving that as an example. I could have made an 8x8 or 16x16 sprite and duplicated it to 10000x10000. The point is for the compression program to have a library of methods to handle certain structures. Sure, there are plenty of variations, but it should have a reasonable progressive structure in the compression program, to take into account patterns, gradients, shapes, etc. Sometimes, looking at things conventionally, we get limited to conventional limits. How much does compression get better every year? How many types of compression are there, and can they be charted out in a logical manner? What compression methods did not work? How many hours have people wasted looking at something that thousands of others have wasted looking at, when they could have looked elsewhere? Obviously these can't all be answered easily, but maybe people should understand the basic information before they try to make the next best compression program. If 1 digit can compress 4 digits and 4 digits can compress 16 digits, then why can't the new 4 digits be compressed to 1 again? Did it cheat in the first place, maybe by relying on another format like hex or ASCII? In the end it is recognizable randomness vs unrecognizable randomness. Or no pattern vs pattern: it's all random, but programs only prepare for certain patterns and not all, since it can be confusing. If we never had ASCII or HEX would compression be possible? Or if we turned the complexity of the unrecognizable into something more recognizable, like 1@C/ to 1234, would that help? Again, you guys know best and I am just asking questions.
    15 replies | 740 view(s)
  • hexagone's Avatar
    18th February 2020, 02:54
    I know what it is, I have done lossy image compression myself. I was just mentioning the issue since Jarek said that JXL looked the best for this picture and the other codecs do not show this issue.
    77 replies | 17860 view(s)
  • kaitz's Avatar
    18th February 2020, 01:42
    kaitz replied to a thread Paq8pxd dict in Data Compression
    paq8pxd_v73
    - Change text detection
    - Change wordModel1
    - Add mod_ppmd/chart model back (-x option)
    699 replies | 281509 view(s)
  • algorithm's Avatar
    18th February 2020, 01:19
    They are called ringing artifacts. It is a byproduct of the DCT - the Gibbs phenomenon. JPEG XL has a lot of ringing. JPEG XL is probably the best replacement for JPEG for digital cameras, but for internet distribution, where the bitrate is lower, it has weaknesses.
    77 replies | 17860 view(s)
  • hexagone's Avatar
    17th February 2020, 23:35
    For the coffee, JXL has boundary artifacts on the rim of the cup (180-270 degree angles). For the landscape, JXL does something weird with the clouds.
    77 replies | 17860 view(s)
  • Scope's Avatar
    17th February 2020, 22:42
    I will try to find such examples; usually other formats and different encoders were not particularly better (I can remember something like manga or art with clear lines). But I didn't do much testing of HEIC (I tried to focus on open and royalty-free standards). AV1 is very slow and it has a problem with strong blurring and loss of detail, and this cannot be fixed by changing the settings (it seems it was heavily optimized and tuned for streaming video, not for static images); there is a similar problem with WebP.
    I tested at speed 8 (kitten); it was the default setting until the last update (now it is 7 - squirrel). I did not really notice a quality improvement at faster settings, but there were strange results at speed 9 (tortoise), so I stopped at 8, but I will try again.
    I tried to test the encoders without changing or filtering the images, and to show what their own tools provide for this. Modular mode is now enabled with quality settings (-q) below 40; the last example has already been encoded in this mode. I tested it separately on manga/comics and drawings; at lower bitrates it sometimes shows better visual quality.
    77 replies | 17860 view(s)
  • Shelwien's Avatar
    17th February 2020, 21:25
    Sure, but its pretty hard to set up a fair competition with just that. LZMA has old-style entropy coding (from the time when out-of-order cpus didn't exist), but there're several lzma-based codecs developed for new cpus - LZNA etc. These new codecs don't really beat lzma in compression, but do have 5-8x faster decoding. Also there's RZ and various MT schemes (including PPM/CM) which would consistently beat 7z LZMA2. The problem is, I'd like this to be an actual competition and not just finding and awarding the best non-public preprocessor developer.
    22 replies | 1634 view(s)
  • pacalovasjurijus's Avatar
    17th February 2020, 19:27
    New version of compression. Test:
    Before: 3,145,728 bytes - 1(4)) (Random)
    After: 3,145,724 bytes - 1(4)).b
    24 replies | 1476 view(s)
  • Jyrki Alakuijala's Avatar
    17th February 2020, 16:37
    One possible mitigation for JPEG XL's steeper degradation at the lowest BPPs is to resample the image to a slightly smaller resolution (for example ~70 % smaller) before compression. It might be a good idea to try this out on these 'go down to 10 kB' comparisons to represent the real use of extreme compression on the web. This would likely work pretty well for photographs at these BPPs (< 0.4 BPP), and less well for pixelized computer graphics. Then again, there is another mode (the FUIF-inspired modular mode) in JPEG XL for that.
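    Reading "~70 % smaller" as scaling each dimension to about 70 % of the original: that leaves roughly 0.7 × 0.7 ≈ 0.49 of the pixels, so the same byte budget roughly doubles the effective BPP (e.g. 0.35 BPP becomes about 0.7 BPP), which moves the encoder back into the range where it behaves well.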
    77 replies | 17860 view(s)
  • Jyrki Alakuijala's Avatar
    17th February 2020, 16:18
    Superb presentation and good analysis of the codec situation. What if you run the JPEG XL encoder with the --distance 1.0 setting - will you be able to find an encoder that does better for any of the photographs (while consuming the same amount of bytes that JPEG XL used for 1.0)? A more advanced question: at which distance settings will other encoders start to compete with JPEG XL? Would JPEG XL still win on every image if distance 1.5 were used? (if it won at 1.0 :-D) Also, I suspect that our slower speed settings (slower animals) are only great at around distance 1.0-1.5, and they can be harmful at very high distances. You might get better results with low bit budgets with the faster settings, without looping in butteraugli. (Butteraugli only works very well at small distances.) I haven't tested the slowest settings for a rather long time, so they might have bugs, too. Possibly better to run with default settings :-) Do you have personal experience with the speed settings and quality?
    77 replies | 17860 view(s)
  • Jarek's Avatar
    17th February 2020, 16:17
    Thanks, they have very different artifacts; HEIC seems better than AVIF. For example in https://imgsli.com/MTIyMDc/ AVIF completely destroys the water and sky, and JXL is definitely the best - this and faces probably come from the perceptual evaluation.
    The upper bridge in https://imgsli.com/MTIyMDY/ - again completely destroyed by AVIF, the other two handle it.
    In https://imgsli.com/MTIyMDg/ all 3 have nasty artifacts in its strange sky.
    For the coffee in https://imgsli.com/MTIyMTk/ JXL looks the best.
    The fox https://imgsli.com/MTIyMjA/ is blurred in all 3; HEIC is definitely the best.
    In this abstract https://imgsli.com/MTIyMjQ/ JXL has really nasty boundary artifacts.
    In this 10kB landscape https://imgsli.com/MTIyMzA/ AVIF is the best due to smoothing.
    77 replies | 17860 view(s)
  • Scope's Avatar
    17th February 2020, 15:20
    It was me, I also added a comparison with HEIC (HEVC+HEIF) and extreme compression.
    77 replies | 17860 view(s)
  • Mauro Vezzosi's Avatar
    17th February 2020, 14:37
    hbcount is always 0 in line 9905 (jpegModelx.p()):
        9869   if (hbcount==0) {
        9905       cxt=(!hbcount)?hash(mcupos, column, row, hc>>2):0; // MJPEG
    699 replies | 281509 view(s)
  • necros's Avatar
    17th February 2020, 12:39
    The contest goal could be outperforming, for example, 7z LZMA2 compression on multiple data sets in terms of the same or lower time and the same or better compression.
    22 replies | 1634 view(s)
  • compgt's Avatar
    17th February 2020, 11:42
    @JamesWasil, Ok... that's the official "history" you know. But i remember this from the classified top secret part of the 1970s to 80s Martial Law Cold War when the Americans were here in the Philippines. It was me, a kind of "central hub" among scientist networks that time (so i learned from many, and was privy to state of the art science and technologies that time), dictating what would be computer science history for the 80s, 90s and 2000s (e.g. in data compression and encryption, IBM PC, Intel, AMD and Microsoft Windows dominance etc.). Just think of me then as a top military analyst. I mean, i wasn't just a player in all this; it was me moderating everything tech. I knew i already co-own Apple and Microsoft. I guess i decided to officially be co-founder of Yahoo, Google and Facebook, but it didn't happen officially in the 1990s and 2000s. There was "too much fierce" competition amongst tech companies. I mean, it was the Cold War, a real war. The Cold War officially ended in the early 1990s, with the Americans leaving the Philippines, military bases left behind or demolished. In short, the real computing history of the US (and the world) was made, written, and decided here in the Philippines, with me. I chose Steve Jobs. I glorified Bill Gates, bettering his profile more and more. I chose Sergey Brin and Larry Page for Google, and i decided on a Mark Zuckerberg Chairman-CEO profile for Facebook. Too many ownerships for me in tech, so they became greedy, wanted my ownerships for themselves, or decided to move on without me. That is, they asserted to other player groups my decisions or timetable for them to own the tech giants, but without me. What kind is that?! In the late 1980s, however, they reminded me of Zuckerberg and Facebook, implying a chance for me to officially "co-found" Facebook. I remember this encode.su website and GUI (1970s), and the names Shelwien, David Scott, Bulat Ziganshin, dnd, Matt Mahoney, Michael Maniscalco, Jyrki Alakuijala. Some of them would come to me in the Philippines in the 80s... if it was really them. By the early 1990s i was forgetting already. In the mid 2000s i was strongly remembering again. If i hear my voice in the bands' "official" recordings of Bread, Nirvana, America, Queen, Scorpions etc, i then strongly believe these computer science memories.
    15 replies | 740 view(s)
  • JamesWasil's Avatar
    17th February 2020, 10:28
    There's a lot of things to address here, but first this:
    1) Compgt: PAQ did not exist in the 70's or 80's. It wasn't conceptualized until the middle of the 1990's; I remember reading the initial newsgroup and forum threads for it back then under comp.compression and other forums/sites that no longer exist. Some of the file formats mentioned didn't exist either. Just politely letting you know so that you don't get flamed for that someday in a conversation or post, even if you were working on predecessor algorithms at that time.
    Trench: You have to understand the difference between pixel-by-pixel compression and macro sprites. The NES, Sega Master System, and other game systems from that time used a Z80 or similar processor which ran anywhere from 1 MHz to 8 MHz based on what it was. The graphics and audio were usually separate but dedicated processors running at about the same speed. Even so, while having a processor for each made a lot of things possible that a single chip would have struggled to do at the time, RAM and storage space were still really expensive. They didn't want to use bloated image formats, and they needed animation and scrolling to be fast.
    What they did was use sprites that were limited to 64x64 pixels (or less) and made background sprites that were used as tiles to create the images, maps, and backgrounds that you see. Larger sprite animations were at times 2 or 3 of those large blocks synchronized to move together, but they did flicker at times because of latency and refresh-rate issues when the GPU tried to do too much at once, which was evident when too many sprites were on the screen at the same time. What this means is that out of a given screen area of say 480x284 (may be different, but as an example), the entire background "image" was a jigsaw-piece assembled sprite layer of tiles that were "stamped" in blocks, where a 64x64 pixel block - the equivalent of 4096 pixels - was represented with a pointer that was either 6 bits or one 8-bit byte. This means that every 4096 pixels were represented by 1 byte rather than 16. Yes, you might be able to fit entire images - stamped as sprite macros to form that image - at 1/16th its size for an NES game. But any image you're able to make with it is repetitive and restricted to the sprite artwork that is loaded. PNG, GIF, and even lossy formats like JPEG are NOT restricted to premade macro images / graphical text fonts, and have to be able to process, compress, and display ANY pixels you throw at them for an image. The same was done for audio and audio effects to make sure it all fit under 256 to 512k per cartridge. The earlier systems like the first generation of Sega Master Systems had to fit under 64k on game cards for the SG1000, and you were able to see the differences and restrictions even more to get it to fit.
    There are different types of compression, and not all are equivalent to one another. If there is a database of a priori knowledge, compression can be done with that. Compression isn't limited to the poor explanation in the tired and repetitive references to Mahoney's "Data Compression Explained", because in truth that is data compression only in one form and only explained one way. There are many other ways it can happen, and the NES and Sega implementation of sprite image layers for KNOWN data demonstrates that. Compression of unknown data becomes more difficult of course, but not impossible. Just very difficult, and only possible with methods that can address it.
    15 replies | 740 view(s)
  • Trench's Avatar
    17th February 2020, 01:36
    The file you compressed has mostly empty space and few colors; just like any file, if it's mostly empty it will go the same way. But here are some examples of something more complex that can't be compressed anywhere near as small as the original program data. Take something like an NES game map online whose size is 1,907kb and which compresses to 1,660kb. Looking online for the file size, it says the game is 75KB (95% compression) despite including more art, sound, and code. Another game on the list is Earthbound, whose map is 3.4mb while the file size online says 194KB (94% compression). This applies to 100mb png image files that cannot come close to the original, which would probably be 5MB. Plenty of patterns, yet compression cannot even recognize the patterns. Shouldn't this be a simple thing that just isn't implemented yet? 95%! https://vgmaps.com/Atlas/NES/CastlevaniaII-Simon'sQuest-Transylvania(Unmarked).png
    15 replies | 740 view(s)
  • Jarek's Avatar
    17th February 2020, 01:15
    Thanks, that's a lot of priceless satisfaction. There is also WebP v2 coming ( https://aomedia.org/wp-content/uploads/2019/10/PascalMassimino_Google.pdf ), but I don't think it will have a chance against JPEG XL - your big advantage is the perceptual evaluation. Also VVC is coming this year with a HEIF successor (VIF?), but it will rather have costly licenses and also be computationally much more complex.
    77 replies | 17860 view(s)
  • Sportman's Avatar
    16th February 2020, 21:40
    Added Shelwien compile.
    100 replies | 6254 view(s)
  • Jyrki Alakuijala's Avatar
    16th February 2020, 19:28
    Thank you!! Also, JPEG XL is the only ANS-based codec in this comparison :-D
    77 replies | 17860 view(s)
  • Shelwien's Avatar
    16th February 2020, 17:31
    Ask me :)
    1 replies | 86 view(s)
  • maorshut's Avatar
    16th February 2020, 17:21
    Hi all, how can I change my username, or delete my account if changing is not possible? Thanks
    1 replies | 86 view(s)
  • Sportman's Avatar
    16th February 2020, 13:38
    Sportman replied to a thread BriefLZ in Data Compression
    252,991,647 bytes, 4628.815 sec. (1 hour 17 min) - 3.683 sec., blzpack -9 --optimal -b1g (v1.3.0)
    1 replies | 307 view(s)
  • suryakandau@yahoo.co.id's Avatar
    16th February 2020, 11:31
    How about bbb cm1000 for v1.8?? I use bbb cm1000 for v1.8.
    100 replies | 6254 view(s)
  • Sportman's Avatar
    16th February 2020, 11:24
    Added default mode.
    100 replies | 6254 view(s)
  • Jarek's Avatar
    16th February 2020, 10:02
    Indeed, the comparisons are great:
    poster ~80kB: https://imgsli.com/MTIxNTQ/
    lighthouse ~45kb: https://imgsli.com/MTIxNDg/
    windows ~88kb: https://imgsli.com/MTIxNDk/
    ice ~308kB: https://imgsli.com/MTE3ODc/
    face ~200kB: https://imgsli.com/MTE2MjI/
    JPEG XL is definitely the best here for maintaining details of textures like skin.
    77 replies | 17860 view(s)
  • Shelwien's Avatar
    16th February 2020, 05:32
    Shelwien replied to a thread OBWT in Data Compression
    Here I compiled 1.8. But you can easily do it yourself - just install mingw: https://sourceforge.net/projects/mingw-w64/files/Toolchains%20targetting%20Win32/Personal%20Builds/
    34 replies | 2900 view(s)
  • suryakandau@yahoo.co.id's Avatar
    16th February 2020, 04:48
    Could you upload the binary please? Because it is under the GPL license.
    34 replies | 2900 view(s)
  • Jyrki Alakuijala's Avatar
    16th February 2020, 02:48
    I'm looking into this. No actual progress yet.
    300 replies | 313635 view(s)
  • Jyrki Alakuijala's Avatar
    16th February 2020, 02:39
    Someone compares AVIF, WebP, MozJPEG and JPEG XL with a beautiful UI. https://medium.com/@scopeburst/mozjpeg-comparison-44035c42abe8
    77 replies | 17860 view(s)
  • jibz's Avatar
    15th February 2020, 19:01
    jibz started a thread BriefLZ in Data Compression
    Since the BriefLZ 1.2.0 thread disappeared, here is a new one! I've just pushed BriefLZ 1.3.0, which includes the forwards binary tree parser (btparse) that was in the latest bcrush and blz4. It improves the speed of --optimal on many types of data, at the cost of using more memory. The format is still backwards compatible with BriefLZ 1.0.0.
    enwik8       blzpack-1.2.0 --optimal -b100m   30,496,733    30 hours
    silesia.tar  blzpack-1.2.0 --optimal -b205m   63,838,305    about a week
    enwik8       blzpack-1.3.0 --optimal -b100m   30,496,733    95 sec
    silesia.tar  blzpack-1.3.0 --optimal -b205m   63,836,210    4 min
    enwik9       blzpack-1.3.0 --optimal -b1g     252,991,647   10.5 hours
    Not sure why the result for silesia.tar from 1.2.0 two years ago is slightly higher, but I'm not going to rerun it. If anyone has a machine with 32GiB of RAM, I would love to hear how long --optimal -b1g on enwik9 takes, because the results on this machine (8GiB RAM) include swapping. Attached is a Windows 64-bit executable, and the source is at https://github.com/jibsen/brieflz
    1 replies | 307 view(s)
  • pacalovasjurijus's Avatar
    15th February 2020, 18:26
    Software: White_hole_1.0.0.1.6
    Before: 2019-07-01.bin (a little bit different but Random) 1048576 bytes
    After: 2019-07-01.bin.b 1048396 bytes
    Time: 6 minutes
    24 replies | 1476 view(s)
  • compgt's Avatar
    15th February 2020, 15:35
    So, if you somehow compressed a string of jumbled characters into a significantly smaller size (or program), it is simply "random-appearing" and not algorithmically random.
    6 replies | 81 view(s)
  • compgt's Avatar
    15th February 2020, 15:23
    To deal with the bombshell, i.e. the presence of "anomalous" symbols: (1) you must have a way to create a smooth frequency distribution, or the frequencies must be of the same bitsize as much as possible - there are many ways to do this, but it must be reversible. For example, you can maybe XOR the bytes of the data source first (this is reversible), pre-whitening it for encoding. (2) Or the bombshell symbol or byte can be thought of as an LZ77 literal: simply output a prefix bit flag (anomalous symbol or not?) for the symbols. This means at most two bits per symbol for the encoding, with the bit flag indicating whether the symbol sum doubles or the MSbit, plus the anomalous symbol itself when it happens, 8 bits. I wonder how large the frequencies or freqtable would be... And, like in Huffman coding, you can contrive or generate a file that is exactly perfect or suitable for this algorithm. What would be interesting is that the generated file is a "random-appearing data" file, perhaps indeed incompressible for known compressors. (See the post above, which now has pseudo-code for easier understanding.)
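    One concrete reversible "XOR pre-whitening" variant (my reading of the idea, not necessarily the exact scheme meant here) is to XOR each byte with the previous original byte; the inverse transform recovers the data exactly:
        #include <cstdint>
        #include <cstddef>

        void xor_whiten(uint8_t* buf, size_t n) {
            uint8_t prev = 0;
            for (size_t i = 0; i < n; i++) {
                uint8_t cur = buf[i];
                buf[i] = cur ^ prev;   // store the XOR against the previous original byte
                prev = cur;
            }
        }

        void xor_unwhiten(uint8_t* buf, size_t n) {
            uint8_t prev = 0;
            for (size_t i = 0; i < n; i++) {
                buf[i] ^= prev;        // recover the original byte
                prev = buf[i];
            }
        }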
    6 replies | 81 view(s)
  • compgt's Avatar
    15th February 2020, 15:22
    > 4. Instead of 8-bit bytes, use 4-bit symbols;
    How about 2-bit (base-4) symbols? Or maybe even better, a data source of base-3 symbols??
    6 replies | 81 view(s)
More Activity