Activity Stream

  • Darek's Avatar
    Today, 12:46
    Darek replied to a thread Paq8pxd dict in Data Compression
    enwik9 score for the non-DRT version with -s15: 126'211'491 - enwik9_1423 -s15 by Paq8pxd_v74_AVX2 - a record for the paq8pxd series! Time: 67'980.66 s.
    707 replies | 282021 view(s)
  • dnd's Avatar
    Today, 12:28
    "Accelerating Compression with FPGAs" In this article, we’ll discuss the Intel GZIP example design, implemented with oneAPI, and how it can help make FPGAs more accessible see other Data Compression Tweets
    81 replies | 12061 view(s)
  • Shelwien's Avatar
    Today, 11:45
    So, the target compressed size of enwik9 to get the prize is ~115,300,000 (115,506,944 with decoder). Size-wise it may be barely reachable for cmix (the v18 result is 115,714,367), but even v18 needs 3x the allowed memory and 2x the time (we can improve compression using public preprocessing scripts and save some memory and time by discarding non-text models, but a 3x difference in memory is too big). So once again, either Alex wins the first prize and then other people can start doing something (since open source is required), or the contest stays stuck as before.
    6 replies | 137 view(s)
  • suryakandau@yahoo.co.id's Avatar
    Today, 10:45
    Maybe it would be more interesting to use enwik10 rather than enwik9... with a time limit of <= 48 hours.
    6 replies | 137 view(s)
  • hexagone's Avatar
    Today, 03:35
    Release 1.7
    Changes:
    - Bug fixes & code cleanup
    - Slightly better compression throughout
    - Modified level 6 (faster for text files)
    - Better handling of small files
    Silesia C++ results: https://github.com/flanglet/kanzi-cpp
    Silesia Java results: https://github.com/flanglet/kanzi
    Silesia Go results: https://github.com/flanglet/kanzi-go
    enwik8 (encoding time in s, decoding time in s, compressed size in bytes):
    zip 3.0 -9                      4.70    0.59   36445403
    lzfse 1.0                       4.66    0.82   36157828
    kanzi -b 25m -l 1 -j 4          0.54    0.39   34532276
    lrzip 0.631 -b -p 12            3.91    1.29   29122579
    bzip2 1.0.6 -9                  5.84    2.52   29008758
    brotli 1.0.5 -9                64.68    0.84   28879185
    kanzi -b 25m -l 2 -j 4          0.72    0.48   27962342
    lrzip 0.631 -p 12              11.36    0.96   27228013
    orz 1.5.0                       4.71    0.95   27148974
    zstd 1.4.5 -19                 39.71    0.18   26960372
    kanzi -b 12500k -l 3 -j 8       1.07    0.64   26741570
    brotli 1.0.5 -Z               430.95    0.73   25742001
    lzham 0x1010 -m4               20.35    0.50   25066677
    kanzi -b 12500k -l 4 -j 8       1.29    0.76   24989286
    lzma 5.2.2 -9                  54.75    1.00   24861357
    brotli 1.0.7 --large_window=30 435.10   0.95   24810180
    lzturbo 1.2 -49 -b100          82.19    1.24   24356021
    kanzi -b 25m -l 4 -j 8          1.59    0.94   24108751
    kanzi -b 100m -l 4 -j 8         5.52    1.89   22478636
    lrzip 0.631 -b -p 12           18.08   15.18   22197072
    kanzi -b 100m -l 5 -j 8         7.93    3.31   21275446
    bsc -b100                       5.51    1.33   20920018
    kanzi -b 100m -l 6 -j 8         9.98    3.38   20869366
    kanzi -b 100m -l 7             18.98   18.81   19570938
    kanzi -b 100m -l 8             27.18   27.73   19141858
    xwrt 3.2 -b100 -l14            51.39   53.37   18721755
    calgary:
    1.6 Level 2 - Total encoding time: 91 ms, Total output size: 1077662 bytes
    1.7 Level 2 - Total encoding time: 66 ms, Total output size: 1012784 bytes
    1.6 Level 7 - Total encoding time: 1991 ms, Total output size: 744184 bytes
    1.7 Level 7 - Total encoding time: 808 ms, Total output size: 739624 bytes
    1.6 Level 8 - Total encoding time: 3849 ms, Total output size: 735236 bytes
    1.7 Level 8 - Total encoding time: 1382 ms, Total output size: 733188 bytes
    19 replies | 6165 view(s)
  • Trench's Avatar
    Today, 03:08
    Imagine if the program took an hour to scan for which pattern to use and then compressed it. Would that be worth it? It would leave the file just as "random", but at least it tries to make it more "comprehensible", i.e. compression-friendly. Nice effort, but maybe it's better to learn some other ideas by experimenting at small scale than to go big to see results. People here have the skill to make such programs, but no one has infinite perspectives, and finding new ones takes exploration. It's like how people used to go on trips: to experience new things, learn, bring them back home and make their home better. The ancient Greeks went around gathering information and putting it in books, so others could add to those ideas until someone with the right perspective could combine them into something new, and that new thing inspired others to gather more, and better things happened. But in modern times people just visit, see what they can, and bring nothing back but selfies to say they were there, which matters only to them and their circle and no one else. It would be like going to the supermarket just to look, for the sake of looking, which has no purpose. Anyone take a selfie in a supermarket? LOL. Obviously I'm giving a silly example, but you get the idea. Some people like wasting hours leveling up in a game only to erase all progress and forget the game in a few months, while others have fun figuring out math problems that help themselves and/or everyone for a lifetime. And some ideas go nowhere, which at least serves as an example for others not to go there... if done right, since some dead ends are false dead ends. Replacing random with other random does not help; make it more meaningful if you are going to do it. Have the code find something that can become a pattern, with the help of a pattern. I don't mean scan the entire file, but randomly pick some spots in the file and see if they have similarities, then apply a pattern or two to make it smaller, not bigger. Again, that may be a dead end, but to say you have a loss makes it sound like you are doing something way off. It's best if the program scans which pattern gives a better result rather than just applying one without scanning. It's just like the random number presented before: instead of looking for a single solution, look at how many ways it can be broken down.
    128705610
    LLHHLHHLL - High/Low: the center 0 is off, it would need another L
    OEEOEOEOE - Odds/Evens: the 2 after would have needed an O
    GGGLLGGLL - Greater/Lower than the previous digit: the 1st one is off
    Based on that, 228815621 would be the pattern to add, and 100110011 is based on the average of the 3 previous patterns. But patterns can be infinite, which takes time. Either work on small chunks of the file, with 2 random numbers per block (which would make a bigger file), or use a single pattern throughout. In a way it is a math problem which would need a lot of processing. Or maybe a dead end. Again, I am not saying it will work in practice, but in theory - just like the iRobot vacuum, which, given enough time, will eventually clean the home. Another odd thing is that it can be broken down by the #2 doubling of numbers, as the chart below shows; the left column is what is used first.
    Doubling by 2 to reach 128705610 (1 = that power of two is used, 0 = not used):
    67108864 - 1 (#1)
    33554432 - 1 (#2)
    16777216 - 1 (#3)
    8388608 - 1 (#4)
    4194304 - 0
    2097152 - 1 (#5)
    1048576 - 0
    524288 - 1 (#6)
    262144 - 0
    131072 - 1 (#7)
    65536 - 1 (#8)
    32768 - 1 (#9)
    16384 - 1 (#10)
    8192 - 1 (#11)
    4096 - 0
    2048 - 0
    1024 - 1 (#12)
    512 - 0
    256 - 0
    128 - 0
    64 - 1 (#13)
    32 - 0
    16 - 0
    8 - 1 (#14)
    4 - 0
    2 - 1 (#15)
    1 - 0
    Strangely enough, if you put a 1 next to each active number and a 0 next to each inactive row, you get 111101010111110010001001010, which is exactly the same number, 128705610, in binary. If it were divisible by 3 it would be 101000111010100001011011100, which works 2/3 of the time, and converting that binary number gives 85803740. With other numbers like 6 it would work and is 1 digit less, 10100011101010000101101110, which converted to decimal is 42901870, again 1 digit less. But by 9 it is 1101101000110101110011111, which needs a -2 to make it work, since the last binary digit stays the same if it's 7 numbers off. I figure that means nothing, but I'm throwing it out there.
    19 replies | 925 view(s)
  • Alexander Rhatushnyak's Avatar
    Today, 02:54
    Leonardo da Vinci's birthday! Good choice! The front page of LTCB still says "50,000 euros of funding". From the FAQ page:
    > Unfortunately the author of phda9 has not released the source code.
    Except the enwik-specific transforms, which reduced the effective size of the input (the DRT-transformed enwik8) by 8.36%. Besides, the dictionary from phda9 is now being used in cmix (and maybe in some of the latest paq8 derivatives?). I also shared ideas, but there were no contributions from others, and almost no comments. Seemingly no one believes that simple things like a well-thought-out reordering of wiki articles can improve the result by a percent or two (-:
    > the first winner, a Russian who always had to cycle 8km to a friend to test his code because he did not even have a suitable computer
    I guess this legend is based on these words: "I still don't have access to a PC with 1 Gb or more, have found only 512 Mb computer, but it's about 7 or 8 km away from my home (~20 minutes on the bicycle I use), thus I usually test on a 5 Mb stub of enwik8". Always had to cycle? That's too much of an exaggeration! I still cycle 3..5 hours per week when staying in Kitchener-Waterloo (9..10 months of the previous 12), go to a swimming pool when there's enough free time, and do lots of pull-ups outdoors when weather permits, simply because I enjoy these activities. By the way, even though I was born in Siberia, my ethnicity is almost 100% Ukrainian, so I guess it would be better to call me "a Ukrainian". My first flight to Canada in May 2006 was from Kiev, because I lived there then; all of my relatives, except my parents, reside in Ukraine.
    6 replies | 137 view(s)
  • suryakandau@yahoo.co.id's Avatar
    Today, 02:21
    From the LTCB site, I guess the winner is phda9 again.
    6 replies | 137 view(s)
  • Jarek's Avatar
    Today, 01:02
    OK, so let's look at the arithmetic of the rANS decoding step: D(x) = (f_s * (x >> n) + (x & mask) - CDF_s, s). We have n-bit accuracy, e.g. for a 16-bit state and 10-bit accuracy we need a "10 bits times 6 bits -> 16 bits" multiplication. A hardware implementation would use exactly such a multiplication, and we can reduce the redundancy by using 1-bit renormalization (a count-leading-zeros gives the number of bits).
    194 replies | 70564 view(s)
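    A minimal C++ sketch of the decode step described in the post above, assuming n-bit frequencies that sum to 2^n and a slot-to-symbol lookup table (SymbolInfo, slot_to_symbol and rans_decode_step are hypothetical names, not taken from any particular implementation):

    #include <cstdint>

    struct SymbolInfo { uint16_t freq, cdf; };   // f_s and CDF_s, both below 2^n

    // D(x) = (f_s * (x >> n) + (x & mask) - CDF_s, s)
    uint32_t rans_decode_step(uint32_t x, const uint8_t* slot_to_symbol,
                              const SymbolInfo* sym, int n, int* out_symbol) {
        uint32_t mask = (1u << n) - 1;
        uint32_t slot = x & mask;           // low n bits select the subrange
        int s = slot_to_symbol[slot];       // slot -> symbol lookup
        *out_symbol = s;
        return sym[s].freq * (x >> n) + slot - sym[s].cdf;
    }

    For the 16-bit state / 10-bit accuracy case mentioned above, sym[s].freq fits in 10 bits and x >> n fits in 6 bits, so the product stays within 16 bits.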
  • Darek's Avatar
    Today, 00:46
    Darek replied to a thread Hutter Prize update in Data Compression
    Yes, it's 10 GB. "Restrictions: Must run in ≲100 hours using a single CPU core and <10GB RAM and <100GB HDD on our test machine."
    6 replies | 137 view(s)
  • Sportman's Avatar
    Today, 00:30
    Is the test machine link right? It shows only 3816 MB. I assume 10 MB RAM must mean 10 GB?
    6 replies | 137 view(s)
  • Matt Mahoney's Avatar
    Today, 00:18
    The Hutter prize has been updated by 10x in prize money (5000 euros per 1% improvement), size, and CPU time and memory allowed. http://prize.hutter1.net/ The task is now to compress (instead of decompress) enwik9 to a self extracting archive in 100 hours with 10 MB RAM allowed. Submissions must be OSI licensed. The baseline is phda9 (which meets time and memory but not other requirements). The compressor size is added to the archive size. Contest starts April 14 2020 with awards thereafter for improvements of 1% or better.
    6 replies | 137 view(s)
  • kaitz's Avatar
    Yesterday, 23:49
    kaitz replied to a thread Paq8pxd dict in Data Compression
    The .hlp file has some LZ-compressed data; recompress it. And if you target only this set you can gain a lot. On JPEG a 1-2 KB gain is possible. On im24, 4 KB is possible. On dict, 3 KB is possible. But why? Also, pxd splits data, and sometimes that is bad for compression (Silesia -> samba). Do you care about compressor/decompressor size, memory usage, time...? The px version has gains in different places, but consider time/memory etc. In pxd it's clear that sometimes adding more models makes things worse (-x vs -s option), like adding DMC back to the JPEG stream makes it worse... (the one present in pxd). It's hard.
    707 replies | 282021 view(s)
  • dnd's Avatar
    Yesterday, 23:23
    RansState is defined as uint32_t. The multiplication is 32-bit (32 bits for the RansState and 12-16 bits for the probability). The AV1 range coder uses 16x16 multiplications. The stack buffer storing the probabilities in reverse order is more complex to maintain in a mixed bitwise and multisymbol adaptive encoding. This is also what I've found in the Turbo-Range-Coder experiment.
    194 replies | 70564 view(s)
  • Darek's Avatar
    Yesterday, 21:48
    Darek replied to a thread Paq8pxd dict in Data Compression
    Truly speaking, I don't know. These tests have mostly been run without a tarred-file test from the beginning; I copied the existing approaches. Maybe it's because the tests are already long, and additionally compressing a tarred file doubles the test time. For paq8px/pxd the time is still quite reasonable, but for cmix it's an additional 3-day test per approach, which can sometimes be hard to handle. On the other hand, a tarred-file test doesn't give you any information about improvements on particular files. As to the MaximumCompression corpus estimate: 1'000'000 is a very ambitious target... right now the best scores across all files sum to 5'872'598 bytes. Using all techniques from all compressors (especially paq8px and paq8pxd for FlashMX.pdf and vcfiu.hlp) in the currently best cmix, there is a chance to get 5'800'000, maybe 5'700'000 bytes. More parsers and better NN compression might give an additional 100-300 KB of gain; then we land at about 5'400'000-5'500'000 bytes. A completely new technique or specialized parsers for all files would be needed to get a lower score, and that could be 4'000'000 bytes. Hmmmm... 1'000'000 looks impossible to me at the moment. The attached table lists the best MaximumCompression corpus scores for most of the best compressors.
    707 replies | 282021 view(s)
  • Scope's Avatar
    Yesterday, 21:46
    Thanks. WebP2 saves a little more detail than AVIF (aomenc), with a slightly lower clarity of the overall picture (or this was added from JPEG). It will be interesting to try it, although this already adds to the fragmentation of image formats. I didn't really understand the rush with AVIF (hardware encoding support?); it looks overcomplicated for static images, and I don't see work on improving the quality of any AV1 encoders in this direction. The WebP2 format seems more promising to me for further development; I hope these experiments will not be abandoned. I have the same opinion about quality: encoders should not filter images and make them "better". Various artifacts can be cleaned up later, but lost details cannot be recovered. And there are many cases where the average person chose images with lost details, far from the original, as the best, simply because the other image had slightly more noticeable artifacts. I noticed this even back when RealVideo was popular, and when WebP first appeared and people began to transcode their photos and say that the quality was even cleaner and better than the original. But on the Web there is sometimes a need for very low image sizes, for a rough understanding of what is shown in the picture, where accuracy and details are not so important, but artifacts and low resolution can annoy people. The same applies to LQIP, when the placeholder will only be shown for a short time and its pleasantness matters more than accuracy. Yes, thanks, it looks a little better. Maybe it would be better to enable this option by default at low -Q settings?
    88 replies | 18630 view(s)
  • JamesB's Avatar
    Yesterday, 20:41
    It's currently the way to do fast rANS decoding *in CPU*. If you're looking at something which is optimised primarily for hardware, then your design can change radically. PS. Personally I wouldn't put too much faith in my rans_static implementations. They're rather hacked up messes, just for experimentation purposes to see how fast it can be driven. For production code we're not yet using any SIMD based ANS because it hasn't been the bottleneck in our pipelines anyway. The more stable code is https://github.com/jkbonfield/htscodecs although I'm open to suggestions for improvement. :-) I keep thinking I should do a CDF adaptive version, but in my own experiments it didn't massively win out over normal arithmetic or range coding as the efficiency of rANS starts to get lost once you add higher level modelling in there.
    194 replies | 70564 view(s)
  • CompressMaster's Avatar
    Yesterday, 20:15
    Btw, it's interesting to see how compression improves over time. And, maybe an overestimated expectation, but when do you expect the MaximumCompression corpus to go below 1,000,000 bytes?
    707 replies | 282021 view(s)
  • CompressMaster's Avatar
    Yesterday, 20:12
    Why aren't the Silesia and MaximumCompression corpora tarred? MaximumCompression corpus: only 40,807 bytes left to get below 6,000,000 bytes! As always, good testing, Darek!
    707 replies | 282021 view(s)
  • Jarek's Avatar
    Yesterday, 20:10
    You have lots of 16-bit variants in https://github.com/jkbonfield/rans_static. Especially for a 16-symbol alphabet, you could use a much lower state, down to an ~8-bit multiplication if optimizing for hardware. tANS usually uses an 11-12 bit state for a 256-symbol alphabet. It has 1-bit renormalization to reduce redundancy; in hardware that could also be done cheaply for a low-state rANS. The rest is the same for RC and rANS: search for the corresponding subrange, and probability adaptation. The same goes for ANS: you can use the same state while varying the probability distributions (as done in the adaptive setting) or the alphabet size.
    194 replies | 70564 view(s)
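    For reference, the 1-bit renormalization mentioned above is just a bit-at-a-time refill of the state; a rough sketch, assuming a normalization interval [L, 2*L) and a read_bit() input routine (both hypothetical names here):

    // decoder side: after a decode step the state may fall below L
    while (x < L)
        x = (x << 1) | read_bit();   // refill one bit at a time

    Refilling single bits instead of whole bytes or words is what lets a small state (e.g. 11-12 bits for tANS) keep its redundancy low.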
  • pacalovasjurijus's Avatar
    Yesterday, 19:56
    I am working on version 1.0.0.1.8. I can compress this file:
    28 replies | 1595 view(s)
  • pacalovasjurijus's Avatar
    Yesterday, 19:14
    Please compress any 3 MB file with zip and send it to me. 2,689,630 s.zip / 2,689,629 s.zip.b
    28 replies | 1595 view(s)
  • pacalovasjurijus's Avatar
    Yesterday, 18:53
    https://pinetools.com/random-file-generator
    c2 u2: 3,145,728 3,145,726.b2
    c u: 3,145,728 3,145,724.b
    28 replies | 1595 view(s)
  • Jyrki Alakuijala's Avatar
    Yesterday, 18:18
    Fully agreed. A compression system should follow the principle of least surprise for the user. I have seen video codecs auto-renovate a damaged wall, remove a crack in an engine block, and change skin to make a face look a different age, plastic, or covered in too much makeup foundation. Basically, when there is an edge they can preserve that edge well, but the texture in the general neighbourhood, or the film grain, can be significantly damaged or removed. Since the compression system does not know the utility of features, it should not start removing them. Perhaps the photo was taken to evaluate a skin rash, for a damage report on the motor, to evaluate the repainting needs of an apartment, or to judge the quality of the cloth of something you are buying. If textures are smoothed away, some of that utility is erased. From what I can see, JPEG XL can deliver very consistent results at 0.5 to 1.5 bpp, while video codecs occasionally struggle even at close to 3.0 bpp.
    88 replies | 18630 view(s)
  • SvenBent's Avatar
    Yesterday, 18:05
    I see no method to get back to the original file, so your "compressed" file is lacking data/information. I can compress anything to one bit if there is no need to decompress it: Is it your data? 1 = yes, 0 = no.
    28 replies | 1595 view(s)
  • dnd's Avatar
    Yesterday, 17:43
    Well, I'm referring to current rANS (<= 16 symbols) CDF implementations, which are all based on a branchless 32-bit SIMD symbol search at decoding. This is currently the only possibility for making rANS decoding fast. If you do a scalar symbol search, then you lose the speed advantage over a range coder. And as you can see in Turbo-Range-Coder, you can also implement fast interleaving and a symbol search in an RC that is faster than using division or reciprocal multiplication (see the benchmark). The division argument used to justify the complexity and slowness of a multisymbol RC at decoding doesn't count anymore. Alias tables are static and not ANS-specific; I've not tested this enough to have a definitive conclusion. A bitwise RC can also be very fast, adaptive, compress better, and doesn't require large tables. I'm not aware of any fast rANS implementation with less than a 32-bit state. I'm only interested in practical implementations and not in what might be possible in theory, because there is often a big divergence between the two. See facts and real numbers in the entropy coder benchmark. In general a range coder is simpler and more natural to use than rANS; encoding is several times faster and doesn't require buffering, and you can easily combine bitwise and multisymbol RC.
    194 replies | 70564 view(s)
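    The branchless symbol search dnd refers to can be sketched in scalar C++ like this (a simplification with a hypothetical name, cdf_find_symbol; the SIMD versions do the 16 comparisons with one vector compare plus a horizontal sum):

    #include <cstdint>

    // cdf[0] == 0 and cdf[i] is the cumulative frequency of symbols < i.
    // Counting how many cdf entries are <= slot yields the symbol index
    // without any data-dependent branch.
    static inline int cdf_find_symbol(const uint16_t cdf[16], uint16_t slot) {
        int s = 0;
        for (int i = 1; i < 16; i++)
            s += (cdf[i] <= slot);
        return s;
    }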
  • Shelwien's Avatar
    Yesterday, 16:46
    @Kaw: digits.bin is a bad target. That data is known to have at least 50 bits of redundancy (column parity), but you can't find it in packed format. https://encode.su/threads/3122-dec2bin-converter-for-Nelson-s-million-digits-file
    19 replies | 925 view(s)
  • Jon Sneyers's Avatar
    Yesterday, 16:41
    Yes, at such a low bitrate, codecs derived from video codecs will produce nicer-looking results. They basically turn any photo into a nice painting with smooth texture and nice crisp lines - that's what directional predictors and strong deblocking filters will give you. It looks nicer than DCT artifacts, but in terms of authenticity I don't like it. You can often get a somewhat better JXL at low bitrates by using --progressive_dc=1, e.g. in this case you can get this https://res.cloudinary.com/jon/progressive_dc.jxl.png in 24,533 bytes (with -q 61 --progressive_dc=1). Still too low a bitrate to get a decent image, but it's better than what you get without progressive_dc (it looks like you had to go down to -q 44 to produce that 24,664-byte JXL). Authenticity of image codecs - i.e. having "honest artifacts" - is in my opinion an important thing. When AI-based codecs become mainstream in the future, it will become an even more important thing. I don't want a codec that auto-photoshops my pictures, applying copious amounts of algorithmic foundation makeup to all the faces, producing fake images that look like they might be real. If the bitrate is too low, I want to be able to see that (when I zoom in on the pixels), instead of seeing some plausible but fake image. The goal of the lossy compression game should be to minimize distortion, not to hide it. For that reason, I prefer subjective evaluations using flip or flicker tests, not side-by-side or, even worse, no-reference.
    88 replies | 18630 view(s)
  • Jarek's Avatar
    Yesterday, 15:04
    Very interesting, where did you get this nonsense? For tANS, we need ~4x more states than the alphabet size to get redundancy down to noise level, and redundancy drops ~4x for every 2x more states. Let's say it is ~16x for rANS, so for a 256-symbol alphabet you would need a 12-bit state ... less for the 16-symbol alphabet used in the adaptive setting. A 16-bit multiplication with 8-bit renormalization for rANS is much more than what is needed. A 32-bit state is usually used only because it doesn't matter on a CPU and allows 16-bit renormalization. Even better, we can use multiplication by very low numbers - here is the redundancy in bits/symbol for a binary alphabet using multiplication by at most 2-bit numbers, also compared with Moffat's approximation for AC:
    RC and ANS decoders need exactly the same search: to which subrange a given position corresponds. Alias is an optimization of that into a table lookup and a single branch - it could also be applied to RC. ANS requires an additional buffer for backward encoding - only in the encoder, which already needs huge memory for performing all the analysis.
    194 replies | 70564 view(s)
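    The redundancy figures being argued about here are, in essence, the Kullback-Leibler cost of coding with quantized probabilities q[s] = f[s]/2^n instead of the true p[s]; a small self-contained helper (not from any of the codecs discussed) that computes it in bits/symbol:

    #include <cmath>

    double quantization_redundancy(const double* p, const double* q, int m) {
        double bits = 0;
        for (int s = 0; s < m; s++)
            if (p[s] > 0) bits += p[s] * std::log2(p[s] / q[s]);
        return bits;   // extra bits per symbol versus the true entropy
    }

    Plotting this cost (or the corresponding per-state average for tANS) against the number of states is roughly how claims like "redundancy drops ~4x for every 2x more states" get checked.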
  • martinradev's Avatar
    Yesterday, 14:48
    An issue I am facing: I am trying to integrate CTW as a model into paq8px to check whether it improves the predictions for our test data. I already have the integration done, and I can see that the trace of predictions is the same as in the CTW compressor I lifted the implementation from. However, the archive size doesn't change, so I must be doing something wrong when providing the probability to the mixer. I also locally modified paq8px v182 to expose dynamically which models to use and to enable/disable the APM. This was done to find out which models have a negative effect or do not contribute for our data. For my test data, I disable all models except the ExeModel and the CTWModel and observe that the compressed archive is about the size of the input file, which means that the predicted probability is somehow not used by the mixer, or I am doing something horribly wrong. I usually pick the ExeModel as it produces really poor results for our test data, and I was hoping that the newly integrated model would produce good predictions and thus lead to an overall small archive size. The code at a high level looks like the following:
    class CTWModel {
      MIXERINPUTS = 1;
      MIXERCONTEXTS = 0;
      MIXERCONTEXTSETS = 0;
      void mix(Mixer& m) {
        p = get_prob(); // this is in the range
        m.add(stretch(p));
      }
    }
    Then in the ContextModel constructor I add the input size contribution from CTWModel:
    MixerFactory::CreateMixer(
      ... + CTWModel::MixerInputs,
      ... + CTWModel::MixerContexts,
      ... + CTWModel::MixerContextSets);
    Then in ContextModel::p() I mix the contribution from CTWModel:
    ContextModel::p() {
      ...
      CTWModel &model = models.ctwModel();
      model.mix(*m);
    }
    Am I doing something wrong with MIXERCONTEXTS and MIXERCONTEXTSETS? I do not really understand the complete PAQ pipeline, so I might be misusing parts. I would be happy if somebody could give me some hints.
    10 replies | 2258 view(s)
  • Jon Sneyers's Avatar
    Yesterday, 14:42
    I explained the context modeling (in JXL's Modular mode) here: https://encode.su/threads/3108-Google-s-compression-proje%D1%81ts?p=63875&viewfull=1#post63875
    88 replies | 18630 view(s)
  • Sportman's Avatar
    Yesterday, 14:41
    Sportman replied to a thread 2019-nCoV in The Off-Topic Lounge
    Checking table 1: http://weekly.chinacdc.cn/en/article/id/e53946e2-c6c4-41e9-9a9b-fea8db1a8f51
    and https://www.who.int/ith/2020-0901_outbreak_of_Pneumonia_caused_by_a_new_coronavirus_in_C/en/
    http://www.xinhuanet.com/politics/2020-01/09/c_1125438971.htm
    Date, Confirmed cases, Deaths, Case fatality
    Before Dec 31, 2019: 104, 15, 14.4%
    Jan 1-10, 2020: 653, 102, 15.6%
    Jan 11-20, 2020: 5,417, 310, 5.7%
    Jan 21-31, 2020: 26,468, 494, 1.9%
    After Feb 1, 2020: 12,030, 102, 0.8%
    I notice three things:
    - 104 cases and 15 deaths before Dec 31, 2019, while Chinese media first wrote about 15 cases on Jan 9, 2020.
    - A death rate of around 15% that suddenly drops after this first media date.
    - How quickly the cases grow (spread through mucous membranes: mouth, nose and eyes).
    If five countries (Iran xxxx, South Korea 204, Japan 97, Singapore 85, Hong Kong 69) see the same growth rate, then this is very alarming.
    2 replies | 75 view(s)
  • dnd's Avatar
    Yesterday, 14:13
    It would be nice to know on which criteria and analysis the use of ANS in JPEG XL is based. From my analysis, one reasonable criterion in AV1 is the use of 16-bit multiplications in the daala range coder instead of the 32 bits required for ANS. This is apparently a lot cheaper to implement in hardware. Less memory is also required for the range coder in AV1 compared to ANS. ANS in JPEG XL is based on alias tables. Given that AV1 decoders will soon be shipping in smartphones, it seems more reasonable to base future image/video codecs on the AV1 coders to exploit the hardware coders. Unlike JPEG XL, AVIF is based on AV1. Realtek 4K Set-top Box SoC RTD1319/RTD1311 to Support Android 10 on Android TV
    194 replies | 70564 view(s)
  • kaitz's Avatar
    Yesterday, 13:36
    kaitz replied to a thread Paq8pxd dict in Data Compression
    On my computer it takes about 9800 seconds to compress.
    707 replies | 282021 view(s)
  • Kaw's Avatar
    Yesterday, 12:09
    public static boolean backwardsMatch(java.util.BitSet bitSet, int setIndex, char[] pattern, int maxDepth) {
        // These two values are set when we observe a wildcard character. They
        // represent the locations, in the two strings, from which we start once
        // we've observed it.
        int startIndex = setIndex;
        int setBookmark = 0;
        int patternBookmark = 0;
        int patternIndex = 0;
        // Walk the text strings one character at a time.
        while (startIndex - setIndex < maxDepth) {
            // How do you match a unique text string?
            if (pattern[patternIndex] == '*') {
                // Easy: unique up on it!
                while (++patternIndex < pattern.length && pattern[patternIndex] == '*') {
                } // "xy" matches "x**y"
                if (patternIndex == pattern.length) {
                    return true; // "x" matches "*"
                }
                if (pattern[patternIndex] != '?') {
                    // Fast-forward to next possible match.
                    while (bitSet.get(setIndex) != (pattern[patternIndex] == '1')) {
                        if (--setIndex == -1 || startIndex - setIndex >= maxDepth)
                            return false; // "x" doesn't match "*y*"
                    }
                }
                patternBookmark = patternIndex;
                setBookmark = setIndex;
            } else if (setIndex > -1 && bitSet.get(setIndex) != (pattern[patternIndex] == '1')
                       && pattern[patternIndex] != '?') {
                // Got a non-match. If we've set our bookmarks, back up to one
                // or both of them and retry.
                if (patternBookmark > 0) {
                    if (setIndex != patternBookmark) {
                        patternIndex = patternBookmark;
                        if (bitSet.get(setIndex) != (pattern[patternIndex] == '1')) {
                            // Don't go this far back again.
                            setIndex = --setBookmark;
                            continue; // "xy" matches "*y"
                        } else {
                            patternIndex++;
                        }
                    }
                    if (setIndex > -1) {
                        setIndex--;
                        continue; // "mississippi" matches "*sip*"
                    }
                }
                return false; // "xy" doesn't match "x"
            }
            setIndex--;
            patternIndex++;
            // How do you match a tame text string?
            if (setIndex < 0) {
                // The tame way: unique up on it!
                while (patternIndex < pattern.length && pattern[patternIndex] == '*') {
                    patternIndex++; // "x" matches "x*"
                }
                if (patternIndex >= pattern.length) {
                    return true; // "x" matches "x"
                }
                return false; // "x" doesn't match "xy"
            }
        }
        return false;
    }
    This is a function in Java for using wildcard patterns on BitSets. I found a piece of code on the Dr. Dobb's website and changed it from forwards to backwards and from characters to bits. Now you can do "000?1*" and it will return true if the pattern is found at some index. The second step is a function that generates N pseudo-random patterns and goes over the bitset to find matches. I built a simple filter that discards not-so-useful patterns, so we end up with at least probably-useful patterns, let's say 10 bits' worth (1024) of patterns. I take the pattern with the biggest entropy gain (P1 * matchCount) and save the index of the pattern together with the probability converted to 8 bits. This way a pattern costs just 18 bits to save to disk, so the gain has to be at least 18 bits to improve compression. I ran this algorithm on AMillionRandomDigits.bin and a 7-zipped book1. The best I could find was a gain of -8 bits, or in other words a loss of 1 byte for every attempt to compress the incompressible. So no luck here. Okay, let's cheat... Let's generate hundreds of thousands of patterns on AMillionRandomDigits.bin and try to gain more than 18 bits. I could not find anything... Let's use a different approach: run 10000 patterns in parallel and use a conservatively configured Paq-like mixer to try to compress AMillionRandomDigits.bin and the 7-zipped book1. My best result was a 2-bit loss on AMillionRandomDigits.bin and an 8-byte gain on the 7-zipped book1. Reason? 7-Zip has some compressible bits at the start and the end of the file, like the filename and a bit of checksum data.
    I think this is a good approach to trying to compress the random, and it still fails. It fails because random means not predictable, and that means compression is not possible. I could try to use data science to assemble a random forest of decision trees, boosted by a modified version of gradient boosting (towards entropy), that does the job for every random file you input, and then hide the decision trees in the executable. Compression by obscurity. Still no theoretical gains, though!
    19 replies | 925 view(s)
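    A back-of-the-envelope version of the 18-bit accounting described above (an assumed reading of the scheme, not Kaw's actual code): a pattern whose matches predict the following bit with probability p1 saves (1 - H(p1)) bits per match, so it only pays off when that total exceeds the 18-bit header.

    #include <cmath>

    // Returns the net gain in bits; positive means the pattern is worth storing.
    double net_pattern_gain(int matchCount, double p1) {
        double h = -(p1 * std::log2(p1) + (1.0 - p1) * std::log2(1.0 - p1));
        return matchCount * (1.0 - h) - 18.0;   // 18 bits = pattern index + 8-bit probability
    }

    For random data p1 stays near 0.5, so 1 - H(p1) is essentially zero and no realistic matchCount covers the header, which matches the negative results reported above.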
  • Mauro Vezzosi's Avatar
    Yesterday, 11:43
    Mauro Vezzosi replied to a thread paq8px in Data Compression
    In JpegModel the line
    m1=MixerFactory::CreateMixer(sh, N+1, 2050, 3);
    should be
    m1=MixerFactory::CreateMixer(sh, N+1+2, 2050, 3);
    because MJPEGMap adds 2 additional inputs to the mixer: MJPEGMap.mix(*m1);
    To reproduce the error I ran the following test with paq8px_v183fix1, but I suppose this problem is still present in the current version. Comment out the line
    //#define NDEBUG // Remove (comment out) this line for debugging (turns on Array bound checks and asserts)
    and compress a JPEG file with -simd none, e.g.
    paq8px_v183fix1 -9 -simd none A10.jpg
    Then the assert(nx<N) in Mixer.add() is triggered:
    Assertion failed!
    Program: C:\paq8px_v183fix1\paq8px_v183fix1.exe
    File: paq8px_v183fix1.cpp, Line 1735
    Expression: nx<N
    1814 replies | 517778 view(s)
  • Darek's Avatar
    Yesterday, 11:11
    Darek replied to a thread Paq8pxd dict in Data Compression
    Scores of the 4 corpora for paq8pxd_v74 with the -x15 option. Improvements over paq8pxd_v72 range from 0.26% to 0.59%, and of course these are the best scores on all four test sets for the paq8pxd series.
    707 replies | 282021 view(s)
  • compgt's Avatar
    Yesterday, 10:26
    compgt replied to a thread 2019-nCoV in The Off-Topic Lounge
    This nCoV virus threat is indeed a serious matter. There are now reports that nCoV might have been intentionally designed. When I was with the authorities during the Martial Law Cold War, we "masked" top-secret intel that a nuke weapon or nuke bomb components were being transported or illegally shipped to some nations as mere "news" of a biological virus spreading worldwide, but actually it was nukes. For one, this will strengthen routine checks everywhere. The culprits might panic, reveal their suspicious nature, and alert the authorities. Incidentally, amid this coronavirus outbreak that is now global, there are reports that some Chinese individuals arrived at my hometown in Masbate, Philippines as workers of a popular Philippine construction firm. These Chinese groups were reportedly rude to local Masbateños, especially to women. It's like an invasion, me stating I was asked to "thwart" coronavirus or design a vaccine when I was in grade school in the 1980s, probably by Princess Anne of the UK and former US Pres. Barack Obama. Well, maybe the early coronavirus strains.
    2 replies | 75 view(s)
  • Trench's Avatar
    Yesterday, 07:31
    Storage less reliable? I guess for some things. I remember 1980s hard drives slowly breaking down, giving you time to transfer, but now they just die without warning. I figure SSDs might get better, but that's another topic. Hardware does not seem to improve as fast as it used to. "250kb smaller compressed size of 100M file (which is around 15mb)." So a 2% gain a year sounds good compared to my suggested 3% one-time-only trick of switching a highly repeated upper character to a lower character value. But again, I don't know if that can be implemented. As for PNG, yes, better, but not many ready-to-view picture formats do a better job. But thanks for the info. How does one think out of the box when everyone is used to thinking in the box? Sure, I can say out-of-the-box things that do not work, but at least I gave some suggestions, and others have too, I assume; maybe 1 out of 100 out-of-the-box ideas sticks, or at least inspires an idea for progress. A lot of things in life are done from a new perspective, which inspires others to add on. Even something as silly as playing with the definitions of the words in "random compression is impossible" - a phrase everyone uses, and which puts a road block in people's minds. No one can dispute that it is not randomness itself that is the issue. And with that in mind, randomness can be used to help things: if anything, make the file more "random" to compress it. And when I say random, I find it hard to believe people would only think about compression and not decompression, as I stated before. So make an organised chaos; since you are at the mercy of luck as to what type of file you get and how much you can compress it, why not mix it up more and see if the "odds" are better? Maybe wrong again, but as stated above, it's not randomness that is the issue but incomprehensibility. Even a silly suggestion: take a compressed file like 11335578... and add a basic pattern code that costs 1 bit, like 10101010..., so the compressed file becomes 12345679..., and do this as many times as needed with other selected patterns, logged so they can be compressed/decompressed. Here is a random number: what pattern does it have? 128705610. At first whack, nothing; throw some other selected patterns at it and see when you get something. The point is to have a list of various patterns it can handle: odds/evens, high/low, etc. Maybe programs do that already; then don't mind me. But can a machine find a pattern in that randomly typed number? 8 or 9 digits have a pattern. Remember those IQ tests where you had to find patterns? Experts say most people fail to get 100%, and less than 1% get them all right. They don't tell people that. Even something as silly as a Rubik's cube can be solved by a computer in record time, since they say it's a math problem in a way. Take 100 pennies, flip them, and you get close to 50% heads and 50% tails; take the heads out and flip again, then the other 25%... Was that 4 out-of-the-box points in this post? Let's see what others have for ideas, reach 100 of them, and see if 1 sticks - or maybe it takes fewer than 100, say 25 to 1. :) No need to comment, it's just a thought.
    19 replies | 925 view(s)
  • Sportman's Avatar
    Yesterday, 03:03
    Are there people from Iran active on this forum? I read rumors about 20 deaths (in one hospital) and 2000-2500 infected in Iran, much higher than officially reported (https://www.worldometers.info/coronavirus/); is there any truth to that? I followed South Korea this week because of the fast growth in cases, but Iran could be worse.
    2 replies | 75 view(s)
  • skal's Avatar
    Yesterday, 02:57
    For completeness, here is a 24,781-byte encoded version with WebP2 ("cwp2 -q 50 ...", for the record). That's 0.07 bpp. You'll just have to trust me on this one, until WebP2's git repo is made public (soon) :) And yes, a triangle-based preview is still very much under consideration.
    88 replies | 18630 view(s)
  • dnd's Avatar
    20th February 2020, 22:41
    TurboBench Compression Benchmark new update:
    - _WIN32 added when __CYGWIN__ is defined
    You can use the option -l1 to display the codecs compiled into TurboBench, including the possible levels and parameters:
    ./turbobench -l1
    Plugins:
    brotli 0,1,2,3,4,5,6,7,8,9,10,11/d#:V
    bzip2
    fastlz 1,2
    flzma2 0,1,2,3,4,5,6,7,8,9,10,11/mt#
    glza
    bsc 0,3,4,5,6,7,8/p:e#
    bscqlfc 1,2
    libdeflate 1,2,3,4,5,6,7,8,9,12/dg
    zpaq 0,1,2,3,4,5
    lz4 0,1,9,10,11,12,16/MfsB#
    lizard 10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49
    lzfse
    lzham 1,2,3,4/t#:fb#:x#
    lzlib 1,2,3,4,5,6,7,8,9/d#:fb#
    lzma 0,1,2,3,4,5,6,7,8,9/d#:fb#:lp#:lc#:pb#:a#:mt#
    lzo1b 1,9,99,999
    lzo1c 1,9,99,999
    lzo1f 1,999
    lzo1x 1,11,12,15,999
    lzo1y 1,999
    lzo1z 999
    lzo2a 999
    lzsa 9/f#cr
    quicklz 1,2,3
    sap 0,1,2
    zlib 1,2,3,4,5,6,7,8,9
    zopfli
    zstd 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,-1,-2,-3,-4,-5,-6,-7,-8,-9,-10,-11,-12,-13,-14,-15,-16,-17,-18,-19,-20,-21,-22/d#
    imemcpy
    memcpy
    fse
    fsehuf
    TurboRC 11,12,13,14,21,22,23,24,25,30,31,32,33,34,35
    zlibh 8,9,10,11,12,13,14,15,16,32
    zlibrle
    divbwt
    st 3,4,5,6,7,8
    177 replies | 43761 view(s)
  • Jyrki Alakuijala's Avatar
    20th February 2020, 19:39
    It can be less important than you might initially think. The photographic compression engine VarDCT inside JPEG XL doesn't use many predictors. The Modular mode uses predictors, but it is aimed more at graphics and some special uses.
    88 replies | 18630 view(s)
  • Scope's Avatar
    20th February 2020, 19:39
    https://imgsli.com/MTIzNTk/ (~24,500 bytes each image) AVIF (24,423 bytes) JXL (24,664 bytes) JXL (24,410 bytes) + https://i.slow.pics/TVoEtx9j.webp (24,688 bytes) Here is another example of extreme compression where Jpeg XL (in two modes) is noticeably worse than all other encoders, except for the blocking artifacts from MozJpeg. Especially with the popularity of the bokeh effect in a photo where the background is blurred and not so important for its accurate display, and on the object in focus, the texture details and sharpness are much more important, it would be nice to be able to distribute more bits to it. And since LQIP is part of the whole image in Jpeg XL, is it possible to display it in a more artistic and enjoyable (not just ordinary upscale) view, with a JXL decoder (module)? This would be less resource intensive than using external additional JS libraries and tools. Or like similar experiments that are planned in WebP v2 (triangle-based preview) or will it be easier to use Webp2-like formats in a separate 200-400 bytes picture, without complicating the Jpeg XL format?
    88 replies | 18630 view(s)
  • cade's Avatar
    20th February 2020, 18:18
    Oops, the condition should be dist <= op. Updated exe in the first post and source on GitHub, thanks.
    41 replies | 2668 view(s)
  • Darek's Avatar
    20th February 2020, 15:47
    Darek replied to a thread Paq8pxd dict in Data Compression
    Some enwik scores for paq8pxd_v74:
    16'358'450 - enwik8 -s8 by Paq8pxd_v72_AVX2
    16'013'638 - enwik8 -s15 by Paq8pxd_v72_AVX2
    16'672'036 - enwik8.drt -s15 by Paq8pxd_v72_AVX2
    126'779'432 - enwik9_1423 -s15 by Paq8pxd_v72_AVX2
    132'464'891 - enwik9_1423.drt -s15 by Paq8pxd_v72_AVX2
    16'339'122 - enwik8 -s8 by Paq8pxd_v74_AVX2
    15'993'409 - enwik8 -s15 by Paq8pxd_v74_AVX2
    15'956'537 - enwik8.drt -s15 by Paq8pxd_v74_AVX2
    16'279'540 - enwik8 -x8 by Paq8pxd_v74_AVX2
    15'928'916 - enwik8 -x15 by Paq8pxd_v74_AVX2 - best score for the non-DRT-preprocessed enwik8 file in the whole paq8pxd family
    15'880'133 - enwik8.drt -x15 by Paq8pxd_v74_AVX2 - best overall enwik8 score in the whole paq8pxd family
    The enwik9 score could even be below 126'000'000 bytes :))) I'll check it.
    707 replies | 282021 view(s)
  • xezz's Avatar
    20th February 2020, 14:16
    Command: c, flags: nothing. Compression failed on alice29.txt (http://corpus.canterbury.ac.nz/descriptions/#cantrbry). Condition: dist<op
    41 replies | 2668 view(s)
  • Jarek's Avatar
    20th February 2020, 12:53
    Thanks, I will look closer at this predictor-selection problem. For upsampling with Haar, the zero predictor indeed seems a natural choice, but it is not necessarily optimal, e.g. near edges. Also, the predictor is only half of the question - the second half is evaluating the accuracy of this predictor to choose the width of the Laplace distribution for entropy coding (also for DCT coefficients) - e.g. JPEG LS uses 365 contexts for that ( https://en.wikipedia.org/wiki/Lossless_JPEG#Context_modeling ). How is it evaluated in JXL?
    88 replies | 18630 view(s)
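    For reference, the 365 contexts mentioned above come from the LOCO-I/JPEG-LS context construction: three local gradients (D1 = d - b, D2 = b - c, D3 = c - a, over the neighbouring samples) are each quantized into 9 regions, giving 9^3 = 729 signed combinations; merging each (q1, q2, q3) with its negation (-q1, -q2, -q3) leaves (729 - 1)/2 + 1 = 365 distinct contexts, and each context then tracks its own error statistics to set the width of the Laplace-like residual distribution.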
  • Jyrki Alakuijala's Avatar
    20th February 2020, 12:38
    The coffee image is 0.139 BPP. This is only good for demonstrating a variety of artefacts, not attractive for actual use. Current image bytes in the internet average at ~2.0 BPP for larger images. With the next generation with 3x more efficiency, we will land somewhere around 0.7 BPP. For calm images less, for noisy or highly textured images a bit more. When you up the bit rates the artefacts are not just reduced, but they completely disappear. There are non-linear filters that will take care of them once the amplitude is lower. We may be able to tune the filters (the smoothing filter has a spatial control field) later for such very low BPP images, too. It just has not been a priority for now. Unlike with pretty much every other codec, with JPEG XL it is not difficult to get into the decent quality, and you don't need to experiment with what BPP you'd get it. You just specify the multiples of just noticeable difference as a distance and the codec takes care of the rest. For Internet use I'd recommend something around distance 1.5 to 2.0 to land between 0.7 to 1.0 BPP on a larger corpus.
    88 replies | 18630 view(s)
  • Jyrki Alakuijala's Avatar
    20th February 2020, 12:13
    Select is a Paeth-like predictor that I introduced in WebP lossless. It has fewer branches than Paeth, runs a bit faster, and compresses better. Outside of the JPEG XL standard docs, the only place where it is documented is the WebP lossless bitstream specification.
    88 replies | 18630 view(s)
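    A sketch of the Select predictor Jyrki describes, reduced to a single channel (the WebP lossless spec applies it per pixel, summing the per-channel distances, so treat the exact details as belonging to the spec rather than this snippet):

    #include <cstdlib>

    // Form the gradient estimate L + T - TL, then return whichever of the
    // left or top sample it is closer to.
    static inline int select_predict(int L, int T, int TL) {
        int p = L + T - TL;
        return (std::abs(p - L) < std::abs(p - T)) ? L : T;
    }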
  • Jon Sneyers's Avatar
    20th February 2020, 11:20
    For Squeeze residuals, the best predictor is just the zero predictor (the residuals are in a sense already predictor residuals) – let the meta-adaptive context model do the rest. For non-Squeezed pixel data, we have two options: 1) the usual simple predictors (zero, left, top, avg, select, gradient, variable per row), with a meta-adaptive context model, 2) a weighted predictor (based on Alexander Rhatushnyak's lossless PIK), which uses 4 parametrizable subpredictors, 3 of which have error-feedback built in, and the final prediction is a weighted sum of those 4 subpredictors, based on their past performance. In this case the context model is fixed and based on the max-prediction-error of the surrounding previously decoded pixels. The meta-adaptive context model (used for Squeeze residuals and simple predictors) is based on FLIF's MA trees: the context model itself is signalled, and it's a decision tree where the nodes are of the form (property > value ?), with a set of properties that include things like abs(NorthWest-North) and the x,y coordinates of the pixel. A good encoder produces small but effective MA trees that segment the data into maximally homogeneous contexts, without creating too many contexts.
    88 replies | 18630 view(s)
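    For the "gradient" entry in the predictor list above, the usual clamped form looks like this (a generic sketch; JXL's exact variant and clamping may differ):

    #include <algorithm>

    // Predict W + N - NW, clamped to the range spanned by W and N so that
    // edges do not overshoot.
    static inline int gradient_predict(int W, int N, int NW) {
        int lo = std::min(W, N), hi = std::max(W, N);
        return std::max(lo, std::min(hi, W + N - NW));
    }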
  • cade's Avatar
    20th February 2020, 04:09
    Simplified version of RK256, also carries the last match along:
    struct RK256 {
        const static int NICE_MATCH_UNTIL = 16;
        const static int BLOCK_BITS = 8;
        const static int BLOCK_SIZE = 1 << BLOCK_BITS;
        const static int BLOCK_MASK = BLOCK_SIZE - 1;
        const static uint32 ADDH = 0x2F0FD693u;
        //uint32 remh = 1; for (int i = 0; i < BLOCK_SIZE; i++) { remh *= addh; }
        const static uint32 REMH = 0x0E4EA401u;

        INLINE uint32 rolling_hash_add(uint32 p, int y) { return (y + p) * ADDH; }
        INLINE uint32 rolling_hash_add_remove(uint32 p, int yr, int yl) { return (yr + p - yl * REMH) * ADDH; }

        uint16* cache;
        uint32* table;
        uint32 rh, rh_end;
        uint32 cur_match_from, cur_match_len, cur_match_to;

        RK256() : cache(nullptr), table(nullptr), rh(0), rh_end(0), cur_match_from(0), cur_match_len(0), cur_match_to(0) { }

        void Roll(const byte* RESTRICT buf, uint32 p, uint32 p_end) {
            while (rh_end < BLOCK_SIZE && rh_end < p_end) {
                rh = rolling_hash_add(rh, buf);
                ++rh_end;
                if (!(rh_end & BLOCK_MASK)) {
                    cache = hash4(rh) >> 16;
                    table = rh_end;
                }
            }
            if (p - cur_match_to < cur_match_len) {
                uint32 diff = p - cur_match_to;
                cur_match_from += diff;
                cur_match_to += diff;
                cur_match_len -= diff;
            } else {
                cur_match_len = 0;
            }
            if (cur_match_len > NICE_MATCH_UNTIL) {
                while ((p >= rh_end && rh_end < p_end) || (rh_end >= BLOCK_SIZE && rh_end - p < BLOCK_SIZE && rh_end < p_end)) {
                    rh = rolling_hash_add_remove(rh, buf, buf);
                    ++rh_end;
                    if (!(rh_end & BLOCK_MASK)) {
                        cache = hash4(rh) >> 16;
                        table = rh_end;
                    }
                }
                return;
            }
            while ((p >= rh_end && rh_end < p_end) || (rh_end >= BLOCK_SIZE && rh_end - p < BLOCK_SIZE && rh_end < p_end)) {
                rh = rolling_hash_add_remove(rh, buf, buf);
                ++rh_end;
                uint16& cache_end = cache;
                uint16 hash_cur = hash4(rh) >> 16;
                if (cache_end == hash_cur) {
                    uint32& hist_end = table;
                    if (hist_end < rh_end && hist_end >= BLOCK_SIZE) {
                        uint32 sp = hist_end - BLOCK_SIZE;
                        uint32 mp = rh_end - BLOCK_SIZE;
                        ASSERT(p >= mp);
                        ASSERT(p > sp);
                        uint32 pos_delta = p - mp;
                        sp += pos_delta;
                        mp += pos_delta;
                        if (sp < p) {
                            int len = try_match_unbounded(buf, sp, mp, p_end);
                            if (len > cur_match_len) {
                                cur_match_len = len;
                                cur_match_from = sp;
                                cur_match_to = mp;
                            }
                        }
                    }
                    if (!(rh_end & BLOCK_MASK)) {
                        cache_end = hash_cur;
                        hist_end = rh_end;
                    }
                } else if (!(rh_end & BLOCK_MASK)) {
                    cache_end = hash_cur;
                    table = rh_end;
                }
            }
        }
    };

    LZC1::RK256 rk256;
    rk256.table = new uint32;
    rk256.cache = new uint16;
    memset(rk256.table, -1, sizeof(uint32) << LZC1::RK256_HASH_BITS);
    memset(rk256.cache, -1, sizeof(uint16) << LZC1::RK256_HASH_BITS);

    Two limitations compared to the more complicated version:
    1. Shift left is missing (for sliding windows).
    2. Inefficient if p is not updated for more than BLOCK_SIZE (extra matches tested that probably won't reach p while rolling).
    Experimenting with a chunk-based version with just static Huffman codes updated every 256k blocks: decompression speed is 3-5x faster, and the ratio is only ~2-3% worse in most cases.
    41 replies | 2668 view(s)
  • Shelwien's Avatar
    20th February 2020, 01:58
    Apparently it uses cygwin there. STATIC=1 let it build the exe anyway, though. With PATH set to mingw\bin it actually compiled with just "make", but then it didn't work because of too many linked mingw DLLs. I think STATIC=1 should be the default on Windows.
    177 replies | 43761 view(s)
  • kaitz's Avatar
    20th February 2020, 01:12
    kaitz replied to a thread Paq8pxd dict in Data Compression
    It's used in the stream where all the default data goes, and also for text, as humans tend to produce a lot of it. :) Some images have headers, and it may give somewhat better compression on those, but it's not very useful. It all depends (how many files, etc). This needs time-consuming testing to be actually useful on other types of data. My test version shows which contexts are mostly bad for given data over time. I think there have been good enough improvements from someone like me. But I still wonder.
    707 replies | 282021 view(s)
  • dnd's Avatar
    20th February 2020, 00:34
    The code using dlsym is only included on Linux. It is not included when _WIN32 is defined; see turbobench.c. Normally it compiles with mingw without any issue; see the CI MinGW build. I don't know why _WIN32 is not defined in your gcc. You can try to compile with "make STATIC=1" (NMEMSIZE will be defined).
    177 replies | 43761 view(s)
  • Shelwien's Avatar
    20th February 2020, 00:10
    Z:\010\TurboBench> C:\MinGW820x\bin\make.exe
    gcc -O3 -w -Wall -DNDEBUG -s -w -std=gnu99 -fpermissive -Wall -Ibrotli/c/include -Ibrotli/c/enc -Ilibdeflate eflate/common -Ilizard/lib -Ilz4/lib -Izstd/lib -Izstd/lib/common -Ilzo/include -D_7ZIP_ST -Ilzsa/src -Ilzsa/sr vsufsort/include turbobench.c -c -o turbobench.o
    turbobench.c: In function 'mem_init':
    turbobench.c:154:24: error: 'RTLD_NEXT' undeclared (first use in this function); did you mean 'RTLD_NOW'?
      mem_malloc = dlsym(RTLD_NEXT, "malloc" );
                         ^~~~~~~~~
                         RTLD_NOW
    turbobench.c:154:24: note: each undeclared identifier is reported only once for each function it appears in
    makefile:717: recipe for target 'turbobench.o' failed
    make: *** Error 1
    177 replies | 43761 view(s)
  • dnd's Avatar
    19th February 2020, 23:48
    On Linux it's simple to download and build the package, but on Windows you must first install git and the mingw-w64 package. This scares Windows users. The submodules are already updated automatically, and there is also a "make cleana" (Linux only) to remove some unnecessary huge directories. I've made a new release with builds for Linux+Windows and a cleaned, small source-code 7zip package (5MB) containing all submodules, ready to build. That's a solution for users with limited download bandwidth or with difficulties building turbobench, but it implies more setup work. I've added this option to the readme file. As already stated, if you have git and gcc/mingw installed then there is no problem downloading and building turbobench. This option reduces the downloaded size by a few percent, but the huge submodules will still be completely downloaded.
    177 replies | 43761 view(s)
  • Darek's Avatar
    19th February 2020, 23:33
    Darek replied to a thread Paq8pxd dict in Data Compression
    @kaitz - first, thanks for fixing this. Second, "the -x option has an effect only on default and text modes" - is it ineffective for other types of data?
    707 replies | 282021 view(s)
  • kaitz's Avatar
    19th February 2020, 21:14
    kaitz replied to a thread Paq8pxd dict in Data Compression
    paq8pxd_v74
    - fix -x option on levels 10-15
    The -x option has an effect only on default and text modes.
    707 replies | 282021 view(s)
  • Jarek's Avatar
    19th February 2020, 16:45
    Thanks, this looks like a similar philosophy to the JPEG LS predictor ( https://en.wikipedia.org/wiki/Lossless_JPEG#LOCO-I_algorithm ) - using some manually chosen heuristic condition to select one of a few predictors, e.g. for smooth regions and edges. It could be optimized based on a dataset with an automatically found classifier - cluster into different distinguishable types of regions such that the predictor ensemble gives the lowest MSE ... I can also build adaptive predictors using adaptive linear regression, with coefficients evolving to adapt to local dependencies (page 4 of https://arxiv.org/abs/1906.03238 ). But it would require one LinearSolve() per adaptation ... and generally context dependence seems a better way. Besides predicting the value, it is also crucial to evaluate the accuracy of such a prediction - in practice, the width of the Laplace distribution. JPEG LS uses brute force for this purpose: 365 contexts (which I really don't like) - how do you choose it in FLIF, JXL?
    88 replies | 18630 view(s)
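    The adaptive linear regression idea mentioned above can also be run without a LinearSolve() per step, in the usual least-mean-squares style (a generic sketch, not the scheme from the linked paper): predict with the current weights, then nudge them in the direction that reduces the last error.

    // ctx holds the k context samples, w the k adaptive coefficients.
    double lms_predict(const double* ctx, const double* w, int k) {
        double p = 0;
        for (int i = 0; i < k; i++) p += w[i] * ctx[i];
        return p;
    }

    void lms_update(const double* ctx, double* w, int k, double err, double rate) {
        for (int i = 0; i < k; i++) w[i] += rate * err * ctx[i];   // gradient step on the squared error
    }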
  • algorithm's Avatar
    19th February 2020, 13:49
    To reduce the size of the repos you can just do a shallow clone: git clone --depth=1. This downloads only the latest revision, omitting history.
    177 replies | 43761 view(s)
  • Jon Sneyers's Avatar
    19th February 2020, 13:15
    That part is just Haar. The nonlinear part is in smooth_tendency(), which basically subtracts the residual you would expect from interpolation, but only if the local neighborhood is monotonic (so it avoids overshoot/ringing).
    88 replies | 18630 view(s)
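    For readers following along, a plain integer Haar lifting step looks like this (the Squeeze step discussed above is this plus the smooth_tendency() correction, which is omitted here; an arithmetic right shift is assumed for negative differences):

    // Forward: replace a pair (a, b) by an average and a difference.
    static inline void haar_forward(int a, int b, int* avg, int* diff) {
        *diff = a - b;
        *avg  = b + (*diff >> 1);        // equals floor((a + b) / 2)
    }

    // Inverse: exact reconstruction of (a, b).
    static inline void haar_inverse(int avg, int diff, int* a, int* b) {
        *b = avg - (diff >> 1);
        *a = *b + diff;
    }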
  • schnaader's Avatar
    19th February 2020, 11:58
    Exactly. There's an additional detail of the PNG algorithm that shows why it performs so badly here: it has the default deflate window size of 32 KB. Now, looking at that Castlevania PNG, the width is 13895 and the bit depth is 24 bit, so each line takes 41 KB. So the algorithm doesn't even see the previous line. WebP is much better for this kind of data as it recognizes block repetitions:
    1.952.389 Original PNG (CastlevaniaII-Simon...)
    763.246 cwebp -lossless -q 100 -m 5
    729.836 same cwebp, after that paq8p -3 (so the webp file still has some redundancies)
    Also note that the original PNG contains some additional data (text and arrows); removing them reduces the color count from 440 to 73 and improves compression further:
    1.975.238 Modified PNG
    675.626 cwebp -lossless -q 100 -m 5
    640.962 same cwebp, after that paq8p -3
    Still 8 times bigger than the NES ROM, but you get the idea. Also, the non-modified version stresses the point that PNG/cwebp are universal compressors, so they can process anything, even when modified.
    19 replies | 925 view(s)
  • Darek's Avatar
    19th February 2020, 11:18
    Darek replied to a thread Paq8pxd dict in Data Compression
    Scores of the 4 corpora for paq8pxd_v73 in -s15 mode. The best scores for Calgary, Canterbury and MaximumCompression in the paq8pxd family!
    707 replies | 282021 view(s)
  • Scope's Avatar
    19th February 2020, 00:43
    The file sizes are usually almost the same; the difference is too small to affect the quality. AVIF - 273625, JXL - 273627. AVIF and Jpeg XL are usually very close in size; for other encoders it is more difficult to hit the same exact size. I uploaded these files; JXL can be decoded by djpegxl or http://libwebpjs.appspot.com/jpegxl/ and AVIF by FFmpeg or online at https://kagami.github.io/avif.js/ Because there are still no good tools for working with AVIF and the format itself is still unstable, sometimes images are not displayed correctly after muxing, but this should not interfere with comparing quality, as it's just one frame of an AV1 encoder. I wanted to do this, but because the tests and article writing are not automated, it would take a lot more time; I initially made these comparisons for myself, and they do not pretend to be extremely accurate, but are rather a general rough review (similar to the Netflix article). Encoder versions have been added where possible. For a more scientific article it would be nice to upload all the encoded images, use metrics and take many more examples, but I wrote a more amateurish one; any example I showed can be repeated independently, at least by roughly encoding to the size I specified (because at high bpp, compared to what has been shown at low bpp, the overall placement may change). Compared to the original, none of the images are perfect. Jpeg XL blurred some areas and added artifacts, but they are not so annoying, and it conveyed the general structure more correctly. AVIF and HEIC are more enjoyable when viewed closely, but they distorted the overall image more strongly. It's easier to see this by quickly switching images rather than on the slider. But I will change the description, because it was more of a personal preference. - https://medium.com/@scopeburst/mozjpeg-comparison-44035c42abe8 I also added a comparison between 8-bit and 10-bit AVIF/HEIC and some examples from the HTJ2K encoder.
    88 replies | 18630 view(s)
  • Shelwien's Avatar
    19th February 2020, 00:10
    I think it could make sense to make a complete and clean repository on your side (make scripts to download all submodules, then remove unnecessary stuff), then push that to GitHub. Windows users don't have git by default, etc. It would also be good to have buildable full sources as a release archive - getting the most recent version of each codec is not always a good idea, e.g. zstd developers frequently break it.
    177 replies | 43761 view(s)
  • Shelwien's Avatar
    18th February 2020, 23:55
    > " cmix its about 250kb per year." 250k out of what a 100mb file? 250kb smaller compressed size of 100M file (which is around 15mb). I meant this: http://mattmahoney.net/dc/text.html#1159 The surprising thing is that progress was equal during 4-5 recent years, even though one would normally expect exponential decay here. > HD sizes go bigger and faster than that every year. Not really. Storage becomes less and less reliable instead. > So you say we reached a limit of compression with current hardware and > to get better results need more powerful hardware? In a way - its possible to achieve new results with hardware implementations of advanced algorithms. But in practice I don't think that we're already up to date with modern hardware (SIMD,MT,x64). > If so why bother trying when their is not much to do?? There's very much to do, for example video recompression basically doesn't exist atm.
    19 replies | 925 view(s)
  • Trench's Avatar
    18th February 2020, 22:54
    The original file formats are outdated in how they handle things. But oh well; at least it's not the BMP format :) True, it's infinite, just like combining elements into many different alloys, but at least there is a basic chart of the limited elements. Compression and decompression go hand in hand, which I assumed was implied. "cmix its about 250kb per year." 250k out of what, a 100mb file? A gain of under 0.25% - doesn't that seem worth the effort? HD sizes grow bigger and faster than that every year. So you say we have reached a limit of compression with current hardware, and to get better results we need more powerful hardware? If so, why bother trying when there is not much to do?? If so, I guess we have to wait until cmix gets better and incorporates the various methods in one, since modern files have everything from text, music, art, etc.
    19 replies | 925 view(s)
  • skal's Avatar
    18th February 2020, 22:47
    You should add the exact encoded file size to the comparison slider. For instance, I'm very surprised how this one looks for AVIF: https://imgsli.com/MTIyMDc/3/1. What are the respective file sizes? You should also print the *exact* final command line used (including the -q value, e.g.), along with the exact revision used, so that people can reproduce your exact results. Finally, I'm surprised about your comment for this one: https://imgsli.com/MTIyMTk/3/2, which says "JPEG-XL is slightly better than AVIF and HEIC", because frankly, the wool on the gloves disappeared on the left. skal/
    88 replies | 18630 view(s)
  • dnd's Avatar
    18th February 2020, 21:53
    Thank you for your elaboration, corrections and hints. I've recently removed some old, unmaintained or less notable codecs. Many codecs are listed in the readme but not in the turbobench repository; they must be manually downloaded and activated in the makefile. I will continue to clean, simplify and automate the process.
    177 replies | 43761 view(s)