Activity Stream

  • Jarek's Avatar
    Today, 12:00
    There are three - from https://www.ibc.org/trends/2020-crunch-time-for-codecs/5569.article
    22 replies | 1071 view(s)
  • compgt's Avatar
    Today, 11:26
    Is your algorithm "recursive" or "perpetual" compression, i.e., can you apply the same algorithm to the output again and again and still achieve compaction?
    5 replies | 94 view(s)
  • Hcodec's Avatar
    Today, 11:15
    Of course, one of the first laws I studied. So I found a way to transform the elements of set A into a subset of highly compressible numbers of lower entropy; the inverse takes fewer steps (signal bits) to reconstruct than the original size. Let's take a set of nine random numbers, unique so as not to waste time with a Huffman tree: {8,1,3,4,6,2,7,9,5}. 813462795 in binary is 30 bits. The entropy is 28.65982114 bits. After a 4-step transform your number becomes (0,0,1,2,5,6,7,3,5), or (1,2,5,6,7,3,5), but since I have not found a way to make the sets variable length without losing integrity, I'll add padding to make the set 8 digits: (0,1,2,5,6,7,3,5), which is 21 bits plus 2 signal bits plus 2.33 bits for padding a 0, i.e. 25.33 bits total or 2.814777778 bits per number. I would like to explain the transform, which is also a great encryption, but I should probably move this out of Off Topic. I am not a programmer; this was a simple hand-cipher compression problem.
    5 replies | 94 view(s)
  • Jyrki Alakuijala's Avatar
    Today, 09:03
    If the documents are the main concern rather than the compression, and it is a personal project -- use the same compression that everyone else is using. Sorry if I missed sarcasm here... You are aware of the substitution problem in lossy jbig2?
    174 replies | 42506 view(s)
  • pklat's Avatar
    Today, 08:55
    I wanted to unpack old .pdf files, mostly scanned text. It seems such a waste to create a pdf out of (scanned) images without proper OCR/vectorizing. jbig2 has an interesting lossy option; I googled it a bit and it seems to do OCR or something. One day I'll try to convert 'real' (vector) pdfs to svg, it seems much better to me. I've also unpacked cbz/cbr and such, and dislike them. Presumably 1-bit images have other applications, too.
    174 replies | 42506 view(s)
  • Gotty's Avatar
    Today, 08:08
    Are you familiar with the pigeonhole principle and the counting argument? #1. #2
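    For illustration, a minimal C++ sketch of the counting argument referenced above (not from the thread itself): there are 2^n distinct n-bit inputs but only 2^n - 1 binary strings shorter than n bits, so no lossless coder can shrink every input.
        // Counting argument: for any length n there are 2^n distinct inputs,
        // but only 2^0 + 2^1 + ... + 2^(n-1) = 2^n - 1 outputs shorter than n bits,
        // so at least one n-bit input cannot be mapped to a shorter output losslessly.
        #include <cstdint>
        #include <cstdio>

        int main() {
            for (int n = 1; n <= 16; ++n) {
                uint64_t inputs  = 1ULL << n;        // number of n-bit strings
                uint64_t shorter = (1ULL << n) - 1;  // strings of length 0..n-1
                std::printf("n=%2d inputs=%6llu shorter outputs=%6llu deficit=%llu\n",
                            n, (unsigned long long)inputs, (unsigned long long)shorter,
                            (unsigned long long)(inputs - shorter));
            }
            return 0;
        }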
    5 replies | 94 view(s)
  • Shelwien's Avatar
    Today, 06:43
    Btw, "mcm -store" can also be used as an external preprocessor.
    35 replies | 1955 view(s)
  • Jyrki Alakuijala's Avatar
    Today, 06:17
    This is known. AVIF is currently better at low image quality. JPEG XL is currently better at medium and high image qualities, including the range of image quality used on the internet and in cameras today. It is an interesting question whether low image quality or medium-to-high image quality will matter more in the future. It is a complex dynamic of internet/mobile speeds developing, display resolutions increasing, image formats driving the cost of quality lower, HDR, the impact of image quality on commerce, progressive rendering, and other user-related dynamics.
    27 replies | 1538 view(s)
  • Jyrki Alakuijala's Avatar
    Today, 06:03
    Try using cjpegxl as follows:
        $ cjpegxl input.png output.jxl
    Try 'cjpegxl --distance 1.5 input.png output.jxl' for slightly worse quality.
    Don't worry if cjpegxl runs fast; it is likely about 1000x faster than what you are experiencing. If you want slightly (10 % or so) higher quality, use --speed kitten, which is still 100x faster than what you use.
    27 replies | 1538 view(s)
  • Jyrki Alakuijala's Avatar
    Today, 05:38
    Out of curiosity: what do you need it for?
    174 replies | 42506 view(s)
  • byronknoll's Avatar
    Today, 04:41
    byronknoll replied to a thread paq8px in Data Compression
    Welcome back mpais! The dictionary preprocessor is something I wrote based on the WRT algorithm - it is not based on MCM. It uses the dictionary from phda.
    1958 replies | 547694 view(s)
  • Shelwien's Avatar
    Today, 04:16
    @Jarek: Sorry, but it's not really a crash... The hosting company is experimenting with mod_security rules to block exploits. Not sure how to deal with it - the vBulletin engine is not very safe, so in some cases it's actually helpful.
    14 replies | 607 view(s)
  • Jarek's Avatar
    Today, 03:43
    I have updated https://arxiv.org/pdf/2004.03391 with perceptual evaluation, also to be combined with this decorrelation (for agreement with 3 nearly independent Laplace distributions) to automatically optimize quantization coefficients.
    So there is a separate basis P for perceptual evaluation, e.g. YCrCb. In this basis we define d=(d1, d2, d3) weights for the distortion penalty, e.g. larger for Y, smaller for Cr, Cb. There is also a transform basis O into the actually encoded channels (preferably decorrelated) with q=(q1, q2, q3) quantization coefficients.
    This way the perceptual evaluation (distortion) becomes D = |diag(q) O P^T diag(p)| Frobenius norm. The entropy (rate) is H = h(X O^T) - lg(q1 q2 q3) + const bits/pixel.
    If P=O (rotations): the perceptual evaluation is defined for the decorrelation axes, and distortion D is minimized for quantization coefficients (q1,q2,q3) = (1/d1,1/d2,1/d3) times a constant choosing the rate-distortion tradeoff. For a general perceptual evaluation (P!=O) we can minimize rate H under the constraint of fixed distortion D.
    ps. I didn't use bold 'H' because it literally crashes the server :D ("Internal Server Error"; also italic, underline)
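    For illustration, a minimal C++ sketch of the P = O special case described above, where the distortion-minimizing quantization steps are q_i = c/d_i (the weights d and the constant c below are arbitrary example values, not taken from the paper):
        #include <array>
        #include <cstdio>

        int main() {
            // Perceptual weights in the decorrelation basis, e.g. heavier for Y than for Cr/Cb (example values).
            std::array<double, 3> d = {2.0, 1.0, 1.0};
            // Single constant choosing the point on the rate-distortion curve (example value).
            double c = 8.0;
            std::array<double, 3> q;
            for (int i = 0; i < 3; ++i) q[i] = c / d[i];  // q_i = c / d_i minimizes distortion at fixed rate when P = O
            std::printf("q = (%.2f, %.2f, %.2f)\n", q[0], q[1], q[2]);
            return 0;
        }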
    14 replies | 607 view(s)
  • Shelwien's Avatar
    Today, 02:54
    Any comments on the site design? https://globalcompetition.compression.ru/ How do you think we can improve it to increase participation?
    40 replies | 2799 view(s)
  • DZgas's Avatar
    Today, 02:07
    DZgas replied to a thread JPEG XL vs. AVIF in Data Compression
    Another example for you; the file size is 14 kB. Coding time: 43 sec JpegXL, 39 sec AVIF. JpegXL saves a few things that AVIF erased, if viewed from afar and compared with the look of the original. But AVIF preserves so many shapes and it looks better; JpegXL looks bad.
    27 replies | 1538 view(s)
  • Hcodec's Avatar
    Today, 01:46
    Thanks Gotty, it was quite the journey! Living in another country I was only able to do so much. I am trying to figure out what to do with what I learned. What do you mean by pseudo random data exactly? I am referring to the Kolmogorov complexity or entropy when talking about the complexity of a bit stream. We were on a quest night and day for eight years to come up with a way to compress random data. It was quite the learning experience for me. I invented many concepts for the first time only to discover that others had made the same discovery many years before. I invented variable length coding, 3D Cartesian point encryption, fractal geometry, and many other off-the-wall ideas in the search for a way to compress random data... only to find others had invented the same years before. I came up with one idea that I had never seen before or since, which showed the most hope, and that is what led me here: a way to change the number and place value of digits subjectively in a pseudo-random permutation order that allows for an easy inverse - a way to take a random stream of any length and high entropy and change it to a very low entropy. It is a new pseudo-random generator that allows for compression. I hope I can explain it more. Yes! I agree all data is pseudo random unless the source is based on some generator that defies being quantified, like radiation noise (hardware random number generators).
    5 replies | 94 view(s)
  • DZgas's Avatar
    Today, 01:26
    DZgas replied to a thread JPEG XL vs. AVIF in Data Compression
    Well, all codecs can. But at low bitrate JpegXL cannot; I just found this problem. At average bitrates JpegXL compresses well, the same as other codecs. At low bitrates the quality of AVIF is undeniable.
    27 replies | 1538 view(s)
  • DZgas's Avatar
    Yesterday, 23:54
    All these companies and services are involved. AV1 has too much support.
    22 replies | 1071 view(s)
  • Darek's Avatar
    Yesterday, 23:53
    Darek replied to a thread paq8px in Data Compression
    As I remember, when I found my first paq instances (I don't remember exactly when - 2004? 2006?) it was paq4, and then I found paq6. At that time my laptop compressed my testbed in about 3-4 hours... Back then I thought such a compression time was way too slow, but now the fastest paq versions run on my laptop in about 60-70 min... probably on Sportman's machine it would be about 40-50 min. I think that, due to the AMD/Intel battle, CPU IPC will improve a lot during the next 5 years, and then "standard" paq will be a really reasonable option. And now the LSTM option compresses my testset in 6 hours (single-instance time) - a new era has started - who knows when it will be reasonable to use, but I'm sure it will be... :)
    1958 replies | 547694 view(s)
  • Gotty's Avatar
    Yesterday, 23:51
    Gotty replied to a thread paq8px in Data Compression
    Sorry for the misleading information. I checked only that the LSTM model does not use it, and I somehow believed that the LSTM model (being a neural network) replaces the paq mixer. Which is not true. See the answer from mpais above.
    1958 replies | 547694 view(s)
  • pklat's Avatar
    Yesterday, 23:46
    I tried webp lossless on a black & white image (scanned text mostly), but ccitt has a better ratio. Is there some modern alternative to ccitt? cmix compressed it, but it's just too slow and RAM-consuming. Edit: jbig2 seems to be better, even than cmix. And it's fast!
    174 replies | 42506 view(s)
  • SolidComp's Avatar
    Yesterday, 22:25
    These are the companies involved? I don't see any TV makers or AV equipment makers. We'll see how it goes.
    22 replies | 1071 view(s)
  • Darek's Avatar
    Yesterday, 21:50
    Darek replied to a thread paq8px in Data Compression
    Hmm, my tests show that there is a difference between the "la" and "l" options. It is generally small (30-70 bytes on average), but it exists, and for M.DBF the difference is about 550 bytes. Are you sure that this option is ignored? For 1.BMP, C.TIF, M.DBF, and from R to X.DOC there are some gains using "la" vs. "l".
    1958 replies | 547694 view(s)
  • mpais's Avatar
    Yesterday, 20:58
    mpais replied to a thread paq8px in Data Compression
    Thanks for testing, Darek. As for being faster than cmix, that's to be expected (cmix is basically paq8px(d) + paq8hp + mod_ppmd + LSTM + a floating point mixer), but don't forget that porting these changes to cmix will also make it slightly faster, or at least Byron may choose to increase the LSTM network size and keep the same speed. Imho, even if the results are interesting, I still don't really see much use for the current crop of ML network architectures for data compression; they're all too slow (though progress is being made). Seeing as how you guys have been busy cleaning up the code and documenting things, I guess I'll leave a few more comments on the LSTM model.
    - We're using RMSNorm instead of LayerNorm (as in cmix), since it's faster and in testing showed better results.
    - The softmax is done as in cmix, using the direct application of the formula, which should by all means be numerically unstable due to the limited range of valid exponents in single-precision floating point, but the usual correct implementation (finding the maximum value first, and using exp(v-max) instead of exp(v)) actually gave worse results in my (admittedly limited) testing.
    - Decoupling the learning rate for the forget gate and tuning it separately showed some promise on smaller files, but I couldn't find a decay strategy that was stable enough to give better overall results.
    - The code can still be made faster: since we don't bother with proper memory alignment, we're using AVX2 unaligned loads and stores.
    - For reproducibility, an all fixed-point processing pipeline would be much better, since working with floating point is just a mess.
    Regarding further improvements, I must say that I'm a bit out of the loop as to what's new in paq8pxd and cmix. I see that paq8pxd now has an option "-w" that seems to be designed for enwik*, and cmix seems to have a dictionary preprocessor based on MCM(?). The mod_ppmd changes you mention in paq8pxd seem to just be a tweak of the maximum order and allowing higher memory usage - am I missing something? If anything, the LSTM model showed that we really need to improve x86/64 executable compression, since that is where we get some of the best and most consistent gains. This would suggest that there's still a lot of low-hanging fruit that probably needs a different approach which we've been missing. Also, one oft-forgotten advantage of cmix is that it has a floating point mixer. So even though its paq sub-models just output the same 12-bit predictions (then converted to FP), at least the LSTM predictions use full 32-bit single-precision FP, as do the mixer weights, and perhaps more importantly, so do the individual learning rates.
    @Gotty The LSTM model is just another model outputting predictions to the PAQ mixer; it doesn't replace it, so the "-a" option still has the same effect overall. It just doesn't affect the learning rate of the LSTM, since for that we have the decay strategy and the optimizer.
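    For reference, a minimal C++ sketch of the two softmax variants mentioned above: the direct formula and the usual numerically stable form that subtracts the maximum before exponentiating. The two are mathematically equivalent; the stable form only avoids overflow in single precision for large inputs.
        #include <algorithm>
        #include <cmath>
        #include <cstdio>
        #include <vector>

        // Direct application of the formula: exp(v[i]) / sum_j exp(v[j]).
        std::vector<float> softmaxDirect(const std::vector<float>& v) {
            std::vector<float> out(v.size());
            float sum = 0.0f;
            for (size_t i = 0; i < v.size(); ++i) { out[i] = std::exp(v[i]); sum += out[i]; }
            for (float& x : out) x /= sum;
            return out;
        }

        // Numerically stable variant: subtract the maximum first, i.e. exp(v[i] - max).
        std::vector<float> softmaxStable(const std::vector<float>& v) {
            std::vector<float> out(v.size());
            const float m = *std::max_element(v.begin(), v.end());
            float sum = 0.0f;
            for (size_t i = 0; i < v.size(); ++i) { out[i] = std::exp(v[i] - m); sum += out[i]; }
            for (float& x : out) x /= sum;
            return out;
        }

        int main() {
            std::vector<float> v = {1.0f, 2.0f, 3.0f};
            std::vector<float> a = softmaxDirect(v), b = softmaxStable(v);
            for (size_t i = 0; i < v.size(); ++i)
                std::printf("%zu: direct=%.6f stable=%.6f\n", i, a[i], b[i]);
            return 0;
        }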
    1958 replies | 547694 view(s)
  • Gotty's Avatar
    Yesterday, 20:25
    Welcome, welcome! Sorry to hear your story. Random data compression is an everyday topic in the encode.su forum. Look under the https://encode.su/forums/19-Random-Compression subforum. What do you mean by pseudo random data exactly? And especially why would you like to compress such data? For me: All random data is pseudo random since all data were created by some process.
    5 replies | 94 view(s)
  • Gotty's Avatar
    Yesterday, 19:27
    Gotty replied to a thread paq8px in Data Compression
    Oh, that's a lot to test. Hint: the "-a" option is used only by the original paq mixer; it has no effect on the LSTM, so when using "-L" the "-a" option is ignored. Edit: actually not true - see the reply from mpais below.
    1958 replies | 547694 view(s)
  • spaceship9876's Avatar
    Yesterday, 18:33
    EVC was ratified ~3 months ago. They haven't released an encoder or decoder to the public yet though. They also haven't announced licensing costs or terms for the main profile either. That codec may have been created not for mass adoption but to pressure the companies involved in VVC to use sensible licensing costs and terms.
    22 replies | 1071 view(s)
  • suryakandau@yahoo.co.id's Avatar
    Yesterday, 17:38
    Fp8sk16 - tweaked the image24bit model - still the fastest paq8 version. Here are the source and the binary file.
    34 replies | 1375 view(s)
  • Jyrki Alakuijala's Avatar
    Yesterday, 16:48
    Perhaps it is just me, but I see differences.
    27 replies | 1538 view(s)
  • Hcodec's Avatar
    Yesterday, 16:38
    Great to be here. Long story short... if possible. I became interested in data compression while trying to help a brilliant friend, Kelly D. Crawford, Ph.D., overcome his abuse of alcohol, thinking that if he took me on as a student he could hang on long enough to get help. He was a programmer for many well-known companies. We tackled data compression for 8 long years. My job was to come up with out-of-the-box ideas; he would code. I started from knowing nothing. Just as we were making a breakthrough, he passed from liver problems. This was over a year ago. While looking over some of my notes I saw many things that I still think are possibilities, but I am not a programmer, aside from taking BASIC back in 1980. I came here to learn and perhaps connect with a programmer who would like to take a look at some of the out-of-the-box ideas for random data compression... pseudo random data, in particular. Jon
    5 replies | 94 view(s)
  • skal's Avatar
    Yesterday, 16:12
    skal replied to a thread JPEG XL vs. AVIF in Data Compression
    You should try with another test image; Lena has quite some problems (starting with being a bad old scan!). Not that I expect different results, but Lena should really be forgotten for tests...
    27 replies | 1538 view(s)
  • Darek's Avatar
    Yesterday, 16:12
    Darek replied to a thread paq8px in Data Compression
    Yes. However, for such a big change I'll test all the parameters:
    1) the best combination of the "t", "a", "e" and "l" options for each file
    2) the memory usage effect -> I'll also test (as usual, because I do it for all versions) options with levels 4 to 12, and for this version also options 1 to 3
    Then most probably I'll find the best scores for my testset plus the 4 corpuses. In general the effectiveness is very high - as I mentioned, some files are compressed better than cmix with (at least) a 2x speedup. From my testset it looks like bigger files get worse scores than cmix (K.WAD for example), however there are some advantages like the image parsers (generally for all images, and also for LZ77-precompressed image files = E.TIF) which lead to a better overall score compared to cmix. In my opinion there are also some additional things to improve, of course => especially since Kaitz and some other users made very serious improvements (of course, if he wants to share his ideas):
    1) a better 24bpp parser/optimization which leads to better scores w/o LSTM
    2) a very good improvement for big textual and html files (made for the enwik testset but it sometimes works with other files): the -w option and the usage of an external dictionary which is included in the file
    3) some additional parsers like DEC Alpha, which improves the mozilla file by about 500KB
    4) some ppmd improvements like LucaBiondi made - a nice additional gain due to some ppmd tuning
    5) and last but not least, for some reason paq8px v184 has better compression for the audio files in my testset than paq8px v188 with LSTM.
    1958 replies | 547694 view(s)
  • Gotty's Avatar
    Yesterday, 14:18
    Gotty replied to a thread paq8px in Data Compression
    Thank you Darek! I see you are testing with -9 memory level. I agree: this way results are comparable to earlier tests. (But not comparable to cmix results, which is OK, I guess.)
    1958 replies | 547694 view(s)
  • Bulat Ziganshin's Avatar
    Yesterday, 13:59
    Bulat Ziganshin replied to a thread Zstandard in Data Compression
    Thank you for the extremely quick check. For me, the speed loss looks counter-intuitive - on Intel CPUs, AND can be performed on any of 4 ALUs, while SHL can be performed only on 2 ALUs, so AND shouldn't be any worse. Maybe it will be different on other non-ARM CPUs, in particular AMD Zen.
    452 replies | 133221 view(s)
  • DZgas's Avatar
    Yesterday, 11:54
    Hahaha, rude.
    22 replies | 1071 view(s)
  • ivan2k2's Avatar
    Yesterday, 11:25
    There are 2 versions of the Calgary corpus, one with 14 files and the other with 18 files. You can use whichever you want)
    1 replies | 101 view(s)
  • lz77's Avatar
    Yesterday, 10:49
    I wanted to download calgary.tar. Via Google I found https://en.wikipedia.org/wiki/Calgary_corpus#External_links The ftp link "Original home of the Calgary Corpus" did not work, so I used "New home" instead: http://corpus.canterbury.ac.nz/descriptions/#calgary But this archive is different from http://mattmahoney.net/dc/calgary.tar I think the archive on mattmahoney.net is the true one. Please give true links to the most common corpuses (with their MD5/SHA-1) for benchmarking, thanks.
    1 replies | 101 view(s)
  • Darek's Avatar
    Yesterday, 10:07
    Darek replied to a thread paq8px in Data Compression
    Scores of my testset for paq8px_v188 with the -l option (comparison with and without) => about 100KB of gain for almost all files, and new overall records for C.TIF, D.TGA, E.TIF, I.EXE, L.PAK, S.DOC and T.DOC! Together with paq8pxd, which holds the records for 24bpp images (A.TIF and B.TGA), the paq family holds 16 records out of 28 files! cmix got the rest... :)
    1958 replies | 547694 view(s)
  • Gotty's Avatar
    Yesterday, 09:27
    Gotty replied to a thread paq8px in Data Compression
    @mpais: Oh, welcome back! Nice to see you again! Thank you for pulling/pushing my latest version. I appreciate it! Yes, that is what happened (after playing around I forgot to put the line back). I was not in a hurry to fix it; I had no idea someone was preparing something in the background ;-)
    1958 replies | 547694 view(s)
  • Jarek's Avatar
    Yesterday, 08:46
    Jarek replied to a thread JPEG XL vs. AVIF in Data Compression
    Chrome and Firefox are getting support for the new AVIF image format: https://www.zdnet.com/article/chrome-and-firefox-are-getting-support-for-the-new-avif-image-format/ https://news.slashdot.org/story/20/07/09/202235/chrome-and-firefox-are-getting-support-for-the-new-avif-image-format
    27 replies | 1538 view(s)
  • Cyan's Avatar
    Yesterday, 05:57
    Cyan replied to a thread Zstandard in Data Compression
    Yes, the comment referred to the suggested hash function. Indeed, the `lz4` hash is different, using a double-shift instead. Since the mixing of high bits seems a bit worse for the existing `lz4` hash function, it would imply that the newly proposed hash should perform better (better spread). And that's not too difficult to check: replace one line of code, and run on a benchmark corpus (important: have many different files of different types).
    Quite quickly, it appears that this is not the case. The "new" hash function (a relative of which used to be present in older `lz4` versions) doesn't compress better, in spite of the presumed better mixing. At least, not always, and not predictably. I can find a few cases where it compresses better: x-ray (1.010->1.038), ooffice (1.414 -> 1.454), but there are also counter-examples: mr (1.833 -> 1.761), samba (2.800 -> 2.736), or nci (6.064->5.686). So, to a first approximation, differences are mostly attributable to "noise".
    I believe a reason for this outcome is that the 12-bit hash table is already over-saturated, so it doesn't matter that a hash function has "better" mixing: all positions in the hash are already in use and will be overwritten before their distance limit. Any "reasonably correct" hash is good enough with regard to this lossy scheme (1-slot hash table).
    So, why select one instead of the other? Well, speed becomes the next differentiator. And in this regard, according to my tests, there is really no competition: the double-shift variant is much faster than the mask variant. I measure a 20% speed difference between the two, variable depending on the source file, but always to the benefit of the double-shift variant.
    I suspect the speed advantage is triggered by more than just the instructions spent on the hash itself. It seems to "blend" better with the rest of the match search, maybe due to instruction density, re-use of intermediate registers, or impact on the match search pattern. Whatever the reason, the difference is large enough to tilt the comparison in favor of the double-shift variant.
    452 replies | 133221 view(s)
  • SolidComp's Avatar
    Yesterday, 01:39
    Being "free" or not doesn't matter much for a lot of use cases and industries. Some people have an ideological obsession with "free" software for some reason, as opposed to free furniture or accounting services, etc. Lots of industries will pay for a good video compression codec, if it's only a dollar or two per unit. All the TV and AV companies pay, and AVC and HEVC have been hugely successful. It's mostly just browser makers who don't want to pay, so AV1 seems focused on the web.
    22 replies | 1071 view(s)
  • Bulat Ziganshin's Avatar
    Yesterday, 01:32
    Bulat Ziganshin replied to a thread Zstandard in Data Compression
    Sorry, I can't edit the post. I thought that Cyan answered me, but it seems that he answered algorithm.
    452 replies | 133221 view(s)
  • Bulat Ziganshin's Avatar
    Yesterday, 01:27
    Bulat Ziganshin replied to a thread Zstandard in Data Compression
    You are right, except that it's the opposite. Imagine the word '000abcde' (0 here represents a zero byte). The existing code shifts it left, so it becomes 'abcde000', and then multiplies. As a result, the first data byte, i.e. 'a', can influence only the highest byte of the multiplication result. In the scheme I propose, you multiply '000abcde' by the constant, so byte 'a' can influence the 4 higher bytes of the result. Note that you do it the right way on motorola-endian architectures, this time using (sequence >> 24) * prime8bytes
    452 replies | 133221 view(s)
  • suryakandau@yahoo.co.id's Avatar
    Yesterday, 01:17
    Fp8sk15 - improved image24 compression ratio. Here are the source and binary file.
    34 replies | 1375 view(s)
  • moisesmcardona's Avatar
    12th July 2020, 23:20
    moisesmcardona replied to a thread paq8px in Data Compression
    CMake.
    1958 replies | 547694 view(s)
  • schnaader's Avatar
    12th July 2020, 22:38
    schnaader replied to a thread paq8px in Data Compression
    The source code is at Github: https://github.com/hxim/paq8px
    1958 replies | 547694 view(s)
  • suryakandau@yahoo.co.id's Avatar
    12th July 2020, 22:37
    And how do I compile it using g++?
    1958 replies | 547694 view(s)
  • suryakandau@yahoo.co.id's Avatar
    12th July 2020, 22:27
    Where can I get the source code?
    1958 replies | 547694 view(s)
  • DZgas's Avatar
    12th July 2020, 22:21
    VVC has patents - well, use AV1! AVC/HEVC and most likely VVC have very complicated coding algorithms with extremely complex settings. To make this easy there are presets, which are settings for the hard settings - movie or animation presets. VP9 had just one knob - encode quickly, slowly, or very slowly. That's all. AV1 is the same, a choice of encoding speed. I think developers from all the companies selected the most broadly applicable complex algorithms, so that the encoding algorithms work equally well in every situation. I think that's why VVC will have only a microscopic advantage for specifically targeted products.
    22 replies | 1071 view(s)
  • Jarek's Avatar
    12th July 2020, 22:14
    Oh, indeed - I got that feeling from checking a few years ago, but I see it has improved ~2018. ps. Just noticed that there is also EVC/MPEG-5 coming this year: https://en.wikipedia.org/wiki/Essential_Video_Coding
    22 replies | 1071 view(s)
  • Cyan's Avatar
    12th July 2020, 21:38
    Cyan replied to a thread Zstandard in Data Compression
    In this construction, top bit of `sequence` contributes to the entire hash. This feels intuitively better mixed than other hash formulas where the top bit only contributes to the top bit of the hash.
    452 replies | 133221 view(s)
  • Darek's Avatar
    12th July 2020, 21:34
    Darek replied to a thread paq8px in Data Compression
    You are absolutely right - I'm testing 3 runs at once, then comparing non-"l" mode with "l" mode, both with slower timings (however, both tests were made at the same time and under similar conditions - due to the fact that I also need to test the influence of the "t", "e", "a" and "clear" options on LSTM and run some additional instances) - then, as I wrote, I'll need to test it further and deeper because the slowdowns could be different for different instances. My comparison is only rough. But the scores are good.... :)
    1958 replies | 547694 view(s)
  • mpais's Avatar
    12th July 2020, 21:24
    mpais replied to a thread paq8px in Data Compression
    I'm guessing that, even though your laptop has an AVX2 capable CPU, you're testing multiple simultaneous runs at lower (-9?) levels? That will absolutely destroy performance, since AVX2 code really makes CPUs run hot, hence you get thermal throttling, and the required memory bandwidth performance just isn't there either. I've seen the same thing happen in my testing, running 5 instances at once makes all of them over 2x slower :rolleyes: That's why I made it optional, it's just there in case you guys want to try it out and play with its settings.
    1958 replies | 547694 view(s)
  • moisesmcardona's Avatar
    12th July 2020, 21:12
    moisesmcardona replied to a thread paq8px in Data Compression
    I have submitted a pull request updating the CMakeLists file to compile using CMake/GCC. It compiled successfully on GCC 10.1.0 (gcc version 10.1.0 (Rev3, Built by MSYS2 project)).
    1958 replies | 547694 view(s)
  • Darek's Avatar
    12th July 2020, 21:05
    Darek replied to a thread paq8px in Data Compression
    I've started to test my testset. It looks like the -l option is 5-10 times slower than non-LSTM on my laptop. I need to test the timings more deeply. For the first files some scores are better than cmix with 50% of the time spent on compression...
    1958 replies | 547694 view(s)
  • mpais's Avatar
    12th July 2020, 20:39
    mpais replied to a thread paq8px in Data Compression
    The LSTM model, if enabled, is used with every block type. Since LSTMs are very slow learners, we don't just use its prediction; we use the current expected symbol (at every bit) and the current epoch, along with an indirect context, as additional context for mixer contexts and APM contexts. In testing this significantly improved compression on relatively small files (up to 0.3% on small x86/64 executables, 0.13% on average on my testset).
    The code was written with C++11 in mind, since that's what's used in cmix, so on VS2019 with C++17 you'll get nagged that std::result_of is deprecated, and it doesn't use std::make_unique. The memory usage is also not reported to the ProgramChecker.
    I tried to make it reasonably modular, so it would be easy to exchange components and tweak parameters. Included are 2 learning rate decay strategies (PolynomialDecay and CosineDecay), 4 activation functions (Logistic, Tanh, HardSigmoid, HardTanh), and the optimizer used is Adam, though I also implemented others (Nadam, Amsgrad, Adagrad and SGD with(out) Nesterov Accelerated Gradient), just not with AVX2 instructions, and none gave better results than Adam anyway. The current configuration is 2 layers of 200 cells each, and horizon 100, as in cmix, but with a different learning rate and gradient clip threshold. You can also configure each gate independently.
    I also omitted unused code, like the functions to save and load the weights (compatible with cmix), since those just dump the weights without any quantization.
    In the future, I'd like to try loading pre-trained language models with quantized 4-bit weights, possibly using N different LSTM models and using just the one for the currently detected text language. The problems with that approach are finding large, good, clean datasets that are representative of each language, and the computing power needed to train a model on them.
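    For context, a generic C++ sketch of the two learning-rate decay schedules named above, in their common textbook form (this is not the actual paq8px/cmix implementation; the rates and step counts are example values):
        #include <algorithm>
        #include <cmath>
        #include <cstdio>

        const double kPi = 3.14159265358979323846;

        // Polynomial decay from lr0 down to lrEnd over totalSteps steps.
        double polynomialDecay(double lr0, double lrEnd, double step, double totalSteps, double power) {
            double t = std::min(step, totalSteps) / totalSteps;
            return (lr0 - lrEnd) * std::pow(1.0 - t, power) + lrEnd;
        }

        // Cosine decay from lr0 down to 0 over totalSteps steps.
        double cosineDecay(double lr0, double step, double totalSteps) {
            double t = std::min(step, totalSteps) / totalSteps;
            return lr0 * 0.5 * (1.0 + std::cos(kPi * t));
        }

        int main() {
            for (int s = 0; s <= 1000; s += 250)
                std::printf("step %4d  poly %.5f  cosine %.5f\n", s,
                            polynomialDecay(0.05, 0.001, s, 1000, 1.0),
                            cosineDecay(0.05, s, 1000));
            return 0;
        }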
    1958 replies | 547694 view(s)
  • algorithm's Avatar
    12th July 2020, 20:18
    algorithm replied to a thread Zstandard in Data Compression
    No, it's not. A multiplicative hash function works better when the bits are in the upper portion. So it is about compression ratio. EDIT: Sorry, I am wrong. The final shift is important for compression. But note that it is a 5-byte hash and not 4.
    452 replies | 133221 view(s)
  • Bulat Ziganshin's Avatar
    12th July 2020, 19:31
    Bulat Ziganshin replied to a thread Zstandard in Data Compression
    I haven't found a thread dedicated to LZ4, and anyway there is similar code in ZSTD too: https://github.com/lz4/lz4/blob/6b12fde42a3156441a994153997018940c5d8142/lib/lz4.c#L648 It looks like it was optimized for ARM CPUs that support a built-in bit shift in many ALU instructions, am I right? I think that on x64 it would be both faster and provide better hash quality: return (U32)(((sequence & 0xFFFFFFFF) * prime8bytes) >> (64 - hashLog));
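    For comparison, a C++ sketch of the two hash forms being discussed (the multiplier and table size are illustrative stand-ins, not necessarily the exact constants used in lz4/zstd):
        #include <cstdint>

        static const uint64_t kPrime   = 0x9E3779B97F4A7C15ULL; // illustrative odd 64-bit multiplier
        static const int      kHashLog = 12;                    // e.g. a 4096-entry hash table

        // (a) Shift-multiply-shift, as in the linked lz4 code path: the low five bytes of
        //     `sequence` are shifted to the top of the 64-bit word, multiplied, and the top
        //     hashLog bits are taken.
        static inline uint32_t hashShift(uint64_t sequence) {
            return (uint32_t)(((sequence << 24) * kPrime) >> (64 - kHashLog));
        }

        // (b) Mask-multiply-shift, as proposed above: the low four bytes are kept in place,
        //     multiplied, and the top hashLog bits are taken.
        static inline uint32_t hashMask(uint64_t sequence) {
            return (uint32_t)(((sequence & 0xFFFFFFFFULL) * kPrime) >> (64 - kHashLog));
        }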
    452 replies | 133221 view(s)
  • SolidComp's Avatar
    12th July 2020, 19:09
    Jarek, HEVC isn't unused. It's widely used. It's the format used for Ultra HD Blu-ray, which is awesome. It's also used in streaming 4K content on a lot of platforms, and is supported by most 4K TVs and streaming devices. HEVC is much better than VP9, for reasons I don't understand. So it won this round. It's also not clear that VVC will have the same licensing issues that HEVC had. The Wikipedia article isn't written in an encyclopedic tone, and there's no explanation behind the opinions expressed by whoever wrote that part.
    22 replies | 1071 view(s)
  • SolidComp's Avatar
    12th July 2020, 19:01
    How did they beat AV1 so handily? I'm interested in the social science and cognitive science of both software development and domain-specific areas like video compression technology and theory. How do you think they beat AV1? Who was responsible? Was it likely to be a single person driving the technological advances, a small team, or a large team? Did they have to spend a lot of money developing VVC, you think? Is it the kind of thing where they'd have to recruit brilliant software engineers and video compression experts and pay them huge salaries and bonuses? I mean, there's no equity opportunity in working for some industry forum or coalition, no stock options and so forth. It's not like working for a Silicon Valley company. I wonder how the talent acquisition works. And the management and direction.
    22 replies | 1071 view(s)
  • Sportman's Avatar
    12th July 2020, 19:01
    With these three tools you can disable most Windows background activities: https://www.oo-software.com/en/shutup10 https://www.w10privacy.de/english-home/ https://docs.microsoft.com/en-us/sysinternals/downloads/autoruns Run as admin and reboot after changes.
    3 replies | 147 view(s)
  • Darek's Avatar
    12th July 2020, 18:29
    Darek replied to a thread paq8px in Data Compression
    Welcome back mpais! :) One question - does the LSTM work with all data types or only with standard/text parts? OK, don't answer, I know - it works.
    1958 replies | 547694 view(s)
  • mpais's Avatar
    12th July 2020, 17:36
    mpais replied to a thread paq8px in Data Compression
    Changes:
    - New LSTM model, available with the option switch "L"
    This is based on the LSTM in cmix, by Byron Knoll. I wanted to play with it a bit, but testing with cmix is simply too slow, so I ported it to paq8px and tried to speed it up with AVX2 code. Since we're using SIMD floating point code, files created using the AVX2 code path and the normal code path are incompatible, as expected. In testing, the AVX2 version makes paq8px about 3.5x slower, so at least not as bad as cmix. Not sure that's much of a win...
    Note: The posted executable is based on v187fix5, which was the most up-to-date version when I was testing. The pull request on the official repository is based on 187fix7, Gotty's latest version, but something in that version broke the x86/64 transform, and I don't really have the time to check all the changes made between those versions.
    EDIT: Found the bug, Gotty seems to have mistakenly deleted a line in file filter/exe.hpp:
    uint64_t decode(File *in, File *out, FMode fMode, uint64_t size, uint64_t &diffFound) override {
      ...
      size -= VLICost(uint64_t(begin));
      ...
    }
    1958 replies | 547694 view(s)
  • DZgas's Avatar
    12th July 2020, 17:06
    DZgas replied to a thread JPEG XL vs. AVIF in Data Compression
    JpegXL at its maximum preset is very slow, the same as AVIF. For me they are both the same in quality and speed. But JpegXL cannot compress as strongly...
    27 replies | 1538 view(s)
  • compgt's Avatar
    12th July 2020, 17:02
    Unless they already have one, tech giants like Google, Microsoft, Facebook etc. should buy any breakthrough (fast) data compression algorithm that comes along, like one with a 90% to 98% compression ratio.
    4 replies | 120 view(s)
  • DZgas's Avatar
    12th July 2020, 16:45
    DZgas replied to a thread JPEG XL vs. AVIF in Data Compression
    With this quality (1 BPP) JpegXL is indistinguishable from AVIF.
    27 replies | 1538 view(s)
  • Jyrki Alakuijala's Avatar
    12th July 2020, 14:57
    What happens if you compress to 32 kB, i.e., 1 BPP? Currently internet compression averages at 2-3 BPP and cameras at 4-5 BPP. JPEG XL attempts to create a good result if you ask it to compress at distance 1.
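    (As a worked number, assuming a 512×512 test image such as the Lena scan mentioned elsewhere in this thread: 32 kB = 32 × 1024 × 8 = 262,144 bits, and 512 × 512 = 262,144 pixels, so 32 kB corresponds to exactly 1 bit per pixel.)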
    27 replies | 1538 view(s)
  • Jarek's Avatar
    12th July 2020, 13:38
    Here is a March 2020 benchmark paper: https://arxiv.org/abs/2003.10282 They claim ~25-30% coding gain over HEVC (5-10% for AV1) ... at a bit more than half the complexity of AV1 ... Looks good if fair ... but with even greater licensing issues than the nearly unused HEVC.
    22 replies | 1071 view(s)
  • paleski's Avatar
    12th July 2020, 13:23
    Apparently the 50% reduction is a rather optimistic (marketing) statement; maybe it can be reached only in some selected cases with subjectively perceived quality. A 30%-40% reduction sounds more realistic, at least for the first waves of encoders. Earlier this year bitmovin.com posted an article mentioning VVC test results compared to HEVC: similar PSNR values were achieved while reducing the required bandwidth by roughly 35%.
    22 replies | 1071 view(s)