Activity Stream

  • suryakandau@yahoo.co.id's Avatar
    Today, 14:34
    Just testing paq8px201 and paq8px201fix1 on the Silesia corpus using the -8lrta option. The results are:
    -8lrta     paq8px201   paq8px201fix1
    xml           253766          253142
    dickens      1899100         1898292
    mozilla      6925711         6910517
    mr           1988233         1987423
    ooffice      1298763         1297553
    reymont       751183          748527
    sao          3749141         3749666
    xray         3582991         3583354
    osdb         2027465         2027137
    samba        1672088         1669365
    nci           849920          843474
    webster      4657102         4646529
    Total       29655463        29614979
    diff: 40484 bytes
    2306 replies | 609586 view(s)
  • Shelwien's Avatar
    Today, 13:32
    It may look controversial to you, but that's actually the point of this contest for Marcus Hutter. He invented a theory about AI development called AIXI - https://en.wikipedia.org/wiki/AIXI - and then created this contest to motivate people to prove it with their work. Until now it didn't exactly go in the right direction - some technical problems of PC software certainly affect the results too much. But it did inspire the development of better prediction algorithms too (NNCP etc). And for now there are no better rules for an open contest like this - if you think it's easy to scam it, you can simply try. But better read the rules first.
    8 replies | 202 view(s)
  • Emil Enchev's Avatar
    Today, 13:19
    Are you all idiots here?! Maybe you want to use an AI that was trained on this specific enwik9 text too :)?! Using dictionaries which are created in advance is a SCAM. If the program does not compress other text files with a compression ratio approximating that of enwik9, the whole Hutter Prize loses all its significance as a means of stimulating compression research. As I said, it is absurd. P.S. OK, now you will see. I don't need to adjust dictionaries manually, because I know how to calculate them optimally.
    8 replies | 202 view(s)
  • Shelwien's Avatar
    Today, 13:13
    > What you mean under "MANUAL tuning and dictionary optimization"?
    See all of Alex's previous HP entries before phda: http://mattmahoney.net/dc/text.html#1440
    > Does this mean that the compressor can use ready-made dictionaries?
    Yes, and it does, but that's why program size is counted as part of compressed size for the contest.
    > If so, this award is completely absurd.
    Just read all the rules and pay more attention?
    > I can calculate in advance the optimal dictionary on very powerful machine, for example.
    Yes, you can, and yes, this option is already used in HP. But the size of the compressed dictionary is added to the final result (as part of the decoder).
    > This is a hidden calculation and I think it is a scam.
    No, the rules are clearly explained. The main problem is that Alex invested 15 years of work into this, and already included all the useful open-source parts, while not opening the sources of his contest entries. This obviously makes it very hard to compete, especially for new people, but it's not really impossible.
    > If you have option for MANUAL tuning, the situation is not much different,
    > as my manual interventions can be guided by this preliminary calculations too. WTF?
    If we already had a working AI, there'd be no point in this contest. As it is, it's a competition between programmers on the best design of a statistical model for predicting wiki data. Yes, it may be possible to detect optimal values of some parameters at runtime, but there's a time limit, which is already hard to meet as it is, and any extra optimization algorithms would make the compressor slower.
    8 replies | 202 view(s)
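To make the scoring rule discussed above concrete: the contest counts the size of the (zipped) decompressor, including any embedded dictionaries, together with the compressed output. A minimal C++17 sketch of that bookkeeping, with hypothetical file names (not part of the actual contest tooling):

    #include <cstdio>
    #include <filesystem>

    int main() {
        namespace fs = std::filesystem;
        // Hypothetical names: "enwik9.archive" is the compressed data,
        // "decomp.zip" is the zipped decompressor with any built-in dictionaries.
        unsigned long long total = fs::file_size("enwik9.archive")
                                 + fs::file_size("decomp.zip");
        std::printf("contest size: %llu bytes\n", total);
    }

So shipping a precomputed dictionary is allowed, but its bytes are paid for in the final score.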
  • Emil Enchev's Avatar
    Today, 12:56
    What do you mean by "MANUAL tuning and dictionary optimization"? Does this mean that the compressor can use ready-made dictionaries? If so, this award is completely absurd. I can calculate the optimal dictionary in advance on a very powerful machine, for example. This is a hidden calculation and I think it is a scam. If you have the option of MANUAL tuning, the situation is not much different, as my manual interventions can be guided by these preliminary calculations too. WTF?
    8 replies | 202 view(s)
  • Jon Sneyers's Avatar
    Today, 09:16
    No, spec writers don't receive royalties. The money when you buy a spec just goes to ISO, I suppose to pay its support staff...
    52 replies | 4368 view(s)
  • Shelwien's Avatar
    Today, 04:32
    "BC" (Block Compression) comes from DirectX naming of texture compression methods: https://en.wikipedia.org/wiki/S3_Texture_Compression#BC4_and_BC5 https://docs.microsoft.com/en-us/windows/win32/direct3d11/texture-block-compression-in-direct3d-11 BCPack is likely an incremental improvement of BCx formats, likely with stronger entropy coding. This could've been an inspiration: https://github.com/BinomialLLC/crunch https://github.com/BinomialLLC/basis_universal Keep in mind that these texture formats are _lossy_ - that's actually the main reason for BCPack's "superiority" when compared to lossless kraken. This guy seems to be the BCPack developer - https://twitter.com/JamesStanard/status/1241060776896417795
    1 replies | 77 view(s)
  • Shelwien's Avatar
    Today, 04:01
    > At that point, once the speed has been set, we can discuss the reduction in size.
    Unfortunately that's the reverse of how it actually works. Speed optimization is time-consuming, but also much more predictable than compression improvement. There are many known "bruteforce" methods for speed improvement - compiler tweaking, SIMD, MT, custom hardware. For example, these people claim 16GB/s LZMA compression (up to 256GB/s in a cluster): https://www.fungible.com/product/dpu-platform/
    But it's much harder to take an already existing algorithm and incrementally improve its compression ratio. To even start working on that, it's usually necessary to remove most of the existing speed optimizations from the code (manual inlining/unrolling, precalculated tables etc), and some algorithms simply can't be pushed further after some point (like LZ77 and huffman coding).
    Thus designing the algorithm for quality first (compression ratio in this case) is a much more reliable approach. The speed/quality tradeoff can be adjusted later, after reaching the maximum quality. Of course, the choices would still be affected by the minimum acceptable speed - depending on whether it's 1MB/s, 10MB/s, 100MB/s or 1000MB/s we'd have completely different choices (algorithm classes) to work with.
    Still, compromising on speed is the only option if we want to have better algorithms in the future. Speed optimizations can always be added later and better hardware is likely to appear, while better algorithms won't appear automatically - somebody has to design them and push the limits. It's just how it is - compression algorithms may have some parameters, but they never cover the full spectrum of use cases; it's simply impossible to push deflate to paq-level compression by allowing it to run slower - that requires a completely different algorithm (and underlying mathematical models).
    39 replies | 1490 view(s)
  • SolidComp's Avatar
    Today, 02:43
    Hi all – BCPack is apparently an all-new texture compression codec developed by Microsoft for the new Xbox Series X console. They seem to have implemented it in a hardware decompression chip mated to the SSD, much like Sony did with Kraken on the PlayStation 5. Does anyone know more about what BCPack is? What approaches and algorithms does it use? Who created it? (Is there a clue in the name? What would BC stand for in data compression circles?) Most of what I find online is from early 2020, before the release of the new Xbox, and with no details. This article from last year suggests that BCPack is better than Kraken. The more valid comparison from Oodle/Sony might be "Oodle Texture", which is supposed to be specialized, then followed by a Kraken step. Thanks.
    1 replies | 77 view(s)
  • e8c's Avatar
    Yesterday, 23:52
    Hi, https://mobile.twitter.com/jonsneyers
    Are IPSO (International PDF Selling Organization) members (or co-authors of JXL standard) receiving royalties for sold PDFs?
    https://www.iso.org/standard/77977.html
    https://www.iso.org/standard/80617.html
    52 replies | 4368 view(s)
  • Darek's Avatar
    Yesterday, 23:31
    Darek replied to a thread paq8px in Data Compression
    Thanks, I know. I didn't keep that table, so I can't change it now.
    2306 replies | 609586 view(s)
  • Lithium Flower's Avatar
    Yesterday, 22:50
    About the tiny ringing artefacts and noise issues with non-photographic images in VarDCT mode: in JPEG XL 0.2 VarDCT mode, using cjxl -d 0.8 -s 9 --epf=3 can reduce the tiny ringing artefacts, but I think -d 0.1 ~ 0.3 is needed to get a better result. For the noise issue, using cjxl -d 0.1 ~ 0.3 -s 9 can reduce the noise. For PNG pal8 lossless, cjxl -q 100 -s 9 -g 2 -E 3 -I 1 gives the best compression.
    52 replies | 4368 view(s)
  • suryakandau@yahoo.co.id's Avatar
    Yesterday, 22:42
    @Darek, I mean it should be paq8px with LSTM in your test set, not paq8sk.
    2306 replies | 609586 view(s)
  • Darek's Avatar
    Yesterday, 22:36
    Darek replied to a thread Paq8pxd dict in Data Compression
    enwik8/enwik9 scores for paq8pxd_v100 compared to paq8pxd_v95:
     15'642'246 - enwik8 -x15 -w -e1,english.dic by paq8pxd_v95, change: 0,00%, time 10'130,00s
    123'151'008 - enwik9 -x15 -w -e1,english.dic by paq8pxd_v95, change: 0,00%, time 102'009,55s
     15'582'810 - enwik8 -x15 -w -e1,english.dic by paq8pxd_v100, change: -0,37%, time 22'066,98s
    122'683'070 (estimated) - enwik9 -x15 -w -e1,english.dic by paq8pxd_v100, change: -0,37%, time about 220'000s
    1039 replies | 365734 view(s)
  • byronknoll's Avatar
    Yesterday, 19:47
    You don't need to wait very long :) Yes, I tested enwik9 on my computer and it is valid: the decompressed file matches enwik9.
    8 replies | 202 view(s)
  • fcorbelli's Avatar
    Yesterday, 19:43
    Ahem... those little machines cost many thousands of euros each. And they consume about 300W each, and require air conditioning 24/365 (in Italy we have neither gas, oil nor nuclear).
    Yes, I am. Because BEFORE trusting a new piece of software, a couple of years of testing is needed. You never run anything new right away. With new builds of the same software you run in parallel for months.
    -rw-r--r-- 1 root wheel 794036166617 Jan 26 19:17 fserver_condivisioni.zpaq
    -rw-r--r-- 1 root wheel 320194332144 Jan 26 19:22 fserver_condivisioni47.zpaq
    Those are two backups, one for zpaqfranz v11, one for zpaqfranz v47. Even if you wrote it yourself. Imagine losing all your money for days because ooops, there was corruption when restoring your bank's backup. It just can't happen.
    In part yes, I agree. I am not able to tell. Unfortunately I no longer have the age, and therefore the time, to devote myself to projects that would interest me. These are things I could have done 25 years ago. Unfortunately, much of my time today is devoted to... paying taxes.
    A new, more efficient algorithm built from scratch would certainly be interesting. An algorithm as fast as the actual transfer-media rate, say 500MB/s for 4 cores (which is typically how much you can use), even better. At that point, once the speed has been set, we can discuss the reduction in size. And the decompression speed, which must be decent. Because when you have a system hang, and you need to do a rollback, and your bank's account is frozen, you can't wait 12 hours for unzipping.
    The ideal program reads and writes at the same speed that the data is "pumped" by the IO subsystem (which can also be a 40Gb NIC). Just like pv. It would be a great relief to those who work in data storage.
    39 replies | 1490 view(s)
  • Shelwien's Avatar
    Yesterday, 18:35
    >> 1) Any compression algorithms would benefit from MT, when you can feed them different blocks.
    > In some cases separating the blocks considerably reduces efficiency.
    Sure, it depends on inter-block matches, and on whether blocks break the internal data structure, which could be important for format detection and/or recompression. But recompression (the forward transform) is usually slower than 50MB/s anyway, we have CDC dedup to take care of long inter-block matches, fast dictionary methods can be used to factor out inter-block matches, and none of the above is really relevant when we're dealing with TBs of data - we'd be able to compress 100GB blocks in parallel, and that would barely affect the CR at all, because only specialized dedup algorithms handle that scale - for normal codecs it's good if they can handle 1-2GB windows.
    > And those who use servers with 16 physical cores often will not have 1TB of data, but maybe 10. Or 100.
    As Bulat already said, that won't be unique data which has to be compressed with actual compression algorithms. Most of that data would be handled by the dedup module in any case, so the speed of the compression algorithm won't affect the overall performance that much.
    > But it is not exactly a common system, nor a requirement that seems realistic to me for new software
    As I already calculated, 80MB/s is enough to compress 1TB of new data every 4 hours in 1 thread. You'd only use 16 if you really want it to run faster for some reason.
    > When you need to have 600TB of backup space don't worry too much about saving 20%.
    > Just buy some other hard disks.
    Maybe, but if the question is - buy an extra 200TB of storage or use new free software (like your zpaq) to save 20% of space - are you that certain about the answer?
    > I point out, however, that the main backup software houses in the world for virtual systems do not think so.
    > Which compress a little, ~deflate for example.
    It's more about ignorance and fashion than anybody actually evaluating their choices. Most even use default zlib when there are many much faster optimized implementations of the same API. For example, did you properly evaluate zstd for your use case? (Not just default levels, but also --fast ones, dictionary modes, manual parameter setting.) If not, how can you say that zlib or LZ4 are better?
    > This does not mean that "we" are right,
    > but I expect a new software AT LEAST outperform the "old" ones.
    This is actually true in the zlib vs zstd case, and seems to be rather common for the goals of many other codecs too. But for me compression ratio is more interesting, so I made this thread about new algorithms with different properties, rather than about making zstd 2x faster via SIMD tricks.
    39 replies | 1490 view(s)
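The "CDC dedup" step mentioned in the reply above can be illustrated with a short sketch. This is a generic content-defined chunking routine, not code from any tool discussed in the thread; the gear table, minimum/average/maximum chunk sizes and the fixed seed are arbitrary choices for the example:

    #include <cstdint>
    #include <random>
    #include <vector>

    // Content-defined chunking: cut points depend only on local content, so an
    // insertion early in a file does not shift every later chunk boundary.
    static std::vector<size_t> cdc_boundaries(const std::vector<uint8_t>& data) {
        static uint64_t gear[256];
        static bool init = [] {                      // one pseudo-random value per byte
            std::mt19937_64 rng(42);                 // fixed seed: table must be stable
            for (auto& g : gear) g = rng();
            return true;
        }();
        (void)init;

        const size_t kMin = 2 << 10;                 // 2 KiB minimum chunk
        const uint64_t kAvgMask = (1u << 13) - 1;    // ~8 KiB average chunk
        const size_t kMax = 64 << 10;                // 64 KiB maximum chunk

        std::vector<size_t> cuts;
        uint64_t h = 0;
        size_t chunk_len = 0;
        for (size_t i = 0; i < data.size(); ++i) {
            h = (h << 1) + gear[data[i]];            // gear-style rolling hash
            ++chunk_len;
            if ((chunk_len >= kMin && (h & kAvgMask) == 0) || chunk_len >= kMax) {
                cuts.push_back(i + 1);               // cut after this byte
                h = 0;
                chunk_len = 0;
            }
        }
        if (chunk_len) cuts.push_back(data.size());
        return cuts;
    }

Each chunk would then be identified by a strong hash and stored once; only chunks not seen before are handed to the actual compressor, which is why long inter-block matches stop mattering for the codec itself.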
  • fcorbelli's Avatar
    Yesterday, 17:35
    I wouldn't be so assertive. In some cases separating the blocks considerably reduces efficiency. In others less.
    ~ linear, yes. But as mentioned it is not important "how". You will typically never have all those cores 100% available, because servers work 24 hours a day. You don't turn everything off to make backups: more resources, but not infinite.
    And those who use servers with 16 physical cores often will not have 1TB of data, but maybe 10. Or 100. I do. In this case, as I have already explained, you will typically use specialized compression machines that read the .zfs snapshot data (thus loading the server's IO subsystem, but not the CPU, and not very much, with NVMe's). For example systems with 16 physical cores (an AMD 3950X in my case). This gives you nearly 24 hours of backup time (100TB time scale). But it is not exactly a common system, nor a requirement that seems realistic to me for new software.
    When you need 600TB of backup space, don't worry too much about saving 20%. Indeed even 0% (no compression at all). Just buy some other hard disks.
    Certainly. I point out, however, that the main backup software houses in the world for virtual systems do not think so. They compress a little, ~deflate for example. This does not mean that "we" are right, but I expect new software to AT LEAST outperform the "old" ones.
    ===========
    I'd add a detail for the choice of the algorithm: advanced handling of large blocks of all-the-same data. When you export from vSphere, thin disks typically become thick, and are padded with zeros (actually it depends on the filesystem; for example this happens for Linux-based QNAP NAS, in sparse mode). So new software should efficiently handle the case where there are hundreds of gigabytes of empty blocks, often positioned at the end. It is a serious problem especially during the decompression phase (slowdown).
    Just my cent
    39 replies | 1490 view(s)
  • Shelwien's Avatar
    Yesterday, 17:00
    Just test it for yourself, or try posting this in https://groups.google.com/g/hutter-prize . Some people did test it (like Byron Knoll), but you'd have to wait very long for their reply here.
    8 replies | 202 view(s)
  • Shelwien's Avatar
    Yesterday, 16:53
    1) Any compression algorithm would benefit from MT, when you can feed it different blocks. Linear scaling is not realistic because of shared L3 and other resources, but something like 0.5*N_cores*ST_speed would still turn 50MB/s into 400MB/s at 16 cores, and that's not even the maximum on modern servers.
    2) If you have a scheduled backup every 4 hours, you don't really need the compressor to work faster than 80MB/s, so it may sometimes be profitable to use a slower codec, if it still fits the schedule and saves 20% of storage space.
    3) Nobody forces you to use slower compression algorithms than what you currently like. Compression algorithm development is simply interesting for some people, and some other people are getting paid for developing new compression algorithms.
    39 replies | 1490 view(s)
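The arithmetic behind points 1 and 2 above (the ~80MB/s figure for a 1TB / 4-hour window, and the rough 0.5 * cores scaling guess) is easy to check; the constants below are just the ones quoted in the thread:

    #include <cstdio>

    int main() {
        const double terabyte = 1099511627776.0;   // 2^40 bytes
        const double window_s = 4 * 3600.0;        // 4-hour backup window
        std::printf("required: %.1f MB/s\n", terabyte / window_s / 1e6);          // ~76 MB/s

        const double st_speed_mb = 50.0;           // assumed single-thread codec speed
        const int cores = 16;
        std::printf("rough MT estimate: %.0f MB/s\n", 0.5 * cores * st_speed_mb); // ~400 MB/s
        return 0;
    }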
  • Emil Enchev's Avatar
    Yesterday, 16:46
    So none of Alexander's claims was tested outside the Hutter Prize for enwik9?
    8 replies | 202 view(s)
  • fcorbelli's Avatar
    Yesterday, 16:33
    You are thinking at the "gigabyte scale". Look at your watch for 4 seconds. That's a GB at 250MB/s (very fast indeed). Now look at your watch for an hour (~3,600s), or 1,000x. That's the TB scale at 250MB/s.
    So every time you run a test, or think about a method or an algorithm, imagine it over a time 1,000 times longer than what you are used to, if your job is not to make backups every day for hundreds of servers. I think it is unlikely that you test on 1TB or maybe 50TB of data when developing such a program. I need to, because it's my job.
    Then you will begin to understand that everything else doesn't matter IF you don't have something really fast. But just so fast. I mean FAST. And how can it be that fast? Only if heavily multithreaded, of course. Single-core performance is completely irrelevant IF it does not scale ~linearly with size.
    And which compression algorithms (deduplication taken for granted) are so fast on multi-core systems, not because they are magical, but because they are highly parallel? Someone was offended, but it is simply factual.
    When in doubt, think about the terabyte scale with my Gedankenexperiment and everything will be clearer. That's 1TB, the size for the hobbyist. Then multiply by 10 or even 100, so by 10,000 or 100,000 vs the 4-seconds-per-GB time scale, and THEN choose the algorithm.
    Similarly for RAM consumption and all the points I have already written several times before.
    39 replies | 1490 view(s)
  • Shelwien's Avatar
    Yesterday, 16:30
    I verified phda's enwik8 compression last year (on linux) - it matched. Didn't actually check enwik9, since it would take a week, but I don't see a reason to suspect it. phda doesn't even provide the best ratio currently - just fits the HP constraints. See http://mattmahoney.net/dc/text.html - cmix and nncp have better enwik9 results. There's nothing especially unique either - just all the useful open-source models and years of manual tuning and dictionary optimization. Btw, you can download it here: http://qlic.altervista.org/phda9.zip But make sure to read the read_me.txt - cmdline options are different for different input files. Also don't try decoding it on windows.
    8 replies | 202 view(s)
  • Darek's Avatar
    Yesterday, 16:20
    Darek replied to a thread Paq8pxd dict in Data Compression
    Scores for paq8pxd v99 and paq8pxd v100 on 4 corpora. Very nice changes and good improvements over the latest version (v95):
    Calgary: paq8pxd v99 = 484 bytes gained over the previous version, paq8pxd v100 = 817 bytes gained over v99
    Canterbury: paq8pxd v99 = no gain over the previous version, paq8pxd v100 = 258 bytes gained over v99
    Maximum Compression: paq8pxd v99 = 3.4KB gained over the previous version, paq8pxd v100 = 40.9KB gained over v99 - good lstm improvement!
    Silesia: paq8pxd v99 = 150KB gained over the previous version (!!!), paq8pxd v100 = about 170KB gained over v99 - paq8pxd is close to breaking the 29'000'000 bytes score!
    And with the -xN option there is no longer such a bad impact on the "osdb" file.
    1039 replies | 365734 view(s)
  • fcorbelli's Avatar
    Yesterday, 16:18
    You're confirming my thesis. In my previous post there's a real ZPAQ-based example of a virtual Windows 2008 server with SQL Server for ERP software, on the filesystem:
    https://encode.su/threads/3559-Backup-compression-algorithm-recommendations?p=68383&viewfull=1#post68383
    39 replies | 1490 view(s)
  • Emil Enchev's Avatar
    Yesterday, 16:08
    Has anyone been able to successfully test Alexander Rhatushnyak's claims? I can't run his program on my computer, and frankly I doubt that he was able to compress the enwik9 file to the size he claims with the standard compression algorithms available. And when I talk about an external test, I mean a test outside the Hutter Prize founders Marcus Hutter, Jim Bowery and Matt Mahoney, who are interested parties.
    8 replies | 202 view(s)
  • suryakandau@yahoo.co.id's Avatar
    Yesterday, 15:57
    That is not paq8sk in your test set, Darek, it should be paq8px.
    2306 replies | 609586 view(s)
  • JamesB's Avatar
    Yesterday, 14:12
    Nice work. I think that's a bit smaller than the smallest CRAM I managed to produce (albeit *somewhat* slower!). I wonder what correlations it's discovering.
    41 replies | 2277 view(s)
  • lz77's Avatar
    Yesterday, 12:49
    > In your cartoon he looks like a rich-kid geek with hangers-on.
    For the "lost generations" of the 90s-2000s, that's probably how it should be. (Someone over on Facebook noted that it's sexist: a girl follows a boy around and writes down his aphorisms. And the skin-color balance isn't respected either: at least half of the characters should be black, so that's racism too.) To Soviet engineers this trio looks normal, and the humor is there. The cartoon has a running Stirlitz theme: Styopa feels that they are under KGB surveillance. The children use my aphorisms. Here is the original source of inspiration: http://kvant.mccme.ru/1978/09/rasstoyanie_stepy_moshkina.htm
    > Nowadays this is done quite quickly in 3D, including synthesized voice-over - try searching YouTube for "mmd".
    Animation is long, painstaking and expensive work. If any part of it is done quickly, the quality will be unwatchable...
    4 replies | 160 view(s)
  • RichSelian's Avatar
    Yesterday, 07:51
    I copied that code from libsnappy, if I remember correctly :)
    39 replies | 1490 view(s)
  • RichSelian's Avatar
    Yesterday, 07:45
    You can try some other ROLZ-based compressor like balz: https://sourceforge.net/projects/balz. ROLZ is more symmetric than LZ77 and still very fast.
    39 replies | 1490 view(s)
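For readers unfamiliar with the ROLZ (reduced-offset LZ) idea mentioned above, here is a hedged sketch of the core data structure, not balz's or libzling's actual implementation: match candidates are indexed by a small context (here the preceding byte), so a match is coded as a short slot number into that context's recent-position list instead of a full LZ77 offset.

    #include <cstdint>
    #include <utility>
    #include <vector>

    struct RolzIndex {
        static const int kSlots = 32;                    // slot index fits in 5 bits
        std::vector<uint32_t> table;                     // 256 contexts * kSlots positions
        std::vector<uint8_t> head;
        RolzIndex() : table(256 * kSlots, 0), head(256, 0) {}

        void add(uint8_t ctx, uint32_t pos) {            // call once per encoded position
            table[ctx * kSlots + head[ctx]] = pos;
            head[ctx] = uint8_t((head[ctx] + 1) % kSlots);
        }

        // Best match at 'pos' among this context's recent positions: (slot, length).
        std::pair<int, int> find(const uint8_t* buf, uint32_t pos, uint32_t end) const {
            uint8_t ctx = pos ? buf[pos - 1] : 0;
            int best_len = 0, best_slot = -1;
            for (int s = 0; s < kSlots; ++s) {
                uint32_t cand = table[ctx * kSlots + s];
                if (cand == 0 || cand >= pos) continue;
                int len = 0;
                while (pos + len < end && buf[cand + len] == buf[pos + len]) ++len;
                if (len > best_len) { best_len = len; best_slot = s; }
            }
            return {best_slot, best_len};
        }
    };

The decoder maintains the same index as it outputs bytes, so the 5-bit slot number is enough to recover the position; that shared bookkeeping is what makes ROLZ more symmetric than plain LZ77.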
  • Bulat Ziganshin's Avatar
    26th January 2021, 22:30
    Wrong. If you back up your disk every day, typically only a few percent of it has changed. So you may need a very fast dedup algorithm (or not, if you watch disk changes using the OS API), but compression speed is of less importance. And for decompression, you may just employ multiple servers.
    39 replies | 1490 view(s)
  • Mauro Vezzosi's Avatar
    26th January 2021, 21:45
    cmv c -m2,0,0x0ba36a7f coronavirus.fasta coronavirus.fasta.cmv
     845497 coronavirus.fasta.cmv (~1700 MiB, ~5 days, decompression not verified)
     218149 cmv.exe.zip (7-Zip 9.20)
    1063646 Total
    41 replies | 2277 view(s)
  • Mauro Vezzosi's Avatar
    26th January 2021, 21:43
    Damn bzip2, I have it in \MinGW_7.1\x86_64-7.1.0-release-win32-sjlj-rt_v5-rev1\mingw64\opt and I had to add it by hand:
    g++ paq8pxd.cpp -DWINDOWS -msse2 -O3 -s -static -lz -I"\MinGW_7.1\x86_64-7.1.0-release-win32-sjlj-rt_v5-rev1\mingw64\opt\include" -L"\MinGW_7.1\x86_64-7.1.0-release-win32-sjlj-rt_v5-rev1\mingw64\opt\lib" -lbz2.dll -o paq8pxd.exe
    I also had to copy "\MinGW_7.1\x86_64-7.1.0-release-win32-sjlj-rt_v5-rev1\mingw64\opt\bin\libbz2-1.dll" to the current directory because paq8pxd requires it.
    Were the fixes here?
    v99:
    13013 int inputs() {return 0;}
    13014 int nets() {return 0;}
    13015 int netcount() {return 0;}
    v100:
    13745 int inputs() {return 2+1+1;}
    13746 int nets() {return (horizon<<3)+7+1+8*256;}
    13747 int netcount() {return 1+1;}
    1039 replies | 365734 view(s)
  • Shelwien's Avatar
    26th January 2021, 19:31
    @fcorbelli:
    > Use whatever you want that can handle at least 1TB in half a night (4 hours).
    2**40/(4*60*60) = 76,354,974 bytes/s. It's not actually that fast, especially taking MT into account. I posted the requirements for a single thread of the algorithm, but of course the complete tool would be MT.
    39 replies | 1490 view(s)
  • Shelwien's Avatar
    26th January 2021, 19:24
    Unfortunately orz doesn't seem very helpful:
    73102219 11.797s 2.250s // orz.exe encode -l0 corpus_VDI 1
    72816656 12.062s 2.234s // orz.exe encode -l1 corpus_VDI 1
    72738286 12.422s 2.234s // orz.exe encode -l2 corpus_VDI 1
    53531928 87.406s 2.547s // lzma.exe e corpus_VDI 1 -d28 -fb273 -lc4
    59917669 27.125s 2.703s // lzma.exe e corpus_VDI 1 -a0 -d28 -fb16 -mc4 -lc0
    65114536 15.344s 2.860s // lzma.exe e corpus_VDI 1 -a0 -d24 -fb8 -mc1 -lc0
    65114536 11.532s 2.875s // lzma.exe e corpus_VDI 1 -a0 -d24 -fb8 -mc1 -lc0 -mfhc4
    39 replies | 1490 view(s)
  • kaitz's Avatar
    26th January 2021, 18:19
    kaitz replied to a thread Paq8pxd dict in Data Compression
    silesia
               paq8pxd v99   paq8pxd v100
               -s8           -s8             diff
    dickens        1895705       1895269       436
    mozilla        6917463       6910405      7058
    mr             1999233       1998160      1073
    nci             807857        801198      6659
    ooffice        1305484       1301817      3667
    osdb           2025419       2059676    -34257
    reymont         759011        758606       405
    samba          1680535       1676684      3851
    sao            3734168       3733871       297
    webster        4637776       4635525      2251
    x-ray          3575990       3577183     -1193
    xml             247545        246671       874
    Total         29586186      29595065     -8879
    v100 breaks osdb.
    1039 replies | 365734 view(s)
  • Shelwien's Avatar
    26th January 2021, 17:57
    > https://ru.wikipedia.org/wiki/Парадокс_Ябло
    There's very little on Wikipedia; I liked this article better: https://iep.utm.edu/yablo-pa/ And there are some interesting discussions here too: https://avva.livejournal.com/1159044.html
    > It's not Gödel, I came up with this proof when I learned about Yablo's paradox:
    I meant that you'd first have to prove the claim that any statement is either true or false. Which is not the case, since uncomputable statements exist. The link above, by the way, also mentions reducing Yablo's paradox to computability.
    > According to the backstory, Styopa Moshkin is an ordinary Soviet schoolboy of 1975-1980.
    In your cartoon he looks like a rich-kid geek with hangers-on.
    > At first I tried to animate it myself, but realized it would drag on for years.
    Nowadays this is done quite quickly in 3D, including synthesized voice-over - try searching YouTube for "mmd". Unfortunately, computer animation isn't very popular in our parts, but you could try looking at foreign universities, e.g. https://www.bachelorstudies.ru/Bakalavriat/%D0%90%D0%BD%D0%B8%D0%BC%D0%B0%D1%86%D0%B8%D1%8F/%D0%95%D0%B2%D1%80%D0%BE%D0%BF%D0%B0/ Or ask the authors of short films on YouTube - the niche isn't crowded, and if you give away the script for free, someone interested may well turn up. Or you could try doing it as text with illustrations on a site like https://author.today/
    4 replies | 160 view(s)
  • lz77's Avatar
    26th January 2021, 16:01
    It seems that in your C++ version of libzling you used my idea for finding the match length. :)
    39 replies | 1490 view(s)
  • RichSelian's Avatar
    26th January 2021, 15:15
    I rewrote libzling in Rust years ago (https://github.com/richox/orz) and libzling is no longer maintained. The compression ratio is now almost the same as LZMA for text data, but about 10x faster :)
    39 replies | 1490 view(s)
  • lz77's Avatar
    26th January 2021, 14:32
    > Can you please post the Delphi source?
    Pardon, I have other plans. I do not receive a salary from Google or from Facebook. I want to sell this algorithm and the sources (in C) as shareware. Perhaps students will be interested in it. Maybe in the next GDC this algorithm will help me win a prize. :)
    12 replies | 1243 view(s)
  • lz77's Avatar
    26th January 2021, 13:59
    It's not Gödel - I came up with this proof when I learned about Yablo's paradox:
    RU: https://ru.wikipedia.org/wiki/Парадокс_Ябло
    EN: https://en.wikipedia.org/wiki/Yablo's_paradox
    I believe it's not a paradox but a sophism. The main character is taken from old articles in the "Kvant" magazine (see kvant.info). I subscribed to that magazine when I was in school. According to the backstory, Styopa Moshkin is an ordinary Soviet schoolboy of 1975-1980. Indeed, modern viewers who watch Masha and the Bear find this irritating. But some people liked it: https://habr.com/ru/post/474426/ At first I tried to animate it myself, but realized it would drag on for years. The animation was done by a beginner animator (not an artist) from Kharkiv; it was his first cartoon. The cartoon is aimed at Soviet engineers, candidates and doctors of science, and all lovers of mathematics.
    4 replies | 160 view(s)
  • SolidComp's Avatar
    26th January 2021, 04:56
    Have you looked at libzling? It had an intriguing balance of ratio and speed as of 2015 or so, and it seemed like there was still headroom for improvement. The ratios weren't as good as LZMA, but it might be possible to get there. You mentioned the possibility of optimizing LZMA/2, which is what I was thinking. Have you seen Igor's changes in 7-Zip 21.00? I wonder if SIMD might have asymmetric implications for algorithm choices. AVX2 should be the new baseline going forward, and some algorithms might be differentially impacted.
    39 replies | 1490 view(s)
  • radames's Avatar
    26th January 2021, 01:40
    Nah, no worries, just post as you like. I'm letting my account stay unused now. I can see why the community archiver Fairytale just stopped being developed :( Bye everyone!
    39 replies | 1490 view(s)
  • fcorbelli's Avatar
    26th January 2021, 01:37
    Just because the use case is not "mine", not "yours". A virtual disk is typically made of hundreds of gigabytes. A typical virtual backup can be multiple terabytes long. Do you agree on this use case? Because if your virtual disk is 100MB, then you are right. Your use case != mine.
    Short version: everything you want to use must take into account the cardinality, space and time needed. It doesn't matter if you use X, Y or Z. Use whatever you want that can handle at least 1TB in half a night (4 hours). Preprocess, deduplication, etc, whatever. The real limit is time, not efficiency (space).
    This is not the Hutter Prize. This is not a 1GB challenge. This is a minimum-1TB challenge. I hope this is clear. Anyway I will not post anymore.
    39 replies | 1490 view(s)
  • radames's Avatar
    26th January 2021, 00:53
    Not sure about the reason for being so negative on a site where people share thoughts, but whatever :rolleyes: your use case != my use case, short and polite.
    ZPAQ: -method 4: LZ77+CM, BWT or CM
    https://github.com/moinakg/pcompress#pcompress - support for multiple algorithms like LZMA, Bzip2, PPMD, etc, with SKEIN/SHA checksums for data integrity.
    Let the user decide or use detection. A single magic compression doesn't exist, and my idea with stream detection was okay.
    39 replies | 1490 view(s)
  • fcorbelli's Avatar
    26th January 2021, 00:05
    Can you please post the Delphi source?
    12 replies | 1243 view(s)
  • Shelwien's Avatar
    25th January 2021, 22:50
    So you're just saying that it's not acceptable for you, and that's ok. But it doesn't provide an answer to my question - which compression algorithms fit the constraints. Also:
    1) Decoding is not strictly necessary for verification. Instead we can a) prove that our compression algorithm is always reversible, and b) add ECC to recover from hardware errors (at that scale it would be necessary in any case, even with verification by decoding).
    2) LZ4 and deflate (pigz) are far from the best compression algorithms in any case. Even if you like these specific algorithms/formats, there are still known ways to improve their compression without speed loss. And then there are different algorithms that may be viable on different hardware, in the future, etc. You can't just say that perfect compression algorithms already exist and there's no need to think about further improvements.
    39 replies | 1490 view(s)
  • Shelwien's Avatar
    25th January 2021, 22:35
    Is it a Gödel thing?.. Anyway, the characters are rather unique, but unlikable and rather painful to watch. Is that guy the son of a nouveau riche? Btw, the largest integer exists, and it's called INT_MAX.
    4 replies | 160 view(s)
  • fcorbelli's Avatar
    25th January 2021, 22:25
    In fact, no. You will always decode. Every single time. Because you need to verify (or check). Unless you use, as I wrote, a "block checksumming" approach. In that case you do not need to extract the data at all (to verify). And you can never use a 4 or even 20MB/s algorithm.
    I attach an example of a "real world" virtual machine disk backup: about 151TB stored in 437MB. To "handle" this kind of file you need the fastest algorithm, not the one with the most compression. That's what I'm trying to explain: the faster and lighter, the better. So LZ4 or PIGZ, AFAIK.
    39 replies | 1490 view(s)
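A minimal sketch of the "block checksumming" verification idea described in the post above (not zpaqfranz's actual code): record a CRC-32 of every stored block when the archive is written, then verify later by re-reading the archive and re-hashing each stored block, with no decompression involved. It uses zlib's crc32(); the Block struct is only for illustration.

    #include <cstdint>
    #include <vector>
    #include <zlib.h>    // crc32()

    struct Block {
        uint32_t stored_crc;            // CRC-32 recorded when the block was written
        std::vector<uint8_t> bytes;     // block as re-read from the archive on disk
    };

    // Count blocks whose on-disk content no longer matches the recorded checksum.
    size_t verify_blocks(const std::vector<Block>& blocks) {
        size_t bad = 0;
        for (const auto& b : blocks) {
            uLong crc = crc32(0L, Z_NULL, 0);
            crc = crc32(crc, b.bytes.data(), static_cast<uInt>(b.bytes.size()));
            if (static_cast<uint32_t>(crc) != b.stored_crc) ++bad;
        }
        return bad;
    }

Keeping checksums of both the compressed and the uncompressed form (as mentioned later in the thread) additionally lets a restore be spot-checked without comparing against the original source.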
  • Shelwien's Avatar
    25th January 2021, 21:30
    @fcorbelli: What you say is correct, but completely unrelated to the topic here. Current compression algorithms are not perfect yet. For example, zstd design and development focus on speed, while lzma provides 10% better compression and could be made much faster if somebody redesigned and optimized it for modern CPUs.
    Then, CM algorithms are slow, but can provide 20-30% better compression than zstd, and CM encoding can actually be much faster than LZ with parsing optimization - maybe even the requested 50MB/s. But decoding would currently still be around 4MB/s or so. So in some discussion I mentioned that such algorithms with "reverse asymmetry" can be useful in backup, because decoding is relatively rare there. And after a while I got feedback from actual backup developers with the codec performance constraints that they're willing to accept.
    Problem is, it would be very hard to push CM to reach 20MB/s decoding, because it's mostly determined by L3 latency. But it may still be possible, and there are other potential paths to the same goal - basically with all the usual classes of compression algorithms. So I want to know which way would be easiest.
    39 replies | 1490 view(s)
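To illustrate why CM decoding speed is bounded by L3 latency rather than arithmetic, here is a generic sketch of the per-bit state of a context-model table (not Shelwien's code; the table size and adaptation rate are arbitrary): every bit needs a load from a table much larger than L2, and the next context depends on the bit just decoded, so the loads cannot be overlapped.

    #include <cstdint>
    #include <vector>

    struct BitModel {
        std::vector<uint16_t> p;                    // probabilities scaled to [0, 4096)
        BitModel() : p(1u << 22, 2048) {}           // 8 MB table, well past L2

        int predict(uint32_t ctx) const { return p[ctx & (p.size() - 1)]; }

        void update(uint32_t ctx, int bit) {
            uint16_t& pr = p[ctx & (p.size() - 1)];
            if (bit) pr += (4096 - pr) >> 5;        // adapt toward the observed bit
            else     pr -= pr >> 5;
        }
    };

    // Decode loop shape (coder itself omitted): predict -> decode one bit ->
    // update -> fold the bit into the context hash. With roughly 40 ns per
    // out-of-cache lookup and 8+ lookups per byte, throughput lands in the
    // single-digit MB/s range regardless of how cheap the arithmetic is.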
  • fcorbelli's Avatar
    25th January 2021, 21:03
    It doesn't matter. You need something that deduplicates BUT that allows you to do many other things, such as verifying files without decompressing. If you have to store 300GB or 330GB it really makes no difference (huge from the point of view of software performance, irrelevant in use).
    It doesn't matter. It doesn't matter at all. It doesn't matter. You will find anything: images, executables, database files, videos, Word and Excel documents, ZIP, RAR, 7z.
    In my previous post:
    0) versioning "a-la-Time-Machine"
    1) deduplication
    2) highly parallelizable compression
    3) RAM consumption
    4) works with really large files
    5) decompression which does NOT seek (if possible)
    6) an advanced and fast copy verification mechanism WITHOUT decompressing, if possible
    7) easy portability between Windows-Linux-*nix systems
    8) append-only format
    9) reliability, reliability, reliability. No software "chains", where bugs and limitations can add up.
    This is, in fact, a patched zpaqfranz with a fast LZ4 -1 compressor/decompressor. Or zpaqfranz running on a ZFS data storage system, with embedded lz4. On the development side, a block-chunked format, with compressed AND uncompressed hash AND uncompressed CRC-32.
    39 replies | 1490 view(s)
  • lz77's Avatar
    25th January 2021, 19:03
    Watch it on my YouTube channel: https://www.youtube.com/watch?v=y0d5vniO2vk The intrigue of the 1st episode: where is the bug in the proof of the Great Theorem? Maybe someone will translate the cartoon's script into English... If anyone is interested, I will translate the summary into English using Google Translate.
    4 replies | 160 view(s)
  • Shelwien's Avatar
    25th January 2021, 18:56
    Maybe increase the window size for zstd? The "-1" default is 512KB. Also test a newer zstd? They made significant speed improvements in 1.4.5, and an extra 5% in 1.4.7 too.
    12 replies | 1243 view(s)
  • lz77's Avatar
    25th January 2021, 18:22
    Wow, I guess I'm finishing writing (for now in Delphi 7) a not-bad simple pure LZ77-type compressor called notbadlz. :-) I'm trying to beat zstd at fast levels for sporting interest. I set a 128K input buffer and 16K cells in my compressor; here are the results (zstd-v1.4.4-win64):
    A:\>timer.exe zstd.exe --fast -f --no-progress --single-thread enwik8
    Timer 3.01 Copyright (c) 2002-2003 Igor Pavlov 2003-07-10
    enwik8 : 51.82% (100000000 => 51817322 bytes, enwik8.zst)
    Kernel Time = 0.062 = 00:00:00.062 = 7%
    User Time = 0.781 = 00:00:00.781 = 90%
    Process Time = 0.843 = 00:00:00.843 = 98%
    Global Time = 0.859 = 00:00:00.859 = 100%
    ***
    A:\>timer.exe notbadlz.exe
    Timer 3.01 Copyright (c) 2002-2003 Igor Pavlov 2003-07-10
    Source size 100000000, packed size: 47086177 ratio 0.47086
    Done in 1.35210
    Kernel Time = 0.078 = 00:00:00.078 = 5%
    User Time = 1.359 = 00:00:01.359 = 93%
    Process Time = 1.437 = 00:00:01.437 = 98%
    Global Time = 1.453 = 00:00:01.453 = 100%
    ***
    A:\>timer.exe zstd.exe -d -f --no-progress --single-thread enwik8.zst -o enwik
    Timer 3.01 Copyright (c) 2002-2003 Igor Pavlov 2003-07-10
    enwik8.zst : 100000000 bytes
    Kernel Time = 0.109 = 00:00:00.109 = 24%
    User Time = 0.203 = 00:00:00.203 = 44%
    Process Time = 0.312 = 00:00:00.312 = 68%
    Global Time = 0.453 = 00:00:00.453 = 100%
    A:\>timer notbadlzunp.exe
    Timer 3.01 Copyright (c) 2002-2003 Igor Pavlov 2003-07-10
    Compressed size: 47086177
    Decompressed size: 100000000
    Done in 0.52653 sec.
    Kernel Time = 0.093 = 00:00:00.093 = 12%
    User Time = 0.515 = 00:00:00.515 = 68%
    Process Time = 0.609 = 00:00:00.609 = 81%
    Global Time = 0.750 = 00:00:00.750 = 100%
    With 128K cells and an input buffer size of 256 MB, notbadlz compresses enwik8 better than zstd -1. zstd does it faster, but I still have asm64, prefetch etc. in stock. Now I'm wondering whether it's worth porting my program to FASM...
    12 replies | 1243 view(s)
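Since the post above (and the zstd comparison) revolves around a small hash table of "cells" for match finding, here is a hedged C++ sketch of that general structure; notbadlz itself is Delphi and its internals are not shown in the thread, so the hash function, cell count and greedy single-probe policy below are only illustrative:

    #include <cstdint>
    #include <cstring>
    #include <utility>
    #include <vector>

    struct MatchFinder {
        static const uint32_t kCells = 16 * 1024;   // 16K cells, as in the post
        std::vector<uint32_t> cell;                 // last position seen per hash
        MatchFinder() : cell(kCells, 0) {}

        static uint32_t hash4(const uint8_t* p) {
            uint32_t v;
            std::memcpy(&v, p, 4);
            return (v * 2654435761u) >> (32 - 14);  // 14-bit hash -> 16K cells
        }

        // Returns (offset, length); length 0 means "emit a literal".
        std::pair<uint32_t, uint32_t> find(const uint8_t* buf, uint32_t pos, uint32_t end) {
            if (pos + 4 > end) return {0, 0};
            uint32_t h = hash4(buf + pos);
            uint32_t cand = cell[h];
            cell[h] = pos;                          // update the table as we scan
            if (cand == 0 || cand >= pos) return {0, 0};
            if (std::memcmp(buf + cand, buf + pos, 4) != 0) return {0, 0};
            uint32_t len = 4;
            while (pos + len < end && buf[cand + len] == buf[pos + len]) ++len;
            return {pos - cand, len};
        }
    };

More cells and a larger buffer find more and longer matches (hence the better ratio reported with 128K cells and a 256 MB buffer), at the cost of more cache misses, which is one reason such tradeoffs show up against zstd's defaults.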
  • pklat's Avatar
    25th January 2021, 16:44
    IIRC, vbox has an option to keep the virtual disk and save the diff in a separate file. Dunno about others.
    39 replies | 1490 view(s)
  • radames's Avatar
    25th January 2021, 16:18
    This is where I agree with Eugene. This is where I agree with Franco. My tests were about which tool does what best. Zpaq cannot deduplicate as well as SREP. LZMA2 is fast enough with a good ratio to compress an SREP file. Integrated exe/dll preprocessing is important as well, as Eugene said.
    Will we need precomp? What streams do we find on a Windows OS VM? ZIP/CAB (non-deflate like LZX/Quantum) files? Are the latter covered? What about ELF/so and *NIX operating systems? Those are important for VMs and servers as well.
    What are the priorities, and in what order? Multithreading >>> Deduplication/Journaling >> Recompress popular streams > Ratio (controlled by switch). What entropy coder really makes a difference?
    Franco made an excellent point about transfer speeds (ratio will matter for your HDD storage space and a 10Gbit/s net). Your network and disk speeds are almost as important as your total threads and RAM.
    I am just here because I am interested in what you might code. Eugene, just please don't make it too difficult or perfect, or it may never be finished.
    39 replies | 1490 view(s)
  • suryakandau@yahoo.co.id's Avatar
    25th January 2021, 15:13
    Using the paq8sk44 -s8 option on f.jpg (DBA corpus), the result is: Total 112038 bytes compressed to 80194 bytes. Time 19.17 sec, used 2444 MB (2563212985 bytes) of memory.
    1039 replies | 365734 view(s)
  • fcorbelli's Avatar
    25th January 2021, 14:23
    Now change the image a little, just like a real VM does. Redo the backup. How much space is needed to retain both today's and yesterday's backups? How much time and RAM is needed to verify those backups? Today's and yesterday's.
    PS: my shitty Korean smartphone does not like English at all.
    39 replies | 1490 view(s)
  • fcorbelli's Avatar
    25th January 2021, 14:20
    Ahem... no. Srep is not certain enough to use. The Nanozip source is not available. And those are NOT versioned backups. VMs are simply too big to keep as separate files. If you have a 500GB thick disk that becomes a 300GB compressed file today, what will you do tomorrow? Another 300GB? For a month-long backup retention policy, where do you put 300GB x 30 = 9TB for just a single VM? How long will it take to transfer 300GB over the LAN? How long will it take to verify 300GB over the LAN? Almost everything is good for a 100MB file. Or 30GB. But for 10TB of a typical vSphere server?
    39 replies | 1490 view(s)
  • Darek's Avatar
    25th January 2021, 13:58
    Darek replied to a thread Paq8pxd dict in Data Compression
    Scores for paq8pxd v99 and paq8pxd v100 on my test set. The paq8pxd v99 version is about 15KB better than paq8pxd v95. The paq8pxd v100 version is about 35KB better than paq8pxd v99 - mainly due to the lstm implementation, however it's not as big a gain as in the paq8px version (about 95KB between the non-lstm and lstm versions).
    Timings for the paq8pxd v99, paq8pxd v100 and paq8px v200 (-l) versions:
    paq8pxd v99 = 5'460,32s
    paq8pxd v100 = 11'610,80s = 2.1 times slower - it's still about 1.7 times faster than paq8px
    paq8px v200 = 19'440,11s
    1039 replies | 365734 view(s)
  • Shelwien's Avatar
    25th January 2021, 12:32
    1) There are probably better options for zstd, like a lower level (-9?) and --long=31, or explicit settings of wlog/hlog via --zstd=wlog=31.
    2) LZMA2 certainly isn't the best option; there are at least RZ and nanozip.
    3) zstd doesn't have integrated exe preprocessing, while zpaq and 7z do - I'd suggest testing zstd with the output of "7z a -mf=bcj2 -mm=copy".
    39 replies | 1490 view(s)
  • radames's Avatar
    25th January 2021, 12:11
    NTFS image:
    38.7 GiB - ntfs-ptcl-img (raw)
    Simple dedupe:
    25.7 GiB - ntfs-ptcl-img.srep (m3f) -- 22 minutes
    Single-step:
    14.0 GiB - ntfs-ptcl-img.zpaq (method 3) -- 31 minutes
    13.0 GiB - ntfs-ptcl-img.zpaq (method 4) -- 69 minutes
    Chained:
    12.7 GiB - ntfs-ptcl-img.srep.zst (-19) -- hours
    11.9 GiB - ntfs-ptcl-img.srep.7z (ultra) -- 21 minutes
    11.8 GiB - ntfs-ptcl-img.srep.zpaq (method 4) -- 60 minutes
    2700X, 32 GB RAM; times are for the respective step, not cumulative.
    I think there is no magic archiver for VM images yet, just good old SREP+LZMA2.
    39 replies | 1490 view(s)
  • Jyrki Alakuijala's Avatar
    25th January 2021, 12:03
    Lode Vandevenne (one of the authors of JPEG XL) hacked up a tool called grittibanzli in 2018. It recompresses gzip streams using more efficient methods (brotli) and can reconstruct the exact same gzip bit stream back. For PNGs you can get an ~10% denser representation while being able to recover the original bit-exactly. I don't think people should be using this when there are other options like just using stronger formats for pixel-exact lossless coding: PNG recompression tools (like Pingo or ZopfliPNG), br-content-encoded uncompressed but filtered PNGs, WebP lossless, and JPEG XL lossless.
    52 replies | 4368 view(s)
  • Jon Sneyers's Avatar
    24th January 2021, 22:42
    The thing is, it's not just libpng that you need to update. Lots of software doesn't use libpng, but just statically links some simple png decoder like lodepng. Getting all png-decoding software upgraded is way harder than just updating libpng and waiting long enough. I don't think it's a substantially easier task than getting them to support, say, jxl.
    52 replies | 4368 view(s)
  • e8c's Avatar
    24th January 2021, 22:22
    Sounds like "... not everything will use the latest version of Linux Kernel. Revising or adding features to an already deployed Linux is just as hard as introducing a new Operating System."
    52 replies | 4368 view(s)
  • Jon Sneyers's Avatar
    24th January 2021, 22:04
    I guess you could do that, but it would be a new kind of PNG that wouldn't work anywhere. Not everything uses libpng, and not everything will use the latest version. Revising or adding features to an already deployed format is just as hard as introducing a new format.
    52 replies | 4368 view(s)
  • Shelwien's Avatar
    24th January 2021, 18:36
    0) -1 is not the fastest mode (the --fast=# ones are faster). It seems that --fast disables huffman coding for literals, but still keeps FSE for matches.
    1) Yes, huffman for literals, FSE for matches - or nothing if the block is incompressible.
    2) No. Certainly no preprocessing, but maybe you'd say that the "--adapt" mode is related.
    3) https://github.com/facebook/zstd/blob/dev/lib/compress/zstd_compress.c#L5003 - 16k cells
    4) The library works with user buffers of whatever size; zstdcli seems to load 128KB blocks with fread. https://github.com/facebook/zstd/blob/69085db61c134192541208826c6d9dcf64f93fcf/lib/zstd.h#L108
    12 replies | 1243 view(s)
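For reference, the window and hash sizing discussed in this sub-thread can also be set programmatically through zstd's advanced API instead of CLI switches; a minimal sketch (error handling trimmed, the particular values are only examples):

    #include <vector>
    #include <zstd.h>

    // Compress a buffer with explicit window/hash sizing via the advanced API.
    std::vector<char> compress_buf(const std::vector<char>& in) {
        ZSTD_CCtx* cctx = ZSTD_createCCtx();
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 1);
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_windowLog, 24);   // 16 MB window instead of the level default
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_hashLog, 17);     // 128K hash cells
        std::vector<char> out(ZSTD_compressBound(in.size()));
        size_t n = ZSTD_compress2(cctx, out.data(), out.size(), in.data(), in.size());
        out.resize(ZSTD_isError(n) ? 0 : n);
        ZSTD_freeCCtx(cctx);
        return out;
    }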
  • moisesmcardona's Avatar
    24th January 2021, 17:21
    moisesmcardona replied to a thread paq8px in Data Compression
    As we said, we will not be reviewing code that is not submitted via Git.
    2306 replies | 609586 view(s)
  • e8c's Avatar
    24th January 2021, 17:12
    (Khmm ...) "this feature in ... PNG" means "this feature in ... LibPNG": transcoding JPG -> PNG, with the resulting PNG smaller than the original JPG. (Just for clarity.)
    52 replies | 4368 view(s)