Activity Stream

  • RichSelian's Avatar
    Today, 07:51
    I copied that code from libsnappy, if I'm not mistaken :)
    32 replies | 1237 view(s)
  • RichSelian's Avatar
    Today, 07:45
    You can try another ROLZ-based compressor like balz (https://sourceforge.net/projects/balz). ROLZ is more symmetric than LZ77 and still very fast (a minimal sketch of the idea follows below).
    32 replies | 1237 view(s)
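    A minimal sketch of the ROLZ idea referenced above: instead of coding raw LZ77 offsets, the encoder keeps a small per-context table of recent positions and codes only an index into it, which keeps the match search cheap and makes encoding and decoding more symmetric. This is a hypothetical illustration of the match-finding side only, not balz's or libzling's actual code; the table size and the previous-byte context are arbitrary choices.

    #include <cstdint>
    #include <cstdio>
    #include <cstring>
    #include <utility>

    // Toy ROLZ match finder: offsets are replaced by small indices into a
    // per-context table of recent positions.
    struct RolzIndex {
        static const int SLOTS = 16;            // recent positions kept per context
        uint32_t table[256][SLOTS];             // context = previous byte
        int      count[256];

        RolzIndex() { std::memset(table, 0, sizeof(table)); std::memset(count, 0, sizeof(count)); }

        // Find the longest match for data[pos..]; returns (slot, length).
        // A decoder can maintain the same table, so only the slot index
        // (4 bits here) and the length need to be coded -- no raw offset.
        std::pair<int,int> find(const uint8_t* data, size_t pos, size_t size) {
            int ctx = pos ? data[pos - 1] : 0;
            int bestSlot = -1, bestLen = 0;
            int n = count[ctx] < SLOTS ? count[ctx] : SLOTS;
            for (int i = 0; i < n; i++) {
                uint32_t cand = table[ctx][i];
                int len = 0;
                while (pos + len < size && data[cand + len] == data[pos + len]) len++;
                if (len > bestLen) { bestLen = len; bestSlot = i; }
            }
            table[ctx][count[ctx]++ % SLOTS] = (uint32_t)pos;   // remember current position
            return {bestSlot, bestLen};
        }
    };

    int main() {
        const char* s = "abcabcabcabc";
        size_t size = std::strlen(s);
        RolzIndex idx;
        for (size_t i = 0; i < size; ) {        // a real ROLZ also updates the table inside matches
            std::pair<int,int> m = idx.find((const uint8_t*)s, i, size);
            if (m.second >= 3) { std::printf("pos %zu: slot %d len %d\n", i, m.first, m.second); i += m.second; }
            else               { std::printf("pos %zu: literal '%c'\n", i, s[i]); i += 1; }
        }
        return 0;
    }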
  • Bulat Ziganshin's Avatar
    Yesterday, 22:30
    Wrong. If you back up your disk every day, typically only a few percent of it changes. So you may need a very fast dedup algorithm (or not, if you watch disk changes via an OS API), but compression speed is of less importance (a minimal chunking sketch follows below). And for decompression, you can just employ multiple servers.
    32 replies | 1237 view(s)
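    A minimal sketch of the fast-dedup idea above, using content-defined chunking with a gear-style rolling hash: identical regions produce identical chunk boundaries even after insertions, so unchanged parts of a daily disk image dedup away. The mask, chunk-size bounds and fixed seed are arbitrary illustration values, not any particular tool's parameters; a real backup tool would then hash each chunk (e.g. SHA-256) to detect duplicates.

    #include <cstdint>
    #include <cstdio>
    #include <random>
    #include <vector>

    struct GearChunker {
        uint64_t gear[256];
        uint64_t mask;                 // controls the average chunk size (~64 KiB here)
        size_t   minSize, maxSize;

        GearChunker(uint64_t m = (1u << 16) - 1, size_t mn = 8192, size_t mx = 262144)
            : mask(m), minSize(mn), maxSize(mx) {
            std::mt19937_64 rng(42);   // fixed seed: all backups must agree on the table
            for (auto& g : gear) g = rng();
        }

        // Returns chunk end offsets: cut where the rolling hash hits the mask.
        std::vector<size_t> split(const uint8_t* data, size_t size) const {
            std::vector<size_t> cuts;
            uint64_t h = 0; size_t start = 0;
            for (size_t i = 0; i < size; i++) {
                h = (h << 1) + gear[data[i]];
                size_t len = i - start + 1;
                if ((len >= minSize && (h & mask) == 0) || len >= maxSize) {
                    cuts.push_back(i + 1); start = i + 1; h = 0;
                }
            }
            if (start < size) cuts.push_back(size);
            return cuts;
        }
    };

    int main() {
        std::vector<uint8_t> buf(1 << 20);
        std::mt19937 rng(1);
        for (auto& b : buf) b = (uint8_t)rng();   // stand-in for disk image data
        GearChunker ck;
        std::vector<size_t> cuts = ck.split(buf.data(), buf.size());
        std::printf("%zu chunks, first ends at %zu\n", cuts.size(), cuts[0]);
        return 0;
    }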
  • Mauro Vezzosi's Avatar
    Yesterday, 21:45
    cmv c -m2,0,0x0ba36a7f coronavirus.fasta coronavirus.fasta.cmv
    845497 coronavirus.fasta.cmv (~1700 MiB, ~5 days, decompression not verified)
    218149 cmv.exe.zip (7-Zip 9.20)
    1063646 Total
    40 replies | 2201 view(s)
  • Mauro Vezzosi's Avatar
    Yesterday, 21:43
    Damn bzip2, I have it in \MinGW_7.1\x86_64-7.1.0-release-win32-sjlj-rt_v5-rev1\mingw64\opt and I had to add it by hand:
    g++ paq8pxd.cpp -DWINDOWS -msse2 -O3 -s -static -lz -I"\MinGW_7.1\x86_64-7.1.0-release-win32-sjlj-rt_v5-rev1\mingw64\opt\include" -L"\MinGW_7.1\x86_64-7.1.0-release-win32-sjlj-rt_v5-rev1\mingw64\opt\lib" -lbz2.dll -o paq8pxd.exe
    and also copy "\MinGW_7.1\x86_64-7.1.0-release-win32-sjlj-rt_v5-rev1\mingw64\opt\bin\libbz2-1.dll" to the current directory, because paq8pxd requires it.
    Were the fixes here?
    v99:  13013 int inputs() {return 0;}
          13014 int nets() {return 0;}
          13015 int netcount() {return 0;}
    v100: 13745 int inputs() {return 2+1+1;}
          13746 int nets() {return (horizon<<3)+7+1+8*256;}
          13747 int netcount() {return 1+1;}
    1037 replies | 365220 view(s)
  • Shelwien's Avatar
    Yesterday, 19:31
    @fcorbelli:
    > Use whatever you want that can handle at least 1TB in half a night (4 hours).
    2**40/(4*60*60) = 76,354,974 bytes/s. That's not actually that fast, especially taking MT into account (a small arithmetic check follows below). I posted the requirements for a single thread of the algorithm, but of course the complete tool would be MT.
    32 replies | 1237 view(s)
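    A small arithmetic check of the throughput figure above; the 16-thread count is only an assumed example for the MT point, not a number from the post.

    #include <cstdio>

    int main() {
        const double bytes   = 1ull << 40;       // 1 TiB
        const double seconds = 4.0 * 60 * 60;    // "half a night"
        const double total   = bytes / seconds;  // required aggregate speed
        const int    threads = 16;               // assumed core count, for illustration only
        std::printf("aggregate: %.0f bytes/s (~%.1f MiB/s)\n", total, total / (1 << 20));
        std::printf("per thread with %d threads: ~%.1f MiB/s\n", threads, total / threads / (1 << 20));
        return 0;
    }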
  • Shelwien's Avatar
    Yesterday, 19:24
    Unfortunately orz doesn't seem very helpful:
    73102219 11.797s 2.250s // orz.exe encode -l0 corpus_VDI 1
    72816656 12.062s 2.234s // orz.exe encode -l1 corpus_VDI 1
    72738286 12.422s 2.234s // orz.exe encode -l2 corpus_VDI 1
    53531928 87.406s 2.547s // lzma.exe e corpus_VDI 1 -d28 -fb273 -lc4
    59917669 27.125s 2.703s // lzma.exe e corpus_VDI 1 -a0 -d28 -fb16 -mc4 -lc0
    65114536 15.344s 2.860s // lzma.exe e corpus_VDI 1 -a0 -d24 -fb8 -mc1 -lc0
    65114536 11.532s 2.875s // lzma.exe e corpus_VDI 1 -a0 -d24 -fb8 -mc1 -lc0 -mfhc4
    32 replies | 1237 view(s)
  • kaitz's Avatar
    Yesterday, 18:19
    kaitz replied to a thread Paq8pxd dict in Data Compression
    silesia, paq8pxd v99 vs v100 (both -s8):
               v99 -s8   v100 -s8      diff
    dickens    1895705    1895269       436
    mozilla    6917463    6910405      7058
    mr         1999233    1998160      1073
    nci         807857     801198      6659
    ooffice    1305484    1301817      3667
    osdb       2025419    2059676    -34257
    reymont     759011     758606       405
    samba      1680535    1676684      3851
    sao        3734168    3733871       297
    webster    4637776    4635525      2251
    x-ray      3575990    3577183     -1193
    xml         247545     246671       874
    Total     29586186   29595065     -8879
    v100 breaks osdb.
    1037 replies | 365220 view(s)
  • Shelwien's Avatar
    Yesterday, 17:57
    > https://ru.wikipedia.org/wiki/Парадокс_Ябло
    There is very little on Wikipedia; I liked this article more: https://iep.utm.edu/yablo-pa/ And there are some interesting discussions here: https://avva.livejournal.com/1159044.html
    > It's not Gödel, I came up with this proof when I learned about Yablo's paradox:
    What I meant is that you first have to prove the claim that every statement is either true or false. Which is not the case, since non-computable statements exist. The link above, by the way, also mentions the reducibility of Yablo's paradox to computability.
    > According to the backstory, Styopa Moshkin is an ordinary Soviet schoolboy of 1975-1980.
    In your cartoon he looks like a rich-kid geek with hangers-on.
    > At first I tried to do the animation myself, but realized it would take years.
    Nowadays this can be done fairly quickly in 3D, including synthesized voice-over: try searching YouTube for "mmd". Unfortunately, computer animation is not very popular in our parts, but you could try foreign universities, e.g. https://www.bachelorstudies.ru/Bakalavriat/%D0%90%D0%BD%D0%B8%D0%BC%D0%B0%D1%86%D0%B8%D1%8F/%D0%95%D0%B2%D1%80%D0%BE%D0%BF%D0%B0/ Or ask the authors of short films on YouTube; the topic is not worn out, and if you give away the script for free, someone might be interested. Or you could try making it as text with illustrations on a site like https://author.today/
    3 replies | 116 view(s)
  • lz77's Avatar
    Yesterday, 16:01
    It seems that in your C++ version of libzling you used my idea for finding the match length. :)
    32 replies | 1237 view(s)
  • RichSelian's Avatar
    Yesterday, 15:15
    I rewrote libzling in Rust years ago (https://github.com/richox/orz) and libzling is no longer maintained. The compression ratio is now almost the same as lzma's for text data, but about 10x faster :)
    32 replies | 1237 view(s)
  • lz77's Avatar
    Yesterday, 14:32
    > Can you please post the Delphi source?
    Pardon, I have other plans. I don't receive a salary from Google or Facebook. I want to sell this algorithm and its sources (in C) as shareware. Perhaps students will be interested in it. Maybe this algorithm will help me win a prize in the next GDC. :)
    12 replies | 1212 view(s)
  • lz77's Avatar
    Yesterday, 13:59
    It's not Gödel; I came up with this proof when I learned about Yablo's paradox:
    RU: https://ru.wikipedia.org/wiki/Парадокс_Ябло
    EN: https://en.wikipedia.org/wiki/Yablo's_paradox
    I think it is not a paradox but a sophism. The main character is taken from old articles in the "Kvant" magazine (see kvant.info). I subscribed to that magazine when I was at school. According to the backstory, Styopa Moshkin is an ordinary Soviet schoolboy of 1975-1980. Indeed, it annoys modern viewers who watch "Masha and the Bear", but some people liked it: https://habr.com/ru/post/474426/ At first I tried to do the animation myself, but realized it would take years. The animation was done by a beginner animator (not an artist) from Kharkiv; it was his first cartoon. The cartoon is aimed at Soviet-era engineers, candidates and doctors of science, and at all lovers of mathematics.
    3 replies | 116 view(s)
  • SolidComp's Avatar
    Yesterday, 04:56
    Have you looked at libzling? It had an intriguing balance of ratio and speed as of 2015 or so, and it seemed like there was still headroom for improvement. The ratios weren't as good as LZMA, but it might be possible to get there. You mentioned the possibility of optimizing LZMA/2, which is what I was thinking. Have you seen Igor's changes to 7-Zip 21.00? I wonder if SIMD might have asymmetric implications on algorithm choices. AVX2 should be the new baseline going forward, and some algorithms might be differentially impacted.
    32 replies | 1237 view(s)
  • radames's Avatar
    Yesterday, 01:40
    Nah, no worries, just post as you like. I'm letting my account stay unused now. I just see why the community archiver Fairytale stopped being developed :( Bye everyone!
    32 replies | 1237 view(s)
  • fcorbelli's Avatar
    Yesterday, 01:37
    Just because the use case is not "mine" or "yours". A virtual disk is typically hundreds of gigabytes, and a typical virtual-machine backup can be multiple terabytes. Do you agree on this use case? Because if your virtual disk is 100MB, then you are right. Your use case != mine.
    Short version: whatever you want to use must take into account the cardinality, and the space and time needed. It doesn't matter if you use X, Y or Z. Use whatever you want that can handle at least 1TB in half a night (4 hours): preprocessing, deduplication, etc., whatever. The real limit is time, not efficiency (space).
    This is not the Hutter Prize. This is not a 1GB challenge. This is a minimum-1TB challenge. I hope this is clear. Anyway, I will not post anymore.
    32 replies | 1237 view(s)
  • radames's Avatar
    Yesterday, 00:53
    Not sure why you are being so negative on a site where people share thoughts, but whatever :rolleyes: Your use case != my use case, short and polite.
    ZPAQ -method 4: LZ77+CM, BWT or CM.
    https://github.com/moinakg/pcompress#pcompress : support for multiple algorithms like LZMA, Bzip2, PPMD, etc., with SKEIN/SHA checksums for data integrity.
    Let the user decide, or use detection. A single magic compression doesn't exist, and my idea with stream detection was okay.
    32 replies | 1237 view(s)
  • fcorbelli's Avatar
    Yesterday, 00:05
    Can you please post the Delphi source?
    12 replies | 1212 view(s)
  • Shelwien's Avatar
    25th January 2021, 22:50
    So you're just saying that it's not acceptable for you; that's ok. But it doesn't answer my question: which compression algorithm fits the constraints. Also,
    1) Decoding is not strictly necessary for verification. Instead we can a) prove that our compression algorithm is always reversible, and b) add ECC to recover from hardware errors (at that scale it would be necessary in any case, even with verification by decoding).
    2) LZ4 and deflate (pigz) are far from the best compression algorithms in any case. Even if you like these specific algorithms/formats, there are still known ways to improve their compression without speed loss. And then there are different algorithms that may be viable on different hardware, in the future, etc. You can't just say that perfect compression algorithms already exist and there's no need to think about further improvements.
    32 replies | 1237 view(s)
  • Shelwien's Avatar
    25th January 2021, 22:35
    Is it a Gödel thing?.. Anyway, the characters are rather unique, but unlikable and rather painful to watch. Is that guy the son of a nouveau riche? Btw, the largest integer exists, and it's called INT_MAX.
    3 replies | 116 view(s)
  • fcorbelli's Avatar
    25th January 2021, 22:25
    In fact, no. You will always decode. Every single time. Because you need to verify (or check), unless you use, as I wrote, a "block checksumming" approach. In that case you do not need to extract the data at all to verify it (a minimal sketch follows below). And you can never use a 4 or even 20MB/s algorithm. I attach an example of a "real world" virtual machine disk backup: about 151TB stored in 437MB. To handle this kind of file you will need the fastest algorithm, not the one with the most compression. That's what I'm trying to explain: the faster and lighter, the better. So LZ4 or pigz, AFAIK.
    32 replies | 1237 view(s)
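    A minimal sketch of the block-checksumming idea above: store a hash of each uncompressed block in the archive index, then verify by re-hashing the source and comparing digests, with no decompression at all. FNV-1a is used only to keep the example self-contained; real tools such as zpaq use cryptographic hashes (SHA-1/SHA-256), possibly two of them as the post suggests.

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Placeholder 64-bit FNV-1a; a real archiver would use SHA-1/SHA-256.
    static uint64_t fnv1a(const uint8_t* p, size_t n) {
        uint64_t h = 1469598103934665603ull;
        for (size_t i = 0; i < n; i++) { h ^= p[i]; h *= 1099511628211ull; }
        return h;
    }

    // Index entry written at backup time, alongside the compressed block.
    struct BlockEntry { uint64_t offset, size, hash; };

    // Verify source data against the stored index: no decompression needed,
    // only re-reading and re-hashing the uncompressed source blocks.
    static bool verify(const uint8_t* src, size_t srcSize, const std::vector<BlockEntry>& index) {
        for (const BlockEntry& e : index) {
            if (e.offset + e.size > srcSize) return false;
            if (fnv1a(src + e.offset, e.size) != e.hash) return false;
        }
        return true;
    }

    int main() {
        std::vector<uint8_t> disk(1 << 20, 0xAB);          // stand-in for a VM image
        const size_t block = 64 * 1024;
        std::vector<BlockEntry> index;
        for (size_t off = 0; off < disk.size(); off += block)
            index.push_back({off, block, fnv1a(disk.data() + off, block)});
        std::printf("verify: %s\n", verify(disk.data(), disk.size(), index) ? "ok" : "FAILED");
        return 0;
    }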
  • Shelwien's Avatar
    25th January 2021, 21:30
    @fcorbelli: What you say is correct, but completely unrelated to the topic here. Current compression algorithms are not perfect yet. For example, zstd design and development focus on speed, while lzma provides 10% better compression and could be made much faster if somebody redesigned and optimized it for modern CPUs. Then, CM algorithms are slow, but can provide 20-30% better compression than zstd, and CM encoding can actually be much faster than LZ with parsing optimization, maybe even the requested 50MB/s. But decoding would currently still be around 4MB/s or so. So in some discussion I mentioned that such algorithms with "reverse asymmetry" can be useful in backup, because decoding is relatively rare there. And after a while I got feedback from actual backup developers with the codec performance constraints that they're willing to accept. The problem is, it would be very hard to push CM to reach 20MB/s decoding, because that's mostly determined by L3 latency. But it may still be possible, and there are other potential ways to the same goal, basically with all the usual classes of compression algorithms. So I want to know which way would be easiest.
    32 replies | 1237 view(s)
  • fcorbelli's Avatar
    25th January 2021, 21:03
    It doesn't matter. You need something that deduplicates BUT also allows you to do many other things, such as verifying files without decompressing. Whether you have to store 300GB or 330GB really makes no difference (huge from the point of view of software performance, irrelevant in use). It doesn't matter. It doesn't matter at all. You will find anything inside: images, executables, database files, videos, Word and Excel documents, ZIP, RAR, 7z.
    From my previous post:
    0) versioning "a-la-time-machine"
    1) deduplication
    2) highly parallelizable compression
    3) low RAM consumption
    4) works with really large files
    5) decompression which does NOT seek (if possible)
    6) an advanced and fast copy verification mechanism, WITHOUT decompressing if possible
    7) easy portability between Windows, Linux and *nix systems
    8) append-only format
    9) reliability, reliability, reliability; no software "chains", where bugs and limitations can add up.
    This is, in fact, a patched zpaqfranz with a fast LZ4 -1 compressor/decompressor, or zpaqfranz running on a ZFS data storage system with embedded LZ4. On the development side: a block-chunked format, with compressed AND uncompressed hashes AND an uncompressed CRC-32.
    32 replies | 1237 view(s)
  • lz77's Avatar
    25th January 2021, 19:03
    Watch it on my YouTube channel: https://www.youtube.com/watch?v=y0d5vniO2vk The intrigue of the first episode: where is the bug in the proof of the Great Theorem? Maybe someone will translate the cartoon's script into English... If anyone is interested, I will translate the summary into English using Google Translate.
    3 replies | 116 view(s)
  • Shelwien's Avatar
    25th January 2021, 18:56
    Maybe increase the window size for zstd? The "-1" default is 512KB. Also test a newer zstd? They made significant speed improvements in 1.4.5, and an extra 5% in 1.4.7 too.
    12 replies | 1212 view(s)
  • lz77's Avatar
    25th January 2021, 18:22
    Wow, I guess I'm finishing writing (for now in Delphi 7) a not-bad, simple, pure LZ77-type compressor called notbadlz. :-) I'm trying to beat zstd at fast levels for sporting interest. I set a 128K input buffer and 16K hash cells in my compressor; here are the results (a sketch of this kind of match finder follows below):
    (zstd-v1.4.4-win64)
    A:\>timer.exe zstd.exe --fast -f --no-progress --single-thread enwik8
    Timer 3.01 Copyright (c) 2002-2003 Igor Pavlov 2003-07-10
    enwik8 : 51.82% (100000000 => 51817322 bytes, enwik8.zst)
    Kernel Time = 0.062 = 00:00:00.062 = 7%
    User Time = 0.781 = 00:00:00.781 = 90%
    Process Time = 0.843 = 00:00:00.843 = 98%
    Global Time = 0.859 = 00:00:00.859 = 100%
    ***
    A:\>timer.exe notbadlz.exe
    Timer 3.01 Copyright (c) 2002-2003 Igor Pavlov 2003-07-10
    Source size 100000000, packed size: 47086177 ratio 0.47086
    Done in 1.35210
    Kernel Time = 0.078 = 00:00:00.078 = 5%
    User Time = 1.359 = 00:00:01.359 = 93%
    Process Time = 1.437 = 00:00:01.437 = 98%
    Global Time = 1.453 = 00:00:01.453 = 100%
    ***
    A:\>timer.exe zstd.exe -d -f --no-progress --single-thread enwik8.zst -o enwik
    Timer 3.01 Copyright (c) 2002-2003 Igor Pavlov 2003-07-10
    enwik8.zst : 100000000 bytes
    Kernel Time = 0.109 = 00:00:00.109 = 24%
    User Time = 0.203 = 00:00:00.203 = 44%
    Process Time = 0.312 = 00:00:00.312 = 68%
    Global Time = 0.453 = 00:00:00.453 = 100%
    A:\>timer notbadlzunp.exe
    Timer 3.01 Copyright (c) 2002-2003 Igor Pavlov 2003-07-10
    Compressed size: 47086177
    Decompressed size: 100000000
    Done in 0.52653 sec.
    Kernel Time = 0.093 = 00:00:00.093 = 12%
    User Time = 0.515 = 00:00:00.515 = 68%
    Process Time = 0.609 = 00:00:00.609 = 81%
    Global Time = 0.750 = 00:00:00.750 = 100%
    With 128K cells and an input buffer of 256MB, notbadlz compresses enwik8 better than zstd -1. zstd does it faster, but I still have asm64, prefetch, etc. in reserve. Now I'm wondering whether it's worth porting my program to FASM...
    12 replies | 1212 view(s)
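    A minimal sketch of the kind of match finder discussed above: a hash table of "cells", each holding the last position seen for a 4-byte hash, over a bounded input buffer. The 16K-cell size and the minimum match length simply reuse figures mentioned in the thread as illustration; this is not notbadlz's or zstd's actual code.

    #include <cstdint>
    #include <cstdio>
    #include <cstring>
    #include <vector>

    static const int HASH_BITS = 14;                  // 16K cells, as in the post
    static const int MIN_MATCH = 4;

    static uint32_t hash4(const uint8_t* p) {
        uint32_t x; std::memcpy(&x, p, 4);
        return (x * 2654435761u) >> (32 - HASH_BITS);
    }

    struct Match { size_t pos; int len, dist; };      // a match at pos, copying from pos-dist

    // Greedy LZ77 search with a single-entry hash table (no chains).
    static std::vector<Match> findMatches(const uint8_t* data, size_t size) {
        std::vector<uint32_t> cell(1u << HASH_BITS, 0);   // 0 acts as the "empty" sentinel
        std::vector<Match> out;
        size_t i = 1;                                     // position 0 stays a literal
        while (i + MIN_MATCH <= size) {
            uint32_t h = hash4(data + i);
            size_t cand = cell[h];
            cell[h] = (uint32_t)i;
            int len = 0;
            while (cand && i + len < size && data[cand + len] == data[i + len]) len++;
            if (len >= MIN_MATCH) { out.push_back({i, len, (int)(i - cand)}); i += len; }
            else                  { i++; }                // literal
        }
        return out;
    }

    int main() {
        const char* s = "abcdefgh_abcdefgh_abcdefgh";
        std::vector<Match> ms = findMatches((const uint8_t*)s, std::strlen(s));
        for (const Match& m : ms) std::printf("pos %zu: len %d dist %d\n", m.pos, m.len, m.dist);
        return 0;
    }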
  • pklat's Avatar
    25th January 2021, 16:44
    IIRC, VirtualBox has an option to keep the virtual disk and save the diff in a separate file. Dunno about others.
    32 replies | 1237 view(s)
  • radames's Avatar
    25th January 2021, 16:18
    This is where I agree with Eugene, and this is where I agree with Franco. My tests were about which tool does what best. Zpaq cannot deduplicate as well as SREP. LZMA2 is fast enough, with a good ratio, to compress an SREP file. Integrated exe/dll preprocessing is important as well, like Eugene said. Will we need Precomp? What streams do we find on a Windows OS VM? ZIP/CAB (non-deflate like LZX/Quantum) files? Are the latter covered? What about ELF/.so and *NIX operating systems? Those are important for VMs and servers as well. What are the priorities, and in what order? Multithreading >>> Deduplication/Journaling >> Recompress popular streams > Ratio (controlled by a switch). What entropy coder really makes a difference? Franco made an excellent point about transfer speeds (ratio will matter for your HDD storage space and a 10Gbit/s net). Your network and disk speeds are almost as important as your total threads and RAM. I am just here because I am interested in what you might code. Eugene, just please don't make it too difficult or perfect, or it may never be finished.
    32 replies | 1237 view(s)
  • suryakandau@yahoo.co.id's Avatar
    25th January 2021, 15:13
    Using the paq8sk44 -s8 option on f.jpg (DBA corpus), the result is: Total 112038 bytes compressed to 80194 bytes. Time 19.17 sec, used 2444 MB (2563212985 bytes) of memory
    1037 replies | 365220 view(s)
  • fcorbelli's Avatar
    25th January 2021, 14:23
    Now change a little bit of the image, just like a real VM does, and redo the backup. How much space is needed to retain today's and yesterday's backups? How much time and RAM are needed to verify those backups, today's and yesterday's? PS: my shitty Korean smartphone does not like English at all.
    32 replies | 1237 view(s)
  • fcorbelli's Avatar
    25th January 2021, 14:20
    Ahem... no. SREP is not sure enough to use, the NanoZip source is not available, and those are NOT versioned backups. VMs are simply too big to keep as separate files per version. If you have a 500GB thick disk that becomes a 300GB compressed file today, what will you do tomorrow? Another 300GB? For a month-long backup retention policy, where do you put 300GB x 30 = 9TB for just a single VM? How long will it take to transfer 300GB via LAN? How long will it take to verify 300GB via LAN? Almost anything is good for a 100MB file, or even 30GB. But for the 10TB of a typical vSphere server?
    32 replies | 1237 view(s)
  • Darek's Avatar
    25th January 2021, 13:58
    Darek replied to a thread Paq8pxd dict in Data Compression
    Scores for paq8pxd v99 and paq8pxd v100 on my testset. The paq8pxd v99 version is about 15KB better than paq8px v95. The paq8pxd v100 version is about 35KB better than paq8px v99, mainly due to the LSTM implementation; however, it's not as big a gain as in paq8px (about 95KB between the non-LSTM and LSTM versions).
    Timings for the paq8pxd v99, paq8pxd v100 and paq8px v200 (-l) versions:
    paq8pxd v99 = 5'460,32s
    paq8pxd v100 = 11'610,80s = 2.1 times slower; still about 1.7 times faster than paq8px
    paq8px v200 = 19'440,11s
    1037 replies | 365220 view(s)
  • Shelwien's Avatar
    25th January 2021, 12:32
    1) There are probably better options for zstd, like a lower level (-9?) and --long=31, or explicit settings of wlog/hlog via --zstd=wlog=31
    2) LZMA2 certainly isn't the best option; there are at least RZ and NanoZip.
    3) zstd doesn't have integrated exe preprocessing, while zpaq and 7z do - I'd suggest testing zstd on the output of "7z a -mf=bcj2 -mm=copy"
    32 replies | 1237 view(s)
  • radames's Avatar
    25th January 2021, 12:11
    NTFS image:
    38.7 GiB - ntfs-ptcl-img (raw)
    Simple dedupe:
    25.7 GiB - ntfs-ptcl-img.srep (m3f) -- 22 minutes
    Single-step:
    14.0 GiB - ntfs-ptcl-img.zpaq (method 3) -- 31 minutes
    13.0 GiB - ntfs-ptcl-img.zpaq (method 4) -- 69 minutes
    Chained:
    12.7 GiB - ntfs-ptcl-img.srep.zst (-19) -- hours
    11.9 GiB - ntfs-ptcl-img.srep.7z (ultra) -- 21 minutes
    11.8 GiB - ntfs-ptcl-img.srep.zpaq (method 4) -- 60 minutes
    2700X, 32 GB RAM; times are for the respective step, not cumulative.
    I think there is no magic archiver for VM images yet, just good old SREP+LZMA2.
    32 replies | 1237 view(s)
  • Jyrki Alakuijala's Avatar
    25th January 2021, 12:03
    Lode Vandevenne (one of the authors of JPEG XL) hacked up a tool called grittibanzli in 2018. It recompresses gzip streams using more efficient methods (brotli) and can reconstruct exactly the same gzip bit stream back. For PNGs you can get a ~10% denser representation while being able to recover the original bit-exactly. I don't think people should be using this when there are other options, like just using stronger formats for pixel-exact lossless coding: PNG recompression tools (like Pingo or ZopfliPNG), br-content-encoded uncompressed but filtered PNGs, WebP lossless and JPEG XL lossless.
    49 replies | 4166 view(s)
  • Jon Sneyers's Avatar
    24th January 2021, 22:42
    The thing is, it's not just libpng that you need to update. Lots of software doesn't use libpng, but just statically links some simple png decoder like lodepng. Getting all png-decoding software upgraded is way harder than just updating libpng and waiting long enough. I don't think it's a substantially easier task than getting them to support, say, jxl.
    49 replies | 4166 view(s)
  • e8c's Avatar
    24th January 2021, 22:22
    Sounds like "... not everything will use the latest version of Linux Kernel. Revising or adding features to an already deployed Linux is just as hard as introducing a new Operating System."
    49 replies | 4166 view(s)
  • Jon Sneyers's Avatar
    24th January 2021, 22:04
    I guess you could do that, but it would be a new kind of PNG that wouldn't work anywhere. Not everything uses libpng, and not everything will use the latest version. Revising or adding features to an already deployed format is just as hard as introducing a new format.
    49 replies | 4166 view(s)
  • Shelwien's Avatar
    24th January 2021, 18:36
    0) -1 is not the fastest mode (the --fast=# ones are faster). It seems that --fast disables Huffman coding for literals, but still keeps FSE for matches.
    1) Yes, Huffman for literals, FSE for matches - or nothing if the block is incompressible.
    2) No. Certainly no preprocessing, but maybe you'd say that the "--adapt" mode is related.
    3) https://github.com/facebook/zstd/blob/dev/lib/compress/zstd_compress.c#L5003 - 16k cells
    4) The library works with user buffers of whatever size; zstdcli seems to load 128KB blocks with fread. https://github.com/facebook/zstd/blob/69085db61c134192541208826c6d9dcf64f93fcf/lib/zstd.h#L108
    12 replies | 1212 view(s)
  • moisesmcardona's Avatar
    24th January 2021, 17:21
    moisesmcardona replied to a thread paq8px in Data Compression
    As we said, we will not be reviewing code that is not submitted via Git.
    2302 replies | 609175 view(s)
  • e8c's Avatar
    24th January 2021, 17:12
    (Khmm ...) "this feature in ... PNG" means "this feature in ... LibPNG": transcoding JPG -> PNG, result PNG smaller than original JPG. (Just for clarity.)
    49 replies | 4166 view(s)
  • lz77's Avatar
    24th January 2021, 17:06
    4 questions about the fastest compression mode (zstd.exe -1 ...):
    1. Does zstd use additional compression (Huffman, FSE) besides LZ?
    2. Is there any optimization (E8/E9 etc.) or analysis of the source data to configure the compression algorithm?
    3. What is the size of the hash table with this option (how many cells, and are there chains)?
    4. What is the size of the input buffer?
    Thanks.
    12 replies | 1212 view(s)
  • Jon Sneyers's Avatar
    24th January 2021, 15:14
    Bit-exact PNG file reconstruction gets a bit tricky because, unlike JPEG which uses just Huffman, PNG uses Deflate, which has way more degrees of freedom in how to encode. Emulating popular existing simple PNG encoders could help for most cases encountered in practice, but comes at the cost of having to include those encoders in the reconstruction method. To be honest, I am more interested in non-bit-exact recompression for PNG, which is still lossless in terms of the image data and metadata. For PSD (Photoshop) it might be worthwhile to have bit-exact recompression though - at least for the common case where the image data itself is uncompressed or PackBits (RLE). It shouldn't be hard to get very good recompression ratios on those, and the recompressed jxl file would be viewable in anything that supports jxl, which will hopefully soon be more than what supports psd viewing.
    49 replies | 4166 view(s)
  • kaitz's Avatar
    24th January 2021, 14:09
    kaitz replied to a thread Paq8pxd dict in Data Compression
    paq8pxd_v100
    - add lstm model back (active on -x option), used on all predictors except audio
    - add matchModel from paq8px_v201 as a second model
    - adjust old matchModel parameters
    - tar header as bintext
    - add back 1 mixer context if DECA
    - in sparsemodel (default) add 2 contexts
    - adjust normalModel
    Fixes https://encode.su/threads/1464-Paq8pxd-dict?p=66290&viewfull=1#post66290
    1037 replies | 365220 view(s)
  • schnaader's Avatar
    24th January 2021, 13:24
    Might be possible after integrating image compressors (currently planned: FLIF, WebP, JPEG XL) into Precomp. Depending on which one gives the best result, the new file size will be a bit larger than that because of the zlib reconstruction data, but the original PNG file can be restored bit-to-bit lossless.
    As a synergetic side project, the zlib reconstruction data and other PNG metadata could be stored as binary metadata in the new format. For FLIF, I'm quite sure that arbitrary metadata can be stored, but I'd expect this to be possible in the two other formats, too. This way, fully lossless .png <-> .png/.flif/.webp/.jxl would be possible: the resulting files would be viewable and the original PNG file could be restored. A checksum of the PNG would have to be stored to prevent/invalidate restoration attempts after editing the new file, because the original PNG obviously can't be restored after altering the image data.
    The size of the reconstruction data differs depending on what was used to create the PNG. A rough guess: if the image data can be compressed to 50% of the PNG size, the resulting file including restoration data would usually end up between 50% and 75% of the PNG - though edge cases with >= 100% are possible, too (e.g. compressing random noise and using a PNG optimizer). Of course, integration from the other side would also be possible, by integrating preflate and PNG parsing into some webp/jpegxl transcoding side project.
    49 replies | 4166 view(s)
  • suryakandau@yahoo.co.id's Avatar
    24th January 2021, 05:38
    I just added a mixercontextset in textmodel.cpp and simdlstmmodel.hpp...
    2302 replies | 609175 view(s)
  • Alexander Rhatushnyak's Avatar
    24th January 2021, 02:19
    And then another one just like him, mikle1, says: "dragging minors into political squabbles! It is impossible to come up with anything more immoral, well, more vile!" - at 12:38 of his 17-minute video. Wearing a T-shirt that doesn't match his convictions. :_down2: to both.
    13 replies | 1052 view(s)
  • e8c's Avatar
    24th January 2021, 02:08
    https://cloudinary.com/blog/legacy_and_transition_creating_a_new_universal_image_codec There are now UFS 3+ and NVMe / SATA 3 SSDs that are fast, cheap, and big enough to handle lossless imaging. Lossy formats look like a thing of the past. Efficient lossless transcoding of existing JPEGs is cool. I want to see this feature in... PNG.
    49 replies | 4166 view(s)
  • natanpro's Avatar
    23rd January 2021, 21:21
    Any chance of an updated mingw64 version? (The sources were updated about a day ago.)
    49 replies | 4166 view(s)
  • WinnieW's Avatar
    23rd January 2021, 14:35
    FWIW, I use Zstandard for heavy multi-gigabyte backups, with the command line: -12 --long=30. I'm quite content with it.
    32 replies | 1237 view(s)
  • lz77's Avatar
    23rd January 2021, 13:52
    lz77 replied to a thread Lucky again in The Off-Topic Lounge
    ​https://www.youtube.com/watch?v=QwLb43aXxHU
    13 replies | 1052 view(s)
  • Krishty's Avatar
    23rd January 2021, 12:30
    I released a new version last evening (RSS readers should notice): https://papas-best.com/optimizer_en#download I was accidentally deleting the EXIF rotation tag when a JPG file could not be derotated in lossless mode. If you have a Huawei phone, and thus a bunch of JPGs in the ungodly resolution 5312×2988 (not evenly divisible into JPG blocks), they may have lost all rotation. This is fixed now. There is another problem lurking below: mozjpegtran fails to rotate some images where normal jpegtran succeeds. I suspect this happens if a JPG file grows after derotation. It seems to be fixed in the current build (4.0.1-rc2), but I'd like to test it internally for a week before I ship it. On my PC, only three files out of a few thousand were affected, so it is not a high-priority problem anyway.
    91 replies | 33303 view(s)
  • fcorbelli's Avatar
    23rd January 2021, 12:06
    For me the answer is easy: the one that scales best on multicore, just like pigz. Single-thread performance is useless. On the implementation side: the one which can make extensive use of HW SSE instructions. Compression ratio is irrelevant, only speed (and limited RAM usage). In two words: a deduplicated pigz (aka deflate). Or LZ4 for decompression speed (not so relevant). In fact I use this one (storing the deduplicated archive on ZFS).
    32 replies | 1237 view(s)
  • Shelwien's Avatar
    23rd January 2021, 03:08
    Again, this is not about the archive format, and obviously there'd be dedup, MT, etc. The question is which class of compression algorithms to use as a base for development. Compression-wise, my PLZMA almost fits, but encoding with parsing optimization is too slow. Fast-CM (like MCM, or nzcc) fits by CR and, potentially, encoding speed (it can be significantly improved, though only for encoding, with context sorting and out-of-order probability evaluation), but there's no solution for decoding speed. And BWT fits both by enc and dec speed, and even by CR on text, but BWT CR on binaries is relatively bad. Plus, there are preprocessors and hybrid options - plenty of choices, which is the problem.
    1,048,576 corpus_VDI_pcf_x3.1M
      249,592 corpus_VDI_pcf_x3.1M.lzma      1048576/249592 = 4.20  (276366/249592-1)*100 = 10.72%
      243,743 corpus_VDI_pcf_x3.1M.plzma_c1  1048576/243743 = 4.30  (276366/243743-1)*100 = 13.38%
      248,687 corpus_VDI_pcf_x3.1M.rz        1048576/248687 = 4.22  (276366/248687-1)*100 = 11.13%
      276,366 corpus_VDI_pcf_x3.1M.zst       1048576/276366 = 3.79
      276,403 corpus_VDI_pcf_x3.1M.lzma_a0   // lzma -a0 -d20 -fb8 -mc4 -lc0 -lp0
      533,864 corpus_VDI_pcf_x3.1M.lz4-1
      369,616 corpus_VDI_pcf_x3.1M.lz4-1.c7lit_c2
      443,586 corpus_VDI_pcf_x3.1M.lz4-12
      355,800 corpus_VDI_pcf_x3.1M.lz4-12.c7lit_c2
      707,961 corpus_VDI_pcf_x3.1M.LZP-DS
      236,180 corpus_VDI_pcf_x3.1M.LZP-DS.c7lit_c2
      391,962 corpus_VDI_pcf_x3.1M.lzpre
      306,616 corpus_VDI_pcf_x3.1M.lzpre.c7lit_c2
    32 replies | 1237 view(s)
  • JamesB's Avatar
    23rd January 2021, 01:49
    I get that patents generated huge revenues and funded numerous people to work on improved technologies, but it's no longer the only model. The argument that unless someone pays for it you cannot fund the next generation of research is blinkered in the modern world. Huge organisations collectively spend billions on infrastructure - storage, network bandwidth, etc. The more enlightened ones are not really interested in owning a file format or a codec technology: they view themselves as content providers, and the mundane nuts and bolts are in the "precompetitive" area. Jointly funding research to reduce their overheads is a really obvious win. Hence why AOMedia exists. The flip side is the old-school members of MPEG that ended up bickering over patent pools and essentially killed MPEG as an organisation. I find it sad, and intensely annoying, that many of those have now turned their attention to other markets such as genomics. I've spent months battling stupid patents, but ultimately I've given up. It's just whack-a-mole. The entire thing is a broken model for all bar a very few select cases.
    2 replies | 351 view(s)
  • pklat's Avatar
    22nd January 2021, 20:54
    I think you should consider using 'Linux containers' or similar if possible; they should use space and other resources much more efficiently. Dunno about security.
    32 replies | 1237 view(s)
  • Jyrki Alakuijala's Avatar
    22nd January 2021, 20:01
    I have landed four improvements on this topic -- however not yet in the public gitlab. I have not checked if they are effective for your use case, and suspect that some more iterations are needed. Please keep sending these examples, they are very inspiring.
    49 replies | 4166 view(s)
  • fcorbelli's Avatar
    22nd January 2021, 19:50
    As I tried to explain, the compression ratio of virtual machine disks is the last, the very last, of the aspects that matter when operating with VMs. Here are the necessary requirements for anyone who wants to develop their own:
    0) versioning "a-la-time-machine"
    1) deduplication. This is the most important, indeed fundamental, element to save space during versioned copies.
    2) highly parallelizable compression. Server CPUs are typically low-clocked, but with many cores. Therefore the maximum performance obtainable by a certain algorithm on a single core is almost irrelevant.
    3) since the problem is clearly IO-bound, ideally a program should be able to process in parallel data streams arriving from different media (e.g. multiple NFS shares). But this is not a hard requirement; the real point is the RAM consumption of multiple processes launched in the background with &.
    4) works with really large files (I've had thousands of terabytes), with low RAM (~20/30GB, not more). RAM is precious on a VM server. Dedicated compression machines are expensive, delicate, and they fail.
    5) decompression performance is, again, IO-bound rather than CPU-bound. So a system that does NOT seek when extracting (as, for example, ZPAQ does) is excellent. Even if, absurdly, you compress a 500GB virtual disk by 98% into 10GB, you still have to write 500GB on extraction, and you will pay the "writing cost" (time) of 500GB.
    6) an advanced and fast copy verification mechanism. Any unverified backup is not eligible. A fast verification mechanism is even more important than fast software. So, ideally, you need a check that does NOT require massive data extraction (which we know is really huge). That is... the mechanism of ZPAQ (!). Keep the hashes of the decompressed blocks, so that you do NOT have to decompress the data to verify it. Clearly using TWO different algorithms (... like I do ...) against hash collisions, if paranoid.
    7) easy portability between Windows, Linux and *nix systems. No strange compile paradigms, libraries, etc.
    8) append-only format, to use rsync or whatever. Otherwise you simply cannot even move the backups (if you do not have days to spare).
    9) Reliability, reliability, reliability. No software "chains", where bugs and limitations can add up.
    =====
    Just today I'm restoring a small VirtualBox Windows server with a 400GB drive. Even assuming you get 100MB/s of sustained rate (a normal value for a virtualization server under normal load), it takes over an hour just to read it. Obviously I didn't do that, but a ZFS snapshot and a copy of yesterday's version (about 15 minutes). In the real world you make a minimum of one backup per day (in fact 6+). That gives you no more than 24 hours to do a backup (typically 6 hours, 23:00-05:00, plus 1 hour until 06:00 of uploading to a remote site). With a single small server with 1TB (just about a home server) this means 10^12 / 86,400 = ~10MB/s as a lower bound. In fact, for a 6-hour window this is ~50MB/s per terabyte. This is about the performance of Zip or whatever. For a small SOHO with 10TB, ~500MB/s for 6 hours. This is much more than a typical server can do. For a medium-size vSphere server it soon becomes challenging, needing an external cruncher (I use an AMD 3950X), plus a blazing-fast network (not cheap at all) and a lot of effort.
    To recap: the amount of data is so gargantuan that hoping to compress it with something really efficient, within a few hours, becomes unrealistic. Unzipping is also no small problem for thick disks. If it takes a week to compress'n'test a set of VM images, you get one backup per week. Not quite ideal. Moving the data to a different server and then having it compressed "calmly" also doesn't work; there is simply too much of it. Often compression is completely disabled (for example, leaving it to the OS with LZ4). This is my thirty years of experience in data storage, and twenty-five in virtual data storage.
    32 replies | 1237 view(s)
  • JamesB's Avatar
    22nd January 2021, 18:01
    It's also worth considering that this goes both ways. The data compression community can improve bioinformatics tools, but the reverse is true too. I strongly suspect there may be applications for using minimisers as an alternative to hashing in rapid-LZ applications or when doing dedup (a small minimiser sketch follows below). The ability to quickly identify matches across datasets tens of GB in size is something the bioinformatics community has put a lot of effort into. Similarly, some of the rapid approximate alignment algorithms may give better ways of describing not-quite-perfect matches than a series of neighbouring LZ steps.
    40 replies | 2201 view(s)
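    A minimal sketch of the minimiser idea mentioned above: slide a window of W consecutive K-byte substrings (k-mers) over the data and keep only the position whose k-mer hashes smallest per window, so two datasets sharing a long region select the same sparse set of anchor positions, which can be indexed far more cheaply than one hash entry per byte. K, W and the mixing hash here are arbitrary illustration values, not taken from any particular bioinformatics tool.

    #include <cstdint>
    #include <cstdio>
    #include <cstring>
    #include <vector>

    static const int K = 8;    // k-mer length (illustration value)
    static const int W = 16;   // window size in k-mers (illustration value)

    static uint64_t hashKmer(const uint8_t* p) {
        uint64_t x = 0;
        std::memcpy(&x, p, K);
        x *= 0x9E3779B97F4A7C15ull;            // cheap mixing, enough for a sketch
        return x ^ (x >> 29);
    }

    // Naive O(n*W) selection; real implementations use a monotone deque for O(n).
    static std::vector<size_t> minimisers(const uint8_t* data, size_t size) {
        std::vector<size_t> out;
        if (size < (size_t)(K + W)) return out;
        size_t nKmers = size - K + 1;
        for (size_t win = 0; win + W <= nKmers; win++) {
            size_t best = win;
            for (size_t j = win + 1; j < win + W; j++)
                if (hashKmer(data + j) < hashKmer(data + best)) best = j;
            if (out.empty() || out.back() != best) out.push_back(best);  // adjacent windows often agree
        }
        return out;
    }

    int main() {
        const char* s = "the quick brown fox jumps over the lazy dog; "
                        "the quick brown fox jumps over the lazy dog again";
        std::vector<size_t> m = minimisers((const uint8_t*)s, std::strlen(s));
        std::printf("%zu minimiser positions out of %zu bytes\n", m.size(), std::strlen(s));
        return 0;
    }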
  • pklat's Avatar
    22nd January 2021, 17:13
    If you've got several of them, I'd defragment them, perhaps make sparse files, and then use lrzip.
    32 replies | 1237 view(s)
  • Shelwien's Avatar
    22nd January 2021, 16:48
    This is not about using some already available tool, but more about the development of one. zpaq doesn't fit at all in that sense, because all of its algorithms are worse than other known open-source ones. Yes, it's nice that zpaq is a working integrated solution, and I really appreciate that you're trying to improve it. But this thread is about designing a compression algorithm with given constraints. These constraints are a little ahead of the current state of the art, and there are multiple ways to approach it (making a stronger/slower LZ77 or ROLZ, speed-optimizing a fast-CM, finding a fitting BWT/postcoder setup, some LZ/CM hybrid maybe, etc. - a tiny fast-CM bit-model sketch follows below), so I'd like to know what other developers think about this.
    32 replies | 1237 view(s)
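    A tiny sketch of the fast-CM building block mentioned above: a per-context bit probability with a shift-based update, which in a real coder would feed a range coder and be mixed with other models. The counter width, update rate and order-1 context are arbitrary illustration values, not taken from MCM or nzcc; the demo only counts "surprised" predictions to show the model adapting.

    #include <cstdint>
    #include <cstdio>
    #include <cstring>
    #include <vector>

    // 12-bit probability of bit==1, nudged towards each observed bit.
    struct BitModel {
        uint16_t p = 2048;                        // start at 0.5 in [0,4096)
        void update(int bit, int rate = 5) {
            if (bit) p += (4096 - p) >> rate;     // move towards 1
            else     p -= p >> rate;              // move towards 0
        }
    };

    int main() {
        // Order-1 context: previous byte, plus the bits of the current byte seen so far.
        std::vector<BitModel> model(256 * 256);
        const char* text = "abababababababababababababababab";
        int prev = 0;
        int surprises = 0, total = 0;
        for (size_t i = 0; i < std::strlen(text); i++) {
            int c = (uint8_t)text[i], node = 1;   // node walks the binary tree of the byte
            for (int b = 7; b >= 0; b--) {
                int bit = (c >> b) & 1;
                BitModel& m = model[prev * 256 + node];
                if ((m.p >= 2048) != bit) surprises++;   // crude mispredict count
                m.update(bit);
                node = (node << 1) | bit;
                total++;
            }
            prev = c;
        }
        std::printf("mispredicted bits: %d of %d\n", surprises, total);
        return 0;
    }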
  • Gotty's Avatar
    22nd January 2021, 15:06
    Gotty replied to a thread paq8px in Data Compression
    Yes, as Luca said, and also Moises said, and also I said before: don't post source code here. It's quite cumbersome to find the changes in your contributions that way. Do a pull request to my branch instead if you would like to have it reviewed. Would you do that please? I'm not an authority here, but I must act as one, since I haven't seen a modification/improvement from your side that was bug-free or issue-free. Maybe your current one is an exception, I don't know yet. But I'm willing to review it only when you use git. No source code in the forum please. That's not reviewer-friendly.
    2302 replies | 609175 view(s)
  • LucaBiondi's Avatar
    22nd January 2021, 14:39
    LucaBiondi replied to a thread paq8px in Data Compression
    Hi suryakandau, could you post your modifications, for example as a diff, please? I would like to learn (slowly) how paq8px works... Thank you, Luca
    2302 replies | 609175 view(s)
  • fcorbelli's Avatar
    22nd January 2021, 12:41
    When making backups of virtual disks, even worse thick ones, it is normal to take, say, 400GB per image. If you have even only 2 or 3 of them, that's one terabyte just for a home server; in production, easily 10TB per DAY. There is not a single x86 CPU that can compress this amount of data at a high compression ratio. Bandwidth dominates the problem; it is IO-bound, not CPU-bound. Whether the backup is 370GB or 390GB makes no difference at all; if 3000GB or 3600GB, even less. For quick backups the answer is differential zfs send (not incremental), pigz-ed. It requires ZFS, lots of RAM and fast disks. It is doable: I have done it every day for years. But restores are painful, and extensive ZFS expertise is needed. I make intermediate backups with ZFS (hourly) and a nighttime zpaqfranz, plus ZIP (yes, 7z in ZIP mode).
    32 replies | 1237 view(s)
  • fcorbelli's Avatar
    22nd January 2021, 12:30
    Zpaq is the only answer. I have used it for the same work for years, up to today. Compression speed is decent (150-200MB/s on a modern server), deduplication very good. High compression is a total waste of time for virtual disks: m1, or even m0 (dedup only). I would prefer pigz -1, but it is too hard to merge into zpaq. It simply WORKS, even with very big files. Decompression is slow with magnetic disks, but... who cares? Better, of course, my zpaqfranz fork: it compiles on BSD, Linux and Windows, with a decent GUI for Windows (pakka), a la Time Machine.
    32 replies | 1237 view(s)
  • kaitz's Avatar
    22nd January 2021, 04:11
    kaitz replied to a thread Paq8pxd dict in Data Compression
    Silesia:
    1037 replies | 365220 view(s)
  • suryakandau@yahoo.co.id's Avatar
    22nd January 2021, 03:06
    I have made a little improvement to the lstm model by adding 1 mixer context set and 2 mixer inputs. @Darek, could you test it on the Silesia files using the -12lrta option? I think it can save more space. Thank you!!
    2302 replies | 609175 view(s)
  • Jyrki Alakuijala's Avatar
    21st January 2021, 23:54
    Funnily, this was a starting point for brunsli. In the first six weeks of Brunsli development (back in 2014) we migrated away from the clustering, because context modeling (and even prediction) was stronger for our JPEG recompression corpus. That doesn't mean it couldn't be more effective now for another domain; just back then Zoltan and I were not able to get fantastic results with this approach on JPEG recompression modeling. We got about -14% of JPEG with this kind of idea (k-means in five dimensions to do entropy coding) and -22% with the ideas expressed in brunsli or JPEG XL. In WebP lossless we sometimes have 700+ entropy codes for different ergodic processes, so having 3 to 255 is not necessarily excessive.
    5 replies | 714 view(s)
  • danlock's Avatar
    21st January 2021, 22:13
    New approaches to compression and image processing are exciting and invigorating! I hope they stimulate others who have image compression specialization to consider new perspectives as well! Watching this method evolve will be a lot of fun! Thanks for introducing us to your Rust image compression library!
    5 replies | 714 view(s)
  • Shelwien's Avatar
    21st January 2021, 20:08
    Shelwien replied to a thread paq8px in Data Compression
    fastest method is probably https://www.facebook.com/mattmahoneyfl
    2302 replies | 609175 view(s)