Activity Stream

  • radames's Avatar
    Today, 16:18
    This is where I agree with Eugene. This is where I agree with Franco. My tests were about which tool does what best. Zpaq cannot de-duplicate as well as SREP. LZMA2 is fast enough, with a good ratio, to compress an SREP file. Integrated exe/dll preprocessing is important as well, like Eugene said. Will we need Precomp? What streams do we find on a Windows OS VM? zlib/CAB files? Are the latter covered? What about ELF/.so and *NIX operating systems? Those are important for VMs and servers as well. What are the priorities, and in what order? Multithreading >>> Deduplication >> Recompress popular streams > Ratio (controlled by switch). Which entropy coder really makes a difference? Franco made an excellent point about transfer speeds: your network and disk speeds are almost as important as your total threads and RAM. I am just here because I am interested in what you might code. Eugene, just please don't make it too difficult or perfect, or it may never be finished.
    16 replies | 662 view(s)
  • suryakandau@yahoo.co.id's Avatar
    Today, 15:13
    Using the paq8sk44 -s8 option on f.jpg (DBA corpus), the result is: 112038 bytes compressed to 80194 bytes. Time 19.17 sec, used 2444 MB (2563212985 bytes) of memory.
    1035 replies | 364565 view(s)
  • fcorbelli's Avatar
    Today, 14:23
    Now change the image a little, just like a real VM does, and redo the backup. How much space is needed to retain both today's and yesterday's backups? How much time and RAM is needed to verify those backups? PS: this shitty Korean smartphone does not like English at all.
    16 replies | 662 view(s)
  • fcorbelli's Avatar
    Today, 14:20
    Ahem... no. SREP alone is not safe enough to rely on. NanoZip's source is not available. And those are NOT versioned backups. VMs are simply too big to keep a different file per version. If you have a 500GB thick disk that becomes a 300GB compressed file today, what will you do tomorrow? Another 300GB? For a month-long backup retention policy, where do you put 300GB x 30 = 9TB for just a single VM? How long will it take to transfer 300GB via LAN? How long will it take to verify 300GB via LAN? Almost anything is good for a 100MB file. Or 30GB. But for the 10TB of a typical vSphere server?
    16 replies | 662 view(s)
  • Darek's Avatar
    Today, 13:58
    Darek replied to a thread Paq8pxd dict in Data Compression
    Scores for paq8pxd v99 and paq8pxd v100 for my testset. The paq8pxd v99 version is about 15KB better than paq8pxd v95. The paq8pxd v100 version is about 35KB better than paq8pxd v99 - mainly due to the lstm implementation, however it's not as big a gain as in paq8px (about 95KB between the non-lstm and lstm versions). Timings for the paq8pxd v99, paq8pxd v100 and paq8px v200 (-l) versions:
    paq8pxd v99 = 5'460,32s
    paq8pxd v100 = 11'610,80s = 2.1 times slower - still about 1.7 times faster than paq8px
    paq8px v200 = 19'440,11s
    1035 replies | 364565 view(s)
  • Shelwien's Avatar
    Today, 12:32
    1) There are probably better options for zstd, like a lower level (-9?) and --long=31, or explicit settings of wlog/hlog via --zstd=wlog=31
    2) LZMA2 certainly isn't the best option; there are at least RZ and nanozip.
    3) zstd doesn't have integrated exe preprocessing, while zpaq and 7z do - I'd suggest testing zstd with the output of "7z a -mf=bcj2 -mm=copy"
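    A minimal sketch of that chain, using the image name from the test above (archive name, level and thread count are just examples worth tuning):
        # BCJ2 exe/dll filtering without compression, then zstd with a large window
        7z a -mf=bcj2 -mm=copy vm_filtered.7z ntfs-ptcl-img
        zstd -9 --long=31 -T0 vm_filtered.7z -o vm_filtered.7z.zst
        # note: decompressing a --long=31 archive also needs --long=31 (or a large enough --memory=)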
    16 replies | 662 view(s)
  • radames's Avatar
    Today, 12:11
    NTFS image:
    38.7 GiB - ntfs-ptcl-img (raw)
    simple dedupe:
    25.7 GiB - ntfs-ptcl-img.srep (m3f) -- 22 minutes
    single-step:
    14.0 GiB - ntfs-ptcl-img.zpaq (method 3) -- 31 minutes
    13.0 GiB - ntfs-ptcl-img.zpaq (method 4) -- 69 minutes
    chained:
    12.7 GiB - ntfs-ptcl-img.srep.zst (-19) -- hours
    11.9 GiB - ntfs-ptcl-img.srep.7z (ultra) -- 21 minutes
    11.8 GiB - ntfs-ptcl-img.srep.zpaq (method 4) -- 60 minutes
    2700X, 32 GB RAM; times are for the respective step, not cumulative.
    I think there is no magic archiver for VM images yet, just good old SREP+LZMA2.
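    For reference, the winning SREP+LZMA2 chain corresponds roughly to the following (the srep mode is the one quoted above; the 7z level is an assumption for "ultra"):
        srep -m3f ntfs-ptcl-img ntfs-ptcl-img.srep           # dedupe pass
        7z a -mx=9 ntfs-ptcl-img.srep.7z ntfs-ptcl-img.srep  # LZMA2 at maximum level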
    16 replies | 662 view(s)
  • Jyrki Alakuijala's Avatar
    Today, 12:03
    Lode Vandevenne (one of the authors of JPEG XL) previously hacked up a tool called grittibanzli, in 2018. It recompresses gzip streams using more efficient methods (brotli) and can reconstruct the bit-exact stream back. For PNGs you can get a ~10% denser representation while still being able to recover the original bit-exactly. I don't think people should be using this when there are other options, like just using stronger formats for pixel-exact lossless coding: PNG recompression tools (like Pingo or ZopfliPNG), br-content-encoded uncompressed but filtered PNGs, WebP lossless, and JPEG XL lossless.
    49 replies | 3915 view(s)
  • Jon Sneyers's Avatar
    Yesterday, 22:42
    The thing is, it's not just libpng that you need to update. Lots of software doesn't use libpng, but just statically links some simple png decoder like lodepng. Getting all png-decoding software upgraded is way harder than just updating libpng and waiting long enough. I don't think it's a substantially easier task than getting them to support, say, jxl.
    49 replies | 3915 view(s)
  • e8c's Avatar
    Yesterday, 22:22
    Sounds like "... not everything will use the latest version of Linux Kernel. Revising or adding features to an already deployed Linux is just as hard as introducing a new Operating System."
    49 replies | 3915 view(s)
  • Jon Sneyers's Avatar
    Yesterday, 22:04
    I guess you could do that, but it would be a new kind of PNG that wouldn't work anywhere. Not everything uses libpng, and not everything will use the latest version. Revising or adding features to an already deployed format is just as hard as introducing a new format.
    49 replies | 3915 view(s)
  • Shelwien's Avatar
    Yesterday, 18:36
    0) -1 is not the fastest mode (the --fast=# modes are faster). It seems that --fast disables Huffman coding for literals, but still keeps FSE for matches.
    1) Yes, Huffman for literals, FSE for matches - or nothing if the block is incompressible.
    2) No. Certainly no preprocessing, but maybe you'd say that the "--adapt" mode is related.
    3) https://github.com/facebook/zstd/blob/dev/lib/compress/zstd_compress.c#L5003 - 16k cells
    4) The library works with user buffers of whatever size; zstdcli seems to load 128KB blocks with fread. https://github.com/facebook/zstd/blob/69085db61c134192541208826c6d9dcf64f93fcf/lib/zstd.h#L108
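    A quick way to see points 0) and 3) on your own data with the zstd CLI (file name hypothetical; --zstd=hlog=# is an advanced-parameter override):
        zstd -1 -c corpus.bin | wc -c                   # level 1: Huffman literals + FSE matches
        zstd --fast=1 -c corpus.bin | wc -c             # --fast: raw literals, FSE matches only
        zstd -1 --zstd=hlog=14 -c corpus.bin | wc -c    # force a 2^14-cell hash table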
    8 replies | 1030 view(s)
  • moisesmcardona's Avatar
    Yesterday, 17:21
    moisesmcardona replied to a thread paq8px in Data Compression
    As we said, we will not be reviewing code that is not submitted via Git.
    2302 replies | 608663 view(s)
  • e8c's Avatar
    Yesterday, 17:12
    (Khmm ...) "this feature in ... PNG" means "this feature in ... LibPNG": transcoding JPG -> PNG, result PNG smaller than original JPG. (Just for clarity.)
    49 replies | 3915 view(s)
  • lz77's Avatar
    Yesterday, 17:06
    4 questions about the fastest compression mode (zstd.exe -1 ...):
    1. Does zstd use additional compression (Huffman, FSE) besides LZ?
    2. Is there any optimization (E8/E9 etc.) or analysis of the source data to configure the compression algorithm?
    3. What is the size of the hash table with this option (how many cells, and are there chains)?
    4. What is the size of the input buffer?
    Thanks.
    8 replies | 1030 view(s)
  • Jon Sneyers's Avatar
    Yesterday, 15:14
    Bit-exact PNG file reconstruction gets a bit tricky because, unlike JPEG which uses just Huffman, PNG uses Deflate, which has way more degrees of freedom in how to encode. Emulating popular existing simple PNG encoders could help for most cases encountered in practice, but comes at the cost of having to include those encoders in the reconstruction method. To be honest, I am more interested in non-bit-exact recompression for PNG, which is still lossless in terms of the image data and metadata. For PSD (Photoshop) it might be worthwhile to have bit-exact recompression though - at least for the common case where the image data itself is uncompressed or PackBits (RLE). It shouldn't be hard to get very good recompression ratios on those, and the recompressed jxl file would be viewable in anything that supports jxl, which will hopefully soon be more than what supports psd viewing.
    49 replies | 3915 view(s)
  • kaitz's Avatar
    Yesterday, 14:09
    kaitz replied to a thread Paq8pxd dict in Data Compression
    paq8pxd_v100
    - add lstm model back (active on -x option), used on all predictors except audio
    - add matchModel from paq8px_v201 as second model
    - adjust old matchModel parameters
    - tar header as bintext
    - add back 1 mixer context if DECA
    - in sparsemodel (default) add 2 contexts
    - adjust normalModel
    Fixes https://encode.su/threads/1464-Paq8pxd-dict?p=66290&viewfull=1#post66290
    1035 replies | 364565 view(s)
  • schnaader's Avatar
    Yesterday, 13:24
    Might be possible after integrating image compressors (currently planned: FLIF, WebP, JPEG XL) into Precomp. Depending on which one gives the best result, the new file size will be a bit larger than that because of the zlib reconstruction data, but the original PNG file can be restored bit-to-bit lossless.
    As a synergetic side project, the zlib reconstruction data and other PNG metadata could be stored as binary metadata in the new format. For FLIF, I'm quite sure that arbitrary metadata can be stored, and I'd expect this to be possible in the two other formats, too. This way, a fully lossless .png <-> .flif/.webp/.jxl round trip would be possible: the resulting files would be viewable and the original PNG file could be restored. A checksum of the PNG would have to be stored to prevent/invalidate restoration attempts after editing the new file, because the original PNG obviously can't be restored after altering the image data.
    The size of the reconstruction data differs depending on what was used to create the PNG. A rough guess: if the image data can be compressed to 50% of the PNG size, the resulting file including restoration data would usually end up between 50% and 75% of the PNG - though edge cases with >= 100% are possible too (e.g. compressing random noise and using a PNG optimizer). Of course, integration from the other side would also be possible, by integrating preflate and PNG parsing into some webp/jpegxl transcoding side project.
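    Until then, the zlib-reconstruction half of this can already be tried with current Precomp on its own (file name hypothetical; -cn stores the unpacked streams plus reconstruction data without recompressing them):
        precomp -cn image.png    # writes image.pcf with deflate streams expanded + reconstruction info
        precomp -r image.pcf     # restores a bit-identical image.png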
    49 replies | 3915 view(s)
  • suryakandau@yahoo.co.id's Avatar
    Yesterday, 05:38
    I just added a mixer context set in textmodel.cpp and simdlstmmodel.hpp...
    2302 replies | 608663 view(s)
  • Alexander Rhatushnyak's Avatar
    Yesterday, 02:19
    And then another one just like him, mikle1, says: "Dragging minors into political squabbles! Nothing more immoral, well, more vile, could possibly be invented!" -- at 12:38 of his 17-minute video. Wearing a T-shirt that doesn't match his convictions. :_down2: to both of them.
    13 replies | 878 view(s)
  • e8c's Avatar
    Yesterday, 02:08
    https://cloudinary.com/blog/legacy_and_transition_creating_a_new_universal_image_codec There are now UFS 3+ and NVMe / SATA 3 SSDs that are fast, cheap, and big enough to handle lossless imaging. The lossy format looks like a thing from the past. Efficient lossless transcoding of existing JPEGs is cool. I want to see this feature in... PNG.
    49 replies | 3915 view(s)
  • natanpro's Avatar
    23rd January 2021, 21:21
    Any chance for an updated mingw64 version? (The sources were updated about a day ago.)
    49 replies | 3915 view(s)
  • WinnieW's Avatar
    23rd January 2021, 14:35
    FWIW I use Zstandard for heavy multi-gigabyte backups, with the command line options -12 --long=30. I'm quite content with it.
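    A full invocation along those lines might look like this (archive names hypothetical; an archive created with --long=30 also needs --long=30, or a large enough --memory=, at decompression time):
        zstd -12 --long=30 -T0 vm-backup.tar -o vm-backup.tar.zst
        zstd -d --long=30 vm-backup.tar.zst -o vm-backup.tar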
    16 replies | 662 view(s)
  • lz77's Avatar
    23rd January 2021, 13:52
    lz77 replied to a thread Lucky again in The Off-Topic Lounge
    ​https://www.youtube.com/watch?v=QwLb43aXxHU
    13 replies | 878 view(s)
  • Krishty's Avatar
    23rd January 2021, 12:30
    I released a new version last evening (RSS readers should notice): https://papas-best.com/optimizer_en#download I accidentally deleted the EXIF rotation tag if a JPG file could not be derotated in lossless mode. If you have a Huawei phone and, alas, a bunch of JPGs in the ungodly resolution 5312×2988 (not evenly divisible into JPG blocks), they may lose all rotation. This is fixed now. There is another problem lurking below: mozjpegtran fails to rotate some images where normal jpegtran succeeds. I suspect this happens if a JPG file grows after derotation. It seems to be fixed in the current build (4.0.1-rc2), but I'd like to test it internally for a week before I ship it. On my PC, only three files out of a few thousand were affected, so it is not a high-priority problem anyway.
    91 replies | 33110 view(s)
  • fcorbelli's Avatar
    23rd January 2021, 12:06
    For me the answer is easy: the one that scales better on multicore, just like pigz. Single-thread performance is useless. On the implementation side: the one which can extensively use HW SSE instructions. Compression ratio is irrelevant; only speed (and limited RAM usage) matters. In two words: a deduplicated pigz (aka deflate). Or lz4 for decompression speed (not so relevant). In fact I use this one (storing the deduplicated archive on zfs).
    16 replies | 662 view(s)
  • Shelwien's Avatar
    23rd January 2021, 03:08
    Again, this is not about the archive format, and obviously there'd be dedup, MT, etc. The question is which class of compression algorithms to use as a base for development. Compression-wise, my PLZMA almost fits, but encoding with parsing optimization is too slow. A fast CM (like MCM, or nzcc) fits by CR and, potentially, encoding speed (it can be significantly improved, though only for encoding, with context sorting and out-of-order probability evaluation), but there's no solution for decoding speed. And BWT fits both by enc and dec speed, and even CR on text, but BWT CR on binaries is relatively bad. Plus there're preprocessors and hybrid options - plenty of choices, which is the problem.
    1,048,576 corpus_VDI_pcf_x3.1M
    249,592 corpus_VDI_pcf_x3.1M.lzma      1048576/249592 = 4.20   (276366/249592-1)*100 = 10.72%
    243,743 corpus_VDI_pcf_x3.1M.plzma_c1  1048576/243743 = 4.30   (276366/243743-1)*100 = 13.38%
    248,687 corpus_VDI_pcf_x3.1M.rz        1048576/248687 = 4.22   (276366/248687-1)*100 = 11.13%
    276,366 corpus_VDI_pcf_x3.1M.zst       1048576/276366 = 3.79
    276,403 corpus_VDI_pcf_x3.1M.lzma_a0   // lzma -a0 -d20 -fb8 -mc4 -lc0 -lp0
    533,864 corpus_VDI_pcf_x3.1M.lz4-1
    369,616 corpus_VDI_pcf_x3.1M.lz4-1.c7lit_c2
    443,586 corpus_VDI_pcf_x3.1M.lz4-12
    355,800 corpus_VDI_pcf_x3.1M.lz4-12.c7lit_c2
    707,961 corpus_VDI_pcf_x3.1M.LZP-DS
    236,180 corpus_VDI_pcf_x3.1M.LZP-DS.c7lit_c2
    391,962 corpus_VDI_pcf_x3.1M.lzpre
    306,616 corpus_VDI_pcf_x3.1M.lzpre.c7lit_c2
    16 replies | 662 view(s)
  • JamesB's Avatar
    23rd January 2021, 01:49
    I get that patents generated huge revenues and funded numerous people working on improved technologies, but it's no longer the only model. The theory that unless someone pays for it there is no way to fund the next generation of research is blinkered in the modern world. Huge organisations collectively spend billions on infrastructure - storage, network bandwidth, etc. The more enlightened ones are not really interested in owning a file format or a codec technology - they view their product as being content providers, and the mundane nuts and bolts are in the "precompetitive" area. Jointly funding research to reduce their overheads is a really obvious win. Hence why AOMedia exists. The flip side is the old-school members of MPEG that ended up bickering over patent pools and essentially killed MPEG as an organisation. I find it sad, and intensely annoying, that many of those have now turned their attention to other markets such as genomics. I've spent months battling stupid patents, but ultimately I've given up. It's just whack-a-mole. The entire thing is a broken model for all bar a very few select cases.
    2 replies | 312 view(s)
  • pklat's Avatar
    22nd January 2021, 20:54
    I think you should consider using 'Linux containers' or similar if possible; they should use space and other resources much more efficiently. Dunno about security.
    16 replies | 662 view(s)
  • Jyrki Alakuijala's Avatar
    22nd January 2021, 20:01
    I have landed four improvements on this topic -- however not yet in the public gitlab. I have not checked if they are effective for your use case, and suspect that some more iterations are needed. Please keep sending these examples, they are very inspiring.
    49 replies | 3915 view(s)
  • fcorbelli's Avatar
    22nd January 2021, 19:50
    As I tried to explain, the compression ratio of virtual machine disks is the last, but the very last, of the aspects considered when operating with VMs. I enumerate the necessary requirements for those who want to develop their own:
    0) versioning "a-la-Time-Machine"
    1) deduplication. This is the most important, indeed fundamental, element to save space during versioned copies
    2) highly parallelizable compression. Server CPUs are typically low clocked, but with many cores. Therefore the maximum performance obtainable by a certain algorithm on a single core is almost irrelevant.
    3) since the problem is clearly IO-bound, ideally a program should be able to process in parallel data streams arriving from different media (e.g. multiple NFS shares). But it is not a real requirement; the point is the RAM consumption of multiple processes launched in the background with &
    4) works with really large files (I've had thousands of terabytes), with low RAM (~20/30GB, not more). RAM is precious on a VM server. Specific compression machines are expensive, delicate, and fail
    5) decompression performance is, again, IO-bound rather than CPU-bound. So a system that, for example, does NOT seek when extracting (as ZPAQ does) is excellent. Even if you absurdly compress a 500GB virtual disk by 98% into 10GB, you still have to write 500GB on extraction, and you will pay the "writing cost" (time) of 500GB
    6) an advanced and fast copy verification mechanism. Any unverified backup is not eligible. A fast checking mechanism is even more important than fast software. So, ideally, you need a check that does NOT require massive data extraction (which we know is really huge). That is... the mechanism of ZPAQ (!). Keep the hashes of the decompressed blocks, so that you do NOT have to decompress the data to verify them. Clearly using TWO different algorithms (... like I do ...) against hash collisions, if paranoid
    7) easy portability between Windows, Linux and *nix systems. No strange compilation paradigms, libraries etc.
    8) append-only format, to use rsync or whatever. Otherwise you simply cannot even move the backups (if you do not have days to spare)
    9) reliability, reliability, reliability. No software "chains", where bugs and limitations can add up.
    =====
    Just today I'm restoring a small VirtualBox Windows server with a 400GB drive. Even assuming you get 100MB/s of sustained rate (a normal value for a virtualization server under normal load), it takes over an hour just to read it. Obviously I didn't do that, but a zfs snapshot and a copy of yesterday's version (about 15 minutes).
    In the real world you make a minimum of one backup per day (in fact 6+). This gives you no more than 24 hours to do a backup (typically 6 hours, 23:00-05:00, plus 1 hour until 06:00 of uploading to a remote site). With a single small server with 1TB (just about a home server) this means 10^12 / 86,400 = ~10MB/s as a lower bound. In fact, for the 6-hour window this is ~50MB/s per terabyte. This is about the performance of Zip or whatever. For a small SOHO with 10TB, ~500MB/s for 6 hours. This is much more than a typical server can do. For a medium-size vSphere server it soon becomes challenging, needing an external cruncher (I use an AMD 3950X), a blazing-fast network (not cheap at all), and a lot of effort.
    To recap: the amount of data is so gargantuan that hoping to compress it with something really efficient, within a few hours, becomes unrealistic. Unzipping is also no small problem for thick disks. If it takes a week to compress'n'test a set of VM images, you get one backup per week. Not quite ideal. Moving the data to a different server and then having it compressed "calmly" also doesn't work. There is simply too much. Often compression is completely disabled (for example, leaving it to the OS with LZ4). This is my thirty years of experience in data storage, and twenty-five in virtual data storage.
    16 replies | 662 view(s)
  • JamesB's Avatar
    22nd January 2021, 18:01
    It's also worth considering that this is two way. The data compression community can improve bioinformatics tools, but vice versa is true too. I strongly suspect there may be applications for using minimisers as an alternative to hashing in rapid-LZ applications or when doing dedup. The ability to quickly identify matches across datasets 10s of GB in size is something the bioinformatics community has put a lot of effort into. Similarly some of the rapid approximate alignment algorithms may give better ways of describing not-quite-perfect matches instead of a series of neighbouring LZ steps.
    39 replies | 2077 view(s)
  • pklat's Avatar
    22nd January 2021, 17:13
    If you've got several of them, I'd defragment them, perhaps make sparse files, and then use lrzip.
    16 replies | 662 view(s)
  • Shelwien's Avatar
    22nd January 2021, 16:48
    This is not about using some already available tool, but more about the development of one. zpaq doesn't fit at all in that sense, because all of its algorithms are worse than other known open-source ones. Yes, it's nice that zpaq is a working integrated solution, and I really appreciate that you're trying to improve it. But this thread is about designing a compression algorithm with given constraints. These constraints are a little ahead of the current state of the art, and there are multiple ways to approach them (making a stronger/slower LZ77 or ROLZ, speed-optimizing a fast CM, finding a fitting BWT/postcoder setup, some LZ/CM hybrid maybe, etc.), so I'd like to know what other developers think about this.
    16 replies | 662 view(s)
  • Gotty's Avatar
    22nd January 2021, 15:06
    Gotty replied to a thread paq8px in Data Compression
    Yes, as Luca said and also Moises said and also I said before: don't post source code here. It's quite cumbersome to find the changes in your contributions that way. Do a pull request to my branch instead if you would like to have it reviewed. Would you do that, please? I'm not an authority here, but I must act as one, as I haven't seen a modification/improvement from your side that was bug-free or issue-free. Maybe your current one is an exception, I don't know yet. But I'm willing to review it only when you use git. No source code in the forum please. That's not reviewer-friendly.
    2302 replies | 608663 view(s)
  • LucaBiondi's Avatar
    22nd January 2021, 14:39
    LucaBiondi replied to a thread paq8px in Data Compression
    Hi suryakandau, could you post your modifications, for example as a diff, please? I would like to learn (slowly) how paq8px works... Thank you, Luca
    2302 replies | 608663 view(s)
  • fcorbelli's Avatar
    22nd January 2021, 12:41
    When making backups of virtual disks, even worse thick ones, it is normal to take, say, 400GB per image. If you have even only 2 or 3 of them it's one terabyte just for a home server; in production, easily 10TB per DAY. There is not a single x86 CPU that can compress this amount of data with a high compression ratio. Bandwidth dominates the problem; it is IO-bound, not CPU-bound. Whether the backup is 370GB or 390GB makes no difference at all. If 3000GB or 3600GB, even less. For quick backups the answer is a differential zfs send (not incremental), pigz-ed. It requires zfs, lots of RAM and fast disks. It is doable: I have done it every day for years. But restore is painful, and extensive zfs expertise is needed. I make intermediate backups with zfs (hourly) and nighttime zpaqfranz, plus ZIP (yes, 7z in ZIP mode).
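    A minimal sketch of that zfs-send-plus-pigz plumbing (pool, dataset and snapshot names hypothetical; whether you send the full snapshot or a delta with -i depends on the retention scheme):
        zfs snapshot tank/vm@nightly-2021-01-22
        zfs send tank/vm@nightly-2021-01-22 | pigz -1 -p 16 > /backup/vm-nightly-2021-01-22.zfs.gz
        # restore: gunzip -c /backup/vm-nightly-2021-01-22.zfs.gz | zfs receive tank/vm_restored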
    16 replies | 662 view(s)
  • fcorbelli's Avatar
    22nd January 2021, 12:30
    Zpaq is the only answer. I have used it for the same job for years, up to today. Compression speed is decent (150/200MB/s on a modern server). Deduplication is very good. High compression is totally a waste of time for virtual disks: m1, or even m0 (dedup only). I would prefer pigz -1, but it is too hard to merge into zpaq. It simply... WORKS, even with very big files. Decompression is slow with magnetic disks, but... who cares? Better, of course, is my zpaqfranz fork. It compiles on BSD, Linux and Windows, with a decent GUI for Windows (pakka), a la Time Machine.
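    A minimal zpaq invocation in that spirit (archive and directory paths hypothetical; -method 1 favors dedup and fast LZ over ratio):
        zpaq add /backup/vms.zpaq /vm/disks/ -method 1 -threads 16
        zpaq extract /backup/vms.zpaq -to /restore/ -threads 16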
    16 replies | 662 view(s)
  • kaitz's Avatar
    22nd January 2021, 04:11
    kaitz replied to a thread Paq8pxd dict in Data Compression
    Silesia:
    1035 replies | 364565 view(s)
  • suryakandau@yahoo.co.id's Avatar
    22nd January 2021, 03:06
    I have made a little improvement to the lstm model by adding 1 mixer context set and 2 mixer inputs. @Darek, could you test it on the Silesia files using the -12lrta option? I think it can save more space. Thank you!!
    2302 replies | 608663 view(s)
  • Jyrki Alakuijala's Avatar
    21st January 2021, 23:54
    Funnily, this was a starting point for brunsli. In the first six weeks of brunsli development (back in 2014) we migrated away from clustering, because context modeling (and even prediction) was stronger for our JPEG recompression corpus. That doesn't mean it couldn't be more effective now for another domain; just that back then Zoltan and I were not able to get fantastic results with this approach on JPEG recompression modeling. We got about -14% of JPEG size with this kind of idea (k-means in five dimensions for entropy coding) and -22% with the ideas expressed in brunsli or JPEG XL. In WebP lossless we sometimes have 700+ entropy codes for different ergodic processes -- so having 3 to 255 is not necessarily excessive.
    5 replies | 643 view(s)
  • danlock's Avatar
    21st January 2021, 22:13
    New approaches to compression and image processing are exciting and invigorating! I hope they stimulate others who have image compression specialization to consider new perspectives as well! Watching this method evolve will be a lot of fun! Thanks for introducing us to your Rust image compression library!
    5 replies | 643 view(s)
  • Shelwien's Avatar
    21st January 2021, 20:08
    Shelwien replied to a thread paq8px in Data Compression
    fastest method is probably https://www.facebook.com/mattmahoneyfl
    2302 replies | 608663 view(s)
  • CompressMaster's Avatar
    21st January 2021, 20:00
    CompressMaster replied to a thread paq8px in Data Compression
    @Darek, I highly doubt that this is the proper way for reaching Mr. Matt Mahoney. Better to navigate to his website mattmahoney.net and contact him from there.
    2302 replies | 608663 view(s)
  • Shelwien's Avatar
    21st January 2021, 19:55
    100,000,000 mixed100.cut
    27,672,053 m1.zpaq
    24,873,818 m2.zpaq
    19,040,766 m3.zpaq
    17,735,276 mixed100.cut.zst
    15,180,571 mixed100.cut.lzma
    13,465,031 m5.zpaq
    No, zpaq is totally useless in this case, since its LZ and BWT are subpar and its CM is too slow. In any case, even -m1 -t1 encoding is already slower than 50MB/s, and -m2 is more like 5MB/s. -m5 compression is good, but 0.5MB/s... there're much faster CMs around.
    16 replies | 662 view(s)
  • ivan2k2's Avatar
    21st January 2021, 18:50
    ZPAQ maybe? It has fast modes like -m1/2/3, or you can play with a custom compression mode (-m x.......); it takes some time to find a good one. Just check the ZPAQ thread.
    16 replies | 662 view(s)
  • kaitz's Avatar
    21st January 2021, 18:39
    kaitz replied to a thread Paq8pxd dict in Data Compression
    paq8pxd_v99
    - xml/html-like content processed separately in wordmodel
    - adjust some wordmodel parameters
    - attempt to detect ISO Latin text
    - some changes in detection (cleanup)
    Change 1 helps on silesia webster/xml.
    Change 2 helps a bit on most text files (1k loss on enwik8).
    Change 3 helps on silesia samba, maybe mozilla, and any other text file previously detected as bintext.
    Change 4: in tar mode some files are treated as text by extension without confirming (.c, .h, .html, .cpp, .po, .txt), just a bit faster processing (for example, the Linux kernel source tar).
    1035 replies | 364565 view(s)
  • Shelwien's Avatar
    21st January 2021, 17:03
    - VM image data (mostly exes and other binaries)
    - compression ratio 15% higher than zstd (lzma is ~10% better, but its encoding is too slow, or there's no gain without parsing optimization)
    - encoding speed = 50MB/s or higher (single thread)
    - decoding speed = 20MB/s or higher (single thread)
    Any ideas? Preprocessors can be used, but they are also applicable to zstd. CM actually might fit by CR and enc speed (with context sorting etc.), but decoding speed is a problem. PPM doesn't fit because its CR is bad on this data.
    16 replies | 662 view(s)
  • Gotty's Avatar
    21st January 2021, 15:49
    Gotty replied to a thread paq8px in Data Compression
    The change I made in normalmodel ("NormalModel now includes the BlockType in its contexts") in v199 is not fully compatible with text pre-training. Calgary/trans includes mostly text but is detected as "default", so it lost the boost given by text pre-training. A fix is coming in my next version.
    2302 replies | 608663 view(s)
  • suryakandau@yahoo.co.id's Avatar
    21st January 2021, 14:09
    @darek which option do you use ?
    2302 replies | 608663 view(s)
  • Darek's Avatar
    21st January 2021, 13:17
    Darek replied to a thread paq8px in Data Compression
    Scores of 4 corpora for paq8px v201.
    For the Calgary corpus there is no record, however most of the files got their best results. The strange thing is that the "trans" file got a 5.6% loss between versions v198 and v199.
    For the Canterbury corpus there is a record for total file compression, but no record for the tar file due to the mentioned change between v192 and v193.
    For the MaximumCompression corpus - best score in the paq8px series, about 5KB of gain and 5.5KB of gain for the tar file.
    For the Silesia corpus - there is a record score (15.5KB of gain) and THE BEST SCORE overall - that means paq8px beat the best cmix score (with precompressor)!!!!
    Request for Matt Mahoney: could you add this score to the official Silesia page?
    Detailed scores are as follows (compressor version: paq8px_v201, no precompression):
    FILE | score | truncated score
    dickens | 1,796,777 | 1796
    mozilla | 6,569,742 | 6569
    mr | 1,839,331 | 1839
    nci | 815,620 | 815
    ooffice | 1,183,903 | 1183
    osdb | 1,978,882 | 1978
    reymont | 698,148 | 698
    samba | 1,625,828 | 1625
    sao | 3,733,172 | 3733
    webster | 4,413,395 | 4413
    x-ray | 3,521,326 | 3521
    xml | 250,710 | 250
    TOTAL | 28,426,834 | 28426
    2302 replies | 608663 view(s)
  • skal's Avatar
    21st January 2021, 09:44
    Glad you noticed the introduction of palette + predictors :) Yes, since right now it's doing a pretty exhaustive search for predictors and transforms to apply. Going forward, some heuristics will help. We're trying to work with premultiplied Argb as much as possible, which means the RGB can't really be preserved under alpha=0 (it's rather a niche use case, despite regular complaints about it, even for v1, where we introduced the -exact flag).
    66 replies | 4781 view(s)
  • skal's Avatar
    21st January 2021, 09:25
    Looking at the slide #10 here: https://www.itu.int/en/ITU-T/Workshops-and-Seminars/20191008/Documents/Guido_Meardi_Presentation.pdf it seems the base layer is now coded at half the resolution (instead of full). The extra enhancement layer is then coded more efficiently. Hence the global gain.
    1 replies | 202 view(s)
  • spaceship9876's Avatar
    21st January 2021, 05:40
    The future AV2 codec might be released in 2026, which is when nearly all H.264 patents will have expired. Sisvel claims that hundreds of patents in their patent pool are violated by AV1. The organisation 'Unified Patents' is working on invalidating many H.265 and AV1 patents in the courts at the moment. It is difficult to know whether EVC will become popular; I highly doubt it. This is because AV1 is already available and hardware decoders are already available. I think the creation of EVC is to try to pressure the companies involved in the creation of VVC/H.266 to choose low patent fees and fair conditions such as no streaming fees. This is what happened with Microsoft's VC-1 codec: it forced the MPEG-LA to make H.264 cheap to license, with good conditions.
    2 replies | 312 view(s)
  • SolidComp's Avatar
    21st January 2021, 03:57
    Hi all – Another next-generation video codec is Essential Video Coding (EVC). This is also an example of a dual-layer approach, but it doesn't use an existing codec as the base layer like LCEVC does. Rather, it uses expired patents as the base layer, which is freaking brilliant! What I mean is that the base codec is intended to be free and unencumbered by patents or licensing. But it's a new codec, one that only uses innovations or techniques that were patented at least 20 years ago and have therefore expired. (There are also some patent grants or something thrown in, but it's mostly about expired patents.) The base codec is supposed to be something like 30% more efficient than H.264, so it's probably only slightly worse than H.265/HEVC, and might be similar to VP9 but with less complexity and more speed. The enhanced version might cost money to license, and is supposed to be about 24% more efficient than H.265/HEVC. Here's an article that talks about it. Have any other projects used the expired-patents method before as a formal constraint in developing a codec? Is that how free codecs like Opus were developed? It's fascinating, and of course more possibilities steadily emerge as each day passes and more patents expire. It seems like there might be an opportunity to apply advanced software to leveraging patents this way, but it would probably require something closer to real AI than what Silicon Valley currently labels as "AI".
    2 replies | 312 view(s)
  • SolidComp's Avatar
    21st January 2021, 03:43
    Hi all – Can anyone explain how LCEVC works? It's supposed to leverage an existing codec (e.g. H.264 or VP9) and adds an "enhancement layer". The end result is something like a 28% data savings. I've read a few things about it, but I don't understand where the trick is, the secret sauce. How can adding a layer to an existing encode save data? Here's a recent test comparing H.264 to LCEVC using H.264 as the base layer. Thanks.
    1 replies | 202 view(s)
  • e8c's Avatar
    20th January 2021, 20:48
    e8c replied to a thread Lucky again in The Off-Topic Lounge
    Well, for example, I don't recall any "bad" deeds committed by Mikhail. (Just don't bring up the results of privatization: from my point of view, most people who received ownership of housing built in Soviet times have no more right to it than Misha, who bought several enterprises for pennies (relative to their real value) at the "loans-for-shares" auctions.)
    First, he is not an oligarch. The oligarchs are the Rotenberg(s), Usmanov, Deripaska, Abramovich. A non-oligarch is, for example, Mordashov: he doesn't run any schemes with the "Kremlin crowd", pays into their common fund when (and as much as) he's told, and otherwise keeps his head down. Khodorkovsky and Mordashov are similar; the difference is that Mordashov is comfortable being a lackey of the "Kremlin crowd" (which is why he didn't lose his property), while Misha didn't much like that prospect. He long ago chose for himself the role of "the guy who stands up for everything good (well, mostly)" (https://youtu.be/alu-03QTJvs); then he developed political ambitions, and the dude decided to "compete" with the ex-KGB without having a stronger personal army at his disposal.
    Second, why add the adjective "Jewish"? If you look at the posts of user @lz77 that mention money in any way, the adjective "Jewish" looks like envy of those who are luckier at lining their own pockets. And it definitely does not look like criticism of people who put profit above all else. In general, Khodorkovsky is a political zero, and what he does is spend money on third-rate media ("Open Media" has produced practically not a single high-profile investigation). At the moment, Khodorkovsky is effectively a nobody.
    13 replies | 878 view(s)
  • Gotty's Avatar
    20th January 2021, 18:39
    Gotty replied to a thread paq8px in Data Compression
    Paq8px has a bunch of "general" models for "general data", and specialized models for each kind of special data: audio, jpg and bitmap images. If paq8px doesn't see anything "special" about the data (which also applies to the case when it can't detect the file type (blockType) because of some data corruption), it will just use its default models to compress it. Additional note: there are a couple of special blocktypes that paq8px can transform before compression, like zip, cdrom, exe. If it fails to detect these formats then no transformation takes place, and without these special compression-helping transformations the compression ratio will be somewhat worse.
    2302 replies | 608663 view(s)
  • CompressMaster's Avatar
    20th January 2021, 17:44
    CompressMaster replied to a thread paq8px in Data Compression
    How is compression of broken files handled in paq8px? Does it refuse to compress at all, or does it at least try? By "broken" I don't mean text files with missing characters; I mean, for example, a JPG without a header, a PDF corrupted by a virus, etc.
    2302 replies | 608663 view(s)
  • fcorbelli's Avatar
    20th January 2021, 17:36
    fcorbelli replied to a thread zpaq updates in Data Compression
    Just to keep https://github.com/fcorbelli/zpaqfranz
    2655 replies | 1132078 view(s)
  • lz77's Avatar
    20th January 2021, 17:21
    lz77 replied to a thread Lucky again in The Off-Topic Lounge
    Try asking Navalny to say something bad about Khodorkovsky, or just to say the phrase "Jewish oligarch." I guess it will be very difficult for him to do this. :) Although, with his revelations, he does benefit people. It's not clear who is taking down whom: Putin Navalny, or Navalny Putin. :)
    13 replies | 878 view(s)
  • Sportman's Avatar
    20th January 2021, 13:16
    Sportman replied to a thread Lucky again in The Off-Topic Lounge
    Flat upgrade: https://www.youtube.com/watch?v=ipAnwilMncI
    13 replies | 878 view(s)
  • umgefahren's Avatar
    20th January 2021, 13:15
    I have my results! The whole folder with all the compressed images is 305,598,693 bytes in size. It took 405.28 seconds to compress them and 9.86 seconds to decompress them. I used this image set: RGB 8 bit. My compression ratio is 1.539966344 on the whole image set. The compression ratios of the individual images are:
    | File Name | Compression Ratio |
    |------------------------|--------------------|
    | spider_web.ppm | 2.14235671557331 |
    | deer.ppm | 1.2424318516015507 |
    | fireworks.ppm | 3.642381743674327 |
    | artificial.ppm | 12.25476523000428 |
    | bridge.ppm | 1.2273064711294759 |
    | flower_foveon.ppm | 2.4469685311217293 |
    | big_tree.ppm | 1.2789847127858722 |
    | cathedral.ppm | 1.5089509013690656 |
    | hdr.ppm | 1.9960575653205344 |
    | leaves_iso_1600.ppm | 1.203903570936856 |
    | big_building.ppm | 1.3922857035699863 |
    | nightshot_iso_1600.ppm | 1.501047996887146 |
    | nightshot_iso_100.ppm | 2.251600481220427 |
    | leaves_iso_200.ppm | 1.3158267828823695 |
    The table function seems broken, I hope this is fine.
    5 replies | 643 view(s)
  • cssignet's Avatar
    20th January 2021, 11:09
    - do you think there is room for improvement in encoding speed (if yes, how far from v1 efficiency)?
    - would you plan to allow straight alpha and preserve RGB if a=0? (or is it possible atm? — I did not check)
    66 replies | 4781 view(s)
  • cssignet's Avatar
    20th January 2021, 11:08
    571c204:
    >cwp2 -q 100 -effort 1 -mt 8 box.png
    output size: 70990 (2.17 bpp)
    Kernel Time  = 0.000 = 00:00:00.000 = 0%
    User Time    = 0.265 = 00:00:00.265 = 282%
    Process Time = 0.265 = 00:00:00.265 = 282%
    Global Time  = 0.094 = 00:00:00.094 = 100%
    >cwp2 -q 100 -mt 8 box.png
    output size: 60658 (1.85 bpp)
    Kernel Time  = 0.015 = 00:00:00.015 = 4%
    User Time    = 1.031 = 00:00:01.031 = 314%
    Process Time = 1.046 = 00:00:01.046 = 319%
    Global Time  = 0.328 = 00:00:00.328 = 100%
    3db306e:
    >cwp2 -q 100 -effort 1 -mt 8 box.png
    output size: 55907 (1.71 bpp)
    Kernel Time  = 0.015 = 00:00:00.015 = 10%
    User Time    = 0.437 = 00:00:00.437 = 280%
    Process Time = 0.453 = 00:00:00.453 = 290%
    Global Time  = 0.156 = 00:00:00.156 = 100%
    >cwp2 -q 100 -mt 8 box.png
    output size: 50733 (1.55 bpp)
    Kernel Time  = 0.046 = 00:00:00.046 = 8%
    User Time    = 1.828 = 00:00:01.828 = 334%
    Process Time = 1.875 = 00:00:01.875 = 342%
    Global Time  = 0.547 = 00:00:00.547 = 100%
    :cool:
    66 replies | 4781 view(s)
  • umgefahren's Avatar
    20th January 2021, 09:45
    I would love to. Give me some time.
    5 replies | 643 view(s)
  • Kirr's Avatar
    20th January 2021, 09:05
    Thanks Innar, and please take good rest and get well!
    One straightforward path for compression improvements to affect biology knowledge is by improving the accuracy of alignment-free sequence comparison. Better compression means a better approximation of Kolmogorov complexity and more accurate information distances. This can be used for many purposes, e.g., for comparing entire genomes, or investigating the history of repeat expansions. I'm not sure if improved compression can help in studying coronaviruses specifically, because coronaviruses can be easily aligned, which allows more accurate comparison without compression. But many other topics can greatly benefit from better compression. E.g. see Zielezinski et al. (2019) for an overview.
    One other thing. I think there's too much emphasis on compression strength in this field. This is understandable, because in information science we dream about computing Kolmogorov complexity, so any step closer to approximating it must be welcome. However, compressor users often have a different balance of priorities, where compression strength is just one of several useful qualities. (This again explains the longevity of gzip in genomics.) I realized that many compressor developers mainly care about compression strength. They will spend huge effort fine-tuning their method to gain an extra 0.01% of compactness. But they are fine if their compressor works only on DNA sequence (no support for sequence names, N, IUPAC, or even end of line in some cases). Or if their compressor takes days (or weeks) to compress a genome (more problematic, but still common, is when it takes days for decompression too). Maybe it feels great to get that 0.01% compactness, but it's often disconnected from applications. What users want in a compressor is a combination of reliability, strength, speed, compatibility and ease of use.
    The funny thing is that I did not want to develop a compressor. But I wanted to use one, because I was routinely transferring huge data back and forth among computation nodes. I was shocked to realize that among a ton of available DNA compressors there was not one suitable for my purposes. (Never mind another ton of papers describing compression methods without providing any actual compressor.) Currently, NAF is perfect for my personal needs. But if I ask myself how it could be made even better, the answer (for me as a user) is not just "20% better compactness" (even though that would be great too). Instead it may be something like:
    1. Random access (without sacrificing compression strength much).
    2. A library for easier integration into other tools.
    3. A built-in simple substring searcher.
    4. Something about FASTQ qualities. (:))
    etc. E.g., FASTAFS (Hoogstrate et al. 2020) is an interesting recent development for practical uses.
    Zielezinski et al. (2019) "Benchmarking of alignment-free sequence comparison methods" Genome Biology, 20:144, https://doi.org/10.1186/s13059-019-1755-7
    Hoogstrate et al. (2020) "FASTAFS: file system virtualisation of random access compressed FASTA files" https://www.biorxiv.org/content/10.1101/2020.11.11.377689v1.full
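    For concreteness, the information distance in question is typically approximated by the normalized compression distance, NCD(x,y) = (C(xy) - min(C(x),C(y))) / max(C(x),C(y)), where C(s) is the compressed size of s and xy is the concatenation of the two sequences; any gain in compression strength directly tightens that distance estimate.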
    39 replies | 2077 view(s)
  • Shelwien's Avatar
    20th January 2021, 02:29
    Welcome! Can you post some test results vs other lossless codecs? http://imagecompression.info/gralic/LPCB.html
    5 replies | 643 view(s)
  • innar's Avatar
    20th January 2021, 00:46
    Dear Kirr, thank you so much for such deep insights. I had an unexpected health issue which took me away from the computer for a few weeks, but I will bounce back soon, hopefully at the end of this week, and work through the whole backlog of submissions, including yours. Your contribution is highly appreciated! Meanwhile, if someone checks this forum, I would relay a question I got from one of the top 50 researchers in genetics: if suddenly someone got a (let's say) 20% (30%? 50%?) better result than the others, how could this be turned into an insight for professionals with deep knowledge about coronaviruses? What would be the way, representation or visualization of results (or tools) that would enable a person who knows nothing about compression algorithms, but a lot about coronaviruses, to understand how such compression came about? I think this is an important and fundamental question for many benchmarks: how to leak the "intelligence of better compression" back to the field? Any ideas?
    39 replies | 2077 view(s)
  • Mike's Avatar
    19th January 2021, 22:14
    Mike replied to a thread 7zip update in Data Compression
    7-Zip 21.00 alpha was released: https://sourceforge.net/p/sevenzip/discussion/45797/thread/dd4edab390/
    2 replies | 872 view(s)