Activity Stream

  • Shelwien's Avatar
    Today, 22:05
    Shelwien replied to a thread fpaq0 in ADA in Data Compression
    @JamesWasil: I think it's not just a random choice of language - https://encode.su/threads/3064-paq8pxv-virtual-machine?p=62339&viewfull=1#post62339
    3 replies | 1 view(s)
  • JamesWasil's Avatar
    Today, 22:03
    JamesWasil replied to a thread fpaq0 in ADA in Data Compression
    Lol I haven't used Ada for ages. That is cool though, kaitz. Thanks! Next someone should port it to Fortran, Basic, and APL lol (Has anyone here used APL for anything in the last 30 years?)
    3 replies | 1 view(s)
  • Shelwien's Avatar
    Today, 22:02
    Shelwien replied to a thread fpaq0 in ADA in Data Compression
    So can you explain what you see in Ada that makes it useful for compression algorithms? This source looks pretty similar to pascal/delphi to me.
    3 replies | 1 view(s)
  • kaitz's Avatar
    Today, 21:55
    kaitz started a thread fpaq0 in ADA in Data Compression
    This is fpaq0 ported to the Ada language. Source: http://mattmahoney.net/dc/fpaq0.cpp Used the GPS 19.1 dev environment for that. The executable is static. Compression/decompression is identical to the cpp source. This is my first time writing something in Ada, so it's probably not the best example of correct code. :D
    3 replies | 1 view(s)
  • cottenio's Avatar
    Today, 21:01
    Hi Gotty! Ha, you're absolutely right, and what's crazy is I had the same thought you did and am building a minimal skeleton first. I have similar results to yours, although I also found some fun ways to infer the relationship between timestamps and revision_ids and to store the errors from predicted values. I'll definitely check out your code as well!
    4 replies | 183 view(s)
  • cottenio's Avatar
    Today, 20:59
    Thanks for the warm welcome James! I really appreciate your insight and will definitely try out the technique on other sources as well; thanks for the links!
    4 replies | 183 view(s)
  • dado023's Avatar
    Today, 20:23
    Hi Sven, would you be so kind as to share your pngbest.bat? I am mostly a fan of the best possible compression, but within a reasonable compression duration :)
    421 replies | 110004 view(s)
  • Aniskin's Avatar
    Today, 20:13
    IMHO a quite strange use of MFilter. And again: MFilter should be used with additional compression (-m1=LZMA(2)). MFilter separates the input jpeg file into metadata and jpeg data, then passes the jpeg data to the selected jpeg coder. So if you don't use -m1=LZMA(2), all metadata is left without any compression, and additionally the result carries the overhead added by the 7z file format and the MFilter codec. Simply using Lepton/Brunsli/PAQ directly will give you better results, because they compress the metadata.
    lepton+paq8px 52,820,938 (7z a -m0=mfilter:a2) <- not the same size as the last entry at 50,782,720 despite similar processing!?
    This is not lepton+paq8px. This is paq8px only, with the metadata left uncompressed. Plain paq8px gives you a better result.
    lepton (slow version) 52,705,758 (ran under original 7-zip using the command 7z a -m0=mfilter:a1)
    Metadata left uncompressed. Plain Lepton.exe may give you a better result.
    brunsli 52,444,562 (7z a -m0=mfilter:a0)
    Metadata left uncompressed. Plain Brunsli.exe may give you a better result.
    lepton+paq8px+lzma2 51,434,256 (7z a -m0=mfilter:a2 -m1=lzma2:x9)
    This is not lepton+paq8px+lzma2. This is paq8px+lzma2.
    21 replies | 1101 view(s)
  • Gotty's Avatar
    Today, 19:09
    Also refer to "Fastest and smallest enwik8 skeleton (de)compression (for fun)". The observation is the same; the solution is a little bit different.
    4 replies | 183 view(s)
  • schnaader's Avatar
    Today, 12:41
    Thanks! That was a mutex error in PNG restoration, fixed.
    55 replies | 3813 view(s)
  • JamesWasil's Avatar
    Today, 09:28
    Hi Cottenio, and welcome to the Encode Data Compression forums!
    What you've done is interesting: it's a form of lossless delta compression combined with intelligent binary headers and flags speculating distances for page data, compressing the gaps with the shortest predictions possible and logarithmic bits required for each structure. There really should be a name for structural targeting and compression of file attributes that are outside of LZ pattern matching or other forms of weighted context mixing, partial matches, etc. As far as I'm aware there is no official name for it yet, but perhaps there should be one, and a glossary of nomenclature to accompany it? Although there are mathematically defined names that are commonly understood and accepted by convention (order 0 translating to 1-byte analysis, order 1 to 2 bytes, order 2 to 3, i.e. always n+1), there are many things not defined by a static name that it may be beneficial to assign one to, and your neat work (whether you partially reinvented the wheel or not ;) ) brings the importance of that to the forefront. I suppose we should name it the Cottenio Delta algorithm, since it is yours and it is a form of delta encoding. What do you guys think?
    P.S.: You may want to see if there are ways to apply this and tailor it to other text files outside of enwik8... perhaps focusing on spaces and page breaks CR+LF (chr(13)+chr(10)) or other commonalities in text, to preformat it for compression. There are several ways you might implement this, like detecting how many words exist before a space or a sequence of common characters, removing them, and then representing them with binary flags, similar to what you did with the page IDs and missing pages from enwik8. That said, it might end up being a form of precomp if you're able to isolate the bit flags and have the rest of the file remain text data that can still be worked on by other compressors, adding to their efficiency. That's one way to approach it, but there are many more, I'm sure.
    P.P.S.: If you do decide to expand on it further, or to tie this implementation together with a form of LZ hybrid for a stand-alone text-based compressor, you might find some of Ross Williams' work from years ago beneficial, available freely at http://ross.net/compression/introduction.html (You still might want to make it a precomp and use cmix for better compression, but you have plenty of options.)
    4 replies | 183 view(s)
  • SvenBent's Avatar
    Today, 02:36
    Thank you. It was mostly for curiosity and testing. I am doing a re-evaluation of ECT against my old pngbest.bat script, trying to figure out which tools are still usable when using ECT.
    421 replies | 110004 view(s)
  • cottenio's Avatar
    Today, 00:14
    Hi all, I'm relatively new to data compression and have been toying around with enwik8. After noticing that the page ids (which are strongly structured as <id></id> tags) are sequential in nature, with gaps due to (I assume) deletions of dead pages, I tried my hand at delta encoding the gaps, knowing that in general I could assume at least a delta of 1 for each subsequent id.
    Knowing that there are 12,347 total pages in enwik8, I did as follows. The largest gap I found was 198, so in the first naive implementation I stored each delta in ceil(log2(198)) = 8 bits, which took up 12,347 bytes. I'm sure no one is surprised by that. Still, 12,347 bytes was much better than the 226,718 bytes that the natural text represented ("____<id>%i</id>\n").
    I wondered how I could make that more efficient, so I tried getting a feel for the shape of the deltas. Outliers like 198 and 178 massively inflated the bits required, so the next strategy was bit-packing everything but those two with only 5 bits, and then manually fixing those two during decoding. Now the largest delta was only 25 (5 bits), and that took ceil((12,345 * 5) / 8) = 7,716 bytes (+ a few more for the two repairs), which I thought was a neat improvement.
    Next I thought: can I embed that idea directly in an unpacking structure that operates on interval steps? I worked out the math and found that:
    - 12,347 values as 1 bit: 0 for no additional delta, 1 for additional delta (max is 1).
    - 2,151 values from above had 1's. Most of them don't have any higher value, so I stored another 2,151 values as 1 bit: 0 for done, 1 for additional delta (max is 2).
    - 475 of those had reached the equivalent of 2 by this point, so I stored another 475 as 2-bit values: 0 for done, 1-3 as higher amounts, with 3 meaning the possibility of additional delta beyond the sum so far (max is 5).
    - 18 of those reached 5, so I stored 18 as 4-bit values using the same idea, with the max reaching 20.
    - 5 of those reached 20, so I stored 5 as 8-bit values.
    Altogether this required 1,544 + 269 + 119 + 9 + 5 = 1,946 bytes. I wrote out a binary file containing those bit-packed sequences, then a short C program to decode it again, and it worked fine (a decoder sketch in the same spirit follows below). I then ported it to assembly (and I am by no means a talented assembly programmer) and ended up with a 4,096-byte Win32 console executable (data embedded), which I thought was pretty great for a little decoding engine that can unpack the deltas and recreate the original enwik8 strings.
    The total compression storage, compared by type (against 4,096 bytes):
    - Raw enwik8 lines: 226,718 (1.8%)
    - Parsed chars/no tags (just the numbers): 53,320 (7.7%)
    - 32-bit integers: 49,388 (8.3%)
    - 8-bit deltas: 12,347 (33.2%)
    Control: cmix achieved a 1,685-byte file on a text file containing just the original page ids.
    So I'm pretty sure I've reinvented a wheel here, but I have no idea what the technique is called, so I can't learn more about it. It's like a delta encoder, but it infers additional interval steps based on an expansion and knowing the ceiling of the previous interval. Any ideas?
    I've attached the raw data and the executable for review, and included a link to the bit-packing data as a Google sheet:
    - enwik8_raw_page_ids.txt - a grep of all page id lines from enwik8
    - page_id_decoder.exe - extracts the same content as enwik8_raw_page_ids.txt
    - https://docs.google.com/spreadsheets/d/1Xq9-KVF40BxwUNf6pkXM0z2L6X0vKryT-fw8122VGZE/edit?usp=sharing
    4 replies | 183 view(s)
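    A minimal C sketch of the decode side of this escape-ladder scheme, reconstructed from the description above; the per-level bitstreams, the MSB-first bit order, and the all-ones-value-as-escape convention are illustrative assumptions, not cottenio's actual decoder:

      /* Progressive "escape ladder" decoding sketch (hypothetical
         reconstruction). Each level stores one value for every id that
         hit the maximum (escaped) at all previous levels; the extra
         delta is the sum of the values read. Widths follow the post. */
      #include <stdio.h>
      #include <stdint.h>

      typedef struct {            /* one packed bitstream per level */
          const uint8_t *data;
          size_t bitpos;
      } BitReader;

      static uint32_t read_bits(BitReader *br, int n) {
          uint32_t v = 0;
          for (int i = 0; i < n; i++) {   /* MSB-first for simplicity */
              v = (v << 1) | ((br->data[br->bitpos >> 3] >> (7 - (br->bitpos & 7))) & 1);
              br->bitpos++;
          }
          return v;
      }

      /* Decode one id gap: at least 1, plus progressively escaped extras. */
      static uint32_t decode_delta(BitReader levels[5]) {
          static const int width[5] = {1, 1, 2, 4, 8};
          uint32_t extra = 0;
          for (int lv = 0; lv < 5; lv++) {
              uint32_t maxval = (1u << width[lv]) - 1;  /* all-ones = escape */
              uint32_t v = read_bits(&levels[lv], width[lv]);
              extra += v;
              if (v != maxval) break;     /* no escape: value is final */
          }
          return 1 + extra;               /* ids advance by at least 1 */
      }

      int main(void) {
          /* hand-packed demo streams for extras {0, 1, 5} -> deltas {1, 2, 6} */
          static const uint8_t l0[] = {0x60}, l1[] = {0x40},
                               l2[] = {0xC0}, l3[] = {0x00}, l4[] = {0x00};
          BitReader levels[5] = {{l0,0},{l1,0},{l2,0},{l3,0},{l4,0}};
          for (int i = 0; i < 3; i++)
              printf("%u\n", decode_delta(levels));
          return 0;
      }

    Calling decode_delta 12,347 times and accumulating the deltas would reproduce the page ids; the level widths {1,1,2,4,8} with per-level counts {12,347; 2,151; 475; 18; 5} match the 1,544 + 269 + 119 + 9 + 5 byte breakdown in the post.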
  • brispuss's Avatar
    Today, 00:01
    Added four more compression methods. However, I'm having a problem running lepton (lepton-slow-best-ratio) by itself. The command syntax appears when typing lepton-slow-best-ratio.exe, but trying to compress files always brings up the error "Failed to start subprocess with command line OS_ERROR". The command lepton-slow-best-ratio.exe 0067.jpg 0067.lep (for example), or even just lepton-slow-best-ratio.exe 0067.jpg, brings up this error!? Why am I getting this error message? BTW, running under Windows 7 64-bit.
    21 replies | 1101 view(s)
  • fhanau's Avatar
    Yesterday, 22:30
    You can change it in the source code if it is important to you, but I do not plan to support this in general.
    421 replies | 110004 view(s)
  • JamesB's Avatar
    Yesterday, 20:15
    Gzip is too popular. I regularly have discussions trying to channel people towards zstd instead. There's no reason to use gzip in the modern era IMO, unless it's for some legacy compatibility.
    10 replies | 805 view(s)
  • SvenBent's Avatar
    Yesterday, 19:44
    When running with --allfilters-b, ECT stops after 500 generations with no progress. Is there a way to increase that threshold?
    421 replies | 110004 view(s)
  • Aniskin's Avatar
    Yesterday, 18:35
    Technically there is no problem creating such a version of the codec. I will think about it.
    21 replies | 1101 view(s)
  • Shelwien's Avatar
    Yesterday, 18:20
    @Aniskin: Btw, is it possible to get mfilter to output jpeg and metainfo to different streams? With that we could avoid compressing jpegs twice...
    21 replies | 1101 view(s)
  • Aniskin's Avatar
    Yesterday, 18:04
    If you want to use MFilter+Lepton: 7z a -m0=mfilter:a1 -m1=lzma2:x9
    If you want to use MFilter+paq8: 7z a -m0=mfilter:a2 -m1=lzma2:x9
    Also, what about solid compression?
    21 replies | 1101 view(s)
  • brispuss's Avatar
    Yesterday, 17:38
    Thanks. I didn't remove metadata.
    21 replies | 1101 view(s)
  • smjohn1's Avatar
    Yesterday, 17:33
    Did you remove meta-info when using packJPG (-d)? Meta-info accounts for a large percentage of small files.
    21 replies | 1101 view(s)
  • Kirr's Avatar
    Yesterday, 17:22
    Yes, fqzcomp performs well considering it works via a wrapper that chops long sequences into reads. (And adds constant quality as per your idea, which I probably took a bit too far :-)). Interestingly, it is currently leading in compactness on the spruce genome: chart (though this test is not complete; some compressors are still missing). Also it may still improve more after I add its newly fixed "-s9" mode. I guess it will work even better on proper FASTQ short-read datasets. Thanks. Yeah, NAF is focused on transfer + decompression speed, because both of these steps can be a bottleneck in my work. I noticed that many other sequence compressors are primarily optimized for compactness (something I did not know before doing the benchmark), which partly explains why gzip remains popular.
    10 replies | 805 view(s)
  • brispuss's Avatar
    Yesterday, 16:55
    OK. Working on other compression examples. Modified MFilter7z.ini as per the earlier post, but changed the paq8px version to 183fix1 instead of 181fix1. Created a subdirectory named paq8px_183fix1 under the Codecs subdirectory of 7-Zip, and copied the paq8px_v183fix1 executable into it. So the command for lepton plus paq8px_v183fix1 PLUS lzma2 should now be 7z a -m0=mfilter:a1 -m1=mfilter:a2 -m2=lzma2:x9 0067.7z 0067.jpg (for example)?
    21 replies | 1101 view(s)
  • Marco_B's Avatar
    Yesterday, 15:48
    Hi all, in this episode I am glad to tell about an attempt of my own to address the problem previously encountered in Grouped (ROLZ) LZW: the fixed size of the groups (the dictionaries attached to contexts). A way to proceed is illustrated by Nakano, Yahagi, Okada, but I started from a different consideration. Every time a symbol occurs in a text, it gains an increasing number of children and the chance for it to reappear gets smaller and smaller, while an entropy stage which takes its output assigns shorter codes. To reconcile these two opposites I settled on a schema where symbols belong to a set of lists keyed by the number of children, and each list is organized as an LRU. A symbol is now emitted by its list and rank inside it, respectively via the Witten-Cleary arithmetic coder and the Elias delta code (a sketch of the latter follows below). I chose an AC because it is the only one that can closely mimic the fatherhood distribution among symbols, but this constraint forced me to interleave its sequence. After a complicated period I realized that the solution must be based on two facts: (i) the encoder has to be two symbols ahead, because the decoder needs to start with 16 bits; (ii) the variables high and low (which define the focus interval) are in lockstep between these two parts. The rest you can see in the source below. At the moment the compressor is frankly terrible, both in terms of speed and ratio, but I made it public since the interleaver could be of some interest. I have in mind to improve the performance of the compressor by imposing a more canonical context apparatus on it, which should curtail the lists at the expense of memory consumption. I hope to be back soon, greetings, Marco Borsari
    0 replies | 99 view(s)
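    For readers unfamiliar with the rank code mentioned above, here is a minimal C sketch of Elias delta encoding; it illustrates the standard code only and is not taken from Marco's source (put_bit() just prints bits for the demo):

      /* Elias delta code for a positive integer n (e.g. an LRU rank).
         Encodes bitlen(n) with Elias gamma, then n without its leading 1. */
      #include <stdio.h>
      #include <stdint.h>

      static void put_bit(int b) { putchar('0' + b); }  /* demo bit sink */

      static int bitlen(uint32_t n) {       /* floor(log2 n) + 1 */
          int len = 0;
          while (n) { len++; n >>= 1; }
          return len;
      }

      static void elias_delta_encode(uint32_t n) {  /* n >= 1 */
          int L  = bitlen(n);                 /* number of bits in n */
          int LL = bitlen((uint32_t)L);       /* number of bits in L */
          for (int i = 0; i < LL - 1; i++)    /* gamma prefix: LL-1 zeros */
              put_bit(0);
          for (int i = LL - 1; i >= 0; i--)   /* L in binary, MSB first */
              put_bit((L >> i) & 1);
          for (int i = L - 2; i >= 0; i--)    /* n minus its leading 1-bit */
              put_bit((n >> i) & 1);
      }

      int main(void) {
          for (uint32_t n = 1; n <= 6; n++) { /* codes for small ranks */
              printf("rank %u -> ", n);
              elias_delta_encode(n);
              putchar('\n');
          }
          return 0;
      }

    Ranks near the front of an LRU list are small and thus get short codewords, which is what makes the delta code a reasonable fit for this schema.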
  • Shelwien's Avatar
    Yesterday, 15:34
    Shelwien replied to a thread FLiF Batch File in Data Compression
    > stuffit deluxe is even commercial! Here's stuffit windows binary: http://nishi.dreamhosters.com/stuffit14_v0.rar > which one would you use on linux? There's wine on linux, so hard to say. But I like BMF.
    14 replies | 661 view(s)
  • pklat's Avatar
    Yesterday, 15:18
    pklat replied to a thread FLiF Batch File in Data Compression
    Sorry, mea culpa. So there are: rkim, StuffItDeluxe, Flic, zpaq, bmf, gralic, cmix. But most of them are either too slow or too 'exotic' IMHO. StuffIt Deluxe is even commercial! FLIF is supported by XnView, IrfanView, ImageMagick. Which one would you use on Linux?
    14 replies | 661 view(s)
  • Shelwien's Avatar
    Yesterday, 14:58
    Shelwien replied to a thread FLiF Batch File in Data Compression
    Well, this says that it isn't: http://qlic.altervista.org/LPCB.html As to solid flif - there's support for multi-image files.
    14 replies | 661 view(s)
  • pklat's Avatar
    Yesterday, 14:49
    pklat replied to a thread FLiF Batch File in Data Compression
    From my limited tests, FLIF has the best lossless ratio. However, it's slow at decoding and not supported well. It seems best to use it for archiving only, for now. As someone already mentioned, it's a pity it has no 'solid' option yet for large sets of pictures, like archivers do (or a dictionary option).
    14 replies | 661 view(s)
  • Aniskin's Avatar
    Yesterday, 14:37
    MFilter should be used with additional compression (-m1=LZMA(2)) because MFilter does not pass any metadata to the packer. And maybe lepton.exe without MFilter will show a better result.
    21 replies | 1101 view(s)
  • Shelwien's Avatar
    Yesterday, 13:51
    Add also -exact: https://encode.su/threads/3246-FLiF-Batch-File?p=62419&viewfull=1#post62419
    51 replies | 11019 view(s)
  • Shelwien's Avatar
    Yesterday, 13:49
    1. jojpeg doesn't compress metainfo on its own; try using "pa a -m0=jojpeg -m1=lzma2:x9" or "pa a -m0=jojpeg -m1=lzma2:x9 -m2=lzma2:x9:lp0:lc0:pb0 -mb00s0:1 -mb00s1:2".
    2. There's Aniskin's post above about integrating paq8px into mfilter; results could be better than standalone paq8px.
    3. Add brunsli, maybe new precomp too.
    11,711,247 jpg\
    11,333,680 0.pa // pa a -m0=jojpeg 0.pa jpg
    9,161,957 1.pa // pa a -m0=lzma2:x9 1.pa jpg
    9,132,865 2.pa // pa a -m0=jojpeg -m1=lzma2:x9 2.pa jpg
    9,133,254 3.pa // pa a -m0=jojpeg -m1=lzma2:x9 -m2=lzma2:x9:lp0:lc0:pb0 -mb00s0:1 -mb00s1:2 3.pa jpg
    21 replies | 1101 view(s)
  • pklat's Avatar
    Yesterday, 13:33
    I have tried webp (0.6.1-2) in Debian using lossless mode (-z 9) with a .png image, but it doesn't produce the exact same .bmp after unpacking?
    51 replies | 11019 view(s)
  • JamesB's Avatar
    Yesterday, 12:25
    JamesB replied to a thread DZip in Data Compression
    The authors are heavily involved in various genome sequence data formats, so that's probably their primary focus here and why they have so much genomic data in their test corpus. So maybe they don't care so much about text compression. At the moment the tools (Harc, Spring) from some of the authors make heavy use of libbsc, so perhaps they're looking at replacing it. Albeit slowly... They'd probably be better off considering something like MCM for rapid CM encoding as a more tractable alternative, but it's always great to see new research of course and this team have a track record of interesting results.
    4 replies | 290 view(s)
  • brispuss's Avatar
    Yesterday, 11:26
    Ran a few more compression tests using 171 JPEG image files from a HOG (Hidden Object Game). Total size of all files is 64,469,752 bytes. Created batch files to compress each file individually.
    original files 64,469,752 bytes
    jojpeg (version sh3) 55,431,288 - ran under modified 7-zip (7zdll) using command: pa a -m0=jojpeg (where "pa" was renamed from "7z")
    jojpeg+lzma2x2 53,481,658 - pa a -m0=jojpeg -m1=lzma2:x9 -m2=lzma2:x9:lp0:lc0:pb0 -mb00s0:1 -mb00s1:2
    jojpeg+lzma2 53,479,612 - pa a -m0=jojpeg -m1=lzma2:x9
    lepton+paq8px 52,820,938 - 7z a -m0=mfilter:a2 <- not the same size as the last entry at 50,782,720 despite similar processing!?
    lepton (slow version) 52,705,758 - ran under original 7-zip using command: 7z a -m0=mfilter:a1
    brunsli 52,444,562 - 7z a -m0=mfilter:a0
    packjpg (version 2.5k) 51,975,698
    fast paq8 (version 6) 51,588,301 - using option -8
    lepton+paq8px+lzma2 51,434,256 - 7z a -m0=mfilter:a2 -m1=lzma2:x9
    paq8pxd (version 69) 51,365,725 - using option -s9
    paq8px (version 183fix1) 51,310,086 - using option -9
    brunsli+lzma2 51,133,547 - 7z a -m0=mfilter:a0 -m1=lzma2:x9
    lepton+lzma2 51,116,016 - 7z a -m0=mfilter:a1 -m1=lzma2:x9
    lepton+paq8px183fix1 50,782,720 - 7z a -m0=mfilter:a1, PLUS paq8px_v183fix1 -9 run manually
    Noticed that packjpg tended to produce files larger than the originals when compressing small files in the tens-of-kilobytes range, but it performed well, with good compression, on larger files from hundreds of kilobytes up.
    EDIT: added four more compression results. More results to come later.
    EDIT2: added three more compression results. More results to come later. Surprisingly poor result from lepton + paq8px_v183fix1 + lzma2!?
    EDIT3: added lepton+paq8px at 52,820,938 bytes. But surprisingly not the same size as lepton+paq8px183fix1 (with paq8px separately processed after 7z a -m0=mfilter:a1)!? Why?
    21 replies | 1101 view(s)
  • Sebastian's Avatar
    Yesterday, 09:39
    Sebastian replied to a thread DZip in Data Compression
    Interestingly, the improvements using the dynamic model are almost negligible on some files, although it is not clear to me if they keep updating the static model.
    4 replies | 290 view(s)
  • Shelwien's Avatar
    Yesterday, 08:08
    Shelwien replied to a thread DZip in Data Compression
    Not related to cmix. It's written in Python and uses all the popular NN frameworks. Compression seems to be pretty bad though - they report worse results than bsc for enwik8.
    4 replies | 290 view(s)
  • bwt's Avatar
    Yesterday, 01:55
    bwt replied to a thread DZip in Data Compression
    It seems like cmix
    4 replies | 290 view(s)
  • Gonzalo's Avatar
    5th December 2019, 20:18
    Precomp hangs restoring an .iso image file. I attached a 10 Mb chunk around the area where it happens. On this particular file, precomp hangs at 39.27%
    55 replies | 3813 view(s)
  • JamesB's Avatar
    5th December 2019, 20:16
    JamesB started a thread DZip in Data Compression
    I stumbled across another neural network based general purpose compressor today. They compare it against LSTM where it mostly does better (sometimes very much so) but is sometimes poorer. I haven't tried it yet. https://arxiv.org/abs/1911.03572 https://github.com/mohit1997/DZip
    4 replies | 290 view(s)
  • Shelwien's Avatar
    5th December 2019, 18:30
    Shelwien replied to a thread vectorization in Data Compression
    I guess you mean https://en.wikipedia.org/wiki/Image_tracing Conversion software like that does exist, so you can test it. In particular, the djvu format may be related. But it's much harder to properly compress vector graphics... for any reasonable image complexity it would be better to rasterize it instead and compress the bitmap.
    1 replies | 180 view(s)
  • smjohn1's Avatar
    5th December 2019, 18:08
    It is really hard to come to a definitive conclusion with only one sample file. In general packJPG (which Lepton borrowed a lot of techniques from) works quite well. It would be great if you could report test results on a large group of sample JPEG files with various content, such as people, nature, etc.
    21 replies | 1101 view(s)
  • pklat's Avatar
    5th December 2019, 16:59
    pklat started a thread vectorization in Data Compression
    (If OT or already mentioned, sorry.) What do you think of using vectorization as a sort of 'lossy' compression? PDF could be converted to SVG, for example; .cbr too, depending on content. Something would be lost, but something gained, obviously. Or use it for the text parts only?
    1 replies | 180 view(s)
  • suryakandau@yahoo.co.id's Avatar
    5th December 2019, 07:19
    XML.tar from the Silesia benchmark, without Precomp and with only ~1.3 GB of memory: the result is 265,136 bytes.
    18 replies | 1255 view(s)
  • suryakandau@yahoo.co.id's Avatar
    5th December 2019, 05:18
    It only takes ~1.3 GB of memory.
    18 replies | 1255 view(s)
  • Shelwien's Avatar
    5th December 2019, 00:20
    As I said, it's not a single file; there are dependencies (vbrun300.dll etc.)
    5 replies | 321 view(s)
  • CompressMaster's Avatar
    4th December 2019, 21:06
    @Shelwien it does not work properly. I tried many versions of that file (including an uncorrupted original from the corrupted CD), but it does not work - same error. Could I ask you to compile that? I had found some of these approaches before, but I only have VS 2010. Thanks a lot.
    5 replies | 321 view(s)
  • JamesB's Avatar
    4th December 2019, 20:43
    In that case I'm amazed fqzcomp does even remotely well! It was written for short read Illumina sequencing data. :-) Luck I guess, although it's clearly not the optimal tool. NAF is occupying a great speed vs size tradeoff there.
    10 replies | 805 view(s)
  • kaitz's Avatar
    4th December 2019, 19:18
    https://www.toptensoftware.com/win3mu/
    5 replies | 321 view(s)
  • schnaader's Avatar
    4th December 2019, 17:52
    Here's the latest development version - I fixed an error with a file write that had no mutex, which led to incorrect reconstruction on files with many small interleaved JPG and preflate (PDF/ZIP/...) streams.
    55 replies | 3813 view(s)
  • Kirr's Avatar
    4th December 2019, 11:57
    The first type of data is currently not represented at all in the benchmark. I will certainly add such data in the future. The other two kinds are both used, and I thought they were labeled clearly, but probably not clearly enough. I will try to further improve the clarity. Thanks! There are also different kinds of compressors, e.g. those designed specifically for short reads vs. those not caring about sequence type. I will probably separate short-read compressors into their own category. (Currently I bundle all specialized compressors together as "sequence compressors".)
    10 replies | 805 view(s)
  • Shelwien's Avatar
    4th December 2019, 09:08
    There're actually 3 different "codecs" based on the same reflate in 7zdll:
    -m0=reflate:x9
    -m0=reflate*4:x=9876
    -m0=reflate*7:x=1234567
    The thing is, it's quite possible to encounter nested deflate, like .png in .docx in .zip, so reflate supports nested preprocessing... but each nesting level requires an extra output stream for diff data, and the 7-zip framework doesn't support a dynamic number of output streams, so I made 3 instances with different names instead: "reflate" has nesting depth 1, "reflate*4" = depth 4, "reflate*7" = depth 7. The 7-zip framework also has a limit of max 8 output streams per codec, and reflate*7 uses all of them.
    Unfortunately reflate doesn't yet have detection of zlib parameters... the window is assumed to always be 32k (which is bad for pngs), and the level has to be specified manually (via the x parameter in this case; default is 6). Thus "-m0=reflate:x9" means "diff against zlib level 9", while "-m0=reflate*4:x=9876" means "level 9 for depth 0, level 8 for depth 1, ...".
    It's important to keep in mind that in 7zdll, reflate's "x" parameter clashes with the global "x" parameter, so "7z a -m0=reflate ..." would use reflate/level6, while "7z a -mx=9 -m0=reflate ..." would suddenly use reflate/level9. But it's not a problem if the reflate level is always specified manually, like "7z a -mx=9 -m0=reflate:x6 ...". The reflate library has some other parameters, but they're not included in 7zdll atm. https://encode.su/threads/1399-reflate-a-new-universal-deflate-recompressor?p=50858&pp=1
    Also keep in mind that an actual cmdline using reflate would look something like this:
    pa a -m0=reflate:x6 -m1=lzma2:x9 -m2=lzma2:lc0:lp0:pb0 -mb00s0:1 -mb00s1:2 archive.pa *.pdf
    or even
    pa a -m0=reflate:x6 -m1=jojpeg -m2=lzma2:x9 -m3=lzma2:lc0:lp0:pb0 -m4=lzma2:lc0:lp0:pb0 -mb00s0:1 -mb00s1:4 -mb01s0:2 -mb01s1:3 archive.pa *.pdf
    since reflate's diff outputs are usually incompressible per deflate stream... but would be the same for multiple instances of the same stream. I have this converter for complex filter trees: http://nishi.dreamhosters.com/7zcmd.html
    131 replies | 72423 view(s)
  • brispuss's Avatar
    4th December 2019, 06:35
    Thanks. I've got the 7zdll "7z" working again. Followed your advice and renamed 7z.exe to pa.exe to avoid confusion with the original 7-zip program. For reflate, I think there may be options associated with it? What are those options please?
    131 replies | 72423 view(s)
  • Shelwien's Avatar
    4th December 2019, 06:19
    It says "Not implemented" about the archive format (.7z or .wtf). Archive name should either end with ".pa" or you need to specify the format explicitly with -tpa option. Also maybe rename 7zdll's 7z.exe to pa.exe - that would let you use both 7-zip and 7zdll at once.
    131 replies | 72423 view(s)
  • brispuss's Avatar
    4th December 2019, 04:01
    Thanks for the detailed comments. However, it doesn't entirely explain why I'm getting the "Not implemented" errors. I can now install the original 7-zip program and run it OK, but I'm still getting errors when trying to compress files using the files from 7zdll_vF7.rar. The original 7-zip program has once again been uninstalled. I've "installed" the files from 7zdll_vF7.rar to the c:\7zdll_vF7 directory, and set the environment PATH variable to c:\7zdll_vf7\x64 since I'm running the x64 version of Windows. Typing 7z in a command window brings up the command syntax as expected, but trying to compress anything brings up the "Not implemented" error again. Trying to run reflate on a small swf file (attached) with the command 7z a -m0=reflate intenium.swf.wtf intenium.swf also doesn't work > "Not implemented". What is wrong here? Is the 7zdll installation wrong, somehow?
    131 replies | 72423 view(s)
  • Shelwien's Avatar
    4th December 2019, 02:26
    1. 7zdll works with the ".pa" format; there's no support for any other formats. You can't replace normal 7-zip with it.
    2. With 7zdll, "7z a 0067jpg.pa 0067.jpg" would compress the .jpg with lzma2; there's no codec optimization in the console version.
    3. 7zdll does include the same lepton and brunsli codecs (and jojpeg, but it's much slower), but they can be used only with -ms=off (because both lepton and brunsli want to load the whole file), so for jpegs atm it's better to use normal 7-zip with mfilter, or 7zdll with jojpeg.
    131 replies | 72423 view(s)
  • brispuss's Avatar
    4th December 2019, 01:35
    Not a bad result. However, the best compression is still gained by using paq8px_v183fix1 -9beta on the 0067.jpg file which results in a 101,192 byte file.
    21 replies | 1101 view(s)
  • Jarek's Avatar
    4th December 2019, 00:00
    A simple approximation of the above - modeling the minimum as the point where the linear trend of the gradients intersects 0 - gives what looks like a very practical and universal way to adaptively choose the learning rate 'mu'. Imagine you would like to minimize in 1D using gradient (g) descent, with steps: theta <- theta - mu * g. The big question is how to choose the learning rate mu: too small and you get slow convergence, too large and you e.g. jump over valleys. In a second-order method we would like to find the minimum of a parabola in a single such step. For a parabola these (theta, g) points lie on a line, as in the figure above; we can find its linear coefficient by dividing the standard deviations of the two, so we jump to its minimum if we use:
    mu = sqrt( variance(theta) / variance(g) )
    This gives a very simple and pathology-resistant (repelling from saddles, avoiding too-large steps) adaptive choice of learning rate, and it requires updating just four (exponential moving) averages: of theta, theta^2, g, g^2. For example, we can cheaply do it for all coordinates in SGD (page 5 of v2), getting 2nd-order learning-rate adaptation independently for each parameter; a sketch follows below. Standard methods like RMSprop and ADAM use a sqrt(mean(g^2)) denominator, but that will not minimize a parabola in one step. Has anybody seen optimizers using variances like here?
    23 replies | 1400 view(s)
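    A minimal sketch of this per-coordinate adaptation in C, based on one reading of the post; the EMA decay BETA, the EPS guard, and the zero-initialized averages (no warm-up or bias correction) are illustrative assumptions, not taken from Jarek's paper:

      /* Sketch: 2nd-order learning-rate adaptation for one coordinate:
         mu = sqrt(var(theta) / var(g)), with variances maintained as
         exponential moving averages of theta, theta^2, g, g^2. */
      #include <stdio.h>
      #include <math.h>

      #define BETA 0.99   /* EMA decay (assumed) */
      #define EPS  1e-12  /* division-by-zero guard (assumed) */

      typedef struct {
          double m_th, m_th2;   /* EMAs of theta and theta^2 */
          double m_g,  m_g2;    /* EMAs of g and g^2 */
      } AdaptState;

      /* One gradient step for a single coordinate; returns updated theta. */
      static double step(AdaptState *s, double theta, double g) {
          s->m_th  = BETA * s->m_th  + (1 - BETA) * theta;
          s->m_th2 = BETA * s->m_th2 + (1 - BETA) * theta * theta;
          s->m_g   = BETA * s->m_g   + (1 - BETA) * g;
          s->m_g2  = BETA * s->m_g2  + (1 - BETA) * g * g;

          double var_th = s->m_th2 - s->m_th * s->m_th;  /* E[x^2]-E[x]^2 */
          double var_g  = s->m_g2  - s->m_g  * s->m_g;
          double mu = sqrt((var_th + EPS) / (var_g + EPS));

          return theta - mu * g;   /* jump toward the modeled minimum */
      }

      int main(void) {
          /* demo: f(theta) = 2*theta^2, so g = 4*theta; the variance ratio
             gives mu = 1/4 and the step lands on the minimum at 0 */
          AdaptState s = {0, 0, 0, 0};
          double theta = 1.0;
          theta = step(&s, theta, 4.0 * theta);
          printf("theta after one adapted step: %g\n", theta);
          return 0;
      }

    On a parabola centered at 0, g is exactly proportional to theta, so the variance ratio recovers 1/a even from cold-start EMAs, which is why the demo lands on the minimum after a single adapted step; off-center parabolas need the averages to warm up first.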
  • Aniskin's Avatar
    3rd December 2019, 23:59
    MFilter + paq8px_v181fix1.exe
    MFilter7z.ini:
    Encode=lepton\lepton-slow-best-ratio.exe "%1" "%2"
    Decode=lepton\lepton-slow-best-ratio.exe "%1" "%2"
    PipeDecode=lepton\lepton-slow-best-ratio.exe -
    Ext=lep
    Ext=paq8
    Encode=paq8px_v181fix1\paq8px_v181fix1.exe -9 "%1" "%2"
    Decode=paq8px_v181fix1\paq8px_v181fix1.exe -d "%1" "%2"
    Result: 101401 bytes
    21 replies | 1101 view(s)
  • brispuss's Avatar
    3rd December 2019, 12:25
    OK. Something strange is going on here! I believe I got reflate working under 7z a few days ago, but 7z now no longer works (properly), and I can't get anything to compress!? Getting the error "System ERROR: Not implemented" now!?
    Not sure if I used the original 7-zip installation with the updated files from 7zdll_vF7.rar installed within the original directory (c:\program files\7-zip), or whether I used a newly created directory called c:\7z-mod with only the x64 files from 7zdll_vF7.rar? The original (unmodified) 7-zip program under c:\program files\7-zip has been uninstalled, and the c:\7z-mod folder and files were deleted recently. I've re-created the c:\7z-mod folder and extracted only the x64 files from 7zdll_vF7.rar. This is the only 7-zip installation at the moment.
    Typing 7z in a command window brings up the 7z syntax etc., but trying a simple compression test such as 7z a 0067jpg.7z 0067.jpg brings up the above error: "Not implemented". I can't even do basic compression, let alone advanced stuff such as reflate etc. What is wrong here?
    131 replies | 72423 view(s)
  • JamesB's Avatar
    2nd December 2019, 21:08
    It's worth noting there are several very different classes of compression tool out there, so it may be good to label the type of input data more clearly. The sorts I can think of are:
    - Compression of many small fragments: basically sequencing-machine outputs. There is a lot of replication, as we expect e.g. 30-fold redundancy, but finding those repeats is challenging. Further subdivided into fixed-length short reads (Illumina) and long variable-size reads (ONT, PacBio).
    - Compression of long genomes with a single copy of each chromosome. No obvious redundancy except for repeats internal to the genome itself (ALU, LINES, SINES, etc).
    - Compression of sets of genomes or sequence families. Very strong redundancy.
    10 replies | 805 view(s)
  • Shelwien's Avatar
    2nd December 2019, 14:20
    That's the idea... Brunsli normally uses brotli library to compress jpeg metainfo, while this version leaves metainfo uncompressed. In theory this can improve solid compression of recompressed jpegs.
    15 replies | 1971 view(s)
  • maadjordan's Avatar
    2nd December 2019, 13:33
    But yours loses some compression, though it can unpack both safely.
    15 replies | 1971 view(s)
  • Aniskin's Avatar
    1st December 2019, 17:14
    It is possible to create an additional section in mfilter.ini that describes a direct call to paq8px or cmix. In this case you can use -m0=mfilter:a2. (From another thread:) Yes, you are right.
    21 replies | 1101 view(s)
  • Shelwien's Avatar
    1st December 2019, 15:27
    http://nishi.dreamhosters.com/u/brunsli_v2a_dll.rar Applied schnaader's patch to my dll kit... can be replaced in mfilter (smaller dll because of removed brotli). Doesn't affect compressed size though, I guess mfilter doesn't feed metainfo to brunsli at all.
    15 replies | 1971 view(s)
  • Bulat Ziganshin's Avatar
    1st December 2019, 15:16
    it seems that mfilter provides lepton-compressed output (and last time I looked into lepton, it has no feature to provide uncompressed output), and paq compression is slower but tighter than the lepton one
    21 replies | 1101 view(s)
  • Shelwien's Avatar
    1st December 2019, 14:30
    Yes, or you can add -m1=lzma2... etc. Unfortunately mfilter outputs a single stream, so lepton/brunsli output has to be compressed again. Also maybe I should make a brunsli dll from schnaader's hacked versions that doesn't use brotli for metainfo...
    21 replies | 1101 view(s)
  • brispuss's Avatar
    1st December 2019, 12:57
    Yes, thanks. I had already placed the mfilter files within a newly created Codecs sub-folder, as per the instructions that came with the files. The command 7z a -m0=mfilter:a1 0067.7z 0067.jpg creates an uncompressed 7-zip file via the lepton codec, which can then be further processed/compressed using paq8* and other compressors?
    EDIT: Assuming the above, the 0067.7z file was compressed with the following results:
    135,671 bytes - original file 0067.jpg
    109,982 bytes - mfiltered 7z file created using the above command
    109,977 bytes - mfiltered 7z file compressed with paq8pxd v69 -s9
    109,927 bytes - mfiltered 7z file compressed with fast paq8 v6 -8
    109,908 bytes - mfiltered 7z file compressed with paq8px v183fix1 -9beta
    Not really good results!? The best result so far is using paq8px v183fix1 directly on the 0067.jpg file, which yields a 101,192-byte file. It seems the mfilter method is not the best to use here(?)
    21 replies | 1101 view(s)
  • Shelwien's Avatar
    1st December 2019, 12:36
    Have to put mfilter7z*.dll to "7-Zip\Codecs" rather than "7-Zip". Then it should work with syntax like above (7z a -m0=mfilter:a1 0067.7z 0067.jpg) ...Actually there's this: http://nishi.dreamhosters.com/u/mfilter_demo_20191013.rar
    21 replies | 1101 view(s)
  • Shelwien's Avatar
    1st December 2019, 12:33
    Shelwien replied to a thread FLiF Batch File in Data Compression
    > to flif files, decompressing/decoding webp files back is apparently not lossless
    Both flif and webp actually encode the raw picture (pixels); they're not recompression algorithms. .bmp files decoded from .png and from dwebp(cwebp(.png)) would be mostly the same, or pngcrush(.png) and pngcrush(dwebp(cwebp(.png))), although webp also discards metainfo. And for lossless compression of .pngs I can suggest either precomp, or flif/webp + pngcrush + bsdiff.
    14 replies | 661 view(s)