Ran countless tests on reflate, even made a couple of weird inputs; it handles them all with no problems. I even ran out of ideas to make it show errors.
Code:
    Memory allocation: 698MB
    53048011/53048011 -> 42175193
    depth=5; levels[]={ 9, 6, 5, 4, 4, 4, 4 }; blk_minsize[] = { 66666, 666, 66, 6.00 }
    sizeof(M)=30380032
    inp=53048011/53048011 out=2404116427
    depth[0].n_raws=3877
    depth[1].n_raws=0
    depth[2].n_raws=0
    depth[3].n_raws=0
    depth[4].n_raws=0
    F:<.out>:2404116427
    Memory allocation: 698MB
    F:<.hif1>:9848141-> 22020096
    F:<.hif2>:220571 -> 24707072
    F:<.hif3>:648
    F:<.hif4>:0
    F:<.hif5>:0
    Result: There are no more files.
    2413987368/65551 -> 31473451
    Memory allocation: 86MB
    F:<.out>:2404116427800/2413987368
    F:<.hif1>:984814187368/2413987368
    F:<.hif2>:220573987368/2413987368
    31473451 -> 2413987368/2413987368
    F:<.hif3>:648
    F:<.hif4>:0
    F:<.hif5>:0
    Result: The pipe has been ended.
    depth=5
    sizeof(M)=30380032
    inp=2404116427/2404116427 out=53048011
    Comparing files C:\USERS\ZEE\DESKTOP\RAWFILT_V1K\pack.wim and .UNP
    FC: no differences encountered
BTW regarding stdio, I think the two outputs (stdout and stderr) can be made into a single output by maybe storing hif and out in memory; then during restoration hif and out are read as one input, stored in memory again, and then written to one output. That is, unless you're not looking forward to making reflate use more than 20 MB of RAM.
Anyways, I have a question: I see reflate uses entropy to help search for deflate streams, doesn't the check function itself hinder overall speed?
I ask because I tried the entropy idea for my deflate project and it led to a loss of speed, especially on dense input that doesn't really have deflate streams at all. It did benefit on some input, but as I said, I ran many tests: I input a video file on purpose and reflate became very slow.
> Ran countless tests on reflate even made a couple of weird inputs
Thanks, but since v1k I already fixed one harmless bug related to n_raws counting,
and then PrinceGupta found a case where reflate crashes during decoding.
> made into a single output by maybe storing hif and out in memory
For now, 7z actually solves that problem for me.
But in general, temp files or random access or caching whole streams in memory
are not a solution, because what I need is a single output _stream_ without large delays,
not just a single file. For example, reflate should be able to work in a network proxy.
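For illustration only, one common way to carry two logical outputs (like .out and .hif) in a single stream without buffering either of them whole is to interleave small tagged chunks. The sketch below is an assumption, not what reflate or 7z actually does: the frame layout (1-byte stream id, 4-byte little-endian length, payload) and the names write_chunk and demux are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical frame layout: [1-byte stream id][4-byte LE length][payload].
// Each chunk can be emitted as soon as it is produced, so the delay is
// bounded by the chunk size rather than by the whole stream size.
void write_chunk(Bytes& wire, std::uint8_t id, const std::string& payload) {
    assert(payload.size() <= 0xFFFFFFFFu);  // length must fit in 32 bits
    std::uint32_t n = static_cast<std::uint32_t>(payload.size());
    wire.push_back(id);
    for (int i = 0; i < 4; ++i)
        wire.push_back(static_cast<std::uint8_t>(n >> (8 * i)));
    wire.insert(wire.end(), payload.begin(), payload.end());
}

// Split a framed stream back into one byte string per stream id.
// Returns false if the input ends in the middle of a chunk.
bool demux(const Bytes& wire, std::map<std::uint8_t, std::string>& streams) {
    std::size_t pos = 0;
    while (pos < wire.size()) {
        if (wire.size() - pos < 5) return false;       // truncated header
        std::uint8_t id = wire[pos];
        std::uint32_t n = 0;
        for (int i = 0; i < 4; ++i)
            n |= std::uint32_t(wire[pos + 1 + i]) << (8 * i);
        pos += 5;
        if (wire.size() - pos < n) return false;       // truncated payload
        streams[id].append(wire.begin() + pos, wire.begin() + pos + n);
        pos += n;
    }
    return true;
}
```

A real proxy would read and write these frames incrementally on a socket; the in-memory vector here only keeps the example short.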
> I see reflate uses entropy to search for deflate streams, doesn't the
> function itself to check hinder overall speed?
Here you can see the source of that entropy filter:
http://nishi.dreamhosters.com/u/entropy_v0.rar
And this is the function which is called for each input byte:
Code:
    void Update( byte c ) {
      uint d=buf[(a+rwinsize-winsize)&rwinmask]; // byte that is leaving the window
      buf[a&rwinmask]=c; a++;                    // store the new byte in the ring buffer
      freq[d]--;
      cl += LOG2[freq[d]+1] - LOG2[freq[c]+1];   // adjust the running estimate cl
      freq[c]++;
    }
So no, it's not even really noticeable compared to file I/O speed.
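For readers who want to play with the idea, here is a self-contained sketch of a sliding-window order-0 entropy estimate. It is not the code from entropy_v0.rar: the class name WindowEntropy and its methods are hypothetical, and it keeps an exact running sum of f*log2(f) instead of the LOG2 table update shown in the post. The point it illustrates is the same, though: the cost per input byte is a handful of table lookups and additions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration: empirical order-0 entropy of the last `window`
// bytes, updated in O(1) per input byte.
class WindowEntropy {
public:
    explicit WindowEntropy(std::size_t window)
        : win_(window), buf_(window, 0), freq_(256, 0), flogf_(window + 1, 0.0) {
        assert(window > 0);
        // Precompute f*log2(f) for every possible byte count 0..window.
        for (std::size_t f = 1; f <= window; ++f)
            flogf_[f] = static_cast<double>(f) * std::log2(static_cast<double>(f));
    }

    void update(std::uint8_t c) {
        if (filled_ == win_) remove(buf_[pos_]);  // evict the oldest byte
        else ++filled_;
        buf_[pos_] = c;
        pos_ = (pos_ + 1) % win_;
        sum_ -= flogf_[freq_[c]];
        ++freq_[c];
        sum_ += flogf_[freq_[c]];
    }

    // H = log2(n) - (1/n) * sum over symbols of f*log2(f), in bits per byte.
    double bits_per_byte() const {
        if (filled_ == 0) return 0.0;
        double n = static_cast<double>(filled_);
        return std::log2(n) - sum_ / n;
    }

private:
    void remove(std::uint8_t d) {
        sum_ -= flogf_[freq_[d]];
        --freq_[d];
        sum_ += flogf_[freq_[d]];
    }

    std::size_t win_;
    std::vector<std::uint8_t> buf_;   // ring buffer holding the window
    std::vector<std::size_t> freq_;   // byte counts inside the window
    std::vector<double> flogf_;       // lookup table for f*log2(f)
    std::size_t pos_ = 0;             // next write position (oldest byte)
    std::size_t filled_ = 0;          // number of bytes currently in window
    double sum_ = 0.0;                // running sum of f*log2(f)
};
```

Feeding it a window of identical bytes gives an estimate near 0 bits per byte, while a window where every byte value occurs once gives about 8, which is the kind of signal a filter can compare against a threshold.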
> I input a video file on purpose and reflate became very slow.
Yes, but that's because entropy filter doesn't help there,
so detector actually has to try decoding a deflate block, starting from each byte.
In fact, recently I experimented with making a 512M (2^32=4gbit) lookup table
for valid 32-bit prefixes of deflate streams, and apparently that only allows
to discard 10% or so of attempts, so I still don't have a workaround for faster
processing of compressed data.
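To make the size arithmetic above concrete: one bit per possible 32-bit prefix is 2^32 bits, i.e. 512 MiB. The sketch below shows such a bitset with a configurable prefix width so it can be tried at small sizes. The names PrefixTable, mark_valid and maybe_valid are hypothetical, and how the table would actually be filled (which prefixes count as valid) is not described in the post, so that part is left to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bytes needed for one bit per possible prefix of the given width.
constexpr std::uint64_t table_bytes(unsigned prefix_bits) {
    return (std::uint64_t(1) << prefix_bits) / 8;
}
static_assert(table_bytes(32) == 512ull * 1024 * 1024, "2^32 bits is 512 MiB");

// Hypothetical sketch: a bitset indexed by the low `prefix_bits` bits of a
// 32-bit word (deflate packs bits LSB-first, so with a little-endian load
// these are the first bits of the candidate stream).
class PrefixTable {
public:
    explicit PrefixTable(unsigned prefix_bits)
        : mask_(make_mask(prefix_bits)),
          words_(static_cast<std::size_t>(table_bytes(prefix_bits) / 8), 0) {}

    void mark_valid(std::uint32_t prefix) {
        std::uint32_t p = prefix & mask_;
        words_[p >> 6] |= std::uint64_t(1) << (p & 63);
    }

    // false means the position can be skipped without running the decoder.
    bool maybe_valid(std::uint32_t prefix) const {
        std::uint32_t p = prefix & mask_;
        return ((words_[p >> 6] >> (p & 63)) & 1) != 0;
    }

private:
    static std::uint32_t make_mask(unsigned bits) {
        assert(bits >= 6 && bits <= 32);  // at least one 64-bit word
        return bits == 32 ? 0xFFFFFFFFu : (std::uint32_t(1) << bits) - 1;
    }

    std::uint32_t mask_;
    std::vector<std::uint64_t> words_;
};
```

With PrefixTable(16) the table is only 8 KiB, which is handy for experiments; PrefixTable(32) allocates the full 512 MiB mentioned in the post.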
Oh I guess this was the missing element.
> reflate should be able to work in a network proxy.
Ok, I can now see its purpose.
Files needed for compression:
cls-reflate.dll
reflate.exe
Files needed for decompression:
cls-reflate.dll
reflate.exe
Drawbacks:
It's actually slower during compression process than normal.
Haven't checked decompression speed, but I'm sure its speed isn't affected that much.
Advantages:
Shows progress when compressing and decompressing.
http://nishi.dreamhosters.com/u/rawfilt_v1l.rar
Bugfix for the file found by PrinceGupta, also disabled the EOF codelen check in deflate decoder - better scan results are possible in some cases.
http://nishi.dreamhosters.com/u/rawfilt_v1l.rar (updated)
Added "reflate_std.exe" which uses the frontend from v1e - no nesting support, but can work with stdin/out/err.
Note that .raw/.out are swapped - see test_std.bat.
use 32-bit fa32.exe, copy both distro into same dir and then try just "fa32 a archive xxx.zip -mreflate"
I want to use it in fa.ini and with 64-bit
you can use "reflate" method in fa.ini just like you use "precomp" or "lzma". overall, cls-XXX.dll just adds XXX compression method that can be used like any built-in one, and may even have parameters a-la "reflate:a1:b2:c3:d44". but you can't use any 32-bit dll with any 64-bit program. i asked Razor to compile 64-bit versions of his CLS plugins, but this may need some time
Do you have any idea what goes wrong with the attached PNG that can't be decompressed by reflate? What I know about it is that it mixes static and dynamic huffman and has many end codes (1611), but I guess this shouldn't prevent decompression. Perhaps this can be solved with other parameters or another version of reflate, I only tried the newest reflate_std version you posted (rawfilt_v1l.rar) with standard parameters (c9 - - -). Tried all of the compression levels, though (leading to similar results, 303 KB .out, 11/11/14/17/23/33/98/111/124 bytes .raw for levels 9/8/.../1).
Also note that the image on XKCD has been replaced with a newer and smaller version (240 KB instead of 302 KB) that works in reflate and even in Precomp.
Nothing really goes wrong, it just starts with a type1 block probably...
Code:
    D:\tmp7>reflate.exe c new_pet.png 1 2 // 999 9 9 0
    depth=1; levels[]={ 9, 9, 9 }; blk_minsize[] = { 999, 9, 9, 0.000 }
    sizeof(M)=30380032
    inp=302719/302719 out=1791251
    depth[0].n_raws=2
    D:\tmp7>reflate.exe d 1 2 3
    depth=1
    sizeof(M)=30380032
    inp=1791251/1791251 out=302719
    D:\tmp7>md5sum 3 new_pet.png
    302726f7ff3a5684b0038170f96c968e *3
    302726f7ff3a5684b0038170f96c968e *new_pet.png
Thanks for the sample file, but I still don't know what to do with such cases, because easily accepting valid type1 blocks
would cause lots of misdetections - in any case, there're command line parameters for that.
Also, there's a plan to make a specialized png recompressor based on reflate.
Shelwien,
Could you explain the parameters of reflate? what does 999 9 9 0 mean?
What is '-' in :
reflate c9 - - - <%1 >.out 2>.raw
Thanks
blk_minsize[] = { 99999, 4096, 10, 7.500 }
1. There're 3 types of deflate blocks - type0 (stored), type1 (precomputed huffman table), type2 (encoded huffman table)
So first 3 params are minimum block sizes (in bytes) per block type - block is skipped if decoder returns success,
but block size is shorter than threshold. (This only applies to first block in a stream.)
However there're problems with type0 and type1 blocks at start of a stream, so default thresholds for these are set relatively high:
type1 block only has 3 bits of header followed by huffman data, so there're lots of misdetections when we accept short type1 blocks -
longer type1 blocks can be discarded due to invalid matches (out of window distance).
type0 block actually has "inverted length" field, which makes detecting it much easier than type1.
But still, 00 00 FF FF or FF FF 00 00 are not so rare, so misdetections are easy.
And the main problem is that type0 block has a padding field (there're 3 header bits and then a padding for byte alignment),
and there's currently a bug (more like design problem) where first block is supposed to always have zero padding.
Which is why blk_minsize[0] is set to 99999 by default, which would skip _all_ type0 blocks (a stored block can hold at most 65535 bytes).
2. 4th parameter is entropy filter threshold - there's a fast order0 model with 256-byte window, which estimates
bpc (compressed size in bits) of current data byte. Deflate is supposed to be a compression algo, so it shouldn't
be possible to further compress it with simple entropy coding, thus default threshold is set to 7.5 bits per byte.
But sometimes there're quirky compression libs or _very_ redundant data (like MBs of zeroes), in which case using
lower threshold can help - but it would also significantly reduce the processing speed.
3. "-" is a standard option in unix utils, which makes them work with stdin/stdout instead of files.
See https://linux.die.net/man/1/grep
However this atm only applies to reflate_std.exe
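As a small illustration of the type0 discussion in point 1, the sketch below probes a stored-block header the way the deflate format lays it out: 3 header bits (BFINAL, then 2 bits of BTYPE), padding up to the byte boundary, then LEN and its one's complement NLEN as little-endian 16-bit fields. The struct and function names are hypothetical, and it assumes the block starts at bit 0 of the first byte; reflate's real detector is certainly more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct StoredHeader {
    bool ok;           // BTYPE is 0 and NLEN is the complement of LEN
    bool final_block;  // BFINAL bit
    unsigned padding;  // the 5 bits between the header and the byte boundary
    unsigned len;      // LEN field (payload size of the stored block)
};

// Hypothetical probe: assumes the block header starts at bit 0 of data[0]
// and that n is the number of readable bytes.
StoredHeader probe_stored(const std::uint8_t* data, std::size_t n) {
    assert(data != nullptr);
    StoredHeader h = {false, false, 0, 0};
    if (n < 5) return h;                       // header + LEN + NLEN needs 5 bytes
    unsigned hdr = data[0];
    h.final_block = (hdr & 1) != 0;
    unsigned btype = (hdr >> 1) & 3;
    h.padding = hdr >> 3;
    unsigned len  = data[1] | (static_cast<unsigned>(data[2]) << 8);
    unsigned nlen = data[3] | (static_cast<unsigned>(data[4]) << 8);
    h.len = len;
    h.ok = (btype == 0) && ((len ^ nlen) == 0xFFFFu);
    return h;
}
```

Both byte patterns mentioned above pass this check: 00 00 FF FF reads as LEN=0 and FF FF 00 00 as LEN=65535. A byte like F8 in front also passes with nonzero padding, which is the case the zero-padding assumption for the first block does not cover.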
Thank you for detailed description.
Please add -help to reflate.exe and list the arguments.
Some queries regarding your 7z.
1. What is the difference between zstd2 and zstd?
2. What do x64flt, split** and deltb do?
3. Is there anything special about the format "P a F i L e"?
4. Why not SREP instead of rep?
Thanks in advance.
1) same as between lzma2 and lzma. There're "mt" and "c" params which control MT. See http://nishi.dreamhosters.com/u/method.htm
2a) x64flt is an x64 add-on for bcj/bcj2 filters, compare:
7z a -mf=off -bb3 -m0=lzma 1.pa *.exe
7z a -mf=off -bb3 -m0=bcj -m1=lzma 2.pa *.exe
7z a -mf=off -bb3 -m0=x64flt -m1=bcj -m2=lzma 3.pa *.exe
2b) split*N is a MT splitter for filter trees like reflate
2c) deltb is an adaptive delta filter, like Bulat's (a bit worse atm)
3) I had to hack codec/stream number limits in 7z source, so the format is not really compatible with .7z.
The source will be posted on github at some point anyway (except for reflate and plzma dlls)
4) Because this rep filter is written by me, and srep is not. Also, unmodified srep is not exactly compatible with 7z class system
(7z can use multiple instances of any filter/codec), and large buffers and/or temp files are not a good idea either.
I intend to work on improving rep1 speed though (it has fb and c params btw).
Also, this is not really compatible with reflate thread - better switch to PM or start another thread.
Seems it will be included in PowerArchiver 2017: http://www.powerarchiver.com/2016/12...up-to-70-more/
Congrats Shelwien Uncle :popper: Am getting that software to bench it XD
Nice to see more plugins added in latest PowerArchiver 2017 beta (bsc, DeJPG)
http://nishi.dreamhosters.com/u/rawfilt_v1l2.rar
Fixed a bug that appeared during restoring of some pdfs.
Thanks to Stephan Busch for providing the samples.
Also changed some simple coroutine (in restoring) to a state machine, as experiment,
but somehow speed remained exactly the same.
Hi Shelwien
Please Update
There's nothing special to update really.
1. I made a back-port of raw2hif dlls from recent (v1l) sources for people who use these dlls.
There're some bugfixes and it's faster compared to 0c3.
http://nishi.dreamhosters.com/u/refl...x32_dll_v0.rar
http://nishi.dreamhosters.com/u/refl...x32_dll_v0.rar
2. http://nishi.dreamhosters.com/u/7zdll_vF4.rar has integrated reflate, with MT support via split.
Usage examples:
7z a -m0=reflate -m1=plzma 1.pa files
7z a -mx=9 -m0=split*2:c=16M -m1=reflate*4:x9876 -m2=lzma2 -m3=reflate*4:x9876 -m4=lzma2 -mb0s0:1 -mb0s1:3 -mb1s0:2 -mb3s0:4 1.pa testfile.zip
3. Still didn't have time to add level detection or anything like that.
can get the level from this
http://fileforums.com/showthread.php?t=99270
Last edited by PrinceGupta; 17th July 2017 at 18:32.
Hello Shelwien,
thank you for this nice recompressor.
I've got a bug to report with http://nishi.dreamhosters.com/u/7zdll_vF5.rar
I've been testing it with reflate+plzma and it seems plzma is giving consistently worse results than either lzma1 or lzma2. The bug appears when decompressing approx. 4GB+ archives created with these settings:
Code:
    7z a -r -mx=9 -myx=9 -m0=reflate:x6 -m1=plzma:mt2:a1:1610612736:lc8:fb273 win.plzma.pa Win10_1709_Czech_x64.iso
Compression runs ok, but on decompression it gets stuck at 99% and the end of the decompressed file is missing. I have tried reflate-only and plzma-only methods and separately these work ok.
You can get the test iso here.
MD5 sums of tested files:
2932a4b336a9313eadf7c4518369e437 *Win10_1709_Czech_x64.iso
294486208fde71ab634fbdfa07db65e2 *win.plzma.pa