Activity Stream

  • suryakandau@yahoo.co.id's Avatar
    Today, 09:09
    bigm_suryak v9.1 xml 264149 bytes
    25 replies | 2115 view(s)
  • SvenBent's Avatar
    Today, 03:41
    3 days later and I'm still not done with ECT -9 --allfilters-b --pal_sort=120. I am going to do the full color testing on a faster PC...
    428 replies | 112802 view(s)
  • Shelwien's Avatar
    Today, 00:41
    In theory you can run "precomp -v" and parse its output:
      (67.02%) Possible bZip2-Stream found at position 154448374, compression level = 9
      Compressed size: 3051
      Can be decompressed to 8996 bytes
      Identical recompressed bytes: 3051 of 3051
      Identical decompressed bytes: 8996 of 8996
      Best match: 3051 bytes, decompressed to 8996 bytes
      Recursion start - new recursion depth 1
      No recursion streams found
      Recursion end - back to recursion depth 0
      (72.75%) Possible GIF found at position 167662070
      Can be decompressed to 5211 bytes
      Recompression successful
      (72.75%) Possible GIF found at position 167663606
      Can be decompressed to 5193 bytes
      Recompression successful
      (72.75%) Possible GIF found at position 167665142
      Can be decompressed to 5211 bytes
      Recompression successful
      (72.75%) Possible GIF found at position 167666678
      Can be decompressed to 5988 bytes
      Recompression successful
    It prints positions, so it shouldn't be that hard.
    57 replies | 5002 view(s)
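    A minimal parsing sketch (Python; it assumes precomp is on PATH and that the verbose lines keep the "Possible ... found at position N" wording shown above - the regular expression and variable names are illustrative, not part of precomp itself):

      import re, subprocess, sys

      # Run "precomp -v" on the file given as the first argument and collect
      # (stream type, position) pairs from lines like
      # "(72.75%) Possible GIF found at position 167662070".
      pattern = re.compile(r"Possible (\S+?)(?:-Stream)? found at position (\d+)")

      proc = subprocess.run(["precomp", "-v", sys.argv[1]],
                            capture_output=True, text=True)
      streams = []
      for line in proc.stdout.splitlines():
          m = pattern.search(line)
          if m:
              streams.append((m.group(1), int(m.group(2))))

      for kind, pos in streams:
          print(kind, "stream at offset", pos)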
  • Gonzalo's Avatar
    Yesterday, 23:12
    I was thinking about a rather naive way to improve precomp effectiveness... I'm sure somebody thought about it before; I'm just sharing it to find out whether it could be done or whether it's a bad idea. The possibility of rearranging data inside the .PCFs to group similar streams, and in doing so improve compression, has been mentioned before. Couldn't it be simpler to output every stream as a separate file with a guessed extension, like '.ari' for incompressible streams, '.txt' for text, '.bmp' for bitmaps and '.bin' for everything else? Then any modern archiver would take care of the grouping and maybe codec selection. An alternative (so as to not write a million little files to the disk) would be to output a few big TXT, BIN, and so on, with all respective streams concatenated, plus an index.pcf containing the metadata needed for reconstruction. What do you think about it?
    57 replies | 5002 view(s)
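    A rough sketch of the second variant (concatenated per-type containers plus an index; Python, with hypothetical file names, a caller-supplied stream list, and a JSON index layout chosen only for illustration):

      import json, os

      def write_grouped(streams, out_dir):
          """streams: list of (guessed_type, data_bytes), e.g. [("txt", b"..."), ("bin", b"...")]."""
          os.makedirs(out_dir, exist_ok=True)
          handles, offsets, index = {}, {}, []
          for kind, data in streams:
              if kind not in handles:
                  handles[kind] = open(os.path.join(out_dir, "streams." + kind), "wb")
                  offsets[kind] = 0
              # index.pcf keeps the original order plus (type, offset, length) for reconstruction
              index.append({"type": kind, "offset": offsets[kind], "length": len(data)})
              handles[kind].write(data)
              offsets[kind] += len(data)
          for h in handles.values():
              h.close()
          with open(os.path.join(out_dir, "index.pcf"), "w") as f:
              json.dump(index, f)

    A solid archiver run over the resulting directory would then group the .txt/.bmp/.bin containers in the way described above.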
  • byronknoll's Avatar
    Yesterday, 10:51
    @Kirr, if you haven't seen it, there was some DNA benchmarking in this thread: https://encode.su/threads/2105-DNA-Corpus From that thread, there is one trick for "palindromic sequences" that significantly improves compression rate on DNA for general compressors. One other suggestion is to try adding cmv to your benchmark.
    17 replies | 1225 view(s)
  • suryakandau@yahoo.co.id's Avatar
    Yesterday, 08:58
    @byron, could you run bigm using paq8hp(11), paq8h2(11) and paq8l(10) on enwik8? I just want to know how much enwik8 can be compressed... thank you
    25 replies | 2115 view(s)
  • suryakandau@yahoo.co.id's Avatar
    Yesterday, 06:38
    I have published the source code in the bigm thread.
    17 replies | 1225 view(s)
  • Kirr's Avatar
    Yesterday, 05:39
    One cool thing you can do with the benchmark is a detailed comparison of any two (or more) compressors, or their settings. For example, recently I was wondering about some compressor levels that seem redundant. Let's take a closer look at one such example: lz4 -2 vs lz4 -1. From the data table you can see that they are very close, but it's not so convenient to detect this in a wall of numbers. Fortunately, it's easy to visualize this data. For example, this scatterplot shows the difference between "-1" and "-2". For each dataset it shows the results of lz4 -2 divided by those of lz4 -1 (so that ratios of the measurements are shown). Each point is a different test dataset. Which measurements to show is selectable; in this case it's compression ratio on the X axis, and compression+decompression speed on the Y axis. E.g., here is the same kind of chart showing compression memory against decompression memory of lz4 -2 compared to lz4 -1. The charts clearly show that "-2" and "-1" have identical compression strength. The difference in speed and memory consumption is tiny and can probably be explained by measurement noise (considering that all outliers are on very small data). Therefore "-2" can be considered redundant, at least on this particular machine and test data.
    17 replies | 1225 view(s)
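    A small sketch of the per-dataset ratio idea (Python; the results table, its numbers and the metric names are made up purely to show the computation):

      # Per-dataset ratios of two compressor settings, e.g. lz4 -2 divided by lz4 -1,
      # as used for the scatterplots described above.
      results = {
          # (setting, dataset): {metric: value}
          ("lz4-1", "dataset_A"): {"ratio": 2.10, "c_speed": 750.0, "mem": 6.0},
          ("lz4-2", "dataset_A"): {"ratio": 2.10, "c_speed": 748.0, "mem": 6.1},
          ("lz4-1", "dataset_B"): {"ratio": 1.95, "c_speed": 810.0, "mem": 6.0},
          ("lz4-2", "dataset_B"): {"ratio": 1.95, "c_speed": 812.0, "mem": 6.0},
      }

      def relative(metric, a="lz4-2", b="lz4-1"):
          """Return {dataset: a/b} for one metric; values near 1.0 mean 'no real difference'."""
          datasets = {ds for (_, ds) in results}
          return {ds: results[(a, ds)][metric] / results[(b, ds)][metric] for ds in datasets}

      print(relative("ratio"))    # ~1.0 everywhere -> identical compression strength
      print(relative("c_speed"))  # scatter around 1.0 -> likely measurement noise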
  • Kirr's Avatar
    Yesterday, 04:57
    1.3 GB is nice. It's unfortunate that you choose to waste your good work by ignoring the concerns of others (and possibly violating the GPL), by keeping the source closed, by distributing only a Windows binary, and by staying anonymous. I'm not going to touch your compressor with a 10-foot pole while this remains the case.
    17 replies | 1225 view(s)
  • suryakandau@yahoo.co.id's Avatar
    Yesterday, 01:55
    Enwik8 17801098 bytes
    25 replies | 2115 view(s)
  • Gotty's Avatar
    Yesterday, 00:10
    Oh, sorry. I could have seen it from your earlier posts. Nevertheless these numbers are (or may be) useful for the reader - including me. So thank you for your effort. Appreciated.
    206 replies | 122179 view(s)
  • suryakandau@yahoo.co.id's Avatar
    13th December 2019, 18:06
    bigm_suryak v9 - improved word model. xml file from the Silesia benchmark: 264409 bytes, using ~1.3 GB memory. This archive file contains source code and binary.
    25 replies | 2115 view(s)
  • kaitz's Avatar
    13th December 2019, 02:26
    int cxt={}; // dynamic alloc memory for int array
    int cxt1,cxt2,cxt3,cxt4,N;
    enum {SMC=1,APM1,DS,AVG,RCM,SCM,CM,MX,ST};
    int update(int y,int c0,int bpos,int c4,int pos){
      int i;
      if (bpos==0) cxt4=cxt3,cxt3=cxt2,cxt2=cxt1,cxt1=(c4&0xff)*256;
      cxt=(cxt1+c0);
      cxt=(cxt2+c0+0x10000);
      cxt=(cxt3+c0+0x20000);
      for (i=0;i<N;++i) vmx(DS,0,cxt); // pr--pr
      vmx(APM1,0,c0); //
      return 0;
    }
    void block(int a,int b){}
    int main(){
      int i;
      N=3;
      vms(0,1,1,2,0,0,0,0,0); // APM1,DS,AVG
      vmi(DS, 0,18,1023,N);   // pr..pr
      vmi(AVG,0,0,1,2);       // pr=avg(1,2)
      vmi(AVG,1,0,0,3);       // pr=avg(0,3)
      vmi(APM1,0,256,7,4);    // pr=apm(4) rate 7
      cxt1=cxt2=cxt3=cxt4=0;
    }
    The above works in the newer version. In update() only the contexts are set; the prediction order depends on the order set up in main(), sort of like in a zpaql config file.
    20 replies | 3892 view(s)
  • dnd's Avatar
    12th December 2019, 22:58
    TurboBench - Compression Benchmark updated.
    - All compressors are continuously updated to the latest version
    - base64 encoding/decoding
    - New external dictionary compression with zstd, including multiblock mode in TurboBench
    Benchmarking zstd with an external dictionary:
    1 - generate a dictionary file with: zstd --train mysamples/* -o mydic
    2 - start turbobench with: ./turbobench -ezstd,22Dmydic file
    (the external dictionary "mydic" must be in the current directory)
    You can also benchmark multiple small files using multiblock mode in turbobench:
    1 - store your small files into a multiblock file using option "M": ./turbobench -Mmymultiblock files
    (mymultiblock output format: length1,file1,length2,file2,...lengthN,fileN, length = 4-byte file/block length)
    2 - benchmark using option "-m": ./turbobench -ezstd,22Dmydic mymultiblock -m
    160 replies | 41534 view(s)
  • encode's Avatar
    12th December 2019, 21:36
    I'm not really that into this thing, but here's my quick test on my i5-9600K @ 5 GHz:
    206 replies | 122179 view(s)
  • load's Avatar
    11th December 2019, 16:54
    load replied to a thread WinRAR in Data Compression
    WinRAR Version 5.80 https://www.rarlab.com/rarnew.htm https://www.rarlab.com/download.htm
    175 replies | 122057 view(s)
  • GamesBX2910's Avatar
    11th December 2019, 10:31
    Hello IO games or .IO games are real-time multiplayer games, easily identified by the .io domain name (the top-level national domain of the British Indian Ocean Territory - The British Indian Ocean Territory ). Initially, these games were only developed for the web (webgame), but with a boom, game makers have put them on mobile platforms so that players can experience them anytime and anywhere. I will take a few examples for this development: Diep io tank and Chompers games free Game IO is entertaining, light and fun, but equally deep. Players need to have the tactics and patience to win online multiplayer battles and dominate the arena and rankings. So we are developing these .io games for everyone to turn these simple games back into their place. We created websites with thousands of .io games called GamesBX Along with the advancement of science GamesBX offers a wide range of smart games with the form of .io games, suitable for all ages because it is very user friendly, especially very easy to use with the Simple protocol to use directly on the browser. No download time, especially no charge, as it is completely free to play. GamesBX's game store is updated daily with the latest BX games, with a modern interface and a lot of different titles. Includes interactive multiplayer games such as war, action, shooting ...; or strategy games with real time, puzzles, adventure ... and many other things for free. Coppywriter: Gamesbx5.info Blog games Guide games hot
    0 replies | 45 view(s)
  • fhanau's Avatar
    11th December 2019, 05:17
    I think ECT generally rewrites the palette, and transforms 1, 2 and 4 are done as long as you are using mode 2 or higher.
    428 replies | 112802 view(s)
  • Sportman's Avatar
    11th December 2019, 00:01
    Software-based Fault Injection Attacks against Intel SGX (Software Guard Extensions): https://www.plundervolt.com/doc/plundervolt.pdf https://www.plundervolt.com/ https://github.com/KitMurdock/plundervolt https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00289.html
    15 replies | 3309 view(s)
  • SvenBent's Avatar
    10th December 2019, 19:21
    Does ECT do palette compression/expansion?
    1: if a palette contains unused entries, does ECT discard the unused entries?
    2: if a palette contains the same color twice, does ECT merge them into one entry?
    3: if a palette is only 4 bits (16 colors), does ECT try to convert it to an 8-bit palette for better filtering (as I understand it, filters work at the byte level, not the pixel level)?
    4: does ECT convert to grey tone if the palette entries all fall within the colors available in grey tone?
    Thank you
    -- edit ---
    I guess I could have tested this easily once I got home
    428 replies | 112802 view(s)
  • suryakandau@yahoo.co.id's Avatar
    10th December 2019, 07:00
    enwik8: 17809514 bytes, using ~1.3 GB memory
    25 replies | 2115 view(s)
  • SvenBent's Avatar
    10th December 2019, 06:12
    Wow, I will admit doing this really shows ECT's impressiveness. 24x 8-bit paletted:
      Encoding           Bit  Size                          Time
      Original BMP       32   27.30 MB (28,640,482 bytes)
      Paint save as PNG  32   18.50 MB (19,494,337 bytes)
      PNGout             32   12.30 MB (12,966,158 bytes)   Global Time = 1010.344 = 00:16:50.344 = 100%
      PNGout /C3         8     9.41 MB ( 9,877,738 bytes)   Global Time = 122.219 = 00:02:02.219 = 100%
      ECT -9 -strip      8     9.40 MB ( 9,860,903 bytes)   Global Time = 197.641 = 00:03:17.641 = 100%
    428 replies | 112802 view(s)
  • brispuss's Avatar
    10th December 2019, 03:16
    Thanks for the script! It works! Don't know why I didn't think of this method before!? Anyway, I'll experiment a bit more with lepton and other (pre)processors and see what compression results are obtained and update the compression test list as required.
    25 replies | 1807 view(s)
  • Shelwien's Avatar
    10th December 2019, 01:17
    I just downloaded https://github.com/dropbox/lepton/releases/download/1.2/lepton-slow-best-ratio.exe and tried running it like this:
      C:\>C:\Users\Shelwien\Downloads\lepton-slow-best-ratio.exe C:\9A6T-jpegdet\A10.jpg
      lepton v1.0-08c52d9280df3d409d9246df7ff166dd94628730
      lepton v1.0-08c52d9280df3d409d9246df7ff166dd94628730
      7511941 bytes needed to decompress this file
      672236 842468 79.79%
      lepton v1.0-08c52d9280df3d409d9246df7ff166dd94628730
      bytes needed to decompress this file
      672236 842468 79.79%
    and it works. And for recursive processing you can do something like this:
      cd /d C:\test
      for /R %a in (*.j*) do C:\lepton\lepton-slow-best-ratio.exe "%a"
    25 replies | 1807 view(s)
  • brispuss's Avatar
    9th December 2019, 21:33
    As mentioned in the previous post, getting lepton to run is a bit difficult, as it apparently needs to be located in the current working directory of the files to be processed. Currently lepton-slow-best-ratio.exe is located in the c:\lepton directory. Jpeg files to be processed are located in directory c:\test, plus there are sub-directories as well. So what is wanted is a batch script that would "effectively" allow lepton-slow-best-ratio.exe to run in the c:\test directory and process files in this directory, and also recurse into sub-directories as well. Any ideas please?
    25 replies | 1807 view(s)
  • byronknoll's Avatar
    9th December 2019, 21:33
    @suryakandau, please distribute the source code of bigm along with the releases. Since cmix code is GPL, and bigm contains cmix code, bigm also needs to be open source.
    25 replies | 2115 view(s)
  • byronknoll's Avatar
    9th December 2019, 21:19
    byronknoll replied to a thread cmix in Data Compression
    cmix on enwik8: 14838332 -> 14834133 cmix on enwik9: 115714367 -> 115638939
    418 replies | 102116 view(s)
  • suryakandau@yahoo.co.id's Avatar
    9th December 2019, 18:40
    Bigm is not cmix, because bigm uses only ~1.3 GB of memory; cmix uses 24-25 GB of memory. I use Windows 10 64-bit. So is it okay to add bigm to the benchmark list?
    17 replies | 1225 view(s)
  • cottenio's Avatar
    9th December 2019, 18:15
    Good news! I applied the idea (recursive deltas) to identifying the placement of the gaps (since the data is otherwise sequential) instead of bitmapping the placements first, and it brought the total encoding size down to 1,421 bytes, which handily beat the cmix best benchmark. Per James' suggestion I'm going to write up a quick tool to do compression/decompression automatically on "delta compatible" data like this.
    5 replies | 448 view(s)
  • Kirr's Avatar
    9th December 2019, 17:29
    This would put bigm on #2 for Astrammina rara (after cmix) and on #5 for Nosema ceranae (after jarvis, cmix, xm and geco), in compactness. (Note that this is not the most important measurement for judging practical usability of a compressor). What was compression and decompression time and memory use? Also on what hardware and OS? According to another thread the relationship between bigm and cmix is unclear currently, which probably means that I should not add bigm to benchmark until the issue is resolved?
    17 replies | 1225 view(s)
  • suryakandau@yahoo.co.id's Avatar
    9th December 2019, 11:48
    Using bigm_suryak v8: astrammina.fna 361344 bytes, nosema.fna 1312863 bytes
    17 replies | 1225 view(s)
  • suryakandau@yahoo.co.id's Avatar
    9th December 2019, 09:29
    bigm_suryak v8 - improved word model, uses only 1.3 GB memory. xml file from the Silesia benchmark, without precomp: 264725 bytes
    25 replies | 2115 view(s)
  • SvenBent's Avatar
    9th December 2019, 03:54
    It's in the download section of this forum. But I am not sure it's really optimal now with all the work in ECT, which is why I'm doing some retesting. Currently doing some testing on the Kodak image suite.
    428 replies | 112802 view(s)
  • zyzzle's Avatar
    9th December 2019, 01:58
    This DeflOpt bug has bothered me for years -- decades. Is there any way to patch the binary to overcome this 2^32-bit limit for files? At least making it 2^32 bytes would help very substantially. I take it the source is no longer available, or never was. Somehow it must be possible to reverse engineer the binary and eliminate the bits business, change it to bytes...
    428 replies | 112802 view(s)
  • Krishty's Avatar
    9th December 2019, 00:07
    Yes, please share it! If anyone is interested, here is how Papa’s Best Optimizer does it:
    1. Copy the file to a temporary location and enforce extension .png (defluff does not work on PNGs without the .png extension; this also prevents Unicode/max path problems).
    2. ECT -30060 --allfilters-b --pal_sort=120, optionally with --strip and --strict (30060 is according to my tests earlier in this thread).
    3. defluff.
    4. DeflOpt /bk, skipped if the file is larger than 512 MiB because it breaks Deflate streams with more than 2³² bits; run after defluff because you said it’s best to run it after defluff.
    5. If defluff could not shrink the file and DeflOpt printed "Number of files rewritten : 0", optimization stops here; else there has been some gain (even single-bit) and it goes back to 3. (This is broken in the current version and will be fixed for the next one; it missed a 1-B gain on two out of 200 files.)
    The next time I have plenty of spare time, I want to check out Huffmix.
    428 replies | 112802 view(s)
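    A sketch of that loop in Python (this is my reading of the steps above, not Papa’s Best Optimizer's actual code; the tool names are assumed to be on PATH, and the exact defluff invocation is an assumption - treated here as an in-place, file-argument call):

      import os, shutil, subprocess, sys, tempfile

      def optimize_png(path):
          # 1. Copy to a temporary location with a forced .png extension.
          tmp = os.path.join(tempfile.mkdtemp(), "work.png")
          shutil.copyfile(path, tmp)
          # 2. ECT pass with the settings quoted above.
          subprocess.run(["ect", "-30060", "--allfilters-b", "--pal_sort=120", tmp], check=True)
          while True:
              before = os.path.getsize(tmp)
              # 3. defluff (invocation is an assumption).
              subprocess.run(["defluff", tmp], check=True)
              # 4. DeflOpt /bk, skipped above 512 MiB because of the 2^32-bit stream limit.
              rewritten = False
              if os.path.getsize(tmp) <= 512 * 1024 * 1024:
                  r = subprocess.run(["DeflOpt", "/bk", tmp], capture_output=True, text=True)
                  rewritten = "Number of files rewritten : 0" not in r.stdout
              # 5. Stop when neither tool gained anything; otherwise go back to step 3.
              if os.path.getsize(tmp) >= before and not rewritten:
                  break
          if os.path.getsize(tmp) < os.path.getsize(path):
              shutil.copyfile(tmp, path)

      if __name__ == "__main__":
          optimize_png(sys.argv[1])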
  • LucaBiondi's Avatar
    8th December 2019, 14:20
    LucaBiondi replied to a thread fpaq0 in ADA in Data Compression
    Kaitz you are my hero :) Would you be able to write it in Delphi? Have a good Sunday!
    4 replies | 339 view(s)
  • Shelwien's Avatar
    8th December 2019, 12:48
    Shelwien replied to a thread vectorization in Data Compression
    > for text at least, vectorizing is always better option
    Lossy OCR is not a better option, especially for languages like Japanese or Chinese. Too many errors. Lossless (symbols + diff, like what djvu tries to do) kinda makes sense, but it's still hard to compress it better than the original bitmap. Also tracing errors are more noticeable than e.g. JPEG blur. And high-quality tracing produces too much data. Of course, it's a useful tool anyway, but for reverse-engineering (when we want to edit a document without source) rather than compression.
    3 replies | 369 view(s)
  • pklat's Avatar
    8th December 2019, 11:36
    pklat replied to a thread vectorization in Data Compression
    I had tried djvu long ago. It's nice, but today OCR is excellent. The SVG format can combine bitmap and vector, so it would be an ideal replacement for the proprietary PDF. For text at least, vectorizing is always the better option, for a number of reasons, obviously.
    3 replies | 369 view(s)
  • Mauro Vezzosi's Avatar
    8th December 2019, 00:09
    Mauro Vezzosi replied to a thread cmix in Data Compression
    Cmix commit 2019/12/05, changing from layer_norm to rms_norm: yes, rms_norm looks better than layer_norm. How much does cmix (or lstm-compress) improve?
    418 replies | 102116 view(s)
  • brispuss's Avatar
    7th December 2019, 23:55
    Thanks for clarifying. I'll amend the compression descriptions in a while. Now regarding lepton-slow-best-ratio, which doesn't seem to work properly as described in my post above: from a quick search on the 'net, it seems the issue may be due to how "lepton" has been coded. It seems that lepton has to be placed within the current working directory in order for it to find files to process(?) If so, what batch script, if any, will effectively place lepton within recursed directories in order for it to locate and process (jpg) files?
    25 replies | 1807 view(s)
  • Shelwien's Avatar
    7th December 2019, 22:05
    Shelwien replied to a thread fpaq0 in ADA in Data Compression
    @JamesWasil: I think it's not just a random choice of language - https://encode.su/threads/3064-paq8pxv-virtual-machine?p=62339&viewfull=1#post62339
    4 replies | 339 view(s)
  • JamesWasil's Avatar
    7th December 2019, 22:03
    JamesWasil replied to a thread fpaq0 in ADA in Data Compression
    Lol I haven't used Ada for ages. That is cool though, kaitz. Thanks! Next someone should port it to Fortran, Basic, and APL lol (Has anyone here used APL for anything in the last 30 years?)
    4 replies | 339 view(s)
  • Shelwien's Avatar
    7th December 2019, 22:02
    Shelwien replied to a thread fpaq0 in ADA in Data Compression
    So can you explain what you see in Ada that makes it useful for compression algorithms? This source looks pretty similar to pascal/delphi to me.
    4 replies | 339 view(s)
  • kaitz's Avatar
    7th December 2019, 21:55
    kaitz started a thread fpaq0 in ADA in Data Compression
    This is fpaq0 ported to the Ada language. Source: http://mattmahoney.net/dc/fpaq0.cpp Used the GPS 19.1 dev environment for that. The executable is static. Compression/decompression is identical to the cpp source. This is my first time writing something in Ada, so it's probably not the best example of correct code. :D
    4 replies | 339 view(s)
  • cottenio's Avatar
    7th December 2019, 21:01
    Hi Gotty! Ha, you're absolutely right, and what's crazy is I had the same thought you did and am building a minimal skeleton first. I have similar results to yours, although I also found some fun ways to infer data about the relationship with timestamps and revision_ids and storing the errors from predicted values. I'll definitely check out your code as well!
    5 replies | 448 view(s)
  • cottenio's Avatar
    7th December 2019, 20:59
    Thanks for the warm welcome James! I really appreciate your insight and will definitely try out the technique on other sources as well; thanks for the links!
    5 replies | 448 view(s)
  • dado023's Avatar
    7th December 2019, 20:23
    Hi Sven, would you be so kind as to share your pngbest.bat? I am mostly a fan of the best possible compression, but within a reasonable compression duration :)
    428 replies | 112802 view(s)
  • Aniskin's Avatar
    7th December 2019, 20:13
    Imho a quite strange use of MFilter. And again - MFilter should be used with additional compression -m1=LZMA(2). MFilter separates the input jpeg file into metadata and jpeg data, then passes the jpeg data into the selected jpeg coder. So if you don't use -m1=LZMA(2), all metadata will be left without any compression, and additionally the result will have overhead added by the 7z file format and the MFilter codec. And simple use of Lepton/Brunsli/PAQ will give you better results because they compress metadata.
    > lepton+paq8px 52,820,938 7z a -m0=mfilter:a2 <- not the same size as last entry at 50,782,720 despite similar processing!?
    This is not lepton+paq8px. This is paq8px only. Metadata without any compression. And simple paq8px gives you a better result.
    > lepton (slow version) 52,705,758 ran under original 7-zip using command 7z a -m0=mfilter:a1
    Metadata without any compression. Simple Lepton.exe may give you a better result.
    > brunsli 52,444,562 7a a m0=mfilter:a0
    Metadata without any compression. Simple Brunsli.exe may give you a better result.
    > lepton+paq8px+lzma2 51,434,256 7z a -m0=mfilter:a2 -m1=lzma2:x9
    This is not lepton+paq8px+lzma2. This is paq8px+lzma2.
    25 replies | 1807 view(s)
  • Gotty's Avatar
    7th December 2019, 19:09
    Also refer to "Fastest and smallest enwik8 skeleton (de)compression (for fun)". The observation is the same; the solution is a little bit different.
    5 replies | 448 view(s)
  • schnaader's Avatar
    7th December 2019, 12:41
    Thanks! That was a mutex error in PNG restoration, fixed.
    57 replies | 5002 view(s)
  • JamesWasil's Avatar
    7th December 2019, 09:28
    Hi Cottenio, and welcome to the Encode Data Compression forums!
    What you've done is interesting, as it's a form of lossless delta compression combined with intelligent binary headers and flags speculating distances for page data, while compressing the gaps with the shortest predictions possible, with logarithmic bits required for each structure. There really should be a name for structural targeting and compression of file attributes that are outside of LZ pattern matching or other forms of weighted context mixing and partial matches, etc. As far as I'm aware, there is no official name for it yet, but perhaps there should be one, and a glossary of nomenclature to accompany it? Although there are names and definitions for things mathematically that are commonly understood and accepted for naming convention (things like order 0 translating to 1 byte analysis, order 1 as 2 bytes, order 2 as 3, etc., as n+1 always), there are many things not defined by a static name that may be beneficial to assign, and your neat work (whether you partially reinvented the wheel or not ;) ) brings the importance of that to the forefront. I suppose we should name it the Cottenio Delta algorithm, since it is yours and it is a form of delta encoding. What do you guys think?
    P.S: You may want to see if there are ways to apply this and tailor it to be useful with other text files outside of enwik8... perhaps focusing upon spaces and page breaks CR+LF (chr(13)+chr(10)) or other commonalities in text to preformat it for compression. There are several ways you might go about implementing this, like detecting how many words exist before a space or sequence of common characters, removing them, and then representing them with binary flags similar to how you did with the page IDs and missing pages from enwik8. That said, it might end up being a form of precomp if you're able to isolate the bit flags and have the rest of the file remain text data that can still be worked upon by other compressors, adding to their efficiency. That's one way to approach it and one idea for it, but there are many more I'm sure.
    P.P.S: If you do decide to expand upon it further or tie this implementation together with a form of LZ hybrid for a stand-alone text-based compressor, you might find some of Ross Williams' work from years ago beneficial for that, available freely at http://ross.net/compression/introduction.html (You still might want to make it a precomp and use cmix for better compression, but you have plenty of options.)
    5 replies | 448 view(s)
  • SvenBent's Avatar
    7th December 2019, 02:36
    Thank you. It was mostly for curiosity and testing. I am doing a re-evaluation of ECT against my old pngbest.bat script, trying to figure out which tools are still usable when using ECT.
    428 replies | 112802 view(s)
  • cottenio's Avatar
    7th December 2019, 00:14
    Hi all, I'm relatively new to data compression and have been toying around with enwik8. After noticing that the page ids (which are strongly structured as <id></id> tags) are sequential in nature with gaps due to - I assume - deletions of dead pages, I tried my hand at doing some delta encoding of the gaps, knowing that in general I could assume at least a delta of 1 for each subsequent id.
    Knowing that there are 12,347 total pages in enwik8, I did as follows: the largest gap I found was 198, so in the first naive implementation I stored each delta in ceil(log(198,2),1) bits, which was 8, which took up 12,347 bytes. I'm sure no one is surprised by that. 12,347 bytes was much better than the 226,718 bytes that the natural text represented ("____<id>%i</id>\n").
    I wondered how I could make that more efficient, so I tried getting a feel for the shape of the deltas, which looked like this: As you can see, outliers like 198 and 178 massively inflated the bits required, so the next strategy was bit-packing everything but those two with only 5 bits, and then manually fixing those during the decoding process. Now the largest delta was only 25 (5 bits) and that took ceil((12,345 * 5) / 8, 1) = 7,716 bytes (+ a few more for the two repairs), which I thought was a neat improvement.
    Next I thought: can I embed that idea directly in an unpacking structure that operates on interval steps? I worked out the math and found that:
    - 12,347 values as 1 bit: 0 for no additional delta, 1 for additional delta (max is 1)
    - 2,151 values from above had 1's. Most of them don't have any higher value, so I stored another 2,151 values as 1 bit: 0 for done, 1 for additional delta (max is 2)
    - 475 of those had reached the equivalent of 2 by this point, so I stored another 475 as 2-bit values: 0 for done, 1-3 as higher numbers, with 3 meaning the possibility of additional delta beyond the sum so far (max is 5)
    - 18 of those reached 5 from before, so I stored 18 as 4-bit values using the same idea, with a max reaching 20
    - 5 of those reached 20, so I stored 5 as 8-bit values.
    Altogether, this required 1,544 + 269 + 119 + 9 + 5 = 1,946 bytes.
    I wrote out a binary file containing those bit-packed sequences, then a short C program to decode it again, and it worked fine. I threw it into assembly (and I am by no means a talented assembly programmer) and ended up with a 4,096-byte win32 console executable (data embedded) - which I thought was pretty great for a little decoding engine that can unpack the deltas and recreate the original enwik8 strings.
    The total compression storage, compared by type (against 4,096 bytes):
    - Raw enwik8 lines: 226,718 (1.8%)
    - Parsed chars/no-tags (just the numbers): 53,320 (7.7%)
    - 32-bit integers: 49,388 (8.3%)
    - 8-bit deltas: 12,347 (33.2%)
    Control: cmix achieved a 1,685-byte file on a text file just containing the original page ids.
    So I'm pretty sure I've reinvented a wheel here, but I have no idea what the technique is called so I can learn more about it. It's like a delta encoder, but it infers additional interval steps based on an expansion and knowing the ceiling of the previous interval. Any ideas?
    I've attached the raw data and the executable for review, and included a link to the bit-packing data as a Google sheet.
    - enwik8_raw_page_ids.txt - a grep of all page id lines from enwik8
    - page_id_decoder.exe - extracts the same content as enwik8_raw_page_ids.txt
    https://docs.google.com/spreadsheets/d/1Xq9-KVF40BxwUNf6pkXM0z2L6X0vKryT-fw8122VGZE/edit?usp=sharing
    5 replies | 448 view(s)
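    A minimal sketch of the tiered scheme described above (Python; tier widths follow the post - 1, 1, 2, 4 and 8 bits - while the actual packing of each tier into bytes is left out, and the function names are mine):

      # Each tier stores how much of the "extra" gap is left; a tier's maximum value
      # means "continue in the next tier". With widths 1,1,2,4,8 the cumulative caps
      # are 1, 2, 5, 20, ... exactly as in the post.
      TIERS = [1, 1, 2, 4, 8]   # bits per tier; the last tier must be wide enough

      def encode(extras):
          """extras[i] = gap between consecutive page ids, minus the implied 1."""
          tiers = [[] for _ in TIERS]
          for v in extras:
              remaining = v
              for t, bits in enumerate(TIERS):
                  cap = (1 << bits) - 1        # escape value of this tier
                  code = min(remaining, cap)
                  tiers[t].append(code)
                  remaining -= code
                  if code < cap:               # no escape -> this value is done
                      break
          return tiers

      def decode(tiers):
          iters = [iter(t) for t in tiers]
          out = []
          for _ in range(len(tiers[0])):       # one tier-0 entry per value
              v, t = 0, 0
              while True:
                  code = next(iters[t])
                  v += code
                  if code < (1 << TIERS[t]) - 1:
                      break
                  t += 1                       # escaped -> read the next tier too
              out.append(v)
          return out

      assert decode(encode([0, 0, 3, 197])) == [0, 0, 3, 197]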
  • brispuss's Avatar
    7th December 2019, 00:01
    Added four more compression methods. However, I'm having a problem running lepton (lepton-slow-best-ratio) by itself. The command syntax appears when typing in lepton-slow-best-ratio.exe, but trying to compress files always brings up the error "Failed to start subprocess with command line OS_ERROR". The command lepton-slow-best-ratio.exe 0067.jpg 0067.lep (for example), or even just lepton-slow-best-ratio.exe 0067.jpg, brings up this error!? Why am I getting this error message? BTW running under Windows 7 64-bit.
    25 replies | 1807 view(s)
  • fhanau's Avatar
    6th December 2019, 22:30
    You can change it in the source code if it is important to you, but I do not plan to support this in general.
    428 replies | 112802 view(s)
  • JamesB's Avatar
    6th December 2019, 20:15
    Gzip is too popular. I regularly have discussions trying to channel people towards zstd instead. There is no reason to use gzip in the modern era IMO, unless it's some legacy compatibility.
    17 replies | 1225 view(s)
  • SvenBent's Avatar
    6th December 2019, 19:44
    When running with --allfilters-b, ECT stops after 500 generations with no progress. Is there a way to increase that threshold?
    428 replies | 112802 view(s)
  • Aniskin's Avatar
    6th December 2019, 18:35
    Technically there is no problem in creating such a version of the codec. I will think about it.
    25 replies | 1807 view(s)
  • Shelwien's Avatar
    6th December 2019, 18:20
    @Aniskin: Btw, is it possible to get mfilter to output jpeg and metainfo to different streams? With that we could avoid compressing jpegs twice...
    25 replies | 1807 view(s)
  • Aniskin's Avatar
    6th December 2019, 18:04
    If you want to use MFilter+Lepton: 7z a -m0=mfilter:a1 -m1=lzma2:x9 If you want to use MFilter+paq8: 7z a -m0=mfilter:a2 -m1=lzma2:x9 Also what about solid compression?
    25 replies | 1807 view(s)
  • brispuss's Avatar
    6th December 2019, 17:38
    Thanks. I didn't remove metadata.
    25 replies | 1807 view(s)
  • smjohn1's Avatar
    6th December 2019, 17:33
    Did you remove meta-info when using packJPG (-d)? Meta-info accounts for a large percentage of small files.
    25 replies | 1807 view(s)
  • Kirr's Avatar
    6th December 2019, 17:22
    Yes, fqzcomp performs well considering it works via a wrapper that chops a long sequence into reads. (And adds constant quality as per your idea, which I probably took a bit too far :-)). Interestingly, it is currently leading in compactness on the spruce genome: chart (though this test is not complete, some compressors are still missing). Also it may still improve more after I add its newly fixed "-s9" mode. I guess it will work even better on proper FASTQ short-read datasets. Thanks. Yeah, NAF is focused on transfer + decompression speed, because both of these steps can be a bottleneck in my work. I noticed that many other sequence compressors are primarily optimized for compactness (something I did not know before doing the benchmark), which partly explains why gzip remains popular.
    17 replies | 1225 view(s)
  • brispuss's Avatar
    6th December 2019, 16:55
    OK. Working on other compression examples. Modified MFilter7z.ini as per earlier post, but changed paq8px version to 183fix1 instead of version 181fix1. Created sub directory named paq8px_183fix1 under Codecs sub directory under 7-zip. Paq8px_v183fix1 executable copied to the directory paq8px_183fix1. So the command for lepton plus paq8px_v183fix1 PLUS lzma2 should now be 7z a -m0=mfilter:a1 -m1=mfilter:a2 -m2=lzma2:x9 0067.7z 0067.jpg (for example)?
    25 replies | 1807 view(s)
  • Marco_B's Avatar
    6th December 2019, 15:48
    Hi all, in this episode I am glad to describe an attempt of my own to address the problem previously encountered in Grouped (ROLZ) LZW: the fixed size of the groups (dictionaries attached to a context). A way to proceed is illustrated by Nakano, Yahagi, Okada, but I started from a different consideration. Every time a symbol occurs in a text, it gains an increasing number of children and the chance for it to reappear becomes smaller and smaller, while an entropy stage which takes its output assigns it shorter codes. To reconcile these two opposites I settled on a scheme where symbols belong to a set of lists keyed by the number of children, and each list is organized as an LRU. A symbol is now emitted as its list plus its rank inside it, respectively via the Witten-Cleary arithmetic coder and Elias delta coding. I chose an AC because it is the only coder that can closely mimic the fatherhood distribution among symbols, but this constraint confronted me with the necessity to interleave its sequence. After a complicated period I realized that the solution must be based on two facts: (i) the encoder has to be two symbols ahead because the decoder needs to start with 16 bits; (ii) the variables high and low (which define the focus interval) are in lockstep between the two sides just mentioned. The rest you can see in the source below. At the moment the compressor is terrible, both in terms of speed and ratio, but I made it public as the interleaver could be of some interest. I have in mind to improve the performance of the compressor by imposing on it a more canonical context apparatus, which should shorten the lists at the expense of memory consumption. I hope to be back soon, greetings, Marco Borsari
    0 replies | 185 view(s)
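    For readers who have not met the Elias delta code mentioned above, here is a minimal encoder sketch (Python; my own illustration of the standard code, not Marco's rank coder):

      def elias_delta(n: int) -> str:
          """Elias delta code of a positive integer, as a bit string."""
          assert n >= 1
          nbits = n.bit_length()                          # N = number of bits in n
          lbits = nbits.bit_length()                      # L = number of bits in N
          gamma = "0" * (lbits - 1) + format(nbits, "b")  # Elias gamma code of N
          return gamma + format(n, "b")[1:]               # n without its leading 1 bit

      # Known codewords: 1 -> "1", 2 -> "0100", 3 -> "0101", 17 -> "001010001"
      for k in (1, 2, 3, 17):
          print(k, elias_delta(k))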
  • Shelwien's Avatar
    6th December 2019, 15:34
    Shelwien replied to a thread FLiF Batch File in Data Compression
    > stuffit deluxe is even commercial! Here's stuffit windows binary: http://nishi.dreamhosters.com/stuffit14_v0.rar > which one would you use on linux? There's wine on linux, so hard to say. But I like BMF.
    14 replies | 941 view(s)
  • pklat's Avatar
    6th December 2019, 15:18
    pklat replied to a thread FLiF Batch File in Data Compression
    Sorry, mea culpa. So there are: rkim, StuffItDeluxe, Flic, zpaq, bmf, gralic, cmix. But most of them are either too slow or too 'exotic' imho. StuffIt Deluxe is even commercial! FLIF is supported by XnView, IrfanView, ImageMagick. Which one would you use on Linux?
    14 replies | 941 view(s)
  • Shelwien's Avatar
    6th December 2019, 14:58
    Shelwien replied to a thread FLiF Batch File in Data Compression
    Well, this says that it isn't: http://qlic.altervista.org/LPCB.html As to solid flif - there's support for multi-image files.
    14 replies | 941 view(s)
  • pklat's Avatar
    6th December 2019, 14:49
    pklat replied to a thread FLiF Batch File in Data Compression
    From my limited tests, FLIF has the best lossless ratio. However, it's slow at decoding and not supported well. It seems best to use it for archiving only for now. As someone already mentioned, it's a pity it has no 'solid' option yet for large sets of pictures like archivers do (or a dictionary option).
    14 replies | 941 view(s)
  • Aniskin's Avatar
    6th December 2019, 14:37
    MFilter should be used with additional compression -m1=LZMA(2), because MFilter does not pass any metadata into the packer. And maybe lepton.exe without MFilter will show a better result.
    25 replies | 1807 view(s)