Activity Stream

  • brispuss's Avatar
    Today, 12:40
    Ogg (Vorbis) files are usually already highly compressed, so compressing them further is very difficult. Oggre compresses ogg files fairly well, but not as much as I would like. I have heaps of ogg files that I would like to compress for archiving purposes. The attached sample at 1,806,612 bytes compresses down to 1,336,503 bytes using oggre - a ~26% reduction in size. Not bad. I tried paq8px v183fix1, resulting in a size of 1,530,168 bytes - a 15% reduction. Not so good. But is there a way to compress the ogg file even further? Maybe by converting to some other format first? Or using some combination of (pre)compressors? I don't seem to be getting good results using precomp, reflate, pzlib, and/or paq8px_183fix1.
    0 replies | 4 view(s)
  • dnd's Avatar
    Today, 12:24
    Turbo Base64 SIMD - 100% C (C++ headers), as simple as memcpy.
    - No other base64 library encodes or decodes faster
    - Scalar can be faster than other SSE- or ARM-Neon-based base64 libraries
    - Turbo Base64 SSE is faster than other SSE/AVX/AVX2 base64 libraries
    - Fastest AVX2 implementation, damn near memcpy speed
    - TurboBase64 AVX2 decoding is ~2x faster than other AVX2 libs
    - Fastest ARM Neon base64
    - Dynamic CPU detection and JIT scalar/sse/avx/avx2 switching
    - Robust base64 error checking
    - Portable library, 32/64 bits, SSE/AVX/AVX2, ARM Neon, Power9 Altivec
    - OS: Linux amd64, arm64, Power9, macOS, s390x; Windows: MinGW, Visual C++
    - Big + little endian
    - Ready and simple to use, no armada of files, no hassle dependencies
    0 replies | 7 view(s)
  • Jyrki Alakuijala's Avatar
    Today, 06:06
    -lossless does change the RGB of alpha=0 pixels, the same as many other lossless codecs. If you need to keep the RGB for alpha=0, then use -exact.
    54 replies | 11961 view(s)
  • Shelwien's Avatar
    Today, 01:59
    TSX provides a way to restart a chunk of code when a conflict happens during shared-memory access: https://gcc.gnu.org/onlinedocs/gcc/x86-transactional-memory-intrinsics.html (a minimal usage sketch follows below this post). I suppose it can be useful for some slow MT compressor, like NNCP. Normally compression doesn't require frequent inter-thread communication, because smaller processing units mean worse compression.
    17 replies | 3380 view(s)
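    A minimal sketch of the RTM intrinsics from the linked GCC page, used here to wrap a shared-counter update with the usual lock-elision pattern. This is an illustration under assumptions (compile with -mrtm, on a CPU where TSX is still enabled), not code from NNCP or any existing compressor:
    /* Sketch: hardware transaction with a spinlock fallback (build with -mrtm). */
    #include <immintrin.h>
    #include <stdatomic.h>

    static atomic_int lock_taken;      /* 0 = free, 1 = held by the fallback path */
    static long shared_counter;

    static void fallback_lock(void)   { while (atomic_exchange(&lock_taken, 1)) ; }
    static void fallback_unlock(void) { atomic_store(&lock_taken, 0); }

    void update_counter(long delta)
    {
        if (_xbegin() == _XBEGIN_STARTED) {
            if (atomic_load(&lock_taken)) /* someone holds the real lock:     */
                _xabort(0xff);            /* abort so we don't race with them */
            shared_counter += delta;      /* speculative write                */
            _xend();                      /* commit                           */
            return;
        }
        /* Transaction aborted (conflict, capacity, interrupt...): take the lock. */
        fallback_lock();
        shared_counter += delta;
        fallback_unlock();
    }
    On an abort, control comes back from _xbegin with a status code instead of _XBEGIN_STARTED, so the update simply falls through to the ordinary lock.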
  • Shelwien's Avatar
    Today, 01:23
    Just checked: "-z 9 -exact" is lossless, "-lossless" isn't :)
    54 replies | 11961 view(s)
  • Jyrki Alakuijala's Avatar
    Yesterday, 23:48
    You can enable lossless with -lossless. -z 9 alone is lossy.
    54 replies | 11961 view(s)
  • SolidComp's Avatar
    Yesterday, 22:51
    I was about to post about TSX to see if anyone had found a use for these instructions in compression. It's surprising that they already have security issues - hardly anyone seems to be using TSX, since it was disabled in many Haswell and Broadwell CPUs.
    17 replies | 3380 view(s)
  • suryakandau@yahoo.co.id's Avatar
    Yesterday, 09:09
    bigm_suryak v9.1 xml 264149 bytes
    25 replies | 2188 view(s)
  • SvenBent's Avatar
    Yesterday, 03:41
    Three days later and I'm still not done with ect -9 --allfilters-b --pal_sort=120. I am going to do the full-color testing on a faster PC...
    428 replies | 112862 view(s)
  • Shelwien's Avatar
    Yesterday, 00:41
    In theory you can run "precomp -v" and parse its output (a parsing sketch follows below this post):
    (67.02%) Possible bZip2-Stream found at position 154448374, compression level = 9
    Compressed size: 3051
    Can be decompressed to 8996 bytes
    Identical recompressed bytes: 3051 of 3051
    Identical decompressed bytes: 8996 of 8996
    Best match: 3051 bytes, decompressed to 8996 bytes
    Recursion start - new recursion depth 1
    No recursion streams found
    Recursion end - back to recursion depth 0
    (72.75%) Possible GIF found at position 167662070
    Can be decompressed to 5211 bytes
    Recompression successful
    (72.75%) Possible GIF found at position 167663606
    Can be decompressed to 5193 bytes
    Recompression successful
    (72.75%) Possible GIF found at position 167665142
    Can be decompressed to 5211 bytes
    Recompression successful
    (72.75%) Possible GIF found at position 167666678
    Can be decompressed to 5988 bytes
    Recompression successful
    It prints positions, so it shouldn't be that hard.
    57 replies | 5126 view(s)
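    A minimal parsing sketch for output in the format quoted above: it greps the "Possible ... found at position ..." lines from stdin and prints stream type and offset. The line format is assumed from the quote, not taken from precomp's source:
    /* Read "precomp -v" output from stdin and list detected streams with positions. */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char line[1024];
        while (fgets(line, sizeof line, stdin)) {
            const char *p = strstr(line, "Possible ");
            const char *q = strstr(line, " found at position ");
            if (!p || !q || q < p)
                continue;
            long long pos = 0;
            if (sscanf(q, " found at position %lld", &pos) == 1)
                printf("%.*s @ %lld\n", (int)(q - (p + 9)), p + 9, pos);  /* type @ offset */
        }
        return 0;
    }
    Usage would be something like "precomp -v archive.bin | streamlist" (both names hypothetical).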
  • Gonzalo's Avatar
    14th December 2019, 23:12
    I was thinking about a rather naive way to improve precomp's effectiveness... I'm sure somebody has thought about it before; I'm just sharing it to find out whether it could be done or whether it's a bad idea. The possibility of rearranging data inside the .PCFs to group similar streams, and in doing so improve compression, has been mentioned before. Couldn't it be simpler to output every stream as a separate file with a guessed extension, like '.ari' for incompressible streams, '.txt' for text, '.bmp' for bitmaps and '.bin' for everything else? Then any modern archiver would take care of the grouping and maybe codec selection. An alternative (so as to not write a million little files to the disk) would be to output a few big TXT, BIN, and so on, with all the respective streams concatenated, plus an index.pcf containing the metadata needed for reconstruction (a rough sketch of that layout follows below this post). What do you think about it?
    57 replies | 5126 view(s)
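    Purely to make the second variant concrete, a hypothetical sketch of the "few big per-type containers plus index.pcf" layout; every name, field and extension here is illustrative, not something precomp currently produces:
    /* Hypothetical layout: append each recovered stream to one container per
     * guessed type and record a fixed-size index entry so the original .PCF
     * order could be rebuilt.  Error handling omitted for brevity. */
    #include <stdio.h>
    #include <stdint.h>

    enum stream_type { ST_TEXT, ST_BMP, ST_INCOMPRESSIBLE, ST_BIN, ST_COUNT };

    struct index_entry {          /* one record per stream, written to index.pcf */
        uint8_t  type;            /* which container the stream was routed to    */
        uint64_t offset;          /* offset of the stream inside that container  */
        uint64_t length;          /* stream length in bytes                      */
    };

    static const char *container_name[ST_COUNT] = {
        "streams.txt", "streams.bmp", "streams.ari", "streams.bin"
    };

    void route_stream(enum stream_type t, const void *data, uint64_t len, FILE *index)
    {
        FILE *c = fopen(container_name[t], "ab");
        struct index_entry e = { (uint8_t)t, 0, len };
        fseek(c, 0, SEEK_END);
        e.offset = (uint64_t)ftell(c);
        fwrite(data, 1, len, c);
        fclose(c);
        fwrite(&e, sizeof e, 1, index);
    }
    Reconstruction would just walk the index records in order and copy each stream back out of its container.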
  • byronknoll's Avatar
    14th December 2019, 10:51
    @Kirr, if you haven't seen it, there was some DNA benchmarking in this thread: https://encode.su/threads/2105-DNA-Corpus From that thread, there is one trick for "palindromic sequences" that significantly improves compression rate on DNA for general compressors. One other suggestion is to try adding cmv to your benchmark.
    17 replies | 1250 view(s)
  • suryakandau@yahoo.co.id's Avatar
    14th December 2019, 08:58
    @byron, could you run bigm using paq8hp(11) paq8h2(11) paq8l(10) on enwik8? I just want to know how much enwik8 can be compressed... thank you
    25 replies | 2188 view(s)
  • suryakandau@yahoo.co.id's Avatar
    14th December 2019, 06:38
    I have published the source code in the bigm thread.
    17 replies | 1250 view(s)
  • Kirr's Avatar
    14th December 2019, 05:39
    One cool thing you can do with the benchmark is a detailed comparison of any two (or more) compressors, or their settings. For example, recently I was wondering about some compressor levels that seem redundant. Let's take a closer look at one such example: lz4 -2 vs lz4 -1. From the data table you can see that they are very close, but it's not so convenient to spot this in a wall of numbers. Fortunately, it's easy to visualize this data. For example, this scatterplot shows the difference between "-1" and "-2". For each dataset it shows the results of lz4 -2 divided by those of lz4 -1 (so that ratios of the measurements are shown). Each point is a different test dataset. Which measurements to show is selectable; in this case it's compression ratio on the X axis, and compression+decompression speed on the Y axis. E.g., here is the same kind of chart showing compression memory against decompression memory of lz4 -2 compared to lz4 -1. The charts clearly show that "-2" and "-1" have identical compression strength. The difference in speed and memory consumption is tiny and can probably be explained by measurement noise (considering that all the outliers are on very small data). Therefore "-2" can be considered redundant, at least on this particular machine and test data.
    17 replies | 1250 view(s)
  • Kirr's Avatar
    14th December 2019, 04:57
    1.3 GB is nice. It's unfortunate that you choose to waste your good work by ignoring the concerns of others (and possibly violating the GPL), by keeping the source closed, by distributing only a Windows binary and by staying anonymous. I'm not going to touch your compressor with a 10-foot pole while this remains the case.
    17 replies | 1250 view(s)
  • suryakandau@yahoo.co.id's Avatar
    14th December 2019, 01:55
    Enwik8 17801098 bytes
    25 replies | 2188 view(s)
  • Gotty's Avatar
    14th December 2019, 00:10
    Oh, sorry. I could have seen that from your earlier posts. Nevertheless these numbers are (or may be) useful for the reader - including me. So thank you for your effort. Appreciated.
    206 replies | 122189 view(s)
  • suryakandau@yahoo.co.id's Avatar
    13th December 2019, 18:06
    bigm_suryak v9 - improved word model. xml file from the Silesia benchmark: 264,409 bytes, using ~1.3 GB memory. The attached archive contains source code and binary.
    25 replies | 2188 view(s)
  • kaitz's Avatar
    13th December 2019, 02:26
    int cxt={}; // dynamic alloc memory for int array
    int cxt1,cxt2,cxt3,cxt4,N;
    enum {SMC=1,APM1,DS,AVG,RCM,SCM,CM,MX,ST};
    int update(int y,int c0,int bpos,int c4,int pos){
        int i;
        if (bpos==0) cxt4=cxt3,cxt3=cxt2,cxt2=cxt1,cxt1=(c4&0xff)*256;
        cxt=(cxt1+c0);
        cxt=(cxt2+c0+0x10000);
        cxt=(cxt3+c0+0x20000);
        for (i=0;i<N;++i) vmx(DS,0,cxt); // pr--pr
        vmx(APM1,0,c0);
        return 0;
    }
    void block(int a,int b){}
    int main(){
        int i;
        N=3;
        vms(0,1,1,2,0,0,0,0,0); // APM1,DS,AVG
        vmi(DS, 0,18,1023,N);   // pr..pr
        vmi(AVG,0,0,1,2);       // pr=avg(1,2)
        vmi(AVG,1,0,0,3);       // pr=avg(0,3)
        vmi(APM1,0,256,7,4);    // pr=apm(4) rate 7
        cxt1=cxt2=cxt3=cxt4=0;
    }
    The above works in the newer version. In update() only the contexts are set; the prediction order depends on the order set up in main(). Sort of like in a zpaql config file.
    20 replies | 3929 view(s)
  • dnd's Avatar
    12th December 2019, 22:58
    TurboBench - Compression Benchmark updated.
    - All compressors are continuously updated to the latest version
    - base64 encoding/decoding
    - New external dictionary compression with zstd, including multiblock mode in TurboBench
    Benchmarking zstd with an external dictionary:
    1 - generate a dictionary file with: zstd --train mysamples/* -o mydic
    2 - start turbobench with: ./turbobench -ezstd,22Dmydic file
    (the external dictionary "mydic" must be in the current directory)
    You can also benchmark multiple small files using multiblock mode in turbobench:
    1 - store your small files into a multiblock file using option "M": ./turbobench -Mmymultiblock files
    (mymultiblock output format: length1,file1,length2,file2,...lengthN,fileN, where length = 4-byte file/block length)
    2 - benchmark using option "-m": ./turbobench -ezstd,22Dmydic mymultiblock -m
    160 replies | 41560 view(s)
  • encode's Avatar
    12th December 2019, 21:36
    I'm not really that into this thing, but here's a quick test on my i5-9600K @ 5 GHz:
    206 replies | 122189 view(s)
  • load's Avatar
    11th December 2019, 16:54
    load replied to a thread WinRAR in Data Compression
    WinRAR Version 5.80 https://www.rarlab.com/rarnew.htm https://www.rarlab.com/download.htm
    175 replies | 122093 view(s)
  • GamesBX2910's Avatar
    11th December 2019, 10:31
    Hello. IO games, or .io games, are real-time multiplayer games, easily identified by the .io domain name (the country-code top-level domain of the British Indian Ocean Territory). Initially these games were only developed for the web (webgames), but after a boom, game makers have put them on mobile platforms so that players can experience them anytime and anywhere. I will give a few examples of this development: the Diep io tank and Chompers free games. IO games are entertaining, light and fun, but equally deep. Players need tactics and patience to win online multiplayer battles and dominate the arena and rankings. So we are developing these .io games for everyone, to turn these simple games back into their place. We created a website with thousands of .io games called GamesBX. Along with the advancement of science, GamesBX offers a wide range of smart games in the .io style, suitable for all ages because it is very user friendly and especially easy to use, with a simple protocol that runs directly in the browser. No download time and no charge, as it is completely free to play. GamesBX's game store is updated daily with the latest BX games, with a modern interface and a lot of different titles. It includes interactive multiplayer games such as war, action, shooting...; or strategy games with real time, puzzles, adventure... and many other things for free. Copywriter: Gamesbx5.info Blog games Guide games hot
    0 replies | 47 view(s)
  • fhanau's Avatar
    11th December 2019, 05:17
    I think ECT generally rewrites the palette, and transforms 1, 2 and 4 are done as long as you are using mode 2 or higher.
    428 replies | 112862 view(s)
  • Sportman's Avatar
    11th December 2019, 00:01
    Software-based Fault Injection Attacks against Intel SGX (Software Guard Extensions): https://www.plundervolt.com/doc/plundervolt.pdf https://www.plundervolt.com/ https://github.com/KitMurdock/plundervolt https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00289.html
    17 replies | 3380 view(s)
  • SvenBent's Avatar
    10th December 2019, 19:21
    Does ECT do palette compression/expansion?
    1: If a palette contains unused entries, does ECT discard the unused entries?
    2: If a palette contains the same color twice, does ECT merge the duplicates into one entry?
    3: If a palette is only 4 bits (16 colors), does ECT try to convert it to an 8-bit palette for better filtering (as I understand it, filters work at the byte level, not the pixel level)?
    4: Does ECT convert to greyscale if the palette entries all fall within the colors available in greyscale?
    Thank you
    -- edit ---
    I guess I could have tested this easily once I got home
    428 replies | 112862 view(s)
  • suryakandau@yahoo.co.id's Avatar
    10th December 2019, 07:00
    enwik8 17809514 bytes, using ~1.3 GB memory
    25 replies | 2188 view(s)
  • SvenBent's Avatar
    10th December 2019, 06:12
    Wow, I will admit doing this really shows ECT's impressiveness. 24x 8-bit paletted:
                        Bit  Size                          Encoding time
    Original BMP        32   27.30 MB (28,640,482 bytes)
    Paint save as PNG   32   18.50 MB (19,494,337 bytes)
    PNGout              32   12.30 MB (12,966,158 bytes)   Global Time = 1010.344 = 00:16:50.344 = 100%
    PNGout /C3           8    9.41 MB ( 9,877,738 bytes)   Global Time = 122.219 = 00:02:02.219 = 100%
    ECT -9 -strip        8    9.40 MB ( 9,860,903 bytes)   Global Time = 197.641 = 00:03:17.641 = 100%
    428 replies | 112862 view(s)
  • brispuss's Avatar
    10th December 2019, 03:16
    Thanks for the script! It works! Don't know why I didn't think of this method before!? Anyway, I'll experiment a bit more with lepton and other (pre)processors and see what compression results are obtained and update the compression test list as required.
    25 replies | 1831 view(s)
  • Shelwien's Avatar
    10th December 2019, 01:17
    I just downloaded https://github.com/dropbox/lepton/releases/download/1.2/lepton-slow-best-ratio.exe and tried running it like this:
    C:\>C:\Users\Shelwien\Downloads\lepton-slow-best-ratio.exe C:\9A6T-jpegdet\A10.jpg
    lepton v1.0-08c52d9280df3d409d9246df7ff166dd94628730
    lepton v1.0-08c52d9280df3d409d9246df7ff166dd94628730
    7511941 bytes needed to decompress this file
    672236 842468 79.79%
    lepton v1.0-08c52d9280df3d409d9246df7ff166dd94628730
    bytes needed to decompress this file
    672236 842468 79.79%
    and it works. And for recursive processing you can do something like this:
    cd /d C:\test
    for /R %a in (*.j*) do C:\lepton\lepton-slow-best-ratio.exe "%a"
    25 replies | 1831 view(s)
  • brispuss's Avatar
    9th December 2019, 21:33
    As mentioned in my previous post, getting lepton to run is a bit difficult, as it apparently needs to be located in the current working directory of the files to be processed. Currently lepton-slow-best-ratio.exe is located in the c:\lepton directory. The jpeg files to be processed are located in the directory c:\test, plus there are sub-directories as well. So what is wanted is a batch script that would "effectively" allow lepton-slow-best-ratio.exe to run in the c:\test directory and process files in this directory, and also recurse into the sub-directories as well. Any ideas please?
    25 replies | 1831 view(s)
  • byronknoll's Avatar
    9th December 2019, 21:33
    @suryakandau, please distribute the source code of bigm along with the releases. Since cmix code is GPL, and bigm contains cmix code, bigm also needs to be open source.
    25 replies | 2188 view(s)
  • byronknoll's Avatar
    9th December 2019, 21:19
    byronknoll replied to a thread cmix in Data Compression
    cmix on enwik8: 14838332 -> 14834133 cmix on enwik9: 115714367 -> 115638939
    418 replies | 102197 view(s)
  • suryakandau@yahoo.co.id's Avatar
    9th December 2019, 18:40
    Bigm is not cmix, because bigm uses only ~1.3 GB of memory; cmix uses 24-25 GB of memory. I use Windows 10 64-bit. So it's okay to add bigm to the benchmark list.
    17 replies | 1250 view(s)
  • cottenio's Avatar
    9th December 2019, 18:15
    Good news! I applied the idea (recursive deltas) to identifying the placement of the gaps (since the data is otherwise sequential) instead of bitmapping the placements first, and it brought the total encoding size down to 1,421 bytes, which handily beat the cmix best benchmark. Per James' suggestion I'm going to write up a quick tool to do compression/decompression automatically on "delta compatible" data like this.
    5 replies | 453 view(s)
  • Kirr's Avatar
    9th December 2019, 17:29
    This would put bigm on #2 for Astrammina rara (after cmix) and on #5 for Nosema ceranae (after jarvis, cmix, xm and geco), in compactness. (Note that this is not the most important measurement for judging practical usability of a compressor). What was compression and decompression time and memory use? Also on what hardware and OS? According to another thread the relationship between bigm and cmix is unclear currently, which probably means that I should not add bigm to benchmark until the issue is resolved?
    17 replies | 1250 view(s)
  • suryakandau@yahoo.co.id's Avatar
    9th December 2019, 11:48
    Using bigm_suryak v8: astrammina.fna 361,344 bytes, nosema.fna 1,312,863 bytes
    17 replies | 1250 view(s)
  • suryakandau@yahoo.co.id's Avatar
    9th December 2019, 09:29
    bigm_suryak v8 - improved word model, uses only ~1.3 GB memory. xml file from the Silesia benchmark, without precomp: 264,725 bytes
    25 replies | 2188 view(s)
  • SvenBent's Avatar
    9th December 2019, 03:54
    It's in the download section of this forum. But I am not sure it's really optimal now, with all the work in ECT, which is why I'm doing some retesting. Currently doing some testing on the Kodak image suite.
    428 replies | 112862 view(s)
  • zyzzle's Avatar
    9th December 2019, 01:58
    This deflopt bug has bothered me for years -- decades. Is there any way to patch the binary to overcome this 2^32-bit limit for files? At least making it 2^32 bytes would help very substantially. I take it the source is no longer available, or never was. Somehow it must be possible to reverse engineer the binary and eliminate the bits business, changing it to bytes...
    428 replies | 112862 view(s)
  • Krishty's Avatar
    9th December 2019, 00:07
    Yes, please share it! If anyone is interested, here is how Papa's Best Optimizer does it:
    1. Copy the file to a temporary location and enforce extension .png (defluff does not work on PNGs without the .png extension; this also prevents Unicode/max-path problems).
    2. ECT -30060 --allfilters-b --pal_sort=120, optionally with --strip and --strict (30060 is according to my tests earlier in this thread).
    3. defluff
    4. DeflOpt /bk (run after defluff because you said that ordering is best; skipped if the file is larger than 512 MiB because DeflOpt breaks Deflate streams with more than 2³² bits).
    5. If defluff could not shrink the file and DeflOpt printed "Number of files rewritten : 0", optimization stops here; else there has been some gain (even a single bit) and it goes back to 3. (This is broken in the current version and will be fixed for the next one; it missed a 1-B gain on two out of 200 files.)
    The next time I have plenty of spare time, I want to check out Huffmix.
    428 replies | 112862 view(s)
  • LucaBiondi's Avatar
    8th December 2019, 14:20
    LucaBiondi replied to a thread fpaq0 in ADA in Data Compression
    Kaitz you are my hero :) Would you be able to write it in Delphi? Have a good Sunday!
    4 replies | 343 view(s)
  • Shelwien's Avatar
    8th December 2019, 12:48
    Shelwien replied to a thread vectorization in Data Compression
    > for text at least, vectorizing is always better option
    Lossy OCR is not a better option, especially for languages like Japanese or Chinese - too many errors. Lossless (symbols + diff, like what djvu tries to do) kind of makes sense, but it's still hard to compress it better than the original bitmap. Also, tracing errors are more noticeable than e.g. jpeg blur, and high-quality tracing produces too much data. Of course, it's a useful tool anyway, but for reverse-engineering (when we want to edit a document without the source) rather than compression.
    3 replies | 374 view(s)
  • pklat's Avatar
    8th December 2019, 11:36
    pklat replied to a thread vectorization in Data Compression
    I had tried djvu long ago. It's nice, but today OCR is excellent. The SVG format can combine bitmap and vector, so it would be an ideal replacement for the proprietary PDF. For text at least, vectorizing is always the better option, for a number of reasons, obviously.
    3 replies | 374 view(s)
  • Mauro Vezzosi's Avatar
    8th December 2019, 00:09
    Mauro Vezzosi replied to a thread cmix in Data Compression
    Cmix commit 2019/12/05, changing from layer_norm to rms_norm: yes, rms_norm looks better than layer_norm (both normalizations are written out below for reference). How much does cmix (or lstm-compress) improve?
    418 replies | 102197 view(s)
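    For reference, the two normalizations being compared, in their textbook form (g and b are the usual learned gain and bias; this is the generic definition, not a quote of the cmix code):
    \mathrm{LayerNorm}(x)_i = \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}}\, g_i + b_i, \qquad \mu = \frac{1}{n}\sum_{j=1}^{n} x_j, \qquad \sigma^2 = \frac{1}{n}\sum_{j=1}^{n} (x_j - \mu)^2
    \mathrm{RMSNorm}(x)_i = \frac{x_i}{\mathrm{RMS}(x)}\, g_i, \qquad \mathrm{RMS}(x) = \sqrt{\frac{1}{n}\sum_{j=1}^{n} x_j^2 + \epsilon}
    RMSNorm keeps only the gain and drops the mean-centering and the bias, so it normalizes the scale of the activations at a slightly lower cost per step.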
  • brispuss's Avatar
    7th December 2019, 23:55
    Thanks for clarifying. I'll amend the compression descriptions in a while. Now regarding lepton-slow-best-ratio, which doesn't seem to work properly, as described in my post above: from a quick search on the 'net, it seems the issue may be due to how "lepton" has been coded. It seems that lepton has to be placed within the current working directory in order for it to find files to process(?) If so, what batch script, if any, will effectively place lepton within the recursed directories so it can locate and process (jpg) files?
    25 replies | 1831 view(s)
  • Shelwien's Avatar
    7th December 2019, 22:05
    Shelwien replied to a thread fpaq0 in ADA in Data Compression
    @JamesWasil: I think it's not just a random choice of language - https://encode.su/threads/3064-paq8pxv-virtual-machine?p=62339&viewfull=1#post62339
    4 replies | 343 view(s)
  • JamesWasil's Avatar
    7th December 2019, 22:03
    JamesWasil replied to a thread fpaq0 in ADA in Data Compression
    Lol I haven't used Ada for ages. That is cool though, kaitz. Thanks! Next someone should port it to Fortran, Basic, and APL lol (Has anyone here used APL for anything in the last 30 years?)
    4 replies | 343 view(s)
  • Shelwien's Avatar
    7th December 2019, 22:02
    Shelwien replied to a thread fpaq0 in ADA in Data Compression
    So can you explain what you see in Ada that makes it useful for compression algorithms? This source looks pretty similar to pascal/delphi to me.
    4 replies | 343 view(s)
  • kaitz's Avatar
    7th December 2019, 21:55
    kaitz started a thread fpaq0 in ADA in Data Compression
    This is fpaq0 ported to the Ada language. Source: http://mattmahoney.net/dc/fpaq0.cpp. Used the GPS 19.1 dev environment for that. The executable is static. Compression/decompression is identical to the cpp source. This is my first time writing something in Ada, so it's probably not the best example of correct code. :D
    4 replies | 343 view(s)
  • cottenio's Avatar
    7th December 2019, 21:01
    Hi Gotty! Ha, you're absolutely right, and what's crazy is I had the same thought you did and am building a minimal skeleton first. I have similar results to yours, although I also found some fun ways to infer data about the relationship with timestamps and revision_ids and storing the errors from predicted values. I'll definitely check out your code as well!
    5 replies | 453 view(s)
  • cottenio's Avatar
    7th December 2019, 20:59
    Thanks for the warm welcome James! I really appreciate your insight and will definitely try out the technique on other sources as well; thanks for the links!
    5 replies | 453 view(s)
  • dado023's Avatar
    7th December 2019, 20:23
    Hi Sven, would you be so kind as to share your pngbest.bat? I am mostly a fan of the best possible compression, but within a reasonable compression duration :)
    428 replies | 112862 view(s)
  • Aniskin's Avatar
    7th December 2019, 20:13
    Imho quite a strange use of MFilter. And again - MFilter should be used with additional compression, -m1=LZMA(2). MFilter separates the input jpeg file into metadata and jpeg data, then passes the jpeg data into the selected jpeg coder. So if you don't use -m1=LZMA(2), all metadata is stored without any compression, and additionally the result has overhead added by the 7z file format and the MFilter codec. Simply using Lepton/Brunsli/PAQ will give you better results because they compress the metadata.
    > lepton+paq8px 52,820,938 7z a -m0=mfilter:a2 <- not the same size as last entry at 50,782,720 despite similar processing!?
    This is not lepton+paq8px. This is paq8px only. Metadata without any compression. Plain paq8px gives you a better result.
    > lepton (slow version) 52,705,758 ran under original 7-zip using command 7z a -m0=mfilter:a1
    Metadata without any compression. Plain Lepton.exe may give you a better result.
    > brunsli 52,444,562 7a a m0=mfilter:a0
    Metadata without any compression. Plain Brunsli.exe may give you a better result.
    > lepton+paq8px+lzma2 51,434,256 7z a -m0=mfilter:a2 -m1=lzma2:x9
    This is not lepton+paq8px+lzma2. This is paq8px+lzma2.
    25 replies | 1831 view(s)
  • Gotty's Avatar
    7th December 2019, 19:09
    Also refer to "Fastest and smallest enwik8 skeleton (de)compression (for fun)". The observation is the same; the solution is a little bit different.
    5 replies | 453 view(s)
  • schnaader's Avatar
    7th December 2019, 12:41
    Thanks! That was a mutex error in PNG restoration, fixed.
    57 replies | 5126 view(s)
  • JamesWasil's Avatar
    7th December 2019, 09:28
    Hi Cottenio, and welcome to the Encode Data Compression forums! What you've done is interesting: it's a form of lossless delta compression combined with intelligent binary headers and flags speculating distances for page data, while compressing the gaps with the shortest predictions possible, using logarithmic bits for each structure. There really should be a name for structural targeting and compression of file attributes that are outside of LZ pattern matching or other forms of weighted context mixing and partial matches, etc. As far as I'm aware, there is no official name for it yet, but perhaps there should be one, and a glossary of nomenclature to accompany it? Although there are names and definitions for things mathematically that are commonly understood and accepted as naming conventions - things like order 0 translating to 1-byte analysis, order 1 as 2 bytes, order 2 as 3, and so on (always n+1) - there are many things not defined by a static name that may be beneficial to assign, and your neat work (whether you partially reinvented the wheel or not ;) ) brings the importance of that to the forefront. I suppose we should name it the Cottenio Delta algorithm, since it is yours and it is a form of delta encoding. What do you guys think? P.S: You may want to see if there are ways to apply this and tailor it to be useful with other text files outside of enwik8... perhaps focusing on spaces and page breaks CR+LF (chr(13)+chr(10)) or other commonalities in text to preformat it for compression. There are several ways you might go about implementing this, like detecting how many words exist before a space or a sequence of common characters, removing them, and then representing them with binary flags, similar to how you did with the page IDs and missing pages from enwik8. That said, it might end up being a form of precomp if you're able to isolate the bit flags and have the rest of the file remain text data that can still be worked upon by other compressors, adding to their efficiency. That's one way to approach it and one idea for it, but there are many more I'm sure. P.P.S: If you do decide to expand upon it further, or tie this implementation together with a form of LZ hybrid for a stand-alone text-based compressor, you might find some of Ross Williams' work from years ago beneficial, available freely at http://ross.net/compression/introduction.html (You still might want to make it a precomp and use cmix for better compression, but you have plenty of options)
    5 replies | 453 view(s)
  • SvenBent's Avatar
    7th December 2019, 02:36
    Thank you. It was mostly for curiosity and testing. I am doing a re-evaluation of ECT against my old pngbest.bat script, trying to figure out which tools are still usable when using ECT.
    428 replies | 112862 view(s)
  • cottenio's Avatar
    7th December 2019, 00:14
    Hi all, I'm relatively new to data compression and have been toying around with enwik8. After noticing that the page ids (which are strongly structured as <id></id> tags) are sequential in nature, with gaps due to - I assume - deletions of dead pages, I tried my hand at delta encoding the gaps, knowing that in general I could assume at least a delta of 1 for each subsequent id. Knowing that there are 12,347 total pages in enwik8, I did as follows:
    The largest gap I found was 198, so in the first naive implementation I stored each delta in ceil(log2(198)) = 8 bits, which took up 12,347 bytes. I'm sure no one is surprised by that. 12,347 bytes was much better than the 226,718 bytes that the natural text represented ("____<id>%i</id>\n").
    I wondered how I could make that more efficient, so I tried getting a feel for the shape of the deltas, which looked like this: As you can see, outliers like 198 and 178 massively inflated the bits required, so the next strategy was bit-packing everything but those two with only 5 bits, and then manually fixing those two during decoding. Now the largest delta was only 25 (5 bits), and that took ceil((12,345 * 5) / 8) = 7,716 bytes (+ a few more for the two repairs), which I thought was a neat improvement.
    Next I thought: can I embed that idea directly in an unpacking structure that operates on interval steps? I worked out the math and found that (a decoding sketch of this tiered scheme follows below this post):
    - 12,347 values as 1 bit: 0 for no additional delta, 1 for additional delta (max is 1)
    - 2,151 values from above had 1's. Most of them don't have any higher value, so I stored another 2,151 values as 1 bit: 0 for done, 1 for additional delta (max is 2)
    - 475 of those had reached the equivalent of 2 by this point, so I stored another 475 as 2-bit values: 0 for done, 1-3 as higher numbers, with 3 meaning the possibility of additional delta beyond the sum so far (max is 5)
    - 18 of those reached 5, so I stored 18 as 4-bit values using the same idea, with a max reaching 20
    - 5 of those reached 20, so I stored 5 as 8-bit values
    Altogether, this required 1,544 + 269 + 119 + 9 + 5 = 1,946 bytes. I wrote out a binary file containing those bit-packed sequences, then a short C program to decode it again, and it worked fine. I threw it into assembly (and I am by no means a talented assembly programmer) and ended up with a 4,096 byte win32 console executable (data embedded), which I thought was pretty great for a little decoding engine that can unpack the deltas and recreate the original enwik8 strings.
    The total compression storage, compared by type (against the 4,096 bytes):
    - Raw enwik8 lines: 226,718 (1.8%)
    - Parsed chars/no-tags (just the numbers): 53,320 (7.7%)
    - 32-bit integers: 49,388 (8.3%)
    - 8-bit deltas: 12,347 (33.2%)
    Control: cmix achieved a 1,685-byte file on a text file just containing the original page ids.
    So I'm pretty sure I've reinvented a wheel here, but I have no idea what the technique is called, so I can learn more about it. It's like a delta encoder, but it infers additional interval steps based on an expansion and knowing the ceiling of the previous interval. Any ideas?
    I've attached the raw data and the executable for review, and included a link to the bit-packing data as a Google sheet.
    enwik8_raw_page_ids.txt - a grep of all page id lines from enwik8
    page_id_decoder.exe - extracts the same content as enwik8_raw_page_ids.txt
    https://docs.google.com/spreadsheets/d/1Xq9-KVF40BxwUNf6pkXM0z2L6X0vKryT-fw8122VGZE/edit?usp=sharing
    5 replies | 453 view(s)
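    A minimal sketch of the tiered unpacking described above: a tier is read only while the previous tier emitted its maximal code, and the tier values are summed into the per-id delta. The tier widths (1, 1, 2, 4, 8) follow the post; the bitstream layout (tiers stored one after another, MSB-first) and all names are assumptions for illustration, not the author's actual file format:
    /* Tiered delta decoder sketch: ids advance by at least 1; extra delta is
     * spread over five bit-packed tiers of widths 1,1,2,4,8. */
    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    struct bitreader { const uint8_t *buf; size_t bitpos; };

    static unsigned get_bits(struct bitreader *r, int n)
    {
        unsigned v = 0;
        while (n--) {
            v = (v << 1) | ((r->buf[r->bitpos >> 3] >> (7 - (r->bitpos & 7))) & 1);
            r->bitpos++;
        }
        return v;
    }

    void decode_ids(struct bitreader tiers[5], int npages)
    {
        static const int width[5] = { 1, 1, 2, 4, 8 };
        long id = 0;                      /* previously emitted page id */
        for (int i = 0; i < npages; i++) {
            long delta = 1;               /* ids always advance by at least 1 */
            for (int t = 0; t < 5; t++) {
                unsigned v = get_bits(&tiers[t], width[t]);
                delta += v;
                /* only the maximal code in a tier says "continue in the next tier" */
                if (v != (1u << width[t]) - 1)
                    break;
            }
            id += delta;
            printf("    <id>%ld</id>\n", id);
        }
    }
    Only the all-ones code in a tier promises more bits in the next one, which is why the tier lengths in the post (12,347; 2,151; 475; 18; 5) fall directly out of the data.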
  • brispuss's Avatar
    7th December 2019, 00:01
    Added four more compression methods. However, I'm having a problem running lepton (lepton-slow-best-ratio) by itself. The command syntax appears when typing in lepton-slow-best-ratio.exe, but trying to compress files always brings up the error "Failed to start subprocess with command line OS_ERROR". The command lepton-slow-best-ratio.exe 0067.jpg 0067.lep (for example), or even just lepton-slow-best-ratio.exe 0067.jpg, brings up this error!? Why am I getting this error message? BTW, running under Windows 7 64-bit.
    25 replies | 1831 view(s)
  • fhanau's Avatar
    6th December 2019, 22:30
    You can change it in the source code if it is important to you, but I do not plan to support this in general.
    428 replies | 112862 view(s)
  • JamesB's Avatar
    6th December 2019, 20:15
    Gzip is too popular. I regularly have discussions trying to channel people towards zstd instead. There's no reason to use gzip in the modern era IMO, unless it's for some legacy compatibility.
    17 replies | 1250 view(s)
  • SvenBent's Avatar
    6th December 2019, 19:44
    When running with --allfilters-b, ECT stops after 500 generations with no progress. Is there a way to increase that threshold?
    428 replies | 112862 view(s)
  • Aniskin's Avatar
    6th December 2019, 18:35
    Technically there is no problem creating such a version of the codec. I will think about it.
    25 replies | 1831 view(s)
  • Shelwien's Avatar
    6th December 2019, 18:20
    @Aniskin: Btw, is it possible to get mfilter to output jpeg and metainfo to different streams? With that we could avoid compressing jpegs twice...
    25 replies | 1831 view(s)
  • Aniskin's Avatar
    6th December 2019, 18:04
    If you want to use MFilter+Lepton: 7z a -m0=mfilter:a1 -m1=lzma2:x9
    If you want to use MFilter+paq8: 7z a -m0=mfilter:a2 -m1=lzma2:x9
    Also, what about solid compression?
    25 replies | 1831 view(s)
  • brispuss's Avatar
    6th December 2019, 17:38
    Thanks. I didn't remove metadata.
    25 replies | 1831 view(s)
  • smjohn1's Avatar
    6th December 2019, 17:33
    Did you remove meta-info when using packJPG (-d)? Meta-info makes up a large percentage of small files.
    25 replies | 1831 view(s)
  • Kirr's Avatar
    6th December 2019, 17:22
    Yes, fqzcomp performs well considering it works via a wrapper that chops long sequences into reads (and adds constant quality as per your idea, which I probably took a bit too far :-)). Interestingly, it is currently leading in compactness on the spruce genome: chart (though this test is not complete, some compressors are still missing). Also it may still improve more after I add its newly fixed "-s9" mode. I guess it will work even better on proper FASTQ short-read datasets. Thanks. Yeah, NAF is focused on transfer + decompression speed, because both of these steps can be a bottleneck in my work. I noticed that many other sequence compressors are primarily optimized for compactness (something I did not know before doing the benchmark), which partly explains why gzip remains popular.
    17 replies | 1250 view(s)