Activity Stream

  • Darek's Avatar
    Today, 11:22
    Darek replied to a thread Paq8pxd dict in Data Compression
    125,101,814 bytes, 89,181.930 sec., paq8pxd_v81_avx2 -x15, enwik 1423
    802 replies | 290354 view(s)
  • lz77's Avatar
    Today, 11:03
    Out of sporting and academic interest, I wrote a pure LZ77-only compressor (for now a prototype program for debugging) that, for example, beats blzpack -1 ... and blzpack -2 ... and approaches zstd -1 ... So I want to compare my compressor with others that I may not know about. I'm going to port one of my algorithms to FASM for Win64 to get the best results. I would like to find buyers for my algorithms/sources. Where can I download Windows binaries of LittleBit to compare with mine?
    5 replies | 295 view(s)
  • Mauro Vezzosi's Avatar
    Today, 09:37
    Mauro Vezzosi replied to a thread paq8px in Data Compression
    Can this post be useful?
    1845 replies | 525963 view(s)
  • moisesmcardona's Avatar
    Today, 04:13
    moisesmcardona replied to a thread paq8px in Data Compression
    Translated SSE2 to NEON using this as a guide: https://github.com/jratcliff63367/sse2neon/blob/master/SSE2NEON.h So now paq8px can use NEON if the ARM CPU supports it. It also produces identical results when using SSE2 and AVX2 to extract a PAQ8PX file compressed with NEON, and vice versa. This means that there may be a bug in the "None" SIMD section of the code, as it's the only one that doesn't produce an identical file. Using the other SIMD extensions works fine. My branch for SIMD_NEON: https://github.com/moisespr123/paq8px/tree/simd_neon Should I open a merge request?
    1845 replies | 525963 view(s)
  • moisesmcardona's Avatar
    Today, 00:07
    moisesmcardona replied to a thread paq8px in Data Compression
    Today I decided to get PAQ8PX to compile on ARM CPUs. For this, I had to add some defines to check whether we are compiling for x86 or ARM CPUs, since ARM doesn't have SSE/SSE2/AVX2 instructions. Also, the `rsqrt` in `Ilog.cpp` needed to be translated to ARM SIMD instructions, so I'm also checking which processor we are using there. I tested it on my Samsung Galaxy S9+, which has 6GB of RAM, running Ubuntu under Termux. Here is my pull request: https://github.com/hxim/paq8px/pull/123/ Now, here's the interesting part, which I tested with the current branch as well as mine. It seems identical results are not produced when we use `-simd none`: either there's a bug somewhere, or maybe this is correct behavior due to the different SIMD paths? When a file is compressed with `-simd sse2` or `-simd avx2`, it cannot be extracted correctly with `-simd none`. The process completes, but the files are not the same. The same is true the other way around: files compressed with `-simd none` will not produce identical results if extracted with `-simd avx2` or `-simd sse2`.
    1845 replies | 525963 view(s)
  • Kaw's Avatar
    Yesterday, 17:49
    LittleBit (https://github.com/kapenga/LittleBit) can compress enwik8 to less than 40% with a static Huffman tree. With a 1.5 MB tree it could compress enwik8 to 26.5 MB, and that's including the tree. With the tree limited to 512 KB it would still be below 40 MB, because the gains beyond a 512 KB tree are small. I think other methods should be possible too. It should be doable. But why would you want this?
    5 replies | 295 view(s)
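Kaw's description (a static Huffman tree whose size trades off against the coded data) can be illustrated with a textbook Huffman construction. This is a generic sketch, not LittleBit's code: it assumes the symbols are already chosen (LittleBit's own tokenization and tree storage are not described in the thread) and it counts only the payload bits, so the cost of storing the tree, which Kaw includes in the 26.5 MB figure, would have to be added separately.

```python
import heapq
from collections import Counter

def huffman_code_lengths(freqs: dict) -> dict:
    """Return {symbol: code length in bits} for a static Huffman code."""
    if not freqs:
        return {}
    if len(freqs) == 1:                      # a lone symbol still needs 1 bit
        return {next(iter(freqs)): 1}
    # Heap entries: (subtree weight, unique tiebreak, {symbol: depth in subtree})
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # merge the two lightest subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**left, **right}.items()}  # one level deeper
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

def payload_bits(symbols) -> int:
    """Size of the Huffman-coded symbol stream in bits, excluding the tree itself."""
    freqs = Counter(symbols)
    lengths = huffman_code_lengths(freqs)
    return sum(freqs[s] * lengths[s] for s in freqs)
```

For example, `payload_bits("aaaaabbcd")` gives code lengths 1, 2, 3, 3 for `a`, `b`, `c`, `d` and a 15-bit payload; a larger symbol set shrinks the payload but grows the tree that must be stored with it.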
  • kaitz's Avatar
    Yesterday, 16:48
    kaitz replied to a thread Paq8pxd dict in Data Compression
    No. v81 is probably a tiny bit faster.
    802 replies | 290354 view(s)
  • Darek's Avatar
    Yesterday, 15:20
    Darek replied to a thread Paq8pxd dict in Data Compression
    Nope... With -s0 -w and also with -x0 -w, the transform fails completely on the pure 1423 file, but it performs well on the enwik9_1423.DRT file. Is there any difference between the -s0 -w and -x0 -w options? If the "small" mistake was only in enwik10, then enwik9 results should be the same for paq8pxd v80 and v81, right?
    802 replies | 290354 view(s)
  • Sportman's Avatar
    Yesterday, 11:53
    Sportman replied to a thread Paq8pxd dict in Data Compression
    enwik9: 124,905,286 bytes, 72,450.859 sec., paq8pxd_v81_avx2 -x15 -w
    802 replies | 290354 view(s)
  • lz77's Avatar
    Yesterday, 09:26
    I thought it was intuitively clear... It must be like LZ4/LZ5 without match searching: only a hash table of ~512 KB and no additional compression of matches/literal lengths/literals/prefixes and offsets. Bit or byte encoding is up to you.
    5 replies | 295 view(s)
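A minimal sketch of the kind of compressor described in this thread (LZ77 only, one hash table, no match search), in Python. Assumptions, none taken from the posts: 4-byte hashing with a Knuth-style multiplicative constant, a 2^17-entry table (512 KiB if each entry were a 4-byte position, as in a C version), a 64 KiB window, and greedy parsing where the single candidate stored in the table is the only match tried. The token stream is left unencoded, since the post leaves bit or byte encoding open.

```python
import struct

HASH_BITS = 17          # 2**17 slots; with 4-byte positions (as in C) that is 512 KiB
MIN_MATCH = 4
MAX_OFFSET = 65535      # assumed 64 KiB window

def _hash4(data: bytes, i: int) -> int:
    """Multiplicative hash of the 4 bytes starting at i."""
    v = struct.unpack_from("<I", data, i)[0]
    return ((v * 2654435761) & 0xFFFFFFFF) >> (32 - HASH_BITS)

def compress(data: bytes) -> list:
    """Return tokens: ("L", byte) literals or ("M", offset, length) matches."""
    table = [-1] * (1 << HASH_BITS)
    tokens, i, n = [], 0, len(data)
    while i < n:
        if i + MIN_MATCH <= n:
            h = _hash4(data, i)
            cand, table[h] = table[h], i         # one candidate per slot, no chain search
            if (cand >= 0 and i - cand <= MAX_OFFSET
                    and data[cand:cand + MIN_MATCH] == data[i:i + MIN_MATCH]):
                length = MIN_MATCH
                while i + length < n and data[cand + length] == data[i + length]:
                    length += 1
                tokens.append(("M", i - cand, length))
                i += length                      # positions inside the match are not hashed
                continue
        tokens.append(("L", data[i]))
        i += 1
    return tokens

def decompress(tokens: list) -> bytes:
    out = bytearray()
    for t in tokens:
        if t[0] == "L":
            out.append(t[1])
        else:
            _, off, length = t
            start = len(out) - off
            for k in range(length):              # byte-by-byte copy handles overlapping matches
                out.append(out[start + k])
    return bytes(out)
```

Skipping the hash updates inside a match keeps it fast but loses some matches; inserting those positions too is a cheap, common improvement.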
  • Jarek's Avatar
    Yesterday, 06:45
    I have finally looked at that, and for the last scan in the database of 48 grayscale 512x512 images: - predicting the value from context led to 0.393 bits/value savings (blue dots), especially thanks to "C-D" giving the local gradient; - additionally predicting the width (error level), the savings grew to 0.645 bits/value (green dots). https://arxiv.org/pdf/2004.03391
    103 replies | 24988 view(s)
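Why predicting the width helps on top of predicting the value can be shown with a toy model. This sketch assumes integer residuals that follow a discretized zero-mean Laplace distribution (an assumption here, not necessarily the model in the linked paper) and two hypothetical contexts whose true scales differ; it compares the average cost in bits of one global scale against a scale chosen per context.

```python
import math
import random

def laplace_cdf(x: float, b: float) -> float:
    """CDF of a zero-mean Laplace distribution with scale b."""
    if x < 0:
        return 0.5 * math.exp(x / b)
    return 1.0 - 0.5 * math.exp(-x / b)

def cost_bits(residual: int, b: float) -> float:
    """Bits to code an integer residual with a discretized zero-mean Laplace(b)."""
    a = -abs(residual)                      # symmetric, and the negative tail is numerically safer
    p = laplace_cdf(a + 0.5, b) - laplace_cdf(a - 0.5, b)
    return -math.log2(max(p, 1e-300))       # guard against underflow far in the tail

def demo(seed: int = 42, n: int = 4000):
    """Return (bits/value with one global width, bits/value with a per-context width)."""
    rng = random.Random(seed)
    contexts = [i % 2 for i in range(n)]    # hypothetical: 0 = flat area, 1 = edge
    true_scale = {0: 1.0, 1: 8.0}
    # The difference of two exponentials with mean b is Laplace(0, b); round to integers.
    res = [round(rng.expovariate(1 / true_scale[c]) - rng.expovariate(1 / true_scale[c]))
           for c in contexts]
    b_global = max(sum(abs(r) for r in res) / n, 0.1)       # mean |r| estimates the scale
    b_ctx = {}
    for c in (0, 1):
        rs = [r for r, cc in zip(res, contexts) if cc == c]
        b_ctx[c] = max(sum(abs(r) for r in rs) / len(rs), 0.1)
    fixed = sum(cost_bits(r, b_global) for r in res) / n
    adaptive = sum(cost_bits(r, b_ctx[c]) for r, c in zip(res, contexts)) / n
    return fixed, adaptive
```

The gap between the two numbers returned by `demo()` plays the role of the extra savings from predicting the width. In a real coder the per-context width would itself have to be predicted from already-decoded neighbours rather than fitted after the fact, as it is here.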
  • Alexander Rhatushnyak's Avatar
    Yesterday, 06:33
    I downloaded the validation set. My detector for ex-JPEGs says 358/500 of images are definitely ex-JPEGs. Lossless compression? tuned for ex-JPEGs?
    1 replies | 190 view(s)
  • suryakandau@yahoo.co.id's Avatar
    Yesterday, 02:28
    PAQ8SK v1: this is forked from paq8pxv182fix1, with a tweaked text model and memory usage increased from 4 GB up to ~7 GB. Here are the source code and the binary. The results for the XML file with the -9eta option are: paq8pxv182fix1 250750 bytes, paq8sk 249237 bytes. enwik8 with paq8sk -9eta: 16289679 bytes in 30307.92 sec.
    0 replies | 90 view(s)
  • kaitz's Avatar
    Yesterday, 00:09
    kaitz replied to a thread Paq8pxd dict in Data Compression
    Option -w is general; if I add this change, it becomes specific to one file, so it would need another option. This is the moment I'd like to have an external config (like pxv) that is used and added to the archive.
    802 replies | 290354 view(s)
  • Darek's Avatar
    7th April 2020, 22:39
    Darek replied to a thread Paq8pxd dict in Data Compression
    > My test version finished on enwik9. Now I don't know the result of v80, but compared to v79 -s15 it's about 727kb better.
    Damn! It's a great result! I'm still testing some files with v80, then I'll test enwik9. This result means that the -x15 version should be another 400-500KB smaller = 124'xxx'xxx.
    > And how to handle it on command line.
    What does this mean? Are these options hardcoded?
    802 replies | 290354 view(s)
  • Sportman's Avatar
    7th April 2020, 21:26
    PyTorch Implementation of "Lossless Image Compression through Super-Resolution" https://github.com/caoscott/SReC
    1 replies | 190 view(s)
  • suryakandau@yahoo.co.id's Avatar
    7th April 2020, 18:30
    PAQ8SK: this is forked from paq8pxv182fix1, with some tweaking of the text model and memory usage increased up to 7 GB. The results for the XML file are: paq8pxv182fix1 250750 bytes, paq8sk 249237 bytes. enwik8 is in progress.
    5 replies | 295 view(s)
  • jibz's Avatar
    7th April 2020, 18:04
    Given the previous times you've asked roughly the same question, you need to specify exactly what you mean by "LZ77 only". There is a whole range of options: from actual LZ77 with a fixed-size token for length and offset, through variable-length integer encodings like LZ4, to ranges inside bytes like lzo1x, to bit-level encoding like the literal/match flags LZSS uses, to universal codes like Elias gamma.
    5 replies | 295 view(s)
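Two points on the spectrum jibz describes, sketched in Python as illustrations rather than any codec's exact format: an LZ4-style byte-aligned length (a 4-bit field where 15 means "more bytes follow", then bytes of 255 until a final remainder byte), and an Elias gamma code, a universal bit-level code. The function names are hypothetical, and bits are shown as a string for readability.

```python
def lz4_style_length(n: int) -> tuple:
    """Split a length into a 4-bit field plus extension bytes, LZ4-style.

    The field holds min(n, 15); 15 means "read more bytes": each 255 byte adds 255
    and continues, and the first byte below 255 ends the sequence.
    """
    if n < 0:
        raise ValueError("length must be non-negative")
    nibble = min(n, 15)
    extra = bytearray()
    if n >= 15:
        rest = n - 15
        while rest >= 255:
            extra.append(255)
            rest -= 255
        extra.append(rest)
    return nibble, bytes(extra)

def elias_gamma(n: int) -> str:
    """Elias gamma code of n >= 1: (bit length - 1) zeros, then n in binary."""
    if n < 1:
        raise ValueError("Elias gamma codes integers >= 1")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary

def elias_gamma_decode(bits: str, pos: int = 0) -> tuple:
    """Decode one gamma code at bits[pos:]; return (value, position after it)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2), end
```

The byte-aligned form is cheap to parse but spends at least 4 bits even on tiny lengths; gamma spends 1 bit on the value 1 and 2k+1 bits on values below 2^(k+1), at the price of bit-level I/O.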
  • kaitz's Avatar
    7th April 2020, 17:04
    kaitz replied to a thread Paq8pxd dict in Data Compression
    My test version finished on enwik9. I don't know the result of v80, but compared to v79 -s15 it's about 727kb better. I will think about this and test some other things. And how to handle it on the command line.
    802 replies | 290354 view(s)
  • lz77's Avatar
    7th April 2020, 16:42
    Can anyone compress enwik8 with ratio < 40% using LZ77 only with a simple hash table?
    5 replies | 295 view(s)
  • Romul's Avatar
    7th April 2020, 13:17
    The fact of the matter is that, if I'm right, there is no difference between "originally digital noise" and a digitized analog signal. All of this has some generating function. Abstract mathematical white noise has no restrictions on frequencies and amplitudes; both can take on infinite values. Or rather, the parameters describing this noise require infinite precision (numbers of infinite length) to describe. And therefore it (perfect white noise) cannot be described except as some idea, like infinity itself. That is why I highlighted "discrete white noise" in my text, meaning sequences with restrictions on frequencies and amplitudes. PS: I write through an online translator, so my text may not read entirely correctly.
    42 replies | 1914 view(s)
  • Darek's Avatar
    7th April 2020, 00:11
    Darek replied to a thread Paq8pxd dict in Data Compression
    @Kaitz -> is this a paq8pxd v81 version or something new? OK, I see, it's v82... I've gathered some paq8pxd v80 scores for enwik8; here they are compared with the paq8pxd v79 scores:
    16'272'537 - enwik8 -s8 by Paq8pxd_v79_AVX2
    16'214'034 - enwik8 -x8 by Paq8pxd_v79_AVX2
    15'925'621 - enwik8 -s15 by Paq8pxd_v79_AVX2 - tested by Sportman
    15'862'122 - enwik8 -x15 by Paq8pxd_v79_AVX2 - tested by Sportman
    15'843'925 - enwik8.drt -x15 by Paq8pxd_v79_AVX2
    16'265'881 - enwik8 -s8 by Paq8pxd_v80_AVX2
    16'222'997 - enwik8 -s8 -w by Paq8pxd_v80_AVX2 - tested by Kaitz
    16'207'724 - enwik8 -x8 by Paq8pxd_v80_AVX2
    16'162'663 - enwik8 -x8 -w by Paq8pxd_v80_AVX2
    15'924'798 - enwik8 -s15 by Paq8pxd_v80_AVX2
    15'898'839 - enwik8 -s15 -w by Paq8pxd_v80_AVX2
    15'861'418 - enwik8 -x15 by Paq8pxd_v80_AVX2 - tested by Sportman
    15'835'340 - enwik8 -x15 -w by Paq8pxd_v80_AVX2 - tested by Sportman - the best score ever for the paq8pxd series on enwik8. Importantly, it is better than the score for the DRT-preprocessed file, the first time in a long while!
    15'849'095 - enwik8.drt -x15 by Paq8pxd_v80_AVX2
    15'849'039 - enwik8.drt -x15 -w by Paq8pxd_v80_AVX2
    802 replies | 290354 view(s)
  • kaitz's Avatar
    6th April 2020, 20:26
    kaitz replied to a thread Paq8pxd dict in Data Compression
    Made another test, moving some content around.
    enwik8 -s8 -w:
    16222552 - fixed my mistake compared to v80
    16217354 - reordered some articles
    The data moved after the main data is 1814632 bytes, 159 articles in total. It should be the same thing as Darek's 1423 ordering, but on enwik8. On enwik9 this also selects about 130MB, which matches the gap at http://mattmahoney.net/dc/textdata.html, and about 49000 articles. enwik9 test running... How the selection is made is kind of stupid. Really. EDIT: To be sure :) I made a comparison:
    802 replies | 290354 view(s)
  • kaitz's Avatar
    6th April 2020, 20:07
    kaitz replied to a thread Paq8pxd dict in Data Compression
    No.
    802 replies | 290354 view(s)
  • LucaBiondi's Avatar
    6th April 2020, 17:40
    LucaBiondi replied to a thread Paq8pxd dict in Data Compression
    I found that one XML file is in UTF-8 format. The other is in UTF-16 format (Notepad++ tells me "UCS-2 LE BOM"). Should UTF-16 files be detected as text files? Thank you as usual!! Luca
    802 replies | 290354 view(s)
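Notepad++'s "UCS-2 LE BOM" means the file starts with the bytes FF FE followed by 16-bit little-endian code units. Whether paq8pxd should detect such files as text is a question for the developer; a possible workaround, sketched here under the assumption that re-encoding the file as UTF-8 before compression is acceptable, is to convert it first. The helper names are hypothetical.

```python
import codecs

# Longer BOMs first: the UTF-32 LE BOM starts with the same bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(raw: bytes):
    """Return (encoding name, BOM length) from a byte-order mark, or (None, 0)."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name, len(bom)
    return None, 0

def to_utf8(raw: bytes) -> bytes:
    """Re-encode BOM-marked Unicode text as UTF-8 without a BOM; leave other input unchanged."""
    name, skip = sniff_bom(raw)
    if name is None:
        return raw
    return raw[skip:].decode(name).encode("utf-8")
```

A UTF-16 LE file whose first character is U+0000 would be misread as UTF-32 LE by this check, an edge case that is harmless for XML.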
  • kaitz's Avatar
    6th April 2020, 17:06
    kaitz replied to a thread Paq8pxd dict in Data Compression
    v81 fixes enwik10 processing only; there is still a "small" mistake I added. So testing v80 is OK, no need for v81. Also, if the 1423 order is used, please check with -s0 -w that the transform completes without failing. Also, DRT has no effect with the -w option; the transform will fail.
    802 replies | 290354 view(s)
  • snowcat's Avatar
    6th April 2020, 13:53
    snowcat replied to a thread 2019-nCoV in The Off-Topic Lounge
    I think that only applies to specialized masks. Currently I don't think normal masks have those things...
    39 replies | 2208 view(s)
  • RichSelian's Avatar
    6th April 2020, 12:29
    RichSelian replied to a thread 2019-nCoV in The Off-Topic Lounge
    Chinese scientists said not to spray anything like alcohol or disinfectant on a mask; it will break the micro-structure of the mask and make it useless.
    39 replies | 2208 view(s)
  • moisesmcardona's Avatar
    6th April 2020, 01:31
    It seems that building the avrecode tool isn't working on Ubuntu 20.04:
    25 replies | 11213 view(s)
  • Shelwien's Avatar
    6th April 2020, 01:09
    Shelwien replied to a thread paq8px in Data Compression
    @suryakandau: It compiles, only needs extra -DWINDOWS
    1845 replies | 525963 view(s)
  • LawCounsels's Avatar
    5th April 2020, 23:54
    LawCounsels replied to a thread 2019-nCoV in The Off-Topic Lounge
    Essential items to have; you will recover even when infected. Russian Research Institute patented product: http://biosilverlab.com Spray colloidal silver on the mask; it lasts the whole day (70% alcohol lasts only 15 minutes). If symptoms show, ingesting it will remove the virus from the throat. UV lights kill all viruses: https://tekwase.com/search?q=Uvc Hydrogen water is the best immunity boost: https://earlytechadopters.com/collections/all/products/portable-hydrogen-generator Any interesting items to add? (like over-the-counter drugs which effectively kill the virus?)
    39 replies | 2208 view(s)
  • Jyrki Alakuijala's Avatar
    5th April 2020, 22:32
    Agreed. Html and xml are not great for transport and parsing efficiency. Packing things together can become dangerous due to enabling new attacks.
    401 replies | 124917 view(s)
  • Jyrki Alakuijala's Avatar
    5th April 2020, 21:43
    I have to say that I am proud of the JPEG XL team for delivering such coding speeds and strong quality guarantees above 0.5 bpp (1:50 compression).
    103 replies | 24988 view(s)
  • Darek's Avatar
    5th April 2020, 20:12
    Darek replied to a thread Paq8pxd dict in Data Compression
    @Kaitz - this version = paq8pxd v81 - is it only a fix? I'm asking about enwik8 and enwik9 testing => should I use v81 instead of v80, or would the scores be different?
    802 replies | 290354 view(s)
  • kaitz's Avatar
    5th April 2020, 19:01
    kaitz replied to a thread Paq8pxd dict in Data Compression
    No need to test this one.
    802 replies | 290354 view(s)
  • Darek's Avatar
    5th April 2020, 18:49
    Darek replied to a thread Paq8pxd dict in Data Compression
    Damn, Kaitz is too fast... I'm still testing v80 ;)
    802 replies | 290354 view(s)
  • moisesmcardona's Avatar
    5th April 2020, 18:32
    v81 is on GitHub: https://github.com/kaitz/paq8pxd/releases/tag/v81
    802 replies | 290354 view(s)
  • Darek's Avatar
    5th April 2020, 17:30
    Darek replied to a thread Paq8pxd dict in Data Compression
    @CompressMaster - first attempt at tarball compression:
    10'335'343 - score from compressing individual files with one option => paq8pxe v1 gc82
    10'116'571 - best score for a solid archive, revisited with paq8pxe v1 gc82 (I've faced some issues testing paq8px v184 and v185...)
    10'122'251 - tarball compressed by paq8pxe v1 gc82 - slightly worse than solid compression with the same version.
    I wonder whether file order could be important for a tar file. Question: is it possible to put the files in a tarball in my own order?
    802 replies | 290354 view(s)
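A tar archive stores its members in the order they are added, so the order is up to whoever builds it; with GNU tar, files are archived in the order given on the command line or listed in a file passed with `-T`. The same idea with Python's standard `tarfile` module, as a sketch (`make_tar` is a hypothetical helper, and the file names are just examples from the testset):

```python
import io
import tarfile

def make_tar(files: dict, order: list) -> bytes:
    """Build an uncompressed tar in memory, adding members exactly in the given order."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in order:
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

Placing similar files next to each other (for example D.TGA right before E.TIF) keeps them within a compressor's context or match window, which is the main reason order can matter.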
  • schnaader's Avatar
    5th April 2020, 17:22
    schnaader replied to a thread Paq8pxd dict in Data Compression
    The second number in each line is a 32 bit datatype display error. 33310 MiB*1024*1024 modulo 2^32 = 568,328,192 bytes
    802 replies | 290354 view(s)
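schnaader's arithmetic can be checked, and partly inverted, in a few lines of Python. The sketch assumes, as schnaader explains, that the byte count in the log is the true value reduced modulo 2^32, and that the true value lies within 2 GiB of the printed MB figure; `recover_bytes` is a hypothetical helper, not part of paq8pxd.

```python
MIB = 1024 * 1024
WRAP = 2 ** 32                               # a 32-bit unsigned counter wraps here

def as_uint32(n: int) -> int:
    """Value a 32-bit unsigned counter shows after counting n bytes."""
    return n % WRAP

def recover_bytes(shown: int, approx_mib: int) -> int:
    """Pick the byte count congruent to `shown` mod 2**32 that is nearest approx_mib MiB."""
    approx = approx_mib * MIB
    wraps = (approx - shown + WRAP // 2) // WRAP
    return wraps * WRAP + shown

print(as_uint32(33310 * MIB))                # 568328192, schnaader's figure
print(recover_bytes(568349169, 33310))       # 34928087537 (8 wraps), about 33310.02 MiB
```

The two "-x15" log values (568349169 and 568349201 bytes) therefore correspond to about 34.9 billion bytes, consistent with the "33310 MB" figure.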
  • LucaBiondi's Avatar
    5th April 2020, 16:03
    LucaBiondi replied to a thread Paq8pxd dict in Data Compression
    Thank you!
    802 replies | 290354 view(s)
  • Sportman's Avatar
    5th April 2020, 15:07
    Sportman replied to a thread Paq8pxd dict in Data Compression
    I use only 32GB and no problems so far.
    802 replies | 290354 view(s)
  • LucaBiondi's Avatar
    5th April 2020, 14:49
    LucaBiondi replied to a thread Paq8pxd dict in Data Compression
    Thank you! ...so I need at least 48 GB.
    802 replies | 290354 view(s)
  • Darek's Avatar
    5th April 2020, 14:46
    Darek replied to a thread Paq8pxd dict in Data Compression
    Looks like even the times are smaller... !
    802 replies | 290354 view(s)
  • Sportman's Avatar
    5th April 2020, 14:05
    Sportman replied to a thread Paq8pxd dict in Data Compression
    "used 33310 MB (568349169 bytes) of memory" -x15 "used 33310 MB (568349201 bytes) of memory" -x15 -w
    802 replies | 290354 view(s)
  • Sportman's Avatar
    5th April 2020, 14:04
    Sportman replied to a thread Paq8pxd dict in Data Compression
    enwik8: 15,835,340 bytes, 7,418.998 sec., paq8pxd_v80_avx2 -x15 -w
    802 replies | 290354 view(s)
  • kaitz's Avatar
    5th April 2020, 13:52
    kaitz replied to a thread Paq8pxd dict in Data Compression
    enwik8: 16222997 bytes, -s8 -w, Paq8pxd_v80_AVX2. Decompression identical.
    802 replies | 290354 view(s)
  • LucaBiondi's Avatar
    5th April 2020, 13:21
    LucaBiondi replied to a thread Paq8pxd dict in Data Compression
    Hi Sportman, how much memory do you need to use the -x15 option? I am upgrading some servers... Luca
    802 replies | 290354 view(s)
  • Darek's Avatar
    5th April 2020, 13:14
    Darek replied to a thread paq8px in Data Compression
    There is something wrong with "solid" compression in paq8px v185: up to paq8pxe v1 gc82 it works fine, but starting from v185 the compression scores are much worse (2.5x higher) than in previous versions. Maybe it is helpful to know that paq8px v184 runs quite OK, but for some files (I.EXE) it crashes with "encodeExe read error".
    1845 replies | 525963 view(s)
  • Sportman's Avatar
    5th April 2020, 11:57
    Sportman replied to a thread Paq8pxd dict in Data Compression
    enwik8: 15,861,418 bytes, 7,564.860 sec., paq8pxd_v80_avx2 -x15
    802 replies | 290354 view(s)
  • suryakandau@yahoo.co.id's Avatar
    5th April 2020, 06:53
    @shelwien I use this script to compile paq8px v182 and it works, but for paq8pxd v69 it does not work. Why??
    1845 replies | 525963 view(s)
  • suryakandau@yahoo.co.id's Avatar
    5th April 2020, 04:05
    How do I compile paq8pxd using MinGW? Could you give me the script to compile it? Thank you.
    1845 replies | 525963 view(s)
  • schnaader's Avatar
    4th April 2020, 23:10
    Thanks for the info, but please don't forget the units ;) The paper says "throughput in megapixels/second". I was confused at first because I read it as "seconds".
    103 replies | 24988 view(s)
  • Darek's Avatar
    4th April 2020, 23:07
    Darek replied to a thread Paq8pxd dict in Data Compression
    @CompressMaster - hmmm, I've never thought of doing that. Really. But it looks like quite a nice idea to try! Thanks. At this moment my records are:
    10'335'343 - score from compressing individual files with one option => paq8pxe v1 gc82
    10'196'351 - sum of the best scores for all files => various compressors (paq8px, paq8pxd and cmix, actually)
    10'116'576 - best score for a solid archive => paq8px v183fix1 -9t - it takes advantage of two similar files (D.TGA and E.TIF) which are in different formats but contain the same images.
    I'll test the best versions of decent compressors to check whether a tarball gets a better score.
    802 replies | 290354 view(s)
  • CompressMaster's Avatar
    4th April 2020, 22:03
    @Darek, have you ever tried to TAR files first? Maybe then you could achieve better results...
    802 replies | 290354 view(s)
  • Jarek's Avatar
    4th April 2020, 21:31
    Benchmarking JPEG XL image compression (1 April 2020): https://www.spiedigitallibrary.org/conference-proceedings-of-spie/11353/113530X/Benchmarking-JPEG-XL-image-compression/10.1117/12.2556264.full?SSO=1
    E.g. speed (throughput in megapixels/second):
    Codec | Encode | Decode
    JPEG XL (N=4) | 49.753 | 132.424
    JPEG (libjpeg) | 9.013 | 11.133
    JPEG (libjpeg-turbo) | 48.811 | 107.981
    HEVC-HM-YUV444 | 0.014 | 5.257
    HEVC-x265-YUV444 | 1.031 | 14.037
    HEVC-x265-YUV444 (N = 4) | 3.691 | 14.100
    HEVC-x265-YUV444 (N = 8) | 6.345 | 13.471
    103 replies | 24988 view(s)
  • Darek's Avatar
    4th April 2020, 17:25
    Darek replied to a thread Paq8pxd dict in Data Compression
    Scores of my testset with paq8pxd v80. Mixed results => the exe files got some improvements; K.WAD and L.PAK got some losses...
    802 replies | 290354 view(s)
  • Trench's Avatar
    4th April 2020, 17:06
    Trench replied to a thread 2019-nCoV in The Off-Topic Lounge
    On the other hand, truth only comes when everything is evaluated, whether right or wrong. Over a trillion for the corona stimulus was voted on, and the people get around 200 billion (1/10). What about the majority of the trillion? Oh, that goes to organizations like the arts, public broadcasting, etc.; in other words, mostly unrelated things. Remember the coronavirus of 2012 that the media said would be a pandemic? More died from the common flu, yet it didn't destroy the world economy; more people will get sick and die from other things than from the virus. Reports say people with low vitamin D levels get it seriously, and so do people with genetically low vitamin C. "The microbe is nothing. The terrain is everything" (Claude Bernard, 1813-1878, widely regarded as the father of modern physiology). "The primary cause of disease is in us, always in us" (Professor Pierre Antoine Bechamp, 1883). How does one get bad terrain? From the environment: the AIR, WATER, FOOD. What does the air have, and why do wildfires get more out of control? Why are toxic metals like aluminum, mercury, etc. in more people, and increasing every year? Too much chromium, whose side effect is lung scarring (breathing issues)? What happens when you put metal in the microwave? Bad idea. Or putting your head outside the microwave: bad, but not constant (one punch). How would the body react to something else that gives off waves nonstop, like sitting/sleeping next to a router/modem (never-ending soft taps), when a person has metals, as most do? Why do people with health issues have more problems with radiation, just like the people who die from corona? Even if you feel good and not sick, it's those people who spread the "virus", as the media says, since they don't know it yet. If that is true (who knows), then you are to blame. As for the CDC and WHO saying masks are useless while it can spread through talking, well, that is another oxymoronic thing to say, and they are not stupid, so what's the deal? Either they lie or they are incompetent; either way, why does anyone listen to anything a liar/idiot says? The odd thing is that there are reports of some people getting it without being in contact with anyone or anything for over 20 days. Very strange things. If it walks like a duck, talks like a duck, looks like a duck, it must be a dog? WHAT?
    39 replies | 2208 view(s)
  • moisesmcardona's Avatar
    4th April 2020, 16:00
    moisesmcardona replied to a thread paq8px in Data Compression
    It seems that some files are getting stuck extracting at 100% when the file has been compressed using pre-training text model. Using the refactored v185.
    1845 replies | 525963 view(s)
  • Marco_B's Avatar
    4th April 2020, 15:33
    Let me specify better how the decoder may work. The distribution passed to the arithmetic/range coder is the one for the alphabet specified in the row indicated by I (see again #187) in the matrix of the secondary prediction, augmented by an escape symbol. The frequency of a given symbol is calculated as the sum of the counts in its column divided by the sum of all the other columns. The decoder receives the code and infers the corresponding symbol; then it updates the table of the primary prediction, and finally, using the symbol that emerges from there as the J-index, it can update the matrix of the secondary prediction. In this way it seems to me that the check of the matrix's reversibility is not even necessary.
    190 replies | 9039 view(s)
  • kaitz's Avatar
    4th April 2020, 11:44
    kaitz replied to a thread Paq8pxd dict in Data Compression
    paq8pxd_v80
    - Small changes in wordModel
    - Add a second option, -w, for direct input of Wikipedia dumps
    - WIT data type (option -w) for Wikipedia; no detection, malformed input makes the transform fail:
      - subtract ID, convert timestamp, convert HTML entities (also to UTF-8)
      - extract article header ns/id -> contributor, place after data
      - extract langs at the end of the article, place after data
    - Move online WRT out of wordModel
    Compressing enwik8: paq8pxd_v80 -s8 -w enwik8
    Without -w, no wiki-specific processing is done. This will work on enwik9 and the SqueezeChart files. It currently fails on enwik10; I think it's the second file inside.
    802 replies | 290354 view(s)
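The changelog names the transforms but not their encoding. One plausible reading of "subtract ID" and "convert timestamp", sketched in Python purely as an illustration (function names and the exact encoding are assumptions, not paq8pxd's code), is delta-coding consecutive article ids and turning text timestamps into integers. Both steps are exactly reversible, which any compressor transform must be.

```python
from datetime import datetime, timezone

TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"             # dump-style timestamp, e.g. 2006-03-13T19:47:11Z

def delta_encode(ids: list) -> list:
    """Replace each id by its difference from the previous one (first id kept as is)."""
    out, prev = [], 0
    for x in ids:
        out.append(x - prev)
        prev = x
    return out

def delta_decode(deltas: list) -> list:
    """Inverse of delta_encode: running sum of the differences."""
    out, prev = [], 0
    for d in deltas:
        prev += d
        out.append(prev)
    return out

def timestamp_to_seconds(ts: str) -> int:
    """20-character text timestamp -> seconds since the Unix epoch (UTC)."""
    dt = datetime.strptime(ts, TS_FORMAT).replace(tzinfo=timezone.utc)
    return int(dt.timestamp())

def seconds_to_timestamp(sec: int) -> str:
    """Inverse of timestamp_to_seconds."""
    return datetime.fromtimestamp(sec, tz=timezone.utc).strftime(TS_FORMAT)
```

Small deltas and fixed-width integers are easier for a context model to predict than the original decimal strings; the HTML-entity step is left out here because a naive unescape is not reversible without an escape mechanism.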
  • kaitz's Avatar
    4th April 2020, 11:36
    kaitz replied to a thread Paq8pxd dict in Data Compression
    Maybe open it in Notepad++ and see if it is in fact text. You need to change wrtpre.cpp. If it reports bintext, then it's text mixed with binary data, and WRT will probably bloat the hell out of this file.
    802 replies | 290354 view(s)
  • pklat's Avatar
    4th April 2020, 09:31
    pklat replied to a thread Zstandard in Data Compression
    You might then move to some entirely new format, more efficient than HTML, and compress that. Does brotli pack each file separately? If so, huge gains could be made if all the text files for a page were solid-packed into one. IIRC, time is mostly wasted on fetching a multitude of small files.
    401 replies | 124917 view(s)
  • Jyrki Alakuijala's Avatar
    4th April 2020, 07:53
    Perhaps you are more focused on aesthetics and elegance than on efficiency. Efficiency is something that can be measured in a benchmark, not by reasoning. As an example, when I played with dictionary generation (both zstd --train and shared brotli), I occasionally found that taking 10 random samples of the data and using the best sample as the dictionary turned out more efficient than running either of the more expensive dictionary extraction algorithms. Other times, concatenating 10 random samples was a decent strategy. Thorough thinking, logic and beauty do not necessarily 'win' the dictionary efficiency game. Depending on how well the integration of a shared dictionary has been done, different 'rotting' times can be observed. SDCH dictionaries rotted within 3-6 months into being mostly useless or already slightly harmful; with brotli dictionaries we barely see rot at all. Zstd dictionary use, while less efficient than shared-brotli-style shared dictionary coding, also likely rots much more slowly than SDCH dictionaries. This is because SDCH used to mix the entropy originating from the dictionary use with the literals in the data, and then hope that a general-purpose compressor could make sense of this bit porridge. IMHO, we could come up with a unified way to do shared dictionaries and use it across the simple compressors (like zstd and brotli).
    401 replies | 124917 view(s)
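The best-of-10-random-samples strategy Jyrki describes is easy to try. The sketch below uses zlib's preset dictionary (the `zdict` argument of Python's `zlib.compressobj`) as a stand-in for zstd or shared-brotli dictionaries, so absolute gains will differ, and deflate can only use the last 32 KiB of a dictionary. `pick_dictionary` and its parameters are hypothetical.

```python
import random
import zlib

def size_with_dict(doc: bytes, zdict: bytes) -> int:
    """Compressed size of doc when zdict is used as a deflate preset dictionary."""
    c = zlib.compressobj(level=9, zdict=zdict)
    return len(c.compress(doc) + c.flush())

def pick_dictionary(pool: list, eval_docs: list, n_candidates: int = 10, seed: int = 0) -> bytes:
    """Try n random samples from pool as dictionaries; keep the one that compresses eval_docs best."""
    rng = random.Random(seed)
    candidates = rng.sample(pool, min(n_candidates, len(pool)))
    return min(candidates, key=lambda d: sum(size_with_dict(doc, d) for doc in eval_docs))
```

Scoring candidates on held-out documents rather than on the pool itself avoids rewarding a sample for compressing its own copy; the "concatenate 10 samples" variant would simply pass `b"".join(candidates)` as the dictionary.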
  • xinix's Avatar
    4th April 2020, 07:24
    xinix replied to a thread Paq8pxd dict in Data Compression
    I can compile, so can you tell me which part of the code to change? Or, to simplify the task: pxd_v79 does not preprocess my file; segmentation outputs bintext, and no preprocessing is done. If it could ignore that and apply preprocessing anyway, I wouldn't need an external dictionary. Thanks!
    802 replies | 290354 view(s)
  • xinix's Avatar
    4th April 2020, 07:20
    You must not use "C9", and you must not use "C"; you must use lowercase "c". And do not try to use phda9dec.exe on Windows: phda9 only works under Linux. A file created under Linux will not be unpacked under Windows. Also, because phda is based on floats, it is only possible to pack and unpack on the same processor and Linux.
    90 replies | 31120 view(s)
  • Alexander Rhatushnyak's Avatar
    4th April 2020, 07:14
    C9 is only for enwik9. C is for other files, but they can't be as big as enwik9. There's something in readme.txt about sizes. Sorry, it looks like this year I won't have more than 5 minutes per month for this.
    90 replies | 31120 view(s)
  • ShihYat's Avatar
    4th April 2020, 06:16
    Same problem here as above. When I use 'C' on other large enwik text files, it returns a segmentation fault at different percentages; likewise, using 'C' instead of 'C9' on enwik9 faults at 91%.
    90 replies | 31120 view(s)
  • byronknoll's Avatar
    4th April 2020, 00:11
    byronknoll replied to a thread cmix in Data Compression
    I didn't use your binary. Here are some suggestions that might help: - change the compiler flag from -Ofast to -O3. The binary will be slower, but might fix the issue you are seeing. - upgrade your compiler to a more recent version. - change to a different compiler - I recommend clang.
    449 replies | 110460 view(s)
  • LucaBiondi's Avatar
    3rd April 2020, 17:01
    LucaBiondi replied to a thread Paq8pxd dict in Data Compression
    Hi guys! I hope you are all well! I have two similar XML files. I found that one is detected as default while the other is detected as text:
    0 |default | 1 | 14404734
    18 |text | 1 | 11449237
    Why? What should I check inside my files? This is the log:
    C:\Compression\paq8pxd>paq8pxd_v76_AVX2 -x12 testset_xml c:\compression\Xml_testset\*.xml
    Slow mode
    FileDisk: unable to open file (No such file or directory)
    Creating archive testset_xml.paq8pxd76 with 2 file(s)...
    File list (76 bytes)
    Compressed from 76 to 58 bytes.
    1/2 Filename: c:/compression/Xml_testset/12_A_20060313194711.xml (11449237 bytes)
    Block segmentation:
     0 | text | 11449237
    2/2 Filename: c:/compression/Xml_testset/AS_B_20150226093210.xml (14404734 bytes)
    Block segmentation:
     0 | default | 14404734
    Segment data size: 27 bytes
    TN |Type name |Count |Total size
    -----------------------------------------
     0 |default | 1 | 14404734
    18 |text | 1 | 11449237
    -----------------------------------------
    Total level 0 | 2 | 25853971
    default stream(0). Total 14404734
    bigtext wrt stream(10). Total 8167896
    Stream(0) compressed from 14404734 to 133667 bytes
    WRT dict count 811 words.
    WRT dict online.
    Stream(10) compressed from 8167896 to 284267 bytes
    Segment data compressed from 27 to 18 bytes
    Total 25853971 bytes compressed to 418059 bytes.
    Time 4365.32 sec, used 15766 MB (3646968591 bytes) of memory
    Thank you!!!!
    Luca
    802 replies | 290354 view(s)
  • Marco_B's Avatar
    3rd April 2020, 17:00
    > You say "We update it before to extract any information", but normally we have to decode a symbol first to update anything.
    Yes, I do it to take the posterior estimation into account; the check of the reversibility of the matrix in the secondary prediction should ensure that the decoder can walk in lockstep. Obviously it is possible I have disregarded something.
    > In any case, SEE/SSE is about using a probability as context.
    I'll try to re-read with this in mind.
    190 replies | 9039 view(s)