Activity Stream

  • Shelwien's Avatar
    Today, 02:46
    Not sure if I already encountered it, but here I modified the encoder from this: https://encode.su/threads/3140-LZ98?p=60866&viewfull=1#post60866 The size seems to be similar, but not an exact match.
    1 replies | 61 view(s)
  • easyaspi314's Avatar
    Today, 02:43
    Intriguing. It seems like XXH3_128b (swapped acc + input) has perfect distribution. Each value appears exactly 1024 times according to my output. @Cyan is it correct that XXH128 has even distribution? Wait, hold on, there is a huge error in my code - I'm only testing the low bits lol
    21 replies | 815 view(s)
  • easyaspi314's Avatar
    Today, 02:33
    Here's 10-bit instead of 4-bit ZrHa: Chart. Here's XXH3_64b: Chart. For the key, I used different chunks of kSecret on each row. About 1/4 of the values end up being zero in ZrHa, but XXH3 has a fairly even distribution.
    21 replies | 815 view(s)
  • L@Zar0's Avatar
    Today, 00:37
    Hi, I don't know if this forum is the correct place to ask for help, but I hope that someone can shed some light on this. I'm trying to translate a game (Star Trek Judgment Rites, Enhanced CD) from English to Spanish. It is very old, from 1993, from Interplay, but the files are compressed with an LZSS (or similar) algorithm. I have a fake compression algorithm, which increases the size of the file, but I need the original compression to keep the files from getting bigger than 0xFFFF. I tried to write the compression algorithm and tested a bunch of LZSS code found on the internet, but without success. This is the decompression algorithm for the files (it can be improved, of course, but that is not what I'm looking for now). I need the compression algorithm if possible:
    void _uncompresslzss(uint32 CompSize, uint32 UnCompSize)
    {
        // Note: the array subscripts were stripped from the original post by the forum;
        // the indexing below is reconstructed from the usual LZSS semantics.
        int N = 0x1000; // This is 4096.
        int THRESHOLD = 3, i = 0, j = 0;
        byte *HisBuf = new byte[N];
        byte b = 0, Length = 0, flagbyte = 0;
        unsigned short int offsetlen;
        uint32 outstreampos = 0;
        uint32 bufpos = 0;
        bool end = false;
        unsigned int BytesRead = 0, Offset = 0;
        memset(HisBuf, 0, N);
        BytesRead = 0;
        while (!end)
        {
            flagbyte = (byte)compdata[BytesRead];
            if (BytesRead == CompSize) end = true;
            BytesRead++;
            if (!end)
            {
                for (i = 0; i < 8; i++)
                {
                    if ((flagbyte & (1 << i)) == 0)
                    {
                        // Match: 12-bit offset + 4-bit length
                        offsetlen = (byte)compdata[BytesRead] + ((byte)compdata[BytesRead + 1] << 8);
                        if (BytesRead == CompSize) end = true;
                        BytesRead += 2;
                        if (!end)
                        {
                            Length = (offsetlen & 0xF) + THRESHOLD;
                            Offset = bufpos - ((offsetlen >> 4) & (N - 1));
                            for (j = 0; j < Length; j++)
                            {
                                b = HisBuf[(Offset + j) & (N - 1)];
                                uncompdata[outstreampos++] = b;
                                HisBuf[bufpos] = b;
                                bufpos = (bufpos + 1) & (N - 1);
                            }
                        }
                    }
                    else
                    {
                        // Literal byte
                        b = (byte)compdata[BytesRead];
                        if (BytesRead == CompSize) end = true;
                        BytesRead++;
                        if (!end)
                        {
                            uncompdata[outstreampos++] = b;
                            HisBuf[bufpos] = b;
                            bufpos = (bufpos + 1) & (N - 1);
                        }
                    }
                }
            }
        }
        if (outstreampos != UnCompSize)
            printf("WARNING: file might not have been extracted correctly! ");
    }
    I attached two sample files (2x compressed + 2x uncompressed); one is bigger than the other, which is very small. I hope someone can help me; as far as I've seen there isn't any tool for the compression algorithm (at least I have not found one). Any help will be appreciated. Thanks a lot.
    1 replies | 61 view(s)
  • NohatCoder's Avatar
    Yesterday, 23:00
    @easyaspi314 Yeah, that is the expected pattern: there are only 256 different possible states after mixing in the plaintext, and you are generating each of those states exactly 256 times. If you want to do something fun, take a slightly bigger version of the function, take each of the possible states and iterate a number of times with a fixed input, then we can do stats on the resulting distribution. Do things get worse/better with a different fixed input? I'd say probably, but I don't know.
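    A minimal sketch of that experiment (assumptions: it reuses the 4-bit ZrHa update from easyaspi314's post further down, and the fixed input and iteration counts are arbitrary picks): start from all 256 possible states, iterate with one fixed input, and count how many distinct states survive.
    #include <stdint.h>
    #include <stdio.h>

    // 4-bit ZrHa64_update, as posted further down in this thread
    static void ZrHa4_update(uint8_t state[2], uint8_t data[2])
    {
        uint8_t x0 = (state[0] + data[0]) % 16;
        uint8_t x1 = (state[1] + data[1]) % 16;
        uint8_t m0 = ((x0 % 4) * (x0 / 4)) % 16;
        uint8_t m1 = ((x1 % 4) * (x1 / 4)) % 16;
        uint8_t rot1 = ((x1 >> 2) | (x1 << 2)) % 16;
        uint8_t rot0 = ((x0 >> 2) | (x0 << 2)) % 16;
        state[0] = (m0 + rot1) % 16;
        state[1] = (m1 + rot0) % 16;
    }

    int main(void)
    {
        uint8_t data[2] = { 5, 11 };                  // fixed input (arbitrary choice)
        for (int iters = 1; iters <= 64; iters *= 2) {
            int seen[256] = { 0 }, distinct = 0;
            for (int s = 0; s < 256; s++) {
                uint8_t state[2] = { (uint8_t)(s % 16), (uint8_t)(s / 16) };
                for (int i = 0; i < iters; i++)
                    ZrHa4_update(state, data);
                seen[state[0] + 16 * state[1]]++;     // tally the final state
            }
            for (int v = 0; v < 256; v++)
                distinct += (seen[v] != 0);
            printf("%2d iterations: %3d distinct states\n", iters, distinct);
        }
        return 0;
    }
    Repeating the run with different `data` values would answer the "worse/better with a different fixed input" question empirically.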
    21 replies | 815 view(s)
  • NohatCoder's Avatar
    Yesterday, 22:44
    Yeah, Spooky v2 is broken. You can generate high-probability collisions by changing a single bit in the final 8 bytes of the second-to-last block, and then undoing that change by flipping the two affected bits in the final 16 bytes of the last block. There is a pattern switch for the handling of the last block, and it means that data gets ingested into the same spot twice with virtually no mixing in between.
    21 replies | 815 view(s)
  • easyaspi314's Avatar
    Yesterday, 22:26
    ZrHa64_update when scaled down to 4 bits instead of 64 bits (with 2 states and 2 inputs = 256x256), generated with libpng instead of terminal escape codes and a screenshot: It seems that some values just don't occur, everything occurs a multiple of 256 times, and there is a huge favoring of 136. Chart of value occurrences. Here is the code to generate it.
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <png.h>

    // 4-bit ZrHa64_update
    void ZrHa4_update(uint8_t state[2], uint8_t data[2])
    {
        uint8_t x0 = (state[0] + data[0]) % 16;
        uint8_t x1 = (state[1] + data[1]) % 16;
        uint8_t m0 = ((x0 % 4) * (x0 / 4)) % 16;
        uint8_t m1 = ((x1 % 4) * (x1 / 4)) % 16;
        uint8_t rot1 = ((x1 >> 2) | (x1 << 2)) % 16;
        uint8_t rot0 = ((x0 >> 2) | (x0 << 2)) % 16;
        state[0] = (m0 + rot1) % 16;
        state[1] = (m1 + rot0) % 16;
    }

    // Shameless copy of the libpng example code.
    int main(void)
    {
        int width = 256, height = 256;
        int code = 0;
        FILE *fp = NULL;
        png_structp png_ptr = NULL;
        png_infop info_ptr = NULL;
        png_bytep row = NULL;

        // Open file for writing (binary mode)
        fp = fopen("xormul.png", "wb");
        if (fp == NULL) {
            fprintf(stderr, "Could not open file xormul.png for writing\n");
            code = 1;
            goto finalise;
        }
        // Initialize write structure
        png_ptr = png_create_write_struct(PNG_LIBPNG_VER_STRING, NULL, NULL, NULL);
        if (png_ptr == NULL) {
            fprintf(stderr, "Could not allocate write struct\n");
            code = 1;
            goto finalise;
        }
        // Initialize info structure
        info_ptr = png_create_info_struct(png_ptr);
        if (info_ptr == NULL) {
            fprintf(stderr, "Could not allocate info struct\n");
            code = 1;
            goto finalise;
        }
        // Setup Exception handling
        if (setjmp(png_jmpbuf(png_ptr))) {
            fprintf(stderr, "Error during png creation\n");
            code = 1;
            goto finalise;
        }
        png_init_io(png_ptr, fp);
        // Write header (8 bit colour depth)
        png_set_IHDR(png_ptr, info_ptr, width, height, 8, PNG_COLOR_TYPE_RGB,
                     PNG_INTERLACE_NONE, PNG_COMPRESSION_TYPE_BASE, PNG_FILTER_TYPE_BASE);
        png_write_info(png_ptr, info_ptr);
        // Allocate memory for one row (3 bytes per pixel - RGB)
        row = (png_bytep) malloc(3 * width * sizeof(png_byte));

        // Write image data
        int x, y;
        // Count the outputs
        int outputs[256] = {0};
        for (y = 0; y < height; y++) {
            for (x = 0; x < width; x++) {
                // Split up the nibbles
                uint8_t state[2] = { (uint8_t)(y % 16), (uint8_t)(y / 16) % 16 };
                uint8_t data[2]  = { (uint8_t)(x % 16), (uint8_t)(x / 16) % 16 };
                // Run our downscaled update routine
                ZrHa4_update(state, data);
                // Combine the state back together
                uint8_t code = state[0] + (state[1] << 4);
                // Log the occurrence
                ++outputs[code];
                // Insert the pixel, with the R, G, and B being the outputted value.
                row[3*x] = row[3*x + 1] = row[3*x + 2] = code;
            }
            png_write_row(png_ptr, row);
        }
        // Dump CSV of all of the outputs to stdout.
        printf("value,amount\n");
        for (int i = 0; i < 256; i++) {
            printf("%4d,%4d\n", i, outputs[i]);
        }
        // End write
        png_write_end(png_ptr, NULL);

    finalise:
        if (fp != NULL) fclose(fp);
        if (info_ptr != NULL) png_free_data(png_ptr, info_ptr, PNG_FREE_ALL, -1);
        if (png_ptr != NULL) png_destroy_write_struct(&png_ptr, (png_infopp)NULL);
        if (row != NULL) free(row);
        return code;
    }
    21 replies | 815 view(s)
  • svpv's Avatar
    Yesterday, 17:14
    Speaking of assembly, I have spooky64v2-x86_64.S, which closely corresponds to a slightly improved C version. The compiler cannot avoid register spilling, which in fact can narrowly be avoided (12 state registers with 15 general-purpose registers), so when spelled out in assembly, the loop runs about 10% faster (I don't remember the exact figure). As you can see, there is a fair amount of boilerplate, some of it to support both the System V and Microsoft x64 calling conventions, and some of it to make the rest of the code less error-prone.
    Then I started to have doubts about SpookyHash. Leo Yuriev reports that it has "collisions with 4bit diff" (I don't know the details), but more importantly, the way the last portion of data is fed atop of state that has not been mixed, contrary to the trickle-feed theory, looks very suspicious (this is possibly the same issue that leads to the 4-bit diff collisions). If it weren't Bob Jenkins, I would call it a blunder.
    I also wrote a JIT compiler to explore ARX constructions like SpookyHash, and specifically to find better rotation constants that maximize avalanche. Measuring avalanche is not a trivial thing though: some of it comes from the fact that the state must be sufficiently random, and some from the strength of the construction proper... I must add that I don't understand well all aspects of Bob Jenkins' theories.
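    A minimal avalanche-measurement sketch (an assumed illustration, not svpv's JIT harness; the mixer under test is just a stand-in using MurmurHash3's finalizer constants): flip each input bit, record how often each output bit flips, and report the worst per-bit bias.
    #include <stdint.h>
    #include <stdio.h>
    #include <math.h>

    static uint64_t mix(uint64_t x)              // stand-in 64-bit mixer under test
    {
        x ^= x >> 33; x *= 0xff51afd7ed558ccdULL;
        x ^= x >> 33; x *= 0xc4ceb9fe1a85ec53ULL;
        x ^= x >> 33;
        return x;
    }

    static uint64_t rng = 88172645463325252ULL;  // xorshift64 to generate test inputs
    static uint64_t next(void)
    {
        rng ^= rng << 13; rng ^= rng >> 7; rng ^= rng << 17;
        return rng;
    }

    int main(void)
    {
        enum { TRIALS = 100000 };
        static long flips[64][64];               // [input bit][output bit]
        for (int t = 0; t < TRIALS; t++) {
            uint64_t x = next(), h = mix(x);
            for (int i = 0; i < 64; i++) {
                uint64_t d = h ^ mix(x ^ (1ULL << i));   // which output bits flipped?
                for (int j = 0; j < 64; j++)
                    flips[i][j] += (d >> j) & 1;
            }
        }
        double worst = 0.0;
        for (int i = 0; i < 64; i++)
            for (int j = 0; j < 64; j++) {
                double bias = fabs((double)flips[i][j] / TRIALS - 0.5);
                if (bias > worst) worst = bias;  // ideal avalanche flips each bit with p = 1/2
            }
        printf("worst per-bit bias: %.4f (0.0 is ideal)\n", worst);
        return 0;
    }
    Swapping in a candidate ARX round for mix() and comparing the worst bias is one crude way to rank rotation constants, though as noted above it doesn't separate state randomness from the strength of the construction itself.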
    21 replies | 815 view(s)
  • Gonzalo's Avatar
    Yesterday, 03:11
    Ok... Tried it :rolleyes:
    Segmentation fault (core dumped)
    And before that:
    Cygwin WARNING: Couldn't compute FAST_CWD pointer. This typically occurs if you're using an older Cygwin version on a newer Windows. Please update to the latest available Cygwin version from https://cygwin.com/. If the problem persists, please see https://cygwin.com/problems.html
    The opensuse version from the berelix downloads also crashed, but at least it could initialize to the point of showing the options. I was thinking about downloading an old liveCD with build-essentials from that time and trying to compile it there. Anyway, it seems the code itself isn't very mature, so I don't know if it's worth it.
    156 replies | 81292 view(s)
  • NohatCoder's Avatar
    16th October 2019, 23:09
    The ideal case for non-reversible mixing is a random mapping function from one state to another, i.e. for 128-bit states there exist (2^128)^(2^128) different (mostly non-reversible) mapping functions, of which (2^128)! are reversible. If you pick a random one of those functions, you would expect the state-space to deteriorate by 29 bits in a billion applications. This number can be found by iteration: if at iteration x, n out of p different states are possible, then at iteration x+1 the expected number of possible states is p*(1 - ((p-1)/p)^n). By experimentation, the bit loss turns out to be asymptotic to log2(iteration count)-1. But if your mapping function is not an ideal random pick, and it has some distinguisher (like being equivalent to a simple series of arithmetic operations), then the only thing we can really say theoretically is that it is unlikely to be better in this regard than a random pick amongst the functions. Your test functions stand out by entering loops faster than expected for a random function, and especially by always (?) entering period-2 loops.
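    A small numerical check of that recurrence (a sketch under the assumption p = 2^128, far too large to iterate directly, so it works with the fraction r = n/p; in that limit ((p-1)/p)^n is approximately exp(-n/p), giving r -> 1 - exp(-r), and -log2(r) is the expected bit loss; the 2^30 iterations take a little while to run):
    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double r = 1.0;                       // fraction of states still reachable; start with all of them
        for (unsigned long long x = 1; x <= (1ULL << 30); x++) {
            r = -expm1(-r);                   // r = 1 - exp(-r), the large-p limit of the recurrence
            if ((x & (x - 1)) == 0)           // report at powers of two
                printf("2^%-2d iterations: %5.2f bits lost\n",
                       (int)(log2((double)x) + 0.5), -log2(r));
        }
        return 0;                             // prints about 29 bits lost near 2^30 (~1G) iterations
    }
    The printed loss tracks log2(iteration count)-1, matching the ~29 bits quoted for a billion applications.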
    21 replies | 815 view(s)
  • Gonzalo's Avatar
    16th October 2019, 20:36
    Thank you Shelwien. I'll give it a try. Although what I really wanted was the program working as an archiver.
    156 replies | 81292 view(s)
  • easyaspi314's Avatar
    16th October 2019, 17:39
    XOR has fewer patterns than ADD (which is almost symmetrical), here is an 8x8 table visualized. Edit: Here's 64x64 to really show the pattern: Add is really pretty though.
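    The charts themselves didn't survive the page, but a minimal sketch (an assumed reconstruction, not the original script) that prints the two 8x8 tables being compared:
    #include <stdio.h>

    int main(void)
    {
        printf("a XOR b            (a + b) mod 8\n");
        for (int a = 0; a < 8; a++) {
            for (int b = 0; b < 8; b++) printf("%d ", a ^ b);        // XOR table
            printf("   ");
            for (int b = 0; b < 8; b++) printf("%d ", (a + b) & 7);  // ADD table (mod 8)
            printf("\n");
        }
        return 0;
    }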
    21 replies | 815 view(s)
  • easyaspi314's Avatar
    16th October 2019, 17:00
    True, but there are many drawbacks to writing in assembly:
    - It is platform-specific (and even subarch-specific)
    - It can be somewhat difficult to follow with multiple lanes
    - Trying to figure out the best way to inline/unroll is annoying
    - It can't be inlined
    - It can't be constexpr'd
    I usually prefer using tiny inline assembly hacks, which, while they mess up ordering a bit, can usually help a lot, especially this magic __asm__("" : "+r" (var)); which can break up patterns, murder SLP vectorization, move loads outside of loops, and more. It is like a temporary volatile. For example, on ARMv7-A, vzip.32 (a.k.a. vtrn.32) modifies in place: If we use this on the even and odd halves of the vector (on ARMv7, Q-forms are unions of 2 half-width D-forms), we can end up with our vmlal.u32 setup in one instruction by vzipping in place at the cost of clobbering data_key (which is ok): Edit: Oops, the arrows are pointing to the wrong lanes. They should point to a >> 32 and b & 0xFFFFFFFF. However, Clang and GCC can't comprehend an operation modifying 2 operands, and emit an extra vmov (a.k.a. vorr) to copy an operand. This line __asm__("vzip.32 %e0, %f0" : "+w" (data_key)); forces an in-place modification. I only write things in assembly when I want tiny code or when I am just messing around. It is just a pain otherwise.
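    A minimal sketch of that barrier trick (an assumed example, not from the post): the empty asm statement with a "+r" constraint makes the compiler treat the variable as modified by opaque code, so it cannot fold, vectorize, or hoist computations across that point.
    #include <stddef.h>
    #include <stdint.h>

    uint64_t sum_no_slp(const uint64_t *p, size_t n)
    {
        uint64_t acc = 0;
        for (size_t i = 0; i < n; i++) {
            uint64_t v = p[i];
            __asm__("" : "+r"(v));   // opaque to the optimizer: v is "modified", so patterns around it can't be merged
            acc += v;
        }
        return acc;
    }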
    21 replies | 815 view(s)
  • svpv's Avatar
    16th October 2019, 16:36
    I had some thoughts on non-reversible mixing. It must be bad if you only have a 64-bit state and you want a 64-bit result. Interleaving doesn't change this if the lanes are independent. But that's not what we have with ZrHa_update. Since the lanes are "cross-pollinated", the irreversible mixing applies to a 128-bit state. So this might be all good if you only want a 64-bit result. And you can't get a decent 128-bit result anyway, because a single 32x32->64 multiplication isn't enough for it.
    So what's "the mechanics" of non-reversible mixing? The internal state may "wear out" or "obliterate" gradually, but how and when does this happen? After being mixed, the state cannot assume certain values, about 1/e of all possible values. But as we feed the next portion of data, it seems that the state "somewhat recovers" in that it can assume more values. If the next portion is somewhat dissimilar to the mixed state, it is plausible to say that the state can assume any value again. (If we fed ASCII strings with XOR, this could not have flipped the high bit in each byte, but we feed with ADD.) Assuming that the state recovers, I can't immediately see how the non-reversible mix is worse than a reversible one. Can it be shown that the construction leaks, specifically more than 1 bit after a few iterations?
    Of course, the worst case is that the state assumes progressively fewer values on each iteration, which happens if you feed zeroes into it. We can see how this happens in the scaled-down constructions with 8x8->16 or 16x16->32 multiplications.
    #include <stdio.h>
    #include <inttypes.h>

    static inline uint32_t mix(uint32_t x)
    {
        uint8_t x0 = (x >> 0);
        uint8_t x1 = (x >> 8);
        uint8_t x2 = (x >> 16);
        uint8_t x3 = (x >> 24);
        uint16_t m0 = x0 * x1;
        uint16_t m1 = x2 * x3;
        uint16_t y0 = m0 + (x2 << 8 | x3);
        uint16_t y1 = m1 + (x0 << 8 | x1);
        return y0 << 16 | y1;
    }

    int main()
    {
        uint32_t x = 2654435761;
        while (1) {
            x = mix(x);
            printf("%08" PRIx32 "\n", x);
        }
        return 0;
    }
    The construction collapses in less than 2^10 iterations:
    $ ./a.out |head -$((1<<9)) |tail
    ecd22f4e
    e13e0fc7
    4a8afd8d
    15a3b5e1
    422aef14
    3cee1fc3
    05d9fae7
    ba9bec37
    ce6ea88a
    c95ee32c
    $ ./a.out |head -$((1<<10)) |tail
    2900a500
    002900a5
    2900a500
    002900a5
    2900a500
    002900a5
    2900a500
    002900a5
    2900a500
    002900a5
    If you change the combining step to XOR, the construction collapses in under 2^14 iterations:
    uint16_t y0 = m0 ^ (x2 << 8 | x3);
    uint16_t y1 = m1 ^ (x0 << 8 | x1);
    $ ./a.out |head -$((1<<13)) |tail
    572eb816
    2187191a
    85ab0b7e
    aeef26dc
    cf067e54
    2f9750a4
    a46fbfe9
    c273aea3
    1d08f488
    89bd881c
    $ ./a.out |head -$((1<<14)) |tail
    a400a400
    00a400a4
    a400a400
    00a400a4
    a400a400
    00a400a4
    a400a400
    00a400a4
    a400a400
    00a400a4
    Here's the 16x16->32 version, which collapses in under 2^25 and 2^30 iterations (with ADD resp. XOR; I'll spare you the outputs).
    #include <stdio.h>
    #include <inttypes.h>

    static inline uint64_t mix(uint64_t x)
    {
        uint16_t x0 = (x >> 0);
        uint16_t x1 = (x >> 16);
        uint16_t x2 = (x >> 32);
        uint16_t x3 = (x >> 48);
        uint32_t m0 = x0 * x1;
        uint32_t m1 = x2 * x3;
        uint32_t y0 = m0 + (x2 << 16 | x3);
        uint32_t y1 = m1 + (x0 << 16 | x1);
        return (uint64_t) y0 << 32 | y1;
    }

    int main()
    {
        uint64_t x = 6364136223846793005;
        while (1) {
            x = mix(x);
            printf("%016" PRIx64 "\n", x);
        }
        return 0;
    }
    By extrapolation, the construction with the 32x32->64 multiplication must collapse in about 2^60 iterations. Can it be shown that XOR as the combining step works better than ADD also in the average case rather than just in the worst case?
    @NohatCoder, how did you calculate that 22 bits must be leaked after 1G iterations? Was that the average-case or the worst-case analysis, or doesn't the distinction matter?
    21 replies | 815 view(s)
  • svpv's Avatar
    16th October 2019, 15:02
    It's no big deal to write an .S file in assembly if you can't cajole the compiler into emitting the right sequence of instructions.
    21 replies | 815 view(s)
  • maadjordan's Avatar
    16th October 2019, 10:55
    maadjordan replied to a thread 7-Zip in Data Compression
    but nevertheless there is a gain ;)
    555 replies | 289160 view(s)
  • Aniskin's Avatar
    16th October 2019, 09:52
    Aniskin replied to a thread 7-Zip in Data Compression
    because of
    555 replies | 289160 view(s)
  • t64's Avatar
    16th October 2019, 08:52
    t64 replied to a thread paq8px in Data Compression
    I have tested zpaq v7.15 with GTA IV & EFLC (31 GiB), with the following parameters: taskset -c 2,3 zpaq a compressed.zpaq folder -m5
    It ended with an 18.7 GiB file (after 14 hours and 45 minutes, on an i5-6200U), while FreeArc 0.67 produced a 19.2 GiB file (using the default Ultra settings).
    With zpaq -m5 I also compressed Postal 10th Anniversary Collectors Ed. Multi-platform (works on Windows, Mac and Linux) Repack (16.4 GiB) to only 4.1 GiB (in 5.7 hours), and that repack includes a 7.2 GiB .mdf image of the original game disc with old Postal 1 & Postal 2 versions, a 507.7 MiB .bin image file (the Music to Go Postal By CD), and 698.4 MiB of FLACs.
    Other people compressed Wasteland 2 from 20.2 GiB to only 2.8 GiB with lrzip -z (which uses zpaq) (https://forums.inxile-entertainment.com/viewtopic.php?p=148864#p148864).
    And I wanted to try paq8px and compare results to zpaq, because some people consider paq8px the best for producing the smallest files (https://www.reddit.com/r/compression/comments/8uy70j/is_freearc_or_kgb_better_at_compression/). I'm interested in getting the smallest files possible regardless of compression time, for backups of all kinds of data.
    Thanks for the suggestion, I will try UltraARC then (it has precomp and srep) and compare the results with paq8px, zpaq and uharc.
    This seems like a good alternative to password-protected archives from other software: compiling paq8px (so your binary is unique and only your binary can decode the archive) and using the '-e' option. Maybe I will :)
    1729 replies | 481892 view(s)
  • maadjordan's Avatar
    16th October 2019, 08:50
    maadjordan replied to a thread 7-Zip in Data Compression
    MFilter's handling of the "tar" file in your package does not show good gain, and then I noticed that many of these jpg files are damaged.
    555 replies | 289160 view(s)
  • Shelwien's Avatar
    16th October 2019, 06:54
    Shelwien replied to a thread paq8px in Data Compression
    Trying to repack games with any paq version is a bad idea. Not only would it take a lot of time and resources (note that decoding time is the same as encoding), it also won't let you estimate the best possible compression, since paq doesn't have any special handling for large files and compressed formats. I'd suggest using precomp, xtool and srep first. If you're really interested, you can apply paq8px to their output.
    > What is the difference between paq8px and paq8pxd? I know they are different projects but I don't know what are the actual differences
    Read the first posts here:
    https://encode.su/threads/342-paq8px
    https://encode.su/threads/1464-Paq8pxd-dict
    These are the original developers of each branch and there's some description of the initial differences. At this point though it's hard to list differences, since parts were added, removed, exchanged etc. You can do a benchmark of some small (~1Mb) files of various types and tell us.
    > Could you explain what does option 'e = Pre-train x86/x64 model' do?
    Afaik it uses the compressor exe itself to train the predictor. Thus a different version would be unable to decode the archive.
    > Would using this option improve compression ratio with the files from the attached filelist?
    It would help with exe/dll and .dylib. Oggs are already compressed and paq8px doesn't recompress them - you can try oggre or cdm instead.
    1729 replies | 481892 view(s)
  • t64's Avatar
    16th October 2019, 05:46
    t64 replied to a thread paq8px in Data Compression
    The problem was the gcc version, mine is 6.3.0, thanks for investigating the issue :) Used a VM with MX-Linux 19 beta 3 (with gcc 8.3.0) for compiling the binary and had no problems I'm currently compressing the files of two games, POSTAL 1 and 2 (1.8 GiB): taskset -c 2,3 ./paq8px -9b @FILELIST -v -log "POSTAL1&2.paq8px182fix1.log" Will later post about the results P.D.: What is the difference between paq8px and paq8pxd? I know they are different projects but I don't know what are the actual differences P.D.2: Could you explain what does option 'e = Pre-train x86/x64 model' do? Would using this option improve compression ratio with the files from the attached filelist?
    1729 replies | 481892 view(s)
  • Shelwien's Avatar
    16th October 2019, 03:12
    With some hacks I was able to build it on msys2 (windows): http://nishi.dreamhosters.com/u/pcompress_v1.7z
    It doesn't want to work with the filesystem (it creates an empty archive), but it seems to work in stream mode. Something like this:
    cat * | pcompress -c adapt2 -l2 -t1 -p >..\test
    pcompress.exe -d -p <../test >unp
    156 replies | 81292 view(s)
  • Gonzalo's Avatar
    15th October 2019, 23:55
    Has anybody been able to compile this recently? I tried in both Manjaro and Ubuntu, but it keeps throwing me errors. I had to downgrade openssl in Ubuntu, use a modified version of pcompress in Manjaro, and I should downgrade binutils too to make the linker work, but I don't want to go there yet. I also tried to find an old copy in my backups, and I didn't have one. It seems that the author sadly abandoned the project. It was a very promising program. So... has anybody had any luck and wants to share their binary? Preferably with WavPack and bsc included. Thanks!!
    156 replies | 81292 view(s)
  • Gotty's Avatar
    15th October 2019, 23:15
    Gotty replied to a thread paq8px in Data Compression
    Lubuntu 19.04 64 bit: paq8px_v182fix1 compiled successfully. Your command for compiling looks identical to mine except for -fprofile-use (which I don't have). But it works, too (I have just tried). It's strange that compiling is successful in your environment but linking is not. You have linker errors complaining about a couple of static const/constexpr arrays. Could you verify the source file? $ md5sum paq8px.cpp 1f7e2ee9eb3a8bba679a101db4aff46b paq8px.cpp What is your gcc version? Mine is: $ gcc --version gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0 Edit: it could be that you may have an older gcc. It looks like those static constexpr arrays would work only in newer compilers. Could you try upgrading your gcc package and see if it works?
    1729 replies | 481892 view(s)
  • Shelwien's Avatar
    15th October 2019, 22:58
    Shelwien replied to a thread 7-Zip in Data Compression
    made an mfilter demo because apparently some people can't RTFM. http://nishi.dreamhosters.com/u/mfilter_demo_20191013.rar
    555 replies | 289160 view(s)
  • Jarek's Avatar
    15th October 2019, 09:40
    I have added some new methods for adaptiveness to the arxiv, generally for non-stationary time series.
    In data compression, adaptiveness is used for discrete probability distributions: CDF += (mixCDF - CDF) >> rate
    But what about adaptiveness for continuous distributions, like the parameters of a Laplace distribution for residues? I am developing methodology for this difficult general problem. A natural approach is replacing standard "static" estimation of fixed parameters for the entire dataset with adaptive estimation: a separate parameter estimate for every moment, using only its past information and weakening the weights of old values.
    For example, instead of the standard likelihood, we can optimize a weighted likelihood - for time T find the parameters of rho maximizing:
    likelihood_T = \sum_{t<T} eta^{T-t} log(rho(x_t))
    for some eta in (0,1), i.e. an exponential moving average. For estimating the width b of a Laplace distribution centered at 0, e.g. to encode residues in data compression, it leads to the recurrence:
    b_T = eta b_{T-1} + (1-eta) |x|
    or, like for the CDF above:
    b += (abs(x) - b) >> rate
    so it is really cheap (also for the exponential power distribution) - it might be worth considering for audiovisual data compression; a small integer sketch is given below.
    I also have two other tools for such adaptiveness for continuous distributions:
    - we can do adaptive least-squares linear regression (II.D in this arxiv) - with evolving coefficients optimizing such an exponential-moving-average MSE,
    - for more complex distributions we can model them with polynomials and analogously MSE-adapt their coefficients over time ( https://arxiv.org/pdf/1807.04119 ).
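    A minimal integer sketch of that width update (an assumed illustration, not code from the paper; the fixed-point scaling is an arbitrary choice): keep b as an exponential moving average of |x|, mirroring the CDF-style update.
    #include <stdint.h>
    #include <stdlib.h>

    // Adaptive scale b of a zero-centred Laplace distribution for residues,
    // kept with 4 fractional bits; eta = 1 - 2^-rate.
    static int32_t b = 256 << 4;                 // initial guess for the scale

    static void update_scale(int32_t residue, int rate)
    {
        int32_t ax = abs(residue) << 4;          // |x| in the same fixed-point format
        b += (ax - b) >> rate;                   // b_T = eta*b_{T-1} + (1-eta)*|x|
    }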
    29 replies | 1560 view(s)
  • t64's Avatar
    15th October 2019, 03:02
    t64 replied to a thread paq8px in Data Compression
    Hello Gotty The 31 GiB are multiple files :) I'm particularly interested in doing compression tests with videogames, have several games under 2 GB, so if I manage to compile paq8px under Linux (or get the 64 bit Linux binary) I'd be glad to do tests with paq8px, even if is a slow process (I just want to see how much I can compress these games, in order to reduce disk space needed for backups) Best regards
    1729 replies | 481892 view(s)
  • Gonzalo's Avatar
    14th October 2019, 23:47
    Gonzalo replied to a thread repack .psarc in Data Compression
    You can always use a de-duplicator before 7z or RAR, like srep, or freearc -m0=rep. If you have enough memory, I believe this last method to be better. FA also lets you sort the files in different ways to put the similar ones closer. Deduplication radically improves the overall speed and almost always improves the ratio, sometimes greatly, especially in big archives. OTOH, you can replace 7z with FA altogether. There is another project that seems great for this, but I haven't tried it yet: https://github.com/moinakg/pcompress In my personal case, I found the rep+fastlzma2 combination to be a perfect match for my needs. It usually gives me the same or better ratio than pure 7z but is at least 2x faster, sometimes up to 20x faster.
    4 replies | 367 view(s)
  • Gotty's Avatar
    14th October 2019, 22:25
    Gotty replied to a thread paq8px in Data Compression
    Hello t64, Is the 31 GiB one file or multiple files? Currently paq8px does not support files over 2 GB. Also paq8px is quite slow: compressing that amount would require like 2-5 days (depending on your cpu and memory speed, and other programs running simultaneously). Anyway I'll try to figure out what's wrong with compiling it (it should work.), and come back to you soon.
    1729 replies | 481892 view(s)
  • t64's Avatar
    14th October 2019, 21:46
    t64 replied to a thread paq8px in Data Compression
    Hello, I'm doing tests compressing GTA IV & Episodes From Liberty City (31 GiB in total). Tried zpaq with -m5 and the games were compressed to 18.7 GiB. Now I want to see if I can achieve better compression with paq8px, but I'm failing to compile it under Linux (MX-Linux 18.3, 64 bit):
    $ g++ -s -fno-rtti -fwhole-program -static -std=gnu++1z -O3 -m64 -march=native -Wall -floop-strip-mine -funroll-loops -ftree-vectorize -fgcse-sm -fprofile-use paq8px.cpp -opaq8px.exe -lz
    /tmp/ccieukZA.o: In function `StateMap::StateMap(Shared const*, int, int, int, bool)':
    paq8px.cpp:(.text+0x4539): undefined reference to `StateTable::State_table'
    /tmp/ccieukZA.o: In function `ContextMap::mix(Mixer&)':
    paq8px.cpp:(.text+0xe1cd): undefined reference to `StateTable::State_table'
    /tmp/ccieukZA.o: In function `IndirectMap::update()':
    paq8px.cpp:(.text+0x1a94e): undefined reference to `StateTable::State_table'
    /tmp/ccieukZA.o: In function `FrenchStemmer::Stem(Word*)':
    paq8px.cpp:(.text+0x1c7cd): undefined reference to `FrenchStemmer::TypesExceptions'
    /tmp/ccieukZA.o: In function `ContextMap2::update()':
    paq8px.cpp:(.text+0x26da9): undefined reference to `StateTable::State_table'
    /tmp/ccieukZA.o: In function `ContextMap2::mix(Mixer&)':
    paq8px.cpp:(.text+0x27904): undefined reference to `StateTable::State_table'
    paq8px.cpp:(.text+0x27997): undefined reference to `StateTable::State_group'
    /tmp/ccieukZA.o: In function `ContextMap::update()':
    paq8px.cpp:(.text+0x3246d): undefined reference to `StateTable::State_table'
    /tmp/ccieukZA.o: In function `dmcModel::st()':
    paq8px.cpp:(.text+0x41e88): undefined reference to `StateTable::State_table'
    paq8px.cpp:(.text+0x421c6): undefined reference to `StateTable::State_table'
    /tmp/ccieukZA.o: In function `Image4bitModel::mix(Mixer&)':
    paq8px.cpp:(.text+0x42a20): undefined reference to `StateTable::State_table'
    /tmp/ccieukZA.o: In function `ExeModel::mix(Mixer&)':
    paq8px.cpp:(.text+0x675cd): undefined reference to `ExeModel::InvalidX64Ops'
    paq8px.cpp:(.text+0x675d4): undefined reference to `ExeModel::InvalidX64Ops'
    paq8px.cpp:(.text+0x675df): undefined reference to `ExeModel::InvalidX64Ops'
    paq8px.cpp:(.text+0x67714): undefined reference to `ExeModel::Table1'
    paq8px.cpp:(.text+0x6771b): undefined reference to `ExeModel::TypeOp1'
    paq8px.cpp:(.text+0x67912): undefined reference to `ExeModel::Table3_3A'
    paq8px.cpp:(.text+0x67919): undefined reference to `ExeModel::TypeOp3_3A'
    paq8px.cpp:(.text+0x6871f): undefined reference to `ExeModel::TypeOp1'
    paq8px.cpp:(.text+0x688db): undefined reference to `ExeModel::TableX'
    paq8px.cpp:(.text+0x688e8): undefined reference to `ExeModel::TypeOpX'
    paq8px.cpp:(.text+0x68b51): undefined reference to `ExeModel::Table2'
    paq8px.cpp:(.text+0x68b58): undefined reference to `ExeModel::TypeOp2'
    paq8px.cpp:(.text+0x68c4e): undefined reference to `ExeModel::Table3_38'
    paq8px.cpp:(.text+0x68c55): undefined reference to `ExeModel::TypeOp3_38'
    paq8px.cpp:(.text+0x68c73): undefined reference to `ExeModel::Table1'
    paq8px.cpp:(.text+0x68c7a): undefined reference to `ExeModel::TypeOp1'
    paq8px.cpp:(.text+0x68e7f): undefined reference to `ExeModel::TypeOp1'
    /tmp/ccieukZA.o: In function `ContextModel::p()':
    paq8px.cpp:(.text+0x6bf84): undefined reference to `StateTable::State_table'
    paq8px.cpp:(.text+0x6ecfc): undefined reference to `DmcForest::dmcparams'
    paq8px.cpp:(.text+0x6ed0a): undefined reference to `DmcForest::dmcmem'
    paq8px.cpp:(.text+0x6f1df): undefined reference to `StateTable::State_table'
    /tmp/ccieukZA.o: In function `Predictor::trainText(char const*, int)':
    paq8px.cpp:(.text+0x7ec3d): undefined reference to `StateTable::State_table'
    paq8px.cpp:(.text+0x7ec80): undefined reference to `StateTable::State_table'
    paq8px.cpp:(.text+0x7ed0e): undefined reference to `StateTable::State_group'
    paq8px.cpp:(.text+0x7fcf8): undefined reference to `StateTable::State_table'
    paq8px.cpp:(.text+0x7fcff): undefined reference to `StateTable::State_table'
    paq8px.cpp:(.text+0x80ee5): undefined reference to `StateTable::State_table'
    paq8px.cpp:(.text+0x80f24): undefined reference to `StateTable::State_table'
    paq8px.cpp:(.text+0x80fba): undefined reference to `StateTable::State_group'
    paq8px.cpp:(.text+0x81b67): undefined reference to `StateTable::State_table'
    paq8px.cpp:(.text+0x81b85): undefined reference to `StateTable::State_table'
    /tmp/ccieukZA.o: In function `EnglishStemmer::Stem(Word*)':
    paq8px.cpp:(.text+0x82b6b): undefined reference to `EnglishStemmer::TypesExceptions1'
    paq8px.cpp:(.text+0x841b5): undefined reference to `EnglishStemmer::TypesStep4'
    paq8px.cpp:(.text+0x84a90): undefined reference to `EnglishStemmer::TypesExceptions2'
    paq8px.cpp:(.text+0x85601): undefined reference to `EnglishStemmer::TypesStep3'
    paq8px.cpp:(.text+0x8574a): undefined reference to `EnglishStemmer::TypesStep1b'
    paq8px.cpp:(.text+0x8657e): undefined reference to `EnglishStemmer::TypesStep4'
    collect2: error: ld returned 1 exit status
    Can someone provide a Linux binary for paq8px_v182fix1? Assuming that is the latest version.
    1729 replies | 481892 view(s)
  • Krishty's Avatar
    14th October 2019, 20:12
    Hah! I’ll see what I can do :D
    38 replies | 2127 view(s)
  • JamesWasil's Avatar
    14th October 2019, 19:59
    Papa's Optimizer: Better Ingredients. Better Refinements. Papa Op's. Once the optimizations can be delivered under 30 minutes with a Papa Tracker, I would tip for that. :)
    38 replies | 2127 view(s)
  • schnaader's Avatar
    14th October 2019, 18:35
    Sorry, "streaming support" was an unclear term, here. I meant the capability to detect JPG streams inside other files (so, "embedded support" would be a better term), e.g. JPG in RAW files, game containers or .tar files. Or, as in the case above, a file processed by brunsli-nobrotli that has the main image data compressed, but two JPG streams for the thumbnails left that are "embedded" in the .brn-nobrot file (part of the copied metadata).
    12 replies | 1118 view(s)
  • Krishty's Avatar
    14th October 2019, 14:18
    Updated version with drag’n’drop support and optional foreground priority: https://papas-best.com/downloads/optimizer/stable/x64/Best%20Optimizer.7z
    This was a major UI overhaul, so regard this as a beta release.
    - major UI overhaul (now supporting drag’n’drop)
    - moved options to new General tab
    - fixed 7-Zip settings
    - fixed crashes with analysis
    - fixed a typo
    - fixed tab flickering
    38 replies | 2127 view(s)
  • pklat's Avatar
    14th October 2019, 11:31
    pklat replied to a thread repack .psarc in Data Compression
    I don't understand the question. .psarc is a PS3 archive; it is already compressed, and the point here is the same as in Precomp: unpack the .psarc, keep the metadata, and repack it with better compression and a larger dictionary, so that later you can recreate an identical .psarc. The difference to Precomp is that this is done at the 'file level', so you can rearrange files (-mqs) to gain better compression. There can be significant gains (like 30%), but most data in PS3 games is videos, etc. If you have the PC version of the same game, hopefully some data files like textures would be identical, if not similar, so you could gain more by putting it all in a giant solid .7z. I've been planning to do it with .cab and similar. Someone else here already did it, but iirc hasn't released the source code.
    4 replies | 367 view(s)
  • Raphael Canut's Avatar
    14th October 2019, 10:20
    Yes, to interest companies I must adapt my codec to any image size, but I am postponing this work until I resume work on the NHW Project, as I think I will now start this training (unfortunately there are no image/video compression job positions in my area). You can always contact me if you are interested in the NHW Project. Cheers, Raphael
    186 replies | 17120 view(s)
  • Jyrki Alakuijala's Avatar
    14th October 2019, 10:12
    Brunsli the lib, no. Brunsli the file format, yes. Brunsli has a fixed two-way progressive layout: the 8x8-subsampled whole image first, followed by the sequential AC.
    12 replies | 1118 view(s)
  • telengard's Avatar
    14th October 2019, 04:57
    telengard replied to a thread LZ98 in Data Compression
    I managed to get my hands on a HW debugger and dumped RAM where the main program is loaded after decompression, every single byte is exactly the same. :) There were some differences WAY at the end of the section, but that seems to be some kind of global data which had been updated while running for a few seconds. I hope to test out the compression code soon with some changes I will be testing. thanks again for your help!
    30 replies | 1717 view(s)
  • Shelwien's Avatar
    14th October 2019, 02:51
    You can try contacting game studios, especially small ones, or try attaching your codec to some open-source software like XBMC/Plex. Games commonly work with GPU-friendly image formats, so your codec would likely require extra conversion layer. Being limited to 512x512 is a large hurdle for any practical use too. Also a lossy image codec is nothing rare, so its hard to get attention unless you can beat everything else at something like this: https://stackoverflow.com/questions/891643/twitter-image-encoding-challenge Maybe consider switching to a lossless image codec? Its easier to find practical applications for that.
    186 replies | 17120 view(s)
  • Raphael Canut's Avatar
    13th October 2019, 21:36
    Hello all,
    Just a quick new email to give a few updates... As I told you, I was due to start a Machine Learning training at University in September, but the National Employment Agency finally refused to finance me... Now they want me to start a training as a Java fullstack developer in November, but I am not totally enthusiastic about it (personal taste)...
    So I am now reconsidering the NHW Project image/video compression codec... I haven't worked on it since February this year, but I have made some visual comparisons with AOM AV1 (AVIF) and HEVC, and as I told you, I visually prefer the NHW Project to AVIF and HEVC from high quality to high compression (up to the -l11 quality setting) because it has more neatness, and for me it is more pleasant. The NHW Project is also a lot faster to encode/decode and royalty-free. For very high and extreme compression (below 0.4bpp), it's true that AVIF and HEVC are better (and very impressive)...
    Just a reminder if you want to test the NHW Project: its new entropy coding schemes are not totally optimal for now, and we can save 2.5KB per .nhw compressed file on average. I will have to re-work on it...
    As I also told you, some months ago I contacted JPEG, MPEG and the Alliance for Open Media, and they confirmed that they were not interested in the NHW Project. So I don't think the NHW Project will find a large application in the industry, but would some of you be interested in developing the NHW Project, maybe as a niche market?
    Sorry to "spam" this forum about my job search, but again, if you and your company are interested in developing the state-of-the-art NHW Project image/video compression codec, do not hesitate to contact me. Any thoughts on it are also very welcome. Is my objective hopeless?
    Cheers, Raphael
    186 replies | 17120 view(s)
  • CompressMaster's Avatar
    13th October 2019, 16:26
    @pklat, is it possible to compress the results further, or are they already compressed?
    4 replies | 367 view(s)
  • jethro's Avatar
    13th October 2019, 14:36
    jethro replied to a thread Zstandard in Data Compression
    https://github.com/facebook/zstd/issues/1817
    339 replies | 113311 view(s)
  • maadjordan's Avatar
    13th October 2019, 11:45
    As always, schnaader, you keep amazing me.
    12 replies | 1118 view(s)
  • schnaader's Avatar
    13th October 2019, 11:04
    Nice trick! Can also be confirmed using SREP:
    MFilter7z.64.dll.srep    1,200,440  // srep MFilter7z.64.dll
    jcaron.jpg.srep            536,446  // srep jcaron.jpg
    MFilter_jcaron.dat.srep  1,215,822  // copy /b MFilter7z.64.dll + jcaron.jpg MFilter_jcaron.dat
    After searching a bit, I found a site from Adobe with download links for their typical ICC profiles and together with a string from jcaron.jpg ("U.S. Web Coated (SWOP) v2"), the specific profile can be found:
    USWebCoatedSWOP.icc        557,168
    USWebCoatedSWOP.icc.srep   531,192
    USWeb_jcaron.dat.srep      541,982
    By the way, the mentioned thumbnail recompression would also be possible with the modified brunsli version:
    cover.jpg                    201,988  // file with 2 thumbnails
    cover.jpg.brn                152,636  // unmodified brunsli treats thumbnails as metadata...
    cover.jpg.brn.pcf            152,665  // ...so precomp finds nothing afterwards
    cover.jpg.brn-nobrot         163,750  // modified brunsli-nobrotli
    cover.jpg.brn-nobrot.pcf_cn  162,093  // precomp -cn => 2/2 JPG streams (the thumbnails)
    cover.jpg.brn-nobrot.pcf     152,653  // but it doesn't really help compared to unmodified brunsli on this file
    The thumbnail streams are small compared to the whole file (5,157 bytes each) and completely identical, so the second one gets deduplicated by unmodified brunsli as well as by lzma2 on the modified brunsli. Unfortunately, brunsli has no streaming support (processes whole JPEGs only), so you can't apply it a second time to recompress thumbnails. Another modified version would be needed that detects and processes thumbnails in the metadata.
    12 replies | 1118 view(s)
  • maadjordan's Avatar
    13th October 2019, 09:22
    Item #3 is like what ECM did with dumped CDs, which could explain why the MFilter size is large. This should be a benefit when compressing a few jpg files, but lose its gain when compressing tens of jpg files.
    12 replies | 1118 view(s)
  • pklat's Avatar
    12th October 2019, 20:12
    pklat replied to a thread repack .psarc in Data Compression
    guess someone already did it all: https://aluigi.altervista.org/quickbms.htm oh, well.
    4 replies | 367 view(s)
  • Aniskin's Avatar
    12th October 2019, 19:17
    MFilter uses the following additional optimizations for jcaron.jpg:
    1) Compresses jpeg thumbnail in Exif segment
    2) Compresses jpeg thumbnail in Photoshop segment
    3) Deletes "well known" ICC profile.
    MFilter knows several well known ICC profiles and can delete and restore such ICC profiles on the fly.
    12 replies | 1118 view(s)
  • Shelwien's Avatar
    12th October 2019, 14:39
    mfilter can recompress jpeg thumbnails, maybe because of that?
    12 replies | 1118 view(s)
  • WinnieW's Avatar
    12th October 2019, 13:33
    WinnieW replied to a thread Zstandard in Data Compression
    I can confirm there is no problem. I compressed a file of 38 Gbyte of size using the official 64 Bit Windows command line binary. Verified the file integrity using SHA1 checksums. Original file and decompressed file were bit identical.
    339 replies | 113311 view(s)
  • maadjordan's Avatar
    12th October 2019, 09:52
    maadjordan replied to a thread 7-Zip in Data Compression
    Great news. I tested it on the MHT samples at https://www.fileformat.info/format/mime-html/sample/index.htm and there was a gain on yahoo.mht & microsoft.mht. Also tested on XML files like https://github.com/schnaader/precomp-cpp/files/381607/Acacia_High.zip (after unpacking, as it's gzipped) and it worked.
    555 replies | 289160 view(s)
  • maadjordan's Avatar
    12th October 2019, 09:24
    I tried 7-zip with mfilter and the result is much less (2,667 bytes)
    12 replies | 1118 view(s)
  • Aniskin's Avatar
    12th October 2019, 02:22
    Aniskin replied to a thread 7-Zip in Data Compression
    MFilter: support of data in base64 format added.
    555 replies | 289160 view(s)
  • Shelwien's Avatar
    11th October 2019, 17:28
    Shelwien replied to a thread Zstandard in Data Compression
    No filesize limit normally. Zstd API (zstd.h) doesn't even work with files, but rather streams of unknown length. A custom file format which uses zstd for compression can easily have such limits though.
    339 replies | 113311 view(s)
  • WinnieW's Avatar
    11th October 2019, 16:44
    WinnieW replied to a thread Zstandard in Data Compression
    I have got a question: Is Zstandard capable of compressing very large files, e.g. files of 40 Gbytes each, or is there a file size limit?
    339 replies | 113311 view(s)
  • easyaspi314's Avatar
    10th October 2019, 20:16
    It's a bit faster, but no massive speedups like SSE2 gets.
    ./xxhsum 0.7.2 (64-bits aarch64 little endian), Clang 8.0.1 (tags/RELEASE_801/final), by Yann Collet
    Sample of 100 KB...
    XXH32                    : 102400 -> 27899 it/s ( 2724.5 MB/s)
    XXH32 unaligned          : 102400 -> 22728 it/s ( 2219.5 MB/s)
    XXH64                    : 102400 -> 31618 it/s ( 3087.7 MB/s)
    XXH64 unaligned          : 102400 -> 31660 it/s ( 3091.8 MB/s)
    XXH3_64b                 : 102400 -> 61503 it/s ( 6006.2 MB/s)
    XXH3_64b unaligned       : 102400 -> 56964 it/s ( 5562.9 MB/s)
    XXH3_64b seeded          : 102400 -> 61156 it/s ( 5972.3 MB/s)
    XXH3_64b seeded unaligne : 102400 -> 56355 it/s ( 5503.4 MB/s)
    XXH128                   : 102400 -> 51273 it/s ( 5007.1 MB/s)
    XXH128 unaligned         : 102400 -> 49356 it/s ( 4819.9 MB/s)
    XXH128 seeded            : 102400 -> 51421 it/s ( 5021.6 MB/s)
    XXH128 seeded unaligned  : 102400 -> 50749 it/s ( 4955.9 MB/s)
    ZrHa64                   : 102400 -> 50611 it/s ( 4942.4 MB/s)
    ZrHa64 unaligned         : 102400 -> 41221 it/s ( 4025.4 MB/s)
    ZrHa64 (NEON)            : 102400 -> 67734 it/s ( 6614.7 MB/s)
    ZrHa64 (NEON) unaligned  : 102400 -> 60503 it/s ( 5908.5 MB/s)
    The speed difference makes sense though, it is only 2 or 3 cycles faster. However, the main problem is that Clang is emitting vaddhn, slowing down the code. (Darn it, instcombine) GCC 9.2.0 obeys my code, though:
    ZrHa64 (NEON)            : 102400 -> 70132 it/s ( 6848.8 MB/s)
    ZrHa64 (NEON) unaligned  : 102400 -> 62468 it/s ( 6100.4 MB/s)
    However, it could definitely be improved, as there are multiple ways to end up with the shuffle I need.
    21 replies | 815 view(s)
  • 78372's Avatar
    10th October 2019, 17:20
    78372 replied to a thread video recompression in Data Compression
    If you say visibly lossless, then you can use ffmpeg with -crf 18.
    13 replies | 553 view(s)
  • Cyan's Avatar
    10th October 2019, 16:12
    If I understand the produced assembly as being a comparison between the current XXH3 kernel and the proposed ZrHa one, it does not look much worse. In particular, the number of instructions seems to be the same in both cases. Now the order and complexity of these instructions differ, but is the difference that large?
    The way I see it, the change:
    - removes a 'load' and a 'xor' from getting rid of the secret
    - adds 2 instructions `vrev` + `vext` to emulate the _MM_SHUFFLE(0, 1, 2, 3)
    The `xor` is likely trivial, and the `load` is likely fetching hot data from the L1 cache. But even L1 cache fetching costs a few cycles. I would expect such a cost to be comparable to, or even slightly higher than, the announced 3 cycles for `vrev` and `vext`.
    21 replies | 815 view(s)
  • schnaader's Avatar
    10th October 2019, 14:31
    I just created a branch of brunsli on GitHub that removes the brotli dependency. Apart from the smaller binaries, this has the advantage of not compressing metadata (the only thing brotli is used for in brunsli) which is useful for using brunsli together with other compressors or for deduplication of metadata between JPEGs.
    Here are results for a strange file I found in an eBook that has not much image data (10x20 pixels) but lots of metadata (some ICC color profile):
                                 size (bytes)   encoding time (ms)   notes
    original                     566,486
    Precomp 0.4.7 -cn            486,517        2000                 here we can see that packJPG compresses metadata, too
    Precomp -t+                  298,250        400                  not using packJPG helps for this file
    brunsli                      363,960        200                  better than packJPG in both speed and size...
    brunsli + Precomp            364,073        300                  ...but nothing left to optimize
    brunsli-nobrotli             566,165        60                   only image data is compressed, so not much difference
    brunsli-nobrotli + Precomp   298,378        500                  metadata compressed by lzma2
    Attached are Visual Studio 2019 64 bit binaries (note: if you need best performance, try to compile with clang) and the mentioned tested JPG file. Renamed the binaries to "Xbrunsli-nobrotli.exe" to reduce confusion and conflicts with existing brunsli binaries.
    12 replies | 1118 view(s)
  • easyaspi314's Avatar
    10th October 2019, 06:36
    In order to do the ZrHa_update routine on NEON, we would need to have, at the same time, a DCBA, AC, and a BD permutation (or CA/DB). I'm thinking something like this (the bracketed memory operands were lost from the post; they are reconstructed here assuming [r0]/[x0]/[rdi] = state, [r1]/[x1]/[rsi] = input, [r2]/[x2] = secret):
    ZrHa_update_neon32:
        vld1.64 {d0, d1}, [r0]
        vld1.8  {d2, d3}, [r1]
        vadd.i64 q0, q0, q1
        vrev64.32 q1, q0
        vtrn.32 d0, d1
        vswp d2, d3
        vmlal.u32 q1, d0, d1
        vst1.64 {d1, d2}, [r0]
        bx lr

    ZrHa_update_neon64:
        ld1 q0, [x0]
        ld1 q1, [x1]
        add v0.2d, v0.2d, v1.2d
        xtn v1.2s, v0.2d
        shrn v2.2s, v0.2d, #32
        rev64 v0.4s, v0.4s
        ext v0.16b, v0.16b, v0.16b, #8
        umlal v0.2d, v1.2s, v2.2s
        st1 q0, [x0]
        ret
    Not as clean as SSE2 (Wow, never thought I'd say that one!)
    ZrHa_update_sse2:   // x86_64, sysv
        movdqu  xmm0, xmmword ptr [rdi]
        paddq   xmm0, xmmword ptr [rsi]
        pshufd  xmm1, xmm0, _MM_SHUFFLE(2, 3, 0, 1)
        pshufd  xmm2, xmm0, _MM_SHUFFLE(0, 1, 2, 3)
        pmuludq xmm0, xmm1
        paddq   xmm0, xmm2
        movdqa  xmmword ptr [rdi], xmm0
        ret
    "XXH3_64b_round":
    XXH3_64b_round_neon32:
        vld1.64 {d0, d1}, [r0]
        vld1.8  {d2, d3}, [r1]
        vadd.i64 q0, q0, q1
        vld1.8  {d4, d5}, [r2]
        veor q2, q2, q1
        vtrn.32 d4, d5
        vmlal.u32 q0, d4, d5
        vst1.64 {d0, d1}, [r0]
        bx lr

    XXH3_64b_round_neon64:
        ld1 q0, [x0]
        ld1 q1, [x1]
        add v0.2d, v0.2d, v1.2d
        ld1 q2, [x2]
        eor v2.16b, v2.16b, v1.16b
        xtn v1.2s, v2.2d
        shrn v2.2s, v2.2d, #32
        umlal v0.2d, v1.2s, v2.2s
        st1 q0, [x0]
        ret
    21 replies | 815 view(s)
  • easyaspi314's Avatar
    9th October 2019, 21:29
    That shuffle is pretty ugly for NEON. The main cheap shuffles NEON has are shown below, all taking roughly 3 cycles a piece (except the first one):
    Edit: The other problem is that vmull.u32 requires an entirely different setup than pmuludq. In scalar terms:
    uint64_t pmuludq(uint64_t a, uint64_t b)
    {
        return (a & 0xFFFFFFFF) * (b & 0xFFFFFFFF);
    }
    uint64_t vmull_u32(uint32_t a, uint32_t b)
    {
        return (uint64_t)a * (uint64_t)b;
    }
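    A small intrinsics sketch of that setup difference (an assumed illustration, not from the post): _mm_mul_epu32 takes the low 32 bits of each 64-bit lane in place, while vmull_u32 wants two packed 32-bit vectors, so the 64-bit lanes have to be narrowed first.
    #if defined(__SSE2__)
    #include <emmintrin.h>
    __m128i mul_lo32(__m128i x, __m128i y)
    {
        return _mm_mul_epu32(x, y);          // lane i: (uint32)x[i] * (uint32)y[i], no setup needed
    }
    #elif defined(__ARM_NEON)
    #include <arm_neon.h>
    uint64x2_t mul_lo32(uint64x2_t x, uint64x2_t y)
    {
        uint32x2_t xl = vmovn_u64(x);        // xtn: keep the low 32 bits of each 64-bit lane
        uint32x2_t yl = vmovn_u64(y);
        return vmull_u32(xl, yl);            // widening multiply back to 64-bit lanes
    }
    #endif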
    21 replies | 815 view(s)
  • Shelwien's Avatar
    9th October 2019, 12:54
    So it seems: https://patents.google.com/patent/US6199064B1/en But there were quite a few implementations despite that. Also I don't think that you can patent reading one buffer and writing to many.
    6 replies | 475 view(s)
  • MegaByte's Avatar
    9th October 2019, 11:35
    So the patent just expired.
    6 replies | 475 view(s)
  • Shelwien's Avatar
    8th October 2019, 23:32
    I think its this: http://www.compressconsult.com/st/ You can compare to bsc ST modes: http://libbsc.com/
    6 replies | 475 view(s)
  • CompressMaster's Avatar
    8th October 2019, 18:42
    Sorry for the late reply. I was forced to be offline... A new NNCP based on the original version - this is called an upgrade/update. As I wrote earlier, it's all about NNCP. To improve readability, it's better to post versions within CODE tags in the first post - otherwise, the first post could get veeeery long. The thread starter - @pothos2 - should update his first post with links to all versions.
    98 replies | 10137 view(s)
  • CompressMaster's Avatar
    8th October 2019, 18:26
    My quick test on enwik8:
    Total input size: 100000000 bytes
    Total output size: 28054812 bytes
    It took 20.44 seconds. Great! Decompression verified OK - 100% match.
    What does "switch" mean here? At least in CMD there aren't any examples of use...
    6 replies | 475 view(s)
  • Gonzalo's Avatar
    8th October 2019, 18:13
    Gonzalo replied to a thread Stcompression 0.5 in Data Compression
    A 1/69 ratio... "Suspicious.low.ml.score", which means Trapmine's machine learning detector didn't quite like it, but it won't say why. 'Because'. Basically a false positive.
    -------------------------------
    BTW: Does anyone know what kind of compressor this is? Could it be a form of this algorithm? Or this one?
    6 replies | 475 view(s)
  • telengard's Avatar
    8th October 2019, 17:51
    telengard replied to a thread LZ98 in Data Compression
    I did that and it all checks out; I also verified I see the exact same thing in IDA. Ok, I'll have to figure out how to do that. Before coming to this forum, I was fiddling with it and had this:
    endian big
    comtype lzss0
    get ZSIZE long
    get SIZE long
    savepos OFFSET
    get NAME filename
    string NAME + ".unpacked"
    clog NAME OFFSET ZSIZE SIZE
    If you feel it is not worth comparing with your v2 code, I won't spend more time on it. There's so much that looks correct that I have a hard time believing that, if the v2 decompression program were off, it would only be off in a very few select places in almost 8M of assembly. :) I'd expect a lot of differences. But I'm not very knowledgeable about compression, etc., so that may be the case.
    30 replies | 1717 view(s)
  • svpv's Avatar
    8th October 2019, 15:42
    I've pushed a non-maquette implementation to github. There are three variants:
    uint64_t ZrHa64_long_generic(const void *data, size_t len, uint64_t seed0, uint64_t seed1);
    uint64_t ZrHa64_long_sse2(const void *data, size_t len, uint64_t seed0, uint64_t seed1);
    uint64_t ZrHa64_long_avx2(const void *data, size_t len, uint64_t seed0, uint64_t seed1);
    which have states based on, respectively, uint64_t, __m128i, and __m256i. I checked that the return value is the same. (There should be a runtime switch, but that requires an extra portability layer. I haven't made up my mind whether to press on with any kind of a release or whether it's just an experiment.)
    This introduces the routine that merges two states into one. It is fast and relatively weak (matching in its speed and weakness the update routine). (The array subscripts were lost from the post; they are reconstructed here from the surrounding description.)
    void ZrHa_merge(uint64_t state[2], const uint64_t other[2])
    {
        uint64_t x0 = state[0] + rotl64(other[1], 32);
        uint64_t x1 = state[1] + rotl64(other[0], 32);
        uint64_t m0 = (uint32_t) x0 * (x0 >> 32);
        uint64_t m1 = (uint32_t) x1 * (x1 >> 32);
        state[0] = m0 + x1;
        state[1] = m1 + x0;
    }
    In "state[0] + rotl64(other[1], 32)", rotl64 is somewhat important: this is how we prefer to combine two products. Combining state[0] with other[1] rather than with other[0] is less important. (It breaks the symmetry though and further eschews combining the data at power-of-two distances.) There is no rotl64 in "m0 + x1"; that rotl64 only makes sense in a loop. We end up with two products again. Thus the routine is applied recursively: merge two AVX registers into a single AVX register, worth two SSE registers; merge them into a single SSE register; the last merge stops short and the function returns with "state[0] + rotl64(state[1], 32)". This makes it a pretty fast mixer with just 2 or 3 SIMD multiplications (resp. with AVX2 or SSE2). The result still passes all SMHasher tests.
    21 replies | 815 view(s)
  • NohatCoder's Avatar
    8th October 2019, 10:12
    The interesting part isn't so much how many bits you lose in one iteration; rather, it is how many bits you lose in a lot of iterations. In the ideal case you should lose ~29 bits in 1G iterations, dropping another bit every time the iteration count doubles. But we just established that your function isn't the ideal case, and we don't know that it follows a similar pattern.
    The established theory of non-reversible mixing is that it is ******* complicated, and therefore best avoided. I will mention that Merkle-Damgård does non-reversible mixing by xoring the state with a previous state, after a long series of reversible mixing. I believe that this pretty much guarantees the ideal case. I still don't like Merkle-Damgård though; periodically reducing the state to output size does nothing good.
    It is by the way not true that you can't multiply the input with itself in a reversible manner. You could do:
    //Note: This function is for demonstrating a point only, I do not condone this as a hash function.
    void update(uint64_t state[2], uint64_t data)
    {
        uint64_t x0 = state[0] + data;
        state[0] = state[1] + (uint32_t) x0 * (x0 >> 32);
        state[1] = x0;
    }
    Multiplication between two non-constants still has some weird behaviour, in that the numbers themselves decide how far the mixing goes, with the ultimate undesirable case being that one of the numbers is zero.
    21 replies | 815 view(s)
  • Jarek's Avatar
    8th October 2019, 07:58
    The encode.su list link is now back thanks to the comment on the user talk page, with protection from future attacks. Besides building our own e.g. wiki pages, we should also work on our sources being treated as reliable - especially for Wikipedia, which is often the first contact with a new field. Fortunately there is much more GitHub now, which is generally seen as a relatively reliable source, especially since its history is available. It is also worth considering arxiv sometimes to explain technical details and summarize your work.
    59 replies | 4562 view(s)