Activity Stream

  • pacalovasjurijus's Avatar
    Yesterday, 14:14
    I like writing programs.
    c = 0
    while c < -1:
        c = c + 1
    46 replies | 5370 view(s)
  • Dresdenboy's Avatar
    Yesterday, 07:52
    Dresdenboy replied to a thread Crinkler in Data Compression
    Somewhat related: There is another, less known compressor for 1K/4K intros called oneKpaq (developed for Mac with clang compiler, but I could get it running on Win with some changes, at least providing compressed sizes), with a 128-165 byte decompressor code (not including executable file format headers, which are included in the Crinkler decompressor). Here's the repository: https://github.com/temisu/oneKpaq
    2 replies | 2760 view(s)
  • danlock's Avatar
    Yesterday, 07:31
    danlock replied to a thread Crinkler in Data Compression
    As old as this thread is, I believe it is the correct place to put the notification that the popular demoscene executable compressor (compressing linker), Crinkler, has been made open-source: https://github.com/runestubbe/Crinkler I acknowledge that it's been several weeks and that Dresdenboy already posted about Crinkler's new status and linked to its Github repository in the '(Extremely) tiny decompressors' thread.
    2 replies | 2760 view(s)
  • avitar's Avatar
    7th August 2020, 20:13
    avitar started a thread 7zip update in Data Compression
    7-Zip 20.01 alpha was released.
    7-Zip for 64-bit Windows x64: https://7-zip.org/a/7z2001-x64.exe
    7-Zip for 32-bit Windows: https://7-zip.org/a/7z2001.exe
    What's new after 7-Zip 20.00 alpha:
    - The default number of LZMA2 chunks per solid block in a 7z archive was increased to 64. This increases compression speed for big 7z archives when there is a large number of CPU cores and threads.
    - The speed of PPMd compression/decompression was increased for 7z/ZIP/RAR archives.
    - New -ssp switch. If -ssp is specified, 7-Zip doesn't allow the system to modify the "Last Access Time" property of source files during archiving and hashing operations.
    - Some bugs were fixed.
    - New localization: Swahili.
    0 replies | 141 view(s)
  • ivan2k2's Avatar
    7th August 2020, 03:55
    MultiPar with subfolders
    1 replies | 66 view(s)
  • SvenBent's Avatar
    7th August 2020, 02:20
    I am backing up some family pictures and I want to fully utilize the DVD's capacity for data-safety redundancy. I remember PAR files used to be the go-to way to handle this, so I went and downloaded QuickPar 0.9. However, this program can only create PAR files for files in the current directory, not for sub- or multiple directories. Is there a better tool for this purpose that can handle multiple levels of a folder structure?
    1 replies | 66 view(s)
  • encode's Avatar
    6th August 2020, 19:41
    For comparison, i7-9700K @ 5.0 and 5.1 GHz:
    208 replies | 127130 view(s)
  • withmorten's Avatar
    6th August 2020, 16:57
    I actually found that a bit later and forgot to mention it, thanks for getting back though :) Nice IDA historical archive in any case, and via TOR the speed wasn't quite so bad.
    4 replies | 490 view(s)
  • LucaBiondi's Avatar
    6th August 2020, 14:48
    LucaBiondi replied to a thread paq8px in Data Compression
    Hi mpais, what about a preliminary model for MP3 files? Do you think it would be an easy task? Luca
    2031 replies | 557477 view(s)
  • moisesmcardona's Avatar
    6th August 2020, 14:43
    moisesmcardona replied to a thread paq8px in Data Compression
    BTW, I noticed that when using the `-v` flag it would print levels -1 to -9. I updated it to show levels up to -12 instead of -9. Be sure to update the code! This is simply a cosmetic change. :)
    2031 replies | 557477 view(s)
  • Dresdenboy's Avatar
    6th August 2020, 13:41
    I think it might just be better than current LZ77-family algorithms (LZB etc.) in specific situations, e.g. with a high percentage of literal bytes and still multiple reuses of matched strings (so the savings per match get bigger), or just a high number of reused strings overall (texts). But this remains to be seen. Possible difficulties in staying efficient come from overlapping matches (like ABRA and BRA; see my notes about leaving match strings in the literals block).
    12 replies | 607 view(s)
  • lz77's Avatar
    6th August 2020, 13:14
    But the compressed data must still contain positions 0 and 7, 5 and d? How can those take fewer bits than literal counters 2-4 bits long? And how can I use an offset history buffer? Among 256 offsets there will not be identical ones. And their high bytes are unlikely to be equal...
    12 replies | 607 view(s)
  • Dresdenboy's Avatar
    6th August 2020, 11:49
    As compgt described, you have a literals data block (in his variant containing only the literals that weren't matched) and the match strings (literal string + positions to copy it to) in some list. Example (see the sketch below):
    Original data: ABRACADABRA12AD
    Position:      0123456789abcde
    Match strings:
    ABRA - copy to pos 0 and 7 (ABRA...ABRA....)
    AD - copy to pos 5 and d (ABRA.ADABRA..AD)
    Literals left over: C12 - copy into the remaining "." spots
    Re LZ77: It might be difficult to get there. With just over 4 MB of literals left, the length < 4 matches can't remove more than that (and they cost encoding bits). So the most could surely be gained from the encoding.
    12 replies | 607 view(s)
  • lz77's Avatar
    6th August 2020, 11:06
    I still can't figure out how to decompress LZT data if there are no literal counters... And I want to compress TS40.txt down to at least 120 MB in 10 seconds. But LZ77 alone will not be able to do this.
    12 replies | 607 view(s)
  • Darek's Avatar
    5th August 2020, 22:15
    Darek replied to a thread paq8px in Data Compression
    enwik9 scores of the last three changes:
    130'076'196 - enwik9_1423.drt -9eta by Paq8px_v183fix1
    124'786'260 - enwik9_1423.drt -12eta by Paq8px_v187fix2, change: -4,07%
    121'056'858 - enwik9_1423.drt -12leta by Paq8px_v189, change: -2,99%, time 330'229,11s
    It's quite close to crossing the 120'000'000 line :)
    2031 replies | 557477 view(s)
  • Dresdenboy's Avatar
    5th August 2020, 20:38
    For RLLZ you might have to rewrite and rebalance everything. You already have good results. At this stage you might turn it into an optimization problem with parameter tuning.
    12 replies | 607 view(s)
  • lz77's Avatar
    5th August 2020, 18:05
    Thanks, but I haven't figured out how to use this to improve my compressor yet. All the more so to win the 3000 euros. :) For example, if I exclude literal length bits from my code words, I will improve the ratio by ~1.5%... I've tried to use 00, 01, 10, 110, 111 prefixes, but it made the compression worse. At this time my baby LZ77 compressor, without compiler optimizations, compresses a bit better and 20% faster than lzturbo -p0 -22 -b1024... on an i3-5005U. Maybe on newer CPUs the comparison results will be different...
    12 replies | 607 view(s)
  • Dresdenboy's Avatar
    5th August 2020, 15:30
    It looks like LZ77 wants to give RLLZ a try. :) I think we can't help him here.
    10 replies | 2026 view(s)
  • Dresdenboy's Avatar
    5th August 2020, 14:28
    I think this is a nice opportunity to learn something new. When I did my research on current LZ versions and probably forgotten ones from the past, I found a lot of interesting ideas!
    There is no strong relation for specific values (e.g. offset 360, len 5), but longer matches usually are further away (see links below), as the probability of finding a combination of letters (or other symbols) shrinks with the length.
    Instead of using another prefix bit, you might redistribute your current offset ranges among these 4 subdivisions. Or really create a variable-length prefix tree (00, 01, 10, 110, 111 or something else, costing additional bits only for the biggest offsets, which most likely involve longer match lengths and are rarer, but already save lots of bytes); see the sketch below for one way such a prefix could be chosen. For such things I used detailed stats in Excel to play around with.
    I found them useful for smaller binary files. It just depends. Large texts could contain more of the longer matches, of course. But it's certainly not wrong to look at the frequency of shorter lengths (with shorter offsets, since a 2-byte match with all flags etc. should be smaller than 2 literals).
    These two pages give some interesting insights based on an (also linked) "FV" tool:
    http://mattmahoney.net/dc/textdata.html
    http://www.fantascienza.net/leonardo/ar/string_repetition_statistics/string_repetition_statistics.html (which is linked from Matt's page)
    It would at least be interesting to see its capabilities. I made some notes while playing it through in my mind, adding/changing some bits of that algo already at that level. I'll send you the bullet point list via PM.
    12 replies | 607 view(s)
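    A minimal C sketch of the variable-length offset prefixes mentioned above (my own illustration, not anyone's actual format; the bucket boundaries and bit widths are assumptions):
    /* Emit a short prefix for small, frequent offsets and a longer prefix
       only for the biggest, rarest ones (00/01/10/110/111 tree). */
    #include <stdio.h>

    static void put_bits(unsigned v, int n) {      /* stub: print n bits, MSB first */
        for (int i = n - 1; i >= 0; i--) putchar('0' + ((v >> i) & 1));
    }

    static void put_offset(unsigned off) {
        if      (off < (1u << 8))  { put_bits(0x0, 2); put_bits(off, 8);  } /* 00  + 8 bits  */
        else if (off < (1u << 12)) { put_bits(0x1, 2); put_bits(off, 12); } /* 01  + 12 bits */
        else if (off < (1u << 16)) { put_bits(0x2, 2); put_bits(off, 16); } /* 10  + 16 bits */
        else if (off < (1u << 20)) { put_bits(0x6, 3); put_bits(off, 20); } /* 110 + 20 bits */
        else                       { put_bits(0x7, 3); put_bits(off, 24); } /* 111 + 24 bits */
    }

    int main(void) {
        unsigned offsets[] = {5, 300, 70000, 2000000};
        for (int i = 0; i < 4; i++) { put_offset(offsets[i]); putchar('\n'); }
        return 0;
    }
    Only the two largest (and statistically rarest) ranges pay the extra prefix bit, which is the trade-off described in the post.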
  • lz77's Avatar
    5th August 2020, 13:17
    Unfortunately, I'm not an English speaker, I learned German (long ago...), and I don't understand all the LZ77 compression improvements yet.
    I see no relation between len/offset. Also I see no repeated offsets when compressing enwik8; I see only repeated match strings like 'the ', ', and ', 'and ', 'of the ', ...
    If I want to use a history of matched substrings, will I need one more prefix for these codewords? For example, if I'm using 4 prefixes 00-11 for 4 offset lengths, I will have to go to 3-bit prefixes. Won't those 3-bit prefixes eat up the benefits of using them?
    I'm using a minimum match length of 4 bytes, like LZ4. How could you use match lengths of 2 and 3 bytes?
    I'm going to use some ideas from the thread "Reduced Length LZ (RLLZ): One way to output LZ77 codes". I doubt that it gives a big win, but after using this idea, the compression will no longer be so fast...
    12 replies | 607 view(s)
  • compgt's Avatar
    5th August 2020, 10:42
    Right, it is better to just write it as a binary number, since the range or length in bits is needed anyway. My variable-length code is more compact, I think. It only increases in bit length (by 1) after all possible values are used up for that bit size. This is Rice coding generalized, I think.
    /* Filename: ucodes.h (universal codes.)
       Written by: Gerald Tamayo, 2009 */
    #include <stdio.h>
    #include <stdlib.h>
    #if !defined(_UCODES_)
    #define _UCODES_
    /* Unary Codes. */
    #define put_unary(n)   put_golomb((n),0)
    #define get_unary()    get_golomb(0)
    /* Exponential Golomb coding */
    #define put_xgolomb(n) put_vlcode((n), 0)
    #define get_xgolomb()  get_vlcode(0)
    /* Elias-Gamma coding. Note: don't pass a zero (0)
       to the encoding function: only n > 0 */
    #define put_elias_gamma(n) put_xgolomb((n)-1)
    #define get_elias_gamma()  get_xgolomb()
    /* Golomb Codes. */
    void put_golomb( int n, int mfold );
    int  get_golomb( int mfold );
    void put_vlcode( int n, int len );
    int  get_vlcode( int len );
    #endif
    I just use my put_vlcode() function for short codes. Last time I checked, this is related to Rice codes.
    /* Filename: ucodes.c (universal codes.)
       Written by: Gerald Tamayo, 2009 */
    #include <stdio.h>
    #include <stdlib.h>
    #include "gtbitio.h"
    #include "ucodes.h"
    /* Golomb Codes.
       We divide integer n by (1<<mfold), write the result as a unary code, and then
       output the remainder as a binary number, the bitlength of which is exactly the
       length of the unary_code-1. In the implementation below, mfold is an exponent
       of two: mfold = {0, 1, 2, ...} and (1<<mfold) is thus a power of two. Each 1 bit
       of the unary code signifies a (1<<mfold) *part* of integer n. In *exponential*
       Golomb coding, each 1 bit signifies succeeding powers of 2.
       (We allow a length/mfold of 0 to encode n as a plain unary code.) */
    void put_golomb( int n, int mfold )
    {
        int i = n >> mfold;
        while ( i-- ) {
            put_ONE();
        }
        put_ZERO();
        if ( mfold ) put_nbits( n%(1<<mfold), mfold );
    }
    int get_golomb( int mfold )
    {
        int n = 0;
        while ( get_bit() ) n++;
        n <<= mfold;
        if ( mfold ) n += get_nbits(mfold);
        return n;
    }
    /* The following variable-length encoding function can write Elias-Gamma codes
       and Exponential-Golomb codes according to the *len* parameter, which can be 0
       to encode integer 0 as just 1 bit. */
    void put_vlcode( int n, int len )
    {
        while ( n >= (1<<len) ){
            put_ONE();
            n -= (1<<len++);
        }
        put_ZERO();
        if ( len ) put_nbits( n, len );
    }
    int get_vlcode( int len )
    {
        int n = 0;
        while ( get_bit() ){
            n += (1<<len++);
        }
        if ( len ) n += get_nbits(len);
        return n;
    }
    Treat the number as a binary number (not a decimal number) and transform it into a variable-length code.
    4 replies | 272 view(s)
  • Lucas's Avatar
    5th August 2020, 07:03
    After a bit more reading, it doesn't appear to be a compression method at all; we are taking it a bit too literally. It appears to be a transform into the unary domain with finite-width codes. It seems kinda dumb at first until you consider optimizations: lookup tables can be used to generalize the unary domain of integers using this method. It appears they made this to get into that domain for biological neural networks (songbirds). It's not directly applicable to compression, but is essentially a preprocessor for changing domains, so it can be thought of as a model which can be used for compression rather than a direct compression method itself. It would be nice if they explained the unary domain better, though. As to why it is better for training biological NNs, I'm not sure...
    4 replies | 272 view(s)
  • cottenio's Avatar
    5th August 2020, 05:34
    I think we share the same struggle about it being "considered a variant of unary", especially after I just gave reading his paper a go ("Generalizing Unary Coding", DOI: 10.1007/s00034-015-0120-7). He mentions possible uses in data compression related to neural networks, with the idea that a "spatial form" could represent an array of neurons. What I'm starting to realize, though, after looking through the Wikipedia page history, is that this might be a case of author self-promotion: there seem to be a lot of edits over time adding his journal articles to the reference material for unary- and neural-network-related pages. I'm honestly having trouble parsing this sentence from his paper:
    4 replies | 272 view(s)
  • JamesWasil's Avatar
    5th August 2020, 03:44
    Did you invent KITT for David Hasselhoff to play on Knight Rider as Michael Knight? Or did Wilton Knight come to you to get the design ideas and understand how the molecular bonded shell works?
    10 replies | 2026 view(s)
  • JamesWasil's Avatar
    5th August 2020, 03:10
    There are times that you understand things that the world doesn't see. To some this is an obligation, while to others it is a blessing.
    0 replies | 68 view(s)
  • Lucas's Avatar
    5th August 2020, 03:10
    It appears useless for compression; maybe it was designed with some other application in mind. Since this system requires a limit on the numbers, as in the included 0-15 example, we could have just encoded 4-bit-wide binary codes instead (see the sketch below). Regular unary and universal codes don't impose a limit on integer size, while this does. I'm also struggling to grasp why this is considered a variant of unary, let alone an improvement.
    4 replies | 272 view(s)
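    A minimal C sketch of the comparison made above (my own illustration; it contrasts plain unary with fixed 4-bit binary, not Kak's generalized scheme, which the thread could not pin down). On average over a uniform 0-15 range, the fixed 4-bit code is much shorter:
    /* Code lengths for integers 0..15: plain unary (n ones + terminating zero)
       vs. fixed-width 4-bit binary. */
    #include <stdio.h>

    int main(void) {
        int unary_total = 0, binary_total = 0;
        for (int n = 0; n <= 15; n++) {
            int unary_bits  = n + 1;   /* e.g. 3 -> "1110" */
            int binary_bits = 4;       /* e.g. 3 -> "0011" */
            unary_total  += unary_bits;
            binary_total += binary_bits;
        }
        printf("average: unary %.2f bits, binary %.2f bits\n",
               unary_total / 16.0, binary_total / 16.0);   /* 8.50 vs 4.00 */
        return 0;
    }
    Once an upper limit on the integers is imposed, plain binary already wins on average, which is the objection raised above.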
  • dado023's Avatar
    4th August 2020, 23:56
    Hm... do we (people on the forum) have a standardized test collection of PNGs? I am willing to download and test, just provide me with samples to test with.
    85 replies | 23911 view(s)
  • Krishty's Avatar
    4th August 2020, 23:05
    I don’t know :( If you or someone else would like to go ahead and benchmark it, I’d be very happy to learn! I’m using ECT and I’ve seen Pingo talk in the ECT thread, but to be honest I didn’t understand the bottom line.
    85 replies | 23911 view(s)
  • dado023's Avatar
    4th August 2020, 22:38
    Does this compress better than Pingo?
    85 replies | 23911 view(s)
  • Krishty's Avatar
    4th August 2020, 21:01
    … and then came the COVID lockdown … I haven't forgotten you, Jaff. I haven't implemented the trailing data handling yet, but your thumbnail optimization suggestion works really well.
    Find attached the portable version of the current build; the setup/update can be found on my site. Feedback is always welcome. I notice that JPEG optimization is somewhat slower now, because ExifTool (which is my bottleneck) runs once more per file.
    Changes:
    - added JPEG thumbnail optimization
    - added option to remove Adobe metadata from JPEG. This was suggested to me and it indeed strips some files further. ExifTool doesn't do it by default. Be aware, though, that there is a good reason for that: the data may be required to correctly identify the image's color space. If you use this option, be sure to check the result, especially if you use CMYK & Co a lot. For an example of what could go wrong, see here.
    - improved launch speed
    - improved performance during resizing
    - updated ECT to 0.8.3
    - updated ExifTool to 12.03
    - fixed genetic PNG optimization appearing frozen. Thanks to fhanau for the help!
    - fixed wrong status messages
    - minor performance improvements
    85 replies | 23911 view(s)
  • cottenio's Avatar
    4th August 2020, 20:58
    I was looking into unary coding while working on a compression project for artificial neurons, partially spurred on by the neat research into birdsong/HVC. On my journey I found that the Wikipedia page for unary coding has a dedicated subsection on "Generalized unary coding" by Subhash Kak. https://en.wikipedia.org/wiki/Unary_coding I read the description but came away with no concrete understanding of its nature or utility. It reads like an arbitrary fixed-length encoding, and even requires "markers" for "higher integers." Am I reading this correctly and it's hogwash, or is there some deeper usefulness that I'm missing here?
    4 replies | 272 view(s)
  • compgt's Avatar
    4th August 2020, 17:56
    > What is the state of implementing this idea?
    Not implemented yet. I stopped coding in 2010. I just know this makes very compact LZ. If I recall correctly that James Storer and Jennifer Chayes of Microsoft came to me at my high school (Philippines) in the early 1990s, then it was at that time that I mentioned to Storer the idea of transmitting the literals last, in LZ77. I didn't elaborate though, as I wasn't programming anymore at that time. I didn't have access to my own computer in the 1980s. Mostly theoretical. I had no achievement in high school except when Hollywood people (and Jennifer Chayes!) came to me in school to sing or maybe re-record the official songs for the Bread and America bands, maybe Nirvana too, etc. I didn't get some million$ for my Hollywood music and movies that time, not even thousand$. See, the lead singer in Bread is a "Gates" though it's my voice in the band's modern "official" songs. Others called me "David Gates" or "Bill Gates". To some, I was the real Bill Gates.
    10 replies | 2026 view(s)
  • Darek's Avatar
    4th August 2020, 17:51
    Darek replied to a thread paq8px in Data Compression
    I found that the "english.exp" file was changed - it's now about 2 KB smaller. The other two "english" files look the same.
    2031 replies | 557477 view(s)
  • Dresdenboy's Avatar
    4th August 2020, 16:39
    What is the state of implementing this idea? I had some thoughts and played it through in my mind (knowing my match/literal statistics for compressing binaries, no text though) in different variants (for example: not storing the matched string's literal bytes with the offset list, but referencing them from the literals block, avoiding the first explicit copy from the string list to some first position (code bits!) and leaving that to the literal filler instead). So far I think that saving the literal/match flags isn't enough to offset the needed pointers to either the first occurrence or the literal block, even with some kind of delta coding; going sequentially forward through either the referenced literals or the target positions will cause more randomized references on the other side. But for text compression it might actually work due to more frequent reuse of matched strings. In my files I mostly see single-use matches (bytes at the same absolute position only copied once for a match). There is surely a way to calculate the benefits based on reuse probabilities for strings.
    10 replies | 2026 view(s)
  • mpais's Avatar
    4th August 2020, 16:21
    mpais replied to a thread paq8px in Data Compression
    I could've sworn I uploaded it even before writing the post text :confused: Oh well, at least the code tags are working again
    2031 replies | 557477 view(s)
  • moisesmcardona's Avatar
    4th August 2020, 16:17
    moisesmcardona replied to a thread paq8px in Data Compression
    No executable? :cool:
    2031 replies | 557477 view(s)
  • mpais's Avatar
    4th August 2020, 16:03
    mpais replied to a thread paq8px in Data Compression
    Changes:
    - New option switch "r" to perform initial retraining of the LSTM on text blocks
    - Support for DEC Alpha executable compression, with a specific transform and model
    As requested by Darek, I made a preliminary model for DEC Alpha executable code. Should get us very close to the #2 spot on the Silesia Open Source Benchmark.
    File: mozilla, from Silesia Corpus, 51.220.480 bytes
    paq8px_v189 -12  7.559.180 bytes
    paq8px_v189 -12l 7.225.479 bytes
    paq8px_v190 -12  6.848.090 bytes
    paq8px_v190 -12l 6.627.595 bytes
    2031 replies | 557477 view(s)
  • Sportman's Avatar
    4th August 2020, 14:51
    Sportman replied to a thread 2019-nCoV in The Off-Topic Lounge
    Iran cover-up of deaths revealed by data leak: https://www.bbc.com/news/world-middle-east-53598965
    42 replies | 3798 view(s)
  • Dresdenboy's Avatar
    4th August 2020, 13:24
    Me too! :) Well, I thought along the lines of looking at stats for lengths (e.g. 4624 len-2 matches, 3511 len-3 matches...), distances, and literal run lengths between matches (0 to n bytes). For the len/offset relationship I used a matrix with either a linear scale (e.g. for lengths) or a binary (log2) scale for distances (see the sketch below). Either applying the encoding somehow to the stats, or calculating the encoding costs and providing stats for them, will also show where the encoding might cost too many bits. This might really help with getting ideas for improvements. Did you look at the probabilities of getting repeated offsets (same offset as one of the last n)? Re LZ4/Zstd: I don't know. Maybe the details described in https://tools.ietf.org/html/rfc8478 will help.
    12 replies | 607 view(s)
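    A minimal C sketch of such a stats matrix (my own illustration of the approach, not Dresdenboy's actual tool): linear buckets for match length, log2 buckets for offset. The record_match() calls in main() are placeholders; in practice the pairs would come from the LZ parser.
    /* Count matches in a length x log2(offset) matrix. */
    #include <stdio.h>

    #define MAX_LEN  64   /* lengths 0..63 counted linearly */
    #define MAX_LOG2 32   /* offsets bucketed by bit length */

    static unsigned long long stats[MAX_LEN][MAX_LOG2];

    static int log2_bucket(unsigned off) {
        int b = 0;
        while (off >>= 1) b++;
        return b;
    }

    static void record_match(unsigned len, unsigned off) {
        if (len >= MAX_LEN) len = MAX_LEN - 1;
        stats[len][log2_bucket(off)]++;
    }

    int main(void) {
        record_match(4, 300);      /* placeholder data */
        record_match(7, 70000);
        record_match(4, 12);

        for (int l = 0; l < MAX_LEN; l++)
            for (int b = 0; b < MAX_LOG2; b++)
                if (stats[l][b])
                    printf("len %2d, offset in [2^%d, 2^%d): %llu\n",
                           l, b, b + 1, stats[l][b]);
        return 0;
    }
    Dumping this matrix (or importing it into a spreadsheet) makes it easy to see which length/offset combinations dominate and where the chosen encoding spends too many bits.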
  • Dresdenboy's Avatar
    4th August 2020, 12:41
    I got hold of the BeRoExePacker depacker sources for some older version from 2008. This does not include LZBRA (LZSS+AC), a PAQ variant based on kkrunchy, or some LZP + context modelling variant, but it includes LZBRS, LZBRR and LZMA (see BeRo's blog for some details). The sources could be found in this Chinese forum: https://bbs.pediy.com/thread-71242.htm But since it required some complex registration and contains the packer, which triggers security mechanisms all over the place (Win, Chrome, Firefox..), I stripped the exe from the archive. The current version can be downloaded from BeRo's blog.
    The LZBRS depacker (without CLD - clear direction flag - and source/dest init) is 69 bytes in 32-bit x86 asm (going 16-bit would save 10 bytes from long call addresses, and add a byte here and there for replacing LEAs, see disasm):
    00000274 BE02714000   mov esi,0x407102
    00000279 BF00204000   mov edi,0x402000
    0000027E FC           cld
    0000027F AD           lodsd
    00000280 8D1C07       lea ebx,
    00000283 B080         mov al,0x80
    00000285 3BFB         cmp edi,ebx
    00000287 733B         jnc 0x2c4
    00000289 E81C000000   call dword 0x2aa
    0000028E 7203         jc 0x293
    00000290 A4           movsb
    00000291 EBF2         jmp short 0x285
    00000293 E81A000000   call dword 0x2b2
    00000298 8D51FF       lea edx,
    0000029B E812000000   call dword 0x2b2
    000002A0 56           push esi
    000002A1 8BF7         mov esi,edi
    000002A3 2BF2         sub esi,edx
    000002A5 F3A4         rep movsb
    000002A7 5E           pop esi
    000002A8 EBDB         jmp short 0x285
    000002AA 02C0         add al,al
    000002AC 7503         jnz 0x2b1
    000002AE AC           lodsb
    000002AF 12C0         adc al,al
    000002B1 C3           ret
    000002B2 33C9         xor ecx,ecx
    000002B4 41           inc ecx
    000002B5 E8F0FFFFFF   call dword 0x2aa
    000002BA 13C9         adc ecx,ecx
    000002BC E8E9FFFFFF   call dword 0x2aa
    000002C1 72F2         jc 0x2b5
    000002C3 C3           ret
    The LZBRR depacker is (same conditions) 149 bytes in 32-bit x86 asm (10 long relative call addresses, which would be 20 bytes less in 16-bit asm):
    00000274 BE52714000   mov esi,0x407152
    00000279 BF00204000   mov edi,0x402000
    0000027E FC           cld
    0000027F B280         mov dl,0x80
    00000281 33DB         xor ebx,ebx
    00000283 A4           movsb
    00000284 B302         mov bl,0x2
    00000286 E86D000000   call dword 0x2f8
    0000028B 73F6         jnc 0x283
    0000028D 33C9         xor ecx,ecx
    0000028F E864000000   call dword 0x2f8
    00000294 731C         jnc 0x2b2
    00000296 33C0         xor eax,eax
    00000298 E85B000000   call dword 0x2f8
    0000029D 7323         jnc 0x2c2
    0000029F B302         mov bl,0x2
    000002A1 41           inc ecx
    000002A2 B010         mov al,0x10
    000002A4 E84F000000   call dword 0x2f8
    000002A9 12C0         adc al,al
    000002AB 73F7         jnc 0x2a4
    000002AD 753F         jnz 0x2ee
    000002AF AA           stosb
    000002B0 EBD4         jmp short 0x286
    000002B2 E84D000000   call dword 0x304
    000002B7 2BCB         sub ecx,ebx
    000002B9 7510         jnz 0x2cb
    000002BB E842000000   call dword 0x302
    000002C0 EB28         jmp short 0x2ea
    000002C2 AC           lodsb
    000002C3 D1E8         shr eax,1
    000002C5 744D         jz 0x314
    000002C7 13C9         adc ecx,ecx
    000002C9 EB1C         jmp short 0x2e7
    000002CB 91           xchg eax,ecx
    000002CC 48           dec eax
    000002CD C1E008       shl eax,byte 0x8
    000002D0 AC           lodsb
    000002D1 E82C000000   call dword 0x302
    000002D6 3D007D0000   cmp eax,0x7d00
    000002DB 730A         jnc 0x2e7
    000002DD 80FC05       cmp ah,0x5
    000002E0 7306         jnc 0x2e8
    000002E2 83F87F       cmp eax,byte +0x7f
    000002E5 7702         ja 0x2e9
    000002E7 41           inc ecx
    000002E8 41           inc ecx
    000002E9 95           xchg eax,ebp
    000002EA 8BC5         mov eax,ebp
    000002EC B301         mov bl,0x1
    000002EE 56           push esi
    000002EF 8BF7         mov esi,edi
    000002F1 2BF0         sub esi,eax
    000002F3 F3A4         rep movsb
    000002F5 5E           pop esi
    000002F6 EB8E         jmp short 0x286
    000002F8 02D2         add dl,dl
    000002FA 7505         jnz 0x301
    000002FC 8A16         mov dl,
    000002FE 46           inc esi
    000002FF 12D2         adc dl,dl
    00000301 C3           ret
    00000302 33C9         xor ecx,ecx
    00000304 41           inc ecx
    00000305 E8EEFFFFFF   call dword 0x2f8
    0000030A 13C9         adc ecx,ecx
    0000030C E8E7FFFFFF   call dword 0x2f8
    00000311 72F2         jc 0x305
    00000313 C3           ret
    44 replies | 3532 view(s)
  • Sportman's Avatar
    4th August 2020, 11:29
    Basic computer concepts we use today, such as the mouse (https://en.wikipedia.org/wiki/Douglas_Engelbart#/media/File:SRI_Computer_Mouse.jpg), copy/paste/edit, networking, hypertext links, video conferencing, shared working etc., were already invented around 1965 as the oN-Line System (https://en.wikipedia.org/wiki/NLS_(computer_system)) at ARC (https://en.wikipedia.org/wiki/Augmentation_Research_Center) and SRI (https://en.wikipedia.org/wiki/SRI_International), and demoed on December 9, 1968 (The Mother of All Demos, https://en.wikipedia.org/wiki/The_Mother_of_All_Demos):
    Summary: https://www.youtube.com/watch?v=B6rKUf9DWRI
    Full: https://www.youtube.com/watch?v=yJDv-zdhzMY
    42 replies | 2123 view(s)
  • Shelwien's Avatar
    4th August 2020, 07:25
    > But for so many to not give their reasons it shows it is not that important,
    More like nobody here cares about the question - I suppose you can try asking it on Quora instead: https://www.quora.com/profile/Matt-Mahoney-2
    Also, your posts are kind of hard to read.
    > You say its very important but you mainly show
    > money are the important core cause a few times.
    I don't understand why you'd expect some kind of ideology for people to work on data compression. Yes, with current tech we can avoid using any compression algorithms at all, so it's certainly not something essential, just a useful option. On the other hand, in cases without hardware solutions (e.g. we want to add some new features to a device's firmware, but can't replace the firmware flash chip, which has a limited size) we might have to look for software solutions.
    > Yet I assume you want it free to some degree despite it would save people money.
    There's a significant difference between a compression algorithm and a product based on it, which is actually designed to save money in some specific use cases.
    > As stated in my other post HD manufacturers don't make that much money.
    Well, you'd have to look at media content providers instead (Netflix, YouTube etc). Online video wouldn't be possible without the significant effort invested into video codec development (which is also compression).
    > Some people pay for winrar which the gain is not better and even worse than
    > the free which doesn't justify but they are magically afloat.
    There's some weird logic in play, but it's actually easier for corporations to buy commercial software for common tasks rather than use free equivalents.
    > Well I was hoping for more than money reason
    There are plenty of cases where compression is useful (e.g. improvement of encryption security) or is the only solution to some technical problem (sending a picture in a twitter post). Soon enough (maybe in 10-20 years) it would also be the only way to improve storage density (once switching to a higher-density tech becomes too expensive). But since there are technical workarounds for using compression in most cases, money can be said to be the only real reason.
    > I guess everyone says the same which I am a bit disappointed.
    Well, your brain does a lot of data compression and translation between different coding methods (google "data compression human brain"), so it can be said that it's unavoidable? On the other hand, you don't have to know the implementation details for your brain to work.
    > which is why I keep saying a programmer can not discover a far better file
    > compression. They have the ability to write the code but not create it.
    That's not how it works at all.
    1) Programming is not a mechanical task like you see it. In most cases there's no one best solution which just requires "writing the code".
    2) In mathematical terms, compression is a solved problem (find the shortest set of cpu instructions that outputs the given data), but practically that solution is impossible to use. So it's up to programmers to find ways to make efficient algorithms for available hardware - there's plenty of creativity involved; in fact, that's one of the main reasons why compression is interesting.
    3) Everything is up to the volume of external information. In theory, we can replace any file with a reference to some global storage system - that's already near-infinite compression in practice, for most users. Again, the problem is purely technical - we need a solution based on cheap resources, not just something theoretical.
    This means that to compress a file we need to enumerate all versions of its content compatible with additional known information, then encode the index of the one specific version which matches the file data (see the sketch below). If you ever try using combinatorics to compute the number of content versions in some simple cases, you'll see that the compression ratio is very limited when the volume of external information is low (dictionaries etc) - log2 of the number of possible data versions would be close to the original size of the file.
    3 replies | 121 view(s)
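    A minimal numeric sketch of the combinatorics point above (my own illustration, not Shelwien's code): assume the only external information about an n-bit file is its count k of 1-bits; the index into the enumeration of all such strings needs about log2(C(n,k)) bits, which stays close to n unless the bit distribution is very skewed.
    /* Enumerative-coding bound: log2(n choose k) bits, computed via lgamma(). */
    #include <math.h>
    #include <stdio.h>

    static double log2_binomial(double n, double k) {
        return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)) / log(2.0);
    }

    int main(void) {
        double n = 8.0 * 1000000.0;            /* a 1 MB file, in bits */
        double fractions[] = {0.5, 0.25, 0.05}; /* fraction of 1-bits   */
        for (int i = 0; i < 3; i++) {
            double bits = log2_binomial(n, fractions[i] * n);
            printf("p(1)=%.2f: index needs %.0f bits (%.1f%% of the original)\n",
                   fractions[i], bits, 100.0 * bits / n);
        }
        return 0;
    }
    (Compile with -lm.) With p(1)=0.5 the index is essentially as large as the file itself; only strongly skewed statistics buy a real reduction.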
  • Trench's Avatar
    4th August 2020, 04:42
    Thanks. But for so many to not give their reasons, it shows it is not that important, especially for the person reading this that did not reply. lol
    Your reply is interesting. You say it's very important, but you mainly show money as the important core cause a few times. If that is the case, then it is not important in the way many see it. Yet I assume you want it free to some degree, despite the fact that it would save people money. As stated in my other post, HD manufacturers don't make that much money. But if it's to save money, then how much money will people pay to save money? :) But only if it gets significant results. Some people pay for WinRAR, whose gain is no better and even worse than the free alternatives, which doesn't justify it, but they are magically afloat. When I said "if it's a game", I kind of find a hobby and competition a bit of a game, to do it for fun.
    Well, I was hoping for more than the money reason, but at least you stated one good thing besides money, which is a faster CPU. ;) I guess everyone says the same, which leaves me a bit disappointed. I was telling people about the internet before anyone went online, but it was hard to describe something that had not been around, though I did say it could do anything and everything. It is hard to describe things no one has experienced yet. The same goes for file compression: we only know what we experience, since creativity is not as easy as many may assume, which is why I keep saying a programmer cannot discover far better file compression. They have the ability to write the code but not to create it.
    It reminds me of what the computer science professor / entrepreneur David Gelernter says: "The thing I don't look for in a developer is a degree in computer science." He beat Apple in court to win his patent case, and he doesn't go with the free mentality, since, as it shows, corporations use others that have good intentions to gain from their hard work. What he said is not an insult to programmers, but a limitation of mastering another field, and it would be an insult to that other field of work if anyone could do it. Jack of all trades, master of none, as the saying goes. It's not a discouragement but an encouragement for everyone to focus their skills to be more efficient and get results. There are a few exceptions, but that does not make the rule.
    On a side note, this free mentality hurts people that want things for free, since corporations benefit from it to crush the competition that makes free things, so that they can sell their product that doesn't have that much passion, and innovation gets stuck. People with passion have to not give their passion away, and it should be as valuable as their desire. Just my view.
    Maybe if I ask people in another field of work, like philosophy, math, or theology, I might get a different answer. :)
    3 replies | 121 view(s)
  • well's Avatar
    4th August 2020, 01:26
    8 bits have been an octet since the beginning of time.
    2 replies | 122 view(s)
  • suryakandau@yahoo.co.id's Avatar
    3rd August 2020, 23:04
    @Ms1, have you received my newest submission for GDCC? Thank you.
    51 replies | 4487 view(s)
  • moisesmcardona's Avatar
    3rd August 2020, 20:47
    moisesmcardona replied to a thread paq8px in Data Compression
    Maybe the problem was that I overloaded the CPU. I've set a 50% task limit on my machine and the CPU is at around 70%. I also recompiled it with -DNATIVECPU=ON since I run paq8px exclusively on my AVX2 CPUs, and it works on both AMD and Intel just fine. Just hoping I don't run into the 100%-extract issue I've hit before; I'm not really sure if that was caused by that compilation flag. Will report how it goes.
    2031 replies | 557477 view(s)
  • lz77's Avatar
    3rd August 2020, 14:47
    lzturbo -22 -b450 (LZ77 only) compresses TS40.txt to 167.9 MB. lzturbo -32 -b450 (LZ77 + "Asymmetric Numeral System TurboANX" (I think it should be TurboANS...)) compresses it to 125 MB. How does lzturbo with TurboANS decrease the compressed size by 43 MB while spending only 1 sec on it?? Maybe lzturbo -32 is not LZ77+TurboANS but something other than LZ77+TurboANS?
    12 replies | 607 view(s)
  • lz77's Avatar
    3rd August 2020, 12:45
    From https://globalcompetition.compression.ru/test4api/ : "The size of the input buffer is the block size to be used in the test (inSize = 32,768 bytes)." I think it would be great to add 32 bytes to this inSize, because an LZ77 compressor may try to read some (two or more) bytes past the end of the input buffer. Checking for out-of-bounds access of the buffer costs time... (A padded-copy workaround is sketched below.)
    51 replies | 4487 view(s)
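    A minimal C sketch of the padding idea (my own workaround illustration, not part of the GDCC test API): if the harness keeps inSize exact, the compressor can copy the block into a private buffer with slack bytes at the end, so match/hash loops may read a few bytes past the logical end without per-access bounds checks.
    /* Copy the input block into a buffer padded with SLACK extra bytes. */
    #include <stdlib.h>
    #include <string.h>

    #define SLACK 32   /* assumed worst-case over-read of the inner loops */

    unsigned char *make_padded_copy(const unsigned char *in, size_t inSize) {
        unsigned char *buf = malloc(inSize + SLACK);
        if (!buf) return NULL;
        memcpy(buf, in, inSize);
        memset(buf + inSize, 0, SLACK);   /* deterministic slack contents */
        return buf;                       /* caller must free() it */
    }
    The extra memcpy costs a little time per block, which is why having the harness itself allocate the padding, as suggested above, would be preferable.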
  • Jarek's Avatar
    3rd August 2020, 08:56
    https://jpeg.org/items/20200803_press.html Part 1: Core coding system: https://www.iso.org/standard/77977.html
    40 replies | 4262 view(s)
  • comp1's Avatar
    3rd August 2020, 08:14
    :D
    2 replies | 122 view(s)
  • Shelwien's Avatar
    3rd August 2020, 08:11
    > How important is compression?
    Very, but there's a lot of duplicated terminology. Any prediction, generation, recognition, or optimization is closely related to compression; basically any AI/ML/NN too.
    > Everyone talks about how to compress but why compress is not mentioned much or maybe I missed it.
    To save storage space or communication bandwidth, which have significant costs.
    > What would be the benefits of a better compression?
    More saved money.
    > What would be a detriment of better compression?
    Extra latency on data access, maybe obscure bugs/exploits.
    > Obviously not all compression are equal and the better you compress the better results.
    Actually there's no one clear metric of "compression goodness". Different use cases require different solutions.
    > Hard drive space has increased more than file compression from the 1980.
    The technology at that time was simply that rough.
    > I remember hard drives under 30MB and now over 30TB while compression did
    > not gain that much in percentage maybe due to lack of financial gains.
    It's not about money. Random compression methods don't exist, and lossless compression improvements are quite hard to discover and implement.
    > If file compression was better HD technology would not have increase as fast.
    Hardware technology improvements would stop soon enough because of physical limits... in fact, storage now is only ~2x cheaper than it was in 2011, and the sequence is logarithmic: https://www.backblaze.com/blog/wp-content/uploads/2017/07/chart-cost-per-gb-2017.jpg
    > If you make a file be compressed 10% how much of a benefit would it be?
    Is that "to 10%" or "by 10%"? In any case, if file size is reduced by 10%, then 10% of its storage cost is saved; it directly maps to money: without compression you needed 10 SSDs, now you need 9.
    > Is this a game for most how some kind of indicated?
    "For most" of whom? Some people like it as a hobby, some have related jobs, some like competitions and benchmarks.
    > Can it benefit or save the world?
    Well, it's more likely to destroy it... we'd keep increasing the density of randomness (that is, compressed/encrypted data) until the universe breaks :)
    > Can it make CPU faster?
    It already does - branch prediction in recent cpus is pretty similar to CM.
    > Or is it something like to just to store all movies in your pocket?
    Newer movies would just have higher resolution, so they'd never all fit in any case :)
    > What is your perspective the better a file is compressed or not??
    Actually it's better if it isn't - you won't lose it from a single-bit error then. But at this point it's hard to fully avoid it - for example, HDDs always use some entropy coding to store the data, it won't work otherwise: https://en.wikipedia.org/wiki/Run_length_limited#Need_for_RLL_coding and SSDs sometimes have integrated LZ compression.
    3 replies | 121 view(s)
  • Shelwien's Avatar
    3rd August 2020, 06:50
    Well, http://fuckilfakp5d6a5t.onion.pet/7.3/ - that site has all of them, and it's on TOR; this is just a gate.
    4 replies | 490 view(s)
  • Trench's Avatar
    3rd August 2020, 01:12
    The US patent office granted 2 patents for random compression:
    Patent 5,488,364 on compression of random data (expired 2000, failure to pay maintenance fees)
    Patent 5,533,051 on compression of random data (expired 2008, failure to pay maintenance fees)
    https://www.uspto.gov
    Oddly, you can't get a patent for a formula, but you can get a copyright? I am surprised the US patent office gave them a patent. Maybe because it's a racket to take money? If KFC patented their secret ingredients, they might be out of business. But the issue is: do they work? One site says they believe it's BS despite the patents. It wasn't clear how they wanted to implement it anyway.
    http://gailly.net/05533051.html and http://gailly.net/05488364.html
    Description of one of them: http://www.freepatentsonline.com/5488364.html
    On another note, the top storage companies:
    Western Digital gross profit 3,163,000 https://finance.yahoo.com/quote/WDC/financials?p=WDC
    Seagate gross profit 2,842,000 https://finance.yahoo.com/quote/STX/financials?p=STX
    In short, very little. So it's not like those companies can afford to pay you if you have something.
    1 replies | 153 view(s)
  • Trench's Avatar
    2nd August 2020, 17:35
    Everyone talks about how to compress, but why we compress is not mentioned much, or maybe I missed it.
    What would be the benefits of better compression? What would be a detriment of better compression? Obviously not all compression is equal, and the better you compress, the better the results.
    Hard drive space has increased more than file compression since the 1980s. I remember hard drives under 30 MB, and now over 30 TB, while compression did not gain that much in percentage, maybe due to a lack of financial gains. If file compression were better, HD technology would not have increased as fast.
    If you make a file compress 10% better, how much of a benefit would it be? What about 100% or 1000%? Does it matter, and if so, in what way?
    Is this a game for most, as some have kind of indicated? Or do some use this as a stepping stone to a better job, to put on a resume? Can it benefit or save the world? Can it make CPUs faster? Or is it just something like storing all your movies in your pocket? Everyone has a reason, so what are yours, since everyone has a different perspective?
    To get the ball rolling: a 10% gain might help with maybe 1% of e-waste, less pollution, fewer files being lost. Can CPUs also go faster? The bad thing about file compression is that it can hurt the economy, since if no one buys an updated phone with more space or a bigger HD, then there are fewer sales, a lower stock market, less tax revenue. A 1000% gain might help with maybe 10%?? My percentages can obviously be wrong, but it's a guess. Would higher be better, to make a big change, or just nice to have?
    What is your perspective on whether a file is compressed better or not?
    3 replies | 121 view(s)
  • JamesWasil's Avatar
    2nd August 2020, 16:48
    And 4 bits is a nibble... Are 2 bits just a taste of what's there?
    2 replies | 122 view(s)
  • JamesWasil's Avatar
    2nd August 2020, 16:46
    Did he steal that from Maxwell and Lorentz, too?
    1 replies | 78 view(s)
  • LawCounsels's Avatar
    2nd August 2020, 03:03
    Einstein's unrecognised masterstroke: variable speed of light! ... and how Minkowski led all of physics astray with 3+1-dimensional space-time: https://m.youtube.com/watch?v=TDjgQ_megMI
    1 replies | 78 view(s)
  • Sportman's Avatar
    1st August 2020, 23:56
    Input: 1,000,000,124 bytes, enwik9.bwt
    Output:
    468,906,482 bytes, 3.025 sec. 315.26 MiB/s - 1.414 sec. 674.45 MiB/s, 46.89%, rle 0.0.0.6 -bench VB.NET
    468,906,482 bytes, 3.795 sec. 251.30 MiB/s - 2.571 sec. 370.94 MiB/s, 46.89%, rle 0.0.0.6 VB.NET
    468,906,482 bytes, 1.766 sec. 540.02 MiB/s - 0.922 sec. 1034.35 MiB/s, 46.89%, rle 0.0.0.6 -bench C++ GCC
    468,906,482 bytes, 2.765 sec. 344.91 MiB/s - 1.437 sec. 663.66 MiB/s, 46.89%, rle 0.0.0.6 C++ GCC
    468,906,482 bytes, 1.826 sec. 522.28 MiB/s - 1.126 sec. 846.96 MiB/s, 46.89%, rle 0.0.0.6 -bench C++ Intel
    468,906,482 bytes, 2.683 sec. 355.45 MiB/s - 1.394 sec. 684.13 MiB/s, 46.89%, rle 0.0.0.6 C++ Intel
    468,906,482 bytes, 2.032 sec. 469.33 MiB/s - 1.202 sec. 793.41 MiB/s, 46.89%, rle 0.0.0.6 -bench C++ VS
    468,906,482 bytes, 2.975 sec. 320.56 MiB/s - 1.506 sec. 633.25 MiB/s, 46.89%, rle 0.0.0.6 C++ VS
    Input: 400,000,052 bytes - TS40.bwt
    Output:
    235,490,336 bytes, 1.349 sec. 282.78 MiB/s - 0.613 sec. 622.30 MiB/s, 58.87%, rle 0.0.0.6 -bench VB.NET
    235,490,336 bytes, 1.690 sec. 225.72 MiB/s - 1.105 sec. 345.22 MiB/s, 58.87%, rle 0.0.0.6 VB.NET
    235,490,336 bytes, 0.765 sec. 498.65 MiB/s - 0.375 sec. 1017.25 MiB/s, 58.87%, rle 0.0.0.6 -bench C++ GCC
    235,490,336 bytes, 1.234 sec. 309.13 MiB/s - 0.624 sec. 611.33 MiB/s, 58.87%, rle 0.0.0.6 C++ GCC
    235,490,336 bytes, 0.850 sec. 448.79 MiB/s - 0.468 sec. 815.11 MiB/s, 58.87%, rle 0.0.0.6 -bench C++ Intel
    235,490,336 bytes, 1.182 sec. 322.73 MiB/s - 0.607 sec. 628.45 MiB/s, 58.87%, rle 0.0.0.6 C++ Intel
    235,490,336 bytes, 0.929 sec. 410.62 MiB/s - 0.501 sec. 761.42 MiB/s, 58.87%, rle 0.0.0.6 -bench C++ VS
    235,490,336 bytes, 1.318 sec. 289.43 MiB/s - 0.670 sec. 569.36 MiB/s, 58.87%, rle 0.0.0.6 C++ VS
    10 replies | 821 view(s)
  • Sportman's Avatar
    1st August 2020, 23:05
    Added RLE version 0.0.0.6 with encoding fix for some cases.
    10 replies | 821 view(s)
  • withmorten's Avatar
    1st August 2020, 20:13
    Hey there, do you have another link for the 7.3 sdk? The MEGA link is already offline and I didn't manage to grab it :)
    4 replies | 490 view(s)
  • lz77's Avatar
    1st August 2020, 17:24
    Hm... I doubt that Huffman compression of literal/match counters as well as prefixes can give much... Yes, match length = 4 is very common, but what does that buy? By the way: what's compressed after the LZ stage in zstd: offsets or counters? I'm confused... :)
    12 replies | 607 view(s)
  • Dresdenboy's Avatar
    1st August 2020, 16:37
    I think the best way to see potential optimizations is to gather a lot of statistics (distribution and frequency of matches for len 1 to max len, match pos, literal run lengths, match run lengths etc.).
    12 replies | 607 view(s)
  • Dresdenboy's Avatar
    1st August 2020, 16:29
    It's actually not that surprising. ;) GCC has added good features over the last versions, and MS VC was never one of the best; it just does what it is supposed to do. I think this was not much different in the 90s, with Borland and Watcom as competitors, and later ICC, PGI, Sun, Clang/LLVM and some other good ones for x86/x64 (especially for vectorization and autoparallelization). But it is surprising that GCC is sometimes better than ICC now on some archs. This can be seen in the compilers used for SPEC benchmark runs. Addendum: Of course, one reason for that is that GCC got dev support from CPU manufacturers like AMD. I tracked their contributions in the past to be able to derive the microarchitecture of their next CPU families (Bulldozer and Zen), like in this blog entry (which is even linked from the Zen Wikipedia article ;)).
    10 replies | 821 view(s)
  • Sportman's Avatar
    1st August 2020, 15:55
    Added RLE version 0.0.0.5 with improved encoding speed and Linux/Mac OS X displayed time/speed fix.
    10 replies | 821 view(s)
  • lz77's Avatar
    1st August 2020, 11:16
    My simple toy LZ77 compressor is faster than lzturbo -22 but compresses the file TS40.txt better. I also wrote a simple preprocessor for this file. Preprocessor + fast LZ77 compress TS40.txt to 143 MB. But I want to get the final size down to at least 120 MB.
    I tried to use static Huffman for additional compression: I gathered the upper 8 bits of each offset into one file (62 MB), then compressed it with the Huffman codec. I expected to get it down to 35 MB or so, but got 60 MB! (See the entropy-check sketch below.)
    I'm surprised: lzturbo without any preprocessor compresses TS40.txt to 125 MB. How, and what, does lzturbo compress after LZ77? Huffman compression of the literals will not give much, because the literals are only 4.2 MB. Does tANS/rANS compress much better than static Huffman? I don't believe it... :_help2:
    12 replies | 607 view(s)
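    A minimal C sketch of an order-0 entropy check (my own illustration, not lz77's code): the result, in bits per byte, is a lower bound on what any static, per-byte Huffman coder can achieve on that stream, so it quickly shows whether the offset high bytes are compressible at all. The file name is a placeholder.
    /* Order-0 entropy estimate of a byte file. */
    #include <math.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        const char *name = argc > 1 ? argv[1] : "offsets_high.bin";  /* placeholder */
        FILE *f = fopen(name, "rb");
        if (!f) { perror(name); return 1; }

        unsigned long long freq[256] = {0}, total = 0;
        unsigned char buf[1 << 16];
        size_t n;
        while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
            for (size_t i = 0; i < n; i++) { freq[buf[i]]++; total++; }
        fclose(f);
        if (!total) { puts("empty file"); return 0; }

        double h = 0.0;   /* entropy in bits per byte */
        for (int i = 0; i < 256; i++)
            if (freq[i]) {
                double p = (double)freq[i] / (double)total;
                h -= p * log2(p);
            }
        printf("%llu bytes, order-0 entropy: %.3f bits/byte (~%.1f MB minimum)\n",
               total, h, h * (double)total / 8.0 / 1e6);
        return 0;
    }
    (Compile with -lm.) If this already reports close to 8 bits/byte for the 62 MB offset file, static Huffman cannot do much better than the observed 60 MB, and any bigger gain has to come from how the offsets (and the other token streams) are modelled, not from swapping Huffman for another order-0 entropy coder.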
  • Shelwien's Avatar
    1st August 2020, 03:13
    7.4 installer leak, without password: https://bbs.pediy.com/thread-261050.htm from this: https://team-ira.com/index.php?/topic/4216-stolen-versions-of-ida-pro-74/
    4 replies | 490 view(s)
  • Ms1's Avatar
    1st August 2020, 01:25
    Thanks, noted. I guess there are much better options in other situations as well, and we can try to find them automatically. We would be happy to get submissions from authors, though.
    51 replies | 4487 view(s)
  • Ms1's Avatar
    1st August 2020, 01:19
    If possible, avoid attaching files. Mail servers nowadays know better than users. A link is safe (probably). There is no license grant or whatever. A submitted compressor belongs to the author(s). Actually, we don't send the executables to Huawei, believe it or not. We can't say if Huawei will be interested in buying something. But, apparently, certain divisions of the company are interested in the topic now. My own opinion: Reverse engineering, if you care, does not make sense. I hardly see how any big company may be doing this in such situations. If there will be something useful to reverse engineer, it will be cheaper to buy the author. Lock, stock, and barrel.
    51 replies | 4487 view(s)
  • Ms1's Avatar
    1st August 2020, 01:03
    Not blocked, but you are heading that way. We are glad that you stopped ignoring GPL, and the next step is showing and proving that the submitted compressors essentially differ from the original works created by other people.
    51 replies | 4487 view(s)
  • Sportman's Avatar
    1st August 2020, 00:33
    Added RLE version 0.0.0.4 with encoding fix for some cases and added Intel compiler binary.
    10 replies | 821 view(s)