Activity Stream

  • mpais's Avatar
    Today, 16:21
    mpais replied to a thread paq8px in Data Compression
    I could've sworn I uploaded it even before writing the post text :confused: Oh well, at least the code tags are working again
    2027 replies | 554537 view(s)
  • moisesmcardona's Avatar
    Today, 16:17
    moisesmcardona replied to a thread paq8px in Data Compression
    No executable? :cool:
    2027 replies | 554537 view(s)
  • mpais's Avatar
    Today, 16:03
    mpais replied to a thread paq8px in Data Compression
    Changes:
    - New option switch "r" to perform initial retraining of the LSTM on text blocks
    - Support for DEC Alpha executable compression, with a specific transform and model
    As requested by Darek, I made a preliminary model for DEC Alpha executable code. It should get us very close to the #2 spot on the Silesia Open Source Benchmark; early testing points to about 6.629.xxx bytes for mozilla (I'm currently running a test with this final version, which should take about 9h)
    2027 replies | 554537 view(s)
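    The DEC Alpha transform mentioned above is not shown here, but the general idea behind such executable filters can be sketched as follows (my own illustration, not mpais's code; it assumes little-endian 32-bit Alpha instruction words and only handles BR/BSR): rewrite the 21-bit relative branch displacement into an absolute target, so repeated branches to the same address become identical byte patterns that the model can exploit.

        #include <stdint.h>
        #include <stddef.h>

        /* Hypothetical branch-target filter for DEC Alpha code (sketch only).
           BR (opcode 0x30) and BSR (0x34) carry a signed 21-bit word displacement;
           the branch target is PC+4 plus 4*displacement.  Storing the absolute
           target word index instead of the displacement makes repeated calls to
           the same function byte-identical, which is what such transforms aim for. */
        static void alpha_branch_filter(uint32_t *code, size_t nwords) {
            for (size_t i = 0; i < nwords; i++) {
                uint32_t insn = code[i];
                uint32_t op = insn >> 26;
                if (op == 0x30 || op == 0x34) {                 /* BR or BSR */
                    int32_t disp = (int32_t)(insn & 0x1FFFFF);  /* 21-bit field */
                    if (disp & 0x100000) disp -= 0x200000;      /* sign-extend   */
                    uint32_t target = (uint32_t)((int64_t)i + 1 + disp) & 0x1FFFFF;
                    insn = (insn & ~0x1FFFFFu) | target;        /* absolute word index */
                }
                code[i] = insn;
            }
        }

    The decoder would apply the inverse mapping (absolute target back to displacement) after decompression.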
  • Sportman's Avatar
    Today, 14:51
    Sportman replied to a thread 2019-nCoV in The Off-Topic Lounge
    Iran cover-up of deaths revealed by data leak: https://www.bbc.com/news/world-middle-east-53598965
    42 replies | 3642 view(s)
  • Dresdenboy's Avatar
    Today, 13:24
    Me too! :) Well, I thought along the lines of looking at stats for lengths (e.g. 4624 len-2 matches, 3511 len-3 matches...), distances, and literal run lengths between matches (0 to n bytes). For the len/offset relationship I used a matrix with either a linear scale (e.g. for lengths) or a binary (log2) scale for distances. Either applying the encoding to those stats, or calculating the encoding costs and reporting stats for them, will also show where the encoding might cost too many bits. This might really help with getting ideas for improvements. Did you look at the probabilities of getting repeated offsets (same offset as one of the last n)? Re LZ4/Zstd: I don't know. Maybe the details described in https://tools.ietf.org/html/rfc8478 will help.
    4 replies | 316 view(s)
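    A small sketch of the kind of statistics gathering described in the post above (my own illustration; the token layout is a hypothetical LZ-parse output, not any particular codec's structures):

        #include <stdio.h>
        #include <stdint.h>
        #include <stddef.h>

        /* Hypothetical token from an LZ parse: len == 0 means "one literal". */
        typedef struct { uint32_t len, dist; } LzToken;

        /* Match-length counts, log2-bucketed distance counts and
           literal-run-length counts, as discussed above. */
        static void lz_stats(const LzToken *t, size_t n) {
            uint64_t len_hist[272] = {0}, dist_log2[32] = {0}, lit_run[256] = {0};
            uint32_t run = 0;
            for (size_t i = 0; i < n; i++) {
                if (t[i].len == 0) { run++; continue; }              /* literal */
                lit_run[run < 255 ? run : 255]++;
                run = 0;
                len_hist[t[i].len < 271 ? t[i].len : 271]++;
                uint32_t b = 0, d = t[i].dist;
                while (d >>= 1) b++;                                 /* floor(log2(dist)) */
                dist_log2[b]++;
            }
            for (int l = 0; l < 272; l++)
                if (len_hist[l]) printf("len %3d: %llu matches\n", l, (unsigned long long)len_hist[l]);
            for (int b = 0; b < 32; b++)
                if (dist_log2[b]) printf("dist in [2^%d, 2^%d): %llu\n", b, b + 1, (unsigned long long)dist_log2[b]);
            for (int r = 0; r < 256; r++)
                if (lit_run[r]) printf("literal run %3d: %llu\n", r, (unsigned long long)lit_run[r]);
        }

    Adding a repeated-offset counter (comparing each match offset against the last few) would answer the repeat-offset question in the same way.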
  • Dresdenboy's Avatar
    Today, 12:41
    I got hold of the BeRoExePacker depacker sources for some older version from 2008. This does not include LZBRA (LZSS+AC), a PAQ variant based on kkrunchy, or some LZP+context-modelling variant, but it does include LZBRS, LZBRR and LZMA (see BeRo's blog for some details). The sources can be found on this Chinese forum: https://bbs.pediy.com/thread-71242.htm But since it requires a somewhat complex registration and contains the packer itself, which triggers security mechanisms all over the place (Win, Chrome, Firefox...), I stripped the exe from the archive. The current version can be downloaded from BeRo's blog.
    The LZBRS depacker (without CLD - clear direction flag - and source/dest init) is 69 bytes in 32-bit x86 asm (going 16-bit would save 10 bytes from the long call addresses, and add a byte here and there for replacing LEAs, see disasm):
        00000274 BE02714000  mov esi,0x407102
        00000279 BF00204000  mov edi,0x402000
        0000027E FC          cld
        0000027F AD          lodsd
        00000280 8D1C07      lea ebx,[edi+eax]
        00000283 B080        mov al,0x80
        00000285 3BFB        cmp edi,ebx
        00000287 733B        jnc 0x2c4
        00000289 E81C000000  call dword 0x2aa
        0000028E 7203        jc 0x293
        00000290 A4          movsb
        00000291 EBF2        jmp short 0x285
        00000293 E81A000000  call dword 0x2b2
        00000298 8D51FF      lea edx,[ecx-1]
        0000029B E812000000  call dword 0x2b2
        000002A0 56          push esi
        000002A1 8BF7        mov esi,edi
        000002A3 2BF2        sub esi,edx
        000002A5 F3A4        rep movsb
        000002A7 5E          pop esi
        000002A8 EBDB        jmp short 0x285
        000002AA 02C0        add al,al
        000002AC 7503        jnz 0x2b1
        000002AE AC          lodsb
        000002AF 12C0        adc al,al
        000002B1 C3          ret
        000002B2 33C9        xor ecx,ecx
        000002B4 41          inc ecx
        000002B5 E8F0FFFFFF  call dword 0x2aa
        000002BA 13C9        adc ecx,ecx
        000002BC E8E9FFFFFF  call dword 0x2aa
        000002C1 72F2        jc 0x2b5
        000002C3 C3          ret
    The LZBRR depacker is (same conditions) 149 bytes in 32-bit x86 asm (10 long relative call addresses, which would be 20 bytes less in 16-bit asm):
        00000274 BE52714000  mov esi,0x407152
        00000279 BF00204000  mov edi,0x402000
        0000027E FC          cld
        0000027F B280        mov dl,0x80
        00000281 33DB        xor ebx,ebx
        00000283 A4          movsb
        00000284 B302        mov bl,0x2
        00000286 E86D000000  call dword 0x2f8
        0000028B 73F6        jnc 0x283
        0000028D 33C9        xor ecx,ecx
        0000028F E864000000  call dword 0x2f8
        00000294 731C        jnc 0x2b2
        00000296 33C0        xor eax,eax
        00000298 E85B000000  call dword 0x2f8
        0000029D 7323        jnc 0x2c2
        0000029F B302        mov bl,0x2
        000002A1 41          inc ecx
        000002A2 B010        mov al,0x10
        000002A4 E84F000000  call dword 0x2f8
        000002A9 12C0        adc al,al
        000002AB 73F7        jnc 0x2a4
        000002AD 753F        jnz 0x2ee
        000002AF AA          stosb
        000002B0 EBD4        jmp short 0x286
        000002B2 E84D000000  call dword 0x304
        000002B7 2BCB        sub ecx,ebx
        000002B9 7510        jnz 0x2cb
        000002BB E842000000  call dword 0x302
        000002C0 EB28        jmp short 0x2ea
        000002C2 AC          lodsb
        000002C3 D1E8        shr eax,1
        000002C5 744D        jz 0x314
        000002C7 13C9        adc ecx,ecx
        000002C9 EB1C        jmp short 0x2e7
        000002CB 91          xchg eax,ecx
        000002CC 48          dec eax
        000002CD C1E008      shl eax,byte 0x8
        000002D0 AC          lodsb
        000002D1 E82C000000  call dword 0x302
        000002D6 3D007D0000  cmp eax,0x7d00
        000002DB 730A        jnc 0x2e7
        000002DD 80FC05      cmp ah,0x5
        000002E0 7306        jnc 0x2e8
        000002E2 83F87F      cmp eax,byte +0x7f
        000002E5 7702        ja 0x2e9
        000002E7 41          inc ecx
        000002E8 41          inc ecx
        000002E9 95          xchg eax,ebp
        000002EA 8BC5        mov eax,ebp
        000002EC B301        mov bl,0x1
        000002EE 56          push esi
        000002EF 8BF7        mov esi,edi
        000002F1 2BF0        sub esi,eax
        000002F3 F3A4        rep movsb
        000002F5 5E          pop esi
        000002F6 EB8E        jmp short 0x286
        000002F8 02D2        add dl,dl
        000002FA 7505        jnz 0x301
        000002FC 8A16        mov dl,[esi]
        000002FE 46          inc esi
        000002FF 12D2        adc dl,dl
        00000301 C3          ret
        00000302 33C9        xor ecx,ecx
        00000304 41          inc ecx
        00000305 E8EEFFFFFF  call dword 0x2f8
        0000030A 13C9        adc ecx,ecx
        0000030C E8E7FFFFFF  call dword 0x2f8
        00000311 72F2        jc 0x305
        00000313 C3          ret
    44 replies | 3362 view(s)
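    For readers who prefer C to tracing asm, here is a rough rendering of the LZBRS depacker listed above (my own reading of the listing, so treat it as a sketch; it assumes a little-endian host). A control-bit reader keeps flag bits in a tag byte: a clear bit copies one literal, a set bit reads two interlaced Elias-gamma values (distance+1 and count) and copies a match.

        #include <stdint.h>
        #include <stddef.h>
        #include <string.h>

        static const uint8_t *s;   /* input pointer  (esi) */
        static uint8_t *d;         /* output pointer (edi) */
        static uint8_t tag;        /* control-bit buffer (al) */

        static int getbit(void) {             /* add al,al / lodsb / adc al,al */
            int carry = tag >> 7;
            tag = (uint8_t)(tag << 1);
            if (tag == 0) {                   /* buffer empty: reload, re-insert sentinel */
                uint8_t next = *s++;
                carry = next >> 7;
                tag = (uint8_t)((next << 1) | 1);
            }
            return carry;
        }

        static uint32_t getgamma(void) {      /* interlaced Elias gamma, result >= 2 */
            uint32_t x = 1;
            do { x = (x << 1) | (uint32_t)getbit(); } while (getbit());
            return x;
        }

        static void lzbrs_depack(const uint8_t *src, uint8_t *dst) {
            uint32_t out_size;
            memcpy(&out_size, src, 4);        /* lodsd: uncompressed size */
            s = src + 4;
            d = dst;
            tag = 0x80;
            uint8_t *end = dst + out_size;    /* lea ebx,[edi+eax] */
            while (d < end) {
                if (!getbit()) { *d++ = *s++; continue; }   /* literal */
                uint32_t offs = getgamma() - 1;             /* lea edx,[ecx-1] */
                uint32_t cnt  = getgamma();
                for (uint32_t i = 0; i < cnt; i++, d++)     /* rep movsb, overlap-safe */
                    *d = *(d - offs);
            }
        }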
  • Sportman's Avatar
    Today, 11:29
    Basic computer stuff we use today such as the mouse https://en.wikipedia.org/wiki/Douglas_Engelbart#/media/File:SRI_Computer_Mouse.jpg, copy/paste/edit, networking, hypertext links, video conferencing, shared working etc. was already invented around 1965 as the oN-Line System https://en.wikipedia.org/wiki/NLS_(computer_system) at ARC https://en.wikipedia.org/wiki/Augmentation_Research_Center and SRI https://en.wikipedia.org/wiki/SRI_International and demonstrated on December 9, 1968 (The Mother of All Demos https://en.wikipedia.org/wiki/The_Mother_of_All_Demos):
    Summary: https://www.youtube.com/watch?v=B6rKUf9DWRI
    Full: https://www.youtube.com/watch?v=yJDv-zdhzMY
    42 replies | 2056 view(s)
  • Shelwien's Avatar
    Today, 07:25
    > But for so many to not give their reasons it shows it is not that important,
    More like nobody here cares about the question - I suppose you can try asking it on Quora instead: https://www.quora.com/profile/Matt-Mahoney-2 Also your posts are kind of hard to read.
    > You say its very important but you mainly show
    > money are the important core cause a few times.
    I don't understand why you'd expect some kind of ideology for people to work on data compression. Yes, with current tech we can avoid using any compression algorithms at all, so it's certainly not something essential, just a useful option. On the other hand, in cases without hardware solutions (e.g. we want to add some new features to a device's firmware, but can't replace the firmware flash chip, which has a limited size) we might have to look for software solutions.
    > Yet I assume you want it free to some degree despite it would save people money.
    There's a significant difference between a compression algorithm and a product based on it, which is actually designed to save money in some specific use cases.
    > As stated in my other post HD manufacturers don't make that much money.
    Well, you'd have to look at media content providers instead (netflix, youtube etc). Online video wouldn't be possible without significant effort invested into video codec development (which is also compression).
    > Some people pay for winrar which the gain is not better and even worse than
    > the free which doesn't justify but they are magically afloat.
    There's some weird logic in play, but it's actually easier for corporations to buy commercial software for common tasks rather than use free equivalents.
    > Well I was hoping for more than money reason
    There are plenty of cases where compression is useful (e.g. improvement of encryption security) or is the only solution to some technical problem (sending a picture in a twitter post). Soon enough (maybe in 10-20 years) it would also be the only way to improve storage density (once switching to a higher-density tech becomes too expensive). But since there are technical workarounds for using compression in most cases, money can be said to be the only real reason.
    > I guess everyone says the same which I am a bit disappointed.
    Well, your brain does a lot of data compression and translation between different coding methods (google "data compression human brain"), so it can be said that it's unavoidable? On the other hand, you don't have to know the implementation details for your brain to work.
    > which is why I keep saying a programmer can not discover a far better file
    > compression. They have the ability to write the code but not create it.
    That's not how it works at all.
    1) Programming is not a mechanical task like you see it. In most cases there's no one best solution which just requires "writing the code".
    2) In mathematical terms, compression is a solved problem (find the shortest set of cpu instructions that outputs the given data), but practically that solution is impossible to use. So it's up to programmers to find ways to make efficient algorithms for available hardware - there's plenty of creativity involved, in fact that's one of the main reasons why compression is interesting.
    3) Everything depends on the volume of external information. In theory, we can replace any file with a reference to some global storage system - that's already near-infinite compression in practice, for most users. Again, the problem is purely technical - we need a solution based on cheap resources, not just something theoretical.
    This means that to compress a file we need to enumerate all versions of its content compatible with additional known information, then encode the index of the one specific version which matches the file data. If you ever try using combinatorics to compute the number of content versions in some simple cases, you'd see that the compression ratio is very limited when the volume of external information is low (dictionaries etc.) - log2 of the number of possible data versions would be close to the original size of the file.
    3 replies | 80 view(s)
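    As a toy illustration of the combinatorial argument above (my own example, not Shelwien's): suppose the only "external information" is that a 1,000,000-bit file contains exactly k one-bits. The index of the file among all compatible versions then needs log2(C(n,k)) bits, which the snippet below computes via lgamma; for k near n/2 the result stays within a fraction of a percent of the original n bits.

        #include <math.h>
        #include <stdio.h>

        /* log2 of the binomial coefficient C(n,k), via log-gamma to avoid overflow. */
        static double log2_binomial(double n, double k) {
            return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)) / log(2.0);
        }

        int main(void) {
            double n = 1000000.0;                      /* a 1,000,000-bit "file" */
            double ks[] = {500000, 400000, 100000, 10000};
            for (int i = 0; i < 4; i++)
                printf("k = %7.0f ones: index needs %.0f bits (%.2f%% of original)\n",
                       ks[i], log2_binomial(n, ks[i]),
                       100.0 * log2_binomial(n, ks[i]) / n);
            return 0;
        }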
  • Trench's Avatar
    Today, 04:42
    Thanks. But for so many to not give their reasons shows it is not that important, especially for the person reading this who did not reply. lol
    Your reply is interesting. You say it's very important, but you mainly show money as the important core cause a few times. If that is the case then it is not important in the way some may see it. Yet I assume you want it free to some degree, despite that it would save people money. As stated in my other post, HD manufacturers don't make that much money. But if it's to save money, then how much money will people pay to save money? :) But only if it gets significant results. Some people pay for WinRAR, where the gain is no better and even worse than the free alternatives, which doesn't justify it, but they are magically afloat. When I said "if it's a game", I kind of find a hobby and competition a bit of a game, done for fun.
    Well, I was hoping for more than the money reason, but at least you stated one good thing besides money, which is a faster CPU. ;) I guess everyone says the same, which leaves me a bit disappointed. I was telling people about the internet before anyone went online, but it was hard to describe something that had not been around yet, although I did say it could do anything and everything. It's hard to describe things no one has experienced yet. The same goes for file compression: we only know what we experience, since creativity is not as easy as many may assume, which is why I keep saying a programmer can not discover a far better file compression. They have the ability to write the code but not to create it.
    It reminds me of what a computer science professor / entrepreneur, David Gelernter, says: "The thing I don't look for in a developer is a degree in computer science." He beat Apple in court to win his patent case, and he doesn't go with the free mentality since, as it shows, corporations use others with good intentions to gain from their hard work. What he said is not an insult to programmers, but a limitation of mastering another field; it would be an insult to that other field of work if anyone could do it. Jack of all trades, master of none, as the saying goes. It's not a discouragement but an encouragement for everyone to focus their skills to be more efficient and get results. There are a few exceptions, but that does not make the rule.
    On a side note, this free mentality hurts people who want things for free, when corporations benefit from it to crush the competition that makes free things, so that they can sell their own product that doesn't have that much passion, and innovation gets stuck. People with passion should not give their passion away, and it should be as valuable as their desire. Just my view. Maybe if I ask people in another field of work like philosophy, math, or theology I might get a different answer. :)
    3 replies | 80 view(s)
  • well's Avatar
    Today, 01:26
    8 bits has been an octet since the beginning of time
    2 replies | 93 view(s)
  • suryakandau@yahoo.co.id's Avatar
    Yesterday, 23:04
    @Ms1 have you received my newest submission for GDCC? Thank you
    51 replies | 4264 view(s)
  • moisesmcardona's Avatar
    Yesterday, 20:47
    moisesmcardona replied to a thread paq8px in Data Compression
    Maybe the problem was that I overloaded the CPU. I've set a 50% task limit on my machine and the CPU is at around 70%. I also recompiled it with -DNATIVECPU=ON since I run paq8px exclusively on my AVX2 CPUs, and it works on both AMD and Intel just fine. Just hoping I don't get into the 100%-extract issue I've run into before. I'm not really sure if that was caused by that compilation flag. Will report how it goes.
    2027 replies | 554537 view(s)
  • lz77's Avatar
    Yesterday, 14:47
    lzturbo -22 -b450 (LZ77 only) compresses TS40.txt to 167.9 MB. lzturbo -32 -b450 (LZ77 + "Asymmetric Numeral System TurboANX" (which, I think, should be called TurboANS...)) compresses it to 125 MB. How does lzturbo with TurboANS decrease the compressed size by 43 MB while spending only 1 sec. on it?? Maybe lzturbo -32 is not LZ77+TurboANS but something other than LZ77+TurboANS?
    4 replies | 316 view(s)
  • lz77's Avatar
    Yesterday, 12:45
    From https://globalcompetition.compression.ru/test4api/ : "The size of the input buffer is the block size to be used in the test (inSize = 32,768 bytes)." I think it would be great to add 32 bytes to this inSize, because an LZ77 compressor may try to read a few (two or more) bytes past the end of the input buffer. Checking for out-of-bounds access of the buffer costs time...
    51 replies | 4264 view(s)
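    What the suggestion above amounts to, as a sketch (hypothetical helper names, not the actual GDCC test-4 API): allocate the input block with a small amount of readable padding after it, so a matcher that reads a few bytes past the last valid position never touches unmapped memory and the hot loop can skip per-byte bounds checks.

        #include <stdlib.h>
        #include <string.h>

        #define BLOCK_SIZE 32768
        #define PADDING    32      /* extra readable bytes after the block, as proposed above */

        /* Hypothetical helper: in[0..n-1] holds valid data, in[n..n+PADDING-1]
           is zeroed, safely readable slack for the matcher. */
        static unsigned char *alloc_padded_block(const unsigned char *data, size_t n) {
            unsigned char *in = malloc(n + PADDING);
            if (!in) return NULL;
            memcpy(in, data, n);
            memset(in + n, 0, PADDING);
            return in;
        }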
  • Jarek's Avatar
    Yesterday, 08:56
    https://jpeg.org/items/20200803_press.html Part 1: Core coding system: https://www.iso.org/standard/77977.html
    40 replies | 4097 view(s)
  • comp1's Avatar
    Yesterday, 08:14
    :D
    2 replies | 93 view(s)
  • Shelwien's Avatar
    Yesterday, 08:11
    > How important is compression?
    Very, but there's a lot of duplicated terminology. Any prediction, generation, recognition or optimization is closely related to compression, and basically any AI/ML/NN too.
    > Everyone talks about how to compress but why compress is not mentioned much or maybe I missed it.
    To save storage space or communication bandwidth, which have significant costs.
    > What would be the benefits of a better compression?
    More saved money.
    > What would be a detriment of better compression?
    Extra latency on data access, maybe obscure bugs/exploits.
    > Obviously not all compression are equal and the better you compress the better results.
    Actually there's no one clear metric of "compression goodness". Different use cases require different solutions.
    > Hard drive space has increased more than file compression from the 1980.
    The technology at that time was simply that rough.
    > I remember hard drives under 30MB and now over 30TB while compression did
    > not gain that much in percentage maybe due to lack of financial gains.
    It's not about money. Random compression methods don't exist, and lossless compression improvements are quite hard to discover and implement.
    > If file compression was better HD technology would not have increase as fast.
    Hardware technology improvements would stop soon enough, because of physical limits... in fact storage now is only ~2x cheaper than it was in 2011, and the sequence is logarithmic: https://www.backblaze.com/blog/wp-content/uploads/2017/07/chart-cost-per-gb-2017.jpg
    > If you make a file be compressed 10% how much of a benefit would it be?
    Is that "to 10%" or "by 10%"? In any case, if file size is reduced by 10%, then 10% of its storage cost is saved; it directly maps to money: without compression you needed 10 SSDs, now you need 9.
    > Is this a game for most how some kind of indicated?
    "For most" of whom? Some people like it as a hobby, some have related jobs, some like competitions and benchmarks.
    > Can it benefit or save the world?
    Well, it's more likely to destroy it... we'd keep increasing the density of randomness (that is, compressed/encrypted data) until the universe breaks :)
    > Can it make CPU faster?
    It already does - branch prediction in recent cpus is pretty similar to CM.
    > Or is it something like to just to store all movies in your pocket?
    Newer movies would just have higher resolution, so they'd never all fit in any case :)
    > What is your perspective the better a file is compressed or not??
    Actually it's better if it isn't - you won't lose it from a single-bit error then. But at this point it's hard to fully avoid it - for example, HDDs always use some entropy coding to store the data, it won't work otherwise: https://en.wikipedia.org/wiki/Run_length_limited#Need_for_RLL_coding and SSDs sometimes have integrated LZ compression.
    3 replies | 80 view(s)
  • Shelwien's Avatar
    Yesterday, 06:50
    Well, http://fuckilfakp5d6a5t.onion.pet/7.3/ - that site has all of them, and it's on TOR, this is just a gate.
    3 replies | 398 view(s)
  • Trench's Avatar
    Yesterday, 01:12
    The US patent office granted 2 patents for random compression:
    Patent 5,488,364 on compression of random data (expired 2000, failure to pay maintenance fees)
    Patent 5,533,051 on compression of random data (expired 2008, failure to pay maintenance fees)
    https://www.uspto.gov
    Oddly you can't get a patent for a formula, but you can get a copyright? I am surprised the US patent office gave them a patent. Maybe because it's a racket to take money? If KFC patented their secret ingredients they might be out of business. But the issue is: do they work? One site says they believe it's BS despite the granted patent. It wasn't clear anyway how they wanted to implement it. http://gailly.net/05533051.html and http://gailly.net/05488364.html Description of one of them: http://www.freepatentsonline.com/5488364.html
    On another note, the top storage companies: Western Digital gross profit 3,163,000 https://finance.yahoo.com/quote/WDC/financials?p=WDC Seagate gross profit 2,842,000 https://finance.yahoo.com/quote/STX/financials?p=STX In short, very little. So it's not as if those companies can afford to pay you if you have something.
    1 replies | 139 view(s)
  • Trench's Avatar
    2nd August 2020, 17:35
    Everyone talks about how to compress, but why compress is not mentioned much, or maybe I missed it.
    What would be the benefits of a better compression? What would be a detriment of better compression? Obviously not all compression is equal, and the better you compress the better the results.
    Hard drive space has increased more than file compression since the 1980s. I remember hard drives under 30MB and now over 30TB, while compression did not gain that much in percentage, maybe due to lack of financial gains. If file compression was better, HD technology would not have increased as fast.
    If you make a file be compressed 10%, how much of a benefit would it be? What if 100% or 1000%? Does it matter, and if so in what way? Is this a game for most, as some kind of indicated? Or do some use this as a stepping stone for a better job to put on a resume? Can it benefit or save the world? Can it make a CPU faster? Or is it something like just to store all movies in your pocket? Everyone has a reason, so what are yours, since everyone has a different perspective.
    To get the ball rolling: a 10% gain might help with maybe 1% of e-waste, less pollution, fewer files being lost. Can CPUs also go faster? The bad thing about file compression is it can hurt the economy, since if no one buys an updated phone with more space or a bigger HD, then there are fewer sales, a lower stock market, less taxes. A 1000% gain might help with maybe 10%?? My percentages can obviously be wrong, it's a guess. Would higher be better to make a big change, or just nice to have?
    What is your perspective, the better a file is compressed or not??
    3 replies | 80 view(s)
  • JamesWasil's Avatar
    2nd August 2020, 16:48
    And 4 bits is a nibble... Are 2 bits just a taste of what's there?
    2 replies | 93 view(s)
  • JamesWasil's Avatar
    2nd August 2020, 16:46
    Did he steal that from Maxwell and Lorentz, too?
    1 replies | 63 view(s)
  • LawCounsels's Avatar
    2nd August 2020, 03:03
    Einstein's unrecognised masterstroke: variable speed of light! ... how Minkowski led all of physics astray with 3+1 dimensional Space-Time https://m.youtube.com/watch?v=TDjgQ_megMI
    1 replies | 63 view(s)
  • Sportman's Avatar
    1st August 2020, 23:56
    Input: 1,000,000,124 bytes, enwik9.bwt
    Output:
    468,906,482 bytes, 3.025 sec. 315.26 MiB/s - 1.414 sec. 674.45 MiB/s, 46.89%, rle 0.0.0.6 -bench VB.NET
    468,906,482 bytes, 3.795 sec. 251.30 MiB/s - 2.571 sec. 370.94 MiB/s, 46.89%, rle 0.0.0.6 VB.NET
    468,906,482 bytes, 1.766 sec. 540.02 MiB/s - 0.922 sec. 1034.35 MiB/s, 46.89%, rle 0.0.0.6 -bench C++ GCC
    468,906,482 bytes, 2.765 sec. 344.91 MiB/s - 1.437 sec. 663.66 MiB/s, 46.89%, rle 0.0.0.6 C++ GCC
    468,906,482 bytes, 1.826 sec. 522.28 MiB/s - 1.126 sec. 846.96 MiB/s, 46.89%, rle 0.0.0.6 -bench C++ Intel
    468,906,482 bytes, 2.683 sec. 355.45 MiB/s - 1.394 sec. 684.13 MiB/s, 46.89%, rle 0.0.0.6 C++ Intel
    468,906,482 bytes, 2.032 sec. 469.33 MiB/s - 1.202 sec. 793.41 MiB/s, 46.89%, rle 0.0.0.6 -bench C++ VS
    468,906,482 bytes, 2.975 sec. 320.56 MiB/s - 1.506 sec. 633.25 MiB/s, 46.89%, rle 0.0.0.6 C++ VS

    Input: 400,000,052 bytes - TS40.bwt
    Output:
    235,490,336 bytes, 1.349 sec. 282.78 MiB/s 0.613 sec. 622.30 MiB/s, 58.87%, rle 0.0.0.6 -bench VB.NET
    235,490,336 bytes, 1.690 sec. 225.72 MiB/s 1.105 sec. 345.22 MiB/s, 58.87%, rle 0.0.0.6 VB.NET
    235,490,336 bytes, 0.765 sec. 498.65 MiB/s 0.375 sec. 1017.25 MiB/s, 58.87%, rle 0.0.0.6 -bench C++ GCC
    235,490,336 bytes, 1.234 sec. 309.13 MiB/s 0.624 sec. 611.33 MiB/s, 58.87%, rle 0.0.0.6 C++ GCC
    235,490,336 bytes, 0.850 sec. 448.79 MiB/s 0.468 sec. 815.11 MiB/s, 58.87%, rle 0.0.0.6 -bench C++ Intel
    235,490,336 bytes, 1.182 sec. 322.73 MiB/s 0.607 sec. 628.45 MiB/s, 58.87%, rle 0.0.0.6 C++ Intel
    235,490,336 bytes, 0.929 sec. 410.62 MiB/s 0.501 sec. 761.42 MiB/s, 58.87%, rle 0.0.0.6 -bench C++ VS
    235,490,336 bytes, 1.318 sec. 289.43 MiB/s 0.670 sec. 569.36 MiB/s, 58.87%, rle 0.0.0.6 C++ VS
    10 replies | 747 view(s)
  • Sportman's Avatar
    1st August 2020, 23:05
    Added RLE version 0.0.0.6 with encoding fix for some cases.
    10 replies | 747 view(s)
  • withmorten's Avatar
    1st August 2020, 20:13
    Hey there, do you have another link for the 7.3 sdk? The MEGA link is already offline and I didn't manage to grab it :)
    3 replies | 398 view(s)
  • lz77's Avatar
    1st August 2020, 17:24
    Hm... I doubt that Huffman compression of literal/match counters as well as prefixes can give much... Yes, match length = 4 is very common, but what does that buy? By the way: what is compressed after LZ4 in zstd - offsets or counters? I'm confused... :)
    4 replies | 316 view(s)
  • Dresdenboy's Avatar
    1st August 2020, 16:37
    I think the best way to see potential optimizations is to gather a lot of statistics (distribution and frequency of matches for len 1 to max len, match pos, literal run lengths, match run lengths etc.).
    4 replies | 316 view(s)
  • Dresdenboy's Avatar
    1st August 2020, 16:29
    It's actually not that surprising. ;) GCC has added good features over the last versions, and MS VC never was one of the best - it just does what it is supposed to do. I think this was not much different in the 90s, with Borland and Watcom as competitors, later ICC, PGI, Sun, Clang/LLVM and some other good ones for x86/x64 (especially for vectorization and autoparallelization). But it is surprising that GCC is now sometimes better than ICC on some archs. This can be seen in compiler usage in SPEC benchmark runs. Addendum: Of course, one reason for that is that GCC got dev support from CPU manufacturers like AMD. I tracked their contributions in the past to be able to derive the microarchitecture of their next CPU families (Bulldozer and Zen), like in this blog entry (which is even linked from the Zen Wikipedia article ;)).
    10 replies | 747 view(s)
  • Sportman's Avatar
    1st August 2020, 15:55
    Added RLE version 0.0.0.5 with improved encoding speed and Linux/Mac OS X displayed time/speed fix.
    10 replies | 747 view(s)
  • lz77's Avatar
    1st August 2020, 11:16
    My simple kids' LZ77 compressor is faster than lzturbo -22 but compresses the file TS40.txt better. I also wrote a simple preprocessor for this file. Preprocessor + fast LZ77 compress TS40.txt to 143 MB, but I want to get the final size down to at least 120 MB. I tried to use static Huffman for additional compression: I gathered the 8 upper bits from each offset into one file (62 MB), then compressed it with the Huffman codec. I expected to get it down to at least 35 MB, but got 60 MB! I'm surprised: lzturbo without any preprocessor compresses TS40.txt to 125 MB. How and what does lzturbo compress after LZ77? Huffman compression of literals will not give much because the size of the literals is 4.2 MB. Does tANS/rANS compress much better than static Huffman? I don't believe it... :_help2:
    4 replies | 316 view(s)
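    One quick way to check whether static Huffman (or any order-0 coder, tANS/rANS included) can possibly reach 35 MB on that 62 MB offset-byte file is to compute its order-0 entropy; if the histogram of the upper offset bytes is nearly flat, ~60 MB is simply the limit. A small sketch (the file name is hypothetical):

        #include <stdio.h>
        #include <math.h>
        #include <stdint.h>

        int main(void) {
            FILE *f = fopen("offsets_high.bin", "rb");   /* hypothetical input file */
            if (!f) return 1;
            uint64_t hist[256] = {0}, total = 0;
            unsigned char buf[1 << 16];
            size_t r;
            while ((r = fread(buf, 1, sizeof buf, f)) > 0) {
                for (size_t i = 0; i < r; i++) hist[buf[i]]++;
                total += r;
            }
            fclose(f);
            double bits = 0;                             /* Shannon order-0 bound */
            for (int i = 0; i < 256; i++)
                if (hist[i]) bits -= hist[i] * log2((double)hist[i] / total);
            printf("order-0 entropy: %.0f bytes (%.3f bits/byte)\n", bits / 8, bits / total);
            return 0;
        }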
  • Shelwien's Avatar
    1st August 2020, 03:13
    7.4 installer leak, without password: https://bbs.pediy.com/thread-261050.htm
    3 replies | 398 view(s)
  • Ms1's Avatar
    1st August 2020, 01:25
    Thanks, noted. I guess there are much better options in other situations as well, and we can try to find them automatically. We would be happy to get submissions from authors, though.
    51 replies | 4264 view(s)
  • Ms1's Avatar
    1st August 2020, 01:19
    If possible, avoid attaching files. Mail servers nowadays know better than users. A link is safe (probably). There is no license grant or whatever. A submitted compressor belongs to the author(s). Actually, we don't send the executables to Huawei, believe it or not. We can't say if Huawei will be interested in buying something. But, apparently, certain divisions of the company are interested in the topic now. My own opinion: Reverse engineering, if you care, does not make sense. I hardly see how any big company may be doing this in such situations. If there will be something useful to reverse engineer, it will be cheaper to buy the author. Lock, stock, and barrel.
    51 replies | 4264 view(s)
  • Ms1's Avatar
    1st August 2020, 01:03
    Not blocked, but you are heading that way. We are glad that you stopped ignoring GPL, and the next step is showing and proving that the submitted compressors essentially differ from the original works created by other people.
    51 replies | 4264 view(s)
  • Sportman's Avatar
    1st August 2020, 00:33
    Added RLE version 0.0.0.4 with encoding fix for some cases and added Intel compiler binary.
    10 replies | 747 view(s)
  • JamesWasil's Avatar
    31st July 2020, 23:02
    Google search results are no longer good after 2013. They return bad results and the opposite of what you search for, or things that are not relevant. Part of this is due to commercialization and ads, but other reasons are censorship and other agendas that make things no longer findable there and search results from Google are now often useless. Usually asking here on the forum, someone will know. Stackexchange sometimes, and if you have to use a search engine, duckduckgo, Bing, dogpile, and Yahoo (for older but still semi-relevant results) will be better than Google for anything.
    6 replies | 264 view(s)
  • JamesWasil's Avatar
    31st July 2020, 22:51
    Gmail won't let you send attachments as 7zip/7z either anymore. For a while they did up until about 2008 or 2010. Then after 2010 they made it to where you can't send 7z and other attachments anymore that they recognize. Google screwed me over royally when they did that, because I sent 7z compressed archives of important documents to myself via email as a time capsule for cloud storage when needed again, but now Google refuses to let me download the attachments that were already sent to myself because it was blocked later! Google basically traps the data you send to yourself by email this way, making it inaccessible if or when they ban the attachment type. If I didn't still have a local copy of what I saved and compressed but can no longer get to, I'd have lost the data permanently. Plenty of reasons to still never trust Gmail or any other form of cloud storage. Best way to share files long term is to have your own http or ftp server. Next best way is free file host. Never Google, Google Drive, Dropbox, or anything like it. It will disappear or be refused read or download later whenever others want.
    4 replies | 248 view(s)
  • Gotty's Avatar
    31st July 2020, 21:58
    Gotty replied to a thread paq8px in Data Compression
    Sounds strange, indeed. I believe using an AMD CPU is not a problem. Does it have AVX2? If you use the -v option when you start compressing your files, does it report that it uses AVX2? You may benefit from recompiling the source with -march=native and -mtune=native. Remark: the tweaks in v189 do not affect compression speed.
    2027 replies | 554537 view(s)
  • Darek's Avatar
    31st July 2020, 18:12
    Darek replied to a thread paq8px in Data Compression
    @CompressMaster - I'll do that. I'll also check the latest 7zip options. Done. Scores for different archivers:
    40'405'504 - tar version
    20'762'347 - zip version - my previous file
    18'547'630 - 7zip bzip2 version
    17'951'250 - 7zip ppm2 version
    17'159'665 - Winrar - rar version
    16'103'044 - 7zip - lzma version
    16'083'346 - 7zip - lzma2 version
    15'625'846 - Winrar - rar4 version - uploaded
    but: score for Winrar 3.70b5 (max options for particular files) = 14'956'926
    @moisesmcardona - how much slower is the -l option? It should be about 3-3.5 times slower than without LSTM.
    2027 replies | 554537 view(s)
  • CompressMaster's Avatar
    31st July 2020, 17:04
    CompressMaster replied to a thread paq8px in Data Compression
    @Darek, you could slightly decrease upload filesize if you compress DBA corpus as RAR with highest settings via winrar. I was able to shrink it down to 17,694,628 bytes.
    2027 replies | 554537 view(s)
  • moisesmcardona's Avatar
    31st July 2020, 14:16
    moisesmcardona replied to a thread paq8px in Data Compression
    Nope. I have 128GB RAM with over 60GB free RAM
    2027 replies | 554537 view(s)
  • LucaBiondi's Avatar
    31st July 2020, 14:08
    LucaBiondi replied to a thread paq8px in Data Compression
    Maybe you are running out of memory?
    2027 replies | 554537 view(s)
  • suryakandau@yahoo.co.id's Avatar
    31st July 2020, 14:04
    I have changed the context mixing process. Block40.dat and ts40.txt got better too. It can compress and decompress OK and does not freeze at the end like MCM v0.84.
    90 replies | 39313 view(s)
  • moisesmcardona's Avatar
    31st July 2020, 13:35
    moisesmcardona replied to a thread paq8px in Data Compression
    It's been over 3 days now compressing with v189 using LSTM and the compression progress is at 17%. Since v188 took 3 days, I wonder if this could be a result of: 1. using an AMD CPU 2. the new 24-bit images tweak. I'll probably abort some tasks and run on an intel CPU and see how long it takes there. I'm compressing the same files I compressed on v188 with LSTM.
    2027 replies | 554537 view(s)
  • lz77's Avatar
    31st July 2020, 12:05
    Maybe it's possible to encrypt a rar archive together with its header?
    4 replies | 248 view(s)
  • madserb's Avatar
    31st July 2020, 12:04
    madserb replied to a thread MCM + LZP in Data Compression
    Could you please inform us what changes you made? mixed40.dat compressed better but is there loss or gain for other files?
    90 replies | 39313 view(s)
  • lz77's Avatar
    31st July 2020, 11:50
    Ms1: Hopefully we will have a sufficient number of participants 1-2 months later to build the real leaderboards. https://globalcompetition.compression.ru/#leaderboards
    In the leaderboards for lzturbo you could select the option -b1000 (-b400 for the test data) for a bit better compression...
    51 replies | 4264 view(s)
  • CompressMaster's Avatar
    31st July 2020, 11:39
    Not only exes. I noticed problems when I tried to send a .RAR or .ZIP file. I resolved it by altering the extension, i.e. file.exe.SUBOR
    4 replies | 248 view(s)
  • Shelwien's Avatar
    31st July 2020, 11:03
    Sure, I just did that before finding out that I have to write the number of floats to compressed file.
    6 replies | 347 view(s)
  • Shelwien's Avatar
    31st July 2020, 11:00
    gmail doesn't like exes in attachments: https://support.google.com/mail/answer/6590?hl=en Just upload it to some filehosting and send the link. Or did you mean something different?
    4 replies | 248 view(s)
  • suryakandau@yahoo.co.id's Avatar
    31st July 2020, 09:57
    I have sent a submission for GDCC via email, but why does it always block my attachment?
    4 replies | 248 view(s)
  • suryakandau@yahoo.co.id's Avatar
    31st July 2020, 09:19
    Why can I not send a submission for GDCC? I have emailed globalcompetition@compression.ru and it blocked my submission.
    51 replies | 4264 view(s)
  • MegaByte's Avatar
    31st July 2020, 08:18
    MegaByte replied to a thread MCM + LZP in Data Compression
    No, they are collaborating on the code. https://github.com/hxim/paq8px/pulls?q=is%3Apr+is%3Aclosed That's not what you're doing.
    90 replies | 39313 view(s)
  • suryakandau@yahoo.co.id's Avatar
    31st July 2020, 07:58
    I just do it like in paq8px, where the original author is Jan Ondrus but Marcio Pais and Gotty don't start a new branch.
    90 replies | 39313 view(s)
  • Baker22's Avatar
    31st July 2020, 05:33
    Just as a quick followup, I think there might be a small error in how you size the out bytestream. Its size seems to be based on the read binary file, even when decoding. So I think it is possible for it to size the out stream too small and result in a segfault. This happened for me when using the p4nzzenc128v32 compressor (it had a compression ratio of around 4%, since I had data where delta encoding was very effective). To fix this, I just initialized the byte stream, and declared its size later based on f_len*4 for the encoding and n*4 for the decoding. It is also very possible I was doing something incorrectly, but I just wanted to let you know what I found!
    6 replies | 347 view(s)
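    For reference, the sizing change described above as a sketch (f_len and n follow the names used in the post; the slack constant is my own assumption, not a documented TurboPFor requirement):

        #include <stdlib.h>
        #include <stdint.h>

        #define SLACK (1024 * 1024)   /* headroom for blocks that don't compress */

        static unsigned char *alloc_enc_out(size_t f_len) {   /* f_len = floats read from the file */
            return malloc(f_len * 4 + SLACK);                 /* worst case: ~4 bytes per value + slack */
        }
        static uint32_t *alloc_dec_out(size_t n) {            /* n = number of values to decode */
            return malloc(n * 4);
        }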
  • AlexBa's Avatar
    31st July 2020, 04:32
    Exactly! Thank you so much. For some reason google searches kept pointing me to tools for many dimensional data, and not for many points. I will work on implementing this.
    6 replies | 264 view(s)
  • Baker22's Avatar
    31st July 2020, 04:29
    Oh wow I was being pretty dumb. I didn't realize that you could just create byte pointers and cast them later. Thank you for your help!!
    6 replies | 347 view(s)
  • hexagone's Avatar
    31st July 2020, 02:53
    hexagone replied to a thread MCM + LZP in Data Compression
    Can you please rename your small changes mcm083_sk... or something ? Make it clear that it is your branch, not a release by the original author. Thanks.
    90 replies | 39313 view(s)
  • suryakandau@yahoo.co.id's Avatar
    31st July 2020, 02:37
    input 40,405,504 bytes dbacorpus.tar (darek benchmark) output 14,516,571 in 46.912s using -x11
    90 replies | 39313 view(s)
  • suryakandau@yahoo.co.id's Avatar
    31st July 2020, 02:31
    MCM v083.6 input 400,000,000 mixed40.dat output 52,198,319 in 290.809s
    90 replies | 39313 view(s)
  • Shelwien's Avatar
    31st July 2020, 02:19
    Seems to work like this?
    6 replies | 347 view(s)
  • Ms1's Avatar
    31st July 2020, 00:45
    Results for a number of publicly available compressors are published as reference data for Tests 1, 2, 3. Hopefully we will have a sufficient number of participants 1-2 months later to build the real leaderboards. https://globalcompetition.compression.ru/#leaderboards Also API for Test 4 is defined. https://globalcompetition.compression.ru/test4api/
    4 replies | 937 view(s)
  • Baker22's Avatar
    31st July 2020, 00:02
    Thank you for your help! I have one issue with my implementation that I am running into. When calling a floating point compressor (like the ones listed in fp.h) it asks for a uint32_t pointer as the input (here's a snippet of fp.h):

        // ---------- TurboPFor Zigzag of delta (=delta of delta + zigzag encoding) (TurboPFor)
        size_t p4nzzenc128v32( uint32_t *in, size_t n, unsigned char *out, uint32_t start);
        size_t p4nzzdec128v32( unsigned char *in, size_t n, uint32_t *out, uint32_t start);
        //----------- Zigzag (bit/io) -------------------------------------------------------
        size_t bvzenc32( uint32_t *in, size_t n, unsigned char *out, uint32_t start);
        size_t bvzdec32( unsigned char *in, size_t n, uint32_t *out, uint32_t start);
        //----------- Zigzag of delta (bit/io) ---------------------------------------------
        size_t bvzzenc32( uint32_t *in, size_t n, unsigned char *out, uint32_t start);
        size_t bvzzdec32( unsigned char *in, size_t n, uint32_t *out, uint32_t start);

    To get this uint32_t pointer, I am currently using memcpy to reinterpret my floats as uint32_t and point to them. However, while this does work for running the compressor and getting my input back when compressing and decompressing, the compression ratios are awful (greater than 1). I'm guessing this means I am creating my uint32_t pointer incorrectly. How else should I do it? I also tried premultiplying my floats by 10^6 to create uint32_t's, and while this did work better, it seems like it shouldn't be necessary since TurboPFor explicitly advertises floating point compression. Is there something that I am missing? I posted a snippet of my code below. I'm sure there are plenty of errors in it (I'm not the most experienced), but maybe it will help with answering my question. The code opens a binary file f, and writes the compressed version to w, and after decoding rewrites the original to h so that I can check that the compression/decompression had no unintended effects.

        else {
            FILE* f = fopen( argv, "rb" );
            if( f==0 ) return 2;
            fseek(f, 0, SEEK_END);
            int n = ftell(f);
            rewind(f);
            FILE* w = fopen( argv, "wb" );
            if( w==0 ) return 2;
            FILE* h = fopen( argv, "wb" );
            if( h==0 ) return 2;
            unsigned char *out;
            uint32_t *in, *cpy;
            in = (uint32_t*)malloc((n + 1024*1024));
            out = (unsigned char*)malloc(n+1024*1024);
            cpy = (uint32_t*)malloc(n+1024*1024);
            uint32_t *checker = cpy;
            if (in == NULL or out == NULL or cpy == NULL) {
                printf("Memory not allocated.\n");
                exit(0);
            }
            size_t numEl = 0;
            float a;
            int i=0;
            while(1) {
                if( fread( &a, sizeof(float),1, f )!=1 ) break;
                memcpy(&in, &a, 4);
                numEl++;
            }
            size_t newEl = p4ndenc128v32(in, n/4, out);
            float ratio = (float)newEl/numEl;
            fwrite(&n, sizeof(float),1, w);
            fwrite(out,sizeof(float),newEl,w);
            //size_t p4nzzdec128v32( unsigned char *in, size_t n, uint32_t *out, uint32_t start);
            size_t finalSize = p4nddec128v32(out, n/4, cpy);
            float b;
            for (int j=0; j<n/4; j++){
                memcpy(&b,&cpy, 4);
                fwrite(&b,1, sizeof(float),h);
            }
    6 replies | 347 view(s)
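    Regarding the "how do I get a uint32_t pointer from floats" part: the usual trick (and apparently what worked for the poster later in the thread) is to read the raw floats into a buffer and hand the same 32-bit patterns to the integer codec, either by a bulk memcpy or by simply casting the float buffer. A minimal sketch of my own, assuming only the p4nzzenc128v32 prototype quoted above:

        #include <stdio.h>
        #include <stdint.h>
        #include <stdlib.h>
        #include <string.h>

        /* Prototype exactly as quoted from fp.h above (TurboPFor). */
        size_t p4nzzenc128v32(uint32_t *in, size_t n, unsigned char *out, uint32_t start);

        int main(int argc, char **argv) {
            if (argc < 2) return 1;
            FILE *f = fopen(argv[1], "rb");
            if (!f) return 1;
            fseek(f, 0, SEEK_END);
            size_t n = (size_t)ftell(f) / sizeof(float);   /* number of floats in the file */
            rewind(f);

            float         *fl  = malloc(n * sizeof(float));
            uint32_t      *in  = malloc(n * sizeof(uint32_t));
            unsigned char *out = malloc(n * sizeof(uint32_t) + 1024 * 1024);  /* worst case + slack */
            if (!fl || !in || !out || fread(fl, sizeof(float), n, f) != n) return 1;
            fclose(f);

            memcpy(in, fl, n * sizeof(float));             /* reinterpret float bits as uint32_t */
            size_t csize = p4nzzenc128v32(in, n, out, 0);
            printf("%zu floats -> %zu bytes\n", n, csize);
            return 0;
        }

    Whether the ratio actually improves still depends on the data: the integer zigzag/delta filters then operate on raw IEEE-754 bit patterns, so pre-scaling or the dedicated floating-point routines may behave quite differently.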
  • Bulat Ziganshin's Avatar
    30th July 2020, 21:41
    > igzip could decode libdeflate -9 output, so it's not just customised for specific predefined huffman trees
    It may contain BOTH a customized decoder and a generic one. You should check decompression speed specifically on the libdeflate -9 output.
    > I'm particularly impressed with how it stacks up against zstd -1 and bro --quality 0 with regards to speed vs size tradeoffs, while still using a compliant gzip stream. It's not a million miles away from lz4 - faster for compression anyway!
    Modern compressors are optimized toward binary data and large dictionaries. In particular, zstd -1 has min_match_len=5 or so, and also includes tricks such as repdist that are useless for text data. The brotli format seems to be optimized toward its high compression modes.
    14 replies | 1232 view(s)
  • Darek's Avatar
    30th July 2020, 20:23
    Darek replied to a thread paq8px in Data Compression
    Yes, it was partially available. It's not official; there is no place on the internet to get it. Here is the testbed - the files and the description I've made. Due to the fact that some files were taken from programs existing in 1996, or are images from non-verified sources, all files are for testing purposes only.
    2027 replies | 554537 view(s)
  • Eppie's Avatar
    30th July 2020, 16:54
    Eppie replied to a thread paq8px in Data Compression
    @Darek is it possible for me to download your testset? I would like to do some testing of my own, and I've searched the forum but haven't found it available. @Gotty/@mpais: Glad to see all the recent activity from you two :)
    2027 replies | 554537 view(s)
  • Darek's Avatar
    30th July 2020, 11:46
    Darek replied to a thread paq8px in Data Compression
    enwik scores for the latest versions of paq8px:
    16'190'519 - enwik8 -12 by Paq8px_v187fix2, change: 0,00% - this version's changes are compared to paq8px_v183fix1
    16'080'588 - enwik8 -12eta by Paq8px_v187fix2, change: -1,74%
    15'889'931 - enwik8.drt -12eta by Paq8px_v187fix2, change: -1,18%
    127'626'051 - enwik9_1423 -12eta by Paq8px_v187fix2, change: -4,37%
    124'786'260 - enwik9_1423.drt -12eta by Paq8px_v187fix2, change: -4,07%
    15'900'206 - enwik8 -12leta by Paq8px_v188, change: -1,12%
    15'503'221 - enwik8.drt -12leta by Paq8px_v188, change: -2,43%
    15'907'081 - enwik8 -12leta by Paq8px_v188b, change: 0,04%
    15'505'761 - enwik8.drt -12leta by Paq8px_v188b, change: 0,02%
    15'896'588 - enwik8 -12leta by Paq8px_v189, change: -0,07%
    15'490'302 - enwik8.drt -12leta by Paq8px_v189, change: -0,10%
    My rough estimate for pure enwik9 is about 126'2xx'xxx bytes and for the DRT version about 121'6xx'xxx bytes.
    2027 replies | 554537 view(s)
  • Kirr's Avatar
    30th July 2020, 05:05
    Interesting, thanks for pointing out. I added this link to my to-do notes, and will try to add it to the benchmark (sooner or later).
    14 replies | 1232 view(s)
  • Darek's Avatar
    30th July 2020, 01:07
    Darek replied to a thread paq8px in Data Compression
    These are scores of 4 corpuses by v188, v188 LSTM, v188b LSTM and v189 LSTM. For Silesia corpus there are gains for every new version with LSTM.
    2027 replies | 554537 view(s)