Activity Stream

  • Darek's Avatar
    Today, 22:42
    Darek replied to a thread paq8px in Data Compression
    Theoretically the Ryzen 9 3950X should be about 30-33% faster than the i7-4770HQ... It looks like there is a 2x slowdown for some hidden reason. Of course different builds run differently on different architectures, but it's really strange. I've checked other benchmarks for these CPUs (CPU-Z, Cinebench R11.5, Cinebench R15, Cinebench R20, Geekbench 4.0, Geekbench 5.0, PassMark, SiSoftware Sandra Arithmetic, UserBenchmark) and on average all these tests show about a 33% single-thread advantage for the 3950X over the 4770HQ... This must be either: a) a worst (really worst) case scenario for Ryzen, or b) some compile/build implications.
    2040 replies | 558071 view(s)
  • Dresdenboy's Avatar
    Today, 22:41
    He created that data compression site with sources and interesting articles. For the other thing - not everything is in one's own hands. I know a similar case.
    15 replies | 2252 view(s)
  • moisesmcardona's Avatar
    Today, 20:12
    moisesmcardona replied to a thread paq8px in Data Compression
    Ok, here are the results for the AMD 3950X CPU along with the previous ones from Intel. It took approx. 200,000 seconds more on AMD than on Intel. NOTE: The real number is the CPU time, since that logs the time the process was actually using the CPU. Run time is the overall time it took for the process to run.
    2040 replies | 558071 view(s)
  • JamesWasil's Avatar
    Today, 20:07
    Oops, I thought your last name was Tomato not Tamayo. I must have misread. I apologize for getting your name wrong. As for the rest of it, professional help of some sort may be needed, but I doubt that a lawyer to sue the tech firms is what's required for that. I sure hope that compgt is a troll of sorts. Otherwise, this is a very sad thing.
    15 replies | 2252 view(s)
  • Sportman's Avatar
    Today, 20:01
    I saw some forums hacked today; I don't know if this forum needs an update: Base Score: 9.8 CRITICAL https://nvd.nist.gov/vuln/detail/CVE-2019-16759 vBulletin 5.6.0, 5.6.1, 5.6.2 Security Patch: https://forum.vbulletin.com/forum/vbulletin-announcements/vbulletin-announcements_aa/4445227-vbulletin-5-6-0-5-6-1-5-6-2-security-patch
    55 replies | 8598 view(s)
  • bwt's Avatar
    Today, 18:07
    I think that if we develop from the existing source code it is more efficient, and participants can grow from there.
    60 replies | 4702 view(s)
  • bwt's Avatar
    Today, 18:02
    The most important thing: you must have a fresh idea, write it from scratch, and improve it for several months.
    60 replies | 4702 view(s)
  • lz77's Avatar
    Today, 17:31
    A unique LZ77+Huffman compressor can claim a prize only in Rapid Compression of English text...
    60 replies | 4702 view(s)
  • algorithm's Avatar
    Today, 17:24
    The problem is that there are too many categories. If there were half as many, the prize would be 6000.
    60 replies | 4702 view(s)
  • bwt's Avatar
    Today, 15:27
    In the GDCC notices thread, Ms1 said to do your own work. That means we cannot combine open-source code; we must code from scratch. And that is valued at only 3000 euros.
    60 replies | 4702 view(s)
  • Shelwien's Avatar
    Today, 15:08
    Combining open-source preprocessors and coders hardly requires months of work. On the other hand, a unique new work could claim prizes in multiple categories.
    60 replies | 4702 view(s)
  • bwt's Avatar
    Today, 13:19
    Yes, for several months of work the programmer is valued at only 3000 euros. It is so dramatic...
    60 replies | 4702 view(s)
  • lz77's Avatar
    Today, 12:59
    I think so too. An ordinary programmer in a Google office gets $10,000 per month. Not every programmer wants to write a custom program for several months...
    60 replies | 4702 view(s)
  • bwt's Avatar
    Today, 12:47
    I think the participants are not much interested because the prize is small.
    60 replies | 4702 view(s)
  • Darek's Avatar
    Today, 11:31
    Darek replied to a thread paq8px in Data Compression
    @moisesmcardona - yes, it's really strange that a CPU with 150-160 points in Cinebench R15 (single-thread) runs faster than the 3950X, which achieves about 200-210 pts in Cinebench. My scores for 4 corpora - both versions -> with -r and without this option, plus the best of them. paq8px_v190 takes 2nd place in the Silesia benchmark now! It loses to cmix v18 by "only" 183 KB.
    2040 replies | 558071 view(s)
  • compgt's Avatar
    Today, 10:58
    Hurling insults at my name now, attacking me personally when you could just have an open mind?... The corrupt people using my Hollywood billion$ for themselves, it seems, created that profile of "Gerald Tamayo, math wizard" in the news here in the Philippines - also a child math whiz kid?! who was good at calculating square roots of numbers, and who was begging on the streets for money. That was in the news in 2014 or 2015, though I think I saw it even earlier. This they did to discredit me when they found out I already had an internet presence via my "The Data Compression Guide" in 2008. I don't despise the whiz kid, of course, no offense to him; he was just being used by these corrupt people. This will make people who google "Gerald Tamayo" think, perplexed, that I'm that square-root kid. I am the "Gerald Tamayo" whiz kid by that name, but in the 1970s to the 80s, pioneering modern computing and quantum computing, and I was prolific in composing songs for the Hollywood singers and bands and making the Hollywood blockbuster movies. These hardcore corrupt people will do everything to discredit me or destroy me, while at the same time shamelessly partaking of my Hollywood and tech billion$. Yes, I made some "Knight Rider" episodes too and programmed its analog AIs, when I was in Hollywood.
    15 replies | 2252 view(s)
  • Trench's Avatar
    Today, 06:35
    That page has multiple issues to deal with. It's hard enough to extract/convert text that is presented clearly in a picture, let alone unclearly. If there is no money to be made, or a passion for it, no one will invest time in it. No one has even made a simple program to put together a jigsaw puzzle. Most programs that make panoramas distort the image to a degree and cannot handle pixel art, which is obvious. It almost seems like everyone wants to make the same programs as everyone else, since people prefer improving on what others made over making something original. But if someone wants to try it, I think it takes a program scanning the image many times with many color variations and curves to spot letters in every square. The word "THE" is maybe somewhat clear; the rest are in smaller fonts which get blurred. Every assumed letter has to be adjusted to get proper results, stored, and then compared with a dictionary to find a match. If the human eye cannot detect a hint of a letter, it's not likely a computer will. Maybe the best algorithm would be to quickly flicker through multiple color-variation combinations and have the computer try to match every square with a letter, comparing it with other letters you know are behind the paper to see if it gets a match. But in the end I don't see it being done with your example; maybe with a perfectly new paper. The detail would first have to be increased, which requires yet another program, and the bumps on the paper don't help. If there is a secret message on that, no one will ever get it. Art programs have more options to distort images than to improve them. lol
    2 replies | 83 view(s)
  • moisesmcardona's Avatar
    Yesterday, 23:50
    moisesmcardona replied to a thread paq8px in Data Compression
    My CPUs are Intel i7-4700MQ and i7-7700HQ. Both finished faster than the AMD CPUs that are currently running. Here are some numbers: NOTE: Computer names are wrong because I swapped the HDD and I haven't renamed them. But you can see the CPU details there. Files being compressed are around 230MB, 24-bit TIFF files. Compressing using -9l.
    2040 replies | 558071 view(s)
  • JamesWasil's Avatar
    Yesterday, 22:55
    Yes, Gerald Tomato will tell us that the newest Knight Rider was already done and voiced by him during the cold war in the Philippines and not by William Daniels, and that it was done during the 70's even though he never knew about it. Funny how that works. Speaking of Knight Rider though, you know KITT and KARR had to use a great deal of data compression to store data with limited space and for transmissions with signals and frequencies. I wonder why the show never mentioned that or tried to (the original 80's show or the horrible reboots that came later). I figure Huffman or LZ77 might have been worth a mention at least once when talking about a state of the art computer and car of the future, but none of the episodes ever did?
    15 replies | 2252 view(s)
  • Darek's Avatar
    Yesterday, 22:30
    Darek replied to a thread paq8px in Data Compression
    I can test some file to compare Intel CPU to AMD. I'm wondering how much faster is 3950x than 8950HK.
    2040 replies | 558071 view(s)
  • moisesmcardona's Avatar
    Yesterday, 21:45
    moisesmcardona replied to a thread paq8px in Data Compression
    The AMD tasks are still running. I'll post the CPU runtime once they finish. May take another day for the 3950x system and like 3 more days for the 1700 CPU.
    2040 replies | 558071 view(s)
  • Darek's Avatar
    Yesterday, 21:33
    Darek replied to a thread paq8px in Data Compression
    Could you provide some numbers?
    2040 replies | 558071 view(s)
  • Stefan Atev's Avatar
    Yesterday, 20:39
    Thanks, this will save me some digging. I don't need a ton of examples; even 1 or 2 examples are better than pretending that text or random chunks of compiler-produced x64 are representative....
    48 replies | 3700 view(s)
  • JamesWasil's Avatar
    Yesterday, 19:47
    I have often wondered about this. Let's suppose you have something ready for the world and ready for the market. Is it better to have a patent, a patent pending, or a copyright on it considering the semantic restrictions and red tape on works now for you to retain your rights to it?
    2 replies | 192 view(s)
  • Dresdenboy's Avatar
    Yesterday, 18:14
    You might check the Hardcode collection (http://hardcode.untergrund.net/). It surely has some uncompressed intros >=1k, and others might be decompressed using a general unpacker or common tools (apack, upx). Or use a debugger (which I did for a few). Beware of AV software warnings, though! ;)
    48 replies | 3700 view(s)
  • Shelwien's Avatar
    Yesterday, 17:12
    Obviously extracting non-existent information is impossible. We can extrapolate from visible edges etc, but results would be frequently incorrect. https://www.3dnatives.com/en/photogrammetry-software-190920194/#!
    2 replies | 83 view(s)
  • Stefan Atev's Avatar
    Yesterday, 16:51
    Hi, is there a place where I can find decompressed demos? I'd like to use real data when tweaking parameters / experimenting.
    48 replies | 3700 view(s)
  • Ms1's Avatar
    Yesterday, 16:31
    We got submissions based on 3rd-party source code which perform marginally better (if at all) than the originals. I'd like to remind everyone of our conditions: http://globalcompetition.compression.ru/rules/#participant-requirements In particular: "Each participant guarantees that the submitted compressor or compressors are that participant’s own work or, if they’re based on third-party source code, they differ essentially from that code and the participant has received all necessary permissions to modify and submit the code to this competition. Each participant must also guarantee that the submitted compressor or compressors violate no third-party intellectual-property rights." The spirit of the condition is that your modifications should produce a different compressor which performs very differently from the original, and that you don't abuse other people's intellectual-property rights and copyrights. We don't have exact objective criteria for what "differ essentially" means; our decisions on this are subjective. But as a rule of thumb: don't hold out any hope if the modified compressor shows compression-ratio improvements of less than 3% at the same (or similar) speed on the public parts of the tests. Do something of your own; don't just mechanically train and tune.
    5 replies | 1083 view(s)
  • moisesmcardona's Avatar
    Yesterday, 15:55
    moisesmcardona replied to a thread paq8px in Data Compression
    Regarding the Intel vs AMD tests, the Intel CPUs are faster than the AMD ones. And here's the thing: I compiled it on my AMD Ryzen using NATIVECPU and the Intel machine was still faster. Also consider that the Intel CPUs I have are all i7 mobile parts (4c/8t) while the AMD Ryzen CPUs are all desktop parts. So, in conclusion: 1. Compiled PAQ8PX on my AMD machine using GCC with NATIVECPU. 2. Ran PAQ8PX on the Intel and AMD machines, with half the threads used. 3. The Intel tasks finished faster than the AMD tasks. Update: The first-gen Ryzen is about half as fast as the 3rd gen, mainly due to its AVX2 implementation being only 128 bits wide and requiring 2 cycles, whereas the 3rd gen supports 256-bit natively.
    2040 replies | 558071 view(s)
  • Dresdenboy's Avatar
    Yesterday, 15:03
    Temisu (the oneKpaq developer) has also collected a lot of depacker code (C++) here: https://github.com/temisu/ancient There are no small asm depackers there, though, but at least the sources give an idea of what could be implemented with a small code footprint in asm. Another search revealed some more information and the sources of baah's packer "louzy77" (the 32-byte depacker on 68000): http://abrobecker.free.fr/text/louzy.htm
    48 replies | 3700 view(s)
  • Trench's Avatar
    Yesterday, 04:03
    It seems like many in the US don't know about that religion, despite it being the first Christian religion with the first churches, and despite it having put together the New Testament Bible - though don't rely on the Bible alone, since Christ did not come to make a book but established apostolic succession to make the Church. The Catholic Church's pope goes directly against what Christ said - that everyone is equal and no one is above another - yet the pope is presented as above the rest. Nearly every crusade was against the Orthodox Church and stole a lot of things; a previous pope said he was sorry for that but won't give anything back, which is a hollow apology. The Ottoman Turks then saw their opportunity in a weakened country and invaded, which helped the Catholic Church become the main power to preach. The Protestant sects that spun off from the Catholic Church interpret the Bible as they see fit, as the word of God, even though the Orthodox Church has more resources than the Bible; they also go against what Christ says, since they have no apostolic succession and everyone presents it with their own interest and misses the point. No one is saved automatically just by believing; salvation comes from doing the right thing. "Sin" means to miss the mark, to fail, and to love sin is to love failure, which is not right. Be better; don't stagnate and rot. In short, it's something to make yourself better. Christian-majority nations are the places most of the world wants to go to. When that no longer exists, every nation will be one big prison.
    0 replies | 46 view(s)
  • CompressMaster's Avatar
    Yesterday, 01:07
    Maybe an irrelevant question, but... is it possible to see a particular object behind another in an image by using advanced algorithms? If not, do you think it would be possible some day in the near future? I'm just asking...
    2 replies | 83 view(s)
  • pacalovasjurijus's Avatar
    9th August 2020, 15:50
    We enjoy writing programs. We like writing programs.
    4 replies | 254 view(s)
  • Darek's Avatar
    9th August 2020, 11:13
    Darek replied to a thread paq8px in Data Compression
    Scores of my testset for paq8px_v190. Option -r gives some slight improvements to textual files.
    2040 replies | 558071 view(s)
  • danlock's Avatar
    9th August 2020, 06:52
    I wonder if it relates to the new Knight Rider series I heard was coming?
    15 replies | 2252 view(s)
  • danlock's Avatar
    9th August 2020, 06:00
    danlock replied to a thread 7zip update in Data Compression
    A small fix to 7-Zip 20.01 was posted. Source of the following text: https://sourceforge.net/p/sevenzip/discussion/45797/thread/9dbfea1e30/
    7-Zip 20.02 alpha was released.
    7-Zip for 64-bit Windows x64: https://7-zip.org/a/7z2002-x64.exe
    7-Zip for 32-bit Windows: https://7-zip.org/a/7z2002.exe
    What's new after 7-Zip 20.00 alpha - notes about changes in 7-Zip 20.02:
    - The size of the solid block for 7z/LZMA2 archives was increased, so 7-Zip can use more threads and chunks to compress a solid block. This can improve compression speed on multi-core processors like Threadripper/Epyc and Xeon, and can also slightly improve multi-core LZMA2 decompression speed.
    - Some important parts of the PPMd code were rewritten for faster compression and decompression. The new PPMd code works faster for data that is not very compressible, like exe files or already-compressed data. Also, PPMd decompression in previous 7-Zip versions could be slow on some Intel CPUs if the CVE-2018-3639 Speculative Store Bypass (SSB) mitigation was enabled in the system; the new PPMd code should work faster in such cases too.
    - Another change in 7-Zip 20.02 is a speed optimization in the Delta filter code, which is about two times faster than in the previous version.
    7-Zip testing: You can post the results of tests to show the changes between 7-Zip 20.00 and the new 7-Zip 20.02. Note: you can test any version of 7-Zip without a full installation; just extract the 7-Zip exe installer as an archive file to any folder and run 7-Zip from that folder.
    The following benchmark command can show some of the changes in the new version 20.02:
    7z b -mmt=* -mm=* -bt > b.txt
    The only expected changes in benchmark results between the 20.00 and 20.02 versions are the PPMd and Delta lines.
    If you have a CPU with a large number of threads, you can compare 7z/LZMA2 compression speed for big data (more than 4 GB) between versions 20.00 and 20.02:
    7z a mx1.7z files -bt -mx1 >> a.txt
    7z t mx1.7z -bt >> a.txt
    7z a mx5.7z files -bt -mx5 >> a.txt
    7z t mx5.7z -bt >> a.txt
    7-Zip 20.02 can be faster than 20.00 if your CPU has a large number of threads.
    What's new after 7-Zip 20.01 alpha:
    - Bug fixed: 7-Zip File Manager 20.01 could lock archive files while the File Manager was running.
    1 replies | 311 view(s)
  • pacalovasjurijus's Avatar
    8th August 2020, 14:14
    I like writing programs.
    c = 0
    while c < -1:  # note: this condition is false from the start, so the loop body never executes
        c = c + 1
    46 replies | 5428 view(s)
  • Dresdenboy's Avatar
    8th August 2020, 07:52
    Dresdenboy replied to a thread Crinkler in Data Compression
    Somewhat related: There is another, less known compressor for 1K/4K intros called oneKpaq (developed for Mac with clang compiler, but I could get it running on Win with some changes, at least providing compressed sizes), with a 128-165 byte decompressor code (not including executable file format headers, which are included in the Crinkler decompressor). Here's the repository: https://github.com/temisu/oneKpaq
    2 replies | 2848 view(s)
  • danlock's Avatar
    8th August 2020, 07:31
    danlock replied to a thread Crinkler in Data Compression
    As old as this thread is, I believe it is the correct place to put the notification that the popular demoscene executable compressor (compressing linker), Crinkler, has been made open-source: https://github.com/runestubbe/Crinkler I acknowledge that it's been several weeks and that Dresdenboy already posted about Crinkler's new status and linked to its Github repository in the '(Extremely) tiny decompressors' thread.
    2 replies | 2848 view(s)
  • avitar's Avatar
    7th August 2020, 20:13
    avitar started a thread 7zip update in Data Compression
    7-Zip 20.01 alpha was released.
    7-Zip for 64-bit Windows x64: https://7-zip.org/a/7z2001-x64.exe
    7-Zip for 32-bit Windows: https://7-zip.org/a/7z2001.exe
    What's new after 7-Zip 20.00 alpha:
    - The default number of LZMA2 chunks per solid block in 7z archives was increased to 64. This increases compression speed for big 7z archives when there are many CPU cores and threads.
    - The speed of PPMd compression/decompression was increased for 7z/ZIP/RAR archives.
    - The new -ssp switch: if -ssp is specified, 7-Zip doesn't allow the system to modify the "Last Access Time" property of source files during archiving and hashing operations.
    - Some bugs were fixed.
    - New localization: Swahili.
    See the later 20.02 update below!
    1 replies | 311 view(s)
  • ivan2k2's Avatar
    7th August 2020, 03:55
    MultiPar with subfolders
    1 replies | 82 view(s)
  • SvenBent's Avatar
    7th August 2020, 02:20
    I am backing up some family pictures and I want to fully utilize the DVD's capacity for data-safety redundancy. I remember PAR files used to be the go-to way to handle this, so I went and downloaded QuickPar 0.9. However, this program can only create PAR files for files in the current directory - no sub- or multiple directories. Is there a better tool for this purpose, one able to handle multiple levels of a folder structure?
    1 replies | 82 view(s)
  • encode's Avatar
    6th August 2020, 19:41
    For comparison, i7-9700K @ 5.0 and 5.1 GHz:
    208 replies | 127173 view(s)
  • withmorten's Avatar
    6th August 2020, 16:57
    I actually found that a bit later and forgot to mention it - thanks for getting back to me though :) Nice IDA historical archive in any case, and via Tor the speed wasn't so bad.
    4 replies | 547 view(s)
  • LucaBiondi's Avatar
    6th August 2020, 14:48
    LucaBiondi replied to a thread paq8px in Data Compression
    Hi Mpais, what about a preliminary model for MP3 files? Do you think it would be an easy task? Luca
    2040 replies | 558071 view(s)
  • moisesmcardona's Avatar
    6th August 2020, 14:43
    moisesmcardona replied to a thread paq8px in Data Compression
    BTW, I noticed that when using the `-v` flag it would print levels -1 to -9. I updated it to show levels up to -12 instead. Be sure to update the code! This is simply a cosmetic change. :)
    2040 replies | 558071 view(s)
  • Dresdenboy's Avatar
    6th August 2020, 13:41
    I think it might just be better than current LZ77-family algorithms (LZB etc.) in specific situations, e.g. with a high percentage of literal bytes but still multiple reuses of matched strings (so the savings on match lengths get bigger), or just a high number of reused strings overall (texts). But this remains to be seen. Possible difficulties in staying efficient come from overlapping matches (like ABRA and BRA; see my notes on leaving match strings in the literals block).
    12 replies | 669 view(s)
  • lz77's Avatar
    6th August 2020, 13:14
    But the compressed data must contain positions 0 and 7, 5 and d? How can they take fewer bits than literal counters 2-4 bits long? And how can I use an offset-history buffer? Among 256 offsets there will not be identical ones, and their high bytes are unlikely to be equal...
    12 replies | 669 view(s)
  • Dresdenboy's Avatar
    6th August 2020, 11:49
    As compgt described, you have a literals data block (in his variant containing only the literals that weren't matched) and the match strings (literal string + positions to copy it to) in some list. Example:
    Original data:
    ABRACADABRA12AD
    0123456789abcde (position)
    Match strings:
    ABRA - copy to pos 0 and 7 (ABRA...ABRA....)
    AD - copy to pos 5 and d (ABRA.ADABRA..AD)
    Literals left over: C12 - copy into the remaining spots "." (see the sketch after this entry)
    Re LZ77: It might be difficult to get there. With just over 4 MB of literals left, the length < 4 matches can't remove more than that (and they cost encoding bits). So the most would surely be gained from the encoding.
    12 replies | 669 view(s)
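
To make the reconstruction described in the entry above concrete, here is a minimal C sketch of the literals-last idea: matches are copied to their recorded positions first, and the literal block then fills the remaining holes in order. The hard-coded input and all names are illustrative assumptions, not taken from any real RLLZ implementation.

/* Literals-last (RLLZ-style) reconstruction sketch for Dresdenboy's
   ABRACADABRA12AD example. Matches first, leftover literals last. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char out[16];
    memset(out, '.', 15);          /* '.' marks an unfilled slot */
    out[15] = '\0';

    /* Match strings with their target positions (0xd == 13). */
    const char *m1 = "ABRA"; int p1[] = { 0, 7 };
    const char *m2 = "AD";   int p2[] = { 5, 13 };
    for (int i = 0; i < 2; i++) memcpy(out + p1[i], m1, strlen(m1));
    for (int i = 0; i < 2; i++) memcpy(out + p2[i], m2, strlen(m2));

    /* Fill the remaining '.' slots with the literal block, in order. */
    const char *lits = "C12";
    for (int i = 0, j = 0; i < 15; i++)
        if (out[i] == '.') out[i] = lits[j++];

    printf("%s\n", out);           /* prints ABRACADABRA12AD */
    return 0;
}

Note how no literal counters or per-token flags are needed: the copy positions of the matches implicitly determine where the literals go, which is exactly the saving being debated in this thread.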
  • lz77's Avatar
    6th August 2020, 11:06
    I still can't figure out how to decompress LZT data if there are no literal counters... And I want to compress TS40.txt down to 120 MB or less in 10 seconds. But LZ77 alone will not be able to do this.
    12 replies | 669 view(s)
  • Darek's Avatar
    5th August 2020, 22:15
    Darek replied to a thread paq8px in Data Compression
    enwik9 scores for the last three changes:
    130'076'196 - enwik9_1423.drt -9eta by Paq8px_v183fix1
    124'786'260 - enwik9_1423.drt -12eta by Paq8px_v187fix2, change: -4.07%
    121'056'858 - enwik9_1423.drt -12leta by Paq8px_v189, change: -2.99%, time 330'229.11s
    It's quite close to crossing the 120'000'000 line :)
    2040 replies | 558071 view(s)
  • Dresdenboy's Avatar
    5th August 2020, 20:38
    For RLLZ you might have to rewrite and rebalance everything. You already have good results. At this stage you might turn it into an optimization problem with parameter tuning.
    12 replies | 669 view(s)
  • lz77's Avatar
    5th August 2020, 18:05
    Thanks, but I haven't figured out how to use this to improve my compressor yet - all the more so for winning 3000 euros. :) For example, if I exclude literal-length bits from my code words, I improve the ratio by ~1.5%... I've tried using 00, 01, 10, 110, 111 prefixes, but it made the compression worse. At this time my baby LZ77 compressor, without compiler optimizations, compresses a bit better and 20% faster than lzturbo -p0 -22 -b1024... on an i3-5005U. Maybe on newer CPUs the comparison results would be different...
    12 replies | 669 view(s)
  • Dresdenboy's Avatar
    5th August 2020, 15:30
    It looks like LZ77 wants to give RLLZ a try. :) I think we can't help him here.
    15 replies | 2252 view(s)
  • Dresdenboy's Avatar
    5th August 2020, 14:28
    I think this is a nice opportunity to learn something new. When I did my research on current LZ versions and probably-forgotten ones from the past, I found a lot of interesting ideas! There is no strong relation for specific values (e.g. offset 360, len 5), but longer matches usually are further away (see the links below), as the probability of finding a given combination of letters (or other symbols) shrinks with the length. Instead of using another prefix bit, you might redistribute your current offset ranges among these 4 subdivisions. Or really create a variable-length prefix tree (00, 01, 10, 110, 111 or something else, costing additional bits only for the biggest offsets, which most likely involve longer match lengths - rarer, but already saving lots of bytes); see the cost-model sketch after this entry. For such things I used detailed stats in Excel to play around with; I found them useful for smaller binary files. It just depends. Large texts could contain more of the longer matches, of course. But it's certainly not wrong to look at the frequency of shorter lengths (with shorter offsets, since a 2-byte match with all flags etc. should be smaller than 2 literals). These two pages give some interesting insights based on an (also linked) "FV" tool: http://mattmahoney.net/dc/textdata.html http://www.fantascienza.net/leonardo/ar/string_repetition_statistics/string_repetition_statistics.html (which is linked from Matt's page) This would at least be interesting to see its capabilities. I made some notes while playing it through in my mind, adding/changing some bits of that algo already at that level. I'll send you the bullet-point list via PM.
    12 replies | 669 view(s)
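
To illustrate playing with such prefix layouts, here is a small C sketch of a toy cost model: each offset falls into a bucket selected by a prefix code (00, 01, 10, 110, 111) followed by a fixed number of offset bits for that bucket. The bucket widths are made-up example values for the sketch, not measured recommendations.

/* Toy bit-cost model for prefix-coded offset buckets. */
#include <stdio.h>

typedef struct { int prefix_bits, offset_bits; } Bucket;

static const Bucket buckets[] = {
    { 2,  6 },  /* "00"  + 6 offset bits  */
    { 2, 10 },  /* "01"  + 10 offset bits */
    { 2, 14 },  /* "10"  + 14 offset bits */
    { 3, 17 },  /* "110" + 17 offset bits */
    { 3, 20 },  /* "111" + 20 offset bits */
};

/* Bits needed to encode one offset (>= 1) under this layout. */
static int offset_cost(long off)
{
    long base = 0;
    for (int i = 0; i < 5; i++) {
        long span = 1L << buckets[i].offset_bits;
        if (off <= base + span)
            return buckets[i].prefix_bits + buckets[i].offset_bits;
        base += span;
    }
    return -1;  /* offset too large for this layout */
}

int main(void)
{
    long sample[] = { 3, 70, 5000, 40000, 500000 };
    for (int i = 0; i < 5; i++)
        printf("offset %6ld -> %2d bits\n", sample[i], offset_cost(sample[i]));
    return 0;
}

Summing such costs over real match statistics - the kind of spreadsheet exercise Dresdenboy describes - shows whether a given prefix tree actually beats flat 2-bit prefixes for a particular corpus.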
  • lz77's Avatar
    5th August 2020, 13:17
    Unfortunately, I'm not an English speaker; I learned German (long ago...), and I don't understand all the LZ77 compression improvements yet. I see no relation between len/offset. Also, I see no repeated offsets when compressing enwik8; I see only repeated match strings like 'the ', ', and ', 'and ', 'of the ', ... If I want to use a history of matched substrings, will I need one more prefix for these codewords? For example, if I'm using 4 prefixes 00-11 for 4 lengths of offset, I will have to go to 3-bit prefixes. Won't those 3-bit prefixes eat up the benefits of using them? I'm using a minimum match length of 4 bytes, like LZ4. How could you use match lengths of 2 and 3 bytes? I'm going to use some ideas from the thread "Reduced Length LZ (RLLZ): One way to output LZ77 codes". I doubt it gives a big win, and after using this idea the compression will no longer be so fast...
    12 replies | 669 view(s)
  • compgt's Avatar
    5th August 2020, 10:42
    Right, it is better to just write it as a binary number, since the range or length in bits is needed anyway. My variable-length code is more compact, I think: it only increases in bitlength (by 1) after all possible values are used up for that bitsize. This is Rice coding generalized, I think.

    /* Filename: ucodes.h (universal codes.)
       Written by: Gerald Tamayo, 2009 */
    #include <stdio.h>
    #include <stdlib.h>

    #if !defined(_UCODES_)
    #define _UCODES_

    /* Unary Codes. */
    #define put_unary(n) put_golomb((n),0)
    #define get_unary() get_golomb(0)

    /* Exponential Golomb coding */
    #define put_xgolomb(n) put_vlcode((n), 0)
    #define get_xgolomb() get_vlcode(0)

    /* Elias-Gamma coding.
       Note: don't pass a zero (0) to the encoding function: only n > 0 */
    #define put_elias_gamma(n) put_xgolomb((n)-1)
    #define get_elias_gamma() get_xgolomb()

    /* Golomb Codes. */
    void put_golomb( int n, int mfold );
    int get_golomb( int mfold );
    void put_vlcode( int n, int len );
    int get_vlcode( int len );

    #endif

    I just use my put_vlcode() function for short codes. Last time I checked, this is related to Rice codes.

    /* Filename: ucodes.c (universal codes.)
       Written by: Gerald Tamayo, 2009 */
    #include <stdio.h>
    #include <stdlib.h>
    #include "gtbitio.h"
    #include "ucodes.h"

    /* Golomb Codes.

       We divide integer n by (1<<mfold), write the result as a unary code,
       and then output the remainder as a binary number, the bitlength of
       which is exactly the length of the unary_code-1.

       In the implementation below, mfold is an exponent of two:
       mfold = {0, 1, 2, ...} and (1<<mfold) is thus a power of two.
       Each 1 bit of the unary code signifies a (1<<mfold) *part* of
       integer n. In *exponential* Golomb coding, each 1 bit signifies
       succeeding powers of 2. (We allow a length/mfold of 0 to encode n
       as a plain unary code.) */
    void put_golomb( int n, int mfold )
    {
        int i = n >> mfold;
        while ( i-- ) {
            put_ONE();
        }
        put_ZERO();
        if ( mfold ) put_nbits( n%(1<<mfold), mfold );
    }

    int get_golomb( int mfold )
    {
        int n = 0;
        while ( get_bit() ) n++;
        n <<= mfold;
        if ( mfold ) n += get_nbits(mfold);
        return n;
    }

    /* The following variable-length encoding function can write
       Elias-Gamma codes and Exponential-Golomb codes according to the
       *len* parameter, which can be 0 to encode integer 0 as just 1 bit. */
    void put_vlcode( int n, int len )
    {
        while ( n >= (1<<len) ){
            put_ONE();
            n -= (1<<len++);
        }
        put_ZERO();
        if ( len ) put_nbits( n, len );
    }

    int get_vlcode( int len )
    {
        int n = 0;
        while ( get_bit() ){
            n += (1<<len++);
        }
        if ( len ) n += get_nbits(len);
        return n;
    }

    Treat the number as a binary number (not a decimal number) and transform it into a variable-length code.
    4 replies | 305 view(s)
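
As a companion to the post above, here is a small self-contained sketch (not part of compgt's files) that computes the codeword length put_vlcode(n, len) would emit. It mirrors the loop in ucodes.c without the gtbitio.h bit-I/O dependency, so the growth of the code can be checked standalone.

/* Codeword lengths of compgt's put_vlcode(n, len):
   k unary ones, one terminating zero, then len+k tail bits. */
#include <stdio.h>

static int vlcode_bits(int n, int len)
{
    int bits = 0;
    while (n >= (1 << len)) {   /* unary prefix: one bit per step */
        n -= (1 << len++);
        bits++;
    }
    bits++;                     /* the terminating zero */
    return bits + len;          /* tail written by put_nbits(n, len) */
}

int main(void)
{
    /* len = 0 gives the Exponential-Golomb-style codes he mentions:
       n=0 -> 1 bit, n=1..2 -> 3 bits, n=3..6 -> 5 bits, ... */
    for (int n = 0; n <= 8; n++)
        printf("n=%d -> %d bits\n", n, vlcode_bits(n, 0));
    return 0;
}

This makes his "increases in bitlength only after all values of that bitsize are used up" claim easy to verify: each unary step doubles the number of representable values before the length grows again.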
  • Lucas's Avatar
    5th August 2020, 07:03
    After a bit more reading, it doesn't appear to be a compression method at all; we are taking it a bit too literally. It appears to be a transform into the unary domain with finite-width codes. It seems kind of dumb at first until you consider optimizations: lookup tables can be used to generalize the unary domain of integers using this method. It appears they made this to get into that domain for biological neural networks (songbirds). It's not directly applicable to compression, but it is essentially a preprocessor for changing domains, so it can be thought of as a model which can be used for compression rather than a direct compression method itself. It would be nice if they explained the unary domain better, though. As to why it is better for training biological NNs, I'm not sure...
    4 replies | 305 view(s)
  • cottenio's Avatar
    5th August 2020, 05:34
    I think we share the same struggle about it being "considered a variant of unary", especially after I just gave reading his paper a go ("Generalizing Unary Coding", DOI: 10.1007/s00034-015-0120-7). He mentions possible uses in data compression related to neural networks, with the idea that a "spatial form" could represent an array of neurons. What I'm starting to realize, though, after looking through the Wikipedia page history, is that this might be a case of author self-promotion: there seem to be a lot of edits over time adding his journal articles to the reference material for unary and neural-network-related pages. I'm honestly having trouble parsing this sentence from his paper:
    4 replies | 305 view(s)
  • JamesWasil's Avatar
    5th August 2020, 03:44
    Did you invent KITT for David Hasselhoff to play on Knight Rider as Michael Knight? Or did Wilton Knight come to you to get the design ideas and understand how the molecular bonded shell works?
    15 replies | 2252 view(s)
  • JamesWasil's Avatar
    5th August 2020, 03:10
    There are times that you understand things that the world doesn't see. To some this is an obligation, while to others it is a blessing.
    0 replies | 85 view(s)
  • Lucas's Avatar
    5th August 2020, 03:10
    It appears useless for compression; maybe it was designed with some other application in mind. Since this system requires a limit on the numbers, in the included 0-15 example we could have just encoded 4-bit-wide binary codes instead (see the comparison after this entry). Regular unary and universal codes don't impose a limit on integer size, while this does. I'm also struggling to grasp why this is considered a variant of unary, let alone an improvement.
    4 replies | 305 view(s)
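
For reference, a tiny C sketch of the arithmetic behind Lucas's point: classic unary spends n+1 bits on integer n, while a fixed 4-bit code always spends 4, so any scheme capped at 0-15 is competing with the flat 4-bit code. This restates only that comparison; it is not the scheme from Kak's paper.

/* Unary vs. fixed-width cost for the 0..15 range discussed above. */
#include <stdio.h>

int main(void)
{
    for (int n = 0; n <= 15; n++)
        printf("n=%2d  unary=%2d bits  fixed=4 bits\n", n, n + 1);
    return 0;
}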
  • dado023's Avatar
    4th August 2020, 23:56
    Hm... do we (people on the forum) have a standardized test collection of PNGs? I am willing to download and test; just provide me with samples to test with.
    85 replies | 24085 view(s)
  • Krishty's Avatar
    4th August 2020, 23:05
    I don’t know :( If you or someone else would like to go ahead and benchmark it, I’d be very happy to learn! I’m using ECT and I’ve seen Pingo talk in the ECT thread, but to be honest I didn’t understand the bottom line.
    85 replies | 24085 view(s)
  • dado023's Avatar
    4th August 2020, 22:38
    Does this compress more than Pingo?
    85 replies | 24085 view(s)
  • Krishty's Avatar
    4th August 2020, 21:01
    … and then came the COVID lockdown … I haven’t forgotten you, Jaff. I haven’t implemented the trailing-data handling yet, but your thumbnail-optimization suggestion works really well. Find attached the portable version of the current build; the setup/update can be found on my site. Feedback is always welcome. I notice that JPEG optimization is somewhat slower now, because ExifTool (which is my bottleneck) runs once more per file.
    Changes:
    - added JPEG thumbnail optimization
    - added option to remove Adobe metadata from JPEG. This was suggested to me and it indeed strips some files further. ExifTool doesn’t do it by default. Be aware, though, that there is a good reason for that: the data may be required to correctly identify the image’s color space. If you use this option, be sure to check the result, especially if you use CMYK & Co a lot. For an example of what could go wrong, see here.
    - improved launch speed
    - improved performance during resizing
    - updated ECT to 0.8.3
    - updated ExifTool to 12.03
    - fixed genetic PNG optimization appearing frozen. Thanks to fhanau for the help!
    - fixed wrong status messages
    - minor performance improvements
    85 replies | 24085 view(s)
  • cottenio's Avatar
    4th August 2020, 20:58
    I was looking into unary coding while working on a compression project for artificial neurons, partially spurred on by the neat research into birdsong/HVC. On my journey I found that the Wikipedia page for unary coding has a dedicated subsection on "Generalized unary coding" by Subhash Kak. https://en.wikipedia.org/wiki/Unary_coding I read the description but came away with no concrete understanding of its nature or utility. It reads like an arbitrary fixed-length encoding, and even requires "markers" for "higher integers." Am I reading this correctly and it's hogwash, or is there some deeper usefulness that I'm missing?
    4 replies | 305 view(s)
  • compgt's Avatar
    4th August 2020, 17:56
    > What is the state of implementing this idea? Not implemented yet. I stopped coding in 2010. I just know this makes very compact LZ. If I recall correctly that James Storer and Jennifer Chayes of Microsoft came to me in my high school (Philippines) in the early 1990s, then it was at that time that I mentioned to Storer the idea of transmitting the literals last in LZ77. I didn't elaborate, though, as I hadn't been programming anymore by that time. I didn't have access to a computer of my own in the 1980s; it was mostly theoretical. I had no achievement in high school except when Hollywood people (and Jennifer Chayes!) came to me in school to sing or maybe re-record the official songs for the Bread and America bands, maybe Nirvana too, etc. I didn't get some million$ for my Hollywood music and movies at that time, not even thousand$. See, the lead singer in Bread is a "Gates", though it's my voice in the band's modern "official" songs. Others called me "David Gates" or "Bill Gates". To some, I was the real Bill Gates.
    15 replies | 2252 view(s)
  • Darek's Avatar
    4th August 2020, 17:51
    Darek replied to a thread paq8px in Data Compression
    As I found, the "english.exp" file was changed - now it's about 2KB smaller. The other two "english" files look the same.
    2040 replies | 558071 view(s)
  • Dresdenboy's Avatar
    4th August 2020, 16:39
    What is the state of implementing this idea? I had some thoughts and played it through in my mind (knowing my match/literal statistics from compressing binaries, though no text) in different variants (for example: not storing the matched string's literal bytes with the offset list, but referencing them from the literals block, avoiding the first explicit copy from the string list to some first position (code bits!) and leaving that to the literal filler instead). So far I think that saving the literal/match flags isn't enough to offset the needed pointers to either the first occurrence or the literal block, even with some kind of delta coding; going sequentially forward through either the referenced literals or the target positions will cause more randomized references on the other side. But for text compression it might actually work, due to more frequent reuse of matched strings. In my files I mostly see single-use matches (bytes at the same absolute position only copied once for a match). There is surely a way to calculate the benefits based on reuse probabilities for strings.
    15 replies | 2252 view(s)
More Activity