I developed an algorithm that can compress text from 10,000 letters to 9 letters, and it can decompress it back as well. But it's useless to me, so I want to sell it.
What kind of text? Are there many repetitions? If so, it's easily compressible.
If your file is random, then you are out of luck. Random content is compressible by at most a few percent, not anywhere near the ratio you have mentioned. But never say never - I am working on my own custom data compression software that will be able to handle ANY filetype and compress it to at least 90% losslessly, but it will be terribly slow.
Could you post some screenshots of your algorithm, or at least a compressed sample? Maybe then we can tell you more about it, and we can also help you compress it much better.
Thanks.
Sorry, I can't show you. Mine is just an algorithm, not software, so it has some shortcomings. It can compress random letters, and the 10k letters are random.
You don't need to sell it directly.
Just apply to http://prize.hutter1.net/ or http://mailcom.com/challenge/ or https://marknelson.us/posts/2012/10/...turns-ten.html .
There are also plenty of other contests where you can advertise your work.
Thanks, but my algorithm is not that advanced - it can't reach that level.
Just split it into blocks. If you can compress 10,000 letters to 9 bytes, then you can split enwik8 (10^8 bytes) into 10k blocks of 10,000 letters each and compress them to 10k*9 = 90k total.
That means you could claim the whole 50k euro prize.
But decompression needs a huge database or a supercomputer, and it can't compress Chinese text. That's why I want to let it go.
Try to compress these pure random files and post the compressed results as an attachment.
Database size is not a problem for me, and Chinese strings can be filtered out completely, so that's not a problem. I don't need you to decompress your results back to the original files - I only want the compressed archives of the original files. Thanks.
For "1 million random alphabet letters.txt" - 1,028 bytes result.
You mean 1,000,000 bytes of input and 1,028 bytes of output, i.e. 973 times smaller?
Does decompression also work, and does a file compare show the output equal to the input?
What file size do your compression and decompression programs have, and do they use a database? If yes, how big is the database?
How long did it take to compress and decompress?
What programming language did you use?
Any idea at what price you want to sell your algorithm?
It's not a problem even if it can only compress valid English... Just type out the data as text, i.e. 0xFF = 255 = "two five five".
Even if enwik8 becomes 1 GB that way, it should still be compressible to 900k, so you'd still get the full prize.
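A minimal sketch of such a byte-to-text transform (the word list, separator, and function names are my own choices, not anything from this thread):
Code:
# Hypothetical sketch: spell each byte's decimal digits out as English
# words, so arbitrary binary data becomes "valid English" text.
WORDS = ["zero", "one", "two", "three", "four",
         "five", "six", "seven", "eight", "nine"]

def bytes_to_text(data: bytes) -> str:
    # 0xFF -> "255" -> "two five five"; bytes separated by " / "
    return " / ".join(" ".join(WORDS[int(d)] for d in str(b)) for b in data)

def text_to_bytes(text: str) -> bytes:
    # Invert the transform: look each word up, glue the digits back together.
    digit = {w: str(i) for i, w in enumerate(WORDS)}
    return bytes(int("".join(digit[w] for w in group.split()))
                 for group in text.split(" / "))

assert text_to_bytes(bytes_to_text(b"\xff\x00a")) == b"\xff\x00a"
The blow-up is roughly an order of magnitude, which is in line with the 1 GB estimate above.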
Btw, hashing is not a solution for compression - not because it needs a "huge database or supercomputer"
to restore the input data from the hash value, but because of collisions.
Even with an assumed charset of [\x20a-z] (27 letters), you'd already start having collisions with 16 input symbols
and 9 bytes of output:
Code:
16 letters = 27^16 = 79766443076872509863361
9 bytes    = 256^9 =  4722366482869645213696
Also, you can't sell software rights that easily - the trade has to be officially registered in some way;
usually you'd get a patent for your algorithm, then sell it.
Otherwise you could always claim that your software was stolen once the buyer starts making money from it.
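The same pigeonhole check in a few lines of Python (my own illustration of the numbers above):
Code:
# Count distinct inputs vs. distinct 9-byte outputs.
inputs = 27**16    # 16 symbols over the 27-letter charset [\x20a-z]
outputs = 256**9   # every possible 9-byte compressed value
print(inputs)            # 79766443076872509863361
print(outputs)           # 4722366482869645213696
print(inputs > outputs)  # True - at least two inputs must share an output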
You're so nice, leading a stranger to the point. It's so good you're here.
Input:
1,000,000 bytes - 1 million random alphabet letters.txt
Output:
588,286 bytes - paq8px v178
588,001 bytes - cmix v17
-------------------------------------------------------
Input:
1,000,000 bytes - 1 million pure random data.txt
Output:
749,400 bytes - paq8px v178
748,956 bytes - cmix v17
1 million random alphabet letters.txt
charset = [a-z], size 26
1000000*Log[256.,26] = 587555
1 million pure random data.txt
charset = [\x0C\x1E07-9;=?ABD-FHIKMO-QTY\x5D\x5Ea-z\x7F\x83\x8D\x9E\xAF\xB0\xC6\xC7\xCE\xD3\xD5\xD8\xDF\xE0\xE5\xE7\xEC-\xF0\xF3\xF6\xF8\xFA\xFC], size 79
1000000*Log[256.,79] = 787973
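Those Log[256.,n] figures are just the entropy floor for uniformly random symbols - here is the same calculation restated in Python (my own restatement, not the poster's script):
Code:
import math

# Entropy floor in bytes for n uniform random symbols from a
# charset of size k: n * log_256(k).
def floor_bytes(n, k):
    return n * math.log(k, 256)

print(round(floor_bytes(1_000_000, 26)))  # 587555, the [a-z] figure
print(round(floor_bytes(1_000_000, 79)))  # 787973, the 79-char figure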
How do you know the charset?
This script prints it. It's in Perl.
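The Perl script itself isn't shown in the thread; a minimal Python equivalent that dumps a file's charset could look like this (the script name is hypothetical):
Code:
import sys

# Print every distinct byte value occurring in the file given as argv[1],
# plus the charset size needed for the entropy-floor estimate.
data = open(sys.argv[1], "rb").read()
charset = sorted(set(data))
print("size:", len(charset))
print("".join(f"\\x{b:02X}" for b in charset))
Run it as, e.g., python charset.py "1 million random alphabet letters.txt".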
Applying for a patent costs around RM15k. My algorithm can compress an unlimited amount of data (I think - I've only tried compressing 100k down to 11 letters). If I apply for a patent, is my algorithm worth it or not?
It is obvious that this fellow is pulling our chains and pressing our buttons. Let's make a graceful exit from his nonsense.
1,000,000 random digits to 1028 bytes? Absolute rubbish, and no way even given 10^9 years of time...
It depends on how you define "random" really: https://encode.su/threads/3099-Compr...ll=1#post59940
> How much is compressed text worth?
I had heard of your algorithm in high places during the Cold War. But only 2^8224 files can be addressed or compressed by your algorithm's 1,028-byte outputs - not enough to cover all the files in your 2^8000000 source file space, of course.
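The counting behind that claim, spelled out (my arithmetic, the same pigeonhole argument as earlier in the thread):
Code:
# A 1,028-byte output can address at most 256^1028 = 2^8224 files,
# while there are 2^8000000 possible 1,000,000-byte inputs.
out_bits = 8 * 1028       # 8224
in_bits = 8 * 1_000_000   # 8000000
print(out_bits < in_bits)  # True: the outputs cannot cover the input space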
But how much is a compression algorithm worth? I guess it must be more than the $120 million Microsoft paid Stac over the DoubleSpace infringement. Maybe a compression algorithm must be worth more than $210 million or $220 million, considering that DeepMind was bought by Google for only $400 million.
I've skimmed through this thread for the third time, and it strikes me again that this is a "Nigerian Prince" kind of thing - bogus claims for a gullible audience. But what if it could be true? Well, it reminds me that patent offices in some countries are prohibited by law from granting patents for any kind of perpetual motion machine. The laws should be amended to also prohibit the possibility of compression below the entropy.