Page 1 of 2
Results 1 to 30 of 62

Thread: How much compress text worth ?

  1. #1
    Member
    Join Date
    May 2019
    Location
    Malaysia
    Posts
    43
    Thanks
    8
    Thanked 0 Times in 0 Posts

    How much compress text worth ?

    I developed an algorithm that can compress text from 10000 letters to 9 letters, and it can decompress it too. But it is useless to me, so I want to sell it.

  2. #2
    Member CompressMaster's Avatar
    Join Date
    Jun 2018
    Location
    Lovinobana, Slovakia
    Posts
    216
    Thanks
    66
    Thanked 18 Times in 18 Posts
    What kind of text? Are there many repetitions? If so, it's easily compressible.
    If your file is random, then you are out of luck. Random content is compressible by a few percent at best, nowhere near as much as you have mentioned. But never say never: I am working on my own data compression software that will be able to handle ANY file type and compress it at least to 90% losslessly, but it will be terribly slow.

    Could you post some screenshots of your algorithm, or at least a compressed sample? Then maybe we can tell you more about it and also help you compress it much better.

    Thanks.
    Last edited by CompressMaster; 9th May 2019 at 22:05. Reason: small typo

  3. #3
    Member
    Join Date
    May 2019
    Location
    Malaysia
    Posts
    43
    Thanks
    8
    Thanked 0 Times in 0 Posts
    Sorry, I can't show you. Mine is just an algorithm, not software, so it has some shortcomings. It can compress random letters, and the 10k letters are random.

  4. #4
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    4,135
    Thanks
    320
    Thanked 1,397 Times in 802 Posts
    You don't need to sell it directly.
    Just apply to http://prize.hutter1.net/ or http://mailcom.com/challenge/ or https://marknelson.us/posts/2012/10/...turns-ten.html .
    There're also plenty of other contests where you can advertise your work.

  5. #5
    Member
    Join Date
    May 2019
    Location
    Malaysia
    Posts
    43
    Thanks
    8
    Thanked 0 Times in 0 Posts
    Thanks. But my algorithm is not that advanced; it can't reach that level.

  6. #6
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    4,135
    Thanks
    320
    Thanked 1,397 Times in 802 Posts
    Just split it into blocks. If you can compress 10000 letters to 9 bytes, it means you can split enwik8 into 10k blocks and compress them to 10k*9=90k total.
    It means you can claim the whole 50k euro prize.
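
    The block arithmetic in this post can be checked with a few lines. This is only a sketch of the calculation: it assumes enwik8 is exactly 10^8 bytes, and the helper name `claimed_total` is made up for illustration.

```python
# Assumption: enwik8 is the first 10**8 bytes of an English Wikipedia dump.
ENWIK8_SIZE = 100_000_000
BLOCK_SIZE = 10_000       # letters per block, taken from the claim
COMPRESSED_BLOCK = 9      # bytes per compressed block, taken from the claim


def claimed_total(input_size, block_size=BLOCK_SIZE, out_per_block=COMPRESSED_BLOCK):
    """Return (number of blocks, total output bytes) if the claim held for every block."""
    blocks = -(-input_size // block_size)  # ceiling division
    return blocks, blocks * out_per_block


blocks, total = claimed_total(ENWIK8_SIZE)
print(blocks, total)  # 10000 90000
```

    The same helper with a 1G input gives 100,000 blocks and 900,000 bytes.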

  7. #7
    Member
    Join Date
    May 2019
    Location
    Malaysia
    Posts
    43
    Thanks
    8
    Thanked 0 Times in 0 Posts
    But decompression needs a huge database or a supercomputer, and it can't compress Chinese words. That's why I want to let it go.

  8. #8
    Member CompressMaster's Avatar
    Join Date
    Jun 2018
    Location
    Lovinobana, Slovakia
    Posts
    216
    Thanks
    66
    Thanked 18 Times in 18 Posts
    Try to compress these pure random files and post the compressed results as an attachment.

    Database size is not a problem for me, and Chinese strings can be completely filtered out, so that's not a problem. I don't need to decompress your results back to the original files; I only want the compressed archive of the original files. Thanks.

  9. #9
    Member
    Join Date
    May 2019
    Location
    Malaysia
    Posts
    43
    Thanks
    8
    Thanked 0 Times in 0 Posts
    For 1 million random alphabet letters.txt: the result is 1028 bytes.
    Last edited by Obama; 10th May 2019 at 14:46.

  10. #10
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    1,040
    Thanks
    104
    Thanked 420 Times in 293 Posts
    Quote Originally Posted by Obama View Post
    1 million random alphabet letters.txt -1028 bytes
    You mean 1,000,000 bytes of input and 1,028 bytes of output, so about 973 times smaller?

    Did decompression also work, and does a file compare show the output equal to the input?
    What is the file size of your compression and decompression software, and does it use a database? If yes, how big is the database?
    How long did it take to compress and decompress?
    What programming language did you use?
    Any idea what price you want to sell your algorithm for?

  11. #11
    Member
    Join Date
    May 2019
    Location
    Malaysia
    Posts
    43
    Thanks
    8
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Sportman View Post
    You mean 1,000,000 bytes of input and 1,028 bytes of output, so about 973 times smaller?

    Did decompression also work, and does a file compare show the output equal to the input?
    What is the file size of your compression and decompression software, and does it use a database? If yes, how big is the database?
    How long did it take to compress and decompress?
    What programming language did you use?
    Any idea what price you want to sell your algorithm for?
    Yes, 1,000,000 bytes to 1,028 bytes.
    Yes, it is equal if I try with a short compression.
    A few GB.
    A few days.
    Python, but I can use any language because I have the algorithm.
    No idea, just make me an offer.

  12. #12
    Member
    Join Date
    Sep 2018
    Location
    Philippines
    Posts
    121
    Thanks
    31
    Thanked 2 Times in 2 Posts
    Quote Originally Posted by Obama View Post
    Yes, 1,000,000 bytes to 1,028 bytes.
    Yes, it is equal if I try with a short compression.
    A few GB.
    A few days.
    Python, but I can use any language because I have the algorithm.
    No idea, just make me an offer.
    Man, 1028 bytes? That looks to me like my output: a 1K frequency table plus the famed 32-bit file size.

    You might have an algorithm for some tailored inputs, but surely you can't compress *all* the 1,000,000-byte random files into just 1028 bytes.
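
    To see where a figure like 1028 could come from, here is a minimal sketch of a header made of a byte-frequency table plus a file size. The layout (256 little-endian 32-bit counters followed by a 32-bit length) is an assumption chosen to match "1K frequency table plus 32-bit filesize"; it is not taken from any poster's actual program, and `order0_header` is a hypothetical name.

```python
import struct
from collections import Counter


def order0_header(data: bytes) -> bytes:
    """Build a hypothetical header: 256 uint32 byte counts, then a uint32 file size."""
    counts = Counter(data)  # iterating over bytes yields ints 0..255
    table = struct.pack("<256I", *(counts.get(b, 0) for b in range(256)))
    size = struct.pack("<I", len(data))
    return table + size


print(len(order0_header(b"example input")))  # 1028
```

    Such a header has the same length for every input that fits the 32-bit counters, and it records only how often each byte occurs, not the order of the bytes.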

  13. #13
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    1,040
    Thanks
    104
    Thanked 420 Times in 293 Posts
    Quote Originally Posted by Obama View Post
    few GB.
    Few day
    Is the "few GB" a static, fixed dictionary, or is it altered and/or enlarged with every input?

    How can it be a "Few day" when there were only 11:22 hours between the random text test file post and your answer? Or did you decompress it later?

  14. #14
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    4,135
    Thanks
    320
    Thanked 1,397 Times in 802 Posts
    It's not a problem even if it can only compress valid English... Just type out the data as text, i.e. 0xFF = 255 = "two five five".
    Even if enwik8 grows to 1G that way, it should still be compressible to 900k, so you'd still get the full prize.
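
    The "type out the data as text" trick can be sketched as below. The post only gives the 0xFF example, so zero-padding every byte to three decimal digits (which makes the text decodable without separators) is an assumption, and both function names are hypothetical.

```python
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]
WORD_TO_DIGIT = {word: str(i) for i, word in enumerate(DIGIT_WORDS)}


def spell_bytes(data: bytes) -> str:
    """Write each byte as three decimal digits, each digit spelled as an English word."""
    words = []
    for b in data:
        for digit in f"{b:03d}":  # assumption: zero-pad to exactly three digits
            words.append(DIGIT_WORDS[int(digit)])
    return " ".join(words)


def unspell(text: str) -> bytes:
    """Inverse of spell_bytes: read the digit words back in groups of three."""
    digits = "".join(WORD_TO_DIGIT[word] for word in text.split())
    return bytes(int(digits[i:i + 3]) for i in range(0, len(digits), 3))


print(spell_bytes(b"\xff"))  # two five five
```

    The text is many times larger than the binary input, which is why the post allows for enwik8 growing to about 1G.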

    Btw, hashing is not a solution for compression, not because it needs "huge database or super computer"
    to restore input data from hash value, but because of collisions.
    Even with assumed charset [\x20a-z] of 27 letters, you'd still start having collisions with 16 input symbols
    and 9 bytes of output:
    Code:
    16 letters = 27^16 = 79766443076872509863361
    9 bytes = 256^9 =     4722366482869645213696
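
    The two numbers above can be reproduced with exact integer arithmetic. This is a small sketch of the counting only; `first_colliding_length` is a hypothetical helper name.

```python
def first_colliding_length(alphabet_size: int, output_bytes: int) -> int:
    """Smallest input length n for which there are more inputs than possible outputs."""
    outputs = 256 ** output_bytes
    n = 1
    while alphabet_size ** n <= outputs:
        n += 1
    return n


print(27 ** 16)                       # 79766443076872509863361
print(256 ** 9)                       # 4722366482869645213696
print(first_colliding_length(27, 9))  # 16
```

    With 15 letters there are still fewer inputs than 9-byte outputs; from 16 letters on, at least two different inputs must share an output.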
    Also, you can't sell software rights that easily - the trade has to be officially registered in some way,
    usually you'd get a patent for your algorithm, then sell it.
    Otherwise you can always claim that your software was stolen, once the buyer starts making money from it.

  15. Thanks:

    Obama (10th May 2019)

  16. #15
    Member
    Join Date
    May 2019
    Location
    Malaysia
    Posts
    43
    Thanks
    8
    Thanked 0 Times in 0 Posts
    You are so nice, leading a stranger to the point. It's good that you are here.

  17. #16
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    1,040
    Thanks
    104
    Thanked 420 Times in 293 Posts
    Input:
    1,000,000 bytes - 1 million random alphabet letters.txt

    Output:
    588,286 bytes - paq8px v178
    588,001 bytes - cmix v17

    -------------------------------------------------------

    Input:
    1,000,000 bytes - 1 million pure random data.txt

    Output:
    749,400 bytes - paq8px v178
    748,956 bytes - cmix v17

  18. #17
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    4,135
    Thanks
    320
    Thanked 1,397 Times in 802 Posts
    1 million random alphabet letters.txt
    charset = [a-z], size 26
    1000000*Log[256.,26] = 587555

    1 million pure random data.txt
    charset = [\x0C\x1E07-9;=?ABD-FHIKMO-QTY\x5D\x5Ea-z\x7F\x83\x8D\x9E\xAF\xB0\xC6\xC7\xCE\xD3\xD5\xD8\xDF\xE0\xE5\xE7\xEC-\xF0\xF3\xF6\xF8\xFA\xFC], size 79
    1000000*Log[256.,79] = 787973
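
    The two estimates above can be recomputed in Python. The sketch assumes every symbol of the charset is equally likely and independent of the others, and `uniform_bound` is a hypothetical helper name.

```python
import math


def uniform_bound(n_symbols: int, charset_size: int) -> int:
    """Bytes needed for n_symbols drawn uniformly and independently from the charset."""
    bits_per_symbol = math.log2(charset_size)
    return round(n_symbols * bits_per_symbol / 8)


print(uniform_bound(1_000_000, 26))  # 587555
print(uniform_bound(1_000_000, 79))  # 787973
```

    The paq8px and cmix results for the letters file sit just above the first figure. Their results for the second file are below the second figure, which suggests that its 79 symbols are not equally frequent or not independent.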

  19. #18
    Member
    Join Date
    May 2019
    Location
    Malaysia
    Posts
    43
    Thanks
    8
    Thanked 0 Times in 0 Posts
    How do you know the charset?


  20. #19
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    4,135
    Thanks
    320
    Thanked 1,397 Times in 802 Posts
    This script prints it. It's in Perl.
    Attached Files
    • File Type: zip 1.zip (497 Bytes, 58 views)
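
    The attached Perl script is not reproduced in the thread. As a rough idea of what such a tool does, here is a Python sketch that collects the distinct byte values of some data and prints them as a regex-like class. The formatting rules (alphanumerics shown literally, everything else as \xNN, runs of three or more merged into ranges) are assumptions, not the script's actual behaviour.

```python
def charset(data: bytes) -> list:
    """Sorted list of the distinct byte values that occur in data."""
    return sorted(set(data))


def format_charset(symbols: list) -> str:
    """Render sorted byte values as a regex-like class, merging runs of 3 or more."""
    def show(b: int) -> str:
        ch = chr(b)
        return ch if b < 128 and ch.isalnum() else f"\\x{b:02X}"

    parts, i = [], 0
    while i < len(symbols):
        j = i
        # Extend j to the end of the current run of consecutive values.
        while j + 1 < len(symbols) and symbols[j + 1] == symbols[j] + 1:
            j += 1
        if j - i >= 2:
            parts.append(f"{show(symbols[i])}-{show(symbols[j])}")
        else:
            parts.extend(show(symbols[k]) for k in range(i, j + 1))
        i = j + 1
    return "[" + "".join(parts) + "]"


sample = bytes(range(ord("a"), ord("z") + 1)) * 3
symbols = charset(sample)
print(format_charset(symbols), "size", len(symbols))  # [a-z] size 26
```

    For a real file, the data would be read with `open(path, "rb").read()` before calling `charset`.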

  21. Thanks:

    Obama (10th May 2019)

  22. #20
    Member
    Join Date
    May 2019
    Location
    Malaysia
    Posts
    43
    Thanks
    8
    Thanked 0 Times in 0 Posts
    Applying for a patent costs around RM15k. My algorithm can compress data without limit (I think; I just tried compressing 100k to 11 letters). If I apply for a patent, is my algorithm worth it or not?

  23. #21
    Member
    Join Date
    Nov 2015
    Location
    -
    Posts
    75
    Thanks
    298
    Thanked 18 Times in 14 Posts
    Quote Originally Posted by Obama View Post
    Applying for a patent costs around RM15k. My algorithm can compress data without limit (I think; I just tried compressing 100k to 11 letters). If I apply for a patent, is my algorithm worth it or not?
    We cannot tell whether it is worth patenting,
    since we need more information about the algorithm.
    ___
    By the way, how much can you compress book1?
    ___
    Have you tried to decode the archive and check the MD5 of the files?
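
    A round-trip check of the kind asked about here can be sketched as follows. zlib is used only as a stand-in for whatever compressor is being tested, and `round_trip_ok` is a hypothetical helper; any pair of compress and decompress functions can be passed in.

```python
import hashlib
import zlib


def md5_hex(data: bytes) -> str:
    """MD5 digest of data as a hex string."""
    return hashlib.md5(data).hexdigest()


def round_trip_ok(data: bytes, compress=zlib.compress, decompress=zlib.decompress):
    """Compress, decompress, and compare MD5 digests of original and restored data.

    Returns (digests_match, compressed_size_in_bytes).
    """
    packed = compress(data)
    restored = decompress(packed)
    return md5_hex(restored) == md5_hex(data), len(packed)


ok, packed_size = round_trip_ok(b"hello world " * 1000)
print(ok, packed_size)
```

    A compressed size means nothing until the first value is True for the actual file.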

  24. #22
    Member
    Join Date
    May 2019
    Location
    Malaysia
    Posts
    43
    Thanks
    8
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by xinix View Post
    We cannot tell whether it is worth patenting,
    since we need more information about the algorithm.
    ___
    By the way, how much can you compress book1?
    ___
    Have you tried to decode the archive and check the MD5 of the files?
    Compression is no problem for me, but for decompression I need a huge database or a supercomputer. That's the reason I want to sell it. I can't afford it.

  25. #23
    Member
    Join Date
    Nov 2015
    Location
    -
    Posts
    75
    Thanks
    298
    Thanked 18 Times in 14 Posts
    Quote Originally Posted by Obama View Post
    Compression is no problem for me, but for decompression I need a huge database or a supercomputer. That's the reason I want to sell it. I can't afford it.
    How long will it take to unpack a 1MB file?

  26. #24
    Member
    Join Date
    May 2019
    Location
    Malaysia
    Posts
    43
    Thanks
    8
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by xinix View Post
    We cannot tell whether it is worth patenting,
    since we need more information about the algorithm.
    ___
    By the way, how much can you compress book1?
    ___
    Have you tried to decode the archive and check the MD5 of the files?
    Done. It took just a few hours; the result is 847 bytes.

  27. #25
    Member
    Join Date
    Nov 2015
    Location
    -
    Posts
    75
    Thanks
    298
    Thanked 18 Times in 14 Posts
    Quote Originally Posted by Obama View Post
    Done. It took just a few hours; the result is 847 bytes.
    Thanks.
    Interesting.
    Could you post this 847-byte file here?
    Only the compressed file.
    I think we can squeeze it again.

  28. #26
    Member
    Join Date
    Oct 2009
    Location
    usa
    Posts
    62
    Thanks
    1
    Thanked 9 Times in 6 Posts
    It is obvious that this fellow is pulling our chains and pressing our buttons. Let's make a graceful exit from his nonsense.

    1,000,000 random digits to 1028 bytes? Absolute rubbish, and no way even given 10^9 years of time...

  29. #27
    Member
    Join Date
    May 2019
    Location
    Malaysia
    Posts
    43
    Thanks
    8
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by zyzzle View Post
    It is obvious that this fellow is pulling our chains and pressing our buttons. Let's make a graceful exit from his nonsense.

    1,000,000 random digits to 1028 bytes? Absolute rubbish, and no way even given 10^9 years of time...
    Why is it rubbish? Please explain it to me.

  30. #28
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    4,135
    Thanks
    320
    Thanked 1,397 Times in 802 Posts
    It depends on how you define "random" really: https://encode.su/threads/3099-Compr...ll=1#post59940

  31. Thanks:

    xinix (14th May 2019)

  32. #29
    Member
    Join Date
    Sep 2018
    Location
    Philippines
    Posts
    121
    Thanks
    31
    Thanked 2 Times in 2 Posts
    > How much compress text worth ?

    I had heard of your algorithm in the high places during the Cold War. But there are only 2 ^ (8,224) files addressed or compressed by your algorithm, not enough to cover all the files in your 2 ^ (8,000,000) source file space, of course.
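
    The exponents in this counting argument follow directly from the file sizes. The sketch below works with log2 values rather than the huge numbers themselves; it counts only outputs of exactly the given length, and the function name is hypothetical.

```python
def log2_max_coverable_fraction(input_bytes: int, output_bytes: int) -> int:
    """log2 of the largest fraction of all inputs that can get distinct outputs.

    There are 2**(8*input_bytes) possible inputs and 2**(8*output_bytes)
    possible outputs of exactly output_bytes bytes.
    """
    return min(0, 8 * output_bytes - 8 * input_bytes)


print(8 * 1028)                                      # 8224
print(8 * 1_000_000)                                 # 8000000
print(log2_max_coverable_fraction(1_000_000, 1028))  # -7991776
```

    So at most one input in 2^7,991,776 can be restored from a 1028-byte output, however the algorithm works.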

    But how much is a compression algorithm worth? I guess it must be more than the $120 million paid by Microsoft to Stac for the DoubleSpace infringement. Maybe a compression algorithm must be worth more than $210 million or $220 million, considering that DeepMind was bought by Google for only $400 million.
    Last edited by compgt; 18th May 2019 at 09:59.

  33. #30
    Member
    Join Date
    Jan 2017
    Location
    Selo Bliny-S'edeny
    Posts
    24
    Thanks
    7
    Thanked 10 Times in 8 Posts
    I have skimmed through this thread for the third time, and it strikes me again that this is a "Nigerian Prince" kind of thing: bogus claims for a gullible audience. But what if this could be true? Well, it reminds me that patent offices in some countries are prohibited by law from granting patents related to any kind of perpetual motion machine. The laws must be amended to prohibit the possibility of compression below the entropy.
