
Thread: I am looking for the best TEXT compression algorithm available

  1. #31
    Member Gotty's Avatar
    Join Date
    Oct 2017
    Location
    Switzerland
    Posts
    554
    Thanks
    356
    Thanked 356 Times in 193 Posts
    Quote Originally Posted by Kennon Conrad View Post
    You cannot get better than 87.5% if the data is random and the two symbols have equal probability.
    "the data is random" entails "the two symbols have equal probability". You don't need both requirements.
    I used the wording "truly random" to try to emphasize that but "the two symbols have equal probability" is probably better for CompressMaster to comprehend it. Add "no useful patterns" to the requirements, and you have the practical definition of "randomness".

    Anyway, we are on the same page, and thanx for the analysis.
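
    For readers who want to check the 87.5% figure, here is a minimal sketch. It assumes each of the two symbols is stored as one 8-bit byte, which the discussion implies but this page does not state outright. The names EntropyBound, BinaryEntropy and BestSaving are made up for the illustration.

```csharp
using System;

static class EntropyBound
{
    // Shannon entropy in bits per symbol for a two-symbol source
    // where the first symbol has probability p.
    public static double BinaryEntropy(double p)
    {
        if (p <= 0.0 || p >= 1.0) return 0.0; // a certain outcome carries no information
        return -(p * Math.Log(p, 2) + (1 - p) * Math.Log(1 - p, 2));
    }

    // Largest possible size reduction when every symbol occupies
    // bitsPerStoredSymbol bits in the original file.
    public static double BestSaving(double p, int bitsPerStoredSymbol)
    {
        return 1.0 - BinaryEntropy(p) / bitsPerStoredSymbol;
    }

    static void Main()
    {
        // Assumption: one symbol per 8-bit byte, both symbols equally likely.
        Console.WriteLine(BestSaving(0.5, 8));
    }
}
```

    With p = 0.5 the entropy is 1 bit per symbol, so the bound works out to 1 - 1/8 = 0.875. If the symbols are not equally likely, the entropy drops below 1 bit and the bound rises, which is why the equal-probability condition matters.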

  2. Thanks:

    Kennon Conrad (28th July 2018)

  3. #32
    Member Gotty's Avatar
    Quote Originally Posted by CompressMaster View Post
    "QQQQQQq" will be replaced by "dog "
    "QQQQq" will be replaced by "car "
    "QQq" will be replaced by "cat "
    and so on...
    Quote Originally Posted by Piotr Tarsa View Post
    Text models usually help text compression, but they are tuned on real texts so they rely on properties of real texts ...
    @Piotr: Thanx.
    @CompressMaster: Replacing binary symbols or sequences with words will not make the file a real text file.
    This is a real text file (string): "A dog was chasing a cat on a warm Saturday morning." This has the properties of real text. Text compressors are tuned for this. (See Piotr's post.)
    This is a file containing binary information: "cat cat cat cat dog cat cat dog car". It does not have the properties of real text. Text compressors will not like it very much.
    Conclusion: you cannot convert your binary file to a real text file.
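
    To make the point concrete, here is a rough sketch showing that the word substitution is only a one-to-one relabeling: the words can be mapped straight back to the original runs, so the order of the symbols (the part a compressor has to predict) is unchanged. The three mappings are the ones quoted from CompressMaster; the class and method names are invented for this example, and joining the words with single spaces is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Relabel
{
    // Mapping quoted in the post; only these three runs are covered here.
    static readonly Dictionary<string, string> Words = new Dictionary<string, string>
    {
        { "QQQQQQq", "dog" }, { "QQQQq", "car" }, { "QQq", "cat" }
    };

    // Cuts the input after every 'q' terminator and replaces each run with its word.
    // Throws KeyNotFoundException for a run that is not in the table.
    public static string ToWords(string runs)
    {
        var result = new List<string>();
        int start = 0;
        for (int i = 0; i < runs.Length; i++)
        {
            if (runs[i] != 'q') continue;
            result.Add(Words[runs.Substring(start, i - start + 1)]);
            start = i + 1;
        }
        return string.Join(" ", result);
    }

    // Inverts ToWords: every word maps back to exactly one run.
    public static string ToRuns(string words)
    {
        var inverse = Words.ToDictionary(kv => kv.Value, kv => kv.Key);
        return string.Concat(words.Split(' ').Select(w => inverse[w]));
    }

    static void Main()
    {
        string runs = "QQq" + "QQq" + "QQQQQQq" + "QQQQq";
        string words = ToWords(runs);
        Console.WriteLine(words);                 // cat cat dog car
        Console.WriteLine(ToRuns(words) == runs); // True
    }
}
```

    The output reads like words, but it has none of the grammar or vocabulary statistics that text models are tuned for.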

  4. #33
    Member Gotty's Avatar
    CompressMaster, you were right at the very beginning.
    RLE preprocessing. Maximum 4-bit lengths. Symbols are: series of O's, ending with one o.
    (Why don't you tell us how you generated the file?)

    C#:
    Code:
    using System.Collections.Generic;
    using System.IO;

    namespace Decoder
    {
        class Program
        {
            static void Main(string[] args)
            {
                // Read the unary-encoded input: runs of 'O', each terminated by one 'o'.
                byte[] bytes = File.ReadAllBytes("only-O.txt");
                List<int> lengths = new List<int>();
                // Unary to lengths: a run of n 'O's encodes the nibble value n - 1.
                int length = 0;
                for (int i = 0; i < bytes.Length; i++) {
                    if (bytes[i] == 'O') length++;
                    else { lengths.Add(length - 1); length = 0; }
                }
                // Pack pairs of 4-bit values into bytes, high nibble first.
                // The i + 1 bound skips an unpaired trailing nibble instead of throwing.
                List<byte> encoded_data = new List<byte>(lengths.Count / 2);
                for (int i = 0; i + 1 < lengths.Count; i += 2)
                    encoded_data.Add((byte)((lengths[i] << 4) | lengths[i + 1]));
                // Write the decoded bytes.
                File.WriteAllBytes("photo.jpg", encoded_data.ToArray());
            }
        }
    }
    Edit: updated code to reveal photo immediately.
    Last edited by Gotty; 28th July 2018 at 13:58.
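
    For completeness, here is a sketch of the reverse direction, inferred only from what the decoder above expects: each nibble value v becomes v + 1 'O' characters followed by one 'o', high nibble first. How the original file was really generated is not stated in the thread, so treat this as an assumption. UnaryNibbleEncoder is a made-up name.

```csharp
using System;
using System.Text;

static class UnaryNibbleEncoder
{
    // Encodes every byte as two unary runs (high nibble, then low nibble).
    public static string Encode(byte[] data)
    {
        var sb = new StringBuilder();
        foreach (byte b in data)
        {
            AppendNibble(sb, b >> 4);
            AppendNibble(sb, b & 0x0F);
        }
        return sb.ToString();
    }

    static void AppendNibble(StringBuilder sb, int nibble)
    {
        sb.Append('O', nibble + 1); // run length is nibble + 1, so value 0 still yields one 'O'
        sb.Append('o');             // terminator
    }

    static void Main()
    {
        // 0x21: high nibble 2 -> "OOOo", low nibble 1 -> "OOo"
        Console.WriteLine(Encode(new byte[] { 0x21 })); // OOOoOOo
    }
}
```

    Under this scheme each input byte grows to between 4 and 34 characters, which would explain why the encoded file is so much larger than the photo.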

  5. #34
    Member Gotty's Avatar
    Oh, it's already done in the other thread.
    I'm late.

  6. #35
    Member
    Join Date
    Feb 2016
    Location
    Luxembourg
    Posts
    546
    Thanks
    203
    Thanked 796 Times in 322 Posts
    @Gotty
    See here (JPEG nibble-encoded in unary with q and Q instead of 0 and 1).

    132.827 bytes with paq8px_v151 after the bit1 tool from Shelwien to undo the encoding.

  7. Thanks:

    Gotty (28th July 2018)

  8. #36
    Member Gotty's Avatar
    Yeah, I'm kind of late. Nevertheless I updated my code to reveal the photo immediately.
    You are right, it's unary-encoded.

  9. #37
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    695
    Thanks
    153
    Thanked 183 Times in 108 Posts
    Quote Originally Posted by Gotty View Post
    CompressMaster, you were right at the very beginning.
    RLE preprocessing. Maximum 4-bit lengths. Symbols are: series of O's, ending with one o.
    (Why don't you tell us how you generated the file?)

    Stupid question, but I have to ask. Is that all the code that's needed for C#, and is it good practice, etc.? If so, I really need to learn it. It looks so nice and clean compared to C.

  10. #38
    Member Gotty's Avatar
    Yeah, that's all the code needed to grab only-O.txt and convert it to photo.jpg.

    {start of ad}

    When you need to develop or test a new idea, it's perfect: getting from idea to working code takes just a couple of minutes. The .NET environment comes with a lot of useful pre-baked classes.
    C# makes/helps/forces you to think object oriented.
    C# and Java are very close relatives. If you learn one, you are really close to learning the other.

    The memory management is automatic. No manual pointer handling in normal code, and the garbage collector takes care of most memory leaks.
    You can skip a lot of error handling like index boundary checks - the system will do it for you automatically.
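
    A small sketch of what the automatic boundary check looks like in practice: reading past the end of an array does not touch stray memory as it can in C, the runtime throws IndexOutOfRangeException instead. BoundsCheckDemo and ThrowsOnRead are names invented for this illustration.

```csharp
using System;

static class BoundsCheckDemo
{
    // Returns true if reading data[index] triggered the runtime's bounds check.
    public static bool ThrowsOnRead(int[] data, int index)
    {
        try
        {
            _ = data[index]; // the runtime validates the index on every access
            return false;
        }
        catch (IndexOutOfRangeException)
        {
            return true;
        }
    }

    static void Main()
    {
        int[] data = { 1, 2, 3 };
        Console.WriteLine(ThrowsOnRead(data, 1)); // False
        Console.WriteLine(ThrowsOnRead(data, 3)); // True
    }
}
```

    You still have to decide what to do when the exception happens, but you cannot silently read or overwrite unrelated memory this way.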

    Now it comes with a free development environment with a (really good) debugger, code editor, performance profiler, code analyzer, whatnot.
    My favorite IDE feature (besides the debugger) is that when you type and reach for variables, classes, objects, or object members, the editor immediately shows you your choices. This feature is really helpful for fast development. Also, you don't even have to compile your code to find out if you have made some (stupid) mistakes: the editor tells you on the fly, as you type, what problems it sees.

    It is worth learning.

    {end of ad}

  11. Thanks:

    Kennon Conrad (29th July 2018)
