
Thread: Most efficient/practical compression method for short strings?

  1. #1: Member, there (joined Aug 2009)

    Most efficient/practical compression method for short strings?

    Hello. I can't think of a better place to ask this.
    The strings in question may include some sort of XML tags, numbers, and plain text. How would you go about achieving the best possible compression ratio for such data?

  2. #2: Member, Kraków, Poland (joined Jun 2009)
    Preprocess with xml-wrt. It's included in some PAQ versions, or available standalone.
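
    Not XWRT itself, but a rough sketch of the word-replacement idea behind it (assumptions: the shared dictionary below is made up, and the input is plain ASCII so high byte values are free to use as codes). Both ends must agree on the dictionary, which is exactly what makes it workable for short strings:

    Code:
    # Minimal word-replacement preprocessing sketch (the idea behind xml-wrt).
    # The dictionary is illustrative only; encoder and decoder must share it.
    SHARED_DICT = ["<item>", "</item>", "<name>", "</name>", "price", "quantity"]

    # Map each entry to a single byte from 0x80 upward (unused by ASCII text).
    CODES = {word: bytes([0x80 + i]) for i, word in enumerate(SHARED_DICT)}

    def wrt_encode(text: str) -> bytes:
        data = text.encode("ascii")
        for word, code in CODES.items():
            data = data.replace(word.encode("ascii"), code)
        return data  # feed this to any general-purpose compressor

    def wrt_decode(data: bytes) -> str:
        for word, code in CODES.items():
            data = data.replace(code, word.encode("ascii"))
        return data.decode("ascii")

    sample = "<item><name>widget</name>price=3 quantity=2</item>"
    assert wrt_decode(wrt_encode(sample)) == sample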

  3. #3: Member, there (joined Aug 2009)
    Umm. I think I need to state my requirements more clearly.
    I have to implement compression for small, separate chunks of (mostly text) data. Assuming that general approaches using dictionaries and huge blocks are sub-optimal for this scenario, I would like to know what method or combination of methods gives the best theoretically possible results for compressing a tiny amount of data.

  4. #4
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,507
    Thanks
    742
    Thanked 665 Times in 359 Posts
    >best theoretically possible results

    There is my CM algorithm, which lets you store common statistics for a large block of data and then decode short fragments using those statistics as a dictionary.

  5. #5
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts
    Quote Originally Posted by never frog:
    Umm. I think I need to state my requirements more clearly.
    I have to implement compression for small, separate chunks of (mostly text) data. Assuming that general approaches using dictionaries and huge blocks are sub-optimal for this scenario, I would like to know what method or combination of methods gives the best theoretically possible results for compressing a tiny amount of data.
    There is no best way, since it's a moving target. But for small text
    files, I would view the text as its underlying string, do a binary
    BWTS, and then apply some simple bijective encoder. If these
    small separate chunks don't exist in separate files, you may have to
    add a count field at the start of the data so that it can be separated out.

    Easier yet is to just use a static Huffman or arithmetic table that is fixed
    and stored offline, but still do it bijectively as above.
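
    As a rough sketch of the "static table stored offline" route (not bijective, and the corpus counts here are invented), one can build a single Huffman code from corpus-wide character counts and reuse it for every short string, so no table ever travels with the data:

    Code:
    import heapq
    from collections import Counter

    def build_huffman_code(freqs):
        """Build a Huffman code (symbol -> bit string) from fixed symbol counts."""
        # Heap entries: (weight, tie_breaker, {symbol: code_so_far})
        heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freqs.items())]
        heapq.heapify(heap)
        tie = len(heap)
        while len(heap) > 1:
            w1, _, c1 = heapq.heappop(heap)
            w2, _, c2 = heapq.heappop(heap)
            merged = {s: "0" + c for s, c in c1.items()}
            merged.update({s: "1" + c for s, c in c2.items()})
            heapq.heappush(heap, (w1 + w2, tie, merged))
            tie += 1
        return heap[0][2]

    # Character counts measured once on a representative corpus (invented here),
    # then shipped with the program instead of with each compressed string.
    CORPUS_COUNTS = Counter("<item>42</item> plain text with numbers 123 " * 100)
    CODE = build_huffman_code(CORPUS_COUNTS)

    def encode(s):
        # Raises KeyError if the string contains a symbol the table never saw.
        return "".join(CODE[ch] for ch in s)

    short = "<item>12</item>"
    print(len(encode(short)), "bits vs", 8 * len(short), "raw")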

  6. #6
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
    Quote Originally Posted by Bulat Ziganshin:
    >best theoretically possible results

    There is my CM algorithm, which lets you store common statistics for a large block of data and then decode short fragments using those statistics as a dictionary.
    One can use prebuilt dictionaries instead of prebuilt models (or together with them).
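
    A concrete version of the prebuilt-dictionary route is zlib's preset dictionary (the zdict argument in Python's zlib module). The dictionary text below is made up; in practice it would be built from representative samples, with the most common material placed at the end because zlib matches against the tail first:

    Code:
    import zlib

    # Preset dictionary shared by compressor and decompressor (illustrative only).
    PRESET = b"quantity price plain text 0123456789<name></name><item></item>"

    def compress_short(data: bytes) -> bytes:
        c = zlib.compressobj(level=9, zdict=PRESET)
        return c.compress(data) + c.flush()

    def decompress_short(blob: bytes) -> bytes:
        d = zlib.decompressobj(zdict=PRESET)
        return d.decompress(blob) + d.flush()

    msg = b"<item><name>widget</name>price 42</item>"
    packed = compress_short(msg)
    print(len(msg), "->", len(packed), "bytes")
    assert decompress_short(packed) == msg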

  7. #7
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    In theory the best way to compress anything is to assign a probability to each possible input and use a single Huffman code for the whole string. If your strings are short enough that you can list every possibility, and you know the probabilities, that might even be practical.

    But usually, that's not the case. If you have more than a few thousand or million possible inputs, then use an arithmetic code. Guessing the probabilities is another matter. In general the problem is not even computable. That's what makes data compression an art instead of a science.
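
    When the input set really is small enough to enumerate, the whole-string code can be computed directly. A minimal sketch, with both the message set and the probabilities invented for illustration, that derives Huffman code lengths over complete strings and compares the expected cost with the entropy bound:

    Code:
    import heapq, math

    # Every possible input with its assumed probability (both invented here).
    MESSAGES = {
        "<status>ok</status>":        0.50,
        "<status>retry</status>":     0.25,
        "<status>timeout</status>":   0.15,
        "<status>error 500</status>": 0.10,
    }

    # Huffman over whole strings: each complete message is one symbol.
    heap = [(p, i, [m]) for i, (m, p) in enumerate(MESSAGES.items())]
    heapq.heapify(heap)
    depth = {m: 0 for m in MESSAGES}
    tie = len(heap)
    while len(heap) > 1:
        p1, _, g1 = heapq.heappop(heap)
        p2, _, g2 = heapq.heappop(heap)
        for m in g1 + g2:      # each merge adds one bit to every member's code
            depth[m] += 1
        heapq.heappush(heap, (p1 + p2, tie, g1 + g2))
        tie += 1

    expected = sum(MESSAGES[m] * depth[m] for m in MESSAGES)
    entropy = -sum(p * math.log2(p) for p in MESSAGES.values())
    print(f"expected {expected:.2f} bits/message, entropy {entropy:.2f} bits")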
