Results 1 to 17 of 17

Thread: What method you propose for comperssing this bitstring?

  1. #1
    Member Ali Imran Khan Shirani's Avatar
    Join Date
    Sep 2010
    Location
    UAE
    Posts
    11
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Question What method you propose for comperssing this bitstring?

    You have a 128bits long bit string. We split it to two 64bit strings. Result is:

    00110010001000000011001000011100010110001000011111 00011110001110
    00100010110000000001011110011110011110111011111111 00111111011111

    Where these two strings (as now) put on separate lines, if we see closely, there are a lot of bits matching in both strings at same positions. In this case 45 bits match which have same either 0 or 1 at same position. 22 0's and 23 1's matched exactly in same positions in both strings.

    Survey question:
    What is the best method you propose to compress this 128 bit string to less than 112 bits ?



    NOTE: I asked this question before to Dr. Matt, and I am not sure what he proposed, as far I remember it was, try compressing the XOR of these 2 values.

    Input is requested.

  2. #2
    Member Ali Imran Khan Shirani's Avatar
    Join Date
    Sep 2010
    Location
    UAE
    Posts
    11
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Post

    One more large string of 512 bits, when splitted, results in two 256 bitstrings containing total 191 matches (95 0's 96 1'x).

    First 256 bits block
    11010111011001000100011100010110101010010100010001 01000011001001000111010111011001001101100100010000 11100011110011011000101101101011011001100011101110 01111001101111000100011000111011110000111001100101 01001001100100101111000011011011000001110010101100 110010

    Second 256 bits block
    10000101011001000011011000010010101000111001111001 00000011001101000110010101001001010101110101111000 11000111110011010101101101001111001000111111001111 10001001101111000100011001111011111000011101100001 01000010100100111111011010011111100100110000101111 111000

    If you (reader) want, I can post decimal, octal, hex, as well as presentation as unsigned char in 64 bytes.

  3. #3
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Well, all adaptive data compression methods must have large enough input passed to learn from. For such a small inputs you need something like LZW with a static dictionary or sort of Vector Quantization.

  4. #4
    Member
    Join Date
    Sep 2008
    Location
    France
    Posts
    863
    Thanks
    460
    Thanked 257 Times in 105 Posts
    Compressing the XOR with a classic entropy scheme will give you 112 bits.

    To get something better, you need more precise model than the simple statement "45 out of 64 bits are identical".

    For example, you could assign a dedicated probability for each bit position.
    Wether it makes sense entirely depends on your data set, and cannot be properly evaluated with only 2 samples.

    Best guess : if you are compressing a structure list of 64-bits field, you'll find indeed a lot of common similarities.
    You may end up compressing 64 flow of 1-bit wide channels (most generic method).
    If you know better what's into these field, you could create a more optimised (and more specific) model.

  5. #5
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    The key to compression is to understand what the data means. Compression is an AI problem.

  6. #6
    Member Ali Imran Khan Shirani's Avatar
    Join Date
    Sep 2010
    Location
    UAE
    Posts
    11
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hi Dr. matt,

    Was trying a brute force method to break complexity of some data. In the above data ofcourse there is absolutely no meanings but just a transform for finding more matches (just for fun), that might result in increase in probability of certain symbols at nibbles level, and has no such model rightnow, thus, wanted opinion from you guys.

    Thanks for your reply Doc.
    regards

  7. #7
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    How is the data created? Where does it come from? You have to answer those questions to find the best compression algorithm.

  8. #8
    Member
    Join Date
    May 2009
    Location
    China
    Posts
    36
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Ali Imran Khan Shirani View Post
    You have a 128bits long bit string. We split it to two 64bit strings. Result is:

    00110010001000000011001000011100010110001000011111 00011110001110
    00100010110000000001011110011110011110111011111111 00111111011111

    Where these two strings (as now) put on separate lines, if we see closely, there are a lot of bits matching in both strings at same positions. In this case 45 bits match which have same either 0 or 1 at same position. 22 0's and 23 1's matched exactly in same positions in both strings.

    Survey question:
    What is the best method you propose to compress this 128 bit string to less than 112 bits ?



    NOTE: I asked this question before to Dr. Matt, and I am not sure what he proposed, as far I remember it was, try compressing the XOR of these 2 values.

    Input is requested.
    It can be compressed reach to 116 bits.

  9. #9
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    Like I said, you need to tell me what the data means to get any more compression. Compression = prediction = understanding.

  10. #10
    Member biject.bwts's Avatar
    Join Date
    Jun 2008
    Location
    texas
    Posts
    449
    Thanks
    23
    Thanked 14 Times in 10 Posts

    Thumbs up

    One question is the bit string alway a multiple of 8 if so
    why not post some files so we can test with some of our current code.
    If its really for any length why not publish some files where
    every byte is either ascii 1 or o then we can compress to a file
    where every byte is an ascii one or zero.
    And as Matt says what is the format of the data. If its really random
    then no compression possible. Sure we can compress this file to
    1 bit. in a lossless way with a special written general compressor but
    its not likely going to compress your other strings very well.

    As a first test I would if the file is a pure ascii strong of 1 and 0 with
    BWTS MTFQ A012B01 ARB2X B012A01
    if that doesn't work I would try it again without the BWTS

    BWTS is the bijective byte BWT it's just permutation of the string
    MTFQ is my move to symbols seen instead of moving to binary 0 1 2..
    at this point the file is still all ascii 1 and 0 and the same length as
    original
    A012B012 packs it bijectively to a byte file.
    ARB2X compress the byte file to bytes using only ratio of ones to zeroes
    it would be better to make a mod to ARB2X to handle ascii one and zeros direct
    B012A01 converts bijectively any byte file to a file of bytes of
    only ascii 1 or 0

    It may work for you it may not try it

  11. #11
    Member Ali Imran Khan Shirani's Avatar
    Join Date
    Sep 2010
    Location
    UAE
    Posts
    11
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Post

    Quote Originally Posted by Matt Mahoney View Post
    How is the data created? Where does it come from? You have to answer those questions to find the best compression algorithm.
    Well it was a regular jpeg file http://compression.ca/act/act-jpeg.html and shrunk DSCN5081.JPG down to smaller size (about 39 kB) so test wont take longer, the final data that was posted was a simple result of progressive delta encoding (with some predefined constants) of some 128bits taken out from some position. result files are attached ascii as well as binary.

    Quote Originally Posted by ddfox View Post
    It can be compressed reach to 116 bits.
    Great news, but question is how do you have claim that do you have a model in mind? and do you mind sharing it? And does model apply to similar strings? Try the attached files below.


    Quote Originally Posted by biject.bwts View Post
    One question is the bit string alway a multiple of 8 if so
    why not post some files so we can test with some of our current code.
    If its really for any length why not publish some files where
    every byte is either ascii 1 or o then we can compress to a file
    where every byte is an ascii one or zero.
    And as Matt says what is the format of the data. If its really random
    then no compression possible. Sure we can compress this file to
    1 bit. in a lossless way with a special written general compressor but
    its not likely going to compress your other strings very well.

    As a first test I would if the file is a pure ascii strong of 1 and 0 with
    BWTS MTFQ A012B01 ARB2X B012A01
    if that doesn't work I would try it again without the BWTS

    BWTS is the bijective byte BWT it's just permutation of the string
    MTFQ is my move to symbols seen instead of moving to binary 0 1 2..
    at this point the file is still all ascii 1 and 0 and the same length as
    original
    A012B012 packs it bijectively to a byte file.
    ARB2X compress the byte file to bytes using only ratio of ones to zeroes
    it would be better to make a mod to ARB2X to handle ascii one and zeros direct
    B012A01 converts bijectively any byte file to a file of bytes of
    only ascii 1 or 0

    It may work for you it may not try it
    Hi Mr. David, nice to hear you are ready to give it a try. I am attaching both binary and ascii versions. Besides, the ascii is not splitted among chunks of 64, you have to read each two 64digit chunks and identify matches, similarly in binary read two 64bit integers.


    Dear Admin
    I am not receiving any replies' notifications.

    regards
    Attached Files Attached Files

  12. #12
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    See http://mattmahoney.net/dc/dce.html#Section_616 for compressing JPEG files.

  13. #13
    Member
    Join Date
    May 2009
    Location
    China
    Posts
    36
    Thanks
    0
    Thanked 0 Times in 0 Posts
    To the david.bin file, I can find less than 8 bits redundance, but can not eliminate it. or you can provide larger data to me to try, how about 10M bytes?

  14. #14
    Member Ali Imran Khan Shirani's Avatar
    Join Date
    Sep 2010
    Location
    UAE
    Posts
    11
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Thanks for the reference Doc.

    ddfox I will provide a large sample and let you know.

    regards

  15. #15
    Member Ali Imran Khan Shirani's Avatar
    Join Date
    Sep 2010
    Location
    UAE
    Posts
    11
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by biject.bwts View Post
    One question is the bit string alway a multiple of 8 if so
    why not post some files so we can test with some of our current code.
    If its really for any length why not publish some files where
    every byte is either ascii 1 or o then we can compress to a file
    where every byte is an ascii one or zero.
    And as Matt says what is the format of the data. If its really random
    then no compression possible. Sure we can compress this file to
    1 bit. in a lossless way with a special written general compressor but
    its not likely going to compress your other strings very well.

    As a first test I would if the file is a pure ascii strong of 1 and 0 with
    BWTS MTFQ A012B01 ARB2X B012A01
    if that doesn't work I would try it again without the BWTS

    BWTS is the bijective byte BWT it's just permutation of the string
    MTFQ is my move to symbols seen instead of moving to binary 0 1 2..
    at this point the file is still all ascii 1 and 0 and the same length as
    original
    A012B012 packs it bijectively to a byte file.
    ARB2X compress the byte file to bytes using only ratio of ones to zeroes
    it would be better to make a mod to ARB2X to handle ascii one and zeros direct
    B012A01 converts bijectively any byte file to a file of bytes of
    only ascii 1 or 0

    It may work for you it may not try it
    Sorry for late reply

    Tried all of it, and achieved 1/8 = 12.5% or original so practically no compression in fact couple bytes larger than 12.5% which could have been achieved by simply write 0's and 1's down in binary format. And after removing BWTS results is exactly the same.

    Thanks for your input

  16. #16
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,505
    Thanks
    741
    Thanked 665 Times in 359 Posts
    it can be compressed to 0 bits by trivial algorithm

  17. #17
    Member Karhunen's Avatar
    Join Date
    Dec 2011
    Location
    USA
    Posts
    91
    Thanks
    2
    Thanked 1 Time in 1 Post
    My method is to find the offset of a secret irrational number that contains the codestream

Similar Threads

  1. Unaligned bitstring matching experiment
    By Shelwien in forum Data Compression
    Replies: 1
    Last Post: 16th April 2010, 20:59
  2. Frequent Bits Elimination Method - suggest please
    By Scientist in forum Data Compression
    Replies: 14
    Last Post: 28th September 2009, 19:30
  3. Most efficient/practical compression method for short strings?
    By never frog in forum Data Compression
    Replies: 6
    Last Post: 1st September 2009, 05:05
  4. Stuffit Method 15 - Dedicated to BWT maniacs
    By encode in forum Data Compression
    Replies: 1
    Last Post: 13th May 2008, 00:43
  5. Does this compression method already exist?
    By Lasse Reinhold in forum Forum Archive
    Replies: 4
    Last Post: 24th August 2007, 14:59

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •