1. What method do you propose for compressing this bit string?

You have a 128-bit-long bit string. We split it into two 64-bit strings. The result is:

0011001000100000001100100001110001011000100001111100011110001110
0010001011000000000101111001111001111011101111111100111111011111

With the two strings placed on separate lines like this, if we look closely, a lot of bits match in both strings at the same positions. In this case 45 bits match, i.e. have the same value (0 or 1) at the same position: 22 zeroes and 23 ones match exactly at the same positions in both strings.

Survey question:
What is the best method you would propose to compress this 128-bit string to fewer than 112 bits?

NOTE: I asked this question before to Dr. Matt, and I am not sure what he proposed; as far as I remember it was: try compressing the XOR of these two values.

Input is requested.
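As a sanity check on the numbers above, a small Python sketch (using the two halves exactly as posted) that counts the matching positions via the XOR:

```python
# The two 64-bit halves from the post, as binary strings.
a = int("0011001000100000001100100001110001011000100001111100011110001110", 2)
b = int("0010001011000000000101111001111001111011101111111100111111011111", 2)

x = a ^ b                   # XOR is 1 exactly where the halves differ
diff = bin(x).count("1")    # number of differing positions
match = 64 - diff           # number of matching positions

print(match, diff)          # -> 45 19
```

This confirms the 45 matches quoted above: the XOR has only 19 one bits, which is what makes it the natural target for compression.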

2. One more, larger string of 512 bits: when split, it yields two 256-bit strings containing 191 matches in total (95 zeroes and 96 ones).

First 256 bits block
11010111011001000100011100010110101010010100010001 01000011001001000111010111011001001101100100010000 11100011110011011000101101101011011001100011101110 01111001101111000100011000111011110000111001100101 01001001100100101111000011011011000001110010101100 110010

Second 256 bits block
10000101011001000011011000010010101000111001111001 00000011001101000110010101001001010101110101111000 11000111110011010101101101001111001000111111001111 10001001101111000100011001111011111000011101100001 01000010100100111111011010011111100100110000101111 111000

If you (the reader) want, I can post the data in decimal, octal, or hex, as well as represented as 64 unsigned chars.

3. Well, all adaptive data compression methods need a large enough input to learn from. For such small inputs you need something like LZW with a static dictionary, or some sort of Vector Quantization.
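To illustrate the static-dictionary idea (the codebook is fixed in advance rather than learned from the input), here is a toy sketch; the dictionary entries are made up purely for illustration:

```python
# Hypothetical fixed codebook, agreed on by encoder and decoder in
# advance -- nothing here is learned from the input itself.
DICT = ["0000", "1111", "01", "10", "00", "11", "0", "1"]

def encode(bits):
    """Greedy longest-match against the static dictionary;
    the output is a list of dictionary indices."""
    out, i = [], 0
    while i < len(bits):
        # Try the longest entries first.
        for j, entry in sorted(enumerate(DICT), key=lambda e: -len(e[1])):
            if bits.startswith(entry, i):
                out.append(j)
                i += len(entry)
                break
    return out

codes = encode("0000111101")
print(codes)                            # -> [0, 1, 2]
print("".join(DICT[c] for c in codes))  # decodes back losslessly
```

The point is only that with a pre-agreed dictionary there is no learning phase, so even a 64-bit input can benefit.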

4. Compressing the XOR with a classic entropy scheme will give you 112 bits.

To get something better, you need a more precise model than the simple statement "45 out of 64 bits are identical".

For example, you could assign a dedicated probability to each bit position.
Whether that makes sense depends entirely on your data set, and cannot be properly evaluated with only 2 samples.

Best guess: if you are compressing a list of structures with 64-bit fields, you will indeed find a lot of common similarities.
You may end up compressing 64 flows of 1-bit-wide channels (the most generic method).
If you know better what is in these fields, you could create a more optimized (and more specific) model.
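To put a rough number on the XOR approach, a back-of-the-envelope order-0 bound (assumption: the first half is stored raw, and only the XOR, with its 19 one bits out of 64, is entropy coded):

```python
import math

n = 64
diff = n - 45               # 19 of the 64 XOR bits are 1
p = diff / n

# Binary entropy of the XOR bits (bits per bit).
h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)

xor_bits = n * h            # ~56.2 bits for the XOR
total = n + xor_bits        # plus 64 raw bits for the first half
print(round(xor_bits, 1), round(total, 1))
```

This is only an order-0 bound computed from these two samples; a per-position model, as suggested above, could do better or worse depending on the actual data.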

5. The key to compression is to understand what the data means. Compression is an AI problem.

6. Hi Dr. Matt,

I was trying a brute-force method to break down the complexity of some data. Of course the data above has no meaning in itself; it is just a transform aimed at finding more matches (just for fun), which might increase the probability of certain symbols at the nibble level. I have no model for it right now, so I wanted an opinion from you guys.

regards

7. How is the data created? Where does it come from? You have to answer those questions to find the best compression algorithm.

8. Originally Posted by Ali Imran Khan Shirani
What is the best method you propose to compress this 128-bit string to fewer than 112 bits? [...]
It can be compressed down to 116 bits.

9. Like I said, you need to tell me what the data means to get any more compression. Compression = prediction = understanding.

10. One question: is the bit string always a multiple of 8 bits? If so, why not post some files so we can test with some of our current code? If it is really for arbitrary lengths, why not publish some files where every byte is either ASCII '1' or '0'; then we can compress them to a file where every byte is an ASCII one or zero.
And, as Matt says: what is the format of the data? If it is really random then no compression is possible. Sure, we can compress this one file to 1 bit in a lossless way with a specially written general compressor, but that is not likely to compress your other strings very well.

As a first test, if the file is a pure ASCII string of 1s and 0s, I would try
BWTS MTFQ A012B01 ARB2X B012A01
and if that doesn't work, I would try it again without the BWTS.

BWTS is the bijective byte BWT; it is just a permutation of the string.
MTFQ is my move-to-front over symbols seen, instead of moving over binary 0 1 2 ...
At this point the file is still all ASCII 1s and 0s and the same length as the original.
A012B012 packs it bijectively into a byte file.
ARB2X compresses the byte file to bytes using only the ratio of ones to zeroes; it would be better to modify ARB2X to handle ASCII ones and zeros directly.
B012A01 converts any byte file bijectively to a file of bytes containing only ASCII 1 or 0.

It may work for you, it may not. Try it!
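For readers who haven't seen the move-to-front step before, here is the textbook MTF transform (note: this is the generic version for orientation, not David's MTFQ variant):

```python
def mtf_encode(data, alphabet):
    """Textbook move-to-front: emit each symbol's current table
    index, then move that symbol to the front of the table."""
    table = list(alphabet)
    out = []
    for s in data:
        i = table.index(s)
        out.append(i)
        table.insert(0, table.pop(i))
    return out

print(mtf_encode("110100", "01"))  # -> [1, 0, 1, 1, 1, 0]
```

Runs of the same symbol turn into runs of zeros, which a later entropy-coding stage can exploit.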

11. Originally Posted by Matt Mahoney
How is the data created? Where does it come from? You have to answer those questions to find the best compression algorithm.
Well, it was a regular JPEG file. I used http://compression.ca/act/act-jpeg.html and shrunk DSCN5081.JPG down to a smaller size (about 39 kB) so the tests wouldn't take too long. The data that was posted is simply the result of progressive delta encoding (with some predefined constants) of 128 bits taken from some position. The resulting files are attached, in ASCII as well as binary.
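Since the posted data came from a delta-encoding pass, here is plain delta coding for orientation (the "predefined constants" of the actual transform are not given, so this is just the generic version):

```python
def delta_encode(vals):
    """Store the first value, then successive differences."""
    out = [vals[0]]
    for prev, cur in zip(vals, vals[1:]):
        out.append(cur - prev)
    return out

def delta_decode(deltas):
    """Inverse transform: running sum of the differences."""
    vals = [deltas[0]]
    for d in deltas[1:]:
        vals.append(vals[-1] + d)
    return vals

enc = delta_encode([10, 12, 11, 15])
print(enc)                # -> [10, 2, -1, 4]
print(delta_decode(enc))  # round-trips to the original
```

Delta coding helps when neighboring values are close, because small differences concentrate the symbol distribution, which is exactly the kind of skew an entropy coder can use.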

Originally Posted by ddfox
It can be compressed down to 116 bits.
Great news, but the question is: how do you back up that claim? Do you have a model in mind, and would you mind sharing it? Does the model apply to similar strings? Try the attached files below.

Originally Posted by biject.bwts
One question: is the bit string always a multiple of 8? [...] It may work for you, it may not. Try it.
Hi Mr. David, nice to hear you are ready to give it a try. I am attaching both the binary and the ASCII versions. Note that the ASCII version is not split into chunks of 64: you have to read each pair of 64-digit chunks and identify the matches; similarly, in binary, read two 64-bit integers at a time.

I am not receiving any notifications of replies.

regards

12. See http://mattmahoney.net/dc/dce.html#Section_616 for compressing JPEG files.

13. In the david.bin file I can find less than 8 bits of redundancy, but I cannot eliminate it. Or you can provide larger data for me to try; how about 10 MB?

14. Thanks for the reference, Doc.

ddfox, I will provide a large sample and let you know.

regards

15. Originally Posted by biject.bwts
As a first test I would try BWTS MTFQ A012B01 ARB2X B012A01; if that doesn't work, try it again without the BWTS. [...]

I tried all of it and achieved 1/8 = 12.5% of the original, so practically no compression; in fact, it came out a couple of bytes larger than the 12.5% that could have been achieved by simply writing the 0s and 1s down in binary format. And after removing the BWTS the result is exactly the same.
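The 12.5% baseline mentioned above is just bit packing: eight ASCII '0'/'1' characters into one byte. A sketch of that baseline:

```python
def pack_ascii_bits(s):
    """Pack an ASCII '0'/'1' string into raw bytes, 8 bits per
    byte -- exactly 1/8 of the original size."""
    assert len(s) % 8 == 0
    return bytes(int(s[i:i + 8], 2) for i in range(0, len(s), 8))

packed = pack_ascii_bits("0011001000100000")
print(len(packed))  # -> 2 (bytes, from 16 ASCII characters)
```

Any pipeline that ends up larger than this packed size has, in effect, found no structure in the bits at all.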