# Thread: Data Compression Diamond Algorithm

1. ## Data Compression Diamond Algorithm

Don't worry, I'm in the process of developing a new algorithm. About 40% of the theory is done and the rest can be completed quickly. An MP4 file can be shrunk by 40% to 95% and can be resized. The advantage is that, similarly, you can compress the file again and again.

2. Kolmogorov complexity is a proven result about the compressibility of a piece of information. There is a minimum amount of information necessary in order to be able to restore the compressed information. Sometimes the lower bound of the complexity is a short formula or definition of the data. For example, the description "the first million digits of pi" is much shorter than the actual first million digits. In the case of truly random data, the Kolmogorov complexity is equal to the size of the random data.

Therefore your statement that every piece of information can be compressed to half its length, and that this can be repeated, must be false. It contradicts the known facts surrounding Kolmogorov complexity.
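The incompressibility of random data is easy to check empirically. A minimal Python sketch using the standard zlib module (any general-purpose compressor would behave the same way): on OS-provided random bytes, the "compressed" output is not smaller, exactly as the counting argument predicts.

```python
import os
import zlib

# 1 MiB of OS-provided random bytes: near-maximal Kolmogorov complexity.
data = os.urandom(1 << 20)

# A general-purpose compressor at its strongest setting.
packed = zlib.compress(data, level=9)

# Random data does not shrink; zlib's framing even adds a few bytes.
print(len(data), len(packed))
```

Running this repeatedly always prints a second number at least as large as the first; a compressor that "won" here could be iterated to compress anything to nothing, which is impossible.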

3. The system I'm developing has completely eliminated decimals.
Each word has a hash number; when it is repeated, a different position is taken.
This position is less than the string length, that is, at most 10% to 95% of the input length. It can be compressed again and again.

I have tested it: 32 bits can be made into 24 bits. When the input size increases, more compression is achieved: 64 bits become 34 bits...

4. Originally Posted by uhost it can be compressed again and again
only with recompression...

5. Originally Posted by uhost The system I'm developing has completely eliminated decimals.
Each word has a hash number; when it is repeated, a different position is taken.
This position is less than the string length, that is, at most 10% to 95% of the input length. It can be compressed again and again.

I have tested it: 32 bits can be made into 24 bits. When the input size increases, more compression is achieved: 64 bits become 34 bits...
If you use a 64-bit or 32-bit hash function as the base of your compression algorithm... Oh boy... How do I explain this...

Let's take a simple example: the well-known CRC-32. It takes any N-byte input and hashes it to 32 bits. If you hash 64 bits down to 32 bits using CRC, you go from 2^64 inputs to 2^32 outputs. After 'compression', for every 32-bit output there are 2^32 possible 64-bit inputs that lead to the same hash. If I read you correctly, you accounted for hash collisions, using a diamond-shaped collection of hash values that should ensure the correctness of reverting to 64 bits again. But for every N discarded bits you generate 2^N hash-reversal options you don't account for. And that's assuming CRC-32 is a 'perfect' hash function; CRC-32 is not perfect, so the ambiguity from N discarded bits is even slightly higher than 2^N.
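Collisions are not a rare corner case, either. A small Python sketch (standard zlib.crc32, seeded RNG so the run is reproducible): hashing random 64-bit inputs down to 32 bits, the birthday bound makes a collision near-certain after only a few tens of thousands of draws, and any such collision makes unambiguous "decompression" impossible.

```python
import random
import zlib

rng = random.Random(12345)  # fixed seed so the run is reproducible
seen = {}
collision = None

# Hash random 64-bit inputs down to 32 bits with CRC-32. By the
# birthday bound, a collision is expected after roughly 2**16 draws,
# vastly sooner than the 2**32 needed to exhaust the output space.
for _ in range(2_000_000):
    x = rng.getrandbits(64).to_bytes(8, "big")
    h = zlib.crc32(x)
    if h in seen and seen[h] != x:
        collision = (seen[h], x)
        break
    seen[h] = x

a, b = collision  # two distinct inputs sharing one 32-bit hash
print(a.hex(), b.hex(), hex(zlib.crc32(a)))
```

Note this says nothing about CRC-32 being a bad hash; any function from 64 bits to 32 bits must identify 2^32 inputs per output on average.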

6. ## Thanks:

uhost (4th March 2020)


8. Can you explain your algorithm? Is it like this:
A=b-c
Y=A/thebigest

9. Yes, sure, but not this time... I am building up my dream. I know a 1% failure makes a 99% failure, so I cannot reveal it until I've finished...


11. Originally Posted by pacalovasjurijus Can you explain your algorithm? Is it like this:
A=b-c
Y=A/thebigest
position  value   profit
======    ======  ======
1)  100000  5 bit
2)  100001  4 bit
3)  100010  4 bit
4)  100011  3 bit
5)  100100  3 bit
6)  100101  3 bit
7)  100110  3 bit
8)  100111  2 bit
9)  101000  2 bit

This is an example, but not the same thing. I think this idea can compress data.

12. OK genius, where is a binary to test?

13. When I complete the theory part, I will send a sample program for your advice and testing.

14. Originally Posted by Romul No. According to the equations, it should work. But how it is in practice is unknown.
Everything is only in the form of equations, graphs and formulas.
But now I will try to write a program. And it will be seen how wrong I am. )))

PS: I write through an online translator, so my text may not look very correct.
Well, please complete your dream in the real world. You have good knowledge of math; use it and catch it.

I will finish my algorithm. I have generated some math equations for reducing a huge value to a small one [like a ^ root]. The main advantage is no decimal data [converting 3054 => 1042 => 1006 => 18 => 14 => 2]; these 5 steps reduce 3054 => 2.

15. Originally Posted by uhost I will finish my algorithm. I have generated some math equations for reducing a huge value to a small one [like a ^ root]. The main advantage is no decimal data [converting 3054 => 1042 => 1006 => 18 => 14 => 2]; these 5 steps reduce 3054 => 2.
Out of curiosity, what are the results of reducing 3055 and 3053?


17. Originally Posted by schnaader Out of curiosity, what are the results of reducing 3055 and 3053?
After 3 conversions: 3055 => 1041 => 1007 => 17, and 3053 => 1043 => 1005 => 19.
This method provides a decimal-free conversion.

3055 = 101111101111 [12 bit]
count = 11 (3) + 10001 (17) = 1110001 [7 bit]; if you want to reduce again:
113 = 1110001
113 => 15 == 1111

18. Originally Posted by uhost 3055 = 101111101111 [12 bit]
count = 11 (3) + 10001 (17) = 1110001 [7 bit]
I don't understand this "count" step. What do I have to count? If I count zeros and ones, for example, I get 2 (zeros) and 10 (ones), not 3 and 17.

Also, first you wrote 3055 => 1041 => 1007 => 17, while the second one looks like 3055 => 113 => 15; which one is correct?

Last question: why can't 17 and 19 be reduced further?


20. Originally Posted by schnaader I don't understand this "count" step. What do I have to count? If I count zeros and ones, for example, I get 2 (zeros) and 10 (ones), not 3 and 17.

Also, first you wrote 3055 => 1041 => 1007 => 17, while the second one looks like 3055 => 113 => 15; which one is correct?

Last question: why can't 17 and 19 be reduced further?
count = how many divisions are taken.
E.g.: 3055, first division result = 1041; after 3 or 4 steps we have 17 or 15:
(0) 3055 => (1) 1041 => (2) 1007 => (3) 17 => (4) 15 => (5) 1
1 has different angles and generates a different master number.
For example: 01, 001, 0001. These values are equal, but the position of the 1 is different: [2, 3, 4] [left to right].

This method is a complete success [encoding and decoding].

But I am trying a more effective new method; its encoder is completed, but decoding is somewhat complicated [in progress].
It can do 3055 => 7 with angle (position) 7; 281474970863668 => 32751 => 14 with angle (position) 7; 281474970863667 => 32767 => 15 with angle (position) 7.
It takes 1 or at most 3 steps [count].

If you do not understand my explanation, please forgive me.
When I complete this method [practical... new compression algo (math & dictionary method)] I will explain how it works [only after the patent].

21. Originally Posted by uhost Well, please complete your dream in the real world. You have good knowledge of math; use it and catch it.
My idea is that the so-called "white noise" is actually not so random.
At least this applies to discrete white noise. There are patterns in it.
https://en.wikipedia.org/wiki/White_noise

To understand all this better, there is a catastrophic lack of time.

22. 1. In your example, you're reducing numbers from the "input" range 3000-4000 to numbers in the "output" range 1-20. Doing this for more than 20 numbers in the input range would produce duplicate outputs, so decoding won't work anymore.
2. Since you're reducing the numbers in multiple steps (e.g. 6 steps for 3055 => 1042 => 1006 => 18 => 14 => 2), you have additional information ("6 steps") to encode to get back from 2 to 3055. If you don't encode this additional information, the decoder doesn't know if he should stop at 14, 18, 1006, 1042 or 3055. Another way would be to encode the range 3000-4000 and stop with the first number in that range, but again, this is additional information that has to be encoded. So it looks like a 12 bit to 2 bit reduction, but the additional information will increase the 2 bit result.
3. If "have different angles & generate different master number" means that you can encode the same number in different ways, this is additional information that the decoder has to know, too.

So that's why I don't think that this statement holds:

This method is a complete success [encoding and decoding].
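Point 1 is pure pigeonhole and can be checked mechanically. A toy Python sketch; the modulo map below is a hypothetical stand-in for any fixed reduction into a 20-value range, since the actual mapping was never specified in the thread:

```python
from collections import Counter

# Hypothetical stand-in for any fixed reduction of the inputs
# 3000..3999 into the 20 outputs 1..20 (the real mapping in the
# thread was never specified).
def reduce_to_small(n: int) -> int:
    return n % 20 + 1

counts = Counter(reduce_to_small(n) for n in range(3000, 4000))

# 1000 inputs shared among 20 outputs: every output has 50 preimages,
# so a decoder seeing only the output cannot recover the input.
print(dict(counts))
```

Whatever function replaces the stand-in, 1000 inputs over 20 outputs forces an average of 50 preimages per output; the choice of mapping only moves the duplicates around.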

23. Originally Posted by schnaader 1. In your example, you're reducing numbers from the "input" range 3000-4000 to numbers in the "output" range 1-20. Doing this for more than 20 numbers in the input range would produce duplicate outputs, so decoding won't work anymore.
2. Since you're reducing the numbers in multiple steps (e.g. 6 steps for 3055 => 1042 => 1006 => 18 => 14 => 2), you have additional information ("6 steps") to encode to get back from 2 to 3055. If you don't encode this additional information, the decoder doesn't know if he should stop at 14, 18, 1006, 1042 or 3055. Another way would be to encode the range 3000-4000 and stop with the first number in that range, but again, this is additional information that has to be encoded. So it looks like a 12 bit to 2 bit reduction, but the additional information will increase the 2 bit result.
3. If "have different angles & generate different master number" means that you can encode the same number in different ways, this is additional information that the decoder has to know, too.

So that's why I don't think that this statement holds:
I think the following example can address your doubt:

4000 => 000096

Decoding input: 0000 96 = output: 4000
Decoding input: 00000 96 = output: 8096
Decoding input: 000 96 = output: 1952
Decoding input: 00 96 = output: 928
Decoding input: 0 96 = output: 416
Decoding input: 96 = output: 160

1) 96 has duplication, but not the same in my algorithm, so the left-side zeros are important, valuable information about 96.

2) If you want to avoid the left-side zeros, some rules will help:
all left-zero numbers are indicated [-] and equivalent to a [+] value, like -6 0 6.

E.g.: 0003 [-] == 19 [+] [real value of 0003 = 61], 020 [-] == 52 [+] [real value of 020 = 108]
(all negative values have equivalent positive values; each is less than the real value of the negative one).

3) The first-step division does not need this information [left zeros] because of the [zero-guess rule, in progress].

This algorithm is only part of my project; it will reduce 1 or more bits per division step.
Although some problems have occurred, I will still fix them. Data compression is complicated, and this method is not finished for compression.
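For what it's worth, the six decoded values in the table follow a simple pattern: each extra leading zero doubles the value and adds 96, i.e. value(k) = 256 * 2^k - 96 for k zeros. That reading is an inference on my part, not something stated in the post, but it makes the objection concrete: the zero count k is exactly the side information the decoder needs, and storing k costs bits that eat the claimed savings. A quick Python check against the table:

```python
# Decoded values from the table above, keyed by the number of leading
# zeros written in front of "96" (an inferred reading of the post,
# not confirmed by the author).
table = {0: 160, 1: 416, 2: 928, 3: 1952, 4: 4000, 5: 8096}

for k, v in table.items():
    # Each added zero doubles the value and adds 96,
    # i.e. value(k) = 256 * 2**k - 96.
    print(k, v, 256 * 2**k - 96 == v)
```

All six rows match the formula, so "000096" is not a 2-digit code plus free padding; the padding length is payload.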

24. @Romul:
You're right. For example, lossless audio compressors deal with it - it's mostly incompressible,
but SAC still compresses better than coders that store the low bits without modelling.
(It's likely not quite "white" noise, though.)

However, there's a difference between digitized analog "noise" and originally digital random data,
which technically fits the definition of "white noise". Only the latter falls under the "counting argument"
(lossless compression transforms each data instance i of N bits to a unique string of Mi bits;
if we say that compression always reduces the size of the data by at least 1 bit, it means that
2^(N-1) compressed strings have to be decompressed to 2^N _unique_ data instances, which is impossible.
Thus for some i values Mi >= N is true, so some data instances can't be compressed).
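The counting argument above can also be stated by just counting strings: there are only 2^N - 1 bit strings strictly shorter than N bits (including the empty one), so no injective map from all 2^N inputs of length N into shorter strings can exist. A Python sketch for N = 8:

```python
N = 8

# Count the bit strings strictly shorter than N bits (lengths 0..N-1).
shorter = sum(2**m for m in range(N))

# 2**N distinct N-bit inputs cannot map one-to-one into 2**N - 1
# shorter strings, so at least one input cannot shrink: pigeonhole.
print(shorter, 2**N)
```

The gap is only one string, but one is enough: a lossless compressor must leave at least one N-bit input at its original size or larger, for every N.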

25. Originally Posted by Shelwien However there's a difference between digitized analog "noise" and originally digital random data
which technically fits the definition of "white noise".
The fact of the matter is that if I'm right, then there is no difference between "originally digital noise" and a digitized analog signal.
All of this has some generating function.
Abstract mathematical white noise has no restrictions on frequencies and amplitudes. Both can take on infinite values.
And therefore it (perfect white noise) cannot be described, except in the form of some idea, like infinity itself.
Or rather, the parameters describing this noise require infinite precision (numbers of infinite length) in the description.
That is why I highlighted "discrete white noise" in my text, referring to sequences
with restrictions on frequencies and amplitudes.

