A long time ago I noticed Binder's discussion of DC, though he provided only a few examples, like
"aaabccca", where he assumed you already had the number of character types (a, b, c) as well as their
starting positions. He got 2 DC values, namely 3 and 0. I felt that was too many: I think one DC
value is all that is needed, and it's a 2.
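For anyone unfamiliar with the setup, here is a minimal Python sketch of the side information that example takes for granted: the distinct symbols and where each one first appears. The name first_positions is made up for illustration, counting positions from 0 is an assumption, and this is not the DC step itself.

```python
def first_positions(text):
    """Map each distinct symbol to the index of its first occurrence (0-based)."""
    positions = {}
    for index, symbol in enumerate(text):
        # setdefault keeps the earliest index and ignores later repeats
        positions.setdefault(symbol, index)
    return positions


print(first_positions("aaabccca"))  # {'a': 0, 'b': 3, 'c': 4}
```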
Yuta has code that I think reflects the original internet discussion of DC. Over a period of a few
days I showed several examples of what I felt was wrong with the DC values. Since I wanted to
make it bijective, Yuta and I made several changes. I made another one recently. I am not sure if
it's the last.
What I did was write a program that encodes DC as follows:
bij_dc_32.exe e A B
and decodes as follows:
bij_dc_32.exe d B C
B is the bijective DC of file A.
C should be the same as A.
Since it is bijective, you can also take any file, decode it first and then encode the result,
and get back the same file.
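Testers who want to automate both checks could use something like the Python sketch below. It assumes bij_dc_32.exe is on the PATH and takes its arguments exactly as shown above; the helper names run_tool and round_trips are made up for illustration and are not part of the program.

```python
import filecmp
import os
import subprocess
import tempfile


def run_tool(mode, src, dst, exe="bij_dc_32.exe"):
    # Command-line shape taken from the post: exe e|d INPUT OUTPUT
    subprocess.run([exe, mode, src, dst], check=True)


def round_trips(path, run=run_tool):
    """Return (encode_then_decode_ok, decode_then_encode_ok) for one file."""
    with tempfile.TemporaryDirectory() as tmp:
        b = os.path.join(tmp, "B")
        c = os.path.join(tmp, "C")
        # Forward direction: A -> encode -> B -> decode -> C, expect C == A
        run("e", path, b)
        run("d", b, c)
        forward = filecmp.cmp(path, c, shallow=False)
        # Reverse direction: A -> decode -> B -> encode -> C, expect C == A
        run("d", path, b)
        run("e", b, c)
        backward = filecmp.cmp(path, c, shallow=False)
    return forward, backward
```

A bijective coder should give (True, True) for every input file, including files of random bytes.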
The output of the DC is nothing but numbers written in a bijective way:
0xxxxxxx is a number
1xxxxxxx 0xxxxxxx is a larger number
A 1 in the first position of a byte means continue, unless it is the last number in the file.
A 0 means this is the last byte of the number. Note that the coding of the last number
is special.
I did this so that one can easily test it in the reverse direction: if you take
random data, decode it and then encode it, you don't end up with a
monster file. Here is the hex output for "aaabccca":
E2036261F903
6 bytes. It could be smaller, since using whole bytes for the numbers is wasteful.
E203 codes the b position
62 codes the c position
61, by what I call wrapping around, codes the a position
F903 codes the NO MORE NEW SYMBOLS marker together with the first DC value
that's it since only one DC value
Another example: SNNBAAA, the BWTS of BANANAS,
codes to 4EC10141CC02, which is 6 bytes. However, by using a different number system it is easy to get to 5 bytes.
4E is for N
C101 is for B
41 is for A
CC02 is for S; no need for the "no more new symbols" symbol
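The byte groupings in both examples can be reproduced mechanically from the top bit of each byte. The Python sketch below only splits the output into numbers; it does not decode their values or handle the special coding of the last number, and the name split_numbers is made up for illustration.

```python
def split_numbers(data):
    """Group bytes into numbers: a set top bit means the number continues."""
    groups = []
    current = bytearray()
    for byte in data:
        current.append(byte)
        if byte & 0x80 == 0:  # top bit clear: this byte ends the number
            groups.append(bytes(current))
            current.clear()
    if current:  # leftover bytes whose top bit was set (possible at end of file)
        groups.append(bytes(current))
    return groups


for hex_text in ("E2036261F903", "4EC10141CC02"):
    print([g.hex().upper() for g in split_numbers(bytes.fromhex(hex_text))])
# ['E203', '62', '61', 'F903']
# ['4E', 'C101', '41', 'CC02']
```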
PLEASE TESTERS TEST IT