I wrote a bijective DC a long time ago. But most of what I wrote is gone. Here is a more pure approach to writing bijective code.
I have covered many times of how to go from a Huffman like series to a fully bijective file here is a summary with an example
A 00
B 01
C 10
D 11
You basically write the bits for each token and add ZEROS as needed. If the set of tokens ends with an ALL ZERO TOKEN or
an ALL ZERO TOKEN Followed by one or more ONE TOKENS you add another 1 bits and then several zero bits.
From this you start taken 8 bits at a time and write byte out. Ignore the trailing extra zero bits where it creates a tail of ZERO BYTES.
Know if it ends in TAIL of an all ZERO BYTE followed by a one or more bytes of 100000000 drop the last byte and your done
example
00 10 01 11 goes to a single byte 00100111
10 01 11 00 goes to a two bytes 10011100 10000000
00 00 00 00 goes to a singe byte 00000000
00 00 00 00 10 00 00 00 goes to two bytes 00000000 10000000
00 00 00 00 10 goes to two bytes 00000000 10100000
now if you have Huffman like tables but have code where the ZERO token is never last. You lay out all the bits as above and
never need a the ONE TOKENS ADD to the beginning set however you still need to take it in to account when you write the
byte data out.
examples
00 10 01 11 same as above to 00100111
10 01 11 goes to 10011100 note on input no ZERO TOKEN allowed for ending.
Not run of all ZEROS that would be a Null File or Null string on output
00 00 00 00 10 a single byte out 00000000
Back to DC the famous string on the internet aaabccca the answer given by some is 30
I say its just plain 3 no 0
The above example does not cover the start of DC is assumes you already have the start of first
a b c it also does not cover what happen when the last run is longer than one character.
Here is a way to do it all including the starting and multiple character runs at end. I will add
a ":" when virtual part done. The mapping to list of number is such that you never have a zero
at end and that you never have more than M-2 zeroes in the output.
looking at aaabccca lets assume that only the characters abcd allowed.
write abcd:aaabccca
as you sweep through modified file. you start at the first "a" go to the end of that run count to you see
next a if no more a write a temporary zero if number of zeros witter so far less than M-2
now take the count not zero and subtract the number of types of character seen so far other than a
in this case 4 - 3 so you write out a "1"
then do the same for b in this case its a "2"
then do the same for the c in this case a "2"
then do the same for the d in this case a "0" allowing a max of one more
know I am going to write a colon to show the allowed taken care of ":"
now at first part of string go to end of run count to a 5 two symbol types so subtract 2 write a "3"
now on the next b you have no b so prepare a "0" which is last zero if needed
now at the c go to end of run and and no count to an c so next chech how many character is
in last run. There is one but it was before the last zero so no output for the c your done
1220:3 is the whole output
lets change the string by addind another a so we have aaabcccaa
now when will geto to the b we write the "0" since we need to write a "1" for the last c since we need an extra a
1220:301
next add two extra cc and end so you have aaabcccaacc
now when you get to the c after the only b and the fact all ZEROS written you the number of a's write a "2" note
that it was one before since those a's were pointed to before last zero written and now not end.
know since c pointed to after last zero written need true count "2"
so you get 1220:3022
suppose only abc allowed going back to first case
cba:aaabccca you get
c goes to "5"
b goes to "4"
a need to start at end of run even though in virtual part its a "3"
so you get 543:
Now the tricky part if you have not used all the zero tokens
the bit patterns for N are
0 0
1 100
2 101
3 11000
4 11010
5 11001
6 11011
once or if M-2 zeroes appear no need for zero
1 1
2 010
3 011
4 00100
5 00110
6 00101
7 00111
the key is pick a pattern so zero is only zeros until m-2 done
then pick a universal for after when zero not need make sure not zero token exists after m-2 zeroes
then convert to the byte file example
abcd:aaabcccaacc
1220:3022
100 101 101 0 : 011 0 010 010 not after second zero change pattern for numbers
10010110 10011001 00100000 3 bytes
of course you a a special bijective arithmetic coder for last pass,
" if errors will reread and change text by editing I am to tired now"