I wrote a bijective DC a long time ago. But most of what I wrote is gone. Here is a more pure approach to writing bijective code.

I have covered many times of how to go from a Huffman like series to a fully bijective file here is a summary with an example

A 00

B 01

C 10

D 11

You basically write the bits for each token and add ZEROS as needed. If the set of tokens ends with an ALL ZERO TOKEN or

an ALL ZERO TOKEN Followed by one or more ONE TOKENS you add another 1 bits and then several zero bits.

From this you start taken 8 bits at a time and write byte out. Ignore the trailing extra zero bits where it creates a tail of ZERO BYTES.

Know if it ends in TAIL of an all ZERO BYTE followed by a one or more bytes of 100000000 drop the last byte and your done

example

00 10 01 11 goes to a single byte 00100111

10 01 11 00 goes to a two bytes 10011100 10000000

00 00 00 00 goes to a singe byte 00000000

00 00 00 00 10 00 00 00 goes to two bytes 00000000 10000000

00 00 00 00 10 goes to two bytes 00000000 10100000

now if you have Huffman like tables but have code where the ZERO token is never last. You lay out all the bits as above and

never need a the ONE TOKENS ADD to the beginning set however you still need to take it in to account when you write the

byte data out.

examples

00 10 01 11 same as above to 00100111

10 01 11 goes to 10011100 note on input no ZERO TOKEN allowed for ending.

Not run of all ZEROS that would be a Null File or Null string on output

00 00 00 00 10 a single byte out 00000000

Back to DC the famous string on the internet aaabccca the answer given by some is 30

I say its just plain 3 no 0

The above example does not cover the start of DC is assumes you already have the start of first

a b c it also does not cover what happen when the last run is longer than one character.

Here is a way to do it all including the starting and multiple character runs at end. I will add

a ":" when virtual part done. The mapping to list of number is such that you never have a zero

at end and that you never have more than M-2 zeroes in the output.

looking at aaabccca lets assume that only the characters abcd allowed.

write abcd:aaabccca

as you sweep through modified file. you start at the first "a" go to the end of that run count to you see

next a if no more a write a temporary zero if number of zeros witter so far less than M-2

now take the count not zero and subtract the number of types of character seen so far other than a

in this case 4 - 3 so you write out a "1"

then do the same for b in this case its a "2"

then do the same for the c in this case a "2"

then do the same for the d in this case a "0" allowing a max of one more

know I am going to write a colon to show the allowed taken care of ":"

now at first part of string go to end of run count to a 5 two symbol types so subtract 2 write a "3"

now on the next b you have no b so prepare a "0" which is last zero if needed

now at the c go to end of run and and no count to an c so next chech how many character is

in last run. There is one but it was before the last zero so no output for the c your done

1220:3 is the whole output

lets change the string by addind another a so we have aaabcccaa

now when will geto to the b we write the "0" since we need to write a "1" for the last c since we need an extra a

1220:301

next add two extra cc and end so you have aaabcccaacc

now when you get to the c after the only b and the fact all ZEROS written you the number of a's write a "2" note

that it was one before since those a's were pointed to before last zero written and now not end.

know since c pointed to after last zero written need true count "2"

so you get 1220:3022

suppose only abc allowed going back to first case

cba:aaabccca you get

c goes to "5"

b goes to "4"

a need to start at end of run even though in virtual part its a "3"

so you get 543:

Now the tricky part if you have not used all the zero tokens

the bit patterns for N are

0 0

1 100

2 101

3 11000

4 11010

5 11001

6 11011

once or if M-2 zeroes appear no need for zero

1 1

2 010

3 011

4 00100

5 00110

6 00101

7 00111

the key is pick a pattern so zero is only zeros until m-2 done

then pick a universal for after when zero not need make sure not zero token exists after m-2 zeroes

then convert to the byte file example

abcd:aaabcccaacc

1220:3022

100 101 101 0 : 011 0 010 010 not after second zero change pattern for numbers

10010110 10011001 00100000 3 bytes

of course you a a special bijective arithmetic coder for last pass,

" if errors will reread and change text by editing I am to tired now"