I ran cmix on the file packed.bin in the attached tar file and got about 4% compression (compressed size / original size = 1798050/1875001 = 0.95895948855). To generate the file I ran the accompanying Perl program, token_gen.pl, which assigns each of the 32 possible quintets (5-bit tokens) a random frequency from 0 to 7. The resulting frequency distribution of quintets for packed.bin is in the accompanying sdist.txt, reproduced below in frequency:quintet form for illustration.
Given such a distribution, how can one calculate the ideal compression ratio?
7:11010
7:11000
7:10111
7:10101
7:01001
7:01000
7:00001
6:11110
6:10100
5:11001
5:01110
5:00010
4:11011
4:10010
4:10001
4:01111
3:11111
3:01100
3:01010
3:00110
3:00101
3:00100
3:00011
2:11101
2:11100
2:10011
2:00111
2:00000
1:01101
1:01011
0:10110
0:10000
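
One way to approach the calculation, as a minimal sketch rather than a definitive answer: if the quintets in packed.bin are assumed to be drawn independently, with probabilities proportional to the frequencies listed above, then the Shannon entropy H = -sum(p * log2(p)) gives a lower bound in bits per quintet, and H/5 is the corresponding ideal ratio. The independence assumption and the helper names (parse_weights, entropy_bits) are mine, not something taken from token_gen.pl.

```python
import math

# Frequencies copied from the sdist.txt listing above ("frequency:quintet").
SDIST = """
7:11010 7:11000 7:10111 7:10101 7:01001 7:01000 7:00001
6:11110 6:10100
5:11001 5:01110 5:00010
4:11011 4:10010 4:10001 4:01111
3:11111 3:01100 3:01010 3:00110 3:00101 3:00100 3:00011
2:11101 2:11100 2:10011 2:00111 2:00000
1:01101 1:01011
0:10110 0:10000
"""

def parse_weights(text):
    """Return {quintet: frequency} from whitespace-separated 'freq:quintet' items."""
    weights = {}
    for item in text.split():
        freq, quintet = item.split(":")
        weights[quintet] = int(freq)
    return weights

def entropy_bits(weights):
    """Shannon entropy in bits per symbol.

    Assumes symbols are independent and identically distributed with
    probability proportional to their frequency. Symbols with frequency 0
    never occur and contribute nothing to the sum.
    """
    total = sum(weights.values())
    return -sum(
        (w / total) * math.log2(w / total)
        for w in weights.values()
        if w > 0
    )

weights = parse_weights(SDIST)
bits_per_quintet = entropy_bits(weights)
ideal_ratio = bits_per_quintet / 5  # each quintet occupies 5 bits uncompressed

print(f"entropy: {bits_per_quintet:.4f} bits per quintet")
print(f"ideal ratio: {ideal_ratio:.4f}")
```

Under those assumptions this comes out to roughly 4.736 bits per quintet, an ideal ratio of roughly 0.947, which is a little below the 0.959 that cmix achieved. If the quintets are not actually independent, or if sdist.txt holds something other than relative frequencies, the bound would need to be computed differently.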