Perhaps this helps in analyzing the codes - I changed the printf in DumpCode to this:
Code:
printf( "%06d - %06d - %02X - %c - <", freq0[i] * code[i].l, freq0[i], i, cmask(i) );
This adds two columns - the first states how many bits a symbol took in the result file in total (its frequency times its code length), the second shows the frequency of that symbol. Then I used "sort /R log-huf > log-huf-sorted" to get an overview of the symbols using the most space. Of course this doesn't help much, as we can't directly change one code without changing several of the others, but now you can see where the "bad codes" are - for example w and y, which occur quite often (over 10000 times each) but have 7 and 8 bit codes. However, it seems this isn't avoidable, as w and y are among the last characters and so have to start with a run of 1's.
Code:
376653 - 125551 - 20 - - <001>
289724 - 072431 - 65 - e - <0110>
239180 - 047836 - 61 - a - <01010>
200108 - 050027 - 74 - t - <1110>
187805 - 037561 - 68 - h - <01111>
179180 - 044795 - 6F - o - <1011>
164445 - 032889 - 72 - r - <11001>
163676 - 040919 - 6E - n - <1010>
159738 - 026623 - 64 - d - <010111>
148028 - 037007 - 69 - i - <1000>
147152 - 036788 - 73 - s - <1101>
138468 - 023078 - 6C - l - <100101>
098497 - 014071 - 77 - w - <1111101>
095888 - 011986 - 79 - y - <11111110>
088795 - 012685 - 63 - c - <0101101>
083110 - 016622 - 0A - . - <00001>
080155 - 016031 - 75 - u - <11110>
073818 - 012303 - 67 - g - <011101>
073422 - 012237 - 66 - f - <011100>
070220 - 014044 - 6D - m - <10011>
063924 - 009132 - 62 - b - <0101100>
061776 - 010296 - 2C - , - <010001>
055992 - 009332 - 70 - p - <110000>
051760 - 006470 - 27 - ' - <01000010>
050190 - 007170 - 2E - . - <0100101>
037674 - 005382 - 76 - v - <1111100>
034958 - 004994 - 6B - k - <1001001>
027685 - 003955 - 2D - - - <0100100>
026091 - 002899 - 49 - I - <010011011>
022212 - 002468 - 22 - " - <010000010>
017694 - 001966 - 54 - T - <010011110>
016093 - 001463 - 42 - B - <01001100110>
010637 - 000967 - 41 - A - <01001100101>
009770 - 000977 - 48 - H - <0100110101>
009416 - 000856 - 4F - O - <01001110100>
008500 - 000850 - 53 - S - <0100111011>
008382 - 000762 - 3B - ; - <01001100010>
008349 - 000759 - 3F - ? - <01001100100>
008316 - 000693 - 50 - P - <010011101010>
008283 - 000753 - 57 - W - <01001111101>
006960 - 000580 - 43 - C - <010011001110>
006656 - 000832 - 21 - ! - <01000000>
006474 - 000498 - 3C - < - <0100110001100>
006325 - 000575 - 47 - G - <01001101001>
006219 - 000691 - 2B - + - <010000111>
006215 - 000565 - 4D - M - <01001110010>
006027 - 000861 - 78 - x - <1111110>
005976 - 000498 - 3E - > - <010011000111>
005772 - 000444 - 45 - E - <0100110011111>
005522 - 000502 - 4E - N - <01001110011>
004576 - 000416 - 59 - Y - <01001111111>
004543 - 000413 - 4C - L - <01001110001>
004543 - 000413 - 46 - F - <01001101000>
003497 - 000269 - 44 - D - <0100110011110>
003276 - 000468 - 6A - j - <1001000>
003185 - 000245 - 52 - R - <0100111010111>
003120 - 000520 - 71 - q - <110001>
003120 - 000240 - 31 - 1 - <0100110000001>
003036 - 000253 - 4A - J - <010011100000>
002640 - 000220 - 3A - : - <010011000011>
002590 - 000185 - 32 - 2 - <01001100000100>
002576 - 000184 - 33 - 3 - <01001100000101>
002112 - 000264 - 7A - z - <11111111>
001963 - 000151 - 34 - 4 - <0100110000011>
001344 - 000096 - 35 - 5 - <01001100001000>
001305 - 000087 - 36 - 6 - <010011000010010>
001275 - 000085 - 37 - 7 - <010011000010011>
001274 - 000098 - 30 - 0 - <0100110000000>
001236 - 000103 - 55 - U - <010011111000>
001190 - 000085 - 38 - 8 - <01001100001010>
001148 - 000082 - 39 - 9 - <01001100001011>
000768 - 000064 - 56 - V - <010011111001>
000540 - 000045 - 4B - K - <010011100001>
000473 - 000043 - 28 - ( - <01000011000>
000440 - 000040 - 29 - ) - <01000011001>
000182 - 000014 - 51 - Q - <0100111010110>
000065 - 000005 - 3D - = - <0100110001101>
000055 - 000005 - 58 - X - <01001111110>
000010 - 000001 - 2A - * - <0100001101>
000009 - 000001 - 26 - & - <010000011>
000005 - 000001 - 00 - . - <00000>
000004 - 000001 - 1A - . - <0001>
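As a side note, the external sort step could probably also be done inside the program. The following is only a rough sketch of that idea, not the actual DumpCode: the names SymCost, sym_cost, sort_by_cost and dump_costs are made up for illustration, and I assume the frequency and code length of each symbol have already been copied out of freq0[] and code[].l into a small array.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical row type: one symbol with its frequency and code length. */
typedef struct {
    unsigned char sym;   /* byte value of the symbol */
    unsigned long freq;  /* how often it occurs in the input */
    unsigned      len;   /* length of its code in bits */
} SymCost;

/* Total bits the symbol takes in the output: frequency * code length. */
unsigned long sym_cost(const SymCost *s) {
    return s->freq * (unsigned long)s->len;
}

/* qsort comparator: the most expensive symbol comes first. */
int by_cost_desc(const void *a, const void *b) {
    unsigned long ca = sym_cost(a), cb = sym_cost(b);
    return (ca < cb) - (ca > cb);
}

/* Sort the rows in place, largest total cost first. */
void sort_by_cost(SymCost *rows, size_t n) {
    assert(rows != NULL || n == 0);
    qsort(rows, n, sizeof rows[0], by_cost_desc);
}

/* Print cost, frequency, hex value, character and code length per row. */
void dump_costs(const SymCost *rows, size_t n) {
    for (size_t i = 0; i < n; i++) {
        /* Show unprintable bytes as '.', similar to the log above. */
        int c = isprint(rows[i].sym) ? rows[i].sym : '.';
        printf("%06lu - %06lu - %02X - %c - %u bits\n",
               sym_cost(&rows[i]), rows[i].freq,
               (unsigned)rows[i].sym, c, rows[i].len);
    }
}
```

Filling an array with the values for e, w, y and z from the log, then calling sort_by_cost followed by dump_costs, should list them in the same order as the sorted log (e, w, y, z). The last column here is the code length rather than the bit pattern, simply because the sketch does not know how the real code table stores its bits.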