I wanted to post it earlier, but wasn't sure if it was interesting at all, the image to be compressed is quite small (32x32) and I thought there isn't much room for clever things. But as it turns out, I'm at 670 chars now with my huffman in C attempt and just passed a more generic answer (zlib in Python).
The image is somehow random, but there are only 10 different colors and they are grouped to small regions with single black pixels sprayed over the image.
Arithmetic coding should give much better results here, but I'm not sure about the code overhead involved. (Adaptive) LZW would be another strong candidate to test.
Instead of transforming from 37-100 to 0-63 you can transform to 64-127. Then you can remove the variable holding the bit index. You would just shift the number to right by one bit at a time until the value is equal to 1 and then you would read next symbol.
I don't know if that would decrease the size though.
And convert:
l=l*2+(j&k)/k
to:
l+=l+(j&k)/k
I would also try performing MTF before Huffman.
Last edited by Piotr Tarsa; 22nd March 2012 at 20:25.
As to the actual problem, it kinda looks like this - http://en.wikipedia.org/wiki/Four_color_theorem
Maybe the original intention was something like encoding some boundary mask, then flood-filling the shapes.
But the small size of the picture makes it much less interesting - likely a bit RLE with elias codes or something similar
would win here too.
Thanks, although I already did exactly that change just minutes ago
Originally Posted by Shelwien
How about this:
[...]
Hm... there are some more control characters I'd convert (e.g. bell and TAB), but it should still be much better, yes, saving ~80-90 characters.
EDIT: Just tried it using 7bit (37-164) and although it compiles and runs, copy/paste runs into problems and depends too much on the browser/editor used.
Last edited by schnaader; 22nd March 2012 at 21:47.
Instead of transforming from 37-100 to 0-63 you can transform to 64-127. Then you can remove the variable holding the bit index. You would just shift the number to right by one bit at a time until the value is equal to 1 and then you would read next symbol.
Thanks, saved 6 chars
Originally Posted by JamesB
A more standard version of k?:expr is k||expr. Similarly k?expr: (is that a gcc extension too?) is k&&expr.
The image is somehow random, but there are only 10 different colors and they are grouped to small regions with single black pixels sprayed over the image.
You could perhaps reduce the colors to 9 (since brown appears only after purple).
Locate the spot of the last purple pixel (31,1, once decoding has crossed it replace the purple color by brown).
remove the brown color from:
p[]={0,5390379,6379596,7783255,3435360,16354410,51318 94,5295994,4671418,9975021};
Recompute Huffman codes for 9 colors + special repeat, using a picture similar to the one attached (brown replaced by purple -> 9 colors)
for(m=0;m<10;m++){ // only 10 huffman codes
if(l==h[m]){
if (y>1 p[purple index]=brown color;
My version, with a rangecoder - http://nishi.dreamhosters.com/u/rgb_00.rar, 860 chars
base-96 version compresses the image to 310 bytes,
base-256 version produces 256 bytes.
bmf's result is 230 (w/o header), so 256 should be fairly ok.
I actually tried tuning a linear counter for it and the improvement was too small (like maybe 5 bytes).
Python script winning it with built-in zlib and base64 is no fun though.
And another one with 648 chars - http://nishi.dreamhosters.com/u/rgb_02.rar
Maybe I should try switching to C, these automatic types are surely convenient.
@schnaader: Look at my color table, there's stuff you can borrow too.
Well done, Shelwien! These are entries I had in mind when posting here, using more advanced algorithms.
Originally Posted by Shelwien
Python script winning it with built-in zlib and base64 is no fun though.
Yes - the author didn't even follow my advice to use a special color (yet). Since the overhead is so small, he'll be at 400-500 characters total after it, which should be impossible to catch with even the best algorithms.
Originally Posted by Shelwien
@schnaader: Look at my color table, there's stuff you can borrow too.
Just saw that. Very useful - this is a good alternative to hexadecimal representations that are pretty useless in C-like languages (except for big numbers) because the prefix is 2 bytes long.
EDIT: Yeah, now I'm at 648, too By the way, if you can switch color order, GBR is 1 char shorter:
I also tried factoring the numbers (the idea was to reuse common parts, and/or find more numbers which could be written as literals),
but it failed, because most of the colors include very large prime factors, so nothing to reuse.
Thanks for the GBR tip, I used it, current results are 631(C++) and 613(C).
Also tried literals with masking (like 'ab\377'), but it wasn't of any help.
Found an interesting trick like this:
while(j--){XXX;
for(;j--;){XXX; // same length
for(XXX;j--;){ // 1 char less
Also X,Y arguments appeared very useful as loop counters etc
and X+=Y*32+1 is shorter than j=Y*32+X+1
Its also interesting that C++ accepts ++X+=Y*32, but C doesn't.
The color palette could be seen as ~65 additional symbols (0-9,next color) that could be added to the compressed data, so the color palette would be decoded at runtime. It would save around 30 chars (with base64) and would cost a conditional to decode those first symbols different, something like:
My first implementation (the one with real rangecoder) had colors encoded along with other data.
Current version has about 80 bytes of redundancy comparing to that, but appears that the extra decoding branch
was even longer.