As it turns out, a lot of work has gone into studying the computational complexity of optimal compression under various models (setting aside Kolmogorov's famously uncomputable model). A lot of the work was done by Jim Storer and/or his students. There is a wealth of references on his web page:

http://www.cs.brandeis.edu/~storer/

I haven't looked through all of it, and I don't want to risk misquoting something, so I'm not going to restate many results. But, optimal grammar-based compression (Tree-type) is NP-complete, and many other practical compression methods have been proven to be NP-complete. Brandeis has a compression group (which I've somehow been systematically ignoring) that has studied a great many interesting questions related to compression.

There are too many interesting things to try to stuff them all into one post and do them justice. It would take a week to write the post (not even counting the time to read it). I'll add interesting info as I read more.

Unfortunately, I couldn't find too many free papers. However, here is an interesting paper that deals with approximating optimal dictionary compression: http://people.csail.mit.edu/arasala/papers/grammar.pdf

Here is a short paper showing that the problem of finding optimal binary decision trees is NP-complete. A binary decision tree is a procedure that identifies an object out of a set through a series of binary decisions. Many equivalent trees are possible; a tree is optimal when the expected number of decisions is minimized. A compressor can be viewed as a (infinitely-large) decision tree. Objects are replaced with the sequence of decisions that identifies them. Clearly this result is applicable to compression; exactly what the boundaries are of the affected subset requires a little bit of consideration.