# Thread: Even when not uncomputable, optimal compression is intractable

1. ## Even when not uncomputable, optimal compression is intractable

As it turns out, a lot of work has gone into studying the computational complexity of optimal compression under various models (setting aside Kolmogorov's famously uncomputable model). A lot of the work was done by Jim Storer and/or his students. There is a wealth of references on his web page:

http://www.cs.brandeis.edu/~storer/

I haven't looked through all of it, and I don't want to risk misquoting anything, so I won't restate many results. But optimal grammar-based compression (of the tree type) is NP-complete, and the optimization problems underlying many other practical compression methods have likewise been proven NP-complete. Brandeis has a compression group (which I've somehow been systematically overlooking) that has studied a great many interesting questions related to compression.
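To make the grammar-based case concrete, here is a minimal sketch (not from the thread) of a greedy RE-PAIR-style grammar construction in Python: repeatedly replace the most frequent adjacent pair of symbols with a fresh nonterminal. This is a polynomial-time heuristic; what the NP-completeness result says is that finding the *smallest* such grammar for a string is intractable, so practical schemes settle for approximations like this one. The function and rule names are my own.

```python
from collections import Counter

def repair_grammar(s):
    """Greedy RE-PAIR-style sketch: repeatedly replace the most frequent
    adjacent pair with a fresh nonterminal R0, R1, ...  This is a
    heuristic; finding the smallest grammar is the NP-complete part."""
    seq = list(s)
    rules = {}
    next_id = 0
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:  # no pair repeats; replacing would not shrink anything
            break
        nt = f"R{next_id}"
        next_id += 1
        rules[nt] = pair
        out, i = [], 0
        while i < len(seq):  # left-to-right, non-overlapping replacement
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nt)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules

def expand(sym, rules):
    """Invert the grammar: expand a symbol back to the original text."""
    if sym in rules:
        a, b = rules[sym]
        return expand(a, rules) + expand(b, rules)
    return sym

seq, rules = repair_grammar("abababab")
# seq is [R1, R1] with rules R0 -> (a, b), R1 -> (R0, R0)
assert "".join(expand(s, rules) for s in seq) == "abababab"
```

Decompression is just the `expand` step, which is why grammar-based schemes can afford an expensive (or heuristic) compressor and still decode quickly.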

There are too many interesting things to try to stuff them all into one post and do them justice. It would take a week to write the post (not even counting the time to read it). I'll add interesting info as I read more.

Unfortunately, I couldn't find many free papers. However, here is an interesting paper that deals with approximating optimal dictionary compression: http://people.csail.mit.edu/arasala/papers/grammar.pdf

Here is a short paper showing that the problem of finding optimal binary decision trees is NP-complete. A binary decision tree is a procedure that identifies an object from a set through a series of binary decisions. Many equivalent trees are possible; a tree is optimal when the expected number of decisions is minimized. A compressor can be viewed as an (infinitely large) decision tree: objects are replaced with the sequence of decisions that identifies them. This result is clearly applicable to compression; exactly where the boundaries of the affected subset lie requires a little consideration.
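One boundary is worth spelling out: when any yes/no question is allowed, the optimal decision tree for a known distribution is exactly a Huffman tree and can be built in polynomial time; the NP-completeness kicks in when the allowed questions are restricted to a fixed set of tests. A small sketch of the easy, unrestricted case (my own code, not from the paper), where a symbol's depth in the tree is the number of decisions needed to identify it:

```python
import heapq

def huffman_depths(freqs):
    """Build an optimal unrestricted binary decision tree (a Huffman
    tree) for the given symbol frequencies; return each symbol's depth,
    i.e. the number of yes/no decisions needed to identify it."""
    heap = [(f, i, [sym]) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    depth = {sym: 0 for sym in freqs}
    tiebreak = len(heap)  # unique index so tuples never compare lists
    while len(heap) > 1:
        f1, _, group1 = heapq.heappop(heap)
        f2, _, group2 = heapq.heappop(heap)
        for s in group1 + group2:  # merging pushes every member one level down
            depth[s] += 1
        heapq.heappush(heap, (f1 + f2, tiebreak, group1 + group2))
        tiebreak += 1
    return depth

freqs = {"a": 5, "b": 2, "c": 1, "d": 1}
depth = huffman_depths(freqs)
total = sum(freqs.values())
# expected number of decisions = (5*1 + 2*2 + 1*3 + 1*3) / 9 = 15/9
expected = sum(freqs[s] * depth[s] for s in freqs) / total
```

The depths are exactly the codeword lengths of a Huffman code, which is the decision-tree view of compression in one line: identifying an object and emitting its codeword are the same sequence of binary decisions.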

2. ## Thanks:

encode (17th July 2014)

Having such a large fraction of important papers (in computer science, no less) behind some type of paywall is one of the tragedies of modern academia. The odd thing is that, after publication, paywalling provides next to zero benefit to the papers' authors, and it comes with significant downsides.

4. Originally Posted by PSHUFB
Having such a large fraction of important papers (in computer science, no less) behind some type of paywall is one of the tragedies of modern academia. The odd thing is that, after publication, paywalling provides next to zero benefit to the papers' authors, and it comes with significant downsides.
I agree. You can find leaks in the paywalls, though, and learn a lot in spite of them.

5. Originally Posted by PSHUFB
Having such a large fraction of important papers (in computer science, no less) behind some type of paywall is one of the tragedies of modern academia. The odd thing is that, after publication, paywalling provides next to zero benefit to the papers' authors, and it comes with significant downsides.
Jim and Brandeis sponsor DCC (the Data Compression Conference), where you find many interesting papers every year, so I assume most of those papers are on IEEE. Most universities (including mine) pay for IEEE services, so academic staff can download papers from their repository. That being said, IEEE also sponsors such conferences; whether the price is fair I do not know. The business model of IEEE (and, to an even greater degree, of other publishers) is questionable, though. Scientists pay for papers to be published, review papers for free, and pay for the papers again when retrieving them - thus, "triple pay" for a single publication.

Unfortunately, there is no recognized open-access venue comparable to IEEE, so there are not really many alternatives.

6. From past experience I would say that the papers you pay for are often of poor quality, and the ones that are really good don't make it to the people who could use them for the advancement of mankind. I also think that in the '70s and '80s the people actually coding aircraft and missile software, for example, were way ahead of academia. The problem with software today is that the people from the '70s and '80s who were actually good are not in charge of it. Some of those people had only a high-school education; most had just a BS in math or engineering. But they were far better than the PhD types, lacking real-world experience, who are trusted today. In those days the managers were also engineers and scientists. Another major problem is that if someone has a real understanding of a method, it may not get published at all because of the way peer review is done. Just look at the global warming bullshit: you don't get an honest review, or published, if you don't pay homage to the religion of global warming. It kind of reminds me of stories an early math teacher told us about how people fared who dared to believe in irrational numbers.

7. Originally Posted by biject.bwts
It kind of reminds me of stories an early math teacher told us about how people fared that dared to believe in irrational numbers.
Not believing in irrational numbers is like not believing in punctuation: they exist because we say so. Unless you are talking about people who refused to believe that the ratio of a circle's circumference to its radius is not equal to any ratio of whole numbers. That's a factual issue. I think lots of people misspent their time trying to show otherwise, sort of like compression of random data.
