# Thread: Papers - Summary of Data Compression Algorithms

1. ## Papers - Summary of Data Compression Algorithms

Even though this forum is probably the largest source of information about data compression, I want to share some documents with you, including a very useful brief description of numerous algorithms from RLE to PAQ, including BWT, LZ*, PPM, etc.
The purpose is to summarize the almost overwhelming pile of papers about compression, especially for newbies. Actually, I am one of those...
After getting this overview, further investigation can be done on the chosen algorithm.

The summary:

Note: Don't be afraid to add your preferred papers in a clean way.
Also, since I don't want to publish a defective text, not even a single word of one, please be so kind as to point out any wrong or unclear wording via PM. I'll correct it as soon as I take notice. Thanks in advance.

2. ## Thanks (2):

jman (14th April 2017), just a worm (16th January 2015)

3. Another easy-to-understand document about LZ derivatives.

4. And the well known, must-have Data Compression Explained

Edit: I'll keep adding new entries to this post, including yours, in order to maintain a clean presentation.

-----------------------------------------------------------------------------------

The Data Compression Book
- 370 pages...

-----------------------------------------------------------------------------------

http://www.drdobbs.com/cpp/data-comp...0169251?pgno=1

-----------------------------------------------------------------------------------

From the same site as above:

* LZW data compression - Includes demonstration code.

* A simple data-compression technique - Look at page 3 first. Fast compression with a ratio comparable to LZW's. Code included.

* A new algorithm for data compression - Look at page 3 first. Algorithm: "Byte Pair Encoding"

* Improving data compression fidelity without sacrificing speed - Now lossy is not that lossy.

* The enduring challenge of compressing random data - Or how the numbers laugh at you.

* Combinatorial data compression - Yet another article against universal compression.

* Data Compression with the Burrows-Wheeler transform

* Differential compression algorithms

* Dynamic Markov Compression - Go to page 5 first.

* An algorithm for online data compression - "LZH-Light" LZ77+Huffman. Go to page 5 first.

* Image compression using Laplacian Pyramid Encoding - Go to page 12 first.

* Image compression with Wavelets

* Faster fractal compression - Speeding up the process. Go to page 9 first.

* Audio Compression

* Digital speech compression - About GSM standard. Go to page 5 first.

* MPEG-2 and video compression

* H.264 and video compression

* Easy analog data compression - Delta coding applied to medical data logging.

* Compression shrinks digital mammograms down to practical size - Less than 1% using wavelets...

* Bounding Box data compression - Very simple bitmap fonts compressor. Go to page 17.

Note: the documents in this section are mostly old. Despite that, they are a good source of original knowledge, and almost all of them include source code.
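To make the "Byte Pair Encoding" article above more concrete, here is a minimal sketch of the idea (my own illustration, not the article's code): repeatedly replace the most frequent adjacent byte pair with a byte value that does not occur in the data, recording the substitutions in a table so the process can be undone.

```python
def bpe_compress(data: bytes):
    """Byte Pair Encoding: replace the most frequent pair with an unused byte."""
    data = bytearray(data)
    table = {}  # replacement byte -> (left, right)
    while True:
        counts = {}
        for a, b in zip(data, data[1:]):
            counts[(a, b)] = counts.get((a, b), 0) + 1
        if not counts:
            break
        pair, n = max(counts.items(), key=lambda kv: kv[1])
        if n < 2:
            break  # no pair worth replacing
        unused = next((c for c in range(256)
                       if c not in data and c not in table), None)
        if unused is None:
            break  # no free byte value left
        table[unused] = pair
        out, i = bytearray(), 0
        while i < len(data):
            if i + 1 < len(data) and (data[i], data[i + 1]) == pair:
                out.append(unused)
                i += 2
            else:
                out.append(data[i])
                i += 1
        data = out
    return bytes(data), table

def bpe_expand(data: bytes, table) -> bytes:
    """Undo the substitutions; replacements may nest, so use a stack."""
    out, stack = bytearray(), list(reversed(data))
    while stack:
        c = stack.pop()
        if c in table:
            a, b = table[c]
            stack.append(b)
            stack.append(a)
        else:
            out.append(c)
    return bytes(out)
```

Real implementations (and the article) work block-wise and store the table compactly; this version only shows the core loop.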

-----------------------------------------------------------------------------------

At WikiBooks:

* Data Coding Theory

* Data Compression

-----------------------------------------------------------------------------------

More incoming... comments welcome.

5. Originally Posted by Gonzalo
And the well known, must-have Data Compression Explained
You don't miss much if you haven't read this one.
Originally Posted by Data Compression Explained
Symbol ranking, or move-to-front (MTF), is a transform
Move-to-front is a conversion algorithm, not a transformation algorithm. A transformation algorithm moves the elements around but does not change their values (like the Burrows-Wheeler transform does), while a conversion algorithm changes the values but does not change the positions of the elements (like a conversion from ISO 8859-1 to UTF-8, or MTF, does).

6. just a worm,

A transform can change the values around. Think of the FFT: there's nothing in the output of a Fourier transform that corresponds one-to-one and literally with anything in the input.

MTF is a transform. You can invert it and get all the original symbols back, just as you can invert the Fourier transform to go from the frequency domain back to the original domain.
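Whichever word you prefer, MTF leaves every symbol in its original position and is trivially invertible, which is easy to see in code (a minimal sketch):

```python
def mtf_encode(data: bytes):
    """Replace each byte by its index in a recency list, moving it to the front."""
    alphabet = list(range(256))
    out = []
    for b in data:
        i = alphabet.index(b)
        out.append(i)
        alphabet.pop(i)
        alphabet.insert(0, b)  # most recent symbol goes to the front
    return out

def mtf_decode(codes) -> bytes:
    """Inverse: indices select from the same recency list."""
    alphabet = list(range(256))
    out = bytearray()
    for i in codes:
        b = alphabet.pop(i)
        out.append(b)
        alphabet.insert(0, b)
    return bytes(out)
```

Note the output has exactly one code per input byte, in the same order, which is why runs of identical symbols (e.g. after a BWT) turn into runs of zeros.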

7. Doesn't jpeg use a discrete cosine transform?

8. Originally Posted by Kennon Conrad
Doesn't jpeg use a discrete cosine transform?
Yes. The DCT (Discrete Cosine Transform) is closely related to the DFT (Discrete Fourier Transform), which is usually computed with the FFT (Fast Fourier Transform) algorithm. The DCT is used in JPEG and lots of other image & video compression algorithms.
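For the curious, here is a naive O(N²) 1-D DCT-II and its inverse (my own illustration; JPEG applies a 2-D version to 8×8 blocks, and real codecs use fast factorizations rather than this direct sum):

```python
import math

def dct2(x):
    """Unnormalized DCT-II: X_k = sum_n x_n * cos(pi*(n+0.5)*k/N)."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N) for n in range(N))
            for k in range(N)]

def idct2(X):
    """Inverse (a scaled DCT-III): recovers x exactly up to rounding."""
    N = len(X)
    return [(X[0] / 2 + sum(X[k] * math.cos(math.pi * (n + 0.5) * k / N)
                            for k in range(1, N))) * 2 / N
            for n in range(N)]
```

On smooth inputs (like most image blocks) the energy concentrates in the first few coefficients, which is what makes quantizing the rest away so effective.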

9. I don't like your write-up on BWT, for several reasons.
1) The original BWT did not add an EOF character to the string, so from a historical point of view it's wrong.
2) Whether you discuss the real BWT or the more commonly used variant with an added EOF character, like the one you used, you should point out that in general most strings formed at random do not have an inverse, which means the UNBWT needs special handling because it can fail for most files or strings. There is no easy way to check whether a file or string has a UNBWT without doing most of the work of an actual UNBWT.
3) There is an alternate BWT that solves these problems, called the BWTS: it is a pure permutation of the string or file and always has an inverse. The information can also be stored in the same amount of space as the original data, without an index or a specially placed EOF character.
4) In today's world, files are often compressed and then encrypted to be sent around the world on the net. If one uses the real BWT or the modified one you described, and has a handful of possible encryption keys, most false keys could likely be rejected immediately due to the poor choice of a non-bijective BWT.

Why not discuss the original, at least for historical reasons,

then discuss the current common approach, as you did,

and then talk about the bijective BWT approach?

Just my few cents on it.
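For reference, the EOF-marker variant being discussed can be sketched naively like this (my own illustration, quadratic and for toy inputs only; real implementations use suffix sorting):

```python
def bwt(data: bytes, eof: int = 0) -> bytes:
    """BWT with an explicit end marker. Assumes `eof` never occurs in data."""
    s = bytes(data) + bytes([eof])
    # sort all rotations; the output is the last column
    rots = sorted(s[i:] + s[:i] for i in range(len(s)))
    return bytes(r[-1] for r in rots)

def unbwt(last: bytes, eof: int = 0) -> bytes:
    """Naive inverse: rebuild the sorted rotation table column by column."""
    table = [b""] * len(last)
    for _ in range(len(last)):
        table = sorted(bytes([last[i]]) + table[i] for i in range(len(last)))
    # the row ending in the marker is the original string plus the marker
    row = next(r for r in table if r.endswith(bytes([eof])))
    return row[:-1]
```

Note that `unbwt` silently assumes its input really is the last column of a sorted rotation table containing exactly one marker, which is the point raised above: an arbitrary string generally isn't, and the bijective BWTS avoids the issue entirely.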


12. Originally Posted by Bulat Ziganshin
Is that a good article? I skimmed it. I have not personally written an arithmetic coder, but I have looked at the code for a few, and I thought it was odd that they used floating point. There's also a bit too much C++ for my taste. Those aren't fatal flaws for an article, and the rest may be on point. It just seems like you could save time by skipping the extra stuff and seeing how the experts write it right from the start. Matt's arithmetic coder is pretty simple and readable as-is.
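For comparison, arithmetic coding needs no floating point at all. Below is a minimal sketch of a carryless binary arithmetic coder with an order-0 adaptive model, written in the general integer-only style of Matt's coders (my own illustration, not code from any of his programs):

```python
M32 = 0xFFFFFFFF  # work in 32-bit integer arithmetic only

class Encoder:
    def __init__(self):
        self.x1, self.x2 = 0, M32          # current interval [x1, x2]
        self.out = bytearray()

    def encode(self, bit, p1):
        # p1 = P(bit == 1) as a 16-bit fixed-point fraction (1..65535)
        xmid = self.x1 + (((self.x2 - self.x1) * p1) >> 16)
        if bit:
            self.x2 = xmid
        else:
            self.x1 = xmid + 1
        # emit a byte whenever the leading bytes of x1 and x2 agree
        while ((self.x1 ^ self.x2) & 0xFF000000) == 0:
            self.out.append(self.x2 >> 24)
            self.x1 = (self.x1 << 8) & M32
            self.x2 = ((self.x2 << 8) | 0xFF) & M32

    def flush(self):
        for shift in (24, 16, 8, 0):       # any value in [x1, x2] works; use x1
            self.out.append((self.x1 >> shift) & 0xFF)
        return bytes(self.out)

class Decoder:
    def __init__(self, data):
        self.data, self.pos = data, 0
        self.x1, self.x2, self.x = 0, M32, 0
        for _ in range(4):                 # prime the 32-bit window
            self.x = (self.x << 8) | self._byte()

    def _byte(self):
        b = self.data[self.pos] if self.pos < len(self.data) else 0
        self.pos += 1
        return b

    def decode(self, p1):
        # mirror the encoder's interval arithmetic exactly
        xmid = self.x1 + (((self.x2 - self.x1) * p1) >> 16)
        bit = 1 if self.x <= xmid else 0
        if bit:
            self.x2 = xmid
        else:
            self.x1 = xmid + 1
        while ((self.x1 ^ self.x2) & 0xFF000000) == 0:
            self.x1 = (self.x1 << 8) & M32
            self.x2 = ((self.x2 << 8) | 0xFF) & M32
            self.x = ((self.x << 8) | self._byte()) & M32
        return bit

def compress_bits(bits):
    enc, n0, n1 = Encoder(), 1, 1          # order-0 adaptive bit counts
    for b in bits:
        p1 = min(max((n1 << 16) // (n0 + n1), 1), 65535)
        enc.encode(b, p1)
        n0, n1 = n0 + (b == 0), n1 + (b == 1)
    return enc.flush()

def decompress_bits(data, nbits):
    dec, n0, n1, bits = Decoder(data), 1, 1, []
    for _ in range(nbits):
        p1 = min(max((n1 << 16) // (n0 + n1), 1), 65535)
        b = dec.decode(p1)
        bits.append(b)
        n0, n1 = n0 + (b == 0), n1 + (b == 1)
    return bits
```

Encoder and decoder run the identical model and interval updates, which is what keeps them in sync; the "carryless" trick of only shipping a byte once the top bytes of both bounds agree costs a tiny bit of compression but avoids carry propagation entirely.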

13. Referring to DCE:
Originally Posted by just a worm
You don't miss much if you haven't read this one.
There are some valuable parts. I think the biggest flaw is the way Matt's opinions are mixed in with the objective information. I think it's probably a good distance removed from what might be considered a standard introduction to compression.
