I created an article about an alternative approach to minimum redundancy coding (fixed-length) without using tree data structures. Here's the link: http://rccoding.sourceforge.net/
Feedback is welcome.
Best regards,
Daniel
Thanks a lot!
You're welcome!
I also have the C/C++ code in case anyone is interested.
BTW: Nice avatar image
Regards,
Daniel
Could you compile it into a compressor, so one can see how it performs?
Hi m^2,
The algorithm only performs the coding, and the compression ratio is the same as that of any optimal fixed-length coder, so the main advantage is a smaller/faster (de)compressor.
To build a compressor with it, you would combine it either with an LZ-type algorithm or with a statistical model that generates probabilities/frequencies, so that data could then be encoded according to the codes generated by rccoding. There are other possibilities, of course, but those are probably the most effective.
Considering this, an efficient coding algorithm may be important, since it leaves more processing capacity available to the statistical model, which may indirectly improve the final compression ratio.
At the moment I have no plans for creating a compressor myself with it.
Best regards,
Daniel
I once read something about "redundancy feedback" coding on comp.compression. Is that what RC refers to? Sorry, I didn't really check the source; the images on your site just reminded me of it.
Hi toffer,
It's not related to "redundancy feedback", "range coder", or any other concept with similar initials.
I did write an introductory article about it a few years ago, though, which was actually very popular. However, most people didn't understand how to apply the optimizations needed to implement it with efficient source code, so I created this new version to detail everything and cover those shortcomings.
Regards,
Daniel
Hi!
Now I've read the description of your coding method. Altogether, that's Huffman coding, viewing only the nodes which don't have a parent:
"Find the two chains with lowest frequency and merge them ... "
"Find the two nodes with lowest frequency and merge them..."
Traditional Huffman could assign codes on the fly in a similar fashion to yours: prefix all symbols of the left child with a "0" and all symbols of the right child with a "1". All nodes of the children can be kept in a list (like you do), since their hierarchy is encoded in the (partial) codes.
To me this just looks like a different implementation of Huffman coding.
What about decoding? I haven't checked the sources yet.
So these are optimal fixed-length codes.
You may read Kadach's thesis, which describes how to implement Huffman encoding without trees, using just sorting. It's done in tornado too:
http://www.compression.ru/download/a...d_1997_pdf.rar
I have read your paper. Well, it's the same approach as in Kadach/tornado, but we used simple arrays. The algorithm is the following:
1) sort initial symbols list by frequency
2) create empty list of merged symbols
3) repeat the following operation N-1 times: find the two elements with minimal frequencies and combine them, inserting the combined node at the head of the "merged symbols" list
Since both lists are sorted by frequency (the combined list stays sorted because each newly created node has a combined frequency no smaller than that of previous nodes), we need only O(1) time to find the two nodes to join.
Further optimizations (not implemented in tornado) are directed toward improving sorting speed (read Kadach). Your linked-chain data structure may require less memory (I haven't calculated it), but in other aspects it's a rather old idea.
Hi toffer,
Indeed, it's based on standard Huffman coding, as you can clearly see in the description of the base algorithm. However, you'll find the implementation is much simpler and more efficient, since you don't need to create trees/nodes, etc. The original name of the algorithm actually had a "Huffman" prefix, but I changed the title since some people claimed I was taking advantage of the name.
Regarding encoding/decoding, you can use whatever method you prefer, since rccoding only performs the code generation. Here are some tips that may help you with this: http://www.compressconsult.com/huffman/

Bulat,
Thanks for the link, but I don't understand Russian.
The main innovation is the use of circular chains to simplify the processing and reduce memory usage. Applying sorting isn't new, indeed, and had already been implemented in many different ways before.
Usually the recommended way to perform the sorting is to rely on the implementation from the standard library of the language you're using. Microsoft .NET, for example, ships a well-optimized quicksort by default.
Note, however, that RC3 (which requires sorting) is just one of several variations you can create with the base algorithm, so the main advantages of using it are flexibility (you can apply whichever variation is best in your case) and simplicity (source code and resource usage are usually much smaller than the alternatives).
Also, the original article I wrote dates from 2001, so this isn't exactly new stuff. The techniques are still widely used today and very practical, though.
Best regards,
Daniel