Subj:
http://ru.wikipedia.org/wiki/ROLZ
Written by me. Probably the first article on ROLZ.![]()
Subj:
http://ru.wikipedia.org/wiki/ROLZ
Written by me. Probably the first article on ROLZ.![]()
?? ????? ???????????? ?????? ???? ???????. ????? ? ????????? ??????? ????????? ?? ?????? ??????????? ?? compression.ru - ?? ???????? ???????? ??, ??? ? ???? ?????? ?????? ??? ????????? ??? ?????, ????????? ??? ???? ?????? ?? ??????? ??????? ?????????? ?????? ? ?? ?? ????? ????????? ??? ????? ???? ??????. ? ???? ?? ????????? ??? ??????? ?????????, ?? ????????? ??? ?? - ??? ???? ??? ?????? ??????, ? ??????? ?? ??????????? ?????? ????? ?? ????? ?????, ??????? ?? ???????? ??? ?????
?? ???. ?????? ? ????? ????????? ???????? ???-?? ???????? - ????? ??? ????? ????? ? LZ ??????? ? ???????????? ???????. ???? ?????? ?????!![]()
afaik, there is no other rolz archivers, so you can write definitely that it's only your merit
Кстати, Булат, смени кодировку своего браузера на ISO-8859-1 - а то все твои сообщения высвечиваются на нанайском - приходится менять кодировку самому. А так соообщения будут записаны в юникоде - &xxxx &xxxx...![]()
in this case you forget about lzma
i've added more urls
?????°?????±??. ?‚?µ???µ???? ?????????°?»??????? ?µ?‰?' ?±?‹ ???°??-?‚?? ???????????·?°?‚?? ???‚?? ?? ???°?????????? ???°???‚??? (IE6+Maxthon)
LZMA is not from this family! It uses the complete offset set (i.e. offset set is not reduced)!Originally Posted by Bulat Ziganshin
ЗЫ
А вот это вообще не понятная кодировка...![]()
Поидее кодировка уже прописанна в HTML: text/html;charset=iso-8859-1 - и браузер должен автоматически ее включать! Видимо у тебя какой-то глюк в браузере или ты в ручную выставил такую кодировку... Кстати, FireFox - зе бест! Експлорер очень дырявый в плане безопасности - вирусы и всякая гадость просто толпами сочится через него...![]()
?????? ?? ISO. ?????? ok?
Nope!![]()
? ??? ??? ????????????. ??? ???????? - ???? ?????? "??? ????????? ????"![]()
not counting Ross WilliamsOriginally Posted by encode
![]()
Еле нашел кодировку...Originally Posted by Bulat Ziganshin
Originally Posted by Bulat Ziganshin
![]()
hmm, i dont understand russian
encode:
could you provide full source code for decoding one symbol (match or literal). ive examined your code and i see some inefficient algorithms and bugs.
firstly i think there should be:
tab[context][0] = position
not
tab[0][i] = position
second, instead of shifting tab[context] (which takes c * tab_size time), you can use queue (ie. hold additional pointer to the beginning of queue), effectively using only constant time for updating last index and shifing queue. this will improve docoding speed.
secondly, you can store additional variable for each tab[context][i] entry that would tell you how long is the common prefix of actual and tab[context][i - 1] entry.
so the code can look like:
where second argument of getmatch is a number of symbols that matches for sure, so they can be skipped. getmatch never returns lower value that its second argument.context = buf[position - 1];
max_match = 0;
match_legth = 0;
for (i = beginning[context]; i != beginning[context]; i = i < TAB_SIZE - 1 ? i + 1 : 0) {
offset = tab[context][i];
match_length := get_match(offset, min(match_length, commons[context][i]));
if (match_length > max_match) {
max_match = match_length;
index = i;
}
}
if (max_match >= MIN_MATCH) {
encode_match_length(max_match);
encode_match_index(index >= beginning[context] : beginning[context] - index : beginning[context] + TAB_SIZE - index);
position += max_match;
}
else {
encode_literal(buf[position]);
position++;
}
commons[context][beginning[context]] = match_legth;
tab[context][beginning[context]] = position;
beginning[context] = beginning[context] < TAB_SIZE - 1 : beginning[context] + 1 : 0;
alternatively, version with modulo division instead of conditianal assingments (faster if TAB_SIZE is a power of two and you use unsigned ints):
i hope my code has no bugscontext = buf[position - 1];
max_match = 0;
match_legth = 0;
for (i = beginning[context]; i != beginning[context]; i = (i + 1) % TAB_SIZE) {
offset = tab[context][i];
match_length := get_match(offset, min(match_length, commons[context][i]));
if (match_length > max_match) {
max_match = match_length;
index = i;
}
}
if (max_match >= MIN_MATCH) {
encode_match_length(max_match);
encode_match_index((beginning[context] + TAB_SIZE - index) % TAB_SIZE);
position += max_match;
}
else {
encode_literal(buf[position]);
position++;
}
commons[context][beginning[context]] = match_legth;
tab[context][beginning[context]] = position;
beginning[context] = (beginning[context] + 1) % TAB_SIZE;
ps. hmm, for me rolz looks like multiway lzp, cause index is not reduced offset, but index to one of most recent (TAB_SIZE) lzp offsets![]()
Its indeed was a bug...Originally Posted by donkey7
Thanks!
I hope you understand that its just a pseudo code for better principles understanding! The real ROLZ code (say LZPM) is different...
Some additional optimizing and stuff will make code unreadable...
Translate the article with babelfish.Originally Posted by donkey7
We reduce the offset set with a context. Yepp, it can be ragarded as an LZP with multiple contexts, however, if we have say 32000+ offsets...
Its more LZ77 than LZP!
![]()
you can look at quad sources, its rolz part is not too large
historically, rolz (lzrw4) was "improved lz77", then Bloom invented many different lzp schemes. important difference - while rolz considers only one context but many variants in this context, lzp may consider several contexts (5/4/3 bytes, for example), but either only last occurence of this context or sequence of *chars* that last times found in this context. one interesting variant which seems not tested by anyone is rolz which tries, say, first several order-5 contexts, then several order-4 contexts..
i guess that this cant make substantial speed improvement. in quad other methods are used instead. for article, simplest method should be used for clarityOriginally Posted by donkey7
hmm, lzp differs from lz77 in only one aspect:
lz77 can code every real offset while lzp can code only offsets in current context. so rolz is definitely lzp unless you add a way to code every offset
i've read quad sources (mainly decode function) and i must admit it's optimal (at least decoding)![]()
hey, ive read quad sources. from what i understand you use the so called flexible parsing in max mode and some weird parsing in normal mode.
hmm, if you modify quad code from:
toif (max) { // max mode
int len = maxlen;
int maxvalue = (len - ((len == MINMATCH)
&& (index >= 4))) + getmatch((i + len), n);
if (maxvalue < ((MAXMATCH * 2) - 1)) {
for (int j = 1; j < len; j++) {
int temp = (j + (j == 1)) + getmatch((i + j), n);
if (temp > maxvalue) {
maxvalue = temp;
maxlen = j; // optimize match length
}
}
will it change things significantly? i know youre using hashing, so it certainly will lower compression ratio, but if when using perfect parsing it would not change compression.if (max) { // max mode
int len = maxlen;
int maxvalue = (len - ((len == MINMATCH)
&& (index >= 4))) + getmatch((i + len), n);
if (maxvalue < ((MAXMATCH * 2) - 1)) {
for (int j = 1 + len - MINMATCH; j < len; j++) {
int temp = (j + (j == 1)) + getmatch((i + j), n);
if (temp > maxvalue) {
maxvalue = temp;
maxlen = j; // optimize match length
}
}
only one line was changed. from:
for (int j = 1; j < len; j++) {
to:
for (int j = 1 + len - MINMATCH; j < len; j++) {
i hope you will understand it without explaining![]()
lazy oneOriginally Posted by donkey7
![]()
Note that the FINAL fersion of QUAD already released...![]()
I tried this already. IIRC, this will hurt compression! And Max mode is about the MAX possible compression!Originally Posted by donkey7
![]()