Results 1 to 22 of 22

Thread: A small article on ROLZ (Russian)

  1. #1
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Subj:
    http://ru.wikipedia.org/wiki/ROLZ

    Written by me. Probably the first article on ROLZ.

  2. #2
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,507
    Thanks
    742
    Thanked 665 Times in 359 Posts
    ?? ????? ???????????? ?????? ???? ???????. ????? ? ????????? ??????? ????????? ?? ?????? ??????????? ?? compression.ru - ?? ???????? ???????? ??, ??? ? ???? ?????? ?????? ??? ????????? ??? ?????, ????????? ??? ???? ?????? ?? ??????? ??????? ?????????? ?????? ? ?? ?? ????? ????????? ??? ????? ???? ??????. ? ???? ?? ????????? ??? ??????? ?????????, ?? ????????? ??? ?? - ??? ???? ??? ?????? ??????, ? ??????? ?? ??????????? ?????? ????? ?? ????? ?????, ??????? ?? ???????? ??? ?????

  3. #3
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    ?? ???. ?????? ? ????? ????????? ???????? ???-?? ???????? - ????? ??? ????? ????? ? LZ ??????? ? ???????????? ???????. ???? ?????? ?????!

  4. #4
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,507
    Thanks
    742
    Thanked 665 Times in 359 Posts
    afaik, there is no other rolz archivers, so you can write definitely that it's only your merit

  5. #5
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Кстати, Булат, смени кодировку своего браузера на ISO-8859-1 - а то все твои сообщения высвечиваются на нанайском - приходится менять кодировку самому. А так соообщения будут записаны в юникоде - &xxxx &xxxx...

  6. #6
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,507
    Thanks
    742
    Thanked 665 Times in 359 Posts
    in this case you forget about lzma

    i've added more urls

  7. #7
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,507
    Thanks
    742
    Thanked 665 Times in 359 Posts
    ?????°?????±??. ?‚?µ???µ???? ?????????°?»??????? ?µ?‰?' ?±?‹ ???°??-?‚?? ???????????·?°?‚?? ???‚?? ?? ???°?????????? ???°???‚??? (IE6+Maxthon)

  8. #8
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by Bulat Ziganshin
    in this case you forget about lzma
    LZMA is not from this family! It uses the complete offset set (i.e. offset set is not reduced)!

    ЗЫ
    А вот это вообще не понятная кодировка...

  9. #9
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Поидее кодировка уже прописанна в HTML: text/html;charset=iso-8859-1 - и браузер должен автоматически ее включать! Видимо у тебя какой-то глюк в браузере или ты в ручную выставил такую кодировку... Кстати, FireFox - зе бест! Експлорер очень дырявый в плане безопасности - вирусы и всякая гадость просто толпами сочится через него...

  10. #10
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,507
    Thanks
    742
    Thanked 665 Times in 359 Posts
    ?????? ?? ISO. ?????? ok?

  11. #11
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Nope!

  12. #12
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,507
    Thanks
    742
    Thanked 665 Times in 359 Posts
    ? ??? ??? ????????????. ??? ???????? - ???? ?????? "??? ????????? ????"

  13. #13
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,507
    Thanks
    742
    Thanked 665 Times in 359 Posts
    Quote Originally Posted by encode
    Written by me. Probably the first article on ROLZ
    not counting Ross Williams

  14. #14
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by Bulat Ziganshin
    я уже всё перепробовал. эта страница - прям список "все кодировки мира"
    Еле нашел кодировку...

    Quote Originally Posted by Bulat Ziganshin
    not counting Ross Williams

  15. #15
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
    hmm, i dont understand russian

    encode:
    could you provide full source code for decoding one symbol (match or literal). ive examined your code and i see some inefficient algorithms and bugs.

    firstly i think there should be:
    tab[context][0] = position
    not
    tab[0][i] = position

    second, instead of shifting tab[context] (which takes c * tab_size time), you can use queue (ie. hold additional pointer to the beginning of queue), effectively using only constant time for updating last index and shifing queue. this will improve docoding speed.

    secondly, you can store additional variable for each tab[context][i] entry that would tell you how long is the common prefix of actual and tab[context][i - 1] entry.

    so the code can look like:
    context = buf[position - 1];

    max_match = 0;
    match_legth = 0;

    for (i = beginning[context]; i != beginning[context]; i = i < TAB_SIZE - 1 ? i + 1 : 0) {
    offset = tab[context][i];

    match_length := get_match(offset, min(match_length, commons[context][i]));

    if (match_length > max_match) {
    max_match = match_length;
    index = i;
    }
    }


    if (max_match >= MIN_MATCH) {
    encode_match_length(max_match);
    encode_match_index(index >= beginning[context] : beginning[context] - index : beginning[context] + TAB_SIZE - index);
    position += max_match;
    }
    else {
    encode_literal(buf[position]);
    position++;
    }

    commons[context][beginning[context]] = match_legth;
    tab[context][beginning[context]] = position;
    beginning[context] = beginning[context] < TAB_SIZE - 1 : beginning[context] + 1 : 0;
    where second argument of getmatch is a number of symbols that matches for sure, so they can be skipped. getmatch never returns lower value that its second argument.

    alternatively, version with modulo division instead of conditianal assingments (faster if TAB_SIZE is a power of two and you use unsigned ints):
    context = buf[position - 1];

    max_match = 0;
    match_legth = 0;

    for (i = beginning[context]; i != beginning[context]; i = (i + 1) % TAB_SIZE) {
    offset = tab[context][i];

    match_length := get_match(offset, min(match_length, commons[context][i]));

    if (match_length > max_match) {
    max_match = match_length;
    index = i;
    }
    }


    if (max_match >= MIN_MATCH) {
    encode_match_length(max_match);
    encode_match_index((beginning[context] + TAB_SIZE - index) % TAB_SIZE);
    position += max_match;
    }
    else {
    encode_literal(buf[position]);
    position++;
    }

    commons[context][beginning[context]] = match_legth;
    tab[context][beginning[context]] = position;
    beginning[context] = (beginning[context] + 1) % TAB_SIZE;
    i hope my code has no bugs



    ps. hmm, for me rolz looks like multiway lzp, cause index is not reduced offset, but index to one of most recent (TAB_SIZE) lzp offsets

  16. #16
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by donkey7
    tab[context][0] = position
    Its indeed was a bug...

    Thanks!

    I hope you understand that its just a pseudo code for better principles understanding! The real ROLZ code (say LZPM) is different...

    Some additional optimizing and stuff will make code unreadable...

    Quote Originally Posted by donkey7
    ps. hmm, for me rolz looks like multiway lzp, cause index is not reduced offset, but index to one of most recent (TAB_SIZE) lzp offsets
    Translate the article with babelfish. We reduce the offset set with a context. Yepp, it can be ragarded as an LZP with multiple contexts, however, if we have say 32000+ offsets... Its more LZ77 than LZP!

  17. #17
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,507
    Thanks
    742
    Thanked 665 Times in 359 Posts
    you can look at quad sources, its rolz part is not too large

    historically, rolz (lzrw4) was "improved lz77", then Bloom invented many different lzp schemes. important difference - while rolz considers only one context but many variants in this context, lzp may consider several contexts (5/4/3 bytes, for example), but either only last occurence of this context or sequence of *chars* that last times found in this context. one interesting variant which seems not tested by anyone is rolz which tries, say, first several order-5 contexts, then several order-4 contexts..

    Quote Originally Posted by donkey7
    secondly, you can store additional variable for each tab[context][i] entry that would tell you how long is the common prefix of actual and tab[context][i - 1] entry.
    i guess that this cant make substantial speed improvement. in quad other methods are used instead. for article, simplest method should be used for clarity

  18. #18
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
    hmm, lzp differs from lz77 in only one aspect:
    lz77 can code every real offset while lzp can code only offsets in current context. so rolz is definitely lzp unless you add a way to code every offset

    i've read quad sources (mainly decode function) and i must admit it's optimal (at least decoding)

  19. #19
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
    hey, ive read quad sources. from what i understand you use the so called flexible parsing in max mode and some weird parsing in normal mode.

    hmm, if you modify quad code from:
    if (max) { // max mode
    int len = maxlen;
    int maxvalue = (len - ((len == MINMATCH)
    && (index >= 4))) + getmatch((i + len), n);

    if (maxvalue < ((MAXMATCH * 2) - 1)) {
    for (int j = 1; j < len; j++) {
    int temp = (j + (j == 1)) + getmatch((i + j), n);

    if (temp > maxvalue) {
    maxvalue = temp;
    maxlen = j; // optimize match length
    }
    }
    to

    if (max) { // max mode
    int len = maxlen;
    int maxvalue = (len - ((len == MINMATCH)
    && (index >= 4))) + getmatch((i + len), n);

    if (maxvalue < ((MAXMATCH * 2) - 1)) {
    for (int j = 1 + len - MINMATCH; j < len; j++) {
    int temp = (j + (j == 1)) + getmatch((i + j), n);

    if (temp > maxvalue) {
    maxvalue = temp;
    maxlen = j; // optimize match length
    }
    }
    will it change things significantly? i know youre using hashing, so it certainly will lower compression ratio, but if when using perfect parsing it would not change compression.

    only one line was changed. from:
    for (int j = 1; j < len; j++) {
    to:
    for (int j = 1 + len - MINMATCH; j < len; j++) {

    i hope you will understand it without explaining

  20. #20
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,507
    Thanks
    742
    Thanked 665 Times in 359 Posts
    Quote Originally Posted by donkey7
    some weird parsing in normal mode
    lazy one

  21. #21
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Note that the FINAL fersion of QUAD already released...

  22. #22
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by donkey7
    for (int j = 1 + len - MINMATCH; j < len; j++) {
    I tried this already. IIRC, this will hurt compression! And Max mode is about the MAX possible compression!

Similar Threads

  1. Russian Copulation Scottifed or Not?
    By biject.bwts in forum Data Compression
    Replies: 0
    Last Post: 25th August 2008, 02:05
  2. ROLZ explanation?
    By Trixter in forum Data Compression
    Replies: 5
    Last Post: 10th June 2008, 19:24
  3. RZM - a dull ROLZ compression engine
    By Christian in forum Forum Archive
    Replies: 178
    Last Post: 1st May 2008, 22:26
  4. A nice article about hashing
    By encode in forum Forum Archive
    Replies: 19
    Last Post: 26th September 2007, 22:31
  5. Russian National Anthem
    By encode in forum Forum Archive
    Replies: 2
    Last Post: 8th June 2006, 23:07

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •