
Thread: CSC3.2 is in development

  1. #1
    Member Fu Siyuan's Avatar
    Join Date
    Apr 2009
    Location
    Mountain View, CA, US
    Posts
    176
    Thanks
    10
    Thanked 17 Times in 2 Posts

    CSC3.2

    Hello everyone, here is my CSC 3.2 Alpha 2.
    Here is CSC on Compression Ratings.
    It's a complete rewrite of CSC3.1, aiming at a better ratio and faster speed.

    The main algorithm of both is almost the same -- LZ77 + arithmetic coding.
    But to achieve better performance, I made the following changes:

    1. Abandoned the hash-chain match finder and use a hash table instead. The number of match candidates is only 1-6 (the hidden parameter -mx uses 12). That means in -m1 there is only one hash-6-bytes candidate for each string match.

    2. On most data LZ is very good, but on hard data like audio it is very slow. So matches are only searched for at the beginning of small blocks, and hash-6 strings are inserted every few bytes. It then performs like a skip mode (30 MB/s+) but can still find duplicate blocks (try a tar containing several identical zipped files).

    3. Detection: CSC3.1 calculated each column's entropy (for widths 1/2/3/4/8/16) every 64 KB to determine whether delta is needed. It's slow. Now this code is used instead:
    while (progress < size - 12)
    {
        /* accumulate absolute differences between bytes that are
           1, 2, 3, 4, 8 and 12 positions apart */
        sameDist[0] += abs((signed)src[progress] - (signed)src[progress + 1]);
        sameDist[1] += abs((signed)src[progress] - (signed)src[progress + 2]);
        sameDist[2] += abs((signed)src[progress] - (signed)src[progress + 3]);
        sameDist[3] += abs((signed)src[progress] - (signed)src[progress + 4]);
        sameDist[4] += abs((signed)src[progress] - (signed)src[progress + 8]);
        sameDist[5] += abs((signed)src[progress] - (signed)src[progress + 12]);
        progress++;
    }

    The MinBlockSize is 8 KB. If the largest sameDist[i] and the smallest sameDist[i] differ a lot, that often indicates the block contains tables, and the stride of the smallest sameDist[i] is taken as the channel number (a sketch of this decision step follows the list below).

    4. Improved parsing and model. There are 5 kinds of packets in the LZ output stream:
    normal match, match with a repeated distance (4 slots), 1-byte match (last distance; helps a lot on data tables), same match (last length and distance), and literal.
    Literals use an order-1 model (3 MSBs as context). Match distance coding uses the match length as context.

    5. The range coder was changed to the same one as LZMA. It seems 10-20% faster on decompression compared to CSC3.1's coder.

    6. The dictionary size is enlarged a lot --- 320 MB (actually 500 MB+ is possible, but that seems meaningless). Due to the changes in the match finder, memory usage is only 40 MB + dictionary size in -m1 mode. Decompression needs 5 MB + dictionary size, I/O buffers included.
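
    For reference, a rough sketch of the decision step for point 3 above. The stride set {1,2,3,4,8,12} matches the loop, but the threshold value and the function/variable names are only illustrative, not the actual CSC code:

    Code:
    #include <stdint.h>

    /* strides used by the accumulation loop above */
    static const int stride[6] = { 1, 2, 3, 4, 8, 12 };

    /* sameDist[i] = sum of |src[j] - src[j + stride[i]]| over one block (>= 8 KB).
       Returns the detected channel count, or 0 if no delta should be applied. */
    static int detect_channels(const uint32_t sameDist[6])
    {
        const uint32_t RATIO = 3;   /* assumed: "differ a lot" means max >= 3 * min */
        int imin = 0, imax = 0;

        for (int i = 1; i < 6; i++) {
            if (sameDist[i] < sameDist[imin]) imin = i;
            if (sameDist[i] > sameDist[imax]) imax = i;
        }
        /* a large spread between the best and the worst stride suggests tabular data;
           the stride with the smallest accumulated difference is the channel number */
        if (sameDist[imax] >= RATIO * (sameDist[imin] + 1))
            return stride[imin];
        return 0;
    }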

    Still to improve:
    LZ parsing has bugs so far; its price evaluation doesn't work. Check e.coli -- theoretically it should find no matches and be compressed to about 2 bpb.

    The delta coding is far from satisfactory. Thanks to Nania, but his work distinguishes data types by file headers, so what should I do if I don't know the width of a bitmap? Sami advises that a linear predictor + order-0 is good enough, but I lack knowledge of linear predictors. Turning CSC into an archiver is also a kind of solution.

    Technical changes in CSC3.2 Alpha 3:
    1. Removed the repeat match in LZ. Very small ratio improvement on most files.
    2. Changed literal coding a little, leading to a very small ratio improvement on all files.
    3. Changed delta coding; improved decoding speed.
    4. Cached some bits in the match finder; improved compression speed.

    Technical changes in CSC3.2 Alpha 4:
    1. Improved compression speed on data that is hard to compress.
    2. Fixed a bug that hurt compression.
    3. Remapped the modes: -m0 -> -m1, -m1 -> -m2, -m2/3 -> -m3; the default is -m2 now.
    4. The dictionary size is more flexible now; it can be any number of KB from 32 KB to 512 MB.

    Technical changes in CSC3.2 Alpha 5:
    1. Improved compression speed a little.
    2. Added a very weak and experimental DICT preprocessor (not Bulat's). It improves the ratio by about 1% on text.

    Technical changes in CSC3.2 Alpha 6:
    1. Very small improvements, mostly to DICT.
    2. This is the last version of CSC. The next version will be an archiver, maybe called CSArc.

    Technical changes in CSC3.2 Beta 2:
    The new version emphasizes compression ratio more than efficiency (speed).
    1. Rewrote the LZ part for the -m2/-m3 modes. Removed the previous -m1 mode; now -m1 is the previous -m2.
    LZ (-m2/3) now uses better parsing, but is much slower than before.
    The literal coder is now a complete order-1 model, while previously it used only the 3 MSBs.
    2. New delta detector, though it seems no better than before.
    3. It's now an archiver; much of the code comes from YZX (i.e. from Sami Runsas).
    4. Some other small modifications.

    Changes in CSC3.2 Final:
    Fixed a decompression error from the previous version.
    Attached Files
    Last edited by Fu Siyuan; 1st March 2011 at 17:35. Reason: Update

  2. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,372
    Thanks
    213
    Thanked 1,020 Times in 541 Posts
    http://en.wikipedia.org/wiki/Linear_prediction
    Also, did you look at the bsc source?
    Its compression methods may be different, but its detectors and filters can be used as a reference.

  3. #3
    Programmer osmanturan's Avatar
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Fu Siyuan View Post
    3. Detection: CSC3.1 calculated each column's entropy (for widths 1/2/3/4/8/16) every 64 KB to determine whether delta is needed. It's slow. Now this code is used instead:
    while (progress < size - 12)
    {
        sameDist[0] += abs((signed)src[progress] - (signed)src[progress + 1]);
        sameDist[1] += abs((signed)src[progress] - (signed)src[progress + 2]);
        sameDist[2] += abs((signed)src[progress] - (signed)src[progress + 3]);
        sameDist[3] += abs((signed)src[progress] - (signed)src[progress + 4]);
        sameDist[4] += abs((signed)src[progress] - (signed)src[progress + 8]);
        sameDist[5] += abs((signed)src[progress] - (signed)src[progress + 12]);
        progress++;
    }

    The MinBlockSize is 8 KB. If the largest sameDist[i] and the smallest sameDist[i] differ a lot, that often indicates the block contains tables, and the stride of the smallest sameDist[i] is taken as the channel number.
    1 - This IS NOT entropy; you have used some kind of heuristic.
    2 - Linear predictors are not very complex. Just think of a series of coefficients: at each step you apply a convolution with those coefficients. You can even use heuristics to update the coefficients if you want to make it adaptive, or you can use the least-squares method to obtain them. (A small sketch follows below.)
    3 - The "same match (last len and distance)" option seems inappropriate, because it does not occur very often IIRC. You can dump statistics for some files to see whether it really helps or not.
    BIT Archiver homepage: www.osmanturan.com

  4. #4
    Member Fu Siyuan's Avatar
    Join Date
    Apr 2009
    Location
    Mountain View, CA, US
    Posts
    176
    Thanks
    10
    Thanked 17 Times in 2 Posts
    I just downloaded the bsc source. It's a nice reference for detectors.


    To Osman:
    1. You misread it. CSC3.1 uses entropy; this one does not. However, I finally understand why you mentioned the word "heuristics" so much.

    2. I will give it a try. I have also read the RAR audio filter. "Heuristics" often works for me.

    3. Yes, you are right. Same match only benefits a few files. I will fix it in the next alpha.

  5. #5
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,507
    Thanks
    742
    Thanked 665 Times in 359 Posts
    Have you seen mmdet and delta in the FreeArc sources?

    That means in -m1 there is only one hash-6-bytes candidate for each string match.
    Do you use "caching"? I think that saving 4 bytes of each match as a check key along with the pointer should reduce the number of extra memory accesses to a minimum. And since an L2 cache line is 64 bytes, you may access 8 matches without an extra penalty:

    Code:
    hash = hash_func(p[0]...p[5]);          // hash over the next 6 bytes
    key  = *(uint32*)p;                     // first 4 bytes kept as a cheap check key
    for (i = 0; i < 8; i++) {               // 8 entries fit in one 64-byte cache line
      if (ht[hash][i].key == key) {         // verify the key before touching the match data
        match_ptr = ht[hash][i].ptr;
        match_len = memcmp_bytes(p, match_ptr);
        ....
      }
    }
    Last edited by Bulat Ziganshin; 9th May 2010 at 11:27.

  6. #6
    Member Fu Siyuan's Avatar
    Join Date
    Apr 2009
    Location
    Mountain View, CA, US
    Posts
    176
    Thanks
    10
    Thanked 17 Times in 2 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    Have you seen mmdet and delta in the FreeArc sources?
    Honestly, it's not easy for me to understand all your ideas, though I always keep trying to. At the same time I also try some ideas of my own. BTW, your DICT preprocessor attracts me more, and I'm considering implementing one.

  7. #7
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,507
    Thanks
    742
    Thanked 665 Times in 359 Posts
    Quote Originally Posted by Fu Siyuan View Post
    Honestly, it's not easy for me to understand all your ideas, though I always keep trying to. At the same time I also try some ideas of my own. BTW, your DICT preprocessor attracts me more, and I'm considering implementing one.
    of course it's hard, since my comments are in russian rather than chinese

    the dict implementation is amazingly optimized for speed, but its algorithm isn't ideal. look at the data produced by WRT: preprocessing by WRT makes further ppmd compression faster, while preprocessing by DICT makes it slower
    Last edited by Bulat Ziganshin; 9th May 2010 at 11:36.

  8. #8
    Member Fu Siyuan's Avatar
    Join Date
    Apr 2009
    Location
    Mountain View, CA, US
    Posts
    176
    Thanks
    10
    Thanked 17 Times in 2 Posts
    hash = hash_func(p[0]...p[5]);
    key  = *(uint32*)p;
    for (i = 0; i < 8; i++) {
      if (ht[hash][i].key == key) {
        match_ptr = ht[hash][i].ptr;
        match_len = memcmp_bytes(p, match_ptr);
        ....
      }
    }
    Based on my observations, hash misses (same hash but not an actual match) aren't frequent, so I didn't use "if (ht[hash][i].key == key)".
    In my program, i < 1 (-m1), i < 3 (-m2), i < 6 (-m3). However, more candidates still slow things down a lot, and not because of memory access: checking the longer matches itself takes noticeably long.
    Another test: an 8-byte hash helps a lot on enwik8/9, but gets a much worse ratio on binary data.

  9. #9
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,507
    Thanks
    742
    Thanked 665 Times in 359 Posts
    Quote Originally Posted by Fu Siyuan View Post
    Hash misses (same hash but not an actual match) aren't frequent
    every memory access costs ~100 ticks

    However, more candidates still slow things down a lot, and not because of memory access: checking the longer matches itself takes noticeably long.
    much more than 100 ticks? taking into account that most of the checked matches will be less than 4 bytes

    you may gather stats, for example on enwik8 and my dll100.dll, with 1..8 matches checked: % of hash misses, average match_len (in terms of my code)


    Another test: an 8-byte hash helps a lot on enwik8/9, but gets a much worse ratio on binary data.
    many years ago i found the same thing with a hash4 matchfinder instead of the hash3 widely used at that time

    now i think that one should use a separate hash table that fits in the L2 cache. this means about 2^20 entries, so it may be used only for matches up to 5 or 6 bytes. btw, it's better organized as hc4, since (being cache-resident) it doesn't suffer from the main hc4 problem - lots of slow memory accesses

    although if one needs maximum compression w/o bt4, one may use a separate hash with a size up to 100 MB, so it may find matches of up to 7-8 bytes
    Last edited by Bulat Ziganshin; 9th May 2010 at 12:55.

  10. #10
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,507
    Thanks
    742
    Thanked 665 Times in 359 Posts
    "linear predictor + order0" is very simple thing. you subtracts successive values (over N bytes) - this is called linear predictor, and then computes order0 compression of resulted dara. what N gives best compression and if it gives better compression than order0 of original data - it should be used. it's how mmdet works - you may run its executable ( http://freearc.org/Research.aspx ) on wav/bmp files

    so, delta preprocessing by itself is called linear predictor simplest way to check whether it helps is to compare order0 compression before and after applying it
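
    a rough sketch of that check, just for illustration: the block size limit and the range of N tried are assumptions for the example, mmdet's real code may differ:

    Code:
    #include <math.h>
    #include <stddef.h>

    /* order-0 entropy of a byte block, in bits per byte */
    static double order0_bpb(const unsigned char *buf, size_t n)
    {
        size_t cnt[256] = { 0 };
        double bits = 0.0;
        for (size_t i = 0; i < n; i++) cnt[buf[i]]++;
        for (int s = 0; s < 256; s++)
            if (cnt[s]) bits -= (double)cnt[s] * log2((double)cnt[s] / (double)n);
        return bits / (double)n;
    }

    /* returns the best delta stride N in 1..16, or 0 if the raw data is already better */
    static int best_delta_stride(const unsigned char *src, size_t n)
    {
        unsigned char tmp[65536];                 /* assumed: analyze at most 64 KB */
        if (n > sizeof(tmp)) n = sizeof(tmp);

        double best = order0_bpb(src, n);         /* baseline: no preprocessing */
        int best_n = 0;

        for (int N = 1; N <= 16; N++) {
            for (size_t i = 0; i < n; i++)        /* delta = the simplest linear predictor */
                tmp[i] = (unsigned char)(src[i] - (i >= (size_t)N ? src[i - N] : 0));
            double bpb = order0_bpb(tmp, n);
            if (bpb < best) { best = bpb; best_n = N; }
        }
        return best_n;
    }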

  11. #11
    Member Fu Siyuan's Avatar
    Join Date
    Apr 2009
    Location
    Mountain View, CA, US
    Posts
    176
    Thanks
    10
    Thanked 17 Times in 2 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    "linear predictor + order0" is very simple thing. you subtracts successive values (over N bytes) - this is called linear predictor, and then computes order0 compression of resulted dara. what N gives best compression and if it gives better compression than order0 of original data - it should be used. it's how mmdet works - you may run its executable ( http://freearc.org/Research.aspx ) on wav/bmp files

    so, delta preprocessing by itself is called linear predictor simplest way to check whether it helps is to compare order0 compression before and after applying it
    Yeah, I know delta is a kind of linear predictor. Matt explained this in his article. But only subtracts successive values is not enough to get a good performance on wav/bmps. Current CSC3.2 use delta and order1, it's slow and performs not as good as RAR, expecially on bitmap. CSC3.1 does what your said. The new method of detection is a trying. It should be simpler and faster.

  12. #12
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,507
    Thanks
    742
    Thanked 665 Times in 359 Posts
    i talked only about the detector, since you said that you don't understand what "linear predictor + order0" is

    overall, MM compression is its own area, and lz+filter is a very poor man's approach here

  13. #13
    Member Fu Siyuan's Avatar
    Join Date
    Apr 2009
    Location
    Mountain View, CA, US
    Posts
    176
    Thanks
    10
    Thanked 17 Times in 2 Posts


    Hi Bulat:

    Thanks for your advice about caching. To avoid using too much memory, I use only 3 bits to store the key (a 3-bit key plus 29 bits for the position within the dictionary). On many test cases the program speeds up by nearly 10%.
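
    A rough sketch of what such a packed slot could look like, assuming one 32-bit word per hash-table entry; the names are just illustrative, not the actual CSC source:

    Code:
    #include <stdint.h>

    /* one 32-bit hash-table slot: 3 check-key bits + 29-bit position
       (29 bits are enough to address a dictionary of up to 512 MB) */
    #define POS_BITS  29
    #define POS_MASK  ((1u << POS_BITS) - 1)

    static inline uint32_t pack_slot(uint32_t key3, uint32_t pos)
    {
        return ((key3 & 7u) << POS_BITS) | (pos & POS_MASK);
    }

    static inline int key_matches(uint32_t slot, uint32_t key3)
    {
        return (slot >> POS_BITS) == (key3 & 7u);   /* cheap check before touching the window */
    }

    static inline uint32_t slot_pos(uint32_t slot)
    {
        return slot & POS_MASK;
    }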

  14. #14
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts

  15. #15
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts

    Hi!

    Very, very nice results for this version!
    Super and fast LZ77! Nice job!

  16. #16
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Some tests:
    TCUP: (results table not preserved)
    Exe: (results table not preserved)
    Sorry, no compression timings, but it felt fast.

  17. #17
    Member Fu Siyuan's Avatar
    Join Date
    Apr 2009
    Location
    Mountain View, CA, US
    Posts
    176
    Thanks
    10
    Thanked 17 Times in 2 Posts
    New update of my program, with source code.

    It uses an improved flexible parsing.

  18. #18
    Member Fu Siyuan's Avatar
    Join Date
    Apr 2009
    Location
    Mountain View, CA, US
    Posts
    176
    Thanks
    10
    Thanked 17 Times in 2 Posts
    Now it's OK. I'm not sure if the code works without Visual C++ 2008.
    Last edited by Fu Siyuan; 22nd December 2010 at 16:12.

  19. #19
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    Updated results for csc32 final. http://mattmahoney.net/dc/text.html#2299

  20. #20
    Member Surfer's Avatar
    Join Date
    Mar 2009
    Location
    oren
    Posts
    203
    Thanks
    18
    Thanked 7 Times in 1 Post
    Any progress on it?
