
Thread: RH4 - Solid multifile compressor

  1. #31
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts
    A faster attempt at CRC32 this time, after reading CRC32C's description... the look-up table now encodes the effect of processing 16 bytes (8 on x86) from any state. It's still a simple approach with no special instructions. enwik8 is read and hashed in 68 ms on x64 and 84 ms on x86, including I/O time, with a 512 KB block. My old single-byte look-up table took ~250 ms on enwik8 (~400 ms in practice, because I was hashing while packing, so the table wasn't always in the cache).
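    (For reference, a minimal sketch of the same table-driven idea at 4 bytes per step, i.e. classic slicing-by-4 over the reflected polynomial 0xEDB88320; the version described above widens this to 16 bytes per step with more tables. Illustrative only, not RH4's code.)
    Code:
        // Illustrative slicing-by-4 CRC32 sketch (not RH4's code).
        // T[k][b] is the CRC contribution of byte b seen k positions earlier,
        // so four table lookups advance the state by four input bytes at once.
        #include <cstdint>
        #include <cstddef>
        #include <cstring>

        static uint32_t T[4][256];

        static void crc32_init_tables()               // call once before hashing
        {
            for (uint32_t i = 0; i < 256; i++)
            {
                uint32_t c = i;
                for (int k = 0; k < 8; k++)
                    c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : (c >> 1);
                T[0][i] = c;
            }
            for (uint32_t i = 0; i < 256; i++)
            {
                T[1][i] = (T[0][i] >> 8) ^ T[0][T[0][i] & 0xFF];
                T[2][i] = (T[1][i] >> 8) ^ T[0][T[1][i] & 0xFF];
                T[3][i] = (T[2][i] >> 8) ^ T[0][T[2][i] & 0xFF];
            }
        }

        static uint32_t crc32_slice4(const uint8_t* p, size_t n, uint32_t crc = 0)
        {
            crc = ~crc;
            while (n >= 4)
            {
                uint32_t v;
                std::memcpy(&v, p, 4);                // assumes little-endian
                uint32_t c = crc ^ v;
                crc = T[3][c & 0xFF] ^ T[2][(c >> 8) & 0xFF]
                    ^ T[1][(c >> 16) & 0xFF] ^ T[0][(c >> 24) & 0xFF];
                p += 4; n -= 4;
            }
            while (n--)                               // byte-at-a-time tail
                crc = (crc >> 8) ^ T[0][(crc ^ *p++) & 0xFF];
            return ~crc;
        }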

    I changed the duplicate finder to check all files of the same size instead of only the next one, so n*(n-1) comparisons instead of 2*n now. When a file is checked as a possible duplicate, it is hashed. I use CRC32 as a gate for the comparison, and I still read the files byte-for-byte if their hashes match (not very often, but quite possible). Also, zero-byte files are written now.
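    (A rough sketch of that duplicate-finding pass: bucket by exact size, CRC32 as a cheap gate, byte-for-byte compare only when two CRCs match. FileEntry and the verify callback are illustrative, not RH4's actual structures.)
    Code:
        // Illustrative sketch of size-bucketed deduplication with a CRC32 gate.
        #include <cstddef>
        #include <cstdint>
        #include <functional>
        #include <map>
        #include <string>
        #include <utility>
        #include <vector>

        struct FileEntry
        {
            std::string path;
            uint64_t    size;
            uint32_t    crc;    // CRC32 of the whole file, computed while scanning
        };

        // verify(a, b) does the byte-for-byte compare that runs only on a CRC match.
        std::vector<std::pair<std::string, std::string>>
        find_duplicates(const std::vector<FileEntry>& files,
                        const std::function<bool(const std::string&, const std::string&)>& verify)
        {
            std::vector<std::pair<std::string, std::string>> dups;

            // Bucket candidates by exact size; only same-size files can be equal.
            std::map<uint64_t, std::vector<const FileEntry*>> by_size;
            for (const FileEntry& f : files)
                by_size[f.size].push_back(&f);

            for (const auto& bucket : by_size)
            {
                const auto& group = bucket.second;
                // Compare every pair in the group, not just neighbouring files.
                for (size_t i = 0; i < group.size(); i++)
                    for (size_t j = i + 1; j < group.size(); j++)
                        if (group[i]->crc == group[j]->crc &&
                            verify(group[i]->path, group[j]->path))
                            dups.emplace_back(group[i]->path, group[j]->path);
            }
            return dups;
        }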
    Attached Files
    Last edited by cade; 22nd March 2014 at 15:17. Reason: typo

  2. #32
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    True, VMAC is faster, but not cryptographically secure unless seeded with a secret key. That works fine when you are creating a new archive, but not for updating existing archives because you must store the key along with the hashes. This might not be a concern if you aren't worried about people trying to deliberately create collisions so deduplication will fail. But I worry about that.

    BLAKE2 is probably faster than SHA-1 on 64 bit machines, but not 32 bit. BLAKE2 hasn't received the same level of security analysis as BLAKE or Keccak, but I guess that doesn't bother people who are more concerned about speed.

  3. #33
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    yes, we have 3 levels of hashing:

    1. non-crypto such as cityhash and xxHash. xxHash should be around 7 GB/s on 4 GHz cpu

    2. MACs that can be used for secure deduplication without storing hashes of the files, as implemented in srep and RH. the fastest one i know is VMAC, 10 GB/s

    3. cryptohashes such as sha256 and blake2. afaik blake2 is fastest now, but sha256 will be the only one with CPU acceleration. NIST recommends not to use sha1 in new products since it was partially broken. afaik sha256 speed is around 400 MB/s and blake2 is 2x faster

  4. #34
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Updated tests on RH4. Faster and better. http://mattmahoney.net/dc/10gb.html

  5. #35
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts
    Thanks! Could you also please update LTCB?

  6. #36
    Member
    Join Date
    Jun 2013
    Location
    USA
    Posts
    98
    Thanks
    4
    Thanked 14 Times in 12 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    3. cryptohashes such as sha256 and blake2. afaik blake2 is fastest now, but sha256 will be the only one with CPU acceleration. NIST recommends not to use sha1 in new products since it was partially broken. afaik sha256 speed is around 400 MB/s and blake2 is 2x faster
    The BLAKE2 source includes SSE versions that get compiled by default. Is it not possible to make a fast SSE implementation of SHA256?

    Quote Originally Posted by Matt Mahoney View Post
    BLAKE2 is probably faster than SHA-1 on 64 bit machines, but not 32 bit. BLAKE2 hasn't received the same level of security analysis as BLAKE or Keccak, but I guess that doesn't bother people who are more concerned about speed.


    Call me crazy but if you want more security from BLAKE2 or whatever, can't you just increase the number of rounds? As far as I can tell, BLAKE2 is stronger per round than SHA256 or 512.

  7. #37
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,497
    Thanks
    733
    Thanked 659 Times in 354 Posts
    1. blake2 was developed with SIMD in mind, unlike sha1/2. afair, it's possible to compute several SHA1/2 values simultaneously - there is an Intel paper describing that

    2. Matt was talking about a lack of analysis, i.e. time spent by scientists. although increasing the number of rounds usually increases security, it can't replace analysis

  8. #38
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts

  9. The Following User Says Thank You to Matt Mahoney For This Useful Post:

    cade (24th March 2014)

  10. #39
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    776
    Thanks
    63
    Thanked 270 Times in 190 Posts
    I get an error when compressing Keshe Foundation embassy USB stick files:

    Found 632 files
    Error opening input file X:\org data\KESHE PATENTS\Medical_PATENT1\Brooks Design-Contemporary Graphics ?Art Theory 101.pdf

    and if I rename it:

    Found 632 files
    Error opening input file X:\org data\KESHE PATENTS\patent 8-060207\ZW2\diamond_fullerene\photos_diamond_coatin g\bolt_?.JPG
    Last edited by Sportman; 25th March 2014 at 04:36.

  11. #40
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    776
    Thanks
    63
    Thanked 270 Times in 190 Posts
    Input RAM-disk:
    1,771,512,146 bytes, Keshe Foundation embassy USB stick files (with above two files renamed)

    Output RAM-disk:
    799,411,741 bytes, 35.302 sec., rh4_x64 c1
    794,341,466 bytes, 101.690 sec., rh4_x64 c6

    When decompressing, I get:
    Error: output filename contains ../ (plus a very long string)
    Last edited by Sportman; 25th March 2014 at 04:43.

  12. #41
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts
    It might be a Unicode filename or some special character. If I display that error, it means that the basic fopen() from C's standard library failed.
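    (For what it's worth, one common workaround on Windows is to convert the path to UTF-16 and call _wfopen() instead of fopen(), since fopen() only understands the current ANSI code page. A minimal sketch; open_utf8 is a hypothetical helper and it assumes the caller already has UTF-8 paths, e.g. obtained through the wide-character directory APIs.)
    Code:
        // Illustrative sketch, not RH4's code: open a UTF-8 path on Windows by
        // converting it to UTF-16 and using _wfopen(); fall back to fopen() elsewhere.
        #include <cstdio>
        #include <string>
        #ifdef _WIN32
        #include <windows.h>
        #endif

        FILE* open_utf8(const char* path, const char* mode)
        {
        #ifdef _WIN32
            int n = MultiByteToWideChar(CP_UTF8, 0, path, -1, nullptr, 0);
            std::wstring wpath(n, L'\0');
            MultiByteToWideChar(CP_UTF8, 0, path, -1, &wpath[0], n);

            int m = MultiByteToWideChar(CP_UTF8, 0, mode, -1, nullptr, 0);
            std::wstring wmode(m, L'\0');
            MultiByteToWideChar(CP_UTF8, 0, mode, -1, &wmode[0], m);

            return _wfopen(wpath.c_str(), wmode.c_str());
        #else
            return std::fopen(path, mode);
        #endif
        }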

  13. #42
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts
    This is mainly for RichSelian, perhaps others will also find it useful:

    Here is the source for an experiment in optimal parsing with 64k tables. enwik8 compresses to 24,617k, with the decoder slightly slower than before (no code changed, only table size and memory usage). I didn't measure timings because I didn't try any better match finder (just a hash chain, not a hash table as usual), so match finding is slow. The actual optimal-parse path finding on 16 MB blocks for the 100 MB takes 1.5 seconds.

    Code:
        const int OPTIMAL_NUM_SLOTS = 8;
    
    ...
    
        struct OptimalSlot
        {
            int bucket_idx[OPTIMAL_NUM_SLOTS];
            int bucket_len[OPTIMAL_NUM_SLOTS];
            int num_slots;
    
            int best_len;
        };
        // Add buckets as more matches are found, when num_slots == OPTIMAL_NUM_SLOTS, the last one is overwritten by longer matches.
    
    ...
    
                    int bits_for_sym[0x100];
                    {
                        Coder<0x100> coder;
                        coder.MemInit();
                        for (int i = 0; i < window_n; i++)
                        {
                            coder.UpdateFreq(window[i]);
                        }
                        coder.update_lengths();
    
                        for (int i = 0; i < 0x100; i++)
                        {
                            bits_for_sym[i] = coder.code_lengths[i];
                        }
                        coder.MemFree();
                    }
    
                    for (int pos = window_n - 1; pos >= 0; pos--)
                    {
                        OptimalSlot* slot = slots + pos;
    
                        int cur_cost = bits_for_sym[window[pos]];
                        if (pos < window_n - 1)
                        {
                            cur_cost += cost_to_end[pos + 1];
                        }
    
                        int match_len = 1;
    
                        if (slot->best_len > 0)
                        {
                            int best_len = 0;
                            for (int cur_len = slot->best_len; cur_len > 1; cur_len--)
                            {
                                int idx = -1;
                                for (int i = 0; i < slot->num_slots; i++)
                                {
                                    if (slot->bucket_len[i] >= cur_len)
                                    {
                                        idx = slot->bucket_idx[i];
                                        break;
                                    }
                                }
                                ASSERT(idx != -1);
    
                                int min_len = min_length_for_offset(idx);
                                if (min_len > cur_len) { continue; }
    
                                int match_cost = cost_to_end[pos + cur_len];
    
                                int lidx = base_idx(cur_len, lengthcode_base, lnum, lengthcode_bits);
                                match_cost += average_bits_for_length_sym;
                                match_cost += lengthcode_bits[lidx];
    
                                int didx = base_idx(idx, distcode_base, dnum, distcode_bits);
                                match_cost += average_bits_for_distance_sym;
                                match_cost += distcode_bits[didx];
    
                                if (match_cost < cur_cost)
                                {
                                    best_len = cur_len;
                                    cur_cost = match_cost;
                                }
                            }
    
                            match_len = best_len;
                        }
    
                        slot->best_len = match_len;
                        cost_to_end[pos] = cur_cost;
                    }
    
                    while (rpos < window_n)
                    {
                        OptimalSlot* slot = slots + rpos;
                        if (slot->best_len > 1)
                        {
                            int idx = -1;
                            for (int i = 0; i < slot->num_slots; i++)
                            {
                                if (slot->bucket_len[i] >= slot->best_len)
                                {
                                    idx = slot->bucket_idx[i];
                                    break;
                                }
                            }
    
                            ASSERT(idx > -1);
                            int min_len = min_length_for_offset(idx);
                            ASSERT(slot->best_len >= min_len);
    
                            codeblock[wpos++] = 256 + slot->best_len - min_len;
                            codeblock[wpos++] = idx;
    
                            rpos += slot->best_len;
                        }
                        else
                        {
                            codeblock[wpos++] = window[rpos];
    
                            ++rpos;
                        }
                    }
    
                    delete[] cost_to_end;
                    delete[] slots;
    
    ...
    Set up entropy
    ...
    
                for (int i = 0; i < wpos; i++)
                {
                    int sym = codeblock[i];
                    if (sym < 0x100)
                    {
                        coder.EncodeSym(io, sym);
                        num_syms++;
                    }
                    else
                    {
                        num_syms++;
                        num_dists++;
    
                        int match_len = sym - 256;
                        int match_dist = codeblock[++i];
    
                        int lidx = base_idx(match_len, lengthcode_base, lnum, lengthcode_bits);
                        int didx = base_idx(match_dist, distcode_base, dnum, distcode_bits);
                        num_extra_bits += lengthcode_bits[lidx];
                        num_extra_bits += distcode_bits[didx];
    
                        coder.EncodeSym(io, 256 + lidx);
                        if (lengthcode_bits[lidx] > 0) { io.Write(match_len - lengthcode_base[lidx], lengthcode_bits[lidx]); }
                        coder_dist.EncodeSym(io, didx);
                        if (distcode_base[didx] > 0) { io.Write(match_dist - distcode_base[didx], distcode_bits[didx]); }
                    }
                }
    I do entropy coding for lengths, distances and literals. Lengths and distances use a log-exp scheme (similar to the WinRAR decoder, but with 2k max length and 64k max distance): only the logarithm part (the bucket index) is entropy coded, and the remainder within the bucket is stored as raw bits. bits_for_sym is calculated by Huffman coding.
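    (As an illustration of that log-exp scheme: the bucket index goes to the entropy coder and the remainder within the bucket is written as raw bits. Names are made up, not RH4's base_idx/lengthcode tables, and real tables often split each power-of-two range into a few finer sub-buckets.)
    Code:
        // Illustrative log-exp binning sketch: values in [2^k, 2^(k+1)) share
        // bucket k (entropy coded) and need k raw extra bits.
        #include <cstdint>

        struct LogExpCode { int bucket; uint32_t extra; int extra_bits; };

        static LogExpCode log_exp_encode(uint32_t value)   // value >= 1
        {
            int log2 = 0;
            while ((value >> (log2 + 1)) != 0) log2++;
            LogExpCode c;
            c.bucket     = log2;                   // goes to the entropy coder
            c.extra_bits = log2;                   // raw bit count
            c.extra      = value - (1u << log2);   // raw remainder within the bucket
            return c;
        }

        static uint32_t log_exp_decode(int bucket, uint32_t extra)
        {
            return (1u << bucket) + extra;
        }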

    If there are any questions, please ask. This is not optimized at all, I'm still experimenting. Using greedy + lookahead with 64k tables compresses 2-5% worse than the normal 4k tables. Using a BALZ-style predictor is ~0% better, and 1% better if I context-code the match/literal flag with an LZMA-style context table (kStates) and don't context-code length or distance.

    Edit: I forgot to mention that, for optimal parsing to work, I do the ROLZ update and match for every symbol, not per phrase. 4k tables are worse than before because offsets scroll out faster, and the decoder is slightly slower because now every byte is a (fast) update.
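    (To make the per-symbol update concrete, a rough ROLZ-table sketch: an order-1 context selects a ring of recent positions, and the same update runs for every byte position on both the encoder and decoder side, which is why the decoder pays a small cost per byte. Sizes and context choice are illustrative, not RH4's.)
    Code:
        // Illustrative ROLZ table sketch (not RH4's code): per-context ring of
        // recent positions, updated at EVERY position rather than per phrase.
        #include <cstdint>
        #include <cstddef>

        const uint32_t ROLZ_SLOTS = 4096;            // offsets kept per context (the "4k table")

        struct RolzContext
        {
            uint32_t pos[ROLZ_SLOTS];
            uint32_t head = 0;

            void update(uint32_t p)                  // called for every byte position
            {
                pos[head & (ROLZ_SLOTS - 1)] = p;
                head++;
            }
        };

        struct RolzTable
        {
            RolzContext ctx[256];                    // one ring per previous-byte context

            void update(const uint8_t* window, uint32_t p)
            {
                if (p > 0) ctx[window[p - 1]].update(p);
            }
        };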
    Last edited by cade; 14th April 2014 at 02:20.

  14. The Following 2 Users Say Thank You to cade For This Useful Post:

    Bulat Ziganshin (14th April 2014),RichSelian (14th April 2014)

  15. #43
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    160
    Thanks
    18
    Thanked 56 Times in 27 Posts
    Quote Originally Posted by cade View Post
    This is mainly for RichSelian, perhaps others will also find it useful:

    Here is the source for an experiment in optimal parsing with 64k tables. enwik8 compresses to 24,617k, with the decoder slightly slower than before (no code changed, only table size and memory usage). I didn't measure timings because I didn't try any better match finder (just a hash chain, not a hash table as usual), so match finding is slow. The actual optimal-parse path finding on 16 MB blocks for the 100 MB takes 1.5 seconds.

    ...
    I've tried flexible parsing in my old compressor comprolz, but I found it too slow, and updating per symbol instead of per phrase also makes decompression slow.

    Also check out the new version of libzling: https://github.com/richox/libzling/tree/20140414
    This version makes use of len=2 sequences.
    Last edited by RichSelian; 14th April 2014 at 15:16.

  16. #44
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts
    Decompression is not so slow compared to arithmetic coding or something else... but compression needs too much memory (I was thinking of HC3 + BT4). Anyway, a fun academic exercise.

    Nice idea. I tried encoding two bytes if they occurred in the last 256 bytes (not context-predicted), but the difference in size was not really significant to me (<1%). I'm not sure how to improve any further without a significant hit in speed (such as 200% slower...). This is probably enough for me and for large files.

  17. #45
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    RH4 tested in WCC 2014.
    Unfortunately it failed the ISO decompression test! It does not create the output file...

  18. #46
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts
    Is there something strange about the file path? Invalid characters or too long? Try something simple such as C:/Folder/1.iso and see if that works.

  19. #47
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts
    Another update: many small improvements, encoding is a little faster, and hashes are now stored (use l to show a listing of an archive) and verified when unpacking (the old format still works).

    I also now re-arrange files based on a guess of their content: text vs binary. Text can be divided into XML/JSON/other markup based on the frequency of <>, (), [] and {}, to "learn" tags with ROLZ. For binary, MZ .exe and .dll files are recognized. If nothing is recognized, files are classified as generic text, generic binary, or generic incompressible. These classifications generate a sorting key (upper bit = text vs binary, lower bits = type) that I use to sort files before packing. This stage could be improved later.
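    (A rough sketch of what such a content-based sorting key could look like, classifying a sample such as the first 64 KB; the thresholds, categories and bit layout are illustrative, not RH4's exact classifier.)
    Code:
        // Illustrative content-based sort key (not RH4's code): classify a sample
        // of the file and pack the guess into a small integer so similar files
        // end up adjacent in the solid stream.
        #include <cstdint>
        #include <cstddef>

        static uint32_t content_sort_key(const uint8_t* sample, size_t n)
        {
            // Binary: PE images ("MZ") sort together, other binary after them.
            if (n >= 2 && sample[0] == 'M' && sample[1] == 'Z')
                return (1u << 8) | 0;                     // upper bit set: binary

            size_t printable = 0, brackets = 0;
            for (size_t i = 0; i < n; i++)
            {
                uint8_t c = sample[i];
                if (c == '\t' || c == '\n' || c == '\r' || (c >= 0x20 && c < 0x7F))
                    printable++;
                if (c == '<' || c == '>' || c == '(' || c == ')' ||
                    c == '[' || c == ']' || c == '{' || c == '}')
                    brackets++;
            }

            // Text: markup-heavy (XML/JSON/other) vs generic text.
            if (n > 0 && printable * 100 >= n * 95)
                return (0u << 8) | (brackets * 100 >= n * 2 ? 0 : 1);

            return (1u << 8) | 1;                         // generic binary
        }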
    Attached Files

  20. The Following User Says Thank You to cade For This Useful Post:

    RichSelian (29th April 2014)

  21. #48
    Member
    Join Date
    Dec 2013
    Location
    Italy
    Posts
    342
    Thanks
    12
    Thanked 34 Times in 28 Posts
    I've made a very quick (and very dirty) test on my typical file, a ~1 GB SQL dump.
    Here are the results.
    RH64 works very, very well with little RAM required.
    It does not scale (shrink the file size) much when increasing the compression mode, but the default is very good indeed.

    Code:
    RH64_x64
    1,101,869,297 -> 137,203,205 bytes (130.85 MB, 12.45%) - 8.961 seconds
    Exit code      : 0
    Elapsed time   : 9.61
    Kernel time    : 0.50 (5.2%)
    User time      : 9.05 (94.1%)
    page fault #   : 7872
    Working set    : 31496 KB
    Paged pool     : 16 KB
    Non-paged pool : 3 KB
    Page file size : 30464 KB
    => 137.203.205
    
    ZPAQ64 6.50 (-method 1)
    0 + (1101869297 -> 104 + 155149711 + 326641 = 155476456) = 155476456
    12.485 seconds (all OK)
    Exit code      : 0
    Elapsed time   : 12.50
    Kernel time    : 1.56 (12.5%)
    User time      : 41.09 (328.8%)
    page fault #   : 1299255
    Working set    : 468224 KB
    Paged pool     : 124 KB
    Non-paged pool : 11 KB
    Page file size : 478880 KB
    => 155.476.456
    
    ZPAQ64 6.50 (-method 2)
    0 + (1101869297 -> 104 + 134759564 + 322020 = 135081688) = 135081688
    43.953 seconds (all OK)
    Exit code      : 0
    Elapsed time   : 43.97
    Kernel time    : 3.63 (8.2%)
    User time      : 284.97 (648.0%)
    page fault #   : 1407269
    Working set    : 3477972 KB
    Paged pool     : 124 KB
    Non-paged pool : 13 KB
    Page file size : 3489248 KB
    
    
    
    RAR 5.10 (-m2)
    Creating archive prova2.rar
    Adding        r:\1.sql
    Done
    Exit code      : 0
    Elapsed time   : 12.14
    Kernel time    : 0.56 (4.6%)
    User time      : 70.13 (577.5%)
    page fault #   : 27050
    Working set    : 107732 KB
    Paged pool     : 153 KB
    Non-paged pool : 21 KB
    Page file size : 113232 KB
    => 136.705.297

  22. #49
    Member
    Join Date
    Dec 2013
    Location
    Italy
    Posts
    342
    Thanks
    12
    Thanked 34 Times in 28 Posts
    ...But not so good when working with a mix of "everything"...
    (it takes a lot of time to analyze the data)

    Very low RAM usage, indeed.

    I suggest you log the time needed to scan, analyze, dedup and so on,
    to think about the tradeoff (time spent "thinking" vs. "working").

    Code:
    Found 25539 files
    Scanning files...
    Scanned types
    Done scanning files
    Checking duplicates... 3904/25539
    Model: 29,187 KB
    Processing 3,245,433/3,245,433 KB
    3,739,488,989 -> 2,226,938,670 bytes (2.07 GB, 59.55%) - 125.810 second
    Exit code      : 0
    Elapsed time   : 294.59
    Kernel time    : 8.75 (3.0%)
    User time      : 122.95 (41.7%)
    page fault #   : 27649
    Working set    : 92568 KB
    Paged pool     : 16 KB
    Non-paged pool : 3 KB
    Page file size : 91840 KB
    
    
    ZPAQ64 650 method 1
    0 + (3739488989 -> 104 + 2072827922 + 2177834 = 2075005860) = 2075005860
    60.422 seconds (all OK)
    Exit code      : 0
    Elapsed time   : 60.44
    Kernel time    : 7.67 (12.7%)
    User time      : 140.88 (233.1%)
    page fault #   : 3163541
    Working set    : 669500 KB
    Paged pool     : 124 KB
    Non-paged pool : 13 KB
    Page file size : 680444 KB

  23. #50
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts
    Thanks for testing. Deduplicating is reading each file to get its CRC32 (also used to verify after decompression), then comparing against other files of the same size, and re-reading only if there is a collision. Scanning works on the first 64k only.

    It is not very fast in this stage if you have thousands of small files that cause slow I/O just to open and check. There is very little I can do to make this faster... if you test with files on a ram-disk, you will see how this behaviour completely changes.

    I could add a flag to skip hashing and analyzing. The format already allows writing with no hashes, but then there is no verification after decompression.

    Edit: Also, if you are including binary, I don't have exe filter. ZPAQ can reach better results with a larger window.
    Last edited by cade; 29th April 2014 at 14:10.

  24. #51
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    It looks like compression is slower and a bit worse on 10GB (system 4). http://mattmahoney.net/dc/10gb.html
    Compression is about the same on LTCB. Decompression is slower. http://mattmahoney.net/dc/text.html#2584

  25. #52
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    RH4 is being tested with the WCC2014 benchmark... no problems encountered!
    Nice job!

  26. #53
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts
    Thanks for testing.

    Quote Originally Posted by Matt Mahoney View Post
    It looks like compression is slower and a bit worse on 10GB (system 4). http://mattmahoney.net/dc/10gb.html
    Compression is about the same on LTCB. Decompression is slower. http://mattmahoney.net/dc/text.html#2584
    Hmm... strange, the decompressor is exactly the same as before; the bit I/O is improved a bit, so speed should have improved slightly.

    My timings, stock i7 2600 with 4 GB memory:
    March 22: enwik8 in 3.12 seconds, unpack in 0.638 seconds
    April 24: enwik8 in 2.91 seconds, unpack in 0.568 seconds.
    Both are c2, 64-bit version, same size (plus a few bytes for the new format).

    If v6 on 10GB is the previous and v7 is the newest, I guess re-ordering files should be smarter (or isn't a very good idea in my case).

    Some open questions:
    1) Is there a good strategy to re-order files?
    2) For binary files, is it better to do LZ77 so that for match distances > some amount (such as 16), the lower 4 bits can be encoded with separate entropy?

  27. #54
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    In zpaq I sort first by filename extension, then by decreasing size rounded to 16K, then by full path name. I found that sorting by exact size, or with less rounding, compressed slightly better but was slower, I think because the disk head had to jump between directories.

    Sorting by decreasing rather than increasing size probably won't make a difference for most compressors but for zpaq it matters because it makes it more likely that files with different extensions are packed into separate blocks.
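    (For illustration, a comparator implementing that order: extension first, then decreasing size rounded to 16K, then full path. A sketch of the described policy, not zpaq's actual code.)
    Code:
        // Illustrative sketch of the sort order described above (not zpaq's code).
        #include <algorithm>
        #include <cstdint>
        #include <string>
        #include <vector>

        struct File { std::string path; uint64_t size; };

        static std::string ext_of(const std::string& path)
        {
            size_t slash = path.find_last_of("/\\");
            size_t dot   = path.find_last_of('.');
            if (dot == std::string::npos || (slash != std::string::npos && dot < slash))
                return "";
            return path.substr(dot + 1);
        }

        static void sort_for_packing(std::vector<File>& files)
        {
            std::sort(files.begin(), files.end(), [](const File& a, const File& b) {
                std::string ea = ext_of(a.path), eb = ext_of(b.path);
                if (ea != eb) return ea < eb;                 // 1. extension
                uint64_t ra = a.size >> 14, rb = b.size >> 14; // round to 16 KB
                if (ra != rb) return ra > rb;                  // 2. decreasing size
                return a.path < b.path;                        // 3. full path
            });
        }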

    Edit: Oops, RH4 is actually faster on LTCB. On 10GB I always use real time but on enwik9 I use user+sys if it is faster, which it is in this case because it is single threaded, and I forgot to do this. Decompression speed is improved from 12 to 9 seconds and compression is slightly faster too. http://mattmahoney.net/dc/text.html#2584
    Last edited by Matt Mahoney; 30th April 2014 at 04:21.

  28. #55
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts
    Here is what I have right now, testing on a 34MB source tree with C++ source, make files, small exe utilities, scripts and XML:
    No re-ordering: 5.78 MB
    Order as I described above (generating sorting keys based on content): 5.72 MB
    Order by extension, then by file size (shifted right 14 bits, i.e. divided by 16k), and then by the full path: 5.80 MB

    Also, decompression memory is about 5 MB less than compression memory (a note for LTCB's mem column).

  29. #56
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    873
    Thanks
    460
    Thanked 175 Times in 85 Posts
    Yesterday's build crashes on the SqueezeChart Sources testset on my machine. Can anyone please confirm this?

  30. #57
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts
    Sorry, I have not tried it. Could you please try the different folders and find which one is causing the crash? I have tried >50GB disk images without any problems.

    I had a mistake in my sorting which made it unstable when I had multiple criteria. After fixing it:
    No re-ordering: 5.78 MB
    By extension, then size / 64k, then full path: 5.77 MB
    With a content-based key: 5.71 MB

    In this new version, the flags -r0/-r1/-r2 select the re-ordering method (r0 is none).

    I also made some small changes to unpacking, which is a bit faster now.
    Attached Files

  31. #58
    Member
    Join Date
    Dec 2013
    Location
    Italy
    Posts
    342
    Thanks
    12
    Thanked 34 Times in 28 Posts
    Quote Originally Posted by cade View Post
    Thanks for testing. Deduplicating is reading each file to get its CRC32 (also used to verify after decompression), then comparing against other files of the same size, and re-reading only if there is a collision. Scanning works on the first 64k only.

    It is not very fast in this stage if you have thousands of small files that cause slow I/O just to open and check. There is very little I can do to make this faster... if you test with files on a ram-disk, you will see how this behaviour completely changes.
    I already use a ramdisk, so latency is very low.

    Quote Originally Posted by cade View Post
    Edit: Also, if you are including binary, I don't have exe filter. ZPAQ can reach better results with a larger window.
    Yes, there are a lot of EXEs, DLLs and so on (it is the development tree of an ERP).

  32. #59
    Tester
    Stephan Busch's Avatar
    Join Date
    May 2008
    Location
    Bremen, Germany
    Posts
    873
    Thanks
    460
    Thanked 175 Times in 85 Posts
    The latest build of RH4 also crashes on the Sources testset.
    There doesn't seem to be a specific directory or file that causes this - it happens right before compression finishes. Here's the log:

    Found 11652 files
    Scanning, ordering and hashing files...
    Ordered files
    Checking duplicates... 621/11652
    Model: 29,187 KB
    Processing 192,829/192,829 KB
    - and then: RH4_x64 doesn't work anymore..

  33. #60
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    You can make deduplication faster by using a secure hash like SHA256 or BLAKE2 and not bothering to compare the files if the hashes match. SREP uses a faster hash, VMAC, which is only secure when used with a secret key chosen by a secure random number generator. That would be OK as long as you don't allow the archive to be updated.
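    (In other words, with a collision-resistant digest the equality test alone decides; a tiny sketch, with an illustrative HashedFile layout.)
    Code:
        // Illustrative sketch: with a secure 256-bit digest (SHA-256, BLAKE2),
        // equal hashes can be treated as "same content" without re-reading files.
        #include <array>
        #include <cstdint>

        using Digest = std::array<uint8_t, 32>;

        struct HashedFile { uint64_t size; Digest digest; };

        // With CRC32 this would only be a hint and a byte compare would follow;
        // with a secure hash the equality check alone is enough.
        bool is_duplicate(const HashedFile& a, const HashedFile& b)
        {
            return a.size == b.size && a.digest == b.digest;
        }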
