Thread: RH

  1. #1
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts

    RH

    ROLZ (o3 for RH, o1 for RH2) + Huffman, named after LZH.

    Here are two small programs on the same theme of what I was looking at for fast compression with minimal memory. RH fits in about 8 MB, compresses faster and decompresses slower. RH2 fits in about 64 MB, compresses slightly slower but better, and decompresses faster.

    The first compression level for RH2 is LZP. The second level is limited ROLZ search (o1), the third level is a full ROLZ search. Literals are stored in one Huffman-coded buffer as 0-511, where 256-511 is 256 + match length. Matches are followed by a 12-bit index (second coder) in the ROLZ table.
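    The token layout described above can be sketched like this. This is an illustrative reconstruction from the description, not code from RH/RH2, and names like `make_match` are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the described token layout: literals and match lengths share
// one 0-511 Huffman alphabet, and every match symbol is followed by a
// 12-bit ROLZ table index handled by a second coder.
const int MATCH_FLAG = 256;  // symbols >= 256 are matches
const int INDEX_BITS = 12;   // size of the ROLZ table index

struct Token {
    uint16_t sym;    // 0-255 = literal byte, 256-511 = 256 + match length
    uint16_t index;  // 12-bit ROLZ index; only meaningful when sym >= 256
};

Token make_literal(uint8_t b) { return Token{ b, 0 }; }

Token make_match(int length, int rolz_index) {
    assert(length >= 0 && length < 256);
    assert(rolz_index >= 0 && rolz_index < (1 << INDEX_BITS));
    return Token{ uint16_t(MATCH_FLAG + length), uint16_t(rolz_index) };
}

bool is_match(const Token& t) { return t.sym >= MATCH_FLAG; }
int match_length(const Token& t) { return t.sym - MATCH_FLAG; }
```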
    Attached Files

  2. #2
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    776
    Thanks
    63
    Thanked 270 Times in 190 Posts
    967,021,395 bytes, html text:

    187,144,650 bytes, 11.714 sec. - 6.231 sec., rh_x64
    161,949,923 bytes, 11.117 sec. - 3.863 sec., rh2_x64

    1,000,000,000 bytes, enwik9:

    315,110,004 bytes, 19.954 sec. - 9.601 sec., rh_x64
    279,847,097 bytes, 20.260 sec. - 6.153 sec., rh2_x64

    All RAM-disk
    Last edited by Sportman; 16th February 2014 at 23:31.

  3. #3
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts
    Quote Originally Posted by Sportman View Post
    187,144,650 bytes, 11.714 sec. - crash, rh_x64
    Whoops... I will investigate.

    Edit: Maybe this will fix it, otherwise I need a test file.
    Attached Files
    Last edited by cade; 16th February 2014 at 23:02.

  4. #4
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    776
    Thanks
    63
    Thanked 270 Times in 190 Posts
    Sorry, I just rechecked it: I got that crash halfway through decompression by using rh2 instead of rh. My mistake.

  5. #5
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts
    Oh... well, I'm sure that I fixed something.

    (I fixed building Huffman trees with only 1 symbol).
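    For reference, the degenerate case being fixed looks like this. A sketch of the usual workaround, which I'm assuming here (the actual RH fix may differ): a Huffman tree built from a single nonzero-frequency symbol would assign it a 0-bit code, which a bit-oriented decoder cannot consume, so the symbol is clamped to a 1-bit code:

```cpp
#include <vector>

// Sketch of the single-symbol edge case: with only one used symbol, a
// naive Huffman builder produces a 0-bit code. A common fix (assumed
// here, not necessarily the actual RH fix) is to force a 1-bit length.
std::vector<int> huffman_lengths(const std::vector<int>& freq) {
    int used = 0, only = -1;
    for (int s = 0; s < (int)freq.size(); ++s)
        if (freq[s] > 0) { ++used; only = s; }

    std::vector<int> len(freq.size(), 0);
    if (used == 0) return len;  // empty block: nothing to code
    if (used == 1) {
        len[only] = 1;          // degenerate tree: force 1 bit, not 0
        return len;
    }
    // ... regular tree construction for used >= 2 would go here ...
    return len;
}
```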

  6. #6
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    160
    Thanks
    18
    Thanked 56 Times in 27 Posts
    Can you post a 32-bit Windows or 64-bit Linux version?

  7. #7
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts
    Sure, here are 32-bit x86 versions. All operations (compression, decompression for RH and RH2) are slower in these versions (compared to x64) for some reason.

    Note that files with 64-bit versions are not compatible with 32-bit versions (different packing for file and block sizes).
    Attached Files

  8. The Following User Says Thank You to cade For This Useful Post:

    RichSelian (17th February 2014)

  9. #8
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    160
    Thanks
    18
    Thanked 56 Times in 27 Posts
    Quote Originally Posted by cade View Post
    Sure, here are 32-bit x86 versions. All operations (compression, decompression for RH and RH2) are slower in these versions (compared to x64) for some reason.

    Note that files with 64-bit versions are not compatible with 32-bit versions (different packing for file and block sizes).
    RH2 compared to libzling:

    Code:
    zhangli10@B000000025930DL /z $ ./RH2_x86.exe c2 testdata/enwik8 1
    RH2 - Larger model
    32-bit version
    Written by Nauful
    Feb 16, 2014
    Mem: 63520 KB
    Working: 97656/97656 KB                                                        
    100000000 -> 31957388 bytes - 5.907 seconds
    
    zhangli10@B000000025930DL /z $ richox-compression/libzling/build/zling_demo.exe e testdata/enwik8 2
    zling:
       light-weight lossless data compression utility
       by Zhang Li <zhangli10 at baidu.com>
    
     16.78 MB =>   5.43 MB 32.35%, 0.639 sec, speed=26.255 MB/sec
     33.55 MB =>  10.79 MB 32.15%, 1.263 sec, speed=26.567 MB/sec
     50.33 MB =>  16.19 MB 32.16%, 1.856 sec, speed=27.118 MB/sec
     67.11 MB =>  21.60 MB 32.19%, 2.481 sec, speed=27.049 MB/sec
     83.89 MB =>  27.01 MB 32.20%, 3.088 sec, speed=27.165 MB/sec
    100.00 MB =>  32.19 MB 32.19%, 3.665 sec, speed=27.285 MB/sec
    encode: 100000000 => 32189884, time=3.665 sec, speed=27.285 MB/sec
    RH2 compresses a little better than libzling, but slower.
    Maybe you can get some ideas from the open-source libzling (https://github.com/richox/libzling); it uses the same technology, ROLZ-o1 + Huffman. And I hope RH2 becomes another open-source program.

  10. #9
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    160
    Thanks
    18
    Thanked 56 Times in 27 Posts
    And RH (ROLZ-o3 + Huffman) is not a good idea; I've tried the same idea with the old zlite (https://github.com/richox/zlite). ROLZ-o3 needs too-long literals as contexts, and those contexts are poorly compressed with order-0 Huffman.

  11. #10
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    160
    Thanks
    18
    Thanked 56 Times in 27 Posts
    Quote Originally Posted by RichSelian View Post
    And RH (ROLZ-o3 + Huffman) is not a good idea; I've tried the same idea with the old zlite (https://github.com/richox/zlite). ROLZ-o3 needs too-long literals as contexts, and those contexts are poorly compressed with order-0 Huffman.
    In addition, zlite has a simplified order-1 MTF transform. That's (maybe) why zlite is faster and gets a better compression ratio than RH_x86.exe.

  12. #11
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts
    Thanks for the tips. ROLZ with o3 is not a good idea in general, but I wanted some method that fit within 8 MB of memory used. Compression speed for RH2 depends on matches tested, maybe I should split it into 1/2/3/4 and make 2 faster. Obviously, -c1 will be faster with fewer match tests.

    I'm not sure what else makes RH2 slow, probably some very fine details (e.g. cache). I will take a look at zling. If I can clean this up and make something decent, I will probably post the source later.

  13. #12
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    I got the following errors when trying to run in 64 bit Ubuntu using Wine.
    Code:
    matt@matt-Latitude-E6510:~/Downloads$ ./RH_x64.exe 
    wine: Call from 0x7fe819542c67 to unimplemented function KERNEL32.dll.LCMapStringEx, aborting
    wine: Unimplemented function KERNEL32.dll.LCMapStringEx called at address 0x7fe819542c67 (thread 0024), starting debugger...
    fixme:dbghelp_dwarf:dwarf2_parse_subprogram Unhandled Tag type 0x26 at ctx(0x23a960,L"libc.so.6"), for debug_info(abbrev:0x430880,symt:0x46831c)
    fixme:dbghelp_dwarf:compute_location Only supporting one breg (rbx/329 -> r8/336)
    fixme:dbghelp_dwarf:dwarf2_parse_subprogram Unhandled Tag type 0x26 at ctx(0x23a960,L"libc.so.6"), for debug_info(abbrev:0x19825b0,symt:0x1a01514)
    fixme:dbghelp_dwarf:dwarf2_parse_subprogram_block Unhandled Tag type 0xf at ctx(0x23a960,L"libc.so.6"), for debug_info(abbrev:0x19c27a8,symt:(nil))
    fixme:dbghelp_dwarf:dwarf2_parse_subprogram_block Unhandled Tag type 0xf at ctx(0x23a960,L"libc.so.6"), for debug_info(abbrev:0x19c27a8,symt:(nil))
    
    matt@matt-Latitude-E6510:~/Downloads$ ./RH2_x86.exe 
    wine: Call from 0x7bc4d63e to unimplemented function KERNEL32.dll.LCMapStringEx, aborting
    wine: Unimplemented function KERNEL32.dll.LCMapStringEx called at address 0x7bc4d63e (thread 0009), starting debugger...
    
    matt@matt-Latitude-E6510:~/Downloads$ ./RH_x86.exe 
    wine: Call from 0x7bc4d63e to unimplemented function KERNEL32.dll.LCMapStringEx, aborting
    wine: Unimplemented function KERNEL32.dll.LCMapStringEx called at address 0x7bc4d63e (thread 0009), starting debugger...
    Unhandled exception: unimplemented function KERNEL32.dll.LCMapStringEx called in 32-bit code (0x7bc4d63e).
    Register dump:
     CS:0023 SS:002b DS:002b ES:002b FS:0063 GS:006b
     EIP:7bc4d63e ESP:0032f4e4 EBP:0032f548 EFLAGS:00000216(   - --  I   -A-P- )
     EAX:0032f4f0 EBX:7bcc7000 ECX:000001e2 EDX:000000f0
     ESI:00000100 EDI:00000000
    Stack dump:
    0x0032f4e4:  f7691fb8 f77b96a0 00000001 80000100
    0x0032f4f4:  00000001 00000000 7bc4d63e 00000002
    0x0032f504:  00429a0e 004299c6 00000000 0032f550
    0x0032f514:  0032fc70 7b8af000 00000100 0032f580
    0x0032f524:  7b850ae7 f77b96a0 00000001 0032fc70
    0x0032f534:  00000100 0032f5a8 00000100 00000000
    000c: sel=0067 base=00000000 limit=00000000 32-bit --x
    Backtrace:
    =>0 0x7bc4d63e call_dll_entry_point+0x66() in ntdll (0x0032f548)
      1 0x0033005a (0x0032f580)
      2 0x00410812 in rh_x86 (+0x10811) (0x0032f7c4)
      3 0x00410962 in rh_x86 (+0x10961) (0x0032f800)
      4 0x0040cae1 in rh_x86 (+0xcae0) (0x0032fd74)
      5 0x0040d01a in rh_x86 (+0xd019) (0x0032fdac)
      6 0x0040ccf0 in rh_x86 (+0xccef) (0x0032fdec)
      7 0x0040c90a in rh_x86 (+0xc909) (0x0032fe04)
      8 0x0040350a in rh_x86 (+0x3509) (0x0032fe40)
      9 0x7b861d48 call_process_entry+0xb() in kernel32 (0x0032fe58)
      10 0x7b861e8d call_process_entry+0x150() in kernel32 (0x0032fea8)
      11 0x7bc7edfc call_thread_func_wrapper+0xb() in ntdll (0x0032feb8)
      12 0x7bc7ee4b call_thread_func+0x44() in ntdll (0x0032ff98)
      13 0x7bc7edda in ntdll (+0x6edd9) (0x0032ffb8)
      14 0x7bc545bf in ntdll (+0x445be) (0x0032ffe8)
      15 0xf7692b5d wine_call_on_stack+0x1c() in libwine.so.1 (0x00000000)
      16 0xf7692b3b wine_switch_to_stack+0x2a() in libwine.so.1 (0xffb11ca8)
      17 0x7bc54901 LdrInitializeThunk+0x341() in ntdll (0xffb11d28)
      18 0x7b862766 __wine_kernel_init+0x719() in kernel32 (0xffb12bb8)
      19 0x7bc5507b __wine_process_init+0x156() in ntdll (0xffb12c08)
      20 0xf76913ce wine_init+0x140() in libwine.so.1 (0xffb12c48)
      21 0x7bf011c6 main+0x13d() in <wine-loader> (0xffb13088)
      22 0xf74b9905 __libc_start_main+0xf4() in libc.so.6 (0x00000000)
    0x7bc4d63e call_dll_entry_point+0x66 in ntdll: subl	$4,%esp
    Modules:
    Module	Address			Debug info	Name (19 modules)
    PE	  400000-  431000	Export          rh_x86
    ELF	7b800000-7ba43000	Dwarf           kernel32<elf>
      \-PE	7b810000-7ba43000	\               kernel32
    ELF	7bc00000-7bce3000	Dwarf           ntdll<elf>
      \-PE	7bc10000-7bce3000	\               ntdll
    ELF	7bf00000-7bf04000	Dwarf           <wine-loader>
    ELF	7ebba000-7ebdc000	Deferred        libtinfo.so.5
    ELF	7ebdc000-7ec01000	Deferred        libncurses.so.5
    ELF	7ef71000-7ef7e000	Deferred        libnss_files.so.2
    ELF	7ef7e000-7ef8a000	Deferred        libnss_nis.so.2
    ELF	7ef8a000-7efa3000	Deferred        libnsl.so.1
    ELF	7efa3000-7efe6000	Deferred        libm.so.6
    ELF	f749b000-f74a0000	Deferred        libdl.so.2
    ELF	f74a0000-f7654000	Dwarf           libc.so.6
    ELF	f7655000-f7670000	Deferred        libpthread.so.0
    ELF	f7681000-f768a000	Deferred        libnss_compat.so.2
    ELF	f768a000-f77ce000	Dwarf           libwine.so.1
    ELF	f77d0000-f77f2000	Deferred        ld-linux.so.2
    ELF	f77f2000-f77f3000	Deferred        [vdso].so
    Threads:
    process  tid      prio (all id:s are in hex)
    00000008 (D) Z:\home\matt\Downloads\RH_x86.exe
    	00000009    0 <==
    0000000e services.exe
    	0000001f    0
    	0000001e    0
    	00000018    0
    	00000017    0
    	00000015    0
    	00000010    0
    	0000000f    0
    00000012 winedevice.exe
    	0000001c    0
    	00000019    0
    	00000014    0
    	00000013    0
    0000001a plugplay.exe
    	00000020    0
    	0000001d    0
    	0000001b    0
    00000021 explorer.exe
    	00000022    0
    matt@matt-Latitude-E6510:~/Downloads$

  14. #13
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
    I'd say you have a slightly outdated version of Wine if you're using vanilla Ubuntu 12.04. If you add the Ubuntu Wine PPA repository, you'll have a newer version available to install. E.g. you can use something like this (I haven't tested it):
    Code:
    sudo add-apt-repository ppa:ubuntu-wine/ppa
    sudo apt-get update
    sudo apt-get install wine1.6

  15. The Following User Says Thank You to Piotr Tarsa For This Useful Post:

    Matt Mahoney (17th February 2014)

  16. #14
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts
    Works OK in 32 bit Windows. http://mattmahoney.net/dc/text.html#2796

    Edit: Also, upgrading from Wine 1.4 to 1.6 fixed the problem (thanks). All 4 programs now run. The 64 bit version produces compressed files that are 4 bytes larger.
    Last edited by Matt Mahoney; 17th February 2014 at 19:22.

  17. #15
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts
    Here are two test versions that have a simple hierarchical profiler to show where time is spent (packing/unpacking buffers versus coding).

    Output is similar to this:
    Code:
    RH2 - Larger model
    64-bit version, supports files >4 GB
    Written by Nauful
    Feb 16, 2014
    Mem: 63520 KB
    Working: 97656/97656 KB
    100000000 -> 31957392 bytes - 3.177 seconds
    compress                         3.177      1
      pack_buffer                      2.775      6
      encode_buffer                    0.261      6
        Coder::update_codes              0.000      6
        Coder::update_codes              0.003      6
    Here is the code for the profiler if anyone would like one:
    Code:
    // Includes and typedef needed to compile the profiler stand-alone:
    #include <cstdio>
    #include <ctime>
    
    typedef unsigned short uint16;
    
    struct Profile;
    
    class ProfilerManager
    {
    public:
        const static int MAX_PROFILES = 8192;
        const static int MAX_STACK = 128;
    
    private:
        Profile* profiles[MAX_PROFILES];
        int num_profiles;
    
        Profile* roots[MAX_PROFILES];
        int num_roots;
    
        Profile* stack[MAX_STACK];
        int stack_p;
    
    public:
        ProfilerManager()
        {
            num_profiles = 0;
            num_roots = 0;
            stack_p = 0;
        }
    
        void Register(Profile* p);
        void Enter(Profile* p);
        void Leave();
    
        void Print();
        void Print(Profile* p, int depth);
    };
    
    static ProfilerManager _profiler_manager;
    
    struct Profile
    {
        const static int MAX_CHILDREN = 64;
        Profile* child[MAX_CHILDREN];
        int num_children;
    
        const char* name;
        clock_t dt;
        uint16 hits;
    
        Profile(const char* name) : num_children(0), name(name), dt(0), hits(0)
        {
            _profiler_manager.Register(this);
        }
    
        void AddChild(Profile* p)
        {
            for (int i = 0; i < num_children; i++)
            {
                if (child[i] == p) { return; }
            }
    
            child[num_children++] = p;
        }
    };
    
    void ProfilerManager::Register(Profile* p)
    {
        profiles[num_profiles++] = p;
    }
    
    void ProfilerManager::Enter(Profile* p)
    {
        if (stack_p > 0)
        {
            stack[stack_p - 1]->AddChild(p);
            stack[stack_p++] = p;
        }
        else
        {
            stack[stack_p++] = p;
            for (int i = 0; i < num_roots; i++)
            {
                if (roots[i] == p) { return; }
            }
            roots[num_roots++] = p;
        }
    }
    
    void ProfilerManager::Leave()
    {
        --stack_p;
    }
    
    void ProfilerManager::Print(Profile* p, int depth)
    {
        for (int i = 0; i < depth; i++) { printf("  "); }
        printf("%-30s %7.3f %6d\n", p->name, p->dt / (float)CLOCKS_PER_SEC, p->hits);
    
        for (int i = 0; i < p->num_children; i++)
        {
            Print(p->child[i], depth + 1);
        }
    }
    
    void ProfilerManager::Print()
    {
        for (int i = 0; i < num_roots; i++)
        {
            Print(roots[i], 0);
        }
    
        if (!num_profiles) { printf("No profiles\n"); }
    }
    
    struct ProfileInstance
    {
        clock_t start;
        Profile* p;
    
        ProfileInstance(Profile* p)
            : start(clock()), p(p)
        {
            _profiler_manager.Enter(p);
        }
    
        ~ProfileInstance()
        {
            p->dt += clock() - start;
            p->hits += p->hits < 0xFFFF;
    
            _profiler_manager.Leave();
        }
    };
    
    #ifdef USE_PROFILER
    #    define PROFILE_REGION(x) static Profile _profile(x); ProfileInstance _profile_instance(&_profile);
    #else
    #    define PROFILE_REGION(x)
    #endif
    Profile functions or blocks like so:
    Code:
    {
    PROFILE_REGION("pack_buffer");
    ...
    }
    And at the end of the program,
    Code:
    _profiler_manager.Print();
    Edit: update_codes appears twice in the output because I have one class templated twice: once for literals and once for match indices (Coder<512> and Coder<4096>).
    Attached Files
    Last edited by cade; 18th February 2014 at 02:34.

  18. The Following 2 Users Say Thank You to cade For This Useful Post:

    Bulat Ziganshin (19th February 2014),RichSelian (18th February 2014)

  19. #16
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    160
    Thanks
    18
    Thanked 56 Times in 27 Posts
    libzling uses a 16 MB block size, much smaller than RH2's 64 MB, but the compression ratio is almost the same.
    Maybe 64 MB is too big? (Especially for limited searching.)

  20. #17
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts
    I didn't optimize RH2 for memory:

    16 MB block
    16M uint16 entries for the packed block (32 MB of memory)
    ~16 MB for a few other structures (hashes, two Huffman coders, IO buffer, etc.), mainly the hashes.

    I understand that your method has smaller block sizes for packing and coding. Something to think about for RH3.

  21. #18
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    160
    Thanks
    18
    Thanked 56 Times in 27 Posts
    Quote Originally Posted by cade View Post
    I didn't optimize RH2 for memory:

    16 MB block
    16M uint16 entries for the packed block (32 MB of memory)
    ~16 MB for a few other structures (hashes, two Huffman coders, IO buffer, etc.), mainly the hashes.

    I understand that your method has smaller block sizes for packing and coding. Something to think about for RH3.
    Using more memory can slow down your algorithm.
    libzling uses a 16 MB block and 10 MB for the dictionary; other costs like the Huffman tables or IO buffers can be ignored. Moreover, decompression doesn't need hashing, which saves a lot of memory while making decompression faster.

  22. The Following User Says Thank You to RichSelian For This Useful Post:

    cade (19th February 2014)

  23. #19
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts
    The amount of memory isn't nearly as significant as the access pattern: I tried 32 MB blocks, and ~93% of the time I didn't pass 16 MB (roughly the same speed). Thanks for reminding me, though; I can use a more tightly packed (and smaller) structure for the decoder. The decoder is faster now, and encoding is still the same speed.

    I have some other ideas I want to try for Huffman before I continue with this. Even though its entropy coding is imprecise, it's fast enough to use.
    Attached Files

  24. #20
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts
    Here's a test to pack a buffer with optimal parsing, and unpack it later. Hopefully someone will find this useful. Note that this is for reference and not optimized for speed.

    enwik8 packs and codes to 29,511 KB (30,219,262 bytes) in ~13 seconds:
    Code:
    pack_buffer                     13.234      6
      matches                         10.988      6
      optimal parsing                  1.509      6
      packing                          0.221      6
    Matching is slower than greedy or lazy parsing because I have to evaluate all matches at every position. I derive LITERAL_COST from the entropy of the current block and MATCH_COST from the entropy of the match indices for the previous block. These are estimates for optimal parsing and are left as constants here just for reference.

    Code:
    // Includes, typedefs, and macros needed so this reference code
    // compiles stand-alone:
    #include <cassert>
    #include <cstdio>
    #include <cstring>
    #include <algorithm>
    
    typedef unsigned char byte;
    typedef short int16;
    typedef unsigned short uint16;
    typedef int int32;
    typedef unsigned int uint32;
    
    #define ASSERT assert
    using std::min;
    
    const int LITERAL_COST = 6;
    const int MATCH_COST = 10;
    
    const static int BLOCK_BITS = 24;
    const static int TABLE_BITS = 12;
    const static int HASH_BITS = 14;
    
    const static int MATCH_MIN = 4;
    const static int MATCH_MAX = 254; // len encoded in a byte
    const static int MAX_TESTS = 8;
    
    const static int BLOCK_SIZE = 1 << BLOCK_BITS;
    const static int BLOCK_MASK = BLOCK_SIZE - 1;
    
    const static int TABLE_SIZE = 1 << TABLE_BITS;
    const static int TABLE_MASK = TABLE_SIZE - 1;
    
    const static int HASH_SIZE = 1 << HASH_BITS;
    const static int HASH_MASK = HASH_SIZE - 1;
    
    const static int INVALID_U32 = -1; // empty marker for hash chains
    
    struct Slot
    {
        const static int NUM_ENTRIES = 4;
    
        byte max_len;
    
        byte num_entries;
        byte lengths[NUM_ENTRIES];
        uint16 indices[NUM_ENTRIES];
    };
    
    struct Table
    {
        int32 offsets[TABLE_SIZE];
        int16 head;
    };
    
    struct EncoderTable
    {
        int16 hash[TABLE_SIZE];
        int32 next[TABLE_SIZE];
        int32 root[HASH_SIZE];
    };
    
    uint32 hash_calc(byte* in)
    {
        const int SHIFT = 32 - HASH_BITS;
    
        return (2654435761u * *(uint32*)in) >> SHIFT;
    }
    
    void model_update(Table* table, EncoderTable* enc_table, byte* in, int pos)
    {
        uint32 hash = hash_calc(in + pos) & HASH_MASK;
        enc_table->hash[table->head] = hash;
    
        enc_table->next[table->head] = enc_table->root[hash];
        enc_table->root[hash] = table->head;
    
        table->offsets[table->head] = pos;
        table->head = (table->head + 1) & TABLE_MASK;
    }
    
    void model_update(Table* table, int pos)
    {
        table->offsets[table->head] = pos;
        table->head = (table->head + 1) & TABLE_MASK;
    }
    
    int pack_buffer(byte* in, uint16* out, int in_size)
    {
        int wpos = 0;
        int rpos = 0;
    
        Slot* slots = new Slot[in_size];
        Table* tables = new Table[0x100];
        EncoderTable* enc_tables = new EncoderTable[0x100];
        int* cost_to_end = new int[in_size];
    
        memset(cost_to_end, 0, sizeof(int) * in_size);
        memset(slots, 0, sizeof(Slot) * in_size);
    
        for (int i = 0; i < 0x100; i++)
        {
            tables[i].head = 0;
            memset(tables[i].offsets, 0, sizeof(tables[i].offsets));
    
            memset(enc_tables[i].hash, -1, sizeof(enc_tables[i].hash));
            memset(enc_tables[i].root, -1, sizeof(enc_tables[i].root));
            memset(enc_tables[i].next, -1, sizeof(enc_tables[i].next));
        }
    
        printf("Tables: %d MB\n", (int)((sizeof(Table) * 0x100) >> 20));
        printf("Encoder tables: %d MB\n", (int)((sizeof(EncoderTable) * 0x100) >> 20));
        printf("Slots: %d MB\n", (int)((sizeof(Slot) * in_size) >> 20));
        printf("Costs: %d MB\n", (int)((sizeof(int) * in_size) >> 20));
    
        for (int pos = 1; pos < in_size - MATCH_MAX; pos++)
        {
            byte ctx = in[pos - 1];
    
            Table* t = tables + ctx;
            EncoderTable* enc_t = enc_tables + ctx;
            Slot* slot = slots + pos;
    
            uint32 hash = hash_calc(in + pos) & HASH_MASK;
    
            int node = enc_t->root[hash];
            int match_len = MATCH_MIN - 1;
    
            if (node != INVALID_U32)
            {
                uint32 match_end = min(in_size, pos + MATCH_MAX);
                for (int tests = 0; tests < MAX_TESTS; tests++)
                {
                    uint32 offs = t->offsets[node];
                    if (!offs) { break; }
    
                    uint32 t0 = offs;
                    uint32 len = pos;
    
                    if (*(uint32*)(in + t0) == *(uint32*)(in + len))
                    {
                        t0 += 4;
                        len += 4;
    
                        if (*(uint32*)(in + t0) == *(uint32*)(in + len))
                        {
                            t0 += 4;
                            len += 4;
                        }
    
                        while (len < match_end && in[t0] == in[len])
                        {
                            ++t0;
                            ++len;
                        }
                    }
    
                    len -= pos;
                    if (len > match_len)
                    {
                        ASSERT(t0 > 0);
    
                        int idx = (t->head - node) & TABLE_MASK;
    
                        if (slot->num_entries < Slot::NUM_ENTRIES)
                        {
                            slot->lengths[slot->num_entries] = len;
                            slot->indices[slot->num_entries] = idx;
                            ++slot->num_entries;
                        }
                        else
                        {
                            slot->lengths[slot->num_entries - 1] = len;
                            slot->indices[slot->num_entries - 1] = idx;
                        }
    
                        match_len = len;
                        slot->max_len = len;
                        if (len == MATCH_MAX) { break; }
                    }
    
                    node = enc_t->next[node];
                    if (node == INVALID_U32 || offs <= t->offsets[node])
                    {
                        break;
                    }
                }
            }
    
            model_update(t, enc_t, in, pos);
        }
    
        // Lazy parsing
        //for (int pos = 0; pos < in_size - MATCH_MAX; pos++)
        //{
        //    Slot* slot = slots + pos;
        //    if (slot->max_len == 0) { continue; }
    
        //    for (int j = 1; j <= slot->max_len; j++)
        //    {
        //        Slot* slot2 = slots + pos + j;
        //        if (slot2->max_len - j > slot->max_len)
        //        {
        //            slot->max_len = j;
        //            break;
        //        }
        //    }
        //}
    
        // Optimal parsing
        for (int pos = in_size - 1; pos >= 0; pos--)
        {
            Slot* slot = slots + pos;
    
            if (slot->max_len > 0)
            {
                byte best_len = 0;
                int best_cost = in_size * 8;
                for (byte cur_len = slot->max_len; cur_len >= 1; cur_len--)
                {
                    int cur_cost = cost_to_end[pos + cur_len];
                    cur_cost += (cur_len < MATCH_MIN) ? (LITERAL_COST * cur_len) : MATCH_COST;
    
                    if (cur_cost < best_cost)
                    {
                        best_len = cur_len;
                        best_cost = cur_cost;
                    }
                }
    
                slot->max_len = best_len;
            }
    
            if (pos == in_size - 1)
            {
                cost_to_end[pos] = LITERAL_COST;
                slot->max_len = 0;
            }
            else if (slot->max_len < MATCH_MIN)
            {
                cost_to_end[pos] = cost_to_end[pos + 1] + LITERAL_COST;
                slot->max_len = 0;
            }
            else
            {
                cost_to_end[pos] = cost_to_end[pos + slot->max_len] + MATCH_COST + LITERAL_COST;
            }
        }
    
        printf("Optimized cost: %d KB (%d bits)\n", cost_to_end[0] / (8 * 1024), cost_to_end[0]);
    
        int matched_bytes = 0;
        while (rpos < in_size)
        {
            Slot* slot = slots + rpos;
    
            if (slot->max_len >= MATCH_MIN)
            {
                int idx = -1;
                for (int i = 0; i < slot->num_entries; i++)
                {
                    if (slot->lengths[i] >= slot->max_len)
                    {
                        idx = slot->indices[i];
                        break;
                    }
                }
    
                ASSERT(idx > -1);
                ASSERT(idx < TABLE_SIZE);
    
                out[wpos++] = slot->max_len - MATCH_MIN + 256;
                out[wpos++] = idx;
    
                rpos += slot->max_len;
                matched_bytes += slot->max_len;
            }
            else
            {
                ASSERT(slot->max_len == 0);
                out[wpos++] = in[rpos];
    
                ++rpos;
            }
        }
    
        delete[] cost_to_end;
        delete[] slots;
        delete[] tables;
        delete[] enc_tables;
    
        return wpos;
    }
    Code:
    void unpack_buffer(uint16* in, byte* out, int in_size, int out_size)
    {
        int wpos = 0;
        int rpos = 0;
    
        Table* tables = new Table[0x100];
    
        for (int i = 0; i < 0x100; i++)
        {
            tables[i].head = 0;
            memset(tables[i].offsets, 0, sizeof(tables[i].offsets));
        }
    
        while (rpos < in_size)
        {
            if (in[rpos] < 256)
            {
                out[wpos] = in[rpos];
    
                if (wpos > 0)
                {
                    model_update(tables + out[wpos - 1], wpos);
                }
                wpos++;
                rpos++;
            }
            else
            {
                byte ctx = out[wpos - 1];
                int length = in[rpos++] - 256 + MATCH_MIN;
                int idx = in[rpos++];
    
                ASSERT(length + 256 - MATCH_MIN < 512);
                ASSERT(idx < TABLE_SIZE);
    
                int recon_idx = (tables[ctx].head - idx) & TABLE_MASK;
                uint32 src_pos = tables[ctx].offsets[recon_idx];
    
                ASSERT(src_pos < wpos && src_pos > 0);
    
                while (length > 0)
                {
                    model_update(tables + out[wpos - 1], wpos);
                    out[wpos++] = out[src_pos++];
                    length--;
                }
            }
        }
    
        ASSERT(wpos == out_size);
    
        delete[] tables;
    }
    Coding is just this:
    Code:
            for (int i = 0; i < packed_block_size; i++)
            {
                sym_coder.EncodeSym(io, coded_block[i]);
                if (coded_block[i] & 256)
                {
                    i++;
                    match_coder.EncodeSym(io, coded_block[i]);
                }
            }
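    The entropy-derived cost estimates mentioned above (LITERAL_COST from the current block's literal entropy, MATCH_COST from the previous block's index entropy) could be computed like this. This is a sketch under my own assumptions, with hypothetical names, not the RH2 code:

```cpp
#include <cmath>
#include <vector>

// Order-0 entropy of a symbol histogram, in average bits per symbol.
// Rounding this up gives an integer per-symbol cost estimate for the
// optimal parser, in place of the fixed LITERAL_COST / MATCH_COST.
double avg_bits_per_symbol(const std::vector<int>& freq) {
    long long total = 0;
    for (int f : freq) total += f;
    if (total == 0) return 0.0;

    double bits = 0.0;
    for (int f : freq) {
        if (f == 0) continue;
        double p = double(f) / double(total);
        bits += double(f) * -std::log2(p);  // -log2(p) bits per occurrence
    }
    return bits / double(total);            // average bits per symbol
}

int estimate_literal_cost(const std::vector<int>& literal_freq) {
    // Round up so the parser slightly over-prices literals rather than
    // under-pricing them.
    return int(std::ceil(avg_bits_per_symbol(literal_freq)));
}
```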

  25. #21
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts
    Last version for now. There are now five different levels. I don't use optimal parsing at any level because it uses too much memory, but the source for it is above. I also added a simple detector for incompressible sections, which are skipped, because I have solid files with binary + text + already-compressed data.

    Options are:
    c1 - LZP, skip incompressible sections.
    c2 - Fast search, skip incompressible sections (Default).
    c3/c4 - Normal search + lazy, skip incompressible sections.
    c5 - Maximum + forward optimal parsing (slow, stores nothing), no skipping. This isn't very practical speed-wise. Forward optimal parsing gets maybe a quarter of the way from lazy to optimal, but it was an interesting experiment for small files.

    Some text: 505 KB at c4, 502 at c5, 489 with optimal, 485 with optimal + arithmetic. Although, with a simple and silly 1 MB CM, it's ~417 KB.
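    The incompressibility detector isn't described in detail, so here is a hypothetical sketch of one way such a detector can work (order-0 entropy sampling; the actual RH2 heuristic may differ, e.g. it could use the match rate instead):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Sample the order-0 entropy of a section and treat it as incompressible
// when it is close to 8 bits/byte, as already-compressed data would be.
// The threshold is an illustrative guess, not a tuned value.
bool looks_incompressible(const uint8_t* data, size_t n,
                          double threshold_bits = 7.9) {
    if (n == 0) return false;

    int freq[256] = { 0 };
    for (size_t i = 0; i < n; ++i) freq[data[i]]++;

    double bits = 0.0;
    for (int f : freq) {
        if (!f) continue;
        double p = double(f) / double(n);
        bits -= p * std::log2(p);
    }
    return bits >= threshold_bits;  // near 8 bits/byte: store raw instead
}
```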
    Attached Files

  26. #22
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 778 Times in 485 Posts

  27. The Following User Says Thank You to Matt Mahoney For This Useful Post:

    cade (28th February 2014)

  28. #23
    Member
    Join Date
    Nov 2013
    Location
    US
    Posts
    131
    Thanks
    31
    Thanked 29 Times in 19 Posts
    Nice, thanks for testing. It looks like better parsing methods are more useful (c2->c3->optimal) than much deeper searching (c3->c5).

  29. #24
    Member RichSelian's Avatar
    Join Date
    Aug 2011
    Location
    Shenzhen, China
    Posts
    160
    Thanks
    18
    Thanked 56 Times in 27 Posts
    Quote Originally Posted by Matt Mahoney View Post
    Matt, please also update the new libzling on LTCB!
