Page 29 of 29
Results 841 to 849 of 849

Thread: Tree alpha v0.1 download

  1. #841
    Member · Kraków, Poland
    I tried time but only got 1 second resolution. Is that normal?
    It seems so, unfortunately. http://www.cplusplus.com/reference/ctime/time_t/ says:
    type time_t <ctime>
    Time type
    Alias of a fundamental arithmetic type capable of representing times, as those returned by function time.

    For historical reasons, it is generally implemented as an integral value representing the number of seconds elapsed since 00:00 hours, Jan 1, 1970 UTC (i.e., a unix timestamp). Although libraries may implement this type using alternative time representations.

    Portable programs should not use values of this type directly, but always rely on calls to elements of the standard library to translate them to portable types.
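    As a footnote to the quote above: the portable way to difference two time_t values is difftime() from <time.h>, rather than subtracting them directly. A minimal sketch (still limited to the 1-second resolution of time()):

    ```c
    #include <stdio.h>
    #include <time.h>

    int main(void) {
        /* time() has 1-second resolution and time_t's representation is
           implementation-defined, so difference values with difftime()
           instead of raw subtraction. */
        time_t start = time(NULL);
        /* ... do some work ... */
        time_t finish = time(NULL);

        double elapsed = difftime(finish, start);  /* seconds, as a double */
        printf("elapsed: %.0f seconds\n", elapsed);
        return 0;
    }
    ```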
    On the other hand, the function 'clock_gettime' suggested by user 'algorithm' seems to be working on my system. I've implemented a program using the documentation at https://linux.die.net/man/3/clock_gettime and a snippet from https://gist.github.com/diabloneo/9619917 .

    Example code ( https://ideone.com/6XcQ7d ):
    #define _POSIX_C_SOURCE 199309L

    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    void timespec_diff(const struct timespec *start, const struct timespec *stop,
                       struct timespec *result) {
        if ((stop->tv_nsec - start->tv_nsec) < 0) {
            result->tv_sec = stop->tv_sec - start->tv_sec - 1;
            result->tv_nsec = stop->tv_nsec - start->tv_nsec + 1000000000;
        } else {
            result->tv_sec = stop->tv_sec - start->tv_sec;
            result->tv_nsec = stop->tv_nsec - start->tv_nsec;
        }
    }

    void timespec_print(const char *description,
                        const struct timespec *start, const struct timespec *stop) {
        struct timespec result;
        timespec_diff(start, stop, &result);
        double seconds = result.tv_sec + result.tv_nsec / 1e9;
        printf("times elapsed for clock '%s': %lf\n", description, seconds);
    }

    int main(void) {
        struct timespec start_clock_realtime;
        struct timespec start_clock_monotonic;
        struct timespec start_clock_process_cputime_id;
        struct timespec start_clock_thread_cputime_id;

        struct timespec finish_clock_realtime;
        struct timespec finish_clock_monotonic;
        struct timespec finish_clock_process_cputime_id;
        struct timespec finish_clock_thread_cputime_id;

        clock_gettime(CLOCK_REALTIME, &start_clock_realtime);
        clock_gettime(CLOCK_MONOTONIC, &start_clock_monotonic);
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start_clock_process_cputime_id);
        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &start_clock_thread_cputime_id);

        sleep(3); // seconds

        clock_gettime(CLOCK_REALTIME, &finish_clock_realtime);
        clock_gettime(CLOCK_MONOTONIC, &finish_clock_monotonic);
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &finish_clock_process_cputime_id);
        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &finish_clock_thread_cputime_id);

        timespec_print("realtime",
                       &start_clock_realtime, &finish_clock_realtime);
        timespec_print("monotonic",
                       &start_clock_monotonic, &finish_clock_monotonic);
        timespec_print("process cputime id",
                       &start_clock_process_cputime_id, &finish_clock_process_cputime_id);
        timespec_print("thread cputime id",
                       &start_clock_thread_cputime_id, &finish_clock_thread_cputime_id);

        return 0;
    }

    Result:
    Code:
    times elapsed for clock 'realtime': 3.000057
    times elapsed for clock 'monotonic': 3.000057
    times elapsed for clock 'process cputime id': 0.000007
    times elapsed for clock 'thread cputime id': 0.000007
    Precision seems to be pretty good. Check whether your compiler gives similar results (I hope so).

  2. Thanks (4):

    Bulat Ziganshin (23rd September 2017), Kennon Conrad (24th September 2017), Mike (23rd September 2017), Shelwien (23rd September 2017)

  3. #842
    Member · Bothell, Washington, USA
    Yes, I get similar results. I still don't understand why my compilers give global time for clock(), but I guess that mystery will remain unsolved. I agree the precision is pretty good, much better than with clock().

    I am curious though about one thing you did that I didn't do in post #835:

    Code:
    double seconds = result.tv_sec + result.tv_nsec / 1e9;
    Will compilers always convert result.tv_nsec and 1e9 to double before doing the divide?

  4. #843
    Bulat Ziganshin · Programmer · Uzbekistan
    1e9 is a constant of type double. The usual arithmetic conversions promote the narrower operand to the wider type before the operation, so result.tv_nsec is converted to double before the division.
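    A quick self-contained check of the promotion (the values are made up for illustration):

    ```c
    #include <stdio.h>

    int main(void) {
        long tv_nsec = 500000000;  /* pretend half a second of nanoseconds */

        /* Both operands integral: the division truncates to 0 before the
           assignment converts the result to double. */
        double wrong = tv_nsec / 1000000000;

        /* 1e9 is a double constant, so tv_nsec is promoted to double first
           and the fractional part survives. */
        double right = tv_nsec / 1e9;

        printf("integral divide: %f\n", wrong);  /* 0.000000 */
        printf("double divide:   %f\n", right);  /* 0.500000 */
        return 0;
    }
    ```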

  5. Thanks:

    Kennon Conrad (27th September 2017)

  6. #844
    Member · Bothell, Washington, USA

    GLZA v0.9.2

    I changed a few things for the "fast" mode (-f) and got faster compression than I ever expected. The printing of elapsed time has been fixed too.

    GLZA c -f enwik8 enwik8.glza
    Code:
    Compressed 100000000 bytes -> 20722473 bytes (1.6578 bpB) in 24.142 seconds.
    GLZA c -f enwik9 enwik9.glza
    Code:
    Compressed 1000000000 bytes -> 167188835 bytes (1.3375 bpB) in 556.622 seconds.
    GLZA c -f Silesia.tar Silesia.tar.glza
    Code:
    Compressed 211957760 bytes -> 51559957 bytes (1.9460 bpB) in 132.188 seconds.
    Attached Files

  7. Thanks (3):

    algorithm (9th October 2017), Bulat Ziganshin (10th October 2017), Lucas (9th October 2017)

  8. #845
    Member · Greece
    Maybe Silesia.tar has a low compression ratio because GLZA is not good for non-uniform files? Does it remove symbols from the dictionary?

  9. #846
    Member · Bothell, Washington, USA
    Quote Originally Posted by algorithm View Post
    Maybe Silesia.tar has a low compression ratio because GLZA is not good for non-uniform files? Does it remove symbols from the dictionary?
    Yes, GLZA is not as effective for non-uniform files as for uniform files. It also generally seems less effective than LZMA on files that are not text-like. GLZA does remove low-frequency symbols from the dictionary, but this is likely an area that could be improved. Here's a plot of the dictionary size vs. file position for Silesia.tar. In total, there are 1,525,516 dictionary entries, but most come and go:

    [Attached image: Silesia.png — dictionary size vs. file position for Silesia.tar]

  10. #847
    Member · Bothell, Washington, USA

    GLZA v0.10

    Here is GLZA version 0.10. The main goal for this version was to simplify the code and improve decompression speed.

    Functionally, the "-f" flag (faster compression) is now the default and "-x" flag (extreme compression) can be used to get the previous default slower but usually better compression.

    The convoluted multi-mtf queue scheme that could hold up to 1152 symbols has been replaced by a single queue holding up to 256 symbols. To reduce the number of symbol moves between the mtf queue and the dictionary, a new signal type has been added that tells the decoder whether or not to put an mtf-eligible symbol into the queue. So now symbols never fall out of the mtf queue (without being used), and the queue shuffling associated with queue misses is gone. A side benefit is that the queue can be smaller, giving less overhead for queue hits and also lower average queue positions. The complication is that the code now has to figure out which symbols to put in the queue. This is not so easy; I think the code could be better, but it's the best I could do for now.
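    For readers who haven't seen one, the core of a bounded move-to-front queue can be sketched like this (a toy in plain C, not GLZA's actual implementation; the signal deciding which symbols enter the queue is the part described above and isn't modeled here):

    ```c
    #include <stdio.h>
    #include <string.h>

    /* Toy bounded move-to-front queue (illustrative only -- GLZA's real queue
       also carries dictionary state and coding statistics). */
    #define QUEUE_SIZE 256

    typedef struct {
        int symbols[QUEUE_SIZE];  /* symbols[0] is the front (most recent) */
        int count;
    } MtfQueue;

    /* On a hit, returns the symbol's position (0 = front) and moves it to
       the front; returns -1 on a miss. */
    int mtf_find(MtfQueue *q, int sym) {
        for (int i = 0; i < q->count; i++) {
            if (q->symbols[i] == sym) {
                memmove(&q->symbols[1], &q->symbols[0], i * sizeof(int));
                q->symbols[0] = sym;
                return i;
            }
        }
        return -1;
    }

    /* Inserts sym at the front; if the queue is full, the oldest entry is
       evicted and returned so the caller can hand it back to the dictionary
       (returns -1 when nothing was evicted). */
    int mtf_insert(MtfQueue *q, int sym) {
        int evicted = -1;
        if (q->count == QUEUE_SIZE)
            evicted = q->symbols[--q->count];
        memmove(&q->symbols[1], &q->symbols[0], q->count * sizeof(int));
        q->symbols[0] = sym;
        q->count++;
        return evicted;
    }

    int main(void) {
        MtfQueue q = {{0}, 0};
        mtf_insert(&q, 'a');
        mtf_insert(&q, 'b');                       /* front to back: b, a */
        printf("hit at %d\n", mtf_find(&q, 'a'));  /* hit at 1; 'a' moves to front */
        printf("miss: %d\n", mtf_find(&q, 'z'));   /* miss: -1 */
        return 0;
    }
    ```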

    There were quite a few changes to simplify the code, too many to remember them all, and a few bug fixes, including one that may have affected Gonzalo earlier. The overall result is that compression and decoding are (a little) faster and encoding is slower. Test results for enwik9 show a decompression speed improvement of about 15% and a compression improvement of almost 1 MB, putting it at #20 on LTCB, something that has been a goal of mine for quite a while:

    enwik9:
    Code:
    GLZA c enwik9 enwik9.glza:  167,832,309 bytes in 594 seconds, 7452 MB RAM.
    GLZA d enwik9.glza enwik9:  12.1 seconds, 355 MB RAM.
    Code:
    GLZA c -x -p3 enwik9 enwik9.glza:  163,768,203 bytes in 8184 seconds, 8205 MB RAM
    GLZA d enwik9.glza enwik9:  12.2 seconds, 429 MB RAM.
    enwik8:
    Code:
    GLZA c enwik8 enwik8.glza:  20,753,713 bytes in 25.9 seconds, 952 MB RAM.
    GLZA d enwik8.glza enwik8:  1.35 seconds, 47 MB RAM.
    Code:
    GLZA c -x -p3 enwik8 enwik8.glza:  20,356,097 bytes in 420 seconds, 2140 MB RAM
    GLZA d enwik8.glza enwik8:  1.40 seconds, 57 MB RAM.
    A .zip file of the source code is 69,980 bytes.

    If anyone has any problems, please let me know.
    Attached Files

  11. Thanks (2):

    Bulat Ziganshin (2nd January 2018), JamesB (2nd January 2018)

  12. #848
    Member · Bothell, Washington, USA

    GLZA v0.10.1

    I didn't realize until after the latest release that one of the 4-byte fields in the dictionary entry structure wasn't really necessary, so I took it out and got about 2% faster decompression and 3 - 4% less RAM usage when decompressing enwik9 (I did not rerun the compression from v0.10):

    enwik9:
    Code:
    GLZA c enwik9 enwik9.glza:  167,832,309 bytes in 594 seconds, 7452 MB RAM.
    GLZA d enwik9.glza enwik9:  11.9 seconds, 344 MB RAM.
    Code:
    GLZA c -x -p3 enwik9 enwik9.glza:  163,768,203 bytes in 8184 seconds, 8205 MB RAM
    GLZA d enwik9.glza enwik9:  11.9 seconds, 413 MB RAM.
    enwik8:
    Code:
    GLZA c enwik8 enwik8.glza:  20,753,713 bytes in 25.9 seconds, 952 MB RAM.
    GLZA d enwik8.glza enwik8:  1.30 seconds, 45 MB RAM.
    Code:
    GLZA c -x -p3 enwik8 enwik8.glza:  20,356,097 bytes in 420 seconds, 2140 MB RAM
    GLZA d enwik8.glza enwik8:  1.36 seconds, 54 MB RAM.
    A .zip file of the source code is now 69,935 bytes.
    Attached Files

  13. Thanks (5):

    Bulat Ziganshin (6th January 2018), Gonzalo (2nd July 2018), load (16th May 2018), Matt Mahoney (9th August 2018), Stephan Busch (9th January 2018)

  14. #849
    Member · Bothell, Washington, USA
    Here is a new version of GLZA, version 0.11. It's the same basic idea but there are lots of changes, including the following highlights:

    GLZAcompress:
    Added an order 1 trailing character/leading character deduplication candidate scoring formula. This can be used by providing a "-o1.0" command line option; without this option, the order 0 score formula is used. Values between 0.0 and 1.0 will use a blend of the two candidate score formulas. Using the -o option forces extreme (-x) compression mode and slows down compression by maybe 50% on average. Overall, I'd say order 1 and order 0 typically provide similar compression ratios: order 0 typically creates a smaller grammar, while order 1 creates a grammar that is slightly more predictable with an order 1 model. On the files I tested, -o0.5 on average worked a little better than -o0.0 or -o1.0.
    Increased the maximum input file size from 2 GB to almost 4 GB. I didn't test this much at all. User beware: it may not work, and you had better have a lot of RAM (up to 64 GB) and time, especially if you are using the -x or -o options.
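    Assuming the blend of the two score formulas is a simple linear interpolation (the actual GLZA scoring formulas aren't shown in this post, so this is only a guess at the shape), the -o weight would act roughly like:

    ```c
    #include <stdio.h>

    /* Hypothetical linear blend of the order 0 and order 1 candidate scores;
       w = 0.0 gives pure order 0, w = 1.0 pure order 1 (illustrative only). */
    double blended_score(double order0_score, double order1_score, double w) {
        return (1.0 - w) * order0_score + w * order1_score;
    }

    int main(void) {
        /* e.g. -o0.5 would weight both formulas equally */
        printf("%.2f\n", blended_score(2.0, 4.0, 0.5));  /* 3.00 */
        return 0;
    }
    ```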

    GLZAencode:
    Added additional models and blending of model predictions. It's probably not the best implementation but does generally improve the compression ratio a little.
    Simplified the selective MTF queue selection code and made it generally more effective.
    For capital encoded files, changed the MTF queue to three queues: one for symbols that start with a space character, one for symbols that start with an alphabetic (a - z) character, and one for the rest of the symbols. Added a field to indicate which queue the next symbol is coming out of (unless the most recent trailing character was a capital encode character, in which case it goes straight to the a - z queue).
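    The queue selection described above amounts to a classification on the symbol's leading character; roughly (names are hypothetical illustration, not GLZA's actual code):

    ```c
    #include <stdio.h>

    /* Sketch of picking one of the three MTF queues by leading character. */
    enum QueueId { QUEUE_SPACE, QUEUE_ALPHA, QUEUE_OTHER };

    enum QueueId select_queue(char leading) {
        if (leading == ' ')
            return QUEUE_SPACE;                /* symbols starting with a space */
        if (leading >= 'a' && leading <= 'z')
            return QUEUE_ALPHA;                /* symbols starting with a - z */
        return QUEUE_OTHER;                    /* everything else */
    }

    int main(void) {
        printf("%d %d %d\n",
               select_queue(' '), select_queue('m'), select_queue('3'));
        return 0;
    }
    ```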

    GLZAdecode:
    Modified the code to support GLZAencode changes.
    Changed the dictionary structure from a set of indexes into an array of dictionary structures per starting symbol/code length combination to a set of arrays of dictionary structures per starting symbol/code length combination. Now the code has to move dictionary structure data into and out of the mtf queue, but it doesn't use as much RAM and the dictionary data can be accessed more directly.

    Results for enwik8/9 are below, with options -o0.6, -p4, -r20000 giving the best compression ratios of the combinations I tested. The "-x -p3" option is shown for comparison to the previous version.

    enwik9:
    Code:
    GLZA c -o0.6 -p4 -r20000 enwik9 enwik9.glza:  161,966,886 bytes, 17,031 seconds, 13,071 MB RAM
    GLZA d enwik9.glza enwik9:  9.9 seconds, 384 MB RAM
    Code:
    GLZA c -x -p3 enwik9 enwik9.glza:  162,899,155 bytes, 8219 seconds, 8220 MB RAM
    GLZA d enwik9.glza enwik9:  9.9 seconds, 383 MB RAM
    Code:
    GLZA c enwik9 enwik9.glza:  166,844,914 bytes, 593 seconds, 7463 MB RAM
    GLZA d enwik9.glza enwik9:  10.1 seconds, 327 MB RAM
    enwik8:
    Code:
    GLZA c -o0.6 -p4 -r20000 enwik8 enwik8.glza: 20,114,673 bytes, 782 seconds, 2347 MB RAM
    GLZA d enwik8.glza enwik8: 1.24 seconds, 50 MB RAM
    Code:
    GLZA c -x -p3 enwik8 enwik8.glza: 20,247,308 bytes, 423 seconds, 2142 MB RAM
    GLZA d enwik8.glza enwik8: 1.23 seconds, 50 MB RAM
    Code:
    GLZA c enwik8 enwik8.glza: 20,636,657 bytes, 25.4 seconds, 954 MB RAM
    GLZA d enwik8.glza enwik8: 1.22 seconds, 43 MB RAM
    A .zip file of the source code is 75,527 bytes.
    Attached Files

  15. Thanks (2):

    algorithm (7th November 2018), Mike (7th November 2018)


