
Thread: super-fast i/o

  1. #1
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,507
    Thanks
    742
    Thanked 665 Times in 359 Posts
    i just recalled one more technique for really fast i/o: memory-mapped files. i've even developed an os-independent library for m/m files, but it is written in Haskell

    m/m files have their own drawbacks, but at the very least they avoid superfluous data copying to/from i/o buffers, which is important for programs like quicklz whose speed is ~1/3 of memcpy()
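
    for illustration, a minimal POSIX sketch of the idea - the compressor consumes the mapped pages directly, with no read() into an intermediate buffer (compress_block() is a hypothetical compressor call):

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc < 2) return 1;
        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) return 1;

        struct stat st;
        fstat(fd, &st);

        /* map the whole file read-only; the kernel pages data in on demand */
        unsigned char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) return 1;

        /* compress_block(p, st.st_size); */

        munmap(p, st.st_size);
        close(fd);
        return 0;
    }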

  2. #2
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    First of all, I wanted to say that this topic is very interesting, and it is good that Bulat's site collects all the necessary links! Bulat, I wanted to ask whether you could help me understand, in pseudocode, how ROLZ (which is simple) works and how it differs from LZMA, which I still can't quite understand. Does LZMA also use a key for the position and a key indicating the length of the match?
    Example [ROLZ]
    aaaaabbbbb
    for [bbbbb]
    [a] = context;
    [1] = position key (entry 1 of 1 in the buffer, in this case)
    [5] = length
    And LZMA?
    Thanks!

  3. #3
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,507
    Thanks
    742
    Thanked 665 Times in 359 Posts
    lzma is just lz77. it uses additional modeling tricks to better predict some codes; encode described these tricks once (-pb and so on)

  4. #4
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    So I understand that LZMA somehow reduces the position keys by analyzing all the possible keys and excluding the less probable ones! As for the I/O question - both for the array and for writing the compressed files - is there a way to increase the speed of reading and writing bytes (getc, putc, fread, fwrite), or is it tied to the hard disk's buffer?

  5. #5
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
    have you tried asynchronous, unbuffered i/o? that mode bypasses OS buffering; it is enabled by specifying the FILE_FLAG_NO_BUFFERING flag (together with FILE_FLAG_OVERLAPPED for async operation) in the CreateFile api call.

    with memory-mapped files you can't map files larger than 2 gigabytes at once under a 32-bit os environment.
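
    a minimal sketch of such a CreateFile call (the flags are the standard Win32 ones; note that with FILE_FLAG_NO_BUFFERING, reads must use sector-aligned offsets and sector-aligned buffers):

    #include <windows.h>

    /* open a file for unbuffered, overlapped (asynchronous) access */
    HANDLE open_async(const char *name)
    {
        return CreateFileA(name, GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING,
                           FILE_FLAG_NO_BUFFERING | FILE_FLAG_OVERLAPPED,
                           NULL);
    }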

  6. #6
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,507
    Thanks
    742
    Thanked 665 Times in 359 Posts
    donkey7
    no. this may be a bit harder to implement compared to the mmap scheme, and anyway i'm not really interested in this implementation. just thoughts...

    in windows, the file-mapping API has access to large files:

    LPVOID MapViewOfFile(
        HANDLE hFileMappingObject,   // file-mapping object to map into address space
        DWORD  dwDesiredAccess,      // access mode
        DWORD  dwFileOffsetHigh,     // high-order 32 bits of file offset
        DWORD  dwFileOffsetLow,      // low-order 32 bits of file offset
        DWORD  dwNumberOfBytesToMap  // number of bytes to map
    );

    i tested it using a 5gb file
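
    a minimal sketch of mapping a 16 mb window of such a large file - the 64-bit offset is split into the two DWORD parameters (names are illustrative; the offset must be a multiple of the system allocation granularity, 64 kb):

    #include <windows.h>

    void *map_window(HANDLE hFile, unsigned long long offset, SIZE_T size)
    {
        /* mapping object covering the whole file */
        HANDLE hMap = CreateFileMapping(hFile, NULL, PAGE_READONLY, 0, 0, NULL);
        if (!hMap) return NULL;

        void *p = MapViewOfFile(hMap, FILE_MAP_READ,
                                (DWORD)(offset >> 32),         /* dwFileOffsetHigh */
                                (DWORD)(offset & 0xFFFFFFFFu), /* dwFileOffsetLow  */
                                size);

        CloseHandle(hMap);  /* the view keeps the mapping alive */
        return p;           /* release with UnmapViewOfFile(p)  */
    }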

  7. #7
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,507
    Thanks
    742
    Thanked 665 Times in 359 Posts
    Quote Originally Posted by Nania Francesco Antonio
    I have understood therefore in some way LZMA reduces the position Keys analyzing to all possible the Key and excluding those less possible ones!
    it is better to ask encode, but afaik lzma just uses some data bits to better predict other values. for example, lzma may predict match characteristics based on the lower bits of the last char before the match

    Quote Originally Posted by Nania Francesco Antonio
    ...As far
    i don't understand what you are trying to ask here

  8. #8
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    Ok:
    is it possible to manage the array faster with some library in C++?

  9. #9
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,507
    Thanks
    742
    Thanked 665 Times in 359 Posts
    what do you mean by *managing* an array?

  10. #10
    Tester
    Nania Francesco's Avatar
    Join Date
    May 2008
    Location
    Italy
    Posts
    1,565
    Thanks
    220
    Thanked 146 Times in 83 Posts
    examples:

    ----

    unsigned char array[16][16][16], x;

    x = array[8][8][8];
    ----
    // other method
    unsigned char x;
    unsigned char *array = (unsigned char *) malloc(1 << 12);
    x = *(array + 8 + (8 << 4) + (8 << 8));
    ---
    ok!

  11. #11
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    3,982
    Thanks
    377
    Thanked 351 Times in 139 Posts
    Quote Originally Posted by Nania Francesco Antonio
    I have understood therefore in some way LZMA reduces the position Keys analyzing to all possible the Key and excluding those less possible ones!
    If I understood the statement correctly: no, LZMA is not ROLZ. (LZMA is ROLZ... NOT ) It encodes match positions in the manner of most LZ77 coders - bucket number, bucket position. In addition, LZMA has two types of literals - the regular literal and the literal with matched byte, in Igor Pavlov's terms.
    + Regular literals are coded using a bit-oriented coder with the N higher bits of the previous byte as context.
    + Literals with matched bytes are coded with a special context, created from the "regular context" and the first byte from the rep0 position.
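
    a rough sketch of those two context computations (simplified - the real LZMA SDK differs in details, and the pos/rep0 offset convention here is illustrative):

    #include <stddef.h>

    /* regular literal: the N (= lc) higher bits of the previous byte
       select the coder context */
    unsigned regular_ctx(unsigned char prev_byte, int lc)
    {
        return prev_byte >> (8 - lc);
    }

    /* literal with matched byte: the byte found at the rep0 distance is
       additionally mixed into the context, bit by bit, while coding;
       this only shows where that byte comes from */
    unsigned char match_byte(const unsigned char *buf, size_t pos, size_t rep0)
    {
        return buf[pos - rep0];
    }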


  12. #12
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,507
    Thanks
    742
    Thanked 665 Times in 359 Posts
    Nania Francesco Antonio
    i still don't understand what you want to ask

  13. #13
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,507
    Thanks
    742
    Thanked 665 Times in 359 Posts
    well, i also tried to make i/o faster. here's what i got on my beloved testfile "all":

    D:\temp>t tor5.exe all -2
    greedy 32kb hash1, buffer 2mb, hufcoder: compressed 239919 -> 63360 kb (26.4%): 6.349 sec
    Elapsed time = 13.259 seconds

    D:\temp>t tor5.exe all -1
    greedy 16kb hash1, buffer 1mb, bytecoder: compressed 239919 -> 83636 kb (34.9%): 3.880 sec
    Elapsed time = 14.270 seconds

    D:\temp>t slug.exe c all nu
    SLUG 1.1 (Apr 26 2007) - Copyright (c) 2007 by C. Martelock - TEST Version
    234296.77 KiB -> 67969.86 KiB (29.01%)
    Elapsed time = 16.443 seconds

    D:\temp>t quick.exe all nu
    all
    Compressed 239919888 bytes.
    Elapsed time = 17.785 seconds

    D:\temp>t quick32.exe all nu
    OK
    Elapsed time = 18.967 seconds


    the last two results are for quicklz 1.20 and 1.20, respectively

  14. #14
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,507
    Thanks
    742
    Thanked 665 Times in 359 Posts
    and now, how i got it. first, why do quicklz and tor -1 perform slower here? just because they write too much data; slug's strategy of a boring simple match finder plus a huffman encoder turned out to be best here - huffman encoding makes the output about 20% smaller, which means we need to write less data to disk

    second - results of my tests with various outbuf sizes (input data in all tests are read in 256k chunks):

    WRITE 16kb BLOCKS
    greedy 16kb hash1, buffer 1mb, bytecoder: compressed 239919 -> 83636 kb (34.9%): 3.999 sec
    Elapsed time = 23.684 seconds

    WRITE 256kb BLOCKS
    greedy 16kb hash1, buffer 1mb, bytecoder: compressed 239919 -> 83636 kb (34.9%): 3.855 sec
    Elapsed time = 17.985 seconds

    WRITE 16mb BLOCKS
    greedy 16kb hash1, buffer 1mb, bytecoder: compressed 239919 -> 83636 kb (34.9%): 3.899 sec
    Elapsed time = 13.970 seconds

    WRITE FULL COMPRESSED FILE AT ONCE
    greedy 16kb hash1, buffer 1mb, bytecoder: compressed 239919 -> 83636 kb (34.9%): 3.793 sec
    Elapsed time = 13.800 seconds

    AND FOR COMPARISON - COMPRESSED DATA ARE WRITTEN TO ANOTHER HARD DRIVE
    greedy 16kb hash1, buffer 1mb, bytecoder: compressed 239919 -> 83636 kb (34.9%): 4.071 sec
    Elapsed time = 12.157 seconds


    as you see, the larger the outbuf the better, up to the whole file. but i limited it in tor5 (tested above) to 16mb, which gives almost the same results. that's all
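
    the same effect is available even with plain stdio - a minimal sketch, assuming a fully-buffered 16 mb output stream (file name is illustrative):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        FILE *out = fopen("all.tor", "wb");
        if (!out) return 1;

        /* setvbuf must come before the first write; with a 16 mb buffer,
           compressed data reaches the disk in large sequential chunks */
        char *buf = malloc(16 << 20);
        if (!buf) return 1;
        setvbuf(out, buf, _IOFBF, 16 << 20);

        /* ... fwrite() compressed blocks here ... */

        fclose(out);   /* flushes the remaining tail */
        free(buf);     /* the buffer must outlive the stream */
        return 0;
    }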


    so, async i/o as implemented in freearc has one serious drawback - when we try to read and write data simultaneously, data are processed in small chunks and the disk heads jump between the read and write positions too much:

    D:\temp>t arc a a -m1 all
    Read-ahead cache 16mb
    Compressed 1 file, 239.919.888 => 111.691.966 bytes. Ratio 46.5%
    Compression time 10.80 secs, speed 22.224 kb/s. Total 23.65 secs
    Elapsed time = 24.695 seconds

  15. #15
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,474
    Thanks
    26
    Thanked 121 Times in 95 Posts
    hmm, for proper functioning async i/o requires double or triple buffering for both input and output, i.e. one buffer for data being read from disk, one to operate on, and maybe one more for better dealing with delays. the same goes for output buffers. after compressing one block, we issue WaitForMultipleObjects.

    huh, async i/o is described here: http://msdn2.microsoft.com/en-us/library/aa365683.aspx - maybe i'll try it soon.
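
    a minimal sketch of the double-buffering idea on the input side, assuming a handle opened with FILE_FLAG_OVERLAPPED (without FILE_FLAG_NO_BUFFERING, so the buffers need no sector alignment); compress_block() is hypothetical and error handling is trimmed:

    #include <string.h>
    #include <windows.h>

    #define CHUNK (1 << 20)           /* 1 mb per request */

    static char buf[2][CHUNK];

    void pump(HANDLE h)
    {
        OVERLAPPED ov[2];
        memset(ov, 0, sizeof ov);
        ov[0].hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
        ov[1].hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);

        unsigned long long pos = 0;
        int cur = 0;

        /* issue the first read */
        ov[cur].Offset     = (DWORD)pos;
        ov[cur].OffsetHigh = (DWORD)(pos >> 32);
        ReadFile(h, buf[cur], CHUNK, NULL, &ov[cur]);

        for (;;) {
            DWORD got;
            /* wait for the read outstanding on the current buffer */
            if (!GetOverlappedResult(h, &ov[cur], &got, TRUE) || got == 0)
                break;                /* EOF or error */
            pos += got;

            /* queue the next read into the other buffer ... */
            int nxt = cur ^ 1;
            ov[nxt].Offset     = (DWORD)pos;
            ov[nxt].OffsetHigh = (DWORD)(pos >> 32);
            ReadFile(h, buf[nxt], CHUNK, NULL, &ov[nxt]);

            /* ... while the cpu works on the chunk we already have */
            /* compress_block(buf[cur], got); */

            cur = nxt;
        }

        CloseHandle(ov[0].hEvent);
        CloseHandle(ov[1].hEvent);
    }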

  16. #16
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,507
    Thanks
    742
    Thanked 665 Times in 359 Posts
    probably, good async i/o requires several buffers on both the input and output sides, issuing only one operation at a time, so the disk heads will not jump after each track read. i will try to implement this to some degree in freearc

  17. #17
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,507
    Thanks
    742
    Thanked 665 Times in 359 Posts
    i made the same test with thor 0.95:

    >>>>>>>t thor e1 all nu
    SIZE: 239919888 bytes
    COMPRESSED: 80501140 bytes
    RATIO: 33.55% 2.684 bpB
    TIME: 16sec 324msec
    SPEED: 14.02 MB/sec
    Elapsed time = 16.343 seconds

    >>>>>>>t thor e2 all nu
    COMPRESSED: 67950708 bytes
    RATIO: 28.32% 2.266 bpB
    TIME: 15sec 693msec
    SPEED: 14.58 MB/sec
    Elapsed time = 15.913 seconds

  18. #18
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,507
    Thanks
    742
    Thanked 665 Times in 359 Posts
    the same 16mb output buffer greatly reduces decompression time:

    D:\temp>t tor6.exe dll700.tor
    Decompressed 328672 -> 690514 kb (47.6%): 22.776 sec
    Elapsed time = 45.145 seconds

    D:\temp>t thor d temp tempd
    SIZE: 690514620 bytes
    COMPRESSED: 330829348 bytes
    RATIO: 47.91% 3.833 bpB
    TIME: 57sec 463msec


    for info, mem-to-mem decompression speed was almost the same:

    D:\temp>t thor d temp nul|tail -1
    Elapsed time = 24.034 seconds

    D:\temp>t tor6.exe dll700.tor -o
    Decompressed 328672 -> 690514 kb (47.6%): 22.173 sec
    Elapsed time = 23.704 seconds

  19. #19
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,507
    Thanks
    742
    Thanked 665 Times in 359 Posts
    copy of my message from another thread, just to keep all the i/o-related talk together:

    this is what i do and what i propose here. the only thing where you're wrong is that windows doesn't cache too much written data - it's typically only a few mb. so i advise writing (de)compressed data to nul, which any quick compressor allows

    just a few interesting results. first, compression of a 5gb file to nul:

    D:/temp>t tor6 5g -2 -o
    greedy 16kb hash1, buffer 2mb, hufcoder: compressed 5586729 -> 2526058 kb (45.2%): 172.587 sec
    Elapsed time = 249.870 seconds

    D:\temp>t thor e2 5g nul
    COMPRESSED: 2541073872 bytes
    RATIO: 45.48% 3.639 bpB
    TIME: 229sec 831msec
    SPEED: 23.18 MB/sec
    Elapsed time = 231.382 seconds

    D:\temp>t slug c 5g nul
    SLUG 1.1 (Apr 26 2007) - Copyright (c) 2007 by C. Martelock - TEST Version
    5455791.00 KiB -> 0.00 KiB (0.00%)
    Elapsed time = 218.273 seconds


    as you see, slug is fastest here while my program is slowest. now, the same operation with compressed data written to the same disk:

    D:\temp>t tor6.exe 5g -2
    greedy 16kb hash1, buffer 2mb, hufcoder: compressed 5586729 -> 2526058 kb (45.2%): 173.276 sec
    Elapsed time = 340.019 seconds

    D:\temp>t thor e2 5g e2
    COMPRESSED: 2541073872 bytes
    RATIO: 45.48% 3.639 bpB
    TIME: 408sec 317msec
    SPEED: 13.05 MB/sec
    Elapsed time = 408.648 seconds

    D:\temp>t slug c 5g 5g.slug
    5455791.00 KiB -> 2552156.50 KiB (46.78%)
    Elapsed time = 417.460 seconds


    now the ratings are opposite - my program is fastest while slug is slowest. why? as i wrote in the other thread, i read data in 256kb chunks (and it seems that my HD/OS does some partial read-ahead - it's clearly impossible to read 5.5 gb of data in 70 seconds when my disk has 37mb/s speed) and write output in 16 mb chunks, greatly avoiding excessive disk seeks. it seems to be the best "primitive" strategy (i.e. without async i/o, m/m files or threads)
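
    for completeness, a minimal sketch of that "primitive" strategy - 256 kb reads, output accumulated and flushed in 16 mb chunks; compress_block() here is just a memcpy stand-in for a real compressor:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define RCHUNK (256 << 10)        /* 256 kb reads */
    #define WCHUNK ( 16 << 20)        /* 16 mb writes */

    /* stand-in compressor: returns the output size (here: a plain copy) */
    static size_t compress_block(const char *in, size_t n, char *out)
    {
        memcpy(out, in, n);
        return n;
    }

    int main(int argc, char **argv)
    {
        if (argc < 3) return 1;
        FILE *fi = fopen(argv[1], "rb"), *fo = fopen(argv[2], "wb");
        if (!fi || !fo) return 1;

        char *in  = malloc(RCHUNK);
        char *out = malloc(WCHUNK + 2 * RCHUNK);  /* slack: a block may expand */
        size_t filled = 0, n;

        while ((n = fread(in, 1, RCHUNK, fi)) > 0) {
            filled += compress_block(in, n, out + filled);
            if (filled >= WCHUNK) {   /* flush in big sequential chunks */
                fwrite(out, 1, filled, fo);
                filled = 0;
            }
        }
        fwrite(out, 1, filled, fo);   /* final partial chunk */

        fclose(fi); fclose(fo);
        free(in); free(out);
        return 0;
    }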
