Thread: Question about fpaq0 I/O

  1. #1
    Member
    Join Date
    May 2006
    Location
    Uruguay
    Posts
    30
    Thanks
    0
    Thanked 1 Time in 1 Post

    Question about fpaq0 I/O

    Hello, after years of lurking the forum, now that I have a bit of time (and some college) I would like to learn more about compression. The only compressor that I made apart from a RLE compressor was a simplified LOCO-I style compressor for a class.
    I'm reading fpaq0 sources. I see that fpaq0 uses single byte I/O (getc & putc).
    Is that a reasonable approach because of system's buffers or just a (over)simplification to keep the code small?
    Last edited by ggf31416; 11th March 2014 at 17:08.

  2. #2
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,572
    Thanks
    780
    Thanked 687 Times in 372 Posts
    imho, it's a lack of optimization proficiency. well, i'm not sure about every modern compiler, but when i looked into that (~15 years ago), getc/putc performed a lot of useless work. also, 32/64-bit memory access would be more efficient anyway

  3. #3
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    615
    Thanks
    260
    Thanked 242 Times in 121 Posts
    Most of Matt's code uses single byte I/O to keep things simple. Also, getc/putc speed heavily depends on the compiler and can be quite fast, especially if compared to fgetc/fputc. See this older thread discussing getc/putc I/O and the benchmark results I posted in there. Matt changed the zpaq code to use buffered I/O later (see this post for example).
    Last edited by schnaader; 11th March 2014 at 18:15.
    http://schnaader.info
    Damn kids. They're all alike.

  4. Thanks:

    Bulat Ziganshin (11th March 2014)

  5. #4
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,572
    Thanks
    780
    Thanked 687 Times in 372 Posts
    my own tests using fread/fwrite calls show I/O speed around 4 GB/s for buffer sizes >= 1 MB. my computers are i7-2600K and i7-477 with overclocked memory (ddr3-2000) in dual channel. yes, actually, it's just a memcpy test.

    Code:
    Z:\>for /l %b in (10 1 30) do @read.exe -b%b test.arc
    Buffer   1 KB.  base-m-rep+lz4.arc 5922mb: time 260.600555 seconds, speed 22.727487 mbytes/sec
    Buffer   2 KB.  base-m-rep+lz4.arc 5922mb: time 134.673238 seconds, speed 43.979010 mbytes/sec
    Buffer   4 KB.  base-m-rep+lz4.arc 5922mb: time 69.283830 seconds, speed 85.485973 mbytes/sec
    Buffer   8 KB.  base-m-rep+lz4.arc 5922mb: time 35.406883 seconds, speed 167.278089 mbytes/sec
    Buffer  16 KB.  base-m-rep+lz4.arc 5922mb: time 18.409689 seconds, speed 321.721658 mbytes/sec
    Buffer  32 KB.  base-m-rep+lz4.arc 5922mb: time 9.914260 seconds, speed 597.401668 mbytes/sec
    Buffer  64 KB.  base-m-rep+lz4.arc 5922mb: time 5.685817 seconds, speed 1041.678870 mbytes/sec
    Buffer 128 KB.  base-m-rep+lz4.arc 5922mb: time 3.520065 seconds, speed 1682.581350 mbytes/sec
    Buffer 256 KB.  base-m-rep+lz4.arc 5922mb: time 2.433378 seconds, speed 2433.980550 mbytes/sec
    Buffer 512 KB.  base-m-rep+lz4.arc 5922mb: time 1.851349 seconds, speed 3199.178349 mbytes/sec
    
    Buffer   1 MB.  base-m-rep+lz4.arc 5922mb: time 1.564150 seconds, speed 3786.590847 mbytes/sec
    Buffer   2 MB.  base-m-rep+lz4.arc 5922mb: time 1.517026 seconds, speed 3904.216248 mbytes/sec
    Buffer   4 MB.  base-m-rep+lz4.arc 5922mb: time 1.545801 seconds, speed 3831.537076 mbytes/sec
    Buffer   8 MB.  base-m-rep+lz4.arc 5922mb: time 1.602798 seconds, speed 3695.284395 mbytes/sec
    Buffer  16 MB.  base-m-rep+lz4.arc 5922mb: time 1.559132 seconds, speed 3798.776750 mbytes/sec
    Buffer  32 MB.  base-m-rep+lz4.arc 5922mb: time 1.517372 seconds, speed 3903.324311 mbytes/sec
    Buffer  64 MB.  base-m-rep+lz4.arc 5922mb: time 1.481666 seconds, speed 3997.390298 mbytes/sec
    Buffer 128 MB.  base-m-rep+lz4.arc 5922mb: time 1.425596 seconds, speed 4154.610919 mbytes/sec
    Buffer 256 MB.  base-m-rep+lz4.arc 5922mb: time 1.424678 seconds, speed 4157.287025 mbytes/sec
    Buffer 512 MB.  base-m-rep+lz4.arc 5922mb: time 1.460279 seconds, speed 4055.934471 mbytes/sec
    Buffer   1 GB.  base-m-rep+lz4.arc 5922mb: time 1.524847 seconds, speed 3884.189359 mbytes/sec
    Attached Files
    Last edited by Bulat Ziganshin; 11th March 2014 at 18:59.

  6. #5
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
    When I wrote fpaq0, getc() and putc() were fast. In newer versions of g++ and MSVC that is no longer the case so I use fread() and fwrite() for better performance. For example on 100 MB text:

    cat enwik8 > out

    takes 2.3 seconds (2.0 GHz T3200, Win32) when compiled with MinGW 3.4.5 (g++ -O3) and takes 28.6 seconds with g++ 4.8.1.

    Code:
    // cat.cpp reads files (text mode) to stdout: cat filenames...
    // Public domain
    #include <stdio.h>
    int main(int argc, char** argv) {
      int i;
      for (i=1; i<argc; ++i) {
        FILE* f=fopen(argv[i], "r");
        if (!f)
          perror(argv[i]);
        else {
          int c;
          while ((c=getc(f))!=EOF)
            putchar(c);
          fclose(f);
        }
      }
      return 0;
    }
    Edit: 22.8 seconds in MSVC (cl 16.00.30319.01). Adding the line
    Code:
    #define _CRT_DISABLE_PERFCRIT_LOCKS
    to the beginning of the program improves the time to 1.9 seconds, but has no effect on g++.

    Edit: but fread() and fwrite() are still faster. The following takes 0.25 seconds in both g++ and MSVC. (Note that I/O is binary, not text.)

    Code:
    // cat.cpp reads files to stdout: cat filenames...
    // Public domain
    #include <stdio.h>
    #include <fcntl.h>
    #include <io.h>
    int main(int argc, char** argv) {
      int i, n;
      const int N=65536;  // Smaller is slower but larger has very little effect
      _setmode(0, _O_BINARY);
      _setmode(1, _O_BINARY);
      char buf[N];
      for (i=1; i<argc; ++i) {
        FILE* f=fopen(argv[i], "rb");
        if (!f)
          perror(argv[i]);
        else {
          while ((n=fread(buf, 1, N, f))>0)
            fwrite(buf, 1, n, stdout);
          fclose(f);
        }
      }
      return 0;
    }
    Last edited by Matt Mahoney; 11th March 2014 at 23:26.

  7. #6
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,572
    Thanks
    780
    Thanked 687 Times in 372 Posts
    again, one should know how to test. try "cat >nul" first. then check writing zeroes to a file; on windows it's cached and reaches the same 2-4 GB/s on my cpu. then try the fread/fwrite solution. 50 MB/s from the disk cache is absolutely meaningless

  8. #7
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,572
    Thanks
    780
    Thanked 687 Times in 372 Posts
    Matt, look at my results above. you need a buffer >1 MB to get full read speed. also, "r" open mode is of course slow; use "rb" for getc() too

  9. #8
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
    2.0 GHz T3200, 3 GB, Win32

    Code:
    C:\tmp>for /l %b in (10 1 30) do @.\read.exe -b%b \res\enwik9
    Buffer   1 KB.  \res\enwik9 1000mb : time 32.783418 seconds, speed 30.503226 mbytes/sec
    Buffer   2 KB.  \res\enwik9 1000mb : time 14.411144 seconds, speed 69.390745 mbytes/sec
    Buffer   4 KB.  \res\enwik9 1000mb : time 7.973190 seconds, speed 125.420313 mbytes/sec
    Buffer   8 KB.  \res\enwik9 1000mb : time 4.375216 seconds, speed 228.560156 mbytes/sec
    Buffer  16 KB.  \res\enwik9 1000mb : time 2.530460 seconds, speed 395.185052 mbytes/sec
    Buffer  32 KB.  \res\enwik9 1000mb : time 1.656544 seconds, speed 603.666382 mbytes/sec
    Buffer  64 KB.  \res\enwik9 1000mb : time 1.206855 seconds, speed 828.599818 mbytes/sec
    Buffer 128 KB.  \res\enwik9 1000mb : time 1.008861 seconds, speed 991.217266 mbytes/sec
    Buffer 256 KB.  \res\enwik9 1000mb : time 0.922969 seconds, speed 1083.460276 mbytes/sec
    Buffer 512 KB.  \res\enwik9 1000mb : time 1.208180 seconds, speed 827.691033 mbytes/sec
    Buffer   1 MB.  \res\enwik9 1000mb : time 1.269507 seconds, speed 787.707559 mbytes/sec
    Buffer   2 MB.  \res\enwik9 1000mb : time 1.221812 seconds, speed 818.456482 mbytes/sec
    Buffer   4 MB.  \res\enwik9 1000mb : time 1.187975 seconds, speed 841.768847 mbytes/sec
    Buffer   8 MB.  \res\enwik9 1000mb : time 1.191831 seconds, speed 839.045418 mbytes/sec
    Buffer  16 MB.  \res\enwik9 1000mb : time 1.175721 seconds, speed 850.542221 mbytes/sec
    Buffer  32 MB.  \res\enwik9 1000mb : time 1.178252 seconds, speed 848.714686 mbytes/sec
    Buffer  64 MB.  \res\enwik9 1000mb : time 1.201541 seconds, speed 832.264631 mbytes/sec
    Buffer 128 MB.  \res\enwik9 1000mb : time 1.253091 seconds, speed 798.026795 mbytes/sec
    Buffer 256 MB.  \res\enwik9 1000mb : time 1.340995 seconds, speed 745.714733 mbytes/sec
    Buffer 512 MB.  \res\enwik9 1000mb : time 21.486798 seconds, speed 46.540205 mbytes/sec
    Buffer   1 GB.  \res\enwik9 1000mb : time 22.459789 seconds, speed 44.524015 mbytes/sec

  10. #9
    Expert
    Matt Mahoney's Avatar
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,257
    Thanks
    307
    Thanked 797 Times in 489 Posts
    2.67 GHz Core i7 M620, 4 GB, Ubuntu 64 bit, Wine 1.6

    Code:
    matt@matt-Latitude-E6510:~/Downloads$ for (( i=10 ; i<=30 ; ++i )) do ./read.exe -b$i ../enwik9 ; done
    Buffer   1 KB.  ../enwik9 1000mb : time 23.998596 seconds, speed 41.669104 mbytes/sec
    Buffer   2 KB.  ../enwik9 1000mb : time 12.481632 seconds, speed 80.117729 mbytes/sec
    Buffer   4 KB.  ../enwik9 1000mb : time 6.500303 seconds, speed 153.838975 mbytes/sec
    Buffer   8 KB.  ../enwik9 1000mb : time 2.977948 seconds, speed 335.801655 mbytes/sec
    Buffer  16 KB.  ../enwik9 1000mb : time 1.512208 seconds, speed 661.284647 mbytes/sec
    Buffer  32 KB.  ../enwik9 1000mb : time 1.045031 seconds, speed 956.909321 mbytes/sec
    Buffer  64 KB.  ../enwik9 1000mb : time 0.641992 seconds, speed 1557.652051 mbytes/sec
    Buffer 128 KB.  ../enwik9 1000mb : time 0.424019 seconds, speed 2358.385996 mbytes/sec
    Buffer 256 KB.  ../enwik9 1000mb : time 0.330525 seconds, speed 3025.493413 mbytes/sec
    Buffer 512 KB.  ../enwik9 1000mb : time 0.301158 seconds, speed 3320.516141 mbytes/sec
    Buffer   1 MB.  ../enwik9 1000mb : time 0.259577 seconds, speed 3852.415503 mbytes/sec
    Buffer   2 MB.  ../enwik9 1000mb : time 0.349860 seconds, speed 2858.284538 mbytes/sec
    Buffer   4 MB.  ../enwik9 1000mb : time 0.391924 seconds, speed 2551.515741 mbytes/sec
    Buffer   8 MB.  ../enwik9 1000mb : time 0.378953 seconds, speed 2638.851762 mbytes/sec
    Buffer  16 MB.  ../enwik9 1000mb : time 0.377242 seconds, speed 2650.821118 mbytes/sec
    Buffer  32 MB.  ../enwik9 1000mb : time 0.383296 seconds, speed 2608.946338 mbytes/sec
    Buffer  64 MB.  ../enwik9 1000mb : time 0.395234 seconds, speed 2530.149899 mbytes/sec
    Buffer 128 MB.  ../enwik9 1000mb : time 0.426388 seconds, speed 2345.282312 mbytes/sec
    Buffer 256 MB.  ../enwik9 1000mb : time 0.486861 seconds, speed 2053.973494 mbytes/sec
    Buffer 512 MB.  ../enwik9 1000mb : time 0.613266 seconds, speed 1630.614062 mbytes/sec
    Buffer   1 GB.  ../enwik9 1000mb : time 0.878621 seconds, speed 1138.146906 mbytes/sec
    Last edited by Matt Mahoney; 12th March 2014 at 00:14.

  11. #10
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 72 Times in 56 Posts
    Reading with mmap(2) may be the fastest of all.

    It's disappointing that getc and putc are slow, because writing buffering code is tiresome. stdio is supposed to handle buffering. Does setvbuf have any effect?
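    For what it's worth, here is a minimal sketch of the setvbuf experiment: install a large buffer right after fopen() and count bytes with getc(). The 1 MB size and the file name are arbitrary choices; whether it helps at all depends on the CRT, since any per-call locking overhead is unaffected by buffer size.

    Code:
    ```cpp
    // Sketch: enlarge stdio's buffer with setvbuf() before the first
    // read, then count bytes with plain getc().
    #include <cstdio>

    long count_bytes_buffered(const char* path) {
      FILE* f = std::fopen(path, "rb");
      if (!f) return -1;
      // setvbuf must be called after fopen() but before the first read.
      static char buf[1 << 20];  // 1 MB, arbitrary choice
      std::setvbuf(f, buf, _IOFBF, sizeof buf);
      long n = 0;
      while (std::getc(f) != EOF) ++n;
      std::fclose(f);
      return n;
    }

    int main() {
      // Create a small test file, then count its bytes.
      FILE* out = std::fopen("vbuf_test.bin", "wb");
      for (int i = 0; i < 10000; ++i) std::putc(i & 255, out);
      std::fclose(out);
      std::printf("%ld\n", count_bytes_buffered("vbuf_test.bin"));  // prints 10000
      std::remove("vbuf_test.bin");
      return 0;
    }
    ```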

  12. #11
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 72 Times in 56 Posts
    Quote Originally Posted by Matt Mahoney View Post
    When I wrote fpaq0, getc() and putc() were fast. In newer versions of g++ and MSVC that is no longer the case so I use fread() and fwrite() for better performance. For example on 100 MB text:

    cat enwik8 > out

    takes 2.3 seconds
    Perhaps you should have called that one dog.

  13. #12
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 72 Times in 56 Posts
    Quote Originally Posted by Bulat Ziganshin View Post
    again, one should know how to test. try "cat >nul" first. then check writing zeroes to file, on windows it's cached and has the same 2-4 GB/s speed on my cpu. try fread/fwrite solution. 50 MB/s from disk cache is absolutely meaningless
    I don't agree that it's meaningless. There are two components to performance that are worth testing. One is the CPU impact of getc/putc and the other is the impact on disk I/O. The OS might negate the impact on disk I/O, if they result in the same amount of read-ahead and buffering in the kernel.

  14. #13
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,572
    Thanks
    780
    Thanked 687 Times in 372 Posts
    nburns, the "disk I/O" part includes copying from the cache to the program buffer, which runs at 4 GB/s on my Windows + 2*ddr3-2000 config, and the actual ssd/hdd read, which may be slow. i believe the last component should be excluded from testing, both because disks vary in speed and because in real operation the data may already be cached

    Reading with mmap(2) may be the fastest of all.
    reading an mmapped file takes nearly zero time because the data are already there. but if the data aren't cached, on windows it may slow down execution, probably because the OS can't read ahead on such accesses. mmap is great for benchmarking algorithms, though, since you can work with large real files and still drop the copying-from-the-cache overhead
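    For illustration, a minimal POSIX-only sketch of the mmap approach (file name and checksum are arbitrary choices for the demo; error handling is pared down). The file's pages are accessed directly through the mapping, so there is no read() copy into a user buffer:

    Code:
    ```cpp
    // POSIX-only sketch: map a file read-only and checksum it without
    // any read()/copy into a user-space buffer.
    #include <cstdio>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>

    unsigned long sum_file_mmap(const char* path) {
      int fd = open(path, O_RDONLY);
      if (fd < 0) return 0;
      struct stat st;
      fstat(fd, &st);
      unsigned long sum = 0;
      if (st.st_size > 0) {
        const unsigned char* p = (const unsigned char*)
            mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {
          for (off_t i = 0; i < st.st_size; ++i) sum += p[i];
          munmap((void*)p, st.st_size);
        }
      }
      close(fd);
      return sum;
    }

    int main() {
      // Create a small test file of 1000 bytes, each with value 1.
      FILE* out = std::fopen("mmap_test.bin", "wb");
      for (int i = 0; i < 1000; ++i) std::putc(1, out);
      std::fclose(out);
      std::printf("%lu\n", sum_file_mmap("mmap_test.bin"));  // prints 1000
      std::remove("mmap_test.bin");
      return 0;
    }
    ```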

  15. #14
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 72 Times in 56 Posts
    Ah. You must have meant the opposite of what I thought you meant. I'd expect the most important difference between getc and fread to be the number of system calls that result. It would actually be simple to find out, at least in linux: if getc does lots of small reads, that would explain the slowness. In linux, you'd just run it under strace.

  16. #15
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,572
    Thanks
    780
    Thanked 687 Times in 372 Posts
    getc/fgetc use the buffer set by setvbuf. the problem is their implementation, e.g. locking to ensure correct multi-threaded behavior
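    A minimal sketch of sidestepping that lock: POSIX provides getc_unlocked(), which skips the per-call stream lock (MSVC's analogue is _getc_nolock). Only safe when a single thread touches the stream; the file name here is an arbitrary demo choice.

    Code:
    ```cpp
    // Sketch: getc_unlocked() avoids the per-call locking that plain
    // getc() performs for thread safety (POSIX-only; single-threaded
    // streams only).
    #include <cstdio>

    long count_bytes_unlocked(const char* path) {
      FILE* f = std::fopen(path, "rb");
      if (!f) return -1;
      long n = 0;
      while (getc_unlocked(f) != EOF) ++n;
      std::fclose(f);
      return n;
    }

    int main() {
      FILE* out = std::fopen("nolock_test.bin", "wb");
      for (int i = 0; i < 4096; ++i) std::putc('a', out);
      std::fclose(out);
      std::printf("%ld\n", count_bytes_unlocked("nolock_test.bin"));  // prints 4096
      std::remove("nolock_test.bin");
      return 0;
    }
    ```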

  17. #16
    Member
    Join Date
    Feb 2013
    Location
    San Diego
    Posts
    1,057
    Thanks
    54
    Thanked 72 Times in 56 Posts
    I think when you use fread(3) to read a large amount, it also bypasses the stdio buffer.
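    A minimal chunked-fread sketch of that idea (the 64 KB chunk size and file name are arbitrary choices): with requests at least as large as the stream's buffer, a typical stdio implementation reads straight into the caller's buffer instead of staging through its own.

    Code:
    ```cpp
    // Sketch: read a file in large fread() chunks and checksum it.
    #include <cstdio>

    long sum_file_fread(const char* path) {
      FILE* f = std::fopen(path, "rb");
      if (!f) return -1;
      static unsigned char buf[65536];  // 64 KB chunks, arbitrary choice
      long sum = 0;
      size_t n;
      while ((n = std::fread(buf, 1, sizeof buf, f)) > 0)
        for (size_t i = 0; i < n; ++i) sum += buf[i];
      std::fclose(f);
      return sum;
    }

    int main() {
      // 100000 bytes of value 2, so the checksum should be 200000.
      FILE* out = std::fopen("fread_test.bin", "wb");
      for (int i = 0; i < 100000; ++i) std::putc(2, out);
      std::fclose(out);
      std::printf("%ld\n", sum_file_fread("fread_test.bin"));  // prints 200000
      std::remove("fread_test.bin");
      return 0;
    }
    ```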
