Thread: File system repercussion on benchmarks

  1. #1
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    538
    Thanks
    238
    Thanked 92 Times in 72 Posts

    Question File system repercussion on benchmarks

    It is widely known that benchmarks are essential to the development of data compression algorithms. There are plenty of them, and every time a packer is released, we look at some of the more important ones to see how it performs.

    To make comparisons across machines possible, each benchmark specifies the CPU speed, the installed OS, the RAM model and so on.

    What I hadn't realized is how much the file system impacts speed... For example, I just made precomp run 7x faster simply by moving the test file from an NTFS partition to an ext4 one on the same disk. You can argue that the Linux NTFS driver isn't as good as a native implementation, and that's probably true, but I don't think it justifies a 7x slowdown...
    Of course, precomp is a somewhat special case because it makes intensive use of very small temp files (see the small sketch at the end of this post). But even when the final difference is not that big, it could still be big enough to deserve attention.

    So, what are your impressions? Have you seen something like this before? What are the numbers on your hard drives? Maybe we should start including the HD type and the FS used in every benchmark.
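
    For anyone who wants to try this on their own partitions, here is a rough sketch (not precomp's actual I/O code; the file count and size are arbitrary choices) that simply times creating, writing and deleting a lot of small temp files in the current directory:

    /* Rough sketch: time the creation, writing and deletion of many small
       temp files in the current directory, to compare how different
       filesystems handle small-file churn. Counts and sizes are arbitrary. */
    #include <stdio.h>
    #include <string.h>
    #include <time.h>

    #define NUM_FILES 5000
    #define FILE_SIZE 4096

    int main(void)
    {
        static char buf[FILE_SIZE];
        char name[64];
        struct timespec t0, t1;

        memset(buf, 0xAA, sizeof(buf));
        clock_gettime(CLOCK_MONOTONIC, &t0);

        for (int i = 0; i < NUM_FILES; i++) {
            snprintf(name, sizeof(name), "tmp_%05d.bin", i);
            FILE *f = fopen(name, "wb");
            if (f == NULL) {
                perror("fopen");
                return 1;
            }
            fwrite(buf, 1, sizeof(buf), f);
            fclose(f);
            remove(name);
        }

        clock_gettime(CLOCK_MONOTONIC, &t1);
        double secs = (double)(t1.tv_sec - t0.tv_sec) +
                      (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%d files of %d bytes: %.3f s\n", NUM_FILES, FILE_SIZE, secs);
        return 0;
    }

    Running it once from the NTFS partition and once from the ext4 partition should show how much of the difference comes from small-file handling alone.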

  2. #2
    Member
    Join Date
    Feb 2015
    Location
    United Kingdom
    Posts
    176
    Thanks
    28
    Thanked 74 Times in 44 Posts
    I have encountered speed oddities when running compression tests; for example, compressing a file a second time can take less than half the time of the first run. Here's a log of a test with such results: http://pastebin.com/T4K4LUig, tested on a 1TB Western Digital Red (NTFS format). I know the cache on the drive is too small to give it this much of a speed advantage on enwik9, so I'm curious what's actually going on during that second test.

  3. #3
    Member
    Join Date
    Aug 2014
    Location
    Argentina
    Posts
    538
    Thanks
    238
    Thanked 92 Times in 72 Posts
    Quote Originally Posted by Lucas
    I have encountered speed oddities when running compression tests; for example, compressing a file a second time can take less than half the time of the first run. Here's a log of a test with such results: http://pastebin.com/T4K4LUig, tested on a 1TB Western Digital Red (NTFS format). I know the cache on the drive is too small to give it this much of a speed advantage on enwik9, so I'm curious what's actually going on during that second test.
    Maybe the OS cache in RAM? I see that kind of speed-up with fast compressors like lz4 on test files up to roughly 1 GB, which is about how much free RAM I usually have.
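
    One way to rule the cache in or out on Linux (this needs root, and it drops the whole page cache, not just your test file) is to flush the caches between runs, with something like:

    /* Sketch: flush dirty data, then ask the kernel to drop clean caches.
       Linux-only, needs root; writing "3" to drop_caches drops the page
       cache plus dentries and inodes. */
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        sync();  /* write dirty pages back first */

        FILE *f = fopen("/proc/sys/vm/drop_caches", "w");
        if (f == NULL) {
            perror("drop_caches");
            return 1;
        }
        fputs("3\n", f);
        fclose(f);
        return 0;
    }

    If the second run is suddenly as slow as the first one again, it was the RAM cache.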

  4. #4
    Member
    Join Date
    Sep 2007
    Location
    Denmark
    Posts
    923
    Thanks
    57
    Thanked 115 Times in 92 Posts
    This goes way back to Windows 2000 times.

    Back then I would format my smaller partitions as FAT32 because of the small speed gains; that would typically be my temp partition, which receives a lot of small file writes.
    NTFS does have a bigger overhead compared to FAT32, but NTFS has some benefits with regard to data safety.

  5. #5
    Member jibz's Avatar
    Join Date
    Jan 2015
    Location
    Denmark
    Posts
    124
    Thanks
    106
    Thanked 71 Times in 51 Posts
    On Windows and Linux you can try telling the OS to remove a file from the cache before using it. Something along the lines of:

    /* Hint to the OS that any cached data for a file should be dropped,
       so the next (timed) access has to hit the disk again. */
    #if defined(_WIN32) || defined(__CYGWIN__)
    #include <windows.h>
    #else
    #include <fcntl.h>
    #include <unistd.h>
    #endif

    static void
    clear_file_from_cache(const char *name)
    {
    #if defined(_WIN32) || defined(__CYGWIN__)
        /* Opening and closing the file with FILE_FLAG_NO_BUFFERING is
           intended to make Windows discard its cached pages for it. */
        HANDLE hFile = CreateFile(name, GENERIC_READ, FILE_SHARE_READ, NULL,
                                  OPEN_EXISTING,
                                  FILE_ATTRIBUTE_NORMAL | FILE_FLAG_NO_BUFFERING,
                                  NULL);

        if (hFile != INVALID_HANDLE_VALUE) {
            CloseHandle(hFile);
        }
    #else
        int fd = open(name, O_RDONLY);

        if (fd != -1) {
            /* Flush any dirty data, then advise the kernel that the cached
               pages for this file are no longer needed. */
            fdatasync(fd);
            posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
            close(fd);
        }
    #endif
    }

  6. #6
    Member
    Join Date
    Dec 2013
    Location
    Italy
    Posts
    354
    Thanks
    12
    Thanked 35 Times in 29 Posts
    The problem is correlated with very big datasets and advanced filesystems.

    For example, I very often use ZFS on my machine, with built-in LZ compression, hash checking and (sometimes) even deduplication.

    So for my tests I use a ~60 GB ramdisk (I have a 128 GB machine), but in the real world (say 1 TB) using an SSD or a spinning disk is necessary.
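
    For reference, a minimal sketch of how such a ramdisk can be set up on Linux (needs root; /mnt/ramdisk is just an example path that must already exist; it is equivalent to a tmpfs mount with a 60G size limit):

    /* Sketch: mount a ~60 GB tmpfs ramdisk for benchmarking.
       Needs root; "/mnt/ramdisk" is an arbitrary example mount point. */
    #include <stdio.h>
    #include <sys/mount.h>

    int main(void)
    {
        if (mount("tmpfs", "/mnt/ramdisk", "tmpfs", 0, "size=60G") != 0) {
            perror("mount tmpfs");
            return 1;
        }
        return 0;
    }

    Benchmarking from a ramdisk takes the filesystem and the disk out of the measurement almost entirely, which is exactly why the numbers then stop matching what people see on SSDs or spinning disks.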
