
Thread: Random reads vs random writes

  1. #1
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,475
    Thanks
    26
    Thanked 121 Times in 95 Posts

    Random reads vs random writes

    Update:
    The new program version is here: http://encode.su/threads/1285-Random...ll=1#post25070 Please test the new version.
    End update.

    I've always thought that random writes should be slower on a CPU than random reads. I thought that the CPU, in order to write something to memory, must read the cache line, modify it and then write the cache line back. Unfortunately (or maybe fortunately) this doesn't make writes slower: on my system random writes are much faster than random reads.

    Here is the testing program: https://ideone.com/6Ddrx (**ATTENTION**: I've added Mersenne Twister. Previous versions used rand(), which is totally broken.)
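    The program itself is only linked, so here is a minimal sketch of the two loops being compared. It is not the author's code: it assumes a hypothetical xorshift32 generator in place of Mersenne Twister, precomputes the positions into an index array (which adds a sequential read stream of its own), and the names make_indices, random_write, random_read and run_benchmark are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical xorshift32 generator; a stand-in for the Mersenne Twister
   the linked program uses, chosen only to keep the sketch short. */
static uint32_t xorshift32(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Precompute n pseudo-random positions in [0, size). */
static void make_indices(size_t *idx, size_t n, size_t size, uint32_t seed) {
    assert(size > 0);
    uint32_t state = seed ? seed : 1u;  /* xorshift must not start at 0 */
    for (size_t i = 0; i < n; i++)
        idx[i] = xorshift32(&state) % size;
}

/* Random writes: each byte store lands on an unpredictable cache line. */
static void random_write(uint8_t *buf, const size_t *idx, size_t n) {
    for (size_t i = 0; i < n; i++)
        buf[idx[i]] = (uint8_t)i;
}

/* Random reads: the sum is returned so the loads cannot be optimized away. */
static long random_read(const uint8_t *buf, const size_t *idx, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += buf[idx[i]];
    return sum;
}

/* Time both loops in clock() ticks over a buffer of `size` bytes. */
static void run_benchmark(size_t size, size_t n) {
    uint8_t *buf = calloc(size, 1);
    size_t *idx = malloc(n * sizeof *idx);
    if (buf == NULL || idx == NULL) {
        free(buf);
        free(idx);
        return;
    }
    make_indices(idx, n, size, 12345u);

    clock_t t0 = clock();
    random_write(buf, idx, n);
    clock_t t1 = clock();
    long control = random_read(buf, idx, n);
    clock_t t2 = clock();

    printf("CLOCKS_PER_SEC: %ld\n", (long)CLOCKS_PER_SEC);
    printf("Random write time: %ld\n", (long)(t1 - t0));
    printf("Random read time:  %ld, control value: %ld\n", (long)(t2 - t1), control);
    free(idx);
    free(buf);
}
```

    For the numbers to mean anything, the buffer should be much larger than the CPU's caches; with a small buffer both loops mostly hit in cache.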

    I have a request for you: please run that program and post the results, together with a description of your system (CPU, RAM, OS and compiler). Run the program at least a few times and then choose the most representative results (e.g. the most repeatable/reproducible ones).

    My example results are:
    Code:
    1000000
    700000
    787058355
    3450000
    -59746632
    4910000
    My system is:
    CPU: Intel Core 2 E8400, 3.00 GHz
    RAM: 8 GiB, DDR2 800 MHz, CL5, dual-channel
    OS: Ubuntu 10.10 64-bit
    Compiler: GCC 4.4.4, 64-bit output binary, compilation options: -O3

    As you can see, the speed ratio of random reads to random writes on my system differs from that on ideone's system. I'm curious about that ratio on different systems (setups).
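    Comparing results across systems needs a little arithmetic, because the first number printed is CLOCKS_PER_SEC (as the later, labeled version of the program shows): it is 1000 in the MinGW runs in this thread and 1000000 in the Ubuntu runs, so raw tick counts differ by a factor of 1000. A small sketch of the normalization, with hypothetical helper names:

```c
#include <assert.h>

/* Convert raw clock() ticks to seconds. clocks_per_sec is whatever the
   program printed as CLOCKS_PER_SEC (1000 or 1000000 in this thread). */
static double ticks_to_seconds(long ticks, long clocks_per_sec) {
    assert(clocks_per_sec > 0);
    return (double)ticks / (double)clocks_per_sec;
}

/* Unit-free read/write ratio: values above 1.0 mean random reads were
   slower than random writes. Both times must come from the same clock. */
static double read_write_ratio(long read_ticks, long write_ticks) {
    assert(write_ticks > 0);
    return (double)read_ticks / (double)write_ticks;
}
```

    For example, the labeled desktop run later in the thread (5160000 read vs 3600000 write ticks) gives a ratio of about 1.43, while m^2's Pentium D run (23422 vs 20813) gives about 1.13; the ratio can be compared directly even though the tick units differ.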

    Update - results on my netbook:
    CPU: AMD Zacate E-350, 1.60 GHz
    RAM: 4 GiB, DDR3 1066 MHz, single-channel
    OS: Ubuntu 11.04 64-bit
    Compiler: GCC 4.5.2, 64-bit output binary, compilation options: -O3
    Code:
    1000000
    2060000
    787058355
    20500000
    -59746632
    20690000
    Last edited by Piotr Tarsa; 6th May 2011 at 18:58.

  2. #2
    Member m^2
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Pentium D 2.66 GHz, 1 GB DDR2 533, XP x64, MinGW 4.5.2 (32-bit), -O3
    Code:
    1000
    1984
    787058355
    17562
    -59746632
    22360
    
    C:\MinGW\bin>a
    1000
    1968
    787058355
    18141
    -59746632
    22656
    I won't run it on my Atom netbook because unblocked ads on the website annoyed me.

  3. #3
    Tester
    Black_Fox
    Join Date
    May 2008
    Location
    [CZE] Czechia
    Posts
    471
    Thanks
    26
    Thanked 9 Times in 8 Posts
    Intel Q6600 @3GHz, 2x2 GB DDR2 1066, Win7 x64, MinGW 4.5.1 (32-bit), compiled with -O3

    Code:
    1000
    765
    787058355
    3703
    -59746632
    5088

  4. #4
    Programmer osmanturan
    Join Date
    May 2008
    Location
    Mersin, Turkiye
    Posts
    651
    Thanks
    0
    Thanked 0 Times in 0 Posts
    I guess you already know this, but I want to share it anyway: have you considered using the MOVNTQ instruction to minimize cache pollution and thus write faster?

    http://www.rz.uni-karlsruhe.de/rz/do...ence/vc198.htm

  5. #5
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,475
    Thanks
    26
    Thanked 121 Times in 95 Posts
    Thanks for the tests.

    Osman:
    Yes, I know about cacheless (non-temporal) instructions. But they are probably useless here, as I want to write single bytes to random locations. With MOVNTQ I would have to read 8 bytes, change one byte in them, and then write those 8 bytes back. That could be even slower than a plain single-byte write.
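    The read-modify-write cost described above can be made concrete. This is a sketch under stated assumptions, not code from the thread: it uses MOVNTI through the SSE2 intrinsic _mm_stream_si32 (4-byte granularity) instead of the MMX MOVNTQ instruction (8 bytes), requires a 4-byte-aligned buffer, falls back to a plain store when SSE2 is not enabled, and nt_store_byte / nt_store_fence are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_stream_si32 (MOVNTI), _mm_sfence */
#endif

/* Store one byte with a non-temporal hint: read the aligned 4-byte word that
   holds it, patch the byte, and stream the whole word back. */
static void nt_store_byte(uint8_t *buf, size_t pos, uint8_t value) {
    assert(((uintptr_t)buf & 3u) == 0);  /* word accesses need 4-byte alignment */
    size_t base = pos & ~(size_t)3;      /* start of the containing word */
    uint8_t word[4];
    memcpy(word, buf + base, sizeof word);  /* the extra read the post worries about */
    word[pos - base] = value;               /* modify a single byte */
#if defined(__SSE2__)
    int w;
    memcpy(&w, word, sizeof w);
    _mm_stream_si32((int *)(void *)(buf + base), w);  /* hint: bypass the caches */
#else
    memcpy(buf + base, word, sizeof word);            /* plain store fallback */
#endif
}

/* Streaming stores are weakly ordered with respect to other stores;
   fence before publishing the data to other threads. */
static void nt_store_fence(void) {
#if defined(__SSE2__)
    _mm_sfence();
#endif
}
```

    Every byte written this way costs a read of its word plus a full-word write, which is why it is plausible that it loses to an ordinary single-byte store here.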

    Cache pollution isn't a problem here, as my use case, i.e. retrieving BWT-transformed data from a suffix array, is completely incompatible with caching. If the cache can't help me speed up that process, I don't care what it holds.
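    To show why this access pattern defeats caching: one standard way to obtain the BWT from a suffix array is bwt[i] = text[sa[i] - 1] (cyclically), and the same relation can be run in reverse with the inverse suffix array as bwt[isa[j]] = text[j - 1]. The first form does random reads, the second random writes. This is a minimal sketch, not the author's code; the naive suffix sort and all function names are illustrative, and it assumes the text ends with a unique '$' sentinel that sorts first.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static const char *g_text;  /* text seen by the qsort comparator below */

static int cmp_suffix(const void *a, const void *b) {
    size_t i = *(const size_t *)a;
    size_t j = *(const size_t *)b;
    return strcmp(g_text + i, g_text + j);
}

/* Naive suffix array, for tiny illustrative inputs only. */
static void build_suffix_array(const char *text, size_t *sa, size_t n) {
    for (size_t i = 0; i < n; i++)
        sa[i] = i;
    g_text = text;
    qsort(sa, n, sizeof *sa, cmp_suffix);
}

/* Gather form: output written sequentially, input read at random positions. */
static void bwt_gather(const char *text, const size_t *sa, char *bwt, size_t n) {
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        bwt[i] = text[sa[i] == 0 ? n - 1 : sa[i] - 1];
    bwt[n] = '\0';
}

/* Scatter form: text read sequentially, each byte written to a random slot,
   using the inverse suffix array (isa[sa[r]] == r). */
static void bwt_scatter(const char *text, const size_t *isa, char *bwt, size_t n) {
    assert(n > 0);
    for (size_t j = 0; j < n; j++)
        bwt[isa[j]] = text[j == 0 ? n - 1 : j - 1];
    bwt[n] = '\0';
}
```

    For "banana$" both forms produce "annb$aa"; on real inputs of hundreds of megabytes, either the reads or the writes jump around the whole array.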

    The thing that could possibly help is prefetching. As I know in advance the addresses I want to access, I could prefetch them very early. However, specifying -fprefetch-loop-arrays -march=core2 in addition to -O3 doesn't bring any benefit over plain -O3.

    I have emailed Agner Fog asking for an explanation of why writes are faster than reads.
    Last edited by Piotr Tarsa; 3rd May 2011 at 23:14.

  6. #6
    Member
    Join Date
    Mar 2011
    Location
    Google Switzerland
    Posts
    19
    Thanks
    0
    Thanked 0 Times in 0 Posts
    This is probably simply the effect of the write-back cache. For a random write, the write happens in the background while you're going to the next loop iteration; for a random read, the CPU at some point has to wait for the read to complete before continuing. (If you unroll the loop, you might be able to keep more reads in flight, which may or may not help overall speed.)
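    A sketch of the unrolling idea, assuming the positions are precomputed in an index array and using four independent accumulators so the loads in one iteration share no data dependency. The function name is hypothetical, and whether this helps depends on the CPU: an out-of-order core may already overlap the loads of the simple loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Random reads unrolled by 4 with independent accumulators. */
static long random_read_unrolled(const uint8_t *buf, const size_t *idx, size_t n) {
    assert(n == 0 || (buf != NULL && idx != NULL));
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += buf[idx[i]];
        s1 += buf[idx[i + 1]];
        s2 += buf[idx[i + 2]];
        s3 += buf[idx[i + 3]];
    }
    for (; i < n; i++)  /* remaining 0..3 elements */
        s0 += buf[idx[i]];
    return s0 + s1 + s2 + s3;
}
```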

    Prefetching would probably help a lot, at least if you do something with your data in addition to just reading it, but you can't really rely on the compiler to figure out cases like this easily. Use __builtin_prefetch() manually and stay a few iterations ahead of your actual reading (experiment with different values here), and you should see a marked improvement.

    /* Steinar */
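    A minimal sketch of the manual prefetching Steinar describes, assuming the positions are known in advance in an index array. __builtin_prefetch(addr, rw, locality) is the GCC/Clang builtin (rw 0 = read; locality 3 = high temporal locality); the function name and the dist parameter (how many iterations to stay ahead) are illustrative, and dist should be tuned experimentally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Random reads with software prefetch `dist` iterations ahead. */
static long random_read_prefetch(const uint8_t *buf, const size_t *idx,
                                 size_t n, size_t dist) {
    assert(n == 0 || (buf != NULL && idx != NULL));
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + dist < n)  /* never index idx[] past its end */
            __builtin_prefetch(&buf[idx[i + dist]], 0, 3);
        sum += buf[idx[i]];
    }
    return sum;
}
```

    The bounds check matters for idx rather than for the prefetch itself: a prefetch hint does not fault, but reading idx[i + dist] beyond the array would be undefined behavior.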

  7. #7
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,475
    Thanks
    26
    Thanked 121 Times in 95 Posts
    Many thanks! That function greatly improves random read speed.

    My new program is here: http://pastebin.com/yM3DsrLN (Note: it didn't compile from the command line, but it compiles in NetBeans; I don't really know the reason for that.)

    Results from NetBeans 6.9 on my desktop setup (as described in the first post), except that the compiler options are the NetBeans Release profile defaults:
    Code:
    CLOCKS_PER_SEC: 1000000
    Linear write time: 780000
    Random write time: 3600000
    Random read time:  5160000, control value: -59606515
    Random read time:  2160000, control value: -59606515, prefetch queue size: 8
    
    RUN SUCCESSFUL (total time: 12s)
    Additionally, here are results on my netbook; again I ran it in NetBeans (version 7.0 this time) with the Release profile:
    Code:
    CLOCKS_PER_SEC: 1000000
    Linear write time: 2000000
    Random write time: 21160000
    Random read time:  21560000, control value: -59606515
    Random read time:  10420000, control value: -59606515, prefetch queue size: 8
    
    RUN SUCCESSFUL (total time: 55s)
    Two cases are too small a data set to draw conclusions from, but this knowledge is nevertheless very valuable.
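    The output above mentions a "prefetch queue size: 8". The linked program is not reproduced here, so the following is only a guess at one way such a queue could work: when positions come from a generator on the fly rather than from a precomputed array, each new position is prefetched as soon as it is generated, parked in a small ring buffer, and actually read 8 steps later. QUEUE_SIZE, next_index (a hypothetical LCG) and random_read_queued are all illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_SIZE 8  /* hypothetical: matches the "prefetch queue size: 8" output */

/* Hypothetical on-the-fly position source (a linear congruential step). */
static uint32_t next_index(uint32_t *state, uint32_t size) {
    *state = *state * 1664525u + 1013904223u;
    return *state % size;
}

/* Reads n random bytes; each position is prefetched when generated and
   consumed QUEUE_SIZE steps later, in the same order it was generated. */
static long random_read_queued(const uint8_t *buf, uint32_t size, size_t n, uint32_t seed) {
    assert(size > 0);
    uint32_t queue[QUEUE_SIZE];
    uint32_t state = seed;
    long sum = 0;
    size_t head = 0;
    size_t fill = n < QUEUE_SIZE ? n : QUEUE_SIZE;

    for (size_t k = 0; k < fill; k++) {  /* prime the queue */
        queue[k] = next_index(&state, size);
        __builtin_prefetch(&buf[queue[k]], 0, 0);
    }
    for (size_t i = 0; i < n; i++) {
        sum += buf[queue[head]];          /* consume the oldest entry */
        if (i + QUEUE_SIZE < n) {         /* refill the freed slot */
            queue[head] = next_index(&state, size);
            __builtin_prefetch(&buf[queue[head]], 0, 0);
        }
        head = (head + 1) % QUEUE_SIZE;
    }
    return sum;
}
```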



    m^2:
    Could you run the new program?

    I would also like someone with a Nehalem- or Sandy Bridge-based processor to post results.
    Last edited by Piotr Tarsa; 4th May 2011 at 12:11.

  8. #8
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Code:
    C:\MinGW\bin>1
    CLOCKS_PER_SEC: 1000
    Linear write time: 2046
    Random write time: 20813
    Random read time:  23422, control value: -59606515
    Random read time:  23187, control value: -59606515, prefetch queue size: 8
    
    C:\MinGW\bin>1
    CLOCKS_PER_SEC: 1000
    Linear write time: 1984
    Random write time: 20641
    Random read time:  23234, control value: -59606515
    Random read time:  23391, control value: -59606515, prefetch queue size: 8

  9. #9
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,475
    Thanks
    26
    Thanked 121 Times in 95 Posts
    m^2:
    For which architecture did you compile the program? Intel processors older than the Pentium III (the first with SSE) don't have the prefetch instructions, so if you compile for 32-bit x86, add a switch to target, for example, Atom (and not the default 386 or something).
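    One way to check whether a given build may use x86 prefetch at all is to test the compiler's predefined macros. A sketch assuming a GCC-compatible compiler on x86; the function names are hypothetical. GCC defines __SSE__ when SSE is enabled (for example with -march=pentium3 or later, and always on x86-64) and __3dNOW__ for AMD's 3DNow!, which has its own PREFETCH; GCC's documentation for __builtin_prefetch says no prefetch code is generated when the target lacks data prefetch support.

```c
#include <assert.h>
#include <stdio.h>

/* 1 if this build was allowed to emit x86 prefetch instructions, else 0. */
static int x86_prefetch_enabled(void) {
#if defined(__SSE__) || defined(__3dNOW__)
    return 1;
#else
    return 0;
#endif
}

static void report_prefetch_support(void) {
    printf("x86 prefetch instructions enabled in this build: %s\n",
           x86_prefetch_enabled() ? "yes" : "no");
}
```

    A default 32-bit MinGW build would be expected to print "no", and the same source rebuilt with -march=pentium4 to print "yes".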

  10. #10
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Added -march=pentium4:
    Code:
    C:\MinGW\bin>1
    CLOCKS_PER_SEC: 1000
    Linear write time: 1890
    Random write time: 19000
    Random read time:  21141, control value: -59606337
    Random read time:  10625, control value: -59606337, prefetch queue size: 8
    
    C:\MinGW\bin>1
    CLOCKS_PER_SEC: 1000
    Linear write time: 2078
    Random write time: 18656
    Random read time:  20391, control value: -59606515
    Random read time:  11250, control value: -59606515, prefetch queue size: 8

  11. #11
    Member Raymond_NGhM
    Join Date
    Oct 2008
    Location
    UK
    Posts
    51
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Another view, from the EVEREST cache & memory benchmark tool,
    tested on my old PC at least 3 years ago,
    measuring all three of Read, Write and Copy.

    Note that in this case the CPU and RAM are overclocked; OS: Windows XP SP2;
    CPU cache sizes: L1 32 KB, L2 128 KB.

    Note that it shows full results for each memory level: RAM, L1, L2 and L3.
    Attached thumbnail: cache_mem_RWC.png (51.0 KB)

  12. #12
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    Raymond, stuff created by computers is not copyrightable, so your copyright claim on the picture is meaningless.

  13. #13
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,475
    Thanks
    26
    Thanked 121 Times in 95 Posts
    Raymond:
    This report doesn't mention the number of write buffers in the CPU or the effect of prefetching reads, so the numbers are pretty much irrelevant from my point of view. Try compiling and running my updated program to see the results. I bet they won't match the numbers Everest provided; Everest simply measures different things.


    Quote Originally Posted by m^2
    Raymond, stuff created by computers is not copyrightable, so your copyright claim on the picture is meaningless.
    What do you mean by "created by computers"? If I create artwork in GIMP or something, then it isn't copyrightable? Or if I spend several hours tuning parameters in some program that generates fractals?
    Last edited by Piotr Tarsa; 6th May 2011 at 19:04.

  14. #14
    Expert
    Matt Mahoney
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    Actually, no. You can copyright the picture. But you can't copyright data, like the table displayed in the picture.

  15. #15
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    I discussed the case of game screenshots with an IP lawyer and it was obvious to her that there's no copyright. The reasons didn't come up in that conversation, but my interpretation is that there is no creativity whatsoever, and the only case where you get copyright on things that are not creative is databases.

  16. #16
    Expert
    Matt Mahoney
    Join Date
    May 2008
    Location
    Melbourne, Florida, USA
    Posts
    3,255
    Thanks
    306
    Thanked 779 Times in 486 Posts
    My interpretation is that when you take a screenshot of a game, you are not copying the game. You are creating a picture. Just like if you take a picture of me or something I own, you still own the copyright on the photo.

  17. #17
    Member m^2's Avatar
    Join Date
    Sep 2008
    Location
    Ślůnsk, PL
    Posts
    1,611
    Thanks
    30
    Thanked 65 Times in 47 Posts
    The discussion was about whether review sites, not game makers, hold copyright on screenshots.
    And as to photos - I don't know much about them, but I'm pretty sure that not all are copyrightable.

  18. #18
    Member Raymond_NGhM
    Join Date
    Oct 2008
    Location
    UK
    Posts
    51
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Piotr Tarsa
    Raymond:
    This report doesn't mention the number of write buffers in the CPU or the effect of prefetching reads, so the numbers are pretty much irrelevant from my point of view. Try compiling and running my updated program to see the results. I bet they won't match the numbers Everest provided; Everest simply measures different things.
    CPU: Look at screenshot
    RAM: 512MB SDRAM
    OS: Windows XP SP3
    Compiler: MinGW GCC v4.5.1; the first build uses the default target, the second targets i486

    Code:
    L:\>rw_i386
    CLOCKS_PER_SEC: 1000
    Linear write time: 9603
    Random write time: 51775
    Random read time:  47708, control value: -59606335
    Random read time:  41069, control value: -59606335, prefetch queue size: 8
    
    L:\>rw_i486
    CLOCKS_PER_SEC: 1000
    Linear write time: 9593
    Random write time: 51835
    Random read time:  47708, control value: -59606335
    Random read time:  47699, control value: -59606335, prefetch queue size: 8
    What's the difference between the previous benchmark screenshot and this one?
    Only that I upgraded the OS from SP2 to SP3 (7 months ago), which improved
    the results in the same test.
    If you look carefully at the bottom of the left screenshot, you'll find a "Save" button
    that automatically saves the benchmark as a .PNG image.

    Also here:
    a memory benchmark guide from Lavalys EVEREST Ultimate Edition v4.50.1330, plus a test sample.
    Attached thumbnail: CacheMem_RWC.png (48.7 KB)
    Attached Files

  19. #19
    Member Raymond_NGhM
    Join Date
    Oct 2008
    Location
    UK
    Posts
    51
    Thanks
    0
    Thanked 0 Times in 0 Posts
    To ALL:

    How is it that hardware is fixed in one body, but software is not?
    Because nowadays software must quickly become compatible with
    hardware for optimal performance...
    Text (characters), graphics (pixels), binaries (code):
    all are created from "one base of little bits" in a virtual space,
    which a "Human" can see or "Touch" with invisible hands,
    but not directly in a physical way (...always needing to buy enough time):
    plan, theorize, advise, develop, assemble, compile, test, result, ...

    To get to the available answer, let those who have understanding:
    ... so let them "SHALL BE DONE BY 'BYTE' MODIFICATION"

    Like a word game:
    [Who you know who i know, so they know who human wants what to know...]
    (c) 2011 by Raymond N.GhM
    Last edited by Raymond_NGhM; 15th May 2011 at 19:39.

  20. #20
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,475
    Thanks
    26
    Thanked 121 Times in 95 Posts
    PREFETCH instructions are part of the SSE instruction set. Does your program use SSE extensions? No, because you forbade the compiler to use them. Your processor supports SSE, so you can enable it and test if you really want.

  21. #21
    Member Raymond_NGhM's Avatar
    Join Date
    Oct 2008
    Location
    UK
    Posts
    51
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Piotr Tarsa
    PREFETCH instructions are part of the SSE instruction set. Does your program use SSE extensions? No, because you forbade the compiler to use them. Your processor supports SSE, so you can enable it and test if you really want.
    Built with -march=pentium3 (MMX, SSE)

    Code:
    L:\>rw_P3_SSE
    CLOCKS_PER_SEC: 1000
    Linear write time: 9433
    Random write time: 51624
    Random read time:  45906, control value: -59606515
    Random read time:  43533, control value: -59606515, prefetch queue size: 8
    Well, as I predicted before, no real effect;
    the main program is not stable, at least on this system...
    Look at the tested "Instructions_Latency_dump.txt" in the .7z archive.

  22. #22
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,475
    Thanks
    26
    Thanked 121 Times in 95 Posts
    Thanks for testing. Anyway, it looks weird. Maybe prefetching was disabled in Celerons?

  23. #23
    Member Raymond_NGhM's Avatar
    Join Date
    Oct 2008
    Location
    UK
    Posts
    51
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Piotr Tarsa
    Maybe prefetching was disabled in Celerons?
    Try looking at the disassembled program, generated directly from the object file,
    and at the same time look into the tested "Instructions_Latency_dump.txt"
    to find the "PREFETCHxx" instruction(s)...

    Code:
    :
    :
    :
    Inst  679 SSE   : PREFETCHNTA [mem]             L: [memory dep.]    T:   1.05ns=  1.00c
    Inst  680 SSE   : PREFETCHT0 [mem]              L: [memory dep.]    T:   1.05ns=  1.00c
    Inst  681 SSE   : PREFETCHT1 [mem]              L: [memory dep.]    T:   1.05ns=  1.00c
    Inst  682 SSE   : PREFETCHT2 [mem]              L: [memory dep.]    T:   1.05ns=  1.00c
    :
    :
    :
    It's simple; you can find the answer to your question there...
    but the best way is to put inline asm directly into the program; don't blindly trust well-known compilers...
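    A sketch of the inline-asm approach, assuming GCC-style extended asm on x86; prefetch_nta and random_read_asm_prefetch are hypothetical names. The __SSE__ guard keeps the example buildable everywhere, but it also means a -march=i486 build still gets no prefetch; removing the guard forces PREFETCHNTA into every x86 build, which then only makes sense on CPUs that support SSE (the Celeron's latency dump above lists the PREFETCH instructions).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Explicit PREFETCHNTA via extended inline asm, independent of whether the
   compiler would expand __builtin_prefetch into an instruction. */
static inline void prefetch_nta(const void *p) {
#if (defined(__i386__) || defined(__x86_64__)) && defined(__SSE__)
    __asm__ volatile("prefetchnta %0" : : "m"(*(const char *)p));
#else
    (void)p;  /* no SSE enabled for this build: emit nothing */
#endif
}

/* Random reads that prefetch `dist` iterations ahead using the asm helper. */
static long random_read_asm_prefetch(const uint8_t *buf, const size_t *idx,
                                     size_t n, size_t dist) {
    assert(n == 0 || (buf != NULL && idx != NULL));
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + dist < n)
            prefetch_nta(&buf[idx[i + dist]]);
        sum += buf[idx[i]];
    }
    return sum;
}
```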
    Attached Files
