Thread: How does memory speed affect compression time?

  1. #1
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,013
    Thanks
    404
    Thanked 403 Times in 153 Posts

    How does memory speed affect compression time?

    So, do we need faster RAM?

    Previously, I did a quick test with LZMA, and faster RAM showed a pretty nice boost.
    This time I decided to run something more interesting. First of all, I will test the performance of PAQ8L - the latest PAQ8 from Matt. Secondly, I will use more DIMMs.

    System specs

    Intel Core i7-2700K @ 4.6 GHz watercooled with Corsair H80
    ASRock Z68M-ITX/HT
    240 GB SSD Corsair Force GT

    Every memory kit is a pair of 4 GB modules, so each setup has 8 GB.

    Results: memory modules and the time needed to compress ENWIK8:

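    Kingston KVR 1333 MHz - 4210.03 sec
    Corsair Vengeance LP 1600 MHz - 4066.51 sec
    Corsair Dominator GT 1866 MHz - 3996.57 sec
    G.Skill Ares 2133 MHz - 3929.55 sec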

    Quote Originally Posted by The detailed log

    Kingston KVR 1333 MHz

    C:\>timer paq8l -8 enwik8

    Timer 3.01 Copyright (c) 2002-2003 Igor Pavlov 2003-07-10
    Creating archive enwik8.paq8l with 1 file(s)...
    enwik8 100000000 -> 17916420
    100000000 -> 17916450
    Time 4210.03 sec, used 1643021323 bytes of memory

    Kernel Time = 1.435 = 00:00:01.435 = 0%
    User Time = 4203.540 = 01:10:03.540 = 99%
    Process Time = 4204.975 = 01:10:04.975 = 99%
    Global Time = 4210.093 = 01:10:10.093 = 100%


    Corsair Vengeance LP 1600 MHz

    C:\>timer paq8l -8 enwik8

    Timer 3.01 Copyright (c) 2002-2003 Igor Pavlov 2003-07-10
    Creating archive enwik8.paq8l with 1 file(s)...
    enwik8 100000000 -> 17916420
    100000000 -> 17916450
    Time 4066.51 sec, used 1643021323 bytes of memory

    Kernel Time = 1.216 = 00:00:01.216 = 0%
    User Time = 4063.373 = 01:07:43.373 = 99%
    Process Time = 4064.590 = 01:07:44.590 = 99%
    Global Time = 4066.571 = 01:07:46.571 = 100%


    Corsair Dominator GT 1866 MHz

    C:\>timer paq8l -8 enwik8

    Timer 3.01 Copyright (c) 2002-2003 Igor Pavlov 2003-07-10
    Creating archive enwik8.paq8l with 1 file(s)...
    enwik8 100000000 -> 17916420
    100000000 -> 17916450
    Time 3996.57 sec, used 1643021323 bytes of memory

    Kernel Time = 0.982 = 00:00:00.982 = 0%
    User Time = 3994.187 = 01:06:34.187 = 99%
    Process Time = 3995.170 = 01:06:35.170 = 99%
    Global Time = 3996.637 = 01:06:36.637 = 100%


    G.Skill Ares 2133 MHz

    C:\>timer paq8l -8 enwik8

    Timer 3.01 Copyright (c) 2002-2003 Igor Pavlov 2003-07-10
    Creating archive enwik8.paq8l with 1 file(s)...
    enwik8 100000000 -> 17916420
    100000000 -> 17916450
    Time 3929.55 sec, used 1643021323 bytes of memory

    Kernel Time = 1.107 = 00:00:01.107 = 0%
    User Time = 3926.981 = 01:05:26.981 = 99%
    Process Time = 3928.089 = 01:05:28.089 = 99%
    Global Time = 3929.603 = 01:05:29.603 = 100%
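    The four logs above can be compared directly by pulling out the Global Time line of each run. The snippet below is only a sketch: the regular expression is an assumption based on the Timer 3.01 output quoted in this post, and `global_seconds` is a hypothetical helper name, not part of any tool used here.

```python
import re

# Matches lines such as "Global Time = 4210.093 = 01:10:10.093 = 100%".
# The pattern is an assumption based on the Timer 3.01 output quoted above.
GLOBAL_TIME = re.compile(r"Global Time\s*=\s*([0-9.]+)\s*=")


def global_seconds(log_text):
    """Return the wall-clock seconds from one Timer log, or None if absent."""
    match = GLOBAL_TIME.search(log_text)
    return float(match.group(1)) if match else None


# Global Time lines copied from the four runs above, keyed by memory clock (MHz).
logs = {
    1333: "Global Time = 4210.093 = 01:10:10.093 = 100%",
    1600: "Global Time = 4066.571 = 01:07:46.571 = 100%",
    1866: "Global Time = 3996.637 = 01:06:36.637 = 100%",
    2133: "Global Time = 3929.603 = 01:05:29.603 = 100%",
}

times = {mhz: global_seconds(text) for mhz, text in logs.items()}
baseline = times[1333]

# Speedup of each kit relative to the 1333 MHz run.
speedups = {mhz: baseline / t for mhz, t in times.items()}

for mhz in sorted(times):
    print(f"{mhz} MHz: {times[mhz]:9.3f} s, {speedups[mhz]:.3f}x vs 1333 MHz")
```

    The same helper should also work on a complete log pasted as one string, since it only searches for the Global Time line.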

  2. #2
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,497
    Thanks
    26
    Thanked 132 Times in 102 Posts
    Many thanks!

    So to recap:
    With 7-zip, 1.6x faster memory (2133/1333) resulted in 1.169x higher performance, while with PAQ8L it resulted in 1.072x higher performance.

    The results are somewhat unexpected to me. PAQ8L was said to be memory bottlenecked, but that doesn't seem to be the case with your setup. Multithreaded tests would probably expose larger benefits from higher-clocked memory.
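    One rough way to put the two recap figures side by side is to ask what share of the memory clock increase shows up as a speed gain. This is just a sketch: `realized_fraction` is a hypothetical helper, and treating the relation as linear is an assumption, not something the measurements prove.

```python
def realized_fraction(clock_ratio, speedup):
    """Share of the memory clock increase that appears as a speed gain.

    1.0 would mean performance scales fully with memory clock,
    0.0 would mean memory clock makes no difference.
    """
    return (speedup - 1.0) / (clock_ratio - 1.0)


clock_ratio = 2133 / 1333  # about 1.6, as in the recap above

paq8l = realized_fraction(clock_ratio, 1.072)
sevenzip = realized_fraction(clock_ratio, 1.169)

print(f"PAQ8L: {paq8l:.0%} of the memory clock gain realized")
print(f"7-zip: {sevenzip:.0%} of the memory clock gain realized")
```

    With these inputs PAQ8L lands at roughly one eighth and 7-zip at a bit over a quarter, so neither scales anywhere near linearly with memory clock.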

  3. #3
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,977
    Thanks
    296
    Thanked 1,304 Times in 740 Posts
    AFAIR, paq8 basically needs to read single bytes from random locations, while for lzma there are matched strings and 8-byte bt4 nodes.

  4. #4
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,497
    Thanks
    26
    Thanked 132 Times in 102 Posts
    So? In absolute time, CL9 at 2133 MHz is 1.6 times lower than CL9 at 1333 MHz.
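    The absolute latency behind that claim can be worked out as follows. The sketch assumes the "MHz" figures in this thread are DDR3 data rates (two transfers per memory clock), so one clock cycle lasts 2000 / rate nanoseconds; `cas_latency_ns` is a hypothetical helper name.

```python
def cas_latency_ns(cl_cycles, data_rate):
    """CAS latency in nanoseconds for a DDR module.

    DDR transfers data twice per clock, so the clock in MHz is
    data_rate / 2 and one cycle lasts 2000 / data_rate nanoseconds.
    """
    return cl_cycles * 2000.0 / data_rate


slow = cas_latency_ns(9, 1333)
fast = cas_latency_ns(9, 2133)

print(f"CL9 @ 1333: {slow:.2f} ns")
print(f"CL9 @ 2133: {fast:.2f} ns")
print(f"ratio: {slow / fast:.2f}")
```

    This covers only the CAS component of an access; other timings and the memory controller add latency that does not shrink by the same factor.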

  5. #5
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,013
    Thanks
    404
    Thanked 403 Times in 153 Posts
    For sure, any LZ-type coder is more bottlenecked by memory bandwidth than a computationally complex CM.

    In some LZ implementations, string searching takes 99% of the time.

    CM performs many complex computations paired with random memory access, so CPU power is the real bottleneck. For example, with the CPU @ 4.0 GHz, compression time will be notably longer than at 4.6 GHz...

  6. #6
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,977
    Thanks
    296
    Thanked 1,304 Times in 740 Posts
    1. AFAIK CL9 only applies to aligned bytes (at offset 0 in a DRAM row).
    2. It's not as simple as 2133/1333, because the RAM, memory controller, and CPU all have to be synchronized.

  7. #7
    Member
    Join Date
    Oct 2009
    Location
    usa
    Posts
    60
    Thanks
    1
    Thanked 9 Times in 6 Posts
    Interesting analysis. My findings concur. In other words, 1600 MHz RAM offers the best price/performance ratio: 4066 sec is much faster than 4210 sec, while 3996 and 3926 sec are only slightly faster again. I've got 16 GB of 1600 MHz RAM and am happy with its speed vs 2133 MHz.
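    The diminishing returns can be seen by looking at how much time each step up in memory clock saves. The sketch below uses the Time values from the logs in the first post; no prices appear in this thread, so it only covers the performance half of price/performance.

```python
# (memory clock in MHz, compression time in seconds) from the first post
runs = [(1333, 4210.03), (1600, 4066.51), (1866, 3996.57), (2133, 3929.55)]

steps = []
for (mhz_a, sec_a), (mhz_b, sec_b) in zip(runs, runs[1:]):
    saved = sec_a - sec_b              # seconds saved by this step
    percent = 100.0 * saved / sec_a    # relative to the slower kit
    steps.append((mhz_a, mhz_b, saved, percent))
    print(f"{mhz_a} -> {mhz_b} MHz: {saved:6.2f} s saved ({percent:.2f}%)")
```

    The first step saves about twice as much time as either of the later two, which is the pattern the price/performance argument rests on.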

  8. #8
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,013
    Thanks
    404
    Thanked 403 Times in 153 Posts
    A new test. This time with Intel Core i7-3770K watercooled with Corsair H60.

    Motherboard: ASUS P8Z77-I DELUXE

    Memory sticks are:

    G.Skill RipJawsX 1600 MHz, CL9, 1.5V - 2x8 GB
    Corsair Dominator Platinum 2133 MHz, CL9, 1.5V - 2x4 GB

    Same PAQ8L and ENWIK8. And this is interesting since now you can compare the speed of 2700K and 3770K at the same clock speed.

    3770K is a cool thing. Also, I think Corsair Dominator Platinum is just mind blowing!


  9. #9
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,497
    Thanks
    26
    Thanked 132 Times in 102 Posts
    Is the CPU the only thing that differs? Clock for clock Ivy is over 20% faster than Sandy in PAQ8L? Interesting

  10. #10
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,013
    Thanks
    404
    Thanked 403 Times in 153 Posts
    Quote Originally Posted by Piotr Tarsa
    Is the CPU the only thing that differs? Clock for clock Ivy is over 20% faster than Sandy in PAQ8L? Interesting
    The 2700K timings are taken from the first post. In general, the other components are the same or in the same class, so they can't play such a huge role.

  11. #11
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    1,026
    Thanks
    103
    Thanked 410 Times in 285 Posts
    3960X @ 4.4 GHz - 2133 MHz CL11 - 4119 sec
    Which compile do you use? This is the time for a non-Intel compile.

  12. #12
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    1,026
    Thanks
    103
    Thanked 410 Times in 285 Posts
    3960X @ 4.4 GHz - 2133 MHz CL11 - 3498 sec (Intel compile)

  13. #13
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,572
    Thanks
    780
    Thanked 687 Times in 372 Posts
    If it's not an error, such a great difference is due to architecture improvements in Ivy. I don't remember: did it improve DIV times? Ilia, please check lzma:max too.

  14. #14
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,013
    Thanks
    404
    Thanked 403 Times in 153 Posts
    Quote Originally Posted by Sportman
    3960X @ 4.4 GHz - 2133 MHz CL11 - 4119 sec
    Which compile do you use? This is the time for a non-Intel compile.
    It's an Intel compile that can be found in paq8l.zip

  15. #15
    The Founder encode's Avatar
    Join Date
    May 2006
    Location
    Moscow, Russia
    Posts
    4,013
    Thanks
    404
    Thanked 403 Times in 153 Posts
    Looks like LZMA is not CPU intensive:
    Code:
    ; 2700K @ 4.6 GHz, 2133 MHz CL9
    
    C:\>timer lzma e enwik9 enwik9.z -d27
    
    Timer 3.01 Copyright (c) 2002-2003 Igor Pavlov 2003-07-10
    
    LZMA 9.20 : Igor Pavlov : Public domain : 2010-11-18
    
    Kernel Time = 0.686 = 00:00:00.686 = 0%
    User Time = 686.248 = 00:11:26.248 = 99%
    Process Time = 686.934 = 00:11:26.934 = 99%
    Global Time = 688.354 = 00:11:28.354 = 100% 
    
    ; 3770K @ 4.6 GHz, 2133 MHz CL9
    
    c:\test>timer lzma e enwik9 enwik9.z -d27
    
    Timer 3.01  Copyright (c) 2002-2003 Igor Pavlov  2003-07-10
    
    LZMA 9.20 : Igor Pavlov : Public domain : 2010-11-18
    
    Kernel Time  =     0.780 = 00:00:00.780 =   0%
    User Time    =   668.823 = 00:11:08.823 =  99%
    Process Time =   669.603 = 00:11:09.603 =  99%
    Global Time  =   670.118 = 00:11:10.118 = 100%
    
    ; 3770K @ 4.8 GHz, 2133 MHz CL9
    
    c:\test>timer lzma e enwik9 enwik9.z -d27
    
    Timer 3.01  Copyright (c) 2002-2003 Igor Pavlov  2003-07-10
    
    LZMA 9.20 : Igor Pavlov : Public domain : 2010-11-18
    
    Kernel Time  =     0.655 = 00:00:00.655 =   0%
    User Time    =   651.787 = 00:10:51.787 =  99%
    Process Time =   652.442 = 00:10:52.442 =  99%
    Global Time  =   652.693 = 00:10:52.693 = 100%

  16. #16
    Member
    Join Date
    Jun 2009
    Location
    Kraków, Poland
    Posts
    1,497
    Thanks
    26
    Thanked 132 Times in 102 Posts
    For a 4.3% clock bump there is a 2.7% performance gain, so I would say 7-zip isn't completely memory bound. I would also say that the Ivy enhancements don't make a significant difference in the 7-zip case.
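    Those percentages can be reproduced from the Global Time lines of the three LZMA runs in post #15. This is a sketch only: it uses Global Time with the faster run as the base, and picking User Time or the slower run as the base shifts the figures slightly.

```python
# Global Time values (seconds) from the LZMA runs in post #15.
sandy_46 = 688.354   # 2700K @ 4.6 GHz
ivy_46 = 670.118     # 3770K @ 4.6 GHz
ivy_48 = 652.693     # 3770K @ 4.8 GHz

clock_gain = 4.8 / 4.6 - 1.0            # relative clock increase
speed_gain = ivy_46 / ivy_48 - 1.0      # relative throughput increase
scaling = speed_gain / clock_gain       # 1.0 would mean purely clock-bound
ivy_vs_sandy = sandy_46 / ivy_46 - 1.0  # same clock, different CPU

print(f"clock gain:   {clock_gain:.1%}")
print(f"speed gain:   {speed_gain:.1%}")
print(f"scaling:      {scaling:.0%}")
print(f"Ivy vs Sandy: {ivy_vs_sandy:.1%}")
```

    A scaling value well below 100% but far from zero fits the reading that the run is neither purely CPU-bound nor purely memory-bound.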

  17. #17
    Programmer Bulat Ziganshin's Avatar
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,572
    Thanks
    780
    Thanked 687 Times in 372 Posts
    1. lzma became 2.8% faster for a 4.3% faster clock speed - 66% scaling
    2. Ivy is 2.6% faster than Sandy at the same clock speed

    I would say 7-zip isn't completely memory bound.
    I would even say that it's more CPU-bound than memory-bound.
