# Thread: Code snippet to compute CPU frequency

1. ## Code snippet to compute CPU frequency

Note that you absolutely need to compile with optimizations on (-O2) in order to get a correct result. The algorithm assumes that the CPU is superscalar and can perform each ADD and XOR operation in a single cycle.

```
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

// Perform CYCLES simple dependent operations (5 ADD + 5 XOR per loop iteration)
unsigned loop(int CYCLES)
{
unsigned a = rand(),  b = rand(),  x = rand();
for (int i=0; i < CYCLES/10; i++)
{
x = (x + a) ^ b;
x = (x + a) ^ b;
x = (x + a) ^ b;
x = (x + a) ^ b;
x = (x + a) ^ b;
}
return x;
}

int main(int argc, char *argv[])
{
int CYCLES = 100*1000*1000;
unsigned x = loop(CYCLES/10);  // warm up the cpu

struct timeval tm, tn;
gettimeofday(&tm, NULL);

x += loop(CYCLES);

gettimeofday(&tn, NULL);
double t1 = (tn.tv_sec - tm.tv_sec) +
(tn.tv_usec - tm.tv_usec) / 1e6;

if (x)
printf("Time: %.6f s, CPU freq %.2f GHz\n", t1, (CYCLES/1e9)/t1);
return 0;
}
```

2. ## Thanks:

Lucas (7th May 2020)

3. 1. VS doesn't have gettimeofday: https://stackoverflow.com/questions/...ay-for-windows

2. Your code didn't work correctly for me:
Code:
`Time: 0.022079 s, CPU freq 4.53 GHz; RDTSC result: 2.01 Ghz`
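
On point 1, one possible workaround is C11's `timespec_get` from `<time.h>`, which, as far as I know, both glibc and Visual Studio 2015 and later provide. The sketch below is only an illustration: the helper name `elapsed_seconds` is made up for this example and is not part of the original snippet.

```c
#include <time.h>

/* Hypothetical helper: seconds between two timestamps obtained with
   timespec_get(&ts, TIME_UTC). Subtracting field by field keeps the
   nanosecond resolution that one large double value would lose. */
static double elapsed_seconds(const struct timespec *start,
                              const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec) +
           (double)(end->tv_nsec - start->tv_nsec) / 1e9;
}
```

In `main`, `tm` and `tn` would then be declared as `struct timespec`, the two `gettimeofday` calls would become `timespec_get(&tm, TIME_UTC)` and `timespec_get(&tn, TIME_UTC)`, and `t1` would be `elapsed_seconds(&tm, &tn)`.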

4. ## Thanks:

Bulat Ziganshin (7th May 2020)

Yeah, my code measures the frequency directly, based on some assumptions about instruction execution on modern CPUs, rather than asking the OS/CPU for this info.

My goal was to write a small, portable piece of code measuring the CURRENT frequency of the CORE executing this code. My best bet is that 2 GHz is base freq of your CPU, since it's what RDTSC measures, while my code measured 4.5 GHz, which is the turbo freq for a single core on your CPU. If you can disclose your CPU model, we can check my assumption.

In any case, the code performed 100 million (hopefully) dependent operations in 1/45 sec, and it would be really strange to do that at 2 GHz.
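
The arithmetic behind that argument can be sketched in a few lines. Both function names are invented for this illustration, and the model rests on the same assumption as the snippet: one dependent operation per cycle.

```c
/* Clock, in GHz, implied by `ops` one-cycle dependent operations that
   finish in `seconds`. Mirrors the (CYCLES/1e9)/t1 expression in the
   original snippet. */
static double implied_ghz(double ops, double seconds)
{
    return ops / 1e9 / seconds;
}

/* Operations per cycle the core would have to complete if it really ran
   at `clock_ghz` while finishing `ops` operations in `seconds`. */
static double ops_per_cycle(double ops, double seconds, double clock_ghz)
{
    return ops / (seconds * clock_ghz * 1e9);
}
```

With the reported figures, `implied_ghz(100e6, 0.022079)` comes out at about 4.53, and `ops_per_cycle(100e6, 0.022079, 2.01)` at about 2.25, i.e. more than two operations per cycle from a chain where each operation is supposed to wait for the previous one.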

My own measurement on an i7-8665 (Task Manager reports a base freq of 2.1 GHz and current freqs up to ~4 GHz):
====
Time: 0.026001 s, CPU freq 3.85 GHz; RDTSC result: 2.08 Ghz
====

Note that the times differ by 18%, while the RDTSC-measured freqs differ by only a few percent. I.e., according to RDTSC, our CPUs spent different numbers of CPU cycles executing the same cache-local code. Moreover, on my own CPU we can get pretty different results:
====
Time: 0.027003 s, CPU freq 3.70 GHz; RDTSC result: 2.10 Ghz

Time: 0.023000 s, CPU freq 4.35 GHz; RDTSC result: 2.16 Ghz
====
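
To put numbers on this, the TSC ticks spent in each run can be estimated as time multiplied by the reported RDTSC rate. This is only a sketch: it assumes the "RDTSC result" column is the tick rate of the counter, and `tsc_mticks` is a name made up here.

```c
/* Estimated TSC ticks elapsed during a run, in millions.
   Assumes `tsc_ghz` is the rate at which the counter ticks. */
static double tsc_mticks(double seconds, double tsc_ghz)
{
    return seconds * tsc_ghz * 1e3;
}
```

For the four runs quoted so far this gives roughly 44.4, 54.1, 56.7 and 49.7 million ticks for the same 100 million operations.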

I think the conclusion is obvious: RDTSC is not counting the actual clock cycles of the core.

PS: I should try a longer warm-up to reach the maximum possible freq, or at least a more stable one.

6. A quick check at godbolt shows that all compilers generate exactly the expected code. For example, clang on POWER64:

Code:
```
.L3:
xor 3,30,3
xor 3,30,3
xor 3,30,3
xor 3,30,3
xor 3,30,3
rldicl 3,3,0,32
bdnz .L3
```

7. > My best bet is that 2 GHz is base freq of your CPU, since it's what RDTSC measures

Actually, it's the idle freq of my CPU.

> measuring the CURRENT frequency of the CORE executing this code

TSC does exactly that - it's per-core and it increments at the actual current clock speed.
Turbo tricks make it hard to use for time measurement, but it's still the perfect
tool for measuring the CPU clocks taken by some code - by design.

Also, TSC equivalents exist on all modern architectures (including GPUs),
e.g. on ARM and PPC.

> If you can disclose your CPU model, we can check my assumption.

It's a 7820X.

> In any case, the code performed 100 million (hopefully) dependent operations in 1/45 sec,
> and it would be really strange to do that at 2 GHz.

Ok, I made a new version which measures clocks taken by _one_ instance of your code:
```
test1:
.text:04015F7                 call    _Z5rdtscv
.text:04015FC                 mov     [rsp+38h+var_10], rax
.text:0401601                 call    _Z5rdtscv

test2:
.text:040158A                 call    _Z5rdtscv
.text:040158F                 mov     [rsp+38h+var_10], rax
.text:0401594                 lea     eax, [r8+r11]
.text:0401598                 xor     eax, r9d
.text:040159E                 xor     eax, r9d
.text:04015A4                 xor     eax, r9d
.text:04015AA                 xor     eax, r9d
.text:04015B0                 xor     r9d, eax
.text:04015B3                 call    _Z5rdtscv
```

Code:
```
subsequent RDTSC calls: 13 clk
RDTSC calls around 5 xor-adds: 14 clk
```
So I suspect that you underestimated the IPC in this case (I disabled HT).
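
For anyone who wants to repeat this kind of measurement, here is a minimal sketch of the pattern: read the counter, run the code, read it again, then subtract the cost of two back-to-back reads. It is not the code behind the listing above. It assumes an x86 compiler that provides the `__rdtsc()` intrinsic (GCC, Clang, MSVC), and `read_ticks`, `min_ticks` and `net_ticks` are names invented for this example. On other targets it falls back to nanoseconds just so that it still compiles.

```c
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#  ifdef _MSC_VER
#    include <intrin.h>
#  else
#    include <x86intrin.h>
#  endif
static uint64_t read_ticks(void) { return __rdtsc(); }
#else
/* Fallback for non-x86 targets: wall-clock nanoseconds, NOT a cycle counter. */
static uint64_t read_ticks(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
#endif

/* Smallest tick delta seen around fn() over `trials` runs. Pass NULL to
   measure the bare cost of two back-to-back counter reads. */
static uint64_t min_ticks(void (*fn)(void), int trials)
{
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < trials; i++) {
        uint64_t t0 = read_ticks();
        if (fn)
            fn();
        uint64_t t1 = read_ticks();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}

/* Ticks attributable to the code itself once the read overhead is
   subtracted; clamps at zero instead of wrapping around. */
static uint64_t net_ticks(uint64_t with_code, uint64_t overhead)
{
    return with_code > overhead ? with_code - overhead : 0;
}
```

Applied to the two figures above, `net_ticks(14, 13)` is 1. Note that calling the code through a function pointer adds call overhead that the inlined version in the listing does not have.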

> PS: I should try a longer warm-up to reach the maximum possible freq, or at least a more stable one.

Problem is, not all compression algorithms manage to warm up the cpu properly.
Also "warm up" with AVX2/AVX512 usage could reduce cpu freq rather than increasing it.

8. Interesting... I tried adding more "x = (x + a) ^ b" lines:
Code:
```
 0 lines -> 13 clk
 5 lines -> 14 clk
10 lines -> 15 clk
15 lines -> 19 clk
20 lines -> 23 clk
25 lines -> 27 clk
30 lines -> 31 clk
35 lines -> 37 clk
40 lines -> 41 clk
45 lines -> 45 clk
50 lines -> 51 clk
```
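
One way to read this table is to look at the extra clocks per added line rather than at the totals. A tiny sketch, with a helper name made up for the purpose:

```c
/* Average extra clocks per added line between two rows of the table. */
static double clk_per_line(int lines_lo, int clk_lo, int lines_hi, int clk_hi)
{
    return (double)(clk_hi - clk_lo) / (double)(lines_hi - lines_lo);
}
```

`clk_per_line(0, 13, 10, 15)` is 0.2, so the first ten lines are almost free, while `clk_per_line(10, 15, 50, 51)` is 0.9. At 0.9 clk per line, 50 lines cost about 45 clk, which may be where a divisor of 45 for a 50-line loop body comes from, although that is a guess.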
Ok, so I modified the first script according to this; somehow it's a better match for TSC now.
GCC doesn't actually unroll the loop, though.
```
// Perform CYCLES simple in-order operations
unsigned loop( int CYCLES ) {
unsigned a = rand(), b = rand(), x = rand();
int i, j;
for( i=0; i<CYCLES/45; i++ ) {
#pragma unroll(50)
for( j=0; j<50; j++ ) x = (x + a) ^ b;
}
return x;
}
```

Code:
`Time: 0.049632 s, CPU freq 2.01 GHz; RDTSC result: 1.99 Ghz`
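
On the unrolling remark: GCC does not recognise `#pragma unroll(50)`. Its own spelling is `#pragma GCC unroll n`, available since GCC 8 as far as I know, while Clang has `#pragma clang loop unroll_count(n)`. Below is a hedged sketch of a version that picks the hint per compiler. `UNROLL_50` and `chain50` are names made up here, and whether the loop really gets unrolled still has to be checked in the generated assembly.

```c
/* Pick an unroll hint the compiler in use understands. Clang is tested
   first because it also defines __GNUC__. */
#if defined(__clang__)
#  define UNROLL_50 _Pragma("clang loop unroll_count(50)")
#elif defined(__GNUC__) && __GNUC__ >= 8
#  define UNROLL_50 _Pragma("GCC unroll 50")
#else
#  define UNROLL_50 /* no known hint: leave it to the optimizer */
#endif

/* Same inner chain as in the listing above: `outer` rounds of 50
   dependent add/xor steps. */
static unsigned chain50(unsigned x, unsigned a, unsigned b, unsigned outer)
{
    for (unsigned i = 0; i < outer; i++) {
        UNROLL_50
        for (int j = 0; j < 50; j++)
            x = (x + a) ^ b;
    }
    return x;
}
```

The hint only changes how the loop is compiled, not what it computes, so the result is the same as running the plain chain `50 * outer` times.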

9. https://ark.intel.com/content/www/us...-4-30-ghz.html :

Intel® Turbo Boost Max Technology 3.0 Frequency: 4.50 GHz
