Note that you absolutely need to compile with optimizations on (-O2) in order to get a correct result. The algorithm is based on the assumptions that the CPU is superscalar and can perform each ADD and XOR operation in a single cycle.
// Perform CYCLES simple in-order operations
unsigned loop(int CYCLES)
{
    unsigned a = rand(), b = rand(), x = rand();
    // Each line is an ADD followed by an XOR, both depending on the previous x,
    // so one iteration is a chain of 10 dependent operations.
    for (int i = 0; i < CYCLES/10; i++)
    {
        x = (x + a) ^ b;
        x = (x + a) ^ b;
        x = (x + a) ^ b;
        x = (x + a) ^ b;
        x = (x + a) ^ b;
    }
    return x;
}

int main(int argc, char *argv[])
{
    int CYCLES = 100*1000*1000;
    unsigned x = loop(CYCLES/10);  // warm up the CPU
Yeah, my code measures the frequency directly, based on some assumptions about instruction execution on modern CPUs, rather than asking the OS/CPU for this info.
My goal was to write small, portable code measuring the CURRENT frequency of the CORE executing this code. My best bet is that 2 GHz is the base freq of your CPU, since it's what RDTSC measures, while my code measured 4.5 GHz, which is the turbo freq for a single core on your CPU. If you can disclose your CPU model, we can check my assumption.
In any case, the code performed 100 million (hopefully) dependent operations in 1/45 sec, and it would be really strange to do that at 2 GHz.
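The estimate described above can be sketched roughly as follows. This is a minimal illustration, not the exact program that produced the numbers in this thread: the names `dependent_chain`, `ghz_from` and `estimate_ghz` are made up for the example, timing is assumed to use C++ `std::chrono::steady_clock`, and the result is only meaningful when built with -O2 on a CPU where ADD and XOR really take one cycle each.

```cpp
#include <chrono>
#include <cstdlib>

// Chain of dependent operations: every statement needs the previous x,
// so (assuming 1-cycle ADD and XOR) one iteration costs about 10 cycles.
unsigned dependent_chain(long ops)
{
    unsigned a = std::rand(), b = std::rand(), x = std::rand();
    for (long i = 0; i < ops / 10; i++)
    {
        x = (x + a) ^ b;
        x = (x + a) ^ b;
        x = (x + a) ^ b;
        x = (x + a) ^ b;
        x = (x + a) ^ b;
    }
    return x;
}

// Hypothetical helper: one operation per cycle means ops/second == Hz.
double ghz_from(long ops, double seconds)
{
    return ops / seconds / 1e9;
}

// Hypothetical helper: time the chain and convert the result to GHz.
double estimate_ghz(long ops)
{
    // volatile keeps the compiler from discarding the loop as dead code
    volatile unsigned sink = dependent_chain(ops / 10);  // short warm-up
    auto t0 = std::chrono::steady_clock::now();
    sink = dependent_chain(ops);
    auto t1 = std::chrono::steady_clock::now();
    (void)sink;
    double seconds = std::chrono::duration<double>(t1 - t0).count();
    return ghz_from(ops, seconds);
}
```

For example, `ghz_from(100*1000*1000, 1.0/45)` gives 4.5, which is where the 4.5 GHz figure comes from; `estimate_ghz(100*1000*1000)` performs the actual measurement on the core that happens to run it.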
My own measurement on an i7-8665 (Task Manager reports a base freq of 2.1 GHz and current freqs up to ~4 GHz):
====
Time: 0.026001 s, CPU freq 3.85 GHz; RDTSC result: 2.08 Ghz
====
Note that the times differ by 18%, while the RDTSC-measured freqs differ by only a few percent. I.e., according to RDTSC, our CPUs spent different numbers of CPU cycles executing the same cache-local code. Moreover, on my own CPU we can get pretty different results:
====
D:\Downloads\003>1.exe
Time: 0.027003 s, CPU freq 3.70 GHz; RDTSC result: 2.10 Ghz
> My best bet is that 2 GHz is base freq of your CPU, since it's what RDTSC measures
Actually, it's the idle freq of my CPU.
> measuring the CURRENT frequency of the CORE executing this code
TSC does exactly that: it's per-core and it increments at the actual current clock speed.
Turbo tricks make it hard to use for time measurement, but it's still the perfect
tool for measuring the CPU clocks taken by some code, by design.
Also, TSC equivalents exist on all modern architectures (including GPUs):
the Google Benchmark header linked in my previous post includes TSC-like code
for ARM and PPC.
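Which rate the TSC follows on a particular machine can be checked by comparing it against a wall clock. The sketch below is only an illustration under stated assumptions: `read_tsc` and `tsc_ghz` are hypothetical names, the `__rdtsc()` intrinsic is taken from `<x86intrin.h>` (GCC/Clang) or `<intrin.h>` (MSVC), and non-x86 targets simply get 0 instead of an equivalent counter. If the reported value follows the turbo frequency, the counter tracks the core clock; if it stays near one fixed value regardless of load, it ticks at a constant rate.

```cpp
#include <chrono>
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
  #ifdef _MSC_VER
    #include <intrin.h>
  #else
    #include <x86intrin.h>
  #endif
  #define HAVE_TSC 1
#else
  #define HAVE_TSC 0
#endif

// Read the time-stamp counter; returns 0 on targets without RDTSC.
inline std::uint64_t read_tsc()
{
#if HAVE_TSC
    return __rdtsc();
#else
    return 0;
#endif
}

// TSC ticks per nanosecond (i.e. "GHz" as seen by the TSC),
// measured across a busy-wait of roughly wait_ms milliseconds.
double tsc_ghz(int wait_ms)
{
    using clock = std::chrono::steady_clock;
    auto t0 = clock::now();
    std::uint64_t c0 = read_tsc();
    while (clock::now() - t0 < std::chrono::milliseconds(wait_ms)) { }
    std::uint64_t c1 = read_tsc();
    auto t1 = clock::now();
    double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
    return (c1 - c0) / ns;
}
```

Running `tsc_ghz(100)` once on an idle machine and once while another thread keeps the same core busy shows whether the value moves with the core clock.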
> If you can disclose your CPU model, we can check my assumption.
It's a 7820X.
> By any means, the code performed 100 millions of (hopefully) dependent operations in 1/45 sec,
> and it's really strange to do what at 2 GHz.
Ok, I made a new version which measures clocks taken by _one_ instance of your code:
So I suspect that you underestimated the IPC in this case (I disabled HT).
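The two readings of the same run differ only in the IPC they imply, which a few lines of arithmetic make explicit. This is just a sketch using the numbers quoted in this thread; `implied_ipc` is a made-up helper name.

```cpp
// Operations completed per clock cycle implied by a run:
// ops / (seconds * frequency in Hz).
double implied_ipc(double ops, double seconds, double freq_ghz)
{
    return ops / (seconds * freq_ghz * 1e9);
}
```

With 100 million operations in 1/45 sec, `implied_ipc(100e6, 1.0/45, 4.5)` is 1.0 (one dependent operation per cycle at 4.5 GHz), while `implied_ipc(100e6, 1.0/45, 2.0)` is 2.25 (more than two operations per cycle at 2 GHz).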
> PS: I should try longer warm-up to reach maximum freq possible, or at least more stable one
The problem is, not all compression algorithms manage to warm up the CPU properly.
Also, a "warm-up" that uses AVX2/AVX-512 could reduce the CPU freq rather than increase it.