I wrote a timer function that counts CPU clock cycles, for very precise timing when optimizing (the resolution is one clock cycle, which is under 1 ns on any CPU clocked above 1 GHz). Use it like this:
Code:
unsigned int start = rdtsc(); // get the time (unsigned, to match rdtsc() and %u)
// code to be timed
printf("%u clocks\n", rdtsc()-start); // print time difference
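A side note on why the subtraction still gives the right answer when the low 32 bits of the counter roll over between the two readings: unsigned arithmetic in C wraps modulo 2^32. Below is a minimal sketch of just that arithmetic. `elapsed_clocks` is a hypothetical helper name, not part of the code above, and `uint32_t` stands in for a 32-bit unsigned int.

```c
#include <stdint.h>

/* Hypothetical helper: clock difference between two 32-bit counter readings.
   Unsigned subtraction wraps modulo 2^32, so the result is still correct if
   the counter rolled over once between start and end. */
static uint32_t elapsed_clocks(uint32_t start, uint32_t end)
{
    return end - start;
}
```

For example, elapsed_clocks(0xFFFFFFF0, 0x00000010) gives 0x20 (32 clocks). More than one rollover cannot be detected, so at 1 GHz this only covers intervals up to about 4.3 seconds.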
Here is the function:
Code:
unsigned int(*const rdtsc)()=(unsigned int(*)())(char*)"\x0f\x31\xc3";
The 3-byte string is the x86 machine code for
RDTSC
RET
The x86 has a 64-bit counter that is incremented once per clock cycle. The RDTSC instruction puts the counter in edx:eax (high 32 bits in edx, low 32 bits in eax). The function returns the low 32 bits in eax. The string is first cast to char* (so VC++ doesn't complain), then cast to a constant pointer to a function taking no parameters and returning unsigned int.
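To make the edx:eax split concrete, here is a small sketch of the arithmetic only (it does not execute RDTSC). Both function names are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: rebuild the 64-bit counter from the two halves that
   RDTSC leaves in edx (high 32 bits) and eax (low 32 bits). */
static uint64_t combine_edx_eax(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}

/* What a function returning unsigned int sees: only the eax half. */
static uint32_t low_half(uint64_t counter)
{
    return (uint32_t)counter;
}
```

So a counter value of 0x0000000100000002 arrives as edx = 1 and eax = 2, and the 32-bit version of the timer returns 2.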
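The code should be safe (on a Pentium or higher) because eax, edx (and ecx) are scratch registers that don't need to be preserved across function calls. It does rely on the string literal sitting in executable memory, so it will fault where data pages are marked non-executable (DEP/NX). I tested it with g++, Borland, Mars, and VC++ with various optimizations in WinXP. No assembler required.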
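You could also get all 64 bits of the counter by changing the return type to unsigned long long (and printing with %llu). This works in 32-bit code, where 64-bit integers are returned in edx:eax. In 64-bit code the return value is taken from rax alone, so you would still get only the low 32 bits.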
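For comparison, here is a sketch that gets the full 64 bits without the string trick. This is not the method above: it swaps in the `__rdtsc()` compiler intrinsic, which reasonably recent versions of VC++, GCC, and Clang provide when targeting x86 (older compilers such as the ones I tested may not have it). `read_tsc` is a hypothetical wrapper name.

```c
#if defined(_MSC_VER)
#include <intrin.h>      /* VC++: declares __rdtsc() */
#else
#include <x86intrin.h>   /* GCC and Clang on x86: declares __rdtsc() */
#endif

/* Hypothetical wrapper: read the full 64-bit time-stamp counter through the
   compiler intrinsic instead of calling bytes stored in a string literal. */
static unsigned long long read_tsc(void)
{
    return __rdtsc();
}
```

It is used the same way as rdtsc() above, with unsigned long long for the start value and %llu in the printf.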