A small, slightly delirious idea from the first morning of 2015. The classic Adler32 optimization is to apply MOD to Adler and Sum2 only once every 5552 bytes (the deferred-modulo count), which is the longest run that cannot overflow 32-bit arithmetic.
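For reference, the 5552 figure can be re-derived from the worst case, where every byte is 0xFF and both sums start at 65520. The sketch below is only an illustration of that bound; the function name `MaxDeferredBytes32` is made up for this example and is not part of CHK.

```cpp
#include <cstdint>

// Hypothetical helper (name invented for this sketch): the largest n such
// that n bytes can be summed without MOD and Sum2 still fits in 32 bits,
// assuming Adler and Sum2 both start at most BASE-1.
uint32_t MaxDeferredBytes32()
{
    const uint64_t BASE = 65521;
    const uint64_t LIMIT = 0xFFFFFFFFull; // 2^32 - 1
    uint64_t n = 0;
    // Worst case (all bytes 0xFF): Sum2 <= 255*n*(n+1)/2 + (n+1)*(BASE-1).
    // Keep growing n while the bound for n+1 bytes still fits.
    while (255 * (n + 1) * (n + 2) / 2 + (n + 2) * (BASE - 1) <= LIMIT)
        ++n;
    return uint32_t(n);
}
```

Calling `MaxDeferredBytes32()` should give 5552. The quadratic term is what makes the 32-bit limit so small, and it is the same term that a 64-bit Sum2 pushes far out of reach.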
But since we are mostly on 64-bit now, why stick to tiny 5552-byte blocks? We can easily do the MOD once per few megabytes, and still keep the Adler variable as a 32-bit integer.
And hence the code:
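    const int BUF_SIZE=1<<21; // 2 MB
    static UINT8 buf[BUF_SIZE]; // must be unsigned bytes, or values >127 are added as negatives
    UINT32 Adler=1;
    UINT64 Sum2=0;
    int n;
    while ((n=(int)fread(buf, 1, BUF_SIZE, f))>0)
    {
        for (int p=0; p<n; ++p)
        {
            Adler+=buf[p];
            Sum2+=Adler;
        }
        // one MOD per 2 MB block instead of one per 5552 bytes
        Adler%=65521;
        Sum2%=65521;
    }
    return UINT32(Sum2<<16)|Adler;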
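And we are comfortably safe from 64-bit integer overflow here: within one 2 MB block Adler stays below 2^30 (at most 65520 + 255*2^21), so Sum2 grows by less than 2^51 before it is reduced again.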
And with some loop unrolling, this code is notably faster than the super-fast Slice-by-8 CRCs. The upcoming CHK v1.80 will feature this accelerated Adler32 for sure!
The code (just keep an eye on the n variable: it must not exceed 16 MB per call. At that size l peaks at about (255<<24), which still fits in 32 bits, and the 64-bit h can withstand far more):
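    void Update(UINT32& Adler, UINT8* s, int n)
    {
        UINT32 l=UINT16(Adler); // low half: running byte sum
        UINT64 h=Adler>>16;     // high half: running sum of l
        int p=0;
        // handle the n%8 leading bytes one at a time
        for (; p<(n&7); ++p)
            h+=(l+=s[p]);
        // the remaining count is a multiple of 8: unrolled main loop
        for (; p<n; p+=8)
        {
            h+=(l+=s[p]);
            h+=(l+=s[p+1]);
            h+=(l+=s[p+2]);
            h+=(l+=s[p+3]);
            h+=(l+=s[p+4]);
            h+=(l+=s[p+5]);
            h+=(l+=s[p+6]);
            h+=(l+=s[p+7]);
        }
        // a single MOD per call
        Adler=(UINT32(h%65521)<<16)|(l%65521);
    }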
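If the input can be larger than 16 MB, one option is to feed it to the update in slices. The sketch below assumes that approach. `UpdateChunked` and `kMaxChunk` are hypothetical names, and `UpdateBlock` is a non-unrolled stand-in for the Update above, included only so the example compiles on its own.

```cpp
#include <cstddef>
#include <cstdint>

// Per-call limit from the text: with at most 16 MB per call the 32-bit
// low sum cannot wrap, even if every byte is 0xFF.
const size_t kMaxChunk = size_t(1) << 24;
static_assert(65535ull + 255ull * kMaxChunk <= 0xFFFFFFFFull,
              "low sum must fit in 32 bits");

// Stand-in for Update: same arithmetic, no unrolling, for brevity.
void UpdateBlock(uint32_t& adler, const uint8_t* s, size_t n)
{
    uint32_t l = adler & 0xFFFF;
    uint64_t h = adler >> 16;
    for (size_t p = 0; p < n; ++p)
        h += (l += s[p]);
    adler = (uint32_t(h % 65521) << 16) | (l % 65521);
}

// Hypothetical wrapper: splits an arbitrarily large buffer into
// slices of at most kMaxChunk bytes, so each call stays within the limit.
void UpdateChunked(uint32_t& adler, const uint8_t* s, size_t n)
{
    while (n > 0)
    {
        const size_t take = n < kMaxChunk ? n : kMaxChunk;
        UpdateBlock(adler, s, take);
        s += take;
        n -= take;
    }
}
```

Start with the checksum set to 1 and call `UpdateChunked` as many times as needed. Because the state is fully reduced after every call, feeding the data in one piece or in several gives the same result.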