
Originally Posted by Bulat Ziganshin
blake2sp is ~10 cpb. VMAC has been part of SREP for 2 years...
... btw, SREP performs full dedup at 4 GB/s, and checksumming is only part of that work

Yes, I know about SREP internals. I've been following its progress almost from the beginning.

Originally Posted by Bulat Ziganshin
you can make a standalone program yourself.
You're overestimating my abilities :). I tried to compile the original sources, but only after a couple of hours did I realize why the entry point wasn't visible. OK, I compiled it, but the resulting exe started giving bad results and reporting differences for the "abc" test. So I took your version from SREP, and everything worked fine. By the way, the changes you made to remove some of VMAC's limitations are admirable.
Yes, you were right: VMAC is blazingly fast, to say the least.
Code:
Size (bytes)    cpb
          16  20.60
          32  10.37
          64   5.28
         128   2.70
         256   1.54
         512   0.94
        1024   0.62
        2048   0.46
        4096   0.39
        8192   0.34
       16384   0.33
       32768   0.32
       65536   0.31
      131072   0.31
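To relate cpb figures like these to throughput figures like the 4 GB/s quoted for SREP, divide the clock rate by the cpb value. This is a minimal sketch, assuming a hypothetical 3 GHz core (the post does not say which CPU produced the numbers); `cpb_to_gbps` and `gbps_to_cpb` are illustrative helpers, not part of VMAC or SREP.

```c
#include <assert.h>

/* Throughput in GB/s (1e9 bytes/s) for a given speed in cycles per byte.
   clock_ghz is an assumed core frequency, not a measured one. */
double cpb_to_gbps(double cpb, double clock_ghz) {
    assert(cpb > 0.0 && clock_ghz > 0.0);
    /* bytes/s = (cycles/s) / (cycles/byte) */
    return clock_ghz / cpb;
}

/* Inverse: the cycles-per-byte budget needed to reach a target throughput. */
double gbps_to_cpb(double gbps, double clock_ghz) {
    assert(gbps > 0.0 && clock_ghz > 0.0);
    return clock_ghz / gbps;
}
```

For example, at 3 GHz, 0.31 cpb corresponds to roughly 9.7 GB/s, while sustaining 4 GB/s leaves a budget of 0.75 cpb for the whole pipeline, checksumming included.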
However, I suspect that VMAC is heavily asm-optimized and tuned for aligned data lengths (correct me if I'm wrong). I also think it isn't ready for large inputs, at least not out of the box: when I tested it with larger lengths, it simply crashed at 1048576 bytes. There are also still limitations, such as the first bit of the nonce buffer and the special conditions for vhash_update.
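One common way to work with an update function that only accepts whole blocks is to buffer input in a small wrapper. This sketch assumes the vhash_update condition is that every call must receive a positive multiple of a fixed block size (128 bytes, which matches VMAC_NHBYTES in the reference code, but treat it as an assumption). `BLOCK_BYTES`, `MAX_CHUNK`, `block_buffer`, the `bb_*` functions and `check_update` are hypothetical names. Capping each call at `MAX_CHUNK` is only a guess at sidestepping the 1 MiB crash, not a confirmed fix.

```c
#include <stddef.h>
#include <string.h>

/* Assumed block size (VMAC_NHBYTES is 128 in the reference code). */
#define BLOCK_BYTES 128
/* Hypothetical cap on bytes per update call; must be a multiple of BLOCK_BYTES. */
#define MAX_CHUNK (64 * 1024)

/* Callback that must only see positive multiples of BLOCK_BYTES;
   in real use this would wrap vhash_update. */
typedef void (*block_update_fn)(void *state, const unsigned char *data, size_t len);

typedef struct {
    block_update_fn update;
    void *state;
    unsigned char tail[BLOCK_BYTES]; /* pending bytes, < BLOCK_BYTES between calls */
    size_t tail_len;
} block_buffer;

void bb_init(block_buffer *bb, block_update_fn update, void *state) {
    bb->update = update;
    bb->state = state;
    bb->tail_len = 0;
}

/* Accept any length; forward only whole blocks and keep the remainder. */
void bb_feed(block_buffer *bb, const unsigned char *data, size_t len) {
    if (bb->tail_len > 0) { /* complete a partially filled block first */
        size_t take = BLOCK_BYTES - bb->tail_len;
        if (take > len) take = len;
        memcpy(bb->tail + bb->tail_len, data, take);
        bb->tail_len += take;
        data += take;
        len -= take;
        if (bb->tail_len < BLOCK_BYTES) return;
        bb->update(bb->state, bb->tail, BLOCK_BYTES);
        bb->tail_len = 0;
    }
    size_t whole = len - len % BLOCK_BYTES;
    while (whole > 0) { /* forward full blocks in bounded chunks */
        size_t n = whole < MAX_CHUNK ? whole : MAX_CHUNK;
        bb->update(bb->state, data, n);
        data += n;
        len -= n;
        whole -= n;
    }
    memcpy(bb->tail, data, len); /* len < BLOCK_BYTES here */
    bb->tail_len = len;
}

/* Remaining bytes (fewer than BLOCK_BYTES) for the finalizing call. */
const unsigned char *bb_tail(const block_buffer *bb, size_t *len) {
    *len = bb->tail_len;
    return bb->tail;
}

/* Demonstration stub standing in for vhash_update: checks the length rule. */
struct length_check { size_t total, max_len; int bad; };

void check_update(void *state, const unsigned char *data, size_t len) {
    struct length_check *c = state;
    (void)data;
    if (len == 0 || len % BLOCK_BYTES != 0) c->bad = 1;
    if (len > c->max_len) c->max_len = len;
    c->total += len;
}
```

In real use, `check_update` would be replaced by a thin adapter around vhash_update, and the bytes returned by `bb_tail` would go to the finalizing call.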
I have also retested BLAKE2 with my new build, this time the BLAKE2b variant. It gets 5.90 cpb.
There is also xxHash, which impressed me with its clean code layout and its speed. It gets 0.86 cpb.
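To show why xxHash is fast, here is a compact from-scratch sketch of its 32-bit variant (XXH32). Four independent accumulators consume 16 bytes per iteration using only multiply, rotate and add, so the CPU can overlap their work. `xxh32_sketch` is a hypothetical name, not the xxhash.c API (which exposes `XXH32()`), and the post does not say which xxHash variant was benchmarked. For real use, the official library is the safer choice.

```c
#include <stddef.h>
#include <stdint.h>

/* XXH32 prime constants. */
static const uint32_t P1 = 2654435761u, P2 = 2246822519u, P3 = 3266489917u,
                      P4 = 668265263u, P5 = 374761393u;

static uint32_t rotl32(uint32_t x, int r) { return (x << r) | (x >> (32 - r)); }

/* Little-endian 32-bit load that is safe for unaligned input. */
static uint32_t read32le(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* One lane step: mix 4 input bytes into an accumulator. */
static uint32_t lane_round(uint32_t acc, uint32_t input) {
    acc += input * P2;
    acc = rotl32(acc, 13);
    return acc * P1;
}

uint32_t xxh32_sketch(const void *input, size_t len, uint32_t seed) {
    const uint8_t *p = input;
    const uint8_t *const end = p + len;
    uint32_t h;

    if (len >= 16) {
        /* Four independent lanes, 16 bytes per iteration. */
        uint32_t v1 = seed + P1 + P2, v2 = seed + P2, v3 = seed, v4 = seed - P1;
        const uint8_t *const limit = end - 16;
        do {
            v1 = lane_round(v1, read32le(p));
            v2 = lane_round(v2, read32le(p + 4));
            v3 = lane_round(v3, read32le(p + 8));
            v4 = lane_round(v4, read32le(p + 12));
            p += 16;
        } while (p <= limit);
        h = rotl32(v1, 1) + rotl32(v2, 7) + rotl32(v3, 12) + rotl32(v4, 18);
    } else {
        h = seed + P5;
    }
    h += (uint32_t)len;

    /* Remaining 4-byte words, then single bytes. */
    while ((size_t)(end - p) >= 4) {
        h += read32le(p) * P3;
        h = rotl32(h, 17) * P4;
        p += 4;
    }
    while (p < end) {
        h += (uint32_t)*p * P5;
        h = rotl32(h, 11) * P1;
        p++;
    }

    /* Final avalanche. */
    h ^= h >> 15;
    h *= P2;
    h ^= h >> 13;
    h *= P3;
    h ^= h >> 16;
    return h;
}
```

With seed 0, an empty input should hash to 0x02CC5D05, the commonly cited XXH32 sanity value.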