Could someone please help me do some testing with my version 47, which includes a (perhaps) pretty fast but reliable verifier?
Essentially, the CRC32 codes of the individual files are computed during the compression phase (hardware-accelerated, though not in a very smart way) and stored in the archive.
During the test they are checked against the decompressed data (default setting) or even recalculated from the files on disk (switch -crc32).
C:\zpaqfranz>zpaqfranz a r:\unnoo f:\* c:\dropbox\dropbox\* -test -crc32
zpaqfranz v47-experimental journaling archiver, compiled Dec 25 2020
Creating r:/unnoo.zpaq at offset 0 + 0
Adding 35.277.684.778 in 193.591 files at 2020-12-25 10:11:03
f:/System Volume Information/klmeta.dat: error sharing violation8 89.050.678/sec
99.71% 0:00:00 35.171.430.059 -> 18.581.200.867 of 35.273.997.866 104.057.485/sec
211.530 +added, 0 -removed.
0.000000 + (35273.997866 -> 27200.708309 -> 18680.869392) = 18.680.869.392
Forced XLS has included 87.887.223 bytes in 582 files
zpaqfranz: do a full (not paranoid) test
r:/unnoo.zpaq:
1 versions, 211530 files, 520146 fragments, 18.680.869.392
Checking 35.273.997.866 in 193.590 files -threads 12
99.82% 0:00:00 35.208.949.750 -> 18.660.742.481 of 35.273.997.866 81.502.198/sec
Checking 299.475 blocks with CRC32 (34.485.752.228)
Re-testing CRC-32 from filesystem
ERROR: STORED B3FBAB1C != DECOMPRESSED 348150FB (ck 00000001) c:/dropbox/dropbox/libri/collisione_sha1/shattered-1.pdf
ERROR: STORED B3FBAB1C != DECOMPRESSED 348150FB (ck 00000001) c:/dropbox/dropbox/libri/collisione_sha1/shattered-2.pdf
Verify time 111.625000 zeroed bytes 788.245.638
ERRORS : 00000002 (ERROR: something WRONG)
SURE : 00193588 of 00193590 (stored=decompressed=file on disk)
WITH ERRORS
544.328 seconds (with errors)
I would therefore need some add() runs with the -test option on very weird files (all zeros, part zeros and part not, small, large, duplicated and non-duplicated, etc.):
zpaqfranz a z:\pippo.zpaq c:\mydata d:\mydata2 -test
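For anyone who wants a ready-made set of awkward inputs, here is a minimal Python sketch that writes the kinds of files listed above into a folder. The function name make_test_files, the file names and the default size are my own assumptions for illustration, nothing in zpaqfranz depends on them.

```python
import os
import tempfile

def make_test_files(root, large_size=8 * 1024 * 1024):
    """Write a handful of awkward files under root and return {name: size}."""
    os.makedirs(root, exist_ok=True)
    half = large_size // 2
    random_block = os.urandom(half)
    contents = {
        "empty.bin": b"",                                 # zero-length file
        "one_byte.bin": b"\x01",                          # very small file
        "all_zeros.bin": bytes(large_size),               # only zero bytes
        "zeros_then_data.bin": bytes(half) + random_block,
        "data_then_zeros.bin": random_block + bytes(half),
        "random.bin": os.urandom(large_size),             # incompressible
    }
    # Exact duplicate of another file, to exercise deduplication.
    contents["random_copy.bin"] = contents["random.bin"]
    sizes = {}
    for name, data in contents.items():
        with open(os.path.join(root, name), "wb") as f:
            f.write(data)
        sizes[name] = len(data)
    return sizes

if __name__ == "__main__":
    target = os.path.join(tempfile.gettempdir(), "weirdfiles")
    for name, size in make_test_files(target).items():
        print(f"{size:>12} {name}")
    print("written to", target)
```

The resulting folder can then be passed to the add command in place of c:\mydata.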
A fast test (without filesystem reload) can be done via t(est).
While not optimized, it should be pretty fast:
zpaqfranz t z:\pippo.zpaq
Slow test (with filesystem reload)
In this case each file is re-read from the filesystem and its CRC32 code recalculated. This is normally done with CPU hardware instructions, so the bottleneck is normally the media transfer rate.
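To make the idea of the filesystem reload concrete, here is a rough Python sketch, not the actual zpaqfranz code. It uses zlib's software CRC-32, which may or may not be the same variant and speed as the hardware path mentioned above. The names file_crc32 and recheck_from_disk, and the dictionary of stored codes, are assumptions for illustration only.

```python
import os
import tempfile
import zlib

def file_crc32(path, chunk_size=1 << 20):
    """CRC-32 (zlib polynomial) of a file, read in chunks to bound memory use."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)   # running CRC over successive chunks
    return crc & 0xFFFFFFFF

def recheck_from_disk(stored):
    """stored maps path -> CRC-32 recorded at add time (hypothetical layout).
    Returns the paths whose current on-disk CRC-32 no longer matches."""
    return [p for p, crc in stored.items() if file_crc32(p) != crc]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "sample.bin")
        with open(path, "wb") as f:
            f.write(b"123456789")
        stored = {path: file_crc32(path)}
        print(recheck_from_disk(stored))   # file unchanged since "add"
        with open(path, "ab") as f:
            f.write(b"!")                  # file modified after "add"
        print(recheck_from_disk(stored))   # the modified file is reported
```

Since every byte has to be read again from the media, a check of this kind cannot run faster than the disk, whatever the CRC implementation.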
By using a double check, SHA1 on the individual fragments and CRC32 on the entire file, I hope to catch even SHA1 collisions.
A pretty brutal method, but it should work.
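Here is a toy Python model of why the double check helps. It is a sketch under my own assumptions, not the zpaq format or the zpaqfranz implementation. Fragments are deduplicated by a hash key, and each file also keeps a whole-file CRC-32. Since real SHA1 collisions are hard to produce on demand, the demo swaps in a deliberately weak key (one byte of SHA-1) to force a collision. ToyArchive, weak_key and find_weak_collision are hypothetical names.

```python
import hashlib
import zlib

def sha1_key(fragment):
    return hashlib.sha1(fragment).digest()

def weak_key(fragment):
    # Toy stand-in for a colliding hash: only 1 byte of SHA-1 is kept.
    return hashlib.sha1(fragment).digest()[:1]

class ToyArchive:
    def __init__(self, key_func=sha1_key):
        self.key_func = key_func
        self.fragments = {}   # key -> fragment bytes (deduplicated)
        self.files = {}       # name -> (list of keys, whole-file CRC-32)

    def add(self, name, data, fragment_size=4):
        keys = []
        for i in range(0, len(data), fragment_size):
            frag = data[i:i + fragment_size]
            key = self.key_func(frag)
            # Dedup: a fragment whose key is already known is not stored again.
            self.fragments.setdefault(key, frag)
            keys.append(key)
        self.files[name] = (keys, zlib.crc32(data) & 0xFFFFFFFF)

    def extract(self, name):
        keys, _ = self.files[name]
        return b"".join(self.fragments[k] for k in keys)

    def test(self):
        """Return names whose rebuilt data does not match the stored CRC-32."""
        return [name for name, (_, crc) in self.files.items()
                if zlib.crc32(self.extract(name)) & 0xFFFFFFFF != crc]

def find_weak_collision():
    """Find two different 4-byte fragments sharing the same weak key."""
    seen = {}
    for i in range(1000):   # at most 257 tries are needed for a 1-byte key
        frag = i.to_bytes(4, "big")
        key = weak_key(frag)
        if key in seen:
            return seen[key], frag
        seen[key] = frag
    raise RuntimeError("no collision found")

if __name__ == "__main__":
    a, b = find_weak_collision()
    archive = ToyArchive(weak_key)
    archive.add("first", a)
    archive.add("second", b)   # silently deduplicated against "first"
    print(archive.test())      # files flagged by the whole-file CRC-32
```

In this model the second file is rebuilt from the wrong fragment, and only the whole-file CRC-32 notices. With the full SHA-1 key and ordinary data, the same test reports nothing.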
Thank you and merry Christmas