Could someone please help me do some testing of my version 47, which includes (perhaps) a pretty fast but reliable verifier?
Essentially, the CRC32 codes of the individual files are stored during the compression phase, calculated in hardware where possible (but not very smartly).
During the test they are checked (default setting) or even re-read from the files on disk (with the -crc32 switch).
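To make the idea concrete, here is a minimal sketch of the whole-file CRC-32 that gets stored at add time and checked again at test time. It is not the zpaqfranz source: it uses zlib's crc32() and a made-up file_crc32() helper purely for illustration (the real code can use hardware instructions).

#include <zlib.h>
#include <cstdio>
#include <vector>

// Stream a file through zlib's crc32() and return the whole-file CRC-32.
// Illustration only: this is the kind of value stored per file during add
// and compared again during test.
unsigned long file_crc32(const char* path)
{
    std::vector<unsigned char> buf(1 << 20);    // 1 MB read buffer
    unsigned long crc = crc32(0L, Z_NULL, 0);   // CRC-32 starting value
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return 0;                           // error handling left to the caller
    size_t n;
    while ((n = std::fread(buf.data(), 1, buf.size(), f)) > 0)
        crc = crc32(crc, buf.data(), static_cast<uInt>(n));
    std::fclose(f);
    return crc;
}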
C:\zpaqfranz>zpaqfranz a r:\unnoo f:\* c:\dropbox\dropbox\* -test -crc32
zpaqfranz v47-experimental journaling archiver, compiled Dec 25 2020
Creating r:/unnoo.zpaq at offset 0 + 0
Adding 35.277.684.778 in 193.591 files at 2020-12-25 10:11:03
f:/System Volume Information/klmeta.dat: error sharing violation8 89.050.678/sec
99.71% 0:00:00 35.171.430.059 -> 18.581.200.867 of 35.273.997.866 104.057.485/sec
211.530 +added, 0 -removed.
0.000000 + (35273.997866 -> 27200.708309 -> 18680.869392) = 18.680.869.392
Forced XLS has included 87.887.223 bytes in 582 files
zpaqfranz: do a full (not paranoid) test
r:/unnoo.zpaq:
1 versions, 211530 files, 520146 fragments, 18.680.869.392
Checking 35.273.997.866 in 193.590 files -threads 12
99.82% 0:00:00 35.208.949.750 -> 18.660.742.481 of 35.273.997.866 81.502.198/sec
Checking 299.475 blocks with CRC32 (34.485.752.228)
Re-testing CRC-32 from filesystem
ERROR: STORED B3FBAB1C != DECOMPRESSED 348150FB (ck 00000001) c:/dropbox/dropbox/libri/collisione_sha1/shattered-1.pdf
ERROR: STORED B3FBAB1C != DECOMPRESSED 348150FB (ck 00000001) c:/dropbox/dropbox/libri/collisione_sha1/shattered-2.pdf
Verify time 111.625000 zeroed bytes 788.245.638
ERRORS : 00000002 (ERROR: something WRONG)
SURE : 00193588 of 00193590 (stored=decompressed=file on disk)
WITH ERRORS
544.328 seconds (with errors)
I would therefore need some add() runs with the -test option on very weird files (all zeros, partly zeros and partly not, small, large, duplicated and non-duplicated, etc.):
zpaqfranz a z:\pippo.zpaq c:\mydata d:\mydata2 -test
A fast check (without re-reading from the filesystem) can be done via t(est);
while not optimized, it should be pretty fast:
zpaqfranz t z:\pippo.zpaq
Slow (filesystem reload):
in this case each file is re-read from the filesystem and the CRC32 codes are recalculated, normally with CPU hardware instructions, so the bottleneck is usually the media transfer rate.
Using a double check (SHA1 on the individual fragments and CRC32 on the entire file) I hope to catch even SHA1 collisions.
A pretty brutal method, but it should work
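To spell out the logic of the verdicts that appear in the logs (GOOD, SURE, ERROR), here is my understanding as a conceptual sketch; check_file() is a made-up name, not an actual zpaqfranz function.

#include <cstdint>

enum class Verdict { GOOD, SURE, ERROR };

// "GOOD"  : CRC-32 stored at add time == CRC-32 of the decompressed data
// "SURE"  : both of the above also match the file re-read from disk (-crc32)
// "ERROR" : any mismatch (as with the shattered PDFs in the log above)
Verdict check_file(uint32_t stored,        // CRC-32 saved during add
                   uint32_t decompressed,  // CRC-32 of the extracted data
                   bool     reread_disk,   // -crc32 switch: re-read the file?
                   uint32_t on_disk)       // CRC-32 of the file on disk (if re-read)
{
    if (stored != decompressed)
        return Verdict::ERROR;
    if (reread_disk)
        return decompressed == on_disk ? Verdict::SURE : Verdict::ERROR;
    return Verdict::GOOD;                  // normal (non -crc32) test
}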
Thank you and merry Christmas.
I tested it with weird data I got from a friend. He has a program that calculates some starting positions, and after a while these positions are stored in files; I can't say more. I don't know if it meets your requirements: they are very small, similar files, but not identical, and each folder also contains one picture. It seems to have gone through without any errors.
1) zpaqfranz v47-experimental journaling archiver, compiled Dec 25 2020
Creating m.zpaq at offset 0 + 0
Adding 41.131.731 in 3.241 files at 2020-12-26 19:16:05
19.05% 0:00:00 7.834.586 -> 0 of 41.131.731 7.834.586/sec
3.594 +added, 0 -removed.
zpaqfranz: do a full (not paranoid) test
m.zpaq:
1 versions, 3594 files, 3003 fragments, 1.219.484
Checking 41.131.731 in 3.241 files -threads 16
17.13% 0:00:04 7.047.342 -> 1.075.011 of 41.131.731 7.047.342/sec
Checking 3.305 blocks with CRC32 (41.131.731)
Verify time 0.078000 zeroed bytes 0
GOOD : 00003241 of 00003241 (stored=decompressed)
All OK (normal test)
Checking 3.305 blocks with CRC32 (41.131.731)
Verify time 0.078000 zeroed bytes 0
GOOD : 00003241 of 00003241 (stored=decompressed)
All OK (normal test)
0.219 seconds (all OK)
3) zpaqfranz v47-experimental journaling archiver, compiled Dec 25 2020
franz:use CRC32 instead of SHA1
m.zpaq:
1 versions, 3594 files, 3003 fragments, 1.219.484
Checking 41.131.731 in 3.241 files -threads 16
Checking 3.305 blocks with CRC32 (41.131.731)
Re-testing CRC-32 from filesystem
Verify time 0.297000 zeroed bytes 0
SURE : 00003241 of 00003241 (stored=decompressed=file on disk)
All OK (paranoid test)
OK, here are some zeros. The test was performed with a 1 GB file. However, I must emphasize that I also tried the test with a 2 GB file and the process was so slow that I canceled it (after about an hour). Maybe some limit for zero blocks was exceeded, I don't know.
zpaqfranz v47-experimental journaling archiver, compiled Dec 25 2020
Creating 1GB.zpaq at offset 0 + 0
Adding 1.075.838.976 in 1 files at 2020-12-28 08:04:07
zpaqfranz: do a full (not paranoid) test
1GB.zpaq:
1 versions, 1 files, 4 fragments, 5.434
Checking 1.075.838.976 in 1 files -threads 16
Checking 3 blocks with CRC32 (218.145)
Verify time 23.057000 zeroed bytes 1.075.620.831
GOOD : 00000001 of 00000001 (stored=decompressed)
All OK (normal test)
Checking 3 blocks with CRC32 (218.145)
Verify time 23.042000 zeroed bytes 1.075.620.831
GOOD : 00000001 of 00000001 (stored=decompressed)
All OK (normal test)
23.697 seconds (all OK)
zpaqfranz v47-experimental journaling archiver, compiled Dec 25 2020
franz:use CRC32 instead of SHA1
1GB.zpaq:
1 versions, 1 files, 4 fragments, 5.434
Checking 1.075.838.976 in 1 files -threads 16
Checking 3 blocks with CRC32 (218.145)
Re-testing CRC-32 from filesystem
Verify time 23.618000 zeroed bytes 1.075.620.831
SURE : 00000001 of 00000001 (stored=decompressed=file on disk)
All OK (paranoid test)
I am working on a multiple-directory comparator to verify, after extracting multiple backups, that they match perfectly.
Essentially, given the source folder /tank, we place three different copies, made with three different pieces of software, on three different devices: /copia1, /copia2, /copia3.
Then they are extracted into three folders: /test/1, /test/2, /test/3.
Three threads are launched that scan and calculate the CRC32s in parallel (see the sketch below).
From the first few tests I'm seeing overall disk reads of over 1.6 GB/s (which isn't bad) from three SATA SSDs.
Is there by any chance already such a tool for UNIX that I don't know about?
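For the curious, the shape of it is roughly the following: a hypothetical, minimal C++17 sketch (using zlib for the CRC and made-up crc_of()/scan() helpers, not the actual zpaqfranz source) that starts one thread per folder, so each device is read independently.

#include <cstdint>
#include <filesystem>
#include <fstream>
#include <map>
#include <string>
#include <thread>
#include <vector>
#include <zlib.h>

namespace fs = std::filesystem;

// CRC-32 of a single file (zlib's crc32(), for illustration only).
static uint32_t crc_of(const fs::path& p)
{
    std::ifstream in(p, std::ios::binary);
    std::vector<char> buf(1 << 20);
    uLong crc = crc32(0L, Z_NULL, 0);
    while (in) {
        in.read(buf.data(), buf.size());
        std::streamsize n = in.gcount();
        if (n <= 0) break;
        crc = crc32(crc, reinterpret_cast<const Bytef*>(buf.data()),
                    static_cast<uInt>(n));
    }
    return static_cast<uint32_t>(crc);
}

// Scan one root: relative path -> CRC-32 (one map per thread, no locking needed).
static void scan(fs::path root, std::map<std::string, uint32_t>& out)
{
    for (auto& e : fs::recursive_directory_iterator(
             root, fs::directory_options::skip_permission_denied))
        if (e.is_regular_file())
            out[fs::relative(e.path(), root).generic_string()] = crc_of(e.path());
}

int main(int argc, char** argv)
{
    // argv[1] is the master, argv[2..] are the copies: one thread per folder.
    std::vector<std::map<std::string, uint32_t>> results(argc - 1);
    std::vector<std::thread> pool;
    for (int i = 1; i < argc; i++)
        pool.emplace_back(scan, fs::path(argv[i]), std::ref(results[i - 1]));
    for (auto& t : pool) t.join();
    // ...compare results[0] (master) against results[1..] here...
}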
It should also compile on Linux (tested only on Debian), plus FreeBSD and Windows (gcc).
I have added some functions that I think are useful.
The first is the l (list) command.
Now, with ONE parameter (the .ZPAQ file), it shows its contents.
With more than one parameter, it compares the contents of the ZPAQ with one or more folders, with a (block) check of the SHA1s (the old -not =).
It can be used as a quick check after an add:
zpaqfranz a z:\1.zpaq c:\pippo
zpaqfranz l z:\1.zpaq c:\pippo
Then I introduce the c (compare) command for directories, between a master and N slaves.
With the -all switch it launches N+1 threads.
The default verification is by file name and size only.
Applying the -crc32 switch verifies the checksums as well.
WHAT?
When verifying that backups actually work, it is normal to extract them onto several different media (devices):
for example folders synchronized with rsync to a NAS, ZIP files, ZPAQ archives via NFS-mounted shares, smbfs, internal HDDs, etc.
Comparing multiple copies can take a (very) long time.
Suppose you have a /tank/condivisioni master (or source) directory (hundreds of GB, hundreds of thousands of files).
Suppose you have some internal (HDD) and external (NAS) rsynced copies (/rsynced-copy-1, /rsynced-copy-2, /rsynced-copy-3...).
Suppose you have an internal ZIP backup, an internal ZPAQ backup, an external one (NAS1, zip backup), another external one (NAS2, zpaq backup) and so on.
Let's extract all of them (ZIP and ZPAQs) into /temporaneo/1, /temporaneo/2, /temporaneo/3...
But comparing everything can take a lot of time (many hours) even on fast machines.
The c command compares a master folder (the first one indicated) to N slave folders (all the others) in two particular operating modes.
By default it just checks the correspondence of the files and their sizes
(for example for rsync copies with different charsets,
e.g. UNIX vs Linux, Mac vs Linux, UNIX vs NTFS, it is extremely useful).
Using the -crc32 switch a check of this code is also made (with hardware CPU support, if available).
The interesting aspect is the -all switch: N+1 threads will be created
(one for each specified folder) and executed in parallel,
both for scanning and for calculating the CRC.
On modern servers (e.g. a Xeon with 8, 10 or more CPUs)
with different (internal) media and multiple connections (NICs) to the NASes
you can drastically reduce times compared to multiple, sequential diff -qr runs.
It clearly makes no sense for a single magnetic disk.
In the given example
zpaqfranz c /tank/condivisioni /temporaneo/1 /temporaneo/2 /temporaneo/3 /rsynced-copy-1 /rsynced-copy-2 /rsynced-copy-3 -all
will run 7 threads which take care of one directory each.
The hypothesis is that the six copies are each on a different device, and that the server has plenty of cores and NICs.
This is normal in data storage and virtualization environments (at least in mine).
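A rough, purely hypothetical back-of-the-envelope example of the gain: with the seven folders of the example holding about 500 GB each, every folder on its own device reading at around 250 MB/s, a sequential pass needs roughly 7 x 2000 s, about 3.9 hours of reading, while seven parallel threads finish in about the time of the slowest single folder, roughly 33 minutes (assuming the CRC-32 calculation keeps up, which it normally does with hardware support).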
In particular, I have (perhaps) settled the test-after-add.
Using the -test switch, immediately after the creation of the archive
a chunked verification of the SHA1 codes is done (using very little RAM), together with a CRC-32 verification
(hardware-accelerated, if available).
This intercepts even SHA1 collisions:
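The collision check itself boils down to something like this conceptual sketch (probable_sha1_collision() is a made-up name, not an actual zpaqfranz function): two files that carry the same SHA1 but different whole-file CRC-32s cannot really be identical, which is exactly the case of the two shattered PDFs in the log below (B3FBAB1C vs 348150FB).

#include <cstdint>
#include <string>

// Same SHA1 but different whole-file CRC-32: the contents cannot match,
// so flag a probable SHA1 collision instead of silently trusting the SHA1.
bool probable_sha1_collision(const std::string& sha1_a, uint32_t crc32_a,
                             const std::string& sha1_b, uint32_t crc32_b)
{
    return sha1_a == sha1_b && crc32_a != crc32_b;
}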
C:\zpaqfranz>zpaqfranz a z:\1.zpaq c:\dropbox\Dropbox -test
zpaqfranz v50.7-experimental journaling archiver, compiled Jan 19 2021
Creating z:/1.zpaq at offset 0 + 0
Adding 8.725.128.041 in 29.399 files at 2021-01-19 18:11:23
98.22% 0:00:00 8.569.443.981 -> 5.883.235.525 of 8.725.128.041 164.796.999/sec
34.596 +added, 0 -removed.
0.000000 + (8725.128041 -> 7400.890377 -> 6054.485111) = 6.054.485.111
Forced XLS has included 13.342.045 bytes in 116 files
zpaqfranz: doing a full (with file verify) test
Compare archive content of:z:/1.zpaq:
1 versions, 34.596 files, 122.232 fragments, 6.054.485.111 bytes (5.64 GB)
34.596 in <<c:/dropbox/Dropbox>>
Total files found 34.596
GURU SHA1 COLLISION! B3FBAB1C vs 348150FB c:/dropbox/Dropbox/libri/collisione_sha1/shattered-1.pdf
# 2020-11-06 16:00:09 422.435 c:/dropbox/Dropbox/libri/collisione_sha1/shattered-1.pdf
+ 2020-11-06 16:00:09 422.435 c:/dropbox/Dropbox/libri/collisione_sha1/shattered-1.pdf
Block checking ( 119.742.900) done ( 7.92 GB) of ( 8.12 GB)
00034595 =same
00000001 #different
00000001 +external (file missing in ZPAQ)
Total different file size 844.870
79.547 seconds (with errors)
This (quick check) function can be invoked simply by using l instead of a
zpaqfranz a z:\1.zpaq c:\pippo
zpaqfranz l z:\1.zpaq c:\pippo