
Thread: zpaq updates

  1. #2641 - fcorbelli (Member, Italy)
    Could someone please help me by doing some testing of my version 47, which includes a (perhaps) pretty fast but reliable verifier?
    Essentially, the CRC-32 codes of the individual files are computed during the compression phase (hardware-accelerated, but not in a very smart way) and stored in the archive.

    During the test these codes are checked against the decompressed data (default setting), or even recomputed by re-reading the files from disk (switch -crc32).

    C:\zpaqfranz>zpaqfranz a r:\unnoo f:\* c:\dropbox\dropbox\* -test -crc32
    zpaqfranz v47-experimental journaling archiver, compiled Dec 25 2020
    Creating r:/unnoo.zpaq at offset 0 + 0
    Adding 35.277.684.778 in 193.591 files at 2020-12-25 10:11:03
    f:/System Volume Information/klmeta.dat: error sharing violation8 89.050.678/sec
    99.71% 0:00:00 35.171.430.059 -> 18.581.200.867 of 35.273.997.866 104.057.485/sec
    211.530 +added, 0 -removed.

    0.000000 + (35273.997866 -> 27200.708309 -> 18680.869392) = 18.680.869.392
    Forced XLS has included 87.887.223 bytes in 582 files

    zpaqfranz: do a full (not paranoid) test
    r:/unnoo.zpaq:
    1 versions, 211530 files, 520146 fragments, 18.680.869.392
    Checking 35.273.997.866 in 193.590 files -threads 12
    99.82% 0:00:00 35.208.949.750 -> 18.660.742.481 of 35.273.997.866 81.502.198/sec

    Checking 299.475 blocks with CRC32 (34.485.752.228)
    Re-testing CRC-32 from filesystem
    ERROR: STORED B3FBAB1C != DECOMPRESSED 348150FB (ck 00000001) c:/dropbox/dropbox/libri/collisione_sha1/shattered-1.pdf
    ERROR: STORED B3FBAB1C != DECOMPRESSED 348150FB (ck 00000001) c:/dropbox/dropbox/libri/collisione_sha1/shattered-2.pdf
    Verify time 111.625000 zeroed bytes 788.245.638
    ERRORS : 00000002 (ERROR: something WRONG)
    SURE : 00193588 of 00193590 (stored=decompressed=file on disk)
    WITH ERRORS

    544.328 seconds (with errors)


    I would therefore need some add (a) runs with the -test option on very weird files (all zeros, part zeros and part not, small, large, duplicated and un-duplicated, etc.):

    zpaqfranz a z:\pippo.zpaq c:\mydata d:\mydata2 -test


    A fast test (without re-reading from the filesystem) can be done via t (test).
    While not optimized, it should be pretty fast:
    zpaqfranz t z:\pippo.zpaq



    Slow (filesystem reload):
    In this case each file is re-read from the filesystem and its CRC-32 recalculated, normally by CPU hardware instructions, so the bottleneck is usually the media transfer rate:
    zpaqfranz t z:\pippo.zpaq -crc32


    The 64-bit Windows binary is here

    http://www.francocorbelli.it/zpaqfranz.exe

    Using a double check, SHA-1 on individual fragments and CRC-32 on the entire file, I hope to catch even SHA-1 collisions.
    A pretty brutal method, but it should work.
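
    For reference, here is a minimal C++ sketch of how per-fragment CRC-32s can be combined into a whole-file CRC-32 without re-reading any data, using zlib's crc32() and crc32_combine(). The struct and function names are mine, for illustration only; this is not the actual zpaqfranz code.

    #include <zlib.h>      // crc32(), crc32_combine()
    #include <cstdint>
    #include <vector>

    // A fragment's CRC-32 and its length in bytes.
    struct Fragment {
        uLong    crc;
        uint64_t size;
    };

    // Rebuild the CRC-32 of a whole file from the CRCs of its fragments,
    // taken in file order, without touching the fragment data again.
    uLong file_crc32(const std::vector<Fragment>& frags) {
        uLong crc = crc32(0L, Z_NULL, 0);   // CRC-32 of the empty stream
        for (const Fragment& f : frags)
            crc = crc32_combine(crc, f.crc, (z_off_t)f.size);
        return crc;
    }

    If the rebuilt value differs from the CRC-32 stored at add time, the file is flagged even when every fragment's SHA-1 matched: exactly what happens with the shattered-1.pdf / shattered-2.pdf pair in the log above.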

    Thank you, and merry Christmas!
    Attached Files

  2. #2642 - huh (Member, Czech Republic)
    I tested it with weird data I got from my friend. He has a program that calculates some starting positions, and these positions are stored in files after a while; I can't say more. I don't know if it meets your requirements: they are very small, similar files, but not identical, and each folder also contains one picture. It seems to have gone without errors.


    1) zpaqfranz v47-experimental journaling archiver, compiled Dec 25 2020
    Creating m.zpaq at offset 0 + 0
    Adding 41.131.731 in 3.241 files at 2020-12-26 19:16:05
    19.05% 0:00:00 7.834.586 -> 0 of 41.131.731 7.834.586/sec
    3.594 +added, 0 -removed.


    0.000000 + (41.131731 -> 40.499245 -> 1.219484) = 1.219.484


    zpaqfranz: do a full (not paranoid) test
    m.zpaq:
    1 versions, 3594 files, 3003 fragments, 1.219.484
    Checking 41.131.731 in 3.241 files -threads 16
    17.13% 0:00:04 7.047.342 -> 1.075.011 of 41.131.731 7.047.342/sec


    Checking 3.305 blocks with CRC32 (41.131.731)
    Verify time 0.078000 zeroed bytes 0
    GOOD : 00003241 of 00003241 (stored=decompressed)
    All OK (normal test)


    1.045 seconds (all OK)


    2) zpaqfranz v47-experimental journaling archiver, compiled Dec 25 2020
    m.zpaq:
    1 versions, 3594 files, 3003 fragments, 1.219.484
    Checking 41.131.731 in 3.241 files -threads 16




    Checking 3.305 blocks with CRC32 (41.131.731)
    Verify time 0.078000 zeroed bytes 0
    GOOD : 00003241 of 00003241 (stored=decompressed)
    All OK (normal test)


    0.219 seconds (all OK)


    3) zpaqfranz v47-experimental journaling archiver, compiled Dec 25 2020
    franz:use CRC32 instead of SHA1
    m.zpaq:
    1 versions, 3594 files, 3003 fragments, 1.219.484
    Checking 41.131.731 in 3.241 files -threads 16




    Checking 3.305 blocks with CRC32 (41.131.731)
    Re-testing CRC-32 from filesystem
    Verify time 0.297000 zeroed bytes 0
    SURE : 00003241 of 00003241 (stored=decompressed=file on disk)
    All OK (paranoid test)


    0.453 seconds (all OK)

  3. Thanks:

    fcorbelli (27th December 2020)

  4. #2643 - fcorbelli (Member, Italy)
    Quote Originally Posted by huh
    I tested it with weird ...
    Thank you.
    As far as I can see, there are no large blocks of zeroed bytes.

  5. #2644 - huh (Member, Czech Republic)
    OK, here are some zeros. The test was performed with a 1 GB file. However, I must emphasize that I also tried the test with a 2 GB file, and the process was so slow that I canceled it (after about an hour). Maybe some limit for zero blocks has been exceeded, I don't know.

    zpaqfranz v47-experimental journaling archiver, compiled Dec 25 2020
    Creating 1GB.zpaq at offset 0 + 0
    Adding 1.075.838.976 in 1 files at 2020-12-28 08:04:07


    1 +added, 0 -removed.


    0.000000 + (1075.838976 -> 0.176802 -> 0.005434) = 5.434


    zpaqfranz: do a full (not paranoid) test
    1GB.zpaq:
    1 versions, 1 files, 4 fragments, 5.434
    Checking 1.075.838.976 in 1 files -threads 16




    Checking 3 blocks with CRC32 (218.145)
    Verify time 23.057000 zeroed bytes 1.075.620.831
    GOOD : 00000001 of 00000001 (stored=decompressed)
    All OK (normal test)


    31.824 seconds (all OK)


    zpaqfranz v47-experimental journaling archiver, compiled Dec 25 2020
    1GB.zpaq:
    1 versions, 1 files, 4 fragments, 5.434
    Checking 1.075.838.976 in 1 files -threads 16




    Checking 3 blocks with CRC32 (218.145)
    Verify time 23.042000 zeroed bytes 1.075.620.831
    GOOD : 00000001 of 00000001 (stored=decompressed)
    All OK (normal test)


    23.697 seconds (all OK)


    zpaqfranz v47-experimental journaling archiver, compiled Dec 25 2020
    franz:use CRC32 instead of SHA1
    1GB.zpaq:
    1 versions, 1 files, 4 fragments, 5.434
    Checking 1.075.838.976 in 1 files -threads 16




    Checking 3 blocks with CRC32 (218.145)
    Re-testing CRC-32 from filesystem
    Verify time 23.618000 zeroed bytes 1.075.620.831
    SURE : 00000001 of 00000001 (stored=decompressed=file on disk)
    All OK (paranoid test)


    24.289 seconds (all OK)

  6. #2645 - fcorbelli (Member, Italy)
    Thank you.
    In fact, I am working on a faster CRC calculator for zeroed blocks.

  7. #2646 - huh (Member, Czech Republic)
    You can try it for yourself, here is a 2GB file.
    Attached Files

  8. Thanks:

    fcorbelli (30th December 2020)

  9. #2647 - fcorbelli (Member, Italy)
    Quote Originally Posted by huh
    You can try it for yourself, here is a 2GB file.
    Yes, you are right.
    For some time I have been wondering how to calculate the CRC of large blocks of zeros:
    https://encode.su/threads/3543-How-t...-zeroed-buffer


    I need to write a "smarter" function, maybe pre-calculating the CRCs of zero blocks of various sizes, instead of a single fixed one.
    Work in progress...

  10. #2648 - fcorbelli (Member, Italy)
    Quote Originally Posted by huh
    You can try it for yourself, here is a 2GB file.
    OK, this Windows 64 build uses a lookup table of zero-block CRCs up to 2^53 bytes.
    It should be rather fast for runs of zeros up to about 9,000 TB.
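
    For the curious, here is a minimal, generic C++ sketch of the power-of-two technique (built on the same GF(2) matrix machinery as zlib's crc32_combine(); the names are mine and this is not the zpaqfranz source). Each table entry is the 32x32 bit matrix that advances a CRC-32 register over 2^k zero bytes, so a run of n zeros costs O(log n) instead of O(n):

    #include <cstdint>

    // mat is a 32x32 matrix over GF(2): mat[i] is the image of bit i.
    static uint32_t gf2_times(const uint32_t* mat, uint32_t vec) {
        uint32_t sum = 0;
        for (; vec; vec >>= 1, ++mat)
            if (vec & 1) sum ^= *mat;
        return sum;
    }

    // out = mat * mat (out must not alias mat)
    static void gf2_square(uint32_t* out, const uint32_t* mat) {
        for (int i = 0; i < 32; ++i)
            out[i] = gf2_times(mat, mat[i]);
    }

    // zeros[k] advances a CRC-32 register over 2^k zero bytes, k = 0..53.
    static uint32_t zeros[54][32];

    // Call once at startup.
    static void init_zero_table() {
        uint32_t odd[32], even[32];
        odd[0] = 0xEDB88320u;                  // reflected CRC-32 polynomial
        for (int i = 1; i < 32; ++i)           // shift register by one zero bit
            odd[i] = 1u << (i - 1);
        gf2_square(even, odd);                 // 2 zero bits
        gf2_square(odd, even);                 // 4 zero bits
        gf2_square(zeros[0], odd);             // 8 zero bits = 1 zero byte
        for (int k = 1; k < 54; ++k)           // 2, 4, 8 ... 2^53 zero bytes
            gf2_square(zeros[k], zeros[k - 1]);
    }

    // Extend a finished CRC-32 over n zero bytes in O(log n).
    uint32_t crc32_zeros(uint32_t crc, uint64_t n) {
        crc = ~crc;                            // recover the internal register
        for (int k = 0; n; ++k, n >>= 1)
            if (n & 1) crc = gf2_times(zeros[k], crc);
        return ~crc;                           // re-apply the final inversion
    }

    The whole table is 54 x 32 x 4 bytes, about 7 KB, and 2^53 bytes is roughly 9,000 TB, which matches the stated limit.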

    http://www.francocorbelli.it/zpaqfranz.exe

  11. #2649 - huh (Member, Czech Republic)
    2GB
    1) 16.645 seconds (all OK)
    2) 0.702 seconds (all OK)
    3) 1.810 seconds (all OK)

    6GB
    1) 50.201 seconds (all OK)
    2) 2.043 seconds (all OK)
    3) 5.366 seconds (all OK)

    Nice, congratulations!

  12. #2650 - Member (China)
    Is there any plan for sliding mmap? I have already mentioned that not everyone has lots of RAM, especially when you share the archive on the internet.

  13. #2651 - Bulat Ziganshin (Programmer, Uzbekistan)
    For compression algos, using SSD instead of RAM will make compression prohibitively slow.

  14. #2652 - fcorbelli (Member, Italy)
    I am working on a multiple-directory comparator to verify, after extracting multiple backups, that they all match perfectly.

    Essentially, given the source folder /tank, we place three different copies, made with three different programs, on three different devices: /copia1, /copia2, /copia3.

    These are then extracted into three folders: /test/1, /test/2, /test/3.

    Three threads are launched that scan and calculate the CRC-32s in parallel.
    From the first few tests I'm seeing overall disk reads of over 1.6 GB/s (which isn't bad) from three SATA SSDs.

    Is there by any chance already such a tool for UNIX that I don't know about?

    Then I could stop modifying ZPAQ and just use ... that!

    In other words: a parallel diff -qr.

  15. #2653 - fcorbelli (Member, Italy)
    This is version 49.5.

    It should also compile on Linux (tested only on Debian), as well as on FreeBSD and Windows (gcc).

    I have added some functions that I think are useful.

    The first is the l (list) command.
    With ONE parameter (the .ZPAQ file) it now shows the archive's contents.

    With more than one parameter, it compares the contents of the ZPAQ with one or more folders, using a (block-level) check of the SHA-1s (the old -not =).
    It can be used as a quick check after an add:

    zpaqfranz a z:\1.zpaq c:\pippo
    zpaqfranz l z:\1.zpaq c:\pippo


    Then I introduce the c (compare) command for directories, between one master and N slaves.

    With the -all switch it launches N+1 threads.

    The default verification checks file names and sizes only.

    Adding the -crc32 switch also verifies the checksums.

    WHAT?

    When verifying that backups actually work, it is normal to extract them onto several different media (devices).

    For example: folders synchronized with rsync onto a NAS, ZIP files, ZPAQ archives on NFS-mounted shares, smbfs, internal HDDs, etc.

    Comparing multiple copies can take a (very) long time.

    Suppose you have a /tank/condivisioni master (or source) directory (hundreds of GB, hundreds of thousands of files).

    Suppose you have some internal (HDD) and external (NAS) rsynced copies (/rsynced-copy-1, /rsynced-copy-2, /rsynced-copy-3...).

    Suppose you have an internal ZIP backup, an internal ZPAQ backup, an external ZIP backup (on NAS1), an external ZPAQ backup (on NAS2), and so on.

    Let's extract all of them (ZIP and ZPAQs) into /temporaneo/1, /temporaneo/2, /temporaneo/3...

    You can do something like

    diff -qr /temporaneo/condivisioni /temporaneo/1
    diff -qr /temporaneo/condivisioni /temporaneo/2
    diff -qr /temporaneo/condivisioni /temporaneo/3
    (...)
    diff -qr /temporaneo/condivisioni /rsynced-copy-1
    diff -qr /temporaneo/condivisioni /rsynced-copy-2
    diff -qr /temporaneo/condivisioni /rsynced-copy-3
    (...)

    But this can take a lot of time (many hours), even on fast machines.

    The c command compares a master folder (the first one given) against N slave folders (all the others), in two operating modes.

    By default it just checks the correspondence of file names and sizes
    (extremely useful, for example, for rsync copies across different charsets:
    UNIX vs Linux, Mac vs Linux, UNIX vs NTFS).

    Using the -crc32 switch, the checksums are verified as well (with CPU hardware support, if available).

    The interesting aspect is the -all switch: N+1 threads will be created
    (one for each specified folder) and executed in parallel,
    both for scanning and for calculating the CRC.

    On modern servers (e.g. Xeon with 8, 10 or more CPUs)
    with different internal media and multiple connections (NICs) to NASs,
    you can drastically reduce times compared to multiple sequential diff -qr runs.

    It clearly makes no sense for single magnetic disks.

    In the given example

    zpaqfranz c /tank/condivisioni /temporaneo/1 /temporaneo/2 /temporaneo/3 /rsynced-copy-1 /rsynced-copy-2 /rsynced-copy-3 -all

    will run 7 threads which take care of one directory each.

    The hypothesis is that the six copies are each on a different device, and the server has plenty of cores and NICs.

    This is normal in data-storage and virtualization environments (at least in mine).
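
    As a minimal sketch of the parallel-scan idea, here is a C++17 toy that launches one std::thread per root directory and accumulates file counts and sizes (a real comparator would also CRC-32 each file and then diff the per-root listings; this is my illustration, not the zpaqfranz source):

    #include <cstdint>
    #include <cstdio>
    #include <filesystem>
    #include <thread>
    #include <vector>

    namespace fs = std::filesystem;

    struct ScanResult { uint64_t files = 0, bytes = 0; };

    // Walk one root and accumulate file count and total size.
    // A real comparator would also compute a CRC-32 per file here.
    static void scan(const fs::path& root, ScanResult& out) {
        for (const auto& e : fs::recursive_directory_iterator(
                 root, fs::directory_options::skip_permission_denied))
            if (e.is_regular_file()) {
                ++out.files;
                out.bytes += e.file_size();
            }
    }

    // Usage: compare <master> <slave1> <slave2> ...
    int main(int argc, char** argv) {
        std::vector<ScanResult> results(argc - 1);
        std::vector<std::thread> pool;
        for (int i = 1; i < argc; ++i)        // one thread per directory (N+1)
            pool.emplace_back(scan, fs::path(argv[i]), std::ref(results[i - 1]));
        for (auto& t : pool) t.join();
        for (int i = 1; i < argc; ++i)        // equal counts/sizes = quick OK
            std::printf("%12llu bytes %8llu files  %s\n",
                        (unsigned long long)results[i - 1].bytes,
                        (unsigned long long)results[i - 1].files, argv[i]);
    }

    With each root on a different physical device the scans overlap instead of queueing, which is where the roughly 2x gain shown in the next post comes from.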

    Win32 and Win64 binaries:
    http://www.francocorbelli.it/zpaqfranz.exe
    http://www.francocorbelli.it/zpaqfranz32.exe
    Attached Files

  16. Thanks:

    Mike (11th January 2021)

  17. #2654 - fcorbelli (Member, Italy)
    This is an example of a sequential scan...


    (...)
    540.739.857.890 379.656 time 16.536 /tank/condivisioni/
    540.739.857.890 379.656 time 17.588 /temporaneo/dedup/1/condivisioni/
    540.739.857.890 379.656 time 17.714 /temporaneo/dedup/2/tank/condivisioni/
    540.739.857.890 379.656 time 16.71 /temporaneo/dedup/3/tank/condivisioni/
    540.739.857.890 379.656 time 16.991 /temporaneo/dedup/4/condivisioni/
    540.739.857.890 379.656 time 93.043 /monta/nas1_condivisioni/
    540.739.857.890 379.656 time 67.312 /monta/nas2_condivisioni/
    540.739.840.075 379.656 time 362.129 /copia1/backup1/sincronizzata/condivisioni/
    ------------------------
    4.325.918.845.305 3.037.248 time 608.024 sec

    608.027 seconds (all OK)


    vs threaded...

    zpaqfranz v49.5-experimental journaling archiver, compiled Jan 11 2021
    Dir compare (8 dirs to be checked)
    Creating 8 scan threads
    12/01/2021 02:00:54 Scan dir || <</tank/condivisioni/>>
    12/01/2021 02:00:54 Scan dir || <</temporaneo/dedup/1/condivisioni/>>
    12/01/2021 02:00:54 Scan dir || <</temporaneo/dedup/2/tank/condivisioni/>>
    12/01/2021 02:00:54 Scan dir || <</temporaneo/dedup/3/tank/condivisioni/>>
    12/01/2021 02:00:54 Scan dir || <</temporaneo/dedup/4/condivisioni/>>
    12/01/2021 02:00:54 Scan dir || <</monta/nas1_condivisioni/>>
    12/01/2021 02:00:54 Scan dir || <</monta/nas2_condivisioni/>>
    12/01/2021 02:00:54 Scan dir || <</copia1/backup1/sincronizzata/condivisioni/>>
    Parallel scan ended in 330.402000
    About twice as fast (in this example).

  18. #2655 - fcorbelli (Member, Italy)
    This is version 50.7, with numerous bug fixes.

    In particular, the test-after-add is (perhaps) finally settled.
    Using the -test switch, immediately after the creation of the archive
    a chunked SHA-1 verification is done (using very little RAM), together with a CRC-32 verification
    (hardware-accelerated, if available).
    This intercepts even SHA-1 collisions.
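
    In outline, the double check works roughly like this (a hedged C++ sketch only, not the actual zpaqfranz code; OpenSSL's SHA1() and zlib's crc32() stand in for the internal implementations):

    #include <openssl/sha.h>   // SHA1()
    #include <zlib.h>          // crc32()
    #include <cstdint>
    #include <cstring>
    #include <vector>

    // One decompressed fragment, verified as soon as it is produced,
    // so only one fragment needs to be held in RAM at a time.
    struct Fragment {
        std::vector<uint8_t> data;   // decompressed bytes
        uint8_t stored_sha1[20];     // per-fragment SHA-1 recorded at add time
    };

    // SHA-1 per fragment, plus an independent whole-file CRC-32 compared
    // against the one stored at add time.
    bool verify_file(const std::vector<Fragment>& frags, uint32_t stored_crc32) {
        uint32_t crc = (uint32_t)crc32(0L, Z_NULL, 0);
        for (const Fragment& f : frags) {
            uint8_t digest[20];
            SHA1(f.data.data(), f.data.size(), digest);
            if (std::memcmp(digest, f.stored_sha1, 20) != 0)
                return false;        // corrupted fragment
            crc = (uint32_t)crc32(crc, f.data.data(), (uInt)f.data.size());
        }
        // Two SHA-1-colliding files deduplicate to the same fragments, but
        // their whole-file CRC-32s differ, so the comparison below fails.
        return crc == stored_crc32;
    }

    The log below shows exactly this case: the shattered PDFs pass the fragment SHA-1 check but fail the file-level comparison.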


    C:\zpaqfranz>zpaqfranz a z:\1.zpaq c:\dropbox\Dropbox -test
    zpaqfranz v50.7-experimental journaling archiver, compiled Jan 19 2021
    Creating z:/1.zpaq at offset 0 + 0
    Adding 8.725.128.041 in 29.399 files at 2021-01-19 18:11:23
    98.22% 0:00:00 8.569.443.981 -> 5.883.235.525 of 8.725.128.041 164.796.999/sec
    34.596 +added, 0 -removed.

    0.000000 + (8725.128041 -> 7400.890377 -> 6054.485111) = 6.054.485.111
    Forced XLS has included 13.342.045 bytes in 116 files

    zpaqfranz: doing a full (with file verify) test
    Compare archive content of:z:/1.zpaq:
    1 versions, 34.596 files, 122.232 fragments, 6.054.485.111 bytes (5.64 GB)
    34.596 in <<c:/dropbox/Dropbox>>
    Total files found 34.596


    GURU SHA1 COLLISION! B3FBAB1C vs 348150FB c:/dropbox/Dropbox/libri/collisione_sha1/shattered-1.pdf
    # 2020-11-06 16:00:09 422.435 c:/dropbox/Dropbox/libri/collisione_sha1/shattered-1.pdf
    + 2020-11-06 16:00:09 422.435 c:/dropbox/Dropbox/libri/collisione_sha1/shattered-1.pdf
    Block checking ( 119.742.900) done ( 7.92 GB) of ( 8.12 GB)
    00034595 =same
    00000001 #different
    00000001 +external (file missing in ZPAQ)
    Total different file size 844.870

    79.547 seconds (with errors)



    This quick-check function can be invoked simply by using l instead of a:

    zpaqfranz a z:\1.zpaq c:\pippo
    zpaqfranz l z:\1.zpaq c:\pippo


    Win32 and Win64 binaries:
    http://www.francocorbelli.it/zpaqfranz.exe
    http://www.francocorbelli.it/zpaqfranz32.exe

    Any comments are very welcome.

    Attached Files

  19. Thanks:

    Mike (19th January 2021)

  20. #2656 - fcorbelli (Member, Italy)

  21. #2657 - fcorbelli (Member, Italy)
    And now it is on
    https://sourceforge.net/projects/zpaqfranz/

    Could someone with artistic skills help me design the logo?
