
Thread: Backup compression algorithm recommendations

  1. #61
    Member
    Join Date
    Dec 2013
    Location
    Italy
    Posts
    517
    Thanks
    25
    Thanked 45 Times in 37 Posts
    Thank you, but after the release of zpaq I completely abandoned srep and other non-versioned packers

  2. #62
    Member
    Join Date
    Jun 2018
    Location
    Yugoslavia
    Posts
    82
    Thanks
    8
    Thanked 6 Times in 6 Posts
    Quote Originally Posted by fcorbelli View Post
    Encrypted filesystems can also occur (they are rare, but they exist), which are inherently uncompressible.
    Instead, they are easily deduplicable
    Encrypted data is compressible if you know the key and the encryption algorithm.
    Deduplication is a form of compression.


    Last edited by pklat; 31st January 2021 at 00:55.

  3. #63
    Programmer schnaader's Avatar
    Join Date
    May 2008
    Location
    Hessen, Germany
    Posts
    630
    Thanks
    288
    Thanked 252 Times in 128 Posts
    Quote Originally Posted by fcorbelli View Post
    Unpacking the virtual machine archive (the vmdk) must be identical to the vmdk itself.
    Nor a diff (complexity too high => way too slow).
    Yeah, the polynomial complexity argument is valid, at least for the proof of concept using libguestfs. I'd still say that a diff algorithm crafted for this purpose, using the known mapping from the .vmdk, should be able to achieve linear complexity.

    Quote Originally Posted by fcorbelli View Post
    You cannot mount the image and read its contents (in some cases you would not be able to at all; if you want, I can post some examples, like the one above with Solaris).
    When installing the libguestfs packages, I wondered why it installs so much other stuff, and hadn't realized that it actually mounts the filesystem - I thought it would only read the necessary data out of the .vmdk file. But I guess it makes sense this way from a practical point of view, as it's more flexible when new filesystems are introduced (mount tooling has to be updated, while libguestfs can rely on the data being passed to it).
    I was surprised at first that it didn't work with the files you uploaded, even after installing the additional libguestfs-zfs package for ZFS support - maybe I missed something. At least guestfish successfully lists two partitions (/dev/sda1 and /dev/sda2), but the filesystem displays "(unknown)"; it could also be some unsupported vmdk/zfs version.

    Quote Originally Posted by fcorbelli View Post
    Encrypted filesystems can also occur (they are rare, but they exist), which are inherently uncompressible.
    Instead, they are easily deduplicable
    Of course, 100% coverage will not be achieved realistically - there are just too many different VM image types (some of them proprietary), versions, filesystem and additional possible feature layers (e.g. sparse/non-sparse, built-in dedupe and/or compression in file systems, encryption...).

    I also tried 7-Zip on the smallest of your files ("fresh") and it could extract the main part of the first layer ("zfs0.img"), but only non-sparse (19.3 GB), so that also confirms the need for a tool crafted for this purpose. I guess diff and .vmdk reconstruction could still work, but all this unneeded overhead would of course slow the whole process down unnecessarily, so I stopped my curious experiments for now. Nevertheless, thanks for the test files; they gave me better insight into the stuff involved, confirmed the downside of the (underestimated) overall complexity, and confirmed that libguestfs is useful here in the "VM light" cases and for a proof of concept, but a good solution needs much more custom work!
    http://schnaader.info
    Damn kids. They're all alike.

  4. #64
    Member
    Join Date
    Dec 2013
    Location
    Italy
    Posts
    517
    Thanks
    25
    Thanked 45 Times in 37 Posts
    Quote Originally Posted by pklat View Post
    Encrypted data is compressible if you know the key and the encryption algorithm.
    Deduplication is a form of compression.


    Of course you do NOT know your customers' keys.
    Nor the format.

  5. #65
    Member
    Join Date
    Dec 2013
    Location
    Italy
    Posts
    517
    Thanks
    25
    Thanked 45 Times in 37 Posts
    Quote Originally Posted by schnaader View Post
    I also tried 7-Zip on the smallest of your files ("fresh") and it could extract the main part of the first layer ("zfs0.img"), but only non-sparse (19.3 GB), so that also confirms the need for a tool crafted for this purpose. I guess diff and .vmdk reconstruction could still work, but all this unneeded overhead would of course slow the whole process down unnecessarily, so I stopped my curious experiments for now. Nevertheless, thanks for the test files; they gave me better insight into the stuff involved, confirmed the downside of the (underestimated) overall complexity, and confirmed that libguestfs is useful here in the "VM light" cases and for a proof of concept, but a good solution needs much more custom work!
    This is the shell history.
    As you can see, there is no fancy voodoo in the disk's filesystem:
    1 15:53 ls
    2 15:53 exit
    3 15:55 pkg upgrade
    4 15:55 pkg update
    5 15:55 pkg install gcc
    6 15:55 portsnap fetch
    7 15:57 portsnap extract
    8 15:58 adduser
    9 15:59 shutdown -h now
    10 16:01 pkg install samb
    11 16:01 pkg install samba
    12 16:01 pkg search
    13 16:01 pkg search samba
    14 16:01 pkg install samba411
    15 16:02 nano /etc/rc.conf
    16 16:02 sysrc samba_server_enable=YES
    17 16:02 nano /etc/rc.conf
    18 16:03 zpool
    19 16:03 zfs
    20 16:03 zfs status
    21 16:03 zpool list
    22 16:03 nano /usr/local/etc/smb4.conf
    23 16:05 service samba_server restart
    24 16:06 smbstatus
    25 16:08 smbpasswd -a franco
    26 16:08 ping 192.168.1.2
    27 16:09 zfs create zroot/video
    28 16:09 ls /video
    29 16:09 zfs list
    30 16:09 zfs get compression zroot/video
    31 16:10 cd /monta
    32 16:10 mkdir /monta
    33 16:10 cd monta
    34 16:10 cd /monta
    35 16:10 mkdir prova
    36 16:11 mount_smbfs -N -I 192.168.1.2 //utente@franzk/z /monta/prova/
    37 16:12 mount_smbfs -I 192.168.1.2 //utente@franzk/z /monta/prova/
    38 16:12 cd z
    39 16:12 ls
    40 16:12 cd prova
    41 16:12 ls -l
    42 16:28 ls
    43 16:29 ls
    44 16:29 cp *.mp4 /zroot/video/
    45 16:30 df -h /zroot
    46 16:30 cd /
    47 16:30 cd /monta
    48 16:30 ls
    49 16:30 cd prova
    50 16:30 ls
    51 16:30 cp FreeBSD-11.4-RELEASE-amd64-disc1.iso /video
    52 16:30 rm /video
    53 16:31 cp FreeBSD-11.4-RELEASE-amd64-disc1.iso /zroot/video/
    54 16:31 shutdown -h now
    55 16:38 cls
    56 16:38 cd /tmp
    57 16:38 mkdir zpaq
    58 16:38 cd zpaq
    59 16:38 ftp archivio.francocorbelli.it
    60 16:38 vi zpaq.cpp
    61 16:38 gcc7 -O3 -march=native -Dunix zpaq.cpp -static -lstdc++ libzpaq.cpp -pthread -o
    62 16:39 gcc -O3 -march=native -Dunix zpaq.cpp -static -lstdc++ libzpaq.cpp -pthread -o zpaqfranz -static -lm
    63 16:39 ./zpaqfranz
    64 16:39 ./zpaqfranz a /tmp/prova /etc/* /usr/local/etc/*
    65 16:39 pkg install mc
    66 16:40 shutdown -h now
    67 13:00 history > /tmp/storia.txt


    As I tried to explain, regarding the thread's specific question (i.e. "which" algorithm for this type of work), the answer is simple: as fast as possible, small RAM footprint, scaling well across multiple cores, good handling of long runs of identical bytes (aka thick disks), decent decompression speed (ideally with no or few seeks).
    The compression ratio does not matter if the above requirements are met.
    In fact, in many cases compression is not used... at all.


    I can explain, and I have partly started to do so, "why".
    This requires starting from quite far back: how the disaster recovery administration of a virtual datacenter works.

    Then, the types of virtualizers, of which there are now essentially three.
    And maybe, specifically, how the most popular (vSphere/ESX) works, then the second (VirtualBox), and maybe the others (Xen, Proxmox, etc.).

    Because while I understand the question well (which compression "algorithm", not which "program"), the answer presupposes knowing the concrete usage scenarios.

    I'm not sure if anyone in this thread cares about it; it's certainly a very, very different world from backing up objects on filesystems.

    Because, for example, in the most popular system (vSphere), a filesystem (as we mean it in the modern sense) simply does not exist at all.

    Really, there is no filesystem.

    You can't even run programs on the physical server, at most scripts.

    This has a whole series of very relevant implications for backup and restore procedures.

    And finally, there is the TB scale and the half-night time window.

    This is why the answer "whatever you want, as long as it is fast" is actually considered, on the basis of decades of experience (also as a software developer, if necessary), and not just a rant from an uneducated guess.

  6. #66
    Member FatBit's Avatar
    Join Date
    Jan 2012
    Location
    Prague, CZ
    Posts
    194
    Thanks
    0
    Thanked 36 Times in 27 Posts
    Dear Mr Corbelli,

    I almost agree with you, but my experience is that zpaq does not handle (very) long file names and national characters (at least Czech ones). Does Italian also have national characters, different from en/us ASCII? Have you tried it?

    I have "invented" my own approach: copy/rename files according to their hash values (plain English ASCII), and consequently there isn't any problem.

    Best regards,

    Fatbit

  7. #67
    Member
    Join Date
    Dec 2013
    Location
    Italy
    Posts
    517
    Thanks
    25
    Thanked 45 Times in 37 Posts
    Quote Originally Posted by FatBit View Post
    Dear Mr Corbelli,

    I almost agree with you, but my experience is that zpaq does not handle (very) long file names and national characters (at least Czech ones).
    It's true.
    Does Italian also have national characters, different from en/us ASCII? Have you tried it?
    Yes, we have; not very many, but we have (contabilità is the most used word in Excel files).

    So I made a little "trick" in the zpaqfranz source:

    #ifdef _WIN32
    // set UTF-8 for the console
    SetConsoleCP(65001);
    SetConsoleOutputCP(65001);
    #endif

    I have "invented" my own approach: copy/rename files according to their hash values (plain English ASCII), and consequently there isn't any problem.

    Best regards,

    Fatbit
    In fact I made the (infamous) "c" (compare) command in zpaqfranz precisely to take care of... file name encoding.
    On *nix there isn't any; on Windows some "voodoo" is applied.

    PS I understand this is off-topic, please forgive me

  8. #68
    Member
    Join Date
    Jan 2021
    Location
    Spain
    Posts
    2
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by fcorbelli View Post
    "which" algorithm for this type of work, the answer is simple:
    as fast as possible
    e.g.
    -mx=1

    Quote Originally Posted by fcorbelli View Post
    little RAM footprint
    Depends entirely on what switches you set.
    e.g.
    -mfb=64 -md=1536M
    etc.


    Quote Originally Posted by fcorbelli View Post
    scaling well across multiple cores
    Depends entirely on what switches you set.
    e.g.:
    -mmt=16

    Quote Originally Posted by fcorbelli View Post
    good handling of large amounts of bytes equal (aka thick)
    e.g.:
    -myx[0-9]


    Quote Originally Posted by fcorbelli View Post
    The compression rate does not matter if the above requirements are met.
    Wrong, it doesn't matter to you. It matters to many, depending on whether you compress 100 MB, 5 GB, or 500 GB.
    Depends entirely on what switches you set.
    e.g.
    -mx=1 vs. -mx=9

    If you use a screwdriver like a hammer, problems happen. Really, read the documentation.
    Using easy-mode pigz -1 is not even trying!

    Quote Originally Posted by fcorbelli View Post
    Encrypted filesystems can also occur (they are rare, but they exist), which are inherently uncompressible.
    Instead, they are easily deduplicable

    No, correct disk encryption (dm-crypt/LUKS etc.) will NEVER yield blocks that are the same:
    en.wikipedia.org/wiki/Disk_encryption_theory#Encrypted_salt-sector_initialization_vector_(ESSIV)


    Quote Originally Posted by fcorbelli View Post
    Of course you do NOT known the keys of your customers
    Neither format.
    See: no duplicate blocks possible + no keys. We cannot apply deduplication techniques over correctly encrypted data.
    Talking about such an edge case will slow us down, and this must be SOLVED IN THE FILESYSTEM of encrypted machines:
    apply deduplication at the filesystem level if the disk is encrypted, and apply weak/super-fast compression (lz4) at the filesystem level if the disk is encrypted.
    It is only really helpful to talk about unencrypted vmdks.

  9. #69
    Member
    Join Date
    Dec 2013
    Location
    Italy
    Posts
    517
    Thanks
    25
    Thanked 45 Times in 37 Posts
    Quote Originally Posted by saltarelli View Post
    e.g.
    Wrong, it doesn't matter to you. It matters to many, depending on whether you compress 100 MB, 5 GB, or 500 GB...
    VM disks are all 10GB+
    Most often 400GB+

    If you use a screwdriver like a hammer, problems happen. Really, read the documentation.
    Typically I WRITE the documentation, when I have the time (rarely).
    Using easy-mode pigz -1 is not even trying!
    Have you ever made backups of even a single virtual server?
    Just asking.
    No, correct disk encryption (dm-crypt/LUKS etc.) will NEVER yield blocks that are the same:
    ...We cannot apply deduplication techniques over correctly encrypted data.
    Wrong.
    Yes, we can (cit.).
    I do it all the time.

    You don't want to deduplicate a single disk image, much less compress it.

    You want to deduplicate (not compress) a series of daily backups of an encrypted image,
    to maintain a history of recoverability, an essential requirement for any disaster recovery plan.

    And this is doable, of course.

    This is why, if you don't know how a virtual datacenter administrator works,
    it is difficult to choose the right algorithm: you don't even know what a hammer is.

    Here is a concrete, non-theoretical example of a fully-encrypted image of about 30 GB,
    for about 30 versions:
    900 GB stored in approximately 37 GB.


    zpaqfranz v50.12-experimental journaling archiver, compiled Jan 29 2021
    franz:FIND >>>>music<<<<<
    provona.zpaq:
    Block 00000001 K
    Block 00000002 K
    43 versions, 597 files, 533.677 fragments, 37.628.023.420 bytes (35.04 GB)

    - 2020-11-27 09:37:36 31.138.512.896 A 0001|SHA1: 284A93CFE09E7062BB3A771D0DB0BB874611D501 CRC32: 85454A09 c:/parte/BKmusicall3p
    - 2020-11-28 09:34:08 31.138.512.896 A 0002|SHA1: 81913BAC9454E47AFCAAE295FC7469958833FDFB CRC32: 2C305D99 c:/parte/BKmusicall3p
    - 2020-11-30 08:46:36 31.138.512.896 A 0003|SHA1: 5CECEAD2E253087768AE14BC00F7FA035629E981 CRC32: CBD25D3C c:/parte/BKmusicall3p
    - 2020-11-30 08:56:16 31.138.512.896 A 0004|SHA1: 5CECEAD2E253087768AE14BC00F7FA035629E981 CRC32: CBD25D3C c:/parte/BKmusicall3p
    - 2020-11-30 08:56:16 31.138.512.896 A 0017|SHA1: A8935A5AB7ABF2DD3BC619D41AAF581F11BF4385 CRC32: 379A466A c:/parte/BKmusicall3p
    - 2020-11-30 08:56:16 31.138.512.896 A 0019|SHA1: C6A89A5455D9CE4BAB8A0E9ED12D1BC072068470 CRC32: 3BD3182C c:/parte/BKmusicall3p
    - 2020-11-30 10:26:58 31.138.512.896 A 0020|SHA1: C6A89A5455D9CE4BAB8A0E9ED12D1BC072068470 CRC32: 3BD3182C c:/parte/BKmusicall3p
    - 2020-11-30 10:35:28 31.138.512.896 A 0022|SHA1: D0C4ED3C9529FA06E818631FA2A08E5EDAD47D62 CRC32: DFC250E7 c:/parte/BKmusicall3p
    - 2020-11-30 16:20:02 31.138.512.896 A 0023|SHA1: ABD40D1DB1EE939914E3780461A56ABC5D6E8150 CRC32: CD6AD6D2 c:/parte/BKmusicall3p
    - 2020-12-02 17:59:16 31.138.512.896 A 0024|SHA1: 886BB9C6C35DEC84D53F5880BAE38B2D0BF8450D CRC32: D05278A4 c:/parte/BKmusicall3p
    - 2020-12-03 17:57:14 31.138.512.896 A 0025|SHA1: 85998AED4BF0791ED42954AB3615A8A45F1B75EA CRC32: AA98CF1F c:/parte/BKmusicall3p
    - 2020-12-05 14:21:22 31.138.512.896 A 0026|SHA1: 1B895E1B81F2B5BE67E635230340501CACF2D159 CRC32: 990DDBB2 c:/parte/BKmusicall3p
    - 2020-12-05 17:28:04 31.138.512.896 A 0027|SHA1: 83D6B9BADF1FA9B6A00B95C8AC341552309CCEB7 CRC32: E01442E1 c:/parte/BKmusicall3p
    - 2020-12-06 10:37:22 31.138.512.896 A 0028|SHA1: E6973B3E5CEFF3A0735E790119AD985918E80489 CRC32: 408E9150 c:/parte/BKmusicall3p
    - 2020-12-10 09:00:46 31.138.512.896 A 0029|SHA1: 65CA9276C0744B52C96CE393EFCE686A3E38D4D4 CRC32: 352CC30E c:/parte/BKmusicall3p
    - 2020-12-12 10:26:28 31.138.512.896 A 0030|SHA1: 873EF4D387EF02B3EFA1F59F741A0A8587AAE7C2 CRC32: E38E416A c:/parte/BKmusicall3p
    - 2020-12-15 10:06:46 31.138.512.896 A 0031|SHA1: DB4F42A2B8EAAD5C8FB6E9005D6018F236A6DF57 CRC32: 0FF18597 c:/parte/BKmusicall3p
    - 2020-12-17 08:47:40 31.138.512.896 A 0032|SHA1: 0B060E58714A3EBD246B3A291169854A22D41C62 CRC32: 869F0B0E c:/parte/BKmusicall3p
    - 2020-12-19 10:10:34 31.138.512.896 A 0033|SHA1: 39C6A23FFB52F1329386AF0C9BFCEBB2C5ECC94A CRC32: 063BD783 c:/parte/BKmusicall3p
    - 2020-12-27 12:54:02 31.138.512.896 A 0034|SHA1: 5A5E7AE5FC10BA06730723E79FCE138EF619F08B CRC32: 8104C8D4 c:/parte/BKmusicall3p
    - 2021-01-01 12:24:20 31.138.512.896 A 0035|SHA1: DD06E3D8748D45DDA97EB05DCCFD27832B13C394 CRC32: 9B060676 c:/parte/BKmusicall3p
    - 2021-01-07 18:28:58 31.138.512.896 A 0036|SHA1: ED6F4C2632F725930C568297E1D750EA129BFE77 CRC32: 7217B02C c:/parte/BKmusicall3p
    - 2021-01-14 12:36:50 31.138.512.896 A 0037|SHA1: 3B3812D365BE5E03C368492A581BEF342C947352 CRC32: 55117BBF c:/parte/BKmusicall3p
    - 2021-01-19 09:48:38 31.138.512.896 A 0038|SHA1: 5449719B7449B3AC1146ADC332F65451818E003B CRC32: EEBC783F c:/parte/BKmusicall3p
    - 2021-01-20 18:25:34 31.138.512.896 A 0039|SHA1: 78515EF3762F99C5BAFBA9063146E1C5DCC135BD CRC32: F276CFB0 c:/parte/BKmusicall3p
    - 2021-01-24 14:54:58 31.138.512.896 A 0040|SHA1: D13FF0731168EE29F2F679F95801A9BECD3B2927 CRC32: B3CBF5FC c:/parte/BKmusicall3p
    - 2021-01-24 14:54:58 31.138.512.896 A 0041|SHA1: 8F0244493D4F5062E825B42A070F4761C4781DE9 CRC32: C42F75B6 c:/parte/BKmusicall3p
    - 2021-01-28 13:49:38 31.138.512.896 A 0042|SHA1: 8F0244493D4F5062E825B42A070F4761C4781DE9 CRC32: C42F75B6 c:/parte/BKmusicall3p
    - 2021-01-30 13:39:56 31.138.512.896 A 0043|SHA1: 8F0244493D4F5062E825B42A070F4761C4781DE9 CRC32: C42F75B6 c:/parte/BKmusicall3p

    903.016.873.984 (841.00 GB) of 903.359.132.833 (841.32 GB) in 665 files shown



    Those are ~1.33 TB of images in ~50 GB of space

    zpaqfranz v50.12-experimental journaling archiver, compiled Jan 29 2021
    franz:FIND >>>>music<<<<<
    provona_???????.zpaq:
    Block 00000001 K
    Block 00000002 K
    Block 00000003 K
    50 versions, 50 files, 725.781 fragments, 51.972.616.481 bytes (48.40 GB)

    - 2020-01-07 19:18:37 31.138.512.896 A 0001|c:/parte/BKmusicall3p
    - 2020-05-16 10:50:22 31.138.512.896 A 0002|c:/parte/BKmusicall3p
    - 2020-05-16 10:58:52 31.138.512.896 A 0003|c:/parte/BKmusicall3p
    - 2020-05-16 17:09:06 31.138.512.896 A 0004|c:/parte/BKmusicall3p
    - 2020-05-16 17:14:36 31.138.512.896 A 0005|c:/parte/BKmusicall3p
    - 2020-05-16 17:45:09 31.138.512.896 A 0006|c:/parte/BKmusicall3p
    - 2020-05-16 17:56:35 31.138.512.896 A 0007|c:/parte/BKmusicall3p
    - 2020-05-17 15:18:06 31.138.512.896 A 0008|c:/parte/BKmusicall3p
    - 2020-05-27 12:25:20 31.138.512.896 A 0009|c:/parte/BKmusicall3p
    - 2020-05-29 14:51:40 31.138.512.896 A 0010|c:/parte/BKmusicall3p
    - 2020-05-31 10:42:24 31.138.512.896 A 0011|c:/parte/BKmusicall3p
    - 2020-05-31 13:36:49 31.138.512.896 A 0012|c:/parte/BKmusicall3p
    - 2020-05-31 15:33:10 31.138.512.896 A 0013|c:/parte/BKmusicall3p
    - 2020-06-03 17:33:01 31.138.512.896 A 0014|c:/parte/BKmusicall3p
    - 2020-06-08 08:54:38 31.138.512.896 A 0015|c:/parte/BKmusicall3p
    - 2020-06-13 11:09:36 31.138.512.896 A 0016|c:/parte/BKmusicall3p
    - 2020-06-16 16:30:32 31.138.512.896 A 0017|c:/parte/BKmusicall3p
    - 2020-06-21 12:41:16 31.138.512.896 A 0018|c:/parte/BKmusicall3p
    - 2020-06-23 16:03:39 31.138.512.896 A 0019|c:/parte/BKmusicall3p
    - 2020-06-24 10:04:45 31.138.512.896 A 0020|c:/parte/BKmusicall3p
    - 2020-06-29 12:58:28 31.138.512.896 A 0021|c:/parte/BKmusicall3p
    - 2020-07-03 10:14:22 31.138.512.896 A 0022|c:/parte/BKmusicall3p
    - 2020-07-05 10:03:06 31.138.512.896 A 0023|c:/parte/BKmusicall3p
    - 2020-07-08 12:52:17 0 A 0024|c:/parte/BKmusicall3p
    - 2020-07-08 12:52:42 0 A 0025|c:/parte/BKmusicall3p
    - 2020-07-08 12:53:18 0 A 0026|c:/parte/BKmusicall3p
    - 2020-07-08 12:54:26 31.138.512.896 A 0027|c:/parte/BKmusicall3p
    - 2020-07-16 12:29:05 31.138.512.896 A 0028|c:/parte/BKmusicall3p
    - 2020-07-18 11:58:34 31.138.512.896 A 0029|c:/parte/BKmusicall3p
    - 2020-07-22 12:20:31 31.138.512.896 A 0030|c:/parte/BKmusicall3p
    - 2020-07-25 16:44:04 31.138.512.896 A 0031|c:/parte/BKmusicall3p
    - 2020-07-26 14:49:04 31.138.512.896 A 0032|c:/parte/BKmusicall3p
    - 2020-07-31 11:30:28 31.138.512.896 A 0033|c:/parte/BKmusicall3p
    - 2020-08-12 08:42:50 31.138.512.896 A 0034|c:/parte/BKmusicall3p
    - 2020-08-14 14:05:46 31.138.512.896 A 0035|c:/parte/BKmusicall3p
    - 2020-08-21 15:11:16 31.138.512.896 A 0036|c:/parte/BKmusicall3p
    - 2020-08-23 08:38:22 31.138.512.896 A 0037|c:/parte/BKmusicall3p
    - 2020-08-26 16:52:54 31.138.512.896 A 0038|c:/parte/BKmusicall3p
    - 2020-09-04 12:22:56 31.138.512.896 A 0039|c:/parte/BKmusicall3p
    - 2020-09-07 14:58:28 31.138.512.896 A 0040|c:/parte/BKmusicall3p
    - 2020-09-08 17:10:26 31.138.512.896 A 0041|c:/parte/BKmusicall3p
    - 2020-09-13 16:43:30 31.138.512.896 A 0042|c:/parte/BKmusicall3p
    - 2020-09-15 17:19:18 31.138.512.896 A 0043|c:/parte/BKmusicall3p
    - 2020-09-21 15:58:56 31.138.512.896 A 0044|c:/parte/BKmusicall3p
    - 2020-09-24 16:03:56 31.138.512.896 A 0045|c:/parte/BKmusicall3p
    - 2020-09-26 17:19:40 31.138.512.896 A 0046|c:/parte/BKmusicall3p
    - 2020-09-30 12:25:12 31.138.512.896 A 0047|c:/parte/BKmusicall3p
    - 2020-10-06 16:44:04 31.138.512.896 A 0048|c:/parte/BKmusicall3p
    - 2020-10-07 10:23:14 31.138.512.896 A 0049|c:/parte/BKmusicall3p
    - 2020-10-11 16:29:44 31.138.512.896 A 0050|c:/parte/BKmusicall3p

    1.463.510.106.112 (1.33 TB) of 1.463.510.106.112 (1.33 TB) in 100 files shown


    So, in this case, you will not use compression at all, just as I previously explained.

  10. #70
    Member
    Join Date
    Jan 2021
    Location
    Germany
    Posts
    1
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by fcorbelli View Post
    Wrong.
    Yes, we can (cit).
    I made it all the time.
    You don't want to deduplicate a single disk image, much less compress it
    You want to deduplicate (not compress) a series of daily backups of an encrypted image,
    to maintain a history of recoverability, an essential requirement for any disaster recovery plan.
    Here is a concrete, non-theoretical example of a fully-encrypted image of about 30 GB,
    blah

    So, in this case, you will not use compression at all, just as I previously explained

    a) you are showing us the output of an archiver with active journaling
    b) it is just an encrypted zpaq archive
    c) completely missing the point of what disk encryption is

    You seemingly only encrypt the image in zpaq.
    Look at this:
    c:/parte/BKmusicall3p

    If you had real encryption, you wouldn't even be able to see any directory name.

    But it is natural that Windows people are unfamiliar with Linux-style full disk encryption:
    https://wiki.gentoo.org/wiki/Dm-cryp...isk_encryption

    Can you see file paths inside an image of an unmounted && encrypted filesystem?
    No. That would entirely defeat the purpose of encryption. Any metadata is encrypted as well.

    So why are you spinning? Just say you don't know about these things. Nothing to be ashamed about!
    encrypted FS != encrypted image != encrypted archive

    Quote Originally Posted by fcorbelli View Post
    Of course you do NOT known the keys of your customers
    Neither format.
    If your filesystem and your memory is not encrypted, you are offering your customers no real encryption.

    https://www.anandtech.com/show/14587...ow-to-build-22

    But it is a waste of time to register on such sites and argue.
    I wish you'd just stop mixing up and misusing these terms so much.

  11. #71
    Member
    Join Date
    Dec 2013
    Location
    Italy
    Posts
    517
    Thanks
    25
    Thanked 45 Times in 37 Posts
    Quote Originally Posted by ManuelS View Post
    a) you are showing us the output of an archiver with active journaling
    b) it is just an encrypted zpaq archive
    Ahem... no.
    c) completely missing the point of what disk encryption is
    You seemingly only encrypt the image in zpaq.
    Look at this:
    c:/parte/BKmusicall3p

    If you had real encryption you wouldn't even be able to see any directory name
    Ahem...
    No.
    This is a virtual disk container.
    Do you know what it is?
    It is just like a .vmdk, but fully encrypted at the sector level...
    ...something like a TrueCrypt container.
    But it is natural that Windows people are unfamiliar with Linux-style full disk encryption
    Yes, you are right.
    I am unfamiliar with Linux, managing only about fifty servers, because it is not safe enough for my needs.
    In fact I use Unix systems (BSD and Solaris).
    But, when I needed a really secure operating system, I wrote it myself, from scratch, on Alpha AXP.

    So why are you spinning? Just say you don't know about these things. Nothing to be ashamed about!
    encrypted FS != encrypted image != encrypted archive
    OK, you are right.
    You win.


    PS I'm being a little ironic, because I was probably using (and writing) encryption, encrypted disks, and whole encrypted machines well before you knew there was something called a "computer".
    I have been cracking encryption and security systems for courts for 21 years now (before that I was on the "dark side").
    It's Sunday, after all; have a laugh.

  12. #72
    Member
    Join Date
    Dec 2013
    Location
    Italy
    Posts
    517
    Thanks
    25
    Thanked 45 Times in 37 Posts

    Srep output...
    100%: 31,138,512,896 -> 31,138,631,752: 100.00%. Cpu 215 mb/s (138.031 sec), real 230 mb/s (128.893 sec) = 107%. Remains 00:00
    Decompression memory is 0 mb. 0 matches = 0 bytes = 0.00% of file

    30/01/2021 14:39 31.138.512.896 BKmusicall3p
    31/01/2021 18:38 31.138.631.752 prova.srep

    So it should be clear that this is a really encrypted container.

    However, it can obviously be deduplicated, in this case by zpaq, taking up minimal effective space.
    A kind of diff on steroids.

    C:\cloud>c:\cloud\zpaqfranz a r:\test\cloud\provona.zpaq c:\parte\BKmusicall3p c:\zpaqfranz\* -force -method 0 -summary 1 -checksum -pakka
    franz:checksumming every file with CRC32, store SHA1

    Updating r:/test/cloud/provona.zpaq at offset 37628023420 + 0
    Adding 31.252.884.173 in 383 files at 2021-01-31 17:41:20
    + Checksumming ( 31.138.512.896) c:/parte/BKmusicall3p
    1 +added, 0 -removed.

    37628.023420 + (31252.884173 -> 3.952238 -> 5.743409) = 37.633.766.829
    Forced XLS has included 27.136 bytes in 1 files

    214.641 seconds (all OK)

    New version: about 6MB

  13. #73
    Member
    Join Date
    Jun 2018
    Location
    Yugoslavia
    Posts
    82
    Thanks
    8
    Thanked 6 Times in 6 Posts
    You didn't start this thread, and there was no mention of encryption (esp. of other people's data) in the OP's post.
    I think clouds should just evaporate, and everyone should fully control their own data.

  14. #74
    Member
    Join Date
    Dec 2013
    Location
    Italy
    Posts
    517
    Thanks
    25
    Thanked 45 Times in 37 Posts
    Quote Originally Posted by pklat View Post
    You didn't start this thread, and there was no mention of encryption (esp. of other people's data) in the OP's post.
    I think clouds should just evaporate, and everyone should fully control their own data.
    ...- VM image data
    And yes.
    There are encrypted VMs.
    Not very common, but they certainly exist.
    As I just demonstrated.
    A good compression algorithm must consider all use cases, not just mine or yours.
    ---
    The other special situation is ESX thin disks which, when transferred to different systems (e.g. a NAS), become thick or thick-sparse.
    In both cases they can (depending on the fragmentation) be padded with zeros.

  15. #75
    Member
    Join Date
    Dec 2013
    Location
    Italy
    Posts
    517
    Thanks
    25
    Thanked 45 Times in 37 Posts
    A detail.
    In an ideal world the algorithm should offer, in addition to a very modest use of RAM, the possibility of being compiled even with gcc 3.2 (yes, I know, archaic) in 32-bit mode.

    The reason is to run it on ESXi servers.
    I'm working on it, and it's not easy at all.

    Clearly, software that can be "implanted" directly into the server would have enormous appeal for every vSphere user.

    [Attached screenshot: esx.jpg, 257.1 KB]

  16. #76
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    4,133
    Thanks
    320
    Thanked 1,396 Times in 801 Posts
    For similar reasons, I'm thinking about a custom binary-blob loader for codec plugins.
    Compression algorithms usually don't need much interaction with the OS from the start, and mman.h should be portable enough.

  17. #77
    Member
    Join Date
    Dec 2013
    Location
    Italy
    Posts
    517
    Thanks
    25
    Thanked 45 Times in 37 Posts
    As the compressor, I suggest something like this (just a mock-up for Windows;
    I am working on a vSphere port).
    Attached Files

  18. #78
    Member
    Join Date
    Feb 2021
    Location
    here
    Posts
    1
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hello. Long-time lurker, first-time poster here.

    I have had a wishlist for a while now:
    LZHAM has close to LZMA(2) ratio and significantly faster decompression than LZMA(2), but very slow compression.
    FLZMA2, on the other hand, has LZMA(2) ratio and faster compression than LZMA(2), but, being 7z/LZMA format compatible, it is stuck with slow LZMA(2) decompression.
    I imagine that if someone could fit FLZMA2's compression and speed into the faster-decompressing LZHAM format, that would be the best of both worlds! (Maybe lose a bit of compression ratio in such a merge, but still... a class of its own, possibly Pareto-approaching too.)
    And such a combination may be the answer to the needs you seek.

  19. #79
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    4,133
    Thanks
    320
    Thanked 1,396 Times in 801 Posts
    @fcorbelli: please test zstd: https://github.com/facebook/zstd
    There's currently no point in accepting a codec with worse compression than that.

    @warthog: The actual reason for LZMA's relative slowness on out-of-order CPUs is serial dependency.
    It's possible to significantly improve LZMA decoding speed if substream interleaving is introduced...
    which adds some redundancy and breaks format compatibility, though.
    Codecs like that already exist: LZNA, LOLZ, NLZM, etc.
    LZHAM uses Huffman coding and thus should be compared to zstd and brotli instead.

  20. #80
    Member
    Join Date
    Dec 2013
    Location
    Italy
    Posts
    517
    Thanks
    25
    Thanked 45 Times in 37 Posts
    Quote Originally Posted by Shelwien View Post
    @fcorbelli: please test zstd: https://github.com/facebook/zstd
    There's currently no point in accepting a codec with worse compression than that.
    In fact, no, or at least not always.
    No if you need something far easier to compile.
    Stripping down one of those 'monsters' requires a lot of time and effort.
    miniLZO works rather well at about 150K on a single thread.
    Of course the ratio is not exceptional, but it doesn't really matter for VMs.

    Good (I hope; still developing) for a 24h background compression task (e.g. on ghetto VMware snapshots).
    Therefore, another question is: where will the software run?

  21. #81
    Member
    Join Date
    Jan 2017
    Location
    uk
    Posts
    15
    Thanks
    0
    Thanked 7 Times in 3 Posts
    Quote Originally Posted by Bulat Ziganshin View Post

    And FA'Next is even better in both CR and speed, especially the unpublished 0.12 version.
    When will you be releasing v0.12?

  22. #82
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    4,133
    Thanks
    320
    Thanked 1,396 Times in 801 Posts
    Single-file zstd -1
    Compiling: g++ -O3 czstd.cpp -o czstd
    Usage: czstd c/d input output
    Attached Files

  23. #83
    Member
    Join Date
    Dec 2013
    Location
    Italy
    Posts
    517
    Thanks
    25
    Thanked 45 Times in 37 Posts
    Quote Originally Posted by Shelwien View Post
    Single-file zstd -1
    Compiling: g++ -O3 czstd.cpp -o czstd
    Usage: czstd c/d input output
    I will try


