Thank you, but after the release of zpaq I totally abandoned step and other non-versioned packers.
Yeah, the polynomial complexity argument is valid, at least for the proof of concept using libguestfs. I'd still say that a diff algorithm crafted for this purpose that uses the known mapping from the .vmdk should be able to get linear complexity.
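To sketch the linear-time idea (names and the fixed grain size here are illustrative; a real tool would follow the .vmdk grain tables and skip unallocated grains instead of reading raw offsets):

#include <cstdio>
#include <cstring>
#include <vector>

static const size_t GRAIN = 64 * 1024;  // 64 KiB, a common .vmdk grain size

int main(int argc, char** argv) {
    if (argc != 3) { std::fprintf(stderr, "usage: imgdiff old.img new.img\n"); return 1; }
    std::FILE* a = std::fopen(argv[1], "rb");
    std::FILE* b = std::fopen(argv[2], "rb");
    if (!a || !b) { std::perror("fopen"); return 1; }
    std::vector<char> ba(GRAIN), bb(GRAIN);
    for (size_t grain = 0;; ++grain) {
        size_t na = std::fread(ba.data(), 1, GRAIN, a);
        size_t nb = std::fread(bb.data(), 1, GRAIN, b);
        if (na == 0 && nb == 0) break;               // both images exhausted
        if (na != nb || std::memcmp(ba.data(), bb.data(), na) != 0)
            std::printf("grain %zu differs\n", grain);  // emit only changed grains
    }
    return 0;
}

One sequential pass over both images, so linear in the image size.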
When installing the libguestfs packages, I wondered why it installs so much other stuff, and didn't realize that it indeed mounts the filesystem - I thought it would only read out the necessary data from the .vmdk file. But I guess it makes sense this way from a practical view, as it's more flexible when new filesystems are introduced (the mount tooling has to be updated, but libguestfs can rely on the data being passed to it).
I was surprised at first that it didn't work with the files you uploaded, even after installing the additional libguestfs-zfs package for ZFS support - maybe I missed something. At least guestfish successfully lists two partitions (/dev/sda1 and /dev/sda2), but the filesystem displays as "(unknown)"; it could also be some unsupported vmdk/zfs version.
Of course, 100% coverage will not realistically be achieved - there are just too many different VM image types (some of them proprietary), versions, filesystems, and additional possible feature layers (e.g. sparse/non-sparse, built-in dedupe and/or compression in file systems, encryption...).
Also tried 7-Zip on the smallest of your files ("fresh") and it could extract the main part of the first layer ("zfs0.img"), but only non-sparse (19,3 GB), so that also confirms the need for a tool crafted for this purpose. I guess diff and .vmdk reconstruction could still work, but all this unneeded overhead would of course slow the whole process down unnecessarily, so I stopped my curious experiments for now. Nevertheless, thanks for the test files; they gave me better insight into the stuff involved, confirmed the downside of the (underestimated) overall complexity, and confirmed that libguestfs is useful here in the "VM light" cases and for a proof of concept, but a good solution needs much more custom work!
http://schnaader.info
Damn kids. They're all alike.
This is the shell history.
As you can see, there is no fancy voodoo in the disk filesystem.
1 15:53 ls
2 15:53 exit
3 15:55 pkg upgrade
4 15:55 pkg update
5 15:55 pkg install gcc
6 15:55 portsnap fetch
7 15:57 portsnap extract
8 15:58 adduser
9 15:59 shutdown -h now
10 16:01 pkg install samb
11 16:01 pkg install samba
12 16:01 pkg search
13 16:01 pkg search samba
14 16:01 pkg install samba411
15 16:02 nano /etc/rc.conf
16 16:02 sysrc samba_server_enable=YES
17 16:02 nano /etc/rc.conf
18 16:03 zpool
19 16:03 zfs
20 16:03 zfs status
21 16:03 zpool list
22 16:03 nano /usr/local/etc/smb4.conf
23 16:05 service samba_server restart
24 16:06 smbstatus
25 16:08 smbpasswd -a franco
26 16:08 ping 192.168.1.2
27 16:09 zfs create zroot/video
28 16:09 ls /video
29 16:09 zfs list
30 16:09 zfs get compression zroot/video
31 16:10 cd /monta
32 16:10 mkdir /monta
33 16:10 cd monta
34 16:10 cd /monta
35 16:10 mkdir prova
36 16:11 mount_smbfs -N -I 192.168.1.2 //utente@franzk/z /monta/prova/
37 16:12 mount_smbfs -I 192.168.1.2 //utente@franzk/z /monta/prova/
38 16:12 cd z
39 16:12 ls
40 16:12 cd prova
41 16:12 ls -l
42 16:28 ls
43 16:29 ls
44 16:29 cp *.mp4 /zroot/video/
45 16:30 df -h /zroot
46 16:30 cd /
47 16:30 cd /monta
48 16:30 ls
49 16:30 cd prova
50 16:30 ls
51 16:30 cp FreeBSD-11.4-RELEASE-amd64-disc1.iso /video
52 16:30 rm /video
53 16:31 cp FreeBSD-11.4-RELEASE-amd64-disc1.iso /zroot/video/
54 16:31 shutdown -h now
55 16:38 cls
56 16:38 cd /tmp
57 16:38 mkdir zpaq
58 16:38 cd zpaq
59 16:38 ftp archivio.francocorbelli.it
60 16:38 vi zpaq.cpp
61 16:38 gcc7 -O3 -march=native -Dunix zpaq.cpp -static -lstdc++ libzpaq.cpp -pthread -o
62 16:39 gcc -O3 -march=native -Dunix zpaq.cpp -static -lstdc++ libzpaq.cpp -pthread -o zpaqfranz -static -lm
63 16:39 ./zpaqfranz
64 16:39 ./zpaqfranz a /tmp/prova /etc/* /usr/local/etc/*
65 16:39 pkg install mc
66 16:40 shutdown -h now
67 13:00 history > /tmp/storia.txt
As I tried to explain, the answer to the thread-specific question, i.e. "which" algorithm for this type of work, is simple: as fast as possible, small RAM footprint, scaling well across multiple cores, good handling of large runs of identical bytes (aka thick disks), decent decompression speed (ideally with no or few seeks).
The compression ratio does not matter if the above requirements are met.
In fact, in many cases compression is not used ... at all.
I can explain, and I have partly started to do so, "why".
This requires starting from quite far away: how the disaster-recovery administration of a virtual datacenter works.
And, in turn, on the types of virtualizers, of which there are essentially 3 today.
And, maybe, specifically, how the most popular (vSphere/ESX) works, then the second (VirtualBox), and maybe the others (Xen, Proxmox etc.).
Because while I understand the question well (which compression "algorithm", not which "program"), the answer presupposes knowing the concrete usage scenarios.
I'm not sure anyone in this thread cares, but it's certainly a very, very different world from backing up objects on filesystems.
Because, for example, in the most popular system (vSphere), a filesystem (as we mean it in the modern sense) simply does not exist at all.
Really, there is no filesystem
You can't even run programs on the physical server, at most scripts.
This has a whole series of very relevant implications for backup and restore procedures.
And finally, there is the TB scale and the half-a-night time window.
This is why the answer "whatever you want, as long as it is fast" is actually considered, grounded in decades of experience (also as a software developer, when necessary), and not just a rant based on uneducated guesses.
Dear Mr Corbelli,
I almost agree with you, but my experience is that zpaq does not handle (very) long file names and national characters (at least Czech ones). Does Italian also have national characters, different from en/us ASCII? Have you tried it?
I have "invented" my own approach: copy/rename files according to their hash values (= English ASCII), and consequently there is no problem at all.
Best regards,
Fatbit
It's true.
> Does Italian also have national characters, different from en/us ASCII? Have you tried it?
Yes, we have; not very many, but we have ("contabilità" is the most used word in Excel files).
So I made a little "trick" in the zpaqfranz source:
#ifdef _WIN32
// set the console codepage to UTF-8 so non-ASCII filenames display correctly
SetConsoleCP(65001);
SetConsoleOutputCP(65001);
#endif
In fact, I made the (infamous) "c" (compare) command in zpaqfranz precisely to take care of... file name encoding.
> I have "invented" my own approach: copy/rename files according to their hash values (= English ASCII), and consequently there is no problem at all.
On *nix no trick is needed; on Windows some "voodoo" is done.
PS I understand this is off-topic, please forgive me!
e.g. -mx=1
Depends entirely on what switches you set, e.g. -mfb=64 -md=1536M etc.
Depends entirely on what switches you set, e.g. -mmt=16 or -myx[0-9].
Wrong: it doesn't matter to you. It matters to many, depending on whether you compress 100 MB, 5 GB, or 500 GB.
Depends entirely on what switches you set, e.g. -mx=1 vs. -mx=9.
If you use a screwdriver like a hammer, problems happen. Really, read the documentation.
Using easy mode pigz -1 is not even trying!
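For instance, two illustrative invocations at opposite ends of the trade-off, using exactly the switches above (archive and input names are placeholders):

7z a fast.7z disk.vmdk -mx=1 -mmt=16
7z a best.7z disk.vmdk -mx=9 -mfb=64 -md=1536M

Same input; completely different speed, memory use, and ratio.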
No, correct disk encryption (dm-crypt/LUKS etc.) will NEVER yield blocks that are the same:
en.wikipedia.org/wiki/Disk_encryption_theory#Encrypted_salt-sector_initialization_vector_(ESSIV)
See: no duplicate blocks are possible, and no keys. We cannot apply deduplication techniques over correctly encrypted data.
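For reference, the ESSIV scheme from that article derives each sector's IV from the key and the sector number (notation mine, purely illustrative):

salt  = Hash(K)              the hash of the disk key
IV(s) = Enc_salt(s)          the sector number s, encrypted under the salt
C(s)  = Enc_K(P(s), IV(s))   the sector ciphertext depends on s through the IV

So two sectors holding identical plaintext never produce identical ciphertext within one image.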
Talking about such an edge case will slow us down; this must be SOLVED IN THE FILESYSTEM of the encrypted machines:
apply deduplication at the filesystem level if the disk is encrypted, and apply weak/super-fast compression (lz4) at the filesystem level if the disk is encrypted.
It is only really helpful to talk about unencrypted vmdks.
VM disks are all 10GB+
Most often 400GB+
> If you use a screwdriver like a hammer, problems happen. Really, read the documentation.
Typically I WRITE the documentation, when I have the time (rarely).
> Using easy mode pigz -1 is not even trying!
Have you ever made backups of even a single virtual server?
Just asking.
> No, correct disk encryption (dm-crypt/LUKS etc.) will NEVER yield blocks that are the same:
Wrong.
> ...We cannot apply deduplication techniques over correctly encrypted data.
Yes, we can (cit.).
I do it all the time.
You don't want to deduplicate a single disk image, much less compress it.
You want to deduplicate (not compress) a series of daily backups of an encrypted image,
to maintain a history of recoverability, an essential requirement for any disaster recovery plan.
And this is doable, of course.
This is why, if you don't know how a virtual datacenter administrator works,
it is difficult to choose the right algorithm: you don't even know what the hammer is for.
Here is a concrete, non-theoretical example of a fully-encrypted image of about 30 GB,
for about 30 versions:
900 GB stored in approximately 37 GB.
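The reason this works rests on one assumption worth making explicit: between snapshots, unchanged guest sectors produce byte-identical ciphertext at the same offsets (same key, same per-sector IV), so a deduplicator only has to store blocks it has never seen before. A toy fixed-block sketch in C++ (zpaq actually uses content-defined fragments, and a real tool would store hashes on disk, not whole blocks in RAM):

#include <cstdio>
#include <set>
#include <string>
#include <vector>

int main(int argc, char** argv) {
    if (argc < 2) { std::fprintf(stderr, "usage: dedup image...\n"); return 1; }
    const size_t BLOCK = 4096;
    std::set<std::string> store;           // every distinct block seen so far
    std::vector<char> buf(BLOCK);
    for (int v = 1; v < argc; ++v) {       // each argument = one daily image
        size_t fresh = 0, total = 0, n;
        std::FILE* f = std::fopen(argv[v], "rb");
        if (!f) { std::perror(argv[v]); continue; }
        while ((n = std::fread(buf.data(), 1, BLOCK, f)) > 0) {
            ++total;
            if (store.insert(std::string(buf.data(), n)).second)
                ++fresh;                   // first occurrence: must be archived
        }
        std::fclose(f);
        std::printf("%s: %zu of %zu blocks new\n", argv[v], fresh, total);
    }
    return 0;
}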
zpaqfranz v50.12-experimental journaling archiver, compiled Jan 29 2021
franz:FIND >>>>music<<<<<
provona.zpaq:
Block 00000001 K
Block 00000002 K
43 versions, 597 files, 533.677 fragments, 37.628.023.420 bytes (35.04 GB)
- 2020-11-27 09:37:36 31.138.512.896 A 0001|SHA1: 284A93CFE09E7062BB3A771D0DB0BB874611D501 CRC32: 85454A09 c:/parte/BKmusicall3p
- 2020-11-28 09:34:08 31.138.512.896 A 0002|SHA1: 81913BAC9454E47AFCAAE295FC7469958833FDFB CRC32: 2C305D99 c:/parte/BKmusicall3p
- 2020-11-30 08:46:36 31.138.512.896 A 0003|SHA1: 5CECEAD2E253087768AE14BC00F7FA035629E981 CRC32: CBD25D3C c:/parte/BKmusicall3p
- 2020-11-30 08:56:16 31.138.512.896 A 0004|SHA1: 5CECEAD2E253087768AE14BC00F7FA035629E981 CRC32: CBD25D3C c:/parte/BKmusicall3p
- 2020-11-30 08:56:16 31.138.512.896 A 0017|SHA1: A8935A5AB7ABF2DD3BC619D41AAF581F11BF4385 CRC32: 379A466A c:/parte/BKmusicall3p
- 2020-11-30 08:56:16 31.138.512.896 A 0019|SHA1: C6A89A5455D9CE4BAB8A0E9ED12D1BC072068470 CRC32: 3BD3182C c:/parte/BKmusicall3p
- 2020-11-30 10:26:58 31.138.512.896 A 0020|SHA1: C6A89A5455D9CE4BAB8A0E9ED12D1BC072068470 CRC32: 3BD3182C c:/parte/BKmusicall3p
- 2020-11-30 10:35:28 31.138.512.896 A 0022|SHA1: D0C4ED3C9529FA06E818631FA2A08E5EDAD47D62 CRC32: DFC250E7 c:/parte/BKmusicall3p
- 2020-11-30 16:20:02 31.138.512.896 A 0023|SHA1: ABD40D1DB1EE939914E3780461A56ABC5D6E8150 CRC32: CD6AD6D2 c:/parte/BKmusicall3p
- 2020-12-02 17:59:16 31.138.512.896 A 0024|SHA1: 886BB9C6C35DEC84D53F5880BAE38B2D0BF8450D CRC32: D05278A4 c:/parte/BKmusicall3p
- 2020-12-03 17:57:14 31.138.512.896 A 0025|SHA1: 85998AED4BF0791ED42954AB3615A8A45F1B75EA CRC32: AA98CF1F c:/parte/BKmusicall3p
- 2020-12-05 14:21:22 31.138.512.896 A 0026|SHA1: 1B895E1B81F2B5BE67E635230340501CACF2D159 CRC32: 990DDBB2 c:/parte/BKmusicall3p
- 2020-12-05 17:28:04 31.138.512.896 A 0027|SHA1: 83D6B9BADF1FA9B6A00B95C8AC341552309CCEB7 CRC32: E01442E1 c:/parte/BKmusicall3p
- 2020-12-06 10:37:22 31.138.512.896 A 0028|SHA1: E6973B3E5CEFF3A0735E790119AD985918E80489 CRC32: 408E9150 c:/parte/BKmusicall3p
- 2020-12-10 09:00:46 31.138.512.896 A 0029|SHA1: 65CA9276C0744B52C96CE393EFCE686A3E38D4D4 CRC32: 352CC30E c:/parte/BKmusicall3p
- 2020-12-12 10:26:28 31.138.512.896 A 0030|SHA1: 873EF4D387EF02B3EFA1F59F741A0A8587AAE7C2 CRC32: E38E416A c:/parte/BKmusicall3p
- 2020-12-15 10:06:46 31.138.512.896 A 0031|SHA1: DB4F42A2B8EAAD5C8FB6E9005D6018F236A6DF57 CRC32: 0FF18597 c:/parte/BKmusicall3p
- 2020-12-17 08:47:40 31.138.512.896 A 0032|SHA1: 0B060E58714A3EBD246B3A291169854A22D41C62 CRC32: 869F0B0E c:/parte/BKmusicall3p
- 2020-12-19 10:10:34 31.138.512.896 A 0033|SHA1: 39C6A23FFB52F1329386AF0C9BFCEBB2C5ECC94A CRC32: 063BD783 c:/parte/BKmusicall3p
- 2020-12-27 12:54:02 31.138.512.896 A 0034|SHA1: 5A5E7AE5FC10BA06730723E79FCE138EF619F08B CRC32: 8104C8D4 c:/parte/BKmusicall3p
- 2021-01-01 12:24:20 31.138.512.896 A 0035|SHA1: DD06E3D8748D45DDA97EB05DCCFD27832B13C394 CRC32: 9B060676 c:/parte/BKmusicall3p
- 2021-01-07 18:28:58 31.138.512.896 A 0036|SHA1: ED6F4C2632F725930C568297E1D750EA129BFE77 CRC32: 7217B02C c:/parte/BKmusicall3p
- 2021-01-14 12:36:50 31.138.512.896 A 0037|SHA1: 3B3812D365BE5E03C368492A581BEF342C947352 CRC32: 55117BBF c:/parte/BKmusicall3p
- 2021-01-19 09:48:38 31.138.512.896 A 0038|SHA1: 5449719B7449B3AC1146ADC332F65451818E003B CRC32: EEBC783F c:/parte/BKmusicall3p
- 2021-01-20 18:25:34 31.138.512.896 A 0039|SHA1: 78515EF3762F99C5BAFBA9063146E1C5DCC135BD CRC32: F276CFB0 c:/parte/BKmusicall3p
- 2021-01-24 14:54:58 31.138.512.896 A 0040|SHA1: D13FF0731168EE29F2F679F95801A9BECD3B2927 CRC32: B3CBF5FC c:/parte/BKmusicall3p
- 2021-01-24 14:54:58 31.138.512.896 A 0041|SHA1: 8F0244493D4F5062E825B42A070F4761C4781DE9 CRC32: C42F75B6 c:/parte/BKmusicall3p
- 2021-01-28 13:49:38 31.138.512.896 A 0042|SHA1: 8F0244493D4F5062E825B42A070F4761C4781DE9 CRC32: C42F75B6 c:/parte/BKmusicall3p
- 2021-01-30 13:39:56 31.138.512.896 A 0043|SHA1: 8F0244493D4F5062E825B42A070F4761C4781DE9 CRC32: C42F75B6 c:/parte/BKmusicall3p
903.016.873.984 (841.00 GB) of 903.359.132.833 (841.32 GB) in 665 files shown
Those are ~1.33TB images in ~50GB space
zpaqfranz v50.12-experimental journaling archiver, compiled Jan 29 2021
franz:FIND >>>>music<<<<<
provona_???????.zpaq:
Block 00000001 K
Block 00000002 K
Block 00000003 K
50 versions, 50 files, 725.781 fragments, 51.972.616.481 bytes (48.40 GB)
- 2020-01-07 19:18:37 31.138.512.896 A 0001|c:/parte/BKmusicall3p
- 2020-05-16 10:50:22 31.138.512.896 A 0002|c:/parte/BKmusicall3p
- 2020-05-16 10:58:52 31.138.512.896 A 0003|c:/parte/BKmusicall3p
- 2020-05-16 17:09:06 31.138.512.896 A 0004|c:/parte/BKmusicall3p
- 2020-05-16 17:14:36 31.138.512.896 A 0005|c:/parte/BKmusicall3p
- 2020-05-16 17:45:09 31.138.512.896 A 0006|c:/parte/BKmusicall3p
- 2020-05-16 17:56:35 31.138.512.896 A 0007|c:/parte/BKmusicall3p
- 2020-05-17 15:18:06 31.138.512.896 A 0008|c:/parte/BKmusicall3p
- 2020-05-27 12:25:20 31.138.512.896 A 0009|c:/parte/BKmusicall3p
- 2020-05-29 14:51:40 31.138.512.896 A 0010|c:/parte/BKmusicall3p
- 2020-05-31 10:42:24 31.138.512.896 A 0011|c:/parte/BKmusicall3p
- 2020-05-31 13:36:49 31.138.512.896 A 0012|c:/parte/BKmusicall3p
- 2020-05-31 15:33:10 31.138.512.896 A 0013|c:/parte/BKmusicall3p
- 2020-06-03 17:33:01 31.138.512.896 A 0014|c:/parte/BKmusicall3p
- 2020-06-08 08:54:38 31.138.512.896 A 0015|c:/parte/BKmusicall3p
- 2020-06-13 11:09:36 31.138.512.896 A 0016|c:/parte/BKmusicall3p
- 2020-06-16 16:30:32 31.138.512.896 A 0017|c:/parte/BKmusicall3p
- 2020-06-21 12:41:16 31.138.512.896 A 0018|c:/parte/BKmusicall3p
- 2020-06-23 16:03:39 31.138.512.896 A 0019|c:/parte/BKmusicall3p
- 2020-06-24 10:04:45 31.138.512.896 A 0020|c:/parte/BKmusicall3p
- 2020-06-29 12:58:28 31.138.512.896 A 0021|c:/parte/BKmusicall3p
- 2020-07-03 10:14:22 31.138.512.896 A 0022|c:/parte/BKmusicall3p
- 2020-07-05 10:03:06 31.138.512.896 A 0023|c:/parte/BKmusicall3p
- 2020-07-08 12:52:17 0 A 0024|c:/parte/BKmusicall3p
- 2020-07-08 12:52:42 0 A 0025|c:/parte/BKmusicall3p
- 2020-07-08 12:53:18 0 A 0026|c:/parte/BKmusicall3p
- 2020-07-08 12:54:26 31.138.512.896 A 0027|c:/parte/BKmusicall3p
- 2020-07-16 12:29:05 31.138.512.896 A 0028|c:/parte/BKmusicall3p
- 2020-07-18 11:58:34 31.138.512.896 A 0029|c:/parte/BKmusicall3p
- 2020-07-22 12:20:31 31.138.512.896 A 0030|c:/parte/BKmusicall3p
- 2020-07-25 16:44:04 31.138.512.896 A 0031|c:/parte/BKmusicall3p
- 2020-07-26 14:49:04 31.138.512.896 A 0032|c:/parte/BKmusicall3p
- 2020-07-31 11:30:28 31.138.512.896 A 0033|c:/parte/BKmusicall3p
- 2020-08-12 08:42:50 31.138.512.896 A 0034|c:/parte/BKmusicall3p
- 2020-08-14 14:05:46 31.138.512.896 A 0035|c:/parte/BKmusicall3p
- 2020-08-21 15:11:16 31.138.512.896 A 0036|c:/parte/BKmusicall3p
- 2020-08-23 08:38:22 31.138.512.896 A 0037|c:/parte/BKmusicall3p
- 2020-08-26 16:52:54 31.138.512.896 A 0038|c:/parte/BKmusicall3p
- 2020-09-04 12:22:56 31.138.512.896 A 0039|c:/parte/BKmusicall3p
- 2020-09-07 14:58:28 31.138.512.896 A 0040|c:/parte/BKmusicall3p
- 2020-09-08 17:10:26 31.138.512.896 A 0041|c:/parte/BKmusicall3p
- 2020-09-13 16:43:30 31.138.512.896 A 0042|c:/parte/BKmusicall3p
- 2020-09-15 17:19:18 31.138.512.896 A 0043|c:/parte/BKmusicall3p
- 2020-09-21 15:58:56 31.138.512.896 A 0044|c:/parte/BKmusicall3p
- 2020-09-24 16:03:56 31.138.512.896 A 0045|c:/parte/BKmusicall3p
- 2020-09-26 17:19:40 31.138.512.896 A 0046|c:/parte/BKmusicall3p
- 2020-09-30 12:25:12 31.138.512.896 A 0047|c:/parte/BKmusicall3p
- 2020-10-06 16:44:04 31.138.512.896 A 0048|c:/parte/BKmusicall3p
- 2020-10-07 10:23:14 31.138.512.896 A 0049|c:/parte/BKmusicall3p
- 2020-10-11 16:29:44 31.138.512.896 A 0050|c:/parte/BKmusicall3p
1.463.510.106.112 (1.33 TB) of 1.463.510.106.112 (1.33 TB) in 100 files shown
So, in this case, you will not use compression at all, just as I previously explained.
a) you are showing us outputs of an archiver with active journaling
b) it is just an encrypted zpaq archive
c) completely missing the point of what disk encryption is
You seemingly only encrypt the image in zpaq.
Look at this:
c:/parte/BKmusicall3p
If you had real encryption, you wouldn't even be able to see any directory name.
But it is natural that Windows people are unfamiliar with Linux-style full disk encryption.
https://wiki.gentoo.org/wiki/Dm-cryp...isk_encryption
Can you see file paths inside an image of an unmounted && encrypted filesystem?
No. That would entirely defeat the purpose of encryption. Any metadata is encrypted as well.
So why are you spinning? Just say you don't know about these things. Nothing to be ashamed about!
encrypted FS != encrypted image != encrypted archive
If your filesystem and your memory are not encrypted, you are offering your customers no real encryption.
https://www.anandtech.com/show/14587...ow-to-build-22
But it is a waste of time to register on such sites and argue.
I wish you'd just stop mixing up and misusing these terms so much.
> c) completely missing the point of what disk encryption is
> You seemingly only encrypt the image in zpaq.
> Look at this: c:/parte/BKmusicall3p
> If you had real encryption, you wouldn't even be able to see any directory name.
Ahem... no.
This is a virtual disk container.
Do you know what that is?
It is just a .vmdk, but fully encrypted at the sector level...
...something like a TrueCrypt container.
> But it is natural that Windows people are unfamiliar with Linux-style full disk encryption.
Yes, you are right.
I am unfamiliar with Linux, managing only about fifty servers, because it is not safe enough for my needs.
In fact I use Unix systems (BSD and Solaris).
But when I needed a really secure operating system, I wrote it myself, from scratch, on Alpha AXP.
> So why are you spinning? Just say you don't know about these things. Nothing to be ashamed about!
OK, you are right.
> encrypted FS != encrypted image != encrypted archive
You win.
PS I'm being a little ironic because I was probably using (and writing) encryption, encrypted disks, and whole encrypted machines well before you knew there was something called a "computer".
I have been cracking encryption and security systems for the courts for 21 years now (before that, I was on the "dark side").
It's Sunday, after all; have a laugh.
Srep output...
100%: 31,138,512,896 -> 31,138,631,752: 100.00%. Cpu 215 mb/s (138.031 sec), real 230 mb/s (128.893 sec) = 107%. Remains 00:00
Decompression memory is 0 mb. 0 matches = 0 bytes = 0.00% of file
30/01/2021 14:39 31.138.512.896 BKmusicall3p
31/01/2021 18:38 31.138.631.752 prova.srep
So it should be clear that this is a really encrypted container.
However, it can obviously be deduplicated, in this case by zpaq, taking up minimal effective space.
A kind of diff on steroids.
C:\cloud>c:\cloud\zpaqfranz a r:\test\cloud\provona.zpaq c:\parte\BKmusicall3p c:\zpaqfranz\* -force -method 0 -summary 1 -checksum -pakka
franz:checksumming every file with CRC32, store SHA1
Updating r:/test/cloud/provona.zpaq at offset 37628023420 + 0
Adding 31.252.884.173 in 383 files at 2021-01-31 17:41:20
+ Checksumming ( 31.138.512.896) c:/parte/BKmusicall3p
1 +added, 0 -removed.
37628.023420 + (31252.884173 -> 3.952238 -> 5.743409) = 37.633.766.829
Forced XLS has included 27.136 bytes in 1 files
214.641 seconds (all OK)
New version: about 6MB
You didn't start this thread, and there was no mention of encryption (esp. of other people's data) in the OP post.
I think clouds should just evaporate, and everyone should fully control their own data.
> - VM image data
And yes...
There are encrypted VMs.
Not very common, but they certainly exist.
As I just demonstrated.
A good compression algorithm must consider all use cases, not just mine or yours.
---
The other special situation is ESX thin disks which, when transferred to different systems (e.g. a NAS), become thick or thick-sparse.
In both cases they can (depending on the fragmentation) be padded with zeros.
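This is why cheap zero-run detection matters more than raw ratio here; an illustrative test (any archiver can store such blocks as a run length instead of compressing them):

#include <cstddef>

// true if the whole block is zero padding; a thick disk that used to be
// thin is mostly blocks like this
static bool is_zero_block(const char* p, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        if (p[i] != 0) return false;
    return true;
}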
A detail.
In an ideal world, the algorithm should have, in addition to a very modest use of RAM, the ability to be compiled even with gcc 3.2 (yes, I know, archaic) in 32-bit.
The reason is to run it on ESXi servers.
I'm working on it, and it's not easy at all.
Clearly, software that could be "implanted" directly into the server would have enormous appeal for every vSphere user.
For similar reasons, I'm thinking about a custom binary-blob loader for codec plugins.
Compression algorithms usually don't need much interaction with the OS at startup, and mman.h should be portable enough.
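A bare-bones sketch of such a loader, under strong assumptions (the blob is self-contained position-independent machine code with its entry point at offset 0; a real plugin format would need relocation and a versioned header):

#include <sys/mman.h>
#include <cstddef>
#include <cstring>

// call signature the codec blob is assumed to export at offset 0
typedef int (*codec_fn)(const void* in, size_t insz, void* out, size_t outsz);

codec_fn load_blob(const void* blob, size_t size) {
    // map writable memory, copy the blob in, then flip to read+execute
    void* mem = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return nullptr;
    std::memcpy(mem, blob, size);
    if (mprotect(mem, size, PROT_READ | PROT_EXEC) != 0) {  // W^X friendly
        munmap(mem, size);
        return nullptr;
    }
    return reinterpret_cast<codec_fn>(mem);
}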
As the compressor, I suggest something like this (just a mock-up for Windows;
I am working on a vSphere port).
Hello. Long-time lurker, first-time poster here.
I've had a wishlist for a while now:
LZHAM has close to LZMA(2) ratio and significantly faster decompression than LZMA(2), but very slow compression.
FLZMA2, on the other hand, has LZMA(2) ratio and faster compression than LZMA(2), but being 7z/LZMA format compatible, it is stuck with slow LZMA(2) decompression.
I imagine that if someone could fit FLZMA2's compression ratio & speed into the faster-decompressing LZHAM format, that would be the best of both worlds! (Maybe lose a bit of compression ratio in such a merge, but still... a class of its own, possibly Pareto-approaching too.)
And such a combination may be the answer to the needs you seek.
@fcorbelli: please test zstd: https://github.com/facebook/zstd
There's currently no point in accepting a codec with worse compression than that.
@warthog: The actual reason for relative LZMA slowness on out-of-order CPUs is serial dependency.
It's possible to significantly improve LZMA decoding speed if substream interleaving is introduced...
which adds some redundancy and breaks format compatibility though.
Codecs like that do already exist: LZNA, LOLZ, NLZM etc.
LZHAM uses huffman coding and thus should be compared to zstd and brotli instead.
In fact, no. Or at least not always.
Not if you need something much easier to compile.
Stripping down one of those "monsters" requires a lot of time and effort.
miniLZO works rather well, at about 150K, on a single thread.
Of course the ratio is not exceptional, but that doesn't really matter for VMs.
Good (I hope; still developing) for a 24h background compression task (e.g. on ghetto VMware snapshots).
So another question is: where will the software run?
Single-file zstd -1
Compiling: g++ -O3 czstd.cpp -o czstd
Usage: czstd c/d input output
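For reference, a minimal sketch of what such a wrapper might look like; the actual single-file czstd.cpp bundles zstd itself, while this illustrative version links against the system libzstd (add -lzstd when compiling) and loads the whole file into memory rather than streaming:

#include <zstd.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    if (argc != 4) { std::fprintf(stderr, "Usage: czstd c/d input output\n"); return 1; }
    std::FILE* in  = std::fopen(argv[2], "rb");
    std::FILE* out = std::fopen(argv[3], "wb");
    if (!in || !out) { std::perror("fopen"); return 1; }
    // read the whole input (fine for a demo; a real tool would stream)
    std::fseek(in, 0, SEEK_END);
    long insz = std::ftell(in);
    std::fseek(in, 0, SEEK_SET);
    std::vector<char> src(insz);
    if (std::fread(src.data(), 1, insz, in) != (size_t)insz) return 1;
    std::vector<char> dst;
    size_t r;
    if (argv[1][0] == 'c') {                       // compress at level 1
        dst.resize(ZSTD_compressBound(insz));
        r = ZSTD_compress(dst.data(), dst.size(), src.data(), insz, 1);
    } else {                                       // decompress
        unsigned long long outsz = ZSTD_getFrameContentSize(src.data(), insz);
        if (outsz == ZSTD_CONTENTSIZE_UNKNOWN || outsz == ZSTD_CONTENTSIZE_ERROR) return 1;
        dst.resize(outsz);
        r = ZSTD_decompress(dst.data(), dst.size(), src.data(), insz);
    }
    if (ZSTD_isError(r)) { std::fprintf(stderr, "%s\n", ZSTD_getErrorName(r)); return 1; }
    std::fwrite(dst.data(), 1, r, out);
    return 0;
}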