I think there is one use case which is already supported by this specification but which could be supported better.
Consider the following folder structure:
Code:
dir_to_comp/
    1.rar_uncompressed_dir/
    1.rar
RAR files (for example) probably cannot be reconstructed from their uncompressed contents any time soon, so they need to be stored directly. The extracted files, on the other hand, could be extracted from the RAR instead of being stored separately. Right now, I think each file needs its own block node entry if you want to split it into multiple blocks for even more deduplication. So how about allowing a single block node entry to create multiple blocks, defined by the underlying block itself? It might also help to store, redundantly, just the number of blocks it will produce, so that the remaining block nodes can be parsed more easily and quickly without having to decompress the .rar first. This count could also be useful for some corruption checks.
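To make the idea a bit more concrete, here is a minimal Python sketch of how a parser could use such a redundant count. Everything in it is an assumption for illustration only: the `BlockNode` type, the `sub_block_count` field and the sequential ID scheme are hypothetical and not taken from the specification.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class BlockNode:
    """Hypothetical block node entry (not the spec's actual layout)."""
    name: str
    # Hypothetical redundant field: how many extra blocks the underlying
    # block (e.g. an archive) will produce once it is unpacked.
    sub_block_count: int = 0


def assign_block_ids(nodes: List[BlockNode]) -> Dict[str, Tuple[int, range]]:
    """Give every node its own ID plus a contiguous range of sub-block IDs.

    Because the count is stored in the node, no archive has to be
    decompressed to find out where the next node's IDs start.
    """
    ids: Dict[str, Tuple[int, range]] = {}
    next_id = 0
    for node in nodes:
        own_id = next_id
        first_sub = own_id + 1
        ids[node.name] = (own_id, range(first_sub, first_sub + node.sub_block_count))
        next_id = first_sub + node.sub_block_count
    return ids


def check_sub_block_count(node: BlockNode, actual_count: int) -> None:
    """Corruption check: compare the stored count with what unpacking produced."""
    if actual_count != node.sub_block_count:
        raise ValueError(
            f"{node.name}: expected {node.sub_block_count} sub-blocks, "
            f"got {actual_count}"
        )
```

With `[BlockNode("1.rar", 3), BlockNode("other.bin")]`, the second node's ID can be computed straight from the stored count, and `check_sub_block_count` would only run later, once the archive has actually been unpacked.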
Some examples where I think this could be useful (for reducing the number of explicitly listed block IDs):
- Compressed archives that cannot be recreated, but where (some of) the contained files are also stored somewhere else. Maybe even JPEGs which are (for some weird reason) also stored as a bitmap or even a PNG extracted from them.
- Splitting files into multiple blocks for deduplication, for example using anchor hashing. This only makes sense as long as the resulting blocks are merely referenced and not themselves removed by deduplication. So the most relevant use case will probably be files inside a non-recreatable archive.
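For the second example, this is roughly what I mean by splitting a file at content-defined boundaries. It is only a toy sketch: the rolling hash, the parameter values and the function names are my own assumptions, not a specific published algorithm and not anything the specification prescribes.

```python
from typing import List


def chunk_boundaries(data: bytes, mask: int = 0x3F,
                     min_size: int = 32, max_size: int = 1024) -> List[int]:
    """Return the end offsets of chunks chosen by a toy rolling hash.

    A cut is made where the low bits of the hash match `mask` (once the
    chunk has at least `min_size` bytes), or when `max_size` is reached.
    """
    boundaries: List[int] = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Shift-and-add hash kept to 32 bits: older bytes gradually shift out,
        # so the value depends only on the most recent bytes.
        h = ((h << 1) + byte) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= min_size and (h & mask) == mask) or length >= max_size:
            boundaries.append(i + 1)
            start = i + 1
            h = 0
    if start < len(data):
        boundaries.append(len(data))  # final, possibly short, chunk
    return boundaries


def split_into_blocks(data: bytes) -> List[bytes]:
    """Cut `data` into blocks at the boundaries computed above."""
    ends = chunk_boundaries(data)
    starts = [0] + ends[:-1]
    return [data[a:b] for a, b in zip(starts, ends)]
```

Each element returned by `split_into_blocks` would then become one of the implicitly numbered blocks of the single block node entry, instead of needing its own entry.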
On the other hand, this may increase the number of block IDs that "pollute the namespace" if only a few of them are needed, although files from which no blocks are needed at all could be skipped.