deduplicating backup program

commit a7753b65b2b40ba265e30e8f2f0bda25da7baa53
parent 42797a877f6efb89a46968af77fe8ab9c0e335fa
Author: sin <>
Date:   Mon,  6 May 2019 01:00:49 +0100

Add DESIGN doc

A DESIGN | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 51 insertions(+), 0 deletions(-)

diff --git a/DESIGN b/DESIGN
@@ -0,0 +1,51 @@
+Design notes
+============
+
+There are three main abstractions in the design of dedup:
+
+ - The chunker interface
+ - The snapshot layer
+ - The block layer
+
+The block layer
+---------------
+
+From the outside world, the block layer is just an abstraction for
+dealing with variable length blocks. All blocks are referenced with
+their hash.
+
+The block layer is arranged into a stack of layers. From top to
+bottom these are as follows:
+
+ - Generic layer
+ - The compression layer
+ - The encryption layer
+ - The storage layer
+
+The generic layer is the one that client code interfaces with. It is
+the top level entrypoint to the block layer.
+
+The compression layer will prepend a compression descriptor to the
+block and then compress the block using snappy or lz4. It is possible
+to disable compression in which case a special descriptor is prepended
+and the data is passed uncompressed to the layer below.
+
+The encryption layer will prepend an encryption descriptor to the
+block and then encrypt/authenticate the block using XChaCha20 and
+Poly1305. It is possible to disable encryption in which case it acts
+as a bypass with a special type of encryption descriptor.
+
+The storage layer will prepend a storage descriptor and append the
+descriptor and the data to a single backing file.
+
+The snapshot layer
+------------------
+
+The snapshot abstraction is currently very simplistic. A snapshot is
+a file under $repo/archive/<name>. The contents of the file are the
+block hashes of the data stored in the snapshot.
+
+The chunker interface
+---------------------
+
+TBD
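
Illustrative sketch (not part of the commit): the layering described in DESIGN can be modelled in a few lines of Python to show how each layer prepends its own descriptor on the way down and strips it on the way back up. Everything concrete here is an assumption made for illustration: the one-byte descriptor values, the 8-byte length used as a storage descriptor, SHA-256 as the block hash, zlib as a stand-in for snappy/lz4, the in-memory index, and the names `Storage`, `put_block` and `get_block`. Encryption is shown only in its bypass mode, and deduplication is placed in the storage index purely for simplicity; DESIGN does not say where dedup does it.

```python
import hashlib
import io
import zlib

# Hypothetical one-byte descriptors; DESIGN does not specify the real formats.
COMPR_NONE, COMPR_ZLIB = b"\x00", b"\x01"
ENCR_NONE = b"\x00"


def compress_layer(data, enable=True):
    """Prepend a compression descriptor; zlib stands in for snappy/lz4."""
    if not enable:
        return COMPR_NONE + data
    return COMPR_ZLIB + zlib.compress(data)


def decompress_layer(blob):
    desc, payload = blob[:1], blob[1:]
    if desc == COMPR_NONE:
        return payload
    if desc == COMPR_ZLIB:
        return zlib.decompress(payload)
    raise ValueError("unknown compression descriptor")


def encrypt_layer(blob):
    """Bypass mode only: prepend the 'no encryption' descriptor."""
    return ENCR_NONE + blob


def decrypt_layer(blob):
    if blob[:1] != ENCR_NONE:
        raise ValueError("unknown encryption descriptor")
    return blob[1:]


class Storage:
    """Append-only backing file with an in-memory hash -> offset index."""

    def __init__(self, backing):
        self.backing = backing
        self.index = {}

    def put(self, key, blob):
        if key in self.index:  # block already stored: nothing to append
            return
        self.backing.seek(0, io.SEEK_END)
        self.index[key] = self.backing.tell()
        # Assumed storage descriptor: 8-byte big-endian payload length.
        self.backing.write(len(blob).to_bytes(8, "big") + blob)

    def get(self, key):
        self.backing.seek(self.index[key])
        size = int.from_bytes(self.backing.read(8), "big")
        return self.backing.read(size)


def put_block(storage, data, compress=True):
    """Generic layer: push a block down the stack and return its hash."""
    key = hashlib.sha256(data).hexdigest()
    storage.put(key, encrypt_layer(compress_layer(data, compress)))
    return key


def get_block(storage, key):
    """Generic layer: fetch a block by hash and undo each layer in turn."""
    return decompress_layer(decrypt_layer(storage.get(key)))


# A snapshot, in the sense of DESIGN, is then just the ordered list of hashes.
store = Storage(io.BytesIO())
chunks = [b"hello ", b"world", b"hello "]
snapshot = [put_block(store, c) for c in chunks]
restored = b"".join(get_block(store, h) for h in snapshot)
```

In this model the repeated chunk is stored once, and `restored` is rebuilt by looking up each hash listed in the snapshot. A real repository would write the hash list to a file under $repo/archive/<name> rather than keep it in memory.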