data deduplication program
git clone git://

commit 2c954112417c5101887d3a789cfd97f44d875390
parent 4590e214c6140e6d71896c8133120ebc2af287a1
Author: sin <>
Date:   Wed, 21 Mar 2018 15:20:07 +0000


Makefile | 2+-
README | 32++++++++++++++++++++++++++++++++
2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
@@ -3,7 +3,7 @@ PREFIX = /usr/local
 SRC = dedup.c
 OBJ = dedup.o
 BIN = dedup
-DISTFILES = $(SRC) LICENSE Makefile arg.h tree.h
+DISTFILES = $(SRC) LICENSE Makefile README arg.h tree.h
 
 CFLAGS = -g -Wall
 CPPFLAGS = -I/usr/local/include
diff --git a/README b/README
@@ -0,0 +1,32 @@
+dedup is a simple data deduplication program. It is designed to be
+used in a pipeline with tar/gpg etc.
+
+dedup only handles a single file at a time, so using tar is advised.
+For example, to dedup a tar file you can invoke dedup as follows:
+
+	tar cf - ~/bak | dedup
+
+This will create a .{cache,index,store} in the current directory. The
+store file contains all the unique blocks. The index file contains
+all the revisions of files that have been deduplicated. Each revision
+is identified by its SHA256 hash. The cache file is only used to
+speed up block comparison.
+
+To list all known revisions run:
+
+	dedup -l
+
+You will get a list of hashes. Each hash corresponds to a single file
+(in this case, a tar archive).
+
+To extract a file from the deduplicated store run:
+
+	dedup -e <hash> > bak.tar
+
+You can mix dedup with other programs like gpg(1). For example, to
+perform a remote backup you can use the following command:
+
+	tar cf - ~/bak | gpg -c | ssh user@host dedup
+
+Cheers,
+sin
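The README describes the store/index/cache split only in outline. The following is a minimal C sketch of fixed-size block deduplication under stated assumptions, not the code in dedup.c: in-memory arrays stand in for the .store and .cache files, a hypothetical 4-byte block size (BLKSIZE) is used, and FNV-1a stands in for whatever hash dedup uses to speed up block comparison. All names (putblock, dedup, extract, BLKSIZE, MAXBLOCKS) are hypothetical, and the sketch does not compute the SHA256 revision hash the README mentions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parameters for this sketch; not taken from dedup.c. */
#define BLKSIZE   4
#define MAXBLOCKS 64

/* In-memory stand-ins for the store and cache files described in the README. */
static unsigned char store[MAXBLOCKS][BLKSIZE]; /* unique blocks only */
static size_t storelen[MAXBLOCKS];              /* bytes used in each block */
static uint64_t cache[MAXBLOCKS];               /* hash of each stored block */
static size_t nblocks;

/* FNV-1a: a cheap, non-cryptographic hash used here only to speed up lookups. */
uint64_t fnv1a(const unsigned char *p, size_t n)
{
	uint64_t h = 14695981039346656037ULL;

	for (size_t i = 0; i < n; i++) {
		h ^= p[i];
		h *= 1099511628211ULL;
	}
	return h;
}

/* Return the store slot holding this block, appending it only if it is new. */
size_t putblock(const unsigned char *p, size_t n)
{
	uint64_t h = fnv1a(p, n);

	for (size_t i = 0; i < nblocks; i++)
		/* The cached hash narrows the search; memcmp confirms a real match. */
		if (cache[i] == h && storelen[i] == n && memcmp(store[i], p, n) == 0)
			return i;
	assert(nblocks < MAXBLOCKS);
	memcpy(store[nblocks], p, n);
	storelen[nblocks] = n;
	cache[nblocks] = h;
	return nblocks++;
}

/* Split buf into fixed-size blocks and record one index entry per block. */
size_t dedup(const unsigned char *buf, size_t len, size_t *idx)
{
	size_t nidx = 0;

	for (size_t off = 0; off < len; off += BLKSIZE) {
		size_t n = len - off < BLKSIZE ? len - off : BLKSIZE;
		idx[nidx++] = putblock(buf + off, n);
	}
	return nidx;
}

/* Rebuild one revision by concatenating the stored blocks its index names. */
size_t extract(const size_t *idx, size_t nidx, unsigned char *out)
{
	size_t len = 0;

	for (size_t i = 0; i < nidx; i++) {
		memcpy(out + len, store[idx[i]], storelen[idx[i]]);
		len += storelen[idx[i]];
	}
	return len;
}
```

Feeding "abcdabcdabcdxyz" through dedup() yields four index entries but only two stored blocks ("abcd" and "xyz"); a later revision that reuses "abcd" adds only its new blocks. The real program may use a different block size, chunking strategy, or hash.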