dedup

data deduplication program
git clone git://git.2f30.org/dedup.git

dedup is a simple data deduplication program.  It is designed to be
used in a pipeline with tar/gpg etc.

dedup only handles a single file at a time, so using tar is advised.
For example, to dedup a tar file you can invoke dedup as follows:

    tar cf - ~/bak | dedup -r ~/bak-dedup

This will create .{cache,index,store} files in the ~/bak-dedup
directory.  The store file contains all the unique blocks.  The index
file contains all the revisions of files that have been deduplicated.
Each revision is identified by its SHA256 hash.  The cache file is
only used to speed up block comparison.
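
To make the roles of the store and the index concrete, here is a toy
in-memory model in Python.  It is not dedup's code and says nothing
about the on-disk format: the fixed 4096-byte block size, the choice to
hash the whole input to name a revision, and the names DedupStore and
BLOCK_SIZE are all assumptions made for illustration.  The cache file
is not modelled.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; the real tool may split data differently


class DedupStore:
    """Toy model: a store of unique blocks plus an index of revisions."""

    def __init__(self):
        self.store = {}  # block hash -> block bytes (each unique block kept once)
        self.index = {}  # revision hash (hex) -> ordered list of block hashes

    def add(self, data: bytes) -> str:
        """Deduplicate one input and return the hash identifying its revision."""
        block_hashes = []
        for off in range(0, len(data), BLOCK_SIZE):
            block = data[off:off + BLOCK_SIZE]
            h = hashlib.sha256(block).digest()
            # Only blocks not seen before are added to the store.
            self.store.setdefault(h, block)
            block_hashes.append(h)
        # Assumption: the revision is named by the SHA-256 of the whole input.
        rev = hashlib.sha256(data).hexdigest()
        self.index[rev] = block_hashes
        return rev

    def list_revisions(self):
        """Return the hashes of all known revisions, oldest first."""
        return list(self.index)

    def extract(self, rev: str) -> bytes:
        """Rebuild the original input by concatenating its blocks in order."""
        return b"".join(self.store[h] for h in self.index[rev])
```

In this model, adding two inputs that share a block stores that block
only once, while the index still records both revisions, so either one
can be rebuilt in full.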

To list all known revisions run:

    dedup -r ~/bak-dedup -l

You will get a list of hashes.  Each hash corresponds to a single file
(in this case, a tar archive).

To extract a file from the deduplicated store run:

    dedup -r ~/bak-dedup -e <hash> > bak.tar

Cheers,
sin