Tuesday, July 12, 2011

Duplicity – secure incremental backup

Overview:

Duplicity is a tool for creating GPG-encrypted incremental backups on remote servers. Because the backups are encrypted, you can store them remotely without having to worry about who has access to your data. It’s a handy and secure method.
Installation
The steps to install duplicity are as follows:
wget http://code.launchpad.net/duplicity/trunk/0.6.02/+download/duplicity-0.6.02.tar.gz
tar -xvf duplicity-0.6.02.tar.gz
cd duplicity-0.6.02
python setup.py install
If you come across any librsync.so errors, you can resolve them with the following steps:
wget -O librsync-0.9.7.tar.gz http://sourceforge.net/projects/librsync/files/librsync/0.9.7/librsync-0.9.7.tar.gz/download
tar -xzvf librsync-0.9.7.tar.gz
cd librsync-0.9.7
./configure
make
make install
Now we have duplicity installed :-)
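A quick way to confirm that the install worked is to check that the duplicity command is on the PATH and prints its version. The sketch below assumes a POSIX shell; have_cmd and check_duplicity are helper names made up for this example, not part of duplicity.

```shell
# have_cmd is a hypothetical helper: succeeds if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Print the installed duplicity version, or complain if it is missing.
check_duplicity() {
    if have_cmd duplicity; then
        duplicity --version
    else
        echo "duplicity not found on PATH" >&2
        return 1
    fi
}
```

Run check_duplicity after the installation steps; a version line means the install succeeded.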

Create a GPG key

In order to be able to encrypt your backups, you have to create a GPG key.  Open a second shell and run the following command (this generates some “randomness” on your system, which will be useful to create a secure key). Kill the command with CTRL+C when you are done with key generation.
while /bin/true; do cat /var/log/messages > ~/temp.txt; sleep 1; done;
In your other shell, create your GPG key. Be sure to use a secure passphrase and to copy or write down the key ID which is displayed at the end of the generation process (we’ll need it for the encrypted backup later). Also, make sure to back up the key to a secure location outside your server. As all your backups will be encrypted, they will be worthless if your server crashes and you lose the key.
gpg --gen-key
Default options should be fine. This will create your key in ~/.gnupg/. Once it’s done, you can verify the existence of your key using the command
gpg --list-keys
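Since the backups are useless without the key, it helps to export it right away. The following is a minimal sketch, not the only way to do it: export_gpg_key and the output file names are invented for this example, and the GPG variable exists only so a different gpg binary can be substituted.

```shell
# Export the public and secret key for KEY_ID into OUT_DIR as ASCII-armored files.
# Usage: export_gpg_key KEY_ID OUT_DIR
export_gpg_key() {
    key_id="$1"
    out_dir="$2"
    (
        umask 077   # exported files are readable by the owner only
        "${GPG:-gpg}" --armor --export "$key_id" > "$out_dir/$key_id-public.asc"
        "${GPG:-gpg}" --armor --export-secret-keys "$key_id" > "$out_dir/$key_id-secret.asc"
    )
}
```

For example, export_gpg_key FFF7730B /root/keys writes both files into /root/keys (the key ID and directory are placeholders). Copy them to a safe place off the server afterwards.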
The next step is to prepare an off-site location to receive the backup files.
The software supports different protocols such as FTP, rsync, and SCP.
I am restricting myself to SCP here.

Simple Unencrypted Backup over SCP

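Set up SSH keys so that root can log in to the backup server without a password. Note that duplicity encrypts with a passphrase by default and will prompt for one; pass --no-encryption if the backup really should be stored unencrypted.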
duplicity /home/me scp://uname@other.host/usr/backup

  • If the above command is run repeatedly, the first session will be a full backup, and subsequent ones will be incremental.
    The full action can be used to force a full backup. The next command also excludes the /tmp directory.
    duplicity full --exclude /tmp /home/me scp://uname@other.host/usr/backup
  • Basic restore command: restore the /home/me directory backed up with scp above to the directory restored_dir:
    duplicity scp://uname@other.host/usr/backup restored_dir

  • To enable verbose mode, use the option -v.
    It specifies the verbosity level (0 is totally silent, 4 is the default, and 9 is noisiest).
    The command would look like
    duplicity -v5 /home/me scp://uid@other.host/some_dir
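The passwordless SSH login mentioned at the start of this section can be set up roughly as follows. This is a sketch under assumptions: the key lives at the default ~/.ssh/id_rsa location and has no passphrase so that unattended runs do not prompt, and setup_backup_ssh_key is a function name invented for this example.

```shell
# Create an SSH key if needed, install it on the backup host, and test the login.
# Usage: setup_backup_ssh_key user@host
setup_backup_ssh_key() {
    remote="$1"
    key="$HOME/.ssh/id_rsa"
    # Generate a key pair only if none exists yet (empty passphrase).
    [ -f "$key" ] || ssh-keygen -t rsa -b 4096 -N "" -f "$key"
    # Append the public key to the remote account's authorized_keys.
    ssh-copy-id -i "$key.pub" "$remote"
    # BatchMode makes ssh fail instead of prompting, so success proves the key works.
    ssh -o BatchMode=yes "$remote" true
}
```

Calling setup_backup_ssh_key uname@other.host asks for the remote password once, while the key is being copied; later logins should no longer prompt.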

    Encrypted Backup over SCP

    Here we use the GPG key generated earlier.
    The general format looks like this:
    duplicity \
        --encrypt-key="${GPG_KEY}" \
        --sign-key="${GPG_KEY}" \
        --include=/boot \
        --include=/etc \
        --include=/home \
        --include=/root \
        --include=/var/lib/mysql \
        --exclude='/**' \
        "${SOURCE}" "${DEST}"
    The --include and --exclude options select what goes into the backup. A concrete example:
    duplicity --encrypt-key="FFF7730B" --sign-key="FFF7730B" -v5 /home/me scp://uid@other.host/some_dir
    You will be asked for the GnuPG passphrase of your key; this has to be done every time you run duplicity. The backup will be encrypted with the help of GnuPG. Permissions and ownerships will be preserved in the backup.
    To avoid typing it each time, you can set the passphrase as an environment variable using the command
    export PASSPHRASE=gpgpassphrase
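Typing the passphrase on the command line leaves it in the shell history, so for interactive use it may be nicer to prompt for it without echo. A small sketch, assuming a POSIX shell; read_passphrase is a helper name made up here:

```shell
# Prompt for the passphrase without echoing it, then export it for duplicity.
read_passphrase() {
    printf 'GnuPG passphrase: ' >&2
    stty -echo 2>/dev/null      # turn off terminal echo (ignored if stdin is not a tty)
    IFS= read -r PASSPHRASE
    stty echo 2>/dev/null       # restore terminal echo
    printf '\n' >&2
    export PASSPHRASE
}
```

A session would then be: read_passphrase, run the duplicity commands, and finally unset PASSPHRASE so the passphrase does not linger in the environment.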

    Backup Format & Explanation

    Once the backup has run, you can see the backup files on the server. They look like this:
    duplicity-full-signatures.2005-11-27T01:00:01-05:00.sigtar.gpg
    duplicity-full.2005-11-27T01:00:01-05:00.manifest.gpg
    duplicity-full.2005-11-27T01:00:01-05:00.vol1.difftar.gpg
    duplicity-full.2005-11-27T01:00:01-05:00.vol2.difftar.gpg
    The signatures file contains signatures of each backed-up file, so that duplicity can work out which parts of a file have changed. With that information it only needs to upload the changed parts to complete a new backup set.
    The manifest file records which files each volume covers, together with a SHA1 hash of each volume, so duplicity can check the volumes and find the one holding a given file.
    The volume files (vol1 and vol2) contain the actual file data. It appears that duplicity volumes are at most 5MB (the size can be changed with the --volsize option). That’s helpful during restores, because the entire backup set does not need to be downloaded to retrieve a single file; duplicity only downloads the volume containing that file.
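To make the naming scheme concrete, here is a small sketch that classifies a backup file by its suffix and pulls out the timestamp. It assumes file names exactly like the ones listed above; describe_backup_file is a name invented for this example.

```shell
# Print the kind of a duplicity backup file and the timestamp in its name.
# Usage: describe_backup_file FILENAME
describe_backup_file() {
    name="$1"
    case "$name" in
        *.sigtar.gpg)        kind="signatures" ;;
        *.manifest.gpg)      kind="manifest" ;;
        *.vol*.difftar.gpg)  kind="volume" ;;
        *)                   kind="unknown" ;;
    esac
    rest="${name#*.}"      # drop everything up to and including the first dot
    stamp="${rest%%.*}"    # keep the text before the next dot (the timestamp)
    printf '%s %s\n' "$kind" "$stamp"
}
```

For example, describe_backup_file duplicity-full.2005-11-27T01:00:01-05:00.vol1.difftar.gpg prints "volume 2005-11-27T01:00:01-05:00".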

    Common Options:

    Depending on the parameters and order of the parameters in the duplicity command, different functions can be performed. For example, an archive can be verified to see if a complete backup was made and what files, if any, have changed since the last backup.
    duplicity verify [options] source_url target_directory
    duplicity verify -v4 scp://user@bakuphost/etc /etc

    List Files

    It’s sometimes handy to check which files are in the latest backup set.
    duplicity list-current-files [options] target_url
    The command would look like
    duplicity list-current-files --archive-dir /root/test/ scp://user@backupserver/some_dir

    Restore

    The main purpose of a backup is to restore data which has been lost. The following is the common format for restoring the data from the latest backup:
    duplicity scp://uid@other.host/some_dir  /home/me
    Duplicity enters restore mode because the URL comes before the local directory. If we wanted to restore just the file “Mail/article” in /home/me as it was three days ago into /home/me/restored_file:
    duplicity -t 3D --file-to-restore Mail/article scp://uid@other.host/some_dir /home/me/restored_file
    The following command compares the local files with the backup, to see what has changed since then:
    duplicity verify scp://uid@other.host/some_dir /home/me
    The following command can be used to retrieve a single file from the backup:
    duplicity --encrypt-key "" --sign-key "" --file-to-restore home/sburke/file.txt scp://user@server.com/backup/ /var/tmp/file.txt
    The path given to --file-to-restore is relative to the directory on which the backup set is based. So in the command above, home/sburke/file.txt is appended to the backup location (/backup), giving /backup/home/sburke/file.txt. Passing /backup/home/sburke/file.txt as the file to restore would not work, because /backup is not a path inside the backup set. The last argument in the command above is the location where the file will be restored.
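Because the relative path is easy to get wrong, a tiny helper can derive it from the absolute path. This is only a sketch; backup_relative_path is a name invented for this example and is not a duplicity feature.

```shell
# Turn an absolute path into the path expected by --file-to-restore,
# i.e. relative to the directory that was backed up.
# Usage: backup_relative_path BACKED_UP_DIR ABSOLUTE_FILE
backup_relative_path() {
    base="${1%/}"     # backed-up directory with any trailing slash removed
    file="$2"         # absolute path of the wanted file
    case "$file" in
        "$base"/*) printf '%s\n' "${file#"$base"/}" ;;
        *) echo "error: $file is not under $base" >&2; return 1 ;;
    esac
}
```

For a backup of /home/me, backup_relative_path /home/me /home/me/Mail/article prints Mail/article, which is the value used with --file-to-restore in the three-days-ago example above.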
    To delete old backups, use the remove-older-than action. Without --force, duplicity only lists the backup sets that would be deleted:
     duplicity remove-older-than 1Y --force scp://uid@server/personal
    To automate these tasks, you can write a shell script.
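Such a script might look like the sketch below. Everything in it is a placeholder to adapt: the key ID, the source and destination, the one-year retention, and the helper names run, backup, cleanup and main are all assumptions of this example. It uses the action form of remove-older-than from duplicity 0.6 and expects PASSPHRASE to be exported already.

```shell
#!/bin/sh
# Placeholder settings; override them from the environment or edit them here.
GPG_KEY="${GPG_KEY:-FFF7730B}"
SOURCE="${SOURCE:-/home/me}"
DEST="${DEST:-scp://uid@other.host/some_dir}"
KEEP="${KEEP:-1Y}"

# With DRY_RUN=1 print the command instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Encrypted, signed backup (full on the first run, incremental afterwards).
backup() {
    run duplicity --encrypt-key="$GPG_KEY" --sign-key="$GPG_KEY" "$SOURCE" "$DEST"
}

# Delete backup sets older than $KEEP.
cleanup() {
    run duplicity remove-older-than "$KEEP" --force "$DEST"
}

# Only clean up if the backup itself succeeded.
main() {
    backup && cleanup
}
```

Add a call to main at the end of the script and run it from cron, for example nightly. Setting DRY_RUN=1 first is a cheap way to check the commands before letting the script touch the backup server.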