Dev Notes

Software Development Resources by David Egan.

Incremental Backup using rsync with Hardlinks


Backup, Linux, Sysadmin, Ubuntu, rsync
David Egan

This method involves an incremental backup bash script that runs on a production server. The aim is to automatically build a local backup which can then be synchronised with a remote backup server.

Production Server: Backup User

Create a backup user - without sudo privileges:

sudo adduser servernamebackup

Production Server: Source Directory

Create a directory called /home/backupuser/source, where backupuser stands for the backup user created above (servernamebackup in the script below). This directory is the source for the local rsync, so it is the target for:

  • a MySQL backup - mysqldump dumps database copies to /home/backupuser/source/sql
  • Symlinks to config files (in /home/backupuser/source/config)
  • Symlinks to site files (from the Apache root directory - /var/www/html)

Set up directories and add symlinks to /source:

sudo mkdir -p /home/backupuser/source/config # make the config directories
sudo mkdir /home/backupuser/source/sql # target for mysqldump
sudo ln -s /var/www/html /home/backupuser/source # Symlink site files
sudo ln -s /etc/apache2 /home/backupuser/source/config # Symlink Apache config files (including vhosts setup)
sudo ln -s /etc/mysql /home/backupuser/source/config # Symlink MySQL config files
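Because the backup later follows these symlinks, a dangling link would leave a hole in the snapshot. The following is a minimal sketch of a pre-flight check, assuming GNU find (as shipped with Ubuntu). The function name check_source is hypothetical and not part of the original setup.

```shell
#!/bin/bash
# Report symlinks under a source directory whose targets cannot be resolved.
# check_source is an illustrative helper name, not part of the backup script.
check_source() {
  if [ ! -d "$1" ]; then
    echo "No such directory: $1" >&2
    return 2
  fi
  # -xtype l matches symlinks that do not resolve to an existing file (GNU find).
  # find does not descend into symlinked directories here, so only the links
  # created during setup are examined.
  broken=$(find "$1" -xtype l -print)
  [ -z "$broken" ] && return 0
  printf '%s\n' "$broken" | sed 's/^/BROKEN: /'
  return 1
}

# Example call (path as used in this section):
#   check_source /home/backupuser/source
```

The function returns 0 when every link resolves, 1 when at least one is broken, and 2 when the directory itself is missing, so it could be used as a guard before the rsync step.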

Production Server: Incremental Backup

Incremental backup script:

#!/bin/bash
#
# Website Backup Script for SERVER
#
# Add this script as a cronjob to run daily. It creates an archive of incremental
# backups, with the current date set as the name of the backup directory.
# Assumes the script is run from /usr/local/sbin.
# ------------------------------------------------------------------------------

# Today's date in ISO-8601 format:
# ------------------------------------------------------------------------------
DAY0=$(date -I)
TIMESTAMP=$(date '+%Y-%m-%d at %H:%M:%S')

# Yesterday's date in ISO-8601 format:
# ------------------------------------------------------------------------------
DAY1=$(date -I -d "1 day ago")

# The source directory: this should contain a symlink to Apache doc root.
# ------------------------------------------------------------------------------
SRC="/home/servernamebackup/source/"

# The target directory
# ------------------------------------------------------------------------------
TRG="/home/servernamebackup/backup/$DAY0"

# The link destination directory
# ------------------------------------------------------------------------------
LNK="/home/servernamebackup/backup/$DAY1"

# Backup databases
# ------------------------------------------------------------------------------
USER="XXXXXXXXXXXXXXXXXXX"
PASSWORD="XXXXXXXXXXXXXXXXXXXXXXXXXX"
HOST="localhost"
DB_BACKUP_PATH="${SRC}sql"

# Get list of databases, but not 'Database' or 'information_schema'
# ------------------------------------------------------------------------------
DATABASES=$(mysql --user="$USER" --password="$PASSWORD" -e "SHOW DATABASES;" | grep -Ev "(Database|information_schema)")

DUMPFAIL=false

# Remove previous dumped databases
# ------------------------------------------------------------------------------
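# -f: no error on the first run when the directory is empty.
# :? aborts if the variable is unset or empty, so this can never expand to "rm /*".
rm -f "${DB_BACKUP_PATH:?}"/*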

# Set up log
# ------------------------------------------------------------------------------
echo "Database backup report. ${TIMESTAMP}" > $DB_BACKUP_PATH/DB_LOG
echo "=======================================" >> $DB_BACKUP_PATH/DB_LOG

# Create dumps for each database
# ------------------------------------------------------------------------------
for DB in $DATABASES
do

  mysqldump -v --user="$USER" --password="$PASSWORD" --single-transaction --log-error="$DB_BACKUP_PATH/$DB.log" --host="$HOST" "$DB" > "$DB_BACKUP_PATH/$DB.sql"

  # Reportage - log result of mysqldump
  # ----------------------------------------------------------------------------
  if [[ $? -eq 0 ]]

  then

    echo -e "Mysqldump created ${DB}.sql\n" >> $DB_BACKUP_PATH/DB_LOG

  else

    echo -e "Mysqldump encountered a problem backing up ${DB}. Look in ${DB_BACKUP_PATH}/${DB}.log for information.\n" >> $DB_BACKUP_PATH/DB_LOG
    DUMPFAIL=true

  fi

done

# The rsync options: follow the symlinks to make a hard backup
# ------------------------------------------------------------------------------
OPT=(-avL --progress --delete --link-dest="$LNK")

# Execute the backup
# ------------------------------------------------------------------------------
rsync "${OPT[@]}" "$SRC" "$TRG"

# Log Results
# ------------------------------------------------------------------------------
if [[ $? -gt 0 ]]
then

  # rsync Failure
  # ----------------------------------------------------------------------------
  echo "ERROR. rsync didn't complete the nightly backup: ${TIMESTAMP}" >> /var/log/server-backup.log
  echo "There was an error in the nightly backup for <servername>: ${TIMESTAMP}"| mail -s "Backup Error, <servername>" info@yourdomain.com

else

  # rsync Success
  # ----------------------------------------------------------------------------
  if [[ false == $DUMPFAIL ]]

    # rsync & mysqldump worked OK
    # --------------------------------------------------------------------------
    then

      echo "SUCCESS. Backup made on: ${TIMESTAMP}" >> /var/log/server-backup.log

      # email the report
      # ------------------------------------------------------------------------
      echo -e "${TIMESTAMP}: Server <servername> successfully ran a local backup.\nBoth rsync & mysqldump report success."| mail -s "Backup Success, <servername>" info@yourdomain.com

    # rsync worked but there was at least one mysqldump error
    # --------------------------------------------------------------------------
    else

      echo "PARTIAL SUCCESS. File backup (rsync) was successful, but mysqldump reports errors: ${TIMESTAMP}" >> /var/log/server-backup.log

      # email the report
      # ------------------------------------------------------------------------
      echo -e "${TIMESTAMP}: Server <servername> ran a local backup.\nFile backup reports success, however mysqldump reports at least one problem.\nCheck "| mail -s "Backup: Partial Success, <servername>" info@yourdomain.com

  fi

fi
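
The script above passes the MySQL password on the command line, where it can show up in the process list. One possible alternative, sketched here under assumptions, is a private option file read via --defaults-extra-file. The file location (CRED_FILE) and the placeholder credentials are illustrative and not taken from the original script.

```shell
#!/bin/bash
# Sketch: keep MySQL credentials in an option file readable only by its owner.
# CRED_FILE and the values written to it are placeholders (assumptions).
CRED_FILE="${CRED_FILE:-$HOME/.backup-mysql.cnf}"

umask 077   # files created from here on are accessible to the owner only
cat > "$CRED_FILE" <<'EOF'
[client]
user=backup_db_user
password=replace-with-real-password
host=localhost
EOF
chmod 600 "$CRED_FILE"

# The [client] group is read by both mysql and mysqldump.
# --defaults-extra-file must be the first option on the command line:
#   mysql     --defaults-extra-file="$CRED_FILE" -e "SHOW DATABASES;"
#   mysqldump --defaults-extra-file="$CRED_FILE" --single-transaction "$DB"
```

With a file like this in place, the USER and PASSWORD variables in the script would no longer be needed.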

To install the script on the production server:

  • Add the incremental backup script to /usr/local/sbin
  • Make it executable
  • Set up a cronjob

# Upload script
rsync --progress -avz -e "ssh -p 1234" ~/bash-projects/remote-backup-scripts/servername/backup-servername daviduser@123.456.789.0:~/

# Move into position
sudo mv ~/backup-servername /usr/local/sbin

# Make executable
sudo chmod u+x /usr/local/sbin/backup-servername

# Open crontable
sudo crontab -e

# Add the following to the crontab, save and exit.
# MAILTO must come before the job line: cron applies it only to the entries that follow it.
# Send an email report from this cronjob
MAILTO=info@yourdomain.com
# Run backup script every day at 3 am
00 03 * * * /usr/local/sbin/backup-servername

--delete

The delete option causes files that are not in source to be deleted from the target - and ONLY the target:

This tells rsync to delete extraneous files from the receiving side (ones that aren’t on the sending side), but only for the directories that are being synchronized. You must have asked rsync to send the whole directory … Files that are excluded from the transfer are also excluded from being deleted unless you use the --delete-excluded option or mark the rules as only matching on the sending side

rsync man-page

I originally used the --delete flag on this script - thinking that it would be necessary to keep the latest incremental backup synchronised with the current source filesystem.

However, --delete has no effect when syncing into an empty directory, which is effectively what happens with the --link-dest method: each run writes to a new, dated target directory. The directory specified by --link-dest is used as a reference point, and any files in the source that are unchanged relative to this reference point are hardlinked to their existing inode in the reference directory.

Files that are not in the source directory (because they have been deleted) are therefore not hardlinked in the backup directory.

However, if the script runs twice on the same date, or a previous run failed part-way, rsync may be writing into a target directory that already exists and contains files that should be deleted.

--delete is probably superfluous when using --link-dest. The backup directory accurately reflects the source directory. However, we leave it in place in case of backing up to an incomplete directory.
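
The behaviour described above can be checked on throwaway data. This is a minimal sketch, assuming rsync and GNU stat are installed. The temporary directories and file names are made up for the demonstration and are not the real backup paths.

```shell
#!/bin/bash
# Demo: with --link-dest, unchanged files share an inode with the previous
# snapshot, and files deleted from the source are simply absent from the new
# snapshot, even without --delete. All paths are throwaway temp directories.
demo=$(mktemp -d)
mkdir -p "$demo/source" "$demo/backup"
echo "unchanged" > "$demo/source/keep.txt"
echo "doomed"    > "$demo/source/gone.txt"

if command -v rsync >/dev/null 2>&1; then
  # Day 1: full copy, there is no earlier snapshot to link against.
  rsync -a "$demo/source/" "$demo/backup/day1"

  # A file is removed from the source before the next run.
  rm "$demo/source/gone.txt"

  # Day 2: new target directory, hardlinking unchanged files against day1.
  rsync -a --link-dest="$demo/backup/day1" "$demo/source/" "$demo/backup/day2"

  # Equal inode numbers mean keep.txt is stored once on disk.
  ino_day1=$(stat -c %i "$demo/backup/day1/keep.txt")
  ino_day2=$(stat -c %i "$demo/backup/day2/keep.txt")
  echo "keep.txt inode in day1: $ino_day1, in day2: $ino_day2"

  # gone.txt is still in day1 but was never created in day2.
  ls "$demo/backup/day1" "$demo/backup/day2"
fi
```

An absolute path is used for --link-dest here because a relative path is interpreted relative to the destination directory.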

Note: Be careful deleting old snapshots. The old directory in its entirety should be properly archived and a new full backup snapshot should be taken to kick off the next round of incremental backups.

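This warning is not necessary: --link-dest creates hard links, so every dated directory is a complete snapshot in its own right. Deleting an old snapshot removes only that directory's links, and a file's data stays on disk for as long as any other snapshot still links to it. Thanks to aselinux for pointing this out in the comments.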
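
Why deleting an old snapshot is safe can be shown with plain hard links, no rsync required. This is a small sketch assuming GNU stat. The directories old and new stand in for two dated snapshot directories and are created in a temporary location.

```shell
#!/bin/bash
# Demo: removing one hard link does not remove the data while another remains.
# "old" and "new" stand in for two dated snapshot directories.
snap=$(mktemp -d)
mkdir "$snap/old" "$snap/new"
echo "site content" > "$snap/old/index.html"

# This is what --link-dest does for a file that has not changed.
ln "$snap/old/index.html" "$snap/new/index.html"

links_before=$(stat -c %h "$snap/new/index.html")   # hard link count with both snapshots present
rm -r "$snap/old"                                   # delete the older snapshot
links_after=$(stat -c %h "$snap/new/index.html")    # hard link count afterwards
content=$(cat "$snap/new/index.html")

echo "links before: $links_before, links after: $links_after, content: $content"
```

The link count drops from 2 to 1, and the file in the newer snapshot is still fully readable.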

Result

This will result in a backup directory on the production server, /home/backupuser/backup, that contains dated incremental backups.

The Production backup directory can then be targeted by a Backup server, again via rsync. Typically, this might involve:

  • Access to the backup directory by means of SSH public/private key pair
  • rrsync access for the backupuser - only rsync can run on the given SSH key, restricted to read-only

Next Steps

  • Set up a secure rsync link between servers
  • Run a cronjob on the backup server to read and rsync the backup directory on the Production server
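
The pull on the backup server might look something like the sketch below. Every value in it (host name, port, key path, local directory, and the function name pull_backups) is a placeholder assumption, not a setting from this article. If the SSH key is restricted with rrsync, the remote path may need to be given relative to the restricted directory instead.

```shell
#!/bin/bash
# Sketch of a pull script for the backup server. All values are placeholders.
REMOTE_USER="servernamebackup"
REMOTE_HOST="production.example.com"
SSH_PORT="1234"
SSH_KEY="$HOME/.ssh/backup_pull_key"
REMOTE_DIR="/home/servernamebackup/backup/"
LOCAL_DIR="/srv/backups/servername/"

# Overridable so the command can be previewed, e.g. RSYNC=echo.
RSYNC="${RSYNC:-rsync}"

pull_backups() {
  # -a  archive mode
  # -H  preserve hard links between the dated snapshots; without it each
  #     snapshot would arrive as a full, separate copy
  # -z  compress data in transit
  "$RSYNC" -aHz -e "ssh -p $SSH_PORT -i $SSH_KEY" \
    "$REMOTE_USER@$REMOTE_HOST:$REMOTE_DIR" "$LOCAL_DIR"
}

# Call pull_backups from the cron-driven script once the values above are real.
```

The -H flag matters here: rsync does not preserve hard links by default, so omitting it would lose the space savings of the incremental snapshots on the backup server.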
