We will make this discussion less abstract by providing an example. A common backup strategy is to run your backups to the master backup repository every day (perhaps automatically via cron) and to move (or rotate) this backup offsite (or at least to another physically separate medium) periodically. For illustration purposes, let's assume you have a separate backup disk stored offsite that you update weekly. Once each week you add the newly made backups from your master backup to the backup copy on the external (offsite) disk (which you temporarily bring onsite and connect to your computer).
Copying only the new backup with ``cp -a'' is a bad idea, because the newly copied directories (backups) will not be hard linked to the existing ones on the external disk. You can use linkToDirs.pl to link (and copy) the new backups in the master backup to the existing ones in the backup copy on the external disk. Using linkToDirs.pl is convenient for ad hoc replications, but it is not the best choice for planned and automated ones.
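The hard-link problem with a plain copy can be reproduced in a few lines. The following is a minimal sketch, not part of storeBackup: the directory names (master, copy, monday, tuesday) and the function names are made up for illustration, and shutil.copytree stands in for ``cp -a'' of a single backup directory. It assumes a file system that supports hard links.

```python
import os
import shutil
import tempfile


def same_inode(a, b):
    """Return True if both paths are hard links to the same file."""
    sa, sb = os.stat(a), os.stat(b)
    return (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino)


def demo_broken_links():
    """Copy two backups one at a time and report whether links survive.

    Returns (linked_in_master, linked_in_copy).
    """
    root = tempfile.mkdtemp()
    try:
        master = os.path.join(root, "master")  # hypothetical master backup
        copy = os.path.join(root, "copy")      # hypothetical external disk
        os.makedirs(os.path.join(master, "monday"))
        os.makedirs(os.path.join(master, "tuesday"))

        # In the master backup, the unchanged file of Tuesday is a hard
        # link to the file stored on Monday, so it uses no extra space.
        mon = os.path.join(master, "monday", "data.txt")
        tue = os.path.join(master, "tuesday", "data.txt")
        with open(mon, "w") as f:
            f.write("unchanged content\n")
        os.link(mon, tue)
        linked_in_master = same_inode(mon, tue)

        # Each backup directory is copied on its own, as happens when
        # only the newest backup is copied to the external disk.
        shutil.copytree(os.path.join(master, "monday"),
                        os.path.join(copy, "monday"))
        shutil.copytree(os.path.join(master, "tuesday"),
                        os.path.join(copy, "tuesday"))
        linked_in_copy = same_inode(
            os.path.join(copy, "monday", "data.txt"),
            os.path.join(copy, "tuesday", "data.txt"))
        return linked_in_master, linked_in_copy
    finally:
        shutil.rmtree(root)


if __name__ == "__main__":
    print(demo_broken_links())
```

The two files are linked in the master but are independent files in the copy, so every unchanged file is stored again with each backup copied this way.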
Another common way to copy the new backups to the external disk is to use a synchronization tool such as rsync. There are two issues with this approach: first, it takes a very long time if you have lots of backups; second, you replicate every fault on your master backup disk to your backup copy, which is not what you want. Imagine that the disk holding the master backup gets a block error in a file from a backup made one month ago, so the affected file is broken in your master backup. If you now synchronize the disks with, e.g., rsync, you will copy the broken file. In the worst case, you can destroy your whole backup this way (without gaining any security). If you use storeBackup's replication, old data is not affected by the replication.
BUT STOP: What if the newly copied data is broken because some sectors of the disk are seriously damaged, because of faulty RAM, or for any other reason that leads to incorrect data in your master backup? Will you ever find out that parts of the data in your backup are broken? storeBackup will tell you the same as rsync about that: nothing, because it is outside their control. For this reason you should periodically run storeBackupCheckBackup.pl, which recalculates the checksums of every file in your backup(s). Running this program lets you see faults in old backups, which you can correct manually if you have a replica, and lets you see at an early stage whether your new backups are broken. Therefore, we suggest running storeBackupCheckBackup.pl on new backups every week or so, on both the master backup and the copy, and running it on old backups (which may take a long time) every few months.
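The principle behind such a check (store a checksum when the file is written, recalculate it later, and compare) can be sketched as follows. This is only an illustration of the idea, not the implementation or file format of storeBackupCheckBackup.pl: the function names, the in-memory manifest, and the choice of SHA-256 are assumptions made for this example.

```python
import hashlib
import os


def file_checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(backup_dir):
    """Map each file's path (relative to backup_dir) to its checksum.

    In this sketch the manifest is recorded right after the backup is
    written, while the data is still known to be good.
    """
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(backup_dir):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, backup_dir)] = file_checksum(full)
    return manifest


def find_corrupted(backup_dir, manifest):
    """Recalculate all checksums and list files that are missing or differ."""
    bad = []
    for rel, expected in sorted(manifest.items()):
        full = os.path.join(backup_dir, rel)
        if not os.path.isfile(full) or file_checksum(full) != expected:
            bad.append(rel)
    return bad
```

Calling find_corrupted on the master backup and on the copy with the same manifest shows which side still holds a good version of a damaged file, which is what makes a manual repair from the replica possible.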
If you detect errors on your hard disk, you should investigate the problem more deeply and not hesitate to replace the disk.
The basic idea of storeBackup's replication feature is to solve the issues described above. A replication means we have the same state in two different locations (e.g., in the master backup and in the backup copy). That's what we have done in the description above with the cp -a command. Let's say this was the backup from Monday. After a day we have a change (a new backup on Tuesday) in the master backup. For the replication, we need just the differences between the backup from Monday and the backup from Tuesday. If we have some clever algorithm to get all the changes (deltas) from the backup of Monday to the backup of Tuesday, we can transport these changes to the backup copy on the external disk and rebuild the full backup (with all links, permissions and so on) on the external disk. As a result, the backup on the external disk contains exactly the same information as the master backup.
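The delta idea can be made concrete with a small model. The sketch below is a deliberately simplified illustration, not storeBackup's actual delta format or algorithm: a backup is modelled as a dictionary that maps a relative path to a content checksum, and the names compute_delta and apply_delta, as well as the sample paths and checksum strings, are hypothetical.

```python
def compute_delta(old, new):
    """Describe how to get from backup state `old` to backup state `new`.

    Both arguments map a relative path to a content checksum.
    """
    return {
        # Files that are new or whose content changed.
        "set": {path: digest for path, digest in new.items()
                if old.get(path) != digest},
        # Files that no longer exist in the newer backup.
        "removed": sorted(path for path in old if path not in new),
    }


def apply_delta(state, delta):
    """Rebuild the newer backup state from an older state plus a delta."""
    result = dict(state)
    for path in delta["removed"]:
        result.pop(path, None)
    result.update(delta["set"])
    return result


# Hypothetical states of the master backup on three consecutive days.
monday = {"a.txt": "c1", "b.txt": "c2", "old.log": "c3"}
tuesday = {"a.txt": "c1", "b.txt": "c9", "new.txt": "c4"}
wednesday = {"a.txt": "c1", "b.txt": "c9", "new.txt": "c5"}

# Deltas are collected on the master side, one per consecutive pair.
deltas = [compute_delta(monday, tuesday), compute_delta(tuesday, wednesday)]

# The copy starts at Monday's state and replays the stored deltas in order.
copy_state = monday
for delta in deltas:
    copy_state = apply_delta(copy_state, delta)
```

After replaying both deltas, copy_state equals wednesday, although only the changed entries were transported. Unchanged files such as a.txt never appear in a delta, which is why this approach scales better than comparing complete backups.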
If we want to connect the external disk only once a week, we need a place to store the differences. We will have these deltas from Monday to Tuesday, from Tuesday to Wednesday, etc. What we are doing is rebuilding the complete and full backups on the backup copy disk, e.g.:
This means we need the deltas between two consecutive backups in the master backup. In principle, there are two ways to get these:
StoreBackup generates deltas to (one or more) existing backups with the option lateLinks and temporarily stores them in a "delta cache". (See section 7.6 for more information about how to configure it.)
The storeBackup replication functionality provides the following features:
In short, if you make a ``normal'' backup (without replication) with storeBackup.pl, you typically have one place (see option backupDir) where you store your backups. This will be called the master backup. (It is the same as what we have called the master backup repository in other sections of this document.) If the disk (or, e.g., the file system) for this ``master backup'' fails, you will lose the backup and therefore the history of your data. This is a form of data loss that can be prevented with storeBackup's replication feature.
Heinz-Josef Claes 2014-04-20