Before looking at some examples, it helps to understand what storeBackup
is doing. Here are some important aspects of how storeBackup works. (The
following explains the basic mechanism; for performance reasons, the
actual implementation is a little bit different. There are several
waiting queues, parallelism and a tiny scheduler inside, which are not
described here.)
storeBackup uses at least two internal flat files in each generated
backup:
- .md5CheckSums.info: general information about the backup
- .md5CheckSums[.bz2]: information about every file (dir, etc.) saved
When started, storeBackup.pl basically does the following (besides some
other things):
- Read the contents of the previous .md5CheckSums[.bz2] file and store
  it in two dbm databases: dbm(md5sum) and dbm(filename)
  (dbm(md5sum) means that md5sum is the key). By default, these
  databases are kept in memory.
- Read the contents of other .md5CheckSums[.bz2] files
  (otherBackupSeries) and store them in dbm(md5sum). If two different
  files (e.g. from different backup series) are identical, always store
  the last copied file in dbm(md5sum). This ensures that multiple
  versions of the same file in different backups are unified in future
  backups.
- Without sharing files from another backup series (simple backup, see
  example 1 and example 2), storeBackup.pl works as follows.
  In a loop over all files to back up, it will:
  1.) Look into dbm(filename), which contains all files from the
      previous backup, to check whether the exact same file exists and
      has not changed. In this case, the needed information is taken
      from the values stored in dbm(filename).
      If it existed in the previous backup(s), make a hard link and go
      to 3.)
  2.) Calculate the md5 sum of the file to back up and look it up in
      dbm(md5sum).
      If it exists there, make a hard link.
      If it does not exist, copy or compress the file.
  3.) Write the information of the new file to the corresponding
      .md5CheckSums[.bz2] file.
- With sharing of files from another backup series (see example 3 and
  example 4), storeBackup works as follows.
  In a loop over all files to back up, it will:
  1.) Look into dbm(filename), which contains all files from the
      previous backup, to check whether the exact same file exists and
      has not changed. In this case, the needed information is taken
      from the values stored in dbm(filename).
      (Because there are independent backups, a file with the same
      content may exist in another backup series. So storeBackup.pl has
      to look into dbm(md5sum) to ensure that all backup series link to
      the same file.)
  2.) Calculate the md5 sum of the file to back up if it is not known
      from step 1, and look it up in dbm(md5sum).
      If it exists there, make a hard link.
      If it does not exist, copy or compress the file.
  3.) Write the information of the new file to the corresponding
      .md5CheckSums[.bz2] file.
- The option lateLinks is used as follows (see example 6):
  If you save your backup via NFS to a server, most of the time will be
  spent setting hard links. Setting a single hard link is very fast,
  but with many thousands of them it takes some time.
  You can avoid waiting for hard linking if you use the option
  lateLinks:
  - Make a backup with storeBackup using --lateLinks (or set
    lateLinks = yes in the configuration file). storeBackup will then
    not generate any hard links; it only writes a file containing the
    information about what has to be linked.
    The newly generated backup is initially an incremental backup.
  - In a separate step, call storeBackupUpdateBackup to set all the
    required hard links, turning these incomplete backups into full
    backups. Please also see the section "using option lateLinks" for a
    more detailed explanation.
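The startup phase that fills the two databases can be sketched in Python as follows. This is a minimal illustration, not storeBackup's code: both dbm databases are modeled as in-memory dicts (matching the default of keeping them in memory), and the record type Entry, its field copy_time, and the function build_dbms are hypothetical simplifications of what a .md5CheckSums file actually contains. The "last copied file wins" rule is implemented by keeping the entry with the largest copy_time.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Entry(NamedTuple):
    # Hypothetical, simplified view of one .md5CheckSums line; the real
    # file stores more fields (permissions, owner, ...).
    name: str        # path relative to the backup directory
    md5: str         # md5 sum of the file contents
    size: int
    mtime: int
    copy_time: int   # hypothetical: when the data was physically copied


def build_dbms(previous: Tuple[str, Iterable[Entry]],
               others: Iterable[Tuple[str, Iterable[Entry]]]):
    """Return (dbm_filename, dbm_md5sum) as plain dicts."""
    dbm_filename: Dict[str, Entry] = {}
    dbm_md5sum: Dict[str, Tuple[str, int]] = {}  # md5 -> (path, copy_time)

    def remember(backup_dir: str, e: Entry) -> None:
        old = dbm_md5sum.get(e.md5)
        # For identical contents keep the most recently copied file.
        if old is None or e.copy_time >= old[1]:
            dbm_md5sum[e.md5] = (f"{backup_dir}/{e.name}", e.copy_time)

    prev_dir, prev_entries = previous
    for e in prev_entries:
        dbm_filename[e.name] = e   # only the previous backup goes here
        remember(prev_dir, e)
    for other_dir, entries in others:   # otherBackupSeries
        for e in entries:
            remember(other_dir, e)
    return dbm_filename, dbm_md5sum
```

Note that only the previous backup of the own series populates dbm(filename); other series contribute to dbm(md5sum) alone.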
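The two per-file loops described above (simple backup and backup with sharing) differ only in step 1, so one Python sketch with a shared flag can show both. It assumes a flat source directory, in-memory dicts for the dbm databases, and a simplified "has not changed" test based on size and mtime only; the names md5_of, backup_dir and FileInfo are hypothetical and this is not how storeBackup.pl is actually structured.

```python
import hashlib
import os
import shutil
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-ins for the two dbm databases:
#   dbm_filename: name -> (md5, size, mtime_ns, path in previous backup)
#   dbm_md5sum:   md5  -> path of an already saved copy
FileInfo = Tuple[str, int, int, str]


def md5_of(path: str) -> str:
    """md5 sum of a file, read in 64 KiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def backup_dir(src: str, dst: str,
               dbm_filename: Dict[str, FileInfo],
               dbm_md5sum: Dict[str, str],
               shared: bool) -> List[Tuple[str, str]]:
    """Back up the regular files directly in src into dst (no recursion).
    Returns the (name, md5) pairs for the new .md5CheckSums file."""
    checksums = []
    for name in sorted(os.listdir(src)):
        s, d = os.path.join(src, name), os.path.join(dst, name)
        if not os.path.isfile(s):
            continue
        st = os.stat(s)
        md5, linked = None, False
        old = dbm_filename.get(name)
        # Step 1: unchanged since the previous backup (simplified test)?
        if old and (old[1], old[2]) == (st.st_size, st.st_mtime_ns):
            md5 = old[0]                # reuse the stored md5 sum
            if not shared:              # simple backup: link directly
                os.link(old[3], d)
                linked = True
        if not linked:
            # Step 2: compute the md5 sum only if step 1 did not supply it.
            if md5 is None:
                md5 = md5_of(s)
            if md5 in dbm_md5sum:
                os.link(dbm_md5sum[md5], d)
            else:
                shutil.copy2(s, d)      # storeBackup may compress instead
        # Step 3: remember the entry for the new .md5CheckSums file.
        checksums.append((name, md5))
    return checksums
```

With shared=True an unchanged file is still routed through dbm(md5sum), so it ends up linked to whichever copy that database prefers, possibly one in another backup series.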
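The two-phase idea behind lateLinks can be illustrated with a small Python sketch: the backup run only records which hard links are needed, and a later step (the job storeBackupUpdateBackup does) creates them. The tab-separated list format and the functions write_pending_links and resolve_pending_links are hypothetical; storeBackup's real bookkeeping files look different. Paths are kept relative to a common base directory, in line with how storeBackup stores unresolved links.

```python
import os
from typing import List, Tuple

# Hypothetical format: one "target<TAB>link" pair per line, both paths
# relative to the directory containing all backup series. Illustration
# only; file names containing tabs or newlines are not handled.


def write_pending_links(list_file: str, pairs: List[Tuple[str, str]]) -> None:
    """Phase 1 (backup run with lateLinks): record links, create none."""
    with open(list_file, "w", encoding="utf-8") as f:
        for target, link in pairs:
            f.write(f"{target}\t{link}\n")


def resolve_pending_links(base_dir: str, list_file: str) -> int:
    """Phase 2 (update step): create the recorded hard links."""
    count = 0
    with open(list_file, encoding="utf-8") as f:
        for line in f:
            target, link = line.rstrip("\n").split("\t")
            dst = os.path.join(base_dir, link)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            os.link(os.path.join(base_dir, target), dst)
            count += 1
    os.remove(list_file)  # nothing left to link: now a full backup
    return count
```

Until phase 2 has run, the new backup depends on the files named as targets, which is why those older backups must not be deleted in the meantime.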
Conclusions:
- Do not delete a backup to which hard links have not yet been
  generated. Use storeBackupUpdateBackup.pl to set the hard links and
  check consistency. It is a good idea to delete old backups only with
  storeBackup.pl or storeBackupDel.pl.
- All sharing of data in the backups is done via hard links. This
  means:
  - A backup series cannot be split across different file systems.
  - If you want to share data between different backup series, all
    backups must reside in the same file system.
- All information about a backup in the .md5CheckSums files is stored
  with relative paths. It does not matter if you change the absolute
  path to the backup or make the backup from a different machine (the
  server backs up the client via NFS, or the client backs up to the
  server via NFS).
  Unresolved hard links to other backup series (via option lateLinks)
  are also stored with relative paths. This means: you can move
  backupDir around as you like, but you should never change the
  relative paths between backup series before all the links have been
  resolved with storeBackupUpdateBackup.pl.
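Because all sharing relies on hard links, it can be worth checking up front that the directories of the backup series live on one file system. The Python sketch below, assuming a POSIX system, compares st_dev values and wraps os.link so that the cross-device error (EXDEV) gets a readable message. The helper names same_filesystem and hard_link are hypothetical and not part of storeBackup. Equal st_dev is a necessary but not sufficient condition: on Linux, link(2) also fails with EXDEV across different mount points of the same file system (e.g. bind mounts).

```python
import errno
import os


def same_filesystem(*paths: str) -> bool:
    """True if all paths report the same device number (st_dev)."""
    if not paths:
        raise ValueError("need at least one path")
    return len({os.stat(p).st_dev for p in paths}) == 1


def hard_link(target: str, link_name: str) -> None:
    """Create a hard link; explain the cross-device case clearly."""
    try:
        os.link(target, link_name)
    except OSError as e:
        if e.errno == errno.EXDEV:
            raise RuntimeError(
                f"{target} and {link_name} are on different file systems; "
                "backups that share data must reside in one file system"
            ) from e
        raise
```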
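Why relative paths make backupDir movable can be shown with a few lines of Python using the standard posixpath module. A link target is stored relative to the backup that refers to it and resolved again against wherever that backup currently lives. The directory names and the helpers to_relative and resolve are hypothetical examples, not storeBackup's internal format.

```python
import posixpath


def to_relative(backup_dir: str, target: str) -> str:
    """Store a link target relative to the backup that refers to it."""
    return posixpath.relpath(target, backup_dir)


def resolve(backup_dir: str, rel: str) -> str:
    """Turn a stored relative path back into an absolute one."""
    return posixpath.normpath(posixpath.join(backup_dir, rel))


rel = to_relative("/backups/seriesA/backup2",
                  "/backups/seriesB/backup1/etc/hosts")
# rel == "../../seriesB/backup1/etc/hosts"
moved = resolve("/mnt/nfs/backups/seriesA/backup2", rel)
# moved == "/mnt/nfs/backups/seriesB/backup1/etc/hosts"
```

Moving the whole tree (here from /backups to /mnt/nfs/backups) keeps every stored path valid; moving only seriesB to a different relative position would make rel point to the wrong place, which is why the relative layout must stay fixed until all links are resolved.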
It is a good idea to use a configuration file instead of command line
options. Simply call:
# storeBackup.pl --generate <configFile>
Edit the configuration file and call storeBackup in the following way:
# storeBackup.pl -f <configFile>
You can override settings in the configuration file on the command line
(see Example 6).
If you have additional ideas or any questions, feel free to contact me
(hjclaes(at)web.de).
Heinz-Josef Claes
2014-04-20