Performance
The procedure described above assumes that an existing backup is checked
for identical files before a file is backed up again. This applies to
files in the previous backup as well as to files already written to the
new one. Of course it does not make much sense to compare every file to
be backed up directly with the previous backup. Instead, the md5 sums of
the previous backup are compared with the md5 sum of the file to be
backed up, using a hash table.
Computing the md5 sum is fast, but with a large amount of data it is
still not fast enough. For this reason storeBackup first checks whether
the file has been altered since the last backup, i.e. whether path +
file name, ctime, mtime and size are all unchanged. If they are, the md5
sum from the last backup is adopted and only a hard link is set. If this
initial check shows a difference, the md5 sum is computed and storeBackup
checks whether another file with the same md5 sum already exists. (The
comparison across a number of backup series uses an expanded but
similarly efficient process.) With this approach only a few md5 sums
have to be calculated for a backup; a simplified sketch of this decision
logic is shown after the following list. If you want to tune
storeBackup, especially if you save via NFS, there are two things you
can do:
- tune NFS (see configuring nfs)
- use the lateLinks option of storeBackup, and possibly delete your
old backups independently of the backup process.
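The decision logic described above can be pictured roughly as follows.
This is a simplified Python sketch, not storeBackup's actual Perl code;
the data structures prev_by_path and prev_by_md5 are assumptions standing
in for storeBackup's internal file lists and its md5 hash table:

  import hashlib
  import os

  def plan_file(src_path, rel_path, prev_by_path, prev_by_md5):
      """Decide how to handle one file of the new backup.

      prev_by_path: rel_path -> (ctime, mtime, size, md5, backup_path)
                    taken from the previous backup
      prev_by_md5:  md5 -> backup_path, a hash table over all files
                    known in the backup(s)
      """
      st = os.stat(src_path)
      prev = prev_by_path.get(rel_path)

      # Cheap check first: same path + ctime + mtime + size means the file
      # is unchanged, so the old md5 sum is adopted and a hard link suffices.
      if prev is not None and (st.st_ctime, st.st_mtime, st.st_size) == prev[:3]:
          return ("hardlink", prev[4])

      # Otherwise compute the md5 sum ...
      md5 = hashlib.md5()
      with open(src_path, "rb") as f:
          for chunk in iter(lambda: f.read(1 << 20), b""):
              md5.update(chunk)
      digest = md5.hexdigest()

      # ... and look it up in the hash table: content that already exists
      # anywhere in the backup only costs a hard link.
      if digest in prev_by_md5:
          return ("hardlink", prev_by_md5[digest])
      return ("copy_or_compress", digest)

The point is that the expensive md5 computation is skipped entirely for
files whose metadata is unchanged since the last backup.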
Using storeBackup with lateLinks is like using an asynchronous
client / server application or, more precisely, like running multiple
batch jobs on (normally) multiple machines:
- Check the source directory to find out what has changed and has to
be compressed, and save the changed data to the backup directory on
the backup server. The directories and hard links still missing in the
backup are recorded in a protocol file.
- Take this information and turn the result into a ``normal'', fully
linked backup (a simplified sketch of this step follows the list).
- Delete old backups according to the rules configured for deletion.
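To illustrate the second step, here is a minimal Python sketch of
replaying such a protocol file. The line format used here
(mkdir <dir> / link <existing> <new>) is made up for illustration only;
the real protocol files written with lateLinks look different and are
processed by storeBackupUpdateBackup.pl:

  import os

  def replay_link_protocol(protocol_path):
      """Turn a lateLinks backup into a ``normal'', fully linked backup
      by replaying the recorded operations locally on the backup server."""
      with open(protocol_path) as proto:
          for line in proto:
              op, *args = line.split()
              if op == "mkdir":
                  # directory that was skipped during the fast backup run
                  os.makedirs(args[0], exist_ok=True)
              elif op == "link":
                  existing, new = args
                  os.makedirs(os.path.dirname(new), exist_ok=True)
                  # hard link set locally, so no NFS round trips are needed
                  os.link(existing, new)

Because this replay runs locally on the backup server, the latency of the
NFS connection no longer matters for it.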
The following performance measurements only show the direct backup time
(without calling storeBackupUpdateBackup.pl). They were done with a beta
version of storeBackup 2.0.
Some background information on the following numbers: the backup was
run on an Athlon X2, 2.3 GHz, 4 GB RAM. The NFS server was an
Athlon XP, 1.84 GHz, 1.5 GB RAM. The network ran at 100 MBit/s, and
storeBackup was used with standard parameters. The measured times are
given in hours:minutes:seconds or minutes:seconds. The size of sourceDir
was 12 GB; the size of the backup created with storeBackup was 9.2 GB.
The backup comprised 4769 directories and 38499 files. storeBackup.pl
linked 5038 files internally, which means these were duplicates. The
source of the data were my files and the ``Desktop'' of my Windows XP
laptop, i.e. ``real'' data.
The first table shows the time for copying the data to the nfs server
with standard programs. The nfs server is mounted with option
async, which is a performance optimization
and not the standard configuration.
command   | duration   | size of backup
----------+------------+---------------
cp -a     | 28:46      | 12 GB
tar jcf   | 01:58:20   | 9.4 GB
tar cf    | 21:06      | 12 GB
Everything is as expected: tar with compression is much slower than the
others, and cp is slower than tar because it has to create lots of
files. There is one surprising number: the backup file written by
tar jcf is 9.4 GB, while the resulting size of the backup with
storeBackup.pl is only 9.2 GB. The reason lies in the 5038 internally
linked files - the duplicates are stored only once with storeBackup.
The effect of comparing file contents does not show up again in this
benchmark, but it makes a big difference in performance and especially
in the disk space used. If only the time stamp of a file has changed,
traditional backup software will store that file again in an incremental
backup, while storeBackup will only create a hard link.
Now let's run storeBackup.pl on the same contents. The nfs server is
still mounted with option async. There are no changes in the source
directory between the first, second and third backups.
storeBackup  | 1.19, Standard | 2.0, Standard | 2.0, lateLinks | mount with async
-------------+----------------+---------------+----------------+------------------------------
1. backup    | 49:51   100%   | 49:20    99%  | 31:14    63%   |
2. backup    | 02:45   100%   | 02:25    88%  | 00:42    25%   | file system read cache empty
3. backup    | 01:51   100%   | 01:54   100%  | 00:26    23%   | file system read cache filled
We can see the following:
- The first run of storeBackup.pl is faster than tar jcf (tar with
compression). It is easy to understand why: storeBackup.pl uses both
cores of the machine, while the compression with tar uses only one. But
if you look a little deeper at the numbers, you see that storeBackup.pl
needs less than half the time (42%) of tar with compression, even
though it additionally calculates all md5 sums and has to bear the
overhead of creating thousands of files (compare cp with tar cf above).
The reduction of the copying time by more than 50% comes from two
effects: storeBackup.pl does not compress all files (depending on
their suffix, e.g., .bz2 files are not compressed again), and it
recognizes files with identical content and just sets a hard link
(which is also the reason for 9.2 instead of 9.4 GB).
- The second backup was done with a freshly mounted source directory,
so its read cache was not filled. You can see some improvement between
version 1.19 and 2.0 because of better parallelization of reading the
data in storeBackup itself. You see no difference in the third run
between version 1.19 and 2.0, because reading the source directory
entries is now served from the file system cache, which means the
limiting factor is the speed of the nfs server - and that is the same
in both runs.
- With option lateLinks, you can see an improvement by a factor of 4.
The measured time depends massively on the time needed for reading the
source directory (plus reading the information from the previous
backup, which is always the same). The short calculation after this
list spells out these figures.
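The percentages and factors quoted above follow directly from the
tables; a short calculation with the measured times makes this explicit:

  def seconds(t):
      """Convert 'h:mm:ss' or 'mm:ss' into seconds."""
      total = 0
      for part in t.split(":"):
          total = total * 60 + int(part)
      return total

  # first storeBackup run vs. tar jcf: about 42%
  print(seconds("49:51") / seconds("1:58:20"))   # ~0.42

  # second and third backup, 1.19 standard vs. 2.0 lateLinks: about a factor of 4
  print(seconds("02:45") / seconds("00:42"))     # ~3.9
  print(seconds("01:51") / seconds("00:26"))     # ~4.3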
Now let's do the same with an nfs mount without ``tricks'' like
configuring async:
command   | duration   | size of backup
----------+------------+---------------
cp -a     | 37:51      | 12 GB
tar jcf   | 02:02:01   | 9.4 GB
tar cf    | 25:05      | 12 GB
storeBackup  | 1.19, Standard | 2.0, Standard | 2.0, lateLinks | mount with sync
-------------+----------------+---------------+----------------+------------------------------
1. backup    | 53:35   100%   | 49:20   100%  | 38:53    63%   |
2. backup    | 05:36   100%   | 05:24    96%  | 00:43    13%   | file system read cache empty
3. backup    | 05:10   100%   | 04:54    95%  | 00:27     9%   | file system read cache filled
We can see the following:
- Everything is more or less slower because of the higher latency of
synchronous communication with the nfs server. If only one file is
written (as with tar), the difference to the async backups is small;
if many files are written, it is much bigger.
- The difference between sync and async is very small when lateLinks
is used. The reason is simple: only a few files are written over nfs,
so the latency has only a small impact on the overall time for the
backup (a rough model follows after this list). As a result, the backup
with lateLinks and a very fast source directory (served from the cache)
is now about 10 times faster.
- Because latency hardly matters for the backup itself when lateLinks
is used, I also mounted this file server over a vpn across the
Internet. This means very high latency and a bandwidth of about
20 KByte/s from the nfs server and 50 KByte/s to the nfs server (as
seen on a network monitoring tool). With the same boundary conditions
as before (mounted with async, source directory file system in the
cache, no changes) I got a speedup with lateLinks, compared with a
non-lateLinks backup, by a factor of 70.
So if your changed or new files are not too big compared with the
available bandwidth, you can also use storeBackup (with lateLinks)
to make a backup over a vpn on high latency lines. Naturally you
should not choose option lateCompress in such a case. Another
advantage of lateLinks in such cases is that parallelization works
much better, because reading unchanged data in the source directory
requires almost no activity on the NFS mount.
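The latency argument can be made concrete with a rough model: without
lateLinks (almost) every directory entry of the backup causes at least
one synchronous operation on the NFS mount, so latency dominates; with
lateLinks only the changed files do. In the sketch below only the file
and directory counts come from the benchmark data set above; the
round-trip time and the number of changed files are assumed example
values:

  files, dirs = 38499, 4769   # counts of the benchmark data set above
  rtt = 0.2                   # assumed round-trip time on a vpn link, in seconds
  changed = 50                # assumed number of changed files in an incremental run

  # roughly one remote operation per entry without lateLinks ...
  without_latelinks = (files + dirs) * rtt   # ~8650 s, i.e. hours, from latency alone
  # ... but only the changed files cause remote operations with lateLinks
  with_latelinks = changed * rtt             # ~10 s

  print(without_latelinks / 3600, with_latelinks)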
Conclusion: If you mount with nfs, you can make it really fast using
option lateLinks. See section 7.6 for how to configure
it.
Using ``blocked files'' also improves performance a lot, because only a
small percentage of an image file has to be copied or compressed. See
the description of using blocked files for the influence of this option
on performance and on the space needed.