This is intended as short, rather high-level documentation.
# -- NOTATIONS -- #
The following notations are used in the remainder of this document.
- file tree: A directory tree, possibly with excluded subtrees, residing on
exactly one file system. The root of the file tree does not need to coincide
with the root of that file system.
# -- GENERAL -- #
This backup program makes incremental backups of a file tree, using hard
links between differently dated backups of that file tree to reduce space
requirements while keeping the data in a simple form.
The destination file system usually differs from the one the source file
tree resides on; in particular, it may be on a different host. The rsync(1)
command line program serves as the backend. Since rsync cannot copy between
two remote hosts, at least one of the source and the destination needs to be
local.
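As a hedged illustration of this constraint, the following Python sketch
checks a pair of rsync endpoints before a run. It relies on a simplified
heuristic (an assumption, not rsync's exact argument parser): an argument
counts as remote if it starts with 'rsync://' or contains a colon before the
first slash, which is why a local path containing a colon is conventionally
written as './foo:bar'. The names 'is_remote' and 'check_endpoints' are
hypothetical and not part of this program.

```python
def is_remote(spec: str) -> bool:
    """Simplified heuristic for rsync endpoint syntax (an assumption):
    'rsync://...' URLs and 'host:path' (colon before any slash) are remote."""
    if spec.startswith("rsync://"):
        return True
    colon = spec.find(":")
    if colon == -1:
        return False
    slash = spec.find("/")
    return slash == -1 or colon < slash


def check_endpoints(source: str, destination: str) -> None:
    """Refuse a remote-to-remote transfer, which rsync does not support."""
    if is_remote(source) and is_remote(destination):
        raise ValueError(f"both ends are remote: {source!r} -> {destination!r}")


check_endpoints("/home/user", "backup-host:/srv/backups")  # fine: source is local
print(is_remote("./foo:bar"))  # False: the slash precedes the colon
```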
The backup process is organized such that a pure file rename between the
old and the new backup does not cause a new file to be created; instead, the
two files are hard linked. This is in contrast to the simpler method of
relying solely on rsync's '--link-dest' option, which only links files found
under the same relative path, as done by the early versions of this program.
This is achieved by
1. Creating a local mirror of the source tree inside itself (non-recursively,
of course), using hard links. In common backup operation, there are always
at least two such mirrors: an obsolete one, a remnant of the last backup's
creation, featuring the same structure as the last backup stored in the
destination file tree, and the newly created mirror. As both mirrors are
hard linked to the live file tree, they share hard links with each other
by transitivity. See also APPENDIX: NOTE-1.
2. Copying the pair of these two mirrors to the destination tree, thereby
referencing the old mirror to its (structurally identical!) counterpart in
the destination tree, i.e., the latest backup. Since the two mirrors are
inter-linked, and the referencing allows for linking the old mirror to the
(old) latest backup, the new mirror (i.e., the very latest backup) is
effectively linked to the old latest backup. See also APPENDIX: NOTE-2.
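Step 1 can be sketched in Python as follows. This is not the program's
actual (bash) implementation; the function 'make_mirror' and the '.mirrors'
location are hypothetical, and only regular files are linked for
simplicity. The usage part renames a file between two mirror runs and checks
that the old and the new mirror still refer to the same inode, which is what
allows rsync to link the renamed file instead of copying it.

```python
import os
import tempfile
from pathlib import Path


def make_mirror(root: Path, mirror_dir: Path, exclude: Path) -> None:
    """Recreate the directory structure of `root` under `mirror_dir`,
    hard linking each regular file instead of copying its data.

    `exclude` is the directory holding all mirrors; it is pruned from the
    walk so that the tree is not mirrored into itself recursively."""
    for dirpath, dirnames, filenames in os.walk(root):
        current = Path(dirpath)
        dirnames[:] = [d for d in dirnames if current / d != exclude]
        target = mirror_dir / current.relative_to(root)
        target.mkdir(parents=True, exist_ok=True)
        for name in filenames:
            src = current / name
            # Simplification: symlinks and special files are skipped.
            if src.is_file() and not src.is_symlink():
                os.link(src, target / name)


def rename_keeps_inode() -> bool:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "tree"
        mirrors = root / ".mirrors"  # hypothetical mirror location
        (root / "docs").mkdir(parents=True)
        (root / "docs" / "a.txt").write_text("hello")

        make_mirror(root, mirrors / "old", exclude=mirrors)  # last backup
        (root / "docs" / "a.txt").rename(root / "docs" / "b.txt")
        make_mirror(root, mirrors / "new", exclude=mirrors)  # this backup

        old = (mirrors / "old" / "docs" / "a.txt").stat()
        new = (mirrors / "new" / "docs" / "b.txt").stat()
        return old.st_ino == new.st_ino and old.st_dev == new.st_dev


print(rename_keeps_inode())  # True
```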
The chosen procedure has some side effects.
* Backups are also available on the source host (although these likely
diverge from the actual backups, as they are linked to the live file tree
and hence change upon in-place modification). See also APPENDIX: NOTE-1.
`- Note that, currently, backups (resp. mirrors) are deleted neither in the
source nor in the destination tree. Deleting them would arguably be more
desirable in the source tree than in the destination tree.
* The local mirror creation is much faster than the transfer to the backup
medium, in particular if the medium is remote. Therefore, apart from the
aforementioned in-place modifications, the backed-up state is closer to a
consistent snapshot of a single point in time. One might argue, though, that
precisely these in-place modifications in fact widen the window of
modifications to the combined duration of the mirroring and the transfer.
There are also some clear (but probably not grave) drawbacks compared to the
simpler '--link-dest'-only method:
* There is one more step involved, thereby adding complexity.
* The source and the destination file tree each need to reside on a single
file system, since hard links cannot span file systems. As a consequence,
this program needs to be configured separately for each file system
containing data to be backed up. This could in theory be handled by the
program itself, but does not seem worth the effort.
* A little additional space is needed on the source file system, but only
for the mirrors' directories and directory entries (links), not for copies
of regular files.
`- More importantly, though, the source file system needs to be writable.
This problem might be circumvented using bind mounts.
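The single-file-system requirement can be checked up front. The following
Python sketch, similar in spirit to rsync's '--one-file-system' ('-x')
option, lists directories below a tree root that belong to a different file
system (mount points), by comparing device numbers. The name
'foreign_mounts' is hypothetical and not part of this program.

```python
from __future__ import annotations

import os
from pathlib import Path


def foreign_mounts(root: Path) -> list[Path]:
    """List directories below `root` that reside on a different file system
    than `root` itself (i.e., mount points), by comparing st_dev.

    Hard links cannot be created across file systems (link(2) fails with
    EXDEV), so such subtrees would need to be excluded or configured as a
    separate backup."""
    root_dev = root.stat().st_dev
    found: list[Path] = []
    for dirpath, dirnames, _filenames in os.walk(root):
        keep = []
        for name in dirnames:
            path = Path(dirpath) / name
            if path.is_symlink():
                continue  # os.walk does not follow directory symlinks either
            if path.stat().st_dev != root_dev:
                found.append(path)  # report it, but do not descend into it
            else:
                keep.append(name)
        dirnames[:] = keep
    return found
```

For example, a non-empty result for a configured source root would indicate
that the tree spans several file systems and needs either excluded subtrees
or one configuration per file system.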
# -- REMOTE METHOD CALLS -- #
This program essentially wraps rsync, but performs a few additional useful
functions, some of which need to be executed on the source or destination
host, regardless of whether that host is local or not. To allow for this,
there are wrapper methods, namely 'run_function' and 'run_command', which
allow a local bash function to be executed on a remote host.
Methods to be executed on either the source or the destination are located in
the source files 'source' and 'destination', respectively, and are annotated
with their environment dependencies ('requires'). See also the 'remote'
source file.
# -- APPENDIX -- #
# NOTE-1: The old mirror.
As the old mirror is a remnant of the last backup, it features exactly the
same file tree as that last backup, by names but not necessarily by content.
This is because the old mirror is (purposefully) hard linked to the live
file tree, in which files may be modified in place; due to the hard links,
such modifications also appear in the old mirror.
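The difference between in-place modification and replacing a file can be
shown with a small Python sketch (the file names are arbitrary): writing
into the existing file changes what the hard-linked mirror shows, whereas
writing a new file and renaming it over the old name, as some editors and
tools do, leaves the mirror's content untouched.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def link_semantics() -> tuple[str, str, str]:
    """Return (mirror after in-place write, mirror after replace, live)."""
    with tempfile.TemporaryDirectory() as tmp:
        live = Path(tmp) / "live.txt"
        mirror = Path(tmp) / "mirror.txt"
        live.write_text("v1")
        os.link(live, mirror)  # both names now refer to the same inode

        # In-place modification: truncate and rewrite the same inode.
        with open(live, "w") as f:
            f.write("v2")
        after_in_place = mirror.read_text()

        # Write a new file and rename it over the old name: the live name
        # now refers to a fresh inode; the mirror keeps the previous data.
        fresh = Path(tmp) / "live.txt.new"
        fresh.write_text("v3")
        os.replace(fresh, live)
        return after_in_place, mirror.read_text(), live.read_text()


print(link_semantics())  # ('v2', 'v2', 'v3')
```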
# NOTE-2: Avoid propagation of errors on the old source mirror.
The obvious way to do the transfer from the source to the destination tree
may seem to be to use the old latest backup on the destination directly as
part of the target, referencing the old mirror to it,
i.e., sync (old_mirror, new_mirror) -> (old_latest_backup, new_backup).
This is, however, not a good idea: as explained in NOTE-1, old_mirror may
have changed, and hence the above operation would modify old_latest_backup.
Since this is not desired, old_latest_backup is instead only used as rsync's
'--link-dest', which unfortunately causes a superfluous instance of
old_latest_backup to be created in the destination tree. This instance is
simply deleted at the end.
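A possible shape of this transfer is sketched below in Python, as a list of
commands rather than the program's actual command line. The directory
layout (mirrors in <src_mirrors>/<name>, backups in <dest>/<name>, a
'.incoming' staging directory) and the function 'plan_transfer' are
hypothetical; both endpoints are plain local paths for simplicity (for a
remote destination, rsync's target would be 'host:path' and the follow-up
commands would have to run on that host), and a reasonably recent rsync 3.x
is assumed. Only the real rsync options '-a', '-H' and '--link-dest' are
used.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def plan_transfer(
    src_mirrors: str, old_name: str, new_name: str, dest: str,
    staging: str = ".incoming",
) -> list[list[str]]:
    """Return the commands (as argv lists) for the NOTE-2 transfer.

    Hypothetical layout: mirrors live in <src_mirrors>/<name>, backups in
    <dest>/<name>, with <dest> an absolute path on the destination."""
    src = PurePosixPath(src_mirrors)
    dst = PurePosixPath(dest)
    stage = dst / staging
    return [
        # One rsync run for both mirrors: -H keeps the links between them.
        # Without trailing slashes, the sources land in <stage>/<old_name>
        # and <stage>/<new_name>; --link-dest then looks up unchanged files
        # as <dest>/<old_name>/<path>, i.e. in the old latest backup.
        ["rsync", "-aH", f"--link-dest={dst}",
         str(src / old_name), str(src / new_name), f"{stage}/"],
        # Publish the new backup next to the old one.
        ["mv", str(stage / new_name), str(dst / new_name)],
        # Drop the superfluous instance of the old latest backup.
        ["rm", "-rf", str(stage / old_name)],
        ["rmdir", str(stage)],
    ]


for step in plan_transfer("/data/.mirrors", "2024-01-01", "2024-01-02", "/backup"):
    print(" ".join(step))
```

Transferring both mirrors in a single rsync run matters here: '-H' only
preserves hard links among files within the same transfer, so the new
backup inherits the links that '--link-dest' establishes for the old one.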
# NOTE-3: Imperfection.
The chosen solution is far from perfect. In particular, the backups are
far from atomic. Btrfs apparently makes this easier, e.g., through its
snapshots.
Another alternative might be to use OverlayFS to record new writes.
Further, to at least get closer to atomicity, a full copy of the source tree
could be made on the source file system before the actual transfer. This
would, however, require a lot of space and would likely only be beneficial
if the transfer is non-local.
Also, the extraneously created copy of the source's old mirror on the
destination is unsatisfying. However, I do not know of a way to avoid this
without abandoning the rsync(1) command line program as backend.
vi: ai et tw=77 fo+=t