Diffstat (limited to 'NOTES')
1 files changed, 140 insertions, 0 deletions
diff --git a/NOTES b/NOTES
new file mode 100644
index 0000000..cc84920
--- /dev/null
+++ b/NOTES
@@ -0,0 +1,140 @@
+This is intended as short, rather high-level documentation.
+# -- NOTATIONS -- #
+ A few notations, as used in the following.
+- file tree: A directory tree, possibly with excluded subtrees, that resides
+ on exactly one file system. The file tree does not need to share the
+ root of that file system.
+# -- GENERAL -- #
+ This backup program makes incremental backups of a file tree, using hard
+links between differently dated backups of that file tree to reduce space
+requirements while keeping the data in a simple form.
+ The destination file system is usually different from the one the source
+file tree resides on; in particular, it may be on a different host. The
+rsync(1) command line program serves as backend. Since rsync cannot copy
+between two remote hosts, at least one of the source and destination needs
+to be local.
+ The backup process is organized such that a pure file rename between the
+old and new backup directory does not cause a new file to be created; i.e.,
+the two files are hard linked. This is in contrast to the simpler method of
+merely exploiting rsync's '--link-dest' functionality, and thereby to the
+early versions of this program.
+ This is achieved by
+ 1. Creating a local mirror of the source tree inside itself
+ (non-recursively, of course), using hard links. In common backup
+ operation, there are always at least two such mirrors: an obsolete one,
+ a remnant of the last backup's creation, featuring the same structure
+ as the last backup as stored in the destination file tree, and the newly
+ created mirror. By transitivity, the two mirrors share hard links. See
+ also APPENDIX: NOTE-1.
+ 2. Copying the pair of these two mirrors to the destination tree, thereby
+ referencing the old mirror to the (identical!) old mirror in the
+ destination tree. Since the two mirrors are inter-linked, and the
+ referencing allows for linking the old mirror to the (old) latest backup,
+ the new mirror (and very latest backup) is effectively linked to the
+ old latest backup. See also APPENDIX: NOTE-2.
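The effect of step 1 can be illustrated with a minimal sketch. It assumes GNU cp (for '-l') and a hypothetical '.mirrors' directory inside the source tree; neither the helper name 'make_mirror' nor this layout is taken from the program itself. The sketch shows that a file renamed between two mirror creations still shares one inode across both mirrors.

```shell
#!/usr/bin/env bash
# make_mirror TREE NAME (hypothetical helper, not the program's own code):
# hard-link every non-hidden top-level entry of TREE into TREE/.mirrors/NAME.
# The glob skips dot entries, which keeps '.mirrors' out of its own mirror
# (but, in this simplified sketch, also skips other hidden files).
make_mirror() {
    local tree=$1 name=$2 entry
    mkdir -p "$tree/.mirrors/$name"
    for entry in "$tree"/*; do
        [ -e "$entry" ] || continue      # unmatched glob in an empty tree
        cp -al "$entry" "$tree/.mirrors/$name/"
    done
}

demo=$(mktemp -d)
mkdir -p "$demo/docs"
echo "hello" > "$demo/docs/a.txt"

make_mirror "$demo" old                       # remnant of the previous backup
mv "$demo/docs/a.txt" "$demo/docs/b.txt"      # a pure rename in the live tree
make_mirror "$demo" new                       # mirror for the upcoming backup

# Both mirror entries refer to the same inode, despite the different names.
if [ "$demo/.mirrors/old/docs/a.txt" -ef "$demo/.mirrors/new/docs/b.txt" ]; then
    echo "rename preserved the hard link"
fi
```

Because both mirrors are linked to the same live file, transferring them as a pair with hard links preserved lets the renamed file in the new backup share storage with the old one.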
+ The chosen procedure has some side effects.
+ * Backups are available on the source host (however, they likely diverge
+ from the actual backups, as the former are linked to the live file tree
+ and hence change upon in-place modification). See also APPENDIX: NOTE-1.
+ `- Note that, currently, backups (resp. mirrors) are deleted neither in
+ the source nor in the destination tree. Deletion is probably more
+ desirable in the source tree than in the destination tree.
+ * The local mirror creation is much faster than the transfer to the backup
+ medium, in particular if the latter is remote. Therefore, the state,
+ excluding the aforementioned in-place modifications, is closer to being
+ consistent with a single point in time. One might argue, though, that
+ precisely these in-place modifications in fact widen the window of
+ modifications to span both the mirroring and the transfer.
+ There are also some clear (but probably not grave) drawbacks compared to
+the simpler method.
+ * There is one more step involved, thereby adding complexity.
+ * Both the source and the destination file tree need to reside on a single
+ file system each. As a consequence, this program needs to be configured
+ for each file system containing data to be backed up. This could in
+ theory be automated in the program itself, but does not seem worth the
+ effort.
+ * A little additional space is needed on the source file system, though
+ only for new hard links and directories, not for actual file data.
+ `- In particular, though, the source file system needs to be writable.
+ This problem may possibly be circumvented using bind mounts.
+ This program essentially wraps rsync, but performs a few additional, useful
+functions, some of which need to be executed on the source or destination
+host, irrespective of whether that host is local or not. To allow for this,
+there are wrapper methods, namely 'run_function' and 'run_command', which
+make it possible to execute a local bash function on a remote host. Methods
+to be executed on either the source or the destination are located in the
+source files 'source' and 'destination' and are annotated with their
+environment dependencies ('requires'). See also the 'remote' source file.
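One common way to run a local bash function on another host is to serialize it with 'declare -f' and pipe it to a remote bash. The following is a sketch of that general technique only; the name 'run_function_sketch', the 'localhost' shortcut and the use of ssh(1) are assumptions, and the program's actual 'run_function' may be implemented differently.

```shell
#!/usr/bin/env bash
# run_function_sketch HOST FUNC [ARGS...] (hypothetical; the program's own
# 'run_function' may work differently): serialize a local bash function with
# 'declare -f', append a call to it, and feed the script to a bash on HOST.
run_function_sketch() {
    local host=$1 func=$2 args='' script
    shift 2
    if [ "$#" -gt 0 ]; then
        args=$(printf ' %q' "$@")        # quote arguments for re-parsing
    fi
    script="$(declare -f "$func")
$func$args"
    if [ "$host" = localhost ]; then
        printf '%s\n' "$script" | bash -s
    else
        printf '%s\n' "$script" | ssh "$host" bash -s
    fi
}

# Example function to be shipped to the other side.
report_free_space() {
    df -P "$1" | tail -n 1
}

run_function_sketch localhost report_free_space /tmp
```

A function that calls other functions or reads variables would need those shipped along as well, which is presumably what dependency annotations such as 'requires' are for.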
+# -- APPENDIX -- #
+# NOTE-1: The old mirror.
+ As the old mirror is a remnant of the last backup, it features exactly the
+ same file tree as that last backup - by labels, not necessarily by content.
+ This is because the old mirror is (purposefully) linked to the live file
+ tree, in which files may be modified in place; due to the hard links, such
+ modifications also appear in the old mirror.
+# NOTE-2: Avoid propagation of errors on the old source mirror.
+ The simple way to do the transfer from the source to the destination tree
+ may seem to be using the old latest backup on the destination as part of
+ the target, when referencing the old mirror to it.
+ I.e., sync (old_mirror, new_mirror) -> (old_latest_backup, new_backup).
+ This is, however, not a good idea since, as explained in NOTE-1, old_mirror
+ may change and hence the above operation would change old_latest_backup.
+ Since this is not desired, old_latest_backup is instead only used as
+ link-dest for rsync, which unfortunately causes a superfluous instance of
+ old_latest_backup to be created in the destination tree. This is simply
+ deleted in the end.
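The approach described in NOTE-2 might look roughly as follows. This is a sketch under stated assumptions, not the program's actual invocation: the destination is taken to be local, and the 'backups' and 'staging' directory names, the function names and the 'RSYNC' override variable are all hypothetical. Only the rsync options '-a', '-H' and '--link-dest' are real.

```shell
#!/usr/bin/env bash
# Hypothetical layout on a local destination DEST:
#   DEST/backups/<name>   finished backups, one directory per mirror name
#   DEST/staging/         scratch directory receiving the mirror pair

# transfer_sketch MIRRORS OLD NEW DEST: copy the mirror pair into staging.
# -H keeps the hard links between the two mirrors; --link-dest (relative to
# the destination directory) lets unchanged files link to DEST/backups/OLD
# without modifying the contents of that backup.
transfer_sketch() {
    local mirrors=$1 old=$2 new=$3 dest=$4
    mkdir -p "$dest/backups" "$dest/staging"
    "${RSYNC:-rsync}" -a -H --link-dest=../backups \
        "$mirrors/$old" "$mirrors/$new" "$dest/staging/"
}

# finish_sketch DEST OLD NEW: keep the new backup and drop the superfluous
# copy of the old mirror that the transfer left behind.
finish_sketch() {
    local dest=$1 old=$2 new=$3
    mv "$dest/staging/$new" "$dest/backups/$new"
    rm -rf "$dest/staging/$old"
}
```

Any in-place change that leaked into old_mirror then only ends up in the discarded staging copy, while old_latest_backup stays as it was.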
+# NOTE-3: Imperfection.
+ The chosen solution is far from perfect. In particular, the backups are
+ far from being atomic. Apparently, Btrfs makes this easier [0].
+ Another alternative might be to use an OverlayFS to record new writes.
+ Further, in order to at least get closer to atomicity, a full copy, made
+ on and of the source tree, could be done before the actual transfer. This
+ would, however, require a lot of space and likely only be beneficial if
+ the transfer is non-local.
+ Also, the extraneously created copy of the source's old mirror on the
+ destination is disturbing. I do, however, not know of a way to avoid this
+ other than not relying on the rsync(1) command line program as backend.
+# Links
+[0] https://btrfs.wiki.kernel.org/index.php/Incremental_Backup
+vi: ai et tw=77 fo+=t