aboutsummaryrefslogtreecommitdiff
path: root/NOTES
diff options
context:
space:
mode:
authorEinhard Leichtfuß <alguien@respiranto.de>2019-10-18 00:28:00 +0200
committerEinhard Leichtfuß <alguien@respiranto.de>2019-10-18 01:17:39 +0200
commit4e2ceb9875c81c33b09f6eaf0858b7f9591b22cc (patch)
treebe3860dc8c26b3c18a30c2a81b58e00025b98069 /NOTES
parent51f58d7c4c984ec8ccc5ab0a3d31d04cd39d499c (diff)
Restructure, and rewrite substantial partsHEADmasterdevel
A notable problem (to me) with the former version was, that upon simple movements in the source file tree, these would in general cause duplicated data in the backup forest. This problem has been resolved by essentially tracking file movements, using hard links. Also, the code has been divided into several different files, extranous code removed, the organization of remote code execution simplified. In the process of simplification, all parts requiring direct interaction of the user with the program have been replaced. In most cases, this means that the program now just terminates with an error instead of allowing the user to confirm deletion of an unexpected (remnant) file. This may be considered a drawback, but actually these interactive options were anyways suboptimal solutions in most cases - where they occured, which was due to remnant files of aborted or failed former backups. In general, there have been a lot of changes. Which are not thoroughly documented here. Since there is often no strong relation to the old code, this is not deemed necessary, as the in-code documentation is expected to be of sufficient help. Also, there has been added a little general documentation in the README and particularly the NOTES file. It is to be noted that this new version is far from sophisticated. It has been tested with and is in use for real data, however lacks a lot of convenience. In particular, if a backup fails unexpectedly, the next backup will in the very most cases loudly fail without manual intervention. Also, the program is not able to continue the growth of a backup tree built with former versions of this program, by itself. This can however be arranged by hand.
Diffstat (limited to 'NOTES')
-rw-r--r--NOTES140
1 files changed, 140 insertions, 0 deletions
diff --git a/NOTES b/NOTES
new file mode 100644
index 0000000..cc84920
--- /dev/null
+++ b/NOTES
@@ -0,0 +1,140 @@
+This is intended as a short, rather high-level, documentation.
+
+
+# -- NOTATIONS -- #
+
+ Define a few notations, as used in the following.
+
+- file tree: A file tree with possible excluded subtrees residing on exactly
+ one file system. The file tree does not need to share the root
+ of that file system.
+
+
+
+# -- GENERAL -- #
+
+ This backup program makes incremental backups of a file tree, using hard
+links between differently dated backups of that file tree to reduce space
+requirements while keeping the data in a simple form.
+
+ The destination file system is usually different to the on the source file
+tree resides on, in particular, it may be on a different host. The rsync(1)
+command line program serves as backend. Therefore, at least one of the
+source and destination need to be local.
+
+ The backup process is organized in a way such that any pure file renames
+between the old and new backup directory do not cause a new file to be
+created. I.e., the two files are hard linked. This is contrary to the
+simpler method of simply exploiting rsync's '--link-dest' functionality, and
+thereby to the early versions of this program.
+
+ This is achieved by
+
+ 1. Creating a local mirror of the source tree inside itself
+ (non-recursively, of course), using hard links. In common backup
+ operation, there are always at least two such mirrors: an obsolete one,
+ a remnant of the last backup's creation, featuring the same structure
+ as the last backup as stored in the destination file tree, and the newly
+ created mirror. By transitivity, the two mirrors share hard links. See
+ also APPENDIX: NOTE-1.
+
+ 2. Copying the pair of these two mirrors to the destination tree, thereby
+ referencing the old mirror to the (identical!) old mirror in the
+ destination tree. Since the two mirrors are inter-linked, and the
+ referencing allows for linking the old mirror to the (old) latest backup,
+ the new mirror (and very latest backup) is effectively linked to the
+ old latest backup. See also APPENDIX: NOTE-2.
+
+
+ The chosen procedure has some side effects.
+
+ * Backups are available on the source host (however they likely diverge from
+ the actual backups, as the former are linked to the live file tree and
+ hence change upon in-place modification). See also APPENDIX: NOTE-1.
+ `- Note that, currently, neither in the source nor in the destination tree,
+ backups (resp. mirrors) are deleted. In the source tree this might be
+ more desired than in the destination tree.
+
+ * The local mirror creation is much faster than the transfer to the backup
+ medium, in particular, if it is remote. Therefore, the state, excluding
+ the aforementioned in-place modifications, is closer to consistency with a
+ single point in time. One might argue though, that precisely these
+ in-place modifications cause the window of modifications to be in fact
+ widened to the adjunction of the mirroring and the transfer.
+
+
+ There are also some clear (but probably not grave) drawbacks to the simpler
+method.
+
+ * There is one more step involved, thereby adding complexity.
+
+ * Both the source and destination file tree need to reside on respectively
+ one single file system. As a consequence, this program needs to be
+ configured for each file system containing data to be backed up. This
+ could in theory be done in the program itself, but seems not worth the
+ effort.
+
+ * A little additional space is needed on the source file system, however
+ only for new file links and directories, not for new actual regular files.
+ `- In particular though, the source file system needs to be writable. This
+ problem may possibly be circumvented using bind mounts.
+
+
+
+# -- REMOTE METHOD CALLS -- #
+
+ This program essentially wraps rsync, however performs a few additional,
+useful functions, some of which need to be executed on the source or
+destination host, irrespective of whether that is local or not. In order to
+allow for this, there exist wrapper methods, namely 'run_function' and
+'run_command', allowing to execute a local bash function on a remote host.
+Methods to be executed either on the source or the destination are located in
+the source files 'source' and 'destination' and annotated with environment
+dependencies ('requires'). See also the 'remote' source file.
+
+
+
+# -- APPENDIX -- #
+
+# NOTE-1: The old mirror.
+ As the old mirror is a remnant of the last backup it features exactly the
+ same file tree as that last backup - by labels, not by content. This is
+ due to this old mirror being (purposefully) linked to the live file tree,
+ in which files may be modified in place, which due to the hard links also
+ appear in the old mirror.
+
+
+# NOTE-2: Avoid propagation of errors on the old source mirror.
+ The simple way to do the transfer from the source to the destination tree
+ may seem to use the old latest backup on the destination as part of the
+ target, when referencing the old mirror to it.
+
+ I.e., sync (old_mirror, new_mirror) -> (old_latest_backup, new_backup).
+ This is however, not a good idea, since as explicated in NOTE-1, old_mirror
+ may change and hence the above operation would change old_latest_backup.
+ Since this is not desired, old_latest_backup is instead only used as
+ link-dest for rsync, which unfortunately causes a superfluous instance of
+ old_latest_backup to be created in the destination tree. This is simply
+ deleted in the end.
+
+
+# NOTE-3: Imperfection.
+ The chosen solution is far from perfect. In particular, the backups are
+ far from being atomar. Apparently, the Btrfs makes this easier [0].
+ Another alternative might be to use an overlayFS to record new writes.
+
+ Further, in order to at least get closer to atomicity, a full copy, made
+ on and of the source tree, could be done before actual transferral. This
+ would however require a lot of space and also only likely be beneficial, if
+ the transfer is non-locally.
+
+ Also, the extranously created copy of the source's old mirror on the
+ destination is disturbing. I do however not know of way to avoid this,
+ unless not relying on the rsync(1) command line program as backend.
+
+
+# Links
+[0] https://btrfs.wiki.kernel.org/index.php/Incremental_Backup
+
+
+vi: ai et tw=77 fo+=t