Diffstat (limited to 'NOTES')
-rw-r--r-- | NOTES | 140 |
1 files changed, 140 insertions, 0 deletions
@@ -0,0 +1,140 @@
+This is intended as short, rather high-level documentation.
+
+
+# -- NOTATIONS -- #
+
+  A few notations, as used in the following.
+
+- file tree: A directory tree, possibly with excluded subtrees, residing on
+             exactly one file system. The root of the file tree need not
+             be the root of that file system.
+
+
+
+# -- GENERAL -- #
+
+  This backup program makes incremental backups of a file tree, using hard
+links between differently dated backups of that file tree to reduce space
+requirements while keeping the data in a simple form.
+
+  The destination file system is usually different from the one the source
+file tree resides on; in particular, it may be on a different host. The
+rsync(1) command line program serves as backend. Therefore, at least one of
+the source and the destination needs to be local.
+
+  The backup process is organized such that a pure file rename between the
+old and the new backup directory does not cause a new file to be created;
+i.e., the two files are hard linked. This is in contrast to the simpler
+method of merely exploiting rsync's '--link-dest' functionality, and thereby
+to the early versions of this program.
+
+  This is achieved by
+
+  1. Creating a local mirror of the source tree inside itself
+     (non-recursively, of course), using hard links. In normal backup
+     operation, there are always at least two such mirrors: an obsolete one,
+     a remnant of the last backup's creation, featuring the same structure
+     as the last backup as stored in the destination file tree, and the newly
+     created mirror. By transitivity, the two mirrors share hard links. See
+     also APPENDIX: NOTE-1.
+
+  2. Copying the pair of these two mirrors to the destination tree, thereby
+     referencing the old mirror to the (identical!) old mirror in the
+     destination tree. Since the two mirrors are inter-linked, and the
+     referencing allows for linking the old mirror to the (old) latest backup,
+     the new mirror (and very latest backup) is effectively linked to the
+     old latest backup. See also APPENDIX: NOTE-2.
+
+
+  The chosen procedure has some side effects.
+
+ * Backups are available on the source host (however, they likely diverge
+   from the actual backups, as the former are linked to the live file tree
+   and hence change upon in-place modification). See also APPENDIX: NOTE-1.
+   `- Note that, currently, backups (resp. mirrors) are deleted neither in
+      the source nor in the destination tree. Deleting them might be more
+      desirable in the source tree than in the destination tree.
+
+ * The local mirror creation is much faster than the transfer to the backup
+   medium, in particular if the latter is remote. Therefore, the state,
+   excluding the aforementioned in-place modifications, is closer to being
+   consistent with a single point in time. One might argue, though, that
+   precisely these in-place modifications in fact widen the window of
+   modifications to the union of the mirroring and the transfer.
+
+
+  There are also some clear (but probably not grave) drawbacks compared to
+the simpler method.
+
+ * There is one more step involved, thereby adding complexity.
+
+ * The source and the destination file tree must each reside on a single
+   file system. As a consequence, this program needs to be configured
+   separately for each file system containing data to be backed up. This
+   could in theory be done in the program itself, but does not seem worth
+   the effort.
+
+ * A little additional space is needed on the source file system, however
+   only for new hard links and directories, not for new regular file data.
+   `- In particular, though, the source file system needs to be writable.
+      This problem may possibly be circumvented using bind mounts.
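To make step 1 and the first side effect of the GENERAL section more concrete, here is a minimal sketch; it is not the program's actual code. It assumes GNU cp (for '-l') as a stand-in for however the mirrors are really created, places the mirrors next to the tree rather than inside it to keep things short, and uses made-up names ('make_mirror', 'report.txt'). It illustrates that a renamed file in the new mirror shares its inode with the old mirror, and that an in-place edit of the live file also shows up in the old mirror.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU cp and a scratch directory on a single file system.
set -eu

root="$(mktemp -d)"
mkdir -p "$root/tree" "$root/mirrors"
echo "payload" > "$root/tree/report.txt"

# Hypothetical helper: mirror the live tree by hard-linking every file.
make_mirror() {
    cp -al "$root/tree" "$root/mirrors/$1"
}

make_mirror old                                      # remnant of the last backup
mv "$root/tree/report.txt" "$root/tree/renamed.txt"  # pure rename in the live tree
make_mirror new                                      # mirror for the current backup

# Both mirrors point at the same inode, so the rename creates no new file.
if [ "$root/mirrors/old/report.txt" -ef "$root/mirrors/new/renamed.txt" ]; then
    same_inode=yes
else
    same_inode=no
fi
echo "rename kept as hard link: $same_inode"

# Side effect (see NOTE-1): an in-place edit is visible through the old mirror.
echo "edited in place" >> "$root/tree/renamed.txt"
cat "$root/mirrors/old/report.txt"
```

Step 2 is left out here. Presumably it amounts to transferring both mirrors in one rsync run that preserves hard links ('-H') and uses the old latest backup as '--link-dest', which would carry the shared link over to the destination.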
+
+
+
+# -- REMOTE METHOD CALLS -- #
+
+  This program essentially wraps rsync, but performs a few additional,
+useful functions, some of which need to be executed on the source or
+destination host, irrespective of whether that host is local or not. To
+allow for this, there exist wrapper methods, namely 'run_function' and
+'run_command', which allow executing a local bash function on a remote host.
+Methods to be executed either on the source or the destination are located in
+the source files 'source' and 'destination' and annotated with environment
+dependencies ('requires'). See also the 'remote' source file.
+
+
+
+# -- APPENDIX -- #
+
+# NOTE-1: The old mirror.
+  As the old mirror is a remnant of the last backup, it features exactly the
+  same file tree as that last backup - by names, not necessarily by content.
+  This is due to this old mirror being (purposefully) linked to the live
+  file tree, in which files may be modified in place; due to the hard links,
+  such modifications also appear in the old mirror.
+
+
+# NOTE-2: Avoid propagation of errors on the old source mirror.
+  The simple way to do the transfer from the source to the destination tree
+  may seem to be to use the old latest backup on the destination as part of
+  the target, when referencing the old mirror to it.
+
+  I.e., sync (old_mirror, new_mirror) -> (old_latest_backup, new_backup).
+  This is, however, not a good idea since, as explained in NOTE-1, old_mirror
+  may change and hence the above operation would change old_latest_backup.
+  Since this is not desired, old_latest_backup is instead only used as
+  link-dest for rsync, which unfortunately causes a superfluous instance of
+  old_latest_backup to be created in the destination tree. This is simply
+  deleted in the end.
+
+
+# NOTE-3: Imperfection.
+  The chosen solution is far from perfect. In particular, the backups are
+  far from being atomic. Apparently, Btrfs makes this easier [0].
+  Another alternative might be to use an overlayFS to record new writes.
+
+  Further, in order to at least get closer to atomicity, a full copy of the
+  source tree, made on the source itself, could be created before the actual
+  transfer. This would however require a lot of space and would likely only
+  be beneficial if the transfer is non-local.
+
+  Also, the extraneously created copy of the source's old mirror on the
+  destination is disturbing. I do however not know of a way to avoid this,
+  short of not relying on the rsync(1) command line program as backend.
+
+
+# Links
+[0] https://btrfs.wiki.kernel.org/index.php/Incremental_Backup
+
+
+vi: ai et tw=77 fo+=t
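As an illustration of the idea behind the REMOTE METHOD CALLS section, the following sketch shows one common way to run a locally defined bash function in another shell. How 'run_function' is really implemented is not stated in these notes, so everything here is an assumption: 'run_function_sketch' and 'tree_state' are made-up names, and a local 'bash -s' stands in for the remote shell.

```shell
#!/usr/bin/env bash
# Sketch only: not the program's 'run_function'; all names are hypothetical.
set -eu

# Example function that should run on the other side.
tree_state() {
    if [ -d "$1" ]; then echo "present"; else echo "missing"; fi
}

# Serialise function $2 with 'declare -f', append a call with the remaining
# (shell-quoted) arguments, and feed both to the shell started by $1.
run_function_sketch() {
    local runner="$1" fn="$2" args="" a
    shift 2
    for a in "$@"; do
        args+=" $(printf '%q' "$a")"
    done
    printf '%s\n%s%s\n' "$(declare -f "$fn")" "$fn" "$args" | $runner
}

# Local stand-in for a remote shell.
demo_dir="$(mktemp -d)"
result_present="$(run_function_sketch "bash -s" tree_state "$demo_dir")"
result_missing="$(run_function_sketch "bash -s" tree_state "$demo_dir/nope")"
echo "$result_present $result_missing"
```

Replacing "bash -s" with something like "ssh backup-host bash -s" (the host name is hypothetical) would run the same function remotely, provided bash is available there.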