This is intended as short, rather high-level documentation.


# -- NOTATIONS -- #

Define a few notations, as used in the following.

- file tree: A file tree, possibly with excluded subtrees, residing on
  exactly one file system. The file tree does not need to share the root
  of that file system.


# -- GENERAL -- #

This backup program makes incremental backups of a file tree, using hard
links between differently dated backups of that file tree to reduce space
requirements while keeping the data in a simple form. The destination file
system is usually different from the one the source file tree resides on;
in particular, it may be on a different host.

The rsync(1) command line program serves as backend. Therefore, at least
one of the source and destination needs to be local.

The backup process is organized such that a pure file rename between the
old and new backup directory does not cause a new file to be created, i.e.,
the two files are hard linked. This is in contrast to the simpler method of
merely exploiting rsync's '--link-dest' functionality, and thereby to the
early versions of this program. ('--link-dest' alone matches files by path,
so a renamed file is stored anew.)

This is achieved by

1. Creating a local mirror of the source tree inside itself
   (non-recursively, of course), using hard links. In common backup
   operation, there are always at least two such mirrors: an obsolete one,
   a remnant of the last backup's creation, featuring the same structure
   as the last backup as stored in the destination file tree, and the
   newly created mirror. By transitivity, the two mirrors share hard
   links.
   See also APPENDIX: NOTE-1.

2. Copying the pair of these two mirrors to the destination tree, thereby
   referencing the old mirror to the (identical!) old mirror in the
   destination tree. Since the two mirrors are inter-linked, and the
   referencing allows for linking the old mirror to the (old) latest
   backup, the new mirror (and very latest backup) is effectively linked
   to the old latest backup.
   See also APPENDIX: NOTE-2.

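Step 1 can be illustrated with a minimal sketch. It is not the program's
actual code: the mirror location '.mirrors', the helper 'make_mirror' and
the use of GNU 'cp -al' are assumptions made for illustration only. It
shows that a file renamed between two mirror creations is the same inode in
both mirrors, which is what lets a hard-link-preserving transfer of the
pair (e.g. rsync with '-H') avoid storing it twice.

```shell
#!/bin/sh
# Sketch only: paths, helper name and the use of GNU `cp -al` are
# assumptions, not taken from the program itself.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

src="$work/tree"            # stands in for the live source file tree
mirrors="$src/.mirrors"     # hypothetical mirror location inside the tree
mkdir -p "$src/docs" "$mirrors"
echo "hello" > "$src/docs/a.txt"

# make_mirror NAME: hard-link every top-level entry of $src, except the
# mirror directory itself, into $mirrors/NAME. Skipping $mirrors keeps the
# mirror from containing other mirrors.
make_mirror() {
    mkdir "$mirrors/$1"
    for entry in "$src"/* "$src"/.[!.]*; do
        if [ ! -e "$entry" ]; then continue; fi          # unmatched glob
        if [ "$entry" = "$mirrors" ]; then continue; fi  # no recursion
        cp -al "$entry" "$mirrors/$1/"
    done
}

make_mirror old                           # remnant of the previous run
mv "$src/docs/a.txt" "$src/docs/b.txt"    # a pure rename in the live tree
make_mirror new                           # mirror for the current run

# Both mirror entries are links to the same live file, hence to each other.
if [ "$mirrors/old/docs/a.txt" -ef "$mirrors/new/docs/b.txt" ]; then
    rename_shares_inode=yes
else
    rename_shares_inode=no
fi
echo "rename shares inode: $rename_shares_inode"
```

Since the check compares inodes rather than paths, it holds regardless of
how the file was renamed or moved within the tree.
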
The chosen procedure has some side effects.

* Backups are available on the source host (however, they likely diverge
  from the actual backups, as the former are linked to the live file tree
  and hence change upon in-place modification).
  See also APPENDIX: NOTE-1.
  `- Note that, currently, backups (resp. mirrors) are deleted neither in
     the source nor in the destination tree. Deletion is probably more
     desirable in the source tree than in the destination tree.

* The local mirror creation is much faster than the transfer to the backup
  medium, in particular if the latter is remote. Therefore the state,
  excluding the aforementioned in-place modifications, is closer to
  consistency with a single point in time. One might argue, though, that
  precisely these in-place modifications widen the window of modifications
  to span both the mirroring and the transfer.

There are also some clear (but probably not grave) drawbacks compared to
the simpler method.

* There is one more step involved, thereby adding complexity.

* The source and the destination file tree each need to reside on a single
  file system. As a consequence, this program needs to be configured
  separately for each file system containing data to be backed up. This
  could in theory be done in the program itself, but does not seem worth
  the effort.

* A little additional space is needed on the source file system, however
  only for new file links and directories, not for new actual regular
  files.
  `- In particular though, the source file system needs to be writable.
     This problem may possibly be circumvented using bind mounts.


# -- REMOTE METHOD CALLS -- #

This program essentially wraps rsync, but performs a few additional, useful
functions, some of which need to be executed on the source or destination
host, irrespective of whether that host is local or not. To allow for this,
there are wrapper methods, namely 'run_function' and 'run_command', which
allow executing a local bash function on a remote host.

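One common way to run a local bash function on another host is to
serialise its definition with 'declare -f' and feed it to a bash process
there. The sketch below assumes this technique; 'run_function_sketch' and
'count_entries' are hypothetical names and do not reproduce the program's
actual 'run_function'. The pseudo-host "local" is also an assumption, used
here so that the example runs without an ssh connection.

```shell
#!/usr/bin/env bash
# Sketch only: run_function_sketch is a hypothetical stand-in, not the
# program's actual 'run_function'.
set -eu

# run_function_sketch HOST FUNC [ARGS...]
# Sends the definition of FUNC plus a quoted call to a fresh bash, either
# locally (HOST = "local") or through ssh on HOST.
run_function_sketch() {
    local host=$1 func=$2
    shift 2

    # Build the call with shell-quoted arguments.
    local call=$func arg
    for arg in "$@"; do
        call+=" $(printf '%q' "$arg")"
    done

    # Function definition, then the call, as one script.
    local script
    script="$(declare -f "$func")"$'\n'"$call"

    if [ "$host" = "local" ]; then
        bash -s <<<"$script"
    else
        ssh "$host" bash -s <<<"$script"
    fi
}

# Example function. It only sees its arguments: the receiving shell does
# not share variables or other functions with the caller.
count_entries() {
    ls -A "$1" | wc -l
}

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
touch "$demo_dir/one" "$demo_dir/two"

entries=$(run_function_sketch local count_entries "$demo_dir")
echo "entries: $entries"
```

Because only the named function is transferred, any helper functions or
variables it relies on have to be sent along explicitly.
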
Methods to be executed either on the source or the destination are located
in the source files 'source' and 'destination' and are annotated with their
environment dependencies ('requires'). See also the 'remote' source file.


# -- APPENDIX -- #

# NOTE-1: The old mirror.
As the old mirror is a remnant of the last backup, it features exactly the
same file tree as that last backup, by file names though, not necessarily
by content. This is because the old mirror is (purposefully) linked to the
live file tree, in which files may be modified in place; due to the hard
links, such modifications also appear in the old mirror.

# NOTE-2: Avoid propagation of errors on the old source mirror.
The simple way to do the transfer from the source to the destination tree
may seem to be to use the old latest backup on the destination as part of
the target when referencing the old mirror to it. I.e.,

    sync (old_mirror, new_mirror) -> (old_latest_backup, new_backup).

This is, however, not a good idea: as explained in NOTE-1, old_mirror may
change, and hence the above operation would change old_latest_backup. Since
this is not desired, old_latest_backup is instead only used as link-dest
for rsync, which unfortunately causes a superfluous instance of
old_latest_backup to be created in the destination tree. This instance is
simply deleted in the end.

# NOTE-3: Imperfection.
The chosen solution is far from perfect. In particular, the backups are far
from being atomic. Apparently, Btrfs makes this easier [0]. Another
alternative might be to use OverlayFS to record new writes.
Further, in order to at least get closer to atomicity, a full copy of the
source tree could be made on the source file system before the actual
transfer. This would, however, require a lot of space and would likely only
be beneficial if the transfer is non-local.
Also, the extraneously created copy of the source's old mirror on the
destination is disturbing. I do not, however, know of a way to avoid this
without giving up the rsync(1) command line program as backend.

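The effect described in NOTE-1 can be reproduced with plain hard links. The
following is a minimal sketch with made-up file names, not part of the
program: an in-place modification of a live file shows up in the old
mirror, whereas a replacement via a temporary file and rename does not.

```shell
#!/bin/sh
# Sketch only: directory and file names are made up for illustration.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/live" "$work/old_mirror"

echo "version 1" > "$work/live/inplace.txt"
echo "version 1" > "$work/live/replaced.txt"

# The old mirror consists of hard links to the live files.
ln "$work/live/inplace.txt"  "$work/old_mirror/inplace.txt"
ln "$work/live/replaced.txt" "$work/old_mirror/replaced.txt"

# In-place modification: the inode is kept, so the old mirror changes too.
echo "version 2" > "$work/live/inplace.txt"

# Replacement via temporary file and rename: the live name now points to a
# new inode, while the old mirror keeps the previous content.
echo "version 2" > "$work/live/replaced.txt.tmp"
mv "$work/live/replaced.txt.tmp" "$work/live/replaced.txt"

inplace_in_mirror=$(cat "$work/old_mirror/inplace.txt")
replaced_in_mirror=$(cat "$work/old_mirror/replaced.txt")
echo "old mirror, in-place file: $inplace_in_mirror"
echo "old mirror, replaced file: $replaced_in_mirror"
```
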

# Links
[0] https://btrfs.wiki.kernel.org/index.php/Incremental_Backup

vi: ai et tw=77 fo+=t