This is intended as short, rather high-level documentation.


# -- NOTATIONS -- #

  The following notation is used throughout.

- file tree: A directory tree, possibly with excluded subtrees, that resides
             on exactly one file system.  The file tree does not need to
             share the root of that file system.



# -- GENERAL -- #

  This backup program makes incremental backups of a file tree, using hard
links between differently dated backups of that file tree to reduce space
requirements while keeping the data in a simple form.

  The destination file system usually differs from the one the source file
tree resides on; in particular, it may be on a different host.  The rsync(1)
command line program serves as backend.  Therefore, at least one of source
and destination needs to be local.

  The backup process is organized such that a pure file rename between the
old and the new backup directory does not cause a new file to be created;
instead, the two files are hard linked.  This is in contrast to the simpler
method of merely exploiting rsync's '--link-dest' functionality, which the
early versions of this program used.

  This is achieved by

 1. Creating a local mirror of the source tree inside itself
    (non-recursively, of course), using hard links.  In normal backup
    operation, there are always at least two such mirrors: an obsolete one,
    a remnant of the last backup's creation, which has the same structure
    as the last backup stored in the destination file tree, and the newly
    created mirror.  Since both are hard linked to the live file tree, the
    two mirrors share hard links by transitivity.  See also APPENDIX: NOTE-1.

 2. Copying the pair of these two mirrors to the destination tree, thereby
    referencing the old mirror to the (identical!) old mirror in the
    destination tree.  Since the two mirrors are inter-linked, and the
    referencing allows for linking the old mirror to the (old) latest backup,
    the new mirror (and very latest backup) is effectively linked to the
    old latest backup.  See also APPENDIX: NOTE-2.
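
  Step 1 and its effect on renames can be tried out with the following
minimal sketch.  It is not the program's actual code: the helper name
'make_mirror', the '.mirrors' directory and the file names are assumptions
made for illustration, and GNU cp (for '-l' and '-t') is assumed.

```shell
# Hypothetical helper: hard-link the top-level entries of SRC into
# SRC/MIRRORS/NAME, skipping the mirrors directory itself so that the
# mirror does not contain older mirrors (the "non-recursive" part).
make_mirror() {
    local src="$1" mirrors="$2" name="$3"
    mkdir -p "$src/$mirrors/$name" || return
    ( cd "$src" &&
      find . -mindepth 1 -maxdepth 1 ! -name "$mirrors" \
           -exec cp -al -t "$mirrors/$name" {} + )
}

# Demonstration in a scratch directory.
demo=$(mktemp -d)
mkdir "$demo/docs"
echo data > "$demo/docs/report.txt"

make_mirror "$demo" .mirrors old                    # remnant of the last backup
mv "$demo/docs/report.txt" "$demo/docs/final.txt"   # pure rename, live tree
make_mirror "$demo" .mirrors new                    # mirror for the new backup

# Both mirror entries share the inode of the live file, hence each other's.
if [ "$demo/.mirrors/old/docs/report.txt" -ef \
     "$demo/.mirrors/new/docs/final.txt" ]; then
    echo "rename preserved as hard link"
fi
```

  Because the renamed file in the new mirror and the old name in the old
mirror are the same inode, a transfer that preserves hard links can carry
this relation over to the destination.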


  The chosen procedure has some side effects.

 * Backups are available on the source host (however, they likely diverge
   from the actual backups, as the former are linked to the live file tree
   and hence change upon in-place modification).  See also APPENDIX: NOTE-1.
  `- Note that, currently, backups (resp.  mirrors) are deleted neither in
     the source nor in the destination tree.  Deletion might be more
     desirable in the source tree than in the destination tree.

 * The local mirror creation is much faster than the transfer to the backup
   medium, in particular if the latter is remote.  Therefore, the state,
   excluding the aforementioned in-place modifications, is closer to being
   consistent with a single point in time.  One might argue, though, that
   precisely these in-place modifications in fact widen the window of
   modifications to span both the mirroring and the transfer.


  There are also some clear (but probably not grave) drawbacks compared to
the simpler method.

 * There is one more step involved, thereby adding complexity.

 * The source and the destination file tree each need to reside on a single
   file system.  As a consequence, this program needs to be configured
   separately for each file system containing data to be backed up.  This
   could in theory be handled by the program itself, but does not seem worth
   the effort.

 * A little additional space is needed on the source file system, however
   only for new file links and directories, not for new actual regular files.
  `- In particular though, the source file system needs to be writable.  This
     problem may possibly be circumvented using bind mounts.



# -- REMOTE METHOD CALLS -- #

  This program essentially wraps rsync, but performs a few additional useful
functions, some of which need to be executed on the source or destination
host, irrespective of whether that host is local or not.  To allow for this,
there are wrapper methods, namely 'run_function' and 'run_command', which
allow a local bash function to be executed on a remote host.  Methods to be
executed on either the source or the destination are located in the source
files 'source' and 'destination' and are annotated with their environment
dependencies ('requires').  See also the 'remote' source file.
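
  One common way to run a locally defined bash function in another shell is
to send its definition, followed by a quoted call, to that shell's standard
input.  The sketch below illustrates that general technique only; it is not
the code from the 'remote' source file, and 'run_function_sketch',
'count_args' and the host name in the comment are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the idea behind 'run_function'; the actual
# implementation may work differently.
# TARGET is a command that reads a script on stdin, e.g. "bash -s" for the
# local host or "ssh backup-host bash -s" for a remote one (assumed name).
run_function_sketch() {
    local target="$1" fn="$2"
    shift 2
    {
        declare -f "$fn"            # emit the function's definition
        printf '%q ' "$fn" "$@"     # emit a safely quoted call
        printf '\n'
    } | $target                     # unquoted on purpose: split into words
}

# Example function to ship; it reports how many arguments it received.
count_args() { echo "$# arguments, first: $1"; }

run_function_sketch "bash -s" count_args "a b" c
```

  Using "bash -s" as the target keeps the example runnable without a remote
host.  Functions that call other functions would need those definitions
shipped as well, which is presumably what the 'requires' annotations are for.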



# -- APPENDIX -- #

# NOTE-1: The old mirror.
    As the old mirror is a remnant of the last backup, it features exactly
  the same file tree as that last backup - by names, not by content.  The
  content may differ because the old mirror is (purposefully) linked to the
  live file tree, in which files may be modified in place; due to the hard
  links, such modifications also appear in the old mirror.


# NOTE-2: Avoid propagation of errors on the old source mirror.
    The simple way to do the transfer from the source to the destination tree
  might seem to be to use the old latest backup on the destination as part of
  the target when referencing the old mirror to it.

    I.e., sync (old_mirror, new_mirror) -> (old_latest_backup, new_backup).
  This is, however, not a good idea: as explained in NOTE-1, old_mirror may
  change, and hence the above operation would change old_latest_backup.
  Since this is not desired, old_latest_backup is instead only used as
  link-dest for rsync, which unfortunately causes a superfluous second
  instance of old_latest_backup (a copy of old_mirror) to be created in the
  destination tree.  This copy is simply deleted in the end.
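
    The sketch below prints, rather than runs, one possible command sequence
  for such a transfer.  The directory layout, the staging directory and the
  name 'print_transfer' are assumptions for illustration; only the rsync
  options '-a', '-H' and '--link-dest' are taken from rsync(1), and the
  program's actual invocation may differ.

```shell
# Hypothetical sketch: print the commands for a transfer in the spirit of
# NOTE-2.  MIRRORS holds the two source mirrors OLD and NEW; BACKUPS holds
# the existing backups, including one named OLD.
print_transfer() {
    local mirrors="$1" backups="$2" old="$3" new="$4"
    local staging="$backups/.staging"
    # -a: archive mode; -H: preserve the hard links between both mirrors;
    # --link-dest: hard-link unchanged files to the existing backups.
    echo rsync -aH --link-dest="$backups" \
         "$mirrors/$old" "$mirrors/$new" "$staging/"
    # Keep the new backup, drop the superfluous copy of the old mirror.
    echo mv "$staging/$new" "$backups/$new"
    echo rm -rf "$staging/$old"
}

print_transfer /data/.mirrors /backup 2020-01-01 2020-01-02
```

    Here old_latest_backup is never a transfer target, so in-place changes
  in old_mirror can at worst end up in the staging copy that is deleted
  afterwards.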


# NOTE-3: Imperfection.
    The chosen solution is far from perfect.  In particular, the backups are
  far from being atomic.  Apparently, Btrfs makes this easier [0].  Another
  alternative might be to use OverlayFS to record new writes.

    Further, in order to at least get closer to atomicity, a full copy of the
  source tree, made on the source host, could be created before the actual
  transfer.  This would, however, require a lot of space and would likely
  only be beneficial if the transfer is non-local.

    Also, the extraneously created copy of the source's old mirror on the
  destination is disturbing.  I do not, however, know of a way to avoid
  this short of not relying on the rsync(1) command line program as backend.


# Links
[0] https://btrfs.wiki.kernel.org/index.php/Incremental_Backup


vi: ai et tw=77 fo+=t