Resilience Concept

Lindenberg Software Backup can be set up for resilience in order to improve the availability of the backup service. As a consequence of the CAP theorem, it is impossible to guarantee that all servers hold exactly the same data at all times in the presence of network outages (partitions). It is therefore not a goal to ensure that all backup servers share the same history of backups. Doing so would require either having the client send data to all servers or replicating backup data between servers. Because of the delays incurred, replication would require complex logic to maintain backup trees instead of chains, plus additional disk space up to the combined size of all chains. Moreover, during a network partition, consistency would be lost anyway. As a result, if a backup server or its network connection fails, any delta backup stored on that server is not accessible. Note that actual data loss only happens if both the original and the backups are lost, which is not very likely. The impact can also be reduced by scheduling frequent backups. For restore or other read operations, Janus may evolve to provide a consolidated view across backup servers.

Resilience requires two (or more) backup servers plus a "Janus" service, named after the two-faced god Janus, configured as shown below:
[Figure: Resilience Setup]
Ideally, one backup server and one Janus server each are set up at two (or more) different locations - with different power supplies and different network connections. If only one network connection is shared between the backup servers, then setting up two Janus servers does not improve resilience with respect to network outages, but configuration of multiple backup servers also helps to protect against disk crashes or downtimes due to system maintenance.

The following diagram illustrates how Janus can alternate between backup servers while avoiding backup servers that are not available.
[Figure: Resilience Sequence]
When starting a backup, and as long as all backup servers are available, Janus will always refer to the backup server that has gone the longest time without being used for a backup. For any operation other than starting a backup, Janus will refer to the most recently used server. In both cases, a backup server that is not reachable is skipped. Moreover, as described in Network Aspects and Wake-On-Lan, the client will first try the URL for LAN and then the URL for WAN, which also ensures that two Janus servers can be used.
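
The selection rule above can be sketched in a few lines of Python. This is an illustration only, not the actual Janus implementation: the BackupServer type, the pick_server function, the example URLs, and the idea of tracking a single "last backup start" timestamp per server are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class BackupServer:
    url: str                  # hypothetical address of the backup server
    last_backup_start: float  # assumed: timestamp of the last backup started on it


def pick_server(
    servers: Iterable[BackupServer],
    is_available: Callable[[BackupServer], bool],
    starting_backup: bool,
) -> Optional[BackupServer]:
    """Return the server the client would be referred to, or None if none is up."""
    # Starting a backup: least recently used first.
    # Any other operation: most recently used first.
    ordered = sorted(
        servers,
        key=lambda s: s.last_backup_start,
        reverse=not starting_backup,
    )
    # Skip every server that fails the availability check.
    for server in ordered:
        if is_available(server):
            return server
    return None


servers = [
    BackupServer("https://backup-a.example:8443", last_backup_start=100.0),
    BackupServer("https://backup-b.example:8443", last_backup_start=200.0),
]
everything_up = lambda s: True
only_b_up = lambda s: s.url.startswith("https://backup-b")

print(pick_server(servers, everything_up, starting_backup=True).url)   # backup-a
print(pick_server(servers, everything_up, starting_backup=False).url)  # backup-b
print(pick_server(servers, only_b_up, starting_backup=True).url)       # backup-b
```

The third call shows the fallback: backup-a would be next in line for a new backup, but since it fails the availability check, backup-b is used instead.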

Notes:
  • If connectivity between the servers is broken, Janus may base its decision on outdated information. The availability check is therefore essential, as otherwise a server might be selected that is not available. However, because of the availability check, Wake-On-Lan will in general be too slow to be used. Instead, you can use scheduled wake-up times (e.g. using Lights-Out).
  • The client remembers hash codes and USN information for all backup servers and all client disks and volumes involved. In particular, the hashes require roughly 1% (1/128) of the total disk capacity per server.
  • Backup server and Janus server can coexist on the same system, in fact in the same process.
  • Janus servers are not included with the standard Lindenberg Software Backup license.
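
To get a feeling for the client-side hash storage mentioned in the notes, the 1/128 ratio can be turned into a rough estimate. The function name and the example disk size are made up for illustration; only the ratio and the "per server" scaling come from the text.

```python
def client_hash_storage_bytes(total_disk_bytes: int, server_count: int) -> float:
    """Estimate client-side hash storage: 1/128 of disk capacity per server."""
    return total_disk_bytes / 128 * server_count


TIB = 1024 ** 4
GIB = 1024 ** 3

# Example: 2 TiB of client disks backed up to two backup servers.
estimate = client_hash_storage_bytes(2 * TIB, server_count=2)
print(f"{estimate / GIB:.0f} GiB")  # 16 GiB per server, 32 GiB in total
```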

Resilience Setup

The following components work together in that setup: Configuration is as described below. Note that the settings "Image Path", "Prefer VHDX", and "enforce separate folders by user" must be identical on all servers involved, whereas network ports may differ. A network share cannot be used as Image Path in a resilience setup. Be sure to assign consistent directory authorizations, preferably using a group stored in Active Directory.
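
A quick way to catch inconsistent settings is to compare them side by side before going live. The sketch below assumes you have collected the settings of each server into a plain dictionary by hand; the product does not export them in this form as far as this page describes, and the server names, paths, and "port" key are invented for the example.

```python
# Settings that must be identical on all servers (names taken from the text above).
MUST_MATCH = ("Image Path", "Prefer VHDX", "enforce separate folders by user")


def find_mismatches(settings_by_server: dict) -> dict:
    """Return each must-match setting that has more than one distinct value."""
    mismatches = {}
    for name in MUST_MATCH:
        values = {settings.get(name) for settings in settings_by_server.values()}
        if len(values) > 1:
            mismatches[name] = values
    return mismatches


# Hypothetical, manually collected settings for two backup servers.
example = {
    "backup-a": {"Image Path": r"D:\Backups", "Prefer VHDX": True,
                 "enforce separate folders by user": True, "port": 8443},
    "backup-b": {"Image Path": r"E:\Backups", "Prefer VHDX": True,
                 "enforce separate folders by user": True, "port": 9443},
}
print(find_mismatches(example))
```

Here only "Image Path" is reported. The differing ports are ignored on purpose, since network ports may differ between servers.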

Configure Janus Servers

  1. Install Lindenberg Software Backup. If you are adding Janus to an existing backup server, turn off the service temporarily.
  2. Janus servers need the following settings, which at present are only available via the registry:
    • ServerUrls (should have local backup server first)
    • JanusPort (https), different from server port
  3. Turn on Backup Service. This will also enable Windows Remote Management and Windows Event Collector on the Janus server. Windows Event Viewer will now show a Subscription "Lindenberg Software Backup Admin", likely without any computers yet.
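
Before writing the two values to the registry, it can help to sanity-check them against the constraints listed in step 2. The sketch below is only a planning aid: it assumes ServerUrls can be treated as an ordered list of URLs (the actual registry value format is not described here), and the function name, host names, and ports are hypothetical.

```python
from urllib.parse import urlsplit


def check_janus_settings(server_urls, janus_port, local_host, server_port):
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []
    if not server_urls:
        problems.append("ServerUrls is empty")
    elif (urlsplit(server_urls[0]).hostname or "").lower() != local_host.lower():
        problems.append("ServerUrls should list the local backup server first")
    if janus_port == server_port:
        problems.append("JanusPort must differ from the server port")
    return problems


# Hypothetical values for a Janus server running on backup-a.example.
planned_urls = ["https://backup-a.example:8443", "https://backup-b.example:8443"]
print(check_janus_settings(planned_urls, janus_port=8444,
                           local_host="backup-a.example", server_port=8443))  # []
```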

Configure Backup Servers

  1. Install and configure as usual.
  2. Configure source-initiated event collection as described in Setting up a source-initiated subscription where the event sources are in the same domain as the event collector computer. Note that you can run the "Group Policy Management" tool on any domain member with Remote Server Administration Tools installed. If you use a group policy, you may want to restrict this policy to backup servers only (see e.g. How to apply a Group Policy Object to individual users or computer). If you use a local policy instead of a group policy, you will have to repeat this step on all backup servers.
    For Subscription Manager add all your (planned) Janus servers using the following format: Server=http://janus.samba.lindenberg.one:5985/wsman/SubscriptionManager/WEC,Refresh=60.
  3. Don't forget to run winrm qc -q and gpupdate /force on all backup servers.
Using Windows Event Viewer on the Janus servers, you may want to check that the backup servers have subscribed. If your Janus server is running on a Server Core installation, use wecutil gr "Lindenberg Software Backup Admin" on the command line instead.
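
With several Janus servers, the Subscription Manager entries from step 2 all follow the same pattern, so they can be generated rather than typed. The helper below is a convenience sketch, not part of the product. The second host name is a placeholder, and the port 5985 and Refresh=60 are simply copied from the example format above.

```python
def subscription_manager_entry(janus_host: str, refresh_seconds: int = 60) -> str:
    """Format one Subscription Manager entry for a Janus server."""
    return (
        f"Server=http://{janus_host}:5985/wsman/SubscriptionManager/WEC,"
        f"Refresh={refresh_seconds}"
    )


# One entry per (planned) Janus server; the second host is a placeholder.
for host in ["janus.samba.lindenberg.one", "janus2.example"]:
    print(subscription_manager_entry(host))
```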

Initial Backup

The initial backup needs special consideration. Without manual intervention, the client would in fact perform two (or more) initial backups, one per backup server involved. Depending on network speed, this may not be desirable. Instead, the initial backup (or the most recent backup, if resilience is set up at a later point in time) and the hashes can be replicated (and merged if necessary) to the new backup server. To minimize the delay, it is possible to replicate just the hashes and create a "stub" virtual disk (as with the local backup procedure), and then replicate the remainder of the backup later. TODO: tool support?