Resilience Concept
Lindenberg Software Backup can be set up for resilience in order to improve the availability of the backup service. As a consequence of the CAP theorem, it is impossible to guarantee that all servers hold exactly the same data at all times in the presence of network outages (partitions). Thus it is not the goal to ensure that all backup servers share the same history of backups. This would require either having the client send data to all servers, or replicating backup data between servers, which, due to the delays incurred, would require complex logic to maintain backup trees instead of chains, as well as additional disk space up to the combined size of all chains. Moreover, in case of network partitions, consistency would be lost anyway.
Therefore, if a backup server or the corresponding network connection fails, any delta backup stored on that server is not accessible. Note that actual data loss only happens if both the original and the backups are lost, which is not very likely. The impact can also be reduced by scheduling frequent backups. For restore or other read operations, Janus may evolve into providing a consolidated view across backup servers.
Resilience requires two (or more) backup servers plus a "Janus" service, named after the two-faced god Janus, configured as shown below:
Ideally, one backup server and one Janus server each are set up at two (or more) different locations - with different power supplies and different network connections. If only one network connection is shared between the backup servers, then setting up two Janus servers does not improve resilience with respect to network outages, but configuration of multiple backup servers also helps to protect against disk crashes or downtimes due to system maintenance.
The following diagram illustrates how Janus can alternate between backup servers while avoiding backup servers that are not available.
When starting a backup, and as long as all backup servers are available, Janus always refers the client to the backup server that has not been used for a backup for the longest time. For any operation other than starting a backup, Janus refers to the most recently used server. In both cases, a backup server that is not reachable is skipped. Moreover, as described in Network Aspects and Wake-On-Lan, the client first tries the URL for LAN, then the URL for WAN, which also ensures that two Janus servers can be used.
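The selection rule can be illustrated with a minimal sketch. It is not the actual Janus implementation: the `BackupServer` type, the single `last_backup` timestamp, the `is_reachable` callback, and the example URLs are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class BackupServer:
    url: str            # hypothetical server address
    last_backup: float  # time this server was last used for a backup (0.0 = never)


def select_server(
    servers: Iterable[BackupServer],
    starting_backup: bool,
    is_reachable: Callable[[BackupServer], bool],
) -> Optional[BackupServer]:
    """Pick a server by last use, skipping any server that is not reachable."""
    # Starting a backup: longest-unused server first.
    # Any other operation: most recently used server first.
    ordered = sorted(servers, key=lambda s: s.last_backup, reverse=not starting_backup)
    for server in ordered:
        if is_reachable(server):
            return server
    return None  # no backup server is currently available


servers = [
    BackupServer("https://backup-a.example", last_backup=100.0),
    BackupServer("https://backup-b.example", last_backup=200.0),
]
all_up = lambda s: True
a_down = lambda s: s.url != "https://backup-a.example"

next_backup = select_server(servers, True, all_up)   # backup-a: unused the longest
read_target = select_server(servers, False, all_up)  # backup-b: most recently used
fallback = select_server(servers, True, a_down)      # backup-b: backup-a is skipped
```

With two servers that are both reachable, consecutive backups alternate between them, because each backup makes the chosen server the most recently used one.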
- If connectivity between the servers is broken, the decision made by Janus can be based on outdated information. The availability check is therefore essential, as otherwise a server might be selected that is not available. However, because of the availability check, Wake-On-Lan will in general be too slow to be used. Instead you can use scheduled wake-up times (e.g. using Lights-Out).
- The client remembers hash codes and USN information for all backup servers and all client disks and volumes involved. In particular, the hashes require roughly 1% (1/128) of the total disk capacity per server.
- Backup server and Janus server can coexist on the same system, in fact in the same process.
- Janus servers are not included with the standard Lindenberg Software Backup license.
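To get a feeling for the hash storage mentioned in the notes above, the following sketch estimates the client-side space from the 1/128 ratio. It assumes that "total disk capacity" means the capacity of the client disks being backed up; the function name and the example sizes are made up for illustration.

```python
HASH_FRACTION = 1 / 128  # per-server hash overhead stated above (about 0.78 %)


def client_hash_storage_bytes(disk_capacity_bytes: int, server_count: int) -> float:
    """Estimate the client-side space needed for hashes across all backup servers."""
    return disk_capacity_bytes * HASH_FRACTION * server_count


TIB = 1024 ** 4
GIB = 1024 ** 3

# Example: 2 TiB of client disks backed up to two backup servers.
estimate = client_hash_storage_bytes(2 * TIB, server_count=2)
print(f"{estimate / GIB:.0f} GiB")  # prints "32 GiB"
```

Under these assumptions each additional backup server adds another 16 GiB of hash data for this client.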
Resilience Setup
The following components work together in that setup:
- DNS shall be set up in a way that hides any differences between LAN and WAN names (split-horizon, split-DNS, split-brain, you name it). DNSSEC options are discussed in draft-krishnaswamy-dnsop-dnssec-split-view-04.
- A network connection is required between all servers. In case of two locations this shall be implemented with a virtual private network (VPN).
- All network interfaces on the servers need to be configured to domain or private (see "Network Location Awareness (NLA) and how it relates to Windows Firewall Profiles" for background information and "How to force a network type in Windows using PowerShell" for instructions). This is required to support Active Directory and Windows Event Collector communication below. Note that Hyper-V virtual switches are likely to require manual intervention.
- An Active Directory (can be Samba of course) is highly recommended for user authentication, optionally for authorization, and for configuration via group policy. All servers must be domain members. Local users are discouraged, as keeping the passwords in sync is then a nightmare. Microsoft accounts are possible, but authentication is significantly slower than with a domain.
- Windows Event Collector needs to be configured to collect events from all backup servers to all Janus servers. This is detailed in the sections below.
Configure Janus Servers
- Install Lindenberg Software Backup. If you are adding Janus to an existing backup server, turn off the service temporarily.
- Janus servers need the following settings which at present are only available via the registry:
- ServerUrls (should have local backup server first)
- JanusPort (https), different from server port
- Turn on Backup Service. This will also enable Windows Remote Management and Windows Event Collector on the Janus server. Windows Event Viewer will now show a Subscription "Lindenberg Software Backup Admin", likely without any computers yet.
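The two constraints on the Janus settings can be checked before turning the service on. The sketch below is only an illustration: it assumes the values are available as a list of URLs and an integer port, and it does not read the registry, since the key location and value format are not given here. The function name and example URLs are hypothetical.

```python
from typing import List
from urllib.parse import urlsplit

DEFAULT_PORTS = {"https": 443, "http": 80}


def check_janus_settings(server_urls: List[str], janus_port: int) -> List[str]:
    """Return a list of problems found in the (hypothetical) settings values."""
    problems = []
    if len(server_urls) < 2:
        problems.append("resilience requires two or more backup servers")
    if server_urls:
        # The first entry is expected to be the local backup server.
        local = urlsplit(server_urls[0])
        local_port = local.port or DEFAULT_PORTS.get(local.scheme)
        if local_port == janus_port:
            problems.append("JanusPort must differ from the local backup server port")
    return problems


ok = check_janus_settings(
    ["https://backup-a.example:8443", "https://backup-b.example:8443"], janus_port=9443
)
clash = check_janus_settings(
    ["https://backup-a.example:8443", "https://backup-b.example:8443"], janus_port=8443
)
```

Here `ok` is an empty list, while `clash` reports the port conflict.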
Configure Backup Servers
- Install and configure as usual.
- Configure source-initiated event collection as described in "Setting up a source-initiated subscription where the event sources are in the same domain as the event collector computer". Note that you can run the "Group Policy Management" tool on any domain member with Remote Server Administration Tools installed. If you use a group policy, then you may want to restrict this policy to backup servers only (see e.g. "How to apply a Group Policy Object to individual users or computer"). If you don't use a group policy but a local policy, then you will have to repeat this step on all backup servers.
For Subscription Manager add all your (planned) Janus servers using the following format:
- Don't forget to run
winrm qc -q
and
gpupdate /force
on all backup servers.
- To check the runtime status of the subscription, you can run
wecutil gr "Lindenberg Software Backup Admin"
on the command line instead of using Windows Event Viewer.
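When several Janus servers are planned, the Subscription Manager entries can be generated rather than typed by hand. The sketch below assumes the entry form given in Microsoft's documentation for source-initiated subscriptions over HTTP (`Server=http://<FQDN of the collector>:5985/wsman/SubscriptionManager/WEC,Refresh=60`); whether this exact form and refresh interval suit your setup is an assumption, and the host names are placeholders.

```python
from typing import List


def subscription_manager_entries(janus_hosts: List[str], refresh_seconds: int = 60) -> List[str]:
    """Build one Subscription Manager entry per Janus server (event collector)."""
    return [
        # Assumed form, taken from Microsoft's source-initiated subscription guide.
        f"Server=http://{host}:5985/wsman/SubscriptionManager/WEC,Refresh={refresh_seconds}"
        for host in janus_hosts
    ]


# Hypothetical fully qualified names of the planned Janus servers.
entries = subscription_manager_entries(["janus1.example.com", "janus2.example.com"])
for entry in entries:
    print(entry)
```

Each printed line is one value to add to the Subscription Manager policy setting.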