We live in a hyper-connected world with expectations of 100% uptime. We’ve long passed the point where any amount of network downtime is acceptable, and even a short outage can have catastrophic knock-on effects.
An outage can lead to lost revenue, increased security and fraud risks, halted business operations, and the dreaded loss of access to your data. The damage to your brand and to customer satisfaction during such a disaster could be irreparable, and you might end up watching some of your precious customers stroll over to your closest competitor. Let’s face it, nobody wants to experience that!
Single points of failure are the downfall of any infrastructure, and designing and operating a system with high availability is the key to avoiding downtime.
What is high availability?
High Availability (HA) is a characteristic of a system that aims to ensure an agreed level of operational performance, usually uptime, for a higher-than-normal period.
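An agreed availability level is usually expressed as a percentage of time, and that percentage translates directly into a downtime budget. The minimal Python sketch below does the arithmetic. It assumes a 365-day year, and the function name `downtime_minutes_per_year` is purely illustrative.

```python
# Translate an availability target into a yearly downtime budget.
# Assumes a 365-day year (leap years are ignored).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Return the maximum downtime per year allowed by an availability percentage."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(target)
        print(f"{target}% uptime allows {minutes:.1f} minutes of downtime per year")
```

For example, 99.9% allows about 525.6 minutes (roughly 8.8 hours) of downtime per year, while 99.999% allows only about 5.3 minutes.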
The first thing that comes to mind when designing HA infrastructure is adding redundancy. Redundancy is the duplication of the critical components of a system to increase reliability and prevent loss of function.
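A rough model shows why duplicating a component helps. Suppose each copy is available a fraction A of the time and the copies fail independently. The system is then down only when all n copies are down at once, so its availability is 1 - (1 - A)^n. The Python sketch below assumes independent failures and instantaneous failover. Real systems only approximate this, because shared power or a shared network can take both copies down together.

```python
def parallel_availability(component_availability: float, replicas: int) -> float:
    """Availability of `replicas` redundant copies of one component.

    The system is down only when every copy is down at the same time.
    Assumes failures are independent and failover is instantaneous.
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("component_availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    prob_all_down = (1 - component_availability) ** replicas
    return 1 - prob_all_down


if __name__ == "__main__":
    single = parallel_availability(0.99, 1)
    pair = parallel_availability(0.99, 2)
    print(f"one server: {single:.4%}, two servers: {pair:.4%}")
```

Under these assumptions, two copies that are each 99% available give about 99.99% combined availability.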
These are the components that need to be taken into consideration when designing HA infrastructure:
- Hardware: highly available servers should be resilient to power outages and hardware failures, including failures of hard disks and network interfaces. Redundant servers should, at a minimum, be located in different racks or halls within a data center.
- Software: the whole software stack, including the operating system and the application itself, must be able to handle unexpected failures, for instance one that requires a system restart.
- Data: data loss and inconsistency have several possible causes, not just hard disk failures. Highly available systems must keep data safe in the event of a failure.
- Network: unplanned network outages are another possible point of failure for highly available systems, so a redundant network strategy must be in place.
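A common way to prepare software for unexpected failures is to restart a crashed component automatically. Backing off between attempts keeps a persistent fault from causing a tight restart loop. The Python sketch below is a hypothetical, in-process illustration of that idea; `run_with_restarts`, `max_restarts` and `base_delay` are illustrative names. In production this job is usually delegated to a process supervisor such as systemd.

```python
import logging
import time
from typing import Callable

log = logging.getLogger("supervisor")


def run_with_restarts(task: Callable[[], None], max_restarts: int = 3,
                      base_delay: float = 1.0) -> int:
    """Run `task`, restarting it after a crash with exponential backoff.

    Returns how many restarts were needed before `task` completed.
    Re-raises the last exception once `max_restarts` is exhausted.
    """
    restarts = 0
    while True:
        try:
            task()
            return restarts
        except Exception:
            if restarts >= max_restarts:
                raise  # give up and let a higher layer (or an operator) react
            delay = base_delay * (2 ** restarts)  # 1s, 2s, 4s, ... with the defaults
            log.warning("task failed; restarting in %.1fs", delay, exc_info=True)
            time.sleep(delay)
            restarts += 1
```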
What might a basic HA infrastructure look like?
Let’s assume we have a domain (website) that runs on one server, and the domain’s DNS points to that server’s IP address. If this server fails, the website goes offline. In this case, the server is the single point of failure.
How can we implement High Availability infrastructure to ensure that the website stays online?
Continuing with the example above, we could add a second, identical server to the infrastructure, preferably in another rack, and set up the two servers in active-passive mode. This gives us redundancy at the hardware layer.
With this additional server, our website is hosted on two servers, but that does not mean traffic will start flowing to the passive server when the active one goes down. The domain’s DNS still points to the IP address of the active server. Pointing the domain at the passive server requires a DNS change. Because DNS records are cached, such a change can take a considerable amount of time to propagate across the Internet. The solution we recommend is Floating IP addresses, or ‘Floating IPs’.
What are Floating IPs?
Floating IPs are routable IP addresses that are not tied to a single server and can be redirected to another server’s IP address. Instead of using a server’s IP address in your domain’s A records, you can use a Floating IP address. A Floating IP can be redirected whenever needed without touching DNS, so it eliminates the propagation and caching issues inherent in DNS changes. Here’s a diagram showing how Floating IPs can fit into an HA system.
In the example shown above, we have two identical servers in active-passive mode hosting the same website. The website’s DNS points to a Floating IP, which directs traffic to the active server. If the active server fails, a single change to the Floating IP redirects it to the passive server’s IP, so traffic flows to the passive server.
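The failover decision described above can be sketched as a periodic health check. In this minimal Python sketch, a server counts as healthy if it accepts a TCP connection on a given port. The server addresses and the `assign_floating_ip` callback are hypothetical placeholders, because a Floating IP is reassigned through your provider’s own API or control panel.

```python
import socket
import time
from typing import Callable, Tuple


def is_healthy(host: str, port: int = 80, timeout: float = 2.0) -> bool:
    """Treat a server as healthy if it accepts a TCP connection on `port`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def failover_step(active: str, passive: str,
                  check: Callable[[str], bool],
                  assign_floating_ip: Callable[[str], None]) -> Tuple[str, str]:
    """Run one health check and move the Floating IP if needed.

    Returns the (active, passive) pair after any switch.
    """
    if check(active):
        return active, passive      # active server is fine: nothing to do
    if not check(passive):
        return active, passive      # both down: moving the IP would not help
    assign_floating_ip(passive)     # hypothetical call to your provider's API
    return passive, active          # roles swap: the old active becomes passive


def monitor(active: str, passive: str,
            assign_floating_ip: Callable[[str], None],
            interval: float = 5.0) -> None:
    """Check the active server every `interval` seconds, forever."""
    while True:
        active, passive = failover_step(active, passive, is_healthy, assign_floating_ip)
        time.sleep(interval)
```

The sketch checks the passive server before moving the IP, so traffic is never redirected to a server that is also down.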
Last, but not least, the data on both servers must be kept synchronized to ensure a smooth failover. This synchronization runs over a private network set up between the two servers.
Designing and implementing High Availability systems might seem complex. However, high availability is an essential feature of modern systems, where businesses, employees, and customers all expect uninterrupted service and 100% uptime.
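For files, one common approach is rsync over SSH. The Python sketch below assembles an rsync command that mirrors a directory from the active server to the passive server’s private IP. The user `deploy`, the paths and the address `10.0.0.2` are hypothetical. Databases usually need their own replication mechanism rather than file copying.

```python
import subprocess
from typing import List


def build_rsync_command(src_dir: str, peer_private_ip: str, dest_dir: str,
                        user: str = "deploy") -> List[str]:
    """Build an rsync command that mirrors src_dir onto the peer over SSH."""
    return [
        "rsync",
        "-az",        # -a: archive mode (recursive, keeps permissions/times/links); -z: compress
        "--delete",   # remove files on the peer that no longer exist on the source
        "-e", "ssh",  # use SSH as the transport, over the private network
        src_dir.rstrip("/") + "/",  # trailing slash: copy the directory's contents
        f"{user}@{peer_private_ip}:{dest_dir}",
    ]


def sync_to_passive(src_dir: str, peer_private_ip: str, dest_dir: str,
                    user: str = "deploy") -> None:
    """Run the sync; raises CalledProcessError if rsync exits non-zero."""
    cmd = build_rsync_command(src_dir, peer_private_ip, dest_dir, user)
    subprocess.run(cmd, check=True)
```

For example, you could run `sync_to_passive("/var/www/site", "10.0.0.2", "/var/www/site")` on the active server from a scheduled job. Because `--delete` removes files on the destination, check the paths carefully before running it. The sync also runs in one direction only, so after a failover it must be reversed.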