You Have Docker – But Are Your Docker Registries Highly Available?

HA Docker Registry with JFrog Artifactory

With Docker continuing to gain traction in production systems and already in widespread use in pre-production, a company’s Docker registry can be central to its operations. This post explains how a high availability Docker registry can help companies avoid the enormous expense incurred when mission-critical systems go down.

The Cost of Downtime

Downtime is extremely costly, and its effects on an organization can range from bad to bankrupt. A quick search on Google for “cost of downtime” will bring up some great resources with startling statistics. Depending on the study you look at, you’ll see numbers ranging from $5,600 [1] to over $17,000 [2] per minute! Those figures reflect direct losses in income and, even more so, indirect costs from reduced productivity caused by the downtime. According to the British Airways CEO, an outage in the airline’s IT systems in May 2017 cost the company over $100 million [3].

Let’s think about that. When production systems are down, the losses are clear. Ecommerce sites can’t sell, reservation systems can’t take orders, payment systems don’t work, and so on. But what about pre-production systems? Imagine developers gaping at screens because a local build got stuck, DevOps engineers getting edgy because the third person just told them the CI server is down, and QA engineers unable to run the regression testing required to approve a release candidate. That is lost productivity, and it quickly adds up to $$$.

Your Docker Registry is Mission-Critical

Imagine a developer who is working on one of the company’s Docker images. Imagine that she is trying to fix a bug that causes the company’s billing systems to overcharge customers 10-fold. Now imagine that the developer’s Docker registry experiences an outage. No Docker registry means no builds, which means hours wasted before the bug gets fixed and, in the meantime, many overcharged, unhappy customers.

A Docker Registry with 5-Nines Availability

In a previous post, we discussed the central role Artifactory plays as your Docker registry. When deployed in a high availability configuration, Artifactory can also prevent the sad scenario described in the previous paragraph.

To achieve high availability, Artifactory is installed as an active/active, redundant cluster of multiple nodes on the same LAN.

This prevents downtime in the following ways:

  1. No single point of failure
    Since there are multiple nodes in an HA installation, an outage in any of the nodes does not take down the whole cluster. Any of the remaining nodes in the cluster can answer requests until the downed server is back up.
  2. No maintenance downtime
    For the same reason, when taking a server down for maintenance, the cluster can still operate and respond to any requests. To perform maintenance on the whole cluster, each node can be taken down in turn, worked on as needed, and then brought back up into the cluster before taking down the next one.
  3. Manage heavy loads
    Since requests are distributed equally among all the cluster nodes by a load balancer, your Artifactory Docker registry can accommodate large load bursts with no degradation in performance. And as usage grows, you can add more servers to the cluster as needed to increase your capacity.
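
To make the three points above concrete, here is a minimal sketch of an active/active cluster behind a round-robin load balancer. It is a toy model, not how Artifactory or any particular load balancer is implemented; the `Cluster` class, its methods, and the node names are all hypothetical and exist only for illustration.

```python
from itertools import cycle


class Cluster:
    """Toy active/active cluster behind a round-robin load balancer (illustrative only)."""

    def __init__(self, node_names):
        # Every node starts out healthy and able to answer requests.
        self.healthy = {name: True for name in node_names}
        self._ring = cycle(node_names)

    def set_health(self, name, is_up):
        # Mark a node as down (outage or maintenance) or back up.
        self.healthy[name] = is_up

    def route(self):
        """Return the next healthy node, or raise if every node is down."""
        for _ in range(len(self.healthy)):
            node = next(self._ring)
            if self.healthy[node]:
                return node
        raise RuntimeError("no healthy nodes: the whole cluster is down")


cluster = Cluster(["node-a", "node-b", "node-c"])
cluster.set_health("node-b", False)  # simulate an outage or planned maintenance
served_by = [cluster.route() for _ in range(6)]
print(served_by)  # requests alternate between node-a and node-c
```

With `node-b` marked as down, every request is still served by one of the remaining nodes. Taking nodes down one at a time in this way is the rolling-maintenance pattern described in point 2, and adding a name to the node list is the toy equivalent of scaling out in point 3.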

With this level of stability and reliability, your Artifactory Docker registry can provide up to 5-nines (99.999%) availability.
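
To put 5-nines in perspective, the short sketch below converts an availability target into a yearly downtime budget and estimates how redundancy raises availability. The function names are hypothetical, and the cluster formula rests on a simplifying assumption that nodes fail independently and that shared components (load balancer, database, storage) never fail. Treat the results as back-of-the-envelope figures, not a guarantee.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year, including leap years


def downtime_minutes_per_year(nines):
    """Yearly downtime allowed by an availability of N nines (5 -> 99.999%)."""
    unavailability = 10 ** -nines
    return unavailability * MINUTES_PER_YEAR


def cluster_availability(node_availability, node_count):
    """Availability of a cluster that is up while at least one node is up.

    Assumes independent node failures and no shared point of failure,
    which is an idealization rather than a property of any real deployment.
    """
    return 1 - (1 - node_availability) ** node_count


for nines in (3, 4, 5):
    minutes = downtime_minutes_per_year(nines)
    print(f"{nines} nines: {minutes:.2f} minutes of downtime per year")

# Three nodes that are each 99% available, under the independence assumption.
print(f"3 nodes at 99%: {cluster_availability(0.99, 3):.6f}")
```

At 5-nines the budget works out to roughly 5.26 minutes per year. At the lower per-minute cost figure quoted earlier ($5,600), that budget corresponds to about $29,000 of downtime a year, compared with millions at 3-nines.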

Docker is used by many companies and government agencies, both during the software development process and in production systems. Any downtime (planned or unplanned) of the Docker registry that serves images can result in huge costs to a company, not to mention damage to its brand. With JFrog Artifactory as a high availability Docker registry, companies can host and manage all their Docker images in one location. They also benefit from a level of stability and reliability that helps ensure their Docker images flow safely and securely from developer workstations to production runtimes.

Resources

  1. Lerner, A. (2014). The Cost of Downtime. Retrieved from https://blogs.gartner.com/andrew-lerner/2014/07/16/the-cost-of-downtime/
  2. How to Calculate the True Cost of Downtime. (2017). Retrieved from https://www.datafoundry.com/blog/how-to-calculate-the-true-cost-of-downtime/
  3. Hetz, R., Day, P., & Neely, J. (2017). Retrieved from https://www.reuters.com/article/us-iag-ceo/british-airways-ceo-puts-cost-of-recent-it-outage-at-80-million-pounds-idUSKBN1961H2