How do I use Amazon Elastic File System (EFS) with Artifactory HA?

Artifactory High Availability (HA) on AWS can use S3 for scalable storage, or Amazon Elastic File System (EFS) can serve as an NFS filestore. An EFS-based design must take into account certain aspects of how EFS works.

This document explains the differences between Bursting Throughput and Provisioned Throughput modes with the goal of achieving optimal performance using EFS with Artifactory HA. For more information about Amazon EFS, see What is Amazon Elastic File System?

Amazon EFS Performance Overview

Bursting Throughput Mode (default mode)

Throughput on Amazon EFS scales as a file system grows. A file system can drive throughput continuously at its baseline rate.  Additionally, Amazon EFS is designed to burst to high throughput levels for periods of time.

For full documentation on EFS performance, see Throughput Scaling in Amazon EFS.

Amazon EFS uses a credit system to determine when file systems can burst. Each file system earns credits over time at a baseline rate that is determined by the size of the file system, and uses credits whenever it reads or writes data. Whenever a file system is inactive or driving throughput below its baseline rate, the file system accumulates burst credits.

If your file system has no burst credits available, I/O throughput is limited to the baseline rate until burst credits replenish. Running at the baseline rate alone may severely impact Artifactory performance. It is therefore best practice to have enough baseline throughput to cover the entire workload, so that burst credits are needed rarely or not at all. You can monitor the credit balance with the Amazon CloudWatch metrics for Amazon EFS.
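
As a rough illustration of the credit model described above, the following Python sketch estimates how long a file system can sustain a given demand before its burst credits run out. It assumes a simplified linear model in which credits accrue at the baseline rate and are spent at the actual throughput; it ignores any cap on the credit balance or on burst throughput. The function name and all figures are hypothetical and chosen only for illustration.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def seconds_until_credits_exhausted(credit_balance_bytes,
                                    baseline_bytes_per_s,
                                    demand_bytes_per_s):
    """Estimate how long a sustained demand can run before credits hit zero.

    Simplified model (an assumption, not the exact EFS accounting):
    credits are earned at the baseline rate and spent at the demand rate.
    Returns None when demand does not exceed the baseline, because the
    balance never shrinks in that case.
    """
    net_drain = demand_bytes_per_s - baseline_bytes_per_s
    if net_drain <= 0:
        return None
    return credit_balance_bytes / net_drain


# Hypothetical figures: 72 GiB of credits, 5 MiB/s baseline, 25 MiB/s demand.
seconds = seconds_until_credits_exhausted(72 * GIB, 5 * MIB, 25 * MIB)
print(f"Credits exhausted after about {seconds / 60:.0f} minutes")
```

Once the estimate reaches zero, throughput falls back to the baseline rate, which is the situation the best practice above is meant to avoid.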

Provisioned Throughput Mode

If you find that performance is limited in Bursting Throughput mode due to a small capacity file system or spikes in throughput requirements, you can opt for Provisioned Throughput mode.  Provisioned Throughput provides the ability to set a throughput level that will remain constant and consistently high as you use the file system, independent of the size of the file system.  This mode is ideal for customers who need high throughput but have a relatively small file system, which is not uncommon for Artifactory users. You can find more details on Provisioned Throughput usage and pricing on the AWS website.

How do you determine which throughput mode is best suited for your workload?

For the best results, consider your workload and the file system capacity needed to deliver the throughput it requires. In Bursting Throughput mode, the more data stored in your file system, the higher your baseline and burst throughput.

Many JFrog Artifactory servers scale to the TiB range in storage, and at that size the baseline throughput is often ample for the use case. For smaller filestores, however, the baseline throughput is quite low: 1 GiB of data earns a baseline of 50 KiB/s, and the baseline scales linearly per GiB above that. If your workload requires more throughput than the baseline and burst model provides for your capacity, consider Provisioned Throughput mode.
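
The linear scaling can be worked through with a short Python sketch. It uses only the 50 KiB/s per GiB figure quoted above; the function names and the example sizes are hypothetical, and the result is an estimate rather than a guarantee of delivered throughput.

```python
import math

# Baseline rate quoted in this article: 50 KiB/s for each GiB stored.
BASELINE_KIB_PER_S_PER_GIB = 50


def baseline_kib_per_s(stored_gib):
    """Baseline throughput in KiB/s for a file system holding stored_gib GiB."""
    return BASELINE_KIB_PER_S_PER_GIB * stored_gib


def gib_needed_for(target_kib_per_s):
    """Smallest whole number of GiB whose baseline meets the target throughput."""
    return math.ceil(target_kib_per_s / BASELINE_KIB_PER_S_PER_GIB)


for size_gib in (1, 100, 1024):
    rate = baseline_kib_per_s(size_gib)
    print(f"{size_gib:>5} GiB stored: {rate} KiB/s ({rate / 1024:.2f} MiB/s)")

# Data needed to sustain a hypothetical 10 MiB/s (10240 KiB/s) workload at baseline.
print(gib_needed_for(10 * 1024), "GiB")
```

If the filestore is much smaller than the size this calculation suggests for your workload, that is a sign Provisioned Throughput mode is the better fit.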

Using local EBS caches

You can use local EBS caches on Artifactory node instances. This option reduces the workload on the EFS filestore and, in Bursting Throughput mode, the drain on the burst credit balance. Configure Artifactory to use a cache in EBS on each node. This local cache stores artifacts on each individual node and serves them directly to the client instead of pulling them from EFS. Each node may hold duplicate cache entries (since any node can serve any request), but this greatly reduces access to EFS. The same mechanism can also be used to enhance performance when S3 is the binary store.

A local cache may let you rely on the bursting capability more often, but it should still only be used when the baseline throughput is at a reasonable value. In particular, bear in mind that on a cache miss the artifact must first be streamed into the cache at EFS speed and only then sent to the client, so if EFS is running very slowly because the burst credit balance is exhausted, clients may time out. If you find that you are hitting burst credit balance limits or experiencing client timeouts, switch to Provisioned Throughput mode to increase the available throughput from EFS.

For example, if a file is downloaded 1000 times without a local cache, EFS serves 1000 * filesize of download traffic. With a local cache enabled, it may serve as little as 2 * filesize. A persistent (non-ephemeral) EBS cache is recommended if you do not want to refill it whenever a new instance is created. For more information about configuring a cachefs, see the notes on configuring the filestore in the binarystore.xml file.
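
The arithmetic in this example can be sketched in Python. The sketch assumes a two-node cluster in which every node serves at least one request and never evicts the file from its cache, which is one way to arrive at the 2 * filesize figure; the function name and the 100 MiB file size are hypothetical.

```python
def efs_bytes_read(downloads, file_size_bytes, nodes, local_cache):
    """Estimate the bytes read from EFS for repeated downloads of one file.

    Assumptions (illustrative only): without a cache every download reads
    the file from EFS; with a cache each node reads it from EFS once and
    serves every later request from its local copy.
    """
    if not local_cache:
        return downloads * file_size_bytes
    # A node only fetches the file if it receives at least one request.
    nodes_fetching = min(nodes, downloads)
    return nodes_fetching * file_size_bytes


FILE_SIZE = 100 * 1024 ** 2  # hypothetical 100 MiB artifact

without_cache = efs_bytes_read(1000, FILE_SIZE, nodes=2, local_cache=False)
with_cache = efs_bytes_read(1000, FILE_SIZE, nodes=2, local_cache=True)

print(f"Without cache: {without_cache / 1024 ** 3:.1f} GiB read from EFS")
print(f"With cache:    {with_cache / 1024 ** 3:.1f} GiB read from EFS")
```

In practice the saving depends on the cache size and on how requests are distributed across nodes, so treat the result as a best case.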

Symptoms of inadequate throughput/exceeding the burst credit balance

Symptoms of inadequate storage design include delayed responses to ping requests (api/system/ping) and very slow downloads and uploads. The delays may show up in your Artifactory log as timeout errors or as broken connections caused by clients disconnecting on timeout.

Artifactory versions prior to 5.0 Implemented on S3 Storage

If you implemented Artifactory HA versions prior to 5.0 with S3 storage, a cluster-wide write cache (called the eventual cache) had to be implemented on a shared NFS mount. JFrog recommends EFS in Provisioned Throughput mode for this cache because it typically has low storage and high usage, which matches the workload that EFS Provisioned Throughput mode was designed to support. For Artifactory versions 5.0 and higher, the caches for an S3 implementation should be local EBS disk caches.