Resolving Server Timeouts with Large Artifacts

What is a server timeout?

Generally, a timeout means that an upstream server took longer than expected to respond, causing the request to fail. This article covers several timeout scenarios, using Artifactory as the example. Timeouts are most common with large artifacts, particularly Docker images, which tend to be larger than other artifact types. Note, however, that increasing a timeout value should not necessarily be the first step in resolving the problem.

The general architecture below is referred to throughout this article.

How to identify which layer/device is timing out?

To find the bottleneck, test an upload or download with each device removed from the path in turn. For example, in the architecture above, if Docker uploads are failing, check Artifactory first: upload the Docker image (or any file of similar size) directly to Artifactory from the Artifactory host itself. If that upload succeeds, repeat the test through the reverse proxy and then through the load balancer.
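
The isolation procedure above can be sketched as a small script that probes each layer from Artifactory outwards and reports the first one that fails. This is only an illustration under stated assumptions: the URLs, the layer names, and the probe and first_failing_layer helpers are hypothetical and not part of Artifactory, and the sketch times a download rather than an upload for simplicity.

```python
import http.client
import time
import urllib.request


def probe(url, timeout=30.0):
    """Return (ok, seconds) for a single GET against url."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # Read and discard the body in 1 MiB chunks so large files
            # are not held in memory.
            while response.read(1 << 20):
                pass
        ok = True
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts all derive from OSError.
        ok = False
    return ok, time.monotonic() - start


def first_failing_layer(layers, probe_fn=probe):
    """Probe layers in order; return the first failing name, or None."""
    for name, url in layers:
        ok, seconds = probe_fn(url)
        print(f"{name}: {'ok' if ok else 'FAILED'} after {seconds:.1f}s")
        if not ok:
            return name
    return None


# Hypothetical addresses, ordered from Artifactory outwards.
LAYERS = [
    ("artifactory", "http://localhost:8081/artifactory/example-repo/large-file.bin"),
    ("reverse proxy", "http://nginx.example.internal/artifactory/example-repo/large-file.bin"),
    ("load balancer", "https://artifactory.example.com/artifactory/example-repo/large-file.bin"),
]
```

Calling first_failing_layer(LAYERS) on the Artifactory host would then name the innermost layer at which the transfer stops working.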

Timeouts at Load balancer:

Every device in the network can introduce a timeout, and load balancers are no exception. These timeouts are usually configurable and exist so that idle connections are terminated. You can increase this timeout as needed.

Timeouts at Reverse proxy layer:

In an HA environment, the reverse proxy is the first point where a request lands, and it plays an important role in keeping incoming requests alive while waiting for the response from the upstream Artifactory.

Here are some examples of the timeouts that can be tuned in Nginx:

Proxy read timeout: This directive sets how long Nginx waits for a response from the upstream server, measured between two successive read operations rather than over the whole response. The default is 60 seconds. When the upstream sends nothing within this time, Nginx closes the connection and the client typically sees a 504 Gateway Timeout error.

Here is an example that increases the timeout to 3 minutes in Nginx:
proxy_read_timeout 180s;

Proxy send timeout: This directive sets the timeout for transmitting a request to the upstream Artifactory server, measured between two successive write operations. It matters most when uploading large artifacts.

An example for Nginx is provided below:
proxy_send_timeout 180s;
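
When several configuration files are involved, it can help to confirm which timeout values are actually set. The following is a minimal sketch, not an Nginx tool: the proxy_timeouts helper and the sample configuration are hypothetical, and the parser only handles a single number with an optional ms, s, m, h, or d suffix (a bare number is treated as seconds, as Nginx does).

```python
import re

# Multipliers for the Nginx time suffixes handled by this sketch.
UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400}

DIRECTIVE = re.compile(
    r"^\s*(proxy_(?:read|send|connect)_timeout)\s+(\d+)(ms|s|m|h|d)?\s*;",
    re.MULTILINE,
)


def proxy_timeouts(config_text):
    """Map each proxy_*_timeout directive found in config_text to seconds."""
    found = {}
    for name, value, unit in DIRECTIVE.findall(config_text):
        # A missing suffix means seconds; a repeated directive overwrites
        # the earlier value in this simplified view.
        found[name] = int(value) * UNIT_SECONDS[unit or "s"]
    return found


# Hypothetical configuration fragment used only for illustration.
sample = """
location / {
    proxy_read_timeout 180s;
    proxy_send_timeout 3m;
}
"""
print(proxy_timeouts(sample))
```

Commented-out directives are ignored because the pattern requires the directive name to be the first non-blank text on its line.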

Timeouts in Artifactory:

Use case 1: A timeout occurs while downloading a package from an external site. This can happen on any of the devices involved in the request flow.

We may also encounter timeouts in Artifactory's Tomcat when the Tomcat server is running out of capacity. Make sure enough resources are available for the Tomcat server to create more threads and handle the load. Also ensure Artifactory is tuned for high load. Here is the link to the KB to do so.

Check 1: The default connection timeout for Artifactory's Tomcat is 60 seconds. It can be increased by adding the parameter below to the tomcat/conf/server.xml file. A restart is required for the change to take effect.

shared:
    extraJavaOpts: 'connectionTimeout="60000"'

The default is 60000 milliseconds (60 seconds). More information about the Artifactory system YAML can be found here.
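
To confirm the value Tomcat is actually running with, the Connector elements in server.xml can be inspected. The sketch below assumes the standard Tomcat layout, in which connectionTimeout is an attribute of a Connector element and is expressed in milliseconds; the connector_timeouts helper and the trimmed sample file are hypothetical.

```python
import xml.etree.ElementTree as ET


def connector_timeouts(server_xml_text):
    """Return {port: connectionTimeout in ms, or None if unset} per Connector."""
    root = ET.fromstring(server_xml_text)
    result = {}
    for connector in root.iter("Connector"):
        raw = connector.get("connectionTimeout")
        result[connector.get("port")] = int(raw) if raw is not None else None
    return result


# Hypothetical, heavily trimmed server.xml used only for illustration.
sample = """
<Server>
  <Service name="Catalina">
    <Connector port="8081" connectionTimeout="60000"/>
    <Connector port="8040"/>
  </Service>
</Server>
"""
print(connector_timeouts(sample))
```

A value of None means the attribute is not set on that connector, so Tomcat falls back to its own default for it.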

Check 2: If the upstream is slow to respond, also increase the socket timeout on the Advanced tab of the remote repository.

Check 3: If the two checks above do not resolve the issue, a network proxy may be terminating the connection. Validate this by removing the network proxy from the path, then work with the relevant teams to fix it.
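
The remote repository socket timeout from the second check can also be changed without the UI. The sketch below builds, but does not send, such a request. It assumes the repository update endpoint (POST to api/repositories/ followed by the repository key) and the socketTimeoutMillis field of the repository configuration JSON; verify both against the REST API documentation for your Artifactory version. The base URL, repository key, token, and the socket_timeout_request helper are hypothetical.

```python
import json
import urllib.request


def socket_timeout_request(base_url, repo_key, timeout_ms, token):
    """Build (but do not send) a request updating a remote repo's socket timeout."""
    payload = json.dumps({"socketTimeoutMillis": timeout_ms}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/repositories/{repo_key}",
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


# Hypothetical values; replace them before sending anything.
request = socket_timeout_request(
    "https://artifactory.example.com/artifactory", "docker-remote", 60000, "<token>"
)
print(request.get_method(), request.full_url)
# Send with urllib.request.urlopen(request) once the values are verified.
```

Building the request separately from sending it makes it easy to review the target URL and payload first.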

Use case 2: A timeout occurs while downloading a private package from a backend filestore such as S3. The usual flow may look like the diagram below.

Check 1: Validate whether the S3 provider itself has any issues. To do this, configure a client such as s3cmd or the AWS CLI and download the binaries from S3 directly on the Artifactory host.

Check 2: If there are no issues with S3 performance, try downloading the binary while bypassing the network proxy. If you want to increase the timeout between Artifactory and the backend S3, you can adapt the snippet below in the Artifactory binarystore.xml file:

An example snippet of Artifactory Binary store config would look like this:

        <bucketName>my-bucket</bucketName>
        <path>myPath</path>
        <useInstanceCredentials>false</useInstanceCredentials>
        <proxyIdentity>username</proxyIdentity>
        <proxyCredential>password</proxyCredential>
    </provider>
</config>

Timeouts during replication:

Push replication representation:

Checkpoints:

Artifactory1’s Tomcat
Artifactory1’s network proxy
Reverse proxy in front of Artifactory2
Tomcat of Artifactory2

Pull replication representation:

Checkpoints:

Artifactory1’s Tomcat timeout
Artifactory1’s network proxy
Remote repository socket timeout
Reverse proxy in front of Artifactory2
Tomcat of Artifactory2
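
Since a replication request passes through every checkpoint, the one with the smallest timeout is a reasonable first suspect. The sketch below is a first approximation only: the checkpoints do not all measure the same thing (connection, socket, and idle timeouts differ), the tightest_checkpoint helper is hypothetical, and the values are made-up placeholders rather than defaults or measurements.

```python
def tightest_checkpoint(timeouts_seconds):
    """Return (name, seconds) for the smallest timeout in the chain."""
    name = min(timeouts_seconds, key=timeouts_seconds.get)
    return name, timeouts_seconds[name]


# Placeholder values in seconds; fill in what your own environment uses.
pull_replication = {
    "Artifactory1 Tomcat": 60,
    "Artifactory1 network proxy": 120,
    "Remote repository socket timeout": 15,
    "Reverse proxy in front of Artifactory2": 180,
    "Artifactory2 Tomcat": 60,
}
print(tightest_checkpoint(pull_replication))
```

Recording the configured value for each checkpoint in one place also makes it easier to see which layer to raise first.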

To get more information about replication timeouts, you can also refer to this KB article.
