Introduction
When Artifactory becomes unresponsive or hangs, you may encounter the following error:
com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
This error indicates that request-handling threads are exhausted. A thread dump taken at that time may show hundreds of threads in the TIMED_WAITING state. In such cases, many threads are typically waiting on the S3 connection pool, which can point to a connection pool that is too small.
Example Thread Dump:
"2024-08-23T00:08:19.482Z|Trace_ID|/artifactory/helm-xxxxx-release/index.yaml|http-nio-8081-exec-44" #228 daemon prio=5 os_prio=0 cpu=8025769.69ms elapsed=1067120.37s tid=0x00007fb5fc826aa0 nid=0x1ee077 waiting on condition [0x00007fa844192000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at jdk.internal.misc.Unsafe.park(java.base@17.0.6/Native Method)
    - parking to wait for <0x00007fab7d755100> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(AbstractConnPool.java:391)
    ...
    com.amazonaws.http.conn.ClientConnectionRequestFactory$Handler.invoke(ClientConnectionRequestFactory.java:70
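To gauge how widespread the problem is, you can count how many threads in a saved thread dump are parked in the same frame as the example above. The following Python sketch is not a JFrog tool. It assumes a standard jstack-style dump in which every thread section begins with its quoted name, and it treats any thread whose stack contains org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking as waiting for a pooled connection. The script name pool_waiters.py is hypothetical.

```python
import re
import sys
from collections import Counter
from typing import List

# Frame seen when a thread is blocked waiting for a free connection from an
# Apache HttpClient pool (used by the AWS SDK for Java, as in the dump above).
POOL_WAIT_FRAME = "org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking"
STATE_RE = re.compile(r"java\.lang\.Thread\.State: (\w+)")


def split_threads(dump_text: str) -> List[str]:
    """Split a jstack-style dump into one text chunk per thread.

    A thread section starts with a line beginning with its quoted name;
    header lines before the first thread are ignored.
    """
    chunks: List[str] = []
    current: List[str] = []
    for line in dump_text.splitlines():
        starts_thread = line.startswith('"')
        if starts_thread and current:
            chunks.append("\n".join(current))
            current = []
        if starts_thread or current:
            current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks


def count_pool_waiters(dump_text: str) -> Counter:
    """Count threads blocked in the connection pool, grouped by thread state."""
    states: Counter = Counter()
    for chunk in split_threads(dump_text):
        if POOL_WAIT_FRAME not in chunk:
            continue
        match = STATE_RE.search(chunk)
        states[match.group(1) if match else "UNKNOWN"] += 1
    return states


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 pool_waiters.py <thread-dump-file>
    with open(sys.argv[1], encoding="utf-8") as dump_file:
        print(count_pool_waiters(dump_file.read()))
```

A result with hundreds of TIMED_WAITING entries matches the pattern described in this article; taking several dumps a few seconds apart shows whether the same threads stay stuck.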
A large number of threads parked in AbstractConnPool.getPoolEntryBlocking, as in this example, typically indicates that the pool of outgoing connections to S3 is too small for the current workload, so threads block while waiting for a free connection.
You can resolve this by increasing the connection pool size.
Modify the <maxConnections> setting of the cloud object storage provider in the binarystore.xml file (for example, AWS S3: type="s3-storage-v3", Google: type="google-storage-v2", Azure: type="azure-blob-storage-v2"), not the one on the Remote Binary Provider (type="remote"). After making the change, restart Artifactory to apply it.
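To confirm which provider you are editing, the sketch below lists the maxConnections value of each cloud storage provider in binarystore.xml and skips providers of other types, such as type="remote". It is a hypothetical helper, not part of Artifactory, and it only reads the file. The SAMPLE configuration is a simplified illustration: the bucket name and the maxConnections value are placeholders, not recommendations.

```python
import sys
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Cloud object storage provider types named in this article. Their
# <maxConnections> value sizes the pool for outgoing storage requests.
CLOUD_TYPES = {"s3-storage-v3", "google-storage-v2", "azure-blob-storage-v2"}

# Hypothetical, simplified binarystore.xml; values are placeholders.
SAMPLE = """<config version="2">
    <chain template="s3-storage-v3-direct"/>
    <provider id="s3-storage-v3" type="s3-storage-v3">
        <bucketName>my-bucket</bucketName>
        <maxConnections>200</maxConnections>
    </provider>
</config>"""


def report_max_connections(xml_text: str) -> List[Tuple[str, str, Optional[str]]]:
    """Return (id, type, maxConnections) for every cloud storage provider.

    Providers can be nested, so all <provider> elements are visited. Other
    types, such as type="remote", are skipped. A None value means the
    provider element has no direct <maxConnections> child.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for provider in root.iter("provider"):
        ptype = provider.get("type", "")
        if ptype in CLOUD_TYPES:
            value = provider.findtext("maxConnections")
            rows.append((provider.get("id", ""), ptype, value))
    return rows


if __name__ == "__main__":
    # Pass a path to read a real file; otherwise the SAMPLE above is used.
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as config_file:
            text = config_file.read()
    else:
        text = SAMPLE
    for row in report_max_connections(text):
        print(row)
```

Run without arguments, it prints ('s3-storage-v3', 's3-storage-v3', '200') for the sample. A None in the third column means that provider element does not set the value directly, so a default or a value defined elsewhere in the file may apply.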
However, a larger pool only helps if connections are returned to it; if connections leak, a bigger pool merely delays exhaustion. To ensure a thorough investigation, rule out a connection leak using the steps below.
Resolution
Step 1: Monitor the Outgoing HTTP Connection Pool
- Track Connection Usage: Use the outgoing HTTP connection pool monitoring guide to record the connection count over time, and graph it to visualize trends and usage patterns. In the connection pool logs, compare the number of leased connections with the number of released connections. If leased connections consistently outnumber released ones, connections are not being returned to the pool, which indicates a leak. A leak typically shows up as a steady increase in the number of connections without a subsequent decrease.
- Monitor CLOSE_WAIT Connections: A socket in the CLOSE_WAIT state has been closed by the remote side but not yet closed by the local application. A high count that keeps growing suggests that connections are not being closed, which points to a leak. Run the following command to count them:
netstat -tan | grep 'CLOSE_WAIT' | wc -l
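Both checks in Step 1 can be scripted. The Python sketch below is an illustration, not a JFrog utility, and rests on two assumptions: that the connection pool log contains the DEBUG messages written by Apache HttpClient 4.x's PoolingHttpClientConnectionManager ("Connection leased", "Connection released", and a "total allocated: N of M" statistic), and that the socket listing comes from Linux netstat -tan, whose fifth column is the foreign address and sixth is the state.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumption: the pool log contains the DEBUG messages written by Apache
# HttpClient 4.x's PoolingHttpClientConnectionManager.
LEASED = "Connection leased"
RELEASED = "Connection released"
TOTAL_ALLOCATED = re.compile(r"total allocated: (\d+) of (\d+)")


def lease_balance(log_lines: Iterable[str]) -> Tuple[int, int, List[int]]:
    """Count lease and release events and collect 'total allocated' values.

    More leases than releases over a long window, plus an allocated count
    that only climbs, matches the leak pattern described in Step 1.
    """
    leased = released = 0
    allocated: List[int] = []
    for line in log_lines:
        if LEASED in line:
            leased += 1
        elif RELEASED in line:
            released += 1
        match = TOTAL_ALLOCATED.search(line)
        if match:
            allocated.append(int(match.group(1)))
    return leased, released, allocated


def close_wait_by_peer(netstat_lines: Iterable[str]) -> Counter:
    """Group CLOSE_WAIT sockets by foreign address.

    Assumes Linux `netstat -tan` columns:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    peers: Counter = Counter()
    for line in netstat_lines:
        fields = line.split()
        if len(fields) >= 6 and fields[5] == "CLOSE_WAIT":
            peers[fields[4]] += 1
    return peers
```

For example, pass an open file handle for a saved pool log (such as a hypothetical pool.log) to lease_balance, and saved netstat -tan output to close_wait_by_peer. Grouping by foreign address shows whether the CLOSE_WAIT sockets point at your object storage endpoint rather than at some other service.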
Step 2: Upgrade to Artifactory 7.7X or Higher
Starting from Artifactory version 7.7X, a built-in connection leak detector is available. This feature can help identify and troubleshoot connection leaks more effectively, simplifying the monitoring process. Please contact JFrog Support for more information.