ARTIFACTORY: How to Conduct a Performance Analysis on Your request.log

Patrick Russell
2021-09-13 12:32


Relevant versions: This information pertains to Artifactory versions 4.x through 6.x

The fields in Artifactory's request.log provide information that can be extremely useful when you are trying to identify the what, when, where, why, and how of a given performance issue. The log format is powerful because it measures the time it takes Artifactory to process a request independently of the network, which lets you rule networking in or out as a factor in a performance problem.

Following is the format of a standard request.log entry:

20140508154145|2632|REQUEST|86:12:14:192|admin|GET|/jcenter/org/iostreams/iostreams/0.2/iostreams-0.2.jar|HTTP/1.1|200|8296

 

Date and time stamp: Date and time the request was completed and entered into the log file, in the format [YYYYMMDDHHMMSS]

Request time: Time in milliseconds taken for the request to be processed

Request type: DOWNLOAD for a download request, UPLOAD for an upload request, or REQUEST for any other request

IP: Requesting user's IP address

User name: Requesting user's user name, or "non_authenticated_user" when accessed anonymously

Request method: HTTP request method (e.g., GET, PUT, etc.)

Requested resource path: Relative path to the requested resource

Protocol version: HTTP protocol version

Response code: HTTP response code

Size (bytes) of request or response: Size of the response if the request method is GET; size of the request if the request method is PUT or POST
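
As a quick way to read these fields while scanning a log, you can have awk label each field of an entry by position. The snippet below is a minimal sketch based on the field order listed above; it simply prints the first entry of the file with each field named.

#Print the first request.log entry with each pipe-delimited field labeled
awk 'BEGIN { FS="|" } NR==1 { print "timestamp: "$1"\nrequest_ms: "$2"\ntype: "$3"\nip: "$4"\nuser: "$5"\nmethod: "$6"\npath: "$7"\nprotocol: "$8"\nstatus: "$9"\nsize_bytes: "$10 }' request.log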


Checking for Performance Problems

When it comes to performance problems, the most important fields to analyze are Request time and Size of request or response. This can be demonstrated by using some bash terminal commands to search through the request log file, as shown below.

Let's look at a very common issue: docker login times out. If you were experiencing this problem, you'd want to figure out why it's happening. You may know that the docker login process requests a token through the Artifactory API. You can obtain the precise request format by simulating a docker login request against Artifactory:

 
GET /api/docker/docker-repo/v2/token

curl -u admin:password http://artifactory.com/artifactory/api/docker/docker-repo/v2/token

Now, with a URL to search for, a set of grep searches can be crafted to parse your request log:

Note: This search will identify any request that takes between 10,000 and 99,999 milliseconds (10-99 seconds).
 
[Identify when a "docker login" request times out]
#In grep, the period '.' matches any single character
grep "|.....|REQUEST" request.log | grep "api/docker" | grep "v2/token" 

#Alternative "awk" search
awk 'BEGIN { FS="|" } ($2 > requestTime) && ($6 == "GET") { print $1 " response_time: " $2 " " $7}' requestTime=10000 request.log

#Example long-running docker login
20190726113810| 39070 [39 seconds] |REQUEST| 127.0.0.1 |Jenkins-LDAP|GET|/api/docker/docker-virtual/v2/token|HTTP/1.1|200|0

#Adding an additional dot searches for 100 - 999 second requests
20190726114126 | 126000 [126 seconds] |REQUEST| 127.0.0.1 |Jenkins-LDAP|GET|/api/docker/docker-virtual/v2/token|HTTP/1.1|200|0
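
If you don't yet know which URL to look for, the same pipe-delimited fields can be used to surface the slowest requests overall. This is a minimal sketch (not specific to docker) that sorts every entry by its request time and prints the ten slowest:

#List the ten slowest requests: request time is field 2, method is field 6, path is field 7
awk 'BEGIN { FS="|" } { print $2 " ms " $6 " " $7 }' request.log | sort -rn | head -10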

Searches like these let you see whether or not Artifactory is the cause of slow requests. You might have reason to be concerned, for example, if a long-running request is for a small file. On the other hand, if that file is being pulled from a remote endpoint for the first time, the request will take longer than subsequent requests for the same file after it has been cached:
 

[2.2 seconds to stream a 53 KB file the first time - Pulled from a remote repository]
20181228204422|2218|REQUEST|127.0.0.1|admin|GET|/api/npm/npm/sshpk/-/sshpk-1.16.0.tgz|HTTP/1.1|200|53405

[0.7 seconds to stream the same file once it has been cached]
20190814003406|681|REQUEST|127.0.0.1|admin|GET|/api/npm/npm/sshpk/-/sshpk-1.16.0.tgz|HTTP/1.1|200|53405
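
To check this pattern for a particular artifact, you can print the timestamp and request time of every request for its path and watch the times drop once it is cached. A minimal sketch using the sshpk tarball from the example above:

#Print the timestamp and request time (ms) for every request of a given artifact
grep "sshpk-1.16.0.tgz" request.log | awk 'BEGIN { FS="|" } { print $1 " " $2 " ms" }'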

Of course, there are some limitations to this approach. For example, slow downloads or uploads might take longer to execute simply because your client needs to stream a lot of data. Indeed, if you encounter a long-running request that is transferring a lot of binary data, the long execution time will likely be chalked up to expected behavior:
 

[137 milliseconds to upload a 2.5 MB file]
20181204003658|137 [milliseconds]|REQUEST|127.0.0.1|admin|PUT|/libs-snapshot-local/org/jfrog/test/multi3/3.7-SNAPSHOT/multi3-3.7-SNAPSHOT.war;build.timestamp=1543883800178;build.name=maven-pipeline;build.number=23|HTTP/1.1|201|2533340 [2.5 MB]
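
One way to separate "slow because the transfer is large" from "slow inside Artifactory" is to look at throughput rather than raw request time. The following is a minimal sketch, not an official JFrog calculation: it divides the size field by the request time for every entry that transferred data and lists the lowest-throughput requests first.

#Rough throughput in KB/s: size is field 10 (bytes), request time is field 2 (ms)
awk 'BEGIN { FS="|" } ($2 > 0) && ($10 > 0) { printf "%.1f KB/s %s %s\n", ($10/1024)/($2/1000), $6, $7 }' request.log | sort -n | head -10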

To track down these problems, it helps to consider how Artifactory serves requests so you can identify the bottleneck. This diagram shows the standard process Artifactory uses to handle the download of a binary file:

[Diagram: the steps Artifactory follows when serving a download request]

1. Artifactory receives the request (and the request time begins ticking)
2. Artifactory authenticates the user via its embedded Access server (which searches for the user in the Access database)
3. Artifactory finds the file checksum in the database
4. Artifactory fetches the file from the file system
5. Artifactory streams the file to the user
6. The request closes, the request time stops, and the request.log entry is recorded

Each link in the chain can be tested to see whether it is the bottleneck. For example, this request log analysis on its own eliminates anything that happens before step 1 or after step 6: if the logged request time is long, the slowness is inside the application itself.
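
Conversely, if a client reports slow downloads but the logged request times look healthy, you can compare the client-side timing of the same request against the request.log entry. A minimal sketch using curl's built-in timers (the repository path and credentials here are placeholders):

#Time the same download from the client side, then compare against the request.log entry
curl -u admin:password -o /dev/null -s -w "connect: %{time_connect}s  total: %{time_total}s\n" \
  http://artifactory.com/artifactory/libs-release-local/org/example/example-1.0.jar

#If time_total is much larger than the request time Artifactory logged,
#the extra time is being spent on the network rather than inside the application.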

Checking Request Rates

If Artifactory is using a lot of processing power, one way to see what is causing the high usage is to review your request rates. This can be done by analyzing the logs in a Unix bash terminal, using the leading portion of the request log timestamp as an anchor. To count all of the requests that occurred in a particular hour, grep for timestamps that begin with that date and hour (YYYYMMDDHH):
 

#Starting the count at 2019-07-29 09:00 AM
>$ grep "2019072909" request*.log | wc -l
  162848
>$ grep "2019072910" request*.log | wc -l
  301432
>$ grep "2019072911" request*.log | wc -l
  275613
>$ grep "2019072912" request*.log | wc -l
  359222
>$ grep "2019072913" request*.log | wc -l
  219759
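
Rather than running one grep per hour, you can produce the same counts for every hour in the log at once. A minimal sketch that groups entries by the first ten characters of the timestamp (YYYYMMDDHH):

#Count requests per hour across all request logs
awk 'BEGIN { FS="|" } { print substr($1,1,10) }' request*.log | sort | uniq -c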


Each of the numbers returned represents the requests served during that hour. These numbers can be very helpful when you want to tune and scale Artifactory to your organization's particular needs.

For more information, see the JFrog Knowledge Base article How to Tune Artifactory for Heavy Loads.