How to tune Cron Replication for a large number of artifacts

Joshua Han
2019-12-12 18:44

Summary

Cron-based replication temporarily stores file lists locally and may need to be tuned accordingly.
 

Details

Cron-based replication compares the file lists of the source and the target Artifactory instances to determine which artifacts the target does not have, and replicates them.

Artifactory 6.13.0 introduced a performance improvement for Cron-based replication, but it may require tuning a new parameter. With this change, Artifactory saves the file list retrieved from the target to the filesystem, under the Artifactory temp work directory, as a file named FullTree-{timestamp}.json.gz. The list is held in memory while it is populated in chunks and is then written to disk compressed (gzipped) to conserve space, so even if the file list API returns 1GB, it would be saved as roughly 100MB or less.

This file is deleted after the replication finishes, or after the VM shuts down (in cases where the instance is stopped mid-replication).

These files may take a considerable amount of storage space if several Cron-based replications run at the same time for repositories that store a large number of artifacts and folders.
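If you want to check how much temporary space these FullTree files are actually consuming while replications run, a quick look in the temp work directory may help. The sketch below assumes a typical installation path for $ARTIFACTORY_HOME; adjust it (and the directory searched) to match your environment.

# Sketch only: sum the size of any FullTree files currently on disk
ARTIFACTORY_HOME=/opt/jfrog/artifactory   # assumption: adjust to your installation
find "$ARTIFACTORY_HOME" -type f -name 'FullTree-*.json.gz' -exec du -ch {} + | tail -n 1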
 

Resolution

You can estimate the required disk space by running the curl command below after setting the variables.

curl -L -u "$ADMIN_USER" "https://$TARGET_ARTIFACTORY_URL/artifactory/api/storage/$TARGET_REPO?list&deep=1&listFolders=1&mdTimestamps=1&statsTimestamps=1&includeRootPath=1" -o output.json

You can expect the temporary disk usage to be about 10 percent (or less) of the size of output.json, in addition to whatever other concurrent Cron-based replications create at the same time. You can also expect memory utilization to increase by up to the size of the output.json file, since the list is temporarily held in memory before it is saved compressed to disk. It is advisable to schedule the different replications so that they do not run simultaneously. For the most accurate estimate, choose the repository with the largest number of artifacts and folders.
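As a rough, illustrative calculation, the snippet below derives both estimates from the size of the output.json file produced by the curl command above. The 10 percent ratio is the rule of thumb described here, not a guarantee, and the stat invocation covers both the GNU and BSD variants.

# Rough estimate based on the size of output.json
SIZE_BYTES=$(stat -c %s output.json 2>/dev/null || stat -f %z output.json)
echo "Estimated peak extra memory:  ${SIZE_BYTES} bytes (uncompressed file list)"
echo "Estimated temporary disk use: $((SIZE_BYTES / 10)) bytes (compressed FullTree file)"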

Based on the size of the total output, you can customize how much disk space Artifactory will use for the replication by setting the following parameter in artifactory.system.properties to the desired size (the default is 100000000):

artifactory.replication.push.fullTree.saveLocally.free.disk.threshold.bytes=100000000

Alternatively, you can disable this feature and revert to the previous behavior of replicating while streaming the file list by setting the following flag to false (the default is true):

artifactory.replication.push.fullTree.saveLocally=false
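For reference, a combined artifactory.system.properties entry might look like the sketch below. The 500000000 threshold is only an illustrative value, and changes to artifactory.system.properties typically require an Artifactory restart to take effect.

# Keep saving the file list locally, with a custom threshold (default is 100000000 bytes)
artifactory.replication.push.fullTree.saveLocally=true
artifactory.replication.push.fullTree.saveLocally.free.disk.threshold.bytes=500000000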

Please use the following article for troubleshooting:

https://jfrog.com/knowledge-base/how-to-troubleshoot-common-replication-issues/