ARTIFACTORY: Artifactory Garbage Collection

Janardhana JL
2022-08-26 16:39

What is Garbage Collection?

When an Artifactory user "deletes" a file, what is actually deleted is the reference from the Artifactory database to the physical file. Before physically deleting the file, Artifactory must scan the system to ensure that no other users are referencing it. This scan is very CPU intensive and locks files while it is in progress, which may put stress on the development environment. The scan is therefore scheduled to run periodically as a "Garbage Collection" (GC) process at times when demand on the system is low.
 
GC also helps optimize storage when there is a discrepancy between the size of the artifacts and the size of the binaries.
 

How GC is triggered and how it deletes the binaries

GC can be run from the Artifactory UI, in the Administration module under Artifactory | Maintenance, where you can also schedule an automatic run of Garbage Collection with a Cron expression. GC can also be triggered using the REST API.
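As a minimal sketch of the REST option, the snippet below builds a request for the Artifactory "Run Garbage Collection" endpoint (POST /api/system/storage/gc), which requires an admin user. The base URL, port, user name and password are placeholders assumed for illustration, and the helper names build_gc_request and trigger_gc are hypothetical.

```python
import base64
import urllib.request


def build_gc_request(base_url, user, password):
    """Build (but do not send) the POST request that asks Artifactory to run GC."""
    url = base_url.rstrip("/") + "/api/system/storage/gc"
    # Basic authentication header; an admin user is assumed.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Basic {token}"},
    )


def trigger_gc(base_url, user, password, timeout=30):
    """Send the request and return the HTTP status code."""
    request = build_gc_request(base_url, user, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

For example, trigger_gc("http://localhost:8082/artifactory", "admin", "password") would send the request to a local instance, assuming those placeholder values match your installation. Each call starts one GC iteration, so the same caution about load applies as for a manual run from the UI.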


GC runs according to the scheduled cron job. The cleanup is completed by the Full GC strategy, which runs every 20 GC iterations, and the artifacts are then deleted from the filestore. If you monitor the storage status over only a few GC iterations, you may not observe major changes in storage; wait until the Full GC completes. You can also trigger the GC job immediately from the Artifactory UI, repeating it until the Full GC runs (i.e. trigger the GC job 21 times to complete a Full GC immediately).

When GC is triggered and completes, log entries like the ones shown below are written. As mentioned above, the Full GC happens once every 20 executions of the GC process and reclaims the space once it clears the artifacts marked for deletion.

Below is an example log snippet:

2021-12-12T22:00:00.006Z [jfrt ] [INFO ] [9d38a917aadd695c] [.s.d.b.s.BinaryServiceImpl:684] [art-exec-14] - Triggering Garbage Collection
2021-12-12T22:00:00.007Z [jfrt ] [INFO ] [9d38a917aadd695c] [.s.d.b.s.g.GarbageCollector:66] [art-exec-14] - Starting GC strategy 'TRASH_AND_BINARIES'
2021-12-12T22:00:18.853Z [jfrt ] [INFO ] [9d38a917aadd695c] [.s.d.b.s.g.GarbageCollector:68] [art-exec-14] - Finished GC Strategy 'TRASH_AND_BINARIES'

When GC completes its cycle, the logs show something like the following:

2021-11-25T20:00:00.303Z [jfrt ] [INFO ] [f7b0ef0ee5ecaebe] [.s.b.s.GarbageCollectorInfo:99] [art-exec-14] - Storage garbage collector report:
Total execution time: xx
Candidates for deletion: xx
Binaries deleted: xx
Total size freed: xx
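
When monitoring several GC iterations, it can help to extract these entries from the log programmatically. The following is a rough sketch, assuming the log lines keep the layout shown in the snippets above; the function names and regular expression are illustrative and not part of Artifactory. Report values are kept as raw strings because their exact format is not shown here.

```python
import re
from datetime import datetime

# Two lines copied from the example log snippet in this article.
SAMPLE_LOG = """\
2021-12-12T22:00:00.007Z [jfrt ] [INFO ] [9d38a917aadd695c] [.s.d.b.s.g.GarbageCollector:66] [art-exec-14] - Starting GC strategy 'TRASH_AND_BINARIES'
2021-12-12T22:00:18.853Z [jfrt ] [INFO ] [9d38a917aadd695c] [.s.d.b.s.g.GarbageCollector:68] [art-exec-14] - Finished GC Strategy 'TRASH_AND_BINARIES'
"""

# The report block as shown in this article (values elided as "xx").
SAMPLE_REPORT = """\
Total execution time: xx
Candidates for deletion: xx
Binaries deleted: xx
Total size freed: xx
"""

# "strategy" is lower case in the start line and capitalised in the finish line.
EVENT_RE = re.compile(r"^(\S+Z) .* - (Starting|Finished) GC [Ss]trategy '([^']+)'")
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"
REPORT_FIELDS = (
    "Total execution time",
    "Candidates for deletion",
    "Binaries deleted",
    "Total size freed",
)


def gc_strategy_durations(log_text):
    """Return (strategy, seconds) for each Starting/Finished pair found."""
    started = {}
    durations = []
    for line in log_text.splitlines():
        match = EVENT_RE.match(line)
        if not match:
            continue
        timestamp = datetime.strptime(match.group(1), TS_FORMAT)
        event, strategy = match.group(2), match.group(3)
        if event == "Starting":
            started[strategy] = timestamp
        elif strategy in started:
            elapsed = timestamp - started.pop(strategy)
            durations.append((strategy, elapsed.total_seconds()))
    return durations


def parse_gc_report(log_text):
    """Collect the garbage collector report fields as raw strings."""
    report = {}
    for line in log_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in REPORT_FIELDS:
            report[key.strip()] = value.strip()
    return report
```

Applied to SAMPLE_LOG, gc_strategy_durations reports a run of about 18.8 seconds for the TRASH_AND_BINARIES strategy, which is the difference between the two timestamps in the snippet. Tracking "Total size freed" from parse_gc_report across iterations shows when a Full GC actually reclaimed space.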

Garbage Collection Improvements and Tuning

The following improvements have been introduced to the Garbage Collection mechanism:

1. From Artifactory 6.12.0, a faster Garbage Collection strategy was introduced that runs automatically when the Trash Can setting is enabled. The new cleanup strategy fetches and undeploys the artifacts located under the trashcan repository that are older than the configured trash retention period. The linked binary is deleted if no other artifact references the checksum in question.

2. The cleanup also runs on multiple threads. The number of threads is configurable by setting the parameter below in the artifactory.system.properties file under $JFROG_HOME/var/etc/artifactory:
artifactory.gc.numberOfWorkersThreads=3

3. You can reduce the GC iteration count to a lower number in artifactory.system.properties, based on the traffic and load on Artifactory. If traffic is light and the load on Artifactory is low, you can set the parameter "artifactory.gc.skipFullGcBetweenMinorIterations=<a lower number>".

NOTE: Garbage collection is a resource intensive operation. Running it too frequently may compromise system performance.

4. From Artifactory 7.31.10, you can improve Garbage Collection performance by skipping the ordering of the objects. To do so, add the artifactory.gc.skipOrderByFullGc=true parameter to the artifactory.system.properties file. By default, Artifactory deletes the largest files first when it runs the Garbage Collection process. This default strategy uses ORDER BY in the SQL query, which can be slow on large Artifactory instances with a lot of data to delete, because an instance with many artifacts has a huge number of rows in the binaries table of the Artifactory DB. Setting artifactory.gc.skipOrderByFullGc=true omits the ORDER BY, so files are no longer deleted in order of size and the process is faster.

5. You can also delete the binaries in batches by adding "artifactory.gc.binariesToDeleteBatchSize=10000". This property sets the batch size; the default value is 10k.
 
NOTE: After making changes to the artifactory.system.properties file, restart the Artifactory instance for the changes to take effect.
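
The tuning parameters listed above can be added by hand, or with a small script when several instances need the same settings. The sketch below is one possible approach, not an official JFrog tool: set_properties is a hypothetical helper that updates or appends key=value lines in the text of a properties file and leaves comments and unrelated lines untouched. The values in GC_TUNING are the example values from this article and should be adjusted to your environment.

```python
# Example values taken from this article; adjust them for your own instance.
GC_TUNING = {
    "artifactory.gc.numberOfWorkersThreads": "3",
    "artifactory.gc.skipOrderByFullGc": "true",
    "artifactory.gc.binariesToDeleteBatchSize": "10000",
}


def set_properties(text, updates):
    """Return properties text with every key in `updates` set to its value.

    Existing key=value lines are rewritten in place and missing keys are
    appended at the end. Comment lines (starting with # or !) are kept as-is.
    """
    remaining = dict(updates)
    output = []
    for line in text.splitlines():
        stripped = line.strip()
        key = stripped.split("=", 1)[0].strip()
        is_setting = (
            bool(stripped)
            and not stripped.startswith(("#", "!"))
            and "=" in stripped
        )
        if is_setting and key in remaining:
            output.append(f"{key}={remaining.pop(key)}")
        else:
            output.append(line)
    # Append any settings that were not already present in the file.
    for key, value in remaining.items():
        output.append(f"{key}={value}")
    return "\n".join(output) + "\n"
```

A typical use would be to read artifactory.system.properties, pass its contents through set_properties(contents, GC_TUNING), write the result back (keeping a backup of the original), and then restart Artifactory as described in the note above.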