This guide describes how the JFrog Artifactory Garbage Collection job works and includes tuning advice and frequently asked questions.
Below is an example of the same artifact (left side) that was deployed to Artifactory twice, each time to a different repository. On the right-hand side, we see that the binary was saved only once in the filestore.
Artifactory Garbage Collector
The Garbage Collector's job is to clear out binary files from storage that don't have matching artifacts and free up disk space.
Artifactory Garbage Collector has two strategies: Small Garbage Collection (available since Artifactory version 6.12.0) and Full Garbage Collection (available since day one). In both strategies, Artifactory uses one or more database queries to determine which binaries should be removed from the filestore by comparing the artifacts table and the binaries table.
Small Garbage Collection
This task runs on every GC execution and involves searching the trash can for artifacts whose Retention Period has expired.
A binary is removed from storage, along with its references in the nodes and binaries database tables, if no other copies of the corresponding artifact exist.
If there is still even a single copy of this artifact in another path or repository, only the reference from the nodes table will be removed. Artifactory will keep the binary and its corresponding entry in the binaries table.
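The rule above can be illustrated with a toy model. This is only a sketch: the class, its two dictionaries, and the repository paths are hypothetical stand-ins for the nodes table, the binaries table, and the filestore, not Artifactory's actual implementation.

```python
import hashlib


class TinyStore:
    """Toy model of checksum-based storage: many artifact paths, one binary per checksum."""

    def __init__(self):
        self.nodes = {}     # artifact path -> sha1 (stands in for the nodes table)
        self.binaries = {}  # sha1 -> content (stands in for the binaries table and filestore)

    def deploy(self, path, content):
        sha1 = hashlib.sha1(content).hexdigest()
        self.binaries.setdefault(sha1, content)  # identical content is stored only once
        self.nodes[path] = sha1

    def delete_and_collect(self, path):
        """Remove one artifact reference; drop the binary only if nothing else points to it."""
        sha1 = self.nodes.pop(path)
        if sha1 not in self.nodes.values():
            del self.binaries[sha1]
            return True   # binary removed
        return False      # another copy still references the binary


store = TinyStore()
store.deploy("libs-release/app.jar", b"same bytes")
store.deploy("libs-snapshot/app.jar", b"same bytes")
first = store.delete_and_collect("libs-release/app.jar")    # a copy remains, binary is kept
second = store.delete_and_collect("libs-snapshot/app.jar")  # last copy, binary is removed
print(first, second, len(store.binaries))  # False True 0
```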
Full Garbage Collection
Note: The Full Garbage Collection job may consume a lot of system resources of both Artifactory and its Database.
Since Artifactory version 7.29.8, the Full Garbage Collection uses a batch cleanup mechanism to improve performance. The batch size and the number of sub-iterations are configurable; see the tuning section of this guide for more information.
Good to Know:
- The Small Garbage Collector won't clean up binaries if artifacts are manually deleted in bulk from the Trash Can. You will have to wait until the 20th iteration of the GC when the Full GC will be triggered.
- For a binary to qualify for Garbage Collection, it must have a reference in the database's binaries table. Any files that weren't deployed to your storage via Artifactory won't be removed by the Garbage Collector. For such cases, you may use the Prune Unreferenced Data feature.
- The Small Garbage Collector won’t work if the Trash Can is disabled in Artifactory (Administration panel → Artifactory → Settings → “Enable Trash Can”).
How to trigger the Garbage Collection
Garbage Collection can be triggered with the REST API call below. To trigger the Full Garbage Collection, execute the call 20 times.
curl -uusername:password -XPOST "http://<ARTIFACTORY-URL>/artifactory/api/system/storage/gc"
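For scripted use, the same call can be sketched in Python with only the standard library. The function names, the `runs` parameter, and the example host are illustrative assumptions; the endpoint path and the use of basic authentication are taken from the curl command above. Whether consecutive calls need a pause between them is not covered by this guide, so the loop simply sends them back to back.

```python
import base64
import urllib.request


def build_gc_request(base_url, username, password):
    """Build the POST request for the GC endpoint shown in the curl command above."""
    url = base_url.rstrip("/") + "/artifactory/api/system/storage/gc"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url, method="POST", headers={"Authorization": f"Basic {token}"}
    )


def trigger_gc(base_url, username, password, runs=1):
    """Send the request `runs` times and return the HTTP status codes."""
    statuses = []
    for _ in range(runs):
        request = build_gc_request(base_url, username, password)
        with urllib.request.urlopen(request) as response:
            statuses.append(response.status)
    return statuses
```

Calling `trigger_gc("http://artifactory.example.com", "username", "password", runs=20)` would then mirror executing the curl command 20 times.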
Log in as an Artifactory user with administrative permissions. Navigate to the Administration panel → Artifactory → Maintenance → Click on “Run Now”, as shown below.
Verification and Monitoring
In order to verify that the Small Garbage Collection job was executed, search for the following output in the artifactory-service.log or console.log logs:
2022-11-07T17:09:51.474Z [jfrt ] [INFO ] [38dc43ddf24cdacc] [.s.d.b.s.BinaryServiceImpl:728] [24cdacc|art-exec-138] - Triggering Garbage Collection
2022-11-07T17:09:51.475Z [jfrt ] [INFO ] [38dc43ddf24cdacc] [.s.d.b.s.g.GarbageCollector:66] [24cdacc|art-exec-138] - Starting GC strategy 'TRASH_AND_BINARIES'
2022-11-07T17:09:51.476Z [jfrt ] [INFO ] [38dc43ddf24cdacc] [.s.d.b.s.g.GarbageCollector:68] [24cdacc|art-exec-138] - Finished GC Strategy 'TRASH_AND_BINARIES'
In order to verify that the Full Garbage Collection job was executed, search for the following output in the artifactory-service.log or console.log logs:
2021-06-03T19:00:52.167Z [jfrt ] [INFO ] [2b5d4bc1dd3e2430] [.s.b.s.GarbageCollectorInfo:96] [art-exec-2270397 ] - Storage garbage collector report:
Number of binaries: 470,507
Total execution time: 49.93 secs
Candidates for deletion: 124
Checksums deleted: 123
Binaries deleted: 123
Total size freed: 15.80 GB
Current total size: 18.74 TB
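When monitoring many runs, it can help to turn the report into structured data. The sketch below parses the report lines shown above into a dictionary; the function name and the choice to convert only plain counts to integers are assumptions made for illustration.

```python
import re

REPORT = """\
Number of binaries: 470,507
Total execution time: 49.93 secs
Candidates for deletion: 124
Checksums deleted: 123
Binaries deleted: 123
Total size freed: 15.80 GB
Current total size: 18.74 TB
"""


def parse_gc_report(text):
    """Turn 'Key: value' lines into a dict; values that are plain counts become ints."""
    report = {}
    for line in text.splitlines():
        match = re.match(r"\s*([^:]+):\s*(.+?)\s*$", line)
        if not match:
            continue
        key, value = match.groups()
        digits = value.replace(",", "")
        report[key] = int(digits) if digits.isdigit() else value
    return report


report = parse_gc_report(REPORT)
# Candidates that were found but not deleted in this run
not_deleted = report["Candidates for deletion"] - report["Binaries deleted"]
print(report["Number of binaries"], not_deleted)  # 470507 1
```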
To keep track of Artifacts and Binaries sizes after Garbage Collection execution, navigate to the Administration panel → Monitoring → Storage Status page in the Artifactory UI as a user with admin privileges or use the REST API call.
Why is the Binaries' Size greater than the Artifacts' Size?
When an artifact is deleted, its database reference is deleted immediately, but the binary stays in the filestore until the next GC run.
A Binaries Size greater than the Artifacts Size indicates that the GC might not be working properly or might not be running at all.
When the Artifacts Size is greater than or equal to the Binaries Size, the GC operates as expected.
Below is an example of Artifactory Storage Status that indicates that the GC doesn’t work properly or fast enough (Binaries Size greater than Artifacts Size):
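The comparison can also be automated. The following sketch takes the two sizes as human-readable strings and reports by how much the Binaries Size exceeds the Artifacts Size. The function names, the input format, and the 1024-based unit table are assumptions, and the result is only an indicator that unreferenced binaries exist, not an exact measure of reclaimable space.

```python
# Assumed 1024-based units; adjust if your sizes are reported differently.
UNITS = {"bytes": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def to_bytes(size):
    """Convert a string such as '18.74 TB' to an approximate byte count."""
    number, unit = size.split()
    return float(number.replace(",", "")) * UNITS[unit]


def binaries_excess_bytes(binaries_size, artifacts_size):
    """Return by how many bytes Binaries Size exceeds Artifacts Size (0.0 if it does not)."""
    return max(0.0, to_bytes(binaries_size) - to_bytes(artifacts_size))


# Hypothetical values: a positive result suggests the GC is lagging or not running.
excess = binaries_excess_bytes("2.50 TB", "2.00 TB")
print(excess > 0)  # True
```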
How many binaries are eligible for Full Garbage Collection, and how much space should be freed?
Run the following SQL query on the Artifactory database:
SELECT COUNT(*) AS binaries_count,
       SUM(b.bin_length) AS binaries_size_in_bytes
FROM binaries b
WHERE NOT EXISTS
  (SELECT 1
   FROM nodes n
   WHERE n.sha1_actual = b.sha1);
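To see what this kind of query selects (binaries that have no matching row in the nodes table), it can be tried against a throwaway in-memory SQLite database. The two tables below are heavily simplified assumptions with made-up data; the real Artifactory schema has many more columns, and `node_path` is used here only for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE binaries (sha1 TEXT PRIMARY KEY, bin_length INTEGER);
    CREATE TABLE nodes (node_path TEXT, sha1_actual TEXT);
    INSERT INTO binaries VALUES ('aaa', 100), ('bbb', 250), ('ccc', 4000);
    -- 'aaa' is referenced twice, 'bbb' once, 'ccc' not at all
    INSERT INTO nodes VALUES
        ('repo1/a.jar', 'aaa'), ('repo2/a.jar', 'aaa'), ('repo1/b.jar', 'bbb');
""")

# Count and total size of binaries that no artifact references any more
count, size = conn.execute("""
    SELECT COUNT(*), COALESCE(SUM(b.bin_length), 0)
    FROM binaries b
    WHERE NOT EXISTS (SELECT 1 FROM nodes n WHERE n.sha1_actual = b.sha1)
""").fetchone()
print(count, size)  # 1 4000
```

Only 'ccc' is reported, because every other checksum still has at least one artifact pointing to it.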
How to tune the Garbage Collection?
Each of the system properties listed below can be configured in the $JFROG_HOME/artifactory/var/etc/artifactory/artifactory.system.properties file.
Be sure to restart Artifactory for the changes to take effect.
1. Scheduling the Garbage Collection
2. Tuning the number of worker threads
artifactory.gc.numberOfWorkersThreads=3
Note: When using Microsoft SQL Server, Garbage Collection is single-threaded regardless of the system property above.
3. Number of Small Garbage Collection runs (Artifactory 6.12.0 and above)
4. Disable sorted deletion of binaries (Artifactory 7.31.10 and above)
5. Configure the Full GC batch size and the number of iterations (Artifactory 7.29.8 and above)
artifactory.gc.binariesToDeleteBatchSize=10000
The property below controls the number of the Full Garbage Collection sub-iterations.
artifactory.gc.binariesToDeleteIterationAmount=20
Even if there are additional binaries to clean up, the Garbage Collection will stop once the above-mentioned value is reached, and the following message will be displayed to indicate this.
2022-08-01 00:38:27,233Z [jfrt ] [WARN ] [a83991fe168767bb] [.s.d.b.s.BinaryServiceImpl:681] [art-exec-1072408 ] - The GC is stopping due to maximum iterations reached
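As a rough planning aid, the two properties can be combined into an estimate of how many Full GC runs a backlog might need. The sketch assumes that each sub-iteration deletes at most one batch, so a single run removes at most batch size times iterations binaries; that reading, and the function name, are assumptions rather than documented behavior.

```python
import math


def full_gc_runs_needed(candidates, batch_size=10000, iterations=20):
    """Estimate how many Full GC runs are needed to clear `candidates` binaries."""
    per_run = batch_size * iterations  # assumed upper bound on deletions per run
    return math.ceil(candidates / per_run)


# With the values shown above, one run would remove at most 200,000 binaries,
# so a backlog of 450,000 candidates would take an estimated three runs.
print(full_gc_runs_needed(450_000))  # 3
```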