How does RPM indexing work in Artifactory?

An RPM (YUM) repository is a repository for holding and managing RPM packages.

It works with the clients that popular Linux distributions such as RHEL and CentOS use for managing binary packages. Artifactory is a fully-fledged RPM and YUM repository manager. JFrog’s official Wiki page offers detailed information about Artifactory RPM repositories.
 
This article looks at some of the internals to give more depth on the indexing process in Artifactory.
 
Artifactory 5.5.0 introduced a major improvement to YUM metadata calculation handling: metadata is now calculated in a parallel, incremental process.

Also:

  • This is preferable to the previous (automatic) asynchronous calculation triggering.
  • User plugins for triggering metadata calculation are generally no longer needed.
  • It is now possible to monitor the calculation and know exactly when the new metadata is available.

First, consider the following:

When the “Auto Calculate RPM Metadata” option is ticked on an RPM repository, every file deployment, copy or move action is intercepted by Artifactory and triggers the calculation process automatically. This is useful in automated processes, where Artifactory needs to serve the newest metadata to clients.
 
There are two ways for metadata calculation to happen:

Async:
This is the normal path: whenever a package is deployed, whether via REST or the UI, a calculation follows (provided the option above is enabled). Asynchronous calculation works by intercepting file operations and adding the required indexing operation to an internal Artifactory queue. This usually happens immediately.
 
Sync: 
Can be used only when the “Auto Calculate RPM Metadata” option is set to off. In this mode you control the triggering of the YUM metadata calculation manually. Use it when you want to ensure that all of the repository's metadata is available to any requesting client: the triggering request is held until the calculation has finished.

Example:

You have a CI job that deploys many versions to a large repository (e.g. snapshot versions). You can add an extra build step that waits for the response of the Calculate YUM Metadata REST API call with the async query parameter set to 0. Here it is triggered on a repository called rpm-release-local:

curl -uadmin:password -XPOST "localhost:8081/artifactory/api/yum/rpm-release-local?async=0" -i -Lvv

* Connected to localhost (::1) port 8081 (#0)
* Server auth using Basic with user 'admin'
> POST /artifactory/api/yum/rpm-release-local?async=0 HTTP/1.1
> Host: localhost:8081
> Authorization: Basic YWRtaW46cGFzc3dvcmQ=
> User-Agent: curl/7.54.0
> Accept: */*

< HTTP/1.1 200 OK
< Server: Artifactory/6.3.2
< X-Artifactory-Id: a9116dfeb1f6dac4:449dde33:1658a295e45:-8000
< Content-Type: text/plain
< Transfer-Encoding: chunked
< Date: Sun, 02 Sep 2018 12:19:56 GMT

YUM metadata calculation for repository 'rpm-release-local' accepted.
 

  • For any RPM virtual repositories that aggregate this repository, the virtual repository calculation is triggered as well once the calculation finishes.
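The extra build step described in the example could be sketched roughly as follows. This is only an illustration: the function names and the ART_URL, ART_USER and ART_PASSWORD variables are hypothetical, and only the /api/yum/<repo>?async=0 endpoint comes from the request shown earlier.

```shell
#!/bin/sh
# Sketch of a CI step that waits for the YUM metadata calculation request.
# ART_URL, ART_USER and ART_PASSWORD are hypothetical variable names.

# Build the Calculate YUM Metadata endpoint for a repository.
# $1 = Artifactory base URL, $2 = repository key, $3 = async flag (defaults to 0)
yum_calc_url() {
    printf '%s/api/yum/%s?async=%s\n' "${1%/}" "$2" "${3:-0}"
}

# POST to the endpoint for repository $1 with async=0.
# --fail makes curl exit non-zero on HTTP errors (status >= 400),
# so the surrounding CI step fails if the request is rejected.
trigger_yum_calc() {
    curl --fail --silent --show-error \
         -u "${ART_USER}:${ART_PASSWORD}" \
         -X POST "$(yum_calc_url "$ART_URL" "$1" 0)"
}
```

With ART_URL set to http://localhost:8081/artifactory, calling trigger_yum_calc rpm-release-local issues the same request as the curl command shown earlier, and later build steps can be made to run only if it exits successfully.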

 
Tuning options with Artifactory RPM system properties (5.5.0 and above):

  • rpm.metadata.calculation.workers (default is 8) – Controls the number of threads (workers) for RPM metadata calculation.
  • rpm.metadata.history.cycles.to.keep (default is 3) – Controls how many previous metadata calculation cycles are kept. While calculations are running (and in practice concurrent calculations are likely), Artifactory keeps a record of previous metadata records, including those from parallel calculations that have already finished.
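As a rough sketch of how such properties might be set, the helper below updates or appends a key=value line in a properties file. The helper name set_prop and the values 16 and 5 are illustrative only. It is assumed here that the properties go in $ARTIFACTORY_HOME/etc/artifactory.system.properties under exactly the names listed above; check whether your version expects a prefix such as "artifactory." on the key, and note that system property changes typically need a restart to take effect.

```shell
#!/bin/sh
# Sketch: set or update a key=value pair in a properties file.
# $1 = properties file, $2 = key, $3 = value
set_prop() {
    file=$1; key=$2; value=$3
    touch "$file"
    # Keep every line whose key differs, then append the new setting.
    awk -F= -v k="$key" '$1 != k' "$file" > "$file.tmp"
    printf '%s=%s\n' "$key" "$value" >> "$file.tmp"
    mv "$file.tmp" "$file"
}

# Example usage (file location and values are assumptions, not recommendations):
# set_prop "$ARTIFACTORY_HOME/etc/artifactory.system.properties" rpm.metadata.calculation.workers 16
# set_prop "$ARTIFACTORY_HOME/etc/artifactory.system.properties" rpm.metadata.history.cycles.to.keep 5
```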

 
RPM logging (org.artifactory.addon.yum.YumAddonImpl):

INFO Level: Starting to calculate Rpm metadata for
 
Verbose logging: To track or debug your calculations, you can enable debug/trace level logging for the following loggers in Artifactory (modify $ARTIFACTORY_HOME/etc/logback.xml):

org.artifactory.addon.yum.YumAddonImpl:

  • Automatic calculation (Async):

Level Debug: Async Rpm calculation for {path}
 

  • Triggered (Sync):

Level Debug: Sync Rpm calculation for {path}
 

  • Virtual RPM Repository calculation:

Enable the log level for org.artifactory.addon.yum.virtual.index:

Level Debug: Starting virtual yum metadata calculation for {path}
 

  • Trace-level logging for the entire indexing process:

Enable the log level for org.jfrog.metadata.indexer.RpmRepoIndexer:

Level TRACE: Preparing to index RPM repository metadata

Level Debug: Finished indexing RPM repository metadata
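To make the logger settings above concrete, the sketch below prints logback logger entries for the three logger names mentioned in this section. The helper name logger_entry is hypothetical, and the entries use the standard logback <logger name="..." level="..."/> form; your logback.xml may use a different but equivalent layout, so adapt the output before pasting it inside the <configuration> element.

```shell
#!/bin/sh
# Sketch: print a logback <logger> entry.
# $1 = logger name, $2 = level (for example debug or trace)
logger_entry() {
    printf '<logger name="%s" level="%s"/>\n' "$1" "$2"
}

# Entries for the loggers discussed in this section.
logger_entry org.artifactory.addon.yum.YumAddonImpl debug
logger_entry org.artifactory.addon.yum.virtual.index debug
logger_entry org.jfrog.metadata.indexer.RpmRepoIndexer trace
```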