How RPM Repository Indexing Works in Artifactory

Daniel Poterman
2021-03-25 09:06

An RPM repository is designed to hold and manage RPM packages. It works with the package management clients used by popular Linux distributions, such as RHEL and CentOS, to manage binary packages. Artifactory is a full-fledged RPM repository manager. While detailed information about Artifactory RPM repositories can be found HERE, this article focuses on the RPM indexing process.
RPM repositories enable:

  • RPM metadata calculation for RPM packages hosted in Artifactory local repositories
  • Provisioning of RPM packages directly from Artifactory to YUM clients
  • High-concurrency performance for the software development industry
  • Detailed RPM metadata views from within Artifactory's web UI
  • Provisioning of GPG signatures, which can be used by the YUM client to authenticate RPM metadata
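To illustrate the provisioning side, here is a minimal sketch of pointing a YUM client at an Artifactory RPM repository by writing a standard .repo file. The instance URL, the repository key, and the helper name write_repo_file are hypothetical. The key location assumes Artifactory's GPG public key REST endpoint (api/gpg/key/public), and the sketch assumes the repository allows anonymous reads.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical instance URL; replace with your own Artifactory base URL.
ARTIFACTORY_URL="${ARTIFACTORY_URL:-http://localhost:8081/artifactory}"

# Write a YUM .repo definition that points a client at an Artifactory RPM repository.
# repo_gpgcheck=1 asks YUM to verify the signed repository metadata (repomd.xml)
# using the key at gpgkey (assumed to be Artifactory's public GPG key endpoint).
write_repo_file() {
  local repo="$1" dest="$2"
  cat > "$dest" <<EOF
[${repo}]
name=${repo}
baseurl=${ARTIFACTORY_URL}/${repo}
enabled=1
gpgcheck=0
repo_gpgcheck=1
gpgkey=${ARTIFACTORY_URL}/api/gpg/key/public
EOF
}
```

Run it as root with a destination such as /etc/yum.repos.d/rpm-release-local.repo, then run yum makecache to have the client fetch the metadata Artifactory has calculated.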

When creating an RPM repository, the Auto Calculate RPM Metadata repository option should be ticked. This lets Artifactory intercept every file deployment, copy, or move operation and trigger the calculation process automatically, so clients are offered the newest metadata as soon as it is available. Metadata calculation can run in one of two ways:

  • Async:

This is the usual path. Whenever a package is deployed (via REST or the UI), a calculation follows. An asynchronous calculation works by intercepting file operations and adding the required indexing operation to an internal Artifactory queue, which is typically processed immediately.

  • Sync:

This method lets you trigger YUM metadata calculations manually. However, it can only be used when the Auto Calculate RPM Metadata option is turned off. Sync is useful when you want to make sure that all of the metadata in your repository is up to date before it is served to requesting clients: the request does not return until the calculation has completed.
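For the asynchronous path, a plain deployment is enough to start indexing. The sketch below uploads an RPM with an HTTP PUT (curl -T) to the repository path. It assumes Auto Calculate RPM Metadata is enabled on the target repository; the instance URL, the ART_USER/ART_PASS variables, and the function names are hypothetical placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical instance URL; replace with your own Artifactory base URL.
ARTIFACTORY_URL="${ARTIFACTORY_URL:-http://localhost:8081/artifactory}"

# Build the deploy URL for a file inside a local repository.
rpm_deploy_url() {
  local repo="$1" target_path="$2"
  printf '%s/%s/%s\n' "$ARTIFACTORY_URL" "$repo" "$target_path"
}

# Upload an RPM with an HTTP PUT. With Auto Calculate RPM Metadata enabled,
# Artifactory queues the metadata calculation on its own after the deploy.
# ART_USER and ART_PASS are hypothetical credential variables.
deploy_rpm() {
  local repo="$1" file="$2" target_path="$3"
  curl -fsS -u "${ART_USER:?ART_USER is not set}:${ART_PASS:?ART_PASS is not set}" \
    -T "$file" "$(rpm_deploy_url "$repo" "$target_path")"
}
```

For example, deploy_rpm rpm-release-local ./mypkg-1.0-1.x86_64.rpm mypkg-1.0-1.x86_64.rpm returns as soon as the upload finishes; the indexing itself happens afterwards from the internal queue.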

Let’s say you have a CI job that deploys many versions of a package to a large repository (e.g., snapshot versions). You could add an extra build step that depends on the response of the Calculate YUM Metadata REST API, called on the rpm-release-local repository with the async query parameter set to 0:

curl -u<USERNAME>:<PASSWORD> -XPOST "localhost:8081/artifactory/api/yum/rpm-release-local?async=0" -i -Lvv

The output of this query is as follows:

* Connected to localhost (::1) port 8081 (#0)
* Server auth using Basic with user 'admin'
> POST /artifactory/api/yum/rpm-release-local?async=0 HTTP/1.1
> Host: localhost:8081
> Authorization: Basic YWRtaW46cGFzc3dvcmQ=
> User-Agent: curl/7.54.0
> Accept: */*

< HTTP/1.1 200 OK
< Server: Artifactory/6.3.2
< X-Artifactory-Id: a9116dfeb1f6dac4:449dde33:1658a295e45:-8000
< Content-Type: text/plain
< Transfer-Encoding: chunked
< Date: Sun, 02 Sep 2018 12:19:56 GMT
YUM metadata calculation for repository 'rpm-release-local' accepted.

Once the calculation completes, Artifactory also triggers a metadata calculation for any RPM virtual repositories that aggregate this repository.
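The extra CI build step described above can be sketched as a small shell function that calls the same endpoint with async=0 and fails the step unless Artifactory answers with HTTP 200. This is a minimal sketch: the instance URL, the ART_USER/ART_PASS variables, and the function names are hypothetical, and the script treats any non-200 status as a failure.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical instance URL; replace with your own Artifactory base URL.
ARTIFACTORY_URL="${ARTIFACTORY_URL:-http://localhost:8081/artifactory}"

# Build the Calculate YUM Metadata endpoint; async=0 requests a synchronous run.
yum_calc_url() {
  local repo="$1" async="${2:-0}"
  printf '%s/api/yum/%s?async=%s\n' "$ARTIFACTORY_URL" "$repo" "$async"
}

# Trigger a synchronous calculation and fail the build step on a non-200 reply.
# ART_USER and ART_PASS are hypothetical credential variables.
recalc_yum_metadata() {
  local repo="$1" status
  status=$(curl -sS -o /dev/null -w '%{http_code}' \
    -u "${ART_USER:?ART_USER is not set}:${ART_PASS:?ART_PASS is not set}" \
    -X POST "$(yum_calc_url "$repo" 0)")
  if [ "$status" != "200" ]; then
    echo "YUM metadata calculation for '$repo' failed (HTTP $status)" >&2
    return 1
  fi
  echo "YUM metadata calculation for '$repo' finished (HTTP $status)"
}
```

In the CI job, calling recalc_yum_metadata rpm-release-local after the deploy steps means later steps only run once the call has returned successfully.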

Artifactory System Properties that can be tuned to optimize your interactions with RPM include:

  • artifactory.rpm.metadata.calculation.workers (eight by default): Controls the number of threads (workers) for local RPM metadata calculations.
  • (three by default): While calculations take place (and in real-world situations, concurrent calculations are likely to be running), Artifactory maintains a record of previous metadata.
  • artifactory.yum.virtual.metadata.calculation.workers (five by default): Controls the number of threads (workers) for virtual RPM metadata calculations.
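These properties are set in the artifactory.system.properties file. Below is a minimal sketch of a helper that sets one key, replacing any existing assignment of the same key. The helper name is hypothetical, and the file location in the usage comment ($ARTIFACTORY_HOME/etc/ for Artifactory 6.x) is an assumption to check against your installation. System properties are typically read at startup, so plan for a restart after changing them.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Set key=value in a Java-style properties file, dropping any earlier line
# that assigns the same key. Rewrites the file in place so that its owner
# and permissions are preserved.
set_system_property() {
  local file="$1" key="$2" value="$3" tmp
  tmp=$(mktemp)
  if [ -f "$file" ]; then
    # Keep every line that does not start with exactly "key=".
    awk -v k="$key" 'index($0, k "=") != 1' "$file" > "$tmp"
  fi
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example (assumed Artifactory 6.x path; restart Artifactory afterwards):
#   set_system_property "$ARTIFACTORY_HOME/etc/artifactory.system.properties" \
#     artifactory.rpm.metadata.calculation.workers 16
```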