How Does Debian Repository Metadata Calculation Work?

Author: Ariel Kabov
Article number: 000004522
Source: Salesforce
First published: 2019-08-11T07:22:32Z
Last modified: 2024-03-10T07:47:59Z
Version: 8

Debian repositories have been available in Artifactory since version 3.3.
In version 5.6, the internal mechanism for Debian metadata calculation changed.

Starting with version 5.6, once a Debian package is deployed to a local repository, an event to index the distribution path of that repository is added to a queue. (For example, for the ‘debian-local’ repository and the Xenial distribution, the distribution path is debian-local/dists/xenial.)
The queue is processed continuously by dedicated Debian metadata workers (8 by default, configurable).
This means that once a Debian package has been uploaded, if the queue is empty and a worker is available, the worker starts handling the event and indexes the metadata.
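
The queue-and-workers flow described above can be sketched as follows. This is a minimal illustration, not Artifactory's implementation: the names `distribution_path` and `run_workers` are hypothetical, the indexing work is a placeholder, and the workers here exit once the queue is drained, whereas long-running workers would keep waiting for new events.

```python
import queue
import threading


def distribution_path(repo_key, distribution):
    """Build the path that an indexing event refers to."""
    return f"{repo_key}/dists/{distribution}"


def run_workers(paths, worker_count=8):
    """Drain queued indexing events with a fixed pool of worker threads."""
    events = queue.Queue()
    for path in paths:
        events.put(path)

    indexed = []
    indexed_guard = threading.Lock()

    def worker():
        while True:
            try:
                path = events.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker exits
            # Placeholder for the real work of rebuilding the index files.
            with indexed_guard:
                indexed.append(path)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return indexed
```

With an empty queue and idle workers, an event is picked up as soon as it is enqueued, which matches the behavior described for a single upload.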

This method works by taking a lock on an entire distribution path, including the repository (for example, debian-local/dists/xenial).
In addition, in an HA cluster all nodes can participate in Debian metadata calculation. Events for a given distribution path and repository are queued and handled one after the other. (In most cases everything happens very quickly, so events do not wait in the queue for long, if at all.)
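
The effect of locking per distribution path can be illustrated with a small sketch. It assumes a single process with in-memory locks. An HA cluster would need a lock shared between nodes, which this example does not attempt. `PathLocks`, `index_path`, and `run_events` are hypothetical names.

```python
import threading
from collections import defaultdict


class PathLocks:
    """Hand out one lock per distribution path (hypothetical helper)."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = defaultdict(threading.Lock)

    def lock_for(self, path):
        with self._guard:  # protect the dictionary itself
            return self._locks[path]


def index_path(locks, path, log):
    """Record start/end markers while holding the lock for this path."""
    with locks.lock_for(path):
        log.append(("start", path))
        log.append(("end", path))


def run_events(paths):
    """Handle one event per entry in paths, each on its own thread."""
    locks, log = PathLocks(), []
    threads = [
        threading.Thread(target=index_path, args=(locks, path, log))
        for path in paths
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return log
```

Events for the same path never overlap, so their start and end markers always alternate, while events for different paths are free to interleave.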

The “Optional Index Compression Formats” setting can be configured. When a local Debian repository is created via the UI, the Bzip2 (.bz2 extension) checkbox under Optional Index Compression Formats is selected by default. If this format is not needed, clearing the checkbox will improve metadata calculation time.
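
To see why each extra format adds work, consider the sketch below: every enabled format means compressing the same index one more time on each calculation. The function name `build_index_variants` is hypothetical, and the choice to always produce the plain and gzip variants is an assumption made for this example, not something stated in this article.

```python
import bz2
import gzip


def build_index_variants(packages_text, optional_formats=()):
    """Return the index files that would be written for one Packages index."""
    data = packages_text.encode("utf-8")
    # Assumption: the plain and gzip variants are always produced.
    variants = {"Packages": data, "Packages.gz": gzip.compress(data)}
    if "bz2" in optional_formats:
        # Each optional format is one more compression pass per calculation.
        variants["Packages.bz2"] = bz2.compress(data)
    return variants
```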

Virtual repository support for Debian was added to Artifactory in version 6.6. A virtual repository has a separate metadata calculation process, which aggregates and merges the Packages and Release files from the selected aggregated repositories. As with local repositories, indexing is done per distribution path and virtual repository.
The implementation of virtual repository metadata calculation is similar to that of local repositories in that it uses a dedicated queue and workers. For virtual repository metadata calculation there are 5 workers by default (also configurable).
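
As a rough illustration of what merging Packages files involves, the sketch below parses stanzas from several indexes and combines them. The deduplication rule (first occurrence wins, keyed by package, version, and architecture) is an assumption made for this example. The article does not describe how Artifactory resolves duplicates. `parse_packages` and `merge_packages` are hypothetical names.

```python
def parse_packages(text):
    """Split a Packages index into stanzas, each a dict of its fields."""
    stanzas = []
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if line.startswith((" ", "\t")) or ":" not in line:
                continue  # this sketch ignores multi-line continuation text
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        if fields:
            stanzas.append(fields)
    return stanzas


def merge_packages(indexes):
    """Merge several Packages indexes, keeping the first copy of each package."""
    merged, seen = [], set()
    for text in indexes:
        for stanza in parse_packages(text):
            key = (
                stanza.get("Package"),
                stanza.get("Version"),
                stanza.get("Architecture"),
            )
            if key not in seen:
                seen.add(key)
                merged.append(stanza)
    return merged
```

The cost grows with the number and size of the aggregated indexes, which is why the parameters that limit what gets indexed matter for virtual repositories.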

The Debian virtual repository has some important parameters that can significantly affect calculation time and performance:
“Indexed Remote Architectures” - Because local repositories are merged with remote ones, specifying the architectures to index from remote repositories can save significant time. Always aim to index only what is needed.
“Optional Index Compression Formats” - Similar to local repositories, but with an even higher impact. Indexing only the needed formats saves calculation time.
“Metadata Retrieval Cache Period” - The default is 10 minutes. This defines the period during which the calculated metadata cache is treated as “not expired”. Once it expires, a metadata calculation must occur in order to download files. A higher value means fewer calculations, but recent items might not be available.

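Two of these parameters can be illustrated with a short sketch. The first function lists the remote index files implied by a set of architectures, using the standard Debian repository layout (dists/<distribution>/<component>/binary-<arch>/Packages). The second checks whether a cached calculation is still within the cache period. Both function names are hypothetical, and treating the cache as expired exactly at the boundary is an assumption.

```python
from datetime import datetime, timedelta

DEFAULT_CACHE_PERIOD = timedelta(minutes=10)


def remote_index_paths(distribution, components, architectures):
    """List the Packages indexes to fetch for the chosen architectures."""
    return [
        f"dists/{distribution}/{component}/binary-{arch}/Packages"
        for component in components
        for arch in architectures
    ]


def is_cache_expired(last_calculated, now, cache_period=DEFAULT_CACHE_PERIOD):
    """Return True when the cached metadata needs to be recalculated."""
    return now - last_calculated >= cache_period
```

Every architecture that is left out removes one index per component from the merge, and a longer cache period reduces how often the merge runs at all.
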
Note: The internal metadata of every indexed Debian package is cached locally under $ARTIFACTORY_HOME/data/.cache/debian/. This folder contains a directory hierarchy that follows the Debian repositories in Artifactory, and under these directories are the extracted Debian control metadata files of the packages that have been calculated. This cache becomes more efficient, and even crucial, when running a calculation of an entire Debian repository, because the pre-extracted control files are read from the local disk instead of being extracted from each package again.
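
The benefit of this cache can be sketched as a simple read-through lookup. The file naming inside the cache directory is an assumption for illustration only, since the article does not specify the exact layout below the per-repository directories. `cached_control` and its `extract` callback are hypothetical.

```python
from pathlib import Path


def cached_control(cache_root, repo_key, package_name, extract):
    """Return a package's control metadata, extracting it only on a cache miss."""
    # Hypothetical layout: <cache_root>/<repo_key>/<package_name>.control
    cache_file = Path(cache_root) / repo_key / f"{package_name}.control"
    if cache_file.is_file():
        return cache_file.read_text(encoding="utf-8")
    control = extract()  # expensive step: unpack the .deb and read its control file
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(control, encoding="utf-8")
    return control
```

During a full repository recalculation, every package already present in the cache skips the extraction step entirely.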