How does Debian repository metadata calculation work?

Ariel Kabov
2019-08-12 07:09

Debian repositories are available in Artifactory from version 3.3.
In version 5.6, the internal mechanism for Debian metadata calculation changed.

Starting with version 5.6, once a Debian package is deployed to a local repository, an event to index the distribution path of that repository is added to a queue. For example, for the ‘debian-local’ repository and the Xenial distribution, the distribution path is debian-local/dists/xenial.
The queue is processed continuously by dedicated Debian metadata workers (8 by default, configurable).
This means that once a Debian package has been uploaded, if the queue is empty and a worker is available, the worker will start handling the event and index the metadata.

This method works by creating a lock for an entire distribution path, including the repository (debian-local/dists/xenial).
In addition, in an HA cluster all nodes can participate in Debian metadata calculation. Events for each combination of distribution path and repository are queued and handled one after the other. (In most cases everything happens very quickly, so events do not queue for long, if at all.)
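
The queue, worker pool, and per-path lock described above can be sketched as follows. This is a simplified illustration, not Artifactory's implementation: the function `run_indexing`, the placeholder `index_distribution`, and the use of in-process threads are all assumptions made for the example. In this sketch a worker that picks up an event for a locked path simply waits for the lock.

```python
import queue
import threading
from collections import defaultdict


def run_indexing(paths, workers=8):
    """Process one indexing event per entry in `paths`; return the paths indexed."""
    events = queue.Queue()
    path_locks = defaultdict(threading.Lock)  # one lock per distribution path
    guard = threading.Lock()                  # protects path_locks and `indexed`
    indexed = []

    def index_distribution(path):
        # Placeholder for the real work (rebuilding the Packages and Release files).
        with guard:
            indexed.append(path)

    def worker():
        while True:
            path = events.get()
            if path is None:      # sentinel: no more events for this worker
                return
            with guard:
                lock = path_locks[path]
            with lock:            # only one worker handles a given path at a time
                index_distribution(path)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for p in paths:
        events.put(p)
    for _ in threads:
        events.put(None)
    for t in threads:
        t.join()
    return sorted(indexed)
```

Two events for debian-local/dists/xenial are handled one after the other here, while an event for a different distribution path can be handled in parallel by another worker.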

“Optional Index Compression Formats” can be configured. When a local Debian repository is created via the UI, the Bzip2 (.bz2 extension) checkbox under Optional Index Compression Formats is selected by default. If it is not needed, deselecting it will improve the metadata calculation time.

Virtual repository support for Debian was added to Artifactory in version 6.6. The virtual repository has a separate metadata calculation process, which aggregates and merges the Packages and Release files from the selected aggregated repositories. As with local repositories, indexing is performed per combination of distribution path and virtual repository.
The implementation of the virtual repository metadata calculation is similar to the local repository in terms of a dedicated queue and workers. For virtual repository metadata calculation there are 5 workers by default (also configurable).
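
To illustrate what merging Packages files involves, here is a minimal sketch that parses the standard Debian Packages stanza format and merges several indices. The function names are hypothetical, and the rule that the first repository listed wins on a duplicate (same Package, Version and Architecture) is an assumption made for the example; the article does not describe how Artifactory resolves duplicates.

```python
def parse_packages(text):
    """Parse a Debian Packages index into a list of {field: value} stanzas."""
    stanzas = []
    for block in text.strip().split("\n\n"):
        fields, last_key = {}, None
        for line in block.splitlines():
            if line[:1] in (" ", "\t") and last_key:   # continuation line
                fields[last_key] += "\n" + line
            elif ":" in line:
                key, _, value = line.partition(":")
                fields[key], last_key = value.strip(), key
        if fields:
            stanzas.append(fields)
    return stanzas


def merge_packages(indices):
    """Merge parsed indices; the first repository listed wins on duplicates (assumed)."""
    merged = {}
    for stanzas in indices:
        for stanza in stanzas:
            key = (stanza.get("Package"), stanza.get("Version"), stanza.get("Architecture"))
            merged.setdefault(key, stanza)
    return list(merged.values())
```

The more repositories, architectures and compression formats are involved, the more of this parsing and writing work has to be repeated, which is why the parameters discussed in this article matter.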

The Debian virtual repository has some important parameters that can significantly affect calculation time and performance:
“Indexed Remote Architectures” – Because local repositories are merged with remote ones, specifying the architectures to index from remote repositories can save significant time. We should always strive to index only what’s needed.
“Optional Index Compression Formats” – Similar to local repositories, but with an even higher impact. Indexing only the needed formats will save calculation time.
“Metadata Retrieval Cache Period” – The default is 10 minutes. This defines the period during which the calculated cache is treated as “not expired”. Once it expires, a metadata calculation must occur in order to download files. A higher value means fewer calculations, but recent items might not be available.

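The trade-off behind the Metadata Retrieval Cache Period can be shown with a small sketch. The function name and the exact comparison are assumptions for illustration; only the 10-minute default comes from the text above.

```python
from datetime import datetime, timedelta


def needs_recalculation(last_calculated, now, cache_period=timedelta(minutes=10)):
    """Return True when the cached metadata is at least `cache_period` old."""
    return now - last_calculated >= cache_period
```

With a longer period, more requests are answered from the existing metadata, at the cost of recently added packages appearing later.
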
Note: The internal metadata of every indexed Debian package is cached locally under $ARTIFACTORY_HOME/data/.cache/debian/. This folder contains a directory hierarchy that mirrors the Debian repositories in Artifactory, and under these directories are the extracted Debian control metadata files of the packages that have been calculated. This cache makes calculation more efficient, and is even crucial when recalculating an entire Debian repository, because the pre-extracted control files are read from the local disk instead of being extracted from each package again.
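
The benefit of this cache can be sketched as a read-through lookup. The file layout, the `.control` suffix, and the `extract` callback below are hypothetical and chosen only for illustration; the actual on-disk format used by Artifactory is not described here.

```python
from pathlib import Path


def cached_control(cache_root, repo_key, package_path, extract):
    """Return the control text for a package, extracting it only on a cache miss."""
    cache_file = Path(cache_root) / repo_key / (package_path + ".control")
    if cache_file.exists():
        return cache_file.read_text()     # cheap: read from local disk
    text = extract(package_path)          # expensive: unpack the .deb
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(text)
    return text
```

During a full repository recalculation, every package that was indexed before takes the cheap branch, so only new packages have to be unpacked.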
 

REST APIs

Calculate Debian Repository Metadata – Recalculate the metadata of an entire repository.
Synchronous by default. Applicable to Local and Virtual repositories.

Calculate Cached Remote Debian Repository Coordinates – This API will add coordinates to cached Debian packages. This will make the packages resolvable if later moved to a local repository.
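
As a rough sketch, the two calls could be prepared from Python as shown below. The base URL is a hypothetical local instance. The endpoint paths (`api/deb/reindex/{repoKey}` and `api/deb/indexCached/{repoPath}`) and the `async` query parameter are not stated in this article; they are given here as best recalled from the Artifactory REST API reference and should be verified against the documentation for your version.

```python
import urllib.request

BASE_URL = "http://localhost:8081/artifactory"   # hypothetical instance URL


def reindex_request(repo_key, asynchronous=False):
    """Build the POST request for 'Calculate Debian Repository Metadata'."""
    # Assumed endpoint and parameter; synchronous (async=0) is the default.
    url = f"{BASE_URL}/api/deb/reindex/{repo_key}?async={1 if asynchronous else 0}"
    return urllib.request.Request(url, method="POST")


def index_cached_request(repo_path):
    """Build the POST request for 'Calculate Cached Remote Debian Repository Coordinates'."""
    # Assumed endpoint; repo_path points at the cache of a remote repository.
    url = f"{BASE_URL}/api/deb/indexCached/{repo_path}"
    return urllib.request.Request(url, method="POST")
```

The requests are only built here, not sent. To send one, add the authentication headers your instance requires and pass the request to `urllib.request.urlopen`.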
 

Tuning Debian Metadata Calculation

Heads Up! – Debian metadata calculation is based on a locking mechanism. Once a worker has started to index a specific combination of repository and distribution, no other worker will start indexing another event for the same path while the lock exists. Increasing the number of workers won’t help if all packages are deployed to the same repository and distribution.
The workers mentioned below are part of the Artifactory async thread pool.
When changing the configurations below, you may need to consider increasing the total size of the async thread pool.
Read more at: How do I tune Artifactory for heavy loads?
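
The consequence of the locking mechanism for tuning can be expressed as a one-line bound. This is a simplified model, assuming each queued event is identified by its repository and distribution path; the function name is hypothetical.

```python
def effective_parallelism(queued_paths, workers):
    """Upper bound on events indexed at once: one per distinct locked path."""
    return min(workers, len(set(queued_paths)))
```

For example, twenty queued events that all target debian-local/dists/xenial give a bound of 1 no matter how many workers are configured, whereas events spread over many repositories or distributions can keep all workers busy.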

The following properties are configured in $ARTIFACTORY_HOME/etc/artifactory.system.properties.

Debian Local repository metadata calculation workers:
artifactory.debian.metadata.calculation.workers = 8

Debian Virtual repository metadata calculation workers:
artifactory.debian.virtual.metadata.calculation.workers = 5

Debian Cached Remote repository coordinates calculation workers:
artifactory.debian.coordinates.calculation.workers = 4
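
To check which worker counts are in effect, a small script can read the properties file and fall back to the defaults listed above. This is a minimal sketch: the function name is hypothetical, and the parser handles only simple `key = value` lines, not the full Java properties syntax.

```python
DEFAULTS = {
    "artifactory.debian.metadata.calculation.workers": 8,
    "artifactory.debian.virtual.metadata.calculation.workers": 5,
    "artifactory.debian.coordinates.calculation.workers": 4,
}


def debian_worker_settings(properties_text):
    """Return the effective worker counts, falling back to the documented defaults."""
    settings = dict(DEFAULTS)
    for line in properties_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not key = value.
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() in settings:
            settings[key.strip()] = int(value.strip())
    return settings
```

Pass it the contents of artifactory.system.properties, for example `debian_worker_settings(open(path).read())`.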

Troubleshooting Debian Metadata problems

Disclaimer: Changing log levels from their original value may result in performance degradation. Handle with care when applying to production systems.

To troubleshoot possible problems with Debian metadata calculation, we can add the following loggers to $ARTIFACTORY_HOME/etc/logback.xml.
Once they are applied, a "debian.log" file will appear in the $ARTIFACTORY_HOME/logs/ folder. A restart is not required for these changes to take effect.

Most Debian-related operations:
<appender name="debian" class="ch.qos.logback.core.rolling.RollingFileAppender">
      <File>${artifactory.home}/logs/debian.log</File>
      <encoder>
         <pattern>%date ${artifactory.contextId}[%thread] [%-5p] \(%-20c{3}:%L\) - %m%n</pattern>
      </encoder>
      <rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
         <FileNamePattern>${artifactory.home}/logs/debian.%i.log</FileNamePattern>
         <maxIndex>13</maxIndex>
      </rollingPolicy>
      <triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
         <MaxFileSize>25MB</MaxFileSize>
      </triggeringPolicy>
</appender>

<logger name="org.jfrog.repomd.debian" additivity="false">
<level value="trace"/>
<appender-ref ref="debian"/>
</logger>
<logger name="org.artifactory.addon.debian" additivity="false">
<level value="trace"/>
<appender-ref ref="debian"/>
</logger>
<logger name="org.jfrog.repomd.dpkg" additivity="false">
<level value="debug"/>
<appender-ref ref="debian"/>
</logger>

Mover service (responsible for copy/move operations and called during calculation finalization; this is a cross-Artifactory logger, not only Debian related):
<logger name="org.artifactory.repo.service.mover" additivity="false">
<level value="trace" />
<appender-ref ref="debian"/>
</logger>

Work queue (the core of the work queue used by Debian metadata calculations; this is a very verbose, cross-Artifactory logger):
<logger name="org.artifactory.work.queue" additivity="false">
<level value="trace"/>
<appender-ref ref="debian"/>
</logger>

Information about reads from the local cache (this logger helps in troubleshooting issues with calculations of an entire repository):
<logger name="org.artifactory.addon.dpkgcommon" additivity="false">
<level value="trace"/>
<appender-ref ref="debian"/>
</logger>
<logger name="org.jfrog.repomd.dpkg.extractor.DpkgMetadataExtractor" additivity="false">
<level value="trace"/>
<appender-ref ref="debian"/>
</logger>