JFrog Xray RabbitMQ Queues

Snir Ben Ami
2021-09-14 15:10


Xray uses RabbitMQ for managing asynchronous operations.
This document describes the main queues and the way Xray utilizes them.

RabbitMQ Queues

There are 3 main types of queues:

  • New Content – Does not have a suffix – for example, Index. This type of queue is responsible for events related to new content added to the system. For example, uploading a new artifact to a repository that is marked for indexing will create a message in the Index queue.

  • Existing Content – Includes the ‘ExistingContent’ suffix – for example, IndexExistingContent. This type of queue is responsible for content that already exists in the system. For example, reindexing a repository will send messages to this queue.

  • Retry – Failed messages are sent to this queue and remain there for a TTL. Once the TTL has elapsed, the messages are returned to the original queue. See the Retry section below.
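The naming convention above can be sketched in a few lines. This is an illustrative helper (not JFrog code): queues with the ‘ExistingContent’ suffix handle content that already exists, and all other content queues handle new content.

```python
def classify_queue(name: str) -> str:
    """Classify an Xray content queue by the suffix convention described above."""
    if name.endswith("ExistingContent"):
        return "existing content"
    return "new content"

print(classify_queue("index"))                 # new content
print(classify_queue("indexExistingContent"))  # existing content
```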
     

Workers

Each queue has workers that consume messages from it. The number of workers is configured per application node.
For example, if the system is configured with 10 Index workers and there are 3 HA nodes, the system will have 30 consumers for the Index queue.
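The worked example above is a simple multiplication, since the per-node worker setting applies to every node independently. `total_consumers` is a hypothetical helper name for illustration, not part of Xray.

```python
def total_consumers(workers_per_node: int, ha_nodes: int) -> int:
    """Total consumers for a queue across an HA cluster:
    each node runs its own full set of workers."""
    return workers_per_node * ha_nodes

# The example from the text: 10 Index workers on 3 HA nodes.
print(total_consumers(10, 3))  # 30
```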
 

Index

Downloads artifacts from JFrog Artifactory and creates the component graph.

Queues

  • index – Indexes new artifacts. Affects the indexing of new builds, artifacts, and Release Bundles.

  • indexExistingContent – Indexes existing artifacts; triggered by reindexing a repository.
     

Persist 

Persists the dependency graph to the PostgreSQL database.

Queues

  • persist – Persists the dependency graph of new artifacts. Messages are published here by Index workers. Affects the indexing of new builds, artifacts, and Release Bundles. Also responsible for deleting unused dependency graphs after delete events.

  • persistExistingContent – Persists the dependency graph of existing artifacts. Messages are published here by the IndexExistingContent workers.
     

Analysis

Matches security data to components.

Queues

  • analysis – Analyzes the dependency graph of new artifacts and matches security vulnerabilities.

  • analysisExistingContent – Analyzes the dependency graph of existing artifacts and matches security vulnerabilities. Can be triggered by rescanning a Watch or by a reindex.
     

Alert

Creates violations according to Policies and Watches, prepares emails, and webhooks.

Queues

  • alert – Creates violations according to Policies and Watches, and generates emails and webhooks for new artifacts.

  • alertImpactAnalysis – Creates violations for existing artifacts; triggered by Impact Analysis (see below).
     

Notification

Performs the actual sending of emails and webhooks.

Queues

  • notification – All generated notifications.
 

Impact Analysis

The process of updating the dependency graph with new vulnerabilities and license data following a DB-sync.

Queues

  • impactAnalysis – New vulnerabilities and license data for existing components.
 

MDS Update

The process of updating the Metadata Server with security information. This information is exposed when searching through the UI or API.

Queues

  • mdsUpdate – Sends updates over REST to the Metadata Service (installed with Artifactory) for newly deployed artifacts.

  • mdsUpdateExistingContent – Sends updates over REST to the Metadata Service (installed with Artifactory) for existing artifacts. Can be triggered by other ‘ExistingContent’ workers. Most of the load on this queue derives from the Impact Analysis flow.

Retry

The built-in retry mechanism works as follows: 

  1. Failed messages with recoverable errors (such as network errors, disk limits, etc.) are written to the corresponding retry queue with a small TTL (starting at 6 seconds).

  2. After the TTL has expired, the messages will automatically be moved (by RabbitMQ) to the original queue.

  3. If the error recurs, the message is transferred to the retry queue again with double the previous TTL. Once the TTL exceeds 7 days, the message is considered failed and is persisted in the failure table (also presented in the UI under ‘System Messages’).
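The retry schedule above can be sketched as follows. This is an illustrative model, not Xray code: it assumes each failure doubles the previous TTL (exponential backoff from 6 seconds) and treats the 7-day limit as a hard cutoff.

```python
from datetime import timedelta

def retry_schedule(start_seconds: int = 6,
                   max_ttl: timedelta = timedelta(days=7)):
    """Yield each retry TTL, doubling until the limit is exceeded."""
    ttl = timedelta(seconds=start_seconds)
    while ttl <= max_ttl:
        yield ttl
        ttl *= 2  # the next retry waits twice as long

schedule = list(retry_schedule())
print(len(schedule))   # 17 retry attempts before permanent failure
print(schedule[0])     # 0:00:06
print(schedule[-1])    # 4 days, 13:13:36  (6s * 2**16)
```

Under these assumptions a message is retried 17 times over roughly 9 days of cumulative waiting before it lands in the failure table.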

Scale

Scaling up the workers can be done in 2 ways:

  • Increasing the number of workers for specific queues

  • Adding new Xray nodes

Considerations for scaling out/up – scaling may affect the following external resources:

  • PostgreSQL – May increase the number of connections as well as CPU, disk, and memory usage.

  • Artifactory and Metadata Server – Increasing the number of MdsUpdate workers can significantly increase the load on Artifactory and the Metadata Server.

  • RabbitMQ – Increasing the number of workers usually does not adversely affect RabbitMQ. RabbitMQ typically consumes significant resources only when queues grow very large.

Configuration

All queue workers are configurable through the REST API. The one exception is the MDS worker, which is configurable only in the system.yaml; there is no separate configuration for MdsUpdateExistingContent and MdsUpdate.

A restart is required to apply changes to the worker configuration.

In the future, we plan to move all the configuration of workers to the system.yaml.