How to tune Xray for heavy loads?

Ariel Kabov
2019-08-25 06:48

Relevant Versions: Xray 2.8.8 and above.

Xray comes with a predefined set of default parameters and configurations.
If you believe your Xray server is under-utilized, or you wish to tune Xray to handle a higher load, this article should help you achieve that.

While it is always possible to scale horizontally by adding nodes to your HA cluster, here we will focus on vertical scaling.

Memory & CPU

The minimum system requirements for Xray are 16GB of RAM and an 8-core CPU.
Increasing these will allow you to scale Xray to higher limits. It is important to monitor the Xray system resources, in addition to the DB servers.
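To keep an eye on the basic system resources, a few standard Linux commands can serve as a starting point (a minimal sketch, assuming $XRAY_HOME points to your Xray installation directory; a proper monitoring stack is preferable for continuous tracking):

nproc                     # number of available CPU cores
free -h                   # memory usage
df -h $XRAY_HOME          # free disk space on the Xray home directory mount
top -b -n 1 | head -20    # snapshot of the busiest processes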

Externalize Databases

If you haven’t done so yet, our recommendation is to externalize the databases used by Xray to dedicated servers. This ensures healthier growth and easier troubleshooting in case of an issue.

Storage

When scaling up Xray, do not forget to allocate extra storage! Allowing more parallel indexing tasks will require more disk space.

RabbitMQ

As Xray uses RabbitMQ for queue & task management, it is important to familiarize yourself with RabbitMQ and tune it accordingly as well.

RabbitMQ Production Checklist
RabbitMQ Runtime Tuning
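
A quick way to check whether messages are piling up faster than the workers consume them is to query RabbitMQ directly on its host (a minimal sketch, assuming the rabbitmqctl CLI is available there; the RabbitMQ management UI exposes the same information):

rabbitmqctl status                                 # node health, memory and file descriptor usage
rabbitmqctl list_queues name messages consumers    # backlog and consumer count per queue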

Database Connections

We can alter the maximum number of connections each Xray microservice can open to the PostgreSQL DB.
This is configured in the $XRAY_HOME/config/xray_config.yaml file.

Default values:

maxOpenConnServer: 30
maxOpenConnPersist: 30
maxOpenConnAnalysis: 30
maxOpenConnIndexer: 30

Tuning example:

maxOpenConnServer: 100
maxOpenConnPersist: 100
maxOpenConnAnalysis: 100
maxOpenConnIndexer: 100

Important: Do not forget to increase the number of connections the Postgres DB can accept.
As a rule of thumb, the DB will need to accept a number of connections based on:

Total # of connections = (number of nodes) * (maxOpenConnServer + maxOpenConnPersist + maxOpenConnAnalysis + maxOpenConnIndexer) + 50
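
For example, assuming the tuning example above is applied on a 2-node Xray cluster (the node count here is only for illustration), PostgreSQL would need to accept at least 2 * (100 + 100 + 100 + 100) + 50 = 850 connections:

# postgresql.conf (a restart of PostgreSQL is required for this to take effect)
max_connections = 850

The active value can be verified from psql with:

SHOW max_connections;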

Queue Workers

By default, Xray starts with all queue workers set to 8.
This is configurable via the Xray UI, under Admin → General.
The configured number represents how many messages the corresponding Xray microservice can handle in parallel.
For instance, the “Index” value defines how many concurrent packages each Indexer node can process.

In Xray versions 2.9.0 and above, you can set the worker value via the UI.

For versions below 2.9.0, a higher value can be applied by updating it directly in MongoDB.
To review the current configuration via a MongoDB query:

> db.configuration.find({config_id:"xrayConfig"}).pretty()

To update the worker settings:

> db.configuration.update({config_id: "xrayConfig"},{$set : {"general_settings.index_workers": NumberInt(40)}})
> db.configuration.update({config_id: "xrayConfig"},{$set : {"general_settings.bin_mgr_workers": NumberInt(40)}})
> db.configuration.update({config_id: "xrayConfig"},{$set : {"general_settings.persist_workers": NumberInt(40)}})
> db.configuration.update({config_id: "xrayConfig"},{$set : {"general_settings.alert_workers": NumberInt(40)}})
> db.configuration.update({config_id: "xrayConfig"},{$set : {"general_settings.analysis_workers": NumberInt(40)}})
> db.configuration.update({config_id: "xrayConfig"},{$set : {"general_settings.impact_analysis_workers": NumberInt(40)}})
> db.configuration.update({config_id: "xrayConfig"},{$set : {"general_settings.notification_workers": NumberInt(40)}})

In the above example we set the number of workers for all queues to 40.
Do not forget that each worker is a separate Goroutine (as Xray is written in Go), so a high worker count is limited by the available CPU cores.

Important: By increasing “bin_mgr_workers” we allow Xray to open more concurrent connections to Artifactory, resulting in more concurrent downloads.
A high value here can also impact the Artifactory instance, so it is important to monitor Artifactory as well (and consider tuning Artifactory too, as described below).

*Depending on the scale, you might need to modify the RabbitMQ Virtual Host limit.
To do so, run the following on the RabbitMQ host (this allows an unlimited number of connections):

rabbitmqctl set_vhost_limits -p / '{"max-connections": -1}'
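
To verify the limit currently in effect for the virtual host (assuming a RabbitMQ version that ships the list_vhost_limits command):

rabbitmqctl list_vhost_limits -p /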
 

Tuning Artifactory for Xray

In Artifactory, there are several properties that can be configured to tune the Artifactory <> Xray interactions.
By default, Artifactory checks every 60 seconds whether it has new events it should send to Xray. This interval, along with some other parameters, can be altered.

These properties are set in $ARTIFACTORY_HOME/etc/artifactory.system.properties.

Tuning example:

artifactory.xray.indexer.intervalSecs = 30

Property name | Usage | Default
artifactory.xray.indexer.intervalSecs | Interval between event submissions | 60
artifactory.xray.client.block.cache.expiration.intervalSecs | Cache of artifacts that have a scanning status | 300
artifactory.xray.client.block.unscanned.cache.expiration.intervalSecs | Cache of artifacts that don’t have a scanning status | 120
artifactory.xray.client.block.cache.size |  | 10000
artifactory.xray.client.heartbeat.intervalSecs | Interval between each Xray server status check | 5
artifactory.xray.client.max.connections |  | 50
artifactory.xray.client.builds.socket.timeout.millis | Build client – anything related to scan build operations | 600000
artifactory.xray.client.normal.socket.timeout.millis | Normal client – everything else | 5000

Changing any of the other parameters is not expected to tune Xray better for heavy loads; they are listed here for general knowledge.

Manually Vacuuming PostgreSQL during off-hours

Xray triggers a vacuum of the “files” table in Postgres once a week.
By default, this occurs exactly one week after the last time Xray was started.

In large scale environments, a `vacuum full` operation can take up to several minutes, and during this period Xray might not behave as expected.
To avoid that, you can manually invoke vacuuming during off-hours; that way, even when Xray triggers its own `vacuum`, the table will have been vacuumed recently and the operation shouldn’t take long.

One option to achieve this is to use pg_cron to schedule these statements:

vacuum full files;
vacuum full root_files;
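
A minimal sketch of such a schedule, assuming the pg_cron extension is available on the Xray PostgreSQL server and that Sunday 03:00 is an off-hours window in your environment (both are assumptions; adjust the cron expression to your own quiet period):

-- pg_cron must be listed in shared_preload_libraries before the extension can be created
CREATE EXTENSION IF NOT EXISTS pg_cron;

-- schedule a weekly off-hours VACUUM FULL of the two large tables (every Sunday at 03:00)
SELECT cron.schedule('0 3 * * 0', 'VACUUM FULL files;');
SELECT cron.schedule('0 3 * * 0', 'VACUUM FULL root_files;');

-- review the scheduled jobs
SELECT * FROM cron.job;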