XRAY: How to troubleshoot RabbitMQ-related issues that prevent Xray startup

Vignesh Surendrababu
2022-07-27 15:31

Overview:

RabbitMQ is a message queue service that Xray uses to perform asynchronous operations.

RabbitMQ must be up and running when Xray starts, so a RabbitMQ startup failure can prevent the Xray service from starting. This article describes common RabbitMQ errors that block Xray startup, their possible causes, and the steps to resolve them.

RabbitMQ failure scenarios:

Issue 1:  

Error connecting to rabbit message queue check mq settings. Error: Exception (403) Reason: "username or password not allowed"

Xray ships with a bundled RabbitMQ and sets the Erlang cookie value as the RabbitMQ password for the guest user. In some cases, the default password is hardcoded as “default_pass = guest” in the rabbitmq.conf file under the $JFROG_HOME/xray/app/bin/rabbitmq/ directory.

In this scenario, update system.yaml so that the username and password match those in rabbitmq.conf, as shown below, and then restart Xray.

    rabbitMq:
        erlangCookie:
            value: JFXR_RABBITMQ_COOKIE
        url: amqp://localhost:5672/
        username: guest
        password: guest

To verify the username and password, run the following curl command as the Xray user to query the RabbitMQ vhosts:

    curl --user guest:guest http://rabbitmq:15672/api/vhosts

If authentication fails, navigate to the $JFROG_HOME/xray/app/third-party/rabbitmq/sbin directory and run the command below to change the password of the guest user that Xray uses to connect to RabbitMQ:

    ./rabbitmqctl change_password guest guest

Note: The command above sets the password of the user guest to guest.
Also, make sure to use the same password in rabbitmq.conf and system.yaml.

Once the password is updated, restart Xray.
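
The consistency check between rabbitmq.conf and system.yaml can be scripted. The following is a minimal sketch, not an official JFrog tool: the function names are hypothetical, and the parsers assume simple, unquoted, single-line values like the ones shown in this article.

```shell
#!/bin/sh
# Sketch: compare the RabbitMQ password in rabbitmq.conf with the one in system.yaml.
# Assumption: values are unquoted and written on a single line.

# Print the value of default_pass from a rabbitmq.conf-style file.
conf_default_pass() {
    awk '
        /^[ \t]*default_pass[ \t]*=/ {
            sub(/^[ \t]*default_pass[ \t]*=[ \t]*/, "")
            sub(/[ \t]+$/, "")
            print
            exit
        }
    ' "$1"
}

# Print the first "password:" value that appears after a "rabbitMq:" line.
yaml_rabbit_password() {
    awk '
        /^[ \t]*rabbitMq:[ \t]*$/ { in_section = 1; next }
        in_section && /^[ \t]*password:/ {
            sub(/^[ \t]*password:[ \t]*/, "")
            sub(/[ \t]+$/, "")
            print
            exit
        }
    ' "$1"
}

# Return 0 only if both files define the same non-empty password.
# Usage: passwords_match <rabbitmq.conf> <system.yaml>
passwords_match() {
    conf_pass=$(conf_default_pass "$1")
    yaml_pass=$(yaml_rabbit_password "$2")
    [ -n "$conf_pass" ] && [ "$conf_pass" = "$yaml_pass" ]
}
```

For example, `passwords_match "$JFROG_HOME/xray/app/bin/rabbitmq/rabbitmq.conf" /path/to/system.yaml || echo "passwords differ"` reports a mismatch; the system.yaml path is a placeholder for wherever the file lives in your installation.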

Issue 2:

Producer channel can not be allocated, error from rabbitMQ: Exception (504) Reason: "channel/connection is not open"

This error indicates that the channel Xray is using has been closed, either explicitly or because of a channel exception. Inspect the RabbitMQ log to find out more.
The RabbitMQ log file may show an error like the following:

    [info] <0.272.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
    Error while waiting for Mnesia tables: {timeout_waiting_for_tables,['rabbit@xray-eu-rabbitmq-ha-2.xray-eu-rabbitmq-ha-discovery.xray.svc.cluster.local','rabbit@xray-eu-rabbitmq-ha-1.xray-eu-rabbitmq-ha-discovery.xray.svc.cluster.local','rabbit@xray-eu-rabbitmq-ha-0.xray-eu-rabbitmq-ha-discovery.xray.svc.cluster.local'],[rabbit_durable_queue]}
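
To check quickly whether a given log file contains this Mnesia timeout, a small helper like the one below may be enough. It is only a sketch: the function names are hypothetical, and the log file path is passed in as an argument because it depends on the installation.

```shell
#!/bin/sh
# Sketch: look for the Mnesia table timeout in a RabbitMQ log file.

# Return 0 if the log contains the timeout_waiting_for_tables error.
# Usage: has_mnesia_timeout <logfile>
has_mnesia_timeout() {
    grep -q 'timeout_waiting_for_tables' "$1"
}

# Print the matching lines with their line numbers for closer inspection.
# Usage: mnesia_timeout_lines <logfile>
mnesia_timeout_lines() {
    grep -n 'timeout_waiting_for_tables' "$1"
}
```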

This error can be caused by corrupted Mnesia tables. To resolve it, stop Xray, remove the contents of the $JFROG_HOME/xray/var/data/rabbitmq/mnesia directory, and then start Xray.

Note: Deleting the contents of the mnesia directory also deletes the RabbitMQ message queues and exchanges. If any new indexing was performed, those artifacts might need to be reindexed.
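
Because the cleanup is destructive, it may be safer to archive the directory before emptying it. The sketch below assumes Xray is already stopped and that tar and find are available; the function name and the backup location are hypothetical, and the archive must be written outside the directory being cleared.

```shell
#!/bin/sh
# Sketch: back up the contents of a directory to a tar.gz archive, then empty it.
# Usage: backup_and_clear_dir <directory> <backup-archive>
backup_and_clear_dir() {
    target_dir=$1
    archive=$2
    if [ ! -d "$target_dir" ]; then
        echo "not a directory: $target_dir" >&2
        return 1
    fi
    # Archive first; nothing is deleted unless the backup succeeded.
    tar -czf "$archive" -C "$target_dir" . || return 1
    # Remove everything inside the directory (dotfiles included) but keep the directory itself.
    find "$target_dir" -mindepth 1 -maxdepth 1 -exec rm -rf {} +
}
```

A call such as `backup_and_clear_dir "$JFROG_HOME/xray/var/data/rabbitmq/mnesia" /tmp/mnesia-backup.tar.gz` would then cover the cleanup step described above.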

If RabbitMQ is running on a Kubernetes cluster, remove the RabbitMQ PVC and redeploy, which helps to eliminate these errors.

Issue 3:

no access to this vhost

This error may occur when the disk runs out of space, when RabbitMQ is not stopped correctly, or when there is a permission issue connecting to the “/” vhost. To resolve it, follow the instructions in the knowledge base article.

Issue 4:

RabbitMQ keep alive: failed in opening a new connection: dial tcp 127.0.0.1:5672: connect: connection refused

The connection refused error may occur when RabbitMQ fails to start but Xray startup has already been initiated. A possible reason is an Erlang cookie mismatch that prevents RabbitMQ from starting as the “xray” user.

To confirm this, navigate to the /opt/jfrog/xray/app/third-party/rabbitmq/sbin directory and run ./rabbitmqctl cluster_status

If an error like the one below is observed, the Erlang cookie created by Xray does not match the Erlang cookie created by RabbitMQ itself.

User-added image

As highlighted above, the Erlang cookie is located in the effective user’s home directory. To compare the contents of all the .erlang.cookie files, check the .erlang.cookie file under “/opt/jfrog/xray” and “/root” against “/opt/jfrog/xray/app/third-party/rabbitmq/.erlang.cookie”.

The contents of all the .erlang.cookie files should match, and the expected content is “JFXR_RABBITMQ_COOKIE”. If any mismatch is identified, stop Xray and make sure all the RabbitMQ processes are stopped. To find the PIDs, use the commands below, then run kill -9 <PID> for each remaining process:

    ps aux | grep erl
    ps aux | grep epmd
    ps aux | grep rabbit
    ps aux | grep erlang

Next, update the cookie value in all of these locations, that is [“/root”, “/opt/jfrog/xray”, “/opt/jfrog/xray/app/third-party/rabbitmq/.erlang.cookie”], and update system.yaml to use “JFXR_RABBITMQ_COOKIE” as rabbitMq.erlangCookie.value.
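
Once the files are updated, a quick check that every cookie file holds the expected value can be sketched as follows. The function name is hypothetical, and the file list in the usage note assumes the cookie is named .erlang.cookie in each of the locations listed above.

```shell
#!/bin/sh
# Sketch: verify that every listed cookie file contains the expected value.
# Usage: cookies_match <expected-cookie> <file>...
# Returns 0 only if all files are readable and match.
cookies_match() {
    expected=$1
    shift
    status=0
    for cookie_file in "$@"; do
        if [ ! -r "$cookie_file" ]; then
            echo "MISSING  $cookie_file"
            status=1
        elif [ "$(cat "$cookie_file")" = "$expected" ]; then
            echo "OK       $cookie_file"
        else
            echo "MISMATCH $cookie_file"
            status=1
        fi
    done
    return $status
}
```

For example: `cookies_match JFXR_RABBITMQ_COOKIE /root/.erlang.cookie /opt/jfrog/xray/.erlang.cookie /opt/jfrog/xray/app/third-party/rabbitmq/.erlang.cookie`. Reading /root/.erlang.cookie normally requires root privileges.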

After updating the files, navigate to the /opt/jfrog/xray/var/data/rabbitmq/mnesia directory, take a backup, and delete the contents of the directory.

Starting from Xray 3.8x, the stop and restart actions on Xray are not applied to the RabbitMQ process. On the Xray start action, RabbitMQ is started if it is not already running.
If you want the script to stop and restart RabbitMQ as well, set shared.rabbitMq.autoStop to true in system.yaml. Note that this flag is not consumed in docker-compose installations.

rabbitMq:
    ## Enable this to stop rabbitmq along with other services of xray
    ## By default rabbitmq will always be running
    autoStop: true

Finally, restart Xray.