Reclustering

If you are seeing any of these symptoms, we recommend reclustering RabbitMQ with the steps below.

1] First, check the network.

Please ensure the following ports are open for RabbitMQ communications:

4369 (epmd, a peer discovery service used by RabbitMQ nodes and CLI tools)
5672 (RabbitMQ’s AMQP listening port for client connections)
25672 (RabbitMQ’s inter-node and CLI tool communication port, used for HA)
15672 (RabbitMQ’s dashboard UI and HTTP API port, which is optional, but helpful)

Ensure that each of your machines can reach all of those ports on the other machines. Test with telnet; cURL may not work, since RabbitMQ uses AMQP rather than HTTP.
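As a rough sketch of that reachability check, the functions below probe each port on a peer. The names RABBIT_PORTS, port_open, and check_ports are made up for this example, and the probe assumes bash (for its /dev/tcp redirection) and the coreutils timeout command are installed.

```shell
# Ports from the list above; 15672 is optional (dashboard and HTTP API).
RABBIT_PORTS="4369 5672 25672 15672"

# Succeeds if a TCP connection to host $1, port $2 opens within 3 seconds.
# Assumption: bash and coreutils `timeout` are available on this machine.
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Prints one line per port for the given host.
check_ports() {
  host=$1
  for port in $RABBIT_PORTS; do
    if port_open "$host" "$port"; then
      echo "$host:$port open"
    else
      echo "$host:$port CLOSED"
    fi
  done
}

# Example (hypothetical peer name): check_ports xray-1
```

Run it from every node against every other node, since a firewall rule may block traffic in only one direction.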

Also ensure that the RabbitMQ hostname is correct. RabbitMQ uses the "hostname -s" command to determine what its hostname should be. In some environments this hostname won't work; an IP address or other address needs to be used instead. The hostname is used in the join_cluster command outlined below.
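To illustrate how the node name relates to the hostname, here is a minimal sketch. It assumes the default "rabbit" node-name prefix and relies on "hostname -s" cutting the name at the first dot; short_host and node_name are hypothetical helper names.

```shell
# Mimic `hostname -s`: keep everything before the first dot.
short_host() {
  printf '%s\n' "${1%%.*}"
}

# Build the node name that join_cluster would be given for that host.
# Assumption: the node uses the default "rabbit" prefix.
node_name() {
  printf 'rabbit@%s\n' "$(short_host "$1")"
}

# Example: node_name xray-1.us-central1-c.internal
# Then confirm the short name resolves from the other nodes, e.g. with
# `getent hosts xray-1` or `ping xray-1`.
```

This helper is only meaningful for hostnames; it would mangle an IP address, so skip it in environments where an IP is used instead.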

2] Choose one of the installed Xray nodes to be the Primary. Don't do anything else on your "primary" Xray node. If you're unsure which is supposed to be the primary, pick one arbitrarily. You'll have the other nodes join this Primary's cluster.

3] On the remaining Xray nodes, execute the following rabbitmqctl commands:

# Halt the local RabbitMQ "app"; the RabbitMQ server itself should still be running
./rabbitmqctl stop_app

# Join the Primary's cluster, e.g. run this command from xray-2
./rabbitmqctl join_cluster rabbit@xray-1

# Start the local RabbitMQ "app"
./rabbitmqctl start_app

4] Repeat step 3] on each remaining Xray node until all of them have joined the Primary's cluster.
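If you are repeating the join on several nodes, the three commands can be wrapped so that a failure stops the sequence early. This is only a sketch: join_primary and the RABBITMQCTL variable are names invented here, and the default path assumes you run it from the directory containing rabbitmqctl.

```shell
# Path to rabbitmqctl; override it if the binary lives elsewhere.
RABBITMQCTL=${RABBITMQCTL:-./rabbitmqctl}

# Join this node to the cluster of the Primary whose short hostname is $1.
# Each command runs only if the previous one succeeded.
join_primary() {
  primary=$1
  $RABBITMQCTL stop_app &&
    $RABBITMQCTL join_cluster "rabbit@$primary" &&
    $RABBITMQCTL start_app
}

# Example, run on xray-2: join_primary xray-1
```

Chaining with && means a failed join_cluster leaves the app stopped, so you can read the error before anything else changes.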

5] Verify the cluster state by re-running cluster_status. Every node should now appear in both the Disk Nodes and the Running Nodes sections of the printout:

./rabbitmqctl cluster_status

Basics
Cluster name: rabbit@xray-1.us-central1-c.internal

Disk Nodes
rabbit@xray-1
rabbit@xray-2

Running Nodes
rabbit@xray-1
rabbit@xray-2
[...]
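The same check can be scripted. The sketch below assumes the plain-text layout shown above (a "Running Nodes" heading followed by one node per line); other RabbitMQ versions may print cluster_status differently, and running_nodes and missing_nodes are hypothetical helper names.

```shell
# Read cluster_status output on stdin and print the nodes listed
# under the "Running Nodes" heading.
running_nodes() {
  awk '
    /^Running Nodes/           { in_section = 1; next }
    in_section && /@/          { print; next }
    in_section && NF && !/@/   { in_section = 0 }
  '
}

# Read cluster_status output on stdin and print every expected node
# (given as arguments) that is NOT currently running.
missing_nodes() {
  running=$(running_nodes)
  for node in "$@"; do
    printf '%s\n' "$running" | grep -qxF "$node" || echo "$node"
  done
}

# Example:
#   ./rabbitmqctl cluster_status | missing_nodes rabbit@xray-1 rabbit@xray-2
```

Empty output from missing_nodes means every expected node was found in the Running Nodes section.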

If you get an Erlang distribution error, it usually means the Erlang cookie in /var/lib/rabbitmq/.erlang.cookie does not match between nodes. RabbitMQ requires the cookies to be identical, so copy the contents of the cookie from your Primary node to all of your other nodes, then restart the service:

# Run from $XRAY_HOME/app/third-party/rabbitmq/sbin/
./rabbitmqctl stop
rabbitmq-server -detached
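To confirm whether the cookies really differ without printing the secret itself, you can compare checksums. The sketch below assumes a Linux host with sha256sum; cookie_sum and same_cookie are hypothetical helper names.

```shell
# Print the SHA-256 checksum of a cookie file
# (default: the path mentioned above).
cookie_sum() {
  sha256sum "${1:-/var/lib/rabbitmq/.erlang.cookie}" | cut -d' ' -f1
}

# Succeed only if both files are readable and have identical contents.
same_cookie() {
  [ -r "$1" ] && [ -r "$2" ] &&
    [ "$(cookie_sum "$1")" = "$(cookie_sum "$2")" ]
}

# Example: run `cookie_sum` on every node (as a user allowed to read the
# file) and compare the printed values; they should all be identical.
```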

Then, similar to the steps above:

# Stop the RabbitMQ app:
rabbitmqctl stop_app

# Reset the app:
rabbitmqctl reset

# Cluster the app:
rabbitmqctl join_cluster rabbit@<Hostname>

# Restart the app:
rabbitmqctl start_app

# Remirror your queues:
rabbitmqctl set_policy ha-all "^" '{"ha-mode":"all"}'

Don't forget to check rabbitmqctl cluster_status afterward.
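You can also confirm from the command line that the policy was created. This sketch assumes rabbitmqctl list_policies prints tab-separated rows with the policy name in the second column, which may vary between RabbitMQ versions; has_policy is a hypothetical helper name.

```shell
# Read `rabbitmqctl list_policies` output on stdin and succeed if a
# policy with the given name is present.
# Assumption: tab-separated columns, policy name in column 2.
has_policy() {
  awk -F '\t' -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Example:
#   rabbitmqctl list_policies | has_policy ha-all && echo "ha-all is set"
```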

In the Queues section of your RabbitMQ dashboard, confirm that the queues listed on each of your nodes display ha-all in the Features column. This indicates that your policy for syncing queues is in place.

You can also execute a DB sync and follow its progress by accessing both servers directly in the UI. The progress bar should update on both (nearly) simultaneously.

More information is available in RabbitMQ’s Clustering Guide.