Federation Recovery and Auto-Healing

JFrog Artifactory Documentation

Important

This feature requires all JPDs in the Federation to run Artifactory release 7.71.1 or later.

In certain cases, it may not be possible to maintain near real-time synchronization of all artifact events (create, update, delete) among Federation members. Examples include short-term networking issues between the JPDs, Artifactory upgrades, and a user-initiated synchronization pause. If synchronization still fails after the maximum number of retry attempts has been reached, event sync is paused and the Federation moves into an error state.

One way to recover the Federation is to perform a full sync. However, because a full sync amounts to restarting the Federation, it can be time-consuming if the Federated repositories contain a large number of artifacts. For details, see Federated Repository Full Sync.

Starting with release 7.71.1, Artifactory includes an auto-healing mechanism that checks Federated repositories at regular intervals for exhausted queues (queues that have exceeded the maximum number of attempts to send events to other Federation members). This mechanism automatically resets the failed events and retries the sync with the target mirror.

Note

If events have accumulated over a period of days, the event cleanup mechanism might remove events that have not yet been propagated, causing the queue to move to an out-of-sync state. In such cases, a full sync is required.
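The auto-healing check described above can be pictured with a small toy model. This is only an illustrative sketch, not Artifactory's implementation: the MirrorQueue class, the auto_heal function, and the JPD names are hypothetical, and treating a queue as exhausted once its failed attempts reach the retry limit is an assumption.

```python
from dataclasses import dataclass

# Mirrors the documented default of
# artifactory.federated.event.queue.max.error.retries.
MAX_ERROR_RETRIES = 6


@dataclass
class MirrorQueue:
    """Hypothetical stand-in for the event queue feeding one target mirror."""
    target: str
    failed_attempts: int = 0

    def is_exhausted(self, max_retries: int = MAX_ERROR_RETRIES) -> bool:
        # Assumption: a queue counts as exhausted once its retry budget is used up.
        return self.failed_attempts >= max_retries


def auto_heal(queues, max_retries=MAX_ERROR_RETRIES):
    """One pass of the periodic check: reset every exhausted queue.

    Returns the targets whose queues were reset so that sync can be retried.
    """
    healed = []
    for queue in queues:
        if queue.is_exhausted(max_retries):
            queue.failed_attempts = 0  # reset the failed events
            healed.append(queue.target)
    return healed


queues = [
    MirrorQueue("jpd-east", failed_attempts=6),
    MirrorQueue("jpd-west", failed_attempts=2),
]
print(auto_heal(queues))  # ['jpd-east']
```

In this model, a second pass over the same queues returns an empty list, because the reset queue is no longer exhausted until its sends fail again.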

Email Notifications

All administrators who are registered for Artifactory's mail service will receive notifications similar to the one shown below when auto-healing takes place:

2023-09-21T10:39:24.696Z [jfrt ] [INFO ] [29cb8b34b3ec63e4] [atedRepositoryRecoveryTest:247] [TestNG_1            ] [rt_229036255 ] [rt_229036255] - Mail notification subject: [JFrog] Mirror Recovery in Progress
2023-09-21T10:39:24.696Z [jfrt ] [INFO ] [29cb8b34b3ec63e4] [atedRepositoryRecoveryTest:249] [TestNG_1            ] [rt_229036255 ] [rt_229036255] - Mail notification content: ------=_Part_0_2088885210.1695292764495
Content-Type: text/html; charset=us-ascii
Content-Transfer-Encoding: 7bit

"Auto Healing" task has recognized a mirror in an exhausted state: 'http://localhost:11552/artifactory/generic-fed-9ac0b669-5760-4ff3-a209-c6b8c7826928' -> 'http://localhost:55295/artifactory/generic-fed-9ac0b669-5760-4ff3-a209-c6b8c7826928'.
Recovery attempt is now in progress...
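If these notifications are routed to a monitoring script, the source and target mirrors can be extracted from the message body. The following is a minimal sketch that assumes the wording matches the sample above. The pattern, the function name, and the shortened repository URLs are illustrative, and the actual message text may differ between releases.

```python
import re

# Pattern modeled on the sample notification; treat it as an assumption,
# not a guaranteed message format.
MIRROR_PATTERN = re.compile(
    r"exhausted state: '(?P<source>[^']+)' -> '(?P<target>[^']+)'"
)


def parse_recovery_notification(body: str):
    """Return (source_url, target_url) from a notification body, or None."""
    match = MIRROR_PATTERN.search(body)
    if match is None:
        return None
    return match.group("source"), match.group("target")


# Shortened version of the sample message (repository keys abbreviated).
sample = (
    "\"Auto Healing\" task has recognized a mirror in an exhausted state: "
    "'http://localhost:11552/artifactory/generic-fed' -> "
    "'http://localhost:55295/artifactory/generic-fed'.\n"
    "Recovery attempt is now in progress..."
)
print(parse_recovery_notification(sample))
```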

System Properties

Federation recovery and auto-healing are controlled using the following properties in the system.properties file:

Property

Description

artifactory.auto.healing.job.interval.sec

Defines the interval (in seconds) at which the auto-healing feature checks for exhausted queues.

The default value is 20 seconds.

artifactory.federated.subsequent.event.grace.period

Defines the buffer that works in conjunction with the federated.negotiation.enabled setting.

The default value is 60 seconds.

artifactory.federated.event.queue.max.error.retries

Defines the number of attempts to send a queued Federated event before the queue becomes exhausted and therefore eligible for auto-healing.

The default value is 6.

artifactory.persistentQueue.max.lock.lease.time.minutes

Defines the delay interval (in minutes) between attempts to trigger the queue.

The default value is 1 minute.

artifactory.reset.stale.full.sync.job.interval.min

Defines the interval (in minutes) for an async task that resets the status of a Full Sync operation that has become "stuck", enabling the Full Sync to restart.

This property is useful, for example, if the Artifactory instance is restarted while a Full Sync operation is running. After the restart, this async task will reset the operation and restart it.

The default value is 15 minutes.

artifactory.reset.stale.full.sync.job.initial.delay.min

Defines the initial delay (in minutes) before running the async task that resets the status of a stuck Full Sync operation.

The default value is 1 minute.
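To see how these settings might look together in system.properties, the sketch below renders the documented defaults as key=value lines and applies optional overrides. The render_properties helper is hypothetical. Writing the grace period as the plain number 60 is also an assumption, since the table gives the default only as 60 seconds.

```python
# Documented defaults from the table above, keyed by property name.
DEFAULTS = {
    "artifactory.auto.healing.job.interval.sec": 20,
    "artifactory.federated.subsequent.event.grace.period": 60,  # unit encoding assumed
    "artifactory.federated.event.queue.max.error.retries": 6,
    "artifactory.persistentQueue.max.lock.lease.time.minutes": 1,
    "artifactory.reset.stale.full.sync.job.interval.min": 15,
    "artifactory.reset.stale.full.sync.job.initial.delay.min": 1,
}


def render_properties(overrides=None):
    """Return key=value lines for system.properties, rejecting unknown keys."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown properties: {sorted(unknown)}")
    merged = {**DEFAULTS, **overrides}
    return "\n".join(f"{key}={value}" for key, value in merged.items())


# Example: check for exhausted queues every 60 seconds instead of every 20.
print(render_properties({"artifactory.auto.healing.job.interval.sec": 60}))
```

Rejecting unknown keys guards against typos in property names, which would otherwise produce a setting that has no effect.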

Manual Recovery using a REST API

Use the Federation Recovery REST API to perform recovery manually. This API is useful when auto-healing has been disabled or when you want to perform recovery immediately instead of waiting for the next auto-healing interval. For details, see Federation Recovery.