Enhancing JFrog Internal Operations with Near Zero Downtime Migration

How JFrog transformed its internal migration process to ensure faster, more reliable data transfers with minimal operational disruption

Data migrations have long been a significant source of anxiety for businesses and IT teams alike. The thought of moving critical databases often conjures images of prolonged downtime, service interruptions, and the ever-present risk of data loss.

Indeed, one widely cited statistic claims that “90% of businesses experience unexpected downtime during database migrations, leading to significant revenue loss and customer dissatisfaction”. JFrog’s own research and internal data tell a similar story: a migration could mean up to 50 hours of total downtime. The process required extensive manual intervention and complex infrastructure changes. Its sequential steps also stretched out overall migration times, causing delays and introducing opportunities for error.

At JFrog, we understand these challenges intimately. That’s why we developed our own near Zero Downtime Migration (nZDM) tool for our internal development operations. For clarity, nZDM is used exclusively by JFrog for its internal migrations and is not available for external use.

The purpose of this post is to show how much we invest in innovating our own operations, and how that investment translates into better service for our customers: maximum speed, minimal downtime, and optimum reliability.

What is nZDM?

The database near Zero Downtime Migration (nZDM) tool is an internal JFrog solution built to facilitate the migration of databases between hosts with near-zero downtime. At its core, nZDM leverages pglogical, a powerful PostgreSQL extension that implements logical replication, enabling highly efficient and selective data replication.

nZDM automates numerous critical tasks, from pre-migration checks to orchestrating the entire migration lifecycle. This automation significantly simplifies what would otherwise be a complex and error-prone process, allowing our engineers to execute database migrations faster and more reliably.
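The sketch below shows what automating pre-checks and orchestrating the lifecycle can look like. It is a minimal Python sketch, not nZDM's code. The status names, the maintenance_flag and approved_for fields, and the Migration, prechecks, and start names are all hypothetical. The sketch models the lifecycle as a small state machine with two rules. It refuses to start replication unless the pre-checks pass, and it rejects any out-of-order transition.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import List, Optional


class Status(Enum):
    # Hypothetical lifecycle states, loosely modeled on a migration_status metric.
    PENDING = "pending"
    PRECHECK_FAILED = "precheck_failed"
    REPLICATING = "replicating"
    CUTOFF = "cutoff"
    COMPLETED = "completed"


# Legal transitions: anything not listed here is rejected.
TRANSITIONS = {
    Status.PENDING: {Status.PRECHECK_FAILED, Status.REPLICATING},
    Status.REPLICATING: {Status.CUTOFF},
    Status.CUTOFF: {Status.COMPLETED},
    Status.PRECHECK_FAILED: set(),
    Status.COMPLETED: set(),
}


@dataclass
class Migration:
    customer: str
    maintenance_flag: bool        # hypothetical: customer is flagged for maintenance
    approved_for: Optional[date]  # hypothetical: date on the migration approval flag
    status: Status = Status.PENDING
    history: List[Status] = field(default_factory=list)

    def advance(self, new: Status) -> None:
        # Enforce the transition table so a migration cannot skip or repeat stages.
        if new not in TRANSITIONS[self.status]:
            raise ValueError(f"illegal transition {self.status.value} -> {new.value}")
        self.history.append(self.status)
        self.status = new


def prechecks(m: Migration, today: date) -> List[str]:
    """Return failure reasons; an empty list means every check passed."""
    failures = []
    if not m.maintenance_flag:
        failures.append("maintenance flag not set")
    if m.approved_for != today:
        failures.append("no approval flag for today")
    return failures


def start(m: Migration, today: date) -> Status:
    # Refuse to begin replication unless every pre-check passes.
    m.advance(Status.PRECHECK_FAILED if prechecks(m, today) else Status.REPLICATING)
    return m.status
```

All legal transitions live in one table. Because of that, a retried or duplicated request cannot, for example, move a migration back from cutoff to replication.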

Why nZDM is a Game-Changer for JFrog

nZDM brings a host of compelling benefits that directly address the traditional pain points of database migrations, including:

  • Minimized Downtime: The most significant advantage of nZDM is its ability to ensure near-zero downtime. This means seamless service continuity for our customers, while JFrog DevOps retains full flexibility and control over our databases throughout the process.
  • Reduced Risk: By automating migrations and eliminating manual steps, nZDM drastically reduces the risk of human error. This ensures superior data integrity and consistent service availability, while protecting critical operations.
  • Enhanced Efficiency: Our engineers can now execute database migrations faster and with greater reliability. This reduction in time spent on manual tasks allows them to focus on higher-value projects, driving innovation and improving overall productivity.
  • Increased Agility: nZDM provides engineers with the flexibility to perform simultaneous migrations and deploy changes more swiftly, significantly enhancing the agility and responsiveness of JFrog’s internal services.
  • Improved User Experience: Ultimately, nZDM’s seamless migrations contribute to better service for our customers by minimizing interruptions and maintaining high availability and consistent performance.

How nZDM Delivers Near-Zero Downtime

nZDM uses a robust, event-driven architecture, with asynchronous processing built on the open-source project Redis. The system comprises three main microservices:

  • Server
  • Migration-consumer
  • Cutoff-consumer

Each microservice is responsible for a specific aspect of the business logic.
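The event-driven split between the three services can be sketched as producers and consumers on named queues. In this hypothetical Python sketch, an in-memory class stands in for Redis lists so the code runs without a server. With redis-py, the same roles would map onto Redis list commands such as LPUSH and BRPOP. The queue names, event fields, and function names are illustrative, not nZDM's.

```python
import json
from collections import deque
from typing import Deque, Dict, Iterable, List, Optional, Tuple


class InMemoryQueues:
    """Stand-in for Redis lists so the sketch runs without a Redis server.

    lpush/rpop mirror the Redis LPUSH/RPOP pair: pushing on the left and
    popping on the right gives first-in, first-out delivery.
    """

    def __init__(self) -> None:
        self._queues: Dict[str, Deque[str]] = {}

    def lpush(self, name: str, payload: str) -> None:
        self._queues.setdefault(name, deque()).appendleft(payload)

    def rpop(self, name: str) -> Optional[str]:
        queue = self._queues.get(name)
        return queue.pop() if queue else None


def server_submit(bus, customers: Iterable[str], destination: str,
                  excluded: Iterable[str] = ()) -> None:
    # Server: accept a request and publish one migration event per customer.
    excluded_tables = list(excluded)
    for customer in customers:
        event = {"customer": customer, "destination": destination,
                 "excluded_tables": excluded_tables}
        bus.lpush("migration_requests", json.dumps(event))


def migration_consumer(bus, log: List[Tuple[str, str]]) -> None:
    # Migration-consumer: set up replication (stubbed), then hand off to the cutoff queue.
    while (raw := bus.rpop("migration_requests")) is not None:
        event = json.loads(raw)
        log.append(("replication_started", event["customer"]))
        bus.lpush("cutoff_requests", json.dumps(event))


def cutoff_consumer(bus, log: List[Tuple[str, str]]) -> None:
    # Cutoff-consumer: perform the switchover (stubbed) for each queued migration.
    while (raw := bus.rpop("cutoff_requests")) is not None:
        event = json.loads(raw)
        log.append(("cutoff_done", event["customer"]))
```

For example, suppose you submit two customers and then run each consumer once. The log records a replication_started entry for each customer, then a cutoff_done entry for each, in submission order. Because each service only reads from and writes to a queue, a design like this lets migrations wait for their cutoff window without blocking new requests.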

The high-level migration flow involves several key stages:

  1. Pre-Migration Setup: Before a migration begins, nZDM performs essential checks, such as verifying maintenance flags and requiring a migration approval flag set for the relevant date. This acts as a crucial safeguard against unintended migrations.
  2. Initiating Migration: Migrations can be initiated via a dedicated Jenkins job or through an API endpoint, supplying the necessary parameters such as customer lists, destination database hosts, and any tables to be excluded.
  3. Logical Replication: Once initiated, the migration-consumer processes the request, and pglogical replication is established. This involves creating provider (source) and subscriber (target) nodes, dumping and restoring the schema, and defining which tables and sequences should be replicated from the source to the target database.
  4. Monitoring: JFrog teams can monitor the migration process using a Grafana dashboard and Coralogix logging. Key metrics like target_col, migration_status, pglogical_status, and cutoff-status are closely watched to ensure smooth progression through the workflow.
  5. The Cutoff Process: The cutoff is the only step in the near Zero Downtime Migration process that may involve some downtime, and it is carefully managed to keep that downtime as short as possible. Cutoff begins once pglogical replication has been successfully configured, the target DB is replicating from the source DB, and replication lag is minimal. During this phase, connections to the source database are disabled and active sessions are terminated. Critical validations, such as sequence and unique ID syncing, are then performed to ensure data consistency before connections to the target database are enabled.
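The replication setup in stage 3 maps onto a handful of pglogical calls. The Python sketch below only builds the commands and does not connect to a database, so the order is easy to inspect. It makes several assumptions:

- a single replication set
- the node names provider and subscriber
- a hypothetical subscription name
- a schema copied with pg_dump/psql before the data sync

nZDM's actual implementation may differ. The sketch also assumes both servers already meet pglogical's prerequisites, such as wal_level = logical and pglogical listed in shared_preload_libraries.

```python
import shlex
from typing import List, Sequence, Tuple


def quote_literal(value: str) -> str:
    # Minimal SQL string-literal quoting: wrap in single quotes, double embedded quotes.
    return "'" + value.replace("'", "''") + "'"


def pglogical_plan(
    source_dsn: str,
    target_dsn: str,
    schemas: Sequence[str] = ("public",),
    excluded_tables: Sequence[str] = (),
    set_name: str = "default",
    subscription: str = "nzdm_subscription",  # hypothetical subscription name
) -> List[Tuple[str, str]]:
    """Return ordered (where, command) steps; 'where' is shell, source, or target."""
    rs = quote_literal(set_name)
    schema_array = "ARRAY[" + ", ".join(quote_literal(s) for s in schemas) + "]"

    # 1. Copy the schema only (dump + restore), so the subscription syncs data alone.
    schema_copy = (
        f"pg_dump --schema-only --dbname={shlex.quote(source_dsn)}"
        f" | psql --dbname={shlex.quote(target_dsn)}"
    )

    # 2. Provider (source): register the node, then choose what to replicate.
    provider = [
        "CREATE EXTENSION IF NOT EXISTS pglogical;",
        f"SELECT pglogical.create_node(node_name := 'provider', dsn := {quote_literal(source_dsn)});",
        f"SELECT pglogical.replication_set_add_all_tables({rs}, {schema_array});",
        f"SELECT pglogical.replication_set_add_all_sequences({rs}, {schema_array});",
    ]
    # Tables the caller asked to exclude are removed from the replication set.
    provider += [
        f"SELECT pglogical.replication_set_remove_table({rs}, {quote_literal(t)});"
        for t in excluded_tables
    ]

    # 3. Subscriber (target): register the node and subscribe to the provider.
    subscriber = [
        "CREATE EXTENSION IF NOT EXISTS pglogical;",
        f"SELECT pglogical.create_node(node_name := 'subscriber', dsn := {quote_literal(target_dsn)});",
        f"SELECT pglogical.create_subscription(subscription_name := {quote_literal(subscription)}, "
        f"provider_dsn := {quote_literal(source_dsn)}, replication_sets := ARRAY[{rs}], "
        "synchronize_structure := false, synchronize_data := true);",
    ]

    steps: List[Tuple[str, str]] = [("shell", schema_copy)]
    steps += [("source", sql) for sql in provider]
    steps += [("target", sql) for sql in subscriber]
    return steps
```

A pre-check worth pairing with a plan like this is that every replicated table has a primary key or other valid replica identity. Without one, pglogical cannot replicate UPDATE and DELETE operations for that table.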
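The cutoff gate and switchover in stage 5 can be sketched in the same style. This hypothetical Python outline first checks readiness: the subscription status must be replicating and the lag must be below a threshold. It then lists the switchover statements in order:

1. Block the application role on the source.
2. End that role's sessions.
3. Re-check lag.
4. Advance the target sequences.
5. Drop the subscription.
6. Re-enable the role on the target.

The role name, the 1 MiB lag threshold, and the sequence margin are assumptions. Blocking a login role is only one possible way to disable connections. The unique-ID validation that nZDM performs is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Source-side lag check (PostgreSQL 10+): WAL bytes not yet confirmed by each logical slot.
LAG_SQL = (
    "SELECT slot_name, pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes "
    "FROM pg_replication_slots WHERE slot_type = 'logical';"
)


@dataclass
class ReplicationState:
    subscription_status: str  # e.g. status column of pglogical.show_subscription_status()
    lag_bytes: int            # e.g. lag_bytes returned by LAG_SQL


def ready_for_cutoff(state: ReplicationState, max_lag_bytes: int = 1_048_576) -> bool:
    # Cut over only when the subscription is streaming and the backlog is small.
    return state.subscription_status == "replicating" and state.lag_bytes <= max_lag_bytes


def sql_literal(value: str) -> str:
    # Minimal SQL string-literal quoting.
    return "'" + value.replace("'", "''") + "'"


def cutoff_plan(
    app_role: str,                            # hypothetical application role (plain identifier)
    source_sequences: Dict[str, int],         # sequence name -> last_value read on the source
    subscription: str = "nzdm_subscription",  # hypothetical subscription name
    margin: int = 1000,                       # headroom so new IDs cannot collide with copied rows
) -> List[Tuple[str, str]]:
    steps = [
        # 1. Stop new application logins on the source, then end existing sessions.
        ("source", f"ALTER ROLE {app_role} NOLOGIN;"),
        ("source", "SELECT pg_terminate_backend(pid) FROM pg_stat_activity "
                   f"WHERE usename = {sql_literal(app_role)};"),
        # 2. Re-check lag until the last writes have reached the target.
        ("source", LAG_SQL),
    ]
    # 3. pglogical syncs sequences periodically, not in real time, so move each
    #    target sequence past the last value observed on the source.
    steps += [
        ("target", f"SELECT setval({sql_literal(seq)}, {last + margin});")
        for seq, last in sorted(source_sequences.items())
    ]
    # 4. Detach the target from the source and open it to the application.
    steps += [
        ("target", f"SELECT pglogical.drop_subscription({sql_literal(subscription)});"),
        ("target", f"ALTER ROLE {app_role} LOGIN;"),
    ]
    return steps
```

Downtime in this outline spans from the NOLOGIN step to the final LOGIN step. That is why the readiness gate waits for low lag before starting.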

JFrog’s Accomplishments with nZDM

Since its inception, nZDM has contributed significantly to increasing efficiency, proving its value and robustness within JFrog’s internal development ecosystem. Its accomplishments include:

  • Improved migration process performance and efficiency.
  • Resolved critical issues and high-memory incidents in production environments.
  • Successfully executed the Single2Flexible migration project, migrating all relevant databases to updated versions.
  • Provided a dedicated solution for migrating a large database to a dedicated DB, S3 bucket, and cell.
  • Migrated over 500 databases using the nZDM tool.

The Road Ahead: Continuous Innovation

While nZDM has already delivered substantial value, we are continuously enhancing its capabilities. Our future roadmap includes:

  • Adding support for cross-cloud migrations, addressing the complexities of connecting databases across different cloud environments.
  • Enabling S3 Bucket migrations.
  • Expanding support for multi-tenant (MT) single applications.
  • Integrating Kafka support for message streaming.
  • Developing an SRE operator for improved operational management.

We are also actively tackling challenges such as supporting MT applications that do not follow a standard schema, which requires innovative approaches to data migration at the row/table level.

Conclusion

At JFrog, we are committed to building tools for our own internal use that are not available to partners or customers. We continue to do this because we believe innovation is the best way to increase the efficiency of our internal operations, and that efficiency in turn delivers high availability and reliable service to our customers.

The near Zero Downtime Migration tool embodies this commitment, transforming a historically challenging process into an efficient, low-risk operation. By leveraging cutting-edge technology like pglogical and an event-driven microservices architecture, nZDM ensures that database migrations are no longer a headache for JFrog’s IT and DevOps teams, but a smooth, near-invisible process that empowers them to better serve our customers.
