This session will address how to migrate artifacts from one Artifactory server to another while keeping production CI/CD pipelines operating smoothly with minimal interruption.
Topics include the motivation for the migration, the changes required for both CI systems, and how to coordinate changes on the user side.
Hello, my name is Robert Wen from SalesForce. Thank you for attending this year’s SwampUP. Today, I’m delighted to share with you a story about migrating repositories and binaries from one Artifactory server to another. First, a little bit about myself: I’m a Lead Build and Release Engineer on the SalesForce infrastructure engineering team. My team is responsible for CI/CD, DevOps and developer productivity. Now, an overview of the story. So what are we trying to accomplish?
First, we transfer artifacts from the old Artifactory server to a new Artifactory with high availability. We also needed to update build configurations in TeamCity. Why are we doing this? Well, first, the old Artifactory server version was old, and it had been shared by multiple teams over the years as one single point of failure.
Meanwhile, the new Artifactory server was configured with high availability in a Kubernetes cluster. So how do we make these changes happen? Well, first, we needed to update TeamCity server and agent settings. In source code, we needed to update hardcoded references to the Artifactory server. On the user side, all the local settings for Maven, Gradle, Scala and Python needed to be updated.
This diagram represents a high level description of the old and new Artifactory servers. As you can see, on the left hand side is a single node Artifactory; the right hand side represents the new Artifactory server with three-node HA. Here’s a little bit of background about this story, and the motivation to migrate repositories. As I mentioned a little bit earlier, the old Artifactory was running an older version, it had been a shared server between different organizations for years, and it was not well supported in the past. As a result, carrying out maintenance or upgrades for this server has become more challenging due to various differences in technical and release requirements. In fact, it was one big single point of failure.
Meanwhile, the new Artifactory server has been up and running. This new server is operating in a Kubernetes cluster with high availability and automated backup in place. So what did we need to do to get the migration complete? Here’s how, at a high level.
First, we need to identify all the changes required. These changes cover the TeamCity server, the build templates, and custom build configurations. Many of those configuration templates include references to Artifactory. On the TeamCity agents, all the local settings that reference Artifactory also need to change. That covers the TeamCity server and agents.
In the source code, we need to look up all the references to Artifactory as well; those all need to be updated. On the user side, their [inaudible], desktop or laptop, they have to change their settings to point to the new Artifactory. So after identifying all the changes required, we need to test the changes in an isolated build.
Meaning, we test this build against the new Artifactory without impacting ongoing build and release. Once we are confident that we can move forward, we can carry out the rest of the migration, meaning we migrate all build configurations to the new Artifactory. Now let’s dive a little bit into what changes are required.
On the TeamCity server, for the Artifactory integration, we need to update the configuration for the new Artifactory. That includes a new service account and a new connection from TeamCity to the Artifactory server, and we need to set up the new user settings for build configurations. These new user settings are primarily used for builds in TeamCity to interact with Artifactory.
In the build templates, we have many Maven or Gradle builds. They all derive from the build templates; therefore, all the user settings that are referenced by the build templates must be updated for the new Artifactory server. So after updating the TeamCity settings, we need to look at the source code.
In some of the source code repositories, there exist hardcoded references to a server URL, for example. All of those require an update. On the TeamCity agent instances, all the local settings need an update, including the Maven settings, Gradle, sbt, Scala, and Python. On the user side, several key changes must apply.
First, all users need to log on to the new Artifactory server and generate new API keys. Those API keys will be used in their local settings. Of course, you can use a password, but there’s an advantage to using an API key. For example, users can use the API keys for carrying out REST API calls. There’s no need to worry about expiring passwords. Even if the password changes, an API key remains valid until it’s revoked. Once this API key is revoked, of course, you have to generate a new API key and update local settings accordingly. So why is this a big advantage? In many organizations, passwords need to rotate. For us, Artifactory integrates with LDAP, and we have an LDAP password rotation requirement. So once a password is changed, obviously, your local settings will no longer work. So it’s very important to use an API key in your local settings. So once the API key is generated against the new Artifactory server, all the local settings, on laptop or desktop, for Maven, Gradle, sbt, or Python, need to be updated accordingly.
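As a rough illustration of the REST use case mentioned here, the following Python sketch builds a request that authenticates with an API key instead of a password. The host name `new-artifactory.example.com` and the key value are placeholders, not the servers from the talk. The `X-JFrog-Art-Api` header and the `api/system/ping` endpoint are the ones Artifactory documents for API keys and health checks, but verify both against your own server version.

```python
import urllib.request

# Hypothetical base URL of the new server; substitute your own.
NEW_ARTIFACTORY = "https://new-artifactory.example.com/artifactory"


def build_ping_request(api_key, base_url=NEW_ARTIFACTORY):
    """Build (but do not send) a health-check request authenticated by API key."""
    request = urllib.request.Request(f"{base_url}/api/system/ping")
    # The key travels in a header, so an LDAP password rotation does not affect it.
    request.add_header("X-JFrog-Art-Api", api_key)
    return request


if __name__ == "__main__":
    req = build_ping_request("PASTE-YOUR-API-KEY")
    # Uncomment to actually call the server:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode())
    print(req.full_url)
```

The request is only constructed, not sent, so the sketch can be tried without network access; the same header can be reused for any other REST call.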
Previously we went over some of the changes required. There are several things we need to set up in preparation. First, we want to make sure that this transition experience is smooth. To help achieve that, we first set up an automated artifact transfer from the old to the new server on an hourly basis. TeamCity builds continue to publish to the old Artifactory server. Meanwhile, artifacts from the old server are automatically pushed to the new server. So once a build configuration switches to the new Artifactory, all the new builds will continue where the old builds left off. By that, I mean, the artifacts generated by old builds will be available for new builds.
So in this case, there shouldn’t be any interruption; in fact, we did not experience any interruption. As you may be aware, or have experienced, when you do continuous builds, the artifacts generated today may sometimes be required by a new build, let’s say tomorrow or next week. So it’s very important to keep all the artifacts generated before migration available for the builds after migration.
Here’s a sample for artifact transfer. We simply use a cron job and JFrog CLI commands. This is a sample spec input; some of you are probably familiar with it if you have been using JFrog CLI. So here we use AQL, the Artifactory Query Language, just to identify the repository we are interested in for migrating artifacts. In this example, we include a path match to look up all the com/SalesForce artifacts. So we look up all the artifacts that have been downloaded or updated within the last 60 minutes. Meaning, basically, we are interested in all the artifacts that were updated, rewritten or whatever within the last hour. Then we do this on an hourly basis.
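The slide itself is not reproduced in this transcript, so the following Python sketch is only a guess at what such a transfer job could look like. It builds a JFrog CLI file spec with an AQL query and the download-then-upload commands as argument lists. The repository name, server IDs, staging directory, the lowercase `com/salesforce/*` pattern, and the choice of the `modified` and `updated` fields are all assumptions for illustration; `$match`, `$last`, `--spec`, `--server-id` and `--flat` are existing AQL operators and CLI options.

```python
import json

# Hypothetical names; replace with your own repository and configured server IDs.
SOURCE_REPO = "libs-release-local"
OLD_SERVER_ID = "old-artifactory"
NEW_SERVER_ID = "new-artifactory"
STAGING_DIR = "staging"


def build_file_spec(repo=SOURCE_REPO, path_pattern="com/salesforce/*", window="60minutes"):
    """File spec whose AQL matches artifacts under path_pattern touched within window."""
    return {
        "files": [
            {
                "aql": {
                    "items.find": {
                        "repo": repo,
                        "path": {"$match": path_pattern},
                        # Assumption: "touched" means modified or updated recently.
                        "$or": [
                            {"modified": {"$last": window}},
                            {"updated": {"$last": window}},
                        ],
                    }
                },
                "target": f"{STAGING_DIR}/",
            }
        ]
    }


def build_transfer_commands(spec_path="transfer-spec.json", cli="jf"):
    """Return the download and upload commands as argument lists (not executed here)."""
    download = [cli, "rt", "dl", f"--spec={spec_path}", f"--server-id={OLD_SERVER_ID}"]
    upload = [
        cli, "rt", "u",
        f"{STAGING_DIR}/(*)",       # capture the path below the staging directory
        f"{SOURCE_REPO}/{{1}}",     # reuse it as the path in the target repository
        "--flat=true",
        f"--server-id={NEW_SERVER_ID}",
    ]
    return [download, upload]


if __name__ == "__main__":
    print(json.dumps(build_file_spec(), indent=2))
    for command in build_transfer_commands():
        print(" ".join(command))
```

An hourly cron entry could write the spec to a file and run the two commands in order; pass `cli="jfrog"` for older CLI releases that use that binary name.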
Here, on the bottom, you can see the command line: using JFrog CLI, we download from the old Artifactory server and then upload the same artifacts to the new server. Once we start syncing the artifacts, we can do the build validation tests using the new settings. Of course, we don’t want to interrupt any ongoing production builds. Therefore, we create a new TeamCity agent image. This new TeamCity agent image is based on the existing image.
Then we change the local settings to point to the new Artifactory server, followed by launching a new agent instance from the new agent image. By using the new agent instance, we were able to carry out test builds. How do we do the test builds? Well, first we clone an existing working build configuration and update its Maven settings to point to the new Artifactory. Then we run one build against the new agent instance, followed by validating the build results.
During this step, as you can see here, we run the build test against a new agent instance, and there’s no interruption at all to anything ongoing, whether development, tests or builds against the old server. When we validate the new build settings based on the build itself, we evaluate the correctness and the build performance. Preferably the performance is better, but at the least there should not be a build performance difference between the old and new build settings.
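The talk does not say how correctness and performance were compared. One possible approach, sketched here in Python under that caveat, is to compare the output files of an old-settings build and a new-settings build by checksum and to flag the new build if it is noticeably slower. The function names and the 10% tolerance are hypothetical choices, not values from the talk.

```python
import hashlib
from pathlib import Path


def checksums(root):
    """Map each file path relative to root to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(Path(root).rglob("*"))
        if path.is_file()
    }


def compare_builds(old_dir, new_dir, old_seconds, new_seconds, tolerance=0.10):
    """Report files missing or changed in the new build, and whether it ran slower."""
    old_sums, new_sums = checksums(old_dir), checksums(new_dir)
    return {
        "missing": sorted(set(old_sums) - set(new_sums)),
        "different": sorted(
            name for name in old_sums.keys() & new_sums.keys()
            if old_sums[name] != new_sums[name]
        ),
        # Slower only if the new build exceeds the old duration by more than tolerance.
        "slower": new_seconds > old_seconds * (1 + tolerance),
    }
```

Builds that embed timestamps are not byte-for-byte reproducible, so entries under `different` are a prompt for a closer look rather than proof of a problem.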
After we validate build results using the new TeamCity and Artifactory settings, we need to carry out the migration for all the builds. So this will take some time for maintenance. For us, it is fairly straightforward. First we need to communicate very clearly what the migration steps are and what the maintenance schedule is.
Once the information is sent out and we get a go-ahead agreement with all the stakeholders, we are able to carry out the maintenance and migration. Here are several high level steps we took to complete the migration. First we need to run a TeamCity server backup and database snapshot. We pause the TeamCity queue. At that point, no new build will take place. Once the build queue is paused, we sync artifacts again from the old to the new Artifactory server.
As I mentioned earlier, there was a cron job continuously syncing artifacts from the old Artifactory to the new Artifactory server. The next steps include updating TeamCity build configurations, and also changing the TeamCity agent launch configuration. So why is the launch configuration important here? That’s related to the following steps: we need to shut down the old agent instances, then launch new instances.
At this point, the new instances will be launched from the new agent launch configuration. As a result, they will include all the updated settings for the new Artifactory server. In the meantime, we need to search for all hardcoded references to the old Artifactory in GitHub, where we manage all our source code. So all these old Artifactory references must be updated to the new server. After replacing the references with the new server, we are ready to continue all build activities.
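The search for hardcoded references could be done in many ways; the talk does not name a tool. As one possible sketch, the Python below scans a checked-out repository for the old host name and rewrites it. Both host names and the list of file suffixes are placeholders chosen for illustration.

```python
from pathlib import Path

# Hypothetical host names; substitute the real old and new servers.
OLD_HOST = "old-artifactory.example.com"
NEW_HOST = "new-artifactory.example.com"
# Assumed set of build and config file types worth scanning.
TEXT_SUFFIXES = {".xml", ".gradle", ".sbt", ".properties", ".py", ".cfg", ".yml", ".yaml"}


def find_references(root, old_host=OLD_HOST):
    """Return (relative path, line number) for every line mentioning old_host."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in TEXT_SUFFIXES:
            continue
        if ".git" in path.parts:
            continue  # skip repository metadata
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # not a UTF-8 text file, leave it alone
        for number, line in enumerate(text.splitlines(), start=1):
            if old_host in line:
                hits.append((path.relative_to(root).as_posix(), number))
    return hits


def replace_references(root, old_host=OLD_HOST, new_host=NEW_HOST):
    """Rewrite old_host to new_host in every matching file; return files changed."""
    root = Path(root)
    names = sorted({name for name, _ in find_references(root, old_host)})
    for name in names:
        path = root / name
        text = path.read_text(encoding="utf-8")
        path.write_text(text.replace(old_host, new_host), encoding="utf-8")
    return len(names)
```

Running `find_references` first gives a reviewable list before anything is rewritten, and the resulting changes can then go through normal code review.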
So at that point, we re-enable the TeamCity build queue. From that point on all the builds resume, and we are ready to communicate with all the stakeholders about the migration result. So how do we finalize the migration? One key aspect is communication. During this process, as I mentioned a couple of times, we communicate the migration steps: what needs to be done, and what changes apply on the server side, the client side and in user local settings. Once all the build configurations are migrated and the build queue resumes, we notify all stakeholders about the migration success. We provide a summary of changes on the server and client sides. We reiterate the user local changes required. So why is this aspect so critical? Well, we have a global development team.
It’s very important to communicate repeatedly about what change they need to apply in order to continue their local build. Otherwise, there may be a developer productivity issue. To help support our users, we set up office hours to answer any support questions.
For example, some users may have trouble logging in to the new server, or setting up API key credentials, or their local settings.
We dedicate certain time slots within the two weeks after the final build configuration migration to make sure we continue to provide support. And we can provide more one-on-one support as needed. So in conclusion, what is this story about? It’s about migrating artifacts. Why do we do this?
Well, again, the old server infrastructure was complex and not well supported. As a result, upgrade and maintenance costs are uncertain. For the migration, it is very important to have a high level of confidence that the migration path will work. So how do we accomplish this migration effort?
There are several key aspects. One is we set up automated transfer of repositories. We identify all the changes required. Once we identify the changes required, we need to validate all the changes in isolated configurations. Again, this step is critical, because we obviously do not want to interrupt any ongoing build or test activity while we try to validate all the changes required. Once validation is done, we apply the changes to all build configurations. Then we communicate with users regarding local setting changes, and we continue to provide support as needed.
So that concludes today’s presentation.
Thank you very much.
Look forward to your questions and enjoy the rest of the SwampUP event.