Best Practices for Virtual Repository Sizing

Patrick Russell
2021-09-14 08:51

Virtual Repository Best Practices

    The sad story of "/repo" 

For some teams, the desired end-state for Artifactory was a single URL that served every artifact download. In theory, it sounds like a breezy, one-stop shopping solution. In practice, this type of setup causes many problems, and the approach is not supported in Artifactory.

As a best practice, a virtual repository should not aggregate more than twenty (20) repositories.

/repo – The Global Virtual

In Artifactory versions 2 and 3, there was a global virtual repository called /repo, which aggregated all local and remote repositories found within the Artifactory application. As Artifactory was initially a small and specialized application, things started off well enough. Users would point their builds to /repo and know they’d be able to pull any Java-based package into Artifactory.

However, problems soon began to manifest.

Builds would pull the wrong artifact from the wrong repository. Maven’s Snapshot version would jump wildly depending on which path was resolved first. Downloads slowed to a crawl because the repository had to aggregate so many paths. As Artifactory’s usage philosophy developed, this repository was permanently disabled in Artifactory 4.7.3.

What were the lessons learned?

The Problem with Large Virtuals

Virtual repositories aggregate other repositories. This is supposed to scale up smoothly, but problems can appear once a virtual aggregates more than twenty repositories (the exact number depends on the size of the included repositories). Artifactory must run a resolution calculation in a virtual repository every time it serves an artifact, which ensures that responses come from the correct repositories. Virtual repositories use this resolution algorithm to:

  1. Serve any files found in local repositories first.
  2. Serve any files found in remote repository caches second.
  3. Download and serve remote artifacts last.
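
The ordering above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Artifactory's actual implementation: the Repo class, its fields, and the resolve function are hypothetical names, and details such as include/exclude patterns and permissions are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Hypothetical stand-in for one repository aggregated by a virtual."""
    name: str
    kind: str                                   # "local" or "remote"
    stored: set = field(default_factory=set)    # local files, or the remote cache
    upstream: set = field(default_factory=set)  # what the remote endpoint offers


def resolve(path, repos):
    """Return (repo name, source) for the first hit, following the three steps."""
    # Step 1: files already present in local repositories.
    for repo in repos:
        if repo.kind == "local" and path in repo.stored:
            return repo.name, "local"
    # Step 2: files already present in remote repository caches.
    for repo in repos:
        if repo.kind == "remote" and path in repo.stored:
            return repo.name, "remote-cache"
    # Step 3: fetch from the remote endpoint, cache it, then serve it.
    for repo in repos:
        if repo.kind == "remote" and path in repo.upstream:
            repo.stored.add(path)
            return repo.name, "remote-download"
    return None


repos = [
    Repo("libs-release-local", "local", stored={"org/acme/app/1.0/app-1.0.jar"}),
    Repo("central-remote", "remote", upstream={"junit/junit/4.13/junit-4.13.jar"}),
]
print(resolve("junit/junit/4.13/junit-4.13.jar", repos))  # first request: downloaded
print(resolve("junit/junit/4.13/junit-4.13.jar", repos))  # second request: from cache
```

In this model, a path that is absent everywhere still costs a pass over every aggregated repository in all three steps, which is why the size of the virtual matters.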

To determine which repository contains a particular artifact, Artifactory must check every repository in the virtual. The Trace Artifact Retrieval REST API can be used to view this logic in action. The worst-case scenario for this algorithm should be clear: what if an artifact is in the last remote repository in the virtual list? If a given artifact sits at the bottom of the resolution algorithm (shown above as step 3), it will take far longer to download than it would if pulled directly from a remote repository.
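
As a rough illustration, a trace request can be built by appending a trace query parameter to a normal download URL. The sketch below assumes the "?trace" form described in JFrog's REST API documentation for Trace Artifact Retrieval, so verify it against the documentation for your version. The host name, repository key, and the trace_url helper are hypothetical.

```python
from urllib.parse import quote


def trace_url(base_url, repo_key, artifact_path):
    """Build a URL that asks for a resolution trace instead of the file itself."""
    base = base_url.rstrip("/")
    path = quote(artifact_path.lstrip("/"))  # keeps "/" separators intact
    return f"{base}/{quote(repo_key, safe='')}/{path}?trace"


url = trace_url(
    "https://artifactory.example.com/artifactory",  # hypothetical host
    "team-a-maven",                                  # hypothetical virtual repository
    "junit/junit/4.13/junit-4.13.jar",
)
print(url)
```

Requesting the resulting URL with any HTTP client and valid credentials should return a plain-text description of the repositories that were consulted, rather than the artifact.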

UI Performance Problems

There are other, subtler problems that arise when managing a huge virtual repository. As your Artifacts browser must display all artifacts and subfolders within any given folder, virtual repositories need to perform a nested check to determine what is displayed. This means checking each remote endpoint, as well as every local folder your virtual repository is aggregating.

Here’s an example of a two-second load time with only a remote repository:

[Image: two-second load time with only a remote repository]

In most instances, your virtual repository will need to complete many HEAD requests against multiple remote endpoints. Expect UI actions such as opening a folder to take upwards of thirty (30) seconds to complete on a large virtual repository folder. 
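
The nested check behind a folder view can be modeled as a union of the folder's children across every aggregated repository. The following is a minimal sketch under that assumption: list_folder and the listing callables are hypothetical, and in a real deployment each remote lookup would involve network round trips instead of an in-memory list.

```python
def list_folder(folder, repo_listings):
    """Merge the children of `folder` across every aggregated repository.

    `repo_listings` maps a repository name to a function that returns the
    child names of a folder in that repository.
    """
    children = set()
    lookups = 0
    for name, list_children in repo_listings.items():
        lookups += 1  # one check per aggregated repository
        children.update(list_children(folder))
    return sorted(children), lookups


# Hypothetical repositories with fixed listings, for illustration only.
listings = {
    "libs-local": lambda folder: ["app", "core"],
    "central-remote": lambda folder: ["core", "junit"],
    "plugins-remote": lambda folder: ["maven-plugin"],
}
children, lookups = list_folder("org/acme", listings)
print(children, lookups)
```

The number of lookups grows with the number of aggregated repositories, so every folder opened in a large virtual pays that cost again.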

Aggregation Issues

Virtual repositories for different repository types will have different logic handlers. For example, if two Maven Snapshot builds share the same path, the virtual repository will serve an "incorrect" file based on the resolution order:

[Image: virtual repository serving the "incorrect" Maven Snapshot file]

You may be able to change the Resolution Order of your repositories to fix this particular pathing issue, but it could break other builds that are using the same, large virtual repository.
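
The effect of resolution order on a shared path can be shown with a small first-match model. This is a simplified sketch: first_match, the repository names, and the artifact path are hypothetical, and type-specific handling such as Maven metadata is ignored.

```python
def first_match(path, ordered_repos):
    """Return (repo name, content) from the first repository holding `path`."""
    for name, contents in ordered_repos:
        if path in contents:
            return name, contents[path]
    return None


path = "com/acme/app/1.0-SNAPSHOT/app-1.0-SNAPSHOT.jar"
team_a = ("team-a-snapshots", {path: "build from team A"})
team_b = ("team-b-snapshots", {path: "build from team B"})

# The same request returns a different file depending on the order.
print(first_match(path, [team_a, team_b]))
print(first_match(path, [team_b, team_a]))
```

Whichever order is chosen, one of the two teams receives the other team's build, which is why reordering fixes one build at the risk of breaking another.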

Alternatives to Large Virtuals

There's an easy solution to the problems associated with managing large virtual repositories – use smaller, more specific virtual repositories. The number of virtual repositories you can create in Artifactory is far less constrained than the number of repositories any single virtual should aggregate. Each development team can craft its own virtual repository to suit particular build or project needs. This also delegates repository management to the parties most interested in maintaining it.
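
A per-team layout can be sketched as a small helper that builds one virtual repository definition per team and rejects any definition that exceeds the twenty-repository guideline. The field names (key, rclass, packageType, repositories) are modeled on Artifactory's repository configuration JSON and should be checked against the documentation for your version. The helper, team names, and repository keys are hypothetical.

```python
import json

MAX_AGGREGATED = 20  # guideline from this article, not a hard limit in Artifactory


def virtual_repo_config(key, package_type, repositories):
    """Build a virtual repository definition, enforcing the size guideline."""
    if len(repositories) > MAX_AGGREGATED:
        raise ValueError(
            f"{key} aggregates {len(repositories)} repositories; "
            f"the guideline is at most {MAX_AGGREGATED}"
        )
    return {
        "key": key,
        "rclass": "virtual",
        "packageType": package_type,
        "repositories": list(repositories),  # resolution order is list order
    }


# Hypothetical teams, each with a small, purpose-built virtual repository.
teams = {
    "team-a-maven": ["team-a-releases", "team-a-snapshots", "central-remote"],
    "team-b-maven": ["team-b-releases", "central-remote"],
}
configs = [virtual_repo_config(key, "maven", repos) for key, repos in teams.items()]
print(json.dumps(configs, indent=2))
```

Keeping the check in the code that generates the definitions makes the guideline hard to exceed by accident as teams add repositories over time.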