Caching helps you speed up execution of a step by preserving and restoring packages and dependencies between runs of that step. In this way, you can reduce build times by avoiding reinstalling or reloading large dependencies every time the step runs.
Native steps perform caching as needed, so they always execute as fast as possible. You only need the methods described on this page when you are using general-purpose Bash steps, or when your native step performs a cacheable action in its onStart or onComplete execution block.
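For example, a native step can call the restore_cache_files and add_cache_files utility functions (described below) from its onStart and onComplete blocks. The following sketch is illustrative only: the MvnBuild step type and the cached ~/.m2 directory are assumptions, and the native step's own configuration is omitted; the point is simply where the caching calls can be placed.

  - name: build_my_app
    type: MvnBuild        # example native step type; yours may differ
    execution:
      onStart:
        # Restore a previously saved Maven local repository, if one exists
        - restore_cache_files mvn_cache $HOME/.m2
      onComplete:
        # Save the local repository for the next run of this step
        - add_cache_files $HOME/.m2 mvn_cache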
How Caching Works
Caching is performed through utility functions that store data to, and restore data from, the Artifactory filestore. In this way, a step can benefit from the dependencies that were installed or loaded in a previously executed run.
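The two utility functions are restore_cache_files and add_cache_files. Their call shapes, following the argument order used in the example later on this page, are:

  # Restore a named cache into a target path (no error if the cache does not exist yet)
  restore_cache_files <cache_name> <path_to_restore>

  # Save a path under a cache name for the next run of this step
  add_cache_files <path_to_save> <cache_name>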
Cache File Scope
A cache file that a step stores using the add_cache_files function is associated with that step, and is only available to a restore_cache_files call from that same step in a subsequent run.
You cannot pass a cache file to other steps in a pipeline. For example, if step_1 adds the cache file step_1_cache, and step_2 tries to restore step_1_cache, nothing will be loaded.
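A minimal sketch of this scenario, with illustrative step names, paths, and an assumed inputSteps dependency so that step_2 runs after step_1:

pipelines:
  - name: cache_scope_demo
    steps:
      - name: step_1
        type: Bash
        execution:
          onExecute:
            - mkdir -p /tmp/build_output && echo "hello" > /tmp/build_output/file.txt
          onComplete:
            # step_1_cache is associated with step_1 only
            - add_cache_files /tmp/build_output step_1_cache
      - name: step_2
        type: Bash
        configuration:
          inputSteps:
            - name: step_1
        execution:
          onExecute:
            # Restores nothing: step_1_cache belongs to step_1, not step_2
            - restore_cache_files step_1_cache /tmp/build_output
            - ls /tmp/build_output || echo "cache was not restored in step_2"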
Note
Steps running in parallel can overwrite the cache. If parallel steps update the same step or pipeline state, only the last version saved will be available to future steps.
Filestore Limitations
In general, the Artifactory filestore provides the highest available performance for storing and restoring data.
However, the speed you experience will depend on which storage medium Artifactory has been configured to use. If Artifactory has been configured to use the file system on a local or mounted filestore, this is fast storage and caching will always accelerate step execution. If Artifactory has been configured to use remote storage such as S3 or Google Cloud Storage, then the slower roundtrip to and from the filestore may diminish the usefulness of caching:
Files that take a long time to install always benefit from caching, so anything related to bundler, npm, composer, pip, and so on is a great candidate for caching.
Files that take a long time to download but install quickly do not benefit from caching, since it takes as much time to download them from S3 as from the original source. Examples are compiled binaries, JDK packages, and so on.
Example
The following example caches the results of an npm install for subsequent runs.
resources:
  - name: my_gitrepo
    type: GitRepo
    configuration:
      gitProvider: my_github-integration       # replace with your integration
      path: my-github/my-pipelines-project     # replace with your repository path
      branches:
        include: master

pipelines:
  - name: pipelines_caching
    steps:
      - name: step_1_pipelines_caching
        type: Bash
        configuration:
          inputResources:
            - name: my_gitrepo
        execution:
          onExecute:
            - cd $res_my_gitrepo_resourcePath
            - restore_cache_files npm_cache $res_my_gitrepo_resourcePath/node_modules
            - npm install
          onComplete:
            - add_cache_files $res_my_gitrepo_resourcePath/node_modules npm_cache
The step's onExecute block calls the restore_cache_files function to load the cached npm dependencies if they are available from a previous run. If none exist, no error results, and the remainder of the step executes without interruption. When npm install runs, it recognizes any dependencies already present from the cache, so the step executes more rapidly. If there was no cache to load, the npm dependencies are installed as usual. When the step completes, it always writes the npm dependencies to the cache so they will be available to the step in the next run of the pipeline.
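The same pattern applies to the other package managers mentioned above. A hedged variation for pip, assuming pip is available in the step image and the repository has a requirements.txt at its root, caches pip's download cache directory between runs:

        execution:
          onExecute:
            - cd $res_my_gitrepo_resourcePath
            # Restore pip's download cache from a previous run, if one exists
            - restore_cache_files pip_cache $HOME/.cache/pip
            - pip install -r requirements.txt
          onComplete:
            # Save the download cache for the next run of this step
            - add_cache_files $HOME/.cache/pip pip_cache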