Caching helps you speed up execution of a step by preserving and restoring packages and dependencies between runs of that step. This reduces build times by avoiding repeated installation or loading of large dependencies every time the step runs.
Native steps perform caching as needed, so they always execute as fast as possible. You only need to use the methods described on this page when you are using general-purpose Bash steps, or when your native step performs a cacheable action in its onStart or onComplete execution block.
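For example, if a native step downloads tooling in its onStart block, that download can be wrapped with the cache functions described below. The following is a minimal sketch only: the NpmBuild step type stands in for any native step (its normal configuration is omitted), and the tool_cache cache name and /tmp/tooling path are hypothetical placeholders.

pipelines:
  - name: native_step_caching
    steps:
      - name: build_step
        type: NpmBuild           # stand-in for any native step type; usual configuration omitted
        execution:
          onStart:
            # restore the tooling cached by a previous run, if any
            - restore_cache_files tool_cache /tmp/tooling
          onComplete:
            # re-cache the tooling so the next run can skip preparing it again
            - add_cache_files /tmp/tooling tool_cache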
How Caching Works
Caching is performed through utility functions that store and restore data to and from the Artifactory filestore. This allows a step to reuse dependencies that were installed or loaded in a previous run.
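As the example at the end of this page shows, add_cache_files takes the path to cache followed by a cache name, and restore_cache_files takes the cache name followed by the path to restore to. A minimal sketch of their usage; the path and cache name below are placeholders:

# store the given path under a named cache entry
add_cache_files /path/to/files my_cache_name

# in a later run of the same step, restore that entry to a target path
restore_cache_files my_cache_name /path/to/files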
Cache File Scope
A cache file that a step stores using the add_cache_files function will be associated with that step, and will only be available to a restore_cache_files function from that same step in a subsequent run.
You cannot pass a cache file to other steps in a pipeline. For example, if step_1 adds the cache file step_1_cache, and step_2 tries to restore step_1_cache, nothing will be loaded.
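A minimal sketch of that scenario, using the step and cache names from the prose above (the Bash step bodies and the /tmp/build_output path are placeholders):

pipelines:
  - name: cache_scope_demo
    steps:
      - name: step_1
        type: Bash
        execution:
          onExecute:
            - mkdir -p /tmp/build_output && echo "hello" > /tmp/build_output/file.txt
          onComplete:
            # this cache entry is associated with step_1 only
            - add_cache_files /tmp/build_output step_1_cache
      - name: step_2
        type: Bash
        configuration:
          inputSteps:
            - name: step_1
        execution:
          onExecute:
            # loads nothing: step_1_cache belongs to step_1, not step_2
            - restore_cache_files step_1_cache /tmp/build_output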
Note
Steps running in parallel can overwrite the cache. If multiple steps running in parallel update the same cache, only the last version saved will be available to future runs.
Filestore Limitations
In general, the Artifactory filestore provides the highest available performance for storing and restoring data.
However, the speed you experience depends on which storage medium Artifactory is configured to use. If Artifactory uses the file system on a local or mounted filestore, storage is fast and caching will always accelerate step execution. If Artifactory uses remote storage such as S3 or Google Cloud Storage, the slower round trip to and from the filestore may diminish the usefulness of caching:
- Files that take a long time to install always benefit from caching, so anything related to Bundler, npm, Composer, pip, and similar package managers is a great candidate for caching.
- Files that take a long time to download but install quickly do not benefit from caching, since downloading them from S3 takes as much time as downloading them from the original source. Examples include compiled binaries and JDK packages.
Example
The following example caches the results of an npm install for subsequent runs.
resources:
  - name: my_gitrepo
    type: GitRepo
    configuration:
      gitProvider: my_github-integration    # replace with your integration
      path: my-github/my-pipelines-project  # replace with your repository path
      branches:
        include: master

pipelines:
  - name: pipelines_caching
    steps:
      - name: step_1_pipelines_caching
        type: Bash
        configuration:
          inputResources:
            - name: my_gitrepo
        execution:
          onExecute:
            - cd $res_my_gitrepo_resourcePath
            - restore_cache_files npm_cache $res_my_gitrepo_resourcePath/node_modules
            - npm install
          onComplete:
            - add_cache_files $res_my_gitrepo_resourcePath/node_modules npm_cache

The step's onExecute block performs a restore_cache_files function to load the cached npm dependencies if they are available from a previous run. If none exist, no error will result, so the remainder of the step will execute without interruption.

When the npm install is run, it will recognize if the dependencies are already present from the cache, so the step will execute more rapidly. If there was no cache to load, then the npm dependencies will be installed.

When the step is complete, it will always write the npm dependencies to the cache so they will be available to the step in the next run of the pipeline.