Caching helps you speed up execution of a step by preserving and restoring packages and dependencies between runs of that step. In this way, you can reduce build times by avoiding reinstalling or reloading large dependencies every time the step runs.
Native steps perform caching as needed, so they always execute as fast as possible. You only need the methods described on this page when you are using general-purpose Bash steps, or when your native step performs a cacheable action in its onStart or onComplete execution block.
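For example, a native step can call the restore_cache_files and add_cache_files utility functions (described below) from its onStart and onComplete blocks. The following sketch is illustrative only: the MvnBuild step type and the cached ~/.m2 directory are assumptions, and the native step's own configuration is omitted; the point is simply where the caching calls can be placed.

  - name: build_my_app
    type: MvnBuild        # example native step type; yours may differ
    execution:
      onStart:
        # Restore a previously saved Maven local repository, if one exists
        - restore_cache_files mvn_cache $HOME/.m2
      onComplete:
        # Save the local repository for the next run of this step
        - add_cache_files $HOME/.m2 mvn_cache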
How Caching Works
Caching is performed through utility functions that store data to, and restore data from, the Artifactory filestore. In this way, a step can benefit from the dependencies that were installed or loaded in a previously executed run.
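The two utility functions are restore_cache_files and add_cache_files. Their call shapes, following the argument order used in the example later on this page, are:

  # Restore a named cache into a target path (no error if the cache does not exist yet)
  restore_cache_files <cache_name> <path_to_restore>

  # Save a path under a cache name for the next run of this step
  add_cache_files <path_to_save> <cache_name>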
Cache File Scope
A cache file that a step stores using the add_cache_files function is associated with that step, and is only available to a restore_cache_files call from that same step in a subsequent run.
You cannot pass a cache file to other steps in a pipeline. For example, if step_1 adds the cache file step_1_cache, and step_2 tries to restore step_1_cache, nothing will be loaded.
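A minimal sketch of this scenario, with illustrative step names, paths, and an assumed inputSteps dependency so that step_2 runs after step_1:

pipelines:
  - name: cache_scope_demo
    steps:
      - name: step_1
        type: Bash
        execution:
          onExecute:
            - mkdir -p /tmp/build_output && echo "hello" > /tmp/build_output/file.txt
          onComplete:
            # step_1_cache is associated with step_1 only
            - add_cache_files /tmp/build_output step_1_cache
      - name: step_2
        type: Bash
        configuration:
          inputSteps:
            - name: step_1
        execution:
          onExecute:
            # Restores nothing: step_1_cache belongs to step_1, not step_2
            - restore_cache_files step_1_cache /tmp/build_output
            - ls /tmp/build_output || echo "cache was not restored in step_2"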
Note
Steps running in parallel can overwrite the cache. If parallel steps update the same step or pipeline state, only the last version saved will be available to future steps.
Filestore Limitations
In general, the Artifactory filestore provides the highest available performance for storing and restoring data.
However, the speed you experience will depend on which storage medium Artifactory has been configured to use. If Artifactory has been configured to use the file system on a local or mounted filestore, this is fast storage and caching will always accelerate step execution. If Artifactory has been configured to use remote storage such as S3 or Google Cloud Storage, then the slower roundtrip to and from the filestore may diminish the usefulness of caching:
Files that take a long time to install always benefit from caching, so anything related to bundler, npm, composer, pip, and so on is a great candidate for caching.
Files that take a long time to download but install quickly do not benefit from caching, since it takes as much time to download them from S3 as from the original source. Examples are compiled binaries, JDK packages, and so on.
Example
The following example caches the results of an npm install for subsequent runs.
resources:
  - name: my_gitrepo
    type: GitRepo
    configuration:
      gitProvider: my_github-integration       # replace with your integration
      path: my-github/my-pipelines-project     # replace with your repository path
      branches:
        include: master

pipelines:
  - name: pipelines_caching
    steps:
      - name: step_1_pipelines_caching
        type: Bash
        configuration:
          inputResources:
            - name: my_gitrepo
        execution:
          onExecute:
            - cd $res_my_gitrepo_resourcePath
            - restore_cache_files npm_cache $res_my_gitrepo_resourcePath/node_modules
            - npm install
          onComplete:
            - add_cache_files $res_my_gitrepo_resourcePath/node_modules npm_cache
The step's onExecute block calls the restore_cache_files function to load the cached npm dependencies if they are available from a previous run. If none exist, no error results, and the remainder of the step executes without interruption. When npm install runs, it recognizes any dependencies already present from the cache, so the step executes more rapidly. If there was no cache to load, the npm dependencies are installed as usual. When the step completes, it always writes the npm dependencies to the cache so they will be available to the step in the next run of the pipeline.
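The same pattern applies to the other package managers mentioned above. A hedged variation for pip, assuming pip is available in the step image and the repository has a requirements.txt at its root, caches pip's download cache directory between runs:

        execution:
          onExecute:
            - cd $res_my_gitrepo_resourcePath
            # Restore pip's download cache from a previous run, if one exists
            - restore_cache_files pip_cache $HOME/.cache/pip
            - pip install -r requirements.txt
          onComplete:
            # Save the download cache for the next run of this step
            - add_cache_files $HOME/.cache/pip pip_cache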