How to clean up old Docker images

Angello Maggio
2019-08-26 17:44

Cleaning Up Old and Unused Docker Images

Sometimes we accumulate too many images that are no longer used, or that have not been downloaded in a very long time. However, due to the layered nature of Docker images and Artifactory's checksum-based storage, cleaning them up can be a tricky task.

Description

Docker images are stored in layers, and each layer has its own checksum. Just like with any other artifact, Artifactory stores each layer based on this checksum, which causes layers to be shared between different deployments; not only between different tags, but also between different images. That means that deleting layers based on their last download date can cause issues during cleanup.

Let's say you are using the REST API or AQL to find old Docker images based on least use, so you run a query for all artifacts not downloaded in the last 3 months and then delete them. You might still be left with images that have not been used in a long time but are now incomplete: the layers they share with other tags or images were downloaded recently through those, so they did not match the query and were not deleted, while the layers unique to the old image were. Along the same line, deleting a layer from one image will not actually remove it from storage as long as other images reference it. What we have to focus on, then, is deleting each image as a whole.
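For illustration, a naive layer-level query like the following (a sketch; <DOCKER_REPO_NAME> is a placeholder and the 3-month window is arbitrary) matches individual layer files rather than whole images, which is exactly the trap described above:

items.find({"repo": "<DOCKER_REPO_NAME>", "stat.downloaded": {"$before": "3mo"}})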

Resolution

So how do we clean up Docker images? We search based on the manifest.json file, whose download statistics change only when that specific image/tag is downloaded or used.

For example, the following Python script looks for all manifest.json files that have not been downloaded in 4 weeks or more and deletes the entire image. Be careful when running the script, as it will delete files; make sure to test it first.

import requests

BASE_URL = 'http://localhost:8081/artifactory/'
AUTH = ('admin', 'password')  # replace with your own credentials


def clean_docker():
    headers = {'Content-Type': 'text/plain'}
    # Find every manifest.json not downloaded for 4 weeks or more
    data = 'items.find({"name":{"$eq":"manifest.json"},"stat.downloaded":{"$before":"4w"}})'
    response = requests.post(BASE_URL + 'api/search/aql', auth=AUTH, headers=headers, data=data)
    for result in response.json()['results']:
        # result['path'] is the folder holding the manifest, i.e. the image/tag itself,
        # so deleting it removes the image as a whole
        image_url = BASE_URL + result['repo'] + '/' + result['path']
        requests.delete(image_url, auth=AUTH)  # <-- THIS WILL DELETE FILES


if __name__ == '__main__':
    clean_docker()

You can also do this sort of operation using the JFrog CLI to delete files. In this case you'd specify your targets and filtering in a spec file, and then run the CLI in this fashion:

jfrog rt del --spec=<mySpecfile>
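Before deleting anything, you can preview which paths the spec matches by adding the CLI's --dry-run flag:

jfrog rt del --spec=<mySpecfile> --dry-run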

An example of a spec file in this case would look like this:

{
  "files": [
    {
      "aql": {
        "items.find": {
          "repo": "<DOCKER_REPO_NAME>",
          "$and": [
            {"created": {"$before": "4w"}}
          ]
        }
      }
    }
  ]
}
Instead of "created" you may use the "stat.downloaded" field to find the images that haven't been used recently (see the AQL documentation for more fields, and remember to qualify them as domain.field), along with a relative time operator such as $last (for example, downloaded within the last 4 weeks) or $before (for example, downloaded before 4 weeks ago). You may also put them together to make a range by compounding criteria with $and/$or, as in the sketch below.
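For instance, a spec along these lines (a sketch; <DOCKER_REPO_NAME> is still a placeholder and the time windows are arbitrary) finds manifest.json files whose images were last downloaded between a year ago and 4 weeks ago, combining the two criteria with $and:

{
  "files": [
    {
      "aql": {
        "items.find": {
          "repo": "<DOCKER_REPO_NAME>",
          "name": "manifest.json",
          "$and": [
            {"stat.downloaded": {"$before": "4w"}},
            {"stat.downloaded": {"$last": "1y"}}
          ]
        }
      }
    }
  ]
}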