How to Choose a Docker Base Image for Python
When it comes to containerized applications, choosing the right container base image – meaning the foundational image on which you build new, customized container images – is half the battle for success. The base image you choose plays a central role in your application’s security, performance, maintainability and more.
To illustrate the point, this article walks through several approaches to selecting a base image for an application developed in Python, one of the most popular programming languages today. As we’ll see, although Python applications are compatible with virtually any type of container base image, one base image may make more sense for you than another depending on your needs and priorities.
Why choosing the right base image is critical
Before diving into base image options for Python, let’s discuss the numerous reasons why selecting the right base image is so important:
- Performance: The overall size of your base image affects how quickly the image can be downloaded. In addition, the amount of code that the image runs when the container starts impacts the time it takes for your app to get up and running. Both of these factors are key considerations for overall container performance.
- Dependencies: Many applications have dependencies, meaning libraries or other code that they require to operate. If these dependencies are baked into the base image, you don’t have to do as much work to configure the container or customize its contents to fit your app.
- Updates: You’ll want to ensure that your base image is kept up-to-date. That’s easier to do if your base image is based on a software platform or distribution that is actively maintained and regularly patched by third-party developers.
- Security: The more code that is inside your base image, the larger the attack surface of your container. For this reason, base images that include only the code strictly necessary to run your app, and nothing more, are more secure.
As you can see, there is some conflict between these priorities. For example, base images that contain a lot of libraries are more likely to satisfy your application’s dependencies. On the other hand, they may also be riskier from a security perspective, because they could contain libraries you don’t actually need, and which therefore increase your exposure to potential security vulnerabilities.
Given the competing considerations to weigh when selecting a base image, there is rarely an obviously ideal choice. Instead, you’ll need to think strategically about the particular requirements of your application and hosting environment before committing to a particular base image.
Three base image options for Python
As an example of how to factor varying priorities into base image selection, imagine that you have a Python application that you want to run inside a container. For example purposes, we’ll use a very simple Hello World application whose source consists solely of this line of Python:
print("Hello world!")
We’ll name our application hello.py. You can go ahead and run it on the command line if you want with:
python hello.py
The output should, of course, be:
Hello world!
But we don’t want to run the application directly from the command line. We want to run it as a container. And for that, we need to create a Docker image to run the app.
Because Python is a cross-platform language that can run almost anywhere, you can create a containerized Python application using virtually any base image. Still, the base image you choose will impact the way your app operates, how easy it is to maintain and secure and so on.
So, let’s examine the three following approaches to Python base images:
- Lightweight Linux distribution
- Standard Linux distribution
- Minimal base image built from scratch
1. Lightweight Linux base images for Python
For a simple Python application with few or no dependencies, you could use a base image that was created using a lightweight Linux distribution. This type of base image provides a minimalist Linux environment for your container to run in. That makes it more secure than a “heavierweight” environment that includes more potential vulnerabilities.
The downside of using a minimalist base image is that if your app has dependencies, and those dependencies are not built into the base image (which they likely are not if it’s a minimalist image), then you’ll need to write more lines of code when creating your image.
For now, let’s assume your application has zero dependencies. It can run using a lightweight Linux base image, like Alpine, which you can pull for free from Docker Hub. The only thing we need to add on top of the base image in this case is a Python interpreter, because Alpine doesn’t come with Python installed by default.
Let’s create a Dockerfile that uses Alpine as a base image and installs Python, then runs it on our application.
It would look like this:
FROM alpine
WORKDIR /usr/bin
ENV PYTHONUNBUFFERED=1
RUN apk add --update --no-cache python3 && ln -sf python3 /usr/bin/python
COPY hello.py /usr/bin/hello
CMD python /usr/bin/hello
Save this file with the name Dockerfile, and make sure your Python app is stored in the same directory as the Dockerfile. Then, build the container with:
docker build -t hello .
This tells Docker to build an image from the Dockerfile in the current directory and tag it with the name “hello.”
To run the container, enter:
docker run -it hello
You should see “Hello world!” printed on your CLI.
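If you expect to repeat the build and run steps for several images, you could script them. The sketch below is one possible approach, not something the walkthrough requires: the helper names are hypothetical, and it assumes the docker CLI is installed and on your PATH. It uses only docker build -t and docker run --rm, with --rm added so the stopped container is cleaned up afterwards.

```python
import shutil
import subprocess


def docker_build_command(tag, context="."):
    """Command line that builds the Dockerfile in `context` and tags the image."""
    return ["docker", "build", "-t", tag, context]


def docker_run_command(tag):
    """Command line that runs the image and removes the container on exit."""
    return ["docker", "run", "--rm", tag]


def build_and_run(tag, context="."):
    """Build the image, run it once, and return whatever the container printed."""
    if shutil.which("docker") is None:
        raise RuntimeError("docker CLI not found on PATH")
    subprocess.run(docker_build_command(tag, context), check=True)
    result = subprocess.run(
        docker_run_command(tag), check=True, capture_output=True, text=True
    )
    return result.stdout
```

With the Dockerfile and hello.py in the current directory, build_and_run("hello") should return the text that the container writes to standard output.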
If you’re curious how large your new container image is, run the command:
docker image ls
If you used Alpine as a base image, you’ll see that the container image’s size is relatively small – around 52 megabytes:
REPOSITORY   TAG      IMAGE ID       CREATED         SIZE
hello        latest   456fcecc53d7   7 minutes ago   51.7MB
2. Standard Linux base images for Python
A lightweight Linux distribution like Alpine works well for a simple “Hello World” app that has no dependencies other than Python. But if you had a more complex Python app with a lot of dependencies, Alpine may not be a great choice, because you’d have to install each module on top of the base image.
Instead, you might prefer to use a standard Linux distribution, like Ubuntu. In this case, you can create a container for your Python app using a Dockerfile that looks like this (this example uses the Ubuntu 20.04 base image):
FROM ubuntu:20.04
RUN apt update
RUN apt install -y python3
COPY hello.py /usr/bin/hello.py
ENTRYPOINT [ "python3" ]
CMD ["/usr/bin/hello.py"]
You can go ahead and build an image from this Dockerfile using the same kind of command as with the Alpine base image:
docker build -t hello_ubuntu .
In this case, we name the image “hello_ubuntu” to differentiate it from the Alpine-based image.
You can run the new container with:
docker run -it hello_ubuntu
Because we used an Ubuntu base image, the total image size in this case is 147 megabytes – quite a bit larger than our Alpine image:
REPOSITORY     TAG      IMAGE ID       CREATED         SIZE
hello_ubuntu   latest   0ed400ca01b4   4 minutes ago   147MB
3. Minimalist base image from scratch
A third way to containerize a Python app is to use a base image that you create from scratch. Under this approach, you don’t use a third-party base image at all. Instead, you generate your own unique base image, then create a container image for your application using your base image. (Technically, there is still a base image involved in this approach – Docker’s “scratch” image – but that is a specialized, extremely minimalist image whose singular purpose is to provide a blank slate for creating brand new base images.)
The advantage of using a base image created from scratch is that it will result in a container that is as small and secure as possible. The main downside is that creating and updating the container requires more effort because you have to compile the Python code into a static binary. This is necessary because you need to be able to execute your application without having a Python interpreter available inside the container.
To compile the Python code into a binary, first install cython (which translates Python code into C):
sudo -H pip3 install cython
Then, generate C source code based on your Python code (here, we’re using the “hello.py” example program):
cython hello.py --embed
Finally, go ahead and compile the code with gcc:
gcc -Os -I /usr/include/python2.7 -o hello hello.c -lpython2.7 -lpthread -lm -lutil -ldl
(Note that you may need to modify the preceding command depending on which version of Python you have installed on your system.)
At this point, you should have an executable binary file, called hello, that you can run without the Python interpreter using:
./hello
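Because a scratch image contains no shared libraries and no dynamic loader, it can be worth checking that the binary does not ask for one before you copy it into the image. The following is a rough sketch of such a check, not part of the original steps: the function name needs_dynamic_loader is hypothetical, and it only handles 64-bit little-endian ELF files, which is an assumption about your build machine. It looks for a PT_INTERP program header, which is how an ELF executable names the loader it needs.

```python
import struct

PT_INTERP = 3  # program header type that names the dynamic loader


def needs_dynamic_loader(data: bytes) -> bool:
    """Return True if a 64-bit little-endian ELF image requests an interpreter."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # Byte 4 is the class (2 = 64-bit), byte 5 the byte order (1 = little-endian)
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    # ELF64 header: e_phoff at offset 32, e_phentsize at 54, e_phnum at 56
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    for index in range(e_phnum):
        # p_type is the first 4 bytes of every program header entry
        (p_type,) = struct.unpack_from("<I", data, e_phoff + index * e_phentsize)
        if p_type == PT_INTERP:
            return True
    return False
```

You could call it as needs_dynamic_loader(open("hello", "rb").read()). If it returns True, the binary expects a loader and shared libraries to be present at run time, so you would need to link it statically or add those files to the image.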
Now, we can create a Dockerfile that defines a container to run this binary:
FROM scratch
ADD hello /
CMD ["/hello"]
Build and run the container with commands like:
docker build -t hello_scratch .
docker run -it hello_scratch
If you check the size of this container, you’ll see that it’s very, very small – a mere 24 or so kilobytes, which is roughly 2,000 times smaller than the image we created using the Alpine base image:
REPOSITORY      TAG      IMAGE ID       CREATED              SIZE
hello_scratch   latest   8459e05660ad   About a minute ago   24.1kB
So, although creating a base image from scratch for a Python app requires more effort, you get an extremely efficient and (due to its minimalist nature) secure container.
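To put the three image sizes side by side, you can convert the values reported by docker image ls into bytes and compare them. This is a small sketch rather than an official tool: the helper size_to_bytes is hypothetical, and it assumes Docker reports sizes in decimal units (1 kB = 1,000 bytes, 1 MB = 1,000,000 bytes).

```python
# Decimal unit multipliers (assumed to match docker image ls output)
UNITS = {"B": 1, "kB": 10**3, "MB": 10**6, "GB": 10**9}


def size_to_bytes(text):
    """Convert a size string such as '51.7MB' into a number of bytes."""
    # Try longer suffixes first so that 'MB' is not mistaken for plain 'B'
    for suffix in sorted(UNITS, key=len, reverse=True):
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * UNITS[suffix]
    raise ValueError(f"unrecognized size: {text!r}")


# Sizes taken from the docker image ls output shown in this article
sizes = {"hello": "51.7MB", "hello_ubuntu": "147MB", "hello_scratch": "24.1kB"}
smallest = min(size_to_bytes(value) for value in sizes.values())

for name, value in sizes.items():
    ratio = size_to_bytes(value) / smallest
    print(f"{name}: {value} ({ratio:,.0f}x the smallest image)")
```

With the figures above, the Alpine-based image comes out at a little over 2,100 times the size of the scratch-based image, and the Ubuntu-based image at about 6,100 times.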