5 best practices for building effective Dockerfiles

June 1, 2021

By Jordan A. - DevOps Expert

During DockerCon 2021, which took place on May 27, 2021, we attended a number of very interesting talks. One of them caught our attention because it presented the basics of writing Dockerfiles and thus of building effective and efficient containers.

This talk, "Lessons Learned With Dockerfiles and Docker Builds", was given by Aaron Kalin, Technical Evangelist at Datadog. It offers seven lessons to remember; I will detail the first five here and illustrate them with concrete examples.

 

Lesson 1: Be mindful of the base image you use

Alpine images have been very popular in recent years due to their small size and limited number of vulnerabilities. This makes them an ideal basis for building your own Docker image.

Yes, but... Over time, Alpine images have lost their unanimous support among developers. One of the first issues is that Alpine uses musl rather than glibc as its C library, whereas the most popular distributions use glibc. This means that binaries compiled on Alpine may not run on Ubuntu (and vice versa).

Furthermore, what about packages that are not yet available on Alpine but are available on other distributions, and are essential for handling dependencies in your code?

Aaron Kalin suggests using the "slim" variants of images instead. They are small too, sometimes close in size to the Alpine images, as shown here:

$ docker image ls | grep python
python     3.9.1-slim-buster   8c84baace4b3    3 months ago   114MB
python     3.7.4-alpine3.9     32a1b98d0495   19 months ago   98.5MB
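
As a minimal sketch of this advice, a Dockerfile could start from the slim tag shown in the listing above. The /app directory and the app.py file are hypothetical names used only for illustration; they are not taken from the talk.

```dockerfile
# Debian-based "slim" variant (glibc), using the tag from the listing above
FROM python:3.9.1-slim-buster

# /app is a hypothetical working directory
WORKDIR /app

# app.py is a hypothetical entry point, assumed to sit next to the Dockerfile
COPY app.py .

CMD ["python", "app.py"]
```

Switching between the slim and Alpine variants then only requires changing the tag on the FROM line, which makes it easy to compare the two for your own application.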

Lesson 2: Chain your RUN commands

Chaining the RUN commands that install your dependencies produces a single layer for all of them, because each instruction in the Dockerfile creates a new layer.

Aaron Kalin also recommends organizing the names of packages to be installed in alphabetical order with a single package per line (easier to maintain and reorganize).

For example, let's take a Dockerfile in which all the packages are installed by a single RUN instruction:

FROM ubuntu:bionic

RUN apt-get update && apt-get install -y --no-install-recommends \
curl \
git \
nginx \
python

EXPOSE 80

CMD ["nginx", "-g", "daemon off;"]

The following history is obtained:

$ docker history docker_1_layer:latest
IMAGE          CREATED          CREATED BY                                      SIZE     COMMENT
6892a0a503de   18 seconds ago   /bin/sh -c #(nop)  CMD ["nginx" "-g" "daemon…   0B
374bdfdad2b2   18 seconds ago   /bin/sh -c #(nop)  EXPOSE 80                    0B
3f7201caacaa   20 seconds ago   /bin/sh -c apt-get update && apt-get install…   189MB
81bcf752ac3d   8 days ago       /bin/sh -c #(nop)  CMD ["/bin/bash"]            0B
<missing>      8 days ago       /bin/sh -c mkdir -p /run/systemd && echo 'do…   7B
<missing>      8 days ago       /bin/sh -c [ -z "$(apt-get indextargets)" ]     0B
<missing>      8 days ago       /bin/sh -c set -xe   && echo '#!/bin/sh' > /…   745B
<missing>      8 days ago       /bin/sh -c #(nop) ADD file:e05689b5b0d51a231…   63.1MB

Now, let's repeat the experiment with each package installed by its own RUN instruction:

FROM ubuntu:bionic

RUN apt-get update && apt-get install -y --no-install-recommends curl
RUN apt-get install -y git
RUN apt-get install -y nginx
RUN apt-get install -y python3

EXPOSE 80

CMD ["nginx", "-g", "daemon off;"]

This gives us the following history:

IMAGE          CREATED          CREATED BY                                      SIZE     COMMENT
0d25db122b31   16 seconds ago   /bin/sh -c #(nop)  CMD ["nginx" "-g" "daemon…   0B
3cf4fb051b11   17 seconds ago   /bin/sh -c #(nop)  EXPOSE 80                    0B
f736c0e7e9e6   18 seconds ago   /bin/sh -c apt-get install -y python3           29.4MB
c6c35fc73cad   28 seconds ago   /bin/sh -c apt-get install -y nginx             53.3MB
53e8b93b739a   39 seconds ago   /bin/sh -c apt-get install -y git               83.4MB
57e76bf1ae81   52 seconds ago   /bin/sh -c apt-get update && apt-get install…   48.9MB
81bcf752ac3d   8 days ago       /bin/sh -c #(nop)  CMD ["/bin/bash"]            0B
<missing>      8 days ago       /bin/sh -c mkdir -p /run/systemd && echo 'do…   7B
<missing>      8 days ago       /bin/sh -c [ -z "$(apt-get indextargets)" ]     0B
<missing>      8 days ago       /bin/sh -c set -xe   && echo '#!/bin/sh' > /…   745B
<missing>      8 days ago       /bin/sh -c #(nop) ADD file:e05689b5b0d51a231…   63.1MB

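We obtain two images of different sizes, the second one being both larger and made up of more layers. Note that the two Dockerfiles are not strictly identical (the second one omits --no-install-recommends on three of its RUN lines and installs python3 rather than python), so part of the size difference may come from that: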

$ docker image ls | grep docker
docker_4_layers   latest   0d25db122b31   About a minute ago   278MB
docker_1_layer    latest   6892a0a503de   4 minutes ago        252MB

Lesson 3: Clean up after installing packages

Let's return to our previous example:

FROM ubuntu:bionic

RUN apt-get update && apt-get install -y --no-install-recommends \
curl \
git \
nginx \
python

EXPOSE 80

CMD ["nginx", "-g", "daemon off;"]

Here, we did not clean up after installing the packages with apt. To further reduce the image size, and therefore its transfer and load time, you can add the following commands:

rm -rf /var/lib/apt/lists/* && apt clean

This would give us:

FROM ubuntu:bionic

RUN apt-get update && apt-get install -y --no-install-recommends \
curl \
git \
nginx \
python \
&& rm -rf /var/lib/apt/lists/* \
&& apt clean

EXPOSE 80

CMD ["nginx", "-g", "daemon off;"]

This lets us compare the size of the image without cleanup (docker_1_layer) with that of the image after cleanup (docker_1_layer_clean):

$ docker image ls | grep docker
docker_1_layer_clean   latest   494fb62a6e8c   16 seconds ago   216MB
docker_4_layers        latest   0d25db122b31   7 minutes ago    278MB
docker_1_layer         latest   6892a0a503de   10 minutes ago   252MB

The cleaned image weighs 216MB, against 252MB for the same image without cleanup: a reduction of 36MB.

Lesson 4: Install application dependencies separately, at the end of the Dockerfile

Since these dependencies are likely to change from time to time as your code evolves, it is best to install them near the bottom of the Dockerfile. When an instruction changes, Docker rebuilds its layer and every layer after it, so keeping frequently changing steps last lets the earlier layers be reused from the cache.

Don't forget to tell the tool not to cache any data, for the same reason we cleaned up after apt.

Here is an example for installing Python libraries:

RUN pip install --no-cache-dir -r requirements.txt
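
To show where this line could sit in a complete file, here is a minimal sketch. It assumes a Python application whose requirements.txt lives at the root of the build context; the base image tag is borrowed from Lesson 1, while /app and app.py are hypothetical names chosen for the example.

```dockerfile
FROM python:3.9.1-slim-buster

# /app is a hypothetical working directory
WORKDIR /app

# Copy only the dependency manifest first: this layer and the pip layer
# below are rebuilt only when requirements.txt itself changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code last, since it changes most often.
COPY . .

# app.py is a hypothetical entry point
CMD ["python", "app.py"]
```

With this ordering, editing the application code only invalidates the final COPY layer, and the installed dependencies are reused from the build cache.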

Lesson 5: Don't forget to use .dockerignore

Aaron Kalin rightly reminds us to use the .dockerignore file wisely. It excludes files and directories from the build context, and therefore from anything copied into the Docker image.

The directory most often forgotten is .git.

If your code is versioned with Git, a hidden .git directory exists inside your working directory.

It would be a shame to copy it into your Docker image!

Other files that we tend to forget are the Dockerfiles themselves, which also sit in our working directory.

So this is what our .dockerignore file would look like:

.git
Dockerfile*
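
The effect of these two entries shows up as soon as the Dockerfile copies the whole working directory. The fragment below is only a sketch: the base image tag is borrowed from Lesson 1 and /app is a hypothetical directory.

```dockerfile
FROM python:3.9.1-slim-buster

# /app is a hypothetical working directory
WORKDIR /app

# "COPY . ." copies the entire build context. Paths matched by .dockerignore
# (here .git and Dockerfile*) are left out of that context, so they cannot
# end up in the image through this instruction.
COPY . .
```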

Lessons 6 and 7 will be covered in the next article. They deal with multi-stage builds and the use of labels.