Docker Introduction (Part 2): Volumes & Persistent Data

By Alexander Eriksson


Docker containers are ephemeral, meaning any data they store in their own filesystem is lost when the container is removed. To persist data, we must use volumes. There are two main types: bind mounts and named volumes. Bind mounts are used primarily for development, as they link a directory on your host machine to a directory inside the container, giving you direct access to the files. Named volumes, on the other hand, are the preferred method for production environments. Docker fully manages these volumes and stores the data in its own storage area. They are not tied to a host path that you choose, making them more portable and less prone to permission issues. In essence, while bind mounts offer direct control, named volumes provide a more robust and production-ready solution.

The Problem Docker Solves

Because containers are designed to be replaced, they do not keep any lasting data once removed. This is by design: Docker's own best-practices guidelines state that the containers you build should be as ephemeral as possible.

However, we often do want to keep our data, and to do so we need to mount so-called volumes.

The Two Types of Volumes

In Docker, there are two primary ways to persist data outside of an ephemeral container.

  1. Bind Mounts: These map a directory or file on your host machine to a path inside the container. Changes made in the container are reflected on the host and vice versa, which makes them a good fit for development, when you need to edit files on your host and see the updates immediately in the container.

  2. Named Volumes: These are managed entirely by Docker and live in Docker's own storage area rather than at a host path you choose. They are more portable and (usually!) more production-friendly than bind mounts, because Docker handles the storage location and we do not have to worry about absolute host paths.
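To make the difference between the two types concrete, here is a minimal sketch using Docker's more explicit --mount flag, which spells out the mount type that the short -v flag infers. The function names run_with_bind_mount and run_with_named_volume are hypothetical helpers made up for this illustration; only the docker flags themselves are standard.

```shell
# Hypothetical helper names; only the docker flags are standard.

# Bind mount: a host directory is shared with a path in the container.
run_with_bind_mount() {
  host_dir=$1        # absolute path to an existing directory on the host
  container_path=$2  # path inside the container
  docker run --rm -it \
    --mount "type=bind,source=${host_dir},target=${container_path}" \
    busybox sh
}

# Named volume: Docker-managed storage is attached to a path in the container.
run_with_named_volume() {
  volume_name=$1     # name of the volume, e.g. mydata
  container_path=$2  # path inside the container
  docker run --rm -it \
    --mount "type=volume,source=${volume_name},target=${container_path}" \
    busybox sh
}
```

For example, run_with_bind_mount "$(pwd)/data" /app/data and run_with_named_volume mydata /app/data should start the same kind of containers as the -v commands used in the rest of this post.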

Let's look into the two types of volumes and give some examples of use cases.

Bind Mounts

The first type of volume is very tangible, and is called a "bind mount". This mount should primarily be used during development and is in most cases not suitable for a production environment.

flowchart LR
  H[Host Directory: ./data] <--> C[Container: /app/data]

With a bind mount we can make the container and the host look at the same file or directory, so that any changes to that given file or directory will be persistent, even when the container has been removed. Examples have always been the best learning method for me, so let's show one.

Let's create a directory called data and mount this volume into a container at the path /app/data. Since we want some basic CLI tools, we are going to use the busybox image for a simple environment.

# on host, create a folder
mkdir data

# run a container with a bind mount
docker run --rm -it -v "$(pwd)/data:/app/data" busybox sh

This will mount the directory data on our host to /app/data inside our container. Any change made in either

  i. our data/ directory on our host, or
  ii. our /app/data directory inside our container

is visible on both sides and persists, meaning we will not lose any of the data that's written to /app/data when the container is removed.

We can verify this easily by writing something to a file in /app/data in our container:

echo "What is dead may never die." > /app/data/foo.txt

and this very file will now exist as data/foo.txt on the host.

This is of course very useful for databases, logs, etcetera. However, bind mounts come with some drawbacks: headaches with permissions, tight coupling to the host, and less portability than the alternative, named volumes.
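To illustrate the permission headache: on a Linux host, the busybox container runs as root by default, so files it writes into a bind mount end up owned by root on the host. One common workaround is to run the container with your own user and group ID via the --user flag. The sketch below assumes a Linux host; run_as_current_user is a hypothetical helper name, not something Docker provides.

```shell
# Hypothetical helper: start busybox with a bind mount, running as the
# invoking host user so that new files in the mount are owned by that user.
run_as_current_user() {
  host_dir=$1  # absolute path to the host directory to mount
  docker run --rm -it \
    --user "$(id -u):$(id -g)" \
    -v "${host_dir}:/app/data" \
    busybox sh
}
```

Calling run_as_current_user "$(pwd)/data" and then writing to /app/data inside the container should produce files that your regular host user can edit and delete without sudo.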

Named volumes

As mentioned, bind mounts quickly become a problem in production as they tie your container to the host filesystem, which can cause the setup to be less portable.

The more "production-friendly" approach is therefore to use named volumes, which are volumes entirely managed by Docker. These are not tied to a particular host path and are portable across environments.

flowchart LR
  subgraph Docker
    V[(Named Volume: mydata)]
  end
  C[Container: /app/data] <--> V

If we would like to create a named volume called mydata and map it to the same /app/data path (as we did for bind mounts), we can simply run a similar command

# create and run a container with a named volume
docker run --rm -it -v mydata:/app/data busybox sh

And inside the container, we can write some data to a file, for instance bar.txt

echo "One volume to rule them all." > /app/data/bar.txt

and you will find that, even after the container is removed and a new one is started, the file persists:

docker run --rm -it -v mydata:/app/data busybox sh
cat /app/data/bar.txt
# One volume to rule them all.

Previously, we said that this data is now stored in Docker's internal storage. Docker lets you see the existing volumes on your system by running

>> docker volume ls
DRIVER    VOLUME NAME
local     mydata
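Docker can also report where it keeps a volume on disk. As a small sketch, the helper below (volume_mountpoint is a hypothetical name) asks docker volume inspect for the Mountpoint field only. On a default Linux installation this is typically a root-owned path under /var/lib/docker/volumes, and with Docker Desktop it sits inside a virtual machine, so it is not a place you would normally browse directly.

```shell
# Hypothetical helper: print the location where Docker stores a named volume.
volume_mountpoint() {
  docker volume inspect --format '{{ .Mountpoint }}' "$1"
}
```

You would call it as volume_mountpoint mydata.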

But since the data no longer lives at a host path that we chose ourselves, the most convenient way to inspect it is to mount the volume into a temporary container

>> docker run --rm -it -v mydata:/data busybox sh
>> ls /data
bar.txt

and as we can see, the content in our volume still exists.
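The same temporary-container trick works without an interactive shell, which is handy for quick checks and backups. The helpers below are a rough sketch, not an official workflow: the names list_volume, backup_volume and remove_volume are hypothetical, and the backup simply uses the tar applet that ships with busybox.

```shell
# Hypothetical helpers built on the temporary-container pattern.

# List the files in a named volume (mounted read-only) and exit.
list_volume() {
  docker run --rm -v "$1:/data:ro" busybox ls -la /data
}

# Archive the volume's contents into <volume>.tar.gz in the current directory.
backup_volume() {
  docker run --rm \
    -v "$1:/data:ro" \
    -v "$(pwd):/backup" \
    busybox tar czf "/backup/$1.tar.gz" -C /data .
}

# Delete the volume and its data; Docker refuses while a container uses it.
remove_volume() {
  docker volume rm "$1"
}
```

For instance, backup_volume mydata followed by remove_volume mydata should leave you with a mydata.tar.gz archive and no volume. Note that the backup itself uses a bind mount for the output directory, so the two volume types work well together.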

Summary

Containers in Docker are ephemeral, meaning you lose their data when the container is removed. To solve this, there are two types of volumes we can mount in Docker: bind mounts and named volumes.

flowchart LR
  subgraph BindMount["Bind Mount"]
    H[Host ./data] <--> C1[Container /app/data]
  end
  subgraph NamedVolume["Named Volume"]
    V[(mydata)] <--> C2[Container /app/data]
  end
  BindMount --- NamedVolume

The key takeaway is that you should in most scenarios use named volumes to avoid filesystem quirks, such as permission issues or accidental deletion.

Next up, we will look at Docker networking and how containers can communicate with each other and the outside world. We will cover:

  • Port mapping: how to expose container services to the host.
  • Bridging: the default container network and custom bridges.
  • NAT and iptables: how Docker routes traffic in and out of containers.
  • Docker DNS: how containers can discover each other by name.