How to dockerize a PostgreSQL database
In this article, we will walk through how to dockerize a Postgres database. If you are unfamiliar with Docker, see this quick start.
We will work off an existing Postgres database installed locally. First, let’s connect to our local database and create a backup of it, dump.sql, using pg_dump with the following format:
$ pg_dump -h {host} -p {port} -U {username} {database} > ./dump.sql
For this tutorial, I’m using a database called topstack on localhost port 5432 with user postgres. We’ll create a backup SQL script with:
$ pg_dump -h localhost -p 5432 -U postgres topstack > ./dump.sql
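If you expect to take this backup more than once, it can be handy to wrap the command in a small shell function. Below is a minimal sketch, assuming a POSIX shell; the function name dump_db, its argument order, and the DRY_RUN switch (which prints the command instead of running it) are our own illustrative choices, not part of pg_dump. It writes the file with pg_dump’s -f option, which is equivalent to the > redirection above, and its defaults match this tutorial’s setup.

```shell
#!/bin/sh
# Print the command instead of running it when DRY_RUN=1 (hypothetical switch).
run() { if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi; }

# dump_db [host] [port] [user] [database] [outfile]
# Defaults mirror this tutorial: topstack on localhost:5432 as user postgres.
dump_db() {
  local host="${1:-localhost}" port="${2:-5432}" user="${3:-postgres}"
  local db="${4:-topstack}" out="${5:-./dump.sql}"
  # -f writes the plain-SQL dump to a file, like redirecting stdout with >.
  run pg_dump -h "$host" -p "$port" -U "$user" -f "$out" "$db"
}
```

After sourcing the script, DRY_RUN=1 dump_db prints the pg_dump command for topstack without running it, and dump_db localhost 5432 postgres topstack ./dump.sql performs the same backup as above.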
We now have a backup of our existing database in the form of a SQL script, dump.sql, which we’ll use to import the data into our Postgres Docker container. First, let’s define our Docker image with a Dockerfile that sits in the same directory as dump.sql:
FROM postgres:9.6
ENV POSTGRES_DB topstack
ADD ./dump.sql /docker-entrypoint-initdb.d/
VOLUME [ "/var/lib/postgresql/data" ]
EXPOSE 5432
Let’s look a bit closer at what this Dockerfile is doing. The first line, FROM postgres:9.6, defines the base image we build on: in our case, the official postgres image with PostgreSQL version 9.6. If the tag is omitted, Docker pulls the image tagged latest.
ENV POSTGRES_DB topstack sets an environment variable that the image’s startup script uses as the name of the database to create.
ADD ./dump.sql /docker-entrypoint-initdb.d/ copies our SQL backup from the host machine into the directory whose scripts the official image runs when it initializes a new database. For us, this imports our backed-up data into the topstack database in the container. These scripts run only on first initialization, while the data directory is still empty.
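To make the initialization rules concrete, here is a simplified model, written as shell functions, of how the official postgres image decides what to do with /docker-entrypoint-initdb.d. It is a sketch under our own assumptions, not the image’s actual entrypoint code, and the function names are hypothetical. It captures two points: the scripts run only when the data directory has no PG_VERSION file yet (first initialization), and each file is handled according to its extension. One practical consequence is that once a volume holds an initialized database, rebuilding the image with a new dump.sql will not re-import it; you need a fresh, empty volume for that.

```shell
#!/bin/sh
# Simplified model of the official postgres image's first-start behavior.
# NOT the image's real entrypoint; function names are hypothetical.

# Succeeds (exit 0) if the data directory has not been initialized yet,
# i.e. it has no non-empty PG_VERSION file. Only then do init scripts run.
needs_init() {
  [ ! -s "$1/PG_VERSION" ]
}

# Describe how a file in /docker-entrypoint-initdb.d is handled, by extension.
init_action() {
  case "$1" in
    *.sh)     echo "run as a shell script" ;;
    *.sql)    echo "run with psql" ;;
    *.sql.gz) echo "decompress with gunzip, then run with psql" ;;
    *)        echo "ignored" ;;
  esac
}
```

For our image, init_action dump.sql reports that the dump is run with psql, and only on a container whose data directory is still empty.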
VOLUME [ "/var/lib/postgresql/data" ] marks this directory, where Postgres stores its data, as a volume, meaning its contents live outside the container and can be mounted from a Docker volume on our host machine. We will use this to persist data from our container, so if the container is removed, the data remains on our host machine in a Docker volume.
Finally, EXPOSE 5432 documents that the container listens on port 5432, the default port Postgres listens on. EXPOSE by itself does not make the port reachable from the host; we publish it with the -p flag when we run the container.
We can now build our Docker image using the Dockerfile:
$ docker build -t topstack-db-image -f ./Dockerfile .
Let’s break down this command. docker build tells Docker to build an image, and -t topstack-db-image tags that image with the name ‘topstack-db-image’. -f ./Dockerfile points Docker at the Dockerfile we created above, and the final . sets the build context to the current directory, which is where ADD looks for dump.sql.
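Because ADD can only copy files that are inside the build context, a missing dump.sql makes the build fail. The sketch below checks that both Dockerfile and dump.sql are present in the context directory before calling the same docker build command; the function name build_db_image and the DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Print the command instead of running it when DRY_RUN=1 (hypothetical switch).
run() { if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi; }

# build_db_image [context_dir] [tag]
# Fails early if a file the Dockerfile depends on is missing from the context.
build_db_image() {
  local ctx="${1:-.}" tag="${2:-topstack-db-image}" f
  for f in Dockerfile dump.sql; do
    if [ ! -f "$ctx/$f" ]; then
      printf 'missing %s\n' "$ctx/$f" >&2
      return 1
    fi
  done
  # Same command as above: -f names the Dockerfile, the last argument is the context.
  run docker build -t "$tag" -f "$ctx/Dockerfile" "$ctx"
}
```

With no arguments it uses the current directory and the tag topstack-db-image, reproducing the command above.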
We now have a Docker image called topstack-db-image. We should be able to see it listed, along with the postgres base image, if we run the command:
$ docker images
Before we create a container from our image, let’s create a Docker volume that will keep our Postgres data on our host machine so that it is not lost if the container is removed. We will name the volume ‘topstack’.
$ docker volume create topstack
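Now that our volume is created and our image is ready to go, we can spin up a Docker container:
$ docker run -d -v topstack:/var/lib/postgresql/data -e POSTGRES_PASSWORD=mysecretpassword --name topstack-db-container -p 12345:5432 topstack-db-image
Here -e POSTGRES_PASSWORD sets the password of the postgres superuser; current builds of the official image refuse to initialize a new database without it. mysecretpassword is only a placeholder, so choose your own.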
The docker run command tells Docker to start a container from the image we built with our Dockerfile. The distinction between a Docker image and a Docker container is similar to running a virtual machine in VMware or VirtualBox: the ISO file that defines the virtual machine is like the Docker image, and booting an instance of the virtual machine from that ISO is like executing docker run, which starts an instance of the Docker image in a new container. In short, a Docker container is a particular running instance of a Docker image.
The -d flag tells Docker to start the container in detached mode, which runs it in the background. -v topstack:/var/lib/postgresql/data mounts the topstack volume we created on our host machine at /var/lib/postgresql/data, the volume path declared in our image. This way all of the container’s Postgres data is stored in the topstack Docker volume on our host machine.
--name topstack-db-container gives the container a name. -p 12345:5432 tells Docker to map port 12345 on our host machine to port 5432 in our container, the port we exposed in the Dockerfile. This will be our entry point to the database in the container. Finally, we specify the image to run the container from, here topstack-db-image. NOTE: you don’t have to suffix your image name with -image or your container name with -container; I only did that to make the distinction between the two clear in this tutorial.
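If you start this container often, the flags can be collected into one function. This is a minimal sketch; start_db_container, its default arguments, and the DRY_RUN switch are hypothetical. Besides the flags explained above, it passes -e POSTGRES_PASSWORD, because current builds of the official postgres image refuse to initialize a new database when no superuser password is set. The value changeme is only a placeholder; set your own POSTGRES_PASSWORD before calling it.

```shell
#!/bin/sh
# Print the command instead of running it when DRY_RUN=1 (hypothetical switch).
run() { if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi; }

# start_db_container [host_port] [volume] [name] [image]
# Defaults reproduce this tutorial's docker run command.
start_db_container() {
  local host_port="${1:-12345}" volume="${2:-topstack}"
  local name="${3:-topstack-db-container}" image="${4:-topstack-db-image}"
  # POSTGRES_PASSWORD comes from the environment; "changeme" is a placeholder.
  run docker run -d \
    -v "$volume:/var/lib/postgresql/data" \
    -e "POSTGRES_PASSWORD=${POSTGRES_PASSWORD:-changeme}" \
    --name "$name" \
    -p "$host_port:5432" \
    "$image"
}
```

docker run also creates a named volume on the fly if it does not exist yet, so the separate docker volume create step mainly serves to make the volume explicit.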
Now we can see our container running:
$ docker ps
From the output of that command, you can see that Docker is forwarding connections on host port 12345 to port 5432 in the container. This means that our newly dockerized Postgres database is accepting connections on localhost:12345 (Postgres speaks its own wire protocol, not HTTP). For example, a JDBC client can connect with the following URL:
jdbc:postgresql://localhost:12345/topstack
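Right after docker run -d returns, the container may still be initializing and importing dump.sql, and connections are refused until that finishes, which takes longer for a big dump. The sketch below polls with pg_isready, the PostgreSQL client utility that reports whether a server is accepting connections, and also builds the JDBC URL above plus the equivalent libpq URI that psql accepts. The function names and the PROBE override (used to swap in a stand-in command when testing) are hypothetical, and pg_isready must be installed on the host.

```shell
#!/bin/sh
# jdbc_url HOST PORT DB: URL for the PostgreSQL JDBC driver.
jdbc_url() { printf 'jdbc:postgresql://%s:%s/%s\n' "$1" "$2" "$3"; }

# libpq_uri USER HOST PORT DB: connection URI understood by psql and libpq.
libpq_uri() { printf 'postgresql://%s@%s:%s/%s\n' "$1" "$2" "$3" "$4"; }

# wait_until_ready HOST PORT [attempts] [delay_seconds]
# Polls until the server accepts connections. PROBE defaults to pg_isready
# and can be overridden (hypothetical hook, handy for testing).
wait_until_ready() {
  local host="$1" port="$2" attempts="${3:-30}" delay="${4:-1}" i=1
  while [ "$i" -le "$attempts" ]; do
    if "${PROBE:-pg_isready}" -h "$host" -p "$port" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

For example, to open a psql session once the container is ready:
$ wait_until_ready localhost 12345 && psql "$(libpq_uri postgres localhost 12345 topstack)"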
We can add and edit data in the Postgres database in the container, and the changes persist in the topstack volume on our host machine, even if the container is removed.
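One way to check this persistence is to delete the container and start a new one on the same volume. The sketch below shows that sequence; recreate_container, its defaults, DRY_RUN, and the changeme placeholder password are hypothetical. docker rm -f removes the container but not the named topstack volume, and because the volume already holds an initialized database, the new container skips the dump.sql import and starts with whatever data you left there.

```shell
#!/bin/sh
# Print the command instead of running it when DRY_RUN=1 (hypothetical switch).
run() { if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi; }

# recreate_container [name] [volume] [image] [host_port]
# Removes the container, then starts a fresh one on the same named volume.
recreate_container() {
  local name="${1:-topstack-db-container}" volume="${2:-topstack}"
  local image="${3:-topstack-db-image}" host_port="${4:-12345}"
  # Removes the container only; the named volume and its data stay.
  run docker rm -f "$name"
  # The volume already holds PG_VERSION, so dump.sql is not imported again.
  run docker run -d -v "$volume:/var/lib/postgresql/data" \
    -e "POSTGRES_PASSWORD=${POSTGRES_PASSWORD:-changeme}" \
    --name "$name" -p "$host_port:5432" "$image"
}
```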
There are many advantages to using a dockerized database. We can run multiple Postgres databases side by side, each mapped to a different host port. We can quickly spin up snapshots of a database for testing or rapid development. Each instance of our database is isolated from the others. Plus much more!
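As an example of running several copies side by side, the sketch below starts a number of containers from the same image, each with its own named volume and its own host port. Because every volume starts empty, each container imports dump.sql on first start, giving you an independent copy of topstack. The naming scheme topstack-snap-N, the default base port 15432, the changeme placeholder password, and DRY_RUN are all hypothetical choices.

```shell
#!/bin/sh
# Print the command instead of running it when DRY_RUN=1 (hypothetical switch).
run() { if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi; }

# start_snapshots COUNT [base_port] [image]
# Starts COUNT containers from one image; container N gets its own volume
# topstack-snap-N and host port base_port + N - 1.
start_snapshots() {
  local count="$1" base_port="${2:-15432}" image="${3:-topstack-db-image}" i=1
  while [ "$i" -le "$count" ]; do
    run docker run -d \
      -v "topstack-snap-$i:/var/lib/postgresql/data" \
      -e "POSTGRES_PASSWORD=${POSTGRES_PASSWORD:-changeme}" \
      --name "topstack-snap-$i" \
      -p "$((base_port + i - 1)):5432" \
      "$image"
    i=$((i + 1))
  done
}
```

For instance, start_snapshots 3 would publish three copies on host ports 15432, 15433 and 15434.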