Enias Caillau
Set up an AWS EC2 instance with GPU support for TensorFlow using Docker to efficiently run deep learning tasks.
Here at Superlinear.eu, we use GPUs to accelerate our deep learning training. One of our goals is to remain platform neutral towards our clients, and Docker helps us achieve this by abstracting the hardware away from the running container. Thanks to Nvidia and Docker, it is now possible to pass GPU capabilities such as CUDA through to a Docker container. This tutorial provides a step-by-step guide to getting GPU-accelerated Docker up and running.
Prepare your EC2 instance
In this tutorial, we prepare an Amazon EC2 P2 GPU instance to support nvidia-docker.
Image: Deep Learning Base AMI (Ubuntu)
Region: eu-central-1 (EU Frankfurt)
Instance type: p2.xlarge
Storage: 50 GB (more if you will use large datasets)
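Boot up an instance with the specifications featured above, either through the AWS console or the AWS CLI. A CLI sketch (the AMI ID, key pair name, and security group below are placeholders you need to replace with your own):
# Launch a p2.xlarge in eu-central-1 with a 50 GB root volume
aws ec2 run-instances \
  --region eu-central-1 \
  --image-id ami-xxxxxxxx \
  --instance-type p2.xlarge \
  --key-name my-key-pair \
  --security-group-ids sg-xxxxxxxx \
  --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":50}}]'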
Once the instance has booted, SSH into it using your certificate:
ssh -i certificate.pem ubuntu@<Public DNS (IPv4)>
Once on the machine, we first need to install Docker:
# Add Docker's official GPG key and apt repository
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
# Install docker-ce from the Docker repository
sudo apt-get update
apt-cache policy docker-ce
sudo apt-get install -y docker-ce
# Verify that the Docker service is running
sudo systemctl status docker
# Allow running docker without sudo
sudo groupadd docker
sudo usermod -aG docker $USER
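The group change only takes effect in a new login session, so log out and back in (or run newgrp docker). You can then verify the installation with the standard hello-world image:
# Pick up the new group membership without logging out
newgrp docker
# Verify that Docker can pull and run containers
docker run --rm hello-world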
Currently, Docker has no native support for GPU. Luckily Nvidia provides an nvidia-docker runtime which can be used to replace the default Docker runtime. Nvidia-docker2 can be installed using the following commands:
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install nvidia-docker2
We can now test whether the runtime works by running a GPU-accelerated container with the nvidia runtime:
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
If everything is installed correctly, you should see a printout that includes the name of the GPU passed to the container (in our case a Tesla K80).
The docker command we used to start the nvidia/cuda container selects the nvidia runtime via the --runtime argument. However, we don't want to supply this argument every time we run a container. To avoid a bloated docker command, we configure the Docker daemon to use the Nvidia runtime by default:
cat <<"EOF" > /etc/docker/daemon.json
{
"default-runtime": "nvidia",
"runtimes": {
"nvidia": {
"path": "/usr/bin/nvidia-container-runtime",
"runtimeArgs": []
}
}
}
EOF
sudo pkill -SIGHUP dockerd
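To confirm the default runtime took effect, run the same CUDA image again, this time without the --runtime flag (if the daemon does not pick up the change, a full sudo systemctl restart docker also works):
# nvidia-smi should now work without --runtime=nvidia
docker run --rm nvidia/cuda nvidia-smi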
That’s it! Your instance is now ready to run Docker images with GPU support. As an example, let’s deploy a Jupyter notebook to start our deep learning development.
Running TensorFlow GPU in a Jupyter notebook
First, we need to ensure that the security group of our instance accepts incoming traffic on ports 6006 (TensorBoard) and 8888 (Jupyter). Then we can start a container using the following command:
docker run -it -p 8888:8888 -p 6006:6006 tensorflow/tensorflow:latest-gpu
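If you want your notebooks to survive container restarts, mount a host directory into the container. A sketch, assuming the image serves notebooks from /notebooks (check the working directory of your image version):
# Persist notebooks on the host by mounting a local directory
docker run -it -p 8888:8888 -p 6006:6006 \
  -v "$PWD/notebooks:/notebooks" \
  tensorflow/tensorflow:latest-gpu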
Now you can visit the Jupyter notebook on port 8888 of your EC2 instance's public IP, using the login token printed in the container logs.
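To double-check that TensorFlow inside the container actually sees the GPU, you can run a quick one-liner (a sketch: tf.test.is_gpu_available() applies to TensorFlow 1.x, while 2.x uses tf.config.list_physical_devices('GPU')):
# Should print True if the Tesla K80 is visible inside the container
docker run --rm tensorflow/tensorflow:latest-gpu \
  python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"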
Final thoughts
In this tutorial, we used Docker to deploy GPU-accelerated deep learning environments on AWS. Using these containers means you can collect all your dependencies in one place, resulting in a portable solution.