Saki Shinoda

Machine learning engineer based in London, UK.


GPU-enabled Tensorflow in Docker on AWS

2017-02-13 02:22:00 +0000

Update (again): This post does not cover the security group settings needed to reach the Jupyter notebook running on the AWS instance (or to SSH in). Again, that is something I might add in future.


Update: I tried saving this setup as an AMI and launching a new spot instance from it. It turns out that for the Nvidia drivers to keep working, Nouveau (the open-source driver for Nvidia cards) needs to be blacklisted (as described here). I also only just discovered that nvidia-docker has specific documentation for deployment on AWS EC2. I might check that out and make a fixed Nvidia-Docker + Tensorflow-GPU AMI. Alternatively, I might work on a non-Docker Tensorflow-GPU install, since Docker might turn out to be even more hassle than setting up cuDNN etc.
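The Nouveau blacklisting mentioned above is not part of the script. A minimal sketch of the usual approach might look like the following; it assumes the conventional `/etc/modprobe.d` location, and the file name `blacklist-nouveau.conf` and the function names are hypothetical choices for illustration, not taken from the linked instructions.

```bash
#!/usr/bin/env bash
# Sketch: stop the open-source Nouveau driver loading at boot, so the
# proprietary Nvidia driver keeps working on instances launched from a saved AMI.
# Assumption: conventional /etc/modprobe.d location; hypothetical file/function names.

# Print the two directives commonly used to disable Nouveau
nouveau_blacklist_conf() {
  printf 'blacklist nouveau\noptions nouveau modeset=0\n'
}

# Write the config (path overridable) and rebuild the initramfs,
# since Nouveau can otherwise be loaded early in boot. Needs sudo.
blacklist_nouveau() {
  local conf="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
  nouveau_blacklist_conf | sudo tee "$conf" > /dev/null
  sudo update-initramfs -u
}
```

Usage would be something like `blacklist_nouveau && sudo reboot`, then installing the Nvidia driver as in the script once the instance is back up.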


I might elaborate on this later, but for now this is a barebones script for getting Nvidia-Docker up and running on an Ubuntu 14.04 AWS instance (I used a g2.xlarge spot instance in Oregon) and then running the Tensorflow-GPU Docker image. It collects in one place the commands that are given separately (with more commentary) at the following links:


The script

# Ubuntu 14.04
# Note: the system python is 2.7 by default

sudo apt-get -y update
sudo apt-get -y upgrade

# INSTALL DOCKER ENGINE
# https://docs.docker.com/engine/installation/linux/ubuntu/
sudo apt-get -y install curl linux-image-extra-$(uname -r) linux-image-extra-virtual
sudo apt-get -y install apt-transport-https ca-certificates
curl -fsSL https://yum.dockerproject.org/gpg | sudo apt-key add -
sudo apt-get -y install software-properties-common
sudo add-apt-repository \
       "deb https://apt.dockerproject.org/repo/ \
       ubuntu-$(lsb_release -cs) \
       main"
sudo apt-get -y update
sudo apt-get -y install docker-engine
# Optional: test the installation
# sudo docker run hello-world

# ALLOW NON-SUDO DOCKER
# https://docs.docker.com/engine/installation/linux/linux-postinstall/
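# The docker group may already exist after installing docker-engine,
# in which case groupadd just reports that and nothing changes
sudo groupadd docker
sudo usermod -aG docker "$USER"
# Log out and back in (i.e. exit, then SSH back in) so the new group applies
# Test again, this time without sudo:
# docker run hello-world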

# INSTALL NVIDIA DOCKER and PREREQs
# https://github.com/NVIDIA/nvidia-docker
sudo apt-get -y install build-essential  # need gcc
sudo apt-get -y install nvidia-modprobe  # not sure this is essential
wget -P /tmp http://us.download.nvidia.com/XFree86/Linux-x86_64/367.57/NVIDIA-Linux-x86_64-367.57.run
# The Nvidia installer is interactive; accepting the default answers worked for me
sudo sh /tmp/NVIDIA-Linux-x86_64-367.57.run
rm /tmp/NVIDIA-Linux-x86_64-367.57.run
# Install nvidia-docker and nvidia-docker-plugin
wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.0/nvidia-docker_1.0.0-1_amd64.deb
sudo dpkg -i /tmp/nvidia-docker*.deb && rm /tmp/nvidia-docker*.deb
# Test: nvidia-smi run inside a CUDA container should list the GPU
nvidia-docker run --rm nvidia/cuda nvidia-smi

# PULL TENSORFLOW DOCKER IMAGE
# https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/docker
docker pull gcr.io/tensorflow/tensorflow:latest-gpu

# Launch the Jupyter notebook server, publishing container port 8888 on host port 8888:
nvidia-docker run -it -p 8888:8888 gcr.io/tensorflow/tensorflow:latest-gpu
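The `nvidia-smi` test only shows that containers can see the GPU; it does not show that TensorFlow uses it. A minimal sketch of an extra check, assuming the nvidia-docker 1.0 wrapper and the image tag used in the script, runs a one-off Python command in a throwaway container. `device_lib.list_local_devices()` lives in an internal TensorFlow module rather than the public API, and the function name `tf_gpu_check` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: confirm TensorFlow inside the GPU image can see the GPU.
# Assumptions: nvidia-docker 1.0 wrapper, image tag from the script above,
# hypothetical function name.

TF_IMAGE="gcr.io/tensorflow/tensorflow:latest-gpu"

tf_gpu_check() {
  # --rm discards the container once the check has printed its result
  nvidia-docker run --rm "$TF_IMAGE" python -c \
    "from tensorflow.python.client import device_lib; print([d.name for d in device_lib.list_local_devices() if d.device_type == 'GPU'])"
}
```

On a working setup, `tf_gpu_check` should print a list with one GPU entry (named something like `/gpu:0` in TensorFlow 1.x); an empty list `[]` suggests TensorFlow has fallen back to the CPU.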