Installing PyTorch with ROCm 3.3.0 on Ubuntu 18.04

Goal

I have my RX 580 ready for TensorFlow. I tried to install PyTorch on it, but it says my GPU is too old and is no longer supported. So I bought a Vega 56 (10.54 TFLOPS for FP32) from newegg.com for 266 USD. Let’s install PyTorch on top of ROCm 3.3.0.

First of all, install ROCm 3.3.0 (refer to the previous tutorial); the requirements are the same.

We follow the instructions from ROCm first, and I will add solutions to the problems I encountered.

You will need 16 GB of RAM or more to finish the whole compile, install and test process.
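To see how much RAM the machine has before starting, a minimal sketch like the one below reads MemTotal from /proc/meminfo (Linux only). The function name total_ram_gib is made up for this example and is not part of ROCm or PyTorch.

```shell
# Hypothetical helper: print total RAM in GiB with one decimal place.
# Reads /proc/meminfo by default; another file can be passed for testing.
total_ram_gib() {
    awk '/^MemTotal:/ { printf "%.1f\n", $2 / 1048576 }' "${1:-/proc/meminfo}"
}

# Usage on the host: total_ram_gib
```

A machine with 16 GB installed usually reports slightly less than 16 here, because the kernel reserves some memory for itself.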

Install dependencies

Install Docker

You will need Docker to finish the installation. Docker is similar to a virtual machine: it gives you an operating system environment isolated from your computer, but it is much lighter and faster. Learn more from the Docker documentation.

Install Docker by following the official Docker documentation, or use their convenience script. Always examine scripts downloaded from the internet before running them locally, to make sure no one has added a line that installs a trojan on your computer.

$ curl -fsSL https://get.docker.com -o get-docker.sh

$ sudo sh get-docker.sh
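Before running sudo sh get-docker.sh, a quick scan can point out where the script downloads things. The sketch below is only a starting point and no substitute for reading the whole script; the function name show_fetches is invented for this example.

```shell
# Hypothetical helper: list the lines of a script that call curl or wget,
# with line numbers, so they can be read before the script is run.
show_fetches() {
    grep -nwE 'curl|wget' "$1"
}

# Usage: show_fetches get-docker.sh
```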

Install ROCm-Dev package

We are going to compile PyTorch from source, which requires the rocm-dev package.

$ sudo apt-get update

$ sudo apt-get upgrade

$ sudo apt-get install rocm-dev

Step Two

Prepare environment for compiling

Now we get the compilation environment for ROCm 3.3.0. The official document is out of date: it tells you to run docker pull rocm/pytorch:rocm3.0_ubuntu16.04_py3.6_pytorch. Go to the rocm/pytorch page on Docker Hub and check whether the tag rocm3.0_ubuntu16.04_py3.6_pytorch is really the one you need. For ROCm 3.3.0 I need rocm3.3_ubuntu16.04_py3.6_pytorch, so I run:

$ sudo docker pull rocm/pytorch:rocm3.3_ubuntu16.04_py3.6_pytorch
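The two tags above differ only in the ROCm major.minor version, so the tag can be derived from the installed ROCm version. This is a sketch under that assumption: the naming pattern may not hold for other releases, and the function name rocm_pytorch_tag is invented here. Always confirm on Docker Hub that the tag exists.

```shell
# Hypothetical helper: build the image tag from a ROCm version string
# of the form major.minor or major.minor.patch.
# Assumes the pattern rocm<major.minor>_ubuntu16.04_py3.6_pytorch seen above.
rocm_pytorch_tag() {
    version=$1
    major=${version%%.*}    # "3.3.0" -> "3"
    rest=${version#*.}      # "3.3.0" -> "3.0"
    minor=${rest%%.*}       # "3.0"   -> "3"
    echo "rocm${major}.${minor}_ubuntu16.04_py3.6_pytorch"
}

# Usage: sudo docker pull "rocm/pytorch:$(rocm_pytorch_tag 3.3.0)"
```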

Prepare source code for compiling

Now clone the PyTorch source code with Git. Run sudo apt-get install git first if you don’t have Git.

$ cd ~

$ git clone https://github.com/pytorch/pytorch.git

Then fetch the other required source code (the Git submodules) automatically.

$ cd pytorch

$ git submodule init

$ git submodule update

I suggest running git submodule update --init --recursive instead of git submodule update, because some submodules have their own submodules, and those are only downloaded with the --recursive flag.

$ git submodule update --init --recursive
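To confirm that nothing was missed, git submodule status marks every submodule that is still uninitialized with a leading "-". A small sketch that counts those entries (the function name count_uninitialized is made up for this example):

```shell
# Hypothetical helper: count submodules that are still uninitialized.
# Reads `git submodule status` output on stdin; such entries start with "-".
count_uninitialized() {
    # grep -c exits non-zero when the count is 0, so ignore its exit status.
    grep -c '^-' || true
}

# Usage inside ~/pytorch: git submodule status --recursive | count_uninitialized
```

A result of 0 means every submodule, including nested ones, has been checked out.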

Compile and Install

Enter environment for compiling

Make sure the tag is correct before you run this command; my tag was rocm3.3_ubuntu16.04_py3.6_pytorch for ROCm 3.3.0. The official document does not remind you that the tag really matters.

$ sudo docker run -it -v $HOME:/data --privileged --rm --device=/dev/kfd --device=/dev/dri --group-add video rocm/pytorch:rocm3.3_ubuntu16.04_py3.6_pytorch

You will get something that looks like another terminal:

root@f78375b1c487:/#

Now we change to the mounted source code directory:

root@f78375b1c487:/# cd /data/pytorch

We now start building

Export the right target code for your GPU. You can check the code by running rocminfo on your host (outside the Docker container) from another terminal, or look it up in a list of AMD GPU targets by searching (Ctrl+F) for your GPU. It is gfx900 for the Vega 56.

root@f78375b1c487:/data/pytorch# export HCC_AMDGPU_TARGET=gfx900
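rocminfo prints a lot of text, so it can help to filter out just the target names. The sketch below assumes that rocminfo reports the target as a name starting with gfx, as it does for the Vega 56; the function name extract_gfx_targets is invented for this example.

```shell
# Hypothetical helper: pull the distinct gfx target names out of
# rocminfo output read from stdin.
extract_gfx_targets() {
    grep -oE 'gfx[0-9a-f]+' | sort -u
}

# Usage on the host: rocminfo | extract_gfx_targets
```

If more than one name is printed, the machine has more than one kind of GPU, and you need to pick the one you want to build for.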

Start compiling

An automated script is provided; running the following command from the source directory will build and install everything inside the Docker container.

root@f78375b1c487:/data/pytorch# .jenkins/pytorch/build.sh

Test

Before we finish, we need to run the tests.

You can run the test script:

root@f78375b1c487:/data/pytorch# PYTORCH_TEST_WITH_ROCM=1 python test/run_test.py --verbose

It may say ImportError: No module named torch. No worries, it is easy to fix.

Check your Python version

root@f78375b1c487:/# python -V
Python 2.7.18
root@f78375b1c487:/# python3 -V
Python 3.5.8
root@f78375b1c487:/# python3.6 -V
Python 3.6.10

Since PyTorch was compiled and installed for Python 3.6, you need to use Python 3.6 to run the tests.

root@f78375b1c487:/data/pytorch# PYTORCH_TEST_WITH_ROCM=1 python3.6 test/run_test.py --verbose
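Instead of trying each interpreter by hand, a small loop can report which one is able to import torch. This is only a sketch; the function name find_python_with is made up, and the interpreter names are the ones found in my container.

```shell
# Hypothetical helper: print the first interpreter in the argument list
# that can import the given module; returns non-zero if none can.
find_python_with() {
    module=$1
    shift
    for py in "$@"; do
        if command -v "$py" >/dev/null 2>&1 &&
           "$py" -c "import $module" >/dev/null 2>&1; then
            echo "$py"
            return 0
        fi
    done
    return 1
}

# Usage inside the container: find_python_with torch python python3 python3.6
```

Whatever name it prints is the interpreter to use for test/run_test.py.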

Error?

If you do not have 16 GB of RAM, the process will use up all the memory and malloc will fail with an error saying it is unable to allocate memory.

If you try to run the tests with an RX 580, PyTorch will tell you the GPU is too old and no longer supported.

Finishing

Install torchvision

Try to install it; you should see that it was already installed when you compiled and installed PyTorch.

root@f78375b1c487:/# pip install torchvision

Save the container

Use the container ID to save the container as an image, so you can reuse it for different projects and keep their dependencies from contaminating each other. The container ID is the hash shown in the container's terminal prompt, f78375b1c487 in my case.

$ sudo docker commit -m 'pytorch installed' f78375b1c487

Change f78375b1c487 to your container ID.

The Docker container is removed automatically when you exit the environment (because of the --rm flag), so you need to commit it from another terminal while it is still running. If you are using the command-line interface, press Ctrl+Alt+F3 to switch to another terminal (F7 is usually the graphical desktop; on Fedora it is F2). I use tmux, so I press Ctrl+B and then % to create a new pane on screen, and commit the container from there.

DONE

PyTorch time! (>w<)b

And I think you may need a Docker tutorial to get going.