Installing PyTorch with ROCm 3.3.0 on Ubuntu 18.04
Goal
My RX 580 already works with TensorFlow, but when I tried to install PyTorch it reported that the GPU is too old and no longer supported. So I bought a Vega 56 (10.54 TFLOPS FP32) from newegg.com for 266 USD. Let's install PyTorch on top of ROCm 3.3.0.
First of all, install ROCm 3.3.0 (see the previous tutorial); the requirements are the same.
We follow the official ROCm instructions first, and I add solutions to the problems I ran into.
You will need 16 GB of RAM or more to get through the whole compile, install, and test process.
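Since running out of memory only shows up late in the build, it may be worth checking the RAM first. Here is a minimal sketch that reads MemTotal from /proc/meminfo; the function names mem_total_gib and has_enough_ram are made up for this example, and the value is rounded up because a "16 GB" machine reports slightly less than 16 GiB.

```shell
# Sketch: check total RAM before starting the build.
# mem_total_gib and has_enough_ram are hypothetical helper names.

# Print total RAM in whole GiB (rounded up) from a /proc/meminfo-style file.
mem_total_gib() {
    # MemTotal is reported in kB; 1 GiB = 1048576 kB.
    awk '/^MemTotal:/ { printf "%d\n", ($2 + 1048575) / 1048576 }' "${1:-/proc/meminfo}"
}

# Succeed if the file reports at least the given number of GiB (default 16).
# Usage: has_enough_ram [MEMINFO_FILE] [MIN_GIB]
has_enough_ram() {
    [ "$(mem_total_gib "$1")" -ge "${2:-16}" ]
}
```

For example, has_enough_ram || echo "Less than 16 GB of RAM, the build may fail" prints a warning on a smaller machine.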
Install dependencies
Install Docker
You will need Docker to finish the installation. Docker is similar to a virtual machine in that it provides an operating system environment isolated from your computer, but it is much lighter and faster. Learn more from the Docker documentation.
Install Docker by following the official Docker documentation, or use their convenience script. Always examine scripts downloaded from the internet before running them locally, to make sure no one has added a line that installs a trojan on your computer.
$ curl -fsSL https://get.docker.com -o get-docker.sh
Install the rocm-dev package
We are going to compile PyTorch from source, which requires the rocm-dev package.
$ sudo apt-get update
Step Two
Prepare environment for compiling
Now we get the compilation environment for ROCm 3.3.0. The official documentation is out of date: it tells you to run docker pull rocm/pytorch:rocm3.0_ubuntu16.04_py3.6_pytorch. Go to their Docker Hub page and check whether the tag rocm3.0_ubuntu16.04_py3.6_pytorch is really the one you need. For ROCm 3.3.0 I need rocm3.3_ubuntu16.04_py3.6_pytorch, so I run:
$ sudo docker pull rocm/pytorch:rocm3.3_ubuntu16.04_py3.6_pytorch
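To avoid typing the wrong tag, the tag can be derived from the ROCm version. This is only a sketch: rocm_pytorch_tag is a hypothetical helper, and it assumes the naming pattern seen in the two tags above (rocm<major>.<minor>_ubuntu16.04_py3.6_pytorch) also holds for your version, so still confirm the tag on Docker Hub.

```shell
# Sketch: build the image tag from a ROCm version string.
# ASSUMPTION: tags follow rocm<major>.<minor>_ubuntu16.04_py3.6_pytorch,
# the pattern of the two tags mentioned in this post. Verify on Docker Hub.
rocm_pytorch_tag() {
    # Keep only major.minor, so 3.3.0 becomes 3.3.
    rocm_major_minor=$(printf '%s\n' "$1" | cut -d. -f1,2)
    printf 'rocm%s_ubuntu16.04_py3.6_pytorch\n' "$rocm_major_minor"
}
```

It could then be used as sudo docker pull "rocm/pytorch:$(rocm_pytorch_tag 3.3.0)".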
Prepare source code for compiling
Now clone the PyTorch source code with Git. Run sudo apt-get install git first if you don't have Git.
$ cd ~
Then fetch the other required source code (the Git submodules) automatically.
$ cd pytorch
I suggest running git submodule update --init --recursive instead of plain git submodule update, because some of the required repositories have submodules of their own, which are only downloaded with the --recursive flag.
$ git submodule update --init --recursive
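If you want to confirm that nothing was missed, git submodule status --recursive marks every submodule that is still uninitialized with a leading "-". The small filter below (uninitialized_submodules is a name I made up) prints just those paths.

```shell
# Sketch: list submodules that have not been initialized yet.
# Reads `git submodule status --recursive` output on stdin; a line that
# starts with "-" belongs to an uninitialized submodule.
uninitialized_submodules() {
    # Field 1 is the prefixed commit hash, field 2 is the submodule path.
    awk '/^-/ { print $2 }'
}
```

Run it inside the pytorch directory as git submodule status --recursive | uninitialized_submodules; empty output means everything was fetched.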
Compile and Install
Enter environment for compiling
Make sure the tag is correct before you run this command; mine was rocm3.3_ubuntu16.04_py3.6_pytorch for ROCm 3.3.0. The official documentation does not remind you that the tag really matters.
$ sudo docker run -it -v $HOME:/data --privileged --rm --device=/dev/kfd --device=/dev/dri --group-add video rocm/pytorch:rocm3.3_ubuntu16.04_py3.6_pytorch
You will get a prompt that looks like another terminal:
root@f78375b1c487:/#
Now we change to the mounted source code directory:
root@f78375b1c487:/# cd /data/pytorch
We now start building
Export the right target code for your GPU. You can find the code by running rocminfo on your host (outside the Docker container) from another terminal. Alternatively, look it up here and use Ctrl+F to search for your GPU. It is gfx900 for the Vega 56.
root@f78375b1c487:/# export HCC_AMDGPU_TARGET=gfx900
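The rocminfo output is long, so a small filter helps to pick out the target code. This sketch assumes the code appears in the output as a token of the form gfx followed by hex digits (such as gfx900); gfx_targets is a hypothetical helper name.

```shell
# Sketch: extract the distinct gfx target codes from rocminfo output on stdin.
# ASSUMPTION: the code shows up as "gfx" followed by hex digits, e.g. gfx900.
gfx_targets() {
    # -o prints only the matching part of each line; sort -u drops repeats.
    grep -o 'gfx[0-9a-f][0-9a-f]*' | sort -u
}
```

On the host, rocminfo | gfx_targets should print one line per distinct GPU target, and that value is what goes into HCC_AMDGPU_TARGET.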
Start compiling
An automated script is provided. Running the following command builds and installs everything inside the Docker container.
root@f78375b1c487:/# .jenkins/pytorch/build.sh
Test
Before we finish everything, we need to run a test.
You can run the test script:
root@f78375b1c487:/# PYTORCH_TEST_WITH_ROCM=1 python test/run_test.py --verbose
It may fail with ImportError: No module named torch. No worries, it is easy to fix.
Check your Python version
root@f78375b1c487:/# python -V
Since PyTorch was compiled and installed for Python 3.6, you need to run the test with Python 3.6.
root@f78375b1c487:/# PYTORCH_TEST_WITH_ROCM=1 python3.6 test/run_test.py --verbose
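If you are not sure which interpreter the build was installed for, you can simply try each candidate. The sketch below prints the first interpreter that can import torch; python_with_torch is a made-up helper name, and the candidate list is whatever you pass in.

```shell
# Sketch: print the first interpreter from the argument list that can
# import torch. Returns non-zero if none of them can.
python_with_torch() {
    for py in "$@"; do
        # Skip names that are not installed, then try the import quietly.
        if command -v "$py" >/dev/null 2>&1 &&
           "$py" -c 'import torch' >/dev/null 2>&1; then
            printf '%s\n' "$py"
            return 0
        fi
    done
    return 1
}
```

Inside the container, python_with_torch python python3 python3.6 would tell you which command to use for test/run_test.py.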
Error?
If you do not have 16 GB of RAM, the process will use up all the memory and malloc will fail with an error that it cannot allocate memory.
If you try to run the test with an RX 580, PyTorch will tell you the GPU is too old and no longer supported.
Finishing
Install torchvision
Try installing it; you should see that it was already installed along with your PyTorch build.
root@f78375b1c487:/# pip install torchvision
Save the container
Use the container ID to save the container as an image, so you can reuse it for different projects and keep their dependencies from contaminating each other. The container ID is the hash shown in the container's terminal prompt, f78375b1c487 in my case.
$ sudo docker commit f78375b1c487 -m 'pytorch installed'
Change f78375b1c487 to your container ID.
Because it was started with --rm, the Docker container is removed automatically when you exit the environment, so you need to commit it from another terminal while it is still running. If you are on a text console, press Ctrl+Alt+F3 to switch to another terminal (the graphical desktop is usually on F7; on Fedora it is F2). I used tmux, so I pressed Ctrl+B and then % to open a new pane on screen, and committed the container from there.
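From the second terminal you can also look up the container ID instead of copying it from the prompt. This is a sketch under a few assumptions: commit_running_container is a hypothetical helper, the image name you pass as the second argument is your own choice, and your user can run docker directly (otherwise change the two docker calls to sudo docker).

```shell
# Sketch: commit the first running container that was started from IMAGE.
# Usage: commit_running_container IMAGE NEW_IMAGE_NAME
# NEW_IMAGE_NAME is any name you choose for the saved image.
commit_running_container() {
    # `docker ps -q --filter ancestor=IMAGE` prints the IDs of running
    # containers created from IMAGE; take the first one.
    cid=$(docker ps -q --filter "ancestor=$1" | head -n 1)
    if [ -z "$cid" ]; then
        echo "no running container found for $1" >&2
        return 1
    fi
    docker commit -m 'pytorch installed' "$cid" "$2"
}
```

For example, commit_running_container rocm/pytorch:rocm3.3_ubuntu16.04_py3.6_pytorch my-pytorch:rocm3.3 saves the build under the (made-up) name my-pytorch:rocm3.3, which you can later pass to docker run in place of the original image.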
DONE
PyTorch time! (>w<)b
You may also want to read a Docker tutorial to get comfortable with it.