I already have my RX 580 working with TensorFlow. I tried to install PyTorch on it as well, but it told me the GPU is too old and no longer supported. So I bought a Vega 56 (10.54 TFLOPS FP32) from newegg.com for 266 USD. Let’s install PyTorch on top of ROCm 3.3.0.
First of all, install ROCm 3.3.0 (see the previous tutorial); the requirements are the same.
We will follow the official ROCm instructions, and I will add solutions to the problems I ran into.
You will need to have 16 GB RAM or more to finish the whole compile, install and test process.
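To check the RAM requirement before starting, a rough sketch like the one below could be used. The function names are my own (hypothetical), and it assumes a Linux host with /proc/meminfo. The threshold is 15 rather than 16 because a 16 GB machine usually reports slightly less than 16 GiB.

```shell
#!/bin/sh
# Print total RAM in whole GiB, read from a meminfo-style file.
# The file path is a parameter (hypothetical helper); it defaults to /proc/meminfo.
mem_total_gib() {
    meminfo="${1:-/proc/meminfo}"
    # MemTotal is reported in kB; 1 GiB = 1048576 kB. %d truncates the fraction.
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1048576 }' "$meminfo"
}

# Succeed when at least MIN_GIB of RAM is reported.
# Usage: has_enough_ram MIN_GIB [MEMINFO_FILE]
has_enough_ram() {
    min_gib="$1"
    total="$(mem_total_gib "${2:-/proc/meminfo}")"
    [ "${total:-0}" -ge "$min_gib" ]
}

if [ -r /proc/meminfo ]; then
    if has_enough_ram 15; then
        echo "RAM looks sufficient for the build"
    else
        echo "Less than 16 GB of RAM: the build may run out of memory"
    fi
fi
```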
You will need Docker to finish the installation. Docker is similar to a virtual machine: it gives you an operating system environment isolated from the rest of your computer, but it is much lighter and faster. You can learn more from their documentation.
Install Docker by following the official Docker documentation, or use their convenience script. Always examine scripts downloaded from the internet before running them locally, and make sure no one has added a line that installs a trojan on your computer.
$ curl -fsSL https://get.docker.com -o get-docker.sh
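As a small aid for the review step, here is a sketch that lists the lines of a downloaded script worth a closer look. The helper name and the pattern list are my own assumptions for illustration; it is not a security scanner and does not replace reading the script.

```shell
#!/bin/sh
# List the lines of a script that download or decode/execute content,
# so they can be read before the script is run with root privileges.
# The pattern list is an illustrative assumption, not a complete check.
show_risky_lines() {
    grep -nE 'curl|wget|base64|eval' "$1"
}

# Typical use, once get-docker.sh has been downloaded:
#   less get-docker.sh                # read the whole script
#   show_risky_lines get-docker.sh    # then look closer at these lines
#   sudo sh get-docker.sh             # run it only when satisfied
```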
We are going to compile Pytorch from source, it requires
$ sudo apt-get update
Now we get the compilation environment for ROCm 3.3.0. The official documentation is out of date and tells you to run docker pull rocm/pytorch:rocm3.0_ubuntu16.04_py3.6_pytorch. Go to their Docker Hub page and check whether the tag rocm3.0_ubuntu16.04_py3.6_pytorch is really the one you need. For ROCm 3.3.0 I needed rocm3.3_ubuntu16.04_py3.6_pytorch, so I ran:
$ sudo docker pull rocm/pytorch:rocm3.3_ubuntu16.04_py3.6_pytorch
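Since the tag has to match the installed ROCm version, a tiny helper can build the image name from the version number. This is only a sketch: the function name is hypothetical, and it assumes the tag keeps the pattern seen in the two tags above, so still confirm the result on Docker Hub.

```shell
#!/bin/sh
# Build the rocm/pytorch image name for a given ROCm version.
# Assumption: the tag follows rocm<major.minor>_ubuntu16.04_py3.6_pytorch,
# as in the two tags mentioned in this post; other releases may differ.
rocm_pytorch_image() {
    # Keep only major.minor, e.g. 3.3.0 -> 3.3
    short="$(printf '%s\n' "$1" | cut -d. -f1,2)"
    printf 'rocm/pytorch:rocm%s_ubuntu16.04_py3.6_pytorch\n' "$short"
}

rocm_pytorch_image 3.3.0
```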
Now clone the PyTorch source code with Git. Run sudo apt-get install git first if you don’t have Git installed.
$ cd ~
And then clone the other required source code automatically.
$ cd pytorch
I suggest running git submodule update --init --recursive instead of git submodule update, because some of the required repositories have required repositories of their own, and those are only downloaded with --recursive:
$ git submodule update --init --recursive
Make sure the tag is correct before you run the next command; my tag was rocm3.3_ubuntu16.04_py3.6_pytorch for ROCm 3.3.0. The official documentation forgets to remind you that the tag really matters.
$ sudo docker run -it -v $HOME:/data --privileged --rm --device=/dev/kfd --device=/dev/dri --group-add video rocm/pytorch:rocm3.3_ubuntu16.04_py3.6_pytorch
You will land in something that looks like another terminal.
Now we change to the mounted source code directory:
root@f78375b1c487:/# cd /data/pytorch
Export the right target code for your GPU. You can check the code by running rocminfo on your host (outside the Docker container) from another terminal, or you can find it here and use Ctrl+F to search for your GPU. It is gfx900 for the Vega 56.
root@f78375b1c487:/# export HCC_AMDGPU_TARGET=gfx900
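If you would rather not read through the whole rocminfo output, the target name can be filtered out on the host with a short sketch like this. The function name is hypothetical, and it assumes the GPU shows up in the output with a name of the form gfx900.

```shell
#!/bin/sh
# Extract the unique gfx target names from rocminfo-style text on stdin.
# Assumption: the GPU agent is listed with a name such as gfx900.
gfx_targets() {
    grep -oE 'gfx[0-9a-f]+' | sort -u
}

# On the host (not in the container), if rocminfo is installed:
if command -v rocminfo >/dev/null 2>&1; then
    rocminfo | gfx_targets
fi
```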
An automated script is provided. Running the following command will build and install everything into the Docker container.
Before we call it done, we need to run the tests.
You can run the test script…
root@f78375b1c487:/# PYTORCH_TEST_WITH_ROCM=1 python test/run_test.py --verbose
It may say ImportError: No module named torch. Don’t worry, it is easy to fix.
Check your Python version
root@f78375b1c487:/# python -V
Since PyTorch was compiled and installed for Python 3.6, you need to run the tests with Python 3.6.
root@f78375b1c487:/# PYTORCH_TEST_WITH_ROCM=1 python3.6 test/run_test.py --verbose
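When it is not obvious which interpreter the build was installed for, a loop over the candidates can find the one that is able to import torch. This is a sketch only: the function name is hypothetical and the candidate interpreter names are assumptions about what exists in the container.

```shell
#!/bin/sh
# Print the first interpreter from the candidate list that can import a module.
# Usage: python_with_module MODULE CANDIDATE...
python_with_module() {
    module="$1"
    shift
    for py in "$@"; do
        # Skip names that are not on PATH.
        command -v "$py" >/dev/null 2>&1 || continue
        if "$py" -c "import $module" >/dev/null 2>&1; then
            printf '%s\n' "$py"
            return 0
        fi
    done
    return 1
}

# Inside the container (the candidate names are assumptions):
#   python_with_module torch python python3 python3.6
```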
If you do not have 16 GB of RAM, the process will use up all the memory and malloc will fail because it cannot allocate any more.
If you try to run the tests on an RX 580, PyTorch will tell you the GPU is too old and no longer supported.
Try to install torchvision; you should see that it was already installed along with your compilation and installation of PyTorch.
root@f78375b1c487:/# pip install torchvision
Use the container ID to save the container as an image, so you can reuse it for different projects and keep their dependencies from contaminating each other. The container ID is the hash shown in the container's terminal prompt; mine is f78375b1c487.
$ sudo docker commit f78375b1c487 -m 'pytorch installed'
Change f78375b1c487 to your container ID.
The Docker container is removed automatically when you exit it (because of --rm), so you need to commit it from another terminal while it is still running. If you are on a text console, use Ctrl+Alt plus a function key (F7 is the graphical desktop; on Fedora it is F2) to switch to another terminal. I used tmux, so I pressed Ctrl+B and then % to split off a new terminal on screen, and committed the container from there.
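The container ID can also be picked out of the prompt mechanically, which is handy when scripting the commit. The sketch below assumes the default prompt of the form root@ID:path# with a 12-character ID; the function name and the image name in the comments are hypothetical.

```shell
#!/bin/sh
# Pull the container ID out of a prompt such as root@f78375b1c487:/#
# Prints nothing if the prompt does not match that shape.
container_id_from_prompt() {
    printf '%s\n' "$1" | sed -n 's/^[^@]*@\([0-9a-f]\{12\}\):.*$/\1/p'
}

container_id_from_prompt 'root@f78375b1c487:/#'

# From a second terminal on the host, running containers started from the
# image can be listed with:
#   sudo docker ps -q --filter ancestor=rocm/pytorch:rocm3.3_ubuntu16.04_py3.6_pytorch
# Giving the committed image a name makes it easier to find later
# (the name my-pytorch:rocm3.3 is only an example):
#   sudo docker commit -m 'pytorch installed' f78375b1c487 my-pytorch:rocm3.3
```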
Pytorch time! (>w<)b
You may also want a Docker tutorial to get going.