For some reason I needed to reinstall the operating system and CUDA on a deep learning machine with a GTX 1070 Ti. After installing Ubuntu Server 18.04, I found the NVIDIA documentation confusing, so I wrote down these notes to keep as a reference.
The procedure is actually very simple, but the documentation is so detailed, or its layout so complex, that I got lost in it and could not find the key points.
- Ubuntu 16.04 or later
- NVIDIA GPU(s) that support CUDA
Since I installed Ubuntu Server 18.04 with LVM, I soon used up all the space. It seems the default logical volume is just big enough for the operating system. The solution is to extend the LVM logical volume.
$ sudo lvm
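For reference, here is a minimal sketch of how the extension could look outside the interactive `lvm` prompt. The names `ubuntu-vg` and `ubuntu-lv` are assumptions (they are common installer defaults, not something these notes confirm), and `resize2fs` assumes an ext4 root filesystem. Check the real names with `sudo vgs` and `sudo lvs` first.

```shell
# Assumed names: ubuntu-vg / ubuntu-lv are common installer defaults.
# Override VG_NAME and LV_NAME if `sudo lvs` shows something different.
VG_NAME="${VG_NAME:-ubuntu-vg}"
LV_NAME="${LV_NAME:-ubuntu-lv}"
LV_PATH="/dev/${VG_NAME}/${LV_NAME}"

extend_root_lv() {
    # Grow the logical volume into all free space left in the volume group,
    # then grow the ext4 filesystem to fill the enlarged volume.
    sudo lvextend -l +100%FREE "$LV_PATH" &&
    sudo resize2fs "$LV_PATH"
}

# Nothing is changed until extend_root_lv is called explicitly:
#   sudo vgs && sudo lvs    # inspect free space and volume names
#   extend_root_lv          # run once the names are confirmed
echo "Target logical volume: $LV_PATH"
```

Running `df -h /` before and after shows whether the root filesystem actually grew.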
The exact CUDA version does not really matter. See the detailed installation guide for reference.
This part is simple: I installed the latest Ubuntu Server LTS release, which I know supports CUDA, and I am also sure the GTX 1070 Ti supports CUDA. All I need to do now is install the GCC compiler and the Linux development packages.
I am so lazy that I just install everything I might need to build anything.
$ sudo apt-get install build-essential
Then install the kernel headers and development packages for the currently running kernel.
$ sudo apt-get install linux-headers-$(uname -r)
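Before moving on, a quick check that the prerequisites landed can save some confusion later. This is only a sketch: `check_cmd` is a helper name I made up, and the headers path assumes the usual Ubuntu layout under `/usr/src`.

```shell
# check_cmd is a hypothetical helper: report whether a command is on PATH.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1"
    fi
}

check_cmd gcc
check_cmd make

# Assumption: headers for the running kernel live under /usr/src on Ubuntu.
headers_dir="/usr/src/linux-headers-$(uname -r)"
if [ -d "$headers_dir" ]; then
    echo "ok: $headers_dir"
else
    echo "missing: $headers_dir"
fi
```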
Now go to the CUDA Toolkit download page, download the installation package, and follow the guide to install it.
I chose the easiest way to install: the automated script. Copy the instructions, paste them into the terminal, press Enter, and wait… A few prompts will require you to type “accept” for the EULA before the installation proceeds.
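After the installer finishes, the tools may not be on the `PATH` yet. The sketch below assumes the toolkit ended up under `/usr/local/cuda`, which is the usual location but not something I verified for every install method, so adjust `CUDA_HOME` if yours differs. It then runs the two standard sanity checks when the tools are present.

```shell
# Assumption: the toolkit lives under /usr/local/cuda (commonly a symlink to
# the versioned directory). Override CUDA_HOME if it was installed elsewhere.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"
export PATH="${CUDA_HOME}/bin${PATH:+:${PATH}}"
export LD_LIBRARY_PATH="${CUDA_HOME}/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"

# Show the driver version and visible GPUs, if the driver tools are present.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi
else
    echo "nvidia-smi not found: the driver may be missing or a reboot may be needed"
fi

# Show the CUDA compiler version, if the toolkit is on PATH.
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version
else
    echo "nvcc not found under ${CUDA_HOME}/bin"
fi
```

To make the two `export` lines permanent, they can be appended to `~/.bashrc` or `~/.profile`.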
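It is a good idea to use Docker for each project. It is easy to set up and run. If you know how to work with Docker, check out the documentation. Note that the `--gpus` flag needs Docker 19.03 or later with the NVIDIA Container Toolkit installed.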
$ docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
pip usually comes with Python; install it if it is not there.
$ sudo apt-get install python3-pip
Then install PyTorch.
$ pip3 install torch torchvision
Find the right command for your environment in the official PyTorch documentation.
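Once the install finishes, it is worth checking that PyTorch can actually see the GPU. A minimal sketch follows; `check_torch` is just a helper name I picked, and it only uses `torch.__version__` and `torch.cuda.is_available()`.

```shell
# check_torch is a hypothetical helper: print the PyTorch version and whether
# it can reach a CUDA device, or a hint if torch cannot be imported.
check_torch() {
    if python3 -c "import torch" >/dev/null 2>&1; then
        python3 -c "import torch; print('torch', torch.__version__); print('CUDA available:', torch.cuda.is_available())"
    else
        echo "torch is not importable from python3"
    fi
}

check_torch
```

If it reports `CUDA available: False`, the installed PyTorch build may not match the driver, so it is worth going back to the install command for your environment.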