After you create an instance with one or more GPUs, your system requires NVIDIA device drivers so that your applications can access the device. Make sure your virtual machine (VM) instances have enough free disk space. You should choose at least 30 GB for the boot disk when creating the new VM.
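One quick way to confirm that the boot disk has enough room before you start is to check free space from inside the VM with the standard `df` utility:

```shell
# Show total and free space on the root filesystem in human-readable units.
# The boot disk should be at least 30 GB, with several GB free for the
# driver and toolkit packages.
df -h /
```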
To install the drivers, you have two options to choose from:
- If you plan to run graphics-intensive workloads, such as those for gaming and visualization, install drivers for the NVIDIA RTX Virtual Workstation (vWS). See Installing drivers for NVIDIA RTX Virtual Workstations (vWS).
- For most workloads, follow the instructions in this document to install the NVIDIA driver.
Before you begin
- If you want to use the command-line examples in this guide, do the following:
- Install or update to the latest version of the Google Cloud CLI.
- Set a default region and zone.
- If you want to use the API examples in this guide, set up API access.
There are different versioned components of drivers and runtime that might be needed in your environment. These include the following components:
- NVIDIA driver
- CUDA toolkit
- CUDA runtime
When installing these components, you can configure your environment to suit your needs. For example, if you have an earlier version of TensorFlow that works best with an earlier version of the CUDA toolkit, but the GPU that you want to use requires a later version of the NVIDIA driver, then you can install an earlier version of the CUDA toolkit along with a later version of the NVIDIA driver.
However, you must make sure that your NVIDIA driver and CUDA toolkit versions are compatible. For CUDA toolkit and NVIDIA driver compatibility, see the NVIDIA documentation about CUDA compatibility.
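As a concrete illustration of that check, the sketch below compares an installed driver version against a toolkit's minimum driver using a version-aware sort. The minimum values shown are examples drawn from the driver requirements in this document; always confirm the real minimums against NVIDIA's compatibility table.

```shell
TOOLKIT="12.1"        # CUDA toolkit version you plan to install
DRIVER="530.30.02"    # installed NVIDIA driver version (example value)

# Example minimum driver per toolkit; verify against NVIDIA's table.
case "$TOOLKIT" in
  11.1) MIN="450.80.02" ;;
  12.1) MIN="525.60.13" ;;
  *)    MIN="" ;;
esac

# sort -V orders version strings numerically: if the minimum sorts first
# (or ties), the installed driver satisfies the requirement.
if [ "$(printf '%s\n' "$MIN" "$DRIVER" | sort -V | head -n 1)" = "$MIN" ]; then
  STATUS="compatible"
else
  STATUS="driver older than required $MIN"
fi
echo "CUDA $TOOLKIT with driver $DRIVER: $STATUS"
```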
Required NVIDIA driver versions
NVIDIA GPUs running on Compute Engine must use the following NVIDIA driver versions:
- For L4 GPUs:
  - Linux: 525.60.13 or later
  - Windows: 528.89
- For A100 GPUs:
  - Linux: 450.80.02 or later
  - Windows: 452.77 or later
- For T4, P4, P100, and V100 GPUs:
  - Linux: 410.79 or later
  - Windows: 426.00 or later
- For K80 GPUs (end-of-life):
  - Linux: 410.79 up to the latest R470 version
  - Windows: 426.00 up to the latest R470 version
For K80 GPUs, NVIDIA has announced that the R470 driver branch will be the final driver version to receive debug support. To review this update, see the NVIDIA Software Support Matrix.
Installing GPU drivers on VMs
One way to install the NVIDIA driver on most VMs is to install the NVIDIA CUDA Toolkit.
To install the NVIDIA toolkit, complete the following steps:
Select a CUDA toolkit that supports the minimum driver that you need.
Connect to the VM where you want to install the driver.
On your VM, download and install the CUDA toolkit. The installation package and guide for the minimum recommended toolkit is found in the following table. Before you install the toolkit, make sure you complete the pre-installation steps found in the installation guide.
| GPU type | Minimum recommended CUDA toolkit version | Installation instructions for minimum version |
|---|---|---|
| NVIDIA L4 | Linux: CUDA Toolkit 12.1. Windows: a supported CUDA Toolkit with the required driver is not yet available; you can install the standalone NVIDIA 528.89 driver instead. | Linux: CUDA 12.1 installation guide. Windows: TBD |
| NVIDIA A100 | Linux: CUDA Toolkit 11.1. Windows: CUDA Toolkit 11.2 | Linux: CUDA 11.1 installation guide. Windows: CUDA 11.2 installation guide |
| NVIDIA T4, V100, P100, P4 | Linux: CUDA Toolkit 10.1 update2. Windows: CUDA Toolkit 10.1 update2 | Linux: CUDA 10.1 installation guide. Windows: CUDA 10.1 installation guide |
Installation scripts
You can use the following scripts to automate the installation process. To review these scripts, see the GitHub repository.
Limitations
- This script won't work on Linux VMs that have Secure Boot enabled. For Linux VMs that have Secure Boot enabled, see Installing GPU drivers on VMs that use Secure Boot.
- If you have version 2.38.0 or later of the Ops Agent collecting GPU metrics on your VM, you must stop the agent before you can install or upgrade your GPU drivers using this installation script. After you have completed the installation or upgrade of the GPU driver, you must then reboot the VM.
To stop the Ops Agent, run the following command:
sudo systemctl stop google-cloud-ops-agent
Linux
Supported operating systems
The Linux installation script was tested on the following operating systems:
- CentOS 7 and 8
- Debian 10 and 11
- Red Hat Enterprise Linux (RHEL) 7 and 8
- Rocky Linux 8
- Ubuntu 18, 20, and 22
If you use this script on other operating systems, the installation will fail. For Linux VMs, this script installs only the NVIDIA driver.
Ensure that Python 3 is installed on your operating system.
Download the installation script.
curl https://raw.githubusercontent.com/GoogleCloudPlatform/compute-gpu-installation/main/linux/install_gpu_driver.py --output install_gpu_driver.py
Run the installation script.
sudo python3 install_gpu_driver.py
The script takes some time to run. It might restart your VM. If the VM restarts, run the script again to continue the installation.
Verify the installation. See Verifying the GPU driver install.
Windows
This installation script can be used on VMs that have Secure Boot enabled.

- For Windows VMs that use a G2 machine series, this script installs only the NVIDIA driver.
- For other machine types, the script installs the NVIDIA driver and CUDA toolkit.
Open a PowerShell terminal as an administrator, then complete the following steps:
If you are using Windows Server 2016, set the Transport Layer Security (TLS) version to 1.2.
[Net.ServicePointManager]::SecurityProtocol = 'Tls12'
Download the script.
Invoke-WebRequest https://github.com/GoogleCloudPlatform/compute-gpu-installation/raw/main/windows/install_gpu_driver.ps1 -OutFile C:\install_gpu_driver.ps1
Run the script.
C:\install_gpu_driver.ps1
The script takes some time to run. No command prompts are given during the installation process. Once the script exits, the driver is installed.
This script installs the drivers in the following default location on your VM:
C:\Program Files\NVIDIA Corporation\
Verify the installation. See Verifying the GPU driver install.
Installing GPU drivers on VMs that use Secure Boot
VMs with Secure Boot enabled require all kernel modules to be signed by a key trusted by the system.
OS support
- For installation of NVIDIA drivers on Windows operating systems that use Secure Boot, see the general Installing GPU drivers on VMs section.
- For Linux operating systems, support is only available for Ubuntu 18.04, 20.04, and 22.04. Support for more operating systems is in progress.
Ubuntu VMs
Connect to the VM where you want to install the driver.
Update the repository.
sudo apt-get update
Search for the most recent NVIDIA kernel module package, or the version that you want. This package contains NVIDIA kernel modules signed by the Ubuntu key. To find an earlier version, change the number passed to the tail command. For example, specify tail -n 2 to get the second most recent package.

Ubuntu PRO and LTS
For Ubuntu PRO and LTS, run the following command:
NVIDIA_DRIVER_VERSION=$(sudo apt-cache search 'linux-modules-nvidia-[0-9]+-gcp$' | awk '{print $1}' | sort | tail -n 1 | head -n 1 | awk -F"-" '{print $4}')
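To see what this pipeline extracts, here is a self-contained simulation that uses a hardcoded sample of apt-cache search output; the package names and descriptions are illustrative, not real repository contents:

```shell
# Three fake lines of `apt-cache search 'linux-modules-nvidia-[0-9]+-gcp$'`
# output; only the package name in the first column matters.
SAMPLE="linux-modules-nvidia-450-gcp - Linux kernel nvidia modules
linux-modules-nvidia-470-gcp - Linux kernel nvidia modules
linux-modules-nvidia-460-gcp - Linux kernel nvidia modules"

# Same pipeline as above: take the package names, sort them, keep the last
# (highest), and print the fourth dash-separated field: the driver branch.
NVIDIA_DRIVER_VERSION=$(printf '%s\n' "$SAMPLE" | awk '{print $1}' \
  | sort | tail -n 1 | head -n 1 | awk -F"-" '{print $4}')
echo "$NVIDIA_DRIVER_VERSION"   # 470
```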
Ubuntu PRO FIPS
For Ubuntu PRO FIPS, run the following commands:
Enable Ubuntu FIPS updates.
sudo ua enable fips-updates
Shut down and reboot the VM.
sudo shutdown -r now
Get the latest package.
NVIDIA_DRIVER_VERSION=$(sudo apt-cache search 'linux-modules-nvidia-[0-9]+-gcp-fips$' | awk '{print $1}' | sort | tail -n 1 | head -n 1 | awk -F"-" '{print $4}')
You can check the picked driver version by running echo $NVIDIA_DRIVER_VERSION. The output is a version string like 455.

Install the kernel module package and corresponding NVIDIA driver.
sudo apt install linux-modules-nvidia-${NVIDIA_DRIVER_VERSION}-gcp nvidia-driver-${NVIDIA_DRIVER_VERSION}
If the command fails with a package not found error, the latest NVIDIA driver might be missing from the repository. Retry the previous step and select an earlier driver version by changing the tail number.

Verify that the NVIDIA driver is installed. You might need to reboot the VM. If you rebooted the system, you need to reset the NVIDIA_DRIVER_VERSION variable by rerunning the command that you used in step 3.

Configure APT to use the NVIDIA package repository.
To help APT pick the correct dependency, pin the repositories as follows:
sudo tee /etc/apt/preferences.d/cuda-repository-pin-600 > /dev/null <<EOL
Package: nsight-compute
Pin: origin *ubuntu.com*
Pin-Priority: -1

Package: nsight-systems
Pin: origin *ubuntu.com*
Pin-Priority: -1

Package: nvidia-modprobe
Pin: release l=NVIDIA CUDA
Pin-Priority: 600

Package: nvidia-settings
Pin: release l=NVIDIA CUDA
Pin-Priority: 600

Package: *
Pin: release l=NVIDIA CUDA
Pin-Priority: 100
EOL

Install software-properties-common. This is required if you are using Ubuntu minimal images.

sudo apt install software-properties-common
Set the Ubuntu version.
Ubuntu 18.04
For Ubuntu 18.04, run the following command:
export UBUNTU_VERSION=ubuntu1804/x86_64
Ubuntu 20.04
For Ubuntu 20.04, run the following command:
export UBUNTU_VERSION=ubuntu2004/x86_64
Ubuntu 22.04
For Ubuntu 22.04, run the following command:
export UBUNTU_VERSION=ubuntu2204/x86_64
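Instead of choosing the export by hand, the value can be derived from the VM's os-release version string. This is a sketch: VERSION_ID is hardcoded here so the example is self-contained (on a real VM you would source /etc/os-release to set it), and an x86_64 VM is assumed.

```shell
# On a real VM, obtain VERSION_ID by sourcing /etc/os-release;
# it is hardcoded here for illustration.
VERSION_ID="22.04"

# Strip the dot and append the architecture suffix used in the repo paths.
export UBUNTU_VERSION="ubuntu$(echo "$VERSION_ID" | tr -d '.')/x86_64"
echo "$UBUNTU_VERSION"   # ubuntu2204/x86_64
```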
Download the cuda-keyring package.

wget https://developer.download.nvidia.com/compute/cuda/repos/$UBUNTU_VERSION/cuda-keyring_1.0-1_all.deb

Install the cuda-keyring package.

sudo dpkg -i cuda-keyring_1.0-1_all.deb
Add the NVIDIA repository.
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/$UBUNTU_VERSION/ /"
If prompted, select the default action to keep your current version.
Find the compatible CUDA driver version.
The following script determines the latest CUDA driver version that is compatible with the NVIDIA driver that you just installed:
CUDA_DRIVER_VERSION=$(apt-cache madison cuda-drivers | awk '{print $3}' | sort -r | while read line; do if dpkg --compare-versions $(dpkg-query -f='${Version}\n' -W nvidia-driver-${NVIDIA_DRIVER_VERSION}) ge $line ; then echo "$line"; break; fi; done)
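The one-liner is dense, so here is a self-contained simulation of its selection logic: walk a descending list of candidate versions and keep the first one that is not newer than the installed driver version. The installed version and candidate list are made-up examples, and sort -V stands in for dpkg --compare-versions:

```shell
INSTALLED="455.32.00-1"   # example installed nvidia-driver package version
CANDIDATES="460.27.04-1
455.45.01-1
455.32.00-1
450.80.02-1"              # example apt-cache madison version column

# Walk candidates from highest to lowest; keep the first one that the
# installed version is greater than or equal to.
PICKED=$(printf '%s\n' "$CANDIDATES" | sort -rV | while read -r line; do
  if [ "$(printf '%s\n' "$line" "$INSTALLED" | sort -V | head -n 1)" = "$line" ]; then
    echo "$line"
    break
  fi
done)
echo "$PICKED"   # 455.32.00-1
```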
You can check the CUDA driver version by running echo $CUDA_DRIVER_VERSION. The output is a version string like 455.32.00-1.

Install the CUDA drivers with the version identified in the previous step.
sudo apt install cuda-drivers-${NVIDIA_DRIVER_VERSION}=${CUDA_DRIVER_VERSION} cuda-drivers=${CUDA_DRIVER_VERSION}
Optional: Hold back dkms packages.

After enabling Secure Boot, all kernel modules must be signed to be loaded. Kernel modules built by dkms don't work on the VM because they aren't properly signed by default. This is an optional step, but it can help prevent you from accidentally installing other dkms packages in the future.

To hold dkms packages, run the following command:

sudo apt-get remove dkms && sudo apt-mark hold dkms
Install CUDA toolkit and runtime.
Pick the suitable CUDA version. The following script determines the latest CUDA version that is compatible with the CUDA driver that you just installed:
CUDA_VERSION=$(apt-cache showpkg cuda-drivers | grep -o 'cuda-runtime-[0-9][0-9]-[0-9],cuda-drivers [0-9\\.]*' | while read line; do if dpkg --compare-versions ${CUDA_DRIVER_VERSION} ge $(echo $line | grep -Eo '[[:digit:]]+\.[[:digit:]]+') ; then echo $(echo $line | grep -Eo '[[:digit:]]+-[[:digit:]]'); break; fi; done)
You can check the CUDA version by running echo $CUDA_VERSION. The output is a version string like 11-1.

Install the CUDA package.
sudo apt install cuda-${CUDA_VERSION}
Verify the CUDA installation.
sudo nvidia-smi
/usr/local/cuda/bin/nvcc --version
The first command prints the GPU information. The second command prints the installed CUDA compiler version.
Verifying the GPU driver install
After completing the driver installation steps, verify that the driver installed and initialized properly.
Linux
Connect to the Linux instance and use the nvidia-smi command to verify that the driver is running properly.
sudo nvidia-smi
The output is similar to the following:
Tue Mar 21 19:50:15 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02    Driver Version: 530.30.02    CUDA Version: 12.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA L4            Off | 00000000:00:03.0 Off |                    0 |
| N/A   63C    P0    30W /  75W |      0MiB / 23034MiB |      8%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
If this command fails, review the following:

- Check if there is any GPU attached to the VM. Use the following command to check for any NVIDIA PCI devices:

  sudo lspci | grep -i "nvidia"

- Check that the driver kernel version and the VM kernel version are the same.
  - To check the VM kernel version, run uname -r.
  - To check the driver kernel version, run sudo apt-cache show linux-modules-nvidia-NVIDIA_DRIVER_VERSION-gcp.

  If the versions don't match, reboot the VM to the new kernel version.
Windows Server
Connect to the Windows Server instance and open a PowerShell terminal as an administrator, then run the following command to verify that the driver is running properly.
&"C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe"
The output is similar to the following:
Tue Mar 21 19:50:15 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 531.14       Driver Version: 531.14       CUDA Version: 12.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA L4           WDDM | 00000000:00:04.0 Off |                    0 |
| N/A   50C    P8    18W /  70W |    570MiB / 15360MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       408    C+G   Insufficient Permissions        N/A      |
|    0   N/A  N/A      3120    C+G   ...w5n1h2txyewy\SearchUI.exe    N/A      |
|    0   N/A  N/A      4056    C+G   Insufficient Permissions        N/A      |
|    0   N/A  N/A      4176    C+G   ...y\ShellExperienceHost.exe    N/A      |
|    0   N/A  N/A      5276    C+G   C:\Windows\explorer.exe         N/A      |
|    0   N/A  N/A      5540    C+G   ...in7x64\steamwebhelper.exe    N/A      |
|    0   N/A  N/A      6296    C+G   ...y\GalaxyClient Helper.exe    N/A      |
+-----------------------------------------------------------------------------+
What's next?
- To monitor GPU performance, see Monitor GPU performance.
- To handle GPU host maintenance, see Handle GPU host maintenance events.
- To optimize GPU performance, see Optimize GPU performance.