After you create an instance with one or more GPUs, your system requires NVIDIA device drivers so that your applications can access the device. Make sure your virtual machine (VM) instances have enough free disk space. You should choose at least 30 GB for the boot disk when creating the new VM.
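One quick way to confirm that the boot disk has enough room before you start is to check free space from inside the VM with the standard `df` utility:

```shell
# Show total and free space on the root filesystem in human-readable units.
# The boot disk should be at least 30 GB, with several GB free for the
# driver and toolkit packages.
df -h /
```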
To install the drivers, you have two options to choose from:
- If you plan to run graphics-intensive workloads, such as those for gaming and visualization, install drivers for the NVIDIA RTX Virtual Workstation (vWS). See Installing drivers for NVIDIA RTX Virtual Workstations (vWS).
- For most workloads, follow the instructions in this document to install the NVIDIA driver.
Before you begin
- If you want to use the command-line examples in this guide, do the following:
- Install or update to the latest version of the Google Cloud CLI.
- Set a default region and zone.
- If you want to use the API examples in this guide, set up API access.
There are different versioned components of drivers and runtime that might be needed in your environment. These include the following components:
- NVIDIA driver
- CUDA toolkit
- CUDA runtime
When installing these components, you can configure your environment to suit your needs. For example, if you have an earlier version of TensorFlow that works best with an earlier version of the CUDA toolkit, but the GPU that you want to use requires a later version of the NVIDIA driver, then you can install an earlier version of the CUDA toolkit along with a later version of the NVIDIA driver.
However, you must make sure that your NVIDIA driver and CUDA toolkit versions are compatible. For CUDA toolkit and NVIDIA driver compatibility, see the NVIDIA documentation about CUDA compatibility.
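As a concrete illustration of that check, the sketch below compares an installed driver version against a toolkit's minimum driver using a version-aware sort. The minimum values shown are examples drawn from the driver requirements in this document; always confirm the real minimums against NVIDIA's compatibility table.

```shell
TOOLKIT="12.1"        # CUDA toolkit version you plan to install
DRIVER="530.30.02"    # installed NVIDIA driver version (example value)

# Example minimum driver per toolkit; verify against NVIDIA's table.
case "$TOOLKIT" in
  11.1) MIN="450.80.02" ;;
  12.1) MIN="525.60.13" ;;
  *)    MIN="" ;;
esac

# sort -V orders version strings numerically: if the minimum sorts first
# (or ties), the installed driver satisfies the requirement.
if [ "$(printf '%s\n' "$MIN" "$DRIVER" | sort -V | head -n 1)" = "$MIN" ]; then
  STATUS="compatible"
else
  STATUS="driver older than required $MIN"
fi
echo "CUDA $TOOLKIT with driver $DRIVER: $STATUS"
```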
Required NVIDIA driver versions
NVIDIA GPUs running on Compute Engine must use the following NVIDIA driver versions:
- For L4 GPUs:
  - Linux: 525.60.13 or later
  - Windows: 528.89
- For A100 GPUs:
  - Linux: 450.80.02 or later
  - Windows: 452.77 or later
- For T4, P4, P100, and V100 GPUs:
  - Linux: 410.79 or later
  - Windows: 426.00 or later
- For K80 GPUs (end-of-life):
  - Linux: 410.79 up to the latest R470 version
  - Windows: 426.00 up to the latest R470 version
For K80 GPUs, NVIDIA has announced that the R470 driver branch will be the final driver version to receive debug support. To review this update, see the NVIDIA Software Support Matrix.
Installing GPU drivers on VMs
One way to install the NVIDIA driver on most VMs is to install the NVIDIA CUDA Toolkit.
To install the NVIDIA toolkit, complete the following steps:
Select a CUDA toolkit that supports the minimum driver that you need.
Connect to the VM where you want to install the driver.
On your VM, download and install the CUDA toolkit. The installation package and guide for the minimum recommended toolkit is found in the following table. Before you install the toolkit, make sure you complete the pre-installation steps found in the installation guide.
| GPU type | Minimum recommended CUDA toolkit version | Installation instructions for minimum version |
|---|---|---|
| NVIDIA L4 | Linux: CUDA Toolkit 12.1. Windows: a supported CUDA Toolkit with the required driver is not yet available; you can install the standalone NVIDIA 528.89 driver instead. | Linux: CUDA 12.1 installation guide. Windows: TBD |
| NVIDIA A100 | Linux: CUDA Toolkit 11.1. Windows: CUDA Toolkit 11.2 | Linux: CUDA 11.1 installation guide. Windows: CUDA 11.2 installation guide |
| NVIDIA T4, V100, P100, P4 | Linux: CUDA Toolkit 10.1 update2. Windows: CUDA Toolkit 10.1 update2 | Linux: CUDA 10.1 installation guide. Windows: CUDA 10.1 installation guide |
Installation scripts
You can use the following scripts to automate the installation process. To review these scripts, see the GitHub repository.
Limitations
- This script won't work on Linux VMs that have Secure Boot enabled. For Linux VMs that have Secure Boot enabled, see Installing GPU drivers on VMs that use Secure Boot.
- If you have version 2.38.0 or later of the Ops Agent collecting GPU metrics on your VM, you must stop the agent before you can install or upgrade your GPU drivers using this installation script. After you have completed the installation or upgrade of the GPU driver, you must then reboot the VM.
To stop the Ops Agent, run the following command:
sudo systemctl stop google-cloud-ops-agent
Linux
Supported operating systems
The Linux installation script was tested on the following operating systems:
- CentOS 7 and 8
- Debian 10 and 11
- Red Hat Enterprise Linux (RHEL) 7 and 8
- Rocky Linux 8
- Ubuntu 18, 20, and 22
If you use this script on other operating systems, the installation will fail. For Linux VMs, this script installs only the NVIDIA driver.
Ensure that Python 3 is installed on your operating system.
Download the installation script.
curl https://raw.githubusercontent.com/GoogleCloudPlatform/compute-gpu-installation/main/linux/install_gpu_driver.py --output install_gpu_driver.py
Run the installation script.
sudo python3 install_gpu_driver.py
The script takes some time to run. It might restart your VM. If the VM restarts, run the script again to continue the installation.
Verify the installation. See Verifying the GPU driver install.
Windows
This installation script can be used on VMs that have Secure Boot enabled.

- For Windows VMs that use a G2 machine series, this script installs only the NVIDIA driver.
- For other machine types, the script installs the NVIDIA driver and CUDA toolkit.
Open a PowerShell terminal as an administrator, then complete the following steps:
If you are using Windows Server 2016, set the Transport Layer Security (TLS) version to 1.2.
[Net.ServicePointManager]::SecurityProtocol = 'Tls12'
Download the script.
Invoke-WebRequest https://github.com/GoogleCloudPlatform/compute-gpu-installation/raw/main/windows/install_gpu_driver.ps1 -OutFile C:\install_gpu_driver.ps1
Run the script.
C:\install_gpu_driver.ps1
The script takes some time to run. No command prompts are given during the installation process. Once the script exits, the driver is installed.
This script installs the drivers in the following default location on your VM:
C:\Program Files\NVIDIA Corporation\
Verify the installation. See Verifying the GPU driver install.
Installing GPU drivers on VMs that use Secure Boot
VMs with Secure Boot enabled require all kernel modules to be signed by a key trusted by the system.
OS support
- For installation of NVIDIA drivers on Windows operating systems that use Secure Boot, see the general Installing GPU drivers on VMs section.
- For Linux operating systems, support is only available for Ubuntu 18.04, 20.04, and 22.04. Support for more operating systems is in progress.
Ubuntu VMs
Connect to the VM where you want to install the driver.
Update the repository.
sudo apt-get update
Search for the most recent NVIDIA kernel module package, or the version that you want. This package contains NVIDIA kernel modules signed by the Ubuntu key. To find an earlier version, change the number passed to the tail command. For example, specify tail -n 2 to get the second most recent package.

Ubuntu PRO and LTS
For Ubuntu PRO and LTS, run the following command:
NVIDIA_DRIVER_VERSION=$(sudo apt-cache search 'linux-modules-nvidia-[0-9]+-gcp$' | awk '{print $1}' | sort | tail -n 1 | head -n 1 | awk -F"-" '{print $4}')
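To see what this pipeline extracts, here is a self-contained simulation that uses a hardcoded sample of apt-cache search output; the package names and descriptions are illustrative, not real repository contents:

```shell
# Three fake lines of `apt-cache search 'linux-modules-nvidia-[0-9]+-gcp$'`
# output; only the package name in the first column matters.
SAMPLE="linux-modules-nvidia-450-gcp - Linux kernel nvidia modules
linux-modules-nvidia-470-gcp - Linux kernel nvidia modules
linux-modules-nvidia-460-gcp - Linux kernel nvidia modules"

# Same pipeline as above: take the package names, sort them, keep the last
# (highest), and print the fourth dash-separated field: the driver branch.
NVIDIA_DRIVER_VERSION=$(printf '%s\n' "$SAMPLE" | awk '{print $1}' \
  | sort | tail -n 1 | head -n 1 | awk -F"-" '{print $4}')
echo "$NVIDIA_DRIVER_VERSION"   # 470
```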
Ubuntu PRO FIPS
For Ubuntu PRO FIPS, run the following commands:
Enable Ubuntu FIPS updates.
sudo ua enable fips-updates
Shut down and reboot the VM.
sudo shutdown -r now
Get the latest package.
NVIDIA_DRIVER_VERSION=$(sudo apt-cache search 'linux-modules-nvidia-[0-9]+-gcp-fips$' | awk '{print $1}' | sort | tail -n 1 | head -n 1 | awk -F"-" '{print $4}')
You can check the picked driver version by running echo $NVIDIA_DRIVER_VERSION. The output is a version string like 455.

Install the kernel module package and corresponding NVIDIA driver.
sudo apt install linux-modules-nvidia-${NVIDIA_DRIVER_VERSION}-gcp nvidia-driver-${NVIDIA_DRIVER_VERSION}
If the command fails with a package not found error, the latest NVIDIA driver might be missing from the repository. Retry the previous step and select an earlier driver version by changing the tail number.

Verify that the NVIDIA driver is installed. You might need to reboot the VM. If you rebooted the system, you need to reset the NVIDIA_DRIVER_VERSION variable by rerunning the command that you used in step 3.

Configure APT to use the NVIDIA package repository.
To help APT pick the correct dependency, pin the repositories as follows:
sudo tee /etc/apt/preferences.d/cuda-repository-pin-600 > /dev/null <<EOL
Package: nsight-compute
Pin: origin *ubuntu.com*
Pin-Priority: -1

Package: nsight-systems
Pin: origin *ubuntu.com*
Pin-Priority: -1

Package: nvidia-modprobe
Pin: release l=NVIDIA CUDA
Pin-Priority: 600

Package: nvidia-settings
Pin: release l=NVIDIA CUDA
Pin-Priority: 600

Package: *
Pin: release l=NVIDIA CUDA
Pin-Priority: 100
EOL

Install software-properties-common. This is required if you are using Ubuntu minimal images.

sudo apt install software-properties-common
Set the Ubuntu version.
Ubuntu 18.04
For Ubuntu 18.04, run the following command:
export UBUNTU_VERSION=ubuntu1804/x86_64
Ubuntu 20.04
For Ubuntu 20.04, run the following command:
export UBUNTU_VERSION=ubuntu2004/x86_64
Ubuntu 22.04
For Ubuntu 22.04, run the following command:
export UBUNTU_VERSION=ubuntu2204/x86_64
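Instead of choosing the export by hand, the value can be derived from the VM's os-release version string. This is a sketch: VERSION_ID is hardcoded here so the example is self-contained (on a real VM you would source /etc/os-release to set it), and an x86_64 VM is assumed.

```shell
# On a real VM, obtain VERSION_ID by sourcing /etc/os-release;
# it is hardcoded here for illustration.
VERSION_ID="22.04"

# Strip the dot and append the architecture suffix used in the repo paths.
export UBUNTU_VERSION="ubuntu$(echo "$VERSION_ID" | tr -d '.')/x86_64"
echo "$UBUNTU_VERSION"   # ubuntu2204/x86_64
```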
Download the cuda-keyring package.

wget https://developer.download.nvidia.com/compute/cuda/repos/$UBUNTU_VERSION/cuda-keyring_1.0-1_all.deb

Install the cuda-keyring package.

sudo dpkg -i cuda-keyring_1.0-1_all.deb
Add the NVIDIA repository.
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/$UBUNTU_VERSION/ /"
If prompted, select the default action to keep your current version.
Find the compatible CUDA driver version.
The following script determines the latest CUDA driver version that is compatible with the NVIDIA driver that you just installed:
CUDA_DRIVER_VERSION=$(apt-cache madison cuda-drivers | awk '{print $3}' | sort -r | while read line; do if dpkg --compare-versions $(dpkg-query -f='${Version}\n' -W nvidia-driver-${NVIDIA_DRIVER_VERSION}) ge $line ; then echo "$line"; break; fi; done)
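The one-liner is dense, so here is a self-contained simulation of its selection logic: walk a descending list of candidate versions and keep the first one that is not newer than the installed driver version. The installed version and candidate list are made-up examples, and sort -V stands in for dpkg --compare-versions:

```shell
INSTALLED="455.32.00-1"   # example installed nvidia-driver package version
CANDIDATES="460.27.04-1
455.45.01-1
455.32.00-1
450.80.02-1"              # example apt-cache madison version column

# Walk candidates from highest to lowest; keep the first one that the
# installed version is greater than or equal to.
PICKED=$(printf '%s\n' "$CANDIDATES" | sort -rV | while read -r line; do
  if [ "$(printf '%s\n' "$line" "$INSTALLED" | sort -V | head -n 1)" = "$line" ]; then
    echo "$line"
    break
  fi
done)
echo "$PICKED"   # 455.32.00-1
```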
You can check the CUDA driver version by running echo $CUDA_DRIVER_VERSION. The output is a version string like 455.32.00-1.

Install the CUDA drivers with the version identified in the previous step.
sudo apt install cuda-drivers-${NVIDIA_DRIVER_VERSION}=${CUDA_DRIVER_VERSION} cuda-drivers=${CUDA_DRIVER_VERSION}
Optional: Hold back dkms packages.

After enabling Secure Boot, all kernel modules must be signed to be loaded. Kernel modules built by dkms don't work on the VM because they aren't properly signed by default. This is an optional step, but it can help prevent you from accidentally installing other dkms packages in the future.

To hold dkms packages, run the following command:

sudo apt-get remove dkms && sudo apt-mark hold dkms
Install CUDA toolkit and runtime.
Pick the suitable CUDA version. The following script determines the latest CUDA version that is compatible with the CUDA driver that you just installed:
CUDA_VERSION=$(apt-cache showpkg cuda-drivers | grep -o 'cuda-runtime-[0-9][0-9]-[0-9],cuda-drivers [0-9\\.]*' | while read line; do if dpkg --compare-versions ${CUDA_DRIVER_VERSION} ge $(echo $line | grep -Eo '[[:digit:]]+\.[[:digit:]]+') ; then echo $(echo $line | grep -Eo '[[:digit:]]+-[[:digit:]]'); break; fi; done)
You can check the CUDA version by running echo $CUDA_VERSION. The output is a version string like 11-1.

Install the CUDA package.
sudo apt install cuda-${CUDA_VERSION}
Verify the CUDA installation.
sudo nvidia-smi
/usr/local/cuda/bin/nvcc --version
The first command prints the GPU information. The second command prints the installed CUDA compiler version.
Verifying the GPU driver install
After completing the driver installation steps, verify that the driver installed and initialized properly.
Linux
Connect to the Linux instance and use the nvidia-smi command to verify that the driver is running properly.
sudo nvidia-smi
The output is similar to the following:
Tue Mar 21 19:50:15 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02    Driver Version: 530.30.02    CUDA Version: 12.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA L4            Off | 00000000:00:03.0 Off |                    0 |
| N/A   63C    P0    30W /  75W |      0MiB / 23034MiB |      8%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
If this command fails, review the following:

- Check if there is any GPU attached to the VM. Use the following command to check for any NVIDIA PCI devices:

  sudo lspci | grep -i "nvidia"

- Check that the driver kernel version and the VM kernel version are the same.
  - To check the VM kernel version, run uname -r.
  - To check the driver kernel version, run sudo apt-cache show linux-modules-nvidia-NVIDIA_DRIVER_VERSION-gcp.

  If the versions don't match, reboot the VM to the new kernel version.
Windows Server
Connect to the Windows Server instance and open a PowerShell terminal as an administrator, then run the following command to verify that the driver is running properly.
&"C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe"
The output is similar to the following:
Tue Mar 21 19:50:15 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 531.14       Driver Version: 531.14       CUDA Version: 12.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA L4           WDDM | 00000000:00:04.0 Off |                    0 |
| N/A   50C    P8    18W /  70W |    570MiB / 15360MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       408    C+G   Insufficient Permissions        N/A      |
|    0   N/A  N/A      3120    C+G   ...w5n1h2txyewy\SearchUI.exe    N/A      |
|    0   N/A  N/A      4056    C+G   Insufficient Permissions        N/A      |
|    0   N/A  N/A      4176    C+G   ...y\ShellExperienceHost.exe    N/A      |
|    0   N/A  N/A      5276    C+G   C:\Windows\explorer.exe         N/A      |
|    0   N/A  N/A      5540    C+G   ...in7x64\steamwebhelper.exe    N/A      |
|    0   N/A  N/A      6296    C+G   ...y\GalaxyClient Helper.exe    N/A      |
+-----------------------------------------------------------------------------+
What's next?
- To monitor GPU performance, see Monitor GPU performance.
- To handle GPU host maintenance, see Handle GPU host maintenance events.
- To optimize GPU performance, see Optimize GPU performance.