How do I install NVIDIA GPU driver, CUDA toolkit and optionally NVIDIA Container Toolkit on Amazon Linux 2023 (AL2023)?

Content level: Intermediate

I want to install the NVIDIA driver, the CUDA toolkit, and optionally the NVIDIA Container Toolkit on Amazon Linux 2023 (AL2023).

Prepare Amazon Linux 2023

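Launch an NVIDIA GPU instance (for example, G4dn, G5, or G5g) with an AL2023 AMI. Then run the following commands to update the OS and install DKMS and the kernel packages: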

sudo dnf update -y
sudo dnf install -y dkms kernel-devel kernel-modules-extra

Restart your AL2023 instance, especially if the kernel was updated, so that the running kernel matches the installed kernel-devel package.
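
DKMS builds the NVIDIA kernel module against the headers of the running kernel, so after the restart it can help to confirm that those headers are present. The following is a minimal sketch, assuming that kernel-devel installs headers under /usr/src/kernels/<kernel release> (the usual layout on RPM-based distributions). The function name kernel_headers_present is hypothetical.

```shell
# Hypothetical helper: succeeds if kernel headers for RELEASE (default: the
# running kernel, from uname -r) exist under HEADERS_DIR
# (default: /usr/src/kernels, where kernel-devel is assumed to install them).
kernel_headers_present() {
    local release="${1:-$(uname -r)}"
    local headers_dir="${2:-/usr/src/kernels}"
    [ -d "$headers_dir/$release" ]
}

# Example, after the restart:
#   kernel_headers_present || sudo dnf install -y "kernel-devel-$(uname -r)"
```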

Install NVIDIA driver and CUDA toolkit

Method 1: Package Manager Installation (x86_64)

CUDA 12.5 supports package manager installation on Amazon Linux 2023 for x86_64 only. On Graviton (arm64) instances, use Method 2.

Ensure that your instance has more than 5 GiB of free disk space.
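
To check this from a script, the following sketch compares the space available on the file system that holds a given path (the root file system by default) against a threshold in GiB. It assumes GNU coreutils df, which provides the --output option. The function name has_more_than_gib is hypothetical. For the runfile method, pass 10 instead of 5.

```shell
# Hypothetical helper: succeeds if the file system holding PATH_TO_CHECK
# (default "/") has more than MIN_GIB GiB available.
# Requires GNU coreutils df for --output.
has_more_than_gib() {
    local min_gib="$1"
    local path_to_check="${2:-/}"
    local avail_bytes
    # -B1 reports sizes in bytes; --output=avail prints a header line and then
    # the value, so keep the last line and strip the padding spaces.
    avail_bytes=$(df -B1 --output=avail "$path_to_check" | tail -n 1 | tr -dc '0-9')
    [ -n "$avail_bytes" ] && [ "$avail_bytes" -gt "$((min_gib * 1024 * 1024 * 1024))" ]
}

# Example:
#   has_more_than_gib 5 / || echo "Less than 5 GiB free on /" >&2
```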

Add NVIDIA repo

sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/amzn2023/x86_64/cuda-amzn2023.repo
sudo dnf clean expire-cache

Install NVIDIA driver

sudo dnf module install -y nvidia-driver:latest-dkms
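
The latest-dkms stream builds the driver's kernel module with DKMS, and dkms status lists the modules that DKMS has built for each kernel. The following sketch checks that output for an installed nvidia module for the running kernel. It assumes the DKMS 3.x line format "nvidia/<version>, <kernel release>, <arch>: installed" and also accepts the older "nvidia, <version>, ..." form. The function name nvidia_dkms_installed_for is hypothetical.

```shell
# Hypothetical helper: reads `dkms status` output on stdin and succeeds if an
# nvidia module is listed as installed for RELEASE (default: uname -r).
nvidia_dkms_installed_for() {
    local release="${1:-$(uname -r)}"
    # Keep nvidia lines in either the "nvidia/<ver>," or "nvidia, <ver>," form,
    # then require the kernel release and the "installed" state.
    grep -E '^nvidia[/,]' | grep -F ", $release," | grep -q ': installed'
}

# Example, after the reboot in the post-installation steps:
#   dkms status | nvidia_dkms_installed_for && echo "nvidia module built for $(uname -r)"
```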

Install CUDA toolkit

sudo dnf install -y cuda-toolkit

Method 2: Runfile installation (x86_64 and arm64)

The following instructions may not work, because CUDA Toolkit 12.5 currently supports AL2023 only through the x86_64 RPM installation.

Ensure that your instance has more than 10 GiB of free disk space.

Install development libraries

sudo dnf install -y vulkan-devel libglvnd-devel elfutils-libelf-devel

Download CUDA toolkit installer

To get the latest installer, go to the CUDA Toolkit download page and copy the runfile (local) installer URL for RHEL 9 on x86_64 or arm64-sbsa. The following commands download CUDA 12.5.1.

Intel/AMD x86_64

wget https://developer.download.nvidia.com/compute/cuda/12.5.1/local_installers/cuda_12.5.1_555.42.06_linux.run

Graviton arm64 (G5g instance)

wget https://developer.download.nvidia.com/compute/cuda/12.5.1/local_installers/cuda_12.5.1_555.42.06_linux_sbsa.run
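
If you script the download, the following sketch picks the matching installer from the two URLs above based on the architecture that uname -m reports (x86_64 on Intel/AMD instances, aarch64 on Graviton). The function name cuda_runfile_url is hypothetical, and the URLs are pinned to CUDA 12.5.1. Update them if you use a newer runfile from the download page.

```shell
# Hypothetical helper: prints the CUDA 12.5.1 runfile URL for a machine
# architecture as reported by `uname -m` (default: this machine).
cuda_runfile_url() {
    local arch="${1:-$(uname -m)}"
    local base="https://developer.download.nvidia.com/compute/cuda/12.5.1/local_installers"
    case "$arch" in
        x86_64)  echo "$base/cuda_12.5.1_555.42.06_linux.run" ;;
        aarch64) echo "$base/cuda_12.5.1_555.42.06_linux_sbsa.run" ;;
        *)       echo "Unsupported architecture: $arch" >&2; return 1 ;;
    esac
}

# Example:
#   url=$(cuda_runfile_url) && wget "$url"
```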

Install NVIDIA driver and CUDA toolkit

chmod +x ./cuda*.run
sudo ./cuda_*.run --driver --toolkit --tmpdir=/var/tmp --silent

Post installation

Restart your OS

sudo reboot

Verify NVIDIA driver

nvidia-smi

Your output should be similar to the following (this example is from a G5g instance with an NVIDIA T4G GPU):

Sun Jul  7 05:01:14 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.06              Driver Version: 555.42.06      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA T4G                     Off |   00000000:00:1F.0 Off |                    0 |
| N/A   71C    P0             29W /   70W |       1MiB /  15360MiB |     12%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                        
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
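
For scripted checks, such as in user data or a CI job, nvidia-smi can print selected fields as CSV instead of the table above, for example with nvidia-smi --query-gpu=driver_version --format=csv,noheader. The following sketch reads that output and fails if any GPU reports a driver older than a minimum version. The function name driver_at_least and the minimum version are illustrative, and the comparison relies on sort -V from GNU coreutils.

```shell
# Hypothetical helper: reads one driver version per line on stdin, as printed by
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader
# and succeeds only if at least one GPU is listed and every GPU reports a
# driver version greater than or equal to MIN_VERSION.
driver_at_least() {
    local min_version="$1"
    local version
    local seen=0
    while IFS= read -r version; do
        seen=1
        # sort -V orders version strings; if MIN_VERSION does not come first,
        # this driver is older than the minimum.
        if [ "$(printf '%s\n%s\n' "$min_version" "$version" | sort -V | head -n 1)" != "$min_version" ]; then
            return 1
        fi
    done
    [ "$seen" -eq 1 ]
}

# Example:
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader | driver_at_least 555.42.06
```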

Verify CUDA toolkit

/usr/local/cuda/bin/nvcc --version

Your output should be similar to the following:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Jun__6_02:26:10_PDT_2024
Cuda compilation tools, release 12.5, V12.5.82
Build cuda_12.5.r12.5/compiler.34385749_0
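
The nvidia-smi header shows the highest CUDA version that the installed driver supports (CUDA Version: 12.5 in the example above), and nvcc reports the toolkit release. As a conservative sanity check, the following sketch confirms that the toolkit release is not newer than the driver's CUDA version. This check is stricter than necessary in some cases, because CUDA minor version compatibility can allow a newer 12.x toolkit to run on an older 12.x driver with limitations. The function names are hypothetical.

```shell
# Hypothetical helpers for a conservative check that the installed toolkit is
# not newer than the CUDA version supported by the installed driver.

# Reads `nvcc --version` output on stdin and prints the release, e.g. 12.5.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n 1
}

# Reads `nvidia-smi` output on stdin and prints the "CUDA Version" value.
driver_cuda_version() {
    sed -n 's/.*CUDA Version: \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Succeeds if version $1 is less than or equal to version $2 (both non-empty).
version_le() {
    [ -n "$1" ] && [ -n "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Example:
#   toolkit=$(/usr/local/cuda/bin/nvcc --version | nvcc_release)
#   driver=$(nvidia-smi | driver_cuda_version)
#   version_le "$toolkit" "$driver" || echo "Toolkit $toolkit is newer than driver CUDA $driver" >&2
```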

More information

Refer to the NVIDIA CUDA Installation Guide for Linux for more details and post-installation instructions.
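
Because the toolkit installs under /usr/local/cuda, nvcc is typically not on your PATH after installation, which is why the verification step above uses the full path. NVIDIA's post-installation actions add the CUDA bin directory to PATH and, for runfile installations, the lib64 directory to LD_LIBRARY_PATH. The following is a minimal sketch of that step that avoids adding duplicate entries. The function name add_cuda_to_env is hypothetical, and /usr/local/cuda is assumed to point to your installed version.

```shell
# Hypothetical helper: prepends CUDA_HOME/bin to PATH and CUDA_HOME/lib64 to
# LD_LIBRARY_PATH unless they are already present, then exports both.
# CUDA_HOME defaults to /usr/local/cuda.
add_cuda_to_env() {
    local cuda_home="${1:-/usr/local/cuda}"
    case ":${PATH:-}:" in
        *":$cuda_home/bin:"*) ;;  # already on PATH
        *) PATH="$cuda_home/bin${PATH:+:$PATH}" ;;
    esac
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$cuda_home/lib64:"*) ;;  # already on LD_LIBRARY_PATH
        *) LD_LIBRARY_PATH="$cuda_home/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export PATH LD_LIBRARY_PATH
}

# Example: add the function and a call to it to ~/.bashrc, then open a new shell:
#   add_cuda_to_env
#   nvcc --version
```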

[Optional] Install NVIDIA Container Toolkit

sudo dnf config-manager --add-repo https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
sudo dnf clean expire-cache
sudo dnf install -y nvidia-container-toolkit

Refer to the NVIDIA Container Toolkit documentation for instructions on configuring your container engine (for example, Docker or containerd).
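
For Docker, NVIDIA's documentation configures the runtime with nvidia-ctk runtime configure --runtime=docker, which adds an nvidia runtime entry to /etc/docker/daemon.json, and then restarts Docker. The following sketch wraps those steps and adds a check of the resulting file. It assumes that Docker is already installed and running (for example, from the AL2023 docker package). The function names are hypothetical, and the check only looks for the string nvidia-container-runtime in the config file.

```shell
# Hypothetical helper: registers the NVIDIA runtime with Docker by using
# nvidia-ctk (installed with nvidia-container-toolkit), then restarts Docker.
configure_docker_for_nvidia() {
    sudo nvidia-ctk runtime configure --runtime=docker &&
        sudo systemctl restart docker
}

# Hypothetical helper: succeeds if the Docker daemon config (default
# /etc/docker/daemon.json) mentions nvidia-container-runtime.
docker_has_nvidia_runtime() {
    local config="${1:-/etc/docker/daemon.json}"
    [ -r "$config" ] && grep -q 'nvidia-container-runtime' "$config"
}

# Example:
#   configure_docker_for_nvidia && docker_has_nvidia_runtime &&
#       sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
```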