I want to install the NVIDIA driver, the CUDA toolkit and, optionally, the NVIDIA Container Toolkit on Amazon Linux 2023 (AL2023).
Prepare Amazon Linux 2023
Launch an NVIDIA GPU instance
sudo dnf update -y
sudo dnf install -y dkms kernel-devel kernel-modules-extra
Restart your AL2023 instance, especially if the kernel was updated.
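To see whether the restart is needed, compare the running kernel with the newest installed one. This is a minimal sketch, assuming the kernel package is named kernel (as the kernel-devel package above suggests) and that rpm -q --last lists the most recently installed package first. It also checks that kernel-devel matches the running kernel, which DKMS needs to build the driver module. Helper names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: is a reboot needed after "dnf update"? Helper names are illustrative.
# Assumes the kernel package is named "kernel" and that "rpm -q --last"
# lists the most recently installed package first.

# Hypothetical helper: print the kernel release from the first line of
# "rpm -q --last kernel" (stdin), e.g. 6.1.96-102.177.amzn2023.x86_64.
newest_kernel_from_rpm() {
  awk 'NR == 1 && $1 ~ /^kernel-/ { sub(/^kernel-/, "", $1); print $1 }'
}

# Hypothetical helper: succeed (reboot needed) when the two kernels differ.
kernel_needs_reboot() {
  [ -n "$2" ] && [ "$1" != "$2" ]
}

if command -v rpm >/dev/null 2>&1; then
  running="$(uname -r)"
  newest="$(rpm -q --last kernel 2>/dev/null | newest_kernel_from_rpm)"
  if kernel_needs_reboot "$running" "$newest"; then
    echo "Running $running but $newest was installed: sudo reboot"
  else
    echo "No newer kernel installed than the running $running"
  fi
  # DKMS builds the NVIDIA module against the running kernel's headers.
  rpm -q "kernel-devel-$running" >/dev/null 2>&1 \
    || echo "kernel-devel-$running is not installed" >&2
fi
```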
Install NVIDIA driver and CUDA toolkit
Method 1: Package Manager Installation (x86_64)
CUDA 12.5 supports Amazon Linux 2023 on x86_64 only.
Ensure your OS has more than 5 GiB of free disk space
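Both Method 1 prerequisites can be checked before touching the package manager. A minimal sketch, assuming packages land on the root filesystem (/) of an EC2 instance with an EBS root volume; helper names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: Method 1 pre-flight checks (x86_64, more than 5 GiB free on /).
# Helper names are illustrative.

MIN_FREE_GIB=5

# Hypothetical helper: succeed when "uname -m" output is x86_64.
arch_supported() {
  [ "$1" = "x86_64" ]
}

# Hypothetical helper: succeed when avail_kib KiB is more than min_gib GiB.
has_free_gib() {
  local avail_kib="$1" min_gib="$2"
  [ -n "$avail_kib" ] && [ "$avail_kib" -gt $(( min_gib * 1024 * 1024 )) ]
}

arch="$(uname -m)"
# POSIX df output: the 4th column of the 2nd line is available space in KiB.
avail_kib="$(df -Pk / | awk 'NR == 2 { print $4 }')"

arch_supported "$arch" || echo "Method 1 needs x86_64 but this is $arch; see Method 2" >&2
if has_free_gib "$avail_kib" "$MIN_FREE_GIB"; then
  echo "OK: about $(( avail_kib / 1024 / 1024 )) GiB free on /"
else
  echo "Not more than $MIN_FREE_GIB GiB free on /; free space or enlarge the root EBS volume" >&2
fi
```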
Add NVIDIA repo
sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/amzn2023/x86_64/cuda-amzn2023.repo
sudo dnf clean expire-cache
Install NVIDIA driver
sudo dnf module install -y nvidia-driver:latest-dkms
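With the latest-dkms stream the kernel module is built by DKMS, so dkms status is a quick way to confirm it was built for the running kernel. This sketch assumes output lines shaped like name/version, kernel, arch: state (dkms 3.x; the older comma-separated form also matches) and a module registered as nvidia or nvidia-open; adjust the pattern if your output differs. The helper name is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: confirm DKMS built the NVIDIA module for the running kernel.
# Assumed "dkms status" line shape:
#   nvidia/555.42.06, 6.1.96-102.177.amzn2023.x86_64, x86_64: installed

# Hypothetical helper: read "dkms status" on stdin and succeed if an
# nvidia (or nvidia-open) module is installed for kernel release $1.
nvidia_dkms_installed_for() {
  local kernel="$1"
  grep -E '^nvidia(-open)?[/,]' | grep -F ", ${kernel}, " | grep -q ': installed'
}

if command -v dkms >/dev/null 2>&1; then
  if dkms status | nvidia_dkms_installed_for "$(uname -r)"; then
    echo "NVIDIA module is built for $(uname -r)"
  else
    echo "No NVIDIA DKMS module for $(uname -r); check kernel-devel, then reboot" >&2
  fi
fi
```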
Install CUDA toolkit
sudo dnf install -y cuda-toolkit
Method 2: Runfile installation (x86_64 and arm64)
The instructions below may not work, as CUDA Toolkit 12.5 currently supports AL2023 only through the x86_64 RPM installation (Method 1).
Ensure your OS has more than 10 GiB of free disk space
Install development libraries
sudo dnf install -y vulkan-devel libglvnd-devel elfutils-libelf-devel
Download CUDA toolkit installer
You can obtain the latest runfile (local) installer download URL for RHEL 9 on x86_64 and arm64 sbsa from the CUDA Toolkit download page.
Intel/AMD x86_64
wget https://developer.download.nvidia.com/compute/cuda/12.5.1/local_installers/cuda_12.5.1_555.42.06_linux.run
Arm64 sbsa (AWS Graviton)
wget https://developer.download.nvidia.com/compute/cuda/12.5.1/local_installers/cuda_12.5.1_555.42.06_linux_sbsa.run
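To avoid downloading the wrong installer, the choice can be keyed on uname -m (x86_64 or aarch64 on Linux). This sketch only prints the matching wget command for review; the two file names are the ones listed above, and the helper name is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: choose the CUDA 12.5.1 runfile for this architecture.

CUDA_BASE_URL="https://developer.download.nvidia.com/compute/cuda/12.5.1/local_installers"

# Hypothetical helper: print the runfile name for "uname -m" output $1,
# or fail for architectures without a listed runfile.
runfile_for_arch() {
  case "$1" in
    x86_64)        echo "cuda_12.5.1_555.42.06_linux.run" ;;
    aarch64|arm64) echo "cuda_12.5.1_555.42.06_linux_sbsa.run" ;;
    *)             return 1 ;;
  esac
}

if runfile="$(runfile_for_arch "$(uname -m)")"; then
  echo "wget ${CUDA_BASE_URL}/${runfile}"
else
  echo "No runfile listed for $(uname -m)" >&2
fi
```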
Install NVIDIA driver and CUDA toolkit
chmod +x ./cuda*.run
sudo ./cuda_*.run --driver --toolkit --tmpdir=/var/tmp --silent
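If both runfiles were downloaded, the cuda_*.run glob expands to two files and the second is passed to the first as an argument. A slightly more defensive sketch refuses to run unless exactly one runfile is present in the current directory. The log path /var/log/cuda-installer.log is the runfile's usual default, but treat it as an assumption; the helper name is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: run the installer only when exactly one cuda_*.run is present.

# Hypothetical helper: print the single cuda_*.run in directory $1,
# or fail when there are zero or several.
single_runfile() {
  local files=("$1"/cuda_*.run)
  # Without nullglob an unmatched glob stays literal, so test it exists.
  [ -e "${files[0]}" ] && [ "${#files[@]}" -eq 1 ] && echo "${files[0]}"
}

if runfile="$(single_runfile .)"; then
  chmod +x "$runfile"
  if ! sudo "$runfile" --driver --toolkit --tmpdir=/var/tmp --silent; then
    echo "Installer failed; check /var/log/cuda-installer.log (assumed path)" >&2
  fi
else
  echo "Expected exactly one cuda_*.run in $(pwd)" >&2
fi
```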
Post installation
Restart your OS
sudo reboot
Verify NVIDIA driver
nvidia-smi
Your output should be similar to the following:
Sun Jul  7 05:01:14 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.06              Driver Version: 555.42.06      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA T4G                     Off |   00000000:00:1F.0 Off |                    0 |
| N/A   71C    P0             29W /   70W |       1MiB /  15360MiB |     12%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
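For scripts, the same information is available in a parseable form. The --query-gpu and --format=csv,noheader options are standard nvidia-smi flags; the sed expression that pulls CUDA Version out of the banner is an assumption based on the sample output above, and the helper name is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: scriptable driver check. The CUDA Version in the nvidia-smi banner
# is the newest CUDA version the installed driver supports.

# Hypothetical helper: read plain "nvidia-smi" output on stdin and print
# the "CUDA Version" value, e.g. 12.5.
driver_cuda_version() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

if command -v nvidia-smi >/dev/null 2>&1; then
  # One CSV line per GPU, e.g. "NVIDIA T4G, 555.42.06".
  nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
  echo "Driver supports CUDA up to $(nvidia-smi | driver_cuda_version)"
else
  echo "nvidia-smi not found: driver not installed or not on PATH" >&2
fi
```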
Verify CUDA toolkit
/usr/local/cuda/bin/nvcc --version
The output should be similar to the following:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Jun__6_02:26:10_PDT_2024
Cuda compilation tools, release 12.5, V12.5.82
Build cuda_12.5.r12.5/compiler.34385749_0
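A conservative consistency check: the toolkit release reported by nvcc should not be newer than the CUDA version in the nvidia-smi banner (both are 12.5 in the outputs above). This sketch assumes GNU sort -V for version ordering. CUDA minor version compatibility can allow a newer 12.x toolkit on an older 12.x driver, so treat a failure here as a warning rather than proof of breakage. Helper names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: warn when the toolkit is newer than the driver's CUDA version.

# Hypothetical helper: read "nvcc --version" on stdin, print e.g. 12.5.
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n 1
}

# Hypothetical helper: succeed when version $1 <= version $2 (GNU sort -V).
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

NVCC=/usr/local/cuda/bin/nvcc
if [ -x "$NVCC" ] && command -v nvidia-smi >/dev/null 2>&1; then
  toolkit="$("$NVCC" --version | nvcc_release)"
  driver="$(nvidia-smi | sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1)"
  if version_le "$toolkit" "$driver"; then
    echo "OK: toolkit $toolkit, driver supports up to $driver"
  else
    echo "Warning: toolkit $toolkit is newer than the driver's CUDA $driver" >&2
  fi
fi
```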
More information
Refer to the NVIDIA CUDA Installation Guide for Linux for more details and post-installation instructions.
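One of the guide's post-installation actions is adding CUDA to PATH (and, for runfile installs, to LD_LIBRARY_PATH), which is why the nvcc check above uses the full path. A sketch of an idempotent version suitable for ~/.bashrc, assuming the /usr/local/cuda symlink; the helper name path_prepend is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: add CUDA to PATH / LD_LIBRARY_PATH without duplicating entries.

# Hypothetical helper: prepend directory $2 to the colon-separated list
# held in the variable named $1, unless it is already present.
path_prepend() {
  local var="$1" dir="$2" current
  current="${!var}"
  case ":${current}:" in
    *":${dir}:"*) ;;  # already present, leave unchanged
    *) printf -v "$var" '%s' "${dir}${current:+:${current}}" ;;
  esac
  export "$var"
}

path_prepend PATH /usr/local/cuda/bin
# Only needed for runfile installs, per the guide.
path_prepend LD_LIBRARY_PATH /usr/local/cuda/lib64
```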
[Optional] Install NVIDIA Container Toolkit
sudo dnf config-manager --add-repo https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
sudo dnf clean expire-cache
sudo dnf install -y nvidia-container-toolkit
Refer to the NVIDIA Container Toolkit site for instructions on configuring your container engine.
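For Docker, a minimal sketch, assuming Docker is already installed (e.g. from the AL2023 docker package) and running: nvidia-ctk registers an nvidia runtime in /etc/docker/daemon.json, and the sample workload runs nvidia-smi inside an ubuntu container. The daemon.json check is a crude grep, not a JSON parser, and its helper name is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: configure Docker for the NVIDIA runtime and run a test container.

DAEMON_JSON=/etc/docker/daemon.json

# Hypothetical helper: crude check (grep, not a JSON parser) for an
# "nvidia" runtime entry in the given daemon.json.
has_nvidia_runtime() {
  [ -f "$1" ] && grep -q '"nvidia"' "$1"
}

if command -v nvidia-ctk >/dev/null 2>&1 && command -v docker >/dev/null 2>&1; then
  sudo nvidia-ctk runtime configure --runtime=docker
  sudo systemctl restart docker
  has_nvidia_runtime "$DAEMON_JSON" || echo "No nvidia runtime in $DAEMON_JSON" >&2
  # Sample workload; the ubuntu image is only an example.
  sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
fi
```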