NVIDIA GPU Workloads
Mirantis Kubernetes Engine (MKE) supports running workloads on GPU nodes. Current support is limited to NVIDIA GPUs.
When you enable GPU support, MKE installs the NVIDIA GPU Operator on your cluster to manage GPU resources.
Prerequisites
Before you can enable NVIDIA GPU support in MKE, install the following components on each GPU-enabled node:
- The NVIDIA device driver for your GPU
- The NVIDIA Container Toolkit
- The NVIDIA container runtime for containerd, configured with the following command:
sudo nvidia-ctk runtime configure --runtime=containerd --config /etc/k0s/containerd.d/nvidia.toml
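To confirm that the prerequisites are in place on a node, you can check the driver, the toolkit CLI, and the generated containerd runtime configuration. This is a minimal verification sketch; the exact driver and toolkit installation steps depend on your Linux distribution and are covered in the NVIDIA documentation.
# Confirm the device driver is loaded and detects the GPU
nvidia-smi
# Confirm the NVIDIA Container Toolkit CLI is installed
nvidia-ctk --version
# Confirm the containerd runtime configuration was generated
cat /etc/k0s/containerd.d/nvidia.toml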
Configuration
NVIDIA GPU support is disabled by default. To enable it, configure the nvidiaGPU section of the MKE configuration file under devicePlugins:
devicePlugins:
  nvidiaGPU:
    enabled: true
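Once the updated configuration is applied and the GPU Operator components are running, each GPU node should advertise the nvidia.com/gpu resource. A minimal check, with <gpu-node-name> standing in for one of your GPU nodes:
# Each GPU node should list nvidia.com/gpu under Capacity and Allocatable
kubectl describe node <gpu-node-name> | grep nvidia.com/gpu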
Running GPU Workloads
Run a simple GPU workload that reports detected NVIDIA GPU devices:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  restartPolicy: Never
  containers:
    - name: cuda-container
      image: nvcr.io/nvidia/cloud-native/gpu-operator-validator:v22.9.0
      resources:
        limits:
          nvidia.com/gpu: 1 # requesting 1 GPU
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
EOF
Verify that the pod completed successfully:
kubectl get pods | grep "gpu-pod"
Example output:
NAME      READY   STATUS      RESTARTS   AGE
gpu-pod   0/1     Completed   0          7m56s
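If the pod did not complete, or you want to see what the validator detected, inspect its logs. The exact output depends on the validator image version:
kubectl logs gpu-pod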
Upgrading
To upgrade an MKE 3 cluster with GPU enabled, complete the GPU prerequisites before you start the upgrade process. If the prerequisites are missing, the upgrade process still detects the GPU configuration in MKE 3 but transfers it to MKE 4 incorrectly.