Enabling NVIDIA GPUs on K3s for CUDA workloads


Update (2024): I now recommend using solutions like Cloudfleet AI to manage GPU workloads at the edge. Cloudfleet provides a managed Kubernetes control plane tailored for AI/ML workloads and allows adding nodes to the cluster from various cloud and on-premise platforms.

If you are a machine learning or crypto aficionado and have built a mini datacenter at home with a couple of GPU-attached machines, you might be spending your precious time managing workloads on those machines by hand. How do you reconfigure everything and redeploy your workloads when you decide to reformat one of your machines? How do you shift a workload from one machine to another? How do you make sure that your setup is reproducible? Wouldn't it be nice to have a Kubernetes cluster in your homelab and manage all your workloads centrally, without worrying too much about how to set up that DELL R510 you got from eBay?


K3S has democratized Kubernetes by allowing anyone to create a cluster within minutes. You can easily set up a Kubernetes cluster with K3S even on a single node and improve how you manage your workloads. Later you can move the same workloads to the cloud or to larger on-premise Kubernetes clusters with only minimal changes. Needless to say, it is also a great learning opportunity if you want to improve your Kubernetes skills.

The purpose of this article, naturally, is not to tell you how great Kubernetes is but to show how to enhance it for workloads requiring GPUs. My most recent challenge was to run CUDA workloads on my K3S cluster to polish my machine learning skills a bit more. It is possible to run Kubernetes Pods on K3S that use NVIDIA GPUs for CUDA workloads, and it requires nothing more than a couple of steps. However, I have noticed that there is no clear set of instructions showing these steps, which components are required, how to enable them, and so on. Therefore, I decided to write this article outlining the required steps one by one.

From a bird's-eye view, the first required step is installing the NVIDIA drivers on the K3S node. Note that on the node itself we only need the device drivers, not CUDA: NVIDIA releases CUDA-enabled container images you can use to build your application container, which also means you can easily run containers with different CUDA versions on the same machine. The second step is to install the NVIDIA Container Toolkit, which exposes the GPU resources of the K3S node to the containers running on it. The third step is to tell K3S to use this toolkit to enable GPUs in the containers. After these steps, everything else is standard Kubernetes: installing the NVIDIA device plugin for Kubernetes in your cluster to make Kubernetes aware of the GPU resources and make them available to Pods.

In this article, we assume that you already have a running K3S cluster with some nodes that have NVIDIA GPUs. Another assumption is that your nodes run Ubuntu 20.04 and K3S v1.21.3+k3s1, but I believe the same instructions apply to other distributions and most other recent K3S versions.
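If you want to double-check these assumptions on your own cluster, kubectl can show the OS image and the K3S (kubelet) version of every node:

$ kubectl get nodes -o wide  # the VERSION and OS-IMAGE columns show the K3S release and the distribution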

Let us dive into the steps now:

Step 1: Install the NVIDIA drivers

The first step on our journey is to install the NVIDIA drivers on our node machine. We can first search the available drivers using apt:

$ apt search nvidia-driver

As of this writing, the latest available driver version is 470, so let us go ahead and install it:

$ sudo apt install nvidia-headless-470-server

One important point is our choice of the "headless" and "server" flavor of the driver. The standard NVIDIA drivers that many resources suggest installing come with the baggage of X11: as soon as you install them, you also enable the GUI on your host. You very likely do not want this on a Kubernetes node, even though it would not affect how your cluster works. The headless server versions of the driver package, on the other hand, install only the device drivers, so we prefer those.
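Once the driver package is installed (a reboot may be required for the kernel module to load), you can sanity-check it from the host. Note that nvidia-smi ships in the utils package rather than in the headless metapackage, so the sketch below assumes you also install it; the package name follows the usual Ubuntu naming and may differ on other distributions:

$ sudo apt install nvidia-utils-470-server  # provides nvidia-smi on the host (assumed Ubuntu package name)
$ nvidia-smi                                # should list your GPU(s) and report driver version 470.x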

Step 2: Install NVIDIA Container Toolkit

The NVIDIA Container Toolkit helps us build and run GPU-accelerated containers. In other words, it enables us to expose the GPU to the containers running on our node.

The documentation of the Container Toolkit is clear. One point to pay attention to is to install only the containerd version of the toolkit. K3S does not use Docker at all: since Kubernetes deprecated Docker as a container runtime, K3S relies solely on containerd to manage containers. Installing the Docker support as well would not break your cluster, since it implicitly installs the containerd support too, but because we want to keep unnecessary packages off our lean Kubernetes nodes, we go directly with the containerd installation.
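If you want to confirm that your node indeed talks to containerd before installing anything, K3S bundles crictl, which reports the CRI runtime it is connected to (a quick sanity check; the exact output varies between versions):

$ sudo k3s crictl version  # RuntimeName should report containerd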

The documentation is here. If you want to jump directly to the installation, first set up the repositories:

$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

and install the nvidia-container-runtime:

$ sudo apt-get update \
&& sudo apt-get install -y nvidia-container-runtime
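At this point the runtime should be present on the node. A quick sanity check (a sketch; the exact output depends on your driver and GPU) is to make sure the binary is on the PATH and that the container CLI can see your GPUs:

$ which nvidia-container-runtime  # should resolve to a path such as /usr/bin/nvidia-container-runtime
$ sudo nvidia-container-cli info  # prints the driver version and the detected GPUs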

Now, you can run a test container to make sure that your GPU is exposed to a container:

$ sudo ctr image pull docker.io/nvidia/cuda:11.0-base
$ sudo ctr run --rm --gpus 0 -t docker.io/nvidia/cuda:11.0-base cuda-11.0-base nvidia-smi

You should see the nvidia-smi output, but this time running inside a container!

Step 3: Configure K3S to use nvidia-container-runtime

We now need to tell K3S to use nvidia-container-runtime (effectively a plugin for containerd) on the node's containerd.

Our friends at K3D have created a useful guide for this. The only part of that guide we are interested in is the "Configure containerd" section. The template they share configures containerd to use the nvidia-container-runtime plugin, along with a couple of extra boilerplate settings. To install that template on our node, we can simply run the following command:

$ sudo wget https://k3d.io/v4.4.8/usage/guides/cuda/config.toml.tmpl -O /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl
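The template only takes effect when K3S regenerates its containerd configuration, so restart the K3S service on the node afterwards. The commands below assume a standard systemd-based installation; use the k3s-agent unit instead if the node runs as an agent only:

$ sudo systemctl restart k3s        # on a server node
$ sudo systemctl restart k3s-agent  # on an agent-only node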

Step 4: Install NVIDIA device plugin for Kubernetes

The NVIDIA device plugin for Kubernetes is a DaemonSet that scans the GPUs on each node and exposes them to the cluster as schedulable GPU resources.

The documentation of the device plugin also offers a Helm chart to install it. K3S ships a simple Helm controller that allows us to install Helm charts on the cluster, so let us leverage it to deploy the chart:

$ cat <<EOF | kubectl apply -f -
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: nvidia-device-plugin
  namespace: kube-system
spec:
  chart: nvidia-device-plugin
  repo: https://nvidia.github.io/k8s-device-plugin
EOF

Of course, you can also install the device plugin by applying its manifest directly or by installing the chart with "helm install"; it depends on your taste.
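Whichever way you install it, once the device plugin Pod is running on the GPU node, the node should start advertising an nvidia.com/gpu resource. A quick way to verify this (the node name below is a placeholder) is:

$ kubectl get pods -n kube-system | grep nvidia-device-plugin  # the DaemonSet Pod should be Running
$ kubectl describe node <your-gpu-node> | grep nvidia.com/gpu  # Capacity/Allocatable should show at least 1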

Step 5: Test everything on a CUDA-enabled Pod

Finally, we can test everything by creating a Pod that uses the CUDA Docker image and requests a GPU resource:

$ cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu
spec:
  restartPolicy: Never
  containers:
    - name: gpu
      image: "nvidia/cuda:11.4.1-base-ubuntu20.04"
      command: [ "/bin/bash", "-c", "--" ]
      args: [ "while true; do sleep 30; done;" ]
      resources:
        limits:
          nvidia.com/gpu: 1
EOF

Then let us run nvidia-smi in our Pod:

$ kubectl exec -it gpu -- nvidia-smi
Sun Aug 22 10:02:05 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| 33%   40C    P8    10W / 180W |      0MiB /  8117MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Congratulations!

Now you have a Kubernetes cluster with GPU resources available, exposed through exactly the same interface that the Kubernetes documentation defines. This means the GPU workloads you design for your new cluster will be completely portable to any other Kubernetes cluster with GPU resources.

Do you find this article useful or have additional comments? Please feel free to let me know.

